Finding similar proteins or protein sequences by alignment has many applications in biology.
In computational biology, three well-known dynamic programming algorithms are used to align sequences: the Needleman–Wunsch algorithm, the Smith–Waterman algorithm, and Hirschberg's algorithm.
All of these algorithms score alignments with a substitution matrix and gap penalties, with variations in implementation. Needleman–Wunsch and Hirschberg's algorithm produce global alignments, while Smith–Waterman produces local alignments.
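To make the cost of the dynamic programming approach concrete, here is a minimal sketch of Needleman–Wunsch global alignment scoring. It assumes a simplified scoring scheme (a fixed match/mismatch score instead of a real substitution matrix such as BLOSUM62, and a linear gap penalty); the function name and default scores are illustrative choices, not taken from any particular library.

```python
def needleman_wunsch_score(a, b, match=1, mismatch=-1, gap=-2):
    """Return the global alignment score of sequences a and b.

    Assumption: fixed match/mismatch scores stand in for a real
    substitution matrix, and gaps are penalized linearly.
    """
    # prev[j] is the best score for aligning a[:i-1] with b[:j];
    # row 0 aligns the empty prefix of a against b[:j] using only gaps.
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        # Column 0 aligns a[:i] against the empty prefix of b.
        curr = [i * gap]
        for j in range(1, len(b) + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            curr.append(max(
                prev[j - 1] + sub,  # align a[i-1] with b[j-1]
                prev[j] + gap,      # gap in b
                curr[j - 1] + gap,  # gap in a
            ))
        prev = curr
    return prev[-1]


print(needleman_wunsch_score("MKTAYIAK", "MKTAYIAK"))
```

Every pair of sequences requires filling a table with len(a) × len(b) cells, which is why comparing one query against a large database this way quickly becomes expensive.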
The first drawback of these existing solutions is their dependence on high-performance computing (HPC). Genome comparison is currently handled by throwing more processing power at the problem, which is not a long-term solution. Even with HPC, these comparisons take a long time.
What if there were an easier way to solve this? Enter graphical methods and graphical reasoning.
Instead of searching with substitution-matrix methods, a graphical reasoning algorithm built on a graph database can find aligned sequences, making comparison and search over biological sequences easier, faster, and more accurate.
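The graphical reasoning algorithm itself is not described here, so the following is only a generic illustration of how a graph-style index can avoid aligning a query against every sequence. It is a sketch under stated assumptions, not the NaturalText method: short subsequences (k-mers) act as nodes linked to the proteins that contain them, and a query is ranked against candidates by the number of shared k-mers. The function names, the value of k, and the sample sequences are all hypothetical.

```python
from collections import Counter, defaultdict


def build_kmer_graph(proteins, k=3):
    """Link each k-mer node to the ids of the proteins that contain it.

    Assumption: this bipartite k-mer/protein graph is a stand-in for
    illustration only; it is not the custom graph framework from the post.
    """
    graph = defaultdict(set)
    for pid, seq in proteins.items():
        for i in range(len(seq) - k + 1):
            graph[seq[i:i + k]].add(pid)
    return graph


def search(graph, query, k=3):
    """Rank proteins by how many distinct query k-mers they share."""
    hits = Counter()
    query_kmers = {query[i:i + k] for i in range(len(query) - k + 1)}
    for kmer in query_kmers:
        # Only proteins reachable from a query k-mer are ever touched.
        for pid in graph.get(kmer, ()):
            hits[pid] += 1
    return hits.most_common()


# Hypothetical toy data, not real NCBI records.
proteins = {"p1": "MKTAYIAK", "p2": "GGGGSGGG", "p3": "MKTAYWWW"}
graph = build_kmer_graph(proteins)
print(search(graph, "MKTAYIA"))
```

The design point is that a lookup only visits proteins sharing at least one k-mer with the query, so unrelated sequences cost nothing at search time.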
Check out the demo: NaturalText Protein Search
Details of the machine used
Data used: random FASTA-formatted sequences downloaded from NCBI
Number of protein sequences: 25,000
Database: custom-developed general-purpose database used as a graph database
Graph algorithm: custom-developed graph framework
Hardware details: 2 cores, 2 GB RAM
Execution details: pure Python, single-process execution
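For readers who want to try a similar setup, FASTA input can be read in pure Python without extra dependencies. The parser below is a minimal sketch that assumes well-formed records (a header line starting with ">" followed by one or more sequence lines); the function name and the sample records are hypothetical.

```python
def parse_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines.

    Assumption: each record starts with a '>' header line; sequence lines
    that follow are concatenated until the next header.
    """
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


# Hypothetical sample records, not real NCBI entries.
sample = [">p1 demo protein", "MKTAYI", "AKQR", ">p2", "GGSGG"]
for name, seq in parse_fasta(sample):
    print(name, len(seq))
```

Because it is a generator, it can stream a file opened with `open(path)` one record at a time, which helps keep memory use low on a small machine.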
Searching across 25,000 proteins is possible with 2 GB of memory.
What is the significance of this?
Comparing millions of genes is possible on a low-cost machine. This has the potential to bring genome analysis and personalized medicine to the masses.
Contact Email: firstname.lastname@example.org