Finding similar proteins by aligning their sequences has many applications in biology.

In computational biology, three well-known algorithms align sequences using dynamic programming: the Needleman–Wunsch algorithm, the Smith–Waterman algorithm, and Hirschberg's algorithm.

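All of these algorithms score candidate alignments with a substitution matrix and gap penalties, with variations in implementation. Needleman–Wunsch and Hirschberg's algorithm produce global alignments (Hirschberg's needs only linear memory), while Smith–Waterman produces local alignments.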
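To make the dynamic programming idea concrete, here is a minimal sketch of Needleman–Wunsch global alignment. It assumes a simple match/mismatch score in place of a real protein substitution matrix such as BLOSUM62, and a linear gap penalty; the function name and default scores are illustrative choices, not taken from any particular library.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
    """Return (score, aligned_a, aligned_b) for a global alignment of a and b.

    Assumption: match/mismatch stands in for a substitution matrix,
    and every gap position costs the same (linear gap penalty).
    """
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]

    # First row and column: aligning a prefix against nothing costs only gaps.
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap

    # Fill the matrix: each cell is the best of substitute, delete, insert.
    for i in range(1, rows):
        for j in range(1, cols):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + sub,  # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,      # gap in b
                score[i][j - 1] + gap,      # gap in a
            )

    # Trace back from the bottom-right corner to recover one optimal alignment.
    i, j = len(a), len(b)
    out_a, out_b = [], []
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            sub = match if a[i - 1] == b[j - 1] else mismatch
            if score[i][j] == score[i - 1][j - 1] + sub:
                out_a.append(a[i - 1])
                out_b.append(b[j - 1])
                i, j = i - 1, j - 1
                continue
        if i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1

    return score[len(a)][len(b)], "".join(reversed(out_a)), "".join(reversed(out_b))


if __name__ == "__main__":
    print(needleman_wunsch("ACGT", "ACT"))
```

The matrix has (len(a) + 1) x (len(b) + 1) cells, so both time and memory grow with the product of the two sequence lengths. That quadratic cost per pair is what makes comparing large collections expensive.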

The first drawback of these existing solutions is their reliance on high-performance computing (HPC). Genome comparison is typically solved by throwing more processing power at it, which is not a long-term solution. Even with HPC, large comparisons take a long time.

What if there were an easier way to solve this? Enter Graphical Methods and Graphical Reasoning.

Instead of searching with substitution-matrix methods, a Graphical Reasoning algorithm built on a graph database can find aligned sequences, making comparison and search over biological sequences easier, faster, and more accurate.
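The internals of the Graphical Reasoning algorithm are not described here, so the following is only a generic illustration of how a graph can replace pairwise matrix filling. It is not the NaturalText implementation. The sketch assumes a bipartite graph in which short subsequences (k-mers) are nodes linked to the sequences that contain them; the names `kmers`, `build_kmer_graph`, and `search` and the choice k=3 are all hypothetical.

```python
from collections import Counter, defaultdict


def kmers(seq, k=3):
    """Return the set of distinct length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def build_kmer_graph(sequences, k=3):
    """Build a k-mer -> sequence-id adjacency map (a bipartite graph).

    sequences: dict mapping a sequence id to its residue string.
    """
    graph = defaultdict(set)
    for seq_id, seq in sequences.items():
        for kmer in kmers(seq, k):
            graph[kmer].add(seq_id)
    return graph


def search(graph, query, k=3, top=5):
    """Rank stored sequences by how many distinct k-mers they share with query."""
    hits = Counter()
    for kmer in kmers(query, k):
        # Follow the edges from this k-mer node to every sequence containing it.
        for seq_id in graph.get(kmer, ()):
            hits[seq_id] += 1
    return hits.most_common(top)


if __name__ == "__main__":
    proteins = {
        "p1": "MKTAYIAKQR",
        "p2": "MKTAYLAKQR",
        "p3": "GGGGSGGGGS",
    }
    graph = build_kmer_graph(proteins)
    print(search(graph, "MKTAYIAK"))
```

A lookup of this kind touches only the sequences that share a k-mer with the query, rather than filling a scoring matrix against every stored sequence.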

Check out the demo: NaturalText Protein Search

Details of the machine used

Data used: random FASTA-formatted sequences downloaded from NCBI
Number of protein sequences: 25,000
Database: custom-developed general-purpose database used as a graph database
Graph algorithm: custom-developed graph framework
Hardware details: 2 cores, 2 GB RAM
Execution details: pure Python, single-process execution

Searching across 25,000 proteins is possible with 2 GB of memory.
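Since the input data is FASTA formatted, a minimal pure-Python reader is sketched below to show how such a file can be streamed one record at a time without loading everything into memory. The function name `read_fasta` is an illustrative choice and is not part of the framework described above.

```python
def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA text lines.

    A record starts with a line beginning with '>' and its sequence may
    span several following lines, which are joined together.
    """
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    # Emit the final record, if any.
    if header is not None:
        yield header, "".join(chunks)


if __name__ == "__main__":
    sample = [">sp1 example protein", "MKTAY", "IAKQR", "", ">sp2", "GGGGS"]
    for name, seq in read_fasta(sample):
        print(name, len(seq))
```

Because the function is a generator and accepts any iterable of lines, an open file handle can be passed to it directly, and only one record is held in memory at a time.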

What is the significance of this?

Comparing millions of genes is possible on a low-cost machine. This has the potential to bring genome analysis and personalized medicine to the masses.

