Contents

1. NaturalText Technology

2. Applications in Life Sciences

3. Examples

4. Published Datasets

5. Training Algorithm for specific tasks

6. Cloud Deployment and Data Protection

7. About NaturalText

1. NaturalText Technology

NaturalText applications are based on an unsupervised deep learning algorithm that learns automatically from input data. It can use seed values, such as an English dictionary, a thesaurus, or Wikidata, to learn and to find relationships and new values of the same type.

Results are explained with reasons, which is unique in the industry. It is easy to understand why a particular value is grouped or why a relationship is made. The algorithm is data and volume agnostic: it can be used with any type and volume of data. It also works in any language, including Chinese and Japanese.

The algorithms are built using graph-based sequence-to-sequence prediction methods. The entire technology was built from scratch. NaturalText uses Python and Rust for coding.

2. Applications in Life Sciences

NaturalText’s technology can be applied to merging various information silos, extracting actionable information, and predicting new facts. Applying Machine Learning to Life Sciences would change the way research and development is done in the industry. NaturalText’s technology also works for traditional sequence prediction tasks such as alignment and variation finding.

NaturalText works to make information processing faster, economically viable, and meaningful to researchers.

There are two broad categories in which Machine Learning can be used in Life Sciences.

2.1 Extracting and Combining Information from Multiple Sources

In a complex field such as Life Sciences, information is spread over multiple silos such as scientific papers, clinical trial reports, and patents, in addition to wet lab outputs such as biological sequences.

Combining this information manually by going through all the material is not possible, and it is not economically viable either. Existing technologies also cannot help humans gather the relevant information in a way that reduces the time and workload.

NaturalText’s Text Analysis and Machine Learning algorithms can learn patterns automatically, without manual work such as tagging or classification.

This can be applied to extracting information from scientific papers and to comparing multiple databases, such as protein and disease databases, in order to create a combined database and help professional researchers find information easily.

2.2 Create New Knowledge from Existing Information

Learned predictions are another way to understand the complex spread of information and to get commercial and research value out of it.

For example, if scientific papers mention that some genes are responsible for a disease because of certain properties, we can search for other genes mentioned with the same properties, so that a clearer picture of that disease and its genes emerges. Doing that manually across millions of papers would take many months, even if researchers use the latest search tools.

Instead, Machine Learning methods can analyse the information, learn the patterns, make predictions, and show the resulting predictions with reasons, which helps scientists make decisions.

It is like having an ultra-capable assistant to do the complex information processing work.

3. Examples

Let us see some concrete examples to understand how NaturalText’s technologies can help.

3.1 Extracting Relationships from Scientific Papers

PubMed data has around 7 million published papers. If a researcher has to find information about a particular set of genes that may cause some disease, the researcher has to search through those millions of documents using a search engine, locate the particular line or paragraph where the information is mentioned, then make notes, compare, remove duplicates, and so on.

This is an inefficient, highly time-consuming task. Here is how NaturalText can help in this situation.

NaturalText’s algorithms can sift through hundreds of millions of lines for potential relationships. Once the relevant lines are found, they are analysed further to extract relationships. Each extracted relationship is shown together with the reason why that particular result is shown. All of this happens within hours, using cheap commodity servers.
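The idea of returning each relationship together with its reason can be illustrated with a minimal sketch. This is not NaturalText’s algorithm; it is a plain co-occurrence baseline that assumes a gene and a disease named in the same sentence form a candidate relationship. The function name, the vocabularies, and the placeholder entity names (geneA, diseaseX) are hypothetical.

```python
import re
from collections import defaultdict

def extract_relationships(sentences, genes, diseases):
    """Return {(gene, disease): [supporting sentences]} from co-mentions.

    Assumption: a gene and a disease named in the same sentence are a
    candidate relationship. This is a simple co-occurrence baseline.
    """
    relationships = defaultdict(list)
    for sentence in sentences:
        # Tokenize to lowercase words so matching is case-insensitive.
        tokens = set(re.findall(r"[a-z0-9]+", sentence.lower()))
        found_genes = [g for g in genes if g.lower() in tokens]
        found_diseases = [d for d in diseases if d.lower() in tokens]
        for gene in found_genes:
            for disease in found_diseases:
                # Keep the sentence itself as the reason for the result.
                relationships[(gene, disease)].append(sentence)
    return dict(relationships)

# Placeholder sentences, for illustration only.
sentences = [
    "geneA is overexpressed in diseaseX samples.",
    "geneB regulates cell growth.",
    "Loss of geneA was also reported in diseaseX.",
    "geneB variants were observed in diseaseY.",
]
relationships = extract_relationships(
    sentences, genes=["geneA", "geneB"], diseases=["diseaseX", "diseaseY"]
)
for pair, reasons in relationships.items():
    print(pair, "supported by", len(reasons), "sentence(s)")
```

Keeping the supporting sentences next to each pair is what lets a researcher check why a result appeared without going back to the search engine.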

3.2 Reasoning for Finding New Information

Finding relationships presents another challenge if the total number of extracted relationships is in the thousands. In that case, how is it possible to find the required relationship, or to test whether a particular relationship exists? Or even to rank the information so that scientists can decide which one to look at first?

In order to eliminate redundant relationships and find new or relevant information, NaturalText uses Graphical Reasoning, based on graph algorithms, to reason over the raw data.

The relationships are checked, sorted, and ranked based on relevance.
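A rough sketch of what removing redundancy, ranking, and testing for a relationship could look like on a small graph follows. It is a stand-in under stated assumptions, not NaturalText’s Graphical Reasoning: relevance is approximated by the number of distinct sources supporting a relationship, and existence is tested with a breadth-first search. The triple format, the function names, and the sample data are hypothetical.

```python
from collections import defaultdict, deque

def rank_relationships(observations):
    """Merge duplicate triples and rank them by distinct supporting sources.

    observations: iterable of (subject, relation, object, source_id).
    """
    support = defaultdict(set)
    for subject, relation, obj, source_id in observations:
        support[(subject, relation, obj)].add(source_id)
    # Most-supported first; ties broken alphabetically for stable output.
    return sorted(support.items(), key=lambda item: (-len(item[1]), item[0]))

def is_connected(triples, start, goal):
    """Breadth-first search: does any chain of relationships link start to goal?"""
    graph = defaultdict(set)
    for subject, _relation, obj in triples:
        graph[subject].add(obj)
        graph[obj].add(subject)  # relationships are treated as undirected here
    seen, queue = {start}, deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            return True
        for neighbor in graph[node] - seen:
            seen.add(neighbor)
            queue.append(neighbor)
    return False

# Placeholder observations, for illustration only.
observations = [
    ("geneA", "associated_with", "diseaseX", "paper1"),
    ("geneA", "associated_with", "diseaseX", "paper2"),
    ("geneA", "associated_with", "diseaseX", "paper1"),  # redundant repeat
    ("geneB", "interacts_with", "geneA", "paper3"),
]
ranked = rank_relationships(observations)
triples = [triple for triple, _sources in ranked]
print(ranked[0])
print(is_connected(triples, "geneB", "diseaseX"))
```

Counting distinct sources rather than raw mentions keeps a relationship repeated many times within one paper from outranking one that several independent papers report.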

3.3 Merging Multiple Databases or Datasets

Information silos, in the form of multiple databases and datasets, are a harsh reality of most information collections, and Life Sciences is no exception.

Managing and merging multiple datasets, which may have some data in common and may also contain relationships to other information, is hard if done manually. If the datasets number in the hundreds, it is impossible.

Using a combination of pattern learning and reasoning, NaturalText algorithms can match and merge the datasets.
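To make the matching and merging step concrete, here is a small sketch. It assumes records are dictionaries and that two records refer to the same entity when their names are equal after a simple normalization; a learned matcher would replace that rule. The function names, field names, and sample records are hypothetical and do not describe NaturalText’s implementation.

```python
def normalize(name):
    """Hypothetical key normalization: lowercase, keep only letters and digits."""
    return "".join(ch for ch in name.lower() if ch.isalnum())

def merge_datasets(*datasets, key="name"):
    """Merge lists of dict records that refer to the same entity.

    Records match when their normalized key is equal. The first value seen
    for a field wins; differing later values are recorded under "_conflicts".
    """
    merged = {}
    for dataset in datasets:
        for record in dataset:
            entry = merged.setdefault(normalize(record[key]), {"_conflicts": {}})
            for field, value in record.items():
                if field not in entry:
                    entry[field] = value
                elif entry[field] != value and field != key:
                    entry["_conflicts"].setdefault(field, []).append(value)
    return merged

# Placeholder records, for illustration only.
protein_db = [{"name": "Protein-A", "length": 120}]
disease_db = [
    {"name": "protein a", "disease": "diseaseX"},
    {"name": "Protein B", "disease": "diseaseY"},
]
merged = merge_datasets(protein_db, disease_db)
for entity, fields in merged.items():
    print(entity, fields)
```

Recording conflicts instead of silently overwriting values leaves the final decision with the researcher when two sources disagree.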

3.4 Learned Prediction for Patterns, Structures and Sequences

Searching for similar information can unlock hidden information that cannot be seen or searched for by humans.

For example, if a protein with particular properties is of interest, it would be prudent to check for similar proteins by comparing their properties. Comparing biological sequences by properties is easy and cost-effective using NaturalText’s unique algorithms.

The algorithm first learns the existing patterns, then uses them as a pattern or model to find the same kind of information without being explicitly coded.
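Comparing proteins by their properties can be sketched with a basic set-similarity measure. The example below uses Jaccard similarity, which is a swapped-in technique chosen for simplicity and not the learned model described above. The property labels, protein names, and function names are hypothetical.

```python
def jaccard(a, b):
    """Jaccard similarity of two collections: shared items / all items."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

def most_similar(query_props, candidates, top_n=3):
    """Rank candidates by property overlap with the query.

    Returns (score, name, shared_properties) tuples, best match first, so
    every result carries the reason it was selected.
    """
    scored = []
    for name, props in candidates.items():
        shared = sorted(set(query_props) & set(props))
        scored.append((jaccard(query_props, props), name, shared))
    scored.sort(key=lambda row: (-row[0], row[1]))
    return scored[:top_n]

# Placeholder properties, for illustration only.
query = {"membrane", "kinase", "binds_atp"}
candidates = {
    "proteinB": {"membrane", "kinase"},
    "proteinC": {"nuclear", "binds_dna"},
    "proteinD": {"membrane", "kinase", "binds_atp", "secreted"},
}
for score, name, shared in most_similar(query, candidates):
    print(f"{name}: {score:.2f} shared={shared}")
```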

4. Published Datasets

To allow these claims to be checked, NaturalText has published two datasets and a technical write-up.

Hierarchical Clustering of 7 million Proteins : 7 million proteins from NCBI are grouped based on sequence similarity, using a single machine, in under an hour.

Similar Sentences Clustered Data : 60 million sentences from PubMed articles have been clustered and published.

Sequence to Sequence Prediction (seq2seq) for DNA, Proteins Analysis : Tech write-up explaining how NaturalText’s Sequence to Sequence prediction works.
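For readers unfamiliar with hierarchical clustering by sequence similarity, the toy sketch below shows the general idea on three short sequences. It is not the method used to produce the published protein dataset and would not scale to millions of sequences. It assumes similarity is measured as the Jaccard overlap of 3-letter subsequences (k-mers) and uses naive single-linkage merging. The sequences and function names are made up for illustration.

```python
from itertools import combinations

def kmers(sequence, k=3):
    """Set of all overlapping substrings of length k."""
    return {sequence[i:i + k] for i in range(len(sequence) - k + 1)}

def similarity(a, b, k=3):
    """Jaccard overlap of the k-mer sets of two sequences."""
    ka, kb = kmers(a, k), kmers(b, k)
    union = ka | kb
    return len(ka & kb) / len(union) if union else 0.0

def single_linkage(sequences, k=3):
    """Naive agglomerative single-linkage clustering.

    sequences: dict of name -> sequence string.
    Returns the merge history as (similarity, cluster_a, cluster_b) tuples
    in merge order; this list is a simple dendrogram.
    """
    clusters = [frozenset([name]) for name in sequences]
    merges = []
    while len(clusters) > 1:
        best = None
        for a, b in combinations(clusters, 2):
            # Single linkage: cluster similarity is that of the closest pair.
            score = max(
                similarity(sequences[x], sequences[y], k) for x in a for y in b
            )
            if best is None or score > best[0]:
                best = (score, a, b)
        score, a, b = best
        clusters = [c for c in clusters if c not in (a, b)] + [a | b]
        merges.append((score, sorted(a), sorted(b)))
    return merges

# Made-up sequences, for illustration only.
sequences = {"p1": "MKTAYIAKQR", "p2": "MKTAYIAKQK", "p3": "GGSLVPRGSH"}
for score, left, right in single_linkage(sequences):
    print(f"merge {left} + {right} at similarity {score:.2f}")
```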

5. Training Algorithm for specific tasks

As this is a trainable algorithm, it is possible to train it using comparable data, in the manner of the Rosetta Stone. If the same data exists in two formats, i.e. one as text and another as a spreadsheet, the algorithm can be trained to make connections using that data, and the result can be applied to new documents.
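A deliberately simple sketch of training on such paired data follows. It assumes each training pair is a sentence plus the spreadsheet row holding the same facts, and it learns only which preceding word signals which column. This is a toy illustration of the Rosetta Stone idea, not NaturalText’s training procedure; the column names, sentences, and function names are hypothetical.

```python
from collections import Counter, defaultdict

def train(pairs):
    """Learn which preceding word signals which spreadsheet column.

    pairs: list of (text, row), where row maps column name -> value string.
    Returns a dict of cue word -> column name.
    """
    cue_counts = defaultdict(Counter)
    for text, row in pairs:
        tokens = text.split()
        for column, value in row.items():
            for i in range(1, len(tokens)):
                if tokens[i].strip(".,") == value:
                    cue_counts[tokens[i - 1].lower()][column] += 1
    # Keep the most frequent column for each cue word.
    return {cue: counts.most_common(1)[0][0] for cue, counts in cue_counts.items()}

def apply_model(model, text):
    """Fill a row from new text using the learned cue words."""
    tokens = text.split()
    row = {}
    for i in range(1, len(tokens)):
        column = model.get(tokens[i - 1].lower())
        if column is not None:
            row[column] = tokens[i].strip(".,")
    return row

# Placeholder training pairs: the same facts as text and as a spreadsheet row.
pairs = [
    ("Patient P1 received dose 50mg.", {"patient_id": "P1", "dose": "50mg"}),
    ("Patient P2 received dose 75mg.", {"patient_id": "P2", "dose": "75mg"}),
]
model = train(pairs)
print(apply_model(model, "Patient P9 received dose 20mg."))
```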

6. Cloud Deployment and Data Protection

The NaturalText application can only be deployed in cloud-based datacenters, as it needs to be trained and models need to be created before it can be applied to real data.

Standard data protection and compliance are offered, including third-party auditing. For example, an AWS deployment has its own data protection and security services, which can be used when hosting in AWS.

7. About NaturalText

Started in 2015, the NaturalText team works on language and machine learning technologies to solve problems in Natural Language Understanding, Artificial Intelligence and Reasoning. It is part of Siva Raja Technologies Pvt Ltd and is funded by family and friends. NaturalText is based in Chennai, India.

NaturalText was accepted into the Nasscom DeepTech club in Nov 2017.

Rajasankar Viswanathan is the Founder of NaturalText and has 15 years of experience in the software industry.

Contact Email: rajasankar@naturaltext.com