Contex2Vec mines and groups words according to the context in which they are used. Like Google's well-known Word2Vec, NaturalText's Contex2Vec measures the similarity of words based on their context, and it uses machine learning techniques to generate the context vectors.
Converting words into numbers by some "magic method" is the dream of everyone working in natural language processing, because numbers are much easier to process than words. That "magic method" can be vectorization, machine learning, natural language processing, or a combination of these. Contex2Vec uses vectorization and machine learning.
Ever since Word2Vec was introduced, that dream has looked like reality. Ask Word2Vec the question "if man is to woman, then king is to what?" and it returns the answer "queen".
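The analogy question is usually answered with simple vector arithmetic: subtract "man" from "woman", add "king", and pick the nearest remaining word. Here is a minimal sketch of that idea. The two-dimensional vectors are invented by hand for illustration (real Word2Vec vectors are learned from text and are much larger), and the names TOY_VECTORS, cosine and analogy are hypothetical helpers, not part of Word2Vec or Contex2Vec.

```python
import math

# Hand-made toy vectors (an assumption for illustration only):
# axis 0 loosely means "royalty", axis 1 loosely means "gender".
TOY_VECTORS = {
    "man":   (0.0, 1.0),
    "woman": (0.0, -1.0),
    "king":  (1.0, 1.0),
    "queen": (1.0, -1.0),
    "apple": (-1.0, 0.0),
}

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def analogy(a, b, c, vectors):
    """Answer 'a is to b as c is to ?' using the vector b - a + c."""
    target = tuple(vb - va + vc
                   for va, vb, vc in zip(vectors[a], vectors[b], vectors[c]))
    # Exclude the three question words, then take the closest remaining word.
    candidates = (w for w in vectors if w not in (a, b, c))
    return max(candidates, key=lambda w: cosine(vectors[w], target))

print(analogy("man", "woman", "king", TOY_VECTORS))  # prints "queen"
```

With these toy vectors the target lands exactly on "queen"; with learned vectors the match is only approximate, which is why the nearest neighbour is used.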
Contex2Vec by NaturalText extracts words based on their context. For example, "data base" and "information flow" can be similar in context, and "1/3" can be treated as equal to "one-third".
This analysis uses the past five months of US patent data released by the USPTO. From over 200k words, Contex2Vec extracted around 9k words based on similarity.
The dataset was processed on a 6-core AMD processor with 16 GB of memory in 15 hours, including preprocessing the data, extracting words, learning vectors, and comparing each word with every other word. No GPU processing or deep learning was used. The original text data is 5 GB in size, and only words occurring in more than 50 contexts were used for this analysis. What counts as a context is determined by the Contex2Vec algorithm itself.
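To make the pipeline above more concrete, here is a rough sketch of context-based similarity in general. It is not the Contex2Vec algorithm, whose definition of a context is not described here. The sketch assumes that a word's context is simply the words within a fixed window around it, counts those neighbours, drops rarely seen words (standing in for the "more than 50 contexts" cutoff, with a much smaller threshold), and compares words by cosine similarity. The function names, the window size and the sample sentences are all made up for illustration.

```python
from collections import Counter, defaultdict
import math

def context_vectors(sentences, window=2, min_contexts=1):
    """Map each word to a Counter of its neighbouring words.

    Assumption: a 'context' is any word within `window` positions.
    """
    vectors = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i, word in enumerate(tokens):
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[word][tokens[j]] += 1
    # Keep only words observed in more than `min_contexts` context positions.
    return {w: c for w, c in vectors.items() if sum(c.values()) > min_contexts}

def cosine(u, v):
    """Cosine similarity between two sparse count vectors (Counters)."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm = (math.sqrt(sum(x * x for x in u.values()))
            * math.sqrt(sum(x * x for x in v.values())))
    return dot / norm if norm else 0.0

sentences = [
    "the light emitted by the diode",
    "the light generated by the diode",
    "the images captured by the camera",
]
vecs = context_vectors(sentences)
same = cosine(vecs["emitted"], vecs["generated"])   # identical neighbours
other = cosine(vecs["emitted"], vecs["captured"])   # partly shared neighbours
print(same > other)  # prints True
```

In this toy corpus "emitted" and "generated" have identical neighbours, so they score higher than "emitted" and "captured". Comparing every word with every other word this way grows quadratically with vocabulary size, which is one reason a frequency cutoff helps.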
The Contex2Vec algorithm can mine any text data, including legal and medical text, to generate contextually similar words for use in semantic search and other natural language processing tasks.
Take a look at some example groupings produced by Contex2Vec:
embodiment disclosed example shown invention described process described
above-described action taken adversary conditions described elements described embodiment described embodiment disclosed embodiment shown embodiments described example provided example shown examples provided features disclosed functionality described functions described illustrated embodiments illustrative embodiment information described invention described invention disclosed materials described method described methods described n-type semiconductor layer organic light emitting element procedures described process described processing described references cited steps described structures described terms used
As you can see, the Contex2Vec algorithm groups together the phrases that may carry the meaning of "example". Such groupings can be used in search, rewriting, and similar tasks.
100 nm 20 mm 50 nm
1% 30% 50%
3 hours 30 minutes 30 mm 48 hours 5 hours
1000° C. 110° C. 130° C. 140° C. 160° C.
1.5 times 10 times 2 times 4 times five times four times located outside made aware
images captured information obtained information provided
light emitted light generated
Figures above described embodiments above-described embodiments embodiments described above foregoing embodiments schemes
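One way groupings like these could be used in search is query expansion: when a query phrase belongs to a group, the other members of that group are searched as well. The sketch below assumes the groups are available as plain sets of phrases. The two sample groups are copied from the listings above, while GROUPS and expand_query are hypothetical names rather than part of any NaturalText API.

```python
# Two groups copied from the example output above.
GROUPS = [
    {"light emitted", "light generated"},
    {"images captured", "information obtained", "information provided"},
]

def expand_query(phrase, groups):
    """Return the phrase plus every other phrase grouped with it."""
    for group in groups:
        if phrase in group:
            return sorted(group)
    # Unknown phrases are searched as-is.
    return [phrase]

print(expand_query("light emitted", GROUPS))
# prints ['light emitted', 'light generated']
```

A phrase that appears in no group is returned unchanged, so the expansion never narrows a search.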
As with any other machine learning technique, Contex2Vec also generates incorrect groupings. Here is one:
1.5 times collectively referred formed along located outside made using preserve
This algorithm can be applied to various natural language processing tasks.
See the words listed alphabetically in our demo.
Email email@example.com to have your text data analyzed.