
Generating grammatically correct sentences is a challenge in natural language processing. An algorithm should not only reproduce sentences from existing examples but also produce new, valid sentences from them. Otherwise the work is heuristic search rather than learning or intelligence.

If a sentence is treated as a sequence of words that follows a graphical model, then graph algorithms can be used to generate new sentences. But this runs into the hard wall of the clique problem, for which no efficient general algorithm is known.

If grammar is treated as a clique problem, we need directed cliques instead of undirected cliques, because word order matters. If finding undirected cliques is a hard problem, then finding directed cliques is harder still. Our current algorithm tries to address this by building directed cliques and generating new word sequences from them.

The results shown here are not a general proof for the clique problem, as I haven't tested the current algorithm on other problem areas such as social network data. They show how new sentences can be generated using graphical reasoning, which may help with the clique issue in natural language grammar and other areas.

Take a look at the figures below.

The sentences generated are:

I would warmly
I warmly welcome
I would welcome
I welcome

I would warmly
I warmly for
I would for
I for

I would warmly
I warmly encourage
I would encourage
I encourage

I would warmly
I warmly support
I would support
I support

I would warmly
I warmly thank
I would thank
I thank

What new sentences are inferred from these?

I would warmly thank
I would warmly for
I would warmly encourage
I would warmly support
I would warmly welcome
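
One way to read this inference: each extracted key contributes ordered word pairs (earlier word to later word), and a longer sequence is accepted when every ordered pair inside it is already supported by some key, which is what makes it a directed clique. The following is a minimal sketch under that assumption. It is not the custom graph framework used for these results, and the function names `build_edges`, `is_directed_clique` and `infer_sequences` are hypothetical.

```python
from itertools import combinations, permutations

def build_edges(keys):
    """Collect directed edges (earlier word -> later word) from each key."""
    edges = set()
    for key in keys:
        # combinations() preserves order, so u always precedes v in the key
        for u, v in combinations(key.split(), 2):
            edges.add((u, v))
    return edges

def is_directed_clique(words, edges):
    """True if every ordered pair of words in the sequence has an edge."""
    return all((u, v) in edges for u, v in combinations(words, 2))

def infer_sequences(keys, root, length=4):
    """Brute-force search for sequences of `length` words starting at `root`."""
    edges = build_edges(keys)
    # Unique words in order of first appearance, excluding the root
    others = [w for w in dict.fromkeys(" ".join(keys).split()) if w != root]
    found = []
    for tail in permutations(others, length - 1):
        candidate = (root,) + tail
        if is_directed_clique(candidate, edges):
            found.append(" ".join(candidate))
    return found

# Key1 to Key4 for the "welcome" group listed above
keys = ["I would warmly", "I warmly welcome", "I would welcome", "I welcome"]
print(infer_sequences(keys, "I"))  # ['I would warmly welcome']
```

The brute-force permutation search is only practical for a handful of words; it is used here to keep the idea visible, not as a suggestion for the full 40k-word text.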

Are all the inferences found in the original text? No. Some appear in the text itself; others do not appear in the original text but are still grammatically correct.

In this example, "I would warmly for" may not be grammatically correct. Why is it generated? Because the Roman numeral I and the pronoun I are not distinguished in the original text. Example: the proposal of "SIS I for All". This also generates errors of the "I is" kind.

"I are" is also generated, because of lines such as "My colleagues and I are" in the original text. This can be rectified by adding another algorithm that checks global grammar.

Total number of words in the text : 40723
Total number of inferences made : 3075
Number of unique words in inferences : 369
Time taken : 13 secs

Thus, analyzing about 40k words generated about 3k new inferences in 13 seconds. This is for a single word taken as the root word. In principle, every word in the text can be taken as a root to generate new sentences.

Check all the generated sequences here. Key1 to Key4 are extracted from the graph. Key5 is the inference generated.

The sentences here are four words long; the same approach can be extended to 5, 6, 7 … words.
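
A sketch of how the target length could become a parameter, assuming a sequence is grown one word at a time and a word may be appended only if every word already in the sequence has a directed edge to it. The function name `grow_sequences` is hypothetical and the code is an illustration of the idea, not the framework behind the results above.

```python
from itertools import combinations

def grow_sequences(keys, root, length):
    """Grow directed cliques from `root`, one word at a time, up to `length` words."""
    # Directed edges: earlier word -> set of later words within each key
    successors = {}
    for key in keys:
        for u, v in combinations(key.split(), 2):
            successors.setdefault(u, set()).add(v)

    results = []

    def grow(seq):
        if len(seq) == length:
            results.append(" ".join(seq))
            return
        # A word may be appended only if every word already in seq points to it
        allowed = set.intersection(*(successors.get(w, set()) for w in seq))
        for word in sorted(allowed - set(seq)):
            grow(seq + [word])

    grow([root])
    return results

# Keys taken from the "welcome" and "thank" groups listed earlier
keys = [
    "I would warmly", "I warmly welcome", "I would welcome", "I welcome",
    "I warmly thank", "I would thank", "I thank",
]
print(grow_sequences(keys, "I", 4))  # ['I would warmly thank', 'I would warmly welcome']
print(grow_sequences(keys, "I", 5))  # []
```

With only these keys no five-word sequence is supported, so the second call returns an empty list. Longer sequences would appear only once the graph holds the extra ordered pairs they need.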

Data used : Random text from EU Parliament transcripts
Database : Custom-developed general-purpose database used as a graph database
Graph Algorithm : Custom-developed graph framework
Hardware Details : Intel i7, 8 cores, 16 GB RAM
Execution Details : Pure Python, single-threaded execution

I haven't validated all the inferences manually, owing to time constraints. I have verified randomly chosen examples for grammatical correctness. I would be happy to make corrections if anyone disproves this.

Credits: Clique Image generated via d3js with help from d3noob's Blocks
