What happens if an algorithm takes a grammar test? Would it pass?

Well, it seems that it can.

I have created a grammar checker for English using pattern rules mined by an ML algorithm. One way to check the effectiveness of the algorithm is to feed it lines with an incorrect or missing word and check whether it finds the correct answer. For each missing or incorrect word, the algorithm suggests words that could fit.
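The check described above can be sketched roughly as follows. This is a minimal illustration, not the actual checker: `rank_of_answer`, `evaluate`, and `toy_suggest` are hypothetical names, and the suggester is assumed to return its candidate words ordered from best to worst.

```python
from typing import Callable, Dict, Iterable, Optional, Sequence, Tuple


def rank_of_answer(suggestions: Sequence[str], answer: str) -> Optional[int]:
    """Return the 1-based position of `answer` in `suggestions`, or None if absent."""
    for position, word in enumerate(suggestions, start=1):
        if word == answer:
            return position
    return None


def evaluate(
    cases: Iterable[Tuple[str, str]],
    suggest: Callable[[str], Sequence[str]],
    cutoffs: Sequence[int] = (1, 5, 10),
) -> Dict[str, object]:
    """Count how often the expected word appears within each cutoff."""
    top_k = {k: 0 for k in cutoffs}
    total = 0
    found = 0
    for sentence, answer in cases:
        total += 1
        rank = rank_of_answer(suggest(sentence), answer)
        if rank is None:
            continue  # the expected word was never suggested
        found += 1
        for k in cutoffs:
            if rank <= k:
                top_k[k] += 1
    return {"total": total, "found": found, "top_k": top_k}


# Hypothetical stand-in for the real suggester: a fixed lookup table.
TOY_SUGGESTIONS = {
    "I certainly will if I _____": ["considered", "can", "could"],
    "_____ is waiting outside": ["He", "She"],
}


def toy_suggest(sentence: str) -> Sequence[str]:
    return TOY_SUGGESTIONS.get(sentence, [])


if __name__ == "__main__":
    cases = [
        ("I certainly will if I _____", "can"),
        ("_____ is waiting outside", "Barbara"),
    ]
    print(evaluate(cases, toy_suggest))
```

A cutoff of 1 corresponds to an exact match on the first suggestion, while `found` counts every line where the expected word appears anywhere in the list.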

So what is the result?

Of a total of 881 lines fed, the algorithm finds an exact match for only 47 lines, a mere 5% of the total. However, for 144 lines the exact word is in the top 5 suggested words, and for 204 lines it is in the top 10. It found the correct word for 560 lines, but many of those words were buried deep in the list of suggestions.

Test | Answered | Percentage |
---|---|---|
Correct answer found | 47 | 5% |
Answer in top 5 suggestions | 144 | 16% |
Answer in top 10 suggestions | 204 | 23% |
Answer found | 560 | 63% |
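The percentages can be reproduced from the counts with a few lines of Python. This is only an arithmetic check on the numbers reported here; the variable names are illustrative. The whole-number figures in the table match these values with the fractional part dropped.

```python
TOTAL_LINES = 881

# Counts as reported in the results above.
counts = {
    "Correct answer found": 47,
    "Answer in top 5 suggestions": 144,
    "Answer in top 10 suggestions": 204,
    "Answer found": 560,
}


def share(count: int, total: int) -> float:
    """Return `count` as a percentage of `total`."""
    return 100.0 * count / total


percentages = {label: share(count, TOTAL_LINES) for label, count in counts.items()}

for label, pct in percentages.items():
    print(f"{label}: {pct:.1f}%")
```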

The full results are available as sortable HTML and also in CSV format.

The algorithm used 20 GB of raw text data, so it may not have enough data to find the correct answer for most of the lines. In the age of petabytes of data, 20 GB may be too small to have the desired effect. But it still finds answers for roughly 20% of the questions.

For the question, *_____ your boyfriend is waiting for you outside in the car.*

The answer is **Barbara**. But if the word is not in the data, it may not be possible for the algorithm to find it. The algorithm may instead find other words that are also grammatically correct.

Another issue is that the algorithm couldn't differentiate between the **Roman numeral I** and the **word I**. So it finds that **I is** is also correct.

There are cases where the algorithm's answer does not match the expected one but is still grammatically correct. For the question **I certainly will if I _____**, the answer is **can**, but the algorithm returned **considered**, which could be correct too.

What happens if more data is added and the model is retrained? The percentage of answered questions may improve. Can the accuracy of the system be improved with the existing data? It may be possible by improving the ranking of the suggested words and by finding a way to avoid errors of the **I is** kind.

In the coming days, I will keep tweaking the algorithm and posting the results, along with details about the dataset and the method used, and whether the same algorithm can be applied to other problems.

This test was done on local data and is not yet open to the public. I will host it in the coming weeks.

Email rajasankar@naturaltext.com to analyze your text data.
