Detecting and correcting real-word errors in Tamil sentences

dc.contributor.author Sakuntharaj, R.
dc.contributor.author Mahesan, S.
dc.date.accessioned 2023-02-02T04:01:42Z
dc.date.available 2023-02-02T04:01:42Z
dc.date.issued 2018-02-15
dc.identifier.issn 1391-8796
dc.identifier.uri http://ir.lib.ruh.ac.lk/xmlui/handle/iruor/10665
dc.description.abstract A spell checker deals with two types of errors: non-word errors and real-word errors. Non-word errors fall into two categories: either the word itself is invalid, or the word is valid but absent from the lexicon. A real-word error is a valid word that is inappropriate in the context of the sentence. This paper proposes an approach to correcting real-word errors in Tamil. A bigram probability model, built from 3 GB of Tamil text corpora, is used to determine whether a valid word is appropriate in the context of the sentence. A word found inappropriate is marked as a real-word error; the minimum edit distance technique is then used to find lexically similar words, and the appropriateness of each such word is measured with a word-level bigram probability model. A hash table keyed by word length speeds up the search for lexically similar words: only words of length m-1 to m+1 are considered, where m is the length of the word found to be inappropriate. Finally, the top five words are offered as suggestions for correction. Test results show that the suggestions generated by the system are 98% accurate, as judged by a scholar of Tamil. The technique may also be used to check real-word errors in other languages, given a corpus large enough to build a bigram probability model for the language. en_US
dc.language.iso en en_US
dc.publisher Faculty of Science, University of Ruhuna, Matara, Sri Lanka en_US
dc.subject Tamil en_US
dc.subject Real-word error en_US
dc.subject Bigram en_US
dc.subject Minimum edit distance en_US
dc.subject Error correction en_US
dc.title Detecting and correcting real-word errors in Tamil sentences en_US
dc.type Article en_US
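
The pipeline described in the abstract (bigram-based detection, a hash table keyed by word length, edit-distance candidates of length m-1 to m+1, and a top-five ranking) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation. The class and function names, the zero-probability detection rule, the edit-distance limit of 1, the unsmoothed probability estimate, and the English toy corpus are all assumptions made for the example. Word length is measured here in Python code points, which may differ from how the paper counts Tamil characters.

```python
from collections import Counter, defaultdict


def edit_distance(a, b):
    """Levenshtein distance with unit costs for insert, delete, substitute."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                 # delete ca
                            curr[j - 1] + 1,             # insert cb
                            prev[j - 1] + (ca != cb)))   # substitute
        prev = curr
    return prev[-1]


class RealWordChecker:
    """Hypothetical sketch of a bigram-based real-word error checker."""

    def __init__(self, sentences, max_distance=1, top_n=5):
        self.max_distance = max_distance  # assumed similarity limit
        self.top_n = top_n                # the abstract keeps the top five
        self.context = Counter()          # count of each word as a left context
        self.bigrams = Counter()          # count of each (previous, word) pair
        self.vocab = set()
        for sentence in sentences:
            tokens = ["<s>"] + sentence.split()
            self.context.update(tokens[:-1])
            self.bigrams.update(zip(tokens, tokens[1:]))
            self.vocab.update(tokens[1:])
        # Hash table with word length as the key, as in the abstract.
        self.by_length = defaultdict(set)
        for word in self.vocab:
            self.by_length[len(word)].add(word)

    def bigram_prob(self, prev, word):
        """Unsmoothed estimate of P(word | prev); 0.0 for unseen contexts."""
        if self.context[prev] == 0:
            return 0.0
        return self.bigrams[(prev, word)] / self.context[prev]

    def candidates(self, word):
        """Lexically similar words, searched only in lengths m-1 to m+1."""
        m = len(word)
        pool = set().union(
            *(self.by_length.get(n, set()) for n in (m - 1, m, m + 1)))
        return [c for c in pool
                if c != word and edit_distance(word, c) <= self.max_distance]

    def suggest(self, prev, word):
        """Rank candidates by bigram probability and keep the best top_n."""
        scored = [(self.bigram_prob(prev, c), c) for c in self.candidates(word)]
        scored = [(p, c) for p, c in scored if p > 0]
        scored.sort(key=lambda pc: (-pc[0], pc[1]))
        return [c for _, c in scored[:self.top_n]]

    def check(self, sentence):
        """Return (word, suggestions) for each valid word that looks out of place."""
        tokens = ["<s>"] + sentence.split()
        flagged = []
        for prev, word in zip(tokens, tokens[1:]):
            # Assumed rule: a known word never seen after `prev` is suspicious.
            if word in self.vocab and self.bigram_prob(prev, word) == 0.0:
                flagged.append((word, self.suggest(prev, word)))
        return flagged


corpus = ["the cat sat on the mat", "the cat ate the fish", "a fat cat sat"]
checker = RealWordChecker(corpus)
result = checker.check("the cat sat on the fat")
print(result)
```

On this toy corpus, "fat" is a valid word that never follows "the", so it is flagged, and the candidates that do follow "the" in the corpus are ranked by their bigram probability. A real system would need a smoothed model and a tuned threshold, since with a finite corpus many acceptable word pairs are simply unseen.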