Use of dynamic text prediction for Tamil text editing

dc.contributor.author Sarubi, T.
dc.contributor.author Lorensuhewa, S.A.S.
dc.contributor.author Kalyani, M.A.L.
dc.date.accessioned 2023-02-02T09:47:57Z
dc.date.available 2023-02-02T09:47:57Z
dc.date.issued 2018-02-15
dc.identifier.issn 1391-8796
dc.identifier.uri http://ir.lib.ruh.ac.lk/xmlui/handle/iruor/10714
dc.description.abstract In text processing, typing entire documents by hand is time-consuming and leads to many spelling mistakes. For morphologically rich languages such as Tamil, it is even more difficult, owing to the absence of a clear picture of the Tamil keyboard layout. The aim of this research study is to develop a user-friendly tool that performs next-word prediction and spell checking. In our approach, while the user types, we detect the user's typing domain using a classifier and then predict the next word according to the predicted domain. Next-word prediction is done using domain-specific language models, giving priority to trigrams and then bigrams. The language models can continuously learn from the user's typing. A recency-based model is used to reduce the search space. We also detect misspelt words, propose corrections by dictionary lookup with a distance measure, and improve the dictionary suggestion list using n-gram lookups. According to our experiments, Tamil yields the lowest word prediction percentage (WPP) accuracy among Tamil, Sinhala and English. We further analyzed the results by varying the total number of words in all three languages and counting the number of unique words. The results show that Tamil has the highest number of unique words of the three languages. Tamil has a larger vocabulary than the other two languages, and we believe the lowest prediction level was obtained because of this diversity. Dynamic prediction helps users because, within a single document, different domain n-gram models may be needed to predict words. Dictionary lookup with forward and backward bigrams shows the highest accuracy, 54%, while dictionary lookup alone achieved 36% accuracy. en_US
dc.language.iso en en_US
dc.publisher Faculty of Science, University of Ruhuna, Matara, Sri Lanka en_US
dc.subject Tamil language en_US
dc.subject Dynamic word prediction en_US
dc.subject Language models en_US
dc.subject Spell checker en_US
dc.title Use of dynamic text prediction for Tamil text editing en_US
dc.type Article en_US
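
Illustrative note (not part of the original record): the abstract describes next-word prediction that gives priority to trigram matches and then falls back to bigram matches, with models that keep learning from the user's typing. The sketch below is one minimal way such a backoff could look. The class name BackoffPredictor, its methods, and the toy tokens are hypothetical; the authors' actual implementation, their domain classifier, and their recency-based model are not shown in this record and are not reproduced here.

```python
from collections import Counter, defaultdict


class BackoffPredictor:
    """Hypothetical next-word predictor: trigram context first, then bigram."""

    def __init__(self):
        self.trigrams = defaultdict(Counter)  # (w1, w2) -> counts of following word
        self.bigrams = defaultdict(Counter)   # w1 -> counts of following word

    def learn(self, tokens):
        """Update n-gram counts from a token list (can be called as the user types)."""
        for i in range(len(tokens) - 1):
            self.bigrams[tokens[i]][tokens[i + 1]] += 1
        for i in range(len(tokens) - 2):
            self.trigrams[(tokens[i], tokens[i + 1])][tokens[i + 2]] += 1

    def predict(self, context, k=3):
        """Return up to k candidate words, trigram matches ranked before bigram matches."""
        candidates = []
        if len(context) >= 2:
            tri = self.trigrams.get((context[-2], context[-1]))
            if tri:
                candidates.extend(word for word, _ in tri.most_common(k))
        if context:
            bi = self.bigrams.get(context[-1])
            if bi:
                for word, _ in bi.most_common():
                    if len(candidates) >= k:
                        break
                    if word not in candidates:
                        candidates.append(word)
        return candidates[:k]


# Toy usage with placeholder tokens; a real system would train one such
# model per domain and pick the model chosen by the domain classifier.
model = BackoffPredictor()
model.learn(["a", "b", "c"])
model.learn(["a", "b", "d"])
model.learn(["a", "b", "c"])
model.learn(["x", "b", "e"])
suggestions = model.predict(["a", "b"])
```

Keeping one predictor per domain (for example, a dictionary mapping a domain label to a BackoffPredictor) would mirror the domain-specific models the abstract mentions, under the assumption that the classifier supplies the label.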

