A preliminary model of predictive text for Sinhala using N-gram Statistics


dc.contributor.author Chanaka, K.M.R.
dc.contributor.author Lorensuhewa, S.A.S.
dc.contributor.author Kalyani, M.A.L.
dc.date.accessioned 2023-01-30T08:48:00Z
dc.date.available 2023-01-30T08:48:00Z
dc.date.issued 2017-01-26
dc.identifier.issn 1391-8796
dc.identifier.uri http://ir.lib.ruh.ac.lk/xmlui/handle/iruor/10495
dc.description.abstract Most Sri Lankans use Sinhala text processing in their day-to-day activities. However, they find it hard to type documents in Sinhala: typing takes more time and introduces mistakes, so efficiency is low. Integrating a word prediction facility lets the user select words rather than type them repeatedly, which reduces the number of required keystrokes, minimizes mistakes, and saves time. The aim of this research is to explore the use of Natural Language Processing and Machine Learning techniques to assist Sinhala typing tasks by predicting words. We predict the next word to type from an n-gram probabilistic model that involves bi-grams, tri-grams, and a combination of bi-grams and tri-grams. This composite n-gram model includes both bi-grams and tri-grams, giving high priority to the tri-gram suggestions. The n-gram corpus is generated from a Sinhala corpus collected from online Sinhala newspapers. A maximum prediction percentage of 41 was achieved for sports documents by using a domain-specific n-gram corpus of sports documents, and an 18.1% average keystroke reduction was obtained by using the prediction model. We also tested other news categories, such as political, legal, and local, collected from local newspapers. According to our experimental results, the composite n-gram model outperformed the bi-gram and tri-gram word prediction models, and the domain-specific composite n-gram model performs better than the composite model created from a mixed corpus. Our goal is to automatically cluster the document corpus, classify the text being edited once a certain amount of it has been entered, and obtain predictions dynamically from the relevant cluster, improving accuracy at runtime and giving more relevant predictions. en_US
dc.language.iso en en_US
dc.publisher Faculty of Science, University of Ruhuna, Matara, Sri Lanka en_US
dc.subject Word Prediction en_US
dc.subject Dynamic Text Prediction en_US
dc.subject N-Gram Model en_US
dc.subject NLP en_US
dc.subject Text Mining en_US
dc.title A preliminary model of predictive text for Sinhala using N-gram Statistics en_US
dc.type Article en_US
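
The abstract describes a composite n-gram model that draws on both bi-gram and tri-gram statistics and gives priority to the tri-gram suggestions. The following is a minimal sketch of that idea, not the authors' implementation. The function names `train` and `predict`, the parameter `k`, the use of raw counts to rank candidates within a context, and the whitespace-tokenized placeholder corpus are all assumptions made for illustration; the record does not specify how the authors tokenize, smooth, or rank.

```python
from collections import Counter, defaultdict


def train(tokens):
    """Count which words follow each one-word and two-word context."""
    bigrams = defaultdict(Counter)   # previous word -> counts of next word
    trigrams = defaultdict(Counter)  # (two previous words) -> counts of next word
    for i in range(len(tokens) - 1):
        bigrams[tokens[i]][tokens[i + 1]] += 1
    for i in range(len(tokens) - 2):
        trigrams[(tokens[i], tokens[i + 1])][tokens[i + 2]] += 1
    return bigrams, trigrams


def predict(bigrams, trigrams, context, k=3):
    """Return up to k next-word suggestions for the typed context.

    Tri-gram matches are listed first (assumed reading of "high priority");
    bi-gram matches then fill any remaining slots.
    """
    suggestions = []
    if len(context) >= 2:
        key = (context[-2], context[-1])
        suggestions.extend(w for w, _ in trigrams.get(key, Counter()).most_common())
    if context:
        for w, _ in bigrams.get(context[-1], Counter()).most_common():
            if w not in suggestions:
                suggestions.append(w)
    return suggestions[:k]


# Hypothetical placeholder corpus; a real one would hold Sinhala news text.
corpus = (
    "the team won the match and the team won the cup "
    "and the team lost the match"
).split()
bigrams, trigrams = train(corpus)
print(predict(bigrams, trigrams, ["won", "the"]))
```

In this toy corpus "team" is the most frequent word after "the", yet for the context "won the" it is listed only after the two tri-gram candidates, which is the priority rule the abstract describes. When no tri-gram matches the context, the sketch falls back to bi-gram suggestions alone.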

