Sinhala word suggestion algorithm for ad hoc Romanized Sinhala transliterations using a Trie.

Sumanathilaka, T.G.D.K.; Weerasinghe, R.; Priyadarshana, H.Y.P.P.

Published in: Ruhuna International Science and Technology Conference (RISTCON 2023), Conference and Symposia Proceedings

URI: http://ir.lib.ruh.ac.lk/xmlui/handle/iruor/10855

Date: 2023-01-18

Abstract:

With the revolution in social technologies, Sinhala and Romanized Sinhala have become the main languages used by the general Sri Lankan community on social platforms. The informal shorthand typing that accompanies Romanized Sinhala has opened a new area of transliteration research. Because Sinhala is a low-resource language, current systems use a rule-based approach to transliterate Romanized Sinhala to Sinhala and to generate suggestions. As a result, they cannot predict words typed in the many different shorthand styles. The proposed novel suggestion transliterator instead uses an enhanced Trie, an efficient information-retrieval data structure, for word prediction. Survey data were used to identify the different typing patterns, which were then adapted as rules. Based on these rules, a Sinhala dictionary was annotated and used to train the Trie, and the trained model is used to identify possible word predictions. The Romanized Sinhala words predicted by the model are compared with a Romanized Sinhala to Sinhala knowledge base, which returns the unique Sinhala words as suggestions. For example, the shorthand Romanized Sinhala word “Adaraya” can be transliterated and suggested as its Sinhala representations “ආදරය, ආදාරය”. The model was tested on 200 unique Romanized Sinhala test sentences: each sentence was fed to the model, and the word-level suggestions were compared with the expected output. The model achieved a word-level prediction accuracy of 84%. This transliterator can therefore help resolve the ambiguity in Romanized Sinhala to Sinhala transliteration and help future products improve the typing experience of their Romanized Sinhala users.
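
The abstract does not give the annotation rules, the nature of the Trie "enhancement", or the knowledge-base format, so the following is only a minimal Python sketch of the described pipeline under stated assumptions. The two shorthand rules in `shorthand_variants`, all class and function names, and the toy `KNOWLEDGE_BASE` are hypothetical; only the “Adaraya” → “ආදරය, ආදාරය” example comes from the abstract. Each full Romanized dictionary word is expanded into shorthand variants, every variant is stored in a Trie whose end node points back to the full word, and the matched words are mapped to Sinhala through the knowledge base, with a set removing duplicate suggestions.

```python
import re

# Hypothetical knowledge base: full Romanized Sinhala spelling -> Sinhala script.
# Only these two entries are taken from the abstract's "Adaraya" example.
KNOWLEDGE_BASE = {
    "aadaraya": "ආදරය",
    "aadaaraya": "ආදාරය",
}


def shorthand_variants(word: str) -> set[str]:
    """Return spellings a user might type for `word` (assumed rules, not the paper's)."""
    variants = {word}
    # Assumed rule 1: a doubled (long) vowel is typed once, e.g. "aadaaraya" -> "adaraya".
    collapsed = re.sub(r"([aeiou])\1", r"\1", word)
    variants.add(collapsed)
    # Assumed rule 2: vowels after the first letter are dropped, e.g. "adaraya" -> "adry".
    variants.add(collapsed[0] + re.sub(r"[aeiou]", "", collapsed[1:]))
    return variants


class TrieNode:
    def __init__(self) -> None:
        self.children: dict[str, "TrieNode"] = {}
        self.words: set[str] = set()  # full Romanized words whose variant ends here


class SuggestionTrie:
    def __init__(self) -> None:
        self.root = TrieNode()

    def insert(self, variant: str, word: str) -> None:
        node = self.root
        for ch in variant:
            node = node.children.setdefault(ch, TrieNode())
        node.words.add(word)

    def _find(self, prefix: str):
        node = self.root
        for ch in prefix:
            node = node.children.get(ch)
            if node is None:
                return None
        return node

    def lookup(self, typed: str) -> set[str]:
        """Full words whose shorthand variant equals `typed` exactly."""
        node = self._find(typed)
        return set(node.words) if node else set()

    def complete(self, prefix: str) -> set[str]:
        """Full words with any variant starting with `prefix` (depth-first walk)."""
        node = self._find(prefix)
        found: set[str] = set()
        stack = [node] if node else []
        while stack:
            current = stack.pop()
            found |= current.words
            stack.extend(current.children.values())
        return found


def build_trie(dictionary) -> SuggestionTrie:
    """'Train' the Trie by inserting every shorthand variant of every dictionary word."""
    trie = SuggestionTrie()
    for word in dictionary:
        for variant in shorthand_variants(word):
            trie.insert(variant, word)
    return trie


def suggest(trie: SuggestionTrie, typed: str, kb: dict[str, str]) -> list[str]:
    typed = typed.lower()
    # Prefer exact shorthand matches; otherwise treat the input as a prefix.
    words = trie.lookup(typed) or trie.complete(typed)
    # Map to Sinhala; the set keeps only unique Sinhala suggestions.
    return sorted({kb[w] for w in words if w in kb})


trie = build_trie(KNOWLEDGE_BASE)
print(suggest(trie, "Adaraya", KNOWLEDGE_BASE))  # ['ආදරය', 'ආදාරය']
```

Falling back to prefix completion when there is no exact match is a design choice of this sketch rather than something the abstract specifies; it lets a partially typed word such as “ad” still return candidates, while a fully typed long-vowel spelling such as “aadaaraya” narrows the result to a single word.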