Abstract:
With the revolution in social technology, Sinhala and Romanized Sinhala
have become the main languages used among the general Sri Lankan community.
The informal shorthand typing used with Romanized Sinhala encourages
researchers to explore the new area of transliteration. As Sinhala is a
low-resource language, the current system uses a rule-based approach for
transliteration and suggestion generation from Romanized Sinhala to Sinhala.
Therefore, word predictions that account for different shorthand typing
styles cannot be achieved. The proposed novel suggestion transliterator uses
an enhanced Trie, an efficient information retrieval data structure, for
word prediction.
Survey responses were collected to identify the different typing patterns,
which were then adapted as rules. Based on these rules, a Sinhala dictionary
was annotated and used to train the Trie. The trained model was used to
identify the possible word predictions. The Romanized Sinhala words predicted
by the model are compared with a Romanized Sinhala to Sinhala knowledge base,
which returns the unique Sinhala words as the suggestions. As an example, the
shorthand Romanized Sinhala word “Adaraya” can be transliterated and
suggested as its Sinhala representations “ආදරය” and “ආදාරය”. The model was
tested with 200 unique Romanized Sinhala test inputs. Each Romanized Sinhala
sentence was fed to the model and word-level suggestions were compared with
the expected output. The model achieved a word-level prediction accuracy of
84%. This novel transliterator can therefore help resolve the ambiguity
issue in Romanized Sinhala to Sinhala transliteration, which will help
future products enhance the typing experience of their Romanized Sinhala
users.
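
The pipeline described above (shorthand rules, a Trie over annotated
dictionary entries, and a knowledge base lookup) can be illustrated with a
minimal Python sketch. It is not the authors' implementation. The single
typing rule (a long vowel typed as a short one), the romanizations
"aadaraya" and "aadaaraya", and all function and class names are
assumptions made for illustration only.

```python
class TrieNode:
    def __init__(self):
        self.children = {}   # next character -> TrieNode
        self.words = set()   # full Romanized words whose shorthand key ends here


class ShorthandTrie:
    def __init__(self):
        self.root = TrieNode()

    def insert(self, key, word):
        """Store `word` under the shorthand spelling `key`."""
        node = self.root
        for ch in key:
            node = node.children.setdefault(ch, TrieNode())
        node.words.add(word)

    def complete(self, prefix):
        """Return every stored word whose shorthand key starts with `prefix`."""
        node = self.root
        for ch in prefix:
            node = node.children.get(ch)
            if node is None:
                return set()
        found, stack = set(), [node]
        while stack:
            current = stack.pop()
            found |= current.words
            stack.extend(current.children.values())
        return found


def shorthand_variants(word):
    """Hypothetical typing rule: a long vowel may be typed as a short one."""
    variants = {word}
    for long_v, short_v in (("aa", "a"), ("ee", "e"), ("ii", "i"),
                            ("oo", "o"), ("uu", "u")):
        variants |= {v.replace(long_v, short_v) for v in variants}
    return variants


# Illustrative Romanized Sinhala -> Sinhala knowledge base (assumed entries).
KNOWLEDGE_BASE = {"aadaraya": "ආදරය", "aadaaraya": "ආදාරය"}


def build_trie(knowledge_base):
    """Annotate each dictionary word with its shorthand keys and index them."""
    trie = ShorthandTrie()
    for romanized in knowledge_base:
        for key in shorthand_variants(romanized):
            trie.insert(key, romanized)
    return trie


def suggest(trie, knowledge_base, typed):
    """Map typed shorthand to the unique Sinhala words it may stand for."""
    matches = trie.complete(typed.lower())
    return sorted({knowledge_base[w] for w in matches})


trie = build_trie(KNOWLEDGE_BASE)
print(suggest(trie, KNOWLEDGE_BASE, "Adaraya"))
```

In this sketch both dictionary words collapse to the same shorthand key
"adaraya", so the ambiguous input "Adaraya" yields both Sinhala candidates,
mirroring the example given above. A real system would need a much richer
rule set derived from the survey data.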