Word embedding-based Sinhala news documents classification

dc.contributor.author Weerasiri, R.I.
dc.contributor.author Lorensuhewa, S.A.S.
dc.contributor.author Kalyani, M.A.L.
dc.date.accessioned 2022-03-24T04:09:03Z
dc.date.available 2022-03-24T04:09:03Z
dc.date.issued 2022-01-19
dc.identifier.issn 1391-8796
dc.identifier.uri http://ir.lib.ruh.ac.lk/xmlui/handle/iruor/5590
dc.description.abstract The number of news articles grows daily, and a huge number of text documents are added to the Internet, so manual classification of these documents has become impractical. In Sinhala news document classification, Term Frequency-Inverse Document Frequency (TF-IDF) has been the usual word representation, while word embeddings have rarely been used. We compared the performance of Word2Vec, FastText and Doc2Vec against the frequently used TF-IDF as word representations for Sinhala news document classification, and applied machine learning approaches to the best word embedding model identified. We also repeated the experiments for each representation with stop words removed, and investigated the feasibility of using Convolutional Neural Networks (CNN). en_US
dc.language.iso en en_US
dc.publisher Faculty of Science, University of Ruhuna, Matara, Sri Lanka en_US
dc.subject Classification en_US
dc.subject Word embedding en_US
dc.subject FastText en_US
dc.subject Sinhala documents en_US
dc.title Word embedding-based Sinhala news documents classification en_US
dc.type Article en_US
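
The abstract names TF-IDF as the baseline word representation and mentions repeating the experiments with stop words removed. The following is a minimal sketch of that idea, not the authors' implementation: the function name `tfidf_vectors`, the weighting variant (relative term frequency times `log(N / df)`), and the toy documents are all assumptions made for illustration. The tokens are English placeholders standing in for tokenised Sinhala text.

```python
import math
from collections import Counter


def tfidf_vectors(docs, stop_words=frozenset()):
    """Return (vocabulary, vectors): one TF-IDF vector per tokenised document.

    Hypothetical helper for illustration only. Weight = tf * log(N / df),
    where tf is the term's relative frequency within the document.
    """
    # Optionally drop stop words before any counting is done.
    filtered = [[t for t in doc if t not in stop_words] for doc in docs]
    vocab = sorted({t for doc in filtered for t in doc})
    n_docs = len(filtered)
    # Document frequency: in how many documents each term occurs.
    df = Counter(t for doc in filtered for t in set(doc))

    vectors = []
    for doc in filtered:
        counts = Counter(doc)
        total = len(doc) or 1  # guard against documents emptied by filtering
        vectors.append(
            [(counts[t] / total) * math.log(n_docs / df[t]) for t in vocab]
        )
    return vocab, vectors


# Placeholder documents (assumed data, not taken from the paper's corpus).
docs = [
    ["cricket", "team", "wins", "the", "match"],
    ["the", "election", "results", "announced"],
    ["team", "announced", "for", "the", "match"],
]
stop_words = frozenset({"the", "for"})

vocab, vectors = tfidf_vectors(docs, stop_words)
for doc_vector in vectors:
    print([round(w, 3) for w in doc_vector])
```

Each resulting vector could then be passed to any standard classifier. An embedding-based representation would replace these sparse weights with dense vectors, for example by averaging the word vectors of a document.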

