An improved kNN algorithm using K-means and fastText to predict sentiments expressed in Tamil texts

dc.contributor.author Thavareesan, S.
dc.contributor.author Mahesan, S.
dc.date.accessioned 2023-02-24T09:28:05Z
dc.date.available 2023-02-24T09:28:05Z
dc.date.issued 2020-01-22
dc.identifier.issn 1391-8796
dc.identifier.uri http://ir.lib.ruh.ac.lk/xmlui/handle/iruor/11504
dc.description.abstract To develop a suitable approach to Sentiment Analysis of Tamil texts using K-means clustering with a k-Nearest Neighbour (k-NN) classifier, a corpus (UJ_Corpus_Opinions) consisting of 1518 Positive and 1173 Negative comments was constructed. For training, 820 positive and 820 negative comments were used; for testing, 650 positive and 350 negative comments were used. Bag of Words (BoW) and fastText vectors were used to create feature vectors. These feature vectors were clustered using K-means clustering, and the cluster centroids were used as classification keys for the k-NN classifier. Two clustering techniques were used to develop two models: (i) using class-wise information and (ii) with no class-wise information. These two models were also tested using K-Fold, giving four models in all. All four models were tested with both types of feature vectors, using varying numbers of centroids (Kc: 1..10), neighbours (Kn: 1..Kc) and folds (Kf: 1..10) to study their influence on accuracy. Accuracy increased with the value of Kc, and the highest accuracy (74%) was obtained for Kn=1 and Kf=2. In general, accuracy was higher with fastText than with BoW. The model with fastText and class-wise clustering with K-Fold, which obtained 74% accuracy, has an F1-Score of 0.74. en_US
dc.language.iso en en_US
dc.publisher Faculty of Science, University of Ruhuna, Matara, Sri Lanka en_US
dc.subject Sentiment analysis en_US
dc.subject Tamil en_US
dc.subject K-means en_US
dc.subject K-Nearest Neighbour and fastText en_US
dc.title An improved kNN algorithm using K-means and fastText to predict sentiments expressed in Tamil texts en_US
dc.type Article en_US
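
The abstract describes clustering the training vectors with K-means and using the cluster centroids, rather than every training comment, as the keys for k-NN. The following is a minimal sketch of the class-wise variant only, not the authors' implementation. It assumes Euclidean distance and plain Lloyd's K-means, neither of which is stated in the abstract. The function names (kmeans, fit_classwise_centroids, predict) are hypothetical, and the toy 2-D points merely stand in for BoW or fastText vectors.

```python
import math
import random
from collections import Counter


def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's K-means (an assumption); returns k centroids."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        # Assign every point to its nearest centroid.
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            groups[nearest].append(p)
        # Move each centroid to the mean of its group; keep it if the group is empty.
        for i, group in enumerate(groups):
            if group:
                centroids[i] = [sum(col) / len(group) for col in zip(*group)]
    return centroids


def fit_classwise_centroids(vectors, labels, kc):
    """Cluster each class separately; every centroid inherits its class label."""
    keys = []
    for label in sorted(set(labels)):
        members = [v for v, y in zip(vectors, labels) if y == label]
        keys += [(c, label) for c in kmeans(members, kc)]
    return keys


def predict(keys, vector, kn=1):
    """k-NN over the labelled centroids instead of over all training vectors."""
    nearest = sorted(keys, key=lambda key: math.dist(vector, key[0]))[:kn]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


# Toy 2-D stand-ins for comment vectors (hypothetical data, not from the corpus).
train_vectors = [(0.9, 1.0), (1.1, 0.8), (1.0, 1.2), (0.8, 0.9),
                 (-1.0, -0.9), (-0.8, -1.1), (-1.2, -1.0), (-0.9, -0.8)]
train_labels = ["positive"] * 4 + ["negative"] * 4

# Kc = 2 centroids per class, Kn = 1 neighbour at prediction time.
keys = fit_classwise_centroids(train_vectors, train_labels, kc=2)
print(predict(keys, (0.7, 0.9), kn=1))
print(predict(keys, (-1.1, -0.7), kn=1))
```

In this sketch a prediction compares the query against Kc centroids per class instead of against every training vector, which is where a centroid-based k-NN saves work. The variant with no class-wise information would need a separate rule for labelling each centroid, which the abstract does not specify.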

