A study on clustering algorithms in data mining using Weka tool

dc.contributor.author Gunasekara, R.P.T.H.
dc.contributor.author Wijegunasekara, M.C.
dc.contributor.author Dias, N.G.J.
dc.date.accessioned 2023-02-07T04:42:20Z
dc.date.available 2023-02-07T04:42:20Z
dc.date.issued 2015-01-22
dc.identifier.issn 1391-8796
dc.identifier.uri http://ir.lib.ruh.ac.lk/xmlui/handle/iruor/10830
dc.description.abstract This study examines clustering algorithms for data mining using the WEKA machine learning software. The paper discusses four clustering algorithms: k-means, Expectation Maximization (EM), Density Based, and Hierarchical clustering. It studies their performance in terms of the cluster building time of each algorithm and the quality of the clusters built. The experiment is carried out on five datasets using the WEKA interface, with each of the four selected algorithms applied to all five datasets to create clusters. The results show that each of these clustering algorithms has both advantages and disadvantages. The k-means algorithm performed best on large datasets, and its cluster building time was significantly low. The Density Based clustering algorithm was not suitable for data with high variance in density. The Hierarchical clustering algorithm did not support large datasets, and it was also more sensitive to noisy or outlier data. The EM clustering algorithm reported log-likelihood values for the clusters, which helps ensure more reliable clusters. The EM algorithm is an extension of k-means that requires more iterations. Although it is a complex algorithm, it can be parallelized to obtain better performance using cross-validation. This study indicates that choosing the best clustering algorithm requires examining the size of the dataset, its density, and its distribution. The study is being continued for several clustering algorithms, with the aim of increasing performance by using parallel programming methodologies. en_US
dc.language.iso en en_US
dc.publisher Faculty of Science, University of Ruhuna, Matara, Sri Lanka en_US
dc.title A study on clustering algorithms in data mining using Weka tool en_US
dc.type Article en_US
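As an illustration of the k-means algorithm that the abstract reports as the fastest on large datasets, the following is a minimal sketch of Lloyd-style k-means in plain Python. It is not WEKA's own k-means implementation (the `SimpleKMeans` clusterer) and not the authors' code. The function names `kmeans` and `squared_distance`, the random initialisation, the `seed` and `max_iter` parameters, and the toy data are all hypothetical choices made for illustration. The returned within-cluster sum of squared errors (SSE) is one common measure of cluster quality.

```python
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points: List[Point], k: int, max_iter: int = 100, seed: int = 0):
    """Hypothetical Lloyd-style k-means sketch. Returns (centroids, labels, sse)."""
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    rng = random.Random(seed)
    # Initialisation (an assumption): k distinct input points picked at random.
    centroids = [tuple(p) for p in rng.sample(points, k)]
    labels = [-1] * len(points)  # -1 guarantees the first pass is never "converged"
    for _ in range(max_iter):
        # Assignment step: each point joins the cluster of its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its assigned points.
        new_centroids = []
        for j in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == j]
            if members:
                dim = len(members[0])
                new_centroids.append(
                    tuple(sum(m[d] for m in members) / len(members) for d in range(dim))
                )
            else:
                # An empty cluster keeps its previous centroid.
                new_centroids.append(centroids[j])
        converged = new_labels == labels  # stop once no point changes cluster
        labels, centroids = new_labels, new_centroids
        if converged:
            break
    # Within-cluster sum of squared errors: lower means tighter clusters.
    sse = sum(squared_distance(p, centroids[lab]) for p, lab in zip(points, labels))
    return centroids, labels, sse


if __name__ == "__main__":
    # Toy data (hypothetical): two well-separated groups of three points.
    data = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
    centroids, labels, sse = kmeans(data, k=2, seed=1)
    print(centroids, labels, sse)
```

Each iteration computes n·k distances and k means, so its cost grows linearly with the number of points, which is consistent with k-means scaling well to large datasets. Like any Lloyd-style k-means, the result depends on the initial centroids and can settle in a local optimum, so the seed matters on less cleanly separated data.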