Abstract:
Clustering data, by recognizing a subset of representative examples, is used
for processing sensory signals and detecting patterns in data. K means is the
simplest clustering algorithm used in data clustering for such purposes. In
K means approach, the number of expected clusters and their initial
centroids should be provided as inputs. However the accuracy of
convergence of the initial centroids towards the actual centroids de pends on
the level of approximation of the initial centroids provided as inputs.
Inappropriate initial centroids can cause the algorithm to get stuck in a local
optimum rather than converging to the global optimum. This work proposes
a method to overcome t his problem by approximating the initial centroids
using genetic algorithm. In the proposed approach a randomly generated set
of centroids are arranged as a chain (chromosome). A collection of such
chromosomes form the initial population for the genetic al gorithm. This
population is evolved using a proposed fitness function. The final centroids
were observed by changing the percentages of selection, recombination, and
mutation amounts. It was observed that the proposed approach yielded
optimal results under recombination between 45% 65%, mutation between
10% 14%, and number of iterations in between20 30.The available
approaches for improving the accuracy of k means algorithm using genetic
algorithm have limitations such as number of clusters or the dimen sions in
data sets that can be used with them. The proposed algorithm can be applied
to data sets with any number of clusters and can be extended to any
dimension.