Clustering is the process of grouping a set of abstract objects into classes of similar objects.
A cluster of data objects can be treated as one group. In cluster analysis, we first partition the data set into groups based on data similarity and then assign labels to the groups.
The main advantage of clustering over classification is that it is adaptable to changes and helps to single out useful features that distinguish different groups.
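The partition-then-label procedure described above can be sketched with k-means, one common partitioning algorithm. This is a minimal illustration rather than a production implementation. The function name kmeans, its parameters, and the choice of the first k points as initial centroids are assumptions made here for the example; practical implementations usually pick initial centroids randomly or with a scheme such as k-means++.

```python
from math import dist


def kmeans(points, k, iterations=10):
    """Partition points into k groups and return (labels, centroids).

    Assumption for illustration: the first k points are used as the
    initial centroids, which keeps the example deterministic.
    """
    centroids = [tuple(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Partition step: each point joins the group of its nearest centroid.
        labels = [
            min(range(k), key=lambda c: dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a group is empty
                centroids[c] = tuple(
                    sum(coord) / len(members) for coord in zip(*members)
                )
    return labels, centroids


# Two visibly separated groups, interleaved so the first two points
# (the initial centroids) come from different groups.
data = [(1.0, 1.0), (8.0, 8.0), (1.2, 0.8), (8.2, 7.9), (0.9, 1.1), (7.8, 8.1)]
labels, centroids = kmeans(data, k=2)
print(labels)     # group label assigned to each point
print(centroids)  # one representative centre per group
```

The labels returned are arbitrary integers. They identify group membership only, and any meaning attached to a group has to come from inspecting its members afterwards.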
Applications of Cluster Analysis
Requirements of Clustering in Data Mining
The following points describe the requirements that a clustering algorithm should meet in data mining −
Scalability − We need highly scalable clustering algorithms to deal with large databases.
Ability to deal with different kinds of attributes − Algorithms should be applicable to any kind of data, such as interval-based (numerical), categorical, and binary data.
Discovery of clusters with arbitrary shape − The clustering algorithm should be capable of detecting clusters of arbitrary shape. It should not be restricted to distance measures that tend to find only small spherical clusters.
High dimensionality − The clustering algorithm should be able to handle not only low-dimensional data but also high-dimensional spaces.
Ability to deal with noisy data − Databases contain noisy, missing, or erroneous data. Some algorithms are sensitive to such data and may produce poor-quality clusters.
Interpretability − The clustering results should be interpretable, comprehensible, and usable.
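Two of the requirements above, arbitrary cluster shape and tolerance of noisy data, can be illustrated with a density-based approach. The sketch below is a small pure-Python version of DBSCAN, which the text above does not name; it is used here only as an example of an algorithm that is not limited to spherical clusters. The names dbscan, region_query, eps, min_pts, and NOISE are chosen for this illustration, and the quadratic neighbour search is a simplification.

```python
from math import dist

NOISE = -1  # label given to points that belong to no cluster


def region_query(points, i, eps):
    """Return indices of all points within eps of points[i] (itself included)."""
    return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]


def dbscan(points, eps, min_pts):
    """Label each point with a cluster id (0, 1, ...) or NOISE."""
    labels = [None] * len(points)
    cluster_id = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue  # already labelled or marked as noise
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # may be relabelled later as a border point
            continue
        # points[i] is a core point: start a new cluster and grow it.
        labels[i] = cluster_id
        queue = [j for j in neighbors if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster_id  # border point: joins, does not expand
            if labels[j] is not None:
                continue
            labels[j] = cluster_id
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:
                queue.extend(j_neighbors)  # j is also a core point
        cluster_id += 1
    return labels


# An elongated (non-spherical) chain, a small second group, and one outlier.
chain = [(float(x), 0.0) for x in range(10)]
group = [(0.0, 10.0), (1.0, 10.0), (2.0, 10.0)]
outlier = [(50.0, 50.0)]
print(dbscan(chain + group + outlier, eps=1.5, min_pts=3))
```

Because clusters grow by following dense neighbourhoods rather than by measuring distance to a single centre, the long chain stays in one cluster, while the isolated point is reported as noise instead of distorting a cluster.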
Clustering methods can be classified into the following categories −