Techniques

Description

Algorithms’ brief description

Density and Model based

• The cluster center is built on the dense area, a hotspot where most of the data points are plotted

• The cluster is built around the cluster center, and its size can grow until a threshold is reached

• If a new dense area appears, a new cluster is formed

• All density- and model-based clustering algorithms perform poorly when the dense area is wide and forms a non-round hotspot, such as an L shape

DBSCAN [19]

• Hard clustering; requires setting the minimum number of points (together with a neighbourhood radius) that defines a dense region

• Filters out noise; however, if the minimum density value is set too high, useful data may be discarded as noise

• If the density value is set too low, many clusters are formed, which slows down the query process
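A minimal, unoptimized DBSCAN sketch in Python may help show how the two parameters interact. It assumes 2-D points and Euclidean distance. Here `eps` is the neighbourhood radius and `min_pts` is the minimum number of points (counting the point itself) a core point needs. The function names are illustrative and are not taken from any particular library.

```python
import math
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, float]
NOISE = -1


def region_query(points: Sequence[Point], i: int, eps: float) -> List[int]:
    """Indices of all points within distance eps of points[i], including i itself."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]


def dbscan(points: Sequence[Point], eps: float, min_pts: int) -> List[int]:
    """Return a cluster id (0, 1, ...) per point, or NOISE (-1)."""
    labels: List[Optional[int]] = [None] * len(points)  # None = not visited yet
    cluster_id = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbours = region_query(points, i, eps)
        if len(neighbours) < min_pts:
            labels[i] = NOISE  # may later be relabelled as a border point
            continue
        # points[i] is a core point: start a new cluster and grow it
        labels[i] = cluster_id
        seeds = list(neighbours)
        k = 0
        while k < len(seeds):
            j = seeds[k]
            k += 1
            if labels[j] == NOISE:
                labels[j] = cluster_id  # border point: joins, but is not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster_id
            j_neighbours = region_query(points, j, eps)
            if len(j_neighbours) >= min_pts:  # j is also a core point: keep growing
                seeds.extend(j_neighbours)
        cluster_id += 1
    return labels  # every entry is an int by now


pts = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (50, 50)]
print(dbscan(pts, eps=1.5, min_pts=3))  # [0, 0, 0, 0, 1, 1, 1, -1]
print(dbscan(pts, eps=1.5, min_pts=5))  # every point is labelled -1 (noise)
```

With `min_pts=3`, the two tight groups become clusters 0 and 1, and the isolated point is labelled as noise. Raising `min_pts` to 5 turns every point into noise. This is the sensitivity to a too-high density setting described above.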

Expectation Maximization, EM [20]

• Soft clustering based on Gaussian density peaks, where one cluster's area may overlap with another's
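The sketch below runs EM for a one-dimensional Gaussian mixture in plain Python to show what "soft" means in this context. The 1-D setting, the unit starting variances, the fixed iteration count and the function names are all assumptions made for illustration. In the E-step, every point receives a responsibility (probability) for each Gaussian. A point lying between two peaks therefore belongs partly to both.

```python
import math
from typing import List


def gaussian_pdf(x: float, mean: float, var: float) -> float:
    return math.exp(-((x - mean) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)


def em_gmm_1d(data: List[float], init_means: List[float], n_iter: int = 50):
    """Fit a 1-D Gaussian mixture by EM; returns (means, variances, weights, responsibilities)."""
    k = len(init_means)
    means = list(init_means)
    variances = [1.0] * k    # assumed starting spread
    weights = [1.0 / k] * k  # equal mixing weights to start
    resp: List[List[float]] = []
    for _ in range(n_iter):
        # E-step: soft membership r[i][c] = P(component c | x_i); each row sums to 1
        resp = []
        for x in data:
            p = [weights[c] * gaussian_pdf(x, means[c], variances[c]) for c in range(k)]
            total = sum(p) + 1e-300  # guard against underflow to exactly 0
            resp.append([pc / total for pc in p])
        # M-step: re-estimate each Gaussian from the responsibility-weighted data
        for c in range(k):
            nc = sum(r[c] for r in resp) + 1e-12
            weights[c] = nc / len(data)
            means[c] = sum(r[c] * x for r, x in zip(resp, data)) / nc
            var = sum(r[c] * (x - means[c]) ** 2 for r, x in zip(resp, data)) / nc
            variances[c] = max(var, 1e-6)  # floor keeps a component from collapsing
    return means, variances, weights, resp


data = [1.0, 1.2, 0.8, 1.1, 5.0, 5.2, 4.8, 5.1]
means, variances, weights, resp = em_gmm_1d(data, init_means=[0.0, 6.0])
print([round(m, 3) for m in means])  # [1.025, 5.025]
```

Each row of `resp` is a probability vector, not a single label. The overlap between clusters mentioned above shows up as rows where no entry is close to 1.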

Grid based

• A logical grid is built based on the maximum grid size (cluster size)

• Less time is spent on cluster exploration

• However, because the grid is fixed (a hard clustering method), a data point may be placed in the wrong cluster
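A fixed-grid assignment can be sketched as follows. It assumes 2-D points and a square cell size. As a simplification, it treats each sufficiently occupied cell as one cluster. Practical grid methods usually also merge adjacent dense cells. The `min_count` parameter and both function names are hypothetical.

```python
import math
from collections import defaultdict
from typing import Dict, List, Tuple

Point = Tuple[float, float]
Cell = Tuple[int, int]


def cell_of(p: Point, cell_size: float) -> Cell:
    """Fixed grid cell containing p: a constant-time lookup, no distance search."""
    return (math.floor(p[0] / cell_size), math.floor(p[1] / cell_size))


def grid_cluster(points: List[Point], cell_size: float,
                 min_count: int = 1) -> Dict[Cell, List[Point]]:
    """Group points by grid cell; keep cells holding at least min_count points."""
    cells: Dict[Cell, List[Point]] = defaultdict(list)
    for p in points:
        cells[cell_of(p, cell_size)].append(p)  # each point lands in exactly one cell
    return {cell: members for cell, members in cells.items() if len(members) >= min_count}


pts = [(0.5, 0.5), (1.5, 1.5), (9.9, 1.0), (10.1, 1.0)]
print(cell_of((9.9, 1.0), 10.0), cell_of((10.1, 1.0), 10.0))  # (0, 0) (1, 0)
print(len(grid_cluster(pts, 10.0)))  # 2
```

The points (9.9, 1.0) and (10.1, 1.0) are only 0.2 apart, yet they fall into different cells. They therefore end up in different clusters, which is the fixed-grid weakness noted above.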

Hierarchical based

• Clusters can be formed in two ways: agglomerative and divisive

• Agglomerative clustering merges a few data points to form a cluster, then merges clusters to form larger clusters

• Divisive clustering splits a large cluster into smaller sub-clusters

• Agglomerative clustering is complex to build but fast to explore, while divisive clustering is the opposite
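A naive agglomerative (bottom-up) sketch with single linkage follows. It assumes 2-D points and stops at a target number of clusters. The O(n³) pair search is there for clarity, not speed. The divisive direction is not shown. Each merge is final, which matches the irreversibility of distance-linkage methods.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def single_linkage(a: List[Point], b: List[Point]) -> float:
    """Distance between two clusters = distance of their closest pair of members."""
    return min(math.dist(p, q) for p in a for q in b)


def agglomerative(points: List[Point], n_clusters: int) -> List[List[Point]]:
    clusters = [[p] for p in points]  # bottom-up: start with one cluster per point
    while len(clusters) > max(n_clusters, 1):
        best = None  # (distance, i, j) of the closest pair found so far
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = single_linkage(clusters[i], clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]  # merge; this step is never undone
        del clusters[j]  # j > i, so index i is unaffected
    return clusters


pts = [(0, 0), (0, 1), (5, 5), (5, 6), (20, 20)]
print(agglomerative(pts, 3))  # [[(0, 0), (0, 1)], [(5, 5), (5, 6)], [(20, 20)]]
```

Recording the merge sequence instead of stopping at `n_clusters` would yield the full hierarchy (dendrogram).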

HDBSCAN [21]

• While other hierarchical algorithms are built on distance linkage, HDBSCAN is built on density

• In distance-linkage methods, a merge or split is irreversible once it has been performed

• However, compared with other hierarchical techniques, HDBSCAN faces the same performance issues as other density-based techniques
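HDBSCAN's density basis can be illustrated through its central building block, the mutual reachability distance. A single-linkage hierarchy is then built over this distance via a minimum spanning tree. The sketch makes three assumptions:

- 2-D points and Euclidean distance.
- Each point counts as its own first neighbour when computing core distances (conventions differ between implementations).
- Hypothetical function names.

It stops at the tree. The full algorithm's condensing and stable-cluster selection steps are omitted. The all-pairs matrix also hints at one reason density-based methods can be expensive.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def core_distances(points: List[Point], min_samples: int) -> List[float]:
    """Distance to the min_samples-th nearest neighbour, counting the point itself first."""
    return [sorted(math.dist(p, q) for q in points)[min_samples - 1] for p in points]


def mutual_reachability(points: List[Point], min_samples: int) -> List[List[float]]:
    """d_mreach(a, b) = max(core(a), core(b), d(a, b)): sparse points are pushed apart."""
    core = core_distances(points, min_samples)
    n = len(points)
    return [[0.0 if i == j else max(core[i], core[j], math.dist(points[i], points[j]))
             for j in range(n)] for i in range(n)]


def mst_edges(dist: List[List[float]]) -> List[Tuple[float, int, int]]:
    """Prim's minimum spanning tree; edges sorted by weight give the merge order."""
    n = len(dist)
    in_tree = [False] * n
    best = [math.inf] * n
    parent = [-1] * n
    if n:
        best[0] = 0.0
    edges = []
    for _ in range(n):
        u = min((v for v in range(n) if not in_tree[v]), key=lambda v: best[v])
        in_tree[u] = True
        if parent[u] != -1:
            edges.append((best[u], parent[u], u))
        for v in range(n):
            if not in_tree[v] and dist[u][v] < best[v]:
                best[v] = dist[u][v]
                parent[v] = u
    return sorted(edges)


pts = [(0, 0), (0, 1), (1, 0), (10, 0)]
print(mst_edges(mutual_reachability(pts, min_samples=2)))
# [(1.0, 0, 1), (1.0, 0, 2), (9.0, 2, 3)]
```

The isolated point (10, 0) is attached by the heaviest edge. Cutting the hierarchy below that weight therefore separates it from the dense group.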

Partition based

• The centroid, which serves as the cluster center, is placed in the middle of the cluster members

• Initially, centroids are placed randomly; an iterative process then improves their locations

• Iteration is also performed when new data are added to the data matrix

• The cluster border lies midway between the centroids of neighbouring clusters

K-MEANS [22]

• Hard clustering

• The user needs to define K, the number of clusters

• The simplest algorithm, but the value of K must be set accurately
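The partition-based steps listed above (random initial centroids, iterative refinement, nearest-centroid borders) can be sketched as Lloyd-style k-means. The sketch assumes 2-D points and Euclidean distance. The optional `init` argument and the `seed` parameter are hypothetical additions that make runs reproducible.

```python
import math
import random
from typing import List, Optional, Tuple

Point = Tuple[float, float]


def centroid(members: List[Point]) -> Point:
    n = len(members)
    return (sum(p[0] for p in members) / n, sum(p[1] for p in members) / n)


def kmeans(points: List[Point], k: int, init: Optional[List[Point]] = None,
           n_iter: int = 100, seed: int = 0) -> Tuple[List[Point], List[int]]:
    # Initial centroids: given explicitly, or k data points picked at random
    centroids = list(init) if init is not None else random.Random(seed).sample(points, k)
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assignment: each point joins its nearest centroid (hard clustering), so the
        # border between two clusters lies midway between their centroids.
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        # Update: move each centroid to the mean of its members (keep it if empty)
        new = [centroid([p for p, l in zip(points, labels) if l == c]) if c in labels
               else centroids[c] for c in range(k)]
        if new == centroids:  # converged: no centroid moved
            break
        centroids = new
    return centroids, labels


pts = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
print(kmeans(pts, 2, init=[(0, 0), (11, 11)]))
# ([(0.5, 0.5), (10.5, 10.5)], [0, 0, 0, 0, 1, 1, 1, 1])
```

Both the random start (`seed`) and the choice of `k` can change the result. Trying other values is a quick way to see why K needs to be set carefully.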

Fuzzy C Means, FCM [23]

• Soft clustering that assigns cluster membership using a degree of membership

• A complex and resource-intensive algorithm

• Since it allows a data point to belong to more than one cluster, it has difficulty with data that have too many attributes
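A compact fuzzy c-means sketch illustrates the degree-of-membership idea. It assumes 2-D points, Euclidean distance, explicitly supplied starting centroids and the commonly used fuzzifier m = 2. The function names are illustrative. In the hard k-means loop, each centroid is computed only from its own members. Here, every point contributes to every centroid, which is one reason each iteration costs more.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def memberships(p: Point, centroids: List[Point], m: float) -> List[float]:
    """Degree of membership of p in each cluster; the values sum to 1."""
    d = [math.dist(p, c) for c in centroids]
    if 0.0 in d:  # p sits exactly on a centroid: give it full membership there
        hits = d.count(0.0)
        return [1.0 / hits if dj == 0.0 else 0.0 for dj in d]
    return [1.0 / sum((dj / dk) ** (2 / (m - 1)) for dk in d) for dj in d]


def fuzzy_c_means(points: List[Point], init: List[Point], m: float = 2.0,
                  n_iter: int = 100, tol: float = 1e-9):
    centroids = list(init)
    u: List[List[float]] = []
    for _ in range(n_iter):
        u = [memberships(p, centroids, m) for p in points]  # one row per point
        new = []
        for j in range(len(centroids)):
            # every point pulls on every centroid, weighted by membership ** m
            w = [row[j] ** m for row in u]
            total = sum(w)
            new.append((sum(wi * p[0] for wi, p in zip(w, points)) / total,
                        sum(wi * p[1] for wi, p in zip(w, points)) / total))
        shift = max(math.dist(a, b) for a, b in zip(centroids, new))
        centroids = new
        if shift < tol:
            break
    return centroids, u  # u holds the memberships from the last iteration


pts = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
centroids, u = fuzzy_c_means(pts, init=[(0, 0), (11, 11)])
print(memberships((5.5, 5.5), centroids, 2.0))  # about [0.5, 0.5]: equal membership in both
```

A point near one group gets a membership close to 1 for that cluster. A point midway between the groups is split evenly rather than forced into a single cluster.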