Technique | Description | Brief description of algorithms |
Density and model based | The cluster center is placed in a dense area, a hotspot where most of the data points are plotted. The cluster is built around this center and can grow subject to a threshold. A new dense area forms a new cluster. All density- and model-based clustering algorithms show a deficiency when the dense area is wide and forms a non-round hotspot, such as an L shape. | DBSCAN [19]: Hard clustering; a minimum density value must be set. It filters out noise, but if the minimum density value is too high, useful data are treated as noise. If the value is set too low, many clusters are formed, which slows the query process. Expectation Maximization, EM [20]: Soft clustering based on Gaussian density peaks, which can overlap with another cluster's area. |
Grid based | A logical grid is built based on the maximum grid size (cluster size). Cluster exploration takes less time. However, because the grid is fixed, a data point may be placed in the wrong cluster; this is a hard clustering method. | |
Hierarchical based | Clusters are formed in two ways: agglomerative and divisive. Agglomerative clustering merges a few data points into a cluster and then merges a few clusters into a larger cluster. Divisive clustering splits a large cluster into a few sub-clusters. Agglomerative clustering is complex to build but fast to explore, while divisive clustering is the reverse. | HDBSCAN [21]: Whereas other hierarchical algorithms are built on distance linkage, this algorithm is built on density. With the distance linkage method, a merge or split is irreversible once performed. However, compared with the other hierarchical techniques, HDBSCAN faces the same performance issues as the other density-based techniques. |
Partition based | The centroid, which serves as the cluster center, is placed in the middle of the cluster members. Centroids are first placed randomly, and an iterative process then improves their locations. The iteration is also performed when new data are added to the data matrix. The cluster border lies midway between neighboring clusters. | K-MEANS [22]: Hard clustering. The user must define K, the number of clusters. It is the simplest algorithm, but K must be set accurately. Fuzzy C Means, FCM [23]: Soft clustering that assigns cluster membership using a degree of membership. It is a complex and resource-intensive algorithm. Because it allows a data point to belong to more than one cluster, it has difficulty with data that have too many attributes. |
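
The partition-based row above describes centroids that start at random positions and are then improved by iteration. A minimal sketch of that loop is given below. It is an illustration only, not the implementation of [22]: the function name `kmeans`, the parameters `max_iter`, `seed`, and `init`, and the sample points are all assumptions introduced here, and drawing the initial centroids from the data points is one common choice among several.

```python
import math
import random


def kmeans(points, k, max_iter=100, seed=0, init=None):
    """Minimal k-means sketch for a list of equal-length numeric tuples.

    Returns (centroids, labels). `init`, `seed`, and `max_iter` are
    illustrative parameters, not part of any referenced algorithm.
    """
    rng = random.Random(seed)
    # Initial centroids: given explicitly, or drawn at random from the data.
    centroids = list(init) if init is not None else rng.sample(points, k)
    labels = [0] * len(points)

    for _ in range(max_iter):
        # Assignment step: each point joins exactly one cluster, the one
        # with the nearest centroid (hard clustering).
        labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]

        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                mean = tuple(sum(axis) / len(members) for axis in zip(*members))
                new_centroids.append(mean)
            else:
                # An empty cluster keeps its previous centroid.
                new_centroids.append(centroids[j])

        # Stop once the centroids no longer move.
        if new_centroids == centroids:
            break
        centroids = new_centroids

    return centroids, labels


# Hypothetical sample data: two small groups of points.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
centroids, labels = kmeans(points, k=2)
print(centroids)
print(labels)
```

Because the starting centroids are random, different seeds can lead to different final clusterings, which is one reason K has to be chosen with care. Each point receives exactly one label, in contrast to the soft membership used by FCM.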