Hierarchical Clustering

Hierarchical clustering is an unsupervised machine learning algorithm that groups data points into clusters based on their similarity. It builds a hierarchy of clusters, where each cluster can be further divided into smaller ones, and represents the relationships between data points as a dendrogram, a tree-like structure. There is no need to predetermine the number of clusters, it can handle clusters of varying sizes, and it captures hierarchical relationships in the data. This makes hierarchical clustering a flexible way to explore data structure and a valuable tool across many domains.

The dendrogram consists of nodes, which represent data points or clusters, and branches connecting those nodes. The topmost node is called the root, and the bottommost nodes are called leaves. The height of each branch is the distance between the two clusters it merges, so height encodes the order of joining: objects merged at lower heights are more similar. Dendrograms provide a visual summary of the hierarchy, aiding in understanding grouping and similarity patterns, but they do not directly reveal the number of clusters; the shape alone does not indicate the optimal cluster count.
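As a small illustration, SciPy's `linkage` function computes the merge table that underlies a dendrogram; the third column of its output holds the merge heights described above. The toy data here is made up for demonstration:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, dendrogram

# Two well-separated groups of 2-D points (illustrative data).
X = np.array([
    [1.0, 1.0], [1.2, 0.8], [0.9, 1.1],   # group A
    [8.0, 8.0], [8.1, 7.9], [7.8, 8.2],   # group B
])

# Each row of Z records one merge: the two cluster indices,
# the height (distance) at which they merge, and the new cluster size.
Z = linkage(X, method="single")

# Heights are non-decreasing: the most similar points join first,
# and the two groups only merge at the top, at a large height.
print(Z[:, 2])

# dendrogram(Z) would draw the tree; here we only inspect the heights.
```

Plotting `dendrogram(Z)` with matplotlib would show the three points of each group joining at low heights and a single tall branch connecting the two groups.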

There are two approaches for hierarchical clustering:

Bottom-Up: Also called the agglomerative approach, it starts with each data point as its own cluster and iteratively merges the two closest clusters until only one cluster remains. The result is a dendrogram showing the merging process.
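As a concrete sketch of the agglomerative approach, scikit-learn's `AgglomerativeClustering` merges clusters bottom-up and cuts the hierarchy at a requested number of clusters (the data below is illustrative):

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering

# Illustrative data: two clearly separated groups.
X = np.array([
    [1.0, 1.0], [1.1, 0.9], [0.9, 1.2],
    [8.0, 8.0], [8.2, 7.9], [7.9, 8.1],
])

# Cut the hierarchy at two clusters; "ward" merges the pair of
# clusters whose union gives the smallest increase in variance.
model = AgglomerativeClustering(n_clusters=2, linkage="ward")
labels = model.fit_predict(X)

# The first three points share one label, the last three the other.
print(labels)
```

Choosing `n_clusters` here plays the role of cutting the dendrogram at a given height; other linkage criteria ("single", "complete", "average") change how inter-cluster distance is measured.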

Top-Down: Also called the divisive approach, it starts with all data points in a single cluster and recursively splits clusters into smaller ones until each data point forms its own cluster.
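Common libraries do not ship a ready-made divisive algorithm, but the top-down idea can be sketched by recursively bisecting the data, here with 2-means as the splitting rule (a bisecting k-means sketch under assumed toy data, not a library API):

```python
import numpy as np
from sklearn.cluster import KMeans

def divisive_split(X, depth=0, max_depth=2):
    """Recursively bisect X with 2-means until max_depth is reached
    or a cluster is too small to split. A simple top-down sketch."""
    if depth == max_depth or len(X) < 2:
        return [X]
    halves = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
    left, right = X[halves == 0], X[halves == 1]
    return (divisive_split(left, depth + 1, max_depth)
            + divisive_split(right, depth + 1, max_depth))

# Illustrative data: four well-separated pairs of points.
X = np.array([
    [1.0, 1.0], [1.2, 0.9],
    [8.0, 8.0], [8.1, 7.9],
    [15.0, 1.0], [15.2, 1.1],
    [1.0, 15.0], [0.9, 15.1],
])

clusters = divisive_split(X, max_depth=2)
print(len(clusters))
```

Splitting until every point stands alone would reproduce a full top-down hierarchy; stopping earlier, as here, yields a flat clustering at a chosen depth.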
