Hierarchical Clustering

Hierarchical clustering builds a tree-based hierarchy of clusters, which is often visualized with a diagram called a dendrogram.

Types of Hierarchical Clustering

The Agglomerative Clustering Algorithm

The bottom-up agglomerative process can be summarized with the following steps:

  1. Begin by assigning each sample x_i to its own cluster, c_i.
  2. Compute the proximity (distance) between all pairs of clusters.
  3. Find the pair of clusters with the highest similarity (or lowest distance).
  4. Merge this pair of clusters into a single new cluster.
  5. Update the proximity matrix to reflect the distances between this new cluster and all other existing clusters.
  6. Repeat steps 3-5 until only one cluster remains.
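
The steps above can be sketched in a few lines of plain Python. This is a minimal, unoptimized illustration rather than a reference implementation. It assumes Euclidean distance between samples and single linkage (the distance between two clusters is the distance between their closest members); neither choice is prescribed by the steps themselves. The names agglomerate and single_linkage and the sample points are made up for this example.

```python
from itertools import combinations
from math import dist


def single_linkage(a, b, points):
    """Assumed linkage: distance between the closest members of clusters a and b."""
    return min(dist(points[i], points[j]) for i in a for j in b)


def agglomerate(points):
    """Return the merge history as (cluster_a, cluster_b, distance) tuples."""
    # Step 1: every sample starts in its own cluster (a tuple of sample indices).
    clusters = [(i,) for i in range(len(points))]
    merges = []
    # Step 6: keep merging until only one cluster remains.
    while len(clusters) > 1:
        # Steps 2-3: compare all pairs of clusters and pick the closest pair.
        a, b = min(
            combinations(clusters, 2),
            key=lambda pair: single_linkage(pair[0], pair[1], points),
        )
        d = single_linkage(a, b, points)
        # Step 4: merge the pair into a single new cluster.
        clusters.remove(a)
        clusters.remove(b)
        clusters.append(a + b)
        # Step 5: no explicit proximity matrix is kept in this sketch;
        # distances to the new cluster are recomputed on the next pass.
        merges.append((a, b, d))
    return merges


# Toy 2-D samples, chosen only for illustration.
points = [(0.0, 0.0), (0.0, 1.0), (5.0, 0.0), (5.0, 1.5), (11.0, 0.0)]
merges = agglomerate(points)
for a, b, d in merges:
    print(a, b, round(d, 3))
```

Recomputing every pairwise distance on each pass keeps the code short but is slow for large datasets; a practical implementation would store the proximity matrix and update only the rows affected by the merge, as step 5 describes.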

The Dendrogram

The output of a hierarchical clustering algorithm is typically visualized as a dendrogram. This tree diagram records the order in which clusters were progressively merged.

[Figure: example dendrogram. Image taken from Pattern Classification.]

The vertical axis of the dendrogram represents the cluster distance or dissimilarity. This is the value of the linkage criterion at which each merge occurred. By drawing a horizontal line across the dendrogram, one can "cut" the tree to obtain a specific number of clusters. A cut at a lower distance threshold results in a larger number of smaller clusters, reflecting a finer granularity. A cut at a higher distance results in fewer, larger clusters.
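
Cutting the tree can be illustrated with a short Python sketch. It assumes the merge history is available as a list of (cluster_a, cluster_b, merge distance) tuples with non-decreasing distances; that format, the function name cut, and the toy merge history below are all hypothetical and chosen only for this example. Every merge at or below the threshold is applied, and every merge above it is ignored.

```python
def cut(merges, n_samples, threshold):
    """Return the clusters obtained by cutting the tree at the given distance."""
    # Each sample starts as its own group; parent[i] points toward the group root.
    parent = list(range(n_samples))

    def find(i):
        # Follow parent links until reaching the root of the group.
        while parent[i] != i:
            i = parent[i]
        return i

    for a, b, height in merges:
        # Only merges that happened at or below the cut line are kept.
        if height <= threshold:
            parent[find(b[0])] = find(a[0])

    groups = {}
    for i in range(n_samples):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


# Hypothetical merge history for five samples (indices 0 to 4).
merges = [
    ((0,), (1,), 1.0),
    ((2,), (3,), 1.5),
    ((0, 1), (2, 3), 5.0),
    ((4,), (0, 1, 2, 3), 6.0),
]

low = cut(merges, 5, threshold=2.0)
high = cut(merges, 5, threshold=5.5)
print(low)   # [[0, 1], [2, 3], [4]]
print(high)  # [[0, 1, 2, 3], [4]]
```

The lower threshold leaves three small clusters, while the higher one leaves two larger clusters, matching the finer and coarser granularity described above.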

Measuring Distance Between Clusters (Linkage Criteria)
