
# Spectral clustering


Dec 15, 2021

In multivariate statistics, spectral clustering techniques make use of the spectrum (eigenvalues) of the similarity matrix of the data to perform dimensionality reduction before clustering in fewer dimensions. The similarity matrix is provided as an input and consists of a quantitative assessment of the relative similarity of each pair of points in the dataset.

In application to image segmentation, spectral clustering is known as segmentation-based object categorization.

## Spectral clustering

Given an enumerated set of data points, the similarity matrix may be defined as a symmetric matrix $A$, where $A_{ij} \geq 0$ represents a measure of the similarity between data points with indices $i$ and $j$. The general approach to spectral clustering is to use a standard clustering method (there are many such methods; k-means is discussed below) on relevant eigenvectors of a Laplacian matrix of $A$. There are many different ways to define a Laplacian, each with a different mathematical interpretation, and so the clustering will also have different interpretations. The relevant eigenvectors are the ones corresponding to the several smallest eigenvalues of the Laplacian, except for the smallest eigenvalue, which will be 0. For computational efficiency, these eigenvectors are often computed instead as the eigenvectors corresponding to the several largest eigenvalues of a function of the Laplacian.
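A minimal sketch of this pipeline, assuming a Gaussian similarity kernel on toy 1-D data and a median split of the Fiedler vector in place of a full k-means step:

```python
import numpy as np

# Toy data: two well-separated 1-D groups (an assumed example).
points = np.array([0.0, 0.1, 0.2, 5.0, 5.1, 5.2])

# Gaussian (RBF) similarity matrix; sigma is a free parameter.
sigma = 1.0
diff = points[:, None] - points[None, :]
A = np.exp(-diff**2 / (2 * sigma**2))

# Unnormalized graph Laplacian L = D - A.
D = np.diag(A.sum(axis=1))
L = D - A

# Eigen-decomposition; eigh returns eigenvalues in ascending order.
eigvals, eigvecs = np.linalg.eigh(L)

# The smallest eigenvalue is ~0 (the trivial constant mode); the next
# eigenvector (the Fiedler vector) separates the two groups.  A median
# split stands in for k-means here for brevity.
fiedler = eigvecs[:, 1]
labels = (fiedler > np.median(fiedler)).astype(int)
```

In practice the second step would cluster several of the smallest eigenvectors at once (e.g. with k-means) rather than thresholding a single one.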

Spectral clustering is well known to relate to partitioning of a mass-spring system, where each mass is associated with a data point and each spring stiffness corresponds to a weight of an edge describing a similarity of the two related data points. Specifically, the classical reference [1] explains that the eigenvalue problem describing transversal vibration modes of a mass-spring system is exactly the same as the eigenvalue problem for the graph Laplacian matrix defined as

$$L := D - A,$$

where $D$ is the diagonal matrix with $D_{ii} = \sum_{j} A_{ij}.$

The masses that are tightly connected by the springs in the mass-spring system evidently move together from the equilibrium position in low-frequency vibration modes, so that the components of the eigenvectors corresponding to the smallest eigenvalues of the graph Laplacian can be used for meaningful clustering of the masses.
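A small check of the definitions above, using an assumed toy similarity matrix: every row of $L = D - A$ sums to zero, so the constant vector is an eigenvector with eigenvalue 0, which is exactly the trivial mode excluded from clustering.

```python
import numpy as np

# Assumed toy symmetric similarity matrix.
A = np.array([[0.0, 0.9, 0.1],
              [0.9, 0.0, 0.2],
              [0.1, 0.2, 0.0]])

# Degree matrix D_ii = sum_j A_ij, and the graph Laplacian L = D - A.
D = np.diag(A.sum(axis=1))
L = D - A

# Each row of L sums to zero, so L annihilates the constant vector:
# the smallest eigenvalue is 0 with the all-ones eigenvector.
ones = np.ones(3)
residual = L @ ones
```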

A popular related spectral clustering technique is the normalized cuts algorithm or Shi–Malik algorithm, introduced by Jianbo Shi and Jitendra Malik,[2] commonly used for image segmentation. It partitions points into two sets $(B_1, B_2)$ based on the eigenvector $v$ corresponding to the second-smallest eigenvalue of the symmetric normalized Laplacian defined as

$$L^{\text{norm}} := I - D^{-1/2} A D^{-1/2}.$$

A mathematically equivalent algorithm [3] takes the eigenvector corresponding to the largest eigenvalue of the random-walk normalized adjacency matrix $P = D^{-1} A$.
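The equivalence rests on the fact that $P$ is similar to $D^{-1/2} A D^{-1/2}$, so the eigenvalues of $L^{\text{norm}}$ are exactly $1 - \lambda$ for each eigenvalue $\lambda$ of $P$. A numerical check with an assumed toy matrix:

```python
import numpy as np

# Assumed toy symmetric similarity matrix.
A = np.array([[0.0, 0.8, 0.2, 0.0],
              [0.8, 0.0, 0.1, 0.1],
              [0.2, 0.1, 0.0, 0.7],
              [0.0, 0.1, 0.7, 0.0]])

d = A.sum(axis=1)
D_inv_sqrt = np.diag(1.0 / np.sqrt(d))

# Symmetric normalized Laplacian: L_norm = I - D^{-1/2} A D^{-1/2}.
L_norm = np.eye(4) - D_inv_sqrt @ A @ D_inv_sqrt

# Random-walk normalized adjacency matrix: P = D^{-1} A.
P = np.diag(1.0 / d) @ A

# Spectra ascending for L_norm, descending for P; they satisfy mu = 1 - lam.
mu = np.sort(np.linalg.eigvalsh(L_norm))
lam = np.sort(np.linalg.eigvals(P).real)[::-1]
```

In particular, $P$ is row-stochastic, so its largest eigenvalue is 1, matching the zero eigenvalue of $L^{\text{norm}}$.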

Knowing the eigenvectors, partitioning may be done in various ways, such as by computing the median $m$ of the components of the second-smallest eigenvector $v$, and placing all points whose component in $v$ is greater than $m$ in $B_1$, and the rest in $B_2$. The algorithm can be used for hierarchical clustering by repeatedly partitioning the subsets in this fashion.
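The median split itself is a one-liner; here with assumed, hypothetical Fiedler-vector components for six points:

```python
import numpy as np

# Hypothetical components of the second-smallest eigenvector v.
v = np.array([-0.5, -0.4, -0.45, 0.4, 0.45, 0.5])

# Median m of the components of v.
m = np.median(v)

# B1: indices with component greater than m; B2: the rest.
B1 = np.where(v > m)[0]
B2 = np.where(v <= m)[0]
```

For hierarchical clustering, the same computation would be repeated on the similarity submatrix restricted to `B1`, then to `B2`, and so on.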