
# Spectral clustering


Dec 15, 2021

In multivariate statistics, spectral clustering techniques make use of the spectrum (eigenvalues) of the similarity matrix of the data to perform dimensionality reduction before clustering in fewer dimensions. The similarity matrix is provided as an input and consists of a quantitative assessment of the relative similarity of each pair of points in the dataset.

In application to image segmentation, spectral clustering is known as segmentation-based object categorization.

## Spectral clustering

Given an enumerated set of data points, the similarity matrix may be defined as a symmetric matrix $A$, where $A_{ij} \geq 0$ represents a measure of the similarity between data points with indices $i$ and $j$. The general approach to spectral clustering is to use a standard clustering method (there are many such methods; k-means is discussed below) on relevant eigenvectors of a Laplacian matrix of $A$. There are many different ways to define a Laplacian, each with a different mathematical interpretation, and so the clustering will also have different interpretations. The relevant eigenvectors are the ones corresponding to the several smallest eigenvalues of the Laplacian, except for the smallest eigenvalue, which will be 0. For computational efficiency, these eigenvectors are often computed instead as the eigenvectors corresponding to the several largest eigenvalues of a function of the Laplacian.
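A minimal sketch of this pipeline, assuming a Gaussian similarity kernel on toy 1-D data and a median split of the Fiedler vector in place of a full k-means step:

```python
import numpy as np

# Toy data: two well-separated 1-D groups (an assumed example).
points = np.array([0.0, 0.1, 0.2, 5.0, 5.1, 5.2])

# Gaussian (RBF) similarity matrix; sigma is a free parameter.
sigma = 1.0
diff = points[:, None] - points[None, :]
A = np.exp(-diff**2 / (2 * sigma**2))

# Unnormalized graph Laplacian L = D - A.
D = np.diag(A.sum(axis=1))
L = D - A

# Eigen-decomposition; eigh returns eigenvalues in ascending order.
eigvals, eigvecs = np.linalg.eigh(L)

# The smallest eigenvalue is ~0 (the trivial constant mode); the next
# eigenvector (the Fiedler vector) separates the two groups.  A median
# split stands in for k-means here for brevity.
fiedler = eigvecs[:, 1]
labels = (fiedler > np.median(fiedler)).astype(int)
```

In practice the second step would cluster several of the smallest eigenvectors at once (e.g. with k-means) rather than thresholding a single one.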

Spectral clustering is well known to relate to partitioning of a mass-spring system, where each mass is associated with a data point and each spring stiffness corresponds to a weight of an edge describing a similarity of the two related data points. Specifically, the classical reference [1] explains that the eigenvalue problem describing transversal vibration modes of a mass-spring system is exactly the same as the eigenvalue problem for the graph Laplacian matrix defined as

$$L := D - A,$$

where $D$ is the diagonal matrix with $D_{ii} = \sum_{j} A_{ij}.$

The masses that are tightly connected by the springs in the mass-spring system evidently move together from the equilibrium position in low-frequency vibration modes, so that the components of the eigenvectors corresponding to the smallest eigenvalues of the graph Laplacian can be used for meaningful clustering of the masses.
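A small check of the definitions above, using an assumed toy similarity matrix: every row of $L = D - A$ sums to zero, so the constant vector is an eigenvector with eigenvalue 0, which is exactly the trivial mode excluded from clustering.

```python
import numpy as np

# Assumed toy symmetric similarity matrix.
A = np.array([[0.0, 0.9, 0.1],
              [0.9, 0.0, 0.2],
              [0.1, 0.2, 0.0]])

# Degree matrix D_ii = sum_j A_ij, and the graph Laplacian L = D - A.
D = np.diag(A.sum(axis=1))
L = D - A

# Each row of L sums to zero, so L annihilates the constant vector:
# the smallest eigenvalue is 0 with the all-ones eigenvector.
ones = np.ones(3)
residual = L @ ones
```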

A popular related spectral clustering technique is the normalized cuts algorithm or Shi–Malik algorithm, introduced by Jianbo Shi and Jitendra Malik,[2] commonly used for image segmentation. It partitions points into two sets $(B_1, B_2)$ based on the eigenvector $v$ corresponding to the second-smallest eigenvalue of the symmetric normalized Laplacian defined as

$$L^{\text{norm}} := I - D^{-1/2} A D^{-1/2}.$$

A mathematically equivalent algorithm [3] takes the eigenvector corresponding to the largest eigenvalue of the random-walk normalized adjacency matrix $P = D^{-1} A$.
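The equivalence rests on the fact that $P$ is similar to $D^{-1/2} A D^{-1/2}$, so the eigenvalues of $L^{\text{norm}}$ are exactly $1 - \lambda$ for each eigenvalue $\lambda$ of $P$. A numerical check with an assumed toy matrix:

```python
import numpy as np

# Assumed toy symmetric similarity matrix.
A = np.array([[0.0, 0.8, 0.2, 0.0],
              [0.8, 0.0, 0.1, 0.1],
              [0.2, 0.1, 0.0, 0.7],
              [0.0, 0.1, 0.7, 0.0]])

d = A.sum(axis=1)
D_inv_sqrt = np.diag(1.0 / np.sqrt(d))

# Symmetric normalized Laplacian: L_norm = I - D^{-1/2} A D^{-1/2}.
L_norm = np.eye(4) - D_inv_sqrt @ A @ D_inv_sqrt

# Random-walk normalized adjacency matrix: P = D^{-1} A.
P = np.diag(1.0 / d) @ A

# Spectra ascending for L_norm, descending for P; they satisfy mu = 1 - lam.
mu = np.sort(np.linalg.eigvalsh(L_norm))
lam = np.sort(np.linalg.eigvals(P).real)[::-1]
```

In particular, $P$ is row-stochastic, so its largest eigenvalue is 1, matching the zero eigenvalue of $L^{\text{norm}}$.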

Knowing the eigenvectors, partitioning may be done in various ways, such as by computing the median $m$ of the components of the second-smallest eigenvector $v$, and placing all points whose component in $v$ is greater than $m$ in $B_1$, and the rest in $B_2$. The algorithm can be used for hierarchical clustering by repeatedly partitioning the subsets in this fashion.
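The median split itself is a one-liner; here with assumed, hypothetical Fiedler-vector components for six points:

```python
import numpy as np

# Hypothetical components of the second-smallest eigenvector v.
v = np.array([-0.5, -0.4, -0.45, 0.4, 0.45, 0.5])

# Median m of the components of v.
m = np.median(v)

# B1: indices with component greater than m; B2: the rest.
B1 = np.where(v > m)[0]
B2 = np.where(v <= m)[0]
```

For hierarchical clustering, the same computation would be repeated on the similarity submatrix restricted to `B1`, then to `B2`, and so on.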