zl程序教程

您现在的位置是:首页 >  硬件

当前栏目

机器学习笔记之谱聚类(三)模型的矩阵形式转化

机器笔记学习 模型 矩阵 转化 聚类 形式
2023-09-11 14:15:53 时间

引言

上一节针对 k-Means \text{k-Means} k-Means算法的缺陷,介绍了谱聚类( Spectral Clustering \text{Spectral Clustering} Spectral Clustering)的结构描述以及目标函数。本节将目标函数转化成矩阵形式,并引出拉普拉斯矩阵

回顾:谱聚类——目标函数表示

谱聚类关于处理聚类任务的朴素思想是:通过对样本的合适划分,使样本的划分代价最小。假设一个样本集合 X = { x ( 1 ) , x ( 2 ) , ⋯   , x ( N ) } \mathcal X = \{x^{(1)},x^{(2)},\cdots,x^{(N)}\} X={x(1),x(2),,x(N)},它对应的图结构描述如下:
G = { V , E } \mathcal G = \{\mathcal V,\mathcal E\} G={V,E}
其中 V \mathcal V V表示结点集合,结点数量与样本数量相同;而边集合 E \mathcal E E可视作一个权重矩阵,矩阵中的元素表示结点之间的关联关系
V = { v ( 1 ) , v ( 2 ) , ⋯   , v ( N ) } E ⇒ W = [ W v ( i ) ⇔ v ( j ) ] N × N \begin{aligned} \mathcal V & = \{v^{(1)},v^{(2)},\cdots,v^{(N)}\} \\ \mathcal E & \Rightarrow \mathcal W = [\mathcal W_{v^{(i)} \Leftrightarrow v^{(j)}}]_{N \times N} \end{aligned} VE={v(1),v(2),,v(N)}W=[Wv(i)v(j)]N×N
结点 v ( i ) , v ( j ) v^{(i)},v^{(j)} v(i),v(j)之间的关联关系 W v ( i ) ⇔ v ( j ) \mathcal W_{v^{(i)} \Leftrightarrow v^{(j)}} Wv(i)v(j)通常使用样本之间的核函数( Kernel Function \text{Kernel Function} Kernel Function)表示。这里以高斯核函数为例:
W v ( i ) ⇔ v ( j ) = κ ( x ( i ) , x ( j ) ) = exp ⁡ { − ∣ ∣ x ( i ) − x ( j ) ∣ ∣ 2 2 2 σ 2 } assert ⟨ v ( i ) , v ( j ) ⟩ ∈ E \mathcal W_{v^{(i)} \Leftrightarrow v^{(j)}} = \kappa(x^{(i)},x^{(j)}) = \exp \left\{- \frac{||x^{(i)} - x^{(j)}||_2^2}{2\sigma^2}\right\} \quad \text{assert} \left\langle v^{(i)},v^{(j)}\right\rangle \in \mathcal E Wv(i)v(j)=κ(x(i),x(j))=exp{2σ2∣∣x(i)x(j)22}assertv(i),v(j)E
假设使用规则将图 G \mathcal G G划分成 K \mathcal K K子图,各子图的结点集合分别表示为: { A 1 , A 2 , ⋯   , A K } \{\mathcal A_1,\mathcal A_2,\cdots,\mathcal A_{\mathcal K}\} {A1,A2,,AK},并且有:
{ V = A 1 ∪ A 2 ∪ ⋯ ∪ A K A i ∩ A j = ϕ ∀ i , j ∈ { 1 , 2 , ⋯   , K } \begin{cases} \mathcal V = \mathcal A_1 \cup \mathcal A_2 \cup \cdots \cup \mathcal A_{\mathcal K} \\ \mathcal A_i \cap \mathcal A_j = \phi \quad \forall i,j \in \{1,2,\cdots,\mathcal K\} \end{cases} {V=A1A2AKAiAj=ϕi,j{1,2,,K}
至此,在执行划分后关于结点集合 V \mathcal V V代价函数 Cut ( V ) \text{Cut}(\mathcal V) Cut(V)可表示为:
其中 A k ˉ \bar {\mathcal A_k} Akˉ表示 A k \mathcal A_k Ak的补集。
Cut ( V ) = Cut ( A 1 , A 2 , ⋯   , A K ) = ∑ k = 1 K ψ ( A k , A k ˉ ) = ∑ k = 1 K ∑ j ≠ k K ψ ( A k , A j ) \begin{aligned} \text{Cut}(\mathcal V) & = \text{Cut}(\mathcal A_1,\mathcal A_2,\cdots,\mathcal A_{\mathcal K}) \\ & = \sum_{k=1}^{\mathcal K} \psi(\mathcal A_k,\bar {\mathcal A_k}) \\ & = \sum_{k=1}^{\mathcal K} \sum_{j \neq k}^{\mathcal K} \psi(\mathcal A_k,\mathcal A_j) \end{aligned} Cut(V)=Cut(A1,A2,,AK)=k=1Kψ(Ak,Akˉ)=k=1Kj=kKψ(Ak,Aj)
最后对 ψ ( A k , A j ) \psi(\mathcal A_k,\mathcal A_j) ψ(Ak,Aj)进行标准化后,作为图划分目标函数
{ Normalized-Cut ( V ) = ∑ k = 1 K ψ ( A k , A k ˉ ) degree ( A k ) degree ( A k ) = ∑ v ( i ) ∈ A k ∑ j = 1 N W v ( i ) ⇔ v ( j ) \begin{cases} \text{Normalized-Cut}(\mathcal V) = \sum_{k=1}^{\mathcal K} \frac{\psi(\mathcal A_k,\bar {\mathcal A_k})}{\text{degree}(\mathcal A_k)} \\ \text{degree}(\mathcal A_k) = \sum_{v^{(i)} \in \mathcal A_k} \sum_{j=1}^{N} \mathcal W_{v^{(i)} \Leftrightarrow v^{(j)}} \end{cases} {Normalized-Cut(V)=k=1Kdegree(Ak)ψ(Ak,Akˉ)degree(Ak)=v(i)Akj=1NWv(i)v(j)
最终将切分的选择问题转化为数学中的优化问题
min ⁡ { A k } k = 1 K Normalized-Cut ( V ) \mathop{\min}\limits_{\{\mathcal A_k\}_{k=1}^{\mathcal K}} \text{Normalized-Cut}(\mathcal V) {Ak}k=1KminNormalized-Cut(V)

引入指示向量

为了能够更方便地对优化问题进行求解,通常将优化函数的连加形式转化为矩阵的乘法形式。首先观察 { A k } k = 1 K \{\mathcal A_k\}_{k=1}^{\mathcal K} {Ak}k=1K,它表示结点子集的集合
{ A k } k = 1 K ⇒ { A 1 , A 2 , ⋯   , A K } \{\mathcal A_k\}_{k=1}^{\mathcal K} \Rightarrow \{\mathcal A_1,\mathcal A_2,\cdots,\mathcal A_{\mathcal K}\} {Ak}k=1K{A1,A2,,AK}
将这个 { A k } k = 1 K \{\mathcal A_k\}_{k=1}^{\mathcal K} {Ak}k=1K描述为指示向量( Indicator Vector \text{Indicator Vector} Indicator Vector)的形式:

  • 所谓‘指示向量’ y ( i ) ( i = 1 , 2 , ⋯   , N ) y^{(i)}(i=1,2,\cdots,N) y(i)(i=1,2,,N),就是描述结点 v ( i ) v^{(i)} v(i)位于结点子集 A k ( k = 1 , 2 , ⋯   , K ) \mathcal A_k(k=1,2,\cdots,\mathcal K) Ak(k=1,2,,K)的信息。使用 one-hot \text{one-hot} one-hot向量进行表达。
  • 例如某结点 v ( i ) v^{(i)} v(i)对应的指示向量 y ( i ) y^{(i)} y(i)结果为 ( 1 , 0 , ⋯   , 0 ) ⏟ K 个元素 \underbrace{(1,0,\cdots,0)}_{\mathcal K个元素} K个元素 (1,0,,0),那么该向量表示为 v ( i ) v^{(i)} v(i)属于 A 1 \mathcal A_1 A1结点子集。随着划分结点子集数量的变化, one-hot \text{one-hot} one-hot向量的长度也随之发生变化。
    { y ( i ) ∈ { 0 , 1 } K ∑ k = 1 K y v ( i ) ∈ A k = 1 \begin{cases} y^{(i)} \in \{0,1\}^{\mathcal K} \\ \sum_{k=1}^{\mathcal K} y_{v^{(i)} \in \mathcal A_{k}} = 1 \end{cases} {y(i){0,1}Kk=1Kyv(i)Ak=1

至此,使用一个 K \mathcal K K维的 one-hot \text{one-hot} one-hot向量表示样本 x ( i ) x^{(i)} x(i)对应结点的归属信息,对于整个样本集合 X \mathcal X X,对应的归属信息 Y \mathcal Y Y可表示为:
Y = ( y ( 1 ) , y ( 2 ) , ⋯   , y ( N ) ) T = [ y 1 ( 1 ) , y 2 ( 1 ) , ⋯   , y K ( 1 ) y 1 ( 2 ) , y 2 ( 2 ) , ⋯   , y K ( 2 ) ⋮ y 1 ( N ) , y 2 ( N ) , ⋯   , y K ( N ) ] N × K \begin{aligned} \mathcal Y & = (y^{(1)},y^{(2)},\cdots,y^{(N)})^T \\ & = \begin{bmatrix} y_1^{(1)},y_2^{(1)},\cdots,y_{\mathcal K}^{(1)} \\ y_1^{(2)},y_2^{(2)},\cdots,y_{\mathcal K}^{(2)} \\ \vdots \\ y_1^{(N)},y_2^{(N)},\cdots,y_{\mathcal K}^{(N)} \end{bmatrix}_{N \times \mathcal K} \end{aligned} Y=(y(1),y(2),,y(N))T= y1(1),y2(1),,yK(1)y1(2),y2(2),,yK(2)y1(N),y2(N),,yK(N) N×K
最终关于优化问题可转化为如下形式:
Y ^ = arg ⁡ min ⁡ Y Normalized-Cut ( V ) \hat {\mathcal Y} = \mathop{\arg\min}\limits_{\mathcal Y} \text{Normalized-Cut}(\mathcal V) Y^=YargminNormalized-Cut(V)

优化问题的化简过程

小插曲:观察 Normalize-Cut ( V ) \text{Normalize-Cut}(\mathcal V) Normalize-Cut(V)

对于 Normalize-Cut ( V ) \text{Normalize-Cut}(\mathcal V) Normalize-Cut(V)
Normalize-Cut ( V ) = ∑ k = 1 K ψ ( A k , A k ˉ ) degree ( A k ) = ∑ k = 1 K ∑ m ≠ k ∑ v ( i ) ∈ A k ∑ v ( j ) ∈ A m W v ( i ) ⇔ v ( j ) ∑ v ( i ) ∈ A k ∑ j = 1 N W v ( i ) ⇔ v ( j ) = ∑ k = 1 K ∑ v ( i ) ∈ A k ∑ m ≠ k ∑ v ( j ) ∈ A m W v ( i ) ⇔ v ( j ) ∑ j = 1 N W v ( i ) ⇔ v ( j ) \begin{aligned} \text{Normalize-Cut}(\mathcal V) & = \sum_{k=1}^{\mathcal K} \frac{\psi(\mathcal A_k,\bar {\mathcal A_k})}{\text{degree}(\mathcal A_k)} \\ & = \sum_{k=1}^{\mathcal K} \frac{\sum_{m \neq k} \sum_{v^{(i)} \in \mathcal A_k}\sum_{v^{(j)} \in\mathcal A_m} \mathcal W_{v^{(i)}\Leftrightarrow v^{(j)}}}{\sum_{v^{(i)} \in \mathcal A_k} \sum_{j=1}^{N} \mathcal W_{v^{(i)} \Leftrightarrow v^{(j)}}} \\ & = \sum_{k=1}^{\mathcal K} \sum_{v^{(i)} \in \mathcal A_k} \frac{\sum_{m \neq k}\sum_{v^{(j)} \in \mathcal A_m} \mathcal W_{v^{(i)} \Leftrightarrow v^{(j)}}}{\sum_{j=1}^N \mathcal W_{v^{(i)} \Leftrightarrow v^{(j)}}} \end{aligned} Normalize-Cut(V)=k=1Kdegree(Ak)ψ(Ak,Akˉ)=k=1Kv(i)Akj=1NWv(i)v(j)m=kv(i)Akv(j)AmWv(i)v(j)=k=1Kv(i)Akj=1NWv(i)v(j)m=kv(j)AmWv(i)v(j)
观察展开项,可以发现 ∑ m ≠ k ∑ v ( j ) ∈ A m W v ( i ) ⇔ v ( j ) ∑ j = 1 N W v ( i ) ⇔ v ( j ) ≤ 1 \frac{\sum_{m \neq k}\sum_{v^{(j)} \in \mathcal A_m} \mathcal W_{v^{(i)} \Leftrightarrow v^{(j)}}}{\sum_{j=1}^N \mathcal W_{v^{(i)} \Leftrightarrow v^{(j)}}} \leq 1 j=1NWv(i)v(j)m=kv(j)AmWv(i)v(j)1恒成立。并在当前结点子集 A k \mathcal A_k Ak中只有 v ( i ) v^{(i)} v(i)一个结点时取等

化简过程

Normalize-Cut ( V ) \text{Normalize-Cut}(\mathcal V) Normalize-Cut(V)表示成矩阵运算的形式。这里将 ∑ k = 1 K ψ ( A k , A k ˉ ) degree ( A k ) \sum_{k=1}^{\mathcal K} \frac{\psi(\mathcal A_k,\bar {\mathcal A_k})}{\text{degree}(\mathcal A_k)} k=1Kdegree(Ak)ψ(Ak,Akˉ)中的每一项看作是某对角矩阵的主对角线元素,通过对该矩阵的方式 Normalize-Cut ( V ) \text{Normalize-Cut}(\mathcal V) Normalize-Cut(V)进行表示:
需要注意的是,矩阵的秩 ( Rank ) (\text{Rank}) (Rank)和矩阵的迹 ( Trace ) (\text{Trace}) (Trace)不是一个东西,矩阵的秩表示对角阵中非零元素的数目;而迹表示对角阵元素的和。
Normalize-Cut ( V ) = ∑ k = 1 K ψ ( A k , A k ˉ ) degree ( A k ) = tr { [ ψ ( A 1 , A 1 ˉ ) degree ( A 1 ) ψ ( A 2 , A 2 ˉ ) degree ( A 2 ) ⋱ ψ ( A K , A K ˉ ) degree ( A K ) ] K × K } \begin{aligned} \text{Normalize-Cut}(\mathcal V) & = \sum_{k=1}^{\mathcal K} \frac{\psi(\mathcal A_k,\bar {\mathcal A_k})}{\text{degree}(\mathcal A_k)} \\ & = \text{tr}\left\{\begin{bmatrix} \frac{\psi(\mathcal A_1,\bar {\mathcal A_1})}{\text{degree}(\mathcal A_1)} & & & \\ & \frac{\psi(\mathcal A_2,\bar {\mathcal A_2})}{\text{degree}(\mathcal A_2)} & & \\ & & \ddots&\\ & & &\frac{\psi(\mathcal A_{\mathcal K},\bar {\mathcal A_{\mathcal K}})}{\text{degree}(\mathcal A_{\mathcal K})} \end{bmatrix}_{\mathcal K \times \mathcal K}\right\} \end{aligned} Normalize-Cut(V)=k=1Kdegree(Ak)ψ(Ak,Akˉ)=tr degree(A1)ψ(A1,A1ˉ)degree(A2)ψ(A2,A2ˉ)degree(AK)ψ(AK,AKˉ) K×K
可以继续将上述矩阵进行分解,分解成两个对角阵相乘的形式:

  • 关于元素 1 degree ( A k ) ( k = 1 , 2 , ⋯   , K ) \frac{1}{\text{degree}(\mathcal A_{k})}(k=1,2,\cdots,\mathcal K) degree(Ak)1(k=1,2,,K)的矩阵,直接使用逆矩阵的形式表示。
  • 为节省空间, Normalize-Cut ( V ) \text{Normalize-Cut}(\mathcal V) Normalize-Cut(V)直接用 I \mathcal I I进行表示。
    I = tr { [ ψ ( A 1 , A 1 ˉ ) ψ ( A 2 , A 2 ˉ ) ⋱ ψ ( A K , A K ˉ ) ] K × K ⏟ 记作 O ⋅ [ degree ( A 1 ) degree ( A 2 ) ⋱ degree ( A K ) ] K × K − 1 ⏟ 记作 P } = tr ( O ⋅ P − 1 ) \begin{aligned} \mathcal I & = \text{tr}\left\{\underbrace{\begin{bmatrix} \psi(\mathcal A_1,\bar {\mathcal A_1}) & & & \\ & \psi(\mathcal A_2,\bar {\mathcal A_2}) & & \\ & & \ddots&\\ & & & \psi(\mathcal A_{\mathcal K},\bar {\mathcal A_{\mathcal K}}) \end{bmatrix}_{\mathcal K \times \mathcal K}}_{记作\mathcal O} \cdot \underbrace{\begin{bmatrix} \text{degree}(\mathcal A_1) & & & \\ & \text{degree}(\mathcal A_2) & & \\ & & \ddots &\\ & & & \text{degree}(\mathcal A_{\mathcal K}) \end{bmatrix}_{\mathcal K \times \mathcal K}^{-1}}_{记作\mathcal P} \right\} \\ & = \text{tr}(\mathcal O \cdot \mathcal P^{-1}) \end{aligned} I=tr 记作O ψ(A1,A1ˉ)ψ(A2,A2ˉ)ψ(AK,AKˉ) K×K记作P degree(A1)degree(A2)degree(AK) K×K1 =tr(OP1)

此时已经将优化问题转化为矩阵的表达形式,下一步通过 已知的划分出的结点子集 { A k } k = 1 K ⇒ Y \{\mathcal A_k\}_{k=1}^{\mathcal K} \Rightarrow \mathcal Y {Ak}k=1KY以及对应的权重矩阵 E ⇒ W = [ W v ( i ) ⇔ v ( j ) ] N × N \mathcal E \Rightarrow \mathcal W = [\mathcal W_{v^{(i)} \Leftrightarrow v^{(j)}}]_{N \times N} EW=[Wv(i)v(j)]N×N来表示矩阵 O , P \mathcal O,\mathcal P O,P
需要说明的点:为什么说'指示向量' { A k } k = 1 K ⇒ Y \{\mathcal A_k\}_{k=1}^{\mathcal K} \Rightarrow \mathcal Y {Ak}k=1KY是已知的。从流程上观察,整个过程是基于目标函数 Normalize-Cut ( V ) \text{Normalize-Cut}(\mathcal V) Normalize-Cut(V)的迭代过程,只有执行了划分之后,才能够计算出 Normalize-Cut ( V ) \text{Normalize-Cut}(\mathcal V) Normalize-Cut(V)的值,因而 Y \mathcal Y Y在每次迭代过程中都是已知的,只不过初始状态划分效果不佳, Normalize-Cut ( V ) \text{Normalize-Cut}(\mathcal V) Normalize-Cut(V)数值较大而已。通过不断减小 Normalize-Cut ( V ) \text{Normalize-Cut}(\mathcal V) Normalize-Cut(V)的值来优化划分方式 ⇒ Y \Rightarrow \mathcal Y Y.

矩阵 P \mathcal P P的化简过程

重新观察 Y \mathcal Y Y,以及 Y T Y \mathcal Y^T\mathcal Y YTY
Y T Y = [ y ( 1 ) , y ( 2 ) , ⋯   , y ( N ) ] K × N ⋅ [ y ( 1 ) , y ( 2 ) , ⋯   , y ( N ) ] N × K T = [ ∑ i = 1 N y ( i ) [ y ( i ) ] T ] K × K \begin{aligned} \mathcal Y^T\mathcal Y & = \left[y^{(1)},y^{(2)},\cdots,y^{(N)}\right]_{\mathcal K \times N} \cdot \left[y^{(1)},y^{(2)},\cdots,y^{(N)}\right]_{N \times \mathcal K}^T \\ & = \left[\sum_{i=1}^N y^{(i)}[y^{(i)}]^T\right]_{\mathcal K \times \mathcal K} \end{aligned} YTY=[y(1),y(2),,y(N)]K×N[y(1),y(2),,y(N)]N×KT=[i=1Ny(i)[y(i)]T]K×K
由于 y ( i ) y^{(i)} y(i)是一个 one-hot \text{one-hot} one-hot向量,因而 y ( i ) [ y ( i ) ] T y^{(i)}[y^{(i)}]^T y(i)[y(i)]T是一个只有一个元素是1的 K × K \mathcal K \times \mathcal K K×K矩阵,并且这个元素一定在对角线的位置上,这个位置具体取决于 y ( i ) y^{(i)} y(i)数值为1对应的列数。 因而 Y T Y \mathcal Y^T\mathcal Y YTY对应的 K × K \mathcal K \times \mathcal K K×K矩阵内容可描述为:
其中 N 1 N_1 N1的物理意义是: N N N个样本中属于结点子集 A 1 \mathcal A_1 A1的样本数量。也可表示成 N 1 = ∣ A 1 ∣ = ∑ v ( i ) ∈ A 1 ⋅ 1 N_1 = |\mathcal A_1| = \sum_{v^{(i)} \in \mathcal A_1} \cdot 1 N1=A1=v(i)A11,以此类推。
[ ∑ i = 1 N y ( i ) [ y ( i ) ] T ] K × K = [ N 1 N 2 ⋱ N K ] K × K ∑ k = 1 K N k = N = [ ∑ v ( i ) ∈ A 1 ⋅ 1 ∑ v ( i ) ∈ A 2 ⋅ 1 ⋱ ∑ v ( i ) ∈ A K ⋅ 1 ] K × K \begin{aligned} \left[\sum_{i=1}^N y^{(i)}[y^{(i)}]^T\right]_{\mathcal K \times \mathcal K} & = \begin{bmatrix} N_1 & & & \\ & N_2 & & \\ & & \ddots &\\ & & & N_{\mathcal K} \end{bmatrix}_{\mathcal K \times \mathcal K} \quad \sum_{k=1}^{\mathcal K} N_k = N \\ & = \begin{bmatrix} \sum_{v^{(i)} \in \mathcal A_1} \cdot 1 & & & \\ & \sum_{v^{(i)} \in \mathcal A_2} \cdot 1 & & \\ & & \ddots &\\ & & & \sum_{v^{(i)} \in \mathcal A_{\mathcal K}} \cdot 1 \end{bmatrix}_{\mathcal K \times \mathcal K} \end{aligned} [i=1Ny(i)[y(i)]T]K×K= N1N2NK K×Kk=1KNk=N= v(i)A11v(i)A21v(i)AK1 K×K
至此,我们完全可以使用 Y T Y \mathcal Y^T\mathcal Y YTY来描述矩阵 P \mathcal P P了:

  • 其中 ∑ j = 1 N W v ( i ) ⇔ v ( j ) \sum_{j=1}^N \mathcal W_{v^{(i)} \Leftrightarrow v^{(j)}} j=1NWv(i)v(j)内部只有与 v ( i ) v^{(i)} v(i)存在边相连的若干项是有值的,其余均是 0 0 0.
  • 由于 ∑ j = 1 N W v ( i ) ⇔ v ( j ) \sum_{j=1}^N \mathcal W_{v^{(i)} \Leftrightarrow v^{(j)}} j=1NWv(i)v(j)是一个实数,并且 [ y ( i ) ] T [y^{(i)}]^T [y(i)]T中不含 j j j,直接将连加号带进去即可.
    degree ( A k ) = ∑ v ( i ) ∈ A k ∑ j = 1 N W v ( i ) ⇔ v ( j ) = ∑ v ( i ) ∈ A k ∑ j = 1 N { [ y ( i ) ] T ⋅ W v ( i ) ⇔ v ( j ) ⋅ y ( i ) } = ∑ v ( i ) ∈ A k [ y ( i ) ] T { ∑ j = 1 N W v ( i ) ⇔ v ( j ) } y ( i ) P = [ ∑ v ( i ) ∈ A 1 ∑ j = 1 N W v ( i ) ⇔ v ( j ) ∑ v ( i ) ∈ A 2 ∑ j = 1 N W v ( i ) ⇔ v ( j ) ⋱ ∑ v ( i ) ∈ A K ∑ j = 1 N W v ( i ) ⇔ v ( j ) ) ] K × K = ( y ( 1 ) , y ( 2 ) , ⋯   , y ( N ) ) K × N [ ∑ j = 1 N W v ( 1 ) ⇔ v ( j ) ∑ j = 1 N W v ( 2 ) ⇔ v ( j ) ⋱ ∑ j = 1 N W v ( N ) ⇔ v ( j ) ) ] N × N ⏟ 记作 D ( y ( 1 ) y ( 2 ) ⋮ y ( N ) ) N × K = Y T D Y \begin{aligned} \text{degree}(\mathcal A_k) & = \sum_{v^{(i)} \in \mathcal A_k} \sum_{j=1}^N \mathcal W_{v^{(i)} \Leftrightarrow v^{(j)}} \\ & = \sum_{v^{(i)} \in\mathcal A_k}\sum_{j=1}^N \left\{[y^{(i)}]^T \cdot \mathcal W_{v^{(i)} \Leftrightarrow v^{(j)}} \cdot y^{(i)}\right\} \\ & =\sum_{v^{(i)} \in\mathcal A_k} [y^{(i)}]^T \left\{\sum_{j=1}^N \mathcal W_{v^{(i)} \Leftrightarrow v^{(j)}} \right\} y^{(i)}\\ \mathcal P & = \begin{bmatrix} \sum_{v^{(i)} \in \mathcal A_1} \sum_{j=1}^N \mathcal W_{v^{(i)} \Leftrightarrow v^{(j)}} & & & \\ & \sum_{v^{(i)} \in \mathcal A_2} \sum_{j=1}^N \mathcal W_{v^{(i)} \Leftrightarrow v^{(j)}} & & \\ & & \ddots &\\ & & & \sum_{v^{(i)} \in \mathcal A_{\mathcal K}} \sum_{j=1}^N \mathcal W_{v^{(i)} \Leftrightarrow v^{(j)}}) \end{bmatrix}_{\mathcal K \times \mathcal K} \\ & = (y^{(1)},y^{(2)},\cdots,y^{(N)})_{\mathcal K \times N} \underbrace{\begin{bmatrix} \sum_{j=1}^N \mathcal W_{v^{(1)} \Leftrightarrow v^{(j)}} & & & \\ & \sum_{j=1}^N \mathcal W_{v^{(2)} \Leftrightarrow v^{(j)}} & & \\ & & \ddots &\\ & & & \sum_{j=1}^N \mathcal W_{v^{(N)} \Leftrightarrow v^{(j)}}) \end{bmatrix}_{N \times N}}_{记作\mathcal D} \begin{pmatrix} y^{(1)} \\ y^{(2)} \\ \vdots \\ y^{(N)} \end{pmatrix}_{N \times \mathcal K} \\ & = \mathcal Y^T \mathcal D \mathcal Y \end{aligned} degree(Ak)P=v(i)Akj=1NWv(i)v(j)=v(i)Akj=1N{[y(i)]TWv(i)v(j)y(i)}=v(i)Ak[y(i)]T{j=1NWv(i)v(j)}y(i)= v(i)A1j=1NWv(i)v(j)v(i)A2j=1NWv(i)v(j)v(i)AKj=1NWv(i)v(j)) K×K=(y(1),y(2),,y(N))K×N记作D j=1NWv(1)v(j)j=1NWv(2)v(j)j=1NWv(N)v(j)) N×N y(1)y(2)y(N) N×K=YTDY

继续观察矩阵 D \mathcal D D,可以发现 D \mathcal D D中的每一个元素均可以使用 W \mathcal W W进行表示:

  • 其中 W i \mathcal W_i Wi表示权重矩阵的第 i i i行; I N × 1 \mathcal I_{N \times 1} IN×1表示元素均为1,并且长度为 N N N的列向量.
  • diag ( W ⋅ I N × 1 ) \text{diag}(\mathcal W \cdot \mathcal I_{N \times 1}) diag(WIN×1)表示将列向量 W ⋅ I N × 1 \mathcal W \cdot \mathcal I_{N \times 1} WIN×1中的所有元素按顺序放置在矩阵对角线位置,从而构成对角阵 -> 对角转化
    { ∑ j = 1 N W v ( i ) ⇔ v ( j ) = [ W i ] 1 × N ⋅ I N × 1 D = diag ( W ⋅ I N × 1 ) \begin{cases} \sum_{j=1}^N \mathcal W_{v^{(i)} \Leftrightarrow v^{(j)}} = [\mathcal W_i]_{1 \times N} \cdot \mathcal I_{N \times 1} \\ \mathcal D = \text{diag}(\mathcal W \cdot \mathcal I_{N \times 1}) \end{cases} {j=1NWv(i)v(j)=[Wi]1×NIN×1D=diag(WIN×1)

矩阵 O \mathcal O O的化简过程

至此,可以使用权重矩阵 W \mathcal W W描述矩阵 P \mathcal P P,继续观察矩阵 O \mathcal O O

  • 首先观察矩阵 O \mathcal O O中元素的表达形式,可以将其修改成如下形式:
    结点集合补集的定义: A k ∪ A k ˉ = V \mathcal A_k \cup \bar {\mathcal A_k} = \mathcal V AkAkˉ=V
    ψ ( A k , A k ˉ ) = ψ ( A k , V ) − ψ ( A k , A k ) = ∑ v ( i ) ∈ A k ∑ j = 1 N W v ( i ) ⇔ v ( j ) − ∑ v ( i ) ∈ A k ∑ v ( j ) ∈ A k W v ( i ) ⇔ v ( j ) = degree ( A k ) − ψ ( A k , A k ) \begin{aligned} \psi(\mathcal A_k, \bar {\mathcal A_k}) & = \psi(\mathcal A_k,\mathcal V) - \psi(\mathcal A_k,\mathcal A_k) \\ & = \sum_{v^{(i)} \in \mathcal A_k} \sum_{j=1}^N \mathcal W_{v^{(i)} \Leftrightarrow v^{(j)}} - \sum_{v^{(i)} \in \mathcal A_k} \sum_{v^{(j)} \in \mathcal A_k} \mathcal W_{v^{(i)} \Leftrightarrow v^{(j)}} \\ & = \text{degree}(\mathcal A_k) - \psi(\mathcal A_k,\mathcal A_k) \end{aligned} ψ(Ak,Akˉ)=ψ(Ak,V)ψ(Ak,Ak)=v(i)Akj=1NWv(i)v(j)v(i)Akv(j)AkWv(i)v(j)=degree(Ak)ψ(Ak,Ak)
  • 对应的矩阵 O \mathcal O O可表示成如下形式:
    O = [ ψ ( A 1 , A 1 ˉ ) ψ ( A 2 , A 2 ˉ ) ⋱ ψ ( A K , A K ˉ ) ] K × K = [ degree ( A 1 ) degree ( A 2 ) ⋱ degree ( A K ) ] K × K − [ ψ ( A 1 , A 1 ) ψ ( A 2 , A 2 ) ⋱ ψ ( A K , A K ) ] \begin{aligned} \mathcal O & = \begin{bmatrix} \psi(\mathcal A_1, \bar {\mathcal A_1}) & & & \\ & \psi(\mathcal A_2, \bar {\mathcal A_2}) & & \\ & & \ddots &\\ & & & \psi(\mathcal A_{\mathcal K}, \bar {\mathcal A_{\mathcal K}}) \end{bmatrix}_{\mathcal K \times \mathcal K} \\ & = \begin{bmatrix} \text{degree}(\mathcal A_1) & & & \\ & \text{degree}(\mathcal A_2) & & \\ & & \ddots &\\ & & & \text{degree}(\mathcal A_{\mathcal K}) \end{bmatrix}_{\mathcal K \times \mathcal K} - \begin{bmatrix} \psi(\mathcal A_1,\mathcal A_1) & & & \\ & \psi(\mathcal A_2,\mathcal A_2) & & \\ & & \ddots &\\ & & & \psi(\mathcal A_{\mathcal K},\mathcal A_{\mathcal K}) \end{bmatrix} \end{aligned} O= ψ(A1,A1ˉ)ψ(A2,A2ˉ)ψ(AK,AKˉ) K×K= degree(A1)degree(A2)degree(AK) K×K ψ(A1,A1)ψ(A2,A2)ψ(AK,AK)

第一项我们认识,它就是 Y T D Y \mathcal Y^T\mathcal D\mathcal Y YTDY。在观察第二项之前,先认识一个 K × K \mathcal K \times \mathcal K K×K的矩阵: Y T W Y \mathcal Y^T \mathcal W \mathcal Y YTWY。先将 Y T W Y \mathcal Y^T\mathcal W \mathcal Y YTWY展开,其结果表示如下:
Y T W Y = ( y ( 1 ) , y ( 2 ) , ⋯   , y ( N ) ) K × N [ W v ( 1 ) ⇔ v ( 1 ) , W v ( 1 ) ⇔ v ( 2 ) , ⋯   , W v ( 1 ) ⇔ v ( N ) W v ( 2 ) ⇔ v ( 1 ) , W v ( 2 ) ⇔ v ( 2 ) , ⋯   , W v ( 2 ) ⇔ v ( N ) ⋮ W v ( N ) ⇔ v ( 1 ) , W v ( N ) ⇔ v ( 2 ) , ⋯   , W v ( N ) ⇔ v ( N ) ] N × N ( y ( 1 ) y ( 2 ) ⋮ y ( N ) ) N × K = [ ∑ i = 1 N y ( i ) ⋅ W v ( i ) ⇔ v ( 1 ) ⋯   , ∑ i = 1 N y ( i ) ⋅ W v ( i ) ⇔ v ( N ) ] K × N ( y ( 1 ) y ( 2 ) ⋮ y ( N ) ) N × K = ∑ i = 1 N ∑ j = 1 N { y ( i ) [ y ( j ) ] T ⋅ W v ( i ) ⇔ v ( j ) ⏟ 实数 } K × K \begin{aligned} \mathcal Y^T\mathcal W \mathcal Y & = \left(y^{(1)},y^{(2)},\cdots,y^{(N)}\right)_{\mathcal K \times N}\begin{bmatrix} \mathcal W_{v^{(1)} \Leftrightarrow v^{(1)}},\mathcal W_{v^{(1)} \Leftrightarrow v^{(2)}},\cdots,\mathcal W_{v^{(1)} \Leftrightarrow v^{(N)}} \\ \mathcal W_{v^{(2)} \Leftrightarrow v^{(1)}},\mathcal W_{v^{(2)} \Leftrightarrow v^{(2)}},\cdots,\mathcal W_{v^{(2)} \Leftrightarrow v^{(N)}} \\ \vdots \\ \mathcal W_{v^{(N)} \Leftrightarrow v^{(1)}},\mathcal W_{v^{(N)} \Leftrightarrow v^{(2)}},\cdots,\mathcal W_{v^{(N)} \Leftrightarrow v^{(N)}} \\ \end{bmatrix}_{N \times N} \begin{pmatrix} y^{(1)} \\ y^{(2)}\\ \vdots \\ y^{(N)} \end{pmatrix}_{N \times \mathcal K} \\ & = \left[\sum_{i=1}^N y^{(i)} \cdot \mathcal W_{v^{(i)} \Leftrightarrow v^{(1)}} \cdots,\sum_{i=1}^N y^{(i)} \cdot \mathcal W_{v^{(i)} \Leftrightarrow v^{(N)}}\right]_{\mathcal K \times N} \begin{pmatrix} y^{(1)} \\ y^{(2)}\\ \vdots \\ y^{(N)} \end{pmatrix}_{N \times \mathcal K}\\ & = \sum_{i=1}^{N} \sum_{j=1}^N \left\{y^{(i)} [y^{(j)}]^T \cdot \underbrace{\mathcal W_{v^{(i)} \Leftrightarrow v^{(j)}}}_{实数}\right\}_{\mathcal K \times \mathcal K} \\ \end{aligned} YTWY=(y(1),y(2),,y(N))K×N Wv(1)v(1),Wv(1)v(2),,Wv(1)v(N)Wv(2)v(1),Wv(2)v(2),,Wv(2)v(N)Wv(N)v(1),Wv(N)v(2),,Wv(N)v(N) N×N y(1)y(2)y(N) N×K=[i=1Ny(i)Wv(i)v(1),i=1Ny(i)Wv(i)v(N)]K×N y(1)y(2)y(N) N×K=i=1Nj=1N y(i)[y(j)]T实数 Wv(i)v(j) K×K
如果将上式的 K × K \mathcal K \times \mathcal K K×K展开,可以表示成如下形式:
这里个人出现一个误区: { y ( i ) [ y ( j ) ] T } K × K \{y^{(i)}[y^{(j)}]^T\}_{\mathcal K \times \mathcal K} {y(i)[y(j)]T}K×K i , j i,j i,j不属于同一个结点集合时,其矩阵结果并不是全零元素。而是存在一个 1 1 1元素。示例:如果 v ( i ) ∈ A 1 , v ( j ) ∈ A K v^{(i)}\in\mathcal A_{1},v^{(j)} \in \mathcal A_{\mathcal K} v(i)A1,v(j)AK,那么对应的 { y ( i ) [ y ( j ) ] T } K × K \{y^{(i)}[y^{(j)}]^T\}_{\mathcal K \times \mathcal K} {y(i)[y(j)]T}K×K应该是这个的样子:
v ( i ) ∈ A 1 , v ( j ) ∈ A K ⇒ { y ( i ) [ y ( j ) ] T } K × K = ( 0 0 ⋯ 0 0 0 ⋯ 0 ⋮ ⋮ ⋱ ⋮ 1 0 ⋯ 0 ) K × K Y T W Y = ( ∑ v ( i ) ∈ A 1 ∑ v ( j ) ∈ A 1 W v ( i ) ⇔ v ( j ) ∑ v ( i ) ∈ A 1 ∑ v ( j ) ∈ A 2 W v ( i ) ⇔ v ( j ) ⋯ ∑ v ( i ) ∈ A 1 ∑ v ( j ) ∈ A K W v ( i ) ⇔ v ( j ) ∑ v ( i ) ∈ A 2 ∑ v ( j ) ∈ A 1 W v ( i ) ⇔ v ( j ) ∑ v ( i ) ∈ A 2 ∑ v ( j ) ∈ A 2 W v ( i ) ⇔ v ( j ) ⋯ ∑ v ( i ) ∈ A 2 ∑ v ( j ) ∈ A K W v ( i ) ⇔ v ( j ) ⋮ ⋮ ⋱ ⋮ ∑ v ( i ) ∈ A K ∑ v ( j ) ∈ A 1 W v ( i ) ⇔ v ( j ) ∑ v ( i ) ∈ A K ∑ v ( j ) ∈ A 2 W v ( i ) ⇔ v ( j ) ⋯ ∑ v ( i ) ∈ A K ∑ v ( j ) ∈ A K W v ( i ) ⇔ v ( j ) ) K × K = [ ψ ( A 1 , A 1 ) ψ ( A 1 , A 2 ) ⋯ ψ ( A 1 , A K ) ψ ( A 2 , A 1 ) ψ ( A 2 , A 2 ) ⋯ ψ ( A 2 , A K ) ⋮ ⋮ ⋱ ⋮ ψ ( A K , A 1 ) ψ ( A K , A 2 ) ⋯ ψ ( A K , A K ) ] K × K \begin{aligned} & v^{(i)}\in\mathcal A_{1},v^{(j)} \in \mathcal A_{\mathcal K} \Rightarrow \{y^{(i)}[y^{(j)}]^T\}_{\mathcal K \times \mathcal K} = \begin{pmatrix} 0 & 0 &\cdots &0 \\ 0 & 0 & \cdots &0 \\ \vdots & \vdots & \ddots & \vdots\\ 1 & 0 & \cdots & 0 \end{pmatrix}_{\mathcal K \times \mathcal K} \\ \mathcal Y^T\mathcal W \mathcal Y & = \begin{pmatrix} \sum_{v^{(i)} \in \mathcal A_1}\sum_{v^{(j)} \in \mathcal A_1 \mathcal W_{v^{(i)} \Leftrightarrow v^{(j)}}} & \sum_{v^{(i)} \in \mathcal A_1}\sum_{v^{(j)} \in \mathcal A_2 \mathcal W_{v^{(i)} \Leftrightarrow v^{(j)}}} &\cdots &\sum_{v^{(i)} \in \mathcal A_1}\sum_{v^{(j)} \in \mathcal A_{\mathcal K} \mathcal W_{v^{(i)} \Leftrightarrow v^{(j)}}} \\ \sum_{v^{(i)} \in \mathcal A_2}\sum_{v^{(j)} \in \mathcal A_1 \mathcal W_{v^{(i)} \Leftrightarrow v^{(j)}}} & \sum_{v^{(i)} \in \mathcal A_2}\sum_{v^{(j)} \in \mathcal A_2 \mathcal W_{v^{(i)} \Leftrightarrow v^{(j)}}} & \cdots & \sum_{v^{(i)} \in \mathcal A_2}\sum_{v^{(j)} \in \mathcal A_{\mathcal K} \mathcal W_{v^{(i)} \Leftrightarrow v^{(j)}}} \\ \vdots & \vdots & \ddots & \vdots\\ \sum_{v^{(i)} \in \mathcal A_{\mathcal K}}\sum_{v^{(j)} \in \mathcal A_1 \mathcal W_{v^{(i)} \Leftrightarrow v^{(j)}}} & \sum_{v^{(i)} \in \mathcal A_{\mathcal K}}\sum_{v^{(j)} \in \mathcal A_2 \mathcal W_{v^{(i)} \Leftrightarrow v^{(j)}}} & \cdots & \sum_{v^{(i)} \in \mathcal A_{\mathcal K}}\sum_{v^{(j)} \in \mathcal A_{\mathcal K} \mathcal W_{v^{(i)} \Leftrightarrow v^{(j)}}} \end{pmatrix}_{\mathcal K \times \mathcal K} \\ & = \begin{bmatrix} \psi(\mathcal A_1,\mathcal A_1) & \psi(\mathcal A_1,\mathcal A_2) &\cdots &\psi(\mathcal A_1,\mathcal A_{\mathcal K}) \\ \psi(\mathcal A_2,\mathcal A_1) & \psi(\mathcal A_2,\mathcal A_2) & \cdots &\psi(\mathcal A_2,\mathcal A_{\mathcal K}) \\ \vdots & \vdots & \ddots & \vdots\\ \psi(\mathcal A_{\mathcal K},\mathcal A_1) & \psi(\mathcal A_{\mathcal K},\mathcal A_2) & \cdots & \psi(\mathcal A_{\mathcal K},\mathcal A_{\mathcal K}) \end{bmatrix}_{\mathcal K \times \mathcal K} \end{aligned} YTWYv(i)A1,v(j)AK{y(i)[y(j)]T}K×K= 001000000 K×K= v(i)A1v(j)A1Wv(i)v(j)v(i)A2v(j)A1Wv(i)v(j)v(i)AKv(j)A1Wv(i)v(j)v(i)A1v(j)A2Wv(i)v(j)v(i)A2v(j)A2Wv(i)v(j)v(i)AKv(j)A2Wv(i)v(j)v(i)A1v(j)AKWv(i)v(j)v(i)A2v(j)AKWv(i)v(j)v(i)AKv(j)AKWv(i)v(j) K×K= ψ(A1,A1)ψ(A2,A1)ψ(AK,A1)ψ(A1,A2)ψ(A2,A2)ψ(AK,A2)ψ(A1,AK)ψ(A2,AK)ψ(AK,AK) K×K
很明显, Y T W Y \mathcal Y^T\mathcal W\mathcal Y YTWY O \mathcal O O中的第二项是不同的。但如果将 Y T W Y \mathcal Y^T\mathcal W\mathcal Y YTWY替代第二项,记作 O ′ \mathcal O' O。由于要求解的是 tr ( O ⋅ P − 1 ) \text{tr}(\mathcal O \cdot \mathcal P^{-1}) tr(OP1),但是 P \mathcal P P对角矩阵( P − 1 \mathcal P^{-1} P1自然也是对角矩阵),那么会出现这样的现象
其中 O ′ \mathcal O' O和对角阵 P − 1 \mathcal P^{-1} P1相乘,它仅影响对角线上的元素,对其他位置的元素虽然也影响,但其他位置我们并不关心,因为我们只关心 tr ( O ′ P − 1 ) \text{tr}(\mathcal O' \mathcal P^{-1}) tr(OP1),也就是它主对角线上的元素和。
O ′ = Y T D Y − Y T W Y = [ ψ ( A 1 , A 1 ˉ ) − ψ ( A 1 , A 2 ) ⋯ − ψ ( A 1 , A K ) − ψ ( A 2 , A 1 ) ψ ( A 2 , A 2 ˉ ) ⋯ − ψ ( A 2 , A K ) ⋮ ⋮ ⋱ ⋮ − ψ ( A K , A 1 ) − ψ ( A K , A 2 ) ⋯ ψ ( A K , A K ˉ ) ] K × K ⇒ tr ( O ′ P − 1 ) = tr ( O P − 1 ) \begin{aligned} \mathcal O' & = \mathcal Y^T\mathcal D \mathcal Y - \mathcal Y^T\mathcal W \mathcal Y \\ & = \begin{bmatrix} \psi(\mathcal A_1, \bar {\mathcal A_1}) & -\psi(\mathcal A_1,\mathcal A_2) & \cdots & -\psi(\mathcal A_1,\mathcal A_{\mathcal K}) \\ -\psi(\mathcal A_2,\mathcal A_1) & \psi(\mathcal A_2, \bar {\mathcal A_2}) & \cdots & -\psi(\mathcal A_2,\mathcal A_{\mathcal K})\\ \vdots & \vdots & \ddots & \vdots \\ -\psi(\mathcal A_{\mathcal K},\mathcal A_1) & -\psi(\mathcal A_{\mathcal K},\mathcal A_2) & \cdots & \psi(\mathcal A_{\mathcal K}, \bar {\mathcal A_{\mathcal K}}) \end{bmatrix}_{\mathcal K \times \mathcal K} \\ & \Rightarrow \text{tr}(\mathcal O'\mathcal P^{-1}) = \text{tr}(\mathcal O\mathcal P^{-1}) \end{aligned} O=YTDYYTWY= ψ(A1,A1ˉ)ψ(A2,A1)ψ(AK,A1)ψ(A1,A2)ψ(A2,A2ˉ)ψ(AK,A2)ψ(A1,AK)ψ(A2,AK)ψ(AK,AKˉ) K×Ktr(OP1)=tr(OP1)

目标函数整理与拉普拉斯矩阵

至此, O , P \mathcal O,\mathcal P O,P都可以使用 W , Y \mathcal W,\mathcal Y W,Y进行表示。关于目标函数可转化至如下形式:
矩阵的逆应该放在外面,上面也是,latex用的不好~,见笑。
Y ^ = arg ⁡ min ⁡ Y ∑ k = 1 K ψ ( A k , A k ˉ ) degree ( A k ) = arg ⁡ min ⁡ Y tr ( O P − 1 ) = arg ⁡ min ⁡ Y tr ( O ′ P − 1 ) = arg ⁡ min ⁡ Y tr [ Y T ( D − W ) Y ⏟ O ′ ⋅ ( Y T D Y ) − 1 ⏟ P ] \begin{aligned} \hat {\mathcal Y} & = \mathop{\arg\min}\limits_{\mathcal Y}\sum_{k=1}^{\mathcal K} \frac{\psi(\mathcal A_k,\bar {\mathcal A_k})}{\text{degree}(\mathcal A_k)} \\ & = \mathop{\arg\min}\limits_{\mathcal Y} \text{tr}(\mathcal O \mathcal P^{-1}) \\ & = \mathop{\arg\min}\limits_{\mathcal Y} \text{tr}(\mathcal O' \mathcal P^{-1}) \\ & = \mathop{\arg\min}\limits_{\mathcal Y} \text{tr} \left[\underbrace{\mathcal Y^T(\mathcal D - \mathcal W) \mathcal Y}_{\mathcal O'} \cdot \underbrace{\left(\mathcal Y^T \mathcal D \mathcal Y\right)^{-1}}_{\mathcal P}\right] \end{aligned} Y^=Yargmink=1Kdegree(Ak)ψ(Ak,Akˉ)=Yargmintr(OP1)=Yargmintr(OP1)=Yargmintr O YT(DW)YP (YTDY)1
通常令 L = D − W \mathcal L = \mathcal D - \mathcal W L=DW,其中 L \mathcal L L也被称作拉普拉斯矩阵( Laplacian Matrix \text{Laplacian Matrix} Laplacian Matrix)

相关参考:
机器学习-谱聚类(3)-模型的矩阵形式-Indicator Vector
机器学习-谱聚类(4)-模型的矩阵形式-对角矩阵
机器学习-谱聚类(5)-模型的矩阵形式-Laplacian Matrix