机器学习笔记之谱聚类(三)模型的矩阵形式转化
机器学习笔记之谱聚类——模型的矩阵形式转化
引言
上一节针对 k-Means \text{k-Means} k-Means算法的缺陷,介绍了谱聚类( Spectral Clustering \text{Spectral Clustering} Spectral Clustering)的结构描述以及目标函数。本节将目标函数转化成矩阵形式,并引出拉普拉斯矩阵。
回顾:谱聚类——目标函数表示
谱聚类关于处理聚类任务的朴素思想是:通过对样本的合适划分,使样本的划分代价最小。假设一个样本集合
X
=
{
x
(
1
)
,
x
(
2
)
,
⋯
,
x
(
N
)
}
\mathcal X = \{x^{(1)},x^{(2)},\cdots,x^{(N)}\}
X={x(1),x(2),⋯,x(N)},它对应的图结构描述如下:
G
=
{
V
,
E
}
\mathcal G = \{\mathcal V,\mathcal E\}
G={V,E}
其中
V
\mathcal V
V表示结点集合,结点数量与样本数量相同;而边集合
E
\mathcal E
E可视作一个权重矩阵,矩阵中的元素表示结点之间的关联关系:
V
=
{
v
(
1
)
,
v
(
2
)
,
⋯
,
v
(
N
)
}
E
⇒
W
=
[
W
v
(
i
)
⇔
v
(
j
)
]
N
×
N
\begin{aligned} \mathcal V & = \{v^{(1)},v^{(2)},\cdots,v^{(N)}\} \\ \mathcal E & \Rightarrow \mathcal W = [\mathcal W_{v^{(i)} \Leftrightarrow v^{(j)}}]_{N \times N} \end{aligned}
VE={v(1),v(2),⋯,v(N)}⇒W=[Wv(i)⇔v(j)]N×N
而结点
v
(
i
)
,
v
(
j
)
v^{(i)},v^{(j)}
v(i),v(j)之间的关联关系
W
v
(
i
)
⇔
v
(
j
)
\mathcal W_{v^{(i)} \Leftrightarrow v^{(j)}}
Wv(i)⇔v(j)通常使用样本之间的核函数(
Kernel Function
\text{Kernel Function}
Kernel Function)表示。这里以高斯核函数为例:
W
v
(
i
)
⇔
v
(
j
)
=
κ
(
x
(
i
)
,
x
(
j
)
)
=
exp
{
−
∣
∣
x
(
i
)
−
x
(
j
)
∣
∣
2
2
2
σ
2
}
assert
⟨
v
(
i
)
,
v
(
j
)
⟩
∈
E
\mathcal W_{v^{(i)} \Leftrightarrow v^{(j)}} = \kappa(x^{(i)},x^{(j)}) = \exp \left\{- \frac{||x^{(i)} - x^{(j)}||_2^2}{2\sigma^2}\right\} \quad \text{assert} \left\langle v^{(i)},v^{(j)}\right\rangle \in \mathcal E
Wv(i)⇔v(j)=κ(x(i),x(j))=exp{−2σ2∣∣x(i)−x(j)∣∣22}assert⟨v(i),v(j)⟩∈E
假设使用规则将图
G
\mathcal G
G划分成
K
\mathcal K
K个子图,各子图的结点集合分别表示为:
{
A
1
,
A
2
,
⋯
,
A
K
}
\{\mathcal A_1,\mathcal A_2,\cdots,\mathcal A_{\mathcal K}\}
{A1,A2,⋯,AK},并且有:
{
V
=
A
1
∪
A
2
∪
⋯
∪
A
K
A
i
∩
A
j
=
ϕ
∀
i
,
j
∈
{
1
,
2
,
⋯
,
K
}
\begin{cases} \mathcal V = \mathcal A_1 \cup \mathcal A_2 \cup \cdots \cup \mathcal A_{\mathcal K} \\ \mathcal A_i \cap \mathcal A_j = \phi \quad \forall i,j \in \{1,2,\cdots,\mathcal K\} \end{cases}
{V=A1∪A2∪⋯∪AKAi∩Aj=ϕ∀i,j∈{1,2,⋯,K}
至此,在执行划分后关于结点集合
V
\mathcal V
V的代价函数
Cut
(
V
)
\text{Cut}(\mathcal V)
Cut(V)可表示为:
其中
A
k
ˉ
\bar {\mathcal A_k}
Akˉ表示
A
k
\mathcal A_k
Ak的补集。
Cut
(
V
)
=
Cut
(
A
1
,
A
2
,
⋯
,
A
K
)
=
∑
k
=
1
K
ψ
(
A
k
,
A
k
ˉ
)
=
∑
k
=
1
K
∑
j
≠
k
K
ψ
(
A
k
,
A
j
)
\begin{aligned} \text{Cut}(\mathcal V) & = \text{Cut}(\mathcal A_1,\mathcal A_2,\cdots,\mathcal A_{\mathcal K}) \\ & = \sum_{k=1}^{\mathcal K} \psi(\mathcal A_k,\bar {\mathcal A_k}) \\ & = \sum_{k=1}^{\mathcal K} \sum_{j \neq k}^{\mathcal K} \psi(\mathcal A_k,\mathcal A_j) \end{aligned}
Cut(V)=Cut(A1,A2,⋯,AK)=k=1∑Kψ(Ak,Akˉ)=k=1∑Kj=k∑Kψ(Ak,Aj)
最后对
ψ
(
A
k
,
A
j
)
\psi(\mathcal A_k,\mathcal A_j)
ψ(Ak,Aj)进行标准化后,作为图划分的目标函数:
{
Normalized-Cut
(
V
)
=
∑
k
=
1
K
ψ
(
A
k
,
A
k
ˉ
)
degree
(
A
k
)
degree
(
A
k
)
=
∑
v
(
i
)
∈
A
k
∑
j
=
1
N
W
v
(
i
)
⇔
v
(
j
)
\begin{cases} \text{Normalized-Cut}(\mathcal V) = \sum_{k=1}^{\mathcal K} \frac{\psi(\mathcal A_k,\bar {\mathcal A_k})}{\text{degree}(\mathcal A_k)} \\ \text{degree}(\mathcal A_k) = \sum_{v^{(i)} \in \mathcal A_k} \sum_{j=1}^{N} \mathcal W_{v^{(i)} \Leftrightarrow v^{(j)}} \end{cases}
{Normalized-Cut(V)=∑k=1Kdegree(Ak)ψ(Ak,Akˉ)degree(Ak)=∑v(i)∈Ak∑j=1NWv(i)⇔v(j)
最终将切分的选择问题转化为数学中的优化问题:
min
{
A
k
}
k
=
1
K
Normalized-Cut
(
V
)
\mathop{\min}\limits_{\{\mathcal A_k\}_{k=1}^{\mathcal K}} \text{Normalized-Cut}(\mathcal V)
{Ak}k=1KminNormalized-Cut(V)
引入指示向量
为了能够更方便地对优化问题进行求解,通常将优化函数的连加形式转化为矩阵的乘法形式。首先观察
{
A
k
}
k
=
1
K
\{\mathcal A_k\}_{k=1}^{\mathcal K}
{Ak}k=1K,它表示结点子集的集合:
{
A
k
}
k
=
1
K
⇒
{
A
1
,
A
2
,
⋯
,
A
K
}
\{\mathcal A_k\}_{k=1}^{\mathcal K} \Rightarrow \{\mathcal A_1,\mathcal A_2,\cdots,\mathcal A_{\mathcal K}\}
{Ak}k=1K⇒{A1,A2,⋯,AK}
将这个
{
A
k
}
k
=
1
K
\{\mathcal A_k\}_{k=1}^{\mathcal K}
{Ak}k=1K描述为指示向量(
Indicator Vector
\text{Indicator Vector}
Indicator Vector)的形式:
所谓‘指示向量’
y ( i ) ( i = 1 , 2 , ⋯ , N ) y^{(i)}(i=1,2,\cdots,N) y(i)(i=1,2,⋯,N),就是描述结点
v ( i ) v^{(i)} v(i)位于结点子集
A k ( k = 1 , 2 , ⋯ , K ) \mathcal A_k(k=1,2,\cdots,\mathcal K) Ak(k=1,2,⋯,K)的信息。使用
one-hot \text{one-hot} one-hot向量进行表达。
例如某结点
v ( i ) v^{(i)} v(i)对应的指示向量
y ( i ) y^{(i)} y(i)结果为
( 1 , 0 , ⋯ , 0 ) ⏟ K 个元素 \underbrace{(1,0,\cdots,0)}_{\mathcal K个元素} K个元素 (1,0,⋯,0),那么该向量表示为
v ( i ) v^{(i)} v(i)属于
A 1 \mathcal A_1 A1结点子集。随着划分结点子集数量的变化,
one-hot \text{one-hot} one-hot向量的长度也随之发生变化。
{ y ( i ) ∈ { 0 , 1 } K ∑ k = 1 K y v ( i ) ∈ A k = 1 \begin{cases} y^{(i)} \in \{0,1\}^{\mathcal K} \\ \sum_{k=1}^{\mathcal K} y_{v^{(i)} \in \mathcal A_{k}} = 1 \end{cases} {y(i)∈{0,1}K∑k=1Kyv(i)∈Ak=1
至此,使用一个
K
\mathcal K
K维的
one-hot
\text{one-hot}
one-hot向量表示样本
x
(
i
)
x^{(i)}
x(i)对应结点的归属信息,对于整个样本集合
X
\mathcal X
X,对应的归属信息
Y
\mathcal Y
Y可表示为:
Y
=
(
y
(
1
)
,
y
(
2
)
,
⋯
,
y
(
N
)
)
T
=
[
y
1
(
1
)
,
y
2
(
1
)
,
⋯
,
y
K
(
1
)
y
1
(
2
)
,
y
2
(
2
)
,
⋯
,
y
K
(
2
)
⋮
y
1
(
N
)
,
y
2
(
N
)
,
⋯
,
y
K
(
N
)
]
N
×
K
\begin{aligned} \mathcal Y & = (y^{(1)},y^{(2)},\cdots,y^{(N)})^T \\ & = \begin{bmatrix} y_1^{(1)},y_2^{(1)},\cdots,y_{\mathcal K}^{(1)} \\ y_1^{(2)},y_2^{(2)},\cdots,y_{\mathcal K}^{(2)} \\ \vdots \\ y_1^{(N)},y_2^{(N)},\cdots,y_{\mathcal K}^{(N)} \end{bmatrix}_{N \times \mathcal K} \end{aligned}
Y=(y(1),y(2),⋯,y(N))T=
y1(1),y2(1),⋯,yK(1)y1(2),y2(2),⋯,yK(2)⋮y1(N),y2(N),⋯,yK(N)
N×K
最终关于优化问题可转化为如下形式:
Y
^
=
arg
min
Y
Normalized-Cut
(
V
)
\hat {\mathcal Y} = \mathop{\arg\min}\limits_{\mathcal Y} \text{Normalized-Cut}(\mathcal V)
Y^=YargminNormalized-Cut(V)
优化问题的化简过程
小插曲:观察 Normalize-Cut ( V ) \text{Normalize-Cut}(\mathcal V) Normalize-Cut(V)
对于
Normalize-Cut
(
V
)
\text{Normalize-Cut}(\mathcal V)
Normalize-Cut(V):
Normalize-Cut
(
V
)
=
∑
k
=
1
K
ψ
(
A
k
,
A
k
ˉ
)
degree
(
A
k
)
=
∑
k
=
1
K
∑
m
≠
k
∑
v
(
i
)
∈
A
k
∑
v
(
j
)
∈
A
m
W
v
(
i
)
⇔
v
(
j
)
∑
v
(
i
)
∈
A
k
∑
j
=
1
N
W
v
(
i
)
⇔
v
(
j
)
=
∑
k
=
1
K
∑
v
(
i
)
∈
A
k
∑
m
≠
k
∑
v
(
j
)
∈
A
m
W
v
(
i
)
⇔
v
(
j
)
∑
j
=
1
N
W
v
(
i
)
⇔
v
(
j
)
\begin{aligned} \text{Normalize-Cut}(\mathcal V) & = \sum_{k=1}^{\mathcal K} \frac{\psi(\mathcal A_k,\bar {\mathcal A_k})}{\text{degree}(\mathcal A_k)} \\ & = \sum_{k=1}^{\mathcal K} \frac{\sum_{m \neq k} \sum_{v^{(i)} \in \mathcal A_k}\sum_{v^{(j)} \in\mathcal A_m} \mathcal W_{v^{(i)}\Leftrightarrow v^{(j)}}}{\sum_{v^{(i)} \in \mathcal A_k} \sum_{j=1}^{N} \mathcal W_{v^{(i)} \Leftrightarrow v^{(j)}}} \\ & = \sum_{k=1}^{\mathcal K} \sum_{v^{(i)} \in \mathcal A_k} \frac{\sum_{m \neq k}\sum_{v^{(j)} \in \mathcal A_m} \mathcal W_{v^{(i)} \Leftrightarrow v^{(j)}}}{\sum_{j=1}^N \mathcal W_{v^{(i)} \Leftrightarrow v^{(j)}}} \end{aligned}
Normalize-Cut(V)=k=1∑Kdegree(Ak)ψ(Ak,Akˉ)=k=1∑K∑v(i)∈Ak∑j=1NWv(i)⇔v(j)∑m=k∑v(i)∈Ak∑v(j)∈AmWv(i)⇔v(j)=k=1∑Kv(i)∈Ak∑∑j=1NWv(i)⇔v(j)∑m=k∑v(j)∈AmWv(i)⇔v(j)
观察展开项,可以发现项
∑
m
≠
k
∑
v
(
j
)
∈
A
m
W
v
(
i
)
⇔
v
(
j
)
∑
j
=
1
N
W
v
(
i
)
⇔
v
(
j
)
≤
1
\frac{\sum_{m \neq k}\sum_{v^{(j)} \in \mathcal A_m} \mathcal W_{v^{(i)} \Leftrightarrow v^{(j)}}}{\sum_{j=1}^N \mathcal W_{v^{(i)} \Leftrightarrow v^{(j)}}} \leq 1
∑j=1NWv(i)⇔v(j)∑m=k∑v(j)∈AmWv(i)⇔v(j)≤1恒成立。并在当前结点子集
A
k
\mathcal A_k
Ak中只有
v
(
i
)
v^{(i)}
v(i)一个结点时取等。
化简过程
将
Normalize-Cut
(
V
)
\text{Normalize-Cut}(\mathcal V)
Normalize-Cut(V)表示成矩阵运算的形式。这里将
∑
k
=
1
K
ψ
(
A
k
,
A
k
ˉ
)
degree
(
A
k
)
\sum_{k=1}^{\mathcal K} \frac{\psi(\mathcal A_k,\bar {\mathcal A_k})}{\text{degree}(\mathcal A_k)}
∑k=1Kdegree(Ak)ψ(Ak,Akˉ)中的每一项看作是某对角矩阵的主对角线元素,通过对该矩阵求迹的方式对
Normalize-Cut
(
V
)
\text{Normalize-Cut}(\mathcal V)
Normalize-Cut(V)进行表示:
需要注意的是,矩阵的秩
(
Rank
)
(\text{Rank})
(Rank)和矩阵的迹
(
Trace
)
(\text{Trace})
(Trace)不是一个东西,矩阵的秩表示对角阵中非零元素的数目;而迹表示对角阵元素的和。
Normalize-Cut
(
V
)
=
∑
k
=
1
K
ψ
(
A
k
,
A
k
ˉ
)
degree
(
A
k
)
=
tr
{
[
ψ
(
A
1
,
A
1
ˉ
)
degree
(
A
1
)
ψ
(
A
2
,
A
2
ˉ
)
degree
(
A
2
)
⋱
ψ
(
A
K
,
A
K
ˉ
)
degree
(
A
K
)
]
K
×
K
}
\begin{aligned} \text{Normalize-Cut}(\mathcal V) & = \sum_{k=1}^{\mathcal K} \frac{\psi(\mathcal A_k,\bar {\mathcal A_k})}{\text{degree}(\mathcal A_k)} \\ & = \text{tr}\left\{\begin{bmatrix} \frac{\psi(\mathcal A_1,\bar {\mathcal A_1})}{\text{degree}(\mathcal A_1)} & & & \\ & \frac{\psi(\mathcal A_2,\bar {\mathcal A_2})}{\text{degree}(\mathcal A_2)} & & \\ & & \ddots&\\ & & &\frac{\psi(\mathcal A_{\mathcal K},\bar {\mathcal A_{\mathcal K}})}{\text{degree}(\mathcal A_{\mathcal K})} \end{bmatrix}_{\mathcal K \times \mathcal K}\right\} \end{aligned}
Normalize-Cut(V)=k=1∑Kdegree(Ak)ψ(Ak,Akˉ)=tr⎩
⎨
⎧
degree(A1)ψ(A1,A1ˉ)degree(A2)ψ(A2,A2ˉ)⋱degree(AK)ψ(AK,AKˉ)
K×K⎭
⎬
⎫
可以继续将上述矩阵进行分解,分解成两个对角阵相乘的形式:
关于元素
1 degree ( A k ) ( k = 1 , 2 , ⋯ , K ) \frac{1}{\text{degree}(\mathcal A_{k})}(k=1,2,\cdots,\mathcal K) degree(Ak)1(k=1,2,⋯,K)的矩阵,直接使用逆矩阵的形式表示。
为节省空间,
Normalize-Cut ( V ) \text{Normalize-Cut}(\mathcal V) Normalize-Cut(V)直接用
I \mathcal I I进行表示。
I = tr { [ ψ ( A 1 , A 1 ˉ ) ψ ( A 2 , A 2 ˉ ) ⋱ ψ ( A K , A K ˉ ) ] K × K ⏟ 记作 O ⋅ [ degree ( A 1 ) degree ( A 2 ) ⋱ degree ( A K ) ] K × K − 1 ⏟ 记作 P } = tr ( O ⋅ P − 1 ) \begin{aligned} \mathcal I & = \text{tr}\left\{\underbrace{\begin{bmatrix} \psi(\mathcal A_1,\bar {\mathcal A_1}) & & & \\ & \psi(\mathcal A_2,\bar {\mathcal A_2}) & & \\ & & \ddots&\\ & & & \psi(\mathcal A_{\mathcal K},\bar {\mathcal A_{\mathcal K}}) \end{bmatrix}_{\mathcal K \times \mathcal K}}_{记作\mathcal O} \cdot \underbrace{\begin{bmatrix} \text{degree}(\mathcal A_1) & & & \\ & \text{degree}(\mathcal A_2) & & \\ & & \ddots &\\ & & & \text{degree}(\mathcal A_{\mathcal K}) \end{bmatrix}_{\mathcal K \times \mathcal K}^{-1}}_{记作\mathcal P} \right\} \\ & = \text{tr}(\mathcal O \cdot \mathcal P^{-1}) \end{aligned} I=tr⎩ ⎨ ⎧记作O ψ(A1,A1ˉ)ψ(A2,A2ˉ)⋱ψ(AK,AKˉ) K×K⋅记作P degree(A1)degree(A2)⋱degree(AK) K×K−1⎭ ⎬ ⎫=tr(O⋅P−1)
此时已经将优化问题转化为矩阵的表达形式,下一步通过 已知的划分出的结点子集
{
A
k
}
k
=
1
K
⇒
Y
\{\mathcal A_k\}_{k=1}^{\mathcal K} \Rightarrow \mathcal Y
{Ak}k=1K⇒Y以及对应的权重矩阵
E
⇒
W
=
[
W
v
(
i
)
⇔
v
(
j
)
]
N
×
N
\mathcal E \Rightarrow \mathcal W = [\mathcal W_{v^{(i)} \Leftrightarrow v^{(j)}}]_{N \times N}
E⇒W=[Wv(i)⇔v(j)]N×N来表示矩阵
O
,
P
\mathcal O,\mathcal P
O,P。
需要说明的点:为什么说'指示向量'
{
A
k
}
k
=
1
K
⇒
Y
\{\mathcal A_k\}_{k=1}^{\mathcal K} \Rightarrow \mathcal Y
{Ak}k=1K⇒Y是已知的。从流程上观察,整个过程是基于目标函数
Normalize-Cut
(
V
)
\text{Normalize-Cut}(\mathcal V)
Normalize-Cut(V)的迭代过程,只有执行了划分之后,才能够计算出
Normalize-Cut
(
V
)
\text{Normalize-Cut}(\mathcal V)
Normalize-Cut(V)的值,因而
Y
\mathcal Y
Y在每次迭代过程中都是已知的,只不过初始状态划分效果不佳,
Normalize-Cut
(
V
)
\text{Normalize-Cut}(\mathcal V)
Normalize-Cut(V)数值较大而已。通过不断减小
Normalize-Cut
(
V
)
\text{Normalize-Cut}(\mathcal V)
Normalize-Cut(V)的值来优化划分方式
⇒
Y
\Rightarrow \mathcal Y
⇒Y.
矩阵 P \mathcal P P的化简过程
重新观察
Y
\mathcal Y
Y,以及
Y
T
Y
\mathcal Y^T\mathcal Y
YTY:
Y
T
Y
=
[
y
(
1
)
,
y
(
2
)
,
⋯
,
y
(
N
)
]
K
×
N
⋅
[
y
(
1
)
,
y
(
2
)
,
⋯
,
y
(
N
)
]
N
×
K
T
=
[
∑
i
=
1
N
y
(
i
)
[
y
(
i
)
]
T
]
K
×
K
\begin{aligned} \mathcal Y^T\mathcal Y & = \left[y^{(1)},y^{(2)},\cdots,y^{(N)}\right]_{\mathcal K \times N} \cdot \left[y^{(1)},y^{(2)},\cdots,y^{(N)}\right]_{N \times \mathcal K}^T \\ & = \left[\sum_{i=1}^N y^{(i)}[y^{(i)}]^T\right]_{\mathcal K \times \mathcal K} \end{aligned}
YTY=[y(1),y(2),⋯,y(N)]K×N⋅[y(1),y(2),⋯,y(N)]N×KT=[i=1∑Ny(i)[y(i)]T]K×K
由于
y
(
i
)
y^{(i)}
y(i)是一个
one-hot
\text{one-hot}
one-hot向量,因而
y
(
i
)
[
y
(
i
)
]
T
y^{(i)}[y^{(i)}]^T
y(i)[y(i)]T是一个只有一个元素是1的
K
×
K
\mathcal K \times \mathcal K
K×K矩阵,并且这个元素一定在对角线的位置上,这个位置具体取决于
y
(
i
)
y^{(i)}
y(i)数值为1对应的列数。 因而
Y
T
Y
\mathcal Y^T\mathcal Y
YTY对应的
K
×
K
\mathcal K \times \mathcal K
K×K矩阵内容可描述为:
其中
N
1
N_1
N1的物理意义是:
N
N
N个样本中属于结点子集
A
1
\mathcal A_1
A1的样本数量。也可表示成
N
1
=
∣
A
1
∣
=
∑
v
(
i
)
∈
A
1
⋅
1
N_1 = |\mathcal A_1| = \sum_{v^{(i)} \in \mathcal A_1} \cdot 1
N1=∣A1∣=∑v(i)∈A1⋅1,以此类推。
[
∑
i
=
1
N
y
(
i
)
[
y
(
i
)
]
T
]
K
×
K
=
[
N
1
N
2
⋱
N
K
]
K
×
K
∑
k
=
1
K
N
k
=
N
=
[
∑
v
(
i
)
∈
A
1
⋅
1
∑
v
(
i
)
∈
A
2
⋅
1
⋱
∑
v
(
i
)
∈
A
K
⋅
1
]
K
×
K
\begin{aligned} \left[\sum_{i=1}^N y^{(i)}[y^{(i)}]^T\right]_{\mathcal K \times \mathcal K} & = \begin{bmatrix} N_1 & & & \\ & N_2 & & \\ & & \ddots &\\ & & & N_{\mathcal K} \end{bmatrix}_{\mathcal K \times \mathcal K} \quad \sum_{k=1}^{\mathcal K} N_k = N \\ & = \begin{bmatrix} \sum_{v^{(i)} \in \mathcal A_1} \cdot 1 & & & \\ & \sum_{v^{(i)} \in \mathcal A_2} \cdot 1 & & \\ & & \ddots &\\ & & & \sum_{v^{(i)} \in \mathcal A_{\mathcal K}} \cdot 1 \end{bmatrix}_{\mathcal K \times \mathcal K} \end{aligned}
[i=1∑Ny(i)[y(i)]T]K×K=
N1N2⋱NK
K×Kk=1∑KNk=N=
∑v(i)∈A1⋅1∑v(i)∈A2⋅1⋱∑v(i)∈AK⋅1
K×K
至此,我们完全可以使用
Y
T
Y
\mathcal Y^T\mathcal Y
YTY来描述矩阵
P
\mathcal P
P了:
其中
∑ j = 1 N W v ( i ) ⇔ v ( j ) \sum_{j=1}^N \mathcal W_{v^{(i)} \Leftrightarrow v^{(j)}} ∑j=1NWv(i)⇔v(j)内部只有与
v ( i ) v^{(i)} v(i)存在边相连的若干项是有值的,其余均是
0 0 0.由于
∑ j = 1 N W v ( i ) ⇔ v ( j ) \sum_{j=1}^N \mathcal W_{v^{(i)} \Leftrightarrow v^{(j)}} ∑j=1NWv(i)⇔v(j)是一个实数,并且
[ y ( i ) ] T [y^{(i)}]^T [y(i)]T中不含
j j j,直接将连加号带进去即可
.
degree ( A k ) = ∑ v ( i ) ∈ A k ∑ j = 1 N W v ( i ) ⇔ v ( j ) = ∑ v ( i ) ∈ A k ∑ j = 1 N { [ y ( i ) ] T ⋅ W v ( i ) ⇔ v ( j ) ⋅ y ( i ) } = ∑ v ( i ) ∈ A k [ y ( i ) ] T { ∑ j = 1 N W v ( i ) ⇔ v ( j ) } y ( i ) P = [ ∑ v ( i ) ∈ A 1 ∑ j = 1 N W v ( i ) ⇔ v ( j ) ∑ v ( i ) ∈ A 2 ∑ j = 1 N W v ( i ) ⇔ v ( j ) ⋱ ∑ v ( i ) ∈ A K ∑ j = 1 N W v ( i ) ⇔ v ( j ) ) ] K × K = ( y ( 1 ) , y ( 2 ) , ⋯ , y ( N ) ) K × N [ ∑ j = 1 N W v ( 1 ) ⇔ v ( j ) ∑ j = 1 N W v ( 2 ) ⇔ v ( j ) ⋱ ∑ j = 1 N W v ( N ) ⇔ v ( j ) ) ] N × N ⏟ 记作 D ( y ( 1 ) y ( 2 ) ⋮ y ( N ) ) N × K = Y T D Y \begin{aligned} \text{degree}(\mathcal A_k) & = \sum_{v^{(i)} \in \mathcal A_k} \sum_{j=1}^N \mathcal W_{v^{(i)} \Leftrightarrow v^{(j)}} \\ & = \sum_{v^{(i)} \in\mathcal A_k}\sum_{j=1}^N \left\{[y^{(i)}]^T \cdot \mathcal W_{v^{(i)} \Leftrightarrow v^{(j)}} \cdot y^{(i)}\right\} \\ & =\sum_{v^{(i)} \in\mathcal A_k} [y^{(i)}]^T \left\{\sum_{j=1}^N \mathcal W_{v^{(i)} \Leftrightarrow v^{(j)}} \right\} y^{(i)}\\ \mathcal P & = \begin{bmatrix} \sum_{v^{(i)} \in \mathcal A_1} \sum_{j=1}^N \mathcal W_{v^{(i)} \Leftrightarrow v^{(j)}} & & & \\ & \sum_{v^{(i)} \in \mathcal A_2} \sum_{j=1}^N \mathcal W_{v^{(i)} \Leftrightarrow v^{(j)}} & & \\ & & \ddots &\\ & & & \sum_{v^{(i)} \in \mathcal A_{\mathcal K}} \sum_{j=1}^N \mathcal W_{v^{(i)} \Leftrightarrow v^{(j)}}) \end{bmatrix}_{\mathcal K \times \mathcal K} \\ & = (y^{(1)},y^{(2)},\cdots,y^{(N)})_{\mathcal K \times N} \underbrace{\begin{bmatrix} \sum_{j=1}^N \mathcal W_{v^{(1)} \Leftrightarrow v^{(j)}} & & & \\ & \sum_{j=1}^N \mathcal W_{v^{(2)} \Leftrightarrow v^{(j)}} & & \\ & & \ddots &\\ & & & \sum_{j=1}^N \mathcal W_{v^{(N)} \Leftrightarrow v^{(j)}}) \end{bmatrix}_{N \times N}}_{记作\mathcal D} \begin{pmatrix} y^{(1)} \\ y^{(2)} \\ \vdots \\ y^{(N)} \end{pmatrix}_{N \times \mathcal K} \\ & = \mathcal Y^T \mathcal D \mathcal Y \end{aligned} degree(Ak)P=v(i)∈Ak∑j=1∑NWv(i)⇔v(j)=v(i)∈Ak∑j=1∑N{[y(i)]T⋅Wv(i)⇔v(j)⋅y(i)}=v(i)∈Ak∑[y(i)]T{j=1∑NWv(i)⇔v(j)}y(i)= ∑v(i)∈A1∑j=1NWv(i)⇔v(j)∑v(i)∈A2∑j=1NWv(i)⇔v(j)⋱∑v(i)∈AK∑j=1NWv(i)⇔v(j)) K×K=(y(1),y(2),⋯,y(N))K×N记作D ∑j=1NWv(1)⇔v(j)∑j=1NWv(2)⇔v(j)⋱∑j=1NWv(N)⇔v(j)) N×N y(1)y(2)⋮y(N) N×K=YTDY
继续观察矩阵 D \mathcal D D,可以发现 D \mathcal D D中的每一个元素均可以使用 W \mathcal W W进行表示:
其中
W i \mathcal W_i Wi表示权重矩阵的第
i i i行;
I N × 1 \mathcal I_{N \times 1} IN×1表示元素均为1,并且长度为
N N N的列向量.
-
diag
(
W
⋅
I
N
×
1
)
\text{diag}(\mathcal W \cdot \mathcal I_{N \times 1})
diag(W⋅IN×1)
表示将列向量
W ⋅ I N × 1 \mathcal W \cdot \mathcal I_{N \times 1} W⋅IN×1中的所有元素按顺序放置在矩阵对角线位置,从而构成对角阵 -> 对角转化
{ ∑ j = 1 N W v ( i ) ⇔ v ( j ) = [ W i ] 1 × N ⋅ I N × 1 D = diag ( W ⋅ I N × 1 ) \begin{cases} \sum_{j=1}^N \mathcal W_{v^{(i)} \Leftrightarrow v^{(j)}} = [\mathcal W_i]_{1 \times N} \cdot \mathcal I_{N \times 1} \\ \mathcal D = \text{diag}(\mathcal W \cdot \mathcal I_{N \times 1}) \end{cases} {∑j=1NWv(i)⇔v(j)=[Wi]1×N⋅IN×1D=diag(W⋅IN×1)
矩阵 O \mathcal O O的化简过程
至此,可以使用权重矩阵 W \mathcal W W描述矩阵 P \mathcal P P,继续观察矩阵 O \mathcal O O:
- 首先观察矩阵
O
\mathcal O
O中元素的表达形式,可以将其修改成如下形式:
结点集合补集的定义:
A k ∪ A k ˉ = V \mathcal A_k \cup \bar {\mathcal A_k} = \mathcal V Ak∪Akˉ=V
ψ ( A k , A k ˉ ) = ψ ( A k , V ) − ψ ( A k , A k ) = ∑ v ( i ) ∈ A k ∑ j = 1 N W v ( i ) ⇔ v ( j ) − ∑ v ( i ) ∈ A k ∑ v ( j ) ∈ A k W v ( i ) ⇔ v ( j ) = degree ( A k ) − ψ ( A k , A k ) \begin{aligned} \psi(\mathcal A_k, \bar {\mathcal A_k}) & = \psi(\mathcal A_k,\mathcal V) - \psi(\mathcal A_k,\mathcal A_k) \\ & = \sum_{v^{(i)} \in \mathcal A_k} \sum_{j=1}^N \mathcal W_{v^{(i)} \Leftrightarrow v^{(j)}} - \sum_{v^{(i)} \in \mathcal A_k} \sum_{v^{(j)} \in \mathcal A_k} \mathcal W_{v^{(i)} \Leftrightarrow v^{(j)}} \\ & = \text{degree}(\mathcal A_k) - \psi(\mathcal A_k,\mathcal A_k) \end{aligned} ψ(Ak,Akˉ)=ψ(Ak,V)−ψ(Ak,Ak)=v(i)∈Ak∑j=1∑NWv(i)⇔v(j)−v(i)∈Ak∑v(j)∈Ak∑Wv(i)⇔v(j)=degree(Ak)−ψ(Ak,Ak) - 对应的矩阵
O
\mathcal O
O可表示成如下形式:
O = [ ψ ( A 1 , A 1 ˉ ) ψ ( A 2 , A 2 ˉ ) ⋱ ψ ( A K , A K ˉ ) ] K × K = [ degree ( A 1 ) degree ( A 2 ) ⋱ degree ( A K ) ] K × K − [ ψ ( A 1 , A 1 ) ψ ( A 2 , A 2 ) ⋱ ψ ( A K , A K ) ] \begin{aligned} \mathcal O & = \begin{bmatrix} \psi(\mathcal A_1, \bar {\mathcal A_1}) & & & \\ & \psi(\mathcal A_2, \bar {\mathcal A_2}) & & \\ & & \ddots &\\ & & & \psi(\mathcal A_{\mathcal K}, \bar {\mathcal A_{\mathcal K}}) \end{bmatrix}_{\mathcal K \times \mathcal K} \\ & = \begin{bmatrix} \text{degree}(\mathcal A_1) & & & \\ & \text{degree}(\mathcal A_2) & & \\ & & \ddots &\\ & & & \text{degree}(\mathcal A_{\mathcal K}) \end{bmatrix}_{\mathcal K \times \mathcal K} - \begin{bmatrix} \psi(\mathcal A_1,\mathcal A_1) & & & \\ & \psi(\mathcal A_2,\mathcal A_2) & & \\ & & \ddots &\\ & & & \psi(\mathcal A_{\mathcal K},\mathcal A_{\mathcal K}) \end{bmatrix} \end{aligned} O= ψ(A1,A1ˉ)ψ(A2,A2ˉ)⋱ψ(AK,AKˉ) K×K= degree(A1)degree(A2)⋱degree(AK) K×K− ψ(A1,A1)ψ(A2,A2)⋱ψ(AK,AK)
第一项我们认识,它就是
Y
T
D
Y
\mathcal Y^T\mathcal D\mathcal Y
YTDY。在观察第二项之前,先认识一个
K
×
K
\mathcal K \times \mathcal K
K×K的矩阵:
Y
T
W
Y
\mathcal Y^T \mathcal W \mathcal Y
YTWY。先将
Y
T
W
Y
\mathcal Y^T\mathcal W \mathcal Y
YTWY展开,其结果表示如下:
Y
T
W
Y
=
(
y
(
1
)
,
y
(
2
)
,
⋯
,
y
(
N
)
)
K
×
N
[
W
v
(
1
)
⇔
v
(
1
)
,
W
v
(
1
)
⇔
v
(
2
)
,
⋯
,
W
v
(
1
)
⇔
v
(
N
)
W
v
(
2
)
⇔
v
(
1
)
,
W
v
(
2
)
⇔
v
(
2
)
,
⋯
,
W
v
(
2
)
⇔
v
(
N
)
⋮
W
v
(
N
)
⇔
v
(
1
)
,
W
v
(
N
)
⇔
v
(
2
)
,
⋯
,
W
v
(
N
)
⇔
v
(
N
)
]
N
×
N
(
y
(
1
)
y
(
2
)
⋮
y
(
N
)
)
N
×
K
=
[
∑
i
=
1
N
y
(
i
)
⋅
W
v
(
i
)
⇔
v
(
1
)
⋯
,
∑
i
=
1
N
y
(
i
)
⋅
W
v
(
i
)
⇔
v
(
N
)
]
K
×
N
(
y
(
1
)
y
(
2
)
⋮
y
(
N
)
)
N
×
K
=
∑
i
=
1
N
∑
j
=
1
N
{
y
(
i
)
[
y
(
j
)
]
T
⋅
W
v
(
i
)
⇔
v
(
j
)
⏟
实数
}
K
×
K
\begin{aligned} \mathcal Y^T\mathcal W \mathcal Y & = \left(y^{(1)},y^{(2)},\cdots,y^{(N)}\right)_{\mathcal K \times N}\begin{bmatrix} \mathcal W_{v^{(1)} \Leftrightarrow v^{(1)}},\mathcal W_{v^{(1)} \Leftrightarrow v^{(2)}},\cdots,\mathcal W_{v^{(1)} \Leftrightarrow v^{(N)}} \\ \mathcal W_{v^{(2)} \Leftrightarrow v^{(1)}},\mathcal W_{v^{(2)} \Leftrightarrow v^{(2)}},\cdots,\mathcal W_{v^{(2)} \Leftrightarrow v^{(N)}} \\ \vdots \\ \mathcal W_{v^{(N)} \Leftrightarrow v^{(1)}},\mathcal W_{v^{(N)} \Leftrightarrow v^{(2)}},\cdots,\mathcal W_{v^{(N)} \Leftrightarrow v^{(N)}} \\ \end{bmatrix}_{N \times N} \begin{pmatrix} y^{(1)} \\ y^{(2)}\\ \vdots \\ y^{(N)} \end{pmatrix}_{N \times \mathcal K} \\ & = \left[\sum_{i=1}^N y^{(i)} \cdot \mathcal W_{v^{(i)} \Leftrightarrow v^{(1)}} \cdots,\sum_{i=1}^N y^{(i)} \cdot \mathcal W_{v^{(i)} \Leftrightarrow v^{(N)}}\right]_{\mathcal K \times N} \begin{pmatrix} y^{(1)} \\ y^{(2)}\\ \vdots \\ y^{(N)} \end{pmatrix}_{N \times \mathcal K}\\ & = \sum_{i=1}^{N} \sum_{j=1}^N \left\{y^{(i)} [y^{(j)}]^T \cdot \underbrace{\mathcal W_{v^{(i)} \Leftrightarrow v^{(j)}}}_{实数}\right\}_{\mathcal K \times \mathcal K} \\ \end{aligned}
YTWY=(y(1),y(2),⋯,y(N))K×N
Wv(1)⇔v(1),Wv(1)⇔v(2),⋯,Wv(1)⇔v(N)Wv(2)⇔v(1),Wv(2)⇔v(2),⋯,Wv(2)⇔v(N)⋮Wv(N)⇔v(1),Wv(N)⇔v(2),⋯,Wv(N)⇔v(N)
N×N
y(1)y(2)⋮y(N)
N×K=[i=1∑Ny(i)⋅Wv(i)⇔v(1)⋯,i=1∑Ny(i)⋅Wv(i)⇔v(N)]K×N
y(1)y(2)⋮y(N)
N×K=i=1∑Nj=1∑N⎩
⎨
⎧y(i)[y(j)]T⋅实数
Wv(i)⇔v(j)⎭
⎬
⎫K×K
如果将上式的
K
×
K
\mathcal K \times \mathcal K
K×K展开,可以表示成如下形式:
这里个人出现一个误区:
{
y
(
i
)
[
y
(
j
)
]
T
}
K
×
K
\{y^{(i)}[y^{(j)}]^T\}_{\mathcal K \times \mathcal K}
{y(i)[y(j)]T}K×K在
i
,
j
i,j
i,j不属于同一个结点集合时,其矩阵结果并不是全零元素。而是存在一个
1
1
1元素。示例:如果
v
(
i
)
∈
A
1
,
v
(
j
)
∈
A
K
v^{(i)}\in\mathcal A_{1},v^{(j)} \in \mathcal A_{\mathcal K}
v(i)∈A1,v(j)∈AK,那么对应的
{
y
(
i
)
[
y
(
j
)
]
T
}
K
×
K
\{y^{(i)}[y^{(j)}]^T\}_{\mathcal K \times \mathcal K}
{y(i)[y(j)]T}K×K应该是这个的样子:
v
(
i
)
∈
A
1
,
v
(
j
)
∈
A
K
⇒
{
y
(
i
)
[
y
(
j
)
]
T
}
K
×
K
=
(
0
0
⋯
0
0
0
⋯
0
⋮
⋮
⋱
⋮
1
0
⋯
0
)
K
×
K
Y
T
W
Y
=
(
∑
v
(
i
)
∈
A
1
∑
v
(
j
)
∈
A
1
W
v
(
i
)
⇔
v
(
j
)
∑
v
(
i
)
∈
A
1
∑
v
(
j
)
∈
A
2
W
v
(
i
)
⇔
v
(
j
)
⋯
∑
v
(
i
)
∈
A
1
∑
v
(
j
)
∈
A
K
W
v
(
i
)
⇔
v
(
j
)
∑
v
(
i
)
∈
A
2
∑
v
(
j
)
∈
A
1
W
v
(
i
)
⇔
v
(
j
)
∑
v
(
i
)
∈
A
2
∑
v
(
j
)
∈
A
2
W
v
(
i
)
⇔
v
(
j
)
⋯
∑
v
(
i
)
∈
A
2
∑
v
(
j
)
∈
A
K
W
v
(
i
)
⇔
v
(
j
)
⋮
⋮
⋱
⋮
∑
v
(
i
)
∈
A
K
∑
v
(
j
)
∈
A
1
W
v
(
i
)
⇔
v
(
j
)
∑
v
(
i
)
∈
A
K
∑
v
(
j
)
∈
A
2
W
v
(
i
)
⇔
v
(
j
)
⋯
∑
v
(
i
)
∈
A
K
∑
v
(
j
)
∈
A
K
W
v
(
i
)
⇔
v
(
j
)
)
K
×
K
=
[
ψ
(
A
1
,
A
1
)
ψ
(
A
1
,
A
2
)
⋯
ψ
(
A
1
,
A
K
)
ψ
(
A
2
,
A
1
)
ψ
(
A
2
,
A
2
)
⋯
ψ
(
A
2
,
A
K
)
⋮
⋮
⋱
⋮
ψ
(
A
K
,
A
1
)
ψ
(
A
K
,
A
2
)
⋯
ψ
(
A
K
,
A
K
)
]
K
×
K
\begin{aligned} & v^{(i)}\in\mathcal A_{1},v^{(j)} \in \mathcal A_{\mathcal K} \Rightarrow \{y^{(i)}[y^{(j)}]^T\}_{\mathcal K \times \mathcal K} = \begin{pmatrix} 0 & 0 &\cdots &0 \\ 0 & 0 & \cdots &0 \\ \vdots & \vdots & \ddots & \vdots\\ 1 & 0 & \cdots & 0 \end{pmatrix}_{\mathcal K \times \mathcal K} \\ \mathcal Y^T\mathcal W \mathcal Y & = \begin{pmatrix} \sum_{v^{(i)} \in \mathcal A_1}\sum_{v^{(j)} \in \mathcal A_1 \mathcal W_{v^{(i)} \Leftrightarrow v^{(j)}}} & \sum_{v^{(i)} \in \mathcal A_1}\sum_{v^{(j)} \in \mathcal A_2 \mathcal W_{v^{(i)} \Leftrightarrow v^{(j)}}} &\cdots &\sum_{v^{(i)} \in \mathcal A_1}\sum_{v^{(j)} \in \mathcal A_{\mathcal K} \mathcal W_{v^{(i)} \Leftrightarrow v^{(j)}}} \\ \sum_{v^{(i)} \in \mathcal A_2}\sum_{v^{(j)} \in \mathcal A_1 \mathcal W_{v^{(i)} \Leftrightarrow v^{(j)}}} & \sum_{v^{(i)} \in \mathcal A_2}\sum_{v^{(j)} \in \mathcal A_2 \mathcal W_{v^{(i)} \Leftrightarrow v^{(j)}}} & \cdots & \sum_{v^{(i)} \in \mathcal A_2}\sum_{v^{(j)} \in \mathcal A_{\mathcal K} \mathcal W_{v^{(i)} \Leftrightarrow v^{(j)}}} \\ \vdots & \vdots & \ddots & \vdots\\ \sum_{v^{(i)} \in \mathcal A_{\mathcal K}}\sum_{v^{(j)} \in \mathcal A_1 \mathcal W_{v^{(i)} \Leftrightarrow v^{(j)}}} & \sum_{v^{(i)} \in \mathcal A_{\mathcal K}}\sum_{v^{(j)} \in \mathcal A_2 \mathcal W_{v^{(i)} \Leftrightarrow v^{(j)}}} & \cdots & \sum_{v^{(i)} \in \mathcal A_{\mathcal K}}\sum_{v^{(j)} \in \mathcal A_{\mathcal K} \mathcal W_{v^{(i)} \Leftrightarrow v^{(j)}}} \end{pmatrix}_{\mathcal K \times \mathcal K} \\ & = \begin{bmatrix} \psi(\mathcal A_1,\mathcal A_1) & \psi(\mathcal A_1,\mathcal A_2) &\cdots &\psi(\mathcal A_1,\mathcal A_{\mathcal K}) \\ \psi(\mathcal A_2,\mathcal A_1) & \psi(\mathcal A_2,\mathcal A_2) & \cdots &\psi(\mathcal A_2,\mathcal A_{\mathcal K}) \\ \vdots & \vdots & \ddots & \vdots\\ \psi(\mathcal A_{\mathcal K},\mathcal A_1) & \psi(\mathcal A_{\mathcal K},\mathcal A_2) & \cdots & \psi(\mathcal A_{\mathcal K},\mathcal A_{\mathcal K}) \end{bmatrix}_{\mathcal K \times \mathcal K} \end{aligned}
YTWYv(i)∈A1,v(j)∈AK⇒{y(i)[y(j)]T}K×K=
00⋮100⋮0⋯⋯⋱⋯00⋮0
K×K=
∑v(i)∈A1∑v(j)∈A1Wv(i)⇔v(j)∑v(i)∈A2∑v(j)∈A1Wv(i)⇔v(j)⋮∑v(i)∈AK∑v(j)∈A1Wv(i)⇔v(j)∑v(i)∈A1∑v(j)∈A2Wv(i)⇔v(j)∑v(i)∈A2∑v(j)∈A2Wv(i)⇔v(j)⋮∑v(i)∈AK∑v(j)∈A2Wv(i)⇔v(j)⋯⋯⋱⋯∑v(i)∈A1∑v(j)∈AKWv(i)⇔v(j)∑v(i)∈A2∑v(j)∈AKWv(i)⇔v(j)⋮∑v(i)∈AK∑v(j)∈AKWv(i)⇔v(j)
K×K=
ψ(A1,A1)ψ(A2,A1)⋮ψ(AK,A1)ψ(A1,A2)ψ(A2,A2)⋮ψ(AK,A2)⋯⋯⋱⋯ψ(A1,AK)ψ(A2,AK)⋮ψ(AK,AK)
K×K
很明显,
Y
T
W
Y
\mathcal Y^T\mathcal W\mathcal Y
YTWY和
O
\mathcal O
O中的第二项是不同的。但如果将
Y
T
W
Y
\mathcal Y^T\mathcal W\mathcal Y
YTWY替代第二项,记作
O
′
\mathcal O'
O′。由于要求解的是
tr
(
O
⋅
P
−
1
)
\text{tr}(\mathcal O \cdot \mathcal P^{-1})
tr(O⋅P−1),但是
P
\mathcal P
P是对角矩阵(
P
−
1
\mathcal P^{-1}
P−1自然也是对角矩阵),那么会出现这样的现象:
其中
O
′
\mathcal O'
O′和对角阵
P
−
1
\mathcal P^{-1}
P−1相乘,它仅影响对角线上的元素,对其他位置的元素虽然也影响,但其他位置我们并不关心,因为我们只关心
tr
(
O
′
P
−
1
)
\text{tr}(\mathcal O' \mathcal P^{-1})
tr(O′P−1),也就是它主对角线上的元素和。
O
′
=
Y
T
D
Y
−
Y
T
W
Y
=
[
ψ
(
A
1
,
A
1
ˉ
)
−
ψ
(
A
1
,
A
2
)
⋯
−
ψ
(
A
1
,
A
K
)
−
ψ
(
A
2
,
A
1
)
ψ
(
A
2
,
A
2
ˉ
)
⋯
−
ψ
(
A
2
,
A
K
)
⋮
⋮
⋱
⋮
−
ψ
(
A
K
,
A
1
)
−
ψ
(
A
K
,
A
2
)
⋯
ψ
(
A
K
,
A
K
ˉ
)
]
K
×
K
⇒
tr
(
O
′
P
−
1
)
=
tr
(
O
P
−
1
)
\begin{aligned} \mathcal O' & = \mathcal Y^T\mathcal D \mathcal Y - \mathcal Y^T\mathcal W \mathcal Y \\ & = \begin{bmatrix} \psi(\mathcal A_1, \bar {\mathcal A_1}) & -\psi(\mathcal A_1,\mathcal A_2) & \cdots & -\psi(\mathcal A_1,\mathcal A_{\mathcal K}) \\ -\psi(\mathcal A_2,\mathcal A_1) & \psi(\mathcal A_2, \bar {\mathcal A_2}) & \cdots & -\psi(\mathcal A_2,\mathcal A_{\mathcal K})\\ \vdots & \vdots & \ddots & \vdots \\ -\psi(\mathcal A_{\mathcal K},\mathcal A_1) & -\psi(\mathcal A_{\mathcal K},\mathcal A_2) & \cdots & \psi(\mathcal A_{\mathcal K}, \bar {\mathcal A_{\mathcal K}}) \end{bmatrix}_{\mathcal K \times \mathcal K} \\ & \Rightarrow \text{tr}(\mathcal O'\mathcal P^{-1}) = \text{tr}(\mathcal O\mathcal P^{-1}) \end{aligned}
O′=YTDY−YTWY=
ψ(A1,A1ˉ)−ψ(A2,A1)⋮−ψ(AK,A1)−ψ(A1,A2)ψ(A2,A2ˉ)⋮−ψ(AK,A2)⋯⋯⋱⋯−ψ(A1,AK)−ψ(A2,AK)⋮ψ(AK,AKˉ)
K×K⇒tr(O′P−1)=tr(OP−1)
目标函数整理与拉普拉斯矩阵
至此,
O
,
P
\mathcal O,\mathcal P
O,P都可以使用
W
,
Y
\mathcal W,\mathcal Y
W,Y进行表示。关于目标函数可转化至如下形式:
矩阵的逆应该放在外面,上面也是,latex用的不好~,见笑。
Y
^
=
arg
min
Y
∑
k
=
1
K
ψ
(
A
k
,
A
k
ˉ
)
degree
(
A
k
)
=
arg
min
Y
tr
(
O
P
−
1
)
=
arg
min
Y
tr
(
O
′
P
−
1
)
=
arg
min
Y
tr
[
Y
T
(
D
−
W
)
Y
⏟
O
′
⋅
(
Y
T
D
Y
)
−
1
⏟
P
]
\begin{aligned} \hat {\mathcal Y} & = \mathop{\arg\min}\limits_{\mathcal Y}\sum_{k=1}^{\mathcal K} \frac{\psi(\mathcal A_k,\bar {\mathcal A_k})}{\text{degree}(\mathcal A_k)} \\ & = \mathop{\arg\min}\limits_{\mathcal Y} \text{tr}(\mathcal O \mathcal P^{-1}) \\ & = \mathop{\arg\min}\limits_{\mathcal Y} \text{tr}(\mathcal O' \mathcal P^{-1}) \\ & = \mathop{\arg\min}\limits_{\mathcal Y} \text{tr} \left[\underbrace{\mathcal Y^T(\mathcal D - \mathcal W) \mathcal Y}_{\mathcal O'} \cdot \underbrace{\left(\mathcal Y^T \mathcal D \mathcal Y\right)^{-1}}_{\mathcal P}\right] \end{aligned}
Y^=Yargmink=1∑Kdegree(Ak)ψ(Ak,Akˉ)=Yargmintr(OP−1)=Yargmintr(O′P−1)=Yargmintr
O′
YT(D−W)Y⋅P
(YTDY)−1
通常令
L
=
D
−
W
\mathcal L = \mathcal D - \mathcal W
L=D−W,其中
L
\mathcal L
L也被称作拉普拉斯矩阵(
Laplacian Matrix
\text{Laplacian Matrix}
Laplacian Matrix)
相关参考:
机器学习-谱聚类(3)-模型的矩阵形式-Indicator Vector
机器学习-谱聚类(4)-模型的矩阵形式-对角矩阵
机器学习-谱聚类(5)-模型的矩阵形式-Laplacian Matrix
相关文章
- 神经网络与机器学习 笔记—LMS(最小均方算法)和学习率退火
- 神经网络与机器学习 笔记—Rosenblatt感知器收敛算法C++实现
- [吴恩达机器学习笔记]16推荐系统3-4协同过滤算法
- [吴恩达机器学习笔记]16推荐系统1-2基于内容的推荐系统
- [吴恩达机器学习笔记]15.1-3非监督学习异常检测算法/高斯回回归模型
- [吴恩达机器学习笔记]14降维3-4PCA算法原理
- 机器学习笔记之近似推断(二)推断的核心思想
- 机器学习笔记之高斯网络(一)基本介绍
- 机器学习笔记之降维(二)样本均值与样本方差的矩阵表示
- 机器学习笔记之马尔可夫链蒙特卡洛方法(一)蒙特卡洛方法介绍
- 机器学习笔记之隐马尔可夫模型(七)总结部分
- 机器学习笔记之EM算法(四)广义EM
- 机器学习笔记之指数族分布——充分统计量与模型参数的关系
- Andrew Ng机器学习公开课笔记 -- Logistic Regression
- Andrew Ng机器学习公开课笔记 -- Generative Learning algorithms
- 【吴恩达机器学习】第五周课程精简笔记——代价函数和反向传播
- 汇丰银行:为什么机器学习正在加速云计算的采用