zl程序教程

您现在的位置是:首页 >  硬件

当前栏目

机器学习笔记之玻尔兹曼机(三)基于平均场理论变分推断的梯度求解(续)

机器笔记学习 基于 理论 求解 梯度 平均
2023-09-11 14:15:53 时间

机器学习笔记之玻尔兹曼机——基于平均场推断梯度求解[续]

引言

基于玻尔兹曼机(三)梯度求解(基于平均场理论的变分推断)的思路继续求解 Λ 3 \Lambda_3 Λ3的梯度。

Λ 3 \Lambda_3 Λ3梯度求解

Λ 3 \Lambda_3 Λ3进行化简。其就是一个的形式:
Λ 3 = − ∑ h ( i ) Q ( h ( i ) ∣ v ( i ) ; ϕ ) log ⁡ Q ( h ( i ) ∣ v ( i ) ; ϕ ) \Lambda_3 = - \sum_{h^{(i)}} \mathcal Q(h^{(i)} \mid v^{(i)};\phi) \log \mathcal Q(h^{(i)} \mid v^{(i)};\phi) Λ3=h(i)Q(h(i)v(i);ϕ)logQ(h(i)v(i);ϕ)
由于平均场假设作为条件,将上式进行展开:
这里暂时先将负号带回去了~
Λ 3 = ∑ h ( i ) { ∏ j = 1 P Q ( h j ( i ) ∣ v ( i ) ; ϕ ) log ⁡ ∏ j = 1 P [ 1 Q ( h j ( i ) ∣ v ( i ) ; ϕ ) ] } = ∑ h ( i ) { ∏ j = 1 P Q ( h j ( i ) ∣ v ( i ) ; ϕ ) ∑ j = 1 P log ⁡ [ 1 Q ( h j ( i ) ∣ v ( i ) ; ϕ ) ] } \begin{aligned} \Lambda_3 & = \sum_{h^{(i)}} \left\{\prod_{j=1}^{\mathcal P} \mathcal Q(h_j^{(i)} \mid v^{(i)};\phi) \log \prod_{j=1}^{\mathcal P} \left[\frac{1}{\mathcal Q(h_j^{(i)} \mid v^{(i)};\phi)}\right]\right\} \\ & = \sum_{h^{(i)}} \left\{\prod_{j=1}^{\mathcal P} \mathcal Q(h_j^{(i)} \mid v^{(i)};\phi) \sum_{j=1}^{\mathcal P}\log \left[\frac{1}{\mathcal Q(h_j^{(i)} \mid v^{(i)};\phi)}\right]\right\} \end{aligned} Λ3=h(i){j=1PQ(hj(i)v(i);ϕ)logj=1P[Q(hj(i)v(i);ϕ)1]}=h(i){j=1PQ(hj(i)v(i);ϕ)j=1Plog[Q(hj(i)v(i);ϕ)1]}
继续将 ∑ h ( i ) = ∑ h 1 ( i ) , h 2 ( i ) , ⋯   , h P ( i ) \sum_{h^{(i)}} = \sum_{h_1^{(i)},h_2^{(i)},\cdots,h_{\mathcal P}^{(i)}} h(i)=h1(i),h2(i),,hP(i)展开,并与相关项进行归纳
Λ 3 = ∑ j = 1 P ∑ h j ( i ) Q ( h j ( i ) ∣ v ( i ) ; ϕ ) ⋅ log ⁡ [ 1 Q ( h j ( i ) ∣ v ( i ) ; ϕ ) ] \Lambda_3 = \sum_{j=1}^{\mathcal P} \sum_{h_j^{(i)}} \mathcal Q(h_j^{(i)} \mid v^{(i)};\phi) \cdot \log \left[\frac{1}{\mathcal Q(h_j^{(i)} \mid v^{(i)};\phi)}\right] Λ3=j=1Phj(i)Q(hj(i)v(i);ϕ)log[Q(hj(i)v(i);ϕ)1]
继续观察 ∑ h j ( i ) Q ( h j ( i ) ∣ v ( i ) ; ϕ ) ⋅ log ⁡ [ 1 Q ( h j ( i ) ∣ v ( i ) ; ϕ ) ] \sum_{h_j^{(i)}} \mathcal Q(h_j^{(i)} \mid v^{(i)};\phi) \cdot \log \left[\frac{1}{\mathcal Q(h_j^{(i)} \mid v^{(i)};\phi)}\right] hj(i)Q(hj(i)v(i);ϕ)log[Q(hj(i)v(i);ϕ)1],基于 h j ( i ) h_j^{(i)} hj(i)伯努利分布,该式可表示为如下形式:
Q ( h j ( i ) = 1 ∣ v ( i ) ; ϕ ) ⋅ l o g [ 1 Q ( h j ( i ) = 1 ∣ v ( i ) ; ϕ ) ] + Q ( h j ( i ) = 0 ∣ v ( i ) ; ϕ ) ⋅ l o g [ 1 Q ( h j ( i ) = 0 ∣ v ( i ) ; ϕ ) ] = ϕ j ⋅ log ⁡ [ 1 ϕ j ] + ( 1 − ϕ j ) log ⁡ [ 1 1 − ϕ j ] \begin{aligned} & \quad \mathcal Q(h_j^{(i)}=1 \mid v^{(i)};\phi) \cdot log \left[\frac{1}{\mathcal Q(h_j^{(i)}=1 \mid v^{(i)};\phi)}\right] + \mathcal Q(h_j^{(i)}=0 \mid v^{(i)};\phi) \cdot log \left[\frac{1}{\mathcal Q(h_j^{(i)}=0 \mid v^{(i)};\phi)}\right] \\ & = \phi_j \cdot \log \left[\frac{1}{\phi_j}\right] + (1 - \phi_j) \log \left[\frac{1}{1 - \phi_j}\right] \end{aligned} Q(hj(i)=1v(i);ϕ)log[Q(hj(i)=1v(i);ϕ)1]+Q(hj(i)=0v(i);ϕ)log[Q(hj(i)=0v(i);ϕ)1]=ϕjlog[ϕj1]+(1ϕj)log[1ϕj1]
至此, Λ 3 \Lambda_3 Λ3可表示为:
Λ 3 = ∑ j = 1 P { ϕ j ⋅ log ⁡ [ 1 ϕ j ] + ( 1 − ϕ j ) log ⁡ [ 1 1 − ϕ j ] } \Lambda_3 = \sum_{j=1}^{\mathcal P}\left\{\phi_j \cdot \log \left[\frac{1}{\phi_j}\right] + (1 - \phi_j) \log \left[\frac{1}{1 - \phi_j}\right]\right\} Λ3=j=1P{ϕjlog[ϕj1]+(1ϕj)log[1ϕj1]}

求解最优参数 ϕ ^ j \hat {\phi}_j ϕ^j

至此, Λ 1 , Λ 2 , Λ 3 \Lambda_1,\Lambda_2,\Lambda_3 Λ1,Λ2,Λ3全部化简完毕,将这三项分别对 ϕ j \phi_j ϕj求偏导。具体结果如下:
{ ∂ ∂ ϕ j [ ∑ i = 1 D ∑ l = 1 P ϕ l ⋅ v i ( i ) ⋅ W i l ] = ∑ i = 1 D v i ( i ) ⋅ W i j ∂ ∂ ϕ j [ ∑ j = 1 P ∑ l ≠ j P ϕ i ⋅ ϕ l ⋅ J i l ] = ∑ l ≠ j P ϕ l ⋅ J i l ∂ ∂ ϕ j [ ∑ j = 1 P { ϕ j ⋅ log ⁡ [ 1 ϕ j ] + ( 1 − ϕ j ) log ⁡ [ 1 1 − ϕ j ] } ] = − log ⁡ ϕ j 1 − ϕ j \begin{cases} \frac{\partial}{\partial \phi_j} \left[\sum_{i=1}^{\mathcal D}\sum_{l=1}^{\mathcal P} \phi_l \cdot v_i^{(i)} \cdot \mathcal W_{il}\right] = \sum_{i=1}^{\mathcal D} v_i^{(i)} \cdot \mathcal W_{ij} \\ \frac{\partial}{\partial \phi_j} \left[\sum_{j=1}^{\mathcal P}\sum_{l\neq j}^{\mathcal P} \phi_i \cdot \phi_l \cdot \mathcal J_{il}\right] = \sum_{l \neq j}^{\mathcal P} \phi_l \cdot \mathcal J_{il} \\ \frac{\partial}{\partial \phi_j} [\sum_{j=1}^{\mathcal P}\left\{\phi_j \cdot \log \left[\frac{1}{\phi_j}\right] + (1 - \phi_j) \log \left[\frac{1}{1 - \phi_j}\right]\right\}] = -\log \frac{\phi_j}{1 - \phi_j} \end{cases} ϕj[i=1Dl=1Pϕlvi(i)Wil]=i=1Dvi(i)Wijϕj[j=1Pl=jPϕiϕlJil]=l=jPϕlJilϕj[j=1P{ϕjlog[ϕj1]+(1ϕj)log[1ϕj1]}]=log1ϕjϕj
三项之和为0,求解 ϕ j \phi_j ϕj
∂ ∂ ϕ j [ Λ 1 + Λ 2 + Λ 3 ] ≜ 0 ⇔ ∑ i = 1 D v i ( i ) ⋅ W i j + ∑ l ≠ j P ϕ l ⋅ J i l − log ⁡ ϕ j 1 − ϕ j = 0 \frac{\partial}{\partial \phi_j} [\Lambda_1 + \Lambda_2 + \Lambda_3] \triangleq 0 \Leftrightarrow \sum_{i=1}^{\mathcal D} v_i^{(i)} \cdot \mathcal W_{ij} + \sum_{l \neq j}^{\mathcal P} \phi_l \cdot \mathcal J_{il} - \log \frac{\phi_j}{1 - \phi_j} = 0 ϕj[Λ1+Λ2+Λ3]0i=1Dvi(i)Wij+l=jPϕlJillog1ϕjϕj=0
因而有:
ϕ j [ 1 + exp ⁡ ( ∑ i = 1 D v i ( i ) ⋅ W i j + ∑ l ≠ j P ϕ l ⋅ J i l ) ] = exp ⁡ ( ∑ i = 1 D v i ( i ) ⋅ W i j + ∑ l ≠ j P ϕ l ⋅ J i l ) ⇒ ϕ j = 1 1 + exp ⁡ ( ∑ i = 1 D v i ( i ) ⋅ W i j + ∑ l ≠ j P ϕ l ⋅ J i l ) = Sigmoid [ ∑ i = 1 D v i ( i ) ⋅ W i j + ∑ l ≠ j P ϕ l ⋅ J i l ] \phi_j \left[1 + \exp \left(\sum_{i=1}^{\mathcal D} v_i^{(i)} \cdot \mathcal W_{ij} + \sum_{l \neq j}^{\mathcal P} \phi_l \cdot \mathcal J_{il}\right)\right] = \exp \left(\sum_{i=1}^{\mathcal D} v_i^{(i)} \cdot \mathcal W_{ij} + \sum_{l \neq j}^{\mathcal P} \phi_l \cdot \mathcal J_{il}\right) \\ \begin{aligned} \Rightarrow \phi_j & = \frac{1}{1 + \exp \left(\sum_{i=1}^{\mathcal D} v_i^{(i)} \cdot \mathcal W_{ij} + \sum_{l \neq j}^{\mathcal P} \phi_l \cdot \mathcal J_{il}\right)} \\ & = \text{Sigmoid} \left[\sum_{i=1}^{\mathcal D} v_i^{(i)} \cdot \mathcal W_{ij} + \sum_{l \neq j}^{\mathcal P} \phi_l \cdot \mathcal J_{il}\right] \end{aligned} ϕj 1+exp i=1Dvi(i)Wij+l=jPϕlJil =exp i=1Dvi(i)Wij+l=jPϕlJil ϕj=1+exp(i=1Dvi(i)Wij+l=jPϕlJil)1=Sigmoid i=1Dvi(i)Wij+l=jPϕlJil
很明显,这是一个迭代方程——用 ϕ j \phi_j ϕj的其他结果的线性运算 ∑ l ≠ j P \sum_{l \neq j}^{\mathcal P} l=jP ϕ j \phi_j ϕj进行表示。并且它是一个不动点方程。它的具体求解方式是:在初始状态下给定随机结果,固定住 ϕ j \phi_j ϕj之外的其他项,求解当前迭代步骤下 ϕ j \phi_j ϕj的最优解;以此类推,直到所有的 ϕ j ( j = 1 , 2 , ⋯   , P ) \phi_{j}(j=1,2,\cdots,\mathcal P) ϕj(j=1,2,,P)均固定一遍,此时一次迭代结束,继续迭代下去,基于不动点方程的性质, ϕ j ( j = 1 , 2 , ⋯   , P ) \phi_j(j=1,2,\cdots,\mathcal P) ϕj(j=1,2,,P)均会收敛至某一结果,至此 ϕ = { ϕ 1 , ϕ 2 , ⋯   , ϕ P } \phi = \{\phi_1,\phi_2,\cdots,\phi_{\mathcal P}\} ϕ={ϕ1,ϕ2,,ϕP}可以被近似出来,最终求解 Q ( h ( i ) ∣ v ( i ) ; ϕ ) \mathcal Q(h^{(i)} \mid v^{(i)};\phi) Q(h(i)v(i);ϕ)的分布结果,并最终替代 P m o d e l ( h ( i ) ∣ v ( i ) ; θ ) \mathcal P_{model}(h^{(i)} \mid v^{(i)};\theta) Pmodel(h(i)v(i);θ)对模型参数的梯度进行描述
这实际上就是‘坐标上升思想’。

返回要求解的模型参数梯度:
{ ∇ W = η ( E P d a t a [ v ( i ) ( h ( i ) ) T ] − E P m o d e l [ v ( i ) ( h ( i ) ) T ] ) P d a t a ⇒ P d a t a ( v ( i ) ∈ V ) ⋅ P m o d e l ( h ( i ) ∣ v ( i ) ) P m o d e l = P m o d e l ( v ( i ) , h ( i ) ) \begin{cases} \nabla_{\mathcal W} = \eta \left(\mathbb E_{\mathcal P_{data}} \left[v^{(i)}(h^{(i)})^T\right] - \mathbb E_{\mathcal P_{model}} \left[v^{(i)}(h^{(i)})^T\right]\right)\\ \mathcal P_{data} \Rightarrow \mathcal P_{data}(v^{(i)} \in \mathcal V) \cdot \mathcal P_{model}(h^{(i)} \mid v^{(i)}) \\ \mathcal P_{model} = \mathcal P_{model}(v^{(i)},h^{(i)}) \end{cases} W=η(EPdata[v(i)(h(i))T]EPmodel[v(i)(h(i))T])PdataPdata(v(i)V)Pmodel(h(i)v(i))Pmodel=Pmodel(v(i),h(i))
此时关于隐变量的后验概率 P m o d e l \mathcal P_{model} Pmodel并不需要使用MCMC采样方法进行求解,通过变分推断的方式也可以进行求解。
关于负相的处理方式,依然需要使用采样方法,随着技术的迭代,采样方式也得到了更新,例如对比散度方法,同样可以加快采样速度。

至此,关于玻尔兹曼机的介绍到此结束。

相关参考:
(系列二十八)玻尔兹曼机7-平均场推断3