Derivative of the Cross-Entropy Loss

Date: 2023-09-14 09:15:49


1. The input is a vector $z=[z_{1},z_{2},...,z_{n}]$ of dimension (1,n). The output is $s=[\frac{e^{z_{1}}}{\sum_{k=1}^{n}e^{z_{k}}},\frac{e^{z_{2}}}{\sum_{k=1}^{n}e^{z_{k}}},...,\frac{e^{z_{n}}}{\sum_{k=1}^{n}e^{z_{k}}}]$, also of dimension (1,n).
2. After the softmax function, each component is $s_{i}=\frac{e^{z_{i}}}{\sum_{k=1}^{n}e^{z_{k}}}$.
3. The Softmax Loss is defined as L, $L=-\sum_{i=1}^{n}y_{i}\ln\left(s_{i}\right)$. L is a scalar, of dimension (1,1).
Here the vector y is the model's label. It also has dimension (1,n), is a known quantity, and is usually in one-hot form.
Assume the j-th class is the correct one. Then y=[0,0,…,1,…,0]: only $y_{j}=1$, and every other $y_{i}=0$ ($i\neq j$), so
$L=-y_{j}\ln\left(s_{j}\right)=-\ln\left(s_{j}\right)$
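The forward pass described above can be sketched in plain Python. This is only an illustration under stated assumptions: the function names `softmax` and `cross_entropy` and the example values of `z` and `y` are not from the original post, and subtracting `max(z)` before exponentiating is a common numerical-stability trick added here (it leaves the softmax output unchanged).

```python
import math

def softmax(z):
    """s_i = exp(z_i) / sum_k exp(z_k), computed on shifted inputs for stability."""
    m = max(z)                              # shift; softmax is invariant to it
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(s, y):
    """L = -sum_i y_i * ln(s_i); terms with y_i == 0 contribute nothing."""
    return -sum(yi * math.log(si) for si, yi in zip(s, y) if yi != 0)

z = [1.0, 2.0, 3.0]   # example logits (arbitrary illustrative values)
y = [0, 0, 1]         # one-hot label; correct class j = 2 (0-based index)

s = softmax(z)
loss = cross_entropy(s, y)

# With a one-hot label, the sum collapses to -ln(s_j).
print(s, loss, -math.log(s[2]))
```

The last line prints the loss twice, once from the full sum and once as $-\ln(s_{j})$, so the two values can be compared directly.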
Our goal is the derivative of the scalar L with respect to the vector z, $\frac{\partial L}{\partial z}$.
By the chain rule, $\frac{\partial L}{\partial z}=\frac{\partial L}{\partial s}\cdot\frac{\partial s}{\partial z}$,
where s and z are both vectors of dimension (1,n).

$\frac{\partial L}{\partial s}=[0,0,...,-\frac{1}{s_{j}},0,...,0]$, dim=[1*n]

$\frac{\partial s}{\partial z}$ is the matrix below, dim=[n*n]. The entry in row i, column k is $\frac{\partial s_{i}}{\partial z_{k}}$: the diagonal entries are $s_{i}(1-s_{i})$ and the off-diagonal entries are $-s_{i}s_{k}$.

$$
\frac{\partial s}{\partial z}=\begin{bmatrix}
s_{1}(1-s_{1}) & -s_{1}s_{2} & -s_{1}s_{3} & \cdots & -s_{1}s_{j} & \cdots & -s_{1}s_{n} \\
-s_{2}s_{1} & s_{2}(1-s_{2}) & -s_{2}s_{3} & \cdots & -s_{2}s_{j} & \cdots & -s_{2}s_{n} \\
-s_{3}s_{1} & -s_{3}s_{2} & s_{3}(1-s_{3}) & \cdots & -s_{3}s_{j} & \cdots & -s_{3}s_{n} \\
\vdots & \vdots & \vdots & \ddots & \vdots & & \vdots \\
-s_{j}s_{1} & -s_{j}s_{2} & -s_{j}s_{3} & \cdots & s_{j}(1-s_{j}) & \cdots & -s_{j}s_{n} \\
\vdots & \vdots & \vdots & & \vdots & \ddots & \vdots \\
-s_{n}s_{1} & -s_{n}s_{2} & -s_{n}s_{3} & \cdots & -s_{n}s_{j} & \cdots & s_{n}(1-s_{n})
\end{bmatrix}
$$
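The entries of this matrix follow from applying the quotient rule to $s_{i}=e^{z_{i}}/\sum_{k}e^{z_{k}}$. As a sanity check, the sketch below builds the matrix from the closed form and compares it with a central finite-difference estimate. The helper names `softmax_jacobian` and `numeric_jacobian`, the test vector `z`, and the step size `eps` are illustrative assumptions, not part of the original post.

```python
import math

def softmax(z):
    """Numerically stable softmax of a list of floats."""
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

def softmax_jacobian(s):
    """Closed form: J[i][k] = ds_i/dz_k = s_i * (delta_ik - s_k)."""
    n = len(s)
    return [[s[i] * ((1.0 if i == k else 0.0) - s[k]) for k in range(n)]
            for i in range(n)]

def numeric_jacobian(z, eps=1e-6):
    """Central finite differences, one input coordinate (column) at a time."""
    n = len(z)
    J = [[0.0] * n for _ in range(n)]
    for k in range(n):
        zp = list(z); zp[k] += eps
        zm = list(z); zm[k] -= eps
        sp, sm = softmax(zp), softmax(zm)
        for i in range(n):
            J[i][k] = (sp[i] - sm[i]) / (2 * eps)
    return J

z = [0.5, -1.0, 2.0, 0.0]          # arbitrary illustrative input
s = softmax(z)
J = softmax_jacobian(s)
J_num = numeric_jacobian(z)

n = len(z)
max_err = max(abs(J[i][k] - J_num[i][k]) for i in range(n) for k in range(n))
print("largest difference between closed form and numeric estimate:", max_err)
```

Two properties are worth noticing in the result: the matrix is symmetric, and each row sums to zero, because the components of s always sum to 1.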

The [1*n] row vector $\frac{\partial L}{\partial s}$ left-multiplies the n*n matrix $\frac{\partial s}{\partial z}$, giving a [1*n] result.
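Because $\frac{\partial L}{\partial s}$ has a single non-zero entry, at position j, the product amounts to scaling row j of the matrix by $-\frac{1}{s_{j}}$. Written out per component (with k as the column index, a notation introduced here for illustration):

$$
\frac{\partial L}{\partial z_{k}}
=\sum_{i=1}^{n}\frac{\partial L}{\partial s_{i}}\frac{\partial s_{i}}{\partial z_{k}}
=-\frac{1}{s_{j}}\frac{\partial s_{j}}{\partial z_{k}}
=\begin{cases}
-\frac{1}{s_{j}}\,s_{j}(1-s_{j})=s_{j}-1, & k=j \\
-\frac{1}{s_{j}}\,(-s_{j}s_{k})=s_{k}, & k\neq j
\end{cases}
$$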

$$
\frac{\partial L}{\partial z}=\frac{\partial L}{\partial s}\cdot\frac{\partial s}{\partial z}=[s_{1},s_{2},...,s_{j}-1,...,s_{n}]=s-y
$$
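The result $s-y$ can be checked numerically in three ways: the closed form, the explicit row-vector-times-matrix product from the chain rule, and a finite-difference estimate of the loss gradient. The sketch below is one way to do that in plain Python. The variable names, the test input, the class index `j`, and the step size `eps` are illustrative assumptions rather than anything taken from the original post.

```python
import math

def softmax(z):
    """Numerically stable softmax of a list of floats."""
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

def loss(z, j):
    """Cross-entropy with a one-hot label at index j: L = -ln(s_j)."""
    return -math.log(softmax(z)[j])

z = [0.5, -1.0, 2.0, 0.0]   # arbitrary illustrative logits
j = 2                       # index of the correct class (0-based)
n = len(z)

s = softmax(z)
y = [1.0 if i == j else 0.0 for i in range(n)]

# 1) Closed form: dL/dz = s - y
grad_closed = [s[i] - y[i] for i in range(n)]

# 2) Chain rule: (1 x n) row vector dL/ds times the (n x n) softmax Jacobian
dL_ds = [-y[i] / s[i] for i in range(n)]          # non-zero only at index j
J = [[s[i] * ((1.0 if i == k else 0.0) - s[k]) for k in range(n)]
     for i in range(n)]
grad_chain = [sum(dL_ds[i] * J[i][k] for i in range(n)) for k in range(n)]

# 3) Central finite differences on the scalar loss
eps = 1e-6
grad_num = []
for k in range(n):
    zp = list(z); zp[k] += eps
    zm = list(z); zm[k] -= eps
    grad_num.append((loss(zp, j) - loss(zm, j)) / (2 * eps))

print("closed form :", grad_closed)
print("chain rule  :", grad_chain)
print("numeric     :", grad_num)
```

In practice only the closed form is needed: the n*n matrix never has to be materialised, which is why the softmax and cross-entropy steps are usually differentiated together.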
