
Automatic gradient computation in Paddle: autograd.backward

2023-09-11 14:15:20

Summary: This article tests autograd.backward in Paddle, along with the gradients of several commonly used functions.

Keywords: gradient

Automatic gradient computation
Table of contents
The backward function
Usage notes
Hands-on tests
More function examples
square
exp
log
Softmax
Summary

 

§01 Automatic gradient computation


  Regarding the paddle.autograd.backward function in Paddle, see the documentation of paddle.autograd.backward.

1.1 The backward function

1.1.1 Usage notes

  Function definition: def backward(tensors, grad_tensors=None, retain_graph=False)

  Computes the backward gradients of the given tensors.

(1) Parameters

    Args:
        tensors(list of Tensors): the tensors for which the gradients are to be computed. The tensors cannot contain the same tensor.
        grad_tensors(list of Tensors or None, optional): the initial gradients of the ``tensors``. If not None, it must have the same length as ``tensors``,
            and if any of the elements is None, then the initial gradient is the default value, which is filled with 1.0.
            If None, all the gradients of the ``tensors`` are the default value, which is filled with 1.0.
            Defaults to None.
        retain_graph(bool, optional): If False, the graph used to compute grads will be freed. If you would
            like to add more ops to the built graph after calling this method ( :code:`backward` ), set the parameter
            :code:`retain_graph` to True, then the grads will be retained. Thus, setting it to False is much more memory-efficient.
            Defaults to False.
    
    Returns:
        NoneType: None

(2) Example

Examples:
    .. code-block:: python
        import paddle
        x = paddle.to_tensor([[1, 2], [3, 4]], dtype='float32', stop_gradient=False)
        y = paddle.to_tensor([[3, 2], [3, 4]], dtype='float32')
        grad_tensor1 = paddle.to_tensor([[1,2], [2, 3]], dtype='float32')
        grad_tensor2 = paddle.to_tensor([[1,1], [1, 1]], dtype='float32')
        z1 = paddle.matmul(x, y)
        z2 = paddle.matmul(x, y)
        paddle.autograd.backward([z1, z2], [grad_tensor1, grad_tensor2], True)
        print(x.grad)
        #[[12. 18.]
        # [17. 25.]]
        x.clear_grad()
        paddle.autograd.backward([z1, z2], [grad_tensor1, None], True)
        print(x.grad)
        #[[12. 18.]
        # [17. 25.]]
        x.clear_grad()
        paddle.autograd.backward([z1, z2])
        print(x.grad)
        #[[10. 14.]
        # [10. 14.]]

1.1.2 Hands-on tests

(1) Single-vector case

 Ⅰ. Defining the variables and function
import sys,os,math,time
import matplotlib.pyplot as plt
from numpy import *
import paddle
from paddle import to_tensor as TT

x = TT([1], dtype='float32', stop_gradient=False)
y = TT([2], dtype='float32')

z = paddle.matmul(x, y)

print("x: {}".format(x),"y: {}".format(y),"z: {}".format(z))
x: Tensor(shape=[1], dtype=float32, place=CPUPlace, stop_gradient=False,
       [1.])
y: Tensor(shape=[1], dtype=float32, place=CPUPlace, stop_gradient=True,
       [2.])
z: Tensor(shape=[1], dtype=float32, place=CPUPlace, stop_gradient=False,
       [2.])
 Ⅱ. Before backward
print("x.grad: {}".format(x.grad), "y.grad: {}".format(y.grad), "z.grad: {}".format(z.grad))
x.grad: None
y.grad: None
z.grad: None
 Ⅲ. After running backward
paddle.autograd.backward(z, TT([3], dtype='float32'))
print("x.grad: {}".format(x.grad), "y.grad: {}".format(y.grad), "z.grad: {}".format(z.grad))
x.grad: Tensor(shape=[1], dtype=float32, place=CPUPlace, stop_gradient=False,
       [6.])
y.grad: None
z.grad: Tensor(shape=[1], dtype=float32, place=CPUPlace, stop_gradient=False,
       [3.])

  From the code above, $z = x \cdot y$, so $\partial x = \partial z \cdot y$. With $\partial z = 3$ and $y = 2$, this gives $\partial x = 6$.
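  A quick numerical check (a minimal sketch, reusing the x and y defined above):

print(x.grad.numpy(), 3.0 * y.numpy())   # both print [6.]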

 Ⅳ. Calling backward again

  When backward is computed a second time, the runtime reports an error.

RuntimeError: (Unavailable) auto_0_ trying to backward through the same graph a second time, but this graph have already been freed. Please specify Tensor.backward(retain_graph=True) when calling backward at the first time.
  [Hint: Expected var->GradVarBase()->GraphIsFreed() == false, but received var->GradVarBase()->GraphIsFreed():1 != false:0.] (at /paddle/paddle/fluid/imperative/basic_engine.cc:74)

  Even calling clear_grad() does not make it possible to run backward() again.

x.clear_grad()
y.clear_grad()
z.clear_grad()

  Set the retain_graph parameter of autograd.backward to True.

paddle.autograd.backward(z, TT([3], dtype='float32'), retain_graph=True)

  Now backward can be called repeatedly.

  First call:

print("x.grad: {}".format(x.grad), "y.grad: {}".format(y.grad), "z.grad: {}".format(z.grad))
x.grad: Tensor(shape=[1], dtype=float32, place=CPUPlace, stop_gradient=False,
       [6.])
y.grad: None
z.grad: Tensor(shape=[1], dtype=float32, place=CPUPlace, stop_gradient=False,
       [3.])

  The second call gives:

x.grad: Tensor(shape=[1], dtype=float32, place=CPUPlace, stop_gradient=False,
       [12.])
y.grad: None
z.grad: Tensor(shape=[1], dtype=float32, place=CPUPlace, stop_gradient=False,
       [3.])
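  Note that x.grad doubles from 6 to 12: with retain_graph=True, the gradients of repeated backward calls accumulate into x.grad. A minimal sketch (reusing x, z and TT from above) showing that clear_grad() resets the accumulation between calls:

x.clear_grad()
paddle.autograd.backward(z, TT([3], dtype='float32'), retain_graph=True)
print(x.grad)    # [6.]
x.clear_grad()   # clear the accumulated gradient before calling backward again
paddle.autograd.backward(z, TT([3], dtype='float32'), retain_graph=True)
print(x.grad)    # still [6.] instead of accumulating to [12.]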

(2) Matrix multiplication

 Ⅰ. A single matrix product
  • Define the matrix product:

$$z = x \cdot y$$

  • Backward gradient:

$$\partial x = \partial z \cdot y^{T}$$

x = TT([[1,2],[3,4]], dtype='float32', stop_gradient=False)
y = TT([[3,2],[3,4]], dtype='float32')

z = paddle.matmul(x, y)

print("x: {}".format(x),"y: {}".format(y),"z: {}".format(z))

paddle.autograd.backward(z, TT([[1,2],[2,3]], dtype='float32'), retain_graph=True)
print("x.grad: {}".format(x.grad), "y.grad: {}".format(y.grad), "z.grad: {}".format(z.grad))
x: Tensor(shape=[2, 2], dtype=float32, place=CPUPlace, stop_gradient=False,
       [[1., 2.],
        [3., 4.]])
y: Tensor(shape=[2, 2], dtype=float32, place=CPUPlace, stop_gradient=True,
       [[3., 2.],
        [3., 4.]])
z: Tensor(shape=[2, 2], dtype=float32, place=CPUPlace, stop_gradient=False,
       [[9. , 10.],
        [21., 22.]])
x.grad: Tensor(shape=[2, 2], dtype=float32, place=CPUPlace, stop_gradient=False,
       [[7. , 11.],
        [12., 18.]])
y.grad: None
z.grad: Tensor(shape=[2, 2], dtype=float32, place=CPUPlace, stop_gradient=False,
       [[1., 2.],
        [2., 3.]])

  As can be seen, the automatic backward gradient for matrices follows essentially the same rule as the scalar case.

print("y*z.grad: {}".format(y.numpy().dot(z.grad.numpy())))
y*z.grad: [[ 7. 12.]
 [11. 18.]]
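  The printed matrix is the transpose of x.grad. A minimal check (a sketch, reusing the tensors above) that x.grad equals $\partial z \cdot y^{T}$:

print("z.grad*y^T: {}".format(z.grad.numpy().dot(y.numpy().T)))
# [[ 7. 11.]
#  [12. 18.]]   -- identical to x.grad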
 Ⅱ. Two matrix products

  Define two matrix operations:

$$z_1 = x \cdot y,\quad z_2 = x \cdot y$$

  Then the gradient is:

$$\partial x = \left( \partial z_1 + \partial z_2 \right) \cdot y^{T}$$

x = TT([[1,2],[3,4]], dtype='float32', stop_gradient=False)
y = TT([[3,2],[3,4]], dtype='float32')

z1 = paddle.matmul(x, y)
z2 = paddle.matmul(x, y)

print("x: {}".format(x),"y: {}".format(y),"z: {}".format(z))

paddle.autograd.backward(z1, TT([[1,2],[2,3]], dtype='float32'))
paddle.autograd.backward(z2, TT([[1,1],[1,1]], dtype='float32'))

print("x.grad: {}".format(x.grad), "y.grad: {}".format(y.grad), "z.grad: {}".format(z.grad))
x: Tensor(shape=[2, 2], dtype=float32, place=CPUPlace, stop_gradient=False,
       [[1., 2.],
        [3., 4.]])
y: Tensor(shape=[2, 2], dtype=float32, place=CPUPlace, stop_gradient=True,
       [[3., 2.],
        [3., 4.]])
z1: Tensor(shape=[2, 2], dtype=float32, place=CPUPlace, stop_gradient=False,
       [[9. , 10.],
        [21., 22.]])
x.grad: Tensor(shape=[2, 2], dtype=float32, place=CPUPlace, stop_gradient=False,
       [[12., 18.],
        [17., 25.]])
y.grad: None
z1.grad: Tensor(shape=[2, 2], dtype=float32, place=CPUPlace, stop_gradient=False,
       [[1., 2.],
        [2., 3.]])

  Verify the gradient matrix of x:

print("y*(z1+z2).grad: {}".format(y.numpy().dot((z1.grad+z2.grad).numpy())))
y*(z1+z2).grad: [[12. 17.]
 [18. 25.]]

  This is the transpose of x.grad above, which agrees with the value given by the formula.
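  Equivalently, a direct check (a sketch) against x.grad without the transpose:

print((z1.grad + z2.grad).numpy().dot(y.numpy().T))
# [[12. 18.]
#  [17. 25.]]   -- identical to x.grad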

1.2 More function examples

1.2.1 square

x = TT([1,2], dtype='float32', stop_gradient=False)

z1 = paddle.square(x)

print("x: {}".format(x),"z1: {}".format(z1))

paddle.autograd.backward(z1)
print("x.grad: {}".format(x.grad), "z1.grad: {}".format(z1.grad))
x: Tensor(shape=[2], dtype=float32, place=CPUPlace, stop_gradient=False,
       [1., 2.])
z1: Tensor(shape=[2], dtype=float32, place=CPUPlace, stop_gradient=False,
       [1., 4.])
x.grad: Tensor(shape=[2], dtype=float32, place=CPUPlace, stop_gradient=False,
       [2., 4.])
z1.grad: Tensor(shape=[2], dtype=float32, place=CPUPlace, stop_gradient=False,
       [1., 1.])
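  This matches the analytic derivative: for $z_1 = x^2$ with the default output gradient of 1, $\partial x = 2x$. A quick check (a sketch, reusing x above):

print(2 * x.numpy())    # [2. 4.] -- same as x.grad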

1.2.2 exp

x = TT([1,2], dtype='float32', stop_gradient=False)

z1 = paddle.exp(x)

print("x: {}".format(x),"z1: {}".format(z1))

paddle.autograd.backward(z1)
print("x.grad: {}".format(x.grad), "z1.grad: {}".format(z1.grad))
x: Tensor(shape=[2], dtype=float32, place=CPUPlace, stop_gradient=False,
       [1., 2.])
z1: Tensor(shape=[2], dtype=float32, place=CPUPlace, stop_gradient=False,
       [2.71828175, 7.38905621])
x.grad: Tensor(shape=[2], dtype=float32, place=CPUPlace, stop_gradient=False,
       [2.71828175, 7.38905621])
z1.grad: Tensor(shape=[2], dtype=float32, place=CPUPlace, stop_gradient=False,
       [1., 1.])

1.2.3 log
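  The code for this example follows the same pattern as the square and exp examples (a sketch; paddle.log is the assumed operation):

x = TT([1,2], dtype='float32', stop_gradient=False)

z1 = paddle.log(x)

print("x: {}".format(x),"z1: {}".format(z1))

paddle.autograd.backward(z1)
print("x.grad: {}".format(x.grad), "z1.grad: {}".format(z1.grad))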

x: Tensor(shape=[2], dtype=float32, place=CPUPlace, stop_gradient=False,
       [1., 2.])
z1: Tensor(shape=[2], dtype=float32, place=CPUPlace, stop_gradient=False,
       [0.        , 0.69314718])
x.grad: Tensor(shape=[2], dtype=float32, place=CPUPlace, stop_gradient=False,
       [1.        , 0.50000000])
z1.grad: Tensor(shape=[2], dtype=float32, place=CPUPlace, stop_gradient=False,
       [1., 1.])

1.2.4 Softmax

(1) Theoretical derivation

  For simplicity, consider a softmax over two variables:

$$z_1 = \frac{e^{x_1}}{e^{x_1} + e^{x_2}}, \qquad z_2 = \frac{e^{x_2}}{e^{x_1} + e^{x_2}}$$

Then:
$$\partial x_1 = \partial z_1 \cdot \frac{e^{x_1}\left(e^{x_1} + e^{x_2}\right) - e^{x_1} \cdot e^{x_1}}{\left(e^{x_1} + e^{x_2}\right)^2} + \partial z_2 \cdot \frac{-e^{x_2} \cdot e^{x_1}}{\left(e^{x_1} + e^{x_2}\right)^2}$$

  By symmetry, $\partial x_2$ can be obtained in the same way; it is omitted here.

  Simplifying the above gives $\partial x_1 = \left( \partial z_1 - \partial z_2 \right) \cdot \frac{e^{x_1 + x_2}}{\left( e^{x_1} + e^{x_2} \right)^2}$, so if $\partial z_1 = \partial z_2$, then $\partial x_1 = \partial x_2 = 0$.

(2) Experimental verification

 Ⅰ. backward without initializing the gradient of z
x = TT([1,2], dtype='float32', stop_gradient=False)

z1 = paddle.exp(x) / paddle.sum(paddle.exp(x))

print("x: {}".format(x),"z1: {}".format(z1))

paddle.autograd.backward(z1)
print("x.grad: {}".format(x.grad), "z1.grad: {}".format(z1.grad))
x: Tensor(shape=[2], dtype=float32, place=CPUPlace, stop_gradient=False,
       [1., 2.])
z1: Tensor(shape=[2], dtype=float32, place=CPUPlace, stop_gradient=False,
       [0.26894140, 0.73105860])
x.grad: Tensor(shape=[2], dtype=float32, place=CPUPlace, stop_gradient=False,
       [0., 0.])
z1.grad: Tensor(shape=[2], dtype=float32, place=CPUPlace, stop_gradient=False,
       [1., 1.])
 Ⅱ. Initializing the gradient of z
x = TT([1,2], dtype='float32', stop_gradient=False)

z1 = paddle.exp(x) / paddle.sum(paddle.exp(x))

print("x: {}".format(x),"z1: {}".format(z1))

paddle.autograd.backward(z1,TT([1,2], dtype='float32'))
print("x.grad: {}".format(x.grad), "z1.grad: {}".format(z1.grad))
x: Tensor(shape=[2], dtype=float32, place=CPUPlace, stop_gradient=False,
       [1., 2.])
z1: Tensor(shape=[2], dtype=float32, place=CPUPlace, stop_gradient=False,
       [0.26894140, 0.73105860])
x.grad: Tensor(shape=[2], dtype=float32, place=CPUPlace, stop_gradient=False,
       [-0.19661194,  0.19661200])
z1.grad: Tensor(shape=[2], dtype=float32, place=CPUPlace, stop_gradient=False,
       [1., 2.])

  Plugging into the formula derived above confirms that the result is correct: here $\partial z_1 - \partial z_2 = 1 - 2 = -1$, so $\partial x_1$ should equal $-e^{x_1 + x_2} / \left( e^{x_1} + e^{x_2} \right)^2$, i.e. the negative of the value computed below.

x = array([1,2])
a = exp(sum(x))/(sum(exp(x)))**2
print("a: {}".format(a))
a: 0.19661193324148188
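  As an additional cross-check (a sketch; paddle.nn.functional.softmax is assumed here and is not part of the original test), the built-in softmax yields the same gradient:

import paddle.nn.functional as F

xs = TT([1, 2], dtype='float32', stop_gradient=False)
s = F.softmax(xs)
paddle.autograd.backward(s, TT([1, 2], dtype='float32'))
print(xs.grad)    # expected: [-0.19661194, 0.19661200], matching x.grad above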

 

※ Summary ※


  This article tested autograd.backward in Paddle, along with the gradients of several commonly used functions.


■ Related links: