zl程序教程

您现在的位置是:首页 >  工具

当前栏目

下降法(Descent Directions Method)学习

学习 method 下降
2023-09-14 09:06:48 时间

目标

min ⁡ { f ( x ) : x ∈ R n } \min\{f(\boldsymbol{x}):\boldsymbol{x}\in \mathbb{R}^n \} min{f(x):xRn}

下降方向

f : R n → R f:\mathbb{R}^n \to \mathbb{R} f:RnR是一个在 R n \mathbb{R}^n Rn上连续可微的函数。
0 ≠ d ∈ R n 0\neq \boldsymbol{d}\in \mathbb{R}^n 0=dRn,如果
f ′ ( x ; d ) = ∇ f ( x ) T d < 0 f'(\boldsymbol{x};\boldsymbol{d})=\nabla f(\boldsymbol{x})^T\boldsymbol{d}<0 f(x;d)=f(x)Td<0
那么称 d \boldsymbol{d} d为一个下降方向

性质

f : R n → R f:\mathbb{R}^n \to \mathbb{R} f:RnR是一个在 R n \mathbb{R}^n Rn上连续可微的函数。 x ∈ R n \boldsymbol{x}\in \mathbb{R}^n xRn
如果 d \boldsymbol{d} d是一个下降方向,那么 ∃ ϵ > 0 \exists\epsilon>0 ϵ>0,使得 ∀ t ∈ ( 0 , ϵ ] \forall t\in(0,\epsilon] t(0,ϵ],有
f ( x + t d ) < f ( x ) f(\boldsymbol{x}+t\boldsymbol{d})<f(\boldsymbol{x}) f(x+td)<f(x)
证明:
lim ⁡ t → 0 + f ( x + t d ) − f ( x ) t = f ′ ( x ; d ) < 0 \lim\limits_{t\to 0^{+}}\frac{f(\boldsymbol{x}+t\boldsymbol{d})-f(\boldsymbol{x})}{t}=f'(\boldsymbol{x};\boldsymbol{d})<0 t0+limtf(x+td)f(x)=f(x;d)<0
由保号性, ∃ ϵ > 0 \exists\epsilon>0 ϵ>0,使得 ∀ t ∈ ( 0 , ϵ ] \forall t\in(0,\epsilon] t(0,ϵ],有
f ( x + t d ) − f ( x ) t < 0 \frac{f(\boldsymbol{x}+t\boldsymbol{d})-f(\boldsymbol{x})}{t}<0 tf(x+td)f(x)<0
再根据 t > 0 t>0 t>0

f ( x + t d ) < f ( x ) f(\boldsymbol{x}+t\boldsymbol{d})<f(\boldsymbol{x}) f(x+td)<f(x)

Lipschitz连续

主要用到梯度是Lipschitz连续

假设 f f f连续可微,如果
∥ ∇ f ( x ) − ∇ f ( y ) ∥ ≤ L ∥ x − y ∥ , ∀ x , y ∈ R n \Vert \nabla f(\boldsymbol{x})-\nabla f(\boldsymbol{y})\Vert \le L\Vert \boldsymbol{x}-\boldsymbol{y}\Vert,\forall \boldsymbol{x},\boldsymbol{y}\in\mathbb{R}^n f(x)f(y)Lxy,x,yRn
那么称他的梯度 ∇ f ( x ) \nabla f(\boldsymbol{x}) f(x)是Lipchitz连续的
L L L称为Lipschitz常数
显然 L ~ ≥ L \tilde{L}\ge L L~L也满足条件,所以我们只关心满足条件的 L L L中最小的

把梯度Lipschitz连续的函数的一类函数记为 C L 1 , 1 ( R n ) C_{L}^{1,1}(\mathbb{R}^n) CL1,1(Rn)
如果不关心 L L L具体是多少,有时候也记为 C L 1 , 1 C_{L}^{1,1} CL1,1
比如 f ( x ) = a T x ∈ C 0 1 , 1 f(\boldsymbol{x})=\boldsymbol{a}^T \boldsymbol{x}\in C_{0}^{1,1} f(x)=aTxC01,1
f ( x ) = x T A x + 2 b T x + c ∈ C 2 ∥ A ∥ 1 , 1 f(\boldsymbol{x})=\boldsymbol{x}^TA\boldsymbol{x}+2\boldsymbol{b}^T\boldsymbol{x}+c\in C_{2\Vert A\Vert}^{1,1} f(x)=xTAx+2bTx+cC2A1,1

定理

假设 f f f二阶连续可微,那么下面两个命题等价
a) f ∈ C L 1 , 1 f\in C_{L}^{1,1} fCL1,1
b) ∀ x ∈ R n , ∥ ∇ 2 f ( x ) ∥ ≤ L \forall \boldsymbol{x}\in \mathbb{R}^{n},\Vert \nabla^2f(\boldsymbol{x})\Vert \le L xRn,2f(x)L
证明:
b ) ⇒ a ) b)\Rightarrow a) b)a)
∇ f ( y ) = ∇ f ( x ) + ∫ 0 1 ∇ 2 f ( x + t ( y − x ) ) ( y − x ) d x = ∇ f ( x ) + ( ∫ 0 1 ∇ 2 f ( x + t ( y − x ) ) d x ) ( y − x ) \begin{aligned} \nabla f(\boldsymbol{y})&=\nabla f(\boldsymbol{x})+\int_{0}^{1}\nabla^2 f(\boldsymbol{x}+t(\boldsymbol{y}-\boldsymbol{x}))(\boldsymbol{y}-\boldsymbol{x})\mathrm{d}\boldsymbol{x}\\ &=\nabla f(\boldsymbol{x})+(\int_{0}^{1}\nabla^2 f(\boldsymbol{x}+t(\boldsymbol{y}-\boldsymbol{x}))\mathrm{d}\boldsymbol{x})(\boldsymbol{y}-\boldsymbol{x})\\ \end{aligned} f(y)=f(x)+012f(x+t(yx))(yx)dx=f(x)+(012f(x+t(yx))dx)(yx)
所以
∥ ∇ f ( y ) − ∇ f ( x ) ∥ = ∥ ( ∫ 0 1 ∇ 2 f ( x + t ( y − x ) ) d x ) ( y − x ) ∥ ≤ ∥ ∫ 0 1 ∇ 2 f ( x + t ( y − x ) ) d x ∥ ∥ y − x ∥ ≤ ( ∫ 0 1 ∥ ∇ 2 f ( x + t ( y − x ) ) ∥ d x ) ∥ y − x ∥ ≤ L ∥ y − x ∥ \begin{aligned} \Vert \nabla f(\boldsymbol{y})-\nabla f(\boldsymbol{x})\Vert &=\Vert (\int_{0}^{1}\nabla^2 f(\boldsymbol{x}+t(\boldsymbol{y}-\boldsymbol{x}))\mathrm{d}\boldsymbol{x})(\boldsymbol{y}-\boldsymbol{x})\Vert\\ &\le \Vert \int_{0}^{1}\nabla^2 f(\boldsymbol{x}+t(\boldsymbol{y}-\boldsymbol{x}))\mathrm{d}\boldsymbol{x}\Vert\Vert\boldsymbol{y}-\boldsymbol{x}\Vert\\ &\le \left(\int_{0}^{1}\Vert\nabla^2 f(\boldsymbol{x}+t(\boldsymbol{y}-\boldsymbol{x}))\Vert\mathrm{d}\boldsymbol{x}\right)\Vert\boldsymbol{y}-\boldsymbol{x}\Vert\\ &\le L\Vert\boldsymbol{y}-\boldsymbol{x}\Vert \end{aligned} f(y)f(x)=(012f(x+t(yx))dx)(yx)012f(x+t(yx))dxyx(012f(x+t(yx))dx)yxLyx
a ) ⇒ b ) a)\Rightarrow b) a)b)
∇ f ( x + α d ) − ∇ f ( x ) = ∫ 0 α ∇ 2 f ( x + t d ) d d t \nabla f(\boldsymbol{x}+\alpha \boldsymbol{d})-\nabla f(\boldsymbol{x})=\int_{0}^{\alpha}\nabla^2f(\boldsymbol{x}+t \boldsymbol{d})\boldsymbol{d}\mathrm{d}t f(x+αd)f(x)=0α2f(x+td)ddt
所以
∥ ( ∫ 0 α ∇ 2 f ( x + t d ) d t ) d ∥ = ∥ ∇ f ( x + α d ) − ∇ f ( x ) ∥ ≤ α L ∥ d ∥ \Vert \left(\int_{0}^{\alpha}\nabla^2f(\boldsymbol{x}+t \boldsymbol{d})\mathrm{d}t\right)\boldsymbol{d}\Vert=\Vert\nabla f(\boldsymbol{x}+\alpha \boldsymbol{d})-\nabla f(\boldsymbol{x})\Vert\le \alpha L\Vert\boldsymbol{d}\Vert (0α2f(x+td)dt)d=f(x+αd)f(x)αLd
两边同除 α \alpha α,并令 α → 0 + \alpha \to 0^{+} α0+
∥ ∇ 2 f ( x ) d ∥ ≤ L ∥ d ∥ ⇒ ∥ ∇ 2 f ( x ) ∥ ≤ L \Vert \nabla^2f(\boldsymbol{x})\boldsymbol{d}\Vert\le L\Vert \boldsymbol{d}\Vert\Rightarrow \Vert \nabla^2f(\boldsymbol{x})\Vert \le L 2f(x)dLd2f(x)L

下降法

有了上面的知识,其实我们的迭代就很明显了
x k + 1 = x k + t k d k , k = 0 , 1 , ⋯ \boldsymbol{x}_{k+1}=\boldsymbol{x}_{k}+t_{k} \boldsymbol{d}_{k},k=0,1,\cdots xk+1=xk+tkdk,k=0,1,
下降法的整体框架就是
1.选择初始点 x 0 ∈ R n \boldsymbol{x}_0\in \mathbb{R}^n x0Rn
2. 循环 k = 0 , 1 , ⋯ k=0,1,\cdots k=0,1,
a)选择下降方向 d k \boldsymbol{d}_{k} dk
b)挑选步长 t k t_k tk,使得 f ( x k + t k d k ) < f ( x k ) f(\boldsymbol{x}_{k}+t_{k} \boldsymbol{d}_{k})<f(\boldsymbol{x}_{k}) f(xk+tkdk)<f(xk)
c)迭代 x k + 1 = x k + t k d k \boldsymbol{x}_{k+1}=\boldsymbol{x}_{k}+t_{k} \boldsymbol{d}_{k} xk+1=xk+tkdk
d)如果满足停止条件就停止,并返回 x k + 1 \boldsymbol{x}_{k+1} xk+1

那么问题来了,

  • 怎么选起始点
  • 怎么选下降方向
  • 怎么选步长
  • 停止条件是什么

因为最小值点,起码得是一个驻点,所以停止条件经常就是 ∥ f ( x ) ∥ ≤ ϵ \Vert f(x)\Vert \le \epsilon f(x)ϵ
所以终点看步长和下降方向

步长

固定步长

就是 ∀ k , t k = t ~ \forall k,t_k=\tilde{t} k,tk=t~

精确线搜索算法

t k = arg ⁡ min ⁡ t ≥ 0 f ( x k + t k d k ) t_{k}=\arg\min\limits_{t\ge 0}f(\boldsymbol{x}_{k}+t_{k} \boldsymbol{d}_{k}) tk=argt0minf(xk+tkdk)

非精确线搜索算法

对于非精确线搜索,就是步长需要满足一些条件。

Armijo准则

f ( x k ) − f ( x k + t k d k ) ≥ − α t k ∇ f ( x k ) T d k f(\boldsymbol{x}_{k})-f(\boldsymbol{x}_{k}+t_{k} \boldsymbol{d}_{k})\ge -\alpha t_k\nabla f(\boldsymbol{x}_{k})^T\boldsymbol{d}_k f(xk)f(xk+tkdk)αtkf(xk)Tdk
其中 α ∈ ( 0 , 1 ) \alpha\in(0,1) α(0,1)

如果 t t t足够小,这个准则总是能满足的
大概下面这个图这种感觉
在这里插入图片描述

存在性证明

f : R n → R f:\mathbb{R}^n \to \mathbb{R} f:RnR是一个在 R n \mathbb{R}^n Rn上连续可微的函数。 x ∈ R n \boldsymbol{x}\in \mathbb{R}^n xRn
0 ≠ d ∈ R n 0\neq \boldsymbol{d}\in \mathbb{R}^n 0=dRn是一个下降方向, α ∈ ( 0 , 1 ) \alpha\in(0,1) α(0,1)
那么 ∃ ϵ > 0 , ∀ t ∈ [ 0 , ϵ ] \exists \epsilon>0,\forall t\in [0,\epsilon] ϵ>0,t[0,ϵ],有
f ( x ) − f ( x + t d ) ≥ − α t ∇ f ( x ) T d f(\boldsymbol{x})-f(\boldsymbol{x}+t \boldsymbol{d})\ge -\alpha t\nabla f(\boldsymbol{x})^T\boldsymbol{d} f(x)f(x+td)αtf(x)Td

证明:
f ( x + t d ) = f ( x ) + t ∇ f ( x ) T d + o ( t ∥ d ∥ ) f ( x ) − f ( x + t d ) = − α t ∇ f ( x ) T d − ( 1 − α ) ∇ f ( x ) T d − o ( t ∥ d ∥ ) \begin{aligned} f(\boldsymbol{x}+t \boldsymbol{d})&=f(\boldsymbol{x})+t\nabla f(\boldsymbol{x})^T\boldsymbol{d}+o(t\Vert \boldsymbol{d}\Vert)\\ f(\boldsymbol{x})-f(\boldsymbol{x}+t \boldsymbol{d})&=-\alpha t\nabla f(\boldsymbol{x})^T\boldsymbol{d}-(1-\alpha)\nabla f(\boldsymbol{x})^T\boldsymbol{d}-o(t\Vert \boldsymbol{d}\Vert) \end{aligned} f(x+td)f(x)f(x+td)=f(x)+tf(x)Td+o(td)=αtf(x)Td(1α)f(x)Tdo(td)
lim ⁡ t → 0 + ( 1 − α ) t ∇ f ( x ) T d + o ( t ∥ d ∥ ) t = ( 1 − α ) ∇ f ( x ) T d < 0 \lim \limits_{t \to 0^{+}} \frac{(1-\alpha) t \nabla f(\mathbf{x})^{T} \mathbf{d}+o(t\Vert\mathbf{d}\Vert)}{t}=(1-\alpha) \nabla f(\mathbf{x})^{T} \mathbf{d}<0 t0+limt(1α)tf(x)Td+o(td)=(1α)f(x)Td<0
因此 ∃ ϵ > 0 , ∀ t ∈ ( 0 , ϵ ] \exists \epsilon>0,\forall t\in(0,\epsilon] ϵ>0,t(0,ϵ]
( 1 − α ) ∇ f ( x ) T d + o ( t ∥ d ∥ ) < 0 (1-\alpha)\nabla f(\boldsymbol{x})^T\boldsymbol{d}+o(t\Vert \boldsymbol{d}\Vert)<0 (1α)f(x)Td+o(td)<0
进而
f ( x ) − f ( x + t d ) ≥ − α t ∇ f ( x ) T d f(\boldsymbol{x})-f(\boldsymbol{x}+t \boldsymbol{d})\ge -\alpha t\nabla f(\boldsymbol{x})^T\boldsymbol{d} f(x)f(x+td)αtf(x)Td

Goldstein准则

也叫做Armijo-Goldstein准则
为了克服Armijo准则的步长可能过小,这个准则
f ( x k + t k d k ) ≤ f ( x k ) + α t k ∇ f ( x k ) T d k f ( x k + t k d k ) ≥ f ( x k ) + ( 1 − α ) t k ∇ f ( x k ) T d k \begin{aligned} f(\boldsymbol{x}_{k}+t_{k} \boldsymbol{d}_{k})&\le f(\boldsymbol{x}_{k})+\alpha t_k\nabla f(\boldsymbol{x}_{k})^T\boldsymbol{d}_{k}\\ f(\boldsymbol{x}_{k}+t_{k} \boldsymbol{d}_{k})&\ge f(\boldsymbol{x}_{k})+(1-\alpha) t_k\nabla f(\boldsymbol{x}_{k})^T\boldsymbol{d}_{k}\\ \end{aligned} f(xk+tkdk)f(xk+tkdk)f(xk)+αtkf(xk)Tdkf(xk)+(1α)tkf(xk)Tdk
其中 α ∈ ( 0 , 1 2 ) \alpha\in(0,\frac{1}{2}) α(0,21)
在这里插入图片描述
然而缺点就是,可能会漏掉最优的点

Wolfe准则

也叫Wolfe-Powell准则
f ( x k + t k d k ) ≤ f ( x k ) + α 1 t k ∇ f ( x k ) T d k ∇ f ( x k + t k d k ) T d k ≥ α 2 ∇ f ( x k ) T d k \begin{aligned} f(\boldsymbol{x}_{k}+t_{k} \boldsymbol{d}_{k})&\le f(\boldsymbol{x}_{k})+\alpha_1 t_k\nabla f(\boldsymbol{x}_{k})^T\boldsymbol{d}_{k}\\ \nabla f(\boldsymbol{x}_{k}+t_{k} \boldsymbol{d}_{k})^T \boldsymbol{d}_{k}&\ge \alpha_2 \nabla f(\boldsymbol{x}_{k})^T\boldsymbol{d}_{k}\\ \end{aligned} f(xk+tkdk)f(xk+tkdk)Tdkf(xk)+α1tkf(xk)Tdkα2f(xk)Tdk
其中 α 1 , α 2 ∈ ( 0 , 1 ) , α 1 < α 2 \alpha_1,\alpha_2\in(0,1),\alpha_1<\alpha_2 α1,α2(0,1),α1<α2

也有强Wolfe准则
f ( x k + t k d k ) ≤ f ( x k ) + α 1 t k ∇ f ( x k ) T d k ∣ ∇ f ( x k + t k d k ) T d k ∣ ≤ ∣ α 2 ∇ f ( x k ) T d k ∣ \begin{aligned} f(\boldsymbol{x}_{k}+t_{k} \boldsymbol{d}_{k})&\le f(\boldsymbol{x}_{k})+\alpha_1 t_k\nabla f(\boldsymbol{x}_{k})^T\boldsymbol{d}_{k}\\ \left|\nabla f(\boldsymbol{x}_{k}+t_{k} \boldsymbol{d}_{k})^T \boldsymbol{d}_{k}\right|&\le \left|\alpha_2 \nabla f(\boldsymbol{x}_{k})^T\boldsymbol{d}_{k}\right|\\ \end{aligned} f(xk+tkdk)f(xk+tkdk)Tdkf(xk)+α1tkf(xk)Tdkα2f(xk)Tdk
其中 α 1 , α 2 ∈ ( 0 , 1 ) , α 1 < α 2 \alpha_1,\alpha_2\in(0,1),\alpha_1<\alpha_2 α1,α2(0,1),α1<α2

ϕ ( t k ) = f ( x k + t k d k ) \phi(t_k)=f(\boldsymbol{x}_{k}+t_{k} \boldsymbol{d}_{k}) ϕ(tk)=f(xk+tkdk)
ϕ ′ ( t k ) = ∇ f ( x k + t k d k ) T d k \phi'(t_k)=\nabla f(\boldsymbol{x}_{k}+t_{k} \boldsymbol{d}_{k})^T \boldsymbol{d}_{k} ϕ(tk)=f(xk+tkdk)Tdk
而极小值点 t k ∗ t_k^{*} tk,满足
ϕ ′ ( t k ∗ ) = 0 \phi'(t_k^*)=0 ϕ(tk)=0
所以第二个条件总是满足的
第一个条件其实就是Armijo准则
在这里插入图片描述

存在性

f : R n → R f:\mathbb{R}^{n}\to\mathbb{R} f:RnR连续可微, d k \mathbf{d}_k dk时一个下降方向, f f f有下界 { x k + α d k ∣ α > 0 } \left\{\mathbf{x}_k+\alpha\mathbf{d}_k|\alpha>0\right\} {xk+αdkα>0}.如果 0 < α 1 < α 2 < 1 0<\alpha_1<\alpha_2<1 0<α1<α2<1,则存在步长满足Wolfe准则和强Wolfe准则

证明:
ϕ ( t k ) = f ( x k + t k d k ) \phi(t_k)=f(\boldsymbol{x}_{k}+t_{k} \boldsymbol{d}_{k}) ϕ(tk)=f(xk+tkdk)
l ( t ) = f ( x k ) + α 1 t ∇ f ( x k ) T d k l\left(t\right)=f(\boldsymbol{x}_{k})+\alpha_1 t\nabla f(\boldsymbol{x}_{k})^T\boldsymbol{d}_{k} l(t)=f(xk)+α1tf(xk)Tdk ϕ \phi ϕ必然有交集
t ′ > 0 t'>0 t>0为他们相交的最小的 t ′ t' t,则
f ( x k + t ′ d k ) = f ( x k ) + α 1 t ′ ∇ f ( x k ) T d k f\left(\mathbf{x}_k+t'\mathbf{d}_k\right)=f(\boldsymbol{x}_{k})+\alpha_1 t'\nabla f(\boldsymbol{x}_{k})^T\boldsymbol{d}_{k} f(xk+tdk)=f(xk)+α1tf(xk)Tdk
t ′ t' t满足Armijo条件,并且对于小于 t ′ t' t的步长也满足

根据拉格朗日中值定理,存在 t ′ ′ ∈ ( 0 , t ′ ) t''\in\left(0,t'\right) t(0,t),使得
f ( x k + t ′ d k ) − f ( x k ) = t ′ ∇ f ( x k + t ′ ′ d k ) T d k f\left(\mathbf{x}_k+t'\mathbf{d}_k\right)-f(\boldsymbol{x}_{k})=t'\nabla f(\boldsymbol{x}_{k}+t'' \boldsymbol{d}_{k})^T \boldsymbol{d}_{k} f(xk+tdk)f(xk)=tf(xk+tdk)Tdk

结合一下
∇ f ( x k + t ′ ′ d k ) T d k = α 1 ∇ f ( x k ) T d k > α 2 ∇ f ( x k ) T d k \nabla f(\boldsymbol{x}_{k}+t'' \boldsymbol{d}_{k})^T \boldsymbol{d}_{k}=\alpha_1\nabla f(\boldsymbol{x}_{k})^T\boldsymbol{d}_{k}>\alpha_2\nabla f(\boldsymbol{x}_{k})^T\boldsymbol{d}_{k} f(xk+tdk)Tdk=α1f(xk)Tdk>α2f(xk)Tdk
所以 t ′ ′ t'' t是满足Wolfe准则的步长
注意到 α 2 ∇ f ( x k ) T d k < 0 \alpha_2\nabla f(\boldsymbol{x}_{k})^T\boldsymbol{d}_{k}<0 α2f(xk)Tdk<0以及 α 1 ∇ f ( x k ) T d k < 0 \alpha_1\nabla f(\boldsymbol{x}_{k})^T\boldsymbol{d}_{k}<0 α1f(xk)Tdk<0
所以 t ′ ′ t'' t是满足强Wolfe准则的步长

步骤

https://blog.csdn.net/weixin_43761124/article/details/107436454

收敛性

考虑一般的迭代格式,其中 d k \boldsymbol{d}_k dk是搜索方向, t k t_k tk是步长,且在迭代过程中满足Wolfe准则,假设目标函数 f f f连续可微,有下界,且
∥ ∇ f ( x ) − ∇ f ( y ) ∥ ≤ L ∥ x − y ∥ , ∀ x , y ∈ R n \Vert \nabla f(\boldsymbol{x})- \nabla f(\boldsymbol{y})\Vert\le L\Vert \boldsymbol{x}-\boldsymbol{y}\Vert,\quad \forall \boldsymbol{x},\boldsymbol{y}\in\mathbb{R}^n f(x)f(y)Lxy,x,yRn
那么
∑ k = 0 ∞ cos ⁡ 2 θ k ∥ ∇ 2 f ( x k ) ∥ 2 < + ∞ \sum_{k=0}^{\infty}\cos^2\theta_k\Vert \nabla^2f(\boldsymbol{x}_k)\Vert^2<+\infty k=0cos2θk2f(xk)2<+
其中 cos ⁡ θ k \cos\theta_k cosθk为负梯度和下降方向 d k d_k dk的夹角的余弦,即
cos ⁡ θ k = − ∇ f ( x k ) T d k ∥ ∇ f ( x k ) ∥ ∥ d k ∥ \cos \theta_k=\frac{-\nabla f(\boldsymbol{x}_k)^T\boldsymbol{d}_k}{\Vert \nabla f(\boldsymbol{x}_k)\Vert\Vert \boldsymbol{d}_k\Vert} cosθk=f(xk)dkf(xk)Tdk
这个不等式也叫做Zoutendijk不等式

证明:

∇ f ( x k + t k d k ) T d k ≥ α 2 ∇ f ( x k ) T d k ∇ f ( x k + 1 ) T d k ≥ α 2 ∇ f ( x k ) T d k ∇ f ( x k + 1 ) T d k − ∇ f ( x k ) T d k ≥ α 2 ∇ f ( x k ) T d k − ∇ f ( x k ) T d k ( ∇ f ( x k + 1 ) − f ( x k ) ) T d k ≥ ( α 2 − 1 ) ∇ f ( x k ) T d k \begin{aligned} \nabla f(\boldsymbol{x}_{k}+t_{k} \boldsymbol{d}_{k})^T \boldsymbol{d}_{k}&\ge \alpha_2 \nabla f(\boldsymbol{x}_{k})^T\boldsymbol{d}_{k}\\ \nabla f(\boldsymbol{x}_{k+1})^T \boldsymbol{d}_{k}&\ge \alpha_2 \nabla f(\boldsymbol{x}_{k})^T\boldsymbol{d}_{k}\\ \nabla f(\boldsymbol{x}_{k+1})^T \boldsymbol{d}_{k} -\nabla f(\boldsymbol{x}_{k})^T\boldsymbol{d}_{k}&\ge \alpha_2 \nabla f(\boldsymbol{x}_{k})^T\boldsymbol{d}_{k}-\nabla f(\boldsymbol{x}_{k})^T\boldsymbol{d}_{k}\\ (\nabla f(\boldsymbol{x}_{k+1})-f(\boldsymbol{x}_{k}))^T\boldsymbol{d}_{k}&\ge(\alpha_2-1)\nabla f(\boldsymbol{x}_{k})^T\boldsymbol{d}_{k} \end{aligned} f(xk+tkdk)Tdkf(xk+1)Tdkf(xk+1)Tdkf(xk)Tdk(f(xk+1)f(xk))Tdkα2f(xk)Tdkα2f(xk)Tdkα2f(xk)Tdkf(xk)Tdk(α21)f(xk)Tdk
( ∇ f ( x k + 1 ) − f ( x k ) ) T d k ≤ ∥ ∇ f ( x k + 1 ) − f ( x k ) ∥ ∥ d k ∥ ≤ t k L ∥ d k ∥ 2 (\nabla f(\boldsymbol{x}_{k+1})-f(\boldsymbol{x}_{k}))^T\boldsymbol{d}_{k}\le \Vert \nabla f(\boldsymbol{x}_{k+1})-f(\boldsymbol{x}_{k})\Vert \Vert \boldsymbol{d}_{k}\Vert\le t_{k}L\Vert \boldsymbol{d}_{k}\Vert^2 (f(xk+1)f(xk))Tdkf(xk+1)f(xk)dktkLdk2
于是
t k ≥ ( α 2 − 1 ) L ∇ f ( x k ) T d k ∥ d k ∥ 2 t_k\ge \frac{(\alpha_2-1)}{L} \frac{\nabla f(\boldsymbol{x}_{k})^T\boldsymbol{d}_{k}}{\Vert \boldsymbol{d}_{k}\Vert^2} tkL(α21)dk2f(xk)Tdk

注意到 ∇ f ( x k ) T d k < 0 \nabla f(\boldsymbol{x}_{k})^T\boldsymbol{d}_{k}<0 f(xk)Tdk<0
f ( x k + t k d k ) ≤ f ( x k ) + α 1 t k ∇ f ( x k ) T d k f ( x k + 1 ) ≤ f ( x k ) + α 1 ( α 2 − 1 ) L ∇ f ( x k ) T d k ∥ d k ∥ 2 ∇ f ( x k ) T d k f ( x k + 1 ) ≤ f ( x k ) + α 1 ( α 2 − 1 ) L cos ⁡ 2 θ k ∥ ∇ 2 f ( x k ) ∥ 2 \begin{aligned} f(\boldsymbol{x}_{k}+t_{k} \boldsymbol{d}_{k})&\le f(\boldsymbol{x}_{k})+\alpha_1 t_k\nabla f(\boldsymbol{x}_{k})^T\boldsymbol{d}_{k}\\ f(\boldsymbol{x}_{k+1})&\le f(\boldsymbol{x}_{k})+\alpha_1 \frac{(\alpha_2-1)}{L} \frac{\nabla f(\boldsymbol{x}_{k})^T\boldsymbol{d}_{k}}{\Vert \boldsymbol{d}_{k}\Vert^2}\nabla f(\boldsymbol{x}_{k})^T\boldsymbol{d}_{k}\\ f(\boldsymbol{x}_{k+1})&\le f(\boldsymbol{x}_{k})+\alpha_1 \frac{(\alpha_2-1)}{L} \cos^2\theta_k\Vert \nabla^2f(\boldsymbol{x}_k)\Vert^2 \end{aligned} f(xk+tkdk)f(xk+1)f(xk+1)f(xk)+α1tkf(xk)Tdkf(xk)+α1L(α21)dk2f(xk)Tdkf(xk)Tdkf(xk)+α1L(α21)cos2θk2f(xk)2
关于 k k k求和,得到
f ( x k + 1 ) ≤ f ( 0 ) + α 1 ( α 2 − 1 ) L ∑ j = 0 k cos ⁡ 2 θ j ∥ ∇ 2 f ( x j ) ∥ 2 ∑ j = 0 k cos ⁡ 2 θ j ∥ ∇ 2 f ( x j ) ∥ 2 ≤ f ( 0 ) − f ( x k + 1 ) α 1 ( 1 − α 2 ) L \begin{aligned} f(\boldsymbol{x}_{k+1})&\le f(0)+\alpha_1 \frac{(\alpha_2-1)}{L} \sum_{j=0}^{k}\cos^2\theta_j\Vert \nabla^2f(\boldsymbol{x}_j)\Vert^2\\ \sum_{j=0}^{k}\cos^2\theta_j\Vert \nabla^2f(\boldsymbol{x}_j)\Vert^2&\le \frac{f(0)-f(\boldsymbol{x}_{k+1})}{\alpha_1 \frac{(1-\alpha_2)}{L} } \end{aligned} f(xk+1)j=0kcos2θj2f(xj)2f(0)+α1L(α21)j=0kcos2θj2f(xj)2α1L(1α2)f(0)f(xk+1)
其中利用了 0 < α 1 < α 2 < 1 0<\alpha_1<\alpha_2<1 0<α1<α2<1
因为 f f f有下界,所以 k → ∞ k\to\infty k
∑ k = 0 ∞ cos ⁡ 2 θ k ∥ ∇ 2 f ( x k ) ∥ 2 < + ∞ \sum_{k=0}^{\infty}\cos^2\theta_k\Vert \nabla^2f(\boldsymbol{x}_k)\Vert^2<+\infty k=0cos2θk2f(xk)2<+

推论

对于迭代法,设
cos ⁡ θ k = − ∇ f ( x k ) T d k ∥ ∇ f ( x k ) ∥ ∥ d k ∥ \cos \theta_k=\frac{-\nabla f(\boldsymbol{x}_k)^T\boldsymbol{d}_k}{\Vert \nabla f(\boldsymbol{x}_k)\Vert\Vert \boldsymbol{d}_k\Vert} cosθk=f(xk)dkf(xk)Tdk
并假设 ∀ k , ∃ γ > 0 \forall k,\exists \gamma>0 k,γ>0使得
θ k < π 2 − γ \theta_k<\frac{\pi}{2}-\gamma θk<2πγ
在Zoutendijk不等式成立的条件下,有
lim ⁡ k → ∞ ∇ f ( x k ) = 0 \lim\limits_{k\to \infty}\nabla f(\boldsymbol{x}_k)=0 klimf(xk)=0
证明:
假设结论不成立,即存在子列 { k l } \{k_l\} {kl}和正常数 δ > 0 \delta>0 δ>0,使得
∥ f ( x k l ) ∥ ≥ δ , l = 1 , 2 , ⋯ \Vert f(\boldsymbol{x}_{k_l})\Vert \ge \delta,l=1,2,\cdots f(xkl)δ,l=1,2,
θ k < π 2 − γ cos ⁡ θ k > cos ⁡ ( π 2 − γ ) = sin ⁡ γ > 0 \begin{aligned} \theta_k &<\frac{\pi}{2}-\gamma\\ \cos \theta_k &>\cos (\frac{\pi}{2}-\gamma)=\sin\gamma>0 \end{aligned} θkcosθk<2πγ>cos(2πγ)=sinγ>0
∑ k = 0 ∞ cos ⁡ 2 θ k ∥ ∇ 2 f ( x k ) ∥ 2 ≥ ∑ l = 1 ∞ cos ⁡ 2 θ k l ∥ ∇ 2 f ( x k l ) ∥ 2 ≥ ∑ l = 1 ∞ sin ⁡ 2 γ δ 2 → + ∞ \begin{aligned} \sum_{k=0}^{\infty}\cos^2\theta_k\Vert \nabla^2f(\boldsymbol{x}_k)\Vert^2&\ge \sum_{l=1}^{\infty}\cos^2\theta_{k_l}\Vert \nabla^2f(\boldsymbol{x}_{k_l})\Vert^2\\ &\ge \sum_{l=1}^{\infty}\sin^2\gamma\delta^2\to +\infty\\ \end{aligned} k=0cos2θk2f(xk)2l=1cos2θkl2f(xkl)2l=1sin2γδ2+
矛盾
所以
lim ⁡ k → ∞ ∇ f ( x k ) = 0 \lim\limits_{k\to \infty}\nabla f(\boldsymbol{x}_k)=0 klimf(xk)=0

回溯法

s > 0 , α ∈ ( 0 , 1 ) , β ∈ ( 0 , 1 ) s>0,\alpha\in(0,1),\beta\in(0,1) s>0,α(0,1),β(0,1),我们要让步长满足Armijo准则
即,先猜测 t k = s t_k=s tk=s

f ( x k ) − f ( x k + t k d k ) < − α t k ∇ f ( x k ) T d k f(\boldsymbol{x}_{k})-f(\boldsymbol{x}_{k}+t_{k} \boldsymbol{d}_{k})< -\alpha t_k\nabla f(\boldsymbol{x}_{k})^T\boldsymbol{d}_k f(xk)f(xk+tkdk)<αtkf(xk)Tdk
时,让 t k = β t k t_k=\beta t_k tk=βtk
换句话说, t k = s β i k t_k=s\beta^{i_k} tk=sβik,其中 i k i_k ik满足
f ( x k ) − f ( x k + s β i k d k ) ≥ − α s β i k ∇ f ( x k ) T d k f(\boldsymbol{x}_{k})-f(\boldsymbol{x}_{k}+s\beta^{i_k} \boldsymbol{d}_{k})\ge -\alpha s\beta^{i_k}\nabla f(\boldsymbol{x}_{k})^T\boldsymbol{d}_k f(xk)f(xk+sβikdk)αsβikf(xk)Tdk

梯度方法

引理

f : R n → R f:\mathbb{R}^n \to \mathbb{R} f:RnR是一个在 R n \mathbb{R}^n Rn上连续可微的函数。
x ∈ R n \boldsymbol{x}\in\mathbb{R}^n xRn不是一个驻点,即 ∇ f ( x ) ≠ 0 \nabla f(\boldsymbol{x})\neq 0 f(x)=0
那么
min ⁡ d ∈ R n { f ′ ( x ; d ) : ∥ d ∥ = 1 } \min\limits_{\boldsymbol{d}\in\mathbb{R}^n}\{f'(\boldsymbol{x};\boldsymbol{d}):\Vert \boldsymbol{d}\Vert=1\} dRnmin{f(x;d):d=1}
的最优解是
d = − ∇ f ( x ) ∥ f ( x ) ∥ \boldsymbol{d}=-\frac{\nabla f(\boldsymbol{x})}{\Vert f(\boldsymbol{x})\Vert } d=f(x)f(x)
证明:
f ′ ( x ; d ) = ∇ f ( x ) T d ≥ − ∥ ∇ f ( x ) ∥ ∥ d ∥ f'(\boldsymbol{x};\boldsymbol{d})=\nabla f(\boldsymbol{x})^T\boldsymbol{d}\ge-\Vert \nabla f(\boldsymbol{x})\Vert\Vert \boldsymbol{d}\Vert f(x;d)=f(x)Tdf(x)d
当且仅当 ∇ f ( x ) = λ d \nabla f(\boldsymbol{x})=\lambda \boldsymbol{d} f(x)=λd时取等
d = − ∥ ∇ f ( x ) ∥ 2 \boldsymbol{d}=-\Vert \nabla f(\boldsymbol{x})\Vert^2 d=f(x)2

也就是说,负梯度方向是下降最快的方向

梯度方法框架

1.选择初始点 x 0 ∈ R n \boldsymbol{x}_0\in \mathbb{R}^n x0Rn
2. 循环 k = 0 , 1 , ⋯ k=0,1,\cdots k=0,1,
a)挑选步长 t k t_k tk,使得 f ( x k − t k ∇ f ( x k ) ) < f ( x k ) f(\boldsymbol{x}_{k}-t_{k} \nabla f(\boldsymbol{x}_k))<f(\boldsymbol{x}_{k}) f(xktkf(xk))<f(xk)
c)迭代 x k + 1 = x k − t k ∇ f ( x k ) \boldsymbol{x}_{k+1}=\boldsymbol{x}_{k}-t_{k} \nabla f(\boldsymbol{x}_k) xk+1=xktkf(xk)
d)如果满足停止条件就停止,并返回 x k + 1 \boldsymbol{x}_{k+1} xk+1

这里步长你依然可以用之前的固定步长,精确线搜索,或者回溯法,或者其他非精确线搜索

下面给出梯度方法中用回溯法的代码

function [x,fun_val]=gradient_method_backtracking(f,g,x0,s,alpha,...
beta,epsilon)
% Gradient method with backtracking stepsize rule
%
% INPUT
%=======================================
% f ......... objective function
% g ......... gradient of the objective function
% x0......... initial point
% s ......... initial choice of stepsize
% alpha ..... tolerance parameter for the stepsize selection
% beta ...... the constant in which the stepsize is multiplied
% at each backtracking step (0<beta<1)
% epsilon ... tolerance parameter for stopping rule
% OUTPUT
%=======================================
% x ......... optimal solution (up to a tolerance)
% of min f(x)
% fun_val ... optimal function value
x=x0;
grad=g(x);
fun_val=f(x);
iter=0;
while (norm(grad)>epsilon)
    iter=iter+1;
    t=s;
    while (fun_val-f(x-t*grad)<alpha*t*norm(grad)^2)
        t=beta*t;
    end
    x=x-t*grad;
    fun_val=f(x);
    grad=g(x);
    fprintf('iter_number = %3d norm_grad = %2.6f fun_val = %2.6f \n',...
    iter,norm(grad),fun_val);
end

收敛性

下降引理

f ∈ C L 1 , 1 f \in C_{L}^{1,1} fCL1,1,那么 ∀ x , y ∈ R n \forall \mathbf{x},\mathbf{y}\in\mathbb{R}^{n} x,yRn
f ( y ) ≤ f ( x ) + ∇ f ( x ) T ( y − x ) + L 2 ∥ x − y ∥ 2 f(\mathbf{y}) \leq f(\mathbf{x})+\nabla f(\mathbf{x})^{T}(\mathbf{y}-\mathbf{x})+\frac{L}{2}\|\mathbf{x}-\mathbf{y}\|^{2} f(y)f(x)+f(x)T(yx)+2Lxy2
证明:
f ( y ) − f ( x ) = ∫ 0 1 ⟨ ∇ f ( x + t ( y − x ) ) , y − x ⟩ d t f ( y ) − f ( x ) = ⟨ ∇ f ( x ) , y − x ⟩ + ∫ 0 1 ⟨ ∇ f ( x + t ( y − x ) ) − ∇ f ( x ) , y − x ⟩ d t ∣ f ( y ) − f ( x ) − ⟨ ∇ f ( x ) , y − x ⟩ ∣ = ∣ ∫ 0 1 ⟨ ∇ f ( x + t ( y − x ) ) − ∇ f ( x ) , y − x ⟩ d t ∣ ≤ ∫ 0 1 ∣ ⟨ ∇ f ( x + t ( y − x ) ) − ∇ f ( x ) , y − x ⟩ ∣ d t ≤ ∫ 0 1 ∥ ∇ f ( x + t ( y − x ) ) − ∇ f ( x ) ∥ ⋅ ∥ y − x ∥ d t ≤ ∫ 0 1 t L ∥ y − x ∥ 2 d t = L 2 ∥ y − x ∥ 2 \begin{aligned} f(\mathbf{y})-f(\mathbf{x}) &=\int_{0}^{1}\langle\nabla f(\mathbf{x}+t(\mathbf{y}-\mathbf{x})), \mathbf{y}-\mathbf{x}\rangle \mathrm{d} t\\ f(\mathbf{y})-f(\mathbf{x}) &=\langle\nabla f(\mathbf{x}), \mathbf{y}-\mathbf{x}\rangle+\int_{0}^{1}\langle\nabla f(\mathbf{x}+t(\mathbf{y}-\mathbf{x}))-\nabla f(\mathbf{x}), \mathbf{y}-\mathbf{x}\rangle \mathrm{d} t\\ |f(\mathbf{y})-f(\mathbf{x})-\langle\nabla f(\mathbf{x}), \mathbf{y}-\mathbf{x}\rangle| &=\left|\int_{0}^{1}\langle\nabla f(\mathbf{x}+t(\mathbf{y}-\mathbf{x}))-\nabla f(\mathbf{x}), \mathbf{y}-\mathbf{x}\rangle \mathrm{d} t\right| \\ & \leq \int_{0}^{1}|\langle\nabla f(\mathbf{x}+t(\mathbf{y}-\mathbf{x}))-\nabla f(\mathbf{x}), \mathbf{y}-\mathbf{x}\rangle| \mathrm{d} t \\ & \leq \int_{0}^{1}\|\nabla f(\mathbf{x}+t(\mathbf{y}-\mathbf{x}))-\nabla f(\mathbf{x})\| \cdot\|\mathbf{y}-\mathbf{x}\| \mathrm{d} t \\ & \leq \int_{0}^{1} t L\|\mathbf{y}-\mathbf{x}\|^{2} \mathrm{d} t \\ &=\frac{L}{2}\|\mathbf{y}-\mathbf{x}\|^{2} \end{aligned} f(y)f(x)f(y)f(x)f(y)f(x)f(x),yx=01f(x+t(yx)),yxdt=f(x),yx+01f(x+t(yx))f(x),yxdt=01f(x+t(yx))f(x),yxdt01f(x+t(yx))f(x),yxdt01f(x+t(yx))f(x)yxdt01tLyx2dt=2Lyx2

充分下降引理

sufficient decrease lemma

假设 f ∈ C L 1 , 1 f\in C_{L}^{1,1} fCL1,1,那么 ∀ x ∈ R n , t > 0 \forall x\in \mathbb{R}^n,t>0 xRn,t>0,有
f ( x ) − f ( x − t ∇ f ( x ) ) ≥ t ( 1 − L t 2 ) ∥ ∇ f ( x ) ∥ 2 f(\mathbf{x})-f(\mathbf{x}-t \nabla f(\mathbf{x})) \geq t\left(1-\frac{L t}{2}\right)\|\nabla f(\mathbf{x})\|^{2} f(x)f(xtf(x))t(12Lt)f(x)2
证明:
由下降引理
f ( x − t ∇ f ( x ) ) ≤ f ( x ) − t ∥ ∇ f ( x ) ∥ 2 + L t 2 2 ∥ ∇ f ( x ) ∥ 2 = f ( x ) − t ( 1 − L t 2 ) ∥ ∇ f ( x ) ∥ 2 \begin{aligned} f(\mathbf{x}-t \nabla f(\mathbf{x})) & \leq f(\mathbf{x})-t\|\nabla f(\mathbf{x})\|^{2}+\frac{L t^{2}}{2}\|\nabla f(\mathbf{x})\|^{2} \\ &=f(\mathbf{x})-t\left(1-\frac{L t}{2}\right)\|\nabla f(\mathbf{x})\|^{2} \end{aligned} f(xtf(x))f(x)tf(x)2+2Lt2f(x)2=f(x)t(12Lt)f(x)2

固定步长

由充分下降引理
f ( x k ) − f ( x k + 1 ) ≥ t ˉ ( 1 − L t ˉ 2 ) ∥ ∇ f ( x k ) ∥ 2 f\left(\mathbf{x}_{k}\right)-f\left(\mathbf{x}_{k+1}\right) \geq \bar{t}\left(1-\frac{L \bar{t}}{2}\right)\left\|\nabla f\left(\mathbf{x}_{k}\right)\right\|^{2} f(xk)f(xk+1)tˉ(12Ltˉ)f(xk)2
要想下降
{ t ˉ > 0 ( 1 − L t ˉ 2 ) > 0 ⇒ 0 < t ˉ < 2 L \begin{cases} \bar{t}>0\\ \left(1-\frac{L \bar{t}}{2}\right)>0 \end{cases}\Rightarrow 0<\bar{t}<\frac{2}{L} {tˉ>0(12Ltˉ)>00<tˉ<L2
显然当 t ˉ = 1 L \bar{t}=\frac{1}{L} tˉ=L1时取得最大值
所以固定步长,可以取 t ˉ = 1 L \bar{t}=\frac{1}{L} tˉ=L1
从而
f ( x k ) − f ( x k + 1 ) = f ( x k ) − f ( x k − 1 L ∇ f ( x k ) ) ≥ 1 2 L ∥ ∇ f ( x k ) ∥ 2 f\left(\mathbf{x}_{k}\right)-f\left(\mathbf{x}_{k+1}\right)=f\left(\mathbf{x}_{k}\right)-f\left(\mathbf{x}_{k}-\frac{1}{L} \nabla f\left(\mathbf{x}_{k}\right)\right) \geq \frac{1}{2 L}\left\|\nabla f\left(\mathbf{x}_{k}\right)\right\|^{2} f(xk)f(xk+1)=f(xk)f(xkL1f(xk))2L1f(xk)2

精确线搜索

因为 t k = arg ⁡ min ⁡ t ≥ 0 f ( x k − t k ∇ f ( x k ) ) t_{k}=\arg\min\limits_{t\ge 0}f(\boldsymbol{x}_{k}-t_{k}\nabla f\left(\mathbf{x}_{k}\right) ) tk=argt0minf(xktkf(xk))

f ( x k − t k ∇ f ( x k ) ) ≤ f ( x k − 1 L ∇ f ( x k ) ) f\left(\mathbf{x}_{k}-t_{k} \nabla f\left(\mathbf{x}_{k}\right)\right) \leq f\left(\mathbf{x}_{k}-\frac{1}{L} \nabla f\left(\mathbf{x}_{k}\right)\right) f(xktkf(xk))f(xkL1f(xk))
因此
f ( x k ) − f ( x k + 1 ) ≥ f ( x k ) − f ( x k − 1 L ∇ f ( x k ) ) ≥ 1 2 L ∥ ∇ f ( x k ) ∥ 2 f\left(\mathbf{x}_{k}\right)-f\left(\mathbf{x}_{k+1}\right) \geq f\left(\mathbf{x}_{k}\right)-f\left(\mathbf{x}_{k}-\frac{1}{L} \nabla f\left(\mathbf{x}_{k}\right)\right) \geq \frac{1}{2 L}\left\|\nabla f\left(\mathbf{x}_{k}\right)\right\|^{2} f(xk)f(xk+1)f(xk)f(xkL1f(xk))2L1f(xk)2

回溯法

第一种情况,直接满足Armijo准则,即
f ( x k ) − f ( x k − s ∇ f ( x k ) ) ≥ α s ∥ ∇ f ( x k ) ∥ 2 f\left(\mathbf{x}_{k}\right)-f\left(\mathbf{x}_{k}-s \nabla f\left(\mathbf{x}_{k}\right)\right) \geq \alpha s\left\|\nabla f\left(\mathbf{x}_{k}\right)\right\|^{2} f(xk)f(xksf(xk))αsf(xk)2
第二种情况, t k t_k tk时满足Armijo准则,也就是说 t k β \frac{t_k}{\beta} βtk还不满足Armijo准则

f ( x k ) − f ( x k − t k β ∇ f ( x k ) ) < α t k β ∥ ∇ f ( x k ) ∥ 2 , f\left(\mathbf{x}_{k}\right)-f\left(\mathbf{x}_{k}-\frac{t_{k}}{\beta} \nabla f\left(\mathbf{x}_{k}\right)\right)<\alpha \frac{t_{k}}{\beta}\left\|\nabla f\left(\mathbf{x}_{k}\right)\right\|^{2}, f(xk)f(xkβtkf(xk))<αβtkf(xk)2,
x = x k , t = t k β x=x_k,t=\frac{t_k}{\beta} x=xk,t=βtk,代入充分下降引理,有
f ( x k ) − f ( x k − t k β ∇ f ( x k ) ) ≥ t k β ( 1 − L t k 2 β ) ∥ ∇ f ( x k ) ∥ 2 f\left(\mathbf{x}_{k}\right)-f\left(\mathbf{x}_{k}-\frac{t_{k}}{\beta} \nabla f\left(\mathbf{x}_{k}\right)\right) \geq \frac{t_{k}}{\beta}\left(1-\frac{L t_{k}}{2 \beta}\right)\left\|\nabla f\left(\mathbf{x}_{k}\right)\right\|^{2} f(xk)f(xkβtkf(xk))βtk(12βLtk)f(xk)2
联合上面两个式子,得到
t k β ( 1 − L t k 2 β ) < α t k β t k > 2 ( 1 − α ) β L \begin{aligned} \frac{t_{k}}{\beta}\left(1-\frac{L t_{k}}{2 \beta}\right)&<\alpha \frac{t_{k}}{\beta}\\ t_k&>\frac{2(1-\alpha)\beta}{L} \end{aligned} βtk(12βLtk)tk<αβtk>L2(1α)β
综上所述
t k ≥ min ⁡ { s , 2 ( 1 − α ) β L } t_{k} \geq \min \left\{s, \frac{2(1-\alpha) \beta}{L}\right\} tkmin{s,L2(1α)β}
使得
f ( x k ) − f ( x k − t k ∇ f ( x k ) ) ≥ α min ⁡ { s , 2 ( 1 − α ) β L } ∥ ∇ f ( x k ) ∥ 2 f\left(\mathbf{x}_{k}\right)-f\left(\mathbf{x}_{k}-t_{k} \nabla f\left(\mathbf{x}_{k}\right)\right) \geq \alpha \min \left\{s, \frac{2(1-\alpha) \beta}{L}\right\}\left\|\nabla f\left(\mathbf{x}_{k}\right)\right\|^{2} f(xk)f(xktkf(xk))αmin{s,L2(1α)β}f(xk)2

梯度方法的充分下降

sufficient decrease of the gradient method
f ∈ C L 1 , 1 ( R n ) f\in C_{L}^{1,1}(\mathbb{R}^{n}) fCL1,1(Rn)
{ x k } k ≥ 0 \{\mathbf{x}_k\}_{k\ge 0} {xk}k0为梯度方法产生在解决
min ⁡ x ∈ R n f ( x ) \min\limits_{\mathbf{x}\in\mathbb{R}^n}f(\mathbf{x}) xRnminf(x)
产生的序列,使用的步长

  • 固定步长 t ˉ ∈ ( 0 , 2 L ) \bar{t}\in(0,\frac{2}{L}) tˉ(0,L2)
  • 精确线搜索
  • 回溯法( s ∈ R + + , α ∈ ( 0 , 1 ) , β ∈ ( 0 , 1 ) s\in\mathbb{R}_{++},\alpha\in(0,1),\beta\in(0,1) sR++,α(0,1),β(0,1))

那么
f ( x k ) − f ( x k + 1 ) ≥ M ∥ ∇ f ( x k ) ∥ 2 f\left(\mathbf{x}_{k}\right)-f\left(\mathbf{x}_{k+1}\right) \geq M\left\|\nabla f\left(\mathbf{x}_{k}\right)\right\|^{2} f(xk)f(xk+1)Mf(xk)2
其中
M = { t ˉ ( 1 − t ˉ L 2 )  固定步长  1 2 L  精确线搜索  α min ⁡ { s , 2 ( 1 − α ) β L } 回溯法  M= \begin{cases} \bar{t}\left(1-\frac{\bar{t} L}{2}\right) & \text { 固定步长 } \\ \frac{1}{2 L} & \text { 精确线搜索 } \\ \alpha \min \left\{s, \frac{2(1-\alpha) \beta}{L}\right\} & \text {回溯法 } \end{cases} M=tˉ(12tˉL)2L1αmin{s,L2(1α)β} 固定步长  精确线搜索 回溯法 

梯度方法收敛性

f ∈ C L 1 , 1 ( R n ) f\in C_{L}^{1,1}(\mathbb{R}^{n}) fCL1,1(Rn)
{ x k } k ≥ 0 \{\mathbf{x}_k\}_{k\ge 0} {xk}k0为梯度方法产生在解决
min ⁡ x ∈ R n f ( x ) \min\limits_{\mathbf{x}\in\mathbb{R}^n}f(\mathbf{x}) xRnminf(x)
产生的序列,使用的步长

  • 固定步长 t ˉ ∈ ( 0 , 2 L ) \bar{t}\in(0,\frac{2}{L}) tˉ(0,L2)
  • 精确线搜索
  • 回溯法( s ∈ R + + , α ∈ ( 0 , 1 ) , β ∈ ( 0 , 1 ) s\in\mathbb{R}_{++},\alpha\in(0,1),\beta\in(0,1) sR++,α(0,1),β(0,1))

假设 f f f有下界,即 ∃ m ∈ R , ∀ x ∈ R n , f ( x ) > m \exists m\in \mathbb{R},\forall \mathbf{x}\in \mathbb{R}^n,f(\mathbf{x})>m mR,xRn,f(x)>m
那么有
a)序列 { f ( x k ) } k ≥ 0 \{f(\mathbf{x}_k)\}_{k\ge 0} {f(xk)}k0单调不增,并且 ∀ k ≥ 0 , f ( x k + 1 ) < f ( x k ) \forall k\ge 0,f(\mathbf{x}_{k+1})<f(\mathbf{x}_k) k0,f(xk+1)<f(xk),除非 ∇ f ( x k ) = 0 \nabla f(\mathbf{x}_k)=0 f(xk)=0
b)当 k → ∞ , ∇ f ( x k ) → 0 k\to \infty,\nabla f(\mathbf{x}_k)\to 0 k,f(xk)0
证明:
a)
f ( x k ) − f ( x k + 1 ) ≥ M ∥ ∇ f ( x k ) ∥ 2 ≥ 0 f\left(\mathbf{x}_{k}\right)-f\left(\mathbf{x}_{k+1}\right) \geq M\left\|\nabla f\left(\mathbf{x}_{k}\right)\right\|^{2}\ge0 f(xk)f(xk+1)Mf(xk)20
M > 0 M>0 M>0,如果 f ( x k ) = f ( x k + 1 ) f\left(\mathbf{x}_{k}\right)=f\left(\mathbf{x}_{k+1}\right) f(xk)=f(xk+1),那么 ∇ f ( x k ) = 0 \nabla f\left(\mathbf{x}_{k}\right)=0 f(xk)=0

b)
{ f ( x k ) } k ≥ 0 \{f(\mathbf{x}_k)\}_{k\ge 0} {f(xk)}k0单调递减有下界,所以收敛
k → ∞ , f ( x k ) − f ( x k + 1 ) k\to \infty,f\left(\mathbf{x}_{k}\right)-f\left(\mathbf{x}_{k+1}\right) k,f(xk)f(xk+1)那么 k → ∞ , ∇ f ( x k ) → 0 k\to \infty,\nabla f(\mathbf{x}_k)\to 0 k,f(xk)0

梯度的范数收敛速率

{ f ( x k ) } k ≥ 0 \{f(\mathbf{x}_k)\}_{k\ge 0} {f(xk)}k0收敛到 f ∗ f^{*} f,则
min ⁡ k = 0 , 1 , … , n ∥ ∇ f ( x k ) ∥ ≤ f ( x 0 ) − f ∗ M ( n + 1 ) \min _{k=0,1, \ldots, n}\left\|\nabla f\left(\mathbf{x}_{k}\right)\right\| \leq \sqrt{\frac{f\left(\mathbf{x}_{0}\right)-f^{*}}{M(n+1)}} k=0,1,,nminf(xk)M(n+1)f(x0)f
其中 M = { t ˉ ( 1 − t ˉ L 2 )  固定步长  1 2 L  精确线搜索  α min ⁡ { s , 2 ( 1 − α ) β L } 回溯法  M= \begin{cases} \bar{t}\left(1-\frac{\bar{t} L}{2}\right) & \text { 固定步长 } \\ \frac{1}{2 L} & \text { 精确线搜索 } \\ \alpha \min \left\{s, \frac{2(1-\alpha) \beta}{L}\right\} & \text {回溯法 } \end{cases} M=tˉ(12tˉL)2L1αmin{s,L2(1α)β} 固定步长  精确线搜索 回溯法 

证明:
根据
f ( x k ) − f ( x k + 1 ) ≥ M ∥ ∇ f ( x k ) ∥ 2 f\left(\mathbf{x}_{k}\right)-f\left(\mathbf{x}_{k+1}\right) \geq M\left\|\nabla f\left(\mathbf{x}_{k}\right)\right\|^{2} f(xk)f(xk+1)Mf(xk)2

f ( x 0 ) − f ∗ ≥ M ∑ k = 0 n ∥ ∇ f ( x k ) ∥ 2 f\left(\mathbf{x}_{0}\right)-f^{*} \geq M\sum_{k=0}^{n}\left\|\nabla f\left(\mathbf{x}_{k}\right)\right\|^{2} f(x0)fMk=0nf(xk)2
显然 ∥ ∇ f ( x k ) ∥ 2 ≥ min ⁡ k = 0 , 1 , ⋯   , n ∥ ∇ f ( x k ) ∥ 2 \left\|\nabla f\left(\mathbf{x}_{k}\right)\right\|^{2}\ge \min\limits_{k=0,1,\cdots,n}\left\|\nabla f\left(\mathbf{x}_{k}\right)\right\|^{2} f(xk)2k=0,1,,nminf(xk)2
所以
f ( x 0 ) − f ∗ ≥ M ( n + 1 ) min ⁡ k = 0 , 1 , ⋯   , n ∥ ∇ f ( x k ) ∥ 2 f\left(\mathbf{x}_{0}\right)-f^{*} \geq M(n+1)\min\limits_{k=0,1,\cdots,n}\left\|\nabla f\left(\mathbf{x}_{k}\right)\right\|^{2} f(x0)fM(n+1)k=0,1,,nminf(xk)2
于是
min ⁡ k = 0 , 1 , … , n ∥ ∇ f ( x k ) ∥ ≤ f ( x 0 ) − f ∗ M ( n + 1 ) \min _{k=0,1, \ldots, n}\left\|\nabla f\left(\mathbf{x}_{k}\right)\right\| \leq \sqrt{\frac{f\left(\mathbf{x}_{0}\right)-f^{*}}{M(n+1)}} k=0,1,,nminf(xk)M(n+1)f(x0)f