下降法(Descent Directions Method)学习
目标
min { f ( x ) : x ∈ R n } \min\{f(\boldsymbol{x}):\boldsymbol{x}\in \mathbb{R}^n \} min{f(x):x∈Rn}
下降方向
设
f
:
R
n
→
R
f:\mathbb{R}^n \to \mathbb{R}
f:Rn→R是一个在
R
n
\mathbb{R}^n
Rn上连续可微的函数。
0
≠
d
∈
R
n
0\neq \boldsymbol{d}\in \mathbb{R}^n
0=d∈Rn,如果
f
′
(
x
;
d
)
=
∇
f
(
x
)
T
d
<
0
f'(\boldsymbol{x};\boldsymbol{d})=\nabla f(\boldsymbol{x})^T\boldsymbol{d}<0
f′(x;d)=∇f(x)Td<0
那么称
d
\boldsymbol{d}
d为一个下降方向
性质
设
f
:
R
n
→
R
f:\mathbb{R}^n \to \mathbb{R}
f:Rn→R是一个在
R
n
\mathbb{R}^n
Rn上连续可微的函数。
x
∈
R
n
\boldsymbol{x}\in \mathbb{R}^n
x∈Rn
如果
d
\boldsymbol{d}
d是一个下降方向,那么
∃
ϵ
>
0
\exists\epsilon>0
∃ϵ>0,使得
∀
t
∈
(
0
,
ϵ
]
\forall t\in(0,\epsilon]
∀t∈(0,ϵ],有
f
(
x
+
t
d
)
<
f
(
x
)
f(\boldsymbol{x}+t\boldsymbol{d})<f(\boldsymbol{x})
f(x+td)<f(x)
证明:
lim
t
→
0
+
f
(
x
+
t
d
)
−
f
(
x
)
t
=
f
′
(
x
;
d
)
<
0
\lim\limits_{t\to 0^{+}}\frac{f(\boldsymbol{x}+t\boldsymbol{d})-f(\boldsymbol{x})}{t}=f'(\boldsymbol{x};\boldsymbol{d})<0
t→0+limtf(x+td)−f(x)=f′(x;d)<0
由保号性,
∃
ϵ
>
0
\exists\epsilon>0
∃ϵ>0,使得
∀
t
∈
(
0
,
ϵ
]
\forall t\in(0,\epsilon]
∀t∈(0,ϵ],有
f
(
x
+
t
d
)
−
f
(
x
)
t
<
0
\frac{f(\boldsymbol{x}+t\boldsymbol{d})-f(\boldsymbol{x})}{t}<0
tf(x+td)−f(x)<0
再根据
t
>
0
t>0
t>0
有
f
(
x
+
t
d
)
<
f
(
x
)
f(\boldsymbol{x}+t\boldsymbol{d})<f(\boldsymbol{x})
f(x+td)<f(x)
Lipschitz连续
主要用到梯度是Lipschitz连续
假设
f
f
f连续可微,如果
∥
∇
f
(
x
)
−
∇
f
(
y
)
∥
≤
L
∥
x
−
y
∥
,
∀
x
,
y
∈
R
n
\Vert \nabla f(\boldsymbol{x})-\nabla f(\boldsymbol{y})\Vert \le L\Vert \boldsymbol{x}-\boldsymbol{y}\Vert,\forall \boldsymbol{x},\boldsymbol{y}\in\mathbb{R}^n
∥∇f(x)−∇f(y)∥≤L∥x−y∥,∀x,y∈Rn
那么称他的梯度
∇
f
(
x
)
\nabla f(\boldsymbol{x})
∇f(x)是Lipchitz连续的
L
L
L称为Lipschitz常数
显然
L
~
≥
L
\tilde{L}\ge L
L~≥L也满足条件,所以我们只关心满足条件的
L
L
L中最小的
把梯度Lipschitz连续的函数的一类函数记为
C
L
1
,
1
(
R
n
)
C_{L}^{1,1}(\mathbb{R}^n)
CL1,1(Rn)
如果不关心
L
L
L具体是多少,有时候也记为
C
L
1
,
1
C_{L}^{1,1}
CL1,1
比如
f
(
x
)
=
a
T
x
∈
C
0
1
,
1
f(\boldsymbol{x})=\boldsymbol{a}^T \boldsymbol{x}\in C_{0}^{1,1}
f(x)=aTx∈C01,1
f
(
x
)
=
x
T
A
x
+
2
b
T
x
+
c
∈
C
2
∥
A
∥
1
,
1
f(\boldsymbol{x})=\boldsymbol{x}^TA\boldsymbol{x}+2\boldsymbol{b}^T\boldsymbol{x}+c\in C_{2\Vert A\Vert}^{1,1}
f(x)=xTAx+2bTx+c∈C2∥A∥1,1
定理
假设
f
f
f二阶连续可微,那么下面两个命题等价
a)
f
∈
C
L
1
,
1
f\in C_{L}^{1,1}
f∈CL1,1
b)
∀
x
∈
R
n
,
∥
∇
2
f
(
x
)
∥
≤
L
\forall \boldsymbol{x}\in \mathbb{R}^{n},\Vert \nabla^2f(\boldsymbol{x})\Vert \le L
∀x∈Rn,∥∇2f(x)∥≤L
证明:
b
)
⇒
a
)
b)\Rightarrow a)
b)⇒a)
∇
f
(
y
)
=
∇
f
(
x
)
+
∫
0
1
∇
2
f
(
x
+
t
(
y
−
x
)
)
(
y
−
x
)
d
x
=
∇
f
(
x
)
+
(
∫
0
1
∇
2
f
(
x
+
t
(
y
−
x
)
)
d
x
)
(
y
−
x
)
\begin{aligned} \nabla f(\boldsymbol{y})&=\nabla f(\boldsymbol{x})+\int_{0}^{1}\nabla^2 f(\boldsymbol{x}+t(\boldsymbol{y}-\boldsymbol{x}))(\boldsymbol{y}-\boldsymbol{x})\mathrm{d}\boldsymbol{x}\\ &=\nabla f(\boldsymbol{x})+(\int_{0}^{1}\nabla^2 f(\boldsymbol{x}+t(\boldsymbol{y}-\boldsymbol{x}))\mathrm{d}\boldsymbol{x})(\boldsymbol{y}-\boldsymbol{x})\\ \end{aligned}
∇f(y)=∇f(x)+∫01∇2f(x+t(y−x))(y−x)dx=∇f(x)+(∫01∇2f(x+t(y−x))dx)(y−x)
所以
∥
∇
f
(
y
)
−
∇
f
(
x
)
∥
=
∥
(
∫
0
1
∇
2
f
(
x
+
t
(
y
−
x
)
)
d
x
)
(
y
−
x
)
∥
≤
∥
∫
0
1
∇
2
f
(
x
+
t
(
y
−
x
)
)
d
x
∥
∥
y
−
x
∥
≤
(
∫
0
1
∥
∇
2
f
(
x
+
t
(
y
−
x
)
)
∥
d
x
)
∥
y
−
x
∥
≤
L
∥
y
−
x
∥
\begin{aligned} \Vert \nabla f(\boldsymbol{y})-\nabla f(\boldsymbol{x})\Vert &=\Vert (\int_{0}^{1}\nabla^2 f(\boldsymbol{x}+t(\boldsymbol{y}-\boldsymbol{x}))\mathrm{d}\boldsymbol{x})(\boldsymbol{y}-\boldsymbol{x})\Vert\\ &\le \Vert \int_{0}^{1}\nabla^2 f(\boldsymbol{x}+t(\boldsymbol{y}-\boldsymbol{x}))\mathrm{d}\boldsymbol{x}\Vert\Vert\boldsymbol{y}-\boldsymbol{x}\Vert\\ &\le \left(\int_{0}^{1}\Vert\nabla^2 f(\boldsymbol{x}+t(\boldsymbol{y}-\boldsymbol{x}))\Vert\mathrm{d}\boldsymbol{x}\right)\Vert\boldsymbol{y}-\boldsymbol{x}\Vert\\ &\le L\Vert\boldsymbol{y}-\boldsymbol{x}\Vert \end{aligned}
∥∇f(y)−∇f(x)∥=∥(∫01∇2f(x+t(y−x))dx)(y−x)∥≤∥∫01∇2f(x+t(y−x))dx∥∥y−x∥≤(∫01∥∇2f(x+t(y−x))∥dx)∥y−x∥≤L∥y−x∥
a
)
⇒
b
)
a)\Rightarrow b)
a)⇒b)
∇
f
(
x
+
α
d
)
−
∇
f
(
x
)
=
∫
0
α
∇
2
f
(
x
+
t
d
)
d
d
t
\nabla f(\boldsymbol{x}+\alpha \boldsymbol{d})-\nabla f(\boldsymbol{x})=\int_{0}^{\alpha}\nabla^2f(\boldsymbol{x}+t \boldsymbol{d})\boldsymbol{d}\mathrm{d}t
∇f(x+αd)−∇f(x)=∫0α∇2f(x+td)ddt
所以
∥
(
∫
0
α
∇
2
f
(
x
+
t
d
)
d
t
)
d
∥
=
∥
∇
f
(
x
+
α
d
)
−
∇
f
(
x
)
∥
≤
α
L
∥
d
∥
\Vert \left(\int_{0}^{\alpha}\nabla^2f(\boldsymbol{x}+t \boldsymbol{d})\mathrm{d}t\right)\boldsymbol{d}\Vert=\Vert\nabla f(\boldsymbol{x}+\alpha \boldsymbol{d})-\nabla f(\boldsymbol{x})\Vert\le \alpha L\Vert\boldsymbol{d}\Vert
∥(∫0α∇2f(x+td)dt)d∥=∥∇f(x+αd)−∇f(x)∥≤αL∥d∥
两边同除
α
\alpha
α,并令
α
→
0
+
\alpha \to 0^{+}
α→0+
∥
∇
2
f
(
x
)
d
∥
≤
L
∥
d
∥
⇒
∥
∇
2
f
(
x
)
∥
≤
L
\Vert \nabla^2f(\boldsymbol{x})\boldsymbol{d}\Vert\le L\Vert \boldsymbol{d}\Vert\Rightarrow \Vert \nabla^2f(\boldsymbol{x})\Vert \le L
∥∇2f(x)d∥≤L∥d∥⇒∥∇2f(x)∥≤L
下降法
有了上面的知识,其实我们的迭代就很明显了
x
k
+
1
=
x
k
+
t
k
d
k
,
k
=
0
,
1
,
⋯
\boldsymbol{x}_{k+1}=\boldsymbol{x}_{k}+t_{k} \boldsymbol{d}_{k},k=0,1,\cdots
xk+1=xk+tkdk,k=0,1,⋯
下降法的整体框架就是
1.选择初始点
x
0
∈
R
n
\boldsymbol{x}_0\in \mathbb{R}^n
x0∈Rn
2. 循环
k
=
0
,
1
,
⋯
k=0,1,\cdots
k=0,1,⋯
a)选择下降方向
d
k
\boldsymbol{d}_{k}
dk
b)挑选步长
t
k
t_k
tk,使得
f
(
x
k
+
t
k
d
k
)
<
f
(
x
k
)
f(\boldsymbol{x}_{k}+t_{k} \boldsymbol{d}_{k})<f(\boldsymbol{x}_{k})
f(xk+tkdk)<f(xk)
c)迭代
x
k
+
1
=
x
k
+
t
k
d
k
\boldsymbol{x}_{k+1}=\boldsymbol{x}_{k}+t_{k} \boldsymbol{d}_{k}
xk+1=xk+tkdk
d)如果满足停止条件就停止,并返回
x
k
+
1
\boldsymbol{x}_{k+1}
xk+1
那么问题来了,
- 怎么选起始点
- 怎么选下降方向
- 怎么选步长
- 停止条件是什么
因为最小值点,起码得是一个驻点,所以停止条件经常就是
∥
f
(
x
)
∥
≤
ϵ
\Vert f(x)\Vert \le \epsilon
∥f(x)∥≤ϵ
所以终点看步长和下降方向
步长
固定步长
就是 ∀ k , t k = t ~ \forall k,t_k=\tilde{t} ∀k,tk=t~
精确线搜索算法
t k = arg min t ≥ 0 f ( x k + t k d k ) t_{k}=\arg\min\limits_{t\ge 0}f(\boldsymbol{x}_{k}+t_{k} \boldsymbol{d}_{k}) tk=argt≥0minf(xk+tkdk)
非精确线搜索算法
对于非精确线搜索,就是步长需要满足一些条件。
Armijo准则
f
(
x
k
)
−
f
(
x
k
+
t
k
d
k
)
≥
−
α
t
k
∇
f
(
x
k
)
T
d
k
f(\boldsymbol{x}_{k})-f(\boldsymbol{x}_{k}+t_{k} \boldsymbol{d}_{k})\ge -\alpha t_k\nabla f(\boldsymbol{x}_{k})^T\boldsymbol{d}_k
f(xk)−f(xk+tkdk)≥−αtk∇f(xk)Tdk
其中
α
∈
(
0
,
1
)
\alpha\in(0,1)
α∈(0,1)
如果
t
t
t足够小,这个准则总是能满足的
大概下面这个图这种感觉
存在性证明
设
f
:
R
n
→
R
f:\mathbb{R}^n \to \mathbb{R}
f:Rn→R是一个在
R
n
\mathbb{R}^n
Rn上连续可微的函数。
x
∈
R
n
\boldsymbol{x}\in \mathbb{R}^n
x∈Rn
设
0
≠
d
∈
R
n
0\neq \boldsymbol{d}\in \mathbb{R}^n
0=d∈Rn是一个下降方向,
α
∈
(
0
,
1
)
\alpha\in(0,1)
α∈(0,1)
那么
∃
ϵ
>
0
,
∀
t
∈
[
0
,
ϵ
]
\exists \epsilon>0,\forall t\in [0,\epsilon]
∃ϵ>0,∀t∈[0,ϵ],有
f
(
x
)
−
f
(
x
+
t
d
)
≥
−
α
t
∇
f
(
x
)
T
d
f(\boldsymbol{x})-f(\boldsymbol{x}+t \boldsymbol{d})\ge -\alpha t\nabla f(\boldsymbol{x})^T\boldsymbol{d}
f(x)−f(x+td)≥−αt∇f(x)Td
证明:
f
(
x
+
t
d
)
=
f
(
x
)
+
t
∇
f
(
x
)
T
d
+
o
(
t
∥
d
∥
)
f
(
x
)
−
f
(
x
+
t
d
)
=
−
α
t
∇
f
(
x
)
T
d
−
(
1
−
α
)
∇
f
(
x
)
T
d
−
o
(
t
∥
d
∥
)
\begin{aligned} f(\boldsymbol{x}+t \boldsymbol{d})&=f(\boldsymbol{x})+t\nabla f(\boldsymbol{x})^T\boldsymbol{d}+o(t\Vert \boldsymbol{d}\Vert)\\ f(\boldsymbol{x})-f(\boldsymbol{x}+t \boldsymbol{d})&=-\alpha t\nabla f(\boldsymbol{x})^T\boldsymbol{d}-(1-\alpha)\nabla f(\boldsymbol{x})^T\boldsymbol{d}-o(t\Vert \boldsymbol{d}\Vert) \end{aligned}
f(x+td)f(x)−f(x+td)=f(x)+t∇f(x)Td+o(t∥d∥)=−αt∇f(x)Td−(1−α)∇f(x)Td−o(t∥d∥)
lim
t
→
0
+
(
1
−
α
)
t
∇
f
(
x
)
T
d
+
o
(
t
∥
d
∥
)
t
=
(
1
−
α
)
∇
f
(
x
)
T
d
<
0
\lim \limits_{t \to 0^{+}} \frac{(1-\alpha) t \nabla f(\mathbf{x})^{T} \mathbf{d}+o(t\Vert\mathbf{d}\Vert)}{t}=(1-\alpha) \nabla f(\mathbf{x})^{T} \mathbf{d}<0
t→0+limt(1−α)t∇f(x)Td+o(t∥d∥)=(1−α)∇f(x)Td<0
因此
∃
ϵ
>
0
,
∀
t
∈
(
0
,
ϵ
]
\exists \epsilon>0,\forall t\in(0,\epsilon]
∃ϵ>0,∀t∈(0,ϵ]
(
1
−
α
)
∇
f
(
x
)
T
d
+
o
(
t
∥
d
∥
)
<
0
(1-\alpha)\nabla f(\boldsymbol{x})^T\boldsymbol{d}+o(t\Vert \boldsymbol{d}\Vert)<0
(1−α)∇f(x)Td+o(t∥d∥)<0
进而
f
(
x
)
−
f
(
x
+
t
d
)
≥
−
α
t
∇
f
(
x
)
T
d
f(\boldsymbol{x})-f(\boldsymbol{x}+t \boldsymbol{d})\ge -\alpha t\nabla f(\boldsymbol{x})^T\boldsymbol{d}
f(x)−f(x+td)≥−αt∇f(x)Td
Goldstein准则
也叫做Armijo-Goldstein准则
为了克服Armijo准则的步长可能过小,这个准则
f
(
x
k
+
t
k
d
k
)
≤
f
(
x
k
)
+
α
t
k
∇
f
(
x
k
)
T
d
k
f
(
x
k
+
t
k
d
k
)
≥
f
(
x
k
)
+
(
1
−
α
)
t
k
∇
f
(
x
k
)
T
d
k
\begin{aligned} f(\boldsymbol{x}_{k}+t_{k} \boldsymbol{d}_{k})&\le f(\boldsymbol{x}_{k})+\alpha t_k\nabla f(\boldsymbol{x}_{k})^T\boldsymbol{d}_{k}\\ f(\boldsymbol{x}_{k}+t_{k} \boldsymbol{d}_{k})&\ge f(\boldsymbol{x}_{k})+(1-\alpha) t_k\nabla f(\boldsymbol{x}_{k})^T\boldsymbol{d}_{k}\\ \end{aligned}
f(xk+tkdk)f(xk+tkdk)≤f(xk)+αtk∇f(xk)Tdk≥f(xk)+(1−α)tk∇f(xk)Tdk
其中
α
∈
(
0
,
1
2
)
\alpha\in(0,\frac{1}{2})
α∈(0,21)
然而缺点就是,可能会漏掉最优的点
Wolfe准则
也叫Wolfe-Powell准则
f
(
x
k
+
t
k
d
k
)
≤
f
(
x
k
)
+
α
1
t
k
∇
f
(
x
k
)
T
d
k
∇
f
(
x
k
+
t
k
d
k
)
T
d
k
≥
α
2
∇
f
(
x
k
)
T
d
k
\begin{aligned} f(\boldsymbol{x}_{k}+t_{k} \boldsymbol{d}_{k})&\le f(\boldsymbol{x}_{k})+\alpha_1 t_k\nabla f(\boldsymbol{x}_{k})^T\boldsymbol{d}_{k}\\ \nabla f(\boldsymbol{x}_{k}+t_{k} \boldsymbol{d}_{k})^T \boldsymbol{d}_{k}&\ge \alpha_2 \nabla f(\boldsymbol{x}_{k})^T\boldsymbol{d}_{k}\\ \end{aligned}
f(xk+tkdk)∇f(xk+tkdk)Tdk≤f(xk)+α1tk∇f(xk)Tdk≥α2∇f(xk)Tdk
其中
α
1
,
α
2
∈
(
0
,
1
)
,
α
1
<
α
2
\alpha_1,\alpha_2\in(0,1),\alpha_1<\alpha_2
α1,α2∈(0,1),α1<α2
也有强Wolfe准则
f
(
x
k
+
t
k
d
k
)
≤
f
(
x
k
)
+
α
1
t
k
∇
f
(
x
k
)
T
d
k
∣
∇
f
(
x
k
+
t
k
d
k
)
T
d
k
∣
≤
∣
α
2
∇
f
(
x
k
)
T
d
k
∣
\begin{aligned} f(\boldsymbol{x}_{k}+t_{k} \boldsymbol{d}_{k})&\le f(\boldsymbol{x}_{k})+\alpha_1 t_k\nabla f(\boldsymbol{x}_{k})^T\boldsymbol{d}_{k}\\ \left|\nabla f(\boldsymbol{x}_{k}+t_{k} \boldsymbol{d}_{k})^T \boldsymbol{d}_{k}\right|&\le \left|\alpha_2 \nabla f(\boldsymbol{x}_{k})^T\boldsymbol{d}_{k}\right|\\ \end{aligned}
f(xk+tkdk)∣∣∇f(xk+tkdk)Tdk∣∣≤f(xk)+α1tk∇f(xk)Tdk≤∣∣α2∇f(xk)Tdk∣∣
其中
α
1
,
α
2
∈
(
0
,
1
)
,
α
1
<
α
2
\alpha_1,\alpha_2\in(0,1),\alpha_1<\alpha_2
α1,α2∈(0,1),α1<α2
设
ϕ
(
t
k
)
=
f
(
x
k
+
t
k
d
k
)
\phi(t_k)=f(\boldsymbol{x}_{k}+t_{k} \boldsymbol{d}_{k})
ϕ(tk)=f(xk+tkdk)
ϕ
′
(
t
k
)
=
∇
f
(
x
k
+
t
k
d
k
)
T
d
k
\phi'(t_k)=\nabla f(\boldsymbol{x}_{k}+t_{k} \boldsymbol{d}_{k})^T \boldsymbol{d}_{k}
ϕ′(tk)=∇f(xk+tkdk)Tdk
而极小值点
t
k
∗
t_k^{*}
tk∗,满足
ϕ
′
(
t
k
∗
)
=
0
\phi'(t_k^*)=0
ϕ′(tk∗)=0
所以第二个条件总是满足的
第一个条件其实就是Armijo准则
存在性
设 f : R n → R f:\mathbb{R}^{n}\to\mathbb{R} f:Rn→R连续可微, d k \mathbf{d}_k dk时一个下降方向, f f f有下界 { x k + α d k ∣ α > 0 } \left\{\mathbf{x}_k+\alpha\mathbf{d}_k|\alpha>0\right\} {xk+αdk∣α>0}.如果 0 < α 1 < α 2 < 1 0<\alpha_1<\alpha_2<1 0<α1<α2<1,则存在步长满足Wolfe准则和强Wolfe准则
证明:
设
ϕ
(
t
k
)
=
f
(
x
k
+
t
k
d
k
)
\phi(t_k)=f(\boldsymbol{x}_{k}+t_{k} \boldsymbol{d}_{k})
ϕ(tk)=f(xk+tkdk)
l
(
t
)
=
f
(
x
k
)
+
α
1
t
∇
f
(
x
k
)
T
d
k
l\left(t\right)=f(\boldsymbol{x}_{k})+\alpha_1 t\nabla f(\boldsymbol{x}_{k})^T\boldsymbol{d}_{k}
l(t)=f(xk)+α1t∇f(xk)Tdk和
ϕ
\phi
ϕ必然有交集
设
t
′
>
0
t'>0
t′>0为他们相交的最小的
t
′
t'
t′,则
f
(
x
k
+
t
′
d
k
)
=
f
(
x
k
)
+
α
1
t
′
∇
f
(
x
k
)
T
d
k
f\left(\mathbf{x}_k+t'\mathbf{d}_k\right)=f(\boldsymbol{x}_{k})+\alpha_1 t'\nabla f(\boldsymbol{x}_{k})^T\boldsymbol{d}_{k}
f(xk+t′dk)=f(xk)+α1t′∇f(xk)Tdk
t
′
t'
t′满足Armijo条件,并且对于小于
t
′
t'
t′的步长也满足
根据拉格朗日中值定理,存在
t
′
′
∈
(
0
,
t
′
)
t''\in\left(0,t'\right)
t′′∈(0,t′),使得
f
(
x
k
+
t
′
d
k
)
−
f
(
x
k
)
=
t
′
∇
f
(
x
k
+
t
′
′
d
k
)
T
d
k
f\left(\mathbf{x}_k+t'\mathbf{d}_k\right)-f(\boldsymbol{x}_{k})=t'\nabla f(\boldsymbol{x}_{k}+t'' \boldsymbol{d}_{k})^T \boldsymbol{d}_{k}
f(xk+t′dk)−f(xk)=t′∇f(xk+t′′dk)Tdk
结合一下
∇
f
(
x
k
+
t
′
′
d
k
)
T
d
k
=
α
1
∇
f
(
x
k
)
T
d
k
>
α
2
∇
f
(
x
k
)
T
d
k
\nabla f(\boldsymbol{x}_{k}+t'' \boldsymbol{d}_{k})^T \boldsymbol{d}_{k}=\alpha_1\nabla f(\boldsymbol{x}_{k})^T\boldsymbol{d}_{k}>\alpha_2\nabla f(\boldsymbol{x}_{k})^T\boldsymbol{d}_{k}
∇f(xk+t′′dk)Tdk=α1∇f(xk)Tdk>α2∇f(xk)Tdk
所以
t
′
′
t''
t′′是满足Wolfe准则的步长
注意到
α
2
∇
f
(
x
k
)
T
d
k
<
0
\alpha_2\nabla f(\boldsymbol{x}_{k})^T\boldsymbol{d}_{k}<0
α2∇f(xk)Tdk<0以及
α
1
∇
f
(
x
k
)
T
d
k
<
0
\alpha_1\nabla f(\boldsymbol{x}_{k})^T\boldsymbol{d}_{k}<0
α1∇f(xk)Tdk<0
所以
t
′
′
t''
t′′是满足强Wolfe准则的步长
步骤
https://blog.csdn.net/weixin_43761124/article/details/107436454
收敛性
考虑一般的迭代格式,其中
d
k
\boldsymbol{d}_k
dk是搜索方向,
t
k
t_k
tk是步长,且在迭代过程中满足Wolfe准则,假设目标函数
f
f
f连续可微,有下界,且
∥
∇
f
(
x
)
−
∇
f
(
y
)
∥
≤
L
∥
x
−
y
∥
,
∀
x
,
y
∈
R
n
\Vert \nabla f(\boldsymbol{x})- \nabla f(\boldsymbol{y})\Vert\le L\Vert \boldsymbol{x}-\boldsymbol{y}\Vert,\quad \forall \boldsymbol{x},\boldsymbol{y}\in\mathbb{R}^n
∥∇f(x)−∇f(y)∥≤L∥x−y∥,∀x,y∈Rn
那么
∑
k
=
0
∞
cos
2
θ
k
∥
∇
2
f
(
x
k
)
∥
2
<
+
∞
\sum_{k=0}^{\infty}\cos^2\theta_k\Vert \nabla^2f(\boldsymbol{x}_k)\Vert^2<+\infty
k=0∑∞cos2θk∥∇2f(xk)∥2<+∞
其中
cos
θ
k
\cos\theta_k
cosθk为负梯度和下降方向
d
k
d_k
dk的夹角的余弦,即
cos
θ
k
=
−
∇
f
(
x
k
)
T
d
k
∥
∇
f
(
x
k
)
∥
∥
d
k
∥
\cos \theta_k=\frac{-\nabla f(\boldsymbol{x}_k)^T\boldsymbol{d}_k}{\Vert \nabla f(\boldsymbol{x}_k)\Vert\Vert \boldsymbol{d}_k\Vert}
cosθk=∥∇f(xk)∥∥dk∥−∇f(xk)Tdk
这个不等式也叫做Zoutendijk不等式
证明:
∇
f
(
x
k
+
t
k
d
k
)
T
d
k
≥
α
2
∇
f
(
x
k
)
T
d
k
∇
f
(
x
k
+
1
)
T
d
k
≥
α
2
∇
f
(
x
k
)
T
d
k
∇
f
(
x
k
+
1
)
T
d
k
−
∇
f
(
x
k
)
T
d
k
≥
α
2
∇
f
(
x
k
)
T
d
k
−
∇
f
(
x
k
)
T
d
k
(
∇
f
(
x
k
+
1
)
−
f
(
x
k
)
)
T
d
k
≥
(
α
2
−
1
)
∇
f
(
x
k
)
T
d
k
\begin{aligned} \nabla f(\boldsymbol{x}_{k}+t_{k} \boldsymbol{d}_{k})^T \boldsymbol{d}_{k}&\ge \alpha_2 \nabla f(\boldsymbol{x}_{k})^T\boldsymbol{d}_{k}\\ \nabla f(\boldsymbol{x}_{k+1})^T \boldsymbol{d}_{k}&\ge \alpha_2 \nabla f(\boldsymbol{x}_{k})^T\boldsymbol{d}_{k}\\ \nabla f(\boldsymbol{x}_{k+1})^T \boldsymbol{d}_{k} -\nabla f(\boldsymbol{x}_{k})^T\boldsymbol{d}_{k}&\ge \alpha_2 \nabla f(\boldsymbol{x}_{k})^T\boldsymbol{d}_{k}-\nabla f(\boldsymbol{x}_{k})^T\boldsymbol{d}_{k}\\ (\nabla f(\boldsymbol{x}_{k+1})-f(\boldsymbol{x}_{k}))^T\boldsymbol{d}_{k}&\ge(\alpha_2-1)\nabla f(\boldsymbol{x}_{k})^T\boldsymbol{d}_{k} \end{aligned}
∇f(xk+tkdk)Tdk∇f(xk+1)Tdk∇f(xk+1)Tdk−∇f(xk)Tdk(∇f(xk+1)−f(xk))Tdk≥α2∇f(xk)Tdk≥α2∇f(xk)Tdk≥α2∇f(xk)Tdk−∇f(xk)Tdk≥(α2−1)∇f(xk)Tdk
(
∇
f
(
x
k
+
1
)
−
f
(
x
k
)
)
T
d
k
≤
∥
∇
f
(
x
k
+
1
)
−
f
(
x
k
)
∥
∥
d
k
∥
≤
t
k
L
∥
d
k
∥
2
(\nabla f(\boldsymbol{x}_{k+1})-f(\boldsymbol{x}_{k}))^T\boldsymbol{d}_{k}\le \Vert \nabla f(\boldsymbol{x}_{k+1})-f(\boldsymbol{x}_{k})\Vert \Vert \boldsymbol{d}_{k}\Vert\le t_{k}L\Vert \boldsymbol{d}_{k}\Vert^2
(∇f(xk+1)−f(xk))Tdk≤∥∇f(xk+1)−f(xk)∥∥dk∥≤tkL∥dk∥2
于是
t
k
≥
(
α
2
−
1
)
L
∇
f
(
x
k
)
T
d
k
∥
d
k
∥
2
t_k\ge \frac{(\alpha_2-1)}{L} \frac{\nabla f(\boldsymbol{x}_{k})^T\boldsymbol{d}_{k}}{\Vert \boldsymbol{d}_{k}\Vert^2}
tk≥L(α2−1)∥dk∥2∇f(xk)Tdk
注意到
∇
f
(
x
k
)
T
d
k
<
0
\nabla f(\boldsymbol{x}_{k})^T\boldsymbol{d}_{k}<0
∇f(xk)Tdk<0
f
(
x
k
+
t
k
d
k
)
≤
f
(
x
k
)
+
α
1
t
k
∇
f
(
x
k
)
T
d
k
f
(
x
k
+
1
)
≤
f
(
x
k
)
+
α
1
(
α
2
−
1
)
L
∇
f
(
x
k
)
T
d
k
∥
d
k
∥
2
∇
f
(
x
k
)
T
d
k
f
(
x
k
+
1
)
≤
f
(
x
k
)
+
α
1
(
α
2
−
1
)
L
cos
2
θ
k
∥
∇
2
f
(
x
k
)
∥
2
\begin{aligned} f(\boldsymbol{x}_{k}+t_{k} \boldsymbol{d}_{k})&\le f(\boldsymbol{x}_{k})+\alpha_1 t_k\nabla f(\boldsymbol{x}_{k})^T\boldsymbol{d}_{k}\\ f(\boldsymbol{x}_{k+1})&\le f(\boldsymbol{x}_{k})+\alpha_1 \frac{(\alpha_2-1)}{L} \frac{\nabla f(\boldsymbol{x}_{k})^T\boldsymbol{d}_{k}}{\Vert \boldsymbol{d}_{k}\Vert^2}\nabla f(\boldsymbol{x}_{k})^T\boldsymbol{d}_{k}\\ f(\boldsymbol{x}_{k+1})&\le f(\boldsymbol{x}_{k})+\alpha_1 \frac{(\alpha_2-1)}{L} \cos^2\theta_k\Vert \nabla^2f(\boldsymbol{x}_k)\Vert^2 \end{aligned}
f(xk+tkdk)f(xk+1)f(xk+1)≤f(xk)+α1tk∇f(xk)Tdk≤f(xk)+α1L(α2−1)∥dk∥2∇f(xk)Tdk∇f(xk)Tdk≤f(xk)+α1L(α2−1)cos2θk∥∇2f(xk)∥2
关于
k
k
k求和,得到
f
(
x
k
+
1
)
≤
f
(
0
)
+
α
1
(
α
2
−
1
)
L
∑
j
=
0
k
cos
2
θ
j
∥
∇
2
f
(
x
j
)
∥
2
∑
j
=
0
k
cos
2
θ
j
∥
∇
2
f
(
x
j
)
∥
2
≤
f
(
0
)
−
f
(
x
k
+
1
)
α
1
(
1
−
α
2
)
L
\begin{aligned} f(\boldsymbol{x}_{k+1})&\le f(0)+\alpha_1 \frac{(\alpha_2-1)}{L} \sum_{j=0}^{k}\cos^2\theta_j\Vert \nabla^2f(\boldsymbol{x}_j)\Vert^2\\ \sum_{j=0}^{k}\cos^2\theta_j\Vert \nabla^2f(\boldsymbol{x}_j)\Vert^2&\le \frac{f(0)-f(\boldsymbol{x}_{k+1})}{\alpha_1 \frac{(1-\alpha_2)}{L} } \end{aligned}
f(xk+1)j=0∑kcos2θj∥∇2f(xj)∥2≤f(0)+α1L(α2−1)j=0∑kcos2θj∥∇2f(xj)∥2≤α1L(1−α2)f(0)−f(xk+1)
其中利用了
0
<
α
1
<
α
2
<
1
0<\alpha_1<\alpha_2<1
0<α1<α2<1
因为
f
f
f有下界,所以
k
→
∞
k\to\infty
k→∞时
∑
k
=
0
∞
cos
2
θ
k
∥
∇
2
f
(
x
k
)
∥
2
<
+
∞
\sum_{k=0}^{\infty}\cos^2\theta_k\Vert \nabla^2f(\boldsymbol{x}_k)\Vert^2<+\infty
k=0∑∞cos2θk∥∇2f(xk)∥2<+∞
推论
对于迭代法,设
cos
θ
k
=
−
∇
f
(
x
k
)
T
d
k
∥
∇
f
(
x
k
)
∥
∥
d
k
∥
\cos \theta_k=\frac{-\nabla f(\boldsymbol{x}_k)^T\boldsymbol{d}_k}{\Vert \nabla f(\boldsymbol{x}_k)\Vert\Vert \boldsymbol{d}_k\Vert}
cosθk=∥∇f(xk)∥∥dk∥−∇f(xk)Tdk
并假设
∀
k
,
∃
γ
>
0
\forall k,\exists \gamma>0
∀k,∃γ>0使得
θ
k
<
π
2
−
γ
\theta_k<\frac{\pi}{2}-\gamma
θk<2π−γ
在Zoutendijk不等式成立的条件下,有
lim
k
→
∞
∇
f
(
x
k
)
=
0
\lim\limits_{k\to \infty}\nabla f(\boldsymbol{x}_k)=0
k→∞lim∇f(xk)=0
证明:
假设结论不成立,即存在子列
{
k
l
}
\{k_l\}
{kl}和正常数
δ
>
0
\delta>0
δ>0,使得
∥
f
(
x
k
l
)
∥
≥
δ
,
l
=
1
,
2
,
⋯
\Vert f(\boldsymbol{x}_{k_l})\Vert \ge \delta,l=1,2,\cdots
∥f(xkl)∥≥δ,l=1,2,⋯
θ
k
<
π
2
−
γ
cos
θ
k
>
cos
(
π
2
−
γ
)
=
sin
γ
>
0
\begin{aligned} \theta_k &<\frac{\pi}{2}-\gamma\\ \cos \theta_k &>\cos (\frac{\pi}{2}-\gamma)=\sin\gamma>0 \end{aligned}
θkcosθk<2π−γ>cos(2π−γ)=sinγ>0
∑
k
=
0
∞
cos
2
θ
k
∥
∇
2
f
(
x
k
)
∥
2
≥
∑
l
=
1
∞
cos
2
θ
k
l
∥
∇
2
f
(
x
k
l
)
∥
2
≥
∑
l
=
1
∞
sin
2
γ
δ
2
→
+
∞
\begin{aligned} \sum_{k=0}^{\infty}\cos^2\theta_k\Vert \nabla^2f(\boldsymbol{x}_k)\Vert^2&\ge \sum_{l=1}^{\infty}\cos^2\theta_{k_l}\Vert \nabla^2f(\boldsymbol{x}_{k_l})\Vert^2\\ &\ge \sum_{l=1}^{\infty}\sin^2\gamma\delta^2\to +\infty\\ \end{aligned}
k=0∑∞cos2θk∥∇2f(xk)∥2≥l=1∑∞cos2θkl∥∇2f(xkl)∥2≥l=1∑∞sin2γδ2→+∞
矛盾
所以
lim
k
→
∞
∇
f
(
x
k
)
=
0
\lim\limits_{k\to \infty}\nabla f(\boldsymbol{x}_k)=0
k→∞lim∇f(xk)=0
回溯法
设
s
>
0
,
α
∈
(
0
,
1
)
,
β
∈
(
0
,
1
)
s>0,\alpha\in(0,1),\beta\in(0,1)
s>0,α∈(0,1),β∈(0,1),我们要让步长满足Armijo准则
即,先猜测
t
k
=
s
t_k=s
tk=s
当
f
(
x
k
)
−
f
(
x
k
+
t
k
d
k
)
<
−
α
t
k
∇
f
(
x
k
)
T
d
k
f(\boldsymbol{x}_{k})-f(\boldsymbol{x}_{k}+t_{k} \boldsymbol{d}_{k})< -\alpha t_k\nabla f(\boldsymbol{x}_{k})^T\boldsymbol{d}_k
f(xk)−f(xk+tkdk)<−αtk∇f(xk)Tdk
时,让
t
k
=
β
t
k
t_k=\beta t_k
tk=βtk
换句话说,
t
k
=
s
β
i
k
t_k=s\beta^{i_k}
tk=sβik,其中
i
k
i_k
ik满足
f
(
x
k
)
−
f
(
x
k
+
s
β
i
k
d
k
)
≥
−
α
s
β
i
k
∇
f
(
x
k
)
T
d
k
f(\boldsymbol{x}_{k})-f(\boldsymbol{x}_{k}+s\beta^{i_k} \boldsymbol{d}_{k})\ge -\alpha s\beta^{i_k}\nabla f(\boldsymbol{x}_{k})^T\boldsymbol{d}_k
f(xk)−f(xk+sβikdk)≥−αsβik∇f(xk)Tdk
梯度方法
引理
设
f
:
R
n
→
R
f:\mathbb{R}^n \to \mathbb{R}
f:Rn→R是一个在
R
n
\mathbb{R}^n
Rn上连续可微的函数。
设
x
∈
R
n
\boldsymbol{x}\in\mathbb{R}^n
x∈Rn不是一个驻点,即
∇
f
(
x
)
≠
0
\nabla f(\boldsymbol{x})\neq 0
∇f(x)=0
那么
min
d
∈
R
n
{
f
′
(
x
;
d
)
:
∥
d
∥
=
1
}
\min\limits_{\boldsymbol{d}\in\mathbb{R}^n}\{f'(\boldsymbol{x};\boldsymbol{d}):\Vert \boldsymbol{d}\Vert=1\}
d∈Rnmin{f′(x;d):∥d∥=1}
的最优解是
d
=
−
∇
f
(
x
)
∥
f
(
x
)
∥
\boldsymbol{d}=-\frac{\nabla f(\boldsymbol{x})}{\Vert f(\boldsymbol{x})\Vert }
d=−∥f(x)∥∇f(x)
证明:
f
′
(
x
;
d
)
=
∇
f
(
x
)
T
d
≥
−
∥
∇
f
(
x
)
∥
∥
d
∥
f'(\boldsymbol{x};\boldsymbol{d})=\nabla f(\boldsymbol{x})^T\boldsymbol{d}\ge-\Vert \nabla f(\boldsymbol{x})\Vert\Vert \boldsymbol{d}\Vert
f′(x;d)=∇f(x)Td≥−∥∇f(x)∥∥d∥
当且仅当
∇
f
(
x
)
=
λ
d
\nabla f(\boldsymbol{x})=\lambda \boldsymbol{d}
∇f(x)=λd时取等
即
d
=
−
∥
∇
f
(
x
)
∥
2
\boldsymbol{d}=-\Vert \nabla f(\boldsymbol{x})\Vert^2
d=−∥∇f(x)∥2
也就是说,负梯度方向是下降最快的方向
梯度方法框架
1.选择初始点
x
0
∈
R
n
\boldsymbol{x}_0\in \mathbb{R}^n
x0∈Rn
2. 循环
k
=
0
,
1
,
⋯
k=0,1,\cdots
k=0,1,⋯
a)挑选步长
t
k
t_k
tk,使得
f
(
x
k
−
t
k
∇
f
(
x
k
)
)
<
f
(
x
k
)
f(\boldsymbol{x}_{k}-t_{k} \nabla f(\boldsymbol{x}_k))<f(\boldsymbol{x}_{k})
f(xk−tk∇f(xk))<f(xk)
c)迭代
x
k
+
1
=
x
k
−
t
k
∇
f
(
x
k
)
\boldsymbol{x}_{k+1}=\boldsymbol{x}_{k}-t_{k} \nabla f(\boldsymbol{x}_k)
xk+1=xk−tk∇f(xk)
d)如果满足停止条件就停止,并返回
x
k
+
1
\boldsymbol{x}_{k+1}
xk+1
这里步长你依然可以用之前的固定步长,精确线搜索,或者回溯法,或者其他非精确线搜索
下面给出梯度方法中用回溯法的代码
function [x,fun_val]=gradient_method_backtracking(f,g,x0,s,alpha,...
beta,epsilon)
% Gradient method with backtracking stepsize rule
%
% INPUT
%=======================================
% f ......... objective function
% g ......... gradient of the objective function
% x0......... initial point
% s ......... initial choice of stepsize
% alpha ..... tolerance parameter for the stepsize selection
% beta ...... the constant in which the stepsize is multiplied
% at each backtracking step (0<beta<1)
% epsilon ... tolerance parameter for stopping rule
% OUTPUT
%=======================================
% x ......... optimal solution (up to a tolerance)
% of min f(x)
% fun_val ... optimal function value
x=x0;
grad=g(x);
fun_val=f(x);
iter=0;
while (norm(grad)>epsilon)
iter=iter+1;
t=s;
while (fun_val-f(x-t*grad)<alpha*t*norm(grad)^2)
t=beta*t;
end
x=x-t*grad;
fun_val=f(x);
grad=g(x);
fprintf('iter_number = %3d norm_grad = %2.6f fun_val = %2.6f \n',...
iter,norm(grad),fun_val);
end
收敛性
下降引理
设
f
∈
C
L
1
,
1
f \in C_{L}^{1,1}
f∈CL1,1,那么
∀
x
,
y
∈
R
n
\forall \mathbf{x},\mathbf{y}\in\mathbb{R}^{n}
∀x,y∈Rn
f
(
y
)
≤
f
(
x
)
+
∇
f
(
x
)
T
(
y
−
x
)
+
L
2
∥
x
−
y
∥
2
f(\mathbf{y}) \leq f(\mathbf{x})+\nabla f(\mathbf{x})^{T}(\mathbf{y}-\mathbf{x})+\frac{L}{2}\|\mathbf{x}-\mathbf{y}\|^{2}
f(y)≤f(x)+∇f(x)T(y−x)+2L∥x−y∥2
证明:
f
(
y
)
−
f
(
x
)
=
∫
0
1
⟨
∇
f
(
x
+
t
(
y
−
x
)
)
,
y
−
x
⟩
d
t
f
(
y
)
−
f
(
x
)
=
⟨
∇
f
(
x
)
,
y
−
x
⟩
+
∫
0
1
⟨
∇
f
(
x
+
t
(
y
−
x
)
)
−
∇
f
(
x
)
,
y
−
x
⟩
d
t
∣
f
(
y
)
−
f
(
x
)
−
⟨
∇
f
(
x
)
,
y
−
x
⟩
∣
=
∣
∫
0
1
⟨
∇
f
(
x
+
t
(
y
−
x
)
)
−
∇
f
(
x
)
,
y
−
x
⟩
d
t
∣
≤
∫
0
1
∣
⟨
∇
f
(
x
+
t
(
y
−
x
)
)
−
∇
f
(
x
)
,
y
−
x
⟩
∣
d
t
≤
∫
0
1
∥
∇
f
(
x
+
t
(
y
−
x
)
)
−
∇
f
(
x
)
∥
⋅
∥
y
−
x
∥
d
t
≤
∫
0
1
t
L
∥
y
−
x
∥
2
d
t
=
L
2
∥
y
−
x
∥
2
\begin{aligned} f(\mathbf{y})-f(\mathbf{x}) &=\int_{0}^{1}\langle\nabla f(\mathbf{x}+t(\mathbf{y}-\mathbf{x})), \mathbf{y}-\mathbf{x}\rangle \mathrm{d} t\\ f(\mathbf{y})-f(\mathbf{x}) &=\langle\nabla f(\mathbf{x}), \mathbf{y}-\mathbf{x}\rangle+\int_{0}^{1}\langle\nabla f(\mathbf{x}+t(\mathbf{y}-\mathbf{x}))-\nabla f(\mathbf{x}), \mathbf{y}-\mathbf{x}\rangle \mathrm{d} t\\ |f(\mathbf{y})-f(\mathbf{x})-\langle\nabla f(\mathbf{x}), \mathbf{y}-\mathbf{x}\rangle| &=\left|\int_{0}^{1}\langle\nabla f(\mathbf{x}+t(\mathbf{y}-\mathbf{x}))-\nabla f(\mathbf{x}), \mathbf{y}-\mathbf{x}\rangle \mathrm{d} t\right| \\ & \leq \int_{0}^{1}|\langle\nabla f(\mathbf{x}+t(\mathbf{y}-\mathbf{x}))-\nabla f(\mathbf{x}), \mathbf{y}-\mathbf{x}\rangle| \mathrm{d} t \\ & \leq \int_{0}^{1}\|\nabla f(\mathbf{x}+t(\mathbf{y}-\mathbf{x}))-\nabla f(\mathbf{x})\| \cdot\|\mathbf{y}-\mathbf{x}\| \mathrm{d} t \\ & \leq \int_{0}^{1} t L\|\mathbf{y}-\mathbf{x}\|^{2} \mathrm{d} t \\ &=\frac{L}{2}\|\mathbf{y}-\mathbf{x}\|^{2} \end{aligned}
f(y)−f(x)f(y)−f(x)∣f(y)−f(x)−⟨∇f(x),y−x⟩∣=∫01⟨∇f(x+t(y−x)),y−x⟩dt=⟨∇f(x),y−x⟩+∫01⟨∇f(x+t(y−x))−∇f(x),y−x⟩dt=∣∣∣∣∫01⟨∇f(x+t(y−x))−∇f(x),y−x⟩dt∣∣∣∣≤∫01∣⟨∇f(x+t(y−x))−∇f(x),y−x⟩∣dt≤∫01∥∇f(x+t(y−x))−∇f(x)∥⋅∥y−x∥dt≤∫01tL∥y−x∥2dt=2L∥y−x∥2
充分下降引理
sufficient decrease lemma
假设
f
∈
C
L
1
,
1
f\in C_{L}^{1,1}
f∈CL1,1,那么
∀
x
∈
R
n
,
t
>
0
\forall x\in \mathbb{R}^n,t>0
∀x∈Rn,t>0,有
f
(
x
)
−
f
(
x
−
t
∇
f
(
x
)
)
≥
t
(
1
−
L
t
2
)
∥
∇
f
(
x
)
∥
2
f(\mathbf{x})-f(\mathbf{x}-t \nabla f(\mathbf{x})) \geq t\left(1-\frac{L t}{2}\right)\|\nabla f(\mathbf{x})\|^{2}
f(x)−f(x−t∇f(x))≥t(1−2Lt)∥∇f(x)∥2
证明:
由下降引理
f
(
x
−
t
∇
f
(
x
)
)
≤
f
(
x
)
−
t
∥
∇
f
(
x
)
∥
2
+
L
t
2
2
∥
∇
f
(
x
)
∥
2
=
f
(
x
)
−
t
(
1
−
L
t
2
)
∥
∇
f
(
x
)
∥
2
\begin{aligned} f(\mathbf{x}-t \nabla f(\mathbf{x})) & \leq f(\mathbf{x})-t\|\nabla f(\mathbf{x})\|^{2}+\frac{L t^{2}}{2}\|\nabla f(\mathbf{x})\|^{2} \\ &=f(\mathbf{x})-t\left(1-\frac{L t}{2}\right)\|\nabla f(\mathbf{x})\|^{2} \end{aligned}
f(x−t∇f(x))≤f(x)−t∥∇f(x)∥2+2Lt2∥∇f(x)∥2=f(x)−t(1−2Lt)∥∇f(x)∥2
固定步长
由充分下降引理
f
(
x
k
)
−
f
(
x
k
+
1
)
≥
t
ˉ
(
1
−
L
t
ˉ
2
)
∥
∇
f
(
x
k
)
∥
2
f\left(\mathbf{x}_{k}\right)-f\left(\mathbf{x}_{k+1}\right) \geq \bar{t}\left(1-\frac{L \bar{t}}{2}\right)\left\|\nabla f\left(\mathbf{x}_{k}\right)\right\|^{2}
f(xk)−f(xk+1)≥tˉ(1−2Ltˉ)∥∇f(xk)∥2
要想下降
{
t
ˉ
>
0
(
1
−
L
t
ˉ
2
)
>
0
⇒
0
<
t
ˉ
<
2
L
\begin{cases} \bar{t}>0\\ \left(1-\frac{L \bar{t}}{2}\right)>0 \end{cases}\Rightarrow 0<\bar{t}<\frac{2}{L}
{tˉ>0(1−2Ltˉ)>0⇒0<tˉ<L2
显然当
t
ˉ
=
1
L
\bar{t}=\frac{1}{L}
tˉ=L1时取得最大值
所以固定步长,可以取
t
ˉ
=
1
L
\bar{t}=\frac{1}{L}
tˉ=L1
从而
f
(
x
k
)
−
f
(
x
k
+
1
)
=
f
(
x
k
)
−
f
(
x
k
−
1
L
∇
f
(
x
k
)
)
≥
1
2
L
∥
∇
f
(
x
k
)
∥
2
f\left(\mathbf{x}_{k}\right)-f\left(\mathbf{x}_{k+1}\right)=f\left(\mathbf{x}_{k}\right)-f\left(\mathbf{x}_{k}-\frac{1}{L} \nabla f\left(\mathbf{x}_{k}\right)\right) \geq \frac{1}{2 L}\left\|\nabla f\left(\mathbf{x}_{k}\right)\right\|^{2}
f(xk)−f(xk+1)=f(xk)−f(xk−L1∇f(xk))≥2L1∥∇f(xk)∥2
精确线搜索
因为
t
k
=
arg
min
t
≥
0
f
(
x
k
−
t
k
∇
f
(
x
k
)
)
t_{k}=\arg\min\limits_{t\ge 0}f(\boldsymbol{x}_{k}-t_{k}\nabla f\left(\mathbf{x}_{k}\right) )
tk=argt≥0minf(xk−tk∇f(xk))
有
f
(
x
k
−
t
k
∇
f
(
x
k
)
)
≤
f
(
x
k
−
1
L
∇
f
(
x
k
)
)
f\left(\mathbf{x}_{k}-t_{k} \nabla f\left(\mathbf{x}_{k}\right)\right) \leq f\left(\mathbf{x}_{k}-\frac{1}{L} \nabla f\left(\mathbf{x}_{k}\right)\right)
f(xk−tk∇f(xk))≤f(xk−L1∇f(xk))
因此
f
(
x
k
)
−
f
(
x
k
+
1
)
≥
f
(
x
k
)
−
f
(
x
k
−
1
L
∇
f
(
x
k
)
)
≥
1
2
L
∥
∇
f
(
x
k
)
∥
2
f\left(\mathbf{x}_{k}\right)-f\left(\mathbf{x}_{k+1}\right) \geq f\left(\mathbf{x}_{k}\right)-f\left(\mathbf{x}_{k}-\frac{1}{L} \nabla f\left(\mathbf{x}_{k}\right)\right) \geq \frac{1}{2 L}\left\|\nabla f\left(\mathbf{x}_{k}\right)\right\|^{2}
f(xk)−f(xk+1)≥f(xk)−f(xk−L1∇f(xk))≥2L1∥∇f(xk)∥2
回溯法
第一种情况,直接满足Armijo准则,即
f
(
x
k
)
−
f
(
x
k
−
s
∇
f
(
x
k
)
)
≥
α
s
∥
∇
f
(
x
k
)
∥
2
f\left(\mathbf{x}_{k}\right)-f\left(\mathbf{x}_{k}-s \nabla f\left(\mathbf{x}_{k}\right)\right) \geq \alpha s\left\|\nabla f\left(\mathbf{x}_{k}\right)\right\|^{2}
f(xk)−f(xk−s∇f(xk))≥αs∥∇f(xk)∥2
第二种情况,
t
k
t_k
tk时满足Armijo准则,也就是说
t
k
β
\frac{t_k}{\beta}
βtk还不满足Armijo准则
有
f
(
x
k
)
−
f
(
x
k
−
t
k
β
∇
f
(
x
k
)
)
<
α
t
k
β
∥
∇
f
(
x
k
)
∥
2
,
f\left(\mathbf{x}_{k}\right)-f\left(\mathbf{x}_{k}-\frac{t_{k}}{\beta} \nabla f\left(\mathbf{x}_{k}\right)\right)<\alpha \frac{t_{k}}{\beta}\left\|\nabla f\left(\mathbf{x}_{k}\right)\right\|^{2},
f(xk)−f(xk−βtk∇f(xk))<αβtk∥∇f(xk)∥2,
让
x
=
x
k
,
t
=
t
k
β
x=x_k,t=\frac{t_k}{\beta}
x=xk,t=βtk,代入充分下降引理,有
f
(
x
k
)
−
f
(
x
k
−
t
k
β
∇
f
(
x
k
)
)
≥
t
k
β
(
1
−
L
t
k
2
β
)
∥
∇
f
(
x
k
)
∥
2
f\left(\mathbf{x}_{k}\right)-f\left(\mathbf{x}_{k}-\frac{t_{k}}{\beta} \nabla f\left(\mathbf{x}_{k}\right)\right) \geq \frac{t_{k}}{\beta}\left(1-\frac{L t_{k}}{2 \beta}\right)\left\|\nabla f\left(\mathbf{x}_{k}\right)\right\|^{2}
f(xk)−f(xk−βtk∇f(xk))≥βtk(1−2βLtk)∥∇f(xk)∥2
联合上面两个式子,得到
t
k
β
(
1
−
L
t
k
2
β
)
<
α
t
k
β
t
k
>
2
(
1
−
α
)
β
L
\begin{aligned} \frac{t_{k}}{\beta}\left(1-\frac{L t_{k}}{2 \beta}\right)&<\alpha \frac{t_{k}}{\beta}\\ t_k&>\frac{2(1-\alpha)\beta}{L} \end{aligned}
βtk(1−2βLtk)tk<αβtk>L2(1−α)β
综上所述
t
k
≥
min
{
s
,
2
(
1
−
α
)
β
L
}
t_{k} \geq \min \left\{s, \frac{2(1-\alpha) \beta}{L}\right\}
tk≥min{s,L2(1−α)β}
使得
f
(
x
k
)
−
f
(
x
k
−
t
k
∇
f
(
x
k
)
)
≥
α
min
{
s
,
2
(
1
−
α
)
β
L
}
∥
∇
f
(
x
k
)
∥
2
f\left(\mathbf{x}_{k}\right)-f\left(\mathbf{x}_{k}-t_{k} \nabla f\left(\mathbf{x}_{k}\right)\right) \geq \alpha \min \left\{s, \frac{2(1-\alpha) \beta}{L}\right\}\left\|\nabla f\left(\mathbf{x}_{k}\right)\right\|^{2}
f(xk)−f(xk−tk∇f(xk))≥αmin{s,L2(1−α)β}∥∇f(xk)∥2
梯度方法的充分下降
sufficient decrease of the gradient method
设
f
∈
C
L
1
,
1
(
R
n
)
f\in C_{L}^{1,1}(\mathbb{R}^{n})
f∈CL1,1(Rn)
设
{
x
k
}
k
≥
0
\{\mathbf{x}_k\}_{k\ge 0}
{xk}k≥0为梯度方法产生在解决
min
x
∈
R
n
f
(
x
)
\min\limits_{\mathbf{x}\in\mathbb{R}^n}f(\mathbf{x})
x∈Rnminf(x)
产生的序列,使用的步长
- 固定步长 t ˉ ∈ ( 0 , 2 L ) \bar{t}\in(0,\frac{2}{L}) tˉ∈(0,L2)
- 精确线搜索
- 回溯法( s ∈ R + + , α ∈ ( 0 , 1 ) , β ∈ ( 0 , 1 ) s\in\mathbb{R}_{++},\alpha\in(0,1),\beta\in(0,1) s∈R++,α∈(0,1),β∈(0,1))
那么
f
(
x
k
)
−
f
(
x
k
+
1
)
≥
M
∥
∇
f
(
x
k
)
∥
2
f\left(\mathbf{x}_{k}\right)-f\left(\mathbf{x}_{k+1}\right) \geq M\left\|\nabla f\left(\mathbf{x}_{k}\right)\right\|^{2}
f(xk)−f(xk+1)≥M∥∇f(xk)∥2
其中
M
=
{
t
ˉ
(
1
−
t
ˉ
L
2
)
固定步长
1
2
L
精确线搜索
α
min
{
s
,
2
(
1
−
α
)
β
L
}
回溯法
M= \begin{cases} \bar{t}\left(1-\frac{\bar{t} L}{2}\right) & \text { 固定步长 } \\ \frac{1}{2 L} & \text { 精确线搜索 } \\ \alpha \min \left\{s, \frac{2(1-\alpha) \beta}{L}\right\} & \text {回溯法 } \end{cases}
M=⎩⎪⎪⎨⎪⎪⎧tˉ(1−2tˉL)2L1αmin{s,L2(1−α)β} 固定步长 精确线搜索 回溯法
梯度方法收敛性
设
f
∈
C
L
1
,
1
(
R
n
)
f\in C_{L}^{1,1}(\mathbb{R}^{n})
f∈CL1,1(Rn)
设
{
x
k
}
k
≥
0
\{\mathbf{x}_k\}_{k\ge 0}
{xk}k≥0为梯度方法产生在解决
min
x
∈
R
n
f
(
x
)
\min\limits_{\mathbf{x}\in\mathbb{R}^n}f(\mathbf{x})
x∈Rnminf(x)
产生的序列,使用的步长
- 固定步长 t ˉ ∈ ( 0 , 2 L ) \bar{t}\in(0,\frac{2}{L}) tˉ∈(0,L2)
- 精确线搜索
- 回溯法( s ∈ R + + , α ∈ ( 0 , 1 ) , β ∈ ( 0 , 1 ) s\in\mathbb{R}_{++},\alpha\in(0,1),\beta\in(0,1) s∈R++,α∈(0,1),β∈(0,1))
假设
f
f
f有下界,即
∃
m
∈
R
,
∀
x
∈
R
n
,
f
(
x
)
>
m
\exists m\in \mathbb{R},\forall \mathbf{x}\in \mathbb{R}^n,f(\mathbf{x})>m
∃m∈R,∀x∈Rn,f(x)>m
那么有
a)序列
{
f
(
x
k
)
}
k
≥
0
\{f(\mathbf{x}_k)\}_{k\ge 0}
{f(xk)}k≥0单调不增,并且
∀
k
≥
0
,
f
(
x
k
+
1
)
<
f
(
x
k
)
\forall k\ge 0,f(\mathbf{x}_{k+1})<f(\mathbf{x}_k)
∀k≥0,f(xk+1)<f(xk),除非
∇
f
(
x
k
)
=
0
\nabla f(\mathbf{x}_k)=0
∇f(xk)=0
b)当
k
→
∞
,
∇
f
(
x
k
)
→
0
k\to \infty,\nabla f(\mathbf{x}_k)\to 0
k→∞,∇f(xk)→0
证明:
a)
f
(
x
k
)
−
f
(
x
k
+
1
)
≥
M
∥
∇
f
(
x
k
)
∥
2
≥
0
f\left(\mathbf{x}_{k}\right)-f\left(\mathbf{x}_{k+1}\right) \geq M\left\|\nabla f\left(\mathbf{x}_{k}\right)\right\|^{2}\ge0
f(xk)−f(xk+1)≥M∥∇f(xk)∥2≥0
M
>
0
M>0
M>0,如果
f
(
x
k
)
=
f
(
x
k
+
1
)
f\left(\mathbf{x}_{k}\right)=f\left(\mathbf{x}_{k+1}\right)
f(xk)=f(xk+1),那么
∇
f
(
x
k
)
=
0
\nabla f\left(\mathbf{x}_{k}\right)=0
∇f(xk)=0
b)
{
f
(
x
k
)
}
k
≥
0
\{f(\mathbf{x}_k)\}_{k\ge 0}
{f(xk)}k≥0单调递减有下界,所以收敛
k
→
∞
,
f
(
x
k
)
−
f
(
x
k
+
1
)
k\to \infty,f\left(\mathbf{x}_{k}\right)-f\left(\mathbf{x}_{k+1}\right)
k→∞,f(xk)−f(xk+1)那么
k
→
∞
,
∇
f
(
x
k
)
→
0
k\to \infty,\nabla f(\mathbf{x}_k)\to 0
k→∞,∇f(xk)→0
梯度的范数收敛速率
设
{
f
(
x
k
)
}
k
≥
0
\{f(\mathbf{x}_k)\}_{k\ge 0}
{f(xk)}k≥0收敛到
f
∗
f^{*}
f∗,则
min
k
=
0
,
1
,
…
,
n
∥
∇
f
(
x
k
)
∥
≤
f
(
x
0
)
−
f
∗
M
(
n
+
1
)
\min _{k=0,1, \ldots, n}\left\|\nabla f\left(\mathbf{x}_{k}\right)\right\| \leq \sqrt{\frac{f\left(\mathbf{x}_{0}\right)-f^{*}}{M(n+1)}}
k=0,1,…,nmin∥∇f(xk)∥≤M(n+1)f(x0)−f∗
其中
M
=
{
t
ˉ
(
1
−
t
ˉ
L
2
)
固定步长
1
2
L
精确线搜索
α
min
{
s
,
2
(
1
−
α
)
β
L
}
回溯法
M= \begin{cases} \bar{t}\left(1-\frac{\bar{t} L}{2}\right) & \text { 固定步长 } \\ \frac{1}{2 L} & \text { 精确线搜索 } \\ \alpha \min \left\{s, \frac{2(1-\alpha) \beta}{L}\right\} & \text {回溯法 } \end{cases}
M=⎩⎪⎪⎨⎪⎪⎧tˉ(1−2tˉL)2L1αmin{s,L2(1−α)β} 固定步长 精确线搜索 回溯法
证明:
根据
f
(
x
k
)
−
f
(
x
k
+
1
)
≥
M
∥
∇
f
(
x
k
)
∥
2
f\left(\mathbf{x}_{k}\right)-f\left(\mathbf{x}_{k+1}\right) \geq M\left\|\nabla f\left(\mathbf{x}_{k}\right)\right\|^{2}
f(xk)−f(xk+1)≥M∥∇f(xk)∥2
有
f
(
x
0
)
−
f
∗
≥
M
∑
k
=
0
n
∥
∇
f
(
x
k
)
∥
2
f\left(\mathbf{x}_{0}\right)-f^{*} \geq M\sum_{k=0}^{n}\left\|\nabla f\left(\mathbf{x}_{k}\right)\right\|^{2}
f(x0)−f∗≥Mk=0∑n∥∇f(xk)∥2
显然
∥
∇
f
(
x
k
)
∥
2
≥
min
k
=
0
,
1
,
⋯
,
n
∥
∇
f
(
x
k
)
∥
2
\left\|\nabla f\left(\mathbf{x}_{k}\right)\right\|^{2}\ge \min\limits_{k=0,1,\cdots,n}\left\|\nabla f\left(\mathbf{x}_{k}\right)\right\|^{2}
∥∇f(xk)∥2≥k=0,1,⋯,nmin∥∇f(xk)∥2
所以
f
(
x
0
)
−
f
∗
≥
M
(
n
+
1
)
min
k
=
0
,
1
,
⋯
,
n
∥
∇
f
(
x
k
)
∥
2
f\left(\mathbf{x}_{0}\right)-f^{*} \geq M(n+1)\min\limits_{k=0,1,\cdots,n}\left\|\nabla f\left(\mathbf{x}_{k}\right)\right\|^{2}
f(x0)−f∗≥M(n+1)k=0,1,⋯,nmin∥∇f(xk)∥2
于是
min
k
=
0
,
1
,
…
,
n
∥
∇
f
(
x
k
)
∥
≤
f
(
x
0
)
−
f
∗
M
(
n
+
1
)
\min _{k=0,1, \ldots, n}\left\|\nabla f\left(\mathbf{x}_{k}\right)\right\| \leq \sqrt{\frac{f\left(\mathbf{x}_{0}\right)-f^{*}}{M(n+1)}}
k=0,1,…,nmin∥∇f(xk)∥≤M(n+1)f(x0)−f∗