Introduction to Nonlinear Optimization: Chapter 5 Exercises
5.1. Find without MATLAB the Cholesky factorization of the matrix
$$\mathbf{A}=\left(\begin{array}{cccc} 1 & 2 & 4 & 7 \\ 2 & 13 & 23 & 38 \\ 4 & 23 & 77 & 122 \\ 7 & 38 & 122 & 294 \end{array}\right)$$
Solution: Let
$$\mathbf{L}=\begin{pmatrix} l_{11}&0&0&0\\ l_{21}&l_{22}&0&0\\ l_{31}&l_{32}&l_{33}&0\\ l_{41}&l_{42}&l_{43}&l_{44} \end{pmatrix}$$
$$l_{11}=\sqrt{a_{11}}=1$$
$$\begin{pmatrix} l_{21}\\ l_{31}\\ l_{41} \end{pmatrix}=\frac{1}{\sqrt{1}}\begin{pmatrix} 2\\ 4\\ 7 \end{pmatrix}=\begin{pmatrix} 2\\ 4\\ 7 \end{pmatrix}$$
$$\mathbf{L}'=\begin{pmatrix} l_{22}&0&0\\ l_{32}&l_{33}&0\\ l_{42}&l_{43}&l_{44} \end{pmatrix}$$
$$\mathbf{L}'\mathbf{L}'^T=\begin{pmatrix} 13 & 23 & 38 \\ 23 & 77 & 122 \\ 38 & 122 & 294 \end{pmatrix}-\frac{1}{1}\begin{pmatrix}2\\ 4\\ 7\end{pmatrix}\begin{pmatrix}2\\ 4\\ 7\end{pmatrix}^T=\begin{pmatrix} 9&15&24\\ 15&61&94\\ 24&94&245 \end{pmatrix}$$
$$l_{22}=\sqrt{9}=3$$
$$\begin{pmatrix} l_{32}\\ l_{42} \end{pmatrix}=\frac{1}{\sqrt{9}}\begin{pmatrix} 15\\ 24 \end{pmatrix}=\begin{pmatrix} 5\\ 8 \end{pmatrix}$$
$$\mathbf{L}''=\begin{pmatrix} l_{33}&0\\ l_{43}&l_{44} \end{pmatrix}$$
$$\mathbf{L}''\mathbf{L}''^T=\begin{pmatrix} 61&94\\ 94&245 \end{pmatrix}-\frac{1}{9}\begin{pmatrix} 15\\ 24 \end{pmatrix}\begin{pmatrix} 15\\ 24 \end{pmatrix}^T=\begin{pmatrix} 36&54\\ 54&181 \end{pmatrix}$$
$$l_{33}=\sqrt{36}=6$$
$$l_{43}=\frac{54}{\sqrt{36}}=9$$
$$l_{44}^2=181-\frac{54\cdot 54}{36}=100\Rightarrow l_{44}=10$$
Therefore
$$\mathbf{L}=\begin{pmatrix} 1&0&0&0\\ 2&3&0&0\\ 4&5&6&0\\ 7&8&9&10 \end{pmatrix}$$
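The hand computation can be verified numerically. The following sketch uses NumPy (rather than the MATLAB used elsewhere in these solutions) to compare `np.linalg.cholesky` against the factor obtained above:

```python
import numpy as np

# Matrix from Exercise 5.1.
A = np.array([[1., 2., 4., 7.],
              [2., 13., 23., 38.],
              [4., 23., 77., 122.],
              [7., 38., 122., 294.]])

# Lower-triangular Cholesky factor computed by NumPy.
L = np.linalg.cholesky(A)

# Factor obtained by hand above.
L_hand = np.array([[1., 0., 0., 0.],
                   [2., 3., 0., 0.],
                   [4., 5., 6., 0.],
                   [7., 8., 9., 10.]])

print(np.allclose(L, L_hand))             # the two factors agree
print(np.allclose(L_hand @ L_hand.T, A))  # and L L^T reproduces A
```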
5.2. Consider the Freudenstein and Roth test function
$$f(\mathbf{x})=f_1(\mathbf{x})^2+f_2(\mathbf{x})^2,\quad \mathbf{x}\in\mathbb{R}^2$$
where
$$\begin{aligned} f_{1}(\mathbf{x})&=-13+x_{1}+\left(\left(5-x_{2}\right) x_{2}-2\right) x_{2} \\ f_{2}(\mathbf{x})&=-29+x_{1}+\left(\left(x_{2}+1\right) x_{2}-14\right) x_{2} \end{aligned}$$
(i) Show that the function $f$ has three stationary points. Find them and prove that one is a global minimizer, one is a strict local minimum, and the third is a saddle point.
(ii) Use MATLAB to employ the following three methods on the problem of minimizing f:
- the gradient method with backtracking and parameters $(s,\alpha,\beta)=(1,0.5,0.5)$;
- the hybrid Newton's method with parameters $(s,\alpha,\beta)=(1,0.5,0.5)$;
- the damped Gauss-Newton method with a backtracking line search strategy with parameters $(s,\alpha,\beta)=(1,0.5,0.5)$.
All the algorithms should use the stopping criterion $\|\nabla f(\mathbf{x})\|\le 10^{-5}$. Each algorithm should be employed four times on the following four starting points: $(-50,7)^{T},(20,7)^{T},(20,-18)^{T},(5,-10)^{T}$. For each of the four starting points, compare the number of iterations and the point to which each method converged. If a method did not converge, explain why.
Solution:
(i)
$$\frac{\partial f_1(\boldsymbol{x})}{\partial x_1}=\frac{\partial f_2(\boldsymbol{x})}{\partial x_1}=1$$
$$\frac{\partial f_1(\boldsymbol{x})}{\partial x_2}=-3x_2^2+10x_2-2$$
$$\frac{\partial f_2(\boldsymbol{x})}{\partial x_2}=3x_2^2+2x_2-14$$
$$\frac{\partial^2 f_1(\boldsymbol{x})}{\partial x_2^2}=-6x_2+10$$
$$\frac{\partial^2 f_2(\boldsymbol{x})}{\partial x_2^2}=6x_2+2$$
$$\frac{\partial f(\boldsymbol{x})}{\partial x_1}=2(f_1(\boldsymbol{x})+f_2(\boldsymbol{x}))$$
$$\frac{\partial f(\boldsymbol{x})}{\partial x_2}=2\left(f_1(\boldsymbol{x})\frac{\partial f_1(\boldsymbol{x})}{\partial x_2}+f_2(\boldsymbol{x})\frac{\partial f_2(\boldsymbol{x})}{\partial x_2}\right)$$
Setting $\nabla f(\boldsymbol{x})=0$, i.e.
$$\frac{\partial f(\boldsymbol{x})}{\partial x_1}=\frac{\partial f(\boldsymbol{x})}{\partial x_2}=0.$$
The first equation gives
$$f_2(\boldsymbol{x})=-f_1(\boldsymbol{x}).$$
Substituting into the second equation (and using $\frac{\partial f_1}{\partial x_2}-\frac{\partial f_2}{\partial x_2}=-2(3x_2^2-4x_2-6)$),
$$\begin{aligned} \frac{\partial f(\boldsymbol{x})}{\partial x_2}&=2\left(f_1(\boldsymbol{x})\frac{\partial f_1(\boldsymbol{x})}{\partial x_2}+f_2(\boldsymbol{x})\frac{\partial f_2(\boldsymbol{x})}{\partial x_2}\right)\\ &=2f_1(\boldsymbol{x})\left(\frac{\partial f_1(\boldsymbol{x})}{\partial x_2}-\frac{\partial f_2(\boldsymbol{x})}{\partial x_2}\right)\\ &=-4f_1(\boldsymbol{x})(3x_2^2-4x_2-6)\\ &=0 \end{aligned}$$
$$\Rightarrow f_1(\boldsymbol{x})=f_2(\boldsymbol{x})=0 \quad\text{or}\quad \frac{\partial f_1(\boldsymbol{x})}{\partial x_2}=\frac{\partial f_2(\boldsymbol{x})}{\partial x_2}$$
1) Case $f_1(\boldsymbol{x})=f_2(\boldsymbol{x})=0$:
$$f_1(\boldsymbol{x})-f_2(\boldsymbol{x})=16+(-2x_2^2+4x_2+12)x_2=-2(x_2-4)((x_2+1)^2+1)$$
$$\Rightarrow \begin{cases} x_1=5\\ x_2=4 \end{cases}$$
Since
$$f(\boldsymbol{x})\ge f((5,4)^T)=0$$
for all $\boldsymbol{x}$, the point $(5,4)^T$ is clearly a global minimizer.
2) Case $\frac{\partial f_1(\boldsymbol{x})}{\partial x_2}=\frac{\partial f_2(\boldsymbol{x})}{\partial x_2}$:
$$\Rightarrow 3x_2^2-4x_2-6=0\Rightarrow x_2=\frac{2\pm \sqrt{22}}{3}$$
Combining with the first stationarity condition (and substituting $3x_2^2=4x_2+6$ in the fifth line):
$$\begin{aligned} f_1(\boldsymbol{x})+f_2(\boldsymbol{x})&=0\\ -42+2x_1+(5x_2-x_2^2-2+x_2^2+x_2-14)x_2&=0\\ -42+2x_1+(6x_2-16)x_2&=0\\ -21+x_1+3x_2^2-8x_2&=0\\ -21+x_1+4x_2+6-8x_2&=0\\ x_1&=4x_2+15\\ x_1&=\frac{53\pm 4\sqrt{22}}{3} \end{aligned}$$
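As a numerical sanity check (a NumPy sketch, independent of the MATLAB code further below), the gradient of $f$ can be evaluated at the three stationary points found above:

```python
import numpy as np

def f1(x): return -13 + x[0] + ((5 - x[1]) * x[1] - 2) * x[1]
def f2(x): return -29 + x[0] + ((x[1] + 1) * x[1] - 14) * x[1]

def grad(x):
    # df1/dx2 and df2/dx2 as derived above
    f1_2 = -3 * x[1]**2 + 10 * x[1] - 2
    f2_2 = 3 * x[1]**2 + 2 * x[1] - 14
    return np.array([2 * (f1(x) + f2(x)),
                     2 * (f1(x) * f1_2 + f2(x) * f2_2)])

r = np.sqrt(22)
points = [np.array([5.0, 4.0]),
          np.array([(53 - 4 * r) / 3, (2 - r) / 3]),
          np.array([(53 + 4 * r) / 3, (2 + r) / 3])]
for p in points:
    print(np.linalg.norm(grad(p)))  # all three are (numerically) zero
```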
For the classification, compute the Hessian:
$$\frac{\partial^2 f(\boldsymbol{x})}{\partial x_1^2}=2\left(\frac{\partial f_1(\boldsymbol{x})}{\partial x_1}+\frac{\partial f_2(\boldsymbol{x})}{\partial x_1}\right)=4$$
At the stationary points of case 2 (where $\frac{\partial f_1}{\partial x_2}=\frac{\partial f_2}{\partial x_2}$ and $3x_2^2=4x_2+6$),
$$\begin{aligned} \frac{\partial^2 f(\boldsymbol{x})}{\partial x_1\partial x_2}&=\frac{\partial^2 f(\boldsymbol{x})}{\partial x_2\partial x_1}\\ &=2\left(\frac{\partial f_1(\boldsymbol{x})}{\partial x_2}+\frac{\partial f_2(\boldsymbol{x})}{\partial x_2}\right)\\ &=4\frac{\partial f_1(\boldsymbol{x})}{\partial x_2}\\ &=2(12x_2-16)\\ &=8(3x_2-4) \end{aligned}$$
Similarly, using $f_2=-f_1$ and $\frac{\partial f_1}{\partial x_2}=\frac{\partial f_2}{\partial x_2}$ at these points,
$$\begin{aligned} \frac{\partial^2 f(\boldsymbol{x})}{\partial x_2^2} &=2\left(\left(\frac{\partial f_1(\boldsymbol{x})}{\partial x_2}\right)^2+f_1(\boldsymbol{x})\frac{\partial^2 f_1(\boldsymbol{x})}{\partial x_2^2}+\left(\frac{\partial f_2(\boldsymbol{x})}{\partial x_2}\right)^2+f_2(\boldsymbol{x})\frac{\partial^2 f_2(\boldsymbol{x})}{\partial x_2^2}\right)\\ &=2\left(2\left(\frac{\partial f_1(\boldsymbol{x})}{\partial x_2}\right)^2+f_1(\boldsymbol{x})\left(\frac{\partial^2 f_1(\boldsymbol{x})}{\partial x_2^2}-\frac{\partial^2 f_2(\boldsymbol{x})}{\partial x_2^2}\right)\right)\\ &=2\left(2\left(\frac{\partial f_1(\boldsymbol{x})}{\partial x_2}\right)^2+f_1(\boldsymbol{x})(-12x_2+8)\right)\\ &=2\left(2\left(\frac{\partial f_1(\boldsymbol{x})}{\partial x_2}\right)^2-4f_1(\boldsymbol{x})(3x_2-2)\right)\\ &=4\left(\left(\frac{\partial f_1(\boldsymbol{x})}{\partial x_2}\right)^2-2f_1(\boldsymbol{x})(3x_2-2)\right) \end{aligned}$$
$$\begin{aligned} \left|\nabla^2f(\boldsymbol{x})\right| &= \begin{vmatrix} 4&4\frac{\partial f_1(\boldsymbol{x})}{\partial x_2}\\ 4\frac{\partial f_1(\boldsymbol{x})}{\partial x_2}&4\left(\left(\frac{\partial f_1(\boldsymbol{x})}{\partial x_2}\right)^2-2f_1(\boldsymbol{x})(3x_2-2)\right) \end{vmatrix}\\ &=16\left(\frac{\partial f_1(\boldsymbol{x})}{\partial x_2}\right)^2-32f_1(\boldsymbol{x})(3x_2-2)-16\left(\frac{\partial f_1(\boldsymbol{x})}{\partial x_2}\right)^2\\ &=-32f_1(\boldsymbol{x})(3x_2-2) \end{aligned}$$
Evaluating $f_1$ at these points (repeatedly substituting $3x_2^2=4x_2+6$):
$$\begin{aligned} f_1(\boldsymbol{x})&=-13+x_1+((5-x_2)x_2-2)x_2\\ &=-13+x_1+(5x_2-x_2^2-2)x_2\\ &=-13+x_1+\left(5x_2-\frac{4x_2+6}{3}-2\right)x_2\\ &=-13+x_1+\frac{1}{3}(15x_2-4x_2-6-6)x_2\\ &=-13+x_1+\frac{1}{3}(11x_2-12)x_2\\ &=-13+x_1+\frac{1}{3}(11x_2^2-12x_2)\\ &=-13+x_1+\frac{1}{3}\left(11\cdot\frac{4x_2+6}{3}-12x_2\right)\\ &=-13+x_1+\frac{1}{9}(44x_2+66-36x_2)\\ &=-13+x_1+\frac{8}{9}x_2+\frac{22}{3}\\ &=-\frac{17}{3}+x_1+\frac{8}{9}x_2\\ &=\frac{4}{27}(85\pm 11\sqrt{22})\\ &>0 \end{aligned}$$
Since $3x_2-2=\pm\sqrt{22}$ at these points,
$$\left|\nabla^2f(\boldsymbol{x})\right|=\mp 32\sqrt{22}\,f_1(\boldsymbol{x})$$
When
$$\begin{cases} x_1=\frac{53+4\sqrt{22}}{3}\\ x_2=\frac{2+\sqrt{22}}{3} \end{cases}$$
we have $\left|\nabla^2f(\boldsymbol{x})\right|<0$, so the Hessian has one positive and one negative eigenvalue; $\nabla^2f(\boldsymbol{x})$ is indefinite and this point is a saddle point.
When
$$\begin{cases} x_1=\frac{53-4\sqrt{22}}{3}\\ x_2=\frac{2-\sqrt{22}}{3} \end{cases}$$
we have $\left|\nabla^2f(\boldsymbol{x})\right|>0$ together with $\frac{\partial^2 f}{\partial x_1^2}=4>0$, so $\nabla^2f(\boldsymbol{x})\succ 0$ and this point is a strict local minimum.
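The classification can be double-checked numerically. This NumPy sketch builds the Hessian of $f$ from the partial derivatives derived above and inspects its eigenvalues at the three stationary points:

```python
import numpy as np

def hessian(x):
    # f1, f2 and their x2-derivatives, exactly as derived above
    f1 = -13 + x[0] + ((5 - x[1]) * x[1] - 2) * x[1]
    f2 = -29 + x[0] + ((x[1] + 1) * x[1] - 14) * x[1]
    f1_2 = -3 * x[1]**2 + 10 * x[1] - 2
    f2_2 = 3 * x[1]**2 + 2 * x[1] - 14
    f1_22 = -6 * x[1] + 10
    f2_22 = 6 * x[1] + 2
    h11 = 4.0
    h12 = 2 * (f1_2 + f2_2)
    h22 = 2 * (f1_2**2 + f1 * f1_22 + f2_2**2 + f2 * f2_22)
    return np.array([[h11, h12], [h12, h22]])

r = np.sqrt(22)
for label, p in [("global min", [5.0, 4.0]),
                 ("local min", [(53 - 4 * r) / 3, (2 - r) / 3]),
                 ("saddle", [(53 + 4 * r) / 3, (2 + r) / 3])]:
    # positive definite at both minimizers, indefinite at the saddle
    print(label, np.linalg.eigvalsh(hessian(np.array(p))))
```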
(ii)
Gradient method with backtracking:
function [x,fun_val]=gradient_method_backtracking(f,g,x0,s,alpha,...
    beta,epsilon)
% Gradient method with backtracking stepsize rule
%
% INPUT
%=======================================
% f ......... objective function
% g ......... gradient of the objective function
% x0 ........ initial point
% s ......... initial choice of stepsize
% alpha ..... tolerance parameter for the stepsize selection
% beta ...... the constant by which the stepsize is multiplied
%             at each backtracking step (0<beta<1)
% epsilon ... tolerance parameter for the stopping rule
% OUTPUT
%=======================================
% x ......... optimal solution (up to a tolerance) of min f(x)
% fun_val ... optimal function value
x=x0;
grad=g(x);
fun_val=f(x);
iter=0;
while (norm(grad)>epsilon)
    iter=iter+1;
    t=s;
    % backtrack until the sufficient-decrease condition holds
    while (fun_val-f(x-t*grad)<alpha*t*norm(grad)^2)
        t=beta*t;
    end
    x=x-t*grad;
    fun_val=f(x);
    grad=g(x);
    fprintf('iter_number = %3d norm_grad = %2.6f fun_val = %2.6f \n',...
        iter,norm(grad),fun_val);
end
Hybrid Newton's method:
function x=newton_hybrid(f,g,h,x0,alpha,beta,epsilon)
% Hybrid Newton's method: takes the Newton direction when the Hessian
% is positive definite, otherwise falls back to the gradient direction
%
% INPUT
%=======================================
% f ......... objective function
% g ......... gradient of the objective function
% h ......... hessian of the objective function
% x0 ........ initial point
% alpha ..... tolerance parameter for the stepsize selection strategy
% beta ...... the proportion by which the stepsize is multiplied
%             at each backtracking step (0<beta<1)
% epsilon ... tolerance parameter for the stopping rule
% OUTPUT
%=======================================
% x ......... optimal solution (up to a tolerance) of min f(x)
x=x0;
gval=g(x);
hval=h(x);
[L,p]=chol(hval,'lower');
if (p==0)
    d=L'\(L\gval);   % Newton direction: solve (L*L')*d = gval
else
    d=gval;          % Hessian not positive definite: gradient step
end
iter=0;
while ((norm(gval)>epsilon)&&(iter<10000))
    iter=iter+1;
    t=1;
    while (f(x-t*d)>f(x)-alpha*t*gval'*d)
        t=beta*t;
    end
    x=x-t*d;
    fprintf('iter= %2d f(x)=%10.10f\n',iter,f(x))
    gval=g(x);
    hval=h(x);
    [L,p]=chol(hval,'lower');
    if (p==0)
        d=L'\(L\gval);
    else
        d=gval;
    end
end
if (iter==10000)
    fprintf('did not converge\n')
end
Damped Gauss-Newton method:
function [x,fun_val]=damped_Gauss_Newtow(g,grad,J,F,x0,s,alpha,...
    beta,epsilon)
% Damped Gauss-Newton method with a backtracking line search
%
% INPUT
%=======================================
% g ......... objective function
% grad ...... gradient of the objective function
% J ......... Jacobian matrix of F
% F ......... vector-valued function with g = ||F||^2
% x0 ........ initial point
% s ......... initial choice of stepsize
% alpha ..... tolerance parameter for the stepsize selection
% beta ...... the constant by which the stepsize is multiplied
%             at each backtracking step (0<beta<1)
% epsilon ... tolerance parameter for the stopping rule
% OUTPUT
%=======================================
% x ......... optimal solution (up to a tolerance) of min g(x)
% fun_val ... optimal function value
x=x0;
J_val=J(x);
F_val=F(x);
d=(J_val'*J_val)\(J_val'*F_val);   % Gauss-Newton direction
fun_val=g(x);
gval=grad(x);
iter=0;
while (norm(gval)>epsilon&&(iter<10000))
    iter=iter+1;
    t=s;
    while (fun_val-g(x-t*d)<alpha*t*gval'*d)
        t=beta*t;
    end
    x=x-t*d;
    J_val=J(x);
    F_val=F(x);
    d=(J_val'*J_val)\(J_val'*F_val);
    fun_val=g(x);
    gval=grad(x);
    fprintf('iter_number = %3d norm_grad = %2.6f fun_val = %2.6f \n',...
        iter,norm(gval),fun_val);
end
if (iter==10000)
    fprintf('did not converge\n')
end
Problem data (saved as data.m, which the driver below runs):
clear;
s=1;
alpha=0.5;
beta=0.5;
epsilon=1e-5;
f1=@(x)-13+x(1)+((5-x(2))*x(2)-2)*x(2);
f1_1=@(x)1;
f1_1_1=@(x)0;
f1_2=@(x)-3*x(2)^2+10*x(2)-2;
f1_2_2=@(x)-6*x(2)+10;
f2=@(x)-29+x(1)+((x(2)+1)*x(2)-14)*x(2);
f2_1=@(x)1;
f2_1_1=@(x)0;
f2_2=@(x)3*x(2)^2+2*x(2)-14;
f2_2_2=@(x)6*x(2)+2;
f=@(x)f1(x)^2+f2(x)^2;
f_1=@(x)2*f1(x)+2*f2(x);
f_1_1=@(x)2*f1_1(x)+2*f2_1(x);
f_1_2=@(x)2*f1_2(x)+2*f2_2(x);
f_2=@(x)2*f1(x)*f1_2(x)+2*f2(x)*f2_2(x);
f_2_1=@(x)2*f1_2(x)+2*f2_2(x);
f_2_2=@(x)2*(f1_2(x)^2+f1(x)*f1_2_2(x)+f2_2(x)^2+f2(x)*f2_2_2(x));
gradient=@(x)[f_1(x);f_2(x)];
hessian=@(x)[f_1_1(x),f_1_2(x);f_2_1(x),f_2_2(x)];
F=@(x)[f1(x);f2(x)];
J=@(x)[f1_1(x),f1_2(x);f2_1(x),f2_2(x)];
global_min_x=[5;4];
local_min_x=[(53-4*sqrt(22))/3;(2-sqrt(22))/3];
saddle_x=[(53+4*sqrt(22))/3;(2+sqrt(22))/3];
x1=[-50;7];
x2=[20;7];
x3=[20;-18];
x4=[5;-10];
Driver script:
data;
gradient_method_backtracking(f,gradient,x1,s,alpha,beta,epsilon);
gradient_method_backtracking(f,gradient,x2,s,alpha,beta,epsilon);
gradient_method_backtracking(f,gradient,x3,s,alpha,beta,epsilon);
gradient_method_backtracking(f,gradient,x4,s,alpha,beta,epsilon);
newton_hybrid(f,gradient,hessian,x1,alpha,beta,epsilon);
newton_hybrid(f,gradient,hessian,x2,alpha,beta,epsilon);
newton_hybrid(f,gradient,hessian,x3,alpha,beta,epsilon);
newton_hybrid(f,gradient,hessian,x4,alpha,beta,epsilon);
damped_Gauss_Newtow(f,gradient,J,F,x1,s,alpha,beta,epsilon);
damped_Gauss_Newtow(f,gradient,J,F,x2,s,alpha,beta,epsilon);
damped_Gauss_Newtow(f,gradient,J,F,x3,s,alpha,beta,epsilon);
damped_Gauss_Newtow(f,gradient,J,F,x4,s,alpha,beta,epsilon);
1. Gradient method with backtracking:

With $x_0=(-50,7)^T$: converges after 2252 iterations to the global minimizer $(5,4)^T$.

With $x_0=(20,7)^T$: converges after 2447 iterations to the global minimizer $(5,4)^T$.

With $x_0=(20,-18)^T$: converges after 2472 iterations to the strict local minimizer $\left(\frac{53-4\sqrt{22}}{3},\frac{2-\sqrt{22}}{3}\right)^T$.

With $x_0=(5,-10)^T$: converges after 2123 iterations to the global minimizer $(5,4)^T$.
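For readers without MATLAB, a minimal Python port of the gradient method (an illustrative sketch, not the MATLAB listing above) reproduces the first run:

```python
import numpy as np

def f1(x): return -13 + x[0] + ((5 - x[1]) * x[1] - 2) * x[1]
def f2(x): return -29 + x[0] + ((x[1] + 1) * x[1] - 14) * x[1]
def f(x): return f1(x)**2 + f2(x)**2

def grad(x):
    f1_2 = -3 * x[1]**2 + 10 * x[1] - 2
    f2_2 = 3 * x[1]**2 + 2 * x[1] - 14
    return np.array([2 * (f1(x) + f2(x)),
                     2 * (f1(x) * f1_2 + f2(x) * f2_2)])

def gradient_backtracking(x0, s=1.0, alpha=0.5, beta=0.5,
                          eps=1e-5, max_iter=10000):
    x, it = np.array(x0, dtype=float), 0
    g = grad(x)
    while np.linalg.norm(g) > eps and it < max_iter:
        it += 1
        t = s
        # shrink t until the sufficient-decrease condition holds
        while f(x) - f(x - t * g) < alpha * t * np.linalg.norm(g)**2:
            t *= beta
        x = x - t * g
        g = grad(x)
    return x, it

x, it = gradient_backtracking([-50.0, 7.0])
print(x, it)  # converges to (approximately) the global minimizer (5, 4)
```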
2. Hybrid Newton's method:

With $x_0=(-50,7)^T$: converges after 8 iterations to the global minimizer $(5,4)^T$.

With $x_0=(20,7)^T$: converges after 8 iterations to the global minimizer $(5,4)^T$.

With $x_0=(20,-18)^T$: converges after 16 iterations to the strict local minimizer $\left(\frac{53-4\sqrt{22}}{3},\frac{2-\sqrt{22}}{3}\right)^T$.

With $x_0=(5,-10)^T$: converges after 13 iterations to the strict local minimizer $\left(\frac{53-4\sqrt{22}}{3},\frac{2-\sqrt{22}}{3}\right)^T$.
3. Damped Gauss-Newton method:

With $x_0=(-50,7)^T$: converges after 29 iterations to the global minimizer $(5,4)^T$.

With $x_0=(20,7)^T$: converges after 29 iterations to the global minimizer $(5,4)^T$.

With $x_0=(20,-18)^T$: does not converge.

With $x_0=(5,-10)^T$: does not converge.

In the non-convergent runs, the backtracking stepsize $t$ shrinks toward 0, so the iterates essentially stop moving and the gradient norm never drops below the tolerance.
5.3. Let $f$ be a twice continuously differentiable function satisfying $L\mathbf{I}\succeq \nabla^2 f(\mathbf{x})\succeq m\mathbf{I}$ for some $L>m>0$, and let $\mathbf{x}^*$ be the unique minimizer of $f$ over $\mathbb{R}^n$.
(i) Show that
$$f(\mathbf{x})-f(\mathbf{x}^*)\ge \frac{m}{2}\|\mathbf{x}-\mathbf{x}^*\|^2$$
for any $\mathbf{x}\in\mathbb{R}^n$.
(ii) Let $\{\mathbf{x}_k\}_{k\ge 0}$ be the sequence generated by damped Newton's method with constant stepsize $t_k=\frac{m}{L}$. Show that
$$f(\mathbf{x}_{k})-f(\mathbf{x}_{k+1}) \geq \frac{m}{2L} \nabla f(\mathbf{x}_{k})^{T}\left(\nabla^{2} f(\mathbf{x}_{k})\right)^{-1} \nabla f(\mathbf{x}_{k})$$
(iii) Show that $\mathbf{x}_k \to \mathbf{x}^*$ as $k\to \infty$.
Solution:
(i) By the quadratic Taylor theorem, there exists $\boldsymbol{\xi}$ on the line segment between $\mathbf{x}^*$ and $\mathbf{x}$ such that
$$f(\mathbf{x})-f(\mathbf{x}^*)=\nabla f(\mathbf{x}^*)^T(\mathbf{x}-\mathbf{x}^*)+\frac{1}{2}(\mathbf{x}-\mathbf{x}^*)^T\nabla^2 f(\boldsymbol{\xi})(\mathbf{x}-\mathbf{x}^*)\ge\frac{m}{2}\|\mathbf{x}-\mathbf{x}^*\|^2,$$
since $\nabla f(\mathbf{x}^*)=0$ and $\nabla^2 f(\boldsymbol{\xi})\succeq m\mathbf{I}$.
(ii) Since $L\mathbf{I}\succeq \nabla^2 f(\mathbf{x})$, we have $\|\nabla^2 f(\mathbf{x})\|\le L$, so $f\in C_{L}^{1,1}$.
By the descent lemma, with $\mathbf{x}_{k+1}=\mathbf{x}_k-\frac{m}{L}(\nabla^2 f(\mathbf{x}_k))^{-1}\nabla f(\mathbf{x}_k)$:
$$\begin{aligned} &f(\mathbf{x}_k)-f(\mathbf{x}_{k+1})\\ \ge\ &\nabla f(\mathbf{x}_k)^T(\mathbf{x}_k-\mathbf{x}_{k+1})-\frac{L}{2}\|\mathbf{x}_{k}-\mathbf{x}_{k+1}\|^2\\ =\ & \frac{m}{L}\nabla f(\mathbf{x}_{k})^{T}\left(\nabla^{2} f(\mathbf{x}_{k})\right)^{-1} \nabla f(\mathbf{x}_{k})-\frac{L}{2}\cdot\frac{m^2}{L^2}\nabla f(\mathbf{x}_{k})^T\left(\nabla^{2} f(\mathbf{x}_{k})\right)^{-1}\left(\nabla^{2} f(\mathbf{x}_{k})\right)^{-1}\nabla f(\mathbf{x}_{k})\\ \ge\ &\frac{m}{L}\nabla f(\mathbf{x}_{k})^{T}\left(\nabla^{2} f(\mathbf{x}_{k})\right)^{-1} \nabla f(\mathbf{x}_{k})-\frac{m}{2L}\nabla f(\mathbf{x}_{k})^{T}\left(\nabla^{2} f(\mathbf{x}_{k})\right)^{-1} \nabla f(\mathbf{x}_{k})\\ =\ &\frac{m}{2L}\nabla f(\mathbf{x}_{k})^{T}\left(\nabla^{2} f(\mathbf{x}_{k})\right)^{-1} \nabla f(\mathbf{x}_{k}), \end{aligned}$$
where the second inequality uses $(\nabla^2 f(\mathbf{x}_k))^{-1}\preceq \frac{1}{m}\mathbf{I}$.
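As a quick numeric illustration (a sketch on a hypothetical random quadratic, not part of the proof), the bound in (ii) can be checked for $f(\mathbf{x})=\frac{1}{2}\mathbf{x}^T A\mathbf{x}$ with $m\mathbf{I}\preceq A\preceq L\mathbf{I}$:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical quadratic f(x) = 0.5 x'Ax with Hessian eigenvalues in [m, L].
m, L = 1.0, 10.0
Q, _ = np.linalg.qr(rng.standard_normal((4, 4)))
A = Q @ np.diag(rng.uniform(m, L, 4)) @ Q.T

def f(x): return 0.5 * x @ A @ x
def grad(x): return A @ x

x = rng.standard_normal(4)
g = grad(x)
d = np.linalg.solve(A, g)          # Newton direction (Hessian of f is A)
x_next = x - (m / L) * d           # damped Newton step with t = m/L

lhs = f(x) - f(x_next)
rhs = (m / (2 * L)) * g @ d        # the lower bound from part (ii)
print(lhs >= rhs)
```

For a quadratic the decrease is exactly $\frac{m}{L}\left(1-\frac{m}{2L}\right)\nabla f^T A^{-1}\nabla f$, which is at least the bound since $1-\frac{m}{2L}\ge \frac{1}{2}$.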
(iii) Since $(\nabla^2 f(\mathbf{x}_k))^{-1}\succeq \frac{1}{L}\mathbf{I}$, part (ii) gives $f(\mathbf{x}_k)-f(\mathbf{x}_{k+1})\ge \frac{m}{2L^2}\|\nabla f(\mathbf{x}_k)\|^2$. The sequence $\{f(\mathbf{x}_k)\}$ is nonincreasing and bounded below by $f(\mathbf{x}^*)$, hence convergent, so $f(\mathbf{x}_k)-f(\mathbf{x}_{k+1})\to 0$ and therefore $\|\nabla f(\mathbf{x}_k)\|\to 0$. Strong convexity also yields $f(\mathbf{x}_k)-f(\mathbf{x}^*)\le \frac{1}{2m}\|\nabla f(\mathbf{x}_k)\|^2$, so $f(\mathbf{x}_k)\to f(\mathbf{x}^*)$, and then (i) forces $\|\mathbf{x}_k-\mathbf{x}^*\|\to 0$, i.e., $\mathbf{x}_k\to\mathbf{x}^*$.