Dan Simon
Cleveland State University

Jang, Sun, and Mizutani
Neuro-Fuzzy and Soft Computing

Chapter 6
Derivative-Based Optimization
Outline
1. Gradient Descent
2. The Newton-Raphson Method
3. The Levenberg–Marquardt Algorithm
4. Trust Region Methods
Gradient descent: head downhill
http://en.wikipedia.org/wiki/Gradient_descent

Contour plot
Fuzzy controller optimization: find the MF parameters that minimize tracking error:

  \min_\theta E(\theta)

where θ = n-element vector of MF parameters and E(θ) = controller tracking error.

Gradient descent:

  \theta_{k+1} = \theta_k - \eta_k \nabla E(\theta_k)

where η_k = step size, k = step number, and

  \nabla E = \left[ \frac{\partial E}{\partial \theta_1} \;\; \frac{\partial E}{\partial \theta_2} \;\; \cdots \;\; \frac{\partial E}{\partial \theta_n} \right]^T
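The update rule can be sketched in a few lines. A minimal illustration (Python rather than the MATLAB used elsewhere in the slides; the quadratic E(θ) = θ₁² + 10θ₂² stands in for a real tracking-error surface, and the names grad_E and gradient_descent are my own):

```python
import numpy as np

def grad_E(theta):
    # Gradient of the illustrative surface E(theta) = theta[0]^2 + 10*theta[1]^2
    return np.array([2.0 * theta[0], 20.0 * theta[1]])

def gradient_descent(theta0, eta, num_steps):
    # theta_{k+1} = theta_k - eta * grad E(theta_k), with a fixed step size eta
    theta = np.array(theta0, dtype=float)
    for _ in range(num_steps):
        theta = theta - eta * grad_E(theta)
    return theta

theta = gradient_descent([-2.0, 1.5], eta=0.05, num_steps=200)
print(theta)  # close to the minimum at (0, 0)
```

With η = 0.05 the y-coordinate factor per step is (1 − 20η) = 0, so the steep direction is killed immediately, while the shallow x direction shrinks by 0.9 per step.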
x = -3 : 0.1 : 3;  y = -2 : 0.1 : 2;
for i = 1 : length(x)
  for j = 1 : length(y)
    z(j,i) = x(i)^2 + 10*y(j)^2;  % contour expects size(z) = [length(y), length(x)]
  end
end
contour(x, y, z)
[Figure: contour plot of x² + 10y²]
η too small: convergence takes a long time
η too large: overshoots the minimum
Gradient descent is a local optimization method (Rastrigin function)
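To see this concretely, here is a sketch using the one-dimensional Rastrigin function f(x) = 10 + x² − 10 cos(2πx) (the 2-D version pictured behaves the same way; Python rather than MATLAB). Started at x = 2.2, plain gradient descent settles into the nearby local minimum instead of the global minimum at x = 0:

```python
import math

def rastrigin(x):
    # 1-D Rastrigin: many local minima, global minimum f(0) = 0
    return 10.0 + x * x - 10.0 * math.cos(2.0 * math.pi * x)

def rastrigin_grad(x):
    return 2.0 * x + 20.0 * math.pi * math.sin(2.0 * math.pi * x)

x = 2.2                        # start inside the basin of a local minimum
for _ in range(2000):
    x -= 0.001 * rastrigin_grad(x)

print(x, rastrigin(x))         # roughly x ~ 1.99, f ~ 4: stuck well above the global minimum
```

The small step size (0.001) is needed because the gradient is steep between the ripples; a larger step would overshoot and bounce between basins rather than fix the local-minimum problem.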
Step Size Selection

How should we select the step size?
• η_k too small: convergence takes a long time
• η_k too large: overshoot the minimum

Line minimization:

  \theta_{k+1} = \theta_k - \eta_k \nabla E(\theta_k)

  \eta_k = \arg\min_\eta E(\theta_k - \eta \nabla E(\theta_k))

[Figure: contour plot of x² + 10y²]
Step Size Selection

Recall the general Taylor series expansion:

  f(x) = f(x_0) + f'(x_0)(x - x_0) + \cdots

Therefore,

  E(\theta_{k+1}) = E(\theta_k - \eta_k \nabla E(\theta_k)) \approx E(\theta_k) - \eta_k \nabla E^T \nabla E + \tfrac{1}{2} \eta_k^2 \nabla E^T \nabla^2 E \, \nabla E

The minimum of E(θ_{k+1}) occurs when its derivative with respect to η_k is 0, which means that:

  -\nabla E^T \nabla E + \eta_k \nabla E^T \nabla^2 E \, \nabla E = 0

  \eta_k = \frac{\nabla E^T \nabla E}{\nabla E^T \nabla^2 E \, \nabla E}
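For a quadratic E this η_k is the exact line-minimizing step, which makes it easy to check numerically. A sketch (Python; the illustrative E = θ₁² + 10θ₂² has the constant Hessian diag(2, 20)):

```python
import numpy as np

# E(theta) = theta[0]^2 + 10*theta[1]^2, so the Hessian is constant
H = np.diag([2.0, 20.0])

def grad_E(theta):
    return H @ theta

theta = np.array([-2.0, 1.5])
for _ in range(100):
    g = grad_E(theta)
    # optimal step from the Taylor expansion: eta = (g^T g) / (g^T H g)
    eta = (g @ g) / (g @ H @ g)
    theta = theta - eta * g

print(theta)  # converges to the minimum at (0, 0)
```

Even with the optimal step, steepest descent zigzags: on a quadratic with condition number κ = 10, the error shrinks by roughly ((κ−1)/(κ+1))² per iteration, which is why many steps are still needed.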
Step Size Selection

Compare the two equations:

  \theta_{k+1} = \theta_k - \eta_k \nabla E(\theta_k)

  \eta_k = \frac{\nabla E^T \nabla E}{\nabla E^T \nabla^2 E \, \nabla E}

We see that

  \eta_k = [\nabla^2 E(\theta_k)]^{-1}

(exactly so when θ is a scalar). Substituting this step size gives Newton's method, also called the Newton-Raphson method.
Jang, Fig. 6.3 – The Newton-Raphson method approximates E(θ) as a quadratic function
Step Size Selection

The Newton-Raphson method:

  \theta_{k+1} = \theta_k - [\nabla^2 E(\theta_k)]^{-1} \nabla E(\theta_k)

The second derivative ∇²E is called the Hessian (after Otto Hesse, 1800s).

How easy or difficult is it to calculate the Hessian?
What if the Hessian is not invertible?
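A minimal Newton-Raphson step in Python on the same illustrative quadratic E = θ₁² + 10θ₂² (np.linalg.solve is used rather than forming the inverse Hessian explicitly):

```python
import numpy as np

def grad_E(theta):
    # E(theta) = theta[0]^2 + 10*theta[1]^2
    return np.array([2.0 * theta[0], 20.0 * theta[1]])

def hessian_E(theta):
    return np.diag([2.0, 20.0])   # constant for a quadratic E

theta = np.array([-2.0, 1.5])
# theta_{k+1} = theta_k - inv(Hessian) @ grad; solve() avoids the explicit inverse
theta = theta - np.linalg.solve(hessian_E(theta), grad_E(theta))
print(theta)  # [0. 0.] -- one step reaches the minimum of a quadratic exactly
```

Because the quadratic model is exact here, a single Newton step lands on the minimum; for a general E the method must iterate, and solving with the Hessian is where the cost and the invertibility question arise.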
Step Size Selection

The Levenberg–Marquardt algorithm:

  \theta_{k+1} = \theta_k - [\nabla^2 E(\theta_k) + \lambda I]^{-1} \nabla E(\theta_k)

λ is a parameter selected to balance between steepest descent (λ → ∞) and Newton-Raphson (λ = 0). We can also control the step size with another parameter η:

  \theta_{k+1} = \theta_k - \eta [\nabla^2 E(\theta_k) + \lambda I]^{-1} \nabla E(\theta_k)
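A sketch of the Levenberg-Marquardt update on the same illustrative quadratic (Python; λ is held fixed here, whereas practical implementations typically raise λ after a failed step and lower it after a successful one):

```python
import numpy as np

def grad_E(theta):
    return np.array([2.0 * theta[0], 20.0 * theta[1]])  # E = x^2 + 10 y^2

def hessian_E(theta):
    return np.diag([2.0, 20.0])

theta = np.array([-2.0, 1.5])
lam = 1.0          # lambda: large -> steepest-descent-like steps, 0 -> pure Newton
for _ in range(50):
    H = hessian_E(theta) + lam * np.eye(2)
    theta = theta - np.linalg.solve(H, grad_E(theta))
print(theta)  # converges to the minimum at (0, 0)
```

Adding λI also fixes the non-invertibility worry from the previous slide: H + λI is positive definite whenever λ exceeds the magnitude of the Hessian's most negative eigenvalue.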
Jang, Fig. 6.5 – Illustration of Levenberg-Marquardt gradient descent
Step Size Selection

Trust region methods: These are used in conjunction with the Newton-Raphson method, which approximates E as quadratic in θ:

  \hat{E}(\theta_k + \delta_k) = E(\theta_k) + \nabla E(\theta_k)^T \delta_k + \tfrac{1}{2} \delta_k^T \nabla^2 E(\theta_k) \, \delta_k

If we use Newton-Raphson to minimize E with a step of δ_k, then we are implicitly assuming that E(θ_k + δ_k) will be equal to the above expression.
Step Size Selection

Actually, we expect the actual improvement to be slightly smaller than the predicted improvement:

  E(\theta_{k+1}) = actual value after the Newton-Raphson step
  \hat{E}(\theta_{k+1}) = predicted value after the Newton-Raphson step

  \rho_k = \frac{E(\theta_k) - E(\theta_{k+1})}{E(\theta_k) - \hat{E}(\theta_{k+1})}

We limit the step size δ_k so that |δ_k| < R_k.
Our "trust" in the quadratic approximation is proportional to ρ_k.
Step Size Selection

ρ_k = ratio of actual to expected improvement
R_k = trust region: maximum allowable size of δ_k

  R_{k+1} = \begin{cases} R_k/2 & \text{if } \rho_k < 0.2 \\ 2R_k & \text{if } \rho_k > 0.8 \\ R_k & \text{otherwise} \end{cases}
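One iteration of the ρ_k / R_k machinery can be sketched as follows (Python, on the illustrative quadratic E = θ₁² + 10θ₂²; because the quadratic model is exact for a quadratic E, ρ comes out to 1 and the radius doubles):

```python
import numpy as np

def E(theta):
    return theta[0] ** 2 + 10.0 * theta[1] ** 2

def grad_E(theta):
    return np.array([2.0 * theta[0], 20.0 * theta[1]])

def hessian_E(theta):
    return np.diag([2.0, 20.0])

def predicted_E(theta, delta):
    # quadratic model: E_hat(theta + delta) = E + g^T delta + 0.5 delta^T H delta
    g, H = grad_E(theta), hessian_E(theta)
    return E(theta) + g @ delta + 0.5 * delta @ H @ delta

def update_radius(rho, R):
    # R_{k+1} = R_k/2 if rho < 0.2;  2*R_k if rho > 0.8;  R_k otherwise
    if rho < 0.2:
        return R / 2.0
    if rho > 0.8:
        return 2.0 * R
    return R

theta = np.array([-2.0, 1.5])
R = 1.0
delta = -np.linalg.solve(hessian_E(theta), grad_E(theta))   # Newton step
if np.linalg.norm(delta) > R:                               # enforce |delta| < R
    delta = delta * (R / np.linalg.norm(delta))
rho = (E(theta) - E(theta + delta)) / (E(theta) - predicted_E(theta, delta))
R = update_radius(rho, R)
print(rho, R)   # rho = 1 for a quadratic E, so the radius doubles to 2.0
```

On a non-quadratic E the same loop would shrink R whenever the model over-promises (ρ < 0.2) and expand it when the model proves trustworthy (ρ > 0.8).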