Dan Simon
Cleveland State University

Jang, Sun, and Mizutani
Neuro-Fuzzy and Soft Computing

Chapter 6
Derivative-Based Optimization
Outline
1. Gradient Descent
2. The Newton-Raphson Method
3. The Levenberg–Marquardt Algorithm
4. Trust Region Methods
Gradient descent: head downhill
http://en.wikipedia.org/wiki/Gradient_descent

Contour plot
Fuzzy controller optimization: find the MF parameters that minimize tracking error:

  \min_\theta E(\theta)

where θ = n-element vector of MF parameters and E(θ) = controller tracking error.

Gradient descent:

  \theta_{k+1} = \theta_k - \eta_k \nabla E(\theta_k)

where η_k = step size, k = step number, and

  \nabla E = \left[ \frac{\partial E}{\partial \theta_1} \;\; \frac{\partial E}{\partial \theta_2} \;\; \cdots \;\; \frac{\partial E}{\partial \theta_n} \right]^T
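The update rule can be sketched in a few lines. A minimal illustration (Python rather than the MATLAB used elsewhere in the slides; the quadratic E(θ) = θ₁² + 10θ₂² stands in for a real tracking-error surface, and the names grad_E and gradient_descent are my own):

```python
import numpy as np

def grad_E(theta):
    # Gradient of the illustrative surface E(theta) = theta[0]^2 + 10*theta[1]^2
    return np.array([2.0 * theta[0], 20.0 * theta[1]])

def gradient_descent(theta0, eta, num_steps):
    # theta_{k+1} = theta_k - eta * grad E(theta_k), with a fixed step size eta
    theta = np.array(theta0, dtype=float)
    for _ in range(num_steps):
        theta = theta - eta * grad_E(theta)
    return theta

theta = gradient_descent([-2.0, 1.5], eta=0.05, num_steps=200)
print(theta)  # close to the minimum at (0, 0)
```

With η = 0.05 the y-coordinate factor per step is (1 − 20η) = 0, so the steep direction is killed immediately, while the shallow x direction shrinks by 0.9 per step.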
x = -3 : 0.1 : 3;  y = -2 : 0.1 : 2;
for i = 1 : length(x)
  for j = 1 : length(y)
    z(j,i) = x(i)^2 + 10*y(j)^2;  % contour expects size(z) = [length(y), length(x)]
  end
end
contour(x, y, z)
[Figure: contour plot of x² + 10y²]
η too small: convergence takes a long time
η too large: overshoots the minimum
Gradient descent is a local optimization method (Rastrigin function)
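To see this concretely, here is a sketch using the one-dimensional Rastrigin function f(x) = 10 + x² − 10 cos(2πx) (the 2-D version pictured behaves the same way; Python rather than MATLAB). Started at x = 2.2, plain gradient descent settles into the nearby local minimum instead of the global minimum at x = 0:

```python
import math

def rastrigin(x):
    # 1-D Rastrigin: many local minima, global minimum f(0) = 0
    return 10.0 + x * x - 10.0 * math.cos(2.0 * math.pi * x)

def rastrigin_grad(x):
    return 2.0 * x + 20.0 * math.pi * math.sin(2.0 * math.pi * x)

x = 2.2                        # start inside the basin of a local minimum
for _ in range(2000):
    x -= 0.001 * rastrigin_grad(x)

print(x, rastrigin(x))         # roughly x ~ 1.99, f ~ 4: stuck well above the global minimum
```

The small step size (0.001) is needed because the gradient is steep between the ripples; a larger step would overshoot and bounce between basins rather than fix the local-minimum problem.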
Step Size Selection

How should we select the step size?
• η_k too small: convergence takes a long time
• η_k too large: overshoot the minimum

Line minimization:

  \theta_{k+1} = \theta_k - \eta_k \nabla E(\theta_k)

  \eta_k = \arg\min_\eta E(\theta_k - \eta \nabla E(\theta_k))

[Figure: contour plot of x² + 10y²]
Step Size Selection

Recall the general Taylor series expansion:

  f(x) = f(x_0) + f'(x_0)(x - x_0) + \cdots

Therefore,

  E(\theta_{k+1}) = E(\theta_k - \eta_k \nabla E(\theta_k)) \approx E(\theta_k) - \eta_k \nabla E^T \nabla E + \tfrac{1}{2} \eta_k^2 \nabla E^T \nabla^2 E \, \nabla E

The minimum of E(θ_{k+1}) occurs when its derivative with respect to η_k is 0, which means that:

  -\nabla E^T \nabla E + \eta_k \nabla E^T \nabla^2 E \, \nabla E = 0

  \eta_k = \frac{\nabla E^T \nabla E}{\nabla E^T \nabla^2 E \, \nabla E}
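For a quadratic E this η_k is the exact line-minimizing step, which makes it easy to check numerically. A sketch (Python; the illustrative E = θ₁² + 10θ₂² has the constant Hessian diag(2, 20)):

```python
import numpy as np

# E(theta) = theta[0]^2 + 10*theta[1]^2, so the Hessian is constant
H = np.diag([2.0, 20.0])

def grad_E(theta):
    return H @ theta

theta = np.array([-2.0, 1.5])
for _ in range(100):
    g = grad_E(theta)
    # optimal step from the Taylor expansion: eta = (g^T g) / (g^T H g)
    eta = (g @ g) / (g @ H @ g)
    theta = theta - eta * g

print(theta)  # converges to the minimum at (0, 0)
```

Even with the optimal step, steepest descent zigzags: on a quadratic with condition number κ = 10, the error shrinks by roughly ((κ−1)/(κ+1))² per iteration, which is why many steps are still needed.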
Step Size Selection

Compare the two equations:

  \theta_{k+1} = \theta_k - \eta_k \nabla E(\theta_k)

  \eta_k = \frac{\nabla E^T \nabla E}{\nabla E^T \nabla^2 E \, \nabla E}

We see that

  \eta_k = [\nabla^2 E(\theta_k)]^{-1}

(exactly so when θ is a scalar). Substituting this step size gives Newton's method, also called the Newton-Raphson method.
Jang, Fig. 6.3 – The Newton-Raphson method approximates E(θ) as a quadratic function
Step Size Selection

The Newton-Raphson method:

  \theta_{k+1} = \theta_k - [\nabla^2 E(\theta_k)]^{-1} \nabla E(\theta_k)

The second derivative ∇²E is called the Hessian (after Otto Hesse, 1800s).

How easy or difficult is it to calculate the Hessian?
What if the Hessian is not invertible?
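A minimal Newton-Raphson step in Python on the same illustrative quadratic E = θ₁² + 10θ₂² (np.linalg.solve is used rather than forming the inverse Hessian explicitly):

```python
import numpy as np

def grad_E(theta):
    # E(theta) = theta[0]^2 + 10*theta[1]^2
    return np.array([2.0 * theta[0], 20.0 * theta[1]])

def hessian_E(theta):
    return np.diag([2.0, 20.0])   # constant for a quadratic E

theta = np.array([-2.0, 1.5])
# theta_{k+1} = theta_k - inv(Hessian) @ grad; solve() avoids the explicit inverse
theta = theta - np.linalg.solve(hessian_E(theta), grad_E(theta))
print(theta)  # [0. 0.] -- one step reaches the minimum of a quadratic exactly
```

Because the quadratic model is exact here, a single Newton step lands on the minimum; for a general E the method must iterate, and solving with the Hessian is where the cost and the invertibility question arise.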
Step Size Selection

The Levenberg–Marquardt algorithm:

  \theta_{k+1} = \theta_k - [\nabla^2 E(\theta_k) + \lambda I]^{-1} \nabla E(\theta_k)

λ is a parameter selected to balance between steepest descent (λ → ∞) and Newton-Raphson (λ = 0). We can also control the step size with another parameter η:

  \theta_{k+1} = \theta_k - \eta [\nabla^2 E(\theta_k) + \lambda I]^{-1} \nabla E(\theta_k)
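A sketch of the Levenberg-Marquardt update on the same illustrative quadratic (Python; λ is held fixed here, whereas practical implementations typically raise λ after a failed step and lower it after a successful one):

```python
import numpy as np

def grad_E(theta):
    return np.array([2.0 * theta[0], 20.0 * theta[1]])  # E = x^2 + 10 y^2

def hessian_E(theta):
    return np.diag([2.0, 20.0])

theta = np.array([-2.0, 1.5])
lam = 1.0          # lambda: large -> steepest-descent-like steps, 0 -> pure Newton
for _ in range(50):
    H = hessian_E(theta) + lam * np.eye(2)
    theta = theta - np.linalg.solve(H, grad_E(theta))
print(theta)  # converges to the minimum at (0, 0)
```

Adding λI also fixes the non-invertibility worry from the previous slide: H + λI is positive definite whenever λ exceeds the magnitude of the Hessian's most negative eigenvalue.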
Jang, Fig. 6.5 – Illustration of Levenberg-Marquardt gradient descent
Step Size Selection

Trust region methods: These are used in conjunction with the Newton-Raphson method, which approximates E as quadratic in θ:

  \hat{E}(\theta_k + \delta_k) = E(\theta_k) + \nabla E(\theta_k)^T \delta_k + \tfrac{1}{2} \delta_k^T \nabla^2 E(\theta_k) \, \delta_k

If we use Newton-Raphson to minimize E with a step of δ_k, then we are implicitly assuming that E(θ_k + δ_k) will be equal to the above expression.
Step Size Selection

Actually, we expect the actual improvement to be slightly smaller than the predicted improvement:

  E(\theta_{k+1}) = actual value after the Newton-Raphson step
  \hat{E}(\theta_{k+1}) = predicted value after the Newton-Raphson step

  \rho_k = \frac{E(\theta_k) - E(\theta_{k+1})}{E(\theta_k) - \hat{E}(\theta_{k+1})}

We limit the step size δ_k so that |δ_k| < R_k.
Our "trust" in the quadratic approximation is proportional to ρ_k.
Step Size Selection

ρ_k = ratio of actual to expected improvement
R_k = trust region: maximum allowable size of δ_k

  R_{k+1} = \begin{cases} R_k/2 & \text{if } \rho_k < 0.2 \\ 2R_k & \text{if } \rho_k > 0.8 \\ R_k & \text{otherwise} \end{cases}
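One iteration of the ρ_k / R_k machinery can be sketched as follows (Python, on the illustrative quadratic E = θ₁² + 10θ₂²; because the quadratic model is exact for a quadratic E, ρ comes out to 1 and the radius doubles):

```python
import numpy as np

def E(theta):
    return theta[0] ** 2 + 10.0 * theta[1] ** 2

def grad_E(theta):
    return np.array([2.0 * theta[0], 20.0 * theta[1]])

def hessian_E(theta):
    return np.diag([2.0, 20.0])

def predicted_E(theta, delta):
    # quadratic model: E_hat(theta + delta) = E + g^T delta + 0.5 delta^T H delta
    g, H = grad_E(theta), hessian_E(theta)
    return E(theta) + g @ delta + 0.5 * delta @ H @ delta

def update_radius(rho, R):
    # R_{k+1} = R_k/2 if rho < 0.2;  2*R_k if rho > 0.8;  R_k otherwise
    if rho < 0.2:
        return R / 2.0
    if rho > 0.8:
        return 2.0 * R
    return R

theta = np.array([-2.0, 1.5])
R = 1.0
delta = -np.linalg.solve(hessian_E(theta), grad_E(theta))   # Newton step
if np.linalg.norm(delta) > R:                               # enforce |delta| < R
    delta = delta * (R / np.linalg.norm(delta))
rho = (E(theta) - E(theta + delta)) / (E(theta) - predicted_E(theta, delta))
R = update_radius(rho, R)
print(rho, R)   # rho = 1 for a quadratic E, so the radius doubles to 2.0
```

On a non-quadratic E the same loop would shrink R whenever the model over-promises (ρ < 0.2) and expand it when the model proves trustworthy (ρ > 0.8).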