
ODS Solution 1 WS1415


Task 1 ( ) Line Search Methods

The solution of this exercise is given on the exercise course slides, which can be downloaded from the ODS website.

Task 2 ( ) Trust Region Method

The solution of this exercise is given on the exercise course slides, which can be downloaded from the ODS website.

Task 3 ( ) Dogleg Method

Introduction:

The dogleg method searches for the minimum $p_{opt}$ of the quadratic subproblem on a path defined by three points: the unconstrained minimum of the given quadratic problem, which is

$p_{ND} = -H\!\left(\theta^{(k)}\right)^{-1} \frac{dJ(\theta)}{d\theta}\bigg|_{\theta^{(k)}},$

the minimum along the steepest descent direction, given by

$p_t = -\frac{g^T g}{g^T B^{(k)} g}\, g \quad\text{with}\quad g = \frac{dJ(\theta)}{d\theta}\bigg|_{\theta^{(k)}},$

and the initial parameter value $p = 0$.


This is shown in the following figure as an example in the banana scenario. The red lines are the contour lines of the quadratic approximation of the objective function.

[Figure: contour plot in the banana scenario with the dogleg path; marked are the initial value, the global minimum, and the points $p^u$ and $p^b$.]

$J(\theta)$ is decreasing along this path. Therefore only the intersection of the TR with this path needs to be calculated to determine an approximation of the minimum. The possible location of the minimum of the constrained optimization problem will be one of the following three cases:

1. If $p_{ND}$ is inside the trust region, then $p_{ND}$ is the solution of the quadratic optimization problem.
2. If $p_{ND}$ is outside of the trust region and $p_t$ is inside the TR, the solution lies somewhere on the path between $p_{ND}$ and $p_t$.
3. If $p_t$ is outside the trust region, the solution lies on the path between 0 and $p_t$.

Situation a:

The unconstrained minimum is given by

$p_{ND} = \begin{pmatrix} 1 \\ -0.25 \end{pmatrix}.$

This lies inside the trust region and therefore $p_{opt} = p_{ND}$.

Situation b:

Now, $p_{ND}$ is located outside the TR, while

$p_t = \begin{pmatrix} 0.833 \\ -0.417 \end{pmatrix}$

is located inside the trust region. In this case, we need to calculate the intersection point of the line connecting $p_t$ with $p_{ND}$ and the boundary of the TR. This can be done using the equation which describes the path between the two points and (if the trust region is modeled as a circle) equating its norm with the radius of the TR:

$\Bigl\| \underbrace{p_t + (\tau - 1)\,(p_{ND} - p_t)}_{p_s} \Bigr\|^2 = \Delta^2 \quad \text{with} \quad 1 < \tau < 2.$



This equation can be reformed to a quadratic equation,

$\|p_{ND} - p_t\|^2\, (\tau - 1)^2 + 2\, p_t^T (p_{ND} - p_t)\,(\tau - 1) + \|p_t\|^2 - \Delta^2 = 0.$

This equation yields two solutions for $\tau$. For the given problem, they are $\tau_1 = 1.7343$ and $\tau_2 = -2.2343$. $\tau_2$ is the intersection of the line with the boundary of the trust region “on the other side” and is not the desired solution. The solution is determined by inserting $\tau_1 = 1.7343$ into the equation above:

$p_{opt} = p_t + (\tau_1 - 1)(p_{ND} - p_t) = \begin{pmatrix} 0.9557 \\ -0.2943 \end{pmatrix}.$

(The solution lies on the boundary of the trust region.)

Situation c:

This problem cannot be solved using the dogleg method because the Hessian is not positive definite. The problem needs to be redefined by using a positive definite quasi-Hessian matrix.

Situation d:

Now $p_t$ is located outside of the TR. In this case, we need to calculate the intersection point of the path connecting 0 with $p_t$ and the boundary of the TR:

$p_{opt} = \frac{\Delta}{\|p_t\|}\, p_t = \begin{pmatrix} 0.9285 \\ 0.3714 \end{pmatrix}.$
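The case distinction of the dogleg method described above can be summarized in a short routine. The following is a minimal sketch in Python/NumPy (the original solution uses MATLAB elsewhere); the function name is illustrative, $B$ is assumed positive definite, and the numerical check at the end reuses the values from situation b with an assumed trust-region radius $\Delta = 1$:

```python
import numpy as np

def dogleg_step(g, B, delta):
    """Minimal dogleg sketch: approximate minimizer of the quadratic model
    m(p) = J + g^T p + 0.5 p^T B p subject to ||p|| <= delta (B assumed pos. def.)."""
    p_nd = -np.linalg.solve(B, g)                 # full (Newton) step
    if np.linalg.norm(p_nd) <= delta:             # case 1: inside the TR
        return p_nd
    p_t = -(g @ g) / (g @ B @ g) * g              # minimum along steepest descent
    if np.linalg.norm(p_t) >= delta:              # case 3: even p_t lies outside
        return delta / np.linalg.norm(p_t) * p_t  # scale back to the TR boundary
    # case 2: intersect p_t + (tau - 1)(p_nd - p_t) with the boundary ||p|| = delta
    d = p_nd - p_t
    a, b, c = d @ d, 2 * (p_t @ d), p_t @ p_t - delta**2
    tau = 1 + (-b + np.sqrt(b**2 - 4 * a * c)) / (2 * a)   # root with 1 < tau < 2
    return p_t + (tau - 1) * d

# illustrative check with the numbers from situation b (delta = 1 assumed)
p_t = np.array([0.833, -0.417])
p_nd = np.array([1.0, -0.25])
d = p_nd - p_t
a, b, c = d @ d, 2 * (p_t @ d), p_t @ p_t - 1.0
tau = 1 + (-b + np.sqrt(b**2 - 4 * a * c)) / (2 * a)
print(tau, p_t + (tau - 1) * d)   # approx. 1.734 and [0.956, -0.294]
```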

Task 4 ( ) Line Search and Trust Region

a) The minimum $\theta^*$ is given by

$\theta^* = -B^{-1} g.$

(The calculation was done in the lecture.)

b) If $B$ is negative definite, the problem is not bounded. Not bounded means that the objective tends to $-\infty$ if at least one parameter tends to $\pm\infty$.



If $B$ is only positive semidefinite (at least one eigenvalue is zero), there is more than one minimum.

c) For an arbitrary parameter vector $\theta^{(0)}$ the gradient and the Hessian matrix are given by

$\frac{dJ(\theta)}{d\theta}\bigg|_{\theta^{(0)}} = g + B\theta^{(0)} \quad\text{and}\quad \frac{d^2 J(\theta)}{d\theta^2}\bigg|_{\theta^{(0)}} = B.$

This results in the line search direction (omitting the iteration index)

$p = -B^{-1}\left(g + B\theta^{(0)}\right).$

The optimal step length $\alpha^*$ can be calculated by

$\frac{dJ\!\left(\theta^{(0)} + \alpha p\right)}{d\alpha} \overset{!}{=} 0 \quad\rightarrow\quad \alpha^* = \frac{-g^T p - p^T B \theta^{(0)}}{p^T B p}.$

If we insert this into the update formula of the parameter vector, we get

$\theta^{(1)} = \theta^{(0)} + \alpha^* p = \theta^{(0)} + \frac{-g^T p - p^T B \theta^{(0)}}{p^T B p}\, p = -B^{-1} g = \theta^*.$

We see that the Newton method reaches the optimum of a quadratic optimization problem after one iteration, regardless of the starting point. If the distance to the minimum is small enough, every function can locally be approximated by a quadratic function, and therefore the Newton method reaches the minimum after one step if the minimum is close enough to the starting point.
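As a small numerical illustration of this one-step property, the following sketch applies the formulas above to a made-up quadratic (the matrix $B$, the vector $g$ and the starting point are arbitrary example data; the quadratic is assumed to have the form $J(\theta) = \tfrac{1}{2}\theta^T B \theta + g^T \theta$, consistent with $\theta^* = -B^{-1} g$ from part a)):

```python
import numpy as np

# made-up positive definite quadratic J(theta) = 0.5 theta^T B theta + g^T theta
B = np.array([[4.0, 1.0], [1.0, 3.0]])
g = np.array([1.0, -2.0])

theta0 = np.array([5.0, -7.0])        # arbitrary starting point
grad0 = g + B @ theta0                # gradient at theta0
p = -np.linalg.solve(B, grad0)        # Newton search direction
alpha = -(grad0 @ p) / (p @ B @ p)    # exact line search step length (equals 1)
theta1 = theta0 + alpha * p

print(alpha)                                           # 1.0 (up to rounding)
print(np.allclose(theta1, -np.linalg.solve(B, g)))     # True: theta^(1) = theta^* = -B^{-1} g
```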

d) For $\theta^{(0)} = 0$ the gradient and the Hessian matrix are given by

$\frac{dJ(\theta)}{d\theta}\bigg|_{\theta^{(0)}} = g \quad\text{and}\quad \frac{d^2 J(\theta)}{d\theta^2}\bigg|_{\theta^{(0)}} = B,$

and the steepest descent search direction is simply

$p = -g.$

If the same steps as in the exercise before are performed,

$\frac{dJ(\alpha p)}{d\alpha} \overset{!}{=} 0 \quad\rightarrow\quad \alpha^* = \frac{-g^T p}{p^T B p}.$



This yields the minimum along the steepest descent direction as

$\theta^{(1)} = -\frac{g^T g}{g^T B g}\, g.$

(This result is used in the dogleg method as the point $p_t$.)

e) As the trust region is not bounded, the quadratic subproblem is not constrained and therefore the solution is given by the full Newton update step (see exercises a) and c)). The trust region method uses a quadratic approximation of the original problem to solve the original optimization problem iteratively. As the original problem in this exercise is a quadratic problem, $\rho^{(k)}$ will always be 1 regardless of the size of the TR.
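A small numerical check of this statement (a sketch with made-up quadratic data; $\rho^{(k)}$ is computed as the actual reduction of $J$ divided by the reduction predicted by the quadratic model):

```python
import numpy as np

B = np.array([[4.0, 1.0], [1.0, 3.0]])   # made-up positive definite Hessian
g = np.array([1.0, -2.0])
J = lambda th: 0.5 * th @ B @ th + g @ th

theta = np.array([2.0, 1.0])
grad = g + B @ theta
for delta in [0.1, 1.0, 10.0]:
    p = -np.linalg.solve(B, grad)                 # full Newton step
    if np.linalg.norm(p) > delta:                 # respect the TR radius
        p = delta / np.linalg.norm(p) * p
    model_red = -(grad @ p + 0.5 * p @ B @ p)     # predicted reduction m(0) - m(p)
    actual_red = J(theta) - J(theta + p)          # actual reduction of J
    print(delta, actual_red / model_red)          # rho = 1 for every TR size (up to rounding)
```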



Task 5 ( ) Quasi Newton Method

a) The secant equation is

$\underbrace{B^{(k+1)}}_{(l,l)} \cdot \underbrace{s^{(k)}}_{(l,1)} = \underbrace{y^{(k)}}_{(l,1)}$

with

$s^{(k)} = \theta^{(k+1)} - \theta^{(k)} = \begin{pmatrix} 1 \\ 1 \end{pmatrix}$

and

$y^{(k)} = \frac{dJ(\theta)}{d\theta}\bigg|_{\theta^{(k+1)}} - \frac{dJ(\theta)}{d\theta}\bigg|_{\theta^{(k)}} = \begin{pmatrix} 3 \\ 3 \end{pmatrix} - \begin{pmatrix} 2 \\ 1 \end{pmatrix} = \begin{pmatrix} 1 \\ 2 \end{pmatrix}.$

As the quasi-Hessian needs to be symmetric, $B^{(k+1)}$ has the form

$B^{(k+1)} = \begin{pmatrix} a & b \\ b & c \end{pmatrix}, \quad a, b, c \in \mathbb{R}.$

Since $B^{(k+1)}$ also needs to be positive definite, this results in the following conditions:

$a > 0, \quad c > 0, \quad \det\begin{pmatrix} a & b \\ b & c \end{pmatrix} > 0 \;\rightarrow\; ac - b^2 > 0.$

Together with the conditions from the secant equation for the given problem,

$a + b = 1, \qquad b + c = 2,$

we have a system of two equality and two inequality conditions for 3 parameters. This system of equations is underdetermined. As a consequence, the solution set of this problem is infinite. Therefore, the algorithms based on the secant equation introduce additional conditions (e.g. $\min \| B^{(k+1)} - B^{(k)} \|$).

b) As the Broyden-Fletcher-Goldfarb-Shanno (BFGS) formula and the Davidon-Fletcher-Powell (DFP) formula are quite long, two m-files were written to do the calculation (BFGS.m and DFP.m). Both methods yield

$B^{(k+1)} = \begin{pmatrix} 1 & 0 \\ 0 & 2 \end{pmatrix}.$

c) For the BFGS method the result is

$B^{(k+1)}_{BFGS} = \begin{pmatrix} 133.47 & 6.522 \\ 6.522 & 81.478 \end{pmatrix}.$

The DFP method's result does not differ much.
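The original solution refers to the MATLAB scripts BFGS.m and DFP.m for these computations. As an illustration only, the two standard update formulas can be sketched in Python/NumPy as follows; the previous approximation $B^{(k)}$ is not given in the extracted text, so the identity matrix used below is merely an assumed starting value (the resulting numbers therefore need not match the matrices quoted above):

```python
import numpy as np

def bfgs_update(B, s, y):
    """Standard BFGS update of the Hessian approximation B."""
    Bs = B @ s
    return B - np.outer(Bs, Bs) / (s @ Bs) + np.outer(y, y) / (y @ s)

def dfp_update(B, s, y):
    """Standard DFP update of the Hessian approximation B."""
    rho = 1.0 / (y @ s)
    I = np.eye(len(s))
    V = I - rho * np.outer(y, s)
    return V @ B @ V.T + rho * np.outer(y, y)

# data from part a); B_k = identity is only an assumed starting value
s = np.array([1.0, 1.0])
y = np.array([1.0, 2.0])
B_k = np.eye(2)
print(bfgs_update(B_k, s, y))
print(dfp_update(B_k, s, y))
# both updated matrices satisfy the secant equation B^(k+1) s = y
print(bfgs_update(B_k, s, y) @ s, dfp_update(B_k, s, y) @ s)
```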



d) The accuracy can be increased if $s^{(k)}$ is decreased. But most of the time, the optimization will work fine with a less accurate approximation of the Hessian matrix. Therefore it is not worthwhile to spend more processing time on calculating a more accurate Hessian matrix.

e) The benefit of the quasi-Newton method compared to the Newton method is that the Hessian matrix no longer needs to be provided. Also, the Hessian matrix is always positive definite if approximated with the BFGS or DFP algorithm. In real implementations the big benefit is that it is possible to approximate the inverse of the Hessian matrix directly. This saves a lot of calculation time in higher-dimensional problems (further information can be found in Nocedal's book if you are interested in this).
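As an illustration of the last point, the BFGS update can be applied directly to an approximation $H$ of the inverse Hessian, so that the search direction is obtained by a matrix-vector product instead of solving a linear system. A minimal sketch (the function name and usage comment are illustrative only):

```python
import numpy as np

def bfgs_inverse_update(H, s, y):
    """BFGS update applied directly to H, the approximation of the inverse Hessian."""
    rho = 1.0 / (y @ s)
    I = np.eye(len(s))
    V = I - rho * np.outer(s, y)
    return V @ H @ V.T + rho * np.outer(s, s)

# with H available, a quasi-Newton search direction is just a matrix-vector product:
# p = -H @ grad   (no linear solver needed)
```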

Task 6 ( ) Steepest Descent Optimization

The result is shown in the figure.
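Since the figure is not reproduced in this extracted version, a minimal steepest-descent sketch with a backtracking line search is given here for orientation; the objective (the banana/Rosenbrock function), the starting point and all parameters are made-up choices and need not match the original task setup:

```python
import numpy as np

def steepest_descent(J, grad, theta0, alpha0=1.0, c=1e-4, tol=1e-6, max_iter=500):
    """Plain steepest descent with Armijo backtracking line search."""
    theta = np.asarray(theta0, dtype=float)
    for _ in range(max_iter):
        g = grad(theta)
        if np.linalg.norm(g) < tol:
            break
        p = -g                                   # steepest descent direction
        alpha = alpha0
        while J(theta + alpha * p) > J(theta) + c * alpha * (g @ p):
            alpha *= 0.5                         # backtrack until the Armijo condition holds
        theta = theta + alpha * p
    return theta

# made-up example: the banana (Rosenbrock) function used in the exercise figures
J = lambda t: (1 - t[0])**2 + 100 * (t[1] - t[0]**2)**2
grad = lambda t: np.array([-2 * (1 - t[0]) - 400 * t[0] * (t[1] - t[0]**2),
                           200 * (t[1] - t[0]**2)])
print(steepest_descent(J, grad, [-0.5, 0.5], max_iter=20000))
```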


Task 7 ( ) Sectioning Optimization

a)

b) Answers:

• No, it must be possible to minimize the parameters independently. This is possible most of the time, but not always. (For example, in some NLSQ parameter fit problems this approach will not work.)

• Pro: it is the simplest multidimensional optimization algorithm and is therefore widely used in practice. Con: the algorithm is very likely to get "stuck" in valleys (e.g. in the banana scenario); see the sketch below.
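A minimal sketch of such a sectioning (coordinate descent) iteration, assuming each one-dimensional subproblem is solved numerically with a scalar minimizer (this sketch is not part of the original solution; it reuses the quadratic objective of Task 8 as example data):

```python
import numpy as np
from scipy.optimize import minimize_scalar

def sectioning(J, theta0, n_sweeps=20):
    """Minimize J by cycling through the coordinates and minimizing one at a time."""
    theta = np.asarray(theta0, dtype=float)
    for _ in range(n_sweeps):
        for i in range(len(theta)):
            def J_1d(x, i=i):
                t = theta.copy()
                t[i] = x                       # vary only coordinate i
                return J(t)
            theta[i] = minimize_scalar(J_1d).x
    return theta

# example: the quadratic objective of Task 8
J = lambda t: t @ np.array([[4.0, 2.0], [1.0, 2.0]]) @ t
print(sectioning(J, [6.0, -4.0]))              # converges towards [0, 0]
```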


Task 8 ( ) Sectioning Procedure

a) Our objective function is given by

$J(\theta) = \begin{pmatrix} \theta_1 & \theta_2 \end{pmatrix} \begin{pmatrix} 4 & 2 \\ 1 & 2 \end{pmatrix} \begin{pmatrix} \theta_1 \\ \theta_2 \end{pmatrix}$

and the initial parameter vector by

$\theta^{(0)} = \begin{pmatrix} 6 \\ -4 \end{pmatrix}.$

The first iteration of the sectioning procedure optimizes this function with respect to $\theta_1$ and keeps $\theta_2$ constant:

$\hat\theta_1^{(1)} = \arg\min_{\theta_1} J\!\left(\begin{pmatrix} \theta_1 \\ \theta_2^{(0)} \end{pmatrix}\right) = \arg\min_{\theta_1} \begin{pmatrix} \theta_1 & \theta_2^{(0)} \end{pmatrix} \begin{pmatrix} 4 & 2 \\ 1 & 2 \end{pmatrix} \begin{pmatrix} \theta_1 \\ \theta_2^{(0)} \end{pmatrix}.$

The new objective function, now only dependent on $\theta_1$, is

$\tilde J(\theta_1) = \begin{pmatrix} \theta_1 & -4 \end{pmatrix} \begin{pmatrix} 4 & 2 \\ 1 & 2 \end{pmatrix} \begin{pmatrix} \theta_1 \\ -4 \end{pmatrix} = 4\theta_1^2 - 12\theta_1 + 32.$

Using the first-order condition, the minimum of this function can be determined (of course, a computer would use numerical techniques or possibly search techniques as well):

$\frac{d\tilde J(\theta_1)}{d\theta_1} \overset{!}{=} 0 \quad\rightarrow\quad 2\theta_1 - 3 = 0 \quad\rightarrow\quad \theta_1 = 1.5.$

With this, the result of the first iteration is given by

$\theta^{(1)} = \begin{pmatrix} 1.5 \\ -4 \end{pmatrix}.$

In the next iteration, the sectioning procedure would optimize the objective function with respect to $\theta_2$ and keep $\theta_1$ constant. This procedure is repeated until a stopping criterion is reached.

b) By generalizing the procedure above, the minimum of the function with respect to $\theta_1$ and fixed $\theta_2$ can be determined by


$\frac{dJ(\theta)}{d\theta_1} = 8\theta_1 + 3\theta_2 \overset{!}{=} 0 \quad\Rightarrow\quad \theta_1 = -\frac{3}{8}\,\theta_2.$

With this, the general result of the first taxi cab iteration for the given problem is

$\theta^{(1)} = \begin{pmatrix} -\frac{3}{8}\,\theta_2^{(0)} \\[4pt] \theta_2^{(0)} \end{pmatrix}.$

We see that $\theta_1^{(0)}$ is irrelevant, because it is replaced by the result of our first search, which only depends on $\theta_2^{(0)}$. And the outcome of the second taxi cab iteration is

$\theta^{(2)} = \begin{pmatrix} \theta_1^{(1)} \\[2pt] -\frac{3}{4}\,\theta_1^{(1)} \end{pmatrix} = \begin{pmatrix} -\frac{3}{8}\,\theta_2^{(0)} \\[4pt] \frac{3}{8}\cdot\frac{3}{4}\,\theta_2^{(0)} \end{pmatrix}.$

Afterwards, we can set up a generic equation which returns the resulting $\theta^{(i)}$ for every iteration:

$\theta^{(2k)} = \begin{pmatrix} -\frac{3}{8}\left(\frac{3}{8}\cdot\frac{3}{4}\right)^{k-1}\theta_2^{(0)} \\[6pt] \left(\frac{3}{8}\cdot\frac{3}{4}\right)^{k}\theta_2^{(0)} \end{pmatrix} \qquad (1)$

$\theta^{(2k+1)} = \begin{pmatrix} -\frac{3}{8}\left(\frac{3}{8}\cdot\frac{3}{4}\right)^{k}\theta_2^{(0)} \\[6pt] \left(\frac{3}{8}\cdot\frac{3}{4}\right)^{k}\theta_2^{(0)} \end{pmatrix} \qquad (2)$

with $k \in \mathbb{N}^*$.

c) As we see from (1) and (2), a sectioning algorithm would never reach the real minimum at $\theta^* = \begin{pmatrix} 0 \\ 0 \end{pmatrix}$ exactly (or it would need an infinite number of iterations).
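As a quick numerical check of formulas (1) and (2), the following sketch simply iterates the two one-dimensional minimizers derived above (pure illustration, not part of the original solution):

```python
import numpy as np

theta = np.array([6.0, -4.0])                 # theta^(0)
q = (3.0 / 8.0) * (3.0 / 4.0)                 # contraction factor 9/32 per double iteration
for i in range(1, 7):
    if i % 2 == 1:
        theta[0] = -3.0 / 8.0 * theta[1]      # odd iterations: minimize over theta_1
    else:
        theta[1] = -3.0 / 4.0 * theta[0]      # even iterations: minimize over theta_2
    print(i, theta)
# theta_2 shrinks by the factor q = 9/32 every two iterations but never becomes exactly 0
```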


Task 9 ( ) Simplex-Downhill (Nelder Mead) Algorithm 1


Task 10 ( ) Simplex-Downhill (Nelder Mead) Algorithm 2

The following enumeration describes the simplex-downhill steps which are shown in the following figure (a minimal sketch of the basic vertex operations follows after the list).

1. Initial simplex.
2. First, reflect the worst vertex → the new vertex is better than our best vertex, so we expand the simplex.
3. Same procedure as in step 2 with the new worst vertex.
4. Reflection of the worst vertex: here, a reflection is not possible, as this would result in a vertex outside the feasible region.
5. Reflection of the worst vertex (we assume that the reflected vertex is now the second best vertex).
6. Contraction of the worst vertex (as the resulting point of the reflection would be worse than the other points, we contract towards the inside).
7. Same procedure as before.
8. Reflection.
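The basic vertex operations referred to in the enumeration can be sketched as follows (an illustrative Python fragment, not part of the original solution; the coefficients 1, 2 and 0.5 are the usual textbook choices for reflection, expansion and inside contraction):

```python
import numpy as np

def reflect(worst, centroid, alpha=1.0):
    """Reflect the worst vertex through the centroid of the remaining vertices."""
    return centroid + alpha * (centroid - worst)

def expand(worst, centroid, gamma=2.0):
    """Expand further along the reflection direction."""
    return centroid + gamma * (centroid - worst)

def contract_inside(worst, centroid, beta=0.5):
    """Contract the worst vertex towards the centroid (inside contraction)."""
    return centroid + beta * (worst - centroid)

# one illustrative step on a triangle (2-D simplex)
simplex = [np.array([0.0, 0.0]), np.array([1.0, 0.0]), np.array([0.0, 1.0])]
worst = simplex[2]                              # assume this vertex has the worst J value
centroid = (simplex[0] + simplex[1]) / 2.0      # centroid of the remaining vertices
print(reflect(worst, centroid), expand(worst, centroid), contract_inside(worst, centroid))
```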


Task 11 ( ) NLSQ Parameter Fit Problem

a) One possible (and the most common) objective function would be to minimize the sum of squares of the residuals,

$J(\theta) = \frac{1}{2}\begin{pmatrix} r^{[1]}(\theta) & r^{[2]}(\theta) & r^{[3]}(\theta) & r^{[4]}(\theta) \end{pmatrix} \begin{pmatrix} r^{[1]}(\theta) \\ r^{[2]}(\theta) \\ r^{[3]}(\theta) \\ r^{[4]}(\theta) \end{pmatrix}$

with

$r^{[i]}(\theta) = y^{[i]} - \theta_1\, u^{[i]} \exp\!\left(\theta_2\, u^{[i]}\right)$

(e.g. for $i = 1$: $r^{[1]}(\theta) = 3 - \theta_1 \cdot 1 \cdot \exp(\theta_2 \cdot 1)$).

b) The gradient can be calculated by

$\frac{dJ(\theta)}{d\theta} = \left(\frac{dr(\theta)}{d\theta}\right)^{T} r(\theta) = \underbrace{\begin{pmatrix} -u^{[1]} e^{\theta_2 u^{[1]}} & \cdots & -u^{[4]} e^{\theta_2 u^{[4]}} \\[4pt] -\theta_1 \left(u^{[1]}\right)^2 e^{\theta_2 u^{[1]}} & \cdots & -\theta_1 \left(u^{[4]}\right)^2 e^{\theta_2 u^{[4]}} \end{pmatrix}}_{G^T(\theta)} \begin{pmatrix} r^{[1]}(\theta) \\ \vdots \\ r^{[4]}(\theta) \end{pmatrix}.$

If the term

$\sum_{j=1}^{N} r^{[j]}(\theta)\, \frac{d^2 r^{[j]}(\theta)}{d\theta^2}$

is omitted, an approximation of the Hessian matrix can be determined by

$\frac{d^2 J(\theta)}{d\theta^2} \approx G^T(\theta)\, G(\theta).$

c) As the gradient and at least an approximation of the Hessian matrix can be calculated easily, a Newton-based optimization technique is recommended to solve this problem.
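As an illustration of a) and b), the residual vector, the Jacobian $G$ and the Gauss-Newton approximation $G^T G$ can be evaluated as in the following sketch. Only the first data point ($u^{[1]} = 1$, $y^{[1]} = 3$) appears in the extracted text, so the remaining measurements below are made-up placeholders:

```python
import numpy as np

# model: y_hat = theta_1 * u * exp(theta_2 * u)
u = np.array([1.0, 2.0, 3.0, 4.0])   # u[0] = 1 from the text; the rest are placeholders
y = np.array([3.0, 5.0, 6.0, 4.0])   # y[0] = 3 from the text; the rest are placeholders

def residuals(theta):
    return y - theta[0] * u * np.exp(theta[1] * u)

def jacobian(theta):
    """G = dr/dtheta, one row per data point."""
    e = np.exp(theta[1] * u)
    return np.column_stack((-u * e, -theta[0] * u**2 * e))

theta = np.array([1.0, 0.1])          # some parameter value
r = residuals(theta)
G = jacobian(theta)
grad = G.T @ r                        # gradient of J = 0.5 * r^T r
H_approx = G.T @ G                    # Gauss-Newton approximation of the Hessian
print(grad, H_approx)
```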
