
Page 1: Introduction to optimization methods and line search

Jussi Hakanen
Post-doctoral researcher
[email protected]

spring 2014 TIES483 Nonlinear optimization

Page 2: How to find optimal solutions?

Trial and error → widely used in practice, but inefficient and likely to miss good solutions

It is better to use a systematic way to find optimal solutions

Typically we know only
– the function value(s) at the current trial point
– possibly the gradient(s) at the current trial point

How can we know which solution is optimal?

How can we find optimal solutions?


Page 3: Optimality conditions

How can we know that a solution is optimal?

One way is to utilize optimality conditions

Necessary optimality conditions = conditions that an optimal solution has to satisfy (do not guarantee optimality)

Sufficient optimality conditions = conditions that guarantee optimality when satisfied

First-order conditions (use first-order derivatives) and second-order conditions (use second-order derivatives)

Page 4: Global vs. local minimizers

A solution $x^* \in S$ is a global minimizer if $f(x^*) \le f(x)$ for all $x \in S$

A solution $x^* \in S$ is a local minimizer if there exists an $\epsilon > 0$ s.t. $f(x^*) \le f(x)$ for all $x \in S$ with $\|x - x^*\| < \epsilon$

Convexity: for a convex problem, a local minimizer is also a global minimizer

Global minimizers are preferred, but local minimizers are usually easier to identify


Page 5: Solving an optimization problem

Find optimal values 𝑥∗ for the variables

Some problems can be solved analytically
– e.g. $\min x^2$, when $x \ge 3$ → $x^* = 3$

Usually impossible to solve analytically

Must be solved numerically → an approximation of the solution
– In mathematical optimization, a starting point is iteratively improved


Page 6: Numerical solution

Modelling → a mathematical model of the problem

Numerical methods → a numerical simulation model for the mathematical model

Optimization method → solve the problem by utilizing the numerical simulation model

So: modelling → simulation → optimization


Page 7: Optimization method

Algorithm: a mathematical description
1. Choose a stopping parameter $\varepsilon > 0$, a starting point $x_1$ and a symmetric positive definite $n \times n$ matrix $D_1$ (e.g. $D_1 = I$). Set $y_1 = x_1$ and $h = j = 1$.
2. If $\|\nabla f(y_j)\| < \varepsilon$, stop. Otherwise, set $d_j = -D_j \nabla f(y_j)$. Let $\lambda_j$ be a solution of $\min f(y_j + \lambda d_j)$ s.t. $\lambda \ge 0$. Set $y_{j+1} = y_j + \lambda_j d_j$. If $j = n$, set $y_1 = x_{h+1} = y_{n+1}$, $h = h + 1$, $j = 1$ and repeat step 2.
3. Compute $D_{j+1}$. Set $j = j + 1$ and go to step 2.

Method: the algorithm together with the numerical methods it needs (e.g. for the line search)

Software: a method implemented as a computer programme
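To illustrate the step from algorithm to software, here is a minimal Python sketch of the iteration above with $D_j$ kept fixed at the identity matrix (i.e. steepest descent with a line search). The test function, the starting point and the use of SciPy's bounded scalar minimizer as the line search are illustrative assumptions, not part of the course material.

```python
import numpy as np
from scipy.optimize import minimize_scalar

def line_search(f, y, d, lam_max=10.0):
    """Approximately solve min f(y + lam*d) s.t. lam >= 0 (assumed step cap lam_max)."""
    res = minimize_scalar(lambda lam: f(y + lam * d),
                          bounds=(0.0, lam_max), method='bounded')
    return res.x

def descent(f, grad, x1, eps=1e-6, max_iter=1000):
    """Sketch of the page 7 iteration with D_j = I (steepest descent)."""
    y = np.asarray(x1, dtype=float)
    for _ in range(max_iter):
        g = grad(y)
        if np.linalg.norm(g) < eps:   # stopping test ||grad f(y_j)|| < eps
            break
        d = -g                        # d_j = -D_j * grad f(y_j) with D_j = I
        lam = line_search(f, y, d)    # step length from the line search
        y = y + lam * d               # y_{j+1} = y_j + lam_j * d_j
    return y

# Hypothetical test problem: f(x) = (x1 - 1)^2 + (x2 + 2)^2, minimizer (1, -2)
f = lambda x: (x[0] - 1)**2 + (x[1] + 2)**2
grad = lambda x: np.array([2 * (x[0] - 1), 2 * (x[1] + 2)])
print(descent(f, grad, [5.0, 5.0]))
```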


Page 8: Structure of optimization methods

Typically
– Constraint handling converts the problem to (a series of) unconstrained problems
– In unconstrained optimization, a search direction is determined at each iteration
– The best solution in the search direction is found with a line search


[Diagram: Constraint handling method → Unconstrained optimization → Line search]

Page 9: Local optimization methods

Find a (closest) local optimum

Fast

Usually utilize derivatives

Mathematical convergence

For example
– Direct search methods (pattern search, Hooke & Jeeves, Nelder & Mead, …)
– Gradient-based methods (steepest descent, Newton's method, quasi-Newton methods, conjugate gradient, SQP, interior point methods, …)


Page 10: Global optimization methods

Try to get as close to a global optimum as possible

No mathematical convergence guarantees

Do not assume much about the problem

Slow, use lots of function evaluations

Heuristic, contain randomness

The most well known are nature-inspired methods (TIES451 Selected topics in soft computing)
– based on improving a population of solutions at a time instead of a single solution


Page 11: Hybrid methods

Combination of global and local methods

Try to combine the benefits of both
– get a rough estimate with a global method, then fine-tune with a local method

Challenge: how should the methods be combined?
– e.g. when to switch from global to local? (speed vs. accuracy)


Page 12: Line search

What did you find out about line search?


Page 13: Line search

The idea of line search is to optimize a given function with respect to a single variable

Optimization algorithms for multivariable problems iteratively generate search directions in which better solutions are found
– Line search is used to find these better solutions along each direction!

An exact minimum is not required, only an approximation of it that is within a given tolerance $\epsilon > 0$
– it is enough to know that $x^* \in [a^*, b^*]$ where $b^* - a^* < \epsilon$


Page 14: Optimality conditions

Necessary: Let $f: \mathbb{R} \to \mathbb{R}$ be differentiable. If $x^*$ is a local minimizer, then $f'(x^*) = 0$. In addition, if $f$ is twice continuously differentiable and $x^*$ is a local minimizer, then $f''(x^*) \ge 0$.

Sufficient: Let $f: \mathbb{R} \to \mathbb{R}$ be twice continuously differentiable. If $f'(x^*) = 0$ and $f''(x^*) > 0$, then $x^*$ is a strict local minimizer.


Page 15: Examples


$f(x) = (x - 2)^2 - 4$

$f'(x) = 2x - 4$

$f''(x) = 2$

If $x^* = 2$, then both the necessary and sufficient optimality conditions are satisfied

[Figure: graph of $f(x) = (x - 2)^2 - 4$]

Page 16: Examples


$f(x) = (x - 2)^3 - 4$

$f'(x) = 3(x - 2)^2$

$f''(x) = 6x - 12$

If $x^* = 2$, then the necessary optimality conditions are satisfied although $x^* = 2$ is not a local minimizer
– It is a saddle point

The sufficient optimality conditions are not satisfied at $x^* = 2$

[Figure: graph of $f(x) = (x - 2)^3 - 4$]

Page 17: Note on optimality conditions

If $f$ is not differentiable, then a local minimizer can be at a point where $f$ is
1) not differentiable, or
2) discontinuous


[Figure: graph of $f(x) = |x|$]

Page 18: Finding a unimodal interval

Most line search methods assume that the search is started from a unimodal interval $[a, b]$

$f$ is unimodal in $[a, b]$ if there is exactly one $x^* \in [a, b]$ s.t. for all $x_1, x_2 \in [a, b]$ for which $x_1 < x_2$ it holds that
– If $x_2 < x^*$, then $f(x_1) > f(x_2)$, and
– If $x_1 > x^*$, then $f(x_1) < f(x_2)$


Page 19: Search with fixed steps

Let $(A, B)$ be the interval where we want to find a minimum of $f$

Compute the values of $f$ at $P$ equally spaced points $x_i$ in $(A, B)$
– $x_i = A + \frac{i}{P+1}(B - A)$, $i = 1, \dots, P$

When points $x_j$, $x_{j+1}$ and $x_{j+2}$ are found s.t. $f(x_j) > f(x_{j+1}) < f(x_{j+2})$, we know that there exists at least one local minimizer in $(x_j, x_{j+2})$

The interval can be further reduced
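A minimal Python sketch of this fixed-step scan, assuming a suitable $f$ is given; the function and parameter names are illustrative, not from the slides.

```python
def fixed_step_bracket(f, A, B, P):
    """Evaluate f at P equally spaced points of (A, B) and return an
    interval (x_j, x_{j+2}) containing a local minimizer, or None."""
    xs = [A + i * (B - A) / (P + 1) for i in range(1, P + 1)]
    fs = [f(x) for x in xs]
    for j in range(len(xs) - 2):
        # f(x_j) > f(x_{j+1}) < f(x_{j+2})  =>  minimizer in (x_j, x_{j+2})
        if fs[j] > fs[j + 1] < fs[j + 2]:
            return xs[j], xs[j + 2]
    return None

# Hypothetical usage: f(x) = (x - 2)^2 - 4 on (0, 5)
print(fixed_step_bracket(lambda x: (x - 2)**2 - 4, 0.0, 5.0, 9))  # (1.5, 2.5)
```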


Page 20: Line search methods

Assume that 𝑓 is unimodal in [𝑎, 𝑏]

General idea is to start reducing the interval [𝑎, 𝑏] s.t. the minimizer is still included in it

An approximation of the minimizer is found when the length of the interval is smaller than a pre-determined tolerance

Line search methods can be divided into
– Elimination methods
– Interpolation methods (often use derivatives)


Page 21: The method of bisection

Elimination method

1) Choose a small but significant constant $2\epsilon > 0$ and an allowable length $L > 0$ for the final interval. Let $[a_1, b_1]$ be the original (unimodal) interval. Set $h = 1$.

2) If $b_h - a_h < L$, stop: the minimizer $x^* \in [a_h, b_h]$. Otherwise, compute the values of $f$ at $x_h = \frac{a_h + b_h}{2} - \epsilon$ and $y_h = \frac{a_h + b_h}{2} + \epsilon$.

3) If $f(x_h) < f(y_h)$, set $a_{h+1} = a_h$ and $b_{h+1} = y_h$. Otherwise, set $a_{h+1} = x_h$ and $b_{h+1} = b_h$. Set $h = h + 1$ and go to step 2).


[Figure: interval $[a_h, b_h]$ with the trial points $x_h$ and $y_h$ placed $2\epsilon$ apart around the midpoint]
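A minimal Python sketch of steps 1)–3), assuming $f$ is unimodal on the given interval; the default values of $L$ and $\epsilon$ and the test function are illustrative assumptions.

```python
def bisection_search(f, a, b, L=1e-4, eps=1e-6):
    """Elimination method of page 21: shrink [a, b] until b - a < L.
    Assumes f is unimodal on [a, b] and 2*eps < L."""
    while b - a >= L:
        mid = (a + b) / 2.0
        x, y = mid - eps, mid + eps   # trial points 2*eps apart around the midpoint
        if f(x) < f(y):
            b = y                     # minimizer lies in [a_h, y_h]
        else:
            a = x                     # minimizer lies in [x_h, b_h]
    return a, b

# Hypothetical usage: f(x) = (x - 2)^2 - 4 on [0, 5]
a, b = bisection_search(lambda x: (x - 2)**2 - 4, 0.0, 5.0)
print((a + b) / 2)   # ≈ 2
```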

Page 22: The method of bisection (cont.)

Efficiency:
– Length of the interval after $h$ iterations is $\frac{1}{2^h}(b - a) + 2\epsilon\left(1 - \frac{1}{2^h}\right)$
– Number of iterations required if the final length should be $L$ is (why?) $h = -\ln\!\left(\frac{L - 2\epsilon}{b - a - 2\epsilon}\right) / \ln 2$
– For each iteration, the objective function is evaluated 2 times (at $x_h$ and $y_h$) → in total $2h$ evaluations
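A quick numeric sanity check of the iteration-count formula (the interval, tolerance and $\epsilon$ below are assumed example values, not from the slides):

```python
import math

b_minus_a, L, eps = 5.0, 1e-4, 1e-6   # assumed example values
h = math.ceil(-math.log((L - 2 * eps) / (b_minus_a - 2 * eps)) / math.log(2))
print(h, 2 * h)   # 16 iterations, 32 function evaluations
```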


Page 23: Golden section

Assume that we want to separate a sub-interval (of length $y$) from an interval of length $L$ such that
$\frac{L}{y} = \frac{y}{L - y}$

Then $y = \frac{\sqrt{5} - 1}{2} L \approx 0.618\,L$

It is said that the interval is now divided in the ratio of the golden section
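Filling in the step the slide leaves implicit: the defining ratio gives a quadratic equation whose positive root is the value above,

$$\frac{L}{y} = \frac{y}{L - y} \;\Rightarrow\; y^2 + Ly - L^2 = 0 \;\Rightarrow\; y = \frac{-L + L\sqrt{5}}{2} = \frac{\sqrt{5} - 1}{2}\, L \approx 0.618\, L.$$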

Theorem: Divide an interval $[a, b]$ in the ratio of the golden section first from the right (point $d$) and then from the left (point $c$). Then point $c$ divides the interval $[a, d]$ in the ratio of the golden section, and point $d$ does the same for $[c, b]$.


[Figure: interval $[a, b]$ with golden-section points $c$ and $d$]

Page 24: Golden section search

Elimination method (closely related to Fibonacci search). Let $C = \frac{\sqrt{5} - 1}{2}$.

1) Choose an allowable length $L > 0$ for the final interval. Let $[a_1, b_1]$ be the original (unimodal) interval. Set $x_1 = a_1 + (1 - C)(b_1 - a_1) = b_1 - C(b_1 - a_1)$ and $y_1 = a_1 + C(b_1 - a_1)$. Compute $f(x_1)$ and $f(y_1)$. Set $h = 1$.

2) If $b_h - a_h < L$, stop: the minimizer $x^* \in [a_h, b_h]$. Otherwise, if $f(x_h) \le f(y_h)$, go to step 4).

3) Set $a_{h+1} = x_h$ and $b_{h+1} = b_h$. Further set $x_{h+1} = y_h$ and $y_{h+1} = a_{h+1} + C(b_{h+1} - a_{h+1})$. Compute $f(y_{h+1})$ and go to step 5).

4) Set $a_{h+1} = a_h$ and $b_{h+1} = y_h$. Further set $y_{h+1} = x_h$ and $x_{h+1} = a_{h+1} + (1 - C)(b_{h+1} - a_{h+1})$. Compute $f(x_{h+1})$.

5) Set $h = h + 1$ and go to step 2).


[Figure: interval $[a_h, b_h]$ with the interior trial points $x_h$ and $y_h$]
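A minimal Python sketch of steps 1)–5), assuming $f$ is unimodal on $[a, b]$; the default tolerance and the test function are illustrative assumptions.

```python
import math

def golden_section_search(f, a, b, L=1e-4):
    """Golden section search of page 24: one old trial point is reused
    each iteration, so only one new f-evaluation is needed."""
    C = (math.sqrt(5) - 1) / 2
    x, y = b - C * (b - a), a + C * (b - a)   # x_1, y_1
    fx, fy = f(x), f(y)
    while b - a >= L:
        if fx <= fy:
            # step 4): minimizer in [a_h, y_h], reuse x_h as the new right trial point
            b, y, fy = y, x, fx
            x = a + (1 - C) * (b - a)
            fx = f(x)
        else:
            # step 3): minimizer in [x_h, b_h], reuse y_h as the new left trial point
            a, x, fx = x, y, fy
            y = a + C * (b - a)
            fy = f(y)
    return a, b

# Hypothetical usage: f(x) = (x - 2)^2 - 4 on [0, 5]
a, b = golden_section_search(lambda x: (x - 2)**2 - 4, 0.0, 5.0)
print((a + b) / 2)   # ≈ 2
```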

Page 25: Golden section search (cont.)

Efficiency
– Length of the interval after $h$ iterations is $C^h (b - a)$
– Number of iterations required if the final length should be $L$ is (why?) $h = \ln\!\left(\frac{L}{b - a}\right) / \ln C$
– For each iteration (except the last), the objective function is evaluated once (at $x_{h+1}$ or $y_{h+1}$), plus at two points ($x_1$ and $y_1$) in the beginning → in total $h + 1$ evaluations
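Continuing the sanity check from the bisection slide with the same assumed example interval and tolerance:

```python
import math

b_minus_a, L = 5.0, 1e-4              # assumed example values
C = (math.sqrt(5) - 1) / 2
h = math.ceil(math.log(L / b_minus_a) / math.log(C))
print(h, h + 1)   # 23 iterations, 24 evaluations (vs. 32 for bisection)
```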


Page 26: Quadratic interpolation

Idea is to approximate $f$ with a quadratic polynomial whose minimizer is known

Taylor's second-order polynomial is used:
$p(x) = f(x_h) + f'(x_h)(x - x_h) + \frac{1}{2} f''(x_h)(x - x_h)^2$

If $f''(x_h) \neq 0$, then $p(x)$ has a critical point at $x_{h+1}$ where $p'(x_{h+1}) = 0$ → $x_{h+1} = x_h - \frac{f'(x_h)}{f''(x_h)}$

This is Newton's method for solving $f'(x) = 0$!

Interpolation can also be applied when no derivatives are available (find out the idea by yourself)
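A minimal Python sketch of this Newton iteration for the one-dimensional search, assuming the first and second derivatives are available as callables; the names and the test problem are illustrative, not from the slides.

```python
def newton_line_search(df, d2f, x0, tol=1e-8, max_iter=50):
    """Newton's method on f'(x) = 0, i.e. repeatedly minimize the
    local quadratic model p(x) of f around the current point."""
    x = x0
    for _ in range(max_iter):
        g, H = df(x), d2f(x)
        if H == 0:                 # quadratic model has no unique critical point
            break
        step = g / H
        x -= step                  # x_{h+1} = x_h - f'(x_h) / f''(x_h)
        if abs(step) < tol:
            break
    return x

# Hypothetical test: f(x) = (x - 2)^2 - 4, so f'(x) = 2x - 4 and f''(x) = 2
print(newton_line_search(lambda x: 2 * x - 4, lambda x: 2.0, x0=5.0))   # → 2.0
```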


Page 27: Programming assignment

Form the pairs!

Start programming by implementing some line search method

Any programming language is ok

Test your implementation with some optimization problems where you know the minimizer


Page 28: Topic of the lectures on January 20th & 22nd

Mon, Jan 20th: unconstrained optimization with multiple variables, optimality conditions and methods that don't utilize gradient information (= direct search methods)

Wed, Jan 22nd: methods that utilize gradient information

Study this before the lecture!

Questions to be considered
– What kind of optimality conditions exist?
– What kind of techniques do direct search methods use to find a local minimizer?
– How is gradient information utilized?
