
Page 1: Introduction to optimization methods and line search

Jussi Hakanen
Post-doctoral researcher
[email protected]

spring 2014 TIES483 Nonlinear optimization

Page 2: How to find optimal solutions?

Trial and error → widely used in practice, but inefficient and likely to miss good solutions

It is better to use a systematic way to find optimal solutions

Typically we know only
– the function value(s) at the current trial point
– possibly the gradient(s) at the current trial point

How can we know which solution is optimal?

How can we find optimal solutions?


Page 3: Optimality conditions

How can we know that a solution is optimal?

One way is to utilize optimality conditions

Necessary optimality conditions = conditions that an optimal solution has to satisfy (do not guarantee optimality)

Sufficient optimality conditions = conditions that guarantee optimality when satisfied

First-order conditions (use first-order derivatives) and second-order conditions (use second-order derivatives)

Page 4: Global vs. local minimizers

A solution $x^* \in S$ is a global minimizer if $f(x^*) \le f(x)$ for all $x \in S$

A solution $x^* \in S$ is a local minimizer if there exists an $\epsilon > 0$ s.t. $f(x^*) \le f(x)$ for all $x \in S$ with $\|x - x^*\| < \epsilon$

Convexity: for a convex problem, a local minimizer is also a global minimizer

Global minimizers are preferred, but local minimizers are usually easier to identify


Page 5: Solving an optimization problem

Find optimal values 𝑥∗ for the variables

Some problems can be solved analytically
– e.g. $\min x^2$, when $x \ge 3$ → $x^* = 3$

Usually impossible to solve analytically

Must be solved numerically → an approximation of the solution
– In mathematical optimization, a starting point is iteratively improved


Page 6: Numerical solution

Modelling → a mathematical model of the problem

Numerical methods → a numerical simulation model for the mathematical model

Optimization method → solve the problem by utilizing the numerical simulation model

So: modelling → simulation → optimization


Page 7: Optimization method

Algorithm: a mathematical description
1. Choose a stopping parameter $\varepsilon > 0$, a starting point $x_1$ and a symmetric positive definite $n \times n$ matrix $D_1$ (e.g. $D_1 = I$). Set $y_1 = x_1$ and $h = j = 1$.
2. If $\|\nabla f(y_j)\| < \varepsilon$, stop. Otherwise, set $d_j = -D_j \nabla f(y_j)$. Let $\lambda_j$ be a solution of $\min f(y_j + \lambda d_j)$ s.t. $\lambda \ge 0$. Set $y_{j+1} = y_j + \lambda_j d_j$. If $j = n$, set $y_1 = x_{h+1} = y_{n+1}$, $h = h + 1$, $j = 1$ and repeat step 2.
3. Compute $D_{j+1}$. Set $j = j + 1$ and go to step 2.

Method: the algorithm together with the numerical methods it needs (e.g. for the line search)

Software: a method implemented as a computer programme
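To illustrate the step from algorithm to software, here is a minimal Python sketch of the iteration above with $D_j$ kept fixed at the identity matrix (i.e. steepest descent with a line search). The test function, the starting point and the use of SciPy's bounded scalar minimizer as the line search are illustrative assumptions, not part of the course material.

```python
import numpy as np
from scipy.optimize import minimize_scalar

def line_search(f, y, d, lam_max=10.0):
    """Approximately solve min f(y + lam*d) s.t. lam >= 0 (assumed step cap lam_max)."""
    res = minimize_scalar(lambda lam: f(y + lam * d),
                          bounds=(0.0, lam_max), method='bounded')
    return res.x

def descent(f, grad, x1, eps=1e-6, max_iter=1000):
    """Sketch of the page 7 iteration with D_j = I (steepest descent)."""
    y = np.asarray(x1, dtype=float)
    for _ in range(max_iter):
        g = grad(y)
        if np.linalg.norm(g) < eps:   # stopping test ||grad f(y_j)|| < eps
            break
        d = -g                        # d_j = -D_j * grad f(y_j) with D_j = I
        lam = line_search(f, y, d)    # step length from the line search
        y = y + lam * d               # y_{j+1} = y_j + lam_j * d_j
    return y

# Hypothetical test problem: f(x) = (x1 - 1)^2 + (x2 + 2)^2, minimizer (1, -2)
f = lambda x: (x[0] - 1)**2 + (x[1] + 2)**2
grad = lambda x: np.array([2 * (x[0] - 1), 2 * (x[1] + 2)])
print(descent(f, grad, [5.0, 5.0]))
```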


Page 8: Structure of optimization methods

Typically
– Constraint handling converts the problem to (a series of) unconstrained problems
– In unconstrained optimization, a search direction is determined at each iteration
– The best solution in the search direction is found with a line search


[Diagram: Constraint handling method → Unconstrained optimization → Line search]

Page 9: Local optimization methods

Find a (closest) local optimum

Fast

Usually utilize derivatives

Mathematical convergence

For example
– Direct search methods (pattern search, Hooke & Jeeves, Nelder & Mead, …)
– Gradient-based methods (steepest descent, Newton's method, quasi-Newton methods, conjugate gradient, SQP, interior point methods, …)


Page 10: Global optimization methods

Try to get as close to a global optimum as possible

No mathematical convergence guarantees

Do not assume much about the problem

Slow, use lots of function evaluations

Heuristic, contain randomness

The most well known are nature-inspired methods (TIES451 Selected topics in soft computing)
– based on improving a population of solutions at a time instead of a single solution


Page 11: Hybrid methods

Combination of global and local methods

Try to combine the benefits of both
– get a rough estimate with a global method, then fine-tune with a local method

Challenge: how should the methods be combined?
– e.g. when to switch from global to local? (speed vs. accuracy)


Page 12: Line search

What did you find out about line search?


Page 13: Line search

The idea of line search is to optimize a given function with respect to a single variable

Optimization algorithms for multivariable problems iteratively generate search directions in which better solutions are found
– Line search is used to find these better solutions along each direction!

An exact minimum is not required, only an approximation of it that is within a given tolerance $\epsilon > 0$
– it is enough to know that $x^* \in [a^*, b^*]$ where $b^* - a^* < \epsilon$


Page 14: Optimality conditions

Necessary: Let $f: \mathbb{R} \to \mathbb{R}$ be differentiable. If $x^*$ is a local minimizer, then $f'(x^*) = 0$. In addition, if $f$ is twice continuously differentiable and $x^*$ is a local minimizer, then $f''(x^*) \ge 0$.

Sufficient: Let $f: \mathbb{R} \to \mathbb{R}$ be twice continuously differentiable. If $f'(x^*) = 0$ and $f''(x^*) > 0$, then $x^*$ is a strict local minimizer.


Page 15: Examples


$f(x) = (x - 2)^2 - 4$

$f'(x) = 2x - 4$

$f''(x) = 2$

If $x^* = 2$, then both the necessary and sufficient optimality conditions are satisfied

[Figure: graph of $f(x) = (x - 2)^2 - 4$]

Page 16: Examples


$f(x) = (x - 2)^3 - 4$

$f'(x) = 3(x - 2)^2$

$f''(x) = 6x - 12$

If $x^* = 2$, then the necessary optimality conditions are satisfied although $x^* = 2$ is not a local minimizer
– It is a saddle point

The sufficient optimality conditions are not satisfied at $x^* = 2$

[Figure: graph of $f(x) = (x - 2)^3 - 4$]

Page 17: Note on optimality conditions

If $f$ is not differentiable, then a local minimizer can be at a point where $f$ is
1) not differentiable, or
2) discontinuous


[Figure: graph of $f(x) = |x|$]

Page 18: Finding a unimodal interval

Most line search methods assume that the search is started from a unimodal interval $[a, b]$

$f$ is unimodal in $[a, b]$ if there is exactly one $x^* \in [a, b]$ s.t. for all $x_1, x_2 \in [a, b]$ for which $x_1 < x_2$ it holds that
– If $x_2 < x^*$, then $f(x_1) > f(x_2)$, and
– If $x_1 > x^*$, then $f(x_1) < f(x_2)$


Page 19: Search with fixed steps

Let $(A, B)$ be the interval where we want to find a minimum of $f$

Compute the values of $f$ at $P$ equally spaced points $x_i$ in $(A, B)$
– $x_i = A + \frac{i}{P+1}(B - A)$, $i = 1, \dots, P$

When points $x_j$, $x_{j+1}$ and $x_{j+2}$ are found s.t. $f(x_j) > f(x_{j+1}) < f(x_{j+2})$, we know that there exists at least one local minimizer in $(x_j, x_{j+2})$

The interval can be further reduced
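A minimal Python sketch of this fixed-step scan, assuming a suitable $f$ is given; the function and parameter names are illustrative, not from the slides.

```python
def fixed_step_bracket(f, A, B, P):
    """Evaluate f at P equally spaced points of (A, B) and return an
    interval (x_j, x_{j+2}) containing a local minimizer, or None."""
    xs = [A + i * (B - A) / (P + 1) for i in range(1, P + 1)]
    fs = [f(x) for x in xs]
    for j in range(len(xs) - 2):
        # f(x_j) > f(x_{j+1}) < f(x_{j+2})  =>  minimizer in (x_j, x_{j+2})
        if fs[j] > fs[j + 1] < fs[j + 2]:
            return xs[j], xs[j + 2]
    return None

# Hypothetical usage: f(x) = (x - 2)^2 - 4 on (0, 5)
print(fixed_step_bracket(lambda x: (x - 2)**2 - 4, 0.0, 5.0, 9))  # (1.5, 2.5)
```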


Page 20: Line search methods

Assume that 𝑓 is unimodal in [𝑎, 𝑏]

General idea is to start reducing the interval [𝑎, 𝑏] s.t. the minimizer is still included in it

An approximation of the minimizer is found when the length of the interval is smaller than a pre-determined tolerance

Line search methods can be divided into
– Elimination methods
– Interpolation methods (often use derivatives)


Page 21: The method of bisection

Elimination method

1) Choose a small but significant constant $2\epsilon > 0$ and an allowable length $L > 0$ for the final interval. Let $[a_1, b_1]$ be the original (unimodal) interval. Set $h = 1$.

2) If $b_h - a_h < L$, stop: the minimizer $x^* \in [a_h, b_h]$. Otherwise, compute the values of $f$ at $x_h = \frac{a_h + b_h}{2} - \epsilon$ and $y_h = \frac{a_h + b_h}{2} + \epsilon$.

3) If $f(x_h) < f(y_h)$, set $a_{h+1} = a_h$ and $b_{h+1} = y_h$. Otherwise, set $a_{h+1} = x_h$ and $b_{h+1} = b_h$. Set $h = h + 1$ and go to step 2).


[Figure: interval $[a_h, b_h]$ with the trial points $x_h$ and $y_h$ placed $2\epsilon$ apart around the midpoint]
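A minimal Python sketch of steps 1)–3), assuming $f$ is unimodal on the given interval; the default values of $L$ and $\epsilon$ and the test function are illustrative assumptions.

```python
def bisection_search(f, a, b, L=1e-4, eps=1e-6):
    """Elimination method of page 21: shrink [a, b] until b - a < L.
    Assumes f is unimodal on [a, b] and 2*eps < L."""
    while b - a >= L:
        mid = (a + b) / 2.0
        x, y = mid - eps, mid + eps   # trial points 2*eps apart around the midpoint
        if f(x) < f(y):
            b = y                     # minimizer lies in [a_h, y_h]
        else:
            a = x                     # minimizer lies in [x_h, b_h]
    return a, b

# Hypothetical usage: f(x) = (x - 2)^2 - 4 on [0, 5]
a, b = bisection_search(lambda x: (x - 2)**2 - 4, 0.0, 5.0)
print((a + b) / 2)   # ≈ 2
```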

Page 22: The method of bisection (cont.)

Efficiency:
– Length of the interval after $h$ iterations is $\frac{1}{2^h}(b - a) + 2\epsilon\left(1 - \frac{1}{2^h}\right)$
– Number of iterations required if the final length should be $L$ is (why?) $h = -\ln\!\left(\frac{L - 2\epsilon}{b - a - 2\epsilon}\right) / \ln 2$
– For each iteration, the objective function is evaluated 2 times (at $x_h$ and $y_h$) → in total $2h$ evaluations
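A quick numeric sanity check of the iteration-count formula (the interval, tolerance and $\epsilon$ below are assumed example values, not from the slides):

```python
import math

b_minus_a, L, eps = 5.0, 1e-4, 1e-6   # assumed example values
h = math.ceil(-math.log((L - 2 * eps) / (b_minus_a - 2 * eps)) / math.log(2))
print(h, 2 * h)   # 16 iterations, 32 function evaluations
```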


Page 23: Golden section

Assume that we want to separate a sub-interval (of length $y$) from an interval of length $L$ such that
$\frac{L}{y} = \frac{y}{L - y}$

Then $y = \frac{\sqrt{5} - 1}{2} L \approx 0.618\,L$

It is said that the interval is now divided in the ratio of the golden section
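Filling in the step the slide leaves implicit: the defining ratio gives a quadratic equation whose positive root is the value above,

$$\frac{L}{y} = \frac{y}{L - y} \;\Rightarrow\; y^2 + Ly - L^2 = 0 \;\Rightarrow\; y = \frac{-L + L\sqrt{5}}{2} = \frac{\sqrt{5} - 1}{2}\, L \approx 0.618\, L.$$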

Theorem: Divide an interval $[a, b]$ in the ratio of the golden section first from the right (point $d$) and then from the left (point $c$). Then point $c$ divides the interval $[a, d]$ in the ratio of the golden section, and point $d$ does the same for $[c, b]$.


[Figure: interval $[a, b]$ with golden-section points $c$ and $d$]

Page 24: Golden section search

Elimination method (closely related to Fibonacci search). Let $C = \frac{\sqrt{5} - 1}{2}$.

1) Choose an allowable length $L > 0$ for the final interval. Let $[a_1, b_1]$ be the original (unimodal) interval. Set $x_1 = a_1 + (1 - C)(b_1 - a_1) = b_1 - C(b_1 - a_1)$ and $y_1 = a_1 + C(b_1 - a_1)$. Compute $f(x_1)$ and $f(y_1)$. Set $h = 1$.

2) If $b_h - a_h < L$, stop: the minimizer $x^* \in [a_h, b_h]$. Otherwise, if $f(x_h) \le f(y_h)$, go to step 4).

3) Set $a_{h+1} = x_h$ and $b_{h+1} = b_h$. Further set $x_{h+1} = y_h$ and $y_{h+1} = a_{h+1} + C(b_{h+1} - a_{h+1})$. Compute $f(y_{h+1})$ and go to step 5).

4) Set $a_{h+1} = a_h$ and $b_{h+1} = y_h$. Further set $y_{h+1} = x_h$ and $x_{h+1} = a_{h+1} + (1 - C)(b_{h+1} - a_{h+1})$. Compute $f(x_{h+1})$.

5) Set $h = h + 1$ and go to step 2).


[Figure: interval $[a_h, b_h]$ with the interior trial points $x_h$ and $y_h$]
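A minimal Python sketch of steps 1)–5), assuming $f$ is unimodal on $[a, b]$; the default tolerance and the test function are illustrative assumptions.

```python
import math

def golden_section_search(f, a, b, L=1e-4):
    """Golden section search of page 24: one old trial point is reused
    each iteration, so only one new f-evaluation is needed."""
    C = (math.sqrt(5) - 1) / 2
    x, y = b - C * (b - a), a + C * (b - a)   # x_1, y_1
    fx, fy = f(x), f(y)
    while b - a >= L:
        if fx <= fy:
            # step 4): minimizer in [a_h, y_h], reuse x_h as the new right trial point
            b, y, fy = y, x, fx
            x = a + (1 - C) * (b - a)
            fx = f(x)
        else:
            # step 3): minimizer in [x_h, b_h], reuse y_h as the new left trial point
            a, x, fx = x, y, fy
            y = a + C * (b - a)
            fy = f(y)
    return a, b

# Hypothetical usage: f(x) = (x - 2)^2 - 4 on [0, 5]
a, b = golden_section_search(lambda x: (x - 2)**2 - 4, 0.0, 5.0)
print((a + b) / 2)   # ≈ 2
```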

Page 25: Golden section search (cont.)

Efficiency
– Length of the interval after $h$ iterations is $C^h (b - a)$
– Number of iterations required if the final length should be $L$ is (why?) $h = \ln\!\left(\frac{L}{b - a}\right) / \ln C$
– For each iteration (except the last), the objective function is evaluated once (at $x_{h+1}$ or $y_{h+1}$), plus at two points ($x_1$ and $y_1$) in the beginning → in total $h + 1$ evaluations
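Continuing the sanity check from the bisection slide with the same assumed example interval and tolerance:

```python
import math

b_minus_a, L = 5.0, 1e-4              # assumed example values
C = (math.sqrt(5) - 1) / 2
h = math.ceil(math.log(L / b_minus_a) / math.log(C))
print(h, h + 1)   # 23 iterations, 24 evaluations (vs. 32 for bisection)
```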


Page 26: Quadratic interpolation

Idea is to approximate $f$ with a quadratic polynomial whose minimizer is known

Taylor's second-order polynomial is used:
$p(x) = f(x_h) + f'(x_h)(x - x_h) + \frac{1}{2} f''(x_h)(x - x_h)^2$

If $f''(x_h) \neq 0$, then $p(x)$ has a critical point at $x_{h+1}$ where $p'(x_{h+1}) = 0$ → $x_{h+1} = x_h - \frac{f'(x_h)}{f''(x_h)}$

This is Newton's method for solving $f'(x) = 0$!

Interpolation can also be applied when no derivatives are available (find out the idea by yourself)
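A minimal Python sketch of this Newton iteration for the one-dimensional search, assuming the first and second derivatives are available as callables; the names and the test problem are illustrative, not from the slides.

```python
def newton_line_search(df, d2f, x0, tol=1e-8, max_iter=50):
    """Newton's method on f'(x) = 0, i.e. repeatedly minimize the
    local quadratic model p(x) of f around the current point."""
    x = x0
    for _ in range(max_iter):
        g, H = df(x), d2f(x)
        if H == 0:                 # quadratic model has no unique critical point
            break
        step = g / H
        x -= step                  # x_{h+1} = x_h - f'(x_h) / f''(x_h)
        if abs(step) < tol:
            break
    return x

# Hypothetical test: f(x) = (x - 2)^2 - 4, so f'(x) = 2x - 4 and f''(x) = 2
print(newton_line_search(lambda x: 2 * x - 4, lambda x: 2.0, x0=5.0))   # → 2.0
```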


Page 27: Programming assignment

Form the pairs!

Start programming by implementing some line search method

Any programming language is ok

Test your implementation with some optimization problems where you know the minimizer


Page 28: Topic of the lectures on January 20th & 22nd

Mon, Jan 20th: unconstrained optimization with multiple variables, optimality conditions and methods that don't utilize gradient information (= direct search methods)

Wed, Jan 22nd: methods that utilize gradient information

Study this before the lecture!

Questions to be considered
– What kind of optimality conditions exist?
– What kind of techniques do direct search methods use to find a local minimizer?
– How is gradient information utilized?
