Introduction to optimization
methods and line search
Jussi Hakanen
Post-doctoral researcher [email protected]
spring 2014 TIES483 Nonlinear optimization
How to find optimal solutions?
Trial and error → widely used in practice, but inefficient and likely to miss good solutions
It is better to use a systematic way to find optimal solutions
Typically we know only
– function value(s) at the current trial point
– possibly gradients at the current trial point
How can we know which solution is optimal?
How can we find optimal solutions?
Optimality conditions
How can we know that a solution is optimal?
One way is to utilize optimality conditions
Necessary optimality conditions = conditions that an optimal solution has to satisfy (do not guarantee optimality)
Sufficient optimality conditions = conditions that guarantee optimality when satisfied
First-order conditions (use first derivatives) and second-order conditions (use second derivatives)
Global vs. local minimizers
A solution 𝑥∗ ∈ 𝑆 is a global minimizer if 𝑓(𝑥∗) ≤ 𝑓(𝑥) for all 𝑥 ∈ 𝑆
A solution 𝑥∗ ∈ 𝑆 is a local minimizer if there exists an 𝜖 > 0 s.t. 𝑓(𝑥∗) ≤ 𝑓(𝑥) for all 𝑥 ∈ 𝑆 with ‖𝑥 − 𝑥∗‖ < 𝜖
Convexity: for a convex problem, every local minimizer is a global minimizer
Global minimizers are preferred, but local minimizers are usually easier to identify
Solving an optimization problem
Find optimal values 𝑥∗ for the variables
Some problems can be solved analytically:
min 𝑥², when 𝑥 ≥ 3 → 𝑥∗ = 3
Usually it is impossible to solve the problem analytically; it must be solved numerically → an approximation of the solution
– In mathematical optimization, a starting point is iteratively improved
Numerical solution
Modelling → a mathematical model of the problem
Numerical methods → a numerical simulation model for the mathematical model
Optimization method → solve the problem utilizing the numerical simulation model
So: modelling → simulation → optimization
Optimization method
Algorithm: a mathematical description
1. Choose a stopping parameter 𝜀 > 0, a starting point 𝑥1 and a symmetric positive definite 𝑛 × 𝑛 matrix 𝐷1 (e.g. 𝐷1 = 𝐼). Set 𝑦1 = 𝑥1 and ℎ = 𝑗 = 1.
2. If ‖𝛻𝑓(𝑦𝑗)‖ < 𝜀, stop. Otherwise, set 𝑑𝑗 = −𝐷𝑗𝛻𝑓(𝑦𝑗). Let 𝜆𝑗 be a solution of min 𝑓(𝑦𝑗 + 𝜆𝑑𝑗), s.t. 𝜆 ≥ 0. Set 𝑦𝑗+1 = 𝑦𝑗 + 𝜆𝑗𝑑𝑗. If 𝑗 = 𝑛, set 𝑦1 = 𝑥ℎ+1 = 𝑦𝑛+1, ℎ = ℎ + 1, 𝑗 = 1 and repeat step 2.
3. Compute 𝐷𝑗+1. Set 𝑗 = 𝑗 + 1 and go to step 2.
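As a concrete illustration: with 𝐷𝑗 fixed to the identity matrix 𝐼, the scheme above reduces to steepest descent. A minimal sketch under that assumption — the exact line search over 𝜆 is replaced here by a simple backtracking rule, and the quadratic test function is a hypothetical choice, not from the course material:

```python
import math

def steepest_descent(f, grad, x1, eps=1e-6, max_iter=1000):
    # Sketch of the slide's scheme with D_j = I (steepest descent);
    # the exact minimization over lambda >= 0 is approximated by backtracking.
    x = list(x1)
    for _ in range(max_iter):
        g = grad(x)
        gg = sum(gi * gi for gi in g)
        if math.sqrt(gg) < eps:                          # step 2: stopping test
            break
        d = [-gi for gi in g]                            # d_j = -D grad f(y_j), D = I
        lam = 1.0                                        # crude stand-in for exact line search
        while f([xi + lam * di for xi, di in zip(x, d)]) > f(x) - 1e-4 * lam * gg:
            lam *= 0.5
        x = [xi + lam * di for xi, di in zip(x, d)]      # y_{j+1} = y_j + lambda_j d_j
    return x

# Illustrative quadratic with minimizer (1, 2) (hypothetical test problem)
f_demo = lambda x: (x[0] - 1) ** 2 + 2 * (x[1] - 2) ** 2
g_demo = lambda x: [2 * (x[0] - 1), 4 * (x[1] - 2)]
x_star = steepest_descent(f_demo, g_demo, [0.0, 0.0])
```

A proper quasi-Newton method would additionally update 𝐷𝑗 in step 3; that update is left unspecified on this slide.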
Method: the algorithm with the necessary numerical methods included
Software: a method implemented as a computer programme
Structure of optimization methods
Typically:
– Constraint handling converts the problem to (a series of) unconstrained problems
– In unconstrained optimization, a search direction is determined at each iteration
– The best solution in the search direction is found with a line search
[Diagram: constraint handling method → unconstrained optimization → line search]
Local optimization methods
Find a (closest) local optimum
Fast
Usually utilize derivatives
Mathematical convergence guarantees
For example:
– Direct search methods (pattern search, Hooke & Jeeves, Nelder & Mead, …)
– Gradient-based methods (steepest descent, Newton's method, quasi-Newton methods, conjugate gradient, SQP, interior point methods, …)
Global optimization methods
Try to get as close to the global optimum as possible
No mathematical convergence guarantees
Do not assume much about the problem
Slow, use lots of function evaluations
Heuristic, contain randomness
The most well-known are nature-inspired methods (TIES451 Selected topics in soft computing)
– based on improving a population of solutions at a time instead of a single solution
Hybrid methods
Combination of global and local methods
Try to combine the benefits of both
– obtain a rough estimate with a global method, then fine-tune with a local method
Challenge: how should the methods be combined?
– e.g. when to switch from global to local? (speed vs. accuracy)
Line search
What did you find out about line search?
Line search
The idea of line search is to optimize a given function with respect to a single variable
Optimization algorithms for multivariable problems iteratively generate search directions in which better solutions are found
– line search is used to find these!
An exact minimum is not required, only an approximation of it within a given tolerance 𝜖 > 0
– it is enough to know that 𝑥∗ ∈ [𝑎∗, 𝑏∗] where 𝑏∗ − 𝑎∗ < 𝜖
Optimality conditions
Necessary: Let 𝑓: 𝑅 → 𝑅 be differentiable. If 𝑥∗ is a local minimizer, then 𝑓′(𝑥∗) = 0. In addition, if 𝑓 is twice continuously differentiable and 𝑥∗ is a local minimizer, then 𝑓′′(𝑥∗) ≥ 0.
Sufficient: Let 𝑓: 𝑅 → 𝑅 be twice continuously differentiable. If 𝑓′(𝑥∗) = 0 and 𝑓′′(𝑥∗) > 0, then 𝑥∗ is a strict local minimizer.
Examples
𝑓(𝑥) = (𝑥 − 2)² − 4
𝑓′(𝑥) = 2𝑥 − 4
𝑓′′(𝑥) = 2
If 𝑥∗ = 2, then both the necessary and sufficient optimality conditions are satisfied
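The analytic check above can also be mirrored numerically. A small sketch using central finite differences (the step sizes are arbitrary illustrative choices):

```python
def derivative(f, x, h=1e-5):
    # Central finite-difference approximation of f'(x)
    return (f(x + h) - f(x - h)) / (2 * h)

def second_derivative(f, x, h=1e-4):
    # Central finite-difference approximation of f''(x)
    return (f(x + h) - 2 * f(x) + f(x - h)) / h ** 2

f = lambda x: (x - 2) ** 2 - 4
x_star = 2.0
d1 = derivative(f, x_star)         # ~0  -> necessary condition f'(x*) = 0 holds
d2 = second_derivative(f, x_star)  # ~2 > 0 -> sufficient condition holds
```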
Examples
𝑓(𝑥) = (𝑥 − 2)³ − 4
𝑓′(𝑥) = 3(𝑥 − 2)²
𝑓′′(𝑥) = 6𝑥 − 12
If 𝑥∗ = 2, then the necessary optimality conditions are satisfied although 𝑥∗ = 2 is not a local minimizer
– it is a saddle point (inflection point)
The sufficient optimality conditions are not satisfied at 𝑥∗ = 2
Note on optimality conditions
If 𝑓 is not differentiable, then a local minimizer can be at a point where 𝑓 is
1) not differentiable or
2) discontinuous
[Figure: 𝑓(𝑥) = |𝑥|, with the minimizer at the non-differentiable point 𝑥∗ = 0]
Finding a unimodal interval
Most line search methods assume that the search is started from a unimodal interval [𝑎, 𝑏]
𝑓 is unimodal in [𝑎, 𝑏] if there is exactly one 𝑥∗ ∈ [𝑎, 𝑏] s.t. for all 𝑥1, 𝑥2 ∈ [𝑎, 𝑏] with 𝑥1 < 𝑥2:
– if 𝑥2 < 𝑥∗, then 𝑓(𝑥1) > 𝑓(𝑥2) and
– if 𝑥1 > 𝑥∗, then 𝑓(𝑥1) < 𝑓(𝑥2)
Search with fixed steps
Let (𝐴, 𝐵) be the interval where we want to find a minimum of 𝑓
Compute the values of 𝑓 at 𝑃 equally spaced points 𝑥𝑖 in (𝐴, 𝐵)
– 𝑥𝑖 = 𝐴 + (𝑖/(𝑃 + 1))(𝐵 − 𝐴), 𝑖 = 1, …, 𝑃
When points 𝑥𝑗, 𝑥𝑗+1 and 𝑥𝑗+2 are found s.t. 𝑓(𝑥𝑗) > 𝑓(𝑥𝑗+1) < 𝑓(𝑥𝑗+2), we know that there exists at least one local minimizer in (𝑥𝑗, 𝑥𝑗+2)
The interval can then be further reduced
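A minimal sketch of this fixed-step bracketing; the test function and interval are illustrative choices, not from the slides:

```python
def bracket_minimum(f, A, B, P):
    # Evaluate f at P equally spaced interior points of (A, B) and
    # return an interval (x_j, x_{j+2}) bracketing a local minimizer.
    xs = [A + i * (B - A) / (P + 1) for i in range(1, P + 1)]
    fs = [f(x) for x in xs]
    for j in range(len(xs) - 2):
        if fs[j] > fs[j + 1] < fs[j + 2]:
            return xs[j], xs[j + 2]
    return None  # no interior bracket found with this grid

# Illustrative: f(x) = (x - 2)^2 - 4 on (0, 5), minimizer at x = 2
a, b = bracket_minimum(lambda x: (x - 2) ** 2 - 4, 0.0, 5.0, 9)
```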
Line search methods
Assume that 𝑓 is unimodal in [𝑎, 𝑏]
The general idea is to start reducing the interval [𝑎, 𝑏] s.t. the minimizer is still included in it
An approximation of the minimizer is found when the length of the interval is smaller than a pre-determined tolerance
Line search methods can be divided into
– elimination methods
– interpolation methods (often use derivatives)
The method of bisection
Elimination method
1) Choose a small but significant constant 2𝜖 > 0 and an allowable length 𝐿 > 0 for the final interval. Let [𝑎1, 𝑏1] be the original (unimodal) interval. Set ℎ = 1.
2) If 𝑏ℎ − 𝑎ℎ < 𝐿, stop: the minimizer 𝑥∗ ∈ [𝑎ℎ, 𝑏ℎ]. Otherwise, compute the values of 𝑓 at 𝑥ℎ = (𝑎ℎ + 𝑏ℎ)/2 − 𝜖 and 𝑦ℎ = (𝑎ℎ + 𝑏ℎ)/2 + 𝜖.
3) If 𝑓(𝑥ℎ) < 𝑓(𝑦ℎ), set 𝑎ℎ+1 = 𝑎ℎ and 𝑏ℎ+1 = 𝑦ℎ. Otherwise, set 𝑎ℎ+1 = 𝑥ℎ and 𝑏ℎ+1 = 𝑏ℎ. Set ℎ = ℎ + 1 and go to step 2).
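The three steps can be sketched directly in code; the unimodal test function is an illustrative choice:

```python
def bisection_search(f, a, b, L, eps):
    # Elimination by bisection: shrink the unimodal interval [a, b]
    # until its length is below L; 2*eps separates the two trial points.
    while b - a >= L:                 # step 2: stopping test
        mid = (a + b) / 2
        x, y = mid - eps, mid + eps   # trial points around the midpoint
        if f(x) < f(y):               # step 3: keep the half containing x*
            b = y
        else:
            a = x
    return a, b                       # x* lies in [a, b]

# Illustrative: f(x) = (x - 2)^2 - 4 is unimodal on [0, 5]
a, b = bisection_search(lambda x: (x - 2) ** 2 - 4, 0.0, 5.0, L=0.01, eps=0.001)
```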
[Figure: interval [𝑎ℎ, 𝑏ℎ] with trial points 𝑥ℎ and 𝑦ℎ placed 2𝜖 apart around the midpoint]
The method of bisection (cont.)
Efficiency:
– Length of the interval after ℎ iterations is (1/2^ℎ)(𝑏 − 𝑎) + 2𝜖(1 − 1/2^ℎ)
– Number of iterations required if the final length should be 𝐿 is (why?) ℎ = −ln((𝐿 − 2𝜖)/(𝑏 − 𝑎 − 2𝜖)) / ln 2
– For each iteration, the objective function is evaluated 2 times (at 𝑥ℎ and 𝑦ℎ) → in total 2ℎ evaluations
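Both formulas can be checked against a simulated run. A small sketch (the parameter values are arbitrary): each iteration maps the length 𝑙 to 𝑙/2 + 𝜖, which the closed form above solves.

```python
import math

b_minus_a, eps, L = 5.0, 0.001, 0.01

# Simulate the recursion length <- length/2 + eps and compare with the
# closed form (1/2^h)(b - a) + 2*eps*(1 - 1/2^h).
lengths, length = [], b_minus_a
for h in range(1, 11):
    length = length / 2 + eps
    lengths.append((length, b_minus_a / 2 ** h + 2 * eps * (1 - 1 / 2 ** h)))

# Iterations needed so that the final length drops below L
h_needed = math.ceil(-math.log((L - 2 * eps) / (b_minus_a - 2 * eps)) / math.log(2))
```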
Golden section
Assume that we want to separate a subinterval (length 𝑦) from an interval of length 𝐿 such that 𝐿/𝑦 = 𝑦/(𝐿 − 𝑦)
Then 𝑦 = ((√5 − 1)/2)𝐿 ≈ 0.618𝐿
It is said that the interval is then divided in the ratio of the golden section
Theorem: Divide an interval [𝑎, 𝑏] in the ratio of the golden section first from the right (point 𝑑) and then from the left (point 𝑐). Then point 𝑐 divides the interval [𝑎, 𝑑] in the ratio of the golden section and point 𝑑 does the same for [𝑐, 𝑏].
[Figure: points 𝑎 < 𝑐 < 𝑑 < 𝑏 on the interval]
Golden section search
Elimination method, closely related to Fibonacci search. Let 𝐶 = (√5 − 1)/2.
1) Choose an allowable length 𝐿 > 0 for the final interval. Let [𝑎1, 𝑏1] be the original (unimodal) interval. Set 𝑥1 = 𝑎1 + (1 − 𝐶)(𝑏1 − 𝑎1) = 𝑏1 − 𝐶(𝑏1 − 𝑎1) and 𝑦1 = 𝑎1 + 𝐶(𝑏1 − 𝑎1). Compute 𝑓(𝑥1) and 𝑓(𝑦1). Set ℎ = 1.
2) If 𝑏ℎ − 𝑎ℎ < 𝐿, stop: the minimizer 𝑥∗ ∈ [𝑎ℎ, 𝑏ℎ]. Otherwise, if 𝑓(𝑥ℎ) ≤ 𝑓(𝑦ℎ), go to step 4).
3) Set 𝑎ℎ+1 = 𝑥ℎ and 𝑏ℎ+1 = 𝑏ℎ. Further set 𝑥ℎ+1 = 𝑦ℎ and 𝑦ℎ+1 = 𝑎ℎ+1 + 𝐶(𝑏ℎ+1 − 𝑎ℎ+1). Compute 𝑓(𝑦ℎ+1) and go to step 5).
4) Set 𝑎ℎ+1 = 𝑎ℎ and 𝑏ℎ+1 = 𝑦ℎ. Further set 𝑦ℎ+1 = 𝑥ℎ and 𝑥ℎ+1 = 𝑎ℎ+1 + (1 − 𝐶)(𝑏ℎ+1 − 𝑎ℎ+1). Compute 𝑓(𝑥ℎ+1).
5) Set ℎ = ℎ + 1 and go to step 2).
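Steps 1)–5) can be sketched as follows; note how the theorem lets each iteration reuse one of the two previous function values, so only one new evaluation is needed. The test function is an illustrative choice:

```python
import math

def golden_section_search(f, a, b, L):
    # Elimination by golden section: one new f-evaluation per iteration.
    C = (math.sqrt(5) - 1) / 2
    x, y = a + (1 - C) * (b - a), a + C * (b - a)   # step 1: interior points
    fx, fy = f(x), f(y)
    while b - a >= L:                               # step 2: stopping test
        if fx <= fy:                                # step 4: drop [y, b], reuse x
            b, y, fy = y, x, fx
            x = a + (1 - C) * (b - a)
            fx = f(x)
        else:                                       # step 3: drop [a, x], reuse y
            a, x, fx = x, y, fy
            y = a + C * (b - a)
            fy = f(y)
    return a, b                                     # x* lies in [a, b]

# Illustrative: f(x) = (x - 2)^2 - 4 is unimodal on [0, 5]
a, b = golden_section_search(lambda x: (x - 2) ** 2 - 4, 0.0, 5.0, L=0.01)
```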
[Figure: interval [𝑎ℎ, 𝑏ℎ] with interior points 𝑥ℎ and 𝑦ℎ]
Golden section search (cont.)
Efficiency:
– Length of the interval after ℎ iterations is 𝐶^ℎ(𝑏 − 𝑎)
– Number of iterations required if the final length should be 𝐿 is (why?) ℎ = ln(𝐿/(𝑏 − 𝑎)) / ln 𝐶
– For each iteration (except the last), the objective function is evaluated once (at 𝑥ℎ+1 or 𝑦ℎ+1), plus at two points (𝑥1 and 𝑦1) in the beginning → in total ℎ + 1 evaluations
Quadratic interpolation
The idea is to approximate 𝑓 with a quadratic polynomial whose minimizer is known
Taylor's second-order polynomial is used: 𝑝(𝑥) = 𝑓(𝑥ℎ) + 𝑓′(𝑥ℎ)(𝑥 − 𝑥ℎ) + (1/2)𝑓′′(𝑥ℎ)(𝑥 − 𝑥ℎ)²
If 𝑓′′(𝑥ℎ) ≠ 0, then 𝑝(𝑥) has a critical point at 𝑥ℎ+1 where 𝑝′(𝑥ℎ+1) = 0 → 𝑥ℎ+1 = 𝑥ℎ − 𝑓′(𝑥ℎ)/𝑓′′(𝑥ℎ)
This is Newton's method for solving 𝑓′(𝑥) = 0!
Interpolation can also be applied in the case where no derivatives are available (find out the idea by yourself)
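A minimal sketch of this Newton iteration, with the derivatives supplied analytically; the quadratic test function is an illustrative choice (for a quadratic, the iteration reaches the minimizer in one step, since 𝑝 then equals 𝑓):

```python
def newton_line_search(df, ddf, x, tol=1e-8, max_iter=50):
    # Newton's method for f'(x) = 0: x <- x - f'(x)/f''(x)
    for _ in range(max_iter):
        d1, d2 = df(x), ddf(x)
        if d2 == 0:                 # quadratic model has no critical point
            break
        step = d1 / d2
        x -= step
        if abs(step) < tol:
            break
    return x

# Illustrative: f(x) = (x - 2)^2 - 4, so f'(x) = 2x - 4 and f''(x) = 2
x_star = newton_line_search(lambda x: 2 * x - 4, lambda x: 2.0, x=0.0)
```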
Programming assignment
Form the pairs!!!
Start programming by implementing some line search method
Any programming language is ok
Test your implementation with some optimization problems where you know the minimizer
Topic of the lectures on January 20th & 22nd
Mon, Jan 20th: unconstrained optimization with multiple variables, optimality conditions and methods that don’t utilize gradient information (=direct search methods)
Wed, Jan 22nd: methods that utilize gradient information
Study this before the lecture!
Questions to be considered
– What kinds of optimality conditions exist?
– What kinds of techniques do direct search methods use to find a local minimizer?
– How is gradient information utilized?