
Basic Optimization review


Page 1: Basic Optimization review

The Books

Arora’s text is the main book of the course. It is a very practical guide to optimization for engineering applications. It contains enough theory to explain what is going on, but not more than that.

For people wishing to develop their own optimization software, the best reference is G.N. Vanderplaats: “Numerical Optimization Techniques for Engineering Design”. It focuses entirely on algorithms.

Haftka, Gürdal & Kamat: “Elements of Structural Optimization” has both theory and applications, but is mostly directed towards structures.

Various papers and notes will be available from http://www.ime.auc.dk/~no for download.

Page 2: Basic Optimization review

The Books (cont’d)

The famous book “Numerical Recipes” by Press et al. has been released on the internet at the address http://www.nr.com/. It offers many algorithms that are useful for optimization purposes.

Page 3: Basic Optimization review

Mathematical programming

The problem below is solvable if you have a model, typically a computer model, that can compute the functions gi for given values of the vector x.

Minimize g0(x) ,  x = {x1 , x2 , ... , xn}      (g0: the objective function; x varies over the design space)

Subject to gi(x) ≤ Gi ,  i = 1..m               (as many constraints as necessary)

Page 4: Basic Optimization review

Optimization - graphical interpretation

[Figure: contour plot over the design variables x1 and x2, showing the constraint curves g1(x) = G1 and g2(x) = G2, the feasible domain, the unconstrained optimum, and the constrained optimum.]

Well-posed design problems can have many constraints but never more than one objective!

Page 5: Basic Optimization review

Definitions - Global minimum

Global minimum: f(x*) ≤ f(x) for all x.

We can have many global minima, but they must all be equally good.

Strict global minimum: f(x*) < f(x) for all x.

[Figure: two graphs of f(x), one with a single global minimum and one with several equally good global minima.]

Page 6: Basic Optimization review

Definitions - Local minimum

Local minimum: f(x*) ≤ f(x* + δ) where δ is a small number.

[Figure: graph of f(x) with a local minimum marked at x*.]

Page 7: Basic Optimization review

Existence of the solution - Weierstrass’ theorem

If f(x) is continuous in a closed and limited set, S, then f has a global minimum in S.

This sounds trivial, but it is important. It gives us hope. We know there is something to look for.

It is also not as simple as it sounds. Remember that the set must be closed. This can lead to many strange problems in practical optimization.

Topology optimization is a good example.

Page 8: Basic Optimization review

Definitions - Open and closed sets

A closed set: Lj ≤ xj ≤ Uj

An open set: Lj < xj < Uj

Example: f(x) = -1/x.

If 0 < x ≤ 2, then the interval is open (at x = 0).

If 0 ≤ x ≤ 2, then the interval is closed, but f is undefined for x = 0.
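The trouble with the open interval can be illustrated numerically. This is my own small sketch (not from the slides): on 0 < x ≤ 2, the set is not closed at 0, Weierstrass’ theorem does not apply, and indeed f(x) = -1/x has no minimum there.

```python
# Sampling ever closer to x = 0 keeps producing lower values of f(x) = -1/x,
# so the infimum is never attained -- there is no minimizer on (0, 2].
f = lambda x: -1.0 / x

samples = [f(2.0 / 10**k) for k in range(6)]             # x = 2, 0.2, 0.02, ...
print(all(a > b for a, b in zip(samples, samples[1:])))  # → True: strictly decreasing
```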

Page 9: Basic Optimization review

Optimality conditions - 1-D problems

Necessary condition: f’(x) = 0

Sufficient condition: f’’(x) > 0

This will identify only local optima. For general functions, there are no conditions that will ensure a global optimum.

[Figure: graph of f(x) with several stationary points.]
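The two conditions can be checked numerically with central differences. A minimal sketch (the test function f(x) = (x - 1)² + 0.5 and the candidate point are my own hypothetical choices, not from the slides):

```python
# Numeric check of the 1-D optimality conditions at a candidate point x*.
def d1(f, x, h=1e-6):                    # central first derivative
    return (f(x + h) - f(x - h)) / (2.0 * h)

def d2(f, x, h=1e-4):                    # central second derivative
    return (f(x + h) - 2.0 * f(x) + f(x - h)) / (h * h)

f = lambda x: (x - 1.0)**2 + 0.5         # hypothetical test function
x_star = 1.0                             # its known minimizer

print(abs(d1(f, x_star)) < 1e-6)   # → True: necessary condition f'(x*) = 0
print(d2(f, x_star) > 0.0)         # → True: sufficient condition f''(x*) > 0
```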

Page 10: Basic Optimization review

Optimality conditions - Multi-dimensional problems without constraints

Necessary condition:

∂f/∂xj = 0  for j = 1..n

Sufficient condition: The Hessian

H = [ ∂²f/∂x1²    ∂²f/∂x1∂x2  ...  ∂²f/∂x1∂xn ]
    [             ∂²f/∂x2²    ...  ∂²f/∂x2∂xn ]
    [ symm.                   ...  ∂²f/∂xn²   ]

must be positive definite.

[Figure: contour plot of f over x1 and x2.]
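The sufficient condition can be tested in code. A sketch of mine (the function f(x1, x2) = x1² + x1·x2 + x2², whose exact Hessian at the stationary point (0, 0) is [[2, 1], [1, 2]], is a hypothetical example, not from the slides):

```python
# Build a central-difference Hessian and test positive definiteness
# with Sylvester's criterion (2x2 case: positive leading principal minors).
def hessian(f, x, h=1e-4):
    """Central-difference Hessian of f: R^n -> R."""
    n = len(x)
    H = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            xpp = list(x); xpp[i] += h; xpp[j] += h
            xpm = list(x); xpm[i] += h; xpm[j] -= h
            xmp = list(x); xmp[i] -= h; xmp[j] += h
            xmm = list(x); xmm[i] -= h; xmm[j] -= h
            H[i][j] = (f(xpp) - f(xpm) - f(xmp) + f(xmm)) / (4.0 * h * h)
    return H

def is_positive_definite_2x2(H):
    """Sylvester's criterion for a symmetric 2x2 matrix."""
    return H[0][0] > 0 and H[0][0] * H[1][1] - H[0][1] * H[1][0] > 0

f = lambda x: x[0]**2 + x[0] * x[1] + x[1]**2   # hypothetical test function
H = hessian(f, [0.0, 0.0])                      # at its stationary point
print(is_positive_definite_2x2(H))  # → True: (0, 0) is a local minimum
```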

Page 11: Basic Optimization review

The necessary condition(s) - identify stationary points

When is a stationary point not the solution we are looking for?

• When the stationary point is a maximum.

• When the stationary point is a saddle point.

• When the stationary point is only a local optimum.

[Figure: graph of f(x) showing a maximum, a saddle point and a local minimum.]

Page 12: Basic Optimization review

Convexity - When can we be sure to find an optimum?

A convex function is one that has a positive semi-definite Hessian everywhere (positive definite for strict convexity).

A convex function has only one minimum - the global one.

In 1D, a convex function is one that everywhere has a non-negative second derivative.

[Figure: graph of a convex f(x).]

Page 13: Basic Optimization review

Convexity (cont’d) - In two dimensions

In 2D, a convex function forms a surface that “curves upwards” everywhere.

[Figure: a convex surface next to a non-convex surface.]

Page 14: Basic Optimization review

Convex sets

For a convex set, S, the following is true: for any pair of points (P, Q) belonging to S, the straight line segment connecting P and Q is completely contained in S.

This applies in any number of dimensions.

[Figure: a convex set and a non-convex set, each with points P and Q joined by a line segment.]

Page 15: Basic Optimization review

Convex optimization problems

• If the objective function is convex, and the feasible domain is a convex set, then the optimization problem is convex.

• If all the constraint functions are convex, then the feasible domain is convex.

• Convex optimization problems have only one optimum - the global one. This is very convenient algorithmically. If we have found a stationary point, then we know that it is the global solution. The necessary conditions are also sufficient.

• There are no good algorithms for the treatment of non-convex problems. Most algorithms assume that the problem is convex. Many problems are not, so beware!

• It is usually very difficult to check whether a function is convex. If the function is implicit, then it is practically impossible. A good understanding of the physical nature of the problem is usually very helpful.

• Linear problems are always convex.

Page 16: Basic Optimization review

Optimization algorithms

We shall develop an algorithm for general constrained optimization in multiple dimensions, built up in layers:

Constrained problems in multiple dimensions

Unconstrained problems in multiple dimensions

Algorithms for 1-D minimization

Today, we develop and implement a 1-D algorithm. It is important that you finish the work for every lecture. The algorithm of each new lecture is built on top of the previous one.

Page 17: Basic Optimization review

1D algorithms - Categorization by order

• Golden section search: 0th order

• Bisection: 1st order

• Brent, polynomial interpolation: 2nd order

Page 18: Basic Optimization review

1-D minimization problem - definition

We assume that a function f(x) is given, and we want to find its minimum in the interval [A, B].

We also assume that it is expensive to compute f(x). So we must find the minimum with the least possible number of function evaluations.

The function is implicit - we don’t know what the graph looks like.

[Figure: graph of f(x) on the interval from A to B.]

Page 19: Basic Optimization review

Golden Section Search - a 0th order algorithm

The idea behind golden section is to successively prune the interval of parts that do not contain the minimum.

This way, the remaining interval shrinks until it is so small that we have determined the location of the minimum with sufficient accuracy.

[Figure: graph of f(x) on the interval from A to B.]

Page 20: Basic Optimization review

Golden Section Search - computing function values

It turns out that, if we don’t have gradient information, then we need two function evaluations before we can identify an interval that does not contain the minimum.

[Figure: graph of f(x) with two interior points, α and β, between A and B.]

Page 21: Basic Optimization review

Golden Section Search (cont’d) - pruning

We don’t know the graph of the function, but based on the function values at α and β, and the assumption that we have only one minimum in the interval, we can deduce that the minimum cannot be to the right of β.

So we prune that part.

[Figure: graph of f(x) with the part of the interval to the right of β pruned.]

Page 22: Basic Optimization review

Golden Section Search (cont’d) - development

We could continue like this, pruning the interval until it gets small enough.

We would have to compute two new function values for each pruning.

Is there a way to save some of these function evaluations?

[Figure: graph of f(x) with interior points α and β between A and B.]

Page 23: Basic Optimization review

Golden Section Search (cont’d) - re-using function values

If we position α and β carefully, then we can make sure that the α of one iteration becomes the β of the next, and vice versa.

With a fixed reduction factor τ, the interval lengths satisfy:

I(k+1) = τ I(k)
I(k+2) = τ I(k+1) = τ² I(k)

Re-using a point requires τ² = 1 - τ, i.e. τ² + τ - 1 = 0, which gives

τ = 0.618...   and   1/τ = 1.618...   (The Golden Section)

[Figure: nested intervals with points α(k), β(k), α(k+1), β(k+1), α(k+2) and lengths I(k), I(k+1) = τ I(k), I(k+2) = τ² I(k).]

Page 24: Basic Optimization review

The Golden Section - the idea

• The idea of the golden section dates back to Pythagoras and the later Italian mathematician Fibonacci (1202).

• In the Middle Ages, the golden section became a measure of aesthetics adopted by many humanists and architects. It defines a rhythm of shape that pleases the eye.

• Composition of paintings, sculptures and buildings was done according to the golden section.

• The golden section also played an important role for the cubists in the beginning of the 20th century.

Page 25: Basic Optimization review

Golden section (cont’d) - the algorithm

• Set a search interval, and initialize α and β. Compute f(α) and f(β).

• Choose the right or left interval to prune.

• If the right-hand interval is removed, set β := α and f(β) := f(α). Compute a new α and f(α).

• If the left-hand interval is removed, set α := β and f(α) := f(β). Compute a new β and f(β).

• If the interval is small enough, stop the algorithm and return the happy news.

• Repeat the iteration.
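The steps above can be sketched in Python. This is a minimal version assuming f is unimodal on [A, B]; the function and variable names are mine, not from the slides:

```python
# Golden section search: prune a fixed fraction of [a, b] per iteration,
# re-using one interior function value each time.
def golden_section(f, a, b, tol=1e-6):
    tau = 0.6180339887498949               # golden section: tau^2 = 1 - tau
    alpha = b - tau * (b - a)              # left interior point
    beta = a + tau * (b - a)               # right interior point
    f_alpha, f_beta = f(alpha), f(beta)
    while b - a > tol:
        if f_alpha < f_beta:               # minimum cannot lie right of beta
            b = beta
            beta, f_beta = alpha, f_alpha  # old alpha becomes new beta
            alpha = b - tau * (b - a)
            f_alpha = f(alpha)
        else:                              # minimum cannot lie left of alpha
            a = alpha
            alpha, f_alpha = beta, f_beta  # old beta becomes new alpha
            beta = a + tau * (b - a)
            f_beta = f(beta)
    return 0.5 * (a + b)

x_min = golden_section(lambda x: (x - 2.0)**2, 0.0, 5.0)  # hypothetical test
print(round(x_min, 4))  # → 2.0
```

Note that only one new function evaluation is needed per iteration, which is the whole point of the golden-section spacing.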

Page 26: Basic Optimization review

Golden section (cont’d) - properties

• Each iteration removes 1 - 0.618 ≈ 38% of the interval.

• After n iterations, the interval is reduced to 0.618^n times its original size.

• If n is 10, less than 1% of the original interval remains. If n = 15, less than 1‰ remains.

• The algorithm is rock-solid stable. It removes a certain fraction of the interval each time, and it requires only that the function is unimodal (has one minimum) in the interval.

Page 27: Basic Optimization review

The bisection method - cutting the interval in half

If we have gradient information, then we know to which side of a computed function value the function decreases.

In that case, we can cut the interval in half each time and obtain faster convergence.

[Figure: graph of f(x) with a midpoint α between A and B.]
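A sketch of the idea, assuming the derivative df is available and f is unimodal on [A, B] with f’(A) < 0 < f’(B); the test function f(x) = (x - 2)², with df(x) = 2(x - 2), is my own hypothetical example:

```python
# Bisection on the gradient: keep the half-interval where f' changes sign.
def bisection_min(df, a, b, tol=1e-8):
    while b - a > tol:
        m = 0.5 * (a + b)                  # midpoint
        if df(m) > 0.0:                    # f increasing at m: minimum lies left
            b = m
        else:                              # f decreasing at m: minimum lies right
            a = m
    return 0.5 * (a + b)                   # interval was halved every iteration

x_min = bisection_min(lambda x: 2.0 * (x - 2.0), 0.0, 5.0)
print(round(x_min, 6))  # → 2.0
```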

Page 28: Basic Optimization review

The bisection method - pros and cons

• Faster convergence. After 10 iterations, about 1‰ of the interval is left.

• We need gradient information. It usually requires more computation, but it can sometimes come very cheap. More about that in a later lecture.

• When we rely on gradients, then we also assume that the function is differentiable. Golden section does not have this requirement.

• This method is less robust than golden section.

Page 29: Basic Optimization review

Polynomial Interpolation - fast and delicate like an old sports car

We compute the function values at the end points and at a point in the middle.

We fit a parabola through the three points.

We analytically determine the minimum of the parabola.

We let the new point replace the worst of the previous ones and repeat until convergence.

[Figure: graph of f(x) with a parabola fitted through three points.]
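The analytic step can be written down directly with the standard three-point parabola formula. The helper name and the test function below are my own illustration, not from the slides:

```python
# Minimizer of the parabola through (x1,f1), (x2,f2), (x3,f3).
def parabola_vertex(x1, f1, x2, f2, x3, f3):
    num = (x2 - x1)**2 * (f2 - f3) - (x2 - x3)**2 * (f2 - f1)
    den = (x2 - x1) * (f2 - f3) - (x2 - x3) * (f2 - f1)
    return x2 - 0.5 * num / den

# For f(x) = (x - 2)^2 the very first fit already lands on the minimum,
# because the function itself is a parabola:
f = lambda x: (x - 2.0)**2
x_new = parabola_vertex(0.0, f(0.0), 1.0, f(1.0), 3.0, f(3.0))
print(x_new)  # → 2.0
```

On a general function, each such vertex replaces the worst of the three points and the fit is repeated.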

Page 30: Basic Optimization review

Polynomial Interpolation - pros and cons

• Convergence is very fast if the function behaves nicely.

• No gradients required.

• Only one function evaluation for each new iteration.

• The algorithm is very sensitive to non-convex functions.

• The algorithm requires 2nd order differentiability.

• The algorithm may diverge completely.

Not advisable for use in general algorithms, but very useful for special applications.

Page 31: Basic Optimization review

Unconstrained minimization - in multiple dimensions

x(k+1) = x(k) + α(k) d(k)

• Choose a search direction, d(k)

• Minimize along the search direction (by golden section). Step = α(k) d(k).

• Repeat until convergence

[Figure: contour plot over x1 and x2 with a sequence of search directions.]

Page 32: Basic Optimization review

Choice of direction - Steepest descent

d(k) = -∇f(x(k))

The obvious choice when minimizing a function is to choose the path that goes as much downhill as possible. This algorithm is known as “steepest descent”.

Some people call this type of algorithm “greedy”.

In real life, being greedy is often only profitable in the short term.

This also applies in optimization.

[Figure: contour plot over x1 and x2 with steepest-descent steps.]
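The loop x(k+1) = x(k) + α(k) d(k) with d(k) = -∇f(x(k)) can be sketched compactly. Everything below is my own hypothetical illustration: a quadratic test function f(x1, x2) = x1² + 10·x2² (i.e. f(x) = ½ xᵀA x with A = diag(2, 20)), for which the exact line-search step along d = -g is α = (g·g)/(d·A d):

```python
# Steepest descent with exact line search on a quadratic test function.
A = [[2.0, 0.0], [0.0, 20.0]]                 # Hessian of the test function

def matvec(M, v):
    return [sum(m * vi for m, vi in zip(row, v)) for row in M]

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

x = [5.0, 1.0]
for k in range(200):
    g = matvec(A, x)                          # gradient: A x
    if dot(g, g) == 0.0:                      # stationary point reached
        break
    d = [-gi for gi in g]                     # steepest-descent direction
    alpha = dot(g, g) / dot(d, matvec(A, d))  # exact 1-D minimizer along d
    x = [xi + alpha * di for xi, di in zip(x, d)]

print(all(abs(xi) < 1e-8 for xi in x))  # → True: converged to the origin
```

Because the level curves of this test function are “longish” (axis ratio √10), the iterates zig-zag, which is exactly the behaviour discussed on the next slides.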

Page 33: Basic Optimization review

Steepest descent - Zig-zagging by nature

It is possible to show mathematically that each new direction in steepest descent is perpendicular to the previous one.

This means that the algorithm approaches the optimum using only very few directions.

In 2-D, only two different directions are used.

The steps in each direction tend to get smaller for each iteration.

They may become so small that the algorithm thinks too soon that it has converged.

In any case, convergence can be very slow.

[Figure: contour plot over x1 and x2 showing the zig-zagging steps.]

Page 34: Basic Optimization review

Steepest descent - may work well or terribly

On problems with “similar scales” in the different variable directions, steepest descent often works well.

If the level curves are circular, then the optimum is found in the first chosen direction.

If the level curves are “longish”, then the algorithm typically requires many iterations.

Page 35: Basic Optimization review

The conjugate gradient method - Evening out the zig-zags

The conjugate gradient method can be seen as a way of detecting and eliminating zig-zagging. It also has more subtle mathematical explanations, but we don’t have to worry much about those.

The search direction is computed by the formula:

d(k) = -∇f(x(k)) + β(k) d(k-1)

β(k) = ( ‖∇f(x(k))‖ / ‖∇f(x(k-1))‖ )²

[Figure: contour plot over x1 and x2 comparing the steepest-descent and conjugate-gradient paths.]

Page 36: Basic Optimization review

The conjugate gradient method - Why it works

d(k) = -∇f(x(k)) + β(k) d(k-1)

β(k) = ( ‖∇f(x(k))‖ / ‖∇f(x(k-1))‖ )²

We know that the gradient vanishes at the optimum. This means that if the process is going well, then the gradient gets smaller for each iteration. If this is true, then β(k) is a small number, and we don’t get much correction of the steepest descent direction.

If the gradient does not get smaller, then we need more correction, and this is precisely what we get.

[Figure: contour plot over x1 and x2 showing the steepest-descent component, the correction term, and the point where the gradient is zero.]
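The update formula above (this β is the Fletcher-Reeves choice) can be tried on the same hypothetical quadratic as before, f(x1, x2) = x1² + 10·x2² with gradient A x, A = diag(2, 20). With exact line searches, conjugate gradients minimize an n-dimensional quadratic in at most n steps; here n = 2:

```python
# Conjugate gradient with the squared-gradient-ratio beta on a 2-D quadratic.
A = [[2.0, 0.0], [0.0, 20.0]]                 # Hessian of the test function

def matvec(M, v):
    return [sum(m * vi for m, vi in zip(row, v)) for row in M]

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

x = [5.0, 1.0]
g = matvec(A, x)
d = [-gi for gi in g]                         # first direction: steepest descent
for k in range(2):                            # two directions suffice in 2-D
    alpha = dot(g, g) / dot(d, matvec(A, d))  # exact line search along d
    x = [xi + alpha * di for xi, di in zip(x, d)]
    g_new = matvec(A, x)
    beta = dot(g_new, g_new) / dot(g, g)      # beta(k) = (|grad new|/|grad old|)^2
    d = [-gn + beta * di for gn, di in zip(g_new, d)]
    g = g_new

print(all(abs(xi) < 1e-10 for xi in x))  # → True: exact after n = 2 steps
```

Compare with steepest descent on the same function, which needs many zig-zagging iterations to reach the same accuracy.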

Page 37: Basic Optimization review

Penalty methods - a poor man’s approach to constrained optimization

There is a very easy way of making sure that an optimization process stays within a defined set of constraints: tax any violations.

This is the basis of the so-called penalty methods.

They are also called “transformation methods” because they replace the original constrained problem with an equivalent one without constraints.

The transformed problem can then be solved by an unconstrained algorithm.

Page 38: Basic Optimization review

Penalty methods - basic idea

Minimize f(x)
subject to gi(x) ≤ 0

is converted to

Minimize φ(x) = f(x) + r · P(g(x))

Consider the problem to the left. We want to minimize f(x) provided g1 and g2 are negative.

Golden section would solve this right away, but for the sake of the argument, let us just assume that we cannot impose the constraints.

Instead, we can penalize them.

Page 39: Basic Optimization review

Penalty methods - penalization

So we replace f by a new function, φ, which is constructed so that it increases rapidly when a constraint is violated.

Minimizing φ will almost give us the solution to the original problem.

There are two types of penalization:

- Exterior (a tax)

- Interior (capital punishment)

Page 40: Basic Optimization review

Exterior penalty - the mild form

φ(x) = f(x) + r · Σi=1..m [ max(0, gi(x)) ]²

This penalty does not come into play until a constraint has been violated.

The severity of the penalty depends on the penalty factor, r.

Small values of r will cause constraint violations. Large values will make the problem difficult to solve because the function gets sharp kinks.

The acceptable r values are problem-dependent. It is a good idea to make the functions dimensionless.

Page 41: Basic Optimization review

Exterior penalty - examples

Original problem: Linear objective function and two constraints in two dimensions.

[Figure: contour plot with the optimum marked.]

Page 42: Basic Optimization review

Exterior penalty - example

Penalized problem, r = 0.05.

Notice that the optimum falls quite far from the solution to the original problem.

Page 43: Basic Optimization review

Exterior penalty - examples

Penalized problems, r = 0.1 and r = 1.0.

The optimum approaches the solution to the original problem but never reaches it completely.

The level curves get sharper edges and the problem becomes more difficult to solve numerically.

Page 44: Basic Optimization review

Exterior penalty - properties

• A penalty term is added only after constraint violation.

• The objective function inside the feasible domain is unaffected.

• The pseudo objective function is defined everywhere the original function is. We don’t need a feasible point to get started.

• The solution always falls slightly outside the feasible domain of the original problem. Notice that the original problem may be undefined outside the feasible domain.

• Increasing the penalty brings the solution closer to the solution of the real problem, but it also makes the problem more difficult to solve numerically.

• It handles equality as well as inequality constraints.

Page 45: Basic Optimization review

Interior penalty - capital punishment

φ(x) = f(x) + r · Σi=1..m ( -1 / gi(x) )

This penalty is always present, and it really kicks in when a constraint is approached.

The penalty goes to infinity at the constraint.

The severity of the penalty depends on the penalty factor, r.

Small values of r will cause the penalty to kick in late but suddenly as we approach a constraint.

The penalty is -infinity (!) right outside the constraint.
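A matching 1-D sketch on the same hypothetical problem as before (my own illustration, not from the slides): minimize f(x) = x subject to g(x) = 1 - x ≤ 0, feasible for x ≥ 1. The barrier term -r/g(x) = r/(x - 1) is finite inside the feasible domain and blows up as x approaches the constraint.

```python
# Interior (barrier) penalty: phi(x) = f(x) + r * (-1/g(x)), evaluated
# strictly inside the feasible domain x > 1.
def phi(x, r):
    return x + r / (x - 1.0)                  # defined only for x > 1

def argmin_grid(f, a, b, n=100001):
    """Crude grid minimizer over the interior of the feasible domain."""
    xs = [a + (b - a) * i / (n - 1) for i in range(n)]
    return min(xs, key=f)

results = {r: argmin_grid(lambda x: phi(x, r), 1.00001, 5.0)
           for r in (1.0, 0.01, 0.0001)}
for r, x in results.items():
    # analytic minimizer is x = 1 + sqrt(r): always inside the feasible
    # domain, approaching x* = 1 as r shrinks
    print(r, round(x, 3))
```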

Page 46: Basic Optimization review

Interior penalty - properties

• The penalty is always present.

• The pseudo objective function is undefined at the constraints and goes to minus infinity outside. If the algorithm happens to violate a constraint, then chances are that it will never return to the feasible domain.

• We need a feasible point to start the algorithm.

• The solution always falls slightly inside the feasible domain of the original problem. This means that all solutions are usable.

• Decreasing the penalty factor brings the solution closer to the solution of the real problem, but it also makes the problem more difficult to solve numerically.

• It handles only inequality constraints.

Page 47: Basic Optimization review

Penalty methods - properties in general

• Penalty methods are “cheap and dirty” solutions to constrained optimization.

• They are problem-dependent and may be difficult to apply.

• They are not suitable for general applications, but they may suffice for special purposes.

• The Augmented Lagrangian method is a further development of penalty methods. It is only slightly more complicated, and it does away with many of the problems of interior and exterior penalties.

Page 48: Basic Optimization review

Constrained Nonlinear Optimization

SLP (Sequential Linear Programming)

SQP (Sequential Quadratic Programming)

Method of Feasible Directions

Gradient Projection method