Basic Optimization review


The Books

Arora's text is the main book of the course. It is a very practical guide to optimization for engineering applications. It contains enough theory to explain what is going on, but not more than that.

For people wishing to develop their own optimization software, the best reference is G. N. Vanderplaats: "Numerical Optimization Techniques for Engineering Design". It focuses entirely on algorithms.

Haftka, Gürdal & Kamat: "Elements of Structural Optimization" has both theory and applications, but is mostly directed towards structures.

Various papers and notes will be available from http://www.ime.auc.dk/~no for download.

The Books (cont’d)

The famous book "Numerical Recipes" by Press et al. has been released on the internet at the address http://www.nr.com/. It offers many algorithms that are useful for optimization purposes.

Mathematical programming

The problem below is solvable if you have a model, typically a computer model, that can compute the functions g for given values of the vector x.

$$\text{Minimize } g_0(\mathbf{x}), \qquad \mathbf{x} = \{x_1, x_2, \ldots, x_n\}$$

$$\text{Subject to } g_i(\mathbf{x}) \le G_i, \qquad i = 1, \ldots, m$$

Here g0 is the objective function, the gi form as many constraints as necessary, and x ranges over the design space.
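Once the model can evaluate the g functions, a problem of this form can be handed to a standard solver. Below is a minimal sketch using scipy.optimize; the functions g0 and g1, the bound G1 and the starting point are made-up illustrations, not course code.

```python
import numpy as np
from scipy.optimize import minimize

def g0(x):            # objective function (illustrative)
    return (x[0] - 3)**2 + (x[1] - 2)**2

def g1(x):            # constraint function: we require g1(x) <= G1
    return x[0] + x[1]

G1 = 4.0

res = minimize(
    g0,
    x0=np.zeros(2),   # starting point in the design space
    method="SLSQP",
    # SciPy expects inequality constraints in the form c(x) >= 0,
    # so g1(x) <= G1 becomes G1 - g1(x) >= 0.
    constraints=[{"type": "ineq", "fun": lambda x: G1 - g1(x)}],
)
print(res.x, res.fun)
```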

Optimization - graphical interpretation

[Figure: the design space (x1, x2) with constraint curves g1(x) = G1 and g2(x) = G2 bounding the feasible domain; both the constrained optimum and the unconstrained optimum are marked.]

Well-posed design problems can have many constraints, but never more than one objective!

Definitions - Global minimum

[Figure: f(x) with a single global minimum, and f(x) with several equally good global minima.]

Global minimum: f(x*) ≤ f(x) for all x.

We can have many global minima, but they must all be equally good.

Strict global minimum: f(x*) < f(x) for all x ≠ x*.

Local minimum: f(x*) ≤ f(x* + ε), where ε is a small number.

Definitions - Local minimum

[Figure: f(x) with a local minimum at x*.]

Existence of the solution - Weierstrass' theorem

If f(x) is continuous in a closed and bounded set, S, then f has a global minimum in S.

This sounds trivial, but it is important. It gives us hope. We know there is something to look for.

It is also not as simple as it sounds. Remember that the set must be closed. This can lead to many strange problems in practical optimization.

Topology optimization is a good example.

Definitions - Open and closed sets

A closed set: Lj ≤ xj ≤ Uj

An open set: Lj < xj < Uj

Example: f(x) = -1/x.

If 0 < x ≤ 2, then the interval is open at x = 0, and f has no minimum there: f(x) → -∞ as x → 0.

If 0 ≤ x ≤ 2, then the interval is closed, but f is undefined for x = 0.

Optimality conditions - 1-D problems

Necessary condition:

f'(x) = 0

Sufficient condition:

f''(x) > 0

This will identify only local optima. For general functions, there are no conditions that will ensure a global optimum.
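A quick worked illustration of why both conditions are needed: for $f(x) = x^2$ we get $f'(0) = 0$ and $f''(0) = 2 > 0$, so $x = 0$ is a minimum. For $f(x) = x^3$ we also get $f'(0) = 0$, but $f''(0) = 0$, so the sufficient condition fails; $x = 0$ is an inflection point, not a minimum.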

Optimality conditions - Multi-dimensional problems without constraints

Necessary condition:

$$\frac{\partial f}{\partial x_j} = 0 \quad \text{for } j = 1, \ldots, n$$

Sufficient condition: the Hessian must be positive definite, where

$$\mathbf{H} = \begin{bmatrix}
\dfrac{\partial^2 f}{\partial x_1^2} & \dfrac{\partial^2 f}{\partial x_1 \partial x_2} & \cdots & \dfrac{\partial^2 f}{\partial x_1 \partial x_n} \\
 & \dfrac{\partial^2 f}{\partial x_2^2} & & \vdots \\
\text{symm.} & & \ddots & \\
 & & & \dfrac{\partial^2 f}{\partial x_n^2}
\end{bmatrix}$$
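These conditions can also be checked numerically. Below is a minimal sketch using NumPy and central finite differences; the test function and test point are made-up illustrations.

```python
import numpy as np

def gradient(f, x, h=1e-6):
    """Central-difference approximation of the gradient of f at x."""
    g = np.zeros_like(x)
    for j in range(len(x)):
        e = np.zeros_like(x)
        e[j] = h
        g[j] = (f(x + e) - f(x - e)) / (2 * h)
    return g

def hessian(f, x, h=1e-4):
    """Central-difference approximation of the Hessian of f at x."""
    n = len(x)
    H = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            ei = np.zeros(n); ei[i] = h
            ej = np.zeros(n); ej[j] = h
            H[i, j] = (f(x + ei + ej) - f(x + ei - ej)
                       - f(x - ei + ej) + f(x - ei - ej)) / (4 * h * h)
    return H

f = lambda x: x[0]**2 + 2 * x[1]**2      # illustrative convex function
x = np.array([0.0, 0.0])                 # candidate stationary point

print(np.allclose(gradient(f, x), 0, atol=1e-5))       # necessary condition
print(np.all(np.linalg.eigvalsh(hessian(f, x)) > 0))   # H positive definite
```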

The necessary condition(s) identify stationary points. When is a stationary point not the solution we are looking for?

- When the stationary point is a maximum.
- When the stationary point is a saddle point.
- When the stationary point is a local optimum, but not the global one.

[Figure: contours in (x1, x2) illustrating each case.]

Convexity - When can we be sure to find an optimum?

A convex function is one that has a positive definite Hessian everywhere.

A convex function has only one minimum - the global one.

In 1D, a convex function is one that everywhere has a positive second derivative.

[Figure: a convex function f(x).]

Convexity (cont'd) - In two dimensions

In 2D, a convex function forms a surface that "curves upwards" everywhere.

[Figure: a convex surface and a non-convex surface.]

Convex sets

For a convex set, S, the following is true: for any pair of points, (P, Q), belonging to S, a straight line connecting P and Q will be completely contained in S.

This applies in any number of dimensions.

[Figure: a convex set and a non-convex set, each with two points P and Q connected by a straight line.]

Convex optimization problems

- If the objective function is convex, and the feasible domain is a convex set, then the optimization problem is convex.
- If all the constraint functions are convex, then the feasible domain is convex.
- Convex optimization problems have only one optimum - the global one. This is very convenient algorithmically. If we have found a stationary point, then we know that it is the global solution. The necessary conditions are also sufficient.
- There are no good algorithms for the treatment of non-convex problems. Most algorithms assume that the problem is convex. Many problems are not, so beware!
- It is usually very difficult to check if a function is convex. If the function is implicit, then it is impossible. A good understanding of the physical nature of the problem is usually very helpful.
- Linear problems are always convex.
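Although convexity cannot be proven by sampling, a cheap randomized test can at least expose non-convexity (a sketch; the function and sampling box are made up):

```python
import numpy as np

def find_nonconvexity(f, lo, hi, trials=1000, seed=0):
    """Sample random pairs (p, q) in the box [lo, hi]. Convexity requires
    f((p+q)/2) <= (f(p) + f(q)) / 2 for every pair; one failing pair
    proves non-convexity, while passing all trials proves nothing."""
    rng = np.random.default_rng(seed)
    for _ in range(trials):
        p = rng.uniform(lo, hi)
        q = rng.uniform(lo, hi)
        if f((p + q) / 2) > (f(p) + f(q)) / 2 + 1e-12:
            return p, q           # witness pair: f is not convex
    return None                   # no witness found

f = lambda x: np.sin(x[0]) + x[1]**2         # illustrative, non-convex in x1
print(find_nonconvexity(f, np.array([-3.0, -3.0]), np.array([3.0, 3.0])))
```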

Optimization algorithms

We shall develop an algorithm for general constrained optimization in multiple dimensions. It is built up in layers:

- Constrained problems in multiple dimensions
- Unconstrained problems in multiple dimensions
- Algorithms for 1-D minimization

Today, we develop and implement a 1-D algorithm. It is important that you finish the work for every lecture. The algorithm of each new lecture is built on top of the previous.

1D algorithms - Categorization by order

- Golden section search: 0th order
- Bisection: 1st order
- Brent, polynomial interpolation: 2nd order

1-D minimization problem - definition

We assume that a function f(x) is given, and we want to find its minimum in the interval [A, B].

We also assume that it is expensive to compute f(x). So we must find the minimum with the least possible number of function evaluations.

The function is implicit - we don't know what the graph looks like.

[Figure: an unknown f(x) on the interval [A, B].]

Golden Section Search - a 0th order algorithm

The idea behind golden section is to successively prune the interval for parts that do not contain the minimum.

This way, the remaining interval shrinks until it is so small that we have determined the location of the minimum with sufficient accuracy.

Golden Section Search - computing function values

It turns out that, if we don't have gradient information, then we need two function evaluations before we can identify an interval that does not contain the minimum.

[Figure: interval [A, B] with two interior points α and β.]

Golden Section Search (cont'd) - pruning

We don't know the graph of the function, but based on the function values at α and β, and the assumption that we have only one minimum in the interval, we can deduce that the minimum cannot be to the right of β.

So we prune that part.

Golden Section Search (cont'd) - development

We could continue like this, pruning the interval until it gets small enough.

We would have to compute two new function values for each pruning.

Is there a way to save some of these function evaluations?

Golden Section Search (cont'd) - re-using function values

If we position α and β carefully, then we can make sure that the α of one iteration becomes the β of the next and vice versa.

The interval shrinks by a fixed factor τ in every iteration:

$$I^{(k+1)} = \tau I^{(k)}, \qquad I^{(k+2)} = \tau I^{(k+1)} = \tau^2 I^{(k)}$$

Re-using one of the points requires

$$\tau^2 = 1 - \tau \;\Rightarrow\; \tau^2 + \tau - 1 = 0 \;\Rightarrow\; \tau = \frac{\sqrt{5} - 1}{2} \approx 0.618, \qquad \frac{1}{\tau} \approx 1.618$$

This ratio is the Golden Section.

[Figure: nested intervals with points α(k), β(k), α(k+1), β(k+1), α(k+2); I(k+1) = τ I(k) and I(k+2) = τ² I(k).]

The Golden Section - the idea

- The idea of the golden section dates back to Pythagoras and, later, the Italian mathematician Fibonacci (1202).
- In the Middle Ages, the golden section became a measure of aesthetics adopted by many humanists and architects. It defines a rhythm of shape that pleases the eye.
- Composition of paintings, sculptures and buildings was done according to the golden section.
- The golden section also played an important role for the cubists in the beginning of the 20th century.

Golden section (cont'd) - the algorithm

- Set a search interval, and initialize α and β. Compute f(α) and f(β).
- Choose the right or left interval to prune.
- If the right-hand interval is removed, set β := α and f(β) := f(α). Compute a new α and f(α).
- If the left-hand interval is removed, set α := β and f(α) := f(β). Compute a new β and f(β).
- If the interval is small enough, stop the algorithm and return the happy news.
- Repeat the iteration.

A Python sketch of these steps follows below.
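A minimal implementation of the algorithm above (a sketch; the test function at the bottom is made up):

```python
import math

TAU = (math.sqrt(5) - 1) / 2        # tau = 0.618..., the golden section

def golden_section(f, a, b, tol=1e-6):
    """Minimize a unimodal f on [a, b] with one f-evaluation per iteration."""
    alpha = b - TAU * (b - a)       # left interior point
    beta = a + TAU * (b - a)        # right interior point
    f_alpha, f_beta = f(alpha), f(beta)
    while b - a > tol:
        if f_alpha < f_beta:        # minimum cannot lie right of beta: prune
            b = beta
            beta, f_beta = alpha, f_alpha     # old alpha becomes new beta
            alpha = b - TAU * (b - a)
            f_alpha = f(alpha)
        else:                       # minimum cannot lie left of alpha: prune
            a = alpha
            alpha, f_alpha = beta, f_beta     # old beta becomes new alpha
            beta = a + TAU * (b - a)
            f_beta = f(beta)
    return (a + b) / 2

print(golden_section(lambda x: (x - 1.3)**2, 0.0, 3.0))   # ~1.3
```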

Golden section (cont'd) - properties

- Each iteration removes 1 - 0.618 ≈ 38% of the interval.
- After n iterations, the interval is reduced to 0.618^n times its original size.
- If n is 10, less than 1% of the original interval remains. If n = 15, less than 1‰ remains.
- The algorithm is rock-solid stable. It removes a certain fraction of the interval each time, and it requires only that the function is unimodal (has one minimum) in the interval.
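The quoted reductions are easy to verify:

$$0.618^{10} \approx 0.0081 < 1\%, \qquad 0.618^{15} \approx 0.00073 < 1\text{‰}$$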

The bisection method - cutting the interval in half

If we have gradient information, then we know to which side of a computed function value the function decreases.

In that case, we can cut the interval in half each time and obtain faster convergence.

The bisection method - pros and cons

- Faster convergence. After 10 iterations, about 1‰ of the interval is left.
- We need gradient information. It usually requires more computation, but it can sometimes come very cheap. More about that in a later lecture.
- When we rely on gradients, then we also assume that the function is differentiable. Golden section does not have this requirement.
- This method is less robust than golden section.
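A sketch of the method, written in terms of the derivative f'. The usage line assumes f(x) = (x - 1.3)², so f'(x) = 2(x - 1.3):

```python
def bisection(df, a, b, tol=1e-6):
    """Minimize on [a, b] given the derivative df, assuming the minimum
    is interior, i.e. df(a) < 0 < df(b). Halves the interval each time."""
    while b - a > tol:
        mid = (a + b) / 2
        if df(mid) > 0:     # f increases at mid: the minimum lies to the left
            b = mid
        else:               # f decreases at mid: the minimum lies to the right
            a = mid
    return (a + b) / 2

print(bisection(lambda x: 2 * (x - 1.3), 0.0, 3.0))   # ~1.3
```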

Polynomial Interpolation - fast and delicate like an old sports car

We compute the function values at the end points and a point in the middle.

We fit a parabola through the three points.

We analytically determine the minimum of the parabola.

We let the new point replace the worst of the previous ones and repeat until convergence.

Polynomial Interpolation - pros and cons

- Convergence is very fast if the function behaves nicely.
- No gradients required.
- Only one function evaluation for each new iteration.
- The algorithm is very sensitive to non-convex functions.
- The algorithm requires 2nd order differentiability.
- The algorithm may diverge completely.

Not advisable for use in general algorithms, but very useful for special applications.
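The core of the method is the vertex of the parabola through three points; the iteration then replaces the worst point with this vertex. A sketch (this is the standard formula for successive parabolic interpolation):

```python
def parabola_vertex(x1, f1, x2, f2, x3, f3):
    """x-coordinate of the vertex of the parabola through (x1, f1),
    (x2, f2), (x3, f3). Note: the denominator tends to zero when the
    points are nearly collinear - one reason the method can diverge."""
    num = (x2 - x1)**2 * (f2 - f3) - (x2 - x3)**2 * (f2 - f1)
    den = (x2 - x1) * (f2 - f3) - (x2 - x3) * (f2 - f1)
    return x2 - 0.5 * num / den

# Sanity check on f(x) = x^2: the vertex should be at 0
print(parabola_vertex(0.0, 0.0, 1.0, 1.0, 3.0, 9.0))   # 0.0
```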

Unconstrained minimization - in multiple dimensions

- Choose a search direction, d(k).
- Minimize along the search direction (by golden section). Step = α(k) d(k).
- Repeat until convergence.

$$\mathbf{x}^{(k+1)} = \mathbf{x}^{(k)} + \alpha^{(k)} \mathbf{d}^{(k)}$$

A sketch of this loop follows below.
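The loop can be written once and reused for any direction rule (a sketch; it reuses golden_section from the earlier 1-D sketch, and the line-search bounds [0, 1] are an assumption that may need rescaling per problem):

```python
import numpy as np

def descent(f, grad_f, x0, direction, tol=1e-6, max_iter=200):
    """Generic descent: choose a direction, line-search along it, repeat."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        d = direction(grad_f(x))                       # search direction d(k)
        alpha = golden_section(lambda a: f(x + a * d), 0.0, 1.0)
        x_new = x + alpha * d                          # step = alpha(k) d(k)
        if np.linalg.norm(x_new - x) < tol:            # convergence check
            return x_new
        x = x_new
    return x

# Steepest descent (next slide) is the special case d = -grad f:
f = lambda x: x[0]**2 + 10 * x[1]**2
grad_f = lambda x: np.array([2 * x[0], 20 * x[1]])
print(descent(f, grad_f, [2.0, 1.0], direction=lambda g: -g))
```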

Choice of direction - Steepest descent

$$\mathbf{d}^{(k)} = -\nabla f$$

The obvious choice when minimizing a function is to choose the path that goes as much downhill as possible. This algorithm is known as "steepest descent".

Some people call this type of algorithm "greedy".

In real life, being greedy is often only profitable in the short term.

This also applies in optimization.

Steepest descent - Zig-zagging by nature

It is possible to show mathematically that each new direction in steepest descent is perpendicular to the previous one.

This means that the algorithm approaches the optimum using only very few directions.

In 2-D, only two different directions are used.

The steps in each direction tend to get smaller for each iteration.

They may become so small that the algorithm thinks too soon that it has converged.

In any case, convergence can be very slow.

Steepest descent - may work well or terribly

On problems with "similar scales" in the different variable directions, steepest descent often works well.

If the level curves are circular, then the optimum is found in the first chosen direction.

If the level curves are "longish", then the algorithm typically requires many iterations.

The conjugate gradient method - Evening out the zig-zags

The conjugate gradient method can be seen as a way of detecting and eliminating zig-zagging. It also has more subtle mathematical explanations, but we don't have to worry much about those.

The search direction is computed by the formula:

$$\mathbf{d}^{(k)} = -\nabla f(\mathbf{x}^{(k)}) + \beta^{(k)} \mathbf{d}^{(k-1)}, \qquad \beta^{(k)} = \left( \frac{\|\nabla f(\mathbf{x}^{(k)})\|}{\|\nabla f(\mathbf{x}^{(k-1)})\|} \right)^2$$

The conjugate gradient method - Why it works

$$\mathbf{d}^{(k)} = -\nabla f(\mathbf{x}^{(k)}) + \beta^{(k)} \mathbf{d}^{(k-1)}, \qquad \beta^{(k)} = \left( \frac{\|\nabla f(\mathbf{x}^{(k)})\|}{\|\nabla f(\mathbf{x}^{(k-1)})\|} \right)^2$$

We know that the gradient vanishes at the optimum. This means that if the process is going well, then the gradient gets smaller for each iteration. If this is true, then β(k) is a small number, and we don't get much correction of the steepest descent direction.

If the gradient does not get smaller, then we need more correction, and this is precisely what we get.

[Figure: the new direction as the steepest descent direction plus the correction β(k) d(k-1); the gradient is zero at the optimum.]
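Plugging this direction rule into the descent loop gives a compact method. A sketch: this β is the Fletcher-Reeves choice, which matches the formula above, and golden_section is reused from the earlier sketch, with the same assumed line-search bounds.

```python
import numpy as np

def conjugate_gradient(f, grad_f, x0, tol=1e-8, max_iter=100):
    """Conjugate gradient with a golden-section line search."""
    x = np.asarray(x0, dtype=float)
    g = grad_f(x)
    d = -g                                    # first step: steepest descent
    for _ in range(max_iter):
        alpha = golden_section(lambda a: f(x + a * d), 0.0, 1.0)
        x = x + alpha * d
        g_new = grad_f(x)
        if np.linalg.norm(g_new) < tol:
            break
        beta = g_new.dot(g_new) / g.dot(g)    # (|grad_k| / |grad_{k-1}|)^2
        d = -g_new + beta * d                 # correction evens out zig-zags
        g = g_new
    return x

f = lambda x: x[0]**2 + 10 * x[1]**2
grad_f = lambda x: np.array([2 * x[0], 20 * x[1]])
print(conjugate_gradient(f, grad_f, [2.0, 1.0]))
```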

Penalty methods - a poor man's approach to constrained optimization

There is a very easy way of making sure that an optimization process stays within a defined set of constraints: tax any violations.

This is the basis of the so-called penalty methods.

They are also called "transformation methods" because they replace the original constrained problem with an equivalent one without constraints.

The transformed problem can then be solved by an unconstrained algorithm.

$$\text{Minimize } f(\mathbf{x}) \text{ subject to } g_i(\mathbf{x}) \le 0$$

is converted to

$$\text{Minimize } \phi(\mathbf{x}) = f(\mathbf{x}) + r \cdot P\left(g(\mathbf{x})\right)$$

Penalty methods - basic idea

Consider the problem to the left. We want to minimize F(x) provided g1 and g2 are negative.

Golden section would solve this right away, but for the sake of the argument, let us just assume that we cannot impose the constraints.

Instead, we can penalize them.

Penalty methods - penalization

So we replace F by a new function, φ, which is constructed so that it increases rapidly when a constraint is violated.

Minimizing φ will almost give us the solution to the original problem.

There are two types of penalization:

- Exterior (a tax)
- Interior (capital punishment)

Exterior penalty - the mild form

$$\phi(\mathbf{x}) = f(\mathbf{x}) + r \sum_{i=1}^{m} \left[ \max\left(0,\, g_i(\mathbf{x})\right) \right]^2$$

This penalty does not come into play until a constraint has been violated.

The severity of the penalty depends on the penalty factor, r.

Small values of r will cause constraint violations. Large values will make the problem difficult to solve because the function gets sharp kinks.

The acceptable r values are problem-dependent. It is a good idea to make the functions dimensionless.
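The pseudo objective is easy to construct programmatically (a sketch; the example problem and the value of r are made up):

```python
def exterior_penalty(f, gs, r):
    """phi(x) = f(x) + r * sum(max(0, g_i(x))^2): inactive while feasible."""
    def phi(x):
        return f(x) + r * sum(max(0.0, g(x))**2 for g in gs)
    return phi

# Illustrative: minimize x^2 subject to g1(x) = 1 - x <= 0 (i.e. x >= 1).
f = lambda x: x**2
g1 = lambda x: 1.0 - x
phi = exterior_penalty(f, [g1], r=10.0)
# Minimizing phi with any unconstrained method gives x = 10/11 ~ 0.909:
# slightly infeasible, approaching x = 1 as r grows.
```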

Exterior penalty - examples

[Figure: the original problem: linear objective function and two constraints in two dimensions, with the optimum marked.]

[Figure: the penalized problem with r = 0.05.]

Notice that the optimum falls quite far from the solution to the original problem.

[Figure: the penalized problems with r = 0.1 and r = 1.0.]

The optimum approaches the solution to the original problem but never reaches it completely.

The level curves get sharper edges and the problem becomes more difficult to solve numerically.

Exterior penalty - properties

- A penalty term is added only after constraint violation.
- The objective function inside the feasible domain is unaffected.
- The pseudo objective function is defined everywhere the original function is. We don't need a feasible point to get started.
- The solution always falls slightly outside the feasible domain of the original problem. Notice that the original problem may be undefined outside the feasible domain.
- Increasing the penalty brings the solution closer to the solution of the real problem, but it also makes the problem more difficult to solve numerically.
- It handles equality as well as inequality constraints.

Interior penalty - capital punishment

$$\phi(\mathbf{x}) = f(\mathbf{x}) + r \sum_{i=1}^{m} \left( \frac{-1}{g_i(\mathbf{x})} \right)$$

This penalty is always present, and it really kicks in when a constraint is approached.

The penalty goes to infinity at the constraint.

The severity of the penalty depends on the penalty factor, r.

Small values of r will cause the penalty to kick in late but suddenly as we approach a constraint.

The penalty is -infinity (!) right outside the constraint.
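The barrier version of the same sketch, using the same illustrative problem as the exterior penalty example; note that it must be started strictly inside the feasible domain:

```python
def interior_penalty(f, gs, r):
    """phi(x) = f(x) + r * sum(-1 / g_i(x)): a barrier that goes to
    infinity as any g_i(x) -> 0 from the feasible side (g_i < 0)."""
    def phi(x):
        return f(x) + r * sum(-1.0 / g(x) for g in gs)
    return phi

# Illustrative: minimize x^2 subject to g1(x) = 1 - x <= 0 (i.e. x >= 1).
f = lambda x: x**2
g1 = lambda x: 1.0 - x
phi = interior_penalty(f, [g1], r=0.1)
# Minimize phi from a strictly feasible start, e.g. x = 2; the solution
# stays slightly inside the feasible domain and tends to x = 1 as r -> 0.
```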

Interior penalty - properties

- The penalty is always present.
- The pseudo objective function is undefined at the constraints and goes to infinity outside. If the algorithm happens to violate a constraint, then chances are that it will never return to the feasible domain.
- We need a feasible point to start the algorithm.
- The solution always falls slightly inside the feasible domain of the original problem. This means that all solutions are usable.
- Increasing the penalty brings the solution closer to the solution of the real problem, but it also makes the problem more difficult to solve numerically.
- It handles only inequality constraints.

Penalty methods - properties in general

- Penalty methods are "cheap and dirty" solutions to constrained optimization.
- They are problem-dependent and may be difficult to apply.
- They are not suitable for general applications, but they may suffice for special purposes.
- The Augmented Lagrangian method is a further development of penalty methods. It is only slightly more complicated, and it does away with many of the problems of interior and exterior penalties.

Constrained Nonlinear Optimization

- SLP (Sequential Linear Programming)
- SQP (Sequential Quadratic Programming)
- Method of Feasible Directions
- Gradient Projection method