Block 4 Nonlinear Systems Lesson 12 – Optimization
academic.udayton.edu/charlesebeling/MSC521/PDF_PPT Files/Classical...

Classical Optimization

Chapter 20

A decision without optimization is like an arch without a keystone

- olde English saying circa 1886

The World Is Not Linear!

Classical Optimization

Uses differential calculus to determine points of maxima and minima (extrema).

The underlying theory provides the basis for most nonlinear programming algorithms.

Objective – develop necessary and sufficient conditions for determining unconstrained extrema.

The General Optimization Problem

$$
\text{Max/Min } f(x_1, x_2, \ldots, x_n)
$$

subject to:

$$
g_i(x_1, x_2, \ldots, x_n) \; \begin{Bmatrix} \le \\ = \\ \ge \end{Bmatrix} \; b_i, \qquad i = 1, 2, \ldots, m
$$

where $f, g_1, \ldots, g_m$ are real-valued functions

Mathematical Programming

Linear, Integer, Nonlinear, Dynamic

Nonlinear: unconstrained, constrained

Unconstrained: single-variable, multi-variable

Constrained: equality constraints (Lagrangian multipliers), inequality constraints (Karush-Kuhn-Tucker conditions)

I am quite interested in these nonlinear programs. Can you tell me more?

The Unconstrained problem

Local Extrema

x0 is a local minimum (maximum) if, for an arbitrarily small neighborhood N about x0, f(x0) ≤ (≥) f(x) for all x in N.

[Figure: f(x) plotted against x, with a neighborhood N drawn about each local extremum x0]

Global Extrema

x* is a global min (max) if f(x*) ≤ (≥) f(x) for all x such that a ≤ x ≤ b.

[Figure: f(x) plotted against x on the interval from a to b, marking x*, the global min, and the global max]

The Problem – finding the global

[Figure: f(x) plotted against x on a closed interval from a to b, marking a local max, a local min, the global min, the global max, and an unbounded case]

[Figure: three stacked plots against x of f(x), df(x)/dx, and d²f(x)/dx², marking the sign of the derivative (+, -, +), the concave and convex regions, and the stationary points]

All you wanted to know about Inflection Points

f(x) changes from concave to convex (or convex to concave)

f'(x) achieves a maximum or minimum; f'(x) may be zero

f''(x) = 0 and f''(x) changes sign: f'(x) goes from a decreasing to an increasing function (or vice versa)

f'''(x) ≠ 0

counter example: $f(x) = x^4;\; f'(x) = 4x^3;\; f''(x) = 12x^2;\; f'''(x) = 24x$. Here f''(0) = 0, but f''(x) does not change sign and f'''(0) = 0, so x = 0 is not an inflection point.

example: $f(x) = x^3;\; f'(x) = 3x^2;\; f''(x) = 6x;\; f'''(x) = 6$, inflection point at x = 0
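
The sign-change condition on f''(x) can be checked numerically. The sketch below is only an illustration: the helper name `second_derivative_changes_sign` and the step size `h` are assumptions, not part of the lesson. It evaluates the second derivatives of the two functions above just to the left and right of x = 0.

```python
def second_derivative_changes_sign(f2, x0, h=1e-3):
    """Return True if f''(x) has opposite signs just left and right of x0.

    f2 is the second derivative, supplied by the caller; h is an assumed
    small step used to sample each side of x0.
    """
    left, right = f2(x0 - h), f2(x0 + h)
    return left * right < 0


# f(x) = x^3 has f''(x) = 6x: the sign flips at x = 0 (inflection point).
cubic_flips = second_derivative_changes_sign(lambda x: 6 * x, 0.0)

# f(x) = x^4 has f''(x) = 12x^2: f''(0) = 0 but the sign never flips.
quartic_flips = second_derivative_changes_sign(lambda x: 12 * x ** 2, 0.0)

print(cubic_flips, quartic_flips)  # True False
```

Sampling on both sides is a heuristic; it can miss a sign change that happens on a scale smaller than `h`.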

2-Variable Function with a Maximum

z = f(x,y)

$$
\frac{\partial f(x,y)}{\partial x} = 0, \qquad \frac{\partial f(x,y)}{\partial y} = 0
$$

2-Variable Function with both Maxima and Minima

z = f(x,y)

$$
\frac{\partial f(x,y)}{\partial x} = 0, \qquad \frac{\partial f(x,y)}{\partial y} = 0
$$

2-Variable Function with a Saddle Point

z = f(x,y)

$$
\frac{\partial f(x,y)}{\partial x} = 0, \qquad \frac{\partial f(x,y)}{\partial y} = 0
$$

Some Math Background – a digression

The Gradient
The Hessian
Quadratic Forms
Taylor Series Expansion

The Gradient – vector of first partials

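The gradient vector of the scalar-valued function f(X) at the point X = X0 is defined as

$$
\nabla f(\mathbf{X}_0) =
\begin{pmatrix}
\dfrac{\partial f(\mathbf{X}_0)}{\partial x_1} \\
\dfrac{\partial f(\mathbf{X}_0)}{\partial x_2} \\
\vdots \\
\dfrac{\partial f(\mathbf{X}_0)}{\partial x_n}
\end{pmatrix}
$$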

The Hessian – matrix of second partials

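$$
H(\mathbf{X}^*) = \nabla^2 f(\mathbf{X}^*) =
\begin{bmatrix}
\dfrac{\partial^2 f(\mathbf{X}^*)}{\partial x_1^2} & \dfrac{\partial^2 f(\mathbf{X}^*)}{\partial x_1 \partial x_2} & \cdots & \dfrac{\partial^2 f(\mathbf{X}^*)}{\partial x_1 \partial x_n} \\
\dfrac{\partial^2 f(\mathbf{X}^*)}{\partial x_2 \partial x_1} & \dfrac{\partial^2 f(\mathbf{X}^*)}{\partial x_2^2} & \cdots & \dfrac{\partial^2 f(\mathbf{X}^*)}{\partial x_2 \partial x_n} \\
\vdots & \vdots & \ddots & \vdots \\
\dfrac{\partial^2 f(\mathbf{X}^*)}{\partial x_n \partial x_1} & \dfrac{\partial^2 f(\mathbf{X}^*)}{\partial x_n \partial x_2} & \cdots & \dfrac{\partial^2 f(\mathbf{X}^*)}{\partial x_n^2}
\end{bmatrix}
$$

$$
H = \{ h_{ij} \}, \qquad h_{ij} = \frac{\partial^2 f(x_1, \ldots, x_n)}{\partial x_i \, \partial x_j}
$$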
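
When partial derivatives are tedious to write out by hand, the gradient and Hessian can be estimated with central differences. The following is a minimal sketch, not a method from the lesson: the function names `gradient` and `hessian` and the step sizes are assumptions chosen for illustration. The test function is the three-variable quadratic form used as an example in this lesson, whose Hessian is constant.

```python
def gradient(f, x, h=1e-5):
    """Central-difference estimate of the gradient of f at the point x (a list)."""
    g = []
    for i in range(len(x)):
        xp, xm = list(x), list(x)
        xp[i] += h
        xm[i] -= h
        g.append((f(xp) - f(xm)) / (2 * h))
    return g


def hessian(f, x, h=1e-4):
    """Central-difference estimate of the matrix of second partials of f at x."""
    n = len(x)
    H = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            def shifted(di, dj):
                y = list(x)
                y[i] += di
                y[j] += dj
                return f(y)
            H[i][j] = (shifted(h, h) - shifted(h, -h)
                       - shifted(-h, h) + shifted(-h, -h)) / (4 * h * h)
    return H


def q(x):
    # q(x1, x2, x3) = 3 x1^2 + 4 x1 x2 - x1 x3 + 5 x2^2 + 7 x3^2
    x1, x2, x3 = x
    return 3 * x1 ** 2 + 4 * x1 * x2 - x1 * x3 + 5 * x2 ** 2 + 7 * x3 ** 2


g = gradient(q, [1.0, 1.0, 1.0])   # analytically (9, 14, 13)
H = hessian(q, [1.0, 1.0, 1.0])    # analytically [[6, 4, -1], [4, 10, 0], [-1, 0, 14]]
print([round(v, 4) for v in g])
print([[round(v, 3) for v in row] for row in H])
```

For a quadratic function the central differences are exact up to rounding error; for general functions the estimates depend on the step size.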

Quadratic Forms

A quadratic form is a scalar function defined for all $x \in E^n$ that takes the form:

$$
q(x) = \sum_{i=1}^{n} \sum_{j=1}^{n} a_{ij} x_i x_j
$$

where $a_{ij}$ is a real number (possibly zero). q(x) is a quadratic function that may be written in matrix-vector form: $q(x) = x^t A x$

Our very first example of a quadratic form:

$$
q(x_1, x_2, x_3) = 3x_1^2 + 4x_1 x_2 - x_1 x_3 + 5x_2^2 + 7x_3^2
= (x_1, x_2, x_3)
\begin{bmatrix} 3 & 2 & -.5 \\ 2 & 5 & 0 \\ -.5 & 0 & 7 \end{bmatrix}
\begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix}
$$

That is a very fine example of a quadratic form.

Quadratic Forms

$q(x) = x^T A x$ is called a quadratic form.

A matrix A is positive definite if and only if $x^T A x > 0$ for all vectors x ≠ 0.

A matrix A is negative definite if and only if $x^T A x < 0$ for all vectors x ≠ 0.

A matrix A is indefinite if $x^T A x > 0$ for some x and $x^T A x < 0$ for others.

Properties of Quadratic Forms

Test for Definiteness

What is a principal minor?

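The kth leading principal minor of the symmetric matrix A is the determinant, denoted $M_k$, of the submatrix formed by deleting the last n − k rows and columns of A. A is positive definite if and only if $M_k > 0$ for every k = 1, ..., n, and negative definite if and only if $(-1)^k M_k > 0$ for every k, that is, the minors alternate in sign starting with $M_1 < 0$.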

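The leading principal minors

$$
A = \begin{bmatrix} 3 & 2 & -.5 \\ 2 & 5 & 0 \\ -.5 & 0 & 7 \end{bmatrix}
$$

$$
\begin{aligned}
M_1 &= |3| = 3 > 0 \\
M_2 &= \begin{vmatrix} 3 & 2 \\ 2 & 5 \end{vmatrix} = 15 - 4 = 11 > 0 \\
M_3 &= \begin{vmatrix} 3 & 2 & -.5 \\ 2 & 5 & 0 \\ -.5 & 0 & 7 \end{vmatrix} = 75.75 > 0
\end{aligned}
$$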
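
The minors of this matrix can be reproduced with a few lines of code. This is a small sketch under stated assumptions: the helper names `det`, `leading_principal_minors`, and `definiteness` are illustrative, and the determinant uses cofactor expansion, which is only sensible for small matrices like the ones in this lesson.

```python
def det(M):
    """Determinant by cofactor expansion along the first row (small matrices only)."""
    if len(M) == 1:
        return M[0][0]
    total = 0
    for j, a in enumerate(M[0]):
        minor = [row[:j] + row[j + 1:] for row in M[1:]]
        total += (-1) ** j * a * det(minor)
    return total


def leading_principal_minors(A):
    """Return [M_1, ..., M_n], the determinants of the top-left k-by-k blocks."""
    return [det([row[:k] for row in A[:k]]) for k in range(1, len(A) + 1)]


def definiteness(A):
    """Classify a symmetric matrix using the signs of its leading principal minors."""
    minors = leading_principal_minors(A)
    if all(m > 0 for m in minors):
        return "positive definite"
    if all((-1) ** k * m > 0 for k, m in enumerate(minors, start=1)):
        return "negative definite"
    return "not definite by this test"


A = [[3, 2, -0.5],
     [2, 5, 0],
     [-0.5, 0, 7]]

print(leading_principal_minors(A))  # M1 = 3, M2 = 11, M3 = 75.75
print(definiteness(A))              # positive definite
```

The last branch deliberately does not distinguish semidefinite from indefinite matrices, since the leading-minor test alone cannot do so.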

Taylor Series – one variable

A Taylor series is a representation or approximation of a function as a sum of terms calculated from the values of its derivatives at a single point. Specifically, the Taylor series of an infinitely differentiable real function f, defined on an open interval (a − r, a + r), is the power series

Taylor Series Expansion

$$
f(x) = f(a) + f'(a)(x - a) + \frac{1}{2!} f''(a)(x - a)^2 + \frac{1}{3!} f'''(a)(x - a)^3 + \text{higher order terms}
$$


If “a” is a stationary point, then f'(a) = 0 and the first-order term drops out. For x “close” to “a”, the terms of third order and higher are negligible, so

$$
f(x) - f(a) \approx \frac{1}{2} f''(a)(x - a)^2
\begin{cases}
> 0 & \text{if } f''(a) > 0 \quad \text{(minimum)} \\
< 0 & \text{if } f''(a) < 0 \quad \text{(maximum)} \\
= 0 & \text{if } f''(a) = 0 \quad \text{(look at higher order derivatives)}
\end{cases}
$$

Taylor Series in 2 variables

A second order approximation of f(x,y) in the neighborhood about the point (x0,y0):

$$
\begin{aligned}
f(x,y) ={}& f(x_0,y_0) + f_x(x_0,y_0)(x - x_0) + f_y(x_0,y_0)(y - y_0) \\
&+ \tfrac{1}{2} f_{xx}(x_0,y_0)(x - x_0)^2 + f_{xy}(x_0,y_0)(x - x_0)(y - y_0) \\
&+ \tfrac{1}{2} f_{yy}(x_0,y_0)(y - y_0)^2 + \text{higher order terms}
\end{aligned}
$$

Taylor’s Series Approximation in 2-variables in matrix-vector form

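$$
\begin{aligned}
f(x,y) ={}& f(x_0,y_0)
+ \underbrace{\begin{pmatrix} f_x(x_0,y_0) & f_y(x_0,y_0) \end{pmatrix}}_{\text{gradient}}
\begin{pmatrix} x - x_0 \\ y - y_0 \end{pmatrix} \\
&+ \frac{1}{2} \begin{pmatrix} x - x_0 & y - y_0 \end{pmatrix}
\underbrace{\begin{bmatrix} f_{xx}(x_0,y_0) & f_{xy}(x_0,y_0) \\ f_{yx}(x_0,y_0) & f_{yy}(x_0,y_0) \end{bmatrix}}_{\text{Hessian}}
\begin{pmatrix} x - x_0 \\ y - y_0 \end{pmatrix}
+ \text{higher order terms}
\end{aligned}
$$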

Putting it all together - Matrix-Vector Representation of Taylor Series

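$$
f(\mathbf{X}) \approx f(\mathbf{X}_0) + \nabla f(\mathbf{X}_0)^t (\mathbf{X} - \mathbf{X}_0)
+ \frac{1}{2} (\mathbf{X} - \mathbf{X}_0)^t H(\mathbf{X}_0) (\mathbf{X} - \mathbf{X}_0)
$$

If “X0” is a stationary point, then:

$$
f(\mathbf{X}) - f(\mathbf{X}_0) \approx \frac{1}{2} (\mathbf{X} - \mathbf{X}_0)^t H(\mathbf{X}_0) (\mathbf{X} - \mathbf{X}_0)
$$

$$
\begin{aligned}
f(\mathbf{X}) - f(\mathbf{X}_0) &> 0 \quad \text{if } H(\mathbf{X}_0) \text{ is pos. def.} \\
f(\mathbf{X}) - f(\mathbf{X}_0) &< 0 \quad \text{if } H(\mathbf{X}_0) \text{ is neg. def.}
\end{aligned}
$$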
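
To see how good the second-order approximation is near the expansion point, the matrix-vector form can be evaluated directly. The sketch below is illustrative only: the function f(x, y) = exp(x) cos(y) is an assumed example that does not appear in the lesson, and its gradient and Hessian at X0 = (0, 0) were worked out by hand for this sketch.

```python
import math


def quadratic_model(f0, grad, hess, x0, x):
    """Second-order Taylor model f(X0) + grad^t d + 0.5 d^t H d, with d = X - X0."""
    d = [xi - x0i for xi, x0i in zip(x, x0)]
    n = len(d)
    linear = sum(grad[i] * d[i] for i in range(n))
    quadratic = sum(d[i] * hess[i][j] * d[j] for i in range(n) for j in range(n))
    return f0 + linear + 0.5 * quadratic


def f(x, y):
    # Assumed illustrative function, not from the lesson.
    return math.exp(x) * math.cos(y)


X0 = (0.0, 0.0)
f0 = 1.0                            # f(0, 0)
grad = [1.0, 0.0]                   # (f_x, f_y) at (0, 0)
hess = [[1.0, 0.0], [0.0, -1.0]]    # [[f_xx, f_xy], [f_yx, f_yy]] at (0, 0)

for step in (0.1, 0.01):
    X = (step, step)
    error = abs(f(*X) - quadratic_model(f0, grad, hess, X0, X))
    print(step, error)
```

The error shrinks roughly with the cube of the step, which is the behavior expected when only the higher order terms are dropped.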

Convex (concave) Functions

A function is convex (concave) if its associated Hessian matrix H(X) is positive (negative) definite for every X in the domain; this condition in fact makes the function strictly convex (concave).

Such a function has at most one minimum (maximum) point.

∇f(X0) = 0 is then both necessary and sufficient for X0 to minimize (maximize) f(X).

Unimodal Functions

with respect to a minimum:
1. everywhere convex in the domain
2. H(x) is positive definite for all x in the domain

with respect to a maximum:
1. everywhere concave in the domain
2. H(x) is negative definite for all x in the domain

In summary - Equivalent Statements

At a minimum point X0:
∇f(X0) = 0
H(X0) is positive definite
X^t H(X0) X > 0 for all X ≠ 0
f is convex near X0

At a maximum point X0:
∇f(X0) = 0
H(X0) is negative definite
X^t H(X0) X < 0 for all X ≠ 0
f is concave near X0

It’s all making sense now.

OR students pondering this latest information

A 1-variable example

$$
f(x) = 5x^6 - 36x^5 + \frac{165}{2} x^4 - 60x^3 + 36
$$

$$
\nabla f(\mathbf{X}_0) = \frac{d f(x)}{dx} = 30x^5 - 180x^4 + 330x^3 - 180x^2 = 0
$$

$$
30x^2 (x - 1)(x - 2)(x - 3) = 0; \qquad x = 0, 1, 2, 3
$$

$$
H(\mathbf{X}_0) = \frac{d^2 f(x)}{dx^2} = 150x^4 - 720x^3 + 990x^2 - 360x
$$

x     f(x)     f''(x)
0     36       0        inflection point, f'''(0) = -360
1     27.5     60       local minimum
2     44       -120     local maximum
3     -4.5     540      local minimum
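
The table of stationary points can be regenerated by evaluating f and its derivatives at x = 0, 1, 2, 3. In this sketch the third derivative `d3` was derived by hand for illustration (the lesson only quotes its value at 0), and the `classify` helper is an assumed name.

```python
def f(x):
    return 5 * x ** 6 - 36 * x ** 5 + 82.5 * x ** 4 - 60 * x ** 3 + 36


def d1(x):
    return 30 * x ** 5 - 180 * x ** 4 + 330 * x ** 3 - 180 * x ** 2


def d2(x):
    return 150 * x ** 4 - 720 * x ** 3 + 990 * x ** 2 - 360 * x


def d3(x):
    # Derived by differentiating d2; not written out in the lesson.
    return 600 * x ** 3 - 2160 * x ** 2 + 1980 * x - 360


def classify(x):
    """Second-derivative test at a stationary point x, falling back to f'''."""
    if d2(x) > 0:
        return "local minimum"
    if d2(x) < 0:
        return "local maximum"
    return "inflection point" if d3(x) != 0 else "inconclusive"


results = {x: (f(x), d2(x), classify(x)) for x in (0, 1, 2, 3)}
for x, row in results.items():
    print(x, d1(x), row)
```

Because both x = 1 and x = 3 are local minima, comparing f(1) = 27.5 with f(3) = -4.5 is still needed to decide which is lower.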

2-Variable Problems

sufficient conditions:

$$
H(x_0, y_0) = \begin{bmatrix} f_{xx}(x_0,y_0) & f_{xy}(x_0,y_0) \\ f_{yx}(x_0,y_0) & f_{yy}(x_0,y_0) \end{bmatrix}
$$

$$
\begin{aligned}
f_{xx}(x_0,y_0) &> 0 \quad \text{for a local min} \\
f_{xx}(x_0,y_0) &< 0 \quad \text{for a local max}
\end{aligned}
$$

and

$$
f_{xx}(x_0,y_0) \cdot f_{yy}(x_0,y_0) - \left( f_{xy}(x_0,y_0) \right)^2 > 0
$$

saddle point:

$$
f_{xx}(x_0,y_0) \cdot f_{yy}(x_0,y_0) - \left( f_{xy}(x_0,y_0) \right)^2 < 0
$$

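A 2-variable example

$$
\text{Max } f(x,y) = 100 - (x - 4)^2 - 2(y - 2)^2
$$

necessary conditions:

$$
\begin{aligned}
\frac{\partial f}{\partial x} &= -2(x - 4) = 0 \;\Rightarrow\; x = 4 \\
\frac{\partial f}{\partial y} &= -4(y - 2) = 0 \;\Rightarrow\; y = 2
\end{aligned}
$$

sufficient conditions:

$$
\begin{aligned}
\frac{\partial^2 f}{\partial x^2} &= -2 < 0 \\
\frac{\partial^2 f}{\partial y^2} &= -4; \qquad \frac{\partial^2 f}{\partial x \, \partial y} = 0 \\
\frac{\partial^2 f}{\partial x^2} \cdot \frac{\partial^2 f}{\partial y^2} - \left( \frac{\partial^2 f}{\partial x \, \partial y} \right)^2 &= 8 > 0
\end{aligned}
$$

concave function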
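
The two-variable sufficient conditions reduce to a short decision rule on the three second partials. A minimal sketch follows; the function name `classify_stationary_point` is an assumption, and the second call uses f(x, y) = x² − y², an illustrative function that is not part of the lesson.

```python
def classify_stationary_point(fxx, fyy, fxy):
    """Second-derivative test for a stationary point of a 2-variable function."""
    discriminant = fxx * fyy - fxy ** 2
    if discriminant > 0:
        return "local min" if fxx > 0 else "local max"
    if discriminant < 0:
        return "saddle point"
    return "inconclusive"


# f(x, y) = 100 - (x - 4)^2 - 2 (y - 2)^2 at (4, 2): fxx = -2, fyy = -4, fxy = 0
print(classify_stationary_point(-2, -4, 0))  # local max

# Assumed illustration: f(x, y) = x^2 - y^2 at (0, 0): fxx = 2, fyy = -2, fxy = 0
print(classify_stationary_point(2, -2, 0))   # saddle point
```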

A 3-variable example (18.1-1)

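$$
f(x_1, x_2, x_3) = x_1 + 2x_3 + x_2 x_3 - x_1^2 - x_2^2 - x_3^2
$$

$$
\nabla f(\mathbf{X}_0) = 0:
\qquad
\begin{aligned}
\frac{\partial f}{\partial x_1} &= 1 - 2x_1 = 0 \\
\frac{\partial f}{\partial x_2} &= x_3 - 2x_2 = 0 \\
\frac{\partial f}{\partial x_3} &= 2 + x_2 - 2x_3 = 0
\end{aligned}
$$

$$
\mathbf{X}_0 = \left( \frac{1}{2}, \frac{2}{3}, \frac{4}{3} \right)
$$

$$
H(\mathbf{X}_0) = \begin{bmatrix} -2 & 0 & 0 \\ 0 & -2 & 1 \\ 0 & 1 & -2 \end{bmatrix}
$$

M1 = -2, M2 = 4, M3 = -6

Max point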
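
A quick way to double-check this stationary point is to evaluate the gradient there in exact rational arithmetic and compare f with nearby points. The sketch below assumes nothing beyond the example itself; the grid of perturbations (steps of 1/10) and the variable names are illustrative choices.

```python
from fractions import Fraction as Fr
from itertools import product


def f(x1, x2, x3):
    return x1 + 2 * x3 + x2 * x3 - x1 ** 2 - x2 ** 2 - x3 ** 2


def grad_f(x1, x2, x3):
    """First partials of f, as written out in the example."""
    return (1 - 2 * x1, x3 - 2 * x2, 2 + x2 - 2 * x3)


X0 = (Fr(1, 2), Fr(2, 3), Fr(4, 3))

# Perturb each coordinate by -1/10, 0, or +1/10 (skipping the zero perturbation).
steps = (Fr(-1, 10), Fr(0), Fr(1, 10))
neighbors = [tuple(x + d for x, d in zip(X0, delta))
             for delta in product(steps, repeat=3) if any(delta)]
is_max_on_grid = all(f(*p) < f(*X0) for p in neighbors)

print(grad_f(*X0))      # all three partials are exactly zero
print(f(*X0))           # 19/12
print(is_max_on_grid)   # True
```

The grid check is only a sanity test; the negative definite Hessian is what actually establishes the maximum.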

A more interesting 3-variable example

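$$
f(x_1, x_2, x_3) = 8x_1 - 3x_2^3 + 5 \ln x_3 - x_1^2 + 36x_2 - 5x_3
$$

$$
\nabla f(\mathbf{X}_0) = 0:
\qquad
\begin{aligned}
\frac{\partial f}{\partial x_1} &= 8 - 2x_1 = 0 \\
\frac{\partial f}{\partial x_2} &= -9x_2^2 + 36 = 0 \\
\frac{\partial f}{\partial x_3} &= \frac{5}{x_3} - 5 = 0
\end{aligned}
$$

$$
\mathbf{X}_0 = (4, \pm 2, 1)
$$

$$
H(\mathbf{X}_0) = \begin{bmatrix} -2 & 0 & 0 \\ 0 & -18x_2 & 0 \\ 0 & 0 & -\dfrac{5}{x_3^2} \end{bmatrix}
$$

M1 = -2, M2 = ±72, M3 = ∓360

Max point at (4, 2, 1)

Saddle point at (4, -2, 1)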
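
Because this Hessian is diagonal, each leading principal minor is just the running product of the diagonal entries, which makes the two stationary points easy to compare. The helper names below are assumptions made for this sketch.

```python
def hessian_diagonal(x1, x2, x3):
    """Diagonal of the Hessian of f = 8x1 - 3x2^3 + 5 ln x3 - x1^2 + 36x2 - 5x3."""
    return [-2, -18 * x2, -5 / x3 ** 2]


def leading_minors_of_diagonal(diagonal):
    """For a diagonal matrix, M_k is the product of the first k diagonal entries."""
    minors, running = [], 1
    for entry in diagonal:
        running *= entry
        minors.append(running)
    return minors


def classify(minors):
    if all(m > 0 for m in minors):
        return "min point"
    if all((-1) ** k * m > 0 for k, m in enumerate(minors, start=1)):
        return "max point"
    # With a nonzero determinant and neither sign pattern, the Hessian is indefinite.
    return "saddle point" if minors[-1] != 0 else "inconclusive"


for point in ((4, 2, 1), (4, -2, 1)):
    minors = leading_minors_of_diagonal(hessian_diagonal(*point))
    print(point, minors, classify(minors))
```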

A 4-variable example

The following quadratic cost function must be minimized, where $x_i$ is the number of units of the ith product to be produced.

$$
\begin{aligned}
\text{Min } f(x_1, x_2, x_3, x_4) ={}& 200(x_2 - x_1)^2 + 300(x_3 - x_2)^2 + 100(x_4 - x_3)^2 \\
&+ 200(x_1 - x_4)^2 + 500(x_1 - 150)^2 + 600(x_2 - 100)^2 \\
&+ 700(x_3 - 120)^2 + 800(x_4 - 80)^2 + 4000x_1 + 5000x_2 + 3000x_3 + 2000x_4
\end{aligned}
$$

What a great cost function, Chuck. Way to go.

OR students excited to solve a 4-variable problem.

The Problem – the first partials

Setting each first partial of f to zero and dividing through by 100:

$$
\begin{aligned}
\tfrac{1}{100} \frac{\partial f}{\partial x_1} &= -4(x_2 - x_1) + 4(x_1 - x_4) + 10(x_1 - 150) + 40 = 0 \\
\tfrac{1}{100} \frac{\partial f}{\partial x_2} &= 4(x_2 - x_1) - 6(x_3 - x_2) + 12(x_2 - 100) + 50 = 0 \\
\tfrac{1}{100} \frac{\partial f}{\partial x_3} &= 6(x_3 - x_2) - 2(x_4 - x_3) + 14(x_3 - 120) + 30 = 0 \\
\tfrac{1}{100} \frac{\partial f}{\partial x_4} &= 2(x_4 - x_3) - 4(x_1 - x_4) + 16(x_4 - 80) + 20 = 0
\end{aligned}
$$

The first partials solved

Collecting terms gives a linear system in the four unknowns:

$$
\begin{aligned}
18x_1 - 4x_2 - 4x_4 &= 1460 \\
-4x_1 + 22x_2 - 6x_3 &= 1150 \\
-6x_2 + 22x_3 - 2x_4 &= 1650 \\
-4x_1 - 2x_3 + 22x_4 &= 1260
\end{aligned}
$$

The second partials – the Hessian

The coefficient matrix of the first-partial equations is the Hessian of f divided by 100; the scaling does not affect definiteness.

$$
H(\mathbf{X}) = \begin{bmatrix} 18 & -4 & 0 & -4 \\ -4 & 22 & -6 & 0 \\ 0 & -6 & 22 & -2 \\ -4 & 0 & -2 & 22 \end{bmatrix}
$$

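The leading principal minors:

$$
\begin{aligned}
M_1 &= |18| = 18 > 0 \\
M_2 &= \begin{vmatrix} 18 & -4 \\ -4 & 22 \end{vmatrix} = 380 > 0 \\
M_3 &= \begin{vmatrix} 18 & -4 & 0 \\ -4 & 22 & -6 \\ 0 & -6 & 22 \end{vmatrix} = 7712 > 0 \\
M_4 &= \begin{vmatrix} 18 & -4 & 0 & -4 \\ -4 & 22 & -6 & 0 \\ 0 & -6 & 22 & -2 \\ -4 & 0 & -2 & 22 \end{vmatrix} = 160{,}592 > 0
\end{aligned}
$$

All four minors are positive, so H is positive definite and the stationary point is a minimum.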
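
Since the first-partial equations are linear, the stationary point can be found with any linear solver. The sketch below uses a small hand-written Gaussian elimination so that it stays self-contained; the function name `solve` and the residual check are illustrative assumptions, and the lesson itself does not say which tool was used to solve the system.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [[float(v) for v in row] + [float(rhs)] for row, rhs in zip(A, b)]
    for col in range(n):
        # Move the row with the largest pivot candidate into place.
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        known = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - known) / M[r][r]
    return x


A = [[18, -4, 0, -4],
     [-4, 22, -6, 0],
     [0, -6, 22, -2],
     [-4, 0, -2, 22]]
b = [1460, 1150, 1650, 1260]

x = solve(A, b)
residuals = [sum(A[i][j] * x[j] for j in range(4)) - b[i] for i in range(4)]
print([round(v, 3) for v in x])
print(max(abs(r) for r in residuals))
```

The residuals confirm that the returned x satisfies all four equations to within rounding error.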

A Darke Study

Darke County Ohio

a vibrant, growing community which offers its residents the unique opportunity to enjoy small town community life within easy access of major metropolitan areas.

We shall locate our new store in Darke County Ohio!

Darke County Ohio Townships

[Figure: map of the Darke County townships on a coordinate grid, x-axis from 1 to 8 and y-axis from 1 to 9]

population = 52,983

Did you know?

Because of its geographic location, Darke County is within a 90-minute air market for 55% of the population of the United States.

Township Populations

Township                 2005 est.
Adams township           2,484
Allen township           1,188
Brown township           2,145
Butler township          1,623
Franklin township        3,254
Greenville township      8,845
Harrison township        2,145
Jackson township         1,578
Liberty township         1,157
Mississinawa township    809
Monroe township          6,214
Neave township           973
Patterson township       3,781
Richland township        854
Twin township            6,452
Van Buren township       2,576
Wabash township          951
Washington township      1,245
Wayne township           3,201
York township            544
Totals                   52,019

The Data

Township                 2005     weights   x-coord   y-coord
Adams township           2,484    0.048     7         4.75
Allen township           1,188    0.023     3.75      8.5
Brown township           2,145    0.041     4         6.75
Butler township          1,623    0.031     4         1.25
Franklin township        3,254    0.063     7.5       3
Greenville township      8,845    0.170     4         4.25
Harrison township        2,145    0.041     2         1.75
Jackson township         1,578    0.030     1.5       6.75
Liberty township         1,157    0.022     2         3.25
Mississinawa township    809      0.016     2         9
Monroe township          6,214    0.119     7         1.5
Neave township           973      0.019     3.75      3
Patterson township       3,781    0.073     7         9.25
Richland township        854      0.016     5.5       6.25
Twin township            6,452    0.124     5.75      1.75
Van Buren township       2,576    0.050     5.75      3
Wabash township          951      0.018     5.5       9
Washington township      1,245    0.024     1.75      5
Wayne township           3,201    0.062     7.25      6.5
York township            544      0.010     5.5       8

The Euclidean Distance Problem

I shall now solve the very difficult Euclidean distance problem.

$$
\min f(x, y) = \sum_{i=1}^{n} w_i \sqrt{(x - a_i)^2 + (y - b_i)^2}
$$

let
x = x-coordinate of outlet store
y = y-coordinate of outlet store
a_i = x-coordinate of ith township
b_i = y-coordinate of ith township
w_i = weight placed on location (a_i, b_i)

Necessary Conditions

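$$
\begin{aligned}
\frac{\partial f}{\partial x} &= \sum_{i=1}^{n} \frac{1}{2} \cdot \frac{2 w_i (x - a_i)}{\sqrt{(x - a_i)^2 + (y - b_i)^2}} = 0 \\
\frac{\partial f}{\partial y} &= \sum_{i=1}^{n} \frac{1}{2} \cdot \frac{2 w_i (y - b_i)}{\sqrt{(x - a_i)^2 + (y - b_i)^2}} = 0
\end{aligned}
$$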

I don’t see how these equations can be solved.

The Difficulty

$$
\begin{aligned}
\sum_{i=1}^{n} \frac{w_i \, x}{\sqrt{(x - a_i)^2 + (y - b_i)^2}} &= \sum_{i=1}^{n} \frac{w_i \, a_i}{\sqrt{(x - a_i)^2 + (y - b_i)^2}} \\
\sum_{i=1}^{n} \frac{w_i \, y}{\sqrt{(x - a_i)^2 + (y - b_i)^2}} &= \sum_{i=1}^{n} \frac{w_i \, b_i}{\sqrt{(x - a_i)^2 + (y - b_i)^2}}
\end{aligned}
$$

More of the Difficulty

Solving each equation for the unknown that multiplies the left-hand sum:

$$
x = \frac{\displaystyle \sum_{i=1}^{n} \frac{w_i \, a_i}{\sqrt{(x - a_i)^2 + (y - b_i)^2}}}{\displaystyle \sum_{i=1}^{n} \frac{w_i}{\sqrt{(x - a_i)^2 + (y - b_i)^2}}},
\qquad
y = \frac{\displaystyle \sum_{i=1}^{n} \frac{w_i \, b_i}{\sqrt{(x - a_i)^2 + (y - b_i)^2}}}{\displaystyle \sum_{i=1}^{n} \frac{w_i}{\sqrt{(x - a_i)^2 + (y - b_i)^2}}}
$$

Both x and y still appear on the right-hand sides, so these expressions are not a closed-form solution.
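
Although x and y cannot be isolated in closed form, the expressions for x and y can be applied repeatedly as a fixed-point update, which is known as Weiszfeld's algorithm. This is a different technique from the Solver approach used in the lesson, shown here only as a hedged sketch: the names `objective` and `weiszfeld`, the starting point (the weighted centroid), and the iteration count are all assumptions. The data are the township populations and coordinates from the data table in this lesson, with weights taken as population divided by the total.

```python
import math

# (2005 population, a_i, b_i) for the 20 townships, in the order of the data table.
townships = [
    (2484, 7, 4.75), (1188, 3.75, 8.5), (2145, 4, 6.75), (1623, 4, 1.25),
    (3254, 7.5, 3), (8845, 4, 4.25), (2145, 2, 1.75), (1578, 1.5, 6.75),
    (1157, 2, 3.25), (809, 2, 9), (6214, 7, 1.5), (973, 3.75, 3),
    (3781, 7, 9.25), (854, 5.5, 6.25), (6452, 5.75, 1.75), (2576, 5.75, 3),
    (951, 5.5, 9), (1245, 1.75, 5), (3201, 7.25, 6.5), (544, 5.5, 8),
]
total = sum(pop for pop, _, _ in townships)
points = [(pop / total, a, b) for pop, a, b in townships]  # (w_i, a_i, b_i)


def objective(x, y):
    """Weighted sum of Euclidean distances from (x, y) to every township."""
    return sum(w * math.hypot(x - a, y - b) for w, a, b in points)


def weiszfeld(x, y, iterations=1000):
    """Repeatedly apply the fixed-point update for x and y."""
    for _ in range(iterations):
        num_x = num_y = denom = 0.0
        for w, a, b in points:
            d = math.hypot(x - a, y - b)
            if d == 0.0:
                # The update is undefined exactly at a data point; this sketch stops.
                return x, y
            num_x += w * a / d
            num_y += w * b / d
            denom += w / d
        x, y = num_x / denom, num_y / denom
    return x, y


# Assumed starting point: the weighted centroid of the townships.
x_start = sum(w * a for w, a, _ in points)
y_start = sum(w * b for w, _, b in points)
x_star, y_star = weiszfeld(x_start, y_start)
print(x_star, y_star, objective(x_star, y_star))
```

The result can be compared with the coordinates and target value that Solver reports in the lesson. A production implementation would add a convergence tolerance and handle the case where an iterate coincides with a township.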

Let’s try using Solver.

The Solution

The last column is the weighted distance $w_i \sqrt{(x - a_i)^2 + (y - b_i)^2}$ evaluated at the solution.

Township                 2005     weights   x-coord   y-coord   weighted distance
Adams township           2,484    0.048     7         4.75      0.101891
Allen township           1,188    0.023     3.75      8.5       0.109715
Brown township           2,145    0.041     4         6.75      0.126215
Butler township          1,623    0.031     4         1.25      0.08828
Franklin township        3,254    0.063     7.5       3         0.162561
Greenville township      8,845    0.170     4         4.25      0.190063
Harrison township        2,145    0.041     2         1.75      0.153422
Jackson township         1,578    0.030     1.5       6.75      0.138635
Liberty township         1,157    0.022     2         3.25      0.069326
Mississinawa township    809      0.016     2         9         0.092763
Monroe township          6,214    0.119     7         1.5       0.366929
Neave township           973      0.019     3.75      3         0.029375
Patterson township       3,781    0.073     7         9.25      0.415439
Richland township        854      0.016     5.5       6.25      0.039656
Twin township            6,452    0.124     5.75      1.75      0.277496
Van Buren township       2,576    0.050     5.75      3         0.055427
Wabash township          951      0.018     5.5       9         0.094029
Washington township      1,245    0.024     1.75      5         0.083513
Wayne township           3,201    0.062     7.25      6.5       0.210572
York township            544      0.010     5.5       8         0.043379
Totals                   52,019   1

Solution: x-coord = 5.0534, y-coord = 3.8761, target = 2.8487

Darke County Ohio Townships

[Figure: map of the Darke County townships on a coordinate grid, x-axis from 1 to 8 and y-axis from 1 to 9]

If at least one half of the cumulative weight is associated with an existing facility, the optimum location for the new facility will coincide with the existing facility. I call this the Majority Theorem. It is also true that the optimum location will always fall within the convex hull formed from the existing points.

A Simple Proof using an Analog Model

The convex hull

These have been great examples of multi-variable optimization.

Indeed! But I think I need to work some problems. Then I will have this mastered.

The Multi-Variable Unconstrained Problem

You have experienced the peaks and valleys and the ups and downs of the general unconstrained nonlinear function. What more can one expect out of life?


http://www.allfree-clipart.com/cgi-bin/imageFolio2.cgi?direct=clipart/Motivational&img=10
