Classical Optimization
Chapter 20
A decision without optimization is like an arch without a keystone.
- olde English saying, circa 1886
The World Is Not Linear!
Classical Optimization
Uses differential calculus to determine points of maxima and minima (extrema).
The underlying theory provides the basis for most nonlinear programming algorithms.
Objective: develop necessary and sufficient conditions for determining unconstrained extrema.
The General Optimization Problem
Max/Min f(x1, x2, ..., xn)
subject to:
  gi(x1, x2, ..., xn) {≤, =, ≥} bi,  i = 1, 2, ..., m

where f, g1, ..., gm are real-valued functions
Mathematical Programming
Linear | Integer | Nonlinear | Dynamic

Nonlinear:
  unconstrained: single-variable, multi-variable
  constrained:
    equality constraints – Lagrangian multipliers
    inequality constraints – Karush-Kuhn-Tucker conditions
I am quite interested in these nonlinear programs. Can you tell me more?
The Unconstrained Problem
Local Extrema

local min: x0 is a local minimum (maximum) if, for an arbitrarily small neighborhood N about x0, f(x0) ≤ (≥) f(x) for all x in N.
[Figure: f(x) with a local minimum and a local maximum, each inside a small neighborhood N about x0]
Global Extrema

global min: x* is a global min (max) if f(x*) ≤ (≥) f(x) for all x such that a ≤ x ≤ b.
[Figure: f(x) on the interval [a, b] with the global min at x* and the global max marked]
The Problem – finding the global extremum

[Figure: f(x) over a closed interval [a, b] showing local maxima, local minima, the global min and global max, and an unbounded case]

[Figure: the sign of df(x)/dx changes from + to − at a stationary point of a concave function (a local max) and from − to + at a stationary point of a convex function (a local min); d²f(x)/dx² is negative where f is concave and positive where f is convex]
All you wanted to know about Inflection Points
f(x) changes from concave to convex (or convex to concave)
f'(x) achieves a maximum or minimum; f'(x) may be zero
f''(x) = 0 and f''(x) changes sign, i.e. f'(x) goes from a decreasing to an increasing function (or vice-versa)
f'''(x) ≠ 0
example: f(x) = x³, f'(x) = 3x², f''(x) = 6x, f'''(x) = 6; inflection point at x = 0

counter example: f(x) = x⁴, f'(x) = 4x³, f''(x) = 12x², f'''(x) = 24x; here f''(0) = 0 but f''(x) does not change sign, so x = 0 is a minimum, not an inflection point
2-Variable Function with a Maximum
z = f(x,y)
∂f(x, y)/∂x = 0
∂f(x, y)/∂y = 0
2-Variable Function with both Maxima and Minima
z = f(x,y)
∂f(x, y)/∂x = 0
∂f(x, y)/∂y = 0
2-Variable Function with a Saddle Point
z = f(x,y)
∂f(x, y)/∂x = 0
∂f(x, y)/∂y = 0
Some Math Background: a digression

The Gradient
The Hessian
Quadratic Forms
Taylor Series Expansion
The Gradient – vector of first partials
The gradient vector of the scalar-valued function f(x) at the point x = x0 is defined as

∇f(x0) = ( ∂f(x0)/∂x1, ∂f(x0)/∂x2, ..., ∂f(x0)/∂xn )t

a column vector of the n first partial derivatives evaluated at x0.
The Hessian – matrix of second partials
H = ∇²f(x*) = ∂²f(x*)/∂x∂x, the n × n matrix of second partial derivatives:

H = ⎡ ∂²f(x*)/∂x1∂x1   ∂²f(x*)/∂x1∂x2   ...   ∂²f(x*)/∂x1∂xn ⎤
    ⎢ ∂²f(x*)/∂x2∂x1   ∂²f(x*)/∂x2∂x2   ...   ∂²f(x*)/∂x2∂xn ⎥
    ⎢       ...               ...        ...         ...      ⎥
    ⎣ ∂²f(x*)/∂xn∂x1   ∂²f(x*)/∂xn∂x2   ...   ∂²f(x*)/∂xn∂xn ⎦

{ hij } = ∂²f(x1, ..., xn) / ∂xi∂xj
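Hand-computed partials can be spot-checked numerically with central differences. The helper below is my own illustration, not from the slides; the test function f(x1, x2) = x1² + 3x1x2 is likewise a made-up example.

```python
# Central-difference gradient: a quick numerical check of hand-computed
# partials (an illustrative helper, not part of the original slides).
def num_gradient(f, x, h=1e-6):
    g = []
    for i in range(len(x)):
        xp, xm = list(x), list(x)
        xp[i] += h
        xm[i] -= h
        g.append((f(xp) - f(xm)) / (2 * h))
    return g

# Check against f(x1, x2) = x1^2 + 3*x1*x2, whose gradient is
# (2*x1 + 3*x2, 3*x1); at the point (1, 2) that is (8, 3).
f = lambda v: v[0]**2 + 3*v[0]*v[1]
print([round(g, 4) for g in num_gradient(f, [1.0, 2.0])])   # [8.0, 3.0]
```

The same differencing applied to each partial in turn gives a numerical Hessian.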
Quadratic Forms

A quadratic form is a scalar function defined for all x ∈ En that takes the form:

q(x) = Σi Σj aij xi xj,   i, j = 1, ..., n

where aij is a real number (possibly zero). q(x) is a quadratic function that may be written in matrix-vector form: q(x) = xt A x
Our very first example of a quadratic form:

That is a very fine example of a quadratic form.
q(x1, x2, x3) = 3x1² + 4x1x2 − x1x3 + 5x2² + 7x3²

                             ⎡  3    2   −0.5 ⎤ ⎛ x1 ⎞
q(x1, x2, x3) = (x1 x2 x3)   ⎢  2    5    0   ⎥ ⎜ x2 ⎟
                             ⎣ −0.5  0    7   ⎦ ⎝ x3 ⎠
Quadratic Forms

A matrix A is positive definite if and only if xTAx > 0 for all vectors x ≠ 0.
A matrix A is negative definite if and only if xTAx < 0 for all vectors x ≠ 0.
A matrix A is indefinite if xTAx > 0 for some x and xTAx < 0 for others.
Properties of Quadratic Forms
Test for Definiteness

A symmetric matrix A is positive definite if and only if all of its leading principal minors are positive (Mk > 0 for k = 1, ..., n), and negative definite if and only if the minors alternate in sign beginning with M1 < 0.
What is a principal minor?
The kth leading principal minor of the symmetric matrix A is the determinant, denoted Mk, of the submatrix formed by deleting the last n − k rows and columns of A.
The leading principal minors
For the example matrix

A = ⎡  3    2   −0.5 ⎤
    ⎢  2    5    0   ⎥
    ⎣ −0.5  0    7   ⎦

M1 = | 3 | = 3 > 0

M2 = | 3  2 |
     | 2  5 |  = 15 − 4 = 11 > 0

M3 = det A = 75.75 > 0

All leading principal minors are positive, so A is positive definite.
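The minor calculations are easy to verify with NumPy, assuming the matrix as reconstructed above:

```python
import numpy as np

# Leading principal minors of the example matrix A: determinants of the
# upper-left k-by-k submatrices, k = 1, 2, 3.
A = np.array([[ 3.0, 2.0, -0.5],
              [ 2.0, 5.0,  0.0],
              [-0.5, 0.0,  7.0]])
minors = [float(np.linalg.det(A[:k, :k])) for k in (1, 2, 3)]
print([round(m, 2) for m in minors])   # [3.0, 11.0, 75.75]

# All three minors positive => A is positive definite.
assert all(m > 0 for m in minors)
```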
Taylor Series – one variable
A Taylor series is a representation or approximation of a function as a sum of terms calculated from the values of its derivatives at a single point. Specifically, the Taylor series of an infinitely differentiable real function f, defined on an open interval (a − r, a + r), is the power series that follows.
Taylor Series Expansion
f(x) = f(a) + f'(a)(x − a) + (1/2!) f''(a)(x − a)² + (1/3!) f'''(a)(x − a)³ + higher order terms
Taylor Series Expansion
f(x) = f(a) + f'(a)(x − a) + (1/2!) f''(a)(x − a)² + (1/3!) f'''(a)(x − a)³ + higher order terms

If "a" is a stationary point, f'(a) = 0 and the first-order term vanishes; for x "close" to "a", the third- and higher-order terms are negligible.
Taylor Series Expansion
f(x) = f(a) + f'(a)(x − a) + (1/2!) f''(a)(x − a)² + (1/3!) f'''(a)(x − a)³ + higher order terms

If "a" is a stationary point, f'(a) = 0 and, for x "close" to "a", the higher-order terms are negligible, so

f(x) − f(a) ≈ (1/2) f''(a)(x − a)²

  > 0 if f''(a) > 0   (minimum)
  < 0 if f''(a) < 0   (maximum)
  = 0 if f''(a) = 0   (look at higher-order derivatives)
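The quadratic approximation near a stationary point can be watched at work on a concrete function. As a sketch (my example, not from the slides): f(x) = cos x has a stationary point at a = 0 with f''(0) = −1 < 0, so the quadratic term correctly predicts a maximum.

```python
import math

# Near a stationary point a, f(x) - f(a) is approximated by
# (1/2) f''(a) (x - a)^2.  Example: f(x) = cos(x), a = 0, where
# f'(0) = 0 and f''(0) = -1 < 0, so x = 0 is a local maximum.
a, h = 0.0, 1e-3
exact = math.cos(a + h) - math.cos(a)
approx = 0.5 * (-1.0) * h**2          # (1/2) f''(a) h^2
print(abs(exact - approx) < 1e-10)    # True: the quadratic term dominates
print(exact < 0)                      # True: f decreases away from a maximum
```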
Taylor Series in 2 variables
A second-order approximation of f(x,y) in the neighborhood about the point (x0,y0):

f(x,y) ≈ f(x0,y0) + fx(x0,y0) (x − x0) + fy(x0,y0) (y − y0)
  + ½ fxx(x0,y0) (x − x0)² + fxy(x0,y0) (x − x0)(y − y0)
  + ½ fyy(x0,y0) (y − y0)² + higher order terms
Taylor’s Series Approximation in 2-variables in matrix-vector form
f(x,y) ≈ f(x0,y0) + ( fx(x0,y0)  fy(x0,y0) ) ⎛ x − x0 ⎞
                                             ⎝ y − y0 ⎠

  + (1/2) ( x − x0  y − y0 ) ⎡ fxx(x0,y0)  fxy(x0,y0) ⎤ ⎛ x − x0 ⎞
                             ⎣ fyx(x0,y0)  fyy(x0,y0) ⎦ ⎝ y − y0 ⎠

  + higher order terms

The row vector of first partials is the gradient; the 2 × 2 matrix of second partials is the Hessian.
Putting it all together - Matrix-Vector Representation of Taylor Series
f(X) ≈ f(X0) + ∇f(X0)t (X − X0) + (1/2) (X − X0)t H(X0) (X − X0)

If "X0" is a stationary point, ∇f(X0) = 0, then:

f(X) − f(X0) ≈ (1/2) (X − X0)t H(X0) (X − X0)

f(X) − f(X0) > 0 if H(X0) is pos. def.
f(X) − f(X0) < 0 if H(X0) is neg. def.
Convex (concave) Functions

A function is convex (concave) if its associated Hessian matrix H(X) is positive (negative) definite. A convex (concave) function has a single minimum (maximum) point, so ∇f(X0) = 0 is then both necessary and sufficient for minimizing (maximizing) f(X).
Unimodal Functions
with respect to a minimum:
1. everywhere convex in the domain
2. H(x) is positive definite for all x in the domain

with respect to a maximum:
1. everywhere concave in the domain
2. H(x) is negative definite for all x in the domain
In summary - Equivalent Statements
At a minimum point X0:
  ∇f(X0) = 0
  H(X0) is positive definite
  xt H(X0) x > 0 for all x ≠ 0
  f is convex at X0

At a maximum point X0:
  ∇f(X0) = 0
  H(X0) is negative definite
  xt H(X0) x < 0 for all x ≠ 0
  f is concave at X0
It’s all making sense now.
OR students ponderingthis latest information
A 1-variable example
f(x) = 5x⁶ − 36x⁵ + (165/2)x⁴ − 60x³ + 36

∇f(x) = df(x)/dx = 30x⁵ − 180x⁴ + 330x³ − 180x²
      = 30x²(x − 1)(x − 2)(x − 3) = 0;  x' = 0, 1, 2, 3

H(x) = d²f(x)/dx² = 150x⁴ − 720x³ + 990x² − 360x

x    f(x)    f''(x)
0    36        0      inflection point (f'''(0) = −360)
1    27.5     60      local minimum
2    44     −120      local maximum
3    −4.5    540      local minimum
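The table can be reproduced by evaluating f, f', and f'' at the four stationary points; a quick check in Python:

```python
# Verifying the one-variable example f(x) = 5x^6 - 36x^5 + (165/2)x^4 - 60x^3 + 36.
def f(x):   return 5*x**6 - 36*x**5 + 82.5*x**4 - 60*x**3 + 36
def df(x):  return 30*x**5 - 180*x**4 + 330*x**3 - 180*x**2
def d2f(x): return 150*x**4 - 720*x**3 + 990*x**2 - 360*x

for x in (0, 1, 2, 3):
    # each candidate makes f'(x) = 0; the sign of f''(x) classifies it
    print(x, f(x), df(x), d2f(x))
```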
2-Variable Problems
sufficient conditions (at a stationary point (x0, y0)):

fxx(x0,y0) · fyy(x0,y0) − ( fxy(x0,y0) )² > 0, and
  fxx(x0,y0) > 0 for a local min
  fxx(x0,y0) < 0 for a local max

fxx(x0,y0) · fyy(x0,y0) − ( fxy(x0,y0) )² < 0  ⇒  saddle point

where H(x0,y0) = ⎡ fxx(x0,y0)  fxy(x0,y0) ⎤
                 ⎣ fyx(x0,y0)  fyy(x0,y0) ⎦
A 2-variable example

Max f(x,y) = 100 − (x − 4)² − 2(y − 2)²

necessary conditions:

∂f/∂x = −2(x − 4) = 0 ⇒ x = 4
∂f/∂y = −4(y − 2) = 0 ⇒ y = 2

sufficient conditions:

∂²f/∂x² = −2 < 0
∂²f/∂y² = −4;  ∂²f/∂x∂y = 0
(∂²f/∂x²)·(∂²f/∂y²) − (∂²f/∂x∂y)² = 8 > 0

a concave function ⇒ (4, 2) is a maximum
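A quick numerical check of the necessary and sufficient conditions for this example:

```python
# Checking Max f(x, y) = 100 - (x - 4)^2 - 2*(y - 2)^2.
def f(x, y):    return 100 - (x - 4)**2 - 2*(y - 2)**2
def grad(x, y): return (-2*(x - 4), -4*(y - 2))

x0, y0 = 4.0, 2.0
assert grad(x0, y0) == (0.0, 0.0)     # necessary conditions hold

fxx, fyy, fxy = -2.0, -4.0, 0.0
disc = fxx*fyy - fxy**2               # 8 > 0 with fxx < 0 -> local max
assert disc > 0 and fxx < 0
print(f(x0, y0))                      # 100.0, the maximum value
```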
A 3-variable example (18.1-1)
f(x1, x2, x3) = x1 + 2x3 + x2x3 − x1² − x2² − x3²

∇f(X0) = 0:

∂f/∂x1 = 1 − 2x1 = 0
∂f/∂x2 = x3 − 2x2 = 0
∂f/∂x3 = 2 + x2 − 2x3 = 0

X0 = (1/2, 2/3, 4/3)

H(X0) = ⎡ −2   0   0 ⎤
        ⎢  0  −2   1 ⎥
        ⎣  0   1  −2 ⎦

M1 = −2, M2 = 4, M3 = −6 ⇒ max point
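Checking the leading principal minors of this Hessian with NumPy:

```python
import numpy as np

# Hessian of the 3-variable example at the stationary point (1/2, 2/3, 4/3).
H = np.array([[-2.0,  0.0,  0.0],
              [ 0.0, -2.0,  1.0],
              [ 0.0,  1.0, -2.0]])
minors = [round(float(np.linalg.det(H[:k, :k]))) for k in (1, 2, 3)]
print(minors)   # [-2, 4, -6]: alternating signs from negative -> negative definite -> max
```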
A more interesting 3-variable example
f(x1, x2, x3) = 8x1 − x1² − 3x2³ + 36x2 + 5 ln x3 − 5x3

∇f(X0) = 0:

∂f/∂x1 = 8 − 2x1 = 0 ⇒ x1 = 4
∂f/∂x2 = −9x2² + 36 = 0 ⇒ x2 = ±2
∂f/∂x3 = 5/x3 − 5 = 0 ⇒ x3 = 1

H(X) = ⎡ −2     0       0    ⎤
       ⎢  0  −18x2      0    ⎥
       ⎣  0     0    −5/x3²  ⎦

M1 = −2, M2 = ±72, M3 = ∓360
Max point at (4, 2, 1); saddle point at (4, −2, 1)
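Since this Hessian is diagonal, the minors at the two stationary points are easy to check:

```python
import numpy as np

# The Hessian of the example is diagonal: H = diag(-2, -18*x2, -5/x3**2).
# Evaluate the leading principal minors at each stationary point.
def minors_at(x2, x3):
    H = np.diag([-2.0, -18.0 * x2, -5.0 / x3**2])
    return [round(float(np.linalg.det(H[:k, :k]))) for k in (1, 2, 3)]

print(minors_at( 2.0, 1.0))   # [-2, 72, -360] -> negative definite: max
print(minors_at(-2.0, 1.0))   # [-2, -72, 360] -> indefinite: saddle point
```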
A 4-variable example
The following quadratic cost function must be minimized wherexi is the number of units of the ith product to be produced.
Min f(x1,x2,x3,x4) = 200(x2 − x1)² + 300(x3 − x2)² + 100(x4 − x3)²
  + 200(x1 − x4)² + 500(x1 − 150)² + 600(x2 − 100)²
  + 700(x3 − 120)² + 800(x4 − 80)² + 4000x1 + 5000x2 + 3000x3 + 2000x4
What a great cost function Chuck. Way to go.
OR students excitedto solve a 4-variableproblem.
Setting the first partials to zero (each divided through by 100):

∂f/∂x1: −4(x2 − x1) + 4(x1 − x4) + 10(x1 − 150) + 40 = 0
∂f/∂x2:  4(x2 − x1) − 6(x3 − x2) + 12(x2 − 100) + 50 = 0
∂f/∂x3:  6(x3 − x2) − 2(x4 − x3) + 14(x3 − 120) + 30 = 0
∂f/∂x4:  2(x4 − x3) − 4(x1 − x4) + 16(x4 − 80) + 20 = 0
The Problem – the first partials

Min f(x1,x2,x3,x4) = 200(x2 − x1)² + 300(x3 − x2)² + 100(x4 − x3)²
  + 200(x1 − x4)² + 500(x1 − 150)² + 600(x2 − 100)²
  + 700(x3 − 120)² + 800(x4 − 80)² + 4000x1 + 5000x2 + 3000x3 + 2000x4
18x1 − 4x2 − 4x4 = 1460
−4x1 + 22x2 − 6x3 = 1150
−6x2 + 22x3 − 2x4 = 1650
−4x1 − 2x3 + 22x4 = 1260
The first partials solved
The second partials – the Hessian
18x1 − 4x2 − 4x4 = 1460
−4x1 + 22x2 − 6x3 = 1150
−6x2 + 22x3 − 2x4 = 1650
−4x1 − 2x3 + 22x4 = 1260
H(X) = ⎡ 18  −4   0  −4 ⎤
       ⎢ −4  22  −6   0 ⎥
       ⎢  0  −6  22  −2 ⎥
       ⎣ −4   0  −2  22 ⎦
The principal minors:
M1 = | 18 | = 18 > 0

M2 = | 18  −4 |
     | −4  22 |  = 380 > 0

M3 = | 18  −4   0 |
     | −4  22  −6 |  = 7712 > 0
     |  0  −6  22 |

M4 = | 18  −4   0  −4 |
     | −4  22  −6   0 |  = 160,592 > 0
     |  0  −6  22  −2 |
     | −4   0  −2  22 |

All minors are positive: H is positive definite and the stationary point is a minimum.
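The four first-order conditions form a linear system, so the stationary point can be computed directly. A sketch with NumPy (the solution values are not given on the slides, so only the residual and definiteness are asserted):

```python
import numpy as np

# Solve the first-order conditions H x = b (coefficients divided by 100).
H = np.array([[18., -4.,  0., -4.],
              [-4., 22., -6.,  0.],
              [ 0., -6., 22., -2.],
              [-4.,  0., -2., 22.]])
b = np.array([1460., 1150., 1650., 1260.])

x = np.linalg.solve(H, b)
print(np.round(x, 2))                 # production quantities x1..x4
assert np.allclose(H @ x, b)          # candidate satisfies grad f = 0

# Positive definiteness: all leading principal minors positive.
minors = [float(np.linalg.det(H[:k, :k])) for k in (1, 2, 3, 4)]
assert all(m > 0 for m in minors)
```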
A Darke Study
Darke County Ohio: a vibrant, growing community which offers its residents the unique opportunity to enjoy small-town community life within easy access of major metropolitan areas.
We shall locate our new store in Darke
County Ohio!
Darke County Ohio Townships
[Figure: map of Darke County Ohio townships on an x-y grid (x = 1 to 8, y = 1 to 9)]
population = 52,983
Did you know?Because of its geographic location, Darke County is within a 90-minute air market for 55% of the population of the United States.
Township Populations

Township                 2005 est.
Adams township               2,484
Allen township               1,188
Brown township               2,145
Butler township              1,623
Franklin township            3,254
Greenville township          8,845
Harrison township            2,145
Jackson township             1,578
Liberty township             1,157
Mississinawa township          809
Monroe township              6,214
Neave township                 973
Patterson township           3,781
Richland township              854
Twin township                6,452
Van Buren township           2,576
Wabash township                951
Washington township          1,245
Wayne township               3,201
York township                  544
Totals                      52,019
The Data
Township                 2005    weight   x-coord   y-coord
Adams township           2,484    0.048     7        4.75
Allen township           1,188    0.023     3.75     8.5
Brown township           2,145    0.041     4        6.75
Butler township          1,623    0.031     4        1.25
Franklin township        3,254    0.063     7.5      3
Greenville township      8,845    0.170     4        4.25
Harrison township        2,145    0.041     2        1.75
Jackson township         1,578    0.030     1.5      6.75
Liberty township         1,157    0.022     2        3.25
Mississinawa township      809    0.016     2        9
Monroe township          6,214    0.119     7        1.5
Neave township             973    0.019     3.75     3
Patterson township       3,781    0.073     7        9.25
Richland township          854    0.016     5.5      6.25
Twin township            6,452    0.124     5.75     1.75
Van Buren township       2,576    0.050     5.75     3
Wabash township            951    0.018     5.5      9
Washington township      1,245    0.024     1.75     5
Wayne township           3,201    0.062     7.25     6.5
York township              544    0.010     5.5      8
The Euclidean Distance Problem

I shall now solve the very difficult Euclidean distance problem.

min f(x, y) = Σi wi √[ (x − ai)² + (y − bi)² ],   i = 1, ..., n

let x  = x-coordinate of outlet store
    y  = y-coordinate of outlet store
    ai = x-coordinate of ith township
    bi = y-coordinate of ith township
    wi = weight placed on location (ai, bi)
Necessary Conditions

∂f/∂x = Σi wi (x − ai) / √[ (x − ai)² + (y − bi)² ] = 0

∂f/∂y = Σi wi (y − bi) / √[ (x − ai)² + (y − bi)² ] = 0
I don't see how these equations can be solved.

min f(x, y) = Σi wi √[ (x − ai)² + (y − bi)² ]
The Difficulty

Writing di = √[ (x − ai)² + (y − bi)² ], the conditions

∂f/∂x = Σi wi (x − ai) / di = 0
∂f/∂y = Σi wi (y − bi) / di = 0

rearrange to

Σi wi x / di = Σi wi ai / di
Σi wi y / di = Σi wi bi / di
More of the Difficulty
Σi wi x / di = Σi wi ai / di
Σi wi y / di = Σi wi bi / di

x = ( Σi wi ai / di ) / ( Σi wi / di )
y = ( Σi wi bi / di ) / ( Σi wi / di )

Since each di depends on x and y, the unknowns appear on both sides and the equations cannot be solved in closed form.
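One standard way around this difficulty, not shown in the slides, is Weiszfeld's fixed-point iteration: start at the weighted centroid, evaluate the right-hand sides, and repeat until (x, y) stops moving. A minimal sketch on a made-up symmetric data set (the Darke County table would be used the same way):

```python
import math

def weiszfeld(points, weights, iters=500):
    """Weighted Euclidean (Weber) location by fixed-point iteration on
    x = sum(w*a/d)/sum(w/d) and y = sum(w*b/d)/sum(w/d)."""
    W = sum(weights)
    # start at the weighted centroid
    x = sum(w * a for w, (a, _) in zip(weights, points)) / W
    y = sum(w * b for w, (_, b) in zip(weights, points)) / W
    for _ in range(iters):
        nx = ny = den = 0.0
        for w, (a, b) in zip(weights, points):
            d = math.hypot(x - a, y - b)
            if d == 0.0:              # iterate landed on a data point: stop there
                return a, b
            nx += w * a / d
            ny += w * b / d
            den += w / d
        x, y = nx / den, ny / den
    return x, y

# Illustrative data (not the Darke County table): four equal-weight sites.
pts = [(0.0, 0.0), (4.0, 0.0), (0.0, 4.0), (4.0, 4.0)]
wts = [1.0, 1.0, 1.0, 1.0]
print(weiszfeld(pts, wts))    # (2.0, 2.0), the center, by symmetry
```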
Let's try using Solver.
The Solution

Each township's contribution wi √[ (x − ai)² + (y − bi)² ] at the optimum:

Township                 2005    weight   x-coord   y-coord   wi·di
Adams township           2,484    0.048     7        4.75     0.101891
Allen township           1,188    0.023     3.75     8.5      0.109715
Brown township           2,145    0.041     4        6.75     0.126215
Butler township          1,623    0.031     4        1.25     0.08828
Franklin township        3,254    0.063     7.5      3        0.162561
Greenville township      8,845    0.170     4        4.25     0.190063
Harrison township        2,145    0.041     2        1.75     0.153422
Jackson township         1,578    0.030     1.5      6.75     0.138635
Liberty township         1,157    0.022     2        3.25     0.069326
Mississinawa township      809    0.016     2        9        0.092763
Monroe township          6,214    0.119     7        1.5      0.366929
Neave township             973    0.019     3.75     3        0.029375
Patterson township       3,781    0.073     7        9.25     0.415439
Richland township          854    0.016     5.5      6.25     0.039656
Twin township            6,452    0.124     5.75     1.75     0.277496
Van Buren township       2,576    0.050     5.75     3        0.055427
Wabash township            951    0.018     5.5      9        0.094029
Washington township      1,245    0.024     1.75     5        0.083513
Wayne township           3,201    0.062     7.25     6.5      0.210572
York township              544    0.010     5.5      8        0.043379

Totals                  52,019    1

Optimal location: x-coord = 5.0534, y-coord = 3.8761; target (objective) = 2.8487
Darke County Ohio Townships
[Figure: map of Darke County Ohio townships with the optimal store location plotted]
If at least one half of the cumulative weight is associated with an existing facility, the optimum location for the new facility will coincide with that existing facility. I call this the Majority Theorem. It is also true that the optimum location will always fall within the convex hull formed from the existing points.
A Simple Proof using an Analog Model
The convex hull
These have been great examples of multi-variable optimization.

Indeed! But I think I need to work some problems. Then I will have this mastered.
The Multi-Variable Unconstrained Problem
You have experienced the peaks and valleys, and the ups and downs, of the general unconstrained nonlinear function. What more can one expect out of life?