MATH353: Optimisation
Stephen Marsland ([email protected])
School of Mathematics and Statistics, Victoria University of Wellington
Stephen Marsland (VUW) 1
Constrained Optimisation
For x = (x₁, x₂, …, xₙ):

x* = argmin f(x)

subject to: gⱼ(x) = 0, j = 1, …, p and hₖ(x) ≤ 0, k = 1, …, m.
These are called the primal constraints.
Constrained Optimisation

[Figures: the level sets of f, its unconstrained maximum, and the constrained maximum of f subject to g(x) ≤ 0.]
Constrained Optimisation
• The constraints change the problem a lot.
• Without constraints, we find the stationary points of the function and choose the optimal one.
• With constraints, we have to follow a path of admissible solutions to find the optimal value on that path.
• There is no generally applicable method to find constrained optimal solutions.
Constrained Stationary Points
The solutions are still stationary points, but on the intersection of the constraints and the function surface: constrained stationary points. They are points where the constraint curves touch the contours (level sets) of the function.
Constrained Stationary Points
Example
Find the constrained stationary points of

max z = −x² − y² subject to (x − 2)² + y² = 1.
Hint: the level sets of z are circles centred at the origin.
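A quick numerical check of this example (my addition, not from the slides): parameterise the constraint circle and scan for the best value of z.

```python
import math

def z_on_constraint(theta):
    # Point on the constraint circle (x - 2)^2 + y^2 = 1
    x, y = 2 + math.cos(theta), math.sin(theta)
    return -x**2 - y**2

n = 100000
k_best = max(range(n), key=lambda k: z_on_constraint(2 * math.pi * k / n))
theta = 2 * math.pi * k_best / n
x_best, y_best = 2 + math.cos(theta), math.sin(theta)
# The constrained maximum is z = -1, at (x, y) = (1, 0).
```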
Equivalence of Constraints
• An equality constraint can be made into two inequality constraints:
g(x) = 0 ≡ g(x) ≤ 0 and −g(x) ≤ 0
• An inequality constraint can be made into one equality constraint:
h(x) ≤ 0 ≡ h(x) + b² = 0
by using a new slack variable. This version will be very useful later.
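A small sketch of the slack-variable conversion (the function h and the test point are illustrative, not from the slides):

```python
import math

def slack(h_val):
    # For h(x) <= 0, choose b = sqrt(-h(x)) so that h(x) + b^2 = 0.
    return math.sqrt(-h_val)

h = lambda x: x**2 - 4      # feasible iff -2 <= x <= 2
b = slack(h(1.0))           # h(1) = -3, so b = sqrt(3)
residual = h(1.0) + b**2    # the equality form is satisfied: residual = 0
```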
Definitions

Open Ball
An open ball with centre x* and radius r > 0 is:

B(x*, r) = {x ∈ ℝⁿ : ‖x − x*‖ < r}

Feasible Set X
The set of x̃ ∈ ℝⁿ such that g(x̃) = 0 and h(x̃) ≤ 0, i.e., the values of x that satisfy the constraints.

Constrained Local Minimum
x* ∈ X is a constrained local minimum if f(y) ≥ f(x*) for all y ∈ X ∩ B(x*, r), for some r > 0.
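These definitions translate directly into code; here is a minimal feasible-set membership test (the constraints used are illustrative, not from the slides):

```python
def feasible(x, y, tol=1e-9):
    # X = { (x, y) : g(x, y) = x + y - 1 = 0 and h(x, y) = -x <= 0 }
    return abs(x + y - 1) <= tol and -x <= tol

inside = feasible(0.25, 0.75)    # on the line, with x >= 0: feasible
outside = feasible(-1.0, 2.0)    # on the line, but x < 0: infeasible
```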
Definitions
Active Constraints
An inequality constraint is active, or binding, at x̃ if h(x̃) = 0, and inactive otherwise.

Regular Points
A point x̃ is regular for the constraints if the gradients at x̃ of the active constraints (i.e., ∇g(x̃), ∇h(x̃)) are linearly independent.

Remember that the gradient vector ∇f(x) is normal to the (relevant) level set of the function at each point.
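The gradient-normal fact can be checked numerically; this sketch (mine) uses f(x, y) = x² + y², whose level sets are circles:

```python
import math

def grad_f(x, y):
    # f(x, y) = x^2 + y^2
    return (2 * x, 2 * y)

theta = 0.7
p = (math.cos(theta), math.sin(theta))          # point on the level set f = 1
tangent = (-math.sin(theta), math.cos(theta))   # tangent to that circle at p
g = grad_f(*p)
dot = g[0] * tangent[0] + g[1] * tangent[1]     # gradient is normal to the level set, so ~0
```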
Regularisers
• We can add extra terms to an optimisation problem:

min f(x) + λR(x)

• This is normally done to make the solutions simpler or easier to find, especially for ill-posed problems
• It’s unclear how to choose λ
• Idea: add the constraints as regularisers
• Problem: might fail to satisfy any part of the solution well
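A one-dimensional sketch (mine) of the effect of λ, with the ridge-style penalty R(x) = x²; the numbers are illustrative:

```python
def regularised_min(a, b, lam):
    # Minimise (a*x - b)^2 + lam * x^2.
    # Setting the derivative to zero: 2a(ax - b) + 2*lam*x = 0,
    # so x = a*b / (a^2 + lam).
    return a * b / (a**2 + lam)

x_unreg = regularised_min(2.0, 4.0, 0.0)   # lam = 0: exact solution x = 2
x_reg = regularised_min(2.0, 4.0, 1.0)     # lam > 0 shrinks x towards 0
```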
(Relaxation: start with λ big and shrink it; this idea is used for inverse problems.)
Lagrange’s Theorem
Theorem
Consider the problem min f(x) such that g(x) = 0. Suppose that f and g are continuously differentiable functions of two variables and ∇g(x) ≠ 0 ∀x. If f has a local optimum at a point x̂ then there exists some λ ∈ ℝ such that:

∇f(x̂) = λ∇g(x̂).

This equation gives first-order necessary conditions for a constrained local optimum.
The λ is known as a Lagrange multiplier, and the function

L(x, λ) = f(x) − λg(x)

is known as the Lagrangian function.
Lagrange’s Theorem

[Figure: at a constrained optimum on the curve g(x) = 0, the vectors ∇f and ∇g point in the same direction.]
Lagrange’s Theorem
Proof.
• The graph of g(x) = 0 is a curve in ℝ².
• We can parameterise this as a function of some other variable t: x₁ = h(t), x₂ = k(t), and it will be smooth for nice functions. Let r(t) = (h(t), k(t))ᵀ.
• Now F(t) = f(r(t)) describes f along the curve g(x) = 0.
• Let x̂ = (x̂₁, x̂₂) be a point at which f has an extremum, with corresponding parameter point t̂. Then F′(t̂) = 0, since F(t̂) is an extremum of F(t).
Lagrange’s Theorem
Proof.

F′(t) = f_{x₁} dx₁/dt + f_{x₂} dx₂/dt = f_{x₁} h′(t) + f_{x₂} k′(t)

So at t = t̂:

0 = F′(t̂) = f_{x₁}(x̂) h′(t̂) + f_{x₂}(x̂) k′(t̂) = ∇f(x̂) · r′(t̂)  ⇒  ∇f(x̂) ⊥ r′(t̂),

which is a tangent vector to the curve.
∇g is also ⊥ r′(t̂), as the curve is a level set of g.
Hence ∇f and ∇g are both orthogonal to the same vector. So in 2D, they must be parallel, hence ∇f = λ∇g.
It’s still true in higher dimensions, but the proof is a bit more subtle.
Constrained Optimisation
Example
Find the constrained stationary points of:

f(x, y) = xy subject to x² + y² = 1.

Example
max −x² − y² subject to (x − 2)² + y² = 1.
Worked solution for f(x, y) = xy subject to g(x, y) = x² + y² − 1 = 0:
∇f = λ∇g gives y = 2λx and x = 2λy, so y = 4λ²y, giving λ = ±1/2 and y = ±x.
The constraint x² + y² = 1 then gives 2x² = 1, so the constrained stationary points are (±1/√2, ±1/√2), with f = 1/2 when y = x and f = −1/2 when y = −x.

Worked solution for max −x² − y² subject to (x − 2)² + y² = 1:
∇f = λ∇g gives −2x = 2λ(x − 2) and −2y = 2λy, so either λ = −1 or y = 0. Trying λ = −1 gives −2x + 2x − 4 = 0, a contradiction. So y = 0, and (x − 2)² = 1 gives x = 1 or x = 3. The maximum is f(1, 0) = −1.
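The constrained stationary points of f(x, y) = xy on the unit circle can be verified mechanically by checking the residuals of ∇f = λ∇g and the constraint (my check, not from the slides):

```python
import math

def residuals(x, y, lam):
    # f = x*y, g = x^2 + y^2 - 1: grad f = lam * grad g, and g = 0
    return (y - 2 * lam * x, x - 2 * lam * y, x**2 + y**2 - 1)

s = 1 / math.sqrt(2)
points = [(s, s, 0.5), (-s, -s, 0.5), (s, -s, -0.5), (-s, s, -0.5)]
all_stationary = all(max(abs(r) for r in residuals(*p)) < 1e-12 for p in points)
values = sorted({round(x * y, 6) for x, y, _ in points})   # f = -1/2 and 1/2
```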
Using the Lagrangian function
• It doesn’t matter how many (equality) constraints we have:

L(x, λ) = f(x) + Σᵢ₌₁ᵐ λᵢ (bᵢ − gᵢ(x))

• We can differentiate the Lagrangian with respect to each of its variables:

∂L/∂xᵢ = ∂f/∂xᵢ − Σⱼ₌₁ᵐ λⱼ ∂gⱼ/∂xᵢ

∂L/∂λᵢ = bᵢ − gᵢ(x)

• At the optimum ∇L = 0, by Lagrange’s theorem and the fact that the constraints are satisfied:

{ ∇f = Σᵢ λᵢ ∇gᵢ(x), gᵢ(x) = bᵢ } ≡ ∇L = 0
Constrained Optimisation
To solve a problem with equality constraints, turn it into an unconstrained optimisation problem with Lagrange multipliers.

Example

min 6x₁² + 4x₂² + x₃² subject to 24x₁ + 24x₂ = 360, x₃ = 1.
Worked solution: form

L = 6x₁² + 4x₂² + x₃² + λ₁(360 − 24x₁ − 24x₂) + λ₂(1 − x₃).

Stationarity gives 12x₁ = 24λ₁, 8x₂ = 24λ₁ and 2x₃ = λ₂, so x₁ = 2λ₁ and x₂ = 3λ₁.
The primal constraints give x₁ + x₂ = 15 and x₃ = 1, so 5λ₁ = 15 and λ₁ = 3, giving x₁ = 6, x₂ = 9, x₃ = 1, with f = 6·36 + 4·81 + 1 = 541.
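The solution to this example works out to (x₁, x₂, x₃) = (6, 9, 1) with f = 541 (my computation); a quick check that perturbing along the constraint surface never decreases f:

```python
def f(x1, x2, x3):
    return 6 * x1**2 + 4 * x2**2 + x3**2

base = f(6, 9, 1)   # candidate optimum value: 541
# Moves that keep 24*x1 + 24*x2 = 360 and x3 = 1:
no_better = all(f(6 + t, 9 - t, 1) >= base for t in (-2, -0.5, 0.5, 2))
```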
Summary
• The Lagrangian function matches the original objective function at feasible points (since there the constraints are satisfied). This is known as the relaxed form.
• For fixed Lagrange multipliers, an unconstrained optimum of the relaxed model L must be a stationary point.
• Stationary points of L satisfy the constraints of the original function.
• Hence, if (x̂, λ̂) is a stationary point of L(x, λ) and x̂ is an unconstrained optimum of L(x, λ̂), then x̂ is an optimum of the original equality-constrained function.
• This gives a sufficient condition for a solution.
Constrained Optimisation
Example

max z = −x₁² − x₂² − x₃² such that x₁ + x₂ + x₃ = 0, x₁ + 2x₂ + 3x₃ = 1.

In general, finding solutions is hard. But this method sets up a system for a numerical solver.
Worked solution: form

L = −x₁² − x₂² − x₃² + λ₁(x₁ + x₂ + x₃) + λ₂(1 − x₁ − 2x₂ − 3x₃).

Stationarity gives −2xᵢ + λ₁ − iλ₂ = 0, i.e. xᵢ = (λ₁ − iλ₂)/2.
The first constraint gives 3λ₁ = 6λ₂ and the second then gives λ₂ = −1, so λ₁ = −2, and the solution is x = (−1/2, 0, 1/2)ᵀ with z = −1/2.
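Solving the stationary system for this example gives x = (−1/2, 0, 1/2) with λ₁ = −2, λ₂ = −1 (my computation); the sketch below checks stationarity and feasibility:

```python
x = (-0.5, 0.0, 0.5)
lam1, lam2 = -2.0, -1.0

# dL/dx_i = -2*x_i + lam1 - i*lam2 for
# L = -(x1^2 + x2^2 + x3^2) + lam1*(x1 + x2 + x3) + lam2*(1 - x1 - 2*x2 - 3*x3)
stationary = [-2 * x[i] + lam1 - (i + 1) * lam2 for i in range(3)]
eq1 = sum(x)                           # constraint x1 + x2 + x3 = 0
eq2 = x[0] + 2 * x[1] + 3 * x[2] - 1   # constraint x1 + 2*x2 + 3*x3 = 1
```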
Inequality Constraints
• In 1D it’s easy:

max z = f(x) such that a ≤ x ≤ b

• Three possible solutions:
• a stationary point inside the interval
• a
• b
• So solve f′(x) = 0 and evaluate f(x) at each of these points, together with f(a) and f(b)
• Same in higher dimensions: constraints define a boundary between feasible solutions and infeasible ones. The optimum can be:
• On the boundary
• Inside the feasible region
• At infinity
• Equivalently, ∂L/∂xᵢ = 0 and λᵢ(bᵢ − gᵢ(x)) = 0 (so λᵢ = 0 if gᵢ(x) ≠ bᵢ).
• The second condition is called complementary slackness.
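The 1D recipe as code (the function and interval are illustrative, not from the slides):

```python
def f(x):
    # f(x) = x * (4 - x); f'(x) = 4 - 2*x, so the stationary point is x = 2
    return x * (4 - x)

a, b = 0.0, 3.0
candidates = [a, b, 2.0]          # endpoints plus the interior stationary point
x_star = max(candidates, key=f)   # x = 2, where f = 4
```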
Complementary Slackness
Example

max f(x, y) = x − y² − 1 subject to x² + y² − 1 ≤ 0

Example

max f(x, y) = −x² − y² subject to 1 ≤ x + y
Worked solution (first example): in the interior, ∇f = (1, −2y)ᵀ is never zero, so there is no interior stationary point. On the boundary, form L = x − y² − 1 + λ(1 − x² − y²):
∂L/∂x = 1 − 2λx = 0, ∂L/∂y = −2y − 2λy = 0, so y(1 + λ) = 0: either y = 0 or λ = −1.
y = 0 gives x = ±1, with f(1, 0) = 0 and f(−1, 0) = −2; λ = −1 gives x = −1/2, y² = 3/4, and f = −9/4. So the maximum is f(1, 0) = 0.

Worked solution (second example): form L = −x² − y² + λ(x + y − 1):
∂L/∂x = −2x + λ = 0, ∂L/∂y = −2y + λ = 0, ∂L/∂λ = x + y − 1 = 0.
In the interior (x + y > 1), complementary slackness forces λ = 0, so we would need ∇f = 0, i.e. (0, 0), which is infeasible: no interior stationary point. On the boundary, x = y = λ/2, so x = y = 1/2, and the maximum is f(1/2, 1/2) = −1/2.
More Inequality Constraints
• Nothing really changes with more inequality constraints, but you need to check that the other constraints hold when looking at the boundary of one.
• Inactive inequalities can be ignored by setting the corresponding λᵢ = 0; the others have to be included.
• But which ones are active? There are 2ᵐ partitions of m constraints into active and inactive!
Example

max x² − y² subject to y − 1 ≤ 0, x² − y − 1 ≤ 0
Worked solution: form L = x² − y² + λ₁(1 − y) + λ₂(1 + y − x²):
∂L/∂x = 2x − 2λ₂x = 0, ∂L/∂y = −2y − λ₁ + λ₂ = 0.
Interior (λ₁ = λ₂ = 0): the only stationary point is (0, 0), with f(0, 0) = 0.
First constraint active (y = 1): f = x² − 1 with x² ≤ 2; this increases towards the vertices.
Second constraint active (y = x² − 1): f = x² − (x² − 1)², and df/dx = 2x(3 − 2x²) = 0 gives x = 0 (where f = −1) or x² = 3/2, y = 1/2, where f = 3/2 − 1/4 = 5/4 (and y = 1/2 ≤ 1 holds).
Both constraints active (vertices): y = 1, x = ±√2, f = 1.
So the maximum is 5/4, at (±√(3/2), 1/2).
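The maximum for this example works out to 5/4 (my computation); since the constraints force x² ≤ 2, a brute-force grid over [−2, 2]² checks it from below:

```python
# Brute-force check of max x^2 - y^2 s.t. y <= 1 and x^2 - y <= 1.
best = -float("inf")
n = 400
for i in range(n + 1):
    for j in range(n + 1):
        x = -2 + 4 * i / n
        y = -2 + 4 * j / n
        if y <= 1 and x**2 - y <= 1:          # both constraints hold
            best = max(best, x**2 - y**2)
# best approaches the true maximum 5/4 from below
```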
Duality
• Every optimisation problem has a corresponding one called the Lagrange dual.
• The solution to the dual problem is a lower bound (for a minimisation) on the solution to the primal one.
• And the dual function is always concave.
• The difference between the solutions to the primal and dual problems is the duality gap.
• For convex primal problems the duality gap is 0.
Duality
• Earlier we wrote down the Lagrangian and then:
• solved for the (non-negative) Lagrange multipliers
• used them to find the values of the primal variables
• If instead we solve for the primal variables as functions of the Lagrange multipliers, we get the dual variables.
• You used this a lot when you looked at linear programming.
Duality
• To solve:

min f(x) subject to gᵢ(x) = 0, hⱼ(x) ≤ 0

• we formed the Lagrangian:

L(x, λ, ν) = f(x) + Σᵢ λᵢ gᵢ(x) + Σⱼ νⱼ hⱼ(x).

• Now the dual function is:

D(λ, ν) = inf_x L(x, λ, ν) = inf_x ( f(x) + Σᵢ λᵢ gᵢ(x) + Σⱼ νⱼ hⱼ(x) )
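A one-variable sketch of the dual function (my example): for min x² subject to 1 − x ≤ 0, the primal optimum is x = 1 with value 1.

```python
def dual(nu):
    # L(x, nu) = x^2 + nu * (1 - x); the inf over x is attained at x = nu / 2
    x = nu / 2.0
    return x**2 + nu * (1 - x)

# D(nu) = nu - nu^2 / 4 is concave, maximised at nu = 2 where D = 1:
# the duality gap is zero for this convex problem.
vals = {nu: dual(nu) for nu in (0.0, 1.0, 2.0, 3.0)}
```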
Linear Programming

[Handwritten worked example: a primal linear program and its dual.]
The Karush-Kuhn-Tucker Conditions
First-order necessary conditions for a constrained optimisation problem:

min (or max) f(x)
such that gᵢ(x) ≥ bᵢ, i ∈ G
gᵢ(x) ≤ bᵢ, i ∈ L
gᵢ(x) = bᵢ, i ∈ E

The conditions are:
1. Complementary slackness for inequality constraints
2. Sign restrictions
3. Lagrange’s gradient equation
4. The primal constraints

The sign restrictions deal with the two sets of inequalities: for minimising, the λᵢ for the L inequalities are negative and those for G are positive, and vice versa for maximising.
KKT Conditions
Example

max 2x + 7y subject to (x − 2)² + (y − 2)² = 10, x ≤ 2, 0 ≤ y ≤ 2
Improving Directions
• If we are at a point x⁽ᵏ⁾ we want to travel in an improving direction to get closer to a local optimum:

f(x⁽ᵏ⁾ + hΔx) ≈ f(x⁽ᵏ⁾) + h∇f(x⁽ᵏ⁾)ᵀΔx

• The step Δx needs to improve the current solution while remaining in the feasible set for some small h
• The direction is improving if:

∇f(x⁽ᵏ⁾)ᵀΔx < 0 for min, > 0 for max
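The improving-direction test as code, for a minimisation (the function and point are illustrative, not from the slides):

```python
def improving_for_min(grad, dx):
    # dx is an improving direction at x_k if grad_f(x_k) . dx < 0
    return sum(g * d for g, d in zip(grad, dx)) < 0

# f(x, y) = (x - 1)^2 + y^2 at the point (0, 0): grad = (-2, 0)
grad = (-2.0, 0.0)
good = improving_for_min(grad, (1.0, 0.0))    # heads towards the minimum
bad = improving_for_min(grad, (-1.0, 0.0))    # heads away from it
```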
Improving Directions
• How do we check if the step is feasible?
• Linear constraints:
Equality: aᵀx = b. x⁽ᵏ⁾ is feasible ⇒ aᵀx⁽ᵏ⁾ = b. Then aᵀ(x⁽ᵏ⁾ + hΔx) = b iff aᵀΔx = 0.
Inequality (≤; symmetric for ≥): inactive constraints are automatically satisfied for small h. For active constraints, aᵀx⁽ᵏ⁾ = b. Then aᵀ(x⁽ᵏ⁾ + hΔx) ≤ b iff aᵀΔx ≤ 0.
• Nonlinear constraints: by Taylor’s theorem,

gᵢ(x⁽ᵏ⁾ + hΔx) ≈ gᵢ(x⁽ᵏ⁾) + h∇gᵢ(x⁽ᵏ⁾)ᵀΔx

Then Δx is feasible at x⁽ᵏ⁾ to 1st order if:

∇gᵢ(x⁽ᵏ⁾)ᵀΔx ≥ 0 for active ≥ constraints, ≤ 0 for active ≤ constraints, = 0 for active equality constraints.
Improving Directions
Theorem
A solution x* is a KKT point if there are no improving feasible directions at x*.

In other words, the KKT conditions are a first-order working test of the lack of improving feasible directions.
The KKT conditions are sufficient if all the constraints are linear, or the gradients of all the active constraints are linearly independent.
Last Examples
Example

min x² + y² subject to x + y = 1, x, y ≥ 0

State the KKT conditions and show that they hold at the global optimum.

Show that Δx = (1, −1)ᵀ is an improving direction at x⁽ᵏ⁾ = (0, 1)ᵀ and that the KKT conditions have no solution there.

Example

min x² − 2x + y² + 1 subject to x + y ≤ 0, x² − 4 ≤ 0
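A check (mine) of the first example's claims at x⁽ᵏ⁾ = (0, 1): the step Δx = (1, −1) is both improving and first-order feasible.

```python
xk = (0.0, 1.0)
dx = (1.0, -1.0)

grad_f = (2 * xk[0], 2 * xk[1])                        # grad of x^2 + y^2: (0, 2)
improving = grad_f[0] * dx[0] + grad_f[1] * dx[1] < 0  # (0,2).(1,-1) = -2 < 0
# First-order feasibility: keep x + y = 1 (a = (1, 1), need a . dx = 0)
# and respect the active bound x >= 0 (need dx_1 >= 0).
feasible = (dx[0] + dx[1] == 0.0) and (dx[0] >= 0.0)
```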
Worked solution (first example): the gradient equation is (2x, 2y)ᵀ = λ(1, 1)ᵀ + μ₁(1, 0)ᵀ + μ₂(0, 1)ᵀ, together with the primal constraints x + y = 1, x ≥ 0, y ≥ 0, complementary slackness μ₁x = 0 and μ₂y = 0, and the sign restrictions μ₁, μ₂ ≥ 0 (minimising, with ≥ constraints). At the global optimum (1/2, 1/2) both bounds are inactive, so μ₁ = μ₂ = 0 and λ = 1: the KKT conditions hold.
At x⁽ᵏ⁾ = (0, 1)ᵀ, the direction Δx = (1, −1)ᵀ keeps x + y = 1 and moves into x ≥ 0, so it is feasible, and ∇f(0, 1)ᵀΔx = (0, 2)·(1, −1) = −2 < 0, so it is improving. Correspondingly the KKT system fails there: y = 1 > 0 forces μ₂ = 0, so the gradient equation gives λ = 2 and μ₁ = −2, violating the sign restriction μ₁ ≥ 0.

Worked solution (second example): f = x² − 2x + y² + 1 = (x − 1)² + y², so the unconstrained minimum (1, 0) is infeasible (1 + 0 > 0). With x + y ≤ 0 active and x² ≤ 4 inactive, ∇f = λ∇g gives 2x − 2 = λ and 2y = λ, so y = x − 1; combined with x + y = 0 this gives x = 1/2, y = −1/2, λ = −1 (negative, as required for an active ≤ constraint when minimising, and x² = 1/4 ≤ 4 holds). Taking x² = 4 active instead gives no valid KKT point. So the minimum is f(1/2, −1/2) = 1/2.