Yingwei Wang Dept. of Math, Purdue Univ [email protected]
purdue university · cs 52000
computational methods in optimization
HOMEWORK SOLUTION
Yingwei Wang
January 16, 2013
Please answer the following questions in complete sentences in a typed manuscript and submit the solution to me in class on January 17th, 2013.
Homework 1
Problem 1: Some quick theory
Show, using the definition, that the sequence 1 + k^{-k} converges superlinearly to 1.
Answer:
Let x_k = 1 + k^{-k}. Then

lim_{k→∞} (x_{k+1} − 1)/(x_k − 1)
= lim_{k→∞} (k + 1)^{-(k+1)} / k^{-k}
= lim_{k→∞} k^k / (k + 1)^{k+1}
= lim_{k→∞} [1 / (1 + 1/k)^k] · [1 / (k + 1)]
= e^{-1} · lim_{k→∞} 1 / (k + 1)
= 0.

It follows that the sequence 1 + k^{-k} converges superlinearly to 1.
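This limit is easy to confirm numerically. As a quick sanity check (a Python sketch; the range of k is an arbitrary choice), the successive error ratios |x_{k+1} − 1| / |x_k − 1| indeed tend to zero:

```python
# Check numerically that x_k = 1 + k^(-k) converges superlinearly to 1:
# the successive error ratios |x_{k+1} - 1| / |x_k - 1| should tend to 0.
errors = [float(k) ** (-k) for k in range(1, 12)]          # |x_k - 1| = k^(-k)
ratios = [errors[i + 1] / errors[i] for i in range(len(errors) - 1)]

for k, r in enumerate(ratios, start=1):
    print(f"k = {k:2d}, ratio = {r:.3e}")

# The ratios decrease monotonically toward 0 (roughly like e^(-1)/(k+1)).
assert all(ratios[i + 1] < ratios[i] for i in range(len(ratios) - 1))
assert ratios[-1] < 0.04
```

The ratio behaves like e^{-1}/(k + 1), matching the limit computed above.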
Problem 2: Raptors in space
Mr. Munroe (the xkcd author) decided that trapping you with raptors in the plane was too easy for someone that has taken this class. After all, you did come up with the solution that you should jump to escape them, didn't you?
Your new problem is to solve the generalized raptor problem: Suppose raptors are positioned at the vertices of a k-dimensional regular simplex. You are at the center, 20 m away from the vertices. One of the raptors has a bum leg. Which direction should you run to maximize your survival time?
• Ignore all acceleration, like we did in class.
• The slow raptor runs at 10 m/s
• The fast raptors run at 15 m/s
• You run at 6 m/s
• A raptor will catch you if you are within 20 centimeters.
Check out the Wikipedia page http://en.wikipedia.org/wiki/Simplex on how to find the coordinates of the raptors in a general space, or just use this implementation: http://people.sc.fsu.edu/~jburkardt/m_src/simplex_coordinates/simplex_coordinates1.m
1. Modify the raptorchase.m function to compute the survival time of a human in a three-dimensional raptor problem. Show your modified function, and show the survival time when running directly at the slow raptor.
Answer: For the three-dimensional raptor problem, the only thing we need is spherical coordinates (radius r, inclination φ, azimuth θ), where r ∈ [0,∞), φ ∈ [0, π], θ ∈ [0, 2π]:

x = r sin(φ) cos(θ),
y = r sin(φ) sin(θ),    (1)
z = r cos(φ).
The modified Matlab code, named raptorchase3.m, is the following:
function T = raptorchase3(theta,phi)
% RAPTORCHASE3 Simulate the survival time of the human in the XKCD raptor
% problem in 3 dimensions.
%
% The XKCD raptor problem is posed as follows:
%
% A human is at the center of a regular 3-simplex (a tetrahedron) whose
% vertices are 20 meters away. At each vertex is a velociraptor whose
% maximum speed is 15 m/s. One vertex has a velociraptor with a broken
% leg, which is limited to 10 m/s. What direction should you run in
% order to maximize survival time?
%
% This function uses ode45 to explicitly simulate the raptor motion and
% compute the first time when a raptor is within 20 cm of the position of
% the human.
%
% T = raptorchase3(theta,phi) returns the survival time for the angle theta and phi.
% Note that theta \in (0, 2\pi) and phi \in (0,\pi).
%
% This function is based on ideas presented by Nick Henderson at the ICME
% Open Day in May.
vhuman = 6; % human velocity in m/s
vraptor0 = 10; % slow raptor velocity in m/s
vraptor = 15; % raptor velocity in m/s
triangle_dimension = 20; % in meters
raptor_min_distance = 0.2; % a raptor within 20 cm can attack
tmax = 10; % maximum time for integration
% Add the ODE function
function dpos = change_in_positions(~,pos)
human = [pos(1) pos(2) pos(3)];
nraptors = length(pos)/3 - 1;
assert(ceil(nraptors) == nraptors) % integer operations are exact
dpos = zeros(size(pos));
dpos(1) = vhuman*sin(phi)*cos(theta);
dpos(2) = vhuman*sin(phi)*sin(theta);
dpos(3) = vhuman*cos(phi);
for i=1:nraptors
if i>1, vrap=vraptor;
else vrap=vraptor0;
end
raptor = [pos(3*i+1) pos(3*i+2) pos(3*i+3)];
tdir = (human - raptor)/norm(human-raptor);
dpos(3*i+1) = tdir(1)*vrap;
dpos(3*i+2) = tdir(2)*vrap;
dpos(3*i+3) = tdir(3)*vrap;
end
end
% Add a function to stop the ODE evaluation when the raptors get close
function [val,isterm,dir]=eaten_event(~,pos)
dir=0; % matlab bookkeeping
isterm=1;
human = [pos(1) pos(2) pos(3)];
nraptors = length(pos)/3 - 1;
assert(ceil(nraptors) == nraptors) % integer operations are exact
raptor_dist = zeros(nraptors,1);
for i=1:nraptors
raptor = [pos(3*i+1) pos(3*i+2) pos(3*i+3)];
raptor_dist(i) = norm(human-raptor);
end
val = min(raptor_dist) - raptor_min_distance; % val = 0 when captured
end
pos = 20*[zeros(3,1) simplex_coordinates1(3)];
p0 = pos(:); % unroll pos
opts = odeset('Events',@eaten_event);
sol = ode45(@change_in_positions, [0,tmax], p0, opts);
T = max(sol.x);
end
When running directly at the slow raptor, the angle is (θ = 0, φ = π/2) and the survival time is T = 1.2337.
2. Utilize a grid-search strategy to determine the best angle for the human to run to maximize the survival time. Show the angle.
Answer: Let the grid be θ = [0 : 0.01 : 2π], φ = [0 : 0.01 : π]. The best angle I found is (θ, φ) = (5.65, 1.57) and the maximal survival time is T = 1.5580.
Further, the detailed numerical results are shown in Fig. 1.
Figure 1: Survival time vs (θ, φ)
3. Discuss the major challenge for solving this problem in four dimensions. (Or, if you are feeling ambitious, solve it in 4d, and discuss what might be a problem in 5d.)
Answer: For the problem in 4-d, I think we just need the 4-d spherical coordinates:

x1 = r cos(φ1),
x2 = r sin(φ1) cos(φ2),
x3 = r sin(φ1) sin(φ2) cos(φ3),    (2)
x4 = r sin(φ1) sin(φ2) sin(φ3),

where φ1, φ2 ∈ [0, π] and φ3 ∈ [0, 2π].
For the 4-d problem, I obtained this solution:

best angle: (φ1, φ2, φ3) = (0.7, 3.1, 5.2),
survival time: T = 1.5463.
The major challenge for solving the problem in higher dimensions is the computational cost: a direction in d dimensions requires d − 1 angles, so the number of points in a fine grid search for the best angle grows exponentially with the dimension.
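To make the cost concrete, here is a small Python sketch (the step size h = 0.01 matches the grid used above) counting the grid points as the dimension grows:

```python
import math

# Size of a full angular grid with step h = 0.01. A direction in d
# dimensions needs d - 1 angles: one of range 2*pi and d - 2 of range pi
# (cf. the 4-d spherical coordinates above), so the grid grows
# exponentially with the dimension.
h = 0.01

def grid_points(d):
    n_azimuth = int(2 * math.pi / h)      # 628 points for the 2*pi angle
    n_polar = int(math.pi / h)            # 314 points for each pi angle
    return n_azimuth * n_polar ** (d - 2)

for d in [2, 3, 4, 5]:
    print(f"d = {d}: {grid_points(d):,} grid points")
```

Already in 5-d the grid has about 1.9 × 10^10 points, so a plain grid search is hopeless there.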
purdue university · cs 52000
computational methods in optimization
HOMEWORK SOLUTION
Yingwei Wang
January 22, 2013
Please answer the following questions in complete sentences in a typed manuscript and submit the solution to me on Blackboard on January 25th, 2013, by 5pm.
Homework 2: Convexity
Convex functions are all the rage these days, and one of the interests of students in this class. You may have to read a bit about convexity on Wikipedia or in the book.
Problem 1
Let's do some matrix analysis to show that a function is convex. Solve problem 2.7 in the textbook, which is:
Suppose that f(x) = x^T Q x, where Q is an n × n symmetric positive semi-definite matrix. Show that this function is convex using the definition of convexity, which can be equivalently reformulated:

f(y + α(x − y)) − αf(x) − (1 − α)f(y) ≤ 0

for all 0 ≤ α ≤ 1 and all x, y ∈ R^n.
This type of function will frequently arise in our subsequent studies, so it's an important one to understand.
Answer: Let α ∈ [0, 1] and x, y ∈ R^n, then compute

f(y + α(x − y)) − αf(x) − (1 − α)f(y)
= (y + α(x − y))^T Q (y + α(x − y)) − α x^T Q x − (1 − α) y^T Q y
= α(α − 1) (x^T Q x − x^T Q y − y^T Q x + y^T Q y)
= α(α − 1) (x − y)^T Q (x − y).

We know that Q is an n × n symmetric positive semi-definite matrix. Thus,

(x − y)^T Q (x − y) ≥ 0, ∀x, y ∈ R^n.

Besides,

α(α − 1) ≤ 0, ∀α ∈ [0, 1].

It follows that

f(y + α(x − y)) − αf(x) − (1 − α)f(y) ≤ 0

for all 0 ≤ α ≤ 1 and all x, y ∈ R^n, which means f(x) = x^T Q x is a convex function.
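The identity derived above can be spot-checked numerically; the following Python/NumPy sketch (the random PSD matrix Q and points x, y are arbitrary choices) verifies both the identity and the sign:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5
B = rng.standard_normal((n, n))
Q = B.T @ B                          # symmetric positive semi-definite
f = lambda v: v @ Q @ v

x, y = rng.standard_normal(n), rng.standard_normal(n)
for alpha in np.linspace(0.0, 1.0, 11):
    lhs = f(y + alpha * (x - y)) - alpha * f(x) - (1 - alpha) * f(y)
    rhs = alpha * (alpha - 1) * (x - y) @ Q @ (x - y)
    assert abs(lhs - rhs) < 1e-9     # the algebraic identity above
    assert lhs <= 1e-12              # the convexity inequality
```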
Problem 2: Convexity and least squares
1. Show that f(x) = ‖b − Ax‖^2 is a convex function. Feel free to use the result proved on the last homework.
Answer: First, let b = 0 and consider the function

f0(x) = ‖Ax‖^2 = x^T A^T A x.

It is easy to see that the matrix A^T A is symmetric positive semi-definite, since

(A^T A)^T = A^T A,
x^T A^T A x = ‖Ax‖^2 ≥ 0, ∀x ∈ R^n.

By the conclusion of the previous problem, the function f0(x) is convex.

Second, after some simple algebra (the constant and linear terms in b cancel out of the convex combination), we find that

f(y + α(x − y)) − αf(x) − (1 − α)f(y) = f0(y + α(x − y)) − αf0(x) − (1 − α)f0(y)

for all 0 ≤ α ≤ 1 and all x, y ∈ R^n. It follows that f(x) = ‖b − Ax‖^2 is also a convex function.
2. Show that the null-space of a matrix is a convex set.
Answer: Let A be any m × n matrix and its null-space be

ker(A) = {x ∈ R^n : Ax = 0}.

Let x, y ∈ ker(A); then

Ax = 0, Ay = 0
⇒ A(αx + (1 − α)y) = αAx + (1 − α)Ay = 0, ∀α ∈ [0, 1],

which means αx + (1 − α)y ∈ ker(A). It follows that ker(A) is a convex set.
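A tiny numerical illustration (Python sketch; the matrix A and the null-space vectors are arbitrary choices): convex combinations of null-space vectors stay in the null space.

```python
import numpy as np

A = np.array([[1.0, 2.0, 3.0]])          # 1x3, null space has dimension 2
x = np.array([ 3.0, 0.0, -1.0])          # A @ x = 0
y = np.array([-2.0, 1.0,  0.0])          # A @ y = 0
assert np.allclose(A @ x, 0) and np.allclose(A @ y, 0)

for alpha in np.linspace(0, 1, 5):
    z = alpha * x + (1 - alpha) * y      # convex combination
    assert np.allclose(A @ z, 0)         # still in ker(A)
```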
purdue university · cs 52000
computational methods in optimization
HOMEWORK SOLUTION
Yingwei Wang
February 1, 2013
Homework 3: Optimality and Constraints
Problem 0: List your collaborators.
Please identify anyone, whether or not they are in the class, with whom you discussed your homework. This problem is worth 1 point, but on a multiplicative scale. (Note that collaboration is not allowed on the bonus question below.)
Answer: I guarantee that all of the homework was done by myself, with no discussion or collaboration with others. By the way, I really enjoy that, although it is time-consuming.
Problem 1: Optimization software and optimality
We'll frequently be using software to optimize functions; this question will help familiarize you with two pieces of software: Poblano and the Matlab optimization toolbox.
The function we'll study is the Rosenbrock function:

f(x) = 100(x2 − x1^2)^2 + (1 − x1)^2.
I briefly talked about this function in class and called it the banana function. Now it's your turn to look at it!
1. Show a contour plot of this function
Answer: The contour and 3-d plots of the Rosenbrock function are shown in Figs. 1-2.
2. Write the gradient and Hessian of this function.
Answer: The gradient and Hessian of the Rosenbrock function are

g(x) = [ −400(x2 − x1^2)x1 − 2(1 − x1) ; 200(x2 − x1^2) ],

H(x) = [ 1200x1^2 − 400x2 + 2, −400x1 ; −400x1, 200 ].
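These formulas can be verified against central finite differences; a Python sketch (the test point and step size are arbitrary choices):

```python
import numpy as np

def f(x):
    return 100 * (x[1] - x[0] ** 2) ** 2 + (1 - x[0]) ** 2

def g(x):
    return np.array([-400 * (x[1] - x[0] ** 2) * x[0] - 2 * (1 - x[0]),
                     200 * (x[1] - x[0] ** 2)])

def H(x):
    return np.array([[1200 * x[0] ** 2 - 400 * x[1] + 2, -400 * x[0]],
                     [-400 * x[0], 200.0]])

x = np.array([-1.2, 1.0])
h = 1e-6
# central differences for the gradient of f and for each column of H
g_fd = np.array([(f(x + h * e) - f(x - h * e)) / (2 * h) for e in np.eye(2)])
H_fd = np.column_stack([(g(x + h * e) - g(x - h * e)) / (2 * h) for e in np.eye(2)])
assert np.allclose(g(x), g_fd, atol=1e-3)
assert np.allclose(H(x), H_fd, atol=1e-3)
```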
3. By inspection, what is the minimizer of this function? (Feel free to find the answer by other means, e.g. looking it up, but make sure you explain why you know that answer must be a global minimizer.)
Answer: It is obvious that if x∗ = (1, 1)^t then f(x∗) = 0. Besides,

f(x) ≥ 0, ∀x ∈ R^2.
Figure 1: Contour of Rosenbrock function

Figure 2: Plot of Rosenbrock function
It implies that x∗ = (1, 1)^t is the global minimizer of this Rosenbrock function. Furthermore, x∗ = (1, 1)^t is a strict global minimizer of the Rosenbrock function. The reason is shown as follows.
g(x∗) = 0,
H(x∗) = [ 802 −400 ; −400 200 ].

The (numerical) eigenvalues of H(x∗) are λ1 = 0.399360767487622 and λ2 = 1001.60063923251, which indicates that H(x∗) ≻ 0.
4. Explain how any optimization package could tell that your solution is a local minimizer.

Answer: The package can check the first- and second-order optimality conditions at the computed point: the gradient is (numerically) zero and the Hessian is positive definite, which certifies a strict local minimizer. Note that since the Rosenbrock function is non-convex (see Fig. 2), a convex solver such as CVX does not work in this case; Poblano can be employed here.
5. Use Poblano to optimize this function starting from a few different points. Be adversarial if you wish. Does it always get the answer correct? Use a table or figure to illustrate your findings if appropriate. Show your code to use Poblano. Your code should have comments to explain what it is doing and you should explain any difficulties in implementing this test.
Answer: In order to call the subroutines in Poblano, we need to write the function named rosenbrock.m:
function [f,g,h] = rosenbrock(x)
%%% This function returns the function value, partial derivatives
%%% and Hessian of the rosenbrock function, given by
%%% f(x1,x2) = 100*(x2-x1^2)^2 + (1-x1)^2.
%% Rosenbrock "banana" function
f = 100*(x(2)-x(1)^2)^2 + (1-x(1))^2;
%% gradient
g=[-400*(x(2)-x(1)^2)*x(1)-2*(1-x(1)); 200*(x(2)-x(1)^2)];
%% hessian
if nargout > 2
h=[1200*x(1)^2-400*x(2)+2, -400*x(1); -400*x(1), 200];
end
I choose the NCG (nonlinear conjugate gradient) method, corresponding to the function ncg in Poblano. Also, I choose three different starting points: (−3,−5)^t, (0, 0)^t and (20, 10)^t. The results are given as follows.
I. Initial guess x0 = (−3,−5)t, which is near the true solution.
The output of NCG method is
>> out = ncg(@rosenbrock,[-3,-5]','TraceX',true)
Iter FuncEvals F(X) ||G(X)||/N
------ --------- ---------------- ----------------
0 1 19616.00000000 8519.81314349
1 9 2019.81456799 455.63650412
2 14 2.07319311 1.99427728
3 18 2.04100794 1.30842367
... ...
22 110 0.00000971 0.02788423
23 114 0.00000000 0.00126552
24 117 0.00000000 0.00000551
out =
Params: [1x1 inputParser]
ExitFlag: 0
ExitDescription: 'Successful termination based on StopTol'
X: [2x1 double]
F: 3.31201541282898e-013
G: [2x1 double]
FuncEvals: 117
Iters: 24
TraceX: [2x25 double]
It shows that a solution with function value about 10^−13 can be found after 117 function evaluations. Besides, I also plot out.TraceX in Fig. 3.
Figure 3: NCG iteration starting from (−3,−5)^t.
According to Wikipedia [2], adaptive coordinate descent starting from (−3,−5)^t finds a solution with function value 10^−10 after 325 function evaluations. So the NCG method is better than adaptive coordinate descent in this case.
II. Initial guess x0 = (0, 0)^t, which is in the long, narrow, parabolic-shaped flat valley, i.e. x2 = x1^2 (see Fig. 1).
The output of NCG method is
>> out = ncg(@rosenbrock,[0,0]','TraceX',true)
Iter FuncEvals F(X) ||G(X)||/N
------ --------- ---------------- ----------------
0 1 1.00000000 1.00000000
1 7 0.77110969 2.60058899
... ...
12 58 0.00000003 0.00007091
13 60 0.00000000 0.00000599
out =
Params: [1x1 inputParser]
ExitFlag: 0
ExitDescription: 'Successful termination based on StopTol'
X: [2x1 double]
F: 7.1901609098776e-014
G: [2x1 double]
FuncEvals: 60
Iters: 13
TraceX: [2x14 double]
The detailed iteration process is very similar to the first case (just the last half part). I just want to mention the results given by others [3]:
*****************************************
Solution from the steepest descent:
x= 1.0000 1.0000
f(x1,x2)= 1.6983e-011
Iterations: 8147
Solution from the regularized steepest descent:
x= 1.0000 1.0000
f(x1,x2)= 6.4558e-014
Iterations: 194
Solution from the conjugate gradient:
x= 1.0000 1.0000
f(x1,x2)= 1.0418e-023
Iterations: 21
Solution from Quasi-Newton Rank 2:
x= 1.0000 1.0000
f(x1,x2)= 1.3264e-012
Iterations: 151
*****************************************
It also implies that the (nonlinear) conjugate gradient method is better than the (regularized) steepest descent method.
III. Initial guess x0 = (20, 10)^t, which is far from the true solution. The output of the NCG method is
>> out = ncg(@rosenbrock,[20,10]','TraceX',true)
Iter FuncEvals F(X) ||G(X)||/N
------ --------- ---------------- ----------------
0 1 15210361.00000000 1560506.41791727
1 9 7134.16612179 2582.53033913
2 17 17.34464550 19.59251511
3 21 17.25008445 0.65048548
4 24 16.55446511 41.85639750
5 32 15.66275780 60.65921255
... ....
28 148 0.00005408 0.00330572
29 150 0.00000019 0.00963675
30 152 0.00000000 0.00000849
out =
Params: [1x1 inputParser]
ExitFlag: 0
ExitDescription: 'Successful termination based on StopTol'
X: [2x1 double]
F: 3.75606726452345e-011
G: [2x1 double]
FuncEvals: 152
Iters: 30
TraceX: [2x31 double]
Also, I want to mention the results given by others [3]:
*****************************************
Solution from the steepest descent:
x= 1.0000 1.0000
f(x1,x2)= 1.8201e-011
Iterations: 31006
Solution from the regularized steepest descent:
x= 9.7544 95.1517
f(x1,x2)= 7.6641e+001
Iterations: 50000
Solution from the conjugate gradient:
x= 1.0000 1.0000
f(x1,x2)= 4.7100e-017
Iterations: 35
Solution from Quasi-Newton Rank 2:
x= 1.0000 1.0000
f(x1,x2)= 3.7352e-012
Iterations: 471
*****************************************
It verifies again that the CG and NCG methods are very good.
6. Read about Matlab's fminunc function and determine how to provide it with Hessian and gradient information. Use this toolbox to optimize the function starting from a few different points. Show your code to use this function and explain any differences (if any) you observe in comparison to using Poblano.
Answer: The following Matlab code shows how to use fminunc to solve this problem, including how to provide it with Hessian and gradient information.
%% indicate gradient is provided and display iteration
options = optimset('GradObj','on','Hessian','on','Display','iter');
%% use fminunc to find the minimizer
[x,fval,exitflag,output] = fminunc(@rosenbrock,[-3,-5]’,options);
Again, let us consider two starting points: (−3,−5)t and (20, 10)t.
I. Initial guess x0 = (−3,−5)t, which is near the true solution.
The output of fminunc in Matlab is
Norm of First-order
Iteration f(x) step optimality CG-iterations
0 19616 1.68e+004
1 372.862 10 2e+003 1
2 13.0702 1.84663 7.33 1
3 13.0702 18.9087 7.33 1
4 13.0702 4.72718 7.33 0
5 11.6963 1.1818 48.4 0
... ...
32 0.000972228 0.105897 0.913 1
33 4.39254e-005 0.0311832 0.0562 1
34 1.92081e-007 0.0139071 0.0149 1
35 2.8556e-012 0.000510656 1.49e-005 1
II. Initial guess x0 = (20, 10)^t, which is far from the true solution. The output of fminunc in Matlab is
Norm of First-order
Iteration f(x) step optimality CG-iterations
0 1.52104e+007 3.12e+006
1 2.81507e+006 10 9.14e+005 1
2 363863 20 2.38e+005 1
3 2994.98 40 1.97e+004 1
4 65.4375 5.27646 16.4 1
5 65.4375 80 16.4 1
... ...
56 0.000935215 0.0323693 0.184 1
57 7.09916e-005 0.0655248 0.334 1
58 5.93496e-008 0.00258142 0.00135 1
59 3.57116e-013 0.000540531 2.37e-005 1
It seems that more iterations are needed in Matlab than in Poblano.
Problem 2: Log-barrier terms
The basis of a class of methods known as interior point methods is that we can handle non-negativity constraints such as x ≥ 0 by solving a sequence of unconstrained problems where we add the function b(x; µ) = −µ Σ_i log(x_i) to the objective. Thus, we convert

minimize f(x) subject to x ≥ 0    (1)

into

minimize f(x) + b(x; µ).    (2)
1. Explain why this idea could work. (Hint: there's a very useful picture you should probably show here!)
Answer: I think there are at least three key things behind this idea. Suppose x∗ is the minimizer of problem (2).

First, x∗ must be a (strictly) feasible solution to problem (1), since b(x; µ) → +∞ as any x_i → 0^+, so the barrier keeps the iterates in the interior.

Second, the log-barrier function b(x; µ) is smooth and strictly convex on the interior of the feasible set.

Third, if 0 < µ ≪ 1, then x∗ is very close to the minimizer of the original problem (1).
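A one-dimensional toy illustration of the third point (my own example, not from the assignment): minimize (x + 1)^2 subject to x ≥ 0, whose solution is x∗ = 0. The barrier problem has a closed-form minimizer that tends to 0 as µ → 0^+:

```python
import math

# Toy problem: minimize (x+1)^2 subject to x >= 0; the minimizer is x* = 0.
# Barrier problem: minimize (x+1)^2 - mu*log(x) over x > 0.
# Setting the derivative to zero: 2(x+1) - mu/x = 0  =>  2x^2 + 2x - mu = 0,
# whose positive root is x(mu) = (-1 + sqrt(1 + 2*mu)) / 2.
def barrier_minimizer(mu):
    return (-1.0 + math.sqrt(1.0 + 2.0 * mu)) / 2.0

for mu in [1.0, 0.1, 0.01, 0.001]:
    x = barrier_minimizer(mu)
    assert x > 0            # strictly feasible for every mu > 0
    print(f"mu = {mu:7.3f}, x(mu) = {x:.6f}")
```

As µ shrinks, x(µ) approaches the constrained solution 0 while staying strictly feasible.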
2. Write a matrix expression for the gradient and Hessian of f(x) + b(x; µ) in terms of the gradient vector g(x) and the Hessian matrix H(x) of f.
Answer: Since

b(x; µ) = −µ Σ_i log(x_i)
⇒ ∂b(x; µ)/∂x_i = −µ x_i^{-1}
⇒ ∂^2 b(x; µ)/∂x_i ∂x_j = µ x_i^{-2} δ_ij,

it follows that the gradient and Hessian of f(x) + b(x; µ) are

∇(f(x) + b(x; µ)) = g(x) − µ x^{-1},
∇^2(f(x) + b(x; µ)) = H(x) + µ diag(x^{-2}),

where g(x) and H(x) are the gradient vector and Hessian matrix of f respectively, x^{-1} = (x_1^{-1}, x_2^{-1}, …, x_n^{-1})^T, and diag(x^{-2}) is the diagonal matrix with x_i^{-2} on the diagonal.
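The barrier derivatives can be checked by finite differences as well (Python sketch; the test point is an arbitrary strictly positive vector):

```python
import numpy as np

mu = 0.5
b = lambda x: -mu * np.sum(np.log(x))          # log-barrier term
grad_b = lambda x: -mu / x                     # -mu * x^{-1}, elementwise
hess_b = lambda x: mu * np.diag(x ** -2.0)     # mu * diag(x^{-2})

x = np.array([0.5, 1.0, 2.0])                  # any strictly positive point
h = 1e-6
I = np.eye(3)
g_fd = np.array([(b(x + h * e) - b(x - h * e)) / (2 * h) for e in I])
H_fd = np.column_stack([(grad_b(x + h * e) - grad_b(x - h * e)) / (2 * h) for e in I])
assert np.allclose(grad_b(x), g_fd, atol=1e-6)
assert np.allclose(hess_b(x), H_fd, atol=1e-4)
```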
Problem 3: Do constraints always make a problem hard?
Let f(x) = (1/2)x^T Q x − x^T c. Recall that if Q is positive semi-definite then f(x) is convex.
1. Construct and illustrate an example in R^2 to show that f(x) is non-convex if Q is indefinite.
Answer: Without loss of generality, we can assume that c = 0. Let

Q = [ 1 0 ; 0 −1 ],

then

f(x) = (1/2)x^T Q x = (1/2)(x1^2 − x2^2).

Choose x = (0,−1)^t, y = (0, 1)^t. On one hand,

f(x) = f(y) = −1/2
⇒ αf(x) + (1 − α)f(y) = −1/2, ∀α ∈ (0, 1).

On the other hand,

f((1/2)x + (1/2)y) = f(0) = 0 > −1/2.

It follows that f(x) is not convex. The graph of the function f(x) = (1/2)(x1^2 − x2^2) is shown in Fig. 4.
2. Now, suppose we consider the problem:

minimize f(x)
subject to Ax = b.

For this problem, show that any local minimizer is a global solution, even if f(x) is non-convex.
Proof. Assume A has full row rank. The gradient vector g(x) and the Hessian matrix H(x) of f are

g(x) = Qx − c,
H(x) = Q.

Suppose x∗ is a local minimizer. The first-order optimality condition for the equality-constrained problem gives

g(x∗) = A^T λ for some multiplier λ, i.e. Qx∗ − c = A^T λ, (3)

and the second-order necessary condition gives

p^T Q p ≥ 0 for every p with Ap = 0. (4)
If x is another feasible solution, i.e. Ax = b, and p = x∗ − x, then Ap = 0. Substituting x = x∗ − p into f(x) yields

f(x) = (1/2)(x∗ − p)^T Q (x∗ − p) − (x∗ − p)^T c
= f(x∗) + (1/2)p^T Q p − p^T (Qx∗ − c)
= f(x∗) + (1/2)p^T Q p, since (3) implies p^T (Qx∗ − c) = 0 whenever Ap = 0,
≥ f(x∗), by (4).

It follows that for this quadratic programming problem, a local minimizer, if it exists, is also a global minimizer.

Figure 4: Hyperbolic paraboloid f(x) = (1/2)(x1^2 − x2^2)
1. Is there always a local minimizer? Prove that there is, or show a counter-example. (Hint: there may be some ambiguity in this problem; if so, I'm looking for a discussion and what you think the right answer is, so your reasoning is more important than your actual answer.)
Answer: No. For example, there is no local minimizer for the function f(x) = (1/2)(x1^2 − x2^2); see Fig. 4.
2. Write down and test an algorithm to solve these problems and find a global minimizer.
Answer: Let A ∈ R^{m×n}, m ≤ n, with full rank m, let x ∈ R^n be the unknowns and λ ∈ R^m be the associated Lagrange multipliers. In order to solve this problem, we just need to solve the KKT system

[ Q A^T ; A 0 ] [ x ; λ ] = [ c ; b ].    (5)
10
Yingwei Wang Dept. of Math, Purdue Univ [email protected]
I think we should also assume that Q ∈ R^{n×n} has full rank.
Of course, it is usually costly to directly solve the (m+n) × (m+n) linear system (5). People have proposed many efficient ways to solve (5) (see Chapter 16 of the textbook [1]); in my mind, Krylov subspace methods are appropriate candidates.
However, the numerical results shown here are obtained by backslash in Matlab. I just want to test the problem f(x) = (1/2)(x1^2 − x2^2) with different constraints.
Q = [ 1 0 ; 0 −1 ],  c = [ 0 ; 0 ].    (6)
I. Choose

A = [ 1 0 ],  b = 1.    (7)

Then the solution is

x∗ = [1, 0]^t, λ∗ = −1.
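This value can be reproduced by assembling and solving the KKT system (5) directly; here is a Python/NumPy version of the same computation that Matlab's backslash performs:

```python
import numpy as np

# KKT system (5) for Q = [[1,0],[0,-1]], c = 0, A = [1 0], b = 1.
Q = np.array([[1.0, 0.0], [0.0, -1.0]])
A = np.array([[1.0, 0.0]])
c = np.zeros(2)
b = np.array([1.0])

K = np.block([[Q, A.T], [A, np.zeros((1, 1))]])
rhs = np.concatenate([c, b])
sol = np.linalg.solve(K, rhs)        # [x1, x2, lambda]

x, lam = sol[:2], sol[2]
assert np.allclose(x, [1.0, 0.0])
assert np.isclose(lam, -1.0)
```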
II. Choose

A = [ 1 1 ],  b = 1.    (8)

Then the solution from Matlab is

x∗ = [NaN, Inf]^t, λ∗ = Inf.
Why do I show these two numerical tests? Well, I just want to conclude that without constraints there is no minimizer for f(x) = (1/2)(x1^2 − x2^2); with constraints, either a global minimizer exists or there is still no minimizer. (In fact, one should also check the second-order condition in case I: for p = (0, 1)^t in the null space of A we have p^T Q p = −1 < 0, so the stationary point found from (5) is not guaranteed to be a minimizer.)
Problem 4: Constraints can make a non-smooth problem smooth.
Show that
minimize ‖Ax− b‖∞
can be reformulated as a smooth, constrained optimization problem.
Answer: The equivalent smooth optimization problem of the infinity-norm minimization problem can be obtained by minimizing an auxiliary variable t ∈ R with inequality constraints:
minimize t
subject to ‖Ax − b‖∞ ≤ t.

Further, the infinity-norm constraint can be written as a set of linear inequalities, and the problem becomes:

minimize t
subject to −te ≤ Ax − b ≤ te,

where e is the vector with 1 in all entries.
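To see that the linear inequalities really encode the infinity-norm constraint, here is a small Python/NumPy sketch (random data of my own choosing) checking the equivalence ‖Ax − b‖∞ ≤ t ⇔ −te ≤ Ax − b ≤ te:

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((4, 3))
b = rng.standard_normal(4)
e = np.ones(4)

for _ in range(100):
    x = rng.standard_normal(3)
    t = rng.uniform(0.0, 3.0)
    r = A @ x - b
    infnorm_ok = np.max(np.abs(r)) <= t          # original constraint
    linear_ok = np.all(-t * e <= r) and np.all(r <= t * e)  # linear version
    assert infnorm_ok == linear_ok
```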
References
[1] Jorge Nocedal and Stephen J. Wright. Numerical Optimization. Springer Series in Operations Research and Financial Engineering. Springer, Berlin-Heidelberg, 2nd edition, 2008.

[2] Wikipedia. Rosenbrock function. http://en.wikipedia.org/wiki/Rosenbrock_function, 2013.

[3] M. Zhou. GG7920 Homework 2. http://utam.gg.utah.edu/~u0027410/course/gg6920/hw2/hw2.htm, 2005.
purdue university · cs 52000
computational methods in optimization
HOMEWORK SOLUTION
Yingwei Wang
February 7, 2013
Homework 4
Problem 0: List your collaborators.
Please identify anyone, whether or not they are in the class, with whom you discussed your homework. This problem is worth 1 point, but on a multiplicative scale. (Note that collaboration is not allowed on the bonus question below.)
Answer: I guarantee that all of the homework was done by myself, with no discussion or collaboration with others. By the way, I really enjoy that, although it is time-consuming.
Problem 1: Steepest descent
(Nocedal and Wright, Exercise 3.6) Let's conclude with a quick problem to show that steepest descent can converge very rapidly! Consider the steepest descent method with exact line search for the function f(x) = (1/2)x^T Q x − x^T b. Suppose that we know x0 − x∗ is parallel to an eigenvector of Q. Show that the method will converge in a single iteration.
Answer: Let us suppose that

f(x) = (1/2)x^T Q x − x^T b,    (1)

where Q is symmetric and positive definite. The steepest descent iteration for (1) is given by

x_{k+1} = x_k − ( ∇f_k^T ∇f_k / ∇f_k^T Q ∇f_k ) ∇f_k,    (2)

where ∇f_k = Qx_k − b. For k = 0, we have

x_1 = x_0 − ( ∇f_0^T ∇f_0 / ∇f_0^T Q ∇f_0 ) ∇f_0.    (3)

Suppose

x_0 − x∗ = γ y_k,    (4)

where γ is a constant and y_k is a normalized eigenvector of Q, i.e.

Q y_k = λ_k y_k,  y_k^T y_k = 1.    (5)

It follows that

∇f_0 = Qx_0 − b = Q(x∗ + γ y_k) − b = Qx∗ − b + γ λ_k y_k.

Since x∗ is the minimizer, ∇f(x∗) = Qx∗ − b = 0. Then

∇f_0 = γ λ_k y_k.    (6)

Substituting (6) into (3) yields

x_1 = x_0 − (γ^2 λ_k^2 / γ^2 λ_k^3) γ λ_k y_k = x_0 − γ y_k = x∗.

It follows that the method will converge in a single iteration.

Remark: the algebra only uses Q y_k = λ_k y_k with λ_k > 0 and ∇f(x∗) = 0, so the same computation also goes through when Q is merely symmetric with λ_k > 0 and x∗ is a stationary point.
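The one-step convergence is easy to reproduce numerically (Python sketch; Q, b and the starting offset are arbitrary choices):

```python
import numpy as np

Q = np.array([[3.0, 1.0], [1.0, 2.0]])     # symmetric positive definite
b = np.array([1.0, 0.0])
x_star = np.linalg.solve(Q, b)             # minimizer: Q x* = b

lam, Y = np.linalg.eigh(Q)
x0 = x_star + 0.7 * Y[:, 0]                # start along an eigenvector of Q

g = Q @ x0 - b                             # gradient at x0
alpha = (g @ g) / (g @ Q @ g)              # exact line search step
x1 = x0 - alpha * g

assert np.allclose(x1, x_star)             # converged in one iteration
```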
Problem 2: Inequality constraints
Draw a picture of the feasible region for the constraints:

1 − x1 − x2 ≥ 0,
1 − x1 + x2 ≥ 0,
1 + x1 − x2 ≥ 0,
1 + x1 + x2 ≥ 0.
Answer: The feasible region of these inequality constraints is shown in Fig. 1.
Figure 1: The feasible region of the inequality constraints
Problem 3: Necessary and sufficient conditions
Let f(x) = (1/2)x^T Q x − x^T c.
1. Write down the necessary conditions for the problem:
minimize f(x)
subject to x ≥ 0.
Answer: The necessary conditions for this problem are

g(x∗) − λ = 0,
λ_i ≥ 0, x∗_i ≥ 0, and either λ_i = 0 or x∗_i = 0,

where x∗ is the minimizer and g(x∗) = Qx∗ − c.
2. Write down the sufficient conditions for the same problem.
Answer: The sufficient conditions for this problem are

g(x∗) − λ = 0,
λ_i ≥ 0, x∗_i ≥ 0, and either λ_i = 0 or x∗_i = 0,
Q ≻ 0,

where x∗ is the minimizer and g(x∗) = Qx∗ − c.
3. Consider the two-dimensional case with

Q = [ 1 2 ; 2 1 ],  c = [ 0 ; −1.5 ].
Determine the solution to this problem by any means you can, and justify your work.
Answer: In this case, f(x) = 0.5(x1^2 + 4x1x2 + x2^2) + 1.5x2. Then the Lagrangian function is

L(x, λ) = 0.5(x1^2 + 4x1x2 + x2^2) + 1.5x2 − λ1 x1 − λ2 x2.    (7)

Now we need to solve this system:

∂L/∂x1 = x1 + 2x2 − λ1 = 0,
∂L/∂x2 = 2x1 + x2 + 1.5 − λ2 = 0,
λ1 ≥ 0, x1 ≥ 0, and either λ1 = 0 or x1 = 0,
λ2 ≥ 0, x2 ≥ 0, and either λ2 = 0 or x2 = 0.

The solution is

x∗_1 = 0, x∗_2 = 0, λ1 = 0, λ2 = 1.5.

It follows that the minimizer of this problem is x∗ = (0, 0)^t.
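The KKT conditions at this candidate point can be verified mechanically (a small Python sketch of the check):

```python
import numpy as np

Q = np.array([[1.0, 2.0], [2.0, 1.0]])
c = np.array([0.0, -1.5])

x = np.array([0.0, 0.0])                 # candidate minimizer
lam = np.array([0.0, 1.5])               # multipliers for x >= 0

g = Q @ x - c                            # gradient of f at x
assert np.allclose(g - lam, 0)           # stationarity: g(x*) = lambda
assert np.all(lam >= 0) and np.all(x >= 0)
assert np.allclose(lam * x, 0)           # complementarity
```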
4. Produce a Matlab or hand illustration of the solution showing the function contours, gradient, and the constraint normal. What are the active constraints at the solution? What is the value of λ in A^T λ = g?
Answer: The function contours and gradient are shown in Fig. 2, and λ = (0, 1.5)^t.
Both x1 ≥ 0 and x2 ≥ 0 are active constraints. The reason is that if there were no constraint x1 ≥ 0, then setting x2 → +∞ and x1 = −2x2 would lead to f(x) → −∞, while if there were no constraint x2 ≥ 0, then setting x1 → +∞ and x2 = −0.5x1 would also lead to f(x) → −∞.

Further, since in this case Q is not positive definite, without the constraints the minimizer might not exist.
Figure 2: The contour and gradient of this function
purdue university · cs 52000
computational methods in optimization
HOMEWORK SOLUTION
Yingwei Wang
February 21, 2013
Homework 5
Problem 0: List your collaborators.
Please identify anyone, whether or not they are in the class, with whom you discussed your homework. This problem is worth 1 point, but on a multiplicative scale. (Note that collaboration is not allowed on the bonus question below.)
Answer: I guarantee that all of the homework was done by myself, with no discussion or collaboration with others. By the way, I really enjoy that, although it is time-consuming.
Problem 1: Make it an LP
Show that we can solve:

minimize ‖x‖1
subject to Ax = b

by constructing an LP in standard form. This problem is preparation for a mini-project that isn't quite ready yet.
Answer: It is obvious that this problem is equivalent to

minimize Σ_{i=1}^n y_i
subject to Ax = b,
−y ≤ x ≤ y.

It follows that

minimize Σ_{i=1}^n y_i
subject to Ax = b,
y − x ≥ 0,
y + x ≥ 0,
y ≥ 0.

Let t = y − x, s = y + x; then x = (s − t)/2, y = (s + t)/2. Besides, let

x = [ y ; s ; t ],  c = (1, …, 1, 0, …, 0, 0, …, 0)^t,

A = [ 0 A/2 −A/2 ; 2I −I −I ],  b = [ b ; 0 ].
Now the standard form of the LP is

minimize c^t x
subject to Ax = b, x ≥ 0.
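As a sanity check on this construction (Python/NumPy sketch with small random data of my own choosing): any x with Ax = b maps to a feasible point of the standard form whose objective value equals ‖x‖1.

```python
import numpy as np

rng = np.random.default_rng(2)
m, n = 2, 4
A = rng.standard_normal((m, n))
x = rng.standard_normal(n)
b = A @ x                                  # so that A x = b holds

# Variables of the standard form: (y, s, t) with s = y + x, t = y - x.
y = np.abs(x)                              # the tightest choice of y
s, t = y + x, y - x
z = np.concatenate([y, s, t])

A_tilde = np.block([[np.zeros((m, n)), A / 2, -A / 2],
                    [2 * np.eye(n), -np.eye(n), -np.eye(n)]])
b_tilde = np.concatenate([b, np.zeros(n)])
c_tilde = np.concatenate([np.ones(n), np.zeros(2 * n)])

assert np.all(z >= -1e-12)                 # nonnegativity
assert np.allclose(A_tilde @ z, b_tilde)   # equality constraints
assert np.isclose(c_tilde @ z, np.abs(x).sum())   # objective = ||x||_1
```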
Problem 2
Using the codes from class, illustrate the behavior of the simplex method on the LP from problem 13.9 in Nocedal and Wright:
minimize −5x1 − x2
subject to x1 + x2 ≤ 5    (1)
2x1 + (1/2)x2 ≤ 8
x ≥ 0

starting at [0, 0]^T after converting the problem to standard form. Use your judgement in reporting the behavior of the method.
Answer: Method I: Use linprog.m to solve the problem (1).
c = [-5 -1]’;
A = [1 1; 2 1/2];
b = [5 8]’;
lb = zeros(size(c));
options = optimset('LargeScale','off','Simplex','on','Display','iter');
[x,fval,exitflag,output,lambda] = linprog(c,A,b,[],[],lb,[],[0 0]’,options)
The results are as follows:
Phase 2: Minimize using simplex.
Iter Objective Dual Infeasibility
f’*x A’*y+z-w-f
0 0 5.09902
1 -20 0
Optimization terminated.
x =
4
0
fval =
-20
exitflag =
1
output =
iterations: 1
algorithm: 'medium scale: simplex'
cgiterations: []
message: 'Optimization terminated.'
constrviolation: 0
firstorderopt: 0
lambda =
ineqlin: [2x1 double]
eqlin: [0x1 double]
upper: [2x1 double]
lower: [2x1 double]
It shows that after just 1 iteration, we can find the solution.

Method II: Convert the problem (1) into standard form
minimize −5x1 − x2 (2)
subject to x1 + x2 + s1 = 5
2x1 + (1/2)x2 + s2 = 8
x1, x2, s1, s2 ≥ 0
Use the subroutine simplex_step.m to solve the problem (2):
%% Define the LP
% minimize -5x1 - x2
% subject to x1 + x2 <= 5
% 2x1 + (1/2)*x2 <= 8
% x1, x2 >= 0
%
% Except, we need this in standard form:
% minimize f’*x subject to. A*x <= b
function simplex_hw
c = [-5 -1]’;
A = [1 1; 2 1/2];
b = [5 8]’;
lb = [0,0]’;
ub = []; % No upper bound
%% Plot the LP polytope
clf;
plotregion(-A,-b,lb,ub);
box off; hold on;
% plot(x(1), x(2), 'ro','MarkerSize',16);
hold off;
%% Convert the LP into standard form
cs = [-5 -1 0 0]’;
AS = [1 1 1 0;2 1/2 0 1];
%% Start with a simple feasible point
x = [zeros(2,1); b];
Bind = [3,4];
Nind = [1,2];
simplex_step(cs,AS,b,Bind,Nind,1);
%% Show all iterates
x = [zeros(2,1); b];
Bind = [3,4];
Nind = [1,2];
sol = 0;
clf;
plotregion(-A,-b,lb,ub);
box off; hold on;
while ~sol
plot(x(1), x(2), 'ro','MarkerSize',16);
[x,Bind,Nind,sol] = simplex_step(cs,AS,b,Bind,Nind,0);
end
plot(x(1), x(2), 'r*','MarkerSize',16);
hold off;
The results are
x =
4 0 1 0
Bind =
1 3
Nind =
2 4
sol =
1
It shows that after 1 iteration, we find the solution (see Fig. 1). Besides, we also know that the minimizer is x∗_1 = 4, x∗_2 = 0 and the slack variables are s∗_1 = 1, s∗_2 = 0, which means the constraints 2x1 + (1/2)x2 ≤ 8 and x2 ≥ 0 are active while the others are not.
Figure 1: All iterates of Simplex method
Problem 3
Show that these two problems are dual by showing the equivalence of the KKT conditions:
minimize_x c^T x    (3)
subject to Ax = b, x ≥ 0

and

maximize_λ b^T λ    (4)
subject to A^T λ ≤ c, λ ≥ 0.
Answer: The Lagrangian function for the problem (3) is

L(x, λ, s) = c^T x − λ^T (Ax − b) − s^T x.

The KKT conditions for problem (3) are

∂L/∂x = c − A^T λ − s = 0,
Ax − b = 0,
x ≥ 0,
s ≥ 0,
x_i s_i = 0, i = 1, …, n.
The dual problem (4) can be rewritten as

minimize_λ −b^T λ    (5)
subject to c − A^T λ ≥ 0, λ ≥ 0.
By using x to denote the Lagrange multipliers for the constraints c − A^T λ ≥ 0, we see that the Lagrangian function is

L(λ, x) = −b^T λ − x^T (c − A^T λ).

Then we have

∂L/∂λ = −b + Ax = 0,
c − A^T λ ≥ 0,
x ≥ 0,
x_i (c − A^T λ)_i = 0, i = 1, …, n.

Let s = c − A^T λ; then we find that the KKT conditions for the dual problem are the same as the ones for the original problem.
purdue university · cs 52000
computational methods in optimization
HOMEWORK SOLUTION
Yingwei Wang
February 22, 2013
Homework 6
What is the computational complexity of the simplex method?
Consider the LP:

minimize_x c^T x    (1)
subject to Ax = b, x ≥ 0,

where A ∈ R^{m×n}, x, c ∈ R^n and b ∈ R^m. It is obvious that the total cost equals the cost at each step times the number of iterations.
About each step. Recall what the simplex algorithm does in each step (see Procedure 13.1 on page 370 of Nocedal and Wright's book). We have to solve two linear systems involving the basis matrix B ∈ R^{m×m} at each step, namely

B^T λ = c_B,  Bd = A_q.
If we do the LU factorization of B, as suggested in Section 13.4 of the textbook, then the complexity in each step is

Table 1: Computational complexity in each step
Operation | Cost
decompose B into LU | 2m^3/3 flops
solve for λ and d | 4m^2 flops

This gives a total cost per step of roughly O(m^3) (or O(m^2 n)). Now we reach the main point: in the next iteration of the simplex algorithm, only one column is exchanged between B and N. Hence, by updating the factorization instead of recomputing it, the number of operations per iteration can be reduced to O(mn).
About the number of iterations. In many cases, the number of iterations is O(αm), where α depends on n.

In 1972, Klee and Minty gave an example showing the worst-case complexity of the simplex method: in d-dimensional space, their example requires 2^d − 1 iterations.