Yingwei Wang Dept. of Math, Purdue Univ [email protected]
purdue university · cs 52000
computational methods in optimization
HOMEWORK SOLUTION
Yingwei Wang
January 16, 2013
Please answer the following questions in complete sentences in a typed manuscript and submit the solution to me in class on January 17th, 2013.
Homework 1
Problem 1: Some quick theory
Show, using the definition, that the sequence 1 + k^{-k} converges superlinearly to 1.
Answer:
Let x_k = 1 + k^{-k}. Then

lim_{k→∞} (x_{k+1} − 1)/(x_k − 1)
= lim_{k→∞} (k + 1)^{-(k+1)} / k^{-k}
= lim_{k→∞} k^k / (k + 1)^{k+1}
= lim_{k→∞} [1 / (1 + 1/k)^k] · [1 / (k + 1)]
= e^{-1} · lim_{k→∞} 1 / (k + 1)
= 0.

It follows that the sequence 1 + k^{-k} converges superlinearly to 1.
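This limit is easy to confirm numerically. As a quick sanity check (a Python sketch; the range of k is an arbitrary choice), the successive error ratios |x_{k+1} − 1| / |x_k − 1| indeed tend to zero:

```python
# Check numerically that x_k = 1 + k^(-k) converges superlinearly to 1:
# the successive error ratios |x_{k+1} - 1| / |x_k - 1| should tend to 0.
errors = [float(k) ** (-k) for k in range(1, 12)]          # |x_k - 1| = k^(-k)
ratios = [errors[i + 1] / errors[i] for i in range(len(errors) - 1)]

for k, r in enumerate(ratios, start=1):
    print(f"k = {k:2d}, ratio = {r:.3e}")

# The ratios decrease monotonically toward 0 (roughly like e^(-1)/(k+1)).
assert all(ratios[i + 1] < ratios[i] for i in range(len(ratios) - 1))
assert ratios[-1] < 0.04
```

The ratio behaves like e^{-1}/(k + 1), matching the limit computed above.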
Problem 2: Raptors in space
Mr. Munroe (the xkcd author) decided that trapping you with raptors in the plane was too easy for someone that has taken this class. After all, you did come up with the solution that you should jump to escape them, didn't you?
Your new problem is to solve the generalized raptor problem: Suppose raptors are positioned at the vertices of a k-dimensional regular simplex. You are at the center, 20 m away from the vertices. One of the raptors has a bum leg. Which direction should you run to maximize your survival time?
• Ignore all acceleration, like we did in class.
• The slow raptor runs at 10 m/s
• The fast raptors run at 15 m/s
• You run at 6 m/s
• A raptor will catch you if you are within 20 centimeters.
Check out the Wikipedia page http://en.wikipedia.org/wiki/Simplex on how to find the coordinates of the raptors in a general space, or just use this implementation: http://people.sc.fsu.edu/~jburkardt/m_src/simplex_coordinates/simplex_coordinates1.m
1. Modify the raptorchase.m function to compute the survival time of a human in a three-dimensional raptor problem. Show your modified function, and show the survival time when running directly at the slow raptor.
Answer: For the three-dimensional raptor problem, the only thing we need is spherical coordinates (radius r, inclination φ, azimuth θ), where r ∈ [0,∞), φ ∈ [0, π], θ ∈ [0, 2π]:

x = r sin(φ) cos(θ),
y = r sin(φ) sin(θ),    (1)
z = r cos(φ).
The modified Matlab code, named raptorchase3.m, is the following:
function T = raptorchase3(theta,phi)
% RAPTORCHASE3 Simulate the survival time of the human in the XKCD raptor
% problem in 3 dimensions.
%
% The XKCD raptor problem is posed as follows:
%
% A human is at the center of a regular 3-simplex (a tetrahedron) whose
% vertices are 20 meters away. At each vertex is a velociraptor whose
% maximum speed is 15 m/s. One vertex has a velociraptor with a broken
% leg, which is limited to 10 m/s. What direction should you run in
% order to maximize survival time?
%
% This function uses ode45 to explicitly simulate the raptor motion and
% compute the first time when a raptor is within 20 cm of the position of
% the human.
%
% T = raptorchase3(theta,phi) returns the survival time for the angle theta and phi.
% Note that theta \in (0, 2\pi) and phi \in (0,\pi).
%
% This function is based on ideas presented by Nick Henderson at the ICME
% Open Day in May.
vhuman = 6; % human velocity in m/s
vraptor0 = 10; % slow raptor velocity in m/s
vraptor = 15; % raptor velocity in m/s
triangle_dimension = 20; % in meters
raptor_min_distance = 0.2; % a raptor within 20 cm can attack
tmax = 10; % maximum time for integration
% Add the ODE function
function dpos = change_in_positions(~,pos)
human = [pos(1) pos(2) pos(3)];
nraptors = length(pos)/3 - 1;
assert(ceil(nraptors) == nraptors) % integer operations are exact
dpos = zeros(size(pos));
dpos(1) = vhuman*sin(phi)*cos(theta);
dpos(2) = vhuman*sin(phi)*sin(theta);
dpos(3) = vhuman*cos(phi);
for i=1:nraptors
if i>1, vrap=vraptor;
else vrap=vraptor0;
end
raptor = [pos(3*i+1) pos(3*i+2) pos(3*i+3)];
tdir = (human - raptor)/norm(human-raptor);
dpos(3*i+1) = tdir(1)*vrap;
dpos(3*i+2) = tdir(2)*vrap;
dpos(3*i+3) = tdir(3)*vrap;
end
end
% Add a function to stop the ODE evaluation when the raptors get close
function [val,isterm,dir]=eaten_event(~,pos)
dir=0; % matlab bookkeeping
isterm=1;
human = [pos(1) pos(2) pos(3)];
nraptors = length(pos)/3 - 1;
assert(ceil(nraptors) == nraptors) % integer operations are exact
raptor_dist = zeros(nraptors,1);
for i=1:nraptors
raptor = [pos(3*i+1) pos(3*i+2) pos(3*i+3)];
raptor_dist(i) = norm(human-raptor);
end
val = min(raptor_dist) - raptor_min_distance; % val = 0 when captured
end
pos = 20*[zeros(3,1) simplex_coordinates1(3)];
p0 = pos(:); % unroll pos
opts = odeset('Events',@eaten_event);
sol = ode45(@change_in_positions, [0,tmax], p0, opts);
T = max(sol.x);
end
When running directly at the slow raptor, the angle is (θ = 0, φ = π/2) and the survival time is T = 1.2337.
2. Utilize a grid-search strategy to determine the best angle for the human to run to maximize the survival time. Show the angle.
Answer: Let the grid be θ = [0 : 0.01 : 2π], φ = [0 : 0.01 : π]. The best angle I found is (θ, φ) = (5.65, 1.57) and the maximal survival time is T = 1.5580.
Further, the detailed numerical results are shown in Fig. 1.
Figure 1: Survival time vs (θ, φ)
3. Discuss the major challenge for solving this problem in four dimensions. (Or, if you are feeling ambitious, solve it in 4d, and discuss what might be a problem in 5d.)
Answer: For the problem in 4-d, I think we just need the 4-d spherical coordinates:

x1 = r cos(φ1),
x2 = r sin(φ1) cos(φ2),
x3 = r sin(φ1) sin(φ2) cos(φ3),    (2)
x4 = r sin(φ1) sin(φ2) sin(φ3),

where φ1, φ2 ∈ [0, π] and φ3 ∈ [0, 2π].
For the 4-d problem, I obtained this solution:

best angle: (φ1, φ2, φ3) = (0.7, 3.1, 5.2),
survival time: T = 1.5463.
The major challenge for solving the problem in higher dimensions is the computational cost: a direction in d dimensions requires d − 1 angles, so the number of points in a fine grid search for the best angle grows exponentially with the dimension.
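To make the cost concrete, here is a small Python sketch (the step size h = 0.01 matches the grid used above) counting the grid points as the dimension grows:

```python
import math

# Size of a full angular grid with step h = 0.01. A direction in d
# dimensions needs d - 1 angles: one of range 2*pi and d - 2 of range pi
# (cf. the 4-d spherical coordinates above), so the grid grows
# exponentially with the dimension.
h = 0.01

def grid_points(d):
    n_azimuth = int(2 * math.pi / h)      # 628 points for the 2*pi angle
    n_polar = int(math.pi / h)            # 314 points for each pi angle
    return n_azimuth * n_polar ** (d - 2)

for d in [2, 3, 4, 5]:
    print(f"d = {d}: {grid_points(d):,} grid points")
```

Already in 5-d the grid has about 1.9 × 10^10 points, so a plain grid search is hopeless there.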
purdue university · cs 52000
computational methods in optimization
HOMEWORK SOLUTION
Yingwei Wang
January 22, 2013
Please answer the following questions in complete sentences in a typed manuscript and submit the solution to me on Blackboard on January 25th, 2013, by 5pm.
Homework 2: Convexity
Convex functions are all the rage these days, and one of the interests of students in this class. You may have to read a bit about convexity on Wikipedia or in the book.
Problem 1
Let's do some matrix analysis to show that a function is convex. Solve problem 2.7 in the textbook, which is:
Suppose that f(x) = x^T Q x, where Q is an n × n symmetric positive semi-definite matrix. Show that this function is convex using the definition of convexity, which can be equivalently reformulated:

f(y + α(x − y)) − αf(x) − (1 − α)f(y) ≤ 0

for all 0 ≤ α ≤ 1 and all x, y ∈ R^n.
This type of function will frequently arise in our subsequent studies, so it's an important one to understand.
Answer: Let α ∈ [0, 1] and x, y ∈ R^n, then compute

f(y + α(x − y)) − αf(x) − (1 − α)f(y)
= (y + α(x − y))^T Q (y + α(x − y)) − α x^T Q x − (1 − α) y^T Q y
= α(α − 1) (x^T Q x − x^T Q y − y^T Q x + y^T Q y)
= α(α − 1) (x − y)^T Q (x − y).

We know that Q is an n × n symmetric positive semi-definite matrix. Thus,

(x − y)^T Q (x − y) ≥ 0, ∀x, y ∈ R^n.

Besides,

α(α − 1) ≤ 0, ∀α ∈ [0, 1].

It follows that

f(y + α(x − y)) − αf(x) − (1 − α)f(y) ≤ 0

for all 0 ≤ α ≤ 1 and all x, y ∈ R^n, which means f(x) = x^T Q x is a convex function.
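The identity derived above can be spot-checked numerically; the following Python/NumPy sketch (the random PSD matrix Q and points x, y are arbitrary choices) verifies both the identity and the sign:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5
B = rng.standard_normal((n, n))
Q = B.T @ B                          # symmetric positive semi-definite
f = lambda v: v @ Q @ v

x, y = rng.standard_normal(n), rng.standard_normal(n)
for alpha in np.linspace(0.0, 1.0, 11):
    lhs = f(y + alpha * (x - y)) - alpha * f(x) - (1 - alpha) * f(y)
    rhs = alpha * (alpha - 1) * (x - y) @ Q @ (x - y)
    assert abs(lhs - rhs) < 1e-9     # the algebraic identity above
    assert lhs <= 1e-12              # the convexity inequality
```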
Problem 2: Convexity and least squares
1. Show that f(x) = ‖b − Ax‖^2 is a convex function. Feel free to use the result proved on the last homework.
Answer: First, let b = 0 and consider the function

f0(x) = ‖Ax‖^2 = x^T A^T A x.

It is easy to see that the matrix A^T A is symmetric positive semi-definite, since

(A^T A)^T = A^T A,
x^T A^T A x = ‖Ax‖^2 ≥ 0, ∀x ∈ R^n.

By the conclusion of the previous problem, the function f0(x) is convex.

Second, after some simple algebra (the constant and linear terms in b cancel out of the convex combination), we find that

f(y + α(x − y)) − αf(x) − (1 − α)f(y) = f0(y + α(x − y)) − αf0(x) − (1 − α)f0(y)

for all 0 ≤ α ≤ 1 and all x, y ∈ R^n. It follows that f(x) = ‖b − Ax‖^2 is also a convex function.
2. Show that the null-space of a matrix is a convex set.
Answer: Let A be any m × n matrix and its null-space be

ker(A) = {x ∈ R^n : Ax = 0}.

Let x, y ∈ ker(A); then

Ax = 0, Ay = 0
⇒ A(αx + (1 − α)y) = αAx + (1 − α)Ay = 0, ∀α ∈ [0, 1],

which means αx + (1 − α)y ∈ ker(A). It follows that ker(A) is a convex set.
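A tiny numerical illustration (Python sketch; the matrix A and the null-space vectors are arbitrary choices): convex combinations of null-space vectors stay in the null space.

```python
import numpy as np

A = np.array([[1.0, 2.0, 3.0]])          # 1x3, null space has dimension 2
x = np.array([ 3.0, 0.0, -1.0])          # A @ x = 0
y = np.array([-2.0, 1.0,  0.0])          # A @ y = 0
assert np.allclose(A @ x, 0) and np.allclose(A @ y, 0)

for alpha in np.linspace(0, 1, 5):
    z = alpha * x + (1 - alpha) * y      # convex combination
    assert np.allclose(A @ z, 0)         # still in ker(A)
```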
purdue university · cs 52000
computational methods in optimization
HOMEWORK SOLUTION
Yingwei Wang
February 1, 2013
Homework 3: Optimality and Constraints
Problem 0: List your collaborators.
Please identify anyone, whether or not they are in the class, with whom you discussed your homework. This problem is worth 1 point, but on a multiplicative scale. (Note that collaboration is not allowed on the bonus question below.)
Answer: I guarantee that all of the homework was done by myself, with no discussion or collaboration with others. By the way, I really enjoy that, although it is time-consuming.
Problem 1: Optimization software and optimality
We'll frequently be using software to optimize functions; this question will help familiarize you with two pieces of software: Poblano and the Matlab optimization toolbox.
The function we'll study is the Rosenbrock function:

f(x) = 100(x2 − x1^2)^2 + (1 − x1)^2.
I briefly talked about this function in class and called it the banana function. Now it's your turn to look at it!
1. Show a contour plot of this function
Answer: The contour and 3-d plots of the Rosenbrock function are shown in Figs. 1-2.
2. Write the gradient and Hessian of this function.
Answer: The gradient and Hessian of the Rosenbrock function are

g(x) = [ −400(x2 − x1^2)x1 − 2(1 − x1) ; 200(x2 − x1^2) ],

H(x) = [ 1200x1^2 − 400x2 + 2, −400x1 ; −400x1, 200 ].
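These formulas can be verified against central finite differences; a Python sketch (the test point and step size are arbitrary choices):

```python
import numpy as np

def f(x):
    return 100 * (x[1] - x[0] ** 2) ** 2 + (1 - x[0]) ** 2

def g(x):
    return np.array([-400 * (x[1] - x[0] ** 2) * x[0] - 2 * (1 - x[0]),
                     200 * (x[1] - x[0] ** 2)])

def H(x):
    return np.array([[1200 * x[0] ** 2 - 400 * x[1] + 2, -400 * x[0]],
                     [-400 * x[0], 200.0]])

x = np.array([-1.2, 1.0])
h = 1e-6
# central differences for the gradient of f and for each column of H
g_fd = np.array([(f(x + h * e) - f(x - h * e)) / (2 * h) for e in np.eye(2)])
H_fd = np.column_stack([(g(x + h * e) - g(x - h * e)) / (2 * h) for e in np.eye(2)])
assert np.allclose(g(x), g_fd, atol=1e-3)
assert np.allclose(H(x), H_fd, atol=1e-3)
```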
3. By inspection, what is the minimizer of this function? (Feel free to find the answer by other means, e.g. looking it up, but make sure you explain why you know that answer must be a global minimizer.)
Answer: It is obvious that if x∗ = (1, 1)^t then f(x∗) = 0. Besides,

f(x) ≥ 0, ∀x ∈ R^2.
Figure 1: Contour of Rosenbrock function

Figure 2: Plot of Rosenbrock function
It implies that x∗ = (1, 1)^t is the global minimizer of this Rosenbrock function. Furthermore, x∗ = (1, 1)^t is a strict global minimizer of the Rosenbrock function. The reason is shown as follows.
g(x∗) = 0,
H(x∗) = [ 802 −400 ; −400 200 ].

The (numerical) eigenvalues of H(x∗) are λ1 = 0.399360767487622 and λ2 = 1001.60063923251, which indicates that H(x∗) ≻ 0.
4. Explain how any optimization package could tell that your solution is a local minimizer.

Answer: The package can check the first- and second-order optimality conditions at the computed point: the gradient is (numerically) zero and the Hessian is positive definite, which certifies a strict local minimizer. Note that since the Rosenbrock function is non-convex (see Fig. 2), a convex solver such as CVX does not work in this case; Poblano can be employed here.
5. Use Poblano to optimize this function starting from a few different points. Be adversarial if you wish. Does it always get the answer correct? Use a table or figure to illustrate your findings if appropriate. Show your code to use Poblano. Your code should have comments to explain what it is doing and you should explain any difficulties in implementing this test.
Answer: In order to call the subroutines in Poblano, we need to write the function named rosenbrock.m:
function [f,g,h] = rosenbrock(x)
%%% This function returns the function value, partial derivatives
%%% and Hessian of the rosenbrock function, given by
%%% f(x1,x2) = 100*(x2-x1^2)^2 + (1-x1)^2.
%% Rosenbrock "banana" function
f = 100*(x(2)-x(1)^2)^2 + (1-x(1))^2;
%% gradient
g=[-400*(x(2)-x(1)^2)*x(1)-2*(1-x(1)); 200*(x(2)-x(1)^2)];
%% hessian
if nargout > 2
h=[1200*x(1)^2-400*x(2)+2, -400*x(1); -400*x(1), 200];
end
I choose the NCG (nonlinear conjugate gradient) method, corresponding to the function ncg in Poblano. Also, I choose three different starting points: (−3,−5)^t, (0, 0)^t and (20, 10)^t. The results are given as follows.
I. Initial guess x0 = (−3,−5)t, which is near the true solution.
The output of NCG method is
>> out = ncg(@rosenbrock,[-3,-5]','TraceX',true)
Iter FuncEvals F(X) ||G(X)||/N
------ --------- ---------------- ----------------
0 1 19616.00000000 8519.81314349
1 9 2019.81456799 455.63650412
2 14 2.07319311 1.99427728
3 18 2.04100794 1.30842367
... ...
22 110 0.00000971 0.02788423
23 114 0.00000000 0.00126552
24 117 0.00000000 0.00000551
out =
Params: [1x1 inputParser]
ExitFlag: 0
ExitDescription: 'Successful termination based on StopTol'
X: [2x1 double]
F: 3.31201541282898e-013
G: [2x1 double]
FuncEvals: 117
Iters: 24
TraceX: [2x25 double]
It shows that a solution with function value about 10^−13 can be found after 117 function evaluations. Besides, I also plot out.TraceX in Fig. 3.
Figure 3: NCG iteration starting from (−3,−5)^t.
According to Wikipedia [2], adaptive coordinate descent starting from (−3,−5)^t finds a solution with function value 10^−10 after 325 function evaluations. So the NCG method is better than adaptive coordinate descent in this case.
II. Initial guess x0 = (0, 0)^t, which is in the long, narrow, parabolic-shaped flat valley, i.e. x2 = x1^2 (see Fig. 1).
The output of NCG method is
>> out = ncg(@rosenbrock,[0,0]','TraceX',true)
Iter FuncEvals F(X) ||G(X)||/N
------ --------- ---------------- ----------------
0 1 1.00000000 1.00000000
1 7 0.77110969 2.60058899
... ...
12 58 0.00000003 0.00007091
13 60 0.00000000 0.00000599
out =
Params: [1x1 inputParser]
ExitFlag: 0
ExitDescription: 'Successful termination based on StopTol'
X: [2x1 double]
F: 7.1901609098776e-014
G: [2x1 double]
FuncEvals: 60
Iters: 13
TraceX: [2x14 double]
The detailed iteration process is very similar to the first case (just the last half part). I just want to mention the results given by others [3]:
*****************************************
Solution from the steepest descent:
x= 1.0000 1.0000
f(x1,x2)= 1.6983e-011
Iterations: 8147
Solution from the regularized steepest descent:
x= 1.0000 1.0000
f(x1,x2)= 6.4558e-014
Iterations: 194
Solution from the conjugate gradient:
x= 1.0000 1.0000
f(x1,x2)= 1.0418e-023
Iterations: 21
Solution from Quasi-Newton Rank 2:
x= 1.0000 1.0000
f(x1,x2)= 1.3264e-012
Iterations: 151
*****************************************
It also implies that the (nonlinear) conjugate gradient method is better than the (regularized) steepest descent method.
III. Initial guess x0 = (20, 10)^t, which is far from the true solution. The output of the NCG method is
>> out = ncg(@rosenbrock,[20,10]','TraceX',true)
Iter FuncEvals F(X) ||G(X)||/N
------ --------- ---------------- ----------------
0 1 15210361.00000000 1560506.41791727
1 9 7134.16612179 2582.53033913
2 17 17.34464550 19.59251511
3 21 17.25008445 0.65048548
4 24 16.55446511 41.85639750
5 32 15.66275780 60.65921255
... ....
28 148 0.00005408 0.00330572
29 150 0.00000019 0.00963675
30 152 0.00000000 0.00000849
out =
Params: [1x1 inputParser]
ExitFlag: 0
ExitDescription: 'Successful termination based on StopTol'
X: [2x1 double]
F: 3.75606726452345e-011
G: [2x1 double]
FuncEvals: 152
Iters: 30
TraceX: [2x31 double]
Also, I want to mention the results given by others [3]:
*****************************************
Solution from the steepest descent:
x= 1.0000 1.0000
f(x1,x2)= 1.8201e-011
Iterations: 31006
Solution from the regularized steepest descent:
x= 9.7544 95.1517
f(x1,x2)= 7.6641e+001
Iterations: 50000
Solution from the conjugate gradient:
x= 1.0000 1.0000
f(x1,x2)= 4.7100e-017
Iterations: 35
Solution from Quasi-Newton Rank 2:
x= 1.0000 1.0000
f(x1,x2)= 3.7352e-012
Iterations: 471
*****************************************
It verifies again that the CG and NCG methods are very good.
6. Read about Matlab's fminunc function and determine how to provide it with Hessian and gradient information. Use this toolbox to optimize the function starting from a few different points. Show your code to use this function and explain any differences (if any) you observe in comparison to using Poblano.
Answer: The following Matlab code shows how to use fminunc to solve this problem, including how to provide it with Hessian and gradient information.
%% indicate gradient is provided and display iteration
options = optimset('GradObj','on','Hessian','on','Display','iter');
%% use fminunc to find the minimizer
[x,fval,exitflag,output] = fminunc(@rosenbrock,[-3,-5]’,options);
Again, let us consider two starting points: (−3,−5)t and (20, 10)t.
I. Initial guess x0 = (−3,−5)t, which is near the true solution.
The output of fminunc in Matlab is
Norm of First-order
Iteration f(x) step optimality CG-iterations
0 19616 1.68e+004
1 372.862 10 2e+003 1
2 13.0702 1.84663 7.33 1
3 13.0702 18.9087 7.33 1
4 13.0702 4.72718 7.33 0
5 11.6963 1.1818 48.4 0
... ...
32 0.000972228 0.105897 0.913 1
33 4.39254e-005 0.0311832 0.0562 1
34 1.92081e-007 0.0139071 0.0149 1
35 2.8556e-012 0.000510656 1.49e-005 1
II. Initial guess x0 = (20, 10)^t, which is far from the true solution. The output of fminunc in Matlab is
Norm of First-order
Iteration f(x) step optimality CG-iterations
0 1.52104e+007 3.12e+006
1 2.81507e+006 10 9.14e+005 1
2 363863 20 2.38e+005 1
3 2994.98 40 1.97e+004 1
4 65.4375 5.27646 16.4 1
5 65.4375 80 16.4 1
... ...
56 0.000935215 0.0323693 0.184 1
57 7.09916e-005 0.0655248 0.334 1
58 5.93496e-008 0.00258142 0.00135 1
59 3.57116e-013 0.000540531 2.37e-005 1
It seems that more iterations are needed in Matlab than in Poblano.
Problem 2: Log-barrier terms
The basis of a class of methods known as interior point methods is that we can handle non-negativity constraints such as x ≥ 0 by solving a sequence of unconstrained problems where we add the function b(x; µ) = −µ Σ_i log(x_i) to the objective. Thus, we convert

minimize f(x) subject to x ≥ 0    (1)

into

minimize f(x) + b(x; µ).    (2)
1. Explain why this idea could work. (Hint: there's a very useful picture you should probably show here!)
Answer: I think there are at least three key things behind this idea. Suppose x∗ is the minimizer of problem (2).

First, x∗ must be a (strictly) feasible solution to problem (1), since b(x; µ) → +∞ as any x_i → 0^+, so the barrier keeps the iterates in the interior.

Second, the log-barrier function b(x; µ) is smooth and strictly convex on the interior of the feasible set.

Third, if 0 < µ ≪ 1, then x∗ is very close to the minimizer of the original problem (1).
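A one-dimensional toy illustration of the third point (my own example, not from the assignment): minimize (x + 1)^2 subject to x ≥ 0, whose solution is x∗ = 0. The barrier problem has a closed-form minimizer that tends to 0 as µ → 0^+:

```python
import math

# Toy problem: minimize (x+1)^2 subject to x >= 0; the minimizer is x* = 0.
# Barrier problem: minimize (x+1)^2 - mu*log(x) over x > 0.
# Setting the derivative to zero: 2(x+1) - mu/x = 0  =>  2x^2 + 2x - mu = 0,
# whose positive root is x(mu) = (-1 + sqrt(1 + 2*mu)) / 2.
def barrier_minimizer(mu):
    return (-1.0 + math.sqrt(1.0 + 2.0 * mu)) / 2.0

for mu in [1.0, 0.1, 0.01, 0.001]:
    x = barrier_minimizer(mu)
    assert x > 0            # strictly feasible for every mu > 0
    print(f"mu = {mu:7.3f}, x(mu) = {x:.6f}")
```

As µ shrinks, x(µ) approaches the constrained solution 0 while staying strictly feasible.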
2. Write a matrix expression for the gradient and Hessian of f(x) + b(x; µ) in terms of the gradient vector g(x) and the Hessian matrix H(x) of f.
Answer: Since

b(x; µ) = −µ Σ_i log(x_i)
⇒ ∂b(x; µ)/∂x_i = −µ x_i^{-1}
⇒ ∂^2 b(x; µ)/∂x_i ∂x_j = µ x_i^{-2} δ_ij,

it follows that the gradient and Hessian of f(x) + b(x; µ) are

∇(f(x) + b(x; µ)) = g(x) − µ x^{-1},
∇^2(f(x) + b(x; µ)) = H(x) + µ diag(x^{-2}),

where g(x) and H(x) are the gradient vector and Hessian matrix of f respectively, x^{-1} = (x_1^{-1}, x_2^{-1}, …, x_n^{-1})^T, and diag(x^{-2}) is the diagonal matrix with x_i^{-2} on the diagonal.
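The barrier derivatives can be checked by finite differences as well (Python sketch; the test point is an arbitrary strictly positive vector):

```python
import numpy as np

mu = 0.5
b = lambda x: -mu * np.sum(np.log(x))          # log-barrier term
grad_b = lambda x: -mu / x                     # -mu * x^{-1}, elementwise
hess_b = lambda x: mu * np.diag(x ** -2.0)     # mu * diag(x^{-2})

x = np.array([0.5, 1.0, 2.0])                  # any strictly positive point
h = 1e-6
I = np.eye(3)
g_fd = np.array([(b(x + h * e) - b(x - h * e)) / (2 * h) for e in I])
H_fd = np.column_stack([(grad_b(x + h * e) - grad_b(x - h * e)) / (2 * h) for e in I])
assert np.allclose(grad_b(x), g_fd, atol=1e-6)
assert np.allclose(hess_b(x), H_fd, atol=1e-4)
```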
Problem 3: Do constraints always make a problem hard?
Let f(x) = (1/2)x^T Q x − x^T c. Recall that if Q is positive semi-definite then f(x) is convex.
1. Construct and illustrate an example in R^2 to show that f(x) is non-convex if Q is indefinite.
Answer: Without loss of generality, we can assume that c = 0. Let

Q = [ 1 0 ; 0 −1 ],

then

f(x) = (1/2)x^T Q x = (1/2)(x1^2 − x2^2).

Choose x = (0,−1)^t, y = (0, 1)^t. On one hand,

f(x) = f(y) = −1/2
⇒ αf(x) + (1 − α)f(y) = −1/2, ∀α ∈ (0, 1).

On the other hand,

f((1/2)x + (1/2)y) = f(0) = 0 > −1/2.

It follows that f(x) is not convex. The graph of the function f(x) = (1/2)(x1^2 − x2^2) is shown in Fig. 4.
2. Now, suppose we consider the problem:

minimize f(x)
subject to Ax = b.

For this problem, show that any local minimizer is a global solution, even if f(x) is non-convex.
Proof. Assume A has full row rank. The gradient vector g(x) and the Hessian matrix H(x) of f are

g(x) = Qx − c,
H(x) = Q.

Suppose x∗ is a local minimizer. The first-order optimality condition for the equality-constrained problem gives

g(x∗) = A^T λ for some multiplier λ, i.e. Qx∗ − c = A^T λ, (3)

and the second-order necessary condition gives

p^T Q p ≥ 0 for every p with Ap = 0. (4)
If x is another feasible solution, i.e. Ax = b, and p = x∗ − x, then Ap = 0. Substituting x = x∗ − p into f(x) yields

f(x) = (1/2)(x∗ − p)^T Q (x∗ − p) − (x∗ − p)^T c
= f(x∗) + (1/2)p^T Q p − p^T (Qx∗ − c)
= f(x∗) + (1/2)p^T Q p, since (3) implies p^T (Qx∗ − c) = 0 whenever Ap = 0,
≥ f(x∗), by (4).

It follows that for this quadratic programming problem, a local minimizer, if it exists, is also a global minimizer.

Figure 4: Hyperbolic paraboloid f(x) = (1/2)(x1^2 − x2^2)
1. Is there always a local minimizer? Prove that there is, or show a counter-example. (Hint: there may be some ambiguity in this problem; if so, I'm looking for a discussion and what you think the right answer is, so your reasoning is more important than your actual answer.)
Answer: No. For example, there is no local minimizer for the function f(x) = (1/2)(x1^2 − x2^2); see Fig. 4.
2. Write down and test an algorithm to solve these problems and find a global minimizer.
Answer: Let A ∈ R^{m×n}, m ≤ n, with full rank m, let x ∈ R^n be the unknowns and λ ∈ R^m be the associated Lagrange multipliers. In order to solve this problem, we just need to solve the KKT system

[ Q A^T ; A 0 ] [ x ; λ ] = [ c ; b ].    (5)
10
Yingwei Wang Dept. of Math, Purdue Univ [email protected]
I think we should also assume that Q ∈ R^{n×n} has full rank.
Of course, it is usually costly to directly solve the (m+n) × (m+n) linear system (5). People have proposed many efficient ways to solve (5) (see Chapter 16 of the textbook [1]); in my mind, Krylov subspace methods are appropriate candidates.
However, the numerical results shown here are obtained by backslash in Matlab. I just want to test the problem f(x) = (1/2)(x1^2 − x2^2) with different constraints.
Q = [ 1 0 ; 0 −1 ],  c = [ 0 ; 0 ].    (6)
I. Choose

A = [ 1 0 ],  b = 1.    (7)

Then the solution is

x∗ = [1, 0]^t, λ∗ = −1.
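This value can be reproduced by assembling and solving the KKT system (5) directly; here is a Python/NumPy version of the same computation that Matlab's backslash performs:

```python
import numpy as np

# KKT system (5) for Q = [[1,0],[0,-1]], c = 0, A = [1 0], b = 1.
Q = np.array([[1.0, 0.0], [0.0, -1.0]])
A = np.array([[1.0, 0.0]])
c = np.zeros(2)
b = np.array([1.0])

K = np.block([[Q, A.T], [A, np.zeros((1, 1))]])
rhs = np.concatenate([c, b])
sol = np.linalg.solve(K, rhs)        # [x1, x2, lambda]

x, lam = sol[:2], sol[2]
assert np.allclose(x, [1.0, 0.0])
assert np.isclose(lam, -1.0)
```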
II. Choose

A = [ 1 1 ],  b = 1.    (8)

Then the solution from Matlab is

x∗ = [NaN, Inf]^t, λ∗ = Inf.
Why do I show these two numerical tests? Well, I just want to conclude that without constraints there is no minimizer for f(x) = (1/2)(x1^2 − x2^2); with constraints, either a global minimizer exists or there is still no minimizer. (In fact, one should also check the second-order condition in case I: for p = (0, 1)^t in the null space of A we have p^T Q p = −1 < 0, so the stationary point found from (5) is not guaranteed to be a minimizer.)
Problem 4: Constraints can make a non-smooth problem smooth.
Show that
minimize ‖Ax− b‖∞
can be reformulated as a smooth, constrained optimization problem.
Answer: The equivalent smooth optimization problem of the infinity-norm minimization problem can be obtained by minimizing an auxiliary variable t ∈ R with inequality constraints:
minimize t
subject to ‖Ax − b‖∞ ≤ t.

Further, the infinity-norm constraint can be written as a set of linear inequalities, and the problem becomes:

minimize t
subject to −te ≤ Ax − b ≤ te,

where e is the vector with 1 in all entries.
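To see that the linear inequalities really encode the infinity-norm constraint, here is a small Python/NumPy sketch (random data of my own choosing) checking the equivalence ‖Ax − b‖∞ ≤ t ⇔ −te ≤ Ax − b ≤ te:

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((4, 3))
b = rng.standard_normal(4)
e = np.ones(4)

for _ in range(100):
    x = rng.standard_normal(3)
    t = rng.uniform(0.0, 3.0)
    r = A @ x - b
    infnorm_ok = np.max(np.abs(r)) <= t          # original constraint
    linear_ok = np.all(-t * e <= r) and np.all(r <= t * e)  # linear version
    assert infnorm_ok == linear_ok
```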
References
[1] Jorge Nocedal and Stephen J. Wright. Numerical Optimization. Springer Series in Operations Research and Financial Engineering. Springer, Berlin-Heidelberg, 2nd edition, 2008.

[2] Wikipedia. Rosenbrock function. http://en.wikipedia.org/wiki/Rosenbrock_function, 2013.

[3] M. Zhou. GG7920 Homework 2. http://utam.gg.utah.edu/~u0027410/course/gg6920/hw2/hw2.htm, 2005.
purdue university · cs 52000
computational methods in optimization
HOMEWORK SOLUTION
Yingwei Wang
February 7, 2013
Homework 4
Problem 0: List your collaborators.
Please identify anyone, whether or not they are in the class, with whom you discussed your homework. This problem is worth 1 point, but on a multiplicative scale. (Note that collaboration is not allowed on the bonus question below.)
Answer: I guarantee that all of the homework was done by myself, with no discussion or collaboration with others. By the way, I really enjoy that, although it is time-consuming.
Problem 1: Steepest descent
(Nocedal and Wright, Exercise 3.6) Let's conclude with a quick problem to show that steepest descent can converge very rapidly! Consider the steepest descent method with exact line search for the function f(x) = (1/2)x^T Q x − x^T b. Suppose that we know x0 − x∗ is parallel to an eigenvector of Q. Show that the method will converge in a single iteration.
Answer: Let us suppose that

f(x) = (1/2)x^T Q x − x^T b,    (1)

where Q is symmetric and positive definite. The steepest descent iteration for (1) is given by

x_{k+1} = x_k − ( ∇f_k^T ∇f_k / ∇f_k^T Q ∇f_k ) ∇f_k,    (2)

where ∇f_k = Qx_k − b. For k = 0, we have

x_1 = x_0 − ( ∇f_0^T ∇f_0 / ∇f_0^T Q ∇f_0 ) ∇f_0.    (3)

Suppose

x_0 − x∗ = γ y_k,    (4)

where γ is a constant and y_k is a normalized eigenvector of Q, i.e.

Q y_k = λ_k y_k,  y_k^T y_k = 1.    (5)

It follows that

∇f_0 = Qx_0 − b = Q(x∗ + γ y_k) − b = Qx∗ − b + γ λ_k y_k.

Since x∗ is the minimizer, ∇f(x∗) = Qx∗ − b = 0. Then

∇f_0 = γ λ_k y_k.    (6)

Substituting (6) into (3) yields

x_1 = x_0 − (γ^2 λ_k^2 / γ^2 λ_k^3) γ λ_k y_k = x_0 − γ y_k = x∗.

It follows that the method will converge in a single iteration.

Remark: the algebra only uses Q y_k = λ_k y_k with λ_k > 0 and ∇f(x∗) = 0, so the same computation also goes through when Q is merely symmetric with λ_k > 0 and x∗ is a stationary point.
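The one-step convergence is easy to reproduce numerically (Python sketch; Q, b and the starting offset are arbitrary choices):

```python
import numpy as np

Q = np.array([[3.0, 1.0], [1.0, 2.0]])     # symmetric positive definite
b = np.array([1.0, 0.0])
x_star = np.linalg.solve(Q, b)             # minimizer: Q x* = b

lam, Y = np.linalg.eigh(Q)
x0 = x_star + 0.7 * Y[:, 0]                # start along an eigenvector of Q

g = Q @ x0 - b                             # gradient at x0
alpha = (g @ g) / (g @ Q @ g)              # exact line search step
x1 = x0 - alpha * g

assert np.allclose(x1, x_star)             # converged in one iteration
```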
Problem 2: Inequality constraints
Draw a picture of the feasible region for the constraints:

1 − x1 − x2 ≥ 0,
1 − x1 + x2 ≥ 0,
1 + x1 − x2 ≥ 0,
1 + x1 + x2 ≥ 0.
Answer: The feasible region of these inequality constraints is shown in Fig. 1.
Figure 1: The feasible region of the inequality constraints
Problem 3: Necessary and sufficient conditions
Let f(x) = (1/2)x^T Q x − x^T c.
1. Write down the necessary conditions for the problem:
minimize f(x)
subject to x ≥ 0.
Answer: The necessary conditions for this problem are

g(x∗) − λ = 0,
λ_i ≥ 0, x∗_i ≥ 0, and either λ_i = 0 or x∗_i = 0,

where x∗ is the minimizer and g(x∗) = Qx∗ − c.
2. Write down the sufficient conditions for the same problem.
Answer: The sufficient conditions for this problem are

g(x∗) − λ = 0,
λ_i ≥ 0, x∗_i ≥ 0, and either λ_i = 0 or x∗_i = 0,
Q ≻ 0,

where x∗ is the minimizer and g(x∗) = Qx∗ − c.
3. Consider the two-dimensional case with

Q = [ 1 2 ; 2 1 ],  c = [ 0 ; −1.5 ].
Determine the solution to this problem by any means you can, and justify your work.
Answer: In this case, f(x) = 0.5(x1^2 + 4x1x2 + x2^2) + 1.5x2. Then the Lagrangian function is

L(x, λ) = 0.5(x1^2 + 4x1x2 + x2^2) + 1.5x2 − λ1 x1 − λ2 x2.    (7)

Now we need to solve this system:

∂L/∂x1 = x1 + 2x2 − λ1 = 0,
∂L/∂x2 = 2x1 + x2 + 1.5 − λ2 = 0,
λ1 ≥ 0, x1 ≥ 0, and either λ1 = 0 or x1 = 0,
λ2 ≥ 0, x2 ≥ 0, and either λ2 = 0 or x2 = 0.

The solution is

x∗_1 = 0, x∗_2 = 0, λ1 = 0, λ2 = 1.5.

It follows that the minimizer of this problem is x∗ = (0, 0)^t.
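The KKT conditions at this candidate point can be verified mechanically (a small Python sketch of the check):

```python
import numpy as np

Q = np.array([[1.0, 2.0], [2.0, 1.0]])
c = np.array([0.0, -1.5])

x = np.array([0.0, 0.0])                 # candidate minimizer
lam = np.array([0.0, 1.5])               # multipliers for x >= 0

g = Q @ x - c                            # gradient of f at x
assert np.allclose(g - lam, 0)           # stationarity: g(x*) = lambda
assert np.all(lam >= 0) and np.all(x >= 0)
assert np.allclose(lam * x, 0)           # complementarity
```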
4. Produce a Matlab or hand illustration of the solution showing the function contours, gradient, and the constraint normal. What are the active constraints at the solution? What is the value of λ in A^T λ = g?
Answer: The function contours and gradient are shown in Fig. 2, and λ = (0, 1.5)^t.
Both x1 ≥ 0 and x2 ≥ 0 are active constraints. The reason is that if there were no constraint x1 ≥ 0, then setting x2 → +∞ and x1 = −2x2 would lead to f(x) → −∞, while if there were no constraint x2 ≥ 0, then setting x1 → +∞ and x2 = −0.5x1 would also lead to f(x) → −∞.

Further, since in this case Q is not positive definite, without the constraints the minimizer might not exist.
Figure 2: The contour and gradient of this function
purdue university · cs 52000
computational methods in optimization
HOMEWORK SOLUTION
Yingwei Wang
February 21, 2013
Homework 5
Problem 0: List your collaborators.
Please identify anyone, whether or not they are in the class, with whom you discussed your homework. This problem is worth 1 point, but on a multiplicative scale. (Note that collaboration is not allowed on the bonus question below.)
Answer: I guarantee that all of the homework was done by myself, with no discussion or collaboration with others. By the way, I really enjoy that, although it is time-consuming.
Problem 1: Make it an LP
Show that we can solve:

minimize ‖x‖1
subject to Ax = b

by constructing an LP in standard form. This problem is preparation for a mini-project that isn't quite ready yet.
Answer: It is obvious that this problem is equivalent to

minimize Σ_{i=1}^n y_i
subject to Ax = b,
−y ≤ x ≤ y.

It follows that

minimize Σ_{i=1}^n y_i
subject to Ax = b,
y − x ≥ 0,
y + x ≥ 0,
y ≥ 0.

Let t = y − x, s = y + x; then x = (s − t)/2, y = (s + t)/2. Besides, let

x = [ y ; s ; t ],  c = (1, …, 1, 0, …, 0, 0, …, 0)^t,

A = [ 0 A/2 −A/2 ; 2I −I −I ],  b = [ b ; 0 ].
Now the standard form of the LP is

minimize c^t x
subject to Ax = b, x ≥ 0.
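As a sanity check on this construction (Python/NumPy sketch with small random data of my own choosing): any x with Ax = b maps to a feasible point of the standard form whose objective value equals ‖x‖1.

```python
import numpy as np

rng = np.random.default_rng(2)
m, n = 2, 4
A = rng.standard_normal((m, n))
x = rng.standard_normal(n)
b = A @ x                                  # so that A x = b holds

# Variables of the standard form: (y, s, t) with s = y + x, t = y - x.
y = np.abs(x)                              # the tightest choice of y
s, t = y + x, y - x
z = np.concatenate([y, s, t])

A_tilde = np.block([[np.zeros((m, n)), A / 2, -A / 2],
                    [2 * np.eye(n), -np.eye(n), -np.eye(n)]])
b_tilde = np.concatenate([b, np.zeros(n)])
c_tilde = np.concatenate([np.ones(n), np.zeros(2 * n)])

assert np.all(z >= -1e-12)                 # nonnegativity
assert np.allclose(A_tilde @ z, b_tilde)   # equality constraints
assert np.isclose(c_tilde @ z, np.abs(x).sum())   # objective = ||x||_1
```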
Problem 2
Using the codes from class, illustrate the behavior of the simplex method on the LP from problem 13.9 in Nocedal and Wright:
minimize −5x1 − x2
subject to x1 + x2 ≤ 5    (1)
2x1 + (1/2)x2 ≤ 8
x ≥ 0

starting at [0, 0]^T after converting the problem to standard form. Use your judgement in reporting the behavior of the method.
Answer: Method I: Use linprog.m to solve the problem (1).
c = [-5 -1]’;
A = [1 1; 2 1/2];
b = [5 8]’;
lb = zeros(size(c));
options = optimset('LargeScale','off','Simplex','on','Display','iter');
[x,fval,exitflag,output,lambda] = linprog(c,A,b,[],[],lb,[],[0 0]’,options)
The results are as follows:
Phase 2: Minimize using simplex.
Iter Objective Dual Infeasibility
f’*x A’*y+z-w-f
0 0 5.09902
1 -20 0
Optimization terminated.
x =
4
0
fval =
-20
exitflag =
1
output =
iterations: 1
algorithm: 'medium scale: simplex'
cgiterations: []
message: 'Optimization terminated.'
constrviolation: 0
firstorderopt: 0
lambda =
ineqlin: [2x1 double]
eqlin: [0x1 double]
upper: [2x1 double]
lower: [2x1 double]
It shows that after just 1 iteration, we can find the solution.

Method II: Convert the problem (1) into standard form
minimize −5x1 − x2 (2)
subject to x1 + x2 + s1 = 5
2x1 + (1/2)x2 + s2 = 8
x1, x2, s1, s2 ≥ 0
Use the subroutine simplex_step.m to solve the problem (2):
%% Define the LP
% minimize -5x1 - x2
% subject to x1 + x2 <= 5
% 2x1 + (1/2)*x2 <= 8
% x1, x2 >= 0
%
% Except, we need this in standard form:
% minimize f’*x subject to. A*x <= b
function simplex_hw
c = [-5 -1]’;
A = [1 1; 2 1/2];
b = [5 8]’;
lb = [0,0]’;
ub = []; % No upper bound
%% Plot the LP polytope
clf;
plotregion(-A,-b,lb,ub);
box off; hold on;
% plot(x(1), x(2), 'ro','MarkerSize',16);
hold off;
%% Convert the LP into standard form
cs = [-5 -1 0 0]’;
AS = [1 1 1 0;2 1/2 0 1];
%% Start with a simple feasible point
x = [zeros(2,1); b];
Bind = [3,4];
Nind = [1,2];
simplex_step(cs,AS,b,Bind,Nind,1);
%% Show all iterates
x = [zeros(2,1); b];
Bind = [3,4];
Nind = [1,2];
sol = 0;
clf;
plotregion(-A,-b,lb,ub);
box off; hold on;
while ~sol
plot(x(1), x(2), 'ro','MarkerSize',16);
[x,Bind,Nind,sol] = simplex_step(cs,AS,b,Bind,Nind,0);
end
plot(x(1), x(2), 'r*','MarkerSize',16);
hold off;
The results are
x =
4 0 1 0
Bind =
1 3
Nind =
2 4
sol =
1
It shows that after 1 iteration, we find the solution (see Fig. 1). Besides, we also know that the minimizer is x∗_1 = 4, x∗_2 = 0 and the slack variables are s∗_1 = 1, s∗_2 = 0, which means the constraints 2x1 + (1/2)x2 ≤ 8 and x2 ≥ 0 are active while the others are not.
Figure 1: All iterates of Simplex method
Problem 3
Show that these two problems are dual by showing the equivalence of the KKT conditions:
minimize_x c^T x    (3)
subject to Ax = b, x ≥ 0

and

maximize_λ b^T λ    (4)
subject to A^T λ ≤ c, λ ≥ 0.
Answer: The Lagrangian function for the problem (3) is

L(x, λ, s) = c^T x − λ^T (Ax − b) − s^T x.

The KKT conditions for problem (3) are

∂L/∂x = c − A^T λ − s = 0,
Ax − b = 0,
x ≥ 0,
s ≥ 0,
x_i s_i = 0, i = 1, …, n.
The dual problem (4) can be rewritten as

minimize_λ −b^T λ    (5)
subject to c − A^T λ ≥ 0, λ ≥ 0.
By using x to denote the Lagrange multipliers for the constraints c − A^T λ ≥ 0, we see that the Lagrangian function is

L(λ, x) = −b^T λ − x^T (c − A^T λ).

Then we have

∂L/∂λ = −b + Ax = 0,
c − A^T λ ≥ 0,
x ≥ 0,
x_i (c − A^T λ)_i = 0, i = 1, …, n.

Let s = c − A^T λ; then we find that the KKT conditions for the dual problem are the same as the ones for the original problem.
purdue university · cs 52000
computational methods in optimization
HOMEWORK SOLUTION
Yingwei Wang
February 22, 2013
Homework 6
What is the computational complexity of the simplex method?
Consider the LP:

minimize_x c^T x    (1)
subject to Ax = b, x ≥ 0,

where A ∈ R^{m×n}, x, c ∈ R^n and b ∈ R^m. It is obvious that the total cost equals the cost at each step times the number of iterations.
About each step. Recall what the simplex algorithm does in each step (see Procedure 13.1 on page 370 of Nocedal and Wright's book). We have to solve two linear systems involving the basis matrix B ∈ R^{m×m} at each step, namely

B^T λ = c_B,  Bd = A_q.
If we do the LU factorization of B, as suggested in Section 13.4 of the textbook, then the complexity in each step is

Table 1: Computational complexity in each step
Operation | Cost
decompose B into LU | 2m^3/3 flops
solve for λ and d | 4m^2 flops

This gives a total cost per step of roughly O(m^3) (or O(m^2 n)). Now we reach the main point: in the next iteration of the simplex algorithm, only one column is exchanged between B and N. Hence, by updating the factorization instead of recomputing it, the number of operations per iteration can be reduced to O(mn).
About the number of iterations. In many cases, the number of iterations is O(αm), where α depends on n.

In 1972, Klee and Minty gave an example showing the worst-case complexity of the simplex method: in d-dimensional space, their example requires 2^d − 1 iterations.