
Introduction to Optimization Theory

Lecture Notes

JIANFEI SHEN

SCHOOL OF ECONOMICS
SHANDONG UNIVERSITY


Besides language and music, mathematics is one of the primary manifestations of the free creative power of the human mind.

— Hermann Weyl


CONTENTS

1 Multivariable Calculus
  1.1 Functions on Euclidean Space
  1.2 Directional Derivative and Derivative
  1.3 Partial Derivatives and the Jacobian
  1.4 The Chain Rule
  1.5 The Implicit Function Theorem
  1.6 Gradient and Its Properties
  1.7 Continuously Differentiable Functions
  1.8 Quadratic Forms: Definite and Semidefinite Matrices
  1.9 Homogeneous Functions and Euler's Formula

2 Optimization in $\mathbb{R}^n$
  2.1 Introduction
  2.2 Unconstrained Optimization
  2.3 Equality Constrained Optimization: Lagrange's Method
  2.4 Inequality Constrained Optimization: Kuhn-Tucker Theorem
  2.5 Envelope Theorem

3 Convex Analysis in $\mathbb{R}^n$
  3.1 Convex Sets
  3.2 Separation Theorem
  3.3 Systems of Linear Inequalities: Theorems of the Alternative
  3.4 Convex Functions
  3.5 Convexity and Optimization

References

Index


1

MULTIVARIABLE CALCULUS

In this chapter we consider functions mapping $\mathbb{R}^m$ into $\mathbb{R}^n$, and we define what we mean by the derivative of such a function. It is important to be familiar with the idea that the derivative at a point $a$ of a map between open sets of (normed) vector spaces is a linear transformation between the vector spaces (in this chapter the linear transformation is represented as an $n \times m$ matrix).

This chapter is based on Spivak (1965, Chapters 1 & 2) and Munkres (1991, Chapter 2)—one could do no better than to study these two excellent books for multivariable calculus.

Notation

We use standard notation:

$\mathbb{N} := \{1, 2, 3, \dots\}$, the set of all natural numbers;
$\mathbb{Z} := \{0, \pm 1, \pm 2, \dots\}$, the set of all integers;
$\mathbb{Q} := \{n/m : n, m \in \mathbb{Z} \text{ and } m \neq 0\}$, the set of all rational numbers;
$\mathbb{R}$, the set of all real numbers.

We also define

$\mathbb{R}_+ := \{x \in \mathbb{R} : x \geq 0\}$ and $\mathbb{R}_{++} := \{x \in \mathbb{R} : x > 0\}$.


1.1 Functions on Euclidean Space

Norm, Inner Product and Metric

Definition 1.1 (Euclidean $n$-space) Euclidean $n$-space $\mathbb{R}^n$ is defined as the set of all $n$-tuples $(x_1, \dots, x_n)$ of real numbers $x_i$:

$\mathbb{R}^n := \{(x_1, \dots, x_n) : \text{each } x_i \in \mathbb{R}\}.$

An element of $\mathbb{R}^n$ is often called a point in $\mathbb{R}^n$, and $\mathbb{R}^1$, $\mathbb{R}^2$, $\mathbb{R}^3$ are often called the line, the plane, and space, respectively.

If $x$ denotes an element of $\mathbb{R}^n$, then $x$ is an $n$-tuple of numbers, the $i$th one of which is denoted $x_i$; thus, we can write

$x = (x_1, \dots, x_n).$

A point in $\mathbb{R}^n$ is frequently also called a vector in $\mathbb{R}^n$, because $\mathbb{R}^n$, with

$x + y = (x_1 + y_1, \dots, x_n + y_n), \quad x, y \in \mathbb{R}^n,$

and

$\alpha x = (\alpha x_1, \dots, \alpha x_n), \quad \alpha \in \mathbb{R} \text{ and } x \in \mathbb{R}^n,$

as operations, is a vector space.

To obtain the full geometric structure of $\mathbb{R}^n$, we introduce three structures on $\mathbb{R}^n$: the Euclidean norm, inner product, and metric.

Definition 1.2 (Norm) In $\mathbb{R}^n$, the length of a vector $x \in \mathbb{R}^n$, usually called the norm $\|x\|$ of $x$, is defined by

$\|x\| = \sqrt{x_1^2 + \cdots + x_n^2}.$

Remark 1.3 The norm $\|\cdot\|$ satisfies the following properties: for all $x, y \in \mathbb{R}^n$ and $\alpha \in \mathbb{R}$,

- $\|x\| \geq 0$,
- $\|x\| = 0$ iff¹ $x = 0$,
- $\|\alpha x\| = |\alpha| \cdot \|x\|$,
- $\|x + y\| \leq \|x\| + \|y\|$ (triangle inequality).

¹"iff" is the abbreviation of "if and only if".


? Exercise 1.4 Prove that $\big|\|x\| - \|y\|\big| \leq \|x - y\|$ for any two vectors $x, y \in \mathbb{R}^n$ (use the triangle inequality).

Definition 1.5 (Inner Product) Given $x, y \in \mathbb{R}^n$, the inner product of the vectors $x$ and $y$, denoted $x \cdot y$ or $\langle x, y \rangle$, is defined as

$x \cdot y = \sum_{i=1}^n x_i y_i.$

Remark 1.6 The norm and the inner product are related through the following identity:

$\|x\| = \sqrt{x \cdot x}.$

Theorem 1.7 (Cauchy-Schwarz Inequality) For any $x, y \in \mathbb{R}^n$ we have

$|x \cdot y| \leq \|x\| \, \|y\|.$

Proof. We assume that $x \neq 0$, for otherwise the claim is trivial. For every $a \in \mathbb{R}$, we have

$0 \leq \|ax + y\|^2 = a^2 \|x\|^2 + 2a (x \cdot y) + \|y\|^2.$

In particular, let $a = -(x \cdot y)/\|x\|^2$. Then the display above becomes $0 \leq \|y\|^2 - (x \cdot y)^2/\|x\|^2$, which rearranges to the desired result. ∎

? Exercise 1.8 Prove the triangle inequality (use the Cauchy-Schwarz Inequality). Show that it holds with equality iff one of the vectors is a nonnegative scalar multiple of the other.

Definition 1.9 (Metric) The distance $d(x, y)$ between two vectors $x, y \in \mathbb{R}^n$ is given by

$d(x, y) = \sqrt{\sum_{i=1}^n (x_i - y_i)^2}.$

The distance function $d$ is called a metric.

Example 1.10 In $\mathbb{R}^2$, choose two points $x^1 = (x_1^1, x_2^1)$ and $x^2 = (x_1^2, x_2^2)$ with $x_1^2 - x_1^1 = a$ and $x_2^2 - x_2^1 = b$. Then Pythagoras tells us that (Figure 1.1)

$d(x^1, x^2) = \sqrt{a^2 + b^2} = \sqrt{(x_1^2 - x_1^1)^2 + (x_2^2 - x_2^1)^2}.$

Remark 1.11 The metric is related to the norm $\|\cdot\|$ through the identity

$d(x, y) = \|x - y\|.$


Figure 1.1: Distance in the plane.

Subsets of $\mathbb{R}^n$

Definition 1.12 (Open Ball) Let $x \in \mathbb{R}^n$ and $r > 0$. The open ball $B(x; r)$ with center $x$ and radius $r$ is given by

$B(x; r) := \{y \in \mathbb{R}^n : d(x, y) < r\}.$

Definition 1.13 (Interior) Let $S \subseteq \mathbb{R}^n$. A point $x \in S$ is called an interior point of $S$ if there is some $r > 0$ such that $B(x; r) \subseteq S$. The set of all interior points of $S$ is called its interior and is denoted $S^\circ$.

Definition 1.14 Let $S \subseteq \mathbb{R}^n$.

- $S$ is open if for every $x \in S$ there exists $r > 0$ such that $B(x; r) \subseteq S$.
- $S$ is closed if its complement $\mathbb{R}^n \setminus S$ is open.
- $S$ is bounded if there exists $r > 0$ such that $S \subseteq B(0; r)$.
- $S$ is compact if (and only if) it is closed and bounded (Heine-Borel Theorem).²

Example 1.15 On $\mathbb{R}$, the interval $(0, 1)$ is open and the interval $[0, 1]$ is closed. Both $(0, 1)$ and $[0, 1]$ are bounded, and $[0, 1]$ is compact. However, the interval $(0, 1]$ is neither open nor closed. But $\mathbb{R}$ is both open and closed.

²This definition does not work for more general metric spaces. See Willard (2004) for details.


Limit and Continuity

Functions A function from $\mathbb{R}^m$ to $\mathbb{R}^n$ (sometimes called a vector-valued function of $m$ variables) is a rule which associates to each point in $\mathbb{R}^m$ some point in $\mathbb{R}^n$. We write

$f : \mathbb{R}^m \to \mathbb{R}^n$

to indicate that $f(x) \in \mathbb{R}^n$ is defined for $x \in \mathbb{R}^m$.

The notation $f : A \to \mathbb{R}^n$ indicates that $f(x)$ is defined only for $x$ in the set $A$, which is called the domain of $f$. If $B \subseteq A$, we define $f(B)$ as the set of all $f(x)$ for $x \in B$:

$f(B) := \{f(x) : x \in B\}.$

If $C \subseteq \mathbb{R}^n$ we define

$f^{-1}(C) := \{x \in A : f(x) \in C\}.$

The notation $f : A \to B$ indicates that $f(A) \subseteq B$. The graph of $f : A \to B$ is the set of all pairs $(a, b) \in A \times B$ such that $b = f(a)$.

A function $f : A \to \mathbb{R}^n$ determines $n$ component functions $f_1, \dots, f_n : A \to \mathbb{R}$ by

$f(x) = (f_1(x), \dots, f_n(x)).$

Sequences A sequence is a function that assigns to each natural number $n \in \mathbb{N}$ a vector or point $x_n \in \mathbb{R}^n$. We usually write a sequence as $(x_n)_{n=1}^\infty$ or $(x_n)$.

Example 1.16 Examples of sequences in $\mathbb{R}^2$ are

(a) $(x_n) = ((n, n))$.

(b) $(x_n) = ((\cos \frac{n\pi}{2}, \sin \frac{n\pi}{2}))$.

(c) $(x_n) = (((-1)^n/2^n, 1/2^n))$.

(d) $(x_n) = (((-1)^n - 1/n, (-1)^n - 1/n))$.

See Figure 1.2.

Definition 1.17 (Limit) A sequence $(x_n)$ is said to have a limit $x$, or to converge to $x$, if for every $\varepsilon > 0$ there is $N_\varepsilon \in \mathbb{N}$ such that whenever $n > N_\varepsilon$ we have $x_n \in B(x; \varepsilon)$. We write

$\lim_{n \to \infty} x_n = x \quad \text{or} \quad x_n \to x.$

Example 1.18 In Example 1.16, the sequences (a), (b) and (d) do not converge, while the sequence (c) converges to $(0, 0)$.


Figure 1.2: Examples of sequences. (a) $(x_n) = ((n, n))$. (b) $(x_n) = ((\cos\frac{n\pi}{2}, \sin\frac{n\pi}{2}))$. (c) $(x_n) = (((-1)^n/2^n, 1/2^n))$, which is convergent. (d) $(x_n) = (((-1)^n - 1/n, (-1)^n - 1/n))$.


Figure 1.3: Naive continuity. (a) A continuous function. (b) We cannot draw the graph of $1/x$ without taking our pencil off the paper.

Naive Continuity The simplest way to say that a function $f : A \to \mathbb{R}$ is continuous is to say that one can draw its graph without taking the pencil off the paper. For example, a function whose graph looks like Figure 1.3(a) would be continuous in this sense (Crossley, 2005, Chapter 2).

But if we look at the function $f(x) = 1/x$, then we see that things are not so simple. The graph of this function has two parts: one part corresponding to negative $x$ values, and the other to positive $x$ values. The function is not defined at $0$, so we certainly cannot draw both parts of this graph without taking our pencil off the paper; see Figure 1.3(b). Of course, $f(x) = 1/x$ is continuous near every point in its domain. Such a function deserves to be called continuous. So this characterization of continuity in terms of graph-sketching is too simplistic.

Rigorous Continuity The notation $\lim_{x \to a} f(x) = b$ means, as in the one-variable case, that we can get $f(x)$ as close to $b$ as desired by choosing $x$ sufficiently close to, but not equal to, $a$. In mathematical terms, this means that for every number $\varepsilon > 0$ there is a number $\delta > 0$ such that $\|f(x) - b\| < \varepsilon$ for all $x$ in the domain of $f$ which satisfy $0 < \|x - a\| < \delta$.

A function $f : A \to \mathbb{R}^n$ is called continuous at $a \in A$ if $\lim_{x \to a} f(x) = f(a)$, and $f$ is continuous if it is continuous at each $a \in A$.


? Exercise 1.19 Let

$f(x) = \begin{cases} x & \text{if } x \neq 1, \\ 3/2 & \text{if } x = 1. \end{cases}$

Show that $f$ is not continuous at $a = 1$.

1.2 Directional Derivative and Derivative

Let us first recall how the derivative of a real-valued function of a real variable is defined. Let $A \subseteq \mathbb{R}$; let $f : A \to \mathbb{R}$. Suppose $A$ contains a neighborhood of the point $a$, that is, there is an open ball $B(a; r)$ such that $B(a; r) \subseteq A$. We define the derivative of $f$ at $a$ by the equation

$f'(a) = \lim_{t \to 0} \frac{f(a + t) - f(a)}{t}, \quad (1.1)$

provided the limit exists. In this case, we say that $f$ is differentiable at $a$. Geometrically, $f'(a)$ is the slope of the tangent line to the graph of $f$ at the point $(a, f(a))$.

Definition 1.20 For a function $f : (a, b) \to \mathbb{R}$ and a point $x_0 \in (a, b)$, if

$\lim_{t \uparrow 0} \frac{f(x_0 + t) - f(x_0)}{t}$

exists and is finite, we denote this limit by $f'_-(x_0)$ and call it the left-hand derivative of $f$ at $x_0$. Similarly, we define $f'_+(x_0)$ and call it the right-hand derivative of $f$ at $x_0$. Of course, $f$ is differentiable at $x_0$ iff it has left-hand and right-hand derivatives at $x_0$ that are equal.

Now let $A \subseteq \mathbb{R}^m$, where $m > 1$; let $f : A \to \mathbb{R}^n$. Can we define the derivative of $f$ by replacing $a$ and $t$ in the definition just given by points of $\mathbb{R}^m$? Certainly we cannot, since we cannot divide a point of $\mathbb{R}^n$ by a point of $\mathbb{R}^m$ if $m > 1$.

Directional Derivative

The following is our first attempt at a definition of "derivative". Let $A \subseteq \mathbb{R}^m$ and let $f : A \to \mathbb{R}^n$. We study how $f$ changes as we move from a point $a \in A^\circ$ (the interior of $A$) along a line segment to a nearby point $a + u$, where $u \neq 0$. Each point on the segment can be expressed as $a + tu$, where $t \in \mathbb{R}$. The vector $u$ describes the direction of the line segment. Since $a$ is an interior point of $A$, the line segment joining $a$ to $a + tu$ will lie in $A$ if $t$ is small enough.


Definition 1.21 (Directional Derivative) Let $A \subseteq \mathbb{R}^m$; let $f : A \to \mathbb{R}^n$. Suppose $A$ contains a neighborhood of $a$. Given $u \in \mathbb{R}^m$ with $u \neq 0$, define

$f'(a; u) = \lim_{t \to 0} \frac{f(a + tu) - f(a)}{t},$

provided the limit exists. This limit is called the directional derivative of $f$ at $a$ with respect to the vector $u$.

Remark 1.22 In calculus, one usually requires $u$ to be a unit vector, i.e., $\|u\| = 1$, but that is not necessary.

Example 1.23 Let $f : \mathbb{R}^2 \to \mathbb{R}$ be given by the equation $f(x_1, x_2) = x_1 x_2$. The directional derivative of $f$ at $a = (a_1, a_2)$ with respect to the vector $u = (u_1, u_2)$ is

$f'(a; u) = \lim_{t \to 0} \frac{f(a + tu) - f(a)}{t} = \lim_{t \to 0} \frac{(a_1 + tu_1)(a_2 + tu_2) - a_1 a_2}{t} = u_2 a_1 + u_1 a_2.$
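The limit definition is easy to check numerically. Here is a small sketch (an added illustration, not part of the original notes) that approximates $f'(a; u)$ for Example 1.23 with a difference quotient; the helper name and the step size are arbitrary choices:

```python
import numpy as np

def f(x):
    # f(x1, x2) = x1 * x2, as in Example 1.23
    return x[0] * x[1]

def directional_derivative(f, a, u, t=1e-6):
    # Symmetric difference quotient approximating f'(a; u)
    return (f(a + t * u) - f(a - t * u)) / (2 * t)

a = np.array([2.0, 3.0])
u = np.array([0.5, -1.0])
print(directional_derivative(f, a, u))  # ~ -0.5
print(u[1] * a[0] + u[0] * a[1])        # closed form u2*a1 + u1*a2 = -0.5
```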

Example 1.24 Suppose the directional derivative of $f$ at $a$ with respect to $u$ exists. Then for $c \in \mathbb{R}$,

$f'(a; cu) = \lim_{t \to 0} \frac{f(a + tcu) - f(a)}{t} = c \lim_{t \to 0} \frac{f(a + tcu) - f(a)}{tc} = c \lim_{s \to 0} \frac{f(a + su) - f(a)}{s} = c f'(a; u).$

Remark 1.25 Example 1.24 shows that if $u$ and $v$ are collinear vectors in $\mathbb{R}^m$, then $f'(a; u)$ and $f'(a; v)$ are collinear in $\mathbb{R}^n$.

However, the directional derivative is not the appropriate generalization of the notion of "derivative". The main problems are:

Problem 1. Continuity does not follow from this definition of "differentiability". There exist functions such that $f'(a; u)$ exists for all $u \neq 0$ but which are not continuous at $a$.

Example 1.26 Define $f : \mathbb{R}^2 \to \mathbb{R}$ by setting

$f(x, y) = \begin{cases} 0 & \text{if } (x, y) = (0, 0), \\ \dfrac{x^2 y}{x^4 + y^2} & \text{if } (x, y) \neq (0, 0). \end{cases}$


We show that all directional derivatives of $f$ exist at $0$, but that $f$ is not continuous at $0$. Let $u = (h, k) \neq 0$. Then

$\frac{f(0 + tu) - f(0)}{t} = \frac{(th)^2 (tk)}{\big[(th)^4 + (tk)^2\big] t} = \frac{h^2 k}{t^2 h^4 + k^2},$

so that

$f'(0; u) = \begin{cases} h^2/k & \text{if } k \neq 0, \\ 0 & \text{if } k = 0. \end{cases}$

However, the function $f$ takes the value $1/2$ at each point of the parabola $y = x^2$ (except at $0$), so $f$ is not continuous at $0$ since $f(0) = 0$.
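A quick numerical illustration of this pathology (an added sketch, not from the notes): along every straight line through the origin the values of $f$ tend to $0$, yet along the parabola $y = x^2$ they are identically $1/2$.

```python
def f(x, y):
    # The function of Example 1.26
    return 0.0 if (x, y) == (0.0, 0.0) else x**2 * y / (x**4 + y**2)

# Along a straight line through 0, f tends to 0:
for t in (1e-1, 1e-3, 1e-6):
    print(f(t * 1.0, t * 0.7))   # -> 0
# Along the parabola y = x^2, the value never leaves 1/2:
for x in (1e-1, 1e-3, 1e-6):
    print(f(x, x**2))            # 0.5 each time
```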

Problem 2. Composites of "differentiable" functions may fail to be differentiable.

Derivative

To give the right generalization of the notion of "derivative", let us reconsider (1.1). If $f'(a)$ exists, let $R_a(t)$ denote the difference

$R_a(t) := \begin{cases} \dfrac{f(a + t) - f(a)}{t} - f'(a) & \text{if } t \neq 0, \\ 0 & \text{if } t = 0. \end{cases} \quad (1.2)$

From (1.2) we see that $\lim_{t \to 0} R_a(t) = 0$. Then we have

$f(a + t) = f(a) + f'(a)t + R_a(t)t. \quad (1.3)$

Note that (1.3) also holds for $t = 0$. This is called the first-order Taylor formula for approximating $f(a + t) - f(a)$ by $f'(a)t$. The error committed is $R_a(t)t$.³ See Figure 1.4. It is this idea that leads to the following definition:

Definition 1.27 (Differentiability) Let $A \subseteq \mathbb{R}^m$; let $f : A \to \mathbb{R}^n$. Suppose $A$ contains a neighborhood of $a$. We say that $f$ is differentiable at $a$ if there is an $n \times m$ matrix $B_a$ such that

$f(a + h) = f(a) + B_a \cdot h + \|h\| R_a(h),$

where $\lim_{h \to 0} R_a(h) = 0$. The matrix $B_a$, which is unique, is called the derivative of $f$ at $a$; it is denoted $Df(a)$.

³"All science is dominated by the idea of approximation."—Bertrand Russell.


Figure 1.4: $f'(a)t$ is the linear approximation to $f(a + t) - f(a)$ at $a$.

Figure 1.5: $Df(a)$ is the linear part of $f$ at $a$.

Remark 1.28 [1] Notice that $h$ is a point of $\mathbb{R}^m$ and $f(a + h) - f(a) - B_a \cdot h$ is a point of $\mathbb{R}^n$, so the norm signs are essential.

[2] The derivative $Df(a)$ depends on the point $a$ as well as the function $f$. We are not saying that there exists a $B$ which works for all $a$, but that for a fixed $a$ such a $B$ exists.

[3] Here is how to visualize $Df$. Take $m = n = 2$. The function $f : A \to \mathbb{R}^2$ distorts shapes nonlinearly; its derivative describes the linear part of the distortion. Circles are sent by $f$ to wobbly ovals, but they become ellipses under $Df(a)$ (here we treat $Df(a)$ as the matrix that represents a linear operator; see, e.g., Axler 1997). Lines are sent by $f$ to curves, but they become straight lines under $Df(a)$. See Figure 1.5.

Example 1.29 Let $f : \mathbb{R}^m \to \mathbb{R}^n$ be defined by the equation

$f(x) = A \cdot x + b,$

where $A$ is an $n \times m$ matrix and $b \in \mathbb{R}^n$. Then

$f(a + h) = A \cdot (a + h) + b = A \cdot a + b + A \cdot h = f(a) + A \cdot h.$

Hence, $R_a(h) = [f(a + h) - f(a) - A \cdot h]/\|h\| = 0$; that is, $Df(a) = A$.

We now show that the definition of the derivative is stronger than that of the directional derivative. In particular, we have:

Theorem 1.30 Let $A \subseteq \mathbb{R}^m$; let $f : A \to \mathbb{R}^n$. If $f$ is differentiable at $a$, then $f$ is continuous at $a$.

Proof. Differentiability at $a$ implies that

$\|f(a + h) - f(a)\| = \big\| Df(a) \cdot h + \|h\| R_a(h) \big\| \leq \|Df(a)\| \cdot \|h\| + \|R_a(h)\| \cdot \|h\| \to 0$

as $a + h \to a$, where the inequality follows from the triangle inequality and the Cauchy-Schwarz inequality (Theorem 1.7). ∎

However, there is a nice connection between the directional derivative and the derivative.

Theorem 1.31 Let $A \subseteq \mathbb{R}^m$; let $f : A \to \mathbb{R}^n$. If $f$ is differentiable at $a$, then all the directional derivatives of $f$ at $a$ exist, and

$f'(a; u) = Df(a) \cdot u.$

Proof. Fix any $u \in \mathbb{R}^m$, $u \neq 0$, and take $h = tu$. Then

$\frac{f(a + tu) - f(a)}{t} = \frac{Df(a) \cdot (tu) + \|tu\| R_a(tu)}{t} = Df(a) \cdot u + \frac{|t| \, \|u\|}{t} R_a(tu).$

The last term converges to zero as $t \to 0$, which proves that $f'(a; u) = Df(a) \cdot u$. ∎

? Exercise 1.32 Define $f : \mathbb{R}^2 \to \mathbb{R}$ by setting

$f(x, y) = \begin{cases} 0 & \text{if } (x, y) = (0, 0), \\ \dfrac{x^2 y}{x^4 + y^2} & \text{if } (x, y) \neq (0, 0). \end{cases}$

Show that $f$ is not differentiable at $(0, 0)$.

? Exercise 1.33 Let $f : \mathbb{R}^2 \to \mathbb{R}$ be defined by $f(x, y) = \sqrt{|xy|}$. Show that $f$ is not differentiable at $(0, 0)$.


1.3 Partial Derivatives and the Jacobian

We now introduce the notion of the "partial derivatives" of a real-valued function. Let $(e_1, \dots, e_m)$ be the standard basis of $\mathbb{R}^m$, i.e.,

$e_1 = (1, 0, 0, \dots, 0), \quad e_2 = (0, 1, 0, \dots, 0), \quad \dots, \quad e_m = (0, 0, \dots, 0, 1).$

Definition 1.34 (Partial Derivatives) Let $A \subseteq \mathbb{R}^m$; let $f : A \to \mathbb{R}$. We define the $j$th partial derivative of $f$ at $a$ to be the directional derivative of $f$ at $a$ with respect to the vector $e_j$, provided this derivative exists; and we denote it by $D_j f(a)$. That is,

$D_j f(a) = \lim_{t \to 0} \frac{f(a + t e_j) - f(a)}{t}.$

Remark 1.35 It is important to note that $D_j f(a)$ is the ordinary derivative of a certain function; in fact, if $g(x) = f(a_1, \dots, a_{j-1}, x, a_{j+1}, \dots, a_m)$, then $D_j f(a) = g'(a_j)$. This means that $D_j f(a)$ is the slope of the tangent line at $(a, f(a))$ to the curve obtained by intersecting the graph of $f$ with the planes $x_i = a_i$ for $i \neq j$. See Figure 1.6.

The following theorem relates partial derivatives to the derivative in the case where $f$ is a real-valued function.

Theorem 1.36 Let $A \subseteq \mathbb{R}^m$; let $f : A \to \mathbb{R}$. If $f$ is differentiable at $a$, then

$Df(a) = \begin{bmatrix} D_1 f(a) & D_2 f(a) & \cdots & D_m f(a) \end{bmatrix}.$

Proof. If $f$ is differentiable at $a$, then $Df(a)$ is a $1 \times m$ matrix. Let

$Df(a) = \begin{bmatrix} \lambda_1 & \lambda_2 & \cdots & \lambda_m \end{bmatrix}.$

It follows from Theorem 1.31 that

$D_j f(a) = f'(a; e_j) = Df(a) \cdot e_j = \lambda_j. \quad ∎$

1.4 The Chain Rule

We extend the familiar chain rule to the current setting.


Figure 1.6: $D_1 f(a, b)$.

Theorem 1.37 (Chain Rule) Let $A \subseteq \mathbb{R}^m$; let $B \subseteq \mathbb{R}^n$. Let $f : A \to \mathbb{R}^n$ and $g : B \to \mathbb{R}^p$, with $f(A) \subseteq B$. Suppose $f(a) = b$. If $f$ is differentiable at $a$, and if $g$ is differentiable at $b$, then the composite function $g \circ f : A \to \mathbb{R}^p$ is differentiable at $a$. Furthermore,

$D(g \circ f)(a) = Dg(b) \cdot Df(a).$

Proof. Omitted. See, e.g., Spivak (1965, Theorem 2-2), Rudin (1976, Theorem 9.15), or Munkres (1991, Theorem 7.1). ∎

Here is an application of the Chain Rule.

Theorem 1.38 Let $A \subseteq \mathbb{R}^m$; let $f : A \to \mathbb{R}^n$. Suppose $A$ contains a neighborhood of $a$. Let $f_i : A \to \mathbb{R}$ be the $i$th component function of $f$, so that

$f(x) = \begin{bmatrix} f_1(x) \\ \vdots \\ f_n(x) \end{bmatrix}.$

(a) The function $f$ is differentiable at $a$ if and only if each component function $f_i$ is differentiable at $a$.


(b) If $f$ is differentiable at $a$, then its derivative is the $n \times m$ matrix whose $i$th row is the derivative of the function $f_i$. That is,

$Df(a) = \begin{bmatrix} Df_1(a) \\ \vdots \\ Df_n(a) \end{bmatrix} = \begin{bmatrix} D_1 f_1(a) & \cdots & D_m f_1(a) \\ \vdots & \ddots & \vdots \\ D_1 f_n(a) & \cdots & D_m f_n(a) \end{bmatrix}.$

Proof. (a) Assume that $f$ is differentiable at $a$ and express the $i$th component of $f$ as

$f_i = \pi_i \circ f,$

where $\pi_i : \mathbb{R}^n \to \mathbb{R}$ is the projection that sends a vector $x = (x_1, \dots, x_n)$ to $x_i$. Notice that we can write $\pi_i$ as $\pi_i(x) = A \cdot x$, where $A$ is the $1 \times n$ matrix

$A = \begin{bmatrix} 0 & \cdots & 0 & 1 & 0 & \cdots & 0 \end{bmatrix},$

with the $1$ in the $i$th place. Then $\pi_i$ is differentiable and $D\pi_i(x) = A$ for all $x$ (see Example 1.29). By the Chain Rule, $f_i$ is differentiable at $a$ and

$Df_i(a) = D(\pi_i \circ f)(a) = D\pi_i(f(a)) \cdot Df(a) = A \cdot Df(a). \quad (1.4)$

Now suppose that each $f_i$ is differentiable at $a$. Let

$B = \begin{bmatrix} Df_1(a) \\ \vdots \\ Df_n(a) \end{bmatrix}.$

We show that $Df(a) = B$:

$\|h\| R_a(h) = f(a + h) - f(a) - B \cdot h = \begin{bmatrix} f_1(a + h) - f_1(a) - Df_1(a) \cdot h \\ \vdots \\ f_n(a + h) - f_n(a) - Df_n(a) \cdot h \end{bmatrix} = \|h\| \begin{bmatrix} R_a(h; f_1) \\ \vdots \\ R_a(h; f_n) \end{bmatrix},$

where $R_a(h; f_i)$ is the Taylor remainder for $f_i$. Since each $R_a(h; f_i) \to 0$ as $h \to 0$, we have $\lim_{h \to 0} R_a(h) = 0$, which proves that $Df(a) = B$.


(b) This claim follows from the previous part and Theorem 1.36. ∎

Remark 1.39 Theorem 1.38 implies that there is little loss of generality in assuming $n = 1$, i.e., that our functions are real-valued. Multidimensionality of the domain, not the range, is what distinguishes multivariable calculus from one-variable calculus.

Definition 1.40 (Jacobian Matrix) Let $A \subseteq \mathbb{R}^m$; let $f : A \to \mathbb{R}^n$. If the partial derivatives of the component functions $f_i$ of $f$ exist at $a$, then one can form the matrix that has $D_j f_i(a)$ as its entry in row $i$ and column $j$. This matrix, denoted by $Jf(a)$, is called the Jacobian matrix of $f$. That is,

$Jf(a) = \begin{bmatrix} D_1 f_1(a) & \cdots & D_m f_1(a) \\ \vdots & \ddots & \vdots \\ D_1 f_n(a) & \cdots & D_m f_n(a) \end{bmatrix}.$

Remark 1.41 [1] The Jacobian encapsulates all the essential information regarding the linear function that best approximates a differentiable function at a particular point. For this reason it is the Jacobian which is usually used in practical calculations with the derivative.

[2] If $f$ is differentiable at $a$, then $Jf(a) = Df(a)$. However, it is possible for the partial derivatives, and hence the Jacobian matrix, to exist without $f$ being differentiable at $a$ (see Exercise 1.32).
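In practical calculations the Jacobian is often approximated by difference quotients. A minimal sketch assuming NumPy (the helper name `jacobian_fd` is ours, not the notes'):

```python
import numpy as np

def jacobian_fd(f, a, eps=1e-6):
    # Central-difference approximation of Jf(a); column j approximates D_j f(a).
    a = np.asarray(a, dtype=float)
    cols = []
    for j in range(a.size):
        e = np.zeros(a.size)
        e[j] = 1.0
        cols.append((f(a + eps * e) - f(a - eps * e)) / (2 * eps))
    return np.column_stack(cols)

# Example: f(x, y) = (x*y, x + y^2); Jf(2, 3) = [[3, 2], [1, 6]]
f = lambda x: np.array([x[0] * x[1], x[0] + x[1]**2])
print(jacobian_fd(f, [2.0, 3.0]))
```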

1.5 The Implicit Function Theorem

Let $U \subseteq \mathbb{R}^k \times \mathbb{R}^n$ be open, and let $f : U \to \mathbb{R}^n$. Fix a point $(a, b) \in U$ at which $f(a, b) = 0$. Our goal is to solve the equation

$f(x, y) = 0 \quad (1.5)$

near $(a, b)$. More precisely, we hope to show that the set of points $(x, y)$ near $(a, b)$ at which $f(x, y) = 0$, the level set of $f$ through $0$, is the graph of a function $y = g(x)$. If so, $g$ is the implicit function defined by (1.5). See Figure 1.7.

Under various hypotheses we will show that $g$ exists, is unique, and is differentiable. Let us first consider an example.

Example 1.42 Consider the function $f : \mathbb{R}^2 \to \mathbb{R}$ defined by

$f(x, y) = x^2 + y^2 - 1.$


Figure 1.7: Near $(a, b)$, $L_f(0)$ is the graph of a function $y = g(x)$.

If we choose $(a, b)$ with $f(a, b) = 0$ and $a \neq \pm 1$, there are (Figure 1.8) open intervals $A$ containing $a$ and $B$ containing $b$ with the following property: if $x \in A$, there is a unique $y \in B$ with $f(x, y) = 0$. We can therefore define a function $g : A \to \mathbb{R}$ by the conditions $g(x) \in B$ and $f(x, g(x)) = 0$ (if $b > 0$, as indicated in Figure 1.8, then $g(x) = \sqrt{1 - x^2}$). For the function $f$ we are considering, there is another number $b_1$ such that $f(a, b_1) = 0$. There will also be an interval $B_1$ containing $b_1$ such that, when $x \in A$, we have $f(x, g_1(x)) = 0$ for a unique $g_1(x) \in B_1$ (here $g_1(x) = -\sqrt{1 - x^2}$). Both $g$ and $g_1$ are differentiable. These functions are said to be defined implicitly by the equation $f(x, y) = 0$.

If we choose $a = 1$ or $a = -1$, it is impossible to find any such function $g$ defined in an open interval containing $a$.

Theorem 1.43 (Implicit Function Theorem) Let $U \subseteq \mathbb{R}^{k+n}$ be open; let $f : U \to \mathbb{R}^n$ be of class $C^r$. Write $f$ in the form $f(x, y)$ for $x \in \mathbb{R}^k$ and $y \in \mathbb{R}^n$. Suppose that $(a, b)$ is a point of $U$ such that $f(a, b) = 0$. Let $M$ be the $n \times n$ matrix

$M = \begin{bmatrix} D_{k+1} f_1(a, b) & D_{k+2} f_1(a, b) & \cdots & D_{k+n} f_1(a, b) \\ D_{k+1} f_2(a, b) & D_{k+2} f_2(a, b) & \cdots & D_{k+n} f_2(a, b) \\ \vdots & \vdots & \ddots & \vdots \\ D_{k+1} f_n(a, b) & D_{k+2} f_n(a, b) & \cdots & D_{k+n} f_n(a, b) \end{bmatrix}.$

If $\det(M) \neq 0$, then near $(a, b)$, $L_f(0)$ is the graph of a unique function $y = g(x)$. Moreover, $g$ is $C^r$.

Proof. The proof is too long to give here. See, e.g., Spivak (1965, Theorem 2-12), Rudin (1976, Theorem 9.28), Munkres (1991, Theorem 2.9.2), or Pugh (2002, Theorem 5.22). ∎


Figure 1.8: Implicit function theorem: near $(a, b)$ the level set $f(x, y) = 0$ is the graph of $g$, and near $(a, b_1)$ it is the graph of $g_1$.

Example 1.44 Let $f : \mathbb{R}^2 \to \mathbb{R}$ be given by the equation

$f(x, y) = x^2 - y^3.$

Then $(0, 0)$ is a solution of the equation $f(x, y) = 0$. Because $\partial f(0, 0)/\partial y = 0$, we do not expect to be able to solve this equation for $y$ in terms of $x$ near $(0, 0)$. But in fact, we can; and furthermore, the solution is unique! However, the function we obtain is not differentiable at $x = 0$. See Figure 1.9.

Example 1.45 Let $f : \mathbb{R}^2 \to \mathbb{R}$ be given by the equation

$f(x, y) = -x^4 + y^2.$

Then $(0, 0)$ is a solution of the equation $f(x, y) = 0$. Because $\partial f(0, 0)/\partial y = 0$, we do not expect to be able to solve for $y$ in terms of $x$ near $(0, 0)$. In fact, however, we can do so, and we can do so in such a way that the resulting function is differentiable. However, the solution is not unique. See Figure 1.10.

Now the point $(1, 1)$ is also a solution to $f(x, y) = 0$. Because $\partial f(1, 1)/\partial y = 2$, one can solve this equation for $y$ as a continuous function of $x$ in a neighborhood of $x = 1$. See Figure 1.10.


Figure 1.9: $y$ is not differentiable at $x = 0$.

Figure 1.10: Example 1.45.

Remark 1.46 We will use the Implicit Function Theorem in Theorem 2.12. The theorem can also be used to derive comparative statics for economic models, which we may not have time to discuss.

1.6 Gradient and Its Properties

In this section we investigate the significance of the gradient vector, which is defined as follows:

Definition 1.47 (Gradient) Let $A \subseteq \mathbb{R}^m$; let $f : A \to \mathbb{R}$. Suppose $A$ contains a neighborhood of $a$. The gradient of $f$ at $a$, denoted $\nabla f(a)$, is defined by

$\nabla f(a) := \begin{bmatrix} D_1 f(a) & D_2 f(a) & \cdots & D_m f(a) \end{bmatrix}.$

Remark 1.48 [1] It follows from Theorem 1.36 that if $f$ is differentiable at $a$, then $\nabla f(a) = Df(a)$. The converse does not hold; see Remark 1.41[2].


Figure 1.11: The geometric interpretation of $\nabla f(x)$. (a) $\nabla f(1, 3) = (2, 6)$ is orthogonal to the tangent plane of $L_f(10)$ at $(1, 3)$. (b) At $x \in L_f(c)$, the level set has slope $-D_1 f(x)/D_2 f(x)$.

[2] With the notation of the gradient, we can write the Jacobian of $f = (f_1, \dots, f_n)$ as

$Jf(a) = \begin{bmatrix} \nabla f_1(a) \\ \vdots \\ \nabla f_n(a) \end{bmatrix}.$

Let $f : \mathbb{R}^m \to \mathbb{R}^n$ with $f = (f_1, \dots, f_n)$. Recall that the level set of $f$ through $0$ is given by

$L_f(0) := \{x \in \mathbb{R}^m : f(x) = 0\} = \bigcap_{i=1}^n L_{f_i}(0). \quad (1.6)$

Given a point $a \in L_f(0)$, it is intuitively clear what it means for a plane to be tangent to $L_f(0)$ at $a$. Figure 1.12 shows some examples of tangent planes. A formal definition of the tangent plane will be given later. In this section we show that the gradient $\nabla f(a)$ is orthogonal to the tangent plane of $L_f(0)$ at $a$ and that, under some conditions, the tangent plane of $L_f(0)$ at $a$ can be characterized by the vectors that are orthogonal to $\nabla f(a)$. We begin with the simplest case, $f : \mathbb{R}^2 \to \mathbb{R}$. Here is an example.

Example 1.49 Let $f(x, y) = x^2 + y^2$. The level set $L_f(10)$ is given by

$L_f(10) = \{(x, y) \in \mathbb{R}^2 : x^2 + y^2 = 10\}.$

Calculus yields

$\frac{dy}{dx}\bigg|_{\text{along } L_f(10) \text{ at } (1, 3)} = -\frac{1}{3}.$


Figure 1.12: Examples of tangent planes. (a) $f : \mathbb{R}^2 \to \mathbb{R}$. (b) $f : \mathbb{R}^3 \to \mathbb{R}$. (c) $f_1, f_2 : \mathbb{R}^3 \to \mathbb{R}$.


Hence, the tangent plane at $(1, 3)$ is given by $y = 3 - (x - 1)/3$. Since $\nabla f(1, 3) = (2, 6)$, the result follows immediately; see Figure 1.11(a).

The result in Example 1.49 can be explained as follows. If we change $x_1$ and $x_2$ and are to remain on $L_f(0)$, then $dx_1$ and $dx_2$ must be such as to leave the value of $f$ unchanged at $0$. They must therefore satisfy

$f'(x; (dx_1, dx_2)) = D_1 f(x)\, dx_1 + D_2 f(x)\, dx_2 = 0. \quad (1.7)$

Solving (1.7) for $dx_2/dx_1$, the slope of the level set through $x$ is (see Figure 1.11(b))

$\frac{dx_2}{dx_1} = -\frac{D_1 f(x)}{D_2 f(x)}.$

Since the slope of the vector $\nabla f(x) = (D_1 f(x), D_2 f(x))$ is $D_2 f(x)/D_1 f(x)$, we obtain the desired result.

We now present the general result. For simplicity, we assume throughout this section that each $f_i \in C^1$. For $L_f(0)$ defined in (1.6), obtaining an explicit representation for the tangent plane is a fundamental problem that we now address. First we define curves on $L_f(0)$ and the tangent plane at a point. You may want to refer to some differential geometry textbooks, e.g., O'Neill (2006), Spivak (1999), or Lee (2009), to understand some of the following concepts better.

One can picture a curve in $\mathbb{R}^m$ as a trip taken by a moving point $c$. At each "time" $t$ in some interval $[a, b] \subseteq \mathbb{R}$, $c$ is located at the point

$c(t) = (c_1(t), \dots, c_m(t)) \in \mathbb{R}^m.$

In rigorous terms, then, $c$ is a function from $[a, b]$ to $\mathbb{R}^m$, and the component functions $c_1, \dots, c_m$ are its Euclidean coordinate functions. We define the function $c$ to be differentiable provided its component functions are differentiable.

Example 1.50 A helix $c : \mathbb{R} \to \mathbb{R}^3$ is obtained through the formula

$c(t) = (a \cos t, a \sin t, bt),$

where $a, b > 0$. See Figure 1.13.

Definition 1.51 A curve on $L_f(0)$ is a continuous curve $c : [a, b] \to L_f(0)$. A curve $c(t)$ is said to pass through the point $a \in L_f(0)$ if $a = c(t^*)$ for some $t^* \in [a, b]$.


Figure 1.13: The Helix.

Definition 1.52 The tangent plane at $a \in L_f(0)$, denoted $T_f(a)$, is defined as the collection of the derivatives at $a$ of all differentiable curves on $L_f(0)$ passing through $a$.

Ideally, we would like to express the tangent plane defined in Definition 1.52 in terms of the derivatives of the functions $f_i$ that define the surface $L_f(0)$ (see Example 1.49). We introduce the subspace

$M := \{x \in \mathbb{R}^m : Df(a) \cdot x = 0\} \quad (1.8)$

and investigate under what conditions $M$ is equal to the tangent plane at $a$.

The following result shows that each $\nabla f_i(a)$ is orthogonal to the tangent plane $T_f(a)$ for every $a \in L_f(0)$.

Theorem 1.53 For each $a \in L_f(0)$, the gradient $\nabla f_i(a)$ is orthogonal to the tangent plane $T_f(a)$.

Proof. We establish this result by showing $T_f(a) \subseteq M$ for each $a \in L_f(0)$. Every differentiable curve $c(t)$ on $L_f(0)$ passing through $a$ at $t = t^*$ satisfies $f(c(t)) = 0$ identically; differentiating at $t = t^*$ gives

$Df(c(t^*)) \cdot Dc(t^*) = 0.$

That is, $Dc(t^*) \in M$. ∎

Definition 1.54 A point $a \in L_f(0)$ is said to be a regular point if the gradient vectors $\nabla f_1(a), \dots, \nabla f_n(a)$ are linearly independent.

In general, at regular points it is possible to characterize the tangent plane in terms of $\nabla f_1(a), \dots, \nabla f_n(a)$.


Theorem 1.55 At a regular point $a$ of $L_f(0)$, the tangent plane $T_f(a)$ is equal to $M$.

Proof. We show that $M \subseteq T_f(a)$. Combining this with Theorem 1.53, we have $T_f(a) = M$.

To show $M \subseteq T_f(a)$, we must show that if $x \in M$ then there exists a curve on $L_f(0)$ passing through $a$ with derivative $x$. To construct such a curve we consider the equations

$f\big(a + tx + Df(a)^T u(t)\big) = 0, \quad (1.9)$

where for fixed $t$ we consider $u(t) \in \mathbb{R}^n$ to be the unknown. This is a nonlinear system of $n$ equations in $n$ unknowns, parametrized continuously by $t$. At $t = 0$ there is a solution $u(0) = 0$. The Jacobian matrix of the system with respect to $u$ at $t = 0$ is the $n \times n$ matrix

$Df(a) \cdot Df(a)^T,$

which is nonsingular, since $Df(a)$ has full rank when $a$ is a regular point. Thus, by the Implicit Function Theorem (Theorem 1.43) there is a continuously differentiable solution $u(t)$ on some interval $t \in [-\delta, \delta]$.

The curve

$c(t) := a + tx + Df(a)^T u(t)$

is thus a curve on $L_f(0)$. By differentiating the system (1.9) with respect to $t$ at $t = 0$ we obtain

$Df(a) \cdot \big[x + Df(a)^T \cdot Du(0)\big] = 0. \quad (1.10)$

By definition of $x$ we have $Df(a) \cdot x = 0$, and thus, again since $Df(a) \cdot Df(a)^T$ is nonsingular, we conclude from (1.10) that

$Du(0) = \big[Df(a) \cdot Df(a)^T\big]^{-1} \cdot 0 = 0.$

Therefore,

$Dc(0) = x + Df(a)^T \cdot Du(0) = x,$

and the constructed curve has derivative $x$ at $a$. ∎

Example 1.56 [1] In $\mathbb{R}^2$ let $f(x_1, x_2) = x_1$. Then $L_f(0)$ is the $x_2$-axis, and every point on that axis is regular, since $\nabla f(0, x_2) = (1, 0)$. In this case $T_f((0, x_2)) = M$, which is the $x_2$-axis.

[2] In $\mathbb{R}^2$ let $f(x_1, x_2) = x_1^2$. Again $L_f(0)$ is the $x_2$-axis, but now no point on the $x_2$-axis is regular: $\nabla f(0, x_2) = (0, 0)$. Indeed, in this case $M = \mathbb{R}^2$, while the tangent plane is the $x_2$-axis.

We close this section by providing another property of gradient vectors. Let $f : \mathbb{R}^n \to \mathbb{R}$ be a differentiable function and $a \in \mathbb{R}^n$ a point where $\nabla f(a) \neq 0$. Suppose that we want to determine the direction in which $f$ increases most rapidly at $a$. By a "direction" here we mean a unit vector $u$. Let $\theta_u$ denote the angle between $u$ and $\nabla f(a)$. Then

$f'(a; u) = \nabla f(a) \cdot u = \|\nabla f(a)\| \cos \theta_u.$

But $\cos \theta_u$ attains its maximum value of $1$ when $\theta_u = 0$, that is, when $u$ and $\nabla f(a)$ are collinear and point in the same direction. We conclude that $\|\nabla f(a)\|$ is the maximum value of $f'(a; u)$ over unit vectors $u$, and that this maximum is attained at $u = \nabla f(a)/\|\nabla f(a)\|$.
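This steepest-ascent property is easy to see numerically. The sketch below (our illustration; the test function and sample count are arbitrary) samples random unit directions and compares the largest sampled directional derivative with $\|\nabla f(a)\|$:

```python
import numpy as np

f = lambda p: p[0]**2 + p[1]**2          # test function; grad f = (2x, 2y)
a = np.array([1.0, 3.0])
grad = np.array([2 * a[0], 2 * a[1]])

rng = np.random.default_rng(0)
best = -np.inf
for _ in range(2000):
    u = rng.normal(size=2)
    u /= np.linalg.norm(u)               # random unit direction
    best = max(best, grad @ u)           # f'(a; u) = grad f(a) . u (Theorem 1.31)
print(best, np.linalg.norm(grad))        # best ~ ||grad f(a)|| = 6.3246
```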

1.7 Continuously Differentiable Functions

We know that the mere existence of the partial derivatives does not imply differentiability (see Exercise 1.32). If, however, we impose the additional condition that these partial derivatives are continuous, then differentiability is assured.

Theorem 1.57 Let $A$ be open in $\mathbb{R}^m$. Suppose that the partial derivatives $D_j f_i(x)$ of the component functions of $f$ exist at each point $x \in A$ and are continuous on $A$. Then $f$ is differentiable at each point of $A$.

A function satisfying the hypotheses of this theorem is often said to be continuously differentiable, or of class $C^1$, on $A$.

Proof of Theorem 1.57. By Theorem 1.38 it suffices to show that each component function of $f$ is differentiable. Therefore we may restrict ourselves to the case of a real-valued function $f : A \to \mathbb{R}$. Let $a \in A$. We claim that $Df(a) = \nabla f(a)$ when $f \in C^1$.

Recall that $(e_1, \dots, e_m)$ is the standard basis of $\mathbb{R}^m$. Then every $h = (h_1, \dots, h_m) \in \mathbb{R}^m$ can be written as $h = \sum_{i=1}^m h_i e_i$. For each $i = 1, \dots, m$, let

$p_i := a + \sum_{k=1}^i h_k e_k = p_{i-1} + h_i e_i,$

where $p_0 := a$. Figure 1.14 illustrates the case where $m = 3$ and all $h_i$ are positive. For each $i = 1, \dots, m$, we also define a function $\gamma_i : [0, 1] \to \mathbb{R}^m$ by letting

$\gamma_i(t) = p_{i-1} + t h_i e_i.$

So $\gamma_i$ is a segment from $p_{i-1}$ to $p_i$.

Figure 1.14: The segmented path from $a$ to $a + h$.

By the one-dimensional chain rule and the mean value theorem applied to the differentiable real-valued function $g : [0, 1] \to \mathbb{R}$ defined by

$g(t) = (f \circ \gamma_i)(t),$

there exists $\bar{t}_i \in (0, 1)$ such that

$f(p_i) - f(p_{i-1}) = g(1) - g(0) = g'(\bar{t}_i) = D_i f\big(\gamma_i(\bar{t}_i)\big) \, h_i.$

Telescoping $f(a + h) - f(a)$ along $(\gamma_1, \dots, \gamma_m)$ gives

$f(a + h) - f(a) - \nabla f(a) \cdot h = \sum_{i=1}^m \big[f(p_i) - f(p_{i-1})\big] - \nabla f(a) \cdot h = \sum_{i=1}^m \big[D_i f\big(\gamma_i(\bar{t}_i)\big) - D_i f(a)\big] h_i.$


Continuity of the partials implies that $D_i f(\gamma_i(\bar{t}_i)) - D_i f(a) \to 0$ as $\|h\| \to 0$; since $|h_i| \leq \|h\|$, the right-hand side above is $o(\|h\|)$, which proves differentiability. ∎

Remark 1.58 It follows from Theorem 1.57 that $\sin(xy)$ and $xy^2 + z e^{xy}$ are both differentiable, since they are of class $C^1$.

Let $A \subseteq \mathbb{R}^m$ and $f : A \to \mathbb{R}^n$. Suppose that the partial derivatives $D_j f_i$ of the component functions of $f$ exist on $A$. These are then functions from $A$ to $\mathbb{R}$, and we may consider their partial derivatives, which have the form

$D_k(D_j f_i) =: D_{jk} f_i$

and are called the second-order partial derivatives of $f$. Similarly, one defines the third-order partial derivatives of the functions $f_i$, or more generally the partial derivatives of order $r$ for arbitrary $r$.

Definition 1.59 If the partial derivatives of the functions $f_i$ of order less than or equal to $r$ are continuous on $A$, we say $f$ is of class $C^r$ on $A$. We say $f$ is of class $C^\infty$ on $A$ if the partials of the functions $f_i$ of all orders are continuous on $A$.

Definition 1.60 (Hessian) Let $a \in A \subseteq \mathbb{R}^m$; let $f : A \to \mathbb{R}$ be twice-differentiable at $a$. The $m \times m$ matrix representing the second derivative of $f$ is called the Hessian of $f$, denoted $Hf(a)$:

$Hf(a) = \begin{bmatrix} D_{11} f(a) & D_{12} f(a) & \cdots & D_{1m} f(a) \\ D_{21} f(a) & D_{22} f(a) & \cdots & D_{2m} f(a) \\ \vdots & \vdots & \ddots & \vdots \\ D_{m1} f(a) & D_{m2} f(a) & \cdots & D_{mm} f(a) \end{bmatrix} = D(\nabla f).$

Remark 1.61 If $f : A \to \mathbb{R}$ is of class $C^2$, then the Hessian of $f$ is a symmetric matrix, i.e., $D_{ij} f(a) = D_{ji} f(a)$ for all $i, j = 1, \dots, m$ and all $a \in A$. See Rudin (1976, Corollary to Theorem 9.41, p. 236).
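Hessians are conveniently computed symbolically. A small sketch assuming SymPy (the test function is our own choice, picked to reappear in Example 2.7; this is not Exercise 1.62):

```python
import sympy as sp

x, y = sp.symbols('x y', real=True)
f = x * y - 2 * x**4 - y**2      # an illustrative C^2 function

H = sp.hessian(f, (x, y))
print(H)                          # Matrix([[-24*x**2, 1], [1, -2]])
print(sp.simplify(H - H.T))       # zero matrix: D_ij f = D_ji f (Remark 1.61)
```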

? Exercise 1.62 Find the Hessian of the Cobb-Douglas function $f(x, y) = x^\alpha y^\beta$.

1.8 Quadratic Forms: Definite and Semidefinite Matrices

Definition 1.63 (Quadratic Form) Let $A$ be a symmetric $n \times n$ matrix. A quadratic form on $\mathbb{R}^n$ is a function $Q_A : \mathbb{R}^n \to \mathbb{R}$ of the form

$Q_A(x) = x \cdot Ax = \sum_{i=1}^n \sum_{j=1}^n a_{ij} x_i x_j.$


Since the quadratic form $Q_A$ is completely specified by the matrix $A$, we henceforth refer to $A$ itself as the quadratic form. Observe that if $f$ is of class $C^2$, then the Hessian $Hf$ of $f$ defines a quadratic form; see Remark 1.61.

Definition 1.64 A quadratic form $A$ is said to be

- positive definite if $x \cdot Ax > 0$ for all $x \in \mathbb{R}^n \setminus \{0\}$;
- positive semidefinite if $x \cdot Ax \geq 0$ for all $x \in \mathbb{R}^n$;
- negative definite if $x \cdot Ax < 0$ for all $x \in \mathbb{R}^n \setminus \{0\}$;
- negative semidefinite if $x \cdot Ax \leq 0$ for all $x \in \mathbb{R}^n$.
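For symmetric matrices, definiteness can be tested through the signs of the eigenvalues (all positive iff positive definite, all nonnegative iff positive semidefinite, and so on). That classical criterion is not stated in the notes; the sketch below assumes it, together with NumPy:

```python
import numpy as np

def classify(A, tol=1e-10):
    # Classify a symmetric matrix by the signs of its (real) eigenvalues.
    w = np.linalg.eigvalsh(A)
    if np.all(w > tol):
        return "positive definite"
    if np.all(w >= -tol):
        return "positive semidefinite"
    if np.all(w < -tol):
        return "negative definite"
    if np.all(w <= tol):
        return "negative semidefinite"
    return "indefinite"

print(classify(np.array([[2.0, 0.0], [0.0, 3.0]])))  # positive definite
print(classify(np.array([[0.0, 1.0], [1.0, 0.0]])))  # indefinite (eigenvalues ±1)
```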

1.9 Homogeneous Functions and Euler’s Formula

Definition 1.65 (Homogeneous Function) A function $f : \mathbb{R}^n \to \mathbb{R}$ is homogeneous of degree $r$ (for $r = \dots, -1, 0, 1, \dots$) if for every $t > 0$ we have

$f(tx_1, \dots, tx_n) = t^r f(x_1, \dots, x_n).$

? Exercise 1.66 The function

$f(x, y) = A x^\alpha y^\beta, \quad A, \alpha, \beta > 0,$

is known as the Cobb-Douglas function. Check whether this function is homogeneous.

Theorem 1.67 (Euler's Formula) Suppose that $f : \mathbb{R}^n \to \mathbb{R}$ is homogeneous of degree $r$ (for some $r = \dots, -1, 0, 1, \dots$) and differentiable. Then at any $x^* \in \mathbb{R}^n$ we have

$\nabla f(x^*) \cdot x^* = r f(x^*).$

Proof. By definition we have

$f(tx^*) - t^r f(x^*) = 0.$

Differentiating with respect to $t$ using the chain rule, we have

$\nabla f(tx^*) \cdot x^* = r t^{r-1} f(x^*).$

Evaluating at $t = 1$ gives the desired result. ∎
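Euler's formula can be verified symbolically for a concrete homogeneous function, say a Cobb-Douglas form of degree $r = \alpha + \beta$ (a sketch assuming SymPy; the exponents are arbitrary choices):

```python
import sympy as sp

x, y, t = sp.symbols('x y t', positive=True)
alpha, beta = sp.Rational(1, 3), sp.Rational(1, 2)
f = x**alpha * y**beta               # homogeneous of degree r = alpha + beta
r = alpha + beta

print(sp.simplify((t * x)**alpha * (t * y)**beta - t**r * f))  # 0: homogeneity
euler = sp.diff(f, x) * x + sp.diff(f, y) * y - r * f          # grad f . x - r f
print(sp.simplify(euler))                                      # 0: Theorem 1.67
```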

Lemma 1.68 If $f$ is homogeneous of degree $r$, its partial derivatives are homogeneous of degree $r - 1$.

? Exercise 1.69 Prove Lemma 1.68.

? Exercise 1.70 Let $f(x, y) = A x^\alpha y^\beta$ with $\alpha + \beta = 1$ and $A > 0$. Show that Theorem 1.67 and Lemma 1.68 hold for this function.


2

OPTIMIZATION IN $\mathbb{R}^n$

This chapter is based on Luenberger (1969, Chapters 8 & 9), Mas-Colell, Whinston and Green (1995, Sections M.J & M.K), Sundaram (1996), Vohra (2005), Luenberger and Ye (2008, Chapter 11), Duggan (2010), and Jehle and Reny (2011, Chapter A2).
and Reny (2011, Chapter A2).

2.1 Introduction

An optimization problem in $\mathbb{R}^n$, or simply an optimization problem, is one where the values of a given function $f : \mathbb{R}^n \to \mathbb{R}$ are to be maximized or minimized over a given set $X \subseteq \mathbb{R}^n$. The function $f$ is called the objective function and the set $X$ the constraint set. Notationally, we will represent these problems by

Maximize $f(x)$ subject to $x \in X$,

and

Minimize $f(x)$ subject to $x \in X$,

respectively. More compactly, we shall also write

$\max\{f(x) : x \in X\}$ and $\min\{f(x) : x \in X\}$.

Example 2.1 [a] Let $X = [0, \infty)$ and $f(x) = x$. Then the problem $\max\{f(x) : x \in X\}$ has no solution; see Figure 2.1(a).

[b] Let $X = [0, 1]$ and $f(x) = x(1 - x)$. Then the problem $\max\{f(x) : x \in X\}$ has exactly one solution, namely $x = 1/2$; see Figure 2.1(b).

Figure 2.1: Example 2.1. (a) No solution. (b) Exactly one solution. (c) Two solutions.

[c] Let $X = [-1, 1]$ and $f(x) = x^2$. Then the problem $\max\{f(x) : x \in X\}$ has two solutions, namely $x = -1$ and $x = 1$; see Figure 2.1(c).

Example 2.1 suggests that we shall talk of the set of solutions of the optimization problem, which is denoted

$\arg\max\{f(x) : x \in X\} = \{x \in X : f(x) \geq f(y) \text{ for all } y \in X\}$

and

$\arg\min\{f(x) : x \in X\} = \{x \in X : f(x) \leq f(y) \text{ for all } y \in X\}.$

We close this section by considering an optimization problem in economics.

We close this section by considering an optimization problem in eco-

nomics.

Example 2.2 There are n commodities in an economy. There is a consumer

whose utility from consuming xi > 0 units of commodity i (i D 1; : : : ; n) is

given by u.x1; : : : ; xn/, where u W RnC ! R is the consumer’s utility function.

The consumer’s income is I > 0, and faces the price vector p D .p1; : : : ; pn/.

His budget set is given by (see Figure 2.2)

B.p; I /´˚x 2 RnC W p � x 6 I

:

The consumer’s objective is to maximize his utility over the budget set, i.e.,

Maximize u.x/ subject to x 2 B.p; I /:

Here we introduce an important fact we will make frequent use of.

Figure 2.2: The budget set $B(p_{x_1}, p_{x_2}, I)$, with intercepts $I/p_{x_1}$ and $I/p_{x_2}$.

Lemma 2.3 Let $f : \mathbb{R}^n \to \mathbb{R}$ be differentiable. Then for all $h \in \mathbb{R}^n$ satisfying $h \cdot \nabla f(x) > 0$, we have

$f(x + \varepsilon h) > f(x)$ for all $\varepsilon > 0$ sufficiently small.

Proof. We approximate $f$ by its first-order Taylor expansion (see Definition 1.27):

$f(x + \varepsilon h) = f(x) + \varepsilon h \cdot \nabla f(x) + \|\varepsilon h\| R_x(\varepsilon h),$

where $\lim_{\varepsilon \to 0} R_x(\varepsilon h) = 0$. Then

$\lim_{\varepsilon \downarrow 0} \frac{f(x + \varepsilon h) - f(x)}{\varepsilon} = h \cdot \nabla f(x) + \|h\| \lim_{\varepsilon \downarrow 0} R_x(\varepsilon h) = h \cdot \nabla f(x) > 0;$

therefore, there is $\bar{\varepsilon} > 0$ such that for all $\varepsilon \in (0, \bar{\varepsilon})$,

$\frac{f(x + \varepsilon h) - f(x)}{\varepsilon} > 0,$

i.e., $f(x + \varepsilon h) > f(x)$ for all $\varepsilon \in (0, \bar{\varepsilon})$. ∎

2.2 Unconstrained Optimization

Definition 2.4 (Maximum) Given $X \subseteq \mathbb{R}^n$, $f : X \to \mathbb{R}$, and $x \in X$, we say $x$ is a maximum of $f$ if

$f(x) = \max\{f(y) : y \in X\}.$

We say $x$ is a local maximum of $f$ if there is some $\varepsilon > 0$ such that for all $y \in X \cap B(x; \varepsilon)$ we have $f(x) \geq f(y)$; and $x$ is a strict local maximum of $f$ if the latter inequality holds strictly for $y \neq x$.


2.2.1 First-Order Necessary Conditions

Recall that $X^\circ$ is the interior of $X \subseteq \mathbb{R}^n$ (Definition 1.13), and $f'(x; u)$ is the directional derivative of $f$ at $x$ with respect to $u$ (Definition 1.21).

Theorem 2.5 Let $X \subseteq \mathbb{R}^n$ and $x \in X^\circ$; let $f : X \to \mathbb{R}$ be differentiable at $x$. If $x$ is a local maximum of $f$, then $\nabla f(x) = 0$.

Proof. Suppose that $\nabla f(x) \neq 0$. Let $h = \nabla f(x)$ (we can do this since $x \in X^\circ$). Then $h \cdot \nabla f(x) > 0$. Hence $f(x + \varepsilon h) > f(x)$ for all $\varepsilon > 0$ sufficiently small by Lemma 2.3. This contradicts the local optimality of $x$. ∎

Definition 2.6 (Critical Point) A vector $x \in \mathbb{R}^n$ such that $\nabla f(x) = 0$ is called a critical point.

Example 2.7 Let $X = \mathbb{R}^2$ and $f(x, y) = xy - 2x^4 - y^2$. The first-order condition is

$\nabla f(x, y) = (y - 8x^3, \; x - 2y) = (0, 0).$

Thus, the critical points are $(x, y) = (0, 0)$, $(1/4, 1/8)$, and $(-1/4, -1/8)$.
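The system $\nabla f(x, y) = 0$ can also be solved symbolically; a quick sketch assuming SymPy:

```python
import sympy as sp

x, y = sp.symbols('x y', real=True)
f = x * y - 2 * x**4 - y**2

grad = [sp.diff(f, v) for v in (x, y)]   # (y - 8x^3, x - 2y)
print(sp.solve(grad, (x, y)))
# the three real critical points: (-1/4, -1/8), (0, 0), (1/4, 1/8)
```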

2.2.2 Second-Order Sufficient Conditions

The first-order conditions for unconstrained local optima do not distinguish between maxima and minima (see Example 2.9 below). To obtain such a distinction in the behavior of $f$ at an optimum, we need to examine the behavior of the Hessian $Hf$ of $f$ (see Definition 1.60).

Theorem 2.8 Suppose $f$ is of class $C^2$ on $X \subseteq \mathbb{R}^n$, and $x \in X^\circ$.

(a) If $f$ has a local maximum at $x$, then $Hf(x)$ is negative semidefinite.

(b) If $f$ has a local minimum at $x$, then $Hf(x)$ is positive semidefinite.

(c) If $\nabla f(x) = 0$ and $Hf(x)$ is negative definite at some $x$, then $x$ is a strict local maximum of $f$ on $X$.

(d) If $\nabla f(x) = 0$ and $Hf(x)$ is positive definite at some $x$, then $x$ is a strict local minimum of $f$ on $X$.

Proof. See Sundaram (1996, Section 4.6). ∎

Example 2.9 Let $f : \mathbb{R} \to \mathbb{R}$ be defined by $f(x) = 2x^3 - 3x^2$. It is easy to check that $f \in C^2$ on $\mathbb{R}$ and that there are two critical points: $x = 0$ and $x = 1$. Invoking the second-order conditions, we get $f''(0) = -6$ and $f''(1) = 6$.


Figure 2.3: $x = 0$ and $x = 1$ are local optima but not global optima.

Thus, the point $x = 0$ is a strict local maximum of $f$ on $\mathbb{R}$, and the point $x = 1$ is a strict local minimum of $f$ on $\mathbb{R}$; see Figure 2.3.

However, there is nothing in the first- or second-order conditions that will help determine whether these points are global optima. In fact, they are not: global optima do not exist in this example, since $\lim_{x \to +\infty} f(x) = +\infty$ and $\lim_{x \to -\infty} f(x) = -\infty$.

2.3 Equality Constrained Optimization: Lagrange's Method

In this section we consider problems with equality constraints of the form

max $f(x)$ s.t. $g_i(x) = 0$, $i = 1, \dots, k \leq n$. (ECP)

We assume that $f : \mathbb{R}^n \to \mathbb{R}$ and $g_i : \mathbb{R}^n \to \mathbb{R}$, $i = 1, \dots, k$, are continuously differentiable functions. For notational convenience, we introduce the constraint function $g : \mathbb{R}^n \to \mathbb{R}^k$, where

$g = (g_1, \dots, g_k).$

We can then write the constraints in the more compact form $g(x) = 0$. Also recall that a point $x^*$ such that $g_i(x^*) = 0$ for all $i = 1, \dots, k$ is called a regular point if the gradient vectors

$\nabla g_1(x^*), \dots, \nabla g_k(x^*)$

are linearly independent (see Definition 1.54).

2.3.1 First-Order Necessary Conditions

Since the representation of the tangent plane is known from Theorem 1.55, it is fairly simple to derive necessary and sufficient conditions for a point to be a local extremum subject to equality constraints.

Lemma 2.10 Let $x^*$ be a regular point of the constraints $g(x) = 0$ and a local maximum of $f$ subject to these constraints. Then all $x \in \mathbb{R}^n$ satisfying

$Dg(x^*) \cdot x = 0 \quad (2.1)$

must also satisfy

$\nabla f(x^*) \cdot x = 0. \quad (2.2)$

Proof. Take an arbitrary point $x \in M := \{y \in \mathbb{R}^n : Dg(x^*) \cdot y = 0\}$. Since $x^*$ is regular, it follows from Theorem 1.55 that the tangent plane $T_g(x^*)$ coincides with $M$. Therefore, there exists a differentiable curve $c : [a, b] \to \mathbb{R}^n$ on $L_g(0)$ passing through $x^*$ with $c(t^*) = x^*$ and $Dc(t^*) = x$ for some $t^* \in [a, b]$.

Since $x^*$ is a constrained local maximum of $f$, we have

$\frac{d f(c(t))}{dt}\bigg|_{t = t^*} = 0,$

or equivalently,

$\nabla f(x^*) \cdot x = 0. \quad ∎$

Lemma 2.10 says that $\nabla f(x^*)$ is orthogonal to the tangent plane $T_g(x^*)$. Here is an example.

Example 2.11 Consider the following problem:

max $f(x_1, x_2) = -x_1/2 + x_2/2$ s.t. $g(x_1, x_2) = x_1^2 + x_2^2 = 2$.

At the local maximum $x^* = (-1, 1)$, the gradient $\nabla f(x^*)$ is orthogonal to the tangent plane of the constraint surface. See Figure 2.4.

We now show that Lemma 2.10 implies that $\nabla f(x^*)$ is a linear combination of the gradients of the $g_i$ at $x^*$.

Figure 2.4: Example 2.11. At $x^* = (-1, 1)$, $\nabla f(x^*) = (-\tfrac{1}{2}, \tfrac{1}{2})$ is orthogonal to the tangent plane of $L_g(0)$, and $\nabla g(x^*) = (-2, 2)$.

Theorem 2.12 (Lagrange's Theorem) Let $x^*$ be a local maximum of $f$ subject to $g(x) = 0$, and assume that $x^*$ is a regular point of these constraints. Then there exists a unique vector $\lambda^* = (\lambda_1^*, \dots, \lambda_k^*)$ such that

$\nabla f(x^*) = \sum_{i=1}^k \lambda_i^* \nabla g_i(x^*).$

Proof. Since $x^*$ is a regular point of the constraints, the vectors $\nabla g_1(x^*), \dots, \nabla g_k(x^*)$ constitute a basis of $\mathbb{R}^n$ if $k = n$, and in this case the theorem holds trivially. So we assume that $k < n$.

Let $U$ be the subspace spanned by $\nabla g_1(x^*), \dots, \nabla g_k(x^*)$. Then $\mathbb{R}^n$ can be represented as the direct sum of $U$ and $U^\perp$, the orthogonal complement of $U$, i.e., $\mathbb{R}^n = U \oplus U^\perp$. It follows from Theorem 1.55 that $U^\perp = M$. Therefore, $\dim(\mathbb{R}^n) = \dim(U) + \dim(M)$, or equivalently,

$\dim(M) = n - k. \quad (2.3)$

Let $V$ be the subspace spanned by $\nabla g_1(x^*), \dots, \nabla g_k(x^*), \nabla f(x^*)$. Suppose there do not exist $\lambda_1, \dots, \lambda_k$ such that $\nabla f(x^*) = \sum_{i=1}^k \lambda_i \nabla g_i(x^*)$. Then $\nabla g_1(x^*), \dots, \nabla g_k(x^*), \nabla f(x^*)$ are linearly independent, and so $\dim(V) = k + 1$. However, Lemma 2.10 says that $M \subseteq V^\perp$. We then have

$\dim(M) \leq \dim(V^\perp) = n - \dim(V) = n - (k + 1), \quad (2.4)$

which contradicts (2.3). We thus conclude that $\nabla f(x^*) \in U$, i.e., there exists a unique vector $\lambda^* = (\lambda_1^*, \dots, \lambda_k^*)$ such that

$\nabla f(x^*) = \sum_{i=1}^k \lambda_i^* \nabla g_i(x^*). \quad ∎$

Example 2.13 Consider the problem

max $x_1 x_2 + x_2 x_3 + x_1 x_3$ s.t. $x_1 + x_2 + x_3 = 3$.

The necessary conditions become

$x_2 + x_3 = \lambda, \quad x_1 + x_3 = \lambda, \quad x_1 + x_2 = \lambda.$

These three equations together with the one constraint equation give four equations that can be solved for the four unknowns $x_1$, $x_2$, $x_3$, $\lambda$. Solving yields $x_1 = x_2 = x_3 = 1$, $\lambda = 2$.
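The same system can be handed to a computer algebra system. A sketch assuming SymPy, solving the stationarity conditions of Theorem 2.12 together with the constraint:

```python
import sympy as sp

x1, x2, x3, lam = sp.symbols('x1 x2 x3 lam', real=True)
f = x1 * x2 + x2 * x3 + x1 * x3
g = x1 + x2 + x3 - 3

# grad f = lam * grad g, plus the constraint g = 0
eqs = [sp.diff(f, v) - lam * sp.diff(g, v) for v in (x1, x2, x3)] + [g]
print(sp.solve(eqs, (x1, x2, x3, lam)))
# unique solution: x1 = x2 = x3 = 1, lam = 2
```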

Regularity If $x^*$ is not a regular point, then Theorem 2.12 can fail. Consider the following example.

Example 2.14 Let $f(x) = (x + 1)^2$ and $g(x) = x^2$. Consider the problem of maximizing $f$ subject to $g(x) = 0$. The maximum is clearly attained at $x = 0$. But $Dg(0) = 0$ and $Df(0) = 2$, so there is no $\lambda$ such that $Df(0) = \lambda \, Dg(0)$.

The Lagrange Multipliers The vector λ* = (λ*_1,…,λ*_k) in Theorem 2.12 is called the vector of Lagrange multipliers corresponding to the local optimum x*. The ith multiplier λ*_i measures the sensitivity of the value of the objective function at x* to a small relaxation of the ith constraint g_i.

To clarify the notion of "relaxation of a constraint", let us consider the following one-constraint case:

max f(x)
s.t. g̃(x) − c = 0,

where c ∈ R. Then a relaxation of the constraint may be thought of as an increase in c.


Suppose that for each c ∈ R there exists a global maximum x*(c). Suppose further that x*(c) is a regular point for each c, so there exists λ*(c) ∈ R such that

∇f(x*(c)) = λ*(c) ∇g̃(x*(c)).

Suppose further that x*(c) is differentiable with respect to c. Then

d f(x*(c))/dc = ∇f(x*(c)) · Dx*(c) = λ*(c) [∇g̃(x*(c)) · Dx*(c)].

Since g̃(x*(c)) − c = 0 for all c, differentiating gives ∇g̃(x*(c)) · Dx*(c) = 1. Therefore,

d f(x*(c))/dc = λ*(c).

In sum, the Lagrange multiplier λ*(c) tells us that a small relaxation in the constraint will raise the maximized value of the objective function at the rate λ*(c). For this reason, λ*(c) is also called the shadow price of the constraint.
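A quick numerical illustration of the shadow-price interpretation, on a toy problem of our own (a sketch assuming NumPy, not an example from the notes): maximize f(x) = −(x1² + x2²) subject to x1 + x2 = c, for which x*(c) = (c/2, c/2) and the multiplier is λ(c) = −c.

```python
import numpy as np

# Shadow-price sketch: maximize -(x1^2 + x2^2) subject to x1 + x2 = c.
# Analytically x*(c) = (c/2, c/2), V(c) = -c^2/2, and from
# grad f = lam * grad g one gets lam(c) = -c, so V'(c) = lam(c).
def V(c):
    x = np.array([c / 2, c / 2])        # the constrained maximizer
    return -np.sum(x**2)

c, eps = 3.0, 1e-6
dV = (V(c + eps) - V(c - eps)) / (2 * eps)  # numerical V'(c)
lam = -c                                     # multiplier at x*(c)
print(dV, lam)                               # both approximately -3.0
```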

Lagrange's Method We now describe a procedure for using Theorem 2.12 to solve (ECP).

Step 1 Set up a function L: Rn × Rk → R, called the Lagrangian, defined by

L(x,λ) = f(x) − Σ_{i=1}^k λ_i g_i(x).

The vector λ = (λ_1,…,λ_k) ∈ Rk is called the vector of Lagrange multipliers.

Step 2 Find all critical points of L(x,λ):

∂L/∂x_i (x,λ) = D_i f(x) − Σ_{ℓ=1}^k λ_ℓ D_i g_ℓ(x) = 0, i = 1,…,n, (2.5)
∂L/∂λ_j (x,λ) = g_j(x) = 0, j = 1,…,k. (2.6)

Define

Y := {(x,λ) ∈ R^{n+k} : (x,λ) satisfies (2.5) and (2.6)}.


Step 3 Evaluate f at each point x in the set

{x ∈ Rn : there exists λ such that (x,λ) ∈ Y}.

Thus, Lagrange's method is a clever way of converting a maximization problem with constraints into a problem without constraints, by increasing the number of variables.

Why does Lagrange's method typically succeed in identifying the desired optima? Because the set of all critical points of L contains the set of all local maxima and minima of (ECP) at which the regularity condition is met: if x* is a local maximum or minimum of f subject to g(x) = 0, and if x* is regular, then there exists λ* such that (x*,λ*) is a critical point of L.

We are not going to explain why Lagrange's method can fail (but see Sundaram 1996, Section 5.4 for details).

Example 2.15 Consider the problem

max −x² − y²
s.t. x + y − 1 = 0.

First, form the Lagrangian

L(x,y,λ) = −x² − y² − λ(x + y − 1).

Then set all of its first-order partials equal to zero:

∂L/∂x = −2x − λ = 0
∂L/∂y = −2y − λ = 0
∂L/∂λ = −(x + y − 1) = 0.

So the unique critical point of L is

(x*, y*, λ*) = (1/2, 1/2, −1).

Hence, f(x*, y*) = −1/2; see Figure 2.5.
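The same critical point can be found numerically. The sketch below is our addition (it assumes SciPy); since scipy.optimize.minimize is a minimizer, we minimize −f subject to the equality constraint.

```python
import numpy as np
from scipy.optimize import minimize

# Numerical check of Example 2.15.
obj = lambda v: v[0]**2 + v[1]**2                     # = -f(x, y)
con = {'type': 'eq', 'fun': lambda v: v[0] + v[1] - 1}

res = minimize(obj, x0=np.array([0.0, 0.0]), constraints=[con])
print(res.x)       # approximately [0.5, 0.5]
print(-res.fun)    # approximately -0.5 = f(x*, y*)
```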

? Exercise 2.16 A consumer purchases a bundle (x,y) to maximize utility. His income is I > 0 and prices are p_x > 0 and p_y > 0. His utility function is

u(x,y) = x^a y^b,

where a,b > 0. Find his optimal choice (x*,y*).


Figure 2.5: Lagrange's method. Level curves L_f(−2), L_f(−1), L_f(−1/2) and the constraint line L_g(0); the solution x* lies where a level curve is tangent to the constraint line.

Lagrange's Theorem Is Not Sufficient Lagrange's Theorem (Theorem 2.12) only gives us a necessary, not a sufficient, condition for a constrained local maximum. To see why the first-order condition is not generally sufficient, consider the following example.

Example 2.17 Consider the problem

max x + y²
s.t. x − 1 = 0.

Observe that (x*,y*) = (1,0) satisfies the constraint g(x*,y*) = 0, and (1,0) is regular. Furthermore, the first-order condition from Lagrange's Theorem is satisfied at (1,0). This is because ∇f(1,0) = ∇g(1,0) = (1,0), so letting λ = 1 we have ∇f(1,0) = λ ∇g(1,0).

However, (1,0) is not a constrained local maximum: for ε > 0, we have g(1,ε) = 0 and f(1,ε) = 1 + ε² > 1 = f(1,0). See Figure 2.6.

2.3.2 Second-Order Analysis

We probably do not have time to discuss the second-order conditions. See Jehle and Reny (2011, Section A2.3.4) and Sundaram (1996, Section 5.3).


Figure 2.6: Lagrange's Theorem is not sufficient. The constraint line L_g(0) contains (1,0) and (1,ε); the level curves L_f(1) and L_f(1+ε²) show that f increases along the constraint.

2.4 Inequality Constrained Optimization: Kuhn-Tucker Theorem

We now consider a problem of the form

max f(x)
s.t. h_1(x) ≤ 0, …, h_ℓ(x) ≤ 0, (ICP)

where f and the h_i are continuously differentiable functions from Rn to R. More succinctly, we can write this problem as

max f(x)
s.t. h(x) ≤ 0,

where h: Rn → Rℓ is the function h = (h_1,…,h_ℓ).

For any feasible point x, the set of active inequality constraints is denoted by

A(x) := {i : h_i(x) = 0}. (2.7)

If i ∉ A(x), we say that the ith constraint is inactive at x. We note that if x* is a local maximum of (ICP), then x* is also a local maximum for a problem identical to (ICP) except that the inactive constraints at x* have been discarded. Thus, inactive constraints at x* do not matter; they can be ignored in the statement of optimality conditions.


Figure 2.7: Inequality constrained optimization. Three cases: an unconstrained local maximum x with Df(x) = 0 where neither constraint binds; a maximum y on L_{h1}(0) where ∇f(y) and ∇h_1(y) are collinear; and a maximum z on L_{h1}(0) ∩ L_{h2}(0) where ∇f(z) lies between ∇h_1(z) and ∇h_2(z).

Definition 2.18 Let x* be a point satisfying the constraint h(x*) ≤ 0. Then x* is said to be regular if the gradients ∇h_i(x*), i ∈ A(x*), are linearly independent.

2.4.1 First-Order Analysis

Example 2.19 Figure 2.7 illustrates a problem with two inequality constraints and depicts three possibilities, depending on whether none, one, or two constraints are active.

• In the first case, we could have a constrained local maximum such as x, for which both constraints are inactive. Such a vector must be a critical point of the objective function.

• In the second case, only a single constraint is active at a constrained local maximum such as y, and here the gradients of the objective and constraint are collinear. As we will see, these gradients actually point in the same direction.

• Lastly, we could have a constrained local maximum such as z, where both constraints are active. Here, the gradient of the objective is not collinear with the gradient of either constraint, and it may appear that no gradient restriction is possible. But in fact, ∇f(z) can be written as a linear combination of ∇h_1(z) and ∇h_2(z) with non-negative weights.

Figure 2.8: Kuhn-Tucker Theorem. The gradient ∇f(x*) lies in the semi-positive cone generated by ∇h_1(x*) and ∇h_2(x*).

The restrictions evident in Figure 2.7 are formalized in the next theorem.

Theorem 2.20 (Kuhn-Tucker Theorem) Let x* be a local maximum of (ICP), and assume that x* is regular. Then there exists a vector λ* = (λ*_1,…,λ*_ℓ) such that

λ*_i ≥ 0 and λ*_i h_i(x*) = 0, i = 1,…,ℓ, (KT-1)
∇f(x*) = Σ_{i=1}^ℓ λ*_i ∇h_i(x*). (KT-2)

Proof. See Sundaram (1996, Section 6.5). ∎

Remark 2.21 [1] Geometrically, the first-order condition from the Kuhn-Tucker Theorem means that the gradient of the objective function, ∇f(x*), is contained in the "semi-positive cone" generated by the gradients of the binding constraints, i.e., it is contained in the set

{Σ_{i=1}^ℓ α_i ∇h_i(x*) : α_1,…,α_ℓ ≥ 0},

depicted in Figure 2.8.

[2] Condition (KT-1) in Theorem 2.20 is called the condition of complementary slackness: if h_i(x*) < 0 then λ*_i = 0; if λ*_i > 0 then h_i(x*) = 0.

The Kuhn-Tucker Multipliers The vector λ* in Theorem 2.20 is called the vector of Kuhn-Tucker multipliers corresponding to the local maximum x*. The Kuhn-Tucker multipliers measure the sensitivity of the objective function at x* to relaxations of the various constraints:

• If h_i(x*) < 0, then the ith constraint is already slack, so relaxing it further will not help raise the value of the objective function, and λ*_i must be zero.

• If h_i(x*) = 0, then relaxing the ith constraint may help increase the value of the maximization exercise, so we have λ*_i ≥ 0.

Two Differences There are two important differences from the case of equality constraints (compare Theorem 2.12 with Theorem 2.20):

• The regularity condition now involves only the gradients of the binding constraints.

• The multipliers are non-negative. This difference comes from the fact that now only the inequality h_i(x) ≤ 0 needs to be maintained, so relaxing the constraint never hurts.

The Regularity Condition As with the analogous condition in Theorem 2.12 (see Example 2.14), here we show that the regularity condition in Theorem 2.20 is essential.

Example 2.22 Consider the following maximization problem

max f(x1,x2) = x1
s.t. h_1(x1,x2) = −(1 − x1)³ + x2 ≤ 0
     h_2(x1,x2) = −x1 ≤ 0
     h_3(x1,x2) = −x2 ≤ 0.

See Figure 2.9. Clearly the solution is (x1*, x2*) = (1,0). At this point we have

∇h_1(1,0) = (0,1), ∇h_3(1,0) = (0,−1) and ∇f(1,0) = (1,0).

Since x1* > 0, it follows from the complementary slackness condition (KT-1) that λ*_2 = 0. But now (KT-2) fails: for any λ_1 ≥ 0 and λ_3 ≥ 0, we have

λ_1 ∇h_1(1,0) + λ_3 ∇h_3(1,0) = (0, λ_1 − λ_3) ≠ ∇f(1,0).

This is because (1,0) is not regular: there are two binding constraints at (1,0), namely h_1 and h_3, and the gradients ∇h_1(1,0) and ∇h_3(1,0) are collinear. Certainly ∇f(1,0) cannot be contained in the cone generated by ∇h_1(1,0) and ∇h_3(1,0); see Remark 2.21.


Figure 2.9: The constraint qualification fails at (1,0). Level curves L_f(1/2) and L_f(1), the boundary L_{h1}(0), and the gradients ∇h_1(1,0), ∇h_3(1,0) and ∇f(1,0) at the solution.

The Lagrangian As with equality constraints, we can define the Lagrangian L: Rn × Rℓ → R by

L(x,λ) = f(x) − Σ_{i=1}^ℓ λ_i h_i(x),

and then condition (KT-2) from Theorem 2.20 is the requirement that x be a critical point of the Lagrangian given multipliers λ_1,…,λ_ℓ.

Let us consider a numerical example.

Example 2.23 Consider the problem

max x² − y
s.t. x² + y² − 1 ≤ 0.

Set up the Lagrangian:

L(x,y,λ) = x² − y − λ(x² + y² − 1).


Figure 2.10: Example 2.23. Level curves L_f(1) and L_f(5/4), the constraint boundary L_h(0), and the two maximizers (±√3/2, −1/2).

The critical points of L are the solutions (x,y,λ) to

2x − 2λx = 0 (2.8)
−1 − 2λy = 0 (2.9)
λ ≥ 0 (2.10)
x² + y² − 1 ≤ 0 (2.11)
λ(x² + y² − 1) = 0. (2.12)

For (2.8) to hold, we must have x = 0 or λ = 1.

• If λ = 1, then (2.9) implies y = −1/2, and (2.12) implies x = ±√3/2. That is,

(x,y,λ) = (±√3/2, −1/2, 1).

We thus have f(x,y) = 5/4; see Figure 2.10.

• If x = 0, then λ > 0 by (2.9). Hence h is binding: x² + y² − 1 = 0, and so y = ±1. Since (2.9) implies that y = 1 is impossible, we have

(x,y,λ) = (0, −1, 1/2).

At this critical point, we have f(0,−1) = 1 < 5/4, which means that (0,−1,1/2) cannot be a solution. Since there are no other critical points, it follows that there are exactly two solutions to the maximization problem, namely (x*,y*) = (±√3/2, −1/2).
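A numerical cross-check of Example 2.23 (a sketch assuming SciPy; note that scipy.optimize.minimize expects inequality constraints in the form c(v) ≥ 0, so the constraint is rewritten as 1 − x² − y² ≥ 0):

```python
import numpy as np
from scipy.optimize import minimize

# Maximize x^2 - y on the unit disk, starting near each maximizer.
obj = lambda v: -(v[0]**2 - v[1])                       # minimize -f
con = {'type': 'ineq', 'fun': lambda v: 1 - v[0]**2 - v[1]**2}

for x0 in ([0.5, 0.0], [-0.5, 0.0]):
    res = minimize(obj, x0=np.array(x0), constraints=[con])
    print(res.x, -res.fun)
# approximately [ 0.866, -0.5] 1.25  and  [-0.866, -0.5] 1.25,
# i.e. the two maximizers (+-sqrt(3)/2, -1/2) with value 5/4.
```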


? Exercise 2.24 Let U = R²; let f(x,y) = (x − 1)² + y², and let h(x,y) = 2kx − y² ≤ 0, where k > 0. Solve the maximization problem

max {f(x,y) : h(x,y) ≤ 0}.

Example 2.25 Consider a consumer's problem:

max_{(x,y)∈R²} u(x,y) = x + y
s.t. h_1(x,y) = −x ≤ 0
     h_2(x,y) = −y ≤ 0
     h_3(x,y) = p_x x + p_y y − I ≤ 0, (2.13)

where p_x, p_y, I > 0.

We first identify all possible combinations of constraints that can, in principle, be binding at the optimum. There are eight combinations to check:

∅; h_1; h_2; h_3; (h_1,h_2); (h_1,h_3); (h_2,h_3); and (h_1,h_2,h_3).

Of these, the last one can be ruled out, since h_1 = h_2 = 0 implies that h_3 < 0. Moreover, since u is strictly increasing in both arguments, it is obvious that h_3 = 0 at the optimum. So we only need to check three combinations: (h_1,h_3), (h_2,h_3), and h_3.

• If the optimum occurs at a point where only h_1 and h_3 are binding, then

(∇h_1(x,y), ∇h_3(x,y)) = ((−1,0), (p_x,p_y))

is linearly independent. So the constraint qualification holds at such a point.

• Similarly, the constraint qualification holds if only (h_2,h_3) bind.

• If h_3 is the only binding constraint, then ∇h_3(x,y) = (p_x,p_y) ≠ 0; that is, the constraint qualification holds.

? Exercise 2.26 Solve problem (2.13).

2.4.2 Second-Order Analysis

We probably do not have time to discuss the second-order conditions. See Duggan (2010, Section 6.3).


2.5 Envelope Theorem

Let A ⊆ R. The graph of a real-valued function f on A is a curve in the plane R², and we shall also refer to the curve itself as f. Given a one-dimensional parametrized family of curves f_α: A → R, where α runs over some interval, the curve h: A → R is the envelope of the family if

• each point on the curve h is a point of tangency with the graph of one of the curves f_α, and

• each curve f_α is tangent to h.

(See, e.g., Apostol 1974, p. 342 or Zorich 2004, p. 252 for this definition.) That is, for each α there is some q, and also for each q there is some α, satisfying

f_α(q) = h(q) and f'_α(q) = h'(q).

We may regard h as a function of α if the correspondence between curves and points on the envelope is one-to-one.

2.5.1 An Envelope Theorem for Unconstrained Maximization

Consider now an unconstrained parametrized maximization problem. Let x*(q) be the value of the control variable x that maximizes f(x,q), where q is our parameter of interest. For a fixed x, the function

φ_x(q) := f(x,q)

defines a curve. We also define the value function

V(q) := f(x*(q), q) = max_x φ_x(q).

Under appropriate conditions, the graph of the value function V will be the envelope of the curves φ_x. "Envelope theorems" in maximization theory are concerned with the tangency conditions this entails.

Example 2.27 Let

f(x,q) = q − (x − q)² + 1, where x,q ∈ [0,2].

Then given q, the maximizing x is given by x*(q) = q, and V(q) = q + 1.


Figure 2.11: The graph of V is the envelope of the family of graphs of the functions φ_x. (a) The family of curves {φ_x(q) : x = 0, 0.2, …, 2}. (b) The envelope V(q) = q + 1.

For each x, the function φ_x is given by

φ_x(q) = q − (x − q)² + 1.

The graphs of these functions and of V are shown for selected values of x in Figure 2.11. Observe that the graph of V is the envelope of the family of graphs of the functions φ_x. Consequently the slope of V is the slope of the φ_x to which it is tangent, that is,

V'(q) = ∂φ_x/∂q |_{x=x*(q)=q} = ∂f/∂q |_{x=x*(q)=q} = [1 + 2(x − q)] |_{x=x*(q)=q} = 1.

This last observation is one version of the Envelope Theorem.
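The envelope property can also be seen numerically, by computing V(q) as a brute-force maximum over a grid of x values and checking that its slope is 1. This is a sketch under our own discretization choices:

```python
import numpy as np

# V(q) computed by maximizing f(x, q) = q - (x - q)^2 + 1 over a grid of x.
xs = np.linspace(0.0, 2.0, 2001)

def V(q):
    return np.max(q - (xs - q)**2 + 1)

q, eps = 1.0, 1e-4
slope = (V(q + eps) - V(q - eps)) / (2 * eps)   # numerical V'(q)
print(V(q), slope)                               # approximately 2.0 and 1.0
```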

2.5.2 An Envelope Theorem for Constrained Maximization

Consider the maximization problem

max_{x∈Rn} f(x,q) s.t. g_i(x,q) = 0, i = 1,…,m, (2.14)

where x is a vector of choice variables, and q = (q_1,…,q_ℓ) ∈ Rℓ is a vector of parameters that may enter the objective function, the constraints, or both.

Suppose that for each q there exists a unique solution x(q). Furthermore, we assume that the objective function f: Rn × Rℓ → R, the constraints g_i: Rn × Rℓ → R (i = 1,…,m), and the solution x: Rℓ → Rn are differentiable in the parameter q.

Then, for every parameter q, the maximized value of the objective function is f(x(q), q). This defines a new function, V: Rℓ → R, called the value function. Formally,

V(q) := max_{x∈Rn} {f(x,q) : g_i(x,q) = 0, i = 1,…,m}. (2.15)

Theorem 2.28 (Envelope Theorem) Consider the value function V(q) for the problem (2.14). Let (λ_1,…,λ_m) be the values of the Lagrange multipliers associated with the maximum solution x(q̄) at q̄. Then for each k = 1,…,ℓ,

∂V(q̄)/∂q_k = ∂f(x(q̄),q̄)/∂q_k − Σ_{j=1}^m λ_j ∂g_j(x(q̄),q̄)/∂q_k. (2.16)

Proof. By definition, V(q) = f(x(q), q) for all q. Using the chain rule, we have

∂V(q̄)/∂q_k = Σ_{i=1}^n [∂f(x(q̄),q̄)/∂x_i · ∂x_i(q̄)/∂q_k] + ∂f(x(q̄),q̄)/∂q_k.

It follows from Theorem 2.12 that

∂f(x(q̄),q̄)/∂x_i = Σ_{j=1}^m λ_j ∂g_j(x(q̄),q̄)/∂x_i.

Hence,

∂V(q̄)/∂q_k = Σ_{i=1}^n [(Σ_{j=1}^m λ_j ∂g_j(x(q̄),q̄)/∂x_i) · ∂x_i(q̄)/∂q_k] + ∂f(x(q̄),q̄)/∂q_k
            = Σ_{j=1}^m λ_j Σ_{i=1}^n [∂g_j(x(q̄),q̄)/∂x_i · ∂x_i(q̄)/∂q_k] + ∂f(x(q̄),q̄)/∂q_k.

Finally, since g_j(x(q), q) = 0 for all q, differentiating with respect to q_k gives

Σ_{i=1}^n [∂g_j(x(q̄),q̄)/∂x_i · ∂x_i(q̄)/∂q_k] + ∂g_j(x(q̄),q̄)/∂q_k = 0.

Combining the last two displays, we get (2.16). ∎

Example 2.29 We are given the problem

max_{(x,y)∈R²} f((x,y),q) = xy s.t. g((x,y),q) = 2x + 4y − q = 0.

Forming the Lagrangian, we get

L = xy − λ(2x + 4y − q),

with first-order conditions:

y − 2λ = 0
x − 4λ = 0
q − 2x − 4y = 0. (2.17)

These solve for x(q) = q/4, y(q) = q/8 and λ(q) = q/16. Thus,

V(q) = x(q) y(q) = q²/32.

Differentiating V(q) with respect to q we get V'(q) = q/16.

Now let us verify this using the Envelope Theorem. The theorem tells us that

V'(q) = ∂f((x(q),y(q)),q)/∂q − λ(q) ∂g((x(q),y(q)),q)/∂q = 0 − λ(q)(−1) = λ(q) = q/16.
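The computation in Example 2.29 can be reproduced symbolically; the following SymPy sketch (our addition) solves the first-order conditions (2.17) and confirms V'(q) = λ(q):

```python
import sympy as sp

# Symbolic verification of Example 2.29.
x, y, lam, q = sp.symbols('x y lam q', real=True)
L = x*y - lam*(2*x + 4*y - q)

sol = sp.solve([sp.diff(L, x), sp.diff(L, y), 2*x + 4*y - q],
               (x, y, lam), dict=True)[0]
V = (x*y).subs(sol)              # value function
print(sp.simplify(V))            # q**2/32
print(sp.diff(V, q), sol[lam])   # q/16 and q/16: V'(q) equals the multiplier
```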

Example 2.30 Consider a consumer whose utility function u: Rⁿ₊ → R is strictly increasing in every commodity i = 1,…,n. Then this consumer's problem is

max u(x) s.t. Σ_{i=1}^n p_i x_i = I.

Writing the constraint as Σ_{i=1}^n p_i x_i − I = 0, the Lagrangian is

L(x,λ) = u(x) − λ (Σ_{i=1}^n p_i x_i − I).

It follows from Theorem 2.28 that

V'(I) = ∂L(x,λ)/∂I = λ.

That is, λ measures the marginal utility of income.

2.5.3 Integral Form Envelope Theorem

The envelope theorems we have introduced so far rely on assumptions that are not satisfactory for some applications, e.g., mechanism design. Unfortunately, it is too technical to develop the more advanced treatment of the Envelope Theorem here. We refer the reader to Milgrom and Segal (2002) and Milgrom (2004, Chapter 3) for the integral form Envelope Theorem.


3 CONVEX ANALYSIS IN Rn

Rockafellar (1970) is the classical reference for finite-dimensional convex analysis. For infinite-dimensional convex analysis, Luenberger (1969) is an excellent text.

To understand the material that follows, it is necessary that the reader have a good background in multivariable calculus (Chapter 1) and linear algebra (Axler, 1997).

In this chapter, we will exclusively consider convexity in Rn for concreteness, but much of the discussion here generalizes to infinite-dimensional vector spaces. You may want to consult Lay (1982), Hiriart-Urruty and Lemaréchal (2001), Berkovitz (2002), Bertsekas, Nedic and Ozdaglar (2003), Bertsekas (2009), Ok (2007, Chapter G) and Royden and Fitzpatrick (2010, Chapter 6).

3.1 Convex Sets

Definition 3.1 (Convex Set) A subset C ⊆ Rn is convex if for every pair of points x_1, x_2 ∈ C, the line segment

[x_1, x_2] := {x : x = λ x_1 + (1 − λ) x_2, 0 ≤ λ ≤ 1}

belongs to C.

? Exercise 3.2 Sketch the following sets in R² and determine from the figure which sets are convex and which are not:

(a) {(x,y) : x² + y² ≤ 1},

(b) {(x,y) : 0 < x² + y² ≤ 1},

(c) {(x,y) : y ≥ x²},

(d) {(x,y) : |x| + |y| ≤ 1}, and

(e) {(x,y) : y ≥ 1/(1 + x²)}.

Lemma 3.3 Let {C_α} be a collection of convex sets such that C := ∩_α C_α ≠ ∅. Then C is convex.

Proof. Let x_1, x_2 ∈ C. Then x_1, x_2 ∈ C_α for all α. Since C_α is convex, we have [x_1, x_2] ⊆ C_α for all α. Hence, [x_1, x_2] ⊆ C, so C is convex. ∎

For each positive integer n, define

Δ^{n−1} := {(λ_1,…,λ_n) ∈ [0,1]^n : Σ_{i=1}^n λ_i = 1}. (3.1)

For n = 1, the set Δ⁰ is the singleton {1}. For n = 2, the set Δ¹ is the closed line segment joining (0,1) and (1,0). For n = 3, the set Δ² is the closed triangle with vertices (1,0,0), (0,1,0) and (0,0,1) (see Figure 3.1).

Figure 3.1: Δ² in R³, the closed triangle with vertices (1,0,0), (0,1,0) and (0,0,1).

Definition 3.4 A point x ∈ Rn is a convex combination of points x_1,…,x_k if there exists λ ∈ Δ^{k−1} such that

x = Σ_{i=1}^k λ_i x_i.

Lemma 3.5 A set C ⊆ Rn is convex iff every convex combination of points in C is also in C.


Proof. The "if" part is evident. So we shall prove the "only if" statement by induction on k. It holds for k = 2 by definition. Suppose the statement is true for k points; consider k + 1 points x_1,…,x_{k+1} ∈ C and λ ∈ Δ^k with λ_{k+1} ∈ (0,1) (if λ_{k+1} ∈ {0,1} the claim is immediate). Then

x = Σ_{i=1}^{k+1} λ_i x_i = (Σ_{j=1}^k λ_j) [Σ_{i=1}^k (λ_i / Σ_{j=1}^k λ_j) x_i] + λ_{k+1} x_{k+1} ∈ C,

since the bracketed point lies in C by the inductive hypothesis, and x is then a convex combination of two points of C. ∎

Let A ⊆ Rn, and let 𝒜 be the class of all convex subsets of Rn that contain A. We have 𝒜 ≠ ∅; after all, Rn ∈ 𝒜. Then, by Lemma 3.3, ∩𝒜 is a convex set in Rn that contains A. Clearly, this set is the smallest (that is, ⊆-minimum) convex subset of Rn that contains A.

Definition 3.6 The convex hull of A, denoted by cov(A), is the intersection of all convex sets containing A.

? Exercise 3.7 For a given set A, let K(A) denote the set of all convex combinations of points in A. Show that K(A) is convex and A ⊆ K(A).

Theorem 3.8 Let A ⊆ Rn. Then cov(A) = K(A).

Proof. Let 𝒜 be the family of convex sets containing A. Since cov(A) = ∩𝒜 and K(A) ∈ 𝒜 (Exercise 3.7), we have cov(A) ⊆ K(A).

To prove the reverse inclusion, take an arbitrary C ∈ 𝒜. Then A ⊆ C. It follows from Lemma 3.5 that K(A) ⊆ C. Hence K(A) ⊆ ∩𝒜 = cov(A). ∎
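Theorem 3.8 is easy to probe empirically: random convex combinations of points of A should land in cov(A). The sketch below is our addition (it assumes SciPy, and tests hull membership via a Delaunay triangulation of the points):

```python
import numpy as np
from scipy.spatial import Delaunay

rng = np.random.default_rng(0)
A = rng.standard_normal((10, 2))        # ten points in R^2
tri = Delaunay(A)                        # triangulates cov(A)

lam = rng.dirichlet(np.ones(len(A)), size=1000)  # weights in the simplex
pts = lam @ A                                     # convex combinations
print(np.all(tri.find_simplex(pts) >= 0))         # True: all lie in cov(A)
```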

3.2 Separation Theorem

This section is devoted to the establishment of separation theorems. In some sense, these theorems are the fundamental theorems of optimization theory. For simplicity, we restrict our analysis to Rn.

Definition 3.9 A hyperplane H_a^β in Rn is defined to be the set of points that satisfy the equation ⟨a,x⟩ = β. Thus,

H_a^β := {x ∈ Rn : ⟨a,x⟩ = β}.

The vector a is said to be a normal to the hyperplane.


Figure 3.2: A hyperplane H_a^β with normal a, and the half-spaces above and below it.

Remark 3.10 Geometrically, a hyperplane H_a^β in Rn is a translation of an (n−1)-dimensional subspace (an affine manifold). Algebraically, it is a level set of a linear functional. For an excellent explanation of hyperplanes, see Luenberger (1969, Section 5.12).

A hyperplane H_a^β divides Rn into two half-spaces, one on each side of H_a^β. The set {x ∈ Rn : ⟨a,x⟩ ≥ β} is called the half-space above the hyperplane H_a^β, and the set {x ∈ Rn : ⟨a,x⟩ ≤ β} is called the half-space below the hyperplane H_a^β; see Figure 3.2.

It therefore seems natural to say that two sets X and Y are separated by a hyperplane H_a^β if they are contained in different half-spaces determined by H_a^β. We will introduce several separation theorems.

Theorem 3.11 (A First Separation Theorem) Let C be a closed and convex subset of Rn, and let y ∈ Rn \ C. Then there exist a vector a ∈ Rn with a ≠ 0 and a scalar β ∈ R such that ⟨a,y⟩ > β and ⟨a,x⟩ < β for all x ∈ C.

To motivate the proof, we argue heuristically from Figure 3.3, where C is assumed to have a tangent at each boundary point. Draw a line from y to x*, the point of C that is closest to y. The vector y − x* is orthogonal to C in the sense that y − x* is orthogonal to the tangent line at x*. The tangent line, which is exactly the set {z ∈ Rn : ⟨y − x*, z − x*⟩ = 0}, separates y and C. The point x* is characterized by the fact that ⟨y − x*, x − x*⟩ ≤ 0 for all x ∈ C. If we move the tangent line parallel to itself so as to pass through a point x_0 ∈ (x*, y), we are done. We now justify these steps in a series of claims.

Claim 1 Let C be a convex subset of Rn and let y ∈ Rn \ C. If there exists a point in C that is closest to y, then it is unique.


Figure 3.3: The separating hyperplane theorem. The point x* ∈ C closest to y, the normal a = y − x*, and the hyperplane H_a^β between y and C; for every x ∈ C the vectors y − x* and x − x* form an obtuse angle.

Proof. Suppose that there were two points x_1 and x_2 of C that were closest to y. Then (x_1 + x_2)/2 ∈ C since C is convex, and so¹

d(y,C) ≤ ‖(x_1 + x_2)/2 − y‖ = ‖(1/2)[(x_1 − y) + (x_2 − y)]‖ ≤ (1/2)‖x_1 − y‖ + (1/2)‖x_2 − y‖ = d(y,C).

Hence the triangle inequality holds with equality. It follows from Exercise 1.8 that there exists λ ≥ 0 such that x_1 − y = λ(x_2 − y). Clearly λ ≠ 0, for otherwise x_1 − y = 0 would imply that y = x_1 ∈ C. Then λ = 1 since ‖x_1 − y‖ = ‖x_2 − y‖ = d(y,C). But then x_1 = x_2. ∎

Claim 2 Let C be a closed subset of Rn and let y ∈ Rn \ C. Then there exists a point x* ∈ C that is closest to y.

Proof. Take an arbitrary point x_0 ∈ C. Let r > ‖x_0 − y‖. Then C_1 := B(y;r) ∩ C is nonempty (at least x_0 is in the intersection), closed, and bounded, and hence is compact. The function x ↦ ‖x − y‖ is continuous on C_1 and so attains its minimum at some point x* ∈ C_1, i.e.,

‖x* − y‖ ≤ ‖x − y‖ for all x ∈ C_1.

¹ For a set X ⊆ Rn, d(y,X) is the distance from y to X, defined by d(y,X) = inf_{x∈X} ‖y − x‖.


For every x ∈ C \ C_1, we have

‖x − y‖ > r > ‖x_0 − y‖ ≥ ‖x* − y‖,

since x_0 ∈ C_1. ∎

Claim 3 Let C be a convex subset of Rn and let y ∈ Rn \ C. Then x* ∈ C is a closest point in C to y iff

⟨y − x*, x − x*⟩ ≤ 0 for all x ∈ C. (3.2)

Proof. Let x* ∈ C be a closest point to y and let x ∈ C. Since C is convex, we have

[x*, x] := {z(t) ∈ Rn : z(t) = x* + t(x − x*), t ∈ [0,1]} ⊆ C.

Let

g(t) := ‖z(t) − y‖² = ⟨x* + t(x − x*) − y, x* + t(x − x*) − y⟩ = Σ_{i=1}^n [x*_i − y_i + t(x_i − x*_i)]².

Observe that g(0) = ‖x* − y‖². Since g is continuously differentiable and x* ∈ argmin_{x∈C} ‖x − y‖, we have g'₊(0) ≥ 0. Since

g'(t) = 2 Σ_{i=1}^n [x*_i − y_i + t(x_i − x*_i)](x_i − x*_i) = 2[−⟨y − x*, x − x*⟩ + t‖x − x*‖²], (3.3)

letting t ↓ 0 we get (3.2).

Conversely, suppose that (3.2) holds. Take an arbitrary x ∈ C \ {x*}. It follows from (3.3) that if t ∈ (0,1] then

g'(t) = 2[−⟨y − x*, x − x*⟩ + t‖x − x*‖²] ≥ 2t‖x − x*‖² > 0.

That is, g is strictly increasing on [0,1]. Thus g(1) = ‖x − y‖² > ‖x* − y‖² = g(0), i.e., ‖x − y‖ > ‖x* − y‖, so x* is a closest point. ∎


Proof of Theorem 3.11. We can now complete the proof of Theorem 3.11. Let x* ∈ C be the closest point to y (by Claim 1 and Claim 2). Let a = y − x*. Then for all x ∈ C, we have ⟨a, x − x*⟩ ≤ 0 (by Claim 3), i.e., ⟨a,x⟩ ≤ ⟨a,x*⟩, with equality occurring when x = x*. Hence,

max_{x∈C} ⟨a,x⟩ = ⟨a,x*⟩.

On the other hand, ⟨a, y − x*⟩ = ‖a‖² > 0, so

⟨a,y⟩ = ⟨a,x*⟩ + ‖a‖² > ⟨a,x*⟩.

Finally, take an arbitrary β ∈ (⟨a,x*⟩, ⟨a,y⟩). We thus have ⟨a,y⟩ > β and ⟨a,x⟩ ≤ ⟨a,x*⟩ < β for all x ∈ C. ∎
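For a concrete instance of Theorem 3.11, take C to be the closed unit disk and y a point outside it; then x* = y/‖y‖ and a = y − x*. The following sketch is our own construction (it assumes NumPy) and checks the separation on random points of C:

```python
import numpy as np

y = np.array([2.0, 1.0])
x_star = y / np.linalg.norm(y)     # closest point of the unit disk to y
a = y - x_star                     # separating normal

rng = np.random.default_rng(1)
r = np.sqrt(rng.uniform(size=1000))          # uniform samples from the disk
theta = rng.uniform(0, 2*np.pi, size=1000)
C = np.column_stack([r*np.cos(theta), r*np.sin(theta)])

beta = (a @ y + a @ x_star) / 2    # any beta in (<a,x*>, <a,y>) works
print(a @ y > beta, np.all(C @ a < beta))   # True True
```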

Theorem 3.12 (Separation Theorem) Let X and Y be two disjoint convex subsets of Rn. Then there exist a ∈ Rn with a ≠ 0 and a scalar β ∈ R such that ⟨a,x⟩ ≤ β for all x ∈ X and ⟨a,y⟩ ≥ β for all y ∈ Y. That is, there is a hyperplane H_a^β that separates X and Y.

Proof. First note that

X ∩ Y = ∅ ⟺ 0 ∉ X − Y.

Since X and Y are convex, so is X − Y. Hence, separating the point 0 from the convex set X − Y, there exists an a ≠ 0 such that

⟨a, x − y⟩ ≤ 0 for all x − y ∈ X − Y.

Therefore,

⟨a,x⟩ ≤ ⟨a,y⟩ for all x ∈ X and y ∈ Y. (3.4)

Let α := sup{⟨a,x⟩ : x ∈ X} and γ := inf{⟨a,y⟩ : y ∈ Y}. Then from (3.4) we get that α and γ are finite, and for any number β ∈ [α, γ],

⟨a,x⟩ ≤ β ≤ ⟨a,y⟩ for all x ∈ X and y ∈ Y.

Thus the hyperplane H_a^β separates X and Y. ∎

Theorem 3.13 (Proper Separation Theorem) Let X and Y be two convex sets such that int X ≠ ∅ and (int X) ∩ Y = ∅. Then there exists a hyperplane H_a^β that properly separates X̄ and Ȳ.

Proof. Since int X is convex and disjoint from Y, it follows from Theorem 3.12 that there exists a hyperplane H_a^β such that

⟨a,x⟩ ≤ β ≤ ⟨a,y⟩ for all x ∈ int X and all y ∈ Y. (3.5)

Let x ∈ X̄. Then x ∈ cl(int X) since X̄ = cl(int X). So there exists a sequence (x_n) in int X such that x_n → x. It then follows from (3.5) and the continuity of the inner product that (3.5) holds for all x ∈ X̄ and all y ∈ Ȳ. Hence, the hyperplane H_a^β separates X̄ and Ȳ. Since int(X̄) = int X, we know that int(X̄) ≠ ∅. Furthermore, ⟨a,x⟩ < β for x ∈ int(X̄), so the separation is proper. ∎

Theorem 3.14 (Strong Separation Theorem) Let K be a compact convex set and let C be a closed convex set such that K ∩ C = ∅. Then K and C can be strongly separated.

Proof. For every ε > 0, define

K_ε := K + B(0;ε) = ∪_{x∈K} (x + B(0;ε)),
C_ε := C + B(0;ε) = ∪_{y∈C} (y + B(0;ε)).

Clearly, both K_ε and C_ε are open and convex.

We now show that there exists ε > 0 such that K_ε ∩ C_ε = ∅. If the assertion were false, there would exist a sequence (ε_n) with ε_n > 0 and ε_n → 0 and a sequence (w_n) such that w_n ∈ K_{ε_n} ∩ C_{ε_n} for each n. Writing

w_n = x_n + z_n with x_n ∈ K and ‖z_n‖ < ε_n,
w_n = y_n + z'_n with y_n ∈ C and ‖z'_n‖ < ε_n,

we have a sequence (x_n) in K and a sequence (y_n) in C such that

‖w_n − x_n‖ < ε_n and ‖w_n − y_n‖ < ε_n.

Hence,

‖x_n − y_n‖ = ‖(x_n − w_n) + (w_n − y_n)‖ ≤ ‖x_n − w_n‖ + ‖y_n − w_n‖ < 2ε_n. (3.6)

Since K is compact, there exists a subsequence (x_{n_k}) that converges to x_0 ∈ K. It follows from (3.6) that

‖y_{n_k} − x_0‖ ≤ ‖x_{n_k} − x_0‖ + ‖x_{n_k} − y_{n_k}‖ → 0,

that is, y_{n_k} → x_0. Since C is closed, x_0 ∈ C. This contradicts the assumption that C ∩ K = ∅, and so the assertion is true.

We have shown that there exists ε > 0 such that K_ε ∩ C_ε = ∅. Hence, by Theorem 3.13 there is a hyperplane that properly separates K_ε and C_ε. Since both K_ε and C_ε are open, the separation is strict. According to the definition, this says that K and C are strongly separated. ∎


3.3 Systems of Linear Inequalities: Theorems of the Alternative

Let A be an m × n matrix and b ≠ 0 a vector in Rm. We focus on finding a non-negative x ∈ Rⁿ₊ such that Ax = b, or showing that no such x exists. The problem can be framed as follows: can b be expressed as a non-negative linear combination of the columns of A?

Definition 3.15 The set of all non-negative linear combinations of the columns of A is called the finite cone generated by the columns of A. It is denoted cone(A).

Lemma 3.16 Let A be an m × n matrix. Then cone(A) is a convex set.

Proof. Take any two y, y' ∈ cone(A). Then there exist x, x' ∈ Rⁿ₊ such that

y = Ax and y' = Ax'.

For any λ ∈ [0,1] we have

λy + (1 − λ)y' = λAx + (1 − λ)Ax' = A[λx + (1 − λ)x'].

Since λx + (1 − λ)x' ∈ Rⁿ₊, it follows that λy + (1 − λ)y' ∈ cone(A). ∎

Lemma 3.17 Let A be an m × n matrix. Then cone(A) is a closed set.

Proof. We sketch the standard argument in two cases.

Case 1: the columns of A are linearly independent. Let (w_n) be a convergent sequence in cone(A) with limit w; we show that w ∈ cone(A). For each w_n there exists x_n ∈ Rⁿ₊ such that w_n = A x_n. Since the columns of A are linearly independent, AᵀA is invertible and x_n = (AᵀA)⁻¹Aᵀ w_n, so the convergence of (w_n) implies that (x_n) converges to x := (AᵀA)⁻¹Aᵀ w. Since Rⁿ₊ is closed, x ≥ 0, and w = Ax ∈ cone(A).

Case 2: the columns of A are linearly dependent. One can show that cone(A) is the union of the cones generated by the linearly independent subsets of the columns of A. Each of these cones is closed by Case 1, and a finite union of closed sets is closed. ∎

Theorem 3.18 (Farkas' Lemma) Let A be an m × n matrix and let b ∈ Rm \ {0}. Then one and only one of the systems

Ax = b, x ≥ 0, x ∈ Rn, (F_I)
yA ≥ 0, y · b < 0, y ∈ Rm, (F_II)

has a solution.


Proof. We first suppose that (F_I) has a solution x*. We shall show that (F_II) has no solution. Suppose there exists y_0 ∈ Rm such that Aᵀy_0 ≥ 0. Then

y_0 · b = x* · (Aᵀy_0) ≥ 0,

since x* ≥ 0. Hence (F_II) has no solution.

We now suppose that (F_I) has no solution; that is,

b ∉ cone(A).

Since cone(A) is convex and closed, there exists a hyperplane H_{y_0}^β that strongly separates b and cone(A). Thus y_0 ≠ 0 and

⟨y_0, b⟩ > β > ⟨y_0, z⟩ for all z ∈ cone(A).

Since 0 ∈ cone(A), we get that β > 0. Thus,

⟨y_0, b⟩ > 0. (3.7)

For all z = Ax ∈ cone(A), with x ∈ Rⁿ₊, we have

⟨y_0, z⟩ = ⟨y_0, Ax⟩ = ⟨x, Aᵀy_0⟩ < β.

Since this holds for all x ∈ Rⁿ₊ and Rⁿ₊ is a cone, we must have Aᵀy_0 ≤ 0 (otherwise the left-hand side could be made arbitrarily large). Setting y = −y_0, we obtain yA ≥ 0 and y · b = −⟨y_0, b⟩ < 0 by (3.7), so (F_II) has a solution. ∎

Example 3.19 We use Farkas' Lemma to decide if the following system has a non-negative solution:

4x_1 + x_2 − 5x_3 = 1
x_1 + 2x_3 = 1.

The Farkas alternative is yA ≥ 0, y · b < 0, or equivalently,

4y_1 + y_2 ≥ 0
y_1 ≥ 0
−5y_1 + 2y_2 ≥ 0
y_1 + y_2 < 0. (3.8)

There is no solution to the system (3.8). The second inequality requires that y_1 ≥ 0. Combining this with the last inequality, we conclude that y_2 < 0. But y_1 ≥ 0 and y_2 < 0 contradict the third inequality of (3.8). So, the original system has a non-negative solution.
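The conclusion can be confirmed computationally by posing the search for a non-negative solution as a feasibility linear program (a sketch assuming SciPy; the zero objective makes linprog a pure feasibility check):

```python
import numpy as np
from scipy.optimize import linprog

# Look for x >= 0 with Ax = b, as in Example 3.19.
A = np.array([[4.0, 1.0, -5.0],
              [1.0, 0.0,  2.0]])
b = np.array([1.0, 1.0])

res = linprog(c=np.zeros(3), A_eq=A, b_eq=b, bounds=[(0, None)] * 3)
print(res.status == 0, res.x)   # True and a non-negative solution x
```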

Example 3.20 We use Farkas' Lemma to decide the solvability of the system

x_1 + x_2 = 2
x_2 + x_3 = 2
x_1 + x_3 = 2
x_1 + x_2 + x_3 = 1.

We are interested in non-negative solutions of this system. The Farkas alternative is

y_1 + y_3 + y_4 ≥ 0
y_1 + y_2 + y_4 ≥ 0
y_2 + y_3 + y_4 ≥ 0
2y_1 + 2y_2 + 2y_3 + y_4 < 0.

One solution is y_1 = y_2 = y_3 = −1/2 and y_4 = 1, implying that the given system has no non-negative solution.

3.4 Convex Functions

Throughout this section we will assume that the subset C ⊆ Rn is convex and that f is a real-valued function defined on C, that is, f: C → R. When we take x_1, x_2 ∈ C, we will let x_t := t x_1 + (1 − t) x_2, for t ∈ [0,1], denote the convex combination of x_1 and x_2.

Definitions

Definition 3.21 A function f: C → R is convex if for all x_1, x_2 ∈ C and t ∈ [0,1],

f[t x_1 + (1 − t) x_2] ≤ t f(x_1) + (1 − t) f(x_2).

The function f is strictly convex if the above inequality holds strictly whenever x_1 ≠ x_2 and t ∈ (0,1).

A function f: C → R is concave if for all x_1, x_2 ∈ C and t ∈ [0,1],

f[t x_1 + (1 − t) x_2] ≥ t f(x_1) + (1 − t) f(x_2).

The function f is strictly concave if the above inequality holds strictly whenever x_1 ≠ x_2 and t ∈ (0,1).


Figure 3.4: (a) f is convex: the epigraph epi(f) is convex. (b) f is concave: the subgraph sub(f) is convex. (c) f is quasi-convex: the inferior set I(y_0) is convex. (d) f is quasi-concave: the superior set S(y_0) is convex.

Definition 3.22 A function f: C → R is quasi-convex if, for all x_1, x_2 ∈ C and t ∈ [0,1],

f[t x_1 + (1 − t) x_2] ≤ max{f(x_1), f(x_2)}.

The function f is strictly quasi-convex if the above inequality holds strictly whenever x_1 ≠ x_2 and t ∈ (0,1).

A function f: C → R is quasi-concave if, for all x_1, x_2 ∈ C and t ∈ [0,1],

f[t x_1 + (1 − t) x_2] ≥ min{f(x_1), f(x_2)}.

The function f is strictly quasi-concave if the above inequality holds strictly whenever x_1 ≠ x_2 and t ∈ (0,1).

Geometric Interpretation

Given a function f: C → R and y_0 ∈ R, let us define

the epigraph of f:  epi(f) := {(x,y) ∈ C × R : f(x) ≤ y};
the subgraph of f:  sub(f) := {(x,y) ∈ C × R : f(x) ≥ y};
the superior set for level y_0:  S(y_0) := {x ∈ C : f(x) ≥ y_0};
the inferior set for level y_0:  I(y_0) := {x ∈ C : f(x) ≤ y_0}.

We then have (see Figure 3.4):

• f is convex ⟺ epi(f) is convex.

• f is concave ⟺ sub(f) is convex.

• f is quasi-convex ⟺ I(y_0) is convex for every y_0 ∈ R.

• f is quasi-concave ⟺ S(y_0) is convex for every y_0 ∈ R.

Convexity and Quasi-convexity

It is a simple matter to show that concavity (convexity) implies quasi-concavity (quasi-convexity).

Theorem 3.23 Let C ⊆ Rn and f: C → R. If f is concave on C, it is also quasi-concave on C. If f is convex on C, it is also quasi-convex on C.

Proof. We only prove the first claim, and leave the second as an exercise. Suppose f is concave on C. Take any x, y ∈ C and t ∈ [0,1]. Without loss of generality, we let f(x) ≥ f(y). By the definition of concavity, we have

f(t x + (1 − t) y) ≥ t f(x) + (1 − t) f(y) = t [f(x) − f(y)] + f(y) ≥ f(y) = min{f(x), f(y)}.

Hence, f is quasi-concave. ∎

? Exercise 3.24 Prove the second claim in Theorem 3.23: if f is convex,

then it is also quasi-convex.


Concavity and Hessian

We now characterize concavity of a function using the Hessian matrix.

Theorem 3.25 Let A ⊆ Rn. The (twice continuously differentiable) function f: A → R is concave if and only if Hf(x) is negative semidefinite for every x ∈ A.

Proof. See Mas-Colell et al. (1995, Theorem M.C.2). ∎

Example 3.26 Let A := (0,∞) × (−5,∞), and let f(x,y) = ln x + ln(y + 5). For each point (x,y) ∈ A, the Hessian of f is

Hf(x,y) = [−1/x²  0; 0  −1/(y+5)²].

Then for each (u,v) ∈ R², we have

(u, v) Hf(x,y) (u, v)ᵀ = −u²/x² − v²/(y + 5)² ≤ 0.

That is, Hf(x,y) is negative semidefinite. Hence, f(x,y) is concave. See Figure 3.5.
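The Hessian computation in Example 3.26 can be reproduced symbolically. A sketch using SymPy (our addition):

```python
import sympy as sp

# Hessian of f(x, y) = ln x + ln(y + 5); both diagonal entries are
# negative on A = (0, oo) x (-5, oo), hence Hf is negative semidefinite.
x, y = sp.symbols('x y', real=True)
f = sp.log(x) + sp.log(y + 5)

H = sp.hessian(f, (x, y))
print(H)                 # Matrix([[-1/x**2, 0], [0, -1/(y + 5)**2]])
print(H.eigenvals())     # the eigenvalues are -1/x**2 and -1/(y + 5)**2
```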

Jensen’s Inequality

Theorem 3.27 (Jensen's Inequality) Let f: C → R, where C ⊆ Rn is convex. Then f is convex iff for every finite set of points x_1,…,x_k ∈ C and every t = (t_1,…,t_k) ∈ Δ^{k−1} (see (3.1) for the definition of Δ^{k−1}),

f(Σ_{i=1}^k t_i x_i) ≤ Σ_{i=1}^k t_i f(x_i). (3.9)

Proof. The "if" part is evident. So we only prove the "only if" part. Suppose that f is convex. We shall prove (3.9) by induction on k. For k = 1 the relation is trivial (remember that Δ⁰ = {1} when k = 1). For k = 2 the relation follows from the definition of a convex function. Suppose that k > 2 and that (3.9) has been established for k − 1. We show that (3.9) holds for k.


Figure 3.5: The function ln x + ln(y + 5) is concave.

If t_k = 1, then there is nothing to prove. If t_k < 1, set T := Σ_{i=1}^{k−1} t_i. Then T + t_k = 1, T = 1 − t_k > 0 and

Σ_{i=1}^{k−1} t_i/T = T/T = 1.


Hence,

f(Σ_{i=1}^k t_i x_i) = f(Σ_{i=1}^{k−1} t_i x_i + t_k x_k)
                     = f(T [Σ_{i=1}^{k−1} (t_i/T) x_i] + t_k x_k)
                     ≤ T f(Σ_{i=1}^{k−1} (t_i/T) x_i) + t_k f(x_k)
                     ≤ T [Σ_{i=1}^{k−1} (t_i/T) f(x_i)] + t_k f(x_k)
                     = Σ_{i=1}^k t_i f(x_i),

where the first inequality follows from the convexity of f and the second inequality follows from the inductive hypothesis. ∎
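Jensen's inequality is easy to test numerically for a particular convex function. The sketch below is our addition, using f(x) = eˣ and random weights drawn from the simplex:

```python
import numpy as np

# Check (3.9) for the convex function f(x) = exp(x).
rng = np.random.default_rng(2)
xs = rng.standard_normal(50)        # points x_1, ..., x_k in R
t = rng.dirichlet(np.ones(50))      # weights t in the simplex

lhs = np.exp(np.dot(t, xs))         # f(sum t_i x_i)
rhs = np.dot(t, np.exp(xs))         # sum t_i f(x_i)
print(lhs <= rhs)                   # True
```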

3.5 Convexity and Optimization

Concave Programming

We first present two results which indicate the importance of convexity for optimization theory.

• In convex optimization problems, all local optima must also be global optima.

• If a strictly convex optimization problem admits a solution, the solution must be unique.

Theorem 3.28 Let X ⊆ Rn be convex and f: X → R be concave. Then

(a) Any local maximizer of f is a global maximizer of f.

(b) The set argmax{f(x) : x ∈ X} of maximizers of f on X is either empty or convex.


Proof. (a) Suppose x is a local maximizer but not a global maximizer of f. Then there exists ε > 0 such that

f(x) ≥ f(y) for all y ∈ X ∩ B(x;ε),

and there exists z ∈ X such that

f(z) > f(x). (3.10)

Since X is convex, t x + (1 − t) z ∈ X for all t ∈ (0,1). Take t̄ sufficiently close to 1 so that t̄ x + (1 − t̄) z ∈ B(x;ε). By the concavity of f and (3.10), we have

f(t̄ x + (1 − t̄) z) ≥ t̄ f(x) + (1 − t̄) f(z) > f(x),

a contradiction.

(b) Suppose that x_1 and x_2 are both maximizers of f on X. Then f(x_1) = f(x_2). For any t ∈ (0,1), we have

f(x_1) ≥ f(t x_1 + (1 − t) x_2) ≥ t f(x_1) + (1 − t) f(x_2) = f(x_1).

That is, f[t x_1 + (1 − t) x_2] = f(x_1). Thus, t x_1 + (1 − t) x_2 is a maximizer of f on X. ∎

Theorem 3.29 Let X ⊆ Rn be convex and f: X → R be strictly concave. Then argmax{f(x) : x ∈ X} either is empty or contains a single point.

? Exercise 3.30 Prove Theorem 3.29.

We now present an extremely important theorem, which says that the first-order conditions of the Kuhn-Tucker Theorem (Theorem 2.20) are both necessary and sufficient to identify optima of convex inequality-constrained optimization problems, provided a mild regularity condition is met.


Theorem 3.31 (Kuhn-Tucker Theorem under Convexity) Let U ⊆ Rn be open and convex. Let f: U → R be a concave C¹ function. For i = 1,…,ℓ, let h_i: U → R be convex C¹ functions. Suppose there is some x̄ ∈ U such that

h_i(x̄) < 0, i = 1,…,ℓ.

(This is called Slater's condition.) Then x* maximizes f over

X := {x ∈ U : h_i(x) ≤ 0, i = 1,…,ℓ}

if and only if there is λ* ∈ Rℓ such that the Kuhn-Tucker first-order conditions hold:

λ* ≥ 0, Σ_{i=1}^ℓ λ*_i h_i(x*) = 0, (KTC-1)
∇f(x*) = Σ_{i=1}^ℓ λ*_i ∇h_i(x*). (KTC-2)

Proof. See Sundaram (1996, Section 7.7). ∎

Example 3.32 Let U = (0,∞) × (−5,∞). Let f(x,y) = ln x + ln(y + 5), h_1(x,y) = x + y − 4 and h_2(x,y) = −y. Consider the problem

max_{(x,y)∈U} f(x,y)
s.t. h_1(x,y) ≤ 0, h_2(x,y) ≤ 0. (3.11)

? Exercise 3.33 Show that (x*,y*) = (4,0) is the unique point satisfying the first-order condition for a local maximizer of the problem (3.11).

Clearly, Slater's condition holds. Then, combining Theorem 3.31, Exercise 3.33 and Example 3.26, we conclude that (4,0) is a global maximizer. See Figure 3.6.
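The solution of problem (3.11) can also be confirmed numerically. A sketch assuming SciPy (the bounds keep the iterates inside U, where the logarithms are defined):

```python
import numpy as np
from scipy.optimize import minimize

# Maximize ln x + ln(y + 5) subject to x + y <= 4 and y >= 0.
obj = lambda v: -(np.log(v[0]) + np.log(v[1] + 5))
cons = [{'type': 'ineq', 'fun': lambda v: 4 - v[0] - v[1]},   # h1(x, y) <= 0
        {'type': 'ineq', 'fun': lambda v: v[1]}]              # h2(x, y) <= 0

res = minimize(obj, x0=np.array([1.0, 1.0]), constraints=cons,
               bounds=[(1e-6, None), (0.0, None)])
print(res.x)    # approximately [4, 0], the global maximizer
```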

Slater’s Condition

For a formal demonstration of the need for Slater's condition, let us consider the following example.


Figure 3.6: (4,0) is the global maximizer. The feasible set X, the constraint line L_{h1}(0), and the level curve L_f(ln 4 + ln 5) through (4,0).

Example 3.34 Let U = R; let f(x) = x and g(x) = x². The only point in R satisfying g(x) ≤ 0 is x = 0, so this is trivially the constrained maximizer of f. But

f'(0) = 1 and g'(0) = 0,

so there is no λ ≥ 0 such that f'(0) = λ g'(0).

Quasi-concavity and Optimization

Quasi-concave and quasi-convex functions fail to exhibit many of the sharp properties that distinguish concave and convex functions. As an example, we show that Theorem 3.31 fails for quasi-concave objective functions.

Example 3.35 Let f: R → R and h: R → R be the quasi-concave, continuously differentiable functions

f(x) = x³ if x ∈ (−∞,0); 0 if x ∈ [0,1]; (x − 1)² if x ∈ (1,∞),

and h(x) = −x; see Figure 3.7. Consider maximizing f subject to h(x) ≤ 0.

? Exercise 3.36 Show that for every point x ∈ [0,1], there exists λ ≥ 0 such that the pair (x,λ) satisfies (KT-1) and (KT-2) (see Theorem 2.20).


Furthermore, Slater's condition holds. However, it is clear that no point x ∈ [0,1] can be a solution to the problem (why?).

Figure 3.7: Quasi-concavity and optimization. The graphs of f and of h(x) = −x.


REFERENCES

[1] Apostol, Tom M. (1974) Mathematical Analysis, 2nd edition: Pearson Education.

[2] Axler, Sheldon (1997) Linear Algebra Done Right, 2nd edition, Undergraduate Texts in Mathematics, New York: Springer-Verlag.

[3] Berkovitz, Leonard D. (2002) Convexity and Optimization in Rn, Pure and Applied Mathematics: A Wiley-Interscience Series of Texts, Monographs and Tracts, New York: Wiley-Interscience.

[4] Bertsekas, Dimitri P. (2009) Convex Optimization Theory, Belmont, Massachusetts: Athena Scientific.

[5] Bertsekas, Dimitri P., Angelia Nedic, and Asuman E. Ozdaglar (2003) Convex Analysis and Optimization, Belmont, Massachusetts: Athena Scientific.

[6] Crossley, Martin D. (2005) Essential Topology, Springer Undergraduate Mathematics Series, London: Springer-Verlag.

[7] Duggan, John (2010) "Notes on Optimization and Pareto Optimality," URL: http://www.johnduggan.net/lecture, Rochester University.

[8] Hiriart-Urruty, Jean-Baptiste and Claude Lemaréchal (2001) Fundamentals of Convex Analysis, Grundlehren Text Editions, Berlin: Springer-Verlag.

[9] Jehle, Geoffrey A. and Philip J. Reny (2011) Advanced Microeconomic Theory, 3rd edition, Essex, England: Prentice Hall.

[10] Lay, Steven R. (1982) Convex Sets and Their Applications, New York: John Wiley & Sons.

[11] Lee, Jeffrey M. (2009) Manifolds and Differential Geometry, Vol. 107 of Graduate Studies in Mathematics, Providence, Rhode Island: American Mathematical Society.

[12] Luenberger, David G. (1969) Optimization by Vector Space Methods, Wiley Professional Paperback Series, New York: John Wiley & Sons.

[13] Luenberger, David G. and Yinyu Ye (2008) Linear and Nonlinear Programming, 3rd edition, International Series in Operations Research and Management Science, New York: Springer.

[14] Mas-Colell, Andreu, Michael D. Whinston, and Jerry R. Green (1995) Microeconomic Theory, New York: Oxford University Press.

[15] Milgrom, Paul (2004) Putting Auction Theory to Work, Cambridge, U.K.: Cambridge University Press.

[16] Milgrom, Paul and Ilya Segal (2002) "Envelope Theorems for Arbitrary Choice Sets," Econometrica, 70(2): 583-601.

[17] Munkres, James R. (1991) Analysis on Manifolds, Boulder, Colorado: Westview Press.

[18] Ok, Efe A. (2007) Real Analysis with Economic Applications, New Jersey: Princeton University Press.

[19] O'Neill, Barrett (2006) Elementary Differential Geometry, revised 2nd edition, New York: Academic Press.

[20] Pugh, Charles Chapman (2002) Real Mathematical Analysis, Undergraduate Texts in Mathematics, New York: Springer-Verlag.

[21] Rockafellar, R. Tyrrell (1970) Convex Analysis, Princeton Mathematical Series, New Jersey: Princeton University Press, 2nd edition.

[22] Royden, Halsey and Patrick Fitzpatrick (2010) Real Analysis, 4th edition, New Jersey: Prentice Hall.

[23] Rudin, Walter (1976) Principles of Mathematical Analysis, 3rd edition, New York: McGraw-Hill.

[24] Spivak, Michael (1965) Calculus on Manifolds: A Modern Approach to Classical Theorems of Advanced Calculus, Boulder, Colorado: Westview Press.

[25] Spivak, Michael (1999) A Comprehensive Introduction to Differential Geometry, Vol. 2, Houston, Texas: Publish or Perish.

[26] Sundaram, Rangarajan K. (1996) A First Course in Optimization Theory, Cambridge, Massachusetts: Cambridge University Press.

[27] Vohra, Rakesh V. (2005) Advanced Mathematical Economics, London: Routledge.

[28] Willard, Stephen (2004) General Topology, New York: Dover Publications.

[29] Zorich, Vladimir A. (2004) Mathematical Analysis II, Universitext, Berlin: Springer-Verlag.

