
Calculus of Variations and Partial

Differential Equations

Diogo Aguiar Gomes

Contents

Introduction

1. Finite dimensional optimization problems
   1. Unconstrained minimization in Rn
   2. Convexity
   3. Lagrange multipliers
   4. Linear programming
   5. Non-linear optimization with constraints
   6. Bibliographical notes

2. Calculus of variations in one independent variable
   1. Euler-Lagrange Equations
   2. Further necessary conditions
   3. Applications to Riemannian geometry
   4. Hamiltonian dynamics
   5. Sufficient conditions
   6. Symmetries and Noether theorem
   7. Critical point theory
   8. Invariant measures
   9. Non convex problems
   10. Geometry of Hamiltonian systems
   11. Perturbation theory
   12. Bibliographical notes

3. Calculus of variations and elliptic equations
   1. Euler-Lagrange equation
   2. Further necessary conditions and applications
   3. Convexity and sufficient conditions
   4. Direct method in the calculus of variations
   5. Euler-Lagrange equations
   6. Regularity by energy methods
   7. Hölder continuity
   8. Schauder estimates

4. Optimal control and viscosity solutions
   1. Elementary examples and properties
   2. Dynamic programming principle
   3. Pontryagin maximum principle
   4. The Hamilton-Jacobi equation
   5. Verification theorem
   6. Existence of optimal controls - bounded control space
   7. Sub and superdifferentials
   8. Optimal control in the calculus of variations setting
   9. Viscosity solutions
   10. Stationary problems

5. Duality theory
   1. Model problems
   2. Some informal computations
   3. Duality
   4. Generalized Mather problem
   5. Monge-Kantorowich problem

Bibliography

Index

Introduction

This book is dedicated to the study of calculus of variations and its

connection and applications to partial differential equations. We have

tried to survey a wide range of techniques and problems, discussing
both classical results and more recent developments.

This text is suitable for a first one-year graduate course on calculus of
variations and optimal control, and is organized in the following way:

1. Finite dimensional optimization problems;

2. Calculus of variations with one independent variable;

3. Calculus of variations and elliptic partial differential equations;

4. Deterministic optimal control and viscosity solutions;

5. Duality theory.

The first chapter is dedicated to finite dimensional optimization,

giving emphasis to techniques that can be generalized and applied to
infinite dimensional problems. This chapter starts with an elementary

discussion of unconstrained optimization in Rn and convexity. Then

we discuss constrained optimization problems, linear programming and

KKT conditions. The following chapter concerns variational problems

with one independent variable. We study classical results including

applications to Riemannian geometry and classical mechanics. We also

discuss sufficient conditions for minimizers, Hamiltonian dynamics and

several other related topics. The next chapter concerns variational

problems with functionals defined through multiple integrals. In many

of these problems, the Euler-Lagrange equation is an elliptic partial

differential equation, possibly non linear. Using the direct method in

the calculus of variations, we prove the existence of minimizers. Then


we show that the minimum is a weak solution to the Euler-Lagrange

equation and study its regularity. The study of regularity follows the

classical path: first we consider energy methods, then we prove the De

Giorgi-Nash-Moser estimates and finally Schauder estimates. In the

fourth chapter we consider optimal control problems. We study both

classical control theory methods such as the dynamic programming

and Pontryagin maximum principle, as well as more recent tools such

as viscosity solutions of Hamilton-Jacobi equations. The last chap-

ter is a brief introduction to the (infinite dimensional) duality theory

and its applications to non-linear partial differential equations. We

study Mather’s problem and Monge-Kantorowich optimal mass trans-

port problem. These have important relations with Hamilton-Jacobi

and Monge-Ampere equations, respectively.

The pre-requisites of these notes are some familiarity with the

Sobolev spaces and functional analysis, at the level of [Eva98b]. With

some few exceptions, we do not assume familiarity with partial differ-

ential equations beyond elementary theory.

Many of the results discussed, as well as important extensions,

can be found in the bibliography. Concerning finite
dimensional optimization and linear programming, the main reference
is [Fra02]. On variational problems with one independent variable,
a key reference is [AKN97]. The approach to elliptic equations in
chapter 3 was strongly influenced by the course the author attended
at the University of California at Berkeley, taught by Fraydoun Rezakhanlou,
by the (unpublished) notes on elliptic equations by the author's advisor L. C.
Evans, and by the book [Gia83]. The books [GT01] and [Gia93]
are also classical references in this area. Optimal control problems are
discussed in chapter 4. The main references are [Eva98b], [Lio82], [Bar94],
[FS93], and [BCD97]. The last chapter concerns duality theory. We
recommend the books [Eva99], [Vil03a], [Vil], as well as the author's
papers [Gom00], [Gom02b].


I would like to thank my students Tiago Alcaria, Patrícia Engracia,
Sílvia Guerra, Igor Kravchenko, Anabela Pelicano, Ana Rita Pires,
Veronica Quítalo, Lucian Radu, Joana Santos, Ana Santos, and Vitor
Saraiva, who took courses based on parts of these notes and suggested
several corrections and improvements. My friend Pedro Girao
deserves special thanks, as he read the first LaTeX version of these notes
and suggested many corrections and improvements.

1

Finite dimensional optimization problems

This chapter is an introduction to optimization problems in finite

dimension. We are certain that many of the results discussed, as well as
their proofs, are familiar to the reader. However, we feel that it is instructive
to recall them and, throughout this text, observe how they can be

adapted for infinite dimensional problems. The plan of this chapter is

the following: we start in §1 by considering unconstrained minimization

problems in Rn, we discuss existence and uniqueness of minimizers, as

well as first and second order tests for minimizers. The following sec-

tion, §2, concerns properties of convex functions which will be needed

throughout the text. We start the discussion of constrained optimiza-

tion problems in §3 by studying the Lagrange multiplier method for

equality constraints. Then, the general case involving both equality

and inequality constraints is discussed in the two remaining sections. In

§4 we consider linear programming problems, and in §5 we discuss non-

linear optimization problems and we derive the Karush-Kuhn-Tucker

(KKT) conditions. The chapter ends with a few bibliographical refer-

ences.

The general setting of optimization problems is the following: given

a function f : Rn → R and a set X ⊂ Rn, called the admissible set, we

would like to solve the following minimization problem

\[
\min_{x \in X} f(x), \tag{1}
\]

i.e. to find the solution set S ⊂ X such that

\[
f(y) = \inf_X f
\]

for all y ∈ S. We should note that the "min" in (1) should be
read "minimize" rather than "minimum", as the minimum may not
be achieved. The number inf_X f is called the value of problem (1).

1. Unconstrained minimization in Rn

In this section we address the unconstrained minimization case,

that is, the case in which the admissible set X is Rn. Let f : Rn → R be an arbitrary function. We look for conditions on f that

• ensure the existence of a minimum;

• show that this minimum is unique.

In many instances, existence and uniqueness results are not enough:

we would also like to

• determine necessary or sufficient conditions for a point to be a

minimum;

• estimate the location of a possible minimum.

By looking for all points that satisfy necessary conditions one can

determine a set of candidate minimizers. Then, by looking at sufficient

conditions one may in fact be able to show that some of these points

are indeed minimizers.

To study the existence of a minimum of f , we can use the following

procedure, called the direct method of the calculus of variations: let

(xn) be a minimizing sequence, that is, a sequence such that

f(xn)→ inf f.

Proposition 1. Let A be an arbitrary set and f : A→ R. Then there

exists a minimizing sequence.

Proof. If $\inf_A f = -\infty$, there exists $x_n \in A$ such that $f(x_n) \to -\infty$. Otherwise, if $\inf_A f > -\infty$, we can always find $x_n \in A$ such
that $\inf_A f \le f(x_n) \le \inf_A f + \frac{1}{n}$, which again produces a minimizing
sequence.

Let f : Rn → R. Suppose (xn) is a minimizing sequence for f . If

xn (or some subsequence) converges to a point x and if, additionally,

f(xn) converges to f(x), then x is a minimum of f because

f(x) = lim f(xn),

and

lim f(xn) = inf f,

because xn is a minimizing sequence. Thus f(x) = inf f . Although

minimizing sequences always exist, they may fail to converge, even up

to subsequences, as the next exercise illustrates:

Exercise 1. Consider the function $f(x) = e^{-x}$. Compute inf f and give an
example of a minimizing sequence. Show that no minimizing sequence
for f converges.
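The phenomenon behind Exercise 1 can be observed numerically. The following is a minimal sketch, assuming the particular sequence x_n = n (our choice for illustration, not one prescribed by the text): the values f(x_n) decrease towards the infimum while the points themselves leave every bounded set.

```python
import math

def f(x):
    # The function from Exercise 1.
    return math.exp(-x)

# Hypothetical choice of sequence, for illustration only: x_n = n.
xs = [float(n) for n in range(1, 41)]
values = [f(x) for x in xs]

# f(x_n) is strictly decreasing and positive, approaching 0,
# while x_n itself grows without bound.
print(values[-1])  # a very small positive number
print(xs[-1])      # the last point of the sequence, far from the origin
```

The sketch does not prove anything about arbitrary minimizing sequences; that is the content of the exercise.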

As the previous exercise suggests, to ensure convergence it is nat-

ural to impose certain compactness conditions. In Rn, any bounded

sequence (xn) has a convergent subsequence. A convenient condition

on f that ensures boundedness of minimizing sequences is coercivity:

a function f : Rn → R is called coercive if f(x)→ +∞, as |x| → ∞.

Exercise 2. Let f be a coercive function and let xn be a sequence such

that f(xn) is bounded. Show that xn is bounded. Note in particular

that if f(xn) is convergent then xn is bounded.

Therefore, from the previous exercise, we obtain:

Proposition 2. Let f : Rn → R be a coercive function and let (xn) be
a minimizing sequence for f. Then there exists a point x for which,
through some subsequence, xn → x.


Unfortunately, if f is discontinuous at x, f(xn) may fail to converge

to f(x). This poses a problem because if xn is a minimizing sequence

f(xn) → inf f and if this limit is not f(x) then x cannot be a mini-

mizer. It would, therefore, seem natural to require f to be continuous.

However, to establish that x is a minimizer we do not really need con-

tinuity. In fact, a weaker property is sufficient: it is enough that for

any sequence (xn) converging to x the following inequality holds:

(2) lim inf f(xn) ≥ f(x).

A function f is called lower semicontinuous if inequality (2) holds for

any point x and any sequence xn converging to x.

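Example 1. The function

\[
f(x) = \begin{cases} 1 & \text{if } x \neq 0 \\ 0 & \text{if } x = 0 \end{cases}
\]

is lower semicontinuous. However,

\[
g(x) = \begin{cases} 0 & \text{if } x \neq 0 \\ 1 & \text{if } x = 0 \end{cases}
\]

is not.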
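Inequality (2) can be tested numerically for the two functions of Example 1. The sketch below assumes the sequence x_n = 1/n converging to 0 (an illustrative choice); a single sequence cannot establish lower semicontinuity, but it suffices to exhibit its failure for g.

```python
def f(x):
    # Lower semicontinuous: the value drops at the exceptional point x = 0.
    return 1.0 if x != 0 else 0.0

def g(x):
    # Not lower semicontinuous: the value jumps up at x = 0.
    return 0.0 if x != 0 else 1.0

# Illustrative sequence x_n = 1/n -> 0 (an assumption of this sketch).
xs = [1.0 / n for n in range(1, 1001)]

# Along this sequence the values are constant, so the minimum over a
# tail of the sequence coincides with the lim inf.
liminf_f = min(f(x) for x in xs[500:])
liminf_g = min(g(x) for x in xs[500:])

print(liminf_f >= f(0.0))  # True: inequality (2) holds at x = 0
print(liminf_g >= g(0.0))  # False: inequality (2) fails at x = 0
```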

ADD HERE GRAPH OF FUNCTIONS

Proposition 3. Let f : Rn → R be lower semicontinuous and let

(xn) ⊂ Rn be a minimizing sequence converging to x ∈ Rn. Then x is

a minimizer of f .

Proof. Let xn be a minimizing sequence. Then

inf f = lim f(xn) = lim inf f(xn) ≥ f(x),

that is, f(x) ≤ inf f .

Lower semicontinuity is a weaker property than continuity, and
therefore easier to satisfy.


Establishing the uniqueness of minimizer is, in general, more com-

plex. A convenient condition that implies uniqueness of minimizers is

convexity.

A set A ⊂ Rn is convex if for all x, y ∈ A and any 0 ≤ λ ≤ 1 we
have λx + (1 − λ)y ∈ A. Let A be a convex set. A function f : A → R is convex if, for any x, y ∈ A and 0 ≤ λ ≤ 1,

\[
f(\lambda x + (1-\lambda) y) \le \lambda f(x) + (1-\lambda) f(y),
\]

and it is uniformly convex if there exists θ > 0 such that for all x, y ∈ A and 0 ≤ λ ≤ 1,

\[
f(\lambda x + (1-\lambda) y) + \theta \lambda (1-\lambda) |x-y|^2 \le \lambda f(x) + (1-\lambda) f(y).
\]
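The uniform convexity inequality can be checked numerically on a concrete function. The sketch below assumes f(x) = |x|^2 on R^3 and θ = 1 (both are our choices for illustration); for this function the two sides of the inequality in fact agree, so the computed gap should vanish up to rounding error. The helper names `sq_norm`, `combo` and `gap` are hypothetical.

```python
import random

def sq_norm(v):
    # Squared Euclidean norm.
    return sum(t * t for t in v)

def combo(lam, x, y):
    # The convex combination lam * x + (1 - lam) * y.
    return [lam * a + (1 - lam) * b for a, b in zip(x, y)]

def gap(lam, x, y, theta=1.0):
    # Right-hand side minus left-hand side of the uniform convexity
    # inequality for f(x) = |x|^2; uniform convexity requires gap >= 0.
    diff = [a - b for a, b in zip(x, y)]
    lhs = sq_norm(combo(lam, x, y)) + theta * lam * (1 - lam) * sq_norm(diff)
    rhs = lam * sq_norm(x) + (1 - lam) * sq_norm(y)
    return rhs - lhs

random.seed(0)
gaps = []
for _ in range(1000):
    x = [random.uniform(-5, 5) for _ in range(3)]
    y = [random.uniform(-5, 5) for _ in range(3)]
    gaps.append(gap(random.random(), x, y))

# With theta = 1 the gap is zero up to floating point error.
print(max(abs(t) for t in gaps))
```

Taking a smaller θ, for instance `gap(0.5, [1.0, 0.0], [0.0, 0.0], theta=0.5)`, gives a strictly positive gap.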

Example 2. Let ‖ · ‖ be any norm in Rn. Then, by the triangle

inequality

‖λx+ (1− λ)y‖ ≤ ‖λx‖+ ‖(1− λ)y‖ = λ‖x‖+ (1− λ)‖y‖,

for all 0 ≤ λ ≤ 1. Thus the mapping x ↦ ‖x‖ is convex.

Exercise 3. Show that the square of the Euclidean norm in Rd, $\|x\|^2 = \sum_k x_k^2$, is uniformly convex.

Proposition 4. Let A ⊂ Rn be a convex set and f : A→ R be a convex

function. If x and y are minimizers of f then so is λx+ (1− λ)y, for

any 0 ≤ λ ≤ 1. If f is uniformly convex then x = y.

Proof. If x and y are minimizers then f(x) = f(y) = min f .

Consequently, by convexity

f(λx+ (1− λ)y) ≤ λf(x) + (1− λ)f(y) = min f.

Therefore λx + (1 − λ)y is a minimizer of f. If f is uniformly convex,
then choosing 0 < λ < 1 we obtain

\[
f(\lambda x + (1-\lambda) y) + \theta \lambda (1-\lambda) |x-y|^2 \le \min f.
\]

Since f(λx + (1 − λ)y) ≥ min f, this forces |x − y|^2 ≤ 0, that is, x = y.


The characterization of minimizers through necessary or sufficient
conditions is usually made by introducing certain conditions that
involve first or second derivatives. Let f : Rn → R be a C2 function.
Recall that Df and D2f denote, respectively, the first and second
derivatives of f. We also use the notation A ≥ 0 for an n × n matrix A that
is positive semidefinite, and A > 0 if A is positive definite. The next
proposition is a well known result that illustrates this.

Proposition 5. Let f : Rn → R be a C2 function and x a minimizer

of f . Then

Df(x) = 0 and D2f(x) ≥ 0.

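Proof. For any vector y ∈ Rn and ε > 0 we have

\[
0 \le f(x + \varepsilon y) - f(x) = \varepsilon Df(x) y + O(\varepsilon^2).
\]

Dividing by ε and letting ε → 0, we obtain

\[
Df(x) y \ge 0.
\]

Since y is arbitrary (we may replace y by −y), we conclude that

\[
Df(x) = 0.
\]

In a similar way,

\[
0 \le \frac{f(x + \varepsilon y) + f(x - \varepsilon y) - 2 f(x)}{\varepsilon^2} = y^T D^2 f(x) y + o(1),
\]

and so, when ε → 0, we obtain

\[
y^T D^2 f(x) y \ge 0.
\]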
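The two difference quotients used in the proof can be evaluated numerically. The sketch below assumes a hypothetical quadratic test function with minimizer (1, −0.5) and approximates Df(x)y and y^T D^2 f(x) y along sixteen directions y; the function and the helper names `first_diff` and `second_diff` are illustrative choices, not part of the text.

```python
import math

def f(p):
    # Hypothetical test function with minimizer at (1, -0.5).
    u, v = p[0] - 1.0, p[1] + 0.5
    return u * u + u * v + 2.0 * v * v

def first_diff(f, x, y, eps=1e-4):
    # Central difference approximating the directional derivative Df(x) y.
    xp = [a + eps * b for a, b in zip(x, y)]
    xm = [a - eps * b for a, b in zip(x, y)]
    return (f(xp) - f(xm)) / (2 * eps)

def second_diff(f, x, y, eps=1e-4):
    # The symmetric quotient from the proof, approximating y^T D^2 f(x) y.
    xp = [a + eps * b for a, b in zip(x, y)]
    xm = [a - eps * b for a, b in zip(x, y)]
    return (f(xp) + f(xm) - 2 * f(x)) / eps ** 2

x_min = [1.0, -0.5]
angles = [k * math.pi / 8 for k in range(16)]
directions = [[math.cos(t), math.sin(t)] for t in angles]

first = [first_diff(f, x_min, y) for y in directions]
second = [second_diff(f, x_min, y) for y in directions]

print(max(abs(t) for t in first))  # close to 0: Df(x) = 0
print(min(second))                 # positive: D^2 f(x) >= 0 along all sampled directions
```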

Let f : Rn → R be a C1 function. A point x is called a critical

point of f if Df(x) = 0.

Exercise 4. Let A be any set and f : A → R be a C1 function in

the interior intA of A. Show that any maximizer or minimizer of f is

either a critical point or lies on the boundary ∂A of A.


We will now show that any critical point of a convex function is a

minimizer. For that we need the following preliminary result:

Proposition 6. Let f : Rn → R be a C1 convex function. Then, for

any x, y we have

f(y) ≥ f(x) +Df(x)(y − x).

Proof. We have

(1−λ)f(x)+λf(y) ≥ f(x+λ(y−x)) = f(x)+λDf(x)(y−x)+o(|λ(y−x)|).

Thus, reorganizing the inequality and dividing by λ we obtain

f(y) ≥ f(x) +Df(x)(y − x) + o(1),

as λ→ 0.

We can use now this result to prove:

Proposition 7. Let f : Rn → R be a C1 convex function and x a

critical point of f . Then x is a minimizer of f .

Proof. Since Df(x) = 0 and f is convex, it follows from proposi-

tion 6 that

f(y) ≥ f(x),

for all y.
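Propositions 6 and 7 can be illustrated on a one dimensional example. The sketch below assumes the convex function f(x) = cosh(x − 1), chosen only for illustration, whose unique critical point is x = 1; it checks the tangent line inequality on a grid and then the minimality of the critical point.

```python
import math

def f(x):
    return math.cosh(x - 1.0)   # convex, hypothetical example

def df(x):
    return math.sinh(x - 1.0)   # its derivative

grid = [-3.0 + 0.25 * k for k in range(33)]   # sample points in [-3, 5]

# Proposition 6: the graph lies above every tangent line
# (a small tolerance absorbs rounding errors).
tangent_ok = all(f(y) >= f(x) + df(x) * (y - x) - 1e-12
                 for x in grid for y in grid)

# Proposition 7: at the critical point x = 1 the tangent is horizontal,
# so f(y) >= f(1) for every sampled y.
critical_ok = df(1.0) == 0.0 and all(f(y) >= f(1.0) for y in grid)

print(tangent_ok, critical_ok)
```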

Exercise 5. Let f(x, λ) : Rn × Rm → R be a C2 function, and x0 a minimizer
of f(·, 0), with $D^2_{xx} f(x_0, 0)$ positive definite. Show that, for each
λ in a neighborhood of λ = 0, there exists a unique local minimizer $x_\lambda$ of f(·, λ) with $x_\lambda|_{\lambda=0} = x_0$. Compute $D_\lambda x_\lambda$ at λ = 0.

Growth conditions on f can be used to estimate the norm of a

minimizer. In finite dimensional problems, estimates on the norm of a

minimizer are important for numerical methods. For instance, if such

an estimate exists, it makes it possible to localize the search region for
a minimizer. In infinite dimensional problems this issue is even more
relevant, as will become clear later in these notes. An elementary result is

given in the next exercise:

Exercise 6. Let f : Rn → R be such that $f(x) \ge C_1 |x|^2 + C_2$, with $C_1 > 0$.
Let x0 be a minimizer of f. Show that

\[
|x_0| \le \sqrt{\frac{f(y) - C_2}{C_1}},
\]

for any y ∈ Rn.

Exercise 7. Let f(x, λ) : R2 → R be a continuous function. Suppose
for each λ there is at least one minimizer $x_\lambda$ of x ↦ f(x, λ). Suppose
there exists C such that $|x_\lambda| \le C$ for all λ in a neighborhood of λ = 0.
Suppose that for λ = 0 there exists a unique minimizer x0. Show that
$\lim_{\lambda \to 0} x_\lambda = x_0$.

Exercise 8. Let f ∈ C1(R2). Define $u(x) = \inf_{y \in \mathbb{R}} f(x, y)$. Suppose
that

\[
\lim_{|y| \to \infty} f(x, y) = +\infty,
\]

uniformly in x. Let x0 be a point in which the infimum in y of f is
achieved at a single point y0. Show that u is differentiable in x at x0
and that

\[
\frac{\partial u}{\partial x}(x_0) = \frac{\partial f}{\partial x}(x_0, y_0).
\]

Give an example that shows that u may fail to be differentiable if the
infimum of f in y is achieved at more than one point.

Exercise 9. Find all maxima and minima (both local and global) of

the function xy(1− x2 − y2) on the square −1 ≤ x, y ≤ 1.

2. Convexity

As we discussed in the previous section, convexity is a central prop-

erty in optimization. In this section we discuss additional properties of

convex functions which will be necessary in the sequel.

2.1. Characterization of convex functions. We now discuss

several tools that are useful to characterize convex functions. We first

observe that given a family of convex functions it is possible to build

another convex function by taking the pointwise supremum. This is a

useful construction and is illustrated in figure

ADD FIGURE HERE

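Proposition 8. Let I be an arbitrary set and $f_\iota : \mathbb{R}^n \to \mathbb{R}$, ι ∈ I, an
indexed collection of convex functions. Let

\[
f(x) = \sup_{\iota \in I} f_\iota(x).
\]

Then f is convex.

Proof. Let x, y ∈ Rn and 0 ≤ λ ≤ 1. Then

\[
\begin{aligned}
f(\lambda x + (1-\lambda) y) &= \sup_{\iota \in I} f_\iota(\lambda x + (1-\lambda) y) \le \sup_{\iota \in I} \left[ \lambda f_\iota(x) + (1-\lambda) f_\iota(y) \right] \\
&\le \sup_{\iota_1 \in I} \lambda f_{\iota_1}(x) + \sup_{\iota_2 \in I} (1-\lambda) f_{\iota_2}(y) \\
&= \lambda f(x) + (1-\lambda) f(y).
\end{aligned}
\]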

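Corollary 9. Suppose f : Rn → R is a C1 function satisfying

\[
f(y) \ge f(x) + Df(x)(y - x),
\]

for all x, y. Then f is convex.

Proof. It suffices to observe that

\[
f(y) \ge \sup_{x \in \mathbb{R}^n} \left[ f(x) + Df(x)(y - x) \right],
\]

and that the right-hand side, being a supremum of affine functions of y, is convex by proposition 8. Finally, taking x = y in the supremum, we observe that

\[
\sup_{x \in \mathbb{R}^n} \left[ f(x) + Df(x)(y - x) \right] \ge f(y),
\]

and so the equality follows.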

Proposition 10. Let f : Rn → R be a C2 function. Then f is convex

if and only if D2f(x) is positive semi-definite, for all x ∈ Rn.


Proof. Observe that if f is convex then for any y ∈ Rn and any
ε > 0 we have

\[
\frac{f(x - \varepsilon y) + f(x + \varepsilon y) - 2 f(x)}{\varepsilon^2} \ge 0.
\]

By sending ε → 0 and using Taylor's formula we conclude that

\[
y^T D^2 f(x) y \ge 0,
\]

and so D2f(x) is positive semi-definite.

Conversely,

\[
\begin{aligned}
f(y) - f(x) &= \int_0^1 Df(x + s(y - x))(y - x)\, ds \\
&= Df(x)(y - x) + \int_0^1 \left[ Df(x + s(y - x))(y - x) - Df(x)(y - x) \right] ds \\
&= Df(x)(y - x) + \int_0^1 \left[ \int_0^1 s\, (y - x)^T D^2 f(x + t s (y - x))(y - x)\, dt \right] ds \\
&\ge Df(x)(y - x),
\end{aligned}
\]

since $(y - x)^T D^2 f(x + t s (y - x))(y - x) \ge 0$, by the positive
semi-definiteness hypothesis. Convexity of f then follows from corollary 9.

Proposition 11. Let f : Rn → R be a continuous function. Then f is

convex if and only if

(3) f(x+ y) + f(x− y)− 2f(x) ≥ 0,

for any x, y ∈ Rn.

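Proof. Clearly convexity implies (3). Conversely, let x, y ∈ Rn and 0 ≤ λ ≤ 1, and set z = λx + (1 − λ)y. We must prove that

\[
\lambda f(x) + (1-\lambda) f(y) \ge f(z) \tag{4}
\]

holds. We claim that the previous inequality holds for any $\lambda = \frac{k}{2^j}$, with
$0 \le k \le 2^j$. Clearly (4) holds when j = 1. Now we proceed by
induction on j. Assume that (4) holds for $\lambda = \frac{k}{2^j}$. Then we claim that
it holds with $\lambda = \frac{k}{2^{j+1}}$. If k is even we can reduce the fraction, therefore
we may suppose that k is odd, $\lambda = \frac{k}{2^{j+1}}$ and λx + (1 − λ)y = z. Now
note that

\[
z = \frac{1}{2} \left[ \frac{k-1}{2^{j+1}} x + \left( 1 - \frac{k-1}{2^{j+1}} \right) y \right] + \frac{1}{2} \left[ \frac{k+1}{2^{j+1}} x + \left( 1 - \frac{k+1}{2^{j+1}} \right) y \right].
\]

Thus

\[
f(z) \le \frac{1}{2} f\left( \frac{k-1}{2^{j+1}} x + \left( 1 - \frac{k-1}{2^{j+1}} \right) y \right) + \frac{1}{2} f\left( \frac{k+1}{2^{j+1}} x + \left( 1 - \frac{k+1}{2^{j+1}} \right) y \right),
\]

but, since k − 1 and k + 1 are even, $k_0 = \frac{k-1}{2}$ and $k_1 = \frac{k+1}{2}$ are integers.
Hence

\[
f(z) \le \frac{1}{2} f\left( \frac{k_0}{2^j} x + \left( 1 - \frac{k_0}{2^j} \right) y \right) + \frac{1}{2} f\left( \frac{k_1}{2^j} x + \left( 1 - \frac{k_1}{2^j} \right) y \right).
\]

By the induction hypothesis, this implies that

\[
f(z) \le \frac{k_0 + k_1}{2^{j+1}} f(x) + \left( 1 - \frac{k_0 + k_1}{2^{j+1}} \right) f(y).
\]

Since $k_0 + k_1 = k$ we get

\[
f(z) \le \frac{k}{2^{j+1}} f(x) + \left( 1 - \frac{k}{2^{j+1}} \right) f(y).
\]

Since f is continuous and the rationals of the form $\frac{k}{2^j}$ are dense in [0, 1],
we conclude that

\[
f(z) \le \lambda f(x) + (1-\lambda) f(y),
\]

for any real 0 ≤ λ ≤ 1.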

Exercise 10. Let f : Rn → R be a C2 function. Show that the following
statements are equivalent:

1. f is uniformly convex;
2. $D^2 f \ge \gamma > 0$, for some γ > 0;
3. $f\left( \frac{x+y}{2} \right) + \theta \frac{|x-y|^2}{4} \le \frac{f(x) + f(y)}{2}$, for some θ > 0;
4. $f(y) \ge f(x) + Df(x)(y - x) + \frac{\gamma}{2} |x - y|^2$, for some γ > 0.

Exercise 11. Let ϕ : R → R be a non-decreasing convex function, and
ψ : Rn → R a convex function. Show that ϕ ∘ ψ is convex. Show by
giving an example that if ϕ is not non-decreasing then ϕ ∘ ψ may fail
to be convex.


2.2. Lipschitz continuity. Convex functions enjoy remarkable

properties. We will first show that any convex function is locally

bounded and Lipschitz.

Proposition 12. Let f : Rd → R be a convex function. Then f is

locally bounded and locally Lipschitz.

Proof. For x ∈ Rd denote $|x|_1 = \sum_k |x_k|$. Define $X_M = \{ x \in \mathbb{R}^d : |x|_1 \le M \}$. We will prove that f is bounded on $X_{M/8}$.

Any point $x \in X_M$ can be written as a convex combination of the
points $\pm M e_k$, where $e_k$ is the k-th standard unit vector. Thus

\[
f(x) \le \max_k \{ f(M e_k), f(-M e_k) \}.
\]

Suppose now f is not bounded below on $X_{M/8}$. Then there exists
a sequence $x_n \in X_{M/8}$ such that $f(x_n) \to -\infty$. Choose a point $y \in X_{M/4} \cap X^c_{M/8}$. Note that $2y - x_n \in X_M$. Therefore we can write $2y - x_n$ as a convex combination of the points $\pm M e_k$, i.e.

\[
y = \frac{1}{2} x_n + \frac{1}{2} \sum_k \sum_{\pm} \pm \lambda_k^{\pm} M e_k.
\]

Thus

\[
f(y) \le \frac{1}{2} f(x_n) + \frac{1}{2} \max_k \{ f(M e_k), f(-M e_k) \},
\]

which is a contradiction if $f(x_n) \to -\infty$.

Now we will show the second part of the proposition, i.e., that any

convex function is also locally Lipschitz. By contradiction, and changing
coordinates if necessary, we can assume that 0 is not a Lipschitz point,
that is, there exists a sequence $x_n \to 0$ such that

\[
|f(x_n) - f(0)| \ge C |x_n|,
\]

for all C and all n large enough (in what follows $|x_n|$ stands for $|x_n|_1$; all norms in Rd are equivalent). In particular this implies that

\[
\limsup_{n \to \infty} \frac{f(x_n) - f(0)}{|x_n|} \in \{ -\infty, +\infty \},
\]

and, similarly,

\[
\liminf_{n \to \infty} \frac{f(x_n) - f(0)}{|x_n|} \in \{ -\infty, +\infty \}.
\]

By the previous part of the proof, we can assume that f is bounded
on $X_1$. For each n choose a point $y_n$ with $|y_n|_1 = 1$ and such that
$x_n = |x_n|_1 y_n$. Then

\[
f(x_n) \le |x_n| f(y_n) + (1 - |x_n|) f(0),
\]

which implies

\[
f(y_n) \ge f(0) + \frac{f(x_n) - f(0)}{|x_n|}.
\]

Therefore

\[
\limsup_{n \to \infty} \frac{f(x_n) - f(0)}{|x_n|} = -\infty, \tag{5}
\]

otherwise we would have a contradiction (note that $f(y_n)$ is bounded).
We can also write $0 = \frac{1}{1 + |x_n|} x_n - \frac{|x_n|}{1 + |x_n|} y_n$. So

\[
f(0) \le \frac{1}{1 + |x_n|} f(x_n) + \frac{|x_n|}{1 + |x_n|} f(-y_n).
\]

This implies

\[
f(-y_n) \ge f(0) + \frac{f(0) - f(x_n)}{|x_n|}.
\]

Because $f(-y_n)$ is bounded,

\[
\limsup_{n \to \infty} \frac{f(0) - f(x_n)}{|x_n|} = -\infty,
\]

which is a contradiction to (5).

2.3. Separation. In this last subsection we study separation prop-

erties that arise from convexity and present some applications.

Proposition 13. Let C be a closed convex set not containing the ori-

gin. Then there exists x0 ∈ C which minimizes |x| over all x ∈ C.


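Proof. Consider a minimizing sequence $x_n$. By a simple computation,
we have the parallelogram identity

\[
\left\| \frac{x_n + x_m}{2} \right\|^2 + \frac{1}{4} \| x_n - x_m \|^2 = \frac{1}{2} \| x_n \|^2 + \frac{1}{2} \| x_m \|^2.
\]

Because $\frac{x_n + x_m}{2} \in C$, by convexity, we have the inequality

\[
\left\| \frac{x_n + x_m}{2} \right\|^2 \ge \inf_{y \in C} \| y \|^2.
\]

As n, m → ∞ we also have

\[
\| x_n \|^2, \ \| x_m \|^2 \to \inf_{y \in C} \| y \|^2.
\]

But then, as n, m → ∞, we conclude that

\[
\| x_n - x_m \|^2 \to 0.
\]

Therefore any minimizing sequence is a Cauchy sequence and hence
convergent. Since C is closed, the limit x0 belongs to C and, by continuity of the norm, it minimizes |x| over C.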
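For simple convex sets the point of minimal norm can be written down explicitly and compared with a brute-force search. The sketch below assumes the box [1, 2] × [−1, 3] (a hypothetical closed convex set not containing the origin), for which the minimizer is obtained by clamping the origin coordinatewise.

```python
import math

# Hypothetical closed convex set: the box [1, 2] x [-1, 3], which misses the origin.
LOW, HIGH = (1.0, -1.0), (2.0, 3.0)

def closest_to_origin(low, high):
    # For a box, the point of minimal norm is found by clamping 0
    # to each coordinate interval [lo, hi].
    return tuple(min(max(0.0, lo), hi) for lo, hi in zip(low, high))

x0 = closest_to_origin(LOW, HIGH)

# Brute-force comparison over a 41 x 41 grid of points of the box.
grid = [(LOW[0] + i * (HIGH[0] - LOW[0]) / 40,
         LOW[1] + j * (HIGH[1] - LOW[1]) / 40)
        for i in range(41) for j in range(41)]
best = min(grid, key=lambda p: math.hypot(*p))

print(x0, best)  # both equal (1.0, 0.0)
```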

Exercise 12. Let F : Rn → R be a uniformly convex function. Show
that any minimizing sequence for F is a Cauchy sequence. Hint:

\[
F(x_n) + F(x_m) - 2 \inf F \ge F(x_n) + F(x_m) - 2 F\left( \frac{x_n + x_m}{2} \right) \ge \frac{\theta}{2} |x_n - x_m|^2.
\]

Proposition 14. Let U and V be disjoint closed convex sets. Suppose
one of them is compact. Then there exist w ∈ Rn and a > 0 such that

\[
(w, x - y) \ge a > 0,
\]

for all x ∈ U and y ∈ V.

Proof. Consider the closed convex set W = U − V (this set is
closed because either U or V is compact). Then there exists a point
w ∈ W with minimal norm. Since 0 ∉ W, w ≠ 0. So, for all x ∈ U and y ∈ V and any 0 ≤ λ ≤ 1, by the convexity of W,

\[
\begin{aligned}
\|w\|^2 &\le \| \lambda (x - y) + (1-\lambda) w \|^2 \\
&= (1-\lambda)^2 \|w\|^2 + 2 \lambda (1-\lambda)(x - y, w) + \lambda^2 \|x - y\|^2.
\end{aligned}
\]

The last inequality implies

\[
0 \le ((1-\lambda)^2 - 1) \|w\|^2 + 2 \lambda (1-\lambda)(x - y, w) + \lambda^2 \|x - y\|^2.
\]

Dividing by λ and letting λ → 0 we obtain

\[
(x - y, w) \ge \|w\|^2 > 0.
\]
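The construction in the proof can be tried on a concrete pair of sets. The sketch below assumes two discs in the plane (hypothetical choices): U with centre (3, 0) and radius 1, and V the unit disc centred at the origin. Then W = U − V is the disc of centre (3, 0) and radius 2, whose element of minimal norm is w = (1, 0), and the inequality (x − y, w) ≥ ‖w‖² is checked on random samples.

```python
import math
import random

def sample_disc(cx, cy, r, rng):
    # A random point of the closed disc of centre (cx, cy) and radius r.
    t = rng.uniform(0.0, 2.0 * math.pi)
    s = r * rng.random()
    return (cx + s * math.cos(t), cy + s * math.sin(t))

rng = random.Random(1)
U = [sample_disc(3.0, 0.0, 1.0, rng) for _ in range(300)]  # compact convex set
V = [sample_disc(0.0, 0.0, 1.0, rng) for _ in range(300)]  # closed convex set

# W = U - V is the disc of centre (3, 0) and radius 2, so its element
# of minimal norm is w = (1, 0).
w = (1.0, 0.0)
w_sq = w[0] ** 2 + w[1] ** 2

# Smallest value of (w, x - y) over all sampled pairs.
margin = min(w[0] * (x[0] - y[0]) + w[1] * (x[1] - y[1]) for x in U for y in V)

print(margin >= w_sq)  # True: the hyperplane with normal w separates the samples
```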

As a first application to the separation result we discuss a general-

ization of derivatives for convex functions. The subdifferential ∂−f(x)

of a convex function f : Rn → R at a point x ∈ Rn is the set of vectors

p ∈ Rn such that

f(y) ≥ f(x) + p · (y − x),

for all y ∈ Rn.

Proposition 15. Let f : Rn → R be a convex function and x0 ∈ Rn.
Then ∂−f(x0) ≠ ∅.

Proof. Consider the set $E(f) = \{ (x, y) \in \mathbb{R}^{n+1} : y \ge f(x) \}$, the
epigraph of f. Then, because f is convex and hence continuous, E(f)
is a closed convex set. Consider the sequence $y_n = f(x_0) - \frac{1}{n}$. Because
for each n the sets E(f) and $\{ (x_0, y_n) \}$ are disjoint closed convex sets,
and the second one is compact, there is a separating plane

\[
f(x) \ge \alpha_n (x - x_0) + \beta_n, \tag{6}
\]

for all x, and

\[
f(x_0) - \frac{1}{n} = y_n \le \beta_n \le f(x_0). \tag{7}
\]

Thus, from (7) we get that $\beta_n$ is bounded. Since f is locally bounded,
the inequality (6) implies the boundedness of $\alpha_n$. Therefore, up to a
subsequence, there exist $\alpha = \lim \alpha_n$ and $\beta = \lim \beta_n$. Furthermore

\[
f(x) \ge \alpha (x - x_0) + \beta,
\]

and, again using (7), we get that f(x0) = β. Thus

\[
f(x) \ge \alpha (x - x_0) + f(x_0),
\]

and so α ∈ ∂−f(x0).
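The defining inequality of the subdifferential can be tested on sample points. The sketch below assumes the convex function f(x) = max(−x, 2x), a hypothetical example with a kink at 0, and scans candidate slopes p on a grid; checking finitely many points y is of course only a necessary test.

```python
def f(x):
    return max(-x, 2.0 * x)   # convex with a kink at 0 (hypothetical example)

def is_subgradient(p, x0, ys, tol=1e-12):
    # Tests f(y) >= f(x0) + p (y - x0) on the sample points ys only.
    return all(f(y) >= f(x0) + p * (y - x0) - tol for y in ys)

ys = [k / 10.0 for k in range(-50, 51)]         # sample points in [-5, 5]
slopes = [k / 4.0 for k in range(-12, 13)]      # candidate slopes in [-3, 3]
found = [p for p in slopes if is_subgradient(p, 0.0, ys)]

print(min(found), max(found))  # -1.0 2.0
```

The slopes that pass the test fill the interval between the left and right derivatives of f at 0, which is consistent with the subdifferential being a nonempty set rather than a single vector at a kink.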

Exercise 13. Let f : R→ R, be given by f(x) = |x|. Compute ∂−f .


Exercise 14. Let f : Rn → R be convex. Show that if f is differentiable
at x ∈ Rn then ∂−f(x) = {Df(x)}.

Proposition 16. Let f : Rn → R be a C1 convex function. Then

\[
(Df(x) - Df(y)) \cdot (x - y) \ge 0.
\]

Proof. Observe that

\[
f(y) \ge f(x) + Df(x) \cdot (y - x) \quad \text{and} \quad f(x) \ge f(y) + Df(y) \cdot (x - y).
\]

Add these two inequalities.

Exercise 15. Prove the analogue of the previous proposition for the
case in which f is not C1 by replacing derivatives by points in the
subdifferential.

Exercise 16. Let f be a uniformly convex function. Show that

(Df(x)−Df(y)) · (x− y) ≥ γ|x− y|2.

Exercise 17. Let f : Rn → R be a convex function. Show that a point

x ∈ Rn is a minimizer of f if and only if 0 ∈ ∂−f(x).

Exercise 18. Let A be a convex set and f : A → R be a uniformly

convex function. Let x ∈ A be a maximizer of f . Show that x is

an extreme point, that is, that there are no y, z ∈ A, x ≠ y, z and

0 < λ < 1 such that x = λy + (1− λ)z.

The second application of Proposition 14 is a very important result

called Farkas lemma:

Lemma 17 (Farkas Lemma). Let A be an m × n matrix and c a row vector
in Rn. Then we have one and only one of the following alternatives:

1. $c = y^T A$, for some y ≥ 0;
2. there exists a column vector w ∈ Rn such that Aw ≤ 0 and
cw > 0.

Proof. If the first alternative does not hold, the sets $U = \{ y^T A : y \ge 0 \}$ and V = {c} are disjoint and convex. Then the separation theorem
for convex sets (see proposition 14) implies that there exists a
hyperplane with normal w which separates them, that is

\[
y^T A w \le a \tag{8}
\]

for all y ≥ 0, and

\[
c w > a.
\]

Note that a ≥ 0 (by setting y = 0 in (8)), so cw > 0. Furthermore, for
any γ ≥ 0 we have

\[
\gamma y^T A w \le a,
\]

and by letting γ → +∞ we conclude that

\[
y^T A w \le 0
\]

for every y ≥ 0, that is, Aw ≤ 0.
So this corresponds to the second alternative.
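In the special case of an invertible square matrix, the two alternatives of the Farkas lemma can be decided by hand. The sketch below is restricted to invertible 2 × 2 matrices (an assumption made for simplicity, not the general setting of the lemma): one computes y^T = c A^{-1}; if y ≥ 0 the first alternative holds, and otherwise, if y_k < 0, the vector w = −A^{-1} e_k satisfies Aw = −e_k ≤ 0 and cw = −y_k > 0. The function name `farkas_square` is hypothetical.

```python
def farkas_square(A, c):
    # A: invertible 2x2 matrix given as a list of rows; c: row vector of length 2.
    # Returns ("y", y) with c = y^T A and y >= 0,
    # or ("w", w) with A w <= 0 and c w > 0.
    (a, b), (p, q) = A
    det = a * q - b * p
    inv = [[q / det, -b / det], [-p / det, a / det]]              # A^{-1}
    y = [c[0] * inv[0][k] + c[1] * inv[1][k] for k in range(2)]   # y^T = c A^{-1}
    if min(y) >= 0:
        return "y", y
    k = y.index(min(y))               # index of a negative component of y
    w = [-inv[0][k], -inv[1][k]]      # w = -A^{-1} e_k
    return "w", w

A = [[2.0, 1.0], [1.0, 3.0]]
print(farkas_square(A, [3.0, 4.0]))    # first alternative: y = (1, 1)
print(farkas_square(A, [1.0, -4.0]))   # second alternative: some w is returned
```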

Example 3. Consider a discrete state one-period pricing model, that
is, we are given n assets which at the initial time cost $c_i$, 1 ≤ i ≤ n, per
unit (we regard c as a row vector). After one unit of time, state j occurs
with probability $p_j$, 1 ≤ j ≤ m, and in that state asset i is worth $P_{ji}$. A portfolio is a (column)
vector π ∈ Rn. The value of the portfolio at time 0 is cπ and at time
one, with probability $p_j$, the value is $(P\pi)_j$. An arbitrage opportunity
is a portfolio such that cπ < 0 and $(P\pi)_j \ge 0$ for all j, i.e. a portfolio with
negative cost and non-negative return.

Farkas lemma yields that either

1. there exists y ∈ Rm, y ≥ 0, such that c = yP

or

2. there exists an arbitrage portfolio.

Furthermore, if one of the assets is a no-interest bearing bank
account, for instance $c_1 = 1$ and $P_{j1} = 1$ for all j, then $\sum_j y_j = c_1 = 1$, and so y is a probability vector,
which in general may be different from p.


3. Lagrange multipliers

Many important problems require minimizing (or maximizing) func-

tions under equality constraints. The Lagrange multiplier method is

the standard tool to study these problems. For inequality constraints,

the Lagrange multiplier method can be extended in a suitable way, as
will be studied in the following two sections.

Proposition 18. Let f : Rn → R and g : Rn → Rm (m < n) be C1

functions. Suppose c ∈ Rm fixed, and assume that the rank of Dg is m

at all points of the set g = c. Then, if x0 is a minimum of f in the set

g(x) = c, there exists λ ∈ Rm such that

Df(x0) = λTDg(x0).

Proof. Let x0 be as in the statement. Suppose that $w_1, \ldots, w_m$ are
vectors in Rn satisfying

\[
\det [Dg(x_0) W] \neq 0,
\]

where $W \equiv [w_1 \cdots w_m]$ is the matrix with columns $w_1, \ldots, w_m$. Note
that it is possible to choose such vectors because the rank of Dg is m.
Given v ∈ Rn consider the equation

\[
g(x_0 + \varepsilon v + W i) = c.
\]

The implicit function theorem implies that there exists a unique function
i(ε) : R → Rm,

\[
i(\varepsilon) = \begin{pmatrix} i_1(\varepsilon) \\ \vdots \\ i_m(\varepsilon) \end{pmatrix},
\]

defined in a neighborhood of ε = 0, with i(0) = 0, and such that

\[
g(x_0 + \varepsilon v + W i(\varepsilon)) = c.
\]

Additionally,

\[
i'(0) = -(Dg(x_0) W)^{-1} Dg(x_0) v.
\]

Since x0 is a minimizer of f in the set g(x) = c, the function

\[
I(\varepsilon) = f(x_0 + \varepsilon v + W i(\varepsilon))
\]

satisfies

\[
0 = I'(0) = Df(x_0) v + Df(x_0) W i'(0),
\]

that is,

\[
Df(x_0) v = \lambda^T Dg(x_0) v,
\]

with

\[
\lambda^T = Df(x_0) W (Dg(x_0) W)^{-1},
\]

for any vector v.
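The formula for λ obtained in the proof can be evaluated on a small example. The sketch below assumes f(x) = x1 + x2 and g(x) = x1² + x2² with c = 2 (hypothetical choices), for which the minimizer on the circle is x0 = (−1, −1); with m = 1 and W = w1 = (1, 0) the formula reduces to a quotient of two numbers.

```python
import math

def f(x):
    return x[0] + x[1]

def g(x):
    return x[0] ** 2 + x[1] ** 2

def Df(x):
    return (1.0, 1.0)

def Dg(x):
    return (2.0 * x[0], 2.0 * x[1])

def dot(u, v):
    return u[0] * v[0] + u[1] * v[1]

c = 2.0
x0 = (-1.0, -1.0)   # candidate minimizer of f on the set {g = c}

# Sanity check by sampling the constraint set, a circle of radius sqrt(c).
r = math.sqrt(c)
circle = [(r * math.cos(2 * math.pi * k / 3600), r * math.sin(2 * math.pi * k / 3600))
          for k in range(3600)]
lowest = min(f(p) for p in circle)

# Formula from the proof with m = 1 and W = w1 = (1, 0):
# lambda = Df(x0) W (Dg(x0) W)^{-1}.
w1 = (1.0, 0.0)
lam = dot(Df(x0), w1) / dot(Dg(x0), w1)

# Residual of the multiplier identity Df(x0) = lambda Dg(x0).
residual = [a - lam * b for a, b in zip(Df(x0), Dg(x0))]

print(lam, residual)  # -0.5 [0.0, 0.0]
```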

Proposition 19. Let f : Rn → R, and g : Rn → Rm, with m < n,

be smooth functions. Assume that Dg has maximal rank at all points.

Let $x_c$ be a minimizer of f(x) under the constraint g(x) = c, and $\lambda_c$ the corresponding Lagrange multiplier, i.e.

\[
Df(x_c) = \lambda_c Dg(x_c). \tag{9}
\]

Suppose that $x_c$ is a differentiable function of c. Define

V (c) = f(xc).

Then DcV (c) = λc.

Proof. We have

g(xc) = c.

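By differentiating with respect to c we obtain

\[
Dg(x_c) \frac{\partial x_c}{\partial c} = I.
\]

Multiplying by $\lambda_c$ and using (9) yields

\[
\lambda_c = \lambda_c Dg(x_c) \frac{\partial x_c}{\partial c} = Df(x_c) \frac{\partial x_c}{\partial c} = D_c V(c).
\]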

Exercise 19. Let f : Rn → R, and g : Rn → Rm, with m < n, be

smooth functions. Assume that Dg has maximal rank at all points. Let

x0 be a minimizer of f(x) under the constraint g(x) = g(x0), λ the

corresponding Lagrange multiplier, and F = f + λg. Show that

\[
D^2_{x_i x_j} F(x_0)\, \xi_i \xi_j \ge 0,
\]

for all vectors ξ that satisfy $D_{x_i} g(x_0)\, \xi_i = 0$.


Proposition 20. Let f : Rn → R, and g : Rn → Rm, with m < n.

Let x0 be a minimizer of f(x) under the constraint g(x) = g(x0). Then

there exist constants $\lambda_0, \ldots, \lambda_m$, not all zero, such that

\[
\lambda_0 Df + \lambda_1 Dg_1 + \cdots + \lambda_m Dg_m = 0
\]

at x0. Furthermore, if Dg has maximal rank we can choose $\lambda_0 = 1$.

Proof. First observe that the matrix

\[
\begin{bmatrix} Df \\ Dg \end{bmatrix}
\]

cannot have rank m + 1. Indeed, this follows by applying the implicit
function theorem to the function $(x, c) \mapsto (f(x) - c_0, g(x) - c')$ with
x ∈ Rn and $c = (c_0, c') \in \mathbb{R}^{m+1}$, to obtain a contradiction to x0 being
a minimizer.

This fact then implies that there exist constants $\lambda_0, \ldots, \lambda_m$, not all
zero, such that

\[
\lambda_0 Df + \lambda_1 Dg_1 + \cdots + \lambda_m Dg_m = 0
\]

at x0. Observe also that if Dg has maximal rank we can choose $\lambda_0 = 1$.
In fact, if $\lambda_0 \neq 0$, it suffices to multiply λ by $\frac{1}{\lambda_0}$. To see that $\lambda_0 \neq 0$ we
argue by contradiction. In fact, if $\lambda_0 = 0$ we would have

\[
\lambda_1 Dg_1 + \cdots + \lambda_m Dg_m = 0,
\]

which contradicts the hypothesis that Dg has maximal rank m.

Example 4 (Minimax principle). There exists a nice formal interpretation
of Lagrange multipliers, which although not rigorous is quite
useful. Fix c ∈ Rm, and consider the problem of minimizing a function
f : Rn → R under the constraint g(x) − c = 0, with g : Rn → Rm. This
problem can be rewritten as

\[
\min_x \max_\lambda \; f(x) + \lambda^T (g(x) - c).
\]

The minimax principle asserts that the maximum can be exchanged
with the minimum (which is frequently false) and, therefore, we obtain
the "equivalent" problem

\[
\max_\lambda \min_x \; f(x) + \lambda^T (g(x) - c).
\]

From this we deduce that, for each λ, the minimum $x_\lambda$ is determined
by

\[
Df(x_\lambda) + \lambda^T Dg(x_\lambda) = 0. \tag{10}
\]

Furthermore, the function to maximize in λ is

\[
f(x_\lambda) + \lambda^T (g(x_\lambda) - c).
\]

Differentiating this expression with respect to λ, assuming that $x_\lambda$ is
differentiable, and using (10), we obtain

\[
g(x_\lambda) = c.
\]

Exercise 20. Use the minimax principle to determine (formally) op-

timality conditions for the problem

min f(x)

under the constraint g(x) ≥ c.

The next exercise illustrates that the minimax principle may indeed be false, although in many problems it is an important heuristic.

Exercise 21. Show that the minimax principle is not valid in the fol-

lowing cases:

1. x + λ;

2. x^3 + λ(x^2 + 1);

3. 1/(1 + (x − λ)^2).

Exercise 22. Let A and B be arbitrary sets and F : A×B → R. Show

that

\[ \inf_{a \in A} \sup_{b \in B} F(a, b) \ge \sup_{b \in B} \inf_{a \in A} F(a, b). \]
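The inequality of Exercise 22 can be observed numerically on finite sets. The following sketch assumes the illustrative function F(a, b) = (a − b)² on a grid of [−1, 1] (a choice made here, not taken from the text); it shows a case where the inequality is strict, so that exchanging the infimum and the supremum changes the value.

```python
# Compare inf-sup and sup-inf of a function on finite sets A and B.

def inf_sup(F, A, B):
    return min(max(F(a, b) for b in B) for a in A)

def sup_inf(F, A, B):
    return max(min(F(a, b) for a in A) for b in B)

def F(a, b):
    # Hypothetical test function, chosen only for illustration.
    return (a - b) ** 2

grid = [k / 10.0 for k in range(-10, 11)]  # points of [-1, 1]
upper = inf_sup(F, grid, grid)
lower = sup_inf(F, grid, grid)
print(upper, lower)
```

Here the inner supremum over b is attained at an endpoint of the grid, whereas the inner infimum over a is always zero, which explains the gap between the two values.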


4. Linear programming

We now continue the study of constrained optimization problems by looking into the minimization of linear functions subject to linear inequality constraints, i.e., linear programming problems. A detailed discussion of this class of problems can be found, for instance, in [GSS08] or [Fra02].

4.1. The setting of linear programming. A model problem in linear programming is the following: given a row vector c ∈ Rn, a real m × n matrix A, and a column vector b ∈ Rm, we look for a column vector x ∈ Rn which is a solution to the problem:

\[ \begin{cases} \max_x\ cx \\ Ax \le b \\ x \ge 0, \end{cases} \tag{11} \]

where the notation v ≥ 0 for a vector v means that all components of v are non-negative. The set defined by the inequalities Ax ≤ b and x ≥ 0 may be empty, or in this set the function cx may be unbounded from above. To simplify the discussion, we assume that neither situation occurs.

Example 5. Add example here.

◀

Observe that if c ≠ 0 the maximizers of cx cannot be interior points of the feasible set, otherwise by exercise 4 they would be critical points. Therefore, the maximizers must lie on the boundary of the set Ax ≤ b, x ≥ 0. Unfortunately this boundary can be quite complex, as it consists of a finite (but frequently large) union of intersections of hyperplanes (of the form dx = e) with half-spaces (of the form dx ≤ e).

Exercise 23. Suppose that no row of A vanishes. Show that the boundary of the set Ax ≤ b consists of all points which satisfy Ax ≤ b with equality in at least one coordinate.

Note that the linear programming problem (11) is quite general

as it is possible to include equality constraints as inequalities: in fact

A′x = b′ is the conjunction of A′x ≤ b′ and −A′x ≤ −b′.

A vector x is called feasible for (11) if it satisfies the constraints,

that is Ax ≤ b and x ≥ 0.
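For problems in R2 with a bounded, non-empty feasible set, a maximizer can be found among the corner points of the feasible set, so a brute-force search over intersections of pairs of constraint boundaries suffices. The sketch below is only an illustration of this idea on hypothetical data (the function name and the numbers are assumptions); it is not an efficient method and does not replace the algorithms discussed in the references.

```python
from fractions import Fraction
from itertools import combinations

def solve_lp_2d(c, A, b):
    """Maximize c.x subject to A x <= b, x >= 0 for x in R^2 by testing
    every intersection of two constraint boundaries.
    Assumes the feasible set is bounded and non-empty."""
    rows = [(Fraction(r[0]), Fraction(r[1]), Fraction(bi)) for r, bi in zip(A, b)]
    # The sign constraints x >= 0 written as -x1 <= 0 and -x2 <= 0.
    rows += [(Fraction(-1), Fraction(0), Fraction(0)),
             (Fraction(0), Fraction(-1), Fraction(0))]
    best = None
    for (a1, a2, r1), (d1, d2, r2) in combinations(rows, 2):
        det = a1 * d2 - a2 * d1
        if det == 0:
            continue  # parallel boundaries do not define a corner
        # Cramer's rule for the intersection of the two boundary lines.
        x1 = (r1 * d2 - a2 * r2) / det
        x2 = (a1 * r2 - r1 * d1) / det
        if all(p * x1 + q * x2 <= r for p, q, r in rows):  # feasibility
            value = c[0] * x1 + c[1] * x2
            if best is None or value > best[0]:
                best = (value, (x1, x2))
    return best

# Hypothetical data: max 3 x1 + 2 x2, with x1 + x2 <= 4, x1 + 3 x2 <= 6, x1 <= 3.
value, point = solve_lp_2d([3, 2], [[1, 1], [1, 3], [1, 0]], [4, 6, 3])
print(value, point)
```

Exact rational arithmetic is used so that the feasibility test at a corner, where several constraints hold with equality, is not affected by rounding.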

Example 6 (Diet problem). An animal food factory would like to minimize the production cost of a pet food, while keeping it nutritionally balanced. Each food i costs ci per unit. Therefore, if each unit of pet food contains an amount xi of the food i, the total cost is

cx.

There is, of course, the obvious constraint that x ≥ 0. Suppose that Aij represents the amount of the nutrient i in the food j, and bi the minimum recommended amount of the nutrient i. Then, to ensure a nutritionally balanced diet we must have

Ax ≥ b.

Thus the diet problem is

\[ \begin{cases} \min\ cx \\ Ax \ge b \\ x \ge 0. \end{cases} \]

◀

Example 7 (Optimal Transport). A large multinational needs to transport its supply from each factory i to the distribution points j. The supply in i is si and the demand in j is dj. The cost of transporting one unit from i to j is cij. We would like to determine the quantity πij transported from i to j by solving the following optimization problem

\[ \min_\pi \sum_{ij} c_{ij} \pi_{ij}, \]

under the constraints πij ≥ 0, and the supply and demand bounds

\[ \sum_j \pi_{ij} \le s_i, \qquad \sum_i \pi_{ij} \ge d_j. \]

◀

Example 8. The existence of feasible vectors, i.e., vectors satisfying the constraint Ax ≤ b, is not obvious. There exists, however, a procedure that converts this question into a new linear programming problem. Let x0 be a new variable. We would like to solve

\[ \min\ x_0, \]

where the minimum is taken over all vectors (x0, x) which satisfy the constraints (Ax)j ≤ bj + x0, for all j. It is clear that the feasible set for this problem is non-empty: take, for instance, x = 0 and x0 = max_j |bj|. This new linear programming problem therefore has a value (which could be −∞ but not +∞). If the value is non-positive, there exist feasible vectors for the constraint Ax ≤ b. Otherwise, if the value is positive, the feasible set of the original problem is empty.

◀

Exercise 24. Let A be m × n matrix, with m > n. Consider the

overdetermined system

Ax = b

for b ∈ Rm. In general, this equation has no solution. We would like to

determine a vector x ∈ Rn which minimizes the maximum of the error

\[ \sup_i |(Ax)_i - b_i|. \]

Rewrite this problem as a linear programming problem. Compare this problem with the least squares method, which consists in solving

\[ \min_x \|Ax - b\|^2. \]


4.2. The dual problem. To problem (11), which we call primal, we associate another problem, called the dual, which consists in determining y ∈ Rm which solves

\[ \begin{cases} \min\ y^T b \\ y^T A \ge c \\ y \ge 0. \end{cases} \tag{12} \]

As the next exercise shows, the dual problem can be motivated by the

minimax principle:

Exercise 25. Show that (11) can be written as

\[ \max_{x \ge 0} \min_{y \ge 0}\ cx + y^T (b - Ax). \tag{13} \]

Suppose we can exchange the maximum with the minimum in (13).

Relate the resulting problem with (12).

Example 9 (Interpretation of the dual of the diet problem). The dual of the diet problem (example 6) is the following:

\[ \begin{cases} \max\ y^T b \\ y^T A \le c \\ y \ge 0. \end{cases} \]

This problem admits the following interpretation. A competing com-

pany is willing to provide a nutritionally balanced diet, charging for

each unit of the nutrient i a price yi. Obviously, the competing com-

pany would like to maximize its income. There are the following con-

straints: y ≥ 0, and furthermore if the food item j costs cj the com-

peting company should charge an amount (yTA)j no larger than cj.

This constraint is quite natural, since if it does not hold, at least part of the diet could be obtained more cheaply by buying directly the food items j such that (yTA)j > cj. ◀
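The relation between the diet problem and its dual can be checked on a small instance. In the sketch below all numbers are hypothetical (two foods, two nutrients); the code verifies that a given diet x and a given price vector y are feasible for their respective problems and compares the cost cx with the competitor's income yTb.

```python
# Hypothetical diet data: A[i][j] = amount of nutrient i in food j.
A = [[2, 1],
     [1, 3]]
b = [8, 9]   # minimum amounts of each nutrient
c = [3, 4]   # cost per unit of each food

def mat_vec(M, v):
    return [sum(m * vi for m, vi in zip(row, v)) for row in M]

def vec_mat(v, M):
    return [sum(v[i] * M[i][j] for i in range(len(M))) for j in range(len(M[0]))]

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

x = [3, 2]   # candidate diet
y = [1, 1]   # candidate nutrient prices

primal_feasible = all(xi >= 0 for xi in x) and \
    all(lhs >= rhs for lhs, rhs in zip(mat_vec(A, x), b))
dual_feasible = all(yi >= 0 for yi in y) and \
    all(lhs <= rhs for lhs, rhs in zip(vec_mat(y, A), c))

cost = dot(c, x)     # what the factory pays for the foods
income = dot(y, b)   # what the competitor earns for the nutrients
print(primal_feasible, dual_feasible, cost, income)
```

For any feasible pair the income cannot exceed the cost, since yTb ≤ yTAx ≤ cx. In this particular instance the two values coincide, which certifies that both candidates are optimal.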

Exercise 26. Show that the dual of the dual is equivalent to the primal.

Exercise 27. Determine the dual of the optimal transport problem and

give a possible interpretation.


The next theorem concerns the relation between the primal and dual problems:

Theorem 21.

1. Weak Duality: Suppose x and y are feasible, respectively, for

(11) and (12), then

cx ≤ yT b.

2. Optimality: Furthermore, if cx = yT b then x and y are solu-

tions of (11) and (12), respectively.

3. Strong duality: If (11) has a solution x∗, then (12) also has a solution y∗, and

cx∗ = (y∗)T b.

Finally, y∗j = 0 for all indices j such that (Ax∗)j < bj.

Proof. To prove the weak duality, observe that

cx ≤ (yTA)x = yT (Ax) ≤ yT b.

The optimality criterion follows from the previous inequality.

To prove the strong duality, we may assume that the inequality Ax ≤ b also includes x ≥ 0, for instance by replacing A with the augmented matrix

\[ \tilde A = \begin{bmatrix} A \\ -I \end{bmatrix} \]

and the vector b with

\[ \tilde b = \begin{bmatrix} b \\ 0 \end{bmatrix}. \]

In this case it is enough to prove that there exists a vector ỹ∗ ∈ R^{m+n} such that ỹ∗ ≥ 0,

\[ c = (\tilde y^*)^T \tilde A, \]

with ỹ∗j = 0 for all indices j such that (Ãx∗)j < b̃j. In fact, if such a vector ỹ∗ is given, we just set y∗ to be the first m coordinates of ỹ∗. Then c ≤ (y∗)TA, and

\[ c x^* = (\tilde y^*)^T \tilde A x^* = (\tilde y^*)^T \tilde b = (y^*)^T b, \]

since b̃ differs from b by adding n zero entries. From this point on we drop the ∼ to simplify the notation.

First we state the following auxiliary result, whose proof is a simple

corollary to Lemma 17:

Lemma 22. Let A be an m × n matrix, c a row vector in Rn and J an arbitrary set of rows of A. Then one and only one of the following alternatives holds:

1. c = yTA, for some y ≥ 0 with yj = 0 for all j ∉ J.

2. There exists a column vector w ∈ Rn such that (Aw)j ≤ 0 for all j ∈ J and cw > 0.

Exercise 28. Use Lemma 17 to prove Lemma 22.

Let x∗ be a solution of (11). Let J be the set of indices j for which (Ax∗)j = bj. We will show that there exists y ≥ 0 such that c = yTA and yj = 0 for j ∉ J. By contradiction, assume that no such y exists. By the previous lemma there is w such that cw > 0 and (Aw)j ≤ 0 for j ∈ J. But then x = x∗ + εw is feasible for ε > 0 sufficiently small: for j ∈ J we have (Ax)j = bj + ε(Aw)j ≤ bj, whereas for j ∉ J the strict inequality (Ax∗)j < bj persists for small ε. Hence

Ax = Ax∗ + εAw ≤ b.

However,

cx = c(x∗ + εw) > cx∗,

which contradicts the optimality of x∗.

Therefore, for some y ≥ 0,

cx∗ = yTAx∗ = yT b.

Consequently, by the second part of the theorem we conclude that y is

optimal.


Lemma 23. Let x and y be, respectively, feasible for the primal and

dual problems. Define

s = b− Ax ≥ 0, e = ATy − cT ≥ 0.

Then

sTy + xT e = bTy − xT cT ≥ 0.

Proof. Since x, y ≥ 0 and s, e ≥ 0 we have

sTy = bTy − xTATy ≥ 0,    xT e = xTATy − xT cT ≥ 0.

By adding these two expressions, we obtain

sTy + xT e = bTy − xT cT ≥ 0.

Theorem 24 (Complementarity). Suppose x and y are solutions of

(11) and (12), respectively. Then

sTy = 0 and xT e = 0.

Proof. We have sTy, xT e ≥ 0. If x and y are optimal then cx =

yT b. By the previous lemma

sTy + xT e = 0,

which implies the theorem.
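The complementarity relations can be verified directly on a small problem of the form (11). The data below are hypothetical and the optimal pair (x, y) is supplied by hand rather than computed; the code only evaluates the slack vectors s and e of Lemma 23 and the two products appearing in Theorem 24, using exact rational arithmetic.

```python
from fractions import Fraction as Fr

# Hypothetical primal problem: max x1 + x2 subject to
#   x1 + 2 x2 <= 4,  3 x1 + x2 <= 6,  x1 <= 3,  x >= 0.
A = [[1, 2], [3, 1], [1, 0]]
b = [4, 6, 3]
c = [1, 1]

x = [Fr(8, 5), Fr(6, 5)]            # candidate primal solution
y = [Fr(2, 5), Fr(1, 5), Fr(0)]     # candidate dual solution

# Slack vectors of Lemma 23: s = b - A x and e = A^T y - c^T.
s = [b[i] - sum(A[i][j] * x[j] for j in range(2)) for i in range(3)]
e = [sum(A[i][j] * y[i] for i in range(3)) - c[j] for j in range(2)]

s_dot_y = sum(si * yi for si, yi in zip(s, y))
x_dot_e = sum(xj * ej for xj, ej in zip(x, e))
primal_value = sum(cj * xj for cj, xj in zip(c, x))
dual_value = sum(yi * bi for yi, bi in zip(y, b))
print(s, e, s_dot_y, x_dot_e, primal_value, dual_value)
```

The third constraint is not active at x (its slack is positive), and accordingly the third component of y vanishes, as stated at the end of Theorem 21.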

Exercise 29. Study the following problem in R2:

max x1 + 2x2

with x1, x2 ≥ 0, x1 + x2 ≤ 1 and 2x1 + x2 ≤ 3/2. Determine the dual

problem, its solution and show that it has the same value as the primal

problem.

Exercise 30. Let x∗ be a solution of the problem

min cx

under the constraints Ax ≥ b, x ≥ 0 and let y∗ be a solution of the

dual. Use complementarity to show that x∗ minimizes

cx− (y∗)TAx

under the constraint x ≥ 0.

Exercise 31. Solve by elementary methods the problem

max x1 + x2

under the constraints 3x1 + 4x2 ≤ 12, 5x1 + 2x2 ≤ 10.

Exercise 32. Consider the problem

min −7x1 + 9x2 + 16x3,

under the constraints x ≥ 0, 2 ≤ x1 + 2x2 + 9x3 ≤ 7. Obtain an upper

and lower bound for the value of the minimum.

Exercise 33. Show that the solution set of a linear programming prob-

lem is a convex set.

Exercise 34. Consider a linear programming problem in Rn

min cεx

under the constraints Ax ≤ b, x ≥ 0. Suppose cε = c0 + εc1. Suppose

that for ε > 0 there exists a minimizer xε which converges to a point

x0, as ε → 0. Show that x0 is a minimizer of c0x under Ax ≤ b, x ≥ 0. Show, furthermore, that if this limit problem has more than one

minimizer then x0 minimizes c1x among all other minimizers.

5. Non-linear optimization with constraints

Let f : Rn → R and g : Rn → Rm be C1 functions. We consider

the following non-linear optimization problem:

\[ \begin{cases} \max_x\ f(x) \\ g(x) \le 0 \\ x \ge 0. \end{cases} \tag{14} \]


We denote the feasible set by X:

\[ X = \{ x \in \mathbb{R}^n \mid x \ge 0,\ g(x) \le 0 \}, \]

and the solution set by S:

\[ S = \{ x \in X : f(x) = \sup_{x \in X} f(x) \}. \]

In this section we derive necessary conditions, called the Karush-Kuhn-

Tucker (KKT) conditions, for a point to be a solution of the problem.

We start by explaining these conditions, which generalize both the Lagrange multipliers for equality constraints and the optimality conditions from linear programming. We then show that under convexity hypotheses these conditions are in fact sufficient. After that we show that, under a condition called constraint qualification, the KKT conditions are indeed necessary optimality conditions. We end the discussion with several conditions that allow us to check in practice the constraint qualification condition.

5.1. KKT conditions. For y ∈ Rm and µ ∈ Rn define the Lagrangian

\[ L(x, y, \mu) = f(x) - y^T g(x) + \mu^T x. \]

For (x, µ, y) ∈ Rn × Rn × Rm the KKT conditions are the following:

\[ \begin{cases} \dfrac{\partial L}{\partial x_i} = 0 \\ g(x) \le 0, \quad y^T g(x) = 0 \\ x \ge 0, \quad \mu^T x = 0 \\ \mu, y \ge 0. \end{cases} \tag{15} \]

The variables y and µ are called the Lagrange multipliers.

Several variations of the KKT conditions arise in different problems. For instance, in the case in which there are no positivity constraints for the variable x, the KKT conditions take the form: for (x, y) ∈ Rn × Rm, and L(x, y) = f(x) − yTg(x),

\[ \begin{cases} \dfrac{\partial L}{\partial x_i} = 0 \\ g(x) \le 0, \quad y^T g(x) = 0 \\ y \ge 0. \end{cases} \tag{16} \]

Exercise 35. Derive (16) from (15) by writing x = x+ − x− where

x+, x− ≥ 0.

Another example is that of equality constraints g(x) = 0, again without positivity constraints in the variable x. We can write the equality constraint as g(x) ≤ 0 and −g(x) ≤ 0. Let y± be the multipliers corresponding to ±g(x) ≤ 0, and define y = y+ − y−. Then (16) can be written as

\[ \frac{\partial f}{\partial x_i} = \sum_{j=1}^m y_j \frac{\partial g_j}{\partial x_i}, \qquad g(x) = 0, \]

that is, y is the Lagrange multiplier for the equality constraint g(x) = 0.

Consider a linear programming problem where in (14) we set

\[ f(x) = cx, \qquad g(x) = Ax - b. \]

Then the KKT conditions are

\[ \begin{cases} c - y^T A = -\mu^T \\ Ax \le b, \quad y^T (Ax - b) = 0 \\ x \ge 0, \quad \mu^T x = 0 \\ \mu, y \ge 0. \end{cases} \]

In this case, since µ ≥ 0, the first line of the KKT conditions implies

\[ c - y^T A \le 0, \]

that is, since y ≥ 0, y is admissible for the dual problem. Using the condition µTx = 0 we conclude that

\[ cx = y^T A x. \]

Then the second line of the KKT conditions yields yTAx = yT b, which implies

\[ cx = y^T b, \]

which is the optimality criterion for the linear programming problem, and shows that a solution of the KKT conditions is in fact a solution of (14). Furthermore, it also shows that y is a solution to the dual problem.

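Example 10. Let Q be an n × n real symmetric matrix. Consider the quadratic programming problem

\[ \begin{cases} \max_x\ \frac{1}{2} x^T Q x \\ Ax \le b \\ x \ge 0. \end{cases} \tag{17} \]

The KKT conditions are

\[ \begin{cases} x^T Q - y^T A = -\mu^T \\ Ax \le b, \quad y^T (Ax - b) = 0 \\ x \ge 0, \quad \mu^T x = 0 \\ \mu, y \ge 0. \end{cases} \tag{18} \]

◀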

5.2. Duality and sufficiency of KKT conditions. We can write

problem (14) in the following minimax form:

\[ \sup_{x \ge 0} \inf_{y \ge 0}\ f(x) - y^T g(x). \]

We define the dual problem as

\[ \inf_{y \ge 0} \sup_{x \ge 0}\ f(x) - y^T g(x). \tag{19} \]

Let

\[ h^*(y) = \sup_{x \ge 0}\ f(x) - y^T g(x), \]

and

\[ h_*(x) = \inf_{y \ge 0}\ f(x) - y^T g(x). \]

Then (14) is equivalent to

\[ \sup_{x \ge 0} h_*(x), \]

and (19) is equivalent to the problem

\[ \inf_{y \ge 0} h^*(y). \]

From exercise 22, we have the duality inequality

\[ \sup_{x \ge 0} h_*(x) = \sup_{x \ge 0} \inf_{y \ge 0}\ f(x) - y^T g(x) \le \inf_{y \ge 0} \sup_{x \ge 0}\ f(x) - y^T g(x) = \inf_{y \ge 0} h^*(y). \]

Furthermore, if x ≥ 0 and y ≥ 0 satisfy

\[ h_*(x) = h^*(y) \]

then x and y are, respectively, solutions to (14) and (19).

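If we choose

\[ f(x) = cx, \qquad g(x) = Ax - b, \]

(14) is a linear programming problem. Then

\[ h_*(x) = \begin{cases} cx & \text{if } Ax - b \le 0 \\ -\infty & \text{otherwise,} \end{cases} \]

and

\[ h^*(y) = \begin{cases} b^T y & \text{if } A^T y - c^T \ge 0 \\ +\infty & \text{otherwise.} \end{cases} \]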

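Consider the quadratic programming problem

\[ \begin{cases} \max\ \frac{1}{2} x^T Q x \\ Ax - b \le 0. \end{cases} \tag{20} \]

Note that here the variable x does not have any sign constraint. In this case we define

\[ h_*(x) = \inf_{y \ge 0}\ \frac{1}{2} x^T Q x - y^T (Ax - b) = \begin{cases} \frac{1}{2} x^T Q x & \text{if } Ax - b \le 0 \\ -\infty & \text{otherwise,} \end{cases} \]

and

\[ h^*(y) = \sup_x\ \frac{1}{2} x^T Q x - y^T (Ax - b). \]

If we assume that Q is non-singular and negative definite we have

\[ h^*(y) = -\frac{1}{2} y^T A Q^{-1} A^T y + y^T b. \]

It is easy to check directly that h_*(x) ≤ h^*(y).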
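The closed form for the dual function of the quadratic problem, and the inequality between the two functions, can be tested in the scalar case. The sketch below assumes the hypothetical data Q = −2, A = 1, b = 1 (so n = m = 1) and compares the formula with a brute-force supremum over a grid of values of x.

```python
Q, A, b = -2.0, 1.0, 1.0   # scalar data, purely illustrative

def h_lower(x):
    # Primal objective on the feasible set, -infinity outside of it.
    return 0.5 * Q * x * x if A * x - b <= 0 else float("-inf")

def h_upper_closed_form(y):
    # Scalar version of -1/2 y^T A Q^{-1} A^T y + y^T b.
    return -0.5 * y * A * (1.0 / Q) * A * y + y * b

def h_upper_grid(y, n=10000, radius=5.0):
    # Brute-force supremum over a uniform grid of [-radius, radius].
    xs = [radius * (2.0 * k / n - 1.0) for k in range(n + 1)]
    return max(0.5 * Q * x * x - y * (A * x - b) for x in xs)

ys = [0.0, 0.5, 1.0, 2.0]
gaps = [abs(h_upper_grid(y) - h_upper_closed_form(y)) for y in ys]
weak_duality = all(h_lower(x) <= h_upper_closed_form(y)
                   for x in [-2.0, -0.5, 0.0, 0.7, 1.0, 3.0] for y in ys)
print(gaps, weak_duality)
```

The grid search is only a sanity check: it is reliable here because the maximizer x = Q⁻¹ATy stays well inside the sampled interval for the chosen values of y.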

It turns out that the KKT conditions are in fact sufficient if f and

g satisfy additional convexity conditions.

Proposition 25. Suppose that −f and each component of g are convex. Let (x̄, µ, y) ∈ Rn × Rn × Rm be a solution of the KKT conditions (15). Then x̄ is a solution of (14).

Proof. Let x ∈ X. By the concavity of f we have

\[ f(x) - f(\bar x) \le Df(\bar x)(x - \bar x). \]

By the KKT conditions (15),

\[ Df(\bar x)(x - \bar x) = y^T Dg(\bar x)(x - \bar x) - \mu^T (x - \bar x). \]

Since each component of g is convex, and y ≥ 0,

\[ y^T Dg(\bar x)(x - \bar x) \le y^T (g(x) - g(\bar x)). \]

Since yTg(x̄) = 0, yTg(x) ≤ 0, µTx ≥ 0, and µT x̄ = 0, we have

\[ f(x) - f(\bar x) \le 0, \]

that is, x̄ is a solution.

As the next proposition shows, the KKT conditions imply strong

duality.

Proposition 26. Suppose that −f and each component of g is convex.

Let (x, µ, y) ∈ Rn×Rn×Rm be a solution of the KKT conditions (15).

Then

h∗(x) = h∗(y).


Proof. Observe that, by the previous proposition, any solution to

\[ Df(x) - y^T Dg(x) + \mu^T = 0, \]

with µ ≥ 0, µTx = 0, is a maximizer of the function

\[ f(x) - y^T g(x), \]

under the constraint x ≥ 0. Therefore

\[ h^*(y) = f(x) - y^T g(x) = f(x), \]

since yTg(x) = 0. Furthermore,

\[ h_*(x) = f(x) + \inf_{z \ge 0} -z^T g(x) = f(x), \]

because g(x) ≤ 0. Thus

\[ h_*(x) = h^*(y). \]

5.3. Constraint qualification and KKT conditions. Consider

the constraints

(21) g(x) ≤ 0, x ≥ 0.

Let X denote the admissible set for (21). For x ∈ X define the active coordinate indices as I(x) = {i : xi = 0}, and the active constraint indices as J(x) = {j : gj(x) = 0}. For x ∈ X define the tangent cone to the admissible set X at the point x as the set T(x) of vectors v ∈ Rn which satisfy

\[ v_i \ge 0, \qquad v \cdot Dg_j(x) \le 0, \]

for all i ∈ I(x) and all j ∈ J(x). We say that the constraints satisfy the constraint qualification condition if for any x ∈ X and any v ∈ T(x) there exists a C1 curve x(t) with x(0) = x and ẋ(0) = v, with x(t) ∈ X for all t ≥ 0 sufficiently small.

Proposition 27. Let x be a solution of (14), and assume that the

constraint qualification condition holds. Then there exists µ ∈ Rn and

y ∈ Rm such that (15) holds.


Proof. Fix v ∈ T(x) and let x(t) be a curve as in the constraint qualification condition. Because x is a maximizer,

\[ 0 \ge \frac{d}{dt} f(x(t)) \Big|_{t=0} = v \cdot Df(x). \tag{22} \]

From Farkas lemma (Lemma 17) we know that either there is v ∈ T(x) such that v · Df > 0 or else the vector −Df belongs to the positive cone generated by ei, i ∈ I, and −Dgj(x), j ∈ J. By (22) we know that the first alternative does not hold, hence there exist a vector µ ∈ Rn, with µi ≥ 0 for i ∈ I and µi = 0 for i ∈ I^c, and a vector y ∈ Rm, with yj ≥ 0 for j ∈ J and yj = 0 for j ∈ J^c, such that

\[ Df = y^T Dg - \mu^T. \]

By the construction of y and µ, as well as the definition of I and J, it is clear that µTx = 0 as well as yTg = 0.

To give an interpretation of the Lagrange multipliers in the KKT

conditions, consider the family of problems

\[ \begin{cases} \max_x\ f(x) \\ g^\theta(x) \le 0, \end{cases} \tag{23} \]

where θ ∈ Rm and

\[ g^\theta(x) = g(x) - \theta. \]

We will assume that for all θ the constraint qualification condition

holds. Furthermore, assume that there exists a unique solution xθ

which is a differentiable function of θ. Define the value function

V (θ) = f(xθ).

Let yθ ∈ Rm be the corresponding Lagrange multipliers, which we

assume to be also differentiable.

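We claim that for any θ0 ∈ Rm

\[ \frac{\partial V(\theta_0)}{\partial \theta_j} = y^{\theta_0}_j. \tag{24} \]

To prove this identity, observe first that we have, using the KKT conditions,

\[ \frac{\partial V(\theta)}{\partial \theta_j} = \sum_k \frac{\partial f(x^\theta)}{\partial x_k} \frac{\partial x^\theta_k}{\partial \theta_j} = \sum_{k,i} y^\theta_i \frac{\partial g^\theta_i(x^\theta)}{\partial x_k} \frac{\partial x^\theta_k}{\partial \theta_j}. \]

By differentiating the complementarity condition \sum_k y^\theta_k g^\theta_k(x^\theta) = 0 with respect to θj we obtain

\[ 0 = \sum_k \left[ \frac{\partial y^\theta_k}{\partial \theta_j} g^\theta_k(x^\theta) + y^\theta_k \sum_i \frac{\partial g^\theta_k(x^\theta)}{\partial x_i} \frac{\partial x^\theta_i}{\partial \theta_j} \right] - y^\theta_j. \tag{25} \]

For θ = θ0 we either have g^{θ0}_k(x^{θ0}) = 0 or g^{θ0}_k(x^{θ0}) < 0, in which case y^θ_k vanishes in a neighborhood of θ0. Consequently, in this last case we have ∂y^{θ0}_k/∂θj = 0. Therefore, at θ = θ0,

\[ \frac{\partial y^\theta_k}{\partial \theta_j} g^\theta_k(x^\theta) = 0. \]

So, from (25), we conclude that

\[ y^{\theta_0}_j = \sum_{k,i} y^{\theta_0}_k \frac{\partial g^{\theta_0}_k(x^{\theta_0})}{\partial x_i} \frac{\partial x^{\theta_0}_i}{\partial \theta_j}. \]

Comparing this with the expression for ∂V/∂θj obtained before, we obtain (24).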
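Identity (24) says that the multiplier measures the sensitivity of the optimal value to a relaxation of the constraint. This can be checked on a scalar problem of the form (23) whose solution is known explicitly. The sketch below assumes the hypothetical data f(x) = −(x − 2)² and g(x) = x, so that the constraint reads x ≤ θ; the solution formulas in the code come from solving this particular problem by hand.

```python
def solve(theta):
    # Maximizer of -(x - 2)**2 under x <= theta, together with its multiplier.
    x = min(theta, 2.0)
    # KKT conditions (16): Df(x) = y * Dg(x), with Dg = 1.
    y = -2.0 * (x - 2.0)
    return x, y

def value(theta):
    x, _ = solve(theta)
    return -(x - 2.0) ** 2

theta0 = 0.5
h = 1e-6
# Central finite difference approximation of dV/dtheta at theta0.
dV = (value(theta0 + h) - value(theta0 - h)) / (2.0 * h)
y0 = solve(theta0)[1]
print(dV, y0)
```

For θ > 2 the constraint is inactive, the multiplier vanishes, and the value function is constant, which is consistent with (24) as well.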

5.4. Checking the constraint qualification conditions. Con-

sider the following optimization problem

\[ \begin{cases} \max_x\ x_1 \\ -(1 - x_1)^3 + x_2 \le 0 \\ x \ge 0. \end{cases} \tag{26} \]

The Lagrangian is

\[ L(x, y, \mu) = x_1 - y\,(x_2 - (1 - x_1)^3) + \mu_1 x_1 + \mu_2 x_2 \]

and so

\[ \frac{\partial L(x, y, \mu)}{\partial x_1} = 1 - 3(1 - x_1)^2 y + \mu_1. \]

In particular, when x1 = 1, the equation

\[ 1 + \mu_1 = 0 \]

does not have a solution with µ1 ≥ 0. Hence the KKT conditions are not satisfied. Nevertheless the point (x1, x2) = (1, 0) is a solution.
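The failure of the KKT conditions in problem (26) can also be seen numerically. The sketch below evaluates the derivative of the Lagrangian with respect to x1 at the point (1, 0) for a few sample values of the multipliers (the sample values are arbitrary), and checks that points with x1 slightly larger than 1 violate the constraint.

```python
def g(x1, x2):
    # Constraint function of (26): g(x) = x2 - (1 - x1)**3 <= 0.
    return x2 - (1.0 - x1) ** 3

def dL_dx1(x1, y, mu1):
    # Partial derivative of the Lagrangian with respect to x1.
    return 1.0 - 3.0 * (1.0 - x1) ** 2 * y + mu1

multipliers = [0.0, 0.1, 1.0, 10.0, 1000.0]
residuals = [dL_dx1(1.0, y, mu1) for y in multipliers for mu1 in multipliers]
smallest = min(residuals)

feasible_at_solution = g(1.0, 0.0) <= 0
feasible_beyond = g(1.01, 0.0) <= 0
print(smallest, feasible_at_solution, feasible_beyond)
```

At x1 = 1 the gradient of the constraint vanishes in the x1 direction, so the multiplier y has no influence on the stationarity equation; this is the degeneracy that the constraint qualification conditions rule out.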

This example illustrates the need for obtaining simple criteria to

check whether the constraint qualification conditions hold. We will

show that the following are sufficient conditions for the verification of

the constraint qualifications.

1. The Mangasarian-Fromowitz condition: for any x ∈ X there is v such that

\[ \nabla g_i(x) v < 0 \]

for every active constraint i;

2. The Cotte-Dragominescu condition: for any x ∈ X the gradients of the active constraints are positively linearly independent:

\[ \sum_i y_i \nabla g_i(x) = 0,\ y \ge 0 \quad \text{implies} \quad y = 0; \]

3. The Arrow-Hurwicz and Uzawa condition: for any x ∈ X the gradients of the active constraints are linearly independent.

It is obvious that 3. implies 2. We will show that 1. is equivalent to 2. To do so we need the following result:

Proposition 28 (Gordon alternative). Let A be a real-valued m × n matrix. Then one and only one of the following holds:

• There exists x ∈ Rn such that Ax < 0;

• There exists y ∈ Rm, y ≥ 0, and y ≠ 0, such that yTA = 0.

Proof. (i) It is clear that the two conditions are disjoint. Other-

wise, if Ax < 0 and yTA = 0 we would have 0 = yTAx < 0 which is a

contradiction.

(ii) We consider the following optimization problem:

\[ \begin{cases} \max_y\ y_1 + \cdots + y_m \\ y^T A = 0 \\ y \ge 0. \end{cases} \tag{27} \]

It is clear that if the second alternative holds then the value of this problem is +∞. Otherwise, y = 0 is a solution and the value is 0. In this case the dual problem

\[ \begin{cases} \min_x\ 0 \\ (Ax)_i \le -1, \quad i = 1, \dots, m \end{cases} \tag{28} \]

has a solution, i.e., there is a point x satisfying the constraints. Hence, the first alternative holds.

Proposition 29. The Cotte-Dragominescu condition is equivalent to

the Mangasarian-Fromowitz condition.

Proof. Set A = ∇g. The Mangasarian-Fromowitz condition corresponds to the first case in the Gordon alternative. Therefore, the only solution of Σi yi∇gi = 0 with y ≥ 0 is y = 0. Thus the Cotte-Dragominescu condition is satisfied. Conversely, if the only solution to Σi yi∇gi = 0 with y ≥ 0 is y = 0, the second case of the Gordon alternative does not hold. Then the first alternative holds and so the Mangasarian-Fromowitz condition is satisfied.

Theorem 30. If the Mangasarian-Fromowitz condition holds then the

constraint qualification condition is satisfied.

Proof. Let x0 ∈ X. Take w such that ∇gi(x0)w ≤ 0. We must construct a curve x(ε) in such a way that x(ε) ∈ X for ε sufficiently small and such that ẋ(0) = w. Let v be a vector as in the Mangasarian-Fromowitz condition. Take M sufficiently large and define

\[ x(\varepsilon) = x_0 + \varepsilon w + M \varepsilon^2 v. \]

Then, using Taylor's formula, we have

\[ g_i(x(\varepsilon)) = g_i(x_0) + \varepsilon \nabla g_i(x_0) w + M \varepsilon^2 \nabla g_i(x_0) v + \frac{\varepsilon^2}{2} w^T D^2 g_i(x_0) w + O(\varepsilon^3). \]

Thus, if M is large enough and ε > 0 is sufficiently small, gi(x(ε)) < 0.

Theorem 31. If either the Cotte-Dragominescu condition or the Arrow-

Hurwicz and Uzawa condition hold then so does the constraint qualifi-

cation condition.


6. Bibliographical notes

Concerning linear programming problems, we have used the books [GSS08] and [Fra02]...

2

Calculus of variations in one independent variable

This chapter is dedicated to a classical subject in the calculus of

variations: variational problems with one independent variable. These

are extremely important because of their applications to classical mechanics and Riemannian geometry. Furthermore, they serve as a model

for optimal control problems and problems with multiple integrals. We

start in Section 1, by deriving the Euler-Lagrange equation and give

some elementary applications. Then, in section 2 we study additional

necessary conditions for minimizers, and in section 3 we discuss several

applications to Riemannian geometry and classical mechanics.

An introduction to the Hamiltonian formalism is discussed in sec-

tion 4. The next issue, section 5, is the study of sufficient conditions

for a trajectory to be a minimizer: first we establish the existence of

local minimizers, then we study the connections between smooth solu-

tions of Hamilton-Jacobi equations and global minimizers, and finally

we discuss the Jacobi equation, conjugate points and curvature.

Symmetries are an important topic in calculus of variations. In

section 6 we present Routh’s method for integration of Lagrangian

systems and Noether’s theorem.

Of course, not every solution to the Euler-Lagrange equation is a

minimizer. Section 7 is a brief introduction to minimax methods and to

the mountain pass theorem. We also consider several examples of non-

existence of minimizing orbits (Lavrentiev phenomenon) and relaxation

methods (Young measures) in section 9.

Invariant measures for Lagrangian and Hamiltonian systems are

considered in section 8.

The next part of this chapter is dedicated to the study of the geometry of Hamiltonian systems: symplectic and Poisson structures, Darboux theorem and Arnold-Liouville integrability (section 10).

In the last section, section 11, we consider perturbation problems and describe the Lindstedt series perturbation procedure.

We end the chapter with bibliographical notes.

1. Euler-Lagrange Equations

In classical mechanics, the trajectories x : [0, T ] → Rn of a me-

chanical system are determined by a variational principle called the

minimal action principle. This principle asserts that the trajectories

are minimizers (or at least critical points) of an integral functional. In

this section we study this problem and discuss several examples.

Consider a mechanical system on Rn with kinetic energy K(x, v) and potential energy U(x, v). We define the Lagrangian L(x, v) : Rn × Rn → R to be the difference between the kinetic energy K and the potential energy U of the system, that is, L = K − U. The variational formulation of classical mechanics asserts that trajectories of this mechanical system minimize (or are at least critical points of) the action functional

\[ S[x] = \int_0^T L(x(t), \dot x(t))\, dt, \]

under fixed boundary conditions. More precisely, a C1 trajectory x : [0, T] → Rn is a minimizer of S under fixed boundary conditions if for any C1 trajectory y : [0, T] → Rn such that x(0) = y(0) and x(T) = y(T)

we have

S[x] ≤ S[y].

In particular, for any C1 function ϕ : [0, T ]→ Rn with compact support

in (0, T ), and any ε ∈ R we have

i(ε) = S[x + εϕ] ≥ S[x] = i(0).

Thus i(ε) has a minimum at ε = 0. So, if i is differentiable, i′(0) = 0. A

trajectory x is a critical point of S, if for any C1 function ϕ : [0, T ]→ Rn

with compact support in (0, T ) we have

\[ i'(0) = \frac{d}{d\varepsilon} S[x + \varepsilon \varphi] \Big|_{\varepsilon = 0} = 0. \]

The critical points of the action which are of class C2 are solutions

to an ordinary differential equation, the Euler-Lagrange equation, that

we derive in what follows. Any minimizer of the action functional

satisfies further necessary conditions which will be discussed in section

2.

Theorem 32 (Euler-Lagrange equation). Let L(x, v) : Rn × Rn → R be a C2 function. Suppose that x : [0, T] → Rn is a C2 critical point of the action S under fixed boundary conditions x(0) and x(T). Then

\[ \frac{d}{dt} D_v L(x, \dot x) - D_x L(x, \dot x) = 0. \tag{29} \]

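Proof. Let x be as in the statement. Then for any C1 function ϕ : [0, T] → Rn with compact support in (0, T), the function

\[ i(\varepsilon) = S[x + \varepsilon \varphi] \]

has a critical point at ε = 0. Thus

\[ i'(0) = 0, \]

that is,

\[ \int_0^T D_x L(x, \dot x) \varphi + D_v L(x, \dot x) \dot\varphi \, dt = 0. \]

Integrating by parts, and using the fact that ϕ vanishes at t = 0 and t = T, we conclude that

\[ \int_0^T \left[ \frac{d}{dt} D_v L(x, \dot x) - D_x L(x, \dot x) \right] \varphi \, dt = 0, \]

for all ϕ : [0, T] → Rn with compact support in (0, T). Since ϕ is arbitrary, this implies (29) and ends the proof of the theorem.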
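The vanishing of the first variation along solutions of (29) can be observed numerically. The sketch below assumes the hypothetical Lagrangian L(x, v) = v²/2 − x²/2 in one dimension, for which x(t) = sin t solves the Euler-Lagrange equation, and a test function ϕ(t) = t²(1 − t)² vanishing at the endpoints of [0, 1]. The action is approximated by the midpoint rule and i′(0) by a central difference.

```python
import math

def action(x, xdot, T=1.0, n=2000):
    # Midpoint rule for S[x] = int_0^T ( xdot(t)^2 / 2 - x(t)^2 / 2 ) dt.
    h = T / n
    total = 0.0
    for k in range(n):
        t = (k + 0.5) * h
        total += (0.5 * xdot(t) ** 2 - 0.5 * x(t) ** 2) * h
    return total

def first_variation(x, xdot, phi, phidot, eps=1e-3):
    # Central difference approximation of i'(0), with i(eps) = S[x + eps * phi].
    plus = action(lambda t: x(t) + eps * phi(t),
                  lambda t: xdot(t) + eps * phidot(t))
    minus = action(lambda t: x(t) - eps * phi(t),
                   lambda t: xdot(t) - eps * phidot(t))
    return (plus - minus) / (2.0 * eps)

def phi(t):
    return t ** 2 * (1.0 - t) ** 2

def phidot(t):
    return 2.0 * t * (1.0 - t) ** 2 - 2.0 * t ** 2 * (1.0 - t)

# x(t) = sin t satisfies the Euler-Lagrange equation of this Lagrangian.
on_solution = first_variation(math.sin, math.cos, phi, phidot)
# The straight line with the same endpoint values does not.
slope = math.sin(1.0)
on_line = first_variation(lambda t: slope * t, lambda t: slope, phi, phidot)
print(on_solution, on_line)
```

The first value is zero up to discretization error, while the second is visibly different from zero, so the straight line is not a critical point of this action even though it satisfies the same boundary conditions.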

Example 11. In classical mechanics, the kinetic energy K of a particle with mass m and trajectory x(t) is

\[ K = m \frac{|\dot x|^2}{2}. \]

Suppose that the potential energy U(x) depends only on the position x. Assume also that U is smooth. Then the Lagrangian for this mechanical system is

\[ L = K - U, \]

and the corresponding Euler-Lagrange equation is

\[ m \ddot x = -U'(x), \]

which is Newton's law. ◀

Exercise 36. Let P ∈ Rn, and consider the Lagrangian L(x, v) : Rn × Rn → R defined by L(x, v) = g(x)|v|^2 + P · v − U(x), where g and U are

C2 functions. Determine the Euler-Lagrange equation and show that it

does not depend on P .

Exercise 37. Suppose we form a surface of revolution by connecting a point (x0, y0) with a point (x1, y1) by a curve (x, y(x)), x ∈ [x0, x1], and then revolving it around the y axis. The area of this surface is, up to the constant factor 2π,

\[ \int_{x_0}^{x_1} x \sqrt{1 + \dot y^2}\, dx. \]

Compute the Euler-Lagrange equation and study its solutions.

To understand the behavior of the Euler-Lagrange equation it is

sometimes useful to change coordinates. The following proposition

shows how this is achieved:

Proposition 33. Let x : [0, T] → Rn be a critical point of the action

\[ \int_0^T L(x, \dot x)\, dt. \]

Let g : Rn → Rn be a C2 diffeomorphism and L̃ given by

\[ \tilde L(y, w) = L(g(y), Dg(y) w). \]

Then y = g^{-1} ∘ x is a critical point of

\[ \int_0^T \tilde L(y, \dot y)\, dt. \]

Proof. This is a simple computation and is left as an exercise to

the reader.

Before proceeding, we will discuss some applications of variational

methods to classical mechanics. As mentioned before, the trajectories

of a mechanical system with kinetic energy K and potential energy

U are critical points of the action corresponding to the Lagrangian

L = K−U . In the following examples we use this variational principle

to study the motion of a particle in a central field, and the planar two

body problem.

Example 12 (Central field motion). Consider the Lagrangian of a particle in the plane subjected to a radial potential field:

\[ L(x, y, \dot x, \dot y) = \frac{\dot x^2 + \dot y^2}{2} - U\big(\sqrt{x^2 + y^2}\big). \]

Consider polar coordinates (r, θ), that is, (x, y) = (r cos θ, r sin θ) = g(r, θ). We can change coordinates (see proposition 33) and obtain the Lagrangian in these new coordinates:

\[ \tilde L(r, \theta, \dot r, \dot\theta) = \frac{r^2 \dot\theta^2 + \dot r^2}{2} - U(r). \]

Then the Euler-Lagrange equations can be written as

\[ \frac{d}{dt} \big( r^2 \dot\theta \big) = 0, \qquad \frac{d}{dt} \dot r = -U'(r) + r \dot\theta^2. \]

The first equation implies that r^2 θ̇ ≡ η is conserved. Therefore, r θ̇^2 = η^2/r^3. Multiplying the second equation by ṙ we get

\[ \frac{d}{dt} \left[ \frac{\dot r^2}{2} + U(r) + \frac{\eta^2}{2 r^2} \right] = 0. \]

Consequently

\[ E_\eta = \frac{\dot r^2}{2} + U(r) + \frac{\eta^2}{2 r^2} \]

is a conserved quantity. Thus, we can solve for ṙ as a function of r (given the values of the conserved quantities Eη and η) and so obtain a first-order differential equation for the trajectories. ◀
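The two conserved quantities of the central field motion can be monitored along a numerically integrated trajectory. The sketch below assumes the hypothetical potential U(r) = −1/r and integrates the equations of motion in Cartesian coordinates with a classical fourth-order Runge-Kutta scheme (the integrator, the step size and the initial data are choices made here for illustration only).

```python
import math

def deriv(state):
    # Equations of motion for U(r) = -1/r: acceleration = -(x, y) / r^3.
    x, y, vx, vy = state
    r3 = (x * x + y * y) ** 1.5
    return (vx, vy, -x / r3, -y / r3)

def rk4_step(state, dt):
    def shifted(s, k, h):
        return tuple(si + h * ki for si, ki in zip(s, k))
    k1 = deriv(state)
    k2 = deriv(shifted(state, k1, dt / 2))
    k3 = deriv(shifted(state, k2, dt / 2))
    k4 = deriv(shifted(state, k3, dt))
    return tuple(s + dt / 6 * (a + 2 * b + 2 * c + d)
                 for s, a, b, c, d in zip(state, k1, k2, k3, k4))

def invariants(state):
    x, y, vx, vy = state
    r = math.hypot(x, y)
    r_dot = (x * vx + y * vy) / r
    eta = x * vy - y * vx  # equals r^2 * theta_dot in polar coordinates
    energy = 0.5 * r_dot ** 2 - 1.0 / r + eta ** 2 / (2 * r ** 2)
    return eta, energy

state = (1.0, 0.0, 0.0, 1.2)  # initial position and velocity
eta0, energy0 = invariants(state)
for _ in range(5000):
    state = rk4_step(state, 1e-3)
eta1, energy1 = invariants(state)
print(eta1 - eta0, energy1 - energy0)
```

Both differences stay at the level of the integration error, in agreement with the conservation of η and Eη derived in the example.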

Example 13 (Planar two-body problem). Consider now the problem of two point bodies in the plane, with trajectories (x1, y1) and (x2, y2). Suppose that the interaction potential energy U depends only on the distance \sqrt{(x_1 - x_2)^2 + (y_1 - y_2)^2} between them. We will show how to reduce this problem to the one of a single body under a radial field.

The Lagrangian of this system is

\[ L = m_1 \frac{\dot x_1^2 + \dot y_1^2}{2} + m_2 \frac{\dot x_2^2 + \dot y_2^2}{2} - U\big(\sqrt{(x_1 - x_2)^2 + (y_1 - y_2)^2}\big). \]

Consider new coordinates (X, Y, x, y), where (X, Y) is the center of mass

\[ X = \frac{m_1 x_1 + m_2 x_2}{m_1 + m_2}, \qquad Y = \frac{m_1 y_1 + m_2 y_2}{m_1 + m_2}, \]

and (x, y) the relative position of the two bodies

\[ x = x_1 - x_2, \qquad y = y_1 - y_2. \]

In these new coordinates the Lagrangian, using proposition 33, is

\[ \tilde L = \tilde L_1(\dot X, \dot Y) + \tilde L_2(x, y, \dot x, \dot y). \]

Therefore, the equations for the variables X and Y are decoupled from the ones for x, y. Elementary computations show that

\[ \frac{d^2}{dt^2} X = \frac{d^2}{dt^2} Y = 0. \]

Thus X(t) = X0 + VX t and Y(t) = Y0 + VY t, for suitable constants X0, Y0, VX and VY.

Since

\[ \tilde L_2 = \frac{m_1 m_2}{m_1 + m_2} \frac{\dot x^2 + \dot y^2}{2} - U\big(\sqrt{x^2 + y^2}\big), \]

the problem is now reduced to the previous example. ◀


Exercise 38 (Two body problem). Consider a system of two point

bodies in R3 with masses m1 and m2, whose relative location is given

by the vector r ∈ R3. Assume that the interaction depends only on

the distance between the bodies. Show that by choosing appropriate coordinates, the motion can be reduced to the one of a single point particle with mass M = m1m2/(m1 + m2) under a radial potential. Show, by proving that r × ṙ is conserved, that the orbit of a particle under a radial field lies in a fixed plane for all times.

Exercise 39. Let x : [0, T ] → Rn be a solution to the Euler-Lagrange

equation associated to a C2 Lagrangian L : Rn × Rn → R. Show that

\[ E(t) = -L(x, \dot x) + \dot x \cdot D_v L(x, \dot x) \]

is constant in time. For mechanical systems this is simply the conservation of energy. Occasionally, the identity d/dt E(t) = 0 is also called the Beltrami identity.

Exercise 40. Consider a system of n point bodies of mass mi, and positions ri ∈ R3, 1 ≤ i ≤ n. Suppose the kinetic energy is

\[ T = \sum_i \frac{m_i}{2} |\dot r_i|^2 \]

and the potential energy is

\[ U = -\sum_{i, j \ne i} \frac{m_i m_j}{2 |r_i - r_j|}. \]

Let I = \sum_i m_i |r_i|^2. Show that

\[ \frac{d^2}{dt^2} I = 4T + 2U, \]

which is strictly positive if the energy T + U is positive. What implications does this identity have for the stability of planetary systems?

Exercise 41 (Jacobi metric). Let L(x, v) : Rn × Rn → R be a C2 Lagrangian. Let x : [0, T] → Rn be a solution to the corresponding Euler-Lagrange equation

\[ \frac{d}{dt} D_v L - D_x L = 0, \tag{30} \]

for the Lagrangian

\[ L(x, v) = \frac{|v|^2}{2} - V(x). \]

Let E(t) = |ẋ(t)|^2/2 + V(x(t)).

1. Show that Ė = 0.

2. Let E0 = E(0). Show that x is a solution to the Euler-Lagrange equation

\[ \frac{d}{dt} D_v L_J - D_x L_J = 0 \tag{31} \]

associated to L_J = \sqrt{E_0 - V(x)}\,|\dot x|.

3. Show that any reparametrization in time of x is also a solution to (31) and observe that the functional

\[ \int_0^T \sqrt{E_0 - V(x)}\, |\dot x| \]

represents the length of the path between x(0) and x(T) using the Jacobi metric g = \sqrt{E_0 - V(x)}.

4. Show that the solutions to the Euler-Lagrange equation (31), when reparametrized in time in such a way that the energy of the reparametrized trajectory is E0, satisfy (30).

Exercise 42 (Brachistochrone problem). Let (x1, y1) be a point in a (vertical) plane. Show that the curve y = u(x) that connects (0, 0) to (x1, y1) in such a way that a particle with unit mass moving under the influence of a unit gravity field reaches (x1, y1) in the minimum amount of time minimizes

\[ \int_0^{x_1} \sqrt{\frac{1 + \dot u^2}{-2u}}\, dx. \]

Hint: use the fact that the sum of kinetic and potential energy is constant.

Determine the Euler-Lagrange equation and study its solutions, using exercise 39.

Exercise 43. Consider a second-order variational problem:

\[ \min_x \int_0^T L(x, \dot x, \ddot x) \tag{32} \]

where the minimum is taken over all trajectories $x : [0, T] \to \mathbb{R}^n$ with fixed boundary data $x(0)$, $x(T)$, $\dot x(0)$, $\dot x(T)$. Determine the Euler-Lagrange equation corresponding to (32).


2. Further necessary conditions

A classical strategy in the study of variational problems consists

in establishing necessary conditions for minimizers. If there exists a

minimizer and if the necessary conditions have a unique solution, then

this solution has to be the unique minimizer and thus the problem is

solved. In addition to Euler-Lagrange equations, several other neces-

sary conditions can be derived. In this section we discuss boundary

conditions which arise, for instance when the end-points are not fixed,

and second-order conditions.

2.1. Boundary conditions. In certain problems, boundary conditions such as endpoint values are not prescribed a priori. In this case, it is possible to prove that the minimizers automatically satisfy certain boundary conditions. These are called natural boundary conditions.

Example 14. Consider the problem of minimizing the integral

\[ \int_0^T L(x, \dot x)\,dt, \tag{33} \]

over all $C^2$ curves $x : [0, T] \to \mathbb{R}^n$. Note that the boundary values for the trajectory $x$ at $t = 0, T$ are not prescribed a priori.

Let $x$ be a minimizer of (33) (with free endpoints). Then for all $\varphi : [0, T] \to \mathbb{R}^n$, not necessarily compactly supported,

\[ \int_0^T D_xL(x, \dot x)\varphi + D_vL(x, \dot x)\dot\varphi\, dt = 0. \]

Integrating by parts and using the fact that $x$ is a solution to the Euler-Lagrange equation, we conclude that

\[ D_vL(x(0), \dot x(0)) = D_vL(x(T), \dot x(T)) = 0. \]
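The integration by parts behind the natural boundary conditions of Example 14 can be spelled out as follows. This is only a sketch, under the assumptions of the example (a $C^2$ Lagrangian and a $C^2$ minimizer $x$) and in the same notation.

```latex
\begin{align*}
0 &= \int_0^T D_xL(x,\dot x)\,\varphi + D_vL(x,\dot x)\,\dot\varphi \; dt \\
  &= \int_0^T \Big[ D_xL(x,\dot x) - \frac{d}{dt} D_vL(x,\dot x) \Big]\varphi \; dt
     \;+\; D_vL(x,\dot x)\,\varphi \,\Big|_{t=0}^{t=T}.
\end{align*}
```

Taking $\varphi$ compactly supported in $(0, T)$ first yields the Euler-Lagrange equation, so the integral vanishes for every $\varphi$. Choosing then $\varphi$ with $\varphi(0) = 0$ and $\varphi(T)$ arbitrary, and afterwards the reverse, forces each boundary term to vanish separately.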


Exercise 44. Consider the problem of minimizing the integral

\[ \int_0^T L(x, \dot x)\,dt, \]

over all $C^2$ curves $x : [0, T] \to \mathbb{R}^n$ such that $x(0) = x(T)$. Deduce that

\[ D_vL(x(0), \dot x(0)) = D_vL(x(T), \dot x(T)). \]

Use the previous identity to show that any periodic (smooth) minimizer is in fact a periodic solution to the Euler-Lagrange equations.

Exercise 45. Consider the problem of minimizing

\[ \int_0^T L(x, \dot x)\,dt + \psi(x(T)), \]

with $x(0)$ fixed and $x(T)$ free. Derive a boundary condition at $t = T$ for the minimizers.

Exercise 46 (Free boundary). Consider the problem of minimizing

\[ \int_0^T L(x, \dot x), \]

over all terminal times $T$ and all $C^2$ curves $x : [0, T] \to \mathbb{R}^n$. Show that $x$ is a solution to the Euler-Lagrange equation and that

\begin{align*}
L(x(T), \dot x(T)) &= 0, \\
D_xL(x(T), \dot x(T))\,\dot x(T) + D_vL(x(T), \dot x(T))\,\ddot x(T) &\ge 0, \\
D_vL(x(T), \dot x(T)) &= 0.
\end{align*}

Let $q \in \mathbb{R}$ and $L : \mathbb{R}^2 \to \mathbb{R}$ given by

\[ L(x, v) = \frac{(v - q)^2}{2} + \frac{x^2}{2} - 1. \]

If possible, determine $T$ and $x : [0, T] \to \mathbb{R}$ that are (local) minimizers of

\[ \int_0^T L(x, \dot x)\,ds, \]

with $x(0) = 0$.


2.2. Second-order conditions. If f : R → R is a C2 function

which has a minimum at a point x0 then f ′(x0) = 0 and f ′′(x0) ≥ 0.

For the minimal action problem, the analog of the vanishing of the first

derivative is the Euler-Lagrange equation. We will now consider the

analog to the second derivative being non-negative.

The next theorem concerns second-order conditions for minimizers:

Theorem 34 (Jacobi’s test). Let L(x, v) : Rn × Rn → R be a C2

Lagrangian. Let x : [0, T ] → Rn be a C1 minimizer of the action

under fixed boundary conditions. Then, for each η : [0, T ] → Rn, with

compact support in (0, T ), we have

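\[ \int_0^T \frac12\, \eta^T D^2_{xx}L(x, \dot x)\,\eta + \eta^T D^2_{xv}L(x, \dot x)\,\dot\eta + \frac12\, \dot\eta^T D^2_{vv}L(x, \dot x)\,\dot\eta \;\ge\; 0. \tag{34} \]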

Proof. If $x$ is a minimizer, the function $\varepsilon \mapsto I[x + \varepsilon\eta]$ has a minimum at $\varepsilon = 0$. By computing $\frac{d^2}{d\varepsilon^2} I[x + \varepsilon\eta]$ at $\varepsilon = 0$ we obtain (34).
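For completeness, the second derivative used in this proof can be written out. The following is a sketch, assuming $I[x] = \int_0^T L(x, \dot x)\,dt$ denotes the action, that $\eta$ is $C^1$, and that differentiation under the integral sign is justified.

```latex
\begin{align*}
\frac{d^2}{d\varepsilon^2} I[x+\varepsilon\eta]\Big|_{\varepsilon=0}
 = \int_0^T \eta^T D^2_{xx}L\,\eta
   + 2\,\eta^T D^2_{xv}L\,\dot\eta
   + \dot\eta^T D^2_{vv}L\,\dot\eta \; dt \;\ge\; 0 .
\end{align*}
```

All second derivatives of $L$ are evaluated at $(x(t), \dot x(t))$. Dividing by 2 gives the inequality in the statement of Jacobi's test.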

A corollary of the previous theorem is Lagrange’s test that we state

next:

Corollary 35 (Lagrange’s test). Let L(x, v) : Rn × Rn → R be a C2

Lagrangian. Suppose x : [0, T ] → Rn is a C1 minimizer of the action

under fixed boundary conditions. Then

\[ D^2_{vv}L(x, \dot x) \ge 0. \]

Proof. Use Theorem 34 with $\eta = \varepsilon\,\xi(t)\sin\frac{t}{\varepsilon}$, for $\xi : [0, T] \to \mathbb{R}^n$ with compact support in $(0, T)$, and let $\varepsilon \to 0$.
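The limit in the proof of Lagrange's test can be made explicit. This is a sketch, assuming $\xi$ is $C^1$ so that the derivative of the test function is well defined.

```latex
\begin{align*}
\eta &= \varepsilon\,\xi\sin\tfrac{t}{\varepsilon} \;\to\; 0 \quad\text{uniformly}, \qquad
\dot\eta = \xi\cos\tfrac{t}{\varepsilon} + \varepsilon\,\dot\xi\sin\tfrac{t}{\varepsilon}, \\
0 &\le \lim_{\varepsilon\to 0}\int_0^T \tfrac12\,\dot\eta^T D^2_{vv}L(x,\dot x)\,\dot\eta \; dt
   = \lim_{\varepsilon\to 0}\int_0^T \tfrac12\cos^2\!\tfrac{t}{\varepsilon}\;\xi^T D^2_{vv}L(x,\dot x)\,\xi \; dt
   = \frac14\int_0^T \xi^T D^2_{vv}L(x,\dot x)\,\xi \; dt .
\end{align*}
```

The terms of Jacobi's test that contain $\eta$ vanish in the limit, because $\eta \to 0$ uniformly while $\dot\eta$ stays bounded, and $\cos^2(t/\varepsilon)$ converges weakly to $1/2$. Since $\xi$ is arbitrary, concentrating $\xi$ near any time $t_0$ gives the pointwise inequality.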

Exercise 47. Let L : R2n → R be a continuous Lagrangian and let

x : [0, T ]→ Rn be a continuous piecewise C1 trajectory. Show that for

each δ > 0 there exists a trajectory yδ : [0, T ] → Rn of class C1 such

that

\[ \left| \int_0^T L(x, \dot x) - \int_0^T L(y_\delta, \dot y_\delta) \right| < \delta. \]


As a corollary, show that the value of the infimum of the action over

piecewise C1 trajectories is the same as the infimum over trajectories

globally C1. Note, however, that a minimizer may not be C1.

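Exercise 48 (Weierstrass test). Let $x : [0, T] \to \mathbb{R}^n$ be a $C^1$ minimum of the action corresponding to a Lagrangian $L$. Let $v, w \in \mathbb{R}^n$ and $0 \le \lambda \le 1$ be such that $\lambda v + (1 - \lambda)w = 0$. Show that

\[ \lambda L(x, \dot x + v) + (1 - \lambda)L(x, \dot x + w) \ge L(x, \dot x). \]

Hint: To prove the inequality at a point $t_0$, choose $\eta$ such that

\[ \dot\eta(t) = \begin{cases} v & \text{if } t_0 \le t \le t_0 + \lambda\varepsilon, \\ w & \text{if } t_0 + \lambda\varepsilon < t \le t_0 + \varepsilon, \\ 0 & \text{otherwise,} \end{cases} \]

and consider $I[x + \eta]$, as $\varepsilon \to 0$.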

3. Applications to Riemannian geometry

This section is dedicated to some applications of the calculus of

variations to Riemannian geometry, namely the study of geodesics and

curvature. We also present some applications to geometric mechanics,

namely the study of the rigid body.

In our examples we will mostly use local coordinates and will not try to address global problems in geometry. In fact, by using suitable charts, the problems we address can usually be reduced to problems in $\mathbb{R}^n$. To simplify the notation we will also use the Einstein convention for repeated indices, that is, $a_i b_i$ is an abbreviation of $\sum_i a_i b_i$.

Example 15. Let $M$ be a Riemannian manifold with metric $g$, defined in local coordinates by the positive definite symmetric matrix $g_{ij}(x)$. Let $L : TM \to \mathbb{R}$ be given by

\[ L(x, v) = \frac12\, g_{ij}(x)\, v_i v_j. \]


Let $x : [a, b] \to M$ be a curve that minimizes

\[ \int_a^b L(x, \dot x)\,dt, \]

over all curves with certain fixed boundary conditions. Then, we have

\[ \frac{d}{dt}\left(g_{ij}\dot x_i\right) - \frac12\, D_j g_{mk}\,\dot x_m \dot x_k = 0, \]

that is,

\[ \ddot x_i + \frac12\left[ g^{ij}\left( D_k g_{mj} + D_k g_{mj} - D_j g_{mk} \right)\right]\dot x_m \dot x_k = 0, \tag{35} \]

where $g^{ij}$ represents the inverse matrix of $g_{ij}$. We can write the previous equation in the more compact form

\[ \ddot x_i + \Gamma^i_{km}\,\dot x_m \dot x_k = 0, \]

where

\[ \Gamma^i_{km} = \frac12\, g^{ij}\left( D_k g_{mj} + D_m g_{kj} - D_j g_{mk} \right) \tag{36} \]

is the Christoffel symbol for the metric $g$ (note that the change in the order of the indices in the second term does not change the sum in (35) but makes $\Gamma$ symmetric in the indices $m$ and $k$).
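The Christoffel formula of Example 15 can be checked numerically. The sketch below evaluates it by central finite differences for a two-dimensional metric. The metric chosen (the hyperbolic half-plane, $g = y^{-2}\,\mathrm{Id}$), the helper names and the step size are illustrative assumptions, not taken from the text.

```python
# Finite-difference evaluation of the Christoffel symbols
#   Gamma^i_{km} = 1/2 g^{ij} (D_k g_mj + D_m g_kj - D_j g_mk)
# Assumption: the half-plane metric g = diag(1/y^2, 1/y^2) is an illustrative
# choice, not an example from the text.

def metric(p):
    """Metric matrix g_ij at the point p = (x, y), with y > 0."""
    x, y = p
    return [[1.0 / y**2, 0.0], [0.0, 1.0 / y**2]]

def inverse_2x2(g):
    """Inverse of a 2x2 matrix, giving g^{ij}."""
    det = g[0][0] * g[1][1] - g[0][1] * g[1][0]
    return [[g[1][1] / det, -g[0][1] / det],
            [-g[1][0] / det, g[0][0] / det]]

def metric_derivatives(p, h=1e-5):
    """dg[k][m][j] approximates D_k g_mj by central differences."""
    dg = []
    for k in range(2):
        plus = list(p)
        minus = list(p)
        plus[k] += h
        minus[k] -= h
        gp, gm = metric(plus), metric(minus)
        dg.append([[(gp[m][j] - gm[m][j]) / (2 * h) for j in range(2)]
                   for m in range(2)])
    return dg

def christoffel(p):
    """gamma[i][k][m] approximates Gamma^i_{km} at the point p."""
    ginv = inverse_2x2(metric(p))
    dg = metric_derivatives(p)
    n = 2
    return [[[0.5 * sum(ginv[i][j] * (dg[k][m][j] + dg[m][k][j] - dg[j][m][k])
                        for j in range(n))
              for m in range(n)]
             for k in range(n)]
            for i in range(n)]

# Index 0 is x, index 1 is y; evaluate at the point (0.3, 2.0).
gamma = christoffel((0.3, 2.0))
```

For this metric a hand computation gives $\Gamma^x_{xy} = \Gamma^x_{yx} = -1/y$, $\Gamma^y_{xx} = 1/y$ and $\Gamma^y_{yy} = -1/y$, which the numerical values reproduce up to discretization error.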

Theorem 36. Let gij be a smooth Riemannian metric in Rn. The

critical points x of the functional

\[ \int_0^T \frac12\, g_{ij}(x)\,\dot x_i \dot x_j\,dt \tag{37} \]

are also critical points of the functional

\[ \int_0^T \sqrt{g_{ij}(x)\,\dot x_i \dot x_j}\;dt. \tag{38} \]

Additionally, we can reparametrize the critical points of (38) in such a way that they are also critical points of (37).

Proof. The fact that the critical points of (37) are critical points of (38) is a simple computation. To prove the second part of the theorem it suffices to observe that the solutions of the Euler-Lagrange equation associated to the Lagrangian in (37) preserve the energy $E = \frac12\, g_{ij}(x)\,\dot x_i \dot x_j$. Using this fact, it is easy to find the correct parametrization of the critical points of (38).

The minimizers of (38) are called geodesics, although sometimes

the name is also used for critical points.

Example 16. Consider a parametrization $f : A \subset \mathbb{R}^m \to \mathbb{R}^n$ of an $m$-dimensional manifold. The induced metric in $\mathbb{R}^m$ is represented by the matrix

\[ g = (Df)^T Df. \]

The motivation is the following: given a curve $\theta(t) \in M$, consider the corresponding tangent vector $\dot\theta(t)$ in $TM$. Let $x = f(\theta)$ and $\dot x = Df\,\dot\theta$. Then we define

\[ \langle \dot\theta, \dot\theta \rangle = \langle \dot x, \dot x \rangle, \]

which gives rise precisely to the induced metric.

Exercise 49. Consider $\mathbb{R}^2 \setminus \{0\}$ with polar coordinates $(r, \theta)$. Show that the standard metric in $\mathbb{R}^2$ can be written in these coordinates as

\[ g = \begin{bmatrix} 1 & 0 \\ 0 & r^2 \end{bmatrix}. \]

Let

\[ L(r, \theta, \dot r, \dot\theta) = \frac{\dot r^2 + r^2 \dot\theta^2}{2}, \]

the Lagrangian of a free particle in polar coordinates. Compute the Euler-Lagrange equation and determine the corresponding Christoffel symbol.

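Exercise 50. Consider the sphere $x^2 + y^2 + z^2 = 1$ and the associated spherical coordinates $(\theta, \varphi)$

\begin{align*}
x &= \cos\theta \sin\varphi \\
y &= \sin\theta \sin\varphi \\
z &= \cos\varphi,
\end{align*}

$\theta \in (0, 2\pi)$ and $\varphi \in (0, \pi)$. Show that the induced metric is given by the matrix

\[ g = \begin{bmatrix} \sin^2\varphi & 0 \\ 0 & 1 \end{bmatrix}. \]

Determine the Euler-Lagrange equation for $L = \frac12\, g_{ij} v_i v_j$ and the Christoffel symbol corresponding to the coordinates $(\theta, \varphi)$.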

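Exercise 51. Consider the revolution surface in $\mathbb{R}^3$ parametrized by $(r, \theta)$:

\begin{align*}
x &= r\cos\theta \\
y &= r\sin\theta \\
z &= z(r).
\end{align*}

Show that the induced metric is

\[ g = \begin{bmatrix} 1 + (z')^2 & 0 \\ 0 & r^2 \end{bmatrix}. \]

Show that the equations for the geodesics are

\begin{align*}
\ddot\theta + \frac{2}{r}\,\dot r \dot\theta &= 0 \\
\ddot r - \frac{r}{1 + (z')^2}\,\dot\theta^2 + \frac{z' z''}{1 + (z')^2}\,\dot r^2 &= 0.
\end{align*}

Determine the corresponding Christoffel symbols. Prove the Clairaut identity, that is, that $r\cos\beta$ is constant, where $\beta$ is the angle between $\frac{\partial}{\partial\theta}$ and $\dot r\,\frac{\partial}{\partial r} + \dot\theta\,\frac{\partial}{\partial\theta}$.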

Exercise 52 (Spherical pendulum). Show that for a spherical pendulum with unit mass, the Lagrangian can be written as

\[ L = \frac{\dot\theta^2 \sin^2\varphi + \dot\varphi^2}{2} - U(\varphi). \]

Exercise 53. Determine the Lagrangian of point particle constrained

to the cone z2 = x2 + y2.

Exercise 54. Consider the Lagrangian for a particle of unit mass con-

strained to move in the cycloid parametrized by

\[ x = \theta - \sin\theta, \qquad y = \cos\theta. \]


Show that the y coordinate is 2π-periodic for any initial condition that

yields a periodic orbit.

3.1. Parallel Transport. The Christoffel symbols $\Gamma^i_{km}$ can be

used to study parallel transport in a Riemannian manifold. In this

section we define and discuss the main properties of parallel transport.

Let M be a manifold and Ξ(M) the set of all C∞ vector fields in

M . As usual in differential geometry, we identify vector fields in M

with the corresponding first-order linear differential operators. That

is, if $X = (X_1, \dots, X_n)$ is a vector field, we identify $X$ with the first-order differential operator

\[ X = \sum_i X_i \frac{\partial}{\partial x_i}. \]

Then, the commutator of two vector fields X and Y is the vector field

[X, Y ], which is defined through its action as a differential operator in

smooth functions f :

[X, Y ]f = X(Y (f))− Y (X(f)).

A connection ∇ in M is a mapping

∇ : Ξ× Ξ→ Ξ

satisfying the following properties

1. $\nabla_{fX + gY} Z = f\,\nabla_X Z + g\,\nabla_Y Z$,

2. $\nabla_X (Y + Z) = \nabla_X Y + \nabla_X Z$,

3. $\nabla_X (fY) = f\,\nabla_X Y + X(f)\,Y$,

for all $X, Y, Z \in \Xi(M)$ and all $f, g \in C^\infty(M)$.

The vector ∇XY represents the rate of variation of Y along a curve

tangent to X.


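Exercise 55. Let $M$ be a manifold and $\nabla$ a connection in $M$. Define $\Gamma^i_{km}$ as

\[ \nabla_{\frac{\partial}{\partial x_k}} \frac{\partial}{\partial x_m} = \Gamma^i_{km} \frac{\partial}{\partial x_i}. \]

Show that

\[ \nabla_X Y = \left[ \Gamma^i_{km} X_k Y_m + X_j \frac{\partial Y_i}{\partial x_j} \right] \frac{\partial}{\partial x_i}, \tag{39} \]

where $X = X_j \frac{\partial}{\partial x_j}$ and $Y = Y_j \frac{\partial}{\partial x_j}$.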

At every point $x$, the formula (39) only depends on the value of the vector field $X$ at $x$; this allows us to define the covariant derivative of a vector field $Y$ along a curve $x(t)$ through

\[ \frac{DY}{dt} = \nabla_{\dot x} Y. \]

A vector field $X$ is parallel along a curve $x(t)$ if

\[ \frac{DX}{dt} = 0. \]

A connection is symmetric if

\[ \nabla_X Y - \nabla_Y X = [X, Y]. \]

In general, connections in a manifold do not have to be symmetric, and therefore

\[ \nabla_X Y - \nabla_Y X = T(X, Y) + [X, Y], \]

where $T$ is the torsion.

Exercise 56. Determine an expression for the torsion in local coordi-

nates.

Exercise 57. Let ∇ be a symmetric connection. Show that

\[ \Gamma^k_{ij} = \Gamma^k_{ji}. \]

A manifold can be endowed with different connections. For Riemannian manifolds, the connections of special interest are those compatible with the metric, that is, those that satisfy, for all vector fields $X$ and $Y$,

\[ \frac{d}{dt}\langle X, Y \rangle = \left\langle \frac{D}{dt}X, Y \right\rangle + \left\langle X, \frac{D}{dt}Y \right\rangle, \tag{40} \]

where the derivatives are taken along any arbitrary curve $x(t)$. There exists a unique symmetric connection compatible with the metric, the Levi-Civita connection, whose Christoffel symbols are given by (36).

Theorem 37. Let $M$ be a Riemannian manifold with metric $g$. The Levi-Civita connection, defined in local coordinates by the Christoffel symbols (36), is the unique connection which is symmetric and compatible with the metric $g$.

Proof. Let $\nabla$ be a connection which is symmetric and compatible with the metric $g$. Then one can use (40) to determine $D_k g_{mj}$, $D_m g_{kj}$ and $D_j g_{mk}$, and it is a simple computation to show that its Christoffel symbols are given by (36).

Exercise 58. Verify that the Christoffel symbols define a connection.

Exercise 59. Use formula (36) to determine the Christoffel symbol

corresponding to the polar coordinates in R2 - compare with the result

of exercise 49.

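Exercise 60. Let $X$ be a vector field and $x$ a trajectory that satisfies

\[ \frac{dx}{dt} = X(x). \]

Show that in local coordinates

\[ \ddot x_i \frac{\partial}{\partial x_i} = X_k(x) \frac{\partial X_i}{\partial x_k} \frac{\partial}{\partial x_i}, \]

and, therefore,

\[ \frac{DX}{dt} = \left( \Gamma^i_{km}\,\dot x_k \dot x_m + \ddot x_i \right) \frac{\partial}{\partial x_i}. \]

Show that the previous definition is independent of the choice of local coordinates, which allows us to define covariant acceleration as:

\[ \frac{D\dot x}{dt} = \left( \Gamma^i_{km}\,\dot x_k \dot x_m + \ddot x_i \right) \frac{\partial}{\partial x_i}, \]

for any $C^2$ trajectory.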

Example 17. Equation (35) can then be rewritten as

\[ \frac{D\dot x}{dt} = 0, \]

which should be compared with the Newton law for a particle in the absence of forces, $\ddot x = 0$.

Exercise 61. Let M be a Riemannian manifold in which is defined a

potential V : M → R. The corresponding Lagrangian is

\[ L(x, v) = \frac12\, g_{ij} v_i v_j - V(x). \]

Determine the Euler-Lagrange equation.

Example 18. A force field in a manifold $M$ is a mapping

\[ F : TM \to T^*M \]

such that the image of $T_xM$ is a subset of $T^*_xM$. The generalized Newton law is

\[ g\,\frac{D\dot x}{dt} = F, \]

in which the metric $g$ is identified with the operator $g : TM \to T^*M$ defined by $(gX)(Y) = \langle X, Y \rangle$.

3.2. Rigid Body - I. The rigid body is perhaps one of the best

examples in which the geometric formalism of the classical mechanics

is natural.

Consider a rigid body $F$ with a fixed point at the origin. The position of $F$ at time $t$ can be described by a matrix $M(t) \in SO(3)$ with $M(0) = I$ (recall that $SO(3)$ is the set of $3 \times 3$ matrices $M$ that satisfy $M^TM = I$ and $\det M = 1$). More precisely, consider a point of $F$ which was at the position $x$ at $t = 0$. Then at time $t$, the same point is located at $x(t) = M(t)x$. If the body has mass density $\rho(x)$, the kinetic energy is given by

\[ T = \frac12 \int \rho(x)\,|\dot M x|^2. \]

Since $M$ is an isometry, we have

\[ |\dot M x|^2 = |M^{-1}\dot M x|^2, \]

that is,

\[ T = \frac12 \int \rho(x)\,|M^{-1}\dot M x|^2. \]

The mapping that to a vector $K$, tangent to $SO(3)$ at the point $M$, associates

\[ K \mapsto \langle K, K \rangle_M = \frac12 \int \rho(x)\,|M^{-1}Kx|^2 \tag{41} \]

is a metric in $SO(3)$ which is invariant by left translation. More precisely, let $G \in SO(3)$ be fixed. The left translation by $G$ is the mapping $L_G : SO(3) \to SO(3)$ defined by

\[ L_G M = GM, \qquad M \in SO(3). \]

We have that $(L_G)_* : T_M SO(3) \to T_{GM} SO(3)$ is simply

\[ (L_G)_* K = GK, \qquad K \in T_M SO(3). \]

A metric is called left-invariant if $\langle (L_G)_* K, (L_G)_* K \rangle_{L_G M} = \langle K, K \rangle_M$.

Exercise 62. Verify that the metric (41) is left invariant.

Exercise 63. Let $M(t)$ be a $C^1$ curve in $SO(3)$. Show that:

1. The matrix $M^{-1}\dot M$ is anti-symmetric.

2. There exists a vector $\omega_{M^{-1}\dot M} = (\omega_1, \omega_2, \omega_3)$ such that

\[ M^{-1}\dot M = \begin{bmatrix} 0 & -\omega_3 & \omega_2 \\ \omega_3 & 0 & -\omega_1 \\ -\omega_2 & \omega_1 & 0 \end{bmatrix} \]

and $M^{-1}\dot M x = \omega_{M^{-1}\dot M} \times x$, in which $\times$ is the usual cross product in $\mathbb{R}^3$. The vector $\omega_{M^T\dot M}$ is called the angular velocity.

3. Verify that the kinetic energy is a quadratic form in $\omega_{M^T\dot M}$, that is, there exists a symmetric matrix $I$ (the inertia tensor) such that

\[ T = \frac12\, \omega_{M^T\dot M}^T\, I\, \omega_{M^T\dot M}. \]

4. Let $M_1(t)$ and $M_2(t)$ be $C^1$ curves in $SO(3)$ and $M(t) = M_1(t)M_2(t)$. Determine $\omega_{M^T\dot M}$ as a function of $\omega_{M_1^T\dot M_1}$ and $\omega_{M_2^T\dot M_2}$.

5. Let $y(t)$ be the trajectory of a body in a referential under constant rotation. Identify the forces that act over the body: referential acceleration, centrifugal force and Coriolis force.

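Let $M(t)$ be a curve in $SO(3)$. The trajectory of a point $x$ is $x(t) = M(t)x$. Let $G \in SO(3)$ and consider the change of coordinates $Gy = x$. Then

\[ y(t) = G^T M(t) G\, y. \]

Therefore, in the new coordinates the trajectory is $N(t) = G^T M(t) G$. The kinetic energy can be written as

\[ T = \frac12\, \omega_{M^T\dot M}^T\, I\, \omega_{M^T\dot M} = \frac12\, \omega_{N^T\dot N}^T\, \tilde I\, \omega_{N^T\dot N}. \]

We would like to relate $\tilde I$ and $\omega_{N^T\dot N}$ with $I$ and $\omega_{M^T\dot M}$. We have

\begin{align*}
\omega_{N^T\dot N} \wedge x &= G^T M^T \dot M G x = G^T\left( \omega_{M^T\dot M} \wedge (Gx) \right) \\
&= G^T G\left[ (G^T \omega_{M^T\dot M}) \wedge (G^T (Gx)) \right] = (G^T \omega_{M^T\dot M}) \wedge x,
\end{align*}

that is, $\omega_{N^T\dot N} = G^T \omega_{M^T\dot M}$ and, consequently,

\[ \tilde I = G^T I G. \]

Since $I$ is symmetric, we can always choose a rotation matrix $G$ such that in the new referential

\[ \tilde I = \begin{bmatrix} I_1 & 0 & 0 \\ 0 & I_2 & 0 \\ 0 & 0 & I_3 \end{bmatrix}. \]

The constants $I_i$ are called the principal moments of inertia.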


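3.3. Poincaré equations. Let $M$ be a differentiable manifold. Consider a set of $n$ linearly independent vector fields $Z_i$ in $M$. The speed $\dot x$ of a trajectory $x : [0, T] \to M$ can be written as a linear combination of these vector fields:

\[ \dot x(t) = w_i(t)\, Z_i(x); \]

the functions $w_i(t)$ are called quasi-velocities [AKN97].

Sometimes it is useful to write the Lagrangian as a function of the quasi-velocities, that is, we write $L(x, w)$. We will deduce the Euler-Lagrange equations in this situation. Let us consider a family of trajectories $x_\tau(t)$ depending differentiably on a parameter $\tau$. We have

\[ \frac{\partial x_\tau}{\partial t} = w_i Z_i, \qquad \frac{\partial x_\tau}{\partial \tau} = \xi_i Z_i. \]

Write $Z_i = Z_i^k \frac{\partial}{\partial x_k}$. Then by differentiating and dropping the subscript in $x_\tau$,

\[ \frac{\partial^2 x_k}{\partial\tau\,\partial t} = \frac{\partial w_i}{\partial\tau} Z_i^k + w_i \frac{\partial Z_i^k}{\partial x_m} \frac{\partial x_m}{\partial\tau} = \frac{\partial w_i}{\partial\tau} Z_i^k + w_i\, \xi_j\, Z_j^m \frac{\partial Z_i^k}{\partial x_m}, \]

and

\[ \frac{\partial^2 x_k}{\partial t\,\partial\tau} = \frac{\partial \xi_i}{\partial t} Z_i^k + \xi_j \frac{\partial Z_j^k}{\partial x_m} \frac{\partial x_m}{\partial t} = \frac{\partial \xi_i}{\partial t} Z_i^k + w_i\, \xi_j\, Z_i^m \frac{\partial Z_j^k}{\partial x_m}. \]

As $\frac{\partial^2 x}{\partial t\,\partial\tau} = \frac{\partial^2 x}{\partial\tau\,\partial t}$ and

\[ [Z_j, Z_i] = \left[ Z_j^m \frac{\partial Z_i^k}{\partial x_m} - Z_i^m \frac{\partial Z_j^k}{\partial x_m} \right] \frac{\partial}{\partial x_k}, \]

we have

\[ 0 = \frac{\partial \xi_i}{\partial t} Z_i - \frac{\partial w_i}{\partial\tau} Z_i + w_i\, \xi_j\, [Z_i, Z_j], \]

that is,

\[ \frac{\partial w_i}{\partial\tau} = \frac{\partial \xi_i}{\partial t} + c^i_{kj}\, w_k\, \xi_j, \]

where $c^i_{kj} Z_i = [Z_k, Z_j]$. Then

\[ \frac{d}{d\tau} \int_0^T L(x_\tau, w_\tau) = \int_0^T Z_i(L)\,\xi_i + \frac{\partial L}{\partial w_i} \left( \frac{\partial \xi_i}{\partial t} + c^i_{kj}\, w_k\, \xi_j \right) \]

and, therefore,

\[ \frac{d}{dt} \frac{\partial L}{\partial w_i} = Z_i(L) + \frac{\partial L}{\partial w_k}\, c^k_{ji}\, w_j. \]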
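The passage from the first variation to the Poincaré equations combines an integration by parts with a relabeling of summation indices. A sketch of this step, assuming the variation satisfies $\xi_i(0) = \xi_i(T) = 0$ so that no boundary terms appear:

```latex
\begin{align*}
0 = \frac{d}{d\tau}\int_0^T L(x_\tau, w_\tau)\,dt
  = \int_0^T \Big[ Z_i(L) - \frac{d}{dt}\frac{\partial L}{\partial w_i}
      + \frac{\partial L}{\partial w_k}\, c^k_{ji}\, w_j \Big]\,\xi_i \; dt .
\end{align*}
```

Here the term $\frac{\partial L}{\partial w_i}\frac{\partial \xi_i}{\partial t}$ was integrated by parts, and in $\frac{\partial L}{\partial w_i} c^i_{kj} w_k \xi_j$ the summation indices $(i, k, j)$ were renamed $(k, j, i)$. Since $\xi$ is arbitrary, the bracket must vanish.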

3.4. Rigid body - II. Let $M$ and $N$ be differentiable manifolds. Recall that for a diffeomorphism $f : M \to N$ and any vector field $X$ in $TM$ we define the vector field $f_*X$ to be the vector field in $TN$ which satisfies

\[ (f_*X)(h) = \left( X(h \circ f) \right) \circ f^{-1}, \]

for all $h \in C^\infty(N)$.

In the case of the rigid body, or more generally, in the case of a Lagrangian defined in a Lie group and left invariant, we can choose the vector fields $Z_i$ of the form

\[ Z_i(g) = g_* Z_i(e), \]

that is, left invariant vector fields.

Lemma 38. Let $X_i$ and $Y_i$, $i = 1, 2$, be vector fields in a manifold $M$ and $f : M \to M$ a diffeomorphism. Assume that

\[ Y_i = f_* X_i. \]

Then

\[ [Y_1, Y_2] = f_*[X_1, X_2]. \]

Proof. Let $p \in M$. We have

\[ Y_i(g)\big|_{f(p)} = f_*X_i(g)\big|_{f(p)} = X_i(g \circ f)\big|_p, \]

that is,

\[ Y_i(g) \circ f = X_i(g \circ f). \]

Therefore

\[ Y_1(Y_2(g))\big|_{f(p)} = X_1(Y_2(g) \circ f)\big|_p = X_1(X_2(g \circ f))\big|_p. \]

Consequently

\[ [Y_1, Y_2] = f_*[X_1, X_2]. \]

Thus, from the previous result, $c^k_{ij}$ is constant since

\[ [g_*Z_i, g_*Z_j] = g_*[Z_i, Z_j]. \]

Therefore, if $L$ is left invariant, $L \equiv L(w)$. Consequently

\[ \frac{d}{dt} \frac{\partial L}{\partial w_i} = \frac{\partial L}{\partial w_k}\, c^k_{ji}\, w_j. \]

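In the case of a rigid body, using, if necessary, an orthogonal transformation to diagonalize the inertia tensor into $\mathrm{diag}(I_1, I_2, I_3)$, we can choose vector fields $Z_1$, $Z_2$, $Z_3$ such that at the identity they have the following form:

\[ Z_1 = \begin{bmatrix} 0 & 1 & 0 \\ -1 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix}, \quad Z_2 = \begin{bmatrix} 0 & 0 & -1 \\ 0 & 0 & 0 \\ 1 & 0 & 0 \end{bmatrix} \quad\text{and}\quad Z_3 = \begin{bmatrix} 0 & 0 & 0 \\ 0 & 0 & 1 \\ 0 & -1 & 0 \end{bmatrix}, \]

and that are left invariant. Thus, the Lagrangian is

\[ L(w) = \frac{I_1 w_1^2 + I_2 w_2^2 + I_3 w_3^2}{2}. \]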

Exercise 64. Verify that the commutator of the vector fields $Z_i$ corresponds to the commutator of the corresponding matrices, and that

\[ [Z_1, Z_2] = Z_3, \qquad [Z_2, Z_3] = Z_1, \qquad [Z_3, Z_1] = Z_2. \]

Using the previous exercise, the Euler-Lagrange equations are then

\begin{align*}
I_1 \dot w_1 &= (I_2 - I_3)\, w_2 w_3 \tag{42} \\
I_2 \dot w_2 &= (I_3 - I_1)\, w_3 w_1 \\
I_3 \dot w_3 &= (I_1 - I_2)\, w_1 w_2,
\end{align*}

that is,

\[ I \dot w + w \times (I w) = 0. \tag{43} \]


The angular momentum vector is given by

\[ N = I\omega. \]

With this notation, (43) can be written as

\[ \dot N = N \wedge \omega. \tag{44} \]

From the previous equation we conclude that

\[ \frac{d}{dt}\|N\|^2 = 0, \qquad \frac{d}{dt}\, N \cdot \omega = 0. \]

The first identity represents the conservation of the total angular momentum and the second the conservation of the energy. Let

\[ L = \begin{bmatrix} 0 & N_1 & -N_2 \\ -N_1 & 0 & N_3 \\ N_2 & -N_3 & 0 \end{bmatrix}, \qquad A = \begin{bmatrix} 0 & \omega_1 & -\omega_2 \\ -\omega_1 & 0 & \omega_3 \\ \omega_2 & -\omega_3 & 0 \end{bmatrix}. \]

The equation (44) can be written as

\[ \dot L = [A, L]. \tag{45} \]

A pair $(A, L)$ satisfying (45) is called a Lax pair. Equations of this form have a rich structure and appear in the study of diverse equations such as the Korteweg-de Vries equation.

Proposition 39. Let $L$ be a solution of (45). Then the eigenvalues of $L$ are constant. Furthermore, if $v_0$ is an eigenvector of $L$ at $t = 0$ and $v$ solves

\[ \dot v = Av, \]

with $v(0) = v_0$, then $v(t)$ is an eigenvector of $L$ for all $t$.

Proof. Let $v(0) = v_0$ be an eigenvector of $L$ at $t = 0$ with corresponding eigenvalue $\lambda \in \mathbb{C}$. Define $v(t)$ through the differential equation

\[ \dot v = Av. \]

Then

\[ \frac{d}{dt} Lv = \dot L v + L \dot v = ALv - LAv + LAv = ALv, \]

that is, $w = Lv$ satisfies

\[ \dot w = Aw, \qquad w(0) = \lambda v(0), \]

which implies $w(t) = \lambda v(t)$.

The Euler equation (42) admits as stationary solutions rotations around each of the principal inertia axes. For instance, $\omega_1 \neq 0$, $\omega_2 = \omega_3 = 0$. In the case in which $I_1 = I_2 = I_3$ the only solutions are stationary rotations, $\dot\omega = 0$.

Proposition 40. The stationary solution $\omega_1 \neq 0$, $\omega_2 = \omega_3 = 0$ is stable if $I_1 < I_2, I_3$ or $I_2, I_3 < I_1$ and unstable if $I_2 < I_1 < I_3$ or $I_3 < I_1 < I_2$.

Proof. In the unstable cases, it suffices to look at the matrix of the linearization of (42) around this solution,

\[ \begin{bmatrix} 0 & 0 & 0 \\ 0 & 0 & \frac{I_3 - I_1}{I_2}\,\omega_1 \\ 0 & \frac{I_1 - I_2}{I_3}\,\omega_1 & 0 \end{bmatrix}, \]

and check that it has two real eigenvalues with opposite signs. The stable case requires some additional work, which is left to the reader.
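The eigenvalue check in the unstable case can be carried out explicitly. The following is a sketch, in which $\lambda$ denotes an eigenvalue of the linearized matrix (a symbol introduced here for illustration).

```latex
\begin{align*}
\det\begin{bmatrix}
-\lambda & 0 & 0 \\
0 & -\lambda & \frac{I_3 - I_1}{I_2}\,\omega_1 \\
0 & \frac{I_1 - I_2}{I_3}\,\omega_1 & -\lambda
\end{bmatrix}
= -\lambda\left( \lambda^2 - \frac{(I_3 - I_1)(I_1 - I_2)}{I_2 I_3}\,\omega_1^2 \right) = 0 .
\end{align*}
```

If $I_2 < I_1 < I_3$ or $I_3 < I_1 < I_2$, the product $(I_3 - I_1)(I_1 - I_2)$ is positive, so the two nonzero eigenvalues are real with opposite signs. In the remaining cases the product is negative and the nonzero eigenvalues are purely imaginary, which is why the linearization alone does not settle the stable case.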

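If $I_1 = I_2 = I_c$ the body is called a symmetrical top. In this case, (42) gives

\[ \dot\omega_3 = 0 \]

and

\[ \dot\omega_2 = \frac{I_3 - I_c}{I_c}\,\omega_3\omega_1, \qquad \dot\omega_1 = \frac{I_c - I_3}{I_c}\,\omega_2\omega_3. \]

From these equations one concludes that

\[ \ddot\omega_1 = -\frac{(I_3 - I_c)^2}{I_c^2}\,\omega_3^2\,\omega_1, \]

that is,

\[ \ddot\omega_1 = -k\,\omega_1, \]

with $k > 0$, which implies that $\omega_1$ is a periodic function, and, in a similar way, the same holds for $\omega_2$.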


Finally, in the general case, the conservation of energy and total angular momentum implies that the trajectory $\omega(t)$ satisfies:

\begin{align*}
I_1^2\omega_1^2 + I_2^2\omega_2^2 + I_3^2\omega_3^2 &= C_1 \\
I_1\omega_1^2 + I_2\omega_2^2 + I_3\omega_3^2 &= C_2,
\end{align*}

that is, the trajectories belong to the intersection of two ellipsoids.
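The two conserved quantities can be observed numerically by integrating the Euler equations (42) with a classical fourth-order Runge-Kutta scheme. This is a minimal sketch: the inertia values, the initial angular velocity, the step size and all function names are illustrative assumptions.

```python
# RK4 integration of the Euler equations (42). Checks that
#   C1 = sum (I_i w_i)^2   and   C2 = sum I_i w_i^2
# remain (numerically) constant along the trajectory.
# Assumption: inertia values, initial data and step size are illustrative.

I = (1.0, 2.0, 3.0)

def rhs(w):
    """Right-hand side of (42), solved for the time derivatives of w."""
    w1, w2, w3 = w
    return ((I[1] - I[2]) * w2 * w3 / I[0],
            (I[2] - I[0]) * w3 * w1 / I[1],
            (I[0] - I[1]) * w1 * w2 / I[2])

def rk4_step(w, dt):
    """One classical Runge-Kutta step of size dt."""
    def shifted(k, s):
        return tuple(wi + s * ki for wi, ki in zip(w, k))
    k1 = rhs(w)
    k2 = rhs(shifted(k1, dt / 2))
    k3 = rhs(shifted(k2, dt / 2))
    k4 = rhs(shifted(k3, dt))
    return tuple(wi + dt / 6 * (a + 2 * b + 2 * c + d)
                 for wi, a, b, c, d in zip(w, k1, k2, k3, k4))

def momentum_sq(w):
    """C1: squared norm of the angular momentum N = I w."""
    return sum((Ii * wi) ** 2 for Ii, wi in zip(I, w))

def twice_energy(w):
    """C2: twice the kinetic energy, N . w."""
    return sum(Ii * wi ** 2 for Ii, wi in zip(I, w))

def integrate(w0, dt=1e-3, steps=5000):
    w = w0
    for _ in range(steps):
        w = rk4_step(w, dt)
    return w

w0 = (0.1, 1.0, 0.1)          # close to rotation about the middle axis
w_end = integrate(w0)
drift_c1 = abs(momentum_sq(w_end) - momentum_sq(w0))
drift_c2 = abs(twice_energy(w_end) - twice_energy(w0))
```

Both drifts stay at the level of the integration error even though the trajectory itself moves away from its starting point, which illustrates that $\omega(t)$ stays on the intersection of the two ellipsoids.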

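Exercise 65. Consider a rigid body with mass density $\rho$. Show that the inertia tensor admits the matricial representation:

\[ I = \begin{bmatrix} \int (y^2 + z^2)\,d\rho & -\int xy\,d\rho & -\int xz\,d\rho \\ -\int xy\,d\rho & \int (x^2 + z^2)\,d\rho & -\int yz\,d\rho \\ -\int xz\,d\rho & -\int yz\,d\rho & \int (x^2 + y^2)\,d\rho \end{bmatrix}. \]

Exercise 66. Show that $S(\theta, \varphi, \psi)$ given by

\[ \begin{bmatrix} \cos\varphi & -\sin\varphi & 0 \\ \sin\varphi & \cos\varphi & 0 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} 1 & 0 & 0 \\ 0 & \cos\theta & -\sin\theta \\ 0 & \sin\theta & \cos\theta \end{bmatrix} \begin{bmatrix} \cos\psi & -\sin\psi & 0 \\ \sin\psi & \cos\psi & 0 \\ 0 & 0 & 1 \end{bmatrix}, \]

for $(\theta, \varphi, \psi) \in (0, \pi) \times (0, 2\pi) \times (0, 2\pi)$ defines a local parametrization of $SO(3)$. The coordinates $(\theta, \varphi, \psi)$ are called the Euler angles.

Exercise 67. Consider a rigid body with a fixed point and such that $I_1 = I_2$. Show that the kinetic energy written in the local coordinates $(\theta, \varphi, \psi)$ is

\[ \frac{I_1}{2}\left( \dot\theta^2 + \dot\varphi^2 \sin^2\theta \right) + \frac{I_3}{2}\left( \dot\psi + \dot\varphi\cos\theta \right)^2. \]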

4. Hamiltonian dynamics

In this section we introduce the Hamiltonian formalism of Classical

Mechanics. We start by discussing the main properties of the Legendre

transform. Then we derive Hamilton’s equations. Afterwards we dis-

cuss briefly the classical theory of canonical transformations. The sec-

tion ends with a discussion of additional variational principles.


4.1. Legendre transform. Before we proceed, we need to discuss

the Legendre transform of convex functions. The Legendre transform is

used to define the Hamiltonian of a mechanical system and it plays an

essential role in many problems in calculus of variations. Additionally,

it illustrates many of the tools associated with convexity.

Let $L(v) : \mathbb{R}^n \to \mathbb{R}$ be a convex function, satisfying the following superlinear growth condition:

\[ \lim_{|v| \to \infty} \frac{L(v)}{|v|} = +\infty. \]

The Legendre transform $L^*$ of $L$ is

\[ L^*(p) = \sup_{v \in \mathbb{R}^n} \left[ -v \cdot p - L(v) \right]. \]

This is the usual definition of Legendre transform in optimal control, see [FS93] or [BCD97]. However, it differs by a sign from the Legendre transform traditionally used in classical mechanics:

\[ L^\sharp(p) = \sup_{v \in \mathbb{R}^n} \left[ v \cdot p - L(v) \right], \]

as it is defined, for instance, in [AKN97] or [Eva98b]. They are related by the elementary identity

\[ L^*(p) = L^\sharp(-p). \]

We will frequently denote $L^*(p)$ by $H(p)$. The Legendre transform of $H$ is denoted by $H^*$ and is

\[ H^*(v) = \sup_{p \in \mathbb{R}^n} \left[ -p \cdot v - H(p) \right]. \]
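The two sign conventions, and the fact that transforming twice returns the original function, can be illustrated numerically in one dimension. The sketch below maximizes over a uniform grid. The Lagrangian $L(v) = v^2/2 + v$, the grid bounds and the function names are illustrative assumptions; the closed form $H(p) = (p+1)^2/2$ used in the last step was computed by hand for this particular $L$.

```python
# Grid-based Legendre transforms in one dimension.
# Assumption: L(v) = v^2/2 + v and the grid [-10, 10] are illustrative choices.

def L(v):
    return 0.5 * v * v + v

GRID = [-10.0 + 1e-3 * k for k in range(20001)]   # points in [-10, 10]

def legendre_star(f, p):
    """Optimal-control convention: f*(p) = sup_v [-v p - f(v)]."""
    return max(-v * p - f(v) for v in GRID)

def legendre_sharp(f, p):
    """Classical-mechanics convention: f#(p) = sup_v [v p - f(v)]."""
    return max(v * p - f(v) for v in GRID)

def H_closed(p):
    """Hand-computed H = L* for this particular L: H(p) = (p + 1)^2 / 2."""
    return 0.5 * (p + 1.0) ** 2

star_value = legendre_star(L, 0.5)        # L*(0.5)
sharp_value = legendre_sharp(L, -0.5)     # L#(-0.5)
double_value = legendre_star(H_closed, 0.7)   # H*(0.7), compare with L(0.7)
```

On this example `star_value` and `sharp_value` agree, in line with the identity relating the two conventions, and `double_value` is close to `L(0.7)`, in line with $H^* = L$ for convex superlinear $L$. The grid must be wide enough to contain the maximizer, which is the main limitation of this brute-force approach.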

In classical mechanics, the Lagrangian L can depend also on a po-

sition coordinate x ∈ Rn, L(x, v), but for purposes of the Legendre

transform x is taken as a fixed parameter. In this case we write also

H(p, x) = L∗(p, x).

Proposition 41. Let $L(x, v)$ be a $C^2$ function, which for each $x$ fixed is uniformly convex and superlinear in $v$. Let $H = L^*$. Then

1. $H(p, x)$ is convex in $p$;

2. $H^* = L$;

3. for each $x$

\[ \lim_{|p| \to \infty} \frac{H(p, x)}{|p|} = \infty; \]

4. let $v^*$ be defined by $p = -D_vL(x, v^*)$, then

\[ H(p, x) = -v^* \cdot p - L(x, v^*); \]

5. in a similar way, let $p^*$ be given by $v = -D_pH(p^*, x)$, then

\[ L(x, v) = -v \cdot p^* - H(p^*, x); \]

6. if $p = -D_vL(x, v)$ or $v = -D_pH(p, x)$, then

\[ D_xL(x, v) = -D_xH(p, x). \]

Proof. The first statement follows from the fact that the supremum of convex functions is a convex function. To prove the second point, observe that

\begin{align*}
H^*(x, w) &= \sup_p \left[ -w \cdot p - H(p, x) \right] \\
&= \sup_p \inf_v \left[ (v - w) \cdot p + L(x, v) \right].
\end{align*}

For $v = w$ we conclude that

\[ H^*(x, w) \le L(x, w). \]

The opposite inequality is obtained by observing, since $L$ is convex in $v$, that for each $w \in \mathbb{R}^n$ there exists $s \in \mathbb{R}^n$ such that

\[ L(x, v) \ge L(x, w) + s \cdot (v - w). \]

Therefore,

\[ H^*(x, w) \ge \sup_p \inf_v \left[ (p + s) \cdot (v - w) + L(x, w) \right] \ge L(x, w), \]

by letting $p = -s$.

To prove the third point observe that

\[ \frac{H(p, x)}{|p|} \ge \lambda - \frac{L\left(x, -\lambda \frac{p}{|p|}\right)}{|p|}, \]

by choosing $v = -\lambda \frac{p}{|p|}$. Thus, we conclude

\[ \liminf_{|p| \to \infty} \frac{H(p, x)}{|p|} \ge \lambda. \]

Since $\lambda$ is arbitrary, we have

\[ \liminf_{|p| \to \infty} \frac{H(p, x)}{|p|} = \infty. \]

To establish the fourth point, note that for fixed $p$ the function

\[ v \mapsto v \cdot p + L(x, v) \]

is differentiable and strictly convex. Consequently, its minimum, which exists by coercivity and is unique by the strict convexity, is achieved for

\[ -p - D_vL(x, v) = 0. \]

Note also that $v$ as a function of $p$ is a differentiable function by the inverse function theorem.

The proof of the fifth point is similar.

Finally, to prove the last item, observe that for

\[ p(x, v) = -D_vL(x, v), \]

we have

\[ H(p(x, v), x) = -v \cdot p(x, v) - L(x, v). \]

Differentiating this last equation with respect to $x$ and using

\[ v = -D_pH(p(x, v), x), \]

we obtain

\[ D_xH = -D_xL. \]
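The differentiation in the last step of the proof of Proposition 41 can be written out. This is a sketch of the chain-rule computation, with $v$ held fixed and $D_xp$ denoting the derivative of $p(x, v)$ with respect to $x$ (notation introduced here).

```latex
\begin{align*}
D_pH\, D_xp + D_xH &= -v \cdot D_xp - D_xL(x, v), \\
\text{and, since } D_pH = -v, \qquad D_xH(p(x, v), x) &= -D_xL(x, v).
\end{align*}
```

The two terms containing $D_xp$ cancel precisely because of the relation $v = -D_pH$, so no information about how $p$ depends on $x$ is needed.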

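Exercise 68. Compute the Legendre transform of the following functions:

1.
\[ L(x, v) = \frac12\, a_{ij}(x)\, v_i v_j + h_i(x)\, v_i - U(x), \]
where $a_{ij}$ is a positive definite matrix and $h(x)$ an arbitrary vector field.

2.
\[ L(x, v) = \sqrt{a_{ij}(x)\, v_i v_j}, \]
where $a_{ij}$ is a positive definite matrix.

3.
\[ L(x, v) = \frac12 |v|^\lambda - U(x), \]
with $\lambda > 1$.

Exercise 69. By allowing the Lagrangian and its Legendre transform to assume the values $\pm\infty$, compute the Legendre transforms of

1. for $\omega \in \mathbb{R}^n$,
\[ L(v) = \begin{cases} 0 & \text{if } v = \omega \\ +\infty & \text{otherwise;} \end{cases} \]

2. for $\omega \in \mathbb{R}^n$,
\[ L(v) = \omega \cdot v; \]

3. for $R > 0$,
\[ L(v) = \begin{cases} 0 & \text{if } |v| \le R \\ +\infty & \text{otherwise.} \end{cases} \]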

4.2. Hamiltonian formalism. To motivate the Hamiltonian formalism, we consider the following alternative problem. Rather than looking for curves $x : [0, T] \to \mathbb{R}^n$ which minimize the action

\[ \int_0^T L(x, \dot x)\,dt, \]

we can consider extended curves $(x, v) : [0, T] \to \mathbb{R}^{2n}$ which minimize the action

\[ \int_0^T L(x, v)\,dt \tag{46} \]

and that satisfy the additional constraint $\dot x = v$. Obviously, this problem is equivalent to the original one; however, it motivates the introduction of a Lagrange multiplier $p$ in order to enforce the constraint. Therefore, we will look for critical points of

\[ \int_0^T L(x, v) + p \cdot (v - \dot x)\,dt. \tag{47} \]

Proposition 42. Let $L : \mathbb{R}^n \times \mathbb{R}^n \to \mathbb{R}$ be a smooth Lagrangian. Let $(x, v) : [0, T] \to \mathbb{R}^{2n}$ be a critical point of (46) under fixed boundary conditions and under the constraint $\dot x = v$ (the choice of $p$ is irrelevant since the corresponding term always vanishes). Let

\[ p = -D_vL(x, v). \]

Then the curve $(x, v, p)$ is a critical point of (47) under fixed boundary conditions. Additionally, any critical point $(x, v, p)$ of (47) satisfies

\begin{align*}
\dot x &= v \\
p &= -D_vL(x, v) \\
\dot p &= -D_xL(x, v).
\end{align*}

In particular, $x$ is a critical point of (46). Furthermore, the Euler-Lagrange equation can be rewritten as

\[ \dot p = D_xH(p, x), \qquad \dot x = -D_pH(p, x). \]

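Proof. Let $\varphi$, $\psi$ and $\eta$ be $C^2([0, T], \mathbb{R}^n)$ with compact support in $(0, T)$. Then, at $\varepsilon = 0$,

\begin{align*}
&\frac{d}{d\varepsilon} \int_0^T L(x + \varepsilon\varphi, v + \varepsilon\psi) + (p + \varepsilon\eta) \cdot (v - \dot x) + \varepsilon (p + \varepsilon\eta) \cdot (\psi - \dot\varphi) \\
&\quad = \int_0^T D_xL(x, \dot x)\varphi + D_vL\,\psi + p \cdot (\psi - \dot\varphi) + \eta \cdot (v - \dot x) \\
&\quad = \int_0^T \left[ D_xL(x, \dot x) + \dot p \right] \varphi = 0.
\end{align*}

If $p = -D_vL(x, v)$, then $v$ maximizes

\[ -p \cdot v - L(x, v). \]

Let

\[ H(p, x) = \max_v \left[ -p \cdot v - L(x, v) \right]. \]

By proposition 41 we have

\[ D_xH(p, x) = -D_xL(x, v) \]

whenever

\[ p = -D_vL(x, v). \]

Additionally, we also have

\[ v = -D_pH(p, x). \]

Therefore, the Euler-Lagrange equation can be rewritten as

\[ \dot p = D_xH(p, x), \qquad \dot x = -D_pH(p, x). \]

These are the Hamilton equations.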
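Because the sign convention here ($p = -D_vL$) differs from the one common in classical mechanics, a small numerical check may help. The sketch below integrates the Hamilton equations in this convention for a harmonic oscillator and compares with the known solution of the Euler-Lagrange equation. The choice $L(x, v) = v^2/2 - x^2/2$, for which $H(p, x) = p^2/2 + x^2/2$, as well as the function names and step size, are illustrative assumptions.

```python
# Hamilton's equations in the sign convention of the text:
#   p' = D_x H(p, x),   x' = -D_p H(p, x),   with p = -D_v L.
# Assumption: the harmonic oscillator L(x, v) = v^2/2 - x^2/2, whose
# Hamiltonian sup_v [-p v - L(x, v)] is H(p, x) = p^2/2 + x^2/2, is an
# illustrative choice.
import math

def H(p, x):
    return 0.5 * p * p + 0.5 * x * x

def flow(state):
    """Time derivative of (p, x): (D_x H, -D_p H) = (x, -p)."""
    p, x = state
    return (x, -p)

def rk4_step(state, dt):
    """One classical Runge-Kutta step for the pair (p, x)."""
    def add(s, k, c):
        return (s[0] + c * k[0], s[1] + c * k[1])
    k1 = flow(state)
    k2 = flow(add(state, k1, dt / 2))
    k3 = flow(add(state, k2, dt / 2))
    k4 = flow(add(state, k3, dt))
    return (state[0] + dt / 6 * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0]),
            state[1] + dt / 6 * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1]))

def solve(p0, x0, t_end, steps):
    state, dt = (p0, x0), t_end / steps
    for _ in range(steps):
        state = rk4_step(state, dt)
    return state

# x(0) = 1 and x'(0) = 0, hence p(0) = -x'(0) = 0.
# The Euler-Lagrange equation x'' = -x then gives x(t) = cos t, p(t) = sin t.
p_end, x_end = solve(0.0, 1.0, t_end=2.0, steps=2000)
```

The computed `x_end` matches $\cos 2$ and `p_end` matches $\sin 2 = -\dot x(2)$, consistent with $p = -D_vL$, while `H(p_end, x_end)` stays at its initial value $1/2$.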

Exercise 70. Suppose $H(p, x) : \mathbb{R}^n \times \mathbb{R}^n \to \mathbb{R}$ is a $C^1$ function. Show that the energy, which coincides with $H$, is conserved by the Hamiltonian flow since

\[ \frac{d}{dt} H(p, x) = 0. \]

4.3. Canonical transformations. Before discussing canonical transformations we need to review some basic facts about differential forms in $\mathbb{R}^n$. Firstly, recall that given a $C^1$ function $f : \mathbb{R}^n \to \mathbb{R}$ its differential, denoted by $df$, is a mapping $df : \mathbb{R}^n \times \mathbb{R}^n \to \mathbb{R}$ that to any point $x \in \mathbb{R}^n$ and each direction $v \in \mathbb{R}^n$ associates the derivative of $f$ in the direction $v$:

\[ df(x)(v) = \frac{d}{dt} f(x + vt) \Big|_{t=0}. \]

Note that for each $x \in \mathbb{R}^n$ this mapping is linear in $v$. For instance, for each coordinate $i \in \{1, \dots, n\}$ we can consider the projection function in this coordinate, $x \mapsto x_i$, whose differential is $dx_i$.

A (first order) differential form is any mapping

\[ \Lambda : \mathbb{R}^n \times \mathbb{R}^n \to \mathbb{R}, \]

which is linear in the second coordinate. For simplicity, we assume also that this map is continuous in the first coordinate. Clearly we can write

\[ \Lambda = \sum_i f_i(x)\,dx_i, \]

where $f_i(x) = \Lambda(x)(e_i)$.

An important example of a differential form is the differential $df$ of a $C^1$ function $f$. In fact, by linearity, we have

\[ df = \sum_i \frac{\partial f}{\partial x_i}\,dx_i. \]

The integral of a differential form $\Lambda$ along a path $\gamma : [0, T] \to \mathbb{R}^n$ is simply

\[ \int_0^T \Lambda(\gamma(t))(\dot\gamma(t))\,dt = \sum_i \int_0^T f_i(\gamma(t))\,\dot\gamma_i(t)\,dt. \]
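The definition of the integral of a form along a path translates directly into a quadrature rule. The sketch below uses the midpoint rule for a form in $\mathbb{R}^2$ and checks it on the differential of a function, where the integral must equal the difference of the endpoint values. The function $f$, the path $\gamma$ and the number of steps are illustrative assumptions.

```python
# Midpoint-rule approximation of the integral of a 1-form along a path.
# Assumption: the function f, the path gamma and the step count are
# illustrative choices.
import math

def f(x):
    return x[0] ** 2 * x[1]

def df(x):
    """Coefficients (f_1, f_2) of df = f_1 dx_1 + f_2 dx_2."""
    return (2 * x[0] * x[1], x[0] ** 2)

def gamma(t):
    return (math.cos(t), 1.0 + math.sin(t))

def gamma_dot(t):
    return (-math.sin(t), math.cos(t))

def integrate_form(coeffs, path, path_dot, T, steps=20000):
    """Approximates sum_i int_0^T f_i(gamma(t)) gamma_i'(t) dt."""
    dt = T / steps
    total = 0.0
    for k in range(steps):
        t = (k + 0.5) * dt
        c, v = coeffs(path(t)), path_dot(t)
        total += sum(ci * vi for ci, vi in zip(c, v)) * dt
    return total

T = 1.0
line_integral = integrate_form(df, gamma, gamma_dot, T)
endpoint_difference = f(gamma(T)) - f(gamma(0.0))
```

For an exact form such as $df$ the two numbers agree up to quadrature error; for a general form $\Lambda$ the same routine applies, but the result depends on the path and not only on its endpoints.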

Exercise 71 (Poincaré-Cartan invariant). Fix $t \in \mathbb{R}$ and consider a closed curve

\[ \gamma = (x(s, t), p(s, t)) : [0, 1] \to \mathbb{R}^{2n}. \]

Suppose that for each fixed $s \in [0, 1]$

\begin{align*}
\frac{d}{dt} x(s, t) &= -D_pH(p(s, t), x(s, t)) \\
\frac{d}{dt} p(s, t) &= D_xH(p(s, t), x(s, t)).
\end{align*}

Show that

\[ \oint p\,dx \equiv \int_0^1 p \cdot \frac{\partial x}{\partial s}\,ds \]

is independent of $t$.

Exercise 72. Show that the critical points of

\[ \int_0^T p\,dx + H(p, x)\,dt \]

under fixed boundary conditions satisfy the Hamilton equations.


Let $(x, p)$ be a solution of the Hamilton equations. By exercise 72, $(x, p)$ is a critical point of

\[ \int p\,dx + H\,dt. \]

Let $S(x, p) : \mathbb{R}^{2n} \to \mathbb{R}$ be a $C^1$ function. Then $(x, p)$ is also a critical point of

\[ \int p\,dx + H\,dt - dS, \]

because the last integral differs from the previous one only by the addition of the differential of a function $S$. Consider now a change of coordinates $P(x, p)$, $X(x, p)$. In general the functional $\int p\,dx + H\,dt - dS$, when rewritten in terms of the new coordinates $(P, X)$, does not have the form $\int P\,dX + \bar H(P, X)\,dt$, and, therefore, the Hamilton equations in these new coordinates may not have the standard form. A change of coordinates $(x, p) \mapsto (X(x, p), P(x, p))$ is called canonical if there exist functions $S$ and $\bar H(P, X)$ such that

\[ p\,dx + H\,dt - dS = P\,dX + \bar H\,dt. \tag{48} \]

Consider now a solution $(x, p) : [0, T] \to \mathbb{R}^{2n}$ of Hamilton's equations. Suppose the coordinate change $(x, p) \mapsto (X(x, p), P(x, p))$ is canonical. Then the trajectory written in the new coordinates $(X, P)$ is a critical point of the functional

\[ \int_0^T P\,dX + \bar H\,dt. \]

Therefore $(X, P)$ satisfies Hamilton's equations in the new coordinates, which are

\[ \dot P = D_X \bar H(P, X), \qquad \dot X = -D_P \bar H(P, X). \tag{49} \]

Thus, in order to have (48), we must have (because the change of coordinates does not depend on $t$)

\[ H(p, x) = \bar H(P(p, x), X(p, x)). \]

From this we conclude that

\[ p\,dx - P\,dX = dS. \]

Suppose now we can write the function $S$ in terms of $x$ and $X$, that is, $S \equiv S(x, X)$. Then

\[ p = D_xS, \qquad P = -D_XS. \tag{50} \]

Consider now the inverse procedure. Given $S(x, X)$, suppose that (50) defines a change of coordinates (for this to happen locally it is sufficient, by the implicit function theorem, that $\det D^2_{xX}S \neq 0$). Then, in these new coordinates we have (49). Since $S$ determines (at least formally) the change of coordinates, we call it a generating function.

Example 19. Consider the generating function $S(x, X) = xX$. Then the corresponding canonical transformation is $p = X$, $P = -x$. Thus, $(x, p) \mapsto (X, P) = (p, -x)$ and, with the arguments of $H$ ordered as $(p, x)$, $\bar H(P, X) = H(X, -P)$.

Suppose now that S, written as a function of (x, P ), is:

S(x, P ) = −PX + S1(x, P ).

Then (48) can be written as:

$$p\,dx + P\,dX + X\,dP - D_xS_1\,dx - D_PS_1\,dP = P\,dX,$$
that is,
$$p = D_xS_1, \qquad X = D_PS_1.$$
Example 20. Let $S_1(x, P) = xP$. Then $p = P$ and $X = x$, therefore $S_1$ generates the identity transformation.

Exercise 73. Assume now that S can be written as a function of X

and p and that we have

S(X, p) = px+ S2(X, p).

Determine the corresponding canonical transformation in terms of S2.

Exercise 74. Suppose that S can be written as a function of p and P

with the following form:

S(p, P ) = px− PX + S3(p, P ).

Determine the corresponding canonical transformation in terms of S3.


Example 21. Consider the Hamiltonian

H ≡ H(px, py, x− y).

Choosing

$$S_1 = P_1(x + y) + P_2(x - y)$$
we obtain
$$p_x = P_1 + P_2, \qquad p_y = P_1 - P_2,$$
$$X_1 = x + y, \qquad X_2 = x - y,$$
and
$$\bar H(P_1, P_2, X_1, X_2) \equiv \bar H(P_1, P_2, X_2) = H(P_1 + P_2,\ P_1 - P_2,\ X_2),$$
which does not depend on $X_1$ and, therefore, $P_1 = (p_x + p_y)/2$, half the total linear momentum, is conserved.
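The conservation statement of Example 21 can be illustrated numerically. The following is a minimal sketch under assumptions that are not in the text: the concrete Hamiltonian $H = p_x^2/2 + p_y^2/2 + W(x - y)$ with $W(s) = s^2/2 + s^4/4$, the Runge-Kutta integrator and the initial data are all hypothetical choices. The equations of motion follow the sign convention of the text ($\dot x = -D_pH$, $\dot p = D_xH$).

```python
def W_prime(s):
    # Derivative of the sample interaction potential W(s) = s^2/2 + s^4/4 (assumed).
    return s + s ** 3

def field(state):
    # H = px^2/2 + py^2/2 + W(x - y), with x' = -D_p H and p' = D_x H.
    x, y, px, py = state
    f = W_prime(x - y)
    return (-px, -py, f, -f)

def rk4_step(state, dt):
    # One classical fourth-order Runge-Kutta step.
    def shift(s, k, c):
        return tuple(si + c * ki for si, ki in zip(s, k))
    k1 = field(state)
    k2 = field(shift(state, k1, dt / 2))
    k3 = field(shift(state, k2, dt / 2))
    k4 = field(shift(state, k3, dt))
    return tuple(s + dt / 6 * (a + 2 * b + 2 * c + d)
                 for s, a, b, c, d in zip(state, k1, k2, k3, k4))

def momentum_drift(state, dt=1e-3, steps=5000):
    # Largest deviation of px + py from its initial value along the orbit.
    p_total0 = state[2] + state[3]
    worst = 0.0
    for _ in range(steps):
        state = rk4_step(state, dt)
        worst = max(worst, abs(state[2] + state[3] - p_total0))
    return worst

print(momentum_drift((1.0, 0.0, 0.3, -0.5)))
```

Because $H$ depends on the positions only through $x - y$, the forces on the two momenta cancel, so the drift should stay at the level of rounding error.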

Example 22. Let S1(x, P ) be a C2 solution of the Hamilton-Jacobi

equation

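$$H(D_xS_1(x, P), x) = \bar H(P).$$
Suppose that
$$X = D_PS_1(x, P), \qquad p = D_xS_1(x, P)$$
defines implicitly a change of coordinates $(x, p) \mapsto (X,P)$. Assume that $\det D^2_{xP}S_1 \neq 0$. Then, if $(\mathbf x(t),\mathbf p(t))$ satisfy
$$\dot{\mathbf x} = -D_pH(\mathbf p,\mathbf x), \qquad \dot{\mathbf p} = D_xH(\mathbf p,\mathbf x),$$
in the new coordinates we have
$$\dot{\mathbf X} = -D_P\bar H(\mathbf P), \qquad \dot{\mathbf P} = 0.$$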

Example 23. Consider a Hamiltonian H(p, x) with one degree of free-

dom, that is x, p ∈ R. We would like to construct a canonical change

of coordinates such that the new Hamiltonian depends only on P . We

will first construct the corresponding generating function. For that,

suppose that there exists such a generating function S1(x, P ). Then

$$dS_1 = X\,dP + p\,dx.$$
Fix a value $P$. We will try to choose $S_1$ so that the new Hamiltonian $\bar H$ depends only on $P$, that is, $H(p(P,X), x(P,X)) = \bar H(P)$. Along each curve $\gamma = (\mathbf x,\mathbf p) : [0, T] \to \mathbb{R}^2$ such that $P$ is constant, we have
$$dS_1 = p\,dx.$$
Therefore,
$$S_1(\mathbf x(T), P) - S_1(\mathbf x(0), P) = \int_0^T \mathbf p(t) \cdot \dot{\mathbf x}(t)\,dt.$$
In principle, from the equation $H(p, x) = \bar H(P)$ we can solve for $p$ as a function of $x$ and of the value $\bar H(P)$. In this case, the generating function is automatically determined as a function of $\bar H$ and of $x$. In the following example we consider a concrete application of this technique.

Example 24. Consider the Hamiltonian system with one degree of

freedom:

$$H(p, x) = \frac{p^2}{2} + V(x),$$
with $V(x)$ $2\pi$-periodic. For each value of $\bar H(P)$ we have (assuming for definiteness $p > 0$)
$$S_1(x, P) = \int_0^x \sqrt{2(\bar H(P) - V(y))}\,dy.$$
Therefore,
$$X = \int_0^x \frac{\partial}{\partial \bar H}\sqrt{2(\bar H(P) - V(y))}\;D_P\bar H(P)\,dy.$$

In principle, the function H(P ) can be more or less arbitrary. To

impose uniqueness it is convenient to require periodicity in the change

of variables

X(0, P ) = X(2π, P ),

which implies

$$D_P\bar H(P) = \left[\frac{\partial}{\partial \bar H}\int_0^{2\pi}\sqrt{2[\bar H(P) - V(y)]}\,dy\right]^{-1}.$$


Exercise 75. Show that the polar coordinates change of variables (x, p) =

(r cos θ, r sin θ) is not canonical. Determine a function g(r) such that

(x, p) = (g(r) cos θ, g(r) sin θ) is a canonical transformation (for r > 0).

4.4. Other variational principles. In the case of Hamiltonian

systems, as the next exercise shows, there exists an additional varia-

tional principle:

Exercise 76. Show that the critical points $(\mathbf x,\mathbf p)$ of the functional
$$\int_0^T \frac{\mathbf p\cdot\dot{\mathbf x} - \mathbf x\cdot\dot{\mathbf p}}{2} + H(\mathbf p,\mathbf x)$$
are solutions to the Hamilton equation.

Unfortunately the functional of the previous exercise is not coercive in $W^{1,2}$ and may not have any minimizer. The Clarke duality principle (following exercise) is another variational principle for convex Hamiltonians which is coercive.

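Exercise 77 (Clarke duality). Let $H(p, x) : \mathbb{R}^{2n} \to \mathbb{R}$ be a $C^\infty$ function, strictly convex and coercive, both in $x$ and $p$. Let $H^*(w_x, w_p) : \mathbb{R}^{2n} \to \mathbb{R}$ be the total Legendre transform
$$H^*(w_x, w_p) = \sup_{x,p}\; -w_x\cdot x - w_p\cdot p - H(p, x).$$
Let $(\mathbf v_x,\mathbf v_p)$ be a critical point of
$$\int_0^T \frac12\left[\dot{\mathbf v}_x\cdot\mathbf v_p - \dot{\mathbf v}_p\cdot\mathbf v_x\right] + H^*(\dot{\mathbf v}_x, \dot{\mathbf v}_p).$$
Show that
$$\mathbf x = -D_{v_x}H^*(\dot{\mathbf v}_x, \dot{\mathbf v}_p), \qquad \mathbf p = -D_{v_p}H^*(\dot{\mathbf v}_x, \dot{\mathbf v}_p)$$
is a solution of Hamilton’s equations.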

Exercise 78. Apply the previous exercise to the Hamiltonian

$$H(p, x) = \frac{p^2 + x^2}{2}.$$


Example 25 (Maupertuis principle). Consider a system with Lagrangian

L and energy given by

$$E(x, \dot x) = D_vL(x, \dot x)\dot x - L(x, \dot x).$$
Since the energy is conserved by the solutions of the Euler-Lagrange equation, the critical points of the action are also critical points of the functional
$$\int_0^T L + E = \int_0^T D_vL(\mathbf x, \dot{\mathbf x})\dot{\mathbf x},$$
under the constraint that energy is conserved.

Obviously, in general it is hard to construct energy-preserving variations. We are going to illustrate, in an example, how to avoid this problem. Let $L$ be the Lagrangian
$$L(x, v) = \frac12 g_{ij}v_iv_j - U(x).$$
Then,
$$E = \frac12 g_{ij}v_iv_j + U(x)$$
and
$$D_vL\,v = g_{ij}v_iv_j.$$
Thus we can write
$$D_vL\,v = 2\,(E - U(x)).$$
Therefore the functional can be rewritten as

(51) $M(\mathbf x, E) = \int_0^T \sqrt{2\,(E - U(\mathbf x))}\,\sqrt{g_{ij}\dot{\mathbf x}_i\dot{\mathbf x}_j}\,dt.$

The last term represents the arc length along the curve that connects

x(0) to x(T ). This integral is independent of the parametrization and

therefore we can look at its critical points (without any constraint)

which obviously depend on the parameter E. Then, once determined,

in principle we can choose a parametrization of the curve that pre-

serves the energy. The next exercise shows that such critical points are

solutions to the Euler-Lagrange equation:


Exercise 79. Let x be a critical point of M(x, E0) parametrized in

such a way that

$$E(\mathbf x, \dot{\mathbf x}) = E_0.$$
Show that $\mathbf x$ is a solution of the Euler-Lagrange equation.

5. Sufficient conditions

This section addresses a very classical topic in the calculus of vari-

ations, namely the study of conditions that ensure that a solution to

the Euler-Lagrange equation is indeed a minimizer.

5.1. Existence of minimizers. In general, it is not possible to guarantee that a solution to the Euler-Lagrange equation is a minimizer of the action. However, for short time, the next theorem settles this issue.

Theorem 43 (Existence of minimizers). Let $L(x, v)$ be strictly convex in $v$ satisfying
$$|D^2_{xx}L| \le C, \qquad |D^2_{xv}L| \le C.$$

Let x : [0, T ]→ Rn be a solution to the Euler-Lagrange equation. Then,

for T sufficiently small, x is a minimizer of the action over all C1

functions y with the same endpoints: y(0) = x(0), and y(T ) = x(T ).

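Proof. Observe that if $f$ is a $C^2$ function then
$$f(1) = f(0) + f'(0) + \int_0^1\!\!\int_0^s f''(r)\,dr\,ds.$$
Applying this identity to
$$f(r) = L((1-r)\mathbf x + r\mathbf y,\ (1-r)\dot{\mathbf x} + r\dot{\mathbf y}),$$
we obtain
$$\begin{aligned}
\int_0^T L(\mathbf y, \dot{\mathbf y})\,dt = \int_0^T \Big[ & L(\mathbf x, \dot{\mathbf x}) + D_xL(\mathbf x, \dot{\mathbf x})(\mathbf y - \mathbf x) + D_vL(\mathbf x, \dot{\mathbf x})(\dot{\mathbf y} - \dot{\mathbf x}) \\
& + \int_0^1\!\!\int_0^s \big[ (\mathbf y - \mathbf x)^T D^2_{xx}L(\cdot)(\mathbf y - \mathbf x) + 2(\mathbf y - \mathbf x)^T D^2_{xv}L(\cdot)(\dot{\mathbf y} - \dot{\mathbf x}) \\
& \qquad\qquad + (\dot{\mathbf y} - \dot{\mathbf x})^T D^2_{vv}L(\cdot)(\dot{\mathbf y} - \dot{\mathbf x}) \big]\,dr\,ds \Big]\,dt,
\end{aligned}$$
where the second derivatives are evaluated at $((1-r)\mathbf x + r\mathbf y,\ (1-r)\dot{\mathbf x} + r\dot{\mathbf y})$. Since $\mathbf x$ satisfies the Euler-Lagrange equation and, by strict convexity, $D^2_{vv}L \ge \gamma$, we have
$$\begin{aligned}
\int_0^T L(\mathbf y, \dot{\mathbf y})\,dt \ge \int_0^T \Big[ & L(\mathbf x, \dot{\mathbf x}) + \int_0^1\!\!\int_0^s \big( (\mathbf y - \mathbf x)^T D^2_{xx}L(\cdot)(\mathbf y - \mathbf x) \\
& + 2(\mathbf y - \mathbf x)^T D^2_{xv}L(\cdot)(\dot{\mathbf y} - \dot{\mathbf x}) \big)\,dr\,ds + \gamma|\dot{\mathbf y} - \dot{\mathbf x}|^2 \Big]\,dt.
\end{aligned}$$
The one-dimensional Poincaré inequality implies
$$\int_0^T |\mathbf y - \mathbf x|^2\,dt \le \frac{T^2}{2}\int_0^T |\dot{\mathbf y} - \dot{\mathbf x}|^2\,dt,$$
that is,
$$\int_0^T\!\!\int_0^1\!\!\int_0^s (\mathbf y - \mathbf x)^T D^2_{xx}L(\cdot)(\mathbf y - \mathbf x)\,dr\,ds\,dt \ge -CT^2\int_0^T |\dot{\mathbf y} - \dot{\mathbf x}|^2.$$
Thus, for any $\varepsilon > 0$,
$$\begin{aligned}
\int_0^T\!\!\int_0^1\!\!\int_0^s (\mathbf y - \mathbf x)^T D^2_{vx}L(\cdot)(\dot{\mathbf y} - \dot{\mathbf x})\,dr\,ds\,dt
& \ge -\varepsilon\int_0^T |\dot{\mathbf y} - \dot{\mathbf x}|^2 - \frac{C}{\varepsilon}\int_0^T |\mathbf y - \mathbf x|^2 \\
& \ge -\left(\varepsilon + \frac{CT^2}{\varepsilon}\right)\int_0^T |\dot{\mathbf y} - \dot{\mathbf x}|^2.
\end{aligned}$$
Thus, choosing $T$ sufficiently small and taking $\varepsilon = T$ we obtain
$$\int_0^T L(\mathbf y, \dot{\mathbf y})\,dt \ge \int_0^T L(\mathbf x, \dot{\mathbf x}) + \theta\int_0^T |\dot{\mathbf y} - \dot{\mathbf x}|^2,$$
for some $\theta > 0$.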

Exercise 80. Prove the one-dimensional Poincaré inequality
$$\int_0^T \varphi^2 \le \frac{T^2}{2}\int_0^T |\dot\varphi|^2$$
for all $C^1$ functions $\varphi$ satisfying $\varphi(0) = \varphi(T) = 0$.
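Before attempting the proof, the inequality of Exercise 80 can be tested on explicit functions. The sketch below is an illustration only: the midpoint quadrature, the function name `poincare_sides` and the sample functions are assumptions made here, not part of the text, and a numerical check on examples does not replace the proof.

```python
import math

def poincare_sides(phi, dphi, T, n=20000):
    # Midpoint-rule approximations of both sides of
    #   int_0^T phi^2  <=  (T^2 / 2) * int_0^T |phi'|^2.
    h = T / n
    lhs = sum(phi((k + 0.5) * h) ** 2 for k in range(n)) * h
    rhs = 0.5 * T * T * sum(dphi((k + 0.5) * h) ** 2 for k in range(n)) * h
    return lhs, rhs

T = 2.0
# Two sample functions vanishing at 0 and T (hypothetical choices).
samples = [
    (lambda t: math.sin(math.pi * t / T),
     lambda t: math.pi / T * math.cos(math.pi * t / T)),
    (lambda t: t * (T - t),
     lambda t: T - 2 * t),
]
for phi, dphi in samples:
    lhs, rhs = poincare_sides(phi, dphi, T)
    print(lhs, rhs, lhs <= rhs)
```

For both samples the left-hand side is well below the right-hand side, which suggests that the constant $T^2/2$ is not sharp.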

Exercise 81. Suppose that the Lagrangian L instead of satisfying

$$|D^2_{xx}L| \le C, \qquad |D^2_{xv}L| \le C,$$
as in theorem 43, satisfies
$$|D^2_{xx}L| \le C(1 + |v|^2), \qquad |D^2_{xv}L| \le C(1 + |v|).$$

Assume further that the curves y are constrained to have bounded

derivatives in L2. Can you adapt the proof and the statement of theo-

rem 43 to include this case?

5.2. Hamilton-Jacobi equations.

Theorem 44. Let V (x, t) be a C2 solution of the Hamilton-Jacobi

equation

(52) $V_t = H(D_xV, x),$

for $0 \le t \le T$. Let $\mathbf x$ be a solution to the equation
$$\dot{\mathbf x} = -D_pH(D_xV(\mathbf x), \mathbf x).$$
Then $\mathbf x$ is a solution to the Euler-Lagrange equation
$$\frac{d}{dt}D_vL - D_xL = 0$$
which minimizes the action

(53) $\int_0^T L(\mathbf x, \dot{\mathbf x})\,dt,$

under fixed boundary conditions.

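Proof. It suffices to show that the trajectory $\mathbf x$ minimizes the action, since then it is automatically a solution to the Euler-Lagrange equation. Observe that the problem of minimizing (53) with fixed endpoints is equivalent to minimizing
$$\int_0^T L(\mathbf x, \dot{\mathbf x}) + \dot{\mathbf x}D_xV(\mathbf x) + V_t\,dt,$$
with the same endpoint constraint, because the added terms integrate to $V(\mathbf x(T), T) - V(\mathbf x(0), 0)$, which depends only on the endpoints. Along the trajectory $\mathbf x$ we have
$$\int_0^T L(\mathbf x, \dot{\mathbf x}) + \dot{\mathbf x}D_xV(\mathbf x) + V_t\,dt = 0.$$
But for any other trajectory $\mathbf y$ we have
$$\begin{aligned}
L(\mathbf y, \dot{\mathbf y}) + \dot{\mathbf y}D_xV(\mathbf y) & \ge L(\mathbf y, -D_pH(D_xV(\mathbf y),\mathbf y)) - D_pH(D_xV(\mathbf y),\mathbf y)D_xV(\mathbf y) \\
& = -H(D_xV(\mathbf y),\mathbf y),
\end{aligned}$$
and, therefore,
$$\int_0^T L(\mathbf y, \dot{\mathbf y}) + \dot{\mathbf y}D_xV(\mathbf y) + V_t(\mathbf y)\,dt \ge 0.$$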

To solve the Hamilton-Jacobi equation we can use the method of characteristics: let $(\mathbf p,\mathbf x)$ be a solution of Hamilton’s equation:

(54) $\dot{\mathbf p} = D_xH(\mathbf p,\mathbf x), \qquad \dot{\mathbf x} = -D_pH(\mathbf p,\mathbf x),$

with initial data $(\mathbf p(0),\mathbf x(0)) = (D_xV(x), x)$. Then
$$V(\mathbf x(0), 0) - V(\mathbf x(t), t) = \int_0^t L(\mathbf x, \dot{\mathbf x})\,ds.$$
Therefore, in order for the method of characteristics to yield a solution to the Hamilton-Jacobi equation in a neighborhood of the trajectory, we must have that the mapping
$$x \mapsto \mathbf x(t; x)$$
is invertible. As it was seen previously, the equation (54) is equivalent to the Euler-Lagrange equation:
$$\frac{d}{dt}D_vL(\mathbf x, \dot{\mathbf x}) - D_xL = 0.$$
The derivative of this equation with respect to a parameter is the Jacobi equation

(55) $\frac{d}{dt}\left[D^2_{vv}L\,\dot Y + D^2_{xv}L\,Y\right] - D^2_{xv}L\,\dot Y - D^2_{xx}L\,Y = 0.$

If $Y(0) = I$ then there exists $T > 0$ such that $\det Y(t) \neq 0$ for all $0 \le t < T$, and therefore the method of characteristics yields a local solution of the Hamilton-Jacobi equation.

5.3. Existence and regularity of minimizers. In this section

we assume that the Lagrangian L(x, v) is C∞, strictly convex in v,

satisfies

(56) $-C + \theta|v|^2 \le L(x, v) \le C(1 + |v|^2),$

for $\theta > 0$, and that, for each fixed compact $K$ and $x \in K$ we have

(57) $|D_xL(x, v)| \le C_K(1 + |v|^2),$

(58) $|D_vL(x, v)| \le C_K(1 + |v|).$

Theorem 45. Suppose $L(x, v) : \mathbb{R}^n \times \mathbb{R}^n \to \mathbb{R}$ is smooth and satisfies (56), (57) and (58). Then, for any $T > 0$ and any $x_0, x_1 \in \mathbb{R}^n$, there exists a minimizer $\mathbf x \in W^{1,2}[0, T]$ of

(59) $\int_0^T L(\mathbf x, \dot{\mathbf x})\,ds$

satisfying $\mathbf x(0) = x_0$, $\mathbf x(T) = x_1$.


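Proof. Let $\mathbf x_n$ be a minimizing sequence. Then, using (56) we conclude that $\|\dot{\mathbf x}_n\|_{L^2}$ is uniformly bounded. By Poincaré inequality, we conclude that
$$\sup_n \|\mathbf x_n\|_{W^{1,2}} < \infty.$$
By Morrey’s theorem, the sequence $\mathbf x_n$ is equicontinuous and bounded (since $\mathbf x_n(0)$ is fixed), thus there exists, by Ascoli-Arzelà theorem, a subsequence which converges uniformly. We can extract a further subsequence that converges weakly in $W^{1,2}$ to a function $\mathbf x$. We would like to prove that $\mathbf x$ is a minimum. To do that it is enough to prove that the functional is weakly lower semicontinuous, that is, that

(60) $\liminf_{n\to\infty}\int_0^T L(\mathbf x_n, \dot{\mathbf x}_n) \ge \int_0^T L(\mathbf x, \dot{\mathbf x}),$

whenever $\mathbf x_n \rightharpoonup \mathbf x$ in $W^{1,2}$. By contradiction suppose that there is a sequence $\mathbf x_n \rightharpoonup \mathbf x$ such that

(61) $\liminf_{n\to\infty}\int_0^T L(\mathbf x_n, \dot{\mathbf x}_n) < \int_0^T L(\mathbf x, \dot{\mathbf x}).$

By convexity,

(62) $\int_0^T L(\mathbf x_n, \dot{\mathbf x}_n) \ge \int_0^T L(\mathbf x_n, \dot{\mathbf x}_n) - L(\mathbf x, \dot{\mathbf x}_n) + L(\mathbf x, \dot{\mathbf x}) + D_vL(\mathbf x, \dot{\mathbf x})(\dot{\mathbf x}_n - \dot{\mathbf x}).$

Because $\mathbf x_n \rightharpoonup \mathbf x$ we have
$$\int_0^T D_vL(\mathbf x, \dot{\mathbf x})(\dot{\mathbf x}_n - \dot{\mathbf x}) \to 0,$$
since $D_vL(\mathbf x, \dot{\mathbf x}) \in L^2$. From the uniform convergence of $\mathbf x_n$ to $\mathbf x$ we conclude that
$$\int_0^T L(\mathbf x_n, \dot{\mathbf x}_n) - L(\mathbf x, \dot{\mathbf x}_n) \to 0,$$
since
$$|L(\mathbf x_n, \dot{\mathbf x}_n) - L(\mathbf x, \dot{\mathbf x}_n)| \le C_K|\mathbf x_n - \mathbf x|(1 + |\dot{\mathbf x}_n|^2).$$
Thus by taking the lim inf in (62) we obtain a contradiction to (61), and therefore (60) holds.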


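Theorem 46. Let $\mathbf x$ be a minimizer of (59). Then $\mathbf x$ is a weak solution to the Euler-Lagrange equation, that is, for all $\varphi \in C^\infty_c(0, T)$,

(63) $\int_0^T D_xL(\mathbf x, \dot{\mathbf x})\varphi + D_vL(\mathbf x, \dot{\mathbf x})\dot\varphi = 0.$

Proof. To obtain this result, it is enough to prove that
$$\frac{d}{d\varepsilon}\int_0^T L(\mathbf x + \varepsilon\varphi, \dot{\mathbf x} + \varepsilon\dot\varphi)\,\Big|_{\varepsilon=0} = \int_0^T \frac{d}{d\varepsilon}L(\mathbf x + \varepsilon\varphi, \dot{\mathbf x} + \varepsilon\dot\varphi)\,\Big|_{\varepsilon=0},$$
that is, justify the exchange of the derivative with the integral.

By Morrey’s theorem, since $\mathbf x \in W^{1,2}(0, T)$, we have $\|\mathbf x\|_{L^\infty} \le C$. So $\mathbf x \in K$ for a suitable compact set $K$. Let $|\varepsilon| < 1$. Observe that there exists a compact $\tilde K \supset K$ such that $\mathbf x + \varepsilon\varphi \in \tilde K$ for all $t$. For almost every $t \in [0, T]$, the function
$$\varepsilon \mapsto L(\mathbf x + \varepsilon\varphi, \dot{\mathbf x} + \varepsilon\dot\varphi)$$
is a $C^1$ function of $\varepsilon$. Furthermore
$$|L(\mathbf x + \varepsilon\varphi, \dot{\mathbf x} + \varepsilon\dot\varphi)| \le C_{\tilde K}(1 + |\dot{\mathbf x} + \varepsilon\dot\varphi|^2) \le C_{\tilde K}(1 + |\dot{\mathbf x}|^2 + |\dot\varphi|^2),$$
and
$$\left|\frac{d}{d\varepsilon}L(\mathbf x + \varepsilon\varphi, \dot{\mathbf x} + \varepsilon\dot\varphi)\right| \le C_{\tilde K}(1 + |\dot{\mathbf x}|^2 + |\dot\varphi|^2)(|\varphi| + |\dot\varphi|).$$
This estimate allows us to exchange the derivative with the integral.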

Exercise 82. Show that the identity (63) also holds for $\varphi \in W^{1,2}_0$.

Theorem 47. Suppose L(x, v) : Rn×Rn → R is smooth, satisfies (56)

and it is strictly convex. Then the weak solutions to the Euler-Lagrange

equation are C2 and, therefore, classical solutions.

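Proof. Let $\mathbf x \in W^{1,2}(0, T)$ be a weak solution to the Euler-Lagrange equation. Define
$$\mathbf p(t) = p_0 + \int_t^T D_xL(\mathbf x, \dot{\mathbf x})\,ds,$$
with $p_0 \in \mathbb{R}^n$ to be chosen later. For each $\varphi \in C^\infty_c(0, T)$ taking values in $\mathbb{R}^n$ we have
$$\int_0^T \frac{d}{dt}(\mathbf p\cdot\varphi)\,dt = \mathbf p\cdot\varphi\,\Big|_0^T = 0.$$
Thus,
$$\int_0^T -D_xL(\mathbf x, \dot{\mathbf x})\varphi + \mathbf p\dot\varphi\,dt = 0.$$
Using the Euler-Lagrange equation in the weak form we conclude that
$$\int_0^T (\mathbf p + D_vL(\mathbf x, \dot{\mathbf x}))\dot\varphi\,dt = 0,$$
which implies that $\mathbf p + D_vL$ is constant, that is,
$$\mathbf p = -D_vL(\mathbf x, \dot{\mathbf x}),$$
choosing $p_0$ conveniently. Since $\mathbf p$ is continuous, by the previous identity, $\dot{\mathbf x} = -D_pH(\mathbf p,\mathbf x)$. Therefore, $\dot{\mathbf x}$ is continuous. Moreover, if $H(p, x)$ is the Hamiltonian associated to $L$, we have
$$\dot{\mathbf p} = D_xH(\mathbf p,\mathbf x),$$
which shows that $\mathbf p$ is $C^1$. But, since
$$\dot{\mathbf x} = -D_pH(\mathbf p,\mathbf x),$$
we have that $\dot{\mathbf x}$ is $C^1$ and, therefore, $\mathbf x$ is $C^2$.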

5.4. Conjugate points. In this section we study the second vari-

ation of the action and certain issues concerning the existence of mini-

mizing trajectories. If the Lagrangian corresponds to the kinetic energy

in a Riemannian Manifold we also study the connections with curva-

ture.

5.4.1. Second variation and conjugate points. The next exercise es-

tablishes the connection between Jacobi equation (55) and the second

variation:

Exercise 83. Let $\mathbf x : [0, T] \to \mathbb{R}^n$. Consider the functional
$$Y \mapsto \int_0^T \frac12 D^2_{v_iv_j}L\,\dot Y_i\dot Y_j + D^2_{x_iv_j}L\,Y_i\dot Y_j + \frac12 D^2_{x_ix_j}L\,Y_iY_j.$$
Show that the Euler-Lagrange equation is Jacobi equation (55). Show that if $Y$ is a solution of the Jacobi equation with $Y(0) = Y(T) = 0$ then

(64) $\int_0^T \frac12 D^2_{v_iv_j}L\,\dot Y_i\dot Y_j + D^2_{x_iv_j}L\,Y_i\dot Y_j + \frac12 D^2_{x_ix_j}L\,Y_iY_j = 0.$

Let $\mathbf x$ be a solution of the Euler-Lagrange equation corresponding to the Lagrangian $L$. A point $\mathbf x(T)$ is conjugate to $\mathbf x(0)$ if there exists a non vanishing solution of (55) satisfying $Y(0) = Y(T) = 0$. The dimension of the space of solutions $Y$ to the Jacobi equation which satisfy $Y(0) = 0$ is $n$. Similarly, the dimension of the space of solutions $Y$ to the Jacobi equation which satisfy $Y(T) = 0$ is also $n$. Since the dimension of the space of solutions to the Jacobi equation is $2n$, in general the intersection of these two spaces is 0-dimensional, i.e. it only contains the trivial solution.

Exercise 84. Let $\mathbf x$ be a solution to the Euler-Lagrange equation. Show that $Y(t) = \frac{d}{d\lambda}\mathbf x(\lambda t)\big|_{\lambda=0}$ is a solution to the Jacobi equation satisfying $Y(0) = 0$.

Suppose $L = \frac12 g_{ij}v_iv_j$ for some Riemannian metric $g$. Show that $Y(t) \neq 0$, for all $t \neq 0$. Conclude that the space of solutions $Y$ to the Jacobi equation which satisfy $Y(0) = Y(T) = 0$ has dimension at most $n - 1$.

Theorem 48. Let $L(x, v)$ be a $C^\infty$ Lagrangian, strictly convex and coercive. Let $\mathbf x$ be a solution of the Euler-Lagrange equation corresponding to the Lagrangian $L$. Let $T$ be such that $\mathbf x(T)$ is conjugate to $\mathbf x(0)$. Then the trajectory $\mathbf x$ is not a local minimum of the action
$$\int_0^{T_1} L(\mathbf x, \dot{\mathbf x})$$
for $T_1 > T$.

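Proof. Let $Y$ be a non-trivial solution of the Jacobi equation with $Y(0) = Y(T) = 0$. For each $\varepsilon > 0$ consider the trajectory
$$\mathbf x_\varepsilon = \begin{cases} \mathbf x + \varepsilon Y & \text{if } 0 \le t \le T \\ \mathbf x & \text{otherwise.} \end{cases}$$
For each $\delta > 0$, computing the Taylor expansion up to second order and taking into account (64) we obtain
$$\int_0^{T+\delta} L(\mathbf x_\varepsilon, \dot{\mathbf x}_\varepsilon) \le \int_0^{T+\delta} L(\mathbf x, \dot{\mathbf x}) + O(\varepsilon^3).$$
However, if the sign of the term $O(\varepsilon^3)$ is negative, we obtain a contradiction; if it is positive, by replacing $Y$ by $-Y$, we are in the previous situation. Therefore the only non-trivial case occurs when the third order term vanishes and we have:
$$\int_0^{T+\delta} L(\mathbf x_\varepsilon, \dot{\mathbf x}_\varepsilon) \le \int_0^{T+\delta} L(\mathbf x, \dot{\mathbf x}) + O(\varepsilon^4).$$
Let $\varphi$ be defined in the following way: $\varphi(t) = \varepsilon Y(t)$ if $0 \le t \le T - \delta$, $\varphi(t) = 0$ for $t > T + \delta$, and $\varphi$ is linear in $t$ for $T - \delta \le t \le T + \delta$, interpolating between the values $\varphi(T - \delta) = \varepsilon Y(T - \delta)$ and $0 = \varphi(T + \delta)$. We would like to show that
$$\int_0^{T+\delta} L(\mathbf x + \varphi, \dot{\mathbf x} + \dot\varphi) < \int_0^{T+\delta} L(\mathbf x, \dot{\mathbf x}),$$
if $\varepsilon$ and $\delta$ are chosen conveniently. For that we will proceed to prove a series of estimates. To simplify notation, and for reasons that will be clear later on, we assume that $\delta = \varepsilon^{3/2}$ and we will use the relation $a \sim b$ to denote $a = b + O(\varepsilon^4)$, and similarly $\succeq$ and $\prec$ for inequalities.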

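We have
$$\begin{aligned}
\int_{T-\delta}^{T+\delta} L(\mathbf x_\varepsilon, \dot{\mathbf x}_\varepsilon) \sim{} & \int_{T-\delta}^{T} \big[ L(\mathbf x_\varepsilon(T), \dot{\mathbf x}_\varepsilon(T)) + D_xL(\mathbf x_\varepsilon(T), \dot{\mathbf x}_\varepsilon(T))(\mathbf x_\varepsilon(t) - \mathbf x_\varepsilon(T)) \\
& \qquad + D_vL(\mathbf x_\varepsilon(T), \dot{\mathbf x}_\varepsilon(T))(\dot{\mathbf x}_\varepsilon(t) - \dot{\mathbf x}_\varepsilon(T)) \big] \\
& + \int_T^{T+\delta} \big[ L(\mathbf x(T), \dot{\mathbf x}(T)) + D_xL(\mathbf x(T), \dot{\mathbf x}(T))(\mathbf x(t) - \mathbf x(T)) \\
& \qquad + D_vL(\mathbf x(T), \dot{\mathbf x}(T))(\dot{\mathbf x}(t) - \dot{\mathbf x}(T)) \big].
\end{aligned}$$
We observe that since $|t - T| \le \delta$,
$$\mathbf x(t) - \mathbf x(T) = \dot{\mathbf x}(T)(t - T) + O(\delta^2).$$
Furthermore
$$D_xL(\mathbf x_\varepsilon(T), \dot{\mathbf x}_\varepsilon(T)) = D_xL(\mathbf x(T), \dot{\mathbf x}(T)) + O(\varepsilon),$$
$$D_vL(\mathbf x_\varepsilon(T), \dot{\mathbf x}_\varepsilon(T)) = D_vL(\mathbf x(T), \dot{\mathbf x}(T)) + O(\varepsilon).$$
Consequently,
$$\begin{aligned}
\int_{T-\delta}^{T+\delta} L(\mathbf x_\varepsilon, \dot{\mathbf x}_\varepsilon) & \sim \delta L(\mathbf x(T), \dot{\mathbf x}_\varepsilon(T)) + \delta L(\mathbf x(T), \dot{\mathbf x}(T)) \\
& \succeq 2\delta L\left(\mathbf x(T), \frac{\dot{\mathbf x}_\varepsilon(T) + \dot{\mathbf x}(T)}{2}\right) + 2\delta\gamma\varepsilon^2|\dot Y(T)|^2 \\
& \sim \int_{T-\delta}^{T+\delta} L(\mathbf x + \varphi, \dot{\mathbf x} + \dot\varphi) + C\delta\varepsilon^2,
\end{aligned}$$
and so
$$\int_0^{T-\delta} L(\mathbf x_\varepsilon, \dot{\mathbf x}_\varepsilon) + \int_{T-\delta}^{T+\delta} L(\mathbf x + \varphi, \dot{\mathbf x} + \dot\varphi) \prec \int_0^{T+\delta} L(\mathbf x, \dot{\mathbf x})\,ds - C\delta\varepsilon^2,$$
which, for $\delta = \varepsilon^{3/2}$ and $\varepsilon$ sufficiently small, implies that $\mathbf x$ does not minimize the action between $0$ and $T + \delta$.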

5.4.2. Curvature. The curvature tensor R is defined by

R(X, Y )Z = ∇X∇YZ −∇Y∇XZ −∇[X,Y ]Z.

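Exercise 85. Show that
$$R(X, Y)Z = R^l_{ijk}X_iY_jZ_k\frac{\partial}{\partial x_l},$$
where
$$R^l_{ijk} = \frac{\partial\Gamma^l_{jk}}{\partial x_i} - \frac{\partial\Gamma^l_{ik}}{\partial x_j} + \Gamma^m_{jk}\Gamma^l_{im} - \Gamma^m_{ik}\Gamma^l_{jm}.$$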

Exercise 86 (Bianchi’s identity). Show that for all vector fields X, Y, Z

R(X, Y )Z +R(Y, Z)X +R(Z,X)Y = 0.

Theorem 49. Let the Lagrangian $L$ be the kinetic energy defined by a Riemannian metric. Consider a geodesic $\mathbf x$ with tangent vector $X = \dot{\mathbf x}$. Then Jacobi’s equation can be written as

(65) $\frac{D^2Y}{dt^2} = R(X, Y)X.$


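Proof. Consider a one-parameter family of geodesics $\phi(t, \delta)$, that is, for each $\delta$ the mapping $t \mapsto \phi(t, \delta)$ is a geodesic. Let
$$Y = \frac{\partial\phi_k}{\partial\delta}\frac{\partial}{\partial x_k}$$
and
$$X = \frac{\partial\phi_k}{\partial t}\frac{\partial}{\partial x_k}.$$
We have $[X, Y] = 0$, and so
$$\begin{aligned}
R(X, Y)X & = \nabla_X\nabla_YX - \nabla_Y\nabla_XX - \nabla_{[X,Y]}X \\
& = \nabla_X\nabla_YX - \nabla_Y\nabla_XX \\
& = \nabla_X\nabla_YX,
\end{aligned}$$
since $\nabla_XX = \frac{DX}{dt} = 0$. Once more, using $[X, Y] = 0$ and the fact that the connection is symmetric, we have $\nabla_YX = \nabla_XY$, which then yields (65).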

Lemma 50. For all vector fields X, Y, Z, we have

〈R(X, Y )Z,Z〉 = 0.

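Proof. We have
$$\langle\nabla_Y\nabla_XZ, Z\rangle = Y\langle\nabla_XZ, Z\rangle - \langle\nabla_XZ, \nabla_YZ\rangle,$$
and
$$\langle\nabla_X\nabla_YZ, Z\rangle = X\langle\nabla_YZ, Z\rangle - \langle\nabla_YZ, \nabla_XZ\rangle.$$
Therefore
$$\begin{aligned}
\langle\nabla_X\nabla_YZ, Z\rangle - \langle\nabla_Y\nabla_XZ, Z\rangle & = X\langle\nabla_YZ, Z\rangle - Y\langle\nabla_XZ, Z\rangle \\
& = XY\langle Z, Z\rangle - YX\langle Z, Z\rangle - X\langle Z, \nabla_YZ\rangle + Y\langle Z, \nabla_XZ\rangle,
\end{aligned}$$
that is,
$$\langle\nabla_X\nabla_YZ, Z\rangle - \langle\nabla_Y\nabla_XZ, Z\rangle = \frac12[X, Y]\langle Z, Z\rangle.$$
Since
$$\langle\nabla_{[X,Y]}Z, Z\rangle = [X, Y]\langle Z, Z\rangle - \langle Z, \nabla_{[X,Y]}Z\rangle$$
we have
$$\langle\nabla_{[X,Y]}Z, Z\rangle = \frac12[X, Y]\langle Z, Z\rangle,$$
which implies the desired identity.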


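Proposition 51. Let $Y$ be a solution of (65) along a geodesic $\mathbf x$ whose tangent vector is $X = \dot{\mathbf x}$ and satisfies
$$\frac{D}{dt}X = 0.$$
Suppose that $Y(0) = 0$ and that
$$\left\langle X, \frac{DY}{dt}\right\rangle = 0$$
at $t = 0$. Then $\left\langle X, \frac{DY}{dt}\right\rangle = 0$ for all $t$.

Proof. We have
$$\begin{aligned}
\frac{d}{dt}\left\langle X, \frac{D}{dt}Y\right\rangle & = \left\langle\frac{D}{dt}X, \frac{D}{dt}Y\right\rangle + \left\langle X, \frac{D^2}{dt^2}Y\right\rangle \\
& = \langle X, R(X, Y)X\rangle = 0,
\end{aligned}$$
taking into account that $\frac{D}{dt}X = 0$, and using in the last identity lemma 50.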

Suppose we are looking for solutions to the Jacobi equation satisfying $Y(0) = 0$ along a geodesic $\mathbf x$. Consider the solution $\tilde Y$ constructed in exercise 84. Observe that $\dot{\tilde Y}$ is tangent to the geodesic $\mathbf x$. We can write the solution $Y = a\tilde Y + Y_\perp$ where $a \in \mathbb{R}$ and $Y_\perp$ is a solution to the Jacobi equation such that $Y_\perp(0) = 0$ and $\dot Y_\perp(0)$ is orthogonal to $\dot{\mathbf x}(0)$. By the previous proposition, $Y_\perp(t)$ is orthogonal at all times to $\dot{\mathbf x}(t)$. Additionally, by exercise 84, $\tilde Y(t) \neq 0$ for all $t \neq 0$. Therefore, if $Y(T) = 0$ then $a = 0$. Consequently, to look for conjugate points it suffices to consider initial conditions orthogonal to the geodesic.

A manifold has constant sectional curvature k0 if for all vector fields

X, Y,W,Z we have

〈R(X, Y )W,Z〉 = k0 [〈X,W 〉〈Y, Z〉 − 〈Y,W 〉〈X,Z〉] .

Exercise 87. Show that the sphere x2 + y2 + z2 has constant sectional

curvature.


Exercise 88. Let ei be an orthonormal basis for TpM . Show that if M

has constant sectional curvature then

Rijkl = 〈R(ei, ej)ek, el〉 = k0(δikδjl − δilδjk).

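Example 26. Let $M$ be a manifold with constant sectional curvature. Let $\mathbf x$ be a geodesic in $M$ with $|\dot{\mathbf x}| = 1$ and let $Y$ be a Jacobi field orthogonal to $\dot{\mathbf x}$. Then, Jacobi’s equation
$$\frac{D^2Y}{dt^2} = R(\dot{\mathbf x}, Y)\dot{\mathbf x}$$
can be written as
$$\frac{D^2Y}{dt^2} = k_0Y,$$
since for each vector field $X$ we have
$$\langle R(\dot{\mathbf x}, Y)\dot{\mathbf x}, X\rangle = k_0(\langle\dot{\mathbf x}, \dot{\mathbf x}\rangle\langle Y, X\rangle - \langle\dot{\mathbf x}, X\rangle\langle Y, \dot{\mathbf x}\rangle) = k_0\langle Y, X\rangle.$$
Thus, depending on the sign of $k_0$, we obtain the following solutions
$$Y(t) = \begin{cases} \sin(t\sqrt{k_0})\,e(t) & \text{if } k_0 > 0 \\ t\,e(t) & \text{if } k_0 = 0 \\ \sinh(t\sqrt{-k_0})\,e(t) & \text{if } k_0 < 0, \end{cases}$$
where $e(t)$ is a parallel vector field. As a conclusion, if the sectional curvature is negative, the geodesics cannot have conjugate points.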

5.4.3. Computation of conjugate points. In this section we explicitly

compute conjugate points.

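Example 27 (Sphere). Consider a sphere of radius 1 in spherical coordinates $(\theta, \varphi)$ as in exercise (50). The Euler-Lagrange equations are
$$\frac{d}{dt}(\sin^2\varphi\,\dot\theta) = 0, \qquad \frac{d}{dt}\dot\varphi - \sin\varphi\cos\varphi\,\dot\theta^2 = 0.$$
And the corresponding Jacobi equation is
$$\frac{d}{dt}\left[\sin^2\varphi\,\dot p_\theta\right] + \frac{d}{dt}\left[2\sin\varphi\cos\varphi\,\dot\theta\,p_\varphi\right] = 0,$$
$$\frac{d}{dt}\dot p_\varphi - \cos^2\varphi\,p_\varphi\dot\theta^2 + \sin^2\varphi\,p_\varphi\dot\theta^2 - 2\sin\varphi\cos\varphi\,\dot\theta\,\dot p_\theta = 0.$$
Consider a geodesic $\varphi = \frac{\pi}{2}$, $\dot\theta = 1$ (the equator). In this case the Jacobi equation is
$$\ddot p_\theta = 0, \qquad \ddot p_\varphi + p_\varphi = 0,$$
which has as a particular solution
$$p_\varphi = \sin t, \qquad p_\theta = 0,$$
which shows that $\theta = \pi$ is conjugate to $\theta = 0$.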
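The conjugate point on the equator can also be located numerically by integrating the scalar equation $\ddot p + k\,p = 0$ with $p(0) = 0$, $\dot p(0) = 1$ and looking for the first positive zero. The sketch below is illustrative only: the function name, the Runge-Kutta scheme, the step size and the time horizon are assumptions made here. The case $k = 1$ is the equation for $p_\varphi$ above; $k = -1$ is included only for contrast.

```python
import math

def first_positive_zero(k, dt=1e-3, t_max=10.0):
    """First t > 0 with p(t) = 0 for p'' + k p = 0, p(0) = 0, p'(0) = 1.

    Returns None if no zero is found before t_max.
    """
    def f(p, v):
        # First-order form: p' = v, v' = -k p.
        return v, -k * p

    p, v, t = 0.0, 1.0, 0.0
    while t < t_max:
        # One classical Runge-Kutta step.
        a1, b1 = f(p, v)
        a2, b2 = f(p + dt / 2 * a1, v + dt / 2 * b1)
        a3, b3 = f(p + dt / 2 * a2, v + dt / 2 * b2)
        a4, b4 = f(p + dt * a3, v + dt * b3)
        p_new = p + dt / 6 * (a1 + 2 * a2 + 2 * a3 + a4)
        v_new = v + dt / 6 * (b1 + 2 * b2 + 2 * b3 + b4)
        if p > 0 and p_new <= 0:
            # Linear interpolation of the sign change.
            return t + dt * p / (p - p_new)
        p, v, t = p_new, v_new, t + dt
    return None

print(first_positive_zero(1.0))   # close to pi
print(first_positive_zero(-1.0))  # no zero on the time window
```

With $k = 1$ the first zero is found near $t = \pi$, in agreement with the particular solution $\sin t$; with $k = -1$ the solution is $\sinh t$ and no zero appears.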

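Example 28 (Lobatchewski plane). Consider the following metric in the upper semiplane ($y > 0$) given by
$$g = \begin{bmatrix} \frac{1}{y^2} & 0 \\ 0 & \frac{1}{y^2} \end{bmatrix}.$$
The geodesics minimize
$$\int\frac{\dot x^2 + \dot y^2}{2y^2},$$
and, consequently, are solutions to the Euler-Lagrange equation
$$\frac{d}{dt}\left[\frac{\dot x}{y^2}\right] = 0, \qquad \frac{d}{dt}\left[\frac{\dot y}{y^2}\right] + \frac{\dot x^2 + \dot y^2}{y^3} = 0.$$
Consider vertical geodesics, that is, with $\dot x = 0$. Then
$$\frac{d}{dt}\left[\frac{\dot y}{y^2}\right] + \frac{\dot y^2}{y^3} = 0,$$
which admits
$$y = ae^{-t}$$
as a solution.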

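The Jacobi equation is
$$\frac{d}{dt}\left[\frac{\dot p_x}{y^2}\right] - \frac{d}{dt}\left[\frac{2\dot x\,p_y}{y^3}\right] = 0,$$
$$\frac{d}{dt}\left[\frac{\dot p_y}{y^2} - 2\frac{\dot y\,p_y}{y^3}\right] + 2\frac{\dot x\dot p_x + \dot y\dot p_y}{y^3} - 3\frac{\dot x^2 + \dot y^2}{y^4}p_y = 0.$$
Observe that to determine the conjugate points we only need to consider solutions which are orthogonal to the geodesic. So for vertical geodesics we can set $p_x = 0$ and $\dot x = 0$. Thus
$$\frac{d}{dt}\left[\frac{\dot p_y}{y^2} - 2\frac{\dot y\,p_y}{y^3}\right] + 2\frac{\dot y\dot p_y}{y^3} - 3\frac{\dot y^2}{y^4}p_y = 0.$$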

Set $p = \frac{p_y}{y^2}$. Then
$$\ddot p + 2\frac{\dot y}{y}\dot p - 3\frac{\dot y^2}{y^2}p = 0.$$
Since $\dot y = -y$ we have
$$\ddot p - p = 0.$$

We leave as a (simple) exercise to check that therefore there are no

conjugate points. J

5.4.4. Cut locus.

Theorem 52. Let $\mathbf x$ be a solution of the Euler-Lagrange equation and let $T > 0$ be the infimum of all $t$ for which $\mathbf x$ is not a minimizing trajectory. Then either $\mathbf x(0)$ and $\mathbf x(T)$ are conjugate or there exists another trajectory $\mathbf y$ with $\mathbf y(0) = \mathbf x(0)$ and $\mathbf y(T) = \mathbf x(T)$ such that
$$\int_0^T L(\mathbf x, \dot{\mathbf x}) = \int_0^T L(\mathbf y, \dot{\mathbf y}).$$

Proof. Since for $t > T$ the trajectory $\mathbf x$ is not minimizing, there exist $t_i > T$ and solutions to the Euler-Lagrange equation $\mathbf y_i$ such that $t_i \to T$, $\mathbf y_i(0) = \mathbf x(0)$, $\mathbf y_i(t_i) = \mathbf x(t_i)$. By the proof of theorem 45, which guarantees the existence of minimizing trajectories, we can assume that $\dot{\mathbf y}_i(0)$ is uniformly bounded. Then $\dot{\mathbf y}_i(0) \to \dot{\mathbf y}(0)$, through some subsequence. If $\dot{\mathbf y}(0) \neq \dot{\mathbf x}(0)$, it is easy to check that we have
$$\int_0^T L(\mathbf x, \dot{\mathbf x}) = \int_0^T L(\mathbf y, \dot{\mathbf y}),$$
otherwise the trajectory $\mathbf x$ would not be minimizing for $t < T$. In the second case, that is, if $\dot{\mathbf y}(0) = \dot{\mathbf x}(0)$, consider the flow $\phi(x, v, t) = (\phi_x, \phi_v)$ given by the Euler-Lagrange equations with initial conditions $(x, v)$ at $0$, that is, $\phi(\mathbf x(0), \dot{\mathbf x}(0), t) = (\mathbf x(t), \dot{\mathbf x}(t))$. If $\mathbf x(0)$ is not conjugate to $\mathbf x(T)$ the matrix $D_v\phi_x(\mathbf x(0), \dot{\mathbf x}(0), T)$ is non singular, therefore for $v$ in a neighborhood of $\dot{\mathbf x}(0)$ and $t$ sufficiently close to $T$ the mapping
$$v \mapsto \phi_x(\mathbf x(0), v, t)$$
is a diffeomorphism. But $\dot{\mathbf y}_i(0) \to \dot{\mathbf x}(0)$ and $\mathbf y_i(t_i) = \mathbf x(t_i)$, which is a contradiction.

6. Symmetries and Noether theorem

Noether’s theorem concerns variational problems which admit sym-

metries. By this theorem, associated to each symmetry there is a

quantity that is conserved by the solutions of the Euler-Lagrange equa-

tion. In classical mechanics, for instance, translation symmetry yields

conservation of linear momentum, to rotation symmetry corresponds

conservation of angular momentum and time-invariance implies energy

conservation.

6.1. Routh’s method. We start the discussion of symmetries by considering a classical technique to simplify the Euler-Lagrange equations. Consider a Lagrangian of the form $L(x, \dot x, \dot y)$, that is, independent of the coordinate $y$. Note that this corresponds to translation invariance in the coordinate $y$. The Euler-Lagrange equation shows that
$$p_y = -D_{\dot y}L(x, \dot x, \dot y)$$
is constant. We will explore this fact to simplify the Euler-Lagrange equations. We assume further that $w \mapsto L(x, \dot x, w)$ is strictly convex and superlinear. Then we define the partial Legendre transform with respect to $\dot y$, Routh’s function, as
$$R(x, \dot x, p_y) = \sup_w\; -p_y\cdot w - L(x, \dot x, w).$$
By convexity, the supremum is achieved at a unique point $w(x, \dot x, p_y)$. We have that
$$p_y = -D_wL, \qquad \dot y = -D_{p_y}R.$$


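Note that, by the Euler-Lagrange equation,
$$\dot p_y = 0$$
and
$$\begin{aligned}
\frac{d}{dt}\frac{\partial R}{\partial\dot x} - \frac{\partial R}{\partial x} & = -\frac{d}{dt}\frac{\partial L}{\partial\dot x} + \frac{\partial L}{\partial x} - \frac{d}{dt}\left[\frac{\partial L}{\partial w}\frac{\partial w}{\partial\dot x} + p_y\frac{\partial w}{\partial\dot x}\right] + \frac{\partial L}{\partial w}\frac{\partial w}{\partial x} + p_y\frac{\partial w}{\partial x} \\
& = -\frac{d}{dt}\frac{\partial L}{\partial\dot x} + \frac{\partial L}{\partial x} = 0,
\end{aligned}$$
where the terms containing derivatives of $w$ cancel because $p_y = -D_wL$. Therefore, since $p_y$ is constant, we can solve these equations in the following way: for each fixed $p_y$ consider the equation
$$\frac{d}{dt}\frac{\partial R}{\partial\dot x} - \frac{\partial R}{\partial x} = 0.$$
Once this equation is solved, determine $y$ through
$$\dot y = -D_{p_y}R(x, \dot x, p_y).$$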

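Exercise 89. Apply Routh’s method to the Lagrangian
$$L = \frac{\dot x^2}{2} + \frac{\dot y^2}{2} - U(x).$$

Exercise 90. Apply Routh’s method to the symmetric top in an external field, which has as Lagrangian
$$L = \frac{I_1}{2}(\dot\theta^2 + \dot\varphi^2\sin^2\theta) + \frac{I_3}{2}(\dot\psi + \dot\varphi\cos\theta)^2 - U(\varphi, \theta).$$

Exercise 91. Apply Routh’s method to the spherical pendulum whose Lagrangian is:
$$L = \frac{\dot\theta^2\sin^2\varphi + \dot\varphi^2}{2} - U(\varphi).$$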

6.2. Noether theorem. As a motivation for the definition of invariance of a Lagrangian with respect to a transformation group, observe that if $\phi : \mathbb{R}^n \to \mathbb{R}^n$ is a diffeomorphism and $\gamma : [0, T] \to \mathbb{R}^n$ is an arbitrary curve, then $\phi(\gamma)$ is another curve in $\mathbb{R}^n$ whose velocity is $D_x\phi(\gamma)\dot\gamma$. Suppose for each $\tau \in \mathbb{R}$, $\phi_\tau : \mathbb{R}^n \to \mathbb{R}^n$ is a diffeomorphism.

We say that a Lagrangian $L(x, v)$ is invariant under a transformation group $\phi_\tau(x)$ if for each $\tau \in \mathbb{R}$
$$L(x, v) = L(\phi_\tau(x), D_x\phi_\tau(x)v).$$
We will assume additionally that $\phi_\tau$ is differentiable in $\tau$.

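Theorem 53. Let $L$ be a Lagrangian invariant under a smooth transformation group $\phi_\tau(x)$. Let $\mathbf x$ be a solution of the Euler-Lagrange equation. Then
$$D_vL(\mathbf x(T), \dot{\mathbf x}(T))\,\frac{d}{d\tau}\phi_\tau(\mathbf x(T))\Big|_{\tau=0}$$
is independent of $T$.

Proof. Let $\mathbf x$ be a solution of the Euler-Lagrange equation and
$$\mathbf x_\tau(t) = \phi_\tau(\mathbf x(t)).$$
Then
$$\dot{\mathbf x}_\tau = D_x\phi_\tau(\mathbf x(t))\dot{\mathbf x}(t).$$
Consequently,

(66) $\int_0^T L(\mathbf x_\tau, \dot{\mathbf x}_\tau)$

is constant in $\tau$. Differentiating (66) with respect to $\tau$ we obtain
$$\int_0^T D_xL(\mathbf x_\tau, \dot{\mathbf x}_\tau)\frac{d\mathbf x_\tau}{d\tau} + D_vL(\mathbf x_\tau, \dot{\mathbf x}_\tau)\frac{d\dot{\mathbf x}_\tau}{d\tau} = 0.$$
Integrating by parts, using the Euler-Lagrange equation, and taking $\tau = 0$ we obtain
$$D_vL(\mathbf x(0), \dot{\mathbf x}(0))\,\frac{d}{d\tau}\phi_\tau(\mathbf x(0))\Big|_{\tau=0} = D_vL(\mathbf x(T), \dot{\mathbf x}(T))\,\frac{d}{d\tau}\phi_\tau(\mathbf x(T))\Big|_{\tau=0}.$$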
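The conserved quantity of Theorem 53 can be observed numerically for a rotation-invariant Lagrangian. The sketch below is an illustration under assumptions made here, not taken from the text: the planar Lagrangian $L = |v|^2/2 - U(|x|)$ with $U(r) = r^4/4$, the Runge-Kutta integrator and the initial data are hypothetical. For rotations, $\frac{d}{d\tau}\phi_\tau(x)\big|_{\tau=0} = (-y, x)$, so the Noether quantity is $D_vL\cdot(-y, x) = x v_y - y v_x$.

```python
def field(s):
    # Euler-Lagrange equations for L = |v|^2/2 - U(|x|), U(r) = r^4/4 (assumed):
    # acceleration = -grad U = -|x|^2 x.
    x, y, vx, vy = s
    r2 = x * x + y * y
    return (vx, vy, -r2 * x, -r2 * y)

def rk4_step(s, dt):
    # One classical fourth-order Runge-Kutta step.
    def shift(a, k, c):
        return tuple(ai + c * ki for ai, ki in zip(a, k))
    k1 = field(s)
    k2 = field(shift(s, k1, dt / 2))
    k3 = field(shift(s, k2, dt / 2))
    k4 = field(shift(s, k3, dt))
    return tuple(a + dt / 6 * (p + 2 * q + 2 * r + w)
                 for a, p, q, r, w in zip(s, k1, k2, k3, k4))

def noether_charge(s):
    # D_v L = v, and the generator of rotations at tau = 0 is (-y, x).
    x, y, vx, vy = s
    return vx * (-y) + vy * x

def charge_drift(s, dt=1e-3, steps=5000):
    # Largest deviation of the Noether quantity from its initial value.
    q0 = noether_charge(s)
    worst = 0.0
    for _ in range(steps):
        s = rk4_step(s, dt)
        worst = max(worst, abs(noether_charge(s) - q0))
    return worst

print(charge_drift((1.0, 0.0, 0.2, 0.9)))
```

The drift stays at the level of the integration error, which is what the theorem predicts for the exact flow.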

Exercise 92. Let ω ∈ Rn and L(x, v) be a Lagrangian satisfying, for

all τ , L(x+ωτ, v) = L(x, v). Show that DvL·ω is a constant of motion.


Exercise 93. Let $L(x, y, v_x, v_y) = \frac{v_x^2 + v_y^2}{2} - \frac{x^2 + y^2}{2}$. Show that $L$ is invariant by rotations and, using Noether’s theorem, that the angular momentum $xv_y - yv_x$ is a constant of motion.

Theorem 54. Suppose L is a Lagrangian which does not depend on t.

Then the energy is conserved.

Proof. Observe that
$$\int_h^{T+h} L(\mathbf x(t - h), \dot{\mathbf x}(t - h))\,dt$$
is independent of $h$. Differentiate with respect to $h$ and integrate by parts using the Euler-Lagrange equation.

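Example 29. Consider the Lagrangian
$$L = \frac{\dot x^2 + \dot y^2}{2y^2},$$
corresponding to the geodesic flow in the Lobatchewski plane. Identifying the upper semi-plane with $\{z \in \mathbb{C} : \Im(z) > 0\}$ and the points $(x, y)$ with $z = x + iy$, the mapping
$$z \mapsto \frac{az + b}{cz + d}$$
defines an action of the group $SL(2,\mathbb{R})$, the group of matrices with unit determinant, in the Lobatchewski plane, which leaves the Lagrangian invariant. Using matrices of the form
$$A_1(\tau) = \begin{bmatrix} 1 & \tau \\ 0 & 1 \end{bmatrix}, \qquad A_2(\tau) = \begin{bmatrix} e^\tau & 0 \\ 0 & e^{-\tau} \end{bmatrix} \qquad\text{and}\qquad A_3(\tau) = \begin{bmatrix} 1 & 0 \\ \tau & 1 \end{bmatrix},$$
we obtain the conservation laws
$$\frac{\dot x}{y^2}, \qquad \frac{x\dot x + y\dot y}{y^2} \qquad\text{and}\qquad \frac{\dot x(x^2 - y^2) + 2yx\dot y}{y^2}.$$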
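The three conservation laws of Example 29 can be checked along a numerically integrated geodesic. The sketch below is illustrative and rests on assumptions made here: the Euler-Lagrange equations are written in the explicit form $\ddot x = 2\dot x\dot y/y$, $\ddot y = (\dot y^2 - \dot x^2)/y$ (obtained by expanding the equations for this Lagrangian), and the integrator, step size and initial data are hypothetical choices.

```python
def field(s):
    # Geodesic equations of L = (x'^2 + y'^2) / (2 y^2) in explicit form.
    x, y, vx, vy = s
    return (vx, vy, 2 * vx * vy / y, (vy * vy - vx * vx) / y)

def rk4_step(s, dt):
    # One classical fourth-order Runge-Kutta step.
    def shift(a, k, c):
        return tuple(ai + c * ki for ai, ki in zip(a, k))
    k1 = field(s)
    k2 = field(shift(s, k1, dt / 2))
    k3 = field(shift(s, k2, dt / 2))
    k4 = field(shift(s, k3, dt))
    return tuple(a + dt / 6 * (p + 2 * q + 2 * r + w)
                 for a, p, q, r, w in zip(s, k1, k2, k3, k4))

def conserved(s):
    # The three quantities listed in the example.
    x, y, vx, vy = s
    y2 = y * y
    return (vx / y2,
            (x * vx + y * vy) / y2,
            (vx * (x * x - y * y) + 2 * y * x * vy) / y2)

def max_drift(s, dt=1e-3, steps=2000):
    # Largest deviation of any of the three quantities from its initial value.
    q0 = conserved(s)
    worst = 0.0
    for _ in range(steps):
        s = rk4_step(s, dt)
        worst = max(worst, *(abs(a - b) for a, b in zip(conserved(s), q0)))
    return worst

print(max_drift((0.0, 1.0, 1.0, 0.5)))
```

All three quantities stay constant up to integration error along the computed orbit, which is consistent with the invariance under the three one-parameter subgroups.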

Exercise 94. Obtain the general law F (x,y) = 0 of motion of a geo-

desic in the Lobatchewski plane.


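6.3. Monotonicity formulas. As before, let $L : \mathbb{R}^n \times \mathbb{R}^n \to \mathbb{R}$ be a smooth Lagrangian. A sub-symmetry (resp. super-symmetry) of $L$ is a (smooth) one-parameter mapping $\phi_\tau(x)$ such that $\phi_0(x) = x$ and
$$\frac{d}{d\tau}L(\phi_\tau(x), D_x\phi_\tau(x)v)\Big|_{\tau=0} \le 0 \quad (\text{resp. } \ge 0).$$
A simple variation of the proof of Noether’s theorem yields:

Theorem 55. Let $\phi_\tau$ be a sub-symmetry of $L$. Then
$$\frac{d}{dt}\left[D_vL(\mathbf x, \dot{\mathbf x})\,\frac{d}{d\tau}\phi_\tau(\mathbf x)\Big|_{\tau=0}\right] \le 0,$$
with the opposite inequality for super-symmetries.

Proof. It suffices to observe that
$$\begin{aligned}
0 & \ge \frac{d}{d\tau}\int_0^T L(\phi_\tau(\mathbf x), D_x\phi_\tau(\mathbf x)\dot{\mathbf x})\,dt\,\Big|_{\tau=0} \\
& = \int_0^T D_xL(\mathbf x, \dot{\mathbf x})\,\frac{d}{d\tau}\phi_\tau(\mathbf x)\Big|_{\tau=0} + D_vL(\mathbf x, \dot{\mathbf x})\,\frac{d}{dt}\frac{d}{d\tau}\phi_\tau(\mathbf x)\Big|_{\tau=0} \\
& = D_vL(\mathbf x, \dot{\mathbf x})\,\frac{d}{d\tau}\phi_\tau(\mathbf x)\Big|_{\tau=0}\,\Big|_0^T,
\end{aligned}$$
which then implies the result.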

An application of this theorem is the following corollary:

Corollary 56. Let $L(x, v) : \mathbb{R}^n \times \mathbb{R}^n \to \mathbb{R}$ be a smooth Lagrangian admitting a strict sub-symmetry. Then the corresponding Euler-Lagrange equations do not have periodic orbits.

Next we present some additional examples and applications.

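Example 30. Suppose that, for some $y \in \mathbb{R}^n$ and all $h \ge 0$, $L(x + hy, v) \le L(x, v)$. Then
$$\frac{d}{dt}D_vL(\mathbf x, \dot{\mathbf x})y \le 0.$$

Example 31. Consider the case in which $L(\lambda x, \lambda v)$ is increasing in $\lambda$, for $\lambda \ge 0$. Then
$$\frac{d}{dt}D_vL(\mathbf x, \dot{\mathbf x})\mathbf x \ge 0.$$

Example 32. Consider the mapping $\phi_\tau(x) = x + \tau F(x)$, and assume that
$$\frac{d}{d\tau}L(x + \tau F(x), v + \tau D_xF\,v) \le 0,$$
at $\tau = 0$. Then
$$\frac{d}{dt}D_vL(\mathbf x, \dot{\mathbf x})F(\mathbf x) \le 0.$$
Consider the case $L = \frac{|v|^2}{2}$, and $F = \nabla U$, for some concave function $U$. Then
$$\frac{d}{d\tau}\frac{|(I + \tau D^2U)v|^2}{2}\Big|_{\tau=0} = v^TD^2Uv \le 0.$$
Thus
$$\frac{d}{dt}\nabla U\cdot v \le 0,$$
that is
$$\frac{d^2}{dt^2}U(\mathbf x) \le 0,$$
that is, $U(\mathbf x(t))$ is a concave function.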

Example 33. Consider a system of $n$ non-interacting particles, and set
$$U = \sum_{i \neq j}|x_i - x_j|.$$
Clearly $U$ is a convex function. By the previous example we have
$$\frac{d^2}{dt^2}|x_i - x_j| \ge 0.$$
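For non-interacting particles the trajectories are straight lines, so the convexity in time of the distance $|x_i - x_j|$ can be checked directly. The sketch below is only an illustration under assumptions made here: two particles in the plane with hypothetical positions and velocities, and a centered second difference standing in for $\frac{d^2}{dt^2}$.

```python
import math

def distance(t, a_i, b_i, a_j, b_j):
    # Free particles move on straight lines x(t) = a + b t.
    return math.sqrt(sum(((ai + bi * t) - (aj + bj * t)) ** 2
                         for ai, bi, aj, bj in zip(a_i, b_i, a_j, b_j)))

def min_second_difference(a_i, b_i, a_j, b_j, t0=-5.0, t1=5.0, n=1000, h=1e-2):
    # Smallest centered second difference d(t+h) - 2 d(t) + d(t-h) on a grid;
    # for a convex function of t it is non-negative up to rounding.
    worst = float("inf")
    for k in range(n + 1):
        t = t0 + (t1 - t0) * k / n
        d2 = (distance(t + h, a_i, b_i, a_j, b_j)
              - 2 * distance(t, a_i, b_i, a_j, b_j)
              + distance(t - h, a_i, b_i, a_j, b_j))
        worst = min(worst, d2)
    return worst

# Hypothetical initial positions a and velocities b of two particles.
print(min_second_difference((0.0, 0.0), (1.0, 0.5), (1.0, -1.0), (-0.5, 1.0)))
```

The smallest second difference is non-negative up to rounding, in line with the inequality stated in the example.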

Exercise 95. Consider a smooth Lagrangian of the form
$$e^{-\alpha t}L(x, v).$$
This Lagrangian is sub-invariant in time.

1. Prove that
$$\frac{d}{dt}e^{-\alpha t}E(t) \ge 0,$$
where
$$E = D_vL(\mathbf x, \dot{\mathbf x})\dot{\mathbf x} - L(\mathbf x, \dot{\mathbf x}).$$
In particular, show that this estimate yields exponential blow up of the energy.

2. Impose conditions upon $L$ that ensure that exponential blow up of the kinetic energy can also be bounded using simple estimates by $E(t) \le Ce^{\beta t}$.

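Exercise 96. Consider the Lagrangian $L : \mathbb{R}^n \times \mathbb{R}^n \to \mathbb{R}$,
$$L(x, v) = \frac{1}{\beta}|v|^\beta - \frac{1}{\alpha}|x|^\alpha.$$
Deduce the Virial theorem:
$$\lim_{T\to\infty}\frac{1}{T}\int_0^T |\dot{\mathbf x}|^\beta = \lim_{T\to\infty}\frac{1}{T}\int_0^T |\mathbf x|^\alpha.$$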

Hint: use the scaling transformation x→ λx, for λ in a neighborhood

of 1.

7. Critical point theory

In this section we discuss methods to construct non-minimizing

critical points.

7.1. Some informal computations. Let $T > 0$ be given. For $a \in \mathbb{R}^n$, let $\mathbf x_a$ be an orbit which minimizes the action under the constraint $\mathbf x(0) = \mathbf x(T) = a$. In general $\dot{\mathbf x}_a(0) \neq \dot{\mathbf x}_a(T)$, so this orbit does not have period $T$. Let $I[a]$ be the function that associates to $a$ the action of $\mathbf x_a$:
$$I[a] = \int_0^T L(\mathbf x_a, \dot{\mathbf x}_a)\,dt.$$
At the maxima or minima of $I[a]$, if $I$ is differentiable,
$$I'[a] = 0,$$
that is (assuming $\mathbf x_a$ is differentiable in $a$)
$$0 = \int_0^T D_xL(\mathbf x_a, \dot{\mathbf x}_a)D_a\mathbf x_a + D_vL(\mathbf x_a, \dot{\mathbf x}_a)D_a\dot{\mathbf x}_a.$$
Integrating by parts and using the fact that $\mathbf x_a$ satisfies the Euler-Lagrange equation, we obtain
$$D_vL(\mathbf x_a(0), \dot{\mathbf x}_a(0)) = D_vL(\mathbf x_a(T), \dot{\mathbf x}_a(T)),$$
which is equivalent to $\mathbf p(0) = \mathbf p(T)$ and, if the Legendre transform $v \mapsto p = -D_vL$ is injective (see exercise 97), implies $\dot{\mathbf x}_a(0) = \dot{\mathbf x}_a(T)$. Thus we conclude that the orbits corresponding to maxima or minima of $I[a]$ are $T$ periodic.

Exercise 97. Show that if $L(x, v)$ is strictly convex in $v$ then
$$v \mapsto D_vL(x, v)$$
is injective.

In general the differentiability of $\mathbf x_a$ is hard to establish and in the next section we work around this problem using mountain pass techniques.

7.2. Mountain pass lemma. Let $H$ be a Hilbert space with inner product $(\cdot, \cdot)$. Consider a functional $\Phi : H \to \mathbb{R}$. $\Phi$ is differentiable if there exists a function $\Phi'(u) \in H$ such that
$$\lim_{\|u - v\| \to 0}\frac{|\Phi(v) - \Phi(u) - (\Phi'(u), v - u)|}{\|u - v\|} = 0.$$
A function $\Phi$ is $C^1$ if $\Phi'$ exists and is continuous. Similarly $\Phi$ is $C^{1,1}$ if $\Phi'$ is Lipschitz.

A point $u \in H$ is a critical point if $\Phi'(u) = 0$. The set of critical points in the level set $\Phi(u) = c$ is denoted by
$$K_c = \{u : \Phi'(u) = 0,\ \Phi(u) = c\}.$$

A functional $\Phi : H \to \mathbb{R}$ satisfies the Palais-Smale condition if any sequence $(u_k) \subset H$ satisfying $\sup|\Phi(u_k)| \le C$ and $\|\Phi'(u_k)\| \to 0$ is pre-compact, that is, it admits a convergent subsequence.

Lemma 57 (Deformation lemma). Let Φ : H → R be a functional satisfying the Palais-Smale condition. Let c ∈ R be such that $K_c = \emptyset$. Then there exist ε > 0, δ > 0 and a continuous function η : [0, 1] × H → H, written $\eta_t(u) = \eta(t, u)$, such that

1. $\eta_0(u) = u$;

2. $\eta_1(u) = u$ if $|\Phi(u) - c| > \varepsilon$;

3. $\Phi(\eta_t(u)) \le \Phi(u)$;

4. $\Phi(\eta_1(u)) \le c - \delta$ if $\Phi(u) \le c + \delta$.

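Proof. Firstly, we claim that there exist positive real numbers σ and ε such that

\[ |\Phi(u) - c| < \varepsilon \implies \|\Phi'(u)\| \ge \sigma. \]

To show this claim, assume by contradiction that there exist sequences $\sigma_k \to 0$, $\varepsilon_k \to 0$ and $u_k \in H$ such that $|\Phi(u_k) - c| \le \varepsilon_k$ and $\|\Phi'(u_k)\| \le \sigma_k$. By the Palais-Smale condition, this implies the existence of a convergent subsequence of $u_k$ with limit u. Since Φ and Φ′ are continuous, Φ(u) = c and Φ′(u) = 0, so u is a critical point in $K_c$, which is a contradiction.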

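Choose δ such that 0 < δ < ε and $0 < \delta < \frac{\sigma^2}{2}$. Define

\[ A = \{u : |\Phi(u) - c| > \varepsilon\}, \qquad B = \{u : |\Phi(u) - c| < \delta\}. \]

Let

\[ g(u) = \frac{\operatorname{dist}(u, A)}{\operatorname{dist}(u, A) + \operatorname{dist}(u, B)}, \]

so that 0 ≤ g ≤ 1, g ≡ 0 in A and g ≡ 1 in B. Let also

\[ h(t) = \begin{cases} 1 & \text{if } 0 \le t \le 1, \\ \frac{1}{t} & \text{if } t > 1. \end{cases} \]

Consider

\[ V(u) = -g(u)\,h(\|\Phi'(u)\|)\,\Phi'(u). \]

For each u ∈ H consider the equation

\[ \dot\eta_t = V(\eta_t), \tag{67} \]

with $\eta_0 = u$. We have that

\[ \frac{d}{dt}\Phi(\eta_t) \le 0. \]

If $\eta_t \in B$ then

\[ \frac{d}{dt}\Phi(\eta_t) \le -\sigma^2, \]

and if $\eta_t \in A$ then V ≡ 0.

Finally, to end the proof, it is enough to observe that if $|\Phi(u) - c| < \delta$ then we have $\Phi(\eta_1) \le c - \delta$ since $\frac{\sigma^2}{2} > \delta$.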
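As a finite dimensional illustration of the construction used in this proof, the following Python sketch integrates the truncated gradient flow with explicit Euler steps on H = R². Everything specific in it is an assumption made for illustration only: the functional `phi`, the level `C = 0`, the constants `EPS` and `DELTA`, the number of steps, and in particular the cutoff `g`, which is taken here as a piecewise linear function of |Φ(u) − c| instead of the quotient of distances (it still vanishes on A and equals 1 on B).

```python
import math

# Hypothetical data: level c and constants eps > delta > 0 (here sigma = 1,
# because the gradient of phi below has norm at least 1, so delta < sigma^2/2).
C, EPS, DELTA = 0.0, 0.4, 0.2


def phi(u):
    """Illustrative functional on R^2 without critical points."""
    return u[0] + 0.5 * math.sin(u[1])


def grad_phi(u):
    return (1.0, 0.5 * math.cos(u[1]))


def g(u):
    """Cutoff: 0 where |phi - c| >= eps, 1 where |phi - c| <= delta."""
    d = abs(phi(u) - C)
    return min(1.0, max(0.0, (EPS - d) / (EPS - DELTA)))


def h(t):
    """Normalization: 1 on [0, 1] and 1/t afterwards."""
    return 1.0 if t <= 1.0 else 1.0 / t


def V(u):
    """Truncated, normalized negative gradient field."""
    gx, gy = grad_phi(u)
    s = -g(u) * h(math.hypot(gx, gy))
    return (s * gx, s * gy)


def flow(u, t_end=1.0, steps=2000):
    """Explicit Euler approximation of the flow up to time t_end."""
    dt = t_end / steps
    path = [u]
    for _ in range(steps):
        vx, vy = V(u)
        u = (u[0] + dt * vx, u[1] + dt * vy)
        path.append(u)
    return path
```

Starting from a point with Φ(u) ≤ c + δ, the values of `phi` along `flow(u)` should be non-increasing and end below c − δ, while a point with |Φ(u) − c| > ε is left fixed. This mirrors properties 2 to 4 of the lemma in this toy setting and is not a substitute for the proof.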

Exercise 98. Show that the solution of (67) depends continuously on the initial condition u.

Theorem 58 (Mountain pass). Let Φ be a $C^{1,1}$ functional satisfying the Palais-Smale condition. Suppose that

1. Φ(0) = 0;

2. Φ(u) ≥ a if ‖u‖ = r, where a, r > 0;

3. there exists v ∈ H such that Φ(v) ≤ 0, with ‖v‖ > r.

Let

\[ \Gamma = \{g \in C([0, 1], H) : g(0) = 0,\ g(1) = v\}. \]

Then the set $K_c$, with

\[ c = \inf_{g \in \Gamma}\ \max_{0 \le t \le 1} \Phi(g(t)), \]

is non-empty.

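Proof. Clearly c ≥ a, since every path in Γ crosses the sphere ‖u‖ = r. Suppose that $K_c = \emptyset$. Choose $\varepsilon < \frac{a}{2}$ and apply the deformation lemma to construct the deformation η. Let g ∈ Γ be such that

\[ \max_{0 \le t \le 1} \Phi(g(t)) \le c + \delta. \]

Since Φ(0) = 0 and Φ(v) ≤ 0, both endpoints satisfy |Φ − c| ≥ c ≥ a > ε, so $\eta_1$ fixes 0 and v, and $\eta_1 \circ g \in \Gamma$. Then

\[ \max_{0 \le t \le 1} \Phi(\eta_1(g(t))) \le c - \delta, \]

which contradicts the definition of c.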
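A one dimensional toy case may help to visualize the minimax level c. The sketch below is an illustration under assumptions that are not part of the text: H = R, the functional `phi(u) = u²/2 − u⁴/4`, and the endpoint v = 2. In one dimension every continuous path from 0 to v covers the interval [0, v], so the minimax level reduces to the maximum of Φ on that interval, which the code locates on a grid.

```python
def phi(u):
    """Hypothetical functional on R with phi(0) = 0 and phi(2) < 0."""
    return 0.5 * u * u - 0.25 * u ** 4


def dphi(u):
    """Derivative of phi."""
    return u - u ** 3


def minimax_level(v, n=2000):
    """Maximum of phi on a uniform grid of [0, v].

    In dimension one this equals the mountain pass level, because any
    path joining 0 to v passes through every point of [0, v].
    """
    grid = [v * i / n for i in range(n + 1)]
    u_star = max(grid, key=phi)
    return phi(u_star), u_star


c, u_star = minimax_level(2.0)
```

For this choice the maximizer `u_star` is a point where `dphi` vanishes, in agreement with the statement that the set of critical points at level c is non-empty.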

Exercise 99. Consider the Lagrangian

\[ L(x, v) = \frac{1}{2}|v|^2 + \frac{1}{2}x^2 - \varepsilon\frac{x^4}{4}. \]

Let Φ be the functional

\[ \Phi(x) = \int_0^1 L(x, \dot x)\,ds \]

defined in $H^1_{per}(0, 1)$.

1. Show that Φ is differentiable and that its derivative is given by

\[ \langle \Phi'(x), y \rangle = \int_0^1 \dot x \dot y + xy - \varepsilon x^3 y. \]

2. Show that Φ′(x) is Lipschitz in x, that is, the vector $w \in H^1_{per}(0, 1)$ that satisfies

\[ \langle \Phi'(x), y \rangle = \int_0^1 \dot w \dot y + wy \]

for all y, is a Lipschitz function of x.

3. Show that Φ satisfies the Palais-Smale condition:

(a) Let $x_n$ be a sequence satisfying $\Phi(x_n) \le C$ and $\Phi'(x_n) \to 0$. Show that

\[ \int_0^1 \dot x_n^2 + x_n^2 \le C. \]

(b) Show that this implies that, through a subsequence, $x_n \rightharpoonup x$, for some function x in $H^1_{per}(0, 1)$, and that $x_n \to x$ uniformly.

(c) Use the fact that $\Phi'(x_n) \to 0$ in $H^1_{per}(0, 1)$ to show that $x_n \to x$ in $H^1_{per}(0, 1)$ using the Lax-Milgram theorem.

4. Show that x ≡ 0 is a strict local minimum of the action, that is,

\[ \Phi[x] \ge \alpha \|x\|^2_{H^1_{per}}, \]

for some α > 0 and ‖x‖ sufficiently small.

5. Show that there exists a 1-periodic curve y that satisfies Φ[y] < 0.

6. Prove the existence of a non-trivial 1-periodic solution of the Euler-Lagrange equation.

8. Invariant measures

Invariant measures under the flow induced by a vector field are an important topic in dynamical systems. In this section we review some results and construct invariant measures under the Hamiltonian flow.

Lemma 59. Let µ be a measure on a manifold M. Let χ be a smooth vector field on M. The measure µ is invariant with respect to the flow generated by the vector field χ iff for any smooth compactly supported function ξ : M → R we have

\[ \int_M \nabla\xi \cdot \chi\,d\mu = 0. \]

Proof. Let $\Phi_t$ be the flow generated by the vector field χ. If µ is invariant under $\Phi_t$, then for any smooth compactly supported function ξ(x) and any t > 0 we have

\[ \int \xi(\Phi_t(x)) - \xi(x)\,d\mu = 0. \]

By differentiating with respect to t, and setting t = 0, we obtain the "only if" part of the lemma.

To establish the converse, we have to prove that for any t the measure $\mu_t$ defined by

\[ \mu_t(S) = \mu((\Phi_t)^{-1}(S)) \]

coincides with µ.

By the Riesz representation theorem it is sufficient to check that the identity

\[ \int \xi\,d\mu = \int \xi\,d\mu_t \]

holds for any continuous function ξ (vanishing at ∞). Any such continuous function can be uniformly approximated by smooth functions. Therefore it is sufficient to prove the above identity for smooth functions ξ with compact support.

Assume, without loss of generality, that ξ(x) is a $C^2$-smooth function. Fix t > 0. We have to prove that

\[ \int \xi(\Phi_t(x)) - \xi(x)\,d\mu = 0. \]

We have

\[ \int \xi(\Phi_t(x)) - \xi(x)\,d\mu = \sum_{k=0}^{N-1} \int \xi(\Phi_{t(k+1)/N}(x)) - \xi(\Phi_{tk/N}(x))\,d\mu = \sum_{k=0}^{N-1} \int \xi_k(\Phi_{t/N}(x)) - \xi_k(x)\,d\mu, \]

where $\xi_k(x) = \xi(\Phi_{tk/N}(x))$. Each $\xi_k$ is again smooth and compactly supported, so $\int \nabla\xi_k \cdot \chi\,d\mu = 0$ by hypothesis. Therefore

\begin{align*}
\sum_{k=0}^{N-1} \int \xi_k(\Phi_{t/N}(x)) - \xi_k(x)\,d\mu
&= \sum_{k=0}^{N-1} \int \nabla\xi_k(x) \cdot \left(\Phi_{t/N}(x) - x\right) + O\!\left(\tfrac{t}{N^2}\right) d\mu \\
&= \sum_{k=0}^{N-1} \int \nabla\xi_k(x) \cdot \left(\tfrac{t}{N}\chi(x) + O\!\left(\tfrac{t}{N^2}\right)\right) + O\!\left(\tfrac{t}{N^2}\right) d\mu \\
&= \frac{t}{N} \sum_{k=0}^{N-1} \int \nabla\xi_k(x) \cdot \chi(x)\,d\mu + O\!\left(\tfrac{t}{N}\right) = O\!\left(\tfrac{t}{N}\right).
\end{align*}

Taking the limit N → ∞ we complete the proof.
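The criterion of the lemma can be tested numerically in a simple planar case. The sketch below is an illustration under assumptions not taken from the text: M = R², the rotation field χ(x, y) = (−y, x), a test function ξ(x, y) = sin(x + 0.3) cos(2y − 0.1) that is smooth and bounded rather than compactly supported (the Gaussian decay of the densities plays the role of compact support), and a midpoint quadrature on a truncated square.

```python
import math


def pairing(density, n=200, half_width=8.0):
    """Midpoint-rule approximation of the integral of grad(xi) . chi
    against the given density on the square [-half_width, half_width]^2."""
    h = 2.0 * half_width / n
    total = 0.0
    for i in range(n):
        x = -half_width + (i + 0.5) * h
        for j in range(n):
            y = -half_width + (j + 0.5) * h
            # Partial derivatives of xi(x, y) = sin(x + 0.3) * cos(2y - 0.1)
            xi_x = math.cos(x + 0.3) * math.cos(2.0 * y - 0.1)
            xi_y = -2.0 * math.sin(x + 0.3) * math.sin(2.0 * y - 0.1)
            # chi(x, y) = (-y, x) generates rotations about the origin
            total += (xi_x * (-y) + xi_y * x) * density(x, y)
    return total * h * h


def centered_gaussian(x, y):
    """Rotation-invariant density."""
    return math.exp(-0.5 * (x * x + y * y))


def shifted_gaussian(x, y):
    """Density centered at (1, 0), not invariant under rotations."""
    return math.exp(-0.5 * ((x - 1.0) ** 2 + y * y))
```

With these choices `pairing(centered_gaussian)` should vanish up to quadrature error, while `pairing(shifted_gaussian)` should not, consistent with the fact that only the centered Gaussian measure is invariant under rotations about the origin.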

Exercise 100. Consider a measure on $\mathbb{R}^{2n}$ with density $e^{\beta H(p, x)}$. Show that this measure is invariant under the Hamiltonian flow generated by H.


Exercise 101. Show that the Hamiltonian flow preserves area in phase

space.

Example 34. Let u(x, P) be a solution of $H(P + D_xu, x) = \bar H(P)$. Then the graph

\[ p = P + D_xu(x, P) \tag{68} \]

is invariant under the flow generated by (75). Furthermore, the flow restricted to this graph is conjugated to a translation, as $\dot X$ is constant. If the Hamiltonian H(p, x) is $\mathbb{Z}^n$ periodic in x, and u is a $\mathbb{Z}^n$ periodic function, that is, H(p, x + k) = H(p, x) and u(x + k) = u(x) for all p, x ∈ $\mathbb{R}^n$ and k ∈ $\mathbb{Z}^n$, the graph (68) can be interpreted as an invariant torus. Furthermore, as the Lebesgue measure dX in the new coordinates is invariant under the Hamiltonian dynamics, the change of variables formula implies that the measure supported in the graph (68) with density

\[ \theta(x)\,dx = \det(I + D^2_{Px}u)\,dx \tag{69} \]

is an invariant measure.

9. Non convex problems

This section is an introduction to the calculus of variations for non-

convex Lagrangians.

Exercise 102. Suppose that

\[ \lim_{|v| \to \infty} \frac{L(x, v)}{|v|} = \infty, \]

uniformly in x. Show that any $C^1$ minimizing sequence of the action with fixed endpoints is equicontinuous.


Exercise 103. Consider the problem

\[ \min_{x(-1) = 0,\ x(1) = 1} \int_{-1}^{1} |t\,\dot x(t)|^2\,dt. \]

Show that

\[ x_n(t) = \frac{1}{2} + \frac{\arctan(nt)}{2\arctan n} \]

is a minimizing sequence that does not converge uniformly.
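The behaviour of this sequence can be explored numerically. The following sketch assumes that the integrand is |t ẋ(t)|² on [−1, 1] and that the sequence is x_n(t) = 1/2 + arctan(nt) / (2 arctan n); the quadrature rule and the number of nodes are choices made for illustration only.

```python
import math


def x_n(t, n):
    """Candidate minimizing sequence evaluated at time t."""
    return 0.5 + math.atan(n * t) / (2.0 * math.atan(n))


def action(n, nodes=40000):
    """Midpoint approximation of the integral of |t * x_n'(t)|^2 over [-1, 1]."""
    h = 2.0 / nodes
    total = 0.0
    for i in range(nodes):
        t = -1.0 + (i + 0.5) * h
        # Derivative of x_n with respect to t
        dx = n / ((1.0 + (n * t) ** 2) * 2.0 * math.atan(n))
        total += (t * dx) ** 2
    return total * h
```

Evaluating `action(n)` for increasing n suggests that the action tends to zero, whereas `x_n(0.0, n)` stays equal to 1/2 and `x_n(t, n)` approaches 1 for each fixed t > 0, so the pointwise limit is discontinuous and the convergence cannot be uniform.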

Exercise 104 (Discontinuities). Let $L(x, v) : \mathbb{R}^{2n} \to \mathbb{R}$ be a $C^2$ function. Consider a continuous trajectory x(·), sectionally $C^1$. Suppose that x is a minimizer of

\[ \int_0^T L(x, \dot x)\,dt \]

over all piecewise $C^1$ curves which satisfy fixed boundary conditions. Let $t_0$ be a point where $\dot x$ is discontinuous, with left and right limits $v^\pm$. Determine an equation that relates $v^+$ with $v^-$. Show that, if L(x, v) is strictly convex in v, the continuous minimizers which are sectionally $C^2$ and whose left and right derivatives exist at all points are of class $C^1$.

Exercise 105 (Lavrentiev phenomenon). Consider the variational problem

\[ \min_{u(0) = 0,\ u(1) = 1} \int_0^1 (u^3 - t)^2\,\dot u^6\,dt. \]

Show that $u = t^{1/3}$ minimizes this problem when the minimum is taken over continuous functions u on [0, 1] that are differentiable in (0, 1). However, for any sequence $u_k$ of continuous functions on [0, 1] satisfying $u_k(0) = 0$ and $u_k(1) = 1$, with bounded derivative and converging pointwise to $t^{1/3}$, we have

\[ \int_0^1 (u_k^3 - t)^2\,\dot u_k^6\,dt \to \infty. \]

10. Geometry of Hamiltonian systems

We can discuss the Hamiltonian formalism using a more geometric approach. Suppose for now that in (47) we can apply the minimax principle and exchange $\inf_v \sup_p$ with $\sup_p \inf_v$. In this case we obtain the problem

\[ \inf_{\mathbf{x}(\cdot)}\ \sup_{\mathbf{p}(\cdot)} \int_0^T -H(\mathbf{p}, \mathbf{x})\,dt - \mathbf{p} \cdot \dot{\mathbf{x}}\,dt. \tag{70} \]


To generalize the problem, suppose the variable x represents a point in a manifold M and consider the differential form on $[0, T] \times T^*M$

\[ \sigma = -H\,dt - \alpha, \]

with α = p dx. Then (70) is equivalent to determining critical points of

\[ \int_\gamma \sigma \tag{71} \]

over all curves $\gamma = (\mathbf{x}, \mathbf{p}) : [0, T] \to T^*M$. In a more general setting, suppose we are given an even dimensional manifold S, which replaces $T^*M$, and is endowed with a 1-form α such that dα is non-degenerate. Let H : S → R. We would like to determine the critical curves γ of (71), that is, curves γ such that for every $C^1$ variation $\gamma_\tau$ we have

\[ \left.\frac{d}{d\tau} \int_{\gamma_\tau} \sigma\,\right|_{\tau = 0} = 0. \]

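Let γ : [0, T] → S be a critical point, $X_H : [0, T] \to TS$ the tangent vector to the curve γ, Y a vector field in S with Y(γ(0)) = Y(γ(T)) = 0 and, finally, set $\phi_\tau = \exp(\tau Y)$. Consider

\[ i(\tau) = \int_{\phi_\tau(\gamma)} \sigma = \int_0^T -H(\phi_\tau(\gamma))\,dt - \alpha_{\phi_\tau(\gamma)}(D\phi_\tau^* X_H)\,dt \]

along $\phi_\tau(\gamma)$.

Exercise 106. Show that

\[ \left.\frac{di(\tau)}{d\tau}\right|_{\tau = 0} = \int_0^T -dH(\mathbf{x}, \mathbf{p})(Y) - d\alpha(X_H, Y) \]

and, therefore, the critical points satisfy

\[ d\alpha(X_H, \cdot) = -dH(\cdot). \]

Hint: Observe that

\[ \left.\frac{d}{d\tau}\,\alpha_{\phi_\tau(\gamma)}(D\phi_\tau^* X_H)\right|_{\tau = 0} = L_Y\alpha(X_H), \]

and recall that $L_Y\alpha = d(i_Y\alpha) + i_Y(d\alpha)$.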

A symplectic manifold is an even dimensional manifold S endowed with a closed non-degenerate 2-form ω (recall that a form is non-degenerate if for every non-zero tangent vector X, ω(X, ·) is non-zero). Given

a Hamiltonian H : S → R, the vector field XH which generates the

Hamiltonian flow is uniquely determined by the equation

ω(XH , ·) = −dH.

It is important to observe that the form ω is only required to be closed,

and not exact. Locally this distinction is irrelevant, but it has impor-

tant consequences at the global level.

Exercise 107. Consider R4 with the symplectic form ω = dp1 ∧ dx1 +

2dp2 ∧ dx2. Let H : R4 → R. Determine XH .

To determine the vector field $X_H$ it is necessary to solve the system of linear equations $i_{X_H}\omega = -dH$. To avoid this problem, we introduce the Poisson bracket {F, G} of two functions F and G, defined as

\[ \{F, G\} = \omega(X_F, X_G). \]

Exercise 108. Show that $\{F, G\} = X_F(G)$. In this way we can identify $\{F, \cdot\} = X_F$.

Exercise 109. Let $\omega = \sum_i dp_i \wedge dx_i$. Determine the Poisson bracket.

Exercise 110. Show that {·, ·}

1. is bilinear;

2. is anti-symmetric;

3. satisfies the Leibniz rule:

\[ \{F, GH\} = \{F, G\}H + \{F, H\}G; \]

4. satisfies the Jacobi identity:

\[ \{F, \{G, H\}\} + \{H, \{F, G\}\} + \{G, \{H, F\}\} = 0. \]

A Poisson manifold is a manifold P (in arbitrary dimension) endowed with a bracket {·, ·} satisfying the properties 1-4 of the previous exercise.


Exercise 111. Show that, using the Poisson bracket, one can define the vector field corresponding to a Hamiltonian H through the identification of {H, ·} with the vector field $X_H$.

Exercise 112. Let M be a Poisson manifold and $F_1, F_2 : M \to \mathbb{R}$ such that

\[ \{F_1, F_2\} = C, \]

where C is a constant. Show that

\[ [X_{F_1}, X_{F_2}] = 0. \]

Hint: Consider $[X_{F_1}, X_{F_2}]g$ for arbitrary g : M → R.

11. Perturbation theory

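Exercise 113. Consider the Hamiltonian $H : \mathbb{R}^4 \to \mathbb{R}$ given by

\[ H(p, x) = \omega \cdot p + \varepsilon \sin(x_1 + x_2). \]

Assume that $\omega \in \mathbb{R}^2$ satisfies $\omega \cdot k \neq 0$ for all $k \in \mathbb{Z}^2 \setminus \{0\}$. Show that there exists a canonical transformation $(x, p) \mapsto (X, P)$ such that the new Hamiltonian is

\[ \bar H(P) = \omega \cdot P. \]

Consider now the case

\[ H(p, x) = \frac{|p|^2}{2} + \omega \cdot p + \varepsilon \sin(x_1 + x_2). \]

Show that in a neighborhood of P = 0 we have, using the same change of coordinates,

\[ H(P, X) = \omega \cdot P + O(\varepsilon^2 + |P|^2). \]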

We consider Hamiltonians of the form

\[ H_\varepsilon(p, x) = H_0(p) + \varepsilon H_1(p, x), \tag{72} \]

with $H_0$, $H_1$ smooth, $H_0(p)$ strictly convex and $H_1(p, x)$ bounded with bounded derivatives, and $\mathbb{Z}^n$ periodic in x. We would like to approximate the solutions of

\[ H_\varepsilon(P + D_xu_\varepsilon, x) = \bar H_\varepsilon(P). \tag{73} \]

We are given a reference value $P = P_0$, and we assume that for ε = 0 the rotation vector $\omega_0 = D_PH_0(P_0)$ satisfies the Diophantine non-resonance conditions

\[ |\omega_0 \cdot k| \ge \frac{C}{|k|^s} \quad \text{for all } k \in \mathbb{Z}^n \setminus \{0\}, \tag{74} \]

for some positive constant C and some real s > 0.
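The non-resonance condition can be probed numerically on a finite range of integer vectors. The sketch below is only an illustration: it works in dimension n = 2, uses the Euclidean norm for |k|, and the function name `worst_small_divisor` and the cut-off `k_max` are hypothetical choices. A finite search cannot prove that a vector is Diophantine; it can only exhibit resonances or suggest a plausible constant C.

```python
import math


def worst_small_divisor(omega, s, k_max):
    """Smallest value of |omega . k| * |k|^s over nonzero integer
    vectors k = (k1, k2) with |k1|, |k2| <= k_max."""
    worst = math.inf
    for k1 in range(-k_max, k_max + 1):
        for k2 in range(-k_max, k_max + 1):
            if k1 == 0 and k2 == 0:
                continue
            norm_k = math.hypot(k1, k2)
            divisor = abs(omega[0] * k1 + omega[1] * k2)
            worst = min(worst, divisor * norm_k ** s)
    return worst


# Frequency vector built from the golden mean, and a resonant one.
golden = (1.0, (math.sqrt(5.0) - 1.0) / 2.0)
resonant = (1.0, 0.5)
```

For the golden mean vector with s = 1 the returned value stays bounded away from zero as `k_max` grows, while for the resonant vector it is zero because ω · k vanishes at k = (1, −2).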

In this section we review the classical perturbation theory for Hamiltonian systems using a construction equivalent to the Poincare normal form near an invariant torus. Somewhat incorrectly, but following [AKN97], we call it the Lindstedt series method. Although these results are fairly standard, see [AKN97], for instance, we present them in a more convenient form for our purposes.

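Consider the Hamiltonian dynamics:

\[ \dot{\mathbf{x}} = -D_pH_\varepsilon(\mathbf{p}, \mathbf{x}), \qquad \dot{\mathbf{p}} = D_xH_\varepsilon(\mathbf{p}, \mathbf{x}), \tag{75} \]

where we use the convention that boldface $(\mathbf{x}, \mathbf{p})$ are trajectories of the Hamiltonian flow and not the coordinates (x, p). The Hamilton-Jacobi integrability theory suggests that we should look for functions $\bar H_\varepsilon(P)$ and $u_\varepsilon(x, P)$, periodic in x, solving the Hamilton-Jacobi equation:

\[ H_\varepsilon(P + D_xu_\varepsilon, x) = \bar H_\varepsilon(P). \tag{76} \]

Then, by performing the change of coordinates (x, p) ↔ (X, P) determined by

\[ X = x + D_Pu_\varepsilon, \qquad p = P + D_xu_\varepsilon, \tag{77} \]

the dynamics (75) is simplified to

\[ \dot{\mathbf{X}} = -D_P\bar H_\varepsilon(\mathbf{P}), \qquad \dot{\mathbf{P}} = 0, \]

where we use again the convention that boldface $(\mathbf{X}, \mathbf{P})$ are trajectories of the Hamiltonian flow and not the new coordinates (X, P).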


If u is an approximate solution to (76) satisfying

\[ H_\varepsilon(P + D_xu, x) = \bar H_\varepsilon(P) + f(x, P), \tag{78} \]

then the change of coordinates (77) transforms (75) into

\[ \dot{\mathbf{X}} = -D_P\bar H_\varepsilon(\mathbf{P}) - D_Pf(\mathbf{X}, \mathbf{P}), \qquad \dot{\mathbf{P}} = D_Xf(\mathbf{X}, \mathbf{P}), \tag{79} \]

with the convention that f(X, P) = f(x(X, P), P).

The KAM theory deals with constructing solutions of (76) by using an iterative procedure, a modified Newton's method, that yields an expansion

\[ u_\varepsilon = u_0 + \varepsilon v_1 + \varepsilon^2 v_2 + \cdots. \]

The main technical point in KAM theory is to prove the convergence of these expansions. An alternate method that yields such an expansion is the Lindstedt series [AKN97]. However, we should point out that whereas the KAM expansion is a convergent one, the Lindstedt series may fail to converge. Nevertheless, since we will only need finitely many terms, we will use a variation of the Lindstedt series that we describe next.

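We say that a vector $\omega \in \mathbb{R}^n$ is Diophantine if for all $k \in \mathbb{Z}^n \setminus \{0\}$, $|\omega \cdot k| \ge \frac{C}{|k|^s}$, for some C, s > 0. Let $P_0$ be such that $\omega_0 = D_PH_0(P_0)$ is Diophantine. We look for an approximate solution of

\[ H_\varepsilon(P + D_xu_\varepsilon(x, P), x) = \bar H_\varepsilon(P), \]

valid for $P = P_0 + O(\varepsilon)$. When ε = 0, $\bar H_0(P) = H_0(P)$ and the solution $u_0$ is constant; for instance, we may take $u_0 \equiv 0$. For ε > 0 we have, formally, $u_\varepsilon = O(\varepsilon)$, and so we suggest the following approximation $u_\varepsilon^N$ to $u_\varepsilon$:

\[ \begin{aligned} u_\varepsilon^N = {} & \varepsilon v_1(x, P_0) + \varepsilon (P - P_0) D_Pv_1(x, P_0) + \varepsilon^2 v_2(x, P_0) \\ & + \frac{1}{2}\varepsilon (P - P_0)^2 D^2_{PP}v_1(x, P_0) + \varepsilon^2 (P - P_0) D_Pv_2(x, P_0) \\ & + \varepsilon^3 v_3(x, P_0) + \cdots, \end{aligned} \tag{80} \]

where this expansion is carried out up to order N − 1 in such a way that, formally, $u_\varepsilon - u_\varepsilon^N = O(\varepsilon^N)$. For example

\[ u_\varepsilon^1 = 0, \qquad u_\varepsilon^2 = \varepsilon v_1, \qquad u_\varepsilon^3 = \varepsilon v_1 + \varepsilon^2 v_2 + \varepsilon (P - P_0) D_Pv_1. \]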

The functions $v_i$ and $D^k_{P^k}v_i$ satisfy transport equations

\[ D_pH_0(P_0)\,D_xw = f(\cdots), \]

for some suitable f, and can be solved inductively. For instance:

\[ \bar H_1(P_0) = D_pH_0(P_0)\,D_xv_1 + H_1(P_0, x), \]

\[ D_P\bar H_1(P_0) = D_pH_0(P_0)\,D_x(D_Pv_1) + D^2_{pp}H_0(P_0)\,D_xv_1 + D_pH_1(P_0, x), \]

and

\[ \bar H_2(P_0) = D_pH_0(P_0)\,D_xv_2 + \frac{1}{2}D^2_{pp}H_0(P_0)\,D_xv_1 D_xv_1 + D_pH_1(P_0, x)\,D_xv_1. \]

Note that the derivatives of $v_i$ with respect to P, $D^k_{P^k}v_i$, are computed by solving appropriate transport equations, as is illustrated above for $D_Pv_1$, and not by differentiating $v_i$. In fact $v_i$ may not be defined for $P \neq P_0$. However, if its derivative exists it satisfies a transport equation.
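To make the role of the Diophantine condition explicit, the following worked computation solves a transport equation of this type in Fourier series. It is a sketch under stated assumptions: the unknown w and the right-hand side f are taken to be smooth and $\mathbb{Z}^n$ periodic in x, the normalization with $e^{2\pi i k \cdot x}$ is a choice made here, and $\hat f_k$, $\hat w_k$ denote Fourier coefficients (notation introduced only for this illustration).

```latex
\begin{align*}
\omega_0 \cdot D_x w &= f, \qquad \omega_0 = D_pH_0(P_0), \qquad
f(x) = \sum_{k \in \mathbb{Z}^n} \hat f_k \, e^{2\pi i k \cdot x}, \\
\text{solvability:} \quad \hat f_0 &= \int f(x)\,dx = 0, \\
\hat w_k &= \frac{\hat f_k}{2\pi i \, \omega_0 \cdot k} \quad (k \neq 0),
\qquad \hat w_0 \ \text{arbitrary}, \\
|\hat w_k| &\le \frac{|k|^s}{2\pi C}\, |\hat f_k|
\quad \text{whenever } |\omega_0 \cdot k| \ge \frac{C}{|k|^s}.
\end{align*}
```

The zero-mode condition fixes the constant on the left-hand side of each transport equation, and the last bound shows that each Fourier coefficient is amplified by at most a factor of order $|k|^s$, so a smooth right-hand side yields a smooth solution that is unique up to an additive constant.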

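The constants $\bar H_1(P_0)$, $D_P\bar H_1(P_0)$, $\bar H_2(P_0)$, ... are uniquely determined by integral compatibility conditions, for example,

\[ \bar H_1(P_0) = \int H_1(P_0, x)\,dx, \qquad D_P\bar H_1(P_0) = \int D_pH_1(P_0, x)\,dx, \]

and

\[ \bar H_2(P_0) = \int \frac{1}{2}D^2_{pp}H_0(P_0)\,D_xv_1 D_xv_1 + D_pH_1(P_0, x)\,D_xv_1\,dx. \]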

If H is sufficiently smooth and ω0 is non-resonant then these equations

have smooth solutions that are unique up to constants. Finally one

can check that

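\[ H_\varepsilon(P + D_xu_\varepsilon^N, x) = \bar H_\varepsilon^N(P) + O(\varepsilon^N + |P - P_0|^N), \tag{81} \]

with

\[ \bar H_\varepsilon^N(P) = H_0(P_0) + \varepsilon \bar H_1(P_0) + (P - P_0) D_PH_0(P_0) + \varepsilon^2 \bar H_2(P_0) + \cdots, \]

and this expansion is carried up to order N − 1 in such a way that, formally,

\[ \bar H_\varepsilon(P) = \bar H_\varepsilon^N(P) + O(\varepsilon^N + |P - P_0|^N). \]

Consider the change of coordinates

\[ p = P + D_xu_\varepsilon^N(x, P), \qquad X = x + D_Pu_\varepsilon^N(x, P). \]

Then, by (78) and (79), (75) is transformed into

\[ \dot{\mathbf{X}} = -D_P\bar H_\varepsilon(\mathbf{P}) + O(\varepsilon^N + |\mathbf{P} - P_0|^{N-1}), \qquad \dot{\mathbf{P}} = O(\varepsilon^N + |\mathbf{P} - P_0|^N). \]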

12. Bibliographical notes

There is a very large literature on the topics of this chapter. The

main references we have used were [Arn66] and [AKN97]. Two clas-

sical physics books on this subject are [Gol80] and [LL76]. On the

more geometrical perspective, the reader may want to look at [?] (see

also [?]) and [Oli98]. Additional material on classical calculus of vari-

ations can be found in [?] and the classical book [?]. In what concerns

symmetries, additional material can be consulted in [?]. A very good

reference in Portuguese is [?].

3

Calculus of variations and elliptic equations

The objective of this chapter is to study the existence and regularity of minimizers of functionals of the form

\[ I[u] = \int_U L(Du, u, x)\,dx, \]

where U is an open subset of $\mathbb{R}^n$, and $L : \mathbb{R}^{n \times m} \times \mathbb{R}^m \times U \to \mathbb{R}$ is a suitable Lagrangian. The models we will consider are quite simplified, illustrating, however, the ideas used in more general cases. Moreover, we will only establish regularity for minimizers in the interior of U, thus avoiding the study of the behavior up to the boundary, which is frequently quite technical. Also, to simplify, we assume that U is bounded and has a regular boundary. The interested reader will be able to find, in higher generality, the results studied in this chapter in, for instance, [Gia83], [Gia93], or [GT01]. We will consider both the scalar case m = 1 and the vectorial case m > 1. However, as the theory is more complete in the scalar case, we will prove a few more results there.

We will start by establishing necessary conditions for a function to be a minimizer, and then, as before, we proceed with studying the existence of minimizers using the direct method in the calculus of variations. This guarantees existence, for instance, if L(p, z, x) is convex in p and satisfies certain growth conditions. Then, we will show that these minimizers are weak solutions to the Euler-Lagrange equation

\[ -\operatorname{div}_x D_pL + L_z = 0. \tag{82} \]

Although the results that we prove are valid for more general problems, in a significant part of this chapter we consider the particular case where

\[ L = L(p) - z f(x). \tag{83} \]

The regularity theory for elliptic equations addresses the question of whether such weak solutions are classical: it establishes conditions under which u is smooth enough so that it satisfies (82) in the classical sense.

The study of the regularity of elliptic equations follows several steps. First, the energy methods show that the minimizers are in $W^{2,2}(\Omega)$ and solve

\[ -\operatorname{div} D_pL(Du) = f. \]

This establishes the existence of second derivatives in the weak sense. Then we will try to show that these are classical solutions to the Euler-Lagrange equation. This is a second order partial differential equation, thus we will try to establish that $u \in C^{2,\alpha}$.

We will first consider the De Giorgi-Nash-Moser Holder estimates for elliptic equations

\[ (a_{ij}(x) v_{x_i})_{x_j} = (f(x))_{x_k}, \tag{84} \]

with $f \in L^p$ and $a_{ij}$ uniformly elliptic, that is,

\[ \theta|\chi|^2 \le a_{ij}(x)\chi_i\chi_j \le \Theta|\chi|^2. \]

These estimates imply that the solutions of (84) are Holder continuous independently of the regularity of $a_{ij}$.

We will use the following strategy: each of the derivatives $v = u_{x_k}$ of u is a weak solution of

\[ -(D^2_{p_ip_j}L\,v_{x_j})_{x_i} = f_{x_k}. \tag{85} \]

That is, rewriting the equation, we conclude that v solves an equation of the form

\[ -(a_{ij} v_{x_i})_{x_j} = f_{x_k}. \]

The De Giorgi-Nash-Moser estimates imply that Du is Holder continuous. Therefore the coefficients $D^2_{p_ip_j}L$ of (85) are Holder continuous. Finally, the Schauder estimates show that the solutions v of

\[ (a_{ij}(x) v_{x_i})_{x_j} = f(x)_{x_k}, \]

with a and f Holder continuous, and a elliptic, have Holder continuous derivative Dv. The combination of all these estimates yields that $v \in C^{1,\alpha}$, that is, $u \in C^{2,\alpha}$.

1. Euler-Lagrange equation

The first step to study variational problems involving multiple in-

tegrals is the derivation of necessary conditions for a function to be a

minimizer. In this section we will proceed formally and will not provide

a rigorous justification of the calculations, or worry about the conver-

gence of integrals or the regularity of functions. As the reader will have

the opportunity to observe in the following sections, these are delicate

questions that require careful analysis. However, if adequate hypothe-

ses are imposed, all the calculations in this section can be properly

justified.

1.1. Scalar case. Let the Lagrangian

\[ L(p, z, x) : \mathbb{R}^n \times \mathbb{R} \times U \to \mathbb{R}^+ \]

be a $C^\infty$ function, and U a bounded open subset of $\mathbb{R}^n$ with smooth ($C^\infty$) boundary. We would like to study the minimizers of

\[ I[w] = \int_U L(D_xw, w, x)\,dx, \]

in the set $\mathcal{A}$ of functions w that satisfy certain boundary conditions, for instance,

\[ \mathcal{A} = \{w : w = g \text{ in } \partial U\}, \]

where g : ∂U → R is a fixed function.

Let $w_0$ be a minimizer of I[·]. In a way analogous to what was done in the last chapter, we are going to deduce the Euler-Lagrange equation.


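Theorem 60 (Euler-Lagrange equation). Let $w_0$ be a $C^2$ minimizer of I[·]. Then

\[ \operatorname{div}_x[D_pL(D_xw_0, w_0, x)] - D_zL(D_xw_0, w_0, x) = 0. \tag{86} \]

Proof. If $w_0 \in \mathcal{A}$ then for all $\phi \in C_0^\infty(U)$, we have $w_0 + \varepsilon\phi \in \mathcal{A}$. Therefore if $w_0$ is a minimum of I[·], the function

\[ i(\varepsilon) = I[w_0 + \varepsilon\phi] \]

has a minimum at ε = 0. Consequently i′(0) = 0. Therefore we have

\[ \int_U D_pL(D_xw_0, w_0, x)\,D_x\phi + D_zL(D_xw_0, w_0, x)\,\phi\,dx = 0. \]

Using the divergence theorem we conclude that

\[ \int_U D_pL(D_xw_0, w_0, x)\,D_x\phi = -\int_U \phi\,\operatorname{div}_x[D_pL(D_xw_0, w_0, x)]. \]

Therefore, for all $\phi \in C_0^\infty$,

\[ \int_U \phi\,\big[\operatorname{div}_x[D_pL(D_xw_0, w_0, x)] - D_zL(D_xw_0, w_0, x)\big]\,dx = 0, \]

which implies

\[ \operatorname{div}_x D_pL(D_xw_0, w_0, x) - D_zL(D_xw_0, w_0, x) = 0. \]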

Example 35. Let

\[ L(p, z, x) = \frac{|p|^2}{2} + f(x)z, \]

where $f : \mathbb{R}^n \to \mathbb{R}$ is an arbitrary smooth function. The Euler-Lagrange equation is then

\[ -\Delta w + f(x) = 0, \]

which is the Poisson equation.
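A one dimensional discretization illustrates the link between the Dirichlet-type energy of this example and the Poisson equation. The sketch below is an illustration under assumptions that are not in the text: U = (0, 1), zero boundary values, a uniform grid with n intervals, and hypothetical helper names `solve_dirichlet` and `energy`. It solves the finite difference version of w″ = f with the Thomas algorithm and evaluates a discrete energy whose stationarity condition is exactly that difference equation.

```python
def solve_dirichlet(f, n):
    """Solve (w[i-1] - 2 w[i] + w[i+1]) / h^2 = f(x_i) with w[0] = w[n] = 0."""
    h = 1.0 / n
    xs = [i * h for i in range(n + 1)]
    # Tridiagonal system: -w[i-1] + 2 w[i] - w[i+1] = -h^2 f(x_i)
    rhs = [-h * h * f(x) for x in xs[1:-1]]
    m = n - 1
    c = [0.0] * m  # modified upper diagonal
    d = [0.0] * m  # modified right-hand side
    c[0], d[0] = -0.5, rhs[0] / 2.0
    for i in range(1, m):
        denom = 2.0 + c[i - 1]
        c[i] = -1.0 / denom
        d[i] = (rhs[i] + d[i - 1]) / denom
    w = [0.0] * m
    w[-1] = d[-1]
    for i in range(m - 2, -1, -1):
        w[i] = d[i] - c[i] * w[i + 1]
    return xs, [0.0] + w + [0.0]


def energy(w, f, xs):
    """Discrete version of the integral of |w'|^2 / 2 + f w."""
    h = xs[1] - xs[0]
    kinetic = sum(0.5 * ((w[i + 1] - w[i]) / h) ** 2 for i in range(len(w) - 1))
    source = sum(f(xs[i]) * w[i] for i in range(1, len(w) - 1))
    return h * (kinetic + source)
```

For f ≡ 1 the computed grid function agrees with w(x) = (x² − x)/2, and perturbing it while keeping the boundary values raises the discrete energy, which is the discrete counterpart of the statement that solutions of the Euler-Lagrange equation of a convex functional are minimizers.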

Not every solution to the Euler-Lagrange equation is a minimum of I; in general, solutions can correspond to minima, maxima or even saddle points. We can, however, as it happens in finite dimension, establish further necessary conditions by looking at the second variation, that is, by computing

\[ \left.\frac{d^2}{d\varepsilon^2} i(\varepsilon)\right|_{\varepsilon = 0}, \]

which for minimum points is nonnegative. Therefore for any $\phi \in C_c^\infty(U)$, we have

\[ \int_U D^2_{pp}L\,D_x\phi\,D_x\phi + 2D^2_{pz}L\,D_x\phi\,\phi + D^2_{zz}L\,\phi^2 \ge 0. \tag{87} \]

Let B[u, v] be the bilinear form given by the following expression:

\begin{align*}
B[u, v] = \int_U & D^2_{pp}L(D_xw_0, w_0, x)\,D_xu\,D_xv + D^2_{pz}L(D_xw_0, w_0, x)\,v\,D_xu \\
& + D^2_{pz}L(D_xw_0, w_0, x)\,u\,D_xv + D^2_{zz}L(D_xw_0, w_0, x)\,uv.
\end{align*}

From (87) we conclude that B must be nonnegative, that is, B[φ, φ] ≥ 0, if $w_0$ is a minimum.

Example 36. Let $L = \frac{|p|^2}{2} + f(x)z$. Then

\[ B[u, v] = \int_U D_xu\,D_xv, \]

which implies B[φ, φ] ≥ 0. In fact, if $\phi \in C_c^\infty(U)$ and φ ≠ 0 then B[φ, φ] > 0.

Exercise 114. Prove that in this case this implies that any solution to the Euler-Lagrange equation is in fact a minimum.

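Example 37. In this example we derive further necessary conditions for the existence of a minimizer. Let

\[ \varphi_\delta(x) = \delta^2 \eta(x) \sin\left(\frac{\xi \cdot x}{\delta}\right), \]

where $\eta \in C_c^\infty(U)$. Then, since $D\varphi_\delta = \delta\,\eta\,\xi \cos\left(\frac{\xi \cdot x}{\delta}\right) + O(\delta^2)$,

\[ 0 \le \delta^{-2} B[\varphi_\delta, \varphi_\delta] = \int_U D^2_{p_ip_j}L\,\xi_i\xi_j\,\eta(x)^2 \cos^2\left(\frac{\xi \cdot x}{\delta}\right) + O(\delta). \]

Since $\cos^2\left(\frac{\xi \cdot x}{\delta}\right) \rightharpoonup \frac{1}{2}$ as δ → 0, and η is arbitrary, we have

\[ D^2_{p_ip_j}L\,\xi_i\xi_j \ge 0, \]

that is, the mapping $p \mapsto L(p, w_0, x)$ is convex for any minimizer $w_0$ and any x ∈ U. As we will see, in the scalar case, the convexity in p of the Lagrangian is very important both to establish existence of minimizers and to prove their regularity. For systems, we will be able to derive weaker conditions (which agree with convexity in the scalar case) under which one can show the existence of a minimizer.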

Exercise 115. Compute the Euler-Lagrange equation corresponding to

\[ u \mapsto \int_{B_1(0)} u_x^2 - u_y^2, \]

with $u \in C^1(B_1) \cap C(\bar B_1)$ and u = 0 in $\partial B_1(0)$. Show that the solutions to the Euler-Lagrange equation are not minimizers.


Exercise 116. Consider the problem

\[ \min_{u = u_\nu = 0 \text{ on } \partial B_1(0)} \int_{B_1(0)} (\Delta u)^2 + uf. \]

Determine the Euler-Lagrange equation and its second variation. Show that the solutions to the Euler-Lagrange equation are (global) minimizers.

Exercise 117. Let $\Omega \subset \mathbb{R}^n$ be a regular domain, and u a $C^2(\Omega) \cap C(\bar\Omega)$ solution of

\[ \Delta u = f \]

in Ω, with u = 0 in ∂Ω. Show that u minimizes

\[ \int_\Omega \frac{|\nabla u|^2}{2} + fu, \]

over all $C^2$ functions that vanish in ∂Ω.

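Exercise 118. Let Ω be a regular domain and $f \in L^1(\Omega)$. Show that if

\[ \int_\Omega f\varphi = 0 \]

for all $\varphi \in C_c^\infty(\Omega)$ satisfying $\int_\Omega \varphi = 0$ then

\[ \int_\Omega f\varphi = 0 \]

for all $\varphi \in C^\infty(\Omega)$ with $\int_\Omega \varphi = 0$. Show that this implies that f is almost everywhere constant.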


Exercise 119. Consider the problem

\[ \min \int_U |\nabla u|^2 + f(x)u\,dx, \]

where the minimum is taken over all functions that satisfy

\[ \int_U u\,dx = 0, \]

instead of the usual Dirichlet boundary condition $u|_{\partial U} = 0$. Derive the Euler-Lagrange equation and a boundary condition for u. Hint: use the previous exercise.

Exercise 120. Determine the curve y = γ(x), γ(0) = γ(1) = 0, in the plane such that the region A defined by 0 ≤ x ≤ 1 and 0 ≤ y ≤ γ(x) has area |A| = α (with α sufficiently small) and such that its length is as small as possible.

Exercise 121. Determine a differential equation for the surface in $\mathbb{R}^3$ defined parametrically by

\[ u : B_1(0) \subset \mathbb{R}^2 \to \mathbb{R}^3 \]

such that $u|_{\partial B_1} = \gamma$, that is, its boundary is a given closed curve γ, and which minimizes the area

\[ \int_{B_1} \det\big((Du)^T Du\big)^{1/2}\,dx. \]

1.2. Systems. For functionals defined for vector valued functions $u : U \subset \mathbb{R}^n \to \mathbb{R}^m$ the derivation of the Euler-Lagrange equation is similar:

Exercise 122. Let $u : U \subset \mathbb{R}^n \to \mathbb{R}^m$ be a minimizer of

\[ \int_U L(Du, u, x). \]

Show that, for each α = 1, ..., m,

\[ -\sum_k \big(D_{p_k^\alpha}L(Du, u, x)\big)_{x_k} + D_{z^\alpha}L(Du, u, x) = 0. \]

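The second variation is also similar:

Exercise 123. Let $u : U \subset \mathbb{R}^n \to \mathbb{R}^m$ be a minimizer of

\[ \int_U L(Du, u, x). \]

Show that for any compactly supported $\varphi : U \to \mathbb{R}^m$

\[ \sum_{\alpha, \beta} \int_U \Big[ \sum_{j,k} D^2_{p_k^\alpha p_j^\beta}L\,\varphi^\alpha_{x_k}\varphi^\beta_{x_j} + 2\sum_k D^2_{p_k^\alpha z^\beta}L\,\varphi^\alpha_{x_k}\varphi^\beta + D^2_{z^\alpha z^\beta}L\,\varphi^\alpha\varphi^\beta \Big] \ge 0. \]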

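Theorem 61. Let $u : U \subset \mathbb{R}^n \to \mathbb{R}^m$ be a $C^2$ minimizer of

\[ \int_U L(Du, u, x), \]

under fixed boundary conditions at ∂U. Then

\[ L_{p_i^\alpha p_j^\beta}\,\xi^\alpha\xi^\beta k_i k_j \ge 0, \]

for all vectors $\xi \in \mathbb{R}^m$ and $k \in \mathbb{R}^n$.

Proof. Let $\eta \in C_c^\infty(U)$ be a real valued function. Fix $\xi \in \mathbb{R}^m$ and $k \in \mathbb{R}^n$. Use

\[ \varphi = \varepsilon^3 \xi\,\eta(x) \sin\frac{k \cdot x}{\varepsilon} \]

in exercise 123, divide by $\varepsilon^4$, and let ε → 0.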

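A function F satisfies the Legendre-Hadamard condition if

\[ F_{p_i^\alpha p_j^\beta}(p, z, x)\,\xi^\alpha\xi^\beta\eta_i\eta_j \ge \theta|\xi|^2|\eta|^2 \]

for all vectors $\xi \in \mathbb{R}^m$ and $\eta \in \mathbb{R}^n$. The Legendre-Hadamard condition is weaker than convexity in p (unless m = 1 or n = 1), which would be

\[ F_{p_i^\alpha p_j^\beta}(p, z, x)\,M^i_\alpha M^j_\beta \ge \theta|M|^2, \]

for all matrices $M \in \mathbb{R}^{m \times n}$.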


Exercise 124. Let U be a domain in $\mathbb{R}^2$. Show that if $L : \mathbb{R}^{2 \times 2} \to \mathbb{R}$ is given by

\[ L(P) = \det P + \varepsilon|P|^2 \]

then L satisfies the Legendre-Hadamard condition but is not convex if ε is sufficiently small.
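The difference between the two conditions can be observed numerically for this Lagrangian. The sketch below is an illustration, with hypothetical helper names and an arbitrary value of ε; it uses the fact that L is a quadratic polynomial in P, so the symmetric second difference L(P + M) − 2L(P) + L(P − M) equals the second derivative of L at P in the direction M.

```python
def det2(P):
    """Determinant of a 2x2 matrix given as nested lists."""
    return P[0][0] * P[1][1] - P[0][1] * P[1][0]


def norm_sq(P):
    """Squared Frobenius norm."""
    return sum(entry * entry for row in P for entry in row)


def lagrangian(P, eps):
    return det2(P) + eps * norm_sq(P)


def shift(P, M, t):
    """Matrix P + t M."""
    return [[P[i][j] + t * M[i][j] for j in range(2)] for i in range(2)]


def second_variation(P, M, eps):
    """Second derivative of t -> L(P + t M) at t = 0 (exact for quadratic L)."""
    return (lagrangian(shift(P, M, 1.0), eps)
            - 2.0 * lagrangian(P, eps)
            + lagrangian(shift(P, M, -1.0), eps))


def rank_one(xi, eta):
    """Rank-one matrix with entries xi[a] * eta[i]."""
    return [[xi[a] * eta[i] for i in range(2)] for a in range(2)]
```

Along rank-one directions the determinant contributes nothing and the second variation is positive, whereas along M = diag(1, −1) it is negative once ε is small, which is the mechanism behind the exercise.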

Exercise 125. Use the Lagrange multiplier method to show that the minimizers of

\[ \int_U |Du|^2 \]

with u = 0 in ∂U, under the constraint

\[ \int_U |u|^2 = 1, \]

are eigenfunctions of the Laplacian.

Exercise 126. Use the Lagrange multiplier method to determine a boundary condition in ∂U for the minimizers of

\[ \int_U |Du|^2 \]

under the constraints

\[ \int_U |u|^2 = 1, \qquad \int_U u = 0. \]

Exercise 127. Let 1 < p < ∞. Determine the Euler-Lagrange equation for the minimizers of the functional

\[ \int_U |Du|^p, \]

with u = g in ∂U.

Exercise 128. Let U be a domain in $\mathbb{R}^n$. Let $L(P) : \mathbb{R}^{n \times n} \to \mathbb{R}$ be given by

\[ L(P) = \det P. \]

Determine the Euler-Lagrange equation corresponding to the functional

\[ u \mapsto \int_U L(Du). \]

Explain why this Lagrangian is called a "null Lagrangian".


2. Further necessary conditions and applications

2.1. Boundary conditions.

2.2. Variational inequalities.

2.3. Lagrange Multipliers.

2.4. Minimal surfaces.

2.5. Higher order problems.

3. Convexity and sufficient conditions

4. Direct method in the calculus of variations

4.1. Scalar case. To ensure the existence of a minimizer, we will impose conditions on the Lagrangian which ensure coercivity and lower semicontinuity. In our discussion we will consider the following model problem:

\[ \min_{u|_{\partial U} = 0} \int_U L(Du, u, x)\,dx. \tag{88} \]

Similar methods would work if we were to choose the boundary condition u = g at ∂U, with $g \in C^\infty(\partial U)$ (or with adequate regularity).

The following condition:

\[ L(p, z, x) \ge \alpha|p|^q - \beta, \tag{89} \]

for all $(p, z, x) \in \mathbb{R}^n \times \mathbb{R} \times U$, with α, β > 0 and 1 < q < ∞, is enough to ensure coercivity. In fact, it implies

\[ I[w] \ge \alpha\|Dw\|^q_{L^q(U)} - \gamma, \]

for some γ > 0. Consequently, in the Sobolev space $W_0^{1,q}$ we have

\[ I[w] \to \infty \]

as $\|w\|_{W_0^{1,q}} \to \infty$.

Exercise 129. Show that the functional associated to a Lagrangian satisfying (83) is coercive in $W_0^{1,\alpha}$ for 1 < α < ∞ if $L(p) \ge C|p|^\alpha$ and f is bounded.

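Let $w_k$ be a minimizing sequence. Then

\[ \sup_k \|w_k\|_{W^{1,q}} < \infty. \]

To see this, observe that, using $w_k|_{\partial U} = 0$ and the Poincare inequality, we have

\[ \|w_k\|_{W^{1,q}} \le C\|Dw_k\|_{L^q}, \]

and the right-hand side is bounded by the coercivity estimate, since $I[w_k]$ is bounded.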

Exercise 130. Let U be a bounded domain, $g \in C^\infty(\partial U)$ and $w_k$ a minimizing sequence with the boundary condition w = g on ∂U. Show that

\[ \sup_k \|w_k\|_{W^{1,q}} < \infty. \]

Since in an infinite dimensional space a bounded sequence may fail to have any convergent subsequence, we will have to use weak convergence. In a reflexive and separable Banach space any bounded sequence $w_k$ has a weakly convergent subsequence (which we still denote by $w_k$):

\[ w_k \rightharpoonup w. \]

This means, for a bounded sequence in $W^{1,q}$, that

\[ \int Dw_k\,D\phi + w_k\phi \to \int Dw\,D\phi + w\phi, \]

for all $\phi \in W^{1,q'}$, where $\frac{1}{q} + \frac{1}{q'} = 1$.

As the next exercise shows, one main difficulty in using weak convergence arises from the lack of continuity of non-linear functionals with respect to weak convergence.


Exercise 131. Let $w_k(x) = \sin 2\pi kx$. Show that $w_k \rightharpoonup 0$ in $L^q([0, 1])$ and that $\int_0^1 w_k^2 = \frac{1}{2}$, independently of k. Conclude that $w_k^2 \not\rightharpoonup 0$.
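The two claims of this exercise can be checked numerically for a particular test function. The sketch below is an illustration under assumptions not in the text: the test function φ(x) = x, the midpoint quadrature rule with a fixed number of nodes, and the helper names `pairing` and `square_mean`.

```python
import math


def integrate(fn, nodes=20000):
    """Midpoint rule on [0, 1]."""
    h = 1.0 / nodes
    return h * sum(fn((i + 0.5) * h) for i in range(nodes))


def pairing(k):
    """Integral of sin(2 pi k x) against the test function phi(x) = x."""
    return integrate(lambda x: math.sin(2.0 * math.pi * k * x) * x)


def square_mean(k):
    """Integral of sin(2 pi k x)^2 over [0, 1]."""
    return integrate(lambda x: math.sin(2.0 * math.pi * k * x) ** 2)
```

For this test function the exact value of the pairing is −1/(2πk), which tends to zero as k grows, while `square_mean(k)` stays equal to 1/2. The oscillations average out against a fixed function, but their squares do not.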

For our purposes we will show that, under certain conditions, we have weak lower semicontinuity, that is, whenever

\[ w_k \rightharpoonup w, \]

then

\[ \liminf_{k \to \infty} I[w_k] \ge I[w]. \]

Note that in general we do not expect continuity, i.e. $I[w_k] \to I[w]$.

Theorem 62. Assume that for fixed z and x the mapping $p \mapsto L(p, z, x)$ is convex. Then I[·] is weakly lower semicontinuous in $W^{1,q}$.

Remark. From the previous chapter we already know that convexity of L in p is a natural condition.

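Proof. Suppose that $w_k \rightharpoonup w$ in $W^{1,q}$. Then:

1. $\sup_k \|w_k\|_{W^{1,q}} < \infty$.

2. By the Rellich-Kondrachov theorem we can extract a subsequence $w_k \to w$ in $L^r$, with $r < q^*$.

3. By extracting, if necessary, a further subsequence, we may assume that $w_k \to w$ almost everywhere.

4. By Egorov's theorem, for all ε > 0 there exists a set $E_\varepsilon \subset U$ such that $|U \setminus E_\varepsilon| \le \varepsilon$ and $w_k \to w$ uniformly in $E_\varepsilon$.

5. Defining

\[ F_\varepsilon = \left\{x \in U : |w| + |Dw| < \frac{1}{\varepsilon}\right\} \]

we have $|U \setminus F_\varepsilon| \to 0$ when ε → 0 and therefore $G_\varepsilon = F_\varepsilon \cap E_\varepsilon$ satisfies $|U \setminus G_\varepsilon| \to 0$ when ε → 0.

We can assume, without loss of generality, that L ≥ 0. Then

\begin{align*}
I[w_k] = \int_U L(Dw_k, w_k, x)\,dx &\ge \int_{G_\varepsilon} L(Dw_k, w_k, x) \\
&\ge \int_{G_\varepsilon} L(Dw, w_k, x) + \int_{G_\varepsilon} D_pL(Dw, w_k, x)(Dw_k - Dw) \\
&\xrightarrow[k \to \infty]{} \int_{G_\varepsilon} L(Dw, w, x),
\end{align*}

where the second inequality uses the convexity of L in p, and the limit follows from the uniform convergence $w_k \to w$ on $G_\varepsilon$, the boundedness of w and Dw on $G_\varepsilon$, and the weak convergence $Dw_k \rightharpoonup Dw$. Therefore

\[ \liminf_{k \to \infty} I[w_k] \ge \int_{G_\varepsilon} L(Dw, w, x). \]

Using the monotone convergence theorem when ε → 0, we obtain

\[ \liminf_{k \to \infty} I[w_k] \ge I[w]. \]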

Exercise 132. Give an example of a sequence $u_k$ convergent in $L^r$ to a function $u \in L^r$ which does not converge pointwise. Show, however, that there exists a subsequence of $u_k$ which converges to u almost everywhere.

As a corollary we have existence of a solution of (88):

Theorem 63. Suppose L is coercive, that is, it satisfies (89), and is convex in p. Then there exists a minimizer of (88).

Exercise 133. The Lagrangian

\[ L(p, z, x) = \frac{|p|^2}{2} + f(x)z \]

does not satisfy (89). Show, using similar ideas, that there exists a minimizer in $W^{1,2}(\Omega)$ of the functional

\[ \int_\Omega L(Du, u, x) \]

in the class of functions with $u|_{\partial\Omega} = g$, for $g \in C^\infty(\partial\Omega)$.

Exercise 134. Generalize the previous exercise for Lagrangians of the

form

L(Du) + f(x)u,

with L convex and L(p) ≥ c1|p|q + c2 for some 1 < q <∞.

Exercise 135. Use the direct method in the calculus of variations to establish the existence of minimizers in $W^{2,2}$ of the functional

\[ \int_\Omega \frac{|\Delta u|^2}{2} + f(x) \cdot Du + g(x)u, \]

with $u|_{\partial\Omega} = h_1$ and $u_\nu|_{\partial\Omega} = h_2$ (where $u_\nu$ is the normal derivative).

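Exercise 136. Let $\Omega \subset \mathbb{R}^n$ be a bounded domain and $f : \Omega \to \mathbb{R}^n$ a $C^\infty$ function with compact support. Show that the variational problem

\[ \min_{u|_{\partial\Omega} = 0} \int_\Omega \frac{|\nabla u|^2}{2} + f \cdot \nabla u + \frac{1}{1 + u^2} \]

admits a minimizer in $W_0^{1,2}(\Omega)$.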

Exercise 137. Let $\Omega$ be a regular domain. Establish the existence of minimizers in $W^{1,2}(\Omega)$ of the functional
\[ \int_\Omega |\nabla u|^2 + |u|^2 + \int_{\partial\Omega} g(x) u, \]
where $g \in L^2(\partial\Omega)$.

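4.2. Systems. A functional of the form
\[ u \mapsto \int_U L(Du, u, x)\,dx \]
is quasiconvex if for all $P \in \mathbb{R}^{n \times m}$, $z_0 \in \mathbb{R}^m$, $x_0 \in U$ and any cube $Q \subset \mathbb{R}^n$,
\[ \int_Q L(P, z_0, x_0)\,dx \le \int_Q L(P + Dv, z_0, x_0)\,dx \]
for all functions $v$ with compact support in $Q$.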

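Exercise 138. Consider a minimizer $u : U \to \mathbb{R}^m$ of
\[ \int_U L(Du, u, x)\,dx. \]
Let $Q$ be a cube containing the origin. Suppose $\varphi$ is a compactly supported function on $Q$. Let $u_\lambda = u + \lambda \varphi\left(\frac{x}{\lambda}\right)$. Deduce from
\[ \int_{\lambda Q} L(Du, u, x) \le \int_{\lambda Q} L(Du_\lambda, u_\lambda, x) \]
that
\[ \int_Q L(Du(0), u(0), 0) \le \int_Q L(Du(0) + D\varphi, u(0), 0). \]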

Exercise 139. Show that convexity implies quasiconvexity.

Theorem 64. Let $L(P, z, x) : \mathbb{R}^{n \times m} \times \mathbb{R}^m \times U \to \mathbb{R}$, with $U \subset \mathbb{R}^n$ a bounded domain. Suppose that $L$ is quasiconvex and satisfies the following properties:

• $0 \le c|P|^p + c \le L \le C|P|^p + C$

• $|D_P L| \le C|P|^{p-1} + C$

• $|D_z L| \le C|P|^{p-1} + C$

• $|D_x L| \le C|P|^p + C$.

Then there exists a minimizer $u \in W^{1,p}_0$ of
\[ I[u] = \int_U L(Du, u, x). \]
Note that a similar result would also hold for non-homogeneous boundary conditions $u = g$ on $\partial U$.

Proof. First recall the following result. Let $Q_i(x)$ denote a dyadic cube containing $x$ with side length $2^{-i}$. For $f \in L^p$, $1 < p < \infty$, define
\[ \langle f \rangle_i(x) = \fint_{Q_i(x)} f. \]
Then $\langle f \rangle_i \to f$ in $L^p$.

Clearly any minimizing sequence $u_k$ is bounded in $W^{1,p}$ and therefore there exists $u \in W^{1,p}_0$ such that, up to a subsequence,
\[ u_k \rightharpoonup u \]
and $u_k \to u$ strongly in $L^r$ for some $r > p$.

Consider the sequence of measures $\mu_k$ with density $|Du_k|^p + |u_k|^p$. Then, up to a subsequence, there exists $\mu$ such that $\mu_k \rightharpoonup \mu$. By translation we may choose a dyadic division of $\mathbb{R}^n$, $(Q_i^j)$, such that $\mu(\partial Q_i^j) = 0$ for all cubes. Let $x_i^j$ denote the center of $Q_i^j$, and $z_i^j = \langle u \rangle_i(x_i^j)$.

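Fix $\epsilon > 0$ and choose $V \subset\subset U$ such that
\[ \int_{U \setminus V} L(Du, u, x) \le \epsilon. \]
Then
\[
\begin{aligned}
I[u_k] &\ge \sum_{j:\,Q_i^j \cap V \neq \emptyset} \int_{Q_i^j} L(Du_k, u_k, x) \\
&= \sum_{j:\,Q_i^j \cap V \neq \emptyset} \int_{Q_i^j} L(Du_k, \langle u_k \rangle_i, x_i^j) + E_0,
\end{aligned}
\]
where the error term $E_0$ can be estimated as follows:
\[
\begin{aligned}
E_0 &\le \sum_{j:\,Q_i^j \cap V \neq \emptyset} \int_{Q_i^j} \left| L(Du_k, u_k, x) - L(Du_k, \langle u_k \rangle_i, x_i^j) \right| \\
&\le \sum_{j:\,Q_i^j \cap V \neq \emptyset} \int_{Q_i^j} \left[ (C|Du_k|^p + C)\,|x_i^j - x| + (C|Du_k|^{p-1} + C)\,|u_k - \langle u_k \rangle_i| \right] \to 0,
\end{aligned}
\]
as $i \to \infty$, uniformly in $k$. Indeed, in the first term the convergence is uniform because $\|Du_k\|_{L^p}$ is bounded uniformly in $k$, whereas in the second term $|u_k - \langle u_k \rangle_i| \to 0$ uniformly in $k$ because $u_k$ is bounded in $W^{1,p}$.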

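Therefore we have
\[ I[u_k] \ge \sum_{j:\,Q_i^j \cap V \neq \emptyset} \int_{Q_i^j} L(Du + D(u_k - u), z_i^j, x_i^j) + o_i(1), \]
where $o_i(1)$ stands for the error terms that converge to $0$ as $i \to \infty$, uniformly in $k$. Thus, for the fixed $\epsilon > 0$, for $i$ sufficiently large and all $k$:
\[ I[u_k] \ge \sum_{j:\,Q_i^j \cap V \neq \emptyset} \int_{Q_i^j} L(Du + D(u_k - u), z_i^j, x_i^j) - \epsilon. \]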

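Now choose $0 < \sigma < 1$ and denote by $\tilde Q_i^j$ the cube concentric with $Q_i^j$ but with edge $\sigma 2^{-i}$. Choose $\varphi_i^j$ smooth, compactly supported, with
\[ \varphi_i^j = \begin{cases} 1 & \text{in } \tilde Q_i^j \\ 0 & \text{in } (Q_i^j)^C, \end{cases} \]
and with
\[ |D\varphi_i^j| \le \frac{C 2^i}{1 - \sigma}. \]
Then
\[ I[u_k] \ge \sum_{j:\,Q_i^j \cap V \neq \emptyset} \int_{Q_i^j} L(Du + D(v_i^j), z_i^j, x_i^j) + E_1 - \epsilon, \]
where
\[ v_i^j = \varphi_i^j (u_k - u), \]
and
\[ |E_1| \le C \sum_{j:\,Q_i^j \cap V \neq \emptyset} \int_{Q_i^j \setminus \tilde Q_i^j} 1 + |Du|^p + |Du_k|^p + |D\varphi_i^j|^p |u_k - u|^p, \]
and therefore, for any $\epsilon > 0$,
\[ \limsup_{k \to \infty} |E_1| < \epsilon, \]
if $\sigma$ is sufficiently close to 1. Thus we can choose $i_0$ and $k_0$ large enough so that
\[ I[u_k] \ge \sum_{j:\,Q_i^j \cap V \neq \emptyset} \int_{Q_i^j} L(Du + D(v_i^j), z_i^j, x_i^j) - 2\epsilon, \]
for all $i \ge i_0$ and all $k \ge k_0$. Note that
\[
\sum_{j:\,Q_i^j \cap V \neq \emptyset} \int_{Q_i^j} L(Du + D(v_i^j), z_i^j, x_i^j)
\ge \sum_{j:\,Q_i^j \cap V \neq \emptyset} \int_{Q_i^j} L(\langle Du \rangle_i + D(v_i^j), z_i^j, x_i^j) + E_2.
\]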

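Furthermore, we have
\[
\begin{aligned}
E_2 &\le \sum_{j:\,Q_i^j \cap V \neq \emptyset} \int_{Q_i^j} \left| L(Du + D(v_i^j), z_i^j, x_i^j) - L(\langle Du \rangle_i + D(v_i^j), z_i^j, x_i^j) \right| \\
&\le \sum_{j:\,Q_i^j \cap V \neq \emptyset} \int_{Q_i^j} \left( 1 + |Du|^{p-1} + |\langle Du \rangle_i|^{p-1} \right) |Du - \langle Du \rangle_i|.
\end{aligned}
\]
Since $\|\langle Du \rangle_i\|_{L^p} \le \|Du\|_{L^p}$ and $\langle Du \rangle_i \to Du$ in $L^p$, for $i$ large enough we have $|E_2| \le \epsilon$.

Therefore, using quasiconvexity, for $k$ and $i$ large enough,
\[ I[u_k] \ge \sum_{j:\,Q_i^j \cap V \neq \emptyset} \int_{Q_i^j} L(\langle Du \rangle_i, z_i^j, x_i^j) - 3\epsilon. \]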

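Finally, observe that
\[
\sum_{j:\,Q_i^j \cap V \neq \emptyset} \int_{Q_i^j} L(\langle Du \rangle_i, z_i^j, x_i^j)
\ge \sum_{j:\,Q_i^j \cap V \neq \emptyset} \int_{Q_i^j} L(Du, u, x) + E_3,
\]
where
\[ E_3 \le \sum_{j:\,Q_i^j \cap V \neq \emptyset} \int_{Q_i^j} \left| L(\langle Du \rangle_i, z_i^j, x_i^j) - L(Du, u, x) \right|, \]
which also converges to $0$ as $i \to \infty$. Therefore, by sending $\epsilon \to 0$, we obtain $\liminf_{k \to \infty} I[u_k] \ge I[u]$, so that the weak limit $u$ of $u_k$ in $W^{1,p}$ is a minimizer.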

Exercise 140. Suppose that $L(P)$ satisfies the uniform strict quasiconvexity property:
\[ \int_Q L(P) + \frac{\gamma}{2} |Dv|^2 \le \int_Q L(P + Dv), \]
for all $v \in C^\infty_c(Q)$. Let $u_k$ be a minimizing sequence in $W^{1,2}$. Show that $u_k \to u$ strongly in $W^{1,2}$.


5. Euler-Lagrange equations

The minimizers we obtained in the previous section using the direct

method in the calculus of variations are critical points and, therefore,

we would like to show that they are solutions (in an appropriate sense)

of the Euler-Lagrange equations.

We will suppose the following additional hypotheses on $L$:

1. $|L(p, z, x)| \le C(|p|^q + |z|^q + 1)$;

2. $|D_p L(p, z, x)| \le C(|p|^{q-1} + |z|^{q-1} + 1)$;

3. $|D_z L(p, z, x)| \le C(|p|^{q-1} + |z|^{q-1} + 1)$.

A function $u \in W^{1,q}$ is a weak solution of the Euler-Lagrange equation (86) if, for all $v \in C^\infty_c(U)$,
\[ \int_U D_p L(Du, u, x) D_x v + D_z L(Du, u, x) v \, dx = 0. \tag{90} \]

Remark. This is a natural definition since, when $u$ is smooth, we can obtain (86) from (90) by integration by parts.

Theorem 65. Under the previous assumptions, if u ∈ W 1,q minimizes

I[·] then u is a weak solution of the Euler-Lagrange equation.

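Proof. Let
\[ i(\tau) = I[u + \tau v]. \]
Then
\[ \frac{i(\tau) - i(0)}{\tau} = \int_U L_\tau(x), \]
where
\[ L_\tau(x) = \frac{L(Du + \tau Dv, u + \tau v, x) - L(Du, u, x)}{\tau}. \]
Clearly
\[ L_\tau(x) \to D_p L(Du, u, x) Dv + D_z L(Du, u, x) v, \]
almost everywhere. Additionally,
\[
\begin{aligned}
L_\tau(x) &= \frac{1}{\tau} \int_0^\tau \frac{d}{ds} L(Du + sDv, u + sv, x)\,ds \\
&= \frac{1}{\tau} \int_0^\tau D_p L(Du + sDv, u + sv, x) Dv + D_z L(Du + sDv, u + sv, x) v \,ds \\
&\le C(|Du|^q + |Dv|^q + |u|^q + |v|^q + 1).
\end{aligned}
\]
Therefore, the dominated convergence theorem shows that $i$ is differentiable at $\tau = 0$ with $i'(0)$ given by the left-hand side of (90). Since $i$ attains its minimum at $\tau = 0$, we have $i'(0) = 0$, which is the desired result.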
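As an illustration of the weak formulation (90), the following sketch works it out for the model Lagrangian $L(p, z, x) = |p|^2/2 + f(x) z$. The choice of Lagrangian, the boundedness of $f$ (so that the growth hypotheses hold with $q = 2$), and the extra smoothness of $u$ in the second step are assumptions made only for this example.

```latex
% Assumed model Lagrangian: L(p,z,x) = |p|^2/2 + f(x) z, so D_p L = p and D_z L = f(x).
% The weak formulation (90) then reads:
\[
  \int_U Du \cdot Dv + f(x)\, v \, dx = 0
  \qquad \text{for all } v \in C_c^\infty(U).
\]
% If, in addition, u is C^2, integrate the first term by parts
% (v vanishes near \partial U, so there are no boundary terms):
\[
  \int_U \bigl( -\Delta u + f(x) \bigr)\, v \, dx = 0
  \quad \text{for all } v \in C_c^\infty(U)
  \quad \Longrightarrow \quad
  -\Delta u + f(x) = 0 \ \text{ in } U .
\]
```

In this case the weak and classical formulations agree whenever $u$ is twice continuously differentiable; the weak form only requires $u \in W^{1,2}$.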

Exercise 141. Prove the last inequality of the previous theorem, that is,
\[
|D_p L(Du + sDv, u + sv, x) Dv + D_z L(Du + sDv, u + sv, x) v| \le C(|Du|^q + |Dv|^q + |u|^q + |v|^q + 1),
\]
uniformly for $0 \le s \le \tau$. Hint: recall the inequalities
\[ ab \le \frac{a^r}{r} + \frac{b^s}{s} \quad \text{with} \quad \frac{1}{r} + \frac{1}{s} = 1 \]
and
\[ |a + b|^r \le C(a^r + b^r). \]

Exercise 142. Impose conditions on $F(A, p, z)$ so that you can prove the existence of minimizers in $W^{2,2}$ of
\[ \int_\Omega F(\Delta u, Du, u), \]
and that these are weak solutions to the corresponding Euler-Lagrange equation.

6. Regularity by energy methods

In order to motivate the results of this section, we start with an

example:


Example 38. Let $L = \frac{|p|^2}{2} + f(x) z$. The corresponding Euler-Lagrange equation is
\[ -\Delta u + f(x) = 0. \]
Let $u$ be a $C^2$ solution of the previous equation. Multiplying the equation by $\Delta u$ and integrating we obtain
\[ \int (\Delta u)^2 = \int f(x) \Delta u. \]
Integrating by parts the left-hand side of this identity and ignoring the boundary terms (of course this is wrong, and some effort is needed to circumvent this difficulty), we have
\[ \sum_{i,j} \int |D^2_{x_i x_j} u|^2 = \int f(x) \Delta u \le \frac{C}{\epsilon} \|f\|_{L^2}^2 + \epsilon \|\Delta u\|_{L^2}^2. \]
As a conclusion, we have
\[ \|D^2 u\|_{L^2}^2 \le C \|f\|_{L^2}^2. \]
This example suggests that if it is possible to somehow control the boundary terms then the solutions to the Euler-Lagrange equation should not only be in $W^{1,2}$ but also in $W^{2,2}$.
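The formal computation in the example can be made explicit in a setting where no boundary terms arise. The sketch below assumes, purely for illustration, that $u \in C_c^\infty(\mathbb{R}^n)$ and $\Delta u = f$; neither assumption is part of the example itself.

```latex
% Assumption (not in the text): u is smooth with compact support, so every
% integration by parts below produces no boundary terms.
\[
  \int (\Delta u)^2
  = \sum_{i,j} \int u_{x_i x_i}\, u_{x_j x_j}
  = -\sum_{i,j} \int u_{x_i}\, u_{x_i x_j x_j}
  = \sum_{i,j} \int u_{x_i x_j}\, u_{x_i x_j}
  = \|D^2 u\|_{L^2}^2 .
\]
% Cauchy-Schwarz and Young's inequality ab <= a^2/2 + b^2/2:
\[
  \|D^2 u\|_{L^2}^2 = \int f\, \Delta u
  \le \tfrac12 \|f\|_{L^2}^2 + \tfrac12 \|\Delta u\|_{L^2}^2
  = \tfrac12 \|f\|_{L^2}^2 + \tfrac12 \|D^2 u\|_{L^2}^2 ,
\]
% and absorbing the last term into the left-hand side:
\[
  \|D^2 u\|_{L^2}^2 \le \|f\|_{L^2}^2 .
\]
```

The absorption step is the template for the energy estimates of this section: a small multiple of the highest-order term on the right is moved to the left-hand side.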

To simplify the presentation we will consider a restricted class of Lagrangians of the form
\[ L(p) - z f(x), \]
with
\[ \theta \le D^2_{pp} L(p) \le \Theta, \]
for suitable constants $0 < \theta < \Theta$. We should note that more complex problems can be handled using similar techniques and nothing essential is really lost by considering this particular problem. We also need to recall the following theorem. Let $u : \mathbb{R}^n \to \mathbb{R}$. For $h \in \mathbb{R}$, $h \neq 0$, define
\[ D^h_i u = \frac{u(x + h e_i) - u(x)}{h}. \]

Theorem 66. Let $1 \le p < \infty$, $u \in W^{1,p}(U)$ and $V \subset\subset U$. Then
\[ \|D^h u\|_{L^p(V)} \le C \|Du\|_{L^p(U)}. \]
Conversely, if $1 < p < \infty$, $u \in L^p$ and
\[ \sup_h \|D^h u\|_{L^p(V)} \le C, \]
then $u \in W^{1,p}(V)$.

Theorem 67. Let $u \in W^{1,2}(U)$ be a weak solution of the equation
\[ -\operatorname{div}(D_p L(Du)) = f. \]
Then $u \in W^{2,2}_{loc}(U)$.

Proof. Let $V \subset\subset W \subset\subset U$ (recall that $A \subset\subset B$ means that the closure of $A$ is a compact subset of $B$) and $\xi \in C^\infty_c(U)$ with
\[ \xi \equiv 1 \text{ in } V, \qquad \xi \equiv 0 \text{ in } U \setminus W, \qquad 0 \le \xi \le 1. \]
Let $h > 0$ be sufficiently small and $1 \le k \le n$. Define
\[ v = -D^{-h}_k (\xi^2 D^h_k u), \]
where
\[ D^h_k w = \frac{w(x + h e_k) - w(x)}{h}. \]

Exercise 143. Show that the operator $D^h_k$ satisfies an "integration by parts formula":
\[ \int v D^h_k u = -\int u D^{-h}_k v, \]
for $u, v \in C_c(U)$.

Suppose $u$ is a weak solution of the Euler-Lagrange equation. Then
\[
0 = \int D_p L(Du) Dv - f v = \int D^h_k (D_p L(Du)) D(\xi^2 D^h_k u) + f D^{-h}_k (\xi^2 D^h_k u).
\]


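We can rewrite:
\[
\begin{aligned}
D^h_k (D_p L(Du)) &= \frac{D_p L(Du(x + h e_k)) - D_p L(Du(x))}{h} \\
&= \frac{1}{h} \int_0^1 \frac{d}{ds} D_p L\bigl(s Du(x + h e_k) + (1 - s) Du(x)\bigr)\,ds \\
&= \frac{1}{h} \int_0^1 D^2_{pp} L(\cdots)\bigl(Du(x + h e_k) - Du(x)\bigr)\,ds \\
&= a^h(x) D^h_k Du,
\end{aligned}
\]
where
\[ a^h(x) = \int_0^1 D^2_{pp} L(\cdots)\,ds. \]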

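The matrix $a^h$ is positive definite, with $\theta \le a^h \le \Theta$. Therefore
\[ \theta \int \xi^2 |D^h_k Du|^2 \le \int \xi^2 a^h (D^h_k Du)(D^h_k Du). \]
Therefore we have the following estimate:
\[
\begin{aligned}
\int_U D^h_k (D_p L(Du)) D(\xi^2 D^h_k u) &\ge \theta \int_U \xi^2 |D^h_k Du|^2 + 2 \int_U a^h (D^h_k Du)(D^h_k u)\, \xi D\xi \\
&\ge \frac{\theta}{2} \int_U \xi^2 |D^h_k Du|^2 - C \int_W |D^h_k u|^2.
\end{aligned}
\]
The second term of the Euler-Lagrange equation satisfies the estimate:
\[
\begin{aligned}
\left| \int f D^{-h}_k (\xi^2 D^h_k u) \right| &\le \frac{C}{\epsilon} \int_U |f|^2 + C \int_U |Du|^2 + \epsilon \int_U \xi^2 |D^{-h}_k D^h_k u|^2 \\
&\le \frac{C}{\epsilon} \int_U |f|^2 + C \int_U |Du|^2 + \epsilon \int_U \xi^2 |D^h_k Du|^2,
\end{aligned}
\]
where we used the estimates, which follow from theorem 66,
\[ \int_U |(D^{-h}_k \xi^2)(D^h_k u)|^2 \le C \int_U |Du|^2, \]
and
\[ \int_U \xi^2 |D^{-h}_k D^h_k u|^2 \le C \int_U \xi^2 |D^h_k Du|^2. \]
Therefore, for $\epsilon$ sufficiently small,
\[ \frac{\theta}{4} \int_U \xi^2 |D^h_k Du|^2 \le C \int_U |f|^2 + |Du|^2. \]
Since the right-hand side does not depend on $h$, theorem 66 implies that $Du \in W^{1,2}(V)$. So $u \in W^{2,2}(V)$.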

The last theorem implies in particular that the Euler-Lagrange

equation div(DpL)−DzL = 0 holds almost everywhere.

To conclude our discussion concerning energy methods, we are go-

ing to review some facts concerning elliptic equations, namely Lax-

Milgram’s theorem.

Exercise 144. Let $u \in W^{2,2}_{loc}$ be a solution of the Euler-Lagrange equation
\[ -\operatorname{div}(D_p L(Du)) = f(x). \tag{91} \]
Show that $u$ is a weak solution of
\[ -\bigl(D^2_{p_i p_j} L(Du)\, u_{x_k x_j}\bigr)_{x_i} = f_{x_k}, \]
which can be obtained from (91) by differentiation with respect to $x_k$.

Let $v = u_{x_k}$. The previous exercise shows that $v$ is a weak solution of
\[ -(a_{ij} v_{x_j})_{x_i} = g, \tag{92} \]
where
\[ a_{ij} = D^2_{p_i p_j} L(Du), \qquad g = f_{x_k}. \]
Equation (92) is an elliptic equation since the matrix $a$ is positive definite, that is,
\[ a_{ij} \xi_i \xi_j \ge \theta |\xi|^2, \]
for all vectors $\xi \in \mathbb{R}^n$.

The main tool to establish existence of solutions of elliptic equations is the Lax-Milgram theorem:

Theorem 68 (Lax-Milgram). Let $H$ be a Hilbert space with norm $\|\cdot\|$, inner product $(\cdot, \cdot)$ and duality pairing denoted by $\langle \cdot, \cdot \rangle$. Let
\[ B[\cdot, \cdot] : H \times H \to \mathbb{R} \]
be a continuous bilinear form, that is,
\[ |B[u, v]| \le \alpha \|u\| \|v\|, \]
and coercive, that is,
\[ B[u, u] \ge \beta \|u\|^2. \]
Let $f : H \to \mathbb{R}$ be a continuous linear functional ($f \in H'$). Then there exists $u \in H$ such that
\[ B[u, v] = \langle f, v \rangle \quad \forall v \in H. \]

Proof. For the proof of the theorem, we need the following result

from functional analysis:

Theorem 69 (Riesz representation theorem). Let H be a Hilbert space

and H ′ its dual. Then, for each u∗ ∈ H ′, there exists u ∈ H such that

〈u∗, v〉 = (u, v) ∀v ∈ H.

For each fixed u, the functional

v 7→ B[u, v]

is a continuous linear functional. Thus, by Riesz theorem, there exists

w ∈ H, dependent upon u that we denote by

w = Au,

such that

B[u, v] = (Au, v).

We will show that A is a continuous linear mapping. To establish

linearity it suffices to observe that

(A(λ1u1 + λ2u2), v) = B[λ1u1 + λ2u2, v] =

= λ1B[u1, v] + λ2B[u2, v] =

= λ1(Au1, v) + λ2(Au2, v).

The continuity follows from the estimate
\[ \|Au\|^2 = (Au, Au) = B[u, Au] \le \alpha \|u\| \|Au\|, \]
and, therefore,
\[ \|Au\| \le \alpha \|u\|. \]
By coercivity we have
\[ \beta \|u\|^2 \le B[u, u] = (Au, u) \le \|Au\| \|u\|, \]
and, therefore,
\[ \|Au\| \ge \beta \|u\|. \]
Consequently, $A$ is injective and its image is closed in $H$.

Finally, we claim that the image of $A$ is $H$. For that, let $w \in R(A)^\perp$. Then
\[ \beta \|w\|^2 \le B[w, w] = (Aw, w) = 0 \]
and, therefore, $w = 0$. Therefore, we have just shown that $A$ has a continuous inverse.

Again, by Riesz theorem, there exists w such that

〈f, v〉 = (w, v),

and, consequently, since A is invertible, there exists u such that

Au = w,

that is

B[u, v] = (Au, v) = (w, v) = 〈f, v〉.

As an application of the Lax-Milgram theorem, we have the follow-

ing result:

Example 39. Let $H = W^{1,2}_0$, $f \in L^2$ and
\[ B[u, v] = \int_U a_{ij} u_{x_i} v_{x_j}, \]
with $a_{ij}$ elliptic, and
\[ \langle f, v \rangle = -\int_U f v_{x_k}. \]
We have
\[ B[u, v] \le C \|u\|_{W^{1,2}_0} \|v\|_{W^{1,2}_0} \]
and, by the Poincaré inequality,
\[ B[u, u] \ge \beta \|u\|^2_{W^{1,2}_0}. \]
Thus, by the Lax-Milgram theorem, there exists a weak solution in $W^{1,2}_0$ of
\[ -(a_{ij} u_{x_i})_{x_j} = f_{x_k}. \]
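When the bilinear form is symmetric, the Lax-Milgram solution can also be read as a minimizer, which ties this result back to the direct method. The short computation below is a sketch under the added assumption $B[u, v] = B[v, u]$ (for instance $a_{ij} = a_{ji}$ in the example above); the functional $J$ is a name introduced only for this illustration.

```latex
% Assumption (not in the text): B is symmetric. Define, for illustration,
%   J[v] = (1/2) B[v,v] - <f, v>.
% Let u satisfy B[u,v] = <f,v> for all v in H. Then, for any w in H,
\[
  J[u + w] - J[u]
  = B[u, w] - \langle f, w \rangle + \tfrac12 B[w, w]
  = \tfrac12 B[w, w]
  \ge \tfrac{\beta}{2} \|w\|^2 \ge 0 ,
\]
% so u is the unique minimizer of J over H.
```

Without symmetry the equation $B[u, v] = \langle f, v \rangle$ is in general not the Euler-Lagrange equation of a functional, and this is the situation in which the Lax-Milgram theorem provides something the direct method does not.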

Exercise 145. Use the Lax-Milgram theorem to establish the existence of solutions of
\[ \Delta^2 u = f, \]
with $u \in W^{2,2}_0(B_1(0))$.

Exercise 146. Suppose that $b(x) : \mathbb{R}^n \to \mathbb{R}^n$ is a bounded $C^\infty(\mathbb{R}^n)$ function and that $f \in L^2(\mathbb{R}^n)$. Use the Lax-Milgram theorem to establish the existence of solutions in $W^{1,2}(\mathbb{R}^n)$ of
\[ -\Delta u + b(x) \cdot \nabla u + \lambda u = f, \]
for $\lambda$ large enough.

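In what remains of this section we will establish an essential result: Gårding's inequality.

Theorem 70. Let $A^{\alpha\beta}_{ij}(x)$ be uniformly continuous functions satisfying
\[ \sum_{ij} \sum_{\alpha\beta} A^{\alpha\beta}_{ij} \eta_\alpha \eta_\beta \xi_i \xi_j \ge C |\eta|^2 |\xi|^2. \]
Let $U$ be a bounded domain with smooth boundary. Then, for all $u \in W^{1,2}(U)$ we have
\[ C \int |u|^2 + \int \sum_{ij} \sum_{\alpha\beta} A^{\alpha\beta}_{ij} D_i u^\alpha D_j u^\beta \ge C \int |Du|^2, \]
where the constants $C$ may differ from one occurrence to the next.

Proof. By the extension theorem, for $u \in W^{1,2}(U)$ there exists another function $\tilde u \in W^{1,2}(\mathbb{R}^d)$, compactly supported, such that $\tilde u = u$ in $U$ and $\|\tilde u\|_{W^{1,2}(\mathbb{R}^d)} \le C \|u\|_{W^{1,2}(U)}$. We will drop the $\sim$ in what follows, to simplify the notation.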


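First consider the case in which $A^{\alpha\beta}_{ij}$ is constant. In this case, by using the Fourier transform we have
\[
\int \sum_{ij} \sum_{\alpha\beta} A^{\alpha\beta}_{ij} D_i u^\alpha D_j u^\beta
= C \sum_{ij} \sum_{\alpha\beta} \int A^{\alpha\beta}_{ij} \xi_i \xi_j \hat u^\alpha \overline{\hat u^\beta}
\ge C \int |\xi|^2 |\hat u|^2 = C \|Du\|^2_{L^2(\mathbb{R}^d)}.
\]
Now we consider a localized version of the inequality. Suppose that $\operatorname{supp} u \subset B_R(x_0)$ with $R$ sufficiently small. Let $\omega(R)$ denote the modulus of continuity of $A^{\alpha\beta}_{ij}$. Then
\[
\begin{aligned}
\int \sum_{ij} \sum_{\alpha\beta} A^{\alpha\beta}_{ij}(x) D_i u^\alpha D_j u^\beta
&= \int \sum_{ij} \sum_{\alpha\beta} A^{\alpha\beta}_{ij}(x_0) D_i u^\alpha D_j u^\beta \\
&\quad + \int \sum_{ij} \sum_{\alpha\beta} \bigl(A^{\alpha\beta}_{ij}(x) - A^{\alpha\beta}_{ij}(x_0)\bigr) D_i u^\alpha D_j u^\beta \\
&\ge C \|Du\|^2_{L^2} - \omega(R) \|Du\|^2_{L^2} \ge \frac{C}{2} \|Du\|^2_{L^2},
\end{aligned}
\]
if $R$ is small enough.

Since we can assume that $u$ has compact support in a fixed compact set, we can use a partition of unity to write
\[ u = \sum_k \varphi_k^2 u. \]
Then we have
\[
\sum_k \int \sum_{ij} \sum_{\alpha\beta} \varphi_k^2 A^{\alpha\beta}_{ij}(x) D_i u^\alpha D_j u^\beta
= \sum_k \int \sum_{ij} \sum_{\alpha\beta} A^{\alpha\beta}_{ij}(x) D_i(\varphi_k u^\alpha) D_j(\varphi_k u^\beta) + \text{lower order terms}.
\]
Thus by reassembling everything we obtain the desired inequality.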

Exercise 147. Use Garding’s estimate to obtain W 2,2 regularity for

minimizers of Lagrangians that satisfy the Legendre-Hadamard condi-

tion (for systems, the scalar case was already considered!).


Exercise 148. Let $h > 0$ and let $u^h_n$ be the sequence obtained by the following inductive procedure: given $u^h_n \in W^{1,2}(\mathbb{R}^n)$, $u^h_{n+1}$ is determined by
\[ \min_{u^h_{n+1}} \int_{\mathbb{R}^n} \frac{(u^h_{n+1} - u^h_n)^2}{2h} + \frac{|\nabla u^h_{n+1}|^2}{2}, \]
where $u^h_0 \equiv u_0$ is the initial data.

1. Use the direct method in the calculus of variations to show that for each $u^h_n$ there exists a minimizer $u^h_{n+1} \in W^{1,2}(\mathbb{R}^n)$.

2. Show that the sequence $\|\nabla u^h_n\|_{L^2}$ is decreasing.

3. Determine the Euler-Lagrange equation for $u^h_{n+1}$.

4. Consider the family in $L^2$ indexed by $h$
\[ v^h = \sum_{k=0}^\infty u^h_k(x)\, 1_{kh \le t < (k+1)h}. \]
Show that $v^h(\cdot, t)$ is uniformly bounded in $L^2(\mathbb{R}^n)$ for all $h$ and $t \in [0, T]$.

5. Show that there exists $v$ such that, when $h \to 0$, $v^h \rightharpoonup v$ in $L^2$ and that $v$ is a weak solution of the heat equation
\[ v_t = \Delta v. \]

7. Holder continuity

This section is dedicated to establishing $C^{1,\alpha}$ regularity for the solutions of scalar variational problems. As before, we are going to consider the problem
\[ -\operatorname{div}(D_p L(Du)) = f. \tag{93} \]
We will prove that for $V \subset\subset U$,
\[ Du \in C^\alpha(V), \]
for some $0 < \alpha < 1$, independently of the boundary data. As before, we will work with $x \in \mathbb{R}^d$ for $d \ge 2$.

For that we will differentiate (93) with respect to an arbitrary direction and conclude that $v = u_{x_k}$ is a weak solution to
\[ -\bigl(D^2_{p_i p_j} L(Du)\, v_{x_j}\bigr)_{x_i} = f_{x_k}. \]
This leads us to look at estimates for the linear equation
\[ -(a_{ij} v_{x_i})_{x_j} = \sum_k (f_k)_{x_k}, \]
for $f_k \in L^2$.

First we will establish certain L∞ estimates for the non-homogeneous

linear equations with zero boundary data. Then we consider the ho-

mogeneous equation subjected to non-zero boundary data. Finally we

gather all these estimates to establish our main result.

The regularity for systems is harder and the methods studied in

this section cannot be applied as they rely on the solution u being a

scalar.

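7.1. $L^\infty$ estimates. Our first step consists in obtaining $L^\infty$ estimates for non-homogeneous linear equations with zero boundary data. We start by establishing an auxiliary result.

Lemma 71. Let $\beta > 1$, $\alpha > 0$, and $C > 0$. Suppose that $\phi \ge 0$ is non-increasing and satisfies, for all $h > k \ge 0$,
\[ \phi(h) \le \frac{C}{(h - k)^\alpha} (\phi(k))^\beta. \]
Then
\[ \phi(M) = 0, \]
for $M = \left( C \phi(0)^{\beta - 1} 2^{\alpha\beta/(\beta - 1)} \right)^{1/\alpha}$.

Proof. Define
\[ k_n = M \left( 1 - \frac{1}{2^n} \right). \]
Then
\[ \phi(k_{n+1}) \le \frac{C}{(k_{n+1} - k_n)^\alpha} \phi(k_n)^\beta \le C \frac{2^{\alpha(n+1)}}{M^\alpha} \phi(k_n)^\beta. \tag{94} \]
We now prove by induction that
\[ \phi(k_n) \le \phi(0) 2^{-n\mu}, \]
with $\mu = \frac{\alpha}{\beta - 1} > 0$. The case $n = 0$ is trivial. If the induction hypothesis holds for some $n$ we must show it also holds for $n + 1$. Using (94) and the induction hypothesis we have
\[
\begin{aligned}
\phi(k_{n+1}) &\le C \frac{2^{\alpha(n+1)}}{M^\alpha} \phi(0)^\beta 2^{-\beta n \mu} \\
&\le \phi(0)^\beta \left[ \frac{C^{1/\alpha}\, 2^{n+1}\, 2^{-\beta n/(\beta - 1)}}{C^{1/\alpha} \phi(0)^{(\beta - 1)/\alpha}\, 2^{\beta/(\beta - 1)}} \right]^\alpha \\
&\le \phi(0) \left[ \frac{2^{n + 1 - \beta n/(\beta - 1)}}{2^{\beta/(\beta - 1)}} \right]^\alpha = \phi(0) 2^{-\alpha(n+1)/(\beta - 1)}.
\end{aligned}
\]
Letting $n \to \infty$ and using that $\phi$ is non-increasing, we conclude that $\phi(M) \le \phi(k_n) \to 0$.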

Our main theorem in this section is the following:

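Theorem 72. Let $f_i \in L^p$ for some $p > d$. Let $u$ be a solution of
\[ -(a_{ij} u_{x_i})_{x_j} = \sum_i (f_i)_{x_i} \quad \text{in } \Omega \tag{95} \]
with $u = 0$ on $\partial\Omega$. Then
\[ \|u\|_{L^\infty} \le C \|f\|_{L^p(\Omega)} |\Omega|^{\frac{1}{d} - \frac{1}{p}}. \tag{96} \]

Proof. Let $k > 0$ and multiply (95) by $(u - k)^+$. Then, after an integration by parts,
\[ \int_\Omega a_{ij} u_{x_i} (u - k)^+_{x_j}\,dx = -\sum_i \int_\Omega f_i (u - k)^+_{x_i}\,dx. \]
Define
\[ A(k) = \{u > k\} \cap \Omega. \]
Then
\[ \int_{A(k)} a_{ij} u_{x_i} u_{x_j} = -\sum_i \int_{A(k)} f_i u_{x_i}. \]
Therefore, since $a_{ij}$ is elliptic,
\[ \theta \int_{A(k)} |\nabla u|^2 \le C \left( \int_{A(k)} |f_i|^2 \right)^{1/2} \left( \int_{A(k)} |\nabla u|^2 \right)^{1/2}, \]
which then yields
\[ \int_{A(k)} |\nabla u|^2 \le C \int_{A(k)} |f_i|^2 \le C \left( \int_{A(k)} |f_i|^p \right)^{2/p} |A(k)|^{1 - 2/p}. \]
If $d > 2$ (in the case $d = 2$ we can choose in the place of $2^*$ any exponent $q > 2$, and proceed analogously), by the Sobolev theorem
\[
\begin{aligned}
\left( \int_{A(k)} ((u - k)^+)^{2^*} \right)^{2/2^*} &= \left( \int_\Omega ((u - k)^+)^{2^*} \right)^{2/2^*} \le C \int_\Omega |\nabla (u - k)^+|^2 \\
&\le C \int_{A(k)} |\nabla u|^2 \le C \sum_i \|f_i\|^2_{L^p}\, \phi(k)^{1 - 2/p},
\end{aligned}
\]
where $\phi(k) = |A(k)|$. We also have, for any $h > k$,
\[ (h - k)^2 \phi(h)^{2/2^*} \le \left( \int_{A(h)} ((u - k)^+)^{2^*} \right)^{2/2^*} \le C \sum_i \|f_i\|^2_{L^p}\, \phi(k)^{1 - 2/p}. \]
Therefore we obtain the following relation
\[ \phi(h) \le C \left( \sum_i \|f_i\|_{L^p} \right)^{2^*} \frac{\phi(k)^\beta}{(h - k)^\alpha}, \]
where $\alpha = 2^*$ and $\beta = \frac{1 - \frac{2}{p}}{1 - \frac{2}{d}} = \left(1 - \frac{2}{p}\right) \frac{2^*}{2}$. Then lemma 71 implies that $\phi(M) = 0$ for some
\[ M \le C \sum_i \|f_i\|_{L^p}\, \phi(0)^{(\beta - 1)/\alpha} \le C \sum_i \|f_i\|_{L^p} |\Omega|^{1/d - 1/p}. \]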
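The last step of the proof rests on an exponent computation that is easy to lose track of. The following check, written for $d > 2$ with $2^* = 2d/(d-2)$ as in the proof, is only a verification of the arithmetic; it also shows where the hypothesis $p > d$ enters.

```latex
% Recall: 2^* = 2d/(d-2), \alpha = 2^*, \beta = (1 - 2/p)\, 2^*/2.
\[
  \frac{\beta - 1}{\alpha}
  = \frac12 \Bigl( 1 - \frac{2}{p} \Bigr) - \frac{1}{2^*}
  = \frac12 - \frac1p - \frac{d-2}{2d}
  = \frac1d - \frac1p .
\]
% In particular \beta > 1 (as required by lemma 71) exactly when p > d.
% Since \phi(0) = |A(0)| \le |\Omega| and the exponent is positive:
\[
  \phi(0)^{(\beta - 1)/\alpha} \le |\Omega|^{\frac1d - \frac1p} .
\]
% Lemma 71 is applied with the constant C (\sum_i \|f_i\|_{L^p})^{2^*}, whose
% 1/\alpha-th power is C^{1/2^*} \sum_i \|f_i\|_{L^p}.
```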


7.2. Hölder continuity for the homogeneous equation. Now we consider weak solutions to the equation
\[ -(a_{ij} v_{x_i})_{x_j} = 0, \tag{97} \]
where $a_{ij}$ satisfies
\[ \theta \le [a_{ij}] \le \Theta, \]
but no regularity assumptions are imposed, and no boundary data are prescribed.

A function $u \in W^{1,p}(U)$ is a subsolution of (97) if
\[ \int_U a_{ij} u_{x_i} \phi_{x_j} \le 0, \]
for all $\phi \in W^{1,p'}_0(U)$ with $\phi \ge 0$. In a similar way, $u$ is a supersolution if $-u$ is a subsolution.

Exercise 149. Let $u$ be a smooth subsolution of (97). Show that
\[ -(a_{ij} u_{x_i})_{x_j} \le 0. \]

Lemma 73. Let $u$ be a subsolution of (97) in $W^{1,p}$ and $\psi : \mathbb{R} \to \mathbb{R}$ a non-decreasing convex function, such that $\psi(u) \in W^{1,p}$ (e.g. $\psi'$ bounded). Then $\psi(u)$ is also a subsolution.

Proof. Let $v = \psi(u)$. Then
\[
\int a_{ij} \psi(u)_{x_i} \phi_{x_j} = \int a_{ij} \psi'(u) u_{x_i} \phi_{x_j}
= \int a_{ij} u_{x_i} (\psi'(u) \phi)_{x_j} - \int a_{ij} u_{x_i} u_{x_j} \psi''(u) \phi \le 0,
\]
since $u$ is a subsolution, $\psi'(u)\phi$ is non-negative and, by the convexity of $\psi$, the last term is non-positive.

The next lemma shows that the subsolutions of the equation have their supremum controlled by the $L^p$ norm. This is not a surprising result since the main strategy in the study of elliptic equations is to try to establish control of "high" norms in terms of "low" norms; recall for instance what was discussed concerning energy methods.


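Lemma 74. Let $u$ be a subsolution of (97). Then, for $p > 0$ and $0 < \theta < 1$,
\[ \operatorname{ess\,sup}_{B_{\theta R}} u \le \frac{C}{(1 - \theta)^{n/p}} \left[ \fint_{B_R} (u^+)^p \right]^{1/p}. \]

Proof. Since $u^+$ is a subsolution, we can assume that $u \ge 0$.

Case 1. $p \ge 2$

Let $\phi = \xi^2 u^{p-1}$, with $\xi \in C^\infty_c$. Then
\[ \int a_{ij} u_{x_i} \phi_{x_j} = \int a_{ij} u_{x_i} \left[ (p - 1) u^{p-2} u_{x_j} \xi^2 + 2 \xi \xi_{x_j} u^{p-1} \right] \le 0, \]
which implies
\[ \int u^{p-2} |Du|^2 \xi^2 \le C \int u^p |D\xi|^2. \]
Since
\[ D(u^{p/2} \xi) = D\xi\, u^{p/2} + \frac{p}{2} \xi u^{p/2 - 1} Du, \]
we have
\[ \int |D(u^{p/2} \xi)|^2 \le C \int u^p |D\xi|^2. \]
Consequently, by Sobolev's inequality,
\[ \left[ \int (u^{p/2} \xi)^{2^*} \right]^{2/2^*} \le C \int u^p |D\xi|^2. \]
Given $0 < \rho < R$, let $\xi \in C^\infty_c$ with $0 \le \xi \le 1$, $\xi \equiv 1$ in $B_\rho = B(x_0, \rho)$ and $\xi \equiv 0$ in $B_R^C = B(x_0, R)^C$. We can additionally assume that
\[ |D\xi| \le \frac{C}{R - \rho}. \]
Then, for $n \ge 3$ (for $n < 3$ the estimate is trivial by Sobolev's theorem),
\[ \left[ \int_{B_\rho} u^{pn/(n-2)} \right]^{(n-2)/n} \le \frac{C}{(R - \rho)^2} \int_{B_R} u^p. \tag{98} \]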

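Thus we have obtained an estimate for the $L^{\frac{pn}{n-2}}$ norm in terms of the $L^p$ norm. Unfortunately, these norms are computed in distinct sets. The main idea is to iterate this inequality and, at the same time, control the domains and the constants in the estimate in order to obtain a non-trivial estimate for the $L^\infty$ norm in terms of the $L^p$ norm. For that, consider
\[ R_k = R \left( \theta + \frac{1 - \theta}{2^k} \right), \]
which satisfies
\[ R_k - R_{k+1} = \frac{1 - \theta}{2^{k+1}} R. \]
Let
\[ p_k = p \left( \frac{n}{n - 2} \right)^k. \]
Then, applying estimate (98) with $R = R_k$, $\rho = R_{k+1}$ and $p = p_k$,
\[ \left[ \int_{B_{R_{k+1}}} u^{p_{k+1}} \right]^{\frac{n-2}{n}} \le \frac{C\, 4^{k+1}}{R^2 (1 - \theta)^2} \int_{B_{R_k}} u^{p_k}, \]
that is
\[ \|u\|_{L^{p_{k+1}}(B_{R_{k+1}})} \le \left[ \frac{C}{R^2 (1 - \theta)^2} \right]^{\frac{1}{p_k}} 4^{\frac{k+1}{p_k}} \|u\|_{L^{p_k}(B_{R_k})}. \]
By iteration we obtain
\[ \|u\|_{L^{p_{k+1}}(B_{R_{k+1}})} \le \left[ \frac{C}{R^2 (1 - \theta)^2} \right]^{\sum_{j=0}^k \frac{1}{p_j}} 4^{\sum_{j=0}^k \frac{j+1}{p_j}} \|u\|_{L^p(B_R)}. \]
Since
\[ \sum_{j=0}^\infty \frac{1}{p_j} = \frac{n}{2p}, \]
and $\sum_{j=0}^\infty \frac{j+1}{p_j}$ is finite, we get
\[ \|u\|_{L^{p_{k+1}}(B_{R_{k+1}})} \le \frac{C}{[R(1 - \theta)]^{n/p}} \|u\|_{L^p(B_R)}, \]
where the last constant, $C$, is independent of $k$. Letting $k \to \infty$ we conclude
\[
\|u\|_{L^\infty(B_{\theta R})} \le \frac{C}{(1 - \theta)^{n/p}} R^{-n/p} \|u\|_{L^p(B_R)} = \frac{C}{(1 - \theta)^{n/p}} \left[ \fint_{B_R} u^p \right]^{1/p}.
\]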
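The two series appearing in the iteration can be summed in closed form. The computation below is a sketch assuming $n \ge 3$, so that $q = (n-2)/n \in (0, 1)$; the symbol $q$ is introduced only for this calculation.

```latex
% With q = (n-2)/n \in (0,1) we have p_j = p\, q^{-j}, and 1 - q = 2/n.
\[
  \sum_{j=0}^\infty \frac{1}{p_j}
  = \frac1p \sum_{j=0}^\infty q^j
  = \frac{1}{p(1-q)} = \frac{n}{2p},
  \qquad
  \sum_{j=0}^\infty \frac{j+1}{p_j}
  = \frac1p \sum_{j=0}^\infty (j+1)\, q^j
  = \frac{1}{p(1-q)^2} = \frac{n^2}{4p} .
\]
% Inserting the limiting exponents into the iterated inequality:
\[
  \Bigl[ \frac{C}{R^2 (1-\theta)^2} \Bigr]^{\frac{n}{2p}} 4^{\frac{n^2}{4p}}
  = \frac{C^{\frac{n}{2p}}\, 2^{\frac{n^2}{2p}}}{[R(1-\theta)]^{n/p}} ,
\]
% which accounts for the factor [R(1-\theta)]^{-n/p} in the final estimate.
```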

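Case 2. $0 < p < 2$

By the previous estimate,
\[
\begin{aligned}
\|u\|_{L^\infty(B_{\theta R})} &\le \frac{C}{(1 - \theta)^{n/2} R^{n/2}} \left[ \int_{B_R} u^2 \right]^{1/2} \\
&\le \frac{C}{(1 - \theta)^{n/2} R^{n/2}} \left[ \int_{B_R} u^p \right]^{1/2} \|u\|^{1 - p/2}_{L^\infty(B_R)}.
\end{aligned}
\]
Using the inequality
\[ ab \le \frac{a^{2/p}}{2/p} + \frac{b^{2/(2-p)}}{2/(2-p)}, \]
which holds for $0 < p < 2$, we obtain:
\[ \|u\|_{L^\infty(B_{\theta R})} \le \frac{1}{2} \|u\|_{L^\infty(B_R)} + \frac{C}{[(1 - \theta) R]^{n/p}} \left[ \int_{B_R} u^p \right]^{1/p}. \]
If we define
\[ \varphi(t) = \|u\|_{L^\infty(B_t)}, \]
we have, for $s < t \le R$,
\[ \varphi(s) \le \frac{1}{2} \varphi(t) + \frac{C}{(t - s)^{n/p}} \|u\|_{L^p(B_R)}. \tag{99} \]
We need now a technical lemma:

Lemma 75. Let $\varphi$ be a bounded non-decreasing function satisfying (99). Then, for $s < t \le R$,
\[ \varphi(s) \le C \|u\|_{L^p(B_R)} (t - s)^{-n/p}. \]

Thus the lemma implies
\[ \|u\|_{L^\infty(B_{\theta R})} \le \frac{C \|u\|_{L^p(B_R)}}{(1 - \theta)^{n/p} R^{n/p}}. \]

Proof. Let $\varphi$ satisfy
\[ \varphi(s) \le \frac{1}{2} \varphi(t) + a (t - s)^{-\alpha}, \]
for $s < t$. Let $0 < \tau < 1$ and
\[ s_{i+1} = s_i + (1 - \tau) \tau^i (t - s), \]
with $s_0 = s$. Then
\[ \varphi(s_i) \le \frac{1}{2} \varphi(s_{i+1}) + a (1 - \tau)^{-\alpha} \tau^{-i\alpha} (t - s)^{-\alpha}, \]
and, therefore, by induction
\[ \varphi(s) \le \frac{1}{2^i} \varphi(s_i) + a \sum_{j=0}^{i-1} (1 - \tau)^{-\alpha} \tau^{-j\alpha} (t - s)^{-\alpha} 2^{-j}. \]
Choosing $\tau$ sufficiently close to 1 such that $\frac{\tau^{-\alpha}}{2} < 1$ we have, as $i \to \infty$ and using that $\varphi$ is bounded,
\[ \varphi(s_0) = \varphi(s) \le C a (t - s)^{-\alpha}. \]
This ends the proof of the lemma.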

The next step is to study estimates similar to the ones of lemma 74 for $p < 0$. In this case we obtain, however, the opposite inequality.

Lemma 76. Let $u$ be a non-negative supersolution. Then there exist $\delta > 0$ and $p_0 > 0$ such that
\[ \operatorname{ess\,inf}_{B_{R/2}} u \ge \delta \left( \fint_{B_R} u^{p_0} \right)^{1/p_0}. \tag{100} \]

Proof. We will leave the following fact as an exercise:

Exercise 150. Let $u$ be a positive supersolution. Show that $\frac{1}{u}$ is a subsolution.

Combining the last exercise with lemma 74 we obtain
\[ \operatorname{ess\,sup}_{B_{R/2}} u^{-1} \le C \left( \fint_{B_R} u^{-p} \right)^{1/p}, \]
for $p > 0$. In this way,
\[ \operatorname{ess\,inf}_{B_{R/2}} u \ge C \left[ \fint_{B_R} u^{-p} \fint_{B_R} u^p \right]^{-1/p} \left( \fint_{B_R} u^p \right)^{1/p}, \]
which implies (100) if we can prove
\[ \left( \fint_{B_R} u^{-p} \right) \left( \fint_{B_R} u^p \right) \le C, \]
for some $p > 0$ and $C > 0$.
for some p > 0 and C > 0.

To prove this inequality we need the John-Nirenberg lemma, whose proof is the subject of the next section.

Lemma 77 (John-Nirenberg). Denote by $Q$ a generic cube contained in $U$ and by $Q' \subset Q$ a generic subcube of $Q$. For $f \in L^1$, let $|f|_{*,Q}$ be given by
\[ |f|_{*,Q} = \sup_{Q' \subset Q} \fint_{Q'} |f - f_{Q'}|\,dx, \]
where $f_{Q'}$ denotes the average of $f$ in $Q'$.

Then, if $|f|_{*,Q} < \infty$, there exist positive constants $C_1$ and $C_2$ such that, for all $\lambda > 0$,
\[ \frac{1}{|Q|} \left| \{ x \in Q : |f(x) - f_Q| \ge \lambda |f|_{*,Q} \} \right| \le C_1 e^{-\lambda C_2}. \]

We leave as an exercise the proof of the following corollary:

Corollary 78. If $|f|_{*,Q} < \infty$ then for some $\epsilon > 0$ we have
\[ \fint_Q e^{\epsilon f} < C, \]
independent of $Q$.

The proof of the corollary is a variation of the proof of the following lemma:

Lemma 79. Let $f \in L^p$, $f \ge 0$. Then
\[ \int |f|^p = \int_0^\infty p \lambda^{p-1} \left| \{ x : f(x) > \lambda \} \right| d\lambda. \]

Proof. We have
\[ \int |f|^p\,dx = \int \int_0^{f(x)} p \lambda^{p-1}\,d\lambda\,dx = \int_0^\infty \int p \lambda^{p-1} \chi_{\{\lambda < f(x)\}}\,dx\,d\lambda. \]
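A one-dimensional example makes the formula of lemma 79 concrete. The function $f(x) = x$ on $[0, 1]$ and the exponent range $p \ge 1$ are chosen here purely for illustration.

```latex
% Illustrative choice (not from the text): f(x) = x on [0,1], p \ge 1.
% Left-hand side:
\[
  \int_0^1 x^p \, dx = \frac{1}{p+1} .
\]
% Distribution function: |\{ x \in [0,1] : x > \lambda \}| = 1 - \lambda for
% 0 \le \lambda \le 1, and 0 for \lambda > 1. Right-hand side:
\[
  \int_0^1 p \lambda^{p-1} (1 - \lambda) \, d\lambda
  = 1 - \frac{p}{p+1}
  = \frac{1}{p+1} .
\]
```

Both sides agree, as the lemma predicts. The same identity is what converts the exponential decay of the distribution function in lemma 77 into integrability of an exponential.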


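Let
\[ v = \ln u - \beta. \]
If $|v|_{*,Q} < \infty$, corollary 78 implies
\[ \fint u^{p_0} \le C e^{\beta p_0} \]
and
\[ \fint u^{-p_0} \le C e^{-\beta p_0}. \]
Consequently,
\[ \fint u^{p_0} \fint u^{-p_0} \le C, \]
for some $p_0 > 0$ to be determined. This suggests we should try to estimate $|\ln u|_{*,Q} = |v|_{*,Q}$.

Let $\phi(x) = \frac{\xi^2}{u(x)}$. Then, since $u$ is a supersolution,
\[ -\int a_{ij} u_{x_i} u_{x_j} \frac{\xi^2}{u^2} + \int a_{ij} u_{x_i} \frac{2 \xi \xi_{x_j}}{u} \ge 0, \]
which implies
\[ \int \frac{|Du|^2}{u^2} \xi^2 \le C \int |D\xi|^2. \]
Let $\xi \equiv 1$ in $Q'$ and $\xi \equiv 0$ in the exterior of a cube with twice the side length and the same center. Then we conclude
\[ \int_{Q'} |D \ln u|^2 \le C \rho^{n-2}, \]
where $\rho$ is the side length of $Q'$.

The Poincaré inequality implies
\[ \int_{Q'} |\ln u - (\ln u)_{Q'}|^2 \le C \rho^2 \int_{Q'} |D \ln u|^2 \le C \rho^n. \]
Thus,
\[ \int_{Q'} |\ln u - (\ln u)_{Q'}| \le C \rho^{n/2} \left[ \int_{Q'} |\ln u - (\ln u)_{Q'}|^2 \right]^{1/2} \le C \rho^n. \]
Therefore $|\ln u|_{*,Q} < \infty$, which ends the proof.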


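Theorem 80 (Harnack inequality). Let $u$ be a positive solution. Then
\[ \operatorname{ess\,inf}_{B_{R/2}} u \ge C \operatorname{ess\,sup}_{B_{R/2}} u. \]

Proof. By the two previous lemmas we have
\[ \operatorname{ess\,inf}_{B_{R/2}} u \ge \delta \left( \fint_{B_R} u^{p_0} \right)^{1/p_0} \ge C \operatorname{ess\,sup}_{B_{R/2}} u. \]

Using Harnack's inequality, the Hölder continuity of $u$ is a consequence of the following theorem:

Theorem 81 (De Giorgi-Nash-Moser). Let $u$ be a solution of (97). Then $u$ is Hölder continuous. Furthermore, if we set
\[ M(R) = \operatorname{ess\,sup}_{B_R} u, \qquad m(R) = \operatorname{ess\,inf}_{B_R} u, \]
and let $\omega(R) = M(R) - m(R)$, there exists $\gamma < 1$ such that
\[ \omega(R/2) \le \gamma\, \omega(R). \]

Proof. The Harnack inequality implies, by subtracting $m(R) - \epsilon$ from $u$ and letting $\epsilon \to 0$,
\[ C[m(R/2) - m(R)] \ge M(R/2) - m(R). \]
Defining $\omega(r) = M(r) - m(r)$ we have
\[
\begin{aligned}
\omega(R/2) &= M(R/2) - m(R/2) \\
&\le M(R/2) - \left( m(R) + \frac{1}{C} [M(R/2) - m(R)] \right) = \left( 1 - \frac{1}{C} \right) [M(R/2) - m(R)] \\
&\le \left( 1 - \frac{1}{C} \right) \omega(R).
\end{aligned}
\]
By induction we obtain
\[ \omega(2^{-k} R) \le \eta^k \omega(R), \]
with $\eta = \gamma = 1 - \frac{1}{C} < 1$. Therefore
\[ \omega(\rho) \le C \rho^\alpha, \]
that is, $u$ is Hölder continuous.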
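The passage from the geometric decay of the oscillation to a Hölder bound can be spelled out as follows. This is a sketch: the exponent $\alpha = \log_2(1/\eta)$ and the assumption $0 < \rho \le R$ are made explicit here and are not stated in the text.

```latex
% Set \alpha = \log_2 (1/\eta) > 0, so that \eta = 2^{-\alpha}.
% Given 0 < \rho \le R, choose k \ge 0 with 2^{-k-1} R < \rho \le 2^{-k} R.
% Since \omega is non-decreasing in the radius:
\[
  \omega(\rho) \le \omega(2^{-k} R) \le \eta^k \omega(R)
  = 2^{-k\alpha}\, \omega(R)
  \le \Bigl( \frac{2\rho}{R} \Bigr)^{\alpha} \omega(R)
  = \frac{1}{\eta} \Bigl( \frac{\rho}{R} \Bigr)^{\alpha} \omega(R) .
\]
% Hence \omega(\rho) \le C \rho^\alpha with C = \eta^{-1} R^{-\alpha} \omega(R).
```

The closer the Harnack constant forces $\eta$ to 1, the smaller the Hölder exponent $\alpha$ that this argument yields.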

7.3. John-Nirenberg Lemma. Before discussing the proof of the John-Nirenberg lemma, we need to establish a version of the Calderón-Zygmund decomposition:

Lemma 82. Let $Q$ be a dyadic cube and $\tilde Q \supset Q$ the unique dyadic cube whose side is twice the size of that of $Q$.

Let $f \in L^1(Q_0)$ and $\alpha > |f_{Q_0}|$. Then there exists a disjoint sequence of dyadic cubes $Q_j$ such that
\[ |f_{\tilde Q_j}| \le \alpha < |f_{Q_j}|, \]
and $|f| \le \alpha$ almost everywhere in $Q_0 \setminus \cup_j Q_j$.

Proof. We start with $Q_0$, which we divide into $2^n$ dyadic cubes $Q_{0,k}$. Then we select those in which
\[ |f_{Q_{0,k}}| > \alpha \]
and we subdivide again the ones which are not selected. By continuing iteratively, we obtain a sequence of cubes $Q_j$ such that
\[ |f_{Q_j}| > \alpha \]
and
\[ |f_{\tilde Q_j}| \le \alpha, \]
since $\tilde Q_j$ has not been selected. By the Lebesgue differentiation theorem, in the complement of $\cup_j Q_j$
\[ |f| \le \alpha, \]
almost everywhere, since no cube of the complement was selected.

Now we give the proof of John-Nirenberg lemma (lemma 77).


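Proof. Without loss of generality we may assume that $f_Q = 0$ and $|f|_{*,Q} = 1$.

Let $\alpha_0 > 0$ and for each natural number $l$ apply lemma 82 with $\alpha = \alpha_0 l$. Let $Q^l_j$ be the sequence of cubes that are obtained in this way. Then,
\[ |f_{Q^l_j}| > l\alpha_0, \qquad |f_{\tilde Q^l_j}| \le l\alpha_0, \]
and for $x \notin \cup_j Q^l_j$
\[ |f(x)| \le l\alpha_0. \]
Now we are going to estimate $\left| \cup_j Q^l_j \right|$. In the complement of this set $|f| \le l\alpha_0$ and therefore such an estimate gives an upper bound for
\[ \left| \{ x : |f(x)| > l\alpha_0 \} \right|. \]
We are going to establish a recurrence relation between the estimate at $l$ and the one at $l + 1$. This estimate will allow us to obtain exponential decay in $l$.

Fix $l$ and $j$, and suppose $i$ is such that $Q^{l+1}_i \subseteq Q^l_j$. Then
\[ \left| f_{Q^l_j} - f_{Q^{l+1}_i} \right| \le \fint_{Q^{l+1}_i} |f - f_{Q^l_j}| \]
and therefore
\[
\sum_{i:\,Q^{l+1}_i \subseteq Q^l_j} |Q^{l+1}_i|\, |f_{Q^l_j} - f_{Q^{l+1}_i}| \le \sum_{i:\,Q^{l+1}_i \subseteq Q^l_j} \int_{Q^{l+1}_i} |f - f_{Q^l_j}| \le |f|_{*,Q} |Q^l_j|. \tag{101}
\]
If, for $Q^{l+1}_i \subseteq Q^l_j$, we obtain a lower bound for
\[ |f_{Q^l_j} - f_{Q^{l+1}_i}|, \]
equation (101) yields a recurrence relation for the values of $\sum_i |Q^{l+1}_i|$ as a function of $\sum_j |Q^l_j|$, by adding both sides over $j$. To obtain this lower bound, observe that
\[
\begin{aligned}
|f_{Q^l_j} - f_{Q^{l+1}_i}| &\ge |f_{Q^{l+1}_i}| - |f_{Q^l_j}| \\
&\ge |f_{Q^{l+1}_i}| - |f_{Q^l_j} - f_{\tilde Q^l_j}| - |f_{\tilde Q^l_j}| \\
&\ge (l + 1)\alpha_0 - |f_{Q^l_j} - f_{\tilde Q^l_j}| - l\alpha_0.
\end{aligned}
\]
However, if $P$ is a dyadic subcube of $Q$ with $\tilde P \subseteq Q$, then
\[
\begin{aligned}
|f_P - f_{\tilde P}| &= \frac{1}{|P|} \left| \int_P (f - f_{\tilde P}) \right| \le \frac{1}{|P|} \int_P |f - f_{\tilde P}| \\
&= \frac{2^d}{|\tilde P|} \int_P |f - f_{\tilde P}| \le \frac{2^d}{|\tilde P|} \int_{\tilde P} |f - f_{\tilde P}| \\
&\le 2^d |f|_{*,Q} \le 2^d.
\end{aligned}
\]
Therefore
\[ |f_{Q^l_j} - f_{Q^{l+1}_i}| \ge (l + 1)\alpha_0 - 2^d - l\alpha_0 = \alpha_0 - 2^d. \]
If we choose $\alpha_0 = 2 + 2^d$ we obtain
\[ |f_{Q^l_j} - f_{Q^{l+1}_i}| \ge 2, \]
which implies
\[ \sum_i |Q^{l+1}_i| \le \frac{1}{2} \sum_j |Q^l_j|. \]
Therefore
\[ \left| \{ x : |f(x)| > l\alpha_0 \} \right| \le 2^{-l-1} |Q|, \]
which easily yields the lemma.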

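7.4. Hölder continuity. Finally we use all the estimates in the previous sections to establish interior Hölder continuity.

Theorem 83. Let $u$ be a solution of
\[ -(a_{ij} u_{x_i})_{x_j} = \sum_i (f_i)_{x_i} \tag{102} \]
in an open set $U$, with $f_i \in L^p$ for some $p > d$. Then $u$ is Hölder continuous in any compact subset of $U$.

Proof. Write $u = v + w$ where $v$ is a solution of
\[ -(a_{ij} v_{x_i})_{x_j} = \sum_i (f_i)_{x_i}, \]
in $B_{2R}$ and $v = 0$ on $\partial B_{2R}$. Therefore $w$ solves
\[ -(a_{ij} w_{x_i})_{x_j} = 0, \]
in $B_{2R}$, with arbitrary boundary data on $\partial B_{2R}$. Then, by theorem 72, we have
\[ \|v\|_{L^\infty(B_{2R})} \le C R^{1 - d/p}, \]
where $C$ depends on the $L^p$ norm of $f$ and the ellipticity of $a_{ij}$ but not on the solution $u$ or $R$.

Let $\omega_w$ be the modulus of continuity of $w$. Then for all $R' < R$ we know that
\[ \omega_w\left( \frac{R'}{4} \right) \le \eta\, \omega_w(R'), \]
for some $0 < \eta < 1$. Hence
\[
\begin{aligned}
\omega_u(R/4) &\le C R^{1 - d/p} + \omega_w(R/4) \\
&\le C R^{1 - d/p} + \eta\, \omega_w(R) \\
&\le C R^{1 - d/p} + \eta\, \omega_u(R).
\end{aligned}
\]
Then the Hölder continuity follows from the next lemma:

Lemma 84. Suppose $\omega(R/4) \le C R^\alpha + \eta\, \omega(R)$. Then $\omega(R) \le C R^\gamma$.

Proof. Suppose
\[ M > \sup_{R_0 \le R \le 4R_0} \frac{\omega(R)}{R^\gamma}, \]
to be chosen later as a function of $\gamma$. Then, for all $R_0/4 \le R \le R_0$ we have
\[
\begin{aligned}
\omega(R) &\le \eta\, \omega(4R) + C (4R)^\alpha \\
&\le M \eta (4R)^\gamma + C (4R)^\alpha \\
&\le M R^\gamma,
\end{aligned}
\]
if we choose $\gamma < \alpha$ sufficiently small, and then $M$ large enough so that
\[ 4^\gamma M \eta + 4^\alpha C < M. \]
Now, by induction on $i$, if $\frac{R_0}{4^{i+1}} \le R \le \frac{R_0}{4^i}$ we have
\[ \omega(R) \le (M \eta 4^\gamma + 4^\alpha C) R^\gamma \le M R^\gamma. \]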

8. Schauder estimates

In this section we will prove that weak solutions of equations of the form
\[ (a_{ij}(x) v_{x_i})_{x_j} = f \]
are $C^{1,\alpha}$, as long as both the coefficients $a_{ij}$ and $f$ are Hölder continuous functions. These are the so-called Schauder estimates. We should observe that although we will carry out the proof for the scalar case, the argument is unchanged for elliptic systems, in contrast with the regularity results of the previous section.

8.1. Morrey and Campanato spaces. The key idea in Schauder estimates is to use the ellipticity of the equation to control the oscillation of the solution. For this we will need certain spaces of functions, the Campanato and Morrey spaces, as well as some of their basic properties.

For $p \ge 1$ and $\lambda \ge 0$ we define the Campanato seminorm
\[ [u]_{p,\lambda} = \left[ \sup_{x \in U,\, \rho > 0} \rho^{-\lambda} \int_{U(x,\rho)} |u - u_{x,\rho}|^p \right]^{1/p}, \]
where $U(x, \rho) = U \cap B(x, \rho)$ and
\[ u_{x,\rho} = \fint_{U(x,\rho)} u. \]
To avoid technicalities, we assume that for $\rho$ sufficiently small, $|U(x, \rho)| \ge c\rho^n$. In any case, our main objective is to establish interior estimates on $U$ and not up to the boundary.

The Campanato space $\mathcal{L}^{p,\lambda}(U)$ is the space of functions $u \in L^p(U)$ which satisfy
\[ \|u\|_{p,\lambda} \equiv \|u\|_{L^p} + [u]_{p,\lambda} < \infty. \]
The Morrey space $L^{p,\lambda}(U)$ is the space of functions $u \in L^p(U)$ for which
\[ \|u\|_{L^{p,\lambda}} \equiv \left[ \sup_{x \in U} \sup_{\rho > 0} \rho^{-\lambda} \int_{U(x,\rho)} |u|^p \right]^{1/p} < \infty. \]
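Two simple cases may help fix the role of the parameter $\lambda$ in these definitions. They are offered as a sketch; the constant $\omega_n$, denoting the volume of the unit ball of $\mathbb{R}^n$, is notation introduced only here.

```latex
% (a) Bounded functions belong to the Morrey space with \lambda = n:
%     for u \in L^\infty(U) and every \rho > 0,
\[
  \rho^{-n} \int_{U(x,\rho)} |u|^p
  \le \rho^{-n}\, \|u\|_{L^\infty}^p\, |B(x,\rho)|
  = \omega_n \|u\|_{L^\infty}^p ,
  \qquad \text{so} \qquad
  \|u\|_{L^{p,n}} \le \omega_n^{1/p} \|u\|_{L^\infty} .
\]
% (b) Constants have vanishing Campanato seminorm for every \lambda:
%     if u \equiv c then u_{x,\rho} = c, hence
\[
  [u]_{p,\lambda} = 0 \quad \text{for all } \lambda \ge 0 .
\]
```

Case (b) shows that the Campanato seminorm measures oscillation rather than size, which is why the Campanato norm includes the separate term $\|u\|_{L^p}$.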

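Exercise 151. Show that $[\cdot]_{p,\lambda}$ and $\|\cdot\|_{L^{p,\lambda}}$ are, respectively, a seminorm and a norm.

Proposition 85. Depending on the relative values of $\lambda$, $p$ and $n$ we have the following isomorphisms:

(i) If $0 \le \lambda < n$ then $\mathcal{L}^{p,\lambda} \simeq L^{p,\lambda}$;

(ii) If $n < \lambda < n + p$ then $\mathcal{L}^{p,\lambda} \simeq C^{0,\frac{\lambda - n}{p}}$;

(iii) $L^{p,0} = L^p$ and $\mathcal{L}^{p,\lambda} \simeq \mathbb{R}$ if $\lambda > n + p$.

Proof. To prove (i), we start by showing that
\[ [u]_{p,\lambda} \le C \|u\|_{L^{p,\lambda}} \]
and then we will establish the opposite inequality. Let us start by observing that
\[ \int_{U(x,\rho)} |u - u_{x,\rho}|^p \le C \int_{U(x,\rho)} \left( |u|^p + |u_{x,\rho}|^p \right). \]
Then Jensen's inequality implies
\[ |u_{x,\rho}|^p \le \fint_{U(x,\rho)} |u|^p, \]
and, therefore,
\[ \int_{U(x,\rho)} |u_{x,\rho}|^p \le \int_{U(x,\rho)} |u|^p. \]
This implies
\[ \rho^{-\lambda} \int_{U(x,\rho)} |u - u_{x,\rho}|^p \le C \|u\|^p_{L^{p,\lambda}}, \]
that is, $\|u\|_{p,\lambda} \le C \|u\|_{L^{p,\lambda}}$.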


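To prove the opposite inequality, we need some preliminary estimates. First, observe that
\[
\begin{aligned}
\int_{U(x,\rho)} |u|^p &\le C |u_{x,\rho}|^p |U(x, \rho)| + C \int_{U(x,\rho)} |u - u_{x,\rho}|^p \\
&\le C \rho^n |u_{x,\rho}|^p + C [u]^p_{p,\lambda} \rho^\lambda.
\end{aligned}
\]
Therefore
\[ \rho^{-\lambda} \int_{U(x,\rho)} |u|^p \le C [u]^p_{p,\lambda} + C \rho^{n - \lambda} |u_{x,\rho}|^p. \tag{103} \]
Unfortunately, the Campanato norm does not directly control $|u_{x,\rho}|^p$. To use this estimate we need some auxiliary estimates. For $R > r$ we have:
\[
\begin{aligned}
|u_{x,R} - u_{x,r}|^p &\le C r^{-n} \int_{U(x,r)} \left( |u_{x,R} - u|^p + |u_{x,r} - u|^p \right) \\
&\le C r^{-n} (R^\lambda + r^\lambda) [u]^p_{p,\lambda} \le C r^{-n} R^\lambda [u]^p_{p,\lambda}.
\end{aligned} \tag{104}
\]
Let $R = R_0 2^{-i}$, $r = R_0 2^{-i-1}$ and $R_0 > 1$. Then
\[ |u_{x,R} - u_{x,r}| \le C R_0^{(\lambda - n)/p}\, 2^{(n - \lambda) i/p}\, [u]_{p,\lambda}. \]
Let $\rho = R_0 2^{-l-1}$. Then
\[
\begin{aligned}
|u_{x,\rho}| = |u_{x,R_0 2^{-l-1}}| &\le |u_{x,R_0 2^{-l-1}} - u_{x,R_0 2^{-l}}| + |u_{x,R_0 2^{-l}} - u_{x,R_0 2^{-l+1}}| \\
&\quad + \ldots + |u_{x,R_0/2} - u_{x,R_0}| + |u_{x,R_0}| \\
&\le |u_{x,R_0}| + C [u]_{p,\lambda} \sum_{i=0}^l R_0^{(\lambda - n)/p}\, 2^{(n - \lambda) i/p} \\
&\le |u_{x,R_0}| + C \rho^{(\lambda - n)/p} [u]_{p,\lambda}.
\end{aligned}
\]
Therefore
\[ |u_{x,\rho}|^p \le C |u_{x,R_0}|^p + C \rho^{\lambda - n} [u]^p_{p,\lambda} \le C \|u\|^p_{L^p} + C \rho^{\lambda - n} [u]^p_{p,\lambda}. \]
By combining the last inequality with (103) and using $\lambda < n$, we have
\[ \rho^{-\lambda} \int_{U(x,\rho)} |u|^p \le C \|u\|^p_{L^p} + C [u]^p_{p,\lambda}. \]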

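As for the second statement of the proposition, (ii), the inclusion
\[
C^{0,\frac{\lambda-n}{p}} \subset \mathcal{L}^{p,\lambda}
\]
is elementary and is left as an exercise:

Exercise 152. Show that $C^{0,\frac{\lambda-n}{p}} \subset \mathcal{L}^{p,\lambda}$.

So we need to establish the opposite inclusion, $\mathcal{L}^{p,\lambda} \subset C^{0,\frac{\lambda-n}{p}}$. Let $u \in \mathcal{L}^{p,\lambda}\cap C^0$. Given x and y, let $R = |x-y|$ and $\alpha = \frac{\lambda-n}{p}$. We must show that
\[
|u(x)-u(y)| \le CR^{\alpha}.
\]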

By the triangle inequality

|u(x)− u(y)| ≤ |u(x)− ux,2R|+ |ux,2R − uy,2R|+ |uy,2R − u(y)|.

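Applying (104) we obtain
\[
|u_{x,R} - u_{x,R2^{-l-1}}| \le \sum_{i=0}^{l} |u_{x,R2^{-i}} - u_{x,R2^{-(i+1)}}|
\le CR^{\alpha}[u]_{p,\lambda}\sum_{i=0}^{l} 2^{-\alpha i} \le CR^{\alpha}[u]_{p,\lambda},
\]
where the constant C is independent of l. Therefore, by taking $l\to\infty$, we obtain
\[
|u_{x,2R}-u(x)|,\ |u_{y,2R}-u(y)| \le CR^{\alpha}[u]_{p,\lambda}.
\]
We also have
\[
|U(x,2R)\cap U(y,2R)|\,|u_{x,2R}-u_{y,2R}| \le \int_{U(x,2R)} |u_{x,2R}-u| + \int_{U(y,2R)} |u_{y,2R}-u|.
\]
Since $|U(x,2R)\cap U(y,2R)| \ge cR^n$, we obtain, using Hölder's inequality,
\[
\begin{aligned}
|u_{x,2R}-u_{y,2R}| &\le CR^{-n}|U(x,2R)|^{1-1/p}\left(\int_{U(x,2R)} |u_{x,2R}-u|^p\right)^{1/p} \\
&\quad + CR^{-n}|U(y,2R)|^{1-1/p}\left(\int_{U(y,2R)} |u_{y,2R}-u|^p\right)^{1/p} \\
&\le CR^{\alpha}[u]_{p,\lambda}.
\end{aligned}
\]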

The last statement of the proposition, (iii), is left as an exercise:


Exercise 153. Prove (iii). Hint: observe that if $|u(x)-u(y)| \le C|x-y|^{\alpha}$ for some $\alpha > 1$ then u is constant.

8.2. Preliminary estimates. The next lemma gives a key esti-

mate concerning the behavior of solutions of elliptic equations in the

interior of U .

Lemma 86. Let $u \in H^1$ be a solution of
\[
-(a_{ij}u_{x_i})_{x_j} = f_{x_k},
\]
with $a_{ij}$ elliptic satisfying $\theta \le a_{ij} \le \Theta$. Then, for any r and x for which $B(x,r)\subset U$, we have:
\[
\int_{B(x,\frac r2)} |Du|^2 \le C\left[r^{-2}\int_{B(x,r)} |u|^2 + \int_{B(x,r)} |f|^2\right].
\]

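Proof. Without loss of generality we can assume $x = 0$. Let $\xi \in C_c^{\infty}$ with $\xi \equiv 1$ in $B(0,\frac12)$ and $\xi \equiv 0$ in $B(0,1)^C$, and $\eta(x) = \xi\left(\frac xr\right)$. Then
\[
\begin{aligned}
\theta\int_{B_r} |D(\eta u)|^2 &\le \int_{B_r} a_{ij}(\eta u)_{x_i}(\eta u)_{x_j} \\
&= \int_{B_r} a_{ij}u_{x_i}(\eta^2u)_{x_j} + \int_{B_r} a_{ij}\eta_{x_i}u(\eta u)_{x_j} - \int_{B_r} a_{ij}u_{x_i}\eta\eta_{x_j}u \\
&= \int_{B_r} f_{x_k}\eta^2u + \int_{B_r} a_{ij}\left(\eta_{x_i}u(\eta u)_{x_j} - u_{x_i}\eta\eta_{x_j}u\right) \\
&\le -\int_{B_r} f(\eta^2u)_{x_k} + \frac{C}{\varepsilon r^2}\int_{B_r} u^2 + \varepsilon\int_{B_r} \eta^2|Du|^2 + \varepsilon\int_{B_r} |D(\eta u)|^2 \\
&\le C\int_{B_r} f^2 + \frac{C}{\varepsilon r^2}\int_{B_r} u^2 + C\varepsilon\int_{B_r} \eta^2|Du|^2,
\end{aligned}
\]
which implies, by choosing ε sufficiently small,
\[
\int_{B_{r/2}} |Du|^2 \le C\int_{B_r} f^2 + \frac{C}{r^2}\int_{B_r} u^2.
\]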


Exercise 154. Show that if $a_{ij}$ is constant and $u \in H^1$ satisfies
\[
-(a_{ij}u_{x_i})_{x_j} = 0,
\]
then, for all multiindices $\alpha \in \mathbb{Z}^n$,
\[
\|D^{\alpha}u\|_{L^2(B_{r/2})} \le Cr^{-|\alpha|}\|u\|_{L^2(B_r)}.
\]

8.3. Schauder estimates. The main objective of this section is

to prove:

Theorem 87. Let u ∈ H1 be a solution of

−(aijuxi)xj = fxk .

Then

(i) If aij is constant and f ∈ L2,λ then

Du ∈ L2,λloc ,

for 0 ≤ λ < n+ 2.

(ii) If aij(x) is continuous and f ∈ L2,λ then

Du ∈ L2,λloc ,

for 0 ≤ λ < n.

(iii) If aij(x), f(x) ∈ C0,α with 0 < α < 1 then

Du ∈ C0,βloc ,

for some β > 0.

Proof. (i). Suppose that $B(x,2R)\subset U$ (without loss of generality assume $x = 0$). Let w be the unique solution of
\[
-(a_{ij}w_{x_i})_{x_j} = f_{x_k}
\]
in $H^1_0(B_{2R})$, whose existence is guaranteed by the Lax-Milgram theorem (theorem 68).

We have the following estimate:
\[
\int_{B_{2R}} |Dw|^2 \le C\int_{B_{2R}} a_{ij}w_{x_i}w_{x_j} = -C\int_{B_{2R}} (f-\gamma)w_{x_k}
\le \frac12\int_{B_{2R}} |Dw|^2 + C\int_{B_{2R}} |f-\gamma|^2,
\]
for any constant γ. Choosing $\gamma = f_{0,2R} \equiv f_{2R}$ (to simplify the notation), we obtain
\[
\int_{B_{2R}} |Dw|^2 \le C\int_{B_{2R}} |f-f_{2R}|^2 \le CR^{\lambda}[f]^2_{2,\lambda}. \tag{105}
\]

We will use w to decompose the solution into two parts:
\[
u = v + w.
\]
By definition, w satisfies $w = 0$ on the boundary of $B_{2R}$ and
\[
-(a_{ij}w_{x_i})_{x_j} = f_{x_k}.
\]
Consequently, $v = u - w$ has unknown boundary data but it satisfies the homogeneous equation
\[
-(a_{ij}v_{x_i})_{x_j} = 0.
\]
By hypothesis, the coefficients are constant, therefore v is $C^{\infty}$ (exercise 154) and, by Poincaré's inequality, for $\rho < 2R$,
\[
\int_{B_\rho} |Dv-(Dv)_\rho|^2 \le C\rho^2\int_{B_\rho} |D^2v|^2.
\]
It is important to observe that in the Poincaré inequality the dependence of the constant on ρ is exactly the one used previously:

Exercise 155. Show that the Poincaré inequality in $B_\rho$ has the form
\[
\int_{B_\rho} |u-u_\rho|^2 \le C\rho^2\int_{B_\rho} |Du|^2,
\]
where C does not depend on ρ.

Exercise 156. Use the Fourier transform to prove the following interpolation inequality
\[
\|u\|_{H^{k+\theta}(\mathbb{R}^d)} \le C\|u\|^{1-\theta}_{H^k(\mathbb{R}^d)}\,\|u\|^{\theta}_{H^{k+1}(\mathbb{R}^d)}.
\]
Recall that for $s \in \mathbb{R}$ the norm $H^s$ is defined by
\[
\|u\|^2_{H^s(\mathbb{R}^d)} = \int_{\mathbb{R}^d} (1+|\xi|^2)^s|\hat u|^2\,d\xi.
\]
Exercise 157. Let $1 \le p_0, p_1 \le \infty$. Prove the following interpolation inequality
\[
\|u\|_{L^{p_\theta}} \le \|u\|^{\theta}_{L^{p_0}}\,\|u\|^{1-\theta}_{L^{p_1}},
\]
where
\[
\frac{1}{p_\theta} = \frac{\theta}{p_0} + \frac{1-\theta}{p_1}.
\]
Using interpolation techniques and the Fourier transform, it is also possible to prove the following version of the Sobolev theorem:

Theorem 88. Let $u \in H^s$, where $0 < s < \frac n2$. Let $\frac{1}{p^*} = \frac12 - \frac sn$. Then
\[
\|u\|_{L^{p^*}} \le C\|u\|_{H^s}.
\]
Proof. If s is an integer, this is the standard Sobolev theorem; for fractional s we can use interpolation.

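Let $\tilde v = D^2v$. Since the coefficients $a_{ij}$ are constant,
\[
-(a_{ij}\tilde v_{x_i})_{x_j} = 0.
\]
Therefore, for $1 \le p < \infty$,
\[
\int_{B_\rho} |\tilde v|^2 \le \rho^{n/p'}\|\tilde v\|^2_{L^{2p}(B_{R/2})} \le C\rho^{n/p'}\|\tilde v\|^2_{H^{n/(2p')}(B_{R/2})},
\]
using Sobolev's theorem (theorem 88). By exercise 154, and using interpolation, we obtain
\[
\int_{B_\rho} |\tilde v|^2 \le C\frac{\rho^{n/p'}}{R^{n/p'}}\|\tilde v\|^2_{L^2(B_R)}. \tag{106}
\]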


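As a conclusion, given ν sufficiently small, there exists p, sufficiently large, such that $\frac{n}{p'} = n-\nu$. By the Poincaré inequality
\[
\begin{aligned}
\int_{B_\rho} |Dv-(Dv)_\rho|^2 &\le C\rho^2\int_{B_\rho} |D^2v|^2 \\
&\le C\frac{\rho^{n+2-\nu}}{R^{n-\nu}}\int_{B_R} |D^2v|^2 \le C\left(\frac{\rho}{R}\right)^{n+2-\nu}\int_{B_{2R}} |Dv-(Dv)_{2R}|^2,
\end{aligned}
\]
where in the last inequality we have applied lemma 86.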

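Let $\rho < \frac R2$. Then
\[
\begin{aligned}
\int_{B_\rho} |Du-(Du)_\rho|^2 &\le 2\int_{B_\rho} |Dv-(Dv)_\rho|^2 + 2\int_{B_\rho} |Dw-(Dw)_\rho|^2 \\
&\le C\left(\frac{\rho}{R}\right)^{n+2-\nu}\int_{B_{2R}} |Dv-(Dv)_{2R}|^2 + C\int_{B_\rho} |Dw|^2 := T_1 + T_2,
\end{aligned}
\]
where
\[
\begin{aligned}
T_1 &= C\left(\frac{\rho}{R}\right)^{n+2-\nu}\int_{B_{2R}} |Dv-(Dv)_{2R}|^2 \\
&= C\left(\frac{\rho}{R}\right)^{n+2-\nu}\int_{B_{2R}} |Du-(Du)_{2R}-Dw+(Dw)_{2R}|^2 \\
&\le C\left(\frac{\rho}{R}\right)^{n+2-\nu}\int_{B_{2R}} |Du-(Du)_{2R}|^2 + C\left(\frac{\rho}{R}\right)^{n+2-\nu}\int_{B_{2R}} |Dw|^2
\end{aligned}
\]
and
\[
T_2 = C\int_{B_\rho} |Dw|^2.
\]
Let
\[
\Phi(\rho) = \sup_{\rho_1\le\rho}\int_{B_{\rho_1}} |Du-(Du)_{\rho_1}|^2.
\]
Then, using the estimate (105), we have
\[
\Phi(\rho) \le C\left(\frac{\rho}{R}\right)^{n+2-\nu}\Phi(2R) + CR^{\lambda}[f]^2_{2,\lambda}.
\]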


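From the previous inequality we conclude, when $\lambda < n+2$ and using the lemma that we will prove next, that
\[
\Phi(\rho) \le C\left[\Phi(2R)\left(\frac{\rho}{R}\right)^{\lambda} + [f]^2_{2,\lambda}\,\rho^{\lambda}\right],
\]
which implies, when $\lambda < n+2$,
\[
\sup_{\rho<\frac R2}\ \rho^{-\lambda}\int_{B_\rho} |Du-(Du)_\rho|^2 \le C\left(\|u\|^2_{L^2} + \|f\|^2_{L^{2,\lambda}}\right).
\]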

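Lemma 89. Let $\Phi \ge 0$ be non-decreasing and suppose that, for $\rho < R/2$ and $R < R_0$,
\[
\Phi(\rho) \le a\,\Phi(2R)\left[\left(\frac{\rho}{2R}\right)^{\alpha} + \varepsilon\right] + bR^{\beta}
\]
with $0 < \beta < \alpha$ and $\varepsilon > 0$. Then, if ε is small enough,
\[
\Phi(\rho) \le C\left[\Phi(2R_0)\left(\frac{\rho}{R_0}\right)^{\gamma} + b\rho^{\beta}\right],
\]
with $\beta < \gamma < \alpha$.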

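Proof. Let $\beta < \gamma < \alpha$ and θ sufficiently small such that
\[
2a\theta^{\alpha} < \theta^{\gamma}.
\]
Suppose $\varepsilon < \theta^{\alpha}$, so that
\[
a\varepsilon < \theta^{\gamma}/2.
\]
Then
\[
\Phi(2\theta R) \le \Phi(2R)\theta^{\gamma} + bR^{\beta}.
\]
Exercise 158. Estimate $\Phi(2\theta^2R)$ and $\Phi(2\theta^3R)$, applying inductively the previous inequality.

By induction,
\[
\Phi(2\theta^{k+1}R) \le \theta^{\gamma(k+1)}\Phi(2R) + bR^{\beta}\left(\sum_{j=0}^{k}\theta^{\gamma j}\theta^{\beta(k-j)}\right) \le \theta^{\gamma(k+1)}\Phi(2R) + bR^{\beta}\theta^{\beta k}c(\theta).
\]
Therefore, given ρ and k satisfying
\[
2\theta^{k+1}R \le \rho \le 2\theta^{k}R,
\]
we have
\[
\Phi(\rho) \le \Phi(2\theta^{k}R) \le \theta^{\gamma k}\Phi(2R) + b(R\theta^{k})^{\beta}c(\theta)
\le C\left[\left(\frac{\rho}{R}\right)^{\gamma}\Phi(2R) + b\,c(\theta)\rho^{\beta}\right].
\]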

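(ii). We have
\[
-(a_{ij}(x_0)u_{x_i})_{x_j} = \left[(a_{ij}(x)-a_{ij}(x_0))u_{x_i} + \delta_{kj}f\right]_{x_j},
\]
that is
\[
L_0u = g,
\]
where $L_0$ is an operator with constant coefficients. To simplify notation, we take $x_0 = 0$. Let $w \in H^1_0(B_R)$ be defined by
\[
L_0w = g
\]
and $v = u - w$. Let $\tilde v = Dv$, so that $L_0\tilde v = 0$. Proceeding as in (106),
\[
\int_{B_\rho} |\tilde v|^2 \le C\left(\frac{\rho}{R}\right)^{n-\nu}\int_{B_R} |\tilde v|^2.
\]
Consequently,
\[
\begin{aligned}
\int_{B_\rho} |Du|^2 &\le 2\int_{B_\rho} |\tilde v|^2 + 2\int_{B_\rho} |Dw|^2 \\
&\le C\left(\frac{\rho}{R}\right)^{n-\nu}\int_{B_R} |\tilde v|^2 + 2\int_{B_\rho} |Dw|^2 \\
&\le C\left(\frac{\rho}{R}\right)^{n-\nu}\int_{B_R} |Du|^2 + C\left[\left(\frac{\rho}{R}\right)^{n-\nu} + 1\right]\int_{B_R} |Dw|^2.
\end{aligned}
\]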

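However, w depends implicitly on u, and therefore we must proceed with caution:
\[
\begin{aligned}
\int_{B_R} |Dw|^2 &\le C\int_{B_R} a_{ij}(0)w_{x_i}w_{x_j} \\
&= \int_{B_R} gw = -\int_{B_R} fw_{x_k} - \int_{B_R} (a_{ij}(x)-a_{ij}(0))u_{x_i}w_{x_j} \\
&\le C\int_{B_R} |f|^2 + \frac14\int_{B_R} |Dw|^2 + C(\omega(R))^2\int_{B_R} |Du|^2,
\end{aligned}
\]
where $\omega(R)$ is the modulus of continuity of $a_{ij}$.

Therefore
\[
\int_{B_\rho} |Du|^2 \le C\left(\frac{\rho}{R}\right)^{n-\nu}\int_{B_R} |Du|^2 + CR^{\lambda}\|f\|^2_{L^{2,\lambda}} + C(\omega(R))^2\int_{B_R} |Du|^2.
\]
Thus, for R so that $\omega(R)$ is sufficiently small and applying lemma 89 to the function
\[
\Phi(\rho) = \sup_{\rho_1\le\rho}\int_{B_{\rho_1}} |Du|^2,
\]
we obtain $Du \in L^{2,\lambda}$ if $\lambda < n$.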

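(iii). Let g and w be as in (ii). We have
\[
\begin{aligned}
\int_{B_\rho} |Du-(Du)_\rho|^2 &\le C\left(\frac{\rho}{R}\right)^{n+2-\nu}\int_{B_{2R}} |Du-(Du)_{2R}|^2 + \int_{B_{2R}} |Dw|^2 \\
&\le C\left(\frac{\rho}{R}\right)^{n+2-\nu}\int_{B_{2R}} |Du-(Du)_{2R}|^2 + \int_{B_{2R}} |f-f_{2R}|^2 + C\omega(2R)^2\int_{B_{2R}} |Du|^2.
\end{aligned}
\]
By hypothesis we have $\omega(R) \le R^{\alpha}$ and
\[
\int_{B_{2R}} |f-f_{2R}|^2 \le CR^{n+2\alpha}.
\]
For $\lambda_0 < n$, part (ii) implies $Du \in L^{2,\lambda_0}$. Choosing
\[
\Phi(\rho) = \sup_{\rho_1\le\rho}\int_{B_{\rho_1}} |Du-(Du)_{\rho_1}|^2
\]
we obtain
\[
\Phi(\rho) \le C\left(\frac{\rho}{R}\right)^{n+2-\nu}\Phi(2R) + CR^{\lambda_0+2\alpha},
\]
and so
\[
\Phi(\rho) \le C\rho^{\lambda_0+2\alpha}.
\]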

4

Optimal control and viscosity solutions

This chapter is dedicated to the study of deterministic optimal control problems and their connection with Hamilton-Jacobi equations.


A typical problem in optimal control, which is studied in detail in this chapter, is the terminal value optimal control problem. This problem consists in determining the optimal trajectories $\mathbf{x}(\cdot)$ which minimize
\[
J[\mathbf{u};x,t] = \int_t^{t_1} L(\mathbf{x},\mathbf{u})\,ds + \psi(\mathbf{x}(t_1)),
\]
among all controls $\mathbf{u}(\cdot) : [t,t_1] \to \mathbb{R}^n$ and all (continuous) trajectories $\mathbf{x}$ with an initial condition $\mathbf{x}(t) = x$ and which are (almost everywhere in time) solutions to the controlled dynamics
\[
\dot{\mathbf{x}} = f(\mathbf{x},\mathbf{u}).
\]
The value function V is defined as
\[
V(x,t) = \inf J[\mathbf{u};x,t], \tag{107}
\]
in which the infimum is taken over all controls.

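An important case is the "calculus of variations setting". In this case, $f(x,u) = u$, and the optimal trajectories $\mathbf{x}(\cdot)$ are solutions to the Euler-Lagrange equation
\[
\frac{d}{dt}\frac{\partial L}{\partial v}(\mathbf{x},\dot{\mathbf{x}}) - \frac{\partial L}{\partial x}(\mathbf{x},\dot{\mathbf{x}}) = 0,
\]
and $\mathbf{p} = -D_vL(\mathbf{x},\dot{\mathbf{x}})$ is a solution of Hamilton's equations:
\[
\dot{\mathbf{x}} = -D_pH(\mathbf{p},\mathbf{x}), \qquad \dot{\mathbf{p}} = D_xH(\mathbf{p},\mathbf{x}).
\]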

This problem was studied in detail in chapter 2. However, we will revisit it, and generalize the previous results by allowing more general Lagrangians. In fact, we will work under the following assumptions:
\[
L(x,v) : \mathbb{R}^{2n} \to \mathbb{R},
\]
$x \in \mathbb{R}^n$, $v \in \mathbb{R}^n$, is a $C^{\infty}$ function, strictly convex in v, i.e., $D^2_{vv}L$ is positive definite, and satisfying the coercivity condition
\[
\lim_{|v|\to\infty}\frac{L(x,v,t)}{|v|} = \infty,
\]
for each $(x,t)$; without loss of generality, we may also assume that $L(x,v,t) \ge 0$, by adding a constant if necessary. We will also assume that
\[
L(x,0,t) \le c_1, \qquad |D_xL| \le c_2L + c_3,
\]
for suitable constants $c_1$, $c_2$ and $c_3$; finally we assume that there exists a function $C(R)$ such that
\[
|D^2_{xx}L| \le C(R), \qquad |D_vL| \le C(R)
\]
whenever $|v| \le R$. The terminal cost, ψ, is assumed to be a bounded Lipschitz function.

Example 40. Note that, although the conditions on L are quite technical, they are fulfilled by a wide class of Lagrangians, for instance
\[
L(x,v) = \frac12 v^TA(x)v - V(x),
\]
where A and V are $C^{\infty}$, $\mathbb{Z}^n$-periodic in x, and $A(x)$ is positive definite.

Before considering the "calculus of variations setting" we study a simpler case. Let U, the control space, be a compact convex set. We restrict the class of admissible controls by requiring $\mathbf{u}(s) \in U$, for all $t \le s \le t_1$. Furthermore, we suppose that $L(x,u)$ is a bounded continuous function, convex in u. We suppose that the function $f(x,u)$ satisfies the following Lipschitz condition
\[
|f(x,u) - f(y,u)| \le C|x-y|.
\]
To establish existence of optimal solutions we simplify even further by assuming that $f(x,u)$ has the form
\[
f(x,u) = A(x)u + B(x), \tag{108}
\]
where A and B are Lipschitz continuous functions.

In section 1, we start the rigorous study of optimal control problems

by establishing basic properties. The dynamic programming principle

is proved in §2.

The analog of the Euler-Lagrange equation for optimal control prob-

lems is the Pontryagin maximum principle, which will be studied in §3.

In §4, we show that, if the value function V is differentiable, it satisfies

the Hamilton-Jacobi partial differential equation

−Vt +H(DxV, x) = 0,

in which $H(p,x)$, the Hamiltonian, is the (generalized) Legendre transform of the Lagrangian L:
\[
H(p,x) = \sup_{v\in U}\left[-p\cdot f(x,v) - L(x,v)\right]. \tag{109}
\]
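As a numerical illustration of the generalized Legendre transform (109), the following minimal sketch approximates the supremum over a finite grid of controls. The one-dimensional data (the Lagrangian $L(x,v) = v^2/2$, the dynamics $f(x,v) = v$, the control set $U = [-10,10]$) and the function name `hamiltonian` are assumptions chosen for illustration; for this choice one expects $H(p,x) = p^2/2$ whenever $|p| \le 10$.

```python
import numpy as np

def hamiltonian(p, x, lagrangian, f, controls):
    """Approximate H(p, x) = sup_v [-p * f(x, v) - L(x, v)] on a finite control grid."""
    values = -p * f(x, controls) - lagrangian(x, controls)
    return float(np.max(values))

# Assumed illustrative data (not taken from the text): scalar state and control.
def L(x, v):
    return 0.5 * v**2

def f(x, v):
    return v

controls = np.linspace(-10.0, 10.0, 20001)  # grid on the assumed control set U

for p in (-2.0, 0.5, 3.0):
    # Compare the grid approximation with the exact transform p^2 / 2.
    print(p, hamiltonian(p, 0.0, L, f, controls), 0.5 * p**2)
```

The grid search is only a sketch: it is accurate here because the maximizer $v = -p$ lies inside the assumed control set and the objective is smooth.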

It is well known that first order partial differential equations such as the

Hamilton-Jacobi equation may not admit classical solutions. Using the

method of characteristics, the next exercise gives an example of non-

existence of smooth solutions:

Exercise 159. Solve, using the method of characteristics, the equation
\[
\begin{cases}
u_t + u_x^2 = 0, & x \in \mathbb{R},\ t > 0, \\
u(x,0) = \pm x^2.
\end{cases}
\]

It is therefore necessary to consider weak solutions to the Hamilton-

Jacobi equation: viscosity solutions. In section §9 we develop the the-

ory of viscosity solutions for Hamilton-Jacobi equations, and show that


the value function is the unique viscosity solution of the Hamilton-

Jacobi equation.

Finally, in §10 we address the stationary optimal control problem, which corresponds to the Hamilton-Jacobi equation
\[
H(D_xu,x) = \overline{H},
\]
and the discounted cost infinite horizon problem, whose Hamilton-Jacobi equation is
\[
\alpha u + H(Du,x) = 0.
\]

Main references on optimal control and viscosity solutions are [BCD97],

[FS93], [Lio82], [Bar94], and [Eva98b].

1. Elementary examples and properties

In this section we establish some elementary properties and study

some explicit examples.

Proposition 90. The value function V satisfies the following inequal-

ities

−‖ψ‖∞ ≤ V ≤ c1|t1 − t|+ ‖ψ‖∞.

Proof. The first inequality follows from L ≥ 0. To obtain the

second inequality it is enough to observe that

V ≤ J(x, t; 0) ≤ c1|t1 − t|+ ‖ψ‖∞.

Example 41 (Lax-Hopf formula). Suppose that $L(x,v) \equiv L(v)$, with L convex in v and coercive. Assume further that $f(x,v) = v$. By Jensen's inequality
\[
\frac{1}{t_1-t}\int_t^{t_1} L(\dot{\mathbf{x}}(s))\,ds \ge L\left(\frac{1}{t_1-t}\int_t^{t_1}\dot{\mathbf{x}}(s)\,ds\right) = L\left(\frac{y-x}{t_1-t}\right),
\]
where $y = \mathbf{x}(t_1)$. Therefore, to solve the terminal value optimal control problem, it is enough to consider constant controls of the form $\mathbf{u}(s) = \frac{y-x}{t_1-t}$. Thus
\[
V(x,t) = \inf_{y\in\mathbb{R}^n}\left[(t_1-t)L\left(\frac{y-x}{t_1-t}\right) + \psi(y)\right],
\]
and, since L is coercive and ψ is bounded and continuous, the infimum is a minimum. Thus the Lax-Hopf formula gives an explicit solution to the optimal control problem.
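The Lax-Hopf formula can be checked numerically in a scalar case. The sketch below is only an illustration under assumed data: $L(v) = \frac q2 v^2$ and $\psi(y) = \frac a2 y^2$ with hypothetical scalars $q = 2$, $a = 3$ (this ψ is not bounded, so it falls outside the standing assumptions and is used purely as a computable test case). Minimizing over y by hand gives $\frac{aq\,x^2}{2(q+a\tau)}$ with $\tau = t_1 - t$, which the grid minimization should reproduce.

```python
import numpy as np

def lax_hopf(x, tau, L, psi, ys):
    """Minimize tau * L((y - x) / tau) + psi(y) over the grid ys, with tau = t1 - t > 0."""
    return float(np.min(tau * L((ys - x) / tau) + psi(ys)))

# Assumed illustrative scalars (hypothetical, not from the text).
q, a = 2.0, 3.0

def L(v):
    return 0.5 * q * v**2

def psi(y):
    return 0.5 * a * y**2

def closed_form(x, tau):
    """Hand-computed minimum of q (y - x)^2 / (2 tau) + a y^2 / 2 over y."""
    return a * q * x**2 / (2.0 * (q + a * tau))

ys = np.linspace(-5.0, 5.0, 100001)  # candidate terminal points y
print(lax_hopf(1.0, 0.5, L, psi, ys), closed_form(1.0, 0.5))
```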

Exercise 160. Let Q and A be $n\times n$ constant positive definite matrices. Let $L(v) = \frac12 v^TQv$ and $\psi(y) = \frac12 y^TAy$. Use the Lax-Hopf formula to determine $V(x,t)$.

Proposition 91. Let ψ1(x) and ψ2(x) be continuous functions such

that

ψ1 ≤ ψ2,

and V1(x, t) and V2(x, t) the corresponding value functions. Then

V1(x, t) ≤ V2(x, t).

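Proof. Fix $\varepsilon > 0$. Then there exists an almost optimal control $\mathbf{u}_\varepsilon$ and corresponding trajectory $\mathbf{x}_\varepsilon$ such that
\[
V_2(x,t) > \int_t^{t_1} L(\mathbf{x}_\varepsilon(s),\mathbf{u}_\varepsilon(s),s)\,ds + \psi_2(\mathbf{x}_\varepsilon(t_1)) - \varepsilon.
\]
Clearly
\[
V_1(x,t) \le \int_t^{t_1} L(\mathbf{x}_\varepsilon(s),\mathbf{u}_\varepsilon(s),s)\,ds + \psi_1(\mathbf{x}_\varepsilon(t_1)),
\]
and therefore
\[
V_1(x,t) - V_2(x,t) \le \psi_1(\mathbf{x}_\varepsilon(t_1)) - \psi_2(\mathbf{x}_\varepsilon(t_1)) + \varepsilon \le \varepsilon.
\]
Since ε is arbitrary, this ends the proof.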

An important corollary is the continuity of the value function (with

respect to the L∞ norm) on the terminal value.


Corollary 92. Let $\psi_1(x)$ and $\psi_2(x)$ be continuous functions and $V_1(x,t)$ and $V_2(x,t)$ the corresponding value functions. Then
\[
\sup_x |V_1(x,t) - V_2(x,t)| \le \sup_x |\psi_1(x) - \psi_2(x)|.
\]
Proof. Note that
\[
\psi_1 \le \tilde\psi_2 \equiv \psi_2 + \sup_y |\psi_1(y) - \psi_2(y)|.
\]
Let $\tilde V_2$ be the value function corresponding to $\tilde\psi_2$. Clearly,
\[
\tilde V_2 = V_2 + \sup_y |\psi_1(y) - \psi_2(y)|.
\]
By the previous proposition,
\[
V_1 - \tilde V_2 \le 0,
\]
which implies
\[
V_1 - V_2 \le \sup_y |\psi_1(y) - \psi_2(y)|.
\]
By reversing the roles of $V_1$ and $V_2$ we obtain the other inequality.

2. Dynamic programming principle

The dynamic programming principle, which we prove in the next theorem, is simply a semigroup property satisfied by the evolution of the value function.

Theorem 93 (Dynamic programming principle). Suppose that $t_0 \le t \le t' \le t_1$. Then
\[
V(x,t) = \inf_{\mathbf{u}}\left[\int_t^{t'} L(\mathbf{x}(s),\mathbf{u}(s),s)\,ds + V(y,t')\right], \tag{110}
\]
where $\mathbf{x}(t) = x$, $\dot{\mathbf{x}} = f(\mathbf{x},\mathbf{u})$ and $y = \mathbf{x}(t')$.

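Proof. Denote by $\tilde V(x,t)$ the right hand side of (110). For fixed $\varepsilon > 0$, let $\mathbf{u}_\varepsilon$ be an almost optimal control for $V(x,t)$, and $\mathbf{x}_\varepsilon(s)$ the corresponding trajectory, i.e.,
\[
J(x,t;\mathbf{u}_\varepsilon) \le V(x,t) + \varepsilon.
\]
We claim that $\tilde V(x,t) \le V(x,t) + \varepsilon$. To check this statement, let $\mathbf{x}(\cdot) = \mathbf{x}_\varepsilon(\cdot)$ and $y = \mathbf{x}_\varepsilon(t')$. Then
\[
\tilde V(x,t) \le \int_t^{t'} L(\mathbf{x}_\varepsilon(s),\mathbf{u}_\varepsilon(s),s)\,ds + V(y,t').
\]
Additionally
\[
V(y,t') \le J(y,t';\mathbf{u}_\varepsilon).
\]
Therefore
\[
\tilde V(x,t) \le J(x,t;\mathbf{u}_\varepsilon) \le V(x,t) + \varepsilon,
\]
and, since ε is arbitrary, $\tilde V(x,t) \le V(x,t)$.

To prove the opposite inequality, we will proceed by contradiction. If $\tilde V(x,t) < V(x,t)$, we could choose $\varepsilon > 0$ and a control $\mathbf{u}^\sharp$ such that
\[
\int_t^{t'} L(\mathbf{x}^\sharp(s),\mathbf{u}^\sharp(s),s)\,ds + V(y,t') < V(x,t) - \varepsilon,
\]
where $\dot{\mathbf{x}}^\sharp = f(\mathbf{x}^\sharp,\mathbf{u}^\sharp)$, $\mathbf{x}^\sharp(t) = x$, and $y = \mathbf{x}^\sharp(t')$. Choose $\mathbf{u}^\flat$ such that
\[
J(y,t';\mathbf{u}^\flat) \le V(y,t') + \frac{\varepsilon}{2}.
\]
Define $\mathbf{u}^\star$ as
\[
\mathbf{u}^\star(s) = \begin{cases} \mathbf{u}^\sharp(s) & \text{for } s < t', \\ \mathbf{u}^\flat(s) & \text{for } t' < s. \end{cases}
\]
So, we would have
\[
\begin{aligned}
V(x,t) - \varepsilon &> \int_t^{t'} L(\mathbf{x}^\sharp(s),\mathbf{u}^\sharp(s),s)\,ds + V(y,t') \\
&\ge \int_t^{t'} L(\mathbf{x}^\sharp(s),\mathbf{u}^\sharp(s),s)\,ds + J(y,t';\mathbf{u}^\flat) - \frac{\varepsilon}{2} \\
&= J(x,t;\mathbf{u}^\star) - \frac{\varepsilon}{2} \ge V(x,t) - \frac{\varepsilon}{2},
\end{aligned}
\]
which is a contradiction.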
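The semigroup property (110) can be tested numerically in a case where the value function is known in closed form. The sketch below assumes the scalar data $L(v) = \frac q2 v^2$, $f(x,v) = v$, $\psi(y) = \frac a2 y^2$ with hypothetical constants $q = 2$, $a = 3$, $t_1 = 1$, for which the Lax-Hopf formula gives $V(x,t) = \frac{aq\,x^2}{2(q+a(t_1-t))}$. Since L depends only on v, Jensen's inequality reduces the infimum over controls on $[t,t']$ to a minimization over the intermediate point y.

```python
import numpy as np

# Assumed illustrative constants (hypothetical, not from the text).
q, a, t1 = 2.0, 3.0, 1.0

def value(x, t):
    """Closed-form value function for L(v) = q v^2 / 2 and psi(y) = a y^2 / 2."""
    return a * q * x**2 / (2.0 * (q + a * (t1 - t)))

def dpp_rhs(x, t, t_mid, ys):
    """Right-hand side of (110), restricted to straight lines from (x, t) to (y, t_mid)."""
    sigma = t_mid - t
    running_cost = sigma * 0.5 * q * ((ys - x) / sigma) ** 2
    return float(np.min(running_cost + value(ys, t_mid)))

ys = np.linspace(-5.0, 5.0, 100001)  # candidate intermediate points y
print(dpp_rhs(1.0, 0.0, 0.4, ys), value(1.0, 0.0))
```

Both printed numbers should agree up to the grid resolution, whatever intermediate time $t'$ is chosen in $(t,t_1)$.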


3. Pontryagin maximum principle

In this section we assume the control space U is bounded and that there exists an optimal control $\mathbf{u}^*$ and corresponding optimal trajectory $\mathbf{x}^*$. We assume also that the terminal data ψ is differentiable.

Let $r \in [t,t_1)$ be a point where $\mathbf{u}^*$ is strongly approximately continuous, i.e.,
\[
\varphi(\mathbf{u}^*(r)) = \lim_{\delta\to0}\frac{1}{\delta}\int_r^{r+\delta}\varphi(\mathbf{u}^*(s))\,ds,
\]
for all continuous functions φ. Denote by $\Xi_0$ the fundamental solution of
\[
\dot\xi_0 = D_xf(\mathbf{x}^*,\mathbf{u}^*)\xi_0, \tag{111}
\]
with $\Xi_0(r) = I$.

Let $\mathbf{p}^*$ be given by
\[
\begin{aligned}
\mathbf{p}^*(r) = {} & D_x\psi(\mathbf{x}^*(t_1))\,\Xi_0(t_1) && (112) \\
& + \int_r^{t_1} D_xL(\mathbf{x}^*(s),\mathbf{u}^*(s),s)\,\Xi_0(s)\,ds. && (113)
\end{aligned}
\]

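Lemma 94 (Pontryagin maximum principle). Suppose that ψ is differentiable. Then, for almost all $r \in [t,t_1)$,
\[
f(\mathbf{x}^*(r),\mathbf{u}^*(r))\cdot\mathbf{p}^*(r) + L(\mathbf{x}^*(r),\mathbf{u}^*(r),r)
= \min_{v\in U}\left[f(\mathbf{x}^*(r),v)\cdot\mathbf{p}^*(r) + L(\mathbf{x}^*(r),v,r)\right]. \tag{114}
\]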

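Proof. Let $v \in U$. For almost all $r \in [t_0,t_1)$, $\mathbf{u}^*$ is strongly approximately continuous (see [EG92]). Let r be one of these points. Define
\[
\mathbf{u}_\delta(s) = \begin{cases} v & \text{if } r < s < r+\delta, \\ \mathbf{u}^*(s) & \text{otherwise,} \end{cases}
\]
and
\[
\mathbf{x}_\delta(s) = \begin{cases}
\mathbf{x}^*(s) & \text{if } t < s < r, \\
\mathbf{x}^*(r) + \int_r^s f(\mathbf{x}_\delta,v) & \text{if } r < s < r+\delta, \\
\mathbf{x}^*(s) + \delta\xi_\delta & \text{if } r+\delta < s < t_1,
\end{cases}
\]
where
\[
\xi_\delta(r+\delta) = \frac{1}{\delta}\int_r^{r+\delta}\left[f(\mathbf{x}_\delta(s),v) - f(\mathbf{x}^*(s),\mathbf{u}^*(s))\right]ds,
\]
and $\mathbf{y}_\delta = \mathbf{x}^*(s) + \delta\xi_\delta$ solves, for $r+\delta < s < t_1$,
\[
\dot{\mathbf{y}}_\delta = f(\mathbf{y}_\delta,\mathbf{u}^*).
\]
Observe that
\[
\xi_0(r) = \lim_{\delta\to0}\xi_\delta(r+\delta) = f(\mathbf{x}^*(r),v) - f(\mathbf{x}^*(r),\mathbf{u}^*(r)).
\]
Then, as $\delta\to0$, $\xi_\delta$ converges to the solution $\xi_0$ of (111). Thus $\xi_0(s) = \Xi_0(s)\left(f(\mathbf{x}^*(r),v) - f(\mathbf{x}^*(r),\mathbf{u}^*(r))\right)$.

Clearly
\[
J(x,t;\mathbf{u}^*) \le \int_t^{t_1} L(\mathbf{x}_\delta(s),\mathbf{u}_\delta(s),s)\,ds + \psi(\mathbf{x}^*(t_1) + \delta\xi_\delta).
\]
This last inequality implies
\[
\begin{aligned}
&\frac{1}{\delta}\int_r^{r+\delta}\left[L(\mathbf{x}_\delta(s),v,s) - L(\mathbf{x}^*(s),\mathbf{u}^*(s),s)\right]ds \\
&\quad + \frac{1}{\delta}\int_{r+\delta}^{t_1}\left[L(\mathbf{x}^*(s)+\delta\xi_\delta,\mathbf{u}^*(s),s) - L(\mathbf{x}^*(s),\mathbf{u}^*(s),s)\right]ds \\
&\quad + \frac{1}{\delta}\left[\psi(\mathbf{x}^*(t_1)+\delta\xi_\delta) - \psi(\mathbf{x}^*(t_1))\right] \ge 0.
\end{aligned}
\]
When $\delta\to0$, the first term converges to
\[
L(\mathbf{x}^*(r),v,r) - L(\mathbf{x}^*(r),\mathbf{u}^*(r),r),
\]
since $\mathbf{u}^*$ is strongly approximately continuous. The second term tends to
\[
\int_r^{t_1} D_xL(\mathbf{x}^*(s),\mathbf{u}^*(s),s)\,\xi_0(s)\,ds,
\]
whereas the third one has the following limit:
\[
D_x\psi(\mathbf{x}^*(t_1))\cdot\xi_0(t_1).
\]
This implies that, for almost all r,
\[
L(\mathbf{x}^*(r),v,r) - L(\mathbf{x}^*(r),\mathbf{u}^*(r),r) + \mathbf{p}^*(r)\cdot\left(f(\mathbf{x}^*(r),v) - f(\mathbf{x}^*(r),\mathbf{u}^*(r))\right) \ge 0.
\]
Consequently,
\[
f(\mathbf{x}^*(r),\mathbf{u}^*(r))\cdot\mathbf{p}^*(r) + L(\mathbf{x}^*(r),\mathbf{u}^*(r),r)
= \min_{v\in U}\left[f(\mathbf{x}^*(r),v)\cdot\mathbf{p}^*(r) + L(\mathbf{x}^*(r),v,r)\right],
\]
as required.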

4. The Hamilton-Jacobi equation

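Proposition 95. Suppose the value function is $C^1$. Let $r \in [t,t_1)$ be a point where $\mathbf{u}^*$ is strongly approximately continuous. Then
\[
\mathbf{p}^*(r) = D_xV(x,r).
\]
Proof. Let $\mathbf{u}^*$ be an optimal control for the initial condition $(x,r)$. For $y \in \mathbb{R}^n$ and $\delta > 0$ consider the solution of
\[
\dot{\mathbf{x}}_\delta = f(\mathbf{x}_\delta,\mathbf{u}^*),
\]
with initial condition $\mathbf{x}_\delta(r) = x + \delta y$. Then
\[
\left.\frac{\partial\mathbf{x}_\delta(s)}{\partial\delta}\right|_{\delta=0} = \Xi_0(s)y.
\]
Since for all δ
\[
V(x+\delta y,r) \le \int_r^{t_1} L(\mathbf{x}_\delta,\mathbf{u}^*)\,ds + \psi(\mathbf{x}_\delta(t_1)),
\]
with equality at $\delta = 0$, by differentiating with respect to δ we obtain
\[
D_xV(x,r)y = \int_r^{t_1} D_xL(\mathbf{x}^*,\mathbf{u}^*)\,\Xi_0(s)y\,ds + D_x\psi(\mathbf{x}^*(t_1))\,\Xi_0(t_1)y,
\]
which implies the result.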

Theorem 96. Suppose the value function V is $C^1$. Then it solves
\[
-V_t + H(D_xV,x) = 0. \tag{115}
\]
Proof. Consider an optimal trajectory $\mathbf{x}^*$, for which
\[
V(\mathbf{x}^*(t),t) = \int_t^{t_1} L(\mathbf{x}^*(s),\mathbf{u}^*(s))\,ds + \psi(\mathbf{x}^*(t_1)).
\]
Then, by differentiating with respect to t, we have
\[
V_t(\mathbf{x}^*(t),t) + D_xV(\mathbf{x}^*(t),t)f(\mathbf{x}^*(t),\mathbf{u}^*(t)) + L(\mathbf{x}^*(t),\mathbf{u}^*(t)) = 0,
\]
which, by the Pontryagin maximum principle, is equivalent to the Hamilton-Jacobi equation (115).

Exercise 161. Let $M(t)$, $N(t)$ be $n\times n$ matrices with time-differentiable coefficients. Suppose that N is invertible. Let D be an $n\times n$ constant matrix. Consider the Lagrangian
\[
L(x,v) = \frac12 x^TM(t)x + \frac12 v^TN(t)v
\]
and the terminal condition $\psi = \frac12 x^TDx$. Show that there exists a solution to the Hamilton-Jacobi equation with terminal condition ψ at $t = T$ of the form
\[
V = \frac12 x^TP(t)x,
\]
where $P(t)$ satisfies the Riccati equation
\[
\dot P = P^TN^{-1}P - M
\]
and $P(T) = D$.
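As a quick sanity check of the Riccati equation in the scalar case, the following sketch integrates $\dot P = P^2/n - m$ backward from the terminal condition with a classical fourth-order Runge-Kutta scheme. The constants $m$, $n$, $d$ and the function name `riccati_backward` are assumptions for illustration. When $m = 0$ the equation can be solved by hand, $P(t) = \frac{nd}{n + d(T-t)}$, which gives a reference value.

```python
def riccati_backward(m, n, d, T, t, steps=2000):
    """Integrate the scalar Riccati equation P' = P^2 / n - m backward from P(T) = d to time t."""
    def rhs(P):
        return P * P / n - m

    h = (t - T) / steps  # negative step: we march from T down to t
    P = d
    for _ in range(steps):
        k1 = rhs(P)
        k2 = rhs(P + 0.5 * h * k1)
        k3 = rhs(P + 0.5 * h * k2)
        k4 = rhs(P + h * k3)
        P += h * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0
    return P

# Assumed illustrative constants: m = 0, n = 2, d = 3, T = 1, evaluated at t = 0.
numeric = riccati_backward(0.0, 2.0, 3.0, 1.0, 0.0)
exact = 2.0 * 3.0 / (2.0 + 3.0 * 1.0)  # n d / (n + d (T - t))
print(numeric, exact)
```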

5. Verification theorem

Theorem 97. Let $L(x,v)$ be a $C^1$ Lagrangian, strictly convex in v, let $f(x,u)$ be a control law satisfying (108), and let H be the generalized Legendre transform (109) of L. Let $\Phi(x,t)$ be a classical solution to the Hamilton-Jacobi equation
\[
-\Phi_t + H(D_x\Phi,x) = 0 \tag{116}
\]
on the time interval $[0,T]$, with terminal data $\Phi(x,T) = \varphi(x)$. Then, for all $0 \le t \le T$,
\[
\Phi(x,t) = V(x,t),
\]
where V is the value function.


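Proof. Let $\mathbf{x}$ be any trajectory satisfying
\[
\dot{\mathbf{x}} = f(\mathbf{x},\mathbf{u}).
\]
Then
\[
\varphi(\mathbf{x}(T)) - \Phi(\mathbf{x}(t),t) = \int_t^T \frac{d}{ds}\Phi(\mathbf{x}(s),s)\,ds
= \int_t^T \left[D_x\Phi(\mathbf{x}(s),s)\cdot f(\mathbf{x},\mathbf{u}) + \Phi_s(\mathbf{x}(s),s)\right]ds.
\]
Adding $\int_t^T L(\mathbf{x}(s),\mathbf{u}(s))\,ds + \Phi(\mathbf{x}(t),t)$ to the above equality and taking the infimum over all controls $\mathbf{u}$, we obtain
\[
\inf\left(\int_t^T L(\mathbf{x}(s),\mathbf{u}(s))\,ds + \varphi(\mathbf{x}(T))\right)
= \Phi(\mathbf{x}(t),t) + \inf\left(\int_t^T \left[\Phi_s(\mathbf{x}(s),s) + L(\mathbf{x}(s),\mathbf{u}(s)) + D_x\Phi(\mathbf{x}(s),s)\cdot f(\mathbf{x},\mathbf{u})\right]ds\right).
\]
Now recall that for any v,
\[
-H(p,x) \le L(x,v) + p\cdot f(x,v),
\]
therefore, using (116),
\[
\inf\left(\int_t^T L(\mathbf{x}(s),\mathbf{u}(s))\,ds + \varphi(\mathbf{x}(T))\right)
\ge \Phi(\mathbf{x}(t),t) + \inf\left(\int_t^T \left(\Phi_s(\mathbf{x}(s),s) - H(D_x\Phi(\mathbf{x}(s),s),\mathbf{x}(s))\right)ds\right) = \Phi(\mathbf{x}(t),t).
\]
Let $r(x,t)$ be uniquely defined as
\[
r(x,t) \in \operatorname*{argmin}_{v\in U}\ \left[L(x,v) + D_x\Phi(x,t)\cdot f(x,v)\right]. \tag{117}
\]
A simple argument shows that r is a continuous function.

Now consider the trajectory $\mathbf{x}$ given by solving the following differential equation
\[
\dot{\mathbf{x}}(s) = f(\mathbf{x},r(\mathbf{x}(s),s)),
\]
with initial condition $\mathbf{x}(t) = x$. Note that since the right-hand side is continuous there is a solution, although it may not be unique. Then
\[
\inf\left(\int_t^T L(\mathbf{x}(s),\mathbf{u}(s))\,ds + \varphi(\mathbf{x}(T))\right)
\le \Phi(\mathbf{x}(t),t) + \int_t^T \left(\Phi_s(\mathbf{x}(s),s) - H(D_x\Phi(\mathbf{x}(s),s),\mathbf{x}(s))\right)ds = \Phi(\mathbf{x}(t),t),
\]
which ends the proof.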

We should observe from the proof that (117) gives an optimal feed-

back law for the optimal control, provided we can find a solution to the

Hamilton-Jacobi equation.
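The feedback law (117) can be simulated in a simple scalar case. The sketch below assumes $L(x,v) = \frac q2 v^2$, $f(x,v) = v$, $U = \mathbb{R}$ and terminal data $\varphi(x) = \frac d2 x^2$, with hypothetical constants q and d; for this choice $\Phi(x,t) = \frac12 P(t)x^2$ with $P(t) = \frac{qd}{q+d(T-t)}$ solves (116), and minimizing $\frac q2 v^2 + P(t)x\,v$ over v gives the feedback $r(x,t) = -P(t)x/q$. The closed-loop cost, computed with an explicit Euler scheme, should then match $\Phi(x,t)$.

```python
def closed_loop_cost(x0, t, T, q, d, steps=20000):
    """Cost of the Euler-discretized trajectory driven by the feedback v = -P(s) x / q."""
    def P(s):
        # Assumed closed-form solution of the scalar Riccati equation P' = P^2 / q, P(T) = d.
        return q * d / (q + d * (T - s))

    h = (T - t) / steps
    x, s, cost = x0, t, 0.0
    for _ in range(steps):
        v = -P(s) * x / q          # minimizer of q v^2 / 2 + P(s) x v, as in (117)
        cost += 0.5 * q * v * v * h  # running cost accumulated over one step
        x += h * v                   # Euler step for x' = v
        s += h
    return cost + 0.5 * d * x * x    # add the terminal cost

# Assumed illustrative constants: x0 = 1, t = 0, T = 1, q = 2, d = 3.
phi_value = 0.5 * (2.0 * 3.0 / (2.0 + 3.0 * 1.0)) * 1.0**2  # Phi(x0, t) = P(t) x0^2 / 2
print(closed_loop_cost(1.0, 0.0, 1.0, 2.0, 3.0), phi_value)
```

In this example the optimal velocity is constant along the trajectory, so the Euler scheme reproduces the exact trajectory up to rounding.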

6. Existence of optimal controls - bounded control space

We now give a proof of the existence of optimal controls for bounded

control space. The unbounded case will be addressed in §8.

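Lemma 98. Suppose that f is as in (108). Then J is weakly lower semicontinuous, with respect to weak-* convergence in $L^{\infty}$.

Proof. Let $\mathbf{u}_n$ be a sequence of controls such that $\mathbf{u}_n \overset{*}{\rightharpoonup} \mathbf{u}$ in $L^{\infty}[t,t_1]$. Then, by using the Ascoli-Arzelà theorem, we can extract a subsequence such that $\mathbf{x}_n(\cdot)$ converges uniformly to $\mathbf{x}(\cdot)$. Furthermore, because the control law (108) is linear in u, we have
\[
\dot{\mathbf{x}} = f(\mathbf{x},\mathbf{u}).
\]
We have
\[
J(x,t;\mathbf{u}_n) = \int_t^{t_1}\left[L(\mathbf{x}_n(s),\mathbf{u}_n(s),s) - L(\mathbf{x}(s),\mathbf{u}_n(s),s)\right]ds + \int_t^{t_1} L(\mathbf{x}(s),\mathbf{u}_n(s),s)\,ds + \psi(\mathbf{x}_n(t_1)).
\]
The first term, $\int_t^{t_1}\left[L(\mathbf{x}_n(s),\mathbf{u}_n(s),s) - L(\mathbf{x}(s),\mathbf{u}_n(s),s)\right]ds$, converges to zero. Similarly, $\psi(\mathbf{x}_n(t_1)) \to \psi(\mathbf{x}(t_1))$. Finally, the convexity of L implies
\[
L(\mathbf{x}(s),\mathbf{u}_n(s),s) \ge L(\mathbf{x}(s),\mathbf{u}(s),s) + D_vL(\mathbf{x}(s),\mathbf{u}(s),s)(\mathbf{u}_n(s) - \mathbf{u}(s)).
\]
Since $\mathbf{u}_n \overset{*}{\rightharpoonup} \mathbf{u}$,
\[
\int_t^{t_1} D_vL(\mathbf{x}(s),\mathbf{u}(s),s)(\mathbf{u}_n(s) - \mathbf{u}(s))\,ds \to 0.
\]
Hence
\[
\liminf J(x,t;\mathbf{u}_n) \ge J(x,t;\mathbf{u}),
\]
and so J is weakly lower semicontinuous.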

Using the previous result we can now state and prove our first

existence result.

Lemma 99. Suppose the control set U is bounded. There exists a minimizer $\mathbf{u}^*$ of J in U.

Proof. Let $\mathbf{u}_n$ be a minimizing sequence, that is, such that
\[
J(x,t;\mathbf{u}_n) \to \inf_{\mathbf{u}\in U} J(x,t;\mathbf{u}).
\]
Because this sequence is bounded in $L^{\infty}$, by the Banach-Alaoglu theorem we can extract a subsequence $\mathbf{u}_n \overset{*}{\rightharpoonup} \mathbf{u}^*$. Clearly, we have $\mathbf{u}^* \in U$. We claim now that
\[
J(x,t;\mathbf{u}^*) = \inf_{\mathbf{u}\in U} J(x,t;\mathbf{u}).
\]
This just follows from the weak lower semicontinuity:
\[
\inf_{\mathbf{u}\in U} J(x,t;\mathbf{u}) \le J(x,t;\mathbf{u}^*) \le \liminf J(x,t;\mathbf{u}_n) = \inf_{\mathbf{u}\in U} J(x,t;\mathbf{u}),
\]
which ends the proof.

Example 42 (Bang-Bang principle). Consider the case of a bounded closed convex control space U and suppose the Lagrangian vanishes. Suppose $f(x,v) = v$ and that the terminal value ψ is convex.

In this setting we first observe that the set of all optimal controls is convex. As such it admits an extreme point $\mathbf{u}^*$. We claim that $\mathbf{u}^*$ takes values on $\partial U$.

To see this, choose a time r and suppose that for some ε there is a set of positive measure in $[r,r+\varepsilon]$ for which $\mathbf{u}^*$ is in the interior of U. Then there exists an $L^{\infty}$ function ν supported on this set such that $\int_r^{r+\varepsilon}\nu\,ds = 0$, and such that $\mathbf{u}^* \pm \nu$ is an admissible control. By our assumptions it is also an optimal control. It is clear then that $\mathbf{u}^*$ is not an extreme point, which is a contradiction.

Exercise 162. Show that the Bang-Bang principle also holds if the

Lagrangian is independent on the state variable x, that is L ≡ L(v).

7. Sub and superdifferentials

Let $\psi : \mathbb{R}^n \to \mathbb{R}$ be a continuous function. The superdifferential $D^+_x\psi(x)$ of ψ at x is the set of vectors $p \in \mathbb{R}^n$ such that
\[
\limsup_{|v|\to0}\frac{\psi(x+v) - \psi(x) - p\cdot v}{|v|} \le 0.
\]
Consequently, $p \in D^+_x\psi(x)$ if and only if
\[
\psi(x+v) \le \psi(x) + p\cdot v + o(|v|),
\]
as $|v|\to0$. Similarly, the subdifferential, $D^-_x\psi(x)$, of ψ at x is the set of vectors p such that
\[
\liminf_{|v|\to0}\frac{\psi(x+v) - \psi(x) - p\cdot v}{|v|} \ge 0.
\]
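To make these definitions concrete, the following sketch evaluates the difference quotient for the one-dimensional function $\psi(x) = -|x|$ at $x = 0$, a choice made here purely for illustration. In this case the quotient equals $-1 - p\,\mathrm{sign}(v)$, so the superdifferential at 0 is $[-1,1]$ and the subdifferential is empty. Sampling small radii is only a crude numerical proxy for the limsup and liminf, and the helper name `one_sided_quotients` is hypothetical.

```python
import numpy as np

def one_sided_quotients(psi, x, p, radii):
    """Sup and inf of (psi(x + v) - psi(x) - p v) / |v| over v = +r and v = -r, r in radii (1D)."""
    vs = np.concatenate([radii, -radii])
    quotients = (psi(x + vs) - psi(x) - p * vs) / np.abs(vs)
    return float(quotients.max()), float(quotients.min())

def psi(z):
    # Assumed example: a concave kink at the origin.
    return -np.abs(z)

radii = np.logspace(-8, -1, 50)  # small sampling radii

for p in (0.0, 0.5, 1.5):
    sup_q, inf_q = one_sided_quotients(psi, 0.0, p, radii)
    # p is a candidate supergradient when sup_q <= 0, a candidate subgradient when inf_q >= 0.
    print(p, sup_q, inf_q)
```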

Exercise 163. Show that if u : Rn → R has a maximum (resp. mini-

mum) at x0 then 0 ∈ D+u(x0) (resp. D−u(x0)).

We can regard these sets as one-sided derivatives. In fact, if ψ is differentiable then
\[
D^-_x\psi(x) = D^+_x\psi(x) = \{D_x\psi(x)\}.
\]

More precisely,


Proposition 100. If $D^-_x\psi(x), D^+_x\psi(x) \neq \emptyset$ then
\[
D^-_x\psi(x) = D^+_x\psi(x) = \{p\}
\]
and ψ is differentiable at x with $D_x\psi = p$. Conversely, if ψ is differentiable at x then
\[
D^-_x\psi(x) = D^+_x\psi(x) = \{D_x\psi(x)\}.
\]
Proof. Suppose that $D^-_x\psi(x)$ and $D^+_x\psi(x)$ are both non-empty. Then we claim that these two sets agree and have a single point p. To see this, take $p^- \in D^-_x\psi(x)$ and $p^+ \in D^+_x\psi(x)$. Then
\[
\liminf_{|v|\to0}\frac{\psi(x+v) - \psi(x) - p^-\cdot v}{|v|} \ge 0, \qquad
\limsup_{|v|\to0}\frac{\psi(x+v) - \psi(x) - p^+\cdot v}{|v|} \le 0.
\]
By subtracting these two inequalities,
\[
\liminf_{|v|\to0}\frac{(p^+ - p^-)\cdot v}{|v|} \ge 0.
\]
In particular, by choosing $v = -\varepsilon\frac{p^+ - p^-}{|p^- - p^+|}$, we obtain
\[
-|p^- - p^+| \ge 0,
\]
which implies $p^- = p^+ \equiv p$. Additionally p satisfies
\[
\lim_{|v|\to0}\frac{\psi(x+v) - \psi(x) - p\cdot v}{|v|} = 0,
\]
and, therefore, $D_x\psi = p$.

To prove the converse it suffices to observe that if ψ is differentiable then
\[
\psi(x+v) = \psi(x) + D_x\psi(x)\cdot v + o(|v|).
\]

Exercise 164. Let ψ be a continuous function. Show that if x0 is a

local maximum of ψ then 0 ∈ D+ψ(x0).


Proposition 101. Let
\[
\psi : \mathbb{R}^n \to \mathbb{R}
\]
be a continuous function. Then, if
\[
p \in D^+_x\psi(x_0) \quad (\text{resp. } p \in D^-_x\psi(x_0)),
\]
there exists a $C^1$ function φ such that
\[
\psi(x) - \phi(x)
\]
has a local strict maximum (resp. minimum) at $x_0$ and
\[
p = D_x\phi(x_0).
\]
On the other hand, if φ is a $C^1$ function such that
\[
\psi(x) - \phi(x)
\]
has a local maximum (resp. minimum) at $x_0$ then
\[
p = D_x\phi(x_0) \in D^+_x\psi(x_0) \quad (\text{resp. } D^-_x\psi(x_0)).
\]

Proof. By subtracting $p\cdot(x-x_0) + \psi(x_0)$ from ψ, we can assume, without loss of generality, that $\psi(x_0) = 0$ and $p = 0$. By changing coordinates, if necessary, we can also assume that $x_0 = 0$. Because $0 \in D^+_x\psi(0)$ we have
\[
\limsup_{|x|\to0}\frac{\psi(x)}{|x|} \le 0.
\]
Therefore there exists a continuous function $\rho(x)$, with $\rho(0) = 0$, such that
\[
\psi(x) \le |x|\rho(x).
\]
Let $\eta(r) = \max_{|x|\le r}\rho(x)$. The function η is continuous, non-decreasing and $\eta(0) = 0$. Let
\[
\phi(x) = \int_{|x|}^{2|x|}\eta(r)\,dr + |x|^2.
\]
The function φ is $C^1$ and satisfies $\phi(0) = D_x\phi(0) = 0$. Additionally, if $x \neq 0$,
\[
\psi(x) - \phi(x) \le |x|\rho(x) - \int_{|x|}^{2|x|}\eta(r)\,dr - |x|^2 < 0.
\]
Thus $\psi - \phi$ has a strict local maximum at 0.

To prove the second part of the proposition, suppose that the difference $\psi(x) - \phi(x)$ has a strict local maximum at 0. Without loss of generality, we can assume $\psi(0) - \phi(0) = 0$ and $\phi(0) = 0$. Then $\psi(x) - \phi(x) \le 0$ or, equivalently,
\[
\psi(x) \le p\cdot x + (\phi(x) - p\cdot x).
\]
Thus, by setting $p = D_x\phi(0)$, and using the fact that
\[
\lim_{|x|\to0}\frac{\phi(x) - p\cdot x}{|x|} = 0,
\]
we conclude that $D_x\phi(0) \in D^+_x\psi(0)$. The case of a minimum is similar.

A continuous function f is semiconcave if there exists C such that
\[
f(x+y) + f(x-y) - 2f(x) \le C|y|^2.
\]
Similarly, a function f is semiconvex if there exists a constant C such that
\[
f(x+y) + f(x-y) - 2f(x) \ge -C|y|^2.
\]
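The semiconcavity inequality can be checked on sample points. The sketch below evaluates the largest value of $f(x+y) + f(x-y) - 2f(x) - C|y|^2$ over a grid in one dimension; the test functions and the helper name `semiconcavity_gap` are illustrative assumptions. For $f = \sin$ one has $f(x+y) + f(x-y) - 2f(x) = 2\sin x\,(\cos y - 1) \le y^2$, so $C = 1$ works, whereas $f(x) = |x|$ fails the inequality near the origin for any fixed C.

```python
import numpy as np

def semiconcavity_gap(f, C, xs, ys):
    """Largest sampled value of f(x+y) + f(x-y) - 2 f(x) - C y^2; nonpositive if the bound holds."""
    X, Y = np.meshgrid(xs, ys)
    return float(np.max(f(X + Y) + f(X - Y) - 2.0 * f(X) - C * Y**2))

xs = np.linspace(-3.0, 3.0, 301)  # sample base points x
ys = np.linspace(-2.0, 2.0, 201)  # sample increments y

print(semiconcavity_gap(np.sin, 1.0, xs, ys))  # expected to be nonpositive up to rounding
print(semiconcavity_gap(np.abs, 1.0, xs, ys))  # expected to be positive: |x| is not semiconcave with C = 1
```

A nonpositive sampled gap is of course only evidence, not a proof, that the inequality holds everywhere.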

Proposition 102. The following statements are equivalent:

1. f is semiconcave;

2. $\tilde f(x) = f(x) - \frac C2|x|^2$ is concave;

3. for all λ, $0 \le \lambda \le 1$, and any y, z such that $\lambda y + (1-\lambda)z = 0$ we have
\[
\lambda f(x+y) + (1-\lambda)f(x+z) - f(x) \le \frac C2\left(\lambda|y|^2 + (1-\lambda)|z|^2\right).
\]
Additionally, if f is semiconcave, then

a. $D^+_xf(x) \neq \emptyset$;

b. if $D^-_xf(x) \neq \emptyset$ then f is differentiable at x;

c. there exists C such that, for each $p_i \in D^+_xf(x_i)$ ($i = 0,1$),
\[
(x_0 - x_1)\cdot(p_0 - p_1) \le C|x_0 - x_1|^2.
\]

Remark. Of course analogous results hold for semiconvex functions.


Proof. Clearly 2 ⟹ 3 ⟹ 1. Therefore, to prove the equivalence, it is enough to show that 1 ⟹ 2. Subtracting $\frac C2|x|^2$ from f, we may assume $C = 0$. Also, by changing coordinates if necessary, it suffices to prove that for all x, y such that $\lambda x + (1-\lambda)y = 0$, for some $\lambda \in [0,1]$, we have:
\[
\lambda f(x) + (1-\lambda)f(y) - f(0) \le 0. \tag{118}
\]
We claim now that the previous equation holds for each $\lambda = \frac{k}{2^j}$, with $0 \le k \le 2^j$. Clearly (118) holds for $j = 1$. We will proceed by induction on j. Suppose that (118) is valid for $\lambda = \frac{k}{2^j}$. We will show that it also holds for $\lambda = \frac{k}{2^{j+1}}$. If k is even, we can reduce the fraction and, therefore, we assume that k is odd, $\lambda = \frac{k}{2^{j+1}}$ and $\lambda x + (1-\lambda)y = 0$. Note that
\[
0 = \frac12\left[\frac{k-1}{2^{j+1}}x + \left(1 - \frac{k-1}{2^{j+1}}\right)y\right] + \frac12\left[\frac{k+1}{2^{j+1}}x + \left(1 - \frac{k+1}{2^{j+1}}\right)y\right].
\]
Consequently,
\[
f(0) \ge \frac12 f\left(\frac{k-1}{2^{j+1}}x + \left(1 - \frac{k-1}{2^{j+1}}\right)y\right) + \frac12 f\left(\frac{k+1}{2^{j+1}}x + \left(1 - \frac{k+1}{2^{j+1}}\right)y\right),
\]
but, since $k-1$ and $k+1$ are even, $k_0 = \frac{k-1}{2}$ and $k_1 = \frac{k+1}{2}$ are integers. Therefore
\[
f(0) \ge \frac12 f\left(\frac{k_0}{2^j}x + \left(1 - \frac{k_0}{2^j}\right)y\right) + \frac12 f\left(\frac{k_1}{2^j}x + \left(1 - \frac{k_1}{2^j}\right)y\right).
\]
But this implies
\[
f(0) \ge \frac{k_0 + k_1}{2^{j+1}}f(x) + \left(1 - \frac{k_0 + k_1}{2^{j+1}}\right)f(y).
\]
From $k_0 + k_1 = k$ we obtain
\[
f(0) \ge \frac{k}{2^{j+1}}f(x) + \left(1 - \frac{k}{2^{j+1}}\right)f(y).
\]
Since the function f is continuous and the rationals of the form $\frac{k}{2^j}$ are dense in $\mathbb{R}$, we conclude that
\[
f(0) \ge \lambda f(x) + (1-\lambda)f(y),
\]
for each real λ, with $0 \le \lambda \le 1$.

To prove the second part of the proposition, observe that by proposition 100, a ⟹ b. To check a, i.e., that $D^+_xf(x) \neq \emptyset$, it is enough to observe that if f is concave then $D^+_xf(x) \neq \emptyset$. By subtracting $\frac C2|x|^2$ from f, we can reduce the problem to concave functions. Finally, if $p_i \in D^+_xf(x_i)$ ($i = 0,1$) then
\[
f(x_0) - \frac C2|x_0|^2 \le f(x_1) - \frac C2|x_1|^2 + (p_1 - Cx_1)\cdot(x_0 - x_1),
\]
and
\[
f(x_1) - \frac C2|x_1|^2 \le f(x_0) - \frac C2|x_0|^2 + (p_0 - Cx_0)\cdot(x_1 - x_0).
\]
Therefore,
\[
0 \le (p_1 - p_0)\cdot(x_0 - x_1) + C|x_0 - x_1|^2,
\]
and so $(p_0 - p_1)\cdot(x_0 - x_1) \le C|x_0 - x_1|^2$.

Exercise 165. Let f : Rn → R be a continuous function. Show that if

x0 is a local maximum then 0 ∈ D+f(x0).

8. Optimal control in the calculus of variations setting

We now consider the calculus of variations setting and prove the

existence of optimal controls. The main technical issue is the fact that

the control space U = Rn is unbounded and therefore compactness

arguments do not work directly. Fortunately, the coercivity of the

Lagrangian is enough to establish the existence of a-priori bounds on

optimal controls.

Theorem 103. Let $x \in \mathbb{R}^n$ and $t_0 \le t \le t_1$. Suppose that the Lagrangian $L(x,v)$ satisfies:

A. L is $C^{\infty}$, strictly convex in v (i.e., $D^2_{vv}L$ is positive definite), and satisfies the coercivity condition
\[
\lim_{|v|\to\infty}\frac{L(x,v)}{|v|} = \infty,
\]
uniformly in $(x,t)$;

B. L is bounded below (without loss of generality we assume $L(x,v) \ge 0$);

C. L satisfies the inequalities
\[
L(x,0) \le c_1, \qquad |D_xL| \le c_2L + c_3
\]
for suitable $c_1$, $c_2$, and $c_3$;

D. there exist functions $C_0(R), C_1(R) : \mathbb{R}^+ \to \mathbb{R}^+$ such that
\[
|D_vL| \le C_0(R), \qquad |D^2_{xx}L| \le C_1(R)
\]
whenever $|v| \le R$.

Then, if ψ is a bounded Lipschitz function,

1. There exists $\mathbf{u}^* \in L^{\infty}[t,t_1]$ such that its corresponding optimal trajectory $\mathbf{x}^*$, given by
\[
\dot{\mathbf{x}}^*(s) = \mathbf{u}^*(s), \qquad \mathbf{x}^*(t) = x,
\]
satisfies
\[
V(x,t) = \int_t^{t_1} L(\mathbf{x}^*(s),\dot{\mathbf{x}}^*(s))\,ds + \psi(\mathbf{x}^*(t_1)).
\]
2. There exists C, depending only on L, ψ and $t_1 - t$ but not on x or t, such that $|\mathbf{u}^*(s)| < C$ for $t \le s \le t_1$. The optimal trajectory $\mathbf{x}^*(\cdot)$ is a $C^2[t,t_1]$ solution of the Euler-Lagrange equation
\[
\frac{d}{dt}D_vL - D_xL = 0 \tag{119}
\]
with initial condition $\mathbf{x}^*(t) = x$.

3. The adjoint variable $\mathbf{p}$, defined by
\[
\mathbf{p}(t) = -D_vL(\mathbf{x}^*,\dot{\mathbf{x}}^*), \tag{120}
\]
satisfies the differential equation
\[
\begin{cases}
\dot{\mathbf{p}}(s) = D_xH(\mathbf{p}(s),\mathbf{x}^*(s)) \\
\dot{\mathbf{x}}^*(s) = -D_pH(\mathbf{p}(s),\mathbf{x}^*(s))
\end{cases}
\]
with terminal condition
\[
\mathbf{p}(t_1) \in D^-_x\psi(\mathbf{x}^*(t_1)).
\]
Additionally,
\[
(\mathbf{p}(s), H(\mathbf{p}(s),\mathbf{x}^*(s))) \in D^-V(\mathbf{x}^*(s),s)
\]
for $t < s \le t_1$.

4. The value function V is Lipschitz, and so almost everywhere differentiable.

5. If $D^2_{vv}L$ is uniformly bounded, then for each $t < t_1$, $V(x,t)$ is semiconcave in x.

6. For $t \le s < t_1$
\[
(\mathbf{p}(s), H(\mathbf{p}(s),\mathbf{x}^*(s))) \in D^+V(\mathbf{x}^*(s),s)
\]
and, therefore, $DV(\mathbf{x}^*(s),s)$ exists for $t < s < t_1$.

Proof. We will divide the proof into several auxiliary lemmas.

For R > 0, define UR = {u ∈ U : ‖u‖∞ ≤ R}. From lemma 99 there

exists a minimizer uR of J in UR. Then we will show that the minimizer

uR satisfies uniform estimates in R. Finally, we will let R→∞.

Let pR be the adjoint variable given by the Pontryagin maximum

principle. We now will try to estimate the optimal control uR uniformly

in R, in order to send R→∞.

Lemma 104. Suppose ψ is differentiable. Then there exists a constant

C, independent of R, such that

|pR| ≤ C.

Proof. Since ψ is Lipschitz and differentiable we have

|Dxψ| ≤ ‖Dxψ‖∞ <∞.

Therefore

\[ |p_R(s)| \le \int_s^{t_1} |D_xL(x_R(r), u_R(r))|\,dr + \|D_x\psi\|_\infty. \]


Let VR be the value function for the terminal value problem with the

additional constraint of bounded control: |v| ≤ R. From |DxL| ≤ c2L + c3, it follows that

|pR(s)| ≤ C(VR(t, x) + 1),

for an appropriate constant C. Proposition 90 shows that there exists a

constant C, which does not depend on R, such that VR ≤ C. Therefore

|pR| ≤ C.

As we will see, the uniform estimates for pR yield uniform estimates

for uR.

Lemma 105. Let ψ be differentiable. Then there exists R1 > 0 such

that, for all R,

‖uR‖∞ ≤ R1.

Proof. Suppose |p| ≤ C. Then, for each c1, the coercivity condi-

tion on L implies that there exists R1 such that, if

v · p+ L(x, v) ≤ c1

then |v| ≤ R1. But then,

uR(s) · pR(s) + L(xR(s),uR(s)) ≤ L(xR(s), 0) ≤ c1,

that is, ‖uR‖∞ ≤ R1.

Since uR is bounded independently of R, we have

V = J(x, t; uR0),

for R0 > R1. Let u∗ = uR0 and p = pR0 .

Lemma 106 (Pontryagin maximum principle - II). If ψ is differentiable, the optimal control u∗ satisfies

\[ u^* \cdot p + L(x^*, u^*) = \min_v \left[ v \cdot p + L(x^*, v) \right] = -H(p, x^*), \]

for almost all s and, therefore,

\[ p = -D_vL(x^*, u^*) \quad \text{and} \quad u^* = -D_pH(p, x^*), \]

where H = L∗. Additionally, p satisfies the terminal condition

\[ p(t_1) = D_x\psi(x^*(t_1)). \]

Proof. Clearly it is enough to choose R sufficiently large.

Lemma 107. Let ψ be differentiable. The minimizing trajectory x∗(·) is C2 and satisfies the Euler-Lagrange equation (119). Furthermore,

\[ \dot{p} = D_xH(p, x^*), \qquad \dot{x}^* = -D_pH(p, x^*). \]

Proof. By its definition p is continuous. We know that

\[ \dot{x}^*(s) = -D_pH(p(s), x^*(s)) \]

almost everywhere. Since the right-hand side of the previous identity is continuous, the identity holds everywhere and, therefore, we conclude that x∗ is C1. Because p is given by the integral of a continuous function (112),

\[ p(r) = D_x\psi(x^*(t_1)) + \int_r^{t_1} D_xL(x^*(s), u^*(s))\,ds, \]

we conclude that p is C1. Additionally,

\[ \dot{x}^* = -D_pH(p, x^*) \]

and, therefore, $\dot{x}^*$ is C1, which implies that x∗ is C2. We also have

\[ p = -D_vL(x^*, \dot{x}^*), \qquad \dot{p} = -D_xL(x^*, \dot{x}^*), \]

from which it follows that

\[ \frac{d}{dt} D_vL(x^*, \dot{x}^*) - D_xL(x^*, \dot{x}^*) = 0. \tag{121} \]

Thus, since $D_xL(x^*, \dot{x}^*) = -D_xH(p, x^*)$, we conclude that

\[ \dot{p} = D_xH(p, x^*), \qquad \dot{x}^* = -D_pH(p, x^*), \]

as required.


In the case in which ψ is only Lipschitz and not C1, we can consider a sequence of C1 functions ψn → ψ uniformly, such that

\[ \|D_x\psi_n\|_\infty \le \|D\psi\|_{L^\infty} \]

for each ψn. Let

\[ J_n(x, t; u) = \int_t^{t_1} L(x_n(s), \dot{x}_n(s))\,ds + \psi_n(x_n(t_1)), \]

and let x∗n and u∗n be, respectively, the corresponding optimal trajectory and optimal control. Similarly, let pn be the corresponding adjoint variable. Passing to a subsequence, if necessary, the boundary values x∗n(t1) and pn(t1) converge, respectively, to some x0 and p0. By the Ascoli-Arzelà theorem, the optimal trajectories x∗n and the corresponding optimal controls u∗n converge uniformly to optimal trajectories and controls of the limit problem. Let

\[ p(s) = \lim_{n\to\infty} p_n(s). \]

Then, for almost every s,

\[ u^*(s) \cdot p(s) + L(x^*(s), u^*(s)) = \inf_v \left[ v \cdot p(s) + L(x^*(s), v) \right], \]

which implies

\[ p(s) = -D_vL(x^*(s), \dot{x}^*(s)) \]

for almost all s. But in the previous equation both sides are continuous functions; thus the identity holds for all s.

Lemma 108. For t < s ≤ t1 we have

(p(s), H(p(s),x∗(s))) ∈ D−V (x∗(s), s).

Proof. Let x∗ be the optimal trajectory and u∗ the corresponding optimal control. For r ≤ t1 and y ∈ Rn, define $x_r = x^*(r)$ and consider the sub-optimal control

\[ u^\sharp = u^* + \frac{y - x_r}{r - t}, \]

whose trajectory we denote by $x^\sharp$, with $x^\sharp(t) = x$. Note that $x^\sharp(r) = y$.

We have

\[ V(x, t) = \int_t^s L(x^*(\tau), u^*(\tau))\,d\tau + V(x^*(s), s) \]

and, by the sub-optimality of $x^\sharp$,

\[ V(x^*(t), t) \le \int_t^r L(x^\sharp(\tau), u^\sharp(\tau))\,d\tau + V(y, r). \]

This implies

\[ V(x^*(s), s) - V(y, r) \le \phi(y, r), \]

with

\[ \phi(y, r) = \int_t^r L(x^\sharp(\tau), u^\sharp(\tau))\,d\tau - \int_t^s L(x^*(\tau), u^*(\tau))\,d\tau. \]

Since φ is differentiable in y and r,

\[ \left( -D_y\phi(x^*(s), s), \; -D_r\phi(x^*(s), s) \right) \in D^-V(x^*(s), s). \]

Observe that

\[ x^\sharp(\tau) = x^*(\tau) + \frac{y - x_r}{r - t}(\tau - t), \]

and, therefore,

\[ D_y\phi(x^*(s), s) = \int_t^s \left[ D_xL \, \frac{\tau - t}{s - t} + D_vL \, \frac{1}{s - t} \right] d\tau. \]

Integrating by parts and using (121), we obtain

\[ D_y\phi(x^*(s), s) = D_vL(x^*(s), \dot{x}^*(s)) = -p(s). \]

Similarly,

\[ D_r\phi(y, r) = L(y, u^\sharp(r)) + \int_t^s \left[ -D_xL \, \frac{y - x_r}{(r - t)^2}(\tau - t) + D_xL \, \frac{-u^*(r)}{r - t}(\tau - t) - D_vL \, \frac{y - x_r}{(r - t)^2} + D_vL \, \frac{-u^*(r)}{r - t} \right] d\tau. \]

Integrating by parts and evaluating at y = x∗(s), r = s, we obtain

\[ D_r\phi(x^*(s), s) = L(x^*(s), \dot{x}^*(s)) - u^*(s) \cdot D_vL(x^*(s), \dot{x}^*(s)) = -H(p(s), x^*(s)), \]

as we needed.

Lemma 109. The value function V is Lipschitz.


Proof. Let t < t1 be fixed and x, y arbitrary. We suppose first

that t1 − t < 1. Then

V (y, t)− V (x, t) ≤ J(y, t; u∗)− V (x, t),

where V (x, t) = J(x, t; u∗). Therefore, there exists a constant C, de-

pending only on the Lipschitz constant of ψ and of the supremum of

|DxL|, such that

V (y, t)− V (x, t) ≤ C|x− y|.

Suppose that t1 − t > 1. Let

\[ u(s) = \begin{cases} u^*(s) + (x - y) & \text{if } t < s < t + 1, \\ u^*(s) & \text{if } t + 1 \le s \le t_1. \end{cases} \]

Then

V (y, t)− V (x, t) ≤ J(y, t; u)− V (x, t) ≤ C|x− y|,

where the constant C depends only on DxL and on DvL, and not on

the Lipschitz constant of ψ. Reversing the roles of x and y we conclude

|V (y, t)− V (x, t)| ≤ C|x− y|.

Without loss of generality we can suppose that $t < \hat{t}$. Note that

\[ |V(x, t) - V(x^*(\hat{t}), \hat{t})| \le C|t - \hat{t}|. \]

To prove that V is Lipschitz in t it is enough to check that

\[ |V(x^*(\hat{t}), \hat{t}) - V(x, \hat{t})| \le C|t - \hat{t}|. \tag{122} \]

But since $\dot{x}^*$ is uniformly bounded,

\[ |x^*(\hat{t}) - x| \le C|t - \hat{t}|; \]

thus, the previous Lipschitz estimate in x implies (122).

Lemma 110. V is differentiable almost everywhere.

Proof. Since V is Lipschitz, the almost everywhere differentiabil-

ity follows from Rademacher theorem.


In general, the value function is Lipschitz and not C1 or C2. However, we can prove a one-sided estimate for second derivatives, i.e., that V is semiconcave.

Lemma 111. Suppose that $|D^2_{xv}L|, |D^2_{vv}L| \le C(R)$ whenever |v| ≤ R. Then, for each t < t1, V(x, t) is semiconcave in x.

Proof. Fix t and x. Choose y ∈ Rn arbitrary. We claim that

V (x+ y, t) + V (x− y, t) ≤ 2V (x, t) + C|y|2,

for some constant C. Clearly,

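\[ V(x + y, t) + V(x - y, t) - 2V(x, t) \le \int_t^{t_1} \left[ L(x^* + y(s), \dot{x}^* + \dot{y}(s)) + L(x^* - y(s), \dot{x}^* - \dot{y}(s)) - 2L(x^*, \dot{x}^*) \right] ds, \]

where

\[ y(s) = y \, \frac{t_1 - s}{t_1 - t}. \]

Since $|D^2_{xx}L| \le C_1(R)$,

\[ L(x^* + y, \dot{x}^* + \dot{y}) \le L(x^*, \dot{x}^* + \dot{y}) + D_xL(x^*, \dot{x}^* + \dot{y}) \cdot y + C|y|^2, \]

and similarly for the other term. We also have

\[ L(x^*, \dot{x}^* + \dot{y}) + L(x^*, \dot{x}^* - \dot{y}) \le 2L(x^*, \dot{x}^*) + C|\dot{y}|^2 + C|y||\dot{y}|. \]

Thus

\[ L(x^* + y, \dot{x}^* + \dot{y}) + L(x^* - y, \dot{x}^* - \dot{y}) \le 2L(x^*, \dot{x}^*) + C|y|^2 + C|\dot{y}|^2. \]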

This inequality implies the lemma.

Lemma 112. We have

(p(s), H(p(s),x∗(s))) ∈ D+V (x∗(s), s)

for t ≤ s < t1. Therefore DV (x∗(s), s) exists for t < s < t1.

Proof. Let u∗ be an optimal control at (x, s) and let p be the corresponding adjoint variable. Define W by

\[ W(y, r) = J\left( y, r; \; u^* + \frac{x^*(r) - y}{t_1 - r} \right) - V(x, s). \]

Hence, for each y ∈ Rn and t ≤ r < t1,

\[ V(y, r) - V(x, s) \le W(y, r), \]

with equality at (y, r) = (x, s). Since W is C1, it is enough to check that

\[ D_yW(x^*(s), s) = p(s) \]

and

\[ D_rW(x^*(s), s) = H(p(s), x^*(s)). \]

The first identity follows from

\[ D_yW(x^*(s), s) = \int_s^{t_1} \left[ D_xL \, \varphi + D_vL \, \frac{d\varphi}{d\tau} \right] d\tau, \]

where $\varphi(\tau) = \frac{t_1 - \tau}{t_1 - s}$. Using the Euler-Lagrange equation

\[ \frac{d}{dt} D_vL - D_xL = 0 \]

and integration by parts we obtain

\[ D_yW(x^*(s), s) = -D_vL(x^*(s), \dot{x}^*(s)) = p(s). \]

On the other hand,

\[ D_rW(x^*(s), s) = -L(x^*(s), \dot{x}^*(s)) + \int_s^{t_1} \left[ D_xL \, \phi + D_vL \, \frac{d\phi}{d\tau} \right] d\tau, \]

where

\[ \phi(\tau) = \frac{\tau - t_1}{t_1 - s} \, \dot{x}^*(s). \]

Using again the Euler-Lagrange equation and integration by parts, we obtain

\[ D_rW(x^*(s), s) = -L(x^*(s), \dot{x}^*(s)) + D_vL(x^*(s), \dot{x}^*(s)) \cdot \dot{x}^*(s), \]

or equivalently

\[ D_rW(x^*(s), s) = H(p(s), x^*(s)). \]

The last part of the lemma follows from proposition 100.

This ends the proof of the theorem.


In what follows we prove that the value function is differentiable at

points of uniqueness of optimal trajectory.

A point (x, t) is regular if there exists a unique optimal trajectory

x∗(s) such that x∗(t) = x and

\[ V(x, t) = \int_t^{t_1} L(x^*(s), \dot{x}^*(s))\,ds + \psi(x^*(t_1)). \]

Theorem 113. V is differentiable with respect to x at (x, t) if and only

if (x, t) is a regular point.

Proof. The next lemma shows that differentiability at a point x

implies that x is a regular point:

Lemma 114. If V is differentiable with respect to x at a point (x, t), then there exists a unique optimal trajectory.

Proof. Since V is differentiable with respect to x at (x, t), any optimal trajectory satisfies

\[ \dot{x}^*(t) = -D_pH(p(t), x^*(t)), \]

since p(t) = DxV(x, t). Therefore, once DxV(x∗(t), t) is given, the velocity $\dot{x}^*(t)$ is uniquely determined. The solution of the Euler-Lagrange equation (119) is determined by the initial position and velocity, x∗(t) and $\dot{x}^*(t)$. Thus, the optimal trajectory is unique.

To prove the other implication we need an auxiliary lemma:

Lemma 115. Let p be such that

\[ \|D_xV(\cdot, t) - p\|_{L^\infty(B(x, 2\varepsilon))} \to 0 \]

when ε → 0. Then V is differentiable with respect to x at (x, t) and DxV(x, t) = p.

Proof. Since V is Lipschitz, it is differentiable almost everywhere. By Fubini's theorem, for almost every direction k ∈ S^{n−1} (with respect to the Lebesgue measure induced on S^{n−1}), V is differentiable at y = x + λk for almost every λ (with respect to the Lebesgue measure in R). For these directions

\[ \frac{V(y, t) - V(x, t) - p \cdot (y - x)}{|x - y|} = \int_0^1 \frac{(D_xV(x + s(y - x), t) - p) \cdot (y - x)}{|x - y|}\,ds. \]

Suppose 0 < |x − y| < ε. Then

\[ \left| \frac{V(x, t) - V(y, t) - p \cdot (x - y)}{|x - y|} \right| \le \|D_xV(\cdot, t) - p\|_{L^\infty(B(x, \varepsilon))}. \]

In principle, the last inequality only holds almost everywhere. However, for y ≠ x, the left-hand side is continuous in y. Consequently, the inequality holds for all y ≠ x. Therefore, when y → x,

\[ \left| \frac{V(x, t) - V(y, t) - p \cdot (x - y)}{|x - y|} \right| \to 0, \]

which implies DxV(x, t) = p.

Suppose that V is not differentiable at (x, t). We claim that (x, t) is not regular. By contradiction, suppose that (x, t) is regular. Then, if V fails to be differentiable, the previous lemma implies that for each p,

\[ \|D_xV(\cdot, t) - p\|_{L^\infty(B(x, \varepsilon))} \not\to 0. \]

Thus, we could choose two sequences $x_n^1$ and $x_n^2$ such that $x_n^i \to x$ but whose corresponding optimal trajectories $x_n^{i,*}(\cdot)$ satisfy

\[ \lim \dot{x}_n^{1,*}(t) \ne \lim \dot{x}_n^{2,*}(t). \]

However, this shows that (x, t) is not regular. Indeed, if (x, t) were regular, xn were any sequence converging to x, and x∗n(·) the corresponding optimal trajectory, then

\[ \dot{x}_n^*(t) \to \dot{x}^*(t). \]

If this were not true, by the Ascoli-Arzelà theorem we could extract a convergent subsequence $x^*_{n_k}(\cdot) \to y(\cdot)$ for which

\[ \dot{x}^*_{n_k}(t) \to v \ne \dot{x}^*(t). \]

Let y(·) be the solution to the Euler-Lagrange equation with initial condition $y(t) = x^*(t) = x$ and $\dot{y}(t) = v$. Note that $x_n^*(\cdot) \to y(\cdot)$ and $\dot{x}_n^*(\cdot) \to \dot{y}(\cdot)$, uniformly on compact sets, and, therefore,

\[ V(x, t) = \lim_{n\to\infty} V(x_n, t) = \lim_{n\to\infty} J(x_n, t; \dot{x}_n^*) = J(x, t; \dot{y}) > J(x, t; \dot{x}^*) = V(x, t), \]

since the trajectory y cannot be optimal, by regularity, which is a contradiction.

Remark. This theorem implies that all points of the form (x∗(s), s), in which x∗ is an optimal trajectory, are regular for t < s < t1.

Exercise 166. Show that, in the optimal control "bounded control space" setting, the value function is Lipschitz continuous if the terminal cost is Lipschitz continuous.

Exercise 167. In the optimal control ”bounded control space” setting,

show that if ψ is Lipschitz, for any (x, t) there exists p such that

(p(s), H(p(s),x∗(s))) ∈ D−V (x∗(s), s)

for t < s ≤ t1 and

(p(s), H(p(s),x∗(s))) ∈ D+V (x∗(s), s)

for t ≤ s < t1.

9. Viscosity solutions

In this section we discuss viscosity solutions in the calculus of variations setting. Since, with small modifications, our results also hold in the bounded control setting, we have added exercises in which the reader is required to prove the analogous results.

Theorem 116. Consider the calculus of variations setting for the opti-

mal control problem. Suppose that the value function V is differentiable

at (x, t). Then, at this point, V satisfies the Hamilton-Jacobi equation

(123) − Vt +H(DxV, x, t) = 0.

Proof. If V is differentiable at (x, t) then the result follows by

using statement 6 in theorem 103.

Exercise 168. Show that (123) also holds in the ”bounded control

case” setting. Hint: use exercises 166 and 167.

Corollary 117. Consider the calculus of variations setting for the op-

timal control problem. The value function V satisfies the Hamilton-

Jacobi equation almost everywhere.

Proof. Since the value function V is differentiable almost every-

where, by theorem 103, theorem 116 implies this result.

Exercise 169. Show that the previous corollary also holds in the ”bounded

control case” setting.

However, it is not true that a Lipschitz function satisfying the

Hamilton-Jacobi equation almost everywhere is the value function of

the terminal value problem, as shown in the next example.

Example 43. Consider the Hamilton-Jacobi equation

\[ -V_t + |D_xV|^2 = 0 \]

with terminal data V(x, 1) = 0. The value function is V ≡ 0, which is a (smooth) solution of the Hamilton-Jacobi equation. However, there are other solutions, for instance,

\[ V(x, t) = \begin{cases} 0 & \text{if } |x| \ge 1 - t, \\ |x| - 1 + t & \text{if } |x| < 1 - t, \end{cases} \]

which satisfies the same terminal condition at t = 1 and solves the equation almost everywhere.
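The almost-everywhere claim of Example 43 can be checked numerically. The following Python sketch is restricted to one space dimension; the names `V` and `residual`, the sample points, and the step `h` are illustrative choices, not part of the text. It approximates −Vt + |DxV|² by central differences at points away from the kink x = 0 and from the interface |x| = 1 − t.

```python
def V(x, t):
    """Kinked candidate solution from Example 43 (one space dimension)."""
    return 0.0 if abs(x) >= 1.0 - t else abs(x) - 1.0 + t


def residual(x, t, h=1e-6):
    """Central-difference approximation of -V_t + |V_x|^2 at (x, t)."""
    v_t = (V(x, t + h) - V(x, t - h)) / (2 * h)
    v_x = (V(x + h, t) - V(x - h, t)) / (2 * h)
    return -v_t + v_x ** 2


# Sample points chosen away from x = 0 and from |x| = 1 - t (assumed choices).
points = [(0.3, 0.2), (-0.5, 0.1), (0.9, 0.5), (-2.0, 0.0)]
residuals = [residual(x, t) for x, t in points]

# Terminal condition V(x, 1) = 0.
terminal = [V(x, 1.0) for x in (-1.0, 0.0, 0.7)]
```

At the sampled points the residual vanishes up to rounding, and the terminal values are zero. The check says nothing about the points where V is not differentiable, which is exactly where the viscosity conditions become relevant.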

A bounded uniformly continuous function V is a viscosity subsolution (resp. supersolution) of the Hamilton-Jacobi equation (123) if, for any C1 function φ and any interior point (x, t) ∈ argmax(V − φ) (resp. argmin), we have

\[ -D_t\phi + H(D_x\phi, x, t) \le 0 \]

(resp. ≥ 0) at (x, t). A bounded uniformly continuous function V is a viscosity solution of the Hamilton-Jacobi equation if it is both a sub- and a supersolution.

The value function is a viscosity solution of (123), although it may not be a classical solution. The motivation behind the definition of viscosity solution is the following: if V is differentiable and (x, t) ∈ argmax(V − φ) (or argmin), then DxV = Dxφ and DtV = Dtφ; therefore we should have both inequalities. The specific choice of inequalities is related to the following parabolic approximation of the Hamilton-Jacobi equation:

\[ -D_tu^\varepsilon + H(D_xu^\varepsilon, x, t) = \varepsilon \Delta u^\varepsilon. \tag{124} \]

This equation arises naturally in optimal stochastic control (see [FS93]). The limit ε → 0 corresponds to the case in which the diffusion coefficient vanishes.

Proposition 118. Let uε be a family of solutions of (124) such that, as

ε → 0, the sequence uε → u uniformly. Then u is a viscosity solution

of (123).

Proof. Suppose that φ(x, t) is a C2 function such that u− φ has

a strict local maximum at (x, t). We must show that

−Dtφ+H(Dxφ, x, t) ≤ 0.

By hypothesis, uε → u uniformly. Therefore we can find sequences

(xε, tε) → (x, t) such that uε − φ has a local maximum at (xε, tε).

Therefore,

Duε(xε, tε) = Dφ(xε, tε)

and

∆uε(xε, tε) ≤ ∆φ(xε, tε).

Consequently,

−Dtφ(xε, tε) +H(Dxφ(xε, tε), xε, tε) ≤ ε∆φ(xε, tε).

It is therefore enough to take ε→ 0 to end the proof.


A useful characterization of viscosity solutions is given in the next

proposition:

Proposition 119. Let V be a bounded uniformly continuous function.

Then V is a viscosity subsolution of (123) if and only if for each (p, q) ∈ D+V(x, t),

−q + H(p, x, t) ≤ 0.

Similarly, V is a viscosity supersolution if and only if for each (p, q) ∈ D−V(x, t),

−q + H(p, x, t) ≥ 0.

Proof. This result is an immediate corollary of proposition 101.

Example 44. In example 43 we have found two different solutions to the equation

\[ -V_t + |D_xV|^2 = 0 \]

satisfying the same terminal data.

It is easy to check that the value function V = 0 is a viscosity solution (it is smooth, satisfies the equation and the terminal condition). The other solution, which is an almost everywhere solution, is not a viscosity solution (check!).
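One possible way to carry out the check in one space dimension is sketched below in Python. The test function `phi(x, t) = t − 1`, the point t0 = 0.5, and the sampling grid are assumptions made only for this illustration. Near (0, t0) one has V − phi = |x| ≥ 0, so V − phi has a local minimum there, and the supersolution condition would require −Dtφ + |Dxφ|² ≥ 0 at that point.

```python
def V(x, t):
    """Kinked almost-everywhere solution from Example 43 (one space dimension)."""
    return 0.0 if abs(x) >= 1.0 - t else abs(x) - 1.0 + t


def phi(x, t):
    """Hypothetical smooth test function touching V from below at (0, t0)."""
    return t - 1.0


t0 = 0.5

# Sample V - phi on a small grid around (0, t0); all points satisfy |x| < 1 - t.
grid = [(0.02 * i, t0 + 0.01 * j) for i in range(-10, 11) for j in range(-10, 11)]
gaps = [V(x, t) - phi(x, t) for x, t in grid]
touches_from_below = min(gaps) >= -1e-12 and V(0.0, t0) - phi(0.0, t0) == 0.0

# Derivatives of phi are constant: phi_t = 1 and phi_x = 0.
phi_t, phi_x = 1.0, 0.0
supersolution_residual = -phi_t + phi_x ** 2  # supersolution needs this >= 0
```

Here `supersolution_residual` equals −1, so the supersolution inequality fails at the kink x = 0, which is consistent with the statement of the example.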

Now we will show that the definition of viscosity solution is consis-

tent with classical solutions.

Proposition 120. A differentiable viscosity solution of (123) is a classical solution.

Proof. If V is differentiable then

D+V = D−V = (DxV, DtV).

Since V is a viscosity solution, we obtain immediately

−DtV + H(DxV, x, t) ≤ 0 and −DtV + H(DxV, x, t) ≥ 0,

therefore −DtV + H(DxV, x, t) = 0.


Theorem 121. Let uα be the value function of the infinite horizon

discounted cost problem (??). Then uα is a viscosity solution to

αuα +H(Duα, x) = 0.

Similarly, let V be a solution to the initial value problem (??). Then

V is a viscosity solution of

Vt +H(DxV, x) = 0.

Proof. We present the proof only for the discounted cost infinite

horizon as the other case is similar, and we refer the reader to [Eva98a],

for instance. Let ϕ : Td → R be a C∞ function, and let x0 ∈ argmin(uα − ϕ). By adding a suitable constant to ϕ we may assume

that u(x0)− ϕ(x0) = 0, and u(x)− ϕ(x) ≥ 0 at all other points.

We must show that

αϕ(x0) +H(Dxϕ(x0), x0) ≥ 0,

that is, there exists v ∈ Rd such that

αϕ(x0) + v ·Dxϕ(x0)− L(x0, v) ≥ 0.

By contradiction assume that there exists θ > 0 such that

αϕ(x0) + v ·Dxϕ(x0)− L(x0, v) < −θ,

for all v. Because the mapping v ↦ L(x, v) is superlinear and ϕ is C1, there exist R > 0 and r1 > 0 such that for all x ∈ B_{r1}(x0) and all v ∈ B^c_R(0) = Rd \ BR(0) we have

\[ \alpha\varphi(x) + v \cdot D_x\varphi(x) - L(x, v) < -\frac{\theta}{2}. \]

By continuity, for some 0 < r < r1 and all x ∈ Br(x0) we have

\[ \alpha\varphi(x) + v \cdot D_x\varphi(x) - L(x, v) < -\frac{\theta}{2} \]

for all v ∈ BR(0).

Therefore, for any trajectory x with x(0) = x0 and any T ≥ 0 such that the trajectory x stays near x0 on [−T, 0], i.e., x(t) ∈ Br(x0) for t ∈ [−T, 0], we have

\begin{align*}
e^{-\alpha T}u(x(-T)) - u(x_0) &\ge e^{-\alpha T}\varphi(x(-T)) - \varphi(x_0) \\
&= -\int_{-T}^0 e^{\alpha t}\left( \alpha\varphi(x(t)) + \dot{x}(t) \cdot D_x\varphi(x(t)) \right) dt \\
&\ge \frac{\theta}{2}\int_{-T}^0 e^{\alpha t}\,dt - \int_{-T}^0 e^{\alpha t}L(x, \dot{x})\,dt.
\end{align*}

This yields

\[ u(x_0) \le -\frac{\theta}{2}\int_{-T}^0 e^{\alpha t}\,dt + \int_{-T}^0 e^{\alpha t}L(x, \dot{x})\,dt + e^{-\alpha T}u(x(-T)). \]

Since the infimum in (??) is, in fact, a minimum, we can choose a time interval [−T∗, 0] and a trajectory x∗ that minimizes (??):

\[ u(x_0) = \int_{-T^*}^0 e^{\alpha t}L(x^*, \dot{x}^*)\,dt + e^{-\alpha T^*}u(x^*(-T^*)). \]

A minimizing trajectory on [−T∗, 0] also minimizes on any subinterval: for any T ∈ (0, T∗) we have

\[ u(x_0) = \int_{-T}^0 e^{\alpha t}L(x^*, \dot{x}^*)\,dt + e^{-\alpha T}u(x^*(-T)). \]

Taking T small enough we can ensure that x∗ stays near x0 on [−T, 0]. This yields a contradiction.

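Now consider x0 ∈ argmax(uα − ϕ). Again, by adding a suitable constant to ϕ we may assume that u(x0) − ϕ(x0) = 0 and u(x) − ϕ(x) ≤ 0 at all other points.

We must show that

\[ \alpha\varphi(x_0) + H(D_x\varphi(x_0), x_0) \le 0, \]

that is, for all v ∈ Rd we have

\[ \alpha\varphi(x_0) + v \cdot D_x\varphi(x_0) - L(x_0, v) \le 0. \]

By contradiction, assume that there exists θ > 0 such that for some v

\[ \alpha\varphi(x_0) + v \cdot D_x\varphi(x_0) - L(x_0, v) > \theta. \]

By continuity, for some r > 0 and all x ∈ Br(x0) we have

\[ \alpha\varphi(x) + v \cdot D_x\varphi(x) - L(x, v) > \frac{\theta}{2}. \]

The trajectory x, with x(0) = x0 and $\dot{x} = v$, stays near x0 for t ∈ [−T, 0], provided T > 0 is sufficiently small. Therefore

\begin{align*}
e^{-\alpha T}u(x(-T)) - u(x_0) &\le e^{-\alpha T}\varphi(x(-T)) - \varphi(x_0) \\
&= -\int_{-T}^0 e^{\alpha t}\left( \alpha\varphi(x(t)) + \dot{x}(t) \cdot D_x\varphi(x(t)) \right) dt \\
&\le -\frac{\theta}{2}\int_{-T}^0 e^{\alpha t}\,dt - \int_{-T}^0 e^{\alpha t}L(x, \dot{x})\,dt.
\end{align*}

This yields

\[ u(x_0) \ge \frac{\theta}{2}\int_{-T}^0 e^{\alpha t}\,dt + \int_{-T}^0 e^{\alpha t}L(x, \dot{x})\,dt + e^{-\alpha T}u(x(-T)). \]

But since, by (??),

\[ u(x_0) \le \int_{-T}^0 e^{\alpha t}L(x, \dot{x})\,dt + e^{-\alpha T}u(x(-T)), \]

this yields the contradiction $\frac{\theta}{2}\,\frac{1 - e^{-\alpha T}}{\alpha} \le 0$ with T > 0.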

Exercise 170. Show that the function V (x, t) given by the Lax-Hopf

formula is Lipschitz in x for each t < t1, regardless of the smoothness

of the terminal data (note, however that the constant depends on t).

Exercise 171. Use the Lax-Hopf formula to determine the viscosity solution of

\[ -u_t + u_x^2 = 0 \]

for t < 0 and $u(x, 0) = \pm x^2 - 2x$.

Exercise 172. Use the Lax-Hopf formula to determine the viscosity solution of

\[ -u_t + u_x^2 = 0 \]

for t < 0 and

\[ u(x, 0) = \begin{cases} 0 & \text{if } x < 0, \\ x^2 & \text{if } 0 \le x \le 1, \\ 2x - 1 & \text{if } x > 1. \end{cases} \]


To establish uniqueness of viscosity solutions we need the following

lemma:

Lemma 122. Let V be a viscosity solution of

−Vt +H(DxV, x) = 0

in [0, T ] × Rn and φ a C1 function. If V − φ has a maximum (resp.

minimum) at (x0, t0) ∈ Rd × (0, T ] then

(125) −φt(x0, t0)+H(Dxφ(x0, t0), x0) ≤ 0 (resp. ≥ 0) at (x0, t0).

Remark: The important point is that the inequality is valid even for

some non-interior points (t0 = 0).

Proof. Only the case t0 = 0 requires proof since in the other case

the maximum is interior and then the viscosity property (the definition

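of viscosity solution) yields the inequality. Consider

\[ \tilde{\phi} = \phi - \frac{\varepsilon}{t}. \]

Then $V - \tilde{\phi}$ has an interior local maximum at (xε, tε) with tε < 0. Furthermore, (xε, tε) → (x0, 0) as ε → 0. At the point (xε, tε) we have

\[ \phi_t(x_\varepsilon, t_\varepsilon) + \frac{\varepsilon}{t_\varepsilon^2} + H(D_x\phi(x_\varepsilon, t_\varepsilon), x_\varepsilon) \le 0, \]

that is, since $\frac{\varepsilon}{t_\varepsilon^2} \ge 0$,

\[ \phi_t(x_0, 0) + H(D_x\phi(x_0, 0), x_0) \le 0. \]

Analogously we obtain the opposite inequality, using $\tilde{\phi} = \phi + \frac{\varepsilon}{t}$.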

Finally we establish uniqueness of viscosity solutions:

Theorem 123 (Uniqueness). Suppose H satisfies

|H(p, x)−H(q, x)| ≤ C(|p|+ |q|)|p− q|

|H(p, x)−H(p, y)| ≤ C|x− y|(C +H(p, x))

Then the value function is the unique viscosity solution to the Hamilton-

Jacobi equation

−Vt +H(DxV, x) = 0


that satisfies the terminal condition V (x, T ) = ψ(x).

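Proof. Let $V$ and $\tilde{V}$ be two viscosity solutions with

\[ \sup_{-T \le t \le 0} (V - \tilde{V}) = \sigma > 0. \]

For 0 < ε, λ < 1 we define

\[ \psi(x, y, t, s) = V(x, t) - \tilde{V}(y, s) - \lambda(t + s + 2T) - \frac{1}{\varepsilon^2}\left( |x - y|^2 + |t - s|^2 \right) - \varepsilon\left( |x|^2 + |y|^2 \right). \]

When ε, λ are sufficiently small we have

\[ \max \psi(x, y, t, s) = \psi(x_{\varepsilon,\lambda}, y_{\varepsilon,\lambda}, t_{\varepsilon,\lambda}, s_{\varepsilon,\lambda}) > \frac{\sigma}{2}. \]

Since $\psi(x_{\varepsilon,\lambda}, y_{\varepsilon,\lambda}, t_{\varepsilon,\lambda}, s_{\varepsilon,\lambda}) \ge \psi(0, 0, -T, -T)$, and both $V$ and $\tilde{V}$ are bounded, we have

\[ |x_{\varepsilon,\lambda} - y_{\varepsilon,\lambda}|^2 + |t_{\varepsilon,\lambda} - s_{\varepsilon,\lambda}|^2 \le C\varepsilon^2 \]

and

\[ \varepsilon\left( |x_{\varepsilon,\lambda}|^2 + |y_{\varepsilon,\lambda}|^2 \right) \le C. \]

From these estimates and the fact that $V$ and $\tilde{V}$ are continuous, it then follows that

\[ \frac{|x_{\varepsilon,\lambda} - y_{\varepsilon,\lambda}|^2 + |t_{\varepsilon,\lambda} - s_{\varepsilon,\lambda}|^2}{\varepsilon^2} = o(1) \]

as ε → 0.

Denote by $\omega$ and $\tilde{\omega}$ the moduli of continuity of $V$ and $\tilde{V}$. Then

\begin{align*}
\frac{\sigma}{2} &\le V(x_{\varepsilon,\lambda}, t_{\varepsilon,\lambda}) - \tilde{V}(y_{\varepsilon,\lambda}, s_{\varepsilon,\lambda}) \\
&= V(x_{\varepsilon,\lambda}, t_{\varepsilon,\lambda}) - V(x_{\varepsilon,\lambda}, -T) + V(x_{\varepsilon,\lambda}, -T) - \tilde{V}(x_{\varepsilon,\lambda}, -T) \\
&\quad + \tilde{V}(x_{\varepsilon,\lambda}, -T) - \tilde{V}(x_{\varepsilon,\lambda}, s_{\varepsilon,\lambda}) + \tilde{V}(x_{\varepsilon,\lambda}, s_{\varepsilon,\lambda}) - \tilde{V}(y_{\varepsilon,\lambda}, s_{\varepsilon,\lambda}) \\
&\le \omega(T + t_{\varepsilon,\lambda}) + \tilde{\omega}(T + s_{\varepsilon,\lambda}) + \tilde{\omega}(o(\varepsilon)).
\end{align*}

Therefore, if ε is sufficiently small, $T + t_{\varepsilon,\lambda} > \mu > 0$, uniformly in ε.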

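Let $\phi$ be given by

\[ \phi(x, t) = \tilde{V}(y_{\varepsilon,\lambda}, s_{\varepsilon,\lambda}) + \lambda(2T + t + s_{\varepsilon,\lambda}) + \frac{1}{\varepsilon^2}\left( |x - y_{\varepsilon,\lambda}|^2 + |t - s_{\varepsilon,\lambda}|^2 \right) + \varepsilon\left( |x|^2 + |y_{\varepsilon,\lambda}|^2 \right). \]

Then, the difference

\[ V(x, t) - \phi(x, t) \]

achieves a maximum at $(x_{\varepsilon,\lambda}, t_{\varepsilon,\lambda})$.

Similarly, for $\tilde{\phi}$ given by

\[ \tilde{\phi}(y, s) = V(x_{\varepsilon,\lambda}, t_{\varepsilon,\lambda}) - \lambda(2T + t_{\varepsilon,\lambda} + s) - \frac{1}{\varepsilon^2}\left( |x_{\varepsilon,\lambda} - y|^2 + |t_{\varepsilon,\lambda} - s|^2 \right) - \varepsilon\left( |x_{\varepsilon,\lambda}|^2 + |y|^2 \right), \]

the difference

\[ \tilde{V}(y, s) - \tilde{\phi}(y, s) \]

has a minimum at $(y_{\varepsilon,\lambda}, s_{\varepsilon,\lambda})$.

Therefore

\[ \phi_t(x_{\varepsilon,\lambda}, t_{\varepsilon,\lambda}) + H(D_x\phi(x_{\varepsilon,\lambda}, t_{\varepsilon,\lambda}), x_{\varepsilon,\lambda}) \le 0 \]

and

\[ \tilde{\phi}_s(y_{\varepsilon,\lambda}, s_{\varepsilon,\lambda}) + H(D_y\tilde{\phi}(y_{\varepsilon,\lambda}, s_{\varepsilon,\lambda}), y_{\varepsilon,\lambda}) \ge 0. \]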

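Simplifying, we have

\[ \lambda + 2\,\frac{t_{\varepsilon,\lambda} - s_{\varepsilon,\lambda}}{\varepsilon^2} + H\left( 2\,\frac{x_{\varepsilon,\lambda} - y_{\varepsilon,\lambda}}{\varepsilon^2} + 2\varepsilon x_{\varepsilon,\lambda}, \; x_{\varepsilon,\lambda} \right) \le 0 \tag{126} \]

and

\[ -\lambda + 2\,\frac{t_{\varepsilon,\lambda} - s_{\varepsilon,\lambda}}{\varepsilon^2} + H\left( 2\,\frac{x_{\varepsilon,\lambda} - y_{\varepsilon,\lambda}}{\varepsilon^2} - 2\varepsilon y_{\varepsilon,\lambda}, \; y_{\varepsilon,\lambda} \right) \ge 0. \tag{127} \]

From (126) we gather that

\[ H\left( 2\,\frac{x_{\varepsilon,\lambda} - y_{\varepsilon,\lambda}}{\varepsilon^2} + 2\varepsilon x_{\varepsilon,\lambda}, \; x_{\varepsilon,\lambda} \right) \le -\lambda + \frac{o(1)}{\varepsilon}. \tag{128} \]

By subtracting (126) from (127), and writing $q_{\varepsilon,\lambda} = 2\,\frac{x_{\varepsilon,\lambda} - y_{\varepsilon,\lambda}}{\varepsilon^2}$ to shorten the formulas, we have

\begin{align*}
2\lambda &\le H(q_{\varepsilon,\lambda} - 2\varepsilon y_{\varepsilon,\lambda}, y_{\varepsilon,\lambda}) - H(q_{\varepsilon,\lambda} + 2\varepsilon x_{\varepsilon,\lambda}, x_{\varepsilon,\lambda}) \\
&\le H(q_{\varepsilon,\lambda} - 2\varepsilon y_{\varepsilon,\lambda}, y_{\varepsilon,\lambda}) - H(q_{\varepsilon,\lambda} - 2\varepsilon y_{\varepsilon,\lambda}, x_{\varepsilon,\lambda}) \\
&\quad + H(q_{\varepsilon,\lambda} - 2\varepsilon y_{\varepsilon,\lambda}, x_{\varepsilon,\lambda}) - H(q_{\varepsilon,\lambda} + 2\varepsilon x_{\varepsilon,\lambda}, x_{\varepsilon,\lambda}) \\
&\le \left( C + C\,H(q_{\varepsilon,\lambda} + 2\varepsilon x_{\varepsilon,\lambda}, x_{\varepsilon,\lambda}) \right) |x_{\varepsilon,\lambda} - y_{\varepsilon,\lambda}| \\
&\quad + C\varepsilon \left( \left| q_{\varepsilon,\lambda} + 2\varepsilon x_{\varepsilon,\lambda} \right| + \left| q_{\varepsilon,\lambda} - 2\varepsilon y_{\varepsilon,\lambda} \right| \right) |x_{\varepsilon,\lambda} - y_{\varepsilon,\lambda}| \\
&\le \left( \frac{o(1)}{\varepsilon} + C \right) \left( |x_{\varepsilon,\lambda} - y_{\varepsilon,\lambda}| + |t_{\varepsilon,\lambda} - s_{\varepsilon,\lambda}| \right),
\end{align*}

when ε → 0, which is a contradiction.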

10. Stationary problems

In this section we consider optimal control stationary problems.

These problems arise in stationary steady state control and also in the

infinite horizon discounted cost problem. In this section we consider the calculus of variations setting; however, similar results hold for the bounded control setting.

We define the discounted cost function Jα, with discount rate α, as

\[ J_\alpha(x; u) = \int_0^\infty L(x(s), \dot{x}(s))\,e^{-\alpha s}\,ds. \]

In this case, the optimal trajectories x(·) satisfy the differential equation

\[ \dot{x} = u, \]

with the initial condition x(0) = x.

As before, the value function, uα, is given by

\[ u_\alpha(x) = \inf J_\alpha(x; u), \]

where the infimum is taken over all controls u ∈ L∞loc.

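The dynamic programming principle in this case is the following.

Proposition 124. For each t > 0,

\[ u_\alpha(x) = \inf_{x(0) = x} \left[ \int_0^t L(x(s), \dot{x}(s))\,e^{-\alpha s}\,ds + e^{-\alpha t}u_\alpha(x(t)) \right]. \]

Proof. Observe that

\begin{align*}
u_\alpha(x) &= \inf_{x(0) = x} \left[ \int_0^t L(x(s), \dot{x}(s))\,e^{-\alpha s}\,ds + e^{-\alpha t}\int_t^\infty L(x(s), \dot{x}(s))\,e^{-\alpha(s - t)}\,ds \right] \\
&\ge \inf_{x(0) = x} \left[ \int_0^t L(x(s), \dot{x}(s))\,e^{-\alpha s}\,ds + e^{-\alpha t}u_\alpha(x(t)) \right].
\end{align*}

The other inequality is left as an exercise:

\[ u_\alpha(x) \le \inf_{x(0) = x} \left[ \int_0^t L(x(s), \dot{x}(s))\,e^{-\alpha s}\,ds + e^{-\alpha t}u_\alpha(x(t)) \right]. \]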

Because of the dynamic programming, it is clear that

V (x, t) = e−αtuα(x)

is a viscosity solution of

−Vt + e−αtH(eαtDxV, x) = 0.

This then implies

Corollary 125. uα is a viscosity solution of

αuα +H(Dxuα, x) = 0.

Furthermore

Corollary 126. If uα is differentiable then it is a solution of

(129) H(Dxuα, x) + αuα = 0.


Exercise 174. Show that the optimal trajectories for the discounted cost infinite horizon problem are solutions to the (negatively damped) Euler-Lagrange equation

\[ \frac{d}{dt}\frac{\partial L}{\partial \dot{x}} - \alpha\frac{\partial L}{\partial \dot{x}} - \frac{\partial L}{\partial x} = 0. \tag{130} \]

If x(t) satisfies (130), the energy H may not be conserved.

Example 45. Let $L(x, v) = \frac{v^2}{2} + \cos x$. Then (130) reads

\[ \ddot{x} - \alpha\dot{x} + \sin x = 0. \]

When α = 0 the energy

\[ H = \frac{\dot{x}^2}{2} - \cos x \]

is constant in time, but for α > 0 we have

\[ \frac{dH}{dt} = \alpha\dot{x}^2. \]

Therefore, the energy increases in time unless $\dot{x} = 0$.
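The energy behaviour in Example 45 can be illustrated numerically. The Python sketch below integrates the negatively damped pendulum equation with a classical fourth-order Runge-Kutta scheme; the initial data, step size, number of steps, and the value α = 0.1 are arbitrary illustrative choices, and the function names are hypothetical.

```python
import math


def energy(x, v):
    """Energy H = v^2/2 - cos x associated with L = v^2/2 + cos x."""
    return 0.5 * v * v - math.cos(x)


def rk4_step(x, v, alpha, dt):
    """One classical Runge-Kutta step for x'' = alpha * x' - sin x."""
    def f(x, v):
        return v, alpha * v - math.sin(x)

    k1x, k1v = f(x, v)
    k2x, k2v = f(x + 0.5 * dt * k1x, v + 0.5 * dt * k1v)
    k3x, k3v = f(x + 0.5 * dt * k2x, v + 0.5 * dt * k2v)
    k4x, k4v = f(x + dt * k3x, v + dt * k3v)
    x_new = x + dt * (k1x + 2 * k2x + 2 * k3x + k4x) / 6.0
    v_new = v + dt * (k1v + 2 * k2v + 2 * k3v + k4v) / 6.0
    return x_new, v_new


def energy_history(alpha, x0=0.5, v0=0.0, dt=1e-3, steps=5000):
    """Energies along the numerical trajectory started at (x0, v0)."""
    x, v = x0, v0
    history = [energy(x, v)]
    for _ in range(steps):
        x, v = rk4_step(x, v, alpha, dt)
        history.append(energy(x, v))
    return history


conserved = energy_history(alpha=0.0)  # energy should stay constant
growing = energy_history(alpha=0.1)    # energy should be nondecreasing
```

With α = 0 the computed energy stays constant up to discretization error, while with α = 0.1 it grows along the trajectory, in agreement with dH/dt = α ẋ².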

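Proposition 127. Suppose that x(t) satisfies (130). Then

\[ \frac{dH}{dt} = \alpha D_vL(x(t), \dot{x}(t)) \cdot \dot{x}(t). \]

Proof. Let

\[ p(t) = -D_vL(x(t), \dot{x}(t)). \]

We have

\begin{align*}
\frac{dH}{dt} &= D_pH \cdot \dot{p} + D_xH \cdot \dot{x} \\
&= \dot{x} \cdot (\alpha D_vL + D_xL) - D_xL \cdot \dot{x} = \alpha D_vL \cdot \dot{x}.
\end{align*}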

We assume now that H is Zn periodic in x. We will show that, as α → 0, the solution uα converges (up to constants) to a solution of

\[ H(D_xu, x) = \bar{H} \tag{131} \]

for some constant $\bar{H}$.


Theorem 128. Let uα be a viscosity solution to

αuα +H(Duα, x) = 0.

Then αuα is uniformly bounded and uα is Lipschitz, uniformly in α.

Proof. First let xM be the point where uα(x) has a global max-

imum, and xm a point of global minimum. Then, by the viscosity

property, i.e., the definition of the viscosity solution, we have

αuα(xM) +H(0, xM) ≤ 0, αuα(xm) +H(0, xm) ≥ 0,

which yields that αuα is uniformly bounded.

Now we establish the Lipschitz bound. Observe that if uα is Lips-

chitz, then there exists M > 0 such that

uα(x)− uα(y) ≤M |x− y|,

for all x, y. By contradiction, assume that for every M > 0 there exists

x and y such that

uα(x)− uα(y) > M |x− y|.

Let ϕ(x) = uα(y) + M|x − y|. Then uα(x) − ϕ(x) has a maximum at some point x ≠ y. Therefore

\[ \alpha u_\alpha(x) + H\left( M\,\frac{x - y}{|x - y|}, \; x \right) \le 0, \]

which by the coercivity of H yields a contradiction if M is sufficiently large.

Example 46. We can also use calculus of variations methods directly to show that there exists C, independent of α, such that

\[ u_\alpha \le \frac{C}{\alpha}. \]

Indeed, since L(x, 0) is bounded,

\[ u_\alpha(x) \le J_\alpha(x, 0) \le \int_0^\infty L(x, 0)\,e^{-\alpha s}\,ds \le \frac{C}{\alpha}. \]


Theorem 129. (Stability theorem for viscosity solutions) Assume that, for α > 0, the function uα is a viscosity solution of Hα(u, Du, x) = 0. Let Hα → H uniformly on compact sets, and uα → u uniformly. Then u is a viscosity solution of H(u, Du, x) = 0.

Proof. Suppose u − ϕ has a strict local maximum (resp. minimum) at a point x0. Then there exists xα → x0 such that uα − ϕ has a local maximum (resp. minimum) at xα. Then

Hα(uα(xα), Dϕ(xα), xα) ≤ 0 (resp. ≥ 0).

Letting α → 0 finishes the proof.

As demonstrated, in the context of homogenization of Hamilton-Jacobi equations, in the classic but unpublished paper by Lions, Papanicolaou and Varadhan [Lio82], it is possible to construct, using the previous result, viscosity solutions to the stationary Hamilton-Jacobi equation

\[ H(Du, x) = \bar{H}. \tag{132} \]

Theorem 130 (Lions, Papanicolaou, Varadhan). There exist a number $\bar{H}$ and a function u(x), Zd periodic in x, that solves (132) in the viscosity sense.

Proof. Since uα − min uα is periodic, equicontinuous, and uniformly bounded, it converges, up to subsequences, to a function u. Moreover $u_\alpha \le \frac{C}{\alpha}$, thus αuα converges uniformly, up to subsequences, to a constant, which we denote by $-\bar{H}$. Then, the stability theorem for viscosity solutions, theorem 129, implies that u is a viscosity solution of

\[ H(Du, x) = \bar{H}. \]

Theorem 131. Let u : Td → R be a viscosity solution to

H(Du, x) = C.

Then u is Lipschitz, and the Lipschitz constant does not depend on u.


Proof. First observe that from the fact that u = u − 0 achieves

maximum and minimum in Td we have

\[ \min_{x \in \mathbb{T}^d} H(0, x) \le C \le \max_{x \in \mathbb{T}^d} H(0, x). \]

Then, it is enough to argue as in the proof of Theorem 128.

Exercise 175. Let u : R → R be continuous and piecewise differen-

tiable (with left and right limits for the derivative at any point). Show

that u is a viscosity solution of

\[ H(D_xu, x) = \bar{H} \]

if

1. u satisfies the equation almost everywhere;

2. whenever Dxu is discontinuous then Dxu(x−) > Dxu(x+).

Example 47 (One dimensional pendulum). The Hamiltonian corresponding to a one-dimensional pendulum with unit mass and unit length is

\[ H(p, x) = \frac{p^2}{2} - \cos 2\pi x. \]

In this case, it is not difficult to determine explicitly the solution to the Hamilton-Jacobi equation

\[ H(P + D_xu, x) = \bar{H}(P), \]

where P is a real parameter. In fact, for P ∈ R and almost every x ∈ R, the solution u(P, x) satisfies

\[ \frac{(P + D_xu)^2}{2} = \bar{H}(P) + \cos 2\pi x. \]

Consequently, $\bar{H}(P) \ge 1$ and, therefore,

\[ D_xu = -P \pm \sqrt{2(\bar{H}(P) + \cos 2\pi x)}, \quad \text{for a.e. } x \in \mathbb{R}. \]

Thus

\[ u = \int_0^x \left( -P + s(y)\sqrt{2(\bar{H}(P) + \cos 2\pi y)} \right) dy + u(0), \]

where |s(y)| = 1. Since H is convex in p and u is a viscosity solution, the only possible discontinuities of the derivative of u are the ones that satisfy Dxu(x−) − Dxu(x+) > 0, see exercise 175. Therefore s can change sign from 1 to −1 at any point; however, jumps from −1 to 1 can only happen when $\sqrt{2(\bar{H}(P) + \cos 2\pi x)} = 0$.

Since we are looking for 1-periodic solutions, there are only two cases to consider. In the first, $\bar{H}(P) > 1$ and the solution is C1, since $\sqrt{2(\bar{H}(P) + \cos 2\pi y)}$ never vanishes. In this case $\bar{H}(P)$ can be determined as a function of P through the equation

\[ P = \pm\int_0^1 \sqrt{2(\bar{H}(P) + \cos 2\pi y)}\,dy. \]

It is easy to check that this equation has a unique solution $\bar{H}(P)$ whenever

\[ |P| > \int_0^1 \sqrt{2(1 + \cos 2\pi y)}\,dy, \]

that is,

\[ |P| > \frac{4}{\pi}. \]

The second case occurs whenever the last inequality does not hold, that is, $\bar{H}(P) = 1$, and thus s(x) can have discontinuities. In fact, s(x) jumps from −1 to 1 whenever $x = \frac{1}{2} + k$, with k ∈ Z, and there exists a point x0 defined by the equation

\[ \int_0^1 s(y)\sqrt{2(1 + \cos 2\pi y)}\,dy = P, \]

such that s(x) jumps from 1 to −1 at x0 + k, k ∈ Z.
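The two cases of Example 47 can be explored numerically. The Python sketch below approximates the integral of √(2(h + cos 2πy)) over [0, 1] with a midpoint rule and, for |P| > 4/π, solves the integral equation for the effective Hamiltonian by bisection. The function names, the quadrature resolution `n`, and the bracket `1 + P²/2` (valid because the integral is at least √(2(h − 1))) are choices made for this illustration only.

```python
import math


def action_integral(h_bar, n=4000):
    """Midpoint rule for the integral of sqrt(2 (h_bar + cos 2 pi y)) over [0, 1]."""
    total = 0.0
    for i in range(n):
        y = (i + 0.5) / n
        total += math.sqrt(max(0.0, 2.0 * (h_bar + math.cos(2.0 * math.pi * y))))
    return total / n


def effective_hamiltonian(P, tol=1e-10):
    """H_bar(P): equal to 1 if |P| <= 4/pi, otherwise the root of the integral equation."""
    if abs(P) <= 4.0 / math.pi:
        return 1.0
    lo, hi = 1.0, 1.0 + 0.5 * P * P  # action_integral(hi) >= sqrt(2 (hi - 1)) = |P|
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if action_integral(mid) < abs(P):
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


threshold = action_integral(1.0)     # numerical value of the critical |P|
h_flat = effective_hamiltonian(1.0)  # |P| below the threshold
h_two = effective_hamiltonian(2.0)   # |P| above the threshold
```

The value `threshold` approximates 4/π, `h_flat` equals 1 on the flat part, and `h_two` is strictly larger than 1 and reproduces P = 2 when substituted back into the integral.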

Exercise 176. Let φ : Tn → R be a C1 function not identically con-

stant. Show that there exist two distinct viscosity solutions of

Dxu · (Dxu−Dxφ) = 0,

whose difference is not constant.

5

Duality theory

This chapter is dedicated to the study of duality theory in optimiza-

tion problems. The main applications we study are infinite dimensional

linear programming problems such as the Monge-Kantorowich and Mather

problems.

1. Model problems

In this section we discuss certain minimization problems which in-

volve linear objective functions under linear constraints, that is, infinite

dimensional linear programming problems. Surprisingly there are deep

relations between these problems and certain nonlinear partial differ-

ential equations.

1.1. Mather problem.

1.1.1. Classical Mather problem. Let Td be the d-dimensional stan-

dard torus. Consider a Lagrangian L(x, v), L : Td × Rd → R, smooth

in both variables, strictly convex in the velocity v, and coercive, that

is,

\[ \lim_{|v|\to\infty} \inf_x \frac{L(x, v)}{|v|} = +\infty. \]

The minimal action principle of classical mechanics asserts that the

trajectories x(t) of mechanical systems are critical points or minimizers

of the action

\[ \int_0^T L(x, \dot{x})\,ds. \tag{133} \]

These critical points are then solutions to the Euler-Lagrange equations

\[ \frac{d}{dt} D_vL(x, \dot{x}) - D_xL(x, \dot{x}) = 0. \tag{134} \]

Mather’s problem is a relaxed version of this variational principle, and

consists in minimizing the action

\[ \int_{\mathbb{T}^d \times \mathbb{R}^d} L(x, v)\,d\mu(x, v) \tag{135} \]

among a suitable class of probability measures µ(x, v). Originally, in

[Mat91], this minimization was performed over all measures invariant

under the Euler-Lagrange equations (134). However, as realized by

[Mn96], it is more convenient to consider a larger class of measures,

the holonomic measures. It turns out that both problems are equivalent

as any holonomic minimizing measure is automatically invariant under

the Euler-Lagrange equations. In what follows, we will define this class

of measures and provide the motivation for it.

Let $x(t)$ be a trajectory on $\mathbb{T}^d$. Define a measure $\mu^T_x$ on $\mathbb{T}^d\times\mathbb{R}^d$ by its action on test functions $\psi(x,v)$, $\psi \in C_c(\mathbb{T}^d\times\mathbb{R}^d)$ (continuous with compact support), as follows:
\[ \langle \psi, \mu^T_x\rangle = \frac{1}{T}\int_0^T \psi\big(x(t),\dot x(t)\big)\,dt. \]
If $x(t)$ is globally Lipschitz, the family $\{\mu^T_x\}_{T>0}$ has support contained in a fixed compact set, and therefore is weakly-$*$ compact. Consequently one can extract a limit measure $\mu_x$ which encodes some of the asymptotic properties of the trajectory $x$.

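Let $\varphi \in C^1(\mathbb{T}^d)$. For $\psi(x, v) = v \cdot D\varphi(x)$ we have
\[ \langle \psi, \mu_x\rangle = \lim_{T\to\infty}\frac{1}{T}\int_0^T \dot x\cdot D\varphi(x)\,dt = \lim_{T\to\infty}\frac{\varphi(x(T)) - \varphi(x(0))}{T} = 0. \]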
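The vanishing of this time average can be checked numerically. The following is a minimal sketch, assuming a hypothetical one-dimensional trajectory `x_traj` and a hypothetical periodic test function `phi` (neither appears in the text); it compares the time average of $\dot x \cdot D\varphi(x)$ with the boundary term $(\varphi(x(T)) - \varphi(x(0)))/T$, which is of order $1/T$ because $\varphi$ is bounded.

```python
import math

def phi(x):
    # Hypothetical 1-periodic test function on the torus T^1.
    return math.sin(2 * math.pi * x) + 0.5 * math.cos(4 * math.pi * x)

def dphi(x):
    # Derivative of phi.
    return 2 * math.pi * math.cos(2 * math.pi * x) - 2 * math.pi * math.sin(4 * math.pi * x)

def x_traj(t):
    # Hypothetical globally Lipschitz trajectory, read modulo 1 on the torus.
    return 0.3 * t + 0.2 * math.sin(t)

def x_dot(t):
    # Velocity of the trajectory above.
    return 0.3 + 0.2 * math.cos(t)

def time_average(T, steps_per_unit=1000):
    # Midpoint rule for (1/T) * integral_0^T x'(t) * phi'(x(t)) dt.
    n = int(T * steps_per_unit)
    dt = T / n
    total = sum(x_dot((i + 0.5) * dt) * dphi(x_traj((i + 0.5) * dt)) for i in range(n))
    return total * dt / T

def boundary_term(T):
    # (phi(x(T)) - phi(x(0))) / T, bounded by 2 * sup|phi| / T.
    return (phi(x_traj(T)) - phi(x_traj(0.0))) / T
```

Evaluating `time_average(T)` for increasing `T` should show the decay predicted by the fundamental theorem of calculus; the specific trajectory is only an illustration.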

Let $\gamma(v)$ be a continuous function, $\gamma : \mathbb{R}^d \to \mathbb{R}$, such that $\inf \frac{\gamma(v)}{1+|v|} > 0$ and $\lim_{|v|\to\infty} \frac{\gamma(v)}{1+|v|} = \infty$. A measure $\mu$ in $\mathbb{T}^d \times \mathbb{R}^d$ is admissible if $\int_{\mathbb{T}^d\times\mathbb{R}^d} \gamma(v)\,d\mu < \infty$. An admissible measure $\mu$ on $\mathbb{T}^d \times \mathbb{R}^d$ is called holonomic if for all $\varphi \in C^1(\mathbb{T}^d)$ we have
\[ \int_{\mathbb{T}^d\times\mathbb{R}^d} v\cdot D\varphi\,d\mu = 0. \tag{136} \]
Mather’s problem consists in minimizing (135) over all probability measures that satisfy (136). As pointed out before, however, this problem was introduced by Mañé in [Mn96] in his study of Mather’s original problem [Mat91].

1.1.2. Stochastic Mather problem. In the framework of stochastic optimal control one is led to replace deterministic trajectories by stochastic processes. Suppose that $x(t)$ satisfies the stochastic differential equation
\[ dx = \nu\,dt + \sigma\,dW, \]
in which $\nu$ is a bounded, progressively measurable process, $\sigma > 0$, and $W$ is an $n$-dimensional Brownian motion. One would like to minimize the average action
\[ E\int_0^T L(x,\nu)\,dt. \]

As before, one can associate to these stochastic processes probability measures $\mu$ in $\mathbb{T}^n \times \mathbb{R}^n$ defined as
\[ \int_{\mathbb{T}^n\times\mathbb{R}^n} \phi(x,v)\,d\mu = \lim_{T\to\infty}\frac{1}{T}\int_0^T \phi(x(t),\nu(t))\,dt, \]
in which the limit is taken through an appropriate subsequence.

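Dynkin’s formula is the analog for stochastic processes of the fundamental theorem of calculus. This formula, applied to $\varphi(x(t))$, states that
\[ E\big[\varphi(x(T)) - \varphi(x(0))\big] = E\int_0^T \Big[\nu\cdot D_x\varphi(x(t)) + \frac{\sigma^2}{2}\Delta\varphi(x(t))\Big]dt. \]
This identity implies
\[ \int_{\mathbb{T}^n\times\mathbb{R}^n} \Big[v\cdot D_x\varphi(x) + \frac{\sigma^2}{2}\Delta\varphi(x)\Big]d\mu = 0, \]
for all $\varphi : \mathbb{T}^n \to \mathbb{R}$ of class $C^2$.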


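The stochastic Mather problem [Gom02a] consists in minimizing
\[ \int_{\mathbb{T}^n\times\mathbb{R}^n} L(x,v)\,d\mu, \]
over all probability measures $\mu$ on $\mathbb{T}^n \times \mathbb{R}^n$ that satisfy
\[ \int_{\mathbb{T}^n\times\mathbb{R}^n} \Big[v\cdot D_x\varphi(x) + \frac{\sigma^2}{2}\Delta\varphi(x)\Big]d\mu = 0, \]
for all $\varphi : \mathbb{T}^n \to \mathbb{R}$ of class $C^2$.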

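1.1.3. Discrete Mather problem. Also interesting is the discrete case, in which the trajectories are replaced by sequences $(x_n, v_n)$ that satisfy $x_{n+1} = x_n + v_n$. In this case, if the sequence $v_n$ is globally bounded, for instance, we can associate to this sequence a measure $\mu$ in $\mathbb{T}^n\times\mathbb{R}^n$ through
\[ \int_{\mathbb{T}^n\times\mathbb{R}^n} \phi(x,v)\,d\mu = \lim_{N\to\infty}\frac{1}{N}\sum_{n=1}^N \phi(x_n,v_n), \]
in which the limit is taken through an appropriate subsequence.

Since for all continuous functions $\varphi : \mathbb{T}^n \to \mathbb{R}$ we have
\[ \sum_{n=1}^N \big[\varphi(x_n+v_n) - \varphi(x_n)\big] = \varphi(x_{N+1}) - \varphi(x_1), \]
we obtain
\[ \int_{\mathbb{T}^n\times\mathbb{R}^n} \big[\varphi(x+v) - \varphi(x)\big]\,d\mu = 0. \]
Therefore, we propose the discrete Mather problem, which consists in minimizing
\[ \int_{\mathbb{T}^n\times\mathbb{R}^n} L(x,v)\,d\mu, \]
over all probability measures $\mu$ in $\mathbb{T}^n \times \mathbb{R}^n$ that satisfy
\[ \int_{\mathbb{T}^n\times\mathbb{R}^n} \big[\varphi(x+v) - \varphi(x)\big]\,d\mu = 0, \]
for all continuous functions $\varphi : \mathbb{T}^n \to \mathbb{R}$.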
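The telescoping identity behind the discrete holonomy constraint is easy to verify on a computer. Below is a minimal sketch in one dimension, assuming a hypothetical test function `phi` and an arbitrary bounded velocity sequence (both are illustrative choices, not taken from the text).

```python
import math

def phi(x):
    # Hypothetical continuous 1-periodic test function.
    return math.cos(2 * math.pi * x)

def orbit(x1, velocities):
    # Builds x_1, ..., x_{N+1} from x_{n+1} = x_n + v_n.
    xs = [x1]
    for v in velocities:
        xs.append(xs[-1] + v)
    return xs

def holonomy_sum(x1, velocities):
    # Sum over n of phi(x_n + v_n) - phi(x_n); it telescopes.
    xs = orbit(x1, velocities)
    return sum(phi(x + v) - phi(x) for x, v in zip(xs, velocities))
```

Dividing `holonomy_sum` by the number of steps $N$ gives a quantity bounded by $2\sup|\varphi|/N$, which is why the limit measure satisfies the constraint.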


1.1.4. Generalized Mather problem. To state the generalized Mather problem, we must now make our framework precise. Let $U \subset \mathbb{R}^m$ be a non-empty closed convex set. Assume that, for some $k \geq 0$ (usually $k = 0, 1, 2$), there exists a linear operator $A^v : C^k(\mathbb{T}^n) \to C(\mathbb{T}^n \times U)$ which satisfies the following two conditions. The first one is that for each fixed $\varphi \in C^k(\mathbb{T}^n)$ we have
\[ |A^v\varphi| \leq C_\varphi(1+|v|), \]
uniformly in $\mathbb{T}^n \times U$, which, if $U$ is bounded, means simply that $|A^v\varphi|$ is bounded. The second condition is that for $\varphi \in C^k(\mathbb{T}^n)$ the mapping $(x, v) \mapsto A^v\varphi$ is continuous in $\mathbb{T}^n \times U$.

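We assume that there exists another operator $B$ defined in $C^k(\mathbb{T}^n)$ which satisfies the following compatibility conditions with $A^v$:
\[ A^v\kappa = B\kappa, \tag{137} \]
for any $\kappa \in \mathbb{R}$, and that, for any given probability measure $\nu$ on $\mathbb{T}^n$, there exists a probability measure $\mu_\nu$ in $\mathbb{T}^n \times U$ such that
\[ \int_{\mathbb{T}^n\times U} A^v\varphi\,d\mu_\nu = \int_{\mathbb{T}^n} B\varphi\,d\nu, \tag{138} \]
for all $\varphi \in C^k(\mathbb{T}^n)$.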

The Lagrangian $L(x, v) : \mathbb{T}^n \times U \to \mathbb{R}$ is continuous, convex in $v$, and bounded below. If $U$ is bounded, no further hypotheses are required; if $U$ is unbounded, we assume that, uniformly in $x$,
\[ \lim_{|v|\to\infty}\frac{L(x,v)}{|v|} = \infty. \]

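The generalized Mather problem consists in minimizing
\[ \int_{\mathbb{T}^n\times U} L(x,v)\,d\mu, \tag{139} \]
over all probability measures $\mu$ in $\mathbb{T}^n \times U$ that satisfy the constraint
\[ \int_{\mathbb{T}^n\times U} A^v\varphi\,d\mu = \int_{\mathbb{T}^n} B\varphi\,d\nu, \tag{140} \]
for all functions $\varphi : \mathbb{T}^n \to \mathbb{R}$ with appropriate regularity.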


1.2. Monge-Kantorowich problem. The Monge-Kantorowich optimal mass transport problem, see [Eva99] or [Vil03b], is the following: given two positive measures $\mu^+$ and $\mu^-$ in $\mathbb{R}^n$ which satisfy the mass balance condition
\[ \int_{\mathbb{R}^n} d\mu^+ = \int_{\mathbb{R}^n} d\mu^-, \]
one looks for a function $s : \mathbb{R}^n \to \mathbb{R}^n$ which transports $\mu^+$ into $\mu^-$, that is,
\[ \int_{\mathbb{R}^n} \varphi(s(x))\,d\mu^+ = \int_{\mathbb{R}^n} \varphi(y)\,d\mu^-, \]
for each $\varphi \in C_c^\infty(\mathbb{R}^n)$ (more compactly, we write this condition as $s_\#\mu^+ = \mu^-$), and which furthermore minimizes the total transport cost
\[ \frac{1}{2}\int_{\mathbb{R}^n} |x - s(x)|^2\,d\mu^+(x). \]

Unfortunately, proving directly that such a mapping exists is a hard problem, and we will instead consider a relaxed version of the problem.

Given a mapping $s$ for which $s_\#\mu^+ = \mu^-$ we can define a measure $\pi$ in $\mathbb{R}^{2n}$ by
\[ \int_{\mathbb{R}^{2n}} \phi(x,y)\,d\pi = \int_{\mathbb{R}^n} \phi(x,s(x))\,d\mu^+. \]
Additionally, the marginals satisfy $\pi|_x = \mu^+$ and $\pi|_y = \mu^-$.

It is therefore natural to consider the relaxed Monge-Kantorowich problem, which consists in minimizing
\[ \frac{1}{2}\int_{\mathbb{R}^{2n}} |x-y|^2\,d\pi, \]
where the minimum is taken over all probability measures $\pi$ that satisfy $\pi|_x = \mu^+$ and $\pi|_y = \mu^-$, that is,
\[ \int_{\mathbb{R}^{2n}} \varphi(x)\,d\pi = \int_{\mathbb{R}^n} \varphi(x)\,d\mu^+, \]
and
\[ \int_{\mathbb{R}^{2n}} \psi(y)\,d\pi = \int_{\mathbb{R}^n} \psi(y)\,d\mu^-, \]
for all continuous functions $\varphi$ and $\psi$.
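For intuition, consider the special case in which $\mu^+$ and $\mu^-$ are sums of $N$ equal-mass atoms on the real line, so that a transport map is just a permutation of the atoms. The following minimal sketch (the function names and the sample points are illustrative assumptions) computes the optimal map cost by brute force and compares it with the monotone pairing of sorted points, which is known to be optimal for the quadratic cost in one dimension.

```python
from itertools import permutations

def transport_cost(xs, ys, sigma):
    # Cost (1/2) * sum_i |x_i - y_sigma(i)|^2 / N of the map x_i -> y_sigma(i).
    n = len(xs)
    return 0.5 * sum((xs[i] - ys[sigma[i]]) ** 2 for i in range(n)) / n

def best_map_cost(xs, ys):
    # Exhaustive search over all permutations (only feasible for small N).
    return min(transport_cost(xs, ys, s) for s in permutations(range(len(xs))))

def monotone_cost(xs, ys):
    # Cost of pairing the sorted source points with the sorted target points.
    n = len(xs)
    return 0.5 * sum((a - b) ** 2 for a, b in zip(sorted(xs), sorted(ys))) / n
```

Each permutation induces a plan $\pi$ that puts mass $1/N$ at the points $(x_i, y_{\sigma(i)})$, and this plan has the prescribed marginals.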


Our strategy is to first prove existence of a solution to the relaxed problem, which can be done under quite general assumptions, and only then to prove (whenever possible) that the support of the optimal plan is in fact a graph $(x, s(x))$ and, therefore, that there exists an optimal transport mapping. The next example shows that the existence of an optimal transport mapping can in fact fail:

Exercise 177. Let $\mu^+ = \delta_0$ and $\mu^- = \frac{1}{2}\delta_{-1} + \frac{1}{2}\delta_1$. Show that there does not exist a function $s$ which transports $\mu^+$ into $\mu^-$.

2. Some informal computations

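2.1. Mather problem. In Mather’s problem, both in the deterministic and in the stochastic cases, the constraint
\[ \int_{\mathbb{T}^n\times\mathbb{R}^n} \Big[v\cdot D_x\varphi(x) + \frac{\sigma^2}{2}\Delta\varphi(x)\Big]d\mu = 0 \]
($\sigma \geq 0$) is linear in $v$. Additionally, the Lagrangian is strictly convex in $v$. This implies that a minimizing measure has support in a graph $(x, \bar v(x))$. In fact, if the minimizing measure $\mu(x, v)$ were not supported in a graph, we could replace it by another measure $\tilde\mu$ given by
\[ \int_{\mathbb{T}^n\times\mathbb{R}^n} \phi(x,v)\,d\tilde\mu(x,v) = \int_{\mathbb{T}^n} \phi(x,\bar v(x))\,d\theta(x), \]
where, identifying informally $\mu$ and $\theta$ with their densities, $\bar v(x)$ is the average velocity at $x$,
\[ \bar v(x)\,\theta(x) = \int_{\mathbb{R}^n} v\,\mu(x,v)\,dv, \]
and $\theta$ is the projection of $\mu$ in the coordinate $x$,
\[ \int_{\mathbb{T}^n} \psi(x)\,d\theta(x) = \int_{\mathbb{T}^n\times\mathbb{R}^n} \psi(x)\,\mu(x,v)\,dv\,dx, \]
for all $\psi \in C(\mathbb{T}^n)$. Thus
\[ \int_{\mathbb{T}^n\times\mathbb{R}^n} \Big[v\cdot D_x\varphi(x) + \frac{\sigma^2}{2}\Delta\varphi(x)\Big]d\tilde\mu = 0. \]
Additionally, the convexity of $L$ in $v$ implies, by Jensen’s inequality,
\[ \int L\,d\tilde\mu \leq \int L\,d\mu. \]
If $L$ is strictly convex, the inequality is strict unless $v = \bar v(x)$, $\mu$ almost everywhere.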
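The averaging argument can be illustrated with a measure made of finitely many atoms. The sketch below is only an illustration under stated assumptions: the Lagrangian $|v|^2/2 - U(x)$ with a hypothetical potential, and atoms stored as triples `(x, v, weight)`. It collapses the velocities at each point $x$ to their weighted mean, which keeps every integral that is linear in $v$ unchanged and does not increase the action.

```python
import math
from collections import defaultdict

def potential(x):
    # Hypothetical periodic potential U.
    return math.cos(2 * math.pi * x)

def lagrangian(x, v):
    # L(x, v) = |v|^2 / 2 - U(x), strictly convex in v.
    return 0.5 * v * v - potential(x)

def collapse_to_graph(atoms):
    # Replaces all atoms sharing the same x by one atom at the mean velocity.
    mass = defaultdict(float)
    moment = defaultdict(float)
    for x, v, w in atoms:
        mass[x] += w
        moment[x] += w * v
    return [(x, moment[x] / mass[x], mass[x]) for x in mass]

def integrate(f, atoms):
    # Integral of f(x, v) against the atomic measure.
    return sum(w * f(x, v) for x, v, w in atoms)
```

With `f(x, v) = v * dphi(x)` for any test function, `integrate` returns the same value before and after the collapse, while the action strictly decreases whenever some point carries two different velocities.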

In conclusion:

Theorem 132. Let $L(x, v)$ be strictly convex in $v$ and $\mu$ a minimizing measure for Mather’s problem (deterministic or stochastic). Then $\mu$ is supported in a graph
\[ (x, v) = (x, \bar v(x)). \]
Additionally, the projection $\theta$ of $\mu$ in the coordinate $x$ satisfies
\[ -\nabla\cdot(\bar v(x)\theta(x)) + \frac{\sigma^2}{2}\Delta\theta = 0, \]
in the distribution sense.

In order to simplify the presentation we are going to assume that $L = \frac{|v|^2}{2} - U(x)$. Using formally Lagrange multipliers (see note after exercise 25), we conclude that Mather’s problem is equivalent to the problem without constraints
\[ \min_{\theta, v(x)} \int_{\mathbb{T}^n} \Big(\frac{|v|^2}{2} - U(x) + v\cdot D_x\varphi + \frac{\sigma^2}{2}\Delta\varphi + \overline{H}\Big)\theta\,dx. \]
The function $\varphi$ corresponds to the Lagrange multiplier for the holonomy condition and $\overline{H}$ to the constraint $\int_{\mathbb{T}^n} \theta = 1$.

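To obtain the Euler-Lagrange equation, we make the following variations
\[ v \to v + \varepsilon w, \qquad \theta \to \theta + \varepsilon\eta. \]
The variation in $v$ implies
\[ v = -D_x\varphi(x), \]
and the variation in $\theta$ implies
\[ \frac{|v|^2}{2} - U(x) + v\cdot D_x\varphi + \frac{\sigma^2}{2}\Delta\varphi + \overline{H} = 0. \]
Therefore
\[ -\frac{\sigma^2}{2}\Delta\varphi + H(D_x\varphi, x) = \overline{H}, \tag{141} \]
with
\[ H(p,x) = \frac{|p|^2}{2} + U(x). \]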

Exercise 178. Adapt the minimax principle from exercise 25 to Mather’s problem and formally verify the previous results.

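As an application, we are going to prove an estimate for the second derivatives of the solution of the Hamilton-Jacobi equation. In order to keep the presentation as elementary as possible we assume that the dimension is 1. We further assume that the solution to equation (141) is smooth enough for the equation to be differentiated twice in $x$:
\[ -\frac{\sigma^2}{2}\Delta(\varphi_{xx}) + D_x\varphi\,D_x(\varphi_{xx}) + |D_x\varphi_x|^2 + U_{xx} = 0. \]
Since $v = -D_x\varphi$, the constraint applied to the test function $\varphi_{xx}$ gives
\[ \int \Big[-\frac{\sigma^2}{2}\Delta(\varphi_{xx}) + D_x\varphi\,D_x(\varphi_{xx})\Big]d\mu = 0, \]
and therefore
\[ \int |D^2\varphi|^2\,d\mu \leq C. \]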

In section 5 we will make rigorous many of the ideas discussed in this section. Mather’s problem is an infinite dimensional linear programming problem. In general, as we have discussed for finite dimensional problems, one can use duality to gain a better understanding of the problem. For Mather’s problem (see exercise 178), the dual is given by
\[ \inf_\phi \sup_x \Big[-\frac{\sigma^2}{2}\Delta\phi + H(D_x\phi, x)\Big]. \]
The duality theory implies that the value of this infimum is
\[ -\int L\,d\mu. \]
On the other hand, this value is also the unique number $\overline{H}$ for which
\[ -\frac{\sigma^2}{2}\Delta u + H(D_xu, x) = \overline{H} \]
has a periodic solution $u$. To check this fact directly, let $u$ be a solution of (141). Then
\[ \inf_\phi \sup_x \Big[-\frac{\sigma^2}{2}\Delta\phi + H(D_x\phi,x)\Big] \leq \sup_x \Big[-\frac{\sigma^2}{2}\Delta u + H(D_xu,x)\Big] = \overline{H}. \]
Additionally, for each periodic function $\phi$, $u - \phi$ has a minimum at a point $x_0$. At this point, $D_xu = D_x\phi$ and $\Delta u \geq \Delta\phi$. Therefore
\[ \sup_x \Big[-\frac{\sigma^2}{2}\Delta\phi + H(D_x\phi,x)\Big] \geq -\frac{\sigma^2}{2}\Delta\phi(x_0) + H(D_x\phi(x_0),x_0) \geq -\frac{\sigma^2}{2}\Delta u(x_0) + H(D_xu(x_0),x_0) = \overline{H}. \]
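The comparison argument above can be tested numerically in dimension one. The sketch below is an illustration under assumptions that are not in the text: we pick a smooth periodic $u$ and a constant `H_BAR`, define the potential so that $u$ solves the equation exactly, and then check on a grid that every trial function $\phi$ gives a supremum that is at least `H_BAR`.

```python
import math

SIGMA = 0.5
H_BAR = 1.0  # chosen constant; the potential below makes u an exact solution

def trig(a, k, b):
    # First and second derivatives of phi(x) = a * sin(2*pi*k*x + b).
    w = 2 * math.pi * k
    d1 = lambda x: a * w * math.cos(w * x + b)
    d2 = lambda x: -a * w * w * math.sin(w * x + b)
    return d1, d2

du, ddu = trig(0.1, 1, 0.0)  # derivatives of the hypothetical solution u

def potential(x):
    # U chosen so that -sigma^2/2 u'' + |u'|^2/2 + U = H_BAR.
    return H_BAR - 0.5 * du(x) ** 2 + 0.5 * SIGMA ** 2 * ddu(x)

def sup_operator(d1, d2, n=4000):
    # Grid approximation of sup_x [ -sigma^2/2 phi'' + |phi'|^2/2 + U ].
    return max(
        -0.5 * SIGMA ** 2 * d2(i / n) + 0.5 * d1(i / n) ** 2 + potential(i / n)
        for i in range(n)
    )
```

For instance, `sup_operator(*trig(0.3, 1, 0.7))` should not fall below `H_BAR`, up to the grid error, while `sup_operator(du, ddu)` reproduces `H_BAR`.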

2.2. Monge-Kantorowich problem. To obtain formally the Euler-Lagrange equation for the Monge-Kantorowich problem we will suppose that both $\mu^+$ and $\mu^-$ have densities $\rho^+$ and $\rho^-$. Let $s(x)$ be an optimal mass transport map, and $\mu$ the measure in $\mathbb{R}^{2n}$ induced by $s(x)$, with marginals $\mu^\pm$. Let $w$ be a divergence free vector field in $\mathbb{R}^n$ and $\varphi_\tau$ the flow associated to the differential equation
\[ \frac{d}{d\tau}z = \frac{w(z)}{\rho^+(z)}. \]
Since $w$ has zero divergence,
\[ \nabla\cdot\Big(\rho^+\frac{d}{d\tau}\varphi_\tau\Big) = 0. \]
Therefore $(\varphi_\tau)_\#\mu^+ = \mu^+$. Define the measure $\mu_\tau$ in $\mathbb{R}^{2n}$ as
\[ \int \phi(x,y)\,d\mu_\tau = \int \phi(\varphi_\tau(x),y)\,d\mu. \]
Since $\mu_0 = \mu$, and $\mu$ is optimal, we have
\[ \frac{d}{d\tau}\int |x-y|^2\,d\mu_\tau\,\Big|_{\tau=0} = 0, \]
that is,
\[ 2\int (\varphi_\tau(x) - y)\cdot\frac{d}{d\tau}\varphi_\tau(x)\,d\mu\,\Big|_{\tau=0} = 0. \]
This implies
\[ \int (x - s(x))\cdot w(x)\,dx = 0. \]
This identity holds for all divergence free vector fields. Consequently, the function $x - s(x)$ is a gradient. Therefore
\[ s(x) = D_x\Psi(x), \]
for some $\Psi(x)$. The condition $s_\#\mu^+ = \mu^-$ is, by the change of variables formula, equivalent to
\[ \rho^+(x) = \rho^-(s(x))\,\det Ds(x), \]
which can be written as the Monge-Ampère equation
\[ \rho^+(x) = \rho^-(D\Psi(x))\,\det D^2\Psi(x). \]
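In one dimension the change of variables identity can be checked by hand or numerically. The following minimal sketch assumes a hypothetical increasing map $s(x) = (x + x^2)/2$ on $[0,1]$ (the derivative of the convex function $\Psi(x) = x^2/4 + x^3/6$), a uniform source density, and the target density $\rho^-(y) = 2/\sqrt{1+8y}$ obtained from the formula above; none of these choices come from the text.

```python
import math

def s(x):
    # Hypothetical transport map, increasing on [0, 1], with s(0)=0 and s(1)=1.
    return 0.5 * (x + x * x)

def ds(x):
    # Derivative s'(x) = Psi''(x).
    return 0.5 + x

def rho_plus(x):
    # Uniform source density on [0, 1].
    return 1.0

def rho_minus(y):
    # Target density solving rho_plus(x) = rho_minus(s(x)) * s'(x).
    return 2.0 / math.sqrt(1.0 + 8.0 * y)

def midpoint(f, a, b, n=20000):
    # Midpoint quadrature rule on [a, b].
    step = (b - a) / n
    return sum(f(a + (i + 0.5) * step) for i in range(n)) * step

def push_forward_gap(test):
    # |int test(s(x)) rho_plus(x) dx - int test(y) rho_minus(y) dy|.
    lhs = midpoint(lambda x: test(s(x)) * rho_plus(x), 0.0, 1.0)
    rhs = midpoint(lambda y: test(y) * rho_minus(y), 0.0, 1.0)
    return abs(lhs - rhs)
```

A small value of `push_forward_gap` for several test functions is consistent with $s_\#\mu^+ = \mu^-$ in this example.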

Exercise 179. Use the minimax principle, see exercise 25, to determine the dual of the Monge-Kantorowich problem.

Exercise 180. Consider the anti-optimal transport problem, which consists in determining the measure $\pi(x, y)$ with marginals $\mu_1$ and $\mu_2$ which maximizes
\[ \int_{\mathbb{R}^{2n}} |x-y|^2\,d\pi(x,y). \]
Determine its dual.

Exercise 181. Use the minimax principle to determine the dual of the problem
\[ \min \int_{\mathbb{R}^{2n}} c(x,y)\,\pi(x,y)\,dx\,dy \]
over all nonnegative probability densities $\pi$ which satisfy
\[ \int_{\mathbb{R}^n} \pi(x,y)\,dx = \int_{\mathbb{R}^n} \pi(y,x)\,dx. \]

3. Duality

Following the informal ideas discussed in section 1, we are now going to discuss the duality theory rigorously. The main tool is the Fenchel-Legendre-Rockafellar theorem, whose proof is presented in what follows. Our proof is based on the one presented in [Vil03b].

Let $E$ be a locally convex topological vector space with dual $E'$. The duality pairing between $E$ and $E'$ is denoted by $(\cdot, \cdot)$. Let $h : E \to (-\infty, +\infty]$ be a convex function. The Legendre-Fenchel transform $h^* : E' \to [-\infty, +\infty]$ of $h$ is defined by
\[ h^*(y) = \sup_{x\in E}\big((x,y) - h(x)\big), \]
for $y \in E'$. In a similar way, if $g : E \to [-\infty, +\infty)$ is concave we define
\[ g^*(y) = \inf_{x\in E}\big((x,y) - g(x)\big). \]
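As a concrete illustration with $E = \mathbb{R}$, the transform of a convex function can be approximated by maximizing over a grid. The sketch below is a numerical approximation only; the grid `XS` and the sample functions are assumptions made for the example. For $h(x) = x^2/2$ one expects $h^*(y) = y^2/2$, and for $h(x) = x^4/4$ one expects $h^*(y) = \frac{3}{4}|y|^{4/3}$.

```python
def legendre(h, xs, y):
    # Grid approximation of h*(y) = sup_x (x*y - h(x)).
    return max(x * y - h(x) for x in xs)

# Hypothetical grid on [-5, 5]; the maximizers used below lie well inside it.
XS = [-5.0 + 0.001 * i for i in range(10001)]

def quadratic(x):
    return 0.5 * x * x

def quartic(x):
    return 0.25 * x ** 4
```

For example, `legendre(quadratic, XS, 2.0)` is close to `2.0`, the value of $y^2/2$ at $y = 2$, up to the grid error.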

Theorem 133 (Fenchel-Legendre-Rockafellar). Let $E$ be a locally convex topological vector space over $\mathbb{R}$ with dual $E'$. Let $h : E \to (-\infty, +\infty]$ be a convex function and $g : E \to [-\infty, +\infty)$ a concave function. Then, if there exists a point $x_0$ where both $g$ and $h$ are finite and at least one of them is continuous,
\[ \min_{y\in E'}\big[h^*(y) - g^*(y)\big] = \sup_{x\in E}\big[g(x) - h(x)\big]. \tag{142} \]

Remark. It is part of the theorem that the infimum in the left-hand side above is a minimum.
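A one-dimensional example may help to fix the sign conventions. The sketch below assumes the particular choices $h(x) = x^2/2$ and $g(x) = -|x-1|$ (illustrative, not from the text) and approximates both sides of (142) on grids. By hand, $h^*(y) = y^2/2$, $g^*(y) = y$ for $|y| \leq 1$ and $-\infty$ otherwise, so both sides equal $-1/2$, attained at $y = 1$ and $x = 1$.

```python
def h(x):
    # Convex function.
    return 0.5 * x * x

def g(x):
    # Concave function, finite and continuous everywhere.
    return -abs(x - 1.0)

# Hypothetical grids; the truncation to [-5, 5] replaces -infinity by large negative values.
XS = [-5.0 + 0.01 * i for i in range(1001)]
YS = [-3.0 + 0.01 * j for j in range(601)]

def h_star(y):
    # h*(y) = sup_x (x*y - h(x)) on the grid.
    return max(x * y - h(x) for x in XS)

def g_star(y):
    # g*(y) = inf_x (x*y - g(x)) on the grid.
    return min(x * y - g(x) for x in XS)

def primal():
    # sup_x [g(x) - h(x)].
    return max(g(x) - h(x) for x in XS)

def dual():
    # min_y [h*(y) - g*(y)].
    return min(h_star(y) - g_star(y) for y in YS)
```

Note that the inequality `dual() >= primal()` holds exactly on the grid, mirroring the easy half of the proof below.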

Proof. First we show the “≥” inequality in (142). Recall that
\[ \inf_{y\in E'}\big[h^*(y) - g^*(y)\big] = \inf_{y\in E'}\ \sup_{x_1,x_2\in E}\big[g(x_1) - h(x_2) - (y, x_1 - x_2)\big]. \]
By choosing $x_1 = x_2 = x$ we conclude that
\[ \inf_{y\in E'}\big[h^*(y) - g^*(y)\big] \geq \sup_{x\in E}\big[g(x) - h(x)\big]. \]
The opposite inequality is more involved and requires the use of the Hahn-Banach theorem. Let
\[ \lambda = \sup_{x\in E}\big[g(x) - h(x)\big]. \]
If $\lambda = +\infty$ there is nothing to prove, thus we may assume $\lambda < +\infty$. We just need to show that there exists $y \in E'$ such that for all $x_1$ and $x_2$ we have
\[ g(x_1) - h(x_2) - (y, x_1 - x_2) \leq \lambda, \tag{143} \]
since then taking the supremum over $x_1$ and $x_2$ yields
\[ h^*(y) - g^*(y) \leq \lambda. \]


From $\lambda \geq g(x) - h(x)$ it follows that $g(x) \leq \lambda + h(x)$. Hence the following convex subsets of $E \times \mathbb{R}$,
\[ C_1 = \{(x_1,t_1) \in E\times\mathbb{R} : t_1 < g(x_1)\} \]
and
\[ C_2 = \{(x_2,t_2) \in E\times\mathbb{R} : \lambda + h(x_2) < t_2\}, \]
are disjoint. Let $x_0$ be as in the statement of the theorem. We will assume that $g$ is continuous at $x_0$ (for the case in which $h$ is the continuous function the argument is similar). Since $(x_0, g(x_0) - 1) \in C_1$ and $g$ is continuous at $x_0$, $C_1$ has non-empty interior. Therefore, see [?, Chpt 4, sect 14.5], the sets $C_1$ and $C_2$ can be separated by a nonzero linear function, i.e., there exists a nonzero vector $z = (w, \alpha) \in E' \times \mathbb{R}$ such that
\[ \sup_{c_1\in C_1}(z,c_1) \leq \inf_{c_2\in C_2}(z,c_2), \]
that is, for any $x_1$ such that $g(x_1) > -\infty$ and for any $x_2$ such that $h(x_2) < +\infty$ we have
\[ (w,x_1) + \alpha t_1 \leq (w,x_2) + \alpha t_2, \]
whenever $t_1 < g(x_1)$ and $\lambda + h(x_2) < t_2$.

Note that $\alpha$ cannot be zero. Otherwise, by using $x_2 = x_0$ and taking $x_1$ in a neighborhood of $x_0$ where $g$ is finite, we deduce that $w$ is also zero. Therefore $\alpha > 0$, since otherwise, by taking $t_1 \to -\infty$, we would obtain a contradiction. Dividing by $\alpha$ and letting $y = -\frac{w}{\alpha}$, we obtain
\[ -(y,x_1) + g(x_1) \leq -(y,x_2) + h(x_2) + \lambda. \]
This is equivalent to (143), and thus the proof is complete.

Remark. The condition of continuity at $x_0$ can be relaxed to the condition of “Gâteaux continuity” or directional continuity, that is, the function $t \mapsto f(x_0 + tx)$ is continuous at $t = 0$ for any $x \in E$. Here $f$ stands for either $h$ or $g$.


4. Generalized Mather problem

The generalized Mather problem is an infinite dimensional linear programming problem. Its dual problem, which we compute in this section, can be obtained using the Fenchel-Legendre-Rockafellar theorem, as we explain in what follows.

Let $\Omega = \mathbb{T}^n \times U$. If $U$ is bounded, set $\gamma = 1$; otherwise, let $\gamma$ be a function $\gamma(v) : \Omega \to [1, +\infty)$ satisfying
\[ \lim_{|v|\to+\infty}\frac{L(x,v)}{\gamma(v)} = +\infty, \qquad \lim_{|v|\to+\infty}\frac{|v|}{\gamma(v)} = 0. \]

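Let $\mathcal{M}$ be the set of Radon measures in $\Omega$ with weight $\gamma$, that is,
\[ \mathcal{M} = \Big\{\mu \text{ signed measure in } \Omega \text{ with } \int_\Omega \gamma\,d|\mu| < \infty\Big\}. \]
The set $\mathcal{M}$ is the dual of the set $C^\gamma_0(\Omega)$ of continuous functions $\phi$ that satisfy
\[ \|\phi\|_\gamma = \sup_\Omega\Big|\frac{\phi}{\gamma}\Big| < \infty, \tag{144} \]
if $U$ is bounded, and, if $U$ is unbounded, satisfy both (144) and
\[ \lim_{|v|\to\infty}\frac{\phi(x,v)}{\gamma(v)} = 0. \]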

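Let
\[ \mathcal{M}_1 = \Big\{\mu\in\mathcal{M} : \int_\Omega d\mu = 1,\ \mu \geq 0\Big\}, \]
and
\[ \mathcal{M}_2 = \mathrm{cl}\Big\{\mu\in\mathcal{M} : \int_\Omega A^v\varphi\,d\mu = \int_{\mathbb{T}^n} B\varphi\,d\nu,\ \forall\varphi \in C^k(\mathbb{T}^n)\Big\}, \]
in which $k$ is the degree of differentiability needed on $\varphi$ so that $A^v\varphi$ is well defined, and the closure $\mathrm{cl}$ is taken in the weak topology.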

For $\phi \in C^\gamma_0(\Omega)$ let
\[ h(\phi) = \sup_{(x,v)\in\Omega}\big(-\phi(x,v) - L(x,v)\big). \]
Since $h$ is the supremum of convex functions, it is also a convex function, and, as was shown in [Gom02a], it is also continuous with respect to uniform convergence in $C^\gamma_0(\Omega)$. Consider the set
\[ \mathcal{C} = \mathrm{cl}\big\{\phi : \phi = A^v\varphi,\ \varphi \in C^k(\mathbb{T}^n)\big\}, \]
where $\mathrm{cl}$ denotes the closure in $C^\gamma_0$. Since $A^v$ is a linear operator, $\mathcal{C}$ is a convex set.

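Let $\nu$ be a fixed probability measure on $\mathbb{T}^n$, and let $\mu_\nu$ be as in (138). Define
\[ g(\phi) = \begin{cases} -\int \phi\,d\mu_\nu & \text{if } \phi \in \mathcal{C}, \\ -\infty & \text{otherwise.} \end{cases} \]
As $\mathcal{C}$ is a closed convex set, $g$ is concave and upper semicontinuous. Note that if $\phi = A^v\varphi$, then
\[ \int \phi\,d\mu_\nu = \int B\varphi\,d\nu. \]
We claim that the dual of
\[ \sup_{\phi\in C^\gamma_0(\Omega)}\big[g(\phi) - h(\phi)\big] \tag{145} \]
is the generalized Mather problem.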

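We start by computing the Legendre transforms of $h$ and $g$.

Proposition 134. We have
\[ h^*(\mu) = \begin{cases} \int L\,d\mu & \text{if } \mu \in \mathcal{M}_1, \\ +\infty & \text{otherwise,} \end{cases} \]
and
\[ g^*(\mu) = \begin{cases} 0 & \text{if } \mu \in \mathcal{M}_2, \\ -\infty & \text{otherwise.} \end{cases} \]

Proof. By its definition
\[ h^*(\mu) = \sup_{\phi\in C^\gamma_0(\Omega)}\Big(-\int\phi\,d\mu - h(\phi)\Big). \]
First we show that if $\mu$ is not non-negative then $h^*(\mu) = +\infty$.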


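Lemma 135. If $\mu \not\geq 0$ then $h^*(\mu) = +\infty$.

Proof. If $\mu \not\geq 0$ we can choose a sequence of non-negative functions $\phi_n \in C^\gamma_0(\Omega)$ such that
\[ \int -\phi_n\,d\mu \to +\infty. \]
Therefore, since
\[ \sup\big(-\phi_n - L\big) \leq 0, \]
we have $h^*(\mu) = +\infty$.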

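Lemma 136. If $\mu \geq 0$ then
\[ h^*(\mu) \geq \int L\,d\mu + \sup_{\psi\in C^\gamma_0(\Omega)}\Big(\int\psi\,d\mu - \sup\psi\Big). \]

Proof. Let $L_n$ be a sequence of functions in $C^\gamma_0(\Omega)$ increasing pointwise to $L$. Any $\phi$ in $C^\gamma_0(\Omega)$ can be written as $\phi = -L_n - \psi$, for some $\psi$ in $C^\gamma_0(\Omega)$. Therefore
\[ \sup_{\phi\in C^\gamma_0(\Omega)}\Big(-\int\phi\,d\mu - h(\phi)\Big) = \sup_{\psi\in C^\gamma_0(\Omega)}\Big(\int L_n\,d\mu + \int\psi\,d\mu - \sup(L_n + \psi - L)\Big). \]
Since
\[ \sup(L_n - L) \leq 0, \]
we have
\[ \sup(L_n + \psi - L) \leq \sup\psi. \]
Therefore
\[ \sup_{\phi\in C^\gamma_0(\Omega)}\Big(-\int\phi\,d\mu - h(\phi)\Big) \geq \sup_{\psi\in C^\gamma_0(\Omega)}\Big(\int L_n\,d\mu + \int\psi\,d\mu - \sup\psi\Big). \]
By the monotone convergence theorem
\[ \int L_n\,d\mu \to \int L\,d\mu. \]
Thus,
\[ \sup_{\phi\in C^\gamma_0(\Omega)}\Big(-\int\phi\,d\mu - h(\phi)\Big) \geq \int L\,d\mu + \sup_{\psi\in C^\gamma_0(\Omega)}\Big(\int\psi\,d\mu - \sup\psi\Big), \]
as required.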

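If $\int L\,d\mu = +\infty$ then $h^*(\mu) = +\infty$. On the other hand, if $\int d\mu \neq 1$ then
\[ \sup_{\psi\in C^\gamma_0(\Omega)}\Big(\int\psi\,d\mu - \sup\psi\Big) \geq \sup_{\alpha\in\mathbb{R}}\ \alpha\Big(\int d\mu - 1\Big) = +\infty, \]
by choosing $\psi = \alpha$, constant. Therefore $h^*(\mu) = +\infty$.

When $\int d\mu = 1$, the previous lemma implies
\[ h^*(\mu) \geq \int L\,d\mu, \]
by choosing $\psi = 0$. Additionally, for each $\phi$,
\[ \int(-\phi - L)\,d\mu \leq \sup(-\phi - L), \]
if $\int d\mu = 1$. Therefore
\[ \sup_{\phi\in C^\gamma_0(\Omega)}\Big(-\int\phi\,d\mu - h(\phi)\Big) \leq \int L\,d\mu. \]
In this way,
\[ h^*(\mu) = \begin{cases} \int L\,d\mu & \text{if } \mu \in \mathcal{M}_1, \\ +\infty & \text{otherwise.} \end{cases} \]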

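Let $\mu_\nu$ be such that
\[ \int A^v\varphi\,d\mu_\nu = \int B\varphi\,d\nu, \]
for all $\varphi \in C^k(\mathbb{T}^n)$. We can write any measure $\mu \in \mathcal{M}_2$ as a sum $\mu_\nu + \hat\mu$, with
\[ \int A^v\varphi\,d\hat\mu = 0, \]
for all $\varphi \in C^k(\mathbb{T}^n)$. By continuity, it follows that
\[ \int \phi\,d\hat\mu = 0, \]
for all $\phi \in \mathcal{C}$. Furthermore, for any $\mu \notin \mathcal{M}_2$, there exists $\phi \in \mathcal{C}$ such that
\[ \int \phi\,d(\mu - \mu_\nu) \neq 0. \]
Thus
\[ g^*(\mu) = \inf_{\phi\in\mathcal{C}}\Big[-\int\phi\,d\mu + \int\phi\,d\mu_\nu\Big] = \begin{cases} 0 & \text{if } \mu \in \mathcal{M}_2, \\ -\infty & \text{otherwise.} \end{cases} \]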

Theorem 137.
\[ \sup_{\phi\in C^\gamma_0(\Omega)}\big(g(\phi) - h(\phi)\big) = \min_{\mu\in\mathcal{M}}\big(h^*(\mu) - g^*(\mu)\big). \tag{146} \]

Note 1: $\min_{\mu\in\mathcal{M}}\big(h^*(\mu) - g^*(\mu)\big) = \min_{\mu\in\mathcal{M}_1\cap\mathcal{M}_2}\int L\,d\mu$.

Note 2: It is part of the theorem that the right-hand side of (146) is a minimum, and therefore there exists a generalized Mather measure.

Proof. The set $\{g > -\infty\}$ is non-empty, and, in this set, $h$ is a continuous function as proved in [Gom02a]. Then the result follows from the Fenchel-Legendre-Rockafellar theorem, see, for instance, [Vil03b].

Let
\[ \mathcal{H}(\varphi,x) = \sup_v\big[-L(x,v) - A^v\varphi\big]. \]
As an example, suppose $A^v\varphi = \Delta\varphi + v\cdot D_x\varphi$. Then
\[ \mathcal{H}(\varphi,x) = -\Delta\varphi + H(D_x\varphi,x). \]
The result in Theorem 137 can then be restated as the more convenient identity
\[ \min_\mu\int L\,d\mu = -\inf_\varphi\sup_x\Big[\mathcal{H}(\varphi,x) + \int B\varphi\,d\nu\Big], \tag{147} \]
where the minimum on the left-hand side is taken over all measures $\mu$ that satisfy (140), and the infimum on the right-hand side is taken over all $\varphi \in C^k(\mathbb{T}^n)$.

In the remainder of this section we consider Mather’s classical problem, $A^v\varphi = v\cdot D_x\varphi$ and $B = 0$.

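Theorem 138. Let $A^v\varphi = v\cdot D_x\varphi$. Let $H^\star$ be given by
\[ H^\star = -\sup_{\phi\in C^\gamma_0(\Omega)}\big(g(\phi) - h(\phi)\big). \]
Then
\[ H^\star = \inf\{\lambda : \exists\varphi \in C^1(\mathbb{T}^n) : H(D_x\varphi,x) < \lambda\}. \]

Proof. It is enough to observe that
\[ H^\star = \inf_{\varphi\in C^1(\mathbb{T}^n)}\ \sup_{(x,v)\in\Omega}\big[-v\cdot D_x\varphi - L\big] = \inf_{\varphi\in C^1(\mathbb{T}^n)}\ \sup_{x\in\mathbb{T}^n} H(D_x\varphi,x). \]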

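Theorem 139. $H^\star$ is the only value for which
\[ H(D_xu,x) = H^\star \]
admits a periodic viscosity solution.

Proof. Let $u$ be a periodic viscosity solution of
\[ H(D_xu,x) = \overline{H}. \]
We claim that there is no $C^1$ function $\psi$ such that
\[ \sup_x H(D_x\psi,x) < \overline{H}. \tag{148} \]
By contradiction, let $\psi$ be a function satisfying (148). Since $u$ and $\psi$ are periodic functions, there exists a point $x_0$ at which $u - \psi$ has a local minimum. But then
\[ H(D_x\psi(x_0),x_0) \geq \overline{H}, \]
which is a contradiction. Thus, we conclude that $H^\star \geq \overline{H}$.

To prove the other inequality, consider a standard mollifier $\eta_\varepsilon$ and define $u^\varepsilon = \eta_\varepsilon * u$. Then
\[ H(D_xu^\varepsilon,x) \leq \overline{H} + h(\varepsilon,x), \]
where
\[ h(\varepsilon,x) = \sup_{|p|\leq R}\ \sup_{|x-y|\leq\varepsilon}|H(p,x) - H(p,y)|, \]
and $R$ is an estimate for the Lipschitz constant of $u$. Let
\[ \overline{H}_\varepsilon = \overline{H} + \sup_x h(\varepsilon,x). \]
Then $u^\varepsilon$ satisfies
\[ H(D_xu^\varepsilon,x) \leq \overline{H}_\varepsilon. \]
Therefore
\[ H^\star \leq \lim_{\varepsilon\to0}\overline{H}_\varepsilon = \overline{H}. \]
Consequently $H^\star = \overline{H}$.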

4.1. Regularity. In this section we present (with small adaptations) the regularity results for viscosity solutions in the support of the Mather measures by [EG01]. We should point out that the proofs of Theorems 141–147 presented here appeared in [EG01]. For the setting of this survey, we had to add an elementary lemma, Lemma 140, for the presentation to be self-contained, as our definition of Mather measures differs from the one used in [EG01].

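Lemma 140. Let $\mu$ be a minimizing holonomic measure. Then
\[ \int_{\mathbb{T}^d\times\mathbb{R}^d} D_xL(x,v)\,d\mu = 0. \]

Proof. Let $h \in \mathbb{R}^d$, and consider the measure $\mu_h$ on $\mathbb{T}^d \times \mathbb{R}^d$ given by
\[ \int_{\mathbb{T}^d\times\mathbb{R}^d} \phi(x,v)\,d\mu_h = \int_{\mathbb{T}^d\times\mathbb{R}^d} \phi(x+h,v)\,d\mu, \]
for all continuous and compactly supported functions $\phi : \mathbb{T}^d\times\mathbb{R}^d \to \mathbb{R}$. Clearly, for every $h$, $\mu_h$ is holonomic. Since $\mu$ is minimizing, it follows that
\[ \frac{d}{d\varepsilon}\int L(x+\varepsilon h,v)\,d\mu\,\Big|_{\varepsilon=0} = 0, \]
that is,
\[ \int_{\mathbb{T}^d\times\mathbb{R}^d} D_xL(x,v)\,h\,d\mu = 0. \]
Since $h \in \mathbb{R}^d$ is arbitrary, the statement of the Lemma follows.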

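It will be convenient to define the measure $\tilde\mu$ on $\mathbb{T}^d \times \mathbb{R}^d$ as the push forward of the measure $\mu$ with respect to the one to one map $(v, x) \mapsto (p, x)$, where $p = D_vL(v, x)$. In other words, we define the measure $\tilde\mu$ on $\mathbb{T}^d \times \mathbb{R}^d$ by
\[ \int_{\mathbb{T}^d\times\mathbb{R}^d} \phi(x,p)\,d\tilde\mu = \int_{\mathbb{T}^d\times\mathbb{R}^d} \phi(x,D_vL(x,v))\,d\mu. \]
We also define the projection $\bar\mu$ in $\mathbb{T}^d$ of a measure $\mu$ in $\mathbb{T}^d \times \mathbb{R}^d$ as
\[ \int_{\mathbb{T}^d} \varphi(x)\,d\bar\mu(x) = \int_{\mathbb{T}^d\times\mathbb{R}^d} \varphi(x)\,d\mu(x,v). \]
Note that, in a similar way, $\bar\mu$ is also the projection of the measure $\tilde\mu$. Observe that for any smooth function $\varphi(x)$ the measure $\tilde\mu$ satisfies the following version of the holonomy condition:
\[ \int_{\mathbb{T}^d\times\mathbb{R}^d} D_pH(p,x)\,D_x\varphi(x)\,d\tilde\mu = 0, \]
because we can use identity (??) if $p = D_vL(x, v)$.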

Theorem 141. Let u be any viscosity solution of (132), and let µ

be any minimizing holonomic measure. Then µ-almost everywhere,

Dxu(x) exists and p = Dxu(x), µ-almost everywhere.


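Proof. Let $u$ be any viscosity solution of (132). Let $\eta_\varepsilon$ be a standard mollifier, and $u^\varepsilon = \eta_\varepsilon * u$. By strict uniform convexity there exists $\gamma > 0$ such that for any $p, q \in \mathbb{R}^d$ and any $x \in \mathbb{T}^d$ we have
\[ H(p,x) \geq H(q,x) + D_pH(q,x)(p-q) + \frac{\gamma}{2}|p-q|^2. \]
By Theorem 131, any viscosity solution of (132), and in particular $u$, is Lipschitz.

Recall that, by Rademacher’s theorem [Eva98a], a locally Lipschitz function is differentiable Lebesgue almost everywhere. Using $p = D_xu(y)$ and $q = D_xu^\varepsilon(x)$, we conclude that for every point $x$ and for Lebesgue almost every point $y$:
\[ H(D_xu(y),x) \geq H(D_xu^\varepsilon(x),x) + D_pH(D_xu^\varepsilon(x),x)\big(D_xu(y) - D_xu^\varepsilon(x)\big) + \frac{\gamma}{2}|D_xu^\varepsilon(x) - D_xu(y)|^2. \]
Multiplying the previous inequality by $\eta_\varepsilon(x-y)$ and integrating over $\mathbb{R}^d$ in $y$ (the first order term integrates to zero, since $D_xu^\varepsilon = \eta_\varepsilon * D_xu$) yields
\[ H(D_xu^\varepsilon(x),x) + \frac{\gamma}{2}\int_{\mathbb{R}^d}\eta_\varepsilon(x-y)|D_xu^\varepsilon(x) - D_xu(y)|^2\,dy \leq \int_{\mathbb{R}^d}\eta_\varepsilon(x-y)H(D_xu(y),x)\,dy \leq \overline{H} + O(\varepsilon). \]
Let
\[ \beta_\varepsilon(x) = \frac{\gamma}{2}\int_{\mathbb{R}^d}\eta_\varepsilon(x-y)|D_xu^\varepsilon(x) - D_xu(y)|^2\,dy. \]
Now observe that
\[ \frac{\gamma}{2}\int_{\mathbb{T}^d\times\mathbb{R}^d}|D_xu^\varepsilon(x) - p|^2\,d\mu \leq \int_{\mathbb{T}^d\times\mathbb{R}^d}\big[H(D_xu^\varepsilon(x),x) - H(p,x) - D_pH(p,x)(D_xu^\varepsilon(x) - p)\big]\,d\mu \leq \int_{\mathbb{T}^d\times\mathbb{R}^d}H(D_xu^\varepsilon(x),x)\,d\mu - \overline{H}, \]
because
\[ \int_{\mathbb{T}^d\times\mathbb{R}^d} D_pH(x,p)\,D_xu^\varepsilon(x)\,d\mu = 0, \]
and
\[ p\,D_pH(x,p) - H(x,p) = L(x,D_pH(x,p)), \]
and $\int_{\mathbb{T}^d\times\mathbb{R}^d} L(x,D_pH(x,p))\,d\mu = -\overline{H}$. Therefore,
\[ \frac{\gamma}{2}\int_{\mathbb{T}^d\times\mathbb{R}^d}|D_xu^\varepsilon(x) - p|^2\,d\mu + \int_{\mathbb{T}^d}\beta_\varepsilon(x)\,d\mu \leq O(\varepsilon). \]
Thus, for $\mu$-almost every point $x$, $\beta_\varepsilon(x) \to 0$. Therefore, $\mu$-almost every point is a point of approximate continuity of $D_xu$ (see [EG92], p. 49). Since $u$ is semiconcave (Proposition ??), it is differentiable at points of approximate continuity. Furthermore
\[ D_xu^\varepsilon \to D_xu \]
pointwise, $\mu$-almost everywhere, and so $D_xu$ is $\mu$ measurable. Also we have
\[ p = Du(x), \quad \mu\text{-almost everywhere.} \]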

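By looking at the proof of the previous theorem we can also state the following useful result:

Corollary 142. Let $\eta_\varepsilon$ be a standard mollifier, and $u^\varepsilon = \eta_\varepsilon * u$. Then
\[ \int_{\mathbb{T}^d}|D_xu^\varepsilon - D_xu|^2\,d\mu \leq C\varepsilon, \]
as $\varepsilon \to 0$.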

As a Corollary we formulate an equivalent form of Theorem 141.

Corollary 143. Let $u$ be any viscosity solution of (132), and let $\mu$ be any minimizing holonomic measure. Then $\mu$-almost everywhere, $D_xu(x)$ exists and
\[ D_vL(v,x) = D_xu(x) \quad \mu\text{-almost everywhere,} \tag{149} \]
and
\[ D_xL(v,x) = -D_xH(D_xu(x),x) \quad \mu\text{-almost everywhere.} \tag{150} \]

Proof. First we observe that the measure $\mu$ is the push forward measure of the measure $\mu$ with respect to the one to one map $(v, x) \mapsto (p, x)$, where $p = D_vL(v, x)$. Therefore a $\mu$-almost everywhere identity
\[ F_1(p,x) = F_2(p,x) \quad (p,x)\text{-}\mu \text{ almost everywhere} \]
implies the $\mu$-almost everywhere identity
\[ F_1(D_vL(v,x),x) = F_2(D_vL(v,x),x) \quad (v,x)\text{-}\mu \text{ almost everywhere.} \]
Thus (149) follows directly from Theorem 141. Using (149) and the identity $D_xL(v,x) = -D_xH(D_vL(v,x),x)$, we arrive at (150).

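We observe that from the previous corollary it also follows that
\[ \int_{\mathbb{T}^d} D_pH(D_xu,x)\,D_xu\,d\mu = 0. \]
Indeed,
\[ \int_{\mathbb{T}^d} D_pH(D_xu,x)\,D_xu\,d\mu = \int_{\mathbb{T}^d} D_pH(D_xu,x)\,D_xu^\varepsilon\,d\mu + \int_{\mathbb{T}^d} D_pH(D_xu,x)\,(D_xu - D_xu^\varepsilon)\,d\mu. \]
We have
\[ \int_{\mathbb{T}^d} D_pH(D_xu,x)\,D_xu^\varepsilon\,d\mu = 0. \]
To handle the second term, fix $\delta > 0$. Then
\[ \Big|\int_{\mathbb{T}^d} D_pH(D_xu,x)\,(D_xu - D_xu^\varepsilon)\,d\mu\Big| \leq \delta\int_{\mathbb{T}^d}|D_pH(D_xu,x)|^2\,d\mu + \frac{1}{\delta}\int_{\mathbb{T}^d}|D_xu - D_xu^\varepsilon|^2\,d\mu. \]
Note that since $u$ is Lipschitz the term $D_pH(D_xu,x)$ is bounded, and so is $\int_{\mathbb{T}^d}|D_pH(D_xu,x)|^2\,d\mu$. Send $\varepsilon \to 0$, and then let $\delta \to 0$.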

Theorem 144. Let $u$ be any viscosity solution of (132), and let $\mu$ be any minimizing holonomic measure. Then
\[ \int_{\mathbb{T}^d}|D_xu(x+h) - D_xu(x)|^2\,d\mu \leq C|h|^2. \]

Proof. Applying Theorem ?? we have
\[ H(D_xu^\varepsilon(x+h),x+h) \leq \overline{H} + C\varepsilon. \]
By Theorem 141 the derivative $D_xu(x)$ exists $\mu$ almost everywhere. By Proposition ?? a viscosity solution satisfies equation (132) in the classical sense at all points of differentiability. Thus $H(D_xu(x),x) = \overline{H}$ for $\mu$ almost all points $x$. Now observe that
\[ C\varepsilon \geq H(D_xu^\varepsilon(x+h),x+h) - H(D_xu(x),x) = H(D_xu^\varepsilon(x+h),x+h) - H(D_xu^\varepsilon(x+h),x) + H(D_xu^\varepsilon(x+h),x) - H(D_xu(x),x). \]
The first difference satisfies
\[ H(D_xu^\varepsilon(x+h),x+h) - H(D_xu^\varepsilon(x+h),x) = D_xH(D_xu^\varepsilon(x+h),x)h + O(h^2) \]
\[ = D_xH(D_xu(x),x)h + O\big(h^2 + h|D_xu^\varepsilon(x+h) - D_xu(x)|\big) \]
\[ \geq D_xH(D_xu(x),x)h + O(h^2) - \frac{\gamma}{4}|D_xu^\varepsilon(x+h) - D_xu(x)|^2. \]
Therefore, for $\mu$ almost every $x$, we have
\[ H(D_xu^\varepsilon(x+h),x) - H(D_xu,x) \leq C\varepsilon - D_xH(D_xu(x),x)h + \frac{\gamma}{4}|D_xu^\varepsilon(x+h) - D_xu(x)|^2 + Ch^2. \]
Since
\[ H(D_xu^\varepsilon(x+h),x) - H(D_xu,x) \geq \frac{\gamma}{2}|D_xu^\varepsilon(x+h) - D_xu(x)|^2 + D_pH(D_xu,x)\big(D_xu^\varepsilon(x+h) - D_xu(x)\big), \]
we have
\[ \frac{\gamma}{4}\int|D_xu^\varepsilon(x+h) - D_xu(x)|^2\,d\mu \leq C\varepsilon + C|h|^2 - \int D_xH(D_xu(x),x)h\,d\mu. \]
By (150) and Lemma 140 it follows that
\[ \int D_xH(D_xu(x),x)h\,d\mu = -\int D_xL(v,x)h\,d\mu = 0. \]
As $\varepsilon \to 0$, through a suitable subsequence (since $D_xu^\varepsilon(x+h)$ is bounded in $L^2_\mu$), we may assume that $D_xu^\varepsilon(x+h) \rightharpoonup \xi(x)$ in $L^2_\mu$, for some function $\xi \in L^2_\mu$, and
\[ \int|\xi - D_xu|^2\,d\mu \leq C|h|^2. \]
Finally, we claim that $\xi(x) = D_xu(x+h)$ for $\mu$ almost all $x$. This follows from Theorem 141 and the fact that for $\mu$ almost all $x$ we have $\xi(x) \in D_x^-u(x+h)$, where $D_x^-$ stands for the subdifferential. To see this, observe that by Proposition ?? $u$ is semiconcave, therefore the functions $u^\varepsilon$ are uniformly semiconcave, that is,
\[ u^\varepsilon(y+h) \leq u^\varepsilon(x+h) + D_xu^\varepsilon(x+h)(y-x) + C|y-x|^2, \]
where $C$ is independent of $\varepsilon$. Fixing $y$ and integrating against a non-negative function $\varphi(x) \in L^2_\mu$ yields
\[ \int_{\mathbb{T}^d}\big(u^\varepsilon(y+h) - u^\varepsilon(x+h) - D_xu^\varepsilon(x+h)(y-x) - C|y-x|^2\big)\varphi(x)\,d\mu \leq 0. \]
By passing to the limit we have that
\[ u(y+h) \leq u(x+h) + \xi(x)(y-x) + C|y-x|^2 \quad \text{for all } y \text{ and } \mu\text{-almost all } x, \]
that is, $\xi(x) \in D_x^-u(x+h)$ for $\mu$-almost all $x$.

Lemma 145. Let $u$ be any viscosity solution of (132), and let $\mu$ be any minimizing holonomic measure. Let $\psi : \mathbb{T}^d \times \mathbb{R} \to \mathbb{R}$ be a smooth function. Then
\[ \int_{\mathbb{T}^d} D_pH(D_xu,x)\,D_x\big[\psi(x,u(x))\big]\,d\mu = 0. \]

Proof. Clearly we have
\[ \int_{\mathbb{T}^d} D_pH(D_xu,x)\,D_x\big[\psi(x,u^\varepsilon(x))\big]\,d\mu = 0. \]
By the uniform convergence of $u^\varepsilon$ to $u$, and the $L^2_\mu$ convergence of $D_xu^\varepsilon$ to $D_xu$, see Corollary 142, we get the result.

Theorem 146. Let u be any viscosity solution of (132), and let µ be

any minimizing holonomic measure. Then, for µ almost every x and

all h ∈ Rd,

|u(x+ h)− 2u(x) + u(x− h)| ≤ C|h|2.

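Proof. Let $h \neq 0$ and define
\[ \tilde u(x) = u(x+h), \qquad \hat u(x) = u(x-h). \]
Consider the mollified functions $\tilde u^\varepsilon$, $\hat u^\varepsilon$, where we take
\[ 0 < \varepsilon \leq \eta|h|^2, \tag{151} \]
for small $\eta > 0$. We have
\[ H(D\tilde u^\varepsilon,x+h) \leq \overline{H} + C\varepsilon, \qquad H(D\hat u^\varepsilon,x-h) \leq \overline{H} + C\varepsilon. \]
For $\mu$-almost every point $x$ (for which $Du(x)$ exists and therefore $H(Du(x),x) = \overline{H}$) we have
\[ H(D\tilde u^\varepsilon,x) - 2H(Du,x) + H(D\hat u^\varepsilon,x) \leq 2C\varepsilon + H(D\tilde u^\varepsilon,x) - H(D\tilde u^\varepsilon,x+h) + H(D\hat u^\varepsilon,x) - H(D\hat u^\varepsilon,x-h). \]
Hence
\[ \frac{\gamma}{2}\big(|D\tilde u^\varepsilon - Du|^2 + |D\hat u^\varepsilon - Du|^2\big) + D_pH(Du,x)\cdot(D\tilde u^\varepsilon - 2Du + D\hat u^\varepsilon) \leq C(\varepsilon + |h|^2) + \big(D_xH(D\hat u^\varepsilon,x) - D_xH(D\tilde u^\varepsilon,x)\big)\cdot h. \]
Using the inequality
\[ \big|(D_xH(p,x) - D_xH(q,x))\cdot h\big| \leq \Big\|\frac{\partial^2H}{\partial p\,\partial x}\Big\|\,|p-q|\,|h| \leq \frac{\gamma}{4}|p-q|^2 + \frac{1}{\gamma}\Big\|\frac{\partial^2H}{\partial p\,\partial x}\Big\|^2|h|^2, \]
where
\[ \Big\|\frac{\partial^2H}{\partial p\,\partial x}\Big\| = \sup_{p,x}\ \sup_{|z|=1,|h|=1}\ \sum_{i,j}\Big|z_jh_i\frac{\partial^2H}{\partial p_j\,\partial x_i}(p,x)\Big|, \]
we arrive at
\[ \frac{\gamma}{4}\big(|D\tilde u^\varepsilon - Du|^2 + |D\hat u^\varepsilon - Du|^2\big) + D_pH(Du,x)\cdot(D\tilde u^\varepsilon - 2Du + D\hat u^\varepsilon) \leq C(\varepsilon + |h|^2). \]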

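Fix now a smooth, nondecreasing function $\Phi : \mathbb{R} \to \mathbb{R}$, and write $\phi := \Phi' \geq 0$. Multiply the last inequality above by $\phi\big(\frac{\tilde u^\varepsilon - 2u + \hat u^\varepsilon}{|h|^2}\big)$, and integrate with respect to $\mu$:
\[ \frac{\gamma}{4}\int_{\mathbb{T}^d}\big(|D\tilde u^\varepsilon - Du|^2 + |D\hat u^\varepsilon - Du|^2\big)\,\phi\Big(\frac{\tilde u^\varepsilon - 2u + \hat u^\varepsilon}{|h|^2}\Big)\,d\mu + \int_{\mathbb{T}^d} D_pH(Du,x)\cdot(D\tilde u^\varepsilon - 2Du + D\hat u^\varepsilon)\,\phi(\cdots)\,d\mu \leq C(\varepsilon + |h|^2)\int_{\mathbb{T}^d}\phi(\cdots)\,d\mu. \tag{152} \]
Now the second term on the left hand side of (152) equals
\[ |h|^2\int_{\mathbb{R}^d}\int_{\mathbb{T}^d} D_pH(p,x)\cdot D_x\Big[\Phi\Big(\frac{\tilde u^\varepsilon - 2u + \hat u^\varepsilon}{|h|^2}\Big)\Big]\,d\mu \tag{153} \]
and thus, by Lemma 145, it vanishes. So now dropping the above term from (152) and rewriting, we deduce
\[ \int_{\mathbb{T}^d}|Du^\varepsilon(x+h) - Du^\varepsilon(x-h)|^2\,\phi\Big(\frac{u^\varepsilon(x+h) - 2u(x) + u^\varepsilon(x-h)}{|h|^2}\Big)\,d\mu \leq C(\varepsilon + |h|^2)\int_{\mathbb{T}^d}\phi\Big(\frac{u^\varepsilon(x+h) - 2u(x) + u^\varepsilon(x-h)}{|h|^2}\Big)\,d\mu. \tag{154} \]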

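We now confront a technical problem, as (154) entails a mixture of first-order difference quotients for $Du^\varepsilon$ and second-order difference quotients for $u$, $u^\varepsilon$. We can however relate these expressions, since $u$ is semiconcave.

To see this, first of all define
\[ E_\varepsilon := \{x \in \mathrm{supp}(\mu) \mid u^\varepsilon(x+h) - 2u(x) + u^\varepsilon(x-h) \leq -\kappa|h|^2\}, \tag{155} \]
the large constant $\kappa > 0$ to be fixed below. The functions
\[ \bar u(x) := u(x) - \frac{\alpha}{2}|x|^2, \qquad \bar u^\varepsilon(x) := u^\varepsilon(x) - \frac{\alpha}{2}|x|^2 \tag{156} \]
are concave. Also a point $x \in \mathrm{supp}(\mu)$ belongs to $E_\varepsilon$ if and only if
\[ \bar u^\varepsilon(x+h) - 2\bar u(x) + \bar u^\varepsilon(x-h) \leq -(\kappa + \alpha)|h|^2. \tag{157} \]
Set
\[ f^\varepsilon(s) := \bar u^\varepsilon\Big(x + s\frac{h}{|h|}\Big) \qquad (-|h| \leq s \leq |h|). \tag{158} \]
Then $f^\varepsilon$ is concave, and
\[ \bar u^\varepsilon(x+h) - 2\bar u^\varepsilon(x) + \bar u^\varepsilon(x-h) = f^\varepsilon(|h|) - 2f^\varepsilon(0) + f^\varepsilon(-|h|) \]
\[ = \int_{-|h|}^{|h|} f^{\varepsilon\prime\prime}(s)\,(|h| - |s|)\,ds \]
\[ \geq |h|\int_{-|h|}^{|h|} f^{\varepsilon\prime\prime}(s)\,ds \quad (\text{since } f^{\varepsilon\prime\prime} \leq 0) \]
\[ = |h|\big[f^{\varepsilon\prime}(|h|) - f^{\varepsilon\prime}(-|h|)\big] = \big(D\bar u^\varepsilon(x+h) - D\bar u^\varepsilon(x-h)\big)\cdot h. \]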
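The integral representation of the second difference and the resulting lower bound can be checked on a concrete concave function. In the minimal sketch below, the function `f` is a hypothetical smooth function that is concave on $[-1, 1]$ and plays the role of $f^\varepsilon$, and the scalar `h` plays the role of $|h|$.

```python
def f(s):
    # Hypothetical smooth function, concave on [-1, 1].
    return -s * s - 0.5 * s ** 4 + 0.3 * s ** 3

def df(s):
    # First derivative of f.
    return -2 * s - 2 * s ** 3 + 0.9 * s * s

def d2f(s):
    # Second derivative of f; negative on [-1, 1].
    return -2 - 6 * s * s + 1.8 * s

def second_difference(h):
    # f(h) - 2 f(0) + f(-h).
    return f(h) - 2 * f(0.0) + f(-h)

def weighted_integral(h, n=20000):
    # Midpoint rule for int_{-h}^{h} f''(s) (h - |s|) ds; n is even, so s = 0 is a cell edge.
    step = 2 * h / n
    nodes = (-h + (i + 0.5) * step for i in range(n))
    return sum(d2f(s) * (h - abs(s)) for s in nodes) * step

def lower_bound(h):
    # h * int_{-h}^{h} f''(s) ds = h * (f'(h) - f'(-h)).
    return h * (df(h) - df(-h))
```

The weight $|h| - |s|$ lies between $0$ and $|h|$, so replacing it by $|h|$ can only decrease the integral of the non-positive function $f''$; this is the inequality used above.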

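Consequently, if $x \in E_\varepsilon$, this inequality and (157) together imply
\[ 2|\bar u^\varepsilon(x) - \bar u(x)| + |D\bar u^\varepsilon(x+h) - D\bar u^\varepsilon(x-h)|\,|h| \geq (\kappa + \alpha)|h|^2. \]
Now $|\bar u^\varepsilon(x) - \bar u(x)| \leq C\varepsilon$ on $\mathbb{T}^d$, since $u$ is Lipschitz continuous. We may therefore take $\eta$ in (151) small enough to deduce from the foregoing that
\[ |D\bar u^\varepsilon(x+h) - D\bar u^\varepsilon(x-h)| \geq \Big(\frac{\kappa}{2} + \alpha\Big)|h|. \tag{159} \]
But then
\[ |Du^\varepsilon(x+h) - Du^\varepsilon(x-h)| \geq \Big(\frac{\kappa}{2} - \alpha\Big)|h|. \tag{160} \]
Return now to (154). Take $\kappa > 2\alpha$ and
\[ \phi(z) = \begin{cases} 1 & \text{if } z \leq -\kappa, \\ 0 & \text{if } z > -\kappa. \end{cases} \]
The inequality (154) was derived for smooth functions $\phi$. However, by replacing $\phi$ in (154) by a sequence $\phi_n$ of smooth functions increasing pointwise to $\phi$, and using the monotone convergence theorem, we conclude that (154) holds for this function $\phi$. Then we discover from (154) that
\[ \Big(\frac{\kappa}{2} - \alpha\Big)^2|h|^2\,\mu(E_\varepsilon) \leq C(\varepsilon + |h|^2)\,\mu(E_\varepsilon). \]
We fix $\kappa$ so large that
\[ \Big(\frac{\kappa}{2} - \alpha\Big)^2 \geq C + 1, \]
to deduce
\[ (|h|^2 - C\varepsilon)\,\mu(E_\varepsilon) \leq 0. \]
Thus $\mu(E_\varepsilon) = 0$ if $\eta$ in (151) is small enough, and this means
\[ u^\varepsilon(x+h) - 2u(x) + u^\varepsilon(x-h) \geq -\kappa|h|^2 \]
for $\mu$-almost every point $x$. Now let $\varepsilon \to 0$:
\[ u(x+h) - 2u(x) + u(x-h) \geq -\kappa|h|^2 \]
$\mu$-almost everywhere. Since
\[ u(x+h) - 2u(x) + u(x-h) \leq \alpha|h|^2 \]
owing to the semiconcavity, we have
\[ |u(x+h) - 2u(x) + u(x-h)| \leq C|h|^2 \]
for $\mu$-almost every point $x$. As $u$ is continuous, the same inequality holds for all $x \in \mathrm{supp}(\mu)$.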

Now we state and prove the main result of this section.

Theorem 147. Let u be any viscosity solution of (132), and let µ

be any minimizing holonomic measure. Then for µ-almost every x,

Dxu(x) exists and for Lebesgue almost every y

(161) |Dxu(x)−Dxu(y)| ≤ C|x− y|.


Proof. First we show that

(162) |u(y)− u(x)− (y − x) ·Dxu(x)| ≤ C|x− y|2.

Fix y ∈ Rd and take any point x ∈ supp(µ) at which u is differentiable.

According to Theorem 146 with h := y − x, we have

(163) |u(y)− 2u(x) + u(2x− y)| ≤ C|x− y|2.

By semiconcavity, we have

(164) u(y)− u(x)−Du(x) · (y − x) ≤ C|x− y|2,

and also

(165) u(2x− y)− u(x)−Du(x) · (2x− y − x) ≤ C|x− y|2.

Use (165) in (163):

u(y)− u(x)−Du(x) · (y − x) ≥ −C|x− y|2.

This and (164) establish (162).

Estimate (161) follows from (162), as follows. Take $x$, $y$ as above. Let $z$ be a point to be selected later, with $|x - z| \leq 2|x - y|$. The semiconcavity of $u$ implies that
\[ u(z) \leq u(y) + Du(y)\cdot(z-y) + C|z-y|^2. \tag{166} \]
Also,
\[ u(z) = u(x) + Du(x)\cdot(z-x) + O(|x-z|^2), \qquad u(y) = u(x) + Du(x)\cdot(y-x) + O(|x-y|^2), \]
according to (162). Insert these identities into (166) and simplify:
\[ (Du(x) - Du(y))\cdot(z-y) \leq C|x-y|^2. \]
Now take
\[ z := y + |x-y|\,\frac{Du(x) - Du(y)}{|Du(x) - Du(y)|} \]
to deduce (161).


Now take any point x ∈ supp(µ), and fix y. There exist points $x_k$ ∈ supp(µ) (k = 1, . . . ) such that $x_k \to x$ and u is differentiable at $x_k$. According to estimate (162),

\[ |u(y) - u(x_k) - Du(x_k)\cdot(y - x_k)| \le C|x_k - y|^2 \qquad (k = 1, \ldots). \]

The constant C does not depend on k or y. Now let k → ∞. Owing to (161) we see that $Du(x_k)$ converges to some vector η, for which

\[ |u(y) - u(x) - \eta\cdot(y-x)| \le C|x-y|^2. \]

Consequently u is differentiable at x and Du(x) = η.

It follows from Theorem 147 that the function $\mathbf{v}$ defined by Theorem ?? is Lipschitz on a set of full measure µ. Indeed, by substituting the L.H.S. and the R.H.S. of (149) into $H_p(p, x) = H_p(p, x)$ in place of the p's and using (??), we have

\[ \mathbf{v}(x) = D_pH(Du(x), x) \qquad \mu\text{-almost everywhere}. \]

We can then extend $\mathbf{v}$ as a Lipschitz function to the support of µ, which is contained in the closure of this set of full measure. Note that any Lipschitz function ϕ defined on a closed set K can be extended to a globally defined Lipschitz function $\tilde\varphi$ in the following way: without loss of generality assume that Lip(ϕ) = 1; define

\[ \tilde\varphi(x) = \inf_{y \in K} \bigl[\varphi(y) + 2d(x, y)\bigr]. \]

An easy exercise then shows that $\tilde\varphi = \varphi$ in K and that $\tilde\varphi$ is Lipschitz.

Therefore we may assume that v is globally defined and Lipschitz.
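The extension formula can be tried numerically. The following is a minimal sketch, not part of the original argument: it assumes K is a finite subset of the real line, takes d(x, y) = |x − y|, and uses ϕ = sin as a hypothetical function with Lip(ϕ) ≤ 1. The names `lipschitz_extension`, `K` and `phi_ext` are illustrative only.

```python
import math

def lipschitz_extension(points, phi, slope=2.0):
    """Extend phi from the finite set `points` to the whole real line.

    Implements x -> min over y in points of phi(y) + slope * |x - y|,
    i.e. the formula from the text with d(x, y) = |x - y|.
    """
    samples = [(y, phi(y)) for y in points]

    def extended(x):
        return min(value + slope * abs(x - y) for y, value in samples)

    return extended

# Hypothetical data: K is a finite closed set, phi = sin has Lip(phi) <= 1.
K = [0.0, 0.5, 1.0, 2.0, 3.5]
phi_ext = lipschitz_extension(K, math.sin)

# The extension agrees with phi on K ...
agrees_on_K = all(phi_ext(y) == math.sin(y) for y in K)

# ... and its difference quotients on a fine grid never exceed the slope 2.
grid = [-2.0 + 0.01 * i for i in range(801)]
max_slope = max(
    abs(phi_ext(a) - phi_ext(b)) / (b - a) for a, b in zip(grid, grid[1:])
)
```

The factor 2 in front of d(x, y) is larger than Lip(ϕ) = 1, which is what forces the infimum at a point of K to be attained at that point itself.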

4.2. Holonomy variations. In this section we study a class of

variations that preserve the holonomy constraint. These variations will

be used later to establish the invariance under the Euler-Lagrange flow

of minimizing holonomic measures.

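Let $\xi : \mathbb{T}^d \to \mathbb{R}^d$ be a $C^1$ vector field on $\mathbb{T}^d$. Let Φ(t, x) be the flow by ξ, i.e.,

\[ \Phi(0, x) = x, \quad\text{and}\quad \frac{\partial}{\partial t}\Phi(t, x) = \xi\bigl(\Phi(t, x)\bigr). \]

Consider the prolongation of ξ to $\mathbb{T}^d \times \mathbb{R}^d$, which is the vector field on $\mathbb{T}^d \times \mathbb{R}^d$ given by

\[ \dot{x}_k(x, v) = \xi_k(x), \qquad \dot{v}_k(x, v) = v_i \frac{\partial \xi_k}{\partial x_i}(x). \tag{167} \]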

Lemma 148. The flow of (167) is given by

\[ X_k(t, x, v) = \Phi_k(t, x), \qquad V_k(t, x, v) = v_s \frac{\partial \Phi_k}{\partial x_s}(t, x). \tag{168} \]

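Proof. Since the X-part of the flow coincides with the Φ-flow, it only remains to show that

\[ V(0, x, v) = v, \quad\text{and}\quad \frac{\partial}{\partial t}V(t, x, v) = \dot{v}\bigl(X(t, x, v), V(t, x, v)\bigr). \]

The first statement (V(0, x, v) = v) is clear since the map x ↦ Φ(0, x) is the identity map. The second statement can be rewritten as

\[ \frac{\partial}{\partial t}V_k(t, x, v) = V_i(t, x, v)\,\frac{\partial \xi_k}{\partial x_i}\Big|_{\Phi(t,x)}. \]

A simple computation yields

\[
\begin{aligned}
\frac{\partial}{\partial t}V_k(t, x, v)
 &= v_s \frac{\partial}{\partial x_s}\Bigl(\frac{\partial}{\partial t}\Phi_k(t, x)\Bigr)
  = v_s \frac{\partial}{\partial x_s}\Bigl(\xi_k\bigl(\Phi(t, x)\bigr)\Bigr) \\
 &= v_s \frac{\partial \xi_k}{\partial x_i}\Big|_{\Phi(t,x)} \frac{\partial \Phi_i}{\partial x_s}\Big|_{(t,x)}
  = V_i(t, x, v)\,\frac{\partial \xi_k}{\partial x_i}\Big|_{\Phi(t,x)},
\end{aligned}
\]

which is the desired identity.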
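Lemma 148 can be checked numerically in a simple case. The sketch below is an illustration under stated assumptions, not part of the proof: it takes d = 1 and the hypothetical field ξ(x) = sin x, integrates both the Φ-flow and the prolonged field (167) with a classical Runge-Kutta scheme, and compares V with $v\,\partial\Phi/\partial x$ computed by a central finite difference. All function names are illustrative.

```python
import math

def xi(x):
    # Hypothetical C^1 vector field on the circle (d = 1).
    return math.sin(x)

def dxi(x):
    return math.cos(x)

def rk4(field, state, t, steps=2000):
    """Integrate state' = field(state) from time 0 to t with classical RK4."""
    h = t / steps
    y = list(state)
    for _ in range(steps):
        k1 = field(y)
        k2 = field([a + 0.5 * h * b for a, b in zip(y, k1)])
        k3 = field([a + 0.5 * h * b for a, b in zip(y, k2)])
        k4 = field([a + h * b for a, b in zip(y, k3)])
        y = [a + h * (p + 2 * q + 2 * r + s) / 6
             for a, p, q, r, s in zip(y, k1, k2, k3, k4)]
    return y

def flow(t, x):
    """Phi(t, x): flow of x' = xi(x)."""
    return rk4(lambda y: [xi(y[0])], [x], t)[0]

def prolonged_flow(t, x, v):
    """(X, V)(t, x, v): flow of the prolonged field (167) for d = 1."""
    return rk4(lambda y: [xi(y[0]), y[1] * dxi(y[0])], [x, v], t)

t, x0, v0 = 0.7, 1.0, 0.3
X, V = prolonged_flow(t, x0, v0)

# Right-hand side of (168): v * dPhi/dx, via a central finite difference.
delta = 1e-5
dPhi_dx = (flow(t, x0 + delta) - flow(t, x0 - delta)) / (2 * delta)
```

Up to discretization error, `X` should match `flow(t, x0)` and `V` should match `v0 * dPhi_dx`, which is the content of (168) in this one-dimensional setting.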

For any real number t and any function ψ(x, v), define a new function $\psi_t$ as follows:

\[ \psi_t(x, v) = \psi\bigl(X(t, x, v), V(t, x, v)\bigr). \tag{169} \]

Thus the flow (168) generates the flow on the space of functions ψ(x, v) given by (169).

Lemma 149. The set C, defined in (??), is invariant under the flow

given by (169).

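Proof. Let $g \in C^1(\mathbb{T}^d)$ be such that $\psi(x, v) = v_i \frac{\partial g}{\partial x_i}(x)$. Let $g_t$ denote the flow by Φ of the function g, i.e., $g_t(x) = g\bigl(\Phi(t, x)\bigr)$. We claim that for any real number t we have

\[ \psi_t(x, v) = v_i \frac{\partial}{\partial x_i} g_t(x), \]

where $\psi_t$ is given by (169). Indeed,

\[
\begin{aligned}
\psi_t(x, v)
 &= V_k(t, x, v)\,\frac{\partial g}{\partial x_k}\Big|_{X(t,x,v)}
  = v_s \frac{\partial g}{\partial x_k}\Big|_{\Phi(t,x)} \frac{\partial \Phi_k}{\partial x_s}\Big|_{(t,x)} \\
 &= v_s \frac{\partial}{\partial x_s}\Bigl(g\bigl(\Phi(t, x)\bigr)\Bigr)
  = v_s \frac{\partial}{\partial x_s} g_t(x),
\end{aligned}
\]

and so the Lemma is proven.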

The flow on functions (169) generates the flow on measures: $(t, \mu) \mapsto \mu_t$, where

\[ \int \psi \, d\mu_t = \int \psi_t \, d\mu. \tag{170} \]

Lemma 150. The flow (170) preserves the holonomy constraint.

Proof. Let µ be a holonomic measure. We have to prove that $\mu_t$ is also holonomic, i.e., $\int \psi \, d\mu_t = 0$ for any ψ ∈ C. By (170), $\int \psi \, d\mu_t = \int \psi_t \, d\mu$, and this vanishes since the flow (169) preserves the set C.

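Theorem 151. Let µ be a minimizing measure for the action (135), subject to the holonomy constraint. Then for any $C^1$ vector field $\xi : \mathbb{T}^d \to \mathbb{R}^d$ we have

\[ \int \frac{\partial L}{\partial x_s}\,\xi_s + \frac{\partial L}{\partial v_s}\, v_k \frac{\partial \xi_s}{\partial x_k} \, d\mu = 0. \tag{171} \]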

Proof. Let $\mu_t$ be the flow generated from µ by (170). Relation (171) expresses the fact that

\[ \frac{d}{dt}\Bigl(\int L(x, v)\, d\mu_t\Bigr)\Big|_{t=0} = 0. \]
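For completeness, here is a sketch of why this derivative produces the integrand of (171). It assumes that differentiation under the integral sign is justified (an assumption not discussed in the text) and uses that $\mu_t$ is holonomic for every real t by Lemma 150, so that t = 0 is an interior minimum of $t \mapsto \int L \, d\mu_t$.

```latex
\begin{align*}
\frac{d}{dt}\int L\,d\mu_t\Big|_{t=0}
 &= \frac{d}{dt}\int L\bigl(X(t,x,v),V(t,x,v)\bigr)\,d\mu\Big|_{t=0}
    && \text{by (170) and (169)} \\
 &= \int \frac{\partial L}{\partial x_k}\,\dot x_k
      + \frac{\partial L}{\partial v_k}\,\dot v_k \,d\mu
    && \text{chain rule at } t=0 \\
 &= \int \frac{\partial L}{\partial x_k}\,\xi_k
      + \frac{\partial L}{\partial v_k}\,v_i\frac{\partial \xi_k}{\partial x_i}\,d\mu
    && \text{by (167)}.
\end{align*}
% Renaming the summation indices (k -> s, i -> k) gives the integrand of (171).
```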

4.3. Invariance. In this section we present a new proof of the

invariance under the Euler-Lagrange flow of minimal holonomic mea-

sures.

In what follows $(\ )^{-1}_{js}$ denotes the j, s entry of the inverse matrix. We

will only use this notation for symmetric matrices, thus, this notation

will not lead to any ambiguity. Before stating and proving the main

Theorem of this section, we will prove an auxiliary lemma.


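Lemma 152. Let µ be a minimal holonomic measure. Let $v^\varepsilon(x)$ be any smooth function. Let φ(x, v) be any smooth compactly supported function. Then

\[
\begin{aligned}
&\int v_k \frac{\partial \phi}{\partial x_k}\bigl(x, v^\varepsilon(x)\bigr)
 + \frac{\partial \phi}{\partial v_j}\bigl(x, v^\varepsilon(x)\bigr)
   \Bigl(\frac{\partial^2 L}{\partial^2 v}\Bigr)^{-1}_{js}\bigl(x, v^\varepsilon(x)\bigr)
   \Bigl(\frac{\partial L}{\partial x_s}(x, v)
     - v_k \frac{\partial^2 L}{\partial x_k \partial v_s}\bigl(x, v^\varepsilon(x)\bigr)\Bigr)\, d\mu \\
&\quad = \int v_k \frac{\partial}{\partial x_k}\Bigl(\phi\bigl(x, v^\varepsilon(x)\bigr)\Bigr)\, d\mu
 - \int v_k \frac{\partial}{\partial x_k}\Bigl(\frac{\partial L}{\partial v_s}\bigl(x, v^\varepsilon(x)\bigr)\, X^\varepsilon_s\Bigr)\, d\mu \\
&\qquad + \int v_k \Bigl(\frac{\partial L}{\partial v_s}\bigl(x, v^\varepsilon(x)\bigr)
     - \frac{\partial L}{\partial v_s}(x, v)\Bigr) \frac{\partial}{\partial x_k}\bigl(X^\varepsilon_s\bigr)\, d\mu,
\end{aligned}
\tag{172}
\]

where $X^\varepsilon_s$ is a function of x only (it does not depend on v), and is defined as follows:

\[ X^\varepsilon_s(x) = \frac{\partial \phi}{\partial v_j}\bigl(x, v^\varepsilon(x)\bigr)
   \Bigl(\frac{\partial^2 L}{\partial^2 v}\Bigr)^{-1}_{js}\bigl(x, v^\varepsilon(x)\bigr). \]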

Remark. We will only use this lemma for the case when $v^\varepsilon$ is the standard smoothing of the function $\mathbf{v}(x)$, that is, $v^\varepsilon = \eta^\varepsilon * \mathbf{v}$, where $\eta^\varepsilon$ is a standard mollifier. The function $\mathbf{v}(x)$ is the function whose graph contains the support of µ, given in Theorem ??. This explains the notation $v^\varepsilon$.

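Proof. This Lemma is based on Theorem 151. In this proof and below $v^\varepsilon$ stands for the function $v^\varepsilon(x)$. We have:

\[ v_k \frac{\partial \phi}{\partial x_k}\bigl(x, v^\varepsilon(x)\bigr)
 = v_k \frac{\partial}{\partial x_k}\Bigl(\phi\bigl(x, v^\varepsilon(x)\bigr)\Bigr)
 - v_k \frac{\partial \phi}{\partial v_j}\bigl(x, v^\varepsilon(x)\bigr) \frac{\partial v^\varepsilon_j}{\partial x_k}(x). \]

Rewrite the last term:

\[
\begin{aligned}
v_k \frac{\partial \phi}{\partial v_j}\bigl(x, v^\varepsilon(x)\bigr) \frac{\partial v^\varepsilon_j}{\partial x_k}(x)
 &= v_k \frac{\partial \phi}{\partial v_j}(x, v^\varepsilon)
    \Bigl(\frac{\partial^2 L}{\partial^2 v}\Bigr)^{-1}_{js}(x, v^\varepsilon)\,
    \frac{\partial^2 L}{\partial v_s \partial v_q}(x, v^\varepsilon)\,
    \frac{\partial v^\varepsilon_q}{\partial x_k}(x) \\
 &= v_k X^\varepsilon_s(x)\, \frac{\partial^2 L}{\partial v_s \partial v_q}(x, v^\varepsilon)\,
    \frac{\partial v^\varepsilon_q}{\partial x_k}(x).
\end{aligned}
\]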

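Plugging these two identities into (172), we reduce (172) to

\[
\begin{aligned}
&\int X^\varepsilon_s(x) \Bigl(\frac{\partial L}{\partial x_s}(x, v)
   - v_k \Bigl(\frac{\partial^2 L}{\partial x_k \partial v_s}(x, v^\varepsilon)
   + \frac{\partial^2 L}{\partial v_s \partial v_q}(x, v^\varepsilon) \frac{\partial v^\varepsilon_q}{\partial x_k}\Bigr)\Bigr)\, d\mu \\
&\quad = -\int v_k \frac{\partial}{\partial x_k}\Bigl(\frac{\partial L}{\partial v_s}(x, v^\varepsilon)\, X^\varepsilon_s\Bigr)\, d\mu
 + \int v_k \Bigl(\frac{\partial L}{\partial v_s}(x, v^\varepsilon) - \frac{\partial L}{\partial v_s}(x, v)\Bigr)
   \frac{\partial}{\partial x_k}\bigl(X^\varepsilon_s\bigr)\, d\mu.
\end{aligned}
\tag{173}
\]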


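Using the chain rule in the LHS and the Leibniz rule in the RHS we further reduce (173) to

\[
\begin{aligned}
&\int X^\varepsilon_s \Bigl(\frac{\partial L}{\partial x_s}(x, v)
   - v_k \frac{\partial}{\partial x_k}\Bigl(\frac{\partial L}{\partial v_s}(x, v^\varepsilon)\Bigr)\Bigr)\, d\mu \\
&\quad = -\int v_k X^\varepsilon_s \frac{\partial}{\partial x_k}\Bigl(\frac{\partial L}{\partial v_s}(x, v^\varepsilon)\Bigr)\, d\mu
 - \int v_k \frac{\partial L}{\partial v_s}(x, v) \frac{\partial}{\partial x_k}\bigl(X^\varepsilon_s\bigr)\, d\mu.
\end{aligned}
\]

Noting the cancellation of the term $\int v_k X^\varepsilon_s \frac{\partial}{\partial x_k}\bigl(\frac{\partial L}{\partial v_s}(x, v^\varepsilon)\bigr)\, d\mu$, we see that the last identity is equivalent to (171) with $\xi_s(x) = X^\varepsilon_s(x)$.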

Theorem 153. Let µ be a minimizing holonomic measure. Then µ is

invariant under the Euler-Lagrange flow.

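Proof. By Lemma 59 we have to prove that for any smooth compactly supported function φ(x, v)

\[ \int v_k \frac{\partial \phi}{\partial x_k}
 + \frac{\partial \phi}{\partial v_j} \Bigl(\frac{\partial^2 L}{\partial^2 v}\Bigr)^{-1}_{js}
   \Bigl[\frac{\partial L}{\partial x_s} - v_k \frac{\partial^2 L}{\partial x_k \partial v_s}\Bigr]\, d\mu = 0, \tag{174} \]

where $(\ )^{-1}_{js}$ stands for the j, s entry of the inverse matrix.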

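The idea of the proof is first to rewrite (174) in an equivalent form and then apply an approximation argument. Since µ is supported by the graph $v = \mathbf{v}(x)$, we replace the arguments (x, v) by $(x, \mathbf{v}(x))$ in the following four types of functions occurring in (174): $\frac{\partial \phi}{\partial x_k}$, $\frac{\partial \phi}{\partial v_j}$, $\bigl(\frac{\partial^2 L}{\partial^2 v}\bigr)^{-1}_{js}$, and $\frac{\partial^2 L}{\partial x_k \partial v_s}$. This gives

\[
\int v_k \frac{\partial \phi}{\partial x_k}\bigl(x, \mathbf{v}(x)\bigr)
 + \frac{\partial \phi}{\partial v_j}\bigl(x, \mathbf{v}(x)\bigr)
   \Bigl(\frac{\partial^2 L}{\partial^2 v}\Bigr)^{-1}_{js}\bigl(x, \mathbf{v}(x)\bigr)
   \Bigl(\frac{\partial L}{\partial x_s}(x, v)
     - v_k \frac{\partial^2 L}{\partial x_k \partial v_s}\bigl(x, \mathbf{v}(x)\bigr)\Bigr)\, d\mu = 0.
\tag{175}
\]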

To complete the proof of the theorem, we use Lemma 152. The first and second integrals in the RHS of (172) are zero due to the holonomy constraint. The third integral in the RHS of (172) tends to zero as ε → 0, because $|v^\varepsilon(x) - \mathbf{v}(x)| < c\varepsilon$ and therefore $|v^\varepsilon(x) - v| < c\varepsilon$ µ-a.e., and because $X^\varepsilon_s$ is uniformly Lipschitz and hence $\partial_{x_k} X^\varepsilon_s$ is uniformly bounded. Therefore the LHS of (172) tends to zero as ε → 0.


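But the LHS of (172) also tends to the LHS of (175) as ε → 0. Indeed, since $\mathbf{v}(x)$ is a Lipschitz vector field we have

\[ v^\varepsilon(x) \to \mathbf{v}(x) \ \text{(uniformly)} \quad\text{and}\quad \frac{\partial v^\varepsilon(x)}{\partial x} \ \text{is uniformly bounded}. \]

Moreover, for any smooth function Ψ(x, v) we have

\[ \Psi\bigl(x, v^\varepsilon(x)\bigr) \to \Psi\bigl(x, \mathbf{v}(x)\bigr) \ \text{(uniformly)} \quad\text{and}\quad \frac{\partial}{\partial x}\Bigl(\Psi\bigl(x, v^\varepsilon(x)\bigr)\Bigr) \ \text{is uniformly bounded}. \]

Also note that for µ-almost all (x, v) we have $v = \mathbf{v}(x)$. Therefore the Theorem is proven.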

5. Monge-Kantorowich problem

In this section we are going to study the Monge-Kantorowich prob-

lem. First we will show that there exists a solution.

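Theorem 154. Let $\mu^\pm$ be two probability measures on $\mathbb{R}^n$ with $\int_{\mathbb{R}^n} |x|^2 \, d\mu^\pm < \infty$. Then there exists a measure µ which minimizes

\[ \frac{1}{2} \int_{\mathbb{R}^{2n}} |x - y|^2 \, d\mu(x, y) \]

over all probability measures µ on $\mathbb{R}^{2n}$ which satisfy $\mu|_x = \mu^+$ and $\mu|_y = \mu^-$.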

Remark. The integrability condition $\int_{\mathbb{R}^n} |x|^2 \, d\mu^\pm < \infty$ can be relaxed; see, for instance, [Vil03a].

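Proof. Let $\mu_n$ be a minimizing sequence, that is,

\[ \int_{\mathbb{R}^{2n}} \frac{1}{2}|x - y|^2 \, d\mu_n \to \inf_{\mu} \int_{\mathbb{R}^{2n}} \frac{1}{2}|x - y|^2 \, d\mu. \]

Since each $\mu_n$ satisfies $\mu_n|_x = \mu^+$ and $\mu_n|_y = \mu^-$, we have

\[ \sup_n \int_{\mathbb{R}^{2n}} |x|^2 + |y|^2 \, d\mu_n < \infty. \]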


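Consequently, the sequence $\mu_n$ is precompact, that is, through a subsequence $\mu_n \rightharpoonup \mu$, for some measure µ with the same marginals. Let $c_k(x, y)$ be a sequence of compactly supported continuous functions such that $c_k(x, y)$ increases pointwise to $\frac{1}{2}|x - y|^2$. Then, by the monotone convergence theorem,

\[
\begin{aligned}
\frac{1}{2} \int_{\mathbb{R}^{2n}} |x - y|^2 \, d\mu
 &= \lim_{k \to \infty} \int_{\mathbb{R}^{2n}} c_k(x, y) \, d\mu \\
 &= \lim_{k \to \infty} \lim_{n \to \infty} \int_{\mathbb{R}^{2n}} c_k(x, y) \, d\mu_n \\
 &\le \lim_{n \to \infty} \int_{\mathbb{R}^{2n}} \frac{1}{2}|x - y|^2 \, d\mu_n,
\end{aligned}
\]

from which we conclude that µ is a minimizer.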
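A discrete instance may help to fix ideas. The sketch below is an illustration only, under the following assumptions: the dimension is 1, and $\mu^+$ and $\mu^-$ are uniform measures on five hypothetical points each. In that case the admissible measures are 1/5 times doubly stochastic matrices, and since the cost is linear in µ its minimum is attained at a permutation (Birkhoff-von Neumann theorem), so a brute-force search over permutations finds the minimizer. The search is then compared with the monotone (sorted) matching.

```python
from itertools import permutations

def transport_cost(xs, ys):
    """Cost (1/2) * integral of |x - y|^2 for the coupling that pairs
    xs[i] with ys[i], each pair carrying mass 1/n."""
    n = len(xs)
    return sum(0.5 * (x - y) ** 2 for x, y in zip(xs, ys)) / n

def brute_force_minimum(xs, ys):
    """Minimum cost over all couplings induced by permutations of ys."""
    return min(transport_cost(xs, perm) for perm in permutations(ys))

def monotone_cost(xs, ys):
    """Cost of the monotone rearrangement: sorted xs paired with sorted ys."""
    return transport_cost(sorted(xs), sorted(ys))

# Hypothetical supports of mu^+ and mu^- (uniform weights 1/5).
xs = [0.3, -1.2, 2.0, 0.9, -0.4]
ys = [1.5, 0.1, -0.7, 2.2, 0.8]

best = brute_force_minimum(xs, ys)
sorted_cost = monotone_cost(xs, ys)
```

In this one-dimensional example `best` and `sorted_cost` coincide, reflecting the fact that for the quadratic cost on the line the monotone matching is optimal.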

Exercise 182. Show that the dual of the Monge-Kantorowich problem consists in determining continuous functions φ(x) and ψ(y) such that

\[ \phi(x) + \psi(y) \le \frac{1}{2}|x - y|^2 \]

and that maximize

\[ \int_{\mathbb{R}^n} \phi(x) \, d\mu^+(x) + \int_{\mathbb{R}^n} \psi(y) \, d\mu^-(y). \]

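Let φ and ψ be two admissible functions, that is,

\[ \phi(x) + \psi(y) \le \frac{1}{2}|x - y|^2. \]

Then

\[ \phi(x) - \frac{|x|^2}{2} + \psi(y) - \frac{|y|^2}{2} \le -x \cdot y, \]

that is,

\[ \bar\phi(x) + \bar\psi(y) \ge x \cdot y, \]

with $\bar\phi(x) = \frac{|x|^2}{2} - \phi(x)$ and $\bar\psi(y) = \frac{|y|^2}{2} - \psi(y)$. On the other hand,

\[
\begin{aligned}
\int_{\mathbb{R}^n} \phi(x) \, d\mu^+(x) + \int_{\mathbb{R}^n} \psi(y) \, d\mu^-(y)
 = &-\int_{\mathbb{R}^n} \bar\phi(x) \, d\mu^+(x) - \int_{\mathbb{R}^n} \bar\psi(y) \, d\mu^-(y) \\
 &+ \int_{\mathbb{R}^n} \frac{|x|^2}{2} \, d\mu^+(x) + \int_{\mathbb{R}^n} \frac{|y|^2}{2} \, d\mu^-(y).
\end{aligned}
\]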

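Let

\[ \Theta(\bar\phi, \bar\psi) = -\int_{\mathbb{R}^n} \bar\phi(x) \, d\mu^+(x) - \int_{\mathbb{R}^n} \bar\psi(y) \, d\mu^-(y). \]

Let

\[ \bar\psi^*(x) = \sup_y \bigl[x \cdot y - \bar\psi(y)\bigr] \]

be the Legendre transform of $\bar\psi$. Since admissibility forces $\bar\phi \ge \bar\psi^*$, the pair $(\bar\psi^*, \bar\psi)$ satisfies

\[ \Theta(\bar\phi, \bar\psi) \le \Theta(\bar\psi^*, \bar\psi). \]

Applying a similar reasoning to the pair $(\bar\psi^*, \bar\psi)$, and replacing $\bar\psi(y)$ by

\[ \bar\psi^{**}(y) = \sup_x \bigl[x \cdot y - \bar\psi^*(x)\bigr], \]

we obtain

\[ \Theta(\bar\psi^*, \bar\psi) \le \Theta(\bar\psi^*, \bar\psi^{**}). \]

Therefore, the dual of the Monge-Kantorowich problem is equivalent to minimizing

\[ \int_{\mathbb{R}^n} \bar\psi^*(x) \, d\mu^+(x) + \int_{\mathbb{R}^n} \bar\psi^{**}(y) \, d\mu^-(y) \]

over convex conjugate functions satisfying

\[ \bar\psi^*(x) + \bar\psi^{**}(y) \ge x \cdot y. \]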
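The two Legendre-transform steps can be illustrated on a grid. This is a rough sketch under stated assumptions: the dimension is 1, the supremum is replaced by a maximum over a finite grid on [-3, 3] (so the transform computed is that of the function restricted to the grid), and the starting function y⁴ − y² is a hypothetical non-convex choice. The names `legendre`, `psi`, `psi_star` and `psi_star_star` are illustrative.

```python
def legendre(values, grid):
    """Discrete Legendre transform on a grid.

    Given f(y) sampled as `values` on `grid`, return the list of
    f*(p) = max over y in grid of (p * y - f(y)), for p in grid.
    """
    return [max(p * y - f for y, f in zip(grid, values)) for p in grid]

# Grid on [-3, 3] with step 0.1; index 30 corresponds to the point 0.
grid = [i / 10 for i in range(-30, 31)]

# Hypothetical non-convex starting function psi(y) = y^4 - y^2.
psi = [y ** 4 - y ** 2 for y in grid]

psi_star = legendre(psi, grid)            # first transform
psi_star_star = legendre(psi_star, grid)  # second transform (biconjugate)

# Both pairs satisfy the constraint f(x) + g(y) >= x * y on the grid,
# and the biconjugate never exceeds psi, so the sum of integrals
# against any probability measures on the grid can only decrease.
n = len(grid)
constraint_first = all(
    psi_star[i] + psi[j] >= grid[i] * grid[j] - 1e-9
    for i in range(n) for j in range(n)
)
constraint_second = all(
    psi_star[i] + psi_star_star[j] >= grid[i] * grid[j] - 1e-9
    for i in range(n) for j in range(n)
)
below_psi = all(psi_star_star[j] <= psi[j] + 1e-9 for j in range(n))
```

At the grid point 0 the biconjugate lies strictly below the original function, which shows that replacing a non-convex function by its double transform genuinely improves the value of Θ.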

Bibliography

[AKN97] V. I. Arnold, V. V. Kozlov, and A. I. Neishtadt. Mathematical aspects of classical and celestial mechanics. Springer-Verlag, Berlin, 1997. Translated from the 1985 Russian original by A. Iacob, Reprint of the original English edition from the series Encyclopaedia of Mathematical Sciences [Dynamical systems. III, Encyclopaedia Math. Sci., 3, Springer, Berlin, 1993; MR 95d:58043a].

[Arn66] V. Arnold. Sur la géométrie différentielle des groupes de Lie de dimension infinie et ses applications à l'hydrodynamique des fluides parfaits. Ann. Inst. Fourier (Grenoble), 16(fasc. 1):319–361, 1966.

[Bar94] Guy Barles. Solutions de viscosité des équations de Hamilton-Jacobi. Springer-Verlag, Paris, 1994.

[BCD97] Martino Bardi and Italo Capuzzo-Dolcetta. Optimal control and viscosity solutions of Hamilton-Jacobi-Bellman equations. Systems & Control: Foundations & Applications. Birkhäuser Boston Inc., Boston, MA, 1997. With appendices by Maurizio Falcone and Pierpaolo Soravia.

[EG92] L. C. Evans and R. F. Gariepy. Measure theory and fine properties of functions. CRC Press, Boca Raton, FL, 1992.

[EG01] L. C. Evans and D. Gomes. Effective Hamiltonians and averaging for Hamiltonian dynamics. I. Arch. Ration. Mech. Anal., 157(1):1–33, 2001.

[Eva98a] L. C. Evans. Partial differential equations. American Mathematical Society, Providence, RI, 1998.

[Eva98b] Lawrence C. Evans. Partial differential equations, volume 19 of Graduate Studies in Mathematics. American Mathematical Society, Providence, RI, 1998.

[Eva99] Lawrence C. Evans. Partial differential equations and Monge-Kantorovich mass transfer. In Current developments in mathematics, 1997 (Cambridge, MA), pages 65–126. Int. Press, Boston, MA, 1999.

[Fra02] Joel N. Franklin. Methods of mathematical economics, volume 37 of Classics in Applied Mathematics. Society for Industrial and Applied Mathematics (SIAM), Philadelphia, PA, 2002. Linear and nonlinear programming, fixed-point theorems, Reprint of the 1980 original.


[FS93] Wendell H. Fleming and H. Mete Soner. Controlled Markov processes and viscosity solutions, volume 25 of Applications of Mathematics. Springer-Verlag, New York, 1993.

[Gia83] Mariano Giaquinta. Multiple integrals in the calculus of variations and nonlinear elliptic systems, volume 105 of Annals of Mathematics Studies. Princeton University Press, Princeton, NJ, 1983.

[Gia93] Mariano Giaquinta. Introduction to regularity theory for nonlinear elliptic systems. Lectures in Mathematics ETH Zürich. Birkhäuser Verlag, Basel, 1993.

[Gol80] H. Goldstein. Classical mechanics. Addison-Wesley Publishing Co., Reading, Mass., second edition, 1980. Addison-Wesley Series in Physics.

[Gom00] D. Gomes. Viscosity Solutions and Asymptotics for Hamiltonian Systems, Ph. D. Thesis. University of California at Berkeley, 2000.

[Gom02a] D. Gomes. A stochastic analogue of Aubry-Mather theory. Nonlinearity, 15(3):581–603, 2002.

[Gom02b] Diogo Aguiar Gomes. A stochastic analogue of Aubry-Mather theory. Nonlinearity, 15(3):581–603, 2002.

[GSS08] D. Gomes, A. Sernadas, and C. Sernadas. Foundations and applications of linear optimization. Preprint, 2008.

[GT01] David Gilbarg and Neil S. Trudinger. Elliptic partial differential equations of second order. Classics in Mathematics. Springer-Verlag, Berlin, 2001. Reprint of the 1998 edition.

[Lio82] Pierre-Louis Lions. Generalized solutions of Hamilton-Jacobi equations. Pitman (Advanced Publishing Program), Boston, Mass., 1982.

[LL76] L. D. Landau and E. M. Lifshitz. Course of theoretical physics. Vol. 1. Pergamon Press, Oxford, third edition, 1976. Mechanics, Translated from the Russian by J. B. Sykes and J. S. Bell.

[Mat91] J. N. Mather. Action minimizing invariant measures for positive definite Lagrangian systems. Math. Z., 207(2):169–207, 1991.

[Mn96] Ricardo Mañé. Generic properties and problems of minimizing measures of Lagrangian systems. Nonlinearity, 9(2):273–310, 1996.

[Oli98] Waldyr Oliva. Geometric Mechanics. IST - Lecture Notes, Lisbon, 1998.

[Vil] C. Villani. Optimal transportation, dissipative pdes and functional inequalities.

[Vil03a] Cédric Villani. Topics in optimal transportation, volume 58 of Graduate Studies in Mathematics. American Mathematical Society, Providence, RI, 2003.

[Vil03b] Cédric Villani. Topics in optimal transportation, volume 58 of Graduate Studies in Mathematics. American Mathematical Society, Providence, RI, 2003.

Index

Campanato space, 172
canonical transformation, 83
Christoffel symbol, 61
coercivity in Rn, 11
condition
    Legendre-Hadamard, 134
conjugate point, 97
connection
    compatible with the metric, 66
    Levi-Civita, 66
    symmetric, 65
convex, 13
    strictly, 13
critical point, 14, 112
critical point of the action, 51
curvature
    sectional, 101
curvature tensor, 99

derivative
    covariant, 65
Dynamic programming principle, 188

equation
    Poisson, 130
    Euler-Lagrange, 51
    Monge-Ampère, 241
equations
    Hamilton, 81
Euler-Lagrange equation, 129

generating function, 84

Harnack inequality, 166

invariant
    Poincaré-Cartan, 82

Karush-Kuhn-Tucker (KKT) conditions, 38

Legendre transform, 76
Lemma
    John-Nirenberg, 164
lower semicontinuity, 12

minimax principle, 28
minimizing sequence, 10
Morrey space, 172

Palais-Smale condition, 113
Parallel transport, 64
Poisson manifold, 121
problem
    Monge-Kantorowich, 236

quasiconvex, 140

regular point, 212

semiconcave, 200
semiconvex, 200
subdifferential, 23, 197
subsolution, 159
superdifferential, 197
supersolution, 159
symplectic manifold, 121

Theorem
    DeGiorgi-Nash-Moser, 166
    Fenchel-Legendre-Rockafellar, 242
    Lax Milgram, 150
torsion, 65

viscosity solution, 216
viscosity supersolution/subsolution, 215

weakly lower semicontinuity, 138
