Optimal Transportation Problem by Stochastic Optimal Control

Copyright © by SIAM. Unauthorized reproduction of this article is prohibited.

SIAM J. CONTROL OPTIM. © 2008 Society for Industrial and Applied Mathematics
Vol. 47, No. 3, pp. 1127–1139

OPTIMAL TRANSPORTATION PROBLEM BY STOCHASTIC OPTIMAL CONTROL∗

TOSHIO MIKAMI† AND MICHELE THIEULLEN‡

Abstract. We address an optimal mass transportation problem by means of optimal stochastic control. We consider a stochastic control problem which is a natural extension of the Monge–Kantorovich problem. Using a vanishing viscosity argument we provide a probabilistic proof of two fundamental results in mass transportation: the Kantorovich duality and the graph property for the support of an optimal measure for the Monge–Kantorovich problem. Our key tool is a stochastic duality result involving solutions of the Hamilton–Jacobi–Bellman PDE.

Key words. optimal mass transportation theory, Monge–Kantorovich problem, Monge problem, duality, stochastic control, Hamilton–Jacobi–Bellman PDE, value function, vanishing viscosity, semiconvex functions

AMS subject classifications. 60J25, 60J60, 60G99, 93E20, 49J20, 70H20

DOI. 10.1137/050631264

1. Introduction. Our goal in the present paper is to show that stochastic optimal control theory can be used efficiently to study deterministic optimal mass transportation problems. Let us recall that optimal transportation theory consists of the following two minimization problems, where $P_0$ and $P_1$ are given Borel probability measures on $\mathbb{R}^d$ and the cost function $c : \mathbb{R}^d \times \mathbb{R}^d \to \mathbb{R}_+ \cup \{+\infty\}$ is measurable. In this paper the cost function has the form

$$c(x, y) = L(y - x) \tag{1.1}$$

with $L : \mathbb{R}^d \to [0, \infty)$ convex. In the Monge problem the object of study is

$$T_M(P_0, P_1) := \inf \left\{ \int_{\mathbb{R}^d} L(g(x) - x)\, P_0(dx) \right\}, \tag{1.2}$$

where the infimum is taken over all measurable maps $g : \mathbb{R}^d \to \mathbb{R}^d$ such that the image of $P_0$ by $g$ is $P_1$. In the Monge–Kantorovich problem (MKP), one considers

$$T_{MK}(P_0, P_1) := \inf \left\{ \int_{\mathbb{R}^d \times \mathbb{R}^d} L(y - x)\, \mu(dx\,dy) \right\}, \tag{1.3}$$

where the infimum is taken over the set of probability measures $\mu$ on $\mathbb{R}^d \times \mathbb{R}^d$ with marginals $P_0$ and $P_1$ (namely, such that $\mu(A \times \mathbb{R}^d) = P_0(A)$ and $\mu(\mathbb{R}^d \times B) = P_1(B)$). The resolution of (1.2) is a difficult problem. Kantorovich introduced the relaxed version (1.3) as a step toward solving (1.2). It is easy to check that

$$T_{MK}(P_0, P_1) \le T_M(P_0, P_1). \tag{1.4}$$

∗Received by the editors May 11, 2005; accepted for publication (in revised form) November 27, 2007; published electronically March 19, 2008. http://www.siam.org/journals/sicon/47-3/63126.html

†Department of Mathematics, Hokkaido University, Sapporo 060-0810, Japan (mikami@math.sci.hokudai.ac.jp). This author's research was partially supported by the Grant-in-Aid for Scientific Research 15340047, 15340051, 16654031, JSPS.

‡Corresponding author. Laboratoire de Probabilités et Modèles Aléatoires, Boîte 188, 4, Place Jussieu, Université Paris VI, 75252 Paris cedex 05, France ([email protected]). This author's research was supported by the Grant-in-Aid for Scientific Research 15340051, 16654031, JSPS.


Downloaded 11/23/14 to 129.120.242.61. Redistribution subject to SIAM license or copyright; see http://www.siam.org/journals/ojsa.php



Indeed, any measurable mapping $g : \mathbb{R}^d \to \mathbb{R}^d$ such that the image measure of $P_0$ by $g$ is $P_1$ satisfies

$$\int_{\mathbb{R}^d} L(g(x) - x)\, P_0(dx) = \int_{\mathbb{R}^d \times \mathbb{R}^d} L(y - x)\, \mu_g(dx\,dy), \tag{1.5}$$

where $\mu_g$ is the image measure of $P_0$ by the mapping

$$\begin{align} \mathbb{R}^d &\to \mathbb{R}^d \times \mathbb{R}^d, \tag{1.6} \\ x &\mapsto (x, g(x)). \tag{1.7} \end{align}$$

Inequality (1.4) follows since $\mu_g$ is a probability measure on $\mathbb{R}^d \times \mathbb{R}^d$ with marginals $P_0$ and $P_1$. Moreover, an optimal measure for (1.3) always exists (cf. [14]). If any such measure is supported by the graph of a measurable map, we say that the graph property holds; that is, if for any $\mu^*$ optimal for (1.3), there exists a set $\Gamma$ satisfying $\mu^*(\Gamma) = 1$ and

$$\Gamma = \{(x, \theta(x));\ x \in \mathbb{R}^d\} \tag{1.8}$$

for some measurable mapping $\theta$. If the graph property holds, it provides a solution to the Monge problem (1.2). Indeed, in this case

$$T_{MK}(P_0, P_1) = \int_{\mathbb{R}^d \times \mathbb{R}^d} L(y - x)\, \mu^*(dx\,dy) = \int_{\mathbb{R}^d} L(\theta(x) - x)\, P_0(dx).$$

Using (1.4), we see that the mapping $\theta$ minimizes the Monge problem (1.2). In order to check whether the graph property is satisfied, Kantorovich duality for (1.3) plays a fundamental role. It was first proved by Kantorovich (cf. [7]) when the cost function is a distance and later generalized by Kellerer (cf. [8]). It runs as follows:

$$T_{MK}(P_0, P_1) = \sup \left\{ \int_{\mathbb{R}^d} \psi(y)\, P_1(dy) - \int_{\mathbb{R}^d} \varphi(x)\, P_0(dx) \right\}, \tag{1.9}$$

where the supremum is taken over all pairs $(\varphi, \psi) \in L^1(P_0) \times L^1(P_1)$ satisfying $\psi(y) - \varphi(x) \le L(y - x)$. To go from Kantorovich duality to the graph property, two types of arguments have been used: differentiability properties of convex functions for the quadratic cost (cf. [1]) and geometrical properties of cyclically monotone sets for general costs (cf. [6]).
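The identity (1.5), the inequality (1.4), and the easy half of the duality (1.9) can be checked by hand on a finite example. The sketch below is only an illustration under assumptions that are not in the paper: $P_0$ and $P_1$ are uniform on three hypothetical points of the real line (so that admissible maps $g$ are exactly the bijections), the cost is $L(u) = u^2/2$, and the names `monge_cost`, `coupling_cost`, and `dual_value` are invented for the example.

```python
from itertools import permutations

def L(u):
    """Quadratic cost L(u) = u**2 / 2 (an illustrative choice)."""
    return 0.5 * u * u

# Hypothetical data: P0 and P1 are uniform on these points of the real line.
xs = [0.0, 1.0, 3.0]
ys = [0.5, 2.0, 2.5]
n = len(xs)

def monge_cost(perm):
    """Left-hand side of (1.5) for the map g(xs[i]) = ys[perm[i]]."""
    return sum(L(ys[perm[i]] - xs[i]) for i in range(n)) / n

def coupling_cost(perm):
    """Right-hand side of (1.5): cost of the coupling mu_g carried by the graph of g."""
    mu_g = {(xs[i], ys[perm[i]]): 1.0 / n for i in range(n)}
    return sum(weight * L(y - x) for (x, y), weight in mu_g.items())

# Brute-force Monge problem (1.2): minimize over all bijections.
best = min(permutations(range(n)), key=monge_cost)
t_monge = monge_cost(best)

def dual_value(phi):
    """Dual functional of (1.9) for phi and psi(y) := min_x (phi(x) + L(y - x)).

    This choice of psi guarantees psi(y) - phi(x) <= L(y - x) for all pairs.
    """
    psi = [min(phi[i] + L(y - xs[i]) for i in range(n)) for y in ys]
    return (sum(psi) - sum(phi)) / n

print("Monge value:", t_monge)
print("dual value for phi = 0:", dual_value([0.0, 0.0, 0.0]))
```

Every admissible pair gives a lower bound on the transport cost, so `dual_value(phi)` never exceeds `t_monge`, whatever `phi` is chosen.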

In the present paper we show that Kantorovich duality and the graph property can be proved by stochastic optimal control combined with a vanishing viscosity argument. It is not clear a priori that stochastic optimal control theory is well suited to studying problems such as $T_{MK}$ or $T_M$, where the initial and the final distributions are both imposed. However, for the case when the cost is $L(u) = |u|^2$, one of us (cf. [10]) addressed (1.2) directly without using (1.3) and gave a probabilistic proof of existence and uniqueness of a solution to (1.2). The proof in [10] relies on h-path processes and cyclically monotone sets. In the present paper, on the contrary, we focus on (1.3) and on duality arguments. We rely on a stochastic duality result which we proved in [11]. The basis of this result is the correspondence between solutions of the Hamilton–Jacobi–Bellman (HJB) partial differential equation (PDE) and value functions of stochastic control. We do not use cyclically monotone sets. By vanishing viscosity we prove Kantorovich duality and we recover the graph property. Thus the present paper together with [11] provides a global treatment of these two building blocks of optimal transportation theory by stochastic optimal control. Let us mention that here $L$ is more general than $|u|^2$ and our method greatly simplifies the arguments of [10]. Classically (cf. [14]) the graph property is proved for $L$ satisfying a cone-type condition which is easy to check only for radial $L$. We prove the graph property without any additional cone condition, for $L$ that is not necessarily radial and satisfies $L(u) \sim |u|^2$ at infinity.

In our approach by vanishing viscosity there are still open questions left. One question is the convergence of the optimal process for the stochastic control problem (see section 2 below) to the optimal trajectory for (1.2). It is known that when $L(u) = |u|^2$, each optimal process is an h-path process, which is a rather explicit property. Using this information, it was proved in [10] that these optimal h-path processes converge to the deterministic optimal trajectory of (1.2) when their diffusion part tends to zero. To prove an analogous convergence when $L(u) \sim |u|^2$ at infinity, one may wish to use the following result obtained in [11], which can be compared to the h-path process property: when $L(u) \sim |u|^2$ at infinity, the optimal process of the stochastic control problem solves a forward-backward system (cf. [11, Theorem 2.2]).

The paper is organized as follows. In section 2 we review the stochastic duality theorem that we have proved in [11]. Sections 3 and 4 present two applications of this stochastic duality combined with a vanishing viscosity argument: Kantorovich duality in section 3 and the graph property in section 4 are both proved using this method.

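2. A stochastic duality result. We will be working under the following assumptions: $L : \mathbb{R}^d \to [0, \infty)$ is convex, and

(A.1) for some $\delta > 1$,

$$\liminf_{|u| \to \infty} \frac{L(u)}{|u|^{\delta}} > 0.$$

(A.2) (i) $L \in C^3(\mathbb{R}^d)$,
(ii) $D_u^2 L(u)$ is positive definite for all $u \in \mathbb{R}^d$.

We denote by $H$ the Legendre transform of $L$:

$$H(z) := \sup_{u \in \mathbb{R}^d} \{ \langle z, u \rangle - L(u) \} \tag{2.1}$$

for $z \in \mathbb{R}^d$; $\nabla := (\partial/\partial x_i)_{i=1}^d$ and $\langle \cdot, \cdot \rangle$ denotes the inner product in $\mathbb{R}^d$.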
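To make definition (2.1) concrete, the following sketch approximates the Legendre transform numerically in dimension one and compares it with two closed forms. It is an illustration only: the grid bounds, the function name `legendre`, and the two sample costs are assumptions of this example, chosen because both costs satisfy (A.1) and (A.2) and have transforms that can be computed by hand ($H(z) = z^2/2$ for $L(u) = u^2/2$, and $H(z) = z \operatorname{asinh}(z) - \sqrt{1 + z^2} + 1$ for $L(u) = \cosh(u) - 1$).

```python
import math

def legendre(L, z, lo=-10.0, hi=10.0, steps=20001):
    """Approximate H(z) = sup_u (z*u - L(u)) by a maximum over a uniform grid.

    The grid [lo, hi] is an assumption of this sketch; it must contain the
    maximizer u for the values of z that are queried.
    """
    h = (hi - lo) / (steps - 1)
    return max(z * (lo + k * h) - L(lo + k * h) for k in range(steps))

def L_quadratic(u):
    return 0.5 * u * u

def L_cosh(u):
    return math.cosh(u) - 1.0

def H_quadratic(z):
    return 0.5 * z * z

def H_cosh(z):
    return z * math.asinh(z) - math.sqrt(1.0 + z * z) + 1.0

for z in (-2.0, 0.0, 1.5, 3.0):
    print(z, legendre(L_quadratic, z), H_quadratic(z),
          legendre(L_cosh, z), H_cosh(z))
```

The grid maximum is always a lower bound for the true supremum, so the numerical values approach the closed forms from below as the grid is refined.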

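2.1. The stochastic control problem. We consider the following stochastic optimization problem. For $\varepsilon > 0$, let

$$V_\varepsilon(P_0, P_1) := \inf \left\{ E\left[ \int_0^1 L(\beta_X(t, X))\, dt \right] \ \middle|\ X \in \mathcal{A}^\varepsilon \text{ such that } P X_0^{-1} = P_0,\ P X_1^{-1} = P_1 \right\}, \tag{2.2}$$

where $\mathcal{A}^\varepsilon$ is the set of all $\mathbb{R}^d$-valued, continuous semimartingales $\{X(t)\}_{0 \le t \le 1}$ on a probability space $(\Omega, \mathbf{B}, P)$ such that there exists a Borel measurable $\beta_X : [0, 1] \times C([0, 1]) \to \mathbb{R}^d$ for which

(i) $\omega \mapsto \beta_X(t, X(\omega))$ is $\mathbf{B}_t(C)_+$-measurable for all $t \in [0, 1]$, where $\mathbf{B}_t(C)$ denotes the Borel $\sigma$-field of $C([0, t])$;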




(ii) $\left\{ X(t) - X(0) - \int_0^t \beta_X(s, X)\, ds := \sqrt{\varepsilon}\, W_X(t) \right\}_{0 \le t \le 1}$, where $W_X$ is a $\sigma[X(s) : 0 \le s \le t]$-Brownian motion.

Results about existence and uniqueness of a minimizer for $V_\varepsilon$ are gathered in the following statement.

Theorem 2.1. Let $\varepsilon > 0$. Let us assume that $V_\varepsilon(P_0, P_1) < +\infty$ and that assumptions (A.1) and (A.2) hold. Then
(i) $V_\varepsilon(P_0, P_1)$ admits a minimizer;
(ii) if assumption (A.1) holds with $\delta = 2$, $V_\varepsilon(P_0, P_1)$ admits a Markovian minimizer;
(iii) if $L$ is strictly convex and assumption (A.1) holds with $\delta = 2$, then $V_\varepsilon(P_0, P_1)$ admits a unique minimizer (which is Markovian from (ii)).

Actually statements (ii) and (iii) will be of no use in the present paper. They were important in [11] in order to characterize the minimizer of (2.2) as the solution of a forward-backward system which consists of the coupling of a usual stochastic differential equation (SDE) with a backward one (we refer the reader to [2] for a study of such systems).
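The dynamics behind the class $\mathcal{A}^\varepsilon$, a drift $\beta_X$ plus the noise $\sqrt{\varepsilon}\, W_X$, can be simulated with a basic Euler–Maruyama scheme. The sketch below is a hedged illustration in dimension one, not a solver for (2.2): the constant drift $y - x_0$, the quadratic cost $L(u) = u^2/2$, the sample sizes, and the function names are all assumptions of this example. With a constant drift the endpoint has mean $y$ and variance $\varepsilon$, which is the Gaussian smoothing of the target that appears later in section 3.

```python
import math
import random

def simulate_endpoint(x0, drift, eps, n_steps, rng):
    """Euler-Maruyama for dX = drift(t, X) dt + sqrt(eps) dW on [0, 1].

    Returns the endpoint X(1) and the running cost int_0^1 L(drift) dt
    for the quadratic cost L(u) = u**2 / 2 (an illustrative choice).
    """
    dt = 1.0 / n_steps
    x, cost = x0, 0.0
    for k in range(n_steps):
        b = drift(k * dt, x)
        cost += 0.5 * b * b * dt
        x += b * dt + math.sqrt(eps * dt) * rng.gauss(0.0, 1.0)
    return x, cost

def estimate(x0, y, eps, n_paths=20000, n_steps=50, seed=0):
    """Monte Carlo mean and variance of X(1), and mean cost, for constant drift y - x0."""
    rng = random.Random(seed)

    def drift(t, x):
        return y - x0  # hypothetical control: straight-line motion from x0 toward y

    ends, costs = zip(*(simulate_endpoint(x0, drift, eps, n_steps, rng)
                        for _ in range(n_paths)))
    mean = sum(ends) / n_paths
    var = sum((e - mean) ** 2 for e in ends) / n_paths
    return mean, var, sum(costs) / n_paths

mean, var, cost = estimate(x0=0.0, y=1.0, eps=0.04)
print("mean of X(1):", mean, "variance of X(1):", var, "cost:", cost)
```

For this particular control the cost is deterministic and equals $L(y - x_0)$; the terminal law is Gaussian rather than a prescribed $P_1$, which is exactly why the constrained problem (2.2) is harder than a standard control problem.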

2.2. Stochastic duality. We now recall (Theorem 2.3 below) the stochastic duality result we obtained in [11]. In order to set the framework, we first quote a fundamental result of optimal stochastic control theory.

In the same way as $\mathcal{A}^\varepsilon$, we define the set of semimartingales $\mathcal{A}^\varepsilon_t$ in $C([t, 1])$, and we notice that (A.2)(ii) implies the strict convexity of $u \mapsto L(u)$. Moreover, the HJB equation with diffusion coefficient (or viscosity) $\varepsilon$ is the following PDE with given terminal value $\varphi(1, \cdot) = f(\cdot)$:

$$\frac{\partial \varphi(t, x)}{\partial t} + \frac{\varepsilon}{2} \Delta \varphi(t, x) + H(\nabla \varphi(t, x)) = 0 \qquad ((t, x) \in (0, 1) \times \mathbb{R}^d), \tag{2.3}$$

where $\Delta := \sum_{i=1}^d \partial^2 / \partial x_i^2$ and $\nabla := (\partial / \partial x_i;\ 1 \le i \le d)$.

Theorem 2.2 (cf. [5]). Suppose that (A.1) and (A.2) hold. Then for any $f \in C_b^\infty(\mathbb{R}^d)$, the HJB equation (2.3) with $\varphi(1, \cdot) = f$ has a unique solution $\varphi \in C^{1,2}([0, 1] \times \mathbb{R}^d) \cap C_b^{0,1}([0, 1] \times \mathbb{R}^d)$, which can be written as follows (as a value function):

$$\varphi(t, x) = \sup_{X \in \mathcal{A}^\varepsilon_t} \left\{ E[f(X(1)) \mid X(t) = x] - E\left[ \int_t^1 L(\beta_X(s, X))\, ds \ \middle|\ X(t) = x \right] \right\}, \tag{2.4}$$

and for the optimal $X \in \mathcal{A}^\varepsilon_t$, the following holds:

$$\beta_X(s, X) = D_x H(\nabla \varphi(s, X(s))).$$

In other words, this theorem establishes a one-to-one correspondence between classical solutions of (2.3) and value functions of stochastic control problems with smooth terminal cost. Actually it is a duality result since the supremum in (2.4) involves $L$, while (2.3) involves its Legendre transform $H$.
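As an aside that is not part of the paper's argument, the correspondence of Theorem 2.2 becomes explicit for the quadratic cost $L(u) = \frac{1}{2}|u|^2$, for which $H(z) = \frac{1}{2}|z|^2$. The classical Hopf–Cole substitution, recalled here as a standard fact, linearizes (2.3); the auxiliary function $h$ and the Brownian motion $W$ below are notation introduced only for this sketch.

```latex
% Illustration only: quadratic cost, so that (2.3) reads
\[
  \partial_t \varphi + \tfrac{\varepsilon}{2}\Delta\varphi
    + \tfrac{1}{2}|\nabla\varphi|^2 = 0,
  \qquad \varphi(1,\cdot) = f .
\]
% Hopf--Cole substitution: set h := \exp(\varphi/\varepsilon). The chain rule gives
\[
  \partial_t h + \tfrac{\varepsilon}{2}\Delta h
  = \frac{h}{\varepsilon}\Big(\partial_t\varphi
      + \tfrac{\varepsilon}{2}\Delta\varphi
      + \tfrac{1}{2}|\nabla\varphi|^2\Big) = 0 ,
\]
% so h solves a backward heat equation, and for a d-dimensional Brownian motion W
\[
  \varphi(t,x) = \varepsilon \log
  E\Big[\exp\Big(\tfrac{1}{\varepsilon}\,
      f\big(x + \sqrt{\varepsilon}\,W_{1-t}\big)\Big)\Big].
\]
% The optimal drift in Theorem 2.2 is then
\[
  \nabla\varphi(s,x) = \varepsilon\,\nabla \log h(s,x).
\]
```

A drift of the form $\varepsilon \nabla \log h$ with $h$ a positive solution of the heat equation is the defining feature of the h-path processes mentioned in the introduction.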

In [11] we proved the following duality theorem for the minimization problem (2.2).

Theorem 2.3 (stochastic duality). Let $\varepsilon > 0$ be fixed and $V_\varepsilon(P_0, P_1)$ be as defined in (2.2). Let us assume that (A.1), (A.2) are satisfied and

$$V_\varepsilon(P_0, P_1) < +\infty. \tag{2.5}$$




Then, the following identity holds:

$$V_\varepsilon(P_0, P_1) = \nu_\varepsilon(P_0, P_1) \tag{2.6}$$

with $\nu_\varepsilon(P_0, P_1)$ defined by

$$\nu_\varepsilon(P_0, P_1) := \sup \left\{ \int_{\mathbb{R}^d} \varphi(1, y)\, P_1(dy) - \int_{\mathbb{R}^d} \varphi(0, x)\, P_0(dx) \right\}, \tag{2.7}$$

where the supremum is taken over all classical solutions $\varphi$ to the HJB equation (2.3) for which $\varphi(1, \cdot) \in C_b^\infty(\mathbb{R}^d)$.

Remark 2.1. Actually a stronger version of Theorem 2.3 is proved in [11]: (2.6) holds true without assuming that $V_\varepsilon(P_0, P_1) < +\infty$; moreover, this identity still holds when the supremum in $\nu_\varepsilon$ is taken over all bounded, uniformly Lipschitz continuous viscosity solutions $\varphi$ of (2.3).

Before proceeding further we check that the right-hand side of (2.6) is finite when $V_\varepsilon(P_0, P_1) < +\infty$ and give an outline of the proof of this theorem. For the detailed proof we refer the reader to [11].

Let us first notice that Theorem 2.2 recalled above ensures, in particular, that given $f \in C_b^\infty(\mathbb{R}^d)$, a classical solution of the HJB PDE (2.3) exists with $f$ as a terminal value and belongs to $C^{1,2}([0, 1] \times \mathbb{R}^d) \cap C_b^{0,1}([0, 1] \times \mathbb{R}^d)$. For more details we refer the reader to [5, p. 206, Theorem 11.1 and p. 210, Remark 11.2]. Therefore the set on which the supremum in (2.7) is taken is not empty.

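Let us now assume that $V_\varepsilon(P_0, P_1) < +\infty$. For all $X \in \mathcal{A}^\varepsilon$ and $\varphi$ solution of (2.3) satisfying $\varphi(1, \cdot) \in C_b^\infty(\mathbb{R}^d)$, the constraints on the marginals of $X$ combined with the Itô formula imply the following identities:

$$\begin{align} \int_{\mathbb{R}^d} \varphi(1, y)\, P_1(dy) - \int_{\mathbb{R}^d} \varphi(0, x)\, P_0(dx) &= E(\varphi(1, X_1) - \varphi(0, X_0)) \tag{2.8} \\ &= E \int_0^1 \left( \frac{\partial \varphi(s, X_s)}{\partial t} + \frac{\varepsilon}{2} \Delta \varphi(s, X_s) + \langle \beta_X(s, X), \nabla \varphi(s, X_s) \rangle \right) ds. \tag{2.9} \end{align}$$

Since $\varphi$ solves (2.3) and $H$ is the Legendre transform of $L$, it follows that

$$\int_{\mathbb{R}^d} \varphi(1, y)\, P_1(dy) - \int_{\mathbb{R}^d} \varphi(0, x)\, P_0(dx) \le E \int_0^1 L(\beta_X(s, X))\, ds. \tag{2.10}$$

Hence

$$\int_{\mathbb{R}^d} \varphi(1, y)\, P_1(dy) - \int_{\mathbb{R}^d} \varphi(0, x)\, P_0(dx) \le V_\varepsilon(P_0, P_1) \tag{2.11}$$

since $\varphi$ and $X$ have been chosen independently.

The scheme of the proof of Theorem 2.3 proceeds as follows. We show first that the function $Q \mapsto V_\varepsilon(P_0, Q)$ is lower semicontinuous and convex on the set of probability measures on $\mathbb{R}^d$. Therefore it coincides with its double dual, in particular at point $P_1$; namely,

$$V_\varepsilon(P_0, P_1) = \sup_{f \in C_b(\mathbb{R}^d)} \left\{ \int_{\mathbb{R}^d} f(x)\, P_1(dx) - V_\varepsilon(P_0, \cdot)^*(f) \right\}, \tag{2.12}$$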




where for $f \in C_b(\mathbb{R}^d)$,

$$V_\varepsilon(P_0, \cdot)^*(f) := \sup_{Q \in \mathcal{M}_1(\mathbb{R}^d)} \left\{ \int_{\mathbb{R}^d} f(x)\, Q(dx) - V_\varepsilon(P_0, Q) \right\}. \tag{2.13}$$

In this identity, $Q$ plays the role of a terminal law and is arbitrary. Hence we are back in the framework of classical stochastic control; in particular, we can use Theorem 2.2. For all of the details, see [11].

3. Kantorovich duality by vanishing viscosity. We start with a precise statement of the Kantorovich duality mentioned in the introduction. For the sake of coherence we keep a cost function of the form $c(x, y) = L(y - x)$, but this result holds true for general costs (cf. [12] or [14]).

Theorem 3.1 (cf. [7], [8]). Let $L : \mathbb{R}^d \to \mathbb{R}_+ \cup \{+\infty\}$ be a lower semicontinuous function. Let $P_0$ and $P_1$ be given Borel probability measures on $\mathbb{R}^d$. Let us keep the notation $T_{MK}(P_0, P_1)$ for the MKP as in (1.3) and define

$$T(P_0, P_1) = \sup \left\{ \int_{\mathbb{R}^d} \psi(y)\, P_1(dy) - \int_{\mathbb{R}^d} \varphi(x)\, P_0(dx) \right\}, \tag{3.1}$$

where the supremum is taken over all pairs $(\varphi, \psi) \in L^1(P_0) \times L^1(P_1)$ satisfying $\psi(y) - \varphi(x) \le L(y - x)$. Then

$$T_{MK}(P_0, P_1) = T(P_0, P_1). \tag{3.2}$$

We now apply our stochastic duality (Theorem 2.3) in order to prove Kantorovich duality with the help of a vanishing viscosity argument. The first part of the following statement is our key tool to go from $\varepsilon > 0$ to $\varepsilon = 0$. Remember that $T(P_0, P_1)$ is defined by (3.1).

Theorem 3.2. Let us assume that $T_{MK}(P_0, P_1) < +\infty$ and that assumptions (A.1)–(A.2) hold. Let us recall that $V_\varepsilon$ (resp., $\nu_\varepsilon$) has been defined in (2.2) (resp., (2.7)). We denote by $g_\varepsilon * P_1$ the convolution of $P_1$ with the Gaussian kernel $g_\varepsilon(x) = (2\pi\varepsilon)^{-d/2} \exp\left(-\frac{|x|^2}{2\varepsilon}\right)$. Then

(1.i) for all $\varepsilon > 0$,

$$\nu_\varepsilon(P_0, g_\varepsilon * P_1) \le T(P_0, P_1). \tag{3.3}$$

(1.ii) Moreover,

$$T_{MK}(P_0, P_1) \le \liminf_{\varepsilon \to 0} V_\varepsilon(P_0, g_\varepsilon * P_1). \tag{3.4}$$

(2) As a consequence we recover the Kantorovich duality,

$$T_{MK}(P_0, P_1) = T(P_0, P_1). \tag{3.5}$$

Proof of Theorem 3.2. The only thing to prove is $T_{MK}(P_0, P_1) \le T(P_0, P_1)$. Indeed, the converse inequality is easy, as we now check. Let $(u, v) \in L^1(P_0) \times L^1(P_1)$ be such that $v(y) - u(x) \le L(y - x)$, and let $\mu$ be a probability measure with marginals $P_0$ and $P_1$. Then

$$\int_{\mathbb{R}^d} v(y)\, P_1(dy) - \int_{\mathbb{R}^d} u(x)\, P_0(dx) = \int_{\mathbb{R}^d \times \mathbb{R}^d} (v(y) - u(x))\, \mu(dx\,dy) \le \int_{\mathbb{R}^d \times \mathbb{R}^d} L(y - x)\, \mu(dx\,dy),$$




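which yields

$$T(P_0, P_1) \le T_{MK}(P_0, P_1). \tag{3.6}$$

Let us now prove the converse.

(1.i) Take $\varepsilon > 0$ and let $\varphi(t, x)$ denote a solution to the HJB PDE (2.3) with $\varphi(1, \cdot) \in C_b^\infty(\mathbb{R}^d)$, which implies that $\varphi \in C^{1,2}([0, 1] \times \mathbb{R}^d) \cap C_b^{0,1}([0, 1] \times \mathbb{R}^d)$. Let us define

$$\begin{align} u_\varepsilon(x) &:= \varphi(0, x), \tag{3.7} \\ v_\varepsilon(y) &:= E(\varphi(1, y + \sqrt{\varepsilon}\, W_1)). \tag{3.8} \end{align}$$

The pair $(u_\varepsilon, v_\varepsilon)$ belongs to $L^1(P_0) \times L^1(P_1)$ and satisfies

$$\int_{\mathbb{R}^d} \varphi(1, y)\, g_\varepsilon * P_1(dy) - \int_{\mathbb{R}^d} \varphi(0, x)\, P_0(dx) = \int_{\mathbb{R}^d} v_\varepsilon(y)\, P_1(dy) - \int_{\mathbb{R}^d} u_\varepsilon(x)\, P_0(dx).$$

Moreover, by definition

$$v_\varepsilon(y) - u_\varepsilon(x) = E(\varphi(1, X_1^{x,y}) - \varphi(0, X_0^{x,y})), \tag{3.9}$$

where $X_t^{x,y} := x + t(y - x) + \sqrt{\varepsilon}\, W_t$. Using the Itô formula and the fact that $\varphi$ solves (2.3), we obtain

$$E(\varphi(1, X_1^{x,y}) - \varphi(0, X_0^{x,y})) = E \int_0^1 \big( \langle y - x, \nabla \varphi \rangle - H(\nabla \varphi) \big)(s, X_s^{x,y})\, ds, \tag{3.10}$$

which implies $v_\varepsilon(y) - u_\varepsilon(x) \le L(y - x)$. Inequality (3.3) follows.

(1.ii) Let us first notice that

$$0 \le \liminf_{\varepsilon \to 0} V_\varepsilon(P_0, g_\varepsilon * P_1) \tag{3.11}$$

since by definition $V_\varepsilon$ is nonnegative. Moreover,

$$\liminf_{\varepsilon \to 0} V_\varepsilon(P_0, g_\varepsilon * P_1) < +\infty. \tag{3.12}$$

Indeed, (3.6) and (1.i) imply $\nu_\varepsilon(P_0, g_\varepsilon * P_1) \le T_{MK}(P_0, P_1) < +\infty$, and stochastic duality applied to the pair $(P_0, g_\varepsilon * P_1)$ yields (3.12). Let us now consider a sequence $(\varepsilon_n)$ which converges to 0 such that $V_{\varepsilon_n}(P_0, g_{\varepsilon_n} * P_1)$ converges to $\liminf_{\varepsilon \to 0} V_\varepsilon(P_0, g_\varepsilon * P_1)$. Let us denote by $X^n$ a minimizer of $V_{\varepsilon_n}(P_0, g_{\varepsilon_n} * P_1)$ (cf. Theorem 2.1). For each $n$, $X^n \in \mathcal{A}^{\varepsilon_n}$. In particular, with the notation of (2.2),

$$\lim_{n \to +\infty} E \int_0^1 L(\beta_{X^n}(s, X^n))\, ds = \liminf_{\varepsilon \to 0} V_\varepsilon(P_0, g_\varepsilon * P_1). \tag{3.13}$$

The superlinearity of $L$ (namely, by (A.1), $L(u) \ge c|u|^\delta$ for $|u|$ large, with some $c > 0$ and $\delta > 1$) ensures that the sequence of semimartingales $(X^n)$ is tight and any converging subsequence converges to an absolutely continuous process (cf. [15]). Let $X_t = X_0 + \int_0^t b_X(s)\, ds$ be the limit of a converging subsequence. From the convexity property of $L$,

$$E \int_0^1 L(b_X(s))\, ds \ge E(L(X_1 - X_0)). \tag{3.14}$$

The law of $X_1$ is equal to $P_1$ since it is the limit in distribution of a subsequence of $g_\varepsilon * P_1$. Using Fatou's lemma we obtain (3.4).

(2) By combining inequalities (3.3), (3.4), and (3.6) with stochastic duality applied to the pair $(P_0, g_\varepsilon * P_1)$, we recover Kantorovich duality.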
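Two steps of the proof above are stated without detail: the passage from (3.10) to $v_\varepsilon(y) - u_\varepsilon(x) \le L(y - x)$ in part (1.i), and the inequality (3.14) in part (1.ii). A possible way to fill them in, using only definition (2.1) and the convexity of $L$, is sketched below; this is a reader's reconstruction and not a quotation from the paper.

```latex
% Step behind (3.10) => v_eps(y) - u_eps(x) <= L(y - x).
% By definition (2.1) of H as a supremum (Fenchel--Young inequality),
\[
  \langle z, u\rangle - H(z) \le L(u)
  \qquad \text{for all } z, u \in \mathbb{R}^d .
\]
% Apply it with u = y - x and z = \nabla\varphi(s, X^{x,y}_s), then integrate
% in s over [0,1] and take expectations:
\[
  E\int_0^1 \big(\langle y-x, \nabla\varphi\rangle - H(\nabla\varphi)\big)
     (s, X^{x,y}_s)\,ds \;\le\; L(y-x).
\]
% Step behind (3.14): Jensen's inequality for the convex function L
% and the uniform probability measure on [0,1],
\[
  \int_0^1 L(b_X(s))\,ds \;\ge\; L\Big(\int_0^1 b_X(s)\,ds\Big)
  = L(X_1 - X_0),
\]
% followed by taking expectations on both sides.
```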




4. Graph property by vanishing viscosity.

4.1. Graph property. We sketch the argument briefly, for the quadratic cost. For a complete exposition we refer the reader to [14].

So, let us assume for a short while that $L(y - x) = \frac{1}{2}|y - x|^2$. In this case,

$$T_{MK}(P_0, P_1) := \inf \left\{ \int_{\mathbb{R}^d \times \mathbb{R}^d} \frac{1}{2}|y - x|^2\, \mu(dx\,dy) \right\} \tag{4.1}$$

on the set of probability measures $\mu$ on $\mathbb{R}^d \times \mathbb{R}^d$ with marginals $P_0$ and $P_1$ satisfying $\int_{\mathbb{R}^d} |y|^2 P_1(dy) < +\infty$ (resp., $\int_{\mathbb{R}^d} |x|^2 P_0(dx) < +\infty$). Kantorovich duality (3.2) takes the form

$$T_{MK}(P_0, P_1) = \sup \left\{ \int_{\mathbb{R}^d} \psi(y)\, P_1(dy) - \int_{\mathbb{R}^d} \varphi(x)\, P_0(dx) \right\}, \tag{4.2}$$

where the supremum is taken over all pairs $(\varphi, \psi) \in L^1(P_0) \times L^1(P_1)$ satisfying $\psi(y) - \varphi(x) \le \frac{1}{2}|y - x|^2$. Using the identity $\int_{\mathbb{R}^d \times \mathbb{R}^d} \frac{1}{2}|y|^2 \mu(dx\,dy) = \int_{\mathbb{R}^d} \frac{1}{2}|y|^2 P_1(dy)$ (resp., $\int_{\mathbb{R}^d \times \mathbb{R}^d} \frac{1}{2}|x|^2 \mu(dx\,dy) = \int_{\mathbb{R}^d} \frac{1}{2}|x|^2 P_0(dx)$), and setting $u(x) := \varphi(x) + \frac{1}{2}|x|^2$ (resp., $v(y) := \frac{1}{2}|y|^2 - \psi(y)$), identity (4.2) can be rewritten as

$$\sup \int_{\mathbb{R}^d \times \mathbb{R}^d} \langle x, y \rangle\, \mu(dx\,dy) = \inf \left\{ \int_{\mathbb{R}^d} v(y)\, P_1(dy) + \int_{\mathbb{R}^d} u(x)\, P_0(dx) \right\}, \tag{4.3}$$

where the supremum on the left-hand side is taken over all probabilities with marginals $P_0$ and $P_1$ with finite second order moments, and the infimum on the right-hand side is over pairs $(u, v) \in L^1(P_0) \times L^1(P_1)$ satisfying

$$\langle x, y \rangle \le u(x) + v(y) \qquad \forall (x, y) \in \mathbb{R}^d \times \mathbb{R}^d. \tag{4.4}$$

This simple remark has an important consequence (cf. [14]): On the right-hand side of (4.3) it is sufficient to consider pairs $(u, v)$ such that $u$ is convex and $v$ is the Legendre transform of $u$. Now let $\mu^*$ be optimal for (1.3) and $(u^*, v^*)$ be optimal for the right-hand side of (4.3) (cf. [14] for the respective existence of these optima). Then

$$\langle x, y \rangle = u^*(x) + v^*(y) \qquad \text{for } \mu^*\text{-a.a. } (x, y). \tag{4.5}$$

Let us assume, moreover, that $P_0$ is absolutely continuous w.r.t. Lebesgue measure. Then, differentiability properties of convex functions imply that $u^*$ is differentiable $P_0$-a.s. on $\mathbb{R}^d$. Let us consider $x_0 \in \mathbb{R}^d$ such that $(x_0, y_0) \in \operatorname{Supp} \mu^*$ for some $y_0$, and let $u^*$ be differentiable at $x_0$. By (4.4) written for $(u^*, v^*)$, the function $x \mapsto u^*(x) + v^*(y_0) - \langle x, y_0 \rangle$ is nonnegative on $\mathbb{R}^d$, and by (4.5) at $(x_0, y_0)$ it vanishes at $x_0$; hence it is minimal at $x_0$, which implies that $y_0 = \nabla u^*(x_0)$. We conclude that $\mu^*$ is indeed supported on a graph, which is the graph of $\nabla u^*$: the graph property holds.
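In dimension one, the gradient of a convex function is a nondecreasing map, so the argument above suggests that for the quadratic cost the optimal coupling pairs points in increasing order. The sketch below checks this on a small discrete analogue by brute force. It is only an illustration: discrete measures are not absolutely continuous, so the theorem itself does not apply, and the sample points and function names are hypothetical.

```python
from itertools import permutations

def quadratic_cost(xs, ys, perm):
    """Average of |y - x|^2 / 2 when xs[i] is sent to ys[perm[i]]."""
    n = len(xs)
    return sum(0.5 * (ys[perm[i]] - xs[i]) ** 2 for i in range(n)) / n

def brute_force_optimum(xs, ys):
    """Best bijection between the two point sets, by exhaustive search."""
    return min(permutations(range(len(xs))),
               key=lambda perm: quadratic_cost(xs, ys, perm))

def monotone_map(xs, ys):
    """Pair the k-th smallest x with the k-th smallest y (a nondecreasing map)."""
    order_x = sorted(range(len(xs)), key=lambda i: xs[i])
    order_y = sorted(range(len(ys)), key=lambda j: ys[j])
    perm = [None] * len(xs)
    for i, j in zip(order_x, order_y):
        perm[i] = j
    return tuple(perm)

# Hypothetical supports of two uniform measures on the real line.
xs = [2.0, -1.0, 0.5, 3.0]
ys = [1.0, 4.0, -2.0, 0.0]

print("brute force:", brute_force_optimum(xs, ys))
print("monotone   :", monotone_map(xs, ys))
```

Both searches return the same pairing here, which is the discrete counterpart of the optimal measure being carried by the graph of a monotone map.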

We now come back to general costs and recall precise statements on the graph property obtained, respectively, by Brenier and Benamou (quadratic cost) and Gangbo and McCann (more general costs). The proof of Gangbo and McCann relies on cyclically monotone sets. These authors assume that $L$ has the following property: For $(p, \theta, r) \in \mathbb{R}^d \times\, ]0, \pi[\, \times\, ]0, +\infty[$, when the norm of $p$ is large enough, there exists a $z \in \mathbb{R}^d$ such that the restriction of $L$ to the set

$$K(p, z, \theta, r) := \left\{ x \in \mathbb{R}^d;\ |x - p|\,|z| \cos\left(\frac{\theta}{2}\right) \le \langle z, x - p \rangle \le r|z| \right\} \tag{4.6}$$

attains its maximum at $p$. Let us notice that $K(p, z, \theta, r)$ is a truncated cone (with vertex $p$, angle $\frac{1}{2}\theta$, direction $z$); for this reason we call this condition the cone condition. A drawback of the cone condition is that it can be easily checked only for radial functions $L$.

Theorem 4.1 (cf. [1], [6]). Let us assume that $T_{MK}(P_0, P_1) < +\infty$ and $P_0$ is absolutely continuous w.r.t. the Lebesgue measure on $\mathbb{R}^d$.

(1) Let $L(u) = \frac{1}{2}|u|^2$. There exists a unique $\mu^*$ minimizing (1.3). The support of $\mu^*$ is the graph of $\nabla u$, where $u$ is convex.

(2) Let $L$ be superlinear ($\lim_{|u| \to \infty} \frac{L(u)}{|u|} = +\infty$) and strictly convex, satisfying the cone condition. Let $H$ denote the Legendre transform of $L$. Then there exists a unique $\mu^*$ minimizing (1.3). There exists $\phi$, $L$-concave, such that the support of $\mu^*$ is the graph of the mapping

$$g(x) = x + \nabla H(-\nabla \phi(x)). \tag{4.7}$$

Let us recall that a function $\gamma : \mathbb{R}^d \to \mathbb{R} \cup \{-\infty\}$ is $L$-concave if there exists $\beta : \mathbb{R}^d \to \mathbb{R} \cup \{-\infty\}$ with $\beta \not\equiv -\infty$ such that

$$\forall x \in \mathbb{R}^d \qquad \gamma(x) = \inf_{y \in \mathbb{R}^d} (L(y - x) - \beta(y)). \tag{4.8}$$

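4.2. Vanishing viscosity method. In this section we apply stochastic duality to recover the fact that an optimal measure for (1.3) is supported on a graph when $P_0$ is absolutely continuous w.r.t. the Lebesgue measure on $\mathbb{R}^d$. More precisely, stochastic duality will enable us to reach a weak form of the situation just described for the quadratic cost, where we had at the same time identities (4.4) and (4.5). However, this weak form will turn out to be sufficient to conclude. In what follows we denote by $\mu^*$ an optimal measure for (1.3). The following statement exhibits a set $S$ which supports $\mu^*$.

Theorem 4.2. Let us assume that (A.1)–(A.2) hold true. Let $\mu^*$ be optimal for the MKP (1.3). There exists a sequence $(\varepsilon_n, \varphi_{\varepsilon_n})$ such that $\varepsilon_n \to 0$,

$$\frac{\partial \varphi_{\varepsilon_n}(t, x)}{\partial t} + \frac{\varepsilon_n}{2} \Delta \varphi_{\varepsilon_n}(t, x) + H(\nabla \varphi_{\varepsilon_n}(t, x)) = 0, \tag{4.9}$$

as well as $\varphi_{\varepsilon_n}(1, \cdot) \in C_b^\infty(\mathbb{R}^d)$ for all $n$, and $\mu^*(S) = 1$, where

$$S := \left\{ (x, y);\ \lim_{n \to +\infty} \Big( E(\varphi_{\varepsilon_n}(1, y + \sqrt{\varepsilon_n}\, W_1)) - \varphi_{\varepsilon_n}(0, x) \Big) = L(y - x) \right\}. \tag{4.10}$$

Proof. For each $\varepsilon > 0$, by the definition of $\nu_\varepsilon(P_0, g_\varepsilon * P_1)$, we can choose $\varphi_\varepsilon(t, x)$ such that

$$\frac{\partial \varphi_\varepsilon(t, x)}{\partial t} + \frac{\varepsilon}{2} \Delta \varphi_\varepsilon(t, x) + H(\nabla \varphi_\varepsilon(t, x)) = 0, \tag{4.11}$$

with $\varphi_\varepsilon(1, \cdot) \in C_b^\infty(\mathbb{R}^d)$, as well as

$$\nu_\varepsilon(P_0, g_\varepsilon * P_1) - \varepsilon \le \int_{\mathbb{R}^d} \varphi_\varepsilon(1, y)\, g_\varepsilon * P_1(dy) - \int_{\mathbb{R}^d} \varphi_\varepsilon(0, x)\, P_0(dx). \tag{4.12}$$

Since $\mu^*$ has marginals $P_0$ and $P_1$, the right-hand side of this inequality can be written as

$$\int_{\mathbb{R}^d \times \mathbb{R}^d} \big( E(\varphi_\varepsilon(1, y + \sqrt{\varepsilon}\, W_1)) - \varphi_\varepsilon(0, x) \big)\, \mu^*(dx\,dy). \tag{4.13}$$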




Let us subtract $T_{MK}(P_0, P_1) = \int_{\mathbb{R}^d \times \mathbb{R}^d} L(y - x)\, \mu^*(dx\,dy)$ from both sides of (4.12). We know that $L(y - x) - (E(\varphi_\varepsilon(1, y + \sqrt{\varepsilon}\, W_1)) - \varphi_\varepsilon(0, x))$ always remains nonnegative and $\lim_{\varepsilon \to 0} \nu_\varepsilon(P_0, g_\varepsilon * P_1) = T_{MK}(P_0, P_1)$ (cf. Theorem 3.2 and its proof). Therefore

$$E(\varphi_\varepsilon(1, y + \sqrt{\varepsilon}\, W_1)) - \varphi_\varepsilon(0, x) - L(y - x) \tag{4.14}$$

converges to 0 in $L^1(\mu^*)$ when $\varepsilon$ goes to 0, which implies $\mu^*$-a.s. convergence for a subsequence $(\varepsilon_n)$.

At this stage, let us assume that both sequences of functions $(\varphi_{\varepsilon_n}(0, \cdot))$ and $(E(\varphi_{\varepsilon_n}(1, \cdot + \sqrt{\varepsilon_n}\, W_1)))$ admit limits when $n \to +\infty$ which we denote, respectively, by $u(\cdot)$ and $v(\cdot)$. The pair $(u, v)$ then satisfies

$$v(y) - u(x) \le L(y - x) \qquad \forall (x, y) \in \mathbb{R}^d \times \mathbb{R}^d \tag{4.15}$$

with equality on $S$. If we know that $u(\cdot)$ is differentiable at any interior point $(x_0, y_0)$ of $S$, we can conclude, as we did for the quadratic cost, that the following holds:

$$\nabla u(x_0) = \nabla L(y_0 - x_0) \tag{4.16}$$

and consequently $y_0 = x_0 + \nabla H(\nabla u(x_0))$. The graph property will hold if this argument is applicable at any $(x_0, y_0)$ in $S$. Unfortunately neither separate convergence nor differentiability holds true in general; we also do not know whether interior points exist. Moreover, in the quadratic example, differentiability of $u(\cdot)$ was a consequence of its convexity; actually existence of the partial derivatives $\frac{\partial u}{\partial x_i}$ would have been sufficient to conclude. In what follows we will approach the ideal quadratic situation by taking advantage of semiconvexity properties of value functions under relevant assumptions.

Definition 4.1. Let $\Phi$ be a function defined on a convex subset of $\mathbb{R}^d$ with values in $\mathbb{R} \cup \{+\infty\}$. The function $\Phi$ is semiconvex if there exists $C > 0$ such that $x \mapsto \Phi(x) + C\frac{|x|^2}{2}$ is convex; we then say that $\Phi$ is semiconvex with constant $C$.

Proposition 4.1 (cf. [5, p. 229]). Let $G$ be a compact subset of $\mathbb{R}^d$ and $\Phi$ a semiconvex function on $G$. Let us assume that $x_0$ maximizes $\Phi$ on $G$ and belongs to the interior of $G$. Then $\Phi$ is differentiable at $x_0$ with $D\Phi(x_0) = 0$.

Theorem 4.3. Let us assume that $L$ satisfies

$$\exists\, C > 0, \qquad D_u^2 L \le C. \tag{4.17}$$

Let $\varphi(t, x)$ be a value function given by (2.4). Then, $\varphi(0, \cdot)$ is semiconvex with constant $C$.

The proofs of Theorem 4.3 and Proposition 4.1 are given in the appendix. For other sufficient conditions which guarantee that the value function is semiconvex we refer the reader to [5]. Definition 4.1 is equivalent to the requirement

$$\forall (x, z) \qquad \Phi(x + z) + \Phi(x - z) - 2\Phi(x) \ge -C|z|^2. \tag{4.18}$$
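Criterion (4.18) is easy to test numerically on a grid. The sketch below does so in dimension one for two sample functions that are not taken from the paper: $\cos$, whose second derivative is bounded below by $-1$ and which is therefore semiconvex with constant $C = 1$, and $x \mapsto -|x|$, which has a concave kink at the origin and is not semiconvex for any constant. The helper names and the grid are assumptions of this example, and a grid check can only refute (4.18), never prove it.

```python
import math

def second_difference(phi, x, z):
    """Left-hand side of (4.18): phi(x + z) + phi(x - z) - 2 phi(x)."""
    return phi(x + z) + phi(x - z) - 2.0 * phi(x)

def satisfies_4_18(phi, C, points, increments, tol=1e-12):
    """Check inequality (4.18) with constant C on a finite set of (x, z) pairs."""
    return all(second_difference(phi, x, z) >= -C * z * z - tol
               for x in points for z in increments)

def neg_abs(x):
    return -abs(x)

# Hypothetical test grid on [-5, 5] with step 0.1.
grid = [k / 10.0 for k in range(-50, 51)]

print("cos with C = 1  :", satisfies_4_18(math.cos, 1.0, grid, grid))
print("cos with C = 0.5:", satisfies_4_18(math.cos, 0.5, grid, grid))
print("-|x| with C = 1 :", satisfies_4_18(neg_abs, 1.0, grid, grid))
```

For $-|x|$ the failure occurs at $x = 0$, where the second difference equals $-2|z|$ and cannot be bounded below by $-C|z|^2$ for small $z$.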

Note. From now on we will be working under assumption (4.17). We do not require $L$ to satisfy the cone condition.

Definition 4.2. Let $(\varepsilon_n)$ denote a sequence given by Theorem 4.2. For $a \in \mathbb{R}^d$,

$$\psi_a(x) := \limsup_{n \to +\infty} (\varphi_{\varepsilon_n}(0, x) - \varphi_{\varepsilon_n}(0, a)). \tag{4.19}$$

Proposition 4.2. Under assumption (4.17), the set $D_a := \{x \in \mathbb{R}^d;\ \psi_a(x) < +\infty\}$ is a convex set independent of $a \in \pi_1(S) := \{x \in \mathbb{R}^d;\ \exists y \in \mathbb{R}^d,\ (x, y) \in S\}$. Moreover, $\psi_a$ is semiconvex on $D_a$.




Proposition 4.3. Let us denote by $(e_i,\ 1 \le i \le d)$ the canonical basis of $\mathbb{R}^d$ and assume that $L \in C^1(\mathbb{R}^d)$. Let $a$ belong to $\pi_1(S)$ and $(a, b) \in S$. If for some $i \in \{1, \ldots, d\}$ there exist sequences $(h_n^{(i)})$ and $(y_n^{(i)})$ such that $h_n^{(i)} \to 0$, $y_n^{(i)} \to b$, and for all $n$, $(a + h_n^{(i)} e_i, y_n^{(i)}) \in S$, then $\lim_{n \to +\infty} \frac{1}{h_n^{(i)}} \psi_a(a + h_n^{(i)} e_i)$ exists and coincides with $\partial_i L(b - a)$.

Proof of Proposition 4.2. Since $(a, b) \in S$, for all $(u, c) \in \mathbb{R}^d \times \mathbb{R}^d$,

$$\psi_a(u) \ge L(b - a) - L(b - u), \tag{4.20}$$

$$\psi_a(u) \ge \psi_c(u) + L(b - a) - L(b - c). \tag{4.21}$$

Indeed, since $\varphi_{\varepsilon_n}(0, c) - E(\varphi_{\varepsilon_n}(1, b + \sqrt{\varepsilon_n}\, W_1)) \ge -L(b - c)$, the following holds:

$$\varphi_{\varepsilon_n}(0, u) - \varphi_{\varepsilon_n}(0, a) \ge \varphi_{\varepsilon_n}(0, u) - \varphi_{\varepsilon_n}(0, c) + E(\varphi_{\varepsilon_n}(1, b + \sqrt{\varepsilon_n}\, W_1)) - \varphi_{\varepsilon_n}(0, a) - L(b - c).$$

To obtain (4.21) it remains to let $n$ go to $+\infty$ and apply Theorem 4.2. Inequality (4.20) follows when $u$ equals $c$. Semiconvexity of $\psi_a$ on its domain $D_a$ follows from Theorem 4.3 and the fact that if $(\Phi_n)$ is a sequence of semiconvex functions with the same constant $C$, then $\limsup \Phi_n$ is itself semiconvex with this same constant. Therefore the set $D_a$ is convex since it coincides with the domain of the convex function $\psi_a + \frac{C}{2}|\cdot|^2$. Moreover, let $a$ and $a'$ be in $\pi_1(S)$. By applying (4.21) twice, to $(a, a')$ and to $(a', a)$, we conclude that $D_a = D_{a'}$.

Proof of Proposition 4.3. Take $i \in \{1, \ldots, d\}$, $a \in \pi_1(S)$ and $b$ such that $(a, b) \in S$. Inequality (4.21) implies

$$L(b - a) - L(b - (a + h_n^{(i)} e_i)) \le \psi_a(a + h_n^{(i)} e_i) \le L(y_n^{(i)} - a) - L(y_n^{(i)} - (a + h_n^{(i)} e_i)). \tag{4.22}$$

The desired statement follows, since $L$ is $C^1$, by letting $n \to +\infty$.

For $a \in \pi_1(S)$ and $b$ such that $(a, b)$ belongs to $S$, we see that the function $x \mapsto L(b - a) - \psi_a(x)$ plays the same role as $x \mapsto v(b) - u(x)$ in (4.15): $L(b - a) - \psi_a(x) \le L(b - x)$ on $\mathbb{R}^d$ with equality when $x = a$. We cannot apply Proposition 4.1 since we do not know whether $\pi_1(S)$ has interior points. However, suppose that the assumptions of Proposition 4.3 are satisfied for all $i \in \{1, \ldots, d\}$. Then we can set $\nabla \psi_a$ to be the vector

$$\nabla \psi_a := \left( \lim_{n \to +\infty} \frac{1}{h_n^{(i)}} \psi_a(a + h_n^{(i)} e_i);\ 1 \le i \le d \right). \tag{4.23}$$

Hence $b$ is uniquely given by $a + \nabla H(\nabla \psi_a)$.

Let us now assume that $P_0$ is absolutely continuous w.r.t. the Lebesgue measure on $\mathbb{R}^d$. We prove below that the set of points $a \in \pi_1(S)$, where $\nabla \psi_a(a)$ does not exist, has Lebesgue measure 0. It suffices to show that the set $\Pi := \{a \in \pi_1(S);\ \partial_1 \psi_a(a) \text{ does not exist}\}$ has Lebesgue measure 0. Let us first make the following remark: For $\alpha \in \pi_1(S)$ and $\beta$ such that $(\alpha, \beta) \in S$, consider

$$U_n^+(\alpha, \beta) = \{(x, y) \in \mathbb{R}^d \times \mathbb{R}^d;\ x = \alpha + h e_1,\ h > 0,\ |h|^2 + |y - \beta|^2 < n^{-2}\},$$

$$U_n^-(\alpha, \beta) = \{(x, y) \in \mathbb{R}^d \times \mathbb{R}^d;\ x = \alpha + h e_1,\ h < 0,\ |h|^2 + |y - \beta|^2 < n^{-2}\};$$

if $U_n^+(\alpha, \beta) \cap S \ne \emptyset$ or $U_n^-(\alpha, \beta) \cap S \ne \emptyset$ for a sequence of integers $n$ going to $+\infty$, then $\partial_1 \psi_\alpha(\alpha)$ exists as a consequence of Proposition 4.3. Take now $a \in \Pi$ and $(a, b) \in S$. There exists $N = N(a, b) \ge 1$ such that $U_N^+(a, b) \cap S = U_N^-(a, b) \cap S = \emptyset$. Therefore the set $\sigma := \{x \in \mathbb{R};\ a + x e_1 \in \Pi\}$ is at most countable and thus has Lebesgue measure 0. We have just proved the following theorem.

Theorem 4.4. Let $P_0(dx) \ll dx$ and $T_{MK}(P_0, P_1) < +\infty$. Under assumption (4.17), the graph property holds.

5. Appendix.

Proof of Theorem 4.3. Let $(X_t^x;\ t \in [0, 1])$ in $\mathcal{A}^\varepsilon$ be optimal for (2.4); namely, $X_t^x = x + \int_0^t \nabla H(\nabla \varphi(s, X_s^x))\, ds + \sqrt{\varepsilon}\, W_t$ and

$$\varphi(0, x) = E\left[ f(X_1^x) - \int_0^1 L(\nabla H(\nabla \varphi(s, X_s^x)))\, ds \right] \tag{5.1}$$

with $\varphi(1, \cdot) = f(\cdot)$. For $z \in \mathbb{R}^d$ let us set $X_t^1 := X_t^x + (1 - t)z$ and $X_t^2 := X_t^x - (1 - t)z$; these processes both belong to $\mathcal{A}^\varepsilon$ and satisfy $X_0^1 = x + z$, $X_0^2 = x - z$, and $X_1^1 = X_1^2 = X_1^x$. Let $\beta_t^x := \nabla H(\nabla \varphi(t, X_t^x))$. From the definition of $\varphi$ in (2.4) it follows that

$$\varphi(0, x + z) + \varphi(0, x - z) - 2\varphi(0, x) \ge E \int_0^1 \big( 2L(\beta_t^x) - L(\beta_t^x + z) - L(\beta_t^x - z) \big)\, dt. \tag{5.2}$$

The conclusion follows from assumption (4.17).

Proof of Proposition 4.1. Let $B$ be an open ball centered at $x_0$ included in $G$. Such a ball exists since $x_0$ is an interior point of $G$. The function $x \mapsto \Psi(x) := \Phi(x) + C\frac{|x - x_0|^2}{2}$ is convex on $B$ for some constant $C > 0$. Therefore there exists a vector $b \in \mathbb{R}^d$ such that, for all $x \in B$, $\Psi(x) \ge \Psi(x_0) + \langle b, x - x_0 \rangle$. Moreover, since $x_0$ maximizes $\Phi$ on $B$,

$$\langle b, x - x_0 \rangle \le C\frac{|x - x_0|^2}{2} \qquad \forall x \in B. \tag{5.3}$$

For $\varepsilon > 0$ small enough, the point $x = x_0 + \varepsilon b$ belongs to $B$. We conclude that $b = 0$ since it must satisfy $\varepsilon|b|^2 \le \frac{C}{2}\varepsilon^2|b|^2$ for all $\varepsilon$ small enough.
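The proof of Theorem 4.3 ends with "the conclusion follows from assumption (4.17)". One way to spell out that last step is sketched below, reading (4.17) as the matrix inequality $\langle D_u^2 L(u) z, z \rangle \le C|z|^2$ for all $u, z$; this reading, and the integration variable $\theta$, are assumptions of the sketch rather than statements of the paper.

```latex
% Taylor's formula with integral remainder, applied to
% \theta \mapsto L(\beta + \theta z) and \theta \mapsto L(\beta - \theta z);
% the first-order terms cancel when the two expansions are added:
\[
  L(\beta+z) + L(\beta-z) - 2L(\beta)
  = \int_0^1 (1-\theta)\,
    \Big\langle \big(D^2_u L(\beta+\theta z) + D^2_u L(\beta-\theta z)\big) z,
                z \Big\rangle\, d\theta
  \;\le\; 2C|z|^2 \int_0^1 (1-\theta)\,d\theta = C|z|^2 .
\]
% Inserting this bound, with \beta = \beta^x_t, into (5.2) gives
\[
  \varphi(0,x+z) + \varphi(0,x-z) - 2\varphi(0,x) \;\ge\; -C|z|^2 ,
\]
% which is criterion (4.18) for \Phi = \varphi(0,\cdot).
```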

Acknowledgments. We thank two anonymous referees for their comments and suggestions which helped us to improve the first version of this paper. This work was done during the visit of the second author (M. Thieullen) to the University of Hokkaido. She would like to thank this university for its hospitality.

REFERENCES

[1] Y. Brenier and J. D. Benamou, A numerical method for the optimal mass transport problem and related problems, in Monge Ampère Equation: Applications to Geometry and Optimization, Proceedings of the NSF-CBMS Conference (Deerfield Beach, FL, 1997), L. A. Caffarelli and M. Milman, eds., Contemp. Math. 226, AMS, Providence, RI, 1999, pp. 1–11.

[2] F. Delarue, On the existence and uniqueness of solutions to FBSDEs in a nondegenerate case, Stochastic Process. Appl., 99 (2002), pp. 209–286.

[3] L. C. Evans, Partial Differential Equations, Grad. Stud. Math. 19, AMS, Providence, RI, 1998.

[4] L. C. Evans, Partial differential equations and Monge–Kantorovich mass transfer, in Current Developments in Mathematics (Cambridge, MA, 1997), S. T. Yau, ed., Int. Press, Boston, MA, 1999, pp. 65–126.

[5] W. H. Fleming and H. M. Soner, Controlled Markov Processes and Viscosity Solutions, Springer-Verlag, Berlin, Heidelberg, New York, Tokyo, 1993.




[6] W. Gangbo and R. J. McCann, The geometry of optimal transportation, Acta Math., 177 (1996), pp. 113–161.

[7] L. V. Kantorovich, On the translocation of masses, C. R. (Dokl.) Acad. Sci. URSS, 37 (1942), pp. 199–201; reprinted in J. Math. Sci., 133 (2006), pp. 1381–1382.

[8] H. G. Kellerer, Duality theorem for marginal problems, Z. Wahrsch. Verw. Gebiete, 67 (1984), pp. 399–432.

[9] T. Mikami, Optimal control for absolutely continuous stochastic processes and the mass transportation problem, Electron. Comm. Probab., 7 (2002), pp. 199–213.

[10] T. Mikami, Monge's problem with a quadratic cost by the zero noise limit of h-path processes, Probab. Theory Related Fields, 129 (2004), pp. 245–260.

[11] T. Mikami and M. Thieullen, Duality theorem for the stochastic optimal control problem, Stochastic Process. Appl., 116 (2006), pp. 1815–1835.

[12] S. T. Rachev and L. Rüschendorf, Mass Transportation Problems, Vol. I: Theory, Vol. II: Application, Springer-Verlag, Berlin, Heidelberg, New York, Tokyo, 1998.

[13] L. Rüschendorf and W. Thomsen, Note on the Schrödinger equation and I-projections, Statist. Probab. Lett., 17 (1993), pp. 369–375.

[14] C. Villani, Topics in Optimal Transportation, Grad. Stud. Math. 58, AMS, Providence, RI, 2003.

[15] W. A. Zheng, Tightness results for laws of diffusion processes application to stochastic mechanics, Ann. Inst. H. Poincaré Probab. Statist., 21 (1985), pp. 103–124.
