
Steiner problems in optimal transport

Jonathan Dahl

1 Introduction

The problem of optimal transport, originally proposed by Monge, has a long history of investigation and application ([18] is an extensive reference). Roughly stated, the problem involves one who has an initial configuration of mass and would like to transport it to a terminal configuration of mass, doing so at least cost. For instance, one might have a set of water towers and a region of drought that one would like to relieve as quickly as possible. Abstractly, this becomes a constrained optimization problem in a space $\mathcal{P}$ of probability measures over the base space. Unfortunately, the mere existence of a solution is difficult to come by, due to the non-linear nature of the problem. It was over 200 years before Kantorovich [12, 11] provided serious progress by formulating and solving a weak version of the problem. We will focus on this Monge-Kantorovich problem, defined in detail below.

Returning to our drought problem, suppose that the drought and even the construction of the water towers have yet to occur. The question becomes where to build the water towers to best prepare for possible droughts. If there are multiple possible droughts one wishes to protect against, but one can only afford enough water towers to combat a single drought at a time, one wishes to find a configuration of water towers which is nicely balanced amongst the possible droughts. We will investigate this by solving Steiner-type problems in the probability space $\mathcal{P}$. A Steiner problem is a search for a length-minimizing network, usually satisfying some boundary conditions, in a metric space.

Steiner problems are traditionally solved via local compactness arguments; however, as we cannot expect local compactness from our probability space $\mathcal{P}$, we will instead need to argue using the geometry of the base space. In particular, we show:

Theorem 12. Suppose $\mathcal{X}$ is a separable, locally compact Hadamard space or a compact complete metric space. Then the parameterized and general versions of the Monge-Kantorovich-Steiner problem and the Steiner problem are solvable in $(P_p(\mathcal{X}), W_p)$ for arbitrary boundary data.

Here $(P_p(\mathcal{X}), W_p)$ is the $p$-Wasserstein space, whose definition and basic properties are recalled in Section 2.2. The argument actually gives a technically more general result, listed precisely in Section 3.3.

As Steiner solutions can be considered generalized geodesics [14], we cannot reasonably hope to solve the classical Steiner problem if $\mathcal{P}$ is not a geodesic space. Therefore, we will also define and solve for weak solutions of a Monge-Kantorovich-Steiner problem. The main idea in our definition of the weak problem is that, in a geodesic space, the edges of a Steiner solution are always geodesic segments, so the problem only sees distances of a finite point configuration. The weak problem is then to minimize the sum of these distances in place of the sum of the lengths of the connecting paths.

We will conclude by using the geometry of Wasserstein spaces of order 2 to study the structure of the Steiner solutions. In particular, we will see how the Steiner problem for Gaussian boundary data in $P_2(\mathbb{R})$ is a disguised form of the classical planar Steiner problem.

2 Statement of the problem

2.1 Steiner problems

Steiner problems are concerned with finding minimal networks between a fixed set of points in a metric space. More precisely, a network $\Gamma$ is a continuous map $\varphi : G \to \mathcal{X}$ where $\mathcal{X}$ is a metric space and $G$ is a graph, topologized in the standard way, called the parametric graph of $\Gamma$. The map $\varphi$ may be decomposed into a union of curves, allowing one to compute the total length $l(\Gamma) = l(\varphi)$ by working on each curve separately. The Steiner problem is to find a network of minimal length in some set of networks. We will focus on two cases.

Definition 1. The parameterized Steiner problem for a graph $G$ and a metric space $\mathcal{X}$ is: given $k$ vertices $v_1, \dots, v_k \in G$ and $k$ points $p_1, \dots, p_k \in \mathcal{X}$, find a network of minimal length in the set of all networks in $\mathcal{X}$ with parametric graph $G$ that send $v_i$ to $p_i$ for $1 \le i \le k$.

Definition 2. The general Steiner problem for a metric space $\mathcal{X}$ is: given $k$ points $p_1, \dots, p_k \in \mathcal{X}$, find a network of minimal length in the set of all networks $\varphi : G \to \mathcal{X}$ such that $G$ is a connected graph and $p_1, \dots, p_k$ are contained in the image of the vertex set.

We call the points $p_1, \dots, p_k$ the boundary points of the problem. If $\mathcal{X}$ is a complete, locally compact, geodesic space, then the parameterized and general Steiner problems are solvable for any boundary points (see [10]). Conversely, since for boundary points $p_1, p_2$ these Steiner problems are equivalent to the geodesic problem, $\mathcal{X}$ must be a geodesic space for solutions to exist for arbitrary boundary data. We will show by an explicit class of examples, however, that local compactness is not a necessary condition for existence of solutions to arbitrary boundary data.

If we do not allow new vertices, but instead look for a length-minimizing spanning subgraph of the complete graph on our given boundary points, a solution is called a minimal spanning tree. We are minimizing over a finite number of graphs here, so the minimal spanning tree problem may be solved by a simple algorithm. This is important, as the Steiner problem is much more difficult (in fact it is NP-complete), and the minimal spanning tree may be seen as a good approximation. We define the Steiner ratio to codify the quality of the approximation of the Steiner problem by the minimal spanning tree problem.
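As an illustration of the "simple algorithm" mentioned above, the following is a minimal sketch of Prim's algorithm for points in Euclidean space. The function name `mst_length` and the choice of Euclidean distance are assumptions made for this example; the text does not single out a particular algorithm.

```python
import math

def mst_length(points):
    """Total length of a minimal spanning tree on `points` (Prim's algorithm).

    `points` is a list of coordinate tuples; distances are Euclidean.
    """
    n = len(points)
    if n < 2:
        return 0.0
    in_tree = [False] * n
    best = [math.inf] * n  # cheapest known edge from the tree to each vertex
    best[0] = 0.0          # start the tree at vertex 0
    total = 0.0
    for _ in range(n):
        # Pick the vertex outside the tree that is cheapest to attach.
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        total += best[u]
        # Update attachment costs of the remaining vertices.
        for v in range(n):
            if not in_tree[v]:
                best[v] = min(best[v], math.dist(points[u], points[v]))
    return total
```

For the four corners of the unit square, `mst_length` returns the length of three unit sides.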

Definition 3. The Steiner ratio $\rho(\mathcal{X})$ of a metric space $\mathcal{X}$ is

\[
  \inf_{M} \frac{L_s(M)}{L_a(M)},
\]

where $M$ is any finite set of points in $\mathcal{X}$, $L_s(M)$ is the infimum of the lengths of all networks spanning $M$, and $L_a(M)$ is the length of the minimal spanning tree of $M$.

In general, the Steiner ratio is in $[1/2, 1]$. It is trivial to see that the Steiner ratio of $\mathbb{R}$ is 1, but calculation of the Steiner ratio of the plane $\mathbb{R}^2$ took a good deal of effort. In 1968, Gilbert and Pollak [8] conjectured that the Steiner ratio of the plane is $\sqrt{3}/2$. It was not until 1990 that Du and Hwang [6] positively resolved the conjecture.
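The value $\sqrt{3}/2$ can be seen concretely on the vertices of an equilateral triangle. The sketch below compares the minimal spanning tree with the network that joins each vertex to the centroid; the variable names are illustrative only.

```python
import math

# Vertices of an equilateral triangle with unit side length.
A, B, C = (0.0, 0.0), (1.0, 0.0), (0.5, math.sqrt(3) / 2)

# Minimal spanning tree: any two of the three unit sides.
mst = 2.0

# Steiner network: join each vertex to the centroid, which is the Fermat
# point of an equilateral triangle (the three edges meet at 120 degrees).
S = ((A[0] + B[0] + C[0]) / 3, (A[1] + B[1] + C[1]) / 3)
steiner = sum(math.dist(S, P) for P in (A, B, C))

ratio = steiner / mst  # equals sqrt(3)/2 up to rounding
```

Here the Steiner network has length $\sqrt{3}$ against a spanning tree of length 2, so this single configuration already attains the ratio $\sqrt{3}/2$.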

2.2 Optimal transport

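We now introduce the basic notions of optimal transport.

Given two spaces $\mathcal{X}, \mathcal{Y}$ and two subsets of probability measures $\mathcal{P} \subset P(\mathcal{X})$ and $\mathcal{Q} \subset P(\mathcal{Y})$, we define the set of transport plans $\Pi(\mathcal{P}, \mathcal{Q})$ as the set of probability measures $\pi \in P(\mathcal{X} \times \mathcal{Y})$ such that $(\mathrm{proj}_{\mathcal{X}})_\# \pi \in \mathcal{P}$ and $(\mathrm{proj}_{\mathcal{Y}})_\# \pi \in \mathcal{Q}$. The measures $(\mathrm{proj}_{\mathcal{X}})_\# \pi$ and $(\mathrm{proj}_{\mathcal{Y}})_\# \pi$ are called the marginals of $\pi$. Here $f_\# \mu$ denotes the push-forward of $\mu$ by $f$, defined by $f_\# \mu(A) = \mu(f^{-1}(A))$.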
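For finitely supported measures the push-forward and the marginals can be computed directly. The following sketch represents a discrete measure as a dictionary from points to masses; this representation and the name `push_forward` are assumptions made for illustration.

```python
from collections import defaultdict

def push_forward(measure, f):
    """Push a discrete measure {point: mass} forward by the map f.

    Implements (f#mu)(A) = mu(f^{-1}(A)) by moving the mass of each atom
    x to the atom f(x) and summing masses that land on the same point.
    """
    out = defaultdict(float)
    for point, mass in measure.items():
        out[f(point)] += mass
    return dict(out)

# A transport plan on {a, b} x {u, v}, stored as {(x, y): mass}.
plan = {("a", "u"): 0.25, ("a", "v"): 0.25, ("b", "v"): 0.5}

# The marginals are the push-forwards under the two coordinate projections.
first_marginal = push_forward(plan, lambda xy: xy[0])
second_marginal = push_forward(plan, lambda xy: xy[1])
```

In this example the first marginal puts mass 1/2 on each of `a` and `b`, and the second puts mass 1/4 on `u` and 3/4 on `v`.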

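If we suppose that the cost of implementing a transport plan depends only on the structure of the spaces $\mathcal{X}$ and $\mathcal{Y}$, we might suppose that for some cost function $c : \mathcal{X} \times \mathcal{Y} \to \mathbb{R}$, the total cost of the transport plan is

\[
  \int_{\mathcal{X} \times \mathcal{Y}} c(x, y) \, d\pi(x, y).
\]

If for some $\pi_0 \in P(\mathcal{X} \times \mathcal{Y})$ with marginals $\mu$ and $\nu$, we have

\[
  \int_{\mathcal{X} \times \mathcal{Y}} c(x, y) \, d\pi_0(x, y)
  = \inf_{\pi \in \Pi(\mu, \nu)} \int_{\mathcal{X} \times \mathcal{Y}} c(x, y) \, d\pi(x, y),
\]

then we say that $\pi_0$ is an optimal transference plan. In this case

\[
  \left[ \int_{\mathcal{X} \times \mathcal{Y}} c(x, y) \, d\pi_0(x, y) \right]^{\alpha}
\]

is called the $\alpha$-optimal cost between $\mu$ and $\nu$. (We include the parameter $\alpha$ so that we can allow Wasserstein distance in this framework.)

Definition 4. The Monge-Kantorovich problem for spaces $\mathcal{X}, \mathcal{Y}$ is: given a cost function $c$, and measures $\mu \in P(\mathcal{X})$ and $\nu \in P(\mathcal{Y})$, find an optimal transference plan between $\mu$ and $\nu$.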


The Monge-Kantorovich problem is solvable in very general settings.

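Theorem 1 (Theorem 4.1 of [18]). Let $(\mathcal{X}, \mu)$ and $(\mathcal{Y}, \nu)$ be two Polish probability spaces. Let $a : \mathcal{X} \to \mathbb{R} \cup \{-\infty\}$ and $b : \mathcal{Y} \to \mathbb{R} \cup \{-\infty\}$ be upper semicontinuous functions such that $a \in L^1(\mu)$ and $b \in L^1(\nu)$. Suppose that $c : \mathcal{X} \times \mathcal{Y} \to \mathbb{R} \cup \{+\infty\}$ is a lower semicontinuous function such that $c(x, y) \ge a(x) + b(y)$ for all $x \in \mathcal{X}$, $y \in \mathcal{Y}$. Then there exists an optimal transference plan between $\mu$ and $\nu$.

An important special case is when $\mathcal{X} = \mathcal{Y}$ is a Polish metric space with distance function $d$ and $c = d^p$ for some $p \in [1, \infty)$. The $1/p$-optimal cost is then the Wasserstein distance of order $p$, given by

\[
  W_p(\nu, \mu) = \left( \inf_{\pi \in \Pi(\mu, \nu)} \int_{\mathcal{X} \times \mathcal{X}} d^p(x, y) \, d\pi(x, y) \right)^{1/p}.
\]

For an arbitrary $x_0 \in \mathcal{X}$, we define

\[
  P_p(\mathcal{X}) = \left\{ \mu \in P(\mathcal{X}) \;\middle|\; \int_{\mathcal{X}} d^p(x, x_0) \, d\mu(x) < \infty \right\}.
\]

$W_p$ is then a metric on $P_p(\mathcal{X})$. Furthermore, if $\mathcal{X}$ is a complete, separable and locally compact length space, then $P_p(\mathcal{X})$ is a geodesic space (see Chapters 6 and 7 of [18]).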
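To make the definition of $W_p$ concrete, here is a minimal sketch for the simplest nontrivial case: two uniform empirical measures on the real line with the same number of atoms. It relies on the standard fact that, on $\mathbb{R}$ with cost $|x - y|^p$ and $p \ge 1$, matching the sorted samples in order is an optimal coupling. The function name `wasserstein_1d` and the restriction to equal sample sizes are assumptions of this example, not part of the text.

```python
def wasserstein_1d(xs, ys, p=2.0):
    """W_p between the uniform empirical measures on xs and ys (real numbers).

    Assumes len(xs) == len(ys) and p >= 1, so that pairing the sorted
    samples in order realizes the infimum over transference plans.
    """
    if len(xs) != len(ys) or not xs:
        raise ValueError("need two non-empty samples of equal size")
    if p < 1:
        raise ValueError("p must be at least 1")
    n = len(xs)
    # Each matched pair carries mass 1/n in the monotone coupling.
    cost = sum(abs(x - y) ** p for x, y in zip(sorted(xs), sorted(ys))) / n
    return cost ** (1.0 / p)
```

For instance, translating a sample by a constant $t$ gives distance $|t|$ for every $p$, as one expects from the definition.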

We wish to solve the Steiner problem in a probability space $\mathcal{P} \subset P(\mathcal{X})$. It is useful however to first consider a hybrid Monge-Kantorovich-Steiner problem. In a geodesic space, solutions of Steiner problems map edges to geodesics, so this becomes equivalent to minimizing a certain sum of distances. In a more general metric space, this correspondence need not hold, but one could view a solution which minimizes the sum of distances as a weak solution to the Steiner problem. Accordingly, the $\alpha$-optimal cost of a network $\varphi : G \to P(\mathcal{X})$ is defined as the sum over all edges $\{v_i, v_j\}$ of $G$ of the $\alpha$-optimal cost between $\varphi(v_i)$ and $\varphi(v_j)$. The optimal costs are achieved by solutions of the Monge-Kantorovich problem, so we consider the following Monge-Kantorovich-Steiner problems:

Definition 5. The parameterized Monge-Kantorovich-Steiner problem for a graph $G$, a metric space $\mathcal{X}$, $\alpha > 0$ and a cost function $c$ is: given $k$ vertices $v_1, \dots, v_k \in G$ and $k$ probability measures $\mu_1, \dots, \mu_k \in \mathcal{P} \subset P(\mathcal{X})$, find a network of minimal $\alpha$-optimal cost in the set of all networks in $\mathcal{P}$ with parametric graph $G$ that send $v_i$ to $\mu_i$ for $1 \le i \le k$.

Definition 6. The general Monge-Kantorovich-Steiner problem for a metric space $\mathcal{X}$, $\alpha > 0$ and a cost function $c$ is: given a subset $\mathcal{P} \subset P(\mathcal{X})$ and $k$ probability measures $\mu_1, \dots, \mu_k \in \mathcal{P}$, find a network of minimal $\alpha$-optimal cost in the set of all networks $\varphi : G \to \mathcal{P}$ such that $G$ is a connected graph and $\mu_1, \dots, \mu_k$ are contained in the image of the vertex set.

If for some $\pi \in \Pi(P(\mathcal{X}), P(\mathcal{Y}))$ with marginals $\mu \in P(\mathcal{X})$, $\nu \in P(\mathcal{Y})$ there exists a measurable map $T : \mathcal{X} \to \mathcal{Y}$ such that $\pi = (\mathrm{id}, T)_\# \mu$, then $\pi$ is said to be deterministic, and $T$ is called the transport map. The classical Monge problem is to look for an optimal deterministic transference plan. We may thus consider the following two problems as well:

Definition 7. The parameterized Monge-Steiner problem for a graph $G$, a metric space $\mathcal{X}$, $\alpha > 0$ and a cost function $c$ is: given $k$ vertices $v_1, \dots, v_k \in G$ and $k$ probability measures $\mu_1, \dots, \mu_k \in \mathcal{P} \subset P(\mathcal{X})$, find a network of minimal $\alpha$-optimal cost in the set of all networks in $\mathcal{P}$ with parametric graph $G$ that send $v_i$ to $\mu_i$ for $1 \le i \le k$, and each $\alpha$-optimal cost is achieved by a deterministic transference plan.

Definition 8. The general Monge-Steiner problem for a metric space $\mathcal{X}$, $\alpha > 0$ and a cost function $c$ is: given $k$ probability measures $\mu_1, \dots, \mu_k \in \mathcal{P} \subset P(\mathcal{X})$, find a network of minimal $\alpha$-optimal cost in the set of all networks $\varphi : G \to \mathcal{P}$ such that $G$ is a connected graph and $\mu_1, \dots, \mu_k$ are contained in the image of the vertex set, and each $\alpha$-optimal cost is achieved by a deterministic transference plan.

Solvability of the Monge problem holds far less generally than solvability of the Monge-Kantorovich problem. For example, a transport map can only send a Dirac mass to another Dirac mass. We will therefore see the strongest results for the Monge-Kantorovich-Steiner problems and the classical Steiner problems.
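The Dirac mass obstruction can be made explicit with a small worked example on $\mathbb{R}$. The particular measures below are chosen purely for illustration.

```latex
% Source: one atom. Target: two atoms of mass 1/2.
\[
  \mu = \delta_0, \qquad
  \nu = \tfrac12\,\delta_{-1} + \tfrac12\,\delta_{1}.
\]
% A map cannot split the atom at 0, so no transport map exists:
\[
  T_\#\delta_0 = \delta_{T(0)} \neq \nu
  \quad \text{for every measurable } T : \mathbb{R} \to \mathbb{R}.
\]
% The product measure is the only transference plan, and it is optimal:
\[
  \Pi(\mu,\nu) = \{\delta_0 \otimes \nu\}, \qquad
  \int_{\mathbb{R}\times\mathbb{R}} c \, d(\delta_0 \otimes \nu)
  = \tfrac12\, c(0,-1) + \tfrac12\, c(0,1).
\]
```

So the Monge-Kantorovich problem for this pair is solved trivially, while the Monge problem has no admissible competitor at all.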

3 Existence of solutions

3.1 The direct method of minimization

In the existence proof for a solution of the classical Steiner problem on $\mathbb{R}^n$, one shows that any minimizing sequence must eventually remain in a bounded set and applies a compactness argument. Unfortunately, bounded sets are no longer precompact in a probability space $P(\mathcal{X})$. We must therefore use another criterion for precompactness, which is given by the following:

Theorem 2 (Prokhorov's Theorem [16]). If $\mathcal{X}$ is a Polish space, then a set $\mathcal{P} \subset P(\mathcal{X})$ is precompact for the weak topology if and only if it is tight, i.e. for any $\varepsilon > 0$ there exists a compact set $K_\varepsilon \subset \mathcal{X}$ such that $\mu(\mathcal{X} \setminus K_\varepsilon) \le \varepsilon$ for every $\mu \in \mathcal{P}$.

We now show that the direct method works if a tightness bound is assumed.

Proposition 3 (cf. Theorem 1). Let $\mathcal{X}$ be a Polish space, and suppose $\alpha > 0$ and $c : \mathcal{X} \times \mathcal{X} \to [0, +\infty]$ is a lower semicontinuous cost function. Fix boundary points $\mu_1, \dots, \mu_k \in P(\mathcal{X})$ and let $\lambda_{1,1}, \lambda_{1,2}, \dots, \lambda_{k+l,k+l} \in [0, \infty)$ be fixed edge coefficients. Let $\Phi : (P(\mathcal{X}))^l \to \mathbb{R}$ be the Monge-Kantorovich-Steiner functional

\[
  \Phi(\mu_{k+1}, \dots, \mu_{k+l}) = \sum_{i,j=1}^{k+l} \lambda_{i,j} \left[ \inf_{\pi \in \Pi(\mu_i, \mu_j)} \int_{\mathcal{X} \times \mathcal{X}} c(x, y) \, d\pi(x, y) \right]^{\alpha}.
\]

If $\mathcal{P} \subset P(\mathcal{X})$ is tight, then there exist $\nu_1, \dots, \nu_l \in \mathcal{P}$ such that

\[
  \Phi(\nu_1, \dots, \nu_l) = \inf_{\mathcal{P}^l} \Phi.
\]

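Proof. Define $F : [\Pi(P(\mathcal{X}), P(\mathcal{X}))]^{(k+l)^2} \to \mathbb{R}$ by

\[
  F(\pi_{1,1}, \dots, \pi_{k+l,k+l}) = \sum_{i,j=1}^{k+l} \lambda_{i,j} \left[ \int_{\mathcal{X} \times \mathcal{X}} c(x, y) \, d\pi_{i,j}(x, y) \right]^{\alpha},
\]

and note that

\[
  \inf_{\mathcal{P}^l} \Phi = \inf_{\mathcal{Q}} F,
\]

where $\mathcal{Q} \subset [\Pi(P(\mathcal{X}), P(\mathcal{X}))]^{(k+l)^2}$ is the set of $(k+l)^2$-tuples of transference plans with marginals matching fixed points $\mu_1, \dots, \mu_k$ where appropriate, with marginals in $\mathcal{P}$ otherwise and with internally consistent marginals. Since $\mathcal{X}$ is Polish, each set $\{\mu_i\}$ is tight. Also, $\mathcal{P}$ is tight by assumption. Thus $\mathcal{P}, \{\mu_1\}, \dots, \{\mu_k\}$ is a finite collection of precompact sets by Prokhorov's Theorem and $\mathcal{P}' = \mathcal{P} \cup \{\mu_1, \dots, \mu_k\}$ is precompact and tight. $\mathcal{Q} \subset [\Pi(\mathcal{P}', \mathcal{P}')]^{(k+l)^2}$ by construction. $\Pi(\mathcal{P}', \mathcal{P}')$ is tight, hence precompact, by the following lemma: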

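Lemma 4 (Lemma 4.4 of [18]). Let $\mathcal{X}$ and $\mathcal{Y}$ be two Polish spaces. Let $\mathcal{P} \subset P(\mathcal{X})$ and $\mathcal{Q} \subset P(\mathcal{Y})$ be tight subsets. Then the set $\Pi(\mathcal{P}, \mathcal{Q})$ of all transference plans whose marginals lie in $\mathcal{P}$ and $\mathcal{Q}$ respectively, is itself tight in $P(\mathcal{X} \times \mathcal{Y})$.

Let $\{\pi^n_{i,j}\} \subset \mathcal{Q}$ be an $F$-minimizing sequence. Since $\{\pi^n_{1,1}\} \subset \Pi(\mathcal{P}', \mathcal{P}')$, by taking a subsequence we may assume $\pi^n_{1,1}$ converges weakly to some $\pi_{1,1} \in \Pi(\mathcal{P}', \mathcal{P}')$. Taking a subsequence $(k+l)^2 - 1$ more times, we may even assume $\pi^n_{i,j}$ converges weakly to some $\pi_{i,j} \in \Pi(\mathcal{P}', \mathcal{P}')$ for all $i, j$. $\mathcal{Q}$ is clearly closed, so $(\pi_{1,1}, \dots, \pi_{k+l,k+l}) \in \mathcal{Q}$. We cite another lemma to show that $(\pi_{1,1}, \dots, \pi_{k+l,k+l})$ is an $F$-minimizer.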

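Lemma 5 (Lemma 4.3 of [18]). Let $\mathcal{X}$ and $\mathcal{Y}$ be two Polish spaces, and $c : \mathcal{X} \times \mathcal{Y} \to \mathbb{R} \cup \{+\infty\}$ a lower semicontinuous cost function. Let $h : \mathcal{X} \times \mathcal{Y} \to \mathbb{R} \cup \{-\infty\}$ be an upper semicontinuous function such that $c \ge h$. Let $(\pi_k)_{k \in \mathbb{N}}$ be a sequence in $P(\mathcal{X} \times \mathcal{Y})$, converging weakly to some $\pi \in P(\mathcal{X} \times \mathcal{Y})$, in such a way that $h \in L^1(\pi_k)$, $h \in L^1(\pi)$, and

\[
  \int_{\mathcal{X} \times \mathcal{Y}} h \, d\pi_k \to \int_{\mathcal{X} \times \mathcal{Y}} h \, d\pi.
\]

Then

\[
  \int_{\mathcal{X} \times \mathcal{Y}} c \, d\pi \le \liminf_{k \to \infty} \int_{\mathcal{X} \times \mathcal{Y}} c \, d\pi_k.
\]

In particular, if $c \ge 0$, then $F : \pi \mapsto \int c \, d\pi$ is lower semicontinuous on $P(\mathcal{X} \times \mathcal{Y})$, equipped with the topology of weak convergence.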


Applying Lemma 5 for $h \equiv 0$,

\[
  \inf_{\mathcal{Q}} F \le F(\pi_{1,1}, \dots, \pi_{k+l,k+l}) \le \liminf_{n \to \infty} F(\pi^n_{1,1}, \dots, \pi^n_{k+l,k+l}) = \inf_{\mathcal{Q}} F
\]

and $(\pi_{1,1}, \dots, \pi_{k+l,k+l})$ is $F$-minimizing. Taking marginals yields the desired $\Phi$-minimizer.

3.2 Tightness from CAT(0)

Proposition 3 shows that the direct method for solving the Monge-Kantorovich-Steiner problem works as long as one has an a priori tightness estimate. We will now see how the geometry of the base space can provide this tightness estimate. We recall some basic notions of metric geometry.

Definition 9. A space of nonpositive Alexandrov curvature is a length space which can be covered by a family of open sets $\{V_i\}$ such that for each $V_i$:

1. There exists a shortest path in $V_i$ connecting any two points in $V_i$.

2. For any $a, b, c \in V_i$ and any point $d$ in the shortest path $ac$, let $\Delta \bar{a}\bar{b}\bar{c}$ be the comparison triangle for $\Delta abc$ in $\mathbb{R}^2$, i.e. $|\bar{a}\bar{b}| = |ab|$, $|\bar{a}\bar{c}| = |ac|$ and $|\bar{b}\bar{c}| = |bc|$, and let $\bar{d}$ be the point in $\bar{a}\bar{c}$ such that $|\bar{a}\bar{d}| = |ad|$. Then $|bd| \le |\bar{b}\bar{d}|$. (Intuitively, $\Delta abc$ is skinnier than $\Delta \bar{a}\bar{b}\bar{c}$.)

This notion is equivalent to non-positive sectional curvature in the setting of Riemannian manifolds.

Definition 10. A Hadamard space (or complete CAT(0) space) is a simply connected complete space of nonpositive Alexandrov curvature.

Hadamard spaces $\mathcal{X}$ are important because the curvature conditions hold for all triangles in $\mathcal{X}$, not just small triangles [3]. In our discussion, this allows us to ensure the convexity of distance functions on $\mathcal{X}$, which will give us good control of convex hulls. In particular, $\mathcal{X}$ is locally convex. We will need $\mathcal{X}$ to be separable, so we note that this is always true for $\mathcal{X}$ of finite Hausdorff dimension.

In order to assure that the Monge-Kantorovich-Steiner problem is aware of our geometric assumptions, we will also assume that the cost function is based on the distance function.

Lemma 6. Let $\mathcal{X}$ be a separable Hadamard space, $\alpha > 0$ and $c : \mathcal{X} \times \mathcal{X} \to [0, +\infty]$ a lower semicontinuous cost function of the form $c = \phi \circ d$ where $\phi$ is a monotone non-decreasing function and $d$ is the distance in $\mathcal{X}$. Let $\mu_1, \dots, \mu_k \in P(\mathcal{X})$ be fixed boundary points and let $\lambda_{1,1}, \lambda_{1,2}, \dots, \lambda_{k+l,k+l} \in [0, \infty)$ be fixed edge coefficients. Suppose that $\mu_1, \dots, \mu_k$ all have compact support. Let $\Phi : (P(\mathcal{X}))^l \to \mathbb{R}$ be the Monge-Kantorovich-Steiner functional

\[
  \Phi(\mu_{k+1}, \dots, \mu_{k+l}) = \sum_{i,j=1}^{k+l} \lambda_{i,j} \left[ \inf_{\pi \in \Pi(\mu_i, \mu_j)} \int_{\mathcal{X} \times \mathcal{X}} c(x, y) \, d\pi(x, y) \right]^{\alpha}.
\]

Then there exist $\nu_1, \dots, \nu_l \in P(\mathcal{X})$ (with compact support) such that

\[
  \Phi(\nu_1, \dots, \nu_l) = \inf_{(P(\mathcal{X}))^l} \Phi.
\]

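Proof. Let $K$ be a large enough compact set so that $\mu_1, \dots, \mu_k \in P(K)$. Since $\mathcal{X}$ is locally convex, $H = \mathrm{co}(K)$ is compact by an exercise in [2].

As shown in [3], there exists a unique orthogonal projection map $\mathrm{proj}_H : \mathcal{X} \to H$ which is a distance non-increasing retraction of $\mathcal{X}$ onto $H$. So given $\pi \in \Pi(\mu_i, P(\mathcal{X}))$, we have $(\mathrm{proj}_H \times \mathrm{proj}_H)_\# \pi \in \Pi(\mu_i, P(H))$ and

\[
  \int_{\mathcal{X} \times \mathcal{X}} c(x, y) \, d(\mathrm{proj}_H \times \mathrm{proj}_H)_\# \pi(x, y) \le \int_{\mathcal{X} \times \mathcal{X}} c(x, y) \, d\pi(x, y).
\]

Thus $\inf_{(P(\mathcal{X}))^l} \Phi = \inf_{(P(H))^l} \Phi$.

Since $H$ is compact, choosing $K_\varepsilon = H$ yields that $P(H)$ is tight. $P(H)$ is also closed, so Proposition 3 implies that there exist $\nu_1, \dots, \nu_l \in P(H)$ such that

\[
  \Phi(\nu_1, \dots, \nu_l) = \inf_{(P(\mathcal{X}))^l} \Phi = \inf_{(P(H))^l} \Phi.
\]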

Theorem 7. If $\mathcal{X}$ is a separable Hadamard space or a compact space, then the parameterized Monge-Kantorovich-Steiner problems are solvable for compactly supported boundary data with a lower semicontinuous cost function $c : \mathcal{X} \times \mathcal{X} \to [0, +\infty]$ of the form $c = \phi \circ d$ where $\phi$ is a monotone non-decreasing function and $d$ is the distance in $\mathcal{X}$.

If $c = d^p$ and $\alpha = 1/p$ for some $p \in [1, \infty)$, then the general Monge-Kantorovich-Steiner problem is solvable as well.

Furthermore, the minimizing configuration in $P(\mathcal{X})$ produced in both cases consists of measures with compact support.

Proof. For the parameterized problem with graph $G$, label the boundary vertices $v_1, \dots, v_k$ and the remaining vertices $v_{k+1}, \dots, v_{k+l}$. Let $\lambda_{i,j}$ be half of the incidence number of $v_i$ and $v_j$. Proposition 3 or Lemma 6 then provides the solution.
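The factor of one half compensates for the sum over ordered pairs $(i, j)$, which visits each edge twice. The sketch below illustrates this in the simplest setting, where every vertex carries a Dirac mass in Euclidean space, $c = d^p$ and $\alpha = 1/p$, so that the $\alpha$-optimal cost between $\delta_x$ and $\delta_y$ is just $d(x, y)$. The function names and the restriction to Dirac masses are assumptions made for illustration.

```python
import math

def edge_coefficients(num_vertices, edges):
    """lam[i][j] = half the number of edges joining vertices i and j."""
    lam = [[0.0] * num_vertices for _ in range(num_vertices)]
    for i, j in edges:
        lam[i][j] += 0.5
        lam[j][i] += 0.5
    return lam

def steiner_functional(lam, points):
    """Sum over ordered pairs (i, j) of lam[i][j] * d(x_i, x_j).

    With Dirac masses at `points`, c = d^p and alpha = 1/p, each term is
    the alpha-optimal cost between the two vertex measures.
    """
    n = len(points)
    return sum(
        lam[i][j] * math.dist(points[i], points[j])
        for i in range(n)
        for j in range(n)
    )

# Star graph: boundary vertices 0, 1, 2 joined to one interior vertex 3.
star = edge_coefficients(4, [(0, 3), (1, 3), (2, 3)])
triangle = [(0.0, 0.0), (1.0, 0.0), (0.5, math.sqrt(3) / 2)]
centroid = (0.5, math.sqrt(3) / 6)
value = steiner_functional(star, triangle + [centroid])
```

Each edge of the star contributes its length exactly once, so `value` is the total length of the three segments from the centroid to the vertices.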

For the general problem, we follow the classical argument (see [10]). Let $\mathcal{K}$ be the set of finite connected graphs $G$ with $k$ distinguished boundary vertices, and let $\Phi_G$ denote the corresponding Monge-Kantorovich-Steiner functional for each $G \in \mathcal{K}$. We wish to achieve

\[
  \inf_{G \in \mathcal{K}} \inf \Phi_G = \inf_{G \in \mathcal{K}} \Phi_G(\nu_G),
\]

where $\nu_G \in (P(\mathcal{X}))^{|v(G)| - k}$ is the solution of the parameterized problem.

Let $\mathcal{K}'$ be the subset of $\mathcal{K}$ consisting of trees, and let $\mathcal{K}''$ be the subset of $\mathcal{K}'$ where all vertices not in the distinguished boundary set have degree at least three. Since $\mathcal{K}''$ is a finite set, the infimum over $\mathcal{K}''$ is trivially achieved. It remains to show that

\[
  \inf_{G \in \mathcal{K}} \inf \Phi_G = \inf_{G \in \mathcal{K}'} \inf \Phi_G = \inf_{G \in \mathcal{K}''} \inf \Phi_G.
\]


For any $G \in \mathcal{K}$ and any $\nu \in (P(\mathcal{X}))^{|v(G)| - k}$, we may take a spanning tree $G' \in \mathcal{K}'$ of $G$ and $\Phi_{G'}(\nu) \le \Phi_G(\nu)$ since

\[
  \lambda_{i,j} \inf_{\pi \in \Pi(\mu_i, \mu_j)} \int_{\mathcal{X} \times \mathcal{X}} c(x, y) \, d\pi(x, y)
\]

is always non-negative and we are only possibly setting some of the $\lambda_{i,j}$ to zero. So

\[
  \inf_{G \in \mathcal{K}} \inf \Phi_G = \inf_{G \in \mathcal{K}'} \inf \Phi_G.
\]

Similarly, for any $G' \in \mathcal{K}'$, removing interior vertices of degree one will keep us in $\mathcal{K}'$ and will not increase $\Phi_{G'}(\nu)$. We may also replace any interior vertex of degree two by an edge between its neighbors to obtain a graph $G'' \in \mathcal{K}''$. It follows from the triangle inequality for Wasserstein distances that $\Phi_{G''}(\nu) \le \Phi_{G'}(\nu)$. Thus

\[
  \inf_{G \in \mathcal{K}} \inf \Phi_G = \min_{G \in \mathcal{K}''} \Phi_G(\nu_G).
\]

The compactness of supports also follows from Lemma 6.

We note that in the $c = d^2$ and $\alpha = 1/2$ case, if $\mathcal{X}$ is a Riemannian manifold, $G$ is a star and the boundary data is absolutely continuous, the optimal couplings given by Theorem 1 may be taken to be deterministic, i.e. the Monge problem is solvable (Theorem 10.40 of [18]). Thus Theorem 7 gives:

Corollary 8. If $M$ is a Riemannian manifold, $M$ is compact or has non-positive sectional curvature, $G$ is a star, $c = d^2$, $\alpha = 1/2$ and each fixed boundary point $\mu_i$ is compactly supported and absolutely continuous with respect to the volume measure of $M$, then the parameterized Monge-Steiner problem is solvable.

We may say quite a bit more for the classical Steiner problem.

Corollary 9. If $\mathcal{X}$ is a compact space, then the parameterized and general Steiner problems are solvable for arbitrary boundary data on $(P_p(\mathcal{X}), W_p)$. If $\mathcal{X}$ is a separable locally compact Hadamard space, then the parameterized and general Steiner problems are solvable for compactly supported boundary data on $(P_p(\mathcal{X}), W_p)$.

Proof. In either case, we have Monge-Kantorovich-Steiner solutions for the cost function $d^p$ with $\alpha = 1/p$. The associated cost functional is $W_p$, which metrizes $P_p(\mathcal{X})$ as a geodesic space. Thus the adjacent measures may be joined by geodesics, forming a minimal network.

Note that the geodesics in the Steiner network may be assumed to stay in the set $P_2(H)$ of measures of compact support, as can be seen by considering the geodesic problem as a parametric Steiner problem.


3.3 Generalizing the argument

The curvature assumption on $\mathcal{X}$ was only used above to control properties of convex hulls. In particular, the arguments carry through for any complete separable length space $\mathcal{X}$ satisfying:

1. If $K \subset \mathcal{X}$ is compact, then $\mathrm{co}(K)$ is compact.

2. There exists a compact set $K_0 \subset \mathcal{X}$ such that if $K \subset \mathcal{X}$ is a compact set with $K_0 \subset K$, then there exists a distance non-increasing retraction $\mathrm{proj}_H : \mathcal{X} \to H$ where $H$ is compact and $K \subset H$.

As mentioned above, condition 1 holds for any complete metric space which is locally convex, thus condition 1 holds for Riemannian manifolds (by the existence of strongly convex neighborhoods [5]) and for Alexandrov spaces of curvature bounded above (see Proposition II.1.4 of [3]). If $\mathcal{X} = S \times N$ where $\mathcal{X}$ has the product metric, $S$ is compact and $N$ is a Hadamard space, then it is easy to show that condition 2 holds for $K_0 = \emptyset$ and $H = S \times \mathrm{co}(\mathrm{proj}_N K)$ by setting

\[
  \mathrm{proj}_H = \mathrm{id} \times \mathrm{proj}_{\mathrm{co}(\mathrm{proj}_N K)},
\]

where the second component function is the orthogonal projection in $N$. We therefore have:

Proposition 10. If $\mathcal{X} = S \times N$ where $\mathcal{X}$ is a complete, separable, locally convex length space endowed with the product metric, $S$ is compact and $N$ is a Hadamard space, then the parameterized and general versions of the Monge-Kantorovich-Steiner problem and the Steiner problem are solvable in $(P_p(\mathcal{X}), W_p)$ for arbitrary boundary data of compact support.

As a particular case, the hypothesis of Proposition 10 is satisfied for any Riemannian manifold $M$ which splits isometrically as $M = S \times \mathbb{R}^n$ with $S$ compact. One may think of this condition as a strong version of the Soul Theorem of Cheeger and Gromoll holding.

We also assumed compact support for the boundary data in order to say that the supports of the minimizing sequence could be assumed to be compact. For general boundary data $\mu_1, \dots, \mu_k$, one only has tightness of the set $\{\mu_1, \dots, \mu_k\}$. We will thus approximate the solution for boundary data $\mu_1, \dots, \mu_k$ by a sequence of solutions for compact boundary data $\mu^n_1, \dots, \mu^n_k$ and show that the approximate solutions converge to a solution of the original problem.

We begin with a lemma about approximate solutions to weak Steiner problems in metric spaces.

Lemma 11. Let $(X, d)$ be a metric space and suppose for $1 \le i \le k$ and $1 \le j \le l$ we have $p^n_i, p_i, q^n_j, q_j \in X$ such that $p^n_i \to p_i$ and $q^n_j \to q_j$ in the $d$-metric as $n \to \infty$. If for some graph $G$ the $q^n_j$ solve the $G$-parameterized weak Steiner problem for boundary data $p^n_i$, then the $q_j$ solve the $G$-parameterized weak Steiner problem for boundary data $p_i$. If $(X, d)$ is a geodesic space, the $q_j$ solve the strong Steiner problem as well.


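Proof. Let $\Phi_n : X^l \to \mathbb{R}$ and $\Phi : X^l \to \mathbb{R}$ be the corresponding length functions. Suppose for contradiction that for some $r_1, \dots, r_l \in X$ and $\varepsilon > 0$ we have $\Phi(r_j) < \Phi(q_j) - \varepsilon$. For some large $N$, if $n > N$ then

\[
  d(p^n_i, p_i), \; d(q^n_j, q_j) \le \frac{\varepsilon}{8|G|}.
\]

For such $n$ we may approximate $\Phi(q_j) \le \Phi_n(q^n_j) + \varepsilon/4$ and $\Phi_n(r_j) \le \Phi(r_j) + \varepsilon/4$. Thus

\[
  \Phi_n(r_j) \le \Phi_n(q^n_j) - \frac{\varepsilon}{2},
\]

contradicting the minimality of $\Phi_n(q^n_j)$.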
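For completeness, the final inequality in the proof of Lemma 11 can be spelled out by chaining the two $\varepsilon/4$ approximations with the contradiction hypothesis $\Phi(r_j) < \Phi(q_j) - \varepsilon$:

```latex
\[
  \Phi_n(r_j)
  \;\le\; \Phi(r_j) + \frac{\varepsilon}{4}
  \;<\; \Phi(q_j) - \varepsilon + \frac{\varepsilon}{4}
  \;\le\; \Phi_n(q^n_j) + \frac{\varepsilon}{4} - \varepsilon + \frac{\varepsilon}{4}
  \;=\; \Phi_n(q^n_j) - \frac{\varepsilon}{2}.
\]
```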

We are now ready to prove

Theorem 12. Suppose $\mathcal{X} = S \times N$ where $\mathcal{X}$ is a complete, separable, locally compact, locally convex length space endowed with the product metric, $S$ is compact and $N$ is a Hadamard space. Then the parameterized and general versions of the Monge-Kantorovich-Steiner problem and the Steiner problem are solvable in $(P_p(\mathcal{X}), W_p)$ for arbitrary boundary data.

Proof. First consider the $G$-parameterized problem where each vertex of $G$ is adjacent to the boundary. Let $\mu_1, \dots, \mu_k \in P_p(\mathcal{X})$ denote the boundary data and let $\Phi : (P(\mathcal{X}))^l \to \mathbb{R}$ be the Monge-Kantorovich-Steiner functional. Since $\{\mu_1, \dots, \mu_k\}$ is tight, we may choose $r_n > 0$ such that $\mu_i(\mathcal{X} \setminus B_{r_n}) \le 1/n$ for all $i, n$, where $B_{r_n}$ is the intrinsic closed ball of radius $r_n$ about some fixed base point $x_0$. In particular, $\mu_i(B_{r_n}) \ge (n-1)/n$. $B_{r_n}$ is compact since $\mathcal{X}$ is complete and locally compact. Define the cutoff measures

\[
  \mu^n_i = \frac{\mu_i|_{B_{r_n}}}{\mu_i|_{B_{r_n}}(\mathcal{X})} = \frac{\mu_i|_{B_{r_n}}}{\mu_i(B_{r_n})}.
\]

Note that for $m \ge 2$, we have the tightness estimate

\[
  \mu^m_i(\mathcal{X} \setminus B_{r_n})
  = \frac{\mu_i|_{B_{r_m}}(\mathcal{X} \setminus B_{r_n})}{\mu_i(B_{r_m})}
  \le \frac{\mu_i(\mathcal{X} \setminus B_{r_n})}{\mu_i(B_{r_m})}
  \le \frac{m}{m-1} \cdot \frac{1}{n}
  \le \frac{2}{n}.
\]

Let $\Phi_m$ denote the Monge-Kantorovich-Steiner functional for boundary data $\mu^m_i$. Since

\[
  b := \Phi(\delta_{x_0}, \dots, \delta_{x_0}) \ge \Phi_m(\delta_{x_0}, \dots, \delta_{x_0}),
\]

$b$ gives an upper bound on the infima for $\Phi$ and $\Phi_m$.

Let $\nu^m_j$ solve the $G$-parameterized problem for the compact boundary data $\mu^m_i$. We now show that $\{\nu^m_j\}$ is tight. Let

\[
  \psi(n) = (b + 1) \left( \frac{n}{2} \right)^{1/p} + 2 r_n.
\]

Suppose for contradiction that there exists a pair $(j, m)$ such that

\[
  \nu^m_j(\mathcal{X} \setminus B_{\psi(n)}) > \frac{4}{n}.
\]


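Choose $i$ such that $\nu^m_j$ and $\mu^m_i$ are $G$-adjacent. Since $\mu^m_i(\mathcal{X} \setminus B_{r_n}) \le 2/n$, transporting from $\nu^m_j$ to $\mu^m_i$ must move at least $2/n$ of the mass from outside $B_{\psi(n)}$ to inside $B_{r_n}$. More precisely, if $\pi \in \Pi(\nu^m_j, \mu^m_i)$ then

\[
  \pi((\mathcal{X} \setminus B_{\psi(n)}) \times B_{r_n}) + \pi((\mathcal{X} \setminus B_{\psi(n)}) \times (\mathcal{X} \setminus B_{r_n})) = \nu^m_j(\mathcal{X} \setminus B_{\psi(n)}) > \frac{4}{n},
\]

and similarly

\[
  \pi((\mathcal{X} \setminus B_{\psi(n)}) \times B_{r_n}) + \pi(B_{\psi(n)} \times B_{r_n}) = 1 - \mu^m_i(\mathcal{X} \setminus B_{r_n}) \ge 1 - \frac{2}{n}.
\]

Adding inequalities we find

\[
\begin{aligned}
  1 + \frac{2}{n}
  &< 2\pi((\mathcal{X} \setminus B_{\psi(n)}) \times B_{r_n}) + \pi(B_{\psi(n)} \times B_{r_n}) + \pi((\mathcal{X} \setminus B_{\psi(n)}) \times (\mathcal{X} \setminus B_{r_n})) \\
  &\le 2\pi((\mathcal{X} \setminus B_{\psi(n)}) \times B_{r_n}) + \pi(B_{\psi(n)} \times B_{r_n}) \\
  &\qquad + \pi((\mathcal{X} \setminus B_{\psi(n)}) \times (\mathcal{X} \setminus B_{r_n})) + \pi(B_{\psi(n)} \times (\mathcal{X} \setminus B_{r_n})) \\
  &= \pi((\mathcal{X} \setminus B_{\psi(n)}) \times B_{r_n}) + \pi(\mathcal{X} \times \mathcal{X}) \\
  &= \pi((\mathcal{X} \setminus B_{\psi(n)}) \times B_{r_n}) + 1
\end{aligned}
\]

since $\pi(\mathcal{X} \times \mathcal{X}) = 1$, and therefore $\pi((\mathcal{X} \setminus B_{\psi(n)}) \times B_{r_n}) > 2/n$. Thus

\[
\begin{aligned}
  \int_{\mathcal{X} \times \mathcal{X}} d^p(x, y) \, d\pi(x, y)
  &\ge \int_{(\mathcal{X} \setminus B_{\psi(n)}) \times B_{r_n}} d^p(x, y) \, d\pi(x, y) \\
  &\ge \int_{(\mathcal{X} \setminus B_{\psi(n)}) \times B_{r_n}} (\psi(n) - r_n)^p \, d\pi(x, y) \\
  &= (\psi(n) - r_n)^p \, \pi((\mathcal{X} \setminus B_{\psi(n)}) \times B_{r_n}) \\
  &> \frac{2}{n} (\psi(n) - r_n)^p,
\end{aligned}
\]

and we see that

\[
  W_p(\nu^m_j, \mu^m_i) \ge \left( \frac{2}{n} \right)^{1/p} (\psi(n) - r_n) \ge b + 1.
\]

In particular, $\Phi_m(\nu^m_j) > b$, contradicting the minimality of $\Phi_m(\nu^m_j)$. Therefore $\{\nu^m_j\}$ is tight.

For general graphs $G$, we may inductively show tightness of $\nu^m_j$ for vertices $k + 1$ edges away from the boundary by assuming tightness of such sequences for vertices $k$ edges away from the boundary by the above argument. Since $G$ is a finite graph, we cover all vertices in a finite number of iterations of the argument.

Taking subsequences, we may assume that for all $j$, $\nu^m_j \to \nu_j$ in $W_p$. By the previous lemma, it remains to show that $\mu^m_i \to \mu_i$ in $W_p$. This follows by the weak convergence of $\mu^m_i$ to $\mu_i$ and the bound

\[
  \int d^p(x, x_0) \, d\mu^m_i(x) = W_p^p(\mu^m_i, \delta_{x_0}) \le W_p^p(\mu_i, \delta_{x_0}) = \int d^p(x, x_0) \, d\mu_i(x)
\]

by Theorem 6.9 in [18].

The solution of the general problem follows as in the proof of Theorem 7.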


4 Geometry of (P2(M), W2)

4.1 Structure of Steiner trees

We now restrict our attention to the general Steiner problem on (P2(X), W2) and investigate how the geometry of X can force structure on the parametric graph of a solution. We must first recall some notions of metric geometry.

Given three distinct points x, y, z in a length space Y, the comparison angle ∠xyz is defined as the corresponding angle in the triangle in R2 with sides d(x, y), d(x, z), d(y, z). Explicitly,

\[
\angle xyz = \arccos \frac{d^2(x, y) - d^2(x, z) + d^2(y, z)}{2\, d(x, y)\, d(y, z)}.
\]
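As a small illustration of the formula for the comparison angle, the following Python sketch evaluates it from the three pairwise distances. The function name `comparison_angle` and the clamping of the cosine are choices made here for illustration and are not part of the original text.

```python
import math

def comparison_angle(d_xy: float, d_xz: float, d_yz: float) -> float:
    """Comparison angle at the middle point y, from the three pairwise distances.

    Implements arccos((d(x,y)^2 - d(x,z)^2 + d(y,z)^2) / (2 d(x,y) d(y,z))).
    """
    if d_xy <= 0 or d_yz <= 0:
        raise ValueError("the two sides meeting at y must have positive length")
    cos_angle = (d_xy**2 - d_xz**2 + d_yz**2) / (2 * d_xy * d_yz)
    # Clamp to [-1, 1] to absorb floating-point drift for degenerate triangles.
    return math.acos(max(-1.0, min(1.0, cos_angle)))

equilateral = comparison_angle(1.0, 1.0, 1.0)   # every angle is pi/3
right_angle = comparison_angle(3.0, 5.0, 4.0)   # the 3-4-5 triangle, right angle at y
print(equilateral, right_angle)
```

Only the distances enter, so the same function applies verbatim to Wasserstein distances between three measures.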

If β : [0, ε) → Y and γ : [0, ε) → Y are two paths in Y with β(0) = γ(0) = p, then we define the angle

\[
\angle(\beta, \gamma) = \lim_{s, t \to 0} \angle(\beta(s), p, \gamma(t))
\]

whenever the limit exists.

Definition 11. A length space Y is said to have nonnegative Alexandrov curvature if it has a covering by neighborhoods Vi such that for any two shortest paths β : [0, ε) → Vi and γ : [0, ε) → Vi with β(0) = γ(0) = p, the comparison angle

\[
\angle(\beta(s), p, \gamma(t))
\]

is nonincreasing in both s and t.

There are several equivalent definitions of nonnegative Alexandrov curvature. For instance, it is shown in [4] that for a locally compact length space Y, nonnegative Alexandrov curvature is equivalent to having a covering by neighborhoods Vi such that for any four distinct points a, b, c, d ∈ Vi,

∠bac+ ∠cad+ ∠dab ≤ 2π.

The following results of Lott and Villani will allow us to work geometrically on our probability space.

Theorem 13 ([13]). Suppose M is a compact Riemannian manifold with nonnegative sectional curvature. Then for all µ0, . . . , µ3 ∈ P2(M),

∠µ1µ0µ2 + ∠µ2µ0µ3 + ∠µ3µ0µ1 ≤ 2π.

In particular, P2(M) has nonnegative Alexandrov curvature.

Note that the needed local compactness of P2(M) follows from the compactness of M via Prokhorov’s Theorem. By passing to limits in the inequality, we obtain

∠γ1µ0γ2 + ∠γ2µ0γ3 + ∠γ3µ0γ1 ≤ 2π

for geodesics γi starting at µ0.


Theorem 14 ([13]). Suppose M is a compact Riemannian manifold with nonnegative sectional curvature. Then for each absolutely continuous measure µ ∈ P2(M), the tangent cone Kµ of P2(M) at µ is a Hilbert space, under the inner product generated by angles of geodesics in the space of directions.

We also recall a first variation formula for Alexandrov spaces.

Lemma 15 ([4]). Let Y be a complete, locally compact length space of nonnegative Alexandrov curvature, p ∈ Y and γ : [0, ε) → Y a unit-speed shortest path. Let l(t) = d(p, γ(t)). Then

\[
\lim_{t \to 0^+} \frac{l(t) - l(0)}{t} = \min_{\sigma_0} \left[ -\cos(\angle \sigma_0 \gamma) \right],
\]

where the minimum is taken over all shortest paths σ0 from γ(0) to p. (In particular, the limit exists and the minimum is achieved.)

We now assume M = S × Rn where S is compact with nonnegative sectional curvature, and µ1, . . . , µk ∈ P2(M) have compact support. By Proposition 10, there is a general Steiner solution, which by the proof of Theorem 7 may be represented by a network Γ whose parametric graph G is a tree in which all interior vertices have degree at least three. This is known as the canonical representative. We will now show that if an interior vertex does not have degree three, then the corresponding measure is not absolutely continuous.

First, note that for some compact K ⊂ M containing the supports of µ1, . . . , µk, we have that Γ(G) ⊂ P2(H) for H = S × co (projN K) as above. Γ is thus trivially a Steiner solution for the restricted general Steiner problem on P2(H), where we can apply Theorems 13 and 14 and Lemma 15.

By Theorem 13, it suffices to show that the angle between any pair of adjacent geodesics γ1, γ2 connected at an absolutely continuous µ0 ∈ P2(H) is at least 2π/3. Suppose for contradiction that

∠γ1µ0γ2 < 2π/3.

Let e1, e2 be the edges corresponding to γ1, γ2, and let v be the vertex corresponding to µ0. Split the vertex v into v1, v2 and create a new graph G′ where v1 is incident to exactly e1, e2 and a new edge e, and v2 is incident to e and the remaining edges originally incident to v. There is an obvious graph homomorphism h : G′ → G identifying v1 and v2. Let Γ′ : G′ → P2(H) denote the network Γ ∘ h. Γ′ is clearly a (non-canonical) Steiner network, so it is a global and local minimizer for length.

Let N1, N2 be the unit vector representatives of γ1, γ2 in the tangent cone Kµ0 at µ0. Since Kµ0 is an inner product space, there is a (unit-speed) geodesic η : [0, ε) → P2(H) with η(0) = µ0 such that the angle between η and N1 + N2 is arbitrarily small. Let N be the unit vector representative of η and let l(t) be the length of the network Γ′t given by shifting v1 to η(t). By minimality of Γ′,

\[
\lim_{t \to 0^+} \frac{l(t) - l(0)}{t} \ge 0.
\]


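The edge e maps to η([0, t]) and thus has length t. The only other lengths changed are those of the images of e1, e2, so by Lemma 15,

\[
\begin{aligned}
0 \le \lim_{t \to 0^+} \frac{l(t) - l(0)}{t}
&= 1 + \min_{\sigma_1} \left[ -\cos(\angle \sigma_1 \eta) \right] + \min_{\sigma_2} \left[ -\cos(\angle \sigma_2 \eta) \right] \\
&\le 1 - \cos(\angle N_1 \eta) - \cos(\angle N_2 \eta) \\
&= 1 - \langle N_1, N \rangle - \langle N_2, N \rangle \\
&= 1 - \langle N_1 + N_2, N \rangle \\
&= 1 - \|N_1 + N_2\| \cos(\angle(N_1 + N_2, N)) < 0,
\end{aligned}
\]

since ‖N1 + N2‖ > 1 and cos(∠(N1 + N2, N)) is arbitrarily close to 1. This contradiction implies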

∠γ1µ0γ2 ≥ 2π/3.
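The strict inequality ‖N1 + N2‖ > 1 used in the contradiction is exactly where the assumption ∠γ1µ0γ2 < 2π/3 enters. A short check, writing θ for the angle between the unit vectors N1 and N2 in the Hilbert space Kµ0 (the symbol θ is introduced here for convenience):

\[
\|N_1 + N_2\|^2 = \|N_1\|^2 + \|N_2\|^2 + 2\langle N_1, N_2 \rangle = 2 + 2\cos\theta > 2 + 2\cos\frac{2\pi}{3} = 1
\quad \text{whenever } 0 \le \theta < \frac{2\pi}{3}.
\]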

Summarizing, we have

Theorem 16. Suppose M is a Riemannian manifold with isometric splitting M = S × Rn where S is compact with nonnegative sectional curvature, and µ1, . . . , µk ∈ P2(M) have compact support. Then there is a Steiner solution in P2(M) spanning µ1, . . . , µk. Furthermore, this solution has a canonical representative Γ : G → P2(M) such that

1. G is a tree.

2. Vertices in G not mapped to µ1, . . . , µk have degree at least three.

3. For any vertex v in G \ Γ^{-1}(µ1, . . . , µk), if Γ(v) is absolutely continuous with respect to the volume measure, then the degree of v is three and all pairs of geodesics in Γ(G) meeting at Γ(v) do so with an angle of 2π/3.

The method of proof for Theorem 16 also applies to Steiner trees in locally compact, finite dimensional, nonnegatively curved Alexandrov spaces; one must simply replace the notion of absolute continuity of a measure with the notion of being a manifold point. This further illustrates the analogy between measures in P2(M) \ P2^ac(M) and singular points in a finite dimensional Alexandrov space mentioned in [13].

One may see that the absolute continuity assumption for the vertex v in Theorem 16 is only used to establish the existence of ε-almost midpoints in the tangent cone Kv. These ε-almost midpoints always exist for finite dimensional Alexandrov spaces of nonnegative curvature; however, [9] has an infinite dimensional counterexample.

4.2 Unboundedness of curvature

As nonpositive curvature provides useful convexity properties for making uniqueness arguments for minimizers, one might hope that (P2(M), W2) has nonpositive curvature for some class of manifolds M. In particular, one might ask if P2(Rn) is Alexandrov flat. For n ≥ 2, this is shown to be false by the following example of [1]:

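Consider R2 ⊂ Rn, and let

\[
\mu_1 = \tfrac{1}{2}\left(\delta_{(1,1)} + \delta_{(5,3)}\right), \qquad
\mu_2 = \tfrac{1}{2}\left(\delta_{(-1,1)} + \delta_{(-5,3)}\right), \qquad
\mu_3 = \tfrac{1}{2}\left(\delta_{(0,0)} + \delta_{(0,-4)}\right).
\]

The (constant speed) W2-geodesic between µ1 and µ2 is given by

\[
\mu_t = \tfrac{1}{2}\left(\delta_{(1-6t,\,1+2t)} + \delta_{(5-6t,\,3-2t)}\right), \qquad \text{so} \qquad
\mu_{1/2} = \tfrac{1}{2}\left(\delta_{(-2,2)} + \delta_{(2,2)}\right).
\]

One may then compute

\[
\begin{aligned}
W_2^2(\mu_1, \mu_2) &= 40, \\
W_2^2(\mu_1, \mu_3) &= 30, \\
W_2^2(\mu_2, \mu_3) &= 30, \\
W_2^2(\mu_{1/2}, \mu_3) &= 24, \\
d^2(\bar{\mu}_{1/2}, \bar{\mu}_3) &= 20,
\end{aligned}
\]

where $\bar{\mu}_{1/2}$, $\bar{\mu}_3$ are the corresponding points on the comparison triangle $\bar{\Delta}\mu_1\mu_2\mu_3$ in R2. Since $W_2(\mu_{1/2}, \mu_3) > d(\bar{\mu}_{1/2}, \bar{\mu}_3)$, P2(Rd) cannot have nonpositive Alexandrov curvature.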
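The numbers in this example are easy to reproduce. The sketch below is an illustration added here, not code from [1]: it computes the squared W2 distance between two uniform measures with equally many atoms by brute force over all matchings (for equal weights an optimal plan can be taken to be a permutation), and obtains the comparison distance from the Euclidean median formula. The helper name `w2_squared` is hypothetical.

```python
from itertools import permutations

def w2_squared(xs, ys):
    """Squared W2 distance between uniform measures on the atoms xs and ys in R^2.

    Assumes len(xs) == len(ys) and equal weights, so that some optimal plan is a
    permutation; brute force is fine for a handful of atoms.
    """
    n = len(xs)
    assert n == len(ys)

    def cost(perm):
        return sum((x[0] - ys[k][0]) ** 2 + (x[1] - ys[k][1]) ** 2
                   for x, k in zip(xs, perm)) / n

    return min(cost(perm) for perm in permutations(range(n)))

mu1 = [(1, 1), (5, 3)]
mu2 = [(-1, 1), (-5, 3)]
mu3 = [(0, 0), (0, -4)]
mu_half = [(-2, 2), (2, 2)]          # midpoint of the geodesic from mu1 to mu2

a = w2_squared(mu1, mu2)             # squared side opposite mu3
b = w2_squared(mu1, mu3)
c = w2_squared(mu2, mu3)
geodesic_median = w2_squared(mu_half, mu3)
# Squared median of the Euclidean comparison triangle, from the vertex for mu3
# to the midpoint of the side joining the vertices for mu1 and mu2.
comparison_median = (2 * b + 2 * c - a) / 4

print(a, b, c, geodesic_median, comparison_median)   # 40.0 30.0 30.0 24.0 20.0
```

Since 24 > 20, the midpoint of the side µ1µ2 is farther from µ3 in P2(R2) than in the comparison triangle, which is the failure of nonpositive curvature described above.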

We will now adapt the above example to show that the Alexandrov curvature of P2(M) is unbounded from above for any Riemannian manifold M of dimension at least two. To state this precisely, we must generalize our definition of Alexandrov space.

Definition 12. For κ ∈ R, the model space M^n_κ is defined as

• M^n_0 = Rn,

• if κ > 0, then M^n_κ is the n-dimensional sphere of sectional curvature κ,

• if κ < 0, then M^n_κ is the n-dimensional hyperbolic space of sectional curvature κ.

Definition 13. An Alexandrov space of curvature bounded above (below) by κ ∈ R is a length space which can be covered by a family of open sets Vi such that for each Vi:

1. There exists a shortest path in Vi connecting any two points in Vi.

2. For any a, b, c ∈ Vi and any point d in the shortest path ac, let $\bar{\Delta}\bar{a}\bar{b}\bar{c}$ be the comparison triangle for ∆abc in the model space M^2_κ, i.e. $|\bar{a}\bar{b}| = |ab|$, $|\bar{a}\bar{c}| = |ac|$ and $|\bar{b}\bar{c}| = |bc|$, and let $\bar{d}$ be the point in $\bar{a}\bar{c}$ such that $|\bar{a}\bar{d}| = |ad|$. Then |bd| is at most (at least) $|\bar{b}\bar{d}|$. (Intuitively, ∆abc is skinnier (fatter) than $\bar{\Delta}\bar{a}\bar{b}\bar{c}$.)

This is consistent with our previous definitions.


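Consider P2(Rd) and µ1, µ2, µ3, µ1/2 as above. It is easy to check that the triangle comparison condition varies continuously in κ, so for some ε > 0, the Alexandrov curvature of P2(Rd) is not bounded above by ε. Now set for λ > 0

\[
\begin{aligned}
{}^{\lambda}\mu_1 &= \tfrac{1}{2}\left(\delta_{(\lambda,\lambda)} + \delta_{(5\lambda,3\lambda)}\right), \\
{}^{\lambda}\mu_2 &= \tfrac{1}{2}\left(\delta_{(-\lambda,\lambda)} + \delta_{(-5\lambda,3\lambda)}\right), \\
{}^{\lambda}\mu_3 &= \tfrac{1}{2}\left(\delta_{(0,0)} + \delta_{(0,-4\lambda)}\right).
\end{aligned}
\]

Then

\[
\begin{aligned}
W_2^2({}^{\lambda}\mu_1, {}^{\lambda}\mu_2) &= 40\lambda^2, \\
W_2^2({}^{\lambda}\mu_1, {}^{\lambda}\mu_3) &= 30\lambda^2, \\
W_2^2({}^{\lambda}\mu_2, {}^{\lambda}\mu_3) &= 30\lambda^2, \\
W_2^2({}^{\lambda}\mu_{1/2}, {}^{\lambda}\mu_3) &= 24\lambda^2,
\end{aligned}
\]

and the rescaled measures ^λµ show that the Alexandrov curvature of P2(Rd) is not bounded above by ελ^{-2}. Sending λ → 0 proves the Alexandrov curvature of P2(Rd) is unbounded from above.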

For Riemannian M, choose an origin point p ∈ M and define the ^λµ as above, replacing the points in R2 by their images under the exponential map at p. As λ → 0, the distances approach the corresponding Euclidean distances. Thus for some ε0 > 0 and all small enough λ, the ^λµ show that the Alexandrov curvature of P2(M) is not bounded above by (ε + ε0)λ^{-2}, so the Alexandrov curvature of P2(M) is unbounded from above.

Now suppose we are given ν ∈ P2(M) and r > 0, and for each α ∈ (0, 1] and each µ above, define

\[
{}_{\alpha}\mu = (1 - \alpha)\nu + \alpha\mu.
\]

By the restriction property [18],

\[
W_2(\nu, {}_{\alpha}\mu) = \alpha W_2(\nu, \mu)
\]

and for small enough α, each _αµ is in B(ν, r). Similarly,

\[
W_2({}^{\lambda}_{\alpha}\mu_1, {}^{\lambda}_{\alpha}\mu_2) = \alpha W_2({}^{\lambda}\mu_1, {}^{\lambda}\mu_2), \quad \ldots, \quad
W_2({}^{\lambda}_{\alpha}\mu_{1/2}, {}^{\lambda}_{\alpha}\mu_3) = \alpha W_2({}^{\lambda}\mu_{1/2}, {}^{\lambda}\mu_3),
\]

so again for small enough λ, the ^λ_αµ show that the Alexandrov curvature of B(ν, r) is not bounded above by (ε + ε0)λ^{-2}α^{-2}, and hence the Alexandrov curvature of B(ν, r) is unbounded from above.

Note that the above argument also works with only minor alteration for n-dimensional Alexandrov spaces of curvature bounded below; one must simply choose an n-strained point as the origin. (The n-strained points form a dense manifold in the Alexandrov space, see [4].) Thus we have shown:

Proposition 17. If κ ∈ R, X is a Riemannian manifold or finite-dimensional Alexandrov space of curvature bounded below, dim X ≥ 2 and U is open in P2(X), then U is not an Alexandrov space of curvature bounded above by κ.


Furthermore, approximating the µ above by µk ∈ P2^ac(X) yields:

Corollary 18. If κ ∈ R, X is a Riemannian manifold or finite-dimensional Alexandrov space of curvature bounded below, dim X ≥ 2 and U is open in P2^ac(X), then U is not an Alexandrov space of curvature bounded above by κ.

It is therefore necessary to make fairly strong assumptions on the probability space P ⊂ P(X) in order to have nonpositive curvature.

4.3 Gaussian Steiner problems in (P2(R), W2)

We now consider Steiner problems in (P2(Rd), W2) where the boundary data µ1, . . . , µk consists of Gaussian measures. This will allow us to work in a nonpositively curved probability space, as well as give Steiner vertex degree results for a class of non-compactly supported boundary data.

The Gaussian measures on Rd are given by

\[
\gamma_{m,V}\, dx = \left(\frac{1}{2\pi}\right)^{d/2} \frac{1}{\sqrt{\det V}} \exp\left[ -\frac{1}{2} \left\langle x - m,\, V^{-1}(x - m) \right\rangle \right] dx
\]

for any m ∈ Rd and any d × d symmetric positive definite matrix V. Here

E[γm,V] = m, Cov(γm,V) = V.

We denote the set of Gaussian measures by Γ(Rd), and given a d × d orthogonal matrix P we denote

Γ(Rd, P) = {γm,V | V is diagonalized by P}.

Following formal arguments of Otto [15], Takatsu [17] has shown that Γ(Rd) and Γ(Rd, P) are geodesically convex in (P2(Rd), W2), with the W2-metric making Γ(Rd) a (d + d(d+1)/2)-dimensional Riemannian manifold and making Γ(Rd, P) isometric to Rd × (0, +∞)d. In particular,

\[
W_2^2(\gamma_{m,V}, \gamma_{n,U}) = |m - n|^2 + \operatorname{tr} V + \operatorname{tr} U - 2 \operatorname{tr} \sqrt{U^{1/2} V U^{1/2}}. \tag{1}
\]
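Formula (1) can be evaluated without a matrix square root routine when d = 2. The sketch below is an illustration under that assumption only: it uses the elementary identity tr √M = √(tr M + 2√(det M)) for a 2 × 2 symmetric positive semidefinite M, together with tr(U^{1/2} V U^{1/2}) = tr(UV) and det(U^{1/2} V U^{1/2}) = det U · det V. The function names are hypothetical and covariance matrices are passed as nested lists.

```python
import math

def trace_sqrt_2x2(tr_m: float, det_m: float) -> float:
    """tr sqrt(M) for a 2x2 symmetric PSD matrix M, given tr M and det M.

    If M has eigenvalues l1, l2 >= 0 then (sqrt(l1) + sqrt(l2))^2 = tr M + 2 sqrt(det M).
    """
    return math.sqrt(tr_m + 2.0 * math.sqrt(max(det_m, 0.0)))

def gaussian_w2_squared_2d(m, V, n, U):
    """Right-hand side of (1) for d = 2; V and U are symmetric positive definite."""
    def tr(A):
        return A[0][0] + A[1][1]

    def det(A):
        return A[0][0] * A[1][1] - A[0][1] * A[1][0]

    # tr(U^{1/2} V U^{1/2}) = tr(UV) by cyclicity of the trace.
    tr_uv = sum(U[i][k] * V[k][i] for i in range(2) for k in range(2))
    cross = trace_sqrt_2x2(tr_uv, det(U) * det(V))
    mean_part = (m[0] - n[0]) ** 2 + (m[1] - n[1]) ** 2
    return mean_part + tr(V) + tr(U) - 2.0 * cross

# Commuting (diagonal) covariances: the formula reduces to
# |m - n|^2 + sum_i (sqrt(v_i) - sqrt(u_i))^2 = 25 + (2 - 1)^2 + (3 - 4)^2.
value = gaussian_w2_squared_2d((0.0, 0.0), [[4.0, 0.0], [0.0, 9.0]],
                               (3.0, 4.0), [[1.0, 0.0], [0.0, 16.0]])
print(value)   # 27.0
```

The diagonal case in the usage lines also illustrates why Γ(Rd, P) is flat: in the coordinates (m, √v_1, . . . , √v_d) the distance (1) is Euclidean.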

We would like to solve the Monge-Kantorovich-Steiner problem in the space (P2(Rd), W2) for boundary data in Γ(Rd) and show that the solution stays in Γ(Rd). Our plan is to project

\[
\varphi : \mu \mapsto \gamma_{E[\mu],\, \operatorname{Cov}(\mu)},
\]

assuming Cov(µ) > 0. The following lemma shows this is distance nonincreasing.

Lemma 19. For µ, ν ∈ P2(Rd),

\[
W_2^2(\mu, \nu) \ge |E(\mu) - E(\nu)|^2 + \operatorname{tr} U + \operatorname{tr} V - 2 \operatorname{tr} \sqrt{U^{1/2} V U^{1/2}},
\]

where U = Cov(µ) and V = Cov(ν).


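Proof. We begin by showing that for random variables X, Y : Ω → Rd,

\[
\operatorname{tr} \operatorname{Cov}(X, Y) \le \operatorname{tr} \sqrt{(\operatorname{Cov}(X))^{1/2} \operatorname{Cov}(Y) (\operatorname{Cov}(X))^{1/2}}. \tag{2}
\]

Since tr A^2 ≤ (tr A)^2 for symmetric positive semidefinite A, it suffices to show

\[
\operatorname{tr} \operatorname{Cov}(X, Y) \le \sqrt{\operatorname{tr}\, (\operatorname{Cov}(X))^{1/2} \operatorname{Cov}(Y) (\operatorname{Cov}(X))^{1/2}}.
\]

Traces are independent of the choice of orthonormal basis and Cov(X) is real and symmetric, so choose a diagonalizing orthonormal basis for Cov(X). In this basis, we may write Cov(X) as the diagonal matrix with entries λi on the diagonal and Cov(Y) = (bij). Then (Cov(X))^{1/2} has diagonal entries √λi,

\[
(\operatorname{Cov}(X))^{1/2} \operatorname{Cov}(Y) (\operatorname{Cov}(X))^{1/2} = \left( \sqrt{\lambda_i \lambda_j}\, b_{ij} \right),
\]

and

\[
\sqrt{\operatorname{tr}\, (\operatorname{Cov}(X))^{1/2} \operatorname{Cov}(Y) (\operatorname{Cov}(X))^{1/2}} = \sqrt{\sum_{i=1}^{d} \lambda_i b_{ii}}.
\]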

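Now

\[
\operatorname{tr} \operatorname{Cov}(X, Y) = \sum_{i=1}^{d} \operatorname{Cov}(X_i, Y_i) \le \sum_{i=1}^{d} \sqrt{\operatorname{Var}(X_i) \operatorname{Var}(Y_i)}
\]

since the correlation between Xi and Yi is at most 1. (Note that if either variance is zero, then Cov(Xi, Yi) = 0 = √(Var(Xi) Var(Yi)).) Furthermore,

\[
\sum_{i=1}^{d} \sqrt{\operatorname{Var}(X_i) \operatorname{Var}(Y_i)} = \sum_{i=1}^{d} \sqrt{\lambda_i b_{ii}} \le \sqrt{\sum_{i=1}^{d} \lambda_i b_{ii}},
\]

so we have proven equation (2).

We also have

\[
\begin{aligned}
\operatorname{tr} \operatorname{Cov}(X, Y) &= \sum_{i=1}^{d} \operatorname{Cov}(X_i, Y_i) \\
&= \sum_{i=1}^{d} \left( E[X_i Y_i] - E[X_i] E[Y_i] \right) \\
&= E[X \cdot Y] - E[X] \cdot E[Y],
\end{aligned}
\]

so

\[
\operatorname{tr} \operatorname{Cov}(X, Y) = E[X \cdot Y] - E[X] \cdot E[Y]. \tag{3}
\]

In particular,

\[
\operatorname{tr} \operatorname{Cov}(X) = E[X \cdot X] - E[X] \cdot E[X]. \tag{4}
\]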


We may reformulate the Wasserstein distance as

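\[
W_2^2(\mu, \nu) = \inf E[|X - Y|^2],
\]

where the infimum is taken over all random variables X, Y : R2d → Rd with laws µ and ν respectively. For any such X, Y,

\[
\begin{aligned}
E[|X - Y|^2] &= E[X^2] + E[Y^2] - 2E[X \cdot Y] \\
&= E[X^2] + E[Y^2] - 2E[X] \cdot E[Y] - 2\left( E[X \cdot Y] - E[X] \cdot E[Y] \right) \\
&= E[X^2] + E[Y^2] - 2E[X] \cdot E[Y] - 2 \operatorname{tr} \operatorname{Cov}(X, Y) && \text{(by eqn. (3))} \\
&\ge E[X^2] + E[Y^2] - 2E[X] \cdot E[Y] - 2 \operatorname{tr} \sqrt{(\operatorname{Cov}(X))^{1/2} \operatorname{Cov}(Y) (\operatorname{Cov}(X))^{1/2}} && \text{(by eqn. (2))} \\
&= (E[X] - E[Y])^2 + E[X^2] - E[X] \cdot E[X] + E[Y^2] - E[Y] \cdot E[Y] \\
&\qquad - 2 \operatorname{tr} \sqrt{(\operatorname{Cov}(X))^{1/2} \operatorname{Cov}(Y) (\operatorname{Cov}(X))^{1/2}} \\
&= |E[X] - E[Y]|^2 + \operatorname{tr} \operatorname{Cov}(X) + \operatorname{tr} \operatorname{Cov}(Y) \\
&\qquad - 2 \operatorname{tr} \sqrt{(\operatorname{Cov}(X))^{1/2} \operatorname{Cov}(Y) (\operatorname{Cov}(X))^{1/2}} && \text{(by eqn. (4))} \\
&= |E(\mu) - E(\nu)|^2 + \operatorname{tr} U + \operatorname{tr} V - 2 \operatorname{tr} \sqrt{U^{1/2} V U^{1/2}}.
\end{aligned}
\]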

Since the inequality holds for any X,Y , it holds for the infimum as well.
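The inequality of Lemma 19 is easy to observe numerically in dimension d = 1, where tr U + tr V − 2 tr √(U^{1/2} V U^{1/2}) reduces to (√U − √V)^2, the squared difference of standard deviations. The following sketch is only an illustration: it assumes uniform empirical measures with equally many atoms, for which the sorted (monotone) matching is optimal on the line, and the sample distributions and function names are arbitrary choices made here.

```python
import math
import random

def w2_squared_1d(xs, ys):
    """Squared W2 between uniform empirical measures on R with equally many atoms.

    On the real line the monotone matching of sorted samples is optimal
    for the quadratic cost.
    """
    assert len(xs) == len(ys)
    return sum((x - y) ** 2 for x, y in zip(sorted(xs), sorted(ys))) / len(xs)

def gaussian_lower_bound_1d(xs, ys):
    """Right-hand side of Lemma 19 for d = 1: (mean gap)^2 + (sd gap)^2."""
    def mean_sd(zs):
        m = sum(zs) / len(zs)
        var = sum((z - m) ** 2 for z in zs) / len(zs)
        return m, math.sqrt(var)

    (m1, s1), (m2, s2) = mean_sd(xs), mean_sd(ys)
    return (m1 - m2) ** 2 + (s1 - s2) ** 2

random.seed(0)
xs = [random.uniform(-1.0, 1.0) for _ in range(200)]
ys = [random.expovariate(1.0) for _ in range(200)]
gap = w2_squared_1d(xs, ys) - gaussian_lower_bound_1d(xs, ys)
print(gap >= 0.0)
```

Equality holds when one measure is an increasing affine image of the other, for example for the two-point measures on {0, 2} and {1, 5}, where both sides equal 5.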

If we define

P(Rd) = {µ ∈ P2(Rd) | Cov(µ) > 0}

and

P(Rd, P) = {µ ∈ P2(Rd) | Cov(µ) > 0, Cov(µ) is diagonalized by P},

then we immediately obtain solutions to Steiner problems by applying the projection map ϕ and the local compactness of Γ(Rd) and Γ(Rd, P).

Corollary 20. Given boundary data µ1, . . . , µk in Γ(Rd), the parameterized and general forms of the Monge-Kantorovich-Steiner problem, the Monge-Steiner problem and the Steiner problem in P(Rd) have solutions in Γ(Rd).

Corollary 21. Given boundary data µ1, . . . , µk in Γ(Rd, P), the parameterized and general forms of the Monge-Kantorovich-Steiner problem, the Monge-Steiner problem and the Steiner problem in P(Rd, P) have solutions in Γ(Rd, P). The solutions are also Steiner graphs in R2d. In particular, for the general Steiner problems there is a Steiner solution which has a canonical representative Φ : G → P2(R) such that

1. G is a tree.

2. For any vertex v in G \ Φ^{-1}(µ1, . . . , µk), the degree of v is three and all pairs of geodesics in Φ(G) meeting at Φ(v) do so with an angle of 2π/3.


The structural result above follows by the variational argument we used before, as Γ(Rd, P) is flat, hence nonnegatively curved.

For clarity of results, we will now concentrate on the case d = 1. Here we have the advantage that Γ(R) = Γ(R, P) = R × (0, +∞). Define the projection

\[
\varphi : \mu \mapsto
\begin{cases}
\gamma_{E[\mu],\, \operatorname{Var}(\mu)} & \text{if } \operatorname{Var}(\mu) > 0, \\
\delta_{E[\mu]} & \text{if } \operatorname{Var}(\mu) = 0,
\end{cases}
\]

and note that (1) still holds for this interpretation of γm,0. Thus Lemma 19 implies that ϕ : P2(R) → R × [0, +∞) is a contraction mapping. Also, ϕ fixes Γ(R). If we are given boundary data µ1, . . . , µk in Γ(R), then for some ε > 0, Var(µi) > ε for all i, and we may retract R × [0, ε] to R × {ε} without increasing the network length in Γ(R) = R × [0, +∞). We obtain in this manner Steiner solutions over the full space P2(R) which stay in Γ(R).

Theorem 22. Given boundary data µ1, . . . , µk in Γ(R), the parameterized and general forms of the Monge-Kantorovich-Steiner problem, the Monge-Steiner problem and the Steiner problem in P2(R) have solutions in Γ(R). The solutions are also Steiner graphs in R2, and the Steiner ratio for this restricted class of boundary data is √3/2.

Proof. The preceding discussion proves existence of a solution in Γ(R), which may then be viewed as a solution to the corresponding planar Steiner problem. The Steiner ratio result then follows by the proof of the Gilbert-Pollak conjecture [6].
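To make the planar picture concrete, the sketch below treats three one-dimensional Gaussians as points (mean, standard deviation) in the half-plane; with d = 1, formula (1) reads W2^2 = (m − n)^2 + (√V − √U)^2, so W2 is the Euclidean distance in these coordinates. It then compares the length of a Steiner tree with that of a minimal spanning tree. The example is an illustration only: it assumes three terminals whose triangle has all angles below 2π/3, so that the Steiner tree is the star through the Fermat point, and it locates that point with the Weiszfeld iteration, a standard method not used in the text.

```python
import math

def dist(p, q):
    """Euclidean distance in the (mean, standard deviation) half-plane."""
    return math.hypot(p[0] - q[0], p[1] - q[1])

def fermat_point(pts, iterations=200):
    """Weiszfeld iteration for the point minimizing the sum of distances to pts."""
    x = [sum(p[0] for p in pts) / len(pts), sum(p[1] for p in pts) / len(pts)]
    for _ in range(iterations):
        weights = []
        for p in pts:
            d = dist(x, p)
            if d == 0.0:              # the iterate landed on a terminal; stop there
                return tuple(x)
            weights.append(1.0 / d)
        total = sum(weights)
        x = [sum(w * p[k] for w, p in zip(weights, pts)) / total for k in range(2)]
    return tuple(x)

# Three Gaussians N(mean, sd^2), written as (mean, sd) with sd > 0; they form
# an equilateral triangle of side 1 in the half-plane.
terminals = [(0.0, 1.0), (1.0, 1.0), (0.5, 1.0 + math.sqrt(3) / 2)]

s = fermat_point(terminals)
steiner_length = sum(dist(s, p) for p in terminals)
sides = sorted(dist(terminals[i], terminals[j])
               for i in range(3) for j in range(i + 1, 3))
spanning_length = sides[0] + sides[1]    # minimal spanning tree on three points
ratio = steiner_length / spanning_length
print(s, ratio)
```

For this equilateral configuration the ratio equals √3/2, the extremal value in Theorem 22, and the Steiner vertex has positive standard deviation, so it is again a Gaussian measure.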

The preceding Steiner ratio result cannot hold for arbitrary boundary data.

Proposition 23. If M is a Riemannian manifold of nonnegative curvature, then

\[
\rho(P_2(M)) \le \lim_{d \to \infty} \rho(\mathbb{R}^d).
\]

Proof. It suffices to find for any d ∈ N and any ε > 0 a finite set S ⊂ P2(M) such that

\[
\frac{L_s(S)}{L_a(S)} \le \rho(\mathbb{R}^d) + \varepsilon.
\]

By definition there is a finite set S′ ⊂ Rd such that

\[
\frac{L_s(S')}{L_a(S')} \le \rho(\mathbb{R}^d) + \frac{\varepsilon}{3}.
\]

Choose a compact N ⊂ M and some µ ∈ P2^ac(M) with support in N. Then by Theorem 14, the tangent cone Kµ of P2(N) at µ is an infinite-dimensional Hilbert space, so we may choose an orthonormal set ν1, . . . , νd ∈ Kµ. In this way, we obtain a subset of Kµ isometric to Rd. Let S1 ⊂ Kµ be the finite set corresponding to S′. We may approximate S1 by a finite set S2 such that each point is on a ray corresponding to a direction of some geodesic in P2(N), and

\[
\frac{L_s(S_2)}{L_a(S_2)} \le \frac{L_s(S_1)}{L_a(S_1)} + \frac{\varepsilon}{3} = \frac{L_s(S')}{L_a(S')} + \frac{\varepsilon}{3}.
\]


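We may rescale by λ about the origin to obtain λS2 with

\[
\frac{L_s(\lambda S_2)}{L_a(\lambda S_2)} = \frac{L_s(S_2)}{L_a(S_2)},
\]

and for small enough λ the exponential map will be well-defined on λS2. Since

\[
\left| \frac{L_s(\lambda S_2)}{L_a(\lambda S_2)} - \frac{L_s(\exp(\lambda S_2))}{L_a(\exp(\lambda S_2))} \right| \to 0 \quad \text{as } \lambda \to 0,
\]

for small enough λ we have

\[
\frac{L_s(\exp(\lambda S_2))}{L_a(\exp(\lambda S_2))} \le \frac{L_s(\lambda S_2)}{L_a(\lambda S_2)} + \frac{\varepsilon}{3} \le \rho(\mathbb{R}^d) + \varepsilon.
\]

Sending ε → 0 gives the desired result.

Since ρ(R3) < √3/2 (see [7]), this proposition shows that ρ(P2(R)) < √3/2.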

References

[1] Luigi Ambrosio, Nicola Gigli, and Giuseppe Savaré. Gradient flows in metric spaces and in the space of probability measures. Lectures in Mathematics ETH Zürich. Birkhäuser Verlag, Basel, 2005.

[2] Luigi Ambrosio and Paolo Tilli. Topics on analysis in metric spaces, volume 25 of Oxford Lecture Series in Mathematics and its Applications. Oxford University Press, Oxford, 2004.

[3] Martin R. Bridson and André Haefliger. Metric spaces of non-positive curvature, volume 319 of Grundlehren der Mathematischen Wissenschaften [Fundamental Principles of Mathematical Sciences]. Springer-Verlag, Berlin, 1999.

[4] Dmitri Burago, Yuri Burago, and Sergei Ivanov. A course in metric geometry, volume 33 of Graduate Studies in Mathematics. American Mathematical Society, Providence, RI, 2001.

[5] Manfredo Perdigão do Carmo. Riemannian geometry. Mathematics: Theory & Applications. Birkhäuser Boston Inc., Boston, MA, 1992. Translated from the second Portuguese edition by Francis Flaherty.

[6] D.-Z. Du and F. K. Hwang. An approach for proving lower bounds: solution of Gilbert-Pollak’s conjecture on Steiner ratio. In 31st Annual Symposium on Foundations of Computer Science, Vol. I, II (St. Louis, MO, 1990), pages 76–85. IEEE Comput. Soc. Press, Los Alamitos, CA, 1990.

[7] Ding-Zhu Du and Warren D. Smith. Disproofs of generalized Gilbert-Pollak conjecture on the Steiner ratio in three or more dimensions. J. Combin. Theory Ser. A, 74(1):115–130, 1996.


[8] E. N. Gilbert and H. O. Pollak. Steiner minimal trees. SIAM J. Appl. Math., 16:1–29, 1968.

[9] Stephanie Halbeisen. On tangent cones of Alexandrov spaces with curvature bounded below. Manuscripta Math., 103(2):169–182, 2000.

[10] Alexandr O. Ivanov and Alexei A. Tuzhilin. Minimal networks. CRC Press, Boca Raton, FL, 1994. The Steiner problem and its generalizations.

[11] L. V. Kantorovich. On a problem of Monge. Zap. Nauchn. Sem. S.-Peterburg. Otdel. Mat. Inst. Steklov. (POMI), 312(Teor. Predst. Din. Sist. Komb. i Algoritm. Metody. 11):15–16, 2004.

[12] L. V. Kantorovich. On mass transportation. Zap. Nauchn. Sem. S.-Peterburg. Otdel. Mat. Inst. Steklov. (POMI), 312(Teor. Predst. Din. Sist. Komb. i Algoritm. Metody. 11):11–14, 2004.

[13] John Lott and Cédric Villani. Ricci curvature for metric-measure spaces via optimal transport, 2004.

[14] Chikako Mese and Sumio Yamada. The parameterized Steiner problem and the singular Plateau problem via energy. Trans. Amer. Math. Soc., 358(7):2875–2895 (electronic), 2006.

[15] Felix Otto. The geometry of dissipative evolution equations: the porous medium equation. Comm. Partial Differential Equations, 26(1-2):101–174, 2001.

[16] Yu. V. Prokhorov. Convergence of random processes and limit theorems in probability theory. Teor. Veroyatnost. i Primenen., 1:177–238, 1956.

[17] Asuka Takatsu. On Wasserstein geometry of the space of Gaussian measures, 2008.

[18] Cédric Villani. Optimal transport, old and new. 2007.
