Large deviation and variational approaches to generalised ...tugaut.perso.math.cnrs.fr/pdf/APSSE/2018.01.23.pdf2018/01/23 · Prob H n n = 1 = 2 n = exp( nlog 2) S. Varadhan Abel

Large deviation and variational approaches togeneralised gradient flows

Hong Duong

Imperial College London

Université Jean Monnet, 23/01/2018

Hong Duong (Imperial College London) Large deviation and variational approaches to generalised gradient flows23/01/2018 1 / 30

Introduction

1 Wasserstein gradient flows

2 From Large-deviation principle to Wasserstein gradient flows

3 Generalised Wasserstein gradient flows

4 Multiscale analysis/Coarse-graining


Wasserstein gradient flows

Jordan-Kinderlehrer-Otto, 1998: the Fokker-Planck equation

∂tρ = div(∇V ρ) + ∆ρ, ρ(0) = ρ0,

is a gradient flow of the free energy

F(ρ) =ˆρ log ρ+ V ρ

with respect to the Wasserstein distance on the probability measures withfinite second moments

W 22 (µ, ν) = infγ∈Γ(µ,ν)

ˆRd×Rd

|x − y |2 γ(dxdy)

(Γ(µ, ν) is the set of couplings between µ and ν)


JKO-scheme

JKO-scheme: Given a time-step h > 0;

ρh0 := ρ0,

for n ≥ 1, ρhn is defined to be the unique minimizer of

1

2hW 22 (ρ, ρ

hn−1) + F(ρ).

Then the sequence {ρhn}n≥0, after an appropriate interpolation, convergesto the solution to the Fokker-Planck equation as h→ 0.

Jh(·|ρ0) :=1

2

[ 12h

W 22 (·, ρ0) + F(·)−F(ρ0)]


The theory of Wasserstein gradient flows

The theory of Wasserstein gradient flow has developed tremendously overthe last 20 years

Many PDEs are Wasserstein gradient flows: porous medium equation,McKean-Vlasov equations, finite Markov chain, etc;

Link different areas of Mathematics together (optimal transport,measure geometry & probability)

see e.g., monographs

Ambrosio L., Gigli N., Savaré G (2008),

Villani C. (2003 & 2009).

or the most recent survey by Santambrogio (2017).

BUT

Only for gradient flows, i.e., dissipative systems;

No microscopic interpretation.


Questions discussed today

Why the JKO scheme?

Can one establish JKO-type scheme for non-dissipative system, e.g.,the kinetic Fokker-Planck equation?

Can one exploit variational formulation for multi-scale analysis?


From Large-deviation principle to Wasserstein gradient flows


Large-deviation principles

Toss a fair coin n times, Hn = number ofheads

What is the chance of getting all heads?

Prob

(Hnn

= 1

)= 2−n = exp(−n log 2)

S. VaradhanAbel prize 2007

In general,

Prob

(Hnn≈ x

)n→∞∼ exp(−n I (x)), for 0 ≤ x ≤ 1

I : rate functional

bb

bc bc

0 112

∞ ∞

I(x)

x

log(2)


Microscopic interpretation of the JKO-scheme

Stochastic particle systems

dX it = −∇V (X it ) dt +√

2dW it , i = 1, . . . , n;

and its empirical measure

ρnt :=1

n

n∑i=1

δX it .

Large deviation principle:

Prob(ρnh ≈ ρ|ρn0 ≈ ρ0

)∼ exp

(− nIh(ρ|ρ0)

)as n→∞.


From large-deviation principle to Wasserstein gradient flow

Theorem (Adams-Peletier-Dirr-Zimmer 2011, D.-Laschos-Renger2013, Erbar-Maas-Renger 2015)

Ih(·|ρ0)−1

4hW 22 (·, ρ0)

h→0−−−−−→Γ

1

2F(·)− 1

2F(ρ0).

i.e.,Ih(·|ρ0) ≈ Jh(·|ρ0) as h→ 0.

(Γ: Gamma convergence)


Idea of proof

H−1(ρ)-norm: D = C∞c (Rd), s ∈ D′:

‖s‖2−1,ρ := supf ∈D

{2〈s, f 〉 −

ˆ|∇f |2 dρ

}Benamou-Brenier formulation

W 22 (ρ0, ρ1) = inf{ˆ 1

0‖∂tρt‖2−1,ρt dt : (ρt)t ∈ AC 2(ρ0, ρ1)};

Representation of the rate-functional

Ih(ρ1|ρ0) =1

4h

ˆ 10

∥∥∥∂tρt − h(div(∇V ρ) + ∆ρ)∥∥∥2−1,ρt

dt.


Idea of proof (cont.)

Proof of Γ-convergence

Liminf-inequality:

Ih(ρ1|ρ0) =1

2

(F(·)−F(ρ0)

)+

1

4h

ˆ 10‖∂tρt‖2−1,ρt dt

+h

4

ˆ 10‖ div(∇V ρ)∆ρ‖2−1,ρt dt

≥ 12

(F(·)−F(ρ0)

)+

1

4h

ˆ 10‖∂tρt‖2−1,ρt dt

≥ 12

(F(·)−F(ρ0)

)+

1

4hW 22 (ρ0, ρ1).

Limsup-inequality: construction of a curve (ρt)t such that the lastterm in Ih vanishes.


Variational formulation for non-dissipative systems


Langevin dynamics and the kinetic Fokker-Planck equation

The kinetic Fokker-Planck equation

∂tρ = −divq( pmρ)

+ divp(∇V ρ

)+ γ divp

( pmρ)

+ γβ−1∆pρ.

The Langevin dynamics

dQ =P

mdt,

dP = −∇V (Q) dt − γ Pm

dt +√

2γβ−1 dWt .

Neither a gradient flow nor a Hamiltonian flow, but a combination of them.


Variational formulation for the kinetic Fokker-Planckequation

New optimal transportation cost functional

Ch(q, p; q′, p′) := h inf

{ˆ h0

∣∣mξ̈(t) +∇V (ξ(t))∣∣2 dt : ξ ∈ C 1([0, h],Rd)such that

(ξ,mξ̇)(0) = (q, p), (ξ,mξ̇)(h) = (q′, p′)

}.

Given µ(dqdp), ν(dq′dp′), define

W̃h(µ, ν) := infγ∈Γ(µ,ν)

ˆR2d×R2d

Ch(q, p; q′, p′)γ(dqdpdq′dp′)


Explanation of the cost

Wasserstein cost

C (x , y) = |x − y |2 = h inf{ˆ h

0

∣∣ξ̇(t)∣∣2 dt : ξ ∈ C 1([0, h],Rd)such that ξ(0) = x , ξ(h) = y

}.

Freidlin-Wentzell small-noise large deviation

dX εt =√

2εdWt .

then {X ε : X ε(0) = x ,X ε(h) = y} satisfies a LDP with rate functionC (x , y).Our cost: small-noise perturbation of the Hamiltonian system

dQ =P

mdt

dP = −∇V (Q) dt +√

2εdWt


Variational formulation for the kinetic Fokker-Planckequation

JKO-type scheme for the kinetic Fokker-Planck equation: ρh0 := ρ0, forn ≥ 1, ρhn as the solution of the minimization problem

minρ

1

2h

1

γW̃h(ρ

hk−1, ρ) + F(ρ),

Theorem (D.-Peletier-Zimmer 2014)

Under the piece-wise constant interpolation, the sequence {ρhn}nconverges, as h→ 0, to the solution of the kinetic Fokker-Planck equation.

Extensions

D.-Tran 2017: a degenerate Kolmogorov diffusion.

D.-Pavliotis, in prep.: hypoelliptic diffusions.


Multiscale analysis and coarse-graining


Langevin and overdamped Langevin dynamics

Langevin dynamics

dQ =P

mdt,

dP = −∇V (Q) dt − γ Pm

dt +√

2γβ−1 dWt .

overdamped Langevin dynamics

dX = −∇V (X ) dt +√

2β−1dWt


Singular limits of Langevin dynamics

Overdamped limit (γ →∞): Kramers 1940 (formal computations),Nelson 1967 (rigorous, SDEs).

Small noise limit (ε := β−1 → 0): Freidlin-Wentzell theory (RandomPerturbations of Dynamical Systems) 1994.


Coarse-graining

Challenge: the curse of dimensionality (d extremely large).

Question: Reduced description of the system, that still includes importantdynamical information?

Coarse-graining/model reduction:

F : Rd → Rk with k < d .

Quantity of interest: path t 7→ F (X (t)). [X = (Q,P) or X = Q]


Previous work

Gyöngy 1986 introduced an exact dynamics

dY1(t) = b1(t,Y1(t)) dt +√

2β−1A1(t,Y1(t))dW (t), Y1(t) ∈ Rk

Law(Y1) = Law(F (X )) but it is extremely difficult/expensive tocompute b1 and A1.

Legoll et al. 2010, 2016 proposed an effective dynamics

dY2(t) = b2(Y2(t)) dt +√

2β−1A2(Y2(t)) dW (t), Y2(t) ∈ Rk

It is much easier/cheaper to compute b2 and A2. The authorsestimate the error between ρ1 := Law(Y1) and ρ2 := Law(Y2).


Limitations of Legoll et al.

only overdamped Langevin,

scalar coarse-graining maps (k = 1),

errors measured only in the relative entropy sense.


Our work: a unifying technique

Based on the connection to large-deviation principle, we

provide a unifying technique for both singular limits andcoarse-graining,

generalise Legoll et al. to the full Langevin dynamics for vectorialmaps and more error-measurement methods.


Theorem (D.-Lamacz-Peletier-Sharma 2017,D.-Lamacz-Peletier-Schlichting-Sharma sub. 2017)

Under certain regularity and growth assumptions on V and F ,

(1) (qualitative) We recover well-known results for the overdamped andsmall-noise limits of Langevin dynamics.

(2) (quantitative) It holds that

supt∈[0,T ]

D(ρ1(t), ρ2(t))︸︷︷︸Error measurement

(D=W 22 or H)

≤D(ρ1(0), ρ2(0))︸︷︷︸Initial data

+C (V ,F )︸︷︷︸Data

H(Law(X0)||µ).︸︷︷︸Reference dynamics

If V (x) = 1εV0(x0, . . . , xd−k) + V1(x), then

C (V ,F ) ≤ Cε2.


Idea of proof: connections to large-deviation principle

Both X and Y2 can be written as

dZ (t) = b(Z (t)) dt +√

2σ(Z (t)) dW (t), Z (t) ∈ RN .

Consider n-copies of Z

dZi (t) = b(Zi (t)) dt +√

2σ(Zi (t)) dWi (t), Zi (t) ∈ RN , (i = 1, . . . , n)

The empirical measure

ρn(t) :=1

n

n∑i=1

δZi (t).

[Dawson-Gartner 1987]: ρn satisfies a large-deviation principle with a ratefunctional I

Prob(ρn∣∣[0,T ]

≈ ρ∣∣[0,T ]

)n→∞∼ exp(−nI(ρ))


Properties of the rate-functional

(i) I ≥ 0(ii) I(ρ) = 0 ⇐⇒ ρ = Law(Z)(iii) I has a dual formulation

I(ρ) = supf

G (ρ, f )

G (ρ, f ) =

ˆRN

[fTρT − f0ρ0]−ˆ T

0

ˆRN

(∂t f + Lf + Γ(f , f )

)dρt dt.


Key inequalities

(HIR inequality) Let ρ = Law(Z ) then

H(η(t)||ρ(t)) +ˆ T

0R(η(s)||ρ(s)) ds ≤ H(η(0)||ρ(0)) + I(η),

for any η ∈ C ([0,T ],P(RN)).(Estimate of the rate functional)

I(η) ≤ C (V ,F )H(Law(X0)||µ)


Conclusions

The central role of the large-deviation rate functional

(1) microscopic interpretations of the Wasserstein gradient flowformulation,

(2) construction of new approximation schemes,

(3) Singular limits and coarse-graining of Langevin and overdampedLangevin dynamics.


Thank you for your attention!


Wasserstein gradient flowsFrom Large-deviation principle to Wasserstein gradient flowsGeneralised Wasserstein gradient flowsMultiscale analysis/Coarse-graining

Documents

Large deviation and variational approaches to generalised ...tugaut.perso.math.cnrs.fr/pdf/APSSE/2018.01.23.pdf2018/01/23 · Prob H n n = 1 = 2 n = exp( nlog 2) S. Varadhan Abel