35
Introduction to large deviation theory: Theory, applications, simulations Hugo Touchette School of Mathematical Sciences Queen Mary, University of London 3rd International Summer School on Modern Computational Science: Simulation of Rare and Extreme Events Oldenburg, Germany 15-26 August 2011 Hugo Touchette (QMUL) Large deviations August 2011 1 / 74 Outline Themes Random variables / stochastic systems Most probable values / typical states Fluctuations around these states Small vs large deviations / fluctuations / rare events 1 Large deviation theory 2 Applications 3 Simulations I 4 Simulations II Lecture notes: arxiv:1106.4146 H. Touchette, The large deviation approach to statistical mechanics, Physics Reports 478, 1-69, 2009. arxiv:0804.0327 Hugo Touchette (QMUL) Large deviations August 2011 2 / 74

Introduction to large deviation theory: Theory, applications, simulationsappliedmaths.sun.ac.za/~htouchette/archive/talks/oldenburgldt1_2.pdf · Introduction to large deviation theory:

  • Upload
    others

  • View
    3

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Introduction to large deviation theory: Theory, applications, simulationsappliedmaths.sun.ac.za/~htouchette/archive/talks/oldenburgldt1_2.pdf · Introduction to large deviation theory:

Introduction to large deviation theory:Theory, applications, simulations

Hugo Touchette

School of Mathematical SciencesQueen Mary, University of London

3rd International Summer School on Modern Computational Science:Simulation of Rare and Extreme Events

Oldenburg, Germany15-26 August 2011

Hugo Touchette (QMUL) Large deviations August 2011 1 / 74

Outline

Themes

Random variables / stochastic systems

Most probable values / typical states

Fluctuations around these states

Small vs large deviations / fluctuations / rare events

1 Large deviation theory

2 Applications

3 Simulations I

4 Simulations II

Lecture notes: arxiv:1106.4146

H. Touchette, The large deviation approach to statistical mechanics,Physics Reports 478, 1-69, 2009. arxiv:0804.0327

Hugo Touchette (QMUL) Large deviations August 2011 2 / 74

Page 2: Introduction to large deviation theory: Theory, applications, simulationsappliedmaths.sun.ac.za/~htouchette/archive/talks/oldenburgldt1_2.pdf · Introduction to large deviation theory:

Example: Sum of Gaussian random variables

Sn =1

n

n∑i=1

Xi , p(Xi = x) =1√

2πσ2e−(x−µ)2/(2σ2)

0 100 200 300 400 5000

100

200

300

400

500

0 100 200 300 400 500

-1

0

1

2

3

4

n n

SnSnn

Basic observations

Sn → µ in probability

Fluctuations ∼ 1/√

n→ 0

Hugo Touchette (QMUL) Large deviations August 2011 4 / 74

Sum of Gaussian random variables (cont’d)

Probability density function (pdf) of Sn:

p(Sn = s) =

√n

2πσ2e−n(s−µ)2/(2σ2)

Variance: var(Sn) =σ2

n→ 0

Dominant part:

p(Sn = s) ≈ e−nI (s)

Rate function:

I (s) =(s − µ)2

2σ2

p(S =s)

I(s)

n

s

n=10

n=100

n=500

µ

Hugo Touchette (QMUL) Large deviations August 2011 5 / 74

Page 3: Introduction to large deviation theory: Theory, applications, simulationsappliedmaths.sun.ac.za/~htouchette/archive/talks/oldenburgldt1_2.pdf · Introduction to large deviation theory:

Exercise: Calculation of p(Sn) via generating functions[Exercise 2.7.1]

Hugo Touchette (QMUL) Large deviations August 2011 6 / 74

Example: Sum of exponential random variables

Sn =1

n

n∑i=1

Xi

p(x) =1

µe−x/µ

0 100 200 300 400 5000

100

200

300

400

500

0 100 200 300 400 500

1

1.5

2

2.5

3

3.5

n n

SnSnn

Large deviation probability:

p(Sn = s) ≈ e−nI (s)

Rate function:

I (s) =s

µ− 1− ln

s

µ

I(s)

s

n=10

n=100

n=500

µ

p(S =s)n

Hugo Touchette (QMUL) Large deviations August 2011 7 / 74

Page 4: Introduction to large deviation theory: Theory, applications, simulationsappliedmaths.sun.ac.za/~htouchette/archive/talks/oldenburgldt1_2.pdf · Introduction to large deviation theory:

Example: Random bits

Sequence of bits:

ω = 010011100101︸ ︷︷ ︸n bits

,P(0) = p0

P(1) = p1 = 1− p0

Empirical vector:

Ln,0(ω) =# 0’s in ω

n

Ln,1(ω) =# 1’s in ω

n

Ln = (Ln,0, Ln,1)

Example:

ω = 0001101001︸ ︷︷ ︸n=10

, L10,0 =6

10=

3

5

L10,1 =4

10=

2

5

Hugo Touchette (QMUL) Large deviations August 2011 8 / 74

Random bits (cont’d)

Probability of a sequence:

P(ω) = pnLn,0

0 pnLn,1

1

Probability of Ln:

P(Ln = µ) = P(Ln,0 = µ0, Ln,1 = µ1)

=n!

(nµ0)!(nµ1)!pnµ0

0 pnµ11

[Exercise 2.7.5]

Large deviation probability

P(Ln = µ) ≈ e−nD(µ), D(µ) = µ0 lnµ0

p0+ µ1 ln

µ1

p1

D = relative entropy

Zero of D: µ = (p0, p1)

Ln → (p0, p1) in probability

Hugo Touchette (QMUL) Large deviations August 2011 9 / 74

Page 5: Introduction to large deviation theory: Theory, applications, simulationsappliedmaths.sun.ac.za/~htouchette/archive/talks/oldenburgldt1_2.pdf · Introduction to large deviation theory:

Example: Spin system

Spin chain (configuration):

ω = ω1, ω2, . . . , ωn︸ ︷︷ ︸n spins

, ωi ∈ −1,+1

Mean magnetization:

Mn =1

n

n∑i=1

ωi

Density of states:

Ω(m) = # configurations ω with Mn = m

Large deviation form

Ω(m) ≈ ens(m), s(m) = −1−m

2ln

1−m

2− 1 + m

2ln

1 + m

2

Hugo Touchette (QMUL) Large deviations August 2011 10 / 74

Large deviation theory

Random variable: An

Probability density: p(An = a)

Large deviation principle (LDP)

p(An = a) ≈ e−nI (a)

Meaning of ≈:ln p(a) = −nI (a) + o(n)

limn→∞

−1

nln p(a) = I (a)

Rate function: I (a) ≥ 0

Goals of large deviation theory

1 Prove that a large deviation principle exists

2 Calculate the rate function

Hugo Touchette (QMUL) Large deviations August 2011 11 / 74

Page 6: Introduction to large deviation theory: Theory, applications, simulationsappliedmaths.sun.ac.za/~htouchette/archive/talks/oldenburgldt1_2.pdf · Introduction to large deviation theory:

Varadhan’s TheoremExponential average:

〈enf (An)〉 =

∫p(An = a) enf (a) da

Assume LDP for An:

p(An = a) ≈ e−nI (a)

Courant Institute

Abel Prize 2007

Theorem: Varadhan (1966)

λ(f ) = limn→∞

1

nlog〈enf (An)〉 = max

af (a)− I (a)

Special case: f (a) = ka

λ(k) = maxaka− I (a)

Hugo Touchette (QMUL) Large deviations August 2011 12 / 74

Heuristic derivation of Varadhan’s result

Hugo Touchette (QMUL) Large deviations August 2011 13 / 74

Page 7: Introduction to large deviation theory: Theory, applications, simulationsappliedmaths.sun.ac.za/~htouchette/archive/talks/oldenburgldt1_2.pdf · Introduction to large deviation theory:

Gartner-Ellis Theorem

Scaled cumulant generating function (SCGF)

λ(k) = limn→∞

1

nln〈enkAn〉, k ∈ R

Theorem: Gartner (1977), Ellis (1984)

If λ(k) is differentiable, then

1 Existence of LDP:

p(An = a) ≈ e−nI (a)

2 Rate function:

I (a) = maxkka− λ(k)

Richard S. Ellis

UMass, USA

I (a) = Legendre transform of λ(k)

I (a) convex in this case

Hugo Touchette (QMUL) Large deviations August 2011 14 / 74

Exercise: Legendre transforms[Exercise 2.7.8]

Hugo Touchette (QMUL) Large deviations August 2011 15 / 74

Page 8: Introduction to large deviation theory: Theory, applications, simulationsappliedmaths.sun.ac.za/~htouchette/archive/talks/oldenburgldt1_2.pdf · Introduction to large deviation theory:

Contraction principle

LDP An −→ Bn LDP?

LDP for An:p(An = a) ≈ e−nIA(a)

Contraction: Bn = f (An)Probability for Bn:

p(Bn = b) =

∫f −1(b)

p(An = a) da

Contraction principle

LDP for Bn:p(Bn = b) ≈ e−nIB(b)

Rate function:

IB(b) = mina:f (a)=b

IA(a) = minf −1(b)

IA(a)

Hugo Touchette (QMUL) Large deviations August 2011 16 / 74

Sums of IID random variablesRandom variable:

Sn =1

n

n∑i=1

Xi , Xi ∼ p(x), IID

SCGF:

λ(k) = limn→∞

1

nln〈enkSn〉 = lim

n→∞

1

nln

⟨n∏

i=1

ekXi

⟩= ln〈ekX 〉

Gartner-Ellis Theorem

I (s) = k(s)s − λ(k(s)), λ′(k(s)) = s

I (s) is convex

Zero of I (s) at 〈X 〉Originally proved by Cramer (1938)

Hugo Touchette (QMUL) Large deviations August 2011 17 / 74

Page 9: Introduction to large deviation theory: Theory, applications, simulationsappliedmaths.sun.ac.za/~htouchette/archive/talks/oldenburgldt1_2.pdf · Introduction to large deviation theory:

Example: Gaussian random variables[Exercise 3.6.1]

Sn =1

n

n∑i=1

Xi , p(Xi = x) =1√

2πσ2e−(x−µ)2/(2σ2)

Generating function:

〈ekX 〉 = ln

∫ ∞−∞

ekx 1√2πσ2

e−(x−µ)2/(2σ2)dx = ekµ+σ2k2/2

Log-generating function:

λ(k) = ln〈ekX 〉 = kµ+σ2

2k2

Rate function:

I (s) = k(s)s−λ(k(s)) =(s − µ)2

2σ2

p(S =s)

I(s)

n

s

n=10

n=100

n=500

µ

Hugo Touchette (QMUL) Large deviations August 2011 18 / 74

Example: Exponential random variables*

[Exercise 3.6.1]

Sn =1

n

n∑i=1

Xi , p(Xi = x) =1

µe−x/µ, x > 0

Log-generating function:

λ(k) = − ln(1− µk), k <1

µ

Rate function:

I (s) =s

µ− 1− ln

s

µ, s > 0

I(s)

s

n=10

n=100

n=500

µ

p(S =s)n

Hugo Touchette (QMUL) Large deviations August 2011 19 / 74

Page 10: Introduction to large deviation theory: Theory, applications, simulationsappliedmaths.sun.ac.za/~htouchette/archive/talks/oldenburgldt1_2.pdf · Introduction to large deviation theory:

Example: Bernoulli sample mean[Exercise 3.6.1]

Sn =1

n

n∑i=1

Xi ,Xi ∈ 0, 1P(Xi = 1) = αP(Xi = 0) = 1− α

Discrete RV:

Sn ∈

0,1

n,

2

n, . . . ,

n − 1

n, 1

Discrete probability distribution: P(Sn = s)

Values of Sn “fill” unit interval [0, 1] as n→∞Continuous-limit probability density:

p(Sn = s) =P(Sn ∈ [s, s + ∆s])

∆s

Smoothed pdf (weak convergence)

Hugo Touchette (QMUL) Large deviations August 2011 20 / 74

Bernoulli sample mean (cont’d)

0.0 0.2 0.4 0.6 0.8 1.00.00

0.05

0.10

0.15

0.20

0.25

0.30

0.35

s

PS n

HsL

0.0 0.2 0.4 0.6 0.8 1.00.0

0.2

0.4

0.6

0.8

s

I nHsL

•n=5•n=25•n=50•n=100

0.0 0.2 0.4 0.6 0.8 1.00

2

4

6

8

s

p SnHsL

0.0 0.2 0.4 0.6 0.8 1.0

0.0

0.2

0.4

0.6

0.8

s

I nHsL

— n=5— n=25— n=50— n=100

LDP: p(Sn = s) ≈ e−nI (s)

Rate function:

I (s) = s lns

α+ (1− s) ln

1− s

1− αHugo Touchette (QMUL) Large deviations August 2011 21 / 74

Page 11: Introduction to large deviation theory: Theory, applications, simulationsappliedmaths.sun.ac.za/~htouchette/archive/talks/oldenburgldt1_2.pdf · Introduction to large deviation theory:

Example: Cauchy random variables*

[Exercise 3.6.1]

Sn =1

n

n∑i=1

Xi , p(Xi = x) =1

π

1

x2 + 1, x ∈ R

Generating function:

λ(k) =

0 k = 0∞ k 6= 0

No large deviations

I (s) = 0

P(Sn = s) has power-law tails (not exponential)

Hugo Touchette (QMUL) Large deviations August 2011 22 / 74

Sanov’s Theorem[Exercise 3.6.8]

n IID random variables: ω = ω1, ω2, . . . , ωn

Empirical frequencies:

Ln,j =1

n

n∑i=1

δωi ,j =# (ωi = j)

n

Empirical vector: Ln = (Ln,1, Ln,2, . . . , Ln,q)

SCGF:

λ(k) = ln

q∑j=1

pj ekj

Gartner-Ellis Theorem

LDP: P(Ln = µ) ≈ e−nD(µ)

Rate function: D(µ) = k(µ) · µ− λ(k(µ)) =

q∑j=1

µj lnµj

pj

Hugo Touchette (QMUL) Large deviations August 2011 23 / 74

Page 12: Introduction to large deviation theory: Theory, applications, simulationsappliedmaths.sun.ac.za/~htouchette/archive/talks/oldenburgldt1_2.pdf · Introduction to large deviation theory:

Markov processesDonsker and Varadhan (1975)

Markov chain:

ω = ω1, ω2, . . . , ωn, p(ω) = ρ(ω1)π(ω2|ω1) · · ·π(ωn|ωn−1)

Additive process:

Sn =1

n

n∑i=1

f (ωi )

Gatner-Ellis Theorem

Tilted transition matrix: πk(ωn|ωn−1) = π(ωn|ωn−1)ekf (ωn)

Dominant eigenvalue: ζ(πk)

SCGF:λ(k) = ln ζ(πk)

LDP:p(Sn = s) ≈ e−nI (s), I (s) = max

kks − λ(k)

Hugo Touchette (QMUL) Large deviations August 2011 24 / 74

Exercise: SCGF for Markov chains

Hugo Touchette (QMUL) Large deviations August 2011 25 / 74

Page 13: Introduction to large deviation theory: Theory, applications, simulationsappliedmaths.sun.ac.za/~htouchette/archive/talks/oldenburgldt1_2.pdf · Introduction to large deviation theory:

Example: Binary Markov chain*[Exercise 3.6.10]

Sample mean:

Rn =1

n

n∑i=1

ωi , ωi ∈ 0, 1

Transition matrix:

Π =

(π(0|0) π(0|1)π(1|0) π(1|1)

)=

(1− α αα 1− α

), α ∈ (0, 1)

Rate function:

I (r) ≈ I0(r)+2(1−2r)2ε+(2−32r2 +64r3−32r4)ε2, ε = 1/2−α

λ(k)

k

(b)

I(r)

r

(c)

-4 -2 0 2 4

-1

0

1

2

3

4

0 0.2 0.4 0.6 0.8 1

0

0.2

0.4

0.6 α = 0.75

α = 0.25

α = 0.5

α = 0.75

α = 0.25

10

(a)

α

α

1 − α

1 − α

α = 0.5

Hugo Touchette (QMUL) Large deviations August 2011 26 / 74

General properties

Most probable value = min and zero of I

Zero of I = Law of Large Numbers

Parabolic minimum = Central Limit Theorem

I(a)

a a

I(a)

p (a)n p (a)n

I (a) 6= maxkka− λ(k) when I is nonconvex

[Exercise 3.6.2]

Hugo Touchette (QMUL) Large deviations August 2011 27 / 74

Page 14: Introduction to large deviation theory: Theory, applications, simulationsappliedmaths.sun.ac.za/~htouchette/archive/talks/oldenburgldt1_2.pdf · Introduction to large deviation theory:

Summary

Large deviation principle

p(An = a) ≈ e−nI (a)

Valid for uncorrelated and correlated processes

Exponential term is dominant

Describes small and large fluctuations

Most probable value: min of I (a)

Law of large numbers: min (zero) of I (a)

Central limit theorem: Parabolic minimum of I (a)Methods for obtaining I (a):

I Gartner-Ellis TheoremI Contraction principle

Exercises

2.7.1 – 2.7.10

3.6.1 – 3.6.10

Hugo Touchette (QMUL) Large deviations August 2011 28 / 74

Continuous-time processesStochastic process: x(t)Path pdf:

p[x ]︸︷︷︸functional

= p(x(t)Tt=0) = Probability of path x(t)

Observable: AT [x ]Observable distribution:

p(AT = a) =

∫x(t):AT [x]=a

D[x ] p[x ]

Problems

Find p(AT = a)

Find most probable value of AT

Determine fluctuations around steady state

Scaling limits:I Long-time: T →∞I Low noise

Hugo Touchette (QMUL) Large deviations August 2011 30 / 74

Page 15: Introduction to large deviation theory: Theory, applications, simulationsappliedmaths.sun.ac.za/~htouchette/archive/talks/oldenburgldt1_2.pdf · Introduction to large deviation theory:

Low-noise limit of stochastic differential equations

Dynamical system:x(t) = F (x(t))

Perturbed dynamics:

X (t) = F (X (t)) +√ε ξ(t)︸ ︷︷ ︸

perturbation

Low-noise limit: ε→ 0

Gaussian white noise:

〈ξ(t)〉 = 0, 〈ξ(t)ξ(t ′)〉 = δ(t − t ′)

0 τ

x0 x(t)

0 τ

x0 X (t)ǫ

Hugo Touchette (QMUL) Large deviations August 2011 31 / 74

Exercise: Simulating SDEs

Hugo Touchette (QMUL) Large deviations August 2011 32 / 74

Page 16: Introduction to large deviation theory: Theory, applications, simulationsappliedmaths.sun.ac.za/~htouchette/archive/talks/oldenburgldt1_2.pdf · Introduction to large deviation theory:

LDP for the random paths

Functional LDP

p[x ] ≈ e−J[x]/ε, J[x ] =1

2

∫ T

0[x(t)− F (x(t))]2 dt︸ ︷︷ ︸

Lagrangian︸ ︷︷ ︸Action

Low-noise limit: ε→ 0

Derived in maths by Freidlin and Wentzell (1984)

Derived in physics by Onsager and Machlup (1953)

Zero of J[x ] = most probable dynamics = unperturbed dynamics:

x∗ = F (x∗)

Gaussian fluctuations around x∗(t)

Hugo Touchette (QMUL) Large deviations August 2011 33 / 74

Exercise: Derivation of the action

Hugo Touchette (QMUL) Large deviations August 2011 34 / 74

Page 17: Introduction to large deviation theory: Theory, applications, simulationsappliedmaths.sun.ac.za/~htouchette/archive/talks/oldenburgldt1_2.pdf · Introduction to large deviation theory:

Other LDPs by contraction

Transition probability:

p(x ,T |x0) ≈ e−V (x ,T |x0)/ε

V (x ,T |x0) = minx(t):x(0)=x0,x(T )=x

J[x ]

Stationary distribution:

p(x) ≈ e−V (x)/ε

V (x) = minx(t):x(−∞)=0,x(0)=x

J[x ]

Exit time:

τε ≈ eV ∗/ε in probability

V ∗ = minx∈∂D

mint≥0

V (x , t|xs)

0 τ

x0

x(t)x

0

0

x(t)x

...

...

−∞

D

∂D

t

Hugo Touchette (QMUL) Large deviations August 2011 35 / 74

Example: Ornstein-Uhlenbeck process[Exercise 3.6.14]

System:x(t) = −γx(t) +

√ε ξ(t)

Stationary distribution:

p(x) ≈ e−V (x)/ε, V (x) = minx(t):x(0)=x0,x(∞)=x

I [x ]

Euler-Lagrange equation:

δI [x∗] = 0 ⇐⇒ d

dt

∂L

∂x− ∂L

∂x= 0, L =

1

2(x + γx)2

Solution: V (x) = I [x∗] = γx2

General result

x = −U ′(x) +√ε ξ(t) =⇒ V (x) = 2U(x)

Hugo Touchette (QMUL) Large deviations August 2011 36 / 74

Page 18: Introduction to large deviation theory: Theory, applications, simulationsappliedmaths.sun.ac.za/~htouchette/archive/talks/oldenburgldt1_2.pdf · Introduction to large deviation theory:

Example: Noisy Van der Pol oscillator*[Exercise 3.6.17]

Equations of motion:

x = v

v = −x + v(α− x2 − v2) +√ε ξ(t)

I Coupled Langevin equationsI Bifurcation: Stable fixed point (α < 0) to stable limit cycle (α > 0)

0 10 20 30 40

1

0

1

2

t2 1 0 1 2

2

1

0

1

2

xv

Stationary distribution: p(r , θ) ≈ e−W (r)/ε

Hugo Touchette (QMUL) Large deviations August 2011 37 / 74

Noisy Van der Pol oscillator (cont’d)Solution:

W (r) = −αr2 +r4

2

21

01

2 2

1

0

1

2

010203040

21

01

2 2

1

0

1

2

05101520

21

01

2 2

1

0

1

2

2024

4 2 0 2 44

2

0

2

4

4 2 0 2 44

2

0

2

4

4 2 0 2 44

2

0

2

4

Hugo Touchette (QMUL) Large deviations August 2011 38 / 74

Page 19: Introduction to large deviation theory: Theory, applications, simulationsappliedmaths.sun.ac.za/~htouchette/archive/talks/oldenburgldt1_2.pdf · Introduction to large deviation theory:

Long-time limit of additive observablesStochastic process: x(t)Observable:

AT [x ] =1

T

∫ T

0f (x(t)) dt

Gartner-Ellis

SCGF:

λ(k) = limT→∞

1

Tln〈eTkAT 〉, 〈eTkAT 〉 =

∫D[x ] p[x ] eTkAT [x]

Rate function: I (a) = maxkka− λ(k)

Donsker-Varadhan

Generator: L

Tilted generator: Lk = L + kf

SCGF: λ(k) = ζ(Lk)

Hugo Touchette (QMUL) Large deviations August 2011 39 / 74

Example: Pulled Brownian particle[Exercise 3.6.18]

Langevin dynamics:

mx(t) = −αx︸︷︷︸drag

−k[x(t)− vt]︸ ︷︷ ︸spring force

+ ξ(t)︸︷︷︸noise

Work:

WT =1

T

∫ T

0F (t) v dt = ∆U + QT T

vtQ

U

Large deviation principle

SCGF:

λ(k) = limT→∞

1

Tln〈eTkWT 〉 = ck(1 + k), c = v2

Rate function:

I (w) = maxkkw − λ(k) =

(w − c)2

4c

Hugo Touchette (QMUL) Large deviations August 2011 40 / 74

Page 20: Introduction to large deviation theory: Theory, applications, simulationsappliedmaths.sun.ac.za/~htouchette/archive/talks/oldenburgldt1_2.pdf · Introduction to large deviation theory:

Summary

Path LDPs

p[x ] ≈ e−I [x]/ε p(AT [x ] = a) ≈ e−TI (a)

LDPs for time-evolving or steady-state processesDescribes most probable state (or trajectory)Describes fluctuations around most probable statesMost probable state = zero of I = min of IMost probable state given by variational principle

Connection with classical mechanics

δI [x∗] = 0 =⇒

Euler-Lagrange equationHamiltonian equationHamiltonian-Jacobi equation

Exercises

3.6.11 – 3.6.18

Hugo Touchette (QMUL) Large deviations August 2011 41 / 74

Other applications

Equilibrium statistical mechanics

Multifractals

Chaotic systems (thermodynamic formalism)

Disorded systemsI Random walks in random environmentsI Spin glassesI Quenched/annealed large deviations

Nonequilibrium systems

Interacting particle modelsI Zero-range processI Exclusion processI Current, density profileI Fluctuation relationsI Space/time large deviations

Hugo Touchette (QMUL) Large deviations August 2011 42 / 74

Page 21: Introduction to large deviation theory: Theory, applications, simulationsappliedmaths.sun.ac.za/~htouchette/archive/talks/oldenburgldt1_2.pdf · Introduction to large deviation theory:

Equilibrium many-particle systems

N particles

Microstate: ω = ω1, ω2, . . . , ωN

Space of one particle: ωi ∈ Λ

Space of N particles: ΛN = ΛN

Probability distribution on ΛN : P(ω)

Macrostate: MN(ω)

Probability distribution for MN :

P(MN = m) =∑

ω:MN(ω)=m

P(ω)

Problems

Find P(MN = m)

Find most probable values of MN (= equilibrium states)

Study fluctuations around most probable value

Consider thermodynamic limit N →∞Hugo Touchette (QMUL) Large deviations August 2011 43 / 74

Thermodynamic LDPs

Ensembles

Microcanonical

Pu(MN = m) ≈ e−NIu(m)

Canonical

Pβ(MN = m) ≈ e−NIβ(m)

Generalize and refine Einstein’s theory of fluctuations

Equilibrium and fluctuation properties given by rate function

Equilibrium states = min of I (m) = zero of I (m)

Equilibrium states given by variational principles

Entropy

P(UN = u) ≈ eNs(u), s(u) = minββu − ϕ(β)

Legendre transform of thermo = Legendre transform of LDT

Legendre transform is valid only if s(u) is concave

Hugo Touchette (QMUL) Large deviations August 2011 44 / 74

Page 22: Introduction to large deviation theory: Theory, applications, simulationsappliedmaths.sun.ac.za/~htouchette/archive/talks/oldenburgldt1_2.pdf · Introduction to large deviation theory:

Problem

Sequence of RVs: ω = X1,X2, . . . ,Xn

Observable (RV): Sn(ω)

Probability density:

p(Sn = s) =P(Sn ∈ [s, s + ∆s])

∆s=

P(Sn ∈ ∆s)

∆s

Goals

1 Sample Sn

2 Estimate p(Sn = s)

3 Test existence of LDP

4 Estimate rate function I (s)

n→∞∆s → 0

Continuous-time problem: Time discretization

X (t)τt=0 −→ Xini=1, Xi = X (i∆t)

Hugo Touchette (QMUL) Large deviations August 2011 46 / 74

Direct estimators

Configuration sample:

ω(1), ω(2), . . . , ω(L)I L copies or realizations of ωI Prior pdf: p(ω)

Observable sample:

s(1), s(2), . . . , s(L), s(j) = Sn(ω(j))

Estimator of p(Sn = s):

p(s) =P(∆s)

∆s=

1

L∆s

L∑j=1

1∆s (s(j)).

I Empirical pdfI Unbiased estimator [Exercise 4.7.3]

Hugo Touchette (QMUL) Large deviations August 2011 47 / 74

Page 23: Introduction to large deviation theory: Theory, applications, simulationsappliedmaths.sun.ac.za/~htouchette/archive/talks/oldenburgldt1_2.pdf · Introduction to large deviation theory:

Exercise: What is an empirical pdf?

Hugo Touchette (QMUL) Large deviations August 2011 48 / 74

Direct estimators (cont’d)Estimator of rate function:

I (s) = −1

nln p(s)

1 Estimate p(s) for fixed L and n

2 Estimate I (s) for fixed L and n

3 Repeat for increasing values of n

-1 0 1 2 30

1

2

3

4

s

p SnHsL

— n=5— n=25— n=50— n=100

-1 0 1 2 30.0

0.5

1.0

1.5

2.0

2.5

3.0

s

I nHsL

— n=1— n=2— n=3— n=10

Use large enough L to get good statistics

Hugo Touchette (QMUL) Large deviations August 2011 49 / 74

Page 24: Introduction to large deviation theory: Theory, applications, simulationsappliedmaths.sun.ac.za/~htouchette/archive/talks/oldenburgldt1_2.pdf · Introduction to large deviation theory:

Problem with direct sampling

p(Sn = s) ≈ e−nI (s)

Event Sn = s (or Sn ∈ ∆s) is exponentially rare

Choose L ∼ enI (s) to see this event

L exponential with n

Solution: Importance sampling

Sample ω with another pdf q(ω)

Choose q(ω) to make Sn = s probable

Correct estimation of p(Sn = s) according to q(ω) chosen

Hugo Touchette (QMUL) Large deviations August 2011 50 / 74

Example: Gaussian sample mean

[Exercise 4.7.2]

Sn =1

n

n∑i=1

Xi , p(Xi = x) =1√

2πσ2e−(x−µ)2/(2σ2)

1 Generate x1, x2, . . . , xn ∼ Gaussian variates

2 Compute Sn

3 Repeat L times to obtain sample s(1), s(2), . . . , s(L)4 Compute p(s) of sample

5 Compute I (s)

6 Repeat for different n and L

Hugo Touchette (QMUL) Large deviations August 2011 51 / 74

Page 25: Introduction to large deviation theory: Theory, applications, simulationsappliedmaths.sun.ac.za/~htouchette/archive/talks/oldenburgldt1_2.pdf · Introduction to large deviation theory:

Gaussian sample mean (cont’d)

-1 0 1 2 3

0.0

0.5

1.0

1.5

2.0

s

I n,L

HsLn = 10

ê L = 100

ê L = 1000

ê L = 10000

ê L = 100000

-1 0 1 2 3

0.0

0.5

1.0

1.5

2.0

s

I n,L

HsL

L = 10 000

ê n = 10

ê n = 2

ê n = 20

ê n = 5

Increase L to sample tails

Increase L to smooth results (smaller error bars)[Exercise 4.7.4] [Exercise 4.7.5]

Increase L for increasing n

Choose L ∼ en

Hugo Touchette (QMUL) Large deviations August 2011 52 / 74

Importance sampling

Original sampling pdf: p(ω)

New sampling pdf: q(ω)

New estimator

q(s) =1

L∆s

L∑j=1

1∆s

(Sn(ω(j))

)R(ω(j))

Importance sampling or likelihood ratio: R(ω) =p(ω)

q(ω)

q(s) is unbiased:

〈q(s)〉q = 〈p(s)〉p = p(Sn = s)

q(s) may have smaller variance than p(s)

Choose q such that varq(q) < varp(p)

Hugo Touchette (QMUL) Large deviations August 2011 53 / 74

Page 26: Introduction to large deviation theory: Theory, applications, simulationsappliedmaths.sun.ac.za/~htouchette/archive/talks/oldenburgldt1_2.pdf · Introduction to large deviation theory:

Exercise: Derivation of new estimator

Hugo Touchette (QMUL) Large deviations August 2011 54 / 74

Exponential change of measure

Original sampling pdf: p(ω)

Exponentially tilted pdf:

pk(ω) =enkSn(ω)

Wn(k)p(ω), k ∈ R

Generating function:

Wn(k) = 〈enkSn〉p =

∫enkSn(ω) p(ω) dω

Likelihood ratio:

R(ω) =p(ω)

pk(ω)= e−nkSn(ω) Wn(k) ≈ e−n[kSn(ω)−λ(k)]

pk(ω) is efficient

Zero-variance estimator as n→∞How to choose k?

Hugo Touchette (QMUL) Large deviations August 2011 55 / 74

Page 27: Introduction to large deviation theory: Theory, applications, simulationsappliedmaths.sun.ac.za/~htouchette/archive/talks/oldenburgldt1_2.pdf · Introduction to large deviation theory:

Example: Gaussian sample mean[Exercise 4.7.9]

Sn =1

n

n∑i=1

Xi , p(ω) =m∏

i=1

p(Xi ), p(Xi = x) =1√

2πσ2e−(x−µ)2/(2σ2)

Choose k ∈ RGenerate variate x1, x2, . . . , xn according to tilted pdf:

pk(xi ) =ekxi p(xi )

W (k)=

e−(xi−µ−σ2k)2/(2σ2)

√2πσ2

Compute Sn

Repeat L times to obtain sample s(1), s(2), . . . , s(L)Compute q(s)

Compute I (s)

Repeat for k ∈ [kmin, kmax]

Hugo Touchette (QMUL) Large deviations August 2011 56 / 74

Gaussian sample mean (cont’d)

èè

è

èè

è

èèèèè

è

èèèèèèè

è

èèèè

èèèè

èèèèèèè

èèèèèèè

èèèèè

èè

èèè

èè

èèè

è

è

è

è

è

è

è

è

èè

è

èèè

èè

èèèèèèè

èèèèèèèèèèèèèèèèèèè

èèèè

èèèèèèèèèèè

è

è

è

è

èè

-2 -1 0 1 2 3 4

0

1

2

3

4

s

I n,L

HsL

L = 100ê n = 10ê n = 5 è

è

è

è

è

è

è

èè

èèè

èèèèèèèè

èèèèèè

èèèèèèèèèèèè

èèè

èèèèèèèèè

è

èèè

è

è

èè

è

è

è

è

è

è

è

è

è

è

è

è

èèèèèèèèèèèèèèèèèèèèèèèèèèèèè

èèèèèèèèèèèèè

è

è

è

è

è

è

è

è

-2 -1 0 1 2 3 40

1

2

3

4

s

I n,L

HsL

L = 1000ê n = 10ê n = 5

pk is efficient to recover I (s) at s = λ′(k)

Scan k ∈ [kmin, kmax] to obtain desired range s ∈ [smin, smax]

Non-parametric: choose k such that s = λ′(k) to obtain I at s

Hugo Touchette (QMUL) Large deviations August 2011 57 / 74

Page 28: Introduction to large deviation theory: Theory, applications, simulationsappliedmaths.sun.ac.za/~htouchette/archive/talks/oldenburgldt1_2.pdf · Introduction to large deviation theory:

Gaussian sample mean (cont’d)

SCGF:

λ(k) = µk +1

2σ2k2

Concentration point:

λ′(k) = µ+ σ2k = s ⇒ k(s) =s − µσ2

Explicit form of tilted pdf:

pk(s)(xi ) =e−(xi−s)2/(2σ2)

√2πσ2

Importance sampling interpretation

Sn = s large deviation under p(ω)↓

Sn = s typical event under pk(s)(ω)

Hugo Touchette (QMUL) Large deviations August 2011 58 / 74

Metropolis (Monte Carlo) sampling

Problem

Tilted pdf pk involves Wn(k)

Tilted pdf pk assumes knowledge of SCGF λ(k)

Knowledge of SCGF ⇒ I (s)

Solutions1 Use other types of estimators2 Use Metropolis (Monte Carlo) sampling with pk

I Based on pk(ω)/pk(ω′)I Free of Wn(k) and λ(k)

[Exercise 4.7.11]

Hugo Touchette (QMUL) Large deviations August 2011 59 / 74

Page 29: Introduction to large deviation theory: Theory, applications, simulationsappliedmaths.sun.ac.za/~htouchette/archive/talks/oldenburgldt1_2.pdf · Introduction to large deviation theory:

Exercise: Illustration of Metropolis sampling

Hugo Touchette (QMUL) Large deviations August 2011 60 / 74

Application to SDEs

Path pdf form

Original path pdf: p[x ]

Tilted pdf:

pk [x ] =eTkST [x] p[x ]

WT (k)≈ e−TIk [x], WT (k) = 〈eTkST 〉p

Transition probability form

Original transition matrix: Π(∆t)

Tilted transition matrix:

Πk(∆t) =ek∆tx ′Π(∆t)

eλ(k)∆t= e [kx ′+G−λ(k)]∆t

Generator form

Original generator: G

Tilted generator: Gk = G + kx ′ − λ(k)

Hugo Touchette (QMUL) Large deviations August 2011 61 / 74

Page 30: Introduction to large deviation theory: Theory, applications, simulationsappliedmaths.sun.ac.za/~htouchette/archive/talks/oldenburgldt1_2.pdf · Introduction to large deviation theory:

Example: Gaussian additive processSDE: x(t) = ξ(t)Pure Brownian motion without driftObservable:

DT [x ] =1

T

∫ T

0x(t) dt =

x(T )

T

Intuitive observations

Typical state: DT = 0

Modified process with typical state DT = d :

x(t) = d + ξ(t)

Effective dynamics for large deviation

Explicit result:

Ik(d)[x ] = I [x ]− dDT [x ] +d2

2=

1

2T

∫ T

0(x − d)2 dt,

Hugo Touchette (QMUL) Large deviations August 2011 62 / 74

Exercise: Simulation of SDEs

Hugo Touchette (QMUL) Large deviations August 2011 63 / 74

Page 31: Introduction to large deviation theory: Theory, applications, simulationsappliedmaths.sun.ac.za/~htouchette/archive/talks/oldenburgldt1_2.pdf · Introduction to large deviation theory:

Summary

Direct sampling of Sn with p(ω) inefficient

Requires exponential sample size L

Change sampling distribution to q(ω)

Make rare event under p more probable under q

Possible change of measure: Exponential measure

Exponential sampling efficient

Subexponential sample size

Structure of exponential measure = structure of LDT

Combine exponential sampling with Metropolis sampling

Other methods?

Hugo Touchette (QMUL) Large deviations August 2011 64 / 74

Sample mean method

Sample λ(k) instead of p(Sn = s)

Sampling with p(ω):

Sn → λ′(0) in probability

Sampling with pk(ω):

Sn → λ′(k) in probability

Estimator of Sn:

s(k) =1

L

L∑j=1

Sn(ω(j))

Estimator of λ(k):

λ(k) =

∫ k

0s(k ′) dk ′

Compute I (s) by Legendre transformNo need to estimate p

Hugo Touchette (QMUL) Large deviations August 2011 66 / 74

Page 32: Introduction to large deviation theory: Theory, applications, simulationsappliedmaths.sun.ac.za/~htouchette/archive/talks/oldenburgldt1_2.pdf · Introduction to large deviation theory:

Example: Gaussian sample mean

[Exercise 5.4.1]

Sn =1

n

n∑i=1

Xi , p(ω) =m∏

i=1

p(Xi ), p(Xi = x) =1√

2πσ2e−(x−µ)2/(2σ2)

Choose k ∈ RGenerate variates x1, x2, . . . , xL according to tilted pdf:

pk(xi ) =ekxi p(xi )

W (k)=

e−(xi−µ−σ2k)2/(2σ2)

√2πσ2

Compute s for L large

Repeat for different k

Integrate results to obtain λ(k)

Obtain I by Legendre transform

Hugo Touchette (QMUL) Large deviations August 2011 67 / 74

Gaussian sample mean (cont’d)

-6 -4 -2 0 2 4 6

-4

-2

0

2

4

6

k

s LHkL

— L = 5— L = 50— L = 100

-6 -4 -2 0 2 4 60

5

10

15

20

k

Λ` LHsL

-2 -1 0 1 2 3 40

1

2

3

4

s

I LHsL

— L = 5— L = 50— L = 100

No empirical pdf calculatedNon-convex artefacts for small L (λ(k) not convex)

Hugo Touchette (QMUL) Large deviations August 2011 68 / 74

Page 33: Introduction to large deviation theory: Theory, applications, simulationsappliedmaths.sun.ac.za/~htouchette/archive/talks/oldenburgldt1_2.pdf · Introduction to large deviation theory:

Empirical generating functionIID sample mean: λ(k) = E [ekX ]Estimator for λ(k):

λ(k) = ln1

L

L∑j=1

ekX (j)

Alternative estimator:

s(k) = λ′(k) =

∑Lj=1 X (j) ekX (j)∑L

j=1 ekX (j)

Markov chain:

X1 + · · ·+ Xb︸ ︷︷ ︸Y1

+ Xb+1 + · · ·+ X2b︸ ︷︷ ︸Y2

+ · · ·+ Xn−b+1 + · · ·+ Xn︸ ︷︷ ︸Ym

Markov estimator:

λ(k) =1

bln

1

L

L∑j=1

ekY (j), m =

n

b

Hugo Touchette (QMUL) Large deviations August 2011 69 / 74

Example: Gaussian sample mean

[Exercise 5.4.2]

Sn =1

n

n∑i=1

Xi , p(ω) =m∏

i=1

p(Xi ), p(Xi = x) =1√

2πσ2e−(x−µ)2/(2σ2)

Choose k ∈ RGenerate variates x1, x2, . . . , xL according to original pdf p(xi )

Compute λ(k) for L large

Repeat for different k

Obtain I by Legendre transform

Repeat for larger L

Hugo Touchette (QMUL) Large deviations August 2011 70 / 74

Page 34: Introduction to large deviation theory: Theory, applications, simulationsappliedmaths.sun.ac.za/~htouchette/archive/talks/oldenburgldt1_2.pdf · Introduction to large deviation theory:

Gaussian sample mean (cont’d)

-6 -4 -2 0 2 4 60

5

10

15

20

k

Λ` LHkL

— L = 10— L = 100— L = 1000— L = 10000

-6 -4 -2 0 2 4 6

-4

-2

0

2

4

6

k

sHkL

-2 -1 0 1 2 3 40

1

2

3

4

s

I LHsL

— L = 10— L = 100— L = 1000— L = 10000

Efficient for RVs with bounded support [Exercise 5.4.2]Less efficient for unbounded RVs

Hugo Touchette (QMUL) Large deviations August 2011 71 / 74

Application for SDEs*[Exercise 5.4.3]

Hugo Touchette (QMUL) Large deviations August 2011 72 / 74

Page 35: Introduction to large deviation theory: Theory, applications, simulationsappliedmaths.sun.ac.za/~htouchette/archive/talks/oldenburgldt1_2.pdf · Introduction to large deviation theory:

Other methods

Optimal path methods

p(x ,T |x0) ≈ e−V (x ,T |x0)/ε

V (x ,T |x0) = minx(t):x(0)=x0,x(T )=x

J[x ] D

∂D

t

I [Exercise 5.4.4 – 5.4.6]

Transition path methodI Monte Carlo for pathsI See Christoph Dellago

Splitting / cloning methodsI See notes for references

Eigenvalue method

λ(k) = dominant eigenvalue of tilted generator Gk

I See notes for references

Hugo Touchette (QMUL) Large deviations August 2011 73 / 74

Reading

H. TouchetteThe large deviation approach to statistical mechanicsPhysics Reports 478, 1-69, 2009arxiv:0804.0327

Y. OonoLarge deviation and statistical physicsProg. Theoret. Phys. Suppl. 99, 165-205, 1989

A. Dembo and O. ZeitouniLarge Deviations Techniques and ApplicationsSpringer, New York, 1998

J. A. BucklewIntroduction to Rare Event SimulationSpringer, New York, 2004

Hugo Touchette (QMUL) Large deviations August 2011 74 / 74