
A Dirichlet Form approach to MCMC Optimal Scaling

Giacomo Zanella, Wilfrid S. Kendall, and Mylène Bédard.

g.zanella@warwick.ac.uk, w.s.kendall@warwick.ac.uk, mylene.bedard@umontreal.ca

Supported by EPSRC Research Grants EP/D002060, EP/K013939

LMS Durham Symposium on Stochastic Analysis

12th July 2017

Introduction

MCMC and optimal scaling

Dirichlet forms and optimal scaling

Results and methods of proofs

Conclusion


Introduction to Markov chain Monte Carlo (MCMC)

General reference: Brooks et al. (2011) MCMC Handbook.

Suppose x represents an unknown (and therefore random!) parameter, and y represents data depending on the unknown parameter, with joint probability density p(x,y). Then

    Conditional density  p(x|y) = p(x,y) / Z,

where the norming constant Z can be hard to compute. Build a Markov chain with p(x|y) as its equilibrium (no need to know Z), then simulate the Markov chain till approximate equilibrium.

Example: MCMC for Anglo-Saxon statistics

Some historians conjecture that Anglo-Saxon placenames cluster by dissimilar names. Zanella (2015, 2016) uses MCMC: the data provide some support, resulting in useful clustering.

MCMC and optimal scaling

MCMC idea

Goal: estimate E = E_π[h(X)].

Method: simulate an ergodic Markov chain with stationary distribution π; use the empirical estimate

    E_n = (1/n) ∑_{m=n₀+1}^{n₀+n} h(X_m).

(Much easier to apply theory if the chain is reversible.)

Theory: E_n → E almost surely.
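In code, this is just an average along the chain after a burn-in of n₀ steps. A minimal sketch (the AR(1) chain here is an assumed stand-in, not from the slides; it is reversible with stationary distribution N(0,1)):

    # Ergodic-average estimator E_n with burn-in n0.
    # Assumed example chain: AR(1), reversible, stationary N(0,1).
    import numpy as np
    rng = np.random.default_rng(0)

    def empirical_estimate(h, n0=1000, n=100_000, rho=0.9):
        x, total = 0.0, 0.0
        for m in range(n0 + n):
            x = rho * x + np.sqrt(1 - rho**2) * rng.normal()  # one chain step
            if m >= n0:
                total += h(x)
        return total / n

    # E_pi[X^2] = 1 for the N(0,1) stationary distribution:
    print(empirical_estimate(lambda x: x**2))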

Varieties of MH-MCMC

Here is the famous Metropolis–Hastings recipe for drawing from a distribution with density f:

Propose Y using conditional density q(y|x);
Accept/Reject the move from X to Y, based on the ratio f(Y) q(X|Y) / f(X) q(Y|X).

Options:

1. Independence sampler: proposal q(y|x) = q(y) doesn't depend on x;
2. Random walk (RW MH-MCMC): proposal q(y|x) = q(y − x) behaves as a random walk;
3. MALA MH-MCMC: proposal q(y|x) = q(y − x − λ grad log f) drifts towards high target density f.

We shall focus on RW MH-MCMC with Gaussian proposals.

Gaussian RW MH-MCMC

Simple Python code for Gaussian RW MH-MCMC, using normal and exponential from Numpy:

Propose multivariate Gaussian step;
Test whether to accept the proposal by comparing an exponential random variable with the log MH ratio;
Implement the step if accepted (vector addition).

    from numpy.random import normal, exponential

    while not mcmc.stopped():
        z = normal(0, tau, size=mcmc.dim)    # propose Gaussian step of scale tau
        # accept with probability min(1, exp(phi(x) - phi(x + z))),
        # since P(Exp(1) > t) = min(1, exp(-t)):
        if exponential() > mcmc.phi(mcmc.x + z) - mcmc.phi(mcmc.x):
            mcmc.x += z                      # accepted: take the step
        mcmc.record_result()

What is the best choice of scale / standard deviation tau?
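The mcmc object above is scaffolding left implicit on the slide. A hypothetical minimal version (an assumption, matching the exp(−x⁴) example on the next slide) that makes the loop runnable:

    # Hypothetical scaffolding for the loop above (not from the slides):
    # target with i.i.d. coordinates, each marginal proportional to exp(-x^4).
    import numpy as np
    from numpy.random import normal, exponential

    class MCMC:
        def __init__(self, dim, n_steps):
            self.dim, self.n_steps = dim, n_steps
            self.x = np.zeros(dim)           # arbitrary starting point
            self.samples = []
        def stopped(self):
            return len(self.samples) >= self.n_steps
        def phi(self, x):
            return np.sum(x**4)              # -log target density, up to a constant
        def record_result(self):
            self.samples.append(self.x.copy())

    mcmc, tau = MCMC(dim=10, n_steps=10_000), 0.1
    while not mcmc.stopped():
        z = normal(0, tau, size=mcmc.dim)
        if exponential() > mcmc.phi(mcmc.x + z) - mcmc.phi(mcmc.x):
            mcmc.x += z
        mcmc.record_result()
    print(np.mean([s[0]**2 for s in mcmc.samples]))   # e.g. estimate E[X_1^2]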

RW MH-MCMC with Gaussian proposals (smooth target, marginal ∝ exp(−x⁴))

Target is given by 10 i.i.d. coordinates. Scale parameter for the proposal:

τ = 1 is too large! Acceptance ratio 1.7%.
τ = 0.1 is better. Acceptance ratio 76.5%.
τ = 0.01 is too small. Acceptance ratio 98.5%.
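These acceptance ratios are easy to reproduce; a sketch under the stated setup (d = 10, marginals ∝ exp(−x⁴); exact figures will vary a little with run length, starting point and seed):

    # Empirical acceptance rates for several proposal scales tau.
    import numpy as np
    rng = np.random.default_rng(0)

    def acceptance_rate(tau, dim=10, n_steps=20_000):
        phi = lambda x: np.sum(x**4)          # -log target, up to a constant
        x, accepts = np.zeros(dim), 0
        for _ in range(n_steps):
            z = rng.normal(0, tau, size=dim)
            if rng.exponential() > phi(x + z) - phi(x):
                x, accepts = x + z, accepts + 1
        return accepts / n_steps

    for tau in (1.0, 0.1, 0.01):
        print(f"tau = {tau:<5}: acceptance ≈ {acceptance_rate(tau):.1%}")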

MCMC Optimal Scaling: classic result (I)

RW MH-MCMC on (R^d, π^{⊗d}), with π(dx_i) = e^{−φ(x_i)} dx_i and MH acceptance rule A(d) = 0 or 1:

    X^{(d)}_0 = (X_1, . . . , X_d),                          X_i iid ∼ π,
    X^{(d)}_1 = (X_1 + A(d) W_1, . . . , X_d + A(d) W_d),    W_i iid ∼ N(0, σ_d²).

Questions: (1) complexity as d ↑ ∞? (2) optimal σ_d?

Theorem (Roberts, Gelman and Gilks, 1997)
Given σ_d² = σ²/d, Lipschitz φ′, and finite E_π[(φ′)⁸], E_π[(φ′′)⁴], the first coordinate, speeded up by a factor d, converges: {X^{(d)}_{⌊td⌋,1}}_t ⇒ Z, where

    dZ_t = s(σ)^{1/2} dB_t − (1/2) s(σ) φ′(Z_t) dt.

Answers: (1) mix in O(d) steps; (2) σ_max = arg max_σ s(σ).

MCMC Optimal Scaling: classic result (II)

Optimization: maximize s(σ)! Given I = E_π[φ′(X)²] and normal CDF Φ,

    s(σ) = σ² · 2Φ(−σ√I/2) = σ² A(σ) = (4/I) (Φ⁻¹(A(σ)/2))² A(σ),

where A(σ) is the asymptotic acceptance rate. So s(σ) is maximized by choosing the asymptotic acceptance rate

    A(σ_max) = arg max_{A∈[0,1]} (Φ⁻¹(A/2))² A ≈ 0.234.

Strengths:
Establishes complexity as d → ∞;
Practical information on how to tune the proposal;
Does not depend on φ (CLT-type universality).

Some weaknesses that we will address (there are others):
Convergence of marginal rather than joint distribution;
Strong regularity assumptions: Lipschitz φ′, finite E_π[(φ′)⁸], E_π[(φ′′)⁴].
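A quick numerical check of the 0.234 figure (a sketch; uses scipy's standard normal quantile function):

    # Numerically maximize (Phi^{-1}(A/2))^2 * A over acceptance rates A in (0,1).
    import numpy as np
    from scipy.stats import norm

    A = np.linspace(1e-4, 1 - 1e-4, 200_000)
    s = norm.ppf(A / 2)**2 * A          # proportional to s(sigma), by the identity above
    print(f"arg max ≈ {A[np.argmax(s)]:.3f}")   # prints approximately 0.234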

MCMC Optimal Scaling: classic result (III)

There is a wide range of extensions: for example,

Langevin / MALA, for which the magic acceptance probability is 0.574 (Roberts and Rosenthal, 1998);
Non-identically distributed independent target coordinates (Bédard, 2007);
Gibbs random fields (Breyer and Roberts, 2000);
Infinite-dimensional random fields (Mattingly, Pillai and Stuart, 2012);
Markov chains on a hypercube (Roberts, 1998);
Adaptive MCMC: adjust online to optimize acceptance probability (Andrieu and Thoms, 2008; Rosenthal, 2011).

All these build on the s.d.e. approach of Roberts, Gelman and Gilks (1997); hence regularity conditions tend to be severe (but see Durmus et al., 2016).

Dirichlet forms and optimal scaling

Dirichlet forms and MCMC 1

Definition of Dirichlet form
A (symmetric) Dirichlet form E on a Hilbert space H is a closed bilinear function E(u,v), defined / finite for all u, v ∈ D ⊆ H, which satisfies:

1. D is a dense linear subspace of H;
2. E(u,v) = E(v,u) for u, v ∈ D, so E is symmetric;
3. E(u) = E(u,u) ≥ 0 for u ∈ D;
4. D is a Hilbert space under the ("Sobolev") inner product 〈u,v〉 + E(u,v);
5. If u ∈ D then u* = (u ∧ 1) ∨ 0 ∈ D; moreover E(u*, u*) ≤ E(u,u).

Relate to a Markov process if (quasi-)regular. Regular Dirichlet form, for locally compact Polish E: D ∩ C₀(E) is E₁^{1/2}-dense in D and uniformly dense in C₀(E).
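For orientation, a standard concrete instance (an illustration from the general theory, not from the slide): the gradient form associated with Brownian motion on R^n, sketched in LaTeX.

    % Classical gradient Dirichlet form on H = L^2(R^n, dx):
    \[
      \mathcal{E}(u,v) \;=\; \tfrac12 \int_{\mathbb{R}^n} \nabla u \cdot \nabla v \,\mathrm{d}x,
      \qquad \mathcal{D} \;=\; H^1(\mathbb{R}^n).
    \]
    % Property 5 holds because the unit contraction u^* = (u \wedge 1) \vee 0
    % satisfies |\nabla u^*| \le |\nabla u| almost everywhere, whence
    % \mathcal{E}(u^*,u^*) \le \mathcal{E}(u,u).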

Dirichlet forms and MCMC 2

Two examples:

1. Dirichlet form obtained from the (re-scaled) RW MH-MCMC:

       E_d(h) = (d/2) E[(h(X^{(d)}_1) − h(X^{(d)}_0))²].

   (E_d can be viewed as the Dirichlet form arising from speeding up the RW MH-MCMC by rate d.)

2. Heuristic "infinite-dimensional diffusion" limit of this form under scaling:

       E_∞(h) = (s(σ)/2) E_{π^{⊗∞}}[|∇h|²].

   Under mild conditions this is: closable ✓, Dirichlet ✓, quasi-regular ✓.

Can we deduce that the RW MH-MCMC scales to look like the "infinite-dimensional diffusion", by showing that E_d "converges" to E_∞?
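A Monte Carlo sketch of E_d for a simple test function (assumptions: h(x) = x₁ and the exp(−x⁴) marginal from earlier; a long-run chain average stands in for the stationary expectation):

    # Estimate E_d(h) = (d/2) E[(h(X1) - h(X0))^2] along a near-stationary run.
    import numpy as np
    rng = np.random.default_rng(1)

    def dirichlet_form(h, d, sigma=1.0, n_steps=50_000, burn=5_000):
        phi = lambda x: np.sum(x**4)          # -log target, up to a constant
        x, total = np.zeros(d), 0.0
        for step in range(burn + n_steps):
            w = rng.normal(0, sigma / np.sqrt(d), size=d)   # proposal variance sigma^2/d
            y = x + w if rng.exponential() > phi(x + w) - phi(x) else x
            if step >= burn:
                total += (h(y) - h(x))**2
            x = y
        return 0.5 * d * total / n_steps

    for d in (5, 20, 80):
        print(d, dirichlet_form(lambda x: x[0], d))   # roughly stable in d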

Useful modes of convergence for Dirichlet forms

1. Gamma-convergence: E_n "Γ-converges" to E_∞ if
   (Γ1) E_∞(h) ≤ lim inf_n E_n(h_n) whenever h_n → h ∈ H;
   (Γ2) for every h ∈ H there are h_n → h ∈ H such that E_∞(h) ≥ lim sup_n E_n(h_n).

2. Mosco (1994) introduces stronger conditions:
   (M1) E_∞(h) ≤ lim inf_n E_n(h_n) whenever h_n → h weakly in H;
   (M2) for every h ∈ H there are h_n → h strongly in H such that E_∞(h) ≥ lim sup_n E_n(h_n).

3. Mosco (1994, Theorem 2.4.1, Corollary 2.6.1): conditions (M1) and (M2) imply convergence of the associated resolvent operators, and indeed of the associated semigroups.

4. Sun (1998) gives further conditions which imply weak convergence of the associated processes: these conditions are implied by the existence of a finite constant C such that E_n(h) ≤ C(|h|² + E(h)) for all h ∈ H.

Results and methods of proofs

Results

Theorem (Zanella, Bédard and WSK, 2016)
Consider the Gaussian RW MH-MCMC based on proposal variance σ²/d with target π^{⊗d}, where dπ = f dx = e^{−φ} dx. Suppose

    I = ∫_{−∞}^{∞} |φ′|² f dx < ∞   (finite Fisher information),

and |φ′(x + v) − φ′(x)| < κ max{|v|^γ, |v|^α} for some κ > 0, 0 < γ < 1, and α > 1. Let E_d be the corresponding Dirichlet form, scaled as above. Then E_d Mosco-converges to

    E[1 ∧ exp(N(−σ²I/2, σ²I))] · E_∞,

so the corresponding L² semigroups also converge.

Corollary
Suppose in the above that φ′ is globally Lipschitz. Then the correspondingly scaled processes exhibit weak convergence.
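The constant E[1 ∧ exp(N(−σ²I/2, σ²I))] in front of E_∞ is the asymptotic acceptance rate; a quick sketch checking that it matches the closed form 2Φ(−σ√I/2) used earlier:

    # Check E[1 ^ exp(N(-s/2, s))] = 2*Phi(-sqrt(s)/2), with s = sigma^2 * I.
    import numpy as np
    from scipy.stats import norm
    rng = np.random.default_rng(3)

    s = 1.7                                        # any value of sigma^2 * I
    N = rng.normal(-s / 2, np.sqrt(s), size=2_000_000)
    print(np.minimum(1.0, np.exp(N)).mean())       # Monte Carlo estimate
    print(2 * norm.cdf(-np.sqrt(s) / 2))           # closed form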

Methods of proof 1: a CLT result

Lemma (A conditional CLT)
Under the conditions of the Corollary, almost surely (in x with invariant measure π^{⊗∞}) the log Metropolis–Hastings ratio converges weakly (in W) as follows as d → ∞:

    log ∏_{i=1}^{d} [f(x_i + σW_i/√d) / f(x_i)] = ∑_{i=1}^{d} (φ(x_i) − φ(x_i + σW_i/√d)) ⇒ N(−σ²I/2, σ²I).

We may use this to deduce the asymptotic acceptance rate of the RW MH-MCMC sampler.

Key idea for CLT

Use exact Taylor expansion techniques:

    ∑_{i=1}^{d} (φ(x_i) − φ(x_i + σW_i/√d))
        = −∑_{i=1}^{d} φ′(x_i) σW_i/√d − ∑_{i=1}^{d} (σW_i/√d) ∫₀¹ (φ′(x_i + uσW_i/√d) − φ′(x_i)) du.

Condition implicitly on x for the first 2.5 steps.

1. The first summand converges weakly to N(0, σ²I).
2. Decompose the variance of the second summand to deduce
   Var[∑_{i=1}^{d} (σW_i/√d) ∫₀¹ (φ′(x_i + uσW_i/√d) − φ′(x_i)) du] → 0.
3. Use Hoeffding's inequality then absolute expectations:
   E[−∑_{i=1}^{d} (σW_i/√d) ∫₀¹ (φ′(x_i + uσW_i/√d) − φ′(x_i)) du] → −σ²I/2.
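A numerical sanity check of the Lemma (a sketch; to keep the sampling of x exact it swaps in the standard Gaussian target φ(t) = t²/2, for which I = E[φ′(X)²] = E[X²] = 1):

    # Simulate the log MH ratio for one fixed x (the CLT holds a.s. in x)
    # and compare its mean and variance with -sigma^2*I/2 and sigma^2*I.
    import numpy as np
    rng = np.random.default_rng(2)

    d, sigma, n_rep = 2_000, 1.3, 2_000
    x = rng.normal(size=d)                    # one fixed draw x ~ pi (tensor d)
    W = rng.normal(size=(n_rep, d))           # independent proposal noise
    phi = lambda t: 0.5 * t**2
    log_mh = np.sum(phi(x) - phi(x + sigma * W / np.sqrt(d)), axis=1)

    print("empirical mean, var:", log_mh.mean(), log_mh.var())
    print("predicted mean, var:", -0.5 * sigma**2, sigma**2)   # I = 1 here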

Methods of proof 2: establishing condition (M2)

For every h ∈ L²(π^{⊗∞}), find h_n → h (strongly) in L²(π^{⊗∞}) such that E_∞(h) ≥ lim sup_n E_n(h_n).

1. Sufficient to consider the case E_∞(h) < ∞.
2. Find a sequence of smooth cylinder functions h_n with "compact cylindrical support", such that |E_∞(h) − E_∞(h_n)| ≤ 1/n.
3. Using smoothness etc., E_m(h_n) → E_∞(h_n) as m → ∞.
4. Subsequences . . . .

Methods of proof 3: establishing condition (M1)

If h_n → h weakly in L²(π^{⊗∞}), show E_∞(h) ≤ lim inf_n E_n(h_n). Detailed stochastic analysis involves:

1. Set Ψ_n(h) = √(n/2) (h(X^{(n)}_0) − h(X^{(n)}_1)), so that E_n(h) = E[Ψ_n(h)²].
2. Integrate against the test function ξ(X_{1:N}, W_{1:N}) I(U < a(X_{1:N}, W_{1:N})) for ξ smooth with compact support, U a Uniform(0,1) random variable. Apply Cauchy–Schwarz.
3. Use integration by parts, careful analysis, and the conditions on φ′.

Doing even better

Durmus et al. (2016) introduce L^p mean differentiability: there is φ̇ (a surrogate for the derivative φ′) such that, for some p > 2 and some α > 0,

    φ(X + u) − φ(X) = (φ̇(X) + R(X, u)) u,    with E[|R(X, u)|^p]^{1/p} = o(|u|^α),

and also I = E[|φ̇|²] < ∞.

Durmus et al. (2016) obtain optimal scaling results when p > 4 and E[|φ̇|⁶] < ∞.

L^p mean differentiability applies straightforwardly to the Zanella, Bédard and WSK (2016) argument mutatis mutandis: the regularity conditions can be weakened even more, at least for vague convergence.


Conclusion

The Dirichlet form approach allows significant relaxation of the conditions required for optimal scaling results;
Combine with L^p mean differentiability to obtain further relaxation of the regularity conditions;
Soft argument for the relation (1/2)·variance + mean ≈ 0, as implied by N(−σ²I/2, σ²I);
MALA generalization (exercise in progress);
Need to explore developments beyond i.i.d. targets: e.g. can regularity be similarly relaxed in more general random field settings?
Apply to discrete Markov chain cases? (c.f. Roberts, 1998);
Investigate applications to Adaptive MCMC.

References

Andrieu, Christophe and Johannes Thoms (2008). "A tutorial on adaptive MCMC". In: Statistics and Computing 18.4, pp. 343–373.

Bédard, Mylène (2007). "Weak convergence of Metropolis algorithms for non-I.I.D. target distributions". In: Annals of Applied Probability 17.4, pp. 1222–1244.

Breyer, L. A. and Gareth O. Roberts (2000). "From Metropolis to diffusions: Gibbs states and optimal scaling". In: Stochastic Processes and their Applications 90.2, pp. 181–206.

Brooks, Stephen P., Andrew Gelman, Galin L. Jones and Xiao-Li Meng (2011). Handbook of Markov Chain Monte Carlo. Boca Raton: Chapman & Hall/CRC, pp. 592+xxv.

Durmus, Alain, Sylvain Le Corff, Eric Moulines and Gareth O. Roberts (2016). "Optimal scaling of the Random Walk Metropolis algorithm under L^p mean differentiability". In: arXiv 1604.06664.

Geyer, Charlie (1999). "Likelihood inference for spatial point processes". In: Stochastic Geometry: likelihood and computation. Ed. by Ole E. Barndorff-Nielsen, WSK and M. N. M. van Lieshout. Boca Raton: Chapman & Hall/CRC. Chap. 4, pp. 79–140.

Hastings, W. K. (1970). "Monte Carlo sampling methods using Markov chains and their applications". In: Biometrika 57, pp. 97–109.

Mattingly, Jonathan C., Natesh S. Pillai and Andrew M. Stuart (2012). "Diffusion limits of the random walk Metropolis algorithm in high dimensions". In: Annals of Applied Probability 22.3, pp. 881–890.

Metropolis, Nicholas, Arianna W. Rosenbluth, Marshall N. Rosenbluth, Augusta H. Teller and Edward Teller (1953). "Equation of state calculations by fast computing machines". In: Journal of Chemical Physics 21.6, pp. 1087–1092.

Mosco, Umberto (1994). "Composite media and asymptotic Dirichlet forms". In: Journal of Functional Analysis 123.2, pp. 368–421.

Roberts, Gareth O. (1998). "Optimal Metropolis algorithms for product measures on the vertices of a hypercube". In: Stochastics and Stochastic Reports, pp. 37–41.

Roberts, Gareth O., A. Gelman and W. Gilks (1997). "Weak Convergence and Optimal Scaling of Random Walk Metropolis Algorithms". In: The Annals of Applied Probability 7.1, pp. 110–120.

Roberts, Gareth O. and Jeffrey S. Rosenthal (1998). "Optimal scaling of discrete approximations to Langevin diffusions". In: J. R. Statist. Soc. B 60.1, pp. 255–268.

Rosenthal, Jeffrey S. (2011). "Optimal Proposal Distributions and Adaptive MCMC". In: Handbook of Markov Chain Monte Carlo, pp. 93–112.

Sun, Wei (1998). "Weak convergence of Dirichlet processes". In: Science in China Series A: Mathematics 41.1, pp. 8–21.

Thompson, Elizabeth A. (2005). "MCMC in the analysis of genetic data on pedigrees". In: Markov chain Monte Carlo: Innovations and Applications. Ed. by WSK, Faming Liang and Jian-Sheng Wang. Singapore: World Scientific. Chap. 5, pp. 183–216.

Zanella, Giacomo (2015). "Bayesian Complementary Clustering, MCMC, and Anglo-Saxon Placenames". PhD Thesis. University of Warwick.

Zanella, Giacomo (2016). "Random Partition Models and Complementary Clustering of Anglo-Saxon Placenames". In: Annals of Applied Statistics 9.4, pp. 1792–1822.

Zanella, Giacomo, Mylène Bédard and WSK (2016). "A Dirichlet Form approach to MCMC Optimal Scaling". In: arXiv 1606.01528, 22pp.

Version 1.34 (Wed Jul 12 06:27:58 2017 +0100)
commit d81469c5f14c484aa363616e3d701c6e3fbb1141
Author: Wilfrid Kendall <W.S.Kendall@warwick.ac.uk>

Made it explicit that L^p mean differentiability still doesn't cover weak convergence without extra regularity: need to beat this!
