Mark Girolami's Read Paper 2010

Page 1: Mark Girolami's Read Paper 2010

Manifold Monte Carlo Methods

Mark Girolami

Department of Statistical Science, University College London

Joint work with Ben Calderhead

Research Section Ordinary Meeting

The Royal Statistical Society, October 13, 2010

Page 2: Mark Girolami's Read Paper 2010

Riemann manifold Langevin and Hamiltonian Monte Carlo Methods

 

- Riemann manifold Langevin and Hamiltonian Monte Carlo Methods. Girolami, M. & Calderhead, B., J. R. Statist. Soc. B (2011), 73, 2, 1 - 37.

- Advanced Monte Carlo methodology founded on geometric principles

Page 6: Mark Girolami's Read Paper 2010

Talk Outline

- Motivation to improve MCMC capability for challenging problems

- Stochastic diffusion as adaptive proposal process

- Exploring geometric concepts in MCMC methodology

- Diffusions across Riemann manifold as proposal mechanism

- Deterministic geodesic flows on manifold form basis of MCMC methods

- Illustrative Examples:

  - Warped Bivariate Gaussian

  - Gaussian mixture model

  - Log-Gaussian Cox process

- Conclusions

Page 15: Mark Girolami's Read Paper 2010

Motivation Simulation Based Inference

- Monte Carlo method employs samples from $p(\theta)$ to obtain the estimate

$$\int \phi(\theta)\, p(\theta)\, d\theta = \frac{1}{N} \sum_{n} \phi(\theta_n) + O(N^{-1/2})$$

- Draw $\theta_n$ from an ergodic Markov process with stationary distribution $p(\theta)$

- Construct process in two parts

  - Propose a move $\theta \to \theta'$ with probability $p_p(\theta'|\theta)$

  - Accept or reject proposal with probability

$$p_a(\theta'|\theta) = \min\left[1, \frac{p(\theta')\, p_p(\theta|\theta')}{p(\theta)\, p_p(\theta'|\theta)}\right]$$

- Efficiency dependent on $p_p(\theta'|\theta)$ defining proposal mechanism

- Success of MCMC reliant upon appropriate proposal design
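
The propose-then-accept construction on this slide can be sketched in a few lines of Python. This is a minimal illustration rather than code from the talk: it assumes a one-dimensional standard Normal target and a symmetric Gaussian random-walk proposal (so the proposal densities cancel in the acceptance ratio), and the names `metropolis_hastings`, `log_standard_normal` and `step` are hypothetical.

```python
import math
import random


def metropolis_hastings(log_p, theta0, step, n_samples, rng):
    """Random-walk Metropolis-Hastings for a 1-D target with log-density log_p."""
    theta = theta0
    samples = []
    for _ in range(n_samples):
        # Propose theta' ~ N(theta, step^2). The proposal is symmetric, so the
        # ratio p_p(theta|theta') / p_p(theta'|theta) equals 1.
        proposal = theta + step * rng.gauss(0.0, 1.0)
        log_ratio = log_p(proposal) - log_p(theta)
        # Accept with probability min(1, p(theta') / p(theta)).
        if rng.random() < math.exp(min(0.0, log_ratio)):
            theta = proposal
        samples.append(theta)
    return samples


def log_standard_normal(theta):
    # Illustrative target (an assumption): log p(theta) up to a constant.
    return -0.5 * theta * theta


rng = random.Random(0)
samples = metropolis_hastings(log_standard_normal, 0.0, 2.4, 20000, rng)
# Monte Carlo estimate of E[phi(theta)] with phi(theta) = theta^2 (exact value 1).
estimate = sum(s * s for s in samples) / len(samples)
print(estimate)
```

How quickly the estimate settles depends on the hypothetical `step` parameter, which is the proposal-design issue the slide points to.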

Page 22: Mark Girolami's Read Paper 2010

Adaptive Proposal Distributions - Exploit Discretised Diffusion

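- For $\theta \in \mathbb{R}^D$ with density $p(\theta)$, $L(\theta) \equiv \log p(\theta)$, define Langevin diffusion

$$d\theta(t) = \frac{1}{2} \nabla_\theta L(\theta(t))\, dt + db(t)$$

- First order Euler-Maruyama discrete integration of diffusion

$$\theta(\tau + \epsilon) = \theta(\tau) + \frac{\epsilon^2}{2} \nabla_\theta L(\theta(\tau)) + \epsilon z(\tau)$$

- Proposal $p_p(\theta'|\theta) = N(\theta' | \mu(\theta, \epsilon), \epsilon^2 I)$ with $\mu(\theta, \epsilon) = \theta + \frac{\epsilon^2}{2} \nabla_\theta L(\theta)$

- Acceptance probability to correct for bias

$$p_a(\theta'|\theta) = \min\left[1, \frac{p(\theta')\, p_p(\theta|\theta')}{p(\theta)\, p_p(\theta'|\theta)}\right]$$

- Isotropic diffusion inefficient, employ pre-conditioning

$$\theta' = \theta + \epsilon^2 M \nabla_\theta L(\theta)/2 + \epsilon \sqrt{M} z$$

- How to set $M$ systematically? Tuning in transient & stationary phases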
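
A minimal sketch of the Metropolis-adjusted Langevin proposal described on this slide, for the isotropic case only (no pre-conditioning matrix $M$). It is an illustration under stated assumptions, not the authors' implementation: the target is a one-dimensional standard Normal, and `mala`, `eps` and the helper names are hypothetical.

```python
import math
import random


def mala(log_p, grad_log_p, theta0, eps, n_samples, rng):
    """1-D MALA: Euler-Maruyama Langevin proposal plus accept/reject correction."""

    def mean(theta):
        # mu(theta, eps) = theta + (eps^2 / 2) * grad L(theta)
        return theta + 0.5 * eps**2 * grad_log_p(theta)

    def log_q(to, frm):
        # log N(to | mu(frm, eps), eps^2), dropping a constant that is the
        # same in both directions and so cancels in the ratio.
        return -0.5 * ((to - mean(frm)) / eps) ** 2

    theta = theta0
    samples, accepted = [], 0
    for _ in range(n_samples):
        proposal = mean(theta) + eps * rng.gauss(0.0, 1.0)
        log_ratio = (log_p(proposal) + log_q(theta, proposal)) - (
            log_p(theta) + log_q(proposal, theta)
        )
        if rng.random() < math.exp(min(0.0, log_ratio)):
            theta = proposal
            accepted += 1
        samples.append(theta)
    return samples, accepted / n_samples


# Illustrative target (an assumption): standard Normal, L(theta) = -theta^2 / 2.
rng = random.Random(1)
mala_samples, acceptance_rate = mala(
    lambda t: -0.5 * t * t, lambda t: -t, 0.0, 1.0, 20000, rng
)
sample_mean = sum(mala_samples) / len(mala_samples)
second_moment = sum(s * s for s in mala_samples) / len(mala_samples)
print(sample_mean, second_moment, acceptance_rate)
```

Unlike the random-walk case, the proposal is not symmetric, so both `log_q` terms are needed in the acceptance ratio.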

Page 29: Mark Girolami's Read Paper 2010

Geometric Concepts in MCMC

- Rao, 1945; Jeffreys, 1948, to first order

$$\int p(y; \theta + \delta\theta) \log \frac{p(y; \theta + \delta\theta)}{p(y; \theta)}\, dy \approx \delta\theta^{T} G(\theta)\, \delta\theta$$

where

$$G(\theta) = E_{y|\theta}\left\{ \frac{\nabla_\theta p(y; \theta)}{p(y; \theta)} \frac{\nabla_\theta p(y; \theta)^{T}}{p(y; \theta)} \right\}$$

- Fisher Information $G(\theta)$ is p.d. metric defining a Riemann manifold

- Non-Euclidean geometry for probabilities - distances, metrics, invariants, curvature, geodesics

- Asymptotic statistical analysis. Amari, 1981, 85, 90; Murray & Rice, 1993; Critchley et al, 1993; Kass, 1989; Efron, 1975; Dawid, 1975; Lauritsen, 1989

- Statistical shape analysis. Kent et al, 1996; Dryden & Mardia, 1998

- Can geometric structure be employed in MCMC methodology?

Page 37: Mark Girolami's Read Paper 2010

Geometric Concepts in MCMC

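- Tangent space - local metric defined by $\delta\theta^{T} G(\theta)\, \delta\theta = \sum_{k,l} g_{kl}\, \delta\theta^k \delta\theta^l$

- Christoffel symbols - characterise connection on curved manifold

$$\Gamma^{i}_{kl} = \frac{1}{2} \sum_{m} g^{im} \left( \frac{\partial g_{mk}}{\partial \theta^l} + \frac{\partial g_{ml}}{\partial \theta^k} - \frac{\partial g_{kl}}{\partial \theta^m} \right)$$

- Geodesics - shortest path between two points on manifold

$$\frac{d^2\theta^i}{dt^2} + \sum_{k,l} \Gamma^{i}_{kl} \frac{d\theta^k}{dt} \frac{d\theta^l}{dt} = 0$$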
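
The Christoffel formula on this slide can be evaluated numerically for any metric given as a function of $\theta$. The sketch below is an illustration under stated assumptions: it uses the Fisher metric of the univariate Normal, $G(\mu, \sigma) = \mathrm{diag}(\sigma^{-2}, 2\sigma^{-2})$, which the deck uses as its own illustration, and approximates the partial derivatives by central differences. The function names and the step size `h` are hypothetical.

```python
def metric(theta):
    """Fisher metric of N(mu, sigma) in coordinates theta = (mu, sigma)."""
    _, sigma = theta
    return [[sigma**-2, 0.0], [0.0, 2.0 * sigma**-2]]


def inverse_2x2(g):
    det = g[0][0] * g[1][1] - g[0][1] * g[1][0]
    return [[g[1][1] / det, -g[0][1] / det], [-g[1][0] / det, g[0][0] / det]]


def metric_derivatives(theta, h=1e-5):
    """dg[m][a][b] approximates d g_ab / d theta^m by central differences."""
    dg = []
    for m in range(2):
        plus, minus = list(theta), list(theta)
        plus[m] += h
        minus[m] -= h
        g_plus, g_minus = metric(plus), metric(minus)
        dg.append(
            [[(g_plus[a][b] - g_minus[a][b]) / (2.0 * h) for b in range(2)]
             for a in range(2)]
        )
    return dg


def christoffel(theta):
    """gamma[i][k][l] = 1/2 sum_m g^{im} (d_l g_mk + d_k g_ml - d_m g_kl)."""
    g_inv = inverse_2x2(metric(theta))
    dg = metric_derivatives(theta)
    gamma = [[[0.0, 0.0], [0.0, 0.0]], [[0.0, 0.0], [0.0, 0.0]]]
    for i in range(2):
        for k in range(2):
            for l in range(2):
                gamma[i][k][l] = 0.5 * sum(
                    g_inv[i][m] * (dg[l][m][k] + dg[k][m][l] - dg[m][k][l])
                    for m in range(2)
                )
    return gamma


gamma = christoffel((0.0, 2.0))
print(gamma)
```

Working the same formula by hand for this metric gives $\Gamma^{\mu}_{\mu\sigma} = -1/\sigma$, $\Gamma^{\sigma}_{\mu\mu} = 1/(2\sigma)$ and $\Gamma^{\sigma}_{\sigma\sigma} = -1/\sigma$, which the numerical values should match up to finite-difference error.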

Page 43: Mark Girolami's Read Paper 2010

Illustration of Geometric Concepts

- Consider Normal density $p(x | \mu, \sigma) = N_x(\mu, \sigma)$

- Local inner product on tangent space defined by metric tensor, i.e. $\delta\theta^{T} G(\theta)\, \delta\theta$, where $\theta = (\mu, \sigma)^{T}$

- Metric is Fisher Information

$$G(\mu, \sigma) = \begin{bmatrix} \sigma^{-2} & 0 \\ 0 & 2\sigma^{-2} \end{bmatrix}$$

- Inner-product $\sigma^{-2}(\delta\mu^2 + 2\delta\sigma^2)$ so densities $N(0, 1)$ & $N(1, 1)$ further apart than the densities $N(0, 2)$ & $N(1, 2)$ - distance non-Euclidean

- Metric tensor for univariate Normal defines a Hyperbolic Space
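
The comparison on this slide can be checked numerically. The sketch below evaluates the local inner product $\sigma^{-2}(\delta\mu^2 + 2\delta\sigma^2)$ for a unit shift in $\mu$, and also a closed-form geodesic distance. The closed form is not given in the slides: it is an assumption obtained by rescaling $\mu$ by $1/\sqrt{2}$, which turns the metric into twice the Poincaré half-plane metric. The second argument of each density is read as $\sigma$, and the function names are hypothetical.

```python
import math


def local_squared_length(d_mu, d_sigma, sigma):
    """Quadratic form delta^T G delta for G = diag(sigma^-2, 2 sigma^-2)."""
    return (d_mu**2 + 2.0 * d_sigma**2) / sigma**2


def fisher_rao_distance(a, b):
    """Geodesic distance between N(mu1, s1) and N(mu2, s2) under the metric
    above, via the half-plane distance formula (assumed, see lead-in)."""
    (mu1, s1), (mu2, s2) = a, b
    arg = 1.0 + (0.5 * (mu1 - mu2) ** 2 + (s1 - s2) ** 2) / (2.0 * s1 * s2)
    return math.sqrt(2.0) * math.acosh(arg)


# Same shift in mu, evaluated at sigma = 1 and at sigma = 2.
local_narrow = local_squared_length(1.0, 0.0, 1.0)
local_wide = local_squared_length(1.0, 0.0, 2.0)

d_narrow = fisher_rao_distance((0.0, 1.0), (1.0, 1.0))
d_wide = fisher_rao_distance((0.0, 2.0), (1.0, 2.0))
print(local_narrow, local_wide, d_narrow, d_wide)
```

Both measures agree with the slide's claim: the pair with the smaller $\sigma$ is further apart, even though the Euclidean separation in $\mu$ is the same.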

Page 48: Mark Girolami's Read Paper 2010

M.C. Escher, Heaven and Hell, 1960

Page 49: Mark Girolami's Read Paper 2010

Langevin Diffusion on Riemannian manifold

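- Discretised Langevin diffusion on manifold defines proposal mechanism

$$\theta'_d = \theta_d + \frac{\epsilon^2}{2} \left( G^{-1}(\theta) \nabla_\theta L(\theta) \right)_d - \epsilon^2 \sum_{i,j}^{D} G(\theta)^{-1}_{ij}\, \Gamma^{d}_{ij} + \epsilon \left( \sqrt{G^{-1}(\theta)}\, z \right)_d$$

- Manifold with constant curvature then proposal mechanism reduces to

$$\theta' = \theta + \frac{\epsilon^2}{2} G^{-1}(\theta) \nabla_\theta L(\theta) + \epsilon \sqrt{G^{-1}(\theta)}\, z$$

- MALA proposal with preconditioning

$$\theta' = \theta + \frac{\epsilon^2}{2} M \nabla_\theta L(\theta) + \epsilon \sqrt{M} z$$

- Proposal and acceptance probability

$$p_p(\theta'|\theta) = N(\theta' | \mu(\theta, \epsilon), \epsilon^2 G^{-1}(\theta))$$

$$p_a(\theta'|\theta) = \min\left[1, \frac{p(\theta')\, p_p(\theta|\theta')}{p(\theta)\, p_p(\theta'|\theta)}\right]$$

- Proposal mechanism diffuses approximately along the manifold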
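
A sketch of the simplified manifold proposal (the form without the Christoffel term) with its accept/reject correction. Everything specific here is an assumption made for illustration: the target is the posterior of $(\mu, \sigma)$ for synthetic Normal data under a flat prior, the metric is taken as $G = n\,\mathrm{diag}(\sigma^{-2}, 2\sigma^{-2})$ for $n$ observations, and all function names, the step size and the burn-in length are hypothetical. Because $G$ depends on $\theta$, the proposal density includes a $\log|G|$ term that does not cancel.

```python
import math
import random

rng = random.Random(0)
data = [rng.gauss(1.0, 2.0) for _ in range(30)]  # synthetic data (assumption)
n = len(data)
y_bar = sum(data) / n


def log_post(theta):
    """Log posterior of (mu, sigma) under a flat prior, up to a constant."""
    mu, sigma = theta
    s = sum((y - mu) ** 2 for y in data)
    return -n * math.log(sigma) - s / (2.0 * sigma**2)


def proposal_mean(theta, eps):
    """theta + (eps^2 / 2) G^{-1}(theta) grad L(theta), worked out by hand."""
    mu, sigma = theta
    s = sum((y - mu) ** 2 for y in data)
    nat_mu = y_bar - mu
    nat_sigma = -sigma / 2.0 + s / (2.0 * n * sigma)
    return (mu + 0.5 * eps**2 * nat_mu, sigma + 0.5 * eps**2 * nat_sigma)


def log_q(to, frm, eps):
    """log N(to | mean(frm), eps^2 G^{-1}(frm)) up to a shared constant."""
    m = proposal_mean(frm, eps)
    g = (n / frm[1] ** 2, 2.0 * n / frm[1] ** 2)  # diagonal of G(frm)
    quad = sum(g[d] * (to[d] - m[d]) ** 2 for d in range(2)) / eps**2
    return 0.5 * (math.log(g[0]) + math.log(g[1])) - 0.5 * quad


def simplified_mmala(theta0, eps, n_samples, rng):
    theta, samples, accepted = theta0, [], 0
    for _ in range(n_samples):
        m = proposal_mean(theta, eps)
        sigma = theta[1]
        # Noise eps * sqrt(G^{-1}) z with G^{-1} = diag(sigma^2/n, sigma^2/(2n)).
        prop = (
            m[0] + eps * sigma / math.sqrt(n) * rng.gauss(0.0, 1.0),
            m[1] + eps * sigma / math.sqrt(2.0 * n) * rng.gauss(0.0, 1.0),
        )
        if prop[1] > 0.0:  # proposals with sigma <= 0 have zero density: reject
            log_ratio = (log_post(prop) + log_q(theta, prop, eps)) - (
                log_post(theta) + log_q(prop, theta, eps)
            )
            if rng.random() < math.exp(min(0.0, log_ratio)):
                theta = prop
                accepted += 1
        samples.append(theta)
    return samples, accepted / n_samples


draws, rate = simplified_mmala((0.0, 3.0), 0.8, 5000, rng)
kept = draws[500:]  # discard a short burn-in
post_mu = sum(t[0] for t in kept) / len(kept)
print(post_mu, y_bar, rate)
```

Under the flat prior the posterior mean of $\mu$ equals the sample mean `y_bar`, which gives a simple sanity check on the chain.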

Page 54: Mark Girolami's Read Paper 2010

Langevin Diffusion on Riemannian manifold

[Figure: two panels, each plotted against µ on the horizontal axis (−5 to 5), with vertical axis ticks from 5 to 40]

Page 55: Mark Girolami's Read Paper 2010

Langevin Diffusion on Riemannian manifold

[Figure: two panels, each plotted against µ on the horizontal axis (−10 to 10), with vertical axis ticks from 5 to 25]

Page 56: Mark Girolami's Read Paper 2010

Geodesic flow as proposal mechanism

- Desirable that proposals follow direct path on manifold - geodesics

- Define geodesic flow on manifold by solving

$$\frac{d^2\theta^i}{dt^2} + \sum_{k,l} \Gamma^{i}_{kl} \frac{d\theta^k}{dt} \frac{d\theta^l}{dt} = 0$$

- How can this be exploited in the design of a transition operator?

- Need slight detour - first define log-density under model as $L(\theta)$

- Introduce auxiliary variable $p \sim N(0, G(\theta))$

- Negative joint log density is

$$H(\theta, p) = -L(\theta) + \frac{1}{2} \log\left\{ (2\pi)^D |G(\theta)| \right\} + \frac{1}{2} p^{T} G(\theta)^{-1} p$$

Page 63: Mark Girolami's Read Paper 2010

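Riemannian Hamiltonian Monte Carlo

I Negative joint log-density ≡ Hamiltonian defined on Riemann manifold

$$H(\theta, p) = \underbrace{-L(\theta) + \frac{1}{2}\log\left\{(2\pi)^D |G(\theta)|\right\}}_{\text{Potential Energy}} + \underbrace{\frac{1}{2}\, p^T G(\theta)^{-1} p}_{\text{Kinetic Energy}}$$

I Marginal density follows as required

$$p(\theta) \propto \frac{\exp\{L(\theta)\}}{\sqrt{(2\pi)^D |G(\theta)|}} \int \exp\left\{-\frac{1}{2}\, p^T G(\theta)^{-1} p\right\} dp = \exp\{L(\theta)\}$$

I Obtain samples from marginal p(θ) using Gibbs sampler for p(θ, p)

$$p_{n+1} \mid \theta_n \sim \mathcal{N}(0, G(\theta_n))$$

$$\theta_{n+1} \mid p_{n+1} \sim p(\theta_{n+1} \mid p_{n+1})$$

I Employ Hamiltonian dynamics to propose samples for p(θ_{n+1} | p_{n+1}), Duane et al, 1987; Neal, 2010.

$$\frac{d\theta}{dt} = \frac{\partial}{\partial p} H(\theta, p), \qquad \frac{dp}{dt} = -\frac{\partial}{\partial \theta} H(\theta, p)$$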
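The momentum step of the Gibbs sampler and the first of Hamilton's equations can be sketched as follows. This is a minimal illustration with a fixed, hypothetical 2×2 value of the metric; `sample_momentum` and `velocity` are names introduced here, and the numerical integrator needed for a full sampler is not shown.

```python
import numpy as np

def sample_momentum(G, rng):
    """Draw p ~ N(0, G) using the Cholesky factor G = C C^T."""
    C = np.linalg.cholesky(G)
    return C @ rng.standard_normal(G.shape[0])

def velocity(G, p):
    """d(theta)/dt = dH/dp = G^{-1} p for the quadratic kinetic energy."""
    return np.linalg.solve(G, p)

rng = np.random.default_rng(0)
G = np.array([[2.0, 0.5], [0.5, 1.0]])  # hypothetical metric value at some theta

# Empirical covariance of many momentum draws should be close to G
draws = np.array([sample_momentum(G, rng) for _ in range(20000)])
emp_cov = np.cov(draws.T)

# velocity inverts the map v -> G v
v = np.array([0.3, -1.2])
v_back = velocity(G, G @ v)
```

Because G depends on θ in the manifold setting, the Cholesky factor would need to be recomputed at each new position.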

Riemannian Manifold Hamiltonian Monte Carlo

I Consider the Hamiltonian $\tilde{H}(\theta, p) = \frac{1}{2}\, p^T \tilde{G}(\theta)^{-1} p$

I Hamiltonians with only a quadratic kinetic energy term exactly describe geodesic flow on the coordinate space θ with metric G̃

I However our Hamiltonian is $H(\theta, p) = V(\theta) + \frac{1}{2}\, p^T G(\theta)^{-1} p$

I If we define $\tilde{G}(\theta) = G(\theta) \times (h - V(\theta))$, where h is the constant value of H(θ, p)

I Then the Maupertuis principle tells us that the Hamiltonian flows for H(θ, p) and H̃(θ, p) are equivalent along energy level h

I The solution of

$$\frac{d\theta}{dt} = \frac{\partial}{\partial p} H(\theta, p), \qquad \frac{dp}{dt} = -\frac{\partial}{\partial \theta} H(\theta, p)$$

is therefore equivalent to the solution of

$$\frac{d^2\theta^i}{dt^2} + \sum_{k,l} \tilde{\Gamma}^i_{kl}\,\frac{d\theta^k}{dt}\frac{d\theta^l}{dt} = 0$$

I RMHMC proposals are along the manifold geodesics
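The geodesic equation involves the Christoffel symbols of the metric. A minimal sketch of how they could be approximated for any metric function, assuming central finite differences are accurate enough, is given below. The function names and the example metric (the Fisher information of a univariate normal in (µ, σ) coordinates) are illustrative assumptions, not taken from the slides.

```python
import numpy as np

def christoffel(metric, theta, eps=1e-5):
    """Gamma[i, k, l] = 0.5 * G^{im} (d_k G_{ml} + d_l G_{mk} - d_m G_{kl})."""
    D = len(theta)
    dG = np.empty((D, D, D))  # dG[m, a, b] = d G_{ab} / d theta_m
    for m in range(D):
        step = np.zeros(D)
        step[m] = eps
        dG[m] = (metric(theta + step) - metric(theta - step)) / (2.0 * eps)
    G_inv = np.linalg.inv(metric(theta))
    # term[m, k, l] = d_k G_{ml} + d_l G_{mk} - d_m G_{kl}
    term = np.einsum("kml->mkl", dG) + np.einsum("lmk->mkl", dG) - dG
    return 0.5 * np.einsum("im,mkl->ikl", G_inv, term)

def geodesic_acceleration(Gamma, v):
    """Right-hand side of the geodesic equation: d2theta^i/dt2 = -Gamma^i_{kl} v^k v^l."""
    return -np.einsum("ikl,k,l->i", Gamma, v, v)

def normal_metric(theta):
    # Fisher information of N(mu, sigma^2) in (mu, sigma) coordinates
    sigma = theta[1]
    return np.diag([1.0 / sigma**2, 2.0 / sigma**2])

Gamma = christoffel(normal_metric, np.array([0.0, 2.0]))
accel = geodesic_acceleration(Gamma, np.array([1.0, 0.0]))
```

For this example metric the non-zero symbols have closed forms (for instance Γ^µ_{µσ} = −1/σ), which gives a simple check on the finite-difference approximation.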

Warped Bivariate Gaussian

I $p(w_1, w_2 \mid y, \sigma_x, \sigma_y) \propto \prod_{n=1}^{N} \mathcal{N}(y_n \mid w_1 + w_2^2, \sigma_y^2)\, \mathcal{N}(w_1, w_2 \mid 0, \sigma_x^2 I)$

Figure: two panels, HMC (left) and RMHMC (right), each plotted in the (w1, w2) plane.
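To make the warped target concrete, here is a small Python sketch of its unnormalised log posterior. The slide does not state which metric was used; the `metric` function below assumes the expected Fisher information of the likelihood plus the prior precision, and all function names are hypothetical.

```python
import numpy as np

def log_posterior(w, y, sigma_x, sigma_y):
    """Unnormalised log p(w1, w2 | y): Gaussian likelihood with mean w1 + w2^2 and N(0, sigma_x^2 I) prior."""
    mean = w[0] + w[1] ** 2
    log_lik = -0.5 * np.sum((y - mean) ** 2) / sigma_y**2
    log_prior = -0.5 * (w @ w) / sigma_x**2
    return log_lik + log_prior

def metric(w, n_obs, sigma_x, sigma_y):
    """Assumed metric: expected Fisher information of the likelihood plus prior precision."""
    grad_mean = np.array([1.0, 2.0 * w[1]])  # gradient of w1 + w2^2 with respect to (w1, w2)
    return n_obs / sigma_y**2 * np.outer(grad_mean, grad_mean) + np.eye(2) / sigma_x**2

y_demo = np.zeros(10)  # hypothetical observations
lp0 = log_posterior(np.zeros(2), y_demo, sigma_x=1.0, sigma_y=2.0)
G_demo = metric(np.array([0.0, 1.0]), n_obs=10, sigma_x=1.0, sigma_y=2.0)
```

Under this assumed metric, G changes with w2, which is the position dependence a manifold method is meant to exploit.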

Gaussian Mixture Model

I Univariate finite mixture model

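$$p(x_i \mid \theta) = \sum_{k=1}^{K} \pi_k\, \mathcal{N}(x_i \mid \mu_k, \sigma_k^2)$$

I FI based metric tensor non-analytic - employ empirical FI

$$G(\theta) = \frac{1}{N} S^T S - \frac{1}{N^2}\, \bar{s}\bar{s}^T \xrightarrow{N \to \infty} \mathrm{cov}(\nabla_\theta L(\theta)) = I(\theta)$$

$$\frac{\partial G(\theta)}{\partial \theta_d} = \frac{1}{N}\left(\frac{\partial S^T}{\partial \theta_d} S + S^T \frac{\partial S}{\partial \theta_d}\right) - \frac{1}{N^2}\left(\frac{\partial \bar{s}}{\partial \theta_d}\bar{s}^T + \bar{s}\frac{\partial \bar{s}^T}{\partial \theta_d}\right)$$

with score matrix S with elements $S_{i,d} = \frac{\partial \log p(x_i \mid \theta)}{\partial \theta_d}$ and $\bar{s} = \sum_{i=1}^{N} S_{i,\cdot}^T$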
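The empirical Fisher information above can be computed directly from a score matrix. The sketch below assumes S is available as an N×D NumPy array; the function name and the random demonstration scores are hypothetical and do not come from a real mixture model.

```python
import numpy as np

def empirical_fisher(S):
    """G = S^T S / N - s_bar s_bar^T / N^2, with s_bar the sum of the rows of S."""
    N = S.shape[0]
    s_bar = S.sum(axis=0)
    return S.T @ S / N - np.outer(s_bar, s_bar) / N**2

rng = np.random.default_rng(1)
S_demo = rng.standard_normal((500, 3))  # hypothetical per-observation scores
G_demo = empirical_fisher(S_demo)
```

Written this way, G is the sample covariance of the per-observation scores with a 1/N normalisation, so it is symmetric and positive semi-definite by construction.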

Gaussian Mixture Model

I Univariate finite mixture model

$$p(x \mid \mu, \sigma^2) = 0.7 \times \mathcal{N}(x \mid 0, \sigma^2) + 0.3 \times \mathcal{N}(x \mid \mu, \sigma^2)$$

Figure: Arrows correspond to the gradients and ellipses to the inverse metric tensor. Dashed lines are isocontours of the joint log density. Axes: µ and ln σ, each ranging from −4 to 4.
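For reference, the log-likelihood of this two-component mixture can be evaluated stably in the (µ, ln σ) coordinates used on the figure axes. The sketch below is illustrative only: the function name and data are hypothetical, and any prior contributing to the joint log density is omitted because the slide does not specify it.

```python
import numpy as np

def mixture_log_likelihood(x, mu, log_sigma):
    """Sum over x of log(0.7 N(x|0, s^2) + 0.3 N(x|mu, s^2)), with s = exp(log_sigma)."""
    sigma = np.exp(log_sigma)
    log_norm = -0.5 * np.log(2.0 * np.pi) - log_sigma
    comp0 = np.log(0.7) + log_norm - 0.5 * (x / sigma) ** 2
    comp1 = np.log(0.3) + log_norm - 0.5 * ((x - mu) / sigma) ** 2
    # logaddexp combines the two components without underflow
    return np.sum(np.logaddexp(comp0, comp1))

x_demo = np.array([-1.0, 0.5, 2.0])  # hypothetical data
ll_same_mean = mixture_log_likelihood(x_demo, mu=0.0, log_sigma=0.0)
single_normal = np.sum(-0.5 * np.log(2.0 * np.pi) - 0.5 * x_demo**2)
```

When µ = 0 the two components coincide, so the mixture collapses to a single normal; this gives a quick sanity check on the implementation.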

Gaussian Mixture Model

Figure: Comparison of MALA (left), mMALA (middle) and simplified mMALA (right) convergence paths and autocorrelation plots. Autocorrelation plots are from the stationary chains, i.e. once the chains have converged to the stationary distribution. Path panels are drawn in the (µ, ln σ) plane; autocorrelation panels plot sample autocorrelation against lag (0 to 20).

Gaussian Mixture Model

Figure: Comparison of HMC (left), RMHMC (middle) and Gibbs (right) convergence paths and autocorrelation plots. Autocorrelation plots are from the stationary chains, i.e. once the chains have converged to the stationary distribution. Path panels are drawn in the (µ, ln σ) plane; autocorrelation panels plot sample autocorrelation against lag (0 to 20).

Log-Gaussian Cox Point Process with Latent Field

I The joint density for Poisson counts and latent Gaussian field

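$$p(y, x \mid \mu, \sigma, \beta) \propto \prod_{i,j}^{64} \exp\{y_{i,j} x_{i,j} - m \exp(x_{i,j})\}\, \exp\left(-(x - \mu \mathbf{1})^T \Sigma_\theta^{-1} (x - \mu \mathbf{1})/2\right)$$

I Metric tensors

$$G(\theta)_{i,j} = \frac{1}{2}\,\mathrm{trace}\left(\Sigma_\theta^{-1} \frac{\partial \Sigma_\theta}{\partial \theta_i} \Sigma_\theta^{-1} \frac{\partial \Sigma_\theta}{\partial \theta_j}\right), \qquad G(x) = \Lambda + \Sigma_\theta^{-1}$$

where Λ is diagonal with elements $m \exp(\mu + (\Sigma_\theta)_{i,i})$

I Latent field metric tensor defining flat manifold is 4096 × 4096, O(N^3), obtained from parameter conditional

I MALA requires transformation of latent field to sample - with separate tuning in transient and stationary phases of Markov chain

I Manifold methods require no pilot tuning or additional transformations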
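The two metric tensors above translate fairly directly into code. The following is a sketch under stated assumptions: the covariance matrix and its parameter derivatives are passed in as dense NumPy arrays, the function names are hypothetical, and the demonstration uses a tiny 4×4 covariance rather than the 4096×4096 one from the slide.

```python
import numpy as np

def parameter_metric(Sigma, dSigma):
    """G[i, j] = 0.5 * trace(Sigma^{-1} dSigma[i] Sigma^{-1} dSigma[j])."""
    A = [np.linalg.solve(Sigma, dS) for dS in dSigma]  # A[i] = Sigma^{-1} dSigma[i]
    P = len(A)
    G = np.empty((P, P))
    for i in range(P):
        for j in range(P):
            G[i, j] = 0.5 * np.sum(A[i] * A[j].T)  # equals trace(A[i] @ A[j])
    return G

def latent_metric(Sigma, m, mu):
    """G(x) = Lambda + Sigma^{-1}, with Lambda = diag(m * exp(mu + Sigma_ii))."""
    Lam = np.diag(m * np.exp(mu + np.diag(Sigma)))
    return Lam + np.linalg.inv(Sigma)

# Hypothetical toy case: Sigma = s2 * I with a single parameter s2 = 2
Sigma_demo = 2.0 * np.eye(4)
G_theta = parameter_metric(Sigma_demo, [np.eye(4)])
G_x = latent_metric(Sigma_demo, m=1.0, mu=0.0)
```

Note that G(x) does not depend on x, which is what makes the latent-field manifold flat; the inverse only needs recomputing when the covariance parameters change.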

RMHMC for Log-Gaussian Cox Point Processes

Table: Sampling the latent variables of a Log-Gaussian Cox Process - Comparison of sampling methods

Method             Time (s)  ESS (Min, Med, Max)  s/Min ESS  Rel. Speed
MALA (Transient)   31,577    (3, 8, 50)           10,605     ×1
MALA (Stationary)  31,118    (4, 16, 80)          7836       ×1.35
mMALA              634       (26, 84, 174)        24.1       ×440
RMHMC              2936      (1951, 4545, 5000)   1.5        ×7070
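The Rel. Speed column appears to be the s/Min ESS of transient MALA divided by that of each method. This reading is inferred from the tabulated numbers rather than stated on the slide, and the short Python check below uses only values copied from the table.

```python
# s/Min ESS values copied from the table
sec_per_min_ess = {
    "MALA (Transient)": 10605.0,
    "MALA (Stationary)": 7836.0,
    "mMALA": 24.1,
    "RMHMC": 1.5,
}

# Relative speed measured against transient MALA as the baseline
baseline = sec_per_min_ess["MALA (Transient)"]
rel_speed = {name: baseline / value for name, value in sec_per_min_ess.items()}
```

Rounded to the precision shown in the table, these ratios match the reported relative speeds.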

Conclusion and Discussion

I Geometry of statistical models harnessed in Monte Carlo methods

I Diffusions that respect structure and curvature of space - Manifold MALA

I Geodesic flows on model manifold - RMHMC - generalisation of HMC

I Assessed on correlated & high-dimensional latent variable models

I Promising capability of methodology

I Ongoing development

I Potential bottleneck at metric tensor and Christoffel symbols

I Theoretical analysis of convergence

I Investigate alternative manifold structures

I Design and effect of metric

I Optimality of Hamiltonian flows as local geodesics

I Alternative transition kernels

I No silver bullet or cure all - new powerful methodology for MC toolkit

Funding Acknowledgment

I Girolami funded by EPSRC Advanced Research Fellowship and BBSRC project grant

I Calderhead supported by Microsoft Research PhD Scholarship