Transcript
Page 1: Jun Liu Department of Statistics Stanford University

04/22/23 MCMC and Statistics 1

Jun Liu
Department of Statistics
Stanford University

Based on joint work with F. Liang and W. H. Wong.

Page 2

The Basic Problems of Monte Carlo

• Draw a random variable $X \sim \pi(x)$.
• Estimate the integral
$$I = \int f(x)\,\pi(x)\,dx = E_\pi[f(X)],$$
sometimes with $\pi$ known only up to an unknown normalizing constant.
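The second task is a one-liner once the first is solved; a minimal sketch (the target π and integrand f below are my own toy choices, not from the talk):

```python
import random

def mc_estimate(f, sample_pi, n=100_000):
    """Estimate I = E_pi[f(X)] by averaging f over n draws X ~ pi."""
    return sum(f(sample_pi()) for _ in range(n)) / n

# Toy choice: pi = N(0,1) and f(x) = x^2, so the true value is E[X^2] = 1.
random.seed(0)
est = mc_estimate(lambda x: x * x, lambda: random.gauss(0.0, 1.0))
print(abs(est - 1.0) < 0.05)
```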

Page 3

How to Sample from π(x)

• The Inversion Method. If $U \sim \mathrm{Unif}(0,1)$, then
  $$X = F^{-1}(U) \sim \pi, \quad \text{where } F \text{ is the cdf of } \pi.$$
• The Rejection Method.
  – Generate x from g(x);
  – Draw u from Unif(0,1);
  – Accept x if $u < \pi(x)/c\,g(x)$;
  – The accepted x follows π(x).
Here c is chosen so that $c\,g(x) \ge \pi(x)$ for all x; g is the "envelope" distribution.
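The rejection method translates directly into code; the Beta(2,2) target and uniform envelope below are my own toy example (any g with c·g(x) ≥ π(x) works):

```python
import random

def rejection_sample(pi, sample_g, g_pdf, c):
    """Rejection method: draw x ~ g and u ~ Unif(0,1);
    accept x when u < pi(x) / (c * g(x))."""
    while True:
        x = sample_g()
        if random.random() < pi(x) / (c * g_pdf(x)):
            return x

# Toy target: Beta(2,2) density pi(x) = 6x(1-x) on [0,1].
# Envelope g = Unif(0,1) with c = 1.5, so c*g(x) = 1.5 >= pi(x) everywhere.
random.seed(1)
pi = lambda x: 6.0 * x * (1.0 - x)
draws = [rejection_sample(pi, random.random, lambda x: 1.0, 1.5)
         for _ in range(20_000)]
mean = sum(draws) / len(draws)
print(abs(mean - 0.5) < 0.02)  # Beta(2,2) has mean 1/2
```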

Page 4

High-Dimensional Problems?

Ising model: $X = (x_\sigma)$ over the lattice points, $x_\sigma \in \{0, 1\}$, with
$$\pi(X) = Z^{-1} \exp\{-\mathrm{Eng}(X)\}, \qquad \mathrm{Eng}(X) \sim \frac{1}{T} \sum_{\sigma \sim \sigma'} x_\sigma x_{\sigma'},$$
where the sum runs over neighboring sites and Z is the partition function.

Metropolis Algorithm:
(a) pick a lattice point, say σ, at random;
(b) change the current $x_\sigma$ to $1 - x_\sigma$ (so $X^{(t)} \to X^*$);
(c) compute $r = \pi(X^*)/\pi(X^{(t)})$;
(d) make the acceptance/rejection decision.
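Steps (a)–(d) can be sketched for a small {0,1}-spin lattice; the coupling J, the energy sign convention, and the free boundary conditions are my assumptions, since the slide leaves them unspecified:

```python
import math
import random

def metropolis_step(x, J=1.0):
    """One single-site Metropolis update for pi(X) ∝ exp{-Eng(X)} with
    Eng(X) = -J * sum of x_i * x_j over nearest-neighbor pairs
    (sign convention assumed; free boundary conditions)."""
    L = len(x)
    i, j = random.randrange(L), random.randrange(L)   # (a) pick a site
    nb = sum(x[i + di][j + dj]                        # sum of neighbors
             for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1))
             if 0 <= i + di < L and 0 <= j + dj < L)
    # (b)-(c): flipping x[i][j] -> 1 - x[i][j] changes the energy by
    # delta = Eng(X*) - Eng(X), and r = pi(X*)/pi(X) = exp(-delta).
    delta = -J * nb * (1 - 2 * x[i][j])
    if random.random() < min(1.0, math.exp(-delta)):  # (d) accept/reject
        x[i][j] = 1 - x[i][j]

random.seed(2)
L = 8
x = [[random.randrange(2) for _ in range(L)] for _ in range(L)]
for _ in range(5_000):
    metropolis_step(x)
frac = sum(map(sum, x)) / (L * L)
print(0.0 <= frac <= 1.0)
```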

Page 5

General Metropolis-Hastings Recipe

• Start with any $X^{(0)} = x_0$ and a "proposal chain" $T(x, y)$.
• Suppose $X^{(t)} = x_t$. At time t+1,
  – Draw $y \sim T(x_t, \cdot)$ (i.e., propose a move for the next step).
  – Compute the Metropolis ratio (or "goodness" ratio)
$$r = \frac{\pi(y)\,T(y, x_t)}{\pi(x_t)\,T(x_t, y)}.$$
  – Acceptance/rejection decision: let
$$X^{(t+1)} = \begin{cases} y, & \text{with probability } p = \min\{1, r\}, \\ x_t, & \text{with probability } 1 - p. \end{cases}$$
The rejections "thin down" the proposal chain.
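The recipe translates almost line for line into code; the N(0,1) target and symmetric random-walk proposal below are illustrative stand-ins of my own:

```python
import math
import random

def mh_chain(log_pi, propose, log_T, x0, n):
    """General M-H: draw y ~ T(x,.), accept with probability min{1, r},
    r = pi(y)T(y,x) / (pi(x)T(x,y)), computed in log space."""
    x, chain = x0, []
    for _ in range(n):
        y = propose(x)
        log_r = log_pi(y) + log_T(y, x) - log_pi(x) - log_T(x, y)
        if math.log(random.random()) < log_r:
            x = y                      # accept the proposed move
        chain.append(x)                # on rejection, x is repeated
    return chain

# Illustrative run: target N(0,1); symmetric proposal, so T cancels in r.
random.seed(3)
chain = mh_chain(log_pi=lambda x: -0.5 * x * x,
                 propose=lambda x: x + random.uniform(-1.0, 1.0),
                 log_T=lambda a, b: 0.0,
                 x0=0.0, n=50_000)
mean = sum(chain) / len(chain)
print(abs(mean) < 0.15)
```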

Page 6

• The detailed balance:
$$\pi(x)\,T(x,y)\min\left\{1, \frac{\pi(y)T(y,x)}{\pi(x)T(x,y)}\right\}
= \min\{\pi(x)T(x,y),\ \pi(y)T(y,x)\}
= \pi(y)\,T(y,x)\min\left\{1, \frac{\pi(x)T(x,y)}{\pi(y)T(y,x)}\right\}.$$
Here $T(x,y)\min\{1, \cdot\}$ is the actual transition probability from x to y (for $x \ne y$), and $T(y,x)\min\{1, \cdot\}$ is the transition probability from y to x.
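Detailed balance is easy to verify numerically on a small discrete chain; the 3-state π and proposal matrix T below are arbitrary test values:

```python
pi = [0.2, 0.3, 0.5]
T = [[0.5, 0.3, 0.2],
     [0.4, 0.4, 0.2],
     [0.1, 0.6, 0.3]]

def A(x, y):
    """Actual M-H transition probability from x to y (x != y)."""
    return T[x][y] * min(1.0, pi[y] * T[y][x] / (pi[x] * T[x][y]))

# pi(x)A(x,y) = min{pi(x)T(x,y), pi(y)T(y,x)} is symmetric in x and y,
# so detailed balance pi(x)A(x,y) = pi(y)A(y,x) holds exactly.
balanced = all(abs(pi[x] * A(x, y) - pi[y] * A(y, x)) < 1e-12
               for x in range(3) for y in range(3) if x != y)
print(balanced)  # True
```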

Page 7

General Markov Chain Simulation

• Question: how to simulate from a target distribution π(X) via a Markov chain?
• Key: find a transition function A(X, Y) so that $f_0 A^n \to \pi$; that is, π is an invariant distribution of A.
• This is different from traditional Markov chain theory, where the chain is given and its stationary distribution is sought; here the stationary distribution is given and the chain must be constructed.

Page 8

If the actual transition probability is of the form
$$\pi(y)\,\delta(x, y),$$
where δ(x, y) is a symmetric function of x and y, then the chain has π(x) as its invariant distribution. The Metropolis-Hastings kernel has this form:
$$T(x,y)\min\left\{1, \frac{\pi(y)T(y,x)}{\pi(x)T(x,y)}\right\} = \pi(y)\min\left\{\frac{T(x,y)}{\pi(y)},\ \frac{T(y,x)}{\pi(x)}\right\},$$
and the min{·} factor is symmetric in x and y.

I learned it from Stein.

Page 9

• The moves are very "local."
• The chain tends to be trapped in a local mode.

Example target, a normal mixture with two well-separated modes:
$$\pi(x) = \tfrac{1}{2}\,\mathrm{N}(x;\, 0,\, 1) + \tfrac{1}{2}\,\mathrm{N}(x;\, 5,\, 0.25^2).$$
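A tiny simulation shows the trapping effect: with a small-step random walk on a well-separated bimodal target (the exact target, step size, and run length here are my own illustrative choices), the chain never leaves its starting mode:

```python
import math
import random

def log_pi(x):
    """Equal mixture of N(0,1) and N(10,1): two far-apart modes."""
    return math.log(0.5 * math.exp(-0.5 * x * x)
                    + 0.5 * math.exp(-0.5 * (x - 10.0) ** 2))

random.seed(4)
x = 0.0                                # start in the mode at 0
crossed = False
for _ in range(20_000):
    y = x + random.uniform(-0.5, 0.5)  # very "local" proposal
    if math.log(random.random()) < log_pi(y) - log_pi(x):
        x = y
    crossed = crossed or x > 5.0
print(crossed)  # False: the chain stays trapped near 0
```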

Page 10

Other Approaches?

• Gibbs sampler / heat bath: better or worse?
• Random directional search: should be better if we can do it ("hit-and-run").
• Adaptive direction sampling (ADS) (Gilks, Roberts and George, 1994).

(Figure: multiple chains at iteration t, with the current point $x_c$ and anchor $x_a$.)

Page 11

Gibbs Sampler / Heat Bath

• Define a "neighborhood" structure N(x)
  – It can be a line, a subspace, the trace of a group, etc.
• Sample from the conditional distribution.
• Conditional move (along a chosen direction):
$$x_{\mathrm{new}} \sim p\big(x \mid x \in N(x_{\mathrm{old}})\big).$$
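A classical instance of such conditional moves is the two-coordinate Gibbs sampler for a bivariate normal, where each conditional distribution is available in closed form (the correlation value below is an arbitrary test choice):

```python
import random

# Gibbs sampler for (X, Y) standard bivariate normal with correlation rho:
# each conditional move redraws one coordinate given the other.
random.seed(5)
rho = 0.8
cond_sd = (1.0 - rho * rho) ** 0.5
x, y = 0.0, 0.0
xs = []
for _ in range(50_000):
    x = random.gauss(rho * y, cond_sd)   # X | Y=y ~ N(rho*y, 1 - rho^2)
    y = random.gauss(rho * x, cond_sd)   # Y | X=x ~ N(rho*x, 1 - rho^2)
    xs.append(x)
mean_x = sum(xs) / len(xs)
print(abs(mean_x) < 0.1)
```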

Page 12

How to sample along a line?

• What is the correct conditional distribution?
  – Random direction: propose $x' = x + t\,r$, where $t \sim p(t) \propto \pi(x + t\,r)$.
  – Directions chosen a priori: the same as above.
  – In ADS? The move is $x_c \to x'_c = x_c + t\,(x_c - x_a)$, and the correct density is
$$p(t) \propto |t|^{d-1}\,\pi(x_a + t\,r).$$

Page 13

The Snooker Theorem

• Suppose $x \sim \pi$ and y is any point in the d-dimensional space (the "anchor"). Let $r = (x - y)/|x - y|$. If t is drawn from
$$p(t) \propto |t|^{d-1}\,\pi(y + t\,r),$$
then
$$x' = y + t\,r$$
follows the target distribution π.

If y is generated from a distribution, the new point x' is independent of y.
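One crude way to use the theorem is to discretize t on a grid and sample proportionally to the weights $|t|^{d-1}\pi(y + t\,r)$; everything below (the grid, the 2-d normal target) is an illustrative sketch of mine, not the authors' implementation:

```python
import math
import random

def snooker_t(log_pi, y, r, d, t_grid):
    """Draw t approximately from p(t) ∝ |t|^(d-1) * pi(y + t*r)
    by normalizing the density over a finite grid of t values."""
    w = [abs(t) ** (d - 1)
         * math.exp(log_pi([y[k] + t * r[k] for k in range(d)]))
         for t in t_grid]
    u, acc = random.random() * sum(w), 0.0
    for t, wt in zip(t_grid, w):
        acc += wt
        if u <= acc:
            return t
    return t_grid[-1]

# Sketch: target N_2(0, I), anchor y = 0, unit direction r = (1, 0).
random.seed(6)
log_pi = lambda v: -0.5 * (v[0] ** 2 + v[1] ** 2)
grid = [i * 0.1 - 5.0 for i in range(101)]
ts = [snooker_t(log_pi, [0.0, 0.0], [1.0, 0.0], 2, grid)
      for _ in range(2_000)]
avg_abs_t = sum(abs(t) for t in ts) / len(ts)
print(0.9 < avg_abs_t < 1.6)  # here |t| is Rayleigh-distributed, E|t| ≈ 1.25
```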

Page 14

Connection with Transformation Groups

• WLOG, we let y = 0.
• The move is now: $x \to x' = t\,x$.
• The set $\{t : t \ne 0\}$ forms a transformation group.
• Liu and Wu (1999) show that if t is drawn from
$$p(t) \propto \pi(t\,x)\,|J_t(x)|\,H(dt),$$
where H is the Haar measure of the group and $J_t(x)$ the Jacobian of the transformation, then the move is invariant with respect to π.

Page 15

Another Hurdle

• How to draw from something like
$$p(t) \propto |t|^{d-1}\,\pi(y + t\,r)\,?$$
• Adaptive rejection? Approximation? Griddy Gibbs?
• The M-H Independence Sampler (Hastings, 1970)
  – needs a proposal that is close enough to p(t).

Page 16

• Propose bigger jumps: they may be rejected too often.
• Use proposals with mixed step sizes.
• Try multiple times and select good one(s) (the "bridging effect"; Frenkel & Smit, 1996).
• Is it still a valid MCMC algorithm?

Page 17

• The current state is at x.
• Draw $y_1, \ldots, y_k$ from the proposal $T(x, \cdot)$ (they can be dependent draws).
• Select $Y = y_j$ with probability proportional to $\pi(y_j)\,T(y_j, x)$.
• Draw $x^*_1, \ldots, x^*_{k-1}$ from $T(Y, \cdot)$; let $x^*_k = x$.
• Accept the proposed Y with probability
$$p = \min\left\{1,\ \frac{\pi(y_1)T(y_1, x) + \cdots + \pi(y_k)T(y_k, x)}{\pi(x^*_1)T(x^*_1, Y) + \cdots + \pi(x^*_k)T(x^*_k, Y)}\right\}.$$
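One step of this multiple-try scheme can be sketched for the symmetric-proposal case, where the selection weight reduces to π(y); the bimodal target, step size, and k below are my own illustrative choices:

```python
import math
import random

def mtm_step(pi, x, k=5, step=3.0):
    """One multiple-try Metropolis step with a symmetric random-walk T:
    select y_j with probability ∝ pi(y_j), then accept with ratio
    (pi(y_1)+...+pi(y_k)) / (pi(x*_1)+...+pi(x*_k))."""
    ys = [x + random.uniform(-step, step) for _ in range(k)]
    wy = [pi(v) for v in ys]
    total = sum(wy)
    if total == 0.0:
        return x
    u, acc, y = random.random() * total, 0.0, ys[-1]
    for v, w in zip(ys, wy):
        acc += w
        if u <= acc:
            y = v
            break
    # Reference set: x*_1, ..., x*_{k-1} ~ T(y, .), and x*_k = x.
    xs = [y + random.uniform(-step, step) for _ in range(k - 1)] + [x]
    ratio = total / sum(pi(v) for v in xs)
    return y if random.random() < min(1.0, ratio) else x

# Illustrative bimodal target: MTM hops between the modes at 0 and 4.
random.seed(7)
pi = lambda v: math.exp(-0.5 * v * v) + math.exp(-0.5 * (v - 4.0) ** 2)
x, chain = 0.0, []
for _ in range(40_000):
    x = mtm_step(pi, x)
    chain.append(x)
frac_upper = sum(1 for v in chain if v > 2.0) / len(chain)
print(0.25 < frac_upper < 0.75)  # both modes are visited
```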

Page 18

A Modification

• If T(x, y) is symmetric, we can use a simpler acceptance probability:
$$p = \min\left\{1,\ \frac{\pi(y_1) + \cdots + \pi(y_k)}{\pi(x^*_1) + \cdots + \pi(x^*_k)}\right\}.$$

Ref: Frenkel and Smit (1996).

Page 19

Random-Ray Monte Carlo:

• Propose a random direction through the current point x.
• Pick y from the candidates $y_1, \ldots, y_5$ along the ray.
• Correct for the MTM bias.

(Figure: the current point x with candidate points $y_1, \ldots, y_5$ along a random ray.)

Back to the example.

Page 20

An Interesting Twist

• One can choose the multiple tries semi-deterministically: place a randomly positioned, equally spaced grid along the chosen direction, yielding tries $y_1, \ldots, y_8$ around x and reference points $x^*_1, \ldots, x^*_8$ around y.
• Pick y from $y_1, \ldots, y_8$.
• The correction rule is the same:
$$p = \min\left\{1,\ \frac{\pi(y_1) + \cdots + \pi(y_8)}{\pi(x^*_1) + \cdots + \pi(x^*_8)}\right\}.$$

Page 21

Use Local Optimization in MCMC

• The ADS formulation is powerful, but its direction is too "random."
• How to make use of their framework?
  – Keep a population of samples $S_t = \{x_t^{(1)}, \ldots, x_t^{(m)}\}$.
  – Randomly select one, $x_t^{(c)}$, to be updated.
  – Use the rest to determine an "anchor point" $y_t$.
• Here we can use local optimization techniques;
• Use MTM to draw a sample along the line, with the help of the Snooker Theorem.

Page 22

(Figure: a distribution contour with the current point $x_c$ and the anchor point $x_a$, found along a gradient or conjugate-gradient direction.)

Page 23

Numerical Examples

• An easy multimodal problem:
$$\pi(x) = \tfrac{1}{3}\,\mathrm{N}_2(0, I_2) + \tfrac{1}{3}\,\mathrm{N}_2\!\left(\begin{pmatrix}4\\4\end{pmatrix}, \Sigma\right) + \tfrac{1}{3}\,\mathrm{N}_2\!\left(\begin{pmatrix}6\\6\end{pmatrix}, \Sigma\right), \qquad \Sigma = \begin{pmatrix}1 & 0.9\\ 0.9 & 1\end{pmatrix}.$$

Page 24

Page 25

A More Difficult Test Example

• Mixture of 2 Gaussians:
$$\pi(x) = \tfrac{1}{3}\,\mathrm{N}_2(0, I_2) + \tfrac{2}{3}\,\mathrm{N}_2\!\left(\begin{pmatrix}5\\5\end{pmatrix}, I_2\right).$$
• MTM with conjugate-gradient (CG) directions can sample the distribution.
• The Random-Ray method also worked well.
• The standard Metropolis algorithm cannot get across.

Page 26

Fitting a Mixture Model

• Likelihood (3-component normal mixture):
$$L(\theta \mid y_1, \ldots, y_n) = \prod_{i=1}^{n} \left\{ \sum_{j=1}^{3} \frac{p_j}{\sigma_j}\,\phi\!\left(\frac{y_i - \mu_j}{\sigma_j}\right) \right\}.$$
• Prior: uniform in all the parameters $(\mu_j, \log\sigma_j, p_j)$, $j = 1, 2, 3$, but with the constraints $\mu_1 \le \mu_2 \le \mu_3$ and $\sigma_j \ge \sigma_{\min}$, and each group has at least one data point.
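The likelihood above is straightforward to evaluate in log form; the data points and parameter values below are hypothetical, chosen only to exercise the formula:

```python
import math

def mixture_loglik(y, p, mu, sigma):
    """Log of L = prod_i { sum_j (p_j / sigma_j) * phi((y_i - mu_j)/sigma_j) }
    for a 3-component normal mixture (phi = standard normal density)."""
    ll = 0.0
    for yi in y:
        dens = sum(pj / sj * math.exp(-0.5 * ((yi - mj) / sj) ** 2)
                   / math.sqrt(2.0 * math.pi)
                   for pj, mj, sj in zip(p, mu, sigma))
        ll += math.log(dens)
    return ll

# Hypothetical data and parameter values, just to exercise the function.
y = [-2.1, 0.3, 0.2, 5.0, 4.8]
ll = mixture_loglik(y, p=[0.2, 0.5, 0.3], mu=[-2.0, 0.0, 5.0],
                    sigma=[0.5, 0.5, 0.5])
print(ll < 0.0)
```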

Page 27

Page 28

Bayesian Neural Network Training

• Setting: Data $= \{(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)\}$.
• Nonlinear curve fitting: $y_t = f(x_t) + \epsilon_t$.
• 1-hidden-layer feed-forward NN model:
$$\hat f(x_t) = \sum_{j=1}^{M} \beta_j\,\psi(\alpha_j^T x_t).$$
(Network diagram: inputs $x_1, \ldots, x_p$, hidden units $h_1, \ldots, h_M$, output y.)
• Objective function for optimization (the posterior):
$$P(\theta \mid \text{Data}) \propto \prod_t \mathrm{N}\big(y_t \mid \hat f(x_t), \sigma^2\big) \times \mathrm{N}(\alpha \mid 0, \sigma_\alpha^2)\,\mathrm{N}(\beta \mid 0, \sigma_\beta^2).$$
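The model piece $\hat f(x) = \sum_j \beta_j\,\psi(\alpha_j^T x)$ with ψ = tanh is a one-liner; the weights below are made-up values, and bias terms are omitted in this sketch:

```python
import math

def nn_predict(x, alpha, beta):
    """1-hidden-layer feed-forward net:
    f_hat(x) = sum_j beta_j * tanh(alpha_j . x)."""
    return sum(b * math.tanh(sum(a_i * x_i for a_i, x_i in zip(a, x)))
               for a, b in zip(alpha, beta))

# M = 2 hidden units, p = 1 input, with made-up weights.
out = nn_predict([0.5], alpha=[[1.0], [-2.0]], beta=[0.7, 0.3])
print(-1.0 < out < 1.0)  # |output| < sum_j |beta_j| = 1, since |tanh| < 1
```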

Page 29

Liang and Wong (1999) proposed a method that combines the snooker theorem, MTM, exchange MC, and genetic algorithm.

Activation function: tanh(z); number of hidden units: M = 2.

Page 30