04/22/23 MCMC and Statistics
Jun Liu, Department of Statistics
Stanford University
Based on joint work with F. Liang and W.H. Wong.
The Basic Problems of Monte Carlo

• Draw a random variable X ~ π(x).
• Estimate the integral
      I = ∫ f(x) π(x) dx = E_π[f(X)],
  sometimes with an unknown normalizing constant for π.
How to Sample from π(x)

• The Inversion Method. If U ~ Unif(0,1), then X = F⁻¹(U) ~ π, where F is the cdf of π.
• The Rejection Method.
  – Generate x from an "envelope" distribution g(x), chosen so that c·g(x) ≥ π(x) for all x;
  – Draw u from Unif(0,1);
  – Accept x if u < π(x)/(c·g(x));
  – The accepted x follows π(x).
[Figure: the envelope c·g(x) lying above π(x); a draw is accepted when u·c·g(x) falls under π(x).]
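As an illustration, here is a minimal rejection sampler in Python. The Beta(2,2) target π(x) = 6x(1-x) and the Unif(0,1) envelope with c = 1.5 are assumptions chosen for this sketch; c works because max π = 6·(0.25) = 1.5, so c·g(x) ≥ π(x) everywhere on [0, 1].

```python
import random

def target_pdf(x):
    # Example target on [0, 1]: the Beta(2, 2) density, pi(x) = 6 x (1 - x)
    return 6.0 * x * (1.0 - x)

def rejection_sample(n, c=1.5):
    """Rejection method with envelope g = Unif(0, 1), so g(x) = 1 on [0, 1]
    and c * g(x) >= pi(x) holds for c = 1.5."""
    samples = []
    while len(samples) < n:
        x = random.random()                  # generate x from g(x)
        u = random.random()                  # draw u from Unif(0, 1)
        if u < target_pdf(x) / (c * 1.0):    # accept x if u < pi(x) / (c g(x))
            samples.append(x)                # the accepted x follows pi(x)
    return samples
```

The expected acceptance rate is 1/c = 2/3; a larger c still gives correct draws but wastes proposals.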
High-Dimensional Problems?

Ising Model: configurations x = (x₁, ..., x_d) on the lattice points, x_i ∈ {0, 1}.

      π(X) = (1/Z) exp{-Eng(X)},   where Eng(X) ~ -(1/T) Σ_{i~j} x_i x_j,

and Z is the partition function (the normalizing constant, typically intractable).

Metropolis Algorithm:
(a) pick a lattice point, say i, at random;
(b) change the current x_i to 1 - x_i (so X(t) → X*);
(c) compute r = π(X*)/π(X(t));
(d) make the acceptance/rejection decision (accept X* with probability min{1, r}).
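Steps (a)-(d) can be sketched as below. The 0/1 values, free boundary conditions, and the specific energy Eng(X) = -(1/T)·Σ_{i~j} x_i x_j are assumptions made for this illustration; note that the ratio r = π(X*)/π(X(t)) never requires the partition function Z.

```python
import math
import random

def metropolis_ising(L=8, T=2.5, sweeps=50, seed=1):
    """Single-site Metropolis for an L x L Ising-type model with x_i in {0, 1}
    and free boundaries, using the (assumed) energy -(1/T) sum_{i~j} x_i x_j."""
    rng = random.Random(seed)
    x = [[rng.randint(0, 1) for _ in range(L)] for _ in range(L)]
    for _ in range(sweeps * L * L):
        # (a) pick a lattice point at random
        i, j = rng.randrange(L), rng.randrange(L)
        nb = 0  # sum of neighboring values (free boundary conditions)
        if i > 0: nb += x[i - 1][j]
        if i < L - 1: nb += x[i + 1][j]
        if j > 0: nb += x[i][j - 1]
        if j < L - 1: nb += x[i][j + 1]
        # (b)-(c) flipping x_ij -> 1 - x_ij changes the energy by
        # dE = (2 * x_ij - 1) * nb, so r = pi(X*)/pi(X(t)) = exp(-dE / T)
        dE = (2 * x[i][j] - 1) * nb
        # (d) accept the flip with probability min(1, r)
        if dE <= 0 or rng.random() < math.exp(-dE / T):
            x[i][j] = 1 - x[i][j]
    return x
```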
General Metropolis-Hastings Recipe

• Start with any X(0) = x₀ and a "proposal chain" T(x, y).
• Suppose X(t) = x_t. At time t+1:
  – Draw y ~ T(x_t, ·) (i.e., propose a move for the next step).
  – Compute the Metropolis ratio (or "goodness" ratio)
        r = [π(y) T(y, x_t)] / [π(x_t) T(x_t, y)].
  – Acceptance/rejection decision: let p = min{1, r}, and set
        X(t+1) = y with probability p;  X(t+1) = x_t with probability 1 - p.
    The rejection step "thins down" the proposal.
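A minimal sketch of the recipe, assuming a symmetric uniform random-walk proposal (so T(y, x) = T(x, y) and the ratio reduces to r = π(y)/π(x_t)) and a 1-D standard-normal target; both choices are illustrative, not part of the general recipe.

```python
import math
import random

def metropolis_hastings(log_pi, x0, step=1.0, n=5000, seed=0):
    """Random-walk Metropolis-Hastings for a 1-D target given by log_pi.
    The proposal y ~ Unif(x - step, x + step) is symmetric, so
    r = pi(y) / pi(x_t)."""
    rng = random.Random(seed)
    x, lp = x0, log_pi(x0)
    chain = []
    for _ in range(n):
        y = x + step * (2.0 * rng.random() - 1.0)  # draw y ~ T(x_t, .)
        lpy = log_pi(y)
        # accept with probability min{1, r}; otherwise keep x_t ("thinning down")
        if math.log(rng.random()) < lpy - lp:
            x, lp = y, lpy
        chain.append(x)
    return chain
```

Usage: `metropolis_hastings(lambda x: -0.5 * x * x, x0=0.0)` targets the standard normal, since log π(x) = -x²/2 up to an additive constant that cancels in the ratio.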
• The detailed balance. For x ≠ y, the actual transition probability from x to y is A(x, y) = T(x, y) min{1, [π(y)T(y, x)]/[π(x)T(x, y)]}, and

      π(x) A(x, y) = π(x) T(x, y) min{1, [π(y)T(y, x)]/[π(x)T(x, y)]}
                   = min{π(x)T(x, y), π(y)T(y, x)}
                   = π(y) T(y, x) min{[π(x)T(x, y)]/[π(y)T(y, x)], 1}
                   = π(y) A(y, x),

  where A(y, x) is the transition probability from y to x. Hence π is invariant.
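On a finite state space the detailed-balance identity can be checked numerically. The helper below builds the full M-H transition matrix A from a target π and a proposal matrix T; the three-state target and uniform proposal in the test are arbitrary illustrative choices.

```python
def mh_transition_matrix(pi, T):
    """Metropolis-Hastings transition matrix A for a finite state space.
    Off-diagonal: A[x][y] = T[x][y] * min(1, pi[y]*T[y][x] / (pi[x]*T[x][y]));
    the diagonal absorbs the rejected probability mass."""
    n = len(pi)
    A = [[0.0] * n for _ in range(n)]
    for x in range(n):
        for y in range(n):
            if x != y and T[x][y] > 0.0:
                r = (pi[y] * T[y][x]) / (pi[x] * T[x][y])
                A[x][y] = T[x][y] * min(1.0, r)
        A[x][x] = 1.0 - sum(A[x])  # rejected mass stays at x
    return A
```

For any valid π and T, one can verify both π(x)A(x, y) = π(y)A(y, x) and the invariance Σ_x π(x)A(x, y) = π(y) to machine precision.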
General Markov Chain Simulation

• Question: how do we simulate from a target distribution π(X) via a Markov chain?
• Key: find a transition function A(X, Y) so that f₀Aⁿ → π;
  that is, π is an invariant distribution of A.
• This is different from traditional Markov chain theory, where A is given and its stationary distribution is sought.
• If the actual transition probability can be written as
      A(x, y) = π(y) δ(x, y),   where δ(x, y) is a symmetric function of x and y,
  then the chain has π(x) as its invariant distribution.
• The M-H kernel is of this form:
      T(x, y) min{1, [π(y)T(y, x)]/[π(x)T(x, y)]} = π(y) min{T(x, y)/π(y), T(y, x)/π(x)},
  and the minimum on the right-hand side is symmetric in x and y.
• I learnt it from Stein.
• The moves are very "local."
• They tend to be trapped in a local mode. Example: a well-separated two-component mixture such as
      π(x) ∝ e^{-x²/2} + e^{-(x-5)²/(2·0.25²)},
  where a random-walk Metropolis chain with a small stepsize rarely crosses between the modes at 0 and 5.
Other Approaches?

• Gibbs sampler / heat bath: better or worse?
• Random directional search --- should be better if we can do it ("hit-and-run").
• Adaptive directional sampling (ADS) (Gilks, Roberts and George, 1994): run multiple chains in parallel; at iteration t, update a randomly chosen chain x_c along a direction through an anchor point x_a determined by the other chains.
Gibbs Sampler / Heat Bath

• Define a "neighborhood" structure N(x) --- it can be a line, a subspace, the trace of a group, etc.
• Conditional move: sample from the conditional distribution along a chosen direction,
      x_new ~ p(x | x ∈ N(x_old)).
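A concrete instance of the conditional move, assuming the neighborhoods are the two coordinate lines of a bivariate normal with unit variances and correlation rho; for this target the full conditionals are available in closed form, x₁ | x₂ ~ N(ρx₂, 1-ρ²) and symmetrically for x₂.

```python
import math
import random

def gibbs_bivariate_normal(rho=0.9, n=5000, seed=0):
    """Gibbs sampler / heat bath for a bivariate normal with unit variances
    and correlation rho. Each conditional move samples one coordinate from
    its full conditional distribution along the corresponding coordinate line."""
    rng = random.Random(seed)
    s = math.sqrt(1.0 - rho * rho)  # conditional standard deviation
    x1 = x2 = 0.0
    out = []
    for _ in range(n):
        x1 = rng.gauss(rho * x2, s)  # draw from p(x1 | x2)
        x2 = rng.gauss(rho * x1, s)  # draw from p(x2 | x1)
        out.append((x1, x2))
    return out
```

Because each coordinate is drawn exactly from its conditional, every move is accepted; no Metropolis correction is needed.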
How to Sample Along a Line?

• What is the correct conditional distribution?
  – Random direction r: propose x' = x + t·r with t drawn from p(t) ∝ π(x + t·r).
  – Directions chosen a priori: the same as above.
  – In ADS? The chain moves as x_c → x_c' = x_a + t·r, with the direction r determined by (x_c, x_a), and t must be drawn from p(t) ∝ |t|^{d-1} π(x_a + t·r).
The Snooker Theorem

• Suppose x ~ π and y is any point in the d-dim space. Let r = (x - y)/|x - y|. If t is drawn from
      p(t) ∝ |t|^{d-1} π(y + t·r),
  then x' = y + t·r follows the target distribution π.
• If y is generated from a distribution, the new point x' is independent of y.
[Figure: the line through the anchor y and the current point x, with density p(t) along the line.]
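One way to draw t in practice is a griddy approximation (an assumption made here, not a method from the slides): evaluate p(t) ∝ |t|^{d-1} π(y + t·r) on a grid of t values and sample from the normalized weights. The one-sided grid t ∈ (0, 20] is a further simplification; the theorem allows negative t as well.

```python
import math
import random

def snooker_step(x, y, log_pi, d, rng, t_grid=None):
    """One snooker move: sample t from p(t) proportional to
    |t|^(d-1) * pi(y + t r) on a discrete grid (griddy approximation),
    then return x' = y + t r, where r is the unit vector from y through x."""
    norm = math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))
    r = [(xi - yi) / norm for xi, yi in zip(x, y)]
    if t_grid is None:
        t_grid = [0.05 * k for k in range(1, 400)]  # t in (0, 20], an assumption
    # log weights: (d-1) log|t| + log pi(y + t r)
    logw = [(d - 1) * math.log(abs(t))
            + log_pi([yi + t * ri for yi, ri in zip(y, r)]) for t in t_grid]
    m = max(logw)
    w = [math.exp(v - m) for v in logw]  # subtract max for numerical stability
    u = rng.random() * sum(w)
    acc = 0.0
    for t, wi in zip(t_grid, w):
        acc += wi
        if acc >= u:
            return [yi + t * ri for yi, ri in zip(y, r)]
    return [yi + t_grid[-1] * ri for yi, ri in zip(y, r)]
```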
Connection with Transformation Groups

• WLOG, we let y = 0. The move is now x → x' = t·x.
• The set {t : t ≠ 0} forms a transformation group.
• Liu and Wu (1999) show that if t is drawn from
      p(t) dt ∝ π(t·x) |J_t(x)| H(dt),
  where J_t is the Jacobian of the transformation and H is the left-Haar measure of the group, then the move is invariant with respect to π.
Another Hurdle

• How to draw from something like
      p(t) ∝ |t|^{d-1} π(y + t·r)?
• Adaptive rejection? Approximation? Griddy Gibbs?
• The M-H Independence Sampler (Hastings, 1970):
  – one needs to draw from something that is close enough to p(t).
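A sketch of the independence sampler: proposals come from a fixed density g, independent of the current state, and the move is accepted with probability min{1, π(y)g(x)/(π(x)g(y))}. Normalizing constants of π and g cancel in the ratio, so unnormalized log densities suffice. The N(0, 2²) proposal for a N(0, 1) target in the test is an illustrative assumption.

```python
import math
import random

def independence_sampler(log_pi, log_g, draw_g, n=5000, seed=0):
    """M-H independence sampler: every proposal y is drawn from the fixed
    density g (ideally close to the target), and accepted with probability
    min{1, pi(y) g(x) / (pi(x) g(y))}. log_pi and log_g may be unnormalized."""
    rng = random.Random(seed)
    x = draw_g(rng)
    chain = []
    for _ in range(n):
        y = draw_g(rng)
        log_r = (log_pi(y) + log_g(x)) - (log_pi(x) + log_g(y))
        if math.log(rng.random()) < log_r:
            x = y
        chain.append(x)
    return chain
```

The closer g is to π, the higher the acceptance rate; if g has thinner tails than π, the chain can stick badly in the tails.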
• Propose bigger jumps --- they may be rejected too often.
• Use proposals with mixed stepsizes.
• Try multiple times and select good one(s) (the "bridging effect"; Frenkel & Smit, 1996). Is it still a valid MCMC algorithm?
Multiple-Try Metropolis (the current state is x)

• Draw y₁, ..., y_k from the proposal T(x, ·) (the tries can be dependent).
• Select Y = y_j with probability proportional to π(y_j) T(y_j, x).
• Draw x₁*, ..., x_{k-1}* from T(y_j, ·); let x_k* = x.
• Accept the proposed y_j with probability

      p = min{1, [π(y₁)T(y₁, x) + ... + π(y_k)T(y_k, x)] / [π(x₁*)T(x₁*, y_j) + ... + π(x_k*)T(x_k*, y_j)]}.
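The steps above can be sketched as follows, assuming a Gaussian random-walk proposal T(x, y) = N(y; x, step²) and independent tries; k and step are illustrative choices, not prescribed by the method.

```python
import math
import random

def mtm_step(x, log_pi, step=1.0, k=5, rng=None):
    """One multiple-try Metropolis step with proposal T(x, y) = N(y; x, step^2)
    and weights w(y) = pi(y) T(y, x)."""
    rng = rng or random.Random()

    def log_T(a, b):  # log T(a, b) = log density of N(b; a, step^2)
        return -0.5 * ((b - a) / step) ** 2 - math.log(step * math.sqrt(2 * math.pi))

    # draw y_1, ..., y_k from T(x, .)
    ys = [x + rng.gauss(0.0, step) for _ in range(k)]
    wy = [math.exp(log_pi(y) + log_T(y, x)) for y in ys]
    # select Y = y_j with probability proportional to pi(y_j) T(y_j, x)
    u, acc, j = rng.random() * sum(wy), 0.0, k - 1
    for idx, w in enumerate(wy):
        acc += w
        if acc >= u:
            j = idx
            break
    yj = ys[j]
    # reference set: x_1*, ..., x_{k-1}* ~ T(yj, .), and x_k* = x
    xs = [yj + rng.gauss(0.0, step) for _ in range(k - 1)] + [x]
    wx = [math.exp(log_pi(xi) + log_T(xi, yj)) for xi in xs]
    # accept yj with probability min{1, sum_i w(y_i) / sum_i w(x_i*)}
    if rng.random() < min(1.0, sum(wy) / sum(wx)):
        return yj
    return x
```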
A Modification

• If T(x, y) is symmetric, we can use a different rejection probability:

      p = min{1, [π(y₁) + ... + π(y_k)] / [π(x₁*) + ... + π(x_k*)]}.

Ref: Frenkel and Smit (1996).
Random-Ray Monte Carlo

• Propose a random direction through the current point x.
• Pick y from candidate points y₁, ..., y₅ placed along the ray.
• Correct for the MTM bias.

Back to the example.
An Interesting Twist

• One can choose the multiple tries semi-deterministically: place random, equally spaced grid points y₁, ..., y₈ on one side of the current point x, and reference points x₁*, ..., x₈* on the other.
• Pick y from y₁, ..., y₈.
• The correction rule is the same:

      p = min{1, [π(y₁) + ... + π(y₈)] / [π(x₁*) + ... + π(x₈*)]}.
Use Local Optimization in MCMC

• The ADS formulation is powerful, but its direction is too "random." How can we make use of their framework?
  – Keep a population of samples S_t = {x_t^(1), ..., x_t^(m)}.
  – Randomly select one member, x_t^(c), to be updated.
  – Use the rest to determine an "anchor point" y_t; here we can use local optimization techniques.
• Use MTM to draw a sample along the line, with the help of the Snooker Theorem.
[Figure: a distribution contour; from the anchor point x_a, a gradient or conjugate-gradient direction points through the current point x_c.]
Numerical Examples

• An easy multimodal problem: a mixture of three bivariate Gaussians,
      π(x) = (1/3) N₂(0, I₂) + (1/3) N₂((6, 6)ᵀ, Σ) + (1/3) N₂((4, 4)ᵀ, Σ),   Σ = [1 .9; .9 1].
A More Difficult Test Example

• Mixture of 2 Gaussians:
      π(x) = (1/3) N₂(0, I₂) + (2/3) N₂((5, 5)ᵀ, I₂).
• MTM with a conjugate-gradient direction can sample the distribution.
• The Random-Ray method also worked well.
• The standard Metropolis chain cannot get across.
Fitting a Mixture Model

• Likelihood (an n-point sample from a 3-component normal mixture):

      L(y₁, ..., y_n | μ, σ, p) = Π_{i=1}^n { Σ_{j=1}^3 (p_j / σ_j) φ((y_i - μ_j)/σ_j) }

• Prior: uniform in all parameters, but subject to constraints on (log σ₁, log σ₂, log σ₃) and (p₁, p₂, p₃) (e.g., μ₁ < μ₂ < μ₃, with σ_j and p_j bounded below), and each group has at least one data point.
Bayesian Neural Network Training

• Setting: Data = {(x₁, y₁), (x₂, y₂), ..., (x_n, y_n)}.
• Nonlinear curve fitting: y_t = f(x_t) + ε_t.
• 1-hidden-layer feed-forward NN model:
      f̂(x_t) = Σ_{j=1}^M β_j ψ(α_jᵀ x_t).
  [Figure: network with inputs x₁, ..., x_p, hidden units h₁, ..., h_M, and output y.]
• Objective function for optimization (the posterior):
      P ∝ Π_t N(y_t | f̂(x_t), σ²) · g(σ²) · N(α | 0, σ_α²) N(β | 0, σ_β²).
Liang and Wong (1999) proposed a method that combines the snooker theorem, MTM, exchange MC, and genetic algorithm.
Activation function: tanh(z); number of hidden units: M = 2.