04/22/23 MCMC and Statistics
Jun Liu, Department of Statistics
Stanford University
Based on joint work with F. Liang and W.H. Wong.
The Basic Problems of Monte Carlo

• Draw a random variable X ~ π(x).
• Estimate the integral
      I = ∫ f(x) π(x) dx = E_π[f(X)],
  sometimes with an unknown normalizing constant for π.
How to Sample from π(x)

• The Inversion Method. If U ~ Unif(0,1), then X = F⁻¹(U) ~ π, where F is the cdf of π.
• The Rejection Method.
  – Generate x from an "envelope" distribution g(x), chosen so that c·g(x) ≥ π(x) for all x;
  – Draw u from Unif(0,1);
  – Accept x if u < π(x)/(c·g(x));
  – The accepted x follows π(x).
[Figure: the envelope c·g(x) lying above π(x); a draw is accepted when u·c·g(x) falls under π(x).]
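As an illustration, here is a minimal rejection sampler in Python. The Beta(2,2) target π(x) = 6x(1-x) and the Unif(0,1) envelope with c = 1.5 are assumptions chosen for this sketch; c works because max π = 6·(0.25) = 1.5, so c·g(x) ≥ π(x) everywhere on [0, 1].

```python
import random

def target_pdf(x):
    # Example target on [0, 1]: the Beta(2, 2) density, pi(x) = 6 x (1 - x)
    return 6.0 * x * (1.0 - x)

def rejection_sample(n, c=1.5):
    """Rejection method with envelope g = Unif(0, 1), so g(x) = 1 on [0, 1]
    and c * g(x) >= pi(x) holds for c = 1.5."""
    samples = []
    while len(samples) < n:
        x = random.random()                  # generate x from g(x)
        u = random.random()                  # draw u from Unif(0, 1)
        if u < target_pdf(x) / (c * 1.0):    # accept x if u < pi(x) / (c g(x))
            samples.append(x)                # the accepted x follows pi(x)
    return samples
```

The expected acceptance rate is 1/c = 2/3; a larger c still gives correct draws but wastes proposals.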
High-Dimensional Problems?

Ising Model: configurations x = (x₁, ..., x_d) on the lattice points, x_i ∈ {0, 1}.

      π(X) = (1/Z) exp{-Eng(X)},   where Eng(X) ~ -(1/T) Σ_{i~j} x_i x_j,

and Z is the partition function (the normalizing constant, typically intractable).

Metropolis Algorithm:
(a) pick a lattice point, say i, at random;
(b) change the current x_i to 1 - x_i (so X(t) → X*);
(c) compute r = π(X*)/π(X(t));
(d) make the acceptance/rejection decision (accept X* with probability min{1, r}).
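Steps (a)-(d) can be sketched as below. The 0/1 values, free boundary conditions, and the specific energy Eng(X) = -(1/T)·Σ_{i~j} x_i x_j are assumptions made for this illustration; note that the ratio r = π(X*)/π(X(t)) never requires the partition function Z.

```python
import math
import random

def metropolis_ising(L=8, T=2.5, sweeps=50, seed=1):
    """Single-site Metropolis for an L x L Ising-type model with x_i in {0, 1}
    and free boundaries, using the (assumed) energy -(1/T) sum_{i~j} x_i x_j."""
    rng = random.Random(seed)
    x = [[rng.randint(0, 1) for _ in range(L)] for _ in range(L)]
    for _ in range(sweeps * L * L):
        # (a) pick a lattice point at random
        i, j = rng.randrange(L), rng.randrange(L)
        nb = 0  # sum of neighboring values (free boundary conditions)
        if i > 0: nb += x[i - 1][j]
        if i < L - 1: nb += x[i + 1][j]
        if j > 0: nb += x[i][j - 1]
        if j < L - 1: nb += x[i][j + 1]
        # (b)-(c) flipping x_ij -> 1 - x_ij changes the energy by
        # dE = (2 * x_ij - 1) * nb, so r = pi(X*)/pi(X(t)) = exp(-dE / T)
        dE = (2 * x[i][j] - 1) * nb
        # (d) accept the flip with probability min(1, r)
        if dE <= 0 or rng.random() < math.exp(-dE / T):
            x[i][j] = 1 - x[i][j]
    return x
```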
General Metropolis-Hastings Recipe

• Start with any X(0) = x₀ and a "proposal chain" T(x, y).
• Suppose X(t) = x_t. At time t+1:
  – Draw y ~ T(x_t, ·) (i.e., propose a move for the next step).
  – Compute the Metropolis ratio (or "goodness" ratio)
        r = [π(y) T(y, x_t)] / [π(x_t) T(x_t, y)].
  – Acceptance/rejection decision: let p = min{1, r}, and set
        X(t+1) = y with probability p;  X(t+1) = x_t with probability 1 - p.
    The rejection step "thins down" the proposal.
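A minimal sketch of the recipe, assuming a symmetric uniform random-walk proposal (so T(y, x) = T(x, y) and the ratio reduces to r = π(y)/π(x_t)) and a 1-D standard-normal target; both choices are illustrative, not part of the general recipe.

```python
import math
import random

def metropolis_hastings(log_pi, x0, step=1.0, n=5000, seed=0):
    """Random-walk Metropolis-Hastings for a 1-D target given by log_pi.
    The proposal y ~ Unif(x - step, x + step) is symmetric, so
    r = pi(y) / pi(x_t)."""
    rng = random.Random(seed)
    x, lp = x0, log_pi(x0)
    chain = []
    for _ in range(n):
        y = x + step * (2.0 * rng.random() - 1.0)  # draw y ~ T(x_t, .)
        lpy = log_pi(y)
        # accept with probability min{1, r}; otherwise keep x_t ("thinning down")
        if math.log(rng.random()) < lpy - lp:
            x, lp = y, lpy
        chain.append(x)
    return chain
```

Usage: `metropolis_hastings(lambda x: -0.5 * x * x, x0=0.0)` targets the standard normal, since log π(x) = -x²/2 up to an additive constant that cancels in the ratio.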
• The detailed balance. For x ≠ y, the actual transition probability from x to y is A(x, y) = T(x, y) min{1, [π(y)T(y, x)]/[π(x)T(x, y)]}, and

      π(x) A(x, y) = π(x) T(x, y) min{1, [π(y)T(y, x)]/[π(x)T(x, y)]}
                   = min{π(x)T(x, y), π(y)T(y, x)}
                   = π(y) T(y, x) min{[π(x)T(x, y)]/[π(y)T(y, x)], 1}
                   = π(y) A(y, x),

  where A(y, x) is the transition probability from y to x. Hence π is invariant.
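On a finite state space the detailed-balance identity can be checked numerically. The helper below builds the full M-H transition matrix A from a target π and a proposal matrix T; the three-state target and uniform proposal in the test are arbitrary illustrative choices.

```python
def mh_transition_matrix(pi, T):
    """Metropolis-Hastings transition matrix A for a finite state space.
    Off-diagonal: A[x][y] = T[x][y] * min(1, pi[y]*T[y][x] / (pi[x]*T[x][y]));
    the diagonal absorbs the rejected probability mass."""
    n = len(pi)
    A = [[0.0] * n for _ in range(n)]
    for x in range(n):
        for y in range(n):
            if x != y and T[x][y] > 0.0:
                r = (pi[y] * T[y][x]) / (pi[x] * T[x][y])
                A[x][y] = T[x][y] * min(1.0, r)
        A[x][x] = 1.0 - sum(A[x])  # rejected mass stays at x
    return A
```

For any valid π and T, one can verify both π(x)A(x, y) = π(y)A(y, x) and the invariance Σ_x π(x)A(x, y) = π(y) to machine precision.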
General Markov Chain Simulation

• Question: how do we simulate from a target distribution π(X) via a Markov chain?
• Key: find a transition function A(X, Y) so that f₀Aⁿ → π;
  that is, π is an invariant distribution of A.
• This is different from traditional Markov chain theory, where A is given and its stationary distribution is sought.
• If the actual transition probability can be written as
      A(x, y) = π(y) δ(x, y),   where δ(x, y) is a symmetric function of x and y,
  then the chain has π(x) as its invariant distribution.
• The M-H kernel is of this form:
      T(x, y) min{1, [π(y)T(y, x)]/[π(x)T(x, y)]} = π(y) min{T(x, y)/π(y), T(y, x)/π(x)},
  and the minimum on the right-hand side is symmetric in x and y.
• I learnt it from Stein.
• The moves are very "local."
• They tend to be trapped in a local mode. Example: a well-separated two-component mixture such as
      π(x) ∝ e^{-x²/2} + e^{-(x-5)²/(2·0.25²)},
  where a random-walk Metropolis chain with a small stepsize rarely crosses between the modes at 0 and 5.
Other Approaches?

• Gibbs sampler / heat bath: better or worse?
• Random directional search --- should be better if we can do it ("hit-and-run").
• Adaptive directional sampling (ADS) (Gilks, Roberts and George, 1994): run multiple chains in parallel; at iteration t, update a randomly chosen chain x_c along a direction through an anchor point x_a determined by the other chains.
Gibbs Sampler / Heat Bath

• Define a "neighborhood" structure N(x) --- it can be a line, a subspace, the trace of a group, etc.
• Conditional move: sample from the conditional distribution along a chosen direction,
      x_new ~ p(x | x ∈ N(x_old)).
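A concrete instance of the conditional move, assuming the neighborhoods are the two coordinate lines of a bivariate normal with unit variances and correlation rho; for this target the full conditionals are available in closed form, x₁ | x₂ ~ N(ρx₂, 1-ρ²) and symmetrically for x₂.

```python
import math
import random

def gibbs_bivariate_normal(rho=0.9, n=5000, seed=0):
    """Gibbs sampler / heat bath for a bivariate normal with unit variances
    and correlation rho. Each conditional move samples one coordinate from
    its full conditional distribution along the corresponding coordinate line."""
    rng = random.Random(seed)
    s = math.sqrt(1.0 - rho * rho)  # conditional standard deviation
    x1 = x2 = 0.0
    out = []
    for _ in range(n):
        x1 = rng.gauss(rho * x2, s)  # draw from p(x1 | x2)
        x2 = rng.gauss(rho * x1, s)  # draw from p(x2 | x1)
        out.append((x1, x2))
    return out
```

Because each coordinate is drawn exactly from its conditional, every move is accepted; no Metropolis correction is needed.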
How to Sample Along a Line?

• What is the correct conditional distribution?
  – Random direction r: propose x' = x + t·r with t drawn from p(t) ∝ π(x + t·r).
  – Directions chosen a priori: the same as above.
  – In ADS? The chain moves as x_c → x_c' = x_a + t·r, with the direction r determined by (x_c, x_a), and t must be drawn from p(t) ∝ |t|^{d-1} π(x_a + t·r).
The Snooker Theorem

• Suppose x ~ π and y is any point in the d-dim space. Let r = (x - y)/|x - y|. If t is drawn from
      p(t) ∝ |t|^{d-1} π(y + t·r),
  then x' = y + t·r follows the target distribution π.
• If y is generated from a distribution, the new point x' is independent of y.
[Figure: the line through the anchor y and the current point x, with density p(t) along the line.]
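One way to draw t in practice is a griddy approximation (an assumption made here, not a method from the slides): evaluate p(t) ∝ |t|^{d-1} π(y + t·r) on a grid of t values and sample from the normalized weights. The one-sided grid t ∈ (0, 20] is a further simplification; the theorem allows negative t as well.

```python
import math
import random

def snooker_step(x, y, log_pi, d, rng, t_grid=None):
    """One snooker move: sample t from p(t) proportional to
    |t|^(d-1) * pi(y + t r) on a discrete grid (griddy approximation),
    then return x' = y + t r, where r is the unit vector from y through x."""
    norm = math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))
    r = [(xi - yi) / norm for xi, yi in zip(x, y)]
    if t_grid is None:
        t_grid = [0.05 * k for k in range(1, 400)]  # t in (0, 20], an assumption
    # log weights: (d-1) log|t| + log pi(y + t r)
    logw = [(d - 1) * math.log(abs(t))
            + log_pi([yi + t * ri for yi, ri in zip(y, r)]) for t in t_grid]
    m = max(logw)
    w = [math.exp(v - m) for v in logw]  # subtract max for numerical stability
    u = rng.random() * sum(w)
    acc = 0.0
    for t, wi in zip(t_grid, w):
        acc += wi
        if acc >= u:
            return [yi + t * ri for yi, ri in zip(y, r)]
    return [yi + t_grid[-1] * ri for yi, ri in zip(y, r)]
```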
Connection with Transformation Groups

• WLOG, we let y = 0. The move is now x → x' = t·x.
• The set {t : t ≠ 0} forms a transformation group.
• Liu and Wu (1999) show that if t is drawn from
      p(t) dt ∝ π(t·x) |J_t(x)| H(dt),
  where J_t is the Jacobian of the transformation and H is the left-Haar measure of the group, then the move is invariant with respect to π.
Another Hurdle

• How to draw from something like
      p(t) ∝ |t|^{d-1} π(y + t·r)?
• Adaptive rejection? Approximation? Griddy Gibbs?
• The M-H Independence Sampler (Hastings, 1970):
  – one needs to draw from something that is close enough to p(t).
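A sketch of the independence sampler: proposals come from a fixed density g, independent of the current state, and the move is accepted with probability min{1, π(y)g(x)/(π(x)g(y))}. Normalizing constants of π and g cancel in the ratio, so unnormalized log densities suffice. The N(0, 2²) proposal for a N(0, 1) target in the test is an illustrative assumption.

```python
import math
import random

def independence_sampler(log_pi, log_g, draw_g, n=5000, seed=0):
    """M-H independence sampler: every proposal y is drawn from the fixed
    density g (ideally close to the target), and accepted with probability
    min{1, pi(y) g(x) / (pi(x) g(y))}. log_pi and log_g may be unnormalized."""
    rng = random.Random(seed)
    x = draw_g(rng)
    chain = []
    for _ in range(n):
        y = draw_g(rng)
        log_r = (log_pi(y) + log_g(x)) - (log_pi(x) + log_g(y))
        if math.log(rng.random()) < log_r:
            x = y
        chain.append(x)
    return chain
```

The closer g is to π, the higher the acceptance rate; if g has thinner tails than π, the chain can stick badly in the tails.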
• Propose bigger jumps --- they may be rejected too often.
• Use proposals with mixed stepsizes.
• Try multiple times and select good one(s) (the "bridging effect"; Frenkel & Smit, 1996). Is it still a valid MCMC algorithm?
Multiple-Try Metropolis (the current state is x)

• Draw y₁, ..., y_k from the proposal T(x, ·) (the tries can be dependent).
• Select Y = y_j with probability proportional to π(y_j) T(y_j, x).
• Draw x₁*, ..., x_{k-1}* from T(y_j, ·); let x_k* = x.
• Accept the proposed y_j with probability

      p = min{1, [π(y₁)T(y₁, x) + ... + π(y_k)T(y_k, x)] / [π(x₁*)T(x₁*, y_j) + ... + π(x_k*)T(x_k*, y_j)]}.
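The steps above can be sketched as follows, assuming a Gaussian random-walk proposal T(x, y) = N(y; x, step²) and independent tries; k and step are illustrative choices, not prescribed by the method.

```python
import math
import random

def mtm_step(x, log_pi, step=1.0, k=5, rng=None):
    """One multiple-try Metropolis step with proposal T(x, y) = N(y; x, step^2)
    and weights w(y) = pi(y) T(y, x)."""
    rng = rng or random.Random()

    def log_T(a, b):  # log T(a, b) = log density of N(b; a, step^2)
        return -0.5 * ((b - a) / step) ** 2 - math.log(step * math.sqrt(2 * math.pi))

    # draw y_1, ..., y_k from T(x, .)
    ys = [x + rng.gauss(0.0, step) for _ in range(k)]
    wy = [math.exp(log_pi(y) + log_T(y, x)) for y in ys]
    # select Y = y_j with probability proportional to pi(y_j) T(y_j, x)
    u, acc, j = rng.random() * sum(wy), 0.0, k - 1
    for idx, w in enumerate(wy):
        acc += w
        if acc >= u:
            j = idx
            break
    yj = ys[j]
    # reference set: x_1*, ..., x_{k-1}* ~ T(yj, .), and x_k* = x
    xs = [yj + rng.gauss(0.0, step) for _ in range(k - 1)] + [x]
    wx = [math.exp(log_pi(xi) + log_T(xi, yj)) for xi in xs]
    # accept yj with probability min{1, sum_i w(y_i) / sum_i w(x_i*)}
    if rng.random() < min(1.0, sum(wy) / sum(wx)):
        return yj
    return x
```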
A Modification

• If T(x, y) is symmetric, we can use a different rejection probability:

      p = min{1, [π(y₁) + ... + π(y_k)] / [π(x₁*) + ... + π(x_k*)]}.

Ref: Frenkel and Smit (1996).
Random-Ray Monte Carlo

• Propose a random direction through the current point x.
• Pick y from candidate points y₁, ..., y₅ placed along the ray.
• Correct for the MTM bias.

Back to the example.
An Interesting Twist

• One can choose the multiple tries semi-deterministically: place random, equally spaced grid points y₁, ..., y₈ on one side of the current point x, and reference points x₁*, ..., x₈* on the other.
• Pick y from y₁, ..., y₈.
• The correction rule is the same:

      p = min{1, [π(y₁) + ... + π(y₈)] / [π(x₁*) + ... + π(x₈*)]}.
Use Local Optimization in MCMC

• The ADS formulation is powerful, but its direction is too "random." How can we make use of their framework?
  – Keep a population of samples S_t = {x_t^(1), ..., x_t^(m)}.
  – Randomly select one member, x_t^(c), to be updated.
  – Use the rest to determine an "anchor point" y_t; here we can use local optimization techniques.
• Use MTM to draw a sample along the line, with the help of the Snooker Theorem.
[Figure: a distribution contour; from the anchor point x_a, a gradient or conjugate-gradient direction points through the current point x_c.]
Numerical Examples

• An easy multimodal problem: a mixture of three bivariate Gaussians,
      π(x) = (1/3) N₂(0, I₂) + (1/3) N₂((6, 6)ᵀ, Σ) + (1/3) N₂((4, 4)ᵀ, Σ),   Σ = [1 .9; .9 1].
A More Difficult Test Example

• Mixture of 2 Gaussians:
      π(x) = (1/3) N₂(0, I₂) + (2/3) N₂((5, 5)ᵀ, I₂).
• MTM with a conjugate-gradient direction can sample the distribution.
• The Random-Ray method also worked well.
• The standard Metropolis chain cannot get across.
Fitting a Mixture Model

• Likelihood (an n-point sample from a 3-component normal mixture):

      L(y₁, ..., y_n | μ, σ, p) = Π_{i=1}^n { Σ_{j=1}^3 (p_j / σ_j) φ((y_i - μ_j)/σ_j) }

• Prior: uniform in all parameters, but subject to constraints on (log σ₁, log σ₂, log σ₃) and (p₁, p₂, p₃) (e.g., μ₁ < μ₂ < μ₃, with σ_j and p_j bounded below), and each group has at least one data point.
Bayesian Neural Network Training

• Setting: Data = {(x₁, y₁), (x₂, y₂), ..., (x_n, y_n)}.
• Nonlinear curve fitting: y_t = f(x_t) + ε_t.
• 1-hidden-layer feed-forward NN model:
      f̂(x_t) = Σ_{j=1}^M β_j ψ(α_jᵀ x_t).
  [Figure: network with inputs x₁, ..., x_p, hidden units h₁, ..., h_M, and output y.]
• Objective function for optimization (the posterior):
      P ∝ Π_t N(y_t | f̂(x_t), σ²) · g(σ²) · N(α | 0, σ_α²) N(β | 0, σ_β²).
Liang and Wong (1999) proposed a method that combines the snooker theorem, MTM, exchange MC, and genetic algorithm.
Activation function: tanh(z); number of hidden units: M = 2.