The Markov Chain Monte Carlo Method
Isabelle Stanton
May 8, 2008 – Theory Lunch
Monte Carlo vs Las Vegas
Las Vegas algorithms are randomized and always give the correct result, but gamble with computation time
Example: Quicksort
Monte Carlo algorithms have a fixed running time but may return a wrong answer
Examples: simulated annealing, estimating volumes
Markov Chains
A memoryless stochastic process – the next state depends only on the current state – e.g., flipping a coin or rolling a die
[Figure: a six-sided die as a Markov chain – six states (1-6), with transition probability 1/6 from every state to every state]
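As a minimal sketch (the matrix `P` and helper `step_dist` are my own illustration, not from the talk): every row of the die's transition matrix is uniform, so the chain reaches its stationary distribution in a single step.

```python
# The die as a Markov chain: from any face, every face is reached
# with probability 1/6.
P = [[1 / 6] * 6 for _ in range(6)]

def step_dist(dist, P):
    """One step of the chain: new_dist[j] = sum_i dist[i] * P[i][j]."""
    n = len(P)
    return [sum(dist[i] * P[i][j] for i in range(n)) for j in range(n)]

start = [1, 0, 0, 0, 0, 0]        # start deterministically on face 1
after_one = step_dist(start, P)   # already uniform: [1/6, 1/6, ..., 1/6]
```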
Other Examples of Markov Chains
Shuffling cards
Flipping a coin
The PageRank model
Particle systems – the focus of MCMC work
General Idea
Model the system as a Markov chain
Use a Monte Carlo algorithm to perform some computational task
Applications
Approximate counting – the number of solutions to 3-SAT or Knapsack
Statistical physics – when do phase transitions occur?
Combinatorial optimization – simulated-annealing-style algorithms
We'll focus on counting
Monte Carlo Counting
How do you estimate the volume of a complex solid?
How do you render with environment maps efficiently?
How do you estimate an integral numerically?
(Picnic) Knapsack
The knapsack holds weight 20
The items weigh 4, 10, 4, 2, and 5
What is a solution? How many solutions are there?
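This instance is small enough to answer the second question by brute force; a quick sketch (variable names are mine):

```python
from itertools import product

weights = [4, 10, 4, 2, 5]   # item weights from the slide
capacity = 20                # the knapsack holds 20

# A "solution" is any 0/1 vector x with total packed weight <= capacity.
solutions = [x for x in product([0, 1], repeat=len(weights))
             if sum(w * xi for w, xi in zip(weights, x)) <= capacity]
print(len(solutions))        # 28 of the 32 subsets fit
```

Checking all 32 subsets directly is exactly what becomes infeasible as n grows.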
Counting Knapsack Solutions
Item weights: a = (a_1, ..., a_n)
Knapsack size: a real number b
Estimate the number of {0,1} vectors x that satisfy a·x ≤ b
Let N denote the number of solutions
Naïve Solution
Randomly generate x and calculate a·x
If a·x ≤ b, return 2^n; else return 0
This returns N in expectation: (0·(2^n − N) + 2^n·N) / 2^n = N
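A sketch of this naïve estimator (the function name `naive_count` is my own):

```python
import random

def naive_count(a, b, trials=10000, seed=0):
    """Estimate N = #{x in {0,1}^n : a.x <= b} by uniform sampling."""
    rng = random.Random(seed)
    n = len(a)
    total = 0
    for _ in range(trials):
        x = [rng.randint(0, 1) for _ in range(n)]
        if sum(ai * xi for ai, xi in zip(a, x)) <= b:
            total += 2 ** n   # this sample votes 2^n; a rejected sample votes 0
    return total / trials

# Picnic example: with only 5 items most samples are solutions, so the
# estimate lands near the true count of 28. On the sparse counterexample
# below, the same estimator will almost surely return 0.
print(naive_count([4, 10, 4, 2, 5], 20))
```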
Is this fast?
No. Counterexample: a = (1, ..., 1) and b = n/3
Any solution has at most n/3 ones
There are at most (n choose n/3)·2^(n/3) solutions
Pr(sampled x has ||x|| ≤ n/3) < (n choose n/3)·2^(−2n/3)
In expectation, we need to generate about 2^(n/3) x's before we get a single solution!
Any polynomial number of trials will grossly underestimate N
Knapsack with MCMC
Let M_knap be a Markov chain with state space Ω(b) = {x : a·x ≤ b}
This will allow us to sample a solution
Various M_knap Examples
[Figure: the 3-bit hypercube {000, ..., 111}. For a = (0, 0.5, 0.5), b = 1.5, all 8 vectors are solutions; for a = (0, 1, 1), b = 1.5, the state space shrinks to the six solutions {000, 001, 010, 100, 101, 110}]
M_knap Transitions
With probability 1/2, x transitions to itself
Otherwise, select an i u.a.r. from 1 to n and flip the ith bit of x to get x'
If x' is a solution, transition there; otherwise stay at x
[Figure: transition diagram for a = (0, 1, 1), b = 1.5 – each state has a self-loop of probability at least 1/2, and each edge between neighboring solutions has probability 1/6]
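The transition rule above can be sketched as follows (`knap_step` is my naming; a hedged illustration, not the talk's code):

```python
import random

def knap_step(x, a, b, rng):
    """One step of M_knap on Omega(b) = {x : a.x <= b}."""
    if rng.random() < 0.5:            # lazy: stay put with probability 1/2
        return x
    i = rng.randrange(len(x))         # pick a bit position u.a.r.
    y = list(x)
    y[i] ^= 1                         # flip the ith bit to get x'
    if sum(ai * yi for ai, yi in zip(a, y)) <= b:
        return tuple(y)               # x' is a solution: move there
    return x                          # otherwise stay at x

# Walk on the a = (0, 1, 1), b = 1.5 example from the slides.
rng = random.Random(0)
x = (0, 0, 0)
visited = set()
for _ in range(2000):
    x = knap_step(x, (0, 1, 1), 1.5, rng)
    visited.add(x)
# The walk stays inside the six solutions: 011 and 111 never appear.
```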
Connected?
Is M_knap connected?
Yes. To get from x to x', go through the all-zeros vector 0: dropping a 1 never increases the weight, so every intermediate state is a solution
Ergodicity
What is the stationary distribution of M_knap? Uniform – each solution is sampled with probability 1/N
A Markov chain is ergodic if the probability distribution over the states converges to the stationary distribution of the system, regardless of the starting configuration
Is M_knap ergodic? Yes – it is connected and aperiodic (thanks to the self-loops)
Algorithm Idea
Start at 0 and simulate M_knap for enough steps that the distribution over the states is close to uniform
Why does uniformity matter? Does this fix the problem yet?
The trick
Assume that a_1 ≤ a_2 ≤ ... ≤ a_n (e.g., a = (0, 1, 2, ..., n−1, n))
Let b_0 = 0 and b_i = min{b, Σ_{j≤i} a_j}
|Ω(b_{i−1})| ≤ |Ω(b_i)| – why?
|Ω(b_i)| ≤ (n+1)|Ω(b_{i−1})| – why?
Change any element of Ω(b_i) into one of Ω(b_{i−1}) by switching the rightmost 1 to a 0; this map is at most (n+1)-to-1
How does that help?
|Ω(b)| = |Ω(b_n)| = |Ω(b_n)|/|Ω(b_{n−1})| × |Ω(b_{n−1})|/|Ω(b_{n−2})| × ... × |Ω(b_1)|/|Ω(b_0)| × |Ω(b_0)|
We can estimate each of these ratios by doing a walk on Ω(b_i) and computing the fraction of samples that land in Ω(b_{i−1})
Good estimate, since |Ω(b_{i−1})| ≤ |Ω(b_i)| ≤ (n+1)|Ω(b_{i−1})| – each ratio is at least 1/(n+1), so polynomially many samples suffice
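Putting the pieces together, a sketch of the whole product estimator (the function name and walk lengths are my own choices – the analysis uses t = 17ε^(−2)·n^2 steps per ratio, not the illustrative counts here – and strictly positive weights are assumed so that Ω(b_0) contains only the all-zeros vector):

```python
import random

def estimate_knapsack_count(a, b, steps=2000, samples=2000, seed=1):
    """Estimate |Omega(b)| via the telescoping product of ratios.

    For each i, walk on Omega(b_i) and count the fraction of samples
    landing in Omega(b_{i-1}); multiply the inverted fractions together.
    Assumes all weights are strictly positive.
    """
    rng = random.Random(seed)
    a = sorted(a)                         # a_1 <= a_2 <= ... <= a_n
    n = len(a)
    prefix = [0]
    for w in a:
        prefix.append(prefix[-1] + w)
    bs = [min(b, prefix[i]) for i in range(n + 1)]   # b_0 = 0, ..., b_n = b

    def step(x, cap):
        """One lazy step of M_knap restricted to Omega(cap)."""
        if rng.random() < 0.5:
            return x
        i = rng.randrange(n)
        y = list(x)
        y[i] ^= 1
        return tuple(y) if sum(ai * yi for ai, yi in zip(a, y)) <= cap else x

    estimate = 1.0                        # |Omega(b_0)| = 1
    for i in range(1, n + 1):
        x = (0,) * n
        for _ in range(steps):            # burn-in toward uniform on Omega(b_i)
            x = step(x, bs[i])
        hits = 0
        for _ in range(samples):
            x = step(x, bs[i])
            if sum(ai * xi for ai, xi in zip(a, x)) <= bs[i - 1]:
                hits += 1
        estimate *= samples / max(hits, 1)   # inverse of the ratio estimate
    return estimate

# Picnic example: the true count is 28.
print(estimate_knapsack_count([4, 10, 4, 2, 5], 20))
```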
Analysis
Ignoring bias, the expectation of each trial is |Ω(b_{i−1})|/|Ω(b_i)|
We perform t = 17ε^(−2)·n^2 steps per ratio
In analyzing the efficiency of MCMC methods, focus on Var(X)/E[X]^2
Analysis
Let Z be the product of the inverted trials; ignoring bias, E[Z] = Π |Ω(b_i)|/|Ω(b_{i−1})| = |Ω(b)|
*Magic statistics steps*: Var(Z)/E[Z]^2 ≤ ε^2/16
By Chebyshev's inequality: Pr[(1−ε/2)|Ω(b)| ≤ Z ≤ (1+ε/2)|Ω(b)|] ≥ 3/4
Analysis
In total we used nt = 17ε^(−2)·n^3 steps
This is an FPRAS (Fully Polynomial Randomized Approximation Scheme)
Except... what assumption did I make?
Mixing Time
Assumption: we are close to the uniform distribution after 17ε^(−2)·n^2 steps
The number of steps needed to get close is known as the mixing time
It is unknown whether this chain mixes in polynomial time
Mixing Time
What does mix in polynomial time?
Dice – 1 transition
Shuffling cards – 7 riffle shuffles
The ferromagnetic Ising model at high temperature – O(n log n)
What doesn't?
The ferromagnetic Ising model at low temperature – it starts to form magnets
Wes Weimer Memorial Conclusion Slide
The Markov chain Monte Carlo method models the problem as a Markov chain and then uses random walks on it
Mixing time is important
#P problems are hard
Wes likes trespassing