
Lecture 12: Markov Chain Monte Carlo


Ziyu Shao

School of Information Science and Technology, ShanghaiTech University

December 25, 2018

Outline

1 Background of Markov Chain

2 Markov Property and Transition Matrix

3 Classification of States

4 Stationary Distribution & Reversibility

5 Basic Graph Theory

6 Application: PageRank

7 Introduction of Markov Chain Monte Carlo (MCMC)

8 Metropolis–Hastings Algorithm

9 Gibbs Sampler


1 Background of Markov Chain

Model Selection in Stochastic Modeling

Enough complexity to capture the complexity of the phenomena in question

Enough structure and simplicity to allow one to compute things of interest


Motivation

Introduced by Andrey Markov

IID sequence of random variables: too restrictive an assumption

Complete dependence among random variables: hard to analyze

Markov chain: a happy medium between complete independence & complete dependence

Markov chain: allows for correlations among the random variables and also has enough structure and simplicity to allow computations to be carried out


Markov Process

Three basic components of Markov process

A sequence of random variables $\{X_t, t \in T\}$, where T is an index set, usually called "time".

All possible sample values of $\{X_t, t \in T\}$ are called "states", which are elements of a state space S.

"Markov property": given the present value (information) of the process, the future evolution of the process is independent of the past evolution of the process.


Example: Discrete S & Discrete T


Classification of Markov Process

Discrete-Time Markov Chain: Discrete S & Discrete T

Continuous-Time Markov Chain: Discrete S & Continuous T

Discrete Markov Process: Continuous S & Discrete T

Continuous Markov Process: Continuous S & Continuous T


2 Markov Property and Transition Matrix

Markov Chain (DTMC)

Definition

A sequence of random variables $X_0, X_1, X_2, \ldots$ taking values in the state space {1, 2, ..., M} is called a Markov chain if for all n ≥ 0,

$P(X_{n+1} = j \mid X_n = i, X_{n-1} = i_{n-1}, \ldots, X_0 = i_0) = P(X_{n+1} = j \mid X_n = i)$.


Transition Matrix

Definition

Let $X_0, X_1, X_2, \ldots$ be a Markov chain with state space {1, 2, ..., M}, and let $q_{ij} = P(X_{n+1} = j \mid X_n = i)$ be the transition probability from state i to state j. The $M \times M$ matrix $Q = (q_{ij})$ is called the transition matrix of the chain.


Graphical and Matrix Form of Markov Chain

[State diagram on states {1, 2, 3, 4}; the arrows and their probabilities are the entries of Q below.]

$Q = \begin{pmatrix} 0.1 & 0.3 & 0 & 0.6 \\ 0.4 & 0.2 & 0.1 & 0.3 \\ 0 & 0.8 & 0 & 0.2 \\ 0.2 & 0 & 0.2 & 0.6 \end{pmatrix}$

Each row of the transition matrix sums to 1.


n-step Transition Probability

Definition

The n-step transition probability from i to j is the probability of being at j exactly n steps after being at i. We denote this by $q^{(n)}_{ij}$:

$q^{(n)}_{ij} = P(X_n = j \mid X_0 = i)$.

Lemma

Entries in the matrix $Q^n$ are the n-step transition probabilities.


Proof


Chapman-Kolmogorov Equation

Theorem (Chapman-Kolmogorov Equation)

For any m ≥ 0, n ≥ 0, and i, j ∈ S,

$q^{(n+m)}_{ij} = \sum_{k \in S} q^{(n)}_{ik} \, q^{(m)}_{kj}$.


Proof


Marginal Distribution of $X_n$

Theorem

Define $t = (t_1, t_2, \ldots, t_M)$ by $t_i = P(X_0 = i)$, and view t as a row vector. Then the marginal distribution of $X_n$ is given by the vector $tQ^n$. That is, the jth component of $tQ^n$ is $P(X_n = j)$.
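A minimal NumPy sketch (an illustration, not from the slides) checking both the lemma and this theorem on the 4-state chain from the "Graphical and Matrix Form" slide:

```python
import numpy as np

# Transition matrix of the 4-state chain from the earlier slide.
Q = np.array([[0.1, 0.3, 0.0, 0.6],
              [0.4, 0.2, 0.1, 0.3],
              [0.0, 0.8, 0.0, 0.2],
              [0.2, 0.0, 0.2, 0.6]])

t = np.array([1.0, 0.0, 0.0, 0.0])   # X0 = state 1 with probability 1

# Entries of Q^n are the n-step transition probabilities, so the row
# vector t Q^n is the marginal distribution (PMF) of X_n.
for n in [1, 2, 10, 50]:
    print(n, t @ np.linalg.matrix_power(Q, n))
```

For large n the printed vectors stop changing: they approach the stationary distribution discussed in a later section.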


3 Classification of States

Recurrent and Transient States

Definition

State i of a Markov chain is recurrent if, starting from i, the probability is 1 that the chain will eventually return to i. Otherwise, the state is transient, which means that if the chain starts from i, there is a positive probability of never returning to i.


Example


Irreducible and Reducible Chain

Definition

A Markov chain with transition matrix Q is irreducible if for any two states i and j, it is possible to go from i to j in a finite number of steps (with positive probability). That is, for any states i, j there is some positive integer n such that the (i, j) entry of $Q^n$ is positive. A Markov chain that is not irreducible is called reducible.


Irreducible Implies All States Recurrent

Theorem

In an irreducible Markov chain with a finite state space, all states are recurrent.


A Reducible Markov Chain with Recurrent States


Gambler’s Ruin As A Markov Chain


Coupon Collector As A Markov Chain


Period

Definition

The period of a state i in a Markov chain is the greatest common divisor (gcd) of the possible numbers of steps it can take to return to i when starting at i. That is, the period of i is the greatest common divisor of numbers n such that the (i, i) entry of $Q^n$ is positive. (The period of i is undefined if it's impossible ever to return to i after starting at i.) A state is called aperiodic if its period equals 1, and periodic otherwise. The chain itself is called aperiodic if all its states are aperiodic, and periodic otherwise.


Example


4 Stationary Distribution & Reversibility

Definition

A row vector $s = (s_1, \ldots, s_M)$ such that $s_i \geq 0$ and $\sum_i s_i = 1$ is a stationary distribution for a Markov chain with transition matrix Q if

$\sum_i s_i q_{ij} = s_j$

for all j, or equivalently, $sQ = s$.
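Numerically, a stationary distribution can be found as a left eigenvector of Q with eigenvalue 1; a short sketch (an illustration, not from the slides), reusing the 4-state Q from the earlier example:

```python
import numpy as np

Q = np.array([[0.1, 0.3, 0.0, 0.6],
              [0.4, 0.2, 0.1, 0.3],
              [0.0, 0.8, 0.0, 0.2],
              [0.2, 0.0, 0.2, 0.6]])

# sQ = s means Q^T s^T = s^T, so s is an eigenvector of Q^T
# with eigenvalue 1.
eigvals, eigvecs = np.linalg.eig(Q.T)
k = np.argmin(np.abs(eigvals - 1.0))   # index of the eigenvalue closest to 1
s = np.real(eigvecs[:, k])
s /= s.sum()                           # normalize so the entries sum to 1
print(s)                               # stationary distribution
print(s @ Q)                           # equals s, as sQ = s requires
```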


Example: Two-State Markov Chain


Existence and Uniqueness of Stationary Distribution

Theorem

Any irreducible Markov chain has a unique stationary distribution. In this distribution, every state has positive probability.


Reversibility

Definition

Let $Q = (q_{ij})$ be the transition matrix of a Markov chain. Suppose there is $s = (s_1, \ldots, s_M)$ with $s_i \geq 0$, $\sum_i s_i = 1$, such that

$s_i q_{ij} = s_j q_{ji}$

for all states i and j. This equation is called the reversibility or detailed balance condition, and we say that the chain is reversible with respect to s if it holds.


Check the Detailed Balance Equation

Theorem

If for an irreducible Markov chain with transition matrix $Q = (q_{i,j})$ there exists a probability solution π to the detailed balance equations

$\pi_i q_{i,j} = \pi_j q_{j,i}$

for all pairs of states i, j, then this Markov chain is positive recurrent and time-reversible, and the solution π is the unique stationary distribution.


Example: Simple Random Walk

Consider a negative-drift simple random walk, restricted to be non-negative, in which $q_{0,1} = 1$, $q_{i,i+1} = p < 0.5$, $q_{i,i-1} = 1 - p$ for i ≥ 1. The corresponding state diagram is:

[State diagram on 0, 1, 2, ...: 0 → 1 with probability 1; for i ≥ 1, i → i + 1 with probability p and i → i − 1 with probability 1 − p.]


Example: Simple Random Walk


5 Basic Graph Theory

Modeling Language: Graph Theory

Origin: 1735, Euler's solution of the Seven Bridges of Königsberg problem


Key Elements of A Graph

A graph is an ordered pair G = (V, E)

V: a set of vertices or nodes

E: a set of edges or links between nodes

Edges: undirected or directed


Undirected Graph

Degree of vertex v: a measure of the connectivity of vertex v.

deg(v): the number of edges with v as an end vertex


Directed Graph

Indegree of vertex v: the number of incoming edges ending at v.

Outdegree of vertex v: the number of outgoing edges starting from v.

I(v): indegree of v

O(v): outdegree of v


Adjacency Matrix

A square (0, 1)-matrix representing a finite graph

Matrix elements: indicate whether pairs of vertices are adjacent or not

Symmetric matrix: for an undirected graph


Adjacency Matrix: Directed Graph


Adjacency Matrix of Nauru Graph


Adjacency Matrix of Cayley Graph


Random Walk on Undirected Graph


6 Application: PageRank

How to Organize the Web?

First Try: Web Directories

Yahoo, DMOZ, LookSmart


Yahoo


How to Organize the Web?

Second Try: Web Search

Information Retrieval: find relevant docs in a small and trusted set (newspaper articles, patents, etc.)

Hardness: the web is huge, full of untrusted documents, random things, web spam, etc.


Challenges for Web Search

The web contains many sources of information. Whom to trust? Trick: trustworthy pages may point to each other!

What is the best answer to a keyword query? Webpages are not equally important (www.nothing.com vs. www.stanford.edu)

Trick: rank pages containing the keywords according to their importance (popularity)

Find the page with the highest rank. How to rank?


World Wide Web as A Graph

Web as a directed graph

Nodes: webpages

Edges: hyperlinks


Web as A Directed Graph


Milestones

1998: Larry Page (1973-) and Sergey Brin (1973-) invented the PageRank algorithm and then founded Google.

1998: Jon Kleinberg (1971-) invented the Hyperlink-Induced Topic Search (HITS) algorithm.


Links as Votes

A page is more important if it has more links

Incoming links or outgoing links?

Think of incoming links as votes: www.stanford.edu has 23,400 incoming links; www.nothing.com has 1 incoming link

Are all in-links equal? Links from important pages count more. Recursive question!


Example: PageRank Scores


Simple Recursive Formulation

Each link's vote is proportional to the importance of its source page.

If page j with importance $r_j$ has n out-links, each link gets $r_j/n$ votes.

Page j's own importance is the sum of the votes on its in-links.


Example

$r_j = r_i/3 + r_k/4$.


PageRank: The “Flow” Model

A "vote" from an important page is worth more

A page is important if it is pointed to by other important pages

Define a rank $r_j$ for page j:

$r_j = \sum_{i \to j} \frac{r_i}{O_i}$

where $O_i$ is the outdegree of i.


Example: Flow Equation

$r_y = r_y/2 + r_a/2$

$r_a = r_y/2 + r_m$

$r_m = r_a/2$


Solving the Flow Equations

An additional constraint forces uniqueness: $r_y + r_a + r_m = 1$, with solution $r_y = 2/5$, $r_a = 2/5$, $r_m = 1/5$.

Gaussian elimination works for small examples

We need a better method for large web-scale graphs


PageRank: Matrix Formulation

Adjacency matrix Q: each page i has $O_i$ out-links; if i → j, then $Q_{i,j} = 1/O_i$, else $Q_{i,j} = 0$.

Q is a stochastic matrix: each row sums to 1.


PageRank: Matrix Formulation

Rank vector r: a vector with an entry per page; $r_i$ is the importance score of page i, and $\sum_i r_i = 1$.

The flow equations can be written

$r = r \cdot Q$


Example

$r_y = r_y/2 + r_a/2$

$r_a = r_y/2 + r_m$

$r_m = r_a/2$

$Q = \begin{pmatrix} 1/2 & 1/2 & 0 \\ 1/2 & 0 & 1/2 \\ 0 & 1 & 0 \end{pmatrix}$ (rows and columns ordered y, a, m)


Random Walk Interpretation


The Stationary Distribution

$r \cdot Q = r$


Power Iteration Method

Given a web graph with N nodes, where the nodes are pages and the edges are hyperlinks.

Power iteration: a simple iterative scheme

Initialize: $r^{(0)} = [\frac{1}{N}, \ldots, \frac{1}{N}]$.

Iterate: $r^{(t+1)} = r^{(t)} \cdot Q$.

Stop when $\|r^{(t+1)} - r^{(t)}\|_1 < \varepsilon$.


The Google Formulation

$r^{(t+1)}_j = \sum_{i \to j} \frac{r^{(t)}_i}{O_i} \;\Longrightarrow\; r^{(t+1)} = r^{(t)} \cdot Q$

Does this converge? Does it converge to what we want? Are the results reasonable?


Example 1: Spider Traps

[Two pages A and B linking only to each other.]

$r^{(t+1)}_A = r^{(t)}_B$, $r^{(t+1)}_B = r^{(t)}_A$

The iteration oscillates: $r_A \to 1, 0, 1, 0, \ldots$ and $r_B \to 0, 1, 0, 1, \ldots$


Example 2: Dead End

[Page A links to page B; B has no out-links.]

$r_A \to 1, 0, 0, 0, \ldots$ and $r_B \to 0, 1, 0, 0, \ldots$


Observations

Some pages are dead ends (no out-links)

Spider traps (all out-links are within the group)


Example

[Graph: y ↔ a, y → y, a → m, and m → m only (a spider trap).]

$Q = \begin{pmatrix} 1/2 & 1/2 & 0 \\ 1/2 & 0 & 1/2 \\ 0 & 0 & 1 \end{pmatrix}$ (rows and columns ordered y, a, m)

$r_y \to 1/3,\ 2/6,\ 3/12,\ \ldots,\ 0$; $r_a \to 1/3,\ 1/6,\ 2/12,\ \ldots,\ 0$; $r_m \to 1/3,\ 3/6,\ 7/12,\ \ldots,\ 1$


Google’s Solution for Spider Trap

Teleports!

At each time step, the walker at page i has two options:

With probability β: follow an out-link uniformly at random (each with probability $1/O_i$)

With probability 1 − β: jump to a uniformly random page

Typically β ∈ (0.8, 0.9)


Google’s Solution for Dead Ends

$Q = \begin{pmatrix} 1/2 & 1/2 & 0 \\ 1/2 & 0 & 1/2 \\ 0 & 0 & 0 \end{pmatrix} \;\Longrightarrow\; \begin{pmatrix} 1/2 & 1/2 & 0 \\ 1/2 & 0 & 1/2 \\ 1/3 & 1/3 & 1/3 \end{pmatrix}$

(rows and columns ordered y, a, m; the row of the dead end m is replaced by a uniform row)


Random Teleports

PageRank equation:

$r_j = \sum_{i \to j} \beta \cdot \frac{r_i}{O_i} + (1 - \beta) \cdot \frac{1}{N}$

Google matrix:

$G = \beta \cdot Q + (1 - \beta) \left[\frac{1}{N}\right]_{N \times N}$

$r = r \cdot G$


Random Teleports

Google matrix:

$Q \to G = \beta \cdot Q + (1 - \beta) \left[\frac{1}{N}\right]_{N \times N}$

[Diagrams of the y/a/m graph before and after adding the teleport edges.]


Do Some Calculation

Let β = 0.8.

$G = 0.8 \cdot \begin{pmatrix} 1/2 & 1/2 & 0 \\ 1/2 & 0 & 1/2 \\ 0 & 0 & 1 \end{pmatrix} + 0.2 \cdot \begin{pmatrix} 1/3 & 1/3 & 1/3 \\ 1/3 & 1/3 & 1/3 \\ 1/3 & 1/3 & 1/3 \end{pmatrix}$

As a result,

$G = \begin{pmatrix} 7/15 & 7/15 & 1/15 \\ 7/15 & 1/15 & 7/15 \\ 1/15 & 1/15 & 13/15 \end{pmatrix}$
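A short sketch reproducing this calculation and then running power iteration on G (the iteration count is an arbitrary illustrative choice):

```python
import numpy as np

beta = 0.8
Q = np.array([[0.5, 0.5, 0.0],
              [0.5, 0.0, 0.5],
              [0.0, 0.0, 1.0]])   # m is a spider trap before teleports
N = Q.shape[0]

# Google matrix: G = beta*Q + (1 - beta) * [1/N]_{N x N}
G = beta * Q + (1 - beta) * np.full((N, N), 1.0 / N)
print(G * 15)                     # rows (7, 7, 1), (7, 1, 7), (1, 1, 13)

r = np.full(N, 1.0 / N)
for _ in range(200):              # power iteration now converges
    r = r @ G
print(r)                          # well-defined PageRank scores
```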


Implementation of PageRank in Practice

Google's team, led by Jeff Dean, developed several tools:

BigTable: distributed storage system

GFS (Google File System): distributed file system

Spanner: distributed database

MapReduce: distributed computing system (followed by Hadoop & Spark)


7 Introduction of Markov Chain Monte Carlo (MCMC)

Motivation of MCMC

Previously we saw that we can generate desired distributions from the Unif(0,1) distribution.

Universality of the Uniform depends partly on how tractable it is to compute the inverse CDF.

In general it is hard to compute the CDF and its inverse.


Motivation of MCMC

Markov Chain Monte Carlo (MCMC): build your own Markov chain $(X_0, X_1, \ldots)$ so that the desired distribution π is the stationary distribution of the chain.

Forward & reverse engineering perspective:

Forward engineering: given the transition matrix P, find the stationary distribution of the Markov chain.

Reverse engineering: given a distribution π that we want to simulate, engineer a Markov chain whose stationary distribution is π. If we run this engineered Markov chain for a long time, the distribution of the chain will approach π.


MCMC: Sampling Perspective

Sampling from a distribution π

Uses Markov sequences (samples) to effectively simulate from what would otherwise be intractable distributions.

With samples, we can further compute the sample mean, sample variance, and other sample functions to approximate the values of interesting statistics.

Guaranteed by the Strong Law of Large Numbers


Strong Law of Large Numbers for I.I.D. Sequences

If $Y_1, Y_2, \ldots$ is an i.i.d. sequence with common mean $\mu < \infty$, then the strong law of large numbers says that, with probability 1,

$\lim_{n \to \infty} \frac{Y_1 + \cdots + Y_n}{n} = \mu$.

Equivalently, let Y be a random variable with the same distribution as the $Y_i$ and assume that r is a bounded, real-valued function. Then $r(Y_1), r(Y_2), \ldots$ is also an i.i.d. sequence with finite mean, and, with probability 1,

$\lim_{n \to \infty} \frac{r(Y_1) + \cdots + r(Y_n)}{n} = E(r(Y))$.


Strong Law of Large Numbers for Markov Chains

Theorem

Assume that $X_0, X_1, \ldots$ is an ergodic Markov chain with stationary distribution π. Let r be a bounded, real-valued function. Let X be a random variable with distribution π. Then, with probability 1,

$\lim_{n \to \infty} \frac{r(X_1) + \cdots + r(X_n)}{n} = E(r(X))$,

where $E(r(X)) = \sum_j r(j) \pi_j$.


Theoretical Foundations of MCMC

All MCMC algorithms construct a reversible (time-reversible) Markov chain: the detailed balance equations help us.

Theorem

If for an irreducible Markov chain with transition matrix $Q = (q_{i,j})$ there exists a probability solution π to the detailed balance equations

$\pi_i q_{i,j} = \pi_j q_{j,i}$

for all pairs of states i, j, then this Markov chain is positive recurrent and time-reversible, and the solution π is the unique stationary distribution.


Example: Binary Sequences with No Adjacent 1s

Consider sequences of length m consisting of 0s and 1s. Call a sequence good if it has no adjacent 1s. What is the expected number of 1s in a good sequence if all good sequences are equally likely?
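A hedged sketch of the MCMC approach developed on the following slides: from the current good sequence, propose flipping a uniformly random position and accept only if the result is still good. The proposal is symmetric, so the uniform distribution on good sequences is stationary, and by the strong law of large numbers for Markov chains the running average of the number of 1s approximates the desired expectation (m and the step count below are arbitrary illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(0)
m, steps = 10, 200_000
x = np.zeros(m, dtype=int)             # the all-zeros sequence is good

def good(x):
    return not np.any(x[:-1] & x[1:])  # True iff no adjacent 1s

total = 0
for _ in range(steps):
    j = rng.integers(m)
    x[j] ^= 1                          # propose flipping position j
    if not good(x):
        x[j] ^= 1                      # reject: undo the flip, stay put
    total += x.sum()
print(total / steps)                   # estimate of E(number of 1s)
```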


MCMC Approach


MCMC Approach


MCMC Approach


Two Main Methods of MCMC

Metropolis–Hastings algorithm: this method generates a random walk using a proposal distribution and a method for rejecting some of the proposed moves.

Gibbs sampling: this method requires all the conditional distributions of the target distribution to be sampled exactly. When drawing from the full-conditional distributions is not straightforward, other samplers-within-Gibbs are used.


8 Metropolis–Hastings Algorithm

Basic Idea

Proposed by Nicholas Metropolis in 1953 & further developed by Wilfred Keith Hastings in 1970.

Start with any irreducible Markov chain on the state space of interest

Then modify it into a new Markov chain with the desired stationary distribution

Modification: moves are proposed according to the original chain, but the proposal may or may not be accepted.

Art: choice of the probability of accepting the proposal


Algorithm for DTMC

Let $\pi = (\pi_1, \ldots, \pi_M)$ be a desired stationary distribution on state space {1, ..., M}. Assume that $\pi_i > 0$ for all i (if not, just delete any states i with $\pi_i = 0$ from the state space). Suppose that $P = (p_{ij})$ is the transition matrix for a Markov chain on state space {1, ..., M}. Intuitively, P is a Markov chain that we know how to run but that doesn't have the desired stationary distribution. Now we will modify P to construct a Markov chain.

Use the original transition probabilities $p_{ij}$ to propose where to go next

Then accept the proposal with probability $a_{ij}$

Stay in the current state in the event of a rejection.

The normalizing constant of π does not need to be known.


Algorithm 1 Metropolis–Hastings

Require: stationary distribution $\pi = (\pi_1, \ldots, \pi_M)$; original transition matrix $P = (p_{ij})$; initial state $X_0$ (chosen randomly or deterministically)
Ensure: modified transition matrix $P' = (p'_{ij})$
1: repeat
2: If $X_n = i$, propose a new state j using the transition probabilities in the ith row of the original transition matrix P;
3: Compute the acceptance probability $a_{ij} = \min\left(\frac{\pi_j p_{ji}}{\pi_i p_{ij}}, 1\right)$;
4: Flip a coin that lands Heads with probability $a_{ij}$;
5: If the coin lands Heads, accept the proposal (i.e., go to j), setting $X_{n+1} = j$. Otherwise, reject the proposal (i.e., stay at i), setting $X_{n+1} = i$;
6: until convergence;
7: return $P'$;
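A minimal NumPy sketch of one move of Algorithm 1, with a toy run; the target pi is deliberately left unnormalized, since only the ratios $\pi_j/\pi_i$ enter the acceptance probability:

```python
import numpy as np

rng = np.random.default_rng(0)

def mh_step(i, P, pi):
    """One Metropolis-Hastings move from state i (proposal matrix P)."""
    j = rng.choice(len(pi), p=P[i])                     # propose j from row i
    a = min(pi[j] * P[j, i] / (pi[i] * P[i, j]), 1.0)   # acceptance prob a_ij
    return j if rng.random() < a else i                 # accept, or stay at i

pi = np.array([1.0, 2.0, 3.0, 4.0])   # target, known only up to a constant
P = np.full((4, 4), 0.25)             # symmetric (uniform) proposal chain
x, counts = 0, np.zeros(4)
for _ in range(100_000):
    x = mh_step(x, P, pi)
    counts[x] += 1
print(counts / counts.sum())          # approx (0.1, 0.2, 0.3, 0.4)
```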


Metropolis–Hastings Algorithm

Theorem

The sequence $X_0, X_1, \ldots$ constructed by the Metropolis–Hastings algorithm is a reversible Markov chain with stationary distribution π.


Remarks

The exact form of π is not necessary to implement Metropolis–Hastings. The algorithm only uses ratios of the form $\pi_j/\pi_i$. Thus, π needs only to be specified up to proportionality.

If the proposal transition matrix P is symmetric, $a_{ij} = \min\left(\frac{\pi_j}{\pi_i}, 1\right)$.

The algorithm works for any irreducible proposal chain (original chain). Thus, the user has wide latitude to find a proposal chain that is efficient in the context of their problem.


Remarks

The generated sequence $X_0, X_1, \ldots, X_n$ gives an approximate sample from π.

If the chain requires many steps to get close to stationarity, there may be initial bias.

Burn-in: the practice of discarding the initial iterations and retaining $X_m, X_{m+1}, \ldots, X_n$, for some m.

The strong law of large numbers for Markov chains:

$\lim_{n \to \infty} \frac{r(X_m) + \cdots + r(X_n)}{n - m + 1} = E(r(X)) = \sum_x r(x) \pi_x$.


Example: Zipf Distribution Simulation

Let M ≥ 2 be an integer. An r.v. X has the Zipf distribution with parameter a > 0 if its PMF is

$P(X = k) = \frac{1/k^a}{\sum_{j=1}^{M} (1/j^a)}$,

for k = 1, 2, ..., M (and 0 otherwise). This distribution is widely used in linguistics for studying frequencies of words.

Create a Markov chain $X_0, X_1, \ldots$ whose stationary distribution is the Zipf distribution, and such that $|X_{n+1} - X_n| \leq 1$ for all n. Your answer should provide a simple, precise description of how each move of the chain is obtained, i.e., how to transition from $X_n$ to $X_{n+1}$ for each n.
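One possible construction (a sketch; the worked solution follows on the next slides): from the current state, propose one step up or down with probability 1/2 each, and accept with probability $\min((x/y)^a, 1)$, the M-H acceptance probability for this symmetric proposal; proposals outside {1, ..., M} are rejected, so the chain stays put and $|X_{n+1} - X_n| \leq 1$ holds by design:

```python
import numpy as np

rng = np.random.default_rng(0)
M, a, steps = 10, 1.5, 200_000         # illustrative parameter choices
x, visits = 1, np.zeros(M + 1)
for _ in range(steps):
    y = x + rng.choice([-1, 1])        # symmetric +-1 proposal
    # Accept with prob min(pi_y / pi_x, 1) = min((x/y)^a, 1); the
    # normalizing constant of the Zipf PMF cancels in the ratio.
    if 1 <= y <= M and rng.random() < min((x / y) ** a, 1.0):
        x = y
    visits[x] += 1
print(visits[1:] / steps)              # approx proportional to 1/k^a
```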


Solution


Solution


Example: Knapsack Problem

m treasures with labels from 1 to m, where the jth treasure is worth $g_j$ gold pieces and weighs $w_j$ pounds.

The maximum weight we can carry is w pounds.

We must choose a vector $x = (x_1, \ldots, x_m)$, where $x_j$ is 1 if we choose the jth treasure and 0 otherwise, such that the total weight of the treasures j with $x_j = 1$ is at most w.

Let C be the space of all such vectors, so C consists of all binary vectors $(x_1, \ldots, x_m)$ with $\sum_{j=1}^{m} x_j w_j \leq w$.

We wish to maximize the total worth of the treasures we take.

Finding an optimal solution is an extremely difficult problem, known as the knapsack problem, which has a long history in computer science.

A brute-force solution would be completely infeasible in general. How about MCMC?


Example: Knapsack Problem

(a) Consider the following Markov chain. Start at (0, 0, ..., 0). One move of the chain is as follows. Suppose the current state is $x = (x_1, \ldots, x_m)$. Choose a uniformly random J in {1, 2, ..., m}, and obtain y from x by replacing $x_J$ with $1 - x_J$ (i.e., toggle whether that treasure will be taken). If y is not in C, stay at x; if y is in C, move to y. Show that the uniform distribution over C is stationary for this chain.

(b) Show that the chain from (a) is irreducible, and that it may or may not be aperiodic (depending on $w, w_1, \ldots, w_m$).


Solution


Solution


Example: Knapsack Problem

(c) The chain from (a) is a useful way to get approximately uniform solutions, but Bilbo is more interested in finding solutions where the value (in gold pieces) is high. In this part, the goal is to construct a Markov chain with a stationary distribution that puts much higher probability on any particular high-value solution than on any particular low-value solution. Specifically, suppose that we want to simulate from the distribution

$\pi(x) \propto e^{\beta V(x)}$,

where $V(x) = \sum_{j=1}^{m} x_j g_j$ is the value of x in gold pieces and β is a positive constant. The idea behind this distribution is to give exponentially more probability to each high-value solution than to each low-value solution. Create a Markov chain whose stationary distribution is as desired.


Solution


Solution


Impact of β


Application: Simulated Annealing

MCMC algorithm for optimization problems

M-H with varying β: β gradually increases over time

β is small at first, exploring the state space broadly

As β gets larger and larger, the chain is more likely to get stuck near a local optimum, and the stationary distribution concentrates more and more around the best-value solutions

Why the name annealing?

Annealing of metals: first heated to a very high temperature, and then gradually cooled until the metal reaches a very strong and stable state

$\beta = \frac{1}{\text{Temperature}}$


Continuous State Space

MCMC can also be used in the continuous case, when π is a probability density function.

For a continuous state space, a transition function replaces the transition matrix.

$P_{ij}$ is then the value of a conditional density function given $X_0 = i$.


Example: Beta Simulation

Suppose that we want to generate $W \sim \text{Beta}(a, b)$, but we don't know about the rbeta command in R. Instead, what we have available are i.i.d. Unif(0, 1) r.v.s. How can we generate W which is approximately Beta(a, b), where a and b are any positive real numbers, with the help of a Markov chain on the continuous state space (0, 1)?
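A hedged sketch of one answer (the worked solution is on the next slide), using the available Unif(0, 1) draws as independent M-H proposals: the Beta(a, b) density is proportional to $w^{a-1}(1-w)^{b-1}$, so the acceptance probability is a ratio of unnormalized densities and the beta function never needs to be computed:

```python
import numpy as np

rng = np.random.default_rng(0)
a, b = 2.7, 6.3                  # any positive reals (illustrative values)
steps, burn = 100_000, 1_000

def f(w):                        # Beta(a, b) density, up to a constant
    return w ** (a - 1) * (1 - w) ** (b - 1)

x, draws = 0.5, []               # arbitrary starting point in (0, 1)
for n in range(steps):
    y = rng.random()             # propose y ~ Unif(0, 1), independent of x
    if rng.random() < min(f(y) / f(x), 1.0):
        x = y                    # accept; otherwise stay at x
    if n >= burn:                # discard burn-in draws
        draws.append(x)
print(np.mean(draws))            # compare with the Beta mean a/(a + b)
```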


Solution


Example: Normal-Normal Conjugacy

Let $Y \mid \theta \sim N(\theta, \sigma^2)$, where $\sigma^2$ is known but θ is unknown. Using the Bayesian framework, we treat θ as a random variable, with prior given by $\theta \sim N(\mu, \tau^2)$ for some known constants µ and $\tau^2$. That is, we have the two-level model

$\theta \sim N(\mu, \tau^2)$
$Y \mid \theta \sim N(\theta, \sigma^2)$

Describe how to use the Metropolis–Hastings algorithm to find the posterior mean and variance of θ after observing the value of Y.


Solution


Solution


Solution


Solution


Simulation Results with MCMC


Simulation Results with MCMC


9 Gibbs Sampler

Basic Idea

Proposed by brothers Stuart and Donald Geman in 1984.

Named after the physicist Josiah Willard Gibbs, in reference to an analogy between the sampling algorithm and statistical physics.

Obtaining approximate draws from a joint distribution, based on sampling from conditional distributions one at a time

Especially useful in problems where these conditional distributions are pleasant to work with.


Basic Idea

At each stage, one variable is updated (keeping all the other variables fixed) by drawing from the conditional distribution of that variable given all the other variables.

Two major kinds of Gibbs sampler:

Systematic scan: the updates sweep through the components in a deterministic order.

Random scan: a randomly chosen component is updated at each stage.


Algorithm: Systematic Scan Gibbs Sampler

Let X and Y be discrete r.v.s with joint PMF $p_{X,Y}(x, y) = P(X = x, Y = y)$. We wish to construct a two-dimensional Markov chain $(X_n, Y_n)$ whose stationary distribution is $p_{X,Y}$. The systematic scan Gibbs sampler proceeds by updating the X-component and the Y-component in alternation. If the current state is $(X_n, Y_n) = (x_n, y_n)$, then we update the X-component while holding the Y-component fixed, and then update the Y-component while holding the X-component fixed.


Algorithm 2 Systematic Scan Gibbs Sampler

Require: joint PMF $p_{X,Y}$; initial state $(X_0, Y_0)$
Ensure: two-dimensional Markov chain $(X_n, Y_n)$
1: repeat
2: Draw a value $x_{n+1}$ from the conditional distribution of X given $Y = y_n$, and set $X_{n+1} = x_{n+1}$;
3: Draw a value $y_{n+1}$ from the conditional distribution of Y given $X = x_{n+1}$, and set $Y_{n+1} = y_{n+1}$;
4: return $(X_{n+1}, Y_{n+1})$;
5: until n ≥ N;


Algorithm: Random Scan Gibbs Sampler

As above, let X and Y be discrete r.v.s with joint PMF $p_{X,Y}(x, y)$. We wish to construct a two-dimensional Markov chain $(X_n, Y_n)$ whose stationary distribution is $p_{X,Y}$. Each move of the random scan Gibbs sampler picks a uniformly random component and updates it, according to the conditional distribution given the other component.


Algorithm 3 Random Scan Gibbs Sampler

Require: joint PMF $p_{X,Y}$; initial state $(X_0, Y_0)$
Ensure: two-dimensional Markov chain $(X_n, Y_n)$
1: repeat
2: Choose which component to update, with equal probabilities;
3: If the X-component was chosen, draw a value $x_{n+1}$ from the conditional distribution of X given $Y = y_n$, and set $X_{n+1} = x_{n+1}$, $Y_{n+1} = y_n$. Similarly, if the Y-component was chosen, draw a value $y_{n+1}$ from the conditional distribution of Y given $X = x_n$, and set $X_{n+1} = x_n$, $Y_{n+1} = y_{n+1}$;
4: return $(X_{n+1}, Y_{n+1})$;
5: until n ≥ N;


Random Scan Gibbs as Metropolis-Hastings

Theorem

The random scan Gibbs sampler is a special case of the Metropolis–Hastings algorithm, in which the proposal is always accepted. In particular, it follows that the stationary distribution of the random scan Gibbs sampler is as desired.


Gibbs Sampling vs. Metropolis-Hastings

Gibbs sampling emphasizes conditional distributions.

Metropolis-Hastings emphasizes acceptance probabilities.


Example: Graph Coloring

Let G be a network (also called a graph): there are n nodes, and for each pair of distinct nodes, there either is or isn't an edge joining them. We have a set of k colors, e.g., if k = 7, the color set may be {red, orange, yellow, green, blue, indigo, violet}. A k-coloring of the network is an assignment of a color to each node, such that two nodes joined by an edge can't be the same color. Graph coloring is an important topic in computer science, with wide-ranging applications such as task scheduling and the game of Sudoku.


Suppose that it is possible to k-color G. Form a Markov chain on the space of all k-colorings of G, with transitions as follows: starting with a k-coloring of G, pick a uniformly random node, figure out what the legal colors are for that node, and then repaint that node with a uniformly random legal color (note that this random color may be the same as the current color). Show that this Markov chain is reversible, and find its stationary distribution.


Proof


Remarks

Think of each node in the graph as a discrete r.v. that can take on k possible values.

These nodes have a joint distribution and a complicated dependence structure (the constraint that connected nodes cannot have the same color).

We would like to sample a random k-coloring of the entire graph (draw from the joint distribution of all the nodes).

We instead condition on all but one node.

If the joint distribution is to be uniform over all legal colorings, then the conditional distribution of one node given all the others is uniform over its legal colors.

At each stage of the algorithm, we are drawing from the conditional distribution of one node given all the others: we're running a random scan Gibbs sampler!


Gibbs Sampler for Continuous State Space

In the Gibbs sampler, the target distribution π is an m-dimensional joint density

$\pi(x) = \pi(x_1, \ldots, x_m)$.

A multivariate Markov chain is constructed whose stationary distribution is π, and which takes values in an m-dimensional space. The algorithm generates elements by iteratively updating each component of an m-dimensional vector conditional on the other m − 1 components.


Example: Bivariate Standard Normal Distribution


Example: Bivariate Standard Normal Distribution

The Gibbs sampler is implemented to simulate (X, Y) from a bivariate standard normal distribution with correlation ρ.

1. Initialize: $(x_0, y_0) \leftarrow (0, 0)$, $m \leftarrow 1$.

2. Generate $x_m$ from the conditional distribution of X given $Y = y_{m-1}$. That is, simulate from a normal distribution with mean $\rho y_{m-1}$ and variance $1 - \rho^2$.

3. Generate $y_m$ from the conditional distribution of Y given $X = x_m$. That is, simulate from a normal distribution with mean $\rho x_m$ and variance $1 - \rho^2$.

4. $m \leftarrow m + 1$.

5. Return to Step 2.
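A direct transcription of these five steps into NumPy (a sketch; rho and the run length are arbitrary illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(0)
rho, steps = 0.7, 100_000
sd = np.sqrt(1 - rho**2)                 # conditional std dev sqrt(1 - rho^2)
x, y = 0.0, 0.0                          # Step 1: initialize at (0, 0)
samples = np.empty((steps, 2))
for m in range(steps):
    x = rho * y + sd * rng.standard_normal()   # Step 2: X | Y = y
    y = rho * x + sd * rng.standard_normal()   # Step 3: Y | X = x
    samples[m] = x, y                          # Steps 4-5: record and repeat
print(np.corrcoef(samples.T)[0, 1])      # sample correlation, close to rho
```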


Simulation Results with MCMC


Example: Chicken-Egg with Unknown Parameters

A chicken lays N eggs, where N ∼ Pois(λ). Each egg hatches with probability p, where p is unknown; we let p ∼ Beta(a, b). The constants λ, a, b are known. Here's the catch: we don't get to observe N. Instead, we only observe the number of eggs that hatch, X. Describe how to use Gibbs sampling to find E(p|X = x), the posterior mean of p after observing x hatched eggs.
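A hedged sketch of the Gibbs sampler (the worked solution follows), using the two full conditionals this model admits: given p and X = x, the number of unhatched eggs N − x is Pois(λ(1 − p)) by the chicken-egg story, and given N = n and X = x, p is Beta(a + x, b + n − x) by Beta-Binomial conjugacy. The values of lam, a, b, x below are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
lam, a, b, x = 10.0, 1.0, 1.0, 7       # known constants and observed X = x
steps, burn = 100_000, 1_000

p, n = 0.5, x                          # arbitrary valid starting values
p_draws = []
for m in range(steps):
    n = x + rng.poisson(lam * (1 - p))       # N | p, X = x
    p = rng.beta(a + x, b + n - x)           # p | N = n, X = x
    if m >= burn:                            # discard burn-in
        p_draws.append(p)
print(np.mean(p_draws))                # estimate of E(p | X = x)
```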


Solution


Solution


Solution


Solution


Example: Three-dimensional Joint Distribution

Random variables X, P, and N have joint density

$\pi(x, p, n) \propto \binom{n}{x} p^x (1-p)^{n-x} \, \frac{4^n}{n!}$

for x = 0, 1, ..., n, 0 < p < 1, n = 0, 1, .... The p variable is continuous; x and n are discrete.
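A sketch of a systematic scan Gibbs sampler for this density (the worked solution follows), reading the full conditionals off the joint: x | p, n ~ Bin(n, p); p | x, n ~ Beta(x + 1, n − x + 1); and, since the factors involving n reduce to $(4(1-p))^{n-x}/(n-x)!$, n − x | x, p ~ Pois(4(1 − p)):

```python
import numpy as np

rng = np.random.default_rng(0)
steps, burn = 100_000, 1_000
x, p, n = 0, 0.5, 0                    # a valid starting state (x <= n)

draws = np.empty((steps, 3))
for m in range(steps):
    x = rng.binomial(n, p)             # X | p, n ~ Bin(n, p)
    p = rng.beta(x + 1, n - x + 1)     # p | x, n ~ Beta(x+1, n-x+1)
    n = x + rng.poisson(4 * (1 - p))   # n - x | x, p ~ Pois(4(1-p))
    draws[m] = x, p, n
print(draws[burn:].mean(axis=0))       # approximate E(X), E(p), E(N)
```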


Solution


Solution


Simulation Results with MCMC


References

Chapters 11 & 12 of BH
