Probabilistic Graphical Models seminar 15/16 (0368-4511-01)

Haim Kaplan, Tel Aviv University
What is a Probabilistic Graphical Model (PGM)?
A method to represent a joint probability distribution over a set of random variables X=(X1,…,Xn)
Explicit Representation
Huge:

X1 X2 X3 … Xn   P(X1,…,Xn)
1  0  1  1  1  0  0  0  1   1/100
PGM

Use the special structure of the distribution to get a more compact representation.

Two kinds: Bayesian networks (directed) and Markov networks (undirected).
A Bayesian Network (Chapter 3)
[Figure: an example Bayesian network.]
A Markov Model (Chapter 4)

[Figure: an undirected graph over nodes A, B, C, D.]

Potential functions defined over cliques:

$\phi_1(A,B) = \begin{cases}3.7 & \text{if } A \text{ and } B\\ 2.1 & \text{if } \neg A \text{ and } \neg B\\ 0.7 & \text{otherwise}\end{cases}$

$\phi_2(B,C,D) = \begin{cases}2.3 & \text{if } B \text{ and } C \text{ and } D\\ 5.1 & \text{otherwise}\end{cases}$
A Markov Model

[Figure: the same undirected graph over A, B, C, D.]

$P(X) = \frac{1}{Z}\prod_c \phi_c(X_c)$, where $Z = \sum_X \prod_c \phi_c(X_c)$.
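As a quick illustration, the two potentials above can be multiplied and normalized by brute force. A minimal sketch in Python, assuming all four variables are binary and that the middle case of $\phi_1$ is "neither A nor B":

```python
from itertools import product

# Clique potentials from the slides' example.
def phi1(a, b):
    if a and b:
        return 3.7
    if (not a) and (not b):
        return 2.1
    return 0.7

def phi2(b, c, d):
    return 2.3 if (b and c and d) else 5.1

# Unnormalized measure over all binary assignments of (A, B, C, D).
def score(a, b, c, d):
    return phi1(a, b) * phi2(b, c, d)

Z = sum(score(*x) for x in product([0, 1], repeat=4))

def P(a, b, c, d):
    """P(A,B,C,D) = phi1(A,B) * phi2(B,C,D) / Z."""
    return score(a, b, c, d) / Z

print(Z, P(1, 1, 1, 1))
```

Enumerating all $2^4$ assignments only works because the example is tiny; avoiding this blow-up is what the inference techniques below are about.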
Inference (chapters 9-13)
• Given a PGM we want to compute:
– Conditional probability query: P(Y|E=e), the probability distribution on the values of Y given that E=e (a brute-force version is sketched below).
– Maximum a posteriori (MAP): $\mathrm{argmax}_y P(y|E=e)$, the assignment y to Y that maximizes P(y|E=e). (This is the Most Probable Explanation (MPE) query when Y consists of all remaining variables.)
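For concreteness, here is a brute-force version of the conditional probability query; the three binary variables and the joint table are invented for the example:

```python
# P(Y | E=e) as the normalized slice of the joint over assignments
# consistent with E=e, summing out the remaining variable W.
def joint(y, e, w):
    table = {(0, 0, 0): .10, (0, 0, 1): .05, (0, 1, 0): .20, (0, 1, 1): .05,
             (1, 0, 0): .05, (1, 0, 1): .15, (1, 1, 0): .10, (1, 1, 1): .30}
    return table[(y, e, w)]

def cond_query(e):
    """Return P(Y | E=e) by summing out W and renormalizing."""
    unnorm = [sum(joint(y, e, w) for w in (0, 1)) for y in (0, 1)]
    z = sum(unnorm)
    return [u / z for u in unnorm]

print(cond_query(1))   # distribution over Y given E=1
```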
Examples
Hidden Markov models for the parts of speech problem
• Xi's value is a part of speech (noun, verb, preposition, …)
• Oi’s value is a word
Hidden Markov models for the parts of speech problem
• We observe the Oi's and would like to compute the assignment to the Xi's that maximizes P(Xi's | Oi's), as in the sketch below.
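The standard way to compute this maximization is the Viterbi dynamic program. A minimal sketch; the three-tag state space, the transition matrix, the emission table, and the two-word vocabulary are all toy values, not from the slides:

```python
import numpy as np

states = ["noun", "verb", "prep"]
start = np.array([0.5, 0.3, 0.2])        # P(X1)
trans = np.array([[0.3, 0.5, 0.2],       # trans[i][j]: P(X_{t+1}=j | X_t=i)
                  [0.6, 0.1, 0.3],
                  [0.8, 0.1, 0.1]])
emit = np.array([[0.7, 0.3],             # emit[s][w]: P(O_t=w | X_t=s)
                 [0.4, 0.6],
                 [0.5, 0.5]])

def viterbi(obs):
    n, k = len(obs), len(states)
    delta = np.zeros((n, k))       # best log-prob of a path ending in each state
    back = np.zeros((n, k), int)   # backpointers for reconstructing the path
    delta[0] = np.log(start) + np.log(emit[:, obs[0]])
    for t in range(1, n):
        scores = delta[t - 1][:, None] + np.log(trans)
        back[t] = scores.argmax(axis=0)
        delta[t] = scores.max(axis=0) + np.log(emit[:, obs[t]])
    path = [int(delta[-1].argmax())]
    for t in range(n - 1, 0, -1):
        path.append(int(back[t][path[-1]]))
    return [states[s] for s in reversed(path)]

print(viterbi([0, 1, 0]))   # MAP tag sequence for a 3-word sentence
```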
Phylogenetic trees
Pedigrees

Phenotype vs Genotype

Pedigrees

[Figures: pedigree graphs over individuals M1, M2, F1, F2, O1, O2.]
More than one gene

Haplotype variables form a hidden Markov model

[Figures: the pedigree over M1, M2, F1, F2, O1, O2, repeated per gene.]
More than one individual

[Figure: allele variables A1, A2 shared across individuals.]
Computer vision applications – examples

• Image segmentation (2D, 3D)
• Stereo – depth estimation
• There are many others…
Image segmentation
• Separate foreground (object) from background
Image segmentation

• Each pixel is a vertex
• A vertex connects to its neighbors
Image segmentation

• For every pixel we have a factor Ф(X): we determine Ф(b) and Ф(f).
Image segmentation

• For every pair of adjacent pixels X, Y we have a factor Ф(X,Y): we determine Ф(b,b), Ф(f,f), Ф(b,f), Ф(f,b).
Image segmentation

• We perform a MAP query:

$\mathrm{argmax}_{\{f,b\}\text{-assignments}}\ P(X=b,\ Y=f,\ \dots)$
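A brute-force rendition of this query on a 2x2 image, with invented unary and pairwise potentials; real segmentation systems solve the same MAP problem with specialized algorithms (e.g. graph cuts) rather than enumeration:

```python
from itertools import product

pixels = [(0, 0), (0, 1), (1, 0), (1, 1)]
edges = [((0, 0), (0, 1)), ((0, 0), (1, 0)),
         ((0, 1), (1, 1)), ((1, 0), (1, 1))]

# Unary potentials phi_p(label), e.g. from a per-pixel color model.
unary = {(0, 0): {"f": 0.9, "b": 0.1},
         (0, 1): {"f": 0.8, "b": 0.2},
         (1, 0): {"f": 0.3, "b": 0.7},
         (1, 1): {"f": 0.2, "b": 0.8}}

# Pairwise potentials phi_{p,q}: reward neighbors that agree.
def pairwise(l1, l2):
    return 2.0 if l1 == l2 else 0.5

def score(assignment):
    s = 1.0
    for p in pixels:
        s *= unary[p][assignment[p]]
    for p, q in edges:
        s *= pairwise(assignment[p], assignment[q])
    return s

# Enumerate all 2^4 labelings and keep the best (MAP) one.
best = max((dict(zip(pixels, labs)) for labs in product("fb", repeat=4)),
           key=score)
print(best)
```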
3D segmentation

• Consecutive video frames are adjacent layers in a 3D grid.
Stereo vision – computing depths

[Figures: a left and a right camera image two real-world points; point i projects to position xi in the left image (image 1) and yi in the right image (image 2), giving disparity_i = xi − yi.]
Stereo vision – computing depths

• Disparity is usually a small number of pixels
• We want to label each pixel with its disparity
• This is a multi-label problem
Stereo vision – computing depths

• Compute Фp(d) for each pixel p: how likely is p to have disparity d? (See the sketch below.)
• Compute Фp,q(d1,d2) for each pair of adjacent pixels p, q: how likely are p and q to have disparities d1 and d2, respectively?
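One plausible way to fill in Фp(d), sketched on toy one-dimensional scanlines: score each candidate disparity by the matching cost between the left-image pixel and the correspondingly shifted right-image pixel. The exponential cost-to-potential mapping and the scanline values are assumptions for illustration:

```python
import numpy as np

left = np.array([10., 20., 30., 40., 50.])    # toy 1-D left scanline
right = np.array([20., 30., 40., 50., 60.])   # toy 1-D right scanline
max_disp = 2

def phi(p, d):
    """Unary potential: how likely is pixel p to have disparity d."""
    if p - d < 0:
        return 0.0                             # shifted pixel falls off the image
    cost = abs(left[p] - right[p - d])         # intensity matching cost
    return np.exp(-cost / 10.0)                # low cost -> high potential

for d in range(max_disp + 1):
    print(d, phi(3, d))                        # d=1 matches best on this toy data
```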
Interactions between proteins

[Figure: factor graph over interaction variables I(p1,p2), I(p1,p3), I(p2,p3) and localization variables L(p1,S), L(p2,S), L(p3,S).]

P(I(p1,p2), I(p1,p3), I(p2,p3), L(p1,S), L(p2,S), L(p3,S)) =
Ф(I(p1,p2)) × Ф(I(p1,p3)) × Ф(I(p2,p3)) ×
Ф(I(p1,p2), L(p1,S), L(p2,S)) × Ф(I(p1,p3), L(p1,S), L(p3,S)) × Ф(I(p2,p3), L(p2,S), L(p3,S))
Interactions between proteins

[Figure: the same factor graph, extended with localization variables L(p1,A), L(p2,A), L(p3,A).]

P(I(p1,p2), I(p1,p3), I(p2,p3), L(p1,S), L(p2,S), L(p3,S), L(p1,A), L(p2,A), L(p3,A)) =
Ф(I(p1,p2)) × Ф(I(p1,p3)) × Ф(I(p2,p3)) ×
Ф(I(p1,p2), L(p1,S), L(p2,S)) × Ф(I(p1,p3), L(p1,S), L(p3,S)) × Ф(I(p2,p3), L(p2,S), L(p3,S)) ×
Ф(I(p1,p2), L(p1,A), L(p2,A)) × Ф(I(p1,p3), L(p1,A), L(p3,A)) × Ф(I(p2,p3), L(p2,A), L(p3,A))
Interactions between proteins

[Figure: the same factor graph, further extended with experiment variables Ex1(p2,p3), Ex2(p2,p3), Ex2(p1,p3).]

P(I(p1,p2), …, L(p3,A), Ex1(p2,p3), Ex2(p2,p3), Ex2(p1,p3)) =
Ф(I(p1,p2)) × Ф(I(p1,p3)) × Ф(I(p2,p3)) ×
Ф(I(p1,p2), L(p1,S), L(p2,S)) × Ф(I(p1,p3), L(p1,S), L(p3,S)) × Ф(I(p2,p3), L(p2,S), L(p3,S)) ×
Ф(I(p1,p2), L(p1,A), L(p2,A)) × Ф(I(p1,p3), L(p1,A), L(p3,A)) × Ф(I(p2,p3), L(p2,A), L(p3,A)) ×
Ф(I(p2,p3), Ex1(p2,p3)) × Ф(I(p2,p3), Ex2(p2,p3)) × Ф(I(p1,p3), Ex2(p1,p3))
Inference (chapters 9-13)
• Given a PGM we want to compute:
– Conditional probability query: P(Y|E=e), the probability distribution on the values of Y given that E=e.
– Maximum a posteriori (MAP): $\mathrm{argmax}_y P(y|E=e)$, the assignment y to Y that maximizes P(y|E=e). (This is the Most Probable Explanation (MPE) query when Y consists of all remaining variables.)
Complexity (chapter 9)
• These problems are NP-hard, sometimes even to approximate.
Exact Solutions

An example:

$P(X_1,X_2,X_3,X_4,X_5,X_6) = \frac{1}{Z}\,\phi(X_1,X_2)\,\phi(X_1,X_3)\,\phi(X_2,X_4)\,\phi(X_3,X_5)\,\phi(X_2,X_5,X_6)$
Exact Solutions – variable elimination

• P(X1)?

$P(X_1,\dots,X_6) = \frac{1}{Z}\,\phi(X_1,X_2)\,\phi(X_1,X_3)\,\phi(X_2,X_4)\,\phi(X_3,X_5)\,\phi(X_2,X_5,X_6)$
Exact Solutions – variable elimination

$P(X_1) = \frac{1}{Z}\sum_{X_2}\sum_{X_3}\sum_{X_4}\sum_{X_5}\sum_{X_6}\phi(X_1,X_2)\,\phi(X_1,X_3)\,\phi(X_2,X_4)\,\phi(X_3,X_5)\,\phi(X_2,X_5,X_6)$
Explicit Computation
Sum all rows with X1=1 and all rows with X1=0
X1 X2 X3 … Xn   Ф(X1,…,Xn)
1  0  1  1  1  0  0  0  1   13
Variable elimination

$P(X_1) = \frac{1}{Z}\sum_{X_2,X_3,X_4}\phi(X_1,X_2)\,\phi(X_1,X_3)\,\phi(X_2,X_4)\sum_{X_5}\phi(X_3,X_5)\sum_{X_6}\phi(X_2,X_5,X_6)$

Summing out X6 replaces the table for Ф(X2,X5,X6) with a smaller table m(X2,X5).
Variable elimination

$P(X_1) = \frac{1}{Z}\sum_{X_2,X_3,X_4}\phi(X_1,X_2)\,\phi(X_1,X_3)\,\phi(X_2,X_4)\sum_{X_5}\phi(X_3,X_5)\sum_{X_6}\phi(X_2,X_5,X_6)$

$= \frac{1}{Z}\sum_{X_2,X_3,X_4}\phi(X_1,X_2)\,\phi(X_1,X_3)\,\phi(X_2,X_4)\sum_{X_5}\phi(X_3,X_5)\,m(X_2,X_5)$
Variable elimination

$P(X_1) = \frac{1}{Z}\sum_{X_2,X_3,X_4}\phi(X_1,X_2)\,\phi(X_1,X_3)\,\phi(X_2,X_4)\sum_{X_5}\phi(X_3,X_5)\,m(X_2,X_5)$

$= \frac{1}{Z}\sum_{X_2,X_3,X_4}\phi(X_1,X_2)\,\phi(X_1,X_3)\,\phi(X_2,X_4)\,m'(X_2,X_3)$

$= \frac{1}{Z}\sum_{X_2,X_3}\phi(X_1,X_2)\,\phi(X_1,X_3)\,m'(X_2,X_3)\sum_{X_4}\phi(X_2,X_4)$

$= \frac{1}{Z}\sum_{X_2,X_3}\phi(X_1,X_2)\,\phi(X_1,X_3)\,m'(X_2,X_3)\,m''(X_2)$

where $m'(X_2,X_3)=\sum_{X_5}\phi(X_3,X_5)\,m(X_2,X_5)$ and $m''(X_2)=\sum_{X_4}\phi(X_2,X_4)$.
Variable elimination

$P(X_1) = \frac{1}{Z}\sum_{X_2,X_3}\phi(X_1,X_2)\,\phi(X_1,X_3)\,m'(X_2,X_3)\,m''(X_2)$

$= \frac{1}{Z}\sum_{X_2}\phi(X_1,X_2)\,m''(X_2)\sum_{X_3}\phi(X_1,X_3)\,m'(X_2,X_3)$

$= \frac{1}{Z}\sum_{X_2}\phi(X_1,X_2)\,m''(X_2)\,m'''(X_1,X_2)$

$= \frac{1}{Z}\,m''''(X_1)$
If the variables are binary, the temporary tables here have size $2^3$, whereas the full joint table has size $2^6$.
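The derivation above can be replayed numerically. A minimal sketch with random binary factors, using numpy's einsum to sum out X6, X5, and X4 in the same order as the m, m', m'' steps, and checking the result against brute-force marginalization (the factor values are random, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(0)

# Random positive factors matching the slides' factorization:
# phi(X1,X2) phi(X1,X3) phi(X2,X4) phi(X3,X5) phi(X2,X5,X6), all binary.
f12 = rng.random((2, 2)); f13 = rng.random((2, 2))
f24 = rng.random((2, 2)); f35 = rng.random((2, 2))
f256 = rng.random((2, 2, 2))

m25 = f256.sum(axis=2)                  # m(X2,X5)   = sum_X6 phi(X2,X5,X6)
m23 = np.einsum("ce,be->bc", f35, m25)  # m'(X2,X3)  = sum_X5 phi(X3,X5) m(X2,X5)
m2 = f24.sum(axis=1)                    # m''(X2)    = sum_X4 phi(X2,X4)

# Sum out X2 and X3, then normalize (normalization plays the role of 1/Z).
p1 = np.einsum("ab,ac,bc,b->a", f12, f13, m23, m2)
p1 /= p1.sum()

# Sanity check: marginalize the explicit joint over all 2^6 assignments.
joint = np.einsum("ab,ac,bd,ce,bef->abcdef", f12, f13, f24, f35, f256)
check = joint.sum(axis=(1, 2, 3, 4, 5))
print(p1, check / check.sum())          # the two vectors agree
```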
Elimination order
• There are many possible elimination orders
• Which one do we want to pick?
• When we eliminate a variable X, how large is the intermediate table that we create?
Elimination order
[Figure: a variable X and its neighbors in the graph.]
Elimination order
• The number of variables that appear in factors together with X equals the number of neighbors of X.
Elimination order
• To make the graph reflect the factors that remain after eliminating X, we must turn the neighbors of X into a clique.
• We want an elimination order that does not generate large cliques.
• Finding the optimal order is NP-hard.
• There are good heuristics (one is sketched below).
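One standard heuristic is greedy min-fill: repeatedly eliminate the variable whose neighbors need the fewest new edges to become a clique. A sketch (this particular code is illustrative, not from the slides), run on the graph of the running example:

```python
def min_fill_order(adj):
    """adj: dict mapping each variable to the set of its neighbors."""
    adj = {v: set(ns) for v, ns in adj.items()}   # work on a copy
    order = []
    while adj:
        def fill(v):
            # Number of missing edges among v's neighbors.
            ns = list(adj[v])
            return sum(1 for i in range(len(ns)) for j in range(i + 1, len(ns))
                       if ns[j] not in adj[ns[i]])
        v = min(adj, key=fill)                    # cheapest variable to eliminate
        ns = list(adj[v])
        for i in range(len(ns)):                  # connect v's neighbors into a clique
            for j in range(i + 1, len(ns)):
                adj[ns[i]].add(ns[j])
                adj[ns[j]].add(ns[i])
        for u in ns:                              # remove v from the graph
            adj[u].discard(v)
        del adj[v]
        order.append(v)
    return order

# The graph of the running example: phi(X1,X2) phi(X1,X3) phi(X2,X4)
# phi(X3,X5) phi(X2,X5,X6).
graph = {1: {2, 3}, 2: {1, 4, 5, 6}, 3: {1, 5}, 4: {2}, 5: {2, 3, 6}, 6: {2, 5}}
print(min_fill_order(graph))
```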
Suppose the graph is a tree

[Figure: a tree with root X1, children X2 and X3, then X4, X5, X6, and leaf X7 below X5.]

$m_{75}(X_5) = \sum_{X_7}\phi(X_5,X_7)\,\phi(X_7)$
Suppose the graph is a tree

[Figure: the tree after eliminating X7.]

$m_{75}(X_5) = \sum_{X_7}\phi(X_5,X_7)\,\phi(X_7)$

$m_{53}(X_3) = \sum_{X_5}\phi(X_3,X_5)\,\phi(X_5)\,m_{75}(X_5)$
Suppose the graph is a tree

[Figure: the tree after eliminating X7 and X5.]

$m_{53}(X_3) = \sum_{X_5}\phi(X_3,X_5)\,\phi(X_5)\,m_{75}(X_5)$

$m_{63}(X_3) = \sum_{X_6}\phi(X_3,X_6)\,\phi(X_6)$
Suppose the graph is a tree

[Figure: the tree after eliminating X7, X5, and X6.]

$m_{53}(X_3) = \sum_{X_5}\phi(X_3,X_5)\,\phi(X_5)\,m_{75}(X_5)$

$m_{63}(X_3) = \sum_{X_6}\phi(X_3,X_6)\,\phi(X_6)$
Computing many marginals?

• P(X1), P(X2), P(X3), … ?
• We want to recycle parts of the computation.
Sum-product algorithm

[Figure: the tree over X1, …, X7 with messages passed in both directions along each edge, e.g. m75(X5) and m57(X7) on the edge between X5 and X7, and m63(X3), m34(X4), m12(X2) on other edges.]
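A compact recursive version of sum-product, caching messages so that every marginal reuses them instead of recomputing. The tree shape below follows the slide layout, which is an assumption, and all potentials over binary variables are random illustrative values:

```python
import numpy as np
from functools import lru_cache

rng = np.random.default_rng(1)
nodes = [1, 2, 3, 4, 5, 6, 7]
edges = [(1, 2), (1, 3), (2, 4), (3, 5), (3, 6), (5, 7)]
node_phi = {v: rng.random(2) for v in nodes}
edge_phi = {e: rng.random((2, 2)) for e in edges}   # indexed [x_a, x_b] for edge (a, b)

adj = {v: [] for v in nodes}
for a, b in edges:
    adj[a].append(b)
    adj[b].append(a)

def pot(a, b):
    """Pairwise potential of edge {a,b}, as a matrix indexed [x_a, x_b]."""
    return edge_phi[(a, b)] if (a, b) in edge_phi else edge_phi[(b, a)].T

@lru_cache(maxsize=None)
def message(src, dst):
    """m_{src->dst}(x_dst): sum out x_src of phi(src) * phi(src,dst) times
    the messages flowing into src from all of its other neighbors."""
    belief = node_phi[src].copy()
    for other in adj[src]:
        if other != dst:
            belief = belief * message(other, src)
    return pot(src, dst).T @ belief

def marginal(v):
    belief = node_phi[v].copy()
    for u in adj[v]:
        belief = belief * message(u, v)
    return belief / belief.sum()

for v in nodes:                 # all 7 marginals from one shared set of messages
    print(v, marginal(v))
```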
Generalizations
• Junction tree algorithm
• Belief propagation
Sampling methods (chapter 12)
• Sample from the distribution
• Estimate P(y|E=e) by its fraction among the samples (for which E=e)
• How do we sample efficiently?
Sampling methods (chapter 12)
• Use Markov chains (with stationary distribution P(Y|E=e))
• Gibbs chain (sketched below)
• Metropolis-Hastings
• We discuss issues like the mixing time of the chain
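A minimal Gibbs chain for the small Markov model from the beginning of the talk: repeatedly resample each variable from its conditional given the current values of the others, discard a burn-in prefix, and read estimates off the remaining samples. The potential values repeat the earlier sketch; the chain lengths are arbitrary:

```python
import random

def phi1(a, b):
    if a and b:
        return 3.7
    if (not a) and (not b):
        return 2.1
    return 0.7

def phi2(b, c, d):
    return 2.3 if (b and c and d) else 5.1

def score(s):
    return phi1(s["A"], s["B"]) * phi2(s["B"], s["C"], s["D"])

random.seed(0)
state = {"A": 0, "B": 0, "C": 0, "D": 0}
hits = total = 0
for it in range(20000):
    for v in state:
        w = []
        for val in (0, 1):          # unnormalized conditional of v given the rest
            state[v] = val
            w.append(score(state))
        state[v] = 1 if random.random() < w[1] / (w[0] + w[1]) else 0
    if it >= 2000:                  # discard burn-in before counting
        hits += state["A"]
        total += 1
print(hits / total)                 # estimate of P(A=1)
```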
Learning (Chapters 17-20)
• Find the PGM, given samples d1, d2, …, dm from the PGM
• There are two levels of difficulty here:
– The graph structure is known, and we only estimate the factors
– We need to estimate the graph structure as well
• Sometimes values of variables in the samples are missing
Learning – techniques

• Maximum likelihood estimation:
– Find the factors that maximize the probability of sampling d1, d2, …, dm
– The problem usually decomposes for Bayesian networks (see the counting sketch below); it is harder for Markov networks
• Bayesian estimation: assume some prior on the model
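For a Bayesian network with complete data, maximum likelihood estimation decomposes into one counting problem per conditional probability table. A sketch for a single parent-child pair, with samples invented for the example:

```python
from collections import Counter

# Toy complete-data samples: (value of parent, value of child).
samples = [
    (0, 0), (0, 1), (0, 0), (1, 1), (1, 1), (1, 0), (0, 0), (1, 1),
]

joint = Counter(samples)                   # count(parent, child)
parent = Counter(p for p, _ in samples)    # count(parent)

def cpt(child, given_parent):
    """MLE of P(child | parent) = count(parent, child) / count(parent)."""
    return joint[(given_parent, child)] / parent[given_parent]

for p in (0, 1):
    print(p, [cpt(c, p) for c in (0, 1)])  # each row is a CPT entry, summing to 1
```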