Probabilistic Graphical Models seminar 15/16 (0368-4511-01)

Haim Kaplan, Tel Aviv University
What is a Probabilistic Graphical Model (PGM)?
A method to represent a joint probability distribution over a set of random variables X=(X1,…,Xn)
Explicit Representation
Huge:

X1 X2 X3 … Xn   P(X1,…,Xn)
1  0  1  1  1  0  0  0  1   1/100
PGM

Use the special structure of the distribution to get a more compact representation.

Two kinds: Bayesian networks (directed) and Markov networks (undirected).
A Bayesian Network (Chapter 3)
[Figure: an example Bayesian network.]
A Markov Model (Chapter 4)

[Figure: an undirected graph over nodes A, B, C, D.]

Potential functions defined over cliques:

$\phi_1(A,B) = \begin{cases}3.7 & \text{if } A \text{ and } B\\ 2.1 & \text{if } \neg A \text{ and } \neg B\\ 0.7 & \text{otherwise}\end{cases}$

$\phi_2(B,C,D) = \begin{cases}2.3 & \text{if } B \text{ and } C \text{ and } D\\ 5.1 & \text{otherwise}\end{cases}$
A Markov Model

[Figure: the same undirected graph over A, B, C, D.]

$P(X) = \frac{1}{Z}\prod_c \phi_c(X_c)$, where $Z = \sum_X \prod_c \phi_c(X_c)$.
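As a quick illustration, the two potentials above can be multiplied and normalized by brute force. A minimal sketch in Python, assuming all four variables are binary and that the middle case of $\phi_1$ is "neither A nor B":

```python
from itertools import product

# Clique potentials from the slides' example.
def phi1(a, b):
    if a and b:
        return 3.7
    if (not a) and (not b):
        return 2.1
    return 0.7

def phi2(b, c, d):
    return 2.3 if (b and c and d) else 5.1

# Unnormalized measure over all binary assignments of (A, B, C, D).
def score(a, b, c, d):
    return phi1(a, b) * phi2(b, c, d)

Z = sum(score(*x) for x in product([0, 1], repeat=4))

def P(a, b, c, d):
    """P(A,B,C,D) = phi1(A,B) * phi2(B,C,D) / Z."""
    return score(a, b, c, d) / Z

print(Z, P(1, 1, 1, 1))
```

Enumerating all $2^4$ assignments only works because the example is tiny; avoiding this blow-up is what the inference techniques below are about.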
Inference (chapters 9-13)
• Given a PGM we want to compute:
– Conditional probability query: P(Y|E=e), the probability distribution on the values of Y given that E=e (a brute-force version is sketched below).
– Maximum a posteriori (MAP): $\mathrm{argmax}_y P(y|E=e)$, the assignment y to Y that maximizes P(y|E=e). (This is the Most Probable Explanation (MPE) query when Y consists of all remaining variables.)
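For concreteness, here is a brute-force version of the conditional probability query; the three binary variables and the joint table are invented for the example:

```python
# P(Y | E=e) as the normalized slice of the joint over assignments
# consistent with E=e, summing out the remaining variable W.
def joint(y, e, w):
    table = {(0, 0, 0): .10, (0, 0, 1): .05, (0, 1, 0): .20, (0, 1, 1): .05,
             (1, 0, 0): .05, (1, 0, 1): .15, (1, 1, 0): .10, (1, 1, 1): .30}
    return table[(y, e, w)]

def cond_query(e):
    """Return P(Y | E=e) by summing out W and renormalizing."""
    unnorm = [sum(joint(y, e, w) for w in (0, 1)) for y in (0, 1)]
    z = sum(unnorm)
    return [u / z for u in unnorm]

print(cond_query(1))   # distribution over Y given E=1
```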
Examples
Hidden Markov models for the parts of speech problem
• Xi's value is a part of speech (noun, verb, preposition, …)
• Oi’s value is a word
Hidden Markov models for the parts of speech problem
• We observe the Oi's and would like to compute the assignment to the Xi's that maximizes P(Xi's | Oi's), as in the sketch below.
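The standard way to compute this maximization is the Viterbi dynamic program. A minimal sketch; the three-tag state space, the transition matrix, the emission table, and the two-word vocabulary are all toy values, not from the slides:

```python
import numpy as np

states = ["noun", "verb", "prep"]
start = np.array([0.5, 0.3, 0.2])        # P(X1)
trans = np.array([[0.3, 0.5, 0.2],       # trans[i][j]: P(X_{t+1}=j | X_t=i)
                  [0.6, 0.1, 0.3],
                  [0.8, 0.1, 0.1]])
emit = np.array([[0.7, 0.3],             # emit[s][w]: P(O_t=w | X_t=s)
                 [0.4, 0.6],
                 [0.5, 0.5]])

def viterbi(obs):
    n, k = len(obs), len(states)
    delta = np.zeros((n, k))       # best log-prob of a path ending in each state
    back = np.zeros((n, k), int)   # backpointers for reconstructing the path
    delta[0] = np.log(start) + np.log(emit[:, obs[0]])
    for t in range(1, n):
        scores = delta[t - 1][:, None] + np.log(trans)
        back[t] = scores.argmax(axis=0)
        delta[t] = scores.max(axis=0) + np.log(emit[:, obs[t]])
    path = [int(delta[-1].argmax())]
    for t in range(n - 1, 0, -1):
        path.append(int(back[t][path[-1]]))
    return [states[s] for s in reversed(path)]

print(viterbi([0, 1, 0]))   # MAP tag sequence for a 3-word sentence
```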
Phylogenetic trees
Pedigrees

Phenotype vs Genotype

Pedigrees

[Figures: pedigree graphs over individuals M1, M2, F1, F2, O1, O2.]
More than one gene

Haplotype variables form a hidden Markov model

[Figures: the pedigree over M1, M2, F1, F2, O1, O2, repeated per gene.]
More than one individual

[Figure: allele variables A1, A2 shared across individuals.]
Computer vision applications – examples

• Image segmentation (2D, 3D)
• Stereo – depth estimation
• There are many others…
Image segmentation
• Separate foreground (object) from background
Image segmentation

• Each pixel is a vertex
• A vertex connects to its neighbors
Image segmentation

• For every pixel we have a factor Ф(X): we determine Ф(b) and Ф(f).
Image segmentation

• For every pair of adjacent pixels X, Y we have a factor Ф(X,Y): we determine Ф(b,b), Ф(f,f), Ф(b,f), Ф(f,b).
Image segmentation

• We perform a MAP query:

$\mathrm{argmax}_{\{f,b\}\text{-assignments}}\ P(X=b,\ Y=f,\ \dots)$
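A brute-force rendition of this query on a 2x2 image, with invented unary and pairwise potentials; real segmentation systems solve the same MAP problem with specialized algorithms (e.g. graph cuts) rather than enumeration:

```python
from itertools import product

pixels = [(0, 0), (0, 1), (1, 0), (1, 1)]
edges = [((0, 0), (0, 1)), ((0, 0), (1, 0)),
         ((0, 1), (1, 1)), ((1, 0), (1, 1))]

# Unary potentials phi_p(label), e.g. from a per-pixel color model.
unary = {(0, 0): {"f": 0.9, "b": 0.1},
         (0, 1): {"f": 0.8, "b": 0.2},
         (1, 0): {"f": 0.3, "b": 0.7},
         (1, 1): {"f": 0.2, "b": 0.8}}

# Pairwise potentials phi_{p,q}: reward neighbors that agree.
def pairwise(l1, l2):
    return 2.0 if l1 == l2 else 0.5

def score(assignment):
    s = 1.0
    for p in pixels:
        s *= unary[p][assignment[p]]
    for p, q in edges:
        s *= pairwise(assignment[p], assignment[q])
    return s

# Enumerate all 2^4 labelings and keep the best (MAP) one.
best = max((dict(zip(pixels, labs)) for labs in product("fb", repeat=4)),
           key=score)
print(best)
```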
3D segmentation

• Consecutive video frames are adjacent layers in a 3D grid.
Stereo vision – computing depths

[Figures: a left and a right camera image two real-world points; point i projects to position xi in the left image (image 1) and yi in the right image (image 2), giving disparity_i = xi − yi.]
Stereo vision – computing depths

• Disparity is usually a small number of pixels
• We want to label each pixel with its disparity
• This is a multi-label problem
Stereo vision – computing depths

• Compute Фp(d) for each pixel p: how likely is p to have disparity d? (See the sketch below.)
• Compute Фp,q(d1,d2) for each pair of adjacent pixels p, q: how likely are p and q to have disparities d1 and d2, respectively?
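One plausible way to fill in Фp(d), sketched on toy one-dimensional scanlines: score each candidate disparity by the matching cost between the left-image pixel and the correspondingly shifted right-image pixel. The exponential cost-to-potential mapping and the scanline values are assumptions for illustration:

```python
import numpy as np

left = np.array([10., 20., 30., 40., 50.])    # toy 1-D left scanline
right = np.array([20., 30., 40., 50., 60.])   # toy 1-D right scanline
max_disp = 2

def phi(p, d):
    """Unary potential: how likely is pixel p to have disparity d."""
    if p - d < 0:
        return 0.0                             # shifted pixel falls off the image
    cost = abs(left[p] - right[p - d])         # intensity matching cost
    return np.exp(-cost / 10.0)                # low cost -> high potential

for d in range(max_disp + 1):
    print(d, phi(3, d))                        # d=1 matches best on this toy data
```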
Interactions between proteins

[Figure: factor graph over interaction variables I(p1,p2), I(p1,p3), I(p2,p3) and localization variables L(p1,S), L(p2,S), L(p3,S).]

P(I(p1,p2), I(p1,p3), I(p2,p3), L(p1,S), L(p2,S), L(p3,S)) =
Ф(I(p1,p2)) × Ф(I(p1,p3)) × Ф(I(p2,p3)) ×
Ф(I(p1,p2), L(p1,S), L(p2,S)) × Ф(I(p1,p3), L(p1,S), L(p3,S)) × Ф(I(p2,p3), L(p2,S), L(p3,S))
Interactions between proteins

[Figure: the same factor graph, extended with localization variables L(p1,A), L(p2,A), L(p3,A).]

P(I(p1,p2), I(p1,p3), I(p2,p3), L(p1,S), L(p2,S), L(p3,S), L(p1,A), L(p2,A), L(p3,A)) =
Ф(I(p1,p2)) × Ф(I(p1,p3)) × Ф(I(p2,p3)) ×
Ф(I(p1,p2), L(p1,S), L(p2,S)) × Ф(I(p1,p3), L(p1,S), L(p3,S)) × Ф(I(p2,p3), L(p2,S), L(p3,S)) ×
Ф(I(p1,p2), L(p1,A), L(p2,A)) × Ф(I(p1,p3), L(p1,A), L(p3,A)) × Ф(I(p2,p3), L(p2,A), L(p3,A))
Interactions between proteins

[Figure: the same factor graph, further extended with experiment variables Ex1(p2,p3), Ex2(p2,p3), Ex2(p1,p3).]

P(I(p1,p2), …, L(p3,A), Ex1(p2,p3), Ex2(p2,p3), Ex2(p1,p3)) =
Ф(I(p1,p2)) × Ф(I(p1,p3)) × Ф(I(p2,p3)) ×
Ф(I(p1,p2), L(p1,S), L(p2,S)) × Ф(I(p1,p3), L(p1,S), L(p3,S)) × Ф(I(p2,p3), L(p2,S), L(p3,S)) ×
Ф(I(p1,p2), L(p1,A), L(p2,A)) × Ф(I(p1,p3), L(p1,A), L(p3,A)) × Ф(I(p2,p3), L(p2,A), L(p3,A)) ×
Ф(I(p2,p3), Ex1(p2,p3)) × Ф(I(p2,p3), Ex2(p2,p3)) × Ф(I(p1,p3), Ex2(p1,p3))
Inference (chapters 9-13)
• Given a PGM we want to compute:
– Conditional probability query: P(Y|E=e), the probability distribution on the values of Y given that E=e.
– Maximum a posteriori (MAP): $\mathrm{argmax}_y P(y|E=e)$, the assignment y to Y that maximizes P(y|E=e). (This is the Most Probable Explanation (MPE) query when Y consists of all remaining variables.)
Complexity (chapter 9)
• These problems are NP-hard, sometimes even to approximate.
Exact Solutions

An example:

$P(X_1,X_2,X_3,X_4,X_5,X_6) = \frac{1}{Z}\,\phi(X_1,X_2)\,\phi(X_1,X_3)\,\phi(X_2,X_4)\,\phi(X_3,X_5)\,\phi(X_2,X_5,X_6)$
Exact Solutions – variable elimination

• P(X1)?

$P(X_1,\dots,X_6) = \frac{1}{Z}\,\phi(X_1,X_2)\,\phi(X_1,X_3)\,\phi(X_2,X_4)\,\phi(X_3,X_5)\,\phi(X_2,X_5,X_6)$
Exact Solutions – variable elimination

$P(X_1) = \frac{1}{Z}\sum_{X_2}\sum_{X_3}\sum_{X_4}\sum_{X_5}\sum_{X_6}\phi(X_1,X_2)\,\phi(X_1,X_3)\,\phi(X_2,X_4)\,\phi(X_3,X_5)\,\phi(X_2,X_5,X_6)$
Explicit Computation
Sum all rows with X1=1 and all rows with X1=0
X1 X2 X3 … Xn   Ф(X1,…,Xn)
1  0  1  1  1  0  0  0  1   13
Variable elimination

$P(X_1) = \frac{1}{Z}\sum_{X_2,X_3,X_4}\phi(X_1,X_2)\,\phi(X_1,X_3)\,\phi(X_2,X_4)\sum_{X_5}\phi(X_3,X_5)\sum_{X_6}\phi(X_2,X_5,X_6)$

Summing out X6 replaces the table for Ф(X2,X5,X6) with a smaller table m(X2,X5).
Variable elimination

$P(X_1) = \frac{1}{Z}\sum_{X_2,X_3,X_4}\phi(X_1,X_2)\,\phi(X_1,X_3)\,\phi(X_2,X_4)\sum_{X_5}\phi(X_3,X_5)\sum_{X_6}\phi(X_2,X_5,X_6)$

$= \frac{1}{Z}\sum_{X_2,X_3,X_4}\phi(X_1,X_2)\,\phi(X_1,X_3)\,\phi(X_2,X_4)\sum_{X_5}\phi(X_3,X_5)\,m(X_2,X_5)$
Variable elimination

$P(X_1) = \frac{1}{Z}\sum_{X_2,X_3,X_4}\phi(X_1,X_2)\,\phi(X_1,X_3)\,\phi(X_2,X_4)\sum_{X_5}\phi(X_3,X_5)\,m(X_2,X_5)$

$= \frac{1}{Z}\sum_{X_2,X_3,X_4}\phi(X_1,X_2)\,\phi(X_1,X_3)\,\phi(X_2,X_4)\,m'(X_2,X_3)$

$= \frac{1}{Z}\sum_{X_2,X_3}\phi(X_1,X_2)\,\phi(X_1,X_3)\,m'(X_2,X_3)\sum_{X_4}\phi(X_2,X_4)$

$= \frac{1}{Z}\sum_{X_2,X_3}\phi(X_1,X_2)\,\phi(X_1,X_3)\,m'(X_2,X_3)\,m''(X_2)$

where $m'(X_2,X_3)=\sum_{X_5}\phi(X_3,X_5)\,m(X_2,X_5)$ and $m''(X_2)=\sum_{X_4}\phi(X_2,X_4)$.
Variable elimination

$P(X_1) = \frac{1}{Z}\sum_{X_2,X_3}\phi(X_1,X_2)\,\phi(X_1,X_3)\,m'(X_2,X_3)\,m''(X_2)$

$= \frac{1}{Z}\sum_{X_2}\phi(X_1,X_2)\,m''(X_2)\sum_{X_3}\phi(X_1,X_3)\,m'(X_2,X_3)$

$= \frac{1}{Z}\sum_{X_2}\phi(X_1,X_2)\,m''(X_2)\,m'''(X_1,X_2)$

$= \frac{1}{Z}\,m''''(X_1)$
If the variables are binary, the temporary tables here have size $2^3$, whereas the full joint table has size $2^6$.
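The derivation above can be replayed numerically. A minimal sketch with random binary factors, using numpy's einsum to sum out X6, X5, and X4 in the same order as the m, m', m'' steps, and checking the result against brute-force marginalization (the factor values are random, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(0)

# Random positive factors matching the slides' factorization:
# phi(X1,X2) phi(X1,X3) phi(X2,X4) phi(X3,X5) phi(X2,X5,X6), all binary.
f12 = rng.random((2, 2)); f13 = rng.random((2, 2))
f24 = rng.random((2, 2)); f35 = rng.random((2, 2))
f256 = rng.random((2, 2, 2))

m25 = f256.sum(axis=2)                  # m(X2,X5)   = sum_X6 phi(X2,X5,X6)
m23 = np.einsum("ce,be->bc", f35, m25)  # m'(X2,X3)  = sum_X5 phi(X3,X5) m(X2,X5)
m2 = f24.sum(axis=1)                    # m''(X2)    = sum_X4 phi(X2,X4)

# Sum out X2 and X3, then normalize (normalization plays the role of 1/Z).
p1 = np.einsum("ab,ac,bc,b->a", f12, f13, m23, m2)
p1 /= p1.sum()

# Sanity check: marginalize the explicit joint over all 2^6 assignments.
joint = np.einsum("ab,ac,bd,ce,bef->abcdef", f12, f13, f24, f35, f256)
check = joint.sum(axis=(1, 2, 3, 4, 5))
print(p1, check / check.sum())          # the two vectors agree
```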
Elimination order
• There are many possible elimination orders
• Which one do we want to pick?
• When we eliminate a variable X, how large is the intermediate table that we create?
Elimination order
[Figure: a variable X and its neighbors in the graph.]
Elimination order
• The number of variables that appear in factors together with X equals the number of neighbors of X.
Elimination order
• To make the graph reflect the factors that remain after eliminating X, we must turn the neighbors of X into a clique.
• We want an elimination order that does not generate large cliques.
• Finding the optimal order is NP-hard.
• There are good heuristics (one is sketched below).
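One standard heuristic is greedy min-fill: repeatedly eliminate the variable whose neighbors need the fewest new edges to become a clique. A sketch (this particular code is illustrative, not from the slides), run on the graph of the running example:

```python
def min_fill_order(adj):
    """adj: dict mapping each variable to the set of its neighbors."""
    adj = {v: set(ns) for v, ns in adj.items()}   # work on a copy
    order = []
    while adj:
        def fill(v):
            # Number of missing edges among v's neighbors.
            ns = list(adj[v])
            return sum(1 for i in range(len(ns)) for j in range(i + 1, len(ns))
                       if ns[j] not in adj[ns[i]])
        v = min(adj, key=fill)                    # cheapest variable to eliminate
        ns = list(adj[v])
        for i in range(len(ns)):                  # connect v's neighbors into a clique
            for j in range(i + 1, len(ns)):
                adj[ns[i]].add(ns[j])
                adj[ns[j]].add(ns[i])
        for u in ns:                              # remove v from the graph
            adj[u].discard(v)
        del adj[v]
        order.append(v)
    return order

# The graph of the running example: phi(X1,X2) phi(X1,X3) phi(X2,X4)
# phi(X3,X5) phi(X2,X5,X6).
graph = {1: {2, 3}, 2: {1, 4, 5, 6}, 3: {1, 5}, 4: {2}, 5: {2, 3, 6}, 6: {2, 5}}
print(min_fill_order(graph))
```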
Suppose the graph is a tree

[Figure: a tree with root X1, children X2 and X3, then X4, X5, X6, and leaf X7 below X5.]

$m_{75}(X_5) = \sum_{X_7}\phi(X_5,X_7)\,\phi(X_7)$
Suppose the graph is a tree

[Figure: the tree after eliminating X7.]

$m_{75}(X_5) = \sum_{X_7}\phi(X_5,X_7)\,\phi(X_7)$

$m_{53}(X_3) = \sum_{X_5}\phi(X_3,X_5)\,\phi(X_5)\,m_{75}(X_5)$
Suppose the graph is a tree

[Figure: the tree after eliminating X7 and X5.]

$m_{53}(X_3) = \sum_{X_5}\phi(X_3,X_5)\,\phi(X_5)\,m_{75}(X_5)$

$m_{63}(X_3) = \sum_{X_6}\phi(X_3,X_6)\,\phi(X_6)$
Suppose the graph is a tree

[Figure: the tree after eliminating X7, X5, and X6.]

$m_{53}(X_3) = \sum_{X_5}\phi(X_3,X_5)\,\phi(X_5)\,m_{75}(X_5)$

$m_{63}(X_3) = \sum_{X_6}\phi(X_3,X_6)\,\phi(X_6)$
Computing many marginals?

• P(X1), P(X2), P(X3), … ?
• We want to recycle parts of the computation.
Sum-product algorithm

[Figure: the tree over X1, …, X7 with messages passed in both directions along each edge, e.g. m75(X5) and m57(X7) on the edge between X5 and X7, and m63(X3), m34(X4), m12(X2) on other edges.]
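A compact recursive version of sum-product, caching messages so that every marginal reuses them instead of recomputing. The tree shape below follows the slide layout, which is an assumption, and all potentials over binary variables are random illustrative values:

```python
import numpy as np
from functools import lru_cache

rng = np.random.default_rng(1)
nodes = [1, 2, 3, 4, 5, 6, 7]
edges = [(1, 2), (1, 3), (2, 4), (3, 5), (3, 6), (5, 7)]
node_phi = {v: rng.random(2) for v in nodes}
edge_phi = {e: rng.random((2, 2)) for e in edges}   # indexed [x_a, x_b] for edge (a, b)

adj = {v: [] for v in nodes}
for a, b in edges:
    adj[a].append(b)
    adj[b].append(a)

def pot(a, b):
    """Pairwise potential of edge {a,b}, as a matrix indexed [x_a, x_b]."""
    return edge_phi[(a, b)] if (a, b) in edge_phi else edge_phi[(b, a)].T

@lru_cache(maxsize=None)
def message(src, dst):
    """m_{src->dst}(x_dst): sum out x_src of phi(src) * phi(src,dst) times
    the messages flowing into src from all of its other neighbors."""
    belief = node_phi[src].copy()
    for other in adj[src]:
        if other != dst:
            belief = belief * message(other, src)
    return pot(src, dst).T @ belief

def marginal(v):
    belief = node_phi[v].copy()
    for u in adj[v]:
        belief = belief * message(u, v)
    return belief / belief.sum()

for v in nodes:                 # all 7 marginals from one shared set of messages
    print(v, marginal(v))
```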
Generalizations
• Junction tree algorithm
• Belief propagation
Sampling methods (chapter 12)
• Sample from the distribution
• Estimate P(y|E=e) by its fraction among the samples (for which E=e)
• How do we sample efficiently?
Sampling methods (chapter 12)
• Use Markov chains (with stationary distribution P(Y|E=e))
• Gibbs chain (sketched below)
• Metropolis-Hastings
• We discuss issues like the mixing time of the chain
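A minimal Gibbs chain for the small Markov model from the beginning of the talk: repeatedly resample each variable from its conditional given the current values of the others, discard a burn-in prefix, and read estimates off the remaining samples. The potential values repeat the earlier sketch; the chain lengths are arbitrary:

```python
import random

def phi1(a, b):
    if a and b:
        return 3.7
    if (not a) and (not b):
        return 2.1
    return 0.7

def phi2(b, c, d):
    return 2.3 if (b and c and d) else 5.1

def score(s):
    return phi1(s["A"], s["B"]) * phi2(s["B"], s["C"], s["D"])

random.seed(0)
state = {"A": 0, "B": 0, "C": 0, "D": 0}
hits = total = 0
for it in range(20000):
    for v in state:
        w = []
        for val in (0, 1):          # unnormalized conditional of v given the rest
            state[v] = val
            w.append(score(state))
        state[v] = 1 if random.random() < w[1] / (w[0] + w[1]) else 0
    if it >= 2000:                  # discard burn-in before counting
        hits += state["A"]
        total += 1
print(hits / total)                 # estimate of P(A=1)
```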
Learning (Chapters 17-20)
• Find the PGM, given samples d1, d2, …, dm from the PGM
• There are two levels of difficulty here:
– The graph structure is known, and we only estimate the factors
– We need to estimate the graph structure as well
• Sometimes values of variables in the samples are missing
Learning – techniques

• Maximum likelihood estimation:
– Find the factors that maximize the probability of sampling d1, d2, …, dm
– The problem usually decomposes for Bayesian networks (see the counting sketch below); it is harder for Markov networks
• Bayesian estimation: assume some prior on the model
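For a Bayesian network with complete data, maximum likelihood estimation decomposes into one counting problem per conditional probability table. A sketch for a single parent-child pair, with samples invented for the example:

```python
from collections import Counter

# Toy complete-data samples: (value of parent, value of child).
samples = [
    (0, 0), (0, 1), (0, 0), (1, 1), (1, 1), (1, 0), (0, 0), (1, 1),
]

joint = Counter(samples)                   # count(parent, child)
parent = Counter(p for p, _ in samples)    # count(parent)

def cpt(child, given_parent):
    """MLE of P(child | parent) = count(parent, child) / count(parent)."""
    return joint[(given_parent, child)] / parent[given_parent]

for p in (0, 1):
    print(p, [cpt(c, p) for c in (0, 1)])  # each row is a CPT entry, summing to 1
```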