
Probabilistic Inference Lecture 5

M. Pawan Kumar (pawan.kumar@ecp.fr)

Slides available online http://cvc.centrale-ponts.fr/personnel/pawan/

What to Expect in the Final Exam

• Open Book
  – Textbooks
  – Research Papers
  – Course Slides
  – No Electronic Devices

• Easy Questions – 10 points

• Hard Questions – 10 points

Easy Question – BP

Compute the reparameterization constants for (a,b) and (c,b) such that the unary potentials of b are equal to its min-marginals.

[Figure: example MRF over variables Va, Vb, Vc with given unary and pairwise potential values]

Hard Question – BP

Provide an O(h) algorithm to compute the reparameterization constants of BP for an edge whose pairwise potentials are specified by a truncated linear model.

Easy Question – Minimum Cut

Provide the graph corresponding to the MAP estimation problem in the following MRF.

[Figure: the same example MRF over Va, Vb, Vc with its unary and pairwise potential values]

Hard Question – Minimum Cut

Show that the expansion algorithm provides a bound of 2M for the truncated linear metric, where M is the value of the truncation.

Easy Question – Relaxations

Using an example, show that the LP-S relaxation is not tight for a frustrated cycle (a cycle with an odd number of supermodular pairwise potentials).

Hard Question – Relaxations

Prove or disprove that the LP-S and SOCP-MS relaxations are invariant to reparameterization.

Recap

Integer Programming Formulation

min ∑a ∑i θa;i ya;i + ∑(a,b) ∑ik θab;ik yab;ik

ya;i ∈ {0,1}

∑i ya;i = 1

yab;ik = ya;i yb;k

Integer Programming Formulation

min θᵀy

ya;i ∈ {0,1}

∑i ya;i = 1

yab;ik = ya;i yb;k

θ = [ … θa;i … ; … θab;ik … ]    y = [ … ya;i … ; … yab;ik … ]
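To make the encoding concrete, here is a small sketch (my own construction, with made-up potential values) of how a labelling corresponds to the indicator variables y and how its energy is the inner product θᵀy:

```python
# A small sketch (my own construction, example values) of the indicator
# encoding: ya;i = 1 iff variable a takes label i, yab;ik = ya;i * yb;k,
# and the energy of a labelling f is the inner product theta^T y.
import numpy as np

unary = {"a": np.array([2.0, 5.0]), "b": np.array([4.0, 2.0])}
pair = {("a", "b"): np.array([[0.0, 3.0], [3.0, 0.0]])}

def energy(f):
    """Energy theta^T y of the labelling f (a dict: variable -> label)."""
    e = sum(unary[v][f[v]] for v in unary)
    e += sum(p[f[u], f[v]] for (u, v), p in pair.items())
    return float(e)

print(energy({"a": 0, "b": 1}))  # 2 + 2 + 3 = 7.0
```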

Linear Programming Relaxation

min θᵀy

ya;i ∈ {0,1}

∑i ya;i = 1

yab;ik = ya;i yb;k

Two reasons why we can’t solve this

Linear Programming Relaxation

min θᵀy

ya;i ∈ [0,1]

∑i ya;i = 1

yab;ik = ya;i yb;k

One reason why we can’t solve this

Linear Programming Relaxation

min θᵀy

ya;i ∈ [0,1]

∑i ya;i = 1

∑k yab;ik = ∑k ya;i yb;k

One reason why we can’t solve this

Linear Programming Relaxation

min θᵀy

ya;i ∈ [0,1]

∑i ya;i = 1

One reason why we can’t solve this

∑k yab;ik = ya;i ∑k yb;k   (and ∑k yb;k = 1)

Linear Programming Relaxation

min θᵀy

ya;i ∈ [0,1]

∑i ya;i = 1

∑k yab;ik = ya;i

One reason why we can’t solve this

Linear Programming Relaxation

min θᵀy

ya;i ∈ [0,1]

∑i ya;i = 1

∑k yab;ik = ya;i

No reason why we can’t solve this *

*memory requirements, time complexity
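As a sanity check of the final relaxation, here is a minimal sketch that builds and solves LP-S for a two-variable, two-label MRF with scipy.optimize.linprog. The potential values are made up, and I include the marginalization constraint in both directions (∑k yab;ik = ya;i and ∑i yab;ik = yb;k), which is how LP-S is usually stated; the slides show the Va-side constraint:

```python
# A minimal sketch of LP-S for a 2-variable, 2-label MRF, solved with
# scipy.optimize.linprog. Variable layout (my choice):
#   y = [ya;0, ya;1, yb;0, yb;1, yab;00, yab;01, yab;10, yab;11]
import numpy as np
from scipy.optimize import linprog

theta = np.array([2.0, 5.0,             # unary potentials of Va (example)
                  4.0, 2.0,             # unary potentials of Vb
                  0.0, 3.0, 3.0, 0.0])  # pairwise potentials, row-major

A_eq = np.array([
    [ 1,  1,  0,  0, 0, 0, 0, 0],  # ya;0 + ya;1 = 1
    [ 0,  0,  1,  1, 0, 0, 0, 0],  # yb;0 + yb;1 = 1
    [-1,  0,  0,  0, 1, 1, 0, 0],  # yab;00 + yab;01 = ya;0
    [ 0, -1,  0,  0, 0, 0, 1, 1],  # yab;10 + yab;11 = ya;1
    [ 0,  0, -1,  0, 1, 0, 1, 0],  # yab;00 + yab;10 = yb;0
    [ 0,  0,  0, -1, 0, 1, 0, 1],  # yab;01 + yab;11 = yb;1
])
b_eq = np.array([1, 1, 0, 0, 0, 0])

res = linprog(theta, A_eq=A_eq, b_eq=b_eq, bounds=[(0, 1)] * 8)
print(res.fun, res.x)  # LP lower bound and relaxed (pseudo)marginals
```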

Dual of the LP Relaxation (Wainwright et al., 2001)

[Figure: a 4-connected 3×3 grid MRF over Va … Vi decomposed into six trees: row chains 1, 2, 3 and column chains 4, 5, 6]

θ = ∑i θi

Dual of the LP Relaxation (Wainwright et al., 2001)

[Figure: each tree i has its own parameters θi; running BP on tree i gives its optimal energy q*(θi)]

θ = ∑i θi

Dual of LP: max ∑i q*(θi)


Dual of the LP Relaxation (Wainwright et al., 2001)

max ∑i q*(θi)

I can easily compute q*(θi)

I can easily maintain the reparam constraint ∑i θi = θ

So can I easily solve the dual?

Outline

• TRW Message Passing

• Dual Decomposition

Things to Remember

• Forward-pass computes min-marginals of root (see the sketch below)

• BP is exact for trees

• Every iteration provides a reparameterization
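As an illustration of the first two points, here is a minimal sketch (my naming and example values) of the min-sum forward pass on a chain, which returns the exact min-marginals of the last node:

```python
# A minimal sketch (my naming, example values) of the min-sum forward
# pass on a chain V0 - V1 - ... - V(n-1): messages accumulate left to
# right, so the last node (the root) ends up with its min-marginals.
import numpy as np

def chain_min_marginals(unary, pair):
    """unary: list of (L,) arrays; pair: list of (L, L) arrays, one per edge."""
    msg = np.zeros_like(unary[0])
    for a in range(len(pair)):
        # Message to V_{a+1}: minimize over the label i of V_a.
        msg = np.min((unary[a] + msg)[:, None] + pair[a], axis=0)
    return unary[-1] + msg  # min-marginals of the root

unary = [np.array([2.0, 5.0]), np.array([4.0, 2.0]), np.array([6.0, 3.0])]
pair = [np.array([[0.0, 3.0], [3.0, 0.0]])] * 2
print(chain_min_marginals(unary, pair))  # [12. 10.] -> MAP energy is 10
```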

TRW Message Passing (Kolmogorov, 2006)

[Figure: the 3×3 grid MRF decomposed into row trees 1–3 and column trees 4–6; dual objective ∑i q*(θi)]

Pick a variable Va

TRW Message Passing (Kolmogorov, 2006)

∑i q*(θi)

[Figure: the two trees containing Va, drawn as chains Vc–Vb–Va (tree 1, unary potentials θ1c;0, θ1c;1, θ1b;0, θ1b;1, θ1a;0, θ1a;1) and Va–Vd–Vg (tree 4, unary potentials θ4a;0, θ4a;1, θ4d;0, θ4d;1, θ4g;0, θ4g;1)]

TRW Message Passing (Kolmogorov, 2006)

θ1 + θ4 + θrest        q*(θ1) + q*(θ4) + K, where K is the contribution of the remaining trees

Reparameterize to obtain the min-marginals of Va

[Figure: chains Vc–Vb–Va and Va–Vd–Vg with the unary potentials of trees 1 and 4]

TRW Message Passing (Kolmogorov, 2006)

θ'1 + θ'4 + θrest        q*(θ'1) + q*(θ'4) + K

One pass of Belief Propagation

[Figure: the chains after one BP pass, with reparameterized unary potentials θ'1c;0 … θ'1a;1 and θ'4a;0 … θ'4g;1]

TRW Message Passing (Kolmogorov, 2006)

θ'1 + θ'4 + θrest        q*(θ'1) + q*(θ'4) + K

The dual terms remain the same, since BP only reparameterizes each tree

TRW Message Passing (Kolmogorov, 2006)

θ'1 + θ'4 + θrest        min{θ'1a;0, θ'1a;1} + min{θ'4a;0, θ'4a;1} + K

[Figure: Va's unary potentials now equal its min-marginals in each tree]

TRW Message Passing (Kolmogorov, 2006)

θ'1 + θ'4 + θrest        min{θ'1a;0, θ'1a;1} + min{θ'4a;0, θ'4a;1} + K

Compute the average of the min-marginals of Va

TRW Message Passing (Kolmogorov, 2006)

θ'1 + θ'4 + θrest        min{θ'1a;0, θ'1a;1} + min{θ'4a;0, θ'4a;1} + K

θ''a;0 = (θ'1a;0 + θ'4a;0) / 2        θ''a;1 = (θ'1a;1 + θ'4a;1) / 2

TRW Message Passing (Kolmogorov, 2006)

θ''1 + θ''4 + θrest        min{θ'1a;0, θ'1a;1} + min{θ'4a;0, θ'4a;1} + K

[Figure: both chains now carry the averaged unary potentials θ''a;0, θ''a;1 at Va]

θ''a;0 = (θ'1a;0 + θ'4a;0) / 2        θ''a;1 = (θ'1a;1 + θ'4a;1) / 2


TRW Message Passing (Kolmogorov, 2006)

θ''1 + θ''4 + θrest        2 min{θ''a;0, θ''a;1} + K

θ''a;0 = (θ'1a;0 + θ'4a;0) / 2        θ''a;1 = (θ'1a;1 + θ'4a;1) / 2

TRW Message Passing (Kolmogorov, 2006)

θ''1 + θ''4 + θrest        2 min{θ''a;0, θ''a;1} + K

min{p1 + p2, q1 + q2} ≥ min{p1, q1} + min{p2, q2}

With p1 = θ'1a;0, p2 = θ'4a;0, q1 = θ'1a;1, q2 = θ'4a;1 this gives 2 min{θ''a;0, θ''a;1} + K ≥ min{θ'1a;0, θ'1a;1} + min{θ'4a;0, θ'4a;1} + K.
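A quick numerical check of this inequality (my numbers): take p1 = 1, q1 = 3, p2 = 5, q2 = 2. Then min{p1 + p2, q1 + q2} = min{6, 5} = 5, while min{p1, q1} + min{p2, q2} = 1 + 2 = 3. The coupled minimum forces both trees to commit to the same label for Va, so it can never fall below the sum of the two independent minima; this is exactly why averaging cannot decrease the dual.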

TRW Message Passing (Kolmogorov, 2006)

θ''1 + θ''4 + θrest        2 min{θ''a;0, θ''a;1} + K

Objective function increases or remains constant

TRW Message Passing (Kolmogorov, 2006)

Initialize the θi, taking care of the reparam constraint ∑i θi = θ

Choose a random variable Va

Compute the min-marginals of Va for all trees

Node-average the min-marginals

REPEAT

Can also do edge-averaging
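A minimal sketch of the node-averaging step (my naming and example values, not Kolmogorov's TRW-S code), assuming each tree was already reparameterized so that Va's unary potentials equal its min-marginals in that tree:

```python
# Node averaging for a variable Va shared by trees 1 and 4; both trees
# receive theta''a = the average of Va's min-marginals.
import numpy as np

def node_average(theta1_a, theta4_a):
    """Return the averaged unary potentials theta''a for both trees."""
    avg = (theta1_a + theta4_a) / 2.0
    return avg.copy(), avg.copy()

t1 = np.array([7.0, 5.0])   # example min-marginals of Va in tree 1
t4 = np.array([6.0, 10.0])  # example min-marginals of Va in tree 4

before = t1.min() + t4.min()     # dual contribution before: 11.0
n1, n4 = node_average(t1, t4)
after = n1.min() + n4.min()      # after: 2 * min{6.5, 7.5} = 13.0
print(before, after)             # never decreases (the min inequality)
```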

Example 1

[Figure: three chain subproblems Va–Vb, Vb–Vc, Vc–Va over labels l0 and l1, with example unary and pairwise potential values]

Pick variable Va. Reparameterize.

Example 1

[Figure: the potentials after reparameterizing the chains that contain Va]

Average the min-marginals of Va

Example 1

[Figure: the potentials after averaging; both chains now assign Va the unary potentials 7.5 and 7]

Pick variable Vb. Reparameterize.

Example 1

[Figure: the potentials after reparameterizing the chains that contain Vb]

Average the min-marginals of Vb

Example 1

[Figure: the potentials after averaging; both chains now assign Vb the unary potentials 8.75 and 6.5]

Value of dual does not increase

Example 1

[Figure: the same potentials]

Maybe it will increase for Vc

NO

Example 1

[Figure: the final potentials]

Strong Tree Agreement

Exact MAP Estimate

f1(a) = 0  f1(b) = 0  f2(b) = 0  f2(c) = 0  f3(c) = 0  f3(a) = 0

Example 2

[Figure: three chain subproblems Va–Vb, Vb–Vc, Vc–Va over labels l0 and l1, with a different set of example potential values]

Pick variable Va. Reparameterize.

Example 2

[Figure: the potentials after reparameterizing the chains that contain Va]

Average the min-marginals of Va

Example 2

[Figure: the potentials after averaging the min-marginals of Va]

Value of dual does not increase

Example 2

[Figure: the same potentials]

Maybe it will increase for Vb or Vc

NO

Example 2

[Figure: the final potentials]

f1(a) = 1  f1(b) = 1  f2(b) = 1  f2(c) = 0  f3(c) = 1  f3(a) = 1

f2(b) = 0  f2(c) = 1

Weak Tree Agreement

Not Exact MAP Estimate

Example 2

[Figure: the final potentials]

f1(a) = 1  f1(b) = 1  f2(b) = 1  f2(c) = 0  f3(c) = 1  f3(a) = 1

f2(b) = 0  f2(c) = 1

Weak Tree Agreement

Convergence point of TRW

Obtaining the Labelling

Only solves the dual. Primal solutions?

θ' = ∑i θi

[Figure: the 3×3 grid MRF]

Fix the label of Va

Obtaining the Labelling

θ' = ∑i θi

[Figure: the 3×3 grid MRF]

Fix the label of Vb

Continue in some fixed order (Meltzer et al., 2006)
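A minimal sketch (my naming, not Meltzer et al.'s code) of this decoding heuristic: visit the variables in a fixed order and pick each label greedily under the reparameterized potentials θ', conditioning on the labels already fixed:

```python
# Greedy primal decoding from reparameterized potentials theta'.
import numpy as np

def decode(order, unary, pair):
    """unary[a]: (L,) array. pair[(a, b)]: (L, L) array, rows index a."""
    labels = {}
    for a in order:
        cost = unary[a].copy()
        for (u, v), theta in pair.items():
            if u == a and v in labels:     # neighbour v already fixed
                cost += theta[:, labels[v]]
            elif v == a and u in labels:   # neighbour u already fixed
                cost += theta[labels[u], :]
        labels[a] = int(np.argmin(cost))
    return labels

unary = {"a": np.array([0.0, 1.0]), "b": np.array([1.0, 0.0])}
pair = {("a", "b"): np.array([[0.0, 2.0], [2.0, 0.0]])}
print(decode(["a", "b"], unary, pair))  # {'a': 0, 'b': 0}
```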

Computational Issues of TRW

Basic component is Belief Propagation

• Speed-ups for some pairwise potentials (Felzenszwalb & Huttenlocher, 2004)

• Memory requirements cut down by half (Kolmogorov, 2006)

• Further speed-ups using monotonic chains (Kolmogorov, 2006)

Theoretical Properties of TRW

• Always converges, unlike BP (Kolmogorov, 2006)

• Strong tree agreement implies exact MAP (Wainwright et al., 2001)

• Optimal MAP for two-label submodular problems (Kolmogorov and Wainwright, 2005), i.e. when every edge satisfies

  θab;00 + θab;11 ≤ θab;01 + θab;10
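For the last point, a tiny helper (mine, not from the slides) that checks the submodularity condition for one edge:

```python
# Check the two-label submodularity condition for one edge's
# 2x2 pairwise potential table.
def is_submodular(pair):
    return pair[0][0] + pair[1][1] <= pair[0][1] + pair[1][0]

print(is_submodular([[0, 3], [3, 0]]))  # Potts-like edge -> True
print(is_submodular([[3, 0], [0, 3]]))  # supermodular edge -> False
```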

Results – Binary Segmentation (Szeliski et al., 2008)

Labels: {foreground, background}

Unary Potentials: -log(likelihood) using learnt fg/bg models

Pairwise Potentials: 0 if the labels are the same; 1 - exp(|da - db|) if they differ

[Images: segmentation results for TRW and for Belief Propagation]

Results – Stereo Correspondence (Szeliski et al., 2008)

Labels: {disparities}

Unary Potentials: similarity of pixel colours

Pairwise Potentials: 0 if the labels are the same; 1 - exp(|da - db|) if they differ

[Images: disparity maps for TRW and for Belief Propagation]

Results – Non-submodular Problems (Kolmogorov, 2006)

[Plots: energy/bound curves for BP and TRW-S on a 30×30 grid, K50]

BP outperforms TRW-S

Code + standard data: http://vision.middlebury.edu/MRF

Outline

• TRW Message Passing

• Dual Decomposition

Dual Decomposition

minx ∑i gi(x)    s.t. x ∈ C

Dual Decomposition

minx,xi ∑i gi(xi)    s.t. xi ∈ C, xi = x

Dual Decomposition

minx,xi ∑i gi(xi)    s.t. xi ∈ C

Dual Decomposition

maxλi minx,xi ∑i gi(xi) + ∑i λiᵀ(xi - x)    s.t. xi ∈ C

KKT Condition: ∑i λi = 0

Dual Decomposition

maxλi minx,xi ∑i gi(xi) + ∑i λiᵀxi    s.t. xi ∈ C

(the -∑i λiᵀx term vanishes because ∑i λi = 0)

Dual Decomposition

maxλi minxi ∑i (gi(xi) + λiᵀxi)    s.t. xi ∈ C

Projected Supergradient Ascent

Supergradient s of h(z) at z0: h(z) - h(z0) ≤ sᵀ(z - z0), for all z in the feasible region
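A quick worked example of the definition (my numbers): h(λ) = min{1 + 2λ, 3 - λ} is concave and piecewise linear. At λ0 = 0 the first piece attains the minimum, h(0) = 1, and its slope s = 2 is a supergradient: since h is a pointwise minimum, h(λ) ≤ 1 + 2λ for all λ, i.e. h(λ) - h(0) ≤ 2(λ - 0). The dual above has exactly this structure, a pointwise minimum of functions that are linear in λi, which is why each slave's minimizer serves as a supergradient.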

Dual Decomposition

maxλi minxi ∑i (gi(xi) + λiᵀxi)    s.t. xi ∈ C

Initialize λi^0 = 0

Dual Decomposition

maxλi minxi ∑i (gi(xi) + λiᵀxi)    s.t. xi ∈ C

Compute supergradients: si = argminxi ∑i (gi(xi) + (λi^t)ᵀxi)

Dual Decomposition

maxλi minxi ∑i (gi(xi) + λiᵀxi)    s.t. xi ∈ C

Project the supergradients: pi = si - ∑j sj / m, where m = number of subproblems (slaves)

Dual Decomposition

maxλi minxi ∑i (gi(xi) + λiᵀxi)    s.t. xi ∈ C

Update the dual variables: λi^(t+1) = λi^t + ηt pi, where ηt is the learning rate, e.g. 1/(t+1)

Dual Decomposition

Initialize λi^0 = 0

Compute projected supergradients:
    si = argminxi ∑i (gi(xi) + (λi^t)ᵀxi)
    pi = si - ∑j sj / m

Update the dual variables:
    λi^(t+1) = λi^t + ηt pi

REPEAT
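A minimal sketch of the whole loop on a toy problem (my construction, not Komodakis et al.'s code): two slaves share one binary variable and are pushed to agree on it by projected supergradient ascent:

```python
# Dual decomposition on a toy problem: two slaves, one binary variable.
# s_i is the indicator vector of slave i's argmin, used as a supergradient.
import numpy as np

g = [np.array([1.0, 0.0]),   # slave 1's cost for x = 0, 1 (example values)
     np.array([0.0, 2.0])]   # slave 2's cost for x = 0, 1
m = len(g)
lam = [np.zeros(2) for _ in range(m)]   # lambda_i^0 = 0

for t in range(50):
    # Each slave minimizes g_i + lambda_i independently.
    s = [np.eye(2)[np.argmin(g[i] + lam[i])] for i in range(m)]
    mean_s = sum(s) / m
    p = [s[i] - mean_s for i in range(m)]   # projection: sum_i p_i = 0
    eta = 1.0 / (t + 1)                     # learning rate from the slides
    for i in range(m):
        lam[i] = lam[i] + eta * p[i]

# Both slaves now pick x = 0, the minimizer of g1 + g2 = [1, 2].
print([int(np.argmin(g[i] + lam[i])) for i in range(m)])
```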

Dual Decomposition (Komodakis et al., 2007)

[Figure: the 3×3 grid decomposed into row trees 1–3 and column trees 4–6]

s1a = (1, 0)    s4a = (1, 0)

Slaves agree on the label for Va

Dual Decomposition (Komodakis et al., 2007)

s1a = (1, 0)    s4a = (1, 0)

p1a = (0, 0)    p4a = (0, 0)

Dual Decomposition (Komodakis et al., 2007)

s1a = (1, 0)    s4a = (0, 1)

Slaves disagree on the label for Va

Dual Decomposition (Komodakis et al., 2007)

s1a = (1, 0)    s4a = (0, 1)

p1a = (0.5, -0.5)    p4a = (-0.5, 0.5)

Unary cost increases

Dual Decomposition (Komodakis et al., 2007)

s1a = (1, 0)    s4a = (0, 1)

p1a = (0.5, -0.5)    p4a = (-0.5, 0.5)

Unary cost decreases

Dual Decomposition (Komodakis et al., 2007)

s1a = (1, 0)    s4a = (0, 1)

p1a = (0.5, -0.5)    p4a = (-0.5, 0.5)

Push the slaves towards agreement

Comparison

                 TRW               DD
Speed            Fast              Slow
Optimum          Local Maximum     Global Maximum
Requires         Min-Marginals     MAP Estimate
Extensions       –                 Other forms of slaves,
                                   tighter relaxations,
                                   sparse high-order potentials

Recommended: DD