
Tractable Higher Order Models in Computer Vision (Part II)

Slides from Carsten Rother, Sebastian Nowozin, and Pushmeet Kohli, Microsoft Research Cambridge

Presented by Xiaodan Liang

Part II

• Submodularity
• Move making algorithms
• Higher-order model: Pn Potts model

Feature selection

Factoring distributions

Problem inherently combinatorial!

Example: Greedy algorithm for feature selection


Key property: Diminishing returns

Selection A = {} vs. selection B = {X2, X3}: adding a new feature X1 to the small selection A helps a lot (large improvement), while adding X1 to the larger selection B doesn't help much (small improvement).

Submodularity: for A ⊆ B and X1 ∉ B,

F(A ∪ {X1}) − F(A) ≥ F(B ∪ {X1}) − F(B)

[Figure: Naïve Bayes model with class variable Y = "Sick" and features X1 = "Fever", X2 = "Rash", X3 = "Male".]

Theorem [Krause, Guestrin UAI ‘05]: Information gain F(A) in Naïve Bayes models is submodular!


Why is submodularity useful?

Theorem [Nemhauser et al. '78]: The greedy maximization algorithm returns Agreedy with

F(Agreedy) ≥ (1 − 1/e) max|A|≤k F(A)   (≈ 63% of optimal)

• Greedy algorithm gives a near-optimal solution!
• For info-gain: guarantees best possible unless P = NP! [Krause, Guestrin UAI '05]
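To make the theorem concrete, here is a minimal sketch of the greedy rule it analyzes; the set function F and the ground set are placeholders you would supply (e.g., an information-gain oracle):

```python
def greedy_maximize(F, V, k):
    """Greedy rule from the Nemhauser et al. theorem: repeatedly add the
    element with the largest marginal gain F(A + {e}) - F(A)."""
    A = set()
    for _ in range(k):
        best = max((e for e in V if e not in A),
                   key=lambda e: F(A | {e}) - F(A))
        A.add(best)
    return A

# Toy usage: F = coverage of overlapping sets (monotone submodular).
sets = {1: {"a", "b"}, 2: {"b", "c"}, 3: {"d"}}
F = lambda A: len(set().union(*(sets[i] for i in A))) if A else 0
print(greedy_maximize(F, sets.keys(), 2))  # {1, 2}: covers {'a','b','c'}
```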


Submodularity in Machine Learning

Many ML problems are submodular, i.e., for submodular F we require:
• Minimization: A* = argmin F(A)
  – Structure learning (A* = argmin I(XA; XV\A))
  – Clustering
  – MAP inference in Markov Random Fields
  – …
• Maximization: A* = argmax F(A)
  – Feature selection
  – Active learning
  – Ranking
  – …

Set functions

Submodular set functions
• A set function F on V is called submodular if, for all A, B ⊆ V:

F(A) + F(B) ≥ F(A ∪ B) + F(A ∩ B)

• Equivalent diminishing returns characterization: for all A ⊆ B ⊆ V and s ∉ B,

F(A ∪ {s}) − F(A) ≥ F(B ∪ {s}) − F(B)

(adding s to the small set A gives a large improvement; adding it to the superset B gives a small improvement)
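As a sanity check on the definition, here is a small brute-force verifier, a sketch that assumes V is small enough to enumerate all pairs of subsets:

```python
import math
from itertools import chain, combinations

def subsets(V):
    """All subsets of V as frozensets."""
    return (frozenset(c) for c in
            chain.from_iterable(combinations(V, r) for r in range(len(V) + 1)))

def is_submodular(F, V):
    """Brute-force check of F(A) + F(B) >= F(A | B) + F(A & B)."""
    return all(F(A) + F(B) >= F(A | B) + F(A & B)
               for A in subsets(V) for B in subsets(V))

# A concave function of |A| gives a submodular F (see the concavity slide).
print(is_submodular(lambda A: math.sqrt(len(A)), {1, 2, 3, 4}))  # True
```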

Submodularity and supermodularity

Example: Mutual information


Closedness properties

F1,…,Fm submodular functions on V and λ1,…,λm > 0.
Then F(A) = Σi λi Fi(A) is submodular!

Submodularity is closed under nonnegative linear combinations — an extremely useful fact:
– Fθ(A) submodular ⇒ Σθ P(θ) Fθ(A) submodular!
– Multicriterion optimization: F1,…,Fm submodular, λi ≥ 0 ⇒ Σi λi Fi(A) submodular

Submodularity and Concavity

For g: N → R and F(A) = g(|A|), F is submodular if and only if g is concave.

[Figure: a concave function g(|A|) plotted against |A|.]

Maximum of submodular functions

Suppose F1(A) and F2(A) are submodular. Is F(A) = max(F1(A), F2(A)) submodular?

[Figure: F1(A) and F2(A) plotted against |A|; their pointwise maximum need not be concave.]

max(F1, F2) not submodular in general!

Minimum of submodular functions

Well, maybe F(A) = min(F1(A), F2(A)) instead?

A        F1(A)   F2(A)   F(A)
∅        0       0       0
{a}      1       0       0
{b}      0       1       0
{a,b}    1       1       1

F({b}) − F(∅) = 0  <  F({a,b}) − F({a}) = 1

min(F1, F2) not submodular in general! But stay tuned…

Submodularity and convexity


The submodular polyhedron PF

PF = {x ∈ RV : x(A) ≤ F(A) for all A ⊆ V}, where x(A) = Σe∈A x(e)

Example: V = {a, b}

A        F(A)
∅        0
{a}      −1
{b}      2
{a,b}    0

Constraints:
x({a}) ≤ F({a}) = −1
x({b}) ≤ F({b}) = 2
x({a,b}) = x({a}) + x({b}) ≤ F({a,b}) = 0

[Figure: PF drawn in the (x{a}, x{b}) plane.]

Lovász extension


Example: Lovász extension

g(w) = max {wᵀx : x ∈ PF}

A        F(A)
∅        0
{a}      −1
{b}      2
{a,b}    0

For w = [0,1], we want g(w).
Greedy ordering by decreasing w: e1 = b, e2 = a (since w(e1) = 1 > w(e2) = 0)
xw(e1) = F({b}) − F(∅) = 2
xw(e2) = F({b,a}) − F({b}) = −2
xw = [−2, 2]

g([0,1]) = [0,1]ᵀ[−2,2] = 2 = F({b})
g([1,1]) = [1,1]ᵀ[−1,1] = 0 = F({a,b})

[Figure: the (w{a}, w{b}) plane partitioned into regions labeled {}, {a}, {b}, {a,b}, with maximizers [−2,2] and [−1,1] marked.]
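The greedy evaluation above is easy to code; here is a minimal sketch, with the set function passed as a Python callable and elements indexed a → 0, b → 1:

```python
def lovasz_extension(w, F):
    """Evaluate g(w) = max {w.x : x in PF} by the greedy algorithm:
    sort elements by decreasing w, take marginal gains of F."""
    order = sorted(range(len(w)), key=lambda e: -w[e])
    x = [0.0] * len(w)
    S, prev = set(), F(frozenset())
    for e in order:
        S.add(e)
        cur = F(frozenset(S))
        x[e] = cur - prev   # marginal gain of adding e
        prev = cur
    return sum(we * xe for we, xe in zip(w, x)), x

# The table from this slide, with a -> 0 and b -> 1.
table = {frozenset(): 0, frozenset({0}): -1,
         frozenset({1}): 2, frozenset({0, 1}): 0}
print(lovasz_extension([0, 1], table.get))  # (2, [-2, 2]) = F({b})
print(lovasz_extension([1, 1], table.get))  # (0, [-1, 1]) = F({a,b})
```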


Why is this useful?

Theorem [Lovász '83]: F(A) submodular ⇔ g(w) convex (and efficient to evaluate); moreover, g(w) attains its minimum over [0,1]n at a corner!
If we can minimize g on [0,1]n, we can minimize F (at corners, g and F take the same values).

Does the converse also hold? No: consider g(w1,w2,w3) = max(w1, w2 + w3). It is convex, but the induced set function on {a,b,c} violates diminishing returns:

F({a,b}) − F({a}) = 0  <  F({a,b,c}) − F({a,c}) = 1
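Since g is convex and the greedy vector xw is a subgradient of g at w, one can minimize F with a toy projected-subgradient loop. A rough sketch, reusing lovasz_extension and table from the previous block (step size and iteration count are arbitrary choices, not from the slides):

```python
def minimize_submodular(F, n, steps=500, eta=0.05):
    """Toy minimizer: projected subgradient descent on the Lovasz extension
    over [0,1]^n; the greedy vector x_w is a subgradient of g at w."""
    w = [0.5] * n
    best_set, best = frozenset(), F(frozenset())
    for _ in range(steps):
        _, x = lovasz_extension(w, F)                     # subgradient at w
        w = [min(1.0, max(0.0, wi - eta * xi)) for wi, xi in zip(w, x)]
        A = frozenset(i for i in range(n) if w[i] > 0.5)  # round to a corner
        if F(A) < best:
            best_set, best = A, F(A)
    return best_set, best

print(minimize_submodular(table.get, 2))  # (frozenset({0}), -1): A* = {a}
```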

Minimizing a submodular function
• Ellipsoid algorithm
• Interior point algorithm


Example: Image denoising

[Figure: pairwise MRF on a 3×3 pixel grid; observed noisy pixels X1,…,X9, each attached to a latent "true" pixel Y1,…,Y9, with neighboring Yi connected.]

Pairwise Markov Random Field:

P(x1,…,xn, y1,…,yn) = ∏i,j ψi,j(yi, yj) ∏i φi(xi, yi)

Xi: noisy pixels; Yi: "true" pixels

Want argmaxy P(y | x) = argmaxy log P(x, y) = argminy Σi,j Ei,j(yi, yj) + Σi Ei(yi)
with Ei,j(yi, yj) = −log ψi,j(yi, yj)

When is this MAP inference efficiently solvable (in high-treewidth graphical models)?

MAP inference in Markov Random Fields [Kolmogorov et al., PAMI '04; see also Hammer, Ops Res '65]


Constrained minimization

Part II

• Submodularity
• Move making algorithms
• Higher-order model: Pn Potts model

Multi-Label problems

Move making: expansion move and swap move for this problem

Metric and semi-metric potential functions

• If the pairwise potential functions define a metric, then the energy function in equation (8) can be approximately minimized using α-expansions.

• If the pairwise potential functions define a semi-metric, it can be approximately minimized using αβ-swaps.

Move Energy

• Each move is encoded by a binary vector t
• A transformation function x' = T(x, t) maps the move to a new labeling
• The energy of a move t: Em(t) = E(T(x, t))
• The optimal move: t* = argmint E(T(x, t))

Submodular set functions play an important role in energy minimization as they can be minimized in polynomial time

The swap move algorithm

The expansion move algorithm
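Both algorithms share the same outer loop over moves; here is a rough sketch of the α-expansion variant, where optimal_expansion_move is a hypothetical stand-in for the solver of the binary move energy (an st-mincut when that energy is submodular):

```python
def alpha_expansion(E, labels, x, optimal_expansion_move):
    """Move-making outer loop: each variable may keep its label or switch
    to alpha; iterate over alpha until no expansion lowers the energy."""
    improved = True
    while improved:
        improved = False
        for alpha in labels:
            t = optimal_expansion_move(E, x, alpha)  # binary move vector
            # transformation function T(x, t): keep x_i if t_i = 0, else alpha
            x_new = [alpha if ti else xi for xi, ti in zip(x, t)]
            if E(x_new) < E(x):
                x, improved = x_new, True
    return x
```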

Higher order potential

• The class of higher-order clique potentials for which the expansion and swap moves can be computed in polynomial time. The clique potentials take the form shown below (sum form / max form).

• Question you should be asking: Can my higher-order potential be solved using α-expansions?

• Show that the move energy is submodular for all xc.

• Form of the Higher Order Potentials

Moves for Higher Order Potentials
• Clique inconsistency function:
• Pairwise potential:

[Figure: clique c over variables xi, xj, xk, xl, xm; the potential is written in a sum form and a max form.]

Theoretical Results: Swap
• Move energy is always submodular if the clique potential is non-decreasing concave. (See the paper for proofs.)

Condition for the Swap move

Concave function:

Proof sketch:
• All projections on two variables of any αβ-swap move energy are submodular.
• Write the cost of any configuration and substitute; Constraint 1, Lemma 1, and Constraint 2 then show the theorem is true.

Condition for alpha expansion

• Metric:

• Form of the Higher Order Potentials

Moves for Higher Order Potentials (recap)
• Clique inconsistency function:
• Pairwise potential:

[Figure: as before — clique c over variables xi, xj, xk, xl, xm; sum form and max form.]

Part II

• Submodularity
• Move making algorithms
• Higher-order model: Pn Potts model

Image Segmentation

E(X) = ∑i ci xi + ∑i,j dij |xi − xj|

E: {0,1}n → R, 0 → fg, 1 → bg, n = number of pixels

[Boykov and Jolly '01] [Blake et al. '04] [Rother et al. '04]

[Figure: image, unary cost, segmentation.]
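Energies of this form are submodular for dij ≥ 0 and can be minimized exactly with an st-mincut; a minimal sketch, assuming the PyMaxflow package (pip install PyMaxflow):

```python
import maxflow  # PyMaxflow

def segment(c, edges):
    """Minimize E(x) = sum_i c_i x_i + sum_ij d_ij |x_i - x_j|, x in {0,1}^n,
    exactly via an st-mincut (the energy is submodular for d_ij >= 0)."""
    g = maxflow.Graph[float]()
    nodes = g.add_nodes(len(c))
    for i, ci in enumerate(c):
        # terminal capacities encode the unary term c_i * x_i
        g.add_tedge(nodes[i], max(ci, 0.0), max(-ci, 0.0))
    for i, j, d in edges:
        g.add_edge(nodes[i], nodes[j], d, d)  # pay d_ij when x_i != x_j
    g.maxflow()
    return [g.get_segment(nodes[i]) for i in range(len(c))]

# Pixel 0 prefers fg (0), pixel 2 prefers bg (1); smoothness decides pixel 1.
print(segment([5.0, 0.0, -5.0], [(0, 1, 1.0), (1, 2, 3.0)]))  # [0, 1, 1]
```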

Pn Potts Potentials

Patch dictionary (tree):

h(Xp) = 0 if xi = 0 for all i ∈ p; Cmax otherwise

[Figure: patch p of the image matched against a patch dictionary; cost 0 for a consistent patch, Cmax otherwise.]

[slide credits: Kohli]

Pn Potts Potentials

E(X) = ∑i ci xi + ∑i,j dij |xi − xj| + ∑p hp(Xp)

hp(Xp) = 0 if xi = 0 for all i ∈ p; Cmax otherwise

E: {0,1}n → R, 0 → fg, 1 → bg, n = number of pixels

[slide credits: Kohli]
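Evaluating this energy is straightforward; a small sketch (variable names are illustrative, not from the paper):

```python
def pn_potts_energy(x, c, edges, patches, c_max):
    """E(x) = sum_i c_i x_i + sum_ij d_ij |x_i - x_j| + sum_p h_p(X_p),
    with h_p = 0 if every pixel in patch p is labeled 0, C_max otherwise."""
    unary = sum(ci * xi for ci, xi in zip(c, x))
    pairwise = sum(d * abs(x[i] - x[j]) for i, j, d in edges)
    higher = sum(0.0 if all(x[i] == 0 for i in p) else c_max for p in patches)
    return unary + pairwise + higher

# One patch p = (0, 1); labeling [0, 1, 1] breaks it, so it pays C_max.
print(pn_potts_energy([0, 1, 1], [5.0, 0.0, -5.0],
                      [(0, 1, 1.0), (1, 2, 3.0)], [(0, 1)], 10.0))  # 6.0
```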

Theoretical Results: Expansion

• Move energy is always submodular if the clique potential is an increasing linear function.

See paper for proofs.

PN Potts Model

ψc(xc) = γk if xi = lk for all i ∈ c; γmax otherwise (with γk ≤ γmax)

[Figure: clique c shown with cost γk when all its pixels take label lk, and cost γmax otherwise.]

Optimal moves for PN Potts
• Computing the optimal swap move

Case 1: Not all variables are assigned label 1 (α) or label 2 (β).
The move energy is independent of tc and can be ignored.

[Figure: clique c with variables taking labels 1 (α), 2 (β), 3, and 4.]

Optimal moves for PN Potts
• Computing the optimal swap move

Case 2: All variables are assigned label 1 (α) or label 2 (β).
The move energy can be minimized by solving an st-mincut problem.

[Figure: clique c with all variables taking label 1 (α) or label 2 (β).]

Solving the Move Energy

• Add a constant: this transformation does not affect the solution.
• We can add a constant K to all possible values of the clique potential without changing the optimal move.

Solving the Move Energy
• Computing the optimal swap move

[Figure: st-graph with source, sink, pixel nodes v1, v2, …, vn, and auxiliary nodes Ms and Mt.]

ti = 0 ⇔ vi in the source set
tj = 1 ⇔ vj in the sink set

Solving the Move Energy
• Computing the optimal swap move

Case 1: all xi = α (every vi in the source set). Cost: the clique cost of the all-α labeling.
Case 2: all xi = β (every vi in the sink set). Cost: the clique cost of the all-β labeling.
Case 3: the xi take both α and β (vi split between the source and sink sets). Cost: the maximal clique cost.

[Figure: the same st-graph, with the cut edges highlighted for each case.]

Recall that the cost of an st-mincut is the sum of weights of the edges included in the st-mincut which go from the source set to the sink set.
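To make the construction concrete, here is a sketch of the swap-move gadget for one clique, again assuming PyMaxflow. The capacities are my reconstruction of the pattern the three cases describe (all-α costs λα, all-β costs λβ, mixed costs λmax, with λα, λβ ≤ λmax), not the paper's figure:

```python
import maxflow  # pip install PyMaxflow

def add_pn_potts_swap_clique(g, nodes, lam_a, lam_b, lam_max):
    """Auxiliary-node gadget for one clique in the swap move: cut cost
    (plus the returned constant) is lam_a if all t_i = 0, lam_b if all
    t_i = 1, and lam_max otherwise (requires lam_a, lam_b <= lam_max)."""
    big = 1e9                                # effectively infinite capacity
    ms = g.add_nodes(1)[0]                   # source-side auxiliary node Ms
    mt = g.add_nodes(1)[0]                   # sink-side auxiliary node Mt
    g.add_tedge(ms, lam_max - lam_a, 0.0)    # paid unless all t_i = 0
    g.add_tedge(mt, 0.0, lam_max - lam_b)    # paid unless all t_i = 1
    for v in nodes:
        g.add_edge(ms, v, big, 0.0)          # any v on the sink side drags Ms there
        g.add_edge(v, mt, big, 0.0)          # any v on the source side drags Mt there
    return lam_a + lam_b - lam_max           # constant offset of the encoding

# Clique of 3 pixels, lam_a = 1, lam_b = 2, lam_max = 5, no other terms:
g = maxflow.Graph[float]()
vs = g.add_nodes(3)
const = add_pn_potts_swap_clique(g, vs, 1.0, 2.0, 5.0)
print(g.maxflow() + const)  # 1.0: the cheapest labeling is all t_i = 0
```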

Optimal moves for PN Potts
• The expansion move energy: a similar graph construction applies.

Experimental Results
• Texture segmentation: unary (colour), pairwise (smoothness), higher order (texture).

[Figure: original image, pairwise result, higher-order result.]

Experimental Results

[Figure: original image; pairwise model: swap (3.2 sec), expansion (2.5 sec); higher-order model: swap (4.2 sec), expansion (3.0 sec).]

Experimental Results

[Figure: original image; pairwise model: swap (4.7 sec), expansion (3.7 sec); higher-order model: swap (5.0 sec), expansion (4.4 sec).]

More Higher-order models
