2:30 – 4:30 – Labelling Problems Graphical Models Message Passing
4:30 – 5:00 - Coffee break
5:00 – 7:00 - Graph Cuts Move Making Algorithms Speed and Efficiency
Infer the values of hidden random variables
Geometry estimation, object segmentation (labels: building, sky, tree, grass)
Pose estimation (orientation, joint angles)
z ∈ (R,G,B)^n, x ∈ {0,1}^n (n = number of pixels)
n random variables: x1,…,xn
Joint Probability: P(x1,…,xn)
Conditional Probability: P(x1,…,xn|z1,…,zn)
z ∈ (R,G,B)^n, x ∈ {0,1}^n
Posterior Probability
Likelihood (data-dependent)
Prior (data-independent)
P(x|z) = P(z|x) P(x) / P(z) ∝ P(z|x) P(x)
MAP Solution: x* = argmax_x P(x|z) = argmin_x E(x)
(n = number of pixels)
MAP solution under the likelihood alone:
x* = argmax_x P(z|x) = argmax_x ∏_i P(zi|xi)
(Figure: log P(zi|xi=0) and log P(zi|xi=1))
P(x|z) ∝ P(z|x) P(x)
Posterior ∝ Likelihood × Prior
P(x|z) = ∏_i P(zi|xi) ∏_{i,j} f(xi,xj)
The prior factors f(xi,xj) encourage consistency between the labelling of adjacent pixels.
Taking the negative log of the posterior probability gives the energy:
E(x,z,w) = ∑_i θi(xi,zi) + w ∑_{i,j} θij(xi,xj)
(Figure: image, likelihood solution, MRF solution with Ising prior)
With a contrast-dependent pairwise term (a conditional model):
P(x|z) = ∏_i P(zi|xi) ∏_{i,j} P(xi,xj,zi,zj)
Taking the negative log:
E(x,z,w) = ∑_i θi(xi,zi) + w ∑_{i,j} θij(xi,xj,zi,zj)
(Figure: pairwise cost θij as a function of the contrast ||zi−zj||²)
[Boykov and Jolly '01] [Blake et al. '04] [Rother, Kolmogorov and Blake '04]
Global Minimum (x*)
Write probability distributions as graphical models:
Gibbs distribution: P(x) ∝ exp{-E(x)}
E(x) = θ(x1,x2,x3) + θ(x2,x4) + θ(x3,x4) + θ(x3,x5)   ("4 factors")
(Figure: factor graph over the unobserved variables x1,…,x5)
Variables are connected in the factor graph if they appear in the same factor.
References:
- Pattern Recognition and Machine Learning [Bishop '08, book, chapter 8]
- several lectures at the Machine Learning Summer School 2009 (see video lectures)
Definition “Order”: The arity (number of variables) of the largest factor
E(x) = θ(x1,x2,x3) + θ(x2,x4) + θ(x3,x4) + θ(x3,x5)
(Figure: factor graph with one arity-3 factor and three arity-2 factors)
This factor graph has order 3.
4-connected pairwise MRF (order 2):
E(x) = ∑_{i,j ∈ N4} θij(xi,xj)   "pairwise energy"
Higher(8)-connected pairwise MRF (order 2):
E(x) = ∑_{i,j ∈ N8} θij(xi,xj)
Higher-order RF (order n):
E(x) = ∑_{i,j ∈ N4} θij(xi,xj) + θ(x1,…,xn)   "higher-order energy"
P(x|z) ∝ exp{-E(x)}
E(x) = ∑_i θi(xi,zi) + ∑_{i,j ∈ N4} θij(xi,xj)
(Factor graph: zi are observed variables; xi, xj are unobserved, latent variables)
E(x,w) = ∑_i |T(w)i − xi| + ∑_{i,j ∈ N4} θij(xi,xj)
E(x,w): {0,1}^n × {Exemplar} → R
The unary term is the Hamming distance between x and the transformed exemplar T(w), with w = (Template, Position, Scale, Orientation).
Large set of example segmentations T(1), T(2), T(3), … (up to 2,000,000 exemplars).
Goal: detect and segment the test image.
[Kohli et al. IJCV '08, Lempitsky et al. ECCV '08]
(Figure: training image; test image; test image with 60% noise; result)
The pairwise energy P(x) is minimized using st-mincut or max-product message passing.
Higher Order Structure not Preserved
Minimize: E(X) = P(X) + ∑_c hc(Xc)
where hc: {0,1}^|c| → R is a higher order function over a patch c (|c| = 10×10 = 100), which assigns a cost to 2^100 possible labellings!
Exploit the function's structure to transform it into a pairwise function.
(Figure: patterns p1, p2, p3)
Rother and Kohli, MSR Tech Report [2010]
(Figure: training image; test image; test image with 60% noise; pairwise result vs. higher-order result; learned patterns)
Rother and Kohli, MSR Tech Report [2010]
(Figure: left image (a), right image (b), ground truth depth; disparities d = 0 … 4)
• Images rectified • Ignore occlusion for now
Energy: E(d): {0,…,D-1}^n → R
Labels: di (depth/shift)
E(d) = ∑_i θi(di) + ∑_{i,j ∈ N4} θij(di,dj)
Unary: θi(di) = |li − r(i−di)|   "SAD: sum of absolute differences" (many others possible: NCC, …);
e.g. pixel i in the left image is compared with pixel i−2 in the right image for di = 2.
Pairwise: θij(di,dj) = g(|di−dj|)
(Figure: left and right images; a sketch of the unary term follows below.)
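As a minimal sketch of the SAD unary term above, assuming 8-bit grayscale images stored row-major; the out-of-range penalty value is an arbitrary assumption:

    #include <cstdlib>

    // theta_i(d) = |left[i] - right[i - d]| for pixel (x, y) and disparity d.
    int sad_unary(const unsigned char* left, const unsigned char* right,
                  int x, int y, int width, int d) {
        if (x - d < 0) return 255;        // match falls outside the image: assumed penalty
        int li = y * width + x;           // pixel i in the left image
        int ri = y * width + (x - d);     // pixel i - d in the right image
        return std::abs(int(left[li]) - int(right[ri]));
    }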
Discontinuity preserving potentials [Blake & Zisserman '83, '87]:
θij(di,dj) = g(|di−dj|)
(Figure: cost g as a function of |di−dj|)
- Without truncation: convex; the global minimum is attainable
- With truncation: NP-hard optimization
- Potts model: only penalizes non-smooth disparities
[Olga Veksler PhD thesis, Daniel Cremers et al.]
(Figure: left image, ground truth, and Potts-model results; see http://vision.middlebury.edu/stereo/)
- No MRF: pixel-independent winner-take-all (WTA)
- No horizontal links: efficient, since the chains are independent
- Pairwise MRF [Boykov et al. '01]
Object segmentation (water, boat, chair, tree, road), organ detection, …
Assign a label to each pixel (or voxel in 3D):
(1) No relationships between pixels
(2) Pixel groups, no relationships between groups
(3) Pairwise relationships between pixel groups
(4) Pairwise relationships between pixels
(5) Higher-order relationships between pixels
Unary cost from a pixel classifier: for a pixel P with surrounding image window W, the classifier outputs the cost of assigning each label (e.g. "cow").
Boosting [Shotton et al, 2006], Random Decision Forests
E(x) = ∑_i fi(xi) + ∑_{ij} gij(xi,xj) + ∑_c hc(xc)
Unary Pairwise Higher Order
How to minimize E(x)?
x takes discrete values
Which functions are exactly solvable?
Approximate solutions of NP-hard problems
Scalability and Efficiency
Which functions are exactly solvable? Boros Hammer [1965], Kolmogorov Zabih [ECCV 2002, PAMI 2004], Ishikawa [PAMI 2003], Schlesinger [EMMCVPR 2007], Kohli Kumar Torr [CVPR 2007, PAMI 2008], Ramalingam Kohli Alahari Torr [CVPR 2008], Kohli Ladicky Torr [CVPR 2008, IJCV 2009], Zivny Jeavons [CP 2008]
Approximate solutions of NP-hard problems: Schlesinger [1976], Kleinberg and Tardos [FOCS 99], Chekuri et al. [2001], Boykov et al. [PAMI 2001], Wainwright et al. [NIPS 2001], Werner [PAMI 2007], Komodakis [PAMI 2005], Lempitsky et al. [ICCV 2007], Kumar et al. [NIPS 2007], Kumar et al. [ICML 2008], Sontag and Jaakkola [NIPS 2007], Kohli et al. [ICML 2008], Kohli et al. [CVPR 2008, IJCV 2009], Rother et al. [2009]
Scalability and Efficiency: Kohli Torr [ICCV 2005, PAMI 2007], Juan and Boykov [CVPR 2006], Alahari Kohli Torr [CVPR 2008], Delong and Boykov [CVPR 2008]
Dynamic Programming (DP)
Belief Propagation (BP)
Tree-Reweighted (TRW), Max-sum Diffusion
Graph Cut (GC)
Branch & Bound
Relaxation methods
Combinatorial Algorithms
Message passing
Convex Optimization (Linear Programming)
Iteratively pass messages between nodes...
Message update rule? Belief propagation (BP); tree-reweighted belief propagation (TRW).
Max-product (minimizing an energy function, i.e. MAP estimation):
x* = argmax_x P(x) = argmin_x E(x), with E(x) = −ln P(x) + K
Sum-product (computing marginal probabilities):
P(xi) = ∑_{x_V∖{i}} P(x)
E(x) = ∑_i fi(xi) + ∑_{ij} gij(xi,xj)
Example on a chain q – p – r with three labels:
M_{p→q}(xq) = min_{xp} [ f(xp) + gpq(xp, xq) ]
With unary costs f(xp) = (5, 1, 2) and a Potts model gpq = 2 if xp ≠ xq:
M_{p→q}(L1) = min(5+0, 1+2, 2+2) = 3
M_{p→q}(L1, L2, L3) = (3, 1, 2)
The message is passed on along the chain:
M_{q→r}(xr) = min_{xq} [ M_{p→q}(xq) + f(xq) + gqr(xq, xr) ]
Min-marginal: M_{q→r}(xr) + f(xr) = min_x { E(x) | xr = j }
At the last node, select the minimum cost label, then trace back the path to get the minimum cost labelling.
Dynamic programming: global minimum in linear time.
BP on a tree: inward pass from the leaves to the root (dynamic programming), then outward pass; gives min-marginals for every node. A sketch of one message update follows below.
(Figure: chain q – p – r; tree with root and leaves)
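A minimal sketch of one min-sum message, using the numbers from the example above (unary costs f(xp) = (5, 1, 2), Potts pairwise cost 2):

    #include <algorithm>
    #include <array>
    #include <cstdio>
    #include <limits>

    constexpr int K = 3;               // number of labels L1, L2, L3
    using Msg = std::array<int, K>;

    // M_{p->q}(x_q) = min over x_p of ( f(x_p) + g_pq(x_p, x_q) )
    Msg send_message(const Msg& unary, int potts_cost) {
        Msg out{};
        for (int xq = 0; xq < K; ++xq) {
            int best = std::numeric_limits<int>::max();
            for (int xp = 0; xp < K; ++xp)
                best = std::min(best, unary[xp] + (xp != xq ? potts_cost : 0));
            out[xq] = best;
        }
        return out;
    }

    int main() {
        Msg m = send_message({5, 1, 2}, 2);
        std::printf("%d %d %d\n", m[0], m[1], m[2]);  // prints 3 1 2, as above
    }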
On graphs with loops, pass messages using the same rules:
- Empirically often works quite well
- May not converge
- Gives "pseudo" min-marginals
- Gives a local minimum in the "tree neighborhood" [Weiss & Freeman '01], [Wainwright et al. '04], assuming BP has converged and there are no ties in the pseudo min-marginals
Naïve implementation: O(K²) per message [K = number of labels]; often can be improved to O(K) for
Potts interactions, truncated linear, truncated quadratic, ...
TRW provides a lower bound: Lower Bound ≤ E(x*) ≤ E(x')
Slightly different message updates; tries to solve an LP relaxation of the MAP problem [Kolmogorov, Wainwright et al.]
Related to Dynamic Programming
Exact on Trees (Hyper-trees)
Connections to Linear programming
Popular methods: BP, TRW-S, Diffusion
Schedules have a significant effect on the solution
For more details: [Kolmogorov ‘06, Wainwright et al. 05]
(Figure: space of problems; n = number of variables)
- Tree structured pairwise models: O(n³)
- Submodular functions: O(n⁶) (includes the segmentation energy)
- General CSP, e.g. MAXCUT: NP-hard
A pseudo-Boolean function f: {0,1}^n → ℝ is submodular if
f(A) + f(B) ≥ f(A∨B) + f(A∧B) for all A, B ∈ {0,1}^n (element-wise OR and AND).
Example: n = 2, A = [1,0], B = [0,1]: f([1,0]) + f([0,1]) ≥ f([1,1]) + f([0,0]).
Property: the sum of submodular functions is submodular.
The binary image segmentation energy is submodular:
E(x) = ∑_i ci xi + ∑_{i,j} dij |xi−xj|
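To make the inequality concrete, a tiny check on the n = 2 example; the coefficients c1, c2, d are hypothetical:

    #include <cstdio>

    // E(x1,x2) = c1*x1 + c2*x2 + d*|x1-x2| with assumed costs c1=1, c2=2, d=3.
    int E(int x1, int x2) {
        const int c1 = 1, c2 = 2, d = 3;   // hypothetical coefficients
        return c1 * x1 + c2 * x2 + d * (x1 != x2);
    }

    int main() {
        int lhs = E(1, 0) + E(0, 1);   // f(A) + f(B)
        int rhs = E(1, 1) + E(0, 0);   // f(A v B) + f(A ^ B)
        std::printf("%d >= %d : %s\n", lhs, rhs, lhs >= rhs ? "submodular" : "not");
    }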
Discrete Analogues of Concave Functions [Lovasz, ’83]
Widely applied in Operations Research
Applications in Machine Learning
MAP Inference in Markov Random Fields
Clustering [Narasimhan, Jojic & Bilmes, NIPS 2005]
Structure Learning [Narasimhan & Bilmes, NIPS 2006]
Maximizing the spread of influence through a social network [Kempe, Kleinberg & Tardos, KDD 2003]
Polynomial time algorithms. Ellipsoid algorithm: [Grotschel, Lovasz & Schrijver '81]
First strongly polynomial algorithm: [Iwata et al. '00] [A. Schrijver '00]
Current best: O(n⁵ Q + n⁶), where Q is the function evaluation time [Orlin '07]
Symmetric functions E(x) = E(1−x): can be minimized in O(n³)
Minimizing pairwise submodular functions:
Can be transformed to st-mincut/max-flow [Hammer, 1965]
Very low empirical running time, ~ O(n)
E(x) = ∑_i fi(xi) + ∑_{ij} gij(xi,xj)
Graph (V, E, C)
Vertices V = {v1, v2, …, vn}
Edges E = {(v1, v2), …}
Costs C = {c(1, 2), …}
(Figure: graph with source, sink, nodes v1, v2 and edge costs 2, 5, 9, 4, 1, 2)
What is an st-cut?
An st-cut (S,T) divides the nodes between source and sink.
What is the cost of an st-cut?
The sum of the costs of all edges going from S to T, e.g. 5 + 1 + 9 = 15.
What is the st-mincut?
The st-cut with the minimum cost, here 2 + 2 + 4 = 8.
(Figure: the example graph with source, sink, nodes v1, v2 and edge costs 2, 5, 9, 4, 1, 2)
Construct a graph such that:
1. Any st-cut corresponds to an assignment of x
2. The cost of the cut is equal to the energy of x : E(x)
(Figure: the st-mincut, with source set S and sink set T, gives the solution with energy E(x))
[Hammer, 1965] [Kolmogorov and Zabih, 2002]
E(x) = ∑_i θi(xi) + ∑_{i,j} θij(xi,xj)
with θij(0,1) + θij(1,0) ≥ θij(0,0) + θij(1,1) for all i,j
Equivalent (transformable) to:
E(x) = ∑_i ci xi + ∑_{i,j} cij xi(1−xj), with cij ≥ 0
Example: E(a1,a2) = 2a1 + 5ā1 + 9a2 + 4ā2 + 2a1ā2 + ā1a2
(Figure: graph with Source (0), Sink (1), nodes a1, a2 and edge costs 2, 5, 9, 4, 2, 1)
The cut with a1 = 1, a2 = 1 has cost of cut = E(1,1) = 11.
The st-mincut corresponds to a1 = 1, a2 = 0:
st-mincut cost = E(1,0) = 8
Solve the dual maximum flow problem:
Compute the maximum flow between Source and Sink subject to
Edges: Flow ≤ Capacity
Nodes: Flow in = Flow out
Min-cut/Max-flow Theorem: assuming non-negative capacities, the maximum flow in a network equals the cost of the st-mincut.
(Figure: the example network with edge capacities 2, 5, 9, 4, 2, 1)
Augmenting Path Based Algorithms:
1. Find a path from source to sink with positive capacity
2. Push the maximum possible flow through this path
3. Repeat until no path can be found
(Figure sequence on the example graph: pushing 2 units along Source→v1→Sink updates the residual capacities 2−2 = 0 and 5−2 = 3, giving Flow = 2; pushing 4 units along Source→v2→Sink gives Flow = 6; a final augmentation pushes 2 units through the pairwise edges, giving Flow = 8, after which no augmenting path remains. A sketch of the algorithm follows below.)
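A minimal sketch of the three steps above (Edmonds-Karp: BFS finds a shortest augmenting path). The node numbering and the orientation of the two pairwise edges are assumptions read off the cut costs earlier; the expected output is the max flow 8:

    #include <algorithm>
    #include <climits>
    #include <cstdio>
    #include <queue>
    #include <vector>

    int main() {
        const int n = 4, s = 0, t = 3;     // 0 = source, 1 = v1, 2 = v2, 3 = sink
        std::vector<std::vector<int>> cap(n, std::vector<int>(n, 0));
        cap[0][1] = 2; cap[0][2] = 9;      // source -> v1, source -> v2
        cap[1][3] = 5; cap[2][3] = 4;      // v1 -> sink, v2 -> sink
        cap[1][2] = 1; cap[2][1] = 2;      // pairwise edges

        int flow = 0;
        while (true) {
            // Step 1: BFS for a path with positive residual capacity.
            std::vector<int> parent(n, -1);
            parent[s] = s;
            std::queue<int> q;
            q.push(s);
            while (!q.empty() && parent[t] == -1) {
                int u = q.front(); q.pop();
                for (int v = 0; v < n; ++v)
                    if (parent[v] == -1 && cap[u][v] > 0) { parent[v] = u; q.push(v); }
            }
            if (parent[t] == -1) break;    // Step 3: no augmenting path left.

            // Step 2: push the bottleneck flow, updating residual capacities.
            int push = INT_MAX;
            for (int v = t; v != s; v = parent[v]) push = std::min(push, cap[parent[v]][v]);
            for (int v = t; v != s; v = parent[v]) {
                cap[parent[v]][v] -= push;
                cap[v][parent[v]] += push; // reverse edge gains capacity
            }
            flow += push;
        }
        std::printf("max flow = %d\n", flow);   // prints 8, the st-mincut cost
    }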
Flow as reparametrization: E(a1,a2) = 2a1 + 5ā1 + 9a2 + 4ā2 + 2a1ā2 + ā1a2
(Figure: the corresponding graph with Source (0) and Sink (1))
Pushing 2 units of flow through the terminal edges of a1:
2a1 + 5ā1 = 2(a1 + ā1) + 3ā1 = 2 + 3ā1
E(a1,a2) = 2 + 3ā1 + 9a2 + 4ā2 + 2a1ā2 + ā1a2
Pushing 4 units of flow through the terminal edges of a2:
9a2 + 4ā2 = 4(a2 + ā2) + 5a2 = 4 + 5a2
E(a1,a2) = 6 + 3ā1 + 5a2 + 2a1ā2 + ā1a2
Pushing 2 more units of flow along the path through the pairwise edge:
3ā1 + 5a2 + 2a1ā2 = 2(ā1 + a2 + a1ā2) + ā1 + 3a2 = 2(1 + ā1a2) + ā1 + 3a2
since F1 = ā1 + a2 + a1ā2 and F2 = 1 + ā1a2 agree on all assignments:

a1 a2 | F1 F2
 0  0 |  1  1
 0  1 |  2  2
 1  0 |  1  1
 1  1 |  1  1

E(a1,a2) = 8 + ā1 + 3a2 + 3ā1a2
No more augmenting paths are possible.
The total flow (8) is a bound on the energy of the optimal solution; the residual graph has only positive coefficients.
Tight bound → inference of the optimal solution becomes trivial:
a1 = 1, a2 = 0, E(1,0) = 8 = st-mincut cost.
(Figure: residual graph with Source (0) and Sink (1))
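The walkthrough rests on the claim that pushing flow only reparametrizes the energy. A small check (the helper nb stands for the bar/negation) that the original and final energies agree on all four assignments:

    #include <cstdio>

    int main() {
        auto nb = [](int a) { return 1 - a; };
        for (int a1 = 0; a1 <= 1; ++a1)
            for (int a2 = 0; a2 <= 1; ++a2) {
                int orig  = 2*a1 + 5*nb(a1) + 9*a2 + 4*nb(a2) + 2*a1*nb(a2) + nb(a1)*a2;
                int repar = 8 + nb(a1) + 3*a2 + 3*nb(a1)*a2;
                std::printf("E(%d,%d): %d == %d\n", a1, a2, orig, repar);
            }
    }

The minimum over the four printed values is E(1,0) = 8, matching the st-mincut cost.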
Augmenting Path and Push-Relabel complexities (n: #nodes, m: #edges, U: maximum edge weight; the algorithms assume non-negative edge weights).
[Slide credit: Andrew Goldberg]
Ford-Fulkerson: choose any augmenting path.
(Figure: graph with four edges of capacity 1000 and a middle edge of capacity 1)
Good augmenting paths: the two disjoint Source→Sink paths; two augmentations of 1000 units suffice.
Bad augmenting path: the path through the capacity-1 middle edge pushes only 1 unit (residuals become 999, 1000, 999, 1); we will have to perform 2000 augmentations!
Worst case complexity: O(m × Total_Flow) (pseudo-polynomial bound: depends on the flow)
Dinic: choose the shortest augmenting path. Worst case complexity: O(m n²)   (n: #nodes, m: #edges)
Specialized algorithms for vision problems: grid graphs with low connectivity (m ~ O(n)).
Dual search tree augmenting path algorithm [Boykov and Kolmogorov PAMI 2004]
• Finds approximate shortest augmenting paths efficiently
• High worst-case time complexity
• Empirically outperforms other algorithms on vision problems
Efficient code available on the web
http://www.adastral.ucl.ac.uk/~vladkolm/software.html
E(x) = ∑_i ci xi + ∑_{i,j} dij |xi−xj|
E: {0,1}^n → R, 0 → fg, 1 → bg (n = number of pixels)
Global minimum: x* = argmin_x E(x)
How to minimize E(x)?

Graph construction (pseudocode in the style of the maxflow library):

Graph *g;
for all pixels p {
    /* Add a node to the graph */
    nodeID(p) = g->add_node();
    /* Set cost of terminal edges */
    set_weights(nodeID(p), fgCost(p), bgCost(p));
}
for all adjacent pixels p,q {
    /* Set cost of the pairwise edge */
    add_weights(nodeID(p), nodeID(q), cost(p,q));
}
g->compute_maxflow();
label_p = g->is_connected_to_source(nodeID(p)); // the label of pixel p (0 or 1)

(Figure: graph with Source (0) and Sink (1); terminal edges fgCost(a1), bgCost(a1), fgCost(a2), bgCost(a2); pairwise edge cost(p,q); resulting labels a1 = bg, a2 = fg)
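A concrete version of the pseudocode, using the Boykov-Kolmogorov maxflow library (graph.h, available at the URL given earlier). The API names (add_node, add_tweights, add_edge, maxflow, what_segment) follow its v3.0 release; treat them as assumptions if your version differs. The costs reproduce the running 2-pixel example, so the printed mincut cost is 8:

    #include <cstdio>
    #include "graph.h"

    typedef Graph<int, int, int> GraphType;

    int main() {
        const int num_pixels = 2;
        GraphType g(/*node_num_max*/ num_pixels, /*edge_num_max*/ 1);

        g.add_node(num_pixels);                      // one node per pixel
        g.add_tweights(0, /*source cap*/ 2, /*sink cap*/ 5);
        g.add_tweights(1, /*source cap*/ 9, /*sink cap*/ 4);
        g.add_edge(0, 1, /*cap*/ 1, /*rev_cap*/ 2);  // pairwise edge

        std::printf("mincut cost = %d\n", g.maxflow());
        for (int p = 0; p < num_pixels; ++p)          // side of the cut = label
            std::printf("pixel %d -> label %d\n", p,
                        g.what_segment(p) == GraphType::SOURCE ? 0 : 1);
        return 0;
    }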
Mixed (Real-Integer) Problems
Higher Order Energy Functions
Multi-label Problems
Ordered Labels ▪ Stereo (depth labels)
Unordered Labels ▪ Object segmentation ( ‘car’, `road’, `person’)
x – binary image segmentation (xi ∊ {0,1})
ω – non-local parameter (lives in some large set Ω), e.g. ω = (Template, Position, Scale, Orientation)
E(x,ω) = C(ω) + ∑_i θi(ω, xi) + ∑_{i,j} θij(ω, xi, xj)
         constant + unary potentials + pairwise potentials (≥ 0)
We have seen several of them in the intro ...
{x*,ω*} = argmin_{x,ω} E(x,ω)
• Standard "graph cut" energy if ω is fixed
[Kohli et al, 06,08] [Lempitsky et al, 08]
Local Method: Gradient Descent over ω
ω* = argmin_ω min_x E(x,ω)
The inner minimization over x is submodular (a graph cut).
Dynamic Graph Cuts: E(x,ω1) and E(x,ω2) are similar energy functions, so computation can be reused; 15-20 times speedup!
[Kohli et al, 06,08]
Global Method: Branch and Mincut [Lempitsky et al, 08]
Produces the globally optimal solution; in the worst case exhaustively explores all ω in Ω (30,000,000 shapes).
Mixed (Real-Integer) Problems
Higher Order Energy Functions
Multi-label Problems
Ordered Labels ▪ Stereo (depth labels)
Unordered Labels ▪ Object segmentation ( ‘car’, `road’, `person’)
E(x) = ∑_i ci xi + ∑_{i,j} dij |xi−xj|
E: {0,1}^n → R, 0 → fg, 1 → bg (n = number of pixels)
[Boykov and Jolly '01] [Blake et al. '04] [Rother, Kolmogorov and Blake '04]
(Figure: image, unary cost, segmentation)
Adding a higher order patch term [Kohli et al. '07]:
E(x) = ∑_i ci xi + ∑_{i,j} dij |xi−xj| + ∑_p hp(xp)
hp(xp) = C1 if xi = 0 for all i ∈ p, and Cmax otherwise
E: {0,1}^n → R, 0 → fg, 1 → bg (n = number of pixels)
(Figure: image, pairwise segmentation, final segmentation)
(Figure: st-mincut with source set S and sink set T)
Exact Transformation: Higher Order Submodular Functions → Pairwise Submodular Function?
Billionnet and Minoux [DAM 1985], Kolmogorov & Zabih [PAMI 2004], Freedman & Drineas [CVPR 2005], Kohli Kumar Torr [CVPR 2007, PAMI 2008], Kohli Ladicky Torr [CVPR 2008, IJCV 2009], Ramalingam Kohli Alahari Torr [CVPR 2008], Zivny et al. [CP 2008]
Identified transformable families of higher order functions such that:
1. A constant or polynomial number of auxiliary variables (a) is added
2. All pairwise functions (g) are submodular
Simple Example using Auxiliary variables:
f(x) = 0 if all xi = 0, C1 otherwise   (x ∈ {0,1}^n)
min_x f(x) = min_{x, a ∈ {0,1}} C1·a + C1 ∑_i ā·xi
(Higher order submodular function → quadratic submodular function)
∑xi < 1 ⇒ a = 0 (ā = 1), f(x) = 0
∑xi ≥ 1 ⇒ a = 1 (ā = 0), f(x) = C1
(Figure: cost against ∑xi; the transformed function is the lower envelope of the constant C1 (a = 1) and the line C1∑xi (a = 0); a small check follows below)
[Kohli et al. '08]
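A small sanity check of this construction, with assumed values C1 = 10 and n = 3: for every x, minimizing over the auxiliary variable a reproduces f(x):

    #include <algorithm>
    #include <cstdio>

    int main() {
        const int C1 = 10, n = 3;
        for (int mask = 0; mask < (1 << n); ++mask) {
            int sum = 0;
            for (int b = 0; b < n; ++b) sum += (mask >> b) & 1;  // sum of the xi
            int f = (sum == 0) ? 0 : C1;          // the higher order potential
            int best = std::min(C1 * 1 /* a=1 */, C1 * sum /* a=0 */);
            std::printf("sum=%d  f=%d  min_a=%d  %s\n", sum, f, best,
                        f == best ? "ok" : "MISMATCH");
        }
    }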
More generally: min_x f(x) = min_{x, a ∈ {0,1}} f1(x)·a + f2(x)·ā
(Higher order submodular function → quadratic submodular function)
The lower envelope of concave functions is concave.
(Figure: f1(x) and f2(x) plotted over ∑xi; a selects the active function)
[Kohli et al. '08]
With a max over the auxiliary variable instead:
min_x f(x) = min_{x ∈ {0,1}^n} max_{a ∈ {0,1}} f1(x)·a + f2(x)·ā
The upper envelope of linear functions is convex: a very hard problem! [Kohli and Kumar, CVPR 2010]
SIZE/VOLUME PRIORS: binary segmentation with a prior on the size of the object (say n/2) vs. background [Kohli and Kumar, CVPR 2010]
SILHOUETTE CONSTRAINTS for 3D reconstruction: rays must touch the silhouette at least once [Sinha et al. '05, Cremers et al. '08] [Woodford et al. 2009]
Mixed (Real-Integer) Problems
Higher Order Energy Functions
Multi-label Problems
Ordered Labels ▪ Stereo (depth labels)
Unordered Labels ▪ Object segmentation ( ‘car’, `road’, `person’)
Two approaches: exact transformation to QPBF, and move making algorithms.
min_y E(y) = ∑_i fi(yi) + ∑_{i,j} gij(yi,yj), with y ∈ Labels L = {l1, l2, …, lk}
[Roy and Cox '98] [Ishikawa '03] [Schlesinger & Flach '06] [Ramalingam, Alahari, Kohli, and Torr '08]
So what is the problem?
Transform the multi-label problem Em(y1,…,yn), yi ∈ L = {l1,…,lk}, into a binary label problem Eb(x1,…,xm), xi ∈ {0,1}, such that:
Let Y and X be the sets of feasible solutions; then
1. there is a one-one encoding function T: X → Y, and
2. argmin Em(y) = T(argmin Eb(x))
• Popular encoding scheme [Roy and Cox '98, Ishikawa '03, Schlesinger & Flach '06]:
# nodes = n·k, # edges = m·k²
Ishikawa's result: E(y) = ∑_i θi(yi) + ∑_{i,j} θij(yi,yj), y ∈ L^n, is exactly solvable if
θij(yi,yj) = g(|yi−yj|) for a convex function g.
Schlesinger & Flach '06: the same encoding works under the more general submodularity condition
θij(li+1, lj) + θij(li, lj+1) ≥ θij(li, lj) + θij(li+1, lj+1)
Applicability: cannot handle truncated costs (non-robust), i.e. the discontinuity preserving potentials θij(yi,yj) = g(|yi−yj|) with truncation [Blake & Zisserman '83, '87].
Computational cost: very high, since problem size = |variables| × |labels|; e.g. gray level image denoising of a 1 Mpixel image needs ~2.5 × 10⁸ graph nodes.
Method                          | Unary Potentials | Pairwise Potentials  | Complexity
Ishikawa Transformation [03]    | Arbitrary        | Convex and Symmetric | T(nk, mk²)
Schlesinger Transformation [06] | Arbitrary        | Submodular           | T(nk, mk²)
Hochbaum [01]                   | Linear           | Convex and Symmetric | T(n, m) + n log k
Hochbaum [01]                   | Convex           | Convex and Symmetric | O(mn log n log nk)

Other "less known" algorithms exist.
T(a,b) = complexity of maxflow with a nodes and b edges
Two approaches: exact transformation to QPBF, and move making algorithms.
min_y E(y) = ∑_i fi(yi) + ∑_{i,j} gij(yi,yj), with y ∈ Labels L = {l1, l2, …, lk}
[Boykov , Veksler and Zabih 2001] [Woodford, Fitzgibbon, Reid, Torr, 2008]
[Lempitsky, Rother, Blake, 2008] [Veksler, 2008] [Kohli, Ladicky, Torr 2008]
(Figure: energy over the solution space; the move space (t) is a search neighbourhood around the current solution xc; the optimal move minimizes the energy within it)
Key Property: a bigger move space gives
• Better solutions
• But finding the optimal move becomes harder
Minimizing Pairwise Functions [Boykov, Veksler and Zabih, PAMI 2001]
• Series of locally optimal moves
• Each move reduces the energy
• Optimal move found by minimizing a submodular function
Space of solutions (x): L^n; move space (t): 2^n (n = number of variables, L = number of labels)
Kohli et al. '07, '08, '09 extend this to minimize higher order functions.
Minimize over move variables t:
x = t·x1 + (1−t)·x2   (x1 = current solution, x2 = second solution, x = new solution)
Move energy: Em(t) = E(t·x1 + (1−t)·x2)
For certain x1 and x2, the move energy is a submodular QPBF [Boykov, Veksler and Zabih 2001]
αβ-Swap: variables labeled α, β can swap their labels [Boykov, Veksler and Zabih 2001]
(Figure: Sky, House, Tree, Ground; swap of the Sky and House labels)
Move energy is submodular if:
Unary potentials: arbitrary
Pairwise potentials: semi-metric, i.e. θij(la,lb) ≥ 0 and θij(la,lb) = 0 ⇔ a = b
Examples: Potts model, truncated convex
α-Expansion: variables either take label α or retain their current label [Boykov, Veksler and Zabih 2001]
(Figure: initialize with Tree; expand Ground, then House, then Sky)
Move energy is submodular if:
Unary potentials: arbitrary
Pairwise potentials: metric, i.e. semi-metric + triangle inequality θij(la,lb) + θij(lb,lc) ≥ θij(la,lc)
Examples: Potts model, truncated linear (but not truncated quadratic)
A sketch of the outer loop follows below.
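A minimal sketch of the α-expansion outer loop; solve_expansion_move stands in for a binary graph-cut solver of the (submodular, for metric pairwise potentials) move energy and is a hypothetical callback, not part of any specific library:

    #include <functional>
    #include <vector>

    using Labeling = std::vector<int>;

    Labeling alpha_expansion(
            Labeling x, int num_labels,
            std::function<double(const Labeling&)> E,
            std::function<Labeling(const Labeling&, int)> solve_expansion_move) {
        bool improved = true;
        while (improved) {                 // series of locally optimal moves
            improved = false;
            for (int alpha = 0; alpha < num_labels; ++alpha) {
                // Each variable keeps its label or takes alpha.
                Labeling y = solve_expansion_move(x, alpha);
                if (E(y) < E(x)) { x = y; improved = true; }  // move reduces E
            }
        }
        return x;
    }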
Expansion and Swap can be derived as a primal-dual scheme: we get a solution of the dual problem, which is a lower bound on the energy of the optimal solution [Komodakis et al 05, 07].
Weak guarantee on the solution: E(x) < 2 (dmax/dmin) E(x*), where for θij(li,lj) = g(|li−lj|), dmax and dmin are the largest and smallest nonzero pairwise costs.
Move Type | First Solution | Second Solution | Guarantee
Expansion | Old solution   | All alpha       | Metric
Fusion    | Any solution   | Any solution    | –

Fusion move: minimize over move variables t, with x = t·x1 + (1−t)·x2 (x1 = first solution, x2 = second solution, x = new solution).
Move functions can be non-submodular! x1, x2 can be continuous.
Optical flow example: (Figure: solution from method 1, solution from method 2, fused final solution)
[Woodford, Fitzgibbon, Reid, Torr, 2008] [Lempitsky, Rother, Blake, 2008]
Range moves: move variables can be multi-label, x = (t==1)·x1 + (t==2)·x2 + … + (t==k)·xk
The optimal move is found using the Ishikawa transform.
Useful for minimizing energies with truncated convex pairwise potentials: θij(yi,yj) = min(|yi−yj|², T)
[Kumar and Torr, 2008] [Veksler, 2008]
3,600,000,000 pixels, created from about 800 8-megapixel images [Kopf et al. (MSR Redmond) SIGGRAPH 2007]
Processing videos: a 1 minute video at 1 Mpixel resolution is 3.6 B pixels.
3D reconstruction: 500 × 500 × 500 = 0.125 B voxels
Kohli & Torr (ICCV05, PAMI07)
Can we do better?
(Figure: segment the first frame, then the second frame)
Dynamic graph cuts: minimize EA → solution SA. For the second frame, instead of minimizing EB from scratch, reuse computation: reparametrize EB using the differences between A and B into a simpler energy EB*, which is fast to minimize and yields SB.
EA and EB are similar energy functions → 3–100000 times speedup!
[Kohli & Torr, ICCV05 PAMI07] [Komodakis & Paragios, CVPR07]
Reparametrized Energy [Kohli & Torr, ICCV05 PAMI07] [Komodakis & Paragios, CVPR07]
Original energy:            E(a1,a2) = 2a1 + 5ā1 + 9a2 + 4ā2 + 2a1ā2 + ā1a2
Reparametrized energy:      E(a1,a2) = 8 + ā1 + 3a2 + 3ā1a2
New energy:                 E(a1,a2) = 2a1 + 5ā1 + 9a2 + 4ā2 + 7a1ā2 + ā1a2
New reparametrized energy:  E(a1,a2) = 8 + ā1 + 3a2 + 3ā1a2 + 5a1ā2
Standard pipeline [Alahari Kohli & Torr CVPR '08]:
Original Problem (Large) → Approximation algorithm (Slow) → Approximate Solution

With reduction [Kovtun '03] [Kohli et al. '09]:
Original Problem (Large) → Fast partially optimal algorithm → Reduced Problem (part of the problem solved to global optimality) → Approximation algorithm (Fast) → Approximate Solution

Example with Tree Reweighted Message Passing: 9.89 sec on the original problem vs. 0.30 sec total time with the reduction.
(Figure: labelling results with classes sky, building, airplane, grass)
3-100 times speedup for Tree Reweighted Message Passing using the fast partially optimal algorithm [Kovtun '03] [Kohli et al. '09] [Alahari Kohli & Torr CVPR '08]
Message Passing algorithms
Belief propagation, TRW
Exact on tree structured graphs
Connections to Linear programming relaxations
Graph Cuts and submodular functions
Move making algorithms – expansion, swap, fusion
Minimizing Higher order functions
Going beyond small connectivity
Representation, Inference and learning of Higher order models
Topology constraints: connectivity of objects
Parallelization – GPUs
Linear programming formulations
Past presenters
Pawan, Carsten and Andrew Blake
Special thanks to the IbPRIA team
In particular:
Joost Van de Weijer
Jordi Gonzàlez
Mario Hernández Tejera
Set of label assignments (patterns) and their associated costs:
Introduce a pattern variable z with (t+1) states to get a pairwise function:
- the unary term assigns the cost of the selected label assignment z
- the pairwise terms ensure x is consistent with the pattern z
(Figure: patterns p1, p2, p3)
E(x) = ∑_i θi(xi,zi) + ∑_{i,j ∈ N4} θij(xi,xj,zi,zj)
(zi: observed variables; xi, xj: unobserved, latent variables)
Conditional Random Field (CRF): no pure prior
(Figure: factor graphs of the MRF vs. the CRF over xi, xj and zi, zj)
θij(xi,xj,zi,zj) = |xi−xj| · exp{−β||zi−zj||²}, with β = 2(Mean(||zi−zj||²))⁻¹
(Figure: the pairwise cost θij decays with the image contrast ||zi−zj||; a sketch follows below)
Pairwise: θij(xi,xj) = g(|xi−xj|)
E(x) = ∑_i θi(xi,zi) + w ∑_{i,j} θij(xi,xj)
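A sketch of the contrast-sensitive weight above; zi and zj are RGB colours, and mean_grad2 is assumed to be the mean of ||zi−zj||² over neighbouring pairs, computed in a first pass. Whether β is 2/mean or 1/(2·mean) varies between papers; the slide's formula is followed here:

    #include <cmath>

    double contrast_weight(const double zi[3], const double zj[3], double mean_grad2) {
        double d2 = 0.0;
        for (int c = 0; c < 3; ++c) d2 += (zi[c] - zj[c]) * (zi[c] - zj[c]);
        const double beta = 2.0 / mean_grad2;  // beta = 2 * (Mean(||zi-zj||^2))^-1
        return std::exp(-beta * d2);           // multiplies |xi - xj| in theta_ij
    }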
Unary: θi(xi,zi) = |xi−zi|
Convex unary and pairwise terms with xi ∈ R: can be solved globally optimally.
(Figure: original input; TRW-S (discrete labels); HBF [Szeliski '06] (continuous labels), ~15 times faster)
Non-submodular Energy Functions
Mixed (Real-Integer) Problems
Higher Order Energy Functions
Multi-label Problems
Ordered Labels ▪ Stereo (depth labels)
Unordered Labels ▪ Object segmentation ( ‘car’, `road’, `person’)
Minimizing general non-submodular functions is NP-hard.
Commonly used method is to solve a relaxation of the problem
E(x) = ∑_i θi(xi) + ∑_{i,j} θij(xi,xj)
with θij(0,1) + θij(1,0) ≤ θij(0,0) + θij(1,1) for some i,j
[Boros and Hammer, '02]
Formulate the LP relaxation; solve the LP and obtain an integer solution by rounding.
The integral solution is not guaranteed to be optimal, but we can obtain multiplicative bounds on the error.
(Figure: feasibility region (LOCAL polytope); round the fractional solution)
Traditional LP solvers (Simplex, Interior Point Methods) are unable to handle large instances (>10⁷ variables) [Bertsimas and Tsitsikilis, 1997] [Boyd and Vandenberghe, 2004]
Exploit the structure of the problem: special message passing solvers [Kolmogorov and Wainwright] [Ravikumar et al., ICML 2008]; message passing algorithms have weak convergence properties.
(Figure: feasibility region (LOCAL polytope))
A new, weaker relaxation: roof-dual [Boros and Hammer, 2002] [Kohli et al., ICML 2008] [Kolmogorov and Rother, PAMI 2006]
... which can be exactly solved by minimizing a pairwise submodular function.
Obtain partially optimal solutions, e.g. x = [00*10**1111*] vs. xOPT = [001100011110]
(Figure: feasibility region (LOCAL polytope))
The energy is decomposed into unary, pairwise submodular, and pairwise non-submodular parts; each non-submodular term θpq is replaced by a submodular approximation θ̃pq satisfying
θ̃pq(0,0) + θ̃pq(1,1) ≤ θ̃pq(0,1) + θ̃pq(1,0)
[Boros and Hammer, '02]