Course Program
9.30-10.00 Introduction (Andrew Blake)
10.00-11.00 Discrete Models in Computer Vision (Carsten Rother)
15min Coffee break
11.15-12.30 Message Passing: DP, TRW, LP relaxation (Pawan Kumar)
12.30-13.00 Quadratic pseudo-boolean optimization (Pushmeet Kohli)
1 hour Lunch break
14:00-15.00 Transformation and move-making methods (Pushmeet Kohli)
15:00-15.30 Speed and Efficiency (Pushmeet Kohli)
15min Coffee break
15:45-16.15 Comparison of Methods (Carsten Rother)
16:15-17.30 Recent Advances: Dual-decomposition, higher-order, etc. (Carsten Rother + Pawan Kumar)
All material will be available online (after the conference): http://research.microsoft.com/en-us/um/cambridge/projects/tutorial/
Discrete Models in Computer Vision
Carsten Rother
Microsoft Research Cambridge
Overview
• Introduce factor graph notation
• Categorization of models in Computer Vision:
– 4-connected MRFs
– Highly-connected MRFs
– Higher-order MRFs
Model : discrete or continuous variables? discrete or continuous space? Dependence between variables? …
Markov Random Field Models for Computer Vision
Inference: Graph Cut (GC)
Belief Propagation (BP)
Tree-Reweighted Message Passing (TRW)
Iterated Conditional Modes (ICM)
Cutting-plane
Dual-decomposition
…
Learning: Exhaustive search (grid search)
Pseudo-Likelihood approximation
Training in Pieces
Max-margin
…
Applications: 2D/3D image segmentation, object recognition, 3D reconstruction, stereo matching, image denoising, texture synthesis, pose estimation, panoramic stitching, …
Recap: Image Segmentation
P(x|z) ~ P(z|x) P(x)      (Posterior ~ Likelihood x Prior)
P(x|z) ~ exp{-E(x)}       (Gibbs distribution)
Energy: E: {0,1}^n → R
E(x) = ∑i θi(xi) + w ∑i,j Є N4 θij(xi,xj)      (unary terms + pairwise terms)
Maximum-a-posteriori (MAP): x* = argmax_x P(x|z) = argmin_x E(x)
(Figure: input z and MAP solution x*)

Min-Marginals (uncertainty of the MAP solution)
Definition: ψv;i = min{ E(x) : xv = i }
Can be used in several ways:
• Insights on the model
• For optimization (TRW, comes later)
(Figure: image, MAP, min-marginals of the foreground; bright = very certain)
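As a minimal sketch (not from the slides), this energy can be evaluated for a binary labelling on a 4-connected grid as follows; the unary cost array `theta_unary` and the weight `w` are hypothetical inputs:

```python
import numpy as np

def segmentation_energy(x, theta_unary, w):
    """Evaluate E(x) = sum_i theta_i(x_i) + w * sum_{i,j in N4} |x_i - x_j|
    for a binary labelling x on a 4-connected grid.

    x           : (H, W) integer array of labels in {0, 1}
    theta_unary : (H, W, 2) array, theta_unary[i, j, l] = cost of label l at pixel (i, j)
    w           : weight of the pairwise (Ising) term
    """
    # Unary term: pick the cost of the chosen label at every pixel.
    unary = np.take_along_axis(theta_unary, x[..., None], axis=2).sum()

    # Pairwise Ising term: count label disagreements along vertical and horizontal edges.
    pairwise = np.abs(np.diff(x, axis=0)).sum() + np.abs(np.diff(x, axis=1)).sum()

    return unary + w * pairwise
```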
Introducing Factor Graphs
Write probability distributions as Graphical models:
- Directed graphical model
- Undirected graphical model (… what Andrew Blake used)
- Factor graphs
References:
- Pattern Recognition and Machine Learning [Bishop '08, book, chapter 8]
- several lectures at the Machine Learning Summer School 2009 (see video lectures)
Factor Graphs
P(x) ~ θ(x1,x2,x3) θ(x2,x4) θ(x3,x4) θ(x3,x5)      "4 factors"
P(x) ~ exp{-E(x)}  with  E(x) = θ(x1,x2,x3) + θ(x2,x4) + θ(x3,x4) + θ(x3,x5)      (Gibbs distribution)
(Factor graph over the unobserved/latent/hidden variables x1,…,x5)
A factor node connects exactly the variables that appear in the same factor.
Definition "Order": the arity (number of variables) of the largest factor.
Example: P(x) ~ θ(x1,x2,x3) θ(x2,x4) θ(x3,x4) θ(x3,x5)
(Factor graph with order 3: one factor of arity 3, three factors of arity 2)
Examples - Order

4-connected, pairwise MRF:         E(x) = ∑i,j Є N4 θij(xi,xj)                     Order 2
Higher(8)-connected, pairwise MRF: E(x) = ∑i,j Є N8 θij(xi,xj)                     Order 2
Higher-order MRF:                  E(x) = ∑i,j Є N4 θij(xi,xj) + θ(x1,…,xn)        Order n

"Pairwise energy" vs. "higher-order energy"
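As a small illustration (a hypothetical representation, not from the slides), factors can be stored simply as tuples of variable indices; the order of the model is then just the largest tuple length:

```python
# Hypothetical factor lists for the example energies above (variable indices only).
pairwise_4conn = [(1, 2), (1, 3), (2, 4), (3, 4)]       # pairwise factors -> order 2
pairwise_8conn = pairwise_4conn + [(1, 4), (2, 3)]      # more edges, still order 2
higher_order   = pairwise_4conn + [tuple(range(1, 6))]  # one factor over all n = 5 variables

def order(factors):
    """Order of a factor graph = arity (number of variables) of its largest factor."""
    return max(len(f) for f in factors)

print(order(pairwise_4conn), order(pairwise_8conn), order(higher_order))  # 2 2 5
```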
Example: Image segmentation
P(x|z) ~ exp{-E(x)}
E(x) = ∑i θi(xi,zi) + ∑i,j Є N4 θij(xi,xj)
(Factor graph: unobserved (latent) variables xi, xj; observed variables zi)
Simplest inference technique: ICM (iterated conditional modes)

Goal: x* = argmin_x E(x)
E(x) = θ12(x1,x2) + θ13(x1,x3) + θ14(x1,x4) + θ15(x1,x5) + …
Idea: visit one variable at a time, set it to the label that minimizes the energy given its (observed or currently fixed) neighbours, and repeat until no variable changes.

ICM can get stuck in local minima! (Figure: ICM result vs. global minimum)
Simulated annealing: accept a move even if the energy increases (with a certain probability).
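A rough ICM sketch under the same assumptions as the earlier energy sketch (binary labels, 4-connected grid, hypothetical inputs `theta_unary` and `w`): each pixel is set in turn to the label with the lowest energy given its current neighbours, so the energy never increases, which is also why ICM can stop in a local minimum.

```python
import numpy as np

def icm(theta_unary, w, max_sweeps=20):
    """Iterated conditional modes for E(x) = sum_i theta_i(x_i) + w * sum_{i,j in N4} |x_i - x_j|."""
    H, W, _ = theta_unary.shape
    x = theta_unary.argmin(axis=2)            # start from the unary-only labelling
    for _ in range(max_sweeps):
        changed = False
        for i in range(H):
            for j in range(W):
                nbrs = [x[a, b] for a, b in ((i-1, j), (i+1, j), (i, j-1), (i, j+1))
                        if 0 <= a < H and 0 <= b < W]
                # Local energy of each candidate label, with all neighbours held fixed.
                costs = [theta_unary[i, j, l] + w * sum(abs(l - n) for n in nbrs)
                         for l in (0, 1)]
                best = int(np.argmin(costs))
                if best != x[i, j]:
                    x[i, j] = best
                    changed = True
        if not changed:                        # no single change lowers E: local minimum
            break
    return x
```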
Overview
• Introduce factor graph notation
• Categorization of models in Computer Vision:
– 4-connected MRFs
– Highly-connected MRFs
– Higher-order MRFs
Stereo matching
(Figure: left image (a), right image (b), ground-truth depth; example disparities d = 0 and d = 4)
• Images are rectified
• Ignore occlusion for now
Labels: d (depth/shift), one label di per pixel
Energy: E(d): {0,…,D-1}^n → R
Stereo matching - Energy
Energy: E(d): {0,…,D-1}^n → R
E(d) = ∑i θi(di) + ∑i,j Є N4 θij(di,dj)
Unary: θi(di) = |li − r(i−di)|   "SAD: sum of absolute differences" (many others possible: NCC, …)
(Figure: left and right image; e.g. di = 2 matches left pixel i to right pixel i−2)
Pairwise: θij(di,dj) = g(|di−dj|)
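A sketch of the SAD unary term for rectified grey-scale images (assumptions: disparity d shifts the right image, `left` and `right` are 2-D arrays, D disparity labels):

```python
import numpy as np

def sad_unary_costs(left, right, D):
    """theta_i(d) = |left_i - right_{i-d}| for every pixel i and disparity d in {0,...,D-1}."""
    left = left.astype(float)
    right = right.astype(float)
    H, W = left.shape
    costs = np.full((H, W, D), np.inf)     # inf where pixel i-d falls outside the right image
    for d in range(D):
        costs[:, d:, d] = np.abs(left[:, d:] - right[:, :W - d])
    return costs
```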
Stereo matching - prior
[Olga Veksler PhD thesis; Daniel Cremers et al.]
θij(di,dj) = g(|di−dj|)
(Figure: cost as a function of |di−dj|)
• No truncation: the global minimum can be computed
• With truncation: discontinuity-preserving potentials [Blake & Zisserman '83, '87], but NP-hard optimization
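A small sketch of the two pairwise costs g(|di−dj|) contrasted above: an untruncated linear cost and a truncated, discontinuity-preserving one; the truncation threshold `T` is a hypothetical parameter.

```python
def linear_cost(di, dj):
    """Untruncated linear cost: convex in |di - dj|."""
    return abs(di - dj)

def truncated_linear_cost(di, dj, T=2):
    """Truncated linear cost: discontinuity preserving, but makes optimization NP-hard."""
    return min(abs(di - dj), T)
```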
Stereo matching results (see http://vision.middlebury.edu/stereo/)
• No MRF: per-pixel independent (winner-takes-all, WTA)
• No horizontal links: efficient, since the chains are independent
• Pairwise MRF [Boykov et al. '01] vs. ground truth
Texture synthesis
[Kwatra et al., SIGGRAPH '03]
(Figure: input texture → synthesized output)
The binary label xi Є {0,1} selects whether pixel i is copied from source image a or source image b.
E: {0,1}^n → R
E(x) = ∑i,j Є N4 |xi−xj| ( |ai−bi| + |aj−bj| )
(Figure: good case — the seam runs where a and b agree; bad case — where they disagree)
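A sketch of this seam energy, assuming `a` and `b` are two aligned float images and `x` is the binary label map that chooses the source per pixel:

```python
import numpy as np

def seam_energy(x, a, b):
    """E(x) = sum_{i,j in N4} |x_i - x_j| * (|a_i - b_i| + |a_j - b_j|).
    The cost is paid only across seams (neighbours taken from different sources),
    and it is small where a and b agree ('good case') and large where they differ."""
    diff = np.abs(a - b)                 # per-pixel disagreement |a_i - b_i|
    cut_v = np.abs(np.diff(x, axis=0))   # 1 where a vertical edge crosses a seam
    cut_h = np.abs(np.diff(x, axis=1))   # 1 where a horizontal edge crosses a seam
    return (cut_v * (diff[:-1, :] + diff[1:, :])).sum() + \
           (cut_h * (diff[:, :-1] + diff[:, 1:])).sum()
```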
Video Synthesis
(Figure: input video; output video (duplicated))
Panoramic stitching
Panoramic stitching
AutoCollage
http://research.microsoft.com/en-us/um/cambridge/projects/autocollage/ [Rother et al., SIGGRAPH '05]
Recap: 4-connected MRFs
• Many useful vision systems are based on 4-connected pairwise MRFs.
• Possible reason (see the inference part): many fast and good (sometimes globally optimal) inference methods exist.
Overview
• Introduce factor graph notation
• Categorization of models in Computer Vision:
– 4-connected MRFs
– Highly-connected MRFs
– Higher-order MRFs
Why larger connectivity?
We have seen…
• "Knock-on" effect (each pixel influences every other pixel)
• Many good systems
What is missing:
1. Modelling real-world texture (images)
2. Reduce discretization artefacts
3. Encode complex prior knowledge
4. Use non-local parameters
Reason 1: Texture modelling
(Figure: training images; test image; test image with 60% noise; results with a 4-connected MRF, a 4-connected MRF (neighbours), and a 9-connected MRF (7 attractive; 2 repulsive))

Reason 1: Texture modelling
(Figure: input / output) [Zalesny et al. '01]
Reason 2: Discretization artefacts
[Boykov et al. '03, '05]
Larger connectivity can model the true Euclidean length (other metrics are also possible).
Length of the two example paths in the figure:
  Euclidean:    5.08   6.75
  4-connected:  5.65   8
  8-connected:  6.28   6.28
Reason 2: Discretization artefacts
Higher connectivity can model the true Euclidean length.
(Figure: segmentation results with 4-connected Euclidean, 8-connected Euclidean, and 8-connected geodesic metrics)
[Boykov et al. '03, '05]
3D reconstruction
[Slide credits: Daniel Cremers]
Reason 3: Encode complex prior knowledge — Stereo with occlusion
Each pixel is connected to D pixels in the other image (left view ↔ right view).
E(d): {1,…,D}^(2n) → R
Pairwise terms θlr(dl,dr) couple a left-image disparity dl with a right-image disparity dr:
(Figure: e.g. d = 10 gives a match (match cost), d = 20 gives cost 0, d = 1 gives cost ∞)
Stereo with occlusion
(Figure: ground truth; stereo with occlusion [Kolmogorov et al. '02]; stereo without occlusion [Boykov et al. '01])
Reason 4: Use non-local parameters — Interactive Segmentation (GrabCut)
[Boykov and Jolly '01]; GrabCut [Rother et al. '04]
(Example image: "A meeting with the Queen")
Reason 4: Use non-local parameters — Interactive Segmentation (GrabCut)
[Rother et al., SIGGRAPH '04]
An object is a compact set of colours (figure: foreground/background colour distributions).
Model segmentation and colour model w jointly:
E(x,w): {0,1}^n x {GMMs} → R
E(x,w) = ∑i θi(xi,w) + ∑i,j Є N4 θij(xi,xj)
Reason 4: Use non-local parameters — Segmentation and Recognition
Large set of example segmentations (exemplars) T(1), T(2), T(3), … — up to 2,000,000 exemplars.
Goal: segment the test image.
E(x,w): {0,1}^n x {Exemplar} → R
E(x,w) = ∑i |T(w)i − xi| + ∑i,j Є N4 θij(xi,xj)
The first sum is the "Hamming distance" between x and the chosen exemplar T(w).
[Lempitsky et al. ECCV '08]
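A sketch of this energy, where the first sum is simply the Hamming distance between x and the selected exemplar T(w); the exemplar list `T` and the Ising pairwise term are assumptions made for illustration:

```python
import numpy as np

def exemplar_energy(x, w, T, pairwise_weight=1.0):
    """E(x, w) = Hamming(x, T(w)) + weighted Ising pairwise term on a 4-connected grid.

    x : (H, W) binary label map
    w : index of the chosen exemplar
    T : list/array of binary exemplar masks, each of shape (H, W)   (hypothetical input)
    """
    hamming = np.abs(T[w].astype(int) - x.astype(int)).sum()   # sum_i |T(w)_i - x_i|
    pairwise = np.abs(np.diff(x, axis=0)).sum() + np.abs(np.diff(x, axis=1)).sum()
    return hamming + pairwise_weight * pairwise
```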
Reason 4: Use non-local parameters — Segmentation and Recognition
[Lempitsky et al. ECCV '08]
UIUC dataset; 98.8% accuracy
Overview
• Introduce factor graph notation
• Categorization of models in Computer Vision:
– 4-connected MRFs
– Highly-connected MRFs
– Higher-order MRFs
Why Higher-order Functions?
In general θ(x1,x2,x3) ≠ θ(x1,x2) + θ(x1,x3) + θ(x2,x3)
Reasons for higher-order MRFs:
1. Even better image (texture) models:
– Field of Experts [FoE, Roth et al. '05]
– Curvature [Woodford et al. '08]
2. Use global priors:
– Connectivity [Vicente et al. '08, Nowozin et al. '09]
– Encode better training statistics [Woodford et al. '09]
Reason 1: Better texture modelling
[Rother et al. CVPR '09]
(Figure: training images; test image; test image with 60% noise; pairwise 9-connected MRF result — higher-order structure not preserved; higher-order MRF result)
Reason 2: Use a global prior — the foreground object must be connected
(Figure: user input; standard MRF: removes noise (+) but shrinks the boundary (−); result with the connectivity prior)
E(x) = P(x) + h(x)   with   h(x) = { ∞ if the foreground of x is not 4-connected; 0 otherwise }
[Vicente et al. '08, Nowozin et al. '09]
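A sketch of the hard global term h(x), assuming the foreground is the set of pixels with x = 1: a flood fill from one foreground pixel decides whether the foreground forms a single 4-connected component.

```python
import numpy as np
from collections import deque

def connectivity_term(x):
    """h(x) = 0 if the foreground (x == 1) is one 4-connected component, infinity otherwise."""
    fg = np.argwhere(x == 1)
    if len(fg) == 0:
        return 0.0
    H, W = x.shape
    seen = np.zeros_like(x, dtype=bool)
    start = tuple(fg[0])
    seen[start] = True
    queue = deque([start])
    while queue:                               # BFS flood fill over the foreground
        i, j = queue.popleft()
        for a, b in ((i-1, j), (i+1, j), (i, j-1), (i, j+1)):
            if 0 <= a < H and 0 <= b < W and x[a, b] == 1 and not seen[a, b]:
                seen[a, b] = True
                queue.append((a, b))
    return 0.0 if seen.sum() == len(fg) else float('inf')
```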
Reason 2: Use a global prior — what is the prior of a MAP-MRF solution?
[Woodford et al. ICCV '09] (see poster on Friday)
Training image: 60% black, 40% white (8 pixels in this toy example).
MAP: prior(x) = 0.6^8 = 0.0168 (the all-black labelling)
Others are less likely, e.g. prior(x) = 0.6^5 · 0.4^3 ≈ 0.0050
The MRF is a bad prior, since the marginal statistics of the input are ignored!
Introduce a global term which controls the global statistics:
(Figure: noisy input; ground truth; pairwise MRF with increasing prior strength; global gradient prior)
• Introduce factor graph notation
• Categorization of models in Computer Vision:
– 4-connected MRFs
– Highly-connected MRFs
– Higher-order MRFs
Summary
…. all useful models, but how do I optimize them?
Course Program
9.30-10.00 Introduction (Andrew Blake)
10.00-11.00 Discrete Models in Computer Vision (Carsten Rother)
15min Coffee break
11.15-12.30 Message Passing: DP, TRW, LP relaxation (Pawan Kumar)
12.30-13.00 Quadratic pseudo-boolean optimization (Pushmeet Kohli)
1 hour Lunch break
14:00-15.00 Transformation and move-making methods (Pushmeet Kohli)
15:00-15.30 Speed and Efficiency (Pushmeet Kohli)
15min Coffee break
15:45-16.15 Comparison of Methods (Carsten Rother)
16:15-17.30 Recent Advances: Dual-decomposition, higher-order, etc. (Carsten Rother + Pawan Kumar)
All material will be available online (after the conference): http://research.microsoft.com/en-us/um/cambridge/projects/tutorial/
END
unused slides …
Markov Property
• Markov Property: each variable is only connected to a few others,
i.e. many pixels are conditionally independent.
• This makes inference easier (possible at all).
• But still… every pixel can influence every other pixel (knock-on effect).
Recap: Factor Graphs
• Factor graphs: a very good representation, since it directly reflects the given energy.
• MRF (Markov property) means many pixels are conditionally independent.
• Still … all pixels influence each other (knock-on effect).
Interactive Segmentation - Tutorial example
Goal
Given z and unknown (latent) variables x:
z Є (R,G,B)^n (observed image), x Є {0,1}^n (segmentation)
Posterior probability: P(x|z) = P(z|x) P(x) / P(z) ~ P(z|x) P(x)
Likelihood (data-dependent) x Prior (data-independent)
Maximum a posteriori (MAP): x* = argmax_x P(x|z)
Likelihood: P(x|z) ~ P(z|x) P(x)
(Figure: colour models log P(zi|xi=1) and log P(zi|xi=0), plotted over Red/Green)
Maximum likelihood: x* = argmax_x P(z|x) = argmax_x ∏i P(zi|xi)
Likelihood P(x|z) ~ P(z|x) P(x)
Prior: P(x|z) ~ P(z|x) P(x)
P(x) = 1/f ∏i,j Є N θij(xi,xj)
f = ∑x ∏i,j Є N θij(xi,xj)   "partition function"
θij(xi,xj) = exp{−|xi−xj|}   "Ising prior"   (exp{−1} = 0.36; exp{0} = 1)
Posterior "Gibbs" distribution:
P(x|z) ~ P(z|x) P(x)   (likelihood x prior)
P(x|z) = 1/f(z,w) exp{−E(x,z,w)},   f(z,w) = ∑x exp{−E(x,z,w)}
Energy:
E(x,z,w) = ∑i θi(xi,zi) + w ∑i,j θij(xi,xj)   (unary terms + pairwise terms)
Unary: θi(xi,zi) = −log P(zi|xi=1) xi − log P(zi|xi=0) (1−xi)
Pairwise: θij(xi,xj) = |xi−xj|
Note: the likelihood can be an arbitrary function of the data.
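For intuition, a brute-force sketch (illustrative only) of the partition function f: it enumerates all 2^n binary labellings, which is feasible only for tiny n and is exactly why approximate inference is needed.

```python
import itertools
import numpy as np

def partition_function(theta_unary, w, edges):
    """f = sum over all binary labellings x of exp(-E(x)), with
    E(x) = sum_i theta_unary[i][x_i] + w * sum_{(i,j) in edges} |x_i - x_j|.

    theta_unary : list of [cost_label0, cost_label1] per variable
    edges       : list of (i, j) index pairs
    """
    n = len(theta_unary)
    f = 0.0
    for x in itertools.product((0, 1), repeat=n):     # 2^n labellings: tiny n only!
        E = sum(theta_unary[i][x[i]] for i in range(n))
        E += w * sum(abs(x[i] - x[j]) for i, j in edges)
        f += np.exp(-E)
    return f
```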
Energy minimization
P(x|z) = 1/f(z,w) exp{−E(x,z,w)},   f(z,w) = ∑x exp{−E(x,z,w)}
−log P(x|z) = −log(1/f(z,w)) + E(x,z,w)
⇒ the MAP solution is the same as the minimum-energy solution:
x* = argmin_x E(x,z,w)
(Figure: ML result vs. MAP / global minimum of E)
Weight prior and likelihood
E(x,z,w) = ∑i θi(xi,zi) + w ∑i,j θij(xi,xj)
(Figure: segmentation results for w = 0, 10, 40, 200)
Moving away from a pure prior …
E(x,z,w) = ∑i θi(xi,zi) + w ∑i,j θij(xi,xj,zi,zj)
θij(xi,xj,zi,zj) = |xi−xj| exp{−ß||zi−zj||²}   with   ß = ( 2 Mean(||zi−zj||²) )⁻¹
(Figure: contrast cost vs. Ising cost as a function of ||zi−zj||²)
"Going from a Markov random field to a conditional random field"
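A sketch of this contrast term and the ß heuristic as written above (assumptions: z is a float image, grey-scale or RGB, and only horizontal/vertical 4-connected neighbours are used):

```python
import numpy as np

def contrast_weights(z):
    """Returns exp(-beta * ||z_i - z_j||^2) for vertical and horizontal 4-connected edges,
    with beta = (2 * mean(||z_i - z_j||^2))^-1 over all neighbouring pairs.
    The full pairwise term is |x_i - x_j| times this factor, so label changes are
    cheap at strong image edges and expensive in flat regions."""
    def sq_diff(a, b):
        d = (a - b) ** 2
        return d.sum(axis=-1) if d.ndim == 3 else d    # sum over colour channels if RGB
    dz_v = sq_diff(z[1:, :], z[:-1, :])
    dz_h = sq_diff(z[:, 1:], z[:, :-1])
    beta = 1.0 / (2.0 * np.mean(np.concatenate([dz_v.ravel(), dz_h.ravel()])))
    return np.exp(-beta * dz_v), np.exp(-beta * dz_h)
```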
Tree vs. loopy graphs
Chain / tree (with root xi) [Felzenszwalb, Huttenlocher '01]:
• MAP is tractable
• Marginals, e.g. P(foot), are tractable
Loopy graphs:
• MAP is (in general) NP-hard (see inference part)
• Marginals P(xi) are also NP-hard
Markov blanket of xi: all variables that are in the same factor as xi.
Stereo matching - prior
[Olga Veksler PhD thesis]
θij(di,dj) = g(|di−dj|)
(Figure: cost as a function of |di−dj| for the Potts model; left image and result: smooth disparities)
Modelling texture [Zalesny et al. '01]
(Figure: input; "unary only"; "8-connected MRF"; "13-connected MRF")
Reason 2: Discretization artefacts
[Boykov et al. '03, '05]
Larger connectivity can model the true Euclidean length (also any Riemannian metric, e.g. geodesic length, can be modelled).
Length of the paths (figure): Euclidean 5.08 / 6.75; 4-connected 5.65 / 8; 8-connected 6.28 / 6.28.
Edge weights: θij(xi,xj) = ∆a / (2·dist(xi,xj)) · |xi−xj|,   ∆a = π/4
(for 8-connectivity, dist(xi,xj) Є {1, √2})
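A small sketch of these edge weights for an 8-connected grid, assuming ∆a = π/4 and dist Є {1, √2} as in the figure:

```python
import math

def edge_weight(dist, delta_a=math.pi / 4):
    """Coefficient delta_a / (2 * dist(x_i, x_j)); the full pairwise term is this
    coefficient times |x_i - x_j|. With 8-connectivity, dist is 1 for axis-aligned
    neighbours and sqrt(2) for diagonal neighbours."""
    return delta_a / (2.0 * dist)

w_axis = edge_weight(1.0)            # horizontal / vertical edges
w_diag = edge_weight(math.sqrt(2))   # diagonal edges
```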
References: Higher-order Functions
In general θ(x1,x2,x3) ≠ θ(x1,x2) + θ(x1,x3) + θ(x2,x3)
Field of Experts model (2x2; 5x5): [Roth, Black CVPR '05], [Potetz, CVPR '07]
Minimize curvature (3x1): [Woodford et al. CVPR '08]
Large neighbourhood (10x10 → whole image):
[Rother, Kolmogorov, Minka & Blake, CVPR '06], [Vicente, Kolmogorov, Rother, CVPR '08],
[Komodakis, Paragios, CVPR '09], [Rother, Kohli, Feng, Jia, CVPR '09],
[Woodford, Rother, Kolmogorov, ICCV '09], [Vicente, Kolmogorov, Rother, ICCV '09],
[Ishikawa, CVPR '09], [Ishikawa, ICCV '09]
Conditional Random Field (CRF)
Definition CRF: all factors may depend on the data z.
This is no problem for inference (but it matters for parameter learning).
E(x) = ∑i θi(xi,zi) + ∑i,j Є N4 θij(xi,xj,zi,zj)
with θij(xi,xj,zi,zj) = |xi−xj| exp(−ß||zi−zj||²)
(Figure: contrast cost vs. Ising cost as a function of ||zi−zj||²; factor graph with variables xi, xj and observed zi, zj)
Recommended