GraphCut-based Optimisation for Computer Vision Ľubor Ladický


GraphCut-based Optimisation for Computer Vision

Ľubor Ladický

Overview

• Motivation

• Min-Cut / Max-Flow (Graph Cut) Algorithm

• Markov and Conditional Random Fields

• Random Field Optimisation using Graph Cuts

• Submodular vs. Non-Submodular Problems

• Pairwise vs. Higher Order Problems

• 2-Label vs. Multi-Label Problems

• Recent Advances in Random Field Optimisation

• Conclusions


Image Labelling Problems

[Figure: example labelling tasks – Image Denoising, Geometry Estimation, Object Segmentation (with labels such as Building, Sky, Tree, Grass) and Depth Estimation]

Assign a label to each image pixel

Image Labelling Problems

• Labellings highly structured

Possible labelling / Improbable labelling / Impossible labelling

Image Labelling Problems

• Labellings highly structured

• Labels highly correlated with very complex dependencies

• Neighbouring pixels tend to take the same label

• Low number of connected components

• Only a subset of the classes is typically present in one image

• Geometric / Location consistency

• Planarity in depth estimation

• … many others (task dependent)

Image Labelling Problems

• Labelling highly structured

• Labels highly correlated with very complex dependencies

• Independent label estimation too hard

Image Labelling Problems

• Labelling highly structured

• Labels highly correlated with very complex dependencies

• Independent label estimation too hard

• Whole labelling should be formulated as one optimisation problem

Image Labelling Problems

• Labelling highly structured

• Labels highly correlated with very complex dependencies

• Independent label estimation too hard

• Whole labelling should be formulated as one optimisation problem

• Number of pixels up to millions

• Hard to train complex dependencies

• Optimisation problem is hard to infer

Image Labelling Problems

• Labelling highly structured

• Labels highly correlated with very complex dependencies

• Independent label estimation too hard

• Whole labelling should be formulated as one optimisation problem

• Number of pixels up to millions

• Hard to train complex dependencies

• Optimisation problem is hard to infer

Vision is hard !

Image Labelling Problems

• You can

either

• Change the subject from Computer Vision to the History of Renaissance Art

Vision is hard !

Image Labelling Problems

• You can

either

• Change the subject from Computer Vision to the History of Renaissance Art

or

• Learn everything about Random Fields and Graph Cuts

Vision is hard !

Foreground / Background Estimation

Rother et al. SIGGRAPH04

Foreground / Background Estimation

Data term Smoothness term

Data term

Estimated using FG / BG colour models

Smoothness term

where

Intensity dependent smoothness
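The formulas on this slide are images in the original; schematically, a common form of this energy (in the style of Rother et al. SIGGRAPH04) is:

    E(x) = Σ_i ψ_i(xi) + Σ_(i,j) ψ_ij(xi, xj)

    ψ_ij(xi, xj) = [xi ≠ xj] ( λ1 + λ2 exp( −β ||Ii − Ij||² ) )

where ψ_i comes from the FG / BG colour models and the exponential term lowers the smoothness penalty across strong intensity edges.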

Foreground / Background Estimation

Data term Smoothness term

Foreground / Background Estimation

Data term Smoothness term

How to solve this optimisation problem?

Foreground / Background Estimation

Data term Smoothness term

How to solve this optimisation problem?

• Transform into min-cut / max-flow problem

• Solve it using min-cut / max-flow algorithm

Overview

• Motivation

• Min-Cut / Max-Flow (Graph Cut) Algorithm

• Markov and Conditional Random Fields

• Random Field Optimisation using Graph Cuts

• Submodular vs. Non-Submodular Problems

• Pairwise vs. Higher Order Problems

• 2-Label vs. Multi-Label Problems

• Applications

• Conclusions

Max-Flow Problem

[Figure: example flow network with source, sink and edge capacities]

Task :

Maximize the flow from the source to the sink such that

1) The flow is conserved at each node

2) The flow along each pipe does not exceed its capacity


Max-Flow Problem

[Figure: the network annotated with the LP quantities – flow from the source, capacities c_ij, flow f_ij from node i to node j, conservation-of-flow constraints, the set of edges E and the set of nodes V]
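Written out, the LP that these annotations describe (a standard formulation, not copied from the slide) is:

    max  Σ_(s,j)∈E f_sj                          (flow from the source)
    s.t. 0 ≤ f_ij ≤ c_ij       ∀ (i,j) ∈ E       (capacity)
         Σ_i f_iv = Σ_j f_vj   ∀ v ∈ V \ {s,t}   (conservation of flow)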

Max-Flow Problem

[Figure: example flow network]

Ford & Fulkerson algorithm (1956)

Find a path from source to sink
While (path exists)
    flow += minimum capacity along the path
    Build the residual graph (“subtract” the flow)
    Find a path in the residual graph
End

Max-Flow Problem

[Figure sequence: the algorithm in action – an augmenting path with bottleneck capacity 3 is found and subtracted, giving flow = 3; further augmenting paths raise the flow to 6, 11, 13 and finally 15, each time rebuilding the residual graph, until no source-to-sink path remains]

Max-Flow Problem

[Figure: final residual graph, flow = 15]

Ford & Fulkerson algorithm (1956)

Why is the solution globally optimal ?

Max-Flow Problem

[Figure: final residual graph, flow = 15, with the set S of reachable nodes highlighted]

Ford & Fulkerson algorithm (1956)

Why is the solution globally optimal ?

1. Let S be the set of reachable nodes in the residual graph

Max-Flow Problem

Ford & Fulkerson algorithm (1956)

Why is the solution globally optimal ?

1. Let S be the set of reachable nodes in the residual graph

2. The flow from S to V − S equals the sum of capacities from S to V − S

[Figure: the original network with the cut between S and V − S highlighted]

Max-Flow Problem

Ford & Fulkerson algorithm (1956)

Why is the solution globally optimal ?

1. Let S be the set of reachable nodes in the residual graph

2. The flow from S to V − S equals the sum of capacities from S to V − S

3. The flow from any A to V − A is upper-bounded by the sum of capacities from A to V − A

[Figure: the original network with an arbitrary cut between A and V − A]

Max-Flow Problem

flow = 15

Ford & Fulkerson algorithm (1956)

Why is the solution globally optimal ?

1. Let S be the set of reachable nodes in the residual graph

2. The flow from S to V − S equals the sum of capacities from S to V − S

3. The flow from any A to V − A is upper-bounded by the sum of capacities from A to V − A

4. The solution is globally optimal

[Figure: the final flow on every edge shown as flow/capacity – 8/9, 5/5, 5/6, 8/8, 2/4, 0/2, 2/2, 0/2, 5/5, 3/3, 5/5, 5/5, 3/3, 0/1, 0/1, 2/6, 0/2, 3/3]

Individual flows obtained by summing up all paths


Max-Flow Problem

Ford & Fulkerson algorithm (1956)

Order does matter

[Figure sequence: a network with two capacity-1000 paths from source to sink joined by a capacity-1 middle edge; if every augmenting path goes through the middle edge, each augmentation adds only 1 unit of flow (1000/1000 → 999/999 → …), so 2000 iterations are needed]

Max-Flow Problem

Ford & Fulkerson algorithm (1956)

Order does matter

• Standard algorithm not polynomial

• Breadth-first search leads to O(VE²) (Edmonds & Karp, Dinic)

    • Path found in O(E)

    • At least one edge gets saturated in each augmentation

    • The distance of the saturated edge from the source has to increase and is at most V, leading to O(VE) augmentations
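A compact Python sketch of the breadth-first (Edmonds & Karp) variant discussed above; a minimal illustration assuming the network is given as a nested dict of capacities, not the implementation behind the slides:

    from collections import deque

    def edmonds_karp(capacity, s, t):
        # capacity: dict of dicts, capacity[u][v] = capacity of edge (u, v)
        # Build residual capacities, adding reverse edges with capacity 0.
        residual = {u: dict(nbrs) for u, nbrs in capacity.items()}
        for u, nbrs in capacity.items():
            for v in nbrs:
                residual.setdefault(v, {}).setdefault(u, 0)
        flow = 0
        while True:
            # Breadth-first search for the shortest augmenting path.
            parent = {s: None}
            queue = deque([s])
            while queue and t not in parent:
                u = queue.popleft()
                for v, c in residual[u].items():
                    if c > 0 and v not in parent:
                        parent[v] = u
                        queue.append(v)
            if t not in parent:
                return flow          # no augmenting path left: done
            # Bottleneck = minimum residual capacity along the path.
            path, v = [], t
            while parent[v] is not None:
                path.append((parent[v], v))
                v = parent[v]
            bottleneck = min(residual[u][v] for u, v in path)
            # "Subtract" the flow to form the new residual graph.
            for u, v in path:
                residual[u][v] -= bottleneck
                residual[v][u] += bottleneck
            flow += bottleneck

The dict-of-dicts representation keeps the sketch short; a real graph-cut solver would use adjacency arrays for speed.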

Max-Flow Problem

Ford & Fulkerson algorithm (1956)

Order does matter

• Standard algorithm not polynomial

• Breadth-first search leads to O(VE²) (Edmonds & Karp)

• Various methods use different algorithms to find the path and vary in complexity

Min-Cut Problem

[Figure: the same network with source, sink and edge costs]

Task :

Minimize the cost of the cut

1) Each node is assigned either to the source set S or to the sink set T

2) The cost of edge (i, j) is paid if i ∈ S and j ∈ T


Min-Cut Problem

[Figure: the network with the source set, the sink set and the edge costs labelled]

Min-Cut Problem

[Figure sequence: three example cuts of the network into a source set S and a sink set T, with costs 18, 25 and 23]

Min-Cut Problem

[Figure: the network with binary variables x1 … x6 attached to the nodes]

Min-Cut Problem

[Figure: the network with variables x1 … x6]

transformable into a Linear Program (LP)

Dual to the max-flow problem

Min-Cut Problem

[Figure: the network with variables x1 … x6]

After the substitution of the constraints :

Min-Cut Problem

[Figure: the network with variables x1 … x6; source edges, sink edges and edges between variables]

xi = 0 ⇔ xi ∈ S,   xi = 1 ⇔ xi ∈ T
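Reading the cut cost off such a graph gives the general pattern that the next slide instantiates (stated here in the slide's notation):

    C(x) = Σ_i c_si xi + Σ_i c_it (1 − xi) + Σ_(i,j) c_ij xj (1 − xi)

A source edge is paid when xi = 1, a sink edge when xi = 0, and an edge (i, j) when i stays with the source and j with the sink.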

Min-Cut Problem

[Figure: the network with variables x1 … x6]

C(x) = 5x1 + 9x2 + 4x3 + 3x3(1-x1) + 2x1(1-x3)

+ 3x3(1-x1) + 2x2(1-x3) + 5x3(1-x2) + 2x4(1-x1)

+ 1x5(1-x1) + 6x5(1-x3) + 5x6(1-x3) + 1x3(1-x6)

+ 3x6(1-x2) + 2x4(1-x5) + 3x6(1-x5) + 6(1-x4)

+ 8(1-x5) + 5(1-x6)



Min-Cut Problem

C(x) = 2x1 + 9x2 + 4x3 + 2x1(1-x3)

+ 3x3(1-x1) + 2x2(1-x3) + 5x3(1-x2) + 2x4(1-x1)

+ 1x5(1-x1) + 3x5(1-x3) + 5x6(1-x3) + 1x3(1-x6)

+ 3x6(1-x2) + 2x4(1-x5) + 3x6(1-x5) + 6(1-x4)

+ 5(1-x5) + 5(1-x6)

+ 3x1 + 3x3(1-x1) + 3x5(1-x3) + 3(1-x5)

[Figure: the network with variables x1 … x6]

Min-Cut Problem

C(x) = 2x1 + 9x2 + 4x3 + 2x1(1-x3)

+ 3x3(1-x1) + 2x2(1-x3) + 5x3(1-x2) + 2x4(1-x1)

+ 1x5(1-x1) + 3x5(1-x3) + 5x6(1-x3) + 1x3(1-x6)

+ 3x6(1-x2) + 2x4(1-x5) + 3x6(1-x5) + 6(1-x4)

+ 5(1-x5) + 5(1-x6)

+ 3x1 + 3x3(1-x1) + 3x5(1-x3) + 3(1-x5)

3x1 + 3x3(1-x1) + 3x5(1-x3) + 3(1-x5)

=

3 + 3x1(1-x3) + 3x3(1-x5)

[Figure: the network with variables x1 … x6]

Min-Cut Problem

C(x) = 3 + 2x1 + 9x2 + 4x3 + 5x1(1-x3)

+ 3x3(1-x1) + 2x2(1-x3) + 5x3(1-x2) + 2x4(1-x1)

+ 1x5(1-x1) + 3x5(1-x3) + 5x6(1-x3) + 1x3(1-x6)

+ 3x6(1-x2) + 2x4(1-x5) + 3x6(1-x5) + 6(1-x4)

+ 5(1-x5) + 5(1-x6) + 3x3(1-x5)

[Figure: the reparameterised network with variables x1 … x6]


Min-Cut Problem

C(x) = 3 + 2x1 + 6x2 + 4x3 + 5x1(1-x3)

+ 3x3(1-x1) + 2x2(1-x3) + 5x3(1-x2) + 2x4(1-x1)

+ 1x5(1-x1) + 3x5(1-x3) + 5x6(1-x3) + 1x3(1-x6)

+ 2x5(1-x4) + 6(1-x4)

+ 2(1-x5) + 5(1-x6) + 3x3(1-x5)

+ 3x2 + 3x6(1-x2) + 3x5(1-x6) + 3(1-x5)

3x2 + 3x6(1-x2) + 3x5(1-x6) + 3(1-x5)

=

3 + 3x2(1-x6) + 3x6(1-x5)

[Figure: the reparameterised network with variables x1 … x6]

Min-Cut Problem

C(x) = 6 + 2x1 + 6x2 + 4x3 + 5x1(1-x3)

+ 3x3(1-x1) + 2x2(1-x3) + 5x3(1-x2) + 2x4(1-x1)

+ 1x5(1-x1) + 3x5(1-x3) + 5x6(1-x3) + 1x3(1-x6)

+ 2x5(1-x4) + 6(1-x4) + 3x2(1-x6) + 3x6(1-x5)

+ 2(1-x5) + 5(1-x6) + 3x3(1-x5)

[Figure: the reparameterised network with variables x1 … x6]


Min-Cut Problem

C(x) = 11 + 2x1 + 1x2 + 4x3 + 5x1(1-x3)

+ 3x3(1-x1) + 7x2(1-x3) + 2x4(1-x1)

+ 1x5(1-x1) + 3x5(1-x3) + 6x3(1-x6)

+ 2x5(1-x4) + 6(1-x4) + 3x2(1-x6) + 3x6(1-x5)

+ 2(1-x5) + 3x3(1-x5)

[Figure: the reparameterised network with variables x1 … x6]


Min-Cut Problem

C(x) = 13 + 1x2 + 4x3 + 5x1(1-x3)

+ 3x3(1-x1) + 7x2(1-x3) + 2x1(1-x4)

+ 1x5(1-x1) + 3x5(1-x3) + 6x3(1-x6)

+ 2x4(1-x5) + 6(1-x4) + 3x2(1-x6) + 3x6(1-x5)

+ 3x3(1-x5)

[Figure: the reparameterised network with variables x1 … x6]

Min-Cut Problem

C(x) = 13 + 1x2 + 2x3 + 5x1(1-x3)

+ 3x3(1-x1) + 7x2(1-x3) + 2x1(1-x4)

+ 1x5(1-x1) + 3x5(1-x3) + 6x3(1-x6)

+ 2x4(1-x5) + 6(1-x4) + 3x2(1-x6) + 3x6(1-x5)

+ 3x3(1-x5)

[Figure: the reparameterised network with variables x1 … x6]

Min-Cut Problem

C(x) = 15 + 1x2 + 4x3 + 5x1(1-x3)

+ 3x3(1-x1) + 7x2(1-x3) + 2x1(1-x4)

+ 1x5(1-x1) + 6x3(1-x6) + 6x3(1-x5)

+ 2x5(1-x4) + 4(1-x4) + 3x2(1-x6) + 3x6(1-x5)

[Figure: the reparameterised network with variables x1 … x6]


Min-Cut Problem

C(x) = 15 + 1x2 + 4x3 + 5x1(1-x3)

+ 3x3(1-x1) + 7x2(1-x3) + 2x1(1-x4)

+ 1x5(1-x1) + 6x3(1-x6) + 6x3(1-x5)

+ 2x5(1-x4) + 4(1-x4) + 3x2(1-x6) + 3x6(1-x5)

[Figure: final reparameterised network; the min cut has residual cost 0]

min cut

• All coefficients positive

• Must be global minimum

S – set of nodes reachable from s

Min-Cut Problem

C(x) = 15 + 1x2 + 4x3 + 5x1(1-x3)

+ 3x3(1-x1) + 7x2(1-x3) + 2x1(1-x4)

+ 1x5(1-x1) + 6x3(1-x6) + 6x3(1-x5)

+ 2x5(1-x4) + 4(1-x4) + 3x2(1-x6) + 3x6(1-x5)

[Figure: final reparameterised network; the min cut has residual cost 0]

min cut

• All coefficients positive

• Must be global minimum

T – set of nodes that can reach t

(not necessarily the same)

Overview

• Motivation

• Min-Cut / Max-Flow (Graph Cut) Algorithm

• Markov and Conditional Random Fields

• Random Field Optimisation using Graph Cuts

• Submodular vs. Non-Submodular Problems

• Pairwise vs. Higher Order Problems

• 2-Label vs. Multi-Label Problems

• Recent Advances in Random Field Optimisation

• Conclusions

Markov and Conditional RF

• Markov / Conditional Random fields model probabilistic dependencies of the set of random variables

Markov and Conditional RF

• Markov / Conditional Random fields model conditional dependencies between random variables

• Each variable is conditionally independent of all other variables given its neighbours

Markov and Conditional RF

• Markov / Conditional Random fields model conditional dependencies between random variables

• Each variable is conditionally independent of all other variables given its neighbours

• Posterior probability of the labelling x given data D is :

P(x | D) = 1/Z exp( − Σ_c∈C ψ_c(x_c) )        (Z – partition function, ψ_c – potential functions, C – cliques)

Markov and Conditional RF

• Markov / Conditional Random fields model conditional dependencies between random variables

• Each variable is conditionally independent of all other variables given its neighbours

• Posterior probability of the labelling x given data D is :

• Energy of the labelling is defined as :

P(x | D) = 1/Z exp( − E(x) ),   E(x) = Σ_c∈C ψ_c(x_c)        (Z – partition function, ψ_c – potential functions, C – cliques)

Markov and Conditional RF

• The most probable (Max a Posteriori (MAP)) labelling is defined as:

x* = argmax_x P(x | D) = argmin_x E(x)

Markov and Conditional RF

• The most probable (Max a Posteriori (MAP)) labelling is defined as:

x* = argmax_x P(x | D) = argmin_x E(x)

• The only distinction (MRF vs. CRF) is that in the CRF the conditional dependencies between variables depend also on the data

Markov and Conditional RF

• The most probable (Max a Posteriori (MAP)) labelling is defined as:

x* = argmax_x P(x | D) = argmin_x E(x)

• The only distinction (MRF vs. CRF) is that in the CRF the conditional dependencies between variables depend also on the data

• Typically we define an energy first and then pretend there is an underlying probabilistic distribution there, but there isn’t really (Pssssst, don’t tell anyone)

Overview

• Motivation

• Min-Cut / Max-Flow (Graph Cut) Algorithm

• Markov and Conditional Random Fields

• Random Field Optimisation using Graph Cuts

• Submodular vs. Non-Submodular Problems

• Pairwise vs. Higher Order Problems

• 2-Label vs. Multi-Label Problems

• Recent Advances in Random Field Optimisation

• Conclusions

Standard CRF Energy

Pairwise CRF models

Data term + Smoothness term :

E(x) = Σ_i ψ_i(xi) + Σ_(i,j) ψ_ij(xi, xj)

Graph Cut based Inference

The energy / potential is called submodular (only) if for every pair of variables :

ψ_ij(0,0) + ψ_ij(1,1) ≤ ψ_ij(0,1) + ψ_ij(1,0)
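A one-line check of this condition for a 2-label pairwise potential, as a small illustrative Python helper:

    def is_submodular(theta):
        # theta[a][b] = pairwise cost of (xi, xj) = (a, b)
        return theta[0][0] + theta[1][1] <= theta[0][1] + theta[1][0]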

Graph Cut based Inference

For 2-label problems xi ∈ {0, 1} :

Graph Cut based Inference

For 2-label problems xi ∈ {0, 1} :

Pairwise potential can be transformed into :

where

Graph Cut based Inference

For 2-label problems xi ∈ {0, 1} :

Pairwise potential can be transformed into :

where

Graph Cut based Inference

For 2-label problems xi ∈ {0, 1} :

Pairwise potential can be transformed into :

where
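The transformation itself is an image in the original; the standard decomposition (in the style of Kolmogorov & Zabih), writing A = ψ(0,0), B = ψ(0,1), C = ψ(1,0), D = ψ(1,1), is:

    ψ(xi, xj) = A + (D − B) xi + (B − A) xj + (B + C − A − D) xi (1 − xj)

Submodularity guarantees B + C − A − D ≥ 0, so the last term becomes a nonnegative edge capacity between i and j, while the linear terms go to the terminal edges (negative coefficients are shifted into the constant).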

Graph Cut based Inference

After summing up :

Graph Cut based Inference

After summing up :

Let :

Graph Cut based Inference

After summing up :

Let : Then :

Graph Cut based Inference

After summing up :

Let : Then :

Equivalent st-mincut problem is :

Foreground / Background Estimation

Data term Smoothness term

Data term

Estimated using FG / BG colour models

Smoothness term

where

Intensity dependent smoothness

Foreground / Background Estimation

[Figure: st-graph constructed over the image pixels, with source and sink terminals]

Rother et al. SIGGRAPH04

Extendable to (some) Multi-label CRFs

• each state of multi-label variable encoded using multiple binary variables

• the cost of every possible cut must be the same as the associated energy

• the solution obtained by inverting the encoding

Graph Cut based Inference

multi-label energy → Encoding → Build Graph → Graph Cut → Obtain solution

Dense Stereo Estimation

Left Camera Image Right Camera Image Dense Stereo Result

• Assigns a disparity label zi to each pixel

• Disparities from the discrete set {0, 1, …, D}

Dense Stereo Estimation

Data term

Left image feature

Shifted right image feature

Left Camera Image Right Camera Image

Dense Stereo Estimation

Data term

Left image feature

Shifted right image feature

Smoothness term
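The feature-matching formulas are images in the original; schematically, one common choice is:

    ψ_i(zi) = || f_L(i) − f_R(i − zi) ||      (left feature vs. right feature shifted by the disparity)
    ψ_ij(zi, zj) = min( |zi − zj|, T )        (e.g. truncated linear smoothness)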

Dense Stereo Estimation

Ishikawa PAMI03

Encoding

[Figure: each pixel i is encoded as a chain of binary variables xi0 … xiD between source and sink; the position where the chain is cut encodes the disparity]

Dense Stereo Estimation

Ishikawa PAMI03

Unary Potential

[Figure: unary costs placed on the chain edges; ∞ back-edges prevent cutting a chain more than once]

cost = sum of the unary costs of the cut chain edges

The cost of every cut should be equal to the corresponding energy under the encoding


Dense Stereo Estimation

Ishikawa PAMI03

Unary Potential

[Figure: a cut severing the same chain twice must pay an ∞ back-edge]

cost = ∞

The cost of every cut should be equal to the corresponding energy under the encoding

Dense Stereo Estimation

Ishikawa PAMI03

Pairwise Potential

[Figure: interlayer edges of capacity K between neighbouring chains]

cost = sum of the cut unary edges + K

The cost of every cut should be equal to the corresponding energy under the encoding

Dense Stereo Estimation

Ishikawa PAMI03

[Figure: a cut whose chain positions differ by D pays D interlayer edges]

cost = sum of the cut unary edges + D·K

The cost of every cut should be equal to the corresponding energy under the encoding

Dense Stereo Estimation

See [Ishikawa PAMI03] for more details

[Figure: the same construction with a convex pairwise cost function]

Extendable to any convex cost

Graph Cut based Inference

Move making algorithms

• Original problem decomposed into a series of subproblems solvable with graph cut

• In each subproblem we find the optimal move from the current solution in a restricted search space

Initial solution → Encoding → Propose move → Build Graph → Graph Cut → Update solution (repeat)

Graph Cut based Inference

Example : [Boykov et al. 01]

α-expansion
– each variable either keeps its old label or changes to α
– all α-expansion moves are iteratively performed till convergence

Transformation function :

α-swap
– each variable taking label α or β can change its label to α or β
– all α-β-swap moves are iteratively performed till convergence
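The outer loop of these move making algorithms, as a minimal Python sketch; solve_expansion_move is a hypothetical helper standing in for the st-mincut solve of one binary move energy:

    def alpha_expansion(num_labels, init_labelling, solve_expansion_move):
        # solve_expansion_move(labelling, alpha) is assumed to return
        # (new_labelling, decreased): the optimal alpha-expansion move
        # found by graph cut, and whether it lowered the energy.
        labelling = list(init_labelling)
        improved = True
        while improved:                       # iterate till convergence
            improved = False
            for alpha in range(num_labels):   # one expansion move per label
                labelling, decreased = solve_expansion_move(labelling, alpha)
                if decreased:
                    improved = True
        return labelling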

Graph Cut based Inference

Sufficient conditions for move making algorithms (all possible moves are submodular) :

α-swap : semi-metricity
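For reference (Boykov et al. 01), a pairwise potential is a semi-metric iff

    ψ(α, β) = ψ(β, α) ≥ 0   and   ψ(α, β) = 0 ⇔ α = β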

Proof:

Graph Cut based Inference

Sufficient conditions for move making algorithms (all possible moves are submodular) :

α-expansion : metricity
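A metric additionally satisfies the triangle inequality:

    ψ(α, β) ≤ ψ(α, γ) + ψ(γ, β)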

Proof:

Object-class Segmentation

Data term + Smoothness term

Data term : discriminatively trained classifier

Smoothness term : intensity dependent smoothness, as in the foreground / background model

Object-class Segmentation

[Figure sequence: original image; initial solution (all grass); building expansion; sky expansion; tree expansion; final solution with grass, building, sky, tree and aeroplane regions]

Graph Cut based Inference

Range-expansion [Kumar, Veksler & Torr 1]
– Expansion version of range-swap moves
– Each variable can change its label to any label in a convex range or keep its old label

Range-(swap) moves [Veksler 07]
– Each variable in the convex range can change its label to any other label in the convex range
– all range moves are iteratively performed till convergence

Dense Stereo Reconstruction

Data term

Smoothness term

Same as before

Left Camera Image Right Camera Image Dense Stereo Result


[Figure: truncated convex pairwise cost over a convex range]

Graph Cut based Inference

[Figure sequence: original image; initial solution; after 1st, 2nd and 3rd expansion; final solution]

Graph Cut based Inference

• Extendable to certain classes of binary higher order energies

• Higher order terms have to be transformable to the pairwise submodular energy with binary auxiliary variables z

[Figure: higher order term over clique C transformed into a pairwise term with auxiliary variable z]

Initial solution → Encoding → Propose move → Transformation to pairwise submodular → Graph Cut → Update solution (repeat)

Graph Cut based Inference

• Extendable to certain classes of binary higher order energies

• Higher order terms have to be transformable to the pairwise submodular energy with binary auxiliary variables z

[Figure: higher order term over clique C transformed into a pairwise term with auxiliary variable z]

Example :

Kolmogorov ECCV06, Ramalingam et al. DAM12
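An illustrative instance (not necessarily the example on the slide): a truncated cardinality cost can be written with a single binary auxiliary variable z, using min(a, b) = min_z [ z·b + (1 − z)·a ]:

    min( Σ_i xi, K ) = min_z∈{0,1} [ z K + (1 − z) Σ_i xi ]

Each product z·xi appears with coefficient −1, and such (z, xi) pairs satisfy the submodularity condition, so the right-hand side is pairwise submodular and graph-representable.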

Graph Cut based Inference

Enforces label consistency of the clique as a weak constraint

Input image Pairwise CRF Higher order CRF

Kohli et al. CVPR07

Graph Cut based Inference

Initial solution → Encoding → Propose move → Transformation to pairwise submodular → Graph Cut → Update solution (repeat)

If the energy is not submodular / graph-representable

– Overestimation by a submodular energy

– Quadratic Pseudo-Boolean Optimisation (QPBO)

Graph Cut based Inference

Initial solution → Encoding → Propose move → Over-estimation by pairwise submodular → Graph Cut → Update solution (repeat)

Overestimation by a submodular energy (convex / concave)

– We find an energy E’(t) s.t.

– it is tight in the current solution : E’(t0) = E(t0)

– it overestimates E(t) for all t : E’(t) ≥ E(t)

– We replace E(t) by E’(t)

– The moves are not optimal, but guaranteed to converge

– The tighter the over-estimation, the better

Example :

Graph Cut based Inference

Quadratic Pseudo-Boolean Optimisation
– Each binary variable xi is encoded using two binary variables xi and x̄i s.t. x̄i = 1 − xi

Graph Cut based Inference

Quadratic Pseudo-Boolean Optimisation

[Figure: QPBO graph over the source, the sink and the variables xi, xj, x̄i, x̄j]

– Energy submodular in xi and x̄i

– Solved by dropping the constraint x̄i = 1 − xi

– If x̄i = 1 − xi in the solution, the value of xi is optimal

Overview

• Motivation

• Min-Cut / Max-Flow (Graph Cut) Algorithm

• Markov and Conditional Random Fields

• Random Field Optimisation using Graph Cuts

• Submodular vs. Non-Submodular Problems

• Pairwise vs. Higher Order Problems

• 2-Label vs. Multi-Label Problems

• Recent Advances in Random Field Optimisation

• Conclusions

Motivation

• To propose formulations enforcing certain structural properties of the output useful in computer vision

– Label Consistency in segments as a weak constraint

– Label agreement over different scales

– Label-set consistency

– Multiple domains consistency

• To propose feasible Graph-Cut based inference method

Structures in CRF

• Taskar et al. 02 – associative potentials

• Woodford et al. 08 – planarity constraint

• Vicente et al. 08 – connectivity constraint

• Nowozin & Lampert 09 – connectivity constraint

• Roth & Black 09 – field of experts

• Woodford et al. 09 – marginal probability

• Delong et al. 10 – label occurrence costs

Object-class Segmentation Problem

• Aims to assign a class label for each pixel of an image

• Classifier trained on the training set

• Evaluated on previously unseen test images

Pairwise CRF models

Standard CRF Energy for Object Segmentation

Cannot encode global consistency of labels!!

Local context

Ladický, Russell, Kohli, Torr ECCV10

• Thing – Thing • Stuff - Stuff • Stuff - Thing

[ Images from Rabinovich et al. 07 ]

Encoding Co-occurrence

Co-occurrence is a powerful cue [Heitz et al. '08][Rabinovich et al. ‘07]

Ladický, Russell, Kohli, Torr ECCV10

• Thing – Thing • Stuff - Stuff • Stuff - Thing

Encoding Co-occurrence

Co-occurrence is a powerful cue [Heitz et al. '08][Rabinovich et al. ‘07]

Proposed solutions :

1. Csurka et al. 08 – Hard decision for label estimation
2. Torralba et al. 03 – GIST based unary potential
3. Rabinovich et al. 07 – Fully-connected CRF

[ Images from Rabinovich et al. 07 ] Ladický, Russell, Kohli, Torr ECCV10

So...

What properties should these global co-occurrence potentials have ?

Ladický, Russell, Kohli, Torr ECCV10

Desired properties

1. No hard decisions

Ladický, Russell, Kohli, Torr ECCV10

Desired properties

1. No hard decisions

Incorporation in probabilistic framework

Unlikely possibilities are not completely ruled out

Ladický, Russell, Kohli, Torr ECCV10

Desired properties

1. No hard decisions2. Invariance to region size

Ladický, Russell, Kohli, Torr ECCV10

Desired properties

1. No hard decisions2. Invariance to region size

Cost for occurrence of {people, house, road etc .. } invariant to image area

Ladický, Russell, Kohli, Torr ECCV10

Desired properties

1. No hard decisions2. Invariance to region size

The only possible solution :

Local context + Global context : cost defined over the set of assigned labels L(x)

[Figure: L(x) = the set of labels present in the image]

Ladický, Russell, Kohli, Torr ECCV10

Desired properties

1. No hard decisions2. Invariance to region size3. Parsimony – simple solutions preferred

L(x)={ building, tree, grass, sky }

L(x)={ aeroplane, tree, flower, building, boat, grass, sky }

Ladický, Russell, Kohli, Torr ECCV10

Desired properties

1. No hard decisions2. Invariance to region size3. Parsimony – simple solutions preferred4. Efficiency

Ladický, Russell, Kohli, Torr ECCV10

Desired properties

1. No hard decisions2. Invariance to region size3. Parsimony – simple solutions preferred4. Efficiency

a) Memory requirements grow as O(n) with the image size and number of labels

b) Inference tractable

Ladický, Russell, Kohli, Torr ECCV10

• Torralba et al.(2003) – Gist-based unary potentials

• Rabinovich et al.(2007) - complete pairwise graphs

• Csurka et al.(2008) - hard estimation of labels present

Previous work

Ladický, Russell, Kohli, Torr ECCV10

• Zhu & Yuille 1996 – MDL prior
• Bleyer et al. 2010 – Surface Stereo MDL prior
• Hoiem et al. 2007 – 3D Layout CRF MDL prior
• Delong et al. 2010 – label occurrence cost

Related work

C(x) = K |L(x)|

C(x) = Σ_L K_L δ_L(x)

Ladický, Russell, Kohli, Torr ECCV10

• Zhu & Yuille 1996 – MDL prior
• Bleyer et al. 2010 – Surface Stereo MDL prior
• Hoiem et al. 2007 – 3D Layout CRF MDL prior
• Delong et al. 2010 – label occurrence cost

Related work

C(x) = K |L(x)|

C(x) = Σ_L K_L δ_L(x)

All special cases of our model

Ladický, Russell, Kohli, Torr ECCV10

Inference for Co-occurrence

Label indicator functions

Co-occurrence representation

Ladický, Russell, Kohli, Torr ECCV10

Inference for Co-occurrence

Move Energy

Cost of current label set

Ladický, Russell, Kohli, Torr ECCV10

Inference for Co-occurrence

Move Energy

Decomposition to α-dependent and α-independent part

α-independent α-dependent

Ladický, Russell, Kohli, Torr ECCV10

Inference for Co-occurrence

Move Energy

Decomposition to α-dependent and α-independent part

Either α or all labels in the image after the move

Ladický, Russell, Kohli, Torr ECCV10

Inference for Co-occurrence

Move Energy

submodular non-submodular

Ladický, Russell, Kohli, Torr ECCV10

Inference for Co-occurrence

Move Energy

non-submodular

Non-submodular energy overestimated by E'(t)
• E'(t) = E(t) for the current solution
• E'(t) ≥ E(t) for any other labelling

Ladický, Russell, Kohli, Torr ECCV10

Inference for Co-occurrence

Move Energy

non-submodular

Non-submodular energy overestimated by E'(t)
• E'(t) = E(t) for the current solution
• E'(t) ≥ E(t) for any other labelling

Occurrence - tight

Ladický, Russell, Kohli, Torr ECCV10

Inference for Co-occurrence

Move Energy

non-submodular

Non-submodular energy overestimated by E'(t)
• E'(t) = E(t) for the current solution
• E'(t) ≥ E(t) for any other labelling

Co-occurrence overestimation

Ladický, Russell, Kohli, Torr ECCV10

Inference for Co-occurrence

Move Energy

non-submodular

Non-submodular energy overestimated by E'(t)
• E'(t) = E(t) for the current solution
• E'(t) ≥ E(t) for any other labelling

General case [See the paper]

Ladický, Russell, Kohli, Torr ECCV10

Inference for Co-occurrence

Move Energy

non-submodular

Non-submodular energy overestimated by E'(t)
• E'(t) = E(t) for the current solution
• E'(t) ≥ E(t) for any other labelling

Quadratic representation

Ladický, Russell, Kohli, Torr ECCV10

Potential Training for Co-occurrence

Ladický, Russell, Kohli, Torr ICCV09

Generatively trained co-occurrence potential

Approximated by a 2nd order representation

Results on MSRC data set

Comparison of results with and without co-occurrence

[Figure: Input Image | Without Co-occurrence | With Co-occurrence]

Ladický, Russell, Kohli, Torr ICCV09

Pixel CRF Energy for Object-class Segmentation

Pairwise CRF models

Data term Smoothness term

Input Image Pairwise CRF Result

Shotton et al. ECCV06

Pixel CRF Energy for Object-class Segmentation

Pairwise CRF models

Data term Smoothness term

• Lacks long range interactions

• Results oversmoothed

• Data term information may be insufficient

Segment CRF Energy for Object-class Segmentation

Pairwise CRF models

Data term Smoothness term

Input Image → Unsupervised Segmentation

Shi, Malik PAMI2000; Comaniciu, Meer PAMI2002; Felzenszwalb, Huttenlocher IJCV2004

Segment CRF Energy for Object-class Segmentation

Pairwise CRF models

Data term Smoothness term

Input Image → Pairwise CRF Result

Batra et al. CVPR08, Yang et al. CVPR07, Zitnick et al. CVPR08, Rabinovich et al. ICCV07, Boix et al. CVPR10

Segment CRF Energy for Object-class Segmentation

Pairwise CRF models

Data term Smoothness term

• Allows long range interactions

• Does not distinguish between instances of objects

• Cannot recover from incorrect segmentation


Segment CRF Energy for Object-class Segmentation

Robust P^N approach

Pairwise CRF + Segment consistency term

[Figure: Input Image and Higher order CRF result]

Kohli, Ladický, Torr CVPR08

Segment CRF Energy for Object-class Segmentation

Robust P^N approach

Pairwise CRF + Segment consistency term

Kohli, Ladický, Torr CVPR08

Segment CRF Energy for Object-class Segmentation

Robust P^N approach

Pairwise CRF + Segment consistency term

Kohli, Ladický, Torr CVPR08

No dominant label

Segment CRF Energy for Object-class Segmentation

Robust P^N approach

Pairwise CRF + Segment consistency term

Kohli, Ladický, Torr CVPR08

Dominant label l

Segment CRF Energy for Object-class Segmentation

Robust P^N approach

Pairwise CRF + Segment consistency term

Kohli, Ladický, Torr CVPR08

• Enforces consistency as a weak constraint

• Can recover from incorrect segmentation

• Can combine multiple segmentations

• Does not use features at segment level

Segment CRF Energy for Object-class Segmentation

Robust P^N approach

Pairwise CRF + Segment consistency term

Kohli, Ladický, Torr CVPR08

Transformable to pairwise graph with auxiliary variable

Segment CRF Energy for Object-class Segmentation

Robust P^N approach

Pairwise CRF + Segment consistency term

Ladický, Russell, Kohli, Torr ICCV09

where
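Schematically (following Kohli, Ladický, Torr CVPR08; the exact formula is an image in the original), the robust P^N potential charges linearly for the pixels disagreeing with a dominant label l and truncates at a maximum cost:

    ψ_c(x_c) = min( min_l ( γ_l + k_l N_l(x_c) ), γ_max )

with N_l(x_c) the number of pixels in clique c not taking label l.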

Segment CRF Energy for Object-class Segmentation

Robust P^N approach

Pairwise CRF + Segment consistency term

[Figure: Input Image and Higher order CRF result]

Ladický, Russell, Kohli, Torr ICCV09

Associative Hierarchical CRFs

Ladický, Russell, Kohli, Torr ICCV09

Can be generalized to Hierarchical model

• Allows unary potentials for region variables

• Allows pairwise potentials for region variables

• Allows multiple layers and multiple hierarchies

Associative Hierarchical CRFs

Ladický, Russell, Kohli, Torr ICCV09

AH CRF Energy for Object-class Segmentation

Pairwise CRF Higher order term

Associative Hierarchical CRFs

Ladický, Russell, Kohli, Torr ICCV09

AH CRF Energy for Object-class Segmentation

Pairwise CRF Higher order term

Higher order term recursively defined as

Segment unary term

Segment pairwise term

Segment higher order term

Associative Hierarchical CRFs

Ladický, Russell, Kohli, Torr ICCV09

AH CRF Energy for Object-class Segmentation

Pairwise CRF Higher order term

Higher order term recursively defined as

Segment unary term

Segment pairwise term

Segment higher order term

Why is this generalisation useful?

Associative Hierarchical CRFs

Ladický, Russell, Kohli, Torr ICCV09

Let us analyze the case with

• one segmentation

• potentials only over segment level

Associative Hierarchical CRFs

Ladický, Russell, Kohli, Torr ICCV09

Let us analyze the case with

• one segmentation

• potentials only over segment level

1) Minimum is segment-consistent

Associative Hierarchical CRFs

Ladický, Russell, Kohli, Torr ICCV09

Let us analyze the case with

• one segmentation

• potentials only over segment level

1) Minimum is segment-consistent

2) Cost of every segment-consistent labelling is equal to the cost of the pairwise CRF over segments

Associative Hierarchical CRFs

Ladický, Russell, Kohli, Torr ICCV09

Let us analyze the case with

• one segmentation

• potentials only over segment level

Equivalent to pairwise CRF over segments!

Associative Hierarchical CRFs

Ladický, Russell, Kohli, Torr ICCV09

• Merges information over multiple scales

• Easy to train potentials and learn parameters

• Allows multiple segmentations and hierarchies

• Allows long range interactions

• Limited (?) to associative interlayer connections

Inference for AH-CRF

Graph Cut based move making algorithms [Boykov et al. 01]

α-expansion transformation function for the base layer

Ladický (thesis) 2011

Inference for AH-CRF

Graph Cut based move making algorithms [Boykov et al. 01]

α-expansion transformation function for the base layer

Ladický (thesis) 2011

α-expansion transformation function for auxiliary layer

(2 binary variables per node)

Inference for AH-CRF

Ladický (thesis) 2011

Interlayer connection between the base layer and the first auxiliary layer:

Inference for AH-CRF

Ladický (thesis) 2011

Interlayer connection between the base layer and the first auxiliary layer:

The move energy is:

where

Inference for AH-CRF

Ladický (thesis) 2011

The move energy for interlayer connection between the base and auxiliary layer is:

Can be transformed to binary submodular pairwise function:

Inference for AH-CRF

Ladický (thesis) 2011

The move energy for interlayer connection between auxiliary layers is:

Can be transformed to binary submodular pairwise function:

Inference for AH-CRF

Ladický (thesis) 2011

Graph constructions

Inference for AH-CRF

Ladický (thesis) 2011

Pairwise potential between auxiliary variables :

Inference for AH-CRF

Ladický (thesis) 2011

Move energy if xc = xd :

Can be transformed to binary submodular pairwise function:

Inference for AH-CRF

Ladický (thesis) 2011

Move energy if xc ≠ xd :

Can be transformed to binary submodular pairwise function:

Inference for AH-CRF

Ladický (thesis) 2011

Graph constructions

Potential training for Object class Segmentation

Pixel unary potential

Unary likelihoods based on spatial configuration (Shotton et al. ECCV06)

Classifier trained using boosting

Ladický, Russell, Kohli, Torr ICCV09

Potential training for Object class Segmentation

Segment unary potential

Ladický, Russell, Kohli, Torr ICCV09

Classifier trained using boosting (resp. Kernel SVMs)

Results on MSRC dataset

Results on Corel dataset

Results on VOC 2010 dataset

Input Image Result Input Image Result

VOC 2010-Qualitative

Input Image Result Input Image Result

CamVid-Qualitative

Our Result

Input Image

GroundTruth

Brostow et al.

Quantitative comparison

MSRC dataset

Corel dataset

VOC2009 dataset

Quantitative comparison

Ladický, Russell, Kohli, Torr ICCV09

MSRC dataset

VOC2009 dataset

Dense Stereo Reconstruction

Unary Potential

Unary cost dependent on the similarity of patches, e.g. cross correlation

Disparity = 0

Dense Stereo Reconstruction

Unary cost dependent on the similarity of patches, e.g. cross correlation

Unary Potential

Disparity = 5

Dense Stereo Reconstruction

Unary cost dependent on the similarity of patches, e.g. cross correlation

Unary Potential

Disparity = 10

Dense Stereo Reconstruction

• Encourages label consistency in adjacent pixels

• Cost based on the distance of labels

Linear Truncated Quadratic Truncated

Pairwise Potential

Dense Stereo Reconstruction

Does not work for Road Scenes !

Original Image Dense Stereo Reconstruction

Dense Stereo Reconstruction

Does not work for Road Scenes !

Patches can be matched to any other patch on flat surfaces

Different brightness in cameras

Dense Stereo Reconstruction

Does not work for Road Scenes !

Patches can be matched to any other patch on flat surfaces

Different brightness in cameras

Could object recognition for road scenes help?

Recognition of road scenes is relatively easy

Joint Dense Stereo Reconstructionand Object class Segmentation

sky

• Object class and 3D location are mutually informative

• Sky always in infinity (disparity = 0)

Joint Dense Stereo Reconstructionand Object class Segmentation

building, car, sky

• Object class and 3D location are mutually informative

• Sky always in infinity (disparity = 0)

• Cars, buses & pedestrians have their typical height

Joint Dense Stereo Reconstructionand Object class Segmentation

building, car, sky

• Object class and 3D location are mutually informative

• Sky always in infinity (disparity = 0)

• Cars, buses & pedestrians have their typical height

• Road and pavement on the ground plane

road

Joint Dense Stereo Reconstructionand Object class Segmentation

building, car, sky

• Object class and 3D location are mutually informative

• Sky always in infinity (disparity = 0)

• Cars, buses & pedestrians have their typical height

• Road and pavement on the ground plane

• Buildings and pavement on the sides

road

Joint Dense Stereo Reconstructionand Object class Segmentation

building, car, sky

• Object class and 3D location are mutually informative

• Sky always in infinity (disparity = 0)

• Cars, buses & pedestrians have their typical height

• Road and pavement on the ground plane

• Buildings and pavement on the sides

• Both problems formulated as CRF

• Joint approach possible?

Joint Formulation

• Each pixel takes a label zi = [ xi yi ] ∈ L1 × L2

• Dependency of xi and yi encoded as a unary and pairwise potential, e.g.

• strong correlation between x = road, y = near ground plane

• strong correlation between x = sky, y = 0

• Correlation of edge in object class and disparity domain

Joint Formulation

• Weighted sum of object class, depth and joint potential

• Joint unary potential based on histograms of height

Object layer

Disparity layer

Joint unary links

Unary Potential
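One plausible reading of this construction (the formula itself is an image in the original):

    ψ_i(zi) = w_o ψ_i^obj(xi) + w_d ψ_i^disp(yi) + w_j ψ_i^joint(xi, yi)

where the superscripted terms and the weights w are notation introduced here, and ψ_i^joint is estimated from histograms of object heights.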

Joint Formulation

Pairwise Potential

• Object class and depth edges correlated

• Transitions in depth occur often at the object boundaries

Object layer

Disparity layer

Joint pairwise links

Joint Formulation

Inference

• Standard α-expansion

• Each node in each expansion move keeps its old label or takes a new label [x ∈ L1, y ∈ L2]

• Possible in case of metric pairwise potentials

Inference

• Standard α-expansion

• Each node in each expansion move keeps its old label or takes a new label [x ∈ L1, y ∈ L2]

• Possible in case of metric pairwise potentials

Too many moves ( |L1| · |L2| )! Impractical!

Inference

• Projected move for product label space

• One / some of the label components remain(s) constant after the move

• Set of projected moves :

• α-expansion in the object class projection

• Range-expansion in the depth projection

Dataset

• Leuven Road Scene dataset

• Contained 3 sequences, 643 pairs of images

• We labelled 50 training + 20 test images, with object class (7 labels) and disparity (100 labels)

• Publicly available: http://cms.brookes.ac.uk/research/visiongroup/files/Leuven.zip

Left camera Right camera Object GT Disparity GT

Qualitative Results

• Large improvement for dense stereo estimation

• Minor improvement in object class segmentation

Quantitative Results

Ratio of correctly labelled pixels as a function of the maximum allowed error delta

• We proposed :

• New models enforcing higher order structure

• Label-consistency in segments
• Consistency between scales
• Label-set consistency
• Consistency between different domains

• Graph-Cut based inference methods

• Source code: http://cms.brookes.ac.uk/staff/PhilipTorr/ale.htm

There is more to come !

Summary

Questions?

Thank you
