Pushmeet Kohli Microsoft Research - DAMTP...


Pushmeet Kohli Microsoft Research

Infer the values of hidden random variables

Geometry Estimation Object Segmentation

Building

Sky

Tree Grass

Pose Estimation

Orientation Joint angles

Medical Imaging

Image Search

Human Computer Interaction

Minority Report (2002)

Image and Video Editing

Autocollage

Robotics

DARPA Urban Challenge

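Posterior Probability

$P(x \mid z) \propto \prod_{i} P(z \mid x_i) \prod_{i,j} P(x_i, x_j)$

Likelihood Term: $\prod_{i} P(z \mid x_i)$. Pairwise Consistency Prior: $\prod_{i,j} P(x_i, x_j)$

$x_i \in$ Labels $L = \{l_1, l_2, \ldots, l_k\}$

Object Segmentation (labels: Building, Sky, Tree, Grass)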
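The energies on the following slides relate to this posterior through the usual negative-log relationship. A sketch in the slide's notation, where $Z$ is a normalising constant that the slides do not name (an assumption added here):

```latex
-\log P(x \mid z) = \sum_{i} -\log P(z \mid x_i) \;+\; \sum_{i,j} -\log P(x_i, x_j) \;+\; \log Z
```

Since $\log Z$ does not depend on $x$, maximising the posterior is the same as minimising the sum of the first two terms.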


Depth Estimation (figure panels: Left image, Right image, Disparity map)

$E: \{0,1\}^n \rightarrow \mathbb{R}$, with $0 \rightarrow$ fg, $1 \rightarrow$ bg and $n$ = number of pixels

[Boykov and Jolly '01] [Blake et al. '04] [Rother, Kolmogorov and Blake '04]

Image (D)

Unary Cost ($c_i$): Dark (negative), Bright (positive)

Discontinuity Cost ($d_{ij}$)

$E(x) = \sum_i f_i(x_i) + \sum_{i,j} g_{ij}(x_i, x_j)$

Here $\sum_i f_i(x_i)$ is the Pixel Colour term and $\sum_{i,j} g_{ij}(x_i, x_j)$ is the Smoothness Prior. With $g_{ij}(x_i, x_j) = d_{ij} |x_i - x_j|$:

$E(x) = \sum_i f_i(x_i) + \sum_{i,j} d_{ij} |x_i - x_j|$

Global Minimum ($x^*$): $x^* = \arg\min_x E(x)$

How to minimize E(x)?
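To make the segmentation energy concrete, here is a minimal sketch that evaluates $E(x) = \sum_i f_i(x_i) + \sum_{i,j} d_{ij}|x_i - x_j|$ and finds its minimum by exhaustive search. The 4-pixel chain, the cost values and the function names are hypothetical, chosen only for illustration; exhaustive search is feasible only for tiny $n$.

```python
from itertools import product

def energy(x, unary, edges):
    """E(x) = sum_i f_i(x_i) + sum_ij d_ij * |x_i - x_j| for a binary labelling x."""
    data_term = sum(unary[i][xi] for i, xi in enumerate(x))
    smooth_term = sum(d * abs(x[i] - x[j]) for (i, j), d in edges.items())
    return data_term + smooth_term

def brute_force_minimum(unary, edges):
    """Enumerate all 2^n labellings and return one with the lowest energy."""
    n = len(unary)
    return min(product((0, 1), repeat=n), key=lambda x: energy(x, unary, edges))

# Hypothetical data: unary[i] = (cost of label 0 = fg, cost of label 1 = bg).
unary = [(0.0, 2.0), (0.5, 1.0), (1.5, 0.2), (2.0, 0.0)]
# Hypothetical discontinuity costs d_ij on a 4-pixel chain.
edges = {(0, 1): 1.0, (1, 2): 1.0, (2, 3): 1.0}

x_star = brute_force_minimum(unary, edges)
print(x_star, energy(x_star, unary, edges))
```

The search visits $2^n$ labellings, which is why the slides go on to ask how $E(x)$ can be minimized efficiently.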

Space of Problems (n = Number of Variables)

CSP, MAXCUT: NP-Hard

Submodular Functions: $O(n^6)$

Pair-wise: $O(n^3)$

Tree Structured

Segmentation Energy
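Whether a pairwise binary energy falls in the tractable (submodular) region can be checked term by term. The sketch below uses the standard condition $g(0,0) + g(1,1) \le g(0,1) + g(1,0)$; the helper names are hypothetical.

```python
def is_submodular_pairwise(g):
    """Standard submodularity test for a function of two binary variables."""
    return g(0, 0) + g(1, 1) <= g(0, 1) + g(1, 0)

def smoothness(d):
    """Pairwise term d * |x_i - x_j| as used in the segmentation energy."""
    return lambda xi, xj: d * abs(xi - xj)

print(is_submodular_pairwise(smoothness(2.0)))   # non-negative discontinuity cost
print(is_submodular_pairwise(smoothness(-1.0)))  # negative cost rewards disagreement
```

With $d_{ij} \ge 0$ every smoothness term passes the test, which is consistent with the segmentation energy sitting in the pairwise submodular class.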

So what are the challenges?

Pairwise Models are Weak

Image Segmentation: penalizes all discontinuities

Semantic Segmentation: does not enforce "group" consistency

Texture Denoising: does not enforce consistency of local structure

Higher Order Energy [Kohli, Ladicky and Torr, CVPR 2008]

Unary Potentials [Shotton et al. ECCV 2006] (Colour, Location & Texture) + Pairwise Smoothness Potentials + Higher Order Potentials (Defined using multiple Segmentations)

Labels: Sky, Building, Tree, Grass

Pixels belonging to a group $p$ should take the same label:

$h(X_p) = \begin{cases} 0 & \text{if } x_i = l \text{ for all } i \in p \\ C & \text{otherwise} \end{cases}$

Energy Minimization gives the Segmentation Solution.
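The group-consistency potential on these slides can be sketched as a small function: zero cost when every pixel in a group carries the same label, a constant cost C otherwise. The function name and the example labels are hypothetical.

```python
def group_consistency_cost(labels_in_group, C):
    """h(X_p): 0 if all pixels in the group share one label, C otherwise."""
    return 0.0 if len(set(labels_in_group)) <= 1 else C

# Hypothetical segment of five pixels.
print(group_consistency_cost(["Sky", "Sky", "Sky", "Sky", "Sky"], C=10.0))
print(group_consistency_cost(["Sky", "Sky", "Tree", "Sky", "Sky"], C=10.0))
```

Note that a single disagreeing pixel already incurs the full cost C under this definition.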

Figure panels: Image, Clustering 1, Clustering 2, Unary Classifiers, Pairwise CRF, $P^n$ Potts, Robust $P^n$ Potts

Figure panels: Image (MSRC-21), Pairwise CRF, Higher order CRF, Ground Truth (labels: Sheep, Grass)

[Runner-Up, PASCAL VOC 2008]

Figure panels: Training Image, Test Image, Test Image (60% Noise), Result

Pairwise Energy $P(x)$: minimized using st-mincut or max-product message passing

Higher Order Structure not Preserved
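As an illustration of the st-mincut route, the sketch below minimizes a binary energy of the form $\sum_i f_i(x_i) + \sum_{i,j} d_{ij}|x_i - x_j|$ with $d_{ij} \ge 0$ by building a flow graph and running a plain Edmonds-Karp max-flow. The graph construction is the standard one; the function names and the toy data are hypothetical, and this is not the solver used for the results on the slides.

```python
from collections import deque
from itertools import product

def energy(x, unary, edges):
    """sum_i f_i(x_i) + sum_ij d_ij |x_i - x_j| for a binary labelling x."""
    return (sum(unary[i][xi] for i, xi in enumerate(x))
            + sum(d * abs(x[i] - x[j]) for (i, j), d in edges.items()))

def min_cut_labels(unary, edges):
    """Minimize the energy exactly via s-t min-cut (requires every d_ij >= 0)."""
    n = len(unary)
    s, t = n, n + 1
    cap = [dict() for _ in range(n + 2)]

    def add_edge(u, v, c):
        cap[u][v] = cap[u].get(v, 0.0) + c
        cap[v].setdefault(u, 0.0)          # make sure the residual arc exists

    for i, (f0, f1) in enumerate(unary):
        m = min(f0, f1)                    # constant shift, argmin unchanged
        add_edge(s, i, f1 - m)             # cut when pixel i takes label 1
        add_edge(i, t, f0 - m)             # cut when pixel i takes label 0
    for (i, j), d in edges.items():
        add_edge(i, j, d)                  # cut when x_i = 0 and x_j = 1
        add_edge(j, i, d)                  # cut when x_i = 1 and x_j = 0

    while True:                            # Edmonds-Karp: shortest augmenting paths
        parent = {s: None}
        queue = deque([s])
        while queue and t not in parent:
            u = queue.popleft()
            for v, c in cap[u].items():
                if c > 1e-12 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if t not in parent:
            break                          # no augmenting path: flow is maximal
        path, v = [], t
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        flow = min(cap[u][v] for u, v in path)
        for u, v in path:
            cap[u][v] -= flow
            cap[v][u] += flow

    # Pixels still reachable from the source get label 0, the rest label 1.
    return tuple(0 if i in parent else 1 for i in range(n))

def brute_force_min_energy(unary, edges):
    """Reference value by enumeration, for checking small instances only."""
    return min(energy(x, unary, edges) for x in product((0, 1), repeat=len(unary)))

unary = [(0.0, 2.0), (0.5, 1.0), (1.5, 0.2), (2.0, 0.0)]   # hypothetical costs
edges = {(0, 1): 1.0, (1, 2): 1.0, (2, 3): 1.0}            # hypothetical d_ij
labels = min_cut_labels(unary, edges)
print(labels, energy(labels, unary, edges), brute_force_min_energy(unary, edges))
```

The construction only covers unary and pairwise terms, which is the limitation the slide points at: structure spanning many pixels has no direct counterpart in this graph.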

Minimize: $E(X) = P(X) + \sum_c h_c(X_c)$

Where: $h_c: \{0,1\}^{|c|} \rightarrow \mathbb{R}$

Higher Order Function ($|c| = 10 \times 10 = 100$): assigns cost to $2^{100}$ possible labellings!

Exploit function structure to transform it to a Pairwise function

p1 p2 p3
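The size of that table is easy to check with exact integer arithmetic. A small sketch (variable names are hypothetical):

```python
side = 10                          # the clique is a 10x10 patch
clique_size = side * side          # |c| = 100 binary variables
num_labellings = 2 ** clique_size  # one cost per labelling of the patch

print(clique_size, num_labellings)
```

A number with 31 digits rules out storing the function as an explicit table, which motivates exploiting its structure instead.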

Figure panels: Training Image, Test Image, Test Image (60% Noise), Pairwise Result, Higher-Order Result, Learned Patterns

Identified transformable families of higher order functions such that:

1. Constant or polynomial number of auxiliary variables (a) added

2. All pairwise functions (g) are submodular

Higher Order Function → Pairwise Submodular Function

Example: $H(X) = F(\sum_i x_i)$ with $F$ concave [Kohli et al. '08]

(figure: $H(X)$ plotted against $\sum_i x_i$)

Simple Example using Auxiliary variables

$f(x) = \begin{cases} 0 & \text{if all } x_i = 0 \\ C_1 & \text{otherwise} \end{cases}$, with $x \in L = \{0,1\}^n$

$\min_x f(x) = \min_{x,\, a \in \{0,1\}} C_1 a + C_1 \bar{a} \sum_i x_i$

(left: Higher Order Submodular Function; right: Quadratic Submodular Function)

$\sum_i x_i < 1$: $a = 0$ ($\bar{a} = 1$), $f(x) = 0$

$\sum_i x_i \geq 1$: $a = 1$ ($\bar{a} = 0$), $f(x) = C_1$
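The auxiliary-variable identity can be verified by enumeration: for every $x$, minimizing the quadratic form over $a \in \{0,1\}$ should reproduce $f(x)$. A minimal sketch, with $C_1 = 3$ and $n = 4$ as hypothetical values:

```python
from itertools import product

def f_higher_order(x, C1):
    """f(x) = 0 if all x_i = 0, C1 otherwise."""
    return 0.0 if sum(x) == 0 else C1

def f_quadratic(x, a, C1):
    """C1 * a + C1 * (1 - a) * sum_i x_i, where (1 - a) plays the role of a-bar."""
    return C1 * a + C1 * (1 - a) * sum(x)

def min_over_a(x, C1):
    """Minimize the quadratic form over the auxiliary variable a."""
    return min(f_quadratic(x, a, C1) for a in (0, 1))

C1, n = 3.0, 4
agree = all(min_over_a(x, C1) == f_higher_order(x, C1)
            for x in product((0, 1), repeat=n))
print(agree)
```

Each product $\bar{a} x_i$ expands to $x_i - a x_i$, so every pairwise coefficient is non-positive when $C_1 \ge 0$, which is what keeps the quadratic form submodular.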

(figure: the two functions $C_1$ ($a = 1$) and $C_1 \sum_i x_i$ ($a = 0$) plotted against $\sum_i x_i$ = 1, 2, 3)

Lower envelope of concave functions is concave

[Kohli et al. '08]

$\min_x f(x) = \min_{x,\, a \in \{0,1\}} f_1(x)\, a + f_2(x)\, \bar{a}$

(left: Higher Order Submodular Function; right: Quadratic Submodular Function)

(figure: $f_1(x)$ ($a = 1$) and $f_2(x)$ ($a = 0$) plotted against $\sum_i x_i$ = 1, 2, 3)

Lower envelope of concave functions is concave

[Kohli et al. '08]
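The lower-envelope statement can be checked numerically for functions of the count $s = \sum_i x_i$. The sketch below takes two linear functions (linear functions are concave), forms their pointwise minimum, and tests concavity through second differences. The intercepts and slopes are hypothetical.

```python
def lower_envelope(s, lines):
    """Pointwise minimum of linear functions c + w * s."""
    return min(c + w * s for c, w in lines)

# Hypothetical (intercept, slope) pairs standing in for f1 and f2.
lines = [(0.0, 2.0), (3.0, 0.5)]

values = [lower_envelope(s, lines) for s in range(7)]
second_diffs = [values[k - 1] - 2 * values[k] + values[k + 1] for k in range(1, 6)]
is_concave = all(d <= 1e-12 for d in second_diffs)
print(values, is_concave)
```

Selecting the smaller of the two functions is exactly what minimizing over the auxiliary variable $a$ does in the expression above.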

$\min_x f(x) = \min_{x} \max_{a \in \{0,1\}} f_1(x)\, a + f_2(x)\, \bar{a}$

(left: Higher Order Submodular Function; right: Quadratic Submodular Function)

(figure: $f_1(x)$ ($a = 1$) and $f_2(x)$ ($a = 0$) plotted against $\sum_i x_i$ = 1, 2, 3)

Upper envelope of linear functions is convex

Very Hard Problem!!!! [Kohli and Kumar, CVPR 2010]

BINARY SEGMENTATION: SIZE/VOLUME PRIORS

Prior on size of object (say n/2)

(figure labels: Object, Background)

[Kohli and Kumar, CVPR 2010]
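One way to see how a size prior fits the min-max form is to write the deviation from a target size as the upper envelope of two linear functions. The absolute-deviation form below is an assumption made for illustration; the slides only state that there is a prior on object size (say n/2).

```python
def size_prior(x, target):
    """|sum_i x_i - target| written as max over a of f1 * a + f2 * (1 - a)."""
    s = sum(x)
    f1 = s - target      # active when the object is too large
    f2 = target - s      # active when the object is too small
    return max(f1 * a + f2 * (1 - a) for a in (0, 1))

n = 4
print(size_prior((1, 1, 0, 0), n // 2))  # exactly n/2 object pixels
print(size_prior((1, 1, 1, 1), n // 2))  # object too large
print(size_prior((0, 0, 0, 0), n // 2))  # object too small
```

Because the inner operation is a maximum rather than a minimum, the auxiliary variable cannot simply be folded into the outer minimization, which is why this case is harder than the lower-envelope one.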

3D RECONSTRUCTION: SILHOUETTE CONSTRAINTS

Rays must touch silhouette at least once

[Sinha et al. '05, Cremers et al. '08] [Woodford et al. 2009]
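Read as a higher order constraint over the voxels along one ray, the silhouette condition can be sketched as follows. This assumes binary occupancy variables and a hard (infinite) penalty, neither of which is stated on the slide; the function name is hypothetical.

```python
import math

def silhouette_ray_cost(ray_voxels):
    """0 if at least one voxel along the ray is occupied, infinite otherwise."""
    return 0.0 if any(ray_voxels) else math.inf

print(silhouette_ray_cost([0, 0, 1, 0]))  # ray hits an occupied voxel
print(silhouette_ray_cost([0, 0, 0, 0]))  # ray misses the object entirely
```

The cost depends on all voxels of the ray jointly, so it cannot be written as a sum of unary and pairwise terms without further transformation.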

Figure panels: Original Video, Edited Video, Alternative Representation (Memory Efficient)

[Rav-Acha, Kohli, Rother & Fitzgibbon, SIGGRAPH 2008]

A deforming 3D object

Note: $\mathbf{x} = (x, y)$, $\mathbf{u} = (u, v)$

Warping $w(\mathbf{u}, t)$, Visibility Mask $b(\mathbf{u}, t)$, Texture map

Figure panels: visibility mask, warping, Image, Texture map

[Rav-Acha, Kohli, Rother & Fitzgibbon, SIGGRAPH 2008]
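A toy forward model may help relate the three ingredients to a video frame. The sketch assumes that each visible texture point u is copied to image position w(u, t); this reading, the dictionary representation and all names are assumptions for illustration, not the method of the cited paper.

```python
def render_frame(texture, warp, visible, t):
    """Copy every visible texture point u to its warped image position w(u, t)."""
    image = {}
    for u, colour in texture.items():
        if visible(u, t):                # b(u, t): is the point seen at time t?
            image[warp(u, t)] = colour   # w(u, t): where it lands in the frame
    return image

# Hypothetical 3-point texture map indexed by u = (u, v).
texture = {(0, 0): "red", (1, 0): "green", (2, 0): "blue"}
warp = lambda u, t: (u[0] + t, u[1])   # toy warp: shift right by t pixels
visible = lambda u, t: u[0] + t < 3    # toy mask: points leaving a 3-pixel frame are hidden

frame1 = render_frame(texture, warp, visible, t=1)
print(frame1)
```

Editing the texture map once and re-rendering every frame is the sense in which such a representation can support video editing.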

Speed and Accuracy

3,600,000,000 Pixels Created from about 800 8 MegaPixel Images

[Kopf et al. (MSR Redmond) SIGGRAPH 2007 ]


Speed and Scalability

Processing Videos: 1 minute video of 1M pixel resolution

3.6 B pixels

3D reconstruction [500 x 500 x 500 = 0.125B voxels]
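The problem sizes quoted here can be reproduced with a few lines of arithmetic. A small sketch (variable names are hypothetical; the source-pixel count simply multiplies the figures given for the panorama):

```python
panorama_pixels = 3_600_000_000          # size of the stitched panorama
source_pixels = 800 * 8_000_000          # about 800 images of 8 megapixels each
voxels = 500 ** 3                        # 500 x 500 x 500 reconstruction grid

print(source_pixels, voxels, voxels / 1e9)
```

Each of these counts is the number of variables in the corresponding labelling problem, which is why running time per variable matters at this scale.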

Speed and Scalability

Dynamic Algorithms [ICCV 2005] [CVPR 2008] [PAMI 2007] [PAMI 2010]: Reuse computation

Hybrid Algorithms [PAMI 2010]: Use different algorithms to solve easy/hard parts

Adaptive Algorithms [AISTATS 2010] [ICML 2011] [CVPR 2011]: Focus on the unsolved parts of the problem
