147
Stereo and Energy Minimization Raquel Urtasun TTI Chicago Feb 21, 2013 Raquel Urtasun (TTI-C) Computer Vision Feb 21, 2013 1 / 69

Stereo and Energy Minimization - Department of Computer ...urtasun/courses/CV/lecture13.pdfStereo and Energy Minimization Raquel Urtasun TTI Chicago Feb 21, 2013 Raquel Urtasun (TTI-C)

  • Upload
    others

  • View
    4

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Stereo and Energy Minimization - Department of Computer ...urtasun/courses/CV/lecture13.pdfStereo and Energy Minimization Raquel Urtasun TTI Chicago Feb 21, 2013 Raquel Urtasun (TTI-C)

Stereo and Energy Minimization

Raquel Urtasun

TTI Chicago

Feb 21, 2013

Raquel Urtasun (TTI-C) Computer Vision Feb 21, 2013 1 / 69

Page 2: Stereo and Energy Minimization - Department of Computer ...urtasun/courses/CV/lecture13.pdfStereo and Energy Minimization Raquel Urtasun TTI Chicago Feb 21, 2013 Raquel Urtasun (TTI-C)

Stereo Estimation Methods

Local methods

Grow and seed methods: use a few good correspondences and grow theestimation from them

Adaptive Window methods (AW)

Global methods: define a Markov random field over

Pixel-levelFronto-parallel planesSlanted planes

Raquel Urtasun (TTI-C) Computer Vision Feb 21, 2013 2 / 69

Page 3: Stereo and Energy Minimization - Department of Computer ...urtasun/courses/CV/lecture13.pdfStereo and Energy Minimization Raquel Urtasun TTI Chicago Feb 21, 2013 Raquel Urtasun (TTI-C)

Which Similarity Measure?

Sum of square intensity differences (SSD), i.e., mean square error

Absolute intensity differences (SAD), i.e., mean absolute difference

Robust measures including truncated quadratic functions: they limit theinfluence of mismatches

Normalized cross-correlation: behaves similarly to the SSD

Binary matching costs based on binary features such as edges (e.g., matchor not match)

Invariant to differences in camera gain or bias, e.g., gradient basedmeasurement, filter responses

All sort of feature descriptors that we saw before in class as well as others

Raquel Urtasun (TTI-C) Computer Vision Feb 21, 2013 3 / 69

Page 4: Stereo and Energy Minimization - Department of Computer ...urtasun/courses/CV/lecture13.pdfStereo and Energy Minimization Raquel Urtasun TTI Chicago Feb 21, 2013 Raquel Urtasun (TTI-C)

Which Similarity Measure?

Sum of square intensity differences (SSD), i.e., mean square error

Absolute intensity differences (SAD), i.e., mean absolute difference

Robust measures including truncated quadratic functions: they limit theinfluence of mismatches

Normalized cross-correlation: behaves similarly to the SSD

Binary matching costs based on binary features such as edges (e.g., matchor not match)

Invariant to differences in camera gain or bias, e.g., gradient basedmeasurement, filter responses

All sort of feature descriptors that we saw before in class as well as others

Raquel Urtasun (TTI-C) Computer Vision Feb 21, 2013 3 / 69

Page 5: Stereo and Energy Minimization - Department of Computer ...urtasun/courses/CV/lecture13.pdfStereo and Energy Minimization Raquel Urtasun TTI Chicago Feb 21, 2013 Raquel Urtasun (TTI-C)

Which Similarity Measure?

Sum of square intensity differences (SSD), i.e., mean square error

Absolute intensity differences (SAD), i.e., mean absolute difference

Robust measures including truncated quadratic functions: they limit theinfluence of mismatches

Normalized cross-correlation: behaves similarly to the SSD

Binary matching costs based on binary features such as edges (e.g., matchor not match)

Invariant to differences in camera gain or bias, e.g., gradient basedmeasurement, filter responses

All sort of feature descriptors that we saw before in class as well as others

Raquel Urtasun (TTI-C) Computer Vision Feb 21, 2013 3 / 69

Page 6: Stereo and Energy Minimization - Department of Computer ...urtasun/courses/CV/lecture13.pdfStereo and Energy Minimization Raquel Urtasun TTI Chicago Feb 21, 2013 Raquel Urtasun (TTI-C)

Which Similarity Measure?

Sum of square intensity differences (SSD), i.e., mean square error

Absolute intensity differences (SAD), i.e., mean absolute difference

Robust measures including truncated quadratic functions: they limit theinfluence of mismatches

Normalized cross-correlation: behaves similarly to the SSD

Binary matching costs based on binary features such as edges (e.g., matchor not match)

Invariant to differences in camera gain or bias, e.g., gradient basedmeasurement, filter responses

All sort of feature descriptors that we saw before in class as well as others

Raquel Urtasun (TTI-C) Computer Vision Feb 21, 2013 3 / 69

Page 7: Stereo and Energy Minimization - Department of Computer ...urtasun/courses/CV/lecture13.pdfStereo and Energy Minimization Raquel Urtasun TTI Chicago Feb 21, 2013 Raquel Urtasun (TTI-C)

Which Similarity Measure?

Sum of square intensity differences (SSD), i.e., mean square error

Absolute intensity differences (SAD), i.e., mean absolute difference

Robust measures including truncated quadratic functions: they limit theinfluence of mismatches

Normalized cross-correlation: behaves similarly to the SSD

Binary matching costs based on binary features such as edges (e.g., matchor not match)

Invariant to differences in camera gain or bias, e.g., gradient basedmeasurement, filter responses

All sort of feature descriptors that we saw before in class as well as others

Raquel Urtasun (TTI-C) Computer Vision Feb 21, 2013 3 / 69

Page 8: Stereo and Energy Minimization - Department of Computer ...urtasun/courses/CV/lecture13.pdfStereo and Energy Minimization Raquel Urtasun TTI Chicago Feb 21, 2013 Raquel Urtasun (TTI-C)

Which Similarity Measure?

Sum of square intensity differences (SSD), i.e., mean square error

Absolute intensity differences (SAD), i.e., mean absolute difference

Robust measures including truncated quadratic functions: they limit theinfluence of mismatches

Normalized cross-correlation: behaves similarly to the SSD

Binary matching costs based on binary features such as edges (e.g., matchor not match)

Invariant to differences in camera gain or bias, e.g., gradient basedmeasurement, filter responses

All sort of feature descriptors that we saw before in class as well as others

Raquel Urtasun (TTI-C) Computer Vision Feb 21, 2013 3 / 69

Page 9: Stereo and Energy Minimization - Department of Computer ...urtasun/courses/CV/lecture13.pdfStereo and Energy Minimization Raquel Urtasun TTI Chicago Feb 21, 2013 Raquel Urtasun (TTI-C)

Which Similarity Measure?

Sum of square intensity differences (SSD), i.e., mean square error

Absolute intensity differences (SAD), i.e., mean absolute difference

Robust measures including truncated quadratic functions: they limit theinfluence of mismatches

Normalized cross-correlation: behaves similarly to the SSD

Binary matching costs based on binary features such as edges (e.g., matchor not match)

Invariant to differences in camera gain or bias, e.g., gradient basedmeasurement, filter responses

All sort of feature descriptors that we saw before in class as well as others

Raquel Urtasun (TTI-C) Computer Vision Feb 21, 2013 3 / 69

Page 10: Stereo and Energy Minimization - Department of Computer ...urtasun/courses/CV/lecture13.pdfStereo and Energy Minimization Raquel Urtasun TTI Chicago Feb 21, 2013 Raquel Urtasun (TTI-C)

Disparity Estimation

DSI: Disparity image

Ground truth Scene

Raquel Urtasun (TTI-C) Computer Vision Feb 21, 2013 4 / 69

Page 11: Stereo and Energy Minimization - Department of Computer ...urtasun/courses/CV/lecture13.pdfStereo and Energy Minimization Raquel Urtasun TTI Chicago Feb 21, 2013 Raquel Urtasun (TTI-C)

Sparse Correspondences

Early approaches to stereo were feature-based

Consists on first extract a set of ”matchable” features. How?

This resulted in a sparse disparity computation, and a sparse 3D point cloud

Raquel Urtasun (TTI-C) Computer Vision Feb 21, 2013 5 / 69

Page 12: Stereo and Energy Minimization - Department of Computer ...urtasun/courses/CV/lecture13.pdfStereo and Energy Minimization Raquel Urtasun TTI Chicago Feb 21, 2013 Raquel Urtasun (TTI-C)

Sparse Correspondences

Early approaches to stereo were feature-based

Consists on first extract a set of ”matchable” features. How?

This resulted in a sparse disparity computation, and a sparse 3D point cloud

Raquel Urtasun (TTI-C) Computer Vision Feb 21, 2013 5 / 69

Page 13: Stereo and Energy Minimization - Department of Computer ...urtasun/courses/CV/lecture13.pdfStereo and Energy Minimization Raquel Urtasun TTI Chicago Feb 21, 2013 Raquel Urtasun (TTI-C)

Sparse Correspondences

Early approaches to stereo were feature-based

Consists on first extract a set of ”matchable” features. How?

This resulted in a sparse disparity computation, and a sparse 3D point cloud

Raquel Urtasun (TTI-C) Computer Vision Feb 21, 2013 5 / 69

Page 14: Stereo and Energy Minimization - Department of Computer ...urtasun/courses/CV/lecture13.pdfStereo and Energy Minimization Raquel Urtasun TTI Chicago Feb 21, 2013 Raquel Urtasun (TTI-C)

Denser correspondences

Typical stereo pipeline

1 Matching cost computation

2 Cost aggregation

3 Disparity computation

4 Disparity refinement

Raquel Urtasun (TTI-C) Computer Vision Feb 21, 2013 6 / 69

Page 15: Stereo and Energy Minimization - Department of Computer ...urtasun/courses/CV/lecture13.pdfStereo and Energy Minimization Raquel Urtasun TTI Chicago Feb 21, 2013 Raquel Urtasun (TTI-C)

Local methods

1 Matching cost computation: is the square difference in intensity values ata given disparity

2 Cost aggregation: adds matching cost over square window with constantdisparity

3 Disparity computation: select the minimal aggregated value at each pixel

4 Disparity refinement: consistency check and sub-pixel estimation

Raquel Urtasun (TTI-C) Computer Vision Feb 21, 2013 7 / 69

Page 16: Stereo and Energy Minimization - Department of Computer ...urtasun/courses/CV/lecture13.pdfStereo and Energy Minimization Raquel Urtasun TTI Chicago Feb 21, 2013 Raquel Urtasun (TTI-C)

Local methods

1 Matching cost computation: is the square difference in intensity values ata given disparity

2 Cost aggregation: adds matching cost over square window with constantdisparity

3 Disparity computation: select the minimal aggregated value at each pixel

4 Disparity refinement: consistency check and sub-pixel estimation

Raquel Urtasun (TTI-C) Computer Vision Feb 21, 2013 7 / 69

Page 17: Stereo and Energy Minimization - Department of Computer ...urtasun/courses/CV/lecture13.pdfStereo and Energy Minimization Raquel Urtasun TTI Chicago Feb 21, 2013 Raquel Urtasun (TTI-C)

Local methods

1 Matching cost computation: is the square difference in intensity values ata given disparity

2 Cost aggregation: adds matching cost over square window with constantdisparity

3 Disparity computation: select the minimal aggregated value at each pixel

4 Disparity refinement: consistency check and sub-pixel estimation

Raquel Urtasun (TTI-C) Computer Vision Feb 21, 2013 7 / 69

Page 18: Stereo and Energy Minimization - Department of Computer ...urtasun/courses/CV/lecture13.pdfStereo and Energy Minimization Raquel Urtasun TTI Chicago Feb 21, 2013 Raquel Urtasun (TTI-C)

Local methods

1 Matching cost computation: is the square difference in intensity values ata given disparity

2 Cost aggregation: adds matching cost over square window with constantdisparity

3 Disparity computation: select the minimal aggregated value at each pixel

4 Disparity refinement: consistency check and sub-pixel estimation

Raquel Urtasun (TTI-C) Computer Vision Feb 21, 2013 7 / 69

Page 19: Stereo and Energy Minimization - Department of Computer ...urtasun/courses/CV/lecture13.pdfStereo and Energy Minimization Raquel Urtasun TTI Chicago Feb 21, 2013 Raquel Urtasun (TTI-C)

Aggregation in local methods

Aggregate the matching cost summing over a support region

The support region can be 2D (i.e., x,y) or 3D (i.e., x,y,d). The lattersupports slanted surfaces.

Simplest approach: Aggregation with fixed support can be done byperforming 2D or 3D convolution

C (x , y , d) = w(x , y , d) ∗ C0(x , y , d)

Problems

If too small, then ambiguousIf too big, bleeding effects at the edges

W = 3 W = 20

Figure: from N. Snavely

Raquel Urtasun (TTI-C) Computer Vision Feb 21, 2013 8 / 69

Page 20: Stereo and Energy Minimization - Department of Computer ...urtasun/courses/CV/lecture13.pdfStereo and Energy Minimization Raquel Urtasun TTI Chicago Feb 21, 2013 Raquel Urtasun (TTI-C)

Aggregation in local methods

Aggregate the matching cost summing over a support region

The support region can be 2D (i.e., x,y) or 3D (i.e., x,y,d). The lattersupports slanted surfaces.

Simplest approach: Aggregation with fixed support can be done byperforming 2D or 3D convolution

C (x , y , d) = w(x , y , d) ∗ C0(x , y , d)

Problems

If too small, then ambiguousIf too big, bleeding effects at the edges

W = 3 W = 20

Figure: from N. Snavely

Raquel Urtasun (TTI-C) Computer Vision Feb 21, 2013 8 / 69

Page 21: Stereo and Energy Minimization - Department of Computer ...urtasun/courses/CV/lecture13.pdfStereo and Energy Minimization Raquel Urtasun TTI Chicago Feb 21, 2013 Raquel Urtasun (TTI-C)

Aggregation in local methods

Aggregate the matching cost summing over a support region

The support region can be 2D (i.e., x,y) or 3D (i.e., x,y,d). The lattersupports slanted surfaces.

Simplest approach: Aggregation with fixed support can be done byperforming 2D or 3D convolution

C (x , y , d) = w(x , y , d) ∗ C0(x , y , d)

Problems

If too small, then ambiguousIf too big, bleeding effects at the edges

W = 3 W = 20

Figure: from N. Snavely

Raquel Urtasun (TTI-C) Computer Vision Feb 21, 2013 8 / 69

Page 22: Stereo and Energy Minimization - Department of Computer ...urtasun/courses/CV/lecture13.pdfStereo and Energy Minimization Raquel Urtasun TTI Chicago Feb 21, 2013 Raquel Urtasun (TTI-C)

Aggregation in local methods

Aggregate the matching cost summing over a support region

The support region can be 2D (i.e., x,y) or 3D (i.e., x,y,d). The lattersupports slanted surfaces.

Simplest approach: Aggregation with fixed support can be done byperforming 2D or 3D convolution

C (x , y , d) = w(x , y , d) ∗ C0(x , y , d)

Problems

If too small, then ambiguousIf too big, bleeding effects at the edges

W = 3 W = 20

Figure: from N. Snavely

Raquel Urtasun (TTI-C) Computer Vision Feb 21, 2013 8 / 69

Page 23: Stereo and Energy Minimization - Department of Computer ...urtasun/courses/CV/lecture13.pdfStereo and Energy Minimization Raquel Urtasun TTI Chicago Feb 21, 2013 Raquel Urtasun (TTI-C)

Matching cost computation

I(x, y) J(x, y)

C(x, y, d); the disparity space image (DSI) x

d

The disparity is then computed by

d(x , y) = arg mind′

C (x , y , d ′)

[Source: N. Snavely]Raquel Urtasun (TTI-C) Computer Vision Feb 21, 2013 9 / 69

Page 24: Stereo and Energy Minimization - Department of Computer ...urtasun/courses/CV/lecture13.pdfStereo and Energy Minimization Raquel Urtasun TTI Chicago Feb 21, 2013 Raquel Urtasun (TTI-C)

More complex aggregation

Solution: make the window adaptive to the image evidence (e.g.,aggregate from pixels with similar appearance)

Have we seen something similar in class? When? How would you do this?

An alternative solution is to select between different windows. This can becomputed efficiently using integral images. How?

In iterative diffusion we repeately add to each pixel’s costs the weightedvalue of their neighbors

What happens as a function of the number of iterations?Does a strategy like this seem familiar?

Global methods are a more principle way to do aggregation

Raquel Urtasun (TTI-C) Computer Vision Feb 21, 2013 10 / 69

Page 25: Stereo and Energy Minimization - Department of Computer ...urtasun/courses/CV/lecture13.pdfStereo and Energy Minimization Raquel Urtasun TTI Chicago Feb 21, 2013 Raquel Urtasun (TTI-C)

More complex aggregation

Solution: make the window adaptive to the image evidence (e.g.,aggregate from pixels with similar appearance)

Have we seen something similar in class? When? How would you do this?

An alternative solution is to select between different windows. This can becomputed efficiently using integral images. How?

In iterative diffusion we repeately add to each pixel’s costs the weightedvalue of their neighbors

What happens as a function of the number of iterations?Does a strategy like this seem familiar?

Global methods are a more principle way to do aggregation

Raquel Urtasun (TTI-C) Computer Vision Feb 21, 2013 10 / 69

Page 26: Stereo and Energy Minimization - Department of Computer ...urtasun/courses/CV/lecture13.pdfStereo and Energy Minimization Raquel Urtasun TTI Chicago Feb 21, 2013 Raquel Urtasun (TTI-C)

More complex aggregation

Solution: make the window adaptive to the image evidence (e.g.,aggregate from pixels with similar appearance)

Have we seen something similar in class? When? How would you do this?

An alternative solution is to select between different windows. This can becomputed efficiently using integral images. How?

In iterative diffusion we repeately add to each pixel’s costs the weightedvalue of their neighbors

What happens as a function of the number of iterations?Does a strategy like this seem familiar?

Global methods are a more principle way to do aggregation

Raquel Urtasun (TTI-C) Computer Vision Feb 21, 2013 10 / 69

Page 27: Stereo and Energy Minimization - Department of Computer ...urtasun/courses/CV/lecture13.pdfStereo and Energy Minimization Raquel Urtasun TTI Chicago Feb 21, 2013 Raquel Urtasun (TTI-C)

More complex aggregation

Solution: make the window adaptive to the image evidence (e.g.,aggregate from pixels with similar appearance)

Have we seen something similar in class? When? How would you do this?

An alternative solution is to select between different windows. This can becomputed efficiently using integral images. How?

In iterative diffusion we repeately add to each pixel’s costs the weightedvalue of their neighbors

What happens as a function of the number of iterations?

Does a strategy like this seem familiar?

Global methods are a more principle way to do aggregation

Raquel Urtasun (TTI-C) Computer Vision Feb 21, 2013 10 / 69

Page 28: Stereo and Energy Minimization - Department of Computer ...urtasun/courses/CV/lecture13.pdfStereo and Energy Minimization Raquel Urtasun TTI Chicago Feb 21, 2013 Raquel Urtasun (TTI-C)

More complex aggregation

Solution: make the window adaptive to the image evidence (e.g.,aggregate from pixels with similar appearance)

Have we seen something similar in class? When? How would you do this?

An alternative solution is to select between different windows. This can becomputed efficiently using integral images. How?

In iterative diffusion we repeately add to each pixel’s costs the weightedvalue of their neighbors

What happens as a function of the number of iterations?Does a strategy like this seem familiar?

Global methods are a more principle way to do aggregation

Raquel Urtasun (TTI-C) Computer Vision Feb 21, 2013 10 / 69

Page 29: Stereo and Energy Minimization - Department of Computer ...urtasun/courses/CV/lecture13.pdfStereo and Energy Minimization Raquel Urtasun TTI Chicago Feb 21, 2013 Raquel Urtasun (TTI-C)

More complex aggregation

Solution: make the window adaptive to the image evidence (e.g.,aggregate from pixels with similar appearance)

Have we seen something similar in class? When? How would you do this?

An alternative solution is to select between different windows. This can becomputed efficiently using integral images. How?

In iterative diffusion we repeately add to each pixel’s costs the weightedvalue of their neighbors

What happens as a function of the number of iterations?Does a strategy like this seem familiar?

Global methods are a more principle way to do aggregation

Raquel Urtasun (TTI-C) Computer Vision Feb 21, 2013 10 / 69

Page 30: Stereo and Energy Minimization - Department of Computer ...urtasun/courses/CV/lecture13.pdfStereo and Energy Minimization Raquel Urtasun TTI Chicago Feb 21, 2013 Raquel Urtasun (TTI-C)

More complex aggregation

Solution: make the window adaptive to the image evidence (e.g.,aggregate from pixels with similar appearance)

Have we seen something similar in class? When? How would you do this?

An alternative solution is to select between different windows. This can becomputed efficiently using integral images. How?

In iterative diffusion we repeately add to each pixel’s costs the weightedvalue of their neighbors

What happens as a function of the number of iterations?Does a strategy like this seem familiar?

Global methods are a more principle way to do aggregation

Raquel Urtasun (TTI-C) Computer Vision Feb 21, 2013 10 / 69

Page 31: Stereo and Energy Minimization - Department of Computer ...urtasun/courses/CV/lecture13.pdfStereo and Energy Minimization Raquel Urtasun TTI Chicago Feb 21, 2013 Raquel Urtasun (TTI-C)

Local methods

1 Matching cost computation: is the square difference in intensity values at agiven disparity

2 Cost aggregation: adds matching cost over square window with constantdisparity

3 Disparity computation: select the minimal aggregated value at each pixel.Winner takes all strategy!

4 Disparity refinement:

Consistency check: as the previous method check uniqueness ofmatches only on one directionHole fillingSub-pixel estimationRemove of spurious matches

Raquel Urtasun (TTI-C) Computer Vision Feb 21, 2013 11 / 69

Page 32: Stereo and Energy Minimization - Department of Computer ...urtasun/courses/CV/lecture13.pdfStereo and Energy Minimization Raquel Urtasun TTI Chicago Feb 21, 2013 Raquel Urtasun (TTI-C)

Local methods

1 Matching cost computation: is the square difference in intensity values at agiven disparity

2 Cost aggregation: adds matching cost over square window with constantdisparity

3 Disparity computation: select the minimal aggregated value at each pixel.Winner takes all strategy!

4 Disparity refinement:

Consistency check: as the previous method check uniqueness ofmatches only on one direction

Hole fillingSub-pixel estimationRemove of spurious matches

Raquel Urtasun (TTI-C) Computer Vision Feb 21, 2013 11 / 69

Page 33: Stereo and Energy Minimization - Department of Computer ...urtasun/courses/CV/lecture13.pdfStereo and Energy Minimization Raquel Urtasun TTI Chicago Feb 21, 2013 Raquel Urtasun (TTI-C)

Local methods

1 Matching cost computation: is the square difference in intensity values at agiven disparity

2 Cost aggregation: adds matching cost over square window with constantdisparity

3 Disparity computation: select the minimal aggregated value at each pixel.Winner takes all strategy!

4 Disparity refinement:

Consistency check: as the previous method check uniqueness ofmatches only on one directionHole filling

Sub-pixel estimationRemove of spurious matches

Raquel Urtasun (TTI-C) Computer Vision Feb 21, 2013 11 / 69

Page 34: Stereo and Energy Minimization - Department of Computer ...urtasun/courses/CV/lecture13.pdfStereo and Energy Minimization Raquel Urtasun TTI Chicago Feb 21, 2013 Raquel Urtasun (TTI-C)

Local methods

1 Matching cost computation: is the square difference in intensity values at agiven disparity

2 Cost aggregation: adds matching cost over square window with constantdisparity

3 Disparity computation: select the minimal aggregated value at each pixel.Winner takes all strategy!

4 Disparity refinement:

Consistency check: as the previous method check uniqueness ofmatches only on one directionHole fillingSub-pixel estimation

Remove of spurious matches

Raquel Urtasun (TTI-C) Computer Vision Feb 21, 2013 11 / 69

Page 35: Stereo and Energy Minimization - Department of Computer ...urtasun/courses/CV/lecture13.pdfStereo and Energy Minimization Raquel Urtasun TTI Chicago Feb 21, 2013 Raquel Urtasun (TTI-C)

Local methods

1 Matching cost computation: is the square difference in intensity values at agiven disparity

2 Cost aggregation: adds matching cost over square window with constantdisparity

3 Disparity computation: select the minimal aggregated value at each pixel.Winner takes all strategy!

4 Disparity refinement:

Consistency check: as the previous method check uniqueness ofmatches only on one directionHole fillingSub-pixel estimationRemove of spurious matches

Raquel Urtasun (TTI-C) Computer Vision Feb 21, 2013 11 / 69

Page 36: Stereo and Energy Minimization - Department of Computer ...urtasun/courses/CV/lecture13.pdfStereo and Energy Minimization Raquel Urtasun TTI Chicago Feb 21, 2013 Raquel Urtasun (TTI-C)

Local methods

1 Matching cost computation: is the square difference in intensity values at agiven disparity

2 Cost aggregation: adds matching cost over square window with constantdisparity

3 Disparity computation: select the minimal aggregated value at each pixel.Winner takes all strategy!

4 Disparity refinement:

Consistency check: as the previous method check uniqueness ofmatches only on one directionHole fillingSub-pixel estimationRemove of spurious matches

Raquel Urtasun (TTI-C) Computer Vision Feb 21, 2013 11 / 69

Page 37: Stereo and Energy Minimization - Department of Computer ...urtasun/courses/CV/lecture13.pdfStereo and Energy Minimization Raquel Urtasun TTI Chicago Feb 21, 2013 Raquel Urtasun (TTI-C)

Subpixel Estimation

Most algorithms retrieve a disparity which is an integer

This is not good enough for some applications, e.g., image-based rendering

To remedy this, most approaches perform sub-pixel refinement after thedisparity is estimated

Gradient descent or fitting a curve is the most common ways to do thesubpixel estimation

What might be the problem?

Disparities should be smoothThe regions where the disparities are estimated should be on the same(correct) surface, e.g., think of occlusion boundaries

Raquel Urtasun (TTI-C) Computer Vision Feb 21, 2013 12 / 69

Page 38: Stereo and Energy Minimization - Department of Computer ...urtasun/courses/CV/lecture13.pdfStereo and Energy Minimization Raquel Urtasun TTI Chicago Feb 21, 2013 Raquel Urtasun (TTI-C)

Subpixel Estimation

Most algorithms retrieve a disparity which is an integer

This is not good enough for some applications, e.g., image-based rendering

To remedy this, most approaches perform sub-pixel refinement after thedisparity is estimated

Gradient descent or fitting a curve is the most common ways to do thesubpixel estimation

What might be the problem?

Disparities should be smoothThe regions where the disparities are estimated should be on the same(correct) surface, e.g., think of occlusion boundaries

Raquel Urtasun (TTI-C) Computer Vision Feb 21, 2013 12 / 69

Page 39: Stereo and Energy Minimization - Department of Computer ...urtasun/courses/CV/lecture13.pdfStereo and Energy Minimization Raquel Urtasun TTI Chicago Feb 21, 2013 Raquel Urtasun (TTI-C)

Subpixel Estimation

Most algorithms retrieve a disparity which is an integer

This is not good enough for some applications, e.g., image-based rendering

To remedy this, most approaches perform sub-pixel refinement after thedisparity is estimated

Gradient descent or fitting a curve is the most common ways to do thesubpixel estimation

What might be the problem?

Disparities should be smoothThe regions where the disparities are estimated should be on the same(correct) surface, e.g., think of occlusion boundaries

Raquel Urtasun (TTI-C) Computer Vision Feb 21, 2013 12 / 69

Page 40: Stereo and Energy Minimization - Department of Computer ...urtasun/courses/CV/lecture13.pdfStereo and Energy Minimization Raquel Urtasun TTI Chicago Feb 21, 2013 Raquel Urtasun (TTI-C)

Subpixel Estimation

Most algorithms retrieve a disparity which is an integer

This is not good enough for some applications, e.g., image-based rendering

To remedy this, most approaches perform sub-pixel refinement after thedisparity is estimated

Gradient descent or fitting a curve is the most common ways to do thesubpixel estimation

What might be the problem?

Disparities should be smoothThe regions where the disparities are estimated should be on the same(correct) surface, e.g., think of occlusion boundaries

Raquel Urtasun (TTI-C) Computer Vision Feb 21, 2013 12 / 69

Page 41: Stereo and Energy Minimization - Department of Computer ...urtasun/courses/CV/lecture13.pdfStereo and Energy Minimization Raquel Urtasun TTI Chicago Feb 21, 2013 Raquel Urtasun (TTI-C)

Subpixel Estimation

Most algorithms retrieve a disparity which is an integer

This is not good enough for some applications, e.g., image-based rendering

To remedy this, most approaches perform sub-pixel refinement after thedisparity is estimated

Gradient descent or fitting a curve is the most common ways to do thesubpixel estimation

What might be the problem?

Disparities should be smooth

The regions where the disparities are estimated should be on the same(correct) surface, e.g., think of occlusion boundaries

Raquel Urtasun (TTI-C) Computer Vision Feb 21, 2013 12 / 69

Page 42: Stereo and Energy Minimization - Department of Computer ...urtasun/courses/CV/lecture13.pdfStereo and Energy Minimization Raquel Urtasun TTI Chicago Feb 21, 2013 Raquel Urtasun (TTI-C)

Subpixel Estimation

Most algorithms retrieve a disparity which is an integer

This is not good enough for some applications, e.g., image-based rendering

To remedy this, most approaches perform sub-pixel refinement after thedisparity is estimated

Gradient descent or fitting a curve is the most common ways to do thesubpixel estimation

What might be the problem?

Disparities should be smoothThe regions where the disparities are estimated should be on the same(correct) surface, e.g., think of occlusion boundaries

Raquel Urtasun (TTI-C) Computer Vision Feb 21, 2013 12 / 69

Page 43: Stereo and Energy Minimization - Department of Computer ...urtasun/courses/CV/lecture13.pdfStereo and Energy Minimization Raquel Urtasun TTI Chicago Feb 21, 2013 Raquel Urtasun (TTI-C)

Subpixel Estimation

Most algorithms retrieve a disparity which is an integer

This is not good enough for some applications, e.g., image-based rendering

To remedy this, most approaches perform sub-pixel refinement after thedisparity is estimated

Gradient descent or fitting a curve is the most common ways to do thesubpixel estimation

What might be the problem?

Disparities should be smoothThe regions where the disparities are estimated should be on the same(correct) surface, e.g., think of occlusion boundaries

Raquel Urtasun (TTI-C) Computer Vision Feb 21, 2013 12 / 69

Page 44: Stereo and Energy Minimization - Department of Computer ...urtasun/courses/CV/lecture13.pdfStereo and Energy Minimization Raquel Urtasun TTI Chicago Feb 21, 2013 Raquel Urtasun (TTI-C)

Other postprocessing

Occluded areas can be detected using cross-checking, i.e., comparingleft-to-right and right-to-left disparity maps

A median filter is typically used to remove spurious mismatches

Holes due to occlusion can be filled by surface fitting, or by distributingneighboring disparity estimates

It is interesting sometimes to estimate a notion of confidence for eachestimate, e.g.,

Var(d) =σ2f

a

with σ2f the image noise and a the curvature of the disparity image (DSI)

Raquel Urtasun (TTI-C) Computer Vision Feb 21, 2013 13 / 69

Page 45: Stereo and Energy Minimization - Department of Computer ...urtasun/courses/CV/lecture13.pdfStereo and Energy Minimization Raquel Urtasun TTI Chicago Feb 21, 2013 Raquel Urtasun (TTI-C)

Other postprocessing

Occluded areas can be detected using cross-checking, i.e., comparingleft-to-right and right-to-left disparity maps

A median filter is typically used to remove spurious mismatches

Holes due to occlusion can be filled by surface fitting, or by distributingneighboring disparity estimates

It is interesting sometimes to estimate a notion of confidence for eachestimate, e.g.,

Var(d) =σ2f

a

with σ2f the image noise and a the curvature of the disparity image (DSI)

Raquel Urtasun (TTI-C) Computer Vision Feb 21, 2013 13 / 69

Page 46: Stereo and Energy Minimization - Department of Computer ...urtasun/courses/CV/lecture13.pdfStereo and Energy Minimization Raquel Urtasun TTI Chicago Feb 21, 2013 Raquel Urtasun (TTI-C)

Other postprocessing

Occluded areas can be detected using cross-checking, i.e., comparingleft-to-right and right-to-left disparity maps

A median filter is typically used to remove spurious mismatches

Holes due to occlusion can be filled by surface fitting, or by distributingneighboring disparity estimates

It is interesting sometimes to estimate a notion of confidence for eachestimate, e.g.,

Var(d) =σ2f

a

with σ2f the image noise and a the curvature of the disparity image (DSI)

Raquel Urtasun (TTI-C) Computer Vision Feb 21, 2013 13 / 69

Page 47: Stereo and Energy Minimization - Department of Computer ...urtasun/courses/CV/lecture13.pdfStereo and Energy Minimization Raquel Urtasun TTI Chicago Feb 21, 2013 Raquel Urtasun (TTI-C)

Other postprocessing

Occluded areas can be detected using cross-checking, i.e., comparingleft-to-right and right-to-left disparity maps

A median filter is typically used to remove spurious mismatches

Holes due to occlusion can be filled by surface fitting, or by distributingneighboring disparity estimates

It is interesting sometimes to estimate a notion of confidence for eachestimate, e.g.,

Var(d) =σ2f

a

with σ2f the image noise and a the curvature of the disparity image (DSI)

Raquel Urtasun (TTI-C) Computer Vision Feb 21, 2013 13 / 69

Page 48: Stereo and Energy Minimization - Department of Computer ...urtasun/courses/CV/lecture13.pdfStereo and Energy Minimization Raquel Urtasun TTI Chicago Feb 21, 2013 Raquel Urtasun (TTI-C)

Global Optimization

There is no more aggregation step

The problem is formulated as an energy minimization, i.e., MAP on aMarkov random field

The MRF is expressed as the level of:

PixelsFronto-parallel planes for sets of pixelsSlanted plances for sets of pixels

Raquel Urtasun (TTI-C) Computer Vision Feb 21, 2013 14 / 69

Page 49: Stereo and Energy Minimization - Department of Computer ...urtasun/courses/CV/lecture13.pdfStereo and Energy Minimization Raquel Urtasun TTI Chicago Feb 21, 2013 Raquel Urtasun (TTI-C)

Global Optimization

There is no more aggregation step

The problem is formulated as an energy minimization, i.e., MAP on aMarkov random field

The MRF is expressed as the level of:

PixelsFronto-parallel planes for sets of pixelsSlanted plances for sets of pixels

Raquel Urtasun (TTI-C) Computer Vision Feb 21, 2013 14 / 69

Page 50: Stereo and Energy Minimization - Department of Computer ...urtasun/courses/CV/lecture13.pdfStereo and Energy Minimization Raquel Urtasun TTI Chicago Feb 21, 2013 Raquel Urtasun (TTI-C)

Global Optimization

There is no more aggregation step

The problem is formulated as an energy minimization, i.e., MAP on aMarkov random field

The MRF is expressed as the level of:

Pixels

Fronto-parallel planes for sets of pixelsSlanted plances for sets of pixels

Raquel Urtasun (TTI-C) Computer Vision Feb 21, 2013 14 / 69

Page 51: Stereo and Energy Minimization - Department of Computer ...urtasun/courses/CV/lecture13.pdfStereo and Energy Minimization Raquel Urtasun TTI Chicago Feb 21, 2013 Raquel Urtasun (TTI-C)

Global Optimization

There is no more aggregation step

The problem is formulated as an energy minimization, i.e., MAP on aMarkov random field

The MRF is expressed as the level of:

PixelsFronto-parallel planes for sets of pixels

Slanted plances for sets of pixels

Raquel Urtasun (TTI-C) Computer Vision Feb 21, 2013 14 / 69

Page 52: Stereo and Energy Minimization - Department of Computer ...urtasun/courses/CV/lecture13.pdfStereo and Energy Minimization Raquel Urtasun TTI Chicago Feb 21, 2013 Raquel Urtasun (TTI-C)

Global Optimization

There is no more aggregation step

The problem is formulated as an energy minimization, i.e., MAP on aMarkov random field

The MRF is expressed as the level of:

PixelsFronto-parallel planes for sets of pixelsSlanted plances for sets of pixels

Raquel Urtasun (TTI-C) Computer Vision Feb 21, 2013 14 / 69

Page 53: Stereo and Energy Minimization - Department of Computer ...urtasun/courses/CV/lecture13.pdfStereo and Energy Minimization Raquel Urtasun TTI Chicago Feb 21, 2013 Raquel Urtasun (TTI-C)

Global Optimization

There is no more aggregation step

The problem is formulated as an energy minimization, i.e., MAP on aMarkov random field

The MRF is expressed as the level of:

PixelsFronto-parallel planes for sets of pixelsSlanted plances for sets of pixels

Raquel Urtasun (TTI-C) Computer Vision Feb 21, 2013 14 / 69

Page 54: Stereo and Energy Minimization - Department of Computer ...urtasun/courses/CV/lecture13.pdfStereo and Energy Minimization Raquel Urtasun TTI Chicago Feb 21, 2013 Raquel Urtasun (TTI-C)

MRFs on pixels

Most methods write the energy of the system as

E (d1, · · · , dn) =∑i

Ci (di ) +∑i

∑j∈N (j)

Cij(di , dj)

where di ∈ {0, 1, · · · ,D} represents the disparity of the i−th pixel

If the second term does not exist, can we solve this easily? How?

What about if the second term exists? do you know when we can solve this?

What are these costs functions?

Ci is any of the local matching algorithms we have seen

What about Cij?

Encode for example that neighbors should be similar, or that for similarappearances, the neighbors should be similar. How do we encode this?

Raquel Urtasun (TTI-C) Computer Vision Feb 21, 2013 15 / 69

Page 55: Stereo and Energy Minimization - Department of Computer ...urtasun/courses/CV/lecture13.pdfStereo and Energy Minimization Raquel Urtasun TTI Chicago Feb 21, 2013 Raquel Urtasun (TTI-C)

MRFs on pixels

Most methods write the energy of the system as

E (d1, · · · , dn) =∑i

Ci (di ) +∑i

∑j∈N (j)

Cij(di , dj)

where di ∈ {0, 1, · · · ,D} represents the disparity of the i−th pixel

If the second term does not exist, can we solve this easily? How?

What about if the second term exists? do you know when we can solve this?

What are these costs functions?

Ci is any of the local matching algorithms we have seen

What about Cij?

Encode for example that neighbors should be similar, or that for similarappearances, the neighbors should be similar. How do we encode this?

Raquel Urtasun (TTI-C) Computer Vision Feb 21, 2013 15 / 69

Page 56: Stereo and Energy Minimization - Department of Computer ...urtasun/courses/CV/lecture13.pdfStereo and Energy Minimization Raquel Urtasun TTI Chicago Feb 21, 2013 Raquel Urtasun (TTI-C)

MRFs on pixels

Most methods write the energy of the system as

E (d1, · · · , dn) =∑i

Ci (di ) +∑i

∑j∈N (j)

Cij(di , dj)

where di ∈ {0, 1, · · · ,D} represents the disparity of the i−th pixel

If the second term does not exist, can we solve this easily? How?

What about if the second term exists? do you know when we can solve this?

What are these costs functions?

Ci is any of the local matching algorithms we have seen

What about Cij?

Encode for example that neighbors should be similar, or that for similarappearances, the neighbors should be similar. How do we encode this?

Raquel Urtasun (TTI-C) Computer Vision Feb 21, 2013 15 / 69

Page 57: Stereo and Energy Minimization - Department of Computer ...urtasun/courses/CV/lecture13.pdfStereo and Energy Minimization Raquel Urtasun TTI Chicago Feb 21, 2013 Raquel Urtasun (TTI-C)

MRFs on pixels

Most methods write the energy of the system as

E (d1, · · · , dn) =∑i

Ci (di ) +∑i

∑j∈N (j)

Cij(di , dj)

where di ∈ {0, 1, · · · ,D} represents the disparity of the i−th pixel

If the second term does not exist, can we solve this easily? How?

What about if the second term exists? do you know when we can solve this?

What are these costs functions?

Ci is any of the local matching algorithms we have seen

What about Cij?

Encode for example that neighbors should be similar, or that for similarappearances, the neighbors should be similar. How do we encode this?

Raquel Urtasun (TTI-C) Computer Vision Feb 21, 2013 15 / 69

Page 58: Stereo and Energy Minimization - Department of Computer ...urtasun/courses/CV/lecture13.pdfStereo and Energy Minimization Raquel Urtasun TTI Chicago Feb 21, 2013 Raquel Urtasun (TTI-C)

MRFs on pixels

Most methods write the energy of the system as

E (d1, · · · , dn) =∑i

Ci (di ) +∑i

∑j∈N (j)

Cij(di , dj)

where di ∈ {0, 1, · · · ,D} represents the disparity of the i−th pixel

If the second term does not exist, can we solve this easily? How?

What about if the second term exists? do you know when we can solve this?

What are these costs functions?

Ci is any of the local matching algorithms we have seen

What about Cij?

Encode for example that neighbors should be similar, or that for similarappearances, the neighbors should be similar. How do we encode this?

Raquel Urtasun (TTI-C) Computer Vision Feb 21, 2013 15 / 69

Page 59: Stereo and Energy Minimization - Department of Computer ...urtasun/courses/CV/lecture13.pdfStereo and Energy Minimization Raquel Urtasun TTI Chicago Feb 21, 2013 Raquel Urtasun (TTI-C)

MRFs on pixels

Most methods write the energy of the system as

E (d1, · · · , dn) =∑i

Ci (di ) +∑i

∑j∈N (j)

Cij(di , dj)

where di ∈ {0, 1, · · · ,D} represents the disparity of the i−th pixel

If the second term does not exist, can we solve this easily? How?

What about if the second term exists? do you know when we can solve this?

What are these costs functions?

Ci is any of the local matching algorithms we have seen

What about Cij?

Encode for example that neighbors should be similar, or that for similarappearances, the neighbors should be similar. How do we encode this?

Raquel Urtasun (TTI-C) Computer Vision Feb 21, 2013 15 / 69

Page 60: Stereo and Energy Minimization - Department of Computer ...urtasun/courses/CV/lecture13.pdfStereo and Energy Minimization Raquel Urtasun TTI Chicago Feb 21, 2013 Raquel Urtasun (TTI-C)

MRFs on pixels

Most methods write the energy of the system as

E (d1, · · · , dn) =∑i

Ci (di ) +∑i

∑j∈N (j)

Cij(di , dj)

where di ∈ {0, 1, · · · ,D} represents the disparity of the i−th pixel

If the second term does not exist, can we solve this easily? How?

What about if the second term exists? do you know when we can solve this?

What are these costs functions?

Ci is any of the local matching algorithms we have seen

What about Cij?

Encode for example that neighbors should be similar, or that for similarappearances, the neighbors should be similar. How do we encode this?

Raquel Urtasun (TTI-C) Computer Vision Feb 21, 2013 15 / 69

Page 61: Stereo and Energy Minimization - Department of Computer ...urtasun/courses/CV/lecture13.pdfStereo and Energy Minimization Raquel Urtasun TTI Chicago Feb 21, 2013 Raquel Urtasun (TTI-C)

Unitary cost functions

I(x, y) J(x, y)

C(x, y, d); the disparity space image (DSI) x

d

[Source: N. Snavely]

Raquel Urtasun (TTI-C) Computer Vision Feb 21, 2013 16 / 69

Page 62: Stereo and Energy Minimization - Department of Computer ...urtasun/courses/CV/lecture13.pdfStereo and Energy Minimization Raquel Urtasun TTI Chicago Feb 21, 2013 Raquel Urtasun (TTI-C)

Pairwise cost functions

A function of the disparity of neighboring pixels

C (di , dj) = ρ(di − dj)

with ρ a monotonic increasing function of disparity difference

This can be the L-1 distance

C (di , dj) = |di − dj |

A popular choice is the Potts model

C (di , dj) =

{γ if di 6= yj

0 if di = yj

Figure: from N. Snavely

Raquel Urtasun (TTI-C) Computer Vision Feb 21, 2013 17 / 69

Page 63: Stereo and Energy Minimization - Department of Computer ...urtasun/courses/CV/lecture13.pdfStereo and Energy Minimization Raquel Urtasun TTI Chicago Feb 21, 2013 Raquel Urtasun (TTI-C)

Pairwise cost functions

A function of the disparity of neighboring pixels

C (di , dj) = ρ(di − dj)

with ρ a monotonic increasing function of disparity difference

This can be the L-1 distance

C (di , dj) = |di − dj |

A popular choice is the Potts model

C (di , dj) =

{γ if di 6= yj

0 if di = yj

Figure: from N. SnavelyRaquel Urtasun (TTI-C) Computer Vision Feb 21, 2013 17 / 69

Page 64: Stereo and Energy Minimization - Department of Computer ...urtasun/courses/CV/lecture13.pdfStereo and Energy Minimization Raquel Urtasun TTI Chicago Feb 21, 2013 Raquel Urtasun (TTI-C)

Pairwise cost functions

A function of the disparity of neighboring pixels

C (di , dj) = ρ(di − dj)

with ρ a monotonic increasing function of disparity difference

This can be the L-1 distance

C (di , dj) = |di − dj |

A popular choice is the Potts model

C (di , dj) =

{γ if di 6= yj

0 if di = yj

Figure: from N. SnavelyRaquel Urtasun (TTI-C) Computer Vision Feb 21, 2013 17 / 69

Page 65: Stereo and Energy Minimization - Department of Computer ...urtasun/courses/CV/lecture13.pdfStereo and Energy Minimization Raquel Urtasun TTI Chicago Feb 21, 2013 Raquel Urtasun (TTI-C)

Pairwise cost functions

This can be made dependent on the image evidence

C (di , dj) = ρd(di , dj) · ρl(||I (ui , vi )− I (uj , vj)||)

where ρl is some monotonically decreasing function of intensity differencesthat lowers smoothness costs at high-intensity gradients

More sophisticated cost functions can be defined. Any ideas?

Raquel Urtasun (TTI-C) Computer Vision Feb 21, 2013 18 / 69

Page 66: Stereo and Energy Minimization - Department of Computer ...urtasun/courses/CV/lecture13.pdfStereo and Energy Minimization Raquel Urtasun TTI Chicago Feb 21, 2013 Raquel Urtasun (TTI-C)

Pairwise cost functions

This can be made dependent on the image evidence

C (di , dj) = ρd(di , dj) · ρl(||I (ui , vi )− I (uj , vj)||)

where ρl is some monotonically decreasing function of intensity differencesthat lowers smoothness costs at high-intensity gradients

More sophisticated cost functions can be defined. Any ideas?

Raquel Urtasun (TTI-C) Computer Vision Feb 21, 2013 18 / 69

Page 67: Stereo and Energy Minimization - Department of Computer ...urtasun/courses/CV/lecture13.pdfStereo and Energy Minimization Raquel Urtasun TTI Chicago Feb 21, 2013 Raquel Urtasun (TTI-C)

MRFs on pixels

The energy is defined as

E (d1, · · · , dn) =∑i

C (di ) +∑i

∑j∈N (j)

C (di , dj)

where xi ∈ {0, 1, · · · ,D} represents a variable for the disparity of the i−thpixel

This optimization is in general NP-hard.

Global optima can be obtained in a few cases. Do you know any?

Several ways to get an approximate solution typically

Dynamic programming approximationsSamplingSimulated annealingGraph-cuts: imposes restrictions on the type of pairwise cost functionsMessage passing: iterative algorithms that pass messages betweennodes in the graph. Which graph?

Raquel Urtasun (TTI-C) Computer Vision Feb 21, 2013 19 / 69

Page 68: Stereo and Energy Minimization - Department of Computer ...urtasun/courses/CV/lecture13.pdfStereo and Energy Minimization Raquel Urtasun TTI Chicago Feb 21, 2013 Raquel Urtasun (TTI-C)

MRFs on pixels

The energy is defined as

E (d1, · · · , dn) =∑i

C (di ) +∑i

∑j∈N (j)

C (di , dj)

where xi ∈ {0, 1, · · · ,D} represents a variable for the disparity of the i−thpixel

This optimization is in general NP-hard.

Global optima can be obtained in a few cases. Do you know any?

Several ways to get an approximate solution typically

Dynamic programming approximationsSamplingSimulated annealingGraph-cuts: imposes restrictions on the type of pairwise cost functionsMessage passing: iterative algorithms that pass messages betweennodes in the graph. Which graph?

Raquel Urtasun (TTI-C) Computer Vision Feb 21, 2013 19 / 69

Page 69: Stereo and Energy Minimization - Department of Computer ...urtasun/courses/CV/lecture13.pdfStereo and Energy Minimization Raquel Urtasun TTI Chicago Feb 21, 2013 Raquel Urtasun (TTI-C)

MRFs on pixels

The energy is defined as

E (d1, · · · , dn) =∑i

C (di ) +∑i

∑j∈N (j)

C (di , dj)

where xi ∈ {0, 1, · · · ,D} represents a variable for the disparity of the i−thpixel

This optimization is in general NP-hard.

Global optima can be obtained in a few cases. Do you know any?

Several ways to get an approximate solution typically

Dynamic programming approximations

SamplingSimulated annealingGraph-cuts: imposes restrictions on the type of pairwise cost functionsMessage passing: iterative algorithms that pass messages betweennodes in the graph. Which graph?

Raquel Urtasun (TTI-C) Computer Vision Feb 21, 2013 19 / 69

Page 70: Stereo and Energy Minimization - Department of Computer ...urtasun/courses/CV/lecture13.pdfStereo and Energy Minimization Raquel Urtasun TTI Chicago Feb 21, 2013 Raquel Urtasun (TTI-C)

MRFs on pixels

The energy is defined as

E (d1, · · · , dn) =∑i

C (di ) +∑i

∑j∈N (j)

C (di , dj)

where xi ∈ {0, 1, · · · ,D} represents a variable for the disparity of the i−thpixel

This optimization is in general NP-hard.

Global optima can be obtained in a few cases. Do you know any?

Several ways to get an approximate solution typically

Dynamic programming approximationsSampling

Simulated annealingGraph-cuts: imposes restrictions on the type of pairwise cost functionsMessage passing: iterative algorithms that pass messages betweennodes in the graph. Which graph?

Raquel Urtasun (TTI-C) Computer Vision Feb 21, 2013 19 / 69

Page 71: Stereo and Energy Minimization - Department of Computer ...urtasun/courses/CV/lecture13.pdfStereo and Energy Minimization Raquel Urtasun TTI Chicago Feb 21, 2013 Raquel Urtasun (TTI-C)

MRFs on pixels

The energy is defined as

E (d1, · · · , dn) =∑i

C (di ) +∑i

∑j∈N (j)

C (di , dj)

where xi ∈ {0, 1, · · · ,D} represents a variable for the disparity of the i−thpixel

This optimization is in general NP-hard.

Global optima can be obtained in a few cases. Do you know any?

Several ways to get an approximate solution typically

Dynamic programming approximationsSamplingSimulated annealing

Graph-cuts: imposes restrictions on the type of pairwise cost functionsMessage passing: iterative algorithms that pass messages betweennodes in the graph. Which graph?

Raquel Urtasun (TTI-C) Computer Vision Feb 21, 2013 19 / 69

Page 72: Stereo and Energy Minimization - Department of Computer ...urtasun/courses/CV/lecture13.pdfStereo and Energy Minimization Raquel Urtasun TTI Chicago Feb 21, 2013 Raquel Urtasun (TTI-C)

MRFs on pixels

The energy is defined as

E (d1, · · · , dn) =∑i

C (di ) +∑i

∑j∈N (j)

C (di , dj)

where xi ∈ {0, 1, · · · ,D} represents a variable for the disparity of the i−thpixel

This optimization is in general NP-hard.

Global optima can be obtained in a few cases. Do you know any?

Several ways to get an approximate solution typically

Dynamic programming approximationsSamplingSimulated annealingGraph-cuts: imposes restrictions on the type of pairwise cost functions

Message passing: iterative algorithms that pass messages betweennodes in the graph. Which graph?

Raquel Urtasun (TTI-C) Computer Vision Feb 21, 2013 19 / 69

Page 73: Stereo and Energy Minimization - Department of Computer ...urtasun/courses/CV/lecture13.pdfStereo and Energy Minimization Raquel Urtasun TTI Chicago Feb 21, 2013 Raquel Urtasun (TTI-C)

MRFs on pixels

The energy is defined as

E (d1, · · · , dn) =∑i

C (di ) +∑i

∑j∈N (j)

C (di , dj)

where xi ∈ {0, 1, · · · ,D} represents a variable for the disparity of the i−thpixel

This optimization is in general NP-hard.

Global optima can be obtained in a few cases. Do you know any?

Several ways to get an approximate solution typically

Dynamic programming approximationsSamplingSimulated annealingGraph-cuts: imposes restrictions on the type of pairwise cost functionsMessage passing: iterative algorithms that pass messages betweennodes in the graph. Which graph?

Raquel Urtasun (TTI-C) Computer Vision Feb 21, 2013 19 / 69

Page 74: Stereo and Energy Minimization - Department of Computer ...urtasun/courses/CV/lecture13.pdfStereo and Energy Minimization Raquel Urtasun TTI Chicago Feb 21, 2013 Raquel Urtasun (TTI-C)

MRFs on pixels

The energy is defined as

E (d1, · · · , dn) =∑i

C (di ) +∑i

∑j∈N (j)

C (di , dj)

where xi ∈ {0, 1, · · · ,D} represents a variable for the disparity of the i−thpixel

This optimization is in general NP-hard.

Global optima can be obtained in a few cases. Do you know any?

Several ways to get an approximate solution typically

Dynamic programming approximationsSamplingSimulated annealingGraph-cuts: imposes restrictions on the type of pairwise cost functionsMessage passing: iterative algorithms that pass messages betweennodes in the graph. Which graph?

Raquel Urtasun (TTI-C) Computer Vision Feb 21, 2013 19 / 69

Page 75: Stereo and Energy Minimization - Department of Computer ...urtasun/courses/CV/lecture13.pdfStereo and Energy Minimization Raquel Urtasun TTI Chicago Feb 21, 2013 Raquel Urtasun (TTI-C)

A simple DP algorithm

Neglect the vertical smoothness constraints Scharstein and Szeliski (2002)

Then simply optimize independent scanlines (one for each direction j) in theglobal energy function by recursive computation

Dj(p; d) = C (p; d) + mind′{D(p− j, d ′) + ρd(d − d ′)}

What are the problems?

What do we do with each direction?

Raquel Urtasun (TTI-C) Computer Vision Feb 21, 2013 20 / 69

Page 76: Stereo and Energy Minimization - Department of Computer ...urtasun/courses/CV/lecture13.pdfStereo and Energy Minimization Raquel Urtasun TTI Chicago Feb 21, 2013 Raquel Urtasun (TTI-C)

A simple DP algorithm

Neglect the vertical smoothness constraints Scharstein and Szeliski (2002)

Then simply optimize independent scanlines (one for each direction j) in theglobal energy function by recursive computation

Dj(p; d) = C (p; d) + mind′{D(p− j, d ′) + ρd(d − d ′)}

What are the problems?

What do we do with each direction?

Raquel Urtasun (TTI-C) Computer Vision Feb 21, 2013 20 / 69

Page 77: Stereo and Energy Minimization - Department of Computer ...urtasun/courses/CV/lecture13.pdfStereo and Energy Minimization Raquel Urtasun TTI Chicago Feb 21, 2013 Raquel Urtasun (TTI-C)

A simple DP algorithm

Neglect the vertical smoothness constraints Scharstein and Szeliski (2002)

Then simply optimize independent scanlines (one for each direction j) in theglobal energy function by recursive computation

Dj(p; d) = C (p; d) + mind′{D(p− j, d ′) + ρd(d − d ′)}

What are the problems?

What do we do with each direction?

Raquel Urtasun (TTI-C) Computer Vision Feb 21, 2013 20 / 69

Page 78: Stereo and Energy Minimization - Department of Computer ...urtasun/courses/CV/lecture13.pdfStereo and Energy Minimization Raquel Urtasun TTI Chicago Feb 21, 2013 Raquel Urtasun (TTI-C)

A simple DP algorithm

Neglect the vertical smoothness constraints Scharstein and Szeliski (2002)

Then simply optimize independent scanlines (one for each direction j) in theglobal energy function by recursive computation

Dj(p; d) = C (p; d) + mind′{D(p− j, d ′) + ρd(d − d ′)}

What are the problems?

What do we do with each direction?

Raquel Urtasun (TTI-C) Computer Vision Feb 21, 2013 20 / 69

Page 79: Stereo and Energy Minimization - Department of Computer ...urtasun/courses/CV/lecture13.pdfStereo and Energy Minimization Raquel Urtasun TTI Chicago Feb 21, 2013 Raquel Urtasun (TTI-C)

A simple DP algorithm

Finds smooth path through DPI from left to right

y = 141

x

d

[Source: N. Snavely]

Raquel Urtasun (TTI-C) Computer Vision Feb 21, 2013 21 / 69

Page 80: Stereo and Energy Minimization - Department of Computer ...urtasun/courses/CV/lecture13.pdfStereo and Energy Minimization Raquel Urtasun TTI Chicago Feb 21, 2013 Raquel Urtasun (TTI-C)

A simple DP algorithm

[Source: N. Snavely]Raquel Urtasun (TTI-C) Computer Vision Feb 21, 2013 22 / 69

Page 81: Stereo and Energy Minimization - Department of Computer ...urtasun/courses/CV/lecture13.pdfStereo and Energy Minimization Raquel Urtasun TTI Chicago Feb 21, 2013 Raquel Urtasun (TTI-C)

Semiglobal block matching [Hirschmueller08]

The energy is defined as

E (d1, · · · , dn) =∑i

C (di ) +∑i

∑j∈N (j)

C (di , dj)

with the following pairwise term

C (di , dj) =

0 if di = dj

λ1 if |di − dj | = 1

λ2 otherwise

It computes the costs in each direction

Dj(p; d) = C (p; d) + mind′{D(p− j, d ′) + ρd(d − d ′)}

And aggregate the costs

D(p; d) =∑j

Lj(p, d)

Then do winner take all

Raquel Urtasun (TTI-C) Computer Vision Feb 21, 2013 23 / 69

Page 82: Stereo and Energy Minimization - Department of Computer ...urtasun/courses/CV/lecture13.pdfStereo and Energy Minimization Raquel Urtasun TTI Chicago Feb 21, 2013 Raquel Urtasun (TTI-C)

Semiglobal block matching [Hirschmueller08]

The energy is defined as

E (d1, · · · , dn) =∑i

C (di ) +∑i

∑j∈N (j)

C (di , dj)

with the following pairwise term

C (di , dj) =

0 if di = dj

λ1 if |di − dj | = 1

λ2 otherwise

It computes the costs in each direction

Dj(p; d) = C (p; d) + mind′{D(p− j, d ′) + ρd(d − d ′)}

And aggregate the costs

D(p; d) =∑j

Lj(p, d)

Then do winner take all

Raquel Urtasun (TTI-C) Computer Vision Feb 21, 2013 23 / 69

Page 83: Stereo and Energy Minimization - Department of Computer ...urtasun/courses/CV/lecture13.pdfStereo and Energy Minimization Raquel Urtasun TTI Chicago Feb 21, 2013 Raquel Urtasun (TTI-C)

Semiglobal block matching [Hirschmueller08]

The energy is defined as

E (d1, · · · , dn) =∑i

C (di ) +∑i

∑j∈N (j)

C (di , dj)

with the following pairwise term

C (di , dj) =

0 if di = dj

λ1 if |di − dj | = 1

λ2 otherwise

It computes the costs in each direction

Dj(p; d) = C (p; d) + mind′{D(p− j, d ′) + ρd(d − d ′)}

And aggregate the costs

D(p; d) =∑j

Lj(p, d)

Then do winner take all

Raquel Urtasun (TTI-C) Computer Vision Feb 21, 2013 23 / 69

Page 84: Stereo and Energy Minimization - Department of Computer ...urtasun/courses/CV/lecture13.pdfStereo and Energy Minimization Raquel Urtasun TTI Chicago Feb 21, 2013 Raquel Urtasun (TTI-C)

Semiglobal block matching [Hirschmueller08]

The energy is defined as

E (d1, · · · , dn) =∑i

C (di ) +∑i

∑j∈N (j)

C (di , dj)

with the following pairwise term

C (di , dj) =

0 if di = dj

λ1 if |di − dj | = 1

λ2 otherwise

It computes the costs in each direction

Dj(p; d) = C (p; d) + mind′{D(p− j, d ′) + ρd(d − d ′)}

And aggregate the costs

D(p; d) =∑j

Lj(p, d)

Then do winner take all

Raquel Urtasun (TTI-C) Computer Vision Feb 21, 2013 23 / 69

Page 85: Stereo and Energy Minimization - Department of Computer ...urtasun/courses/CV/lecture13.pdfStereo and Energy Minimization Raquel Urtasun TTI Chicago Feb 21, 2013 Raquel Urtasun (TTI-C)

Global Minimization Techniques

Multiple ways to get an approximate solution typically

Dynamic programming approximations

Sampling

Simulated annealing

Graph-cuts: imposes restrictions on the type of pairwise cost functions

Message passing: iterative algorithms that pass messages between nodes inthe graph. Which graph?

Raquel Urtasun (TTI-C) Computer Vision Feb 21, 2013 24 / 69

Page 86: Stereo and Energy Minimization - Department of Computer ...urtasun/courses/CV/lecture13.pdfStereo and Energy Minimization Raquel Urtasun TTI Chicago Feb 21, 2013 Raquel Urtasun (TTI-C)

Let’s look more generaly into MRFs

Raquel Urtasun (TTI-C) Computer Vision Feb 21, 2013 25 / 69

Page 87: Stereo and Energy Minimization - Department of Computer ...urtasun/courses/CV/lecture13.pdfStereo and Energy Minimization Raquel Urtasun TTI Chicago Feb 21, 2013 Raquel Urtasun (TTI-C)

Structure Prediction

Input: x ∈ X , typically an image.

Output: label y ∈ Y.

Consider a score function θ(x , y) called potential or feature such that

θ(x , y) =

{high if y is a good label for x

low if y is a bad label for x

We want to predict a label as

y∗ = arg maxyθ(x , y)

Raquel Urtasun (TTI-C) Computer Vision Feb 21, 2013 26 / 69

Page 88: Stereo and Energy Minimization - Department of Computer ...urtasun/courses/CV/lecture13.pdfStereo and Energy Minimization Raquel Urtasun TTI Chicago Feb 21, 2013 Raquel Urtasun (TTI-C)

Score Decomposition

We assume that the score decomposes

θ(y |x) =∑i

θi (yi ) +∑α

θα(yα)

This represents a (conditional) Markov Random Field (CRF)

p(y |x) =1

Z

∏i

ψi (x , yi )∏α

ψα(x , yα)

with logψi (x , yi ) = θi (x , yi ), and logψα(x , yα) = θα(x , yα).

Z =∑

y

∏i ψi (x , yi )

∏α ψα(x , yα) is the partition function.

Prediction also decomposes

y∗ = arg maxy

∑i

θi (yi ) +∑α

θα(yα)

This in general is NP hard.

Raquel Urtasun (TTI-C) Computer Vision Feb 21, 2013 27 / 69

Page 89: Stereo and Energy Minimization - Department of Computer ...urtasun/courses/CV/lecture13.pdfStereo and Energy Minimization Raquel Urtasun TTI Chicago Feb 21, 2013 Raquel Urtasun (TTI-C)

Score Decomposition

We assume that the score decomposes

θ(y |x) =∑i

θi (yi ) +∑α

θα(yα)

This represents a (conditional) Markov Random Field (CRF)

p(y |x) =1

Z

∏i

ψi (x , yi )∏α

ψα(x , yα)

with logψi (x , yi ) = θi (x , yi ), and logψα(x , yα) = θα(x , yα).

Z =∑

y

∏i ψi (x , yi )

∏α ψα(x , yα) is the partition function.

Prediction also decomposes

y∗ = arg maxy

∑i

θi (yi ) +∑α

θα(yα)

This in general is NP hard.

Raquel Urtasun (TTI-C) Computer Vision Feb 21, 2013 27 / 69

Page 90: Stereo and Energy Minimization - Department of Computer ...urtasun/courses/CV/lecture13.pdfStereo and Energy Minimization Raquel Urtasun TTI Chicago Feb 21, 2013 Raquel Urtasun (TTI-C)

Score Decomposition

We assume that the score decomposes

θ(y |x) =∑i

θi (yi ) +∑α

θα(yα)

This represents a (conditional) Markov Random Field (CRF)

p(y |x) =1

Z

∏i

ψi (x , yi )∏α

ψα(x , yα)

with logψi (x , yi ) = θi (x , yi ), and logψα(x , yα) = θα(x , yα).

Z =∑

y

∏i ψi (x , yi )

∏α ψα(x , yα) is the partition function.

Prediction also decomposes

y∗ = arg maxy

∑i

θi (yi ) +∑α

θα(yα)

This in general is NP hard.

Raquel Urtasun (TTI-C) Computer Vision Feb 21, 2013 27 / 69

Page 91: Stereo and Energy Minimization - Department of Computer ...urtasun/courses/CV/lecture13.pdfStereo and Energy Minimization Raquel Urtasun TTI Chicago Feb 21, 2013 Raquel Urtasun (TTI-C)

Score Decomposition

We assume that the score decomposes

θ(y |x) =∑i

θi (yi ) +∑α

θα(yα)

This represents a (conditional) Markov Random Field (CRF)

p(y |x) =1

Z

∏i

ψi (x , yi )∏α

ψα(x , yα)

with logψi (x , yi ) = θi (x , yi ), and logψα(x , yα) = θα(x , yα).

Z =∑

y

∏i ψi (x , yi )

∏α ψα(x , yα) is the partition function.

Prediction also decomposes

y∗ = arg maxy

∑i

θi (yi ) +∑α

θα(yα)

This in general is NP hard.

Raquel Urtasun (TTI-C) Computer Vision Feb 21, 2013 27 / 69

Page 92: Stereo and Energy Minimization - Department of Computer ...urtasun/courses/CV/lecture13.pdfStereo and Energy Minimization Raquel Urtasun TTI Chicago Feb 21, 2013 Raquel Urtasun (TTI-C)

Score Decomposition

We assume that the score decomposes

θ(y |x) =∑i

θi (yi ) +∑α

θα(yα)

This represents a (conditional) Markov Random Field (CRF)

p(y |x) =1

Z

∏i

ψi (x , yi )∏α

ψα(x , yα)

with logψi (x , yi ) = θi (x , yi ), and logψα(x , yα) = θα(x , yα).

Z =∑

y

∏i ψi (x , yi )

∏α ψα(x , yα) is the partition function.

Prediction also decomposes

y∗ = arg maxy

∑i

θi (yi ) +∑α

θα(yα)

This in general is NP hard.

Raquel Urtasun (TTI-C) Computer Vision Feb 21, 2013 27 / 69

Page 93: Stereo and Energy Minimization - Department of Computer ...urtasun/courses/CV/lecture13.pdfStereo and Energy Minimization Raquel Urtasun TTI Chicago Feb 21, 2013 Raquel Urtasun (TTI-C)

Markov Networks

A clique in an undirected graph is a subset of its vertices such that everytwo vertices in the subset are connected by an edge.

A maximal clique is a clique that cannot be extended by including onemore adjacent vertex.

For a set of variables y = {y1, · · · , yN} a Markov network is defined as aproduct of potentials over the maximal cliques yα of the graph G

p(y1, · · · , yN) =1

Z

∏α

ψα(yα)

Special case: cliques of size 2 – pairwise Markov network

In case all potentials are strictly positive this is called a Gibbs distribution

Example: p(a, b, c) = 1Z ψa,c(a, c)ψbc(b, c)

Raquel Urtasun (TTI-C) Computer Vision Feb 21, 2013 28 / 69

Page 94: Stereo and Energy Minimization - Department of Computer ...urtasun/courses/CV/lecture13.pdfStereo and Energy Minimization Raquel Urtasun TTI Chicago Feb 21, 2013 Raquel Urtasun (TTI-C)

Markov Networks

A clique in an undirected graph is a subset of its vertices such that everytwo vertices in the subset are connected by an edge.

A maximal clique is a clique that cannot be extended by including onemore adjacent vertex.

For a set of variables y = {y1, · · · , yN} a Markov network is defined as aproduct of potentials over the maximal cliques yα of the graph G

p(y1, · · · , yN) =1

Z

∏α

ψα(yα)

Special case: cliques of size 2 – pairwise Markov network

In case all potentials are strictly positive this is called a Gibbs distribution

Example: p(a, b, c) = 1Z ψa,c(a, c)ψbc(b, c)

Raquel Urtasun (TTI-C) Computer Vision Feb 21, 2013 28 / 69

Page 95: Stereo and Energy Minimization - Department of Computer ...urtasun/courses/CV/lecture13.pdfStereo and Energy Minimization Raquel Urtasun TTI Chicago Feb 21, 2013 Raquel Urtasun (TTI-C)

Markov Networks

A clique in an undirected graph is a subset of its vertices such that everytwo vertices in the subset are connected by an edge.

A maximal clique is a clique that cannot be extended by including onemore adjacent vertex.

For a set of variables y = {y1, · · · , yN} a Markov network is defined as aproduct of potentials over the maximal cliques yα of the graph G

p(y1, · · · , yN) =1

Z

∏α

ψα(yα)

Special case: cliques of size 2 – pairwise Markov network

In case all potentials are strictly positive this is called a Gibbs distribution

Example: p(a, b, c) = 1Z ψa,c(a, c)ψbc(b, c)

Raquel Urtasun (TTI-C) Computer Vision Feb 21, 2013 28 / 69

Page 96: Stereo and Energy Minimization - Department of Computer ...urtasun/courses/CV/lecture13.pdfStereo and Energy Minimization Raquel Urtasun TTI Chicago Feb 21, 2013 Raquel Urtasun (TTI-C)

Markov Networks

A clique in an undirected graph is a subset of its vertices such that everytwo vertices in the subset are connected by an edge.

A maximal clique is a clique that cannot be extended by including onemore adjacent vertex.

For a set of variables y = {y1, · · · , yN} a Markov network is defined as aproduct of potentials over the maximal cliques yα of the graph G

p(y1, · · · , yN) =1

Z

∏α

ψα(yα)

Special case: cliques of size 2 – pairwise Markov network

In case all potentials are strictly positive this is called a Gibbs distribution

Example: p(a, b, c) = 1Z ψa,c(a, c)ψbc(b, c)

Raquel Urtasun (TTI-C) Computer Vision Feb 21, 2013 28 / 69

Page 97: Stereo and Energy Minimization - Department of Computer ...urtasun/courses/CV/lecture13.pdfStereo and Energy Minimization Raquel Urtasun TTI Chicago Feb 21, 2013 Raquel Urtasun (TTI-C)

Markov Networks

A clique in an undirected graph is a subset of its vertices such that everytwo vertices in the subset are connected by an edge.

A maximal clique is a clique that cannot be extended by including onemore adjacent vertex.

For a set of variables y = {y1, · · · , yN} a Markov network is defined as aproduct of potentials over the maximal cliques yα of the graph G

p(y1, · · · , yN) =1

Z

∏α

ψα(yα)

Special case: cliques of size 2 – pairwise Markov network

In case all potentials are strictly positive this is called a Gibbs distribution

Example: p(a, b, c) = 1Z ψa,c(a, c)ψbc(b, c)

Raquel Urtasun (TTI-C) Computer Vision Feb 21, 2013 28 / 69

Page 98: Stereo and Energy Minimization - Department of Computer ...urtasun/courses/CV/lecture13.pdfStereo and Energy Minimization Raquel Urtasun TTI Chicago Feb 21, 2013 Raquel Urtasun (TTI-C)

Markov Networks

A clique in an undirected graph is a subset of its vertices such that everytwo vertices in the subset are connected by an edge.

A maximal clique is a clique that cannot be extended by including onemore adjacent vertex.

For a set of variables y = {y1, · · · , yN} a Markov network is defined as aproduct of potentials over the maximal cliques yα of the graph G

p(y1, · · · , yN) =1

Z

∏α

ψα(yα)

Special case: cliques of size 2 – pairwise Markov network

In case all potentials are strictly positive this is called a Gibbs distribution

Example: p(a, b, c) = 1Z ψa,c(a, c)ψbc(b, c)

Raquel Urtasun (TTI-C) Computer Vision Feb 21, 2013 28 / 69

Page 99: Stereo and Energy Minimization - Department of Computer ...urtasun/courses/CV/lecture13.pdfStereo and Energy Minimization Raquel Urtasun TTI Chicago Feb 21, 2013 Raquel Urtasun (TTI-C)

Properties of Markov Network

Marginalizing over c makes a and b dependent

Conditioning on c makes a and b independent

[Source: P. Gehler]

Raquel Urtasun (TTI-C) Computer Vision Feb 21, 2013 29 / 69

Page 100: Stereo and Energy Minimization - Department of Computer ...urtasun/courses/CV/lecture13.pdfStereo and Energy Minimization Raquel Urtasun TTI Chicago Feb 21, 2013 Raquel Urtasun (TTI-C)

Properties of Markov Network

Marginalizing over c makes a and b dependent

Conditioning on c makes a and b independent

[Source: P. Gehler]

Raquel Urtasun (TTI-C) Computer Vision Feb 21, 2013 29 / 69

Page 101: Stereo and Energy Minimization - Department of Computer ...urtasun/courses/CV/lecture13.pdfStereo and Energy Minimization Raquel Urtasun TTI Chicago Feb 21, 2013 Raquel Urtasun (TTI-C)

Local and Global Markov properties

Local Markov property: condition on neighbours makes indep. of the rest

p(yi |y \ {yi}) = p(y |ne(yi ))

Example: y4⊥{y1, y7}|{y2, y3, y5, y6}Global Markov Property: For disjoint sets of variables (A,B,S), where Sseparates A from B then A⊥B|S

S is called a separator.Example: y1⊥y7|{y4}

[Source: P. Gehler]

Raquel Urtasun (TTI-C) Computer Vision Feb 21, 2013 30 / 69

Page 102: Stereo and Energy Minimization - Department of Computer ...urtasun/courses/CV/lecture13.pdfStereo and Energy Minimization Raquel Urtasun TTI Chicago Feb 21, 2013 Raquel Urtasun (TTI-C)

Local and Global Markov properties

Local Markov property: condition on neighbours makes indep. of the rest

p(yi |y \ {yi}) = p(y |ne(yi ))

Example: y4⊥{y1, y7}|{y2, y3, y5, y6}Global Markov Property: For disjoint sets of variables (A,B,S), where Sseparates A from B then A⊥B|SS is called a separator.Example: y1⊥y7|{y4}

[Source: P. Gehler]Raquel Urtasun (TTI-C) Computer Vision Feb 21, 2013 30 / 69

Page 103: Stereo and Energy Minimization - Department of Computer ...urtasun/courses/CV/lecture13.pdfStereo and Energy Minimization Raquel Urtasun TTI Chicago Feb 21, 2013 Raquel Urtasun (TTI-C)

Local and Global Markov properties

Local Markov property: condition on neighbours makes indep. of the rest

p(yi |y \ {yi}) = p(y |ne(yi ))

Example: y4⊥{y1, y7}|{y2, y3, y5, y6}Global Markov Property: For disjoint sets of variables (A,B,S), where Sseparates A from B then A⊥B|SS is called a separator.Example: y1⊥y7|{y4}

[Source: P. Gehler]Raquel Urtasun (TTI-C) Computer Vision Feb 21, 2013 30 / 69

Page 104: Stereo and Energy Minimization - Department of Computer ...urtasun/courses/CV/lecture13.pdfStereo and Energy Minimization Raquel Urtasun TTI Chicago Feb 21, 2013 Raquel Urtasun (TTI-C)

Relationship Potentials to Graphs

Consider

p(a, b, c) =1

Zψ(a, b)ψ(b, c)ψ(c , a)

What is the corresponding Markov network (graphical representation)?

Which other factorization is represented by this network?

The factorization is not specified by the graph

Let’s look at Factor Graphs

Raquel Urtasun (TTI-C) Computer Vision Feb 21, 2013 31 / 69

Page 105: Stereo and Energy Minimization - Department of Computer ...urtasun/courses/CV/lecture13.pdfStereo and Energy Minimization Raquel Urtasun TTI Chicago Feb 21, 2013 Raquel Urtasun (TTI-C)

Relationship Potentials to Graphs

Consider

p(a, b, c) =1

Zψ(a, b)ψ(b, c)ψ(c , a)

What is the corresponding Markov network (graphical representation)?

Which other factorization is represented by this network?

The factorization is not specified by the graph

Let’s look at Factor Graphs

Raquel Urtasun (TTI-C) Computer Vision Feb 21, 2013 31 / 69

Page 106: Stereo and Energy Minimization - Department of Computer ...urtasun/courses/CV/lecture13.pdfStereo and Energy Minimization Raquel Urtasun TTI Chicago Feb 21, 2013 Raquel Urtasun (TTI-C)

Relationship Potentials to Graphs

Consider

p(a, b, c) =1

Zψ(a, b)ψ(b, c)ψ(c , a)

What is the corresponding Markov network (graphical representation)?

Which other factorization is represented by this network?

The factorization is not specified by the graph

Let’s look at Factor Graphs

Raquel Urtasun (TTI-C) Computer Vision Feb 21, 2013 31 / 69

Page 107: Stereo and Energy Minimization - Department of Computer ...urtasun/courses/CV/lecture13.pdfStereo and Energy Minimization Raquel Urtasun TTI Chicago Feb 21, 2013 Raquel Urtasun (TTI-C)

Relationship Potentials to Graphs

Consider

p(a, b, c) =1

Zψ(a, b)ψ(b, c)ψ(c , a)

What is the corresponding Markov network (graphical representation)?

Which other factorization is represented by this network?

p(a, b, c) =1

Zψ(a, b, c)

The factorization is not specified by the graph

Let’s look at Factor Graphs

Raquel Urtasun (TTI-C) Computer Vision Feb 21, 2013 31 / 69

Page 108: Stereo and Energy Minimization - Department of Computer ...urtasun/courses/CV/lecture13.pdfStereo and Energy Minimization Raquel Urtasun TTI Chicago Feb 21, 2013 Raquel Urtasun (TTI-C)

Relationship Potentials to Graphs

Consider

p(a, b, c) =1

Zψ(a, b)ψ(b, c)ψ(c , a)

What is the corresponding Markov network (graphical representation)?

Which other factorization is represented by this network?

p(a, b, c) =1

Zψ(a, b, c)

The factorization is not specified by the graph

Let’s look at Factor Graphs

Raquel Urtasun (TTI-C) Computer Vision Feb 21, 2013 31 / 69

Page 109: Stereo and Energy Minimization - Department of Computer ...urtasun/courses/CV/lecture13.pdfStereo and Energy Minimization Raquel Urtasun TTI Chicago Feb 21, 2013 Raquel Urtasun (TTI-C)

Relationship Potentials to Graphs

Consider

p(a, b, c) =1

Zψ(a, b)ψ(b, c)ψ(c , a)

What is the corresponding Markov network (graphical representation)?

Which other factorization is represented by this network?

p(a, b, c) =1

Zψ(a, b, c)

The factorization is not specified by the graph

Let’s look at Factor Graphs

Raquel Urtasun (TTI-C) Computer Vision Feb 21, 2013 31 / 69

Page 110: Stereo and Energy Minimization - Department of Computer ...urtasun/courses/CV/lecture13.pdfStereo and Energy Minimization Raquel Urtasun TTI Chicago Feb 21, 2013 Raquel Urtasun (TTI-C)

Factor Graphs

Now consider we introduce an extra node (a square) for each factor

The factor graph (FG) has a node (represented by a square) for each factorψ(yα) and a variable node (represented by a circle) for each variable xi .

Left: Markov Network

Middle: Factor graph representation of ψ(a, b, c)

Right: Factor graph representation of ψ(a, b)ψ(b, c)ψ(c , a)

Different factor graphs can have the same Markov network

[Source: P. Gehler]

Raquel Urtasun (TTI-C) Computer Vision Feb 21, 2013 32 / 69

Page 111: Stereo and Energy Minimization - Department of Computer ...urtasun/courses/CV/lecture13.pdfStereo and Energy Minimization Raquel Urtasun TTI Chicago Feb 21, 2013 Raquel Urtasun (TTI-C)

Examples

Which distribution?

What factor graph?

p(x1, x2, x3) = p(x1)p(x2)p(x3|x1, x2)

[Source: P. Gehler]

Raquel Urtasun (TTI-C) Computer Vision Feb 21, 2013 33 / 69

Page 112: Stereo and Energy Minimization - Department of Computer ...urtasun/courses/CV/lecture13.pdfStereo and Energy Minimization Raquel Urtasun TTI Chicago Feb 21, 2013 Raquel Urtasun (TTI-C)

Inference in trees

Given distribution p(y1, · · · , yn)

Inference: computing functions of the distribution

meanmarginalconditionals

Marginal inference in singly-connected graph (trees)

Later: extensions to loopy graphs

[Source: P. Gehler]

Raquel Urtasun (TTI-C) Computer Vision Feb 21, 2013 34 / 69

Page 113: Stereo and Energy Minimization - Department of Computer ...urtasun/courses/CV/lecture13.pdfStereo and Energy Minimization Raquel Urtasun TTI Chicago Feb 21, 2013 Raquel Urtasun (TTI-C)

Variable Elimination

[Source: P. Gehler]

Raquel Urtasun (TTI-C) Computer Vision Feb 21, 2013 35 / 69

Page 114: Stereo and Energy Minimization - Department of Computer ...urtasun/courses/CV/lecture13.pdfStereo and Energy Minimization Raquel Urtasun TTI Chicago Feb 21, 2013 Raquel Urtasun (TTI-C)

Variable Elimination

[Source: P. Gehler]Raquel Urtasun (TTI-C) Computer Vision Feb 21, 2013 36 / 69

Page 115: Stereo and Energy Minimization - Department of Computer ...urtasun/courses/CV/lecture13.pdfStereo and Energy Minimization Raquel Urtasun TTI Chicago Feb 21, 2013 Raquel Urtasun (TTI-C)

Variable Elimination

[Source: P. Gehler]Raquel Urtasun (TTI-C) Computer Vision Feb 21, 2013 37 / 69

Page 116: Stereo and Energy Minimization - Department of Computer ...urtasun/courses/CV/lecture13.pdfStereo and Energy Minimization Raquel Urtasun TTI Chicago Feb 21, 2013 Raquel Urtasun (TTI-C)

Finding Conditional Marginals

[Source: P. Gehler]

Raquel Urtasun (TTI-C) Computer Vision Feb 21, 2013 38 / 69

Page 117: Stereo and Energy Minimization - Department of Computer ...urtasun/courses/CV/lecture13.pdfStereo and Energy Minimization Raquel Urtasun TTI Chicago Feb 21, 2013 Raquel Urtasun (TTI-C)

Finding Conditional Marginals

[Source: P. Gehler]

Raquel Urtasun (TTI-C) Computer Vision Feb 21, 2013 39 / 69

Page 118: Stereo and Energy Minimization - Department of Computer ...urtasun/courses/CV/lecture13.pdfStereo and Energy Minimization Raquel Urtasun TTI Chicago Feb 21, 2013 Raquel Urtasun (TTI-C)

Now with factor graphs

[Source: P. Gehler]Raquel Urtasun (TTI-C) Computer Vision Feb 21, 2013 40 / 69

Page 119: Stereo and Energy Minimization - Department of Computer ...urtasun/courses/CV/lecture13.pdfStereo and Energy Minimization Raquel Urtasun TTI Chicago Feb 21, 2013 Raquel Urtasun (TTI-C)

Inference in Chain Structured Factor Graphs

Simply recurse further

γm→n(n) carries the information beyond m

We did not need the factors in general (next) we will see that making adistinction is helpful

[Source: P. Gehler]

Raquel Urtasun (TTI-C) Computer Vision Feb 21, 2013 41 / 69

Page 120: Stereo and Energy Minimization - Department of Computer ...urtasun/courses/CV/lecture13.pdfStereo and Energy Minimization Raquel Urtasun TTI Chicago Feb 21, 2013 Raquel Urtasun (TTI-C)

General singly-connected factor graphs I

[Source: P. Gehler]

Raquel Urtasun (TTI-C) Computer Vision Feb 21, 2013 42 / 69

Page 121: Stereo and Energy Minimization - Department of Computer ...urtasun/courses/CV/lecture13.pdfStereo and Energy Minimization Raquel Urtasun TTI Chicago Feb 21, 2013 Raquel Urtasun (TTI-C)

General singly-connected factor graphs II

[Source: P. Gehler]Raquel Urtasun (TTI-C) Computer Vision Feb 21, 2013 43 / 69

Page 122: Stereo and Energy Minimization - Department of Computer ...urtasun/courses/CV/lecture13.pdfStereo and Energy Minimization Raquel Urtasun TTI Chicago Feb 21, 2013 Raquel Urtasun (TTI-C)

General singly-connected factor graphs III

[Source: P. Gehler]

Raquel Urtasun (TTI-C) Computer Vision Feb 21, 2013 44 / 69

Page 123: Stereo and Energy Minimization - Department of Computer ...urtasun/courses/CV/lecture13.pdfStereo and Energy Minimization Raquel Urtasun TTI Chicago Feb 21, 2013 Raquel Urtasun (TTI-C)

General singly-connected factor graphs IV

[Source: P. Gehler]Raquel Urtasun (TTI-C) Computer Vision Feb 21, 2013 45 / 69

Page 124: Stereo and Energy Minimization - Department of Computer ...urtasun/courses/CV/lecture13.pdfStereo and Energy Minimization Raquel Urtasun TTI Chicago Feb 21, 2013 Raquel Urtasun (TTI-C)

Summary

Once computed, messages can be re-used

All marginals p(c), p(d), p(c , d), · · · can be written as a function ofmessages

We need an algorithm to compute all messages: Sum-Product algorithm

[Source: P. Gehler]

Raquel Urtasun (TTI-C) Computer Vision Feb 21, 2013 46 / 69

Page 125: Stereo and Energy Minimization - Department of Computer ...urtasun/courses/CV/lecture13.pdfStereo and Energy Minimization Raquel Urtasun TTI Chicago Feb 21, 2013 Raquel Urtasun (TTI-C)

Sum-product algorithm overview

Algorithm to compute all messages efficiently, assuming the graph issingly-connected

It can be used to compute any desired marginals

Also known as belief propagation (BP)

The algorithm is composed of

1 Initialization

2 Variable to Factor message

3 Factor to Variable message

[Source: P. Gehler]

Raquel Urtasun (TTI-C) Computer Vision Feb 21, 2013 47 / 69

Page 126: Stereo and Energy Minimization - Department of Computer ...urtasun/courses/CV/lecture13.pdfStereo and Energy Minimization Raquel Urtasun TTI Chicago Feb 21, 2013 Raquel Urtasun (TTI-C)

1. Initialization

Messages from extremal (simplical) node factors are initialized to the factor(left)

Messages from extremal (simplical) variable nodes are set to unity (right)

[Source: P. Gehler]

Raquel Urtasun (TTI-C) Computer Vision Feb 21, 2013 48 / 69

Page 127: Stereo and Energy Minimization - Department of Computer ...urtasun/courses/CV/lecture13.pdfStereo and Energy Minimization Raquel Urtasun TTI Chicago Feb 21, 2013 Raquel Urtasun (TTI-C)

2. Variable to Factor message

[Source: P. Gehler]

Raquel Urtasun (TTI-C) Computer Vision Feb 21, 2013 49 / 69

Page 128: Stereo and Energy Minimization - Department of Computer ...urtasun/courses/CV/lecture13.pdfStereo and Energy Minimization Raquel Urtasun TTI Chicago Feb 21, 2013 Raquel Urtasun (TTI-C)

3. Factor to Variable message

We sum over all states in the set of variables

This explains the name for the algorithm (sum-product)

[Source: P. Gehler]

Raquel Urtasun (TTI-C) Computer Vision Feb 21, 2013 50 / 69

Page 129: Stereo and Energy Minimization - Department of Computer ...urtasun/courses/CV/lecture13.pdfStereo and Energy Minimization Raquel Urtasun TTI Chicago Feb 21, 2013 Raquel Urtasun (TTI-C)

Marginal computation

[Source: P. Gehler]

Raquel Urtasun (TTI-C) Computer Vision Feb 21, 2013 51 / 69

Page 130: Stereo and Energy Minimization - Department of Computer ...urtasun/courses/CV/lecture13.pdfStereo and Energy Minimization Raquel Urtasun TTI Chicago Feb 21, 2013 Raquel Urtasun (TTI-C)

Message Ordering

[Source: P. Gehler]

Raquel Urtasun (TTI-C) Computer Vision Feb 21, 2013 52 / 69

Page 131: Stereo and Energy Minimization - Department of Computer ...urtasun/courses/CV/lecture13.pdfStereo and Energy Minimization Raquel Urtasun TTI Chicago Feb 21, 2013 Raquel Urtasun (TTI-C)

Problems with loops

[Source: P. Gehler]Raquel Urtasun (TTI-C) Computer Vision Feb 21, 2013 53 / 69

Page 132: Stereo and Energy Minimization - Department of Computer ...urtasun/courses/CV/lecture13.pdfStereo and Energy Minimization Raquel Urtasun TTI Chicago Feb 21, 2013 Raquel Urtasun (TTI-C)

What to infer?

[Source: P. Gehler]Raquel Urtasun (TTI-C) Computer Vision Feb 21, 2013 54 / 69

Page 133: Stereo and Energy Minimization - Department of Computer ...urtasun/courses/CV/lecture13.pdfStereo and Energy Minimization Raquel Urtasun TTI Chicago Feb 21, 2013 Raquel Urtasun (TTI-C)

Computing the Partition Function

[Source: P. Gehler]

Raquel Urtasun (TTI-C) Computer Vision Feb 21, 2013 55 / 69

Page 134: Stereo and Energy Minimization - Department of Computer ...urtasun/courses/CV/lecture13.pdfStereo and Energy Minimization Raquel Urtasun TTI Chicago Feb 21, 2013 Raquel Urtasun (TTI-C)

Log Messages

[Source: P. Gehler]

Raquel Urtasun (TTI-C) Computer Vision Feb 21, 2013 56 / 69

Page 135: Stereo and Energy Minimization - Department of Computer ...urtasun/courses/CV/lecture13.pdfStereo and Energy Minimization Raquel Urtasun TTI Chicago Feb 21, 2013 Raquel Urtasun (TTI-C)

Log Messages

[Source: P. Gehler]

Raquel Urtasun (TTI-C) Computer Vision Feb 21, 2013 57 / 69

Page 136: Stereo and Energy Minimization - Department of Computer ...urtasun/courses/CV/lecture13.pdfStereo and Energy Minimization Raquel Urtasun TTI Chicago Feb 21, 2013 Raquel Urtasun (TTI-C)

Trick

[Source: P. Gehler]

Raquel Urtasun (TTI-C) Computer Vision Feb 21, 2013 58 / 69

Page 137: Stereo and Energy Minimization - Department of Computer ...urtasun/courses/CV/lecture13.pdfStereo and Energy Minimization Raquel Urtasun TTI Chicago Feb 21, 2013 Raquel Urtasun (TTI-C)

Finding the maximal state: Max-Product

[Source: P. Gehler]

Raquel Urtasun (TTI-C) Computer Vision Feb 21, 2013 59 / 69

Page 138: Stereo and Energy Minimization - Department of Computer ...urtasun/courses/CV/lecture13.pdfStereo and Energy Minimization Raquel Urtasun TTI Chicago Feb 21, 2013 Raquel Urtasun (TTI-C)

Be careful: not maximal marginal states!

[Source: P. Gehler]Raquel Urtasun (TTI-C) Computer Vision Feb 21, 2013 60 / 69

Page 139: Stereo and Energy Minimization - Department of Computer ...urtasun/courses/CV/lecture13.pdfStereo and Energy Minimization Raquel Urtasun TTI Chicago Feb 21, 2013 Raquel Urtasun (TTI-C)

Example chain

[Source: P. Gehler]

Raquel Urtasun (TTI-C) Computer Vision Feb 21, 2013 61 / 69

Page 140: Stereo and Energy Minimization - Department of Computer ...urtasun/courses/CV/lecture13.pdfStereo and Energy Minimization Raquel Urtasun TTI Chicago Feb 21, 2013 Raquel Urtasun (TTI-C)

Example chain

[Source: P. Gehler]

Raquel Urtasun (TTI-C) Computer Vision Feb 21, 2013 62 / 69

Page 141: Stereo and Energy Minimization - Department of Computer ...urtasun/courses/CV/lecture13.pdfStereo and Energy Minimization Raquel Urtasun TTI Chicago Feb 21, 2013 Raquel Urtasun (TTI-C)

Trees

[Source: P. Gehler]Raquel Urtasun (TTI-C) Computer Vision Feb 21, 2013 63 / 69

Page 142: Stereo and Energy Minimization - Department of Computer ...urtasun/courses/CV/lecture13.pdfStereo and Energy Minimization Raquel Urtasun TTI Chicago Feb 21, 2013 Raquel Urtasun (TTI-C)

Max-Product Algorithm

Pick any variable as root and

1 Initialisation (same as sum-product)

2 Variable to Factor message (same as sum-product)

3 Factor to Variable message

Then compute the maximal state

[Source: P. Gehler]

Raquel Urtasun (TTI-C) Computer Vision Feb 21, 2013 64 / 69

Page 143: Stereo and Energy Minimization - Department of Computer ...urtasun/courses/CV/lecture13.pdfStereo and Energy Minimization Raquel Urtasun TTI Chicago Feb 21, 2013 Raquel Urtasun (TTI-C)

1. Initialization

Messages from extremal node factors are initialized to the factor

Messages from extremal variable nodes are set to unity

Same as sum product

[Source: P. Gehler]

Raquel Urtasun (TTI-C) Computer Vision Feb 21, 2013 65 / 69

Page 144: Stereo and Energy Minimization - Department of Computer ...urtasun/courses/CV/lecture13.pdfStereo and Energy Minimization Raquel Urtasun TTI Chicago Feb 21, 2013 Raquel Urtasun (TTI-C)

2. Variable to Factor message

Same as for sum-product

[Source: P. Gehler]

Raquel Urtasun (TTI-C) Computer Vision Feb 21, 2013 66 / 69

Page 145: Stereo and Energy Minimization - Department of Computer ...urtasun/courses/CV/lecture13.pdfStereo and Energy Minimization Raquel Urtasun TTI Chicago Feb 21, 2013 Raquel Urtasun (TTI-C)

3. Factor to Variable message

Different message than in sum-product

This is now a max-product

[Source: P. Gehler]Raquel Urtasun (TTI-C) Computer Vision Feb 21, 2013 67 / 69

Page 146: Stereo and Energy Minimization - Department of Computer ...urtasun/courses/CV/lecture13.pdfStereo and Energy Minimization Raquel Urtasun TTI Chicago Feb 21, 2013 Raquel Urtasun (TTI-C)

Maximal state of Variable

This does not work with loops

Same problem as the sum product algorithm

[Source: P. Gehler]Raquel Urtasun (TTI-C) Computer Vision Feb 21, 2013 68 / 69

Page 147: Stereo and Energy Minimization - Department of Computer ...urtasun/courses/CV/lecture13.pdfStereo and Energy Minimization Raquel Urtasun TTI Chicago Feb 21, 2013 Raquel Urtasun (TTI-C)

Dealing with loops

Keep on doing this iterations, i.e., loopy BP

The problem with loopy BP is that it is not guaranteed to converge

Message-passing algorithms based on LP relaxations have been developed

These methods are guaranteed to converge

Perform much better in practice

Raquel Urtasun (TTI-C) Computer Vision Feb 21, 2013 69 / 69