High-Quality Video View Interpolation
Larry Zitnick, Interactive Visual Media Group
Microsoft Research
3D video
Lumigraph / Light field
Geometry centric vs. image centric
Polygon rendering + texture mapping / Warping / Interpolation
Fixed geometry
View-dependent geometry
View-dependent texture
Sprites with depth
Layered depth image
Current practice vs. free viewpoint video
Many cameras
Motion jitter
Video view interpolation
Fewer cameras
Smooth motion
Automatic and real-time rendering
System overview
OFFLINE: Video Capture → Stereo → Representation → Compression → File
ONLINE: File → Selective Decompression → Render
Video Capture
[Capture hardware: cameras → concentrators → hard disks, with a controlling laptop]
Calibration (Zhengyou Zhang, 2000)
Input videos
System overview
OFFLINE: Video Capture → Stereo → Representation → Compression → File
ONLINE: File → Selective Decompression → Render
Stereo
Key to view interpolation: Geometry
Stereo Geometry
[Figure: cameras 1 and 2 with a virtual camera between them; candidate depths scored by match quality, good vs. bad]
Image correspondence
[Figure: correct vs. incorrect correspondences between image 1 and image 2, e.g. a leg matched against a wall]
Why segments?
Better delineation of boundaries.
Why segments?
Larger support for matching.
Handle gain and offset differences without global model (Kim, Kolmogorov and Zabih, 2003.)
Why segments?
More efficient.
786,432 pixels vs. 1000 segments
Compute disparities per segment rather than per pixel.
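A per-segment matcher can be sketched as follows (a minimal illustration, not the talk's actual code; `segment_cost` and `best_disparity` are hypothetical names): the matching cost is aggregated over every pixel in a segment, and the disparity search runs once per segment instead of once per pixel.

```python
# Sketch: aggregate a per-pixel SSD matching cost over all pixels in a
# segment, so each segment gets one score per disparity hypothesis
# instead of ~786k independent per-pixel searches.

def segment_cost(img1, img2, segment_pixels, disparity):
    """SSD cost of matching a segment of img1 into img2 at `disparity`."""
    cost = 0.0
    for (r, c) in segment_pixels:
        c2 = c + disparity                     # horizontal shift between views
        if 0 <= c2 < len(img2[r]):
            diff = img1[r][c] - img2[r][c2]
            cost += diff * diff
        else:
            cost += 1e6                        # pixel falls outside the image
    return cost

def best_disparity(img1, img2, segment_pixels, max_disp):
    """Pick the disparity level with the lowest aggregated segment cost."""
    return min(range(max_disp + 1),
               key=lambda d: segment_cost(img1, img2, segment_pixels, d))
```

Searching over segments also gives each hypothesis far more pixels of support than a single-pixel window would.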
Segmentation
Many methods will work:
Graph-based (Felzenszwalb and Huttenlocher, 2004)
Mean Shift (Comaniciu, et al. 2001)
Min-cut (Boykov et al. 2001)
Others…
Segmentation: Important properties
Not too large, not too small…
As large as possible while not spanning multiple objects.
Segmentation: Important properties
Stable Regions
Segmentation: Our Approach
First average, then segment.
Anisotropic smoothing
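The "first average, then segment" idea can be sketched on a single scanline (illustrative names, not the talk's implementation): anisotropic smoothing averages only across weak edges, so region interiors flatten while object boundaries survive for the segmentation step.

```python
# Sketch: edge-preserving smoothing followed by grouping, on a 1-D row.

def anisotropic_smooth(row, thresh=20, iters=5):
    """Average each pixel only with neighbors of similar color."""
    row = list(row)
    for _ in range(iters):
        new = row[:]
        for i in range(len(row)):
            vals = [row[i]]
            for j in (i - 1, i + 1):
                if 0 <= j < len(row) and abs(row[j] - row[i]) < thresh:
                    vals.append(row[j])       # average across weak edges only
            new[i] = sum(vals) / len(vals)
        row = new
    return row

def segment(row, thresh=20):
    """Group adjacent pixels whose smoothed colors are within `thresh`."""
    labels = [0]
    for i in range(1, len(row)):
        labels.append(labels[-1] + (abs(row[i] - row[i - 1]) >= thresh))
    return labels
```

Because the strong edge is never averaged across, the boundary stays exactly where it was while the interiors become uniform.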
Segmentation: Result
Close-up
Matching segments
Many measures will work: SSD, normalized correlation, mutual information.
The choice depends on color balancing and image quality.
Matching segments: Important properties
Never remove correct matches.
Remove as many false matches as possible.
Use global methods to remove remaining false positives.
Matching segments: Our approach
Create gain histogram
[Figure: gain histograms p(gain) for a good match (sharply peaked) and a bad match (spread across 0.8–1.25)]
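One way to sketch the gain-histogram test (helper names are hypothetical): compute per-pixel intensity ratios between the two candidate segments; a correct match under a gain difference produces tightly clustered ratios, while a false match spreads them out.

```python
# Sketch: per-pixel gain ratios between two candidate matching segments.
# A correct match under a pure gain change has ratios concentrated around
# one value; a false match produces a spread-out histogram.

def gain_ratios(seg1, seg2):
    return [a / b for a, b in zip(seg1, seg2) if b != 0]

def is_good_match(seg1, seg2, tol=0.1, frac=0.9):
    """Accept if at least `frac` of ratios lie within `tol` of the median."""
    ratios = sorted(gain_ratios(seg1, seg2))
    med = ratios[len(ratios) // 2]
    inside = sum(abs(r - med) <= tol * med for r in ratios)
    return inside >= frac * len(ratios)
```

Testing concentration around the median (rather than a fixed window) handles gain differences between cameras without any global color model.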
Local matching
Low texture
Number of states = number of depth levels
Image 1, Image 2
Global regularization
Create MRF (Markov Random Field):
Each segment is a node.
[Figure: segments A–F in image 1 and P–U in image 2 as graph nodes]
Global regularization
P(Disparities | Images) ∝ Likelihood (data term) × Prior (regularization term)
Global regularization
color_A ≈ color_B → z_A ≈ z_B
[Figure: neighboring segments A–F and P–U]
Global regularization
Prior (reconstructed): P(d_i) ∝ ∏_{k ∈ S_i} N(d_i; d_k, σ_{ik}²)
Each normal distribution's variance σ_{ik}² is set by the percentage of shared border and the color similarity between segments i and k.
[Figure: three candidate disparity configurations for segments A–F, scored by the prior]
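A segment-level MRF with this kind of data term and disparity-smoothness prior can be minimized with a simple solver; below, iterated conditional modes (ICM) stands in for the system's actual global optimizer, and all names are illustrative.

```python
# Sketch: ICM on a segment-level MRF. `data[s][d]` is segment s's matching
# cost at disparity level d; `nbrs[s]` lists (neighbor, weight) pairs, with
# larger weights for similar-colored neighbors sharing a long border.

def icm(data, nbrs, levels, lam=1.0, iters=10):
    # Initialize each segment at its own best data-term disparity.
    disp = {s: min(range(levels), key=lambda d: data[s][d]) for s in data}
    for _ in range(iters):
        for s in data:
            # Re-pick s's disparity given its neighbors' current disparities.
            disp[s] = min(
                range(levels),
                key=lambda d: data[s][d] + lam * sum(
                    w * (d - disp[t]) ** 2 for t, w in nbrs.get(s, [])))
    return disp
```

With the smoothness term on, an ambiguous (e.g. low-texture) segment is pulled toward the disparity of its confident neighbors; with `lam=0` it keeps its noisy local answer.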
Multiple disparity maps
Compute a disparity map for each image.
We want the disparity maps to be consistent across images…
Image 1, Image 2
Consistent disparities
z_A ≈ z_P, z_Q, z_S — segment A's depth should agree with the segments it projects onto in the other image.
[Figure: segment A projected into image 2 over segments P, Q, S]
Consistent disparities
Disparities depend on the disparities in neighboring views.
The likelihood term includes the neighboring views' disparities.
Consistent disparities
Use the original data term if the segment is not occluded.
When occluded, bias disparities to lie behind the known surfaces.
Is the segment occluded?
[Figure: segment projected into image I_i — not occluded vs. occluded, with the disparity range allowed when occluded]
Iteratively solve MRF
Depth through time
Matting
[Figure: interpolated view without matting]
[Figure: camera viewing a foreground surface in front of a background surface; foreground color, alpha, and background color are estimated along the boundary]
Bayesian Matting (Chuang et al., 2001)
[Figure: boundary strip of given width between foreground and background]
Rendering with matting
[Comparison: no matting vs. matting]
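Rendering with matting reduces to standard alpha compositing of the boundary layer over the projected main layer, C = αF + (1 − α)B; a minimal sketch:

```python
# Sketch: composite the boundary layer's foreground color F, with alpha a,
# over the projected main layer's color B: C = a*F + (1 - a)*B.
# Fractional alpha along depth discontinuities is what removes the hard
# cut-out halos visible when rendering without matting.

def composite(fg, alpha, bg):
    return [a * f + (1.0 - a) * b for f, a, b in zip(fg, alpha, bg)]
```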
System overview
OFFLINE: Video Capture → Stereo → Representation → Compression → File
ONLINE: File → Selective Decompression → Render
Representation
Main Layer: Color, Depth
Boundary Layer: Color, Depth, Alpha
[Figure: boundary strip of given width between foreground and background]
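The two-layer representation might be modeled like this (field names are illustrative, not the actual file format): a main layer with color and depth everywhere, plus a narrow boundary layer carrying color, depth, and alpha only along depth discontinuities.

```python
# Sketch of a per-camera, per-frame two-layer representation.
from dataclasses import dataclass

@dataclass
class MainLayer:
    color: list      # per-pixel color
    depth: list      # per-pixel depth

@dataclass
class BoundaryLayer:
    color: list      # colors in the boundary strip only
    depth: list
    alpha: list      # fractional coverage from matting

@dataclass
class Frame:
    main: MainLayer
    boundary: BoundaryLayer
```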
System overview
OFFLINE: Video Capture → Stereo → Representation → Compression → File
ONLINE: File → Selective Decompression → Render
Compression
[Figure: grid of frames from cameras 1–4 at times 0 and 1]
Temporal prediction: a camera's frame is predicted from its own earlier frames.
Spatial prediction: a camera's frame is predicted from a neighboring camera at the same time instant.
Warp the reference camera's depth and texture into the predicted camera's view.
Compute the error signal: the predicted frame minus the warped reference.
Reconstruct the predicted view by adding the error signal back to the warped reference.
[Figure: reference and predicted cameras — warped depth and texture, error signal, reconstruction]
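The spatial-prediction round trip can be sketched on a 1-D scanline, with the depth-based warp reduced to a single disparity shift (an illustration, not the actual codec):

```python
# Sketch: warp the reference toward the predicted camera, encode only the
# residual (error signal), and reconstruct by adding the residual back.

def warp(ref, shift, fill=0):
    """Shift the reference scanline; out-of-range samples get `fill`."""
    return [ref[i - shift] if 0 <= i - shift < len(ref) else fill
            for i in range(len(ref))]

def encode(predicted, ref, shift):
    w = warp(ref, shift)
    return [p - q for p, q in zip(predicted, w)]   # error signal

def decode(residual, ref, shift):
    w = warp(ref, shift)
    return [q + r for q, r in zip(w, residual)]    # reconstruction
```

The residual is sparse wherever the warp predicts well, which is what makes inter-camera prediction pay off.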
Boundary layer coding
[Figure: boundary-layer color texture, depth, and alpha matte]
We use our own shape coding method, similar to MPEG-4.
System overview
OFFLINE: Video Capture → Stereo → Representation → Compression → File
ONLINE: File → Selective Decompression → Render
Rendering
For each source camera: project the main layer and the boundary layer, then composite.
Rendering
[Figure: background video depth and color projected on the GPU — the vertex shader passes position and texture coordinates to the pixel shader]
Rendering the main layer (Step 1)
[Figure: the GPU pixel shader renders the projected main-layer depth into the Z-buffer and color buffer; the CPU generates an erase mesh]
Rendering the main layer (Step 2)
Locate depth discontinuities.
[Figure: color buffer and Z-buffer]
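Locating depth discontinuities can be sketched as a simple thresholded scan (the threshold value is illustrative): these are the places where stretched "rubber sheet" triangles in the projected main-layer mesh must be erased and the boundary layer takes over.

```python
# Sketch: find positions along a depth scanline where adjacent samples
# differ by more than a threshold, marking a depth discontinuity.

def discontinuities(depth, thresh=5.0):
    return [i for i in range(len(depth) - 1)
            if abs(depth[i + 1] - depth[i]) > thresh]
```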
Rendering the boundary layer
[Figure: the CPU generates a boundary mesh from boundary depth and RGBA; the GPU composites it over the projected main layer using vertex colors, the color buffer, and the Z-buffer]
Graphics for Vision
Use the GPU for vision.
Real-time stereo (Yang and Pollefeys, CVPR 03)
Rendering
[Figure: cameras 1 and 2 each project main and boundary layers; the GPU pixel shader composites them]
Final Result
Final composite
Weights based on proximity to virtual viewpoint
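Proximity-based blending can be sketched as follows (function names are illustrative): the camera nearer to the virtual viewpoint gets the larger weight, and the weights vary continuously so the view fades smoothly from one camera to the next as the viewpoint moves.

```python
# Sketch: blend two source cameras with weights proportional to the
# virtual viewpoint's proximity to each (1-D camera positions).

def blend_weights(virtual_pos, cam1_pos, cam2_pos):
    d1 = abs(virtual_pos - cam1_pos)
    d2 = abs(virtual_pos - cam2_pos)
    w1 = d2 / (d1 + d2)          # closer camera gets the larger weight
    return w1, 1.0 - w1

def blend(view1, view2, w1, w2):
    return [w1 * a + w2 * b for a, b in zip(view1, view2)]
```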
Compositing views
DEMO
“Massive Arabesque” video clip
Future work
Mesh simplification
More complicated scenes
Temporal interpolation (using optical flow)
Wider range of virtual motion
2D grid of cameras
Summary
Sparse camera configuration
High-quality depth recovery
Automatic matting
New two-layer representation
Inter-camera compression
Real-time rendering