
Depth from a Light Field Image with Learning-based Matching Costs

Finalist of the Depth Estimation Challenge at LF4CV &

Submitted to IEEE TPAMI (under review)

Hae-Gon Jeon¹, Jaesik Park², Gyeongmin Choe¹, Jinsun Park¹

Yunsu Bok³, Yu-Wing Tai⁴, In So Kweon¹

¹KAIST ²Intel Labs ³ETRI ⁴Tencent

Goal of the Proposed Method

Problem 1: Severe vignetting

Problem 2: Severe noise

1. Accurate correspondences are hard to find under radiometric distortions and severe noise
→ Use various hand-crafted matching costs

2. Which matching cost is correct?
→ Predict the correct matching cost using two random forests

3. Does it work well on real-world light-field images?
→ Generate a realistic dataset based on the imaging pipeline of the Lytro camera

Overview of the Proposed Method

1. Realistic Light Field Image Generation: emulating the imaging pipeline of the Lytro camera

2. Making Cost Volumes using Phase Shift: overcoming the inherent degradation of light-field images caused by the lenslet array

3. Random Forest 1 - Classification: selecting dominant matching costs

4. Random Forest 2 - Regression: predicting a disparity value with sub-pixel precision

[Figure: matching costs SAD, GRAD, Census, and ZNCC combined into a feature vector q]

Data Generation - Vignetting Map

Noise-free multi-view images are combined with a vignetting map computed from averaged white plane images.

[Figure: white plane image and the resulting vignetting map]
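A minimal NumPy sketch of the vignetting step, assuming the vignetting map is simply the mean of several white-plane captures normalized to a peak of 1 and applied by per-pixel multiplication. The function names `vignetting_map` and `apply_vignetting` are hypothetical, and the slides do not specify the normalization actually used.

```python
import numpy as np


def vignetting_map(white_images):
    """Average a stack of white-plane captures and normalize the result to a peak of 1."""
    mean_white = np.mean(np.asarray(white_images, dtype=np.float64), axis=0)
    return mean_white / mean_white.max()


def apply_vignetting(clean_image, vmap):
    """Attenuate a noise-free image by the vignetting map.

    A 2D map is broadcast over the color channels of an (H, W, C) image.
    """
    if clean_image.ndim == 3 and vmap.ndim == 2:
        vmap = vmap[..., None]
    return clean_image * vmap


# Toy example: noisy white images with a radial falloff toward the corners.
rng = np.random.default_rng(0)
yy, xx = np.mgrid[0:8, 0:8]
falloff = 1.0 - 0.05 * ((yy - 3.5) ** 2 + (xx - 3.5) ** 2) / 12.25
whites = [0.8 * falloff + rng.normal(0.0, 1e-3, falloff.shape) for _ in range(20)]
vmap = vignetting_map(whites)
vignetted = apply_vignetting(np.ones((8, 8, 3)), vmap)
```

Averaging many white captures suppresses their noise, so the map mostly reflects the optical falloff rather than sensor noise.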

Data Generation - Lenslet Image Generation

Starting from sub-aperture images with the vignetting map applied:

Extract a pixel from each sub-aperture image
Aggregate these pixels into a lenslet
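The two steps above amount to an interleaving of the view stack. A sketch, assuming an idealized square lenslet grid with U x V pixels under each lenslet; the real Lytro lenslet array is hexagonal and slightly rotated, which this ignores. `subapertures_to_lenslet` is a hypothetical name.

```python
import numpy as np


def subapertures_to_lenslet(sa):
    """Interleave a (U, V, H, W[, C]) stack of sub-aperture images into a lenslet image.

    Output shape is (H*U, W*V[, C]); the U x V block at spatial position (h, w)
    holds pixel (h, w) from every sub-aperture view, i.e. one "lenslet".
    """
    U, V, H, W = sa.shape[:4]
    extra = sa.shape[4:]
    axes = (2, 0, 3, 1) + tuple(range(4, sa.ndim))  # reorder to (H, U, W, V[, C])
    return sa.transpose(axes).reshape((H * U, W * V) + extra)


# 3 x 3 views of a 4 x 5 image; every entry is unique so the layout can be checked.
views = np.arange(3 * 3 * 4 * 5).reshape(3, 3, 4, 5)
lenslet = subapertures_to_lenslet(views)
```

Pixel (u, v) of the lenslet at (h, w) lands at row h*U + u and column w*V + v of the output.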

Data Generation - Add Noise

Noise level estimation of each color channel

[Figure: noise standard deviation vs. intensity (intensity 0.2 to 0.6, standard deviation 0 to 0.025) for the Red, Green 1, Green 2, and Blue channels]

Convert color image to raw image

Y. Schechner et al., “Multiplexing for optimal lighting”, IEEE TPAMI 2007
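A sketch of converting a color image to a raw image and adding channel-dependent noise. Several assumptions are made here and none come from the slides: an RGGB Bayer layout, and a noise variance that is affine in intensity, variance(I) = a * I + b (a common photon-plus-read-noise approximation). The per-channel `a`, `b` values in `NOISE_AB` are placeholders; in practice they would be fitted to the measured standard-deviation curves above.

```python
import numpy as np

# Assumed RGGB Bayer layout: (row, col) offset inside each 2 x 2 tile, and source RGB channel.
BAYER = {"R": ((0, 0), 0), "G1": ((0, 1), 1), "G2": ((1, 0), 1), "B": ((1, 1), 2)}

# Hypothetical per-channel parameters for variance(I) = a * I + b (placeholders only).
NOISE_AB = {"R": (1.0e-4, 1.0e-5), "G1": (8.0e-5, 1.0e-5),
            "G2": (8.0e-5, 1.0e-5), "B": (1.2e-4, 1.0e-5)}


def rgb_to_raw(rgb):
    """Sample an (H, W, 3) image in [0, 1] into a single-channel Bayer mosaic."""
    raw = np.zeros(rgb.shape[:2], dtype=np.float64)
    for (dy, dx), c in BAYER.values():
        raw[dy::2, dx::2] = rgb[dy::2, dx::2, c]
    return raw


def add_signal_dependent_noise(raw, params=NOISE_AB, rng=None):
    """Add zero-mean Gaussian noise whose std depends on intensity and Bayer channel."""
    rng = np.random.default_rng() if rng is None else rng
    noisy = raw.copy()
    for name, ((dy, dx), _) in BAYER.items():
        a, b = params[name]
        site = raw[dy::2, dx::2]
        sigma = np.sqrt(a * site + b)
        noisy[dy::2, dx::2] = site + sigma * rng.standard_normal(site.shape)
    return np.clip(noisy, 0.0, 1.0)


flat = np.full((200, 200, 3), 0.5)
raw = rgb_to_raw(flat)
noisy = add_signal_dependent_noise(raw, rng=np.random.default_rng(0))
```

Adding the noise in the raw domain, before demosaicing, lets the later demosaicing step spread it spatially the way a real camera pipeline would.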

Data Generation - Realistic Sub-aperture Image Generation

Noisy raw image → Demosaicing → Rearrange the pixels at each lenslet into each sub-aperture image
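The final rearrangement step can be sketched as the inverse of the lenslet interleaving. Demosaicing is not shown; the input is assumed to be an already demosaiced lenslet image on an idealized square grid with U x V pixels per lenslet. `lenslet_to_subapertures` is a hypothetical name.

```python
import numpy as np


def lenslet_to_subapertures(lenslet, U, V):
    """Split an (H*U, W*V[, C]) lenslet image into a (U, V, H, W[, C]) view stack.

    Pixel (u, v) under every lenslet is gathered into sub-aperture image (u, v).
    Trailing rows/columns that do not fill a whole lenslet are dropped.
    """
    H, W = lenslet.shape[0] // U, lenslet.shape[1] // V
    extra = lenslet.shape[2:]
    blocks = lenslet[: H * U, : W * V].reshape((H, U, W, V) + extra)
    axes = (1, 3, 0, 2) + tuple(range(4, blocks.ndim))  # reorder to (U, V, H, W[, C])
    return blocks.transpose(axes)


# A demosaiced 12 x 15 lenslet image with 3 x 3 pixels per lenslet.
demosaiced = np.arange(12 * 15).reshape(12, 15)
views = lenslet_to_subapertures(demosaiced, 3, 3)
```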

Training Set: http://hci-lightfield.iwr.uni-heidelberg.de/ (Range: disparity range of each scene, in pixels)

Antinous, Range: [ -3.3, 2.8 ] Boardgames, Range: [ -1.8, 1.6 ] Dishes, Range: [ -3.1, 3.5 ] Greek, Range: [ -3.5, 3.1 ]

Kitchen, Range: [ -1.6, 1.8 ] Medieval2, Range: [ -1.7, 2.0 ] Museum, Range: [ -1.5, 1.3 ] Pens, Range: [ -1.7, 2.0 ]

Pillows, Range: [ -1.7, 1.8 ] Platonic, Range: [ -1.7, 1.5 ] Rosemary, Range: [ -1.8, 1.8 ] Table, Range: [ -2.0, 1.6 ]

Tomb, Range: [ -1.5, 1.9 ] Tower, Range: [ -3.6, 3.5 ] Town, Range: [ -1.6, 1.6 ] Vinyl, Range: [ -1.6, 1.2 ]

Cost Volumes - Phase Shift

Flipping between adjacent sub-aperture images reveals a very narrow baseline: physically 0.45 mm, so disparities stay within 1 px.

Averbuch and Keller, “A unified approach to FFT based images registration”, IEEE TIP 2003

Phase shift ⇒ 1/100-pixel precision

[Figure: sub-pixel shifted images: Original, Bilinear, Bicubic, Phase]
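A sketch of sub-pixel translation with the Fourier shift theorem, the standard technique behind phase-shift interpolation: a shift in space is a linear phase ramp in frequency, so fractional shifts need no spatial interpolation kernel. The image is treated as periodic here; any padding or windowing used in the paper is not specified on the slides.

```python
import numpy as np


def phase_shift(img, dy, dx):
    """Translate a 2D image by (dy, dx) pixels, fractional shifts included.

    Fourier shift theorem: img(y - dy, x - dx) <-> F(fy, fx) * exp(-2*pi*i*(fy*dy + fx*dx)).
    Content wraps around the borders because the DFT assumes periodicity.
    """
    H, W = img.shape
    fy = np.fft.fftfreq(H)[:, None]  # cycles per pixel along rows
    fx = np.fft.fftfreq(W)[None, :]  # cycles per pixel along columns
    ramp = np.exp(-2j * np.pi * (fy * dy + fx * dx))
    return np.real(np.fft.ifft2(np.fft.fft2(img) * ramp))


img = np.random.default_rng(0).random((16, 16))
sub_pixel = phase_shift(img, 0.0, 0.01)  # a 1/100-pixel horizontal shift
```

For integer shifts the result matches `np.roll` up to rounding, which is a quick sanity check; unlike bilinear or bicubic kernels, the phase ramp does not low-pass filter the image.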

Cost Volumes - Phase Shift

[Figure: GT and disparity maps obtained with bilinear, bicubic, and phase-shift interpolation]

         Bilinear   Bicubic   Phase
0.2 %    16.2 %     15.35 %   9.88 %
1 %      9.03 %     8.73 %    6.38 %

Jeon et al., “Accurate Depth Map Estimation from a Lenslet Light Field Camera”, CVPR 2015

Cost Volumes - Matching Costs

Sum of Absolute Differences (SAD)
+ Robust to image noise; acts as an averaging filter

Zero-mean Normalized Cross-Correlation (ZNCC)
+ Compensates for differences in both gain and offset

Census Transform (Census)
+ Tolerates radiometric distortions

Sum of Gradient Differences (GRAD)
+ Imposes higher weights at edge boundaries
+ Synergy with other matching costs

H. Hirschmuller and D. Scharstein, “Evaluation of stereo matching costs on images with radiometric differences,” IEEE TPAMI 2009.
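The four costs can be sketched in their textbook forms on a single pair of square patches. The window size, census neighborhood, gradient operator, and any cost truncation used in the paper are not given on the slides, so these functions (`sad`, `zncc_cost`, `census_cost`, `grad_cost`) are illustrative assumptions. ZNCC is returned as 1 - ZNCC so that, like the others, lower means a better match.

```python
import numpy as np


def sad(p, q):
    """Sum of absolute differences."""
    return np.abs(p - q).sum()


def zncc_cost(p, q, eps=1e-12):
    """1 - ZNCC: invariant to a gain and offset change between the patches."""
    p0, q0 = p - p.mean(), q - q.mean()
    ncc = (p0 * q0).sum() / (np.sqrt((p0 ** 2).sum() * (q0 ** 2).sum()) + eps)
    return 1.0 - ncc


def census_cost(p, q):
    """Hamming distance between census signatures (bit = neighbor darker than center)."""
    cy, cx = p.shape[0] // 2, p.shape[1] // 2
    return int(np.count_nonzero((p < p[cy, cx]) != (q < q[cy, cx])))


def grad_cost(p, q):
    """Sum of absolute differences of the x- and y-gradients."""
    gpy, gpx = np.gradient(p)
    gqy, gqx = np.gradient(q)
    return np.abs(gpx - gqx).sum() + np.abs(gpy - gqy).sum()


rng = np.random.default_rng(0)
p = rng.random((5, 5))
q = 2.0 * p + 3.0  # the same patch under a gain and offset change
costs = {"SAD": sad(p, q), "ZNCC": zncc_cost(p, q),
         "Census": census_cost(p, q), "GRAD": grad_cost(p, q)}
```

The toy pair illustrates the trade-offs listed above: SAD penalizes the radiometric change, while ZNCC and Census report a perfect match, and GRAD ignores a pure offset.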

Cost Volumes - Matching Group 1

[Figure: a matching cost f(reference view, target view) evaluated over the sub-aperture images, producing a cost volume over depth labels]

Cost Volumes - Matching Group 2

[Figure: a matching cost f(reference view, target view) evaluated over the sub-aperture images, producing a cost volume over depth labels]
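A sketch of building one cost volume for a single reference/target view pair with SAD, warping the target by phase shift for every candidate disparity label. The sign convention is an assumption: a scene point with disparity d is taken to appear in a target view at angular offset (du, dv) displaced by -d * (du, dv) pixels. Function names and the window radius `r` are hypothetical.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def fourier_shift(img, dy, dx):
    """Sub-pixel translation img(y - dy, x - dx) via the Fourier shift theorem (periodic borders)."""
    fy = np.fft.fftfreq(img.shape[0])[:, None]
    fx = np.fft.fftfreq(img.shape[1])[None, :]
    ramp = np.exp(-2j * np.pi * (fy * dy + fx * dx))
    return np.real(np.fft.ifft2(np.fft.fft2(img) * ramp))


def box_sum(x, r):
    """Sum over a (2r+1) x (2r+1) window, replicating edge pixels at the borders."""
    padded = np.pad(x, r, mode="edge")
    return sliding_window_view(padded, (2 * r + 1, 2 * r + 1)).sum(axis=(-2, -1))


def sad_cost_volume(ref, target, du, dv, labels, r=1):
    """Build a (len(labels), H, W) SAD cost volume for one reference/target view pair."""
    volume = np.empty((len(labels),) + ref.shape)
    for i, d in enumerate(labels):
        warped = fourier_shift(target, d * dv, d * du)  # undo the assumed displacement
        volume[i] = box_sum(np.abs(ref - warped), r)
    return volume


ref = np.random.default_rng(0).random((12, 12))
target = np.roll(ref, -2, axis=1)  # horizontal neighbor view, true disparity 2 px
labels = np.array([0.0, 1.0, 2.0, 3.0])
volume = sad_cost_volume(ref, target, du=1, dv=0, labels=labels)
disparity = labels[np.argmin(volume, axis=0)]  # winner-takes-all
```

Labels may be fractional; the phase shift handles sub-pixel warps the same way as integer ones.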

Cost Volumes - Computed Cost Volumes

[Figure: computed cost volumes; rows: matching groups; columns: Sum of Absolute Differences (SAD), Zero-mean Normalized Cross-Correlation (ZNCC), Census Transform (Census), Sum of Gradient Differences (GRAD)]


Disparities are obtained from each cost volume via Winner-Takes-All.

The estimated depth labels are vectorized together with the ground-truth depth label.

Combined costs: SAD+GRAD, GRAD+Census, Census+SAD, with a weight 𝛼 ∈ [0, 1.0]

[Figure: vector of estimated depth labels for one pixel, paired with its ground-truth label (55)]

Multiple disparity hypotheses

Campbell et al., “Using Multiple hypotheses to improve depth-maps for multi-view stereo”, ECCV 2008




[Figure: a second example pixel, with ground-truth depth label 32, vectorized in the same way]
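A sketch of assembling the multiple disparity hypotheses into the per-pixel feature vector 𝐪. It assumes each cost volume is min-max normalized before blending (costs have different units), that combined costs are α-blends of cyclically consecutive costs (matching the SAD+GRAD, GRAD+Census, Census+SAD pattern above), and that α is sampled at a few hypothetical values. The exact set of hypotheses used in the paper is not listed on the slides.

```python
import numpy as np


def normalize(volume, eps=1e-12):
    """Rescale a cost volume to [0, 1] so costs with different units can be blended."""
    return (volume - volume.min()) / (volume.max() - volume.min() + eps)


def wta(volume, labels):
    """Winner-takes-all: the depth label with the lowest cost at every pixel."""
    return np.asarray(labels)[np.argmin(volume, axis=0)]


def build_hypotheses(volumes, labels, alphas=(0.25, 0.5, 0.75)):
    """Stack per-pixel disparity hypotheses into an (H*W, F) feature matrix q.

    volumes: dict mapping a cost name to a (D, H, W) cost volume. Features are the WTA
    label of every single cost, plus WTA labels of alpha-blended pairs of consecutive
    costs (cyclically, e.g. SAD+GRAD, GRAD+Census, Census+SAD).
    """
    names = list(volumes)
    vols = {n: normalize(volumes[n]) for n in names}
    features = [wta(vols[n], labels) for n in names]
    for i, name in enumerate(names):
        partner = names[(i + 1) % len(names)]
        for alpha in alphas:
            features.append(wta(alpha * vols[name] + (1 - alpha) * vols[partner], labels))
    return np.stack([f.ravel() for f in features], axis=1)


rng = np.random.default_rng(0)
labels = np.arange(5)
volumes = {n: rng.random((5, 4, 6)) for n in ("SAD", "GRAD", "Census")}
q = build_hypotheses(volumes, labels)
gt = rng.integers(0, 5, (4, 6)).ravel()  # toy ground-truth label per pixel, paired row-wise with q
```

Each row of `q` is one pixel's hypothesis vector; pairing it with the matching entry of `gt` gives one training sample.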




Training a random forest

The vectorized depth labels 𝐪 of each pixel, paired with its ground-truth label (e.g., 32), are used to train Random forest 1 (classification).
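A sketch of training Random forest 1 with scikit-learn's `RandomForestClassifier` on synthetic data. The toy 𝐪 is an assumption for illustration: four "reliable" hypotheses off by at most one label and three uniformly random ones. The forest size and split are arbitrary choices, not the paper's settings.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
n_pixels, n_labels = 2000, 8
gt = rng.integers(0, n_labels, n_pixels)  # toy ground-truth depth labels

# Toy q: four reliable hypotheses (off by at most one label), three random ones.
reliable = gt[:, None] + rng.integers(-1, 2, (n_pixels, 4))
unreliable = rng.integers(0, n_labels, (n_pixels, 3))
q = np.clip(np.hstack([reliable, unreliable]), 0, n_labels - 1)

rf1 = RandomForestClassifier(n_estimators=100, random_state=0)
rf1.fit(q[:1500], gt[:1500])
accuracy = rf1.score(q[1500:], gt[1500:])  # held-out classification accuracy
```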

Random Forest1 - Classification

Retrieving a set of important matching costs from 𝐪 using the permutation importance measure [L. Breiman, “Random forests,” Machine learning]
+ Removing unnecessary matching costs
+ Designing a better prediction model

[Figure: permutation importance of each element of 𝐪 (shown in the order q4, q3, q1, q2, q7, q9, q5, q8, q10, q11, q6), grouped into Matching Group1 to Matching Group4]
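A sketch of Breiman-style permutation importance: shuffle one feature column, measure how much accuracy drops, and keep the features with the largest drops. It is written against any `predict` function, so a fitted classifier's `predict` method could be passed in; scikit-learn also provides `sklearn.inspection.permutation_importance`. Breiman's original measure uses out-of-bag samples, whereas this sketch evaluates on whatever set is passed, and the cut-off `k` in `top_k_features` is a hypothetical parameter.

```python
import numpy as np


def permutation_importance(predict, X, y, n_repeats=5, seed=0):
    """Mean drop in accuracy when each feature column is shuffled across samples."""
    rng = np.random.default_rng(seed)
    baseline = np.mean(predict(X) == y)
    importance = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        drops = []
        for _ in range(n_repeats):
            X_perm = X.copy()
            X_perm[:, j] = rng.permutation(X_perm[:, j])
            drops.append(baseline - np.mean(predict(X_perm) == y))
        importance[j] = np.mean(drops)
    return importance


def top_k_features(importance, k):
    """Indices of the k most important features, most important first."""
    return np.argsort(importance)[::-1][:k]


# Toy check: a "model" that only ever looks at feature 0.
rng = np.random.default_rng(1)
y = rng.integers(0, 10, 500)
X = np.column_stack([y, rng.integers(0, 10, 500), rng.integers(0, 10, 500)])
imp = permutation_importance(lambda A: A[:, 0], X, y)
```

Features the model ignores get zero importance, which is what makes the measure useful for removing unnecessary matching costs.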

Random Forest2 - Regression

Input of Random forest 2 (regression): the elements of 𝐪, ordered as q4, q3, q1, q2, q7, q9, q5, q8, q10, q11, q6

Output: estimated disparity value with sub-pixel precision

vs. SAD+GRAD [H.-G. Jeon et al., IEEE CVPR 2015] with Weighted Median Filter [Z. Ma et al., IEEE ICCV 2013]
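A sketch of Random forest 2 using scikit-learn's `RandomForestRegressor`: the inputs are discrete label hypotheses, but the regression target is the continuous ground-truth disparity, so the averaged leaf values fall between label steps. The data are synthetic, and the label spacing `step` is a hypothetical value.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(1)
n_pixels, step = 2000, 0.25  # step: hypothetical spacing of the discrete depth labels
gt = rng.uniform(-2.0, 2.0, n_pixels)  # toy continuous ground-truth disparity
nearest = np.round(gt / step) * step   # closest discrete label
# Toy selected hypotheses: each is the nearest label, possibly off by one label.
q_sel = nearest[:, None] + step * rng.integers(-1, 2, (n_pixels, 4))

rf2 = RandomForestRegressor(n_estimators=100, random_state=0)
rf2.fit(q_sel[:1500], gt[:1500])
pred = rf2.predict(q_sel[1500:])
mae = np.mean(np.abs(pred - gt[1500:]))
```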

Benchmark - Bad pixel ratio (error > 0.07 px) & Mean square error

[Figure: benchmark rankings by bad pixel ratio and by mean square error (2017.05.23)]
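A sketch of the two metrics, assuming the bad pixel ratio is the fraction of pixels whose absolute disparity error exceeds 0.07 px and the mean square error is the plain mean of squared disparity errors. Benchmarks often rescale these (e.g., to percentages), so the benchmark's own definition should be checked before comparing numbers. The optional `mask` argument is a hypothetical addition for evaluating a region only.

```python
import numpy as np


def bad_pixel_ratio(est, gt, thresh=0.07, mask=None):
    """Fraction of pixels whose absolute disparity error exceeds thresh."""
    err = np.abs(est - gt)
    if mask is not None:
        err = err[mask]
    return float(np.mean(err > thresh))


def mean_square_error(est, gt, mask=None):
    """Mean of squared disparity errors."""
    sq = (est - gt) ** 2
    if mask is not None:
        sq = sq[mask]
    return float(np.mean(sq))


est = np.array([[0.0, 0.05], [0.1, -0.2]])
gt = np.zeros((2, 2))
badpix = bad_pixel_ratio(est, gt)  # errors 0, 0.05, 0.1, 0.2 -> 2 of 4 exceed 0.07
mse = mean_square_error(est, gt)
```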

Evaluation Results - Stratified

[Figure: GT, estimated disparity map, and error map (two examples)]

Evaluation Results - Training

Most errors occur at depth boundaries.

[Figure: GT, estimated disparity map, and error map]

Evaluation Results - Test

Real World Examples – Lytro Illum

Wanner and Goldluecke, IEEE TPAMI 14
Yu et al., ICCV 13
Jeon et al., CVPR 15
Williem et al., CVPR 16
Wang et al., IEEE TPAMI 16
Tao et al., IEEE TPAMI 17
Proposed

Real World Examples – Lytro Illum

Wanner and Goldluecke, IEEE TPAMI 14
Yu et al., ICCV 13
Jeon et al., CVPR 15




Conclusion

Pros: Accurate disparity estimation

+ Handles the narrow-baseline problem

+ Robust to image noise

+ Applicable to real-world light field images

Cons:

- Heavy computational burden

- Disparity errors at depth discontinuities still need to be reduced

- Textureless regions still need better handling

Contributions:

● Analysis of the problems of depth estimation using light-field cameras

● Data augmentation that simulates the imaging pipeline of a hand-held light-field camera

● Pixel-wise disparity value prediction using two random forests

[Figure: Object, 3D mesh, and 3D printing]

Data Generation - Add Noise

[Figure: results without an augmented training set, with a training set augmented with Gaussian noise, and with the fully augmented training set]

