Depth from a Light Field Image with Learning-based Matching Costs
Finalist of the Depth Estimation Challenge at LF4CV &
Submitted to IEEE TPAMI (under review)
Hae-Gon Jeon¹, Jaesik Park², Gyeongmin Choe¹, Jinsun Park¹
Yunsu Bok³, Yu-Wing Tai⁴, In So Kweon¹
¹KAIST ²Intel labs ³ETRI ⁴Tencent
Goal of the Proposed Method
Problem 1: Severe vignetting
Problem 2: Severe noise
1. Hard to find accurate correspondences under radiometric distortions and severe noise
   Using various hand-crafted matching costs
2. Which one is the correct matching cost?
   Predicting the correct matching cost using two random forests
3. Does it work well on real-world light-field images?
   Realistic dataset generation based on the imaging pipeline of the Lytro camera
Overview of the Proposed Method
1. Realistic Light Field Image Generation: Emulating the imaging pipeline of the Lytro camera
2. Making Cost Volumes using Phase Shift: Overcoming the inherent degradation of light-field images caused by the lenslet array
3. Random Forest 1 - Classification: Selecting dominant matching costs
4. Random Forest 2 - Regression: Predicting a disparity value with sub-pixel precision
[Figure: the matching costs SAD, GRAD, Census, and ZNCC feed the vector q = [ ... ]]
Data Generation - Vignetting Map
[Figure: noise-free multi-view images; a white plane image; the vignetting map computed from averaged white plane images]
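A minimal sketch of the vignetting step, assuming the white plane images are already stored in the same layout as the images to be attenuated and that the map is normalized so its brightest pixel has gain 1.0 (both are assumptions; the slide only states that the map comes from averaged white plane images). The function names are hypothetical.

```python
import numpy as np

def vignetting_map(white_images):
    """Estimate a vignetting map from N captures of a uniformly lit white plane.

    white_images: array of shape (N, ...) in the same layout as the images
    that will later be attenuated (an assumption of this sketch).
    """
    # Averaging the captures suppresses sensor noise in the estimate.
    mean_white = np.mean(np.asarray(white_images, dtype=np.float64), axis=0)
    # Normalize so the brightest pixel has gain 1.0; vignetted pixels get gains < 1.0.
    return mean_white / mean_white.max()

def apply_vignetting(image, vmap):
    """Attenuate a noise-free image by the vignetting map (elementwise product)."""
    return np.asarray(image, dtype=np.float64) * vmap
```

Normalizing by the maximum keeps the attenuated image in the same intensity range as the noise-free input.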
Data Generation - Lenslet Image Generation
Sub-aperture images with the vignetting map applied
→ Extract a pixel from each sub-aperture image
→ Aggregate these pixels into a lenslet
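The pixel rearrangement above can be sketched as an axis permutation. This assumes an idealized rectangular lenslet grid with U x V pixels under each lenslet (real Lytro lenslet arrays are hexagonal and slightly rotated, which this sketch ignores). `subapertures_to_lenslet` and the optional per-view `vmap` argument are hypothetical names.

```python
import numpy as np

def subapertures_to_lenslet(sub, vmap=None):
    """(U, V, H, W) grayscale sub-aperture stack -> (H*U, W*V) lenslet image.

    Lenslet (h, w) becomes a U x V block holding pixel (h, w) of every view.
    """
    sub = np.asarray(sub, dtype=np.float64)
    if vmap is not None:
        # Optional vignetting map; it must broadcast to (U, V, H, W).
        sub = sub * vmap
    U, V, H, W = sub.shape
    # Reorder axes to (H, U, W, V) so that flattening puts view (u, v)
    # at row h*U + u and column w*V + v of the lenslet image.
    return sub.transpose(2, 0, 3, 1).reshape(H * U, W * V)
```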
Data Generation - Add Noise
Noise level estimation of each color channel
[Figure: standard deviation vs. intensity for Green Channel 1, Green Channel 2, Blue Channel, and Red Channel; intensity axis 0.2 to 0.6, standard deviation axis 0 to 0.025]
Convert color image to raw image
Y. Schechner et al., “Multiplexing for optimal lighting”, IEEE TPAMI 2007
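A minimal sketch of this step, under several assumptions not stated on the slide: an RGGB Bayer layout, intensities normalized to [0, 1], and a per-CFA-site affine variance model sigma^2 = a*I + b fitted to the measured standard-deviation curves. The parameter values in `NOISE_PARAMS` are placeholders, not measured values, and all function names are hypothetical.

```python
import numpy as np

# Hypothetical (a, b) per CFA site for sigma^2 = a*I + b; replace with fitted values.
NOISE_PARAMS = {"R": (2e-4, 1e-5), "G1": (1.5e-4, 1e-5),
                "G2": (1.5e-4, 1e-5), "B": (2.5e-4, 1e-5)}
# Assumed RGGB layout: site -> (row offset, column offset, RGB channel index).
CFA_SITES = {"R": (0, 0, 0), "G1": (0, 1, 1), "G2": (1, 0, 1), "B": (1, 1, 2)}

def rgb_to_raw(rgb):
    """Mosaic an (H, W, 3) image (H, W even) into a single-channel Bayer raw image."""
    raw = np.zeros(rgb.shape[:2], dtype=np.float64)
    for dy, dx, c in CFA_SITES.values():
        raw[dy::2, dx::2] = rgb[dy::2, dx::2, c]
    return raw

def add_signal_dependent_noise(raw, params=NOISE_PARAMS, rng=None):
    """Add zero-mean Gaussian noise whose variance grows linearly with intensity."""
    rng = np.random.default_rng() if rng is None else rng
    noisy = raw.copy()
    for site, (dy, dx, _) in CFA_SITES.items():
        a, b = params[site]
        block = raw[dy::2, dx::2]
        sigma = np.sqrt(np.clip(a * block + b, 0.0, None))
        noisy[dy::2, dx::2] = block + rng.normal(0.0, 1.0, block.shape) * sigma
    return np.clip(noisy, 0.0, 1.0)

def fit_noise_model(intensity, std):
    """Least-squares fit of sigma^2 = a*I + b to measured (intensity, std) pairs."""
    a, b = np.polyfit(np.asarray(intensity), np.asarray(std) ** 2, deg=1)
    return a, b
```

Fitting the variance rather than the standard deviation keeps the model linear, so an ordinary polynomial fit of degree 1 suffices.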
Data Generation - Realistic Sub-aperture Image Generation
Noisy raw image → Demosaicing → Rearrange the pixels at each lenslet into each sub-aperture image
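The last step, after demosaicing, can be sketched as the inverse of the lenslet arrangement. The sketch assumes an idealized rectangular grid with U x V pixels per lenslet and omits demosaicing itself; `lenslet_to_subapertures` is a hypothetical name.

```python
import numpy as np

def lenslet_to_subapertures(lenslet, U, V):
    """(H*U, W*V[, C]) lenslet image -> (U, V, H, W[, C]) sub-aperture stack."""
    HU, WV = lenslet.shape[:2]
    H, W = HU // U, WV // V
    rest = lenslet.shape[2:]  # optional color axis after demosaicing
    # Split rows into (lenslet row h, offset u) and columns into (lenslet col w, offset v).
    blocks = lenslet.reshape(H, U, W, V, *rest)
    # Move the within-lenslet offsets (u, v) to the front: view (u, v) gets pixel (h, w).
    axes = (1, 3, 0, 2) + tuple(range(4, 4 + len(rest)))
    return blocks.transpose(axes)
```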
Training Set http://hci-lightfield.iwr.uni-heidelberg.de/
Antinous, Range: [ -3.3, 2.8 ] Boardgames, Range: [ -1.8, 1.6 ] Dishes, Range: [ -3.1, 3.5 ] Greek, Range: [ -3.5, 3.1 ]
Kitchen, Range: [ -1.6, 1.8 ] Medieval2, Range: [ -1.7, 2.0 ] Museum, Range: [ -1.5, 1.3 ] Pens, Range: [ -1.7, 2.0 ]
Pillows, Range: [ -1.7, 1.8 ] Platonic, Range: [ -1.7, 1.5 ] Rosemary, Range: [ -1.8, 1.8 ] Table, Range: [ -2.0, 1.6 ]
Tomb, Range: [ -1.5, 1.9 ] Tower, Range: [ -3.6, 3.5 ] Town, Range: [ -1.6, 1.6 ] Vinyl, Range: [ -1.6, 1.2 ]
Cost Volumes - Phase Shift
[Figure: flipping between adjacent sub-aperture images]
Very narrow baseline: physically 0.45 mm, so the disparity between adjacent views is within 1 px
Averbuch and Keller, “A unified approach to FFT based images registration”, IEEE TIP 2003
Phase shift => 1/100 pixel precision
[Figure: a sub-pixel shift computed with Original, Bilinear, Bicubic, and Phase]
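A sketch of a sub-pixel shift via the Fourier shift theorem, in the spirit of the phase-shift approach cited above (the exact formulation used in the work may differ). Multiplying the spectrum by a linear phase ramp shifts the image without the low-pass smoothing of bilinear or bicubic interpolation; the shift is circular, so borders wrap. `phase_shift` is a hypothetical name.

```python
import numpy as np

def phase_shift(image, dy, dx):
    """Shift a 2-D image by (dy, dx) pixels, sub-pixel values allowed.

    out(y, x) = image(y - dy, x - dx), using I(x - d) <-> F(k) * exp(-2*pi*i*k*d).
    """
    H, W = image.shape
    ky = np.fft.fftfreq(H)[:, None]  # frequencies in cycles per pixel along rows
    kx = np.fft.fftfreq(W)[None, :]  # frequencies in cycles per pixel along columns
    ramp = np.exp(-2j * np.pi * (ky * dy + kx * dx))
    return np.real(np.fft.ifft2(np.fft.fft2(image) * ramp))
```

For integer shifts the result matches `np.roll`, which is a convenient sanity check before using fractional shifts.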
Cost Volumes - Phase Shift
[Figure: GT and results using Bilinear, Bicubic, and Phase for two examples, with error maps (scale 0.2 % to 1 %)]
             Bilinear   Bicubic   Phase
Example 1    16.2%      15.35%    9.88%
Example 2    9.03%      8.73%     6.38%
Jeon et al., “Accurate Depth Map Estimation from a Lenslet Light Field Camera”, CVPR 2015
Cost Volumes - Matching Costs
Sum of Absolute Differences (SAD)
+ Robust to image noise; acts as an averaging filter
Zero-mean Normalized Cross-Correlation (ZNCC)
+ Compensates for differences in both gain and offset
Census Transform (Census)
+ Tolerates radiometric distortions
Sum of Gradient Differences (GRAD)
+ Imposes higher weights at edge boundaries
+ Synergy with other matching costs
H. Hirschmuller and D. Scharstein, “Evaluation of stereo matching costs on images with radiometric differences,” IEEE TPAMI 2009.
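Minimal patch-level versions of the four matching costs, as a sketch only: GRAD here is a simple unweighted sum of horizontal and vertical gradient differences, and Census compares each pixel against the patch center. Note that ZNCC is a similarity (higher is better), so a cost volume would use something like `1 - zncc`. The invariances listed above show up directly: ZNCC ignores gain and offset, Census ignores any monotonic intensity change, and GRAD ignores an offset.

```python
import numpy as np

def sad(p, q):
    """Sum of absolute differences between two equally sized patches."""
    return np.abs(p - q).sum()

def zncc(p, q, eps=1e-12):
    """Zero-mean normalized cross-correlation in [-1, 1] (higher = more similar)."""
    a, b = p - p.mean(), q - q.mean()
    return (a * b).sum() / (np.sqrt((a * a).sum() * (b * b).sum()) + eps)

def census(p, q):
    """Hamming distance between census bit strings (pixel > patch center)."""
    cy, cx = p.shape[0] // 2, p.shape[1] // 2
    bits_p = p > p[cy, cx]
    bits_q = q > q[cy, cx]
    return np.count_nonzero(bits_p != bits_q)

def grad(p, q):
    """Sum of absolute differences of vertical and horizontal image gradients."""
    gy_p, gx_p = np.gradient(p)
    gy_q, gx_q = np.gradient(q)
    return np.abs(gx_p - gx_q).sum() + np.abs(gy_p - gy_q).sum()
```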
Cost Volumes - Matching Group 1
[Figure: a matching cost f(reference view, target view) is computed between sub-aperture images and stored in a cost volume indexed by depth label]
Cost Volumes - Matching Group 2
[Figure: a matching cost f(reference view, target view) is computed between sub-aperture images and stored in a cost volume indexed by depth label]
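A sketch of building one cost volume for a matching group, using SAD aggregated over a (2r+1) x (2r+1) box window. Assumptions not given on the slides: each target view has an angular offset (du, dv) from the reference view with du horizontal, a scene point at reference pixel x appears at x + d*(du, dv) in the target, and targets are warped with an FFT phase shift. Which views form each matching group is not specified, so the function accepts any list; all names are hypothetical.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def fft_shift(img, dy, dx):
    """Circular sub-pixel shift: out(y, x) = img(y - dy, x - dx)."""
    ky = np.fft.fftfreq(img.shape[0])[:, None]
    kx = np.fft.fftfreq(img.shape[1])[None, :]
    return np.real(np.fft.ifft2(np.fft.fft2(img) * np.exp(-2j * np.pi * (ky * dy + kx * dx))))

def box_sum(img, r):
    """Sum over a (2r+1) x (2r+1) window, edge-padded to keep img's shape."""
    padded = np.pad(img, r, mode="edge")
    return sliding_window_view(padded, (2 * r + 1, 2 * r + 1)).sum(axis=(-2, -1))

def sad_cost_volume(ref, targets, offsets, disparities, r=1):
    """SAD cost volume (L, H, W), summed over all target views in the group.

    offsets: (du, dv) angular offset of each target from the reference view.
    disparities: L candidate disparities in pixels per unit angular offset.
    """
    cost = np.zeros((len(disparities),) + ref.shape)
    for l, d in enumerate(disparities):
        for img, (du, dv) in zip(targets, offsets):
            # Sample the target at x + d*(du, dv), i.e. shift it by -d*(du, dv).
            warped = fft_shift(img, -d * dv, -d * du)
            cost[l] += box_sum(np.abs(ref - warped), r)
    return cost
```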
Cost Volumes - Computed Cost Volumes
[Figure: computed cost volumes arranged by matching group (rows) and matching cost (columns): Sum of Absolute Differences (SAD), Zero-mean Normalized Cross-Correlation (ZNCC), Census Transform (Census), Sum of Gradient Differences (GRAD)]
Cost Volumes - Computed Cost Volumes
Disparities from each cost volume via Winner-Takes-All
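Winner-Takes-All simply picks, at every pixel, the depth label with the lowest cost. A minimal sketch, assuming costs where lower is better (a similarity such as ZNCC would first be turned into `1 - zncc`); `winner_takes_all` is a hypothetical name.

```python
import numpy as np

def winner_takes_all(cost, disparities):
    """Per-pixel argmin over the label axis of an (L, H, W) cost volume.

    Returns the integer label map (H, W) and the matching disparity map.
    """
    labels = np.argmin(cost, axis=0)
    return labels, np.asarray(disparities)[labels]
```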
Cost Volumes - Computed Cost Volumes
Vectorizing the estimated depth labels with a ground truth depth label
[Figure: at one pixel, the depth labels estimated from each cost volume and from the combined costs SAD+GRAD, GRAD+Census, and Census+SAD with 𝛼 ∈ [0, 1.0] are stacked into a vector next to the ground-truth label 55]
Multiple disparity hypotheses
Campbell et al., “Using Multiple hypotheses to improve depth-maps for multi-view stereo”, ECCV 2008
Cost Volumes - Computed Cost Volumes
Vectorizing the estimated depth labels with a ground truth depth label
[Figure: the same vectorization at another pixel, whose ground-truth label is 32; the combined costs SAD+GRAD, GRAD+Census, and Census+SAD with 𝛼 ∈ [0, 1.0] provide multiple disparity hypotheses]
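A sketch of assembling the per-pixel hypothesis vector q from Winner-Takes-All labels of single and combined cost volumes. Assumptions: each volume is min-max normalized to [0, 1] before mixing, a pair such as SAD+GRAD is mixed as alpha*SAD + (1 - alpha)*GRAD, and alpha is sampled on a grid (the step of 0.1 in the usage below is hypothetical). Function names are hypothetical.

```python
import numpy as np

def normalize(cost):
    """Rescale a cost volume to [0, 1] so different costs can be mixed."""
    lo, hi = cost.min(), cost.max()
    return (cost - lo) / (hi - lo + 1e-12)

def hypothesis_vector(volumes, pairs, alphas):
    """Stack WTA labels of single and combined cost volumes into q of shape (H, W, K).

    volumes: dict name -> (L, H, W) cost volume (lower = better).
    pairs: list of (name_a, name_b), e.g. ("SAD", "GRAD").
    alphas: mixing weights in [0, 1]; combined = alpha*a + (1 - alpha)*b.
    """
    norm = {k: normalize(v) for k, v in volumes.items()}
    labels = [np.argmin(c, axis=0) for c in norm.values()]
    for a, b in pairs:
        for alpha in alphas:
            mixed = alpha * norm[a] + (1.0 - alpha) * norm[b]
            labels.append(np.argmin(mixed, axis=0))
    return np.stack(labels, axis=-1)
```

For example, with three costs, the three pairs shown on the slide, and `alphas = np.linspace(0.0, 1.0, 11)`, each pixel gets 3 + 3 * 11 = 36 hypotheses.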
Training a random forest
[Figure: the vectors 𝐪 of estimated depth labels, paired with their ground-truth depth labels, form the training data of random forest 1 (classification)]
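A sketch of training the classification forest with scikit-learn, assuming each pixel's vector q is one sample and its ground-truth depth label is the class. The number of trees is a hypothetical default, not a value from the slides, and `train_label_classifier` is a hypothetical name.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def train_label_classifier(q, gt_labels, n_trees=50, seed=0):
    """Fit random forest 1: per-pixel hypothesis vectors q -> ground-truth label.

    q: (H, W, K) or (N, K) integer depth labels from the cost volumes.
    gt_labels: (H, W) or (N,) ground-truth depth labels.
    """
    X = q.reshape(-1, q.shape[-1])
    y = np.asarray(gt_labels).reshape(-1)
    forest = RandomForestClassifier(n_estimators=n_trees, random_state=seed)
    forest.fit(X, y)
    return forest
```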
Random Forest 1 - Classification
[Figure: permutation importance of the elements of 𝐪, ordered q4, q3, q1, q2, q7, q9, q5, q8, q10, q11, q6]
Retrieving a set of important matching costs from 𝐪 using the permutation importance measure
[L. Breiman, “Random forests,” Machine Learning, 2001]
+ Removes unnecessary matching costs
+ Leads to a better prediction model
[Figure panels: Matching Group 1, Matching Group 2, Matching Group 3, Matching Group 4]
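A sketch of permutation importance: shuffle one column of q at a time and measure how much the forest's accuracy drops. Breiman's original measure permutes features in the out-of-bag samples; this sketch uses a held-out validation set instead (scikit-learn's `sklearn.inspection.permutation_importance` provides a comparable held-out variant). The function names and the `keep` parameter are hypothetical.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def permutation_importance_scores(model, X, y, n_repeats=5, seed=0):
    """Mean accuracy drop when each column of X is randomly permuted."""
    rng = np.random.default_rng(seed)
    base = model.score(X, y)  # mean accuracy on the validation set
    scores = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        drops = []
        for _ in range(n_repeats):
            X_perm = X.copy()
            X_perm[:, j] = rng.permutation(X_perm[:, j])
            drops.append(base - model.score(X_perm, y))
        scores[j] = np.mean(drops)
    return scores

def rank_hypotheses(X_train, y_train, X_val, y_val, keep, **forest_kwargs):
    """Train a forest, then return the `keep` most important columns and all scores."""
    forest = RandomForestClassifier(**forest_kwargs).fit(X_train, y_train)
    scores = permutation_importance_scores(forest, X_val, y_val)
    order = np.argsort(scores)[::-1]  # most important first
    return order[:keep], scores
```

Columns whose permutation barely changes the accuracy are candidates for removal, which shrinks q before the regression stage.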
Random Forest 2 - Regression
Input of random forest 2 (regression): 𝐪, ordered q4, q3, q1, q2, q7, q9, q5, q8, q10, q11, q6
Output: an estimated disparity value with sub-pixel precision
vs. SAD+GRAD [H.-G. Jeon et al., IEEE CVPR 2015] with a weighted median filter [Z. Ma et al., IEEE ICCV 2013]
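A sketch of the regression stage with scikit-learn, assuming the input is the selected elements of q (for example, already converted from labels to disparities) and the target is the continuous ground-truth disparity. The tree count and function names are hypothetical.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

def train_disparity_regressor(q_selected, gt_disparity, n_trees=50, seed=0):
    """Fit random forest 2: selected hypotheses -> continuous ground-truth disparity."""
    X = q_selected.reshape(-1, q_selected.shape[-1]).astype(np.float64)
    y = np.asarray(gt_disparity, dtype=np.float64).reshape(-1)
    return RandomForestRegressor(n_estimators=n_trees, random_state=seed).fit(X, y)

def predict_disparity(model, q_selected):
    """Sub-pixel disparity map with the spatial shape of q_selected[..., 0]."""
    X = q_selected.reshape(-1, q_selected.shape[-1]).astype(np.float64)
    return model.predict(X).reshape(q_selected.shape[:-1])
```

Because a regression forest averages leaf values, its output is not tied to the discrete label grid, which is what allows sub-pixel estimates.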
Benchmark - Bad pixel ratio (> 0.07 px) and mean squared error
[Figure: benchmark results for bad pixel ratio and mean squared error (2017.05.23)]
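The two benchmark metrics can be sketched as follows. The 0.07 px threshold comes from the slide; the benchmark's own evaluation masks and any scaling it applies to the reported numbers are not reproduced here, and the function names are hypothetical.

```python
import numpy as np

def bad_pixel_ratio(est, gt, threshold=0.07, mask=None):
    """Fraction of pixels whose absolute disparity error exceeds `threshold` px."""
    err = np.abs(np.asarray(est) - np.asarray(gt))
    if mask is not None:
        err = err[mask]
    return float(np.mean(err > threshold))

def mean_squared_error(est, gt, mask=None):
    """Mean of squared disparity errors, with no extra scaling applied."""
    err = (np.asarray(est) - np.asarray(gt)) ** 2
    if mask is not None:
        err = err[mask]
    return float(np.mean(err))
```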
Evaluation Results - Stratified
[Figure: GT, estimated disparity map, and error map for two scenes]
Evaluation Results - Training
Most errors appear at depth boundaries
[Figure: GT, estimated disparity map, and error map]
Evaluation Results - Test
Real World Examples – Lytro Illum
[Figure: results on real Lytro Illum scenes (two slides) from Wanner and Goldluecke (IEEE TPAMI 14), Yu et al. (ICCV 13), Jeon et al. (CVPR 15), Williem et al. (CVPR 16), Wang et al. (IEEE TPAMI 16), Tao et al. (IEEE TPAMI 17), and the proposed method]
Conclusion
Pros: Accurate disparity estimation
+ Handles the narrow-baseline problem
+ Robust to image noise
+ Applicable to real-world light-field images
Cons:
- Heavy computational burden
- Disparity errors at depth discontinuities still need to be reduced
- Textureless regions still need to be handled
Contributions:
● Analysis of the problems of depth estimation using light-field cameras
● Data augmentation that simulates the imaging pipeline of a hand-held light-field camera
● Pixel-wise disparity value prediction using two random forests
[Figure: Object, 3D Mesh, and 3D printing]
Data Generation - Add Noise
[Figure: results without an augmented training set, with a training set augmented with Gaussian noise, and with the fully augmented training set]