Immersive Visual Communication with Depth
Minh N. Do
University of Illinois at Urbana-Champaign
Collaborators: Dan Kubacki, Jiangbo Lu, Matthieu Maitre, Dongbo Min, Ha Nguyen, Quang Nguyen, Viet-Anh Nguyen, Sanjay Patel
Our Vision
Ø Existing audio-visual recording and playback…
• Single camera and microphone
• Little processing
• Viewers: passive
Ø Future…
• Sensors (cameras, microphones) are cheap
• Massive computing and bandwidth are available
• Viewers want interactive, immersive, and remote experiences
⇒ Require new signal processing theory and algorithms
Problems with Depth Cameras
Ø Depth cameras have much lower resolution and are noisier than color cameras
Ø Sophisticated interpolation is needed to fill in the missing depth values
Resolution: 204x204. Captured by PMD CamCube 2.0
Depth Image-Based Rendering (DIBR)
Problem:
• Input: images from color and depth cameras at arbitrary locations
• Output: generated images from arbitrary viewpoints
Motivation:
• 3DTV, free-viewpoint TV
• Telepresence, distant collaboration
• Video conferencing
Goal:
• Better image quality
• Fewer cameras
• Faster / real-time rendering
Proposed “3D Propagation” Algorithm
Three main steps:
1. Depth propagation
2. Color-based depth filling and enhancement
3. Rendering
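Step 1, depth propagation, amounts to forward-warping each low-resolution depth sample into the color view through the camera calibration. The sketch below is a minimal numpy illustration assuming pinhole camera models with intrinsics `K_d`/`K_c` and relative pose `(R, t)`; all names are illustrative, not the paper's API.

```python
import numpy as np

def propagate_depth(depth_lo, K_d, K_c, R, t, color_shape):
    """Forward-warp depth samples from the depth-camera view to the
    color-camera view (a sketch of step 1 of the 3D propagation pipeline)."""
    Hc, Wc = color_shape
    out = np.zeros((Hc, Wc), dtype=np.float64)
    ys, xs = np.nonzero(depth_lo > 0)
    z = depth_lo[ys, xs].astype(np.float64)
    # back-project each valid pixel to a 3D point in the depth-camera frame
    pts = np.linalg.inv(K_d) @ np.vstack([xs * z, ys * z, z])
    # transform into the color-camera frame and project with its intrinsics
    uvw = K_c @ (R @ pts + np.asarray(t, dtype=np.float64)[:, None])
    u = np.round(uvw[0] / uvw[2]).astype(int)
    v = np.round(uvw[1] / uvw[2]).astype(int)
    ok = (uvw[2] > 0) & (u >= 0) & (u < Wc) & (v >= 0) & (v < Hc)
    # z-buffering: where several samples land on one pixel, keep the nearest
    for ui, vi, zi in zip(u[ok], v[ok], uvw[2][ok]):
        if out[vi, ui] == 0 or zi < out[vi, ui]:
            out[vi, ui] = zi
    return out  # zeros mark holes/disocclusions for steps 2 and 3 to handle
```

The zeros left by the warp are exactly the holes that the color-based depth filling and enhancement of step 2 operates on.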
Color-based Depth Filling and Enhancement
Applied to the propagated depth image at the color view:
1. Occlusion removal
2. Depth-color bilateral filtering
3. Directional disocclusion filling
4. Depth edge enhancement
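The depth-color bilateral filtering step weights each neighboring depth sample by both its spatial distance and its color similarity measured in the registered high-resolution color image, so that smoothed depth edges align with color edges. A minimal, slow, loop-based sketch assuming a grayscale guidance image; function and parameter names are illustrative:

```python
import numpy as np

def joint_bilateral_depth(depth, color, radius=2, sigma_s=2.0, sigma_c=10.0):
    """Bilateral filter on depth whose range term is computed in the
    registered color (guidance) image. Zero-valued depth pixels (holes)
    are skipped, which also lets the filter fill small holes from
    valid neighbors."""
    h, w = depth.shape
    out = np.zeros((h, w), dtype=np.float64)
    for y in range(h):
        for x in range(w):
            num = den = 0.0
            for dy in range(-radius, radius + 1):
                for dx in range(-radius, radius + 1):
                    yy, xx = y + dy, x + dx
                    if 0 <= yy < h and 0 <= xx < w and depth[yy, xx] > 0:
                        w_s = np.exp(-(dx * dx + dy * dy) / (2 * sigma_s ** 2))
                        dc = float(color[y, x]) - float(color[yy, xx])
                        w_c = np.exp(-dc * dc / (2 * sigma_c ** 2))
                        num += w_s * w_c * depth[yy, xx]
                        den += w_s * w_c
            out[y, x] = num / den if den > 0 else 0.0
    return out
```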
Naïve disocclusion filling vs. directional disocclusion filling
• Based on the relationship between the disocclusion holes and the change of the camera position.
Directional Disocclusion Filling
1. Significant depth edge gradients are detected with the Sobel operator.
2. In the color domain, a block-based search is performed to find the best match.
3. Depth values are copied from the matched block.
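Step 1 above relies on the Sobel operator; a small self-contained version that returns the gradient magnitude (thresholding into "significant" edges is left to the caller) might look like this:

```python
import numpy as np

def sobel_magnitude(img):
    """Gradient magnitude via the 3x3 Sobel kernels, with edge
    replication at the image borders."""
    kx = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
    ky = kx.T
    p = np.pad(np.asarray(img, dtype=float), 1, mode="edge")
    h, w = np.asarray(img).shape
    gx = np.zeros((h, w))
    gy = np.zeros((h, w))
    for y in range(h):
        for x in range(w):
            win = p[y:y + 3, x:x + 3]
            gx[y, x] = (win * kx).sum()  # horizontal gradient
            gy[y, x] = (win * ky).sum()  # vertical gradient
    return np.hypot(gx, gy)
```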
Before / After
Depth Edge Enhancement
Example Free-viewpoint Video Generation
Left – Color (800x600)
Middle – Depth (160x120) (25 times smaller)
Right – Color (800x600)
INPUTS
OUTPUT
Rendered view (800x600)
Video demo
Mapping to GPGPU Architecture
Ø The algorithm is deliberately built from techniques with a high degree of locality (e.g., bilateral filtering, the Sobel operator, block-based search)
Ø It can therefore be mapped efficiently onto GPGPU architectures
Hardware                             Depth propagation   Depth-color bilateral filter
CPU: Intel Core 2 Duo E8400 3.0GHz   38 ms               1041 ms
GPU: NVIDIA GeForce 9800 GT          24 ms               14 ms
Speedup                              1.6x                74.4x
Depth Video Enhancement for ToF Camera
Ø Problem: depth images from a ToF camera are low-resolution, blurred, and noisy
Ø Setting: given a noisy, low-resolution depth map DL and a registered noise-free, high-resolution color image I
⇒ Estimate the high-resolution depth map DH
Proposed Method: Weighted Mode Filtering
Ø Generating the joint histogram
• g(p): color value at pixel p
• f(p): depth value at pixel p
• fG(p): enhanced depth value at pixel p
• GI, GS, Gr: Gaussian weighting functions (color similarity, spatial distance, and depth-candidate similarity, respectively)
Ø Joint histogram HG: for each pixel p, the weights GI·GS·Gr of all pixels q in a neighborhood N(p) are accumulated on HG over the depth candidates d:

HG(p, d) = Σ_{q ∈ N(p)} GI(g(p) − g(q)) · GS(p − q) · Gr(f(q) − d)

Then the enhanced depth is the mode of this histogram:

fG(p) = argmax_d HG(p, d)
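A direct, unoptimized sketch of this weighted mode filter for integer depth maps with a grayscale guidance image — parameter names and defaults are illustrative, not the paper's:

```python
import numpy as np

def weighted_mode_filter(depth, color, radius=3,
                         sigma_s=3.0, sigma_c=10.0, sigma_r=2.0, levels=256):
    """For each pixel p, accumulate a depth histogram H(p, d): every
    neighbor q votes with a Gaussian G_r centered on its own depth f(q),
    weighted by spatial (G_S) and color (G_I) similarity to p.
    The output is the mode (argmax) of that weighted histogram."""
    h, w = depth.shape
    out = np.zeros_like(depth)
    d_axis = np.arange(levels)
    for y in range(h):
        for x in range(w):
            hist = np.zeros(levels)
            for dy in range(-radius, radius + 1):
                for dx in range(-radius, radius + 1):
                    yy, xx = y + dy, x + dx
                    if 0 <= yy < h and 0 <= xx < w:
                        w_s = np.exp(-(dx * dx + dy * dy) / (2 * sigma_s ** 2))
                        dc = float(color[y, x]) - float(color[yy, xx])
                        w_c = np.exp(-dc * dc / (2 * sigma_c ** 2))
                        hist += w_s * w_c * np.exp(
                            -(d_axis - depth[yy, xx]) ** 2 / (2 * sigma_r ** 2))
            out[y, x] = np.argmax(hist)  # mode of the weighted histogram
    return out
```

Taking the mode rather than a weighted mean is what keeps depth edges sharp: the output is always a plausible depth level rather than a blend across a discontinuity.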
Up-sampling results for a low-quality depth image (Mesa Imaging SR4000, 176x144) with the corresponding color image (Point Grey Flea, 1024x768).
Result Comparison
Depth + Color Video Coding
IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY 3
[Fig. 1 block diagrams. (a) Encoder: the depth map and color video pass through a preprocessing stage (spatial downsampling and dynamic range reduction from M bits to N bits, M > N) before an H.264/AVC encoder (transform, quantization, entropy coding, intra-frame prediction, motion estimation/compensation with shared motion information, inverse quantization/transform, frame memory) in which the deblocking filter is replaced by the proposed GMF filter. (b) Decoder: an H.264 decoder (entropy decoding, inverse quantization, inverse transform, motion compensation, frame memory, proposed GMF filter) followed by a postprocessing stage (GMF-based upsampling and dynamic range increase from N bits back to M bits) produces the output depth.]
Fig. 1. Block diagrams of the proposed global mode filtering based depth map encoder and decoder.
III. PROPOSED GLOBAL MODE FILTERING-BASED DEPTH CODING
Fig. 1 shows the architecture of the proposed depth map encoder and decoder based on the H.264/AVC standard. The encoder contains a pre-processing block that enables the spatial resolution and dynamic range reduction of the depth signal, if necessary, for an efficient depth map compression. The motivation is that, with an efficient upsampling algorithm, encoding the depth data at the reduced resolution and dynamic range can reduce the bit rate substantially while still achieving a good synthesized view quality. In addition, the H.264/AVC deblocking filter will be replaced with a novel GMF-based in-loop filter to suppress the compression artifacts, especially on object boundaries, by taking the depth characteristics into account. For the decoding process, the GMF-based method is utilized to upsample the spatial resolution and the dynamic range of the decoded depth map, if necessary. In what follows, we present the three key components of our proposed depth map encoder and decoder: (1) GMF-based in-loop filter, (2) GMF-based spatial resolution upsampling, and (3) GMF-based dynamic range upsampling.
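The dynamic range reduction and increase in Fig. 1 can be illustrated with a naive bit-shift mapping; the paper's GMF-based dynamic range upsampling replaces the naive expansion on the decoder side to recover boundaries better. M and N follow the diagram; everything else here is illustrative:

```python
import numpy as np

def reduce_dynamic_range(depth, m_bits=8, n_bits=6):
    """Encoder-side preprocessing: drop the (M - N) least significant bits."""
    return depth >> (m_bits - n_bits)

def increase_dynamic_range(depth_n, m_bits=8, n_bits=6):
    """Decoder-side naive expansion back to M bits; the proposed scheme
    applies GMF-based dynamic range upsampling here instead."""
    return depth_n << (m_bits - n_bits)

# round-trip demo: the error of the naive mapping is bounded by 2**(M-N) - 1
d = np.array([200, 201, 37, 255], dtype=np.uint16)  # M = 8-bit depth values
d_hat = increase_dynamic_range(reduce_dynamic_range(d))
```

Encoding at 6 or 7 bits instead of 8 shrinks the symbol range the entropy coder must cover, which is where the bit rate savings reported later come from.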
A. In-loop Filter
Containing homogeneous regions separated by sharp edges, a transform-based compressed depth map often exhibits large coding artifacts such as ringing artifacts and blurriness along the depth boundaries. These artifacts in turn severely degrade the visual quality of the synthesized view. Fig. 2 shows the
TABLE II. Experimental results obtained by the proposed and existing in-loop filters for the Ballet sequence.

        Depth bitrate (kbps)                                   Rendering quality (dB)
QP   H.264/AVC   Proposed   Trilateral   Boundary rec.     H.264/AVC   Proposed   Trilateral   Boundary rec.
22   2426.71     2365.12    2445.67      2447.86           40.74       42.31      41.53        41.25
25   1824.46     1782.77    1865.29      1861.42           39.52       41.22      40.54        40.22
28   1347.74     1320.48    1392.29      1383.34           38.40       39.83      39.46        39.39
31   988.88      973.91     1032.24      1017.81           37.34       38.88      38.16        38.00

TABLE III. Experimental results obtained by the proposed and existing in-loop filters for the Breakdancers sequence.

        Depth bitrate (kbps)                                   Rendering quality (dB)
QP   H.264/AVC   Proposed   Trilateral   Boundary rec.     H.264/AVC   Proposed   Trilateral   Boundary rec.
22   2447.70     2443.59    2422.07      2447.56           43.37       44.49      44.24        43.70
25   1784.42     1781.85    1771.50      1787.87           42.29       43.54      42.93        42.78
28   1246.30     1251.10    1242.03      1258.64           41.32       42.50      42.04        41.85
31   859.04      870.89     861.89       874.10            40.18       41.85      41.25        41.05
[Fig. 8. RD curves (synthesized view quality in dB versus depth bitrate in kbps) obtained by encoding the depth maps using the proposed and existing in-loop filters (H.264/AVC deblocking filter, proposed filter, trilateral filter, boundary reconstruction filter): (a) Ballet, (b) Breakdancers.]
Fig. 9. Sample frames of the reconstructed depth map and rendered view for the Ballet sequence obtained by different in-loop filters: (a) H.264/AVC deblocking filter, (b) boundary reconstruction filter, (c) trilateral filter, (d) proposed in-loop filter, (e)–(h) synthesized images from (a)–(d), respectively.
Effects on Decoded Depth and Synthesized View
(a-e) H.264 deblocking; (b-f) boundary reconstruction; (c-g) trilateral; (d-h) proposed
Effect on Depth Coding Performance
[Fig. 10. RD curves (synthesized view quality in dB versus depth bitrate in kbps) obtained by reconstructing the decoded depth video using different upsampling filters (proposed, nearest-neighbor, boundary reconstruction) and by the regular depth coding at the original resolution: (a) Ballet, (b) Breakdancers.]
outperformed the regular depth coding over a wide range of bit rates. Furthermore, in comparison with the existing upsampling filters, the proposed upsampling filter also achieved more than a 1-dB PSNR gain of the virtual view in terms of the average Bjontegaard metric. For visualization, Fig. 11 shows sample frames of the depth video obtained by the different upsampling filters and the regular depth coding. The figures show that the proposed upsampling filter efficiently reconstructed the depth map at the original resolution. By preserving and recovering the depth edge information, we achieved a better visual quality of the synthesized view.
C. Depth Dynamic Range Reduction
In the evaluation of the down/upscaling approach for the dynamic range, the original bit depths of the depth video were reduced to 6 bits and 7 bits, without spatial downsampling, prior to encoding. After decoding, the proposed depth dynamic range upscaling was used to reconstruct the original dynamic range of the depth data. Fig. 12 shows the RD curves obtained by this approach with different numbers of bits per sample and by the regular depth coding using the original depth data. Not surprisingly, encoding the depth video with the lower dynamic range reduced the total depth bit rate notably. Furthermore, with the effectiveness of the proposed filter, we reconstructed the depth video at the original dynamic range well, which is reflected by the high-quality virtual view. Specifically, by using the proposed approach with the 7-bit depth video, we achieved bit rate savings of approximately 27.8% and 46.1% in terms of the average Bjontegaard metric for the Ballet and Breakdancers sequences, respectively, while retaining the same virtual view quality. However, the 6-bit depth
Fig. 11. Sample frames of the reconstructed depth map and rendered view for the Breakdancers test sequence: (a) regular depth coding, (b) nearest-neighbor upsampling filter, (c) depth upsampling boundary reconstruction filter, (d) proposed upsampling filter, (e)–(h) synthesized images from (a)–(d), respectively.
Ø Reconstructed depth map at 0.04 bpp: a gain of 5 dB!
Increased Reconstruction Quality
Standard wavelets (9/7 with symmetric ext.)
Shape-adaptive wavelets (9/7 with linear ext.)
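For concreteness, the 9/7 wavelet mentioned above can be implemented by lifting. The sketch below uses the standard CDF 9/7 lifting coefficients (as in irreversible JPEG 2000) with symmetric (mirror) boundary handling on even-length 1D signals, plus its exact inverse so perfect reconstruction can be verified; the shape-adaptive variant with linear extension is not shown.

```python
import numpy as np

# Standard CDF 9/7 lifting coefficients
_A1, _A2, _A3, _A4 = -1.586134342, -0.05298011854, 0.8829110762, 0.4435068522
_K = 1.149604398

def fwt97(x):
    """One level of the 9/7 forward transform in lifting form with mirror
    boundary handling (even-length input). Returns interleaved coefficients:
    approximation at even indices, detail at odd indices."""
    y = np.asarray(x, dtype=float).copy()
    n = len(y)
    for a, predict in ((_A1, True), (_A2, False), (_A3, True), (_A4, False)):
        if predict:   # update odd samples from their even neighbors
            y[1:n - 1:2] += a * (y[0:n - 2:2] + y[2:n:2])
            y[n - 1] += 2 * a * y[n - 2]
        else:         # update even samples from their odd neighbors
            y[2::2] += a * (y[1:n - 1:2] + y[3::2])
            y[0] += 2 * a * y[1]
    y[0::2] *= _K     # scale approximation band
    y[1::2] /= _K     # scale detail band
    return y

def iwt97(y):
    """Exact inverse: undo the scaling, then run the lifting steps in
    reverse order with the signs flipped."""
    x = np.asarray(y, dtype=float).copy()
    n = len(x)
    x[0::2] /= _K
    x[1::2] *= _K
    for a, predict in ((_A4, False), (_A3, True), (_A2, False), (_A1, True)):
        if predict:
            x[1:n - 1:2] -= a * (x[0:n - 2:2] + x[2:n:2])
            x[n - 1] -= 2 * a * x[n - 2]
        else:
            x[2::2] -= a * (x[1:n - 1:2] + x[3::2])
            x[0] -= 2 * a * x[1]
    return x
```

Because each lifting step only adds a function of the untouched samples, every step is exactly invertible, which is what makes the forward/inverse pair lossless up to floating-point precision.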
Conclusion
Ø Depth is the key to future immersive visual communication
• Image-based rendering
• Free-viewpoint 3D video
• Eye-gaze correction
• Background subtraction and replacement
Ø Noisy and low-resolution depth input can be effectively improved by:
• Combining with color
• Integration over time
Ø Challenges:
• Real-time processing @ video rate
• User experience and quality assessment
References
§ Q. H. Nguyen, M. N. Do, and S. J. Patel, “Depth image-based rendering using low resolution depth,” ICIP, 2009.
§ H. T. Nguyen and M. N. Do, "Error analysis for image-based rendering with depth information", IEEE Transactions on Image Processing, Apr. 2009.
§ M. Maitre and M. N. Do, "Depth and depth-color coding using shape-adaptive wavelets," Journal of Visual Communication and Image Representation, July 2010.
§ M. N. Do, Q. H. Nguyen, H. T. Nguyen, D. Kubacki, and S. J. Patel, "Immersive visual communication with depth cameras and parallel computing", IEEE Signal Processing Magazine, Jan. 2011.
§ D. Min, J. Lu, and M. N. Do, “Depth video enhancement based on weighted mode filtering,” IEEE Transactions on Image Processing, to appear.