
Immersive Visual Communication with Depth

Minh N. Do University of Illinois at Urbana-Champaign

Collaborators: Dan Kubacki, Jiangbo Lu, Matthiew Maitre, Dongbo Min, Ha Nguyen, Quang Nguyen, Viet-Anh Nguyen, Sanjay Patel


The Dream: Television and Movies Become Remote Reality


Interpretation of the Dream: Free-viewpoint 3D Video


Our Vision

Ø  Existing audio-visual recording and playback…

•  Single camera and microphone

•  Little processing

•  Viewers: passive

Ø  Future…

•  Sensors (cameras, microphones) are cheap

•  Massive computing and bandwidth are available

•  Viewers want interactive, immersive, and remote experiences

⇒ Require new signal processing theory and algorithms

Emerging Low-Cost Depth Cameras


PrimeSense

Optrima/SoftKinetic PMD

Microsoft

Problems with Depth Cameras

Ø  Depth cameras have much lower resolution and are noisier than color cameras

Ø  Sophisticated interpolation is needed to fill in the missing depth samples

Resolution: 204x204, captured by a PMD CamCube 2.0

Depth Image-Based Rendering (DIBR)


Problem: •  Input: images from color and depth cameras at arbitrary locations.

•  Output: generated images from arbitrary viewpoints.

Motivation: •  3DTV, free-viewpoint TV

•  Telepresence, distant collaborations

•  Video conferencing

Goal: •  Better image quality.

•  Fewer cameras.

•  Faster, real-time rendering

Proposed “3D Propagation” Algorithm

Three main steps:

1.  Depth propagation

2.  Color-based depth filling and enhancement

3.  Rendering
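The depth-propagation step warps each measured depth sample from the depth camera into the color camera's viewpoint: back-project to 3D, transform, reproject, and z-buffer. Below is a minimal numpy sketch under assumed pinhole models; the intrinsics `K_d`, `K_c` and the pose `(R, t)` are hypothetical inputs, and the later pipeline steps (filling, enhancement) handle the holes this leaves behind.

```python
import numpy as np

def propagate_depth(depth, K_d, K_c, R, t, out_shape):
    """Warp a depth map from the depth camera to the color camera view.

    depth:    (H, W) depth values at the depth camera
    K_d, K_c: 3x3 intrinsics of the depth and color cameras
    R, t:     rotation (3x3) and translation (3,) from depth to color frame
    """
    H, W = depth.shape
    # Back-project every depth pixel to a 3D point in the depth camera frame.
    v, u = np.mgrid[0:H, 0:W]
    pix = np.stack([u.ravel(), v.ravel(), np.ones(H * W)])  # homogeneous pixels
    rays = np.linalg.inv(K_d) @ pix                         # unit-depth rays
    pts = rays * depth.ravel()                              # 3D points, shape (3, H*W)
    # Transform into the color camera frame and project.
    pts_c = R @ pts + t[:, None]
    proj = K_c @ pts_c
    z = proj[2]
    valid = z > 0
    x = np.round(proj[0, valid] / z[valid]).astype(int)
    y = np.round(proj[1, valid] / z[valid]).astype(int)
    zc = z[valid]
    out = np.full(out_shape, np.inf)
    inside = (x >= 0) & (x < out_shape[1]) & (y >= 0) & (y < out_shape[0])
    # Z-buffering: keep the nearest surface when several points land on one pixel.
    np.minimum.at(out, (y[inside], x[inside]), zc[inside])
    out[np.isinf(out)] = 0.0  # unfilled pixels become holes (0)
    return out
```

With identical cameras and zero motion, the warp reduces to the identity, which makes a convenient sanity check before plugging in real calibration data.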


Color-based Depth Filling and Enhancement


Occlusion removal

Depth-color bilateral filtering

Directional disocclusion

filling

Depth edge enhancement

Propagated depth image at color view
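The depth-color bilateral filtering step in the pipeline above smooths the propagated depth using weights drawn from both spatial distance and color similarity, so depth values are not averaged across object boundaries visible in the color image. A minimal sketch, with hypothetical window radius and Gaussian bandwidths:

```python
import numpy as np

def depth_color_bilateral(depth, color, radius=2, sigma_s=2.0, sigma_c=15.0):
    """Smooth a depth map while honoring color edges: each neighbor's depth
    contributes with a spatial Gaussian times a color-similarity Gaussian,
    so filtering does not bleed across object boundaries."""
    H, W = depth.shape
    out = np.zeros((H, W))
    for i in range(H):
        for j in range(W):
            acc, norm = 0.0, 0.0
            for qi in range(max(0, i - radius), min(H, i + radius + 1)):
                for qj in range(max(0, j - radius), min(W, j + radius + 1)):
                    w_s = np.exp(-((i - qi) ** 2 + (j - qj) ** 2)
                                 / (2 * sigma_s ** 2))        # spatial weight
                    w_c = np.exp(-(float(color[i, j]) - color[qi, qj]) ** 2
                                 / (2 * sigma_c ** 2))        # color-similarity weight
                    acc += w_s * w_c * depth[qi, qj]
                    norm += w_s * w_c
            out[i, j] = acc / norm
    return out
```

A production version would vectorize or run on the GPU (as discussed later in the deck); the nested loops here are only for clarity.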

Occlusion Removal

Depth-Color Bilateral Filtering

Naïve disocclusion filling vs. directional disocclusion filling

•  Based on the relationship between the disocclusion holes and the change of the camera position.

Directional Disocclusion Filling

1.  Significant depth edge gradients are detected with the Sobel operator.

2.  In the color domain, a block-based search is performed to find the best match.

3.  Copy depth values.
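Step 1 above, detecting significant depth edge gradients, can be sketched with a plain Sobel gradient magnitude and a threshold. The kernels are the standard Sobel operator; the threshold value here is a hypothetical choice, not one from the deck:

```python
import numpy as np

def sobel_edges(depth, thresh=10.0):
    """Mark pixels whose Sobel gradient magnitude exceeds a threshold."""
    kx = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], float)  # horizontal gradient
    ky = kx.T                                                   # vertical gradient
    H, W = depth.shape
    gx = np.zeros((H, W))
    gy = np.zeros((H, W))
    # Convolve the interior (borders are left unmarked for simplicity).
    for i in range(1, H - 1):
        for j in range(1, W - 1):
            win = depth[i - 1:i + 2, j - 1:j + 2]
            gx[i, j] = np.sum(kx * win)
            gy[i, j] = np.sum(ky * win)
    return np.hypot(gx, gy) > thresh
```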

Before After

Depth Edge Enhancement

After filling

Before filling

Result

Rendering

Example Free-viewpoint Video Generation

Left – Color (800x600)
Middle – Depth (160x120) (25 times smaller)

Right – Color (800x600)

INPUTS

OUTPUT

Rendered view (800x600)

Video demo

Free-viewpoint Video Output


Mapping to GPGPU Architecture

Ø  The algorithm is deliberately developed with techniques that have a high degree of locality (e.g., bilateral filtering, the Sobel operator, block-based search).

Ø  It can therefore potentially be mapped onto GPGPU architectures.


Hardware Depth Propagation

                                        Depth propagation   Depth-color bilateral filter
CPU (Intel Core 2 Duo E8400 3.0GHz)     38 ms               1041 ms
GPU (NVIDIA GeForce 9800 GT)            24 ms               14 ms
Speedup                                 1.6x                74.4x

Real Lab Setup


Example Acquired Input


Left view (640x480) Right view (640x480)

Depth (204x204)

Zoom In of Eye-Gaze Correction


Left view

Right view

Rendered view

Ground truth

Eye-Gaze Correction and Background Subtraction


Depth Video Enhancement for ToF Camera

Ø  Problem: depth images from a ToF camera are low-resolution, blurred, and noisy

Ø  Setting: given a noisy, low-resolution depth map DL and a registered noise-free, high-resolution color image I

⇒ Estimate a high-resolution depth map DH


Proposed Method: Weighted Mode Filtering

Ø  Generating the joint histogram

•  g(p): color value at pixel p

•  f(p): depth value at pixel p

•  fG(p): enhanced depth value at pixel p

•  GI, GS, Gr: Gaussian weighting functions


Joint histogram HG

Ø  The weights GI·GS·Gr of all pixels q in a neighborhood N(p) are counted and summed on HG:

HG(p, d) = Σ_{q ∈ N(p)} GI(g(p) − g(q)) · GS(p − q) · Gr(f(q) − d)

Then the enhanced depth is the mode of the joint histogram:

fG(p) = argmax_d HG(p, d)

[Figure: joint histogram at pixel p with two modes m1 and m2 at depth values d1 and d2]
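The weighted-mode computation can be sketched directly from the definitions above: build a joint histogram over depth values for each pixel, then take its mode. The window radius, histogram size, and Gaussian bandwidths below are hypothetical, and the actual implementation is far more efficient than this brute-force version:

```python
import numpy as np

def weighted_mode_filter(depth, color, radius=2, levels=256,
                         sigma_i=10.0, sigma_s=3.0, sigma_r=2.0):
    """Weighted mode filtering: for each pixel p, accumulate the joint
    histogram H(p, d) = sum_q G_I(g(p)-g(q)) * G_S(p-q) * G_r(f(q)-d)
    over a window, then output the depth bin d maximizing H(p, d)."""
    H, W = depth.shape
    out = np.zeros_like(depth)
    d_axis = np.arange(levels)
    for i in range(H):
        for j in range(W):
            hist = np.zeros(levels)
            for di in range(-radius, radius + 1):
                for dj in range(-radius, radius + 1):
                    qi, qj = i + di, j + dj
                    if not (0 <= qi < H and 0 <= qj < W):
                        continue
                    w_i = np.exp(-(float(color[i, j]) - color[qi, qj]) ** 2
                                 / (2 * sigma_i ** 2))   # color similarity G_I
                    w_s = np.exp(-(di * di + dj * dj)
                                 / (2 * sigma_s ** 2))   # spatial weight G_S
                    # G_r spreads each sample f(q) over nearby depth bins.
                    hist += w_i * w_s * np.exp(-((d_axis - depth[qi, qj]) ** 2)
                                               / (2 * sigma_r ** 2))
            out[i, j] = np.argmax(hist)
    return out
```

Unlike averaging filters, the mode snaps to the dominant depth value in the window, which is why this kind of filter rejects outliers and keeps depth edges sharp.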

Up-sampling results for a low-quality depth image (from a ‘Mesa Imaging SR4000’, 176x144) with the corresponding color image (from a ‘Point Grey Flea’, 1024x768).

Result Comparison

Depth + Color Video Coding


IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY 3

[Fig. 1. Block diagrams of the proposed global mode filtering (GMF) based depth map encoder and decoder. (a) Encoder: the depth map is preprocessed by spatial downsampling and dynamic range reduction (M bits → N bits, M > N), then encoded by an H.264/AVC encoder (transform, quantization, entropy coding, intra-frame prediction, motion estimation and compensation) in which the deblocking filter is replaced by the proposed GMF filter, guided by the color video. (b) Decoder: an H.264 decoder (entropy decoding, inverse quantization, inverse transform, motion compensation, frame memory, proposed GMF filter) reconstructs the compressed depth; postprocessing then applies GMF-based upsampling and dynamic range increase (N bits → M bits), guided by the color video, to produce the output depth.]

III. PROPOSED GLOBAL MODE FILTERING-BASED DEPTH CODING

Fig. 1 shows the architecture of the proposed depth map encoder and decoder based on the H.264/AVC standard. The encoder contains a pre-processing block that enables spatial resolution and dynamic range reduction of the depth signal, if necessary, for efficient depth map compression. The motivation is that, with an efficient upsampling algorithm, encoding the depth data at the reduced resolution and dynamic range can reduce the bit rate substantially while still achieving good synthesized view quality. In addition, the H.264/AVC deblocking filter is replaced with a novel GMF-based in-loop filter that suppresses compression artifacts, especially on object boundaries, by taking the depth characteristics into account. For the decoding process, the GMF-based method is utilized to upsample the spatial resolution and the dynamic range of the decoded depth map, if necessary. In what follows, we present the three key components of our proposed depth map encoder and decoder: (1) the GMF-based in-loop filter, (2) GMF-based spatial resolution upsampling, and (3) GMF-based dynamic range upsampling.

A. In-loop Filter

Containing homogeneous regions separated by sharp edges, transform-based compressed depth maps often exhibit large coding artifacts, such as ringing and blurring along the depth boundaries. These artifacts in turn severely degrade the visual quality of the synthesized view. Fig. 2 shows the


TABLE II
EXPERIMENTAL RESULTS OBTAINED BY THE PROPOSED AND EXISTING IN-LOOP FILTERS FOR THE BALLET SEQUENCE.

                 Depth bitrate (kbps)                          Rendering quality (dB)
QP   H.264/AVC  Proposed  Trilateral  Boundary rec.   H.264/AVC  Proposed  Trilateral  Boundary rec.
22   2426.71    2365.12   2445.67     2447.86         40.74      42.31     41.53       41.25
25   1824.46    1782.77   1865.29     1861.42         39.52      41.22     40.54       40.22
28   1347.74    1320.48   1392.29     1383.34         38.40      39.83     39.46       39.39
31    988.88     973.91   1032.24     1017.81         37.34      38.88     38.16       38.00

TABLE III
EXPERIMENTAL RESULTS OBTAINED BY THE PROPOSED AND EXISTING IN-LOOP FILTERS FOR THE BREAKDANCERS SEQUENCE.

                 Depth bitrate (kbps)                          Rendering quality (dB)
QP   H.264/AVC  Proposed  Trilateral  Boundary rec.   H.264/AVC  Proposed  Trilateral  Boundary rec.
22   2447.70    2443.59   2422.07     2447.56         43.37      44.49     44.24       43.70
25   1784.42    1781.85   1771.50     1787.87         42.29      43.54     42.93       42.78
28   1246.30    1251.10   1242.03     1258.64         41.32      42.50     42.04       41.85
31    859.04     870.89    861.89      874.10         40.18      41.85     41.25       41.05

[Fig. 8. RD curves (synthesized view quality in dB versus depth bitrate in kbps) obtained by encoding the depth maps using the proposed and existing in-loop filters: H.264/AVC deblocking filter, proposed filter, trilateral filter, and boundary reconstruction filter. (a) Ballet. (b) Breakdancers.]

Fig. 9. Sample frames of the reconstructed depth map and rendered view for the Ballet sequence obtained by different in-loop filters: (a) H.264/AVC deblocking filter, (b) boundary reconstruction filter, (c) trilateral filter, (d) proposed in-loop filter; (e)-(h) synthesized images from (a)-(d), respectively.


Effects on Decoded Depth and Synthesized View


(a, e) H.264 deblocking; (b, f) boundary reconstruction; (c, g) trilateral; (d, h) proposed

Effect on Depth Coding Performance


[Fig. 10. RD curves (synthesized view quality in dB versus depth bitrate in kbps) obtained by reconstructing the decoded depth video using different upsampling filters (proposed, nearest-neighbor, and boundary reconstruction) and by the regular depth coding at the original resolution. (a) Ballet. (b) Breakdancers.]

outperformed the regular depth coding over a wide range of bit rates. Furthermore, in comparison with the existing upsampling filters, the proposed upsampling filter also achieved more than a 1-dB PSNR gain in the virtual view in terms of the average Bjontegaard metric. For visualization, Fig. 11 shows sample frames of the depth video obtained by the different upsampling filters and the regular depth coding. The figures show that the proposed upsampling filter efficiently reconstructed the depth map at the original resolution. By preserving and recovering the depth edge information, it achieved a better visual quality of the synthesized view.

C. Depth Dynamic Range Reduction

In the evaluation of the down/upscaling approach for the dynamic range, the original bit depths of the depth video were reduced to 6 bits and 7 bits, without spatial downsampling, prior to encoding. After decoding, the proposed depth dynamic range upscaling was used to reconstruct the original dynamic range of the depth data.

[Fig. 11. Sample frames of the reconstructed depth map and rendered view for the Breakdancers test sequence: (a) regular depth coding, (b) nearest-neighbor upsampling filter, (c) depth upsampling boundary reconstruction filter, (d) proposed upsampling filter; (e)-(h) synthesized images from (a)-(d), respectively.]

Fig. 12 shows the RD curves obtained by this approach with different numbers of bits per sample, alongside the regular depth coding using the original depth data. Not surprisingly, encoding the depth video at the lower dynamic range reduced the total depth bit rate notably. Furthermore, owing to the effectiveness of the proposed filter, the depth video was reconstructed well at the original dynamic range, which is reflected in the high quality of the virtual view. Specifically, using the proposed approach with the 7-bit depth video, we achieved bit rate savings of approximately 27.8% and 46.1%, in terms of the average Bjontegaard metric, for the Ballet and Breakdancers sequences, respectively, while retaining the same virtual view quality. However, the 6-bit depth
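The dynamic range reduction and increase blocks amount to a bit-depth round trip around the codec. A minimal sketch of quantizing 8-bit depth to N bits and naively expanding it back by bit replication (the proposed GMF-based dynamic range upscaling is more sophisticated than this; the 8-bit source depth is an assumption):

```python
import numpy as np

def reduce_bits(depth8, n_bits):
    """Drop an 8-bit depth map to n_bits by discarding low-order bits."""
    return depth8 >> (8 - n_bits)

def expand_bits(depth_n, n_bits):
    """Naive expansion back to 8 bits by bit replication (no filtering).

    Replicating the high bits into the vacated low bits maps 0 -> 0 and
    the maximum code back to 255, so the full output range is spanned.
    """
    shift = 8 - n_bits
    up = depth_n << shift
    return up | (depth_n >> max(0, n_bits - shift))
```

For 7-bit data the round-trip error is at most a couple of gray levels, which is why most of the coding loss comes from quantization inside the codec rather than from the range reduction itself.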


Depth-Color Compression using Shape-Adaptive Wavelets

Ø  Reconstructed depth map at 0.04 bpp: a gain of 5 dB!

Increased Reconstruction Quality

Standard wavelets (9/7 with symmetric ext.) vs. shape-adaptive wavelets (9/7 with linear ext.)

Registration and Integration of Depth Frames in Video


Preliminary Result


Raw depth Registration Integration


Conclusion

Ø  Depth is the key for future immersive visual communications

•  Image-based rendering

•  Free-viewpoint 3D video

•  Eye-gaze correction

•  Background subtraction and replacement

Ø  Noisy and low-resolution depth input can be effectively improved by:

•  Combining with color

•  Integration over time

Ø  Challenges:

•  Real-time processing @ video rate

•  User experience and quality assessment

References

§  Q. H. Nguyen, M. N. Do, and S. J. Patel, “Depth image-based rendering using low resolution depth,” ICIP, 2009.

§  H. T. Nguyen and M. N. Do, “Error analysis for image-based rendering with depth information,” IEEE Transactions on Image Processing, Apr. 2009.

§  M. Maitre and M. N. Do, “Depth and depth-color coding using shape-adaptive wavelets,” Journal of Visual Communication and Image Representation, July 2010.

§  M. N. Do, Q. H. Nguyen, H. T. Nguyen, D. Kubacki, and S. J. Patel, “Immersive visual communication with depth cameras and parallel computing,” IEEE Signal Processing Magazine, Jan. 2011.

§  D. Min, J. Lu, and M. N. Do, “Depth video enhancement based on weighted mode filtering,” IEEE Transactions on Image Processing, to appear.


Commercializing Depth-based Visual Communication
