
Introduction

Compression Performance

Conclusions

Large Camera Arrays

• Capture multi-viewpoint images of a scene/object.

• Potential applications abound: surveillance, special movie effects, image-based rendering [Levoy ’96].

• Joint encoding of multiple views cannot be used.

Distributed Compression for Large Camera Arrays

• A distributed compression scheme for large camera arrays.

• Low-complexity Wyner-Ziv encoder

• Allows independent encoding of each camera view but centralized decoding to exploit inter-viewpoint image similarities.

• The availability of rendered side information and the use of shape adaptation techniques enhance compression efficiency.

• Experimental results show superior rate-PSNR performance over JPEG2000 and a JPEG-like SA-DCT coder, especially at low bit rates.

• Pixel-domain coding and shape adaptation help to avoid blurry edges around the object (e.g., in JPEG2000) and blocky artifacts from block-based transforms (e.g., in the SA-DCT coder).

Xiaoqing Zhu, Anne Aaron and Bernd Girod

Department of Electrical Engineering, Stanford University

System Description

Rendering of Side Information

• The geometry model is reconstructed from silhouette information of the conventional camera views.

• Side information for the Wyner-Ziv camera views is rendered based on pixel correspondences derived from the geometry.
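Reconstructing geometry from silhouettes can be illustrated with a toy voxel-carving sketch. The poster does not say which reconstruction algorithm is used, so everything here is an assumption made for illustration: the function name `carve_visual_hull`, the orthographic cameras aligned with the coordinate axes, and the tiny 4x4x4 volume.

```python
from itertools import product

def carve_visual_hull(silhouettes, size):
    """Keep the voxels whose projection lies inside every silhouette.

    silhouettes maps a projection axis (0, 1 or 2) to a size x size grid of
    booleans. The view along an axis simply drops that coordinate, i.e. the
    cameras are orthographic (an illustrative simplification).
    """
    hull = set()
    for voxel in product(range(size), repeat=3):
        inside_all = True
        for axis, mask in silhouettes.items():
            # Project the voxel by discarding the coordinate along the view axis.
            u, v = [c for i, c in enumerate(voxel) if i != axis]
            if not mask[u][v]:
                inside_all = False
                break
        if inside_all:
            hull.add(voxel)
    return hull

# Toy example: three axis-aligned views that each see a centred 2x2 square.
SIZE = 4
square = [[1 <= u <= 2 and 1 <= v <= 2 for v in range(SIZE)] for u in range(SIZE)]
hull = carve_visual_hull({0: square, 1: square, 2: square}, SIZE)
print(len(hull))
```

With real cameras the projection step would use calibrated camera matrices instead of dropping a coordinate. The carved volume then provides the pixel correspondences from which a Wyner-Ziv view can be rendered.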

Encoder Complexity

[Figure: bar chart of CPU execution time in milliseconds (ms) per picture for the Wyner-Ziv coder and JPEG2000 on the Buddha and Garfield data sets. Bar labels as extracted: 0.54, 0.18, 1.2, 1.4.]

Basic Operations

• The Wyner-Ziv encoder needs:

  • 1 quantization step and 3 look-up-table procedures per pixel

  • shape extraction and coding

• The JPEG2000 compressor needs:

  • Multi-level 2-D DWT: ~5 multiplications per pixel

  • Context-based arithmetic coding
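To give a feel for why per-pixel quantization and table look-ups are cheap, here is a minimal sketch of a uniform scalar quantizer implemented as a precomputed table. The number of levels, the 8-bit pixel range and the names `build_quantizer_lut` and `quantize_pixels` are assumptions; the poster does not specify the quantizer design or what the three look-up tables contain.

```python
def build_quantizer_lut(levels, max_value=255):
    """Precompute the quantization index for every possible pixel value."""
    step = (max_value + 1) / levels
    return [min(int(v / step), levels - 1) for v in range(max_value + 1)]

# Assumed setting: 4 quantization levels for 8-bit pixels.
LUT = build_quantizer_lut(levels=4)

def quantize_pixels(pixels, lut=LUT):
    """Quantize each pixel with a single table look-up, no multiplications."""
    return [lut[p] for p in pixels]

print(quantize_pixels([0, 63, 64, 200, 255]))  # prints [0, 0, 1, 3, 3]
```

Once the table is built, the per-pixel work is a single index operation, in contrast to the several multiplications per pixel that a multi-level wavelet transform requires.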

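[Figure: system block diagram. Distributed encoding: each Wyner-Ziv camera feeds its own WZ-ENC, alongside the conventional cameras. Centralized decoding: geometry reconstruction produces geometry information, rendering produces side information, and each Wyner-Ziv view is decoded by a WZ-DEC.]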

Shape Adaptation

• Only pixels within the object shape are encoded.

• Object shapes are obtained by chroma keying, compressed with JBIG, and then transmitted to the decoder.
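The shape adaptation step can be sketched as follows: a chroma-key test yields a binary object mask, and only the pixels inside the mask are passed on to the coder. The key colour, the tolerance and the function names `chroma_key_mask` and `object_pixels` are illustrative assumptions, and the JBIG compression of the mask is not shown.

```python
def chroma_key_mask(image, key_rgb, tolerance):
    """Return a binary mask that is True where the pixel is not background."""
    def is_background(pixel):
        return all(abs(c - k) <= tolerance for c, k in zip(pixel, key_rgb))
    return [[not is_background(px) for px in row] for row in image]

def object_pixels(image, mask):
    """Collect, in raster order, only the pixels inside the object shape."""
    return [px for row, mrow in zip(image, mask) for px, m in zip(row, mrow) if m]

# Toy 2x3 image shot against an (assumed) blue backdrop.
BLUE = (0, 0, 255)
image = [
    [BLUE, (200, 180, 40), BLUE],
    [(190, 170, 35), (210, 185, 50), (2, 3, 250)],
]
mask = chroma_key_mask(image, key_rgb=BLUE, tolerance=10)
to_encode = object_pixels(image, mask)
print(len(to_encode), "of 6 pixels are encoded")
```

Because the decoder receives the same mask, it knows which pixel positions the coded values belong to, so no bits are spent on the background.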

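Wyner-Ziv Codec

[Figure: Wyner-Ziv codec block diagram. Wyner-Ziv encoder: input X, scalar quantizer (output Q), turbo coder, buffer. Wyner-Ziv decoder: turbo decoder, reconstruction, output X'. Parity bits go from the buffer to the turbo decoder, request bits go back to the encoder, and the side information Y enters the decoder.]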
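The reconstruction stage of a Wyner-Ziv decoder combines the decoded quantization index with the side information Y. One simple rule, assumed here for illustration rather than taken from the poster, keeps the side-information value when it falls inside the decoded quantization bin and otherwise clamps it to the nearest bin boundary. The function name `reconstruct` and the uniform 4-level quantizer are likewise assumptions.

```python
def reconstruct(q_index, side_info, levels, max_value=255):
    """Clamp the side-information pixel to the decoded quantization bin.

    Assumes a uniform quantizer whose number of levels divides the pixel
    range evenly (e.g. 4 levels over 0..255).
    """
    step = (max_value + 1) // levels
    low = q_index * step              # smallest value in the bin
    high = (q_index + 1) * step - 1   # largest value in the bin
    return min(max(side_info, low), high)

# Bin 1 of a 4-level quantizer covers pixel values 64..127.
print(reconstruct(1, 100, levels=4))  # side information inside the bin
print(reconstruct(1, 50, levels=4))   # side information below the bin
print(reconstruct(1, 140, levels=4))  # side information above the bin
```

Under this rule the reconstruction error is bounded by the bin width even when the rendered side information is poor, while accurate side information passes through unchanged.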

• The Wyner-Ziv coder is compared with JPEG2000 and an SA-DCT coder, using the synthetic Buddha and the real-world Garfield data sets.

• Shape information is derived from perfect geometry for Buddha and coded at 0.0814 bpp for Garfield. The overhead of shape coding is included in the rates reported for the Wyner-Ziv coder and the SA-DCT coder.
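The two quantities behind a rate-PSNR comparison can be computed as in the sketch below. The PSNR definition for 8-bit pixels is standard. How the poster normalizes the rate and which pixels enter the PSNR are not stated, so the helper `total_rate_bpp`, the pixel count and the sample values are assumptions. Only the 0.0814 bpp shape overhead comes from the text above.

```python
import math

def psnr(original, reconstructed, peak=255):
    """Peak signal-to-noise ratio in dB between two equal-length pixel lists."""
    mse = sum((a - b) ** 2 for a, b in zip(original, reconstructed)) / len(original)
    return float("inf") if mse == 0 else 10 * math.log10(peak ** 2 / mse)

def total_rate_bpp(texture_bits, num_pixels, shape_bpp):
    """Texture rate in bits per pixel plus the shape-coding overhead."""
    return texture_bits / num_pixels + shape_bpp

# Hypothetical numbers for illustration only.
quality_db = psnr([100, 120, 140, 160], [101, 119, 142, 158])
rate_bpp = total_rate_bpp(texture_bits=5000, num_pixels=100_000, shape_bpp=0.0814)
print(f"{rate_bpp:.4f} bpp, {quality_db:.2f} dB")
```

Adding the shape overhead to the texture rate is what makes the comparison with JPEG2000 fair, since JPEG2000 codes the full frame and needs no separate shape stream.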

[Aaron ’02]

Shape Architecture

Proposed Scheme

• Apply Wyner-Ziv coding to multi-viewpoint images.

• Encode the images in a distributed fashion and decode them jointly, so that the decoder benefits from the inter-viewpoint coherence.

Stanford Camera Array, Courtesy of Computer Graphics Lab, Stanford

Rate-PSNR Curve

[Figure: rate-PSNR curves, PSNR (dB) versus rate (bpp), for the Wyner-Ziv, JPEG2000 and SA-DCT coders. Buddha panel: 0 to 0.3 bpp, 31 to 45 dB. Garfield panel: 0.05 to 0.25 bpp, 37 to 47 dB.]

Reconstructed Images

[Figure: reconstructed images from the JPEG2000, SA-DCT and Wyner-Ziv coders, with rate and PSNR labels as extracted.]

• Rate = 0.11 bpp, PSNR = 39.87 dB

• Rate = 0.12 bpp, PSNR = 38.89 dB

• Rate = 0.11 bpp, PSNR = 37.43 dB

• Rate = 0.13 bpp, PSNR = 42.68 dB

• Rate = 0.15 bpp, PSNR = 41.86 dB

• Rate = 0.13 bpp, PSNR = 44.08 dB

[Ramanathan ’01]

Contact: [email protected]