
A Picture is Worth a Thousand Words

Milton Chen

What’s a Picture Worth?

• A thousand words - Descartes (1596-1650)

• A thousand bytes - modern translation
– 1000 × 5 × 5 / 3 ≈ 8,000 bits

• 75,000 bytes - ATSC/MPEG-2
– 20 M / 30 ≈ 600,000 bits
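The slide's arithmetic can be reproduced directly. The slide gives only the numbers, so what the factors mean is an assumption here: about 5 characters per word, about 5 bits per character, a further 3:1 text compression, and an ATSC/MPEG-2 stream of roughly 20 Mbit/s at 30 frames/s. `text_bits` and `video_frame_bits` are hypothetical helper names.

```python
# Back-of-envelope "worth" of a picture, following the slide's arithmetic.
# ASSUMED meanings of the factors (the slide does not state them):
# 5 characters per word, 5 bits per character, 3:1 text compression,
# and a 20 Mbit/s stream carrying 30 frames per second.

def text_bits(words=1000, chars_per_word=5, bits_per_char=5, compression=3):
    """Bits needed for `words` words of compressed text."""
    return words * chars_per_word * bits_per_char / compression


def video_frame_bits(bitrate_bps=20e6, frames_per_second=30):
    """Average bits available per frame in a constant-bitrate stream."""
    return bitrate_bps / frames_per_second


if __name__ == "__main__":
    t = text_bits()
    v = video_frame_bits()
    print(f"text:  {t:,.0f} bits = {t / 8:,.0f} bytes")
    print(f"frame: {v:,.0f} bits = {v / 8:,.0f} bytes")
```

Exact division gives about 8,333 bits for the text and about 667,000 bits per frame; the slide quotes rounded figures, and the point is the roughly two orders of magnitude between them.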

Frequency Response of the Eye

• Lens - low pass

• Photoreceptors - low pass

• Lateral inhibition - high pass
– edges are important
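A toy 1-D model shows why lateral inhibition behaves like a high-pass filter: each receptor's output is its own input minus a fraction of its two neighbors' inputs. The inhibition weight `k = 0.25` and the function name are illustrative assumptions, not physiological values.

```python
# Toy 1-D lateral inhibition: output = own input - k * (left + right input).
# k = 0.25 is an ASSUMED illustrative weight.

def lateral_inhibition(signal, k=0.25):
    n = len(signal)
    out = []
    for i in range(n):
        # Replicate the border samples so the ends see no artificial edge.
        left = signal[max(i - 1, 0)]
        right = signal[min(i + 1, n - 1)]
        out.append(signal[i] - k * (left + right))
    return out


step = [0, 0, 0, 1, 1, 1]  # a sharp luminance edge
print(lateral_inhibition(step))  # [0.0, 0.0, -0.25, 0.75, 0.5, 0.5]
```

Uniform regions are attenuated (1 becomes 0.5), while the samples on either side of the step undershoot and overshoot, so the edge stands out relative to the flat areas.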

Today’s Video Coding

YUV (lossy) → Motion → DCT → Quantize (lossy) → Order → Entropy

Designed for natural scenes ⇒ higher-frequency DCT coefficients are quantized more coarsely ⇒ sharp edges are not well preserved
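The chain of reasoning above can be checked on a single 8-sample row: a smooth gradient puts almost no energy into the upper DCT coefficients, while a sharp edge puts a noticeable share there, so quantizing those coefficients coarsely hurts the edge far more. This is a minimal 1-D sketch; as an assumption it uses the extreme case of coarse quantization (zeroing the upper half of the coefficients) rather than an actual MPEG quantization matrix, and the helper names are hypothetical.

```python
import math

N = 8  # block length (MPEG uses 8x8 blocks; 1-D here for brevity)


def _scale(k):
    # Orthonormal DCT-II scaling factors.
    return math.sqrt(1 / N) if k == 0 else math.sqrt(2 / N)


def dct(x):
    """Orthonormal 1-D DCT-II."""
    return [_scale(k) * sum(x[n] * math.cos(math.pi * (2 * n + 1) * k / (2 * N))
                            for n in range(N))
            for k in range(N)]


def idct(X):
    """Inverse of the orthonormal DCT-II (a DCT-III)."""
    return [sum(_scale(k) * X[k] * math.cos(math.pi * (2 * n + 1) * k / (2 * N))
                for k in range(N))
            for n in range(N)]


def high_freq_share(x):
    """Fraction of the signal energy in the upper half of the coefficients."""
    X = dct(x)
    return sum(c * c for c in X[N // 2:]) / sum(c * c for c in X)


def max_error_without_high(x):
    """Max reconstruction error after zeroing the upper half of the coefficients,
    standing in (ASSUMPTION) for very coarse high-frequency quantization."""
    X = dct(x)
    X = X[:N // 2] + [0.0] * (N - N // 2)
    return max(abs(a - b) for a, b in zip(x, idct(X)))


ramp = [255 * n / (N - 1) for n in range(N)]  # smooth "natural" gradient
edge = [0] * (N // 2) + [255] * (N // 2)      # sharp, text-like edge

for name, sig in (("ramp", ramp), ("edge", edge)):
    print(f"{name}: high-freq energy {high_freq_share(sig):.2%}, "
          f"max error {max_error_without_high(sig):.1f}")
```

Both signals span 0 to 255, so the difference in error comes from where their energy sits in frequency, not from their amplitude.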

What’s Wrong with Today’s Video Coding

• Poor performance for
– text (channel logos, stock tickers)
– graphics
– anything with sharp edges

Desirable Features

• Postproduction support

• Personalized delivery / presentation

• Interactive

• Error resilience

• More compression

• Facilitate search / indexing (MPEG-7)

Outline

• Why

• MPEG-4 Overview

• Systems Layer

• Visual Coding
– Arbitrarily shaped video
– Meshed video
– Face and body

Goals of MPEG-4

• One content
– convergence of DTV, computer graphics, and WWW
– broadcast, internet, local

• User interactivity

• Higher compression rates

• Robustness in mobile environment

MPEG-4 Applications

• Interactive TV (broadcast)
– home shopping, interactive game shows

• Virtual workspace (internet)
– virtual meetings, collaborative design

• Infotainment (local)
– Virtual-City-Guide

MPEG-4 Key Concepts

• Independent coding of objects
– allows user interactivity (client & server)
– higher compression rates

• Provide tools as well as solutions
– allow content-specific and user-defined compression algorithms

MPEG-4 History

• Started in July 1993

• Originally for low-bit-rate applications

• Version 1 to be standardized by January 1999

• Continue work on version 2, etc.

MPEG-4 Standard

1) Systems (manage streams, composition)

2) Visual (natural and synthetic)

3) Audio (natural and synthetic)

4) Conformance Testing

5) Reference Software

6) Delivery Multimedia Integration Framework (medium abstraction layer)

[Figure: an MPEG-4 audiovisual scene. Audiovisual objects (voice, sprite, 2D background, 3D objects) arrive as hierarchically multiplexed downstream control/data; the video and audio compositors place them in a scene coordinate system (x, y, z) and project them onto the display plane and speaker for a hypothetical viewer, while user input returns as user events over hierarchically multiplexed upstream control/data.]

[Figure: MPEG-4 systems terminal. Data from the transmission/storage medium passes as TransMux streams → FlexMux streams → AL-packetized streams → elementary streams, which are decoded into an audiovisual interactive scene for composition and rendering, display, and user interaction.]

[Figure: MPEG-4 systems layers, top to bottom]

• Compression Layer: primitive AV objects, scene description information, object descriptors, return channel coding

• Elementary Stream Interface

• Access Unit Layer: one AL per elementary stream

• Stream Multiplex Interface

• FlexMux Layer: several FlexMux instances

• TransMux Interface

• TransMux Layer: (RTP)/UDP/IP, (PES)/MPEG-2 TS, AAL2/ATM, H.223/PSTN, DAB Mux, ...
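The FlexMux layer interleaves packets from several AL-packetized streams into one stream. The sketch below is loosely modeled on FlexMux's simple mode, where each packet carries a channel index and a payload length; the exact header layout used here (one byte each) and the function names are assumptions for illustration, not the normative syntax.

```python
# Simplified FlexMux-style multiplexing (ASSUMED layout):
#   [1-byte channel index][1-byte payload length][payload bytes]

def flexmux(packets):
    """packets: iterable of (channel, payload_bytes) in transmission order."""
    out = bytearray()
    for channel, payload in packets:
        if not (0 <= channel <= 255 and len(payload) <= 255):
            raise ValueError("channel and length must each fit in one byte")
        out += bytes([channel, len(payload)]) + payload
    return bytes(out)


def deflexmux(stream):
    """Split a multiplexed byte stream back into per-channel payload lists."""
    channels = {}
    i = 0
    while i < len(stream):
        channel, length = stream[i], stream[i + 1]
        channels.setdefault(channel, []).append(stream[i + 2:i + 2 + length])
        i += 2 + length
    return channels


muxed = flexmux([(0, b"scene"), (1, b"video-AU"), (2, b"audio-AU"), (1, b"video-AU2")])
print(deflexmux(muxed))
# {0: [b'scene'], 1: [b'video-AU', b'video-AU2'], 2: [b'audio-AU']}
```

Each elementary stream keeps its own channel index, so the demultiplexer can hand packets back to the right access-unit layer without inspecting their content.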

Previous Work in Object Coding

• Synthetic Highs System (Schreiber ’59)

• Contour-Texture Approach (Kocher & Kunt ’82)

• Object-Based Video Coder (Musmann et al. ’89)

• Talisman (Torborg & Kajiya ’96)

• Blue screen matting (Vlahos ’64)

Shape Coding

• Bitmap-based
– 1 means in, 0 means out
– chroma-keying, GIF89a
– G4 fax standard

• Contour-based
– chain code
– polygon/curve approximation
– Fourier descriptor

Chain Code

• Follows the contour and encodes the direction of the next boundary pel

• 4 or 8 directions, for an average of 1.2 or 1.4 bits per boundary pel

• Extensions
– length
– angular resolution
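A minimal 8-direction (Freeman) chain code sketch follows. Direction numbering (0 = east, counter-clockwise in 45° steps, y pointing up) and the helper names are assumptions. The raw codes cost 3 bits each; the differential form shows the direction changes that an entropy coder can exploit to reach the lower averages quoted above.

```python
# 8-direction chain code. ASSUMED numbering: 0 = east, then counter-clockwise
# in 45-degree steps, with x growing right and y growing up.
DIRS = [(1, 0), (1, 1), (0, 1), (-1, 1), (-1, 0), (-1, -1), (0, -1), (1, -1)]


def chain_encode(contour):
    """contour: list of (x, y) boundary pels, each 8-connected to the next."""
    return [DIRS.index((x1 - x0, y1 - y0))
            for (x0, y0), (x1, y1) in zip(contour, contour[1:])]


def chain_decode(start, codes):
    """Rebuild the boundary pels from the start pel and the direction codes."""
    x, y = start
    pts = [start]
    for c in codes:
        dx, dy = DIRS[c]
        x, y = x + dx, y + dy
        pts.append((x, y))
    return pts


def differential(codes):
    """Direction changes mod 8; on smooth contours these cluster near 0."""
    return [(b - a) % 8 for a, b in zip(codes, codes[1:])]


square = [(0, 0), (1, 0), (2, 0), (2, 1), (2, 2), (1, 2), (0, 2), (0, 1), (0, 0)]
codes = chain_encode(square)
print(codes)                # [0, 0, 2, 2, 4, 4, 6, 6]
print(differential(codes))  # [0, 2, 0, 2, 0, 2, 0]
```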

Polygon Approximation

• Add control points until maximum error is below threshold

• Threshold ≤ 1.4 pel for CIF (352 × 288) video

• Extension
– curves of various order
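The "add control points until the maximum error is below threshold" loop can be sketched as follows: start from the two end points of an open contour and repeatedly promote the contour point farthest from its current polygon edge. This is a split-based scheme in the spirit of Ramer–Douglas–Peucker, chosen here as an assumption; the names are hypothetical, and a closed contour would first need two initial vertices (for example, two far-apart points).

```python
import math


def point_segment_distance(p, a, b):
    """Distance from point p to segment ab."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == dy == 0:
        return math.hypot(px - ax, py - ay)
    # Project p onto the segment and clamp to its end points.
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def approximate(contour, threshold=1.4):
    """Return polygon control points whose max deviation is <= threshold pels."""
    keep = {0, len(contour) - 1}
    while True:
        idx = sorted(keep)
        worst, worst_i = 0.0, None
        # Find the contour point farthest from the polygon edge that spans it.
        for a, b in zip(idx, idx[1:]):
            for i in range(a + 1, b):
                d = point_segment_distance(contour[i], contour[a], contour[b])
                if d > worst:
                    worst, worst_i = d, i
        if worst <= threshold:
            return [contour[i] for i in idx]
        keep.add(worst_i)  # add a control point where the error is largest


corner = [(0, 0), (1, 0), (2, 0), (3, 0), (4, 0), (4, 1), (4, 2), (4, 3), (4, 4)]
print(approximate(corner))  # [(0, 0), (4, 0), (4, 4)]
```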

Fourier Descriptor

• Translation, rotation, and scale invariant

• Sample the contour → $(x_i, y_i)$

• $\theta_i = \tan^{-1}\frac{y_{i+1} - y_i}{x_{i+1} - x_i}$ (tangent angle at each sample)

• Compute the Fourier series coefficients

• Good for recognition, but not an efficient shape coder
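The invariances can be demonstrated with a common variant of the Fourier descriptor that treats each contour sample as a complex number $z_i = x_i + j y_i$ instead of using the tangent angle; the swap to this variant is deliberate and is an assumption of the sketch. Dropping the DC term removes translation, dividing by the first harmonic's magnitude removes scale, and keeping only magnitudes removes rotation and the choice of starting point. The function name is hypothetical, and the sketch assumes the first harmonic is nonzero (true for an ordinary counter-clockwise contour).

```python
import cmath


def fourier_descriptor(points, n_coeffs=8):
    """Complex-contour Fourier descriptor of a closed, sampled contour."""
    z = [complex(x, y) for x, y in points]
    n = len(z)
    # Plain DFT of the complex contour samples.
    coeffs = [sum(z[i] * cmath.exp(-2j * cmath.pi * k * i / n) for i in range(n)) / n
              for k in range(n)]
    scale = abs(coeffs[1])  # ASSUMES the first harmonic is nonzero
    # Skip coeffs[0] (translation); normalize by |coeffs[1]| (scale);
    # magnitudes only (rotation and starting point).
    return [abs(coeffs[k]) / scale for k in range(1, min(n_coeffs, n - 1) + 1)]


square = [(0, 0), (1, 0), (2, 0), (2, 1), (2, 2), (1, 2), (0, 2), (0, 1)]
print([round(d, 3) for d in fourier_descriptor(square)])
```

Because the descriptor discards phase, two quite different shapes can share similar magnitudes, and reconstructing the exact contour from it is not possible; this is consistent with the slide's point that it suits recognition better than coding.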

MPEG-4 Experiments

• Chroma-keying
– color bleeding
– need to decode the whole frame to get the shape

• Bitmap and contour-based coding are similar in
– error resilience
– coding efficiency

• Bitmap-based is simpler for hardware due to regular memory access

MPEG-4 Shape Coding

• Three types of macroblocks
– transparent, opaque, and object boundary

• Context-based arithmetic encoder

• Macroblocks can be subsampled

• Texture padded with 0 or mean value

• Transparency
– constant: one 8-bit value
– arbitrary: treat it like color
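The macroblock classification and the mean-value padding option above can be sketched on a binary alpha mask. This follows the slide's description only; the context-based arithmetic coder itself is not shown, the normative MPEG-4 padding procedure may differ, and the function names are hypothetical. The usual motivation for padding, assumed here, is that a flat fill outside the object avoids coding an artificial edge at the object boundary.

```python
MB = 16  # macroblock size in pels


def classify_macroblock(alpha):
    """alpha: MB x MB binary mask (1 = inside the object, 0 = outside)."""
    inside = sum(sum(row) for row in alpha)
    if inside == 0:
        return "transparent"  # nothing to code
    if inside == MB * MB:
        return "opaque"       # texture only, no shape bits needed
    return "boundary"         # shape must be coded


def pad_with_mean(texture, alpha):
    """Replace texture outside the object with the mean of the inside pels."""
    vals = [t for trow, arow in zip(texture, alpha)
            for t, a in zip(trow, arow) if a]
    mean = round(sum(vals) / len(vals)) if vals else 0
    return [[t if a else mean for t, a in zip(trow, arow)]
            for trow, arow in zip(texture, alpha)]


half = [[1] * 8 + [0] * 8 for _ in range(MB)]  # left half inside the object
print(classify_macroblock(half))  # boundary
```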

Meshed Video

• 2D mesh tessellates the video into patches

• Motion vector for each vertex

• Texture warped in each patch
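For a triangular patch, the three vertex motion vectors fix a six-parameter affine map, which is how per-vertex motion can warp texture inside the patch and capture rotation, scaling, reflection, and shear that a single translational block vector cannot. Triangular patches and the helper names are assumptions of this sketch.

```python
def solve3(m, v):
    """Solve the 3x3 linear system m @ u = v by Cramer's rule."""
    def det(a):
        return (a[0][0] * (a[1][1] * a[2][2] - a[1][2] * a[2][1])
                - a[0][1] * (a[1][0] * a[2][2] - a[1][2] * a[2][0])
                + a[0][2] * (a[1][0] * a[2][1] - a[1][1] * a[2][0]))
    d = det(m)
    if d == 0:
        raise ValueError("degenerate (collinear) triangle")
    out = []
    for col in range(3):
        # Replace column `col` with the right-hand side.
        mc = [row[:col] + [v[r]] + row[col + 1:] for r, row in enumerate(m)]
        out.append(det(mc) / d)
    return out


def affine_from_triangle(vertices, motion_vectors):
    """Affine map x' = a*x + b*y + c, y' = d*x + e*y + f that moves each
    vertex by its motion vector (six unknowns, six equations)."""
    m = [[x, y, 1] for x, y in vertices]
    xs = [x + mx for (x, _), (mx, _) in zip(vertices, motion_vectors)]
    ys = [y + my for (_, y), (_, my) in zip(vertices, motion_vectors)]
    a, b, c = solve3(m, xs)
    d, e, f = solve3(m, ys)
    return lambda x, y: (a * x + b * y + c, d * x + e * y + f)


# Vertex motion that rotates the patch 90 degrees about the origin.
warp = affine_from_triangle([(0, 0), (1, 0), (0, 1)], [(0, 0), (-1, 1), (-1, -1)])
print(warp(1, 1))  # (-1.0, 1.0)
```

Texture for any pel inside the triangle can then be fetched from the warped position, typically with interpolation since the result is not an integer pel in general.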

Meshed Video - Motivation

• Motion Modeling
– translational block motion does not model rotation, scaling, reflection, and shear

• Shape Modeling
– possible without depth

Meshed Video - Applications

• Compression
– better motion compensation
– transmit texture only at key frames
– spatio-temporal interpolation (zooming, frame-rate up-conversion)

• Manipulation
– augmented reality
– transfiguration (replace billboards)

• Indexing / searching

Face

• Face object
– default face model in the terminal
– Facial Definition Parameters, or a user-supplied model/texture
– Facial Animation Parameters plus amplification and filters
– lip shape animation from phonemes

Facial Definition Parameter

[Figure: FDP feature points on frontal and profile views of the head (X, Y, Z axes), numbered group.index (e.g. 2.1, 3.5, 9.3, 11.1), with close-ups of the left and right eyes, nose, mouth, teeth, and tongue; feature points affected by FAPs are distinguished from other feature points.]

Facial Animation Parameter

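Facial animation parameter units (FAPUs), measured on the neutral face:

• ES0: eye separation

• ENS0: eye-nose separation

• MNS0: mouth-nose separation

• MW0: mouth width

• IRISD0: iris diameter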
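In MPEG-4, each of these distance units is the neutral-face measurement divided by 1024, and a FAP value is an integer count of such units, so the same FAP stream drives faces of different proportions. The sketch converts a FAP value to a feature-point displacement; the neutral-face sizes and function names are hypothetical.

```python
# FAPU = neutral-face distance / 1024. The sizes below are HYPOTHETICAL,
# in whatever length unit the face model uses.
NEUTRAL = {"ES0": 64.0, "ENS0": 40.0, "MNS0": 30.0, "MW0": 50.0, "IRISD0": 12.0}


def fapu(name, neutral=NEUTRAL):
    """Size of one facial animation parameter unit for this model."""
    return neutral[name] / 1024.0


def fap_displacement(fap_value, unit, neutral=NEUTRAL):
    """Feature-point displacement for a FAP value expressed in `unit`."""
    return fap_value * fapu(unit, neutral)


print(fap_displacement(512, "MNS0"))  # half a mouth-nose separation: 15.0
```

Because the units scale with the model, a FAP of 512 in MNS units moves the feature point by half the mouth-nose separation on any face.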

Body

• Like the face

Ultimate Compression Technique: Computer Graphics???

• Block based DCT (MPEG-1/2)

• Arbitrary shaped video (MPEG-4)

• Meshed video (MPEG-4)

• Image based rendering

• Textured 3D graphics

• Geometry only 3D graphics
