UNIT V
Video Compression
Outline
1. Introduction to Video Compression
2. Video Compression with Motion Compensation
3. Search for Motion Vectors
4. H.261
5. H.263
6. MPEG-1, 2, 4, and 7
7. Digital Video Interactive (DVI)
Introduction to Video Compression
A video consists of a time-ordered sequence of frames, i.e., images.
An obvious solution to video compression would be predictive coding based on previous frames: compression proceeds by subtracting images, i.e., subtracting in time order and coding the residual error.
It can be done even better by searching the previous frame for just the right parts to subtract from the current frame.
Video Compression with Motion Compensation
Consecutive frames in a video are similar: temporal redundancy exists.
Temporal redundancy is exploited so that not every frame of the video needs to be coded independently as a new image.
Instead, the difference between the current frame and other frame(s) in the sequence is coded: small values and low entropy, good for compression.
Video Compression with Motion Compensation
Steps of video compression based on Motion Compensation (MC):
1. Motion estimation (motion vector search).
2. MC-based prediction.
3. Derivation of the prediction error, i.e., the difference.
Motion Compensation
Each image is divided into macroblocks of size N×N.
• By default, N = 16 for luminance images.
• For chrominance images, N = 8 if 4:2:0 chroma subsampling is adopted.
Motion Compensation
Motion compensation is performed at the macroblock level.
• The current image frame is referred to as the Target frame.
• A match is sought between the macroblock in the Target frame and the most similar macroblock in previous and/or future frame(s), called the Reference frame(s).
• The displacement of the reference macroblock to the target macroblock is called a motion vector (MV).
Fig. 10.1: Macroblocks and Motion Vector in Video Compression.
Figure 10.1 shows the case of forward prediction in which the Reference frame is taken to be a previous frame.
MV search is usually limited to a small immediate neighborhood, with both horizontal and vertical displacements in the range [−p, p]; this makes a search window of size (2p + 1) × (2p + 1).
Search for Motion Vectors
The difference between two macroblocks can be measured by their Mean Absolute Difference (MAD):

MAD(i, j) = (1 / N²) · Σ_{k=0..N−1} Σ_{l=0..N−1} | C(x + k, y + l) − R(x + i + k, y + j + l) |

where N is the macroblock size, C(x + k, y + l) are pixels in the macroblock of the Target frame, R(x + i + k, y + j + l) are pixels in the macroblock of the Reference frame, and (i, j) is the candidate displacement.
Search for Motion Vectors
The goal of the search is to find the vector (i, j), taken as the motion vector MV = (u, v), such that MAD(i, j) is minimum:

(u, v) = { (i, j) | MAD(i, j) is minimum, i ∈ [−p, p], j ∈ [−p, p] }
Sequential Search
Sequential search: sequentially search the whole (2p + 1) × (2p + 1) window in the reference frame (also referred to as a full search or exhaustive search).
• A macroblock centered at each of the positions within the window is compared to the macroblock in the Target frame pixel by pixel, and its respective MAD is derived.
• The vector (i, j) that offers the least MAD is designated as the MV (u, v) for the macroblock in the Target frame.
• The sequential search method is very costly.
• Assuming each pixel comparison requires three operations (subtraction, absolute value, addition), the cost of obtaining a motion vector for a single macroblock is

(2p + 1) · (2p + 1) · N² · 3  ⇒  O(p²N²)
Motion-vector: sequential-search
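A minimal Python sketch of the sequential (exhaustive) search, not the standard's normative procedure: it assumes grayscale frames stored as NumPy arrays, a target macroblock anchored at pixel (x, y), and the hypothetical helper mad() implementing the formula above.

    import numpy as np

    def mad(target, ref, x, y, i, j, N):
        # Mean Absolute Difference between the N x N target macroblock
        # at (x, y) and the reference macroblock displaced by (i, j)
        C = target[y:y+N, x:x+N].astype(np.int32)
        R = ref[y+j:y+j+N, x+i:x+i+N].astype(np.int32)
        return np.abs(C - R).mean()

    def sequential_search(target, ref, x, y, N=16, p=15):
        # exhaustive search over the (2p+1) x (2p+1) window: O(p^2 N^2)
        best_mad, mv = float('inf'), (0, 0)
        for j in range(-p, p + 1):          # vertical displacement
            for i in range(-p, p + 1):      # horizontal displacement
                # skip candidates that fall outside the reference frame
                if not (0 <= x + i <= ref.shape[1] - N and
                        0 <= y + j <= ref.shape[0] - N):
                    continue
                d = mad(target, ref, x, y, i, j, N)
                if d < best_mad:
                    best_mad, mv = d, (i, j)
        return mv, best_mad

The doubly nested loop over the window is what makes the cost quadratic in p.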
2D Logarithmic Search
Logarithmic search: a cheaper version that is suboptimal but still usually effective. The procedure for the 2D logarithmic search of motion vectors takes several iterations and is akin to a binary search:
• Initially, only nine locations in the search window are used as seeds for a MAD-based search; they are marked as '1'.
• After the one that yields the minimum MAD is located, the center of the new search region is moved to it and the step-size (offset) is reduced to half.
• In the next iteration, the nine new locations are marked as ‘2’, and so on.
Fig. 10.2: 2D Logarithmic Search for Motion Vectors.
Motion-vector: 2D-logarithmic-search
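A minimal Python sketch of the 2D logarithmic search, reusing the hypothetical mad() helper from the sequential-search sketch above; the initial offset of roughly p/2 and the halving schedule follow the description above.

    def logarithmic_search(target, ref, x, y, N=16, p=15):
        # start from displacement (0, 0) with an offset of about p/2;
        # test 9 grid positions, recenter on the best, halve the offset
        cx, cy = 0, 0
        step = max(1, (p + 1) // 2)
        while True:
            best_mad, best = float('inf'), (cx, cy)
            for dj in (-step, 0, step):
                for di in (-step, 0, step):
                    i, j = cx + di, cy + dj
                    if abs(i) > p or abs(j) > p:
                        continue            # stay inside [-p, p]
                    if not (0 <= x + i <= ref.shape[1] - N and
                            0 <= y + j <= ref.shape[0] - N):
                        continue            # stay inside the frame
                    d = mad(target, ref, x, y, i, j, N)
                    if d < best_mad:
                        best_mad, best = d, (i, j)
            cx, cy = best
            if step == 1:
                return (cx, cy), best_mad   # offset 1 was the final pass
            step = max(1, step // 2)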
Using the same example as in the previous subsection, the cost of obtaining a motion vector for a single macroblock drops to

(8 · ⌈log₂ p⌉ + 9) · N² · 3  ⇒  O(N² · log p)

since there are nine seed positions plus eight new positions for each of the ⌈log₂ p⌉ offset-halving iterations.
Hierarchical Search
The search can benefit from a hierarchical (multiresolution) approach, in which an initial estimate of the motion vector is obtained from images with significantly reduced resolution.
Figure 10.3 shows a three-level hierarchical search in which the original image is at Level 0, the images at Levels 1 and 2 are obtained by down-sampling from the previous level by a factor of 2, and the initial search is conducted at Level 2.
Since the size of the macroblock is smaller and p can also be proportionally reduced, the number of operations required is greatly reduced.
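A compact and hypothetical sketch of the three-level idea, reusing mad() and sequential_search() from the earlier sketches: estimate the MV with a full search at the coarsest level, then refine it by ±1 pixel at each finer level (boundary checks, as in sequential_search, are omitted here for brevity).

    def downsample(img):
        # average 2x2 blocks to halve the resolution (one pyramid level)
        h, w = (img.shape[0] // 2) * 2, (img.shape[1] // 2) * 2
        a = img[:h, :w].astype(np.float32)
        return (a[0::2, 0::2] + a[0::2, 1::2] +
                a[1::2, 0::2] + a[1::2, 1::2]) / 4

    def hierarchical_search(target, ref, x, y, N=16, p=15, levels=3):
        tgt, rf = [target], [ref]
        for _ in range(levels - 1):                 # build the pyramids
            tgt.append(downsample(tgt[-1]))
            rf.append(downsample(rf[-1]))
        s = 2 ** (levels - 1)                       # full search at the top
        (u, v), _ = sequential_search(tgt[-1], rf[-1], x // s, y // s,
                                      N // s, max(1, p // s))
        for lvl in range(levels - 2, -1, -1):       # refine level by level
            u, v = 2 * u, 2 * v
            s = 2 ** lvl
            cands = [(u + di, v + dj) for dj in (-1, 0, 1)
                                      for di in (-1, 0, 1)]
            u, v = min(cands, key=lambda mv: mad(tgt[lvl], rf[lvl],
                                                 x // s, y // s,
                                                 mv[0], mv[1], N // s))
        return u, v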
Fig. 10.3: A Three-level Hierarchical Search for Motion Vectors.
Table 10.1 Comparison of Computational Cost of Motion Vector Search based on examples
H.261: an early digital video compression standard whose principle of MC-based compression is retained in all later video compression standards.
• The standard was designed for videophone, video conferencing and other audiovisual services over ISDN.
• The video codec supports bit-rates of p×64 kbps, where p ranges from 1 to 30.
• The standard requires that the delay of the video encoder be less than 150 msec, so that the video can be used for real-time, bidirectional video conferencing.
H.261
Table 10.2 Video Formats Supported by H.261
Fig. 10.4: H.261 Frame Sequence.
Two types of image frames are defined: Intra-frames (I-frames) and Inter-frames (P-frames).
I-frames are treated as independent images; a transform coding method similar to JPEG is applied within each I-frame.
P-frames are not independent: they are coded by a forward predictive coding method (prediction from a previous I-frame or P-frame is allowed).
H.261 Frame Sequence
Temporal redundancy removal is included in P-frame coding, whereas I-frame coding performs only spatial redundancy removal.
To avoid propagation of coding errors, an I-frame is usually sent a couple of times in each second of the video.
Motion vectors in H.261 are always measured in units of full pixels and have a limited range of ±15 pixels, i.e., p = 15.
H.261 Frame Sequence
Intra-frame (I-frame) Coding
Fig. 10.5: I-frame Coding.
Macroblocks are of size 16×16 pixels for the Y frame, and 8×8 for the Cb and Cr frames, since 4:2:0 chroma subsampling is employed. A macroblock consists of four Y, one Cb, and one Cr 8×8 blocks.
For each 8×8 block, a DCT transform is applied; the DCT coefficients then go through quantization, zigzag scan, and entropy coding.
Intra-frame (I-frame) Coding
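The per-block pipeline can be illustrated with a short Python sketch (hypothetical helper names; SciPy's dct is used for the transform, and quantization and entropy coding follow as described on the next slides):

    import numpy as np
    from scipy.fftpack import dct

    def dct2(block):
        # 2D type-II DCT with orthonormal scaling on an 8x8 block
        return dct(dct(block, axis=0, norm='ortho'), axis=1, norm='ortho')

    def zigzag(block):
        # return the 64 coefficients of an 8x8 block in zigzag scan order:
        # anti-diagonals in turn, alternating the traversal direction
        order = sorted(((i, j) for i in range(8) for j in range(8)),
                       key=lambda ij: (ij[0] + ij[1],
                                       ij[0] if (ij[0] + ij[1]) % 2 else ij[1]))
        return np.array([block[i, j] for i, j in order])

    # per-block I-frame pipeline: dct2 -> quantization -> zigzag -> entropy coding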
Inter-frame (P-frame) Coding
Fig. 10.6: H.261 P-frame Coding Based on Motion Compensation.
• For each macroblock in the Target frame, a motion vector is allocated by one of the search methods discussed earlier.
• After the prediction, a difference macroblock is derived to measure the prediction error.
• Each of these 8×8 blocks goes through DCT, quantization, zigzag scan, and entropy coding.
Inter-frame (P-frame) Coding
The P-frame coding encodes the difference macroblock (not the Target macroblock itself).
Sometimes, a good match cannot be found, i.e., the prediction error exceeds a certain acceptable level.
• The MB itself is then encoded (treated as an Intra MB) and in this case it is termed a non-motion compensated MB.
For the motion vector, only the difference MVD is sent for entropy coding:
• MVD = MV_Preceding − MV_Current
Inter-frame (P-frame) Coding
• The quantization in H.261 uses a constant step size for all DCT coefficients within a macroblock.
• If we use DCT and QDCT to denote the DCT coefficients before and after quantization, then for DC coefficients in Intra mode:
QDCT = round(DCT / 8)
Quantization in H.261
• For all other coefficients:
QDCT = ⌊DCT / (2 · scale)⌋
• scale: an integer in the range [1, 31].
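In code, the two quantizer formulas look roughly like this (a sketch assuming the textbook's formulation: a fixed step of 8 for the Intra DC coefficient, a step of 2·scale for everything else):

    import numpy as np

    def quantize_intra_dc(dc):
        # fixed step size of 8 for the DC coefficient of an Intra block
        return int(round(dc / 8.0))

    def quantize_other(coeff, scale):
        # uniform step size 2*scale for all other coefficients
        assert 1 <= scale <= 31            # scale is an integer in [1, 31]
        return int(np.floor(coeff / (2.0 * scale)))

For example, with scale = 8, a coefficient of 200 quantizes to ⌊200 / 16⌋ = 12.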
Fig. 10.7 shows a relatively complete picture of how the H.261 encoder and decoder work.
• A scenario is used where frames I, P1, and P2 are encoded and then decoded.
Note: decoded frames (not the original frames) are used as reference frames in motion estimation.
The data that goes through the observation points indicated by the circled numbers is summarized in Tables 10.3 and 10.4.
H.261 Encoder and Decoder
Fig. 10.7(a): H.261 Encoder (I-frame). (I: original image; Ĩ: decoded image.)
Fig. 10.7(b): H.261 Decoder (I-frame).
Fig. 10.7(a): H.261 Encoder (P-frame). (P1: original image; P′1: prediction; D1 = P1 − P′1: prediction error; D̃1: decoded prediction error; P̃1 = P′1 + D̃1: decoded image.)
Fig. 10.7(b): H.261 Decoder (P-frame). (P′1: prediction; D̃1: decoded prediction error; P̃1 = P′1 + D̃1: decoded/reconstructed image.)
• Fig. 10.8 shows the syntax of the H.261 video bitstream: a hierarchy of four layers: Picture, Group of Blocks (GOB), Macroblock, and Block.
1. The Picture layer: PSC (Picture Start Code) delineates boundaries between pictures.
TR (Temporal Reference) provides a time-stamp for the picture.
Syntax of H.261 Video Bitstream
2. The GOB layer: H.261 pictures are divided into regions of 11×3 macroblocks, each of which is called a Group of Blocks (GOB).
• Fig. 10.9 depicts the arrangement of GOBs in a CIF or QCIF luminance image.
• For instance, the CIF image has 2×6 GOBs, corresponding to its image resolution of 352×288 pixels. Each GOB has its Start Code (GBSC) and Group number (GN).
• In case a network error causes a bit error or the loss of some bits, H.261 video can be recovered and resynchronized at the next identifiable GOB.
3. The Macroblock layer: Each Macroblock (MB) has its own Address indicating its position within the GOB, Quantizer (MQuant), and six 8×8 image blocks
(4 Y, 1 Cb, 1 Cr).
4. The Block layer: For each 8×8 block, the bitstream starts with the DC value, followed by pairs of the length of a zero-run (Run) and the subsequent nonzero value (Level) for the AC coefficients, and finally the End of Block (EOB) code. The range of Run is [0, 63].
Level reflects quantized values: its range is [−127, 127], and Level ≠ 0.
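A sketch of how a zigzag-ordered block maps to (Run, Level) pairs (hypothetical helper; the entropy coding of the pairs and the EOB code itself are omitted):

    def run_level_pairs(zz):
        # zz: the 64 zigzag-ordered coefficients; zz[0] is the DC value,
        # which is coded separately in the Block layer
        pairs, run = [], 0
        for level in zz[1:]:
            if level == 0:
                run += 1                  # count the zero run
            else:
                pairs.append((run, int(level)))
                run = 0
        return pairs                      # an EOB code follows in the stream

    # e.g. run_level_pairs([12, 0, 0, 5, -3] + [0] * 59) -> [(2, 5), (0, -3)]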
Fig. 10.8: Syntax of H.261 Video Bitstream.
Fig. 10.9: Arrangement of GOBs in H.261 Luminance Images.
H.263 is an improved video coding standard for video conferencing and other audiovisual services transmitted on Public Switched Telephone Networks (PSTN).
• Aims at low bit-rate communications at bit-rates of less than 64 kbps.
• Uses predictive coding for inter-frames to reduce temporal redundancy and transform coding for the remaining signal to reduce spatial redundancy (for both Intra-frames and inter-frame prediction).
H.263
Table 10.5 Video Formats Supported by H.263
As in H.261, H.263 standard also supports the notion of Group of Blocks (GOB).
The difference is that GOBs in H.263 do not have a fixed size, and they always start and end at the left and right borders of the picture.
As shown in Fig. 10.10, each QCIF luminance image consists of 9 GOBs and each GOB has 11×1 MBs (176×16 pixels), whereas each 4CIF luminance image consists of 18 GOBs and each GOB has 44×2 MBs (704×32 pixels).
H.263 & Group of Blocks (GOB)
Fig. 10.10: Arrangement of GOBs in H.263 Luminance Images.
The horizontal and vertical components of the MV are predicted from the median values of the horizontal and vertical components, respectively, of MV1, MV2, and MV3 from the "previous", "above", and "above and right" MBs (see Fig. 10.11(a)).
For the macroblock with MV(u, v), the predictors are:
u_p = median(u1, u2, u3),  v_p = median(v1, v2, v3)
Motion Compensation in H.263
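A minimal sketch of the median prediction; only the differences u − u_p and v − v_p are then entropy coded:

    def predict_mv(mv1, mv2, mv3):
        # component-wise median of the MVs of the "previous" (left),
        # "above", and "above and right" macroblocks
        med = lambda a, b, c: sorted((a, b, c))[1]
        up = med(mv1[0], mv2[0], mv3[0])   # horizontal predictor
        vp = med(mv1[1], mv2[1], mv3[1])   # vertical predictor
        return up, vp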
Fig. 10.11: Prediction of Motion Vector in H.263.
In order to reduce the prediction error, half-pixel precision is supported in H.263 vs. full-pixel precision only in H.261.
• The default range for both the horizontal and vertical components u and v of MV(u, v) is now [−16, 15.5].
• The pixel values needed at half-pixel positions are generated by a simple bilinear interpolation method, as shown in Fig. 10.12.
Half-Pixel Precision
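A sketch of the bilinear interpolation at half-pixel positions, where A, B, C, D are the four surrounding full-pixel values; border clamping and the standard's exact rounding rules are simplified here:

    def half_pixel(ref, y, x):
        # y and x may be multiples of 0.5; interpolate from the four
        # surrounding full-pixel values A, B, C, D
        y0, x0 = int(y), int(x)
        fy, fx = y - y0, x - x0
        y1 = min(y0 + 1, ref.shape[0] - 1)
        x1 = min(x0 + 1, ref.shape[1] - 1)
        A, B = ref[y0, x0], ref[y0, x1]
        C, D = ref[y1, x0], ref[y1, x1]
        return ((1 - fy) * (1 - fx) * A + (1 - fy) * fx * B +
                fy * (1 - fx) * C + fy * fx * D)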
Fig. 10.12: Half-pixel Prediction by Bilinear Interpolation in H.263.
MPEG: Moving Pictures Experts Group, established in 1988 for the development of digital video.
It is appropriately recognized that proprietary interests need to be maintained within the family of MPEG standards:
Accomplished by defining only a compressed bitstream that implicitly defines the decoder.
The compression algorithms, and thus the encoders, are completely up to the manufacturers.
11.1 Overview
MPEG-1 adopts the CCIR601 digital TV format, also known as SIF (Source Input Format).
MPEG-1 supports only non-interlaced video. Normally, its picture resolution is:
352×240 for NTSC video at 30 fps
352×288 for PAL video at 25 fps
It uses 4:2:0 chroma subsampling
The MPEG-1 standard has five parts:
Systems, Video, Audio, Conformance, Software.
11.2 MPEG-1
Motion Compensation (MC) based video encoding in H.261 works as follows:
In Motion Estimation (ME), each macroblock (MB) of the Target P-frame is assigned a best-matching MB from the previously coded I- or P-frame; this is the prediction.
Prediction error: the difference between the MB and its matching MB; it is sent to the DCT and its subsequent encoding steps.
Since the prediction is from a previous frame, this is called forward prediction.
Motion Compensation in MPEG-1
Fig. 11.1: The Need for Bidirectional Search.
The MB containing part of a ball in the Target frame cannot find a good matching MB in the previous frame, because half of the ball was occluded by another object. A match, however, can readily be obtained from the next frame.
MPEG introduces a third frame type, the B-frame, and its accompanying bidirectional motion compensation.
The MC-based B-frame coding idea is illustrated in Fig. 11.2:
Motion Compensation in MPEG-1
Fig. 11.2: B-frame Coding Based on Bidirectional Motion Compensation.
Each MB from a B-frame will have up to two motion vectors (MVs) (one from the forward and one from the backward prediction).
If matching in both directions is successful, then two MVs will be sent and the two corresponding matching MBs are averaged before comparing to the Target MB for generating the prediction error.
If an acceptable match can be found in only one of the reference frames, then only one MV and its corresponding MB will be used from either the forward or backward prediction.
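A sketch of how the B-frame prediction is formed from the two matches (hypothetical helper; prev_mb and next_mb are the matching macroblocks found by the forward and backward searches):

    import numpy as np

    def bframe_prediction(prev_mb=None, next_mb=None):
        # forward-only, backward-only, or averaged bidirectional prediction
        if prev_mb is not None and next_mb is not None:
            return (prev_mb.astype(np.int32) +
                    next_mb.astype(np.int32) + 1) // 2   # rounded average
        return prev_mb if prev_mb is not None else next_mb

    # prediction error sent to the DCT stage:
    #   error = target_mb - bframe_prediction(prev_mb, next_mb)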
Fig. 11.3: MPEG frame sequence.
Instead of GOBs as in H.261, an MPEG-1 picture can be divided into one or more slices (Fig. 11.4):
• A slice may contain a variable number of macroblocks.
• Slices may start and end anywhere, as long as they fill the whole picture.
• Each slice is coded independently, which offers additional flexibility in bit-rate control.
• The slice concept is also important for error recovery.
Other Major Differences from H.261
Fig. 11.4: Slices in an MPEG-1 Picture.
Fig. 11.5: Layers of MPEG-1 Video Bitstream.
MPEG-2: For higher quality video at a bit-rate of more than 4 Mbps.
Defined seven profiles aimed at different applications:
Simple, Main, SNR scalable, Spatially scalable, High, 4:2:2, Multiview.
Within each profile, up to four levels are defined (Table 11.5).
The DVD video specification allows only four display resolutions: 720×480, 704×480, 352×480, and 352×240.
11.3 MPEG-2
MPEG-2
Need for MPEG-2
MPEG-1 allowed rates of about 1.5 Mbps at SIF resolution; higher-resolution coding standards were needed for direct video broadcasting (DVB) and storage on DVD
MPEG-1 allowed encoding only of progressive-scan sources, not interlaced-scan sources
MPEG-1 provides only limited error concealment for noisy channels
a more flexible choice of formats, resolutions, and bitrates was needed
MPEG-2
MPEG-2 was designed mainly for storage (DVD, DVB) and transmission on noisy channels (direct terrestrial or satellite TV broadcast)
MPEG-2 standards were published as ISO/IEC 13818
like MPEG-1, the MPEG-2 standard only specifies the syntax of the bitstream and the semantics/operation of the decoding process; the design of encoders and decoders is left to manufacturers (to stimulate competition and industry product differentiation), although a reference implementation is provided
developed between 1991 and 1993
parts of MPEG-2 reached International Standard status in 1994, 1996, 1997, and 1999
MPEG-3 was originally intended for HDTV at higher bitrates, but was merged with MPEG-2
MPEG-2 parts
Part 1, Systems: synchronization and multiplexing of audio and video
Part 2, Video
Part 3, Audio
Part 4, Conformance testing
Part 5, Software simulation
Part 6, Extensions for Digital Storage Media Command and Control (DSM-CC)
Part 7, Advanced Audio Coding (AAC)
Part 9, Extensions for real-time interfaces
Part 10, Conformance extensions for DSM-CC
Part 11, Intellectual Property Management and Protection
[ Part 8 withdrawn due to lack of industry interest ]
MPEG-2 target applications
coding high-quality video at 4-15 Mbps for video on demand (VOD), standard-definition (SD) and high-definition (HD) digital TV broadcasting, and for storing video on digital storage media like the DVD
MPEG-2 should have scalable coding and should include error resilience techniques
MPEG-2 should provide good NTSC quality video at 4-6 Mbps and transparent NTSC quality video at 8-10 Mbps
MPEG-2 should provide random access to frames
MPEG-2 should be compatible with MPEG-1 (an MPEG-2 decoder should be able to decode an MPEG-1 bitstream)
low cost decoders
MPEG-2 Systems
MPEG-2 Systems offers two types of multiplexed bitstreams:
Program Stream: consists of a sequence of PESs; similar to and compatible with the MPEG-1 Program (System) Stream, but with additional features; MPEG-2 PS is a superset of MPEG-1 PS; it is suited to error-free transmission environments and has long, variable-length packets (typically 1-2 KB, but up to 64 KB) for coding efficiency; it has features not present in MPEG-1 PS, such as scrambling of data, assigning different priorities to packets, alignment of elementary stream packets, copyright indication, and fast-forward and fast-reverse indication.
Transport Stream: designed for transmission through noisy channels; has small, fixed-size packets of 188 bytes; it is suited to cable/satellite TV broadcasting and ATM networks; it allows synchronous multiplexing of programs with independent time bases and fast access to the desired program for channel hopping.
PES (Packetized Elementary Stream): the central structure used in both Program and Transport Streams; it results from packetizing continuous streams of compressed audio or video.
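The fixed 188-byte packet is what makes Transport Streams easy to parse and resynchronize. As an illustration, the sketch below decodes the standard 4-byte TS packet header (field layout per ISO/IEC 13818-1; the helper name is our own):

    def parse_ts_header(packet: bytes) -> dict:
        # every TS packet is 188 bytes and starts with the sync byte 0x47
        assert len(packet) == 188 and packet[0] == 0x47
        b1, b2, b3 = packet[1], packet[2], packet[3]
        return {
            'transport_error':    bool(b1 & 0x80),
            'payload_unit_start': bool(b1 & 0x40),  # start of a new PES packet
            'transport_priority': bool(b1 & 0x20),
            'pid':                ((b1 & 0x1F) << 8) | b2,  # stream identifier
            'scrambling':         (b3 >> 6) & 0x03,
            'adaptation_field':   (b3 >> 4) & 0x03,
            'continuity_counter': b3 & 0x0F,
        }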
MPEG-2 Systems multiplexing
Packetized Elementary Streams (PES)
Program Stream structure (simplified)
Transport Stream Structure (simplified)
MPEG-2 Profiles and Levels
MPEG-2 is designed to cover a wide range of applications, but not all features are needed by all applications
MPEG-2 groups application features into 7 profiles, and profiles have different levels
simple profile – for low-delay video conferencing applications using only I- and P- frames
main profile – the most used; for high-quality digital video applications
SNR (signal to noise ratio) scalable – supports multiple grades of video quality
spatially scalable - supports multiple grades of resolution
high – supports multiple grades of quality, resolution and chroma formats
4:2:2
multiview
MPEG-2 Profiles and Levels (2)
there are 4 levels for each profile:
low (for SIF pictures)
main (for ITU-R BT 601 resolution pictures)
high-1440 (for European HDTV resolution pictures)
high (for North America HDTV resolution pictures)
MPEG-2 Profiles and Levels (3)
Scalable coding
scalable coding – means coding the audio-video stream into a base layer and some enhancement layers, so that decoding the base layer alone achieves basic quality, while, if the transmission channel allows it, decoding the enhancement layers adds quality to the decoded stream (a minimal sketch follows after the list below)
There are 4 types of scalability
SNR scalability
spatial scalability
temporal scalability
hybrid (combination of the above)
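As an illustration of the layering idea, here is a minimal sketch of SNR scalability on a block of DCT coefficients (the step sizes q_base and q_enh are hypothetical; the base layer quantizes coarsely and the enhancement layer requantizes the residual more finely):

    import numpy as np

    def snr_encode(coeffs, q_base=16, q_enh=4):
        base = np.round(coeffs / q_base)          # coarse base layer
        residual = coeffs - base * q_base
        enh = np.round(residual / q_enh)          # finer enhancement layer
        return base.astype(int), enh.astype(int)

    def snr_decode(base, enh=None, q_base=16, q_enh=4):
        rec = base * q_base                       # base-quality picture
        if enh is not None:                       # add enhancement if received
            rec = rec + enh * q_enh
        return rec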
SNR scalability
Spatial scalability
Encoding of interlaced video
MPEG-2 allows encoding of interlaced video: a frame can be intra-coded or inter-coded either as a single frame picture or as two field pictures
motion estimation/compensation can be performed between frames or between fields
MPEG-4
MPEG-4, or ISO/IEC 14496, is an international standard describing the coding of audio-visual objects
the 1st version of MPEG-4 became an international standard in 1999 and the 2nd version in 2000 (6 parts); since then many parts were added and some are under development today
MPEG-4 included object-based audio-video coding for Internet streaming, television broadcasting, but also digital storage
MPEG-4 included interactivity and VRML support for 3D rendering
has profiles and levels like MPEG-2
has 27 parts
MPEG-4 parts
Part 1, Systems – synchronizing and multiplexing audio and video
Part 2, Visual – coding visual data
Part 3, Audio – coding audio data, enhancements to Advanced Audio Coding and new techniques
Part 4, Conformance testing
Part 5, Reference software
Part 6, DMIF (Delivery Multimedia Integration Framework)
Part 7, optimized reference software for coding audio-video objects
Part 8, carriage of MPEG-4 content on IP networks
MPEG-4 parts (2)
Part 9, Reference hardware implementation
Part 10, Advanced Video Coding (AVC)
Part 11, Scene description and application engine; BIFS (Binary Format for Scenes) and XMT (Extensible MPEG-4 Textual format)
Part 12, ISO base media file format
Part 13, IPMP extensions
Part 14, MP4 file format, version 2
Part 15, AVC (Advanced Video Coding) file format
Part 16, Animation Framework eXtension (AFX)
Part 17, Timed text subtitle format
Part 18, Font compression and streaming
Part 19, Synthesized texture stream
MPEG-4 parts (3)
Part 20, Lightweight Application Scene Representation (LASeR) and Simple Aggregation Format (SAF)
Part 21, MPEG-J Graphics Framework eXtension (GFX)
Part 22, Open Font Format
Part 23, Symbolic Music Representation
Part 24, audio and systems interaction
Part 25, 3D Graphics Compression Model
Part 26, audio conformance
Part 27, 3D graphics conformance
Motivations for MPEG-4
Broad support for multimedia facilities (2D and 3D graphics, audio, and video) is available, but:
Incompatible content formats: 3D graphics formats such as VRML are badly integrated with
2D formats such as Flash or HTML
Broadcast formats (MHEG) are not well suited for the Internet
Some formats have a binary representation – not all
SMIL, HTML+, etc. solve only a part of the problems
Both authoring and delivery are cumbersome
Bad support for multiple formats
MPEG-4: Audio/Visual (A/V) Objects
Simple video coding (MPEG-1 and -2): A/V information is represented as a sequence of
rectangular frames: the television paradigm
Future: Web paradigm, game paradigm ... ?
Object-based video coding (MPEG-4): A/V information is a set of related stream objects
Individual objects are encoded as needed
Temporal and spatial composition to complex scenes
Integration of text, “natural” and synthetic A/V
A step towards semantic representation of A/V
Communication + Computing + Film (TV…)
Main parts of MPEG-4
1. Systems
– Scene description, multiplexing, synchronization, buffer management, intellectual property and protection management
2. Visual
– Coded representation of natural and synthetic visual objects
3. Audio
– Coded representation of natural and synthetic audio objects
4. Conformance Testing
– Conformance conditions for bit streams and devices
5. Reference Software
– Normative and non-normative tools to validate the standard
6. Delivery Multimedia Integration Framework (DMIF)
– Generic session protocol for multimedia streaming
Main objectives – rich data
Efficient representation for many data types:
Video from very low bit rates to very high quality:
24 Kbps to several Mbps (HDTV)
Music and speech data over a very wide bit-rate range:
very low bit-rate speech (1.2-2 Kbps),
music (6-64 Kbps),
stereo broadcast quality (128 Kbps)
Synthetic objects: generic dynamic 2D and 3D objects;
specific 2D and 3D objects, e.g., human faces and bodies;
speech and music can be synthesized by the decoder
Text
Graphics
Main objectives – robust + pervasive
Resilience to residual errors: provided by the encoding layer,
even under difficult channel conditions, e.g., mobile
Platform independence
Transport independence: MPEG-2 Transport Stream for digital TV,
RTP for Internet applications,
DAB (Digital Audio Broadcast), ...
However, tight synchronization of media is maintained
Intellectual property management + protection: for both A/V contents and algorithms
Main objectives - scalability
Scalability: enables partial decoding
Audio: scalable sound rendering quality
Video: progressive transmission of different quality levels,
with spatial and temporal resolution scalability
Profiling: enables partial decoding
Solutions for different settings
Applications may use a small portion of the standard
“Specify minimum for maximum usability”
Main objectives - genericity
Independent representation of objects in a scene
Independent access for their manipulation and re-use
Composition of natural and synthetic A/V objects into one audiovisual scene
Description of the objects and the events in a scene
Capabilities for interaction and hyper linking
Delivery media independent representation format
Transparent communication between different delivery environments
Object-based architecture
MPEG-4 as a tool box
MPEG-4 is a tool box (not a monolithic standard)
The main issue is not better compression
No “killer” application (as DTV for MPEG-2)
Many new, different applications are possible
Enriched broadcasting, remote surveillance, games, mobile multimedia, virtual environments etc.
Profiles
BIFS (Binary Format for Scenes): based on VRML 2.0 for 3D objects
“Programmable” scenes
Efficient communication format
MPEG-4 Systems part
MPEG-4 scene, VRML-like model
Logical scene structure
MPEG-4 Terminal Components
Digital Terminal Architecture
BIFS tools – scene features
3D, 2D scene graph (hierarchical structure)
3D, 2D objects (meshes, spheres, cones etc.)
3D and 2D Composition, mixing 2D and 3D
Sound composition – e.g. mixing, “new instruments”, special effects
Scalability and scene control: terminal capabilities (TermCap)
MPEG-J for terminal control
Face and body animation
XMT - Textual format; a bridge to the Web world
BIFS tools – command protocol
Replace a scene with a new scene: a replace command is an entry point, like an I-frame;
the whole context is set to the new value
Insert a node in a grouping node: instead of replacing a whole scene, this just adds a node;
enables progressive download of a scene
Delete node - deletion of an element costs a few bytes
Change a field value; e.g. color, position, switch on/off an object
BIFS tools – animation protocol
The BIFS Command Protocol is a synchronized, but non-streaming, medium
Anim is for continuous animation of scenes
Modification of any value in the scene
– Viewpoints, transforms, colors, lights
The animation stream only contains the animation values
Differential coding – extremely efficient
Elementary stream management
Object description: relations between streams and to the scene
Auxiliary streams: IPMP (Intellectual Property Management and Protection),
OCI (Object Content Information)
Synchronization + packetization
– Time stamps, access unit identification, …
System Decoder Model
File format - a way to exchange MPEG-4 presentations
An example MPEG-4 scene
MPEG-7
• Standard for the description of multimedia content
– XML Schema for content description
– Does not standardize the extraction of descriptions
– MPEG-1, -2, and -4 make content available
– MPEG-7 makes content semantics available
Digital Video Interactive
Digital Video Interactive (DVI) was the first multimedia desktop video standard for IBM-compatible personal computers.
It enabled full-screen, full-motion video, as well as stereo audio, still images, and graphics to be presented on a DOS-based desktop computer.
The scope of Digital Video Interactive encompasses a file format, including a digital container format, a number of video and audio compression formats, as well as hardware associated with the file format.[1]
Digital Video Interactive
• The DVI format specified two video compression schemes, Presentation Level Video / Production Level Video (PLV) and Real-Time Video (RTV), and two audio compression schemes, ADPCM and PCM8. [3][1]
• The original video compression scheme, called Presentation Level Video (PLV), was asymmetric: a Digital VAX-11/750 minicomputer was used to compress the video in non-real time to 30 frames per second at a resolution of 320×240.
• Encoding was performed by Intel at its facilities or at licensed encoding facilities set up by Intel.[4] Video compression involved coding both still frames and motion-compensated residuals using Vector Quantization (VQ) in dimensions 1, 2, and 4.