UNIT V
Video Compression
Outline
1. Introduction to Video Compression
2. Video Compression with Motion Compensation
3. Search for Motion Vectors
4. H.261
5. H.263
6. MPEG-1, 2, 4, and 7
7. Digital Video Interactive (DVI)
Introduction to Video Compression
A video consists of a time-ordered sequence of frames, i.e., images.
An obvious solution to video compression would be predictive coding based on previous frames: compression proceeds by subtracting images, i.e., subtracting in time order and coding the residual error.
It can be done even better by searching the previous frame for just the right parts to subtract from the current frame.
Video Compression with Motion Compensation
Consecutive frames in a video are similar: temporal redundancy exists.
Temporal redundancy is exploited so that not every frame of the video needs to be coded independently as a new image.
Instead, the difference between the current frame and other frame(s) in the sequence is coded: small values and low entropy, good for compression.
Video Compression with Motion Compensation
Steps of video compression based on Motion Compensation (MC):
1. Motion estimation (motion vector search).
2. MC-based prediction.
3. Derivation of the prediction error, i.e., the difference.
Motion Compensation
Each image is divided into macroblocks of size N×N.
• By default, N = 16 for luminance images.
• For chrominance images, N = 8 if 4:2:0 chroma subsampling is adopted.
Motion Compensation
Motion compensation is performed at the macroblock level.
• The current image frame is referred to as the Target frame.
• A match is sought between the macroblock in the Target frame and the most similar macroblock in previous and/or future frame(s), called the Reference frame(s).
• The displacement of the reference macroblock to the target macroblock is called a motion vector (MV).
Fig. 10.1: Macroblocks and Motion Vector in Video Compression.
Figure 10.1 shows the case of forward prediction in which the Reference frame is taken to be a previous frame.
MV search is usually limited to a small immediate neighborhood, with both horizontal and vertical displacements in the range [−p, p]; this makes a search window of size (2p + 1) × (2p + 1).
Search for Motion Vectors
The difference between two macroblocks can be measured by their Mean Absolute Difference (MAD):

MAD(i, j) = (1 / N²) · Σ_{k=0..N−1} Σ_{l=0..N−1} | C(x + k, y + l) − R(x + i + k, y + j + l) |

where N is the macroblock size, C(x + k, y + l) are pixels in the macroblock of the Target frame, R(x + i + k, y + j + l) are pixels in the macroblock of the Reference frame, and (i, j) is the candidate displacement.
Search for Motion Vectors
The goal of the search is to find the vector (i, j), taken as the motion vector MV = (u, v), such that MAD(i, j) is minimum:

(u, v) = { (i, j) | MAD(i, j) is minimum, i ∈ [−p, p], j ∈ [−p, p] }
Sequential Search
Sequential search: sequentially search the whole (2p + 1) × (2p + 1) window in the reference frame (also referred to as a full search or exhaustive search).
• A macroblock centered at each of the positions within the window is compared to the macroblock in the Target frame pixel by pixel, and its respective MAD is derived.
• The vector (i, j) that offers the least MAD is designated as the MV (u, v) for the macroblock in the Target frame.
• The sequential search method is very costly.
• Assuming each pixel comparison requires three operations (subtraction, absolute value, addition), the cost of obtaining a motion vector for a single macroblock is

(2p + 1) · (2p + 1) · N² · 3  ⇒  O(p²N²)
Motion-vector: sequential-search
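A minimal Python sketch of the sequential (exhaustive) search, not the standard's normative procedure: it assumes grayscale frames stored as NumPy arrays, a target macroblock anchored at pixel (x, y), and the hypothetical helper mad() implementing the formula above.

    import numpy as np

    def mad(target, ref, x, y, i, j, N):
        # Mean Absolute Difference between the N x N target macroblock
        # at (x, y) and the reference macroblock displaced by (i, j)
        C = target[y:y+N, x:x+N].astype(np.int32)
        R = ref[y+j:y+j+N, x+i:x+i+N].astype(np.int32)
        return np.abs(C - R).mean()

    def sequential_search(target, ref, x, y, N=16, p=15):
        # exhaustive search over the (2p+1) x (2p+1) window: O(p^2 N^2)
        best_mad, mv = float('inf'), (0, 0)
        for j in range(-p, p + 1):          # vertical displacement
            for i in range(-p, p + 1):      # horizontal displacement
                # skip candidates that fall outside the reference frame
                if not (0 <= x + i <= ref.shape[1] - N and
                        0 <= y + j <= ref.shape[0] - N):
                    continue
                d = mad(target, ref, x, y, i, j, N)
                if d < best_mad:
                    best_mad, mv = d, (i, j)
        return mv, best_mad

The doubly nested loop over the window is what makes the cost quadratic in p.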
2D Logarithmic Search
Logarithmic search: a cheaper version that is suboptimal but still usually effective. The procedure for the 2D logarithmic search of motion vectors takes several iterations and is akin to a binary search:
• Initially, only nine locations in the search window are used as seeds for a MAD-based search; they are marked as '1'.
• After the one that yields the minimum MAD is located, the center of the new search region is moved to it and the step-size (offset) is reduced to half.
• In the next iteration, the nine new locations are marked as ‘2’, and so on.
Fig. 10.2: 2D Logarithmic Search for Motion Vectors.
Motion-vector: 2D-logarithmic-search
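A minimal Python sketch of the 2D logarithmic search, reusing the hypothetical mad() helper from the sequential-search sketch above; the initial offset of roughly p/2 and the halving schedule follow the description above.

    def logarithmic_search(target, ref, x, y, N=16, p=15):
        # start from displacement (0, 0) with an offset of about p/2;
        # test 9 grid positions, recenter on the best, halve the offset
        cx, cy = 0, 0
        step = max(1, (p + 1) // 2)
        while True:
            best_mad, best = float('inf'), (cx, cy)
            for dj in (-step, 0, step):
                for di in (-step, 0, step):
                    i, j = cx + di, cy + dj
                    if abs(i) > p or abs(j) > p:
                        continue            # stay inside [-p, p]
                    if not (0 <= x + i <= ref.shape[1] - N and
                            0 <= y + j <= ref.shape[0] - N):
                        continue            # stay inside the frame
                    d = mad(target, ref, x, y, i, j, N)
                    if d < best_mad:
                        best_mad, best = d, (i, j)
            cx, cy = best
            if step == 1:
                return (cx, cy), best_mad   # offset 1 was the final pass
            step = max(1, step // 2)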
Using the same example as in the previous subsection, the cost of obtaining a motion vector for a single macroblock drops to

(8 · ⌈log₂ p⌉ + 9) · N² · 3  ⇒  O(N² · log p)

since there are nine seed positions plus eight new positions for each of the ⌈log₂ p⌉ offset-halving iterations.
Hierarchical Search
The search can benefit from a hierarchical (multiresolution) approach, in which an initial estimate of the motion vector is obtained from images with significantly reduced resolution.
Figure 10.3 shows a three-level hierarchical search in which the original image is at Level 0, the images at Levels 1 and 2 are obtained by down-sampling from the previous level by a factor of 2, and the initial search is conducted at Level 2.
Since the size of the macroblock is smaller and p can also be proportionally reduced, the number of operations required is greatly reduced.
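A compact and hypothetical sketch of the three-level idea, reusing mad() and sequential_search() from the earlier sketches: estimate the MV with a full search at the coarsest level, then refine it by ±1 pixel at each finer level (boundary checks, as in sequential_search, are omitted here for brevity).

    def downsample(img):
        # average 2x2 blocks to halve the resolution (one pyramid level)
        h, w = (img.shape[0] // 2) * 2, (img.shape[1] // 2) * 2
        a = img[:h, :w].astype(np.float32)
        return (a[0::2, 0::2] + a[0::2, 1::2] +
                a[1::2, 0::2] + a[1::2, 1::2]) / 4

    def hierarchical_search(target, ref, x, y, N=16, p=15, levels=3):
        tgt, rf = [target], [ref]
        for _ in range(levels - 1):                 # build the pyramids
            tgt.append(downsample(tgt[-1]))
            rf.append(downsample(rf[-1]))
        s = 2 ** (levels - 1)                       # full search at the top
        (u, v), _ = sequential_search(tgt[-1], rf[-1], x // s, y // s,
                                      N // s, max(1, p // s))
        for lvl in range(levels - 2, -1, -1):       # refine level by level
            u, v = 2 * u, 2 * v
            s = 2 ** lvl
            cands = [(u + di, v + dj) for dj in (-1, 0, 1)
                                      for di in (-1, 0, 1)]
            u, v = min(cands, key=lambda mv: mad(tgt[lvl], rf[lvl],
                                                 x // s, y // s,
                                                 mv[0], mv[1], N // s))
        return u, v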
Fig. 10.3: A Three-level Hierarchical Search for Motion Vectors.
Table 10.1 Comparison of Computational Cost of Motion Vector Search based on examples
H.261: an early digital video compression standard whose principle of MC-based compression is retained in all later video compression standards.
• The standard was designed for videophone, video conferencing and other audiovisual services over ISDN.
• The video codec supports bit-rates of p×64 kbps, where p ranges from 1 to 30.
• The standard requires that the delay of the video encoder be less than 150 msec, so that the video can be used for real-time, bidirectional video conferencing.
H.261
Table 10.2 Video Formats Supported by H.261
Fig. 10.4: H.261 Frame Sequence.
Two types of image frames are defined: Intra-frames (I-frames) and Inter-frames (P-frames).
I-frames are treated as independent images; a transform coding method similar to JPEG is applied within each I-frame.
P-frames are not independent: they are coded by a forward predictive coding method (prediction from a previous I-frame or P-frame is allowed).
H.261 Frame Sequence
Temporal redundancy removal is included in P-frame coding, whereas I-frame coding performs only spatial redundancy removal.
To avoid propagation of coding errors, an I-frame is usually sent a couple of times in each second of the video.
Motion vectors in H.261 are always measured in units of full pixels and have a limited range of ±15 pixels, i.e., p = 15.
H.261 Frame Sequence
Intra-frame (I-frame) Coding
Fig. 10.5: I-frame Coding.
Macroblocks are of size 16×16 pixels for the Y frame, and 8×8 for the Cb and Cr frames, since 4:2:0 chroma subsampling is employed. A macroblock consists of four Y, one Cb, and one Cr 8×8 blocks.
For each 8×8 block, a DCT transform is applied; the DCT coefficients then go through quantization, zigzag scan, and entropy coding.
Intra-frame (I-frame) Coding
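The per-block pipeline can be illustrated with a short Python sketch (hypothetical helper names; SciPy's dct is used for the transform, and quantization and entropy coding follow as described on the next slides):

    import numpy as np
    from scipy.fftpack import dct

    def dct2(block):
        # 2D type-II DCT with orthonormal scaling on an 8x8 block
        return dct(dct(block, axis=0, norm='ortho'), axis=1, norm='ortho')

    def zigzag(block):
        # return the 64 coefficients of an 8x8 block in zigzag scan order:
        # anti-diagonals in turn, alternating the traversal direction
        order = sorted(((i, j) for i in range(8) for j in range(8)),
                       key=lambda ij: (ij[0] + ij[1],
                                       ij[0] if (ij[0] + ij[1]) % 2 else ij[1]))
        return np.array([block[i, j] for i, j in order])

    # per-block I-frame pipeline: dct2 -> quantization -> zigzag -> entropy coding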
Inter-frame (P-frame) Coding
Fig. 10.6: H.261 P-frame Coding Based on Motion Compensation.
• For each macroblock in the Target frame, a motion vector is allocated by one of the search methods discussed earlier.
• After the prediction, a difference macroblock is derived to measure the prediction error.
• Each of these 8×8 blocks goes through DCT, quantization, zigzag scan, and entropy coding.
Inter-frame (P-frame) Coding
The P-frame coding encodes the difference macroblock (not the Target macroblock itself).
Sometimes, a good match cannot be found, i.e., the prediction error exceeds a certain acceptable level.
• The MB itself is then encoded (treated as an Intra MB) and in this case it is termed a non-motion compensated MB.
For the motion vector, only the difference MVD is sent for entropy coding:
• MVD = MV_Preceding − MV_Current
Inter-frame (P-frame) Coding
• The quantization in H.261 uses a constant step size for all DCT coefficients within a macroblock.
• If we use DCT and QDCT to denote the DCT coefficients before and after quantization, then for DC coefficients in Intra mode:
QDCT = round(DCT / 8)
Quantization in H.261
• For all other coefficients:
QDCT = ⌊DCT / (2 · scale)⌋
• scale: an integer in the range [1, 31].
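In code, the two quantizer formulas look roughly like this (a sketch assuming the textbook's formulation: a fixed step of 8 for the Intra DC coefficient, a step of 2·scale for everything else):

    import numpy as np

    def quantize_intra_dc(dc):
        # fixed step size of 8 for the DC coefficient of an Intra block
        return int(round(dc / 8.0))

    def quantize_other(coeff, scale):
        # uniform step size 2*scale for all other coefficients
        assert 1 <= scale <= 31            # scale is an integer in [1, 31]
        return int(np.floor(coeff / (2.0 * scale)))

For example, with scale = 8, a coefficient of 200 quantizes to ⌊200 / 16⌋ = 12.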
Fig. 10.7 shows a relatively complete picture of how the H.261 encoder and decoder work.
• A scenario is used where frames I, P1, and P2 are encoded and then decoded.
Note: decoded frames (not the original frames) are used as reference frames in motion estimation.
The data that goes through the observation points indicated by the circled numbers is summarized in Tables 10.3 and 10.4.
H.261 Encoder and Decoder
Fig. 10.7(a): H.261 Encoder (I-frame). (I: original image; Ĩ: decoded image.)
Fig. 10.7(b): H.261 Decoder (I-frame).
Fig. 10.7(a): H.261 Encoder (P-frame). (P1: original image; P′1: prediction; D1 = P1 − P′1: prediction error; D̃1: decoded prediction error; P̃1 = P′1 + D̃1: decoded image.)
Fig. 10.7(b): H.261 Decoder (P-frame). (P′1: prediction; D̃1: decoded prediction error; P̃1 = P′1 + D̃1: decoded/reconstructed image.)
• Fig. 10.8 shows the syntax of the H.261 video bitstream: a hierarchy of four layers: Picture, Group of Blocks (GOB), Macroblock, and Block.
1. The Picture layer: PSC (Picture Start Code) delineates boundaries between pictures.
TR (Temporal Reference) provides a time-stamp for the picture.
Syntax of H.261 Video Bitstream
2. The GOB layer: H.261 pictures are divided into regions of 11×3 macroblocks, each of which is called a Group of Blocks (GOB).
• Fig. 10.9 depicts the arrangement of GOBs in a CIF or QCIF luminance image.
• For instance, the CIF image has 2×6 GOBs, corresponding to its image resolution of 352×288 pixels. Each GOB has its Start Code (GBSC) and Group number (GN).
• In case a network error causes a bit error or the loss of some bits, H.261 video can be recovered and resynchronized at the next identifiable GOB.
3. The Macroblock layer: Each Macroblock (MB) has its own Address indicating its position within the GOB, Quantizer (MQuant), and six 8×8 image blocks
(4 Y, 1 Cb, 1 Cr).
4. The Block layer: For each 8×8 block, the bitstream starts with the DC value, followed by pairs of the length of a zero-run (Run) and the subsequent nonzero value (Level) for the AC coefficients, and finally the End of Block (EOB) code. The range of Run is [0, 63].
Level reflects quantized values: its range is [−127, 127], and Level ≠ 0.
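A sketch of how a zigzag-ordered block maps to (Run, Level) pairs (hypothetical helper; the entropy coding of the pairs and the EOB code itself are omitted):

    def run_level_pairs(zz):
        # zz: the 64 zigzag-ordered coefficients; zz[0] is the DC value,
        # which is coded separately in the Block layer
        pairs, run = [], 0
        for level in zz[1:]:
            if level == 0:
                run += 1                  # count the zero run
            else:
                pairs.append((run, int(level)))
                run = 0
        return pairs                      # an EOB code follows in the stream

    # e.g. run_level_pairs([12, 0, 0, 5, -3] + [0] * 59) -> [(2, 5), (0, -3)]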
Fig. 10.8: Syntax of H.261 Video Bitstream.
Fig. 10.9: Arrangement of GOBs in H.261 Luminance Images.
H.263 is an improved video coding standard for video conferencing and other audiovisual services transmitted on Public Switched Telephone Networks (PSTN).
• Aims at low bit-rate communications at bit-rates of less than 64 kbps.
• Uses predictive coding for inter-frames to reduce temporal redundancy and transform coding for the remaining signal to reduce spatial redundancy (for both Intra-frames and inter-frame prediction).
H.263
Table 10.5 Video Formats Supported by H.263
As in H.261, H.263 standard also supports the notion of Group of Blocks (GOB).
The difference is that GOBs in H.263 do not have a fixed size, and they always start and end at the left and right borders of the picture.
As shown in Fig. 10.10, each QCIF luminance image consists of 9 GOBs and each GOB has 11×1 MBs (176×16 pixels), whereas each 4CIF luminance image consists of 18 GOBs and each GOB has 44×2 MBs (704×32 pixels).
H.263 & Group of Blocks (GOB)
Fig. 10.10: Arrangement of GOBs in H.263 Luminance Images.
The horizontal and vertical components of the MV are predicted from the median values of the horizontal and vertical components, respectively, of MV1, MV2, and MV3 from the "previous", "above", and "above and right" MBs (see Fig. 10.11(a)).
For the macroblock with MV(u, v), the predictors are:
u_p = median(u1, u2, u3),  v_p = median(v1, v2, v3)
Motion Compensation in H.263
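A minimal sketch of the median prediction; only the differences u − u_p and v − v_p are then entropy coded:

    def predict_mv(mv1, mv2, mv3):
        # component-wise median of the MVs of the "previous" (left),
        # "above", and "above and right" macroblocks
        med = lambda a, b, c: sorted((a, b, c))[1]
        up = med(mv1[0], mv2[0], mv3[0])   # horizontal predictor
        vp = med(mv1[1], mv2[1], mv3[1])   # vertical predictor
        return up, vp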
Fig. 10.11: Prediction of Motion Vector in H.263.
In order to reduce the prediction error, half-pixel precision is supported in H.263 vs. full-pixel precision only in H.261.
• The default range for both the horizontal and vertical components u and v of MV(u, v) is now [−16, 15.5].
• The pixel values needed at half-pixel positions are generated by a simple bilinear interpolation method, as shown in Fig. 10.12.
Half-Pixel Precision
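A sketch of the bilinear interpolation at half-pixel positions, where A, B, C, D are the four surrounding full-pixel values; border clamping and the standard's exact rounding rules are simplified here:

    def half_pixel(ref, y, x):
        # y and x may be multiples of 0.5; interpolate from the four
        # surrounding full-pixel values A, B, C, D
        y0, x0 = int(y), int(x)
        fy, fx = y - y0, x - x0
        y1 = min(y0 + 1, ref.shape[0] - 1)
        x1 = min(x0 + 1, ref.shape[1] - 1)
        A, B = ref[y0, x0], ref[y0, x1]
        C, D = ref[y1, x0], ref[y1, x1]
        return ((1 - fy) * (1 - fx) * A + (1 - fy) * fx * B +
                fy * (1 - fx) * C + fy * fx * D)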
Fig. 10.12: Half-pixel Prediction by Bilinear Interpolation in H.263.
MPEG: Moving Pictures Experts Group, established in 1988 for the development of digital video.
It is appropriately recognized that proprietary interests need to be maintained within the family of MPEG standards:
Accomplished by defining only a compressed bitstream that implicitly defines the decoder.
The compression algorithms, and thus the encoders, are completely up to the manufacturers.
11.1 Overview
MPEG-1 adopts the CCIR601 digital TV format, also known as SIF (Source Input Format).
MPEG-1 supports only non-interlaced video. Normally, its picture resolution is:
352×240 for NTSC video at 30 fps
352×288 for PAL video at 25 fps
It uses 4:2:0 chroma subsampling
The MPEG-1 standard has five parts:
Systems, Video, Audio, Conformance, Software.
11.2 MPEG-1
Motion Compensation (MC) based video encoding in H.261 works as follows:
In Motion Estimation (ME), each macroblock (MB) of the Target P-frame is assigned a best-matching MB from the previously coded I- or P-frame; this is the prediction.
Prediction error: the difference between the MB and its matching MB; it is sent to the DCT and its subsequent encoding steps.
Since the prediction is from a previous frame, this is called forward prediction.
Motion Compensation in MPEG-1
Fig. 11.1: The Need for Bidirectional Search.
The MB containing part of a ball in the Target frame cannot find a good matching MB in the previous frame, because half of the ball was occluded by another object. A match, however, can readily be obtained from the next frame.
MPEG introduces a third frame type, the B-frame, and its accompanying bidirectional motion compensation.
The MC-based B-frame coding idea is illustrated in Fig. 11.2:
Motion Compensation in MPEG-1
Fig. 11.2: B-frame Coding Based on Bidirectional Motion Compensation.
Each MB from a B-frame will have up to two motion vectors (MVs) (one from the forward and one from the backward prediction).
If matching in both directions is successful, then two MVs will be sent and the two corresponding matching MBs are averaged before comparing to the Target MB for generating the prediction error.
If an acceptable match can be found in only one of the reference frames, then only one MV and its corresponding MB will be used from either the forward or backward prediction.
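A sketch of how the B-frame prediction is formed from the two matches (hypothetical helper; prev_mb and next_mb are the matching macroblocks found by the forward and backward searches):

    import numpy as np

    def bframe_prediction(prev_mb=None, next_mb=None):
        # forward-only, backward-only, or averaged bidirectional prediction
        if prev_mb is not None and next_mb is not None:
            return (prev_mb.astype(np.int32) +
                    next_mb.astype(np.int32) + 1) // 2   # rounded average
        return prev_mb if prev_mb is not None else next_mb

    # prediction error sent to the DCT stage:
    #   error = target_mb - bframe_prediction(prev_mb, next_mb)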
Fig. 11.3: MPEG frame sequence.
Instead of GOBs as in H.261, an MPEG-1 picture can be divided into one or more slices (Fig. 11.4):
• A slice may contain a variable number of macroblocks.
• Slices may start and end anywhere, as long as they fill the whole picture.
• Each slice is coded independently, which offers additional flexibility in bit-rate control.
• The slice concept is also important for error recovery.
Other Major Differences from H.261
Fig. 11.4: Slices in an MPEG-1 Picture.
Fig. 11.5: Layers of MPEG-1 Video Bitstream.
MPEG-2: For higher quality video at a bit-rate of more than 4 Mbps.
Defined seven profiles aimed at different applications:
Simple, Main, SNR scalable, Spatially scalable, High, 4:2:2, Multiview.
Within each profile, up to four levels are defined (Table 11.5).
The DVD video specification allows only four display resolutions: 720×480, 704×480, 352×480, and 352×240.
11.3 MPEG-2
MPEG-2
Need for MPEG-2
MPEG-1 allowed rates of about 1.5 Mbps at SIF resolution; higher-resolution coding standards were needed for direct video broadcasting (DVB) and storage on DVD
MPEG-1 allowed encoding only of progressive-scan sources, not interlaced-scan sources
MPEG-1 provides only limited error concealment for noisy channels
a more flexible choice of formats, resolutions, and bitrates was needed
MPEG-2
MPEG-2 was designed mainly for storage (DVD, DVB) and transmission on noisy channels (direct terrestrial or satellite TV broadcast)
MPEG-2 standards were published as ISO/IEC 13818
like MPEG-1, the MPEG-2 standard only specifies the syntax of the bitstream and the semantics/operation of the decoding process; the design of encoders and decoders is left to manufacturers (to stimulate competition and industry product differentiation), although a reference implementation is provided
developed between 1991 and 1993
parts of MPEG-2 reached International Standard status in 1994, 1996, 1997, and 1999
MPEG-3 was originally intended for HDTV at higher bitrates, but was merged with MPEG-2
MPEG-2 parts
Part 1, Systems: synchronization and multiplexing of audio and video
Part 2, Video
Part 3, Audio
Part 4, Conformance testing
Part 5, Software simulation
Part 6, Extensions for Digital Storage Media Command and Control (DSM-CC)
Part 7, Advanced Audio Coding (AAC)
Part 9, Extensions for real-time interfaces
Part 10, Conformance extensions for DSM-CC
Part 11, Intellectual Property Management and Protection
[ Part 8 withdrawn due to lack of industry interest ]
MPEG-2 target applications
coding high-quality video at 4-15 Mbps for video on demand (VOD), standard-definition (SD) and high-definition (HD) digital TV broadcasting, and for storing video on digital storage media like the DVD
MPEG-2 should have scalable coding and should include error resilience techniques
MPEG-2 should provide good NTSC quality video at 4-6 Mbps and transparent NTSC quality video at 8-10 Mbps
MPEG-2 should provide random access to frames
MPEG-2 should be compatible with MPEG-1 (an MPEG-2 decoder should be able to decode an MPEG-1 bitstream)
low cost decoders
MPEG-2 Systems
MPEG-2 Systems offers two types of multiplexed bitstreams:
Program Stream: consists of a sequence of PESs; similar to and compatible with the MPEG-1 Program (System) Stream, but with additional features; MPEG-2 PS is a superset of MPEG-1 PS; it is suited to error-free transmission environments and has long, variable-length packets (typically 1-2 KB, but up to 64 KB) for coding efficiency; it has features not present in MPEG-1 PS, such as scrambling of data, assigning different priorities to packets, alignment of elementary stream packets, copyright indication, and fast-forward and fast-reverse indication.
Transport Stream: designed for transmission through noisy channels; has small, fixed-size packets of 188 bytes; it is suited to cable/satellite TV broadcasting and ATM networks; it allows synchronous multiplexing of programs with independent time bases and fast access to the desired program for channel hopping.
PES (Packetized Elementary Stream): the central structure used in both Program and Transport Streams; it results from packetizing continuous streams of compressed audio or video.
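The fixed 188-byte packet is what makes Transport Streams easy to parse and resynchronize. As an illustration, the sketch below decodes the standard 4-byte TS packet header (field layout per ISO/IEC 13818-1; the helper name is our own):

    def parse_ts_header(packet: bytes) -> dict:
        # every TS packet is 188 bytes and starts with the sync byte 0x47
        assert len(packet) == 188 and packet[0] == 0x47
        b1, b2, b3 = packet[1], packet[2], packet[3]
        return {
            'transport_error':    bool(b1 & 0x80),
            'payload_unit_start': bool(b1 & 0x40),  # start of a new PES packet
            'transport_priority': bool(b1 & 0x20),
            'pid':                ((b1 & 0x1F) << 8) | b2,  # stream identifier
            'scrambling':         (b3 >> 6) & 0x03,
            'adaptation_field':   (b3 >> 4) & 0x03,
            'continuity_counter': b3 & 0x0F,
        }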
MPEG-2 Systems multiplexing
Packetized Elementary Streams (PES)
Program Stream structure (simplified)
Transport Stream Structure (simplified)
MPEG-2 Profiles and Levels
MPEG-2 is designed to cover a wide range of applications, but not all features are needed by all applications
MPEG-2 groups application features into 7 profiles, and profiles have different levels
simple profile – for low-delay video conferencing applications using only I- and P- frames
main profile – the most used; for high-quality digital video applications
SNR (signal to noise ratio) scalable – supports multiple grades of video quality
spatially scalable - supports multiple grades of resolution
high – supports multiple grades of quality, resolution and chroma formats
4:2:2
multiview
MPEG-2 Profiles and Levels (2)
there are 4 levels for each profile:
low (for SIF pictures)
main (for ITU-R BT 601 resolution pictures)
high-1440 (for European HDTV resolution pictures)
high (for North America HDTV resolution pictures)
MPEG-2 Profiles and Levels (3)
Scalable coding
scalable coding – means coding the audio-video stream into a base layer and some enhancement layers, so that decoding the base layer alone achieves basic quality, while, if the transmission channel allows it, decoding the enhancement layers adds quality to the decoded stream (a minimal sketch follows after the list below)
There are 4 types of scalability
SNR scalability
spatial scalability
temporal scalability
hybrid (combination of the above)
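As an illustration of the layering idea, here is a minimal sketch of SNR scalability on a block of DCT coefficients (the step sizes q_base and q_enh are hypothetical; the base layer quantizes coarsely and the enhancement layer requantizes the residual more finely):

    import numpy as np

    def snr_encode(coeffs, q_base=16, q_enh=4):
        base = np.round(coeffs / q_base)          # coarse base layer
        residual = coeffs - base * q_base
        enh = np.round(residual / q_enh)          # finer enhancement layer
        return base.astype(int), enh.astype(int)

    def snr_decode(base, enh=None, q_base=16, q_enh=4):
        rec = base * q_base                       # base-quality picture
        if enh is not None:                       # add enhancement if received
            rec = rec + enh * q_enh
        return rec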
SNR scalability
Spatial scalability
Encoding of interlaced video
MPEG-2 allows encoding of interlaced video: a frame can be intra-coded or inter-coded either as a single frame picture or as two field pictures
motion estimation/compensation can be performed between frames or between fields
MPEG-4
MPEG-4, or ISO/IEC 14496, is an international standard describing the coding of audio-visual objects
the 1st version of MPEG-4 became an international standard in 1999 and the 2nd version in 2000 (6 parts); since then many parts were added and some are under development today
MPEG-4 included object-based audio-video coding for Internet streaming, television broadcasting, but also digital storage
MPEG-4 included interactivity and VRML support for 3D rendering
has profiles and levels like MPEG-2
has 27 parts
MPEG-4 parts
Part 1, Systems – synchronizing and multiplexing audio and video
Part 2, Visual – coding visual data
Part 3, Audio – coding audio data, enhancements to Advanced Audio Coding and new techniques
Part 4, Conformance testing
Part 5, Reference software
Part 6, DMIF (Delivery Multimedia Integration Framework)
Part 7, optimized reference software for coding audio-video objects
Part 8, carriage of MPEG-4 content on IP networks
MPEG-4 parts (2)
Part 9, Reference hardware implementation
Part 10, Advanced Video Coding (AVC)
Part 11, Scene description and application engine; BIFS (Binary Format for Scenes) and XMT (Extensible MPEG-4 Textual format)
Part 12, ISO base media file format
Part 13, IPMP extensions
Part 14, MP4 file format, version 2
Part 15, AVC (Advanced Video Coding) file format
Part 16, Animation Framework eXtension (AFX)
Part 17, Timed text subtitle format
Part 18, Font compression and streaming
Part 19, Synthesized texture stream
MPEG-4 parts (3)
Part 20, Lightweight Application Scene Representation (LASeR) and Simple Aggregation Format (SAF)
Part 21, MPEG-J Graphics Framework eXtension (GFX)
Part 22, Open Font Format
Part 23, Symbolic Music Representation
Part 24, audio and systems interaction
Part 25, 3D Graphics Compression Model
Part 26, audio conformance
Part 27, 3D graphics conformance
Motivations for MPEG-4
Broad support for multimedia facilities (2D and 3D graphics, audio, and video) is available, but:
Incompatible content formats: 3D graphics formats such as VRML are badly integrated with
2D formats such as Flash or HTML
Broadcast formats (MHEG) are not well suited for the Internet
Some formats have a binary representation – not all
SMIL, HTML+, etc. solve only a part of the problems
Both authoring and delivery are cumbersome
Bad support for multiple formats
MPEG-4: Audio/Visual (A/V) Objects
Simple video coding (MPEG-1 and -2): A/V information is represented as a sequence of
rectangular frames: the television paradigm
Future: Web paradigm, game paradigm ... ?
Object-based video coding (MPEG-4): A/V information is a set of related stream objects
Individual objects are encoded as needed
Temporal and spatial composition to complex scenes
Integration of text, “natural” and synthetic A/V
A step towards semantic representation of A/V
Communication + Computing + Film (TV…)
Main parts of MPEG-4
1. Systems
– Scene description, multiplexing, synchronization, buffer management, intellectual property and protection management
2. Visual
– Coded representation of natural and synthetic visual objects
3. Audio
– Coded representation of natural and synthetic audio objects
4. Conformance Testing
– Conformance conditions for bit streams and devices
5. Reference Software
– Normative and non-normative tools to validate the standard
6. Delivery Multimedia Integration Framework (DMIF)
– Generic session protocol for multimedia streaming
Main objectives – rich data
Efficient representation for many data types:
Video from very low bit rates to very high quality:
24 Kbps to several Mbps (HDTV)
Music and speech data over a very wide bit-rate range:
very low bit-rate speech (1.2-2 Kbps),
music (6-64 Kbps),
stereo broadcast quality (128 Kbps)
Synthetic objects: generic dynamic 2D and 3D objects;
specific 2D and 3D objects, e.g., human faces and bodies;
speech and music can be synthesized by the decoder
Text
Graphics
Main objectives – robust + pervasive
Resilience to residual errors: provided by the encoding layer,
even under difficult channel conditions, e.g., mobile
Platform independence
Transport independence: MPEG-2 Transport Stream for digital TV,
RTP for Internet applications,
DAB (Digital Audio Broadcast), ...
However, tight synchronization of media is maintained
Intellectual property management + protection: for both A/V contents and algorithms
Main objectives - scalability
Scalability: enables partial decoding
Audio: scalable sound rendering quality
Video: progressive transmission of different quality levels,
with spatial and temporal resolution scalability
Profiling: enables partial decoding
Solutions for different settings
Applications may use a small portion of the standard
“Specify minimum for maximum usability”
Main objectives - genericity
Independent representation of objects in a scene
Independent access for their manipulation and re-use
Composition of natural and synthetic A/V objects into one audiovisual scene
Description of the objects and the events in a scene
Capabilities for interaction and hyper linking
Delivery media independent representation format
Transparent communication between different delivery environments
Object-based architecture
MPEG-4 as a tool box
MPEG-4 is a tool box (not a monolithic standard)
The main issue is not better compression
No “killer” application (as DTV for MPEG-2)
Many new, different applications are possible
Enriched broadcasting, remote surveillance, games, mobile multimedia, virtual environments etc.
Profiles
BIFS (Binary Format for Scenes): based on VRML 2.0 for 3D objects
“Programmable” scenes
Efficient communication format
MPEG-4 Systems part
MPEG-4 scene, VRML-like model
Logical scene structure
MPEG-4 Terminal Components
Digital Terminal Architecture
BIFS tools – scene features
3D, 2D scene graph (hierarchical structure)
3D, 2D objects (meshes, spheres, cones etc.)
3D and 2D Composition, mixing 2D and 3D
Sound composition – e.g. mixing, “new instruments”, special effects
Scalability and scene control: terminal capabilities (TermCap)
MPEG-J for terminal control
Face and body animation
XMT - Textual format; a bridge to the Web world
BIFS tools – command protocol
Replace a scene with a new scene: a replace command is an entry point, like an I-frame;
the whole context is set to the new value
Insert a node in a grouping node: instead of replacing a whole scene, this just adds a node;
enables progressive download of a scene
Delete node - deletion of an element costs a few bytes
Change a field value; e.g. color, position, switch on/off an object
BIFS tools – animation protocol
The BIFS Command Protocol is a synchronized, but non-streaming, medium
Anim is for continuous animation of scenes
Modification of any value in the scene
– Viewpoints, transforms, colors, lights
The animation stream only contains the animation values
Differential coding – extremely efficient
Elementary stream management
Object description: relations between streams and to the scene
Auxiliary streams: IPMP (Intellectual Property Management and Protection),
OCI (Object Content Information)
Synchronization + packetization
– Time stamps, access unit identification, …
System Decoder Model
File format - a way to exchange MPEG-4 presentations
An example MPEG-4 scene
MPEG-7
• Standard for the description of multimedia content
– XML Schema for content description
– Does not standardize the extraction of descriptions
– MPEG-1, -2, and -4 make content available
– MPEG-7 makes content semantics available
Digital Video Interactive
Digital Video Interactive (DVI) was the first multimedia desktop video standard for IBM-compatible personal computers.
It enabled full-screen, full-motion video, as well as stereo audio, still images, and graphics to be presented on a DOS-based desktop computer.
The scope of Digital Video Interactive encompasses a file format, including a digital container format, a number of video and audio compression formats, as well as hardware associated with the file format.[1]
Digital Video Interactive
• The DVI format specified two video compression schemes, Presentation Level Video / Production Level Video (PLV) and Real-Time Video (RTV), and two audio compression schemes, ADPCM and PCM8. [3][1]
• The original video compression scheme, called Presentation Level Video (PLV), was asymmetric: a Digital VAX-11/750 minicomputer was used to compress the video in non-real time to 30 frames per second at a resolution of 320×240.
• Encoding was performed by Intel at its facilities or at licensed encoding facilities set up by Intel.[4] Video compression involved coding both still frames and motion-compensated residuals using Vector Quantization (VQ) in dimensions 1, 2, and 4.