Overview of the Scalable Video Coding Extension of the H.264/AVC Standard

Heiko Schwarz, Detlev Marpe, and Thomas Wiegand

CSVT, Sept. 2007

2009/5 MC2008, VCLAB 1

Outline

Introduction
  Problems
  Definition
  Functionality
  Goal
  Competition
  Applications
  Targets
History of SVC
Structure of SVC
Temporal Scalability
Spatial Scalability
Quality Scalability
Combined Scalability
Profiles of SVC
Conclusions

Introduction - problem

Non-Scalable Video Streaming
  Multiple video streams are needed for heterogeneous clients

[Figure: one separate stream per client, at 8 Mb/s, 6 Mb/s, 4 Mb/s, 1 Mb/s, and 512 Kb/s]

Introduction - definition

Scalable video stream

Scalability
  Removal of parts of the video bit-stream to adapt it to the various needs of end users and to varying terminal capabilities or network conditions

[Figure: a scalable bit-stream made of sub-streams 1, 2, …, n; selected sub-streams k1, k2, …, ki are used for reconstruction, from low quality to high quality]

Introduction - functionality

Functionality of SVC
  Graceful degradation when the “right” parts of the bit-stream are lost
  Bit-rate adaptation to match the channel throughput
  Format adaptation for backwards-compatible extension
  Power adaptation for a trade-off between runtime and quality

Introduction - goal

Goal of SVC

Scalability modes
  Fidelity reduction (SNR scalability)
  Picture size reduction (spatial scalability)
  Frame rate reduction (temporal scalability)
  Sharpness reduction (frequency scalability)
  Selection of content (ROI or object-based scalability)

[Figure: sub-streams k1, k2, …, ki = (Quality) H.264/AVC bit-stream]

Introduction - competition

SVC is an old research topic (> 20 years) and has been included in H.262/MPEG-2, H.263, and MPEG-4 Visual.
Rarely used because of
  The characteristics of traditional video transmission systems
  Significant loss of coding efficiency and large increase in decoder complexity

Competition
  Simulcast
  Transcoding

Introduction - applications

Applications
  Heterogeneous clients
  Unequal protection
  Surveillance

Problems
  Increased decoder complexity
  Decreased coding efficiency
  Temporal scalability is more often supported than spatial and quality scalability.

Introduction - targets

Targets
  Little decrease in coding efficiency
  Little increase in decoding complexity
  Support of temporal, spatial, and quality scalability
  A backward-compatible base layer
  Simple bit-stream adaptations after encoding

History of SVC

October 2003: MPEG issues a call for proposals on Scalable Video Coding
  12 wavelet-based proposals
  2 extensions of H.264/AVC

~October 2004: MSRA vs. HHI proposal (wavelet-based vs. H.264 extension)

October 2004: HHI proposal adopted as starting point (due to reduced encoder and decoder complexity and improvements in coding efficiency)

January 2005: MPEG and VCEG agree to jointly finalize the SVC project as an Amendment of H.264/AVC

Spring 2007: Finalization

Structure of SVC

[Figure: SVC encoder structure. The input is spatially decimated; each spatial layer applies temporal scalable coding, prediction, base layer coding, and SNR scalable coding, and the layer outputs are multiplexed into one bit-stream]

Outline

Introduction
History of SVC
Structure of SVC
Temporal Scalability
  Hierarchical prediction structure
Spatial Scalability
Quality Scalability
Combined Scalability
Profiles of SVC
Conclusions

Temporal Scalability

Hierarchical prediction structures

[Figure: three prediction structures over a GOP, with numbered pictures: hierarchical B pictures, non-dyadic hierarchical prediction, and hierarchical prediction with zero delay]


Combination with multiple reference pictures
Arbitrary modification of the prediction structure
Issue of quantization
  Lower layers with higher fidelity
    Smaller QPs are used in lower layers
  Propagation of quantization error
    Larger QPs are used in higher layers
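The hierarchy and the QP cascade can be sketched in a few lines of Python. This is a minimal illustration, not the JSVM implementation: the function names are hypothetical, the GOP is assumed to be dyadic (a power-of-two size, pictures indexed in display order), and the default QP offsets are an assumed cascade that gives key pictures the smallest QP.

```python
def temporal_level(index: int, gop_size: int) -> int:
    """Temporal level of a picture in a dyadic hierarchy (display order)."""
    if gop_size < 1 or gop_size & (gop_size - 1):
        raise ValueError("gop_size must be a power of two")
    pos = index % gop_size
    if pos == 0:
        return 0  # key picture, lowest temporal layer
    max_level = gop_size.bit_length() - 1            # log2(gop_size)
    trailing_zeros = (pos & -pos).bit_length() - 1   # how "even" the position is
    return max_level - trailing_zeros


def cascaded_qp(key_qp: int, level: int, offsets=(0, 3, 4, 5)) -> int:
    """QP for a temporal level; the offsets are an assumed cascade."""
    return key_qp + offsets[min(level, len(offsets) - 1)]


def keep_up_to(indices, gop_size: int, max_level: int):
    """Temporal sub-stream: drop every picture above max_level."""
    return [i for i in indices if temporal_level(i, gop_size) <= max_level]
```

Dropping the highest level halves the frame rate, and the remaining pictures still form a decodable sequence because no picture references a higher level.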

[Figure: temporal scalability. Prediction structures for N = 1, 2, 4, and 8, built from I, P, B0, B1, and B2 pictures]

Video coding experiment with H.264/MPEG4-AVC
  Foreman, CIF 30 Hz @ 1320 kbps
  Performance as a function of N
  Cascaded QP assignment: QP(P) = QP(B0) - 3 = QP(B1) - 4 = QP(B2) - 5

This slide is copied from JVT-W132-Talk


Coding efficiency of hierarchical prediction
  JSVM11, High profile with CABAC
  Only one reference frame
  Compared with IPPP (with and without delay constraint)
  Providing temporal scalability usually doesn’t have any negative impact on coding efficiency

[Figure: coding efficiency results, CIF]

Outline

Introduction
History of SVC
Structure of SVC
Temporal Scalability
Spatial Scalability
  Inter-layer prediction
  Inter-layer motion prediction
  Inter-layer residual prediction
  Inter-layer intra prediction
Quality Scalability
Combined Scalability
Profiles of SVC
Conclusions

Spatial Scalability

[Figure: spatially scalable encoder. Each layer produced by spatial decimation uses hierarchical MCP and intra-prediction, followed by base layer coding of texture and motion; inter-layer prediction (intra, motion, residual) connects the layers. The lowest layer uses an H.264/AVC compatible coder (H.264/AVC MCP and intra-prediction) and yields the H.264/AVC compatible base layer bit-stream; all layers are multiplexed into the scalable bit-stream]

Spatial Scalability

Similar to MPEG-2, H.263, and MPEG-4
Arbitrary resolution ratio
The same coding order in all spatial layers
Combination with temporal scalability
Inter-layer prediction

[Figure: two spatial layers (Spatial 0, Spatial 1) with temporal levels 0 to 2 and intra pictures]

Spatial Scalability

The prediction signals are formed by
  MCP inside the enhancement layer (temporal), suited to small motion and high spatial detail
  Up-sampling from the lower layer (spatial)
  Average of the above two predictions (temporal + spatial)

Inter-layer prediction
  Three kinds of inter-layer prediction
    Inter-layer motion prediction
    Inter-layer residual prediction
    Inter-layer intra prediction
  Base mode MB
    Only the residual is transmitted, with no additional side information

Spatial Scalability

Inter-layer motion prediction
  base_mode_flag = 1
    The reference layer is inter-coded
    Data are derived from the reference layer
      MB partitioning
      Reference indices
      MVs
  motion_pred_flag
    1: MV predictors are obtained from the reference layer
    0: MV predictors are obtained by conventional spatial predictors

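[Figure: dyadic up-scaling of motion data. An 8×8 partition of the reference layer maps to a 16×16 partition, and the motion vectors (x1,y1) and (x2,y2) are scaled to (2x1,2y1) and (2x2,2y2)]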
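The derivation of enhancement-layer motion data from the reference layer can be sketched as follows for the dyadic case. The `MotionBlock` type and `upscale_motion` function are hypothetical names for illustration; arbitrary resolution ratios and cropping are not handled here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MotionBlock:
    width: int      # partition width in luma samples
    height: int     # partition height in luma samples
    ref_idx: int    # reference index, copied unchanged
    mv: tuple       # motion vector (x, y)


def upscale_motion(block: MotionBlock, factor: int = 2) -> MotionBlock:
    """Scale partition size and motion vector by the spatial ratio."""
    x, y = block.mv
    return MotionBlock(
        width=block.width * factor,
        height=block.height * factor,
        ref_idx=block.ref_idx,
        mv=(x * factor, y * factor),
    )
```

For example, `upscale_motion(MotionBlock(8, 8, 0, (3, -2)))` yields a 16×16 partition with motion vector `(6, -4)` and the same reference index.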

Spatial Scalability

Inter-layer residual prediction
  residual_pred_flag = 1
  Predictor
    Block-wise up-sampling by a bi-linear filter from the corresponding 8×8 sub-MB in the reference layer
    Transform block basis
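As a rough illustration of block-wise bi-linear up-sampling, the sketch below doubles the size of one residual block in pure Python. It is a generic bilinear interpolation with clamping at the block border, so that no samples outside the block are used; the exact filter phases and integer arithmetic of the standard are not reproduced, and the function name is hypothetical.

```python
def upsample_bilinear_2x(block):
    """Up-sample a 2-D list of samples by 2 in each direction."""
    h, w = len(block), len(block[0])
    out = []
    for i in range(2 * h):
        # Source row position, clamped to stay inside the block.
        sy = min(max((i + 0.5) / 2 - 0.5, 0.0), h - 1)
        y0 = int(sy)
        y1 = min(y0 + 1, h - 1)
        fy = sy - y0
        row = []
        for j in range(2 * w):
            sx = min(max((j + 0.5) / 2 - 0.5, 0.0), w - 1)
            x0 = int(sx)
            x1 = min(x0 + 1, w - 1)
            fx = sx - x0
            top = block[y0][x0] * (1 - fx) + block[y0][x1] * fx
            bottom = block[y1][x0] * (1 - fx) + block[y1][x1] * fx
            row.append(top * (1 - fy) + bottom * fy)
        out.append(row)
    return out
```

Applied to an 8×8 block, this returns the 16×16 predictor for the co-located enhancement-layer macroblock; keeping the operation inside each block avoids mixing residuals across transform block boundaries.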


Spatial Scalability

Inter-layer intra prediction
  base_mode_flag = 1
  The reference layer is intra-coded
  Up-sampling from the reference layer
    Luma: one-dimensional 4-tap FIR filter
    Chroma: bi-linear filter

Spatial Scalability

Past spatial scalable video:
  Inter-layer intra prediction requires completely decoding the base layer.
  Multiple motion compensation and deblocking filter operations are needed.
  Full decoding + inter-layer prediction: complexity > simulcast.

Single-loop decoding
  Inter-layer intra prediction is restricted to MBs for which the co-located base layer is intra-coded

Spatial Scalability

Single-loop vs. multi-loop decoding

[Figure: single-loop vs. multi-loop decoding over I, B, and P pictures]

This slide is copied from http://iphome.hhi.de/wiegand/assets/pdfs/H264AVC_SVC.pdf

Spatial Scalability

Generalized spatial scalability in SVC
  Arbitrary ratio
    Only restriction: neither the horizontal nor the vertical resolution can decrease from one layer to the next.
  Cropping
    Containing new regions
    Higher quality of interesting regions

Spatial Scalability

Coding efficiency
  Multiple-loop > Single-loop

Spatial Scalability

Coding efficiency (IPPP…)
  Multi-loop > Single-loop

Spatial Scalability

Encoder control (JSVM)
  Base layer
    p0 is optimized for the base layer
    $p_0' = \arg\min_{\{p_0\}} \{ D_0(p_0) + \lambda R_0(p_0) \}$
  Enhancement layer
    p1 is optimized for the enhancement layer
    $p_1' = \arg\min_{\{p_1 | p_0\}} \{ D_1(p_1 | p_0) + \lambda R_1(p_1 | p_0) \}$
  Decisions of p1 depend on p0
  Efficient base layer coding but inefficient enhancement layer coding

Spatial Scalability

Encoder control (optimization)
  Base layer
    Considering enhancement layer coding
    Eliminating p0’s that disadvantage enhancement layer coding
  Enhancement layer
    No change
  Joint decision with weight w
    $\{p_0', p_1'\} = \arg\min_{\{p_0, p_1 | p_0\}} \{ (1 - w) [ D_0(p_0) + \lambda R_0(p_0) ] + w [ D_1(p_1 | p_0) + \lambda R_1(p_1 | p_0) ] \}$
    w = 0: JSVM encoder control
    w = 1: Single-loop encoder control (base layer is not controlled)
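The effect of the weight w can be tried out with a toy exhaustive search. This sketch assumes small hand-written cost tables and a single Lagrange multiplier `lam`; the function name, the table layout, and the tie-break on the enhancement-layer cost (needed because w = 0 leaves p1 undetermined in the joint cost) are all assumptions made for illustration.

```python
def joint_encoder_control(base_costs, enh_costs, lam, w):
    """Pick (p0, p1) minimizing (1 - w) * J0 + w * J1.

    base_costs: {p0: (D0, R0)}
    enh_costs:  {(p0, p1): (D1, R1)}, i.e. enhancement cost given p0
    """
    best_key, best_pair = None, None
    for (p0, p1), (d1, r1) in enh_costs.items():
        d0, r0 = base_costs[p0]
        j0 = d0 + lam * r0          # base layer Lagrangian cost
        j1 = d1 + lam * r1          # enhancement layer Lagrangian cost
        joint = (1 - w) * j0 + w * j1
        key = (joint, j1)           # assumed tie-break on enhancement cost
        if best_key is None or key < best_key:
            best_key, best_pair = key, (p0, p1)
    return best_pair


# Hypothetical costs: base option "a" is cheaper for the base layer,
# but option "b" makes the enhancement layer much cheaper.
base = {"a": (10, 2), "b": (12, 2)}
enh = {("a", "x"): (8, 6), ("a", "y"): (9, 6),
       ("b", "x"): (5, 3), ("b", "y"): (6, 4)}
```

With these tables, w = 0 selects `("a", "x")`, the base-layer-first decision, while w = 0.5 and w = 1 select `("b", "x")`, trading a slightly worse base layer for a better enhancement layer.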

Spatial Scalability

Coding efficiency of optimal encoder control
  Optimized encoder vs. JSVM encoder (QPE = QPB + 4)

Outline

Introduction
History of SVC
Structure of SVC
Temporal Scalability
Spatial Scalability
Quality Scalability
  CGS
  MGS
  Drift control
Combined Scalability
Profiles of SVC
Conclusions

Quality Scalability

Coarse-grain quality scalability (CGS)
  A special case of spatial scalability
    Identical sizes (resolution) for base and enhancement layers
    Smaller quantization step sizes for higher enhancement residual layers
  Designed for only several selected bit-rate points
    Supported bit-rate points = number of layers
  Switching can only occur at IDR access units

Quality Scalability

Medium-grain quality scalability (MGS)
  More enhancement layers are supported
    Refinement quality layers of residual
  Key pictures
    Drift control
  Switching can occur at any access unit
  CGS + key pictures + refinement quality layers

Quality Scalability

Drift control
  Drift: the effect caused by unsynchronized MCP at the encoder and decoder side
  Trade-off of MCP in quality SVC
    Coding efficiency vs. drift

Quality Scalability

MPEG-4 quality scalability with FGS
  The base layer is stored and used for MCP of following pictures
  Refinement data are not used for MCP
  Drift: drift-free
  Complexity: low
  Efficiency: efficient base layer but inefficient enhancement layer

[Figure: base layer and refinement (possibly lost or truncated)]

Quality Scalability

MPEG-2 quality scalability (without FGS)
  Only 1 reference picture is stored and used for MCP of following pictures
  Drift: both base layer and enhancement layer
    Frequent intra updates are necessary
  Complexity: low
  Efficiency: efficient enhancement layer but inefficient base layer

[Figure: base layer]


Quality Scalability

2-loop prediction
  Several closed encoder loops run at different bit-rate points in a layered structure
  Drift: enhancement layer
  Complexity: high
  Efficiency: efficient base layer and medium-efficient enhancement layer

[Figure: base layer]


Quality Scalability

SVC concepts
  Key picture
    Trade-off between coding efficiency and drift
    MPEG-4 FGS: all pictures are key pictures
    MPEG-2 quality scalability: all pictures are non-key pictures

[Figure: base layer]


Quality Scalability

Drift control with hierarchical prediction
  Key pictures
    The base layer is stored and used for the MCP of following pictures
  Other pictures
    The enhancement layer is stored and used for the MCP of following pictures
  GOP size adjusts the trade-off between enhancement layer coding efficiency and drift

[Figure: base layer and enhancement layer over a hierarchical GOP of P, B1, and B2 pictures]

Quality Scalability

Comparisons of drift control

[Figure: comparison of the drift control schemes, placed between low efficiency and high efficiency and between drift and drift-free]

Quality Scalability

Comparisons of coding efficiency

[Figure: coding efficiency curves for high dQP and low dQP]

QSTEP = 2^((QP - 4) / 6)
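The relation between QP and quantization step size quoted on this slide can be evaluated directly. The sketch below simply encodes that formula (the function name is a hypothetical helper); it shows why a QP difference of 6 between layers corresponds to a factor of 2 in step size.

```python
def qstep(qp: int) -> float:
    """Quantization step size from the slide's formula 2^((QP - 4) / 6)."""
    return 2.0 ** ((qp - 4) / 6)


def step_ratio(base_qp: int, delta_qp: int) -> float:
    """Base-layer step divided by enhancement-layer step for a given dQP."""
    return qstep(base_qp) / qstep(base_qp - delta_qp)
```

A high dQP therefore means a much finer enhancement-layer quantizer relative to the base layer, and a low dQP means the two layers are quantized almost alike.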

Quality Scalability

MGS with key pictures using optimized encoder control

[Figure: coding efficiency of MGS with key pictures, including the curve for only the base layer]

Outline

Introduction
History of SVC
Structure of SVC
Temporal Scalability
Spatial Scalability
Quality Scalability
Combined Scalability
  SVC encoder structure
  Dependency and quality refinement layers
  Bit-stream format
  Bit-stream switching
Profiles of SVC
Conclusions

Combined Scalability

SVC encoder structure

[Figure: SVC encoder structure. After temporal decomposition, each dependency layer contains quality layers that share the same motion/prediction information]


Dependency and Quality refinement layers

[Figure: scalable bit-stream with dependency layers D = 0, 1, 2, each carrying quality refinement layers Q = 0, 1, 2]


[Figure: access units at temporal levels T0, T2, T1, T2, T0, each containing dependency layers D0 and D1 with quality layers Q0 and Q1]


Bit-stream format

[Figure: NAL unit layout: NAL unit header, NAL unit header extension, NAL unit payload. The extension carries the fields P, T, D, and Q]

P (priority_id): indicates the importance of a NAL unit
T (temporal_id): indicates the temporal level
D (dependency_id): indicates the spatial/CGS layer
Q (quality_id): indicates the MGS/FGS layer
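Because every NAL unit carries these identifiers, a sub-stream can be extracted by inspecting headers only. The sketch below models a NAL unit as a plain record rather than parsing real header bits; the `NalUnit` type and `extract_substream` function are hypothetical, and the simple threshold filter ignores finer inter-layer dependency rules.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NalUnit:
    priority_id: int     # P: importance of the NAL unit
    temporal_id: int     # T: temporal level
    dependency_id: int   # D: spatial/CGS layer
    quality_id: int      # Q: MGS/FGS layer
    payload: bytes = b""


def extract_substream(units, max_t: int, max_d: int, max_q: int):
    """Keep only the NAL units at or below the requested operation point."""
    return [
        u for u in units
        if u.temporal_id <= max_t
        and u.dependency_id <= max_d
        and u.quality_id <= max_q
    ]


# Hypothetical stream: 3 temporal levels x 2 dependency layers x 2 quality layers.
stream = [NalUnit(0, t, d, q) for t in range(3) for d in range(2) for q in range(2)]
low_rate = extract_substream(stream, max_t=1, max_d=0, max_q=0)
```

Here `low_rate` keeps only the two base-quality, base-resolution units at temporal levels 0 and 1; no payload is decoded or re-encoded, which is what makes the adaptation simple.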


Bit-stream switching
  Inside a dependency layer
    Switching everywhere
  Outside a dependency layer
    Switching up only at IDR access units
    Switching down everywhere if using multiple-loop decoding
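These switching rules can be written as a small predicate. This is a sketch under stated assumptions: the function and parameter names are hypothetical, and the behavior for switching down with single-loop decoding (allowed only at IDR access units here) is an assumption, since the slide does not state it.

```python
def can_switch(current_d: int, target_d: int,
               at_idr: bool, multi_loop: bool = False) -> bool:
    """Whether a decoder may move from dependency layer current_d to target_d."""
    if target_d == current_d:
        return True               # inside a dependency layer: anywhere
    if target_d > current_d:
        return at_idr             # switching up: only at IDR access units
    # Switching down: anywhere with multiple-loop decoding;
    # otherwise assumed to require an IDR access unit.
    return multi_loop or at_idr
```

For example, `can_switch(0, 1, at_idr=False)` is `False`, while `can_switch(1, 0, at_idr=False, multi_loop=True)` is `True`.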


Outline

Introduction
History of SVC
Structure of SVC
Temporal Scalability
Spatial Scalability
Quality Scalability
Combined Scalability
Profiles of SVC
  Scalable Baseline
  Scalable High
  Scalable High Intra
Conclusions

Profiles of SVC

Scalable Baseline
  For conversational and surveillance applications requiring low decoding complexity
  Spatial scalability: fixed ratio (1, 1.5, or 2) and MB-aligned cropping
  Temporal and quality scalability: arbitrary
  No interlaced coding tools
  B-slices, weighted prediction, CABAC, and 8x8 luma transform are supported in enhancement layers
  The base layer conforms to the Baseline profile of H.264/AVC

Profiles of SVC

Scalable High
  For broadcast, streaming, and storage
  Spatial, temporal, and quality scalability: arbitrary
  The base layer conforms to the High profile of H.264/AVC

Scalable High Intra
  Scalable High + all IDR pictures

Conclusions

Temporal scalability
  Hierarchical prediction structure

Spatial and quality scalability
  Inter-layer prediction of intra, motion, and residual information
  Single-loop MC decoding
  CGS: identical size for each spatial layer
  MGS: CGS + key pictures + quality refinement layer

Applications
  Power adaptation: decoding only the needed part of the video stream
  Graceful degradation: when the “right” parts are lost
  Format adaptation: backwards-compatible extension in mobile TV

What’s next in SVC?
  Bit-depth scalability (8-bit 4:2:0 → 10-bit 4:2:0)
  Color format scalability (4:2:0 → 4:4:4)

References

H. Schwarz, D. Marpe, and T. Wiegand, “Overview of the Scalable Video Coding Extension of the H.264/AVC Standard,” CSVT 2007.

T. Wiegand, “Scalable Video Coding,” Joint Video Team, doc. JVT-W132, San Jose, USA, April 2007.

T. Wiegand, “Scalable Video Coding,” Digital Image Communication, Course at Technical University of Berlin, 2006. (Available on http://iphome.hhi.de/wiegand/dic.htm)

H. Schwarz, D. Marpe, and T. Wiegand, “Constrained Inter-Layer Prediction for Single-Loop Decoding in Spatial Scalability,” Proc. of ICIP’05.
