Henrique S. Malvar, Fellow, IEEE, Antti Hallapuro, Marta Karczewicz, and Louis Kerofsky, Member, IEEE


Brief recall of the H.264 encode and decode structure

Transform in H.264: DCT and integer transform; low-complexity integer transform (proposed by the authors)

Quantization in H.264

Three Parts: Prediction, Transform, Quantization

Input block -> Prediction -> Transform -> Quantization -> Entropy coding -> Transmit

Prediction: generate a block prediction by motion estimation.

Transform: convert the difference between the prediction and the true value into coefficients using an integer transform.

Quantization: quantize the coefficients.

DCT (Discrete Cosine Transform): commonly used in block transform coding of images and video, e.g. JPEG and MPEG.

Definition for 8x8 block:

Convert image from spatial domain to frequency domain
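The definition itself is not reproduced in this text. As a reference point, the standard 2-D DCT-II of an 8x8 block X(i,j) is commonly written as below. The index names and the normalization factor C(k) follow the usual textbook convention and are an assumption about what was originally shown.

```latex
Y(u,v) = \frac{1}{4}\, C(u)\, C(v) \sum_{i=0}^{7} \sum_{j=0}^{7} X(i,j)\,
         \cos\frac{(2i+1)u\pi}{16}\, \cos\frac{(2j+1)v\pi}{16},
\qquad
C(k) = \begin{cases} 1/\sqrt{2}, & k = 0 \\ 1, & k > 0 \end{cases}
```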

In H.264, a 4x4 block transform is adopted.

Problem: the coefficients are irrational numbers. On a digital computer, applying the inverse transform after the forward transform may not return exactly the original input.

Solution: integer transform, an integer approximation of the DCT.

Original H.264 design: {a=13, b=7, c=17}

Problem: increase in dynamic range.

If max(X(i,j)) = A, then max(Y(u,v)) = A x (13 x 4)^2 = 2704 x A. Since log2(2704) = 11.4, Y(u,v) needs 12 more bits than X(i,j).
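The arithmetic above can be reproduced with a short sketch. The matrix is not reproduced in this text, so the arrangement of the coefficients {13, 7, 17} in `H_ORIGINAL` is an assumption (the usual DCT-like sign pattern), and `dynamic_range_gain_bits` is a hypothetical helper name.

```python
import math

# Assumed arrangement of the original coefficients {13, 7, 17} in a
# DCT-like 4x4 matrix; the matrix itself is not shown in the text.
H_ORIGINAL = [
    [13,  13,  13,  13],
    [17,   7,  -7, -17],
    [13, -13, -13,  13],
    [ 7, -17,  17,  -7],
]

def dynamic_range_gain_bits(h):
    """Worst-case growth, in bits, of the 2-D transform Y = H X H^T."""
    # One 1-D pass can grow a value by at most the largest sum of
    # absolute values in a row; the 2-D transform applies two passes.
    worst_1d = max(sum(abs(c) for c in row) for row in h)
    return math.log2(worst_1d ** 2)

gain = dynamic_range_gain_bits(H_ORIGINAL)
extra_bits = math.ceil(gain)
print(f"gain = {gain:.1f} bits, extra bits needed = {extra_bits}")
```

The worst case comes from the first row, whose absolute values sum to 13 x 4 = 52.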

Choose {a=1, b=2, c=1}

1. Rows are orthogonal to each other.

2. The dynamic range gain is log2(6^2) = 5.17 bits.

3. Although the norm of each row is different, this can easily be compensated in the quantization part.

No noticeable performance penalty, while reducing the dynamic range gain and improving simplicity.
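The listed properties can be checked numerically. The sketch below is one way to do so, under the assumption that the integer matrix comes from scaling the 4-point DCT and rounding. The scale factors 2.5 and 26 are chosen here because rounding then yields the coefficient sets quoted in the text; the function names are hypothetical.

```python
import math

def dct4():
    """Orthonormal 4-point DCT-II matrix in floating point."""
    rows = []
    for k in range(4):
        scale = math.sqrt(1 / 4) if k == 0 else math.sqrt(2 / 4)
        rows.append([scale * math.cos((2 * n + 1) * k * math.pi / 8)
                     for n in range(4)])
    return rows

def scaled_and_rounded(alpha):
    """Integer approximation: round each DCT entry after scaling by alpha."""
    return [[round(alpha * c) for c in row] for row in dct4()]

def gram(h):
    """H * H^T; zero off-diagonal entries mean the rows are orthogonal."""
    return [[sum(a * b for a, b in zip(r1, r2)) for r2 in h] for r1 in h]

H = scaled_and_rounded(2.5)
gain = math.log2(max(sum(abs(c) for c in row) for row in H) ** 2)

print(H)
print(gram(H))                 # diagonal holds the squared row norms
print(f"dynamic range gain = {gain:.2f} bits")
```

The diagonal of `gram(H)` shows that the squared row norms alternate between 4 and 10, which is the difference in norms that the quantization stage has to compensate.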

Inverse transform

We could just use the transpose of H. However, to minimize the dynamic range gain, we scale the rows that have the element 2 in H' by ½, so it becomes:

Dynamic range gain = log2(4^2) = 4 bits.

Also, the factor ½ can be realized by a 1-bit right shift, so no multiplication is needed.
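A minimal sketch of a 1-D inverse transform that uses only additions, subtractions and 1-bit right shifts is given below. The matrix `H_INV` is an assumption (the transpose of H with the ±2 entries scaled by ½), since the matrix is not reproduced in this text, and the butterfly ordering is one common arrangement rather than necessarily the one in the original figure.

```python
# Assumed inverse matrix: transpose of H with the +/-2 entries halved.
H_INV = [
    [1,  1.0,  1,  0.5],
    [1,  0.5, -1, -1.0],
    [1, -0.5, -1,  1.0],
    [1, -1.0,  1, -0.5],
]

def inverse_1d(c):
    """1-D inverse transform with additions and 1-bit right shifts only."""
    c0, c1, c2, c3 = c
    e0 = c0 + c2
    e1 = c0 - c2
    e2 = (c1 >> 1) - c3      # the factor 1/2 as a right shift
    e3 = c1 + (c3 >> 1)
    return [e0 + e3, e1 + e2, e1 - e2, e0 - e3]

def inverse_1d_matrix(c):
    """Reference result from the explicit matrix product."""
    return [sum(h * v for h, v in zip(row, c)) for row in H_INV]

coeffs = [40, -12, 8, 6]
print(inverse_1d(coeffs))
print(inverse_1d_matrix(coeffs))
```

The two results agree exactly when the odd-indexed coefficients are even. For odd values the shift rounds down, so the integer version differs from the exact ½ scaling by a fraction.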

[Figure: forward transform and inverse transform]

Quantization is the step that introduces signal loss in exchange for better compression.

Encoder quantization is given by

$$X_q(i,j) = \mathrm{sign}\{X(i,j)\}\,\frac{|X(i,j)| + f(Q_s)}{Q_s}$$

where $f(Q_s)$ controls the quantization width near the origin.

The decoder produces reverse quantization by

$$X_r(i,j) = Q_s\,X_q(i,j)$$
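The two formulas can be sketched as follows, assuming integer division. The step size `QS` and the offset `F` are hypothetical values picked only to show the effect of the offset near the origin.

```python
def sign(x):
    """Return -1, 0 or 1 according to the sign of x."""
    return (x > 0) - (x < 0)

def quantize(x, qs, f):
    """X_q = sign(X) * (|X| + f) / Qs, using integer division."""
    return sign(x) * ((abs(x) + f) // qs)

def reconstruct(xq, qs):
    """X_r = Qs * X_q."""
    return qs * xq

QS = 10  # hypothetical quantization step
F = 3    # hypothetical offset f(Qs); smaller values widen the zero bin

for x in (-27, -6, 0, 6, 27):
    xq = quantize(x, QS, F)
    print(x, xq, reconstruct(xq, QS))
```

With `F = 3` the input 6 is quantized to 0, whereas an offset of half the step (`F = 5`) would map it to 1. This is the sense in which the offset controls the quantization width near the origin.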

Complexity must be as low as possible. Because H.264 uses predictive coding, errors tend to drift across the entire set of predictions.

32-bit operations have high memory requirements, so the arithmetic should stay as close to 16 bits as possible.

The design must not put undue stress on the hardware, while keeping the prediction free of drift error.

The disadvantage of the quantization equation is that it requires a division by the integer $Q_s$.

In H.264, the quantization takes the form

$$X_q(i,j) = \mathrm{sign}\{X(i,j)\}\,\big[\big(|X(i,j)|\,A(Q) + f\,2^{L}\big) \gg L\big]$$

The inverse quantization is given by

$$X_r(i,j) = X_q(i,j)\,B(Q)$$

and the inverse transform with rounding by

$$x_r = \big(H^T X_r + 2^{N-1} e\big) \gg N$$

The values A(Q) and B(Q) are obtained from the quantization tables.

In the previous equation, $e = [1\ 1\ 1\ 1]^T$.

Q varies from 0 to Qmax, where 0 is the finest and Qmax the coarsest quantization.

Care must be taken when shifting bits to the right: for negative values, a right shift rounds toward negative infinity rather than toward 0.

In the original H.264 design, L = N = 20.
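The multiply-and-shift form of the quantizer, and the reason the sign is handled separately, can be illustrated with a small sketch. The multiplier `A` below is a hypothetical value standing in for a table entry A(Q), and the offset f is passed as a fraction `f_num / f_den` for convenience.

```python
L = 20  # shift used in the original design, per the text

def quantize(x, a, f_num, f_den):
    """X_q = sign(X) * ((|X| * A + f * 2^L) >> L), with f = f_num / f_den."""
    offset = (f_num << L) // f_den
    magnitude = (abs(x) * a + offset) >> L
    # Shifting the magnitude and restoring the sign afterwards keeps the
    # quantizer symmetric around zero.
    return magnitude if x >= 0 else -magnitude

A = (1 << L) // 10  # hypothetical multiplier, roughly a division by 10

print(quantize(27, A, 1, 3), quantize(-27, A, 1, 3))

# Shifting a negative number directly rounds toward negative infinity.
print(-27 >> 1, -(27 >> 1))
```

The last line shows the pitfall named above: `-27 >> 1` gives -14, while negating the shifted magnitude gives -13.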

The values A(Q) and B(Q) must satisfy

$$A(Q)\,B(Q)\,G^2 \approx 2^{L+N}$$

where G is the squared norm of the rows of H.

The values of L and N are chosen as a compromise: larger values reduce the approximation error in the above equation, while smaller values reduce the dynamic range.
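The trade-off between approximation error and the size of L + N can be made concrete with a rough sketch. It assumes the original coefficients {13, 7, 17}, for which every row has the same squared norm G = 676. The reconstruction factor `B` is a hypothetical value, not a table entry, and `best_a` is a hypothetical helper.

```python
# Squared row norm for the coefficients {13, 7, 17}; both row types agree.
G = 4 * 13 * 13
assert G == 2 * 17 * 17 + 2 * 7 * 7

def best_a(b, total_shift):
    """Integer A that brings A * B * G^2 closest to 2^(L+N)."""
    return round(2 ** total_shift / (b * G * G))

def relative_error(b, total_shift):
    """Relative deviation of A * B * G^2 from 2^(L+N) for the best A."""
    a = best_a(b, total_shift)
    target = 2 ** total_shift
    return abs(a * b * G * G - target) / target

B = 100  # hypothetical reconstruction factor
for total_shift in (30, 40):
    print(total_shift, best_a(B, total_shift), relative_error(B, total_shift))
```

With L + N = 40 the integer A is large enough that rounding it barely matters, while with L + N = 30 the same rounding produces an error of about two percent.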

Restricting the quantization formulas to 16-bit arithmetic reduces their complexity considerably.

However, this reduction must not come at the cost of a lower PSNR.

This is achieved by reducing the values of B(Q), L and N.

B(Q) doubles for every increase of 6 in Q, which gives a linear relationship between PSNR and Q.

This makes it easier to design the quantization and reconstruction tables.

H.264 therefore uses the modified quantization and reconstruction formulas

$$X_q(i,j) = \mathrm{sign}\{X(i,j)\}\,\big[\big(|X(i,j)|\,A(Q_M,i,j) + f\,2^{17+Q_E}\big) \gg (17+Q_E)\big]$$

$$X_r(i,j) = \big[X_q(i,j)\,B(Q_M,i,j)\big] \ll Q_E$$

where

$$Q_M = Q \bmod 6, \qquad Q_E = \lfloor Q/6 \rfloor$$

The mod operator makes the quantization factor periodic, so a large range of parameters can be defined without increasing memory requirements.
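The periodic structure can be sketched for a single coefficient position as follows. The six base factors in `B_BASE` are illustrative placeholders spaced by about 2^(1/6); they are not taken from the tables of the standard, and the function names are hypothetical.

```python
def split_q(q):
    """Q_M = Q mod 6 selects the table entry, Q_E = Q // 6 the shift."""
    return q % 6, q // 6

# Placeholder base factors for one coefficient position: six values
# spaced by roughly 2^(1/6), so that one full period doubles the factor.
B_BASE = [round(10 * 2 ** (k / 6)) for k in range(6)]

def reconstruct(xq, q):
    """X_r = (X_q * B(Q_M)) << Q_E for a single coefficient position."""
    q_m, q_e = split_q(q)
    return (xq * B_BASE[q_m]) << q_e

for q in (0, 6, 12, 28):
    print(q, split_q(q), reconstruct(1, q))
```

Only six table entries per position are stored. Every further step of 6 in Q is handled by one more left shift, which doubles the reconstruction factor.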

The matrices shown give the values of A(Q) and B(Q), chosen so as to maximise the dynamic range.

These values ensure that all results fit within 16 bits.
