Henrique S. Malvar, Fellow, IEEE, Antti Hallapuro, Marta Karczewicz, and Louis Kerofsky, Member, IEEE


Outline

Brief recall of the H.264 encoding and decoding structure

Transform in H.264: the DCT and the integer transform

Low-complexity integer transform (proposed by the authors)

Quantization in H.264


Three Parts: Prediction, Transform, Quantization

Input block -> Prediction -> Transform -> Quantization -> Entropy Coding -> Transmit

Prediction: Generate a block prediction by motion estimation.

Transform: Convert the difference between the prediction and the true value into coefficients by an integer transform.

Quantization: Quantize the coefficients.
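As a rough illustration of how the three stages chain together, the following sketch encodes one 4x4 block. It is a minimal sketch under stated assumptions: the function names and the plain uniform `step` quantizer are hypothetical, the matrix `H` is the {a=1, b=2, c=1} integer transform chosen later in this deck, and entropy coding is left out.

```python
# Integer transform matrix H with {a=1, b=2, c=1}, as chosen later in the deck.
H = [
    [1,  1,  1,  1],
    [2,  1, -1, -2],
    [1, -1, -1,  1],
    [1, -2,  2, -1],
]

def matmul(A, B):
    """Plain matrix product of two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def transpose(M):
    return [list(col) for col in zip(*M)]

def encode_block(block, prediction, step):
    """Prediction residual -> transform -> quantization for one 4x4 block."""
    # Prediction stage: only the difference from the prediction is coded.
    residual = [[b - p for b, p in zip(brow, prow)]
                for brow, prow in zip(block, prediction)]
    # Transform stage: Y = H * residual * H^T.
    coeffs = matmul(matmul(H, residual), transpose(H))
    # Quantization stage (assumed here: uniform step, truncating toward zero).
    return [[int(c / step) for c in row] for row in coeffs]

block = [[12] * 4 for _ in range(4)]
prediction = [[10] * 4 for _ in range(4)]
levels = encode_block(block, prediction, step=8)
print(levels[0])  # [4, 0, 0, 0]: a flat residual leaves only the DC coefficient
```

A good prediction leaves a small residual, so most quantized coefficients are zero and cheap to entropy-code.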


DCT (Discrete Cosine Transform): commonly used in block transform coding of images and video, e.g. JPEG and MPEG.

Definition for 8x8 block:
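The formula itself is missing from the text here. The standard orthonormal 8x8 DCT-II, which is presumably what was shown, maps an input block X(i,j) to coefficients Y(u,v) as follows (the symbol C(k) is introduced here for the normalization factor):

$$
Y(u,v) = \frac{1}{4}\, C(u)\, C(v) \sum_{i=0}^{7} \sum_{j=0}^{7} X(i,j) \cos\frac{(2i+1)u\pi}{16} \cos\frac{(2j+1)v\pi}{16},
\qquad
C(k) = \begin{cases} 1/\sqrt{2}, & k = 0 \\ 1, & k > 0 \end{cases}
$$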

It converts an image block from the spatial domain to the frequency domain.


In H.264, a 4x4 block transform is adopted.

Problem: the DCT coefficients are irrational numbers. On a digital computer, applying the inverse transform after the forward transform may therefore not return exactly the original input.
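The 4x4 DCT matrix is not reproduced in the text above. In its usual form, with the entries named a, b and c as on the following pages, it reads:

$$
H = \begin{bmatrix} a & a & a & a \\ b & c & -c & -b \\ a & -a & -a & a \\ c & -b & b & -c \end{bmatrix},
\qquad
a = \tfrac{1}{2}, \quad b = \sqrt{\tfrac{1}{2}} \cos\frac{\pi}{8}, \quad c = \sqrt{\tfrac{1}{2}} \cos\frac{3\pi}{8}
$$

Here b and c are the irrational entries: they can only be stored approximately, so forward and inverse transforms computed on different machines need not agree exactly.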


Solution: Integer Transform, an integer approximation of the DCT.

Original H.264 design: {a=13, b=17, c=7}

Problem: increase in dynamic range.

If max(X(i,j)) = A, then max(Y(u,v)) = A x (13x4)^2 = 2704 x A. Since log2(2704) = 11.4, it takes 12 more bits to represent Y(u,v) than X(i,j).
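The 11.4-bit figure can be checked numerically. The sketch below assumes the 4x4 DCT sign pattern for the matrix (an assumption, since the matrix is not reproduced in the text) with a = 13 and odd-row entries 17 and 7; the function names are illustrative.

```python
import math

def transform_matrix(a, b, c):
    """4x4 matrix with the DCT-like sign pattern (assumed layout)."""
    return [
        [a,  a,  a,  a],
        [b,  c, -c, -b],
        [a, -a, -a,  a],
        [c, -b,  b, -c],
    ]

def dynamic_range_gain_bits(H):
    """Worst-case bits added by the 2-D transform Y = H X H^T."""
    # Per dimension, the worst case is the largest sum of absolute row entries.
    gain_1d = max(sum(abs(v) for v in row) for row in H)
    return math.log2(gain_1d ** 2)

H_orig = transform_matrix(13, 17, 7)
gain = dynamic_range_gain_bits(H_orig)
print(round(gain, 1))  # 11.4
```

The worst row is the all-13 row, whose absolute sum is 4 x 13 = 52, which is where the factor (13x4)^2 = 2704 comes from.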


Choose {a=1, b=2, c=1}

1. Rows are orthogonal to each other.
2. The dynamic range gain is log2(6^2) = 5.17 bits.
3. Although the rows have different norms, this can easily be compensated for in the quantization stage.

There is no noticeable performance penalty, while the dynamic range gain is reduced and the transform becomes simpler.
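The three properties listed for {a=1, b=2, c=1} can be verified in a few lines. This is a minimal check, again assuming the 4x4 DCT sign pattern for the matrix; the variable names are illustrative.

```python
import math

# Assumed layout: DCT sign pattern with a=1, b=2, c=1.
H = [
    [1,  1,  1,  1],
    [2,  1, -1, -2],
    [1, -1, -1,  1],
    [1, -2,  2, -1],
]

def dot(u, v):
    return sum(x * y for x, y in zip(u, v))

# 1. Every pair of distinct rows has a zero inner product.
off_diagonal = [dot(H[i], H[j]) for i in range(4) for j in range(4) if i != j]

# 2. Dynamic range gain from the largest absolute row sum (here 6).
gain_bits = math.log2(max(sum(abs(v) for v in row) for row in H) ** 2)

# 3. The rows do not share one norm; quantization has to absorb the difference.
squared_norms = [dot(row, row) for row in H]

print(set(off_diagonal))    # {0}
print(round(gain_bits, 2))  # 5.17
print(squared_norms)        # [4, 10, 4, 10]
```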


Inverse transform

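We could simply use the transpose H' of H. However, to minimize the dynamic range gain, we scale by ½ the columns of H' that come from the rows of H containing the element 2. The inverse transform matrix becomes

$$
H_{inv} = \begin{bmatrix} 1 & 1 & 1 & 1/2 \\ 1 & 1/2 & -1 & -1 \\ 1 & -1/2 & -1 & 1 \\ 1 & -1 & 1 & -1/2 \end{bmatrix}
$$

Dynamic range gain = log2(4^2) = 4 bits.

Also, the factor ½ can be implemented as a 1-bit right shift, so no multiplication is needed.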
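To see that the scaled inverse still recovers the input, the sketch below applies the forward matrix, a per-coefficient compensation, and the inverse matrix (the transpose of H with the ½ scaling applied) to one 4-sample vector. The `scale` values are derived here from the squared row norms 4 and 10 and are an assumption of this sketch; in the codec this compensation is folded into quantization. Exact fractions are used so that no rounding hides an error.

```python
from fractions import Fraction

H = [
    [1,  1,  1,  1],
    [2,  1, -1, -2],
    [1, -1, -1,  1],
    [1, -2,  2, -1],
]

half = Fraction(1, 2)
# Transpose of H with the columns coming from the "2" rows scaled by 1/2.
H_inv = [
    [1,  1,     1,  half],
    [1,  half, -1, -1],
    [1, -half, -1,  1],
    [1, -1,     1, -half],
]

# Assumed compensation per coefficient: 1/4 for even rows, 1/5 for odd rows,
# so that H_inv * diag(scale) * H is the identity.
scale = [Fraction(1, 4), Fraction(1, 5), Fraction(1, 4), Fraction(1, 5)]

def matvec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]

x = [10, -3, 7, 25]
X = matvec(H, x)                           # forward transform
X_scaled = [s * c for s, c in zip(scale, X)]
x_back = matvec(H_inv, X_scaled)           # inverse transform

print(X)            # [39, -40, 31, 5]
print(x_back == x)  # True
```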


[Figure panels: Forward transform; Inverse transform]
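One common way to compute both transforms with only additions, subtractions and 1-bit shifts is a butterfly structure. The following is a sketch of that idea for a single row of four samples, not necessarily the exact flow shown in the figure; the function names are illustrative, and the norm compensation is assumed to happen in quantization.

```python
def forward_1d(x):
    """X = H x for {a=1, b=2, c=1}, using adds and left shifts only."""
    s0, s1 = x[0] + x[3], x[1] + x[2]   # sums of mirrored samples
    d0, d1 = x[0] - x[3], x[1] - x[2]   # differences of mirrored samples
    return [s0 + s1, (d0 << 1) + d1, s0 - s1, d0 - (d1 << 1)]

def inverse_1d(X):
    """Inverse with the 1/2 factors done as 1-bit right shifts."""
    e0, e1 = X[0] + X[2], X[0] - X[2]   # even-indexed coefficients
    o0 = (X[1] >> 1) - X[3]             # odd-indexed coefficients
    o1 = X[1] + (X[3] >> 1)
    return [e0 + o1, e1 + o0, e1 - o0, e0 - o1]

print(forward_1d([10, -3, 7, 25]))   # [39, -40, 31, 5]
print(inverse_1d([4, 8, -12, 6]))    # [3, 14, 18, -19]
```

A 4x4 block is handled by applying the 1-D routine to every row and then to every column.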


Quantization is the step that introduces signal loss in exchange for better compression.

Encoder quantization is given by

$$
X_q(i,j) = \mathrm{sign}\{X(i,j)\} \, \frac{|X(i,j)| + f(Q_s)}{Q_s}
$$

where Q_s is the quantization step size and f(Q_s) controls the quantization width near the origin.

The decoder performs reverse quantization by

$$
X_r(i,j) = Q_s \, X_q(i,j)
$$
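The effect of the offset f(Q_s) near the origin is easiest to see on a few numbers. The sketch below assumes that the division truncates to an integer, which the formula leaves implicit; the step size 10 and the offsets 3 and 5 are hypothetical values chosen for illustration.

```python
def quantize(X, Qs, f):
    """X_q = sign(X) * floor((|X| + f) / Qs); integer truncation is assumed."""
    sign = -1 if X < 0 else 1
    return sign * ((abs(X) + f) // Qs)

def reconstruct(Xq, Qs):
    """X_r = Qs * X_q."""
    return Qs * Xq

print(quantize(27, 10, 3), reconstruct(quantize(27, 10, 3), 10))  # 3 30
print(quantize(-27, 10, 3))                                        # -3
print(quantize(6, 10, 3), quantize(6, 10, 5))                      # 0 1
```

With the smaller offset the input 6 falls into the zero bin, while the larger offset rounds it up to level 1: a smaller f widens the zero bin around the origin.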


Complexity must be as low as possible. Because H.264 uses predictive coding, any reconstruction error carries over from one prediction to the next and tends to drift across the entire sequence.

32-bit operations carry high memory requirements, so the arithmetic should stay as close to 16 bits as possible.

The design must not put undue stress on the hardware, while still keeping the prediction free of drift error.


The disadvantage of the quantization equation is that it requires a division by Q_s.

In the H.264 format the quantization is of the form

$$
X_q(i,j) = \mathrm{sign}\{X(i,j)\} \left[ \left( |X(i,j)| \, A(Q) + f \, 2^{L} \right) \gg L \right]
$$

The inverse quantization is given by

$$
X_r(i,j) = X_q(i,j) \, B(Q)
$$

followed by the inverse transform

$$
x_r = \left( H^{T} X_r + 2^{N-1} e \right) \gg N
$$

The values A(Q) and B(Q) are obtained from the quantization tables.
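The point of the multiply-and-shift form is that it reproduces a division without performing one. The sketch below is a simplified illustration under stated assumptions: the multiplier is taken as round(2^L / Q_s) for a hypothetical step size Q_s = 10, the transform-norm compensation is ignored, and f = 1/3 is a hypothetical rounding fraction.

```python
L = 20

def quantize_shift(X, A, L, f_num=1, f_den=3):
    """X_q = sign(X) * ((|X| * A + f * 2**L) >> L), with f = f_num / f_den."""
    offset = (f_num << L) // f_den      # f * 2**L as an integer
    sign = -1 if X < 0 else 1
    return sign * ((abs(X) * A + offset) >> L)

def reconstruct(Xq, B):
    """X_r = X_q * B."""
    return Xq * B

Qs = 10                     # hypothetical step size
A = round(2 ** L / Qs)      # multiplier standing in for a table entry A(Q)

print(quantize_shift(27, A, L))   # 3
print(quantize_shift(-27, A, L))  # -3
print(quantize_shift(6, A, L))    # 0
print(quantize_shift(7, A, L))    # 1
```

The sign is removed before the shift and restored afterwards, so the shift only ever sees non-negative values.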


In the previous equation, e = [1 1 1 1]^T, and Q varies from 0 to Qmax, where 0 gives the finest and Qmax the coarsest quantization.

Care must be taken when shifting bits to the right: unlike division, a right shift of a negative number rounds toward negative infinity and not toward 0.

In the original H.264 design, L=N=20.
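The rounding caveat about right shifts can be seen directly in Python, whose `>>` on negative integers floors the result like an arithmetic shift. The helper names below are illustrative.

```python
def shift_signed(value, bits):
    """Plain right shift: rounds toward negative infinity."""
    return value >> bits

def shift_magnitude(value, bits):
    """Shift the magnitude and restore the sign: rounds toward zero."""
    sign = -1 if value < 0 else 1
    return sign * (abs(value) >> bits)

print(shift_signed(7, 1), shift_signed(-7, 1))        # 3 -4
print(shift_magnitude(7, 1), shift_magnitude(-7, 1))  # 3 -3
```

Handling the sign separately, as the quantization formula does with sign{X(i,j)}, keeps positive and negative coefficients symmetric.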


The values A(Q) and B(Q) must satisfy

$$
A(Q) \, B(Q) \, G^{2} \approx 2^{L+N}
$$

where G is the squared norm of the rows of H.

The values of L and N are chosen as a compromise: larger values reduce the approximation error in the equation above, while smaller values reduce the dynamic range.
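Given one table value, the constraint fixes its partner up to rounding. The sketch below uses G = 4 x 13^2 = 676 (the squared row norm for a = 13) and L = N = 20 from the original design; the value A = 620 is a hypothetical table entry used only for illustration, and `matching_B` is an illustrative name.

```python
G = 4 * 13 ** 2   # squared row norm of the original matrix with a = 13
L = N = 20        # shift amounts of the original design

def matching_B(A, G=G, L=L, N=N):
    """Integer B that best satisfies A * B * G**2 ~= 2**(L + N)."""
    return round(2 ** (L + N) / (A * G ** 2))

A_example = 620                      # hypothetical table entry
B_example = matching_B(A_example)
rel_err = abs(A_example * B_example * G ** 2 - 2 ** (L + N)) / 2 ** (L + N)

print(G, B_example)   # 676 3881
print(rel_err < 1e-3) # True
```

The residual error comes from rounding B to an integer, which is why larger L and N give a closer approximation.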


The complexity of the quantization formulae is reduced considerably by restricting them to 16-bit arithmetic.

However, this reduction must not come at the cost of a lower PSNR.

This is done by effectively reducing the values of B(Q), L and N.

B(Q) effectively doubles for every increase of 6 in Q, which gives an approximately linear relationship between PSNR and Q.

This makes it easier to design the quantization and reconstruction tables.


H.264 therefore uses the modified quantization and reconstruction formulae

$$
X_q(i,j) = \mathrm{sign}\{X(i,j)\} \left[ \left( |X(i,j)| \, A(Q_M, i, j) + f \, 2^{17+Q_E} \right) \gg (17 + Q_E) \right]
$$

$$
X_r(i,j) = X_q(i,j) \, B(Q_M, i, j) \ll Q_E
$$

where

$$
Q_M = Q \bmod 6, \qquad Q_E = \lfloor Q / 6 \rfloor
$$

The mod operator makes the quantization factor periodic, which makes it easy to define a large range of quantization parameters without increasing memory requirements.
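The periodic structure means only six base values per coefficient position need to be stored; every higher Q reuses them with an extra left shift. The sketch below illustrates this for the reconstruction side. The six values in `B_BASE` are an assumption standing in for one (i, j) entry of the B table, which is not reproduced in the text here, and the function names are illustrative.

```python
# Assumed base values for one coefficient position, one per Q_M = 0..5.
B_BASE = [10, 11, 13, 14, 16, 18]

def split_q(Q):
    """Return (Q_M, Q_E) = (Q mod 6, floor(Q / 6))."""
    return Q % 6, Q // 6

def reconstruct(Xq, Q):
    """X_r = X_q * B(Q_M) << Q_E."""
    Q_M, Q_E = split_q(Q)
    return (Xq * B_BASE[Q_M]) << Q_E

def effective_B(Q):
    """Reconstruction factor actually applied at quantization parameter Q."""
    return reconstruct(1, Q)

print(split_q(29))                                     # (5, 4)
print([effective_B(Q) for Q in (0, 6, 12)])            # [10, 20, 40]
print([effective_B(Q) for Q in (1, 7, 13)])            # [11, 22, 44]
```

Each increase of 6 in Q doubles the effective factor, so a table with six rows covers the whole range of Q.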


The matrices shown give the values of A(Q) and B(Q), chosen to make the fullest use of the available dynamic range.

These values ensure that all results always fit within 16 bits.