
Complexity Reduction Algorithm for Intra Mode Selection in H.264/AVC Video Coding
By Amruta Kulkarni, Student ID: 1000666836
Under supervision of Dr. K.R. Rao

H.264/MPEG-4 Part 10, or AVC, is an advanced video compression standard developed by the ITU-T Video Coding Experts Group (VCEG) together with the ISO/IEC Moving Picture Experts Group (MPEG). It is widely used in mobile applications, internet video (YouTube, Flash players), set-top boxes, DTV, etc. An H.264 encoder converts video into a compressed format (.264), and a decoder converts the compressed video back into an uncompressed format.

How does the H.264 codec work?
An H.264 video encoder carries out prediction, transform, and encoding processes to produce a compressed H.264 bit stream. The block diagram of the H.264 video encoder is shown in Fig. 1. A decoder carries out the complementary process of decoding, inverse transform, and reconstruction to output a decoded video sequence. The block diagram of the H.264 video decoder is shown in Fig. 2.

Fig. 1 H.264 encoder block diagram [7]

Fig. 2 H.264 decoder block diagram [2]. Blocks shown: bitstream input, entropy decoding, inverse quantization and inverse transform, intra prediction, motion compensation, intra/inter mode selection, deblocking filter, picture buffering, video output.

Profiles in H.264
The H.264 standard defines sets of capabilities, referred to as profiles, each targeting a specific class of applications (Fig. 3). Different features are supported in different profiles depending on the application. Table 1 lists some profiles and their applications.

Profile     Applications
Baseline    Video conferencing, videophone
Main        Digital storage media, television broadcasting
High        Streaming video
Extended    Content distribution, post processing
Table 1. List of H.264 profiles and applications [2]

Fig. 3 Profiles in H.264 [9]

Prediction
The H.264 encoder forms a prediction of the current macro block:
a) based on the current frame, using intra (spatial) prediction;
b) based on previous frames that have already been coded, using inter prediction.

Intra prediction
Intra prediction supports the following block sizes:
a) 16x16 (for luma)
Fig. 4 H.264 Intra 16x16 prediction modes [9]

Intra prediction
The H.264 Intra 16x16 prediction modes are shown in Fig. 4.
Mode 0 (vertical): extrapolation from the upper samples (H).
Mode 1 (horizontal): extrapolation from the left samples (V).
Mode 2 (DC): mean of the upper and left-hand samples (H+V).
Mode 3 (plane): a linear plane function is fitted to the upper and left-hand samples H and V. This works well in areas of smoothly varying luminance.
b) 8x8 (for chroma): DC, plane, horizontal, and vertical modes.
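The first three 16x16 modes can be illustrated with a minimal Python sketch. The function name `predict_16x16` and the list-of-rows block layout are illustrative assumptions, not part of JM 17.2; the plane mode is left out, and both neighbouring blocks are assumed to be available.

```python
def predict_16x16(mode, upper, left):
    """Return a 16x16 prediction block as a list of rows.

    upper: the 16 reconstructed samples above the block (H in the slide notation)
    left:  the 16 reconstructed samples to the left of the block (V)
    Hypothetical helper for illustration only; assumes both neighbours exist.
    """
    n = 16
    if mode == 0:
        # Vertical: every row repeats the upper samples.
        return [list(upper) for _ in range(n)]
    if mode == 1:
        # Horizontal: every row is filled with its left sample.
        return [[left[y]] * n for y in range(n)]
    if mode == 2:
        # DC: rounded mean of the 32 upper and left samples.
        dc = (sum(upper) + sum(left) + n) // (2 * n)
        return [[dc] * n for _ in range(n)]
    raise ValueError("mode 3 (plane) is not covered by this sketch")


upper = list(range(16))
left = [100] * 16
print(predict_16x16(2, upper, left)[0][0])
```

The residual that is later transformed and coded is the sample-wise difference between the original block and the block returned here.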

Intra prediction
c) 4x4 (for luma): the modes for intra prediction are shown in Fig. 5 [2].
Fig. 5 H.264 Intra 4x4 prediction modes [2]

Introduction
In H.264/AVC, intra prediction is conducted for 4x4 luma blocks, 16x16 luma blocks, and 8x8 chroma blocks in the baseline profile with 4:2:0 format. The residual between the current block and its prediction is then transformed, quantized, and entropy coded. To obtain the best mode for these blocks, the H.264/AVC encoder performs rate-distortion optimization (RDO) for each macro block.

RDO procedure for one macro block
Set the macro block parameters: QP (quantization parameter) and the Lagrangian multiplier $\lambda_{MODE}$, calculated as
$\lambda_{MODE} = 0.85 \times 2^{(QP-12)/3}$
Then calculate the cost, which determines the best mode:
$Cost = D + \lambda_{MODE} \times R$
where
D: distortion, obtained as the SATD (sum of absolute transformed differences) between the original macro block and its reconstructed block.
R: bit rate at the given QP, including the bits for the mode information and the transform coefficients of the macro block.
QP: quantization parameter, which can vary from 0 to 51.
$\lambda_{MODE}$: Lagrange multiplier, a value representing the relationship between bit cost and quality.
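As a small worked illustration of the cost above, the following Python sketch evaluates the multiplier and the cost, reading the multiplier as 0.85 times 2 raised to the power (QP-12)/3. The function names `lambda_mode` and `rd_cost` are illustrative and do not correspond to JM 17.2 identifiers.

```python
def lambda_mode(qp):
    """Lagrangian multiplier for mode decision: 0.85 * 2^((QP - 12) / 3)."""
    if not 0 <= qp <= 51:
        raise ValueError("QP must lie in the range 0..51")
    return 0.85 * 2 ** ((qp - 12) / 3)


def rd_cost(distortion, rate_bits, qp):
    """Rate-distortion cost: Cost = D + lambda_MODE * R."""
    return distortion + lambda_mode(qp) * rate_bits


# At QP = 12 the exponent is zero, so the multiplier is exactly 0.85.
print(lambda_mode(12))
# Illustrative numbers only: D = 100, R = 10 bits, QP = 12.
print(rd_cost(100, 10, 12))
```

Because the multiplier doubles every 3 QP steps, the rate term weighs more heavily in the decision at coarse quantization.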

RDO procedure for one macro block
1. For a 4x4 luma block, select the best mode, which minimizes the cost among the 9 modes.
2. For a 16x16 luma block, choose the mode with the minimum SATD (sum of absolute transformed differences) among the 4 modes as the best mode.
3. For an 8x8 chroma block, choose the mode with the minimum cost among the 4 modes.
4. Choose the best one as the prediction mode of one macro block by comparing the RD costs of the 4x4 mode obtained from step (1) and the 16x16 mode from step (2).
5. Choose the best one as the prediction mode of one chroma block.
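The luma part of this procedure (steps 1, 2, and 4) can be sketched as follows. This is a simplified illustration: the per-mode costs are assumed to be given as dictionaries, the cost of the 4x4 option is assumed to be the sum of the 16 per-block minima, and the names `best_mode` and `choose_luma_partition` are hypothetical.

```python
def best_mode(costs):
    """Return (mode, cost) for the entry with the smallest cost.

    costs: dict mapping a mode number to its cost.
    """
    mode = min(costs, key=costs.get)
    return mode, costs[mode]


def choose_luma_partition(costs_4x4_per_block, costs_16x16):
    """Pick between sixteen 4x4 predictions and one 16x16 prediction.

    costs_4x4_per_block: list of 16 dicts (mode -> cost), one per 4x4 block
    costs_16x16:         dict (mode -> cost) for the whole macro block
    """
    # Step 1: best mode for each 4x4 luma block.
    picks_4x4 = [best_mode(costs) for costs in costs_4x4_per_block]
    total_4x4 = sum(cost for _, cost in picks_4x4)
    # Step 2: best mode for the 16x16 luma block.
    mode_16, cost_16 = best_mode(costs_16x16)
    # Step 4: keep whichever option is cheaper for the macro block.
    if total_4x4 < cost_16:
        return "I4x4", [mode for mode, _ in picks_4x4], total_4x4
    return "I16x16", mode_16, cost_16


blocks = [{0: 5, 1: 3}] * 16
print(choose_luma_partition(blocks, {0: 60, 2: 50}))
```

The chroma decision (steps 3 and 5) would reuse `best_mode` on the four chroma mode costs.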

RDO (rate-distortion optimization)
Considering the RDO procedure for intra mode selection in H.264/AVC, the number of mode combinations in one macro block is N8 x (16 x N4 + N16), where
N8: number of modes of an 8x8 chroma block (4 modes)
N4: number of modes of a 4x4 luma block (9 modes)
N16: number of modes of a 16x16 luma block (4 modes)
N8 x (16 x N4 + N16) = 4 x (16 x 9 + 4) = 592
Thus, to select the best mode for one macro block in intra prediction, the H.264/AVC encoder carries out 592 RDO calculations, which increases the complexity of the encoder considerably.
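The count can be checked with a few lines of Python. The function name `rdo_combinations` is illustrative. The second call plugs the reduced mode counts listed later in Table 2 into the same expression; that result is simple arithmetic on the expression above rather than a figure quoted from the slides.

```python
def rdo_combinations(n8, n4, n16, luma_4x4_blocks=16):
    """Number of mode combinations per macro block: N8 * (16 * N4 + N16)."""
    return n8 * (luma_4x4_blocks * n4 + n16)


# Full search: 4 chroma modes, 9 modes per 4x4 block, 4 modes per 16x16 block.
print(rdo_combinations(n8=4, n4=9, n16=4))
# Same expression with the reduced candidate counts (2, 4, 2).
print(rdo_combinations(n8=2, n4=4, n16=2))
```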

SATD (sum of absolute transformed differences) calculation
The following steps are performed in JM 17.2 to calculate the SATD:
a) Find the absolute difference (magnitude) between the original 16x16 block and the predicted 16x16 block.
b) Apply the Hadamard transform to every 4x4 block.
c) Sum the transform coefficients of every 4x4 block, except the DC coefficient, into sum_cost.
d) If sum_cost > max_cost, return max_cost.
e) Otherwise, apply the Hadamard transform on every 4x4 block and add the sum of all 16 DC coefficients to sum_cost.
f) If sum_cost > max_cost, return max_cost; otherwise return sum_cost.
where
sum_cost: cost calculated for each block partition.
max_cost: maximum cost value allowed for each block size.
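To make the Hadamard step concrete, here is a generic textbook SATD for a single 4x4 block in Python. It is not the JM 17.2 routine: it applies no scaling, has no early termination against max_cost, and does not treat the DC coefficients separately. All names (`H4`, `matmul4`, `satd_4x4`) are illustrative.

```python
# 4x4 Hadamard matrix (symmetric, entries +1/-1).
H4 = [
    [1, 1, 1, 1],
    [1, 1, -1, -1],
    [1, -1, -1, 1],
    [1, -1, 1, -1],
]


def matmul4(a, b):
    """Multiply two 4x4 matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def satd_4x4(original, predicted):
    """Sum of absolute values of the Hadamard-transformed difference block."""
    diff = [[original[y][x] - predicted[y][x] for x in range(4)]
            for y in range(4)]
    # 2-D transform: H4 * diff * H4^T (H4 is symmetric, so H4^T == H4).
    coeffs = matmul4(matmul4(H4, diff), H4)
    return sum(abs(c) for row in coeffs for c in row)


flat = [[10] * 4 for _ in range(4)]
shifted = [[11] * 4 for _ in range(4)]
# A constant difference of 1 puts all energy into the DC coefficient.
print(satd_4x4(shifted, flat))
```

For a 16x16 block, a cost in this spirit would accumulate the result over the sixteen 4x4 sub-blocks.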

Introduction
This project uses the baseline profile, as it provides simplicity in implementation. Its important features are:
a) I and P slice coding
b) Enhanced error resilience tools such as flexible macro block ordering (FMO), arbitrary slice ordering (ASO), and redundant slices (RS)
c) Context-adaptive variable length coding (CAVLC)
The baseline profile is primarily used for low-cost applications that require robustness to data loss. The joint model (JM 17.2) [6] implementation of the H.264 encoder is used in this project.

Complexity reduction algorithm for intra prediction
This project implements the complexity reduction algorithm for all 3 block sizes:
1) 16x16 luma
2) 4x4 luma
3) 8x8 chroma

Algorithm
The proposed intra mode selection algorithm for a 16x16 luma block is summarized as follows:
Step 1 - Examine the sizes of the adjacent blocks: if both blocks (the upper block and the left block) are 16x16, go to Step 2; otherwise go to Step 4.
Step 2 - Examine the modes of the adjacent blocks: if both modes are the same, go to Step 3; otherwise select as the best mode for the 16x16 luma block whichever of the two adjacent modes, modeA and modeB, gives the minimum SATD (sum of absolute transformed differences).
Step 3 - If both adjacent modes are DC mode, go to Step 4; otherwise select as the best mode for the 16x16 luma block whichever of the adjacent mode and DC mode gives the minimum SATD.
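Steps 1 to 3 can be sketched as a small Python function that returns the two candidate modes, or `None` when the decision falls through to Step 4. Representing each neighbour as a `(block_size, mode)` tuple and the name `candidates_from_neighbours` are assumptions made for illustration.

```python
DC = 2  # Intra 16x16 mode numbering: 0 vertical, 1 horizontal, 2 DC, 3 plane


def candidates_from_neighbours(upper, left):
    """Return the candidate modes from Steps 1-3, or None to continue at Step 4.

    upper, left: (block_size, mode) tuples describing the adjacent blocks,
                 for example (16, 0) for a 16x16 block coded in vertical mode.
    """
    (size_upper, mode_upper), (size_left, mode_left) = upper, left
    # Step 1: both neighbours must be 16x16 blocks.
    if size_upper != 16 or size_left != 16:
        return None
    # Step 2: different neighbour modes become the two candidates.
    if mode_upper != mode_left:
        return [mode_upper, mode_left]
    # Step 3: identical modes; if both are DC, fall through to Step 4.
    if mode_upper == DC:
        return None
    return [mode_upper, DC]


print(candidates_from_neighbours((16, 0), (16, 1)))
```

The encoder would then compute the SATD only for the returned candidates and keep the smaller one.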

Algorithm
Figure: Modes of the upper and left blocks used as candidate modes [8]
Figure: Indices used in a 4x4 luma block [8]


Algorithm
Step 4 - Let V be the vertical difference between the upper boundary pixels of the current block and the boundary pixels of the upper block, and let H be the horizontal difference between the left boundary pixels of the current block and the boundary pixels of the left block:
$V = \sum_{i=0}^{15} |u(i) - q(i)|$
$H = \sum_{i=0}^{15} |l(i) - r(i)|$
where
u(i): boundary pixels of the upper block
q(i): upper boundary pixels of the current block
l(i): boundary pixels of the left block
r(i): left boundary pixels of the current block
$T_2$: threshold level 2 ($T_2 = 32$)

Algorithm
Obtain the candidate modes from the two difference values V and H:
if $|V - H|$ is smaller than $T_2$, the candidate modes are DC mode and plane mode;
if $(V - H)$ is larger than $T_2$, the candidate modes are DC mode and horizontal mode;
if $(V - H)$ is smaller than $-T_2$, the candidate modes are DC mode and vertical mode,
where $T_2$ is a positive threshold value equal to 32. The calculation of V and H is shown in Fig. 6. Finally, select the best mode among the candidate modes by choosing the mode with the minimum SATD.
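Step 4 can be sketched in Python under a few stated assumptions: V and H are taken as sums of absolute differences over the 16 boundary pixels, the three conditions are read as |V - H| < T2, V - H > T2, and V - H < -T2, and the boundary case |V - H| = T2 is assigned to the DC and plane pair. Function names are illustrative.

```python
VERTICAL, HORIZONTAL, DC, PLANE = 0, 1, 2, 3
T2 = 32  # threshold level 2 from the slides


def boundary_difference(neighbour_pixels, current_pixels):
    """Sum of absolute differences along one 16-pixel block boundary."""
    return sum(abs(a - b) for a, b in zip(neighbour_pixels, current_pixels))


def candidates_from_boundaries(u, q, l, r, t2=T2):
    """Return the two candidate modes for a 16x16 luma block.

    u: boundary pixels of the upper block,  q: upper boundary of current block
    l: boundary pixels of the left block,   r: left boundary of current block
    """
    v = boundary_difference(u, q)
    h = boundary_difference(l, r)
    if v - h > t2:
        # Poor match with the upper block: prefer horizontal prediction.
        return [DC, HORIZONTAL]
    if v - h < -t2:
        # Poor match with the left block: prefer vertical prediction.
        return [DC, VERTICAL]
    # Comparable differences (ties included, by assumption in this sketch).
    return [DC, PLANE]


same = [10] * 16
print(candidates_from_boundaries([20] * 16, same, same, same))
```

The intuition is that a large V means the current block does not continue the upper block well, so vertical extrapolation is unlikely to win and is dropped from the candidates.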

Algorithm
Fig. 6 Calculation of V and H [8]

Comparison of number of RDO computations
After implementing the complexity reduction algorithm, the number of mode decisions for every block size in intra prediction is reduced. Table 2 lists the reduction in mode decisions.

Intra block size    Number of modes (original JM 17.2)    Number of modes (complexity reduction algorithm)
16x16               4                                     2
8x8                 4                                     2
4x4                 9                                     4
Table 2. Reduction in the number of mode decisions for intra block prediction [8]

CIF and QCIF sequences
CIF (Common Intermediate Format) is a format used to standardize the horizontal and vertical resolutions, in pixels, of Y, Cb, Cr sequences in video signals. It is commonly used in video teleconferencing systems. QCIF means "Quarter CIF": the frame has one fourth of the area of CIF, which implies that both the height and the width are halved. The differences in Y, Cb, Cr between CIF and QCIF are shown in Fig. 7 [16].
Fig. 7 CIF and QCIF resolutions (Y, Cb, Cr)

Results
The following QCIF and CIF sequences were used to test the complexity reduction algorithm [10]: Akiyo, Foreman, Car phone, Hall monitor, Silent, News, Container, Coastguard.

Test sequences (sample frames): News, Foreman, Akiyo, Coastguard, Car phone, Bus, Hall monitor, Silent

Results
The complexity reduction algorithm was successfully implemented in the JM 17.2 software.
Profile used: Baseline
Total number of frames: 100 (I frames only)
QP (quantization parameter): 28
Table 1 shows the simulation results for QCIF sequences. Table 2 shows the simulation results for CIF sequences. Computational efficiency is measured by the amount of time reduction, which is computed as follows:
Time is calculated as:

Results
MSE is calculated as:
Bit rate is calculated as:
PSNR is calculated as:
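As a point of reference, a minimal Python sketch of how such metrics are conventionally computed is given here. It assumes 8-bit samples (peak value 255) and relative changes measured against the original JM 17.2 encoder. These are the standard textbook definitions, not necessarily the exact expressions behind the result tables, and the function names are illustrative.

```python
import math


def mse(original, reconstructed):
    """Mean squared error between two equally sized flat sample lists."""
    if not original or len(original) != len(reconstructed):
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((a - b) ** 2 for a, b in zip(original, reconstructed)) / len(original)


def psnr(mse_value, peak=255):
    """Peak signal-to-noise ratio in dB: 10 * log10(peak^2 / MSE)."""
    if mse_value == 0:
        return math.inf
    return 10 * math.log10(peak ** 2 / mse_value)


def percent_change(proposed, reference):
    """Relative change in percent; negative means the proposed value is lower."""
    return (proposed - reference) / reference * 100


# Illustrative numbers only: 90 s versus a 100 s reference encoding time.
print(percent_change(90, 100))
```

With this sign convention a negative time change is a gain and a positive bit-rate change is a loss, which matches the notes under the result tables.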

Results

Sequence (QCIF)    Time (%)    PSNR (dB)    Bit rate (%)    MSE
Akiyo              -10.203      0.014       3.59             0.033
Foreman            -10.942     -0.004       2.03            -0.012
Car phone           -9.768     -0.002       4.33            -0.012
Hall monitor       -10.826      0.002       2.78             0.011
Silent             -10.669     -0.002       2.98            -0.007
News               -10.566      0.004       1.81             0.080
Container           -9.107      0.008       1.33            -0.019
Coastguard         -10.629     -0.021       2.72            -0.082
* Negative values indicate a gain (e.g. a decrease in encoding time).
* Positive values indicate a loss (e.g. an increase in the bit rate).
Table 1: Test results for QCIF sequences

Results

Sequence (CIF)    Time (%)    PSNR (dB)    Bit rate (%)    MSE
Bus               -10.459     -0.006       5.37             0.111
Container         -10.495      0.050       3.93            -0.027
Coastguard        -10.287     -0.001       2.72            -0.020
* Negative values indicate a gain (e.g. a decrease in encoding time).
* Positive values indicate a loss (e.g. an increase in the bit rate).
Table 2: Test results for CIF sequences

References
1. ITU-T Rec. H.264 | ISO/IEC 14496-10: Information Technology - Coding of Audio-visual Objects, Part 10: Advanced Video Coding, 2002.
2. T. Wiegand, et al., Overview of the H.264/AVC Video Coding Standard, IEEE Trans. Circuits and Syst. for Video Technol., vol. 13, pp. 560-576, July 2003.
3. Z. Chen, et al., Fast Integer Pixel and Fractional Pixel Motion Estimation for JVT, Doc. #JVT-F017, Dec. 2002.
4. B. Hsieh, et al., Fast Motion Estimation for H.264/MPEG-4 AVC by Using Multiple Reference Frame Skipping Criteria, VCIP 2003, Proceedings of SPIE, vol. 5150, pp. 1551-1560, Oct. 2003.
5. A. Puri, et al., Video coding using the H.264/MPEG-4 AVC compression standard, Signal Processing: Image Communication, vol. 19, pp. 793-849, Oct. 2004.
6. H.264/AVC JM software: http://iphome.hhi.de/suehring/tml/
7. An overview of the H.264 encoder: www.vcodex.com
8. J. Kim, et al., Complexity reduction algorithm for intra mode selection in H.264/AVC video coding, in J. Blanc-Talon et al. (Eds.): ACIVS 2006, LNCS 4179, pp. 454-465, Springer-Verlag Berlin Heidelberg, 2006.
9. I. Richardson, The H.264 advanced video compression standard, second edition, Wiley, 2010.
10. YUV test video sequences: http://trace.eas.asu.edu/yuv/

