
Performance and Computational Complexity Assessment of High-Efficiency Video Encoders

A project proposal on

Performance and Computational Complexity Assessment of High-Efficiency Video Encoders

Under the guidance of Dr. K. R. Rao

For the fulfillment of the course Multimedia Processing (EE5359)Spring 2016

Submitted by

Manu Rajendra Sheelvant

UTA ID: 1001112728

Email id: [email protected]

Department of Electrical Engineering

The University of Texas at Arlington


Table of Contents

List of Acronyms and Abbreviations
Abstract
1. Objective of the Project
2. H.265 / High Efficiency Video Coding (HEVC)
   2.1 Introduction
   2.2 Encoder and Decoder in HEVC
   2.3 Features of HEVC
      2.3.1 Coding tree units and coding tree block (CTB) structure
      2.3.2 Coding units (CUs) and coding blocks (CBs)
      2.3.3 Prediction units and prediction blocks (PBs)
      2.3.4 TUs and transform blocks
      2.3.5 Motion vector signalling
      2.3.6 Motion compensation
      2.3.7 Intra-picture prediction
      2.3.8 Quantization control
      2.3.9 Entropy coding
      2.3.10 In-loop de-blocking filtering
      2.3.11 Sample adaptive offset (SAO)
3. Recent Work on Complexity Analysis
   3.1 Reference Code Analysis
   3.2 Reported Runtime
   3.3 Time Stamp Counter
   3.4 Energy Consumption Analysis and Estimation
   3.5 Software Profilers
4. HEVC Encoders
5. Experimental Setup
6. Comparison Metrics
7. Implementation
8. Implementation Results
9. References


List of Acronyms and Abbreviations

ALF: Adaptive Loop Filter.

AVC: Advanced Video Coding.

CABAC: Context Adaptive Binary Arithmetic Coding.

CTB: Coding Tree Block.

CTU: Coding Tree Unit.

CU: Coding Unit.

CB: Coding Block.

DBF: De-blocking Filter.

DCT: Discrete Cosine Transform.

DST: Discrete Sine Transform.

DVFS: Dynamic Voltage and Frequency Scaling.

GBD-PSNR: Generalized Bjontegaard Delta – PSNR.

HEVC: High Efficiency Video Coding.

HEVC-DM: HEVC Direct Mode.

HEVC-LM: HEVC Linear Mode.

HM: HEVC Test Model.

HP: Hierarchical Prediction.

JCT: Joint Collaborative Team.

JCT-VC: Joint Collaborative Team on Video Coding.

JM: H.264 Test Model.

JPEG: Joint Photographic Experts Group.

LCU: Largest Coding Unit.

MV: Motion Vector.

MC: Motion Compensation.

ME: Motion Estimation.

MPEG: Moving Picture Experts Group.

NSQT: Non-Square Quadtree.

PC: Prediction Chunking.

PU: Prediction Unit.

PB: Prediction Block.

PSNR: Peak Signal-to-Noise Ratio.


QP: Quantization Parameter.

RDO: Rate Distortion Optimization.

RDTSC: Read Time Stamp Counter.

RQT: Residual Quadtree.

SAO: Sample Adaptive Offset.

TB: Transform Block.

TSC: Time Stamp Counter.

TU: Transform Unit.

VCEG: Video Coding Experts Group.


Abstract

This project presents a performance evaluation study of coding efficiency versus computational

complexity for the High Efficiency Video Coding (HEVC) [1] standard. A thorough

experimental investigation was carried out to identify the tools that most affect the encoding

efficiency and computational complexity of the HEVC encoder. A set of 16 different encoding

configurations was created to investigate the impact of each tool, varying the encoding parameter set

and comparing the results with a baseline encoder. This project shows that, even though the

computational complexity increases monotonically from the baseline to the most complex

configuration, the encoding efficiency saturates at some point [2]. Moreover, the results of this project

provide relevant information for implementation of complexity-constrained encoders by taking into

account the tradeoff between complexity and coding efficiency. It is shown that low-complexity

encoding configurations, defined by careful selection of coding tools, achieve coding efficiency

comparable to that of high-complexity configurations.


1. Objective of the Project

The objective of this project is to study the computational complexity and encoding performance of the HEVC encoder. Different encoding configurations will be tested on a wide variety of video contents and test sequences.

The relation/trade-off between HEVC complexity and coding efficiency will be studied. Different coding tools will be enabled and disabled in various combinations to check how the HEVC complexity and the coding efficiency are affected. From these results, the combination of configuration tools that maximizes the encoding efficiency while minimizing the HEVC complexity will be derived.

In this way, it is also determined which configuration tools contribute most to HEVC encoding efficiency and which configuration tools have the least effect on it.

The simulations will be conducted using the HM 16.9 software [4], with different video sequences [3] and configuration tools such as Search Range, Hadamard ME, Fast Merge Decision, Deblocking Filter, Sample Adaptive Offset, Adaptive Loop Filter, Transform Skipping, and other tools.


2. H.265 / High Efficiency Video Coding

2.1 Introduction

High Efficiency Video Coding (HEVC) is an international standard for video compression developed

by a working group of ISO/IEC MPEG (Moving Picture Experts Group) and ITU-T VCEG (Video

Coding Experts Group). The main goal of the HEVC standard is to significantly improve compression performance compared with existing standards (such as H.264/Advanced Video Coding [7]), targeting a bit-rate reduction of about 50% at similar visual quality [1].

HEVC is designed to address existing applications of H.264/MPEG-4 AVC and to focus on two key

issues: increased video resolution and increased use of parallel processing architectures [1]. It

primarily targets consumer applications as pixel formats are limited to 4:2:0 8-bit and 4:2:0 10-bit.

The next revision of the standard will enable new use cases with support for additional pixel formats such as 4:2:2 and 4:4:4, bit depths higher than 10 bits [8], embedded bit-stream scalability, and 3D video [9].

2.2 Encoder and Decoder in HEVC

Source video, consisting of a sequence of video frames, is encoded or compressed by a video

encoder to create a compressed video bit stream. The compressed bit stream is stored or transmitted.

A video decoder decompresses the bit stream to create a sequence of decoded frames [10].

The video encoder performs the following steps:

• Partitioning each picture into multiple units
• Predicting each unit using inter or intra prediction, and subtracting the prediction from the unit
• Transforming and quantizing the residual (the difference between the original picture unit and the prediction)
• Entropy encoding the transform output, prediction information, mode information, and headers

The video decoder performs the following steps:

• Entropy decoding and extracting the elements of the coded sequence
• Rescaling and inverting the transform stage
• Predicting each unit and adding the prediction to the output of the inverse transform
• Reconstructing a decoded video image
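To make these steps concrete, the following is a minimal, purely illustrative sketch (not HEVC-conformant) of a block-based hybrid codec round trip: each block is predicted from the co-located block of a reference frame, the residual is quantized (standing in for the transform-and-quantize stage), and the decoder mirrors the inverse steps; entropy coding is omitted. All names and the fixed 16×16 block size are hypothetical.

```python
import numpy as np

BLOCK = 16  # hypothetical fixed block size (HEVC actually uses a CTU quadtree)

def encode_frame(frame, ref, qp_step=8):
    """Toy hybrid encoder: predict each block from the reference frame,
    quantize the residual, and return the quantized residuals."""
    h, w = frame.shape
    quantized = np.zeros((h, w), dtype=np.int32)
    for y in range(0, h, BLOCK):
        for x in range(0, w, BLOCK):
            block = frame[y:y+BLOCK, x:x+BLOCK].astype(np.int32)
            pred = ref[y:y+BLOCK, x:x+BLOCK].astype(np.int32)     # "inter" prediction
            residual = block - pred                                # subtract the prediction
            quantized[y:y+BLOCK, x:x+BLOCK] = np.round(residual / qp_step).astype(np.int32)
    return quantized

def decode_frame(quantized, ref, qp_step=8):
    """Toy decoder: rescale the residual and add the same prediction back."""
    residual = quantized * qp_step                                 # inverse quantization (rescaling)
    recon = ref.astype(np.int32) + residual                        # add the prediction
    return np.clip(recon, 0, 255).astype(np.uint8)                 # reconstructed picture

# Usage: encode/decode one synthetic 64x64 frame against a flat reference frame.
rng = np.random.default_rng(0)
ref = np.full((64, 64), 128, dtype=np.uint8)
frame = np.clip(ref.astype(np.int32) + rng.integers(-20, 20, (64, 64)), 0, 255).astype(np.uint8)
recon = decode_frame(encode_frame(frame, ref), ref)
print("mean absolute reconstruction error:", np.abs(recon.astype(int) - frame.astype(int)).mean())
```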


Figure 1 shows the block diagram of the HEVC CODEC [10]:

Figure 1: Block Diagram of HEVC CODEC [10].

2.3 Features of HEVC

The video coding layer of HEVC employs the same hybrid approach (inter-/intra-picture prediction

and 2-D transform coding) used in all video compression standards. Figure 1 depicts the block diagram of a hybrid video encoder, which can create a bitstream conforming to the HEVC standard; the corresponding decoder processing loop is embedded within the encoder. An encoding algorithm producing an HEVC-compliant bitstream would typically proceed as follows. Each picture is split into block-shaped

regions, with the exact block partitioning being conveyed to the decoder. The first picture of a video

sequence (and the first picture at each clean random access point in a video sequence) is coded using

only intra-picture prediction (that uses prediction of data spatially from region-to-region within the

same picture, but has no dependence on other pictures). For all remaining pictures of a sequence or

between random access points, inter-picture temporally predictive coding modes are typically used

for most blocks.

The encoding process for inter-picture prediction consists of choosing motion data

comprising the selected reference picture and Motion Vector (MV) to be applied for predicting the

samples of each block. The encoder and decoder generate identical inter-picture prediction signals by

applying motion compensation (MC) using the MV and mode decision data, which are transmitted as

side information. The residual signal of the intra- or inter-picture prediction, which is the difference

between the original block and its prediction, is transformed by a linear spatial transform. The

transform coefficients are then scaled, quantized, entropy coded, and transmitted together with the

prediction information. The encoder duplicates the decoder processing loop (see the gray-shaded boxes in Figure 1) such that both will generate identical predictions for subsequent data. Therefore, the

quantized transform coefficients are constructed by inverse scaling and are then inverse transformed

to duplicate the decoded approximation of the residual signal. The residual is then added to the

prediction, and the result of that addition may then be fed into one or two loop filters to smooth out

artifacts induced by block-wise processing and quantization.

The final picture representation (that is a duplicate of the output of the decoder) is stored in a

decoded picture buffer to be used for the prediction of subsequent pictures. In general, the order of

encoding or decoding processing of pictures often differs from the order in which they arrive from

the source, necessitating a distinction between the decoding order (i.e., bitstream order) and the

output order (i.e., display order) for a decoder. Video material to be encoded by HEVC is generally

expected to be input as progressive scan imagery (either due to the source video originating in that

format or resulting from de-interlacing prior to encoding). No explicit coding features are present in

the HEVC design to support the use of interlaced scanning, as interlaced scanning is no longer used

for displays and is becoming substantially less common for distribution. However, a metadata syntax

has been provided in HEVC to allow an encoder to indicate that interlace-scanned video has been

sent by coding each field (i.e., the even or odd numbered lines of each video frame) of interlaced

video as a separate picture or that it has been sent by coding each interlaced frame as an HEVC

coded picture. This provides an efficient method of coding interlaced video without burdening

decoders with a need to support a special decoding process for it. The various features involved in hybrid video coding using HEVC are highlighted in the following subsections.

2.3.1 Coding tree units and coding tree block (CTB) structure: The core of the coding layer in

previous standards was the macroblock, containing a 16×16 block of luma samples and, in the usual

case of 4:2:0 color sampling, two corresponding 8×8 blocks of chroma samples; whereas the

analogous structure in HEVC is the coding tree unit (CTU), which has a size selected by the encoder

and can be larger than a traditional macroblock. The CTU consists of a luma CTB and the

corresponding chroma CTBs and syntax elements. The size L×L of a luma CTB can be chosen as L =

16, 32, or 64 samples, with the larger sizes typically enabling better compression. HEVC then

supports a partitioning of the CTBs into smaller blocks using a tree structure and quadtree-like signalling. The partitioning of CTBs into CBs ranging from 64×64 down to 8×8 is shown in Figure 2.


Figure 2: 64×64 CTBs split into CBs [13]
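As an illustration of this quadtree partitioning, the sketch below recursively splits a 64×64 CTB into CBs down to 8×8 using a purely hypothetical split criterion (sample variance above a threshold); the real HM encoder decides splits through rate-distortion optimization instead.

```python
import numpy as np

MIN_CB, MAX_CB = 8, 64  # luma CB sizes range from 8x8 up to the CTB size (here 64x64)

def split_ctb(samples, x=0, y=0, size=MAX_CB, var_thresh=100.0):
    """Return a list of (x, y, size) coding blocks for one CTB.
    Splitting uses a toy variance criterion instead of true RD optimization."""
    block = samples[y:y+size, x:x+size]
    if size > MIN_CB and block.var() > var_thresh:
        half = size // 2
        blocks = []
        for dy in (0, half):                 # quadtree split: four equal sub-blocks
            for dx in (0, half):
                blocks += split_ctb(samples, x + dx, y + dy, half, var_thresh)
        return blocks
    return [(x, y, size)]                    # leaf of the coding tree: one CB

# Usage: partition a synthetic CTB that is smooth on the left and noisy on the right;
# the smooth half stays in large CBs while the noisy half is split down to 8x8.
rng = np.random.default_rng(1)
ctb = np.full((64, 64), 100.0)
ctb[:, 32:] += rng.normal(0, 30, (64, 32))
for cb in split_ctb(ctb):
    print(cb)
```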

2.3.2 Coding units (CUs) and coding blocks (CBs): The quad tree syntax of the CTU specifies the

size and positions of its luma and chroma CBs. The root of the quadtree is associated with the CTU.

Hence, the size of the luma CTB is the largest supported size for a luma CB. The splitting of a CTU

into luma and chroma CBs is signaled jointly. One luma CB and ordinarily two chroma CBs, together with associated syntax, form a coding unit (CU) as shown in Figure 3. A CTB may contain only one

CU or may be split to form multiple CUs, and each CU has an associated partitioning into prediction

units (PUs) and a tree of transform units (TUs).

Figure 3: CUs split into CBs [13]


2.3.3 Prediction units and prediction blocks (PBs): The decision whether to code a picture area

using interpicture or intrapicture prediction is made at the CU level. A PU partitioning structure has

its root at the CU level. Depending on the basic prediction-type decision, the luma and chroma CBs can then be further split in size and predicted from luma and chroma prediction blocks (PBs) as shown in Figure 4. HEVC supports variable PB sizes from 64×64 down to 4×4 samples.

Figure 4: Partitioning of Prediction Blocks from Coding Blocks [13]

2.3.4 TUs and transform blocks: The prediction residual is coded using block transforms. A TU

tree structure has its root at the CU level. The luma CB residual may be identical to the luma

transform block (TB) or may be further split into smaller luma TBs as shown in Figure 5. The same applies to the chroma TBs. Integer basis functions similar to those of a discrete cosine transform

(DCT) are defined for the square TB sizes 4×4, 8×8, 16×16, and 32×32. For the 4×4 transform of

luma intrapicture prediction residuals, an integer transform derived from a form of discrete sine

transform (DST) is alternatively specified.

Figure 5: Partitioning of Transform Blocks from Coding Blocks [13]

2.3.5 Motion vector signalling: Advanced motion vector prediction (AMVP) is used, including

derivation of several most probable candidates based on data from adjacent PBs and the reference

picture. A merge mode for MV coding can also be used, allowing the inheritance of MVs from

temporally or spatially neighboring PBs. Moreover, compared to H.264/MPEG-4 AVC [7],

improved skipped and direct motion inference is also specified.

2.3.6 Motion compensation: Quarter-sample precision is used for the MVs and 7-tap or 8-tap filters

are used for interpolation of fractional-sample positions (compared to six-tap filtering of half-sample


positions followed by linear interpolation for quarter-sample positions in H.264/MPEG-4 AVC).

Similar to H.264/MPEG-4 AVC, multiple reference pictures are used as shown in Figure 7. For each

PB, either one or two motion vectors can be transmitted, resulting either in uni-predictive or bi-

predictive coding, respectively. As in H.264/MPEG-4 AVC, a scaling and offset operation may be

applied to the prediction signal(s) in a manner known as weighted prediction.

Figure 6: Quadtree structure used for MVs [13]

Figure 7: Concept of multi-frame motion-compensated prediction [13]
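The sketch below illustrates the kind of separable fractional-sample interpolation used for luma motion compensation. The 8-tap coefficients shown are the commonly cited HEVC half-sample luma filter taps, quoted here for illustration only and best verified against the specification text [6]; everything else in the snippet is a simplified stand-in for the normative process.

```python
import numpy as np

# Commonly cited HEVC 8-tap luma half-sample interpolation filter (taps sum to 64);
# quoted for illustration only -- verify against the HEVC specification [6].
HALF_PEL_TAPS = np.array([-1, 4, -11, 40, 40, -11, 4, -1])

def interpolate_half_pel_row(row):
    """Horizontally interpolate the half-sample positions for one row of luma samples."""
    padded = np.pad(row.astype(np.int32), (3, 4), mode='edge')  # 3 left / 4 right samples for 8 taps
    out = np.empty(len(row), dtype=np.int32)
    for i in range(len(row)):
        window = padded[i:i + 8]                   # the 8 integer samples around position i + 1/2
        out[i] = (window * HALF_PEL_TAPS).sum()    # apply the FIR filter
    return np.clip((out + 32) >> 6, 0, 255)        # normalize by 64 with rounding, clip to 8 bit

# Usage: half-pel samples of a simple ramp fall between their integer neighbours.
print(interpolate_half_pel_row(np.array([10, 20, 30, 40, 50, 60, 70, 80])))
```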

2.3.7 Intra-picture prediction: The decoded boundary samples of adjacent blocks are used as

reference data for spatial prediction in regions where inter-picture prediction is not performed. Intra

picture prediction supports 33 directional modes (compared to eight such modes in H.264/MPEG-4

AVC[7]), plus planar (surface fitting) and DC (flat) prediction modes. The selected intra-picture

prediction modes are encoded by deriving most probable modes (e.g., prediction directions) based on

those of previously decoded neighboring PBs.


Figure 8: Directional Prediction Modes in H.264 [13]

Figure 9: Modes and directional orientations for intra picture prediction in HEVC [13]
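To illustrate how intra-picture prediction forms a block from decoded boundary samples, the following simplified sketch implements only the DC, vertical, and horizontal modes; HEVC's planar mode and its 33 angular modes additionally interpolate along fractional directions and are omitted here for brevity.

```python
import numpy as np

def intra_predict(top, left, size, mode):
    """Simplified intra prediction from decoded boundary samples.
    top:  the 'size' reconstructed samples directly above the block
    left: the 'size' reconstructed samples directly to its left
    mode: 'dc', 'vertical', or 'horizontal' (angular and planar modes omitted)."""
    if mode == 'dc':          # flat prediction: average of all boundary samples
        dc = int(round((top.sum() + left.sum()) / (2 * size)))
        return np.full((size, size), dc, dtype=np.int32)
    if mode == 'vertical':    # copy the top row downwards
        return np.tile(top.astype(np.int32), (size, 1))
    if mode == 'horizontal':  # copy the left column across
        return np.tile(left.astype(np.int32).reshape(size, 1), (1, size))
    raise ValueError('mode not covered by this sketch')

# Usage: predict a 4x4 block from hypothetical boundary samples.
top = np.array([100, 102, 104, 106])
left = np.array([100, 98, 96, 94])
print(intra_predict(top, left, 4, 'dc'))
```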

2.3.8 Quantization control: As in H.264/MPEG-4 AVC, uniform reconstruction quantization

(URQ) is used in HEVC, with quantization scaling matrices supported for the various transform

block sizes.

2.3.9 Entropy coding: Context adaptive binary arithmetic coding (CABAC) is used for entropy

coding. This is similar to the CABAC scheme in H.264/MPEG-4 AVC, but has undergone several

improvements to improve its throughput speed (especially for parallel-processing architectures) and

its compression performance, and to reduce its context memory requirements.

2.3.10 In-loop de-blocking filtering: A de-blocking filter similar to the one used in H.264/MPEG-4

AVC [7] is operated within the inter-picture prediction loop. However, the design is simplified in

regard to its decision-making and filtering processes, and is made more friendly to parallel

processing.


2.3.11 Sample adaptive offset (SAO): A nonlinear amplitude mapping is introduced within the

inter-picture prediction loop after the de-blocking filter. Its goal is to better reconstruct the original

signal amplitudes by using a look-up table that is described by a few additional parameters that can

be determined by histogram analysis at the encoder side.


3. Recent Work on Complexity Analysis

Several works have been published in the last few years, addressing the problem of

complexity analysis of video encoders [18] and decoders [23]. Generally, these works evaluate the

relationship between compression efficiency and complexity, mainly focusing on computational

complexity (processing time or instruction-level profiling). Processing time has been largely used to

measure computational complexity of video coding systems [18], but other approaches such as

theoretical analysis [19], direct analysis of reference code [21], and power consumption evaluation

[18] have also been frequently applied to measure complexity.

3.1 Reference Code Analysis: Lehtoranta and Hamalainen [21] analyzed the reference code of an

MPEG-4 encoder, evaluating its processing modules by counting their equivalent number of RISC-like instructions. The overall complexity of the MPEG-4 encoder was then compared to that of

H.263 and H.264/AVC encoders. Chang-Guo et al. [32] also analyzed the C code implementation of

an MPEG decoder using the number of basic processor instructions to estimate the computational

complexity measured in MOPS. The code size and complex implementation structures of the

reference software in modern video encoders/decoders make this kind of direct analysis of computer

code extremely difficult and inaccurate, which encourages the use of other methods and tools.

3.2 Reported Runtime: Other works measure the computational complexity simply based on the

encoding and decoding report generated by the reference software of each standard, which generally

includes processing time information. As an example of this approach, Xiang et al. [33] report on a

method called generalized Bjøntegaard delta PSNR (GBD-PSNR) for evaluating coding efficiency

that takes into consideration the bit rate, the image distortion, and the processing time. The

computational complexity for each of the main H.264/AVC encoding modules is also theoretically

estimated and experimentally evaluated in [19], considering all coding modes possibilities. The

experimental results in [19] and [33] are based on the runtime reported by the reference software.

Although this type of information provides a rough indication about encoding complexity, to support

valid conclusions an accurate analysis is necessary using a controlled test environment and specific

tools.

3.3 Time Stamp Counter: The time stamp counter (TSC) is a register present in all x86 processors

since Pentium, which counts the number of clock cycles from the last reset to the current instant. The

instruction read time stamp counter (RDTSC) has been used in some works, such as [29] and [30], to

estimate the computational complexity of each module and also that of the overall encoder/decoder.

Until recently, the use of TSC has been accepted as an excellent way of getting high-resolution, low-

overhead processing time information. However, the use of TSC is not accurate in modern


processors that support out-of-order execution, since the RDTSC instruction may be executed earlier

or later than expected, resulting in a wrong clock cycle count.
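Given these TSC caveats, a simpler alternative for measuring encoding time is a monotonic high-resolution clock exposed by the operating system, which hides TSC-level details such as out-of-order execution and frequency scaling. A minimal, purely illustrative sketch:

```python
import time

def measure_seconds(fn, *args, **kwargs):
    """Time a single call using the OS-provided monotonic high-resolution clock."""
    start = time.perf_counter_ns()
    result = fn(*args, **kwargs)
    elapsed = (time.perf_counter_ns() - start) / 1e9
    return result, elapsed

# Usage: time a dummy workload (stand-in for an encoding task).
_, secs = measure_seconds(sum, range(10_000_000))
print(f"elapsed: {secs:.3f} s")
```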

3.4 Energy Consumption Analysis and Estimation: In [18], the average power dissipation was

measured using the PTscalar thermal and power microarchitecture simulator [34]. In [24], the

complexity of the AVS encoder is analyzed taking into consideration the processing time in terms of

clock cycles for the Pentium platforms. In this case, the energy consumption is estimated by the Intel

Application Energy Toolkit. In [28], actual power dissipation data are extracted through direct

measurements in a dynamic voltage and frequency scaling (DVFS) ARM processor employed for

energy efficient video decoding.

3.5 Software Profilers: Profiling tools have been used to automate the complexity analysis of video

codecs and generate more reliable results. In [22], the Atomium profiler was used to measure the

computational complexity of the main coding functions in the H.264/AVC video codec.

Comparisons between H.264/AVC and MPEG-4 Part 2 are also presented. Saponara et al. [23]

analyzed 18 possible configurations of the H.264/AVC encoder, comparing all of them in terms of

complexity versus PSNR in order to identify the best ones. The Atomium profiler was also used to

obtain the complexity data by executing the instrumented code with representative test stimuli.

The software profilers provide an accurate and easy-to-use method for measuring complexity

in video encoders. Each individual module can be evaluated separately, and batch processing can be

used to run several tests in a row, using different predefined configurations.
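As a purely illustrative analogue in Python (for example, for scripts that post-process encoder logs), the built-in cProfile module reports per-function call counts and cumulative times, similar in spirit to what Atomium or OProfile [35] provide for the C/C++ reference software:

```python
import cProfile
import pstats

def toy_module_a(n):
    return sum(i * i for i in range(n))

def toy_module_b(n):
    return [toy_module_a(n) for _ in range(10)]

# Profile the call tree and print the functions with the largest cumulative time,
# analogous to attributing encoder runtime to individual coding modules.
profiler = cProfile.Profile()
profiler.runcall(toy_module_b, 100_000)
pstats.Stats(profiler).sort_stats('cumulative').print_stats(5)
```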


4. HEVC Encoders

The reference software used in the development phase, called the HEVC test model (HM),

provides a common framework for research and development [4]. Six mandatory configurations for

normalized experiments have been proposed by JCT-VC [37]. These are combinations of two

possible complexity or efficiency settings, defined as high efficiency 10 (HE10) and Main, and three basic coding or access settings, named Intra Only, Random Access, and Low Delay. The

HE10 setting is used for high coding efficiency (and high complexity), whereas Main defines a

coding configuration with low computational complexity, which does not provide the best coding

efficiency. The Intra Only mode uses no temporal prediction, Random Access allows the use of

future frames as reference in temporal prediction, and Low Delay uses only previous frames as

reference.

One of the most important features introduced by HEVC is the use of quadtree-based frame

partitioning structures. In HEVC, each video slice is divided into a number of square blocks of equal

size called treeblocks [or largest coding units (LCUs)], which are used as the roots of each coding

quadtree (or coding tree). Each leaf of the coding tree is called a coding unit (CU) and its dimensions

can vary from 8 × 8 to 64 × 64 pixels (or up to the treeblock size), depending on the tree depth. For

each CU, intra- or inter-frame prediction can be independently applied. Each CU can be divided into

two or four prediction units (PU), which are predicted separately. When transform coding of the

prediction residue is used, each CU is assumed to be the root of another quadtree-based structure

called residual quadtree (RQT). Each leaf of the RQT is known as a transform unit (TU), which can

take on sizes from 4×4 to 32×32 pixels (or up to the CU size), including non-square formats (NSQT) such as 4×16, 16×4, 8×32, and 32×8 for inter-predicted areas.

HE10 encoder configurations allow the use of asymmetric PUs in a CU (e.g., 32 × 8 PU and

32 × 24 PU into a single 32 × 32 CU), known as asymmetric motion partitions (AMP), and

nonsquare TUs (4×16, 16×4, 8×32 and 32×8), known as nonsquare transforms (NSQT), for inter-

predicted areas. On the other hand, the Main encoder configurations limit PUs to symmetric formats

and TUs to square sizes.

The encoding mode decision algorithm selects the best coding tree configuration, the best PU

division, and the best RQT structure based on exhaustive iterations of the rate-distortion optimization

(RDO) process, which considers every encoding possibility and compares all of them in terms of bit

rate and image quality. Therefore, both the data structures and the encoding process are responsible

for an enormous share of the overall computational complexity. A temporal analysis of

computational complexity incurred by the use of such encoding structures in HEVC and a

complexity control technique were recently presented in [38].
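The exhaustive mode decision described above can be viewed as minimizing a Lagrangian cost J = D + λ·R over all candidate partitionings and modes. A minimal illustration of that selection rule, with made-up distortion and rate values (independent of the HM implementation), is:

```python
def best_mode(candidates, lagrangian_lambda):
    """Pick the candidate minimizing J = D + lambda * R.
    candidates: list of (name, distortion, rate_in_bits) tuples."""
    return min(candidates, key=lambda c: c[1] + lagrangian_lambda * c[2])

# Hypothetical candidates for one CU: (mode, SSE distortion, bits to signal it).
candidates = [
    ("no_split_2Nx2N", 5200.0, 180),
    ("split_4x_NxN",   3100.0, 420),
    ("merge_skip",     6000.0,  40),
]
print(best_mode(candidates, lagrangian_lambda=10.0))   # -> ("merge_skip", 6000.0, 40)
```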


Intra-frame prediction in HEVC is an extension of intraframe prediction in H.264/AVC,

supporting larger block sizes (from 4 × 4 to 64 × 64) and allowing more prediction directions: 16

different luma angular modes for 4 × 4 PUs, 2 luma angular modes for 64 × 64 PUs, and 33 luma

angular modes for the other PU sizes. Besides these angular modes, a DC prediction mode similar to

DC in H.264/AVC and a planar prediction mode designed to reconstruct smooth image segments

through bilinear interpolation from the PU borders may also be applied to any PU. The number of

chroma intra prediction modes was also increased to six in HEVC: direct mode (DM), linear mode

(LM), vertical (mode 0), horizontal (mode 1), DC (mode 2), and planar (mode 3). DM is used when

the texture directionality of luma and chroma is similar; LM is used when the sample intensities of

luma and chroma are highly correlated. In the first case, the same mode used for luma prediction is

used for chroma, whereas in the second case, the chroma components are predicted from

downsampled reconstructed luma values of the same PU.

Similarly to H.264/AVC, HEVC inter-frame prediction relies on block-based motion

compensation (MC). Motion estimation (ME) is based on advanced motion vector prediction, which

includes a motion vector competition scheme [39] along with a motion merge [40] technique. The

former defines a set of motion vector predictor candidates comprising motion vectors of spatially and

temporally neighboring PUs. The best motion vector predictor is selected from the set of candidates

through RD optimization. Motion merging is based on a similar principle, but in this case the motion

parameters are not explicitly transmitted. They are instead derived at the decoder based on the

prediction mode and merge mode parameters.

Transform and quantization are heavily based on H.264/AVC, even though larger block sizes

(ranging from 4×4 to 32×32) are supported. For inter-predicted PUs, a discrete cosine transform

(DCT) is applied to the prediction residue, whereas for intra-predicted PUs, the encoder can choose

between applying the DCT [20] or a mode-dependent discrete sine transform (DST) [20].

Regarding entropy coding, HEVC supports only context-adaptive binary

arithmetic coding (CABAC), which was already used in H.264/AVC. Even though the CABAC

structure is basically the same as it was in H.264/AVC, HEVC aims at decreasing the computational

burden associated with binarization and context retrieving by reducing the number of syntax

elements to binarize and the number of contexts, and by increasing throughput by enabling parallel

processing of transform coefficients [41].

The HEVC deblocking filter (DF) is also similar to that used in H.264/AVC. However,

instead of performing deblocking at the boundaries of 4 × 4 blocks, the DF is applied to boundaries

belonging to CUs, PUs, and TUs larger than 4×4 pixels. Vertical and horizontal edges are filtered

according to a border filtering strength, which varies from 0 (no filtering) to 4 (maximum filtering


strength) and also depends on the border characteristics. In addition to the DF, HEVC employs two

new filters: the sample adaptive offset (SAO) and the adaptive loop filter (ALF). SAO processes the

entire picture as a hierarchical quadtree, classifying the reconstructed pixels into different categories

and then reducing distortion by adding an offset to the pixels in each category. This classification is

performed taking into consideration the pixel intensity and edge properties. ALF is used after the DF

and the SAO to reduce the distortion between the original and the reconstructed frame caused by

lossy compression. The filtering is performed by applying a 2-D Wiener filter [9] at the CU level

depending on the difference between values of samples in the original and reconstructed frames.


5. Experimental Setup

As highlighted in the previous section, the HEVC encoder includes a large number of coding

tools, each one having a different contribution to the overall coding efficiency and complexity. The

operation of these tools and their functional modes are determined by encoding configuration

parameters, which may be set to several different values. The experimental study presented in this project was carried out in two steps. First, the coding tools that have the strongest impact on both coding efficiency and computational complexity were identified; to that end, the individual contribution of each encoding tool to these performance metrics was experimentally evaluated. Second, the tools identified in the first step, sorted according to encoding gain per complexity increase, were selected for further analysis, which consisted of evaluating the impact of each tool in a cumulative enabling order. To conduct the experiments, a subset of 5 video sequences was selected

from the 24 sequences specified in the Common Test Conditions document from JCT-VC [37].

These sequences differ broadly from one another in terms of frame rate, bit depth, motion, and

texture characteristics as well as spatial resolution. Table 1 presents the name, frame count, frame

rate, and spatial resolution of each sequence.

The reference software used in all tests was the HEVC test model, version 16.9 (HM 16.9) [4], in release mode, compiled with the Microsoft Visual Studio C++ compiler. All tests were performed on a personal computer with an Intel(R) Core(TM) i3-2330M CPU @ 2.20 GHz running the Windows(TM) 10 Home 64-bit operating system.

Test Sequence       Resolution   Frame Rate (fps)   Total No. of Frames   Pace
BQ Mall             832x480      60                 601                   Medium
Kristen and Sara    1280x720     60                 600                   Slow
Park Scene          1920x1080    24                 240                   Medium
People On Street    2560x1600    30                 150                   Slow
Race Horses         416x240      30                 300                   Medium

Table 1:- Test sequences used to implement the analysis.


Figure 10:- BQMall_832x480_60.yuv

Figure 11:- KristenAndSara_1280x720_60.yuv (Zoom 1:2)


Figure 12:- ParkScene_1920x1080_24.yuv (Zoom 1:4)

Figure 13:- PeopleOnStreet_2560x1600_30_crop.yuv (Zoom 1:4)


Figure 14:- RaceHorses_416x240_30.yuv


6. Comparison Metrics

6.1 Peak Signal-to-Noise Ratio

Peak signal-to-noise ratio (PSNR) [35] [36] is an expression for the ratio between the

maximum possible value (power) of a signal and the power of distorting noise that affects the quality

of its representation. Because many signals have a very wide dynamic range (ratio between the

largest and smallest possible values of a changeable quantity), the PSNR is usually expressed in

terms of the logarithmic decibel scale.

PSNR is most commonly used to measure the quality of reconstruction of lossy compression

codecs. The signal in this case is the original data, and the noise is the error introduced by

compression. When comparing compression codecs, PSNR is an approximation to human perception

of reconstruction quality. Although a higher PSNR generally indicates that the reconstruction is of

higher quality, in some cases it may not. One has to be extremely careful with the range of validity of

this metric; it is only conclusively valid when it is used to compare results from the same codec (or

codec type) and same content.

PSNR is defined via the mean squared error (MSE). Given a noise-free m × n monochrome image I and its noisy approximation K, the MSE is defined as:

MSE = \frac{1}{m\,n}\sum_{i=0}^{m-1}\sum_{j=0}^{n-1}\left[I(i,j)-K(i,j)\right]^{2}

The PSNR (in dB) is then defined as:

PSNR = 10\log_{10}\!\left(\frac{MAX_I^{2}}{MSE}\right)

Here, MAX_I is the maximum possible pixel value of the image (255 for 8-bit samples).

For test sequences in 4:2:0 color format, PSNR is computed as a weighted average of the luminance (Y) and chrominance (U, V) components [14], commonly with a 6:1:1 weighting:

PSNR_{YUV} = \frac{6\,PSNR_Y + PSNR_U + PSNR_V}{8}
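As a small illustration, the PSNR and the weighted YUV combination above can be computed for raw frames as follows (a hypothetical helper, not part of HM, which already reports PSNR in its encoder log):

```python
import numpy as np

def psnr(original, reconstructed, max_val=255.0):
    """PSNR in dB between two equally sized frames (one colour plane)."""
    mse = np.mean((original.astype(np.float64) - reconstructed.astype(np.float64)) ** 2)
    if mse == 0:
        return float('inf')                          # identical frames
    return 10.0 * np.log10((max_val ** 2) / mse)

def psnr_yuv(psnr_y, psnr_u, psnr_v):
    """Combined PSNR for 4:2:0 content using the common 6:1:1 weighting [14]."""
    return (6.0 * psnr_y + psnr_u + psnr_v) / 8.0

# Usage with synthetic 8-bit planes.
rng = np.random.default_rng(2)
orig = rng.integers(0, 256, (480, 832), dtype=np.uint8)
recon = np.clip(orig.astype(np.int32) + rng.integers(-3, 4, orig.shape), 0, 255).astype(np.uint8)
print(f"Y-PSNR: {psnr(orig, recon):.2f} dB")
```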


6.2 Bjontegaard Delta Bit-rate (BD-BR) and Bjontegaard Delta PSNR (BD-PSNR)

To objectively evaluate the coding efficiency of video codecs, Bjontegaard Delta PSNR (BD-

PSNR) was proposed. Based on the rate-distortion (R-D) curve fitting, BD-PSNR provides a good

evaluation of the R-D performance [37] [38].

BD metrics allow computing the average gain in PSNR or the average percentage saving in bit rate between two rate-distortion curves. However, BD-PSNR has a critical drawback: it does not take coding complexity into account [37].
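The BD metrics are conventionally computed by fitting a third-order polynomial through the (log bit-rate, PSNR) points of each encoder and integrating the difference over the overlapping interval. The sketch below follows that usual Bjøntegaard procedure (it is not tied to any particular reference implementation), and the four rate-distortion points per encoder are hypothetical:

```python
import numpy as np

def bd_psnr(rates_a, psnrs_a, rates_b, psnrs_b):
    """Average PSNR gain (dB) of encoder B over encoder A at equal bit rate."""
    la, lb = np.log10(rates_a), np.log10(rates_b)
    pa, pb = np.polyfit(la, psnrs_a, 3), np.polyfit(lb, psnrs_b, 3)   # PSNR as cubic in log-rate
    lo, hi = max(la.min(), lb.min()), min(la.max(), lb.max())         # overlapping rate interval
    ia = np.polyval(np.polyint(pa), hi) - np.polyval(np.polyint(pa), lo)
    ib = np.polyval(np.polyint(pb), hi) - np.polyval(np.polyint(pb), lo)
    return (ib - ia) / (hi - lo)                                      # average vertical gap

def bd_rate(rates_a, psnrs_a, rates_b, psnrs_b):
    """Average bit-rate difference (%) of encoder B versus A at equal PSNR (negative = saving)."""
    la, lb = np.log10(rates_a), np.log10(rates_b)
    pa, pb = np.polyfit(psnrs_a, la, 3), np.polyfit(psnrs_b, lb, 3)   # log-rate as cubic in PSNR
    lo, hi = max(psnrs_a.min(), psnrs_b.min()), min(psnrs_a.max(), psnrs_b.max())
    ia = np.polyval(np.polyint(pa), hi) - np.polyval(np.polyint(pa), lo)
    ib = np.polyval(np.polyint(pb), hi) - np.polyval(np.polyint(pb), lo)
    return (10 ** ((ib - ia) / (hi - lo)) - 1) * 100

# Usage with hypothetical rate-distortion points (kbps, dB) for two encoder configurations.
rates_a = np.array([800.0, 1600.0, 3200.0, 6400.0]); psnrs_a = np.array([32.0, 34.6, 37.0, 39.2])
rates_b = np.array([700.0, 1400.0, 2800.0, 5600.0]); psnrs_b = np.array([32.2, 34.8, 37.2, 39.4])
print(f"BD-PSNR: {bd_psnr(rates_a, psnrs_a, rates_b, psnrs_b):+.3f} dB")
print(f"BD-rate: {bd_rate(rates_a, psnrs_a, rates_b, psnrs_b):+.2f} %")
```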

6.3 Implementation Complexity

The computational time for the various configuration profiles in the HM-16.9 software will be compared, and this serves as an indication of implementation complexity.


7. Implementation

7.1 Configuration profiles used for comparison: Test sequences in YUV format are encoded using the random access profile of HM-16.9 and are compared using different configuration parameters.

7.2 Parameters modified:

F - Number of frames to be encoded
Fr - Frame rate
Wdt - Width of the video sequence
Hgt - Height of the video sequence
Profile - encoder_randomaccess_main

Command line parameters for using HM-16.9 encoder:

TAppEncoder [-h] [-c config.cfg] [--parameter=value]

Options:
-h                  Prints parameter usage
-c                  Defines configuration file to use. Multiple configuration files may be used.
--parameter=value   Assigns value to a given parameter.

7.3 Sample command line parameters for HM-16.9:

C:\Users\mann\Desktop\Spr_2016\Multimedia_Signal_Processing\HM_16.9\bin\vc2015\x64\Release>TAppEncoder.exe -c C:\Users\mann\Desktop\Spr_2016\Multimedia_Signal_Processing\HM_16.9\cfg\encoder_randomaccess_main.cfg -i C:\Users\mann\Desktop\Spr_2016\Multimedia_Signal_Processing\HM_16.9\cfg\BQMall_832x480_60.yuv -hgt 480 -wdt 832 -f 300 -fr 60
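Runs like the one above can be scripted for the cumulative-configuration experiments. The sketch below uses hypothetical paths and configuration-file names, and it assumes the HM encoder log ends with a "Total Time: ... sec." summary line (the exact format may differ between HM versions); it launches TAppEncoder once per configuration and records the wall-clock encoding time used as the complexity measure.

```python
import re
import subprocess
import time
from pathlib import Path

# Hypothetical locations -- adjust to the local HM 16.9 build and configuration set.
ENCODER = Path(r"C:\HM_16.9\bin\vc2015\x64\Release\TAppEncoder.exe")
SEQUENCE = Path(r"C:\sequences\BQMall_832x480_60.yuv")
CONFIGS = [Path(r"C:\HM_16.9\cfg\test_cfg_%02d.cfg" % i) for i in range(1, 16)]

def run_one(cfg):
    """Run one encoding configuration; return (wall-clock seconds, reported seconds or None)."""
    cmd = [str(ENCODER), "-c", str(cfg), "-i", str(SEQUENCE),
           "-wdt", "832", "-hgt", "480", "-f", "300", "-fr", "60"]
    start = time.perf_counter()
    proc = subprocess.run(cmd, capture_output=True, text=True)
    wall = time.perf_counter() - start
    # Assumed log line such as "Total Time: 1234.567 sec." at the end of the encoder output.
    match = re.search(r"Total Time:\s*([\d.]+)\s*sec", proc.stdout)
    return wall, float(match.group(1)) if match else None

for cfg in CONFIGS:
    wall, reported = run_one(cfg)
    print(f"{cfg.name}: wall={wall:.1f} s, reported={reported}")
```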

7.4 Test Configurations used:

The test configurations and outputs are submitted in a separate .xls file because they cannot be conveniently portrayed in the .doc format.

8. Implementation Results

8.1 People On The Street


Figure 15:- YUV-PSNR (in dB) for People On Street for different test configurations; values lie between approximately 34.9 and 35.6 dB.

Figure 16:- Bit rate (in kbps) for People On Street for different test configurations; values lie between approximately 8050 and 8450 kbps.


Figure 17:- Encoding complexity (time taken, in seconds) for People On Street for different test configurations (chart scale: 0 to 40,000 s).

8.2 Race Horses

Figure 18:- YUV-PSNR (in dB) for Race Horses for different test configurations; values lie between approximately 32.5 and 32.95 dB.


Figure 19:- Bit rate (in kbps) for Race Horses for different test configurations; values lie between approximately 300 and 318 kbps.

Figure 20:- Encoding complexity (time taken, in seconds) for Race Horses for different test configurations (chart scale: 0 to 1,400 s).


9. References

[1] G.J. Sullivan et al, “Overview of the High Efficiency Video Coding (HEVC) Standard”, IEEE

Transactions on Circuits and Systems for Video Technology(CSVT), Vol. 22, No. 12, pp. 1649-

1668, Dec. 2012.

[2] G. Correa et al, “Performance and computational complexity assessment of high efficiency video

encoders”, IEEE Trans. CSVT, vol. 22, pp.1899-1909, Dec. 2012.

[3] Video test sequences - https://media.xiph.org/video/derf/.

[4] HM 16.9 software - https://hevc.hhi.fraunhofer.de/trac/hevc/milestone/HM-16.9.

[5] I.E.G. Richardson, “Video Codec Design: Developing Image and Video Compression Systems”,

Wiley, 2002.

[6] B. Bross et al, “High Efficiency Video Coding (HEVC) Text Specification Draft 10”, Document

JCTVC-L1003, ITU-T/ISO/IEC Joint Collaborative Team on Video Coding (JCT-VC),Mar.2013.

[7] D. Marpe et al, “The H.264/MPEG4 advanced video coding standard and its applications”, IEEE

Communications Magazine, Vol. 44, pp. 134-143, Aug. 2006.

[8] HEVC white paper-Ateme: http://www.ateme.com/an-introduction-to-uhdtv-and-hevc

[9] G.J. Sullivan et al, “Standardized Extensions of High Efficiency Video Coding (HEVC)”, IEEE

Journal of selected topics in Signal Processing, Vol. 7, No. 6, pp. 1001-1016, Dec. 2013.

[10] HEVC tutorial by I.E.G. Richardson: http://www.vcodex.com/h265.html

[11] C. Fogg, “Suggested figures for the HEVC specification”, ITU-T / ISO-IEC Document: JCTVC

J0292r1, July 2012.

[12] T. Wiegand et al, “Overview of the H.264/AVC Video Coding Standard”, IEEE Transactions on

Circuits and Systems for Video Technology, Vol. 13, No. 7, pp. 560-576, Jul. 2003.

[13] U.S.M. Dayananda, “Study and Performance comparison of HEVC and H.264 video codecs”

M.S. Thesis, EE Dept., UTA, Arlington, TX, Dec. 2011 available on

http://www.uta.edu/faculty/krrao/dip/Courses/EE5359/index_tem.html

[14] M. Wien, “High Efficiency Video Coding: Coding Tools and Specification”, Springer, 2014.


[15] T. Vermeir, “Use cases and requirements for lossless and screen content coding”, JCTVC-

M0172, 13th JCT-VC meeting, Incheon, KR, Apr. 2013.

[16] J. Sole et al, “Requirements for wireless display applications”, JCTVC-M0315, 13th JCT-VC

meeting, Incheon, KR, Apr. 2013.

[17] A. Gabriellini et al, “Combined Intra-Prediction for High-Efficiency Video Coding”, IEEE

Journal of selected topics in Signal Processing. Vol. 5, no. 7, pp. 1282-1289, Nov. 2011.

[18] L. Younghoon, K. Jungsoo, and K. Chong-Min, “Energy-aware video encoding for image

quality improvement in battery-operated surveillance camera,” IEEE Trans. Very Large Scale

Integration (VLSI) Syst., vol. 20, no. 2, pp. 310–318, Feb. 2012.

[19] W. Kim, J. You, and J. Jeong, “Complexity control strategy for real-time H.264/AVC encoder,”

IEEE Trans. Consumer Electron., vol. 56, no. 2, pp. 1137–1143, May 2010.

[20] D. N. Kwon, P. F. Driessen, A. Basso, and P. Agathoklis, “Performance and computational

complexity optimization in configurable hybrid video coding system,” IEEE Trans. Circuits Syst.

Video Technol., vol. 16, no. 1, pp. 31–42, Jan. 2006.

[21] O. Lehtoranta and T. D. Hamalainen, “Complexity analysis of spatially scalable MPEG-4

encoder,” in Proc. Int. Symp. System-on-Chip, 2003, pp. 57–60.

[22] J. Ostermann, et al., “Video coding with H.264/AVC: Tools, performance, and complexity,”

IEEE Circuits Syst. Mag., vol. 4, no. 1, pp. 7–28, Jan.–Mar. 2004.

[23] S. Saponara, K. Denolf, G. Lafruit, C. Blanch, and J. Bormans, “Performance and complexity

co-evaluation of the advanced video coding standard for cost-effective multimedia communications,”

EURASIP J. Appl. Signal Process., vol. 2004, pp. 220–235, Jan. 2004.

[24] P. Li, Y. Chen, and W. Ji, “Rate-distortion-complexity analysis on AVS encoder,” presented at

the 11th Pacific Rim Conf. Multimedia, Shanghai, China, 2011.

[25] A. Hallapuro, V. Lappalainen, and T. D. Hamalainen, “Performance analysis of low bit rate

H.26L video encoder,” presented at the Proc. IEEE Int. Conf. Acoustics, Speech, and Signal

Processing, vol. 2, pp. 1129-1132, May 2001.


[26] M. Horowitz, A. Joch, F. Kossentini, and A. Hallapuro, “H.264/AVC baseline profile decoder

complexity analysis,” IEEE Trans. Circuits Syst. Video Technol., vol. 13, no. 7, pp. 704–716, Jul.

2003.

[27] L. Szu-Wei and C. C. J. Kuo, “Complexity modeling of spatial and temporal compensations in

H.264/AVC decoding,” IEEE Trans. Circuits Syst. Video Technol., vol. 20, no. 5, pp. 706–720, May

2010.

[28] M. Zhan, H. Hao, and W. Yao, “On complexity modeling of H.264/AVC video decoding and its

application for energy efficient decoding,” IEEE Trans. Multimedia, vol. 13, no. 6, pp. 1240–1255,

Dec. 2011.

[29] V. Lappalainen, A. Hallapuro, and T. D. Hamalainen, “Complexity of optimized H.26L video

decoder implementation,” IEEE Trans. Circuits Syst. Video Technol., vol. 13, no. 7, pp. 717–725,

Jul. 2003.

[30] On Software Complexity, JCTVC-G757, ISO/IEC-JCT1/SC29/WG11, Geneva, Switzerland,

2011.

[31] HM Decoder Complexity Assessment on ARM, JCTVC-G262, ISO/IECJCT1/SC29/WG11,

Geneva, Switzerland, 2011.

[32] Z. Chang-Guo et al., “MPEG video decoding with the UltraSPARC visual instruction set,” in

Proc. Compcon ’95: Technol. Inform. Superhighway Dig. Papers, 1995, pp. 470–477.

[33] L. Xiang, M. Wien, and J. R. Ohm, “Rate-complexity-distortion optimization for hybrid video

coding,” IEEE Trans. Circuits Syst. Video Technol., vol. 21, no. 7, pp. 957–970, Jul. 2011.

[34] PTscalar V1.0 [Online]. Available: http://eda.ee.ucla.edu/PTscalar/

[35] OProfile: A System Profiler for Linux [Online]. Available: http://oprofile.sourceforge.net/news/

[36] PAPI Performance Application Programming Interface [Online]. Available:

http://icl.cs.utk.edu/papi/

[37] Common Test Conditions and Software Reference Configurations, JCTVC-I1100, ISO/IEC-

JCT1/SC29/WG11, Geneva, Switzerland, 2012.

[38] G. Correa, P. Assuncao, L. Agostini, and L. A. da Silva Cruz, “Complexity control of high

efficiency video encoders for power-constrained devices,” IEEE Trans. Consumer Electron., vol. 57,

no. 4, pp. 1866–1874, Nov. 2011.

[39] G. Laroche, J. Jung, and B. Pesquet-Popescu, “RD optimized coding for motion vector predictor

selection,” IEEE Trans. Circuits Syst. Video Technol., vol. 18, no. 9, pp. 1247–1257, Sep. 2008.

[40] S. Oudin et al., “Block merging for quadtree-based video coding,” in Proc. IEEE Int. Conf.

Multimedia Expo, pp. 1–6, July 2011.

[41] V. Sze and M. Budagavi, “Parallelization of CABAC transform coefficient coding for HEVC,”

in Proc. PCS, pp. 509–512, 2012.
