37
CES – Chair for Embedded Systems ces.itec.kit.edu Low Power Design of the Next-Generation High Efficiency Video Coding Authors: Muhammad Shafique, Jörg Henkel

Low Power Design of the Next-Generation High Efficiency Video Coding · 2014-03-12 · CES – Chair for Embedded Systems ces.itec.kit.edu Low Power Design of the Next-Generation

  • Upload
    others

  • View
    1

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Low Power Design of the Next-Generation High Efficiency Video Coding · 2014-03-12 · CES – Chair for Embedded Systems ces.itec.kit.edu Low Power Design of the Next-Generation

CES – Chair for Embedded Systems

ces.itec.kit.edu

Low Power Design of the Next-Generation

High Efficiency Video Coding

Authors: Muhammad Shafique, Jörg Henkel

Page 2: Low Power Design of the Next-Generation High Efficiency Video Coding · 2014-03-12 · CES – Chair for Embedded Systems ces.itec.kit.edu Low Power Design of the Next-Generation

ces.itec.kit.edu 2 Shafique @ ASPDAC, Jan. 2014

Outline

Introduction to the High Efficiency Video Coding (HEVC)

HEVC Analysis

complexity, memory access, thermal

Power-Efficient HEVC System Design

Conclusion

Page 3: Low Power Design of the Next-Generation High Efficiency Video Coding · 2014-03-12 · CES – Chair for Embedded Systems ces.itec.kit.edu Low Power Design of the Next-Generation

ces.itec.kit.edu 3 Shafique @ ASPDAC, Jan. 2014

High Efficiency Video Coding (HEVC)

Ultra-HD (or supervision)

7680×4320 ≈ 33 million pixels per frame

By 2017: 80% ─ 90% global internet traffic

New video compression standards/techniques required

JCT-VC’s High Efficiency Video Coding (HEVC)

~2× compression efficiency compared to H.264

0

0.4

0.8

1.2

1.6

1 2 3

Time

Bitrate

(a)

Basketball Kimono PeopleOnStreet

(b)

No

rmal

ized

0

2E+11

4E+11

6E+11

8E+11

1E+12

1.2E+12

1.4E+12

1 2 3

HEVC

H.264/AVC

Mem

ory

BW

. [G

B/s

]

0.5

1.5

1

3

2

2.5

0

HD720 HD1080 2K

Full HD @ 30fps

1 second ≈ 712 Mbits

1 hour ≈ 2.4 Tbits

Page 4: Low Power Design of the Next-Generation High Efficiency Video Coding · 2014-03-12 · CES – Chair for Embedded Systems ces.itec.kit.edu Low Power Design of the Next-Generation

ces.itec.kit.edu 4 Shafique @ ASPDAC, Jan. 2014

Challenges for Developing

HEVC-based Multimedia Systems

Compute

Complexity

Power

Efficiency

Thermal

Management

Content-Awareness,

HW-SW Collaboration,

Many-core Systems

Parallelization

Workload Balancing,

Arch.-Awareness,

Power Budgeting

Accelerator Design,

Content-Awareness,

Power- Gating

Thermal Analysis,

Configurations,

Content-Adaptive

Challenges & Requirements

Video Memory

Memory Hierarchy

Design,

Content-Aware

Page 5: Low Power Design of the Next-Generation High Efficiency Video Coding · 2014-03-12 · CES – Chair for Embedded Systems ces.itec.kit.edu Low Power Design of the Next-Generation

ces.itec.kit.edu 5 Shafique @ ASPDAC, Jan. 2014

HEVC Overview: Encoding Flow

Intra Prediction

Inter Prediction

Recursive CU/PU

Size Reduction

‾ Transform and

Quatization

Inverse Transform and Quantization

Recursive TU

Size Reduction

+

Deblocking and SAO Filter

Decoded Picture Buffer

Output Reconstructed

Video

Bitstream Headers

CABAC Entropy Coder

Output Bitstream

Input Video in CTUs

Page 6: Low Power Design of the Next-Generation High Efficiency Video Coding · 2014-03-12 · CES – Chair for Embedded Systems ces.itec.kit.edu Low Power Design of the Next-Generation

ces.itec.kit.edu 6 Shafique @ ASPDAC, Jan. 2014

HEVC Overview: Slices and Tiles

Slice 0

Slice 1

Slice 2

Slice 3

Tile 0 Tile 1 Tile 2

Tile 3 Tile 4 Tile 5

Core0

f0

F0 FM-1

T0 T1 TK-1

Core1

f1

CoreK-1

fK-1

TK-1

T0 T1

GOP0 F0GOP0

HEVC Parallel Encoding

Page 7: Low Power Design of the Next-Generation High Efficiency Video Coding · 2014-03-12 · CES – Chair for Embedded Systems ces.itec.kit.edu Low Power Design of the Next-Generation

ces.itec.kit.edu 7 Shafique @ ASPDAC, Jan. 2014

HEVC Overview: Tree-Block Structure

0 1

2 3

4 5

6 7

8 9

10 11

12 13

1415 16

17 18

19 20

21

Example PU

Configuration

0 1

2 3

Tested TU

Configurations

...

Example CU

Configuration

CTU0 CTU1 CTU2

8×8

32×32

16×16

4×4

64×64

CTU

Page 8: Low Power Design of the Next-Generation High Efficiency Video Coding · 2014-03-12 · CES – Chair for Embedded Systems ces.itec.kit.edu Low Power Design of the Next-Generation

ces.itec.kit.edu 8 Shafique @ ASPDAC, Jan. 2014

CTU Distribution

Page 9: Low Power Design of the Next-Generation High Efficiency Video Coding · 2014-03-12 · CES – Chair for Embedded Systems ces.itec.kit.edu Low Power Design of the Next-Generation

ces.itec.kit.edu 9 Shafique @ ASPDAC, Jan. 2014

HEVC Overview: Intra and Inter Prediction

Vertical Angular Predictors

Ho

rizo

nta

l A

ng

ula

r P

red

icto

rs

0: Planar

1: DC

2log 2 2

02

M i

iiN

─ HEVC-Intra: ~2.56× more mode decisions than H.264

─ HEVC-Inter: ~2.2× more complex than H.264

2log 3 2

013 2

M i

i

64×64

32×32

32×32

16×16

16×16

HEVC Intra Prediction HEVC Inter Prediction

Page 10: Low Power Design of the Next-Generation High Efficiency Video Coding · 2014-03-12 · CES – Chair for Embedded Systems ces.itec.kit.edu Low Power Design of the Next-Generation

ces.itec.kit.edu 10 Shafique @ ASPDAC, Jan. 2014

HEVC Overview: Motion Estimation

Block Matching (BM) or Motion Estimation (ME)

Compression by searching temporal neighbors

High energy/time, high compression efficiency (H.264-Inter, HEVC-Inter)

Search

Window Current Frame

Current

Block

Best

Matching

Motion

Vector

Previous Frame

Current Frame Reference Frame Residue Frame

Page 11: Low Power Design of the Next-Generation High Efficiency Video Coding · 2014-03-12 · CES – Chair for Embedded Systems ces.itec.kit.edu Low Power Design of the Next-Generation

ces.itec.kit.edu 11 Shafique @ ASPDAC, Jan. 2014

HEVC Overview: Search Data Fetching

External

Memory

(DRAM)

On-Chip

Memory

(SRAM)

Block

Matching

External Memory Bus

Reference Frame

Current Frame Current

Block

Search

Window

High leakage

High dynamic

Very high

leakage High bus

power

A memory subsystem with low power

consumption and high efficiency is

required…

Page 12: Low Power Design of the Next-Generation High Efficiency Video Coding · 2014-03-12 · CES – Chair for Embedded Systems ces.itec.kit.edu Low Power Design of the Next-Generation

ces.itec.kit.edu 12 Shafique @ ASPDAC, Jan. 2014

Outline

Introduction to the High Efficiency Video Coding (HEVC)

HEVC Analysis

complexity, memory access, thermal

Power-Efficient HEVC System Design

Conclusion

Page 13: Low Power Design of the Next-Generation High Efficiency Video Coding · 2014-03-12 · CES – Chair for Embedded Systems ces.itec.kit.edu Low Power Design of the Next-Generation

ces.itec.kit.edu 13 Shafique @ ASPDAC, Jan. 2014

HEVC Analysis: Computational Complexity

CU/PU Partitioning

0

20

40

60

80

100

BasketballDrill832×480

ParkScene1920×1080

PeopleOnStreet2560×1600

64×64 32×32 16×16 8×8 4×4Pe

rcen

tage

Are

a

Low Variance Regions

High Variance Regions

Large partitions for low-variance and homogeneous image areas and vice-versa

Smooth texture (due to larger QP or resolution) is usually captured by

larger sized PUs

Early PU size prediction may provide

significant reduction in computational

and energy requirements

Page 14: Low Power Design of the Next-Generation High Efficiency Video Coding · 2014-03-12 · CES – Chair for Embedded Systems ces.itec.kit.edu Low Power Design of the Next-Generation

ces.itec.kit.edu 14 Shafique @ ASPDAC, Jan. 2014

HEVC Analysis: CTU Distribution

Page 15: Low Power Design of the Next-Generation High Efficiency Video Coding · 2014-03-12 · CES – Chair for Embedded Systems ces.itec.kit.edu Low Power Design of the Next-Generation

ces.itec.kit.edu 15 Shafique @ ASPDAC, Jan. 2014

HEVC Analysis: Memory Accesses

Memory Access for Motion Estimation

Memory accesses of HEVC ≈ 3.86× of H.264

Most of the on-chip memory is wasted (leakage power)

0%

25%

50%

75%

100%

H.264 HEVC

0

20

40

60

80

100

Keiba BasketballDrill RaceHorses KristenAndSara

(a)

Pe

rce

nta

ge

Uti

lizat

ion

Median75 %

25 %

Maximum

Minimum

(b)

─ Only a part of the full search window is utilized

─ Adapting the search window size at run-time provides

increased potential for leakage power savings

Page 16: Low Power Design of the Next-Generation High Efficiency Video Coding · 2014-03-12 · CES – Chair for Embedded Systems ces.itec.kit.edu Low Power Design of the Next-Generation

ces.itec.kit.edu 16 Shafique @ ASPDAC, Jan. 2014

Using a thermal camera setup

Copyright: © Chair for Embedded Systems (CES),

Karlsruhe Institute of Technology (KIT), Germany

DIAS Pyroview thermal camera operates

at 50Hz with spatial resolution of 50 µm

Peltier Based

Cooling

IR Camera

CPU chip

Thermal map

Water-cooling unit to cool

down the thermoelectric device

Voltage

supply Linux Ubuntu

kernel A bottom view

Water heat sink

Thermoelectric

device

Copper plate

Thermal pad

Intel Atom 45nm dual-core processor

(1.8 GHz) Src: Intel

Page 17: Low Power Design of the Next-Generation High Efficiency Video Coding · 2014-03-12 · CES – Chair for Embedded Systems ces.itec.kit.edu Low Power Design of the Next-Generation

ces.itec.kit.edu 17 Shafique @ ASPDAC, Jan. 2014

Temperature Measurements for HEVC

[RaceHorses@37QP vs. 22QP]

Temp max.: 53.0°C

Temp min.: 35.0 °C

Temp avg.: 49.0 °C

Temp max.: 55.0°C

Temp min.: 36.0 °C

Temp avg.: 53.0 °C

Copyright: © Chair for Embedded Systems (CES),

Karlsruhe Institute of Technology (KIT), Germany hevcDTM @ DATE‘14

Page 18: Low Power Design of the Next-Generation High Efficiency Video Coding · 2014-03-12 · CES – Chair for Embedded Systems ces.itec.kit.edu Low Power Design of the Next-Generation

ces.itec.kit.edu 18 Shafique @ ASPDAC, Jan. 2014

HEVC Analysis: Temperature

40

45

50

55

60

1350 1400 1450 1500 1550

Tem

peratu

re ( C

)

Time (sec)

Keiba (1.8 GHz)

Basketball (1.35 GHz)

62 ºC

56 ºC

50 ºC

44 ºC

40

45

50

55

60

1000 1050 1100 1150

Tem

per

atu

re ( C

)

Time (sec)

Keiba Basketball

62 ºC

60 ºC

58 ºC

56 ºC

54 ºC

52 ºC

50 ºC

48 ºC

46 ºC

44 ºC

Frequency Dependence Content Dependence

So What is Required?

Interplay between Software and Hardware needs

to be explored for power/energy optimization

1. Optimized Algorithms for Fast Intra- and Inter-

Prediction

2. Energy-Efficient Hardware Accelerators

3. Energy-Efficient Video Memory Heirarchy

4. Content-Adaptive Power Management

Page 19: Low Power Design of the Next-Generation High Efficiency Video Coding · 2014-03-12 · CES – Chair for Embedded Systems ces.itec.kit.edu Low Power Design of the Next-Generation

ces.itec.kit.edu 19 Shafique @ ASPDAC, Jan. 2014

Outline

Introduction to the High Efficiency Video Coding (HEVC)

HEVC Analysis

complexity, memory access, thermal

Power-Efficient HEVC System Design

Conclusion

Page 20: Low Power Design of the Next-Generation High Efficiency Video Coding · 2014-03-12 · CES – Chair for Embedded Systems ces.itec.kit.edu Low Power Design of the Next-Generation

ces.itec.kit.edu 20 Shafique @ ASPDAC, Jan. 2014

Power Efficient HEVC Design:

Hardware Architecture

Video Tile

Formation

Data Analysis

and Statistics

Adaptive

Workload

Budgeting

Complexity

Reduction

Scheme

Energy to

Quality

TradeoffApplication

Driven

Adaptive

Power/

Thermal

Manager

HEVC

Encoding

Intra/Inter

HEVC Software Layer

CTR

CTR

CTR

CTR

CTR

CTR

CTR

CTR

CTR

Battery

HEVC Hardware

Processing

Architecture

Feedback

Monitors to

Software

...

...

...Off-chip

DRAM

Page 21: Low Power Design of the Next-Generation High Efficiency Video Coding · 2014-03-12 · CES – Chair for Embedded Systems ces.itec.kit.edu Low Power Design of the Next-Generation

ces.itec.kit.edu 21 Shafique @ ASPDAC, Jan. 2014

Analysis and Statistics

0

200

400

600

800

1000

1200

1400

1600

1800

2000

0 20 40 60 80 100 120 140 160 180 200

Fre

qu

en

cy

Variance

8x816x1632x3264x64

1

10

100

1000

10000

100000

1 10 100 1000 10000 100000

Fre

qu

en

cy

Distortion

8x816x1632x3264x64

PD

F

Variance

0 20 40 60 140 160 180 20080 100 120 101 100 1000 10000 100000

Distortion

PD

F

Parameter Value SAD SSE SATD Kbps

Max. CU

Depth

4 771 263 51 3001.7

3 659.15 229.08 42.1 3320.9

2 372.92 153.37 28.6 3363.1

Search

Range

64 771 263 51 3001.7

32 553.92 263.26 50.9 3080.44

16 472.37 262.78 52.82 3738.83

AMP 1 771 263 51 3001.7

0 665.74 237.1 44.27 3072.92

Variance and Motion based Classification

Page 22: Low Power Design of the Next-Generation High Efficiency Video Coding · 2014-03-12 · CES – Chair for Embedded Systems ces.itec.kit.edu Low Power Design of the Next-Generation

ces.itec.kit.edu 22 Shafique @ ASPDAC, Jan. 2014

Complexity Reduction: PU Size Estimation

CTU variance computation at 4×4

Recursive 4 neighbors merge

PU Map (PUM)

PU Map Above (PUMA)

HEVC CTU Compressor

1

2

0

1

1

n

i x

i

v xn

, {1,2,3,4}

, {1,2,3,4} OR

c i i

c Th i i Th

v CombineVariances v

if v v v v

MergeBlocks

2

2

log4 1ln

1 220

Th v

Hv QP

Rayleigh CDF

Analysis

Empirical

Analysis

µv = Mean of variance curve

Δ = CDF threshold (0.8)

H = Size of PU to combine

Page 23: Low Power Design of the Next-Generation High Efficiency Video Coding · 2014-03-12 · CES – Chair for Embedded Systems ces.itec.kit.edu Low Power Design of the Next-Generation

ces.itec.kit.edu 23 Shafique @ ASPDAC, Jan. 2014

Time Savings and Video Quality Results

Sequence Class Size BD-PSNR BD-Rate

Traffic A 4K -0.03048 0.5611

BasketballDrive B 1080p -0.04966 3.1834

BasketballDrill C WVGA -0.05175 1.0846

BQSquare D WQVGA -0.03365 0.3802

RaceHorses D WQVGA -0.03009 0.4482

Johnny E 720p -0.08711 2.1241

BasketballDrillText F WVGA -0.05827 1.1123

0

0.2

0.4

0.6

0.8

1

1.2

Traffic BasketballDrive BasketballDrill BQSquare Johnny

RDO Proposed

No

rma

lize

d T

ime

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

Traffic BasketballDrive BasketballDrill BQSquare RaceHorses JohnnyTraffic BasketballDrive

BasketballDrill

BQSquare RaceHorses Johnny BasketballDrillText

39.544.3

37.8 37.042.1

57.4

39.5

Page 24: Low Power Design of the Next-Generation High Efficiency Video Coding · 2014-03-12 · CES – Chair for Embedded Systems ces.itec.kit.edu Low Power Design of the Next-Generation

ces.itec.kit.edu 24 Shafique @ ASPDAC, Jan. 2014

Tile Mapping and Parallelization

Output

Tile Formation and Maximum Workload Estimator

Workload(per core)

Frequency (per CPU)

Offline Tuning

Video Input

Cores Frame Rate fpCPUs Max freq. fmax

Core0

User bit-rate tolerance n

Workload Manager

Intra Mode Prediction

Core Frequency Selector

Workload Allocator

Monitoring Unit

Threshold Generator

Workload Adaptation

Total Intra Angles (θ)

Core1

Core2 Core3

CoreK-2 CoreK-1

...

50

60

70

80

1 3 5 7 9 11 13 15 17 19 21 23 25 27 29 31 33 35 37 39 41 43 45 47 49

Tile 0 Tile 1

Tile 3 Tile 4

Tim

e [

mse

c]

Frame

Workload is not equal for tiles

Page 25: Low Power Design of the Next-Generation High Efficiency Video Coding · 2014-03-12 · CES – Chair for Embedded Systems ces.itec.kit.edu Low Power Design of the Next-Generation

ces.itec.kit.edu 25 Shafique @ ASPDAC, Jan. 2014

HEVC Thermal Management

Application-Driven DTM

HEVC

Encoder

Application

driven DTM

Core0

Sensor

Core1

Sensor

Extract Motion

Intensity

Frequency

scaling

NO

Execute HM

YES

Tcurrent > Tcritical

Page 26: Low Power Design of the Next-Generation High Efficiency Video Coding · 2014-03-12 · CES – Chair for Embedded Systems ces.itec.kit.edu Low Power Design of the Next-Generation

ces.itec.kit.edu 26 Shafique @ ASPDAC, Jan. 2014

HEVC Thermal Management

40

45

50

55

60

No DTM DTM 54ºC DTM 50ºC DTM 46ºC

Tem

per

atu

re (

ºC)

Max Average Min

0

50

100

150

PSNR (dB) Bit rate (kbps)

No DTM Our 54°C Our 50°C Our 46°C

56 ºC

54 ºC

52 ºC

50 ºC

48 ºC

46 ºC

44 ºC

42 ºC

40 ºC

38 ºC

Page 27: Low Power Design of the Next-Generation High Efficiency Video Coding · 2014-03-12 · CES – Chair for Embedded Systems ces.itec.kit.edu Low Power Design of the Next-Generation

ces.itec.kit.edu 27 Shafique @ ASPDAC, Jan. 2014

HEVC Thermal Management

Keiba BasketballDrill

40

45

50

55

60

0 10 20

Pea

k t

emp

eratu

re (

ºC)

# Frames

No DTM DTM 54ºC

DTM 50ºC DTM 46ºC40

45

50

55

60

0 10 20

Pea

k t

emp

eratu

re (

ºC)

# Frames

No DTM DTM 54ºC

DTM 50ºC DTM 46ºC

Page 28: Low Power Design of the Next-Generation High Efficiency Video Coding · 2014-03-12 · CES – Chair for Embedded Systems ces.itec.kit.edu Low Power Design of the Next-Generation

ces.itec.kit.edu 28 Shafique @ ASPDAC, Jan. 2014

CTR

CTR

CTR

CTR

CTR

CTR

CTR

CTR

CTR

Battery

HEVC Hardware

Processing

Architecture

Feedback

Monitors to

Software

...

...

...

Power Efficient HEVC Design:

Hardware Architecture

Page 29: Low Power Design of the Next-Generation High Efficiency Video Coding · 2014-03-12 · CES – Chair for Embedded Systems ces.itec.kit.edu Low Power Design of the Next-Generation

ces.itec.kit.edu 29 Shafique @ ASPDAC, Jan. 2014

Hardware Accelerators

HW0 HW2N/8

CTU Row

8 PPC

M

HW1 HW2Intra Predictor

0

200

400

600

800

1000

1 2 3 4 5 6

Slice LUTs (luma)

Slice LUTs (chroma)

Slice registers (luma)

Slice registers (chroma)

Occupied Slices (luma)

Occupied Slices (chroma)

Legend:

Number of datapaths in parallel

0

200

400

600

800

1000

1 2 3 4 5 6

Slice LUTs (luma)

Slice LUTs (chroma)

Slice registers (luma)

Slice registers (chroma)

Occupied Slices (luma)

Occupied Slices (chroma)

Legend:

Number of datapaths in parallel

Page 30: Low Power Design of the Next-Generation High Efficiency Video Coding · 2014-03-12 · CES – Chair for Embedded Systems ces.itec.kit.edu Low Power Design of the Next-Generation

ces.itec.kit.edu 30 Shafique @ ASPDAC, Jan. 2014

AMBER: Memory Subsystem

External Memory holds the current frame

High density, low read and write power

On-chip SRAM memory (FIFO) holds only the current block

High read and write speed and low dynamic write power

Hides latencies from HEVC engine

SRAM Block FIFO

Power Control

External Memory

(Current Frame)

External Memory

Controller

On-chip Current Data (Block) SRAM

HEVC Video Compression Control

Block Matching

Engine

++++

+

++

HEV

C E

nco

de

r (T

ran

sfo

rm lo

op

)

MRAM Buffers(N Reference Frames) Reference

Write Master

Current Read Master- Reads current frame data

- Writes SRAM Buffer -Low write amount

-Fast Write

Reference Read Master- Reads reference frames

- Low latency read

Page 31: Low Power Design of the Next-Generation High Efficiency Video Coding · 2014-03-12 · CES – Chair for Embedded Systems ces.itec.kit.edu Low Power Design of the Next-Generation

ces.itec.kit.edu 31 Shafique @ ASPDAC, Jan. 2014

AMBER: MRAM Reference Buffers

One MRAM buffer holds a full reference frame

Each column (sector) of reference buffer is power-gated

Reference read and write masters read and write data to

the MRAM buffer

WW

H Reference F1 H Reference FN

MRAM Power Gate Control

Reference Read Master

Ro

w B

uff

er

SRAM FIFO

Re

fere

nce

Wri

te

Mas

ter

Block MatchingHEVC Encoder

MRAM Reference Buffers

Page 32: Low Power Design of the Next-Generation High Efficiency Video Coding · 2014-03-12 · CES – Chair for Embedded Systems ces.itec.kit.edu Low Power Design of the Next-Generation

ces.itec.kit.edu 32 Shafique @ ASPDAC, Jan. 2014

AMBER: Reference Buffer Power Management

Observation: Not all of the search window is used

Block matching algorithm accesses only a small percentage of

reference buffer sectors

Power-gate unused sectors

Reduce leakage

Block Matching

Turned OFF

Turned ON

s1 s2

1 x min

2 x max

s CU s

s CU sPrediction of Unused Sectors is based on:

1. Self-Organizing Map

2. Content Properties

Page 33: Low Power Design of the Next-Generation High Efficiency Video Coding · 2014-03-12 · CES – Chair for Embedded Systems ces.itec.kit.edu Low Power Design of the Next-Generation

ces.itec.kit.edu 33 Shafique @ ASPDAC, Jan. 2014

0

0.5

1

1.5

2

Search Window

AMBER

Power Consumption (4 reference frames) P

ow

er

[W]

Keiba China

Speed

Four

People

Basketball

Drive

People

129×129 193×193 257×257

Keiba ChinaSpeed FourPeople BasketballDrive People

832x480 1024x768 1280x720 1920x1080 2560x1600

─ Increasing the number of reference frames improves the

power consumption of the AMBER system compared to the

search window approach

Page 34: Low Power Design of the Next-Generation High Efficiency Video Coding · 2014-03-12 · CES – Chair for Embedded Systems ces.itec.kit.edu Low Power Design of the Next-Generation

ces.itec.kit.edu 34 Shafique @ ASPDAC, Jan. 2014

Conclusion

Comprehensive analysis of HEVC

Architecture, power, thermal and complexity

Challenges posed by HEVC

Architectural (memory, reconfiguration, accelerators)

Power/thermal (power-gating, configuration control)

Complexity (parallelization, many-core, workload balancing)

Both Hardware and Software need to be optimized while

leveraging the application-specific knowledge

Our approach

Adaptive complexity management

Video tiling, workload budgeting, CU/PU partitioning

Power and thermal aware HEVC configuration

Hybrid video memory hierarchy with content-driven power-gating

Page 35: Low Power Design of the Next-Generation High Efficiency Video Coding · 2014-03-12 · CES – Chair for Embedded Systems ces.itec.kit.edu Low Power Design of the Next-Generation

ces.itec.kit.edu 35 Shafique @ ASPDAC, Jan. 2014

ces265: Multi-threaded HEVC Encoder

Open-source

C++ based

Multithreading via

pthread API

One thread of ces265

≈ 13.2× faster than

HM-9.2

Web

http://ces.itec.kit.edu/ces265/

Download

https://sourceforge.net/projects/ces265/

HEVC-Intra Encoder’s Top

Workload Allocator

GOP Compressor

Tile Compressor Threads

Slice Compressor

WorkloadQueue

Tile Formation and Workload

Curtailing

YUV ReadWrite

CTU Compressor

Workload Manager

Encoder Statistics

Sniper many-core x86 simulator

Simulator statistics McPAT power simulator Power statistics

Proposed HEVC Intra Encoder

SystemConfiguration

Page 36: Low Power Design of the Next-Generation High Efficiency Video Coding · 2014-03-12 · CES – Chair for Embedded Systems ces.itec.kit.edu Low Power Design of the Next-Generation

ces.itec.kit.edu 36 Shafique @ ASPDAC, Jan. 2014

Acknowledgement

Muhammad

Usman Karim

Khan

Daniel

Palomino

Claudio M.

Diniz

Felipe

Sampaio

Page 37: Low Power Design of the Next-Generation High Efficiency Video Coding · 2014-03-12 · CES – Chair for Embedded Systems ces.itec.kit.edu Low Power Design of the Next-Generation

ces.itec.kit.edu 37 Shafique @ ASPDAC, Jan. 2014

Thank you!

Questions?

Web: http://ces.itec.kit.edu/ces265/

Download: https://sourceforge.net/projects/ces265/