Understanding the Energy-Delay Tradeoff of ILP-based Compilation Techniques on a VLIW Architecture

G. Pokam, F. Bodin. CPC 2004, Chiemsee, Germany, July 7-9.

1

Understanding the Energy-Delay Tradeoff of ILP-based Compilation Techniques on a VLIW Architecture

G. Pokam, F. Bodin

CPC 2004

Chiemsee, Germany, July 7-9

2

Motivation

• Source of complexity on high-performance VLIW processors: hardware duplication
  • many FUs of different types (ALUs, LSUs, FPUs, BR, etc.)
  • need large register file
• Power growth factor: Power ~ IPC (relation annotated on the slide with "compiler" and "architecture complexity")

3

Motivation

• Assume a fixed power growth factor; does compiling for higher ILP result in dissipating less power?
• Which issues (architecture, software, etc.) affect power when compiling for ILP?
• Try to figure out what happens analytically!

4

Agenda

• Motivation
• Used metrics
• Energy model
• Tradeoff analysis
• Hyperblock example
• Experiments
• Conclusions

5

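Metric

Performance to energy ratio (PTE) [Gonzales, R. et al.]

$$\mathrm{PTE} = \frac{\text{performance}}{\text{energy}} = \frac{1}{\mathrm{Delay}_{BB} \times \mathrm{Energy}_{BB}} = \frac{IPC}{N \times E_{BB}}$$

• $N$: nb. of oper. per Basic Block
• $IPC$: average nb. of oper. per bundle
• $E_{BB}$: energy per Basic Block

higher is better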
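The metric above can be sketched in a few lines of Python. This is only an illustration: it assumes the reading PTE = 1 / (Delay_BB × Energy_BB) with Delay_BB = N / IPC (one bundle issued per cycle), and the function name `pte` and all numeric values are hypothetical, not taken from the slides.

```python
def pte(n_ops: float, ipc: float, energy_bb: float) -> float:
    """Performance-to-energy ratio of a basic block: 1 / (delay * energy).

    Assumption: delay is the number of bundles, N / IPC.
    """
    if n_ops <= 0 or ipc <= 0 or energy_bb <= 0:
        raise ValueError("all inputs must be positive")
    delay = n_ops / ipc  # bundles needed to issue N operations
    return 1.0 / (delay * energy_bb)


# Hypothetical blocks: same operation count and energy, different IPC.
narrow = pte(n_ops=12, ipc=1.5, energy_bb=40.0)
wide = pte(n_ops=12, ipc=3.0, energy_bb=40.0)
print(narrow, wide)  # the block with the higher IPC has the higher PTE
```

With the energy held constant, doubling IPC doubles PTE; in practice the energy per block also changes with the schedule, which is what the rest of the analysis tries to capture.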

7

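Energy Model

The execution of a bundle $w_n$ dissipates an energy $EPB_{w_n}$:

$$EPB_{w_n} = E_c + IPC_{w_n} \times E_{op} + m \times p \times E_s + l \times q \times E_{miss}$$

• $E_c$: energy base cost
• $IPC_{w_n} \times E_{op}$: energy due to execution of bundle
• $m \times p \times E_s$: energy due to D-cache misses
• $l \times q \times E_{miss}$: energy due to I-cache misses

Consider loop intensive kernels …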
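A minimal sketch of the per-bundle energy model, assuming the formula reads EPB = E_c + IPC × E_op + m × p × E_s + l × q × E_miss. The slides do not define m, p, l and q, so the code treats them as plain multiplicative factors. The `loop_intensive` flag, which drops the I-cache term, is an assumption about what "loop intensive kernels" is meant to imply, and all names and numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EnergyParams:
    e_c: float     # base cost per bundle
    e_op: float    # cost per executed operation
    e_s: float     # cost factor of the D-cache miss term
    e_miss: float  # cost factor of the I-cache miss term


def energy_per_bundle(ipc, m, p, l, q, params, loop_intensive=False):
    """Sum the four terms of the per-bundle energy model."""
    base = params.e_c
    execute = ipc * params.e_op
    dcache = m * p * params.e_s
    # Assumption: for loop-intensive kernels the I-cache term is neglected.
    icache = 0.0 if loop_intensive else l * q * params.e_miss
    return base + execute + dcache + icache


params = EnergyParams(e_c=1.0, e_op=0.5, e_s=2.0, e_miss=4.0)
full = energy_per_bundle(ipc=2.0, m=3, p=0.1, l=10, q=0.01, params=params)
loop = energy_per_bundle(ipc=2.0, m=3, p=0.1, l=10, q=0.01, params=params,
                         loop_intensive=True)
print(full, loop)
```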

9

Analysis

• Use PTE as a lever for power exploration
• Assume R is a CFG region to be transformed into an ILP region H
• a sufficient condition for this is given by

$$\mathrm{PTE}_H \geq \mathrm{PTE}_R$$

10

Analysis

Idea:

• keep track of IPC values that improve energy efficiency
• solve the PTE inequality at $IPC_H$:

$$IPC_H \geq \mathrm{ILPtransform}(IPC_R)$$

• $IPC_H$: avg. #oper. in transformed region
• $IPC_R$: avg. #oper. in the CFG region R

11

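Analysis

$$\mathrm{ILPtransform}(IPC_R) = \frac{A \times IPC_R}{B + C \times IPC_R}$$

where

$$A = f_H N_H n_H E_c + N_H s_H E_s$$

$$B = \sum_{m} \left( N_R f_R n_R E_c + N_R s_R E_s \right)$$

$$C = \left( \sum_{m} N_R f_R n_R - f_H N_H n_H \right) E_{op}$$

• f: exec. freq.
• N: # of oper.
• n: # of bundles
• s: # of stalls due to dmiss
• m: # of BB in region

C is a measure of extra work!

Shape of ILPtransform function depends on sign of C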

12

$IPC_H$ vs. $IPC_R$

• C < 0: exponential shape means high extra work! (dependence height mismatch, resource contention), e.g. Hyperblock: compensation code
• C = 0: linear shape, negligible extra work
• C > 0: optimal scenario, logarithmic shape, e.g. Hyperblock: instruction merging
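The threshold function and its three regimes can be explored numerically. The sketch below assumes the reading ILPtransform(IPC_R) = A × IPC_R / (B + C × IPC_R), with A = f_H N_H n_H E_c + N_H s_H E_s, B summed over the basic blocks of R, and C = (Σ N_R f_R n_R − f_H N_H n_H) × E_op. The `Block` type, the function names and every number are hypothetical.

```python
from typing import NamedTuple, Sequence


class Block(NamedTuple):
    f: float  # execution frequency
    N: float  # number of operations
    n: float  # number of bundles
    s: float  # stalls due to D-cache misses


def coefficients(region: Sequence[Block], hyper: Block, e_c, e_op, e_s):
    """Return (A, B, C) for a CFG region R and its transformed region H."""
    a = hyper.f * hyper.N * hyper.n * e_c + hyper.N * hyper.s * e_s
    b = sum(bb.N * bb.f * bb.n * e_c + bb.N * bb.s * e_s for bb in region)
    work_r = sum(bb.N * bb.f * bb.n for bb in region)
    work_h = hyper.f * hyper.N * hyper.n
    c = (work_r - work_h) * e_op  # negative when H performs extra work
    return a, b, c


def ilp_transform(ipc_r, a, b, c):
    """Minimum IPC_H for which PTE_H >= PTE_R under the assumed model."""
    denom = b + c * ipc_r
    if denom <= 0:
        return float("inf")  # no finite IPC_H satisfies the inequality
    return a * ipc_r / denom


def shape(c, tol=1e-12):
    """Name the regime the slides associate with the sign of C."""
    if c < -tol:
        return "exponential"
    if c > tol:
        return "logarithmic"
    return "linear"


region = [Block(f=10, N=4, n=3, s=1), Block(f=10, N=6, n=4, s=2)]
hyper = Block(f=10, N=9, n=5, s=2)
a, b, c = coefficients(region, hyper, e_c=1.0, e_op=0.5, e_s=2.0)
print(a, b, c, shape(c), ilp_transform(2.0, a, b, c))
```

In this made-up case the hyperblock does more work than the original region, so C is negative and the required IPC_H grows faster than linearly with IPC_R.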

14

Hyperblock framework

• predication model via the select instruction

slct dest = cond, src1, src2

• only hammock regions are considered
• single entry – single exit hyperblock
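To illustrate how a select instruction replaces a hammock (an if-then-else with a single entry and a single exit), here is a small Python model. It assumes that `slct dest = cond, src1, src2` writes `src1` to `dest` when `cond` is true and `src2` otherwise; the slides do not spell out the semantics, and the example functions are hypothetical.

```python
def slct(cond: bool, src1: int, src2: int) -> int:
    """Assumed semantics of the select: dest = src1 if cond else src2."""
    return src1 if cond else src2


def hammock_branching(a: int, b: int) -> int:
    """Original hammock: two control-flow paths that rejoin."""
    if a > b:
        x = a - b
    else:
        x = b - a
    return x


def hammock_if_converted(a: int, b: int) -> int:
    """Same computation as straight-line code: both sides execute,
    then a select keeps the value of the path that would have run."""
    cond = a > b
    t1 = a - b  # then-side, now executed unconditionally
    t2 = b - a  # else-side, now executed unconditionally
    return slct(cond, t1, t2)


print(hammock_branching(7, 3), hammock_if_converted(7, 3))
```

The converted version removes the branch but always executes both sides, which is the kind of extra work the coefficient C in the analysis accounts for.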

15

Transformation heuristic

1. build the loop tree
2. traverse the loop tree from innermost to outermost loop
3. evaluate profit for each candidate loop region
4. propagate profit to CFG after transformation

$$\mathrm{profit} = \frac{\mathrm{PTE}_{transformed} - \mathrm{PTE}_{original}}{\mathrm{PTE}_{original}}$$
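Steps 1 to 3 of the heuristic can be sketched as a post-order walk over a loop tree. This is not the authors' implementation: it assumes profit is the relative PTE change, (PTE_transformed − PTE_original) / PTE_original, that a region is selected when its profit exceeds a threshold, and that both PTE values are already known for each loop. Step 4 (propagating profit back to the CFG) is not modelled, and the `Loop` type and all values are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Iterator, List


@dataclass
class Loop:
    name: str
    pte_original: float
    pte_transformed: float
    children: List["Loop"] = field(default_factory=list)


def profit(pte_original: float, pte_transformed: float) -> float:
    """Relative PTE improvement of the transformed region."""
    return (pte_transformed - pte_original) / pte_original


def innermost_first(loop: Loop) -> Iterator[Loop]:
    """Post-order traversal: inner loops are visited before their parent."""
    for child in loop.children:
        yield from innermost_first(child)
    yield loop


def select_regions(root: Loop, threshold: float = 0.0) -> List[str]:
    """Names of the loop regions whose profit exceeds the threshold."""
    return [
        loop.name
        for loop in innermost_first(root)
        if profit(loop.pte_original, loop.pte_transformed) > threshold
    ]


tree = Loop("outer", 1.0, 1.05, children=[
    Loop("inner1", 1.0, 1.2),
    Loop("inner2", 1.0, 0.9),
])
print(select_regions(tree))
```

Here `inner2` is rejected because its transformed PTE is lower than the original, which corresponds to the "selective hyperblock" idea of transforming only profitable regions.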

17

Platform

• Lx Platform from STMicroelectronics
  • 4-issue VLIW machine
  • 64 GPRs, 8 CBRs
  • 4 ALUs, 1 LD/ST, 2 MULs, 1 BU
• Instruction-based energy model from STMicroelectronics
• Lx compiler
  • prefetch disabled
  • only scalar optimizations (-O2)

18

Methodology

• Post-pass optimization
• Tool flow (figure labels): source, Lx Compiler, .s file, SALTO, .s file, absciss; phase 1, phase 2
• Instrumentation:
  • BB frequency
  • Dmiss per BB
• Hyperblock formation
• Hyperblock optimization
  • instr. promotion
  • instr. merging
  • instr. renaming
• original CFG, selective hyperblock, all hyperblock

19

Results

• negligible IPC improvement
• relatively larger increase of operation count and static schedule length

21

Conclusions

• Analytical scheme to understand the impact of ILP compilation on energy
• Heuristic shows 17% energy-delay improvement on a restricted hyperblock scheme
• programs suffer from limited ILP which quickly turns into wasted energy
• need to go beyond compiler-centric approaches in order to overcome ILP limitations
• What is missing:
  • impact of post-optimization passes has not been determined
  • only a restricted hyperblock scheme has been evaluated

22

Thanks!
