Exploiting Program Hotspots and Code Sequentiality for Instruction Cache Leakage Management
J. S. Hu, A. Nadgir, N. Vijaykrishnan, M. J. Irwin, M. Kandemir
Computer Science and Engineering, The Pennsylvania State University
University Park, PA 16802, USA


Outline

- Motivation
- Hotspot-based leakage management
- Just-in-time activation
- Experimental results
- Conclusions


Motivation

- Leakage is projected to account for 70% of the cache power budget in 70nm technology
- Instruction caches are much more sensitive (in performance impact) to leakage control mechanisms
- Objective: exploit dynamic application characteristics for effective instruction cache leakage management
- Two major factors shape instruction access behavior:
  - Hotspot execution
  - Code sequentiality
- Exploit program hotspots for further leakage savings
- Exploit code sequentiality to reduce the performance penalty


Hotspot-Based Leakage Management

- Manage cache leakage in an application-sensitive manner
- Track both spatial and temporal locality of instruction cache accesses
- Observations:
  - Program execution occurs in phases
  - Instructions in a phase do not need to be clustered
  - Program phases produce hotspots in the cache
- Hotspot-based leakage management (HSLM):
  - Prevent hotspots from inadvertent turn-off
  - Detect phase changes for early turn-off


Dynamic Hotspot Protection

[Figure: Original scheme vs. HSLM scheme. Labels: Global Set, PC, I-Cache, mask.]


HSLM Detecting Phase Changes

[Figure: Original scheme vs. HSLM scheme. Labels: Global Set, I-Cache, mask, drowsy window. Events: new hotspots detected; drowsy window expires.]


Hotspot-Based Leakage Management

- Track program hotspots through the branch predictor (BTB)
  - Augment BTB entries to maintain the access history of each basic block
- Add a voltage control mask bit for each instruction cache line
  - A set mask bit indicates a hotspot cache line
- HSLM performs two main functions:
  - Keep hotspot cache lines from being turned off by the Global Set counter
  - Track phase changes and allow early turn-off of cache lines not in a newly detected hotspot
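The two HSLM functions above can be illustrated with a toy software model. This is only a sketch under stated assumptions, not the hardware described in the slides: the class `DrowsyICache` and its method names are hypothetical, lines are modeled as simple flags, and the early turn-off on a phase change is applied immediately rather than through a drowsy window.

```python
class DrowsyICache:
    """Toy model (hypothetical) of an I-cache with one mask bit per line."""

    def __init__(self, num_lines):
        self.num_lines = num_lines
        self.drowsy = [False] * num_lines  # True = line is in drowsy mode
        self.mask = [False] * num_lines    # True = line belongs to a hotspot

    def access(self, line):
        """Access a line; return True if it had to be woken up first."""
        was_drowsy = self.drowsy[line]
        self.drowsy[line] = False
        return was_drowsy

    def mark_hotspot(self, lines):
        """Set the mask bit of every line in a detected hotspot."""
        for line in lines:
            self.mask[line] = True

    def global_set(self):
        """Periodic turn-off: only lines without a set mask bit go drowsy."""
        for line in range(self.num_lines):
            if not self.mask[line]:
                self.drowsy[line] = True

    def phase_change(self, new_hot_lines):
        """Assumed policy: drop old masks, protect the new hotspot,
        and turn off every other line early."""
        self.mask = [False] * self.num_lines
        self.mark_hotspot(new_hot_lines)
        self.global_set()
```

In this model a line marked as part of a hotspot survives `global_set()`, while a call to `phase_change()` puts the lines of the previous hotspot into drowsy mode without waiting for the next periodic turn-off.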


Augmented Cache Microarchitecture

[Figure: cache line with leakage control. Labels: row decoder, word line drivers, word line, wordline gate (to prevent accessing drowsy lines), power line at 0.3V (drowsy) or 1V (active), SRAMs, voltage control latch (Set: drowsy, Reset: active; outputs Q and !Q), VCM, Global Set.]


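BTB Microarchitecture for HSLM

[Figure: Branch Target Buffer indexed by PC, with "Branch taken" and "BTB hit" signals. Example entry: 1 target_addr 1 0000 0 0111, with fields labeled vbit, tgt_cnt, fth_cnt. The BTB drives the I-cache leakage control circuitry. Other labels: VCM, Global Mask Bit, word lines, Global Reset.]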
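The BTB figure names two per-entry counters, tgt_cnt and fth_cnt, but the slides do not spell out how they are updated. The sketch below is one plausible reading, stated as an assumption: tgt_cnt counts executions of the taken-target basic block, fth_cnt counts executions of the fall-through block, both saturate at 4 bits (suggested by the 0000 and 0111 values in the figure), and a block is treated as hot once its counter reaches a threshold. The class `BTBEntry`, the method names, and the threshold parameter are hypothetical, and the vbit fields are omitted.

```python
class BTBEntry:
    """Hypothetical BTB entry augmented with two access counters."""

    COUNTER_MAX = 15  # assumed 4-bit saturating counters

    def __init__(self, target_addr):
        self.target_addr = target_addr
        self.tgt_cnt = 0  # assumed: accesses to the taken-target basic block
        self.fth_cnt = 0  # assumed: accesses to the fall-through basic block

    def record(self, taken):
        """Update the counter that matches the branch outcome."""
        if taken:
            self.tgt_cnt = min(self.tgt_cnt + 1, self.COUNTER_MAX)
        else:
            self.fth_cnt = min(self.fth_cnt + 1, self.COUNTER_MAX)

    def hot_blocks(self, threshold):
        """Return (target_is_hot, fall_through_is_hot) for a given threshold."""
        return (self.tgt_cnt >= threshold, self.fth_cnt >= threshold)
```

A controller built on this model would set the mask bits of the cache lines belonging to whichever block `hot_blocks()` reports as hot.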


Just-In-Time Activation (JITA)

- Sequentiality is the norm in the execution of many applications (e.g., >80% of static code is sequential and 50% of branches are not taken -> 90%)
- Take advantage of the sequential access pattern to do just-in-time activation
  - The status of the next cache line is checked, and the line is pre-activated if possible
  - In sequential access, the performance penalty due to activation of drowsy lines is eliminated
- JITA may fail when:
  - the target of a taken branch is beyond the next cache line, or
  - the next instruction is beyond the current bank
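The effect of JITA and its two failure cases can be seen in a small trace-driven model. This is a minimal sketch, assuming every line starts drowsy, stays active once woken, and that pre-activation only reaches the next line within the same bank; the function name `count_wakeup_stalls` and its parameters are hypothetical.

```python
def count_wakeup_stalls(trace, num_lines, lines_per_bank, jita):
    """Count accesses that hit a drowsy line and must wait for wake-up.

    trace is a sequence of cache line indices in access order.
    """
    drowsy = [True] * num_lines  # assumption: all lines start drowsy
    stalls = 0
    for line in trace:
        if drowsy[line]:
            stalls += 1           # wake-up penalty is paid on this access
            drowsy[line] = False
        if jita:
            nxt = line + 1
            in_same_bank = (
                nxt < num_lines
                and nxt // lines_per_bank == line // lines_per_bank
            )
            if in_same_bank:
                drowsy[nxt] = False  # pre-activate the next sequential line
    return stalls
```

For a purely sequential trace over 8 lines in two banks of 4, the model pays a wake-up on every line without JITA but only on the first line of each bank with JITA. A taken branch to a line beyond the next one still pays the penalty in both cases.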


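Just-In-Time Activation (JITA)

[Figure: timing comparison of the original scheme and JITA. In the original scheme each PC triggers Activate followed by Access on the I-cache. With JITA, the next line is pre-activated (PA) while the current line is accessed. Labels: PC, I-Cache, Activate, Access, Preactivate, PA.]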


Augmented Cache Microarchitecture

[Figure: cache line with leakage control and pre-activation. Labels: row decoder, word line drivers, word line, wordline gate (to prevent accessing drowsy lines), power line at 0.3V (drowsy) or 1V (active), SRAMs, voltage control latch (Set: drowsy, Reset: active; outputs Q and !Q), VCM, Global Set, Preactivate.]


Leakage Energy Breakdown (overhead)

[Figure: stacked bar chart. Y-axis: Total Leakage Energy Breakdown (J), 0 to 0.16. Schemes: Base, Drowsy-Bank, Loop, DHS, DHS-PA, DHS-Bank-PA. Components: Drowsy_Leakage, Active_leakage, Processor+Dcache, Turn_on, Extr_turn_on, Counter.]


Leakage Energy Reduction

[Figure: bar chart. Y-axis: Leakage Energy Reduction, -20% to 80%. Benchmarks: vpr, gap, wupwise, mesa, Avg. Schemes: Drowsy-Bank, Loop, DHS, DHS-PA, DHS-Bank-PA.]

DHS-Bank-PA achieved a leakage energy reduction of 63% over Base, 49% over Drowsy-Bank, and 29% over Loop (Compiler)


Energy Delay Product (EDP)

[Figure: bar chart. Y-axis: Energy Delay Product (J*s), 0 to 0.08. Benchmarks: vpr, gap, wupwise, mesa, Avg. Schemes: Base, Drowsy-Bank, Loop, DHS, DHS-PA, DHS-Bank-PA.]

DHS-Bank-PA achieved the smallest EDP: a reduction of 63% over Base, 48% over Drowsy-Bank, and 38% over Loop
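For readers unfamiliar with the metric, the energy-delay product is energy multiplied by execution time, and a reduction "over" a baseline is the relative drop in that product. The sketch below shows the arithmetic only; the function names are hypothetical and the numbers in the usage note are made up for illustration, not taken from the measurements in this talk.

```python
def energy_delay_product(energy_joules, delay_seconds):
    """EDP in J*s: lower is better, and it penalizes both energy and slowdown."""
    return energy_joules * delay_seconds


def relative_reduction(baseline, improved):
    """Fractional reduction of `improved` relative to `baseline`."""
    return (baseline - improved) / baseline
```

With illustrative values of 0.10 J and 1.00 s for a baseline and 0.04 J and 1.01 s for an optimized scheme, `relative_reduction(energy_delay_product(0.10, 1.00), energy_delay_product(0.04, 1.01))` comes out near 0.596, showing how a small slowdown slightly offsets the energy saving.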


Effectiveness of JITA

[Figure: bar chart. Y-axis: Performance Degradation, 0% to 20%. Benchmarks: vpr, gap, wupwise, mesa, Avg. Schemes: DHS, DHS-PA.]

With JITA applied, DHS-PA removes 87.8% of the performance degradation of DHS


Conclusions

- HSLM and JITA can be implemented with minimal hardware overhead
- The energy model accounts for the energy overhead of the introduced hardware and of other processor components
- Applying both HSLM and JITA helps:
  - Reduce leakage energy by 63% over Base, 49% over Drowsy-Bank, and 29% over Loop
  - Reduce the (leakage) energy*delay product by 63% over Base, 48% over Drowsy-Bank, and 38% over Loop
- The evaluation shows that application characteristics can be exploited for effective leakage control


Thank You!