A Hardware-based Cache Pollution Filtering Mechanism for Aggressive Prefetches

Xiaotong Zhuang, College of Computing
Hsien-Hsin Sean Lee, School of Electrical and Computer Engineering
Georgia Institute of Technology, Atlanta, GA 30332

ICPP, Kaohsiung, Taiwan, 2003


Agenda
- Introduction
- Motivation
- The Prefetch Pollution Filter
- Experimental Results
- Conclusion


Data Prefetching

Why data prefetching?
- Speed gap between the CPU and main memory
- Initial data references still miss
- Performance suffers if there are not enough independent instructions to mask the latency

Prefetching techniques
- Hardware-based
- Software-based

Design trend
- Increasing memory bandwidth enables more aggressive prefetching
- L1 caches are getting smaller to speed up accesses

When prefetching becomes “too aggressive”
- Severe pollution
- Performance overkill

Cache Pollution

Source of pollution
- No prefetching scheme guarantees 100% accuracy
- HW-based prefetching can cause a lot of pollution
- Stride-based prefetching can easily become ineffective for pointer-based applications

Outcomes of pollution
- Evicts useful data
- Competes for available resources:
  - Limited cache capacity
  - Cache ports
  - Bus bandwidth between components of the memory hierarchy
- Degrades performance
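To make the stride-versus-pointer point concrete, here is a minimal sketch of a per-PC stride prefetcher in the spirit of a reference prediction table. The table layout, the "same stride twice" confidence rule, and the names `StridePrefetcher` and `count_prefetches` are assumptions for illustration; this is not one of the prefetchers (NSP, SDP) evaluated in this talk, and the address traces are made up.

```python
class StridePrefetcher:
    """Hypothetical per-PC stride prefetcher (illustration only)."""

    def __init__(self):
        # pc -> (last_address, last_stride); assumed table layout
        self.table = {}

    def access(self, pc, addr):
        """Record a demand access; return an address to prefetch, or None."""
        prefetch = None
        if pc in self.table:
            last_addr, last_stride = self.table[pc]
            stride = addr - last_addr
            # Assumed confidence rule: prefetch only after seeing the
            # same non-zero stride twice in a row.
            if stride != 0 and stride == last_stride:
                prefetch = addr + stride
            self.table[pc] = (addr, stride)
        else:
            self.table[pc] = (addr, 0)
        return prefetch


def count_prefetches(addresses, pc=0x400):
    """Return (issued, useful); 'useful' means the next access hit the prefetch."""
    pf = StridePrefetcher()
    issued = useful = 0
    pending = None
    for addr in addresses:
        if pending == addr:
            useful += 1
        pending = pf.access(pc, addr)
        if pending is not None:
            issued += 1
    return issued, useful


array_walk = [0x1000 + 32 * i for i in range(8)]           # regular stride
pointer_chain = [0x1000, 0x1040, 0x1080, 0x7000,           # irregular heap
                 0x3040, 0x30C0, 0x3140, 0x9000]           # node addresses
print(count_prefetches(array_walk))     # (6, 5)
print(count_prefetches(pointer_chain))  # (2, 0)
```

On the array walk, 5 of the 6 prefetches match the next access. On the pointer chain, two short runs of coincidentally equal strides trigger two prefetches to lines the trace never touches, which is exactly the kind of pollution described above.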

Related Work

Prefetch buffer [Chen et al. ’91] [Chen & Baer ’95]
- Separates normal and prefetched data; both are accessed in parallel
- Small, fully associative, on the critical path

Evict-me [Wang et al. ’02]
- A reuse-distance check marks data that is unused or whose reuse distance is too long
- Evict-me data have higher priority to be cast out

Dead cache line detection [Lai, Fide & Falsafi ’01]
- Detects dead blocks and replaces them with useful prefetches
- Prevents useful data from being evicted

Prefetch taxonomy [Srinivasan et al. ’99]
- More detailed classification of prefetches
- Proposed a “static filter”: profiling-based pollution filtering

Our Contribution
- Characterization of prefetch effectiveness
- Propose and evaluate two hardware prefetch pollution filtering mechanisms:
  - Per-Address (PA) based
  - Program Counter (PC) based
- Quantify our technique through simulation

Agenda: Motivation

Prefetch Classification
- A comprehensive classification is not desirable because of its implementation complexity in hardware
- Good (effective) prefetches: those referenced in the cache before they are evicted
- Bad (ineffective) prefetches: those never referenced during their lifetime in the cache
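The good/bad definition can be checked cheaply in hardware with the per-line Prefetch Indication Bit (PIB) and Reference Indication Bit (RIB) that appear in the filter block diagram: a prefetched line is classified at the moment it leaves the cache. A minimal sketch, assuming a direct-mapped cache; the class name, method names, and the tiny 4-line geometry are hypothetical.

```python
class DirectMappedCache:
    """Hypothetical direct-mapped cache that classifies prefetches on eviction."""

    def __init__(self, num_lines, line_size=32):
        self.num_lines = num_lines
        self.line_size = line_size
        # Each slot holds [tag, pib, rib], or None when empty.
        self.lines = [None] * num_lines
        self.good = 0
        self.bad = 0

    def _slot(self, addr):
        block = addr // self.line_size
        return block % self.num_lines, block // self.num_lines

    def _fill(self, addr, prefetched):
        index, tag = self._slot(addr)
        victim = self.lines[index]
        if victim is not None and victim[1]:   # PIB set: victim was prefetched
            if victim[2]:                      # RIB set: referenced before eviction
                self.good += 1
            else:                              # never referenced in its lifetime
                self.bad += 1
        self.lines[index] = [tag, prefetched, False]

    def prefetch(self, addr):
        index, tag = self._slot(addr)
        line = self.lines[index]
        if line is None or line[0] != tag:     # only fill on a miss
            self._fill(addr, prefetched=True)

    def demand(self, addr):
        index, tag = self._slot(addr)
        line = self.lines[index]
        if line is None or line[0] != tag:
            self._fill(addr, prefetched=False)
        else:
            line[2] = True                     # hit: set RIB


cache = DirectMappedCache(num_lines=4)
cache.prefetch(0x000)   # prefetched into set 0
cache.demand(0x000)     # hit: RIB set
cache.prefetch(0x020)   # prefetched into set 1, never used
cache.demand(0x080)     # conflicts with set 0: evicts a good prefetch
cache.demand(0x0A0)     # conflicts with set 1: evicts a bad prefetch
print(cache.good, cache.bad)  # 1 1
```

Prefetched lines still resident when a run ends are left unclassified here; a simulator would have to decide how to count them.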

Prefetch Effectiveness
- 11 benchmarks; HW prefetching (NSP, SDP) and SW prefetching
- More than 52% of prefetches are bad!
[Figure: normalized # of prefetches, split into good prefetches and bad prefetches]

Agenda: The Prefetch Pollution Filter

Cache Pollution Filter
[Figure: block diagram. Components: OOO core, LD/ST queue, L1 cache, L2 cache, hardware prefetcher, SW prefetches, prefetch queue (issue prefetch), and the Prefetch Pollution Filter, a history table (array of 2-bit counters) accessed through a hash for lookup and update, with an input labeled “ld/st inst. including SW prefetches”. Each L1 line holds TAG, a Reference Indication Bit (RIB), a Prefetch Indication Bit (PIB), and DATA.]
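The lookup/update cycle of the history table can be sketched as follows, under assumptions the slides do not spell out: the table is indexed by a simple modulo hash of a key (a line address for PA, a PC for PC-based), a prefetch is issued only when its counter is at least 2, and when a prefetched line is evicted its counter is incremented if the RIB is set and decremented otherwise. The initial counter value, the threshold, and the names `PollutionFilter`, `should_issue`, and `update` are hypothetical.

```python
class PollutionFilter:
    """Hypothetical history table of 2-bit saturating counters."""

    def __init__(self, entries=4096, init=2):
        self.entries = entries
        # Assumed initial state: weakly "good", so unseen keys are not filtered.
        self.counters = [init] * entries

    def _index(self, key):
        # Assumed hash: key modulo table size.
        return key % self.entries

    def should_issue(self, key):
        """Lookup before issuing a prefetch: counters 2-3 predict 'good'."""
        return self.counters[self._index(key)] >= 2

    def update(self, key, referenced):
        """On eviction of a prefetched line: RIB set -> increment, else decrement."""
        i = self._index(key)
        if referenced:
            self.counters[i] = min(3, self.counters[i] + 1)
        else:
            self.counters[i] = max(0, self.counters[i] - 1)


f = PollutionFilter()
key = 0x1F40
print(f.should_issue(key))   # True: initial counter is 2
f.update(key, referenced=False)
print(f.should_issue(key))   # False: counter dropped to 1
f.update(key, referenced=True)
f.update(key, referenced=True)
print(f.should_issue(key))   # True: counter saturates at 3
```

The 2-bit hysteresis means a single unlucky eviction of a usually good key only moves it from 3 to 2, so it keeps being issued.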

Prefetch Pollution Filters

PA-based
- Per-Address based: tracks the cache-line addresses issued by each prefetch operation
- Can distinguish different prefetch addresses issued by the same instruction
- Needs a longer history table to reduce aliasing

PC-based
- Tracks the program counter that triggers a prefetch
  - SW prefetch: the PC of the prefetch instruction
  - HW prefetch: the PC of the memory instruction that triggers the prefetch
- Less aliasing, tolerates a smaller history table, but less precise
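The two filters differ only in the key used to index the history table. A sketch, assuming 32-byte lines and 4K entries from the default configuration, a plain modulo index, and 4-byte-aligned PCs (all indexing choices, the PC, and the addresses are hypothetical): three prefetches triggered by one load share a single PC-based entry but occupy three PA-based entries, while PA keys a table-span apart alias.

```python
LINE_SIZE = 32   # bytes per cache line (default configuration)
ENTRIES = 4096   # history table entries (default configuration)


def pa_index(prefetch_addr):
    """PA-based: index by the cache-line address being prefetched."""
    return (prefetch_addr // LINE_SIZE) % ENTRIES


def pc_index(trigger_pc):
    """PC-based: index by the PC that triggered the prefetch.

    Assumes 4-byte instructions, so the two low PC bits are dropped.
    """
    return (trigger_pc >> 2) % ENTRIES


load_pc = 0x0040_1A2C                                # hypothetical triggering load
lines = [0x1000_0000, 0x1000_0020, 0x1000_0040]      # lines it causes to be prefetched
prefetches = [(load_pc, addr) for addr in lines]     # (trigger PC, line address)

pc_entries = {pc_index(pc) for pc, _ in prefetches}
pa_entries = {pa_index(addr) for _, addr in prefetches}
print(len(pc_entries), len(pa_entries))              # 1 3
print(sorted(pa_entries))                            # [0, 1, 2]

# Lines ENTRIES * LINE_SIZE bytes apart alias in the PA table.
print(pa_index(lines[0]) == pa_index(lines[0] + ENTRIES * LINE_SIZE))  # True
```

This is the trade-off on the slide: PA keeps per-address precision but spreads over more entries, so it needs a larger table to avoid aliasing; PC folds all prefetches from one instruction into one entry.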

Agenda: Experimental Results

Simulation Configuration (Default)

Processor
  Target frequency      2 GHz
  Issue/retire width    8 per cycle
  Reorder buffer        128 entries
  Load/store queue      64 entries
  Branch predictor      Bimodal, 2048 entries
  BTB size              4096 sets, 4-way associative

Memory
  Latency               150 core cycles
  Bus                   64 bytes wide

Caches
  L1 I/D                8K, 32-byte line, direct-mapped, 1 cycle
  L1 D ports            3
  L2 I/D                512K, 32-byte line, 4-way, 15-cycle delay
  L2 I/D ports          1

Prefetcher
  Queue length          64 entries

Pollution Filter
  History table         1KB, 4K entries
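The history-table size follows directly from the parameters above: 4K entries of 2-bit counters. The short calculation below checks that and, as a derived figure not stated in the slides, adds the per-line PIB and RIB state for the 8KB L1 with 32-byte lines (assuming exactly one PIB and one RIB per L1 data line).

```python
ENTRIES = 4 * 1024      # history table entries
COUNTER_BITS = 2        # 2-bit saturating counters
L1_BYTES = 8 * 1024     # default L1 D-cache size
LINE_BYTES = 32         # L1 line size
BITS_PER_LINE = 2       # PIB + RIB per line (assumed one of each)

table_bytes = ENTRIES * COUNTER_BITS // 8
l1_lines = L1_BYTES // LINE_BYTES
line_bit_bytes = l1_lines * BITS_PER_LINE // 8

print(table_bytes)               # 1024: matches "1KB, 4K entries"
print(l1_lines, line_bit_bytes)  # 256 64: 256 lines, 64 bytes of PIB/RIB state
```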

Benchmarks and Miss Rates

Benchmark   Input data set                   L1 miss rate   L2 miss rate
bh          2048 bodies                      0.0464         0.0026
em3d        100 nodes, arity 10, 10K iter.   0.2161         0.0001
perimeter   12 levels                        0.0478         0.2709
ijpeg       penguin.ppm                      0.0565         0.0235
fpppp       natoms.in                        0.0807         0.0003
gcc         cp-decl.i                        0.0551         0.0221
wave5       wave5.in                         0.1387         0.0209
gap         ref.in                           0.0409         0.2247
gzip        input.graphic                    0.0597         0.3176
mcf         inp.in                           0.0648         0.2426

Prefetch Reduction Comparison (Default Model)
- Counts are normalized to the number of good prefetches without filtering
- Bad prefetches filtered out: 97% (PA), 98% (PC)
- Good prefetches lost: 51% (PA), 48% (PC)
- Traffic reduction: 75% (PA), 74% (PC)
[Figure: normalized # of prefetches; series: bad (no filtering), bad (PA), bad (PC), good (no filtering), good (PA), good (PC)]
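As a rough consistency check under simplifying assumptions that the slides do not state, suppose prefetch traffic is proportional to the number of prefetches issued and that, in aggregate, about 52% of unfiltered prefetches are bad (the share reported on the Prefetch Effectiveness slide). Combining that share with the filtered-out fractions above gives the fraction of prefetch traffic removed. The function name and the aggregate weighting are hypothetical.

```python
def traffic_reduction(bad_share, bad_removed, good_removed):
    """Fraction of prefetches eliminated by the filter.

    bad_share: fraction of unfiltered prefetches that are bad.
    bad_removed / good_removed: fraction of bad / good prefetches filtered out.
    Assumes traffic is proportional to the number of prefetches.
    """
    good_share = 1.0 - bad_share
    remaining = bad_share * (1 - bad_removed) + good_share * (1 - good_removed)
    return 1.0 - remaining


pa = traffic_reduction(0.52, bad_removed=0.97, good_removed=0.51)
pc = traffic_reduction(0.52, bad_removed=0.98, good_removed=0.48)
print(f"PA: {pa:.1%}, PC: {pc:.1%}")  # PA: 74.9%, PC: 74.0%
```

Both values land within about one percentage point of the reported 75% (PA) and 74% (PC); per-benchmark weighting could shift the result.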

IPC Comparison (Default Model)
- IPC increase: 8.2% (PA), 9.1% (PC)
[Figure: IPC; series: no filtering, PA-based, PC-based]

Prefetch Reduction Comparison (32KB)
- Bad prefetches filtered out: 91% (PA), 92% (PC)
- Good prefetches lost: 35% (PA), 27% (PC)
- Traffic reduction: 52% (PA), 47% (PC)
[Figure: normalized # of prefetches; series: bad (no filtering), bad (PA), bad (PC), good (no filtering), good (PA), good (PC)]

IPC Comparison (32K Cache Model)
- IPC increase: 7.0% (PA), 8.1% (PC)
[Figure: IPC; series: no filtering, PA-based, PC-based]

IPC for Different History Table Sizes
- IPC jumps by 6% between 2K and 4K; changes below 2K and above 4K are under 1%
[Figure: IPC for history table sizes of 1K, 2K, 4K, 8K, and 16K]

Bad/Good Prefetch Ratio for Different # of L1 Ports
- 6% drop from 3 ports to 4 ports, 2% drop from 4 ports to 5 ports
[Figure: bad/good prefetch ratio for 3-port, 4-port, and 5-port L1]

IPC for Different # of L1 Ports
- 4% speedup from 3 ports to 4 ports, <1% speedup from 4 ports to 5 ports
[Figure: IPC for 3-port, 4-port, and 5-port L1]

Bad/Good Prefetch Ratio w/ Prefetch Buffer
- The prefetch buffer (prefbuf) is on the critical path and very small
- With prefbuf: no reduction in traffic, and good prefetches have a short lifetime
[Figure: bad/good prefetch ratio for PA-based (no prefbuf), PA-based (prefbuf), PC-based (no prefbuf), PC-based (prefbuf)]

IPC Comparison w/ Prefetch Buffer
- IPC loss: 9% (PA), 10% (PC)
[Figure: IPC for PA-based (no prefbuf), PA-based (prefbuf), PC-based (no prefbuf), PC-based (prefbuf)]

Agenda: Conclusion

Conclusion
- Too aggressive prefetching is an overkill: lots of prefetches are ineffective
- SW-induced prefetches cannot be removed without source code
- HW-induced prefetches have to be lived with
- Dynamic HW-based prefetch filtering schemes are needed
- We propose (1) Per-Address-based and (2) Program-Counter-based filters that:
  - Filter out ~98% of bad prefetches for an 8KB L1
  - Filter out ~92% of bad prefetches for a 32KB L1
  - Retain good prefetches: ~50% (8K L1), ~70% (32K L1)
- Improvement
  - Traffic reduced by ~75% (8K L1), ~50% (32K L1)
  - Overall IPC improved by 7% to 9%
- The history table size can be reasonably small
- Improvements decrease when more cache ports are added
- IPC drops by 9-10% with a dedicated prefetch buffer for aggressive prefetching

That’s All Folks!
Thanks, Archbeer!

Bad/Good Prefetch Ratio Comparison (Default Model)
- Reduction: 70% (PA), 91% (PC)
[Figure: bad/good prefetch ratio; series: no filtering, PA-based, PC-based]

Bad/Good Prefetch Ratio Comparison (32KB)
- Reduction: 75% (PA), 93% (PC)
[Figure: bad/good prefetch ratio; series: no filtering, PA-based, PC-based]