36
Back to the Future: Leveraging Belady’s Algorithm for Improved Cache Replacement Akanksha Jain Calvin Lin 1

Back to the Future: Leveraging Belady’s Algorithm for ...isca2016.eecs.umich.edu/wp-content/uploads/2016/07/2A-1.pdfConclusions and Future Work • Recent trend to view cache replacement

  • Upload
    others

  • View
    3

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Back to the Future: Leveraging Belady’s Algorithm for ...isca2016.eecs.umich.edu/wp-content/uploads/2016/07/2A-1.pdfConclusions and Future Work • Recent trend to view cache replacement

Back to the Future: Leveraging Belady’s Algorithm for Improved Cache ReplacementAkanksha JainCalvin Lin

1

Page 2: Back to the Future: Leveraging Belady’s Algorithm for ...isca2016.eecs.umich.edu/wp-content/uploads/2016/07/2A-1.pdfConclusions and Future Work • Recent trend to view cache replacement

Cache Replacement

•  On a cache miss, which line should we evict? – Well-studied problem

•  Belady provided an optimal solution in 1966

2

Page 3: Back to the Future: Leveraging Belady’s Algorithm for ...isca2016.eecs.umich.edu/wp-content/uploads/2016/07/2A-1.pdfConclusions and Future Work • Recent trend to view cache replacement

Belady’s Optimal Algorithm (OPT)

•  Evict the line that is accessed farthest in the future –  Impractical

3

Page 4: Back to the Future: Leveraging Belady’s Algorithm for ...isca2016.eecs.umich.edu/wp-content/uploads/2016/07/2A-1.pdfConclusions and Future Work • Recent trend to view cache replacement

Existing Solutions•  Rely on heuristics that make assumptions about

the underlying access pattern – LRU assumes recency-friendly accesses – MRU assumes thrashing accesses – Recent solutions use more sophisticated heuristics

•  Problem: Heuristics don’t work well when their assumptions don’t hold

4

Page 5: Back to the Future: Leveraging Belady’s Algorithm for ...isca2016.eecs.umich.edu/wp-content/uploads/2016/07/2A-1.pdfConclusions and Future Work • Recent trend to view cache replacement

* = AMedium-term Reuse

CShort-term Reuse

BLong-term Reuse

Example: Matrix Multiplication

5 LRU MRU DRRIP SDBP SHiP OPT

Hit

Rate

Page 6: Back to the Future: Leveraging Belady’s Algorithm for ...isca2016.eecs.umich.edu/wp-content/uploads/2016/07/2A-1.pdfConclusions and Future Work • Recent trend to view cache replacement

LRU MRU DRRIP SDBP SHiP OPT

Hit

Rate

B (Long-term) A (Medium-term) C (Short-term)

Example: Matrix Multiplication

6

* = AMedium-term Reuse

CShort-term Reuse

BLong-term Reuse

Page 7: Back to the Future: Leveraging Belady’s Algorithm for ...isca2016.eecs.umich.edu/wp-content/uploads/2016/07/2A-1.pdfConclusions and Future Work • Recent trend to view cache replacement

Significant Headroom

0

5

10

15

20

25

30

DIP

DR

RIP

DSB

SDB

P

SHiP

MPK

I Red

uctio

n ov

er L

RU (%

) Cache Replacement for SPEC

Opt

imal

2007 2010 2010 2010 2011 7

We are going to use OPT

Page 8: Back to the Future: Leveraging Belady’s Algorithm for ...isca2016.eecs.umich.edu/wp-content/uploads/2016/07/2A-1.pdfConclusions and Future Work • Recent trend to view cache replacement

Our Solution: Key Idea•  We cannot look into the future •  But we can apply the OPT algorithm to past events to

learn how OPT behaves •  If past predicts the future, then this solution should

approach OPT

time

Future Past Behavior

What should I evict? 8

Predictor

Page 9: Back to the Future: Leveraging Belady’s Algorithm for ...isca2016.eecs.umich.edu/wp-content/uploads/2016/07/2A-1.pdfConclusions and Future Work • Recent trend to view cache replacement

Complication

•  OPT looks arbitrarily far into the future – We might need an arbitrarily long history

9

Page 10: Back to the Future: Leveraging Belady’s Algorithm for ...isca2016.eecs.umich.edu/wp-content/uploads/2016/07/2A-1.pdfConclusions and Future Work • Recent trend to view cache replacement

0

10

20

30

40

50

60

70

1× 2× 4× 8× 16×

Erro

r com

pare

d to

Infin

ite O

PT (

%)

View of the Future (number of cache accesses)

Belady

LRU

How far in the future does OPT need to look?

10

History length not unbounded, but we need to track past 8× cache accesses

Page 11: Back to the Future: Leveraging Belady’s Algorithm for ...isca2016.eecs.umich.edu/wp-content/uploads/2016/07/2A-1.pdfConclusions and Future Work • Recent trend to view cache replacement

Our Solution

•  Hawkeye Cache Replacement (Hawkeye) – Hawks can see 8× farther than the best humans

11

Page 12: Back to the Future: Leveraging Belady’s Algorithm for ...isca2016.eecs.umich.edu/wp-content/uploads/2016/07/2A-1.pdfConclusions and Future Work • Recent trend to view cache replacement

Hawkeye: Challenges•  Need to look at a long history (8× the cache size)

•  Need to efficiently compute the OPT solution for a long history of past references

•  New algorithm called OPTgen – Online (linear time) –  Sampling (12KB overhead for 2MB cache)

12

Page 13: Back to the Future: Leveraging Belady’s Algorithm for ...isca2016.eecs.umich.edu/wp-content/uploads/2016/07/2A-1.pdfConclusions and Future Work • Recent trend to view cache replacement

OPTgen PC-based Predictor

13

Hawkeye: Overall Design

Last Level Cache

Computes OPT’s decisions for the

past

Remembers past OPT decisions

Cache Access Stream

OPT hit/miss

Insertion Priority

PC

Page 14: Back to the Future: Leveraging Belady’s Algorithm for ...isca2016.eecs.umich.edu/wp-content/uploads/2016/07/2A-1.pdfConclusions and Future Work • Recent trend to view cache replacement

OPTgen PC-based Predictor

Address: X

14

Hawkeye: Overall Design

With OPT, would X be a cache hit or miss?

Hit/Miss

Training

Last Level Cache

X

PC

Page 15: Back to the Future: Leveraging Belady’s Algorithm for ...isca2016.eecs.umich.edu/wp-content/uploads/2016/07/2A-1.pdfConclusions and Future Work • Recent trend to view cache replacement

OPTgen PC-based Predictor

Insert X with high or low

priority

Last Level Cache15

Hawkeye: Overall Design

Does this PC tend to load cache-friendly or cache-averse lines?

Address: XPredictionPC

Page 16: Back to the Future: Leveraging Belady’s Algorithm for ...isca2016.eecs.umich.edu/wp-content/uploads/2016/07/2A-1.pdfConclusions and Future Work • Recent trend to view cache replacement

OPTgen•  Simple online algorithm that reproduces OPT’s

solution for the past

•  Inspired from Belady’s insight – Lines that are reused first have higher priority

•  OPTgen also gives higher priority to lines that are reused first

16

Page 17: Back to the Future: Leveraging Belady’s Algorithm for ...isca2016.eecs.umich.edu/wp-content/uploads/2016/07/2A-1.pdfConclusions and Future Work • Recent trend to view cache replacement

OPTgen•  For each line, we ask if this line would have been a

hit or miss with OPT.

A B C C E D D A

time Future Past Behavior 17

Cache Capacity = 2

D

C

A

Page 18: Back to the Future: Leveraging Belady’s Algorithm for ...isca2016.eecs.umich.edu/wp-content/uploads/2016/07/2A-1.pdfConclusions and Future Work • Recent trend to view cache replacement

OPTgen•  For each line, we ask if this line would have been a

hit or miss with OPT.

time Future Past Behavior 18

Hit

A B C C E D D ACache Capacity = 2

D

C

A

Page 19: Back to the Future: Leveraging Belady’s Algorithm for ...isca2016.eecs.umich.edu/wp-content/uploads/2016/07/2A-1.pdfConclusions and Future Work • Recent trend to view cache replacement

OPTgen•  For each line, we ask if this line would have been a

hit or miss with OPT.

19

Miss

D

time Future Past Behavior

A B C C E D D A BCache Capacity = 2

D

C

A

B

Page 20: Back to the Future: Leveraging Belady’s Algorithm for ...isca2016.eecs.umich.edu/wp-content/uploads/2016/07/2A-1.pdfConclusions and Future Work • Recent trend to view cache replacement

OPTgen is equivalent to OPT•  100% accuracy with unlimited history

– 95.5% accuracy with 8× history (for SPEC 2006)

20

Page 21: Back to the Future: Leveraging Belady’s Algorithm for ...isca2016.eecs.umich.edu/wp-content/uploads/2016/07/2A-1.pdfConclusions and Future Work • Recent trend to view cache replacement

OPTgen Algorithm: Insight 1•  OPT hit/miss can be determined at the time of reuse

21

Don’t need information about future accesses because they will have lower priority than A

A B C C E D D A B

D

C

time Future Past Behavior

A

Page 22: Back to the Future: Leveraging Belady’s Algorithm for ...isca2016.eecs.umich.edu/wp-content/uploads/2016/07/2A-1.pdfConclusions and Future Work • Recent trend to view cache replacement

OPTgen Algorithm: Insight 2•  OPT solution can be reproduced by tracking

occupancy rather than cache contents

22

Page 23: Back to the Future: Leveraging Belady’s Algorithm for ...isca2016.eecs.umich.edu/wp-content/uploads/2016/07/2A-1.pdfConclusions and Future Work • Recent trend to view cache replacement

OPTgen Algorithm: Insight 2

23

B

D

A B C C E D D A BCache Capacity = 2

D

C

1 1 2 1 1 2 1

time Future Past Behavior

A

Page 24: Back to the Future: Leveraging Belady’s Algorithm for ...isca2016.eecs.umich.edu/wp-content/uploads/2016/07/2A-1.pdfConclusions and Future Work • Recent trend to view cache replacement

OPTgen Algorithm: Insight 3

•  OPT considers both reuse distances and their overlap

24

E

D

A B C C E D D A ECache Capacity = 2

D

C

time Future Past Behavior

E’s reuse distance is smaller than A, but it misses with OPT because it has higher overlap

A

Page 25: Back to the Future: Leveraging Belady’s Algorithm for ...isca2016.eecs.umich.edu/wp-content/uploads/2016/07/2A-1.pdfConclusions and Future Work • Recent trend to view cache replacement

Sampling for OPTgen

•  8× history for a 2MB cache –  100K entries (> 0.5MB)

•  Set Dueling [Qureshi et al., ISCA 07] –  Sample the behavior of a

few cache sets –  64 sampled sets

•  Set Dueling with OPTgen –  Apply OPTgen to 64

sampled sets –  2.5K entries (12KB) –  95% accuracy in

estimating miss rate

25

Page 26: Back to the Future: Leveraging Belady’s Algorithm for ...isca2016.eecs.umich.edu/wp-content/uploads/2016/07/2A-1.pdfConclusions and Future Work • Recent trend to view cache replacement

OPTgen" PC-based Predictor"

26

Hawkeye: Overall Design

Last Level Cache"

95% accurate

Cache Access Stream

OPT hit/miss

Insertion Priority

Page 27: Back to the Future: Leveraging Belady’s Algorithm for ...isca2016.eecs.umich.edu/wp-content/uploads/2016/07/2A-1.pdfConclusions and Future Work • Recent trend to view cache replacement

Evaluation•  Cache Replacement Championship Simulator (CRC) •  Benchmarks: Memory-intensive SPEC 2006 •  Hardware Configurations

–  Single Core: Last-level Cache: 16-way, 2MB LLC – Multi-core: Shared Cache: 16-way, 4MB & 8MB LLC

•  Replacement policies – Baseline: LRU – DRRIP [2010] –  SDBP [2011] –  SHiP [2012]

27

Page 28: Back to the Future: Leveraging Belady’s Algorithm for ...isca2016.eecs.umich.edu/wp-content/uploads/2016/07/2A-1.pdfConclusions and Future Work • Recent trend to view cache replacement

-30 -20 -10

0 10 20 30 40 50 60

asta

r gr

omac

s go

bmk

lbm

om

netp

p ca

lcul

ix

gcc

lesl

ie

zeus

lib

q bz

ip

h264

to

nto

hmm

er

xala

nc

mcf

so

plex

ge

ms

cact

us

sphi

nx3

Mea

n

Mis

s Red

uctio

n ov

er L

RU

(%)

DRRIP SHIP SDBP

28

Miss Reduction

Page 29: Back to the Future: Leveraging Belady’s Algorithm for ...isca2016.eecs.umich.edu/wp-content/uploads/2016/07/2A-1.pdfConclusions and Future Work • Recent trend to view cache replacement

-30 -20 -10

0 10 20 30 40 50 60

asta

r gr

omac

s go

bmk

lbm

om

netp

p ca

lcul

ix

gcc

lesl

ie

zeus

lib

q bz

ip

h264

to

nto

hmm

er

xala

nc

mcf

so

plex

ge

ms

cact

us

sphi

nx3

Mea

n

Mis

s Red

uctio

n ov

er L

RU

(%)

DRRIP SHIP SDBP Hawkeye

17.4%

Hawkeye outperforms DRRIP, SDBP and SHiP 29

Miss Reduction

Page 30: Back to the Future: Leveraging Belady’s Algorithm for ...isca2016.eecs.umich.edu/wp-content/uploads/2016/07/2A-1.pdfConclusions and Future Work • Recent trend to view cache replacement

-30 -20 -10

0 10 20 30 40 50 60

asta

r gr

omac

s go

bmk

lbm

om

netp

p ca

lcul

ix

gcc

lesl

ie

zeus

lib

q bz

ip

h264

to

nto

hmm

er

xala

nc

mcf

so

plex

ge

ms

cact

us

sphi

nx3

Mea

n

Mis

s Red

uctio

n ov

er L

RU

(%)

DRRIP SHIP SDBP Hawkeye

17.4%

Hawkeye does not result in a slowdown for any benchmark 30

Miss Reduction

Page 31: Back to the Future: Leveraging Belady’s Algorithm for ...isca2016.eecs.umich.edu/wp-content/uploads/2016/07/2A-1.pdfConclusions and Future Work • Recent trend to view cache replacement

-30 -20 -10

0 10 20 30 40 50 60

asta

r gr

omac

s go

bmk

lbm

om

netp

p ca

lcul

ix

gcc

lesl

ie

zeus

lib

q bz

ip

h264

to

nto

hmm

er

xala

nc

mcf

so

plex

ge

ms

cact

us

sphi

nx3

Mea

n

Mis

s Red

uctio

n ov

er L

RU

(%)

DRRIP SHIP SDBP Hawkeye

17.4%

Hawkeye’s performance gains are consistent across benchmarks 31

Miss Reduction

Page 32: Back to the Future: Leveraging Belady’s Algorithm for ...isca2016.eecs.umich.edu/wp-content/uploads/2016/07/2A-1.pdfConclusions and Future Work • Recent trend to view cache replacement

-30 -20 -10

0 10 20 30 40 50 60 70

asta

r gr

omac

s go

bmk

lbm

om

netp

p ca

lcul

ix

gcc

lesl

ie

zeus

lib

q bz

ip

h264

to

nto

hmm

er

xala

nc

mcf

so

plex

ge

ms

cact

us

sphi

nx3

Mea

n

Mis

s Red

uctio

n ov

er L

RU

(%)

DRRIP SHIP SDBP Hawkeye OPT

32

Miss Reduction

Page 33: Back to the Future: Leveraging Belady’s Algorithm for ...isca2016.eecs.umich.edu/wp-content/uploads/2016/07/2A-1.pdfConclusions and Future Work • Recent trend to view cache replacement

-20

-10

0

10

20

30

40

50

asta

r"gr

omac

s"go

bmk"

lbm"

omne

tpp"

calc

ulix"

gcc"

lesl

ie"

zeus"

libq"

bzip

2"h2

64"

tont

o"hm

mer"

xala

n"m

cf"

sopl

ex"

gem

s"ca

ctus"

sphi

nx"

Geo

mea

n"

Spee

dup

(%)"

DRRIP SHiP SDBP Hawkeye

8.4%

33

Speedup

Page 34: Back to the Future: Leveraging Belady’s Algorithm for ...isca2016.eecs.umich.edu/wp-content/uploads/2016/07/2A-1.pdfConclusions and Future Work • Recent trend to view cache replacement

Multi-Core Results

0"

5"

10"

15"

20"

25"

30"

35"

1" 2" 4"

MPK

I Red

uctio

n ov

er L

RU

(%)"

Number of Cores"

SDBP"

SHiP"

Hawkeye"

Speedup = 8.4%

Speedup = 13.5%

Speedup = 15.0%

Averaged across 100s of multi-programmed SPEC runs 34

Page 35: Back to the Future: Leveraging Belady’s Algorithm for ...isca2016.eecs.umich.edu/wp-content/uploads/2016/07/2A-1.pdfConclusions and Future Work • Recent trend to view cache replacement

Hawkeye: Summary

•  New goal: learn from the OPT solution

•  Not limited to specific access patterns

•  Models both reuse distance and demand 0

5

10

15

20

25

30

DIP

DR

RIP

DSB

SDB

P

SHiP

OPT

Haw

keye

MPK

I Red

uctio

n ov

er L

RU (%

)

‘07 ‘10 ‘10 ‘10 ‘11 2015

35

Page 36: Back to the Future: Leveraging Belady’s Algorithm for ...isca2016.eecs.umich.edu/wp-content/uploads/2016/07/2A-1.pdfConclusions and Future Work • Recent trend to view cache replacement

Conclusions and Future Work•  Recent trend to view cache replacement as a

prediction problem – Learn from the past to predict the future

•  Hawkeye learns from an oracle

•  Future Work: More sophisticated predictors to learn the OPT solution for the past

36