49
The V-Way Cache: Demand-Based Associativity via Global Replacement Moinuddin K. Qureshi David Thompson Yale N. Patt Department of Electrical and Computer Engineering, The University of Texas at Austin. {moin, dave, patt}@hps.utexas.edu The International Symposium on Computer Architecture - 2005 1/23

The V-Way Cache: Demand-Based Associativity via Global

  • Upload
    others

  • View
    8

  • Download
    0

Embed Size (px)

Citation preview

Page 1: The V-Way Cache: Demand-Based Associativity via Global

The V-Way Cache:

Demand-Based Associativity via Global Replacement

Moinuddin K. Qureshi David Thompson Yale N. Patt

Department of Electrical and Computer Engineering,

The University of Texas at Austin.

{moin, dave, patt}@hps.utexas.edu

The International Symposium on Computer Architecture - 2005 1/23

Page 2: The V-Way Cache: Demand-Based Associativity via Global

Introduction

Need for efficient management of secondary caches.

Ideal cache: fully associative with OPT replacement.

The International Symposium on Computer Architecture - 2005 2/23

Page 3: The V-Way Cache: Demand-Based Associativity via Global

Introduction

Need for efficient management of secondary caches.

Ideal cache: fully associative with OPT replacement.

0

10

20

30

40

50

Per

cent

age

redu

ctio

n in

mis

s ra

te 256kB 16-way + LRU256kB FA + LRU256kB FA + OPT512kB 8-way + LRU

The International Symposium on Computer Architecture - 2005 2/23

Page 4: The V-Way Cache: Demand-Based Associativity via Global

Fully Associative Caches: Cost v/s Benefit

Benefits

Conflict miss eliminationGlobal Replacement (finds the best victim)

Cost

Significant increase in the number of tag comparisonsIncreased access latencyIncreased power consumption

The International Symposium on Computer Architecture - 2005 3/23

Page 5: The V-Way Cache: Demand-Based Associativity via Global

Fully Associative Caches: Cost v/s Benefit

Benefits

Conflict miss eliminationGlobal Replacement (finds the best victim)

Cost

Significant increase in the number of tag comparisonsIncreased access latencyIncreased power consumption

Can we get the benefits of a fully associative cache without payingthe cost?

The International Symposium on Computer Architecture - 2005 3/23

Page 6: The V-Way Cache: Demand-Based Associativity via Global

Outline

Introduction

Example of Local and Global Replacement

The V-Way Cache

Evaluation

Related Work and Conclusion

The International Symposium on Computer Architecture - 2005 4/23

Page 7: The V-Way Cache: Demand-Based Associativity via Global

Example of Local Replacement

X = {x0, x1, x2, x3}

Y = {y0, y1, y2, y3}

SET A

SET B

x0 x1 x2 x3

y0 y1 y2 y3

WORKING SETdx0

dx1

dx2

dx3

dy0

dy1

dy2

dy3

DATA−STORETAG−STOREADDRESS

The International Symposium on Computer Architecture - 2005 5/23

Page 8: The V-Way Cache: Demand-Based Associativity via Global

Example of Local Replacement

SET A

SET B

x0 x1 x2 x3

y0 y1 y2 y3

WORKING SETdx0

dx1

dx2

dx3

dy0

dy1

dy2

dy3

DATA−STORETAG−STOREADDRESS

X = {x0, x1, x2, x3, x4}

Y = {y0, y1, y2}

The International Symposium on Computer Architecture - 2005 5/23

Page 9: The V-Way Cache: Demand-Based Associativity via Global

Example of Local Replacement

SET A

SET B

x1 x2 x3

y0 y1 y2 y3

WORKING SET

dx1

dx2

dx3

dy0

dy1

dy2

dy3

DATA−STORETAG−STOREADDRESS

X = {x0, x1, x2, x3, x4}

Y = {y0, y1, y2}

x4

dx4

The International Symposium on Computer Architecture - 2005 5/23

Page 10: The V-Way Cache: Demand-Based Associativity via Global

Example of Local Replacement

SET A

SET B

x1 x2 x3

y0 y1 y2 y3

WORKING SET

dx1

dx2

dx3

dy0

dy1

dy2

dy3

DATA−STORETAG−STOREADDRESS

X = {x0, x1, x2, x3, x4}

Y = {y0, y1, y2}

x4

dx4

DORMANT WAY

THRASH

The International Symposium on Computer Architecture - 2005 5/23

Page 11: The V-Way Cache: Demand-Based Associativity via Global

Example of Local Replacement

SET A

SET B

x1 x2 x3

y0 y1 y2 y3

WORKING SET

dx1

dx2

dx3

dy0

dy1

dy2

dy3

DATA−STORETAG−STOREADDRESS

X = {x0, x1, x2, x3, x4}

Y = {y0, y1, y2}

x4

dx4

DORMANT WAY

THRASH

Static partitioning of resources.

The International Symposium on Computer Architecture - 2005 5/23

Page 12: The V-Way Cache: Demand-Based Associativity via Global

Example of Global Replacement

Y = {y0, y1, y2, y3}

X = {x0, x1, x2, x3}

SET B1

SET A1

SET B0

SET A0 x0 x2

x1 x3

y0 y2

y1 y3

dy0

dx3

dx2

dy3

dy1

dx1

dy2

ADDRESS

dx0

DATA−STORETAG−STORE

WORKING SET

REDISTRIBUTED

The International Symposium on Computer Architecture - 2005 6/23

Page 13: The V-Way Cache: Demand-Based Associativity via Global

Example of Global Replacement

SET B1

SET A1

SET B0

SET A0 x0 x2

x1 x3

y0 y2

y1 y3

dy0

dx3

dx2

dy3

dy1

dx1

dy2

ADDRESS

dx0

DATA−STORETAG−STORE

WORKING SET

REDISTRIBUTED

X = {x0, x1, x2, x3, x4}

Y = {y0, y1, y2}

The International Symposium on Computer Architecture - 2005 6/23

Page 14: The V-Way Cache: Demand-Based Associativity via Global

Example of Global Replacement

SET B1

SET A1

SET B0

SET A0 x0 x2

x1 x3

y0 y2

y1 y3

dy0

dx3

dx2

dy3

dy1

dx1

dy2

ADDRESS

dx0

DATA−STORETAG−STORE

WORKING SET

REDISTRIBUTED

X = {x0, x1, x2, x3, x4}

Y = {y0, y1, y2}

The International Symposium on Computer Architecture - 2005 6/23

Page 15: The V-Way Cache: Demand-Based Associativity via Global

Example of Global Replacement

SET B1

SET A1

SET B0

SET A0 x0 x2

x1 x3

y0 y2

y1

dy0

dx3

dx2

dy1

dx1

dy2

ADDRESS

dx0

DATA−STORETAG−STORE

WORKING SET

REDISTRIBUTED

X = {x0, x1, x2, x3, x4}

Y = {y0, y1, y2}

x4

dx4

The International Symposium on Computer Architecture - 2005 6/23

Page 16: The V-Way Cache: Demand-Based Associativity via Global

Example of Global Replacement

SET B1

SET A1

SET B0

SET A0 x0 x2

x1 x3

y0 y2

y1

dy0

dx3

dx2

dy1

dx1

dy2

ADDRESS

dx0

DATA−STORETAG−STORE

WORKING SET

REDISTRIBUTED

X = {x0, x1, x2, x3, x4}

Y = {y0, y1, y2}

x4

dx4

Dynamic sharing of resources!!

The International Symposium on Computer Architecture - 2005 6/23

Page 17: The V-Way Cache: Demand-Based Associativity via Global

Outline

Introduction

Example of Local and Global Replacement

The V-Way Cache

Evaluation

Related Work and Conclusion

The International Symposium on Computer Architecture - 2005 7/23

Page 18: The V-Way Cache: Demand-Based Associativity via Global

The V-Way Cache

TAGCOMPARE

FPTRSELECT

HIT

RPTRV

ARRAY

SCHEME

GLOBALDATA

DATA

FPTRTAGSTATUS

REPLACEMENT

INDEX OFFTAG

TAG−STORE DATA−STORE

The International Symposium on Computer Architecture - 2005 8/23

Page 19: The V-Way Cache: Demand-Based Associativity via Global

The V-Way Cache

TAGCOMPARE

FPTRSELECT

HIT

RPTRV

ARRAY

SCHEME

GLOBALDATA

DATA

FPTRTAGSTATUS

REPLACEMENT

INDEX OFFTAG

TAG−STORE DATA−STORE

The International Symposium on Computer Architecture - 2005 8/23

Page 20: The V-Way Cache: Demand-Based Associativity via Global

The V-Way Cache

TAGCOMPARE

FPTRSELECT

RPTRV

ARRAY

SCHEME

GLOBAL

DATA

FPTRTAGSTATUS

REPLACEMENT

INDEX OFFTAG

TAG−STORE DATA−STORE

DATA

HIT

The International Symposium on Computer Architecture - 2005 8/23

Page 21: The V-Way Cache: Demand-Based Associativity via Global

The V-Way Cache

TAGCOMPARE

FPTRSELECT

RPTRV

ARRAY

SCHEME

GLOBALDATA

DATA

FPTRTAGSTATUS

REPLACEMENT

INDEX OFFTAG

TAG−STORE DATA−STORE

MISS

The International Symposium on Computer Architecture - 2005 8/23

Page 22: The V-Way Cache: Demand-Based Associativity via Global

The V-Way Cache

TAGCOMPARE

FPTRSELECT

RPTRV

ARRAY

SCHEME

GLOBALDATA

DATA

FPTRTAGSTATUS

REPLACEMENT

INDEX OFFTAG

TAG−STORE DATA−STORE

MISS

The International Symposium on Computer Architecture - 2005 8/23

Page 23: The V-Way Cache: Demand-Based Associativity via Global

The V-Way Cache

TAGCOMPARE

FPTRSELECT

RPTRV

ARRAY

SCHEME

GLOBALDATA

DATA

FPTRTAGSTATUS

REPLACEMENT

INDEX OFFTAG

TAG−STORE DATA−STORE

MISS

Configuration Tag Access Data Replacement

Set-Associative Fast Local

Fully-Associative

V-Way

The International Symposium on Computer Architecture - 2005 8/23

Page 24: The V-Way Cache: Demand-Based Associativity via Global

The V-Way Cache

TAGCOMPARE

FPTRSELECT

RPTRV

ARRAY

SCHEME

GLOBALDATA

DATA

FPTRTAGSTATUS

REPLACEMENT

INDEX OFFTAG

TAG−STORE DATA−STORE

MISS

Configuration Tag Access Data Replacement

Set-Associative Fast Local

Fully-Associative Slow Global

V-Way

The International Symposium on Computer Architecture - 2005 8/23

Page 25: The V-Way Cache: Demand-Based Associativity via Global

The V-Way Cache

TAGCOMPARE

FPTRSELECT

RPTRV

ARRAY

SCHEME

GLOBALDATA

DATA

FPTRTAGSTATUS

REPLACEMENT

INDEX OFFTAG

TAG−STORE DATA−STORE

MISS

Configuration Tag Access Data Replacement

Set-Associative Fast Local

Fully-Associative Slow Global

V-Way Fast Global

The International Symposium on Computer Architecture - 2005 8/23

Page 26: The V-Way Cache: Demand-Based Associativity via Global

A Practical Global Replacement Algorithm

LRU is impractical because there are thousands of lines

Second level cache access stream is a filtered version of theprogram access stream

Reuse frequency is skewed towards the low end0 1 2 3 4 5 6 7 8 9 10 11 12 13 14

15+

Reuse count

0

10

20

30

40

Per

cent

age

of L

2 ev

icti

ons

The International Symposium on Computer Architecture - 2005 9/23

Page 27: The V-Way Cache: Demand-Based Associativity via Global

Reuse Replacement

00 01 10 11

VICTIMIZE

INITIALIZE

TESTTESTTESTTEST

TABLE

PTR

REUSE COUNTER

ACCESS ACCESS ACCESS

ACCESS

The International Symposium on Computer Architecture - 2005 10/23

Page 28: The V-Way Cache: Demand-Based Associativity via Global

Reuse Replacement

00 01 10 11

VICTIMIZE

INITIALIZE

TESTTESTTESTTEST

ACCESS ACCESS ACCESS

ACCESS

TABLE

PTR

REUSE COUNTER

11

01

00

The International Symposium on Computer Architecture - 2005 10/23

Page 29: The V-Way Cache: Demand-Based Associativity via Global

Reuse Replacement

00 01 10 11

VICTIMIZE

INITIALIZE

TESTTESTTESTTEST

TABLEREUSE COUNTER

PTR

10

01

00ACCESS ACCESS ACCESS

ACCESS

The International Symposium on Computer Architecture - 2005 10/23

Page 30: The V-Way Cache: Demand-Based Associativity via Global

Reuse Replacement

00 01 10 11

VICTIMIZE

INITIALIZE

TESTTESTTESTTEST

ACCESS ACCESS

ACCESS

TABLEREUSE COUNTER

PTR

10

00

00ACCESS

The International Symposium on Computer Architecture - 2005 10/23

Page 31: The V-Way Cache: Demand-Based Associativity via Global

Reuse Replacement

00 01 10 11

VICTIMIZE

INITIALIZE

TESTTESTTESTTEST

ACCESS

TABLEREUSE COUNTER

PTR

10

00

00ACCESS ACCESS

ACCESS

The International Symposium on Computer Architecture - 2005 10/23

Page 32: The V-Way Cache: Demand-Based Associativity via Global

Victim Distance for Reuse Replacement

Problem of variable replacement latencyAverage victim distance: 3.9Worst case victim distance: 1888

The International Symposium on Computer Architecture - 2005 11/23

Page 33: The V-Way Cache: Demand-Based Associativity via Global

Victim Distance for Reuse Replacement

Problem of variable replacement latencyAverage victim distance: 3.9Worst case victim distance: 1888

SolutionTest eight counters each cycleLimit search to five cycles

1 2 3 4 5

98.9%Probability (victim)

Latency (in cycles)

91.3% 96.9% 99.2%98.3%

The International Symposium on Computer Architecture - 2005 11/23

Page 34: The V-Way Cache: Demand-Based Associativity via Global

Outline

Introduction

Example of Local and Global Replacement

The V-Way Cache

Evaluation

Related Work and Conclusion

The International Symposium on Computer Architecture - 2005 12/23

Page 35: The V-Way Cache: Demand-Based Associativity via Global

Evaluation Outline

Experimental Methodology

Reduction in Misses with the V-Way Cache

Comparing Reuse Replacement and LRU

Storage, Latency, and Energy Cost

Impact on IPC

The International Symposium on Computer Architecture - 2005 13/23

Page 36: The V-Way Cache: Demand-Based Associativity via Global

Experimental Methodology

First level I-cache, D-cache: 16kB, 2-way, 64B linesize, LRU

Baseline L2: Unified, 256kB, 8-way, 128B linesize, LRU

Benchmarks: SPEC CPU2000

The International Symposium on Computer Architecture - 2005 14/23

Page 37: The V-Way Cache: Demand-Based Associativity via Global

Reduction in Misses with the V-Way Cache

Primary upper bound: Fully associative cache

Secondary upper bound: Double sized cache

The International Symposium on Computer Architecture - 2005 15/23

Page 38: The V-Way Cache: Demand-Based Associativity via Global

Reduction in Misses with the V-Way Cache

Primary upper bound: Fully associative cache

Secondary upper bound: Double sized cache

0

10

20

30

40

50

60

70

80

90

100

Per

cent

age

redu

ctio

n in

mis

s ra

te 256kB V-way + Reuse

256kB FA + LRU512kB 8-way + LRU

perlb

mkcra

ftymes

afac

erec

vorte

xap

simcf vp

r

gcc

ammp

twolf

bzip2

parse

rgz

ipsw

imga

lgel

amea

n

The International Symposium on Computer Architecture - 2005 15/23

Page 39: The V-Way Cache: Demand-Based Associativity via Global

Comparing Reuse Replacement and LRU

Comparison of miss-rate for LRU and Reuse replacement

Bmk bzip2 crafty gcc gzip mcf parser perl. twolf vortex

LRU 34.6 1.1 3.8 2.4 29.5 32.7 0.1 36.5 8.5

Reuse 35.0 1.0 3.8 2.4 29.9 32.9 0.1 35.4 7.1

Bmk vpr ammp apsi facerec galgel mesa swim amean

LRU 11.0 50.0 34.8 50.7 8.3 3.4 65.3 23.3

Reuse 10.5 50.0 34.8 50.6 8.5 3.5 65.3 23.2

The International Symposium on Computer Architecture - 2005 16/23

Page 40: The V-Way Cache: Demand-Based Associativity via Global

Storage, Latency, and Energy Cost

Storage needed for extra tags, FPTR, RPTR, and Reuse bits

Line-size Miss-rate reduction Increase in area

128 B 13.2% 5.8%

256 B 14.9% 2.9%

Delay due to more tags and FPTR selection: 0.13 ns

Energy in accessing bigger tag-store

Parallel lookup Baseline V-Way

1.02nJ 0.35nJ 0.40nJ

The International Symposium on Computer Architecture - 2005 17/23

Page 41: The V-Way Cache: Demand-Based Associativity via Global

Storage, Latency, and Energy Cost

Storage needed for extra tags, FPTR, RPTR, and Reuse bits

Line-size Miss-rate reduction Increase in area

128 B 13.2% 5.8%

256 B 14.9% 2.9%

Delay due to more tags and FPTR selection: 0.13 ns

Energy in accessing bigger tag-store

Parallel lookup Baseline V-Way

1.02nJ 0.35nJ 0.40nJ

The International Symposium on Computer Architecture - 2005 17/23

Page 42: The V-Way Cache: Demand-Based Associativity via Global

Storage, Latency, and Energy Cost

Storage needed for extra tags, FPTR, RPTR, and Reuse bits

Line-size Miss-rate reduction Increase in area

128 B 13.2% 5.8%

256 B 14.9% 2.9%

Delay due to more tags and FPTR selection: 0.13 ns

Energy in accessing bigger tag-store

Parallel lookup Baseline V-Way

1.02nJ 0.35nJ 0.40nJ

The International Symposium on Computer Architecture - 2005 17/23

Page 43: The V-Way Cache: Demand-Based Associativity via Global

Impact on IPC

Pipeline: 12 stage, 8 wide with 128 entry reservation station

L1 hit latency of 2 cycles and L2 hit latency of 10 cycles

L3/Main memory: access-latency of 80 cycles

The International Symposium on Computer Architecture - 2005 18/23

Page 44: The V-Way Cache: Demand-Based Associativity via Global

Impact on IPC

Pipeline: 12 stage, 8 wide with 128 entry reservation station

L1 hit latency of 2 cycles and L2 hit latency of 10 cycles

L3/Main memory: access-latency of 80 cycles

0

5

10

15

20

25

30

35

40

45

Per

cent

age

IPC

impr

ovem

ent

over

bas

e

V-Way (same hit latency as base)V-Way (additional one cycle hit latency)

bzip2

crafty gcc

gzip

mcfpa

rser

perlb

mktw

olfvo

rtex

vpr

ammp

apsi

facere

cga

lgel

mesa

swim

gmea

n

The International Symposium on Computer Architecture - 2005 18/23

Page 45: The V-Way Cache: Demand-Based Associativity via Global

Outline

Introduction

Example of Local and Global Replacement

The V-Way Cache

Evaluation

Related Work and Conclusion

The International Symposium on Computer Architecture - 2005 19/23

Page 46: The V-Way Cache: Demand-Based Associativity via Global

Related Work

Extra storage for conflict misses: Victim cache [Jouppi ISCA’90]

Multi-probe techniquesPredictive sequential associative cache [Calder+ HPCA’96]

Adaptive group associative cache [Peir+ ASPLOS’98]

Cache indexing functionSkewed associativity [Seznec ISCA’93]

Prime-modulo indexing [Kharbutli+ HPCA’04]

Software managed fully associative cache: IIC [Hallnor+ ISCA’00]

The International Symposium on Computer Architecture - 2005 20/23

Page 47: The V-Way Cache: Demand-Based Associativity via Global

Other Possible Applications of the V-Way Cache

Platform for global replacement with inbuilt shadow directory

Tag inclusion data exclusion [Piranha ISCA’00]

Cache compression [Hallnor+ HPCA’05]

Interaction with NuRAPID [Chishti+ MICRO’03]

The International Symposium on Computer Architecture - 2005 21/23

Page 48: The V-Way Cache: Demand-Based Associativity via Global

Conclusion

Traditional cache assumes uniform accesses across sets

Global replacement allows the V-Way cache to vary thenumber of valid ways depending on the set demand

Reuse replacement is fast and performs comparable to LRU

V-Way cache can lower miss-rate and improve performance

V-Way cache can serve as an infrastructure for otheroptimizations

The International Symposium on Computer Architecture - 2005 22/23

Page 49: The V-Way Cache: Demand-Based Associativity via Global

Questions

The International Symposium on Computer Architecture - 2005 23/23