26
Adaptive Subset Based Replacement Policy for High Performance Caching Liqiang He Yan Sun Chaozhong Zhang College of Computer Science, Inner Mongolia University Hohhot, Inner Mongolia, P. R. China JWAC-1: Cache Replacement Championship 2010-06-20 ISCA-2010

College of Computer Science, Inner Mongolia University Hohhot, Inner Mongolia, P. R. China

  • Upload
    mills

  • View
    40

  • Download
    0

Embed Size (px)

DESCRIPTION

Adaptive Subset Based Replacement Policy for High Performance Caching Liqiang He Yan Sun Chaozhong Zhang. College of Computer Science, Inner Mongolia University Hohhot, Inner Mongolia, P. R. China. JWAC-1: Cache Replacement Championship. 2010-06-20. ISCA-2010. Inner Mongolia University. - PowerPoint PPT Presentation

Citation preview

Page 1: College of Computer Science, Inner Mongolia University Hohhot, Inner Mongolia, P. R. China

Adaptive Subset Based Replacement Policy for High Performance Caching

Liqiang He Yan Sun Chaozhong Zhang

College of Computer Science, Inner Mongolia University

Hohhot, Inner Mongolia, P. R. China

JWAC-1: Cache Replacement Championship2010-06-20

ISCA-2010

Page 2: College of Computer Science, Inner Mongolia University Hohhot, Inner Mongolia, P. R. China

Background Cache Replacement Policy plays an

important role in a cache design. LRU policy is widely used in nowadays

microprocessor The LLC has poor locality due to the L1

already filters temporal locality LRU causes thrashing when working set >

cache size

Inner Mongolia University

College of Computer Science JWAC-1: Cache Replacement Championship

Page 3: College of Computer Science, Inner Mongolia University Hohhot, Inner Mongolia, P. R. China

Possible solution if working set > cache size, retain some working

set [Qureshi, et al, ISCA’07] record part of a longer cache access history

College of Computer Science

Inner Mongolia University

How we do it?

Grouping a cache set and keeping part of access history in each group.

Inspired by the thread migration paper of Pierre at HPCA’04

L2 L2 L2

C0 C1 Cn

L2 L2 L2

g0 g1 gn

JWAC-1: Cache Replacement Championship

Page 4: College of Computer Science, Inner Mongolia University Hohhot, Inner Mongolia, P. R. China

Overview

Proposal: Subset Based Replacement Policy (SRP)

Inner Mongolia University

College of Computer Science

ASRP obtains a 4.5 % of geometric average miss reduction over LRU.

JWAC-1: Cache Replacement Championship

SRP successfully reduces the misses through retaining part of longer history in the groups.

But the static SRP does not suitable for different programs.

To adapt the diversity of programs and the behavior changing inside a program, we propose Adaptive SRP policy (ASRP).

Page 5: College of Computer Science, Inner Mongolia University Hohhot, Inner Mongolia, P. R. China

Outline

Introduction

Static Subset Based Replacement Policy

Adaptive Subset Based Replacement Policy

Summary

College of Computer Science

Inner Mongolia University

JWAC-1: Cache Replacement Championship

Page 6: College of Computer Science, Inner Mongolia University Hohhot, Inner Mongolia, P. R. China

Static Subset Based Replacement Policy

Inner Mongolia University

College of Computer Science JWAC-1: Cache Replacement Championship

subset

subset

subset

subset

Cache set

Active:Accept insertion

Non-Active

Local LRU Stack

Page 7: College of Computer Science, Inner Mongolia University Hohhot, Inner Mongolia, P. R. China

Insertion scheme in SRP

Inner Mongolia University

College of Computer Science JWAC-1: Cache Replacement Championship

Insertion only occurs in active subset

Choose victim at LRU position. Do NOT promote to MRU

a b c dMRU LRU

a b c i

Reference to ‘i’

blocks in active subset

Page 8: College of Computer Science, Inner Mongolia University Hohhot, Inner Mongolia, P. R. China

Operation on cache hit in SRP

Inner Mongolia University

College of Computer Science JWAC-1: Cache Replacement Championship

hit in any (active or non-active) subset

a b c dMRU LRU

Reference to ‘c’

c a b d

Move to local MRU position

Page 9: College of Computer Science, Inner Mongolia University Hohhot, Inner Mongolia, P. R. China

Changing of active subset When the misses in a set > a threshold X, change active subset

Inner Mongolia University

College of Computer Science JWAC-1: Cache Replacement Championship

Thus:

A. force X consecutive misses only replacing the blocks in active subset

B. assume N subsets, then a subset can change to active again ONLY after (N-1)*X misses

C. a greater value of X, a longer time that blocks in non-active subsets can stay in a set

Page 10: College of Computer Science, Inner Mongolia University Hohhot, Inner Mongolia, P. R. China

Thrashing access pattern in SRP

College of Computer Science

Inner Mongolia University

JWAC-1: Cache Replacement Championship

b1 b2 b3 b4 b5 b6 b7 b8 b9 b10 b11 b12 b13 b14 b15 b16 b17 ….. b24

x = 6

assume working set is 24 blocks, LLC is 16-way, 4 subsets, 4 blocks/subset

b1

b2

b3

b4

LRU

MRU

Subset 0

b5b6 b7

b8

b9

b10

b11b12

Subset 1

b6

b2

b3

b4

Blocks in a set with SRP: b2b3b4b6 b8b9b10b12 b14b15b16b18 b20b21b22b24

Blocks in a set with LRU: b9 ….. b24

When access b2b3b4b6b8 again, SRP hits but LRU misses

Page 11: College of Computer Science, Inner Mongolia University Hohhot, Inner Mongolia, P. R. China

Case Study of thrashing workload

Inner Mongolia University

4

4.5

5

5.5

6

6.5

7

7.5

1 2 4 8 16 32 64 128 256 512 1k 2k

Threshold

Mis

ses

pe

r 1

K in

stru

ctio

ns

SRP

LRU

College of Computer Science JWAC-1: Cache Replacement Championship

Different static thresholds have different abilities to reduce misses

Page 12: College of Computer Science, Inner Mongolia University Hohhot, Inner Mongolia, P. R. China

Hardware implementationInner Mongolia University

College of Computer Science JWAC-1: Cache Replacement Championship

MRU

LRU

Page 13: College of Computer Science, Inner Mongolia University Hohhot, Inner Mongolia, P. R. China

Results

Inner Mongolia University

College of Computer Science JWAC-1: Cache Replacement Championship

0.6

0.8

1

1.2

1.4

1.6

1.8

(%)

Impr

ovem

ent o

f mis

ses

over

LR

U

threshold 2

threshold 4

threshold 8

• SRP reduces misses for thrashing workloads but increases for LRU-friendly ones.• Not exist a threshold that is suitable for all benchmarks

Page 14: College of Computer Science, Inner Mongolia University Hohhot, Inner Mongolia, P. R. China

College of Computer Science

Inner Mongolia University

JWAC-1: Cache Replacement Championship

Outline

Introduction

Static Subset Based Replacement Policy

Adaptive Subset Based Replacement Policy

Summary

Page 15: College of Computer Science, Inner Mongolia University Hohhot, Inner Mongolia, P. R. China

College of Computer Science

Inner Mongolia University

JWAC-1: Cache Replacement Championship

Adaptive SRP policy

Different programs prefer different thresholds.

Victim selection and insertion policy are same as in SRP

ONLY difference: threshold is selected dynamically from a pool of values according to which one causes fewest misses. The maximum threshold is 128 Pick eight values: 20, 21, …, 27

Apply the best threshold value to the cache

In ASRP policy:

Page 16: College of Computer Science, Inner Mongolia University Hohhot, Inner Mongolia, P. R. China

+

+

ASRP policy via “Set Dueling” Divide the cache into two

type: Sampling sets (eight

thresholds * 4sets/thres.)

Follower sets

Eight counters misses to threshold X’s

sampling sets: counter_x++

Counters decides threshold for Follower sets:

counter with smallest value

Thres-20-sets

Follower Sets

Thres-21-sets

Thres-27-sets

Cntr_0

miss

Cntr_7

Eight thresholds

JWAC-1: Cache Replacement ChampionshipCollege of Computer Science

Inner Mongolia University

Page 17: College of Computer Science, Inner Mongolia University Hohhot, Inner Mongolia, P. R. China

Resetting mechanism

Eight thresholds

last_follow

=

global_follow

Y++

N --

threshold

>?

Cntr_0

Cntr_7

reset

JWAC-1: Cache Replacement ChampionshipCollege of Computer Science

Inner Mongolia University

To avoid the accumulative effect of a big value in a specific Cnrt_x

Record the times of a same threshold is selected by the follower sets

When the times > a threshold, reset all the Cntr_Xs

Page 18: College of Computer Science, Inner Mongolia University Hohhot, Inner Mongolia, P. R. China

College of Computer Science

Inner Mongolia University

JWAC-1: Cache Replacement Championship

Budget

Totally 45K bits

only 70% of the budget used by LRU policy, and 35% of the total budget provided by this championship

Page 19: College of Computer Science, Inner Mongolia University Hohhot, Inner Mongolia, P. R. China

College of Computer Science

Inner Mongolia University

Results

0.8

0.9

1

1.1

1.2

1.3

1.4

1.5

1.6

asta

rbw

aves

bzip

2ca

ctus

ADM

calc

ulix

deal

IIga

mes

sgc

cG

emsF

DTD

gobm

kgr

omac

sh2

46re

fhm

mer

lbm

lesl

ie3d

libqu

antu

mm

cfm

ilcna

md

omne

tpp

perlb

ench

pova

rysj

eng

sopl

exsp

hix3

tont

oxa

lanc

bmk

zeus

mp

aver

age

(%)

Imp

rove

me

nt o

f mis

ses

ove

r L

RU

DIP ASRP

For 1MB 16-ways LLC. ASRP gets a geometric average speedup of 4.5% over LRU

JWAC-1: Cache Replacement Championship

Page 20: College of Computer Science, Inner Mongolia University Hohhot, Inner Mongolia, P. R. China

Analyze

College of Computer Science

Inner Mongolia University

4

4.5

5

5.5

6

6.5

7

7.5

1 2 4 8 16 32 64 128 256 512 1k 2kThreshold

Mis

ses

pe

r 1

K in

stru

ctio

ns

SRP

LRU

ASRP

7

7.1

7.2

7.3

7.4

7.5

7.6

7.7

7.8

1 2 4 8 16 32 64 128 256 512 1k 2kThreshold

Mis

ses

pe

r 1

K in

stru

ctio

ns

SRPLRUASRP

xalancbmk GemsFDTD

JWAC-1: Cache Replacement Championship

The sampling mechanism does help ASRP to find the best thresholds for different programs

Page 21: College of Computer Science, Inner Mongolia University Hohhot, Inner Mongolia, P. R. China

Conclusion

Keeping part of working set in the cache helps reducing misses when the cache suffers a thrashing problem

The part of longer access history helps SRP more accurately capturing the frequently used blocks

Different programs and different phases of a program prefer different thresholds to contribute maximum hits to the cache

“Set Dueling” helps ASRP dynamically selecting a suitable threshold

The experiment results show the effectiveness of ASRP policy

Inner Mongolia University

College of Computer Science JWAC-1: Cache Replacement Championship

Page 22: College of Computer Science, Inner Mongolia University Hohhot, Inner Mongolia, P. R. China

Thank you!

Any question?

College of Computer Science

Inner Mongolia University

JWAC-1: Cache Replacement Championship

Page 23: College of Computer Science, Inner Mongolia University Hohhot, Inner Mongolia, P. R. China

Result on multi-core processor

0.6

0.8

1

1.2

1.4

1.6

1.8

2

2.2

2.4

asta

ras

tar

asta

r

bwav

es

GemsF

DTD

omne

tpp

hmm

er

GemsF

DTD

hmm

er

xalan

cbm

k

(%)

Imp

rove

me

nt

of

mis

ses

ove

r L

RU

DIP ASRP

College of Computer Science

Inner Mongolia University

JWAC-1: Cache Replacement Championship

0.6

0.8

1

1.2

1.4

1.6

1.8

2

asta

rbw

aves

asta

rbw

aves

asta

rhm

mer

sphi

x3xa

lanc

bmk

bzip

2G

emsF

DTD

gobm

kom

netp

p

hmm

erxa

lanc

bmk

Gem

sFD

TDom

netp

p

(%)

Imp

rove

me

nt o

f mis

ses

ove

r L

RU DIP ASRP

Page 24: College of Computer Science, Inner Mongolia University Hohhot, Inner Mongolia, P. R. China

7

7.1

7.2

7.3

7.4

7.5

7.6

7.7

7.8

1 2 4 8 16 32 64 128 256 512 1k 2k

Threshold

Mis

ses

per

1K in

stru

ctio

ns

SRP

LRU

Case Study of LRU-friendly workload

Inner Mongolia University

College of Computer Science JWAC-1: Cache Replacement Championship

Page 25: College of Computer Science, Inner Mongolia University Hohhot, Inner Mongolia, P. R. China

Inner Mongolia University

College of Computer Science JWAC-1: Cache Replacement Championship

Explanation of active subset changing

Page 26: College of Computer Science, Inner Mongolia University Hohhot, Inner Mongolia, P. R. China

A simple example of SRP policy

Inner Mongolia University

College of Computer Science JWAC-1: Cache Replacement Championship