Upload
mills
View
40
Download
0
Embed Size (px)
DESCRIPTION
Adaptive Subset Based Replacement Policy for High Performance Caching Liqiang He Yan Sun Chaozhong Zhang. College of Computer Science, Inner Mongolia University Hohhot, Inner Mongolia, P. R. China. JWAC-1: Cache Replacement Championship. 2010-06-20. ISCA-2010. Inner Mongolia University. - PowerPoint PPT Presentation
Citation preview
Adaptive Subset Based Replacement Policy for High Performance Caching
Liqiang He Yan Sun Chaozhong Zhang
College of Computer Science, Inner Mongolia University
Hohhot, Inner Mongolia, P. R. China
JWAC-1: Cache Replacement Championship2010-06-20
ISCA-2010
Background Cache Replacement Policy plays an
important role in a cache design. LRU policy is widely used in nowadays
microprocessor The LLC has poor locality due to the L1
already filters temporal locality LRU causes thrashing when working set >
cache size
Inner Mongolia University
College of Computer Science JWAC-1: Cache Replacement Championship
Possible solution if working set > cache size, retain some working
set [Qureshi, et al, ISCA’07] record part of a longer cache access history
College of Computer Science
Inner Mongolia University
How we do it?
Grouping a cache set and keeping part of access history in each group.
Inspired by the thread migration paper of Pierre at HPCA’04
L2 L2 L2
C0 C1 Cn
L2 L2 L2
g0 g1 gn
JWAC-1: Cache Replacement Championship
Overview
Proposal: Subset Based Replacement Policy (SRP)
Inner Mongolia University
College of Computer Science
ASRP obtains a 4.5 % of geometric average miss reduction over LRU.
JWAC-1: Cache Replacement Championship
SRP successfully reduces the misses through retaining part of longer history in the groups.
But the static SRP does not suitable for different programs.
To adapt the diversity of programs and the behavior changing inside a program, we propose Adaptive SRP policy (ASRP).
Outline
Introduction
Static Subset Based Replacement Policy
Adaptive Subset Based Replacement Policy
Summary
College of Computer Science
Inner Mongolia University
JWAC-1: Cache Replacement Championship
Static Subset Based Replacement Policy
Inner Mongolia University
College of Computer Science JWAC-1: Cache Replacement Championship
subset
subset
subset
subset
Cache set
Active:Accept insertion
Non-Active
Local LRU Stack
Insertion scheme in SRP
Inner Mongolia University
College of Computer Science JWAC-1: Cache Replacement Championship
Insertion only occurs in active subset
Choose victim at LRU position. Do NOT promote to MRU
a b c dMRU LRU
a b c i
Reference to ‘i’
blocks in active subset
Operation on cache hit in SRP
Inner Mongolia University
College of Computer Science JWAC-1: Cache Replacement Championship
hit in any (active or non-active) subset
a b c dMRU LRU
Reference to ‘c’
c a b d
Move to local MRU position
Changing of active subset When the misses in a set > a threshold X, change active subset
Inner Mongolia University
College of Computer Science JWAC-1: Cache Replacement Championship
Thus:
A. force X consecutive misses only replacing the blocks in active subset
B. assume N subsets, then a subset can change to active again ONLY after (N-1)*X misses
C. a greater value of X, a longer time that blocks in non-active subsets can stay in a set
Thrashing access pattern in SRP
College of Computer Science
Inner Mongolia University
JWAC-1: Cache Replacement Championship
b1 b2 b3 b4 b5 b6 b7 b8 b9 b10 b11 b12 b13 b14 b15 b16 b17 ….. b24
x = 6
assume working set is 24 blocks, LLC is 16-way, 4 subsets, 4 blocks/subset
b1
b2
b3
b4
LRU
MRU
Subset 0
b5b6 b7
b8
b9
b10
b11b12
Subset 1
b6
b2
b3
b4
Blocks in a set with SRP: b2b3b4b6 b8b9b10b12 b14b15b16b18 b20b21b22b24
Blocks in a set with LRU: b9 ….. b24
When access b2b3b4b6b8 again, SRP hits but LRU misses
Case Study of thrashing workload
Inner Mongolia University
4
4.5
5
5.5
6
6.5
7
7.5
1 2 4 8 16 32 64 128 256 512 1k 2k
Threshold
Mis
ses
pe
r 1
K in
stru
ctio
ns
SRP
LRU
College of Computer Science JWAC-1: Cache Replacement Championship
Different static thresholds have different abilities to reduce misses
Hardware implementationInner Mongolia University
College of Computer Science JWAC-1: Cache Replacement Championship
MRU
LRU
Results
Inner Mongolia University
College of Computer Science JWAC-1: Cache Replacement Championship
0.6
0.8
1
1.2
1.4
1.6
1.8
(%)
Impr
ovem
ent o
f mis
ses
over
LR
U
threshold 2
threshold 4
threshold 8
• SRP reduces misses for thrashing workloads but increases for LRU-friendly ones.• Not exist a threshold that is suitable for all benchmarks
College of Computer Science
Inner Mongolia University
JWAC-1: Cache Replacement Championship
Outline
Introduction
Static Subset Based Replacement Policy
Adaptive Subset Based Replacement Policy
Summary
College of Computer Science
Inner Mongolia University
JWAC-1: Cache Replacement Championship
Adaptive SRP policy
Different programs prefer different thresholds.
Victim selection and insertion policy are same as in SRP
ONLY difference: threshold is selected dynamically from a pool of values according to which one causes fewest misses. The maximum threshold is 128 Pick eight values: 20, 21, …, 27
Apply the best threshold value to the cache
In ASRP policy:
+
+
ASRP policy via “Set Dueling” Divide the cache into two
type: Sampling sets (eight
thresholds * 4sets/thres.)
Follower sets
Eight counters misses to threshold X’s
sampling sets: counter_x++
Counters decides threshold for Follower sets:
counter with smallest value
Thres-20-sets
Follower Sets
Thres-21-sets
Thres-27-sets
Cntr_0
miss
Cntr_7
Eight thresholds
JWAC-1: Cache Replacement ChampionshipCollege of Computer Science
Inner Mongolia University
Resetting mechanism
Eight thresholds
last_follow
=
global_follow
Y++
N --
threshold
>?
Cntr_0
Cntr_7
reset
JWAC-1: Cache Replacement ChampionshipCollege of Computer Science
Inner Mongolia University
To avoid the accumulative effect of a big value in a specific Cnrt_x
Record the times of a same threshold is selected by the follower sets
When the times > a threshold, reset all the Cntr_Xs
College of Computer Science
Inner Mongolia University
JWAC-1: Cache Replacement Championship
Budget
Totally 45K bits
only 70% of the budget used by LRU policy, and 35% of the total budget provided by this championship
College of Computer Science
Inner Mongolia University
Results
0.8
0.9
1
1.1
1.2
1.3
1.4
1.5
1.6
asta
rbw
aves
bzip
2ca
ctus
ADM
calc
ulix
deal
IIga
mes
sgc
cG
emsF
DTD
gobm
kgr
omac
sh2
46re
fhm
mer
lbm
lesl
ie3d
libqu
antu
mm
cfm
ilcna
md
omne
tpp
perlb
ench
pova
rysj
eng
sopl
exsp
hix3
tont
oxa
lanc
bmk
zeus
mp
aver
age
(%)
Imp
rove
me
nt o
f mis
ses
ove
r L
RU
DIP ASRP
For 1MB 16-ways LLC. ASRP gets a geometric average speedup of 4.5% over LRU
JWAC-1: Cache Replacement Championship
Analyze
College of Computer Science
Inner Mongolia University
4
4.5
5
5.5
6
6.5
7
7.5
1 2 4 8 16 32 64 128 256 512 1k 2kThreshold
Mis
ses
pe
r 1
K in
stru
ctio
ns
SRP
LRU
ASRP
7
7.1
7.2
7.3
7.4
7.5
7.6
7.7
7.8
1 2 4 8 16 32 64 128 256 512 1k 2kThreshold
Mis
ses
pe
r 1
K in
stru
ctio
ns
SRPLRUASRP
xalancbmk GemsFDTD
JWAC-1: Cache Replacement Championship
The sampling mechanism does help ASRP to find the best thresholds for different programs
Conclusion
Keeping part of working set in the cache helps reducing misses when the cache suffers a thrashing problem
The part of longer access history helps SRP more accurately capturing the frequently used blocks
Different programs and different phases of a program prefer different thresholds to contribute maximum hits to the cache
“Set Dueling” helps ASRP dynamically selecting a suitable threshold
The experiment results show the effectiveness of ASRP policy
Inner Mongolia University
College of Computer Science JWAC-1: Cache Replacement Championship
Thank you!
Any question?
College of Computer Science
Inner Mongolia University
JWAC-1: Cache Replacement Championship
Result on multi-core processor
0.6
0.8
1
1.2
1.4
1.6
1.8
2
2.2
2.4
asta
ras
tar
asta
r
bwav
es
GemsF
DTD
omne
tpp
hmm
er
GemsF
DTD
hmm
er
xalan
cbm
k
(%)
Imp
rove
me
nt
of
mis
ses
ove
r L
RU
DIP ASRP
College of Computer Science
Inner Mongolia University
JWAC-1: Cache Replacement Championship
0.6
0.8
1
1.2
1.4
1.6
1.8
2
asta
rbw
aves
asta
rbw
aves
asta
rhm
mer
sphi
x3xa
lanc
bmk
bzip
2G
emsF
DTD
gobm
kom
netp
p
hmm
erxa
lanc
bmk
Gem
sFD
TDom
netp
p
(%)
Imp
rove
me
nt o
f mis
ses
ove
r L
RU DIP ASRP
7
7.1
7.2
7.3
7.4
7.5
7.6
7.7
7.8
1 2 4 8 16 32 64 128 256 512 1k 2k
Threshold
Mis
ses
per
1K in
stru
ctio
ns
SRP
LRU
Case Study of LRU-friendly workload
Inner Mongolia University
College of Computer Science JWAC-1: Cache Replacement Championship
Inner Mongolia University
College of Computer Science JWAC-1: Cache Replacement Championship
Explanation of active subset changing
A simple example of SRP policy
Inner Mongolia University
College of Computer Science JWAC-1: Cache Replacement Championship