Exploiting Compressed Block Size as an Indicator of Future Reuse
Gennady Pekhimenko, Tyler Huberty, Rui Cai, Onur Mutlu, Todd C. Mowry, Phillip B. Gibbons, Michael A. Kozuch


Page 1: Exploiting Compressed Block Size as an Indicator of Future Reuse

Gennady Pekhimenko, Tyler Huberty, Rui Cai, Onur Mutlu, Todd C. Mowry, Phillip B. Gibbons, Michael A. Kozuch

Page 2:

Executive Summary

• In a compressed cache, compressed block size is an additional dimension
• Problem: How can we maximize cache performance utilizing both block reuse and compressed size?
• Observations:
  – The importance of a cache block depends on its compressed size
  – Block size is indicative of reuse behavior in some applications
• Key Idea: Use compressed block size in making cache replacement and insertion decisions
• Results:
  – Higher performance (9%/11% on average for 1/2 cores)
  – Lower energy (12% on average)

Page 3:

Potential for Data Compression

(Figure: decompression latency vs. compression ratio for four hardware compression schemes)
• Frequent Pattern Compression (FPC) [Alameldeen+, ISCA’04]: 5-cycle decompression
• C-Pack [Chen+, Trans. on VLSI’10]: 9 cycles
• Base-Delta-Immediate (BDI) [Pekhimenko+, PACT’12]: 1-2 cycles
• Statistical Compression (SC2) [Arelakis+, ISCA’14]: 8-14 cycles

All these works showed significant performance improvements, and all use the LRU replacement policy. Can we do better?
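The BDI scheme listed above encodes a cache block as one base value plus narrow per-word deltas. A minimal sketch of that idea, with simplified assumptions (one base, one delta width; real BDI tries several base/delta size combinations and also handles immediate values):

```python
# Sketch of the base+delta idea behind BDI (simplified, illustrative).
def base_delta_compress(words, delta_bytes):
    base = words[0]
    deltas = [w - base for w in words]
    limit = 1 << (8 * delta_bytes - 1)
    if all(-limit <= d < limit for d in deltas):
        # e.g. 8-byte base + one narrow delta per word, far below 64B
        return {"base": base, "deltas": deltas, "delta_bytes": delta_bytes}
    return None  # incompressible at this delta width

pointers = [0x7F1000, 0x7F1008, 0x7F1010, 0x7F1018]  # nearby addresses
print(base_delta_compress(pointers, 1) is not None)     # True: narrow deltas
print(base_delta_compress([0, 10**12], 1) is not None)  # False
```

Pointer-heavy and small-integer data compresses this way because values in one block tend to be close to each other; floating-point coefficients usually do not.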

Page 4:

Size-Aware Cache Management

1. Compressed block size matters
2. Compressed block size varies
3. Compressed block size can indicate reuse

Page 5:

#1: Compressed Block Size Matters

Access stream: X A Y B C (X is large; A, Y, B, and C are small)

(Figure: timelines comparing Belady’s OPT Replacement with Size-Aware Replacement on this stream. Under OPT, which ignores size, the accesses to B and C miss; under size-aware replacement, evicting the single large block X frees room for several small blocks, so those accesses hit, saving cycles.)

Size-aware policies could yield fewer misses.
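The intuition on this slide can be reproduced with a toy simulation. The sizes, capacity, stream, and bypass rule below are illustrative choices, not the slide's exact example: a 64-byte set, one 64-byte block X, and 16-byte blocks A, Y, B, C.

```python
# Toy cache set: size-unaware LRU vs. a size-aware variant that refuses
# to evict blocks collectively "worth" more (per byte) than the newcomer.
SIZES = {"X": 64, "A": 16, "Y": 16, "B": 16, "C": 16}
CAPACITY = 64

def simulate(stream, size_aware):
    cache, misses = [], 0  # list in LRU order (front = LRU victim)
    for blk in stream:
        if blk in cache:
            cache.remove(blk)
            cache.append(blk)  # move to MRU
            continue
        misses += 1
        free = CAPACITY - sum(SIZES[b] for b in cache)
        victims, i = [], 0
        while free < SIZES[blk]:  # gather LRU victims until blk fits
            victims.append(cache[i])
            free += SIZES[cache[i]]
            i += 1
        if size_aware and sum(1 / SIZES[v] for v in victims) > 1 / SIZES[blk]:
            continue  # bypass: victims are worth more than the newcomer
        for v in victims:
            cache.remove(v)
        cache.append(blk)
    return misses

stream = ["A", "Y", "B", "X", "A", "Y", "B", "C", "A", "Y", "B", "C"]
print(simulate(stream, size_aware=False))  # 8 misses: X displaced A, Y, B
print(simulate(stream, size_aware=True))   # 5 misses: X was bypassed
```

Bypassing the one large block keeps three small, soon-reused blocks resident, which is exactly the effect the slide's timelines depict.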

Page 6:

#2: Compressed Block Size Varies

(Figure: distribution of compressed block sizes under the BDI compression algorithm [Pekhimenko+, PACT’12]: cache coverage (%) vs. block size in bytes, one series per application)

Compressed block size varies within and between applications.

Page 7:

#3: Block Size Can Indicate Reuse

(Figure: reuse distance (# of memory accesses) vs. block size in bytes for bzip2; sizes from 1 to 64 bytes cluster at clearly different reuse distances)

Different sizes have different dominant reuse distances.

Page 8:

Code Example to Support Intuition

int A[N];     // small indices: compressible; long reuse
double B[16]; // FP coefficients: incompressible; short reuse
for (int i = 0; i < N; i++) {
  int idx = A[i];
  for (int j = 0; j < N; j++) {
    sum += B[(idx + j) % 16];
  }
}

Compressed size can be an indicator of reuse distance.
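The claim can be checked with a small trace-based measurement. The sketch below replays the loop nest above at cache-block granularity, under stated assumptions (64-byte blocks, 4-byte ints, 8-byte doubles, and `i % 16` as a stand-in for A[i]'s value), and reports the mean reuse distance per array:

```python
# Replay the slide's loop nest as a block-address trace and measure
# reuse distance: the number of accesses between uses of the same block.
N, BLOCK = 64, 64

trace = []  # one (array, block_index) entry per memory access
for i in range(N):
    trace.append(("A", (i * 4) // BLOCK))       # int A[N]
    idx = i % 16                                # stand-in for A[i]'s value
    for j in range(N):
        trace.append(("B", ((idx + j) % 16) * 8 // BLOCK))  # double B[16]

last_seen, dists = {}, {"A": [], "B": []}
for t, access in enumerate(trace):
    if access in last_seen:
        dists[access[0]].append(t - last_seen[access])
    last_seen[access] = t

avg = {name: sum(d) / len(d) for name, d in dists.items()}
print(avg)  # A's (compressible) blocks show far longer reuse distances
```

Each block of A is revisited only after a full inner loop of B accesses, while B's two blocks are touched every iteration, matching the slide's annotations.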

Page 9:

Size-Aware Cache Management

1. Compressed block size matters
2. Compressed block size varies
3. Compressed block size can indicate reuse

Page 10:

Outline

• Motivation
• Key Observations
• Key Ideas:
  • Minimal-Value Eviction (MVE)
  • Size-based Insertion Policy (SIP)
• Evaluation
• Conclusion

Page 11:

Compression-Aware Management Policies (CAMP)

CAMP = MVE (Minimal-Value Eviction) + SIP (Size-based Insertion Policy)

Page 12:

Minimal-Value Eviction (MVE): Observations

Define a value (importance) of each block to the cache; evict the block with the lowest value.

(Figure: a cache set ordered by LRU priority, from highest (MRU) to lowest (LRU), contrasted with the same set ordered from highest value to lowest value)

The importance of a block depends on both the likelihood of reference and the compressed block size.

Page 13:

Minimal-Value Eviction (MVE): Key Idea

Value = Probability of reuse / Compressed block size

Probability of reuse:
• Can be determined in many different ways
• Our implementation is based on the re-reference interval prediction value (RRIP [Jaleel+, ISCA’10])
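A minimal sketch of this value function, assuming 2-bit RRIP counters (a lower RRPV predicts a nearer re-reference; the exact mapping from RRPV to a reuse probability below is illustrative, not the paper's):

```python
# Sketch of Minimal-Value Eviction: value = reuse probability / size.
RRPV_MAX = 3  # 2-bit RRIP re-reference prediction values

def value(rrpv, compressed_size):
    # Illustrative mapping: RRPV 0 -> probability 1.0, RRPV 3 -> 0.25.
    reuse_prob = (RRPV_MAX + 1 - rrpv) / (RRPV_MAX + 1)
    return reuse_prob / compressed_size

def select_victim(cache_set):
    # cache_set: list of (tag, rrpv, compressed_size_in_bytes) tuples
    return min(cache_set, key=lambda b: value(b[1], b[2]))

victim = select_victim([("A", 0, 8), ("B", 3, 64), ("C", 2, 16)])
print(victim[0])  # "B": distant predicted re-reference AND large
```

Dividing by compressed size makes a large block justify the extra space it occupies; two blocks with equal reuse likelihood are ranked by the room they free.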

Page 14:

Compression-Aware Management Policies (CAMP)

CAMP = MVE (Minimal-Value Eviction) + SIP (Size-based Insertion Policy)

Page 15:

Size-based Insertion Policy (SIP): Observations

• Sometimes there is a relation between compressed block size and reuse distance, because both are properties of the data structure a block belongs to
• This relation can be detected through the compressed block size
• Minimal overhead to track this relation (compressed block size information is already part of the design)

Page 16:

Size-based Insertion Policy (SIP): Key Idea

• Insert blocks of a certain size with higher priority if this improves the cache miss rate
• Use dynamic set sampling [Qureshi+, ISCA’06] to detect which size to insert with higher priority

(Figure: a few sampled sets try inserting one candidate size, e.g. 8B or 64B, with higher priority; misses in the main tags and in auxiliary baseline tags update a counter such as CTR8B, which decides the policy for the steady state)
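The counter mechanism in the figure can be sketched as follows, assuming one saturating counter per candidate size, incremented on misses in the baseline (auxiliary-tag) sets and decremented on misses in the sampled policy sets; the counter width and the exact update direction are illustrative choices:

```python
# Sketch of SIP's set-sampling vote: did inserting one candidate size
# with high priority reduce misses relative to the baseline policy?
class SIPCounter:
    def __init__(self, cmax=255):
        self.ctr, self.cmax = 0, cmax

    def record_miss(self, in_policy_set):
        # A baseline-set miss is evidence FOR the candidate policy;
        # a policy-set miss is evidence AGAINST it. Saturate both ways.
        self.ctr += -1 if in_policy_set else 1
        self.ctr = max(-self.cmax, min(self.cmax, self.ctr))

    def insert_with_high_priority(self):
        return self.ctr > 0  # steady-state decision for this size

ctr_8b = SIPCounter()          # hypothetical counter for 8-byte blocks
for _ in range(10):
    ctr_8b.record_miss(in_policy_set=False)  # baseline sets miss often
for _ in range(3):
    ctr_8b.record_miss(in_policy_set=True)
print(ctr_8b.insert_with_high_priority())  # True: prioritize 8B blocks
```

Because only a handful of sets carry the experiment, the hardware cost is a few auxiliary tags and one counter per candidate size.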

Page 17:

Outline

• Motivation
• Key Observations
• Key Ideas:
  • Minimal-Value Eviction (MVE)
  • Size-based Insertion Policy (SIP)
• Evaluation
• Conclusion

Page 18:

Methodology

• Simulator
  – x86 event-driven simulator (MemSim [Seshadri+, PACT’12])
• Workloads
  – SPEC2006 benchmarks, TPC, Apache web server
  – 1-4 core simulations for 1 billion representative instructions
• System Parameters
  – L1/L2/L3 cache latencies from CACTI
  – BDI (1-cycle decompression) [Pekhimenko+, PACT’12]
  – 4GHz, x86 in-order core, cache sizes 1MB-16MB

Page 19:

Evaluated Cache Management Policies

Design | Description                                         | Size-aware?
LRU    | Baseline LRU policy                                 | no
RRIP   | Re-reference Interval Prediction [Jaleel+, ISCA’10] | no
ECM    | Effective Capacity Maximizer [Baek+, HPCA’13]       | yes
CAMP   | Compression-Aware Management Policies (MVE + SIP)   | yes

Page 22:

Size-Aware Replacement

• Effective Capacity Maximizer (ECM) [Baek+, HPCA’13]
  – Inserts “big” blocks with lower priority
  – Uses a heuristic to define the size threshold
• Shortcomings
  – Coarse-grained
  – No relation between block size and reuse
  – Not easily applicable to other cache organizations

Page 23:

CAMP Single-Core

(Figure: IPC normalized to LRU for RRIP, ECM, and CAMP across 26 SPEC2006, database, and web workloads with a 2MB L2 cache; labeled improvements of up to 31%, and 8% and 5% on average)

CAMP improves performance over LRU, RRIP, ECM.

Page 24:

Multi-Core Results

• Classification based on the compressed block size distribution:
  – Homogeneous (1 or 2 dominant block sizes)
  – Heterogeneous (more than 2 dominant sizes)
• We form three different 2-core workload groups (20 workloads in each):
  – Homo-Homo
  – Homo-Hetero
  – Hetero-Hetero

Page 25:

Multi-Core Results (2)

(Figure: normalized weighted speedup across the three 2-core workload groups)

CAMP outperforms the LRU, RRIP, and ECM policies. Higher benefits with higher heterogeneity in size.

Page 26:

Effect on Memory Subsystem Energy

(Figure: energy of the memory subsystem, including L1/L2 caches, DRAM, NoC, and compression/decompression, normalized to LRU for RRIP, ECM, and CAMP across SPEC2006, TPC-H, and apache workloads; geometric means are shown for memory-intensive workloads and for all workloads, with a 5% reduction labeled)

CAMP reduces energy consumption in the memory subsystem.

Page 27:

CAMP for Global Replacement

G-CAMP: CAMP with the V-Way [Qureshi+, ISCA’05] cache design
• G-MVE: Global Minimal-Value Eviction (Reuse Replacement instead of RRIP)
• G-SIP: Global Size-based Insertion Policy (Set Dueling instead of Set Sampling)

Page 28:

G-CAMP Single-Core

SPEC2006, databases, web workloads, 2MB L2 cache

(Figure: IPC normalized to LRU across 26 workloads; labeled improvements of 63%-97% on some workloads, and 14% and 9% on average)

G-CAMP improves performance over LRU, RRIP, V-Way.

Page 29:

Storage Bit Count Overhead

Overhead over an uncompressed cache with LRU:

Design (with BDI)   LRU     CAMP    V-Way   G-CAMP
tag-entry           +14b    +14b    +15b    +19b
data-entry          +0b     +0b     +16b    +32b
total               +9%     +11%    +12.5%  +17%

CAMP adds only 2% over LRU with BDI; G-CAMP adds only 4.5% over V-Way with BDI.

Page 30:

Cache Size Sensitivity

(Figure: normalized IPC at five cache sizes)

CAMP/G-CAMP outperforms prior policies for all cache sizes. G-CAMP outperforms LRU with 2X cache sizes.

Page 31:

Other Results and Analyses in the Paper

• Block size distribution for different applications
• Sensitivity to the compression algorithm
• Comparison with uncompressed cache
• Effect on cache capacity
• SIP vs. PC-based replacement policies
• More details on multi-core results

Page 32:

Conclusion

• In a compressed cache, compressed block size is an additional dimension
• Problem: How can we maximize cache performance utilizing both block reuse and compressed size?
• Observations:
  – The importance of a cache block depends on its compressed size
  – Block size is indicative of reuse behavior in some applications
• Key Idea: Use compressed block size in making cache replacement and insertion decisions
• Two techniques: Minimal-Value Eviction and Size-based Insertion Policy
• Results:
  – Higher performance (9%/11% on average for 1/2 cores)
  – Lower energy (12% on average)

Page 33:

Exploiting Compressed Block Size as an Indicator of Future Reuse

Gennady Pekhimenko, Tyler Huberty, Rui Cai, Onur Mutlu, Todd C. Mowry, Phillip B. Gibbons, Michael A. Kozuch