34
Yiying Zhang Gokul Soundararajan Mark W. Storer Lakshmi N. Bairavasundaram Sethuraman Subbiah Andrea C. Arpaci-Dusseau Remzi H. Arpaci-Dusseau Warming up Storage-level Caches with Bonfire

Warming up Storage-level Caches with Bonfire · Warming up Storage-level Caches with Bonfire. Does on-demand cache warmup still work ? 2. 0.0 0.1 1.0 16.0 256.0 4096.0 65536.0 1990

  • Upload
    others

  • View
    5

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Warming up Storage-level Caches with Bonfire · Warming up Storage-level Caches with Bonfire. Does on-demand cache warmup still work ? 2. 0.0 0.1 1.0 16.0 256.0 4096.0 65536.0 1990

Yiying ZhangGokul Soundararajan

Mark W. Storer

Lakshmi N. Bairavasundaram

Sethuraman Subbiah

Andrea C. Arpaci-Dusseau

Remzi H. Arpaci-Dusseau

Warming up Storage-level Caches with Bonfire

Page 2: Warming up Storage-level Caches with Bonfire · Warming up Storage-level Caches with Bonfire. Does on-demand cache warmup still work ? 2. 0.0 0.1 1.0 16.0 256.0 4096.0 65536.0 1990

Does on-demand cache warmup still work ?

2

Page 3: Warming up Storage-level Caches with Bonfire · Warming up Storage-level Caches with Bonfire. Does on-demand cache warmup still work ? 2. 0.0 0.1 1.0 16.0 256.0 4096.0 65536.0 1990

0.0

0.1

1.0

16.0

256.0

4096.0

65536.0

1990 1995 2000 2005 2010 2015

Me

mo

ry (

Cac

he

) Si

ze (

GB

)

Year

3

< 10 GB

10s of TBs

. . .

+ idle time

Random:6 days

Sequential:2.5 hours

1 TB Cache

Page 4: Warming up Storage-level Caches with Bonfire · Warming up Storage-level Caches with Bonfire. Does on-demand cache warmup still work ? 2. 0.0 0.1 1.0 16.0 256.0 4096.0 65536.0 1990

0

20

40

60

80

100

0 4 8 12 16 20 24

Re

ad H

it R

ate

(%

)

Time (hour)

Always-warm

Cold

How Long Does On-demand Warmup Take?

* Simulation results from a project server trace

4

• Read hit rate difference between warm cache and on-demand

On-demand

On-demand warmup takes hours to days

Page 5: Warming up Storage-level Caches with Bonfire · Warming up Storage-level Caches with Bonfire. Does on-demand cache warmup still work ? 2. 0.0 0.1 1.0 16.0 256.0 4096.0 65536.0 1990

• Caches are critical▫ Key component to meet application SLAs

▫ Reduce storage server I/O load

• Cache warmup happens often▫ Storage server restart

▫ Storage server take-over

▫ Dynamic caching [Narayanan’08, Bairavasundaram’12]

To Make Things Worse

5

Page 6: Warming up Storage-level Caches with Bonfire · Warming up Storage-level Caches with Bonfire. Does on-demand cache warmup still work ? 2. 0.0 0.1 1.0 16.0 256.0 4096.0 65536.0 1990

• Bonfire▫ Monitors and logs I/Os

▫ Load warmup data in bulk

• Challenges▫ What to monitor & log? Effective

▫ How to monitor & log? Efficient

▫ How to load warmup data? Fast▫ General solution

On-demand Warmup Doesn’t Work Anymore What Can We Do?

6

BonfireMonitor

Storage System

Warmup Information

LoggingVolume

I/O

Page 7: Warming up Storage-level Caches with Bonfire · Warming up Storage-level Caches with Bonfire. Does on-demand cache warmup still work ? 2. 0.0 0.1 1.0 16.0 256.0 4096.0 65536.0 1990

• Trace analysis for storage-level cache warmup▫ Temporal and spatial patterns of reaccesses

• Cache warmup algorithm design and simulation

• Implementation and evaluation of Bonfire

▫ Up to 100% warmup time improvement over on-demand

▫ Up to 200% more server I/O load reduction

▫ Up to 5 times lower read latency

▫ Low overhead

Summary of Contributions

7

Page 8: Warming up Storage-level Caches with Bonfire · Warming up Storage-level Caches with Bonfire. Does on-demand cache warmup still work ? 2. 0.0 0.1 1.0 16.0 256.0 4096.0 65536.0 1990

• Introduction

• Trace analysis for cache warmup

• Cache warmup algorithm study with simulation

• Bonfire architecture

• Evaluation results

• Conclusion

Outline

8

Page 9: Warming up Storage-level Caches with Bonfire · Warming up Storage-level Caches with Bonfire. Does on-demand cache warmup still work ? 2. 0.0 0.1 1.0 16.0 256.0 4096.0 65536.0 1990

• MSR-Cambridge [Narayanan’08]

▫ 36 one-week block-level traces from MSR-Cambridge data center servers

▫ Filter out write-intensive, small working set, and low reaccess-rate

Workload Study – Trace Selection

Server Function #volumes

mds Media server 1

prn Print server 1

proj Project directories 3

src1 Source control 1

usr User home directories 2

web Web/SQL server 1

Reaccesses: Read After Readsand Read After Writes

9

Page 10: Warming up Storage-level Caches with Bonfire · Warming up Storage-level Caches with Bonfire. Does on-demand cache warmup still work ? 2. 0.0 0.1 1.0 16.0 256.0 4096.0 65536.0 1990

Questions for Trace Study

10

Time Space

Relative Q1: What’s the temporal distance?Q3: What’s the spatial distance?

Any clustering of reaccesses?

AbsoluteQ2: When do reaccesses happen

(in terms of wall clock time)?

Q4: Where do reaccesses

happen (in terms of LBA)?

0

50

100

150

12 AM 1 AM 2 AM 3 AM 4 AM 5 AM 6 AM

LBA

5 hours 1 hour

Page 11: Warming up Storage-level Caches with Bonfire · Warming up Storage-level Caches with Bonfire. Does on-demand cache warmup still work ? 2. 0.0 0.1 1.0 16.0 256.0 4096.0 65536.0 1990

Q1: What is the Temporal Distance?

11

0

20

40

60

80

100

0 24 48 72 96 120 144 168

Am

ou

nt

of

Re

acce

sse

s (%

)

Temporal Distance (Hour)

Time b/w Reaccesses for All Traces

Within an hour: Hourly

23-25 hours: Daily

Page 12: Warming up Storage-level Caches with Bonfire · Warming up Storage-level Caches with Bonfire. Does on-demand cache warmup still work ? 2. 0.0 0.1 1.0 16.0 256.0 4096.0 65536.0 1990

Q1: What is the Temporal Distance?

A1: Two main reaccess patterns: Hourly & Daily

12

0

20

40

60

80

100

mds_1 proj_1 proj_4 usr_1 src1_1 web_2 prn_1 proj_2 usr_2

Am

ou

nt

of

Re

acce

ses

(%)

HourlyDailyOther

Hourly Dominated

Daily Dominated Other

In an hour, recent blocks more likely reaccessed

Page 13: Warming up Storage-level Caches with Bonfire · Warming up Storage-level Caches with Bonfire. Does on-demand cache warmup still work ? 2. 0.0 0.1 1.0 16.0 256.0 4096.0 65536.0 1990

Q2: When Do Reaccesses Happen (Wall Clock Time)?

13

0

1

2

3

4

5

6

0 1 2 3 4 5 6 7

Dai

ly R

eac

cess

es

(%)

Time (day)

src11web2

A2: Daily reaccesses at same time every day

Page 14: Warming up Storage-level Caches with Bonfire · Warming up Storage-level Caches with Bonfire. Does on-demand cache warmup still work ? 2. 0.0 0.1 1.0 16.0 256.0 4096.0 65536.0 1990

Q3: What is the Spatial Distance?

A3: Spatial distance usually small for hourly

sometimes small for other reaccesses

14

0%

20%

40%

60%

80%

100%

mds1 proj1 proj4 usr1 src11 web2 prn1 proj2 usr2Am

ou

nt

of

Rea

cces

ses

(%)

<10MB 10MB-1GB 1-10GB 10-100GB >100GB

Hourly Dominated Daily Dominated Other

Page 15: Warming up Storage-level Caches with Bonfire · Warming up Storage-level Caches with Bonfire. Does on-demand cache warmup still work ? 2. 0.0 0.1 1.0 16.0 256.0 4096.0 65536.0 1990

• Percentage of 1MB regions that have reaccesses

Q3: Any spatial clustering among reaccesses?

A3: Daily reaccesses more spatially clustered

15

0

20

40

60

80

100

mds_1 proj_1 proj_4 usr_1 src1_1 web_2 prn_1 proj_2 usr_2

Pce

nta

geo

f R

eac

cese

sin

1

MB

Re

gio

n (

%)

Hourly

Daily

Other

Page 16: Warming up Storage-level Caches with Bonfire · Warming up Storage-level Caches with Bonfire. Does on-demand cache warmup still work ? 2. 0.0 0.1 1.0 16.0 256.0 4096.0 65536.0 1990

• A1 Hourly: Use recently accessed blocks

• A1 and A2 Daily: Use same period from previous day

• A3 Small spatial distance: Size of monitoring buffer is small

Trace Analysis Summary and Implications

Time Space

Relative

A1: Reaccesses have two main

temporal patterns:

within 1 hour, around 1 day

A3: Hourly reaccesses are

close in spatial distance.

Daily reaccesses exhibit

spatial clustering.

AbsoluteA2: Daily reaccesses correlate

with wall clock time

A4: No hot spot of

reaccesses in LBA space

16

Page 17: Warming up Storage-level Caches with Bonfire · Warming up Storage-level Caches with Bonfire. Does on-demand cache warmup still work ? 2. 0.0 0.1 1.0 16.0 256.0 4096.0 65536.0 1990

• Introduction

• Trace analysis for cache warmup

• Cache warmup algorithm study with simulation

• Bonfire architecture

• Evaluation results

• Conclusion

Outline

17

Page 18: Warming up Storage-level Caches with Bonfire · Warming up Storage-level Caches with Bonfire. Does on-demand cache warmup still work ? 2. 0.0 0.1 1.0 16.0 256.0 4096.0 65536.0 1990

• Warmup period: Hit-rate convergence time

0

20

40

60

80

100

0 20 40 60 80 100 120

Re

ad H

it R

ate

(%

)

Time

Always-warm cache

New cache

Metrics: Warmup Time

Converge time strictConverge time loose

18

Page 19: Warming up Storage-level Caches with Bonfire · Warming up Storage-level Caches with Bonfire. Does on-demand cache warmup still work ? 2. 0.0 0.1 1.0 16.0 256.0 4096.0 65536.0 1990

• Storage server I/O load reduction

▫𝐴𝑚𝑜𝑢𝑛𝑡 𝑜𝑓 𝐼/𝑂𝑠 𝑔𝑜𝑖𝑛𝑔 𝑡𝑜 𝑐𝑎𝑐ℎ𝑒

𝑇𝑜𝑡𝑎𝑙 𝐼/𝑂𝑠during convergence time

• Improvement in server I/O load reduction

▫𝑆𝑒𝑟𝑣𝑒𝑟 𝐼/𝑂 𝑙𝑜𝑎𝑑 𝑟𝑒𝑑𝑢𝑐𝑡𝑖𝑜𝑛 𝑜𝑓 𝐵𝑜𝑛𝑓𝑖𝑟𝑒

𝑆𝑒𝑟𝑣𝑒𝑟𝐼

𝑂𝑙𝑜𝑎𝑑 𝑟𝑒𝑑𝑢𝑐𝑡𝑖𝑜𝑛 𝑜𝑓 𝑂𝑛−𝑑𝑒𝑚𝑎𝑛𝑑

Metrics: Server I/O Reduction

19

Page 20: Warming up Storage-level Caches with Bonfire · Warming up Storage-level Caches with Bonfire. Does on-demand cache warmup still work ? 2. 0.0 0.1 1.0 16.0 256.0 4096.0 65536.0 1990

• Last-K: Last K regions accessed in the trace

• First-K: First K regions in the past 24 hours

• Top-K: K most frequent regions

• Random-K: Random K regions

Cache Warmup Algorithms

LF

24 Hours

I/Os

Time

RT

cache starts

20

Region: granularity of monitoring and logginge.g., 1MB

Page 21: Warming up Storage-level Caches with Bonfire · Warming up Storage-level Caches with Bonfire. Does on-demand cache warmup still work ? 2. 0.0 0.1 1.0 16.0 256.0 4096.0 65536.0 1990

• LRU cache simulator with four warmup algorithms

• Convergence time ▫ Improves 14% to 100%

• Server I/O load reduction ▫ Improves 44% to 228%

• In general, Last-K is the best

• First-K works for special case (known patterns)

Simulation Results - Overall

21

Page 22: Warming up Storage-level Caches with Bonfire · Warming up Storage-level Caches with Bonfire. Does on-demand cache warmup still work ? 2. 0.0 0.1 1.0 16.0 256.0 4096.0 65536.0 1990

• Introduction

• Trace analysis for cache warmup

• Cache warmup algorithm study with simulation

• Bonfire architecture

• Evaluation results

• Conclusion

Outline

22

Page 23: Warming up Storage-level Caches with Bonfire · Warming up Storage-level Caches with Bonfire. Does on-demand cache warmup still work ? 2. 0.0 0.1 1.0 16.0 256.0 4096.0 65536.0 1990

• Design principles▫ Low overhead monitoring and logging (efficient)

▫ Bulk loading useful warmup data (effective and fast)

▫ General design applicable to a range of scenarios

• Techniques▫ Last-K

▫ Monitors I/O below the server buffer cache

▫ Performance snapshot

Bonfire Design

23

Page 24: Warming up Storage-level Caches with Bonfire · Warming up Storage-level Caches with Bonfire. Does on-demand cache warmup still work ? 2. 0.0 0.1 1.0 16.0 256.0 4096.0 65536.0 1990

Bonfire Architecture: Monitoring

PerformanceSnapshot

Bonfire Monitor

Storage System

123

4

n

5...Warmup

Metadata

BufferCache

LoggingVolume

I/O

I/O Warmup Data

In-memoryStaging Buffer

Data Volumes

I/O

24

Only store warmup metadata: metadata-only

Store warmup metadata and data: metadata+data

Page 25: Warming up Storage-level Caches with Bonfire · Warming up Storage-level Caches with Bonfire. Does on-demand cache warmup still work ? 2. 0.0 0.1 1.0 16.0 256.0 4096.0 65536.0 1990

Bonfire Architecture: Bulk Cache Warmup

PerformanceSnapshot

Bonfire Monitor

Storage System

123

4

n

5...Warmup

Metadata

BufferCache

LoggingVolume

I/O

I/O Warmup Data

Data Volumes

In-mem n n-1 k..

Sorted by LBA

NewCache

Warmup Data

NewCache

25

Page 26: Warming up Storage-level Caches with Bonfire · Warming up Storage-level Caches with Bonfire. Does on-demand cache warmup still work ? 2. 0.0 0.1 1.0 16.0 256.0 4096.0 65536.0 1990

• Introduction

• Trace analysis for cache warmup

• Cache warmup algorithm study with simulation

• Bonfire architecture

• Evaluation results

• Conclusion

Outline

26

Page 27: Warming up Storage-level Caches with Bonfire · Warming up Storage-level Caches with Bonfire. Does on-demand cache warmup still work ? 2. 0.0 0.1 1.0 16.0 256.0 4096.0 65536.0 1990

1 week

• Implemented Bonfire as a trace replayer▫ Always-warm, on-demand, and Bonfire

▫ Metadata-only and metadata+data

▫ Replay traces using sync I/Os

• Workloads▫ Synthetic workloads

▫ MSR-Cambridge traces

• Metrics▫ Benefits and overheads

Evaluation Set Up

27

1 2 3 4 5 6 7

On-demand

BonfireAlways-warm

Page 28: Warming up Storage-level Caches with Bonfire · Warming up Storage-level Caches with Bonfire. Does on-demand cache warmup still work ? 2. 0.0 0.1 1.0 16.0 256.0 4096.0 65536.0 1990

0

20

40

60

80

100

0 1 2 3 4

Re

ad H

it R

ate

(%

)

Num of I/Os (x1000000)

Always-warm

Cold

Bonfire

Benefit Results - Read Hit Rate of MSR Trace

On-demand converges

Bonfire

converges

28

* Results of a project server trace from MSR-Cambridge trace set

CacheStarts

Day 5 Day 6

• Higher read hit rate => less server I/O load

On-demand

Page 29: Warming up Storage-level Caches with Bonfire · Warming up Storage-level Caches with Bonfire. Does on-demand cache warmup still work ? 2. 0.0 0.1 1.0 16.0 256.0 4096.0 65536.0 1990

0

0.5

1

1.5

2

0 1 2 3 4

Re

ad L

ate

ncy

(m

s)

Num of I/Os (x1000000)

Always-warm

Cold

Bonfire

Benefit Results - Read Latency of MSR Trace

29

• Lower read latency => better application-perceived performance

* Results of a project server trace from MSR-Cambridge trace set

Cache Starts

Day 5 Day 6

On-demand

Page 30: Warming up Storage-level Caches with Bonfire · Warming up Storage-level Caches with Bonfire. Does on-demand cache warmup still work ? 2. 0.0 0.1 1.0 16.0 256.0 4096.0 65536.0 1990

Overhead Results

30

PerformanceSnapshot

Bonfire Monitor

Storage System

Warmup Metadata

BufferCache

LoggingVolume

I/O

I/O Warmup Data

In-memoryStaging Buffer

Data Volumes

123

4

n

5...

In-memoryStaging Buffer

PerformanceSnapshot

256KB & 128MB19MB to 71MB

4.6MB/s to

238MB/s9.5GB to 36GB

9KB/s to

476KB/s

Page 31: Warming up Storage-level Caches with Bonfire · Warming up Storage-level Caches with Bonfire. Does on-demand cache warmup still work ? 2. 0.0 0.1 1.0 16.0 256.0 4096.0 65536.0 1990

Overhead Results

31

PerformanceSnapshot

Bonfire Monitor

Storage System

Warmup Metadata

BufferCache

LoggingVolume

I/O

I/O Warmup Data

In-memoryStaging Buffer

Data Volumes

123

4

n

5...

Warmup Data

NewCache

256KB & 128MB19MB to 71MB

2 to 20 minutes

PerformanceSnapshot

Proper Bonfire scheme and configuration

4.6MB/s to

238MB/s9.5GB to 36GB

9KB/s to

476KB/sIn-memory

Staging Buffer

Page 32: Warming up Storage-level Caches with Bonfire · Warming up Storage-level Caches with Bonfire. Does on-demand cache warmup still work ? 2. 0.0 0.1 1.0 16.0 256.0 4096.0 65536.0 1990

• Faster cache warmup▫ 59% to 100% improvement over on-demand

• Less storage server I/O load▫ 38% to 200% more reduction than on-demand

• Better application-perceived latency▫ Avg read latency 1/5 to 2/3 of on-demand

• Small controllable overhead

Summary of Results

32

Page 33: Warming up Storage-level Caches with Bonfire · Warming up Storage-level Caches with Bonfire. Does on-demand cache warmup still work ? 2. 0.0 0.1 1.0 16.0 256.0 4096.0 65536.0 1990

On-demand warmup doesn’t work anymore▫ Warm up terabytes of caches take days

Bonfire and beyond▫ Client-side cache warmup

▫ Application-aware warmup

▫ …

In need for more long big public traces !

Conclusion

33

Page 34: Warming up Storage-level Caches with Bonfire · Warming up Storage-level Caches with Bonfire. Does on-demand cache warmup still work ? 2. 0.0 0.1 1.0 16.0 256.0 4096.0 65536.0 1990

Thank You

Questions?

34

http://wisdom.cs.wisc.edu/home

http://research.cs.wisc.edu/adsl