Split-Level I/O Scheduling

Suli Yang, Tyler Harter, Nishant Agrawal, Samer Al-Kiswany, Salini Selvaraj Kowsalya, Anand Krishnamurthy, Rini T Kaushik, Andrea C. Arpaci-Dusseau, Remzi H. Arpaci-Dusseau



…yet another I/O scheduling paper?

CFQ (2003)

BFQ (2010)

Deadline (2002)

mClock (2011)

Token-Bucket (2008)

Libra (2014)

pClock (2007)

Fahrrad (2008)

YFQ (1999)

Facade (2003)


Some mistakes we have been making for decades…

(in trying to build better schedulers)


Problem

• Current frameworks fundamentally limited
– CFQ, Deadline, Token-Bucket

• Important policies cannot be realized
– Fairness, Latency Guarantee, Isolation

• Wasted effort trying to build new schedulers without fixing the framework


Can we design a simple and effective framework that lets us build schedulers to correctly realize important I/O policies?


Solution: Split-Level Framework

• Control: Allow scheduling at multiple levels
– Block level
– System-call level
– Page-cache level

• Information: Tag requests to identify the origin

• Simplicity: Small set of hooks at key junctions within the storage stack


Results

• Three distinct policies implemented
– Priority, Deadline, Isolation

• Large performance improvements
– Fairness: 12x
– Tail latency: 4x
– Isolation: 6x

• Good foundation for applications
– Reduce transaction latency for databases
– Improve isolation for virtual machines
– Effective rate limit for HDFS


Overview

• How I/O scheduling frameworks work

• Split-Level Scheduling Framework: Design

• Split-Level Scheduler Case Study

• Conclusion


Framework vs. Scheduler

• Framework: A running environment (mechanism)

• Scheduler: Implement different policies

• How it works: the framework provides callbacks to schedulers.


Traditional Approach: Block-Level I/O Scheduling

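[Diagram: Apps sit above the Page Cache and File System; requests enter the Block-Level Queues, where the Block-Level Scheduler is invoked through add_req, dispatch_req, and req_complete before requests reach the Device.]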


Block-Level I/O Scheduling

Simplified Completely Fair Queuing (CFQ) implementation:

[Diagram: the Block-Level Scheduler attaches to the Block-Level Queues above the Device through add_req, dispatch_req, and req_complete.]

add_req(r) {
    p = r.submit_process
    q = get_queue(p)
    enqueue(q, r)
}

dispatch_req() {
    q = get_high_prio_queue()
    r = dequeue(q)
    dispatch(r)
}

complete_req(r) {
    // clean up
}
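The three callbacks above can be tried out in user space. The following is a minimal Python sketch, not the kernel code: the Request fields, the priority table, and the default priority of 7 are assumptions made for illustration only.

```python
from collections import defaultdict, deque
from dataclasses import dataclass


@dataclass
class Request:
    submit_process: str  # process that handed the request to the block layer
    sector: int


class SimpleBlockScheduler:
    """Per-process queues; a lower priority number is served first."""

    def __init__(self, priorities):
        self.priorities = priorities      # hypothetical: process name -> priority
        self.queues = defaultdict(deque)  # one FIFO queue per submitting process
        self.in_flight = []

    def add_req(self, r):
        # Queue the request under the process that submitted it.
        self.queues[r.submit_process].append(r)

    def dispatch_req(self):
        # Pick the non-empty queue whose process has the highest priority.
        ready = [p for p, q in self.queues.items() if q]
        if not ready:
            return None
        p = min(ready, key=lambda name: self.priorities.get(name, 7))
        r = self.queues[p].popleft()
        self.in_flight.append(r)
        return r

    def complete_req(self, r):
        # Clean up bookkeeping for a finished request.
        self.in_flight.remove(r)


sched = SimpleBlockScheduler({"db": 0, "backup": 7})
sched.add_req(Request("backup", 100))
sched.add_req(Request("db", 200))
first = sched.dispatch_req()
```

Note that the queue is chosen purely from `submit_process`, which is exactly the field the later slides show to be unreliable.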


Overview

• What is an I/O scheduling framework

• Split-Level Scheduling Framework: Design

– The reordering problem

– The cause-mapping problem

– The cost-estimation problem

• Split-Level Scheduler Case Study

• Conclusion


Reordering

Scheduling is just reordering I/O requests


Data Entanglement

• File system tangles data into one bundle
– Journal transaction
– Shared metadata block

• Impossible for the schedulers to reorder

[Diagram: App1 and App2 write through the File System, which hands the bundled data to the Block-Level Scheduler.]


Write Dependencies

• File systems carefully order writes

• Schedulers cannot reorder (unless FS allows)

[Diagram: an App writes through the File System, which passes transactions tx1 and tx2 to the Block-Level Scheduler.]


Fundamental Limitation #1 (of block-level scheduling)

• The file system imposes ordering requirements contrary to the scheduling goals

• The scheduler cannot reorder

• Too late once data is in the file system

– Need admission control


Split-Level I/O Scheduling: Multi-Layer Hooks

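[Diagram: Apps call write() and fsync() above the Page Cache and File System; the Split-Level Scheduler hooks these system calls as well as the Block-Level Queues (add_req, dispatch_req, req_complete) above the Device.]

Hooks on write() and fsync() avoid data entanglement and ordering above the file system.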
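To make the idea of admission control above the file system concrete, here is a minimal Python sketch of a system-call-level gate. It is an illustration under stated assumptions, not the framework's code: the class name, the hook names on_write_entry and on_block_complete, and the dirty-byte budget are all hypothetical.

```python
class SyscallGate:
    """Hypothetical write() hook: decide before data reaches the file system."""

    def __init__(self, max_dirty_bytes):
        self.max_dirty_bytes = max_dirty_bytes  # assumed budget of un-flushed data
        self.dirty_bytes = 0
        self.deferred = []  # (pid, nbytes) writes held above the file system

    def on_write_entry(self, pid, nbytes):
        # Admit the write only if it fits in the budget; otherwise hold it back
        # so it cannot become entangled with other data inside the file system.
        if self.dirty_bytes + nbytes > self.max_dirty_bytes:
            self.deferred.append((pid, nbytes))
            return False
        self.dirty_bytes += nbytes
        return True

    def on_block_complete(self, nbytes):
        # Data reached the device: free budget and admit deferred writes in order.
        self.dirty_bytes -= nbytes
        admitted = []
        while self.deferred and (
            self.dirty_bytes + self.deferred[0][1] <= self.max_dirty_bytes
        ):
            pid, n = self.deferred.pop(0)
            self.dirty_bytes += n
            admitted.append(pid)
        return admitted


gate = SyscallGate(max_dirty_bytes=8192)
ok1 = gate.on_write_entry(pid=1, nbytes=4096)
ok2 = gate.on_write_entry(pid=2, nbytes=4096)
ok3 = gate.on_write_entry(pid=3, nbytes=4096)  # over budget, deferred
released = gate.on_block_complete(4096)
```

The point of the sketch is where the decision happens: a deferred write has not yet entered a journal transaction or touched shared metadata, so the scheduler is still free to reorder it.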


Cause Mapping

A scheduler needs to map an I/O request to the originating application


Write Delegation

• Write-back daemon submits all requests!

• Loss of cause information!

• Write-back, journaling, delayed allocation…

[Diagram: App1 and App2 call write() into the Page Cache; the Write-back Daemon later submits the data to the Block-Level Scheduler.]


Fundamental Limitation #2 (of block-level scheduling)

• Cause-mapping information lost within the framework

• Impossible to map an I/O request back to its originating application

(no matter how you implement the scheduler)


Split-Level I/O Scheduling: Tags

• Tags to identify origin

• Tags pass across layers

[Diagram: App1 and App2 call write() into the Page Cache; pages carry tags 1 and 2, which stay attached when the Write-back Daemon submits them to the Block-Level Scheduler.]
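A small Python sketch can show how a tag survives write delegation. It is a toy model, not the kernel data structures: Page, BlockRequest, PageCache, and the "writeback" process name are hypothetical names chosen for this example.

```python
from dataclasses import dataclass, field


@dataclass
class Page:
    index: int
    causes: set = field(default_factory=set)  # tag: processes that dirtied the page


@dataclass
class BlockRequest:
    submit_process: str       # who handed the request to the block layer
    tagged_causes: frozenset  # who is actually responsible for the I/O


class PageCache:
    def __init__(self):
        self.dirty = {}  # page index -> Page

    def write(self, pid, index):
        # Tag the page with the writing process at the moment it is dirtied.
        page = self.dirty.setdefault(index, Page(index))
        page.causes.add(pid)

    def write_back(self):
        # The daemon submits everything, but each request keeps its tag.
        reqs = [
            BlockRequest("writeback", frozenset(p.causes))
            for p in self.dirty.values()
        ]
        self.dirty.clear()
        return reqs


cache = PageCache()
cache.write("app1", 0)
cache.write("app1", 1)
cache.write("app2", 1)  # a page shared by two writers gets both causes
cache.write("app2", 2)
reqs = cache.write_back()
```

Every request reports "writeback" as its submitter, while `tagged_causes` still names the originating applications, including the case where more than one process is responsible for a page.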


Cost Estimation

A scheduler needs to estimate the cost of I/O

– Memory-level notification for timely estimate

– Block-level notification for accurate estimate

– Details in paper
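The slide defers the details to the paper, so the following Python sketch only illustrates the general shape of a two-stage estimate: charge a prompt guess when a buffer is dirtied, then correct it when the block-level cost is known. The class, the hook names, and the per-byte guess are assumptions for illustration and are not taken from the paper.

```python
class CostAccountant:
    """Charge a prompt guess at memory level, then correct it at block level."""

    def __init__(self, guess_per_byte=1.0):
        self.guess_per_byte = guess_per_byte  # assumed preliminary cost model
        self.charged = {}  # cause -> total cost charged so far
        self.pending = {}  # page id -> (cause, preliminary cost)

    def on_buffer_dirty(self, cause, page_id, nbytes):
        # Memory-level notification: timely, but only a guess.
        prelim = nbytes * self.guess_per_byte
        self.pending[page_id] = (cause, prelim)
        self.charged[cause] = self.charged.get(cause, 0.0) + prelim

    def on_block_complete(self, page_id, actual_cost):
        # Block-level notification: accurate, so replace the guess with it.
        cause, prelim = self.pending.pop(page_id)
        self.charged[cause] += actual_cost - prelim


acct = CostAccountant()
acct.on_buffer_dirty("app1", page_id=7, nbytes=4096)
early = acct.charged["app1"]
acct.on_block_complete(page_id=7, actual_cost=6000.0)
final = acct.charged["app1"]
```

The early charge lets a scheduler react before the write reaches the disk; the later correction keeps the long-run accounting honest.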


Split-Level I/O Scheduling Framework: Summary

• Three key pieces:
– Multiple-layer hooks to prevent adverse file system interaction
– Tags to track causes across layers
– Early memory-level notification of write work

• Easy implementation
– ~300 LOC in Linux
– Little added complexity for building schedulers


Overview

• How I/O scheduling frameworks work

• Split-Level Scheduling Framework: Design

• Split-Level Scheduler Case Study

• Conclusion


Challenge #1: Priority Scheduler

Fairly allocate I/O resources based on the processes’ priorities


Block-Level: CFQ

Workload: Eight processes with different priorities (0-7), each sequentially writing its own file

[Figure: results plotted against the goal.]

add_req(r) {
    p = r.submit_process
    q = get_queue(p)
    enqueue(q, r)
}


Block-Level: CFQ

add_req(r) {
    p = r.submit_process    // the write-back thread
    q = get_queue(p)
    enqueue(q, r)
}


Split-Level: AFQ

CFQ deviates from the goal by 82%; AFQ by 7%: a 12x improvement

add_req(r) {
    p = r.tagged_cause
    q = get_queue(p)
    enqueue(q, r)
}
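The slides show only the add_req change, not how AFQ divides the device among queues. As an illustration of queuing by tagged cause, the Python sketch below swaps in plain stride scheduling for the dispatch policy; the class name, the weights, and the tie-breaking rule are assumptions, and this is not AFQ's actual algorithm.

```python
from collections import defaultdict, deque
from fractions import Fraction


class CauseFairQueue:
    """Queues keyed by tagged cause, served by stride scheduling (a stand-in policy)."""

    def __init__(self, weights):
        self.weights = weights               # hypothetical: cause -> share weight
        self.queues = defaultdict(deque)
        self.passes = defaultdict(Fraction)  # virtual time used per cause, starts at 0

    def add_req(self, tagged_cause, req):
        # Key the queue by the originating application, not the submitter.
        self.queues[tagged_cause].append(req)

    def dispatch_req(self):
        ready = [c for c, q in self.queues.items() if q]
        if not ready:
            return None
        # Serve the cause with the least virtual time; break ties by name.
        cause = min(ready, key=lambda c: (self.passes[c], c))
        self.passes[cause] += Fraction(1, self.weights[cause])
        return cause, self.queues[cause].popleft()


fq = CauseFairQueue({"hi": 3, "lo": 1})
for i in range(8):
    fq.add_req("hi", i)
    fq.add_req("lo", i)
served = [fq.dispatch_req()[0] for _ in range(8)]
```

With weights 3 and 1, the first eight dispatches split 6 to 2. The same policy keyed by the submitting process would see a single write-back queue and could not make that distinction.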


Challenge #2: Deadline Scheduler

Provide guaranteed latency of I/O requests


Block-Deadline

• Block-Deadline cannot serve the low-latency requests until the previous transaction has completed

[Diagram: an App writes through the File System to Block-Deadline; transactions tx1 and tx2 are shown.]


Block-Deadline

Workload: Flush 4KB data to disk with or w/o background writes

Expected Results: Operation finishes within deadline (100ms)


Split-Deadline

• Split-Deadline suspends write() and fsync() so that many high-latency requests do not accumulate in one transaction.

[Diagram: Apps call write() and fsync(); Split-Deadline blocks them above the File System, which holds transaction tx1, to keep high-latency data out of the FS.]


Split-Level: Split-Deadline

• Split-Deadline maintains the deadline regardless of background writes.


The Fsync-Freeze Problem

During checkpointing, the system begins writing out the data that needs to be fsync()’d so aggressively that the service time for I/O requests from other processes goes through the roof.

-- Robert Haas (PostgreSQL)


The Fsync-Freeze Problem

4x tail latency reduction.

Split-Deadline solves the fsync-freeze problem!

Workload: SQLite transaction with different checkpoint interval

Expected Results: Consistent transaction latency


Other Evaluation Results

• Low overhead
– <1% runtime overhead
– <50 MB memory overhead

• Other schedulers
– Token-bucket for performance isolation

• Other applications
– PostgreSQL: latency guarantee for TPC-B workloads
– QEMU: provides isolation across VMs
– HDFS: effective I/O rate limit
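For readers unfamiliar with the token-bucket policy mentioned above, here is a minimal Python sketch of the classic algorithm. It shows only the rate-limiting rule, not the split-level implementation; the class name, the explicit `now` argument, and the 10 MB/s rate with a 1 MB burst are illustrative assumptions.

```python
class TokenBucket:
    """Classic token bucket: tokens refill at a fixed rate up to a burst cap."""

    def __init__(self, rate_bytes_per_s, burst_bytes):
        self.rate = rate_bytes_per_s
        self.burst = burst_bytes
        self.tokens = burst_bytes  # start with a full bucket
        self.last = 0.0            # time of the last refill, in seconds

    def try_admit(self, nbytes, now):
        # Refill for the elapsed time, capped at the burst size.
        self.tokens = min(self.burst, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if nbytes <= self.tokens:
            self.tokens -= nbytes
            return True
        return False  # caller should delay the I/O and retry later


MB = 1024 * 1024
bucket = TokenBucket(rate_bytes_per_s=10 * MB, burst_bytes=1 * MB)
a = bucket.try_admit(1 * MB, now=0.0)  # bucket is full, admitted
b = bucket.try_admit(1 * MB, now=0.0)  # bucket is empty, rejected
c = bucket.try_admit(1 * MB, now=0.5)  # refilled up to the burst cap, admitted
```

In a split-level setting one would presumably keep one bucket per tagged cause and charge it when the application writes, rather than when the write-back thread submits the block request; that placement is an inference from the slides, not something this sketch implements.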


Overview

• What is an I/O scheduling framework and how does it work.

• Split-Level Scheduling Framework: Design

• Split-Level Scheduler Case Study

• Conclusion


Conclusion

• For decades, people have been trying to build better block-level schedulers
– bound to fail without appropriate framework support

• Split-level framework enables correct scheduler implementation
– Cross-layer tags
– Multi-level hooks
– Memory-level notification

Source code and more information:

http://research.cs.wisc.edu/adsl/Software/split/


BACKUP SLIDES


Write Dependencies

• Modern file systems maintain data consistency by carefully ordering writes.

• Schedulers cannot reorder unless the file system allows it.

[Diagram: an App writes through the File System, which passes transactions tx1 and tx2 to the Block-Level Scheduler.]


Split-Level I/O Scheduling: Multi-Layer Hooks

• System-call scheduling above the file system to avoid data entanglement.

• Block-level scheduling below the file system to maximize performance.

[Diagram: Apps call read(), write(), and fsync() above the Page Cache and File System; write-back feeds the Block-Level Queues, and the Scheduler hooks add_req, dispatch_req, and req_complete above the Disk or SSD.]


Split-Level I/O Scheduling: Tags

• Write-heavy HDFS workload on a machine with 8GB RAM.


Split-Level I/O Scheduling: Tags

• Write-heavy HDFS workload on a machine with 8GB RAM.


Split-Level Framework Overhead

I/O performance with noop scheduler:


Split-Level I/O Scheduling: Tags

• Write-heavy HDFS workload on a machine with 8GB RAM.

• Worst-case memory overhead of tags: 50MB.


Block-Level: Windows


Performance Isolation

A: Unthrottled sequential reader

B: Throttled to 10MB/s


Real Applications


Write Delegation

• The process that submitted the block-level requests may not be the process that issued the I/O.

• Write-back, journaling, delayed allocation…

• Loss of cause information!

[Diagram: App1 and App2 call write() into the Page Cache; write-back later submits the data to the Block-Level Scheduler.]


Split-Level I/O Scheduling: Tags

• Use tags to track I/O requests across layers and identify the originating application.

• Tags identify a set of processes responsible for an I/O request.

[Diagram: App1 and App2 call write() into the Page Cache; pages carry tags 1 and 2, which stay attached when write-back submits them to the Block-Level Scheduler.]


Myth #1 in I/O Scheduling:

I don’t have to care about I/O scheduling. It is someone else’s problem…


Why Is I/O Scheduling Relevant (to You)

• Bottleneck of many systems, from phones to servers.
[…our servers appear to freeze for tens of seconds during disk writes…]

• Foundation of performance isolation.
[…the interference as a result of competing I/Os remains problematic in a virtualized environment…]

• Pain points for databases, hypervisors, key-value stores and more.
[…one customer reported that just changing cfq to noop solved their innoDB IO problems…]


Myth #1 in I/O Scheduling:

I don’t have to care about I/O scheduling. It is someone else’s problem…

Fact #1:

If you care about performance, you should care about I/O scheduling


Myth #2 in I/O Scheduling:

Can’t the disk (or SSD) handle all I/O scheduling?

(Do I still need I/O scheduling in the era of SSD?)


Why Should OS Do I/O Scheduling

• Device powerless when handed the “wrong” requests from the OS
-- file system may withhold requests

• Devices rely on OS-provided information
-- lack such mechanisms

• Other common reasons:
-- more contextual information
-- OS-level isolation unit
-- multi-device I/O scheduling


Myth #2 in I/O Scheduling:

Shouldn’t the disk (or SSD) handle all the I/O scheduling?

Fact #2:

OS has to issue the right request at the right time