35
2014 Storage Developer Conference. © Microsoft Corp. All Rights Reserved. Storage Quality of Service for Enterprise Workloads Tom Talpey Eno Thereska Microsoft

Storage Quality of Service - · PDF fileStorage Quality of Service for Enterprise Workloads ... E.g. database tier vs web frontend vs E-mail ... SSD, HDD) Server

  • Upload
    dodiep

  • View
    221

  • Download
    7

Embed Size (px)

Citation preview

Page 1: Storage Quality of Service -  · PDF fileStorage Quality of Service for Enterprise Workloads ... E.g. database tier vs web frontend vs E-mail ... SSD, HDD) Server

2014 Storage Developer Conference. © Microsoft Corp. All Rights Reserved.

Storage Quality of Service for Enterprise Workloads

Tom Talpey Eno Thereska

Microsoft

Page 2: Storage Quality of Service -  · PDF fileStorage Quality of Service for Enterprise Workloads ... E.g. database tier vs web frontend vs E-mail ... SSD, HDD) Server

2014 Storage Developer Conference. © Microsoft Corp. All Rights Reserved.

Outline

Problem Statement Classification, Control and Costing Controller and Policies Demo Details Q&A

2

Page 3: Storage Quality of Service -  · PDF fileStorage Quality of Service for Enterprise Workloads ... E.g. database tier vs web frontend vs E-mail ... SSD, HDD) Server

2014 Storage Developer Conference. © Microsoft Corp. All Rights Reserved.

Background: Enterprise data centers

• General purpose applications • Application runs on several VMs

• E.g. a multi-tier service • Separate networks for VM-to-VM traffic and VM-to-Storage traffic

• Storage is virtualized

• Resources are shared

• Filesharing protocols deployed

Switch Switch Switch

S-NIC S-NIC

S-NIC NIC S-NIC NIC

VM VM VM Virtual

Machine vDisk

VM VM VM Virtual

Machine vDisk

2

Page 4: Storage Quality of Service -  · PDF fileStorage Quality of Service for Enterprise Workloads ... E.g. database tier vs web frontend vs E-mail ... SSD, HDD) Server

2014 Storage Developer Conference. © Microsoft Corp. All Rights Reserved.

The Storage problem

First things first In a datacenter, storage is a shared resource Many tenants (customers)

Contention between tenants in the datacenter Many workloads Many backend device types and throughputs Network resources are also shared Provisioning key to providing cost-effective SLAs

4

Page 5: Storage Quality of Service -  · PDF fileStorage Quality of Service for Enterprise Workloads ... E.g. database tier vs web frontend vs E-mail ... SSD, HDD) Server

2014 Storage Developer Conference. © Microsoft Corp. All Rights Reserved.

More storage problems

Storage throughput (capacity) changes with time Storage demand also changes with time SLAs do not Variety of storage protocols SMB3, NFS, HTTP, …

5

Page 6: Storage Quality of Service -  · PDF fileStorage Quality of Service for Enterprise Workloads ... E.g. database tier vs web frontend vs E-mail ... SSD, HDD) Server

2014 Storage Developer Conference. © Microsoft Corp. All Rights Reserved.

SMB3 challenges

Multichannel – Many to Many Many SMB3 sessions (flows) on one channel Many SMB3 channels used for each session

SMB Direct RDMA offload of bulk transfer – stack bypass

Live Migration Memory-to-memory with time requirements Starvation of other flows must be avoided

6

Page 7: Storage Quality of Service -  · PDF fileStorage Quality of Service for Enterprise Workloads ... E.g. database tier vs web frontend vs E-mail ... SSD, HDD) Server

2014 Storage Developer Conference. © Microsoft Corp. All Rights Reserved.

Service Level Agreement (SLA) Examples

Minimum guaranteed (“Not less than…”) Storage bandwidth Storage I/Os per second

Maximum allowed (“Not more than…”) Opportunistic use (“If available…”) Bandwidth/IOPS can use additional resources

when available (and authorized) All limits apply across each tenant’s/customer’s

traffic 7

Page 8: Storage Quality of Service -  · PDF fileStorage Quality of Service for Enterprise Workloads ... E.g. database tier vs web frontend vs E-mail ... SSD, HDD) Server

2014 Storage Developer Conference. © Microsoft Corp. All Rights Reserved.

Storage SLA Challenges

Diverse operations mix E.g. SMB3 Create, Read/Write, metadata Resources, latencies, IO size diversity

Dynamic, diverse workload E.g. database tier vs web frontend vs E-mail Small-random, large-sequential, metadata

Bi-directional Read vs write differing costs, resources

8

Page 9: Storage Quality of Service -  · PDF fileStorage Quality of Service for Enterprise Workloads ... E.g. database tier vs web frontend vs E-mail ... SSD, HDD) Server

2014 Storage Developer Conference. © Microsoft Corp. All Rights Reserved.

Can’t We Just Use Network Rate Limits?

No. Consider: The network flow is a 4-tuple which names only

source and destination Does not distinguish:

Read from write Storage resources such as disk, share, user, etc Even deep packet inspection doesn't help

Can't go that deep – not all packets contain the full context Might be encrypted

SMB multichannel and connection sharing Many-to-many – mixing of adapters, users, sessions, etc

RDMA Control and data separated, and offloaded

9

Page 10: Storage Quality of Service -  · PDF fileStorage Quality of Service for Enterprise Workloads ... E.g. database tier vs web frontend vs E-mail ... SSD, HDD) Server

2014 Storage Developer Conference. © Microsoft Corp. All Rights Reserved.

Classification – Storage “Flow”

1. Classify storage traffic into Flow buckets Pre-defined “tuple” of higher-layer attributes

Can have any number of elements, and wildcards Can therefore apply across many connections, users, etc.

For example (SMB3 and virtual machine): VM identity \\Servername\Sharename Filename Operation type

2. Identify each operation as member of Flow 3. Apply policy to Flow 10

Page 11: Storage Quality of Service -  · PDF fileStorage Quality of Service for Enterprise Workloads ... E.g. database tier vs web frontend vs E-mail ... SSD, HDD) Server

2014 Storage Developer Conference. © Microsoft Corp. All Rights Reserved.

Elements of an IO classification

Simple classifier tuple: { VM4 , \\share\dataset }

VM1

VM2

VM3

Application

VM4

SMBc

Physical NIC

Network driver

Physical NIC

SMBs`

Filesystem

Network driver

Diskdriver

Compute Server Storage Server

GuestOS

Hypervisor

Filesystem

Blockdevice

VHD

One IO path (VM-to-Storage)

Share appears as a block device

Z: (\\share\dataset)

Block device maps to a VHD

\\serverX\RM79.vhd

Maps to packets Dst IP: ServerX’s IP

Dst port: 445

VHD file maps to drive

H:\RM79.vhd

Maps to device \\device\ssd5

Page 12: Storage Quality of Service -  · PDF fileStorage Quality of Service for Enterprise Workloads ... E.g. database tier vs web frontend vs E-mail ... SSD, HDD) Server

2014 Storage Developer Conference. © Microsoft Corp. All Rights Reserved.

Control (Rate Limiting)

Apply limits (max) / guarantees (min) to each flow

Rate-control operations to meet these Control at: Initiator (client) Target (server) Any point in storage stack

12

Page 13: Storage Quality of Service -  · PDF fileStorage Quality of Service for Enterprise Workloads ... E.g. database tier vs web frontend vs E-mail ... SSD, HDD) Server

2014 Storage Developer Conference. © Microsoft Corp. All Rights Reserved.

Rate Limiting along the Flow

Simple classifier tuple: { VM4 , \\share\dataset }

VM1

VM2

VM3

Application

VM4

SMBc

Physical NIC

Network driver

Physical NIC

SMBs`

Filesystem

Network driver

Diskdriver

Compute Server Storage Server

GuestOS

Hypervisor

Filesystem

Blockdevice

VHD

One IO path (VM-to-Storage)Storage points

of control

Network points of control

Page 14: Storage Quality of Service -  · PDF fileStorage Quality of Service for Enterprise Workloads ... E.g. database tier vs web frontend vs E-mail ... SSD, HDD) Server

2014 Storage Developer Conference. © Microsoft Corp. All Rights Reserved.

Control based on simple Classification?

Again, no Taking the example SLA <VM 4, \\share\dataset> --> Bandwidth B With contention from other VMs

Comparing three strategies By “payload” – actual request size By “bytes” – read and write size By “IOPS” – operation rate

14

Page 15: Storage Quality of Service -  · PDF fileStorage Quality of Service for Enterprise Workloads ... E.g. database tier vs web frontend vs E-mail ... SSD, HDD) Server

2014 Storage Developer Conference. © Microsoft Corp. All Rights Reserved.

Classification-only Rate Limit Fairness?

15

VM 1 VM 4

8KB Writes 8KB Reads

By payloads- reads beat writes by dominating queue

VM 2 VM 4

8KB Writes 8KB Reads

By bytes – writes beat reads by dominating (e.g. SSD) expense

VM 3 VM 4

8KB Writes 64KB Reads

By IOPS - large beats small by dominating bandwidth

Page 16: Storage Quality of Service -  · PDF fileStorage Quality of Service for Enterprise Workloads ... E.g. database tier vs web frontend vs E-mail ... SSD, HDD) Server

2014 Storage Developer Conference. © Microsoft Corp. All Rights Reserved.

Solution - Cost

Compute a cost to each operation within a flow, and to a flow itself “Controller” function in the network constructs empirical cost models

based on: Device type (e.g. RAM, SSD, HDD)

Server property Workload characteristics (r/w ratio, size)

“Normalization” for large operations Client property

Bandwidths and latencies Network and storage property

Any other relevant attribute Cost model is assigned to each rate limiter Cost is managed globally across flows by distributed rate limiters

16

Page 17: Storage Quality of Service -  · PDF fileStorage Quality of Service for Enterprise Workloads ... E.g. database tier vs web frontend vs E-mail ... SSD, HDD) Server

2014 Storage Developer Conference. © Microsoft Corp. All Rights Reserved.

Cost of operations along the Flow

Simple classifier tuple: { VM4 , \\share\dataset }

VM1

VM2

VM3

Application

VM4

SMBc

Physical NIC

Network driver

Physical NIC

SMBs`

Filesystem

Network driver

Diskdriver

Compute Server Storage Server

GuestOS

Hypervisor

Filesystem

Blockdevice

VHD

One IO path (VM-to-Storage)

$Client

$Net

$Server

$$SSD

Cost functions

Page 18: Storage Quality of Service -  · PDF fileStorage Quality of Service for Enterprise Workloads ... E.g. database tier vs web frontend vs E-mail ... SSD, HDD) Server

2014 Storage Developer Conference. © Microsoft Corp. All Rights Reserved.

Outline

Problem Statement Classification, Control and Costing Controller and Policies Demo Details Q&A

18

Page 19: Storage Quality of Service -  · PDF fileStorage Quality of Service for Enterprise Workloads ... E.g. database tier vs web frontend vs E-mail ... SSD, HDD) Server

2014 Storage Developer Conference. © Microsoft Corp. All Rights Reserved.

Demo of storage QoS prototype

Four tenants: Red, Blue, Yellow, Green Each tenant

Rents 30 virtual machines on 10 servers that access shared storage

Pays the same amount Should achieve the same IO performance

Tenants have different workloads Red tenant is aggressive: generates more

requests/second

Page 20: Storage Quality of Service -  · PDF fileStorage Quality of Service for Enterprise Workloads ... E.g. database tier vs web frontend vs E-mail ... SSD, HDD) Server

2014 Storage Developer Conference. © Microsoft Corp. All Rights Reserved.

Demo setup

Mellanox ConnectX-3 40Ge RNIC (RoCE) 120 one-Core VMs across 10 Hyper-V servers - Run IoMeter 64K Read from 3GB VHD

MellanoxMSX1036B-1SFR

40Ge (RoCE)

Hyper-V Servers X10

Storage ServerDell R720 384GB

Dell R720 16XCPU

Page 21: Storage Quality of Service -  · PDF fileStorage Quality of Service for Enterprise Workloads ... E.g. database tier vs web frontend vs E-mail ... SSD, HDD) Server

2014 Storage Developer Conference. © Microsoft Corp. All Rights Reserved.

Things to look for

Enforcing storage QoS across 4 competing tenants

Aggressive tenant under control Inter-tenant work conservation

Bandwidth released by idle tenant given to active tenants

Intra-tenant work conservation Bandwidth of tenant’s idle VMs given to its active

VMs

Page 22: Storage Quality of Service -  · PDF fileStorage Quality of Service for Enterprise Workloads ... E.g. database tier vs web frontend vs E-mail ... SSD, HDD) Server

2014 Storage Developer Conference. © Microsoft Corp. All Rights Reserved.

Demo: Red tenant dominates!

Tenant performance is not proportional to its

payment

Meg

aByt

es/s

econ

d

Time (seconds)

By being aggressive, red tenant gets

better performance

Page 23: Storage Quality of Service -  · PDF fileStorage Quality of Service for Enterprise Workloads ... E.g. database tier vs web frontend vs E-mail ... SSD, HDD) Server

2014 Storage Developer Conference. © Microsoft Corp. All Rights Reserved.

Research Prototype: With Storage QoS

Tenant performance is proportional to its payment

Meg

aByt

es/s

econ

d

Time (seconds)

Algorithm notices red

tenant’s performance

Tenants get equal

performance

Page 24: Storage Quality of Service -  · PDF fileStorage Quality of Service for Enterprise Workloads ... E.g. database tier vs web frontend vs E-mail ... SSD, HDD) Server

2014 Storage Developer Conference. © Microsoft Corp. All Rights Reserved.

Demo

24

Page 25: Storage Quality of Service -  · PDF fileStorage Quality of Service for Enterprise Workloads ... E.g. database tier vs web frontend vs E-mail ... SSD, HDD) Server

2014 Storage Developer Conference. © Microsoft Corp. All Rights Reserved.

Behind the scenes: how did it work?

Data-plane Queues + Centralized controller

Programmable queues along the IO path

Controller

IO Packets

...

Queue nQueue 1

Q Q Q Q Q Q Q Q

Q Q Q

High-level SLA

Page 26: Storage Quality of Service -  · PDF fileStorage Quality of Service for Enterprise Workloads ... E.g. database tier vs web frontend vs E-mail ... SSD, HDD) Server

2014 Storage Developer Conference. © Microsoft Corp. All Rights Reserved.

Data plane queues

1. Classification [IO Header -> Queue]

2. Queue servicing [Queue -> <rate, priority, size>]

IO Packets

...

Queue nQueue 1

Page 27: Storage Quality of Service -  · PDF fileStorage Quality of Service for Enterprise Workloads ... E.g. database tier vs web frontend vs E-mail ... SSD, HDD) Server

2014 Storage Developer Conference. © Microsoft Corp. All Rights Reserved.

Controller: Distributed, dynamic enforcement

• SLA needs per-VM enforcement • Need to control the aggregate rate of

VMs 1-4 that reside on different physical machines

• Static partitioning of bandwidth is sub-optimal

<{Red VMs 1-4}, *, * //share/dataset> --> Bandwidth 40 Gbps

27

VM VM VM VM VM VM VM VM

40Gbps

Page 28: Storage Quality of Service -  · PDF fileStorage Quality of Service for Enterprise Workloads ... E.g. database tier vs web frontend vs E-mail ... SSD, HDD) Server

2014 Storage Developer Conference. © Microsoft Corp. All Rights Reserved.

Work-conserving solution

• VMs with traffic demand

should be able to send it as long as the aggregate rate does not exceed 40 Gbps

• Solution: Max-min fair sharing

28

VM VM VM VM VM VM VM VM

Page 29: Storage Quality of Service -  · PDF fileStorage Quality of Service for Enterprise Workloads ... E.g. database tier vs web frontend vs E-mail ... SSD, HDD) Server

2014 Storage Developer Conference. © Microsoft Corp. All Rights Reserved.

Max-min fair sharing (intuition)

Well-accepted notion of allocating a shared resource Basic idea: Maximize the minimum allocation

Formally, VMs [1..n] sharing rate B Demand Di for VM i Max-min fair share f ensures that

if the rate allocated to VM i is Ri = min(f, Di) then ∑𝑅𝑅𝑅𝑅 ≤ 𝐵𝐵

No VM is given more than its

demand

Total allocated rate doesn’t exceed B

Page 30: Storage Quality of Service -  · PDF fileStorage Quality of Service for Enterprise Workloads ... E.g. database tier vs web frontend vs E-mail ... SSD, HDD) Server

2014 Storage Developer Conference. © Microsoft Corp. All Rights Reserved.

Distributed rate limiting algorithm

Allocate rate to VMs based on their demand No VM get a rate higher than its demand VMs with unmet demand get the same rate

VM1 VM2 VM3 VM4 1

2

5 6

Unallocated Rate = 10 Unallocated VMs = 4

Current fair share = 10/4 = 2.5

2.5

Aggregate rate = 10

Unallocated Rate = 7 Unallocated VMs = 2

Current fair share = 7/2 = 3.5

3.5

1 2

VM1 VM2

3.5

3.5

VM3

VM4

VM

Dem

and Algorithm achieves a

max-min fair allocation

Page 31: Storage Quality of Service -  · PDF fileStorage Quality of Service for Enterprise Workloads ... E.g. database tier vs web frontend vs E-mail ... SSD, HDD) Server

2014 Storage Developer Conference. © Microsoft Corp. All Rights Reserved.

Max-min fair sharing (recap)

Well studied problem in networks Existing solutions are distributed

Each VM varies its rate based on congestion Converge to max-min sharing

Drawbacks: complex and requires congestion signal

But we have a centralized controller Converts to simple algorithm at controller

31

Page 32: Storage Quality of Service -  · PDF fileStorage Quality of Service for Enterprise Workloads ... E.g. database tier vs web frontend vs E-mail ... SSD, HDD) Server

2014 Storage Developer Conference. © Microsoft Corp. All Rights Reserved.

Controller decides where to enforce

32

SLA constraints Queues where resources shared Bandwidth enforced close to source Priority enforced end-to-end

Efficiency considerations Overhead in data plane ~ # queues Important at 40+ Gbps

Minimize # times IO is queued and distribute rate limiting load

VM VM VM VM VM VM VM VM

Page 33: Storage Quality of Service -  · PDF fileStorage Quality of Service for Enterprise Workloads ... E.g. database tier vs web frontend vs E-mail ... SSD, HDD) Server

2014 Storage Developer Conference. © Microsoft Corp. All Rights Reserved.

Centralized vs. decentralized control

Centralized controller allows for simple algorithms that focus on SLA enforcement and not on

distributed system challenges

Analogous to benefits of centralized control in software-defined networking (SDN)

33

Page 34: Storage Quality of Service -  · PDF fileStorage Quality of Service for Enterprise Workloads ... E.g. database tier vs web frontend vs E-mail ... SSD, HDD) Server

2014 Storage Developer Conference. © Microsoft Corp. All Rights Reserved.

Summary and takeaways

Storage QoS building blocks Traffic classification Rate limiters Logically centralized controller

34

Page 35: Storage Quality of Service -  · PDF fileStorage Quality of Service for Enterprise Workloads ... E.g. database tier vs web frontend vs E-mail ... SSD, HDD) Server

2014 Storage Developer Conference. © Microsoft Corp. All Rights Reserved.

References / Other Work

Predictable Data Centers http://research.microsoft.com/datacenters/

Storage QoS - IOFlow (SOSP 2013) http://research.microsoft.com/apps/pubs/default.aspx?id=198941

Network + storage QoS (OSDI 2014) http://research.microsoft.com/apps/pubs/default.aspx?id=228983

SMB3 Rate Limiter (Jose Barreto’s blog) http://smb3.info http://blogs.technet.com/b/josebda/archive/2014/08/11/smb3-

powershell-changes-in-windows-server-2012-r2-smb-bandwidth-limits.aspx

35