University of Michigan Near-Optimal Latency Versus Cost Mosharaf Chowdhury, Harsha … ·...

Preview:

Citation preview

Near-Optimal Latency Versus Cost Tradeoffs in Geo-Distributed StorageMuhammed Uluyol, Anthony Huang, Ayush Goel,Mosharaf Chowdhury, Harsha V. Madhyastha

University of Michigan

Web Server

Data Site

Distribute Web Servers for Interactive Latency

2

Data Site

Distribute Data for Availability

3

Web Server

Data Site

Distribute Data for Availability and Latency

4

Web Server

Linearizability Imposes Unavoidable Trade-offs

5

● Read vs Write Latency

● Read Latency vs Cost

Linearizability Imposes Unavoidable Trade-offs

6

● Read vs Write Latency

● Read Latency vs Cost

Read

Overlap ensures consistency

Write

Linearizability Imposes Unavoidable Trade-offs

7

● Read vs Write Latency

● Read Latency vs Cost

Read

Overlap ensures consistency

Write

Linearizability Imposes Unavoidable Trade-offs

8

● Read vs Write Latency

● Read Latency vs Cost

Read

Write

Linearizability Imposes Unavoidable Trade-offs

9

● Read vs Write Latency

● Read Latency vs Cost

Linearizability Imposes Unavoidable Trade-offs

10

● Read vs Write Latency

● Read Latency vs Cost

Linearizability Imposes Unavoidable Trade-offs

11

● Read vs Write Latency

● Read Latency vs Cost

Linearizability Imposes Unavoidable Trade-offs

12

● Read vs Write Latency

● Read Latency vs Cost

How Do Existing Approaches Perform?

13

Low

est P

ossi

ble

Rea

d La

tenc

y

StorageOverhead Budget

Write Latency Budget

0

How Do Existing Approaches Perform?

14

Better

0

Low

est P

ossi

ble

Rea

d La

tenc

y

Write Latency BudgetStorage

Overhead Budget

How Do Existing Approaches Perform?

15

0

Low

est P

ossi

ble

Rea

d La

tenc

y

Write Latency BudgetStorage

Overhead Budget

Better

How Do Existing Approaches Perform?

16

Better

Better

0

Low

est P

ossi

ble

Rea

d La

tenc

y

Write Latency BudgetStorage

Overhead Budget

● EPaxos: state-of-the-art geo-replication protocol

How Do Existing Approaches Perform?

17

● EPaxos: state-of-the-art geo-replication protocol

● Compare with estimate of theoretical lower bound

How Do Existing Approaches Perform?

18

● EPaxos: state-of-the-art geo-replication protocol

● Compare with estimate of theoretical lower bound○ No particular protocol○ Respects consistency and

fault-tolerance constraints

How Do Existing Approaches Perform?

19

● EPaxos: state-of-the-art geo-replication protocol

● Compare with estimate of theoretical lower bound○ No particular protocol○ Respects consistency and

fault-tolerance constraints

How Do Existing Approaches Perform?

20

2⨉ storage forsame latency

2⨉ readlatency

● EPaxos: state-of-the-art geo-replication protocol

● Compare with estimate of theoretical lower bound○ No particular protocol○ Respects consistency and

fault-tolerance constraints

How Do Existing Approaches Perform?

21

2⨉ storage forsame latency

2⨉ readlatency

Core problem: replicationEach site stores a full copy

● Each site stores 1/kth of the data

● RS-Paxos: Paxos on erasure-coded data

Lowering Cost with Erasure Coding Utility of RS-Paxos

22

● Each site stores 1/kth of the data

● RS-Paxos: Paxos on erasure-coded data

Lowering Cost with Erasure Coding

23

1.5⨉ write latency for same read

1.5⨉ latency at same storage

● Each site stores 1/kth of the data

● RS-Paxos: Paxos on erasure-coded data

Lowering Cost with Erasure Coding

24

1.5⨉ write latency for same read

1.5⨉ latency at same storage

RS-Paxos Limitations

● Two-round writes● k-site intersection between quorums

Recap of the Problem

● Want to spread data across DCs, but constraints that impose trade-offs

● State-of-the-art falls short of the optimal

● Use erasure coding → hurts latency

25

● Two-round writesApproximates latency of one-round writes

● k-site intersection between quorums1-site intersection (common-case)

Pando: Near-Optimal Trade-off

26

Paxos Review

27

Paxos Review

● 2-Phase writes: first become leader

I am leader Ack.

30 ms 30 ms

28

Paxos Review

● 2-Phase writes: first become leader, then write

I am leader Ack. Write

data Ack.

30 ms 30 ms 30 ms 30 ms

29

Paxos Review

● 2-Phase writes: first become leader, then write

I am leader Ack. Write

data Ack.

30 ms 30 ms 30 ms 30 ms

30

One-round write protocol

Quickly Executing 2-Phase Writes

● Step 1: faster Phase 1○ Flexible Paxos [OPODIS’16]: need Phase 1, 2 quorums to intersect○ Phase 1 quorums need not overlap

Write data

30 ms

Ack.

30 ms

Ack.

10 ms

I am leader

10 ms

31

Quickly Executing 2-Phase Writes

● Step 1: faster Phase 1● Step 2: overlap latency cost of Phase 1 with Phase 2

○ RPC Chains [NSDI’09]: start Phase 2 at a delegate

32

30 ms

Ack.Write data

15 ms

Ack.

10 ms

I am leader

10 ms

● Two-round writesApproximates latency of one-round writes

● k-site intersection between quorums1-site intersection (common-case)

Pando: Near-Optimal Trade-off

33

Write to All

Phase 2

34

k=2

Write to All, Wait for Quorum

Phase 2

35

k=2

v2

v2

v2

v2

Write to All, Wait for Quorum

Phase 2

36

v2

v2

v1

v2

Read

Rare Case Common Casek=2

Read

v2

v2

v2

v2

Achieving 1-Site Intersection

Phase 2

37

v2

v2

v1

v2

Read

Rare Case Common Casek=2

Read

Maybea write finished

v2

v2

v2

v2

Achieving 1-Site Intersection

Phase 2

38

v2

v2

v1

v2

Read

Rare Case Common Casek=2

Read

Maybea write finished

● Two-round writesApproximates latency of one-round writes

● k-site intersection between quorums1-site intersection (common-case)

Pando: Near-Optimal Trade-off

39

● Two-round writesApproximates latency of one-round writes

● k-site intersection between quorums1-site intersection (common-case)

Pando: Near-Optimal Trade-off

40

See paper:● Correctness● Bounding latency under conflicts

Evaluation: Proximity to Lower Bound

● Access set: DCs hosting web servers reading/writing data

● MIP solver selects data sites to minimize latency

● 500 access sets

41

Sample access set

Measure gap

Pando is Close to the Lower Bound

42

11⨉ higher

13⨉ higher

Pando is Close to the Lower Bound

43

8⨉ higher

11⨉ higher

13⨉ higher

Pando is Close to the Lower Bound

44

8⨉ higher

11⨉ higher

13⨉ higherPotential for true 1-round writes

Pando is Close to the Lower Bound

45

8⨉ higher

11⨉ higher

13⨉ higherPotential for true 1-round writes

● Cloud deployment confirms solver latency estimates

● Up to 46% cost ($) savings

Conclusion

● Pando: linearizability across geo-distributed DCs

● Achieves a near-optimal read–write–storage trade-off○ Allow for erasure-code data to minimize cost○ Rethink how to use Paxos in the wide-area setting

46

Backup Slides

47

Deployment Latency

48

Latency Under Conflicts

49

Contributions of Each Technique

50

Throughput

51

Read Latency After Failure

52

Recommended