Upload
others
View
4
Download
0
Embed Size (px)
Citation preview
Performance-Optimal Read-only Transactions
Haonan Lu★Siddhartha Sen✢, Wyatt Lloyd★
1
★Princeton University, ✢Microsoft Research
Distributed Storage SystemsEnable Today’s Web Services
2
Storage
Web
Jack
Mia
Jack’s Page
Friend Lists
Mia’s Page
LoadPage
FriendJack
Read
Write
Distributed Storage SystemsReads Dominate Workloads
3
Storage
Web
Jack
Mia
Jack’s Page
Friend Lists
Mia’s Page
LoadPage
FriendJack
Reads
Writes
Distributed Storage SystemsSimple Reads Are Insufficient
4
Storage
Web
Jack
Mia
Jack’s Page
Friend Lists
Mia’s Page
LoadPage
UnfriendMia
Read
Friends✔
Unfriended
New Page
New Page
Read
New Page
New page
Read-Only Transactions
5
• A group of simple reads sent in parallel
• Do not write data– Writes are allowed in the system
• Coordinate a consistent view across shards
Coordination overhead causes higher latency and lower throughput
6
Read-only transactionperformance as close
as possible to simple reads
Goal:
7
Read-only transactionperformance as close
as possible to simple reads
Goal:
We answer:• What does optimal performance mean for read-only transactions?
• When is optimal performance achievable?
• How can we design performance-optimal read-only transactions?
Performance FactorsEngineering vs. Algorithmic
8
EngineeringFactors
• Equally impact simple reads and read-only transactions
• Abstract engineering factors by comparing to simple reads
Hardware
Networking
Batching
Coordination AlgorithmicProperties
…
• Focus on the algorithmic properties due to coordination
9
Page
FriendsSimpleRead
Blocking
SimpleRead
Performance FactorsAlgorithmic Properties
AlgorithmicProperties
10
Page
FriendsSimpleRead
Messages
Blocking
SimpleRead
AlgorithmicProperties
Performance FactorsAlgorithmic Properties
11
Page
FriendsSimpleRead
Messages
Blocking
Metadata
SimpleRead
Timestamp
AlgorithmicProperties
Timestamp
Performance FactorsAlgorithmic Properties
12
SimpleRead
More Messages
Blocking
Metadata
SimpleRead
CoordinationOverhead
AlgorithmicProperties
Performance FactorsCoordination Is Algorithmic
13
SimpleRead Simple
Read
Performance-optimalRead-only
Transactions(N,O,C)
Blocking
Metadata
N
OC
Messages
Read-Only TransactionsOptimal Performance
AlgorithmicProperties
Non-Blocking Reads• Do not wait on external events– Distributed locks, timeouts, messages, etc.
• Lower latency– Avoid any time spent blocking
• Higher throughput– Avoid CPU cost of context switches
14
One-Round Communication• One-round on-path reads– Succeed in one round, i.e., no retries
• No off-path messages– Required by reads but off the critical path
• Lower latency– Avoids time for extra on-path messages
• Higher throughput– Avoids CPU cost of processing extra messages
15
Constant Metadata• Metadata– Information used to find a consistent view– Timestamps, transaction IDs, etc.
• Size of metadata remains constant regardless of contention
• Higher throughput– Avoids CPU cost of processing extra data
16
17
Performance-optimal read-only transactions are NOC:
Non-blocking messagesthat complete in One-round with Constant metadata
Strict Serializability• The strongest consistency model–Writing applications made easy
• Requires a total order + real-time order
18
Page
Friends
JackAdd Mia
New Page
New
Mia✔Mia
Done
DoneRead
ReadFriends✔
NewPage
19
The NOCS Theorem: Impossible for read-only
transaction algorithms to achieve performance-optimality [N,O,C]
and strict serializability [S]
Proof Intuition of NOCS
20
Svr-1
now
stable unstable
Coordination
Free
Coordination
Required
Finalized Write
Unfinalized WriteSvr-2
Svr-3
Svr-4
21
now
stable unstable
?
ROTXN
?
Must give up either N, O, or C
Svr-1
Svr-2
Svr-3
Svr-4
Proof Intuition of NOCS
NOC Designs
22
Weak Consistency
Strict Serializability
Process-order Serializability
Read Committed
Causal
Strong
Wea
k
MySQL Cluster
By the NOCS Theorem
Our new design: PORT
23
now
Svr-1
Svr-2
Svr-3
Svr-4Stable
Frontier(SF)
Design InsightCapturing the Stable Frontier
stable unstable
• A type of logical clock– Specialized for distributed storage systems
• Treat reads and writes differently– Enable optimizations for reads and writes
• Capture the stable frontier
24
Version Clock
25
StorageServer
WebClient
PORT Overview
Jack
26
Key A[AX]0 [AY]1 [AZ]2
Version Clock
PORT Overview
Jack
27
Key A[AX]0 [AY]1 [AZ]2
Version Clock
VersionStamp
(VS)
VS
PORT Overview
Jack
1
28
Key A
Write in PORT
[AX]0 [AY]2
12
Write A := AY
VS = 2
Version clocks tick on writes
“Done”
Jack
29
Key A
Read in Port
[AX]0 [AY]2 [AZ]5
12
Read A = ?VS = 2
No tick on reads
A = AY
Jack
30
Key A12
Read A = ?VS = 2
Read PromotionEnsures a Total Order
[AX]0 [ ? ]2
Jack
31
Key A12
Read A = ?VS = 2
A = AX
Read PromotionEnsures a Total Order
[AX]1 [AX]2[AX]0 Immutable
Jack
32
Key A
Read PromotionEnsures a Total Order
[AX]0à2 [AY]3
Write VS = 2A := AY
“Done”
Mia
12
33
Key A12
Read/Write
SFA = 3
Track Stable Frontier
SFA = 3SFB = 3SFC = 5
SF = ?SF = 3SF Map
3
Advance to stable frontier
[AX]0à2 [AY]3
Mia
34
Key A
13
JackSFA = 3SFB = 3SFC = 5
SF Map
Read-Only Transaction Logic
Key B
ReadA = ?
VS = 3
Read B = ?VS = 3
[AX]0 [AY]3 [AZ]7
[BX]0 [BY]1 [BZ]3
SF = 3
35
Key A
13
JackSFA = 3SFB = 3SFC = 5
SF Map
Read-Only Transaction Logic
Key B
ReadA = ?
VS = 3
Read B = ?VS = 3
[AX]0 [AY]3 [AZ]7
[BX]0 [BY]1 [BZ]3
SF = 3
36
Key A
13
Jack
SF Map
Read-Only Transaction Logic
Key A
[AX]0 [AY]3 [AZ]7
[BX]0 [BY]1 [BZ]3
A = AY, SFA
= 7
B = BZ , SFB = 3
SFB = 3SFC = 5
SFA = 3SFA = 7SF = 3
• Reading at the stable frontier ensures reads are non-blocking (N)
• Client pre-determined snapshot with VS ensures one-round communication (O)
• One VS per read request ensure constant metadata (C)
37
PORT Is NOC
PORT Systems• Scylla-PORT– Base system: ScyllaDB (non-transactional)
• Highly optimized à sensitive to overhead– NOC + Process-ordered serializability– Supports simple writes (not write transactions)
• Eiger-PORT– Base system: Eiger (N, O, C)
• Existing read-only and write transactions– NOC + Causal consistency– Supports write transactions
38
Evaluation of Scylla-PORT• To understand– Overhead in latency and throughput compared to
simple reads– Performance advantages compared to other
protocols, e.g., OCC.
• Experiment configuration– YCSB benchmark with customized parameters for
skew and read-to-write ratios– Evaluated latency, throughput, scalability, freshness
39
40
Latency-ThroughputUniform, 5% Writes
0.5
1.5
2.5
3.5
4.5
5.5
0 50 100 150 200 250
AverageLatency(ms)
Throughput (K Txn/s)
Scylla-PORTScylla-OCC
ScyllaDBHigherThroughput
LowerLatency
41
00.51
1.52
2.53
3.5
0 50 100 150 200
AverageLatency(ms)
Throughput (K Txn/s)
Scylla-OCCScylla-PORT
ScyllaDB
8%
Latency-ThroughputZipf = 0.99, 5% Writes
Conclusion• Performance-optimal read-only transactions: NOC
• The NOCS Theorem for read-only transactions– Impossible to have all of the NOCS properties
• The design of PORT– NOC with the strongest consistency to date
• Scylla-PORT– Minimum performance overhead compared to simple reads– Significantly outperforms the standard OCC
42
1
Contact InformationHaonan Lu