PLATO: Predictive Latency-Aware Total Ordering
Mahesh Balakrishnan
Ken Birman
Amar Phanishayee
Total Ordering
a.k.a. Atomic Broadcast: delivering messages to a set of nodes in the same order
Messages arrive at nodes in different orders…
… nodes agree on a single delivery order
Messages are delivered at nodes in the agreed order (a minimal interface sketch follows)
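The ordering contract can be written down as a small interface; the sketch below is illustrative only, and its names are mine rather than the paper's.

```python
# Illustrative interface for a totally ordered multicast layer (names are
# assumptions, not from the paper): every node delivers the same sequence
# of messages, whatever order the packets arrived in on the wire.
from abc import ABC, abstractmethod
from typing import Callable

class TotalOrderMulticast(ABC):
    @abstractmethod
    def multicast(self, msg: bytes) -> None:
        """Send msg to every node in the group."""

    @abstractmethod
    def set_deliver_callback(self, callback: Callable[[bytes], None]) -> None:
        """Register a callback invoked once per message, in the agreed order,
        which is identical at every node."""
```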
Modern Datacenters
Applications: e-tailers, finance, aerospace
Service-Oriented Architectures, Publish-Subscribe, Distributed Objects, Event Notification…
… Totally Ordered Multicast!
Hardware: fast, high-capacity networks; failure-prone commodity nodes
Total Ordering in a Datacenter
[Figure: a query and two updates directed at Inventory Service Replica 1 and Replica 2; the updates are totally ordered across the replicated service.]
Totally Ordered Multicast is used to consistently update Replicated Services.
Multicast latency determines system consistency: the faster updates are ordered, the sooner replicas converge.
Requirement: order multicasts consistently, rapidly, and robustly.
Multicast Wishlist
Low latency!
High (stable) throughput
Minimal, proactive overheads
Leverage hardware properties: hardware multicast/broadcast is fast but unreliable
Handle varying data rates: datacenter workloads have sharp spikes… and extended troughs!
State-of-the-Art
Traditional protocols: a conservative latency-overhead tradeoff
Example: Fixed Sequencer; simple and works well (a sketch of this baseline follows)
Optimistic Total Ordering: deliver optimistically, roll back if incorrect
Why this works: no out-of-order arrival in LANs
Optimistic total ordering for datacenters?
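A hedged sketch of the fixed-sequencer baseline named above: senders multicast data directly, one designated sequencer node multicasts sequence-number assignments, and every receiver delivers messages in sequence-number order. The class and method names here are illustrative, not taken from the paper.

```python
# Fixed-sequencer baseline (illustrative names, not the paper's code).
import heapq

class FixedSequencer:
    """Runs on the single sequencer node: assigns the global delivery order."""
    def __init__(self, send_to_all):
        self.send_to_all = send_to_all   # function that multicasts to the group
        self.next_seqno = 0

    def on_data(self, msg_id):
        self.send_to_all(("SEQ", self.next_seqno, msg_id))
        self.next_seqno += 1

class FixedSequencerReceiver:
    """Runs on every receiver: buffers data until its sequence number is known."""
    def __init__(self, deliver):
        self.deliver = deliver          # application delivery callback
        self.payloads = {}              # msg_id -> payload, received directly
        self.assignments = []           # min-heap of (seqno, msg_id) from the sequencer
        self.next_seqno = 0

    def on_data(self, msg_id, payload):
        self.payloads[msg_id] = payload
        self._drain()

    def on_sequence(self, seqno, msg_id):
        heapq.heappush(self.assignments, (seqno, msg_id))
        self._drain()

    def _drain(self):
        # Deliver while both the next sequence number and its payload are known.
        while (self.assignments
               and self.assignments[0][0] == self.next_seqno
               and self.assignments[0][1] in self.payloads):
            _, msg_id = heapq.heappop(self.assignments)
            self.deliver(self.payloads.pop(msg_id))
            self.next_seqno += 1
```

Every delivery waits for the sequencer's assignment to arrive, so this baseline always pays at least one extra network traversal of latency per message; that is the cost PLATO tries to avoid in the common case.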
PLATO: Predictive Ordering
In a datacenter, broadcast/multicast occurs almost instantaneously.
Most of the time, messages arrive in the same order at all nodes.
Some of the time, messages arrive in different orders at different nodes.
Can we predict out-of-order arrival?
Reasons for Disorder: Swaps
[Figure: two senders multicast through a pair of switches to two receivers. Receiver 1 receives Sender 1's message after Sender 2's message, while Receiver 2 receives Sender 2's message after Sender 1's message: a swap.]
Out-of-order arrival can occur when the inter-send interval between two messages is smaller than the diameter of the network. For example, if two senders multicast a few tens of microseconds apart and the network diameter is larger than that gap, the receiver near each sender sees its neighbor's message first.
Typical datacenter diameter: 50-500 microseconds
[Figure: successive frames showing packets A through H arriving into user-space over time t, illustrating in-order and out-of-order arrival sequences.]
Reasons for Disorder: Loss
Datacenter networks are over-provisioned: loss never occurs in the network.
Datacenter nodes are cheap: loss occurs due to end-host buffer overflows caused by CPU contention.
Emulab Testbed (Utah)
[Figure: the Utah Emulab testbed topology: Cisco 6509 and 6513 switches connected by 100 Mb, 1 Gb, 4 Gb, and 8 Gb links, with nodes ranging from 600 MHz to 3 GHz.]
Emulab2 test scenario: 2 switches of separation; one-way ping latency ~100 microseconds.
Emulab3 test scenario: 3 switches of separation; one-way ping latency ~110 microseconds.
Cornell Testbed
[Figure: the Cornell testbed topology: HP Procurve 4000M and 6108 switches connected by 100 Mb and 1 Gb links, with 1.3 GHz nodes.]
Cornell3 test scenario: 3 switches of separation; one-way ping latency ~70 microseconds.
Cornell5 test scenario: 5 switches of separation; one-way ping latency ~110 microseconds.
Disorder: Emulab3
At 2800 packets per sec, 2% of all packet pairs are swapped and 0.5% of packets are lost.
Percentage of swaps and losses goes up with data rate
Disorder
Predicting Disorder
Predictor: the inter-arrival time of consecutive packets into user-space (a small sketch follows)
Why? Swaps come from near-simultaneous multicasts, which produce a low inter-arrival time; losses come from kernel buffer overflow, which produces a sequence of low inter-arrival times.
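A minimal sketch, under my own assumptions, of the predictor described above: timestamp each packet as it reaches user-space and flag any packet that follows its predecessor within the threshold.

```python
# Minimal predictor sketch (assumptions mine): timestamp each packet on
# arrival into user-space and flag those that follow their predecessor
# within delta_us microseconds.
import time

class DisorderPredictor:
    def __init__(self, delta_us=128):   # 128 µs is an illustrative default
        self.delta_us = delta_us
        self.last_arrival_us = None

    def suspicious(self) -> bool:
        """Call once per received packet; True means disorder is possible."""
        now_us = time.monotonic_ns() // 1000
        close = (self.last_arrival_us is not None
                 and now_us - self.last_arrival_us < self.delta_us)
        self.last_arrival_us = now_us
        return close
```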
Predicting Disorder
95% of swaps, but only 14% of all packet pairs, have an inter-arrival time within 128 µsecs.
[Graph: inter-arrival time distributions for swapped pairs and for all pairs; Cornell datacenter, 400 multicasts/sec.]
Predicting Disorder
PLATO Design
Heuristic: if two packets arrive within Δ µsecs of each other, there is a possibility of disorder.
PLATO = heuristic + lazy fixed sequencer: when the heuristic works, latency is ~zero (Δ); when it fails, latency falls back to the fixed sequencer's.
PLATO Design
API: optdeliver, confirm, revoke
Ordering layer (sketched below):
Pending queue: packets suspected to be out-of-order, or queued behind suspected packets
Suspicious queue: packets optdelivered to the application but not yet confirmed
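A sketch of how the receive path and the two queues might fit together. The names optdeliver, confirm, revoke and the pending/suspicious queues come from the slides; the surrounding structure is my reconstruction, and reconciliation against the sequencer's order is sketched after the next slide.

```python
# PLATO receive-path sketch (structure is my reconstruction): packets that
# arrive too close to their predecessor, or behind a suspected packet, wait
# on the pending queue; everything else is optimistically delivered and
# parked on the suspicious queue until the sequencer confirms it.
from collections import deque

class PlatoOrderingLayer:
    def __init__(self, app, delta_us):
        self.app = app                  # application exposing optdeliver/confirm/revoke
        self.delta_us = delta_us
        self.last_arrival_us = None
        self.pending = deque()          # suspected, or queued behind a suspect
        self.suspicious = deque()       # optdelivered, not yet confirmed

    def on_packet(self, pkt, arrival_us):
        close = (self.last_arrival_us is not None
                 and arrival_us - self.last_arrival_us < self.delta_us)
        self.last_arrival_us = arrival_us
        if close or self.pending:
            # Possible swap or loss: hold until the sequencer's order arrives.
            self.pending.append(pkt)
        else:
            # Predicted in-order: deliver optimistically, remember until confirmed.
            self.app.optdeliver(pkt)
            self.suspicious.append(pkt)
```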
PLATO Design
[Animation walk-through over time t: optdeliver(A), optdeliver(E), optdeliver(B), optdeliver(D) place packets on the suspicious queue; T_C - T_D < Δ while T_E - T_A > Δ; the sequencer's message order is ABCD; revoke(D), setsuspect(D), setsuspect(C); revoke(E), setsuspect(E); finally confirm(A, B, C, D). Underlined packets in the pending queue are suspected. A reconciliation sketch follows.]
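One plausible reconciliation step for the sketch above (my reconstruction, not the authors' algorithm): when the lazy fixed sequencer's order arrives, matching optimistic deliveries are confirmed, mismatches are revoked, and held packets are delivered in the agreed order.

```python
# Reconciliation sketch (my reconstruction): apply the sequencer's
# authoritative order to a PlatoOrderingLayer from the earlier sketch.
def on_sequencer_order(layer, ordered_pkts):
    for pkt in ordered_pkts:
        if layer.suspicious and layer.suspicious[0] == pkt:
            # The optimistic delivery agreed with the sequencer.
            layer.suspicious.popleft()
        else:
            # Mismatch: roll back every still-unconfirmed optimistic delivery,
            # since its position is no longer trustworthy, then deliver this
            # packet in its agreed position. (Assumes pkt has already arrived;
            # loss recovery is out of scope for this sketch.)
            while layer.suspicious:
                layer.app.revoke(layer.suspicious.pop())
            if pkt in layer.pending:
                layer.pending.remove(pkt)
            layer.app.optdeliver(pkt)
        layer.app.confirm([pkt])
```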
Performance
[Graph: delivery latency, Fixed Sequencer vs. PLATO.]
At small values of Δ, delivery latency is very low, but there are more rollbacks.
Performance
The latency of both the Fixed Sequencer and PLATO decreases as throughput increases.
Performance
Traffic spike: PLATO's latency is insensitive to the data rate, while the Fixed Sequencer's latency depends on it.
Performance
Δ is varied adaptively in reaction to rollbacks (a sketch follows).
Latency is as good as with static Δ parameterization.
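A simple adaptation rule of the kind described above; the multiplicative back-off and slow decay are my own illustrative choices, not necessarily the paper's policy.

```python
# Illustrative adaptation of Δ (constants and policy are assumptions, not
# the paper's): widen the window after a rollback, shrink it slowly while
# optimistic deliveries keep being confirmed.
class AdaptiveDelta:
    def __init__(self, delta_us=50.0, min_us=10.0, max_us=1000.0):
        self.delta_us = delta_us
        self.min_us, self.max_us = min_us, max_us

    def on_rollback(self):
        # Be more conservative: suspect packets over a wider window.
        self.delta_us = min(self.max_us, self.delta_us * 2)

    def on_confirm(self):
        # Slowly become more aggressive again to keep latency low.
        self.delta_us = max(self.min_us, self.delta_us * 0.98)
```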
Conclusion
First optimistic total order protocol that predicts out-of-order delivery
Slashes ordering latency in datacenter settings
Stable at varying loads
Ordering layer of a time-critical protocol stack for datacenters