EEC 693/793
Special Topics in Electrical Engineering
Secure and Dependable Computing
Lecture 13
Wenbing Zhao
Department of Electrical and Computer Engineering
Cleveland State University
[email protected]@ieee.org
Spring 2008 EEC693: Secure & Dependable Computing Wenbing Zhao
Outline
• Reminder:
  – Midterm #2: April 7, Monday
• Event ordering
• Group communication systems
  – Ordered multicast
  – Techniques to implement ordered multicast
  – Membership protocols
• Reference:
  – Reliable Distributed Systems, by K. P. Birman, Springer; Chapters 14-16
Event Ordering
• “Time, Clocks, and the Ordering of Events in a Distributed System”, by Leslie Lamport, Communications of the ACM, July 1978, Volume 21, Number 7, pp. 558-565
  – What usually matters is not that all processes agree on exactly what time it is, but rather that they agree on the order in which events occur
Happens-Before Relation
• Assumptions:
  – The system is composed of a collection of processes; each process consists of a sequence of events
  – The events of a process form a sequence, where a occurs before b in this sequence if a happens before b
  – The sending or receiving of a message is an event in a process
Happens-Before Relation
• The happens-before relation “→” on the set of events of a system is the smallest relation satisfying the following three conditions:
  – If a and b are events in the same process, and a comes before b, then a → b
  – If a is the sending of a message by one process and b is the receipt of the same message by another process, then a → b
  – If a → b and b → c, then a → c
Partial Ordering
• Not all events are related by happens-before
• Two distinct events a and b are said to be concurrent if a ↛ b and b ↛ a
  – Neither event can causally affect the other
  – This introduces a partial ordering of events in a system with concurrently operating processes
• “a happens before b” means that information can flow from a to b
• “a is concurrent with b” means that there is no information flow between a and b
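The three happens-before conditions and the definition of concurrency can be sketched in a few lines of Python. This is only an illustrative sketch: the function names, the event labels and the three-process run are assumptions, not part of the lecture.

```python
from itertools import product

def happens_before(process_events, messages):
    """Return the set of pairs (a, b) such that a -> b.

    process_events: one list of events per process, in local order.
    messages: list of (send_event, receive_event) pairs.
    """
    hb = set()
    # Condition 1: earlier events of a process precede its later events.
    for events in process_events:
        for i, a in enumerate(events):
            for b in events[i + 1:]:
                hb.add((a, b))
    # Condition 2: sending a message precedes receiving it.
    hb.update(messages)
    # Condition 3: transitivity, repeated until no new pair appears.
    changed = True
    while changed:
        changed = False
        for (a, b), (c, d) in product(list(hb), repeat=2):
            if b == c and (a, d) not in hb:
                hb.add((a, d))
                changed = True
    return hb

def concurrent(a, b, hb):
    """Two distinct events are concurrent if neither happens before the other."""
    return a != b and (a, b) not in hb and (b, a) not in hb

# Hypothetical run: p1 = [a, b], p2 = [c, d], p3 = [e, f];
# a message sent at b is received at c, and one sent at d is received at f.
hb = happens_before([["a", "b"], ["c", "d"], ["e", "f"]],
                    [("b", "c"), ("d", "f")])
print(("a", "f") in hb)           # information can flow from a to f
print(concurrent("a", "e", hb))   # no information flow between a and e
```

The relation is stored as an explicit set of pairs, which is fine for a handful of events but grows quadratically; logical clocks avoid materializing it.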
How to Capture the Partial Ordering?
• Use logical clocks to capture the partial ordering
  – Define a clock Ci for each process Pi. Assign a number Ci(a) to any event a in that process
  – The entire system of clocks is represented by the function C, which assigns to any event b the number C(b), where C(b) = Cj(b) if b is an event in process Pj
  – The clocks Ci are logical clocks rather than physical clocks
Lamport Clock
• A Lamport logical clock is a monotonically increasing software counter
• Each process Pi keeps its own logical clock Ci to apply Lamport timestamps to events
• To capture the happens-before relation →, processes must do the following:
  – Before each event at Pi: Ci := Ci + 1
  – When Pi sends a message m, it piggybacks t = Ci
  – When Pj receives (m, t): Cj := max(Cj, t) + 1
• e → e’ ⇒ C(e) < C(e’)
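The clock rules above can be sketched as a small Python class. The class and method names and the three-process run are illustrative assumptions; the receive rule is taken to include the increment for the receive event itself, as written on the slide.

```python
class LamportClock:
    """Logical clock Ci of one process Pi (a sketch, not a full protocol)."""

    def __init__(self):
        self.time = 0

    def local_event(self):
        # Before each event: Ci := Ci + 1
        self.time += 1
        return self.time

    def send(self):
        # Sending is an event; the returned value is the piggybacked t.
        self.time += 1
        return self.time

    def receive(self, t):
        # On receiving (m, t): Cj := max(Cj, t) + 1
        self.time = max(self.time, t) + 1
        return self.time

# Hypothetical run with three processes and two messages.
p1, p2, p3 = LamportClock(), LamportClock(), LamportClock()
a = p1.local_event()   # local event at p1
b = p1.send()          # p1 sends m1
c = p2.receive(b)      # p2 receives m1
d = p2.send()          # p2 sends m2
e = p3.local_event()   # local event at p3
f = p3.receive(d)      # p3 receives m2
print(a, b, c, d, e, f)  # 1 2 3 4 1 5
```

Note that C(e) < C(b) here even though e and b are concurrent, so the implication only runs one way: a smaller timestamp does not prove that one event happened before the other.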
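Lamport Clock: An Example
[Figure: events of processes p1, p2, p3 plotted against physical time, labeled with Lamport timestamps. p1: a (1), b (2); p2: c (3), d (4); p3: e (1), f (5). Message m1 is sent at b and received at c; message m2 is sent at d and received at f.]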
Group Communication System
• Services provided by the GCS
  – Membership service: who is up and who is down
    • Deals with failure detection and more
  – Reliable, ordered multicast service
    • FIFO, causal, total
  – Virtual synchrony service
    • Virtual synchrony synchronizes membership changes with multicasts
• GCS is often used to build fault-tolerant systems
Reliable Multicast
• Reliable multicast – the message is targeted to multiple receivers, and all receivers receive the message reliably
  – Positive or negative acknowledgement
  – Need to avoid ack/nack implosion
• Distinguish receiving from delivery!
[Figure: the middleware layer receives a message from the network and later delivers it to the application layer.]
Ordered Reliable Multicast
• Ordered reliable multicast – if many messages are multicast by many senders, in what order are the messages delivered at the receivers?
  – First in, first out (FIFO)
  – Causal – the causal relationships among messages are preserved
  – Total – all messages are delivered at all receivers in the same order
FIFO Ordered Multicast
• FIFO or sender-ordered multicast: messages are delivered in the order they were sent (by any single sender)
[Figure: timelines of processes p, q, r, s exchanging multicasts a, b, c, d, e. Delivery of c to p is delayed until after b is delivered.]
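One common way to get FIFO delivery is to have each sender number its multicasts and have each receiver hold back messages that arrive ahead of a gap. The sketch below assumes that scheme; the class name, the sender labels and the sequence numbers are hypothetical.

```python
class FifoReceiver:
    """Receiver side of FIFO multicast with a hold-back queue per sender."""

    def __init__(self):
        self.next_seq = {}    # sender -> next sequence number expected
        self.held = {}        # sender -> {seq: message} received, not delivered
        self.delivered = []   # messages handed to the application, in order

    def receive(self, sender, seq, msg):
        expected = self.next_seq.get(sender, 0)
        if seq < expected:
            return  # duplicate of a message that was already delivered
        self.held.setdefault(sender, {})[seq] = msg
        # Deliver every message from this sender that is now in sequence.
        while expected in self.held[sender]:
            self.delivered.append(self.held[sender].pop(expected))
            expected += 1
        self.next_seq[sender] = expected

# Hypothetical arrival order at p: c overtakes b on the network.
p = FifoReceiver()
p.receive("q", 1, "c")   # received, but held back: b is still missing
p.receive("r", 0, "a")   # different sender, delivered at once
p.receive("q", 0, "b")   # fills the gap: b is delivered, then c
print(p.delivered)       # ['a', 'b', 'c']
```

The sketch also makes the receive/deliver distinction concrete: `receive` is called when a message arrives, while `delivered` records when the application sees it.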
Causally Ordered Multicast
• Causal or happens-before ordering: if send(a) → send(b), then deliver(a) occurs before deliver(b) at common destinations
[Figure, built up over four slides: timelines of processes p, q, r, s exchanging multicasts a, b, c, d, e.
  – Delivery of c to p is delayed until after b is delivered
  – e is sent (causally) after b
  – Delivery of e to r is delayed until after b and c are delivered]
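The slides do not say how causal delivery is enforced. One well-known technique, assumed here, is to stamp each multicast with a vector of per-sender delivery counts and to hold a message back until everything it causally depends on has been delivered. All names and the three-process scenario below are hypothetical.

```python
class CausalMember:
    """Group member that delivers multicasts in causal order (vector-clock sketch)."""

    def __init__(self, me, group):
        self.me = me
        self.vc = {p: 0 for p in group}   # multicasts delivered so far, per sender
        self.held = []                    # received but not yet deliverable
        self.delivered = []

    def send(self, msg):
        self.vc[self.me] += 1
        self.delivered.append(msg)        # the sender delivers its own message
        return (self.me, dict(self.vc), msg)

    def _deliverable(self, sender, stamp):
        # Next message from this sender, and no missing causal predecessors.
        return (stamp[sender] == self.vc[sender] + 1 and
                all(stamp[p] <= self.vc[p] for p in stamp if p != sender))

    def receive(self, packet):
        self.held.append(packet)
        progress = True
        while progress:                   # one delivery may unblock others
            progress = False
            for pkt in list(self.held):
                sender, stamp, msg = pkt
                if self._deliverable(sender, stamp):
                    self.held.remove(pkt)
                    self.vc[sender] += 1
                    self.delivered.append(msg)
                    progress = True

# Hypothetical scenario: q multicasts b; r delivers b and then multicasts e,
# so send(b) -> send(e). At p, e arrives first and must wait for b.
group = ["p", "q", "r"]
p, q, r = (CausalMember(name, group) for name in group)
pkt_b = q.send("b")
r.receive(pkt_b)
pkt_e = r.send("e")
p.receive(pkt_e)
held_before_b = len(p.held)
p.receive(pkt_b)
print(p.delivered)   # ['b', 'e']
```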
Totally Ordered Multicast
• Total ordering: messages are delivered in the same order to all recipients (including the sender)
[Figure: timelines of processes p, q, r, s exchanging multicasts a, b, c, d, e. All deliver a, b, c, d, then e.]
Implementing Total Ordering
• Use a token that moves around
  – Token has a sequence number
  – When you hold the token you can send the next burst of multicasts
• Use a sequencer to order all multicasts
  – Message is first multicast to all, including the sequencer; then the sequencer determines the order for the message and informs all
  – Or send to the sequencer, and the sequencer multicasts with total order information
  – Each sender can take a turn serving as the sequencer
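A minimal sketch of the second sequencer variant (send to the sequencer, which multicasts with total order information) is shown below. It ignores failures and sequencer rotation, and the class names are assumptions made for illustration.

```python
class Sequencer:
    """Assigns a global sequence number to each message it is handed."""

    def __init__(self):
        self.next_global = 0

    def order(self, msg):
        stamped = (self.next_global, msg)
        self.next_global += 1
        return stamped

class TotalOrderReceiver:
    """Delivers stamped messages strictly in global sequence order."""

    def __init__(self):
        self.next_deliver = 0
        self.held = {}        # global seq -> message, waiting for its turn
        self.delivered = []

    def receive(self, stamped):
        seq, msg = stamped
        self.held[seq] = msg
        while self.next_deliver in self.held:
            self.delivered.append(self.held.pop(self.next_deliver))
            self.next_deliver += 1

# Hypothetical run: the network reorders the stamped messages on the way to q.
sequencer = Sequencer()
stamped = [sequencer.order(m) for m in ["a", "b", "c"]]
p, q = TotalOrderReceiver(), TotalOrderReceiver()
for s in stamped:
    p.receive(s)
for s in reversed(stamped):
    q.receive(s)
print(p.delivered == q.delivered)   # both deliver a, b, c in the same order
```

A token-based scheme can reuse the same receiver: the sequence number travels with the token instead of living in one fixed process.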
Group Membership Service
• Input:
  – Process “join” events
  – Process “leave” events
  – Apparent failures
• Output:
  – Membership views for group(s) to which those processes belong
Issues?
• The service itself needs to be fault-tolerant
  – Otherwise our entire system could be crippled by a single failure!
  – Hence the Group Membership Service (GMS) must run some form of protocol (GMP)
Approach
• We’ll assume that GMS has members {p,q,r} at time t
• Designate the “oldest” of these as the protocol “leader”
  – To initiate a change in GMS membership, leader will run the GMP
  – Others can’t run the GMP; they report events to the leader
GMP Example
• Example:
  – Initially, GMS consists of {p,q,r}
  – Then q is believed to have crashed
[Figure: timelines of processes p, q, r.]
Unreliable Failure Detection
• Recall that failures are hard to distinguish from network delay
  – So we accept risk of mistake
  – If p is running a protocol to exclude q because “q has failed”, all processes that hear from p will cut channels to q
    • Avoids “messages from the dead”
  – q must rejoin to participate in GMS again
Basic GMP
• Someone reports that “q has failed”
• Leader (process p) runs a 2-phase commit protocol
  – Announces a “proposed new GMS view”
    • Excludes q, or might add some members who are joining, or could do both at once
  – Waits until a majority of members of current view have voted “ok”
  – Then commits the change
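The two phases can be sketched as a single function. This is a simplified model under stated assumptions: there is no real messaging, every member the leader can reach is assumed to vote “ok”, and the function and parameter names are hypothetical.

```python
def run_view_change(current_view, proposed_view, reachable, leader):
    """Return the committed view, or None if the leader must keep waiting.

    current_view: members of the current GMS view.
    proposed_view: the view the leader announces in phase 1.
    reachable: members of the current view that answer the leader.
    """
    # Phase 1: announce the proposed view and collect "ok" votes.
    # The leader counts its own vote; reachable members are assumed to agree.
    votes = {m for m in current_view if m == leader or m in reachable}
    majority = len(current_view) // 2 + 1
    if len(votes) < majority:
        return None   # cannot contact a majority of the current view
    # Phase 2: commit the change.
    return list(proposed_view)

v0 = ["p", "q", "r"]
# q is believed to have crashed; r answers the leader p.
v1 = run_view_change(v0, ["p", "r"], reachable={"r"}, leader="p")
print(v1)   # ['p', 'r']
# If nobody answers, p alone is not a majority of a 3-member view.
print(run_view_change(v0, ["p"], reachable=set(), leader="p"))   # None
```

The majority is computed over the current view, not the proposed one, which is what lets a 3-member view commit with p plus one more vote.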
GMP Example
• Proposes new view: {p,r} [-q]
• Needs majority consent: p itself, plus one more (“current” view had 3 members)
• Can add members at the same time
[Figure: timelines of processes p, q, r, starting from V0 = {p,q,r}. p sends “Proposed V1 = {p,r}”, receives “OK”, then sends “Commit V1”, after which V1 = {p,r}.]
Special Concerns?
• What if someone doesn’t respond?
  – P can tolerate failures of a minority of members of the current view
    • New first-round “overlaps” its commit:
      – “Commit that q has left. Propose add s and drop r”
  – P must wait if it can’t contact a majority
    • Avoids risk of partitioning
What If Leader Fails?
• Here we do a 3-phase protocol
  – New leader identifies itself based on age ranking (oldest surviving process)
  – It runs an inquiry phase
    • “The adored leader has died. Did he say anything to you before passing away?”
    • Note that this causes participants to cut connections to the adored previous leader
  – Then run normal 2-phase protocol but “terminate” any interrupted view changes leader had initiated
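The first two steps (picking the new leader by age ranking and running the inquiry) might look roughly like the sketch below. It is a rough model only: the view is assumed to be listed oldest first, replies are passed in as a dictionary instead of being gathered over the network, and all names are hypothetical.

```python
def takeover(view, failed, pending_by_member):
    """Return (new_leader, pending_proposals) after the old leader fails.

    view: members of the current view, oldest first (assumed ordering).
    failed: set of members believed to have crashed.
    pending_by_member: member -> proposed view it heard from the old leader.
    """
    survivors = [m for m in view if m not in failed]
    if not survivors:
        return None, []
    # Age ranking: the oldest surviving process becomes the leader.
    leader = survivors[0]
    # Inquiry phase: collect any view change the old leader had started,
    # so that it can be terminated before the normal 2-phase protocol runs.
    pending = []
    for member in survivors:
        proposal = pending_by_member.get(member)
        if proposal is not None and proposal not in pending:
            pending.append(proposal)
    return leader, pending

# Hypothetical case: p (the oldest) fails and nothing was pending.
leader, pending = takeover(["p", "q", "r"], {"p"}, {})
print(leader, pending)   # q []
```

After this step the new leader would run the normal 2-phase protocol, first finishing any proposal returned in `pending`.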
GMP Example
• New leader first sends an inquiry
• Then proposes new view: {r,s} [-p]
• Needs majority consent: q itself, plus one more (“current” view had 3 members)
• Again, can add members at the same time
[Figure: timelines of processes p, q, r, starting from V0 = {p,q,r}. The new leader sends “Inquire [-p]” and receives “OK: nothing was pending”; it then sends “Proposed V1 = {r,s}”, receives “OK”, and sends “Commit V1”, after which V1 = {r,s}.]