EEC 688/788
Secure and Dependable Computing
Lecture 14
Wenbing Zhao
Department of Electrical and Computer Engineering
Cleveland State University
[email protected]@ieee.org
04/18/23  EEC688/788: Secure & Dependable Computing  Wenbing Zhao
Outline
• Group communication systems
  – Ordered multicast
  – Techniques to implement ordered multicast
  – Group membership service
  – Agreed and safe delivery
• Checkpointing and recovery
• Reference:
  – Reliable Distributed Systems, by K. P. Birman, Springer; Chapters 14-16
Group Communication System
• Services provided by the GCS
  – Membership service: who is up and who is down
    • Deals with failure detection and more
  – Reliable, ordered multicast service
    • FIFO, causal, total
  – Virtual synchrony service
    • Virtual synchrony synchronizes membership changes with multicasts
• GCS is often used to build fault-tolerant systems
Reliable Multicast
• Reliable multicast: the message is targeted to multiple receivers, and all receivers receive the message reliably
  – Positive or negative acknowledgement
  – Need to avoid ack/nack implosion
• Distinguish receiving from delivery!
[Figure: Application and Middleware layers, with arrows labeled Receiving and Delivering]
Ordered Reliable Multicast
• Ordered reliable multicast: if many messages are multicast by many senders, in what order are the messages delivered at the receivers?
  – First in, first out (FIFO)
  – Causal: the causal relationship among messages is preserved
  – Total: all messages are delivered at all receivers in the same order
FIFO Ordered Multicast
• FIFO or sender-ordered multicast: messages are delivered in the order they were sent (by any single sender)
[Figure: timelines of processes p, q, r, s with multicasts a, b, c, d, e; delivery of c to p is delayed until after b is delivered]
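One common way to obtain FIFO delivery is for each sender to number its messages and for each receiver to hold back any message that arrives ahead of its predecessors. The sketch below is a minimal, single-threaded illustration of that idea, not code from the lecture; the class name `FifoReceiver`, the string sender identifiers, and the assumption that sequence numbers start at 0 are all hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical receiver-side FIFO layer: receiving and delivering are separate steps.
class FifoReceiver {
    // Next sequence number expected from each sender (assumed to start at 0).
    private final Map<String, Integer> nextExpected = new HashMap<>();
    // Messages received ahead of order, keyed by sender, then by sequence number.
    private final Map<String, Map<Integer, String>> heldBack = new HashMap<>();
    // Messages handed to the application, in delivery order.
    final List<String> delivered = new ArrayList<>();

    // Called when the middleware receives a message from the network.
    void receive(String sender, int seq, String payload) {
        int next = nextExpected.getOrDefault(sender, 0);
        if (seq < next) {
            return; // duplicate of a message that was already delivered
        }
        Map<Integer, String> pending =
                heldBack.computeIfAbsent(sender, s -> new HashMap<>());
        pending.put(seq, payload);
        // Deliver every message from this sender that is now in order.
        while (pending.containsKey(next)) {
            delivered.add(pending.remove(next));
            next++;
        }
        nextExpected.put(sender, next);
    }
}
```

If message c (sequence number 1) arrives before b (sequence number 0) from the same sender, c stays in the hold-back map until b has been received and delivered.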
Causally Ordered Multicast
• Causal or happens-before ordering: if send(a) → send(b), then deliver(a) occurs before deliver(b) at common destinations
[Figure: timelines of processes p, q, r, s with multicasts a and b]
[Figure, next build: multicast c is added; delivery of c to p is delayed until after b is delivered]
[Figure, next build: multicast e is added; delivery of c to p is delayed until after b is delivered; e is sent (causally) after b]
[Figure, final build: timelines of processes p, q, r, s with multicasts a, b, c, d, e; delivery of c to p is delayed until after b is delivered; delivery of e to r is delayed until after b and c are delivered]
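Causal delivery is often implemented with vector timestamps: a message is held back until every message that causally precedes it has been delivered locally. The sketch below assumes that technique (the slides do not say which mechanism is used), with hypothetical names `CausalProcess` and `Msg`, and processes identified by the integers 0..n-1.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical process using vector timestamps for causal delivery.
class CausalProcess {
    static final class Msg {
        final int sender;
        final int[] ts;       // sender's vector at send time
        final String payload;
        Msg(int sender, int[] ts, String payload) {
            this.sender = sender;
            this.ts = ts;
            this.payload = payload;
        }
    }

    private final int id;
    private final int[] clock; // clock[k] = number of messages from k delivered here
    private final List<Msg> heldBack = new ArrayList<>();
    final List<String> delivered = new ArrayList<>();

    CausalProcess(int id, int n) {
        this.id = id;
        this.clock = new int[n];
    }

    // Multicast: count our own message and stamp it; the sender delivers it at once.
    Msg send(String payload) {
        clock[id]++;
        delivered.add(payload);
        return new Msg(id, clock.clone(), payload);
    }

    // Receive from the network, then deliver everything that has become deliverable.
    void receive(Msg m) {
        heldBack.add(m);
        boolean progress = true;
        while (progress) {
            progress = false;
            Iterator<Msg> it = heldBack.iterator();
            while (it.hasNext()) {
                Msg c = it.next();
                if (deliverable(c)) {
                    it.remove();
                    clock[c.sender] = c.ts[c.sender];
                    delivered.add(c.payload);
                    progress = true;
                }
            }
        }
    }

    // Deliverable if it is the next message from its sender and we have already
    // delivered everything the sender had delivered when it sent the message.
    private boolean deliverable(Msg m) {
        for (int k = 0; k < clock.length; k++) {
            if (k == m.sender) {
                if (m.ts[k] != clock[k] + 1) return false;
            } else if (m.ts[k] > clock[k]) {
                return false;
            }
        }
        return true;
    }
}
```

For example, if process 1 sends c after delivering b from process 0, a third process that receives c first keeps it in `heldBack` until b arrives.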
Totally Ordered Multicast
• Total ordering: messages are delivered in the same order to all recipients (including the sender)
[Figure: timelines of processes p, q, r, s with multicasts a, b, c, d, e; all deliver a, b, c, d, then e]
Implementing Total Ordering
• Use a token that moves around
  – Token has a sequence number
  – When you hold the token you can send the next burst of multicasts
• Use a sequencer to order all multicasts
  – Message is first multicast to all, including the sequencer; then the sequencer determines the order for the message and informs all
  – Or send to the sequencer, and the sequencer multicasts with total order information
  – Each sender can take a turn serving as the sequencer
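The second sequencer variant (send to the sequencer, which multicasts with order information) can be sketched as follows. This is a simplified, single-threaded illustration under assumed names (`Sequencer`, `TotalOrderReceiver`); it ignores sequencer failure and network transport entirely.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sequencer: stamps each message with the next global sequence number.
class Sequencer {
    private long next = 0;

    long assign() {
        return next++;
    }
}

// Hypothetical receiver: delivers strictly in global sequence order.
class TotalOrderReceiver {
    private long nextToDeliver = 0;
    private final Map<Long, String> heldBack = new HashMap<>();
    final List<String> delivered = new ArrayList<>();

    void receive(long seq, String payload) {
        if (seq < nextToDeliver) {
            return; // already delivered
        }
        heldBack.put(seq, payload);
        // Deliver as long as the next message in the total order is available.
        while (heldBack.containsKey(nextToDeliver)) {
            delivered.add(heldBack.remove(nextToDeliver));
            nextToDeliver++;
        }
    }
}
```

Two receivers that get the same stamped messages in different network orders still deliver them in the same sequence, because each one waits for the gap to be filled.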
Group membership service
• Input:
  – Process “join” events
  – Process “leave” events
  – Apparent failures
• Output:
  – Membership views for group(s) to which those processes belong
Issues?
• The service itself needs to be fault-tolerant
  – Otherwise our entire system could be crippled by a single failure!
  – Hence the Group Membership Service (GMS) must run some form of group membership protocol (GMP)
Approach
• Assume that GMS has members {p,q,r} at time t
• Designate the “oldest” of these as the protocol “leader”
  – To initiate a change in GMS membership, the leader will run the GMP
  – Others can’t run the GMP; they report events to the leader
GMP Example
• Example:
  – Initially, GMS consists of {p,q,r}
  – Then q is believed to have crashed
[Figure: timelines of processes p, q, r]
Unreliable Failure Detection
• Recall that failures are hard to distinguish from network delay
  – So we accept the risk of a mistake
  – If p is running a protocol to exclude q because “q has failed”, all processes that hear from p will cut channels to q
    • Avoids “messages from the dead”
  – q must rejoin to participate in GMS again
Basic GMP
• Someone reports that “q has failed”
• Leader (process p) runs a 2-phase commit protocol
  – Announces a “proposed new GMS view”
    • Excludes q, or might add some members who are joining, or could do both at once
  – Waits until a majority of members of the current view have voted “ok”
  – Then commits the change
GMP Example
• Proposes new view: {p,r} [-q]
• Needs majority consent: p itself, plus one more (“current” view had 3 members)
• Can add members at the same time
[Figure: timelines of p, q, r. Labels in order: V0 = {p,q,r}; Proposed V1 = {p,r}; OK; Commit V1; V1 = {p,r}]
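The majority rule used by the leader before committing a view change can be written down directly. The helper below is only an illustrative sketch; `ViewChange`, `majority`, and `canCommit` are hypothetical names, and the leader's own vote is assumed to be included in the set of “ok” votes.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper for the commit decision in a view change.
class ViewChange {
    // Smallest number of votes that forms a strict majority of the current view.
    static int majority(int viewSize) {
        return viewSize / 2 + 1;
    }

    // The leader may commit once a majority of the CURRENT view has voted ok.
    static boolean canCommit(Set<String> currentView, Set<String> okVotes) {
        Set<String> valid = new HashSet<>(okVotes);
        valid.retainAll(currentView); // votes from non-members do not count
        return valid.size() >= majority(currentView.size());
    }
}
```

With a current view of {p,q,r}, `majority(3)` is 2, so p's own vote plus one more “ok” is enough to commit, which matches the example above.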
Special Concerns?
• What if someone doesn’t respond?
  – p can tolerate failures of a minority of members of the current view
    • New first round “overlaps” its commit:
      – “Commit that q has left. Propose add s and drop r”
  – p must wait if it can’t contact a majority
    • Avoids risk of partitioning
What If Leader Fails?
• Here we do a 3-phase protocol
  – New leader identifies itself based on age ranking (oldest surviving process)
  – It runs an inquiry phase
    • “The adored leader has died. Did he say anything to you before passing away?”
    • Note that this causes participants to cut connections to the adored previous leader
  – Then run the normal 2-phase protocol, but “terminate” any interrupted view changes the leader had initiated
GMP Example
• New leader first sends an inquiry
• Then proposes new view: {q,r} [-p]
• Needs majority consent: q itself, plus one more (“current” view had 3 members)
• Again, can add members at the same time
[Figure: timelines of p, q, r. Labels in order: V0 = {p,q,r}; Inquire [-p]; OK: nothing was pending; Proposed V1 = {q,r}; OK; Commit V1; V1 = {q,r}]
Safe and Agreed Delivery
• For totally ordered reliable multicast, there are two delivery policies
  – Safe delivery: a message is delivered only when all correct processes have received it
  – Agreed delivery: a message is delivered as long as it is the next message in total order
Safe and Agreed Delivery
• Safe delivery guarantees the uniformity of multicast:
  – If a message is delivered at any process, it is delivered at all correct processes
• Agreed delivery does not:
  – It is possible that a message is delivered at one (or more) processes, but is not delivered at some correct process
Checkpointing and Recovery
• Faults occur over time. How do we ensure that a fault-tolerant system remains operational for an extended period of time?
  – Recover failed replicas, or replace failed replicas with new ones => recovery is needed
• How do we recover a failed replica or install a new replica?
  – Checkpoint a correct replica and transfer the state to the recovering replica
Checkpointing
• Checkpointing: the act of taking a snapshot of an entity so that we can restore it later
• A replica is a process running in an operating system. The state of a process includes:
  – The process's memory, stack and registers
  – Threads
  – Open or mmap'ed files
  – Current working directory
  – Interprocess communication:
    • Semaphores, shared memory, pipes, sockets
  – Dynamically loaded libraries
  – …
Checkpointing
• Many tools are available to perform checkpointing transparently or semi-transparently
  – http://www.checkpointing.org/
  – Condor, libckpt, etc.
  – Checkpoints taken in general are not portable
  – Checkpoint size might be big
Checkpointing of Application State
• Sometimes it is more efficient to save and store the application state only
  – Checkpoints can be very portable and compact in size

    class Counter {
        int counter;                                  // the entire application state
        Counter(int initVal) { counter = initVal; }
        void increment() { counter++; }
        void decrement() { counter--; }
        void setState(int c) { counter = c; }         // restore from a checkpoint
        int getState() { return counter; }            // take a checkpoint
    }
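As a usage sketch, the `getState`/`setState` pair is all that is needed to bring a recovering replica up to date from a correct one. The `Counter` class is repeated from the slide so the example is self-contained; `CheckpointDemo` and `recoverFrom` are hypothetical names, and the state transfer is shown as a local call rather than a network message.

```java
// Counter as on the slide: its whole state is a single int.
class Counter {
    int counter;
    Counter(int initVal) { counter = initVal; }
    void increment() { counter++; }
    void decrement() { counter--; }
    void setState(int c) { counter = c; }
    int getState() { return counter; }
}

// Hypothetical helper showing application-level checkpoint and restore.
class CheckpointDemo {
    // Take a checkpoint of a correct replica and install it in a fresh replica.
    static Counter recoverFrom(Counter correctReplica) {
        int checkpoint = correctReplica.getState(); // compact, portable snapshot
        Counter recovering = new Counter(0);
        recovering.setState(checkpoint);            // state transfer
        return recovering;
    }
}
```

The checkpoint here is a single integer, which illustrates why application-level checkpoints can be far smaller and more portable than a snapshot of the whole process.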
Logging
• Logging of messages
  – Checkpointing in general is expensive
  – Logging of messages is cheaper
  => we can periodically do checkpointing, or do checkpointing on demand, and log all messages in between
• Logging of other non-deterministic activities
  – Access order to shared data
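The combination of an occasional checkpoint and a message log in between can be sketched with a counter-like replica. Everything here is an assumption made for illustration: the class `LoggedCounter`, the message strings "inc" and "dec", and the idea that the checkpoint and log survive a crash (in this sketch they are simply kept in memory).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical replica that checkpoints occasionally and logs messages in between.
class LoggedCounter {
    private int counter;                                 // volatile application state
    private int checkpoint;                              // last saved state (assumed stable)
    private final List<String> log = new ArrayList<>();  // messages since the checkpoint

    // Pure state transition; rejects unknown messages before anything is logged.
    static int apply(int state, String msg) {
        if (msg.equals("inc")) return state + 1;
        if (msg.equals("dec")) return state - 1;
        throw new IllegalArgumentException("unknown message: " + msg);
    }

    // Log the message, then update the state.
    void handle(String msg) {
        int next = apply(counter, msg);
        log.add(msg);
        counter = next;
    }

    // Taking a checkpoint lets the log be truncated.
    void takeCheckpoint() {
        checkpoint = counter;
        log.clear();
    }

    // Simulate a crash that loses the volatile state.
    void crash() {
        counter = 0;
    }

    // Recovery: restore the checkpoint, then replay the logged messages in order.
    void recover() {
        counter = checkpoint;
        for (String msg : log) {
            counter = apply(counter, msg);
        }
    }

    int getState() {
        return counter;
    }
}
```

Replay reproduces the pre-crash state only because the operations are deterministic, which is why other non-deterministic activities also have to be logged.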
Roll-Forward Recovery
• With replication in space, it is possible to recover from a fault while the system continues to make progress
• Roll-forward recovery is made possible by
  – Checkpointing of replica state
  – Logging of incoming messages
  – Reliable, totally ordered group communication system
Roll-Forward Recovery
• We want to ensure that a newly admitted replica has a state consistent with the others when it starts
• Steps of adding a new replica into a group (with on-demand checkpointing)
  – A recovered (or a new) replica joins a group
  – A join message is multicast in total order
  – On receiving the join message, an existing replica puts it into its incoming message queue, where it waits for processing
  – When the join message is at the head of the queue, a checkpoint is taken and transferred to the new replica
Roll-Forward Recovery
  – The new replica starts queueing messages after it receives the join message (sent by itself)
  – When the checkpoint is received by the new replica, its state is restored using the received checkpoint (the checkpoint is delivered out of order!)
  – The queued messages are then delivered in order at the new replica
  – Other replicas do not stop and wait for the new replica
• The steps of adding a new replica into a group with periodic checkpointing are similar
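The on-demand steps above can be sketched with two small classes. This is a simplified illustration under several assumptions: the state is a single integer, messages are the strings "inc", "dec", and "JOIN", both replicas see messages in the same total order, and the checkpoint is handed over by a direct method call. The names `JoiningReplica` and `ExistingReplica` are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.Queue;

// Hypothetical new (or restarted) replica.
class JoiningReplica {
    private int state;
    private boolean sawJoin = false;
    private boolean restored = false;
    private final Queue<String> queued = new ArrayDeque<>();

    static int apply(int s, String msg) {
        if (msg.equals("inc")) return s + 1;
        if (msg.equals("dec")) return s - 1;
        throw new IllegalArgumentException("unknown message: " + msg);
    }

    // Called for each message, in total order.
    void onMessage(String msg) {
        if (!sawJoin) {
            // Anything ordered before our own join is covered by the checkpoint.
            if (msg.equals("JOIN")) sawJoin = true;
            return;
        }
        if (!restored) {
            queued.add(msg);           // hold until the checkpoint arrives
        } else {
            state = apply(state, msg); // normal operation
        }
    }

    // The checkpoint arrives outside the total order.
    void onCheckpoint(int checkpoint) {
        state = checkpoint;
        restored = true;
        while (!queued.isEmpty()) {
            state = apply(state, queued.remove()); // deliver queued messages in order
        }
    }

    int getState() { return state; }
}

// Hypothetical existing replica: keeps running and checkpoints at the join position.
class ExistingReplica {
    private int state;

    ExistingReplica(int initial) { state = initial; }

    // Returns a checkpoint when the join message is processed, otherwise null.
    Integer onMessage(String msg) {
        if (msg.equals("JOIN")) return state;
        state = JoiningReplica.apply(state, msg);
        return null;
    }

    int getState() { return state; }
}
```

Because the checkpoint reflects exactly the messages ordered before the join, and the new replica applies exactly the messages ordered after it, both replicas end up in the same state without the existing replica ever pausing.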
Steps of Roll-Forward Recovery
[Figure (i): an existing replica (Checkpoint, op1, op2, Recovery_Start) and a new or restarted replica. Annotations: “Recovery_Start triggers queueing of messages”; “Recovery_Start is queued, just like a regular message”]
Steps of Roll-Forward Recovery
[Figure (ii): an existing replica (Checkpoint, op1, op2, Recovery_Start, op3) and a new or restarted replica. Annotation: “New message op3 is queued while waiting for reply of op2”]
Steps of Roll-Forward Recovery
[Figure (iii): an existing replica (Checkpoint, op1, op2, Recovery_Start, op3) and a new or restarted replica. Annotations: “op2 returns”; “Logged messages before Recovery_Start are consolidated and multicast”]
Steps of Roll-Forward Recovery
[Figure (iv): an existing replica and a new or restarted replica, each with Checkpoint, op1, op2, op3. Annotations: “Transferred log is expanded and the checkpoint is applied”; “Outgoing messages as a result of operations before Recovery_Start are suppressed”; “Normal operation is resumed and queued messages are delivered”]
Roll-backward Recovery
• Roll-backward recovery is used for systems relying on replication in time for fault tolerance
  – When a failure occurs, roll back using the most recent checkpoint (and retry)
Roll-backward Recovery in a Distributed System
• Performing roll-backward recovery in a distributed system is non-trivial
  – Need to solve the distributed snapshot problem
  – It is easy to perform a local checkpoint of a process, but in a distributed system, when one process rolls back, other processes must also roll back to a consistent state