View
224
Download
0
Category
Preview:
Citation preview
8/7/2019 Replication Management using the State-Machine Approach
1/36
Replication Managementusing the State-Machine
ApproachFred B. Schneider
Summary and Discussion :
Hee Jung Kim and Ying Zhang
October 27, 2005
8/7/2019 Replication Management using the State-Machine Approach
2/36
Introduction State Machines
Fault Tolerance Fault-tolerant State Machines
Tolerating Faulty Output Devices
Tolerating Faulty Clients
Using Time to Make Request
Reconfiguration
8/7/2019 Replication Management using the State-Machine Approach
3/36
Why Replication ? Two kinds of replication are ..
State machine Approach is .. What can be discussed in each
sections
Introduction
8/7/2019 Replication Management using the State-Machine Approach
4/36
Ageneral method for implementing a fault-tolerant service by replicating servers and
coordinating client interactions with serverreplicas.
State-Machine Approach
8/7/2019 Replication Management using the State-Machine Approach
5/36
State machine consist of- State Variables- Commands.
Command might be implemented by
- Sharing data amongst procedures,
- Queuing requests
- Using interrupt handlers.
State Machines
8/7/2019 Replication Management using the State-Machine Approach
6/36
Requests from clients processed incausal order. O1: Requests issued by a single client
processed by smin the order they areissued
O2: r1could have caused r2 => r1processed by
smbefore r2
Assumption !
8/7/2019 Replication Management using the State-Machine Approach
7/36
Outputs of a state machine arecompletely determined by the
sequence of requests it processes,independent of time or any other
activity of a system
Semantic Characterization
8/7/2019 Replication Management using the State-Machine Approach
8/36
monitor: processdo true -> val:= sensor;
;delay D
odend monitor
Is this a state machine ?pc: state-machine
var q:real;
adjust: command(sensor-val: real)q := F(q, sensor-val);send q to actuator
end adjustend pc
No !!
Yes !!
8/7/2019 Replication Management using the State-Machine Approach
9/36
Byzantine failures:arbitrary and malicious
Failstop failures:other components [can] detect
that a failure has occurred
Fault Tolerance
8/7/2019 Replication Management using the State-Machine Approach
10/36
A system consisting of a set of distinctcomponents is t fault-tolerant if it
satisfies its specification provided thatno more than t of those componentsbecome faulty during some interval of
interest.
T Fault-Tolerance
8/7/2019 Replication Management using the State-Machine Approach
11/36
Replicate State Machines and run onseparate processors.
Each replica
Starts in the same initial state Executes same requests in the same order
Assuming independent failure Combine outputs of the replicas of thisensemble.
Fault-tolerant SM
8/7/2019 Replication Management using the State-Machine Approach
12/36
Replica CoordinationAll replicas receive and process the same
sequence of requests. Agreement :
Each Non-Fault replica receives every
request. Order : Each Non-Fault replica processes
the requests in the same relative order.
Fault-tolerant SM
8/7/2019 Replication Management using the State-Machine Approach
13/36
Any protocol that allows a designatedprocessor called the transmitterso that
IC1: All non-faulty processors agree on the
same value.
IC2: If the transmitter is non-faulty, thenall non-faulty processors use its value as theone on which they agree.
Agreement
8/7/2019 Replication Management using the State-Machine Approach
14/36
Order requirement can be satisfied by
Assigning unique ids to requests.
Processing the requests according to atotal ordering on the unique ids.
Order and Stability
8/7/2019 Replication Management using the State-Machine Approach
15/36
Order Implementation
A replica next processes the stablerequest with smallest unique ids.
Using Logical Clocks. Synchronized Real-Time Clocks.
Using Replica-Generated Identifiers.
8/7/2019 Replication Management using the State-Machine Approach
16/36
Using Logical Clocks
A logical clock is a mapping T fromevents to the integers.
LCl: Tp is incremented after eachevent at P.
LC2: Upon receipt of a message -with
timestamp ts, process p resets Tp,:Tp := max(Tp, ts) + 1.
8/7/2019 Replication Management using the State-Machine Approach
17/36
Using Logical Clocks
Assumption to property ofcommunication channels. FIFO channels between processors
Failure Detection Assumption (for fail-stop processors) : A processor pdetects
that a fail-stop processor qhas failed onlyafter phas received the last message sentto p by q.
8/7/2019 Replication Management using the State-Machine Approach
18/36
Logical Clocks Stability Test
Every client periodically makes some-possibly null-request to the state machine.
Request stable at smiif a request withlarger timestamp has been received fromevery client running on a non-faultyprocessor.
8/7/2019 Replication Management using the State-Machine Approach
19/36
Synchronized Real-time Clocks Tp(e) : the real-time clock at processor p
when event eoccurs.Unique id : Tp(e) appended by fixed bitstring that uniquely identifies p.
- O1 satisfied if only one request in betweensuccessive clock ticks
- O2 satisfied if degree on synchronizationis better than the minimum messagedelivery time.
8/7/2019 Replication Management using the State-Machine Approach
20/36
Synchronized Real-time Clocks
(contd) Real-time Clock Stability Test I
ris stable at smiexecuted at pif the local clock
at preads tsand uid(r) < ts td
Real Clock Stability Test IIris stable at smiif a request with larger uid has
been received from every client.
8/7/2019 Replication Management using the State-Machine Approach
21/36
Using Replica-Generated Ids. Unique ids assigned by the replicas
Two phase protocol
Replicas propose candidate unique ids
One candidate is selected Elaboration of the protocol
Seen : smihas seen ronce it has received rand
proposed a candidate unique id for it. Accepted: smihas accepted ronce it knows thefinal choice of uid(r).
8/7/2019 Replication Management using the State-Machine Approach
22/36
Using Replica-Generated Ids.
Constraints on the proposed
ids(cuid(smi,r)) UID1: cuid(smi,r) < = uid(r)
UID2: if rSEEN at smiafter rhas been
accepted then uid(r) < cuid(smi,r) Replica-Generated Id Stability Test:
rthat has been accepted by smiis stable providedthere is no request rthat has
i) Been seen by smi
ii) Not been accepted by smiiii) cuid(smi,r) < = uid(r)
8/7/2019 Replication Management using the State-Machine Approach
23/36
Using Replica-Generated Ids.
Replica-generated Unique Identifiers :smimaintains
SEENi: largest cuid(smi,r) so far assigned by smi ACCEPT i: largest uid(r) so far assigned by smi onreceipt of r cuid(smi,r) = max( ) + 1+ i
Disseminates cuid(smi,r) to other replicas, awaitsreceipt of a candidate uid from every non-faultyreplica.
uid(r) = maxj(cuid(smi,r))
8/7/2019 Replication Management using the State-Machine Approach
24/36
Outputs used outside system :
Use replicated voters and output devices.
Outputs used inside system :
the client need not gather a majority of
responses to its request to the state
machine. It can use the single responseproduced locally.
Tolerating Faulty Output Devices
8/7/2019 Replication Management using the State-Machine Approach
25/36
Replicate the client
- However, requires changes to state machinesthat handle requests from that client.
Defensive programming- Sometimes, a client cannot be made
fault-tolerant by using replication.- Careful design of state machine can limit the
effects of requests from faulty clients.
Tolerating Faulty Clients
8/7/2019 Replication Management using the State-Machine Approach
26/36
Assume that- All clients and state machine replicas have
clocks synchronized to within r, and
- Election starts at time strtand known to all clien
ts and state machine replicas. Transmitting a default vote
- If client has not made a request by time strt+ r,
then a request with that clients default vote hasbeen made.
Using Time to Make Request
8/7/2019 Replication Management using the State-Machine Approach
27/36
An ensemble of state machine replicas can tolerate more than t faults if it is possible to remove state machine replicas ru
nning on faulty processors from the ensemble and add replicas running on repairedprocessors.
Reconfiguration
8/7/2019 Replication Management using the State-Machine Approach
28/36
Combining Condition:
P(t) - F(t) > X for all 0
8/7/2019 Replication Management using the State-Machine Approach
29/36
Unbounded total number offaultsis possible if ..
Fl: Byzantine failures, removed faulty replica fromthe ensemble before the Combining Condition is
violated by subsequent processor failures.
F2: Replicas running on repaired processors areadded to the ensemble before the CombiningCondition is violated by subsequent processorfailures.
8/7/2019 Replication Management using the State-Machine Approach
30/36
Configuration
The configurationof the system is defined as:C: The clientsS: The state-machine replicas
O: The output devices
To change system configuration ..
- the value of C,S,O must be available- whenever C,S,O added, state must be updated
8/7/2019 Replication Management using the State-Machine Approach
31/36
Managing Configuration
A non -faulty configurator satisfies ..
C1: Only a faulty element is removedfrom the configuration.C2: Only a non-faulty element is added
to the configuration.
8/7/2019 Replication Management using the State-Machine Approach
32/36
Integration with Failstop Processorsand Logical Clocks
If eis a client or output device, then smi sends the statvariables to before sending any output with ids > rjoin.
If e is a state-machine replica, smnew, then smi:1. sends state variables and copies of any pendingrequests to smnew,2. sends sm
newsubsequent request r received from c
such that uid(r) < uid(rc), where rc is the first requestthat smnew received directly from cafter beingrestarted.
8/7/2019 Replication Management using the State-Machine Approach
33/36
Integration with Failstop Processorsand Realtime Clocks
If eis a client or output device, then smi sends the statevariables to before sending any output with ids > rjoin.
If e is a state-machine replica, smnew, then smi:1. sends state variables and copies of any pendingrequests to smnew,2. sends to sm
newevery request received
during the next interval of duration.
Simplified !!
8/7/2019 Replication Management using the State-Machine Approach
34/36
Stability RevisedWhen requests made by a client can be
received from two sources-the client and viaa relay.The stability test must be changed ..
Stability Test During Restart :
rreceived directly from cby a restarting
smnew is stable only after the last requestfrom crelayed by another processor hasbeen received by smnew
8/7/2019 Replication Management using the State-Machine Approach
35/36
State Machines approach is ..
Coping with failures (Byzantine, Failstop) ..
-. Fault-tolerant State Machines
-. Tolerating Faulty Output Devices
-. Tolerating Faulty Clients
Optimization :
- . Using time to requestDynamic reconfiguration
-. Managing the configuration-. Integrating a repaired object
Summary
8/7/2019 Replication Management using the State-Machine Approach
36/36
Thank you !!!
Any question ???
Recommended