serena-herriott
1
Win XP/Vista/Win7+++
Win 2000
2
Bugs, Bugs and Bugs
3
Bugs: Run Time Handling
• Heisenbugs/Mandelbugs
– Heisenbugs are easier to take care of during run time
– Higher chance that robust programming mechanisms are successful
• Bohr bugs are typically easier to find and fix… at design time
• But harder to take care of during run time
We’ll cover schemes later that handle both types… though let’s try simple approaches first
4
Perturbation Classifications/Coverage
• Persistence
– Transient fault
– Intermittent fault
– Permanent fault
• Creation time
– Design fault
– Operational fault
• Intention
– Accidental fault
– Intentional fault
• Crash failure: fail-silent and fail-stop
• Omission failure
• Timing failure: system fails to respond within a specified time slice
– Both late and early responses might be “bad”
– Late timing failure = performance failure
• Arbitrary failure: system behaves arbitrarily
5
Robust Programming Mechanisms
Objective: Sustain the delivery of services despite perturbations!
• Process Pairs
• Graceful Degradation
• Selective Retry
• Checkpointing
• Rejuvenation
• Micro-reboots
• Recovery Blocks
• Diversity (NVP, NCP)
• ...
6
Process Pairs (Continual Service)
Implementation variants:
- Active replicas: both process client requests [+ fast; - complex]
- Primary/Backup: state transfer [+ simpler; - delay]
The client sends its request to the pair; as long as one of the two is correct, the client should get an answer. Variants:
(a) both process the request (active replication)
(b) only one processes the request and transfers its state to the other (primary/backup)
(b’) only one processes the request and does not update the state of the other: fast, but leads to state consistency problems later
7
Process Pairs
• Process pair scheme is robust to varied types of software faults (crashes, resource shortage/delays, load…):
– Study of print servers with process pair technology (primary/backup)
– 2000 systems; 10 million system hours
– 99.3% of failures affected only one server, i.e., 99.3% of failures were tolerated
8
Simple Process Pair (same host)
Server process (event loop):
...
forever {
    wait_for_request(Request);
    process_request(Request);
}
Only takes care of crash failures… watchdogs are needed to take care of hang failures etc.
9
Simple Process Pair (same host)
Server process (event loop):
int ft = backup();    // create backup process; primary returns
...
forever {
    wait_for_request(Request);
    process_request(Request);
}
Simplicity!! Just call it as a function
10
Simple Process Pair Implementation
[Figure: backup process supervising the event loop]
- Don’t forget that we are assuming that the backup has the “full” state info or that the needed state is stored on (external) stable storage
- Mostly focusing on crash failures… the primary can hang too… watchdog timers
- Transients are OK too, except that this model is at a basic concept level…
- State is lost during a crash; the hope is that all needed state is stored externally, e.g. in the file system
11
Syscalls
[Figure: timeline of the parent process and the kernel showing the fork and waitpid system calls]
12
man page: fork
fork() creates a child process that differs from the parent process only in its PID and PPID, and in the fact that resource utilizations are set to 0. File locks and pending signals are not inherited.
RETURN VALUE
On success, the PID of the child process is returned in the parent's thread of execution, and a 0 is returned in the child's thread of execution. On failure, a -1 will be returned in the parent's context, no child process will be created, and errno will be set appropriately.
ERRORS
EAGAIN  fork() cannot allocate sufficient memory to copy the parent's page tables and allocate a task structure for the child.
EAGAIN  It was not possible to create a new process because the caller's RLIMIT_NPROC resource limit was encountered.
ENOMEM  fork() failed to allocate the necessary kernel structures because memory is tight.
Don’t forget that there is a limit for the # of threads; else EAGAIN error!
13
man page: waitpid(pid, *status, options)
The waitpid() system call suspends execution of the current process until a child specified by the pid argument has changed state. By default, waitpid() waits only for terminated children. The value of pid can be:
< -1  meaning wait for any child process whose process group ID is equal to the absolute value of pid.
-1    meaning wait for any child process.
0     meaning wait for any child process whose process group ID is equal to that of the calling process.
> 0   meaning wait for the child whose process ID is equal to the value of pid.
waitpid(): on success, returns the process ID of the child whose state has changed; on error, -1 is returned.
ERRORS
- ECHILD  The process specified by pid does not exist or is not a child of the calling process. (This can happen for one's own child if the action for SIGCHLD is set to SIG_IGN. See also the LINUX NOTES section about threads.)
- EINTR   WNOHANG was not set and an unblocked signal or a SIGCHLD was caught.
- EINVAL  The options argument was invalid.
14
Simple Process Pair
int backup() {
    int ret, restarts = 0;        // restarts counts the number of child procs
    for (;; restarts++) {
        ret = fork();             // create child
        if (ret == 0) {           // child?
            return restarts;      // child returns
        }
        while (ret != waitpid(ret, 0, 0))   // parent waits for child to terminate
            ;
    }
}
Create child – child returns – parent waits for child to terminate
waitpid(PID, *status, options), fork
15
Robust?
16
Failed fork system call (looping?)...
int backup() {
    int ret, restarts = 0;
    for (;; restarts++) {
        ret = fork();             // returns -1 on error
        if (ret == 0) {           // child?
            return restarts;
        }
        while (ret != waitpid(ret, 0, 0))   // returns with -1: no child created
            ;
    }                             // parent does not return: retry until success
}
Loops... and creates new children... Implicit retry…
17
Problem: forked another child
...
fork()                   // fork non-terminating child
...
backup() ...
    fork();              // fails, returns -1
    waitpid(-1, 0, 0);   // waits for any child ... might not return
In backup(): ret = fork(); waitpid(ret, 0, 0)
18
Graceful Degradation
int backup() {
    int ret, restarts = 0;
    for (;; restarts++) {
        ret = fork();
        if (ret < 0) {            // process can run without backup: just return if fork fails
            log("backup: ...");
            return -1;
        }
        if (ret == 0) {           // child returns
            return restarts;
        }
        while (ret != waitpid(ret, 0, 0))
            ;
    }
}
Why not retry in backup? We should do that; it might help in association with the killer process! But we need a top-level retry mechanism!
19
Selective Retries
• Retries:
– repeat a call until it succeeds or until we run out of time (timeout) or reach the max. number of retries
• Selective Retries:
– repeat a call only when there is a chance that the retry can succeed
– e.g., memory shortage might disappear
– e.g., invalid argument will typically stay invalid
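The distinction can be sketched as a small errno classifier plus a bounded retry loop. This is an illustrative sketch only: `is_retryable`, `retry_selectively` and the demo operation `flaky_op` are hypothetical names, and which errors count as transient is an assumption that depends on the call being retried.

```c
#include <errno.h>

/* 1 if a later retry has a chance to succeed, 0 if the error is permanent. */
int is_retryable(int err) {
    switch (err) {
    case EAGAIN:   /* resource temporarily unavailable, e.g. process limit */
    case ENOMEM:   /* memory shortage might disappear */
    case EINTR:    /* call was interrupted by a signal */
        return 1;
    default:       /* e.g. EINVAL: an invalid argument stays invalid */
        return 0;
    }
}

/*
 * Calls op(arg) until it succeeds, fails with a permanent error, or
 * max_attempts is reached. Returns the number of attempts used on success
 * and -1 otherwise. op must return 0 on success and set errno on failure.
 */
int retry_selectively(int (*op)(void *), void *arg, int max_attempts) {
    for (int attempt = 1; attempt <= max_attempts; attempt++) {
        errno = 0;
        if (op(arg) == 0) {
            return attempt;
        }
        if (!is_retryable(errno)) {
            return -1;          /* retrying cannot help */
        }
    }
    return -1;                  /* gave up: retry budget exhausted */
}

/* Hypothetical demo operation: fails with EAGAIN until *remaining is 0. */
int flaky_op(void *arg) {
    int *remaining = arg;
    if (*remaining > 0) {
        (*remaining)--;
        errno = EAGAIN;
        return -1;
    }
    return 0;
}
```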
20
Not always clear if retry could succeed
fork() creates a child process that differs from the parent process only in its PID and PPID, and in the fact that resource utilizations are set to 0. File locks and pending signals are not inherited.
RETURN VALUE
On success, the PID of the child process is returned in the parent's thread of execution, and a 0 is returned in the child's thread of execution. On failure, a -1 will be returned in the parent's context, no child process will be created, and errno will be set appropriately.
ERRORS
- EAGAIN fork() cannot allocate sufficient memory to copy the parent's page tables and allocate a task structure for the child.
- EAGAIN It was not possible to create a new process because the caller's RLIMIT_NPROC resource limit was encountered.
- ENOMEM fork() failed to allocate the necessary kernel structures because memory is tight.
RLIMIT_NPROC: the maximum number of threads that can be created for the real user ID of the calling process. Upon encountering this limit, fork() fails with the error EAGAIN.
21
Selective Retries
int ft = backup();    // can fail but might succeed when more memory is available or there are fewer processes
...
forever {
    wait_for_request(Request);
    process_request(Request);
}
Infinite retries? Delays?
Selective retries: retry calls for which a retry might help (resource problems...). Do not retry all failed calls, e.g., those that failed due to invalid arguments.
Do not retry infinitely often: this might lead to unacceptable delays.
22
Selective Retries
int ft = backup();
...
forever {
    if (ft < 0) { ft = backup(); }    // retry if no backup
    wait_for_request(Request);
    process_request(Request);
}
• Might be a lot of retries... state might already be corrupted
• Might need too much processing for each request... potentially too much processing power taken away
• Exponential backoff: generally a good trade-off, as it balances overhead and eventual success of the retry
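One way to space out the retries is to double the waiting time after each failed attempt, up to a cap. The helper below is a hedged sketch; the name `backoff_delay_ms` and the choice of a hard cap are assumptions, not something the slides prescribe.

```c
/*
 * Delay before retry number `attempt` (0-based): base_ms * 2^attempt,
 * never more than cap_ms. The early return also prevents overflow.
 */
unsigned backoff_delay_ms(unsigned attempt, unsigned base_ms, unsigned cap_ms) {
    unsigned delay = base_ms;
    while (attempt-- > 0 && delay < cap_ms) {
        if (delay > cap_ms / 2) {
            return cap_ms;      /* doubling would pass the cap */
        }
        delay *= 2;
    }
    return delay < cap_ms ? delay : cap_ms;
}
```

In the event loop above, the server could remember when `backup()` last failed and only call it again once `backoff_delay_ms(failures, 100, 5000)` milliseconds have passed, so that most requests pay no retry overhead.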
23
Retry Questions...
• How often should we retry?
– should we wait between retries?
• Should we retry at some later point in time?
– how many times until we give up?
• At what level should we retry?
24
Hierarchical Retries
[Figure: function h calls an intermediate function, which calls function f; each level has its own retry]
• Potentially: exponential increase in retries!
• Composability: retries should be independent of each other!
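The blow-up is easy to reproduce with a toy model in which every level retries the level below it and the innermost call always fails. The function names and the fixed retry count per level are assumptions made for this sketch.

```c
static long leaf_calls = 0;

/*
 * A chain of `depth` nested functions. Each level retries the level below
 * it `retries` times; the innermost call (depth 0) always fails.
 */
int call_level(int depth, int retries) {
    if (depth == 0) {
        leaf_calls++;
        return -1;              /* persistent failure */
    }
    for (int i = 0; i < retries; i++) {
        if (call_level(depth - 1, retries) == 0) {
            return 0;
        }
    }
    return -1;
}

/* Number of innermost calls made before the outermost level gives up. */
long count_leaf_calls(int depth, int retries) {
    leaf_calls = 0;
    call_level(depth, retries);
    return leaf_calls;
}
```

With 3 retries per level the innermost function is called 3, 9 and 27 times for 1, 2 and 3 levels, i.e. retries^depth.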
25
Selective Retries
• Under high load, calls might fail due to resource shortage
• We can use selective retries to increase the probability of success during resource allocation
• Operating systems like Linux have a “killer process” that terminates processes if too few resources exist <check out the exact policy in the man pages!>
• With selective retries this will make sure that processes that survive can complete their requests
26
Bohrbugs
27
Continuous Crashing
28
Continuous Crashing
• Finite number of retries by client?
– client will stop sending the request eventually
• But what if we cannot control clients?
– clients might think it is fun to crash the server? DoS attacks take place like this!
What happens if the retried request activates Bohrbugs?
29
Graceful Degradation
• Alternative Approach:
– server needs to make sure that a failed request is only retried a fixed number of times
• Problem:
– how can we know that a request has already been partially processed several times?
• Solution:
– need to keep some state info between request instances!
30
State Handling(load & store application states)
31
Using Session State
int ft = backup();
...
forever {
    wait_for_request(Request);
    get_session_state(Request);
    if (num_retries < N) {
        process_request(Request);
        store_session_state(Request);
    } else { return_error(); }
}
The session state keeps track of (updates) the number of retries.
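A hedged sketch of how a retry counter could survive a crash of the serving process: the count lives in a small file, and the attempt is recorded before processing so that a crash in the middle of a request still counts. The file format, the helper names (`load_retries`, `store_retries`, `admit_request`) and the one-file-per-request layout are assumptions for illustration.

```c
#include <stdio.h>

/* Reads the attempt count for a request; a missing or bad file counts as 0. */
int load_retries(const char *path) {
    int n = 0;
    FILE *f = fopen(path, "r");
    if (f != NULL) {
        if (fscanf(f, "%d", &n) != 1 || n < 0) {
            n = 0;
        }
        fclose(f);
    }
    return n;
}

/* Writes the attempt count; returns 0 on success, -1 on failure. */
int store_retries(const char *path, int n) {
    FILE *f = fopen(path, "w");
    if (f == NULL) {
        return -1;
    }
    int ok = fprintf(f, "%d\n", n) > 0;
    if (fclose(f) != 0) {
        ok = 0;
    }
    return ok ? 0 : -1;
}

/*
 * Returns 1 if the request may be processed (and records the attempt),
 * 0 if it has already been attempted max_retries times, and -1 if the
 * state could not be stored.
 */
int admit_request(const char *path, int max_retries) {
    int attempts = load_retries(path);
    if (attempts >= max_retries) {
        return 0;               /* give up: answer with an error instead */
    }
    return store_retries(path, attempts + 1) == 0 ? 1 : -1;
}
```

After a request completes successfully, the server would delete or reset its state file so that later requests start with a fresh budget.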
32
Crash of Parent!
33
What if parent process dies?
Possible reasons:
• Operator might kill the wrong process
• Parent might terminate for some other reason, e.g.,
– Linux: out-of-memory process killer (see earlier slide!)
– Kills processes that use too much memory:
  • “more cpu time decreases the chance of being killed”
  • Parent could get killed
Normally we would expect that the parent does not crash (it just performs a waitpid), but...
34
Detecting Parent Crashes
35
Detection of Process Crashes
• Pipe used to communicate between procs
– Unix: ls | sort
• Pipe end closed when
– process terminates
• Process B can detect
– when process A terminated
36
Adding Parent Termination Detection
int fd[2];                    // pipe fd: fd[0] is the read end, fd[1] is the write end
int backup() {
    ...
    pipe(fd);
    ret = fork();
    if (ret == 0) {           // child?
        close(fd[1]);         // write end
        return restarts++;
    }
    // parent closes other end:
    close(fd[0]);             // read end
    ...
37
Child can detect parent termination
int hasParentTerminated() {    // has to be called periodically
    // check if other end of pipe has been closed
    ...
}
Pipe as detector; not completely satisfactory: the child is already in a state that might be corrupted! Would need to start the application with an option to say that it is in parent mode to jettison most state!
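One possible way to fill in the check, sketched under the assumption that the child holds only the read end of the pipe, that the read end has been switched to non-blocking mode, and that the parent never writes data into the pipe. The function names are hypothetical.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Switches a descriptor to non-blocking mode; returns 0 on success. */
int make_nonblocking(int fd) {
    int flags = fcntl(fd, F_GETFL);
    if (flags < 0) {
        return -1;
    }
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK);
}

/*
 * Returns 1 if every write end of the pipe has been closed (the process
 * holding it has terminated), 0 if a writer still exists, -1 on error.
 * read() reports end-of-file (0) only once no write end is open any more.
 */
int has_writer_terminated(int read_fd) {
    char byte;
    ssize_t n;
    do {
        n = read(read_fd, &byte, 1);
    } while (n < 0 && errno == EINTR);
    if (n == 0) {
        return 1;               /* EOF: the other end is gone */
    }
    if (n > 0 || errno == EAGAIN || errno == EWOULDBLOCK) {
        return 0;               /* data or "would block": writer is alive */
    }
    return -1;
}
```

The child would call `make_nonblocking(fd[0])` once after the fork and then poll `has_writer_terminated(fd[0])` periodically, for example once per request.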
38
Problem: State Corruption
39
Parent Replacement
[Figure: a child that has already executed requests replaces the parent]
• e.g., new parent allocated resources that are never freed
• new children will also have that corrupted state!
40
Alternative Approach
reinit may fail and might cost too much time... no ideal solution as far as I know... would stick with the first solution when possible: the parent might not fail that often
41
Process Links
Generalized Crash Detection
42
Linking Processes
• We can use a pipe as a failure detector:
– We can detect that a process has terminated
• We can use that for:
– Replacing failed processes
– Providing some “termination atomicity” (if one dies all die!)
• If one process fails, some other processes might not be able to work properly anymore
• One simple way is to terminate all such processes
• Garbage collection of processes
43
Process Links: “Termination Atomicity”
• Set of cooperating processes
• If some process p terminates, each linked process q must terminate
• We can link processes via “process links”:
– Programming language support – Java, Erlang, …
44
Pipe And Filter
45
Example: Farmer / Worker
[Figure: farmer (process pair) and worker processes]
46
Asymmetric Link Behavior
47
Master as Process Pair
Mitigates parent crash semantics by avoiding terminations where possible, for liveness
48
Error Recovery in Distributed Systems (DS)
Checkpointing
49
Handling Transients?
• Transient fault: a fault that is no longer present after system restart
• Many flavors:
– SW transients
– OS transients
– Middleware/Protocol transients
– Network transients
– Operational transients
– Power transients
• To recover from the effects of transients, we need to detect them! … let us assume simple local sanity checks (acceptance tests) exist!
50
So how does one handle these transients?
Objective:
- sustained ops (key driver: sustained performance)
- transparent handling of bugs (to users and application designers)
System Model: Coupled/Distributed/Networked Processes
51
Periodic Checkpointing
52
Checkpointing
pid_t parent = getpid();
...
for (int nxt_ckpt = 0;; nxt_ckpt--) {
    if (nxt_ckpt <= 0) {
        pid_t newparent = getpid();
        if (backup() >= 0 && parent != newparent) {
            kill(parent, SIGKILL);
            parent = newparent;
            nxt_ckpt = N;
        }
    }
    wait_for_request(Request);
    process_request(Request);
}
53
Backup Code Revisited
• Issue:
– If we have multiple generations, we want the ancestors to take over only if none of the children is alive
• Use process links instead of waitpid
– Waitpid in an endless loop is dangerous anyhow...
54
Temporal Redundancy
“Redo” tasks on error detection
[Figure: task progress of process P; a transient occurs (and is detected) at X, and the task is redone]
55
Backward Error Recovery
• Save process state at predetermined (periodic) recovery points
– Called “checkpoints”
– Checkpoints stored on stable storage, not affected by same failures
• Recover by rolling back to a previously saved (error-free) state
[Figure: task progress of process P with checkpoints (chkpt); a transient (X) is detected by an acceptance test and P rolls back to a checkpoint]
chkpt: complete set of (state) information needed to restart task execution from chkpt.
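A toy illustration of backward recovery, as a sketch under several assumptions: the task sums the integers 0..n-1, the "checkpoint" is just an in-memory copy standing in for stable storage, the acceptance test checks a closed-form invariant, and a single transient is injected at a chosen step. None of these details come from the slides.

```c
typedef struct {
    long sum;   /* running result */
    int next;   /* next value to add */
} TaskState;

/* Acceptance test: sum must equal 0 + 1 + ... + (next - 1). */
static int acceptance_test(const TaskState *s) {
    return s->sum == (long)s->next * (s->next - 1) / 2;
}

/*
 * Sums 0..n-1 with a checkpoint every ckpt_every steps. A transient that
 * corrupts the state once is injected at step fault_at (-1: no fault).
 * On a failed acceptance test the task rolls back to the last checkpoint
 * and redoes the lost work. *rollbacks receives the number of rollbacks.
 */
long run_with_rollback(int n, int fault_at, int ckpt_every, int *rollbacks) {
    TaskState s = {0, 0};
    TaskState ckpt = s;                 /* last error-free saved state */
    int fault_pending = (fault_at >= 0);
    *rollbacks = 0;
    while (s.next < n) {
        s.sum += s.next;
        if (fault_pending && s.next == fault_at) {
            s.sum += 1000;              /* injected transient */
            fault_pending = 0;
        }
        s.next++;
        if (s.next % ckpt_every == 0 || s.next == n) {
            if (acceptance_test(&s)) {
                ckpt = s;               /* take a new checkpoint */
            } else {
                s = ckpt;               /* roll back and redo */
                (*rollbacks)++;
            }
        }
    }
    return s.sum;
}
```

Because the fault is transient, the redone steps succeed and the final result matches the fault-free run; a permanent fault would fail the acceptance test again after every rollback.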
56
Advantages of Backward Recovery
+ Requires no knowledge of the errors in the system state
+ Can handle arbitrary / unpredictable faults (as long as they do not affect the recovery mechanism)
+ Can be applied regardless of the sustained damage (the saved state must be error-free, though)
+ General scheme / application independent
+ Particularly suitable for recovering from transient faults
57
Disadvantages of Backward Recovery
- Requires significant resources (e.g. time, computation, stable storage) for checkpointing and recovery
- Checkpointing requires
  – To identify consistent states
  – The system to be halted / slowed down temporarily
- Care must be taken in concurrent systems to avoid the orphan, lost-message and domino effects (will cover later in the lecture...)
58
Forward Error Recovery
• Detect the error
• Damage assessment
• Build a new error-free state from which the system can continue execution
– “Safe stop”
– Degraded mode
– Error compensation
• E.g., switching to a different component, etc…
[Figure: timeline: fault manifests, fault detected, damage assessment, state reconstruction]
59
Advantages of Forward Recovery
+ Efficient (time / memory)
– If the characteristics of the fault are well understood, forward recovery is a very efficient solution
+ Well suited for real-time applications
– Missed deadlines can be addressed
+ Anticipated faults can be dealt with in a timely way using redundancy
60
Disadvantages of Forward Recovery
- Application-specific
- Can only remove predictable errors from the system state
- Requires knowledge of the actual error
- Depends on the accuracy of error detection, potential damage prediction, and actual damage assessment
- Not usable if the system state is damaged beyond recoverability
61
Error Recovery
• Save process state at predetermined (periodic) recovery points
– Called “checkpoints”
– Checkpoints stored on stable storage, not affected by same failures
• Recover by rolling back to a previously saved (error-free) state
[Figure: task progress of process P with checkpoints (chkpt); a transient (X) is detected by an acceptance test and P rolls back to a checkpoint]
chkpt: complete set of (state) information needed to restart task execution from chkpt.
62
Logging Requests
request_no = 0;
...
for (int nxt_ckpt = 0;; nxt_ckpt--) {
    checkpoint(&nxt_ckpt);
    wait_for_request(Request);
    log_to_disk(++request_no, Request);
    process_request(Request);
}
63
Processing Log
request_no = 0;
...
for (int nxt_ckpt = 0;; nxt_ckpt--) {
    if (checkpoint(&nxt_ckpt) == recovery) {
        while ((request_no+1, R) in log) {
            process_request(R);
            request_no++;
        }
    }
    wait_for_request(Request);
    log_to_disk(++request_no, Request);
    process_request(Request);
}
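The replay step can be made concrete with a small in-memory model. This is a sketch, not the slides' implementation: the log is an array instead of a disk file, the state is a single number, and `process_request` is a hypothetical deterministic handler. Replay only reproduces the original state when request processing is deterministic.

```c
#define LOG_CAPACITY 64

typedef struct {
    int request_no;   /* number of requests already reflected in state */
    long state;       /* saved application state */
} Checkpoint;

typedef struct {
    int count;                    /* number of logged requests */
    long requests[LOG_CAPACITY];  /* request payloads, in arrival order */
} RequestLog;

/* Hypothetical deterministic handler: adds the request to the state. */
static long process_request(long state, long request) {
    return state + request;
}

/* Logs a request before it is processed; returns its number or -1 if full. */
int log_request(RequestLog *log, long request) {
    if (log->count >= LOG_CAPACITY) {
        return -1;
    }
    log->requests[log->count] = request;
    return ++log->count;
}

/* Restores the checkpoint and replays every request logged after it. */
long recover(const Checkpoint *ckpt, const RequestLog *log) {
    long state = ckpt->state;
    for (int no = ckpt->request_no; no < log->count; no++) {
        state = process_request(state, log->requests[no]);
    }
    return state;
}
```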
64
Problems: lost updates, corrupted saved states... not easy to fix!
• State diverges from original computation
– results of replayed request might be different
  • could detect this by keeping a log of replies
– new client request might not be processed correctly
  • e.g., ids in requests might not make sense to the current server instance
65
Frequency vs Completeness
• Less complete checkpoint
– higher probability that error is purged from saved state
– omitted state needs to be recomputed on recovery
• Less frequent checkpointing
– checkpoint becomes larger
– state information becomes stale
– …
• “Application save” is (in practice) very robust
– might not always contain all info (e.g., window position) for transparent restart
67
Distributed Systems: Checkpointing
So how does one place the chkpts & where?
Should we synchronize process-es(-ors) & checkpoints?
[Figure: timelines of processes P1, P2, P3]
Note: A system can be synchronous though the msg.-based comm. can still be async!
68
Options for Checkpoint Storage?
• Key building block: stable storage
– Persistent: survives the failure of the entity that created/initialized/used it
– Reliable: very low probability of losing or corrupting info
• Implementation
– Typically non-volatile media (disks)
– Single disk? Often replicated/multiple volatile memories
– Make sure at least one replica always survives!
69
Options for Checkpoint Placement?
• Uncoordinated: processes take checkpoints independently
– Pro: no delays
– Con: consistency?
• Coordinated: have processes coordinate before taking a checkpoint
– Pro: globally consistent checkpoints
– Con: coordination delays
• Communication-induced: checkpoint when receiving and prior to processing messages that may introduce conflicts
70
What happens when we don’t synchronize?
[Figure: two panels, “orphan msgs.” and “lost msgs.”, each showing timelines of P1 and P2 with checkpoints C1 and C2, a message Msg and a fault X]
Rollback to C1 & C2 gives an inconsistent state
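Whether a message becomes an orphan or is lost depends only on which of its two endpoints falls before the checkpoints. The sketch below assumes each process numbers its local events from 1 and that `ckpt[p]` holds the number of events of process p covered by its checkpoint; the type and function names are hypothetical.

```c
typedef struct {
    int sender;       /* index of the sending process */
    int receiver;     /* index of the receiving process */
    int send_event;   /* local event number of the send at the sender */
    int recv_event;   /* local event number of the receive at the receiver */
} Message;

typedef enum {
    MSG_BEFORE_CUT,   /* send and receive both recorded */
    MSG_AFTER_CUT,    /* neither recorded: simply happens again */
    MSG_LOST,         /* send recorded, receive not: never redelivered */
    MSG_ORPHAN        /* receive recorded, send not: "never sent" */
} MsgClass;

/* Classifies one message against a set of per-process checkpoints. */
MsgClass classify_message(const Message *m, const int ckpt[]) {
    int sent = m->send_event <= ckpt[m->sender];
    int received = m->recv_event <= ckpt[m->receiver];
    if (sent && received) {
        return MSG_BEFORE_CUT;
    }
    if (!sent && !received) {
        return MSG_AFTER_CUT;
    }
    return sent ? MSG_LOST : MSG_ORPHAN;
}

/* Counts the messages of a given class; a rollback is safe only if no
 * message is an orphan and lost messages are handled (e.g. logged). */
int count_class(const Message msgs[], int n, const int ckpt[], MsgClass c) {
    int count = 0;
    for (int i = 0; i < n; i++) {
        if (classify_message(&msgs[i], ckpt) == c) {
            count++;
        }
    }
    return count;
}
```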
71
..and more problems... domino effects
[Figure: timelines of P1 and P2; a fault X triggers a chain of rollbacks]
* problems are fixable, though they require considerable pre-planning
72
• P1 fails, recovers, rolls back to Ca
• P2 finds it received a message (mi) that was never sent, rolls back to Cb
• P3 finds it received a message (mj) that was never sent, rolls back to Cc
• …
[Figure: timelines of P1, P2, P3 with messages mi and mj; the recovery line runs through Ca, Cb, Cc. Boom!]
73
Consistent Checkpoints: No orphans, lost msgs or dominos!
[Figure: timelines of P1, P2, P3 crossed by a consistent cut]
All messages sent ARE recorded with a consistent cut!
74
Synchronizing Checkpoints (not the processors!)
• Processes coordinate (synchronize) to set checkpoints guaranteed to be consistent
– 2-Phase Consistent Checkpointing
Phase I: An initiator node X takes a “tentative” checkpoint and requests all other processes to set checkpoints. All processes inform X when they are willing to checkpoint.
Phase II: If all other processes are willing to checkpoint, then X decides to make its checkpoint permanent; otherwise X decides that all checkpoints shall be discarded. X informs all processes of the decision.
Either all or none take permanent checkpoints!
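The all-or-nothing decision can be sketched as a pure function over the votes collected by the initiator. Message passing, timeouts and failures of the initiator are left out; the `willing` array and the names below are assumptions for illustration.

```c
typedef enum { CKPT_NONE, CKPT_TENTATIVE, CKPT_PERMANENT } CkptState;

/*
 * willing[i] != 0 means process i answered that it can take a checkpoint.
 * state[i] receives the final checkpoint state of process i.
 * Returns 1 if the checkpoints were made permanent, 0 if all were discarded.
 */
int two_phase_checkpoint(const int willing[], CkptState state[], int n) {
    int all_willing = 1;

    /* Phase I: every willing process takes a tentative checkpoint. */
    for (int i = 0; i < n; i++) {
        if (willing[i]) {
            state[i] = CKPT_TENTATIVE;
        } else {
            state[i] = CKPT_NONE;
            all_willing = 0;
        }
    }

    /* Phase II: the initiator's decision is applied by everyone. */
    for (int i = 0; i < n; i++) {
        state[i] = all_willing ? CKPT_PERMANENT : CKPT_NONE;
    }
    return all_willing;
}
```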
75
2Phase Consistent Checkpoints
[Figure: timelines of X, R and S; X sends requests; {X1, R1, S1} are preliminary checkpoints and {X2, R2, S2} are consistent checkpoints]
76
Atomic Commitment and Window of Vulnerability
• So far, recovery of actions that can be individually rolled back…
• Better idea:
– Encapsulate actions in sequences that cannot be undone individually
– Atomic transactions provide this
– Properties: ACID
  • Atomicity: transaction is an indivisible unit of work
  • Consistency: transaction leaves system in correct state or aborts
  • Isolation: transactions’ behavior not affected by other concurrent transactions
  • Durability: transaction’s effects are permanent after it commits
  • (Serializable)
77
Atomic Commit (cont.)
• To implement transactions, processes must coordinate!
– Bundling of related events
– Coordination between processes
• One protocol: two-phase commit
[Figure: two-phase commit message flow ending in Commit or Abort]
Q: can this somehow block?
78
Two-phase commit (cont.)
• Problem: coordinator failure after PREPARE & before COMMIT blocks participants waiting for decision (a)
• Three-phase commit overcomes this (b)
– delay final decision until enough processes “know” which decision will be taken
79
State Transfer
• Reintegrating a failed component requires state transfer!
– If checkpoint/log to stable storage, recovering replica can do incremental transfer
  • Recover first from last checkpoint
  • Get further logs from active replicas
– Goal: minimal interference with remaining replicas
– Problem: state is being updated!
  • Might result in incorrect state transfer (have to coordinate with ongoing messages)
  • Might change such that the new replica can never catch up!
– Solution: give higher priority to state-transfer messages
  • Lots of variations…
• Lots of variations…