1 Win XP/Vista/Win7+++ Win 2000. 2 Bugs, Bugs and Bugs

1

Win XP/Vista/Win7+++

Win 2000

2

Bugs, Bugs and Bugs

3

Bugs: Run Time Handling

• Heisenbugs/Mandelbugs
  – Heisenbugs are easier to take care of at run time
  – Higher chance that robust programming mechanisms are successful

• Bohrbugs are typically easier to find and fix… at design time

• But they are harder to take care of at run time

We’ll cover schemes later that handle both types… but let’s try simple approaches first.

4

Perturbation Classifications/Coverage

• Persistence
  – Transient fault
  – Intermittent fault
  – Permanent fault

• Creation time
  – Design fault
  – Operational fault

• Intention
  – Accidental fault
  – Intentional fault

• Crash failure: fail-silent and fail-stop
• Omission failure
• Timing failure: system fails to respond within a specified time slice
  – Both late and early responses might be “bad”
  – Late timing failure = performance failure
• Arbitrary failure: system behaves arbitrarily

5

Robust Programming Mechanisms

Objective: Sustain the delivery of services despite perturbations!

• Process Pairs
• Graceful Degradation
• Selective Retry
• Checkpointing
• Rejuvenation
• Micro-reboots
• Recovery Blocks
• Diversity (NVP, NCP)
• ...

6

Process Pairs (Continual Service)

Implementation Variants:
- Active replicas: both process client requests [+ fast; - complex]
- Primary/Backup: state transfer [+/- simpler; - delay]

A client sends its request to the pair; as long as one process is correct, the client should get an answer. Variants?
(a) both process the request: active replication
(b) only one processes the request and transfers its state: primary/backup
(b’) only one processes the request and does not update the state of the other: fast, but state-consistency problems later

7

Process Pairs

• The process pair scheme is robust to varied types of software faults (crashes, resource shortages/delays, load…):
  – Study of print servers with process pair technology (primary/backup)
  – 2000 systems; 10 million system hours
  – 99.3% of failures affected only one server, i.e., 99.3% of failures were tolerated

8

Simple Process Pair (same host)

Server Process:

    ...
    forever {                        // event loop
        wait_for_request(Request);
        process_request(Request);
    }

Only takes care of crash failures; watchdogs are needed to take care of hang failures etc.

9

Simple Process Pair (same host)

Server Process:

    int ft = backup();               // create backup process; primary returns
    ...
    forever {                        // event loop
        wait_for_request(Request);
        process_request(Request);
    }

Simplicity!! Just call backup() as a function.

10

Simple Process Pair Implementation

[Figure: process pair with labels “backup” and “event loop”]

- Don’t forget that we are assuming that the backup has the “full” state info or that the needed state is stored on (external) stable storage

- Mostly focusing on crash failures… the primary can hang too: watchdog timers
- Transients OK too, except this model is at a basic concept level…

State is lost during a crash; the hope is that all needed state is stored externally, e.g., in the file system.

11

Syscalls

[Figure: timeline of the parent process and the kernel, showing repeated fork and waitpid calls]

12

man page: fork

fork() creates a child process that differs from the parent process only in its PID and PPID, and in the fact that resource utilizations are set to 0. File locks and pending signals are not inherited.

RETURN VALUE
On success, the PID of the child process is returned in the parent's thread of execution, and a 0 is returned in the child's thread of execution. On failure, a -1 will be returned in the parent's context, no child process will be created, and errno will be set appropriately.

ERRORS
EAGAIN  fork() cannot allocate sufficient memory to copy the parent's page tables and allocate a task structure for the child.
EAGAIN  It was not possible to create a new process because the caller's RLIMIT_NPROC resource limit was encountered.
ENOMEM  fork() failed to allocate the necessary kernel structures because memory is tight.

Don’t forget that there is a limit on the number of threads; exceeding it gives an EAGAIN error!

13

man page: waitpid(pid, *status, options)

The waitpid() system call suspends execution of the current process until a child specified by pid argument has changed state. By default, waitpid() waits only for terminated children. The value of pid can be:

< -1  meaning wait for any child process whose process group ID is equal to the absolute value of pid.
-1    meaning wait for any child process.
0     meaning wait for any child process whose process group ID is equal to that of the calling process.
> 0   meaning wait for the child whose process ID is equal to the value of pid.

waitpid(): on success, returns the process ID of the child whose state has changed; on error, -1 is returned.

ERRORS

- ECHILD  The process specified by pid does not exist or is not a child of the calling process. (This can happen for one's own child if the action for SIGCHLD is set to SIG_IGN. See also the LINUX NOTES section about threads.)
- EINTR   WNOHANG was not set and an unblocked signal or a SIGCHLD was caught.
- EINVAL  The options argument was invalid.

14

Simple Process Pair

    int backup() {
        int ret, restarts = 0;               // count number of child procs
        for (;; restarts++) {
            ret = fork();                    // create child
            if (ret == 0) {                  // child?
                return restarts;
            }
            while (ret != waitpid(ret, 0, 0))    // parent waits for child to terminate
                ;
        }
    }

Create child; child returns; parent waits for child to terminate.

waitpid(PID, *status, options), fork
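The backup() loop above can be turned into a small runnable experiment. The sketch below is a bounded variant, not the slide's function: run_process_pair() and demo_worker() are hypothetical names, the child runs a work function instead of returning into the server loop, and the parent gives up after max_restarts crashes or stops as soon as the child exits normally (the slide's parent restarts forever). It only uses POSIX fork(), waitpid() and _exit(); it waits for the specific child pid and retries waitpid() on EINTR.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical demo worker: "crashes" (SIGKILL to itself) in its first
 * two incarnations, then completes normally with exit status 0. */
static int demo_worker(int restarts) {
    if (restarts < 2)
        raise(SIGKILL);                      /* simulate a crash failure */
    return 0;
}

/* Bounded process pair: the parent acts as the backup and restarts the
 * child until it exits normally. Returns the number of restarts used, or
 * -1 if fork()/waitpid() fails or the restart budget is exhausted. */
static int run_process_pair(int (*work)(int), int max_restarts) {
    assert(work != NULL && max_restarts >= 0);
    for (int restarts = 0; restarts <= max_restarts; restarts++) {
        pid_t pid = fork();
        if (pid < 0)
            return -1;                       /* no child was created */
        if (pid == 0)
            _exit(work(restarts));           /* child: do the work */

        int status;
        pid_t r;
        do {
            r = waitpid(pid, &status, 0);    /* wait for THIS child only */
        } while (r == -1 && errno == EINTR); /* retry if interrupted */
        if (r != pid)
            return -1;
        if (WIFEXITED(status) && WEXITSTATUS(status) == 0)
            return restarts;                 /* child finished normally */
        /* child crashed or failed: loop and start a new one */
    }
    return -1;
}
```

With demo_worker(), which kills itself in its first two incarnations, run_process_pair(demo_worker, 5) reports 2 restarts, while a budget of 1 restart is exhausted.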

15

Robust?

16

Failed fork system call (looping?)...

    int backup() {
        int ret, restarts = 0;
        for (;; restarts++) {
            ret = fork();                // returns -1 on error: no child created
            if (ret == 0) {              // child?
                return restarts;
            }
            while (ret != waitpid(ret, 0, 0))    // if ret == -1: waits for any child
                ;
        }                                // parent does not return
    }

If fork() fails, the parent does not return: it loops, retrying until fork() succeeds, and creates new children. An implicit retry…

17

Problem: forked another child

    ...
    fork();                  // fork non-terminating child
    ...
    backup();                // inside: ret = fork(); waitpid(ret, 0, 0);
    ...
        fork();              // fails: returns -1
        waitpid(-1, 0, 0);   // waits for any child ... might not return

18

Graceful Degradation


    int backup() {
        int ret, restarts = 0;
        for (;; restarts++) {
            ret = fork();
            if (ret < 0) {                   // process can run without backup:
                log("backup: ...");          // just return if fork fails
                return -1;
            }
            if (ret == 0) {                  // child returns
                return restarts;
            }
            while (ret != waitpid(ret, 0, 0))
                ;
        }
    }

Why not retry inside backup()? We should; it might help in combination with the killer process. But we also need a top-level retry mechanism!

19

Selective Retries

• Retries:
  – repeat a call until it succeeds, or until we run out of time (timeout) or reach the maximum number of retries

• Selective Retries:
  – repeat only calls for which there is a chance that a retry can succeed
  – e.g., memory shortage might disappear
  – e.g., an invalid argument will typically stay invalid

20

Not always clear if retry could succeed

fork() creates a child process that differs from the parent process only in its PID and PPID, and in the fact that resource utilizations are set to 0. File locks and pending signals are not inherited.

RETURN VALUE
On success, the PID of the child process is returned in the parent's thread of execution, and a 0 is returned in the child's thread of execution. On failure, a -1 will be returned in the parent's context, no child process will be created, and errno will be set appropriately.

ERRORS

- EAGAIN fork() cannot allocate sufficient memory to copy the parent's page tables and allocate a task structure for the child.

- EAGAIN It was not possible to create a new process because the caller's RLIMIT_NPROC resource limit was encountered.

- ENOMEM fork() failed to allocate the necessary kernel structures because memory is tight.

RLIMIT_NPROC: the maximum number of threads that can be created for the real user ID of the calling process. Upon encountering this limit, fork() fails with the error EAGAIN.

21

Selective Retries


    ...
    forever {
        ...
    }

The call can fail, but might succeed later when more memory is available or fewer processes are running.

Infinite retries? Delays?

Selective retries: retry only calls for which a retry might help, e.g., resource problems. Do not retry all failed calls, e.g., failures due to invalid arguments.

Do not retry infinitely often: this might lead to unacceptable delays.
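A minimal sketch of selective, bounded retries, assuming the syscall convention (result >= 0 on success, -1 with errno set on failure). selective_retry(), is_retryable() and the fake_op test double are hypothetical helpers for illustration. Following the fork() man page quoted above, EAGAIN and ENOMEM are treated as resource shortages worth retrying; any other errno (e.g., EINVAL) stops immediately, and max_tries bounds the total delay.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <time.h>

/* Errors that describe a (possibly temporary) resource shortage. */
static bool is_retryable(int err) {
    return err == EAGAIN || err == ENOMEM;
}

/* Call op(arg) at most max_tries times, sleeping delay_ms between
 * attempts, but only while the failure is retryable.
 * Returns the result of the last call. */
static int selective_retry(int (*op)(void *), void *arg,
                           int max_tries, long delay_ms) {
    assert(op != NULL && max_tries > 0 && delay_ms >= 0);
    int ret = -1;
    for (int attempt = 0; attempt < max_tries; attempt++) {
        ret = op(arg);
        if (ret >= 0 || !is_retryable(errno))
            break;                        /* success, or retry cannot help */
        if (attempt + 1 < max_tries) {    /* wait before the next attempt */
            struct timespec ts = { .tv_sec = delay_ms / 1000,
                                   .tv_nsec = (delay_ms % 1000) * 1000000L };
            nanosleep(&ts, NULL);
        }
    }
    return ret;
}

/* Hypothetical test double: fails `failures_left` times with `err`. */
struct fake_op {
    int failures_left;
    int err;
    int calls;
};

static int fake_op_call(void *p) {
    struct fake_op *f = p;
    f->calls++;
    if (f->failures_left > 0) {
        f->failures_left--;
        errno = f->err;
        return -1;
    }
    return 0;
}
```

A transient EAGAIN is retried until it clears, an EINVAL is reported after the first call, and a persistent ENOMEM gives up after max_tries calls.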

22

Selective Retries


    ...
    forever {
        if (ft < 0) { ft = backup(); }   // retry if no backup
        wait_for_request(Request);
        process_request(Request);
    }

• Might be a lot of retries ... state might already be corrupted
• Might need too much processing for each request: potentially too much processing power taken away...

Exponential backoff is generally a good trade-off, as it balances retry overhead against the eventual success of a retry.
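One possible way to compute exponential backoff delays, as a sketch: the delay doubles with every attempt and is capped so that waiting stays bounded. backoff_delay_ms() and its parameters are illustrative names; real systems often also add random jitter, which is omitted here.

```c
#include <assert.h>

/* Delay before retry number `attempt` (0-based): base_ms * 2^attempt,
 * but never more than cap_ms. The overflow-safe check doubles only while
 * the doubled value still fits under the cap. */
static long backoff_delay_ms(int attempt, long base_ms, long cap_ms) {
    assert(attempt >= 0 && base_ms > 0 && cap_ms >= base_ms);
    long delay = base_ms;
    for (int i = 0; i < attempt; i++) {
        if (delay > cap_ms - delay)   /* 2 * delay would exceed the cap */
            return cap_ms;
        delay *= 2;
    }
    return delay;
}
```

With base_ms = 10 and cap_ms = 1000, the delays are 10, 20, 40, ..., 640 ms and then stay at 1000 ms.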

23

Retry Questions...

• How often should we retry?
  – should we wait between retries?

• Should we retry at some later point in time?
  – how many times until we give up?

• At what level should we retry?

24

Hierarchical Retries

[Figure: function f calls through to function h; each level has its own retry]

• Potentially: exponential increase in retries!
• Composability: retries should be independent of each other!
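To make the “exponential increase” concrete, assume (our assumption, not from the slide) k nested levels, where level i makes up to r_i attempts (the first call plus its retries) of the call below it, and the innermost function h keeps failing. Then h can be invoked up to the product of all r_i times:

```latex
A_h \;\le\; \prod_{i=1}^{k} r_i,
\qquad \text{e.g. } k = 3,\; r_1 = r_2 = r_3 = 3 \;\Rightarrow\; A_h \le 3^3 = 27
```

Each additional level with 3 attempts triples the bound again, which is how stacked retries multiply load and delay.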

25

Selective Retries

• Under high load, calls might fail due to resource shortage

• We can use selective retries to increase the probability of success during resource allocation

• Operating systems like Linux have a “killer process” that terminates processes if too few resources exist <check out the exact policy in the man pages!>

• With selective retries, the processes that survive can complete their requests

26

Bohrbugs

27

Continuous Crashing

28

Continuous Crashing

• Finite number of retries by client?
  – the client will stop sending the request eventually

• But what if we cannot control clients?
  – clients might think it is fun to crash the server; DoS attacks take place like this!

What happens if the retried request activates Bohrbugs?

29

Graceful Degradation

• Alternative Approach:
  – the server needs to make sure that a failed request is only retried a fixed number of times

• Problem:
  – how can we know that a request has already been partially processed several times?

• Solution:
  – need to keep some state info between request instances!

30

State Handling (load & store application states)

31

Using Session State

    int ft = backup();
    ...
    forever {
        wait_for_request(Request);
        get_session_state(Request);
        if (num_retries < N) {
            process_request(Request);
            store_session_state(Request);
        } else { return_error(); }
    }

Session state: updates the number of retries.
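A sketch of the retry bookkeeping behind num_retries < N, assuming requests carry an id. session_table and admit_request() are hypothetical; the table lives in memory only, whereas in a real server it would have to survive the crash (in the backup or on stable storage). The attempt is counted before processing, so a request that crashes the server is still counted when it is retried.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_SESSIONS 64

/* Hypothetical per-request session state: attempts made so far. */
struct session_table {
    long ids[MAX_SESSIONS];
    int  attempts[MAX_SESSIONS];
    int  used;
};

/* Look up (or create) the entry for `id`, count this attempt and report
 * whether the request may be processed: true while the number of earlier
 * attempts is below max_attempts (the slide's num_retries < N). */
static bool admit_request(struct session_table *t, long id, int max_attempts) {
    int i = 0;
    while (i < t->used && t->ids[i] != id)
        i++;
    if (i == t->used) {                  /* first time we see this request */
        assert(t->used < MAX_SESSIONS);
        t->ids[i] = id;
        t->attempts[i] = 0;
        t->used++;
    }
    return t->attempts[i]++ < max_attempts;
}
```

With N = 2, request 7 is processed twice and then answered with return_error(), while other requests are unaffected.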

32

Crash of Parent!

33

What if parent process dies?

Possible reasons:
• Operator might kill the wrong process
• Parent might terminate for some other reason, e.g.,
  – Linux: out-of-memory process killer (see earlier slide!)
  – Kills processes that use too much memory:
    • “more CPU time decreases the chance of being killed”
    • Parent could get killed

Normally we would expect that the parent does not crash, since it just performs a waitpid(), but ...

34

Detecting Parent Crashes

35

Detection of Process Crashes

• Pipe used to communicate between procs
  – Unix: ls | sort

• Pipe end closed when
  – process terminates

• Process B can detect
  – when process A terminated

36

Adding Parent Termination Detection

    int fd[2];                       // pipe fd
    int backup() {
        ...
        pipe(fd);
        ret = fork();
        if (ret == 0) {              // child?
            close(fd[1]);            // child keeps the read end, fd[0]
            return restarts++;
        }
        // parent closes other end:
        close(fd[0]);                // parent keeps the write end, fd[1]
        ...

37

Child can detect parent termination

    int hasParentTerminated() {
        // check if other end of pipe has been closed
        ...
    }

Has to be called periodically.

The pipe as a detector is not completely satisfactory: the child is already in a state that might be corrupted! We would need to start the application with an option saying that it is in parent mode, so that it jettisons most of its state.
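One way to fill in hasParentTerminated() for the pipe scheme above, sketched with POSIX poll() (our choice; the slide leaves the body open). A zero timeout makes the check non-blocking. Once every write end is closed, the read end reports POLLHUP, or becomes readable with read() returning 0 (end of file), depending on the platform. The sketch assumes the parent never writes into the pipe; has_parent_terminated() is a hypothetical name.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <poll.h>
#include <unistd.h>

/* Returns 1 if the write end (held by the parent) has been closed,
 * 0 if the parent still holds it, -1 on error. Never blocks. */
static int has_parent_terminated(int read_fd) {
    assert(read_fd >= 0);
    struct pollfd p = { .fd = read_fd, .events = POLLIN };
    int n = poll(&p, 1, 0);                 /* timeout 0: just check */
    if (n < 0)
        return -1;
    if (n == 0)
        return 0;                           /* no event: parent alive */
    if (p.revents & POLLHUP)
        return 1;                           /* all write ends closed */
    if (p.revents & POLLIN) {
        char c;
        ssize_t r = read(read_fd, &c, 1);   /* 0 bytes means end of file */
        return r == 0 ? 1 : (r < 0 ? -1 : 0);
    }
    return -1;                              /* POLLERR / POLLNVAL */
}
```

The child would call this periodically, e.g., once per iteration of its event loop, with fd[0].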

38

Problem: State Corruption

39

Parent Replacement

[Figure: parent replacement after already executed requests]

e.g., the new parent allocated resources that are never freed

New children will also have that corrupted state!

40

Alternative Approach

Re-initialization may fail and might cost too much time. There is no ideal solution as far as I know; I would stick with the first solution when possible, since the parent might not fail that often.

41

Process Links

Generalized Crash Detection

42

Linking Processes

• We can use a pipe as a failure detector:
  – We can detect that a process has terminated

• We can use that for:
  – Replacing failed processes
  – Providing some “termination atomicity” (if one dies, all die!)

• If one process fails, some other processes might not be able to work properly anymore

• One simple way is to terminate all such processes

• Garbage collection of processes

43

Process Links: “Termination Atomicity”

• Set of cooperating processes
• If some process p terminates, each linked process q must terminate
• We can link processes via “process links”:
  – Programming language support: Java, Erlang, …

44

Pipe And Filter

45

Example: Farmer / Worker

[Figure: farmer process pair; workers]

46

Asymmetric Link Behavior

47

Master as Process Pair

Mitigates parent-crash semantics by avoiding terminations where possible, for liveness

48

Error Recovery in Distributed Systems (DS): Checkpointing

49

Handling Transients?

• Transient Fault: a fault that is no longer present after system restart

• Many flavors:
  – SW transients
  – OS transients
  – Middleware/Protocol transients
  – Network transients
  – Operational transients
  – Power transients

• Need to recover from the effects of transients, so we must first detect them! Let us assume that simple local sanity checks (acceptance tests) exist!

50

So how does one handle these transients?

Objective:
- sustained ops (key driver: sustained performance)
- transparent handling of bugs (to users and application designers)

System Model: Coupled/Distributed/Networked Processes

51

Periodic Checkpointing

52

Checkpointing

    pid_t parent = getpid();
    ...
    for (int nxt_ckpt = 0;; nxt_ckpt--) {
        if (nxt_ckpt <= 0) {                   // time for a new checkpoint
            pid_t newparent = getpid();        // caller becomes the new backup
            if (backup() >= 0
                && parent != newparent) {      // in the new child:
                kill(parent, SIGKILL);         // drop the older checkpoint
                parent = newparent;
                nxt_ckpt = N;                  // next checkpoint after N requests
            }
        }
        wait_for_request(Request);
        process_request(Request);
    }

53

Backup Code Revisited

• Issue:
  – If we have multiple generations, we want the ancestors to take over only if none of the children is alive

• Use process links instead of waitpid
  – waitpid in an endless loop is dangerous anyhow...

54

Temporal Redundancy

“Redo” tasks on error detection

[Figure: task progress of process P; a transient occurs (and is detected) at X, and the task is redone (REDO task)]

55

Backward Error Recovery

• Save process state at predetermined (periodic) recovery points
  – Called “checkpoints”
  – Checkpoints stored on stable storage, not affected by the same failures

• Recover by rolling back to a previously saved (error-free) state

[Figure: task progress of process P with checkpoints (chkpt); a transient, detected by an acceptance test at X, causes a rollback to the last checkpoint]

chkpt: complete set of (state) information needed to restart task execution from chkpt.
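A sketch of backward error recovery with temporal redundancy in a single process, under simplifying assumptions: the checkpoint is an in-memory copy standing in for stable storage, the acceptance test is a simple sanity check (inputs are assumed non-negative, so the running sum must stay non-negative), and the transient is injected artificially. task_state, acceptance_test() and run_with_rollback() are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical task state: running sum of the inputs processed so far. */
struct task_state {
    long sum;
    int  next;            /* index of the next input to process */
};

/* Acceptance test: with non-negative inputs the sum is never negative. */
static bool acceptance_test(const struct task_state *s) {
    return s->sum >= 0;
}

/* Process all n inputs, checkpointing every ckpt_every steps. After input
 * index `transient_at` a transient corrupts the sum once (-1: no fault).
 * On a failed acceptance test, roll back to the checkpoint and redo.
 * Returns the number of rollbacks. */
static int run_with_rollback(const int *in, int n, int ckpt_every,
                             int transient_at, struct task_state *s) {
    assert(ckpt_every > 0);
    struct task_state chkpt = *s;       /* in-memory "stable storage" */
    int rollbacks = 0;
    bool injected = false;
    while (s->next < n) {
        if (s->next % ckpt_every == 0)
            chkpt = *s;                 /* state already passed the test */
        int idx = s->next++;
        s->sum += in[idx];
        if (!injected && idx == transient_at) {
            s->sum = -1;                /* simulated transient corruption */
            injected = true;
        }
        if (!acceptance_test(s)) {
            *s = chkpt;                 /* backward recovery: roll back */
            rollbacks++;
        }
    }
    return rollbacks;
}
```

For inputs 1..6 with a checkpoint every 2 steps and a transient after input index 3, the task rolls back once to the checkpoint taken before index 2, redoes indices 2 and 3, and still ends with sum 21.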

56

Advantages of Backward Recovery

+ Requires no knowledge of the errors in the system state

+ Can handle arbitrary / unpredictable faults (as long as they do not affect the recovery mechanism)

+ Can be applied regardless of the sustained damage (the saved state must be error-free, though)

+ General scheme / application independent

+ Particularly suitable for recovering from transient faults

57

Disadvantages of Backward Recovery

―Requires significant resources (e.g. time, computation, stable storage) for checkpointing and recovery

―Checkpointing requires
  – To identify consistent states
  – The system to be halted / slowed down temporarily

―Care must be taken in concurrent systems to avoid orphan messages, lost messages and domino effects (covered later in the lecture...)

58

Forward Error Recovery

• Detect the error
• Damage assessment
• Build a new error-free state from which the system can continue execution
  – “Safe stop”
  – Degraded mode
  – Error compensation

• E.g., switching to a different component, etc…

[Figure: timeline: fault manifests, fault detected, damage assessment, state reconstruction]
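For contrast, a forward-recovery sketch: the error is detected (a reading outside an assumed valid range [lo, hi]), and instead of rolling back, a new usable state is built by error compensation (reuse the last good value) while switching to degraded mode. ctrl_state and accept_reading() are hypothetical, and the recovery rule is application-specific by construction.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical controller state for the forward-recovery sketch. */
struct ctrl_state {
    double last_good;     /* last reading that passed the range check */
    bool   degraded;      /* set once we run in degraded mode */
};

/* Detect: reading outside [lo, hi] is an error. Recover forward: keep
 * running with the last good value (compensation), in degraded mode.
 * A NaN reading fails both comparisons and is treated as an error too. */
static double accept_reading(struct ctrl_state *s, double reading,
                             double lo, double hi) {
    assert(lo <= hi);
    if (reading >= lo && reading <= hi) {
        s->last_good = reading;
        return reading;
    }
    s->degraded = true;
    return s->last_good;
}
```

No state is rolled back and no work is redone, which is why forward recovery is cheap when the possible errors are well understood.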

59

Advantages of Forward Recovery

+ Efficient (time / memory)
  – If the characteristics of the fault are well understood, forward recovery is a very efficient solution

+ Well suited for real-time applications
  – Missed deadlines can be addressed

+ Anticipated faults can be dealt with in a timely way using redundancy

60

Disadvantages of Forward Recovery

―Application-specific
―Can only remove predictable errors from the system state
―Requires knowledge of the actual error
―Depends on the accuracy of error detection, potential damage prediction, and actual damage assessment
―Not usable if the system state is damaged beyond recoverability


62

Logging Requests

    request_no = 0;
    ...
    for (int nxt_ckpt = 0;; nxt_ckpt--) {
        checkpoint(&nxt_ckpt);
        wait_for_request(Request);
        log_to_disk(++request_no, Request);
        process_request(Request);
    }

63

Processing Log

    request_no = 0;
    ...
    for (int nxt_ckpt = 0;; nxt_ckpt--) {
        if (checkpoint(&nxt_ckpt) == recovery) {
            while ((request_no + 1, R) in log) {
                process_request(R);
                request_no++;
            }
        }
        wait_for_request(Request);
        log_to_disk(++request_no, Request);
        process_request(Request);
    }
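A sketch of the checkpoint plus request-log idea in memory, assuming process_request() is deterministic (the next slide shows why that assumption can fail). server, req_log, handle() and recover() are hypothetical names, and the log is assumed to start together with the server, so log entry i holds request number i + 1; recovery replays exactly the requests numbered after the checkpoint's request_no, like the while ((request_no + 1, R) in log) loop.

```c
#include <assert.h>

#define LOG_CAP 128

/* Hypothetical server: its state is a counter changed by each request. */
struct server  { long state; int request_no; };
/* Hypothetical request log, standing in for log_to_disk(). */
struct req_log { long req[LOG_CAP]; int len; };

static void process_request(struct server *s, long r) {
    s->state += r;                    /* deterministic processing */
}

/* Normal operation: log the request first, then process it. */
static void handle(struct server *s, struct req_log *rl, long r) {
    assert(rl->len < LOG_CAP);
    rl->req[rl->len++] = r;           /* entry i holds request_no i + 1 */
    s->request_no++;
    process_request(s, r);
}

/* Recovery: restore the checkpoint, then replay all later requests. */
static void recover(struct server *s, const struct server *ckpt,
                    const struct req_log *rl) {
    *s = *ckpt;
    while (s->request_no < rl->len) {
        process_request(s, rl->req[s->request_no]);
        s->request_no++;
    }
}
```

After requests 5 and 7, a checkpoint, and requests 1 and 10, recovery from the checkpoint replays the last two log entries and reaches the same state (23) as the crashed server.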

64

Problems: Lost Updates, corrupted saved states... not easy to fix!

• State diverges from original computation
  – results of replayed requests might be different
    • could detect this by keeping a log of replies
  – new client requests might not be processed correctly
    • e.g., ids in requests might not make sense to the current server instance

65

Frequency vs Completeness

• Less complete checkpoint
  – higher probability that the error is purged from the saved state
  – omitted state needs to be recomputed on recovery

• Less frequent checkpointing
  – checkpoint becomes larger
  – state information becomes stale
  – …

• “Application save” is (in practice) very robust
  – might not always contain all info (e.g., window position) for transparent restart

67

Distributed Systems: Checkpointing

So how does one place the chkpts, and where?

Should we synchronize processes (processors) & checkpoints?

[Figure: processes P1, P2, P3 with checkpoints and messages]

Note: a system can be synchronous even though its message-based communication is asynchronous!

68

Options for Checkpoint Storage?

• Key building block: stable storage
  – Persistent: survives the failure of the entity that created/initialized/used it
  – Reliable: very low probability of losing or corrupting info

• Implementation
  – Typically non-volatile media (disks)
  – Single disk? Often replicated / multiple volatile memories
  – Make sure at least one replica always survives!
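Stable storage goes beyond a single program, but one common POSIX idiom (not something the slides prescribe) keeps a crash from leaving a half-written checkpoint file: write to a temporary file, fsync() it, then rename() it over the old checkpoint, since rename() replaces the target atomically. save_checkpoint() and load_checkpoint() are hypothetical helpers; replication across disks, and the directory fsync() needed for full durability of the rename, are not shown.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <unistd.h>

/* Write len bytes as the new checkpoint at `path`, never exposing a
 * partially written file. Returns 0 on success, -1 on error. */
static int save_checkpoint(const char *path, const void *buf, size_t len) {
    assert(path != NULL && buf != NULL);
    char tmp[4096];
    int n = snprintf(tmp, sizeof tmp, "%s.tmp", path);
    if (n < 0 || (size_t)n >= sizeof tmp)
        return -1;
    FILE *f = fopen(tmp, "wb");
    if (f == NULL)
        return -1;
    int ok = fwrite(buf, 1, len, f) == len
          && fflush(f) == 0               /* stdio buffer -> kernel */
          && fsync(fileno(f)) == 0;       /* kernel -> stable media */
    if (fclose(f) != 0)
        ok = 0;
    if (!ok) {
        remove(tmp);
        return -1;
    }
    return rename(tmp, path);             /* atomically replace old one */
}

/* Read back exactly len bytes of the last complete checkpoint. */
static int load_checkpoint(const char *path, void *buf, size_t len) {
    FILE *f = fopen(path, "rb");
    if (f == NULL)
        return -1;
    size_t got = fread(buf, 1, len, f);
    fclose(f);
    return got == len ? 0 : -1;
}
```

A crash during save_checkpoint() leaves either the old checkpoint or the new one at `path`, never a mixture.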

69

Options for Checkpoint Placement?

• Uncoordinated: processes take checkpoints independently
  – Pro: no delays
  – Con: consistency?

• Coordinated: have processes coordinate before taking a checkpoint
  – Pro: globally consistent checkpoints
  – Con: coordination delays

• Communication-induced: checkpoint when receiving and prior to processing messages that may introduce conflicts

70

What happens when we don’t synchronize?

[Figure: two scenarios with processes P1 and P2, checkpoints C1 and C2, a message (Msg) and a fault (X), illustrating orphan msgs. and lost msgs.]

Rollback to C1 & C2 gives an inconsistent state
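The two problem cases can be stated as a check on each message, sketched below under the assumption that every event carries a local event counter and that a checkpoint at counter c covers all events with counter <= c. struct msg and classify() are hypothetical. An orphan is received in the restored state but never sent in it; a lost message is sent but never received, so it would have to be logged and replayed.

```c
#include <assert.h>

/* A message from process `from` (sent at local time send_t) to process
 * `to` (received at local time recv_t). */
struct msg { int from, send_t, to, recv_t; };

enum msg_kind { MSG_OK, MSG_ORPHAN, MSG_LOST };

/* cut[p]: local time of the checkpoint process p rolls back to. */
static enum msg_kind classify(const struct msg *m, const int *cut) {
    assert(m != 0 && cut != 0);
    int sent  = m->send_t <= cut[m->from];   /* send survives rollback */
    int recvd = m->recv_t <= cut[m->to];     /* receipt survives rollback */
    if (recvd && !sent)
        return MSG_ORPHAN;                   /* received, never sent */
    if (sent && !recvd)
        return MSG_LOST;                     /* sent, never received */
    return MSG_OK;
}
```

A rollback target is inconsistent as soon as one message classifies as MSG_ORPHAN.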

71

...and more problems: domino effects

[Figure: processes P1 and P2; a fault (X) triggers cascading rollbacks]

* These problems are fixable, though they require considerable pre-planning.

72

• P1 fails, recovers, rolls back to Ca
• P2 finds it received a message (mi) that was never sent; rolls back to Cb
• P3 finds it received a message (mj) that was never sent; rolls back to Cc
• ……

[Figure: processes P1, P2, P3 with checkpoints Ca, Cb, Cc and messages mi, mj; the recovery line keeps moving back (“Boom!”)]
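The cascade above can be simulated with the orphan test: start from each process's latest checkpoint and roll back the receiver of any orphan message, until no orphan remains. A sketch, assuming local event counters start at 1 and checkpoint 0 of every process is the initial state at counter 0 (so a rollback always stops there); recovery_line(), NPROC and MAXCK are hypothetical names. In the test scenario each rollback creates a new orphan, so P0 ends up back at its initial state.

```c
#include <assert.h>

#define NPROC 3
#define MAXCK 8

struct msg { int from, send_t, to, recv_t; };

/* ckpt[p][k]: local time of p's k-th checkpoint (ascending, ckpt[p][0]
 * == 0); nck[p]: number of checkpoints of p. On return, line[p] is the
 * index of the checkpoint p must roll back to. */
static void recovery_line(int ckpt[NPROC][MAXCK], const int *nck,
                          const struct msg *m, int nmsg, int *line) {
    for (int p = 0; p < NPROC; p++)
        line[p] = nck[p] - 1;                 /* start at latest ckpt */
    int changed = 1;
    while (changed) {                         /* until no orphan remains */
        changed = 0;
        for (int i = 0; i < nmsg; i++) {
            int sent  = m[i].send_t <= ckpt[m[i].from][line[m[i].from]];
            int recvd = m[i].recv_t <= ckpt[m[i].to][line[m[i].to]];
            if (recvd && !sent) {             /* orphan message */
                assert(line[m[i].to] > 0);
                line[m[i].to]--;              /* roll receiver back further */
                changed = 1;
            }
        }
    }
}
```

Each rollback can only move a process to an earlier checkpoint, so the loop terminates, in the worst case at the initial state.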

73

Consistent Checkpoints: No orphans, lost msgs or dominos!

[Figure: processes P1, P2, P3 with a consistent cut through their checkpoints]

All messages sent ARE recorded with a consistent cut!

74

Synchronizing Checkpoints (not the processors!)

• Processes coordinate (synchronize) to set checkpoints guaranteed to be consistent
  – 2-Phase Consistent Checkpointing

Phase I: An initiator node X takes a “tentative” checkpoint and requests all other processes to set checkpoints. All processes inform X whether they are willing to checkpoint.

Phase II: If all other processes are willing to checkpoint, X decides to make its checkpoint permanent; otherwise X decides that all checkpoints shall be discarded. X informs all processes of the decision.

Either all or none take permanent checkpoints!
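The decision rule of the two phases is small enough to sketch directly. The code below simulates the initiator's decision and its application to all processes in one address space (no real messaging); decide(), apply() and the types are hypothetical names, and a missing reply is assumed to count as “not willing”.

```c
#include <assert.h>
#include <stdbool.h>

enum ckpt_decision { CKPT_MAKE_PERMANENT, CKPT_DISCARD };

/* Phase II at initiator X: permanent only if everyone said "willing". */
static enum ckpt_decision decide(const bool *willing, int nprocs) {
    assert(nprocs > 0);
    for (int i = 0; i < nprocs; i++)
        if (!willing[i])                 /* refusal or missing reply */
            return CKPT_DISCARD;
    return CKPT_MAKE_PERMANENT;
}

/* Checkpoint ids per process; -1 means "none". */
struct proc { int tentative, permanent; };

/* Every process applies the same broadcast decision, so either all
 * tentative checkpoints become permanent or none does. */
static void apply(struct proc *p, int n, enum ckpt_decision d) {
    for (int i = 0; i < n; i++) {
        if (d == CKPT_MAKE_PERMANENT)
            p[i].permanent = p[i].tentative;
        p[i].tentative = -1;             /* tentative checkpoint consumed */
    }
}
```

On CKPT_DISCARD every process keeps its previous permanent checkpoint, which is still part of a consistent set.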

75

2-Phase Consistent Checkpoints

[Figure: initiator X sends requests to processes R and S]

{X1, R1, S1}: preliminary checkpoints
{X2, R2, S2}: consistent checkpoints

76

Atomic Commitment and Window of Vulnerability

• So far: recovery of actions that can be individually rolled back…

• Better idea:
  – Encapsulate actions in sequences that cannot be undone individually
  – Atomic transactions provide this
  – Properties: ACID
    • Atomicity: transaction is an indivisible unit of work
    • Consistency: transaction leaves system in correct state or aborts
    • Isolation: transactions’ behavior not affected by other concurrent transactions
    • Durability: transaction’s effects are permanent after it commits
    • (Serializable)

77

Atomic Commit (cont.)

• To implement transactions, processes must coordinate!
  – Bundling of related events
  – Coordination between processes

• One protocol: two-phase commit

[Figure: two-phase commit, ending in Commit or Abort]

Q: can this somehow block?

78

Two-phase commit (cont.)

• Problem: coordinator failure after PREPARE & before COMMIT blocks participants waiting for the decision (a)

• Three-phase commit overcomes this (b)
  – delay the final decision until enough processes “know” which decision will be taken

79

State Transfer

• Reintegrating a failed component requires state transfer!
  – If checkpoint/log to stable storage, the recovering replica can do incremental transfer
    • Recover first from last checkpoint
    • Get further logs from active replicas
  – Goal: minimal interference with remaining replicas
  – Problem: state is being updated!
    • Might result in incorrect state transfer (have to coordinate with ongoing messages)
    • Might change such that the new replica can never catch up!
  – Solution: give higher priority to state-transfer messages

• Lots of variations…
