Chandy-Lamport Snapshotting
COS 418: Distributed Systems, Precept 8
Themis Melissaris and Daniel Suo
[Content adapted from I. Gupta]
Distributed Snapshots: Determining Global States of Distributed Systems
K. Mani Chandy and Leslie Lamport
ACM Transactions on Computer Systems, February 4, 1985
Global snapshots
Example of a global snapshot
But that was easy
• In our system of world leaders, we were able to capture their ‘state’ (i.e., likeness) easily
  – Synchronized in space
  – Synchronized in time
• How would we take a global snapshot if the leaders were all at home?
• What if Obama told Trudeau that he should really put on a shirt?
• This message is part of our system state!
Global snapshot is global state
• Each distributed application has a number of processes (leaders) running on a number of physical servers
• These processes communicate with each other via channels (text messaging)
• A snapshot captures the local states of each process (e.g., program variables) along with the state of each communication channel
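The definition above can be sketched as a data structure: one local state per process plus the in-flight messages on each channel. The class name `GlobalSnapshot`, its fields, and the `in_flight` helper are hypothetical, chosen only for illustration; the sample values are the P1/P2 states used in the example later in these slides.

```python
from dataclasses import dataclass, field


@dataclass
class GlobalSnapshot:
    """Hypothetical container for one global snapshot."""

    # Local state of each process, e.g. {"P1": {"X1": 0, "Y1": 0, "Z1": 0}}
    process_states: dict[str, dict[str, int]] = field(default_factory=dict)
    # Messages in flight on each channel, keyed by (sender, receiver)
    channel_states: dict[tuple[str, str], list[str]] = field(default_factory=dict)

    def in_flight(self) -> int:
        """Total number of messages captured in the channel states."""
        return sum(len(msgs) for msgs in self.channel_states.values())


# P1 has sent "X2 -> 4" to P2, but P2 has not received it yet
snap = GlobalSnapshot(
    process_states={"P1": {"X1": 0, "Y1": 0, "Z1": 0}, "P2": {"X2": 1, "Y2": 2, "Z2": 3}},
    channel_states={("P1", "P2"): ["X2 -> 4"], ("P2", "P1"): []},
)
print(snap.in_flight())  # 1
```

Without the channel states, the message to P2 would simply vanish from the recorded state, which is exactly the problem raised by the text-messaging example.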
Why do we need snapshots?
• Checkpointing: restart if the application fails
• Collecting garbage: remove objects that don’t have any references
• Detecting deadlocks: can examine the current application state
• Other debugging: a little easier to work with than printf…
We could just synchronize clocks
• Each process records its state at some agreed-upon time t
  – But clocks skew
  – And we wouldn’t record messages
• Do we need synchronization?
• What did Lamport realize about ordering events?
Example of global snapshots v2
• Two processes: P1 and P2
(Diagram: P1 and P2)
Example of global snapshots v2
• Channel C12 from P1 to P2
• Channel C21 from P2 to P1
(Diagram: P1 and P2, connected by channels C12 and C21)
Example of global snapshots v2
• Process states for P1 and P2
(Diagram: P1 state X:0, Y:0, Z:0; P2 state X:1, Y:2, Z:3; channels C12 and C21)
Example of global snapshots v2
• Channel states (i.e., messages) for C12 and C21
• This is our initial global state
• Also a global snapshot
(Diagram: P1 state X1:0, Y1:0, Z1:0; P2 state X2:1, Y2:2, Z2:3; C12: [Empty]; C21: [Empty])
Example of global snapshots v2
• P1 tells P2 to change its state variable, X2, from 1 to 4
• This is another global snapshot
(Diagram: P1 state X1:0, Y1:0, Z1:0; P2 state X2:1, Y2:2, Z2:3; C12: [X2 → 4]; C21: [Empty])
Example of global snapshots v2
• P2 receives the message from P1
• Another global snapshot
(Diagram: P1 state X1:0, Y1:0, Z1:0; P2 state X2:1, Y2:2, Z2:3, holding the received message X2 → 4; C12: [Empty]; C21: [Empty])
Example of global snapshots v2
• P2 changes its state variable, X2, from 1 to 4
• And another global snapshot
(Diagram: P1 state X1:0, Y1:0, Z1:0; P2 state X2:4, Y2:2, Z2:3; C12: [Empty]; C21: [Empty])
Summary
• The global state changes whenever an event happens
  – Process sends message
  – Process receives message
  – Process takes a step
• Moving from state to state obeys causality
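The walk-through above can be replayed as a tiny simulation in which each event (send, receive, step) produces a new global state. This is a sketch under simplifying assumptions: the dictionary layout, the `P2_inbox` field holding a received-but-unapplied message, and the helpers `send`, `receive`, and `p2_step` are all hypothetical.

```python
import copy
from collections import deque

# Global state of the two-process example: process states plus channel contents
state = {
    "P1": {"X1": 0, "Y1": 0, "Z1": 0},
    "P2": {"X2": 1, "Y2": 2, "Z2": 3},
    "C12": deque(),   # FIFO channel P1 -> P2
    "C21": deque(),   # FIFO channel P2 -> P1
    "P2_inbox": [],   # hypothetical: messages P2 has received but not yet applied
}
history = [copy.deepcopy(state)]  # every event yields a new global state


def record():
    history.append(copy.deepcopy(state))


def send(channel, msg):
    """Process sends a message: it is appended to the channel."""
    state[channel].append(msg)
    record()


def receive(channel, inbox):
    """Process receives a message: it leaves the channel in FIFO order."""
    state[inbox].append(state[channel].popleft())
    record()


def p2_step():
    """P2 takes a step: it applies the oldest received update to its variables."""
    var, value = state["P2_inbox"].pop(0)
    state["P2"][var] = value
    record()


send("C12", ("X2", 4))      # P1 tells P2 to change X2 to 4
receive("C12", "P2_inbox")  # P2 receives the message from P1
p2_step()                   # P2 changes X2 from 1 to 4

for snapshot in history:
    print(snapshot["P2"]["X2"], list(snapshot["C12"]))
# prints, on four lines: 1 []   1 [('X2', 4)]   1 []   4 []
```

Each entry of `history` is one of the four global snapshots on the preceding slides, and the order of entries follows causality: the change to X2 cannot appear before the message that requested it.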
Chandy-Lamport algorithm
System model
• Problem: record a global snapshot (state for each process and channel)
• Model
  – N processes in the system with no failures
  – There are two FIFO unidirectional channels between every process pair (Pi → Pj and Pj → Pi)
  – All messages arrive, intact, not duplicated
• Future work relaxes these assumptions
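The channel assumptions can be modeled with one reliable FIFO queue per ordered process pair. This is a minimal sketch: `make_channels` is a hypothetical helper, and a `collections.deque` stands in for a channel that never loses, corrupts, duplicates, or reorders messages.

```python
from collections import deque
from itertools import permutations


def make_channels(processes):
    """One FIFO, unidirectional channel per ordered pair (Pi -> Pj), i != j."""
    return {(src, dst): deque() for src, dst in permutations(processes, 2)}


channels = make_channels(["P1", "P2", "P3"])
print(len(channels))  # 6, i.e. N * (N - 1) for N = 3

channels[("P1", "P2")].append("a")
channels[("P1", "P2")].append("b")
# FIFO: messages are delivered in the order they were sent
print(channels[("P1", "P2")].popleft())  # a
```

The FIFO property is what the marker algorithm relies on: a marker cannot overtake application messages sent before it on the same channel.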
System requirements
• Taking a snapshot shouldn’t interfere with normal application behavior
  – Don’t stop sending messages
  – Don’t stop the application!
• Each process can record its own state
• Collect state in a distributed manner
• Any process can initiate a snapshot
Initiating a snapshot
• Let’s say process Pi initiates the snapshot
• Pi records its own state and prepares a special marker message (distinct from application messages)
• Send the marker message to all other processes (using N-1 outbound channels)
• Start recording all incoming messages from channels Cji for j not equal to i
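The initiation steps can be sketched as a single function. The function name `initiate_snapshot`, the `MARKER` sentinel string, and the dict-of-deques channel representation keyed by (sender, receiver) are assumptions for illustration; here Cji is the key `(j, i)`.

```python
from collections import deque
from itertools import permutations

MARKER = "<marker>"  # hypothetical sentinel, distinct from any application message


def initiate_snapshot(i, local_state, processes, channels):
    """Pi initiates a snapshot; returns (recorded state, inbound channels being recorded)."""
    recorded_state = dict(local_state)          # Pi records its own state
    for j in processes:
        if j != i:
            channels[(i, j)].append(MARKER)     # marker on each of the N-1 outbound channels
    # Start recording every inbound channel Cji (j != i); arriving messages are appended here
    recording = {(j, i): [] for j in processes if j != i}
    return recorded_state, recording


procs = ["P1", "P2", "P3"]
channels = {(a, b): deque() for a, b in permutations(procs, 2)}
state, recording = initiate_snapshot("P1", {"X1": 0}, procs, channels)
print(sorted(recording))  # [('P2', 'P1'), ('P3', 'P1')]
```

The recorded state is a copy, so the application can keep running and changing its live state while the snapshot is in progress, in line with the system requirements.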
Propagating a snapshot
• For all processes Pj (including the initiator), consider a message on channel Ckj
• If we see a marker message for the first time (i.e., Pj has not yet recorded its state)
  – Pj records its own state and marks Ckj as empty
  – Send the marker message to all other processes (using N-1 outbound channels)
  – Start recording all incoming messages from channels Clj for l not equal to j or k
• Else, stop recording Ckj: its state is the sequence of messages that arrived on Ckj since Pj began recording it
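The initiation and propagation rules can be combined into one per-process handler. This is a minimal sketch, not a production implementation: the `Process` class, its attribute names, and the `MARKER` string are hypothetical, channels are a shared dict of deques keyed by (sender, receiver), and message delivery is driven manually by calling `receive`.

```python
from collections import deque

MARKER = "<marker>"  # hypothetical sentinel value


class Process:
    """Sketch of one process's Chandy-Lamport logic (hypothetical class)."""

    def __init__(self, pid, processes, channels, state):
        self.pid = pid
        self.peers = [p for p in processes if p != pid]
        self.channels = channels      # dict: (src, dst) -> deque
        self.state = state            # live application state
        self.recorded_state = None    # local snapshot, once taken
        self.recording = {}           # inbound channel -> messages seen since recording began
        self.channel_states = {}      # inbound channel -> final recorded channel state

    def _record_and_send_markers(self):
        self.recorded_state = dict(self.state)
        for p in self.peers:
            self.channels[(self.pid, p)].append(MARKER)

    def initiate(self):
        """Initiator: record state, send markers, record all N-1 inbound channels."""
        self._record_and_send_markers()
        self.recording = {(p, self.pid): [] for p in self.peers}

    def receive(self, src):
        """Deliver the next message on channel Ckj = (src -> self.pid)."""
        ckj = (src, self.pid)
        msg = self.channels[ckj].popleft()
        if msg == MARKER:
            if self.recorded_state is None:
                # First marker: record state, Ckj is empty, record every other Clj
                self._record_and_send_markers()
                self.channel_states[ckj] = []
                self.recording = {(p, self.pid): [] for p in self.peers if p != src}
            else:
                # Already recorded: Ckj's state is what arrived since recording began
                self.channel_states[ckj] = self.recording.pop(ckj)
        elif ckj in self.recording:
            self.recording[ckj].append(msg)  # application message, still in flight at snapshot
        return msg

    def done(self):
        """Local termination: own state recorded and all N-1 inbound channels closed."""
        return self.recorded_state is not None and len(self.channel_states) == len(self.peers)
```

Note that application messages are still delivered normally; the handler only copies them into `recording` while a channel is being recorded, so the snapshot does not stop the application.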
Terminating a snapshot
• All processes have received a marker (and recorded their own state)
• All processes have received a marker on all the N-1 incoming channels (and recorded those channels’ states)
• Later, a central server can gather the partial state to build a global snapshot
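The termination condition and the final gathering step can be sketched as two small functions. The record layout (`{"state": ..., "channels": {...}}` per process) and the names `snapshot_complete` and `gather` are hypothetical; the slides only say that a central server collects the partial states.

```python
def snapshot_complete(local_records, n):
    """True once every process recorded its state and all of its N-1 inbound channels."""
    return len(local_records) == n and all(
        rec["state"] is not None and len(rec["channels"]) == n - 1
        for rec in local_records.values()
    )


def gather(local_records):
    """Hypothetical central collector: merge partial states into one global snapshot."""
    snapshot = {"processes": {}, "channels": {}}
    for pid, rec in local_records.items():
        snapshot["processes"][pid] = rec["state"]
        snapshot["channels"].update(rec["channels"])  # keys are (sender, receiver)
    return snapshot


records = {
    "P1": {"state": {"X1": 0}, "channels": {("P2", "P1"): ["M1"]}},
    "P2": {"state": {"X2": 4}, "channels": {("P1", "P2"): []}},
}
if snapshot_complete(records, n=2):
    print(gather(records)["channels"])  # {('P2', 'P1'): ['M1'], ('P1', 'P2'): []}
```

Because each channel is recorded only by its receiving process, merging the per-process channel dicts never produces conflicting entries.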
Example
• P1 initiates a snapshot
(Diagram: P1 state X1:0, Y1:0, Z1:0; P2 state X2:4, Y2:2, Z2:3; C12: [Empty]; C21: [Empty])
Example
• First, P1 records its state
(Diagram: P1 state X1:0, Y1:0, Z1:0; P2 state X2:4, Y2:2, Z2:3; C12: [Empty]; C21: [Empty])
Example
• Then, P1 sends a marker message to P2 and begins recording all messages on inbound channels
• Meanwhile, P2 sends a message to P1
(Diagram: P1 state X1:0, Y1:0, Z1:0; P2 state X2:4, Y2:2, Z2:3; C12: [<marker>]; C21: [M1])
Example
• P2 receives a marker message for the first time, so it records its state
• P2 then sends a marker message to P1
(Diagram: P1 state X1:0, Y1:0, Z1:0; P2 state X2:4, Y2:2, Z2:3; C12: [Empty]; C21: [<marker>]; <marker> delivered to P2, M1 delivered to P1)
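Example
• P1 receives P2’s marker, but it has already recorded its state and sent its own marker, so it stops recording C21 and stores the messages received on C21 since recording began (M1) as that channel’s state
(Diagram: P1 state X1:0, Y1:0, Z1:0; P2 state X2:4, Y2:2, Z2:3; C12: [Empty]; C21: [Empty]; M1 recorded as the state of C21)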
Example
• Both processes have recorded their own state and the state of all their incoming channels
• Our snapshotted state:
  – P1: X1:0, Y1:0, Z1:0
  – P2: X2:4, Y2:2, Z2:3
  – C12: [Empty]
  – C21: [M1]
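The example can be replayed step by step with two FIFO queues to check which snapshot the markers produce. This sketch hard-codes the interleaving from the slides; the variable names and the `MARKER` string are hypothetical.

```python
from collections import deque

MARKER = "<marker>"                     # hypothetical marker sentinel
C12, C21 = deque(), deque()             # FIFO channels P1 -> P2 and P2 -> P1
p1_state = {"X1": 0, "Y1": 0, "Z1": 0}
p2_state = {"X2": 4, "Y2": 2, "Z2": 3}

# P1 initiates: it records its state first
snap_p1 = dict(p1_state)
# P1 sends a marker to P2 and starts recording C21; meanwhile P2 sends M1
C12.append(MARKER)
recording_c21 = []
C21.append("M1")
# P2 receives its first marker: it records its state, marks C12 empty, sends a marker
assert C12.popleft() == MARKER
snap_p2 = dict(p2_state)
state_c12 = []
C21.append(MARKER)
# P1 receives M1 while still recording C21 (FIFO: M1 is ahead of P2's marker)
recording_c21.append(C21.popleft())
# P1 receives P2's marker, so it stops recording C21
assert C21.popleft() == MARKER
state_c21 = recording_c21

snapshot = {"P1": snap_p1, "P2": snap_p2, "C12": state_c12, "C21": state_c21}
print(snapshot["C12"], snapshot["C21"])  # [] ['M1']
```

M1 was sent before P2 recorded its state but received after P1 recorded its state, so it belongs to neither local state; capturing it as the state of C21 keeps it from being lost.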
Causal consistency
• Related to the Lamport clock partial ordering
• An event is pre-snapshot if it occurs before the local snapshot on a process
• Post-snapshot if afterwards
• If event A happens causally before event B, and B is pre-snapshot, then A is too
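The causal-consistency property can be checked mechanically. Happens-before is built from process order and send-to-receive edges; on each process the pre-snapshot events form a prefix, so process order is respected automatically, and it suffices to check that no message is received pre-snapshot but sent post-snapshot. The function name `is_consistent` and its input encoding (events as (process, index) pairs) are hypothetical.

```python
def is_consistent(snapshot_index, messages):
    """
    snapshot_index: pid -> number of events on that process that are pre-snapshot
    messages: list of ((send_pid, send_idx), (recv_pid, recv_idx)) event positions
    Consistent iff every message received pre-snapshot was also sent pre-snapshot.
    """
    def pre(pid, idx):
        return idx < snapshot_index[pid]

    return all(pre(*send) or not pre(*recv) for send, recv in messages)


# The example: P2's event 0 sends M1, then P2 records; P1 records first, then receives M1
m1 = [(("P2", 0), ("P1", 0))]
print(is_consistent({"P1": 0, "P2": 1}, m1))  # True: M1 is in flight, captured in C21
print(is_consistent({"P1": 1, "P2": 0}, m1))  # False: receive recorded, send not
```

The second cut would record the receipt of M1 without its sending, which is exactly what the marker rule prevents: because channels are FIFO, a process always records its state before it can receive a message sent after the sender's snapshot.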