Distributed Systems Day 6: Distributed Hash Tables [Part 3]
Agenda
• Tapestry
• Lookups (routing) in Tapestry
• Adding a node
• Deleting a node
• Dealing with failures
Tapestry Node
• Object store
  • Local key-value pairs
  • I.e., keys for which this node is the "root"
• Route table
  • A table of neighbors
• Backpointers
  • A list of nodes that have this node in their route table
[Figure: Node A and Node B, each with a route table, backpointers, and an object store (key-value store).]

If Node A is in Node B's route table, then Node B is in Node A's backpointers.

Backpointers are useful during:
• Graceful exit
• Adding a new node
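The route-table/backpointer rule above can be sketched in a few lines of Python. This is only an illustration: the `Node` class, the `add_route` helper, and the flat `set` used for the route table are assumptions made for brevity, not part of the slides.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    node_id: str
    route_table: set = field(default_factory=set)   # IDs this node routes to (simplified to a set)
    backpointers: set = field(default_factory=set)  # IDs of nodes that route to this node
    store: dict = field(default_factory=dict)       # local key-value pairs


def add_route(nodes, b_id, a_id):
    """Put A into B's route table and record B in A's backpointers."""
    nodes[b_id].route_table.add(a_id)
    nodes[a_id].backpointers.add(b_id)


def invariant_holds(nodes):
    """A is in B's route table exactly when B is in A's backpointers."""
    return all(
        (a in nodes[b].route_table) == (b in nodes[a].backpointers)
        for a in nodes
        for b in nodes
    )


nodes = {node_id: Node(node_id) for node_id in ("3312", "3320", "3122")}
add_route(nodes, "3320", "3312")
add_route(nodes, "3122", "3312")
print(sorted(nodes["3312"].backpointers))  # ['3122', '3320']
```

Updating both sides in one helper is what keeps the invariant true; a node that later leaves can then find everyone who points at it without a network-wide search.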
Route Table
• The route table is B by B
• Each row identifies a prefix
• Each column is a subset of the prefix
• Each cell is a node
• A node may show up in multiple cells
• There are multiple options for a cell
  • 2303 and 2111 are both valid
  • Pick the "best" option
  • Best == closest node
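Route Table for Node 3312 (row = prefix already matched, column = next digit, ---- = empty cell)

Prefix   0      1      2      3
XXXX     0331   1332   2302   3312
3XXX     ----   3111   ----   3312
33XX     ----   3311   3320   ----
331X     ----   3311   3312   ----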
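To make the table layout concrete, here is a small sketch that rebuilds a route table for node 3312 from the 13 node IDs used in the example network. The function names are hypothetical, and `pick=min` (the lexicographically smallest ID) is only a stand-in for the "closest node" rule, so individual cells may differ from the slide while the empty cells fall in the same places.

```python
NODES = ["2130", "3122", "3312", "2302", "3111", "1332", "3120",
         "0121", "0331", "1331", "1001", "3311", "3320"]
BASE = 4    # digits are 0..3
DIGITS = 4  # every ID has four digits


def shared_prefix_len(a, b):
    """Number of leading digits that a and b have in common."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def candidates(owner, row, col, nodes):
    """All nodes that may occupy cell (row, col) of owner's table."""
    prefix = owner[:row] + str(col)
    return [n for n in nodes if n.startswith(prefix)]


def build_route_table(owner, nodes, pick=min):
    """Row = length of matched prefix, column = next digit, None = empty cell."""
    table = []
    for row in range(DIGITS):
        cells = []
        for col in range(BASE):
            options = candidates(owner, row, col, nodes)
            cells.append(pick(options) if options else None)
        table.append(cells)
    return table


table = build_route_table("3312", NODES)
for row in table:
    print(row)
```

Passing a different `pick` function (for example one based on measured latency) changes which candidate fills each cell but never which cells are empty.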
Building the Route Table for Node 3312

Nodes in the example network: 2130, 3122, 3312, 2302, 3111, 1332, 3120, 0121, 0331, 1331, 1001, 3311, 3320.

The row determines the prefix length; the column determines the digit. The table is filled one cell at a time:
• Row XXXX, digit 0 (0XXX): 0331
• Row XXXX, digit 1 (1XXX): candidates are 1332, 1331, and 1001; 1332 is chosen
• Row XXXX, digit 2 (2XXX): 2302
• Row XXXX, digit 3 (3XXX): 3312
• Row 3XXX, digit 0 (30XX): no node matches, so the cell stays empty
• Row 3XXX, digit 1 (31XX): 3111
• The remaining cells are filled the same way
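Route Table for Node 3320 (row = prefix already matched, column = next digit, ---- = empty cell)

Prefix   0      1      2      3
XXXX     0121   1001   2130   3311
3XXX     ----   3120   ----   3320
33XX     ----   3311   3320   ----
332X     3320   ----   ----   ----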
Nodes 3312 and 3320 share a two-digit prefix: 2 = Prefix(3312, 3320).
Tapestry: ID Lookup

How to Route? Using Prefix Lookup
• lookup: 3122
• Each hop matches one more digit of the target: 3XXX, then 31XX, then 312X
Routing Algorithm

// executed at each node en route to the destination
NextHop(targetHash, step) {
    nextDigit = digit(targetHash, step)
    return (table[step, nextDigit])
}

Step 0, lookup: 3122, starting at Node 2130
• NextDigit = 3122.at(0) = 3
• Table[0, 3] = 3312, a node that matches 3XXX

Route Table for Node 2130 (row = prefix already matched, column = next digit, ---- = empty cell)

Prefix   0      1      2      3
XXXX     0331   1332   2302   3312
2XXX     ----   2130   ----   2302
21XX     ----   ----   ----   2130
213X     2130   ----   ----   ----
Routing Algorithm (continued)

Step 1, lookup: 3122, now at Node 3312 (3XXX matched)
• NextDigit = 3122.at(1) = 1
• Table[1, 1] in Node 3312's route table = 3111, a node that matches 31XX
Routing Algorithm (continued)

Steps 2 and 3, lookup: 3122
• NextDigit = 3122.at(2) = 2, so Table[2, 2] gives a node that matches 312X
• NextDigit = 3122.at(3) = 2, so Table[3, 2] gives 3122 itself

Prefixes matched along the route: 3XXX, 31XX, 312X, 3122.
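The NextHop pseudocode can be turned into a runnable sketch. Everything beyond the `table[step, nextDigit]` lookup is an assumption for illustration: the tables are generated from the 13 example node IDs by always picking the smallest matching ID, so the intermediate hops differ from the slide (which passes through 3312), but the lookup still resolves one digit per step and ends at 3122.

```python
NODES = ["2130", "3122", "3312", "2302", "3111", "1332", "3120",
         "0121", "0331", "1331", "1001", "3311", "3320"]
BASE = 4


def build_table(owner):
    """Row = matched prefix length, column = next digit, None = empty cell."""
    table = []
    for row in range(len(owner)):
        cells = []
        for col in range(BASE):
            options = sorted(n for n in NODES if n.startswith(owner[:row] + str(col)))
            cells.append(options[0] if options else None)
        table.append(cells)
    return table


TABLES = {n: build_table(n) for n in NODES}


def next_hop(current, target, step):
    """Executed at each node en route: look up table[step][next digit]."""
    next_digit = int(target[step])
    return TABLES[current][step][next_digit]


def route(start, target):
    """Follow next_hop until every digit of target has been matched."""
    path, current = [start], start
    for step in range(len(target)):
        hop = next_hop(current, target, step)
        if hop is None:
            raise LookupError("empty cell: surrogate routing would be needed")
        if hop != current:
            path.append(hop)
            current = hop
    return path


print(route("2130", "3122"))  # ['2130', '3111', '3120', '3122']
```

A lookup for an ID with no matching node, such as 3021, hits an empty cell and raises `LookupError` here; handling that case is the job of surrogate routing.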
What Happens if You Are Missing an Entry: Surrogate Routing

• lookup: 3021, currently at Node 3312 (3XXX matched)
• The next digit is 0, but the 30XX cell in Node 3312's route table is empty. Where should the lookup go next?
How?
• If no next hop exists, try the next larger digit, mod base
• Each neighbor-table row must have at least one entry
  • Why?
• If any two neighbor-table rows (of different nodes) share the same prefix, they must agree on which entries are null
  • Why?
Example: route-table rows that share a prefix agree on which entries are null.
• 2 = Prefix(3312, 3320)
• 1 = Prefix(3320, 3122)

Row 3XXX in the route tables of three different nodes (---- = empty cell):

Node     0      1      2      3
3320     ----   3120   ----   3320
3312     ----   3111   ----   3312
3122     ----   3122   ----   3311

The chosen nodes differ, but all three tables leave the 30XX and 32XX cells empty.
Route tables for Nodes 3122 and 3320, drawn as prefix trees (each cell shows the prefix it covers and the node chosen; ---- = empty cell)

Route Table for Node 3122

Prefix   0             1             2             3
XXXX     0xxx: 0121    1xxx: 1332    2xxx: 2302    3xxx: 3312
3XXX     30xx: ----    31xx: 3122    32xx: ----    33xx: 3311
31XX     310x: ----    311x: 3111    312x: 3122    313x: ----
312X     3120: 3120    3121: ----    3122: 3122    3123: ----

Route Table for Node 3320

Prefix   0             1             2             3
XXXX     0xxx: 0121    1xxx: 1332    2xxx: 2302    3xxx: 3312
3XXX     30xx: ----    31xx: 3122    32xx: ----    33xx: 3311
33XX     330x: ----    331x: 3311    332x: 3320    333x: ----
332X     3320: 3320    3321: ----    3322: ----    3323: ----
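Surrogate Routing for 3021

• Step 0: match 3xxx
• Step 1: 30xx? No such entry, so try the next larger digit: 31xx
• Step 2: match 312x
• Step 3: 3121? No such entry, so try the next larger digit: 3122
• Lookups that start at different nodes apply the same rule and arrive at the same surrogate root, 3122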
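The "next larger digit, mod base" rule can be sketched as follows. This is a simplified illustration, not the full Tapestry protocol: the tables are generated from the 13 example node IDs by picking the smallest matching ID, and `surrogate_next_hop` and `find_root` are hypothetical names.

```python
NODES = ["2130", "3122", "3312", "2302", "3111", "1332", "3120",
         "0121", "0331", "1331", "1001", "3311", "3320"]
BASE = 4


def build_table(owner):
    """Row = matched prefix length, column = next digit, None = empty cell."""
    table = []
    for row in range(len(owner)):
        cells = []
        for col in range(BASE):
            options = sorted(n for n in NODES if n.startswith(owner[:row] + str(col)))
            cells.append(options[0] if options else None)
        table.append(cells)
    return table


TABLES = {n: build_table(n) for n in NODES}


def surrogate_next_hop(current, target, step):
    """Try the wanted digit first, then the next larger digits, wrapping mod BASE."""
    row = TABLES[current][step]
    wanted = int(target[step])
    for offset in range(BASE):
        hop = row[(wanted + offset) % BASE]
        if hop is not None:
            return hop
    raise LookupError("row has no entries")  # cannot happen: each row holds its owner


def find_root(start, target):
    """Route digit by digit; the node reached last is the (surrogate) root."""
    current = start
    for step in range(len(target)):
        current = surrogate_next_hop(current, target, step)
    return current


print(find_root("2130", "3021"))  # 3122
```

Because every node's row for a given prefix has empty cells in the same positions, each starting node skips the same digits and ends at the same root, which is why the two rules on the "How?" slide matter.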
Tapestry: Deleting a Node

What Needs to Be Done during Deletion?
• Two versions of node delete
  • Graceful: the node leaves willingly
    • Can do clean-up
  • Ungraceful: the node crashes
    • No opportunity to do clean-up
• The local <K, V> pairs will disappear → must inform owners to republish
• I'm in other people's routing tables → must tell them I'm leaving

Graceful delete
• Remote routing tables: use backpointers
• Local K/V: inform the client to republish

Ungraceful delete
• Remote routing tables: nodes will detect the failure
• Local K/V: the client will republish after a timeout
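A graceful exit, as described above, might look roughly like this. The `Node` class, the `(value, owner)` layout of the store, and the `notify_owner` callback are assumptions made for the example; removing the leaving node from other nodes' backpointers is an extra clean-up step added here to keep both sides consistent.

```python
class Node:
    def __init__(self, node_id):
        self.node_id = node_id
        self.route_table = set()   # IDs this node routes to (simplified to a set)
        self.backpointers = set()  # IDs of nodes that route to this node
        self.store = {}            # key -> (value, owner)


def graceful_leave(nodes, leaving_id, notify_owner):
    """Clean up before leaving: fix remote tables, then ask owners to republish."""
    leaving = nodes[leaving_id]
    # Remote routing tables: backpointers say exactly who points at us.
    for other_id in leaving.backpointers:
        nodes[other_id].route_table.discard(leaving_id)
    # Assumed extra step: stop appearing in other nodes' backpointers.
    for other_id in leaving.route_table:
        nodes[other_id].backpointers.discard(leaving_id)
    # Local K/V: these objects disappear, so tell each owner to republish.
    for key, (_value, owner) in leaving.store.items():
        notify_owner(owner, key)
    del nodes[leaving_id]


nodes = {name: Node(name) for name in ("A", "B", "C")}
for other in ("B", "C"):               # B and C both route through A
    nodes[other].route_table.add("A")
    nodes["A"].backpointers.add(other)
nodes["A"].store["k1"] = ("v1", "client-1")

republish_requests = []
graceful_leave(nodes, "A", lambda owner, key: republish_requests.append((owner, key)))
print(republish_requests)  # [('client-1', 'k1')]
```

In the ungraceful case none of this code runs: neighbors only learn of the failure when they detect it, and clients republish after their own timeout.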
Tapestry: Adding a Node

Adding a New Node
• Step 1: Find the root for the server ID
• Step 2: Move a subset of objects from the root to the new node
• Step 3: Make the new node's routing table
• Step 4: Update the routing tables of other nodes
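Adding Node 3121
• Step 1: find root
• Step 2: make route table for 3121

Make New Node's Routing Table
• Nodes at each stage of the lookup share a prefix
• You can build a new routing table by:
  • Option 1: Taking a subset of the routing tables at each step
  • Option 2: Taking backpointers: for each backpointer, ask for its backpointers

Option 1
[Figure: the lookup path for the new node (labeled "3123?"), with nodes 2130, 3111 (3xxx), 3122, and 3120.]
The new node asks the nodes along the lookup path for one row each:
• "Give me entries in row 0"
• "Give me entries in row 1"
• "Give me entries in row 2"
• "Give me entries in row 3"
The replies make up the 3123 route table.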
Option 2
[Figure: root 3122 surrounded by successive layers of backpointers, with the new node (labeled "3121?") under 3xxx.]
The new node collects backpointers in rounds:
• "Give me pointers with overlap 3"
• "Give me pointers with overlap 2"
• "Give me pointers with overlap 1"
• "Give me pointers with overlap 0"
The replies make up the 3121 route table.

Step 1: Ask the root for backpointers with overlap p = prefixlen(newNodeID, RootID)
Step 2: For each backpointer, ask for its backpointers with overlap p-- (p is decremented each round)
Step 3: Repeat step 2 until p = 0
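The three steps above can be sketched as a loop over shrinking overlap values. This is one possible reading of the slide, under stated assumptions: existing tables are generated from the 13 example node IDs by picking the smallest matching ID, backpointers are derived from those tables, each round queries every node found so far, and the round with p = 0 is included. The names `collect_via_backpointers` and `lowest` are hypothetical.

```python
NODES = ["2130", "3122", "3312", "2302", "3111", "1332", "3120",
         "0121", "0331", "1331", "1001", "3311", "3320"]
BASE = 4


def shared_prefix_len(a, b):
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def build_table(owner, known):
    """Row = matched prefix length, column = next digit, None = empty cell."""
    table = []
    for row in range(len(owner)):
        cells = []
        for col in range(BASE):
            options = sorted(n for n in known if n.startswith(owner[:row] + str(col)))
            cells.append(options[0] if options else None)
        table.append(cells)
    return table


# Existing network state: route tables, and the backpointers they imply.
TABLES = {n: build_table(n, NODES) for n in NODES}
BACKPOINTERS = {n: set() for n in NODES}
for owner, table in TABLES.items():
    for row in table:
        for entry in row:
            if entry is not None and entry != owner:
                BACKPOINTERS[entry].add(owner)


def collect_via_backpointers(new_id, root_id, lowest=0):
    """Start at the root with overlap p, then widen the search as p decreases."""
    p = shared_prefix_len(new_id, root_id)
    known = {root_id}
    while p >= lowest:
        found = set()
        for node in known:
            found |= {b for b in BACKPOINTERS[node]
                      if shared_prefix_len(b, new_id) >= p}
        known |= found
        p -= 1
    return known


known = collect_via_backpointers("3121", "3122")
new_table = build_table("3121", known | {"3121"})
print(new_table[3])  # ['3120', '3121', '3122', None]
```

In a network this small the final round reaches every node; in a larger network the same loop would only touch nodes that are reachable through backpointers at each overlap level.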
Adding Node 3121
• Step 1: find root
• Step 2: make route table for 3121
• Step 2.1: update backpointers
  • To every node in 3121's routing table, send a "hello" message
  • They should add you to their backpointers

[Figure: node 3121 and a node in 3121's routing table, each with a route table, backpointers, and an object store (local KV).]
• Problem: the new node can fill in an empty slot in a few routing tables
• Solutions
  • Option 1: send a message to every node
  • Option 2: send a message to "need-to-know" nodes
    • I.e., nodes that we know have an empty slot where the new node's ID goes

Adding Node 3121
• Step 1: find root
• Step 2: make route table for 3121
• Step 2.1: update backpointers
• Step 3: update "need-to-know" nodes
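The "need-to-know" definition above can be checked directly against a set of route tables. The sketch below is an offline illustration with a global view of all tables, which a real node would not have; the tables are generated from the 13 example node IDs by picking the smallest matching ID, and `need_to_know` is a hypothetical helper name.

```python
NODES = ["2130", "3122", "3312", "2302", "3111", "1332", "3120",
         "0121", "0331", "1331", "1001", "3311", "3320"]
BASE = 4


def shared_prefix_len(a, b):
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def build_table(owner):
    """Row = matched prefix length, column = next digit, None = empty cell."""
    table = []
    for row in range(len(owner)):
        cells = []
        for col in range(BASE):
            options = sorted(n for n in NODES if n.startswith(owner[:row] + str(col)))
            cells.append(options[0] if options else None)
        table.append(cells)
    return table


TABLES = {n: build_table(n) for n in NODES}


def need_to_know(new_id, tables):
    """Nodes whose table has an empty slot exactly where new_id would go."""
    result = set()
    for owner, table in tables.items():
        # The only cell new_id can newly fill is in the row equal to the
        # shared prefix length; earlier rows already hold a matching node.
        row = shared_prefix_len(owner, new_id)
        if row < len(new_id) and table[row][int(new_id[row])] is None:
            result.add(owner)
    return result


print(sorted(need_to_know("3121", TABLES)))  # ['3120', '3122']
print(sorted(need_to_know("3021", TABLES)))
```

For 3121 only two nodes need an update, while for 3021 every node whose ID starts with 3 has an empty 30XX slot, which shows why Option 2 can be much cheaper than messaging every node.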
Surrogate Routing for 3021
[Figure: the surrogate-routing paths for 3021: 3xxx, 30xx? (empty), 31xx, 312x, 3121? (empty), 3122.]

Any node with a 30XX slot in its routing table is a need-to-know node.
Updating Need-to-Know Nodes
• The surrogate root contacts the nodes
• Send NewNode to neighbors with p = prefix(root, NewNode)
• Send NewNode to neighbors with prefix p--

[Figure: the root, its neighbors, and the new node (labeled "3121?") under 3***.]

Note: in this example, since p = 1 (3***),
Tapestry: Optimizations

Implications of Node Failure
• Problem: when a node crashes, all objects stored on the node are lost
• Naïve solution: clients republish objects periodically
  • I.e., you (as a client) need to republish your Facebook pictures
  • Drawback 1: clients need to store and republish
  • Drawback 2: objects are unavailable until the client republishes
• What are some alternative solutions?
  • "Salt" the hash and publish several copies of the object
  • Recover from failure through redundancy
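One way to picture salting: hash the object name together with a small counter, so the same object maps to several keys (and therefore, usually, several root nodes). The sketch below is an assumption-laden illustration: SHA-1, the `name#salt` format, and the 4-digit base-4 key space are choices made here to match the example IDs, not something the slides specify.

```python
import hashlib


def salted_keys(object_name, copies=3, digits=4, base=4):
    """Derive one key per salt value; each key is published as a separate copy."""
    keys = []
    for salt in range(copies):
        digest = hashlib.sha1(f"{object_name}#{salt}".encode()).digest()
        value = int.from_bytes(digest, "big") % (base ** digits)
        key = ""
        for _ in range(digits):          # convert to a fixed-width base-4 string
            key = str(value % base) + key
            value //= base
        keys.append(key)
    return keys


keys = salted_keys("vacation-photo.jpg")
print(keys)
```

A reader who cannot reach the root for the first key simply retries with the next salt, so a single node crash no longer makes the object unavailable. With only 256 possible keys in this toy key space, two salts can occasionally collide.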
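Latency Optimization: Entry Selection
• Problem: multiple options for each routing table entry
  • How do you select one?
• Shortest distance
  • Benefit: lowers lookup latency

Route Table for Node 3312 (row = prefix already matched, column = next digit, ---- = empty cell)

Prefix   0      1      2      3
XXXX     0331   1332   2302   3312
3XXX     ----   3111   ----   3312
33XX     ----   3312   3320   ----
331X     ----   3311   3312   ----

Options:
• 3312
• 3320
• 3311
• 3312
• 3111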
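Entry selection by shortest distance can be sketched as a `min` over the candidates for a cell. The latency figures below are made up for the example, and `pick_entry` is a hypothetical helper; a real node would use its own measurements.

```python
NODES = ["2130", "3122", "3312", "2302", "3111", "1332", "3120",
         "0121", "0331", "1331", "1001", "3311", "3320"]

# Hypothetical round-trip times (ms) as measured from node 3312.
LATENCY_MS = {"3312": 0, "3320": 12, "3311": 5, "3122": 40, "3111": 30, "3120": 35}


def pick_entry(owner, row, col, nodes, latency_ms):
    """Among all nodes that fit cell (row, col), choose the closest one."""
    prefix = owner[:row] + str(col)
    options = [n for n in nodes if n.startswith(prefix)]
    if not options:
        return None  # the cell stays empty
    return min(options, key=lambda n: latency_ms[n])


print(pick_entry("3312", 1, 1, NODES, LATENCY_MS))  # 3111
print(pick_entry("3312", 1, 3, NODES, LATENCY_MS))  # 3312
```

Any candidate would keep routing correct; choosing the nearest one only shortens each hop, which is where the lower lookup latency comes from.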
Distributed Hash Table Recap
• Consistent hash (Chash)
  • Benefit of consistent hashing over traditional key allocation
  • How to map keys to servers
• Chord (practical use of Chash)
  • Terms: successor, routing table (finger table)
  • Building a routing table
  • Performing look-ups
• Tapestry
  • Terms: root node, surrogate node, backpointers, publishing
  • Building a routing table
  • Performing a lookup (regular routing, surrogate routing)
  • Adding/deleting a node ("need-to-know")
  • Optimizations ("salting", entry selection)
[Figure: a hash table of key-value pairs (k0, v0) through (k5, v5), with entries such as (k4, v4) and (k5, v5) held in the local <K, V> store of a node that also has a route table and backpointers; other nodes are labeled with hexadecimal IDs.]