Shepherd: Node Monitors for Fault-Tolerant Distributed Process Execution in OSIRIS
Nenad Stojnić
Databases & Information Systems Group
Outline
Self-organizing properties in OSIRIS and current limitations
The Shepherd approach to fault tolerance
Novel migration algorithm
Shepherd ring: herds, shepherd pools, routing
Binding ring: Service lookup, late binding, load balancing
Summary
Present ongoing work aimed at improving OSIRIS' fault-tolerance capabilities
OSIRIS: Open Service Infrastructure for Reliable and Integrated process Support
Decentralized P2P execution of processes
Web service invocation
Fault-tolerant, self-* properties
Late binding & load balancing
Safe continuation passing (2PC)
Pub/sub meta-data repositories
Processes can be thought of as programs that coordinate the invocation of distributed web services.
Late binding of service instances, in conjunction with load-balancing strategies, already offers self-* properties.
Transactional guarantees: the system is completely resilient to temporary node failures.
Also, thanks to late binding, permanent failures of nodes participating in the execution of a process instance, but not involved in a computation at the moment of failure, do not affect the execution.
OSIRIS: Process execution example
[Figure: process definition with activities A, B and C]
The node migrates control of process execution to one or more successor nodes by delivering an activation token containing flow-control information and the whiteboard.
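The token hand-over described above can be pictured with a minimal Python sketch. It assumes a strictly linear process definition A, B, C; the names `ActivationToken` and `execute_and_migrate` are hypothetical and not part of OSIRIS, and the "service result" is a placeholder string.

```python
from dataclasses import dataclass, field
from typing import Optional

# Assumption: a linear process definition; real definitions may branch.
PROCESS_DEFINITION = ["A", "B", "C"]


@dataclass
class ActivationToken:
    """Hypothetical token: flow-control information plus the whiteboard."""
    process_id: str
    activity: str                                   # which activity to run next
    whiteboard: dict = field(default_factory=dict)  # process instance data


def execute_and_migrate(token: ActivationToken) -> Optional[ActivationToken]:
    """Run the token's activity locally and build the token for the successor."""
    whiteboard = dict(token.whiteboard)
    whiteboard[token.activity] = f"result of {token.activity}"  # placeholder result
    index = PROCESS_DEFINITION.index(token.activity)
    if index + 1 == len(PROCESS_DEFINITION):
        return None  # last activity: nothing left to migrate
    return ActivationToken(token.process_id, PROCESS_DEFINITION[index + 1], whiteboard)
```

Calling `execute_and_migrate` repeatedly, starting from `ActivationToken("p1", "A")`, walks the token through A, B and C and accumulates one whiteboard entry per activity.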
OSIRIS: Process migration
[Figure: process definition (A, B, C); nodes hosting service instances on top of the OSIRIS layer; whiteboard]
OSIRIS: Activity execution
[Figure: process definition (A, B, C); nodes hosting service instances on top of the OSIRIS layer; whiteboard]
OSIRIS: Late binding
[Figure: process definition (A, B, C); several nodes hosting service instances on top of the OSIRIS layer; whiteboard]
OSIRIS: Successor failure
[Figure: process definition (A, B, C); nodes hosting service instances on top of the OSIRIS layer; a successor node is marked as failed (X); whiteboard]
Replacement node found (late binding)
OSIRIS: Migration failure
[Figure: process definition (A, B, C); nodes hosting service instances on top of the OSIRIS layer; the whiteboard migration under 2PC is marked as failed (X)]
OSIRIS: Predecessor failure
[Figure: process definition (A, B, C); nodes hosting service instances on top of the OSIRIS layer; a node is marked as failed (X); whiteboard]
OSIRIS: Current node failure
[Figure: process definition (A, B, C); nodes hosting service instances on top of the OSIRIS layer; the node currently holding the whiteboard is marked as failed (X)]
OSIRIS failure handling
Failure case: Handling
Successor failure: Late binding
Migration failure: 2PC abort
Predecessor failure: No handling necessary
Temporary node failure: Recovery from local stable storage
Current node failure: Process execution stops/hangs; state is lost; no notification
Hardware, network or service failures
If a node becomes temporarily disconnected from the network, the system is still able to recover: the node keeps retrying to pass on its results until it succeeds.
Works very well in controlled environments
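The retry behaviour for temporarily disconnected nodes can be sketched as follows. This is only an illustration: `deliver_with_retry`, the `send` callable and the use of `ConnectionError` are assumptions, not the OSIRIS API.

```python
import time


def deliver_with_retry(send, payload, delay=0.0, max_attempts=None):
    """Keep calling send(payload) until it succeeds; return the number of attempts.

    `send` is a hypothetical callable that raises ConnectionError while the
    successor is unreachable. With max_attempts=None the loop retries forever,
    which mirrors "keeps retrying until it succeeds".
    """
    attempts = 0
    while True:
        attempts += 1
        try:
            send(payload)
            return attempts
        except ConnectionError:
            if max_attempts is not None and attempts >= max_attempts:
                raise  # give up and let the caller decide
            time.sleep(delay)  # wait before the next attempt
```

Note that such a loop only helps when the sending node itself survives, which is why a failure of the current node is not covered by it.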
Our solution: Shepherd
[Figure: three layers (Shared Memory Layer, OSIRIS Layer, Shepherd Layer); nodes on the OSIRIS Layer; arrows labelled Monitor and Read/Write]
Worker nodes (WNs) are assigned to shepherds (herds)
Shepherds are organized in pools, each with a leader
Shepherds in a pool share state
Persistence of process state
Triggering of process activities
Shepherd Migration Algorithm
Step 1: The shepherd starts the activity, picks a worker from the herd and sends it an activation key K0.
Step 2: The worker acknowledges supervision by resending the activation key K0; monitoring starts.
Step 3: The worker reads the whiteboard with the activation key K0.
Step 4: The worker finishes execution, generates a new activation key K1 and determines the service type that continues the execution: (K1, B).
Step 5: The worker writes the whiteboard with the activation key K1.
Step 6: The worker acknowledges the write of the whiteboard (Wack); supervision ends.
Step 7: The shepherd migrates to another shepherd, passing on the activation key K1 and the following service type: (K1, B).
[Figure: overview of the migration algorithm across shepherds S1, S2, S3 and workers A, B, C, with steps 1 to 7, activation keys K0 to K3, pairs (K1, B), (K2, C), (K3, ...) and Wack acknowledgements]
The leader of a pool communicates an activation key Ki to a WN.
Using Ki, the WN gets the process state from the shared memory layer (SML).
The WN writes the next process activity with a new key Ki+1 to the SML.
The WN sends the new activation key Ki+1 to its assigned pool of shepherds.
The leader of the pool forwards the activation key to another pool of shepherds.
A further step deletes entries from the shared memory.
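The seven steps can be traced in a small, single-process Python sketch. Everything here is a stand-in under stated assumptions: `SharedMemory` is a plain dictionary rather than a replicated store, each shepherd is a single object rather than a pool with a leader, each worker is assumed to know which service type follows its own, and the acknowledgement of supervision (step 2) is collapsed into the method call.

```python
import uuid


class SharedMemory:
    """Stand-in for the shared memory layer: activation key -> whiteboard."""

    def __init__(self):
        self._store = {}

    def read(self, key):
        return dict(self._store[key])

    def write(self, key, whiteboard):
        self._store[key] = dict(whiteboard)
        return "Wack"  # write acknowledgement


class Worker:
    def __init__(self, service_type, next_type):
        self.service_type = service_type
        self.next_type = next_type  # assumption: worker knows the following type

    def run(self, key, sml):
        whiteboard = sml.read(key)               # step 3: read with K_i
        whiteboard[self.service_type] = "done"   # placeholder for the service call
        new_key = uuid.uuid4().hex               # step 4: fresh key K_{i+1}
        ack = sml.write(new_key, whiteboard)     # step 5: write with K_{i+1}
        return ack, new_key, self.next_type      # step 6: Wack to the shepherd


class Shepherd:
    def __init__(self, name, herd):
        self.name = name
        self.herd = herd  # service type -> list of workers

    def supervise(self, key, service_type, sml):
        worker = self.herd[service_type][0]      # step 1: pick a worker, send K_i
        ack, new_key, next_type = worker.run(key, sml)  # steps 2 to 6
        if ack != "Wack":
            raise RuntimeError("whiteboard write was not acknowledged")
        return new_key, next_type                # step 7: pass (K_{i+1}, type) on


def run_process(shepherds, start_type, sml):
    key = uuid.uuid4().hex
    sml.write(key, {})                           # initial whiteboard under K0
    service_type, trace = start_type, []
    while service_type is not None:
        shepherd = next(s for s in shepherds if service_type in s.herd)
        trace.append((shepherd.name, service_type))
        key, service_type = shepherd.supervise(key, service_type, sml)
    return sml.read(key), trace
```

With three shepherds whose herds provide A, B and C respectively, `run_process` visits S1, S2 and S3 in turn, and the final whiteboard holds one entry per activity.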
Shepherd failure cases
Failure of worker nodes
Failure of shepherds
Failures in the shared memory
Failure of worker nodes
Replacement node from the herd
Same service type
Fail-safe services
BUT: undo side effects on shared memory
[Figure: worker A fails (X); shepherd S1 sends K0 to replacement workers A' and A'']
Unique activation keys provide independence of process activities
Temporarily failed WNs that have been replaced are terminated
Side effects created by B that are not stored on the shared memory cannot be undone
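A minimal sketch of worker replacement, assuming the shepherd simply tries the workers of one service type in turn and that every attempt restarts from the state stored under the same key K0. The names `WorkerFailed` and `supervise_with_replacement` are hypothetical, and the shared memory is a plain dictionary.

```python
import uuid


class WorkerFailed(Exception):
    """Raised by a (simulated) worker that crashes during execution."""


def supervise_with_replacement(herd, key, shared_memory):
    """Try workers of the same service type until one completes the activity.

    herd: list of callables taking a whiteboard dict and returning the updated one.
    Each attempt works on its own copy of the state stored under `key`, so a
    crashed attempt leaves the stored state untouched.
    """
    for worker in herd:
        try:
            whiteboard = dict(shared_memory[key])  # replacement re-reads state at K0
            result = worker(whiteboard)
        except WorkerFailed:
            continue                               # pick the next worker from the herd
        new_key = uuid.uuid4().hex                 # result is stored under a fresh key
        shared_memory[new_key] = result
        return new_key
    raise RuntimeError("no live worker left in the herd")
```

Because results only become visible under a fresh key, a failed attempt cannot corrupt the state that its replacement reads. Side effects outside the shared memory are not covered by this sketch, which matches the limitation noted above.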
Failure of shepherds
Shepherds organized in pools, state shared
WN speaks to the pool
Transactional writes: consistency guaranteed
New leader learns current state from the pool
[Figure: worker A, shepherd S1 marked as failed (X), shepherd S2, Wack]
DHT-like structured overlay
Paxos commit protocol
Each shepherd has consistent information about the state of the activity it is supervising (distributed transaction)
The DHT fault-detection mechanism is used to elect an appropriate replacement shepherd replica
Failures in shared memory
Chord-based
Replicated transactional storage
Successful writes are persistent
Failed reads/writes can always be retried
[Figure: worker A, shepherd S1, failure (X) in the shared memory]
Beernet DHT implementation
With respect to the migration algorithm, the shared memory plays only a passive role
Shepherd ring
Used for:
Worker node to shepherd assignment
Routing of messages from WN to shepherds
Pool construction
Based on the Chord structured overlay
Identifier circle of shepherd node IDs and worker node IDs (consistent hashing)
Efficient routing: O(log N_Sh)
Routing mechanism
Failure-detection mechanism
Shepherds in a pool share their state relative to the execution of the migration algorithm
The ring determines how worker nodes are assigned to the herd of a shepherd, how several shepherds coordinate to form a pool, and how leader election within a pool proceeds.
It also carries the communication between a worker node and a pool of shepherds.
Shepherds are the physical nodes and WNs the resources to be stored. Worker node IDs lying between two shepherd IDs on the circle become the herd of the adjacent shepherd.
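The herd assignment can be sketched with consistent hashing on a small identifier circle. The sketch assumes the Chord successor convention (a worker ID belongs to the first shepherd at or after it on the circle), a circle of 24 positions, SHA-1 as hash function and made-up shepherd positions; none of these details are confirmed by the slides.

```python
import hashlib
from bisect import bisect_left

RING_SIZE = 24  # assumption: a small circle with positions 0..23

# Hypothetical shepherd positions on the circle (position -> name).
SHEPHERDS = {5: "S2", 7: "S4", 12: "S3", 18: "S5", 22: "S1"}


def ring_id(address: str) -> int:
    """Map an address to a circle position (assumed hash: SHA-1 mod RING_SIZE)."""
    digest = hashlib.sha1(address.encode()).digest()
    return int.from_bytes(digest, "big") % RING_SIZE


def responsible_shepherd(key: int, shepherds: dict) -> str:
    """Return the first shepherd at or after `key`, wrapping around the circle."""
    positions = sorted(shepherds)
    i = bisect_left(positions, key % RING_SIZE)
    return shepherds[positions[i % len(positions)]]


def herds(worker_ids, shepherds: dict) -> dict:
    """Group worker IDs into herds, one herd per shepherd."""
    result = {name: [] for name in shepherds.values()}
    for worker_id in worker_ids:
        result[responsible_shepherd(worker_id, shepherds)].append(worker_id)
    return result
```

Under this convention, removing a shepherd from `SHEPHERDS` and recomputing `herds` moves its entire herd to the next shepherd on the circle, while all other herds stay unchanged.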
Shepherd ring
[Figure: identifier circle ID0..ID23 with shepherds S1..S5; a worker node (WN) issues deliver(96.76.89.12, join())]
Worker requests an assignment to a shepherd
Submits a join message to any known shepherd
If a shepherd leaves the ring, the subsequent one takes over the herd
[Figure: identifier circle ID0..ID23 with shepherds S1..S5; h(96.76.89.12) = ID17; IP16 = 98.x.x.x; deliver(96.76.89.12, join())]
Shepherd hashes the worker ID
Routes the join message to another shepherd
Routing continues until the responsible shepherd is found
[Figure: identifier circle ID0..ID23 with shepherds S1..S5; IP16 = 98.x.x.x; IP17 = 96.76.89.12; deliver(96.76.89.12, join())]
Worker joins the herd
Exchanges heartbeats with its shepherd
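The routing of a join message can be illustrated with a deliberately simplified walk along successor pointers. This is not Chord's finger-table routing, so it takes a linear rather than logarithmic number of hops; the shepherd positions and the function names are hypothetical.

```python
def in_interval(key, left, right):
    """True if key lies in the half-open ring interval (left, right]."""
    if left < right:
        return left < key <= right
    return key > left or key <= right  # interval wraps past ID0


def route_join(worker_id, start, ring):
    """Forward a join message from shepherd `start` until the responsible one.

    ring: positions of all shepherds on the identifier circle.
    Returns the list of shepherd positions visited, ending at the shepherd
    whose interval (predecessor, self] contains worker_id.
    """
    ring = sorted(ring)
    path = [start]
    current = start
    while True:
        i = ring.index(current)
        predecessor = ring[i - 1]             # index -1 wraps to the last shepherd
        if in_interval(worker_id, predecessor, current):
            return path                       # responsible shepherd found
        current = ring[(i + 1) % len(ring)]   # hand over to the successor
        path.append(current)
```

For example, with shepherds at positions 5, 7, 12, 18 and 22, a join for worker ID 17 submitted to the shepherd at 22 is forwarded through 5, 7 and 12 before it reaches the shepherd at 18, which then adds the worker to its herd.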
Shepherd pools
Symmetric replication strategy:
Node ID congruence-modulo equivalence classes
The node responsible for x knows the entire class of x
Pool = all nodes responsible for a class
Transactional guarantees: Paxos consensus
Shepherd pools
[Figure: identifier circle ID0..ID23 with shepherds S1..S5]
Congruence modulo: 8
Pool size: 3
Equivalence class ID1, ID9, ID17: pool S2, S3, S5
Equivalence class ID2, ID10, ID18: pool S2, S3, S5
Equivalence class ID3, ID11, ID19: pool S2, S3, S1
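The pool construction can be reproduced in a few lines of Python. The circle size (24), modulus (8) and pool size (3) follow the example; the shepherd positions are hypothetical, chosen only so that the three pools listed in the example come out, and the successor convention for responsibility is likewise an assumption.

```python
from bisect import bisect_left

RING_SIZE = 24
POOL_SIZE = 3
MODULUS = RING_SIZE // POOL_SIZE  # 8, as in the example

# Hypothetical shepherd positions (position -> name); not taken from the slides.
SHEPHERDS = {5: "S2", 7: "S4", 12: "S3", 18: "S5", 22: "S1"}


def equivalence_class(key):
    """All IDs congruent to `key` modulo MODULUS, e.g. 1 -> [1, 9, 17]."""
    return [key % MODULUS + i * MODULUS for i in range(POOL_SIZE)]


def responsible(key, shepherds):
    """First shepherd at or after `key` on the circle (assumed convention)."""
    positions = sorted(shepherds)
    i = bisect_left(positions, key % RING_SIZE)
    return shepherds[positions[i % len(positions)]]


def pool(key, shepherds):
    """Pool of `key`: the shepherds responsible for the members of its class."""
    return sorted({responsible(k, shepherds) for k in equivalence_class(key)})
```

With these positions, `pool(1, SHEPHERDS)` and `pool(2, SHEPHERDS)` both give S2, S3, S5, while `pool(3, SHEPHERDS)` gives S1, S2, S3, in line with the three classes above.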
Late binding
Locate a shepherd providing service type T
A shepherd provides type T if it monitors instances of type T
Binding ring
Physical nodes & service types (resources)
Distributed multimap data structure
Service type -> list of shepherds
Binding ring
[Figure: binding ring with identifier circle T0..T23 and nodes O1..O8; request store(T, S5)]
Storing shepherd S5 providing service type T
Query for the number of fragments of type T
Fragments of service type T in the ring: Tfrag1 = {S1}, Tfrag2 = {S4}, Tfrag3 = {S2, S3}; Cfrag = 3
Each fragment is a multimap
Random selection of a fragment for storage: rnd[1, Nfrag] = 2, so storefrag(Tfrag2, S5) yields Tfrag2 = {S4, S5}
If storage is full, create a new fragment and add to it
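The fragmented storage can be sketched as follows. The sketch keeps all fragments in one local dictionary, whereas in the binding ring each fragment would live at its own position on the circle; the class name, the `capacity` parameter and the reading of "storage is full" as "the randomly chosen fragment is full" are assumptions.

```python
import random


class FragmentedMultimap:
    """Service type -> list of fragments; each fragment is a bounded list of shepherds."""

    def __init__(self, capacity, rng=None):
        self.capacity = capacity          # assumed maximum entries per fragment
        self.fragments = {}               # service type -> list of fragments
        self.rng = rng or random.Random()

    def store(self, service_type, shepherd):
        """Store a shepherd for a type and return the 1-based fragment index used."""
        frags = self.fragments.setdefault(service_type, [[]])
        index = self.rng.randint(1, len(frags))     # rnd[1, Nfrag]
        if len(frags[index - 1]) >= self.capacity:  # chosen fragment is full
            frags.append([])                        # create a new fragment
            index = len(frags)
        frags[index - 1].append(shepherd)
        return index

    def lookup(self, service_type):
        """Return every shepherd registered for the type, across all fragments."""
        return [s for frag in self.fragments.get(service_type, []) for s in frag]
```

Spreading the entries of a popular service type over several fragments keeps any single ring node from holding the complete list, at the cost of one extra query for the number of fragments.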
Load balancing
Optimize performance
Extended binding ring: shepherd average load
Publish/subscribe of load information
Aggregate load
The binding ring as explained above is sufficient to guarantee the correctness of the routing and to enable late binding; load balancing takes other factors into consideration to improve process execution.
Load balancing
[Figure: shepherd ring with identifier circle ID0..ID23; WN3 load = 40%, WN5 load = 60%]
Worker nodes publish load to their shepherd
[Figure: binding ring with identifier circle T0..T23; Cfrag1 = {S1: 70%}, Cfrag2 = {S3: 75%, S2: 60%}, Afrag1 = {S1: 70%, S4: 55%}; Cbest points to Cfrag2]
Avg. load of a shepherd for a service type
Avg. load lists sorted in fragments
[Figure: binding ring; Afrag1 = {S4: 55%, S1: 50%}, Cfrag2 = {S3: 75%, S2: 60%}, Cfrag1 = {S1: 50%}; Cbest = <Cfrag1, 50%>]
Start contest
Least loaded type fragment becomes the best fragment
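The selection of the best fragment can be sketched with the load figures from the example. The function names are hypothetical, and ranking fragments by the mean of their shepherds' loads is an assumption; the slides only state that the least loaded fragment wins.

```python
def shepherd_average(worker_loads):
    """Average load (in percent) that a shepherd derives from its workers."""
    return sum(worker_loads) / len(worker_loads)


def fragment_load(fragment):
    """Mean load over the shepherds of one fragment (assumed aggregation)."""
    return sum(fragment.values()) / len(fragment)


def best_fragment(fragments):
    """Return (name, load) of the least loaded fragment of a service type."""
    name = min(fragments, key=lambda n: fragment_load(fragments[n]))
    return name, fragment_load(fragments[name])


# Workers WN3 (40%) and WN5 (60%) publish their load to shepherd S1.
s1_load = shepherd_average([40, 60])

# Fragments of service type C before and after S1's load update.
before = {"Cfrag1": {"S1": 70}, "Cfrag2": {"S3": 75, "S2": 60}}
after = {"Cfrag1": {"S1": s1_load}, "Cfrag2": {"S3": 75, "S2": 60}}
```

Here `best_fragment(before)` selects Cfrag2, and once S1 reports an average of 50%, `best_fragment(after)` selects Cfrag1 with a load of 50%, which corresponds to Cbest = <Cfrag1, 50%> in the example.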
Summary
Shepherd: Improved self-* properties in OSIRIS
Novel completely decentralized architecture
Future work:
Implementation & experimental evaluation
Extend to Stream-enabled services
Customize transactional protocols for efficiency
Economical cost-model (trade-off performance vs. robustness)
Thank you for your attention!
Questions ?