Locality Sensitive Distributed Computing
David PelegWeizmann Institute
Structure of mini-course
1. Basics of distributed network algorithms
2. Locality-preserving network representations
3. Constructions and applications
Part 1: Basic distributed algorithms
• Model
• Broadcast
• Tree constructions
• Synchronizers
• Coloring, MIS
The distributed network model
Point-to-point communication network
The distributed network model
Described by an undirected weighted graph G(V,E,ω)
V = {v1,…,vn} - processors (network sites)
E - bidirectional communication links
The distributed network model
ω : E → ℝ+ - edge weight function representing transmission costs (usually satisfies the triangle inequality)
Unique processor ID's: ID : V → S
S = {s1,s2,…} - ordered set of integers
Communication
Processor v has deg(v,G) ports (external connection points)
Edge e represents pair ((u,i),(v,j)) = link connecting u's port i to v's port j
Communication
Message transmission from u to neighbor v:
• u loads M onto port i• v receives M in input buffer of port j
Communication
Assumption: At most one message can occupy a communication link at any given time
(Link is available for the next transmission only after the previous message is removed from the input buffer by the receiving processor)
Allowable message size = O(log n) bits(messages carry a fixed number of vertex ID's, e.g., sender and destination)
Issues unique to distributed computing
There are several inherent differences between the distributed and the traditional centralized-sequential computational models.
Communication
In centralized setting: Issue nonexistent
In distributed setting: Communication
• has its limits (in speed and capacity)
• does not come "for free"
• should be treated as a computational resource, such as time or memory (often the dominating consideration)
Communication as a scarce resource
One common model: LOCAL
Assumes local processing comes for free (Algorithm pays only for communication)
Incomplete knowledge
In centralized-sequential setting: Processor knows everything (inputs, intermediate results, etc.)
In distributed setting:Processors have very partial picture
Partial topological knowledge
Model of anonymous networks:
• identical nodes
• no ID's
• no topology knowledge
Intermediate models:
• estimates for network diameter, # nodes, etc.
• unique identifiers
• neighbor knowledge
Partial topological knowledge (cont)
Permissive models:Topological knowledge of large regions, or even entire network
Structured models:Known sub-structure, e.g., spanning tree / subgraph / hierarchical partition / routing service available
Other knowledge deficiencies
• know only local portion of the input• do not know who else participates• do not know current stage of other participants
Coping with failures
In centralized setting: Straightforward -Upon abnormal termination or system crash:Locate source of failure, fix it and go on.
In distributed setting: Complication -When one component fails, others continue
Ambitious goal: ensure protocol runs correctly despite occasional failures at some machines (including “confusion-causing failures”, e.g., failed processors sending corrupted messages)
Timing and synchrony
Fully synchronous network:
• All link delays are bounded
• Each processor keeps a local clock
• Local pulses satisfy the following property:
Think of entire system as driven by global clock
A message sent from v to neighbor u at pulse p of v arrives at u before u's pulse p+1
Timing and synchrony
Machine cycle of processors - composed of 3 steps:
1. Send msgs to (some) neighbors
2. Wait to receive msgs from neighbors
3. Perform some local computation
Asynchronous model
Algorithms are event-driven :
• No access to global clock
• Messages sent from processor to neighbor arrive within finite but unpredictable time
Asynchronous model
A local clock can't tell whether a message is coming or not: perhaps "the message is still on its way"
Impossible to rely on ordering of events(might reverse due to different message transmission speeds)
Nondeterminism
Asynchronous computations are inherently
nondeterministic
(even when protocols do not use randomization)
Nondeterminism
Reason: Message arrival order may differ from one execution to another (e.g., due to other events concurrently occurring in the system – queues, failures)
Run same algorithm twice on same inputs - get different outputs / “scenarios”
Nondeterminism
Complexity measures
• Traditional (time, memory)• New (messages, communication)
Time
For synchronous algorithm Π:
Time(Π) = (worst case) # pulses during execution
For asynchronous algorithm Π?
(Even a single message can incur arbitrary delay ! )
Time
For asynchronous algorithm Π:
Time(Π) = (worst-case) # time units from start to end of execution,
assuming each message incurs delay ≤ 1 time unit (*)
Time
Note:
1. Assumption (*) is used only for performance evaluation, not for correctness.
2. (*) does not restrict the set of possible scenarios – any execution can be "normalized" to fit this constraint.
3. "Worst-case" means over all possible inputs and all possible scenarios for each input.
Memory
Mem(Π) = (worst-case) # memory bits used throughout the network
MaxMem(Π) = maximum local memory used by any single processor
Message complexity
Basic message = O(log n) bits
Longer messages cost proportionally to length
Sending basic message over edge costs 1
Message(Π) = (worst case) # basic messages sent during execution
Distance definitions
Length of path (e1,...,es) = s
dist(u,w,G) = length of shortest u - w path in G
Diameter:
Diam(G) = max_{u,v ∈ V} dist(u,v,G)
Distance definitions (cont)
Radius:
Rad(v,G) = max_{w ∈ V} dist(v,w,G)
Rad(G) = min_{v ∈ V} Rad(v,G)
A center of G: a vertex v s.t. Rad(v,G) = Rad(G)
Observe: Rad(G) ≤ Diam(G) ≤ 2·Rad(G)
Broadcast
Goal:Disseminate message M originated at source r0 to all vertices in network
Basic lower bounds
Thm: For every broadcast algorithm B:
• Message(B) ≥ n-1
• Time(B) ≥ Rad(r0,G) = Ω(Diam(G))
Tree broadcast
Algorithm Tcast(r0,T)
• Use spanning tree T of G rooted at r0
• Root broadcasts M to all its children
• Each node v getting M forwards it to its children
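The tree broadcast can be simulated round by round; here is a minimal centralized sketch (the dict-based tree representation and the function name are ours, not part of the model):

```python
# Round-based simulation sketch of Algorithm Tcast (helper names are ours).
# The tree is a dict mapping each node to the list of its children.
def tcast(tree, root):
    """Broadcast M from root; return (#messages, #rounds)."""
    messages, rounds = 0, 0
    frontier = [root]              # nodes that received M in the previous round
    while True:
        nxt = []
        for v in frontier:         # each informed node forwards M to its children
            for c in tree.get(v, []):
                messages += 1
                nxt.append(c)
        if not nxt:
            break
        rounds += 1
        frontier = nxt
    return messages, rounds

# A 4-node tree of depth 2: n-1 = 3 messages, Depth(T) = 2 rounds.
T = {"r0": ["a", "b"], "a": ["c"]}
print(tcast(T, "r0"))  # (3, 2)
```

The counts illustrate the claim below: exactly one message per non-root node, and one round per tree level.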
Tree broadcast (cont)
Assume: Spanning tree known to all nodes
(Q: what does it mean in distributed context?)
Tree broadcast (cont)
Claim: For spanning tree T rooted at r0:
• Message(Tcast) = n-1
• Time(Tcast) = Depth(T)
Tcast on BFS tree
BFS (Breadth-First Search) tree = Shortest-paths tree:
The level of each v in T is dist(r0,v,G)
Tcast (cont)
Corollary:For BFS tree T w.r.t. r0:
• Message(Tcast) = n-1
• Time(Tcast) ≤ Diam(G) (optimal in both)
But what if there is no spanning tree ?
The flooding algorithm
Algorithm Flood(r0)
1. Source sends M on each outgoing link
2. For every other vertex v:
• On receiving M for the first time, over edge e: store it in the buffer; forward it on every edge ≠ e
• On receiving M again (over other edges): discard it and do nothing
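A centralized sketch of Flood in the synchronous model (adjacency-dict representation and names are ours); it also records the tree implicitly defined by first receipts:

```python
from collections import deque

# Centralized sketch of Algorithm Flood in the synchronous model.
# adj: dict node -> list of neighbors (undirected graph).
def flood(adj, r0):
    """Return (parent, level, #messages) of the flood from r0."""
    parent, level = {r0: None}, {r0: 0}
    messages = 0
    q = deque([r0])                   # processed in BFS (= synchronous) order
    while q:
        v = q.popleft()
        for u in adj[v]:
            if u == parent[v]:
                continue              # never forward on the arrival edge
            messages += 1             # one copy of M crosses edge (v,u)
            if u not in parent:       # u gets M for the first time: v is its parent
                parent[u] = v
                level[u] = level[v] + 1
                q.append(u)
    return parent, level, messages

# Triangle 1-2-3 with a tail 3-4: 2|E| - (n-1) = 5 message sends.
adj = {1: [2, 3], 2: [1, 3], 3: [1, 2, 4], 4: [3]}
print(flood(adj, 1))
```

In the synchronous model the `level` values equal dist(r0,v,G), i.e., the parent pointers form a BFS tree, matching the lemma below.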
Flooding - correctness
Lemma:
1. Alg. Flood yields a correct broadcast
2. Time(Flood) = Θ(Rad(r0,G)) = Θ(Diam(G))
3. Message(Flood) = Θ(|E|) in both the synchronous and asynchronous models
Proof: Message complexity: each edge delivers M at most once in each direction.
Neighborhoods
Γρ(v) = ρ-neighborhood of v = vertices at distance ρ or less from v
(Γ0(v) = {v}, then Γ1(v), Γ2(v), …)
Time complexity
Verify (by induction on t) that:
After t time units, M has already reached every vertex at distance ≤ t from r0
(= every vertex in the t-neighborhood Γt(r0))
Note: In asynchronous model, M may have reached additional vertices(messages may travel faster)
Time complexity
Note: Algorithm Flood implicitly constructs directed spanning tree T rooted at r0,
defined as follows:
The parent of each v in T is the node from which v received M for the first time
Lemma: In the synchronous model,T is a BFS tree w.r.t. r0, with depth Rad(r0,G)
Flood time
Note: In the asynchronous model, T may be deeper (up to depth n-1)
Note: Time is still O(Diam(G)) even in this case!
Broadcast with echo
Goal: Verify successful completion of broadcast
Method: Collect acknowledgements on a spanning tree T
Broadcast with echo
Converge(Ack) process - code for v
Upon getting M do:
• For v a leaf in T: send up an Ack message to the parent
• For v a non-leaf: collect Ack messages from all children, then send an Ack message to the parent
Collecting Ack’s
Semantics of Ack from v: a "joint ack" for the entire subtree Tv rooted at v, signifying that each vertex in Tv received M
⇒ r0 receives Ack from all its children only after all vertices received M
Claim: On tree T,
• Message(Converge(Ack)) = O(n)
• Time(Converge(Ack)) = O(Depth(T))
Tree selection
Tree broadcast alg: Take same tree used for broadcast.Time / message complexities grow by const factor.
Flooding alg: Use the tree T defined by the broadcast
Synch. model: BFS tree - complexities double
Asynch. model: no guarantee
Tree selection - complexity
Lemma: In network G(V,E) of diameter D, complexities of “broadcast with echo” are:
• Message(FloodEcho) = O(|E|)
• Time(FloodEcho) = O(D) in the synchronous model, O(n) in the asynchronous model
• In both models, M reaches all vertices by time D
BFS tree constructions
In synchronous model:
Algorithm Flood generates a BFS tree at optimal cost:
• Message(Flood) = Θ(|E|)
• Time(Flood) = Θ(Diam(G))
In asynchronous model:Tree generated by Algorithm Flood is not BFS
Level-synchronized BFS construction (Dijkstra)
Idea:
• Develop the BFS tree from the root r0 in phases, level by level
• Build the next level by adding all vertices adjacent to nodes in the lowest tree level
After p phases: a partial tree Tp has been constructed
• The tree Tp is a BFS tree for Γp(r0)
• Each v in Tp knows its parent, children, depth
Level-synchronized BFS (Dijkstra)
Phase p+1:
1. r0 broadcasts message Pulse on Tp
2. Each leaf of Tp sends an "exploration" message Layer to all neighbors except its parent.
Level-synchronized BFS (Dijkstra)
3. Vertex w receiving Layer message for the first time (possibly from many neighbors) picks one neighbor v, lists it as parent, sends back Ack messages to all Layer messages
A vertex w already in Tp receiving a Layer message sends back Ack messages to all Layer messages
Level-synchronized BFS (Dijkstra)
4. Each leaf v collects Acks on its exploration msgs. If w chose v as parent, v lists w as a child.
5. Once it has received Acks on all its Layer messages, leaf v Acks its parent. The Acks are convergecast on Tp back to r0.
6. Once the convergecast terminates, r0 starts the next phase.
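The phase structure above can be sketched centrally (names ours): each phase attaches one BFS layer, and phase p costs about 2p+2 time for the Pulse broadcast and Ack convergecast on Tp.

```python
# Centralized sketch of the level-synchronized (Dijkstra-style) BFS construction.
def layered_bfs(adj, r0):
    """Grow the BFS tree layer by layer; return (parent, total_time)."""
    parent = {r0: None}
    layer, time, p = [r0], 0, 0
    while layer:
        p += 1
        time += 2 * p + 2        # Pulse down Tp, explore, convergecast the Acks
        nxt = []
        for v in layer:          # current leaves send Layer to their neighbors
            for w in adj[v]:
                if w not in parent:
                    parent[w] = v          # w picks one exploring neighbor
                    nxt.append(w)
        layer = nxt              # the newly attached vertices form the next layer
    return parent, time
```

Summing 2p+2 over the Rad(r0,G) phases gives the O(Diam²(G)) time bound derived in the analysis.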
Analysis
Correctness: By induction on p, show:
• After phase p, the variables parent and child define a legal BFS tree spanning r0's p-neighborhood
⇒ the algorithm constructs a BFS tree rooted at r0.
Analysis (cont)
Time complexity:
Time(Phase p) = 2p+2
Time = ∑p (2p+2) = O(Diam²(G))
Analysis (cont)
Message complexity: For integer p ≥ 0 let
Vp = vertices in layer p
Ep = edges internal to Vp
Ep,p+1 = edges between Vp and Vp+1
Analysis (cont)
Phase p: Layer msgs of phase p are sent only on Ep and Ep,p+1
• Only O(1) messages sent over each such edge
• Tp edges are traversed twice (≤ 2n messages)
Analysis (cont)
Comm(Phase p) = O(n) + O(|Ep| + |Ep,p+1|)
In total: Comm = ∑p O(n + |Ep| + |Ep,p+1|) = O(n·Diam(G) + |E|)
Complexities of BFS algorithms
Reference       Messages          Time
Lower bound     |E|               D    (sync. model)
Dijkstra        |E| + nD          D²
Bellman-Ford    n|E|              D
Best known      |E| + n log³n     D log³n
Synchronizers
Goal: Transform an algorithm for synchronous networks into an algorithm for asynchronous networks.
Motivation: Algorithms for the synchronous model are easier to design / debug / test than ones for the asynchronous model
(Behavior of an asynchronous system is harder to analyze)
Synchronizers
Synchronizer: a methodology for such simulation:
Given an algorithm S for a synchronous network, and a synchronizer ν, combine them to yield a protocol A = ν(S) executable on an asynchronous network
Correctness requirement: A's execution on the asynchronous network is "similar" to S's execution on the synchronous one
Underlying simulation principles
Combined protocol A is composed of two parts:
• the original component
• the synchronization component
(each with its own local variables and msg types)
Pulse generator: Processor v has a pulse variable Pv, generating a sequence of local clock pulses, i.e., periodically increasing Pv = 0,1,2,...
Underlying simulation principles
Under protocol A, each v performs during time interval when Pv=p precisely the actions it should perform during round p of the synchronous algorithm S
Def: t(v,p) = global time when v increased its pulse to p.
We say that "v is at pulse Pv=p" during the time interval
δ(v,p) = [t(v,p), t(v,p+1))
Underlying simulation principles
Pulse compatibility:
If processor v sends original message M to neighbor w during its pulse Pv=p
then w receives M during its pulse Pw=p
Correct simulations
Synchronous protocol S
Simulating protocol A = ν(S)
Execution S = S(G,I) of S in synch' network
Execution A = A(G,I) of A in asynch' network
(same topology G, same input I)
Correct simulations (cont)
Similar executions: Executions A and S are similar if for every v, for every neighbor w, for every original local variable X at v, and for every integer p ≥ 0:
1. X value at beginning of pulse p in A = X value at beginning of round p in S
Correct simulations (cont)
2. Original messages sent by v to w during pulse p in execution A - same as those sent by v to w during round p in execution S
3. Original messages received by v from w during pulse p in A -same as those received by v from w during round p in S
4. Final output of v in A – same as in S
Correct simulations (cont)
Correct simulation: Asynchronous protocol A simulates synchronous protocol S if for every network topology and initial input, the executions of A and S are similar.
Synchronizer ν is correct if for every synchronous protocol S, protocol A = ν(S) simulates S.
Correct simulations (cont)
Lemma: If synchronizer ν guarantees pulse compatibility, then it is correct
Goal: Impose pulse compatibility
Correct simulations (cont)
Fundamental question:
When is it permissible for a processor to increase its pulse number?
Correct simulations (cont)
First answer: Increase the pulse from p to p+1 once it is certain that original messages of algorithm S sent by neighbors during their pulse p will no longer arrive
Question: How can that be ensured?
Correct simulations (cont)
Readiness property: Processor v is ready for pulse p, denoted Ready(v,p), once it has already received all algorithm messages sent to it by its neighbors during their pulse p-1.
Readiness rule: Processor v may generate pulse p once it has finished its original actions for pulse p-1, and Ready(v,p) holds.
Correct simulations (cont)
Problem: Obeying the readiness rule does not impose pulse compatibility
(Bad scenario: v is ready for pulse p, generates pulse p, and sends a msg of pulse p to neighbor w, yet w is still "stuck" at pulse p-1, waiting for msgs of pulse p-1 from some other neighbor z)
Correct simulations (cont)
Fix: Delay messages that arrived too early
Delay rule: On receiving in pulse p-1 a msg sent from w at its pulse p, temporarily store it;
process it only after generating pulse p
Correct simulations (cont)
Lemma: A synchronizer imposing both readiness and delay rules guarantees pulse compatibility
Corollary: If synchronizer imposes the readiness and delay rules, then it is correct
Implementation phases
Problem: To satisfy Ready(v,p), v must ensure that it has already received all algorithm messages sent to it by its neighbors in pulse p-1
But if w did not send any message to v in pulse p-1, then v must wait forever
(link delays in an asynchronous network are unpredictable...)
Implementation phases
Conceptual solution: Employ two communication phases
Phase A:
1. Each processor sends its original messages
2. A processor receiving a message from a neighbor sends back an Ack
⇒ Each processor learns (within finite time) that all messages it sent during pulse p have arrived
Implementation phases
Safety property: Node v is safe w.r.t. pulse p, denoted Safe(v,p), if all messages it sent during pulse p have already arrived.
Fact: If each neighbor w of v satisfies Safe(w,p), then v satisfies Ready(v,p+1)
⇒ A node may generate a new pulse once it learns that all its neighbors are safe w.r.t. the current pulse.
Implementation phases
Phase B:
Apply a procedure to let each processor know when all its neighbors are safe w.r.t. pulse p
Synchronizer constructions:
• based on the 2-phase strategy
• all use the same Phase A procedure
• but different Phase B procedures
Synchronizer complexity
Initialization costs: Tinit(ν) and Cinit(ν) = time and message costs of the initialization procedure setting up synchronizer ν
Pulse overheads: Cpulse(ν) = cost of the synchronization messages sent by all vertices during their pulse p
Tpulse(ν) = ?
Synchronizer complexity
Tpulse(ν) = ?
(The time periods during which different nodes are at pulse p need not be aligned)
Synchronizer complexity
Let tmax(p) = max_{v ∈ V} t(v,p)
(the time when the slowest processor reached pulse p)
Tpulse(ν) = max_{p ≥ 0} {tmax(p+1) - tmax(p)}
Synchronizer complexity
Lemma: For synchronous algorithm S and asynchronous A = ν(S),
• Comm(A) = Cinit(ν) + Comm(S) + Time(S) · Cpulse(ν)
• Time(A) = Tinit(ν) + Time(S) · Tpulse(ν)
Basic synchronizer α
Phase B of synchronizer α: direct.
After executing pulse p, when processor v learns it is safe, it reports this fact to all neighbors.
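This Phase B can be sketched as a per-node state machine (the event-handler names and ack-counting framing are ours, not the original notation): once all of v's pulse-p messages are acknowledged, v reports Safe to every neighbor; once all neighbors have reported Safe, v generates the next pulse.

```python
# Sketch of one node running synchronizer alpha's phase B (handler names ours).
class AlphaNode:
    def __init__(self, neighbors):
        self.neighbors = set(neighbors)
        self.pulse = 0
        self.pending_acks = 0        # pulse-p messages of v not yet acknowledged
        self.safe_from = set()       # neighbors known to be safe w.r.t. pulse p

    def on_ack(self):
        """An Ack for one of v's original messages arrived."""
        self.pending_acks -= 1
        if self.pending_acks == 0:   # Safe(v,p): all of v's messages arrived
            return [("SAFE", w) for w in self.neighbors]
        return []

    def on_safe(self, w):
        """Neighbor w reported it is safe w.r.t. the current pulse."""
        self.safe_from.add(w)
        if self.safe_from == self.neighbors:   # ready: generate the next pulse
            self.pulse += 1
            self.safe_from = set()
```

Per pulse, each node sends one SAFE message per incident edge, which is the source of the Cpulse(α) = O(|E|), Tpulse(α) = O(1) overheads claimed below.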
Claim: Synchronizer α is correct.
Basic synchronizer α
Claim:
• Cinit(α) = O(|E|)
• Tinit(α) = O(Diam)
• Cpulse(α) = O(|E|)
• Tpulse(α) = O(1)
Note: Synchronizer α is optimal for trees, planar graphs and bounded-degree networks (mesh, butterfly, cube-connected cycles, ring, …)
Basic synchronizer β
Assume: a rooted spanning tree T in G
Phase B of synchronizer β: a convergecast process on T
Basic synchronizer β
• When processor v learns that all its descendants in T are safe, it reports this fact to its parent.
• When r0 learns that all processors in G are safe, it broadcasts this fact along the tree.
Stage 1: The convergecast ends ⇒ all nodes are safe
Stage 2: r0 broadcasts this fact along the tree
Basic synchronizer β
Claim: Synchronizer β is correct.
Claim:
• Cinit(β) = O(n|E|)
• Tinit(β) = O(Diam)
• Cpulse(β) = O(n)
• Tpulse(β) = O(Diam)
Note: Synchronizer β is optimal for bounded-diameter networks.
Understanding the effects of locality
Model:
• synchronous
• simultaneous wakeup
• large messages allowed
Goal: Focus on the limitations stemming from locality of knowledge
Symmetry breaking algorithms
Vertex coloring problem: Associate a color φv with each v in V, s.t. any two adjacent vertices have different colors
Naive solution: Use unique vertex ID's ⇒ a legal coloring by n colors
Goal: obtain coloring with few colors
Symmetry breaking algorithms
Basic palette reduction procedure: Given a legal coloring by m colors, reduce the # of colors
Δ = Δ(G) = max vertex degree in G
Reduction idea: v's neighbors occupy at most Δ distinct colors
⇒ Δ+1 colors always suffice to find a "free" color
Symmetry breaking algorithms
First Free coloring (for a set of colors Φ and a node set W ⊆ V):
FirstFree(W,Φ) = min color in Φ that is currently not used by any vertex in W
Standard palette: Φm = {1,...,m}, for m ≥ 1
Sequential color reduction
For every node v do (sequentially):
φv ← FirstFree(Γ(v), ΦΔ+1)
/* Pick a new color 1 ≤ j ≤ Δ+1, different from those used by the neighboring nodes */
Procedure Reduce(m) - code for v
Palette: Φ3 = {1,2,3}
Procedure Reduce(m) - parallelization
Code for v:
For round j = Δ+2 to m do:
/* all nodes colored j re-color themselves simultaneously */
• If v's original color is φv = j then do:
1. Set φv ← FirstFree(Γ(v), ΦΔ+1)
/* pick a new color 1 ≤ j' ≤ Δ+1, different from those used by the neighbors */
2. Inform all neighbors
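A centralized sketch of Reduce (function name and dict representation are ours); the sequential inner loop is equivalent to the simultaneous recoloring, since all nodes of color class j are mutually nonadjacent:

```python
# Sketch of Procedure Reduce(m): cancel color classes Delta+2..m, one round each.
def reduce_palette(adj, color, m):
    """Given a legal m-coloring `color` of graph `adj`, return a legal
    coloring with at most Delta+1 colors."""
    delta = max(len(adj[v]) for v in adj)            # Delta = max degree
    for j in range(delta + 2, m + 1):                # round j: class j recolors
        for v in [u for u in adj if color[u] == j]:  # mutually nonadjacent
            used = {color[w] for w in adj[v]}
            color[v] = min(c for c in range(1, delta + 2) if c not in used)
    return color

# Path 1-2-3-4 (Delta = 2) colored with 4 colors -> reduced to <= 3 colors.
adj = {1: [2], 2: [1, 3], 3: [2, 4], 4: [3]}
print(reduce_palette(adj, {1: 1, 2: 2, 3: 3, 4: 4}, 4))  # {1: 1, 2: 2, 3: 3, 4: 1}
```

A free color always exists because v has at most Δ neighbors while the palette has Δ+1 colors.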
Procedure Reduce(m) - code for v
Lemma:
• Procedure Reduce produces a legal coloring of G with Δ+1 colors
• Time(Reduce(m)) = m-Δ-1
Proof:Time bound: Each iteration requires one time unit.
Procedure Reduce(m) - correctness
Correctness: Consider iteration j.
• When a node v re-colors itself, it always finds a non-conflicting color (≤ Δ neighbors, and a Δ+1 color palette)
• No conflict with nodes recolored in earlier iterations (or originally colored 1, 2, …, Δ+1)
• No conflict with the choices of other nodes in iteration j (they are all mutually nonadjacent, by legality of the original coloring)
⇒ the new coloring is legal
3-coloring trees
Goal: color a tree T with 3 colors in time O(log*n)
Recall:
log(1)n = log n
log(i+1)n = log(log(i)n)
log*n = min { i | log(i)n ≤ 2 }
General idea: • Look at colors as bit strings. • Attempt to reduce # bits used for colors.
3-coloring trees
|φv| = # bits in φv
φv[i] = the i-th bit in the bit string representing φv
Specific idea: Produce a new color from the old φv:
1. find an index 0 ≤ i < |φv| in which v's color differs from its parent's (the root picks, say, index 0)
2. set the new color to ⟨i, φv[i]⟩
/* the index i concatenated with the bit φv[i] */
3-coloring trees
We will show:
a. neighbors have different new colors
b. the length of the new coloring is roughly logarithmic in that of the previous coloring
3-coloring trees (cont)
Algorithm SixColor(T) - code for v
Set φv ← ID(v) /* initial coloring */
Repeat:
• Set L ← |φv|
• If v is the root then set I ← 0
  else set I ← min{ i | φv[i] ≠ φparent(v)[i] }
• Set φv ← ⟨I, φv[I]⟩
• Inform all children of this choice
until |φv| = L /* i.e., the color length stopped shrinking */
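One iteration of the color-shrinking step can be sketched as follows (names ours; for concreteness the pair ⟨I, bit⟩ is encoded as the integer 2I + bit):

```python
# One iteration of Algorithm SixColor (colors held as integers; names ours).
def sixcolor_step(parent, color, root):
    """Each node picks an index I where its color differs from its parent's
    (the root picks I = 0) and takes the new color <I, bit I of its color>,
    encoded here as the integer 2*I + bit."""
    new = {}
    for v in color:
        if v == root:
            i = 0
        else:
            diff = color[v] ^ color[parent[v]]      # nonzero, coloring is legal
            i = (diff & -diff).bit_length() - 1     # lowest differing bit index
        bit = (color[v] >> i) & 1
        new[v] = 2 * i + bit
    return new
```

If a child and its parent pick different indices, the first components of their new colors differ; if they pick the same index, the chosen bits differ. Either way the new coloring is again legal.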
3-coloring trees (cont)
Lemma: In each iteration, Procedure SixColor produces a legal coloring
Proof: Consider an iteration, and neighboring nodes v,w ∈ T with v = parent(w).
I = index picked by v; J = index picked by w
3-coloring trees (cont)
If I≠ J: new colors of v and w differ in 1st component
If I=J: new colors differ in 2nd component
3-coloring trees (cont)
Ki = # bits in the color representation after the i-th iteration
(K0 = K = O(log n) = # bits in the original ID coloring)
Note: Ki+1 = ⌈log Ki⌉ + 1
⇒ the 2nd coloring uses about log(2)n bits, the 3rd about log(3)n, etc.
3-coloring trees (cont)
Lemma: The final coloring uses six colors
Proof: The final iteration i satisfies Ki = Ki-1 ≤ 3
In the final coloring, there are ≤ 3 choices for the index to the bit in the (i-1)st coloring, and two choices for the value of the bit
⇒ a total of six possible colors
Reducing from 6 to 3 colors
Shift-down operation: Given a legal coloring of T:
1. re-color each non-root vertex by the color of its parent
2. re-color the root by a new color (different from its current one)
Reducing from 6 to 3 colors
Claim:
1. The shift-down step preserves coloring legality
2. In the new coloring, siblings are monochromatic
Reducing from 6 to 3 colors
Cancelling color x, for x ∈ {4,5,6}:
1. Perform the shift-down operation on the current coloring
2. All nodes colored x apply FirstFree(Γ(v), Φ3)
/* choose a new color from among {1,2,3} not used by any neighbor */
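A centralized sketch of the shift-down and cancel steps (names ours); after the shift, siblings are monochromatic and color class x is independent, so all class-x nodes can recolor in a single round:

```python
# Sketch of the 6 -> 3 color reduction on a rooted tree (names ours).
def shift_down(parent, color, root, fresh):
    """Each non-root takes its parent's color; the root takes `fresh`."""
    return {v: (fresh if v == root else color[parent[v]]) for v in color}

def cancel_color(parent, adj, color, root, x):
    """Cancel color x in {4,5,6}: shift down, then class x picks from {1,2,3}."""
    fresh = min(c for c in (1, 2, 3) if c != color[root])
    color = shift_down(parent, color, root, fresh)
    for v in [u for u in color if color[u] == x]:   # class x is independent
        used = {color[w] for w in adj[v]}           # parent + monochromatic kids
        color[v] = min(c for c in (1, 2, 3) if c not in used)
    return color
```

After the shift, each node sees at most two distinct neighbor colors (its parent's and its children's shared one), so a color in {1,2,3} is always free. Applying `cancel_color` for x = 4, 5, 6 in turn leaves a legal 3-coloring.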
Reducing from 6 to 3 colors
Claim: Rule for cancelling color x produces legal coloring
Example: cancelling color 4
Overall 3 coloring process
Thm: There is a deterministic distributed algorithm for 3-coloring trees in time O(log*n)
1. Invoke Algorithm SixColor(T) (O(log*n) time)
2. Cancel colors 6, 5, 4 (O(1) time)
Δ+1-coloring for arbitrary graphs
Goal: Color G of max degree Δ with Δ+1 colors in O(Δ log n) time
Node ID's in G = K-bit strings
Idea: a recursive procedure ReColor(x), where x = a binary string of ≤ K bits.
Ux = { v | ID(v) has suffix x } (|Ux| ≤ 2^(K-|x|))
The procedure is applied to Ux, and returns with a coloring of the Ux vertices with Δ+1 colors.
+1-coloring for arbitrary graphs
Procedure ReColor(x) - intuition
If |x| = K (Ux has ≤ one node) then return color 0.
Otherwise:
1. Separate Ux into the two sets U0x and U1x
2. Recursively compute a Δ+1 coloring for each, invoking ReColor(0x) and ReColor(1x)
3. Remove conflicts between the two colorings by altering the colors of the U1x vertices, color by color, as in Procedure Reduce.
ReColor – distributed implementation
Procedure ReColor(x) – code for v ∈ Ux
/* ID(v) = a1a2...aK , x = aK-|x|+1...aK */
• Set ℓ ← |x|
• If ℓ = K /* singleton Ux = {v} */ then set φv ← 0 and return
• Set b ← aK-ℓ /* v ∈ Ubx */
• φv ← ReColor(bx)
Procedure ReColor(x) - code for v (cont)
/* Reconciling the colorings on U0x and U1x */
• If b = 1 then do:
  • For round i = 1 through Δ+1 do:
    • If φv = i then do:
      • φv ← FirstFree(Γ(v), ΦΔ+1)
        /* pick a new color 1 ≤ j ≤ Δ+1, different from those used by any neighbor */
      • Inform all neighbors of this choice
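A centralized sketch of the whole recursion (names ours; colors run 0..Δ here): recurse on the last unread ID bit, then reconcile U1x against U0x color class by color.

```python
# Centralized sketch of Procedure ReColor (names ours; colors are 0..Delta).
def recolor(adj, ids, K, x=""):
    """Color U_x = {v : ID(v) has suffix x} with Delta+1 colors."""
    U = [v for v in adj if ids[v].endswith(x)]
    if not U:
        return {}
    if len(x) == K:                       # singleton: IDs are unique K-bit strings
        return {U[0]: 0}
    delta = max(len(adj[v]) for v in adj)
    c0 = recolor(adj, ids, K, "0" + x)    # legal on G(U_0x), never changed later
    c1 = recolor(adj, ids, K, "1" + x)    # legal on G(U_1x), to be reconciled
    color = {**c0, **c1}
    for i in range(delta + 1):            # round i: class i of U_1x recolors
        for v in [u for u in c1 if c1[u] == i]:
            used = {color[w] for w in adj[v] if w in color}
            color[v] = min(c for c in range(delta + 1) if c not in used)
    return color

# Triangle with 2-bit IDs (Delta = 2): a legal 3-coloring.
adj = {"A": ["B", "C"], "B": ["A", "C"], "C": ["A", "B"]}
print(recolor(adj, {"A": "00", "B": "01", "C": "10"}, 2))
```

Each node recolors exactly once, in the round of its pre-reconciliation color; nodes sharing that color are nonadjacent within U1x, so the simultaneous (here, sequentially simulated) choices never conflict.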
Analysis
Lemma: For λ = the empty word:
• Procedure ReColor(λ) produces a legal coloring of G with Δ+1 colors
• Time(ReColor(λ)) = O(Δ log n)
Analysis
Proof:
Sub-claim: ReColor(x) yields a legal Δ+1-coloring for the vertices of the subgraph G(Ux) induced by Ux
Proof: By induction on the length of the parameter x.
Base (|x| = K): immediate.
General case: Consider a run of ReColor(x).
Note: The coloring assigned to U0x is legal (by the Ind. Hyp.), and does not change later.
Analysis (cont)
Consider v in U1x recoloring itself in some iteration i via the FirstFree operation.
Note: v always finds a non-conflicting color:
• No conflict with nodes of U1x recolored in earlier iterations, or with nodes of U0x
• No conflict with other nodes that recolor in iteration i (mutually non-adjacent, by legality of the coloring generated by ReColor(1x) on the set U1x)
⇒ the new coloring is legal
Analysis (cont)
Time bound: Each of the K = O(log n) recursion levels requires Δ+1 time units
⇒ O(Δ log n) time
Lower bound for 3-coloring the ring
Lower bound: Any deterministic distributed algorithm for 3-coloring n-node rings requires at least (log*n-1)/2 time.
Applies in strong model: After t time units, v knows everything known to anyone in its t-neighborhood.
In particular, given no inputs but vertex ID's: after t steps, node v learns the topology of its t-neighborhood Γt(v) (including ID's)
Lower bound for 3-coloring the ring
On a ring, v has learned a (2t+1)-tuple (x1,...,x2t+1) from the space W2t+1,n, where
Ws,n = {(x1,...,xs) | 1 ≤ xi ≤ n, xi ≠ xj for i ≠ j},
• xt+1 = ID(v),
• xt and xt+2 = the ID's of v's two neighbors,
• etc.
Coloring lower bound (cont)
W.l.o.g., any deterministic t-step algorithm At for coloring a ring with cmax colors follows a 2-phase policy:
• Phase 1: For t rounds, exchange topology info. At the end, each v holds a tuple τ(v) ∈ W2t+1,n
• Phase 2: Select φv ← ψA(τ(v)), where
ψA : W2t+1,n → {1,...,cmax}
is the coloring function of algorithm At
Coloring lower bound (cont)
Define a graph Bs,n = (Ws,n, Es,n), where Es,n contains all edges of the form
((x1,x2,...,xs), (x2,...,xs,xs+1))
satisfying x1 ≠ xs+1
Coloring lower bound (cont)
Note: Two s-tuples of Ws,n, (x1,x2,...,xs) and (x2,...,xs,xs+1), are connected in Bs,n
⇔ they may occur as the tuples corresponding to two neighboring nodes in some ID assignment for the ring.
Coloring lower bound (cont)
Lemma: If Algorithm At produces a legal coloring for every n-node ring, then the function ψA defines a legal coloring for the graph B2t+1,n
Proof: Suppose ψA is not a legal coloring of B2t+1,n, i.e., there exist two neighboring vertices α = (x1,x2,...,x2t+1) and β = (x2,...,x2t+1,x2t+2) in B2t+1,n s.t.
ψA(α) = ψA(β)
Coloring lower bound (cont)
Consider an n-node ring whose ID assignment contains the consecutive segment x1,x2,...,x2t+2.
Then algorithm At colors the neighboring nodes v and w by the colors ψA(α) and ψA(β) respectively.
These colors are identical, so the ring coloring is illegal; contradiction.
Coloring lower bound (cont)
Corollary: If the n-vertex ring can be colored in t rounds using cmax colors, then χ(B2t+1,n) ≤ cmax
Thm: Any deterministic distributed algorithm for coloring the (2n)-vertex ring with two colors requires at least n-1 rounds.
Coloring lower bound (cont)
Proof: By the Corollary, if there is a 2-coloring algorithm working in t time units, then χ(B2t+1,2n) ≤ 2 (i.e., B2t+1,2n is 2-colorable), hence B2t+1,2n is bipartite.
But for t ≤ n-2 this leads to a contradiction, since B2t+1,2n contains an odd-length cycle, hence it is not bipartite.
Coloring lower bound (cont)
The odd cycle (length 2t+3):
(1,2,…,2t+1)
(2,3,…,2t+2)
(3,4,…,2t+3)
(4,…,2t+3,1)
(5,…,2t+3,1,2)
…
(2t+3,1,2,…,2t)
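The cycle can be checked mechanically (a sketch; the helper name is ours): the 2t+3 cyclic shifts of (1,…,2t+1) over the value set {1,…,2t+3} pairwise overlap in 2t entries, with the dropped and appended values differing, i.e., each consecutive pair is an edge of B2t+1,2n whenever 2t+3 ≤ 2n.

```python
# Verify that the 2t+3 cyclic shifts form an odd cycle in B_{2t+1,2n}.
def odd_cycle(t):
    s, m = 2 * t + 1, 2 * t + 3          # tuple length s, cycle length m (odd)
    cyc = [tuple((i + j) % m + 1 for j in range(s)) for i in range(m)]
    for a, b in zip(cyc, cyc[1:] + cyc[:1]):
        assert a[1:] == b[:-1]           # b drops a's first entry, appends one
        assert a[0] != b[-1]             # ... and the appended entry differs
    return len(cyc)

print(odd_cycle(2))  # 7: tuples of length 5 forming a 7-cycle
```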
Coloring lower bound (cont)
Coloring lower bound (cont)
Def: a family of directed graphs Ds,n = (Us,n, Fs,n), where
Us,n = {(x1,...,xs) | 1 ≤ x1 < ... < xs ≤ n },
Fs,n = all (directed) arcs from (x1,x2,...,xs) to (x2,...,xs,xs+1)
Returning to 3-coloring, we prove the following:
Lemma: χ(B2t+1,n) ≥ log(2t)n
Coloring lower bound (cont)
Claim: χ(Ds,n) ≤ χ(Bs,n)
Proof: The undirected version of Ds,n is a subgraph of Bs,n
⇒ To prove the lemma, i.e., to bound χ(B2t+1,n), it suffices to show that χ(D2t+1,n) ≥ log(2t)n
Coloring lower bound (cont)
Recursive representation for the directed graphs Ds,n: based on directed line graphs.
Def: For a directed graph H = (U,F), the line graph of H, L(H), is a directed graph with
V(L(H)) = F;
E(L(H)) contains an arc ⟨e,e'⟩ (for e,e' ∈ F) iff in H, e' starts at the vertex in which e ends
Coloring lower bound (cont)
Lemma:
1. D1,n = the complete directed graph on n nodes (every two vertices joined by an arc)
2. Ds+1,n = L(Ds,n)
Proof: Claim 1 is immediate from the definition.
Coloring lower bound (cont)
Claim 2: Establish an isomorphism between Ds+1,n and L(Ds,n) as follows. Consider
e = ⟨(x1,...,xs), (x2,...,xs+1)⟩,
an arc of Ds,n = a node of L(Ds,n).
Map e to the node (x1,...,xs,xs+1) of Ds+1,n.
It is straightforward to verify that this mapping preserves the adjacency relation.
Coloring lower bound (cont)
Lemma: For every directed graph H, χ(L(H)) ≥ log χ(H)
Proof: Let k = χ(L(H)), and consider a k-coloring ψ of L(H).
ψ = an edge coloring for H s.t. if e' starts at the vertex in which e ends, then ψ(e') ≠ ψ(e).
The coloring ψ can be used to create a 2^k-coloring φ for H, by setting the color of node v to be the set φv = { ψ(e) | e ends in v }
Coloring lower bound (cont)
Note: φ uses ≤ 2^k colors, and φ is legal
(for an arc e = (u,v): ψ(e) ∈ φv but ψ(e) ∉ φu, since any arc e' ending at u with ψ(e') = ψ(e) would conflict with e)
⇒ χ(H) ≤ 2^k, proving the lemma.
Coloring lower bound (cont)
Corollary: χ(Ds,n) ≥ log(s-1)n
Proof: Immediate from the last two lemmas:
(1) D1,n = the complete directed n-node graph, and Ds+1,n = L(Ds,n)
(2) χ(L(H)) ≥ log χ(H)
Corollary: χ(D2t+1,n) ≥ log(2t)n
Corollary: χ(B2t+1,n) ≥ log(2t)n
Coloring lower bound (cont)
Thm: Any deterministic distributed algorithm for coloring n-vertex rings with 3 colors requires time t ≥ (log*n-1)/2
Proof: If At is such an algorithm and it requires t rounds, then log(2t)n ≤ χ(B2t+1,n) ≤ 3,
⇒ log(2t+1)n ≤ 2
⇒ 2t+1 ≥ log*n
Distributed Maximal Independent Set
Goal: Select MIS in graph G
Independent set: U ⊆ V s.t.
u,w ∈ U ⇒ u,w are non-adjacent
Maximal IS: Adding any vertex violates independence
Distributed Maximal Independent Set
Note: Maximal IS ≠ Maximum IS
(Examples: a maximal IS; a non-maximal, non-maximum IS; a maximum IS)
Distributed Maximal Independent Set
Sequential greedy MIS construction
Set U ← V, M ← ∅
While U ≠ ∅ do:
• Pick an arbitrary v in U
• Set U ← U - Γ(v)
• Set M ← M ∪ {v}
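The greedy construction, sketched (names ours; Γ(v) here includes v itself):

```python
# Sketch of the sequential greedy MIS construction (Gamma(v) includes v).
def greedy_mis(adj):
    U, M = set(adj), set()
    while U:
        v = min(U)                  # an arbitrary choice; min for determinism
        M.add(v)
        U -= {v} | set(adj[v])      # remove Gamma(v): v and its neighbors
    return M

# Path 1-2-3-4: picking 1 removes {1,2}, picking 3 removes {3,4}.
adj = {1: [2], 2: [1, 3], 3: [2, 4], 4: [3]}
print(greedy_mis(adj))  # {1, 3}
```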
Distributed Maximal Independent Set
Note:
1. M is independent throughout the process
2. Once U is exhausted, M forms an MIS
Complexity: O(|E|) time
Distributed implementation
Distributedly marking an MIS: set a local boolean variable bv at each v:
bv = 1 ⇔ v ∈ MIS
bv = 0 ⇔ v ∉ MIS
Distributed implementation
Algorithm MIS-DFS
• A single token traverses G in depth-first order, marking vertices as in / out of the MIS.
• On reaching an unmarked vertex:
1. add it to the MIS (by setting its variable to 1),
2. mark its neighbors as excluded from the MIS
Complexity:
• Message(MIS-DFS) = O(|E|)
• Time(MIS-DFS) = O(n)
Lexicographically smallest MIS
LexMIS: the lexicographically smallest MIS over V = {1,…,n}
Note: It is possible to construct LexMIS by a simple sequential (non-distributed) procedure: go over the node list 1,2,…:
- add v to the MIS,
- erase its neighbors from the list
Example: {1,3,5,9} < {1,3,7,9}
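The sequential procedure, sketched (name ours): scan 1,2,…,n and keep v unless a smaller kept node is its neighbor; the result is exactly LexMIS.

```python
# Sketch of the sequential LexMIS procedure over V = {1,...,n}.
def lex_mis(adj):
    M = set()
    for v in sorted(adj):            # go over the node list 1,2,...
        if not any(w in M for w in adj[v]):
            M.add(v)                 # v survives: no smaller neighbor was kept
    return M

# 5-cycle 1-2-3-4-5-1: LexMIS = {1, 3}.
adj = {1: [2, 5], 2: [1, 3], 3: [2, 4], 4: [3, 5], 5: [4, 1]}
print(lex_mis(adj))  # {1, 3}
```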
Distributed LexMIS computation
Algorithm MIS-Rank - code for v
• Invoke Procedure Join
• On getting msg Decided(1) from a neighbor w do:
- Set bv ← 0
- Send Decided(0) to all neighbors
• On getting msg Decided(0) from a neighbor w do:
- Invoke Procedure Join
Distributed LexMIS computation
Procedure Join – code for v
• If every neighbor w of v with a larger ID has decided bw = 0, then do:
- Set bv ← 1
- Send Decided(1) to all neighbors
Complexity – Distributed LexMIS
Claim:• Message(MIS-Rank)=O(|E|)• Time(MIS-Rank)=O(n)
Note: Worst case complexities no better than naive sequential procedure
Reducing coloring to MIS
Procedure ColorToMIS(m) - code for v
For round i = 1 through m do:
- If v's original color is φv = i then do:
• If none of v's neighbors has joined the MIS yet, then decide 1 (join the MIS) and inform all neighbors
• Else decide 0
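A round-by-round sketch (names ours); since the coloring is legal, each round's candidates are mutually nonadjacent, so their simultaneous decisions never conflict:

```python
# Sketch of Procedure ColorToMIS(m): color class i decides in round i.
def color_to_mis(adj, color, m):
    in_mis = {v: False for v in adj}
    for i in range(1, m + 1):
        for v in [u for u in adj if color[u] == i]:   # mutually nonadjacent
            if not any(in_mis[w] for w in adj[v]):
                in_mis[v] = True                      # join the MIS
    return {v for v in adj if in_mis[v]}

# Path 1-2-3 colored (1,2,1): round 1 admits {1,3}, round 2 rejects 2.
adj = {1: [2], 2: [1, 3], 3: [2]}
print(color_to_mis(adj, {1: 1, 2: 2, 3: 1}, 2))  # {1, 3}
```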
Analysis
Lemma: Procedure ColorToMIS constructs an MIS for G in time m
Proof:
Independence:
• A node v that joins the MIS in iteration i is not adjacent to any w that joined the MIS earlier.
• It is also not adjacent to any w trying to join in the current iteration (since they belong to the same color class).
Analysis
Maximality: By contradiction. For the M marked by the procedure, suppose there is a node v ∉ M s.t. M ∪ {v} is independent. Suppose c(v)=i. Then in iteration i, the decision made by v was erroneous.
Analysis (cont)
Corollary: Given algorithm for coloring G with f(G) colors in time T(G), it is possible to construct MIS for G in time T(G)+f(G)
Corollary: There is a deterministic distributed MIS algorithm for trees / bounded-degree graphs with time O(log* n).
Analysis (cont)
Corollary: There is a deterministic distributed MIS algorithm for arbitrary graphs with time complexity O(Δ(G) log n).
Lower bound for MIS on rings
Fact: Given an MIS for the ring, it is possible to 3-color the ring in one round.
Proof:
v ∈ MIS: takes color 1, sends “2” to left neighbor
w ∉ MIS: takes color 2 if it gets msg “2”; otherwise takes color 3
Reducing coloring to MIS (cont)
Validity of 3-coloring: Since MIS vertices are spaced 2 or 3 places apart around the ring
Corollary: Any deterministic distributed MIS algorithm for the n-vertex ring requires at least (log*n-3)/2 time.
Randomized distributed MIS algorithm
Doable in time O(log n)
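The slides leave the randomized algorithm unspecified; one standard candidate achieving O(log n) rounds with high probability is Luby's algorithm, sketched here as a centralized round-by-round simulation (names illustrative): each surviving vertex draws a random value, local minima join, and joiners are removed together with their neighbors.

```python
import random

def luby_mis(adj, seed=0):
    """Round-by-round simulation of a Luby-style randomized MIS."""
    rng = random.Random(seed)
    live = {v: set(ws) for v, ws in adj.items()}   # remaining graph
    M = set()
    while live:
        r = {v: rng.random() for v in live}        # one round of random draws
        joined = {v for v in live if all(r[v] < r[w] for w in live[v])}
        M |= joined                                 # local minima join the MIS
        gone = joined | {w for v in joined for w in live[v]}
        live = {v: ws - gone for v, ws in live.items() if v not in gone}
    return M
```

Each round removes, in expectation, a constant fraction of the remaining edges, which is where the O(log n) round bound comes from.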
“Store and forward” routing schemes
Routing scheme: Mechanism specifying for each pair u,v ∈ V a path in G connecting u to v
Routing labels: Labeling assignment
Labels = {λ(v1),...,λ(vn)} - labels assigned to G's vertices
Headers = { allowable message headers }
“Store and forward” routing schemes
Data structures: Each v stores:
1. Initial header function Iv: Labels → Headers
2. Header function Hv: Headers → Headers
3. Port function Fv: Headers → [1..deg(v,G)]
Forwarding protocol
For u to send a message M to v:
1. Prepare header h=Iu(v), attach it to M(Typically consists of label of destination, v, plus some additional routing information)
2. Load M onto exit port i=Fu(h)
Forwarding protocol
Message M with header h' arriving at node w:
• Read h', check whether w = final destination.
• If not:
1. Prepare new header by setting h=Hw(h'), replacing old header h' attached to M by h
2. Compute exit port by setting i=Fw(h)
3. Load M onto port i
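The forwarding loop can be replayed centrally. In this toy sketch the header is simply the destination label (so Iu(v)=v and every Hw is the identity), and each Fw is a next-hop table; all names and tables are hypothetical, a full-tables instance of the general mechanism.

```python
def deliver(u, v, F, link):
    """Replay store-and-forward delivery of a message from u to v.
    F[w][h]   = exit port F_w(h) at node w for header h (= destination label).
    link[w][i] = the neighbor reached through w's port i."""
    w, route = u, [u]
    while w != v:          # w checks: am I the final destination?
        i = F[w][v]        # compute exit port i = F_w(h)
        w = link[w][i]     # message crosses the link out of port i
        route.append(w)
    return route
```

On the path 1-2-3 with the obvious tables, deliver(1, 3, F, link) traverses [1, 2, 3].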
Routing schemes (cont)
For every pair u,v, scheme RS specifies a route
ρ(RS,u,v)=(u=w1,w2,...,wj=v),
through which M travels from u to v.
|ρ(RS,u,v)| = route length
Partial routing schemes: Schemes specifying a route only for some vertex pairs in G
Performance measures
ω(e) = cost of using link e ~ estimated link delay for a message sent on e
Comm(RS,u,v) = cost of u→v routing by RS
= weighted route length |ρ(RS,u,v)|
Performance measures (cont)
Stretch factor:Given routing scheme RS for G,we say RS stretches the path from u to v by
Dilation(RS,u,v) = Comm(RS,u,v) / dist(u,v)
Dilation(RS,G) = max_{u,v∈V} {Dilation(RS,u,v)}
Performance measures (cont)
Memory requirement:
Mem(v) = # memory bits for storing the label and the functions Iv, Hv, Fv at v.
Total memory requirement of RS:
Mem(RS) = ∑_{v∈V} Mem(v)
Maximal memory requirement of RS:
MaxMem(RS) = max_{v∈V} Mem(v)
Routing strategies
Routing strategy: Algorithm computing a routing scheme RS for every G(V,E,ω).
A routing strategy has stretch factor k if for every G it produces a scheme RS with Dilation(RS,G) ≤ k.
Memory requirement of routing strategy (as function of n) =maximum (over all n-vertex G) memory requirement of routing schemes produced.
Routing strategies (cont)
Solution 1: Full tables routing (FTR)
Port function Fv stored at v specifies an entire table (one entry per destination u ≠ v) listing the exit port used for forwarding M to u.
Port functionfor node 1:
FTR (cont)
Note: The pointers to a particular destination u form shortest path tree rooted at u
Optimal communication cost:Dilation(FTR,G)=1
Disadvantage: Expensive for large systems (each v stores O(n log n) bit routing table)
FTR (cont)
Example: Unweighted ring
Consider unit cost n-vertex ring.
FTR strategy implementation:• Label vertices consecutively as 0,...,n-1• Route from i to j along shorter of two ring
segments (inferred from labels i,j)
Stretch = 1 (optimal routes)
2 log n bits per vertex (each vertex stores its own label and n)
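The ring port function can be computed directly from the two labels and n; the port numbering below (1 = clockwise towards label i+1 mod n, 2 = counter-clockwise) is an assumed convention.

```python
def ring_port(i, j, n):
    """Exit port at vertex i for destination j on an n-vertex unit-cost ring."""
    cw = (j - i) % n                   # clockwise distance from i to j
    return 1 if cw <= n - cw else 2    # route along the shorter segment
```

Each vertex needs only its own label and n (2 log n bits), yet every route is a shortest path, i.e., stretch 1.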
Solution 2: Flooding
Origin broadcasts M throughout entire network.
Requires no routing tables (optimal memory)
Non-optimal communication (unbounded stretch)
FTR vs. Flooding:
Extreme endpointsof communication-memory tradeoff
Part 2: Representations
1. Clustered representations• Basic concepts: clusters, covers, partitions • Sparse covers and partitions• Decompositions and regional matchings
2. Skeletal representations
• Spanning trees and tree covers• Sparse and light weight spanners
Basic idea of locality-sensitive distributed computing
Utilize locality to both simplify control structures and algorithms and reduce their costs
Operation performed in large network may concern few processors in small region
(Global operation may have local sub-operations)
Reduce costs by utilizing “locality of reference”
Components of locality theory
• General framework, complexity measures and algorithmic methodology
• Suitable graph-theoretic structures and efficient construction methods
• Adaptation to wide variety of applications
Fundamental approach
Clustered representation:• Impose clustered hierarchical organization on
arbitrary given network• Use it efficiently for bounding complexity of
distributed algorithms.
Skeletal representation:• Sparsify given network • Execute applications on remaining skeleton,
reducing complexity
Clusters, covers and partitions
Cluster = connected subset of vertices S ⊆ V.
Cover of G(V,E,ω) = collection of clusters 𝒮={S1,...,Sm} containing all vertices of G
(i.e., s.t. ∪i Si = V).
Partitions
Partial partition of G = collection of disjoint clusters 𝒮={S1,...,Sm}, i.e., s.t. S ∩ S' = ∅ for every S ≠ S' in 𝒮
Partition = cover and partial partition.
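The three definitions can be checked mechanically; the sketch below takes connectivity of each cluster as given and uses illustrative names.

```python
def is_cover(clusters, V):
    """Every vertex of V lies in some cluster."""
    return set().union(*clusters) == set(V)

def is_partial_partition(clusters):
    """All clusters are pairwise disjoint."""
    return all(S.isdisjoint(T)
               for i, S in enumerate(clusters)
               for T in clusters[i + 1:])

def is_partition(clusters, V):
    """Partition = cover and partial partition."""
    return is_cover(clusters, V) and is_partial_partition(clusters)
```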
Evaluation criteria
Locality and Sparsity
Locality level: cluster radius
Sparsity level: vertex / cluster degrees
Evaluation criteria
Locality - sparsity tradeoff:
locality and sparsity parametersgo opposite ways:
better sparsity ⇔ worse locality (and vice versa)
Evaluation criteria
Locality measures
Weighted distances:
Length of path (e1,...,es) = ∑_{1≤i≤s} ω(ei)
dist(u,w,G) = (weighted) length of shortest path
dist(U,W) = min{ dist(u,w) | u ∈ U, w ∈ W }
Evaluation criteria
Diameter, radius: As before, except weighted
For a collection of clusters 𝒮:
• Diam(𝒮) = maxi Diam(Si)
• Rad(𝒮) = maxi Rad(Si)
Sparsity measures
Cover sparsity measure - overlap:
deg(v,𝒮) = # occurrences of v in clusters S ∈ 𝒮,
i.e., degree of v in the hypergraph (V,𝒮)
Δ(𝒮) = maximum degree of cover 𝒮
Av(𝒮) = average degree of 𝒮 = ∑_{v∈V} deg(v,𝒮) / n = ∑_{S∈𝒮} |S| / n
(Figure: a vertex v belonging to 3 clusters, i.e., deg(v,𝒮) = 3)
Partition sparsity measure - adjacency
Intuition: “contract” clusters into super-nodes and look at the resulting cluster graph of 𝒮:
Γ(𝒮)=(𝒮, E𝒮), E𝒮 = {(S,S') | S ≠ S' ∈ 𝒮, G contains an edge (u,v) for u ∈ S and v ∈ S'}
E𝒮 edges: inter-cluster edges
Example: A basic construction
Goal: produce a partition 𝒮 with:
1. clusters of radius < k
2. few inter-cluster edges (or, low Avc(𝒮))
Algorithm BasicPart
Algorithm operates in iterations, each constructing one cluster
Example: A basic construction
At the end of each iteration:
- Add the resulting cluster S to the output collection 𝒮
- Discard it from V
- If V is not empty then start a new iteration
Iteration structure
• Arbitrarily pick a vertex v from V
• Grow cluster S around v, adding layer by layer
• Vertices added to S are discarded from V
Iteration structure
• The layer merging process is carried out repeatedly until reaching the required sparsity condition:
- the next layer increases # vertices by a factor of < n^{1/k}
(i.e., |Γ(S)| ≤ |S| · n^{1/k})
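The whole of Algorithm BasicPart fits in a short centralized sketch (names illustrative); here Γ(S) means S together with its neighbors among the still-unclustered vertices.

```python
def basic_part(adj, k):
    """Centralized sketch of Algorithm BasicPart on adjacency dict adj."""
    n = len(adj)
    V = set(adj)                    # unclustered vertices
    P = []                          # output partition
    while V:                        # main loop: one iteration per cluster
        v = next(iter(V))           # arbitrarily pick a vertex v from V
        S = {v}
        while True:                 # internal loop: grow S layer by layer
            layer = ({w for u in S for w in adj[u]} & V) - S
            if len(S) + len(layer) > n ** (1 / k) * len(S):
                S |= layer          # sparsity condition not yet met: merge layer
            else:
                break               # |Gamma(S)| <= n^(1/k)|S|: cluster done
        P.append(S)
        V -= S                      # discard the cluster from V
    return P
```

On any input the resulting clusters are disjoint and cover V, and the number of inter-cluster edges is at most n^{1+1/k}, as the theorem below states.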
Analysis
Thm: Given an n-vertex graph G(V,E) and an integer k ≥ 1, Alg. BasicPart creates a partition 𝒮 satisfying:
1) Rad(𝒮) ≤ k-1,
2) # inter-cluster edges in Γ(𝒮) ≤ n^{1+1/k} (or, Avc(𝒮) ≤ n^{1/k})
Analysis
Proof:
Correctness:
• Every S added to 𝒮 is a (connected) cluster
• The generated clusters are disjoint (the algorithm erases from V every v added to a cluster)
• 𝒮 is a partition (covers all vertices)
Analysis (cont)
Property (2):
By the termination condition of the internal loop, the resulting S satisfies |Γ(S)| ≤ n^{1/k} |S|
⇒ (# inter-cluster edges touching S) ≤ n^{1/k} |S|
This number can only decrease in later iterations, if adjacent vertices get merged into the same cluster
⇒ # inter-cluster edges ≤ ∑_{S∈𝒮} n^{1/k} |S| = n^{1+1/k}
Analysis (cont)
Property (1):
Consider an iteration of the main loop.
Let J = # times the internal loop was executed.
Let Si = the S constructed on the i'th internal iteration
⇒ |Si| > n^{(i-1)/k} for 2 ≤ i ≤ J (by induction on i)
Analysis (cont)
⇒ J ≤ k (otherwise, |SJ| > n)
Note: Rad(Si) ≤ i-1 for every 1 ≤ i ≤ J (S1 is composed of a single vertex; each additional layer increases Rad(Si) by 1)
⇒ Rad(SJ) ≤ k-1
Synchronizers revisited
Goal: A synchronizer capturing reasonable middle points on the time-communication tradeoff scale
Synchronizer γ
Assumption: Given a low-degree partition
For each cluster in , build rooted spanning tree.
In addition, between any two neighboring clusters designate a synchronization link.
Synchronizer γ
Handling safety information (in Phase B)
Step 1: For every cluster separately, apply synchronizer β (by the end of the step, every v knows that every w in its cluster is safe)
Step 2: Every processor incident to a synchronization link sends a message to the other cluster, saying its cluster is safe.
Handling safety information (in Phase B)
Step 3: Repetition of step 1, except the convergecast performed in each cluster carries different information:
• Whenever v learns all clusters neighboring its subtree are safe, it reports this to parent.
Step 4: When root learns all neighboring clusters are safe, it broadcasts “start new pulse” on tree
Synchronizer γ
Phases of synchronizer γ (in each cluster):
1. Converge(∧, Safe(v,p))
2. Tcast(ClusterSafe(p))
3. Send ClusterSafe(p) messages to adjacent clusters
4. Converge(∧, AdjClusterSafe(v,p))
5. Tcast(AllSafe(p))
Analysis
Correctness: Recall:
Readiness property: Processor v is ready for pulse p once it already received all alg' msgs sent to it by neighbors during their pulse p-1.
Readiness rule:Processor v may generate pulse p once it finished its original actions for pulse p-1, and Ready(v,p) holds.
Analysis
To prove that synchronizer γ properly implements Phase B, we need to show that it imposes the readiness rule.
Claim: Synchronizer γ is correct.
Complexity
Claim:
1. Cpulse(γ) = O(n^{1+1/k})
2. Tpulse(γ) = O(k)
Proof:
Time to implement one pulse: ≤ 2 broadcast / convergecast rounds in clusters (+ 1 message-exchange step among border vertices in neighboring clusters)
⇒ Tpulse(γ) ≤ 4 Rad(𝒮) + 1 = O(k)
Complexity
Messages: Broadcast / convergecast rounds, separately in each cluster, cost O(n) msgs in total (clusters are disjoint)
A single communication step among neighboring clusters requires n · Avc(𝒮) = O(n^{1+1/k}) msgs
⇒ Cpulse(γ) = O(n^{1+1/k})