Distributed Control Algorithms for Artificial Intelligence
by Avi Nissimov,
DAI seminar @ HUJI, 2003
Control methods
Goal: deliberation on which task should be executed, and on when it should be executed.
Control in centralized algorithms– Loops, branches
Control in distributed algorithms– Control messages
Control for distributed AI– Search coordination
Centralized versus Distributed computation models
“Default” centralized computation model:– Turing machine.
Open issues in distributed models:
– Synchronization
– Predefined structure of network
– Network graph structure knowledge on processors
– Processor identification
– Processor roles
Notes about proposed computational model
Asynchronous – (and therefore non-deterministic)
Unstructured (connected) network graph
No global knowledge – neighbors only
Each processor has unique id
No server-client roles, but there is a computation initiator
Complexity measures
Communication– Number of exchanged messages
Time– In terms of slowest message (no weights on network graph edges); ignore local processing
Storage– Overall number of bits/words required
Control issues
Graph exploration– Communication over the graph
Termination detection– Detection of the state when no node is running and no message is in transit
Graph exploration: Tasks
Routing of messages from node to node
Broadcasting
Connectivity determination
Communication capacity usage
Echo algorithm
Goal: spanning tree building
Intuition: got a message – let it go on
On first reception of a message, send it to all of the neighbors, ignoring the rest
Termination detection – after the nodes respond, send [echo] message to father
Echo alg.: implementation
receive[echo] from w; father:=w;
received:=1;
for all (v in Neighbors-{w}) send[echo] to v;
while (received < Neighbors.size) do
receive[echo]; received++;
send [echo] to father
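The pseudocode above can be simulated centrally. The sketch below is an illustrative Python simulation (not the slides' code); it delivers messages FIFO from a single queue, which is one admissible schedule of the asynchronous model, and the tree is fixed by each node's first [echo] reception.

```python
from collections import deque

def echo_spanning_tree(graph, root):
    # graph: node -> list of neighbours (undirected, connected graph).
    # Returns the father pointers of the spanning tree built by the flood.
    father = {root: None}
    queue = deque((root, v) for v in graph[root])   # initiator floods [echo]
    while queue:
        sender, node = queue.popleft()              # deliver one message
        if node not in father:                      # first [echo] reception
            father[node] = sender                   # sender becomes father
            for v in graph[node]:
                if v != sender:                     # forward to the rest
                    queue.append((node, v))
        # later [echo]s only matter for termination detection; the tree is
        # already fixed by the first reception, so we drop them here
    return father
```

With FIFO delivery this yields a BFS-like tree; under worst-case asynchronous scheduling any spanning tree can result, as the properties slide notes.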
Echo algorithm - properties
Very useful in practice, since no faster exploration can happen
Reasonable assumption – “fast” edges tend to stay fast
Theoretical model allows worst execution, since every spanning tree can be a result of the algorithm
DFS spanning tree algorithm: Centralized version
DFS(u, father)
if (visited[u]) then return;
visited[u]=true;
father[u]=father;
for all (neigh in neighbors[u])
DFS(neigh, u);
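As a sanity check, the recursive pseudocode above transliterates almost directly to Python (a hypothetical sketch; `graph` maps each node to its neighbor list):

```python
def dfs(graph, u, father_of_u, visited, father):
    # direct transliteration of DFS(u, father) from the slide
    if u in visited:
        return
    visited.add(u)
    father[u] = father_of_u        # record the tree edge
    for neigh in graph[u]:
        dfs(graph, neigh, u, visited, father)
```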
DFS spanning tree algorithm: Distributed version
On reception of [dfs] from v
if (visited[u]) then
send [return] to v;
status[v]:=returned; return;
visited:=true; status[v]:=father;
sendToNext();
DFS spanning tree algorithm: Distributed version (Cont.)
On reception of [return] from v
status[v]:=returned;
sendToNext();
sendToNext:
if there is w s.t. status[w]=unused then
send [dfs] to w;
else send [return] to father
Discussion, complexity analysis
Sequential in nature
There are 2 messages on each edge, therefore
– Communication complexity is 2m
All the messages are sent in sequence
– Time complexity is 2m as well
Explicitly makes no use of parallel execution
Awerbuch linear time algorithm for DFS tree
Main idea: why send to a node that is already visited?
Each node sends [visited] message in parallel to all the neighbors
Neighbors update their knowledge on status of the node before they are visited in O(1) for each node (in parallel)
Awerbuch algorithm: complexity analysis
Let (u,v) be edge, suppose u is visited before v. Then u sends [visit] message on (u,v); and v sends back [ok] message to u.
If (u,v) is also a tree edge, [dfs], [return] messages are sent too.
Comm. complexity: 2m+2(n-1)
Time complexity: 2n+2(n-1)=4n-2
Relaxation algorithm - idea
DFS-tree property: if (u,v) is edge in original graph, then v is in path (root,..,u) or u is in path of (root,..,v).
Union of lexically minimal simple paths (lmsp) satisfies this property.
Therefore, all we need is to find lmsp for each node in graph
Relaxation algorithm – Implementation
On arrival of [path, <path>]
if (currentPath>(<path>,u)) then
currentPath:=(<path>, u);
send all neighbors [path, currentPath]
// (in parallel, of course)
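Run sequentially to a fixpoint, the relaxation idea can be sketched as follows (an illustrative Python version, not the message-passing code above; it assumes comparable node labels so that tuple comparison gives the lexicographic path order):

```python
def lmsp_fathers(graph, root):
    # path[v]: lexically minimal simple path (lmsp) from root to v,
    # kept as a tuple of nodes and relaxed until no path improves.
    path = {root: (root,)}
    changed = True
    while changed:
        changed = False
        for u in list(path):
            for v in graph[u]:
                cand = path[u] + (v,)
                # keep the path simple: v must not already occur on it
                if v not in path[u] and (v not in path or cand < path[v]):
                    path[v] = cand
                    changed = True
    # the father of v is the next-to-last node on its lmsp
    return {v: p[-2] for v, p in path.items() if len(p) > 1}
```

The union of the resulting father edges satisfies the DFS-tree property stated on the idea slide.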
Relaxation algorithm – analysis and conclusions
Advantages – low complexity:
– In k steps, all the nodes whose lmsp has length k are set up; therefore time complexity is n
Disadvantages:– Unlimited message length– Termination detection required (see further)
Other variations and notes
Minimal spanning tree– Requires weighting the edges, much like Kruskal’s MST algorithm
BFS– Very hard, since there is no synchronization; much like iterative deepening DFS
Linear message solution– Like centralized; sends all the information to the next node; unlimited message length.
Connectivity Certificates
Idea: let G be the network graph. Throw away from G some edges, while preserving k paths (for each {u,v}) when available in G, and all the paths if G itself contains less than k paths.
Applications:
– Network capacity utilization
– Transport reliability assurance
Connectivity certificate: Goals
The main idea of certificates is to use as few edges as possible; there always is the trivial certificate – the whole graph.
Finding a minimal certificate is an NP-hard problem
A sparse certificate is one that contains no more than k*n edges
Sparse connectivity certificate: Solution
Let E(i) be a spanning forest in graph G\Union(E(j)) for 1<=j<=i-1; then Union(E(i)) is a sparse connectivity certificate
Algorithm idea – calculate all the forests simultaneously – if an edge closes a cycle in tree of i-th forest, then add the edge to forest (i+1)-th (rank of the edge is i+1)
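The forest-ranking idea can be sketched sequentially with one union-find structure per forest (an illustrative Python version; `DSU` and `edge_ranks` are hypothetical names, not from the slides):

```python
class DSU:
    # minimal union-find; one instance per spanning forest E(i)
    def __init__(self):
        self.p = {}
    def find(self, x):
        self.p.setdefault(x, x)
        while self.p[x] != x:
            x = self.p[x]
        return x
    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return False        # edge would close a cycle in this forest
        self.p[ra] = rb
        return True

def edge_ranks(edges):
    # rank[e] = index i of the first forest E(i) that accepts e; the
    # edges with rank <= k then form a sparse k-connectivity certificate.
    forests, rank = [], {}
    for u, v in edges:
        for i, forest in enumerate(forests):
            if forest.union(u, v):
                rank[(u, v)] = i + 1
                break
        else:                   # e closes a cycle in every existing forest
            forest = DSU()
            forest.union(u, v)
            forests.append(forest)
            rank[(u, v)] = len(forests)
    return rank
```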
Distributed certificate algorithm
Search(father)
if (not visited) then
for all neighbors v s.t. rank[v]=0
send [give_rank] to v;
receive[ranked, <i>] from v;
rank[v]:=i;
visited:=true;
Distributed certificate algorithm (cont.)
Search(v) (cont.)
for all w s.t. needs_search[w] and rank[w]>=rank[father], in decreasing order
needs_search[w]:=false;
send [search] to w; receive [return];
send [return] to father
Distributed certificate algorithm (cont.)
On receipt of [give_rank] from v
rank[v]:=min(i) s.t. i>rank[w] for all w;
send [ranked, <rank[v]>] to v;
On receipt of [search] from father
Search(father);
Complexity analysis and discussion
There is no reference to k in algorithm; it calculates sparse certificates for all k’s
There are at most 4 messages on each edge – therefore time and communication complexity are at most 4m=O(m)
Ranking the nodes in parallel, we can achieve 2n+2m complexity
Termination detection: definition
Problem: detect a state when all the nodes are awaiting messages in a passive state
Similar to garbage collection problem – determine the nodes that no longer can accept the messages (until “reallocated” – reactivated)
Two approaches: tracing vs. probe
Processor states of execution: global picture
Send– pre-condition: {state=active}; – action: send[message];
Receive– pre-condition: {message queue is not empty}; – action: state:=active;
Finish activity– pre-condition: {state=active}; – action: state:=passive;
Tracing
Similar to “reference counting” garbage collection algorithm
On sending a message, increases children counter
On receiving message [finished_work], decreases children counter
When finishes work, and when children counter equals zero, sends a [finished_work] message to the father
Analysis and discussion
Main disadvantage: doubles (!!) the communication complexity
Advantages: simplicity, immediate termination detection (because the message is initiated by terminator).
Variations may send [finished_work] message on chosen messages; so called “weak reference”
Probe algorithms
Main idea: Once per some time, “collect garbage” – calculate number of sent minus number of received messages per processor
If sum of these numbers is 0 – then there is no message running on the network.
In parallel, find out if there is an active processor.
Probe algorithms – details
We will introduce a new role – the controller; and we will assume it is in fact connected to each node.
Once in some period (delta), controller sends [request] message to all the nodes.
Each processor sends back [deficit= <sent_number-received_number>].
Think it works? Not yet…
Suppose U sends a message to V and becomes passive; then U receives [request] message and replies (immediately) [deficit=1].
Next processor W receives [request] message; it replies [def=0] since it got no message yet
Meanwhile V activates W by sending it a message, receives reply from W and stops; receives [request] and replies [def=-1]
But W is still active….
How to work it out?
As we saw, a message can pass “behind the back” of the controller, since the model is asynchronous
Yet, if we add some additional boolean variable on each of processors, such as “was active since last request”, we can deal with this problem
But that means we will detect termination only 2*delta time after the termination actually occurs
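The controller's decision rule can be sketched over a serialized trace of probe snapshots (a hypothetical Python model, not the slides' code: each snapshot lists per-processor (deficit, was-active-since-last-probe) pairs; termination is declared only after two consecutive quiet probes with zero total deficit, reflecting the 2*delta detection lag):

```python
def detect_termination(snapshots):
    # snapshots[t]: list of (deficit, was_active_since_last_probe) pairs,
    # one per processor, gathered by the controller at probe t.
    # Declare termination at the first probe where the deficit sum is
    # zero, nobody was active, and the previous probe was also quiet.
    def quiet(snap):
        return not any(active for _, active in snap)
    for t in range(1, len(snapshots)):
        prev, cur = snapshots[t - 1], snapshots[t]
        if sum(d for d, _ in cur) == 0 and quiet(cur) and quiet(prev):
            return t
    return None                 # not yet terminated within the trace
```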
Variations, discussion, analysis
If there is more than one edge between the controller and a node, use the “echo” algorithm with initiator=controller, with the sum calculated inline
Not immediate detection, initiated by controller
A small delta causes a communication bottleneck, while a large delta causes a long period before detection
CSP and Arc Consistency
Formal definition: find x(i) from D(i) so that if x(i)=v and x(j)=w then Cij(v,w) holds
Problem is NP-complete in general
Arc-consistency is removing all redundant values: if for all w from D(j) Cij(v,w)=false, then remove v from D(i)
Of course, Arc-consistency is just the primary step of CSP solution
Sequential AC4 algorithm
For all Cij, v in Di, w in Dj
if Cij(v,w) then count[i,v,j]++; Supp[j,w].insert(<i,v>);
For all Cij, v in Di
checkRedundant(i,v,j);
While not Q.empty
<j,w>=Q.deque();
forall <i,v> in Supp[j,w]
count[i,v,j]--; checkRedundant(i,v,j);
Sequential AC4 algorithm: redundancy check
checkRedundant(i,v,j)
if (count[i,v,j]=0) then
Q.enque(<i,v>); Di.remove(v);
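The two slides above combine into the following sequential sketch (an illustrative Python AC4, not the slides' exact code; it assumes `constraints` holds both directions of every constraint, i.e. both (i,j) and (j,i) keys with swapped arguments):

```python
from collections import defaultdict, deque

def ac4(domains, constraints):
    # domains: i -> set of values (pruned in place).
    # constraints: (i, j) -> predicate C(v, w).
    count = defaultdict(int)    # count[i, v, j]: supports of (i, v) in Dj
    supp = defaultdict(list)    # supp[j, w]: pairs <i, v> that w supports
    queue = deque()

    def check_redundant(i, v, j):
        if count[i, v, j] == 0 and v in domains[i]:
            domains[i].discard(v)
            queue.append((i, v))

    for (i, j), c in constraints.items():       # initialization pass
        for v in list(domains[i]):
            for w in domains[j]:
                if c(v, w):
                    count[i, v, j] += 1
                    supp[j, w].append((i, v))
            check_redundant(i, v, j)
    while queue:                                # propagate removals
        j, w = queue.popleft()
        for i, v in supp[j, w]:
            count[i, v, j] -= 1
            check_redundant(i, v, j)
    return domains
```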
Distributed Arc consistency
Assume that each variable x(i) is assigned to a separate processor, and that all the mutually dependent variables are assigned to neighbors.
The main idea of the algorithm: Supp[j,w] and count[i,v,j] reside on the processor of x(j), while D(i) is on processor i; if v is to be removed from D(i), then processor j sends a message
Distributed AC4: Initialization
Initialization:
For all Cij,v in Di
For all w in Dj_initial
if Cij(v,w) count[v,j]++;
if count[v,j]=0 Redundant(v)
Redundant(v):
if v in Di
Di.remove(v); SendQueue.enque(v);
Distributed AC4: messaging
On not SendQueue.empty
v=SendQueue.deque
for all Cji send [remove v] to j
On reception of [remove w] from j
for all v in Di such that Cij(v,w)
count[v,j]--;
if count[v,j]=0 Redundant(v)
Distributed AC4: complexity
Assume A=max{|Di|}, m=|{Cij}|.
Sequential execution: both loops pass over all the Cij, v in Di and w in Dj => O(mA^2)
Distributed execution:
– Communication complexity: on each edge there can be at most A messages => O(mA);
– Time complexity: each node sends in parallel each of A messages => O(nA);
– Local computation: O(mA^2) because of initialization
Dist. AC4 – Final details
Termination detection is not obvious, and requires explicit implementation
– Usually a probe algorithm is preferred because of the big quantity of messages
AC4 ends in three possible states
– Contradiction
– Solution
– Arc-consistent sub-set
Task assignment for AC4
Our assumption was that each variable is assigned to different processor.
Special case is multiprocessor computer, when all the resources are on hand
In fact, it is an NP-hard problem to minimize communication cost when the assignment has to be done by the computer => heuristic approximation algorithms.
From AC4 to CSP
There are many heuristics, taught mainly in introductory AI courses (such as most restricted variable and most restricting value), that tell which variables should be chosen after arc-consistency is reached
On contradiction termination – usage in back-tracking
Loop cut-set example
Definition: Pit in loop L – a vertex in directed graph, such that both edges of L are incoming.
Goal: break loops in directed graph.
Formulation: Let G=<V,E> be a graph; find C, a subset of V, such that any loop in G contains at least one non-pit vertex from C.
Applications: Belief networks algorithms
Sequential solution
It can be shown that finding a minimal cut-set is an NP-hard problem, therefore approximations are used instead
Best known approximation – by Suermondt and Cooper – is shown on the next slide
Main idea: on each step drop all leaves, then among the vertices with at most 1 incoming edge pick the one common to the maximal number of cycles
Suermondt and Cooper algorithm
C:=empty;
While not V.empty do
remove all v such that deg(v)<=1;
K:={v in V: indeg(v)<=1}
v:=argmax{deg(v): v in K};
C.insert(v);
V.remove(v);
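A direct transliteration of the loop (an illustrative Python sketch; names are mine, and the fallback for an empty K is an assumption, since the original pseudocode leaves that case open):

```python
def loop_cutset(nodes, edges):
    # edges: directed pairs (u, v); deg counts edges in both directions,
    # indeg only incoming ones.
    nodes, edges = set(nodes), set(edges)
    def deg(v):
        return sum(1 for e in edges if v in e)
    def indeg(v):
        return sum(1 for (_, b) in edges if b == v)
    cutset = set()
    while nodes:
        changed = True                    # trim leaves: deg(v) <= 1
        while changed:
            changed = False
            for v in list(nodes):
                if deg(v) <= 1:
                    nodes.discard(v)
                    edges = {e for e in edges if v not in e}
                    changed = True
        if not nodes:
            break
        k = [v for v in nodes if indeg(v) <= 1]
        if not k:                         # assumed fallback when K is empty
            k = list(nodes)
        v = max(k, key=deg)               # maximal total degree wins
        cutset.add(v)
        nodes.discard(v)
        edges = {e for e in edges if v not in e}
    return cutset
```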
Edge case
There is still one subtlety (that isn’t described in Tel’s article) – what to do if K is empty while V is not (for example, if G is Euler path on octahedron)
Distributed version: ideas
Four parts of algorithm
– Variables and leaf trim – removal of all leaf nodes
– Control tree construction – tree for search of next cut node
– Cut node search – search of the best node to add to cut
– Controller shift – optimization, see later
Data structures for distributed version
Each node contains
– its activity status (nas: {yes, cut, non-cut})
– activity status of all adjacent edges-links (las: {yes, no})
– control status of all adjacent links (lcs: {basic, son, father, frond})
Leaf trim part
Idea: remove all leaves from the graph (put them into the non-cut state).
If the algorithm discovers that a node has 1 active edge left, it sends its unique neighbor [remove] message
Tracing-like termination detection
Leaf trim implementation
Var las[x]=yes; nas=yes;
procedure TrimTest
if |{x: las[x]}|=1 then
nas:=noncut; las[x]:=no;
send [remove] to x and receive [return] or [remove] back;
On reception of [remove] from x:
las[x]:=no; TrimTest; send [return] to x
Control tree search
For this goal the echo algorithm is used (with an appropriate variation – now each father should know the list of its children)
This task is completely independent from the previous (leaf trim), therefore they can be executed in parallel
During this task, lcs variable is set up
Control tree construction: implementation
Procedure constructSubtree
For all x s.t. lcs[x]=basic do
send [construct, father] to x
while exists x s.t. lcs[x]=basic
receive [construct, <i>] from y
lcs[y]:= (<i>=father)? frond : son
First [construct,father] message from x
lcs[x]:=father; {constructSubtree & TrimTest};
send [construct,son] to x;
Cut node search
Idea: pass over the control tree and combine results. For this we will need to collect (un-broadcast) on the control tree the maximal degree of a node in the sub-tree
Note that only nodes with indeg<=1 that are still active are considered (for this reason, Income represents incoming edges-neighbors).
Cut node search: implementation
Procedure NodeSearch:
my_degree:=(|{x in Income: las[x]}|<2)? |{x: las[x]}| : 0;
best_degree:=my_degree;
for all x: lcs[x]=son send [search] to x;
do |{x: lcs[x]=son}| times
receive [best_is, d] from x;
if (best_degree<d) then
best_degree:=d; best_branch:=x;
send [best_is, best_degree] to father;
Controller shift
This task has no counterpart in the sequential algorithm and is only an optimization issue
Idea: because the newly selected cut-node is the center of trim activity, the root of the control tree should move there.
In fact, this part doesn’t involve search, since at this stage we already know the path to the best degree along the best branches
Controller shift: root change
On [change_root] message from x
lcs[x]:=son;
if (best_branch=u) then
TrimFromNeighbors;
InitSearchCutnode;
else
lcs[best_branch]:=father;
send[change_root] to best_branch
Trim from neighbors
TrimFromNeighbors:
for all x:las[x] send [remove] to x;
do | {x:las[x]}| times
receive [return] or [remove] from x;
las[x]:=no;
Complexity
New measures: s=C.size; d=tree diameter
Communication complexity
– 2m for [remove/return] + 2m for [construct] + 2(n-1)(s+1) for [search/best_is] + sd for [change_root] => 4m+2sn
Time complexity without trim:
– 2d+2(s+1)d+sd=(3s+4)d
Trim time complexity:
– Worst case: 2(n-s)