Distributed Control Algorithms for Artificial Intelligence
by Avi Nissimov,
DAI seminar @ HUJI, 2003
Control methods
Goal: deliberation on which task should be executed, and on when it should be executed.
Control in centralized algorithms– Loops, branches
Control in distributed algorithms– Control messages
Control for distributed AI– Search coordination
Centralized versus Distributed computation models
“Default” centralized computation model:– Turing machine.
Open issues in distributed models:
– Synchronization
– Predefined structure of network
– Network graph structure knowledge on processors
– Processor identification
– Processor roles
Notes about proposed computational model
Asynchronous – (and therefore non-deterministic)
Unstructured (connected) network graph
No global knowledge – neighbors only
Each processor has unique id
No server-client roles, but there is a computation initiator
Complexity measures
Communication– Number of exchanged messages
Time– In terms of slowest message (no weights on network graph edges); ignore local processing
Storage– Overall number of bits/words required
Control issues
Graph exploration– Communication over the graph
Termination detection– Detection of the state when no node is running and no message is in transit
Graph exploration: Tasks
Routing of messages from node to node
Broadcasting
Connectivity determination
Communication capacity usage
Echo algorithm
Goal: spanning tree building
Intuition: got a message – let it go on
On first reception of a message, send it to all of the neighbors, ignoring the rest
Termination detection – after the nodes respond, send [echo] message to father
Echo alg.: implementation
receive[echo] from w; father:=w;
received:=1;
for all (v in Neighbors-{w}) send[echo] to v;
while (received < Neighbors.size) do
receive[echo]; received++;
send [echo] to father
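The pseudocode above can be simulated centrally. The sketch below is an illustrative Python simulation (not the slides' code); it delivers messages FIFO from a single queue, which is one admissible schedule of the asynchronous model, and the tree is fixed by each node's first [echo] reception.

```python
from collections import deque

def echo_spanning_tree(graph, root):
    # graph: node -> list of neighbours (undirected, connected graph).
    # Returns the father pointers of the spanning tree built by the flood.
    father = {root: None}
    queue = deque((root, v) for v in graph[root])   # initiator floods [echo]
    while queue:
        sender, node = queue.popleft()              # deliver one message
        if node not in father:                      # first [echo] reception
            father[node] = sender                   # sender becomes father
            for v in graph[node]:
                if v != sender:                     # forward to the rest
                    queue.append((node, v))
        # later [echo]s only matter for termination detection; the tree is
        # already fixed by the first reception, so we drop them here
    return father
```

With FIFO delivery this yields a BFS-like tree; under worst-case asynchronous scheduling any spanning tree can result, as the properties slide notes.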
Echo algorithm - properties
Very useful in practice, since no faster exploration can happen
Reasonable assumption – “fast” edges tend to stay fast
Theoretical model allows worst execution, since every spanning tree can be a result of the algorithm
DFS spanning tree algorithm: Centralized version
DFS(u, father)
if (visited[u]) then return;
visited[u]=true;
father[u]=father;
for all (neigh in neighbors[u])
DFS(neigh, u);
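As a sanity check, the recursive pseudocode above transliterates almost directly to Python (a hypothetical sketch; `graph` maps each node to its neighbor list):

```python
def dfs(graph, u, father_of_u, visited, father):
    # direct transliteration of DFS(u, father) from the slide
    if u in visited:
        return
    visited.add(u)
    father[u] = father_of_u        # record the tree edge
    for neigh in graph[u]:
        dfs(graph, neigh, u, visited, father)
```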
DFS spanning tree algorithm: Distributed version
On reception of [dfs] from v
if (visited[u]) then
send [return] to v;
status[v]:=returned; return;
visited:=true; status[v]:=father;
sendToNext();
DFS spanning tree algorithm: Distributed version (Cont.)
On reception of [return] from v
status[v]:=returned;
sendToNext();
sendToNext:
if there is w s.t. status[w]=unused then
send [dfs] to w;
else send [return] to father
Discussion, complexity analysis
Sequential in nature
There are 2 messages on each edge, therefore
– Communication complexity is 2m
All the messages are sent in sequence
– Time complexity is 2m as well
Explicitly makes no use of parallel execution
Awerbuch linear time algorithm for DFS tree
Main idea: why send to a node that is already visited?
Each node sends [visited] message in parallel to all the neighbors
Neighbors update their knowledge on status of the node before they are visited in O(1) for each node (in parallel)
Awerbuch algorithm: complexity analysis
Let (u,v) be edge, suppose u is visited before v. Then u sends [visit] message on (u,v); and v sends back [ok] message to u.
If (u,v) is also a tree edge, [dfs], [return] messages are sent too.
Comm. complexity: 2m+2(n-1)
Time complexity: 2n+2(n-1)=4n-2
Relaxation algorithm - idea
DFS-tree property: if (u,v) is edge in original graph, then v is in path (root,..,u) or u is in path of (root,..,v).
Union of lexically minimal simple paths (lmsp) satisfies this property.
Therefore, all we need is to find lmsp for each node in graph
Relaxation algorithm – Implementation
On arrival of [path, <path>]
if (currentPath>(<path>,u)) then
currentPath:=(<path>, u);
send all neighbors [path, currentPath]
// (in parallel, of course)
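Run sequentially to a fixpoint, the relaxation idea can be sketched as follows (an illustrative Python version, not the message-passing code above; it assumes comparable node labels so that tuple comparison gives the lexicographic path order):

```python
def lmsp_fathers(graph, root):
    # path[v]: lexically minimal simple path (lmsp) from root to v,
    # kept as a tuple of nodes and relaxed until no path improves.
    path = {root: (root,)}
    changed = True
    while changed:
        changed = False
        for u in list(path):
            for v in graph[u]:
                cand = path[u] + (v,)
                # keep the path simple: v must not already occur on it
                if v not in path[u] and (v not in path or cand < path[v]):
                    path[v] = cand
                    changed = True
    # the father of v is the next-to-last node on its lmsp
    return {v: p[-2] for v, p in path.items() if len(p) > 1}
```

The union of the resulting father edges satisfies the DFS-tree property stated on the idea slide.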
Relaxation algorithm – analysis and conclusions
Advantages – low complexity:
– In k steps, all the nodes whose lmsp has length k are set up; therefore time complexity is n
Disadvantages:– Unlimited message length– Termination detection required (see further)
Other variations and notes
Minimal spanning tree– Requires weighting the edges, much like Kruskal’s MST algorithm
BFS– Very hard, since there is no synchronization; much like iterative deepening DFS
Linear message solution– Like centralized; sends all the information to the next node; unlimited message length.
Connectivity Certificates
Idea: let G be the network graph. Throw away from G some edges, while preserving k paths (for each {u,v}) when available in G, and all the paths if G itself contains less than k paths.
Applications:
– Network capacity utilization
– Transport reliability assurance
Connectivity certificate: Goals
The main idea of certificates is to use as few edges as possible; there always is the trivial certificate – the whole graph.
Finding a minimal certificate is an NP-hard problem
A sparse certificate is one that contains no more than k*n edges
Sparse connectivity certificate: Solution
Let E(i) be a spanning forest in graph G\Union(E(j)) for 1<=j<=i-1; then Union(E(i)) is a sparse connectivity certificate
Algorithm idea – calculate all the forests simultaneously – if an edge closes a cycle in tree of i-th forest, then add the edge to forest (i+1)-th (rank of the edge is i+1)
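The forest-ranking idea can be sketched sequentially with one union-find structure per forest (an illustrative Python version; `DSU` and `edge_ranks` are hypothetical names, not from the slides):

```python
class DSU:
    # minimal union-find; one instance per spanning forest E(i)
    def __init__(self):
        self.p = {}
    def find(self, x):
        self.p.setdefault(x, x)
        while self.p[x] != x:
            x = self.p[x]
        return x
    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return False        # edge would close a cycle in this forest
        self.p[ra] = rb
        return True

def edge_ranks(edges):
    # rank[e] = index i of the first forest E(i) that accepts e; the
    # edges with rank <= k then form a sparse k-connectivity certificate.
    forests, rank = [], {}
    for u, v in edges:
        for i, forest in enumerate(forests):
            if forest.union(u, v):
                rank[(u, v)] = i + 1
                break
        else:                   # e closes a cycle in every existing forest
            forest = DSU()
            forest.union(u, v)
            forests.append(forest)
            rank[(u, v)] = len(forests)
    return rank
```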
Distributed certificate algorithm
Search(father)
if (not visited) then
for all neighbors v s.t. rank[v]=0
send [give_rank] to v;
receive[ranked, <i>] from v;
rank[v]:=i;
visited:=true;
Distributed certificate algorithm (cont.)
Search(v) (cont.)
for all w s.t. needs_search[w] and rank[w]>=rank[father], in decreasing order
needs_search[w]:=false;
send [search] to w; receive [return];
send [return] to father
Distributed certificate algorithm (cont.)
On receipt of [give_rank] from v
rank[v]:=min(i) s.t. i>rank[w] for all w;
send [ranked, <rank[v]>] to v;
On receipt of [search] from father
Search(father);
Complexity analysis and discussion
There is no reference to k in algorithm; it calculates sparse certificates for all k’s
There are at most 4 messages on each edge – therefore time and communication complexity are at most 4m=O(m)
Ranking the nodes in parallel, we can achieve 2n+2m complexity
Termination detection: definition
Problem: detect a state when all the nodes are awaiting messages in a passive state
Similar to garbage collection problem – determine the nodes that no longer can accept the messages (until “reallocated” – reactivated)
Two approaches: tracing vs. probe
Processor states of execution: global picture
Send– pre-condition: {state=active}; – action: send[message];
Receive– pre-condition: {message queue is not empty}; – action: state:=active;
Finish activity– pre-condition: {state=active}; – action: state:=passive;
Tracing
Similar to “reference counting” garbage collection algorithm
On sending a message, increases children counter
On receiving message [finished_work], decreases children counter
When finishes work, and when children counter equals zero, sends a [finished_work] message to the father
Analysis and discussion
Main disadvantage: doubles (!!) the communication complexity
Advantages: simplicity, immediate termination detection (because the message is initiated by terminator).
Variations may send [finished_work] message on chosen messages; so called “weak reference”
Probe algorithms
Main idea: Once per some time, “collect garbage” – calculate number of sent minus number of received messages per processor
If sum of these numbers is 0 – then there is no message running on the network.
In parallel, find out if there is an active processor.
Probe algorithms – details
We will introduce a new role – the controller; and we will assume it is in fact connected to each node.
Once in some period (delta), controller sends [request] message to all the nodes.
Each processor sends back [deficit= <sent_number-received_number>].
Think it works? Not yet…
Suppose U sends a message to V and becomes passive; then U receives [request] message and replies (immediately) [deficit=1].
Next processor W receives [request] message; it replies [def=0] since it got no message yet
Meanwhile V activates W by sending it a message, receives reply from W and stops; receives [request] and replies [def=-1]
But W is still active….
How to work it out?
As we saw, a message can pass “behind the back” of the controller, since the model is asynchronous
Yet, if we add some additional boolean variable on each of processors, such as “was active since last request”, we can deal with this problem
But that means we will detect termination only 2*delta time after the termination actually occurs
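The controller's decision rule can be sketched over a serialized trace of probe snapshots (a hypothetical Python model, not the slides' code: each snapshot lists per-processor (deficit, was-active-since-last-probe) pairs; termination is declared only after two consecutive quiet probes with zero total deficit, reflecting the 2*delta detection lag):

```python
def detect_termination(snapshots):
    # snapshots[t]: list of (deficit, was_active_since_last_probe) pairs,
    # one per processor, gathered by the controller at probe t.
    # Declare termination at the first probe where the deficit sum is
    # zero, nobody was active, and the previous probe was also quiet.
    def quiet(snap):
        return not any(active for _, active in snap)
    for t in range(1, len(snapshots)):
        prev, cur = snapshots[t - 1], snapshots[t]
        if sum(d for d, _ in cur) == 0 and quiet(cur) and quiet(prev):
            return t
    return None                 # not yet terminated within the trace
```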
Variations, discussion, analysis
If there is more than one edge between the controller and a node, use the “echo” algorithm with initiator=controller, with the sum calculated inline
Not immediate detection, initiated by controller
A small delta causes a communication bottleneck, while a large delta causes a long period before detection
CSP and Arc Consistency
Formal definition: find x(i) from D(i) so that if x(i)=v and x(j)=w then Cij(v,w) holds
Problem is NP-complete in general
Arc-consistency is removing all redundant values: if for all w from D(j) Cij(v,w)=false, then remove v from D(i)
Of course, Arc-consistency is just the primary step of CSP solution
Sequential AC4 algorithm
For all Cij, v in Di, w in Dj
if Cij(v,w) then count[i,v,j]++; Supp[j,w].insert(<i,v>);
For all Cij, v in Di
checkRedundant(i,v,j);
While not Q.empty
<j,w>=Q.deque();
forall <i,v> in Supp[j,w]
count[i,v,j]--; checkRedundant(i,v,j);
Sequential AC4 algorithm: redundancy check
checkRedundant(i,v,j)
if (count[i,v,j]=0) then
Q.enque(<i,v>); Di.remove(v);
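The two slides above combine into the following sequential sketch (an illustrative Python AC4, not the slides' exact code; it assumes `constraints` holds both directions of every constraint, i.e. both (i,j) and (j,i) keys with swapped arguments):

```python
from collections import defaultdict, deque

def ac4(domains, constraints):
    # domains: i -> set of values (pruned in place).
    # constraints: (i, j) -> predicate C(v, w).
    count = defaultdict(int)    # count[i, v, j]: supports of (i, v) in Dj
    supp = defaultdict(list)    # supp[j, w]: pairs <i, v> that w supports
    queue = deque()

    def check_redundant(i, v, j):
        if count[i, v, j] == 0 and v in domains[i]:
            domains[i].discard(v)
            queue.append((i, v))

    for (i, j), c in constraints.items():       # initialization pass
        for v in list(domains[i]):
            for w in domains[j]:
                if c(v, w):
                    count[i, v, j] += 1
                    supp[j, w].append((i, v))
            check_redundant(i, v, j)
    while queue:                                # propagate removals
        j, w = queue.popleft()
        for i, v in supp[j, w]:
            count[i, v, j] -= 1
            check_redundant(i, v, j)
    return domains
```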
Distributed Arc consistency
Assume that each variable x(i) is assigned to a separate processor, and that all the mutually dependent variables are assigned to neighbors.
The main idea of the algorithm: Supp[j,w] and count[i,v,j] reside on the processor of x(j), while D(i) is on processor i; if v is to be removed from D(i), then processor j sends a message
Distributed AC4: Initialization
Initialization:
For all Cij,v in Di
For all w in Dj_initial
if Cij(v,w) count[v,j]++;
if count[v,j]=0 Redundant(v)
Redundant(v):
if v in Di
Di.remove(v); SendQueue.enque(v);
Distributed AC4: messaging
On not SendQueue.empty
v=SendQueue.deque
for all Cji send [remove v] to j
On reception of [remove w] from j
for all v in Di such that Cij(v,w)
count[v,j]--;
if count[v,j]=0 Redundant(v)
Distributed AC4: complexity
Assume A=max{|Di|}, m=|{Cij}|.
Sequential execution: both loops pass over all the Cij, v in Di and w in Dj => O(mA^2)
Distributed execution:
– Communication complexity: on each edge there can be at most A messages => O(mA);
– Time complexity: each node sends in parallel each of A messages => O(nA);
– Local computation: O(mA^2) because of initialization
Dist. AC4 – Final details
Termination detection is not obvious, and requires explicit implementation
– Usually a probe algorithm is preferred because of the big quantity of messages
AC4 ends in three possible states
– Contradiction
– Solution
– Arc-consistent sub-set
Task assignment for AC4
Our assumption was that each variable is assigned to different processor.
Special case is multiprocessor computer, when all the resources are on hand
In fact, it is an NP-hard problem to minimize communication cost when the assignment has to be done by the computer => heuristic approximation algorithms.
From AC4 to CSP
There are many heuristics, taught mainly in introductory AI courses (such as most restricted variable and most restricting value), that tell which variables should be chosen after arc-consistency is reached
On contradiction termination – usage in back-tracking
Loop cut-set example
Definition: Pit in loop L – a vertex in directed graph, such that both edges of L are incoming.
Goal: break loops in directed graph.
Formulation: Let G=<V,E> be a graph; find C, a subset of V, such that any loop in G contains at least one non-pit vertex from C.
Applications: Belief networks algorithms
Sequential solution
It can be shown that finding a minimal cut-set is an NP-hard problem, therefore approximations are used instead
Best known approximation – by Suermondt and Cooper – is shown on the next slide
Main idea: on each step drop all leaves, then among the vertices with at most 1 incoming edge pick the one common to the maximal number of cycles
Suermondt and Cooper algorithm
C:=empty;
While not V.empty do
remove all v such that deg(v)<=1;
K:={v in V: indeg(v)<=1}
v:=argmax{deg(v): v in K};
C.insert(v);
V.remove(v);
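A direct transliteration of the loop (an illustrative Python sketch; names are mine, and the fallback for an empty K is an assumption, since the original pseudocode leaves that case open):

```python
def loop_cutset(nodes, edges):
    # edges: directed pairs (u, v); deg counts edges in both directions,
    # indeg only incoming ones.
    nodes, edges = set(nodes), set(edges)
    def deg(v):
        return sum(1 for e in edges if v in e)
    def indeg(v):
        return sum(1 for (_, b) in edges if b == v)
    cutset = set()
    while nodes:
        changed = True                    # trim leaves: deg(v) <= 1
        while changed:
            changed = False
            for v in list(nodes):
                if deg(v) <= 1:
                    nodes.discard(v)
                    edges = {e for e in edges if v not in e}
                    changed = True
        if not nodes:
            break
        k = [v for v in nodes if indeg(v) <= 1]
        if not k:                         # assumed fallback when K is empty
            k = list(nodes)
        v = max(k, key=deg)               # maximal total degree wins
        cutset.add(v)
        nodes.discard(v)
        edges = {e for e in edges if v not in e}
    return cutset
```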
Edge case
There is still one subtlety (that isn’t described in Tel’s article) – what to do if K is empty while V is not (for example, if G is Euler path on octahedron)
Distributed version: ideas
Four parts of algorithm
– Variables and leaf trim – removal of all leaf nodes
– Control tree construction – tree for search of next cut node
– Cut node search – search of the best node to add to cut
– Controller shift – optimization, see later
Data structures for distributed version
Each node contains
– its activity status (nas: {yes, cut, non-cut})
– activity status of all adjacent edges-links (las: {yes, no})
– control status of all adjacent links (lcs: {basic, son, father, frond})
Leaf trim part
Idea: remove all leaves from the graph (put them into the non-cut state).
If the algorithm discovers that a node has 1 active edge left, it sends its unique neighbor [remove] message
Tracing-like termination detection
Leaf trim implementation
Var las[x]=yes; nas=yes;
procedure TrimTest
if |{x: las[x]}|=1 then
nas:=noncut; las[x]:=no;
send [remove] to x and receive [return] or [remove] back;
On reception of [remove] from x:
las[x]:=no; TrimTest; send [return] to x
Control tree search
For this goal the echo algorithm is used (with an appropriate variation – now each father should know the list of its children)
This task is completely independent from the previous (leaf trim), therefore they can be executed in parallel
During this task, lcs variable is set up
Control tree construction: implementation
Procedure constructSubtree
For all x s.t. lcs[x]=basic do
send [construct, father] to x
while exists x s.t. lcs[x]=basic
receive [construct, <i>] from y
lcs[y]:= (<i>=father)? frond : son
First [construct,father] message from x
lcs[x]:=father; {constructSubtree & TrimTest};
send [construct,son] to x;
Cut node search
Idea: pass over the control tree and combine results. For this we will need to collect (un-broadcast) on the control tree the maximal degree of a node in the sub-tree
Note that only nodes with indeg<=1 that are still active are considered (for this reason, Income represents incoming edges-neighbors).
Cut node search: implementation
Procedure NodeSearch:
my_degree:=(|{x in Income: las[x]}|<2)? |{x: las[x]}| : 0;
best_degree:=my_degree;
for all x: lcs[x]=son send [search] to x;
do |{x: lcs[x]=son}| times
receive [best_is, d] from x;
if (best_degree<d) then
best_degree:=d; best_branch:=x;
send [best_is, best_degree] to father;
Controller shift
This task has no counterpart in the sequential algorithm and is only an optimization issue
Idea: because the newly selected cut-node is the center of trim activity, the root of the control tree should move there.
In fact, this part doesn’t involve search, since at this stage we already know the path to the best degree along the best branches
Controller shift: root change
On [change_root] message from x
lcs[x]:=son;
if (best_branch=u) then
TrimFromNeighbors;
InitSearchCutnode;
else
lcs[best_branch]:=father;
send[change_root] to best_branch
Trim from neighbors
TrimFromNeighbors:
for all x:las[x] send [remove] to x;
do | {x:las[x]}| times
receive [return] or [remove] from x;
las[x]:=no;
Complexity
New measures: s=C.size; d=tree diameter
Communication complexity
– 2m for [remove/return] + 2m for [construct] + 2(n-1)(s+1) for [search/best_is] + sd for [change_root] => 4m+2sn
Time complexity without trim:
– 2d+2(s+1)d+sd=(3s+4)d
Trim time complexity:
– Worst case: 2(n-s)