

A Feedback Scheme to Reorder a Multi-Agent Execution Schedule by Persistently Optimizing a Switchable Action Dependency Graph

Alexander Berndt* Niels van Duijkeren† Luigi Palmieri† Tamás Keviczky*

Abstract

In this paper we consider multiple Automated Guided Vehicles (AGVs) navigating a common workspace to fulfill various intralogistics tasks, typically formulated as the Multi-Agent Path Finding (MAPF) problem. To keep plan execution deadlock-free, one approach is to construct an Action Dependency Graph (ADG) which encodes the ordering of AGVs as they proceed along their routes. Using this method, delayed AGVs occasionally require others to wait for them at intersections, thereby affecting the plan execution efficiency. If the workspace is shared by dynamic obstacles such as humans or third-party robots, AGVs can experience large delays. A common mitigation approach is to re-solve the MAPF using the current, delayed AGV positions. However, solving the MAPF is time-consuming, making this approach inefficient, especially for large AGV teams. In this work, we present an online method to repeatedly modify a given acyclic ADG to minimize the cumulative AGV route completion times. Our approach persistently maintains an acyclic ADG, necessary for deadlock-free plan execution. We evaluate the approach by considering simulations with random disturbances on the execution and show faster route completion times compared to the baseline ADG-based execution management approach.

Index terms— Robust Plan Execution, Scheduling and Coordination, Mixed Integer Programming, Multi-Agent Path Finding, Factory Automation

1 Introduction

Multiple Automated Guided Vehicles (AGVs) have been shown to be capable of efficiently performing intralogistics tasks such as moving inventory in distribution centers (Wurman, D'Andrea, and Mountz 2008). The coordination of AGVs in shared environments is typically formulated as the

*Alexander Berndt and Tamás Keviczky are with the Delft Center for Systems and Control (DCSC), TU Delft, 2628 CN Delft, The Netherlands, [email protected], [email protected].

†Niels van Duijkeren and Luigi Palmieri are with Robert Bosch GmbH, Corporate Research, Renningen, 71272, Germany, {Niels.vanDuijkeren, Luigi.Palmieri}@de.bosch.com.

Copyright © 2020, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.

Figure 1: A roadmap occupied by 50 AGVs (represented by colored dots). AGVs must efficiently navigate from a start to a goal position while avoiding collisions with one another, despite being subjected to delays.

Multi-Agent Path Finding (MAPF) problem, which has been shown to be NP-hard (Yu and LaValle 2012). The problem is to find trajectories for each AGV along a roadmap such that each AGV reaches its goal without colliding with the other AGVs, while minimizing the makespan. The MAPF problem typically considers an abstraction of the workspace to a graph where vertices represent spatial locations and edges pathways connecting two locations.

Recently, solving the MAPF problem has garnered widespread attention (Stern et al. 2019; Felner et al. 2017). This is mostly due to the abundance of application domains, such as intralogistics, airport taxi scheduling (Morris et al. 2016) and computer games (Ontañón et al. 2013). Solutions to the MAPF problem include Conflict-Based Search (CBS) (Sharon et al. 2015), Prioritized Planning using Safe Interval Path Planning (SIPP) (Yakovlev and Andreychuk 2017), declarative optimization approaches using answer set programming (Bogatarkan, Patoglu, and Erdem 2019), heuristic-guided coordination (Pecora et al. 2018) and graph-flow optimization approaches (Yu and LaValle 2013).

Algorithms such as CBS have been improved by exploiting properties such as geometric symmetry (Li et al. 2019), using purpose-built heuristics (Felner et al. 2018), or adopting a Mixed-Integer Linear Program (MILP) formulation where a branch-cut-and-price solver is used to yield significantly faster solution times (Lam et al. 2019).

Similarly, the development of bounded sub-optimal solvers such as Enhanced Conflict-Based Search (ECBS) (Barer et al. 2014) has further improved planning performance for higher dimensional state spaces. Continuous Conflict-Based Search (CCBS) can be used to determine MAPF plans for more realistic roadmap layouts (Andreychuk et al. 2019). As opposed to CBS, CCBS considers a weighted graph and continuous time intervals to describe collision avoidance constraints, albeit with increased solution times.

The abstraction of the MAPF to a graph search problem means that executing the MAPF plans requires monitoring of the assumptions made during the planning stage to ensure and maintain their validity. This is because irregularities such as vehicle dynamics and unpredictable delays influence plan execution. kR-MAPF addresses this by permitting delays up to a duration of k time-steps (Atzmon et al. 2020). Stochastic AGV delay distributions are considered in (Ma, Kumar, and Koenig 2017), where the MAPF is solved by minimizing the expected overall delay. These robust MAPF formulations and solutions inevitably result in more conservative plans compared to their nominal counterparts.

An Action Dependency Graph (ADG) encodes the ordering between AGVs as well as their kinematic constraints in a post-processing step after solving the MAPF (Hönig et al. 2017). Combined with an execution management approach, this allows AGVs to execute MAPF plans successfully despite kinematic constraints and unforeseen delays. This work was extended to allow for persistent re-planning (Hönig et al. 2019).

The aforementioned plan execution solutions in (Hönig et al. 2019; Atzmon et al. 2020; Ma, Kumar, and Koenig 2017) address the effects of delays by ensuring synchronous behavior among AGVs while maintaining the originally planned schedule's ordering. The result is that plan execution is unnecessarily inefficient when a single AGV is largely delayed and others are on schedule, since AGVs need to wait for the delayed AGV before continuing their plans. We observe that to efficiently mitigate the effects of large delays the plans should be adjusted continuously in an online fashion, where the main challenges are to maintain the original plan's deadlock- and collision-free guarantees.

In this paper, we present such an online approach capable of reordering AGVs based on a MAPF solution, allowing for efficient MAPF plan execution despite AGVs being subjected to large delays. This approach is fundamentally different from the aforementioned approaches (Hönig et al. 2017; Ma, Kumar, and Koenig 2017; Atzmon et al. 2020; Hönig et al. 2019) in that delays can be accounted for as they occur, instead of anticipating them a priori. The feedback nature of our approach additionally means solving the initial MAPF can be done assuming nominal plan execution, as opposed to solving a robust formulation which necessarily results in plans of longer length due to the increased conservativeness.

Our contributions include an optimization formulation based on a novel Switchable Action Dependency Graph (SADG) to re-order AGV dependencies. Monte Carlo simulation results show lower cumulative route completion times with real-time applicable optimization times while guaranteeing collision- and deadlock-free plan execution.

Working towards our proposed solution, we formally define the MAPF problem and the concept of an ADG in Section 2. Based on a modified version of this ADG, we introduce the concept of a reverse agent dependency in Section 3. This allows an alternative ordering of AGVs while maintaining a collision-free schedule. In Section 4, we formulate the choice of selecting between forward or reverse ADG dependencies as a mixed-integer linear programming problem. The optimization problem formulation guarantees that the resulting ADG allows plan execution to be both collision- and deadlock-free, while minimizing the predicted plan completion time. Finally, we compare this approach to the baseline ADG method in Section 5.

2 Preliminaries

Let us now introduce the fundamental concepts on which our approach is based, facilitated by the example shown in Fig. 2. Consider the representation of a workspace by a roadmap G = (V, E), where V is a set of vertices and E a set of edges, e.g., as in Fig. 2a.

Definition 1 (MAPF Solution). The roadmap G = (V, E) is occupied by a set of N AGVs where the ith AGV has start s_i ∈ V and goal g_i ∈ V, such that s_i ≠ s_j and g_i ≠ g_j ∀ i, j ∈ {1, . . . , N}, i ≠ j. A MAPF solution P = {P_1, . . . , P_N} is a set of N plans, each defined by a sequence P_i = {p_1, . . . , p_{N_i}} of tuples p = (l, t), with a location l ∈ V and a time t ∈ [0, ∞). The MAPF solution is such that, if every AGV perfectly follows its plan, then all AGVs will reach their respective goals in finite time without collision.

For a plan tuple p = (l, t), let us define the operators l = loc(p) and t = t̂(p), which return the location l ∈ V and planned time of plan tuple p respectively. Let S(l) ↦ S ⊂ R² be an operator which maps a location l (obtained from l = loc(p)) to a spatial region in the physical workspace in R². Let S_AGV ⊂ R² refer to the physical area occupied by an AGV.

In Fig. 2a, AGV1 and AGV2 have start and goal s_1 = A, g_1 = H and s_2 = E, g_2 = D, respectively. For this example, using CCBS (Andreychuk et al. 2019) yields P = {P_1, P_2} as

P_1 = {(A, 0), (B, 1.0), (C, 2.2), (G, 3.1), (H, 3.9)},
P_2 = {(E, 0), (F, 1.1), (G, 3.9), (C, 4.8), (D, 5.9)}.

Note the implicit ordering in P, stating that AGV1 traverses C–G before AGV2.
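For illustration, such a MAPF solution maps directly onto nested lists. The following minimal Python sketch (ours, not code from the paper) mirrors Definition 1 and the loc(·) and t̂(·) operators:

P1 = [("A", 0.0), ("B", 1.0), ("C", 2.2), ("G", 3.1), ("H", 3.9)]
P2 = [("E", 0.0), ("F", 1.1), ("G", 3.9), ("C", 4.8), ("D", 5.9)]
P = [P1, P2]   # MAPF solution: one plan per AGV

def loc(p):
    # loc(p): the roadmap location of a plan tuple p = (l, t)
    return p[0]

def t_hat(p):
    # t̂(p): the planned time of a plan tuple
    return p[1]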

2.1 Modified Action Dependency Graph

Based on a MAPF solution P, we can construct a modified version of the original Action Dependency Graph (ADG), formally defined in Definition 2. This modified ADG encodes the sequencing of AGV movements to ensure the plans are executed as originally planned despite delays.


Definition 2 (Action Dependency Graph). An ADG is a directed graph G_ADG = (V_ADG, E_ADG) where the vertices represent events of an AGV traversing a roadmap G. A vertex v_i^k = ({p_1, . . . , p_q}, status) ∈ V_ADG denotes the kth event of the ith AGV moving from loc(p_1), via intermediate locations, to loc(p_q), where q ≥ 2 denotes the number of consecutive plan tuples encoded within v_i^k, and status ∈ {staged, in-progress, completed}. The edge (v_i^k, v_j^l) ∈ E_ADG, from here on referred to as a dependency, states that v_j^l cannot be in-progress or completed until v_i^k is completed. An edge (v_i^k, v_j^l) ∈ E_ADG is classified as Type 1 if i = j and Type 2 if i ≠ j.
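As a concrete data structure, an ADG vertex can be modeled as follows; this Python sketch follows Definition 2 but is our own illustration, not the authors' implementation:

from dataclasses import dataclass
from enum import Enum

class Status(Enum):
    STAGED = "staged"
    IN_PROGRESS = "in-progress"
    COMPLETED = "completed"

@dataclass
class ADGVertex:
    agv: int            # i: the AGV this event belongs to
    k: int              # k: event index along AGV i's route
    plan_tuples: list   # [p1, ..., pq] with q >= 2
    status: Status = Status.STAGED

def dep_type(u: ADGVertex, v: ADGVertex) -> int:
    # An edge (u, v) is Type 1 if both events belong to the same AGV,
    # Type 2 otherwise (Definition 2).
    return 1 if u.agv == v.agv else 2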

Initially, the status of v_i^k is staged ∀ i, k. Let us introduce plan(v_i^k), which returns the sequence of plan tuples {p_1, . . . , p_q} for v_i^k ∈ V_ADG. Let the operators s(v_i^k) and g(v_i^k) return the start and goal vertices loc(p_1) and loc(p_q) of vertex v_i^k respectively, and let ⊕ denote the Minkowski sum. We also differentiate between planned and actual ADG vertex completion times. Let t̂_s(v_i^k) and t̂_g(v_i^k) denote the planned time that event v_i^k ∈ V_ADG starts (status changes from staged to in-progress) and is completed (status changes from in-progress to completed), respectively. An ADG can be constructed from a plan P using Algorithm 1. AGVs can execute their plans as originally described by the MAPF solution P by adhering to the ADG, defined next in Definition 3.

Definition 3 (Executing ADG-based plans). AGVs adhere to the ADG if each AGV only starts executing an ADG event v_i^k (status of v_i^k changes from staged to in-progress) once all dependencies pointing to v_i^k have status completed, for all v_i^k ∈ V_ADG.

Fig. 2b shows an example of AGVs adhering to the ADG. Observe how AGV2 cannot start v_2^2 before v_1^4 has been completed by AGV1, as dictated by Definition 3.
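Continuing the sketch above, the execution rule of Definition 3 reduces to a check over the incoming dependencies of a vertex; may_start is a hypothetical helper of ours, with dependencies given as (u, v) vertex pairs:

def may_start(v, edges):
    # v may move from staged to in-progress only once every dependency
    # (u, v) pointing at it has been completed (Definition 3).
    return v.status == Status.STAGED and all(
        u.status == Status.COMPLETED for (u, w) in edges if w is v
    )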

Next, we introduce Assumption 1, which we require to maintain deadlock-free behavior between AGVs when executing an ADG-based plan as described in Definition 3.

Assumption 1 (Acyclic ADG). The ADG constructed by Algorithm 1 using P as defined in Definition 1 is acyclic.

Remark 1. Assumption 1 can in practice always be satisfied in case the roadmap vertices outnumber the AGV fleet size, i.e. |V| > N (as is typically the case in warehouse robotics). A simple modification to existing MAPF solvers (e.g. an extra edge constraint in CBS) is sufficient to obtain ADGs that satisfy Assumption 1 (Hönig et al. 2019).

Unlike the originally proposed ADG algorithm, Algorithm 1 ensures that non-spatially-exclusive subsequent plan tuples are contained within a single ADG vertex, cf. line 7 of the algorithm. This property will prove useful with the introduction of reverse dependencies in Section 3.1. Despite these modifications, Algorithm 1 maintains the original algorithm's time complexity of O(N²n̄²), where n̄ = max_i N_i.


Figure 2: Illustrative MAPF problem example alongside the constructed Action Dependency Graph. (a) A roadmap graph occupied by two AGVs with start s_i and goal g_i for i = {1, 2}; the start and goal vertices are highlighted with dotted and solid colored circle outlines respectively, and the edge weights indicate the expected traversal times. (b) Illustration of the ADG where each vertex status (staged, in-progress, completed) is color coded and Type 1 and Type 2 dependencies are distinguished; it reflects the momentary progress of the AGVs in Fig. 2a.

Algorithm 1 Modified ADG construction based on (Hönig et al. 2019)

Input: MAPF solution P = {P_1, . . . , P_N}
Result: G_ADG

// Add ADG vertices and Type 1 dependencies
1:  for i = 1 to N do
2:    p ← p_i^1
3:    v ← ({p}, staged)
4:    v_prev ← None
5:    for k = 2 to N_i do
6:      Append p_i^k to plan(v)
7:      if S(loc(p)) ⊕ S_AGV ∩ S(loc(p_i^k)) ⊕ S_AGV = ∅ then
8:        Add v to V_ADG
9:        if v_prev not None then
10:         Add edge (v_prev, v) to E_ADG
11:       v_prev ← v
12:       p ← p_i^k
13:       v ← ({p}, staged)
// Add Type 2 dependencies
14: for i = 1 to N do
15:   for k = 1 to N_i do
16:     for j = 1 to N do
17:       if i ≠ j then
18:         for l = 1 to N_j do
19:           if s(v_i^k) = g(v_j^l) and t̂_g(v_i^k) ≤ t̂_g(v_j^l) then
20:             Add edge (v_i^k, v_j^l) to E_ADG
21: return G_ADG
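The pseudocode translates almost line-for-line into Python. The sketch below is our illustrative rendering, not the authors' code; it assumes a user-supplied predicate footprints_overlap(p, q) implementing the spatial test of line 7 (whether S(loc(p)) ⊕ S_AGV intersects S(loc(q)) ⊕ S_AGV), and stores the graph with networkx:

import networkx as nx

def build_adg(plans, footprints_overlap):
    adg = nx.DiGraph()
    verts = []  # verts[i]: committed vertices of AGV i, in route order
    # Lines 1-13: vertices and Type 1 dependencies
    for i, plan in enumerate(plans):
        row, tuples, prev = [], [plan[0]], None
        for p in plan[1:]:
            tuples.append(p)                          # line 6
            if not footprints_overlap(tuples[0], p):  # line 7
                v = (i, len(row))  # vertex id: (AGV index, event index)
                adg.add_node(v, plan=list(tuples), status="staged")
                if prev is not None:
                    adg.add_edge(prev, v, dep_type=1)
                prev = v
                row.append(v)
                tuples = [p]                          # line 13
        verts.append(row)
    # Lines 14-20: Type 2 dependencies
    for i, row_i in enumerate(verts):
        for j, row_j in enumerate(verts):
            if i == j:
                continue
            for vi in row_i:
                for vj in row_j:
                    start_i = adg.nodes[vi]["plan"][0][0]   # s(v_i^k)
                    goal_j = adg.nodes[vj]["plan"][-1][0]   # g(v_j^l)
                    tg_i = adg.nodes[vi]["plan"][-1][1]     # planned completion
                    tg_j = adg.nodes[vj]["plan"][-1][1]
                    if start_i == goal_j and tg_i <= tg_j:
                        adg.add_edge(vi, vj, dep_type=2)    # line 20
    return adg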


Due to delays, the planned and actual ADG vertex times may differ. Much like the previously introduced planned event start and completion times t̂_s(v_i^k) and t̂_g(v_i^k), we also introduce t_s(v_i^k) and t_g(v_i^k), which denote the actual start and completion times of event v_i^k ∈ V_ADG respectively. Note that if the MAPF solution is executed nominally, i.e. the AGVs experience no delays, then t_s(v_i^k) = t̂_s(v_i^k) and t_g(v_i^k) = t̂_g(v_i^k) for all v_i^k ∈ V_ADG. Let us introduce an important property of an ADG-managed plan-execution scheme, Proposition 1, concerning guarantees of successful plan execution.

Proposition 1 (Collision- and deadlock-free ADG plan execution). Consider an ADG, G_ADG, constructed from a MAPF solution as defined in Definition 1 using Algorithm 1, satisfying Assumption 1. If the AGV plan execution adheres to the dependencies in G_ADG, then, assuming the AGVs are subjected to a finite number of delays of finite duration, the plan execution will be collision-free and completed in finite time.

Proof 1: Proof by induction. Consider that AGV_i and AGV_j traverse a common vertex p̄ ∈ G along their plans P_i and P_j, for any i, j ∈ {1, . . . , N}, i ≠ j. By lines 1-13 of Algorithm 1, this implies g(v_i^k) = s(v_j^l) = p̄ for some v_i^k, v_j^l ∈ V_ADG. By lines 14-20 of Algorithm 1, common vertices of P_i and P_j in G will result in a Type 2 dependency (v_i^k, v_j^l) if p̄ = s(v_j^l) = g(v_i^k) and t̂_g(v_i^k) ≤ t̂_g(v_j^l). For the base step: initially, all ADG dependencies have been adhered to since v_i^1 is staged ∀ i ∈ {1, . . . , N}. For the inductive step: assuming vertices up until v_i^{k-1} and v_j^{l-1} have been completed in accordance with all ADG dependencies, it is sufficient to ensure AGV_i and AGV_j will not collide at p̄ while completing v_i^k and v_j^l respectively, by ensuring t_s(v_i^k) > t_g(v_j^l). By line 19 of Algorithm 1 the Type 2 dependency (v_i^k, v_j^l) guarantees t_s(v_i^k) > t_g(v_j^l). Since, by Assumption 1, the ADG is acyclic, at least one vertex of the ADG can be in-progress at all times. By the finite nominal execution time of the MAPF solution in Definition 1, despite a finite number of delays of finite duration, finite-time plan completion is established. This completes the proof. □

3 Switching Dependencies in the Action Dependency Graph

We now introduce the concept of a reversed ADG dependency. In the ADG, Type 2 dependencies essentially encode an ordering constraint for AGVs visiting a vertex in G. The idea is to switch this ordering to minimize the effect an unforeseen delay has on the task completion time of each AGV.

3.1 Reverse Type 2 Dependencies

We introduce the notion of a reverse Type 2 dependency in Definition 4. It states that a dependency and its reverse encode the same collision avoidance constraints, but with a reversed AGV ordering. Lemma 1 can be used to obtain a dependency which conforms to Definition 4. Lemma 1 is illustrated graphically in Fig. 3.

Definition 4 (Reverse Type 2 dependency). Consider a Type 2 dependency d = (v_i^k, v_j^l); d requires t_s(v_j^l) ≥ t_g(v_i^k). A reverse dependency of d is a dependency d′ that ensures t_s(v_i^k) ≥ t_g(v_j^l).

Figure 3: A subset of an ADG with a dependency (black) and its reverse (red).

Lemma 1 (Reversed Type 2 dependency). Let v_i^k, v_j^l, v_j^{l+1}, v_i^{k-1} ∈ V_ADG. Then d′ = (v_j^{l+1}, v_i^{k-1}) is the reverse dependency of d = (v_i^k, v_j^l).

Proof 2: The dependency d = (v_i^k, v_j^l) encodes the constraint t_s(v_j^l) ≥ t_g(v_i^k). The reverse of d is denoted d′ = (v_j^{l+1}, v_i^{k-1}); d′ encodes the constraint t_s(v_i^{k-1}) ≥ t_g(v_j^{l+1}). By definition, t_s(v_i^k) ≥ t_g(v_i^{k-1}) and t_s(v_j^{l+1}) ≥ t_g(v_j^l). Since t_g(v) ≥ t_s(v), this implies that d′ encodes the constraint t_s(v_i^k) ≥ t_g(v_j^l), satisfying Definition 4. □

The modified ADG ensures that reverse dependencies maintain collision avoidance, since adjacent vertices in V_ADG refer to spatially different locations, cf. line 7 in Algorithm 1.

3.2 Switchable Action Dependency Graph

Having introduced reverse Type 2 dependencies, it is necessary to formalize the manner in which we can select dependencies to obtain a resultant ADG. A cyclic ADG implies that two events are mutually dependent, in turn implying a deadlock. To ensure deadlock-free plan execution, it is sufficient to ensure that the selected dependencies result in an acyclic ADG. Additionally, to maintain the collision-avoidance guarantees implied by the original ADG, it is sufficient to select at least one of the forward or reverse dependencies of each forward-reverse dependency pair in the resultant ADG. Since selecting both a forward and reverse dependency always results in a cycle within the ADG, we must therefore select either the forward or the reverse dependency. To this end, we formally define a Switchable Action Dependency Graph (SADG) in Definition 5, which can be used to obtain the resultant ADG given a selection of forward or reverse dependencies.

Definition 5 (Switchable Action Dependency Graph). Let an ADG as in Definition 2 contain m_T forward-reverse dependency pairs determined using Definition 4. From this ADG we can construct a Switchable Action Dependency Graph SADG(b) : {0, 1}^{m_T} → G, where G is the set of all possible ADG graphs obtained by the boolean vector b = {b_1, . . . , b_{m_T}}, where b_m = 0 and b_m = 1 imply selecting the forward and reverse dependency of pair m respectively, for m ∈ {1, . . . , m_T}.
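Evaluating SADG(b) then amounts to swapping edges in the graph. A minimal sketch of ours, assuming the ADG initially carries all forward dependencies and switchable_pairs lists (forward, reverse) edge pairs:

import networkx as nx

def resultant_adg(adg, switchable_pairs, b):
    # SADG(b): for every pair m with b[m] = 1, replace the forward
    # dependency by its reverse (Definition 5).
    g = adg.copy()
    for m, (fwd, rev) in enumerate(switchable_pairs):
        if b[m]:
            g.remove_edge(*fwd)
            g.add_edge(*rev, dep_type=2)
    return g

# Deadlock-freedom of a candidate selection reduces to an acyclicity check:
# nx.is_directed_acyclic_graph(resultant_adg(adg, switchable_pairs, b))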


Corollary 1 (SADG plan execution). Consider an SADG, SADG(b), as in Definition 5. If b is chosen such that G_ADG = SADG(b) is acyclic, and no dependencies in G_ADG point from vertices that are staged or in-progress to vertices that are completed, G_ADG will guarantee collision- and deadlock-free plan execution.

Proof 3: By Proposition 1, any b guarantees collision-free plans, since at least one dependency of each forward-reverse dependency pair is selected. If b ensures G_ADG = SADG(b) is acyclic, and the resultant ADG has no dependencies pointing from vertices that are staged or in-progress to vertices that are completed, the dependencies within the ADG are not mutually constraining, guaranteeing deadlock-free plan execution. □

The challenge is to find a b which ensures SADG(b) is acyclic while simultaneously minimizing the cumulative route completion times of the AGV fleet. This is formulated as an optimization problem in Section 4.

4 Optimization-Based Approach

Having introduced the SADG, we now formulate an optimization problem which can be used to determine b such that the resultant ADG is acyclic, while minimizing cumulative AGV route completion times. The result is a Mixed-Integer Linear Program (MILP) which we solve in a closed-loop feedback scheme, since the optimization problem updates the AGV ordering at each iteration based on the delays measured at that time-step.

4.1 Translating a Switchable Action Dependency Graph to Temporal Constraints

Regular ADG Constraints Let us introduce the optimization variable t_{i,s}^k which, once a solution to the optimization problem is determined, will be equal to t_s(v_i^k). The same relation applies to the optimization variable t_{i,g}^k and t_g(v_i^k). The event-based constraints within the SADG can be used in conjunction with a predicted duration of each event to determine when each AGV is expected to complete its plan. Let τ(v_i^k) be the modeled time it will take AGV_i to complete event v_i^k ∈ V_ADG based solely on dynamical constraints and route distance, assuming the AGV is not blocked. For example, we could let τ equal the roadmap edge length divided by the expected nominal AGV velocity. We can now specify the temporal constraints corresponding to the Type 1 dependencies of the plan of AGV_i as

t_{i,g}^1 ≥ t_{i,s}^1 + τ(v_i^1),
t_{i,s}^2 ≥ t_{i,g}^1,
t_{i,g}^2 ≥ t_{i,s}^2 + τ(v_i^2),
t_{i,s}^3 ≥ t_{i,g}^2,
  ⋮
t_{i,s}^{N_i} ≥ t_{i,g}^{N_i−1},
t_{i,g}^{N_i} ≥ t_{i,s}^{N_i} + τ(v_i^{N_i}).      (1)

Consider a Type 2 dependency (v_i^k, v_j^l) within the ADG. This can be represented by the temporal constraint

t_{j,s}^l > t_{i,g}^k,      (2)

where the strict inequality is required to guarantee that AGV_i and AGV_j never occupy the same spatial region.
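As a sketch of how Eq. (1) and Eq. (2) can be posed with an off-the-shelf MILP modeler, the snippet below uses PuLP. Since MILP solvers cannot impose strict inequalities, a small margin EPS stands in for ">", a modeling choice of ours rather than something prescribed by the paper; ts and tg are dictionaries of variables indexed by (i, k), and tau(i, k) is assumed to return τ(v_i^k):

import pulp

EPS = 1e-3  # surrogate for the strict inequality in Eq. (2)

def add_type1_constraints(prob, ts, tg, tau, i, N_i):
    # Eq. (1): the events of AGV i run in sequence, each lasting
    # at least tau(v_i^k)
    for k in range(1, N_i + 1):
        prob += tg[i, k] >= ts[i, k] + tau(i, k)
        if k < N_i:
            prob += ts[i, k + 1] >= tg[i, k]

def add_type2_constraint(prob, ts, tg, dep):
    # Eq. (2): t_{j,s}^l > t_{i,g}^k for a non-switchable Type 2 dependency
    (i, k), (j, l) = dep
    prob += ts[j, l] >= tg[i, k] + EPS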

Adding Switchable Dependency Constraints We now introduce the temporal constraints which represent the selection of forward or reverse dependencies in the SADG. Initially, consider the set E_ADG^Type2 = {e ∈ E_ADG | e is Type 2}, which represents the set of all Type 2 dependencies. The aim here is to determine a set E_ADG^switchable ⊂ E_ADG^Type2 containing the dependencies which could potentially be switched and form part of the MILP decision space.

Consider e_fwd = (v_f, v′_f) ∈ E_ADG^Type2 and its reverse dependency e_rev = (v_r, v′_r). e_fwd and e_rev are contained within E_ADG^switchable if the status of v_f, v′_f, v_r, v′_r is staged. An illustrative example of the dependencies contained within E_ADG^switchable is shown in Fig. 4. Having determined E_ADG^switchable, the next step is to include the switched dependencies as temporal constraints. Directly referring to Section 3.2, we assume m_T forward-reverse dependency pairs in E_ADG^switchable, where the Boolean b_m is used to select the forward or reverse dependency of the mth forward-reverse dependency pair, m ∈ {1, . . . , m_T}. These temporal constraints can be written as

t_{j,s}^l > t_{i,g}^k − b_m M,
t_{i,s}^{k−1} > t_{j,g}^{l+1} − (1 − b_m) M,      (3)

where M is a large, positive constant such that M > max_i t_i^{N_i}. Note that max_i t_i^{N_i} can be approximated by estimating the maximum anticipated delays experienced by the AGVs. In practice, however, finding such an upper bound on delays is not evident, so we choose M to be a conservatively high value.

4.2 Optimization Problem Formulation

We have shown that an SADG is represented by the temporal constraints in Eq. (1) through Eq. (3) for i ∈ {1, . . . , N}, m ∈ {1, . . . , m_T}. Minimizing the cumulative route completion time of all AGVs is formulated as the following optimization problem

min_{b, t_s, t_g}  Σ_{i=1}^{N} t_{i,g}^{N_i}
s.t.  Eq. (1) ∀ i ∈ {1, . . . , N},
      Eq. (2) ∀ e ∈ E_ADG^Type2 \ E_ADG^switchable,
      Eq. (3) ∀ e ∈ E_ADG^switchable,      (4)

where b ∈ {0, 1}^{m_T} is a vector containing all the binary variables b_m, and the vectors t_s and t_g contain all the variables t_{i,s}^k and t_{i,g}^k respectively, ∀ k ∈ {1, . . . , N_i}, i ∈ {1, . . . , N}.
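Assembling Eq. (4) with the helpers above gives the following sketch; CBC is invoked through PuLP's bundled interface, matching the solver used in Section 5, though the function and argument layout here are our own:

def solve_sadg_milp(N, Ni, tau, fixed_deps, switchable_pairs, M=1e4):
    # Ni[i] is the number of events of AGV i; fixed_deps covers
    # E^Type2 \ E^switchable, switchable_pairs covers E^switchable.
    prob = pulp.LpProblem("sadg_reordering", pulp.LpMinimize)
    ts = {(i, k): pulp.LpVariable(f"ts_{i}_{k}", lowBound=0)
          for i in range(N) for k in range(1, Ni[i] + 1)}
    tg = {(i, k): pulp.LpVariable(f"tg_{i}_{k}", lowBound=0)
          for i in range(N) for k in range(1, Ni[i] + 1)}
    # Objective of Eq. (4): cumulative route completion time
    prob += pulp.lpSum(tg[i, Ni[i]] for i in range(N))
    for i in range(N):
        add_type1_constraints(prob, ts, tg, tau, i, Ni[i])
    for dep in fixed_deps:
        add_type2_constraint(prob, ts, tg, dep)
    bs = [add_switchable_pair(prob, ts, tg, pair, m, M)
          for m, pair in enumerate(switchable_pairs)]
    prob.solve(pulp.PULP_CBC_CMD(msg=False))
    return [int(pulp.value(b)) for b in bs]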


Figure 4: Dependencies contained in E_ADG^switchable, shown in black (forward) and red (reverse). Gray dependencies are in E_ADG^Type2 \ E_ADG^switchable.

4.3 Solving the MILP in a Feedback Loop

The aforementioned optimization formulation can be solved based on the current AGV positions in a feedback loop. The result is a continuously updated G_ADG which guarantees minimal cumulative route completion times based on current AGV delays. This feedback strategy is defined in Algorithm 2.

An important aspect of optimal feedback control strategies is that of recursive feasibility, which means that the optimization problem remains feasible as long as the control law is applied. The control strategy outlined in Algorithm 2 is guaranteed to remain recursively feasible, as formally shown in Proposition 2.

Algorithm 2 Switching ADG Feedback Scheme

1: Get goals and locations
2: Solve MAPF to obtain P
3: Construct ADG using Algorithm 1
4: Determine SADG(b) and set b = 0 (see Section 4.1)
5: while Plans not done do
6:   Get current position along plans for each robot
7:   b ← MILP in Eq. (4)
8:   ADG ← SADG(b)
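In code, the feedback scheme is a thin loop around the sketches above. Here fleet, update_statuses, all_plans_done and split_dependencies are hypothetical placeholders for a concrete execution system; only the remaining helpers were sketched earlier:

def run_feedback_scheme(plans, fleet, footprints_overlap, N, Ni, tau):
    adg = build_adg(plans, footprints_overlap)           # Algorithm 2, line 3
    while not all_plans_done(adg):                       # line 5
        update_statuses(adg, fleet.current_positions())  # line 6
        fixed, pairs = split_dependencies(adg)           # Section 4.1
        b = solve_sadg_milp(N, Ni, tau, fixed, pairs)    # line 7: Eq. (4)
        adg = resultant_adg(adg, pairs, b)               # line 8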

Proposition 2 (Recursive Feasibility). Consider an ADG, as defined in Definition 2, which is acyclic at time t = 0. Consecutively applying the MILP solution from Eq. (4) is guaranteed to ensure the resultant ADG remains acyclic for all t > 0.

Proof 4: Proof by induction. Consider an acyclic ADG as defined in Definition 2, at a time t. The MILP in Eq. (4) always has the feasible solution b = 0 if the initial ADG (from which the MILP's constraints in Eq. (1) through Eq. (3) are defined) is acyclic. Any improved solution of the MILP with b ≠ 0 is necessarily feasible, implying a resultant acyclic ADG. The MILP is therefore guaranteed to return a feasible solution, and the resultant ADG will always be acyclic if the ADG before the MILP was solved was acyclic. Since the ADG at t = 0 is acyclic (a direct result of a MAPF solution), it will remain acyclic for all t > 0. □

4.4 Decreasing Computational Effort

The time required to solve the MILP directly affects the real-time applicability of this approach. In general, the complexity of the MILP increases exponentially in the number of binary variables. To render the MILP less computationally demanding, it is therefore most effective to decrease the number of binary variables. We present two complementary methods to achieve this goal.

Figure 5: Dependency selection for a horizon of 4 vertices. Switchable dependency pairs are shown in black (forward) and red (reverse). Regular dependencies considered in the MILP are green. Dependencies not considered are gray.

Switching Dependencies in a Receding Horizon Instead of including all switchable dependency pairs in the set E_ADG^switchable, we can include only the switchable dependencies associated with vertices within a horizon H from the last completed vertex. An illustration of such a selection for H = 4 is shown in Fig. 5. Note that dependency selection using this approach maintains ADG acyclicity as in the infinite horizon case, because the set E_ADG^switchable is smaller, and Eq. (4) remains recursively feasible since the trivial solution guarantees an acyclic ADG at every time-step. Proposition 2 is equally valid when only considering switchable dependencies in a receding horizon. The horizon length H can be seen as a tuning parameter which offers a trade-off between computational complexity and solution optimality.
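A sketch of this horizon-based filtering, reusing the (AGV index, event index) vertex ids and assuming last_completed[i] holds the index of AGV i's last completed vertex:

def pairs_in_horizon(pairs, last_completed, H):
    # Keep a switchable pair only if all four of its vertices lie within
    # H events of the owning AGV's last completed vertex; the remaining
    # Type 2 dependencies stay in the MILP as fixed Eq. (2) constraints.
    kept = []
    for fwd, rev in pairs:
        vs = (fwd[0], fwd[1], rev[0], rev[1])
        if all(k - last_completed[i] <= H for (i, k) in vs):
            kept.append((fwd, rev))
    return kept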

Note that, to guarantee recursive feasibility, any switchable dependencies which are not within the horizon H (e.g. the green dependencies in Fig. 5) still need to be considered within the MILP by applying the constraint in Eq. (2). Future work will look into a receding horizon approach that does not necessarily require the consideration of all these constraints while guaranteeing recursive feasibility.

Dependency Grouping We observed that multiple dependencies often form patterns, two of which are shown in Fig. 6. These patterns are referred to as same-direction and opposite-direction dependency groups, shown in Fig. 6a and Fig. 6b respectively. These groups share the property that the resultant ADG is acyclic if and only if either all the forward or all the reverse dependencies are active. This means that a single binary variable is sufficient to describe the switching of all the dependencies within the group, decreasing the variable space of the MILP in Eq. (4). Once such a dependency group has been identified, the temporal constraints can be defined as

t_{j,s}^l > t_{i,g}^k − b_DG M        ∀ (v_i^k, v_j^l) ∈ DG_fwd,
t_{j,s}^l > t_{i,g}^k − (1 − b_DG) M  ∀ (v_i^k, v_j^l) ∈ DG_rev,      (5)

where DG_fwd and DG_rev refer to the forward and reverse dependencies of a particular grouping respectively, and b_DG is a binary variable which switches all the forward or all the reverse dependencies in the entire group simultaneously.
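In the PuLP sketch, a dependency group therefore contributes a single binary; we assume group_rev already stores the reversed pairs in the same ((i, k), (j, l)) head-tail form used in Eq. (5):

def add_dependency_group(prob, ts, tg, group_fwd, group_rev, gid, M=1e4):
    # Eq. (5): one binary b_DG flips every dependency in the group at
    # once, shrinking the binary decision space of the MILP in Eq. (4).
    b_dg = pulp.LpVariable(f"b_dg_{gid}", cat="Binary")
    for (i, k), (j, l) in group_fwd:
        prob += ts[j, l] >= tg[i, k] + EPS - M * b_dg
    for (i, k), (j, l) in group_rev:
        prob += ts[j, l] >= tg[i, k] + EPS - M * (1 - b_dg)
    return b_dg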


Figure 6: Dependency groups: (a) same-direction and (b) opposite-direction. Each dependency is either original (black) or reversed (red). Reverse and forward dependency pairings are differentiated by line styles.

5 Evaluation

We design a set of simulations to evaluate the approach in terms of re-ordering efficiency when AGVs are subjected to delays while following their initially planned paths. We use the method presented by Hönig et al. (Hönig et al. 2019) as a comparison baseline. All simulations were conducted on a Lenovo ThinkStation with an Intel Xeon E5-1620 3.5 GHz processor and 64 GB of RAM.

5.1 Simulation Setup

The simulations consider a roadmap as shown in Fig. 1. AGV teams of size {30, 40, 50, 60, 70} are each initialized with random start and goal positions. ECBS (Barer et al. 2014) is used to solve the MAPF with sub-optimality factor w = 1.6. We consider delays of duration k = {1, 3, 5, 10, 15, 20, 25} time-steps. At the kth time-step, a random subset (20%) of the AGVs is stopped for a length of k. Eq. (4) is solved at each time-step with M = 10⁴. We evaluate our approach using a Monte Carlo method: for each AGV team size and delay duration configuration, we consider 100 different randomly selected start/goal positions. The receding horizon dependency selection and dependency groups are used as described in Section 4.4.

5.2 Performance Metric and Comparison

Performance is measured by considering the cumulative plan completion time of all the AGVs. This is compared to the same metric using the original ADG approach with no switching as in (Hönig et al. 2019), which is equivalent to forcing the solution of Eq. (4) to b = 0 at every time-step. The improvement is defined as

improvement = (Σ t_baseline − Σ t_switching) / (Σ t_baseline) · 100%,

where Σ t_* refers to the cumulative plan completion time for all AGVs. Note that we consider cumulative plan completion time instead of the makespan because we want to ensure each AGV completes its plans as soon as possible, such that it can be assigned a new task.

Another important consideration is the time it takes to solve the MILP in Eq. (4) at each time-step. For our simulations, the MILP was solved using the academically oriented Coin-or Branch-and-Cut (CBC) solver (Forrest et al. 2018). However, based on preliminary tests, we noted better performance using the commercial solver Gurobi (Gurobi Optimization, LLC 2020), which yielded computational time improvements by a factor of 1.1 up to 20.

Figure 7: Average improvement of 100 scenarios for various delay lengths and AGV group sizes {30, 40, 50, 60, 70}. Each scenario refers to different randomly generated starts/goals and a randomly selected subset of delayed AGVs. Solid lines depict the average, lighter regions encapsulate the min-max values.

5.3 Results and Discussion

To showcase the efficacy of our approach, we first determine the average improvement of 100 random scenarios using the minimum switching horizon length of 1 for different AGV team sizes and delay lengths, shown in Fig. 7. The average improvement is highly correlated with the delay duration experienced by the AGVs.

Considering Fig. 7, it is worth noting how the graph layout and AGV-to-roadmap density affect the results: the AGV group size of 40 shows the best average improvement for a given delay duration. This leads the authors to believe there is an optimal AGV group size for a given roadmap, which ensures the workspace is both:

1. Not too congested, such that switching of dependencies is not rendered impossible by the high density of AGVs occupying the map.

2. Not too sparse, such that switching remains worthwhile; when AGVs are distant from one another, switching rarely improves task completion time.

Considering the switching dependency horizon, Fig. 8 shows the average improvement for 100 random start/goal positions and delayed AGV subset selections. We observe that a horizon length of 1 already significantly improves performance, and larger horizons gradually increase performance further for larger AGV teams.

Fig. 9 shows the peak computation time for various horizon lengths and AGV team sizes. As expected, the computation time grows exponentially with horizon size and AGV team size. Two additional observations were made:

1. High variability in results. Note the high variability in improvement indicated by the large lighter regions in Fig. 7. This means that for different random start/goal and delay configurations, the improvement varied significantly. This is due to the fact that each start/goal combination provides differing degrees of freedom from an ADG switching perspective.

Figure 8: Average improvement of 100 random start/goal positions and delayed AGV subsets, for different switching horizon lengths and AGV group sizes: (a) delay k = 3; (b) delay k = 25. Solid lines depict the average, lighter regions encapsulate the min-max values.

Figure 9: The peak computation time to solve the optimization problem for different AGV team sizes and considered dependency horizon lengths. This plot considers the average improvement for delays k = {1, 3, 5, 10, 15, 20, 25}. Solid lines depict the average, lighter regions encapsulate the min-max values.

2. Occasional worse performance. Occasionally, albeit rarely, our approach would yield a negative improvement for a particular random start/goal configuration. This was typically observed for small delay durations. The reason is that the optimization problem solves the switching assuming no future delays. However, it may happen that the AGV which was allowed ahead of another is itself delayed in the near future, additionally delaying the AGV it surpassed. We believe a robust optimization approach could potentially resolve this.

Finally, we emphasize that our proposed approach:

1. Is complementary and could be used together with the methods presented in (Atzmon et al. 2020; Hönig et al. 2019) and other works;

2. Applies to directional and weighted roadmaps (since ADG switching retains the direction of the original MAPF plan);

3. Extends to persistent planning schemes as in (Hönig et al. 2019), as long as all AGV plans P are known when the ADG is constructed.

6 Conclusions and Future Work

In this paper, we introduced a novel method which, given a MAPF solution, can be used to switch the ordering of AGVs in an online fashion based on currently measured AGV delays. This switching was formulated as an optimization problem within a feedback control scheme, while maintaining the deadlock- and collision-free guarantees of the original MAPF plan. Results show that our approach clearly improves the cumulative task completion time of the AGVs when a subset of AGVs is subjected to delays.

In future work, we plan to consider a receding horizon optimization approach. In this work, ADG dependencies can be switched in a receding horizon fashion, but the plans still need to be of finite length for the optimization problem to be formulated. For truly persistent plans (theoretically infinite-length plans), it is necessary to develop a receding horizon optimization formulation to apply the method proposed in this paper.

Another possible extension is complementing our approach with a local re-planning method. Our approach maintains the originally planned trajectories of the AGVs; however, we observed that under large delays the originally planned routes can become highly inefficient, because the AGVs end up in entirely different locations along their planned paths. This could potentially be addressed by introducing local re-planning of trajectories.

To avoid the occasional worse performance, we suggest a robust optimization approach to avoid switching dependencies which could have a negative impact on the plan execution given expected future delays.

Finally, to further validate this approach, it is desirable to move towards system-level tests on a real-world intralogistics setup.


References

Andreychuk, A.; Yakovlev, K.; Atzmon, D.; and Stern, R. 2019. Multi-agent pathfinding with continuous time. In Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence, IJCAI-19, 39-45. International Joint Conferences on Artificial Intelligence Organization.

Atzmon, D.; Stern, R.; Felner, A.; Wagner, G.; Barták, R.; and Zhou, N.-F. 2020. Robust multi-agent path finding and executing. Journal of Artificial Intelligence Research 67:549-579.

Barer, M.; Sharon, G.; Stern, R.; and Felner, A. 2014. Suboptimal variants of the conflict-based search algorithm for the multi-agent pathfinding problem. In Proceedings of the European Conference on Artificial Intelligence (ECAI), 961-962.

Bogatarkan, A.; Patoglu, V.; and Erdem, E. 2019. A declarative method for dynamic multi-agent path finding. In Calvanese, D., and Iocchi, L., eds., GCAI 2019. Proceedings of the 5th Global Conference on Artificial Intelligence, volume 65 of EPiC Series in Computing, 54-67. EasyChair.

Felner, A.; Stern, R.; Shimony, S. E.; Boyarski, E.; Goldenberg, M.; Sharon, G.; Sturtevant, N. R.; Wagner, G.; and Surynek, P. 2017. Search-based optimal solvers for the multi-agent pathfinding problem: Summary and challenges. In Fukunaga, A., and Kishimoto, A., eds., Proceedings of the Tenth International Symposium on Combinatorial Search, SOCS 2017, 16-17 June 2017, Pittsburgh, Pennsylvania, USA, 29-37. AAAI Press.

Felner, A.; Li, J.; Boyarski, E.; Ma, H.; Cohen, L.; Kumar, S.; and Koenig, S. 2018. Adding heuristics to conflict-based search for multi-agent path finding. In Proceedings of the International Conference on Automated Planning and Scheduling (ICAPS), 83-87.

Forrest, J.; Ralphs, T.; Vigerske, S.; LouHafer; Kristjansson, B.; jpfasano; EdwinStraver; Lubin, M.; Santos, H. G.; rlougee; and Saltzman, M. 2018. coin-or/cbc: Version 2.9.9.

Gurobi Optimization, LLC. 2020. Gurobi optimizer reference manual.

Hönig, W.; Kumar, S.; Cohen, L.; Ma, H.; Xu, H.; Ayanian, N.; and Koenig, S. 2017. Summary: Multi-agent path finding with kinematic constraints. In Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI), 4869-4873.

Hönig, W.; Kiesel, S.; Tinka, A.; Durham, J.; and Ayanian, N. 2019. Persistent and robust execution of MAPF schedules in warehouses. IEEE Robotics and Automation Letters 4(2):1125-1131.

Lam, E.; Le Bodic, P.; Harabor, D.; and Stuckey, P. 2019. Branch-and-cut-and-price for multi-agent pathfinding. In Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI), (in print).

Li, J.; Harabor, D.; Stuckey, P.; Ma, H.; and Koenig, S. 2019. Symmetry-breaking constraints for grid-based multi-agent path finding. In Proceedings of the AAAI Conference on Artificial Intelligence (AAAI), (in print).

Ma, H.; Kumar, S.; and Koenig, S. 2017. Multi-agent path finding with delay probabilities. In Proceedings of the AAAI Conference on Artificial Intelligence (AAAI), 3605-3612.

Morris, R.; Pasareanu, C. S.; Luckow, K. S.; Malik, W.; Ma, H.; Kumar, T. K. S.; and Koenig, S. 2016. Planning, scheduling and monitoring for airport surface operations. In AAAI Workshop: Planning for Hybrid Systems.

Ontañón, S.; Synnaeve, G.; Uriarte, A.; Richoux, F.; Churchill, D.; and Preuss, M. 2013. A survey of real-time strategy game AI research and competition in StarCraft. IEEE Transactions on Computational Intelligence and AI in Games 5(4):293-311.

Pecora, F.; Andreasson, H.; Mansouri, M.; and Petkov, V. 2018. A loosely-coupled approach for multi-robot coordination, motion planning and control. In Twenty-Eighth International Conference on Automated Planning and Scheduling.

Sharon, G.; Stern, R.; Felner, A.; and Sturtevant, N. R. 2015. Conflict-based search for optimal multi-agent pathfinding. Artificial Intelligence 219:40-66.

Stern, R.; Sturtevant, N.; Felner, A.; Koenig, S.; Ma, H.; Walker, T. T.; Li, J.; Atzmon, D.; Cohen, L.; Kumar, T. K. S.; Boyarski, E.; and Barták, R. 2019. Multi-agent pathfinding: Definitions, variants, and benchmarks. CoRR abs/1906.08291.

Wurman, P. R.; D'Andrea, R.; and Mountz, M. 2008. Coordinating hundreds of cooperative, autonomous vehicles in warehouses. AI Magazine 29:9-19. Menlo Park, CA: AAAI.

Yakovlev, K. S., and Andreychuk, A. 2017. Any-angle pathfinding for multiple agents based on SIPP algorithm. CoRR abs/1703.04159.

Yu, J., and LaValle, S. M. 2012. Multi-agent path planning and network flow. CoRR abs/1204.5717.

Yu, J., and LaValle, S. M. 2013. Planning optimal paths for multiple robots on graphs. In 2013 IEEE International Conference on Robotics and Automation, 3612-3617.