TARGET ASSIGNMENT AND PATH PLANNING FOR NAVIGATION TASKS WITH TEAMS OF AGENTS

by

Hang Ma

A Dissertation Presented to the
FACULTY OF THE USC GRADUATE SCHOOL
UNIVERSITY OF SOUTHERN CALIFORNIA

In Partial Fulfillment of the
Requirements for the Degree

DOCTOR OF PHILOSOPHY
(COMPUTER SCIENCE)

August 2020

Copyright 2020 Hang Ma
Acknowledgments
First and foremost, I would like to thank my advisor, Sven Koenig, for his guidance with a
profound vision, his patience for leading many inspiring and insightful discussions, and his
offering of the greatest freedom for me to pursue my own research ideas, all throughout this
five-year journey. I can never learn enough from him about how to maintain enthusiasm and
curiosity for both research and life. He has taught me how to be a great advisor to my own
students in the future.
I would like to thank T. K. Satish Kumar, with whom I can never spend enough days and
nights writing papers together, both on whiteboards and online. Satish has always been a great
source of humor who tells only jokes made up by himself. I would like to thank the other members
of my dissertation committee and proposal guidance committee: Nora Ayanian, Gaurav
S. Sukhatme, Satyandra K. Gupta, and Peter Stone, who have provided helpful comments and
suggestions on this dissertation. Nora has provided great robotics insights and hardware for this
dissertation. Peter has pointed out many interesting research directions on multi-agent/robot sys-
tems, and it has always been great fun interacting with his research group members at various
conferences.
Much of my research would not have been successful without my colleagues and collab-
orators. I would like to thank Craig Tovey, through whom I obtained an Erdős number of 2.
Our collaboration and discussions built up my confidence in the beginning of this journey and
inspired much of my later research. I would like to thank Wolfgang Honig, without whom my
algorithms would have never been able to run on real robots. We have made a great AI+robotics
team. I would like to thank Ariel Felner for our fruitful collaboration and discussions, from
which I have learned a lot about heuristic search. I cannot count how many telecons I have had
with Daniel Harabor and Peter J. Stuckey, both from the other side of the earth. Our collabo-
ration has always been productive and fun. I have had great fun with all members of the IDM
Lab: I would like to thank Jiaoyang Li, who has grown from a mentee to a great colleague of
mine and whose undergraduate summer research has become part of this dissertation. I would
like to thank Liron Cohen and Tansel Uras for their collaboration and many great discussions.
Liron has also been a great source of ideas for entertainment in Los Angeles. My special thanks
go to former IDM Lab member William Yeoh, from whom I have learned about how to survive
this journey. Many thanks to Hong Xu and Han Zhang, who have made our lab a joyful place.
I have also been fortunate to work with and mentor many talented Master's and undergraduate
students, including Minghua Liu, Jiangxing Wang, and Jingxing Yang. Many thanks to my great
collaborators Guni Sharon, Glenn Wagner, Eli Boyarski, Pavel Surynek, Carlos Hernandez, Roni
Stern, Nathan Sturtevant, Thayne Walker, Roman Bartak, and many others. We have formed a
great MAPF research community.
I am grateful to Joelle Pineau, my former advisor at McGill University, who has inspired and
encouraged me to work in AI. My special thanks go to many USC professors: David Kempe and
Shanghua Teng have provided insightful suggestions for several research problems I have had. I
have enjoyed interacting with Fei Sha and his group members. I would also like to thank Robert
Morris and Corina Pasareanu from NASA Ames for a great blue-sky internship project and Ngai
Meng Kou and Cheng Peng from Cainiao, Alibaba for an opportunity to apply my algorithms to
warehouse automation.
My deepest gratitude to my parents, Xiaoxuan Ma and Bihua Lin, for their support, educa-
tion, encouragement, and love that have made me who I am.
Last but not least, I would like to thank my wife, Xiaodong Huang, for her unconditional
love, patience, and support. I have also been inspired a lot by just watching our newborn baby
figuring out the world. My family has made this journey wonderful.
The research presented in this dissertation was supported by a USC Annenberg fellowship,
Cainiao Smart Logistics Network, an internship at NASA Ames via Stinger Ghaffarian Tech-
nologies, an internship at ARL West, ONR in the form of a MURI project under contract/grant
number N00014-09-1-1031, NSF under grant numbers 1319966, 1409987, 1724392, 1817189,
and 1837779, and a gift from Amazon.
Contents
Acknowledgments ii
List of Tables ix
List of Figures x
List of Important Abbreviations xiv
Abstract xvi
Chapter 1: Introduction . . . 1
1.1 Definitions, Categorizations, and Assumptions . . . 4
1.2 Central Questions . . . 7
1.3 Hypothesis and Contributions . . . 9
1.3.1 Contribution 1: Theoretical Analysis of Target Assignment and Path Planning . . . 11
1.3.2 Contribution 2: One-Shot Combined Target Assignment and Path Planning . . . 13
1.3.3 Contribution 3: Long-Term Target Assignment and Path Planning . . . 15
1.3.4 Contribution 4: Long-Term Target Assignment and Path Planning with Kinematic Constraints . . . 16
1.4 Dissertation Outline . . . 17
Chapter 2: Problem Definitions . . . 18
2.1 Problem Definition of MAPF . . . 18
2.1.1 MAPF Example . . . 20
2.2 Problem Definition of Anonymous MAPF . . . 21
2.2.1 Anonymous MAPF Example . . . 23
2.3 Problem Definition of TAPF . . . 24
2.3.1 TAPF as a Generalization of MAPF . . . 25
2.3.2 TAPF Example . . . 27
2.4 Problem Definition of PERR . . . 27
2.4.1 PERR as a Relaxation of MAPF . . . 29
2.4.2 PERR Example . . . 30
2.4.3 K-PERR . . . 31
2.4.4 1-PERR . . . 31
2.5 Problem Definition of MAPD . . . 32
2.5.1 MAPD as a Long-Term Generalization of One-Shot Problems . . . 34
2.5.2 MAPD Example . . . 34
2.6 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
Chapter 3: Target Assignment and Path Planning for Teams of Agents . . . 37
3.1 One-Shot Path-Planning Problem: MAPF . . . 37
3.1.1 Theoretical Results for MAPF . . . 38
3.1.2 MAPF Algorithms . . . 39
3.1.3 MAPF Extensions and Related Problems . . . 43
3.2 MAPF Algorithm Examples . . . 44
3.2.1 Cooperative A* . . . 45
3.2.2 Conflict-Based Search . . . 47
3.2.3 ILP-Based MAPF Algorithm . . . 51
3.3 One-Shot Path Planning with Kinematic Constraints . . . 55
3.4 Other One-Shot and Long-Term Target-Assignment and Path-Planning Problems . . . 56
3.4.1 One-Shot Target-Assignment Problem . . . 57
3.4.2 One-Shot Combined Target-Assignment and Path-Planning Problem for One Team of Agents: Anonymous MAPF . . . 57
3.4.3 Long-Term Target-Assignment Problem . . . 58
3.5 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58
Chapter 4: Theoretical Analysis of Target Assignment and Path Planning . . . 60
4.1 Introduction . . . 61
4.2 Unified NP-Hardness Proof Structure and Intractability of PERR . . . 63
4.3 Other Target-Assignment and Path-Planning Problems . . . 66
4.3.1 Complexity Results for MAPF . . . 66
4.3.2 Complexity Results for MAPD . . . 67
4.3.3 Complexity Results for K-PERR . . . 68
4.3.4 Complexity Results for TAPF . . . 73
4.3.5 Additional Generalizations . . . 73
4.4 Feasibility of PERR . . . 74
4.5 PERR Algorithms . . . 77
4.5.1 Adapted CBS . . . 77
4.5.2 ILP-Based PERR Algorithm . . . 78
4.5.3 Solving K-PERR Optimally . . . 83
4.6 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 87
Chapter 5: One-Shot Target Assignment and Path Planning . . . 89
5.1 Introduction . . . 90
5.2 ILP-Based TAPF Algorithm . . . 92
5.2.1 Reducing TAPF to Multi-Commodity Flow . . . 92
5.2.2 Solving TAPF via ILP . . . 94
5.2.3 Example . . . 94
5.2.4 Special Case of One Team . . . 95
5.3 Conflict-Based Min-Cost Flow . . . 95
5.3.1 High-Level Search of CBM . . . 97
5.3.2 Low-Level Search of CBM . . . 99
5.3.3 Example . . . 100
5.3.4 Avoiding to Create Collisions with Other Teams in the Low-Level Search . . . 104
5.3.5 Analysis of Properties . . . 105
5.4 Experiments . . . 109
5.4.1 Experiment 1: Alternative Algorithms . . . 109
5.4.2 Experiment 2: Team Size . . . 112
5.4.3 Experiment 3: Number of Agents and Scalability . . . 113
5.4.4 Experiment 4: Warehouse Map . . . 115
5.5 Including Kinematic Constraints . . . 117
5.6 Summary . . . 118
Chapter 6: Long-Term Target Assignment and Path Planning . . . 119
6.1 Introduction . . . 120
6.2 Motivating MAPD Examples . . . 123
6.3 Utilizing Environmental Characteristics: Well-Formedness . . . 124
6.4 Decoupled MAPD Algorithms . . . 126
6.4.1 TP . . . 126
6.4.2 TPTS . . . 132
6.5 Centralized Algorithm . . . 137
6.5.1 Task/Endpoint-Assignment Procedure . . . 138
6.5.2 Path-Planning Procedure . . . 140
6.5.3 Extensions of CENTRAL . . . 142
6.6 Experiments . . . 144
6.6.1 Experiment 1: Makespan and Service Time . . . 146
6.6.2 Experiment 2: Runtime per Time Step . . . 147
6.6.3 Experiment 3: Number of Executed Tasks . . . 148
6.6.4 Experiment 4: Scalability . . . 150
6.7 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 151
Chapter 7: MAPD with Kinematic Constraints in a Simulated System . . . 153
7.1 Introduction . . . 154
7.2 Assumptions and TP-SIPPwRT . . . 156
7.3 SIPPwRT . . . 157
7.3.1 A* Search of SIPP . . . 158
7.3.2 Reservation Table and Safe Intervals . . . 158
7.3.3 Time Offsets . . . 160
7.3.4 Increased/Decreased Bounds . . . 163
7.3.5 Admissible h-Values for Multiple Targets . . . 164
7.3.6 Pseudocode of SIPPwRT . . . 165
7.3.7 Example . . . 169
7.4 Analysis of Properties . . . 172
7.5 Simulated Automated Warehouses . . . 176
7.6 Experiments . . . 177
7.6.1 Experiment 1: MAPD Algorithms and Task Velocity . . . 178
7.6.2 Experiment 2: Number of Agents, Task Frequency, and Task Velocity . . . 181
7.6.3 Experiment 3: Scalability, Number of Agents, and Task Velocity . . . 185
7.6.4 Experiment 4: Robot Simulator . . . 185
7.7 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 186
Chapter 8: Conclusions . . . 188
8.1 Contributions . . . 188
8.2 Direct Impact . . . 192
8.3 Limitations and Future Directions . . . 194
Bibliography 197
List of Tables
Table 1.1: Summary of problem formulations and contributions. . . . . . . . . 12
Table 2.1: Summary of symbols. The last column specifies if the symbols are used for specific problems. . . . 36
Table 5.1: Results for different TAPF and MAPF algorithms on 30×30 4-neighbor grids with randomly blocked cells for different numbers of agents. . . . 110
Table 5.2: Results for CBM on 30×30 4-neighbor grids with randomly blocked cells for different team sizes. . . . 112
Table 5.3: Results for CBM on 30×30 4-neighbor grids with randomly blocked cells for different numbers of agents. . . . 114
Table 6.1: Results for TP, TPTS, and CENTRAL in the small simulated ware-house environment. . . . . . . . . . . . . . . . . . . . . . . . . . . 146
Table 6.2: Throughputs for TP, TPTS, and CENTRAL in the small simulated warehouse environment. . . . 149
Table 6.3: Results for TP in the large simulated warehouse environment. . . . . 149
Table 7.1: Results for TP-SIPPwRT, CENTRAL, and TP-A* in the small simulated warehouse environment. . . . 178
Table 7.2: Results for TP-SIPPwRT in the small simulated warehouse environment. . . . 183
Table 7.3: Results for TP-SIPPwRT in the large simulated warehouse environment. . . . 184
List of Figures
Figure 1.1: The typical layout of part of an Amazon Robotics automated warehouse system, reproduced from Wurman, D’Andrea, and Mountz (2008). . . . 2
Figure 2.1: Example of a MAPF problem instance. . . . . . . . . . . . . . . . 20
Figure 2.2: Graph representation of the MAPF problem instance shown in Figure 2.1. . . . 20
Figure 2.3: Example of an Anonymous MAPF problem instance. . . . . . . . . 23
Figure 2.4: Graph representation of the Anonymous MAPF problem instanceshown in Figure 2.3. . . . . . . . . . . . . . . . . . . . . . . . . . 23
Figure 2.5: Example of a TAPF problem instance. . . . . . . . . . . . . . . . . 26
Figure 2.6: Graph representation of the TAPF problem instance shown in Figure 2.5. . . . 27
Figure 2.7: Example of a PERR problem instance. . . . . . . . . . . . . . . . . 30
Figure 2.8: Graph representation of the PERR problem instance shown in Figure 2.7. . . . 30
Figure 2.9: Example of a MAPD problem instance. . . . . . . . . . . . . . . . 35
Figure 2.10: Graph representation of the MAPD problem instance shown in Figure 2.9. . . . 35
Figure 2.11: Relationships between different problems. . . . . . . . . . . . . . . 36
Figure 3.1: The constraint tree for the MAPF problem instance shown in Figure 2.1. . . . 50
Figure 3.2: Example of the construction of the gadgets in N for edge (u, v) ∈ E and time step t. . . . 52
Figure 3.3: A feasible integer multi-commodity flow for the MAPF problem instance shown in Figure 2.1. . . . 53
Figure 4.1: Example of the reduction from a ≤3,=3-SAT problem instance to a PERR problem instance. . . . 65
Figure 4.2: Example of the reduction from a 2/2/3-SAT problem instance to a 2-PERR problem instance. . . . 72
Figure 4.3: Motivating examples of PERR that demonstrate the power of exchange operations. . . . 75
Figure 4.4: Example of the construction used in the proof of Theorem 4.14. . . 76
Figure 4.5: Example of the construction of N for edge (u, v) ∈ E and time step t. . . . 79
Figure 4.6: A feasible integer multi-commodity flow for the PERR problem instance shown in Figure 2.7. . . . 80
Figure 5.1: Example of the construction of the gadgets in N for edge (u, v) ∈ E and time step t. . . . 93
Figure 5.2: A feasible integer multi-commodity flow for the TAPF problem instance shown in Figure 2.5. . . . 96
Figure 5.3: Construction of the root node Root and the low-level search for team1 and team2. The constraints and the plan of Root are shown on the top right. The colliding teams of each node are shown to explain the tie breaking. . . . 101
Figure 5.4: Construction of the left child node N1 and the low-level search for team1. The constraints and the plan of N1 are shown on the top right. The constraints and the plan of the other nodes are not shown. The colliding teams of each node are shown to explain the tie breaking. . . . 102
Figure 5.5: Construction of the right child node N2 and the low-level search for team2. The constraints and the plan of N2 are shown on the top right. The constraints and the plan of the other nodes are not shown. The colliding teams of each node are shown to explain the tie breaking. . . . 103
Figure 5.6: Makespans and runtimes for CBM on 30×30 4-neighbor grids with randomly blocked cells for different team sizes. . . . 112
Figure 5.7: Success rates for CBM on 30×30 4-neighbor grids with randomly blocked cells for different numbers of agents. . . . 114
Figure 5.8: A randomly generated TAPF problem instance on the warehouse map. . . . 115
Figure 6.1: Example of an unsolvable MAPD problem instance. . . . . . . . . 123
Figure 6.2: Three MAPD problem instances. . . . . . . . . . . . . . . . . . . . 125
Figure 6.3: Example of a MAPD problem instance for the comparison of TPand TPTS. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 136
Figure 6.4: The small simulated warehouse environment with 50 agents. . . . . 145
Figure 6.5: Number of tasks added (gray) and executed by 50 agents per time step in a moving 100-time-step window [t − 99, t] for TP, TPTS, and CENTRAL as a function of the time step t for different task frequencies. . . . 148
Figure 6.6: The large simulated warehouse environment with 500 agents. . . . . 150
Figure 6.7: Number of tasks executed per time step during the 100-time-step window [t − 99, t] for TP as a function of the time step t for different numbers of agents. . . . 151
Figure 7.1: Left: Two agents move in the same direction. Middle: D is at its minimum for the vtrans,1 < vtrans,2 case. Right: D is at its minimum for the vtrans,1 ≥ vtrans,2 case. . . . 162
Figure 7.2: Left: Two agents move in orthogonal directions. Right: D is at its minimum. . . . 162
Figure 7.3: Example of a MAPD problem instance. . . . . . . . . . . . . . . . 169
Figure 7.4: Graph representation of the MAPD problem instance shown inFigure 7.3. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 169
Figure 7.5: The search of SIPPwRT. . . . . . . . . . . . . . . . . . . . . . . . 170
Figure 7.6: Snapshots of agent a1 following its path. Left [t = 2 + (√5 + 1)/2 = 3.62]: 0.5 time units after agent a1 departs from cell A. Middle [t = 3 + 2√5/5 = 3.89]: The distance between agents a1 and a2 is at its minimum (0.5 m). Right [t = 3 + √5/2 = 4.12]: Agent a1 arrives at cell C. . . . 172
Figure 7.7: The small simulated warehouse environment with 50 agents. . . . . 176
Figure 7.8: Number of tasks executed by 30 agents per second in a moving 100-second window (t − 100, t] for TP-SIPPwRT, CENTRAL, and TP-A* as a function of time t for different task velocities. . . . 179
Figure 7.9: Number of tasks added (gray) and executed per second in a moving 100-second window (t − 100, t] for TP-SIPPwRT as a function of time t for different numbers of agents. . . . 182
Figure 7.10: The large simulated warehouse environment with 250 agents. . . . . 185
Figure 7.11: Screenshots for Experiment 4 at t = 35 s. Left: Agent simulator. Right: Robot simulator. . . . 186
List of Important Abbreviations
Abbreviation Description
Problems:
MAPD Multi-Agent Pickup and Delivery: a long-term combined target-assignment and path-planning problem
MAPF Multi-Agent Path Finding: a one-shot path-planning problem
PERR Package Exchange Robot Routing: a one-shot path-planning problem
TAPF Target Assignment and Path Finding: a one-shot combined target-assignment and path-planning problem
Algorithms:
CBM Conflict-Based Min-Cost Flow: a TAPF algorithm
CBS Conflict-Based Search: a MAPF algorithm
CENTRAL a centralized MAPD algorithm
ILP Integer Linear Programming, or used as the label of an Integer Linear Programming-based algorithm
MAPF-POST a polynomial-time procedure that post-processes MAPF solutions
SIPP Safe Interval Path Planning: a one-shot single-agent path-planning algorithm
SIPPwRT Safe Interval Path Planning with Reservation Table: an improved version of Safe Interval Path Planning that stores paths in a reservation table
TP Token Passing: a decoupled MAPD algorithm
Abbreviation Description
TP-SIPPwRT a version of Token Passing that uses Safe Interval Path Planning with Reservation Table for single-agent path planning and outputs plan-execution schedules
TPTS Token Passing with Task Swaps: a decoupled MAPD algorithm
Abstract
In many real-world applications of multi-agent systems, teams of agents must assign targets
among themselves and plan paths to the targets. The agents must avoid collisions with each
other in a congested environment, yet reach their targets as soon as possible. The resulting
coordination problems, which model the target-assignment and path-planning operations of the
agents, are fundamental for these multi-agent systems but, at the same time, computationally
challenging, as there are often many agents and their operating time is long. This dissertation
studies two types of multi-agent coordination problems: (1) One-shot coordination problems that
assign given targets to and plan paths for a given set of agents to reach their targets on a given
graph that models the environment and (2) long-term coordination problems that repeatedly
assign incoming tasks to a given set of agents and plan paths for the agents to the targets of their
assigned tasks.
This dissertation builds upon the recent successes in tackling a one-shot path-planning prob-
lem, called Multi-Agent Path Finding (MAPF), in the artificial intelligence (AI) community as
well as theoretical and algorithmic results for other target-assignment and path-planning prob-
lems from the theoretical computer science, operations research, AI, and robotics communi-
ties. It addresses three central questions when generalizing these results to solving the target-
assignment and the path-planning problems jointly: Question 1: How hard is it to jointly assign
targets to and plan paths for teams of agents? Question 2: How and how well can one jointly
assign targets to and plan paths for teams of agents? Question 3: How do teams of agents execute
the computed solutions?
We tackle these three central questions by formalizing novel variants of MAPF that model
different target-assignment and path-planning problems, studying their theoretical properties,
and developing algorithms for them and thus make the following four contributions. First, we
introduce a unified NP-hardness proof structure that can be used to derive complexity results for
different target-assignment and path-planning problems. Second, we formalize and study Combined
Target Assignment and Path Finding (TAPF), which models the one-shot combined target-assignment
and path-planning problem, and present complete and optimal algorithms for solving
it. Third, we formalize and study Multi-Agent Pickup and Delivery (MAPD), which models
the long-term combined target-assignment and path-planning problem, and present MAPD algorithms
that provide a guarantee on their long-term robustness, namely, that agents can finish all
tasks without deadlocks. Fourth, we demonstrate how our MAPD algorithms can take some of
the kinodynamic constraints of real-world agents into account to compute plan-execution sched-
ules, which showcases the benefits of our algorithms for real-world applications of multi-agent
systems.
Chapter 1
Introduction
In many real-world applications of multi-agent systems, teams of autonomous agents must assign
targets (goal locations) among themselves (target-assignment operations) and plan collision-free
paths to their targets (path-planning operations) in order to finish tasks cooperatively. Examples
include autonomous aircraft-towing vehicles (Morris et al., 2016), automated warehouse robots
(Kou, Peng, Ma, Kumar, & Koenig, 2020; Wurman, D’Andrea, & Mountz, 2008), automated-
guided port vehicles (Thurston & Hu, 2002), autonomous intersection management (Dresner
& Stone, 2008), forklift robot fleets (Pecora, Andreasson, Mansouri, & Petkov, 2018; Salvado,
Krug, Mansouri, & Pecora, 2018), game characters in video games (Ma, Yang, Cohen, Kumar, &
Koenig, 2017), object-transportation robots (Mataric, Nilsson, & Simsarin, 1995; Rus, Donald,
& Jennings, 1995), patrolling robots (Agmon, Urieli, & Stone, 2011), search-and-rescue robots
(Jennings, Whelan, & Evans, 1997), service robots (Ahmadi & Stone, 2006; Khandelwal et
al., 2017; Veloso, Biswas, Coltin, & Rosenthal, 2015), swarms of differential-drive robots and
quadcopters (Honig, Kumar, Ma, Ayanian, & Koenig, 2016; Honig, Preiss, Kumar, Sukhatme,
& Ayanian, 2018; Preiss, Honig, Ayanian, & Sukhatme, 2017), robots in formations (Balch &
Arkin, 1998; Li, Sun, et al., 2020; Poduri & Sukhatme, 2004; Smith, Egerstedt, & Howard,
2009; Tanner, Pappas, & Kumar, 2004), and other multi-robot systems (Ma, Honig, Cohen,
et al., 2017). For example, in the near future, autonomous aircraft-towing vehicles will tow
aircraft all the way from the runways to their gates (and vice versa), thereby reducing pollution,
Figure 1.1: The typical layout of part of an Amazon Robotics automated warehouse system,reproduced from Wurman, D’Andrea, and Mountz (2008).
energy consumption, congestion, and human workload (Morris et al., 2016). Today, hundreds
of warehouse robots already navigate autonomously in Amazon Robotics automated warehouse
systems (formally called Amazon fulfillment centers) to move inventory pods all the way from
their storage locations to the inventory stations that need the products they store (and vice versa)
(Wurman et al., 2008).
Figure 1.1 shows the typical grid layout of part of an Amazon Robotics automated warehouse
system with inventory stations on the left side and storage locations in the storage area to the
right of the inventory stations. Each inventory station has an entrance (purple cells) and an exit
(pink cells). Each storage location (green cell) can store one inventory pod. Each inventory
pod consists of a stack of trays, each of which holds bins with products. Each warehouse robot
(orange square) is capable of picking up, carrying, and putting down one inventory pod at a time.
The warehouse robots need to move inventory pods all the way from their storage locations to the
inventory stations that need the products they store (to ship them to customers) or from inventory
stations to their assigned storage locations. After a warehouse robot enters an inventory station,
the requested product is removed from its inventory pod by a worker. Once the warehouse robots
have delivered all requested products for one shipment to the same inventory station, the worker
prepares the shipment to the customer.
The coordination of autonomous operations of teams of agents is a fundamental building
block for the above multi-agent systems but often requires a large search space since it involves
the following three components: (1) coordination of target-assignment operations, such as as-
signing inventory pods to warehouse robots that are not carrying inventory pods or assigning
empty storage locations to warehouse robots that have picked up inventory pods from the in-
ventory stations in an Amazon Robotics automated warehouse system, (2) coordination of path-
planning operations, such as planning paths for warehouse robots to pick up inventory pods
assigned to them or to deliver inventory pods to the destinations, while avoiding that they collide
with each other, in an Amazon Robotics automated warehouse system, and (3) repeating these
target-assignment and path-planning operations if long-term coordination is required, such as
repeatedly assigning inventory pods to warehouse robots and planning paths for them to deliver the
inventory pods when they finish delivering their current inventory pods in an Amazon Robotics
automated warehouse system. Making good decisions for these components requires solving the
combined target-assignment and path-planning problem.
Many recent approaches in the artificial intelligence (AI) community have concentrated
on studying a one-shot multi-agent path-planning problem called Multi-Agent Path Finding
(MAPF) (Ma & Koenig, 2017; Ma, Koenig, et al., 2016). The problem of MAPF is to find
collision-free paths for a given set of agents from their given start locations to their given (preas-
signed) targets in a given environment. While MAPF techniques can potentially be generalized
to both the one-shot and the long-term combined target-assignment and path-planning problems
for these applications of multi-agent systems, three central questions remain: Question 1: How
hard is it to jointly assign targets to and plan paths for teams of agents? Question 2: How and
how well can one jointly assign targets to and plan paths for teams of agents? Question 3: How
do teams of agents execute the computed solutions? This dissertation addresses these three cen-
tral questions by evaluating the following hypothesis: Formalizing and studying new variants
of MAPF can result in new theoretical insights into or new algorithms for the one-shot
and the long-term combined target-assignment and path-planning problems for teams of
agents, which can benefit real-world applications of multi-agent systems.
To validate this hypothesis, we present four contributions: First, we introduce a unified NP-
hardness proof structure that can be used to derive computational complexity results for different
target-assignment and path-planning problems, stemming from formalizing and studying a new
variant of MAPF, called Package-Exchange Robot Routing (PERR), and its generalizations. Sec-
ond, we formalize and study a new variant of MAPF, called Combined Target Assignment and
Path Finding (TAPF), which models the one-shot combined target-assignment and path-planning
problem, and present complete and optimal algorithms for solving it. Third, we formalize and
study a new variant of MAPF, called Multi-Agent Pickup and Delivery (MAPD), which models
the long-term combined target-assignment and path-planning problem, and present MAPD al-
gorithms that utilize environmental characteristics to guarantee long-term robustness. Fourth,
we demonstrate how MAPD algorithms can take kinematic constraints of real-world agents into
account to compute plan-execution schedules that allow for the safe execution of the computed
solutions, thus showcasing the benefits of these algorithms for real-world applications of multi-
agent systems.
1.1 Definitions, Categorizations, and Assumptions
We now define the technical terms, categorize the problems and algorithms, and state the as-
sumptions of this dissertation.
One-shot and Long-Term Coordination The coordination of target-assignment and path-
planning operations can be either one-shot or long-term. In this dissertation, for one-shot co-
ordination, we formalize and study TAPF, where each agent needs to get assigned a target, plan
a path to the target without collisions with other agents, and stay at the target. For long-term
coordination, we formalize and study MAPD, where each agent needs to get assigned a task that
consists of two targets, namely the pickup location and the delivery location, move first to the
pickup location and then to the delivery location of the task without collisions with other agents,
and continue to attend to new tasks afterward.
Teams Agents in the combined target-assignment and path-planning problem are partitioned
into teams. Agents in the same team have the same capability and are allowed to exchange their
assigned targets or tasks, while agents in different teams are not allowed to do so.
Target-Assignment and Path-Planning Operations The combined target-assignment and
path-planning problem involves some or all of the following three components:
• Component 1: Determining the target that each agent needs to visit (one-shot target-
assignment sub-problem).
• Component 2: Planning paths for the agents from their current locations to their targets
in a way such that the agents do not collide with each other (one-shot path-planning
sub-problem).
• Component 3 (for long-term coordination): Repeatedly solving the above one-shot
target-assignment and path-planning sub-problems as the agents need to execute new tasks
after finishing their current ones, which requires a mechanism to coordinate the interplay
of Components 1 and 2.
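To make the interplay of the three components concrete, the following is a minimal sketch of the long-term coordination loop; `assign_targets` and `plan_paths` are hypothetical stand-ins for Components 1 and 2, not the algorithms studied in this dissertation:

```python
def assign_targets(free_agents, open_tasks):
    """Component 1 (illustrative stand-in): pair free agents with open tasks."""
    return dict(zip(free_agents, open_tasks))

def plan_paths(assignment):
    """Component 2 (illustrative stand-in): one placeholder path per agent."""
    return {agent: [task] for agent, task in assignment.items()}

def coordination_loop(task_stream, agents):
    """Component 3: repeatedly solve the one-shot sub-problems as new
    tasks appear, re-coordinating whenever agents become free again."""
    finished = []
    for open_tasks in task_stream:            # tasks arrive over time
        assignment = assign_targets(agents, open_tasks)
        plan_paths(assignment)                # paths would be executed here
        finished.extend(assignment.values())  # agents finish and free up
    return finished
```

The point of the sketch is only the control flow: Components 1 and 2 are re-invoked every time the set of open tasks or free agents changes.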
Many recent approaches in the AI community have concentrated on tackling only Component 2
by studying MAPF. The problem of MAPF is to find collision-free paths for a given set of agents
from their given start locations to their given targets in a given known environment (modeled as
a graph). The quality of a solution is often measured by either the makespan (the maximum of
the arrival times of all agents at their targets) or the flowtime (the sum of the arrival times of
all agents at their targets). Many MAPF algorithms can utilize environmental characteristics
(special properties of the graph that models the given environment) to guarantee that they are
complete for some classes of MAPF problem instances (Cap, Vokrınek, & Kleiner, 2015; Turpin,
Mohta, Michael, & Kumar, 2014; Wang & Botea, 2011; Yu, 2017). The one-shot and the
long-term target-assignment problems have been studied as various multi-robot task-allocation
problems (see the taxonomy by Gerkey and Mataric (2004)) in the robotics community. The one-
shot target-assignment problem corresponds to ST-SR-IA (Single-Task Robots, Single-Robot
Tasks, Instantaneous Assignment). The long-term target-assignment problem corresponds to
ST-SR-TA (Single-Task Robots, Single-Robot Tasks, Time-Extended Assignment). The one-
shot combined target-assignment and path-planning problem TAPF studied in this dissertation
addresses Components 1 and 2 jointly. It assumes that all targets to be assigned are known a
priori. It is thus offline. For TAPF, we focus on complete TAPF algorithms, which always
return a solution if one exists, and optimal TAPF algorithms, which are complete and always
return a solution with the smallest makespan. The long-term combined target-assignment and
path-planning problem MAPD studied in this dissertation addresses Components 1-3 jointly.
It assumes that not all tasks are known a priori and new tasks can appear as time goes by. It
is thus online. It is NP-hard to solve optimally even if all tasks are known a priori (Brucker,
2010), and there does not exist any online algorithm that computes optimal solutions (Azar,
Naor, & Rom, 1995; Kalyanasundaram & Pruhs, 1993). Therefore, for MAPD, we focus on
long-term robustness, an analogy to completeness for one-shot problems: A long-term robust
MAPD algorithm always returns a solution, if one exists, that finishes a finite set of tasks in a
finite amount of time. Intuitively, a long-term robust MAPD algorithm avoids deadlocks, where
all agents stay idle (because they block each other) and cannot execute the remaining tasks, and
livelocks, where the agents constantly move but do not make progress toward finishing their
tasks. We refer to both deadlocks and livelocks as deadlocks throughout this dissertation.
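The makespan and flowtime objectives defined above can be computed directly from the agents' paths; a minimal sketch, where each path is a list of vertices indexed by time step and the arrival time is the earliest step from which the agent only waits at its target:

```python
def arrival_time(path, goal):
    """Minimal time step T such that the agent stays at its goal from T onward."""
    T = len(path) - 1
    while T > 0 and path[T - 1] == goal:
        T -= 1
    return T

def makespan(paths, goals):
    """Maximum of the arrival times of all agents at their targets."""
    return max(arrival_time(p, g) for p, g in zip(paths, goals))

def flowtime(paths, goals):
    """Sum of the arrival times of all agents at their targets."""
    return sum(arrival_time(p, g) for p, g in zip(paths, goals))
```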
Kinematic Constraints For the definitions of target-assignment and path-planning problems,
we assume a given undirected graph whose vertices model the locations in the environment,
point agents that each occupies a vertex at a discrete time step, uniform edge lengths (un-
weighted edges), and synchronized agent movements. For the execution of a given solution
of these target-assignment and path-planning problems, we consider essential kinodynamic con-
straints (Kavraki & LaValle, 2016), namely first-order dynamic constraints (temporal constraints,
such as velocity limits) and kinematic constraints (geometric constraints, such as agent sizes and
placement of obstacles in the environment), of real-world agents, for example, mobile robots.
We do not distinguish first-order dynamic constraints and kinematic constraints and refer to them
as kinematic constraints throughout this dissertation. We ignore other kinodynamic constraints
and assume that they are handled by the controllers of real-world agents.
1.2 Central Questions
State-of-the-art MAPF algorithms can compute collision-free paths for hundreds of agents in
minutes (Ma et al., 2018a). However, while MAPF algorithms are potentially applicable to solv-
ing both the one-shot and the long-term combined target-assignment and path-planning prob-
lems, three central questions remain:
• Question 1: How hard is it to jointly assign targets to and plan paths for teams of agents?
There has not been any comprehensive study yet on the complexity of solving either the
one-shot or the long-term combined target-assignment and path-planning problem. Some
results are known for related problems: Finding an optimal solution to the one-shot multi-
agent path-planning problem MAPF is NP-hard (Goldreich, 2011; Ratner & Warmuth,
1986; Surynek, 2010; Yu & LaValle, 2013c). For the special case of one team of agents,
the one-shot combined target-assignment and path-planning problem can be solved op-
timally in polynomial time (Yu & LaValle, 2013a). However, none of these existing
results applies to either the one-shot or the long-term combined target-assignment and
path-planning problem in general.
• Question 2: How and how well can one jointly assign targets to and plan paths for teams
of agents?
Solving the combined target-assignment and path-planning problem requires solving the
target-assignment and the path-planning sub-problems. MAPF algorithms solve the one-
shot path-planning sub-problem (Felner et al., 2017; Ma & Koenig, 2017; Ma, Koenig,
et al., 2016; Stern et al., 2019). But they assume that the assignment of targets to agents is
fixed and do not consider different target assignments. Target-assignment algorithms solve
either the one-shot (Bertsekas, 1992; Garfinkel, 1971; Gross, 1959; Kuhn, 1955; Shapley
& Shubik, 1971) or the long-term (Azar et al., 1995; Kalyanasundaram & Pruhs, 1993;
Khuller, Mitchell, & Vazirani, 1994) target-assignment sub-problem. But they do not
consider actual paths of agents when assigning targets, and the resulting target assignment
is not optimal for the combined target-assignment and path-planning problem. Algorithms
that solve the target-assignment and the path-planning sub-problems jointly can potentially
determine solutions of higher quality than a trivial combination of individual solutions
for all sub-problems (Srivastava et al., 2014; Turpin, Mohta, et al., 2014). However, the
following issues need to be addressed to generalize MAPF algorithms to solving the target-
assignment and the path-planning sub-problems jointly:
– How and how well do the good theoretical properties of MAPF algorithms carry over
to algorithms that solve the combined target-assignment and path-planning problem?
For example, an optimal MAPF solution may not be optimal for the combined prob-
lem.
– How and how well can such algorithms exploit the combinatorial structure of the
combined target-assignment and path-planning problem? For example, it is not
known how much better/worse or faster/slower such algorithms can solve the target-
assignment and the path-planning sub-problems jointly than trivially combining in-
dividual solutions for the sub-problems.
Specifically, for the one-shot combined problem, it remains unclear how one can generalize target-assignment and MAPF algorithms to it. Moreover, while, for the special case
of one team of agents, the one-shot combined problem can be solved with a max-flow
algorithm (Yu & LaValle, 2013a), it remains unclear how to generalize this algorithm to
the general case of multiple teams of agents. For the long-term combined problem, it
remains unclear whether and how the completeness of a MAPF algorithm can be gener-
alized to the long-term robustness of an algorithm for the long-term combined problem.
Specifically, while some MAPF algorithms can utilize environmental characteristics to
guarantee completeness for some classes of MAPF problem instances (Cap et al., 2015;
Turpin, Mohta, et al., 2014; Wang & Botea, 2011; Yu, 2017), it remains unclear whether
and how algorithms for the long-term combined problem can guarantee long-term robust-
ness, for example, by also utilizing environmental characteristics. To summarize, while
existing target-assignment algorithms and path-planning algorithms provide some promis-
ing directions, none of them is directly applicable to solving the one-shot or the long-term
combined target-assignment and path-planning problem.
• Question 3: How do teams of agents execute the computed solutions?
There has not been any study of how to take kinematic constraints of teams of real-world
agents into account to let them safely execute solutions for the combined target-assignment
and path-planning problem. Existing MAPF algorithms can use the polynomial-time pro-
cedure MAPF-POST, developed in our recent research (Honig, Kumar, Cohen, et al.,
2016) (not covered as a contribution in this dissertation), in a post-processing step to trans-
form a MAPF solution into a plan-execution schedule that takes kinematic constraints of
real-world agents into account, which allows the agents to execute the MAPF solution
safely. It remains unclear how the agents can execute the computed solutions to the com-
bined target-assignment and path-planning problem and whether MAPF-POST can also be
used to transform the computed solutions into plan-execution schedules for the combined
target-assignment and path-planning problem.
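The "trivial combination" baseline mentioned in Question 2 can be made concrete: first solve the target-assignment sub-problem using only individual shortest-path distances, ignoring collisions entirely. A brute-force sketch (illustrative only; it enumerates all permutations and is exponential in the number of agents):

```python
from collections import deque
from itertools import permutations

def bfs_dist(adj, source):
    """Single-source shortest-path distances on an unweighted graph."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def decoupled_assignment(adj, starts, targets):
    """Assign targets to agents to minimize the sum of individual
    shortest-path distances, ignoring collisions (a flowtime lower bound)."""
    dists = [bfs_dist(adj, s) for s in starts]
    best, best_cost = None, float("inf")
    for perm in permutations(range(len(targets))):
        cost = sum(dists[i][targets[j]] for i, j in enumerate(perm))
        if cost < best_cost:
            best, best_cost = perm, cost
    return [targets[j] for j in best], best_cost
```

Because collisions are ignored, the resulting assignment need not be optimal (or even feasible without replanning) for the combined target-assignment and path-planning problem, which is exactly the gap Question 2 asks about.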
1.3 Hypothesis and Contributions
The hypothesis of this dissertation is the following:
Formalizing and studying new variants of Multi-Agent Path Finding (MAPF) can
result in new theoretical insights into or new algorithms for the one-shot and the
long-term combined target-assignment and path-planning problems for teams of
agents, which can benefit real-world applications of multi-agent systems.
This dissertation makes four contributions to validate the above hypothesis. These contributions leverage
insights and tools from (1) operations research and theoretical computer science to characterize
the computational complexity and capture the combinatorial structure of the new variants of
MAPF that model the one-shot and the long-term combined target-assignment and path-planning
problems, (2) AI to develop new algorithms that solve these new variants of MAPF by exploiting
their combinatorial structure, and (3) robotics to take kinematic constraints of real-world agents
into account and guarantee long-term robustness for the long-term combined target-assignment
and path-planning problem by utilizing environmental characteristics.
• Contribution 1: To validate the hypothesis that formalizing and studying new variants
of MAPF can result in new theoretical insights into the one-shot and the long-term com-
bined target-assignment and path-planning problems, we introduce a unified NP-hardness
proof structure that can easily be used to derive computational complexity results for these
and many other target-assignment and path-planning problems. This NP-hardness proof
structure stems from formalizing and studying a new variant of MAPF, called PERR.
• Contribution 2: To validate the hypothesis that formalizing and studying new variants
of MAPF can result in new algorithms for the one-shot combined target-assignment and
path-planning problem, which can benefit real-world applications of multi-agent systems,
we formalize and study a new variant of MAPF, called TAPF, which models both the target-assignment and the path-planning sub-problems in a one-shot combined problem formulation. We present complete and optimal TAPF algorithms. These TAPF algorithms can
use MAPF-POST to transform their solutions into plan-execution schedules that allow for
the safe execution of their solutions on real-world agents.
• Contribution 3: To validate the hypothesis that formalizing and studying new variants
of MAPF can result in new algorithms for the long-term combined target-assignment and
path-planning problem, we formalize and study a new variant of MAPF, called MAPD,
which models both the target-assignment and the path-planning sub-problems in a long-term
combined problem formulation. We present MAPD algorithms that utilize environmental
characteristics to guarantee long-term robustness.
• Contribution 4: To validate the hypothesis that MAPD algorithms can benefit real-world
applications of multi-agent systems, we present two methods to take kinematic constraints
of real-world agents into account. First, we present an adapted MAPD algorithm that takes
kinematic constraints of real-world agents into account during planning and still guaran-
tees its long-term robustness. Second, as a baseline method, our MAPD algorithms can
use MAPF-POST to transform their solutions into plan-execution schedules that take kine-
matic constraints of real-world agents into account. We conduct a case study of MAPD
with kinematic constraints in a simulated automated warehouse system, where we demon-
strate the benefits of our methods using both an agent simulator and a standard robot
simulator.
Table 1.1 provides a summary of the problem formulations and contributions of this disser-
tation. In the following, we describe the contributions in detail.
1.3.1 Contribution 1: Theoretical Analysis of Target Assignment and Path Planning
Many theoretical results are known for the one-shot path-planning problem MAPF. The solvabil-
ity of a MAPF problem instance can be determined and a MAPF solution, if there exists one, can
be found in polynomial time (Kornhauser, Miller, & Spirakis, 1984; Roger & Helmert, 2012).
However, it is NP-hard to solve MAPF optimally (Goldreich, 2011; Ratner & Warmuth, 1986;
Surynek, 2010; Yu & LaValle, 2013c). Many theoretical results are also known for both the one-
shot and the long-term target-assignment problems. On one hand, special cases of the one-shot
target-assignment problem, including the classic assignment problem (Bertsekas, 1992; Kuhn,
1955; Shapley & Shubik, 1971) and the bottleneck assignment problem (Fulkerson, Glicks-
berg, & Gross, 1953; Garfinkel, 1971; Gross, 1959), can be solved optimally in polynomial
time. On the other hand, the long-term target-assignment problem is NP-hard to solve optimally
in general (Brucker, 2010). It is also known that the special case of the one-shot combined
target-assignment and path-planning problem for one team of agents can be solved optimally in
polynomial time (Yu & LaValle, 2013a). However, none of these existing results is applicable
to the general case of either the one-shot or the long-term combined target-assignment and
path-planning problem for multiple teams of agents.

Table 1.1: Summary of problem formulations and contributions.

MAPF:
– novelty of formulation: not novel
– coordination type: one-shot
– operation type: path-planning
– known complexity results: NP-hard
– new complexity results [Contribution 1]: fixed-parameter inapproximable (for makespan minimization)
– example algorithms: ILP, CBS
– properties of algorithms: complete, optimal
– how to include kinematic constraints: post-process via MAPF-POST

PERR:
– novelty of formulation: novel [Contribution 1]
– coordination type: one-shot
– operation type: path-planning
– known complexity results: -
– new complexity results [Contribution 1]: fixed-parameter inapproximable (for makespan minimization), NP-hard (for flowtime minimization)
– example algorithms: ILP, Adapted CBS
– properties of algorithms: complete, optimal
– how to include kinematic constraints: -

TAPF:
– novelty of formulation: novel [Contribution 2]
– coordination type: one-shot
– operation type: target-assignment and path-planning
– known complexity results: polynomial-time solvable for one team of agents, NP-hard for single-agent teams
– new complexity results [Contribution 1]: fixed-parameter inapproximable for more than one team of agents
– example algorithms: ILP, CBM [Contribution 2]
– properties of algorithms: complete, optimal [Contribution 2]
– how to include kinematic constraints: post-process via MAPF-POST

MAPD:
– novelty of formulation: novel [Contribution 3]
– coordination type: long-term
– operation type: target-assignment and path-planning
– known complexity results: -
– new complexity results [Contribution 1]: fixed-parameter inapproximable (for makespan minimization), NP-hard (for service time minimization)
– example algorithms: TP, TPTS, CENTRAL [Contribution 3], TP-SIPPwRT [Contribution 4]
– properties of algorithms: long-term robust for well-formed problem instances [Contributions 3&4]
– how to include kinematic constraints: post-process via MAPF-POST, include directly via TP-SIPPwRT [Contribution 4]
Therefore, we study the relationships between different target-assignment and path-planning
problems and introduce a unified NP-hardness proof structure for these problems, which stems
from formalizing and studying a new variant of MAPF, called PERR. In PERR, each agent
starts in a given start location and carries a package that needs to be delivered to a given goal
location (preassigned target). Packages can be reassigned to agents in a proactive way—two
agents in adjacent locations can exchange their packages, thereby exchanging their targets as
well. We prove that PERR is NP-hard to approximate within any constant factor less than 4/3
for makespan minimization and to solve optimally for flowtime minimization by using our uni-
fied NP-hardness proof structure to establish a reduction from an NP-complete version of the
Boolean satisfiability problem called ≤3,=3-SAT (Tovey, 1984) to PERR. Studying PERR also
lays the theoretical foundation for studying other target-assignment and path-planning problems,
including MAPF, TAPF, and MAPD. For example, we demonstrate how our unified NP-hardness
proof structure can be directly applied to proving the NP-hardness of optimally solving MAPF
and MAPD for different objectives with a reduction from the same Boolean satisfiability prob-
lem ≤3,=3-SAT. We demonstrate how to derive similar NP-hardness results for TAPF by using
our unified NP-hardness proof structure to establish a reduction from a newly constructed NP-
complete version of the Boolean satisfiability problem called 2/2/3-SAT to TAPF. Notably, we
provide the first fixed-parameter inapproximability results for both MAPF and TAPF and the
first NP-hardness results for MAPD, thereby improving the state of the art.
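The exchange operation at the core of PERR can be sketched as follows (a minimal illustration; the instance encoding via lists and an edge set is hypothetical):

```python
def exchange_packages(locations, goals, i, j, edges):
    """Illustrative PERR exchange: agents i and j, occupying adjacent
    locations, swap their packages and thereby their preassigned targets."""
    if frozenset((locations[i], locations[j])) not in edges:
        raise ValueError("agents must occupy adjacent locations")
    goals[i], goals[j] = goals[j], goals[i]
    return goals
```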
1.3.2 Contribution 2: One-Shot Combined Target Assignment and Path Planning
We consider the one-shot combined target-assignment and path-planning problem, where teams
of agents must assign targets among themselves and plan collision-free paths to their assigned
targets. We formalize and study TAPF, which models this one-shot combined target-assignment
and path-planning problem for multiple teams of agents. In TAPF, each team is given the same
number of targets as there are agents in the team. The problem of TAPF is to assign the targets of
each team to agents in the same team and plan collision-free paths for the agents to their targets
so that each agent reaches exactly one target and each target is reached by an agent. Existing
research has considered only two extremes of this one-shot combined target-assignment and
path-planning problem. On one hand, the special case of TAPF with one team of agents can be
solved optimally in polynomial time (Yu & LaValle, 2013a) using a max-flow algorithm. On
the other hand, MAPF algorithms (Felner et al., 2017; Ma & Koenig, 2017; Ma, Koenig, et al.,
2016; Stern et al., 2019) assume that each agent forms a single-agent team. It remains unclear
how and how well one can solve the general case of TAPF with multiple teams of agents.
Therefore, we present a complete and optimal TAPF algorithm called Conflict-Based Min-
Cost Flow (CBM). CBM breaks TAPF down to the NP-hard sub-problem of coordinating differ-
ent teams of agents and the polynomial-time solvable sub-problems of coordinating the agents
in every team. It then tackles these sub-problems by using a combination of a combinatorial
search algorithm called Conflict-Based Search (Sharon, Stern, Felner, & Sturtevant, 2015) for
the NP-hard sub-problem and a min-cost max-flow algorithm (Goldberg & Tarjan, 1987) for
the polynomial-time solvable sub-problems. We also present an Integer Linear Programming
(ILP) based TAPF algorithm that solves an ILP encoding resulting from a reduction of TAPF
to the integer multi-commodity flow problem. Experimentally, we demonstrate that CBM can
compute solutions faster than the ILP-based algorithm, indicating better exploitation of the com-
binatorial structure of TAPF. We also demonstrate that CBM can compute solutions for more
than 400 agents in minutes of runtime, showcasing its potential for large-scale multi-agent sys-
tems. Our TAPF algorithms can use the existing polynomial-time procedure MAPF-POST in
a post-processing step to transform their solutions into plan-execution schedules that take kine-
matic constraints of teams of real-world agents into account and thus allow for the safe execution
of their solutions on these agents. These results showcase the benefits of our TAPF algorithms
for the one-shot coordination of target-assignment and path-planning operations for real-world
applications of multi-agent systems.
1.3.3 Contribution 3: Long-Term Target Assignment and Path Planning
We consider the long-term combined target-assignment and path-planning problem, where teams
of agents must assign tasks among themselves, plan collision-free paths to the targets of their
assigned tasks, and repeatedly attend to new tasks after finishing their current tasks. We formalize and study MAPD, which models this long-term combined target-assignment and path-planning
problem. In MAPD, agents have to attend to a stream of tasks that each consists of two targets,
the pickup location and the delivery location. The system changes dynamically as each task is
added to the system at an unknown time. An agent that is currently not executing any task can be
assigned an unexecuted task. To execute the task, the agent has to first move from its current
location to the pickup location of the task and then from there to the delivery location of the
task, while avoiding collisions with other agents. Existing research has considered the one-shot
target-assignment (Bertsekas, 1992; Garfinkel, 1971; Gross, 1959; Kuhn, 1955; Shapley & Shu-
bik, 1971), the one-shot path-planning (Felner et al., 2017; Ma & Koenig, 2017; Ma, Koenig,
et al., 2016; Stern et al., 2019), and the long-term target-assignment (Azar et al., 1995; Kalyana-
sundaram & Pruhs, 1993; Khuller et al., 1994) problems. However, it remains unclear how and
how well one can generalize the existing results to the long-term combined target-assignment
and path-planning problem MAPD and how one can guarantee long-term robustness for MAPD.
Therefore, we demonstrate how MAPD algorithms can utilize environmental characteristics
to guarantee long-term robustness for well-formed MAPD problem instances, a class of MAPD
problem instances that are realistic for many real-world applications of multi-agent systems.
We design MAPD algorithms that allow agents to rest (that is, stay for a long period) only in
locations where they cannot block other agents, thereby avoiding deadlocks. This is inspired
by warehouse robots that are only allowed to charge batteries or pick up and drop off inventory
pods in locations where they cannot block other robots. Specifically, we present two decoupled
MAPD algorithms, TP and TPTS, that assign a task to and plan a path for one agent at a time and
one centralized MAPD algorithm, CENTRAL, that assigns tasks to and plans paths for multiple
agents at a time. These MAPD algorithms break MAPD down to a sequence of one-shot target-
assignment and one-shot path-planning sub-problems in chronological order and then apply one-
shot target-assignment and one-shot path-planning algorithms to solve them. We prove that these
MAPD algorithms can solve the one-shot target-assignment and one-shot path-planning sub-
problems without backtracking in time, while avoiding deadlocks, for all well-formed MAPD
problem instances. They are thus long-term robust for well-formed MAPD problem instances.
We compare them experimentally with up to 500 agents and 1,000 tasks, thus showcasing their
potential for the long-term coordination of target-assignment and path-planning operations for
large-scale multi-agent systems.
1.3.4 Contribution 4: Long-Term Target Assignment and Path Planning with
Kinematic Constraints
We consider generating plan-execution schedules for MAPD that take kinematic constraints of
real-world agents into account. Realistic simulations of multi-agent systems with kinematic con-
straints provide important test beds for demonstrating the practicability of multi-agent coordina-
tion algorithms for their real-world applications. However, existing target-assignment and path-
planning problems do not model kinematic constraints. For example, they assume discrete agent
movements with uniform velocity and ignore the sizes and the velocities of real-world agents.
The resulting solutions may be unsafe for real-world agents to execute. MAPF and TAPF algo-
rithms can use the existing polynomial-time procedure MAPF-POST in a post-processing step
to take kinematic constraints into account, thus transforming their solutions into plan-execution
schedules that can be safely executed by real-world agents. However, it remains unclear whether
and, if so, how MAPD algorithms can take kinematic constraints into account to compute such
plan-execution schedules.
Therefore, we conduct a case study of MAPD with kinematic constraints of real-world agents
in a simulated automated warehouse system. We demonstrate how MAPD algorithms can be
made to produce kinematically feasible plan-execution schedules, showcasing their generality
for real-world applications of multi-agent systems. In particular, we present two methods: First,
we adapt one of our MAPD algorithms, TP, using a novel one-shot single-agent path-planning
algorithm SIPPwRT that computes paths with continuous agent movements with given veloc-
ities. The resulting algorithm TP-SIPPwRT directly takes kinematic constraints of real-world
agents into account during planning, computes plan-execution schedules for the agents, provides
guaranteed user-specified safety distances between the agents, and remains long-term robust for
well-formed MAPD problem instances. Second, as a baseline method, MAPD algorithms can
use the existing polynomial-time procedure MAPF-POST to transform their solutions into plan-
execution schedules in a post-processing step. We compare the two methods and demonstrate
their benefits for real-world applications of multi-agent systems using both an agent simulator
and a standard robot simulator. For example, we demonstrate that TP-SIPPwRT can compute so-
lutions for 250 agents and 2,000 tasks in seconds of runtime. These results showcase the benefits
of our MAPD algorithms for the long-term coordination of target-assignment and path-planning
operations for real-world applications of multi-agent systems.
1.4 Dissertation Outline
The outline of this dissertation is as follows: In Chapter 2, we provide formal definitions of the
problems studied in this dissertation. In Chapter 3, we provide an in-depth overview of differ-
ent target-assignment and path-planning problems. In Chapter 4, we introduce a unified NP-
hardness proof structure that lays the theoretical foundations for problems covered in this dissertation. In Chapter 5, we study TAPF, which models the one-shot coordination of target-assignment and path-planning operations. In Chapter 6, we study MAPD, which models the long-term coordination of target-assignment and path-planning operations. In Chapter 7, we study MAPD
with kinematic constraints of real-world agents. In Chapter 8, we conclude the dissertation by
summarizing our contributions and providing an outline of possible future work.
Chapter 2
Problem Definitions
In this chapter, we formalize different target-assignment and path-planning problems on a graph
whose vertices model the locations in the environment and whose edges model the connections
between locations. We formally define MAPF in Section 2.1 and Anonymous MAPF in Sec-
tion 2.2. We then formally define three novel target-assignment and path-planning problems,
TAPF in Section 2.3, PERR in Section 2.4, and MAPD in Section 2.5, that we study in the later
chapters of this dissertation. We formalize these problems as variants of MAPF, discuss their
relationships to MAPF and between themselves, and give examples of their problem instances.
Finally, we conclude the chapter in Section 2.6.
2.1 Problem Definition of MAPF
We now formalize the problem of Multi-Agent Path Finding (MAPF). A MAPF problem instance
consists of:
• A given finite connected undirected graph G = (V,E), whose vertices V correspond to
locations and whose edges E correspond to connections between locations that the agents
can move along.
• A given set of M agents {ai|i ∈ [M ]}.1 Each agent ai has a start vertex si ∈ V and a
goal vertex gi ∈ V (that represents the preassigned target). All start vertices are pairwise
different. All goal vertices are also pairwise different.
At each discrete time step, each agent ai either moves to an adjacent vertex or waits at the
same vertex. Let πi(t) denote the vertex occupied by agent ai at time step t = 0, . . . ,∞.
Definition 2.1. A path πi = 〈πi(0), πi(1), . . . , πi(Ti), πi(Ti + 1), . . .〉 for agent ai satisfies the
following conditions:
1. The agent starts at its start vertex, that is, πi(0) = si.
2. The agent ends at its goal vertex at the arrival time Ti, which is the minimal time step Ti
such that, for all time steps t = Ti, . . . ,∞, πi(t) = gi.
3. The agent always either moves to an adjacent vertex or does not move between two con-
secutive time steps, that is, for all time steps t = 0, . . . ,∞, (πi(t), πi(t + 1)) ∈ E or
πi(t+ 1) = πi(t).
In this dissertation, path πi is written in its short form 〈πi(0), πi(1), . . . , πi(Ti)〉 (agent ai
continues to occupy the last vertex of its path after the arrival time).
Definition 2.2. A vertex collision is a tuple 〈ai, aj , v, t〉, where agents ai and aj occupy the same
vertex v = πi(t) = πj(t) at time step t. An edge collision is a tuple 〈ai, aj , u, v, t〉, where agents
ai and aj traverse the same edge (u, v), where u = πi(t) = πj(t+1) and v = πj(t) = πi(t+1),
in opposite directions between time steps t and t+ 1.
Definition 2.3. A MAPF plan consists of a path πi assigned to each agent ai. A MAPF solution
is a MAPF plan whose paths are collision-free.
Definition 2.4. The makespan max_{i∈[M]} T_i of a MAPF plan is the maximum of the arrival times of all agents at their goal vertices.
1 We let [M] denote the positive integer set {1, . . . , M}.
Figure 2.1: Example of a MAPF problem instance.
Figure 2.2: Graph representation of the MAPF problem instance shown in Figure 2.1.
The problem of MAPF is to find a solution with the smallest makespan. Some existing
research on MAPF also aims to minimize the flowtime (Felner et al., 2017).
Definition 2.5. The flowtime ∑_{i∈[M]} T_i of a MAPF plan is the sum of the arrival times of all agents at their goal vertices.
MAPF is NP-hard to solve optimally for both makespan minimization (Surynek, 2010) and
flowtime minimization (Yu & LaValle, 2013c). The optimal makespan and optimal flowtime of
any MAPF problem instance are both bounded by O(|V|^3) (Yu & Rus, 2015).
2.1.1 MAPF Example
Figure 2.1 shows an example of a MAPF problem instance on a 2D 4-neighbor grid. White cells
are traversable, and black cells are blocked. Colored circles are agents in their given start cells.
A hatched circle is placed in the given goal cell of each agent in the same color as the agent.
Figure 2.2 shows the corresponding graph representation of the 2D 4-neighbor grid. Agent a1
with s1 = B and g1 = D and agent a2 with s2 = A and g2 = E are given. An optimal solution
is {π1 = 〈B,C,D〉, π2 = 〈A,A,C,E〉} with makespan 3.
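The plan-validity conditions of Definitions 2.1 to 2.4 translate directly into code. The following minimal sketch assumes that paths are represented as Python lists of vertex labels and that an agent stays at the last vertex of its (short-form) path after its arrival time; the function names are illustrative, not from the dissertation:

```python
# Validate a MAPF plan: compute the makespan and find all vertex and
# edge collisions between the paths of a plan.

def vertex_at(path, t):
    """Vertex occupied at time step t (agents wait at the goal forever)."""
    return path[t] if t < len(path) else path[-1]

def makespan(paths):
    """Maximum arrival time over all agents (Definition 2.4)."""
    return max(len(p) - 1 for p in paths)

def find_collisions(paths):
    """Return all vertex and edge collisions (Definition 2.2)."""
    collisions = []
    horizon = makespan(paths)
    for t in range(horizon + 1):
        for i in range(len(paths)):
            for j in range(i + 1, len(paths)):
                u_i, u_j = vertex_at(paths[i], t), vertex_at(paths[j], t)
                if u_i == u_j:
                    collisions.append(("vertex", i, j, u_i, t))
                if t < horizon:
                    v_i, v_j = vertex_at(paths[i], t + 1), vertex_at(paths[j], t + 1)
                    if u_i == v_j and u_j == v_i and u_i != u_j:
                        collisions.append(("edge", i, j, u_i, u_j, t))
    return collisions

# The optimal solution of the example: pi_1 = <B,C,D>, pi_2 = <A,A,C,E>.
solution = [["B", "C", "D"], ["A", "A", "C", "E"]]
assert makespan(solution) == 3
assert find_collisions(solution) == []
```

Note that the plan {〈B,C,D〉, 〈A,C,E〉} without the wait action would instead produce a vertex collision at C at time step 1, which is why agent a2 waits one time step.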
2.2 Problem Definition of Anonymous MAPF
We now formalize the problem of Anonymous MAPF. An Anonymous MAPF problem instance
consists of:
• A given finite connected undirected graph G = (V,E), whose vertices V correspond to
locations and whose edges E correspond to connections between locations that the agents
can move along.
• A given set of M agents {ai|i ∈ [M ]} and M targets g1, . . . , gM ∈ V . Each agent ai has
a start vertex si ∈ V . All start vertices are pairwise different. All targets are also pairwise
different.
Definition 2.6. An assignment of targets to agents is a one-to-one mapping σ, determined by a
permutation of [M ], that maps each agent ai to a target gi′ = σ(ai).
An agent can be assigned any one of the targets as long as every target is assigned to and
reached by an agent eventually. Therefore, the agents are interchangeable and thus “anony-
mous”.
At each discrete time step, each agent ai either moves to an adjacent vertex or waits at the
same vertex. Let πi(t) denote the vertex occupied by agent ai at time step t = 0, . . . ,∞.
Definition 2.7. A path πi = 〈πi(0), πi(1), . . . , πi(Ti), πi(Ti + 1), . . .〉 for agent ai satisfies the
following conditions:
1. The agent starts at its start vertex, that is, πi(0) = si.
2. The agent ends at its assigned target at the arrival time Ti, which is the minimal time step
Ti such that, for all time steps t = Ti, . . . ,∞, πi(t) = gi′ = σ(ai).
3. The agent always either moves to an adjacent vertex or does not move between two con-
secutive time steps, that is, for all time steps t = 0, . . . ,∞, (πi(t), πi(t + 1)) ∈ E or
πi(t+ 1) = πi(t).
In this dissertation, path πi is written in its short form 〈πi(0), πi(1), . . . , πi(Ti)〉 (agent ai
continues to occupy the last vertex of its path after the arrival time).
Definition 2.8. A vertex collision is a tuple 〈ai, aj , v, t〉, where agents ai and aj occupy the same
vertex v = πi(t) = πj(t) at the same time step t. An edge collision is a tuple 〈ai, aj , u, v, t〉,
where agents ai and aj traverse the same edge (u, v), where u = πi(t) = πj(t + 1) and v =
πj(t) = πi(t+ 1), in opposite directions between time steps t and t+ 1.
Definition 2.9. An Anonymous MAPF plan consists of an assignment σ of targets to agents and
a path πi assigned to each agent ai. An Anonymous MAPF solution is an Anonymous MAPF
plan whose paths are collision-free.
The assignment σ of an Anonymous MAPF plan is often not written explicitly since it can
be inferred from the paths of the plan.
Definition 2.10. The makespan max_{i∈[M]} T_i of an Anonymous MAPF plan is the maximum of the arrival times of all agents at their assigned targets.
The problem of Anonymous MAPF is to find a solution with the smallest makespan. It can
be solved optimally in polynomial time (Yu & LaValle, 2013a). All Anonymous MAPF problem
instances are solvable, and there always exists a solution with makespan no larger than M + |V| − 1 for any Anonymous MAPF problem instance (Yu & LaValle, 2013a). The optimal makespan of any Anonymous MAPF problem instance is thus bounded from above by M + |V| − 1.
Figure 2.3: Example of an Anonymous MAPF problem instance.
Figure 2.4: Graph representation of the Anonymous MAPF problem instance shown in Figure 2.3.
2.2.1 Anonymous MAPF Example
Figure 2.3 shows an example of an Anonymous MAPF problem instance on a 2D 4-neighbor
grid. White cells are traversable, and black cells are blocked. Blue circles are agents in their
given start cells. A blue hatched circle is placed in each target cell. Figure 2.4 shows the
corresponding graph representation of the 2D 4-neighbor grid. Agent a1 with s1 = B and
agent a2 with s2 = A are given. Two targets g1 = D and g2 = E are given. An optimal solution
is {π1 = 〈B,C,E〉, π2 = 〈A,A,C,D〉} (where σ(a1) = g2 and σ(a2) = g1) with makespan 3.
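Because the agents are interchangeable, an instructive lower bound on the optimal Anonymous MAPF makespan is the best assignment of targets to agents when collisions are ignored. The sketch below is illustrative only: it brute-forces all M! assignments over BFS distances, whereas the polynomial-time algorithm of Yu and LaValle (2013a) uses a flow-based formulation; the function names and the adjacency-list graph encoding are my own:

```python
# Lower bound on the optimal Anonymous MAPF makespan: the best
# assignment of targets to agents, ignoring collisions.
from collections import deque
from itertools import permutations

def bfs_dist(graph, source):
    """Shortest-path distance (in time steps) from source to every vertex."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in graph[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def assignment_lower_bound(graph, starts, targets):
    """Minimize, over all assignments, the maximum start-to-target
    distance; a lower bound on the optimal makespan."""
    dists = [bfs_dist(graph, s) for s in starts]
    return min(
        max(dists[i][g] for i, g in enumerate(perm))
        for perm in permutations(targets)
    )

# Graph of Figure 2.4: C is adjacent to A, B, D, and E.
graph = {"A": ["C"], "B": ["C"], "C": ["A", "B", "D", "E"],
         "D": ["C"], "E": ["C"]}
bound = assignment_lower_bound(graph, ["B", "A"], ["D", "E"])
assert bound == 2
```

On the example, the bound is 2, while the true optimal makespan is 3: both agents must funnel through vertex C, so one of them has to wait.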
2.3 Problem Definition of TAPF
We now formalize the problem of Target Assignment and Path Finding (TAPF). A TAPF problem
instance consists of:
1. A given finite connected undirected graph G = (V,E), whose vertices V correspond to
locations and whose edges E correspond to connections between locations that the agents
can move along.
2. A given set of M agents that are partitioned into K disjoint teams team1, . . . , teamK .
Each team team_k consists of M_k agents a_1^k, . . . , a_{M_k}^k and is given M_k targets g_1^k, . . . , g_{M_k}^k, where M = ∑_{k=1}^{K} M_k. Each agent a_j^k has a unique start vertex s_j^k. All start vertices are pairwise different. All targets are also pairwise different.
Definition 2.11. An assignment of targets of team team_k to agents in the same team is a one-to-one mapping σ_k, determined by a permutation of [M_k], that maps each agent a_j^k in team team_k to a target g_{j′}^k = σ_k(a_j^k) of the same team.
An agent in a team can be assigned any one of the targets of the same team as long as every
target of the team is assigned to and reached by an agent in the team eventually. An agent in a
team cannot be assigned a target of a different team.
At each discrete time step, each agent a_j^k either moves to an adjacent vertex or waits at the same vertex. Let π_j^k(t) denote the vertex occupied by agent a_j^k at time step t = 0, . . . , ∞.
Definition 2.12. A path π_j^k = 〈π_j^k(0), π_j^k(1), . . . , π_j^k(T_j^k), π_j^k(T_j^k + 1), . . .〉 for agent a_j^k satisfies the following conditions:
1. The agent starts at its start vertex, that is, π_j^k(0) = s_j^k.
2. The agent ends at its assigned target at the arrival time T_j^k, which is the minimal time step T_j^k such that, for all time steps t = T_j^k, . . . , ∞, π_j^k(t) = g_{j′}^k = σ_k(a_j^k).
3. The agent always either moves to an adjacent vertex or does not move between two consecutive time steps, that is, for all time steps t = 0, . . . , ∞, (π_j^k(t), π_j^k(t + 1)) ∈ E or π_j^k(t + 1) = π_j^k(t).
In this dissertation, path π_j^k is written in its short form 〈π_j^k(0), π_j^k(1), . . . , π_j^k(T_j^k)〉 (agent a_j^k continues to occupy the last vertex of its path after the arrival time).
Definition 2.13. A vertex collision between an agent a_j^k in team team_k and a different agent a_{j′}^{k′} in team team_{k′} is a tuple 〈team_k, team_{k′}, v, t〉, where agents a_j^k and a_{j′}^{k′} occupy the same vertex v = π_j^k(t) = π_{j′}^{k′}(t) at the same time step t. An edge collision between an agent a_j^k in team team_k and a different agent a_{j′}^{k′} in team team_{k′} is a tuple 〈team_k, team_{k′}, u, v, t〉, where agents a_j^k and a_{j′}^{k′} traverse the same edge (u, v), where u = π_j^k(t) = π_{j′}^{k′}(t + 1) and v = π_{j′}^{k′}(t) = π_j^k(t + 1), in opposite directions between time steps t and t + 1. A collision can occur between two agents in the same team (in this case, k = k′) and between two agents in different teams (in this case, k ≠ k′).
Definition 2.14. A TAPF plan consists of an assignment σ_k of targets to agents for each team team_k and a path π_j^k assigned to each agent a_j^k in each team team_k. A TAPF solution is a TAPF plan whose paths are collision-free.
The assignments σ_k, for all k ∈ [K], of a TAPF plan are often not written explicitly since they can be inferred from the paths of the plan.
Definition 2.15. Given paths for all agents in team team_k, the team cost max_{j∈[M_k]} T_j^k of team team_k is the maximum of the arrival times of all agents in the team at their targets.
Definition 2.16. The makespan max_{k∈[K], j∈[M_k]} T_j^k of a TAPF plan is the maximum of the arrival times of all agents at their assigned targets.
The problem of TAPF is to find a solution with the smallest makespan.
2.3.1 TAPF as a Generalization of MAPF
TAPF generalizes both Anonymous MAPF and (Non-Anonymous) MAPF:
Figure 2.5: Example of a TAPF problem instance.
• Anonymous MAPF [K = 1] (also called Permutation-Invariant (Kloder & Hutchinson,
2006) or Unlabeled MAPF (Solovey & Halperin, 2016)) results from TAPF if only one
team exists that consists of all M agents. It is called “anonymous” because a target can be
assigned to any agent, and the agents are thus interchangeable.
• (Non-Anonymous) MAPF [K = M ] (often just called MAPF) results from TAPF if
every team consists of exactly one agent and the number of teams is thus equal to the
number of agents. It is called “non-anonymous” because a target can be assigned to only
one specific agent (meaning that the assignments of targets to agents are pre-determined),
and the agents are thus non-interchangeable.
If we compare the problem definition of TAPF to that of MAPF, then we observe that a
MAPF problem instance can be obtained from a TAPF problem instance by fixing the assignment
of targets to agents and setting the goal vertices of the agents corresponding to their assigned
targets. Therefore, TAPF is a generalization of MAPF that allows any assignment of targets of
a team to the agents in the same team. Any solution to a TAPF problem instance is thus also
a solution to its corresponding MAPF problem instance for a suitable assignment of targets to
agents. Since the makespan of any optimal MAPF solution is bounded from above by O(|V|^3) (Yu & Rus, 2015), the makespan of any optimal TAPF solution is also bounded from above by O(|V|^3).
Figure 2.6: Graph representation of the TAPF problem instance shown in Figure 2.5.
2.3.2 TAPF Example
Figure 2.5 shows an example of a TAPF problem instance on a 2D 4-neighbor grid. White
cells are traversable, and black cells are blocked. Colored circles are agents in their given start
cells. Each color represents a team. A hatched circle is placed in a target cell in the color of
its team. Figure 2.6 shows the corresponding graph representation of the 2D 4-neighbor grid.
Two teams team_1 (blue) and team_2 (green) are given. Team team_1 is given target g_1^1 = E and consists of agent a_1^1 with start vertex C. Team team_2 is given targets g_1^2 = D and g_2^2 = F and consists of agents a_1^2 and a_2^2 with start vertices A and B, respectively. Both teams have paths of team cost 2 that are collision-free among agents in the same team: {π_1^1 = 〈C,D,E〉} and {π_1^2 = 〈A,B,D〉, π_2^2 = 〈B,D,F〉}, respectively. However, there is a vertex collision 〈team_1, team_2, D, 1〉 between agent a_1^1 in team team_1 and agent a_2^2 in team team_2 for these paths. An optimal solution is {π_1^1 = 〈C,D,E〉, π_1^2 = 〈A,A,B,D〉, π_2^2 = 〈B,B,D,F〉} (where σ_1(a_1^1) = g_1^1, σ_2(a_1^2) = g_1^2, and σ_2(a_2^2) = g_2^2) with makespan 3.
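Definitions 2.15 and 2.16 can be evaluated directly from the paths of a plan. A minimal sketch, assuming each team's paths are stored as lists of vertex labels keyed by team name (the encoding and function names are my own), applied to the optimal solution of the example:

```python
# Team cost and makespan of a TAPF plan (Definitions 2.15 and 2.16).

def team_cost(paths):
    """Maximum arrival time over the agents of one team (Definition 2.15)."""
    return max(len(p) - 1 for p in paths)

def tapf_makespan(plan):
    """Maximum arrival time over all agents of all teams (Definition 2.16)."""
    return max(team_cost(paths) for paths in plan.values())

# Optimal solution of the example shown in Figure 2.6.
plan = {
    "team1": [["C", "D", "E"]],                             # agent 1 of team 1
    "team2": [["A", "A", "B", "D"], ["B", "B", "D", "F"]],  # agents 1, 2 of team 2
}
assert team_cost(plan["team1"]) == 2
assert team_cost(plan["team2"]) == 3
assert tapf_makespan(plan) == 3
```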
2.4 Problem Definition of PERR
We now formalize the problem of Package-Exchange Robot Routing (PERR). A PERR problem
instance consists of:
1. A given finite connected undirected graph G = (V,E), whose vertices V correspond to
locations and whose edges E correspond to connections between locations that the agents
can move along.
2. A given set of M packages {pi|i ∈ [M ]} and a given set of M agents. Each package
pi has a start vertex si and a goal vertex gi (that represents the preassigned target) and is
carried by an agent initially. All start vertices are pairwise different. All goal vertices are
also pairwise different.
At each discrete time step, each package pi, carried by an agent, either moves to an adjacent
vertex or waits at its current vertex. Let πi(t) denote the vertex occupied by the agent carrying
package pi at time step t = 0, . . . ,∞.
Definition 2.17. A path πi = 〈πi(0), πi(1), . . . , πi(Ti), πi(Ti + 1), . . .〉 for package pi satisfies
the following conditions:
1. The package starts at its start vertex, that is, πi(0) = si.
2. The package ends at its goal vertex at the arrival time Ti, which is the minimal time step
Ti such that, for all time steps t = Ti, . . . ,∞, πi(t) = gi.
3. The package (carried by an agent) always either moves to an adjacent vertex or does not
move, that is, for all time steps t = 0, . . . ,∞, (πi(t), πi(t+ 1)) ∈ E or πi(t+ 1) = πi(t).
In this dissertation, path πi is written in its short form 〈πi(0), πi(1), . . . , πi(Ti)〉 (package pi
continues to be at the last vertex of its path after the arrival time).
Definition 2.18. A vertex collision is a tuple 〈pi, pj , v, t〉, where the agent carrying package pi
and the agent carrying package pj occupy the same vertex v = πi(t) = πj(t) at the same time
step t.
Definition 2.19. A PERR plan consists of a path πi assigned to each package pi. A PERR
solution is a PERR plan whose paths are collision-free.
Definition 2.20. The makespan max_{i∈[M]} T_i of a PERR plan is the maximum of the arrival times of all packages at their goal vertices.
The problem of PERR is to find a solution with the smallest makespan.
Our unified NP-hardness proof structure presented in Chapter 4 can be used to derive com-
putational complexity results for both makespan minimization and flowtime minimization.
Definition 2.21. The flowtime ∑_{i∈[M]} T_i of a PERR plan is the sum of the arrival times of all packages at their goal vertices.
2.4.1 PERR as a Relaxation of MAPF
In PERR, two packages pi and pj can be exchanged in a single time step by the agents carrying
them when they are at adjacent vertices. If this happens, then pi and pj traverse the same edge
in opposite directions, resulting in πi(t) = πj(t+ 1) and πj(t) = πi(t+ 1).
Definition 2.22. For packages pi with πi(t) = u and pj with πj(t) = v where (u, v) ∈ E, an
exchange operation moves package pi from vertex u to vertex v and package pj from vertex v
to vertex u between time steps t and t+ 1.
If we compare the problem definition of PERR to that of MAPF, PERR is almost identical
to MAPF except that MAPF does not allow exchange operations and treats them as edge colli-
sions (Definition 2.2). If we view the packages that are moved by agents (from their given start
vertices to their given goal vertices) as packages that move by themselves (just like the agents
in MAPF that move from their given start vertices to their given goal vertices), then PERR is
a relaxation of MAPF that permits exchange operations (and thus omits the edge collisions of
MAPF). Therefore, there is a correspondence between a MAPF problem instance on a graph
G = (V,E) that has agents {ai|i ∈ [M ]}, with start vertex si ∈ V and goal vertex gi ∈ V for
each agent ai, and a PERR problem instance on the same graph G = (V,E) that has packages
{pi|i ∈ [M ]}, with start vertex si ∈ V and goal vertex gi ∈ V for each package pi.
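The distinction between an exchange operation (legal in PERR) and an edge collision (forbidden in MAPF) is mechanical to check. A small sketch, with an illustrative function name and paths as Python lists of vertex labels:

```python
# Detect a swap of adjacent vertices between two packages (or agents)
# between time steps t and t+1. PERR permits this as an exchange
# operation (Definition 2.22); MAPF forbids it as an edge collision.

def classify_swap(path_i, path_j, t):
    """Return 'exchange' if the two paths swap distinct vertices between
    time steps t and t+1; otherwise return None."""
    if (path_i[t] == path_j[t + 1] and path_j[t] == path_i[t + 1]
            and path_i[t] != path_j[t]):
        return "exchange"
    return None

# Packages at adjacent vertices C and D swap in a single time step.
p1 = ["C", "D"]
p2 = ["D", "C"]
assert classify_swap(p1, p2, 0) == "exchange"
```

Such a swap finishes both packages in one time step in PERR, whereas in MAPF one of the two agents would have to detour or wait.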
Figure 2.7: Example of a PERR problem instance.
Figure 2.8: Graph representation of the PERR problem instance shown in Figure 2.7.
2.4.2 PERR Example
Figure 2.7 shows an example of a PERR problem instance on a 2D 4-neighbor grid. It corre-
sponds to the MAPF problem instance shown in Figure 2.1. White cells are traversable, and
black cells are blocked. Colored circles are packages, each carried by an agent, in their given
start cells. A hatched circle is placed in the given goal cell of each package in the same color
as the package. Figure 2.8 shows the corresponding graph representation of the 2D 4-neighbor
grid. Package p1 with s1 = B and g1 = D and package p2 with s2 = A and g2 = E are given.
An optimal solution is {π1 = 〈B,C,D〉, π2 = 〈A,A,C,E〉} with makespan 3.
2.4.3 K-PERR
We also formalize a generalization of PERR, namelyK-Type Package-Exchange Robot Routing
(K-PERR), where the packages and goal vertices are partitioned into K types. If there are Mk
packages of type k (k ∈ [K]), then there must also be Mk goal vertices of type k. Packages of
the same type are interchangeable: Each package of type k must be delivered to a different goal
vertex of type k. Each goal vertex of type k must receive a package of type k. The problem of
K-PERR is to minimize the makespan, namely the maximum of the arrival times of all packages
at their goal vertices. PERR is thus a special case of K-PERR with K = M by definition.
We recall that, in TAPF, agents are partitioned into teams and agents in the same team are
interchangeable. K-PERR is thus a relaxation of TAPF with K teams that permits exchange op-
erations (and thus omits the edge collisions of TAPF) in the same sense that PERR is a relaxation
of MAPF. Without loss of generality, we assume that packages and goal vertices in K-PERR with the same type are given contiguous indices, that is, we can group them into K types such that package p_i and goal vertex g_i are of (a unique) type k where i ∈ [1 − M_k + ∑_{l=1}^{k} M_l, ∑_{l=1}^{k} M_l]. Therefore, there is a correspondence between a TAPF problem instance on a graph G = (V,E) that has K teams of agents {a_j^k | k ∈ [K], j ∈ [M_k]}, with start vertex s_j^k ∈ V for each agent a_j^k and targets g_1^k, . . . , g_{M_k}^k ∈ V for each team team_k, and a K-PERR problem instance on the same graph G = (V,E) that has K types of packages {p_i | i ∈ [M]}, with start vertex s_i = s_j^k ∈ V and goal vertex g_i = g_j^k ∈ V of type k for each package p_i of type k, where i = j − M_k + ∑_{l=1}^{k} M_l.
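The contiguous-indexing convention, under which the j-th package of type k receives the global index i = j − M_k + (M_1 + · · · + M_k), can be sketched as follows (the function name and the list-of-type-sizes encoding are illustrative):

```python
# Contiguous global indexing of K-PERR packages by type: each type
# occupies one contiguous block of indices.

def global_index(j, k, type_sizes):
    """Global index of the j-th package of type k (both 1-based),
    given the list [M_1, ..., M_K] of type sizes."""
    prefix = sum(type_sizes[:k])        # M_1 + ... + M_k
    return j - type_sizes[k - 1] + prefix

# Two types with M_1 = 1 and M_2 = 2 (as in the TAPF example):
sizes = [1, 2]
assert global_index(1, 1, sizes) == 1                          # type-1 block: {1}
assert [global_index(j, 2, sizes) for j in (1, 2)] == [2, 3]   # type-2 block: {2, 3}
```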
2.4.4 1-PERR
1-PERR is a special case of K-PERR where only K = 1 type of packages exists. We know
from above that 1-PERR is a relaxation of Anonymous MAPF (TAPF with one team) that per-
mits exchange operations (and thus omits the edge collisions of Anonymous MAPF). There is a
correspondence between an Anonymous MAPF problem instance on a graph G = (V,E) that
has agents {ai|i ∈ [M ]}, with start vertex si ∈ V and goal vertex gi ∈ V for each agent ai,
and a 1-PERR problem instance on the same graph G = (V,E) that has packages {pi|i ∈ [M ]},
with start vertex si ∈ V and goal vertex gi ∈ V for each package pi.
2.5 Problem Definition of MAPD
We now formalize the problem of Multi-Agent Pickup and Delivery (MAPD). A MAPD problem
instance consists of:
1. A given finite connected undirected graph G = (V,E), whose vertices V correspond to
locations and whose edges E correspond to connections between locations that the agents
can move along.
2. A given set ofM agents {ai|i ∈ [M ]}. Each agent has an initial vertex. All initial vertices
are pairwise different.
3. A task set T that contains the set of unexecuted tasks in the system. The task set changes
dynamically as, at each time step, new tasks can be added to the system. Each task τj ∈ T
is characterized by a pickup vertex sj ∈ V and a delivery vertex gj ∈ V and is added to
the system at an unknown (finite) time step. A task is known and available for execution
only from the time step on when it has been added to the system.
At each discrete time step, each agent ai either moves to an adjacent vertex or waits at the
same vertex. Let πi(t) ∈ V denote the vertex occupied by agent ai at time step t. When the
system starts (at time step 0), agent ai starts at its given initial vertex πi(0).
For ease of exposition, we refer to both (1) an entire sequence of vertices traversed by an
agent in a MAPD solution (when describing the long-term problem) and (2) a sequence of ver-
tices for an agent to execute a task (when describing one-shot sub-problems) as a path for MAPD.
We thus do not require a path to start at a predefined vertex and end at a predefined vertex.
Definition 2.23. A path πi = 〈πi(0), πi(1), . . . , πi(Ti), πi(Ti + 1), . . .〉 for agent ai satisfies the
following condition: The agent always either moves to an adjacent vertex or does not move, that
is, for all time steps t = 0, . . . ,∞, (πi(t), πi(t+ 1)) ∈ E or πi(t+ 1) = πi(t).
Agents need to avoid collisions with each other:
Definition 2.24. A vertex collision is a tuple 〈ai, ai′ , v, t〉 where agent ai and agent ai′ occupy
the same vertex v = πi(t) = πi′(t) at the same time step t. An edge collision is a tuple
〈ai, ai′ , u, v, t〉 where agent ai and agent ai′ traverse the same edge (u, v), where u = πi(t) =
πi′(t+ 1) and v = πi′(t) = πi(t+ 1), in opposite directions between time steps t and t+ 1.
We use the following definitions to model single-agent tasks that each can be assigned to
one agent at a time and single-task agents that each can execute one task at a time (Gerkey &
Mataric, 2004).
Definition 2.25. An agent is called free if and only if it is currently not executing any task.
Otherwise, it is called occupied.
Definition 2.26. A task can be assigned to one agent at a time. A free agent can be assigned
any task τj ∈ T . In order to execute task τj , it then has to move first from its current vertex to
the pickup vertex sj of the task and then from there to the delivery vertex gj of the task. When
the agent reaches the pickup vertex, it starts to execute the task and removes the task from T .
When it reaches the delivery vertex, it finishes executing the task, which implies that it becomes
free again and is no longer assigned the task. Any free agent can be assigned any task in the task
set. An agent can be assigned a different task in the task set while it is still moving to the pickup
vertex of its currently assigned task but has to finish executing the task after it has reached the
pickup vertex of the task before it can be assigned another task.
Definition 2.27. The service time is the average number of time steps needed to finish executing
each task after it was added to the system.
Definition 2.28. The makespan is the earliest time step when all tasks are finished.
The problem of MAPD is to find collision-free paths for the agents to finish executing all
tasks. The effectiveness of a MAPD algorithm is evaluated by the service time or makespan.
Definition 2.29. A MAPD algorithm solves a MAPD problem instance if and only if the result-
ing service time of all tasks is bounded (or, equivalently, the resulting makespan is bounded). A
MAPD algorithm is long-term robust if and only if it solves all MAPD problem instances with
finitely many tasks.
2.5.1 MAPD as a Long-Term Generalization of One-Shot Problems
In MAPD, the one-shot problem at any time step can be viewed as a variant of Anonymous
MAPF, MAPF, and TAPF. The free agents are anonymous since they can be assigned any task
in the task set. Therefore, the one-shot problem for any given set of free agents,
namely to assign (pickup vertices of) tasks in the task set to and find collision-free paths for
the agents in the set to the pickup vertices of their assigned tasks, is similar to Anonymous
MAPF. The difference is that the number of free agents in the set is not necessarily equal to the
number of (pickup vertices of) tasks available for assignment. The occupied agents are non-
anonymous since they cannot change their current targets (delivery vertices of their assigned
tasks). Therefore, the one-shot problem for any given set of occupied agents, namely to find
collision-free paths for the agents in the set to the delivery vertices of the tasks they are currently
executing, is similar to MAPF. Similarly, the one-shot problem for any given set of free and
occupied agents is similar to TAPF where all free agents in the set form one team and each
occupied agent in the set forms a single-agent team.
2.5.2 MAPD Example
Figure 2.9 shows an example of a MAPD problem instance on a 2D 4-neighbor grid. Colored
circles are agents. Dashed circles represent pickup and delivery vertices. Figure 2.10 shows the
corresponding graph representation of the 2D 4-neighbor grid. Two free agents a1 (in blue) with
π1(0) = A and a2 (in green) with π2(0) = E are given. There is only one task τ1 with s1 = A
and g1 = E, which is added to the system at time step 0. A solution is to assign task τ1 and
path 〈A,A,C,E〉 to agent a1 and path 〈E,C,B〉 to agent a2 with service time 3 and makespan
Figure 2.9: Example of a MAPD problem instance.
Figure 2.10: Graph representation of the MAPD problem instance shown in Figure 2.9.
3. Specifically, agent a1 becomes an occupied agent and starts to execute task τ1 at time step 0
and finishes executing the task at time step 3.
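Both evaluation measures of Definitions 2.27 and 2.28 can be computed from the time step at which each task was added and the time step at which its execution finished. A minimal sketch (the tuple encoding of tasks and the function names are my own assumptions):

```python
# Service time and makespan for MAPD (Definitions 2.27 and 2.28).
# Each task is a pair (added, finished) of time steps.

def service_time(tasks):
    """Average number of time steps from a task's addition to the
    system until its execution is finished (Definition 2.27)."""
    return sum(finished - added for added, finished in tasks) / len(tasks)

def mapd_makespan(tasks):
    """Earliest time step by which all tasks are finished (Definition 2.28)."""
    return max(finished for _, finished in tasks)

# The single task of the example: added at step 0, finished at step 3.
tasks = [(0, 3)]
assert service_time(tasks) == 3.0
assert mapd_makespan(tasks) == 3
```

With several tasks the two measures diverge: for tasks [(0, 2), (1, 5)], the service time is 3.0 but the makespan is 5.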
2.6 Summary
In this chapter, we gave the formal definitions of MAPF, Anonymous MAPF, TAPF, PERR, and
MAPD. Figure 2.11 summarizes the relationships between these problems. Table 2.1 summa-
rizes the symbols used in the definitions.
Figure 2.11: Relationships between different problems.
Table 2.1: Summary of symbols. The last column specifies if the symbols are used for specific problems.

symbol          | description                                     | problem
G = (V,E)       | graph that models the environment               |
u, v ∈ V        | used for vertices                               |
A, B, . . . , F | specific vertices in examples                   |
a               | used for agents                                 |
p               | used for packages                               | (K-)PERR
M               | number of agents or packages                    |
K               | number of teams of agents or types of packages  | TAPF, K-PERR
τ               | used for tasks                                  | MAPD
T               | task set                                        | MAPD
s ∈ V           | used for start vertices                         | all except MAPD
                | used for pickup vertices of tasks               | MAPD
g ∈ V           | used for goal vertices (preassigned targets)    | MAPF, (K-)PERR
                | used for targets to be assigned to agents       | Anonymous MAPF, TAPF
                | used for delivery vertices of tasks             | MAPD
π               | used for paths                                  |
t               | discrete time step                              |
T               | used for arrival times                          |
Chapter 3
Target Assignment and Path Planning for Teams of
Agents
In this chapter, we provide a review of literature on different target-assignment and path-planning
problems. We first provide an overview of the one-shot path-planning problem MAPF as an
essential element of coordinating target-assignment and path-planning operations of teams of
agents in Section 3.1. We provide detailed descriptions of three MAPF algorithms in Section 3.2.
We describe how MAPF solutions can be safely executed by real-world agents in Section 3.3.
We then survey existing research on other one-shot and long-term target-assignment and path-
planning problems in Section 3.4. Finally, we conclude the chapter in Section 3.5.
3.1 One-Shot Path-Planning Problem: MAPF
Many recent research efforts have concentrated on tackling the one-shot path-planning problem
for multiple agents by studying the standard problem formulation MAPF (Ma & Koenig, 2017;
Ma, Koenig, et al., 2016). MAPF has been well-studied by researchers from the AI, robotics,
theoretical computer science, and operations research communities under different names, in-
cluding Cooperative Path Finding (Silver, 2005), Pebble Motion on Graphs (Kornhauser et al.,
1984), and Multi-Robot Path Planning (Yu & LaValle, 2016).
In the following, we first give a survey of theoretical results for MAPF in Section 3.1.1.
We then provide an overview of MAPF algorithms in Section 3.1.2. Finally, we survey other
one-shot path-planning problems that are modeled as extensions of MAPF in Section 3.1.3.
3.1.1 Theoretical Results for MAPF
Now, we provide an overview of existing theoretical results for MAPF. We categorize them into
results for pebble motion on graphs with a time-irrelevant objective function (total number of
edge traversals) and results for MAPF with time-relevant objective functions (makespan and
flowtime).
3.1.1.1 15-Puzzle and Pebble Motion on Graphs with Time-Irrelevant Objectives
In the theoretical computer science and operations research communities, MAPF originates from
the study of the 15-puzzle (Johnson & Story, 1879), which can be viewed as a special case of MAPF
on a 4 × 4 2D 4-neighbor grid with 15 agents. The solvability of a 15-puzzle problem instance
depends on the parity of the permutation. Pebble motion on graphs (Kornhauser et al., 1984) is
a generalization of the 15-puzzle and can also be viewed as a special case of MAPF with at most
M = |V| − 1 agents. In the problem of pebble motion on graphs, an agent can move from its
current vertex only to an adjacent vertex that is not currently occupied by another agent. The
solution quality is measured by the total number of edge traversals that move all agents from
their start vertices to their goal vertices. There exists a complete O(|V|^3)-time algorithm that finds a solution of O(|V|^3) edge traversals to any problem instance of pebble motion on graphs
or decides that the problem instance is unsolvable (Kornhauser et al., 1984; Roger & Helmert,
2012). There even exists a linear-time algorithm that decides whether a problem instance of
pebble motion on graphs is solvable (Goraly & Hassin, 2010). However, it is NP-hard to find
a solution with the minimum total number of edge traversals to a problem instance of pebble
motion on graphs (Goldreich, 2011; Ratner & Warmuth, 1986).
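For the special case in which the blank occupies the same cell in the start and goal configurations, the parity condition of Johnson and Story (1879) reduces to counting inversions: the instance is solvable if and only if the tile permutation is even. A sketch (the function name and list encoding are illustrative):

```python
# Solvability of a 15-puzzle instance with the blank fixed in the same
# cell in the start and goal configurations: the instance is solvable
# iff the permutation of the 15 tiles is even, i.e., has an even
# number of inversions (pairs of tiles appearing out of order).

def permutation_is_even(tiles):
    """True iff the tile permutation (blank omitted) is even."""
    inversions = sum(
        1
        for a in range(len(tiles))
        for b in range(a + 1, len(tiles))
        if tiles[a] > tiles[b]
    )
    return inversions % 2 == 0

solved = list(range(1, 16))
assert permutation_is_even(solved)               # identity permutation: solvable
swapped = [2, 1] + list(range(3, 16))
assert not permutation_is_even(swapped)          # a single swap: unsolvable
```

The second instance is Sam Loyd's famous "14-15 puzzle" variant (tiles 14 and 15 swapped), which this parity test shows to be unsolvable.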
3.1.1.2 MAPF with Time-Relevant Objectives
In the AI community, MAPF attempts to model robots in real-world applications rather than peb-
bles on game boards and thus often uses time-relevant objective functions, such as the makespan
or the flowtime, that assign costs to wait actions (staying at the same vertex for one time step) as
in Definitions 2.1 and 2.4, in addition to the time-irrelevant objective function (the total number of edge traversals). The makespan and flowtime of any MAPF problem instance are bounded
from above by O(|V |3), based on the result that there exists a complete O(|V |3)-time algorithm
that finds a solution of O(|V |3) edge traversals to any MAPF problem instance or decides that
the MAPF problem instance is unsolvable (Yu & Rus, 2015). However, it is NP-hard to find a
solution with the minimum makespan (Surynek, 2010) or the minimum flowtime (Yu & LaValle,
2013c) to a MAPF problem instance, even if the given graph is a planar graph (Yu, 2016) or a
2D 4-neighbor grid (Banfi, Basilico, & Amigoni, 2017). Any two of the objective functions, the
total number of edge traversals, makespan, and flowtime, cannot be simultaneously minimized
for MAPF (Yu & LaValle, 2013c).
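As a concrete reading of these objectives (assuming, as in Definitions 2.1 and 2.4, that a path lists one vertex per time step and that the arrival time of an agent is the index of the last vertex on its path), the makespan and flowtime of a plan can be computed as:

```python
def makespan(plan):
    """Latest arrival time over all agents (a path stores one vertex
    per time step, so the arrival time is its last index)."""
    return max(len(path) - 1 for path in plan)

def flowtime(plan):
    """Sum of the arrival times of all agents."""
    return sum(len(path) - 1 for path in plan)
```

For the plan {π1 = 〈B,C,D〉, π2 = 〈A,A,C,E〉}, for example, the makespan is 3 and the flowtime is 2 + 3 = 5.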
3.1.2 MAPF Algorithms
Now, we provide an overview of existing MAPF algorithms. We categorize them into reduction-
based, rule-based, and search-based algorithms, based on their methodologies. We highlight
their properties in terms of completeness (complete for all MAPF problem instances, complete
for MAPF problem instances on graphs with special properties, or incomplete) and optimality
(optimal, bounded-suboptimal, or suboptimal with respect to different objectives). A survey and
limited experimental evaluation of some of these algorithms can be found in Felner et al. (2017).
3.1.2.1 Reduction-Based Algorithms
Reduction-based algorithms reduce MAPF to other well-studied combinatorial problems, such
as Boolean Satisfiability (Surynek, 2012), Integer Linear Programming (ILP) (Yu & LaValle,
2013b), and Answer Set Programming (Erdem, Kisa, Oztok, & Schueller, 2013). These algo-
rithms often construct an explicit representation of the state space of a MAPF problem instance
up to some value of the makespan (planning horizon) by using variables and posting constraints
for the variables. They solve a decision problem for each value of the makespan and increase
the value if no solution exists for the current one. They are complete for solving MAPF and
naturally designed for minimizing the makespan. They can be modified to solve MAPF with
other objectives optimally (Surynek, Felner, Stern, & Boyarski, 2016; Yu & LaValle, 2013b),
bounded-suboptimally (within a user-provided suboptimality factor) (Surynek, Felner, Stern, &
Boyarski, 2017), and suboptimally (Surynek, 2015). We describe an ILP-based MAPF algorithm
in Section 3.2.3.
3.1.2.2 Rule-Based Algorithms
Rule-based algorithms solve MAPF using a set of primitive operations that specify the actions
of the agents in different situations. They often guarantee completeness for a restricted class of
MAPF problem instances. Rule-based algorithms are often very efficient because they simply follow the
predefined primitive operations, but they provide no guarantee on solution quality. Push and Swap
(Luna & Bekris, 2011) and its extension (Sajid, Luna, & Bekris, 2012) provide no completeness
guarantee. One of their descendants, Push and Rotate (de Wilde, ter Mors, & Witteveen, 2013),
is complete for MAPF problem instances on graphs with at most M = |V | − 2 agents. TASS
(Khorshid, Holte, & Sturtevant, 2011) is complete for MAPF problem instances on “solvable”
trees based on prior work on solving multi-robot motion planning on trees (Masehian & Ne-
jad, 2009). BIBOX (Surynek, 2009) is complete for MAPF problem instances on bi-connected
graphs with at most M = |V | − 2 agents. Split and Group (Yu, 2017) is complete for MAPF
problem instances on grid-like “well-connected” graphs, runs in polynomial time, and provides
a constant-factor approximation guarantee for minimizing the makespan on such graphs.
Some algorithms combine both primitive operations and search. FAR (Wang & Botea, 2008)
and MAPP (Wang & Botea, 2011) explore different ways of combining paths of individual
agents. MAPP is complete for MAPF problem instances on “slidable” graphs (Wang & Botea,
2011). This research has resulted in Wang’s dissertation (Wang, 2012). There is also a MAPF
algorithm that uses a combination of A* searches on a graph abstraction, primitive operations,
and reductions to constraint satisfaction problems (Ryan, 2008; Ryan, 2010).
3.1.2.3 Search-Based Algorithms
The main computational challenge of optimally solving MAPF with a search algorithm is that the
number of possible states of a MAPF problem instance is exponential in the number of agents:
Each (joint) state is a combination of the vertices of all agents, and each state-transition operator
is a combination of the actions (moving to one of the adjacent vertices or waiting) of all agents.
Search-based MAPF algorithms tackle this challenge by using different strategies to reduce the
size of the exponential state space.
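To make the exponential branching concrete, the following sketch (a hypothetical helper, not code from the cited works) enumerates the collision-free joint successors of a joint state as the filtered product of the agents' individual actions:

```python
from itertools import product

def joint_successors(adj, state):
    """Enumerate the collision-free joint successors of a joint state
    (one vertex per agent); before filtering, the branching factor is
    the product of the agents' individual action counts."""
    options = [sorted(adj[v] | {v}) for v in state]  # move or wait
    for nxt in product(*options):
        if len(set(nxt)) < len(nxt):
            continue  # vertex collision: two agents in the same vertex
        if any(nxt[i] == state[j] and nxt[j] == state[i]
               and state[i] != state[j]
               for i in range(len(state)) for j in range(i + 1, len(state))):
            continue  # edge collision: two agents swap along an edge
        yield nxt
```

On a path graph A-B, for example, two agents at A and B have 2 x 2 = 4 joint actions, but only joint waiting survives the vertex- and edge-collision filters.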
Decoupled Algorithms The search-based algorithms in the first category decouple MAPF
completely into one-shot single-agent path-finding problems. These algorithms are often effi-
cient but provide no optimality or even completeness guarantee. Many of them are based on
Prioritized Planning (Bennewitz, Burgard, & Thrun, 2002; Erdmann & Lozano-Perez, 1987)
that plans a path for one agent at a time in a fixed order according to their predefined priorities.
Cooperative A* and Hierarchical Cooperative A* (HCA*) (Silver, 2005) are prioritized planning
algorithms that use a space-time A* search (see Section 3.2.1.1) to plan a path for each agent, one
at a time, and avoid collisions of this agent with the agents whose paths they have planned earlier.
Cooperative A* uses pre-computed heuristic values for the space-time A* searches. HCA* was
initially designed for video games with a limited amount of memory and thus needs to compute
the heuristic values for the space-time A* searches at runtime. We describe Cooperative A* in
Section 3.2.1. One of its extensions, Windowed-HCA* (Silver, 2005), only considers the paths of
other agents within a limited planning horizon (time window). Cooperative Partial-Refinement
A* (Sturtevant & Buro, 2006) further improves upon HCA* by abstracting the state space of an
agent and thus reducing the runtime needed to calculate the heuristic values. Conflict-Oriented
Windowed-HCA* (Bnaya & Felner, 2014) first ignores collisions and then dynamically places
time windows around collisions to adjust the paths it has planned.
A*-Based Algorithms The search-based algorithms in the second category use an A* search
to plan with joint states but try to reduce the size of the state space they need to explore. They are
complete for all MAPF problem instances and can be used for either makespan minimization or
flowtime minimization. Independence Detection (Standley, 2010) partitions the agents into inde-
pendent groups where agents in different groups do not collide with each other and thus reduces
the original MAPF problem into several MAPF sub-problems, one for each group. The size of
its state space is thus exponential only in the cardinality of the largest group. Operator Decom-
position (Standley, 2010) decomposes a state-transition operator into M (single-agent) actions
and applies one action at a time, thus affording the A* search pruning opportunities for any two
actions that lead to a collision. Enhanced Partial Expansion A* (Goldenberg et al., 2014) uses an
“operator selection function” to select state-transition operators to avoid generating search nodes
with f -values larger than the optimal solution cost. M* (Wagner & Choset, 2015) dynamically
changes the branching factor and searches with joint states of a set of agents locally only if it has
found a collision among them. M* is part of Wagner’s dissertation (Wagner, 2015).
Hierarchical Algorithms The search-based algorithms in the third category decouple MAPF
into one-shot single-agent path-planning problems on the low level and dynamically couple the
resulting single-agent paths using a best-first tree search on the high level. They are complete for
all MAPF problem instances. Increasing Cost Tree Search (Sharon, Stern, Goldenberg, & Felner,
2013) minimizes the flowtime. On the high level, it systematically considers each combination
of the arrival times of all agents and branches on possible ways of increasing the flowtime by one
if no collision-free paths exist for a combination of arrival times. On the low level, it determines
whether collision-free paths exist for a combination of arrival times. Conflict-Based Search
(CBS) (Sharon et al., 2015) minimizes either the makespan or the flowtime. We describe CBS in
Section 3.2.2. CBS first finds individually optimal paths for all agents (ignoring collisions). On
the high level, it then systematically resolves each collision of the computed paths by imposing
constraints on individual agents that forbid them from occupying a vertex or traversing an edge
at a given time step. On the low level, it uses a space-time A* search to find a path for an agent
that obeys its constraints. The high-level search of CBS branches on which collision to resolve.
Increasing Cost Tree Search and CBS are part of Sharon’s dissertation (Sharon, 2015).
3.1.3 MAPF Extensions and Related Problems
We now survey other one-shot path-planning problems that are modeled as extensions of MAPF.
3.1.3.1 MAPF with Deadlines
In the problem of MAPF with Deadlines (Ma et al., 2018a; 2018b), a deadline (time step) is
given. Its objective is to maximize the number of successful agents, defined as the agents that can
reach their given goal vertices from their given start vertices within the given deadline without
colliding with each other. Its applications include robots that need to evacuate before a disaster
and robots that need to finish tasks before a deadline.
3.1.3.2 MAPF with Delay Probabilities and Related Problems
MAPF with Delay Probabilities (MAPF-DP) (Ma, Kumar, & Koenig, 2017) generalizes MAPF
to the case where the uncertainty of agent motion has to be considered during planning to ensure
a collision-free execution of the plan. In MAPF-DP, the uncertainty of each agent is characterized
by a given delay probability with which the agent stays in its current vertex whenever it intends
to traverse an outgoing edge of its current vertex. The problem of MAPF-DP is to find a plan
that consists of a path for each agent and a plan-execution policy that controls with GO or STOP
commands how each agent proceeds along its path such that no collisions occur during plan
execution. It is also studied as MAPF with Uncertainty (Wagner & Choset, 2017), where the
paths are planned in the belief space of the agents and the execution of the resulting plan is
not guaranteed to be collision-free. There is also research on a similar extension of MAPF that
enforces a certain number of time steps for which a vertex must be unoccupied after it has been
occupied by an agent, which reduces the possibility of collisions during plan execution (Atzmon
et al., 2018).
Other related problems use the Markov Decision Process (MDP) or Partially Observable
Markov Decision Process (POMDP) framework to plan paths under uncertainty for multiple
agents. These problems include POMDP planning for robot navigation (Kurniawati, Hsu, & Lee,
2008; Ma & Pineau, 2015), transition-independent decentralized MDPs (Becker, Zilberstein,
Lesser, & Goldman, 2004; Goldman & Zilberstein, 2004), multi-agent MDPs (Boutilier, 1996),
decentralized sparse-interaction MDPs (Melo & Veloso, 2011), transition-independent multi-
agent MDPs (Scharpff, Roijers, Oliehoek, Spaan, & de Weerdt, 2016), and a framework for
approximating multi-agent MDPs (Liu & Michael, 2016) for multi-agent path planning with
motion dynamics.
3.1.3.3 MAPF for Large Agents
In the problem of MAPF for Large Agents (Li, Surynek, et al., 2019), an agent can occupy more
than one vertex at one time step according to its given shape and volume. Two agents collide if
there is a vertex that both of them occupy at the same time step.
3.2 MAPF Algorithm Examples
Now, we describe one incomplete (and thus suboptimal) and two optimal MAPF algorithms
introduced in Section 3.1.2, which inspire the design of some of the target-assignment and path-
planning algorithms covered in the later chapters of this dissertation. The first one is Cooperative
A* (Silver, 2005), a decoupled search-based algorithm based on Prioritized Planning (Bennewitz
et al., 2002; Erdmann & Lozano-Perez, 1987). The second one is Conflict-Based Search (CBS)
(Sharon et al., 2015), a hierarchical search-based algorithm. The third one is an ILP-based
MAPF algorithm (Yu & LaValle, 2013b), which reduces MAPF to integer multi-commodity
flow problems that are then solved with an ILP formulation.
3.2.1 Cooperative A*
Prioritized Planning (Bennewitz et al., 2002; Erdmann & Lozano-Perez, 1987) is a decoupled
scheme for MAPF. It uses a predefined total order on the agents and reduces MAPF to a one-shot
single-agent path-planning problem for each agent: Prioritized Planning plans for the highest
priority agent first and computes its optimal path. Prioritized Planning then plans for lower and
lower priority agents and computes, for each agent, its individually optimal path that avoids
collisions with all higher priority agents, which are dynamic obstacles that follow their (already
planned) paths. Decoupled MAPF algorithms are complete for all well-formed MAPF problem
instances (Cap et al., 2015).
Definition 3.1. A MAPF problem instance is well-formed if and only if:
1. The start and goal vertices of all agents, called endpoints, are different from each other
except that the start and goal vertices of the same agent can be the same.
2. For any two endpoints, there exists a path between them that traverses no other endpoints.
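Definition 3.1 can be checked directly with one breadth-first search per pair of endpoints; the following is a minimal sketch (the function name and input format are our own, not taken from Cap et al. (2015)):

```python
from collections import deque

def is_well_formed(edges, agents):
    """Check Definition 3.1 for a MAPF instance given as an undirected
    edge list and a list of (start, goal) pairs."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    # Condition 1: no endpoint is shared by two different agents.
    for i, (s1, g1) in enumerate(agents):
        for s2, g2 in agents[i + 1:]:
            if {s1, g1} & {s2, g2}:
                return False
    # Condition 2: each pair of endpoints is connected by a path that
    # traverses no other endpoint (checked with a BFS per pair).
    endpoints = {v for s, g in agents for v in (s, g)}
    for a in endpoints:
        for b in endpoints - {a}:
            blocked = endpoints - {a, b}
            seen, queue = {a}, deque([a])
            while queue:
                for w in adj.get(queue.popleft(), ()):
                    if w not in seen and w not in blocked:
                        seen.add(w)
                        queue.append(w)
            if b not in seen:
                return False
    return True
```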
We now describe a version of Cooperative A* that implements Prioritized Planning by using
a space-time A* search (Silver, 2005) to solve the one-shot single-agent path-planning problem
for each agent individually.
3.2.1.1 Space-Time A* Search
Space-time A* is a one-shot single-agent path-planning algorithm that is used in not only Co-
operative A* but also other search-based MAPF algorithms. A space-time A* search is an A*
search whose states are pairs of vertices and time steps. A directed edge exists from state 〈u, t〉
to state 〈v, t + 1〉 if and only if u = v ∈ V or (u, v) ∈ E. A space-time A* search finds a
time-minimal path (that is, with the minimum arrival time at its goal vertex) for some agent ai
that obeys a set of constraints to avoid collisions with other agents.
Algorithm 3.1: Cooperative A*
Input: MAPF problem instance
1  Plan ← ∅;
2  for i ← 1 . . . M do
3      if Space-time A* search for ai returns no path then
4          return “No Solution”;
5      Add the returned path to Plan;
6  return Plan;
Definition 3.2. A constraint for agent ai is either a vertex constraint 〈ai, v, t〉, that prohibits
agent ai from occupying vertex v at timestep t, or an edge constraint 〈ai, u, v, t〉, that prohibits
agent ai from moving from vertex u to vertex v between timesteps t and t+ 1.
Therefore, state 〈v, t〉 is removed from the state space of agent ai if and only if there is a
vertex constraint 〈ai, v, t〉. Similarly, the edge from state 〈u, t〉 to state 〈v, t + 1〉 is removed
from the state space of agent ai if and only if there is an edge constraint 〈ai, u, v, t〉.
3.2.1.2 Pseudocode
Algorithm 3.1 shows the pseudocode of Cooperative A*. Given a MAPF problem instance
with graph G = (V,E) and the set of agents {ai|i ∈ [M ]}, it plans paths for the agents
in increasing order of their indices [Line 2]. It then uses a space-time A* search to find a
path for each agent ai that obeys the constraints imposed by the already planned paths πj
of all higher priority agents aj [Lines 3-5]. Specifically, each such path πj imposes the set
of constraints {〈ai, πj(0), 0〉, 〈ai, πj(1), πj(0), 0〉, 〈ai, πj(1), 1〉, 〈ai, πj(2), πj(1), 1〉, . . .} for
agent ai. Therefore, the space-time A* search finds a time-minimal path πi for agent ai that
has no collisions with the paths πj of all higher priority agents aj . All planned paths are thus
collision-free. Cooperative A* reports that no solution exists if the space-time A* search of any
agent returns no path [Line 4]. Otherwise, it returns a MAPF plan that contains collision-free
paths for all agents [Line 6].
More details on Cooperative A*, including an analysis of its properties, can be found in
Silver (2005).
3.2.1.3 Example
We describe how Cooperative A* solves the MAPF problem instance shown in Figure 2.1. Co-
operative A* first plans for agent a1 and finds the time-minimal path π1 = 〈B,C,D〉. It then
plans for agent a2 and finds the time-minimal path π2 = 〈A,A,C,E〉 that obeys all constraints
imposed by path π1.
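The example above can be reproduced with a compact Python sketch of Cooperative A* that uses a breadth-first space-time search (which is time-minimal here because every action takes exactly one time step). The graph below is an assumption about Figure 2.1, reconstructed from the paths in the example (a star of four edges around vertex C); the horizon bound, function names, and data format are ours.

```python
from collections import deque

def plan_paths(edges, agents, horizon=32):
    """Cooperative A* sketch: prioritized planning with space-time searches.

    `edges` lists undirected graph edges; `agents` lists (start, goal)
    pairs in decreasing priority order; `horizon` bounds the time steps.
    """
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    vertex_occ = set()  # (v, t): vertex v is occupied at time step t
    edge_occ = set()    # (u, v, t): moving u -> v between t and t+1 is forbidden
    plan = []
    for start, goal in agents:
        path = _spacetime_search(adj, start, goal, vertex_occ, edge_occ, horizon)
        if path is None:
            return None  # incomplete: no path under this priority order
        plan.append(path)
        # The planned agent follows its path and then rests at its goal.
        for t in range(horizon + 1):
            v = path[min(t, len(path) - 1)]
            u = path[min(t + 1, len(path) - 1)]
            vertex_occ.add((v, t))
            edge_occ.add((u, v, t))  # forbid swapping along this edge
    return plan

def _spacetime_search(adj, start, goal, vertex_occ, edge_occ, horizon):
    """BFS over (vertex, time step) states for a time-minimal path."""
    if (start, 0) in vertex_occ:
        return None
    parent = {(start, 0): None}
    queue = deque([(start, 0)])
    while queue:
        v, t = queue.popleft()
        # Accept the goal only if the agent can rest there afterwards.
        if v == goal and all((goal, t2) not in vertex_occ
                             for t2 in range(t, horizon + 1)):
            path, s = [], (v, t)
            while s is not None:
                path.append(s[0])
                s = parent[s]
            return path[::-1]
        if t == horizon:
            continue
        for w in adj.get(v, set()) | {v}:  # move to a neighbor or wait
            if ((w, t + 1) in parent or (w, t + 1) in vertex_occ
                    or (v, w, t) in edge_occ):
                continue
            parent[(w, t + 1)] = (v, t)
            queue.append((w, t + 1))
    return None
```

Under the assumed graph, planning for the agents (B, D) and (A, E) in this order yields the paths 〈B,C,D〉 and 〈A,A,C,E〉 from the example.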
3.2.2 Conflict-Based Search
Conflict-Based Search (CBS) (Sharon et al., 2015) is a two-level search-based MAPF algorithm.
On the low level, CBS uses a space-time A* search (see Section 3.2.1.1) to compute a time-
minimal path for each agent. On the high level, CBS performs a best-first tree search to resolve
collisions in the computed paths. We now describe a version of CBS that is optimal for makespan
minimization.
3.2.2.1 High-Level Search of CBS
CBS performs a best-first search on the high level to resolve collisions among the agents and
build a constraint tree. Each node N contains a set of constraints (Definition 3.2) N.constraints,
a plan N.plan that obeys these constraints, and a cost N.cost equal to the makespan of its plan.
The list OPEN stores all generated but unexpanded nodes. Algorithm 3.2 shows the high-level
search of CBS. CBS starts with the root node, which has an empty set of constraints [Line 1]. It performs a low-level space-time A* search to find an individually time-minimal path for each agent
(without any constraints) independently. It terminates unsuccessfully if the low-level space-time
A* search for any agent returns no path. Otherwise, the plan of the root node contains paths
for all agents [Line 2-6]. The cost of the root node is the makespan of its plan [Line 7]. CBS
inserts the root node into OPEN [Line 8]. If OPEN is empty, then CBS terminates unsuccess-
fully [Lines 9 and 24]. Otherwise, it expands a node N in OPEN with the smallest cost [Line
10] and removes the node from OPEN [Line 11]. Ties are broken in favor of the node whose
plan has the smallest number of pairs of colliding agents, which speeds up the high-level search
Algorithm 3.2: High-Level Search of CBS
Input: MAPF problem instance
1   Root.constraints ← ∅;
2   Root.plan ← ∅;
3   foreach i ∈ [M] do
4       if LowLevel(ai, Root) returns no path then
5           return “No Solution”;
6   Add the returned paths to Root.plan;
7   Root.cost ← Makespan(Root.plan);
8   OPEN ← {Root};
9   while OPEN ≠ ∅ do
10      N ← arg min N′∈OPEN N′.cost;
11      OPEN ← OPEN \ {N};
12      if N.plan has no collision then
13          return N.plan;
14      collision ← a vertex or edge collision 〈ai, aj , . . . 〉 in N.plan;
15      foreach ai involved in collision do
16          N′ ← new node;
17          N′.plan ← N.plan;
18          N′.constraints ← N.constraints;
19          N′.constraints ← N′.constraints ∪ {〈ai, . . . 〉};
20          if LowLevel(ai, N′) returns a path then
21              Update N′.plan with returned path;
22              N′.cost ← Makespan(N′.plan);
23              OPEN ← OPEN ∪ {N′};
24  return “No Solution”;
of CBS experimentally (Sharon et al., 2015). If the plan of node N has no collisions, then it
is a goal node and CBS terminates successfully with this plan [Lines 12-13]. Otherwise, CBS
finds a collision that it needs to resolve [Line 14]. CBS then generates two child nodes N1
and N2 of N [Lines 15-16]. Each child node inherits the plan and all constraints from node N
[Lines 17-18]. If the collision to resolve is a vertex collision 〈ai, aj , v, t〉, then CBS adds the
vertex constraint 〈ai, v, t〉 to the constraints of N1 and the vertex constraint 〈aj , v, t〉 to the con-
straints of N2 [Line 19]. If the collision to resolve is an edge collision 〈ai, aj , u, v, t〉, then CBS
adds the edge constraint 〈ai, u, v, t〉 to the constraints of N1 and the edge constraint 〈aj , v, u, t〉
to the constraints of N2 [Line 19]. For node N1 (respectively N2), CBS performs a low-level
space-time A* search to find a time-minimal path for agent ai (respectively aj) that obeys all
constraints inN1.constraints (respectivelyN2.constraints) relevant to agent ai (respectively aj).
If the low-level space-time A* search successfully returns such a path, CBS replaces the old path
of agent ai (respectively aj) in N1.plan (respectively N2.plan) with the returned one [Lines 20-
21], updates the cost ofN1 (respectivelyN2) accordingly [Line 22], and insertsN1 (respectively
N2) into OPEN [Line 23]. Otherwise, it discards the node.
3.2.2.2 Low-Level Search of CBS
Similar to the space-time A* search for Cooperative A* (see Section 3.2.1.1), LowLevel(ai,
N ) performs a space-time A* search to find a time-minimal path for agent ai that obeys all
constraints of node N . In addition, it breaks ties among all time-minimal paths in favor of one
that has the fewest collisions with the paths of other agents in N.plan, which speeds up the
high-level search of CBS experimentally (Sharon et al., 2015).
CBS uses an upper bound U on the makespan, such as the one given in Yu and Rus (2015), to
guarantee completeness and optimality. The makespan of a plan is bounded from above by U if
and only if the arrival times of all agents are bounded from above by U . Therefore, the low-level
space-time A* search of CBS for agent ai also uses U as the upper bound on the arrival time
at its goal vertex. Therefore, it terminates unsuccessfully and returns no path when it tries to
expand a state (a pair of a vertex and a time step) whose time step is larger than U . More details
on CBS, including an analysis of its properties, can be found in Sharon et al. (2015).
3.2.2.3 Flowtime Objective
CBS can also be used for other objectives (Sharon et al., 2015). In particular, it returns a solution
with the minimum flowtime if the flowtime of the plan of a node is assigned to the cost of the
node on Lines 7 and 22.
Root: cost = 2, plan = {π1 = 〈B,C,D〉, π2 = 〈A,C,E〉}
├─ 〈a1,C, 1〉 → N1: cost = 3, plan = {π1 = 〈B,B,C,D〉, π2 = 〈A,C,E〉}
└─ 〈a2,C, 1〉 → N2: cost = 3, plan = {π1 = 〈B,C,D〉, π2 = 〈A,A,C,E〉}

Figure 3.1: The constraint tree for the MAPF problem instance shown in Figure 2.1.
3.2.2.4 Example
Figure 3.1 shows the constraint tree constructed for the MAPF problem instance shown in Fig-
ure 2.1. CBS starts with the root node, which has an empty set of constraints. Its plan is a set
of individually time-minimal paths {π1 = 〈B,C,D〉, π2 = 〈A,C,E〉}. When CBS expands
the root node and resolves the collision 〈a1, a2,C, 1〉, it generates two child nodes N1 and N2
with additional constraints 〈a1,C, 1〉 and 〈a2,C, 1〉, respectively. For N1 (respectively N2),
LowLevel(a1, N1) (respectively LowLevel(a2, N2)) returns path π1 = 〈B,B,C,D〉 for a1 (re-
spectively π2 = 〈A,A,C,E〉 for a2) that obeys the constraints of N1 (respectively N2). When
CBS tries to expand either N1 or N2 and detects no collisions in the plan of the node, it returns
the plan as solution.
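The constraint-tree expansion above can likewise be reproduced with a compact CBS sketch for makespan minimization (again assuming the Figure 2.1 graph is the four-edge star around C; `bound` stands in for the upper bound U of Section 3.2.2.2, and the data format is ours):

```python
import heapq
from collections import deque
from itertools import count

def shortest_path(adj, start, goal, cons, bound=32):
    """Space-time BFS for a time-minimal path obeying a constraint set
    with entries ('v', v, t) and ('e', u, v, t), as in Definition 3.2."""
    if ('v', start, 0) in cons:
        return None
    parent = {(start, 0): None}
    queue = deque([(start, 0)])
    while queue:
        v, t = queue.popleft()
        # Accept the goal only if the agent can rest there afterwards.
        if v == goal and all(('v', goal, t2) not in cons
                             for t2 in range(t, bound + 1)):
            path, s = [], (v, t)
            while s is not None:
                path.append(s[0])
                s = parent[s]
            return path[::-1]
        if t == bound:
            continue
        for w in adj[v] | {v}:  # move to a neighbor or wait
            if ((w, t + 1) in parent or ('v', w, t + 1) in cons
                    or ('e', v, w, t) in cons):
                continue
            parent[(w, t + 1)] = (v, t)
            queue.append((w, t + 1))
    return None

def first_collision(plan):
    """Return the first vertex or edge collision in a plan, or None."""
    at = lambda p, t: p[min(t, len(p) - 1)]  # agents rest at their goals
    T = max(len(p) for p in plan)
    for t in range(T):
        for i in range(len(plan)):
            for j in range(i + 1, len(plan)):
                if at(plan[i], t) == at(plan[j], t):
                    return ('v', i, j, at(plan[i], t), t)
                if (at(plan[i], t) == at(plan[j], t + 1)
                        and at(plan[i], t + 1) == at(plan[j], t)):
                    return ('e', i, j, at(plan[i], t), at(plan[i], t + 1), t)
    return None

def cbs(edges, agents, bound=32):
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    plan = [shortest_path(adj, s, g, set(), bound) for s, g in agents]
    if any(p is None for p in plan):
        return None
    tick = count()  # tie-breaker so the heap never compares plans
    open_list = [(max(len(p) for p in plan) - 1, next(tick),
                  {i: set() for i in range(len(agents))}, plan)]
    while open_list:
        cost, _, cons, plan = heapq.heappop(open_list)
        col = first_collision(plan)
        if col is None:
            return plan
        if col[0] == 'v':  # branch on a vertex collision <ai, aj, v, t>
            _, i, j, v, t = col
            branches = [(i, ('v', v, t)), (j, ('v', v, t))]
        else:              # branch on an edge collision <ai, aj, u, v, t>
            _, i, j, u, v, t = col
            branches = [(i, ('e', u, v, t)), (j, ('e', v, u, t))]
        for a, c in branches:
            ncons = {k: set(cs) for k, cs in cons.items()}
            ncons[a].add(c)
            p = shortest_path(adj, *agents[a], ncons[a], bound)
            if p is None:
                continue
            nplan = list(plan)
            nplan[a] = p
            heapq.heappush(open_list, (max(len(q) for q in nplan) - 1,
                                       next(tick), ncons, nplan))
    return None
```

On the assumed instance, the sketch returns the plan of either N1 or N2, with makespan 3 in both cases.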
3.2.2.5 CBS Variants
Many improvements to CBS have been proposed: Meta-Agent CBS (Sharon et al., 2015) dy-
namically groups multiple agents into a meta-agent on the high level and uses an A* search to
plan paths for these agents with their joint states on the low level. ICBS (Boyarski et al., 2015)
always first resolves collisions that result in child nodes whose costs are larger than that of the
current node, thus affording the high-level search of CBS pruning opportunities. CBSH (Felner
et al., 2018) and its improvement (Li, Boyarski, Felner, Ma, & Koenig, 2019) use an admissible
heuristic to improve the high-level best-first search of CBS. Recent research develops a CBS variant (Li, Harabor, Stuckey, Felner, et al., 2019) that expands each node in a way such that any
solution is admitted by the subtree under only one but not both of its child nodes, thus reducing
duplicate search effort of the high-level search of CBS. Other recent research develops a CBS
variant (Li, Gange, et al., 2020; Li, Harabor, Stuckey, Ma, & Koenig, 2019) that adds multiple
constraints to a child node at a time. Other CBS variants use different searches on the high level:
ECBS (Barer, Sharon, Stern, & Felner, 2014) and its improvement (Cohen et al., 2016) perform
a bounded-suboptimal search on the constraint tree. Some recent research (Cohen et al., 2018)
develops an anytime version of the bounded-suboptimal search on the constraint tree. Other re-
cent research (Ma, Harabor, Stuckey, Li, & Koenig, 2019) develops a greedy depth-first search
on the constraint tree.
3.2.3 ILP-Based MAPF Algorithm
The ILP-based MAPF algorithm (Yu & LaValle, 2013b) first reduces MAPF to the integer multi-
commodity flow problem on a time-expanded flow network (an idea that originated in the opera-
tions research literature (Aronson, 1989)) and then uses this reduction to solve MAPF optimally
for makespan minimization.
3.2.3.1 Reducing MAPF to Multi-Commodity Flow
We now describe the reduction used by the ILP-based MAPF algorithm.
Given a MAPF problem instance on graph G = (V,E) and a limit T on the number of time steps, we construct a T-step time-expanded flow network N = (V, E) with vertices V = ⋃_{v∈V} ({v_0^out} ∪ ⋃_{t=1}^{T} {v_t^in, v_t^out}) and directed edges E with unit capacity. Each vertex v ∈ V is translated to a vertex v_t^out ∈ V for all t = 0 . . . T (which represents vertex v at the end of time step t) and a vertex v_t^in ∈ V for all t = 1 . . . T (which represents vertex v in the beginning of time
Figure 3.2: Example of the construction of the gadgets in N for edge (u, v) ∈ E and time step t.
step t). For each agent ai, we set a supply of one at (start) vertex (si)_0^out and a demand of one at (goal) vertex (gi)_T^out, both for commodity type i (corresponding to agent ai). Each vertex v ∈ V is also translated to an edge (v_t^out, v_{t+1}^in) ∈ E for all time steps t = 0 . . . T − 1 (which represents an agent waiting at vertex v between time steps t and t + 1). Each vertex v ∈ V is translated to an edge (v_t^in, v_t^out) ∈ E for all time steps t = 1 . . . T (which prevents vertex collisions of the form 〈∗, ∗, v, t〉 among all agents since only one agent can occupy vertex v between time steps t and t + 1). Each edge (u, v) ∈ E is translated to a gadget of vertices in V and edges in E for all time steps t = 0 . . . T − 1, which consists of two auxiliary vertices w, w′ ∈ V that are unique to the gadget (but have no superscripts/subscripts here for ease of readability) and the edges (u_t^out, w), (v_t^out, w), (w, w′), (w′, u_{t+1}^in), (w′, v_{t+1}^in) ∈ E. This gadget prevents edge collisions of the forms 〈∗, ∗, u, v, t〉 and 〈∗, ∗, v, u, t〉 among all agents since only one agent can move along the edge (u, v) in any direction between time steps t and t + 1. Figure 3.2 shows an example of
the construction of the gadgets. By construction, there is a correspondence between all feasible
integer multi-commodity flows on the T-step time-expanded flow network with a number of units equal to the number of agents and all solutions of the MAPF problem instance with makespans of
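The construction can be sketched as follows (a hypothetical helper that only builds the vertex and edge sets; unit capacities and commodities are left implicit, and the gadget vertices w and w′ are keyed by their edge and time step to keep them unique):

```python
def time_expanded_network(vertices, edges, T):
    """Construct the vertex and edge sets of the T-step time-expanded
    flow network N for graph G = (vertices, edges)."""
    V, E = set(), set()
    for v in vertices:
        V.add((v, 0, 'out'))
        for t in range(1, T + 1):
            V.update({(v, t, 'in'), (v, t, 'out')})
        # Wait edges: staying at v between time steps t and t + 1.
        for t in range(T):
            E.add(((v, t, 'out'), (v, t + 1, 'in')))
        # Internal edges: at most one agent occupies v per time step.
        for t in range(1, T + 1):
            E.add(((v, t, 'in'), (v, t, 'out')))
    for (u, v) in edges:
        for t in range(T):
            w, w2 = (u, v, t, 'w'), (u, v, t, "w'")  # gadget vertices
            V.update({w, w2})
            E.update({
                ((u, t, 'out'), w), ((v, t, 'out'), w),
                (w, w2),  # unit-capacity bottleneck: one traversal per step
                (w2, (u, t + 1, 'in')), (w2, (v, t + 1, 'in')),
            })
    return V, E
```

For a graph with 5 vertices and 4 edges and T = 3, for example, the construction yields 5(2T + 1) + 2T·4 = 59 network vertices and (2·5 + 5·4)T = 90 network edges.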
at most T (Yu & LaValle, 2013b).
3.2.3.2 Example
Figure 3.3 shows a feasible integer multi-commodity flow in the 3-step time-expanded flow
network reduced from the MAPF problem instance shown in Figure 2.1. Solid colored circles
Figure 3.3: A feasible integer multi-commodity flow for the MAPF problem instance shown in Figure 2.1.
are (start) vertices with a supply. Hatched colored circles are (goal) vertices with a demand.
The blue edges represent a unit flow of commodity type 1, corresponding to a path for agent
a1. The green edges represent a unit flow of commodity type 2, corresponding to a path for
agent a2. The feasible integer multi-commodity flow corresponds to the optimal solution {π1 =
〈B,C,D〉, π2 = 〈A,A,C,E〉}.
3.2.3.3 Solving MAPF via ILP
The ILP-based MAPF algorithm then employs the reduction to solve MAPF optimally for
makespan minimization via a standard ILP formulation. Let δ+(v) (respectively δ−(v)) be the
set of incoming (respectively outgoing) edges of v. Let xi[e] be the Boolean variable repre-
senting the amount of flow of commodity type i on edge e. An integer multi-commodity flow
problem on the T -step time-expanded flow network can be written as an ILP using the following
standard formulation.
0 ≤ Σ_{i∈[M]} xi[e] ≤ 1    ∀e ∈ E,

Σ_{e∈δ+(v)} xi[e] − Σ_{e∈δ−(v)} xi[e] = 0    ∀i ∈ [M], ∀v ∈ V such that v ≠ (si)_0^out and v ≠ (gi)_T^out,

Σ_{e∈δ−((si)_0^out)} xi[e] = Σ_{e∈δ+((gi)_T^out)} xi[e] = 1    ∀i ∈ [M],

xi[e] ∈ {0, 1}    ∀i ∈ [M], ∀e ∈ E.
An optimal solution can thus be found by starting with a lower bound on T and iteratively
checking for increasing values of T whether a feasible integer multi-commodity flow of M units
exists in the corresponding T -step time-expanded flow network (which is an NP-hard problem),
until an upper bound on T is reached (such as the one provided in Yu and Rus (2015)). One can
use the maximum over i ∈ [M ] of the length of a shortest path from si to gi in G as the lower
bound. Each T -step time-expanded flow network is encoded as an ILP in the above way, which is
then solved with an ILP solver. A feasible integer multi-commodity flow for the smallest value
of T corresponds to a MAPF solution with the smallest makespan. More details on the ILP-
based MAPF algorithm, including an analysis of its properties, can be found in Yu and LaValle
(2013b).
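The lower bound on T mentioned above is just the largest single-agent shortest-path distance, which can be computed with one breadth-first search per agent (a hypothetical helper, not code from the cited work):

```python
from collections import deque

def makespan_lower_bound(edges, agents):
    """Max over agents of the shortest-path distance from start to goal:
    a lower bound on T for the iteration over makespan values."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    def dist(s, g):
        seen, queue = {s: 0}, deque([s])
        while queue:
            v = queue.popleft()
            if v == g:
                return seen[v]
            for w in adj.get(v, ()):
                if w not in seen:
                    seen[w] = seen[v] + 1
                    queue.append(w)
        return None  # goal unreachable: the instance is unsolvable
    return max(dist(s, g) for s, g in agents)
```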
3.3 One-Shot Path Planning with Kinematic Constraints
In our recent research (Honig, Kumar, Cohen, et al., 2016; Honig, Kumar, Ma, et al., 2016),
we have helped to develop a post-processing procedure (not covered as a contribution of this
dissertation) called MAPF-POST. MAPF-POST is part of Honig’s dissertation (Honig, 2019).
MAPF-POST makes use of a Simple Temporal Network (STN) to postprocess a MAPF solution
in polynomial time to create a plan-execution schedule that takes kinematic constraints of real-
world agents into account. Specifically, this plan-execution schedule works for different types
of real-world agents, takes their maximum and minimum velocities into account, and provides a
guaranteed safety distance between them.
MAPF-POST assumes that (1) each agent traverses each edge with constant velocity and
stays at each location (vertex along the path of the agent in the given MAPF solution) only for an
instant before it reaches its target and (2) the velocity of each agent can change instantaneously
at each location. These assumptions can be approximated by controllers of real-world agents.
Given a MAPF solution on graph G = (V,E), MAPF-POST constructs an STN, which is
a directed acyclic graph G = (V, E). Each vertex v ∈ V represents an event corresponding
to an agent ai entering a location πi(t) ∈ V given by its path in the MAPF solution. Each
directed edge (u, v) ∈ E , labeled by an STN bound [LB(u, v), UB(u, v)], represents a temporal
constraint between events u and v indicating that event u must be scheduled between LB(u, v)
and UB(u, v) time units before event v. The STN imposes two types of temporal constraints
between events as dictated by the MAPF solution. Type 1: For each agent ai, it enters locations
in the order given by its path in the MAPF solution, namely ai enters πi(t) before it enters πi(t′)
for any two consecutive distinct locations (πi(t) ≠ πi(t′)) with t < t′. The STN bound of such a Type 1
constraint is obtained by using the length of the edge connecting the locations and velocity limits
of the agent. Type 2: For each pair of agents ai and aj and each location v = πi(t) = πj(t′) ∈ V
that they both enter, they enter the location in the order given by their paths in the MAPF solution,
namely ai enters v before aj enters v assuming t < t′ without loss of generality. The STN
bound of such a Type 2 constraint is [0,∞). In effect, the MAPF solution discretizes time
and specifies a total order among the events. The STN, however, does not discretize time and
specifies only a partial order among the events, which provides it with the flexibility to take
kinematic constraints of real-world agents into account. If a non-zero safety distance δ is to
be enforced between the agents, MAPF-POST also adds events representing agents entering
auxiliary locations, that is, additional locations along the edges of graph G (between its vertices),
and imposes extra temporal constraints on these events.
Definition 3.3 (Honig (2019)). A plan-execution schedule assigns a real-valued execution time
to each event, corresponding to an entry time for each location. Real-world agents that execute
a plan-execution schedule enter all locations at these entry times.
MAPF-POST either solves a linear program or uses the Bellman-Ford algorithm (Bellman,
1958; Ford Jr & Fulkerson, 2015) to compute a plan-execution schedule that assigns (contin-
uous) execution times t(v) to all events v that satisfy all the temporal constraints. For graph
G with unit-cost edges and a user-specified parameter δ ∈ (0, 1), the resulting plan-execution
schedule guarantees that the distance between any two agents is at least δ on graph G and that the
Euclidean distance between (the centers of) any two circular real-world agents is at least δ/√2
if graph G is a 2D 4-neighbor grid (Honig, 2019).
More details on MAPF-POST, including an analysis of its properties, can be found in Honig
(2019).
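The core of this scheduling step can be sketched with a minimal simple-temporal-network solver. The code below is our own illustration, not MAPF-POST's actual implementation; the event names and the constraint encoding are invented for the example. Each bound LB(u, v) ≤ t(v) − t(u) ≤ UB(u, v) becomes two distance-graph edges, and Bellman-Ford either returns consistent execution times or detects an inconsistent STN:

```python
def solve_stn(events, constraints):
    """constraints: list of (u, v, lb, ub), meaning lb <= t(v) - t(u) <= ub.
    Returns execution times with the earliest event at time 0, or None if the
    STN is inconsistent (its distance graph has a negative cycle)."""
    INF = float("inf")
    # t(v) - t(u) <= w becomes a distance-graph edge u -> v with weight w.
    edges = []
    for u, v, lb, ub in constraints:
        if ub != INF:
            edges.append((u, v, ub))
        edges.append((v, u, -lb))  # t(u) - t(v) <= -lb
    # Initializing all distances to 0 simulates a virtual source with
    # zero-weight edges to every event.
    dist = {e: 0.0 for e in events}
    for _ in range(len(events) - 1):  # Bellman-Ford relaxation rounds
        changed = False
        for u, v, w in edges:
            if dist[u] + w < dist[v]:
                dist[v] = dist[u] + w
                changed = True
        if not changed:
            break
    for u, v, w in edges:  # one extra pass detects a negative cycle
        if dist[u] + w < dist[v] - 1e-9:
            return None
    offset = min(dist.values())
    return {e: dist[e] - offset for e in events}
```

For instance, a Type 1 constraint with bound [1, 2] between two events of one agent and a Type 2 constraint with bound [0, ∞) between events of two agents yield a schedule in which the second agent enters the contested location no earlier than the first.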
3.4 Other One-Shot and Long-Term Target-Assignment and Path-
Planning Problems
We now survey existing research on other one-shot and long-term target-assignment and path-
planning problems.
3.4.1 One-Shot Target-Assignment Problem
In the one-shot target-assignment problem, M targets need to be assigned to a team of M agents.
The cost of assigning each target to each agent is given, often capturing the time it takes the agent
to move from its current location to the target. An assignment is a one-to-one mapping from
the targets to the agents. Many results are known for the classic assignment problem (Kuhn,
1955). The problem is to find an assignment that minimizes the flowtime, namely the total
cost. This problem is also called the weighted bipartite matching problem. It can be solved in
polynomial time using the Hungarian algorithm (Kuhn, 1955), via an economic model (Shapley
& Shubik, 1971), or using a distributed auction-based algorithm (Bertsekas, 1992). There is also
research on the linear bottleneck assignment problem (Burkard, Dell’Amico, & Martello, 2009;
Fulkerson et al., 1953; Gross, 1959). The problem aims to find an assignment that minimizes the
makespan, namely the maximum cost. It can be solved optimally in polynomial time by using
a flow algorithm or the Hungarian algorithm as a subroutine (Derigs & Zimmermann, 1978;
Garfinkel, 1971).
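The distinction between the two objectives can be made concrete with a brute-force sketch. This is our own toy code, not the Hungarian algorithm: it enumerates all one-to-one assignments and aggregates the chosen costs with a pluggable objective, sum for the flowtime and max for the makespan; the polynomial-time algorithms cited above avoid this exponential enumeration.

```python
from itertools import permutations

def best_assignment(cost, objective):
    """cost[i][j] is the cost of assigning target j to agent i; objective
    aggregates the chosen costs (sum for flowtime, max for makespan)."""
    m = len(cost)
    best_value, best_perm = None, None
    for perm in permutations(range(m)):  # perm[i] = target assigned to agent i
        value = objective(cost[i][perm[i]] for i in range(m))
        if best_value is None or value < best_value:
            best_value, best_perm = value, perm
    return best_value, best_perm
```

Note that the two objectives generally pick different assignments: an assignment that minimizes the total cost need not minimize the maximum cost, and vice versa.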
3.4.2 One-Shot Combined Target-Assignment and Path-Planning Problem for
One Team of Agents: Anonymous MAPF
The special case of the one-shot combined target-assignment and path-planning problem for one
team of agents has been studied as Anonymous MAPF, also known as Permutation-Invariant
MAPF or Unlabeled MAPF (see Section 2.2). We recall that, in Anonymous MAPF, M targets
and a team of M anonymous agents are given. The problem is to assign the targets to the
agents and find collision-free paths for each agent from its given start vertex to its assigned
target such that each target is assigned to a unique agent and is occupied by the agent in the
end. The objective is to minimize the makespan. Anonymous MAPF can be solved optimally in
polynomial time by using a reduction to the max-flow problem on the time-expanded network
(Yu & LaValle, 2013a), similar to the reduction from MAPF to the integer multi-commodity
flow problem as described in Section 3.2.3. The reduction is as follows. Given an Anonymous
MAPF problem instance on undirected graph G = (V,E) and a limit T on the number of
time steps, we construct a T -step time-expanded flow network N = (V, E) in the same way as
described in Section 3.2.3.1 with the following change: There is only a single commodity type.
There is a supply of one unit of this commodity type at vertex (si)^out_0 and a demand of one
unit of this commodity type at vertex (gi)^out_T for all i ∈ [M]. The integer multi-commodity
flow problem thus becomes a regular feasible circulation problem, which is easily converted
to a maximum flow problem. Since all supply and demand values are one, any polynomial or
pseudopolynomial time algorithm for maximum flow determines the feasibility of Anonymous
MAPF for any particular T in polynomial time. Since the optimal makespan of any Anonymous
MAPF problem instance is bounded from above by M + |V| − 1 (Yu & LaValle, 2013a), binary
search on T thus solves Anonymous MAPF optimally for makespan minimization in polynomial
time.
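For small instances, this reduction can be sketched end to end. The code below is an illustrative simplification with our own node naming and interface, not the construction of Section 3.2.3.1 verbatim: it builds the T-step time-expanded network with unit vertex capacities, runs Edmonds-Karp, and reports whether all anonymous agents can simultaneously reach the goal set within T time steps. Swap conflicts are not modeled here, since for anonymous agents a swap along an edge can be replaced by both agents waiting.

```python
from collections import defaultdict, deque

def max_flow(graph, cap, src, snk):
    """Edmonds-Karp with unit augmentations (all capacities here are 0 or 1)."""
    flow = 0
    while True:
        parent = {src: None}
        queue = deque([src])
        while queue and snk not in parent:  # BFS for a shortest augmenting path
            u = queue.popleft()
            for v in graph[u]:
                if v not in parent and cap[(u, v)] > 0:
                    parent[v] = u
                    queue.append(v)
        if snk not in parent:
            return flow
        v = snk
        while parent[v] is not None:  # augment by one unit along the path
            u = parent[v]
            cap[(u, v)] -= 1
            cap[(v, u)] += 1
            v = u
        flow += 1

def anonymous_mapf_feasible(V, E, starts, goals, T):
    """Can len(starts) anonymous agents reach the goal set within T steps
    without two agents ever sharing a vertex?"""
    graph, cap = defaultdict(set), defaultdict(int)

    def add(u, v, c):
        graph[u].add(v)
        graph[v].add(u)  # residual edge, capacity 0 by default
        cap[(u, v)] += c

    moves = E + [(v, u) for u, v in E] + [(v, v) for v in V]  # moves and waits
    for t in range(T + 1):
        for v in V:  # unit vertex capacity: at most one agent per vertex and step
            add(("in", v, t), ("out", v, t), 1)
        if t < T:
            for u, v in moves:
                add(("out", u, t), ("in", v, t + 1), 1)
    for s in starts:
        add("SRC", ("in", s, 0), 1)
    for g in goals:
        add(("out", g, T), "SNK", 1)
    return max_flow(graph, cap, "SRC", "SNK") == len(starts)
```

A binary search on T over this monotone feasibility test, with the upper bound M + |V| − 1 quoted above, then yields the optimal makespan in polynomial time.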
3.4.3 Long-Term Target-Assignment Problem
In the long-term target-assignment problem, targets are constantly added to the system at un-
known times and need to be reached by M agents. Each agent might need to get assigned a
new target after it reaches its current target. The problem has been well studied as different
versions of Online Multi-Robot Task Allocation (Gerkey & Mataric, 2004; Korsah, Stentz, &
Dias, 2013) or Cooperative Multi-Agent Tracking (Parker, 1999) in the robotics community and
Online Weighted Matching (Kalyanasundaram & Pruhs, 1993) in the theoretical computer sci-
ence community. It is often solved by utility-based greedy algorithms (Werger & Mataric, 2000)
or a reduction to a sequence of one-shot target-assignment problems (Gerkey & Mataric, 2002;
Khuller et al., 1994).
3.5 Summary
In this chapter, we provided an overview of MAPF, which is used as an essential element for
one-shot and long-term target-assignment and path-planning problems in general. We provided
detailed descriptions of three MAPF algorithms that inspire the design of some of the target-
assignment and path-planning algorithms in the later chapters. We described how MAPF so-
lutions can be transformed into plan-execution schedules in a post-processing step by using
MAPF-POST, which is also used for TAPF in Chapter 5 and MAPD in Chapter 7. We also
surveyed existing research on other one-shot and long-term target-assignment and path-planning
problems.
Chapter 4
Theoretical Analysis of Target Assignment and Path
Planning
In this chapter, we present the first major contribution of this dissertation. Specifically, we intro-
duce a unified NP-hardness proof structure that can be used to derive computational complexity
results for different MAPF variants that model the one-shot and the long-term coordination of
autonomous target-assignment and path-planning operations of teams of agents. For example,
this unified NP-hardness proof structure can be used to derive new fixed-parameter inapproxima-
bility results for the one-shot and the long-term combined target-assignment and path-planning
problems TAPF and MAPD. This unified NP-hardness proof structure stems from formalizing
and studying a new variant of MAPF, called Package-Exchange Robot Routing (PERR), and its
generalization K-PERR. Therefore, these results validate the hypothesis that formalizing and
studying new variants of MAPF can result in new theoretical insights into the one-shot and the
long-term combined target-assignment and path-planning problems for teams of agents. We
follow our formalization of different problems in Chapter 2.
This chapter is based on Ma, H., Tovey, C., Sharon, G., Kumar, T. K. S., & Koenig, S. (2016). Multi-agent path finding with payload transfers and the package-exchange robot-routing problem. In AAAI Conference on Artificial Intelligence (pp. 3166–3173). The reductions from ≤3,=3-SAT to PERR and from 2/2/3-SAT to 2-PERR are contributions of Craig Tovey.
The remainder of this chapter is structured as follows. In Section 4.1, we reiterate the mo-
tivation behind introducing our unified NP-hardness proof structure by studying PERR. In Sec-
tion 4.2, we introduce our unified NP-hardness proof structure and derive fixed-parameter in-
approximability results for PERR. In Section 4.3, we demonstrate how our unified NP-hardness
proof structure can also provide fixed-parameter inapproximability results for other one-shot and
long-term target-assignment and path-planning problems. The above results lay the theoretical
foundation of this dissertation. We then present other results from our study of PERR: In Sec-
tion 4.4, we show that, unlike MAPF, PERR is always solvable. In Section 4.5, we describe
how MAPF algorithms can be adapted to solving PERR by viewing it as a relaxation of MAPF
and also provide theoretical results for the special case of 1-PERR. Finally, we summarize the
contributions of this chapter in Section 4.6.
4.1 Introduction
In this chapter, we study the computational complexity of optimally solving one-shot and long-
term target-assignment and path-planning problems. From the survey in Section 3.1.1, we
learned that many theoretical results are known for the one-shot path-planning problem MAPF.
The solvability of a MAPF problem instance can be determined, and a solution to the MAPF
problem instance, if one exists, can be found, in polynomial time but with a makespan of O(|V|³)
(Kornhauser et al., 1984; Roger & Helmert, 2012; Yu & Rus, 2015). However, it is NP-hard
to solve MAPF optimally (Goldreich, 2011; Ratner & Warmuth, 1986; Surynek, 2010; Yu &
LaValle, 2013c). Many theoretical results are also known for the one-shot and the long-term
target-assignment problems. Special cases of the one-shot target-assignment problem for one
team of agents, including the classic assignment problem (Bertsekas, 1992; Kuhn, 1955; Shap-
ley & Shubik, 1971) and the bottleneck assignment problem (Fulkerson et al., 1953; Garfinkel,
1971; Gross, 1959), can be solved optimally in polynomial time. The special case of the one-shot
combined target-assignment and path-planning problem for one team of agents, namely Anony-
mous MAPF (see Section 2.2), can be solved optimally in polynomial time (Yu & LaValle,
2013a). But the long-term target-assignment problem for multiple agents is NP-hard in gen-
eral (Brucker, 2010). However, none of the existing results or NP-hardness proof structures can
be directly applied to the general case of either the one-shot or the long-term version of the
combined target-assignment and path-planning problem for multiple teams of agents.
Therefore, we introduce a unified NP-hardness proof structure, stemming from formalizing
and studying a new variant of MAPF, called PERR, and its generalization K-PERR (see Sec-
tion 2.4). In PERR, each package is preassigned to an agent at a given start vertex and needs
to be delivered to a given goal vertex (preassigned target) in a given graph. Each agent carries
one package. Packages can be reassigned to agents in a proactive way—two agents in adja-
cent vertices can exchange their packages, and thus exchanging their goal vertices. We use our
unified NP-hardness proof structure to establish a reduction from an NP-complete version of the
Boolean satisfiability problem, called ≤3,=3-SAT (Tovey, 1984), to PERR and prove that PERR
is NP-hard to approximate within any constant factor less than 4/3.
Studying PERR is inspired by practical applications, including ride-sharing (or taxis) with
passenger transfers (Coltin & Veloso, 2014) and office robots with package exchanges (Veloso
et al., 2015). But, more importantly, it lays the theoretical foundation for studying general multi-
agent coordination problems, including MAPF and the one-shot and the long-term combined
target-assignment and path-planning problems TAPF (see Section 2.3) and MAPD (see Sec-
tion 2.5). For example, we demonstrate how our unified NP-hardness proof structure, with a
similar reduction from ≤3,=3-SAT, can be directly applied to proving the NP-hardness of opti-
mally solving MAPF and MAPD. We also demonstrate how to derive similar NP-hardness results
for K-PERR and generalize them to TAPF by using our unified NP-hardness proof structure to
establish a reduction from a newly constructed NP-complete version of the Boolean satisfiabil-
ity problem, called 2/2/3-SAT. Notably, we present the first fixed-parameter inapproximability
results for both MAPF and TAPF.
We also present other results from our study of PERR. We prove that all PERR and K-
PERR problem instances are solvable. We demonstrate how optimal MAPF algorithms, such as
CBS (see Section 3.2.2) and the ILP-based MAPF algorithm (see Section 3.2.3), can be adapted
to solving PERR optimally by viewing PERR as a relaxation of MAPF that permits exchange
operations of packages.
4.2 Unified NP-Hardness Proof Structure and Intractability of
PERR
We now describe our unified NP-hardness proof structure that can be used to establish reductions
from NP-complete versions of the Boolean satisfiability problem to target-assignment and path-
planning problems. Our unified NP-hardness proof structure requires an NP-complete version
of the Boolean satisfiability problem whose Boolean variables each appear in at most a constant
number of its disjunctive clauses. Moreover, the total number of times that each variable can ap-
pear uncomplemented or complemented affects the optimal makespan of the target-assignment
and path-planning problem instance resulting from the reduction and thus the inapproximability
ratio for makespan minimization. A reduction from the general version of the Boolean satis-
fiability problem or the 3-satisfiability problem, where the number of times that each variable
can appear is not bounded, thus does not yield constant-factor inapproximability results with our
unified NP-hardness proof structure.
As an example, we now prove that PERR is NP-hard to approximate within any constant fac-
tor less than 4/3 for makespan minimization by reducing an NP-complete version of the Boolean
satisfiability problem, called ≤3,=3-SAT (Tovey, 1984), to PERR. We then derive a corollary
for a version of PERR that minimizes the flowtime: PERR is NP-hard to solve optimally for
flowtime minimization.
A ≤3,=3-SAT problem instance consists of N Boolean variables X1, . . . , XN and M dis-
junctive clauses C1, . . . , CM. Each variable appears in exactly three clauses, uncomplemented
at least once and complemented at least once. Each clause contains at most three literals. Its
decision question is as follows: Is there an assignment of True or False to the variables that
satisfies the problem instance? ≤3,=3-SAT is known to be NP-complete (Tovey, 1984).
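These syntactic restrictions are easy to check mechanically. The sketch below uses our own literal encoding (+i for Xi, −i for its complement) and is meant only to make the definition concrete:

```python
from collections import Counter

def is_le3_eq3_instance(clauses, num_vars):
    """Check the ≤3,=3-SAT restrictions: every clause has at most three
    literals; every variable occurs exactly three times, at least once
    uncomplemented and at least once complemented."""
    occurrences = Counter(lit for clause in clauses for lit in clause)
    if any(len(clause) > 3 for clause in clauses):
        return False
    return all(
        occurrences[i] + occurrences[-i] == 3
        and occurrences[i] > 0 and occurrences[-i] > 0
        for i in range(1, num_vars + 1)
    )

def satisfies(clauses, assignment):
    """assignment maps variable index i to the truth value of Xi."""
    return all(
        any(assignment[abs(lit)] == (lit > 0) for lit in clause)
        for clause in clauses
    )
```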
Theorem 4.1. For any ε > 0, it is NP-hard to find a 4/3− ε approximate solution to PERR for
makespan minimization.
Proof. We construct a PERR problem instance that has a solution with makespan three if and
only if a given ≤3,=3-SAT problem instance is satisfiable.
For each variable Xi in the ≤3,=3-SAT problem instance, we construct two “literal” packages,
piT and piF , with start vertices siT and siF and goal vertices tiT and tiF , respectively. For
each literal package, we construct two paths to get to its goal vertex in three time steps: a
“shared” path, namely 〈siT , uiT , vi, tiT 〉 for piT and 〈siF , uiF , vi, tiF 〉 for piF , and a “private”
path, namely 〈siT , wiT , xiT , tiT 〉 for piT and 〈siF , wiF , xiF , tiF 〉 for piF . The shared paths for
piT and piF intersect at vertex vi. Only one of the two paths can thus be used if a makespan of
three is to be achieved. Sending literal package piT (or piF ) along the shared path corresponds
to assigning True (or False) to Xi in the ≤3,=3-SAT problem instance.
For each clause Cj in the ≤3,=3-SAT problem instance, we construct a “clause” package pj
with start vertex cj and goal vertex dj . It has multiple (but at most three) “clause” paths to get
to its goal vertex in three time steps, which have a one-to-one correspondence to the literals in
Cj. Every literal Xi (or ¬Xi) can appear in at most two clauses. If Cj is the first clause that
it appears in, then the clause path is 〈cj , wiT , bj , dj〉 (or 〈cj , wiF , bj , dj〉). If Cj is the second
clause that it appears in, a vertex αj is introduced and the clause path is instead 〈cj , αj , xiT , dj〉
(or 〈cj , αj , xiF , dj〉). The clause path of each Cj with respect to any literal in that clause and the
private path of the literal intersect. Only one of the two paths can thus be used if a makespan of
three is to be achieved.
Suppose that a satisfying assignment to the ≤3,=3-SAT problem instance exists. Then, a so-
lution with makespan three is obtained by sending literal packages of true literals along their
shared paths, the other literal packages along their private paths, and clause packages along the
clause paths corresponding to one of the true literals in those clauses.
Conversely, suppose that a solution with makespan three exists. Then, each clause package tra-
verses the clause path corresponding to one of the literals in that clause, and the corresponding
Figure 4.1: Example of the reduction from a ≤3,=3-SAT problem instance to a PERR problem instance.
literal package traverses its shared path. Since the packages of a literal and its complement can-
not both use their shared path if a makespan of three is to be achieved, we can assign True to
every literal whose package uses its shared path without assigning True to both the uncomple-
mented and complemented literals. If the packages of both literals use their private paths, we
can assign True to any one of the literals and False to the other one. A solution to the PERR
problem instance with makespan three thus yields a satisfying assignment to the ≤3,=3-SAT
problem instance.
To summarize, the PERR problem instance has a solution with makespan three if and only if
the ≤3,=3-SAT problem instance is satisfiable. Also, the PERR problem instance cannot have
a solution with makespan smaller than three and always has a solution with makespan four,
even if the ≤3,=3-SAT problem instance is unsatisfiable. For any ε > 0, any approximation
algorithm for PERR with ratio 4/3− ε thus computes a solution with makespan three whenever
the ≤3,=3-SAT problem instance is satisfiable and therefore solves ≤3,=3-SAT.
Figure 4.1 shows a PERR problem instance reduced from the ≤3,=3-SAT problem instance
(X1 ∨ X2 ∨ X3) ∧ (X1 ∨ X2 ∨ X3) ∧ (X1 ∨ X2 ∨ X3). Clause C1 is the first clause that
literal X1 appears in. The corresponding clause path is 〈c1, w1T , b1, d1〉. Since clause C2 is
the second clause that X2 appears in, vertex α2 is introduced. The corresponding clause path is
〈c2, α2, x2T , d2〉. The blue (directed) edges represent one optimal solution to the PERR problem
instance of makespan three, which corresponds to the satisfying assignment (X1, X2, X3) =
(False,True,True).
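For small formulas, the path gadgets of this reduction can be generated programmatically. The helper below is purely illustrative; the vertex labels (such as 'w1T' and 'alpha3') and the dictionary layout are our own choices mirroring the construction in the proof of Theorem 4.1, so that the intersection properties used there can be checked mechanically:

```python
def perr_gadget(clauses):
    """For a ≤3,=3-SAT instance with literals encoded +i/-i, emit the length-3
    candidate paths of every literal package (shared and private) and every
    clause package (one clause path per literal in the clause)."""
    paths = {}
    seen = {}  # literal -> number of clauses it has appeared in so far
    for i in sorted({abs(lit) for clause in clauses for lit in clause}):
        for side in "TF":
            paths[f"p{i}{side}"] = {
                "shared": [f"s{i}{side}", f"u{i}{side}", f"v{i}", f"t{i}{side}"],
                "private": [f"s{i}{side}", f"w{i}{side}", f"x{i}{side}", f"t{i}{side}"],
            }
    for j, clause in enumerate(clauses, start=1):
        options = []
        for lit in clause:
            i, side = abs(lit), ("T" if lit > 0 else "F")
            seen[lit] = seen.get(lit, 0) + 1
            if seen[lit] == 1:  # first clause this literal appears in
                options.append([f"c{j}", f"w{i}{side}", f"b{j}", f"d{j}"])
            else:  # second clause: route via the fresh vertex alpha_j
                options.append([f"c{j}", f"alpha{j}", f"x{i}{side}", f"d{j}"])
        paths[f"clause{j}"] = {"clause": options}
    return paths
```

For instance, the two shared paths of a variable meet at its vertex v_i, and each clause path shares exactly one vertex with the private path of its literal, as the proof requires.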
The PERR problem instance in the proof of Theorem 4.1 has the property that the length of
every path from a start vertex to the corresponding goal vertex is at least three. Therefore, if the
makespan is three, then every package is delivered in exactly three time steps and the flowtime
is 3(2N + M), three time steps for each of the 2N literal packages and the M clause packages.
Moreover, if the makespan exceeds three, then the flowtime exceeds 3(2N + M), yielding the
following corollary:
Corollary 4.2. It is NP-hard to find an optimal solution to PERR for flowtime minimization.
4.3 Other Target-Assignment and Path-Planning Problems
We now demonstrate how the unified NP-hardness proof structure introduced in Section 4.2
can be used to derive computational complexity results for other target-assignment and path-
planning problems. A key insight is that the unified NP-hardness proof structure does not require
exchange operations since any exchange operation would move one package further away from
its goal vertex and thus not result in a smaller makespan.
4.3.1 Complexity Results for MAPF
Our unified NP-hardness proof structure applies to MAPF (see Section 2.1). Specifically, we
first construct a PERR problem instance for a given ≤3,=3-SAT problem instance as described
in the proof of Theorem 4.1. We then obtain a MAPF problem instance that corresponds to the
resulting PERR problem instance as described in Section 2.4.1. Clearly, the MAPF problem
instance has a solution with makespan three if and only if the ≤3,=3-SAT problem instance is
satisfiable. Also, the MAPF problem instance cannot have a solution with makespan smaller than
three and always has a solution with makespan four, even if the ≤3,=3-SAT problem instance
is unsatisfiable. If the makespan is three, then the flowtime is 3(2N + M). Moreover, if the
makespan exceeds three, then the flowtime exceeds 3(2N + M). This yields the following theorem¹ and corollary:
Theorem 4.3. For any ε > 0, it is NP-hard to find a 4/3− ε approximate solution to MAPF for
makespan minimization.
Corollary 4.4. It is NP-hard to find an optimal solution to MAPF for flowtime minimization.
4.3.2 Complexity Results for MAPD
Our unified NP-hardness proof structure applies to MAPD (see Section 2.5). Specifically, we
also reduce from ≤3,=3-SAT and use a similar construction to that for PERR and MAPF.
Theorem 4.5. For any ε > 0, it is NP-hard to find a 4/3− ε approximate solution to MAPD for
makespan minimization.
Proof. We use a similar reduction to the one in the proof of Theorem 4.1, but from ≤3,=3-
SAT to MAPD. Specifically, we construct a MAPD problem instance that has a solution with
makespan three if and only if a given ≤3,=3-SAT problem instance is satisfiable.
We point out the differences in the construction: For each variable Xi in the ≤3,=3-SAT prob-
lem instance, we construct two “literal” agents, aiT and aiF , with initial vertices siT and siF ,
respectively, and two tasks, τiT and τiF , with pickup vertices siT and siF and delivery vertices
tiT and tiF , respectively, that are added to the system at time step 0. For each clause Cj in the
≤3,=3-SAT problem instance, we construct a “clause” agent aj with initial vertex cj and a task
τj with pickup vertex cj and delivery vertex dj that is added to the system at time step 0. There-
fore, an optimal solution must assign every task to the agent whose initial vertex is the pickup
vertex of the task at time step 0 and let the agent execute the task.
¹Theorem 4.3 improves the state-of-the-art NP-hardness result of MAPF for makespan minimization (Surynek, 2010), since it shows not only the NP-hardness of optimally solving MAPF but also the NP-hardness of constant-factor approximation. The previous NP-hardness proof structures for MAPF by Surynek (2010) and Yu and LaValle (2013c) do not transfer to PERR and other MAPF variants.
To summarize, using the same arguments as in the proof of Theorem 4.1, the MAPD problem
instance has a solution with makespan three if and only if the ≤3,=3-SAT problem instance is
satisfiable. Also, the MAPD problem instance cannot have a solution with makespan smaller
than three and always has a solution with makespan four, even if the ≤3,=3-SAT problem in-
stance is unsatisfiable. For any ε > 0, any approximation algorithm for MAPD with ratio 4/3−ε
thus computes a solution with makespan three whenever the≤3,=3-SAT problem instance is sat-
isfiable and therefore solves ≤3,=3-SAT.
Similar to the PERR problem instance constructed in the proof of Theorem 4.1, the MAPD
problem instance in the proof of Theorem 4.5 has the property that, if the makespan is three, then
every task is finished in exactly three time steps and the service time is three. Moreover, if the
makespan exceeds three, then the service time exceeds three, yielding the following corollary:
Corollary 4.6. It is NP-hard to find an optimal solution to MAPD for service time minimization.
These results hold even for an offline version of MAPD where all tasks are known a priori.
4.3.3 Complexity Results forK-PERR
Our unified NP-hardness proof structure also applies to K-PERR (see Section 2.4.3), where
packages are partitioned into types.
The construction in the proof of Theorem 4.1 almost applies when all literal packages are
of one type and all clause packages are of another type, but not quite, since, if clause Ck
is the first clause that literal Xi appears in and clause Cj is the second such clause, then clause
package pk could travel from its start vertex ck along path 〈ck, wiT , xiT , dj〉 of length three to
goal vertex dj . For example, in Figure 4.1, clause package p1 could travel from its start vertex
c1 along path 〈c1, w1T , x1T , d3〉 of length three to goal vertex d3. Therefore, a solution to the
PERR problem instance with makespan three does not necessarily yield a satisfying assignment
for the ≤3,=3-SAT problem instance.
Our unified NP-hardness proof structure thus requires another NP-complete version of the
Boolean satisfiability problem for K-PERR. This version of the Boolean satisfiability problem
forces any clause package pj to reach goal vertex dj if a makespan of three is to be achieved.
Specifically, we now prove that even 2-PERR is NP-hard to approximate within any constant
factor less than 4/3 for makespan minimization, by using our unified NP-hardness proof struc-
ture to establish a reduction from an NP-complete version of the Boolean satisfiability problem,
called 2/2/3-SAT, to 2-PERR. We show a corollary for the flowtime objective: 2-PERR is NP-
hard to solve optimally for flowtime minimization. We then generalize the results to K-PERR
for other values of K.
A 2/2/3-SAT problem instance consists of N Boolean variables X1, . . . , XN and M dis-
junctive clauses C1, . . . , CM. Each variable appears complemented in one clause of size two,
appears uncomplemented in one clause of size two and appears a third time in a clause of size
three. Its decision question is as follows: Is there an assignment of True or False to the variables
that satisfies the problem instance?
We first prove the following Lemma:
Lemma 4.7. 2/2/3-SAT is NP-complete.
Proof. 2/2/3-SAT is clearly in NP. The 3-satisfiability problem is NP-complete and can be re-
duced to 2/2/3-SAT as follows, similar to Tovey (1984): Given a standard 3-satisfiability prob-
lem instance with variables Yi and exactly three literals per clause, we delete all clauses that
contain a variable that does not appear in any other clause. Then, we consider each remaining
variable Yi in turn. Let Ki > 1 be the number of its occurrences. We replace the κ-th
occurrence of variable Yi by a new variable Xi,κ, that is, replace literal Yi (or ¬Yi) with literal
Xi,κ (or ¬Xi,κ). Then, we append the following clauses of two literals each to the constructed
problem instance: (⋀κ=1..Ki−1 (Xi,κ ∨ ¬Xi,κ+1)) ∧ (Xi,Ki ∨ ¬Xi,1). The clause Xi,κ ∨ ¬Xi,κ+1 implies
that Xi,κ+1 must be false if Xi,κ is false and that Xi,κ must be true if Xi,κ+1 is true. The cyclic
structure of the clauses therefore forces all Xi,1, . . . , Xi,Ki to have the same truth value. There-
fore, the constructed problem instance is satisfiable if and only if the original 3-satisfiability
problem instance is satisfiable. Each Xi,κ appears complemented in one clause of size two, ap-
pears uncomplemented in one clause of size two and appears a third time in a clause of size
three. Moreover, the transformation requires only polynomial time.
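The copy-and-cycle construction in this proof is straightforward to sketch. The code below uses our own ±v literal encoding and fresh-variable numbering, and assumes (as the deletion step above guarantees) that every variable occurs in at least two clauses:

```python
def to_2_2_3(clauses, num_vars):
    """Transform a 3-SAT instance (literals encoded +i/-i over variables
    1..num_vars) into an equisatisfiable 2/2/3-SAT instance."""
    new_clauses = []
    copies = {i: [] for i in range(1, num_vars + 1)}
    next_var = 1
    for clause in clauses:  # replace the k-th occurrence of Y_i by X_{i,k}
        renamed = []
        for lit in clause:
            x = next_var
            next_var += 1
            copies[abs(lit)].append(x)
            renamed.append(x if lit > 0 else -x)
        new_clauses.append(tuple(renamed))
    for xs in copies.values():
        assert len(xs) > 1, "every variable must occur in at least two clauses"
        for k in range(len(xs)):
            # clause (X_{i,k} or not X_{i,k+1}), indices cyclic: the chain of
            # implications forces all copies to share one truth value
            new_clauses.append((xs[k], -xs[(k + 1) % len(xs)]))
    return new_clauses, next_var - 1
```

Each copy then occurs exactly three times, once in its renamed size-3 clause and once with each sign in the appended size-2 clauses, matching the 2/2/3-SAT restrictions.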
We now use our unified NP-hardness proof structure to reduce 2/2/3-SAT to 2-PERR with
a construction similar to that in the proof of Theorem 4.1. The only difference is that a clause
Ck of size three has paths of length three only in the form 〈ck, αk, xiT /xiF , dk〉 (via the private
paths of the literalsXi it contains) and thus must send its package to goal vertex dk if a makespan
of three is to be achieved. This prevents any clause of size two from sending its package to dk.
Also, any clause of size two has no paths of length three along which it can send its package
to the goal vertex of another clause of size two and thus must send its package to its own goal
vertex if a makespan of three is to be achieved. We now prove the theorem formally.
Theorem 4.8. For any ε > 0, it is NP-hard to find a 4/3 − ε approximate solution to 2-PERR
for makespan minimization.
Proof. We construct a 2-PERR problem instance that has a solution with makespan three if and
only if a given 2/2/3-SAT problem instance is satisfiable. Figure 4.2 shows an example. Then,
the remainder of the proof of Theorem 4.1 applies without change.
We follow the construction in the proof of Theorem 4.1, with the exception of making every
clause path for a clause Cj of size two of the form 〈cj , wiT /wiF , bj , dj〉 (vertex αj is
not introduced in this case) and every clause path for a clause Ck of size three of the form
〈ck, αk, xiT /xiF , dk〉. We distinguish only two package types, namely literal and clause
packages. Every vertex siT and siF (or tiT and tiF ) is a start (or goal) vertex of a literal package, and
every vertex cj (or dj) is a start (or goal) vertex of a clause package.
Suppose that a satisfying assignment to the 2/2/3-SAT problem instance exists. Then, a solution
with makespan three is obtained, as in the proof of Theorem 4.1, by sending literal packages of
true literals along their shared paths, the other literal packages along their private paths, and
clause packages along the clause paths corresponding to one of the true literals in those clauses.
Conversely, suppose that a solution with makespan three exists. Then, the only paths of length
three from start vertex siT or siF to a goal vertex of a literal package are the shared or private
paths that end at goal vertex tiT or tiF . The shared paths of packages piT and piF intersect at
their penultimate vertices. Since the two packages cannot occupy the same vertex at time step
t = 2, at most one of them can traverse its shared path if a makespan of three is to be achieved.
Package piT cannot arrive at tiF since then piF has no path of length three to a goal vertex of a
literal available, and vice versa for piF . Now consider an arbitrary clause Ck of size three. The
only paths of length three from start vertex ck to a goal vertex of a clause package have the form
〈ck, αk, xiT /xiF , dk〉. Clause package pk thus must arrive at goal vertex dk. Finally, consider
an arbitrary clause Cj of size two. The only paths of length three from start vertex cj to a goal
vertex of a clause package have the form 〈cj , wiT /wiF , bj , dj〉 or 〈cj , wiT /wiF , xiT /xiF , dk′〉,
where the clause Ck′ of size three shares a literal with clause Cj . Paths of the latter form cannot
be used because goal vertex dk′ must receive clause package pk′ . Clause package pj thus must
arrive at goal vertex dj . The situation is now identical to that in the proof of Theorem 4.1 because
every package piT (or piF , pj , or pk) travels from its start vertex siT (or siF , cj , or ck) to goal
vertex tiT (or tiF , dj , or dk). A solution to the 2-PERR problem instance with makespan three
thus yields a satisfying assignment to the 2/2/3-SAT problem instance when assigning True to
every literal whose package uses its shared path, as explained in the proof of Theorem 4.1.
Figure 4.2 shows a 2-PERR problem instance reduced from the 2/2/3-SAT problem in-
stance (X1 ∨X2) ∧ (X1 ∨X3) ∧ (X2 ∨X3) ∧ (X1 ∨X2 ∨X3). Consider any solution with
makespan three. The only paths of length three from start vertex c4 to a goal vertex of a clause
package have the form 〈c4, α4, xiT /xiF , d4〉. Clause package p4 thus must arrive at goal vertex
d4. The only paths of length three from start vertex c1 to a goal vertex of a clause package
have the form 〈c1, wiT /wiF , b1, d1〉 or 〈c1, w1T , x1T , d4〉. Paths of the latter form cannot be
used because d4 must receive p4. Clause package p1 thus must arrive at goal vertex d1. The
colored (directed) edges represent one optimal solution to the 2-PERR problem instance with
makespan three, which yields the satisfying assignment (X1, X2, X3) = (True,True,True). The
Figure 4.2: Example of the reduction from a 2/2/3-SAT problem instance to a 2-PERR problem instance.
blue edges represent paths of the literal packages, and the green edges represent paths of the
clause packages.
The argument for proving Corollary 4.2 yields the following corollary:
Corollary 4.9. It is NP-hard to find an optimal solution to 2-PERR for flowtime minimization.
Theorem 4.8 and Corollary 4.9 hold not only for 2-PERR but also for K-PERR for all K =
3, . . . ,M because one can pad the graph of the constructed 2-PERR problem instance with
additional vertices, each being both the start and goal vertex of a package of a different type.
This padding constructs a K-PERR problem instance for any K = 3, . . . ,M but leaves the
proof unchanged.
Theorem 4.10. For any ε > 0 and K > 1, it is NP-hard to find a 4/3− ε approximate solution
to K-PERR for makespan minimization.
Corollary 4.11. For any K > 1, it is NP-hard to find an optimal solution to K-PERR for
flowtime minimization.
4.3.4 Complexity Results for TAPF
Our unified NP-hardness proof structure also applies to TAPF (see Section 2.3). Specifically, we
first construct a 2-PERR problem instance for a given 2/2/3-SAT problem instance as described
in the proof of Theorem 4.8. We then obtain a TAPF problem instance with two teams of agents
that corresponds to the resulting 2-PERR problem instance as described in Section 2.4.3. The
TAPF problem instance has a solution with makespan three if and only if the 2/2/3-SAT problem
instance is satisfiable. Also, the TAPF problem instance cannot have a solution with makespan
smaller than three and always has a solution with makespan four, even if the 2/2/3-SAT problem
instance is unsatisfiable. This yields the following theorem:
Theorem 4.12. For any ε > 0, it is NP-hard to find a 4/3 − ε approximate solution to TAPF
with two teams of agents for makespan minimization.
Using the same argument for generalizing Theorem 4.8 to Theorem 4.10, we generalize
Theorem 4.12 to the theorem below:
Theorem 4.13. For any ε > 0 and K > 1, it is NP-hard to find a 4/3− ε approximate solution
to TAPF with K teams of agents for makespan minimization.
4.3.5 Additional Generalizations
Our unified NP-hardness proof structure also applies to many other variants of PERR, including
but not limited to cases where
1. packages and agents disappear upon delivery;
2. agents can carry more than one package; or
3. agents exchange packages more slowly or more quickly than moving along an edge.
Recent research (Ma et al., 2018a) also uses the unified NP-hardness proof structure to derive
similar NP-hardness results for a MAPF variant that maximizes the number of agents that can
reach their given goal vertices within a given deadline (see also Section 3.1.3.1).
Our unified NP-hardness proof structure can potentially establish reductions from other NP-
complete versions of the Boolean satisfiability problem to different target-assignment and path-
planning problems, which could result in different inapproximability ratios for them. For exam-
ple, the unified NP-hardness proof structure can potentially replace ≤3,=3-SAT with the general
version of the Boolean satisfiability problem for PERR by adding an appropriate number of ver-
tices for each clause and making the shared and private paths of the variables longer in the
construction. This reduction, however, yields only an NP-hardness result but no inapproxima-
bility result for makespan minimization because the optimal makespan of the PERR problem
instance resulting from the construction is not a constant.
4.4 Feasibility of PERR
In Section 2.4.1, we established the correspondence between a PERR problem instance and a
MAPF problem instance on the same graph. We now show one of the key differences between
the combinatorial structures of PERR and MAPF.
We consider both the standard objective of minimizing the makespan and the objective of
minimizing the flowtime for both PERR and MAPF. Figures 4.3(a) and 4.3(b) illustrate that
exchange operations can make a PERR problem instance solvable or improve the makespan and
flowtime by a large factor in contrast to its corresponding MAPF problem instance that does
not permit exchange operations. In particular, Figure 4.3(a) shows that some MAPF problem
instances (with no exchange operations permitted) are unsolvable. The following theorem shows
that, on the contrary, all PERR problem instances are solvable:
Theorem 4.14. All PERR problem instances are solvable. Solutions with polynomial
makespans and flowtimes can be found in polynomial time.
Proof. Without loss of generality, we assume that all vertices are initially occupied by agents
carrying packages. (In the following, if neither of two adjacent vertices is occupied by an agent,
then an exchange operation involving them is replaced by a no-op operation. If only one of them
is occupied by an agent, then the exchange operation is replaced by the agent moving to the
(a) A PERR problem instance on a graph with two vertices. Two agents carrying packages p1 and p2 have to deliver them from their start vertices s1 and s2 to their goal vertices g1 and g2, respectively. They can exchange their packages (indicated by the dashed gray edge). If exchange operations are not allowed (as in the corresponding MAPF problem instance), then no solution exists.

(b) A PERR problem instance on a cyclic graph. Two agents carrying packages p1 and p2 have to deliver them from their start vertices s1 and s2 to their goal vertices g1 and g2, respectively. They can exchange their packages (indicated by the dashed gray edge). If exchange operations are not allowed (as in the corresponding MAPF problem instance), then one agent must take the long detour.
Figure 4.3: Motivating examples of PERR that demonstrate the power of exchange operations.
adjacent vertex.) Then, any two packages can switch vertices without affecting the vertices of
the other packages. To see this, consider two packages pi and pj and their current vertices u and
v, respectively. Let 〈u, . . . , w, v〉 be a shortest path from u to v, where w is the vertex on the
path directly before v. A series of exchange operations along this path moves pi to v and every
other package on this path against the path one edge closer to u (in at most |V | − 1 time steps).
In particular, it moves pj to w. A series of exchange operations against the path 〈u, . . . , w〉 then
moves pj to u and every other package on this path back to its original vertex (in at most |V |− 2
time steps), hence proving the claim. This property allows one to route all packages to their goal
(a) Moving package p1 from vertex u to vertex v.

(b) Moving package p4 from vertex w to vertex u.

(c) Resulting positions of the packages.
Figure 4.4: Example of the construction used in the proof of Theorem 4.14.
vertices one at a time. Our algorithm performs only a polynomial number of operations, which
implies that the makespans and flowtimes of the resulting solutions are also polynomial.
Figure 4.4 shows an example of the construction used in the above proof of Theorem 4.14:
Package p1 can be moved from vertex u to vertex v via a series of exchange operations along
the path 〈u, . . . , v〉. Then, package p4 can be moved from vertex w to u via a series of exchange
operations along the path 〈w, . . . , u〉, restoring the original positions of packages p2 and p3.
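The two series of exchange operations from the proof can be simulated directly. The following sketch (ours, not code from the dissertation) represents the packages occupying the vertices of a fixed path 〈u, …, w, v〉 as a list and applies both series, confirming that only the two end packages switch places while every intermediate package is restored:

```python
def swap_packages(packages):
    """Swap the packages at the two ends of a path via exchange operations.

    `packages` lists the package at each vertex of a shortest path
    <u, ..., w, v>; index 0 is u, index -1 is v. Each exchange swaps the
    packages on one edge of the path. Returns the number of time steps used,
    which is (n - 1) + (n - 2) for a path with n vertices.
    """
    steps = 0
    n = len(packages)
    # First series: push the package at u forward along the whole path.
    # Afterwards, the former occupant of u sits at v and every other package
    # has shifted one vertex closer to u; in particular, the former occupant
    # of v now sits at w.
    for k in range(n - 1):
        packages[k], packages[k + 1] = packages[k + 1], packages[k]
        steps += 1
    # Second series: push the package now at w (the former occupant of v)
    # back to u, restoring every intermediate package to its original vertex.
    for k in range(n - 2, 0, -1):
        packages[k - 1], packages[k] = packages[k], packages[k - 1]
        steps += 1
    return steps
```

Running this on the configuration of Figure 4.4 (p1 at u, p4 at v, with p2 and p3 in between) swaps p1 and p4 in five exchanges, matching the (|V| − 1) + (|V| − 2) bound derived below.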
Letting any two packages switch vertices, as done in the proof of Theorem 4.14, takes at
most (|V | − 1) + (|V | − 2) = 2|V | − 3 time steps. This operation needs to be repeated at most
M − 1 times since one additional package reaches its goal vertex each time. An upper bound on
the makespan of the resulting solution is thus

Umakespan = (M − 1)(2|V| − 3). (4.1)

Since each package reaches its goal vertex by time Umakespan, an upper bound on the flowtime of
the resulting solution is

Uflowtime = M · Umakespan = M(M − 1)(2|V| − 3). (4.2)
The proof of Theorem 4.14 applies unchanged to K-PERR if all packages of the same type are
first assigned arbitrary different goal vertices of the same type, yielding the following corollary:
Corollary 4.15. All K-PERR problem instances are solvable. Solutions with polynomial
makespans and flowtimes can be found in polynomial time.
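To make the bounds concrete, here is a small sketch (ours, not from the dissertation) that evaluates Equations 4.1 and 4.2 for given M and |V|:

```python
def upper_bounds(M, num_vertices):
    """Evaluate the makespan and flowtime upper bounds of Equations 4.1 and 4.2.

    Each of the at most M - 1 pairwise swaps takes at most
    (|V| - 1) + (|V| - 2) = 2|V| - 3 time steps, and each of the M packages
    reaches its goal vertex by time Umakespan.
    """
    u_makespan = (M - 1) * (2 * num_vertices - 3)
    u_flowtime = M * u_makespan
    return u_makespan, u_flowtime
```

For example, M = 3 packages on a graph with |V| = 5 vertices give Umakespan = 14 and Uflowtime = 42.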
4.5 PERR Algorithms
From Section 2.4.1, we know that PERR can be viewed as a relaxed version of MAPF by omit-
ting the edge collisions (Definition 2.2). Therefore, many MAPF algorithms can be directly
adapted to solving PERR by removing the constraints of avoiding edge collisions. We now
demonstrate how the two optimal MAPF algorithms, CBS and the ILP-based MAPF algorithm,
introduced in Section 3.2 can be adapted to solving PERR optimally for makespan minimization.
We also discuss K-PERR and the special case of 1-PERR.
4.5.1 Adapted CBS
We can easily adapt CBS (see Section 3.2.2) from MAPF to PERR, which requires only the ad-
dition of exchange operations. Therefore, the resulting adapted CBS algorithm does not attempt
to detect edge collisions on Line 12 in Algorithm 3.2 and thus never adds edge constraints on the
high level. Its low-level space-time A* search for agent ai uses Umakespan given by Equation 4.1
as the upper bound on the arrival time Ti. The adapted CBS algorithm is correct, complete,
and optimal for makespan minimization for PERR, as can be shown with arguments similar to
those in (Sharon et al., 2015). The adapted CBS algorithm can also be used to compute optimal
solutions for a version of PERR with the flowtime objective if the flowtime of the plan of a node
is assigned to the cost of the node on Lines 7 and 22 in Algorithm 3.2.
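The collision check used by the adapted algorithm can be sketched as follows (an illustrative sketch of ours, not code from the dissertation): it reports only vertex collisions and deliberately ignores edge (swap) conflicts, since exchange operations are legal in PERR.

```python
def first_vertex_collision(paths):
    """Return the first vertex collision (i, j, v, t) in a PERR plan, or None.

    Unlike MAPF collision detection, edge (swap) conflicts are ignored,
    because PERR permits exchange operations along an edge.
    """
    T = max(len(p) for p in paths)

    def at(p, t):
        # Packages wait at their last vertex after arriving.
        return p[min(t, len(p) - 1)]

    for t in range(T):
        seen = {}
        for i, p in enumerate(paths):
            v = at(p, t)
            if v in seen:
                return (seen[v], i, v, t)
            seen[v] = i
    return None
```

For example, the plan {π1 = 〈A, B〉, π2 = 〈B, A〉} is collision-free for PERR even though the two packages traverse the same edge in opposite directions, whereas {π1 = 〈A, B〉, π2 = 〈C, B〉} has a vertex collision at B at time step 1.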
4.5.2 ILP-Based PERR Algorithm
Similar to the ILP-based MAPF algorithm (see Section 3.2.3), the ILP-based PERR algorithm
first reduces PERR to the integer multi-commodity flow problem on a time-expanded flow net-
work and then uses this reduction to solve PERR optimally for makespan minimization.
4.5.2.1 Reducing PERR to Multi-Commodity Flow
We now describe the reduction used by the ILP-based PERR algorithm.
Given a PERR problem instance on undirected graph G = (V, E) and a limit T on the number of time steps, we construct a T-step time-expanded flow network N = (V, E) with vertices V = ⋃_{v∈V} ({v^out_0} ∪ ⋃_{t=1}^{T} {v^in_t, v^out_t}) and directed edges E with unit capacity. Each vertex v ∈ V is translated to a vertex v^out_t ∈ V for all t = 0 . . . T (which represents vertex v at the end of time step t) and a vertex v^in_t ∈ V for all t = 1 . . . T (which represents vertex v at the beginning of time step t). For each package pi, we set a supply of one at (start) vertex (si)^out_0 and a demand of one at (goal) vertex (gi)^out_T, both for commodity type i (corresponding to package pi). Each vertex v ∈ V is translated to an edge (v^out_t, v^in_{t+1}) ∈ E for all time steps t = 0 . . . T − 1 (which represents a package waiting at vertex v between time steps t and t + 1). Each vertex v ∈ V is also translated to an edge (v^in_t, v^out_t) ∈ E for all time steps t = 1 . . . T (which prevents vertex collisions of the form 〈∗, ∗, v, t〉 among all packages since only one package can occupy vertex v between time steps t and t + 1). Each edge (u, v) ∈ E is translated to edges (u^out_t, v^in_{t+1}) and (v^out_t, u^in_{t+1}) for all time steps t = 0 . . . T − 1 to allow a package to move along the edge or exchange with another package along the edge.
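The construction rules above can be enumerated mechanically. The sketch below (ours; the node encoding is an assumption of this illustration, not notation from the dissertation) generates the waiting, vertex-capacity, and movement/exchange arcs for a given graph and horizon T:

```python
def time_expanded_network(vertices, edges, T):
    """Build the arcs of the T-step time-expanded flow network for PERR.

    Network nodes are encoded as tuples (v, t, 'in'/'out'); every listed
    arc has unit capacity.
    """
    arcs = []
    for v in vertices:
        for t in range(T):
            # Waiting at v between time steps t and t + 1.
            arcs.append(((v, t, 'out'), (v, t + 1, 'in')))
        for t in range(1, T + 1):
            # Unit-capacity in->out arc that prevents vertex collisions at v.
            arcs.append(((v, t, 'in'), (v, t, 'out')))
    for (u, v) in edges:
        for t in range(T):
            # Move or exchange along edge (u, v) in either direction.
            arcs.append(((u, t, 'out'), (v, t + 1, 'in')))
            arcs.append(((v, t, 'out'), (u, t + 1, 'in')))
    return arcs
```

For |V| = 2 vertices, one edge, and T = 2, this yields 12 unit-capacity arcs: four waiting arcs, four vertex-capacity arcs, and four movement/exchange arcs.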
The above reduction is similar to reducing MAPF to the integer multi-commodity flow prob-
lem (see Section 3.2.3.1). The construction of the flow network for a PERR problem instance
is the same as that for the corresponding MAPF problem instance except that it uses edges (u^out_t, v^in_{t+1}) and (v^out_t, u^in_{t+1}) for each edge (u, v) ∈ E and each time step t = 0 . . . T − 1 to allow for exchange operations instead of a gadget (shown in Figure 3.2) to prevent edge collisions. Figure 4.5 shows an example of the construction for an exchange operation.

Figure 4.5: Example of the construction of N for edge (u, v) ∈ E and time step t.

The construction of the flow network implies the following theorem:
Theorem 4.16. There is a correspondence between all feasible integer multi-commodity flows
on the T -step time-expanded flow network of a number of units equal to the number of agents
and all solutions of the PERR problem instance with makespans of at most T .
The proof of this theorem mirrors the one for the reduction of MAPF to the integer multi-
commodity flow problem (Yu & LaValle, 2013b).
4.5.2.2 Example
Figure 4.6 shows a feasible integer multi-commodity flow in the 3-step time-expanded flow
network reduced from the PERR problem instance shown in Figure 2.7. Solid colored circles
are (start) vertices with a supply. Hatched colored circles are (goal) vertices with a demand. The
blue edges represent a unit flow of commodity type 1, corresponding to a path for package p1.
The green edges represent a unit flow of commodity type 2, corresponding to a path for package
p2. The feasible integer multi-commodity flow corresponds to a solution {π1 = 〈B,C,D〉, π2 =
〈A,A,C,E〉}.
Figure 4.6: A feasible integer multi-commodity flow for the PERR problem instance shown in Figure 2.7.
4.5.2.3 Solving PERR via ILP
The ILP-based PERR algorithm then employs the reduction to solve PERR optimally for
makespan minimization. Let δ+(v) (respectively, δ−(v)) be the set of incoming (respectively,
outgoing) edges of v, and let xi[e] be the Boolean variable representing the amount of flow of
commodity type i on edge e. The integer multi-commodity flow problem on the T-step time-expanded
flow network can then be written as an ILP using the standard formulation shown in
Section 3.2.3.3. An optimal solution can thus be found by starting with a
lower bound on T and iteratively checking for increasing values of T whether a feasible integer
multi-commodity flow of M units exists in the corresponding T-step time-expanded flow network
(which is an NP-hard problem). We can use the maximum over i ∈ [M] of the length of a
shortest path from si to gi in G as a lower bound on T. Each T-step time-expanded flow network
is translated into an ILP in the above way, which is then solved with an ILP solver. A feasible
integer multi-commodity flow for the smallest value of T corresponds to a PERR solution with
the smallest makespan. Unlike the ILP-based MAPF algorithm, the ILP-based PERR algorithm
does not require an upper bound on T since every PERR problem instance is solvable.

If an upper bound on T is available, then binary search on T can also be used to find the smallest
value of T for which a feasible integer multi-commodity flow of M units exists. One such
upper bound is Umakespan given by Equation 4.1. The binary search thus requires at most
O(log Umakespan) = O(log(M|V|)) ≤ O(log |V|²) = O(log |V|) iterations. The binary search
requires fewer iterations than repeatedly incrementing T if the smallest value of T is much larger
than the lower bound.
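The binary search can be sketched generically (ours, not from the dissertation), with the per-T feasibility check abstracted as an oracle:

```python
def min_makespan(feasible, lower, upper):
    """Binary search for the smallest T in [lower, upper] with feasible(T) True.

    Assumes feasibility is monotone in T (true for PERR: a solution for T
    remains a solution for T + 1 by padding with waits) and that
    feasible(upper) holds, which Theorem 4.14 guarantees when
    upper = Umakespan.
    """
    lo, hi = lower, upper
    while lo < hi:
        mid = (lo + hi) // 2
        if feasible(mid):
            hi = mid  # a solution with makespan mid exists; try smaller T
        else:
            lo = mid + 1  # infeasible; the optimal makespan exceeds mid
    return lo
```

With lower = 3, upper = 20, and an oracle that is feasible from T = 7 on, the search returns 7 after at most ⌈log₂ 18⌉ = 5 oracle calls.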
4.5.2.4 Alternative ILP Formulation for Minimizing Makespan
Given an upper bound on T, for example, Umakespan given by Equation 4.1, we also develop a
way to minimize the makespan by solving one ILP only. To do so, we add a single auxiliary
variable z to the standard ILP formulation for the integer multi-commodity flow problem on the
Umakespan-step time-expanded flow network. We minimize z subject to z ≥ t · xi[(u^out_{t−1}, v^in_t)]
for all i ∈ [M], (u, v) ∈ E, and t = 1, . . . , Umakespan. The last movement of any package sets
the value of z, thereby minimizing the makespan. Let L be a lower bound on T, for example,
the maximum over i ∈ [M] of the length of a shortest path from si to gi in G. Since only the
last movement of some package pi on the incoming edges of gi no earlier than time step L can
possibly set the value of z for any feasible flow, we only need to keep the above constraints for
all t = L, . . . , Umakespan and edges in δ+((gi)^in_t). The complete ILP formulation is as follows.
minimize z, subject to:

0 ≤ Σ_{i∈[M]} xi[e] ≤ 1    ∀e ∈ E.

Σ_{e∈δ+(v)} xi[e] − Σ_{e∈δ−(v)} xi[e] = 0    ∀i ∈ [M], ∀v ∈ V such that v ≠ (si)^out_0 and v ≠ (gi)^out_T.

Σ_{e∈δ−((si)^out_0)} xi[e] = Σ_{e∈δ+((gi)^out_T)} xi[e] = 1    ∀i ∈ [M].

xi[e] ∈ {0, 1}    ∀i ∈ [M], ∀e ∈ E.

z ≥ t · xi[e]    ∀i ∈ [M], ∀t ∈ {L, . . . , Umakespan}, ∀e ∈ δ+((gi)^in_t).
This formulation can also be used to solve MAPF optimally for makespan minimization using
an O(|V|³) upper bound on the smallest makespan.
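The pruning of the z-constraints can be made concrete with a small sketch (ours; the edge-listing helper is a hypothetical callback, not part of the dissertation): only time steps t = L, …, Umakespan and edges entering the goal copies generate constraints.

```python
def pruned_z_constraints(M, L, U, goal_in_edges):
    """List the surviving makespan constraints z >= t * x_i[e].

    goal_in_edges(i, t) must return the incoming edges of the network
    vertex (g_i)^in_t (a caller-supplied helper in this sketch). Each
    returned tuple (i, t, e) encodes one constraint z >= t * x_i[e].
    """
    constraints = []
    for i in range(M):              # one group of constraints per package
        for t in range(L, U + 1):   # only time steps >= the lower bound L
            for e in goal_in_edges(i, t):
                constraints.append((i, t, e))
    return constraints
```

With M = 2 packages, L = 2, U = 4, and (say) three incoming edges per goal copy, this keeps 2 · 3 · 3 = 18 constraints instead of one constraint per package, time step, and network edge.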
4.5.2.5 ILP Formulation for Minimizing Flowtime
We also consider a version of PERR with the flowtime objective. A similar formulation allows
us to minimize the flowtime by solving one ILP only, namely by using an upper bound for T
and adding auxiliary variables zi for all i ∈ [M]. An upper bound on the smallest flowtime, for
example, Uflowtime given by Equation 4.2, is also an upper bound on T, namely an upper bound
on the makespan of any solution with the smallest flowtime. We minimize Σ_{i∈[M]} zi subject to
zi ≥ t · xi[(u^out_{t−1}, v^in_t)] for all i ∈ [M], (u, v) ∈ E, and t = 1, . . . , Uflowtime. The last movement
of package pi sets the value of zi, thereby minimizing the flowtime. Let Li be the length of a
shortest path from si to gi in G. Since only the last movement of package pi on the incoming
edges of gi no earlier than time step Li can possibly set the value of zi for any feasible flow, we
only need to keep the above constraints for all t = Li, . . . , Uflowtime and edges in δ+((gi)^in_t) for
each i. The complete ILP formulation is as follows.
minimize Σ_{i∈[M]} zi, subject to:

0 ≤ Σ_{i∈[M]} xi[e] ≤ 1    ∀e ∈ E.

Σ_{e∈δ+(v)} xi[e] − Σ_{e∈δ−(v)} xi[e] = 0    ∀i ∈ [M], ∀v ∈ V such that v ≠ (si)^out_0 and v ≠ (gi)^out_T.

Σ_{e∈δ−((si)^out_0)} xi[e] = Σ_{e∈δ+((gi)^out_T)} xi[e] = 1    ∀i ∈ [M].

xi[e] ∈ {0, 1}    ∀i ∈ [M], ∀e ∈ E.

zi ≥ t · xi[e]    ∀i ∈ [M], ∀t ∈ {Li, . . . , Uflowtime}, ∀e ∈ δ+((gi)^in_t).
This formulation can also be used to solve MAPF optimally for flowtime minimization using an
O(|V|³) upper bound on the optimal flowtime.
4.5.3 Solving K-PERR Optimally
We can also use the construction described in Section 4.5.2.1 to reduce K-PERR to the integer
multi-commodity flow problem. Given a K-PERR problem instance on undirected graph G =
(V,E) and a limit T on the number of time steps, we construct a T -step time-expanded flow
network N = (V, E) in the same way as described in Section 4.5.2.1 with the following change:
There are K instead of M commodity types. There is a supply of one unit at vertex (si)^out_0
and a demand of one unit at vertex (gi)^out_T for all i ∈ [M], both for commodity type k, where
k ∈ [K] is the type of package pi. We then use the standard ILP formulation to solve K-PERR
optimally for makespan minimization as described in Section 4.5.2.3, since Umakespan given by
Equation 4.1 is also an upper bound on the optimal makespan for K-PERR.
4.5.3.1 Special Case of 1-PERR
In the special case of 1-PERR, the integer multi-commodity flow problem becomes a regular
feasible circulation problem, which is easily converted to a maximum flow problem. Since
all supply and demand values are one, any polynomial or pseudopolynomial time algorithm
for maximum flow determines the feasibility of 1-PERR for any particular T in polynomial
time. This reduction is similar to reducing Anonymous MAPF (see Section 3.4.2) to a max-flow
problem but permits exchange operations. Binary search on T thus yields the following theorem:
Theorem 4.17. 1-PERR can be solved optimally for makespan minimization in polynomial
time.
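The feasibility check behind this theorem only needs a single-commodity max-flow computation. Below is a generic Edmonds–Karp sketch (our illustration, not the dissertation's implementation) that can be run on the time-expanded network after attaching a super-source to all start copies and a super-sink to all goal copies; 1-PERR with horizon T is then feasible iff the maximum flow equals M.

```python
from collections import defaultdict, deque

def max_flow(arcs, source, sink):
    """Edmonds-Karp max flow on a network given as (u, v, capacity) triples."""
    cap = defaultdict(int)
    adj = defaultdict(set)
    for u, v, c in arcs:
        cap[(u, v)] += c
        adj[u].add(v)
        adj[v].add(u)  # residual direction
    flow = 0
    while True:
        # BFS for a shortest augmenting path in the residual network.
        parent = {source: None}
        queue = deque([source])
        while queue and sink not in parent:
            u = queue.popleft()
            for v in adj[u]:
                if v not in parent and cap[(u, v)] > 0:
                    parent[v] = u
                    queue.append(v)
        if sink not in parent:
            return flow
        # Reconstruct the path and find its bottleneck capacity.
        path = []
        v = sink
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        aug = min(cap[e] for e in path)
        for u, v in path:
            cap[(u, v)] -= aug
            cap[(v, u)] += aug
        flow += aug
```

On a toy network with two unit-capacity paths from the super-source to the super-sink, the routine returns a flow of 2.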
We can derive a tighter upper bound on the optimal makespan for 1-PERR. We recall that
there is a correspondence between a 1-PERR problem instance and an Anonymous MAPF prob-
lem instance on the same graph (see Section 2.4.4). We further show the following theorem:
Theorem 4.18. Any solution to an Anonymous MAPF problem instance can be interpreted as a
solution to its corresponding 1-PERR problem instance with the same makespan and flowtime.
Any solution to a 1-PERR problem instance can be transformed into a solution to its correspond-
ing Anonymous MAPF problem instance with the same makespan and flowtime.
Proof. The first statement is obviously true since any solution to an Anonymous MAPF problem
instance does not contain any exchange operations and thus is also a solution to its corresponding
1-PERR problem instance with the same makespan and flowtime.
For any solution to a 1-PERR problem instance that possibly contains exchange operations,
we construct a 1-PERR solution with the same makespan and flowtime that does not contain
exchange operations as follows. For every two packages pi and pj and each exchange operation
that moves package pi from vertex u to vertex v and package pj from vertex v to vertex u between
time steps t and t + 1, we let both packages wait at their current vertices for one time step and
exchange their paths from time step t + 1 on (and thus their goal vertices and arrival times),
resulting in the new path π′i = 〈πi(0), . . . , πi(t) = u, πj(t + 1) = u, . . . , πj(Tj)〉 of package pi
and the new path π′j = 〈πj(0), . . . , πj(t) = v, πi(t + 1) = v, . . . , πi(Ti)〉 of package pj. The
resulting 1-PERR solution has the same makespan and flowtime as the original one but does not
contain exchange operations. It thus is also a solution to the corresponding Anonymous MAPF
problem instance with the same makespan and flowtime.
Therefore, exchange operations do not improve the makespan or the flowtime for 1-PERR.
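The tail-swapping argument in the proof can be turned into a small procedure. The sketch below (ours, not from the dissertation) repeatedly replaces an exchange by two waits and swaps the path remainders; since packages are anonymous in 1-PERR, swapping goal vertices between the two packages is harmless.

```python
def remove_exchanges(paths):
    """Turn a 1-PERR solution into an exchange-free (Anonymous MAPF) one.

    paths: list of vertex sequences, one per package, all padded to the
    same length by repeating the final vertex. Whenever two paths perform
    an exchange (p_i: u -> v while p_j: v -> u), both packages wait instead
    and swap the remainders of their paths, as in the proof of Theorem 4.18.
    Each swap removes one edge traversal, so the loop terminates.
    """
    T = len(paths[0]) - 1
    changed = True
    while changed:
        changed = False
        for t in range(T):
            for i in range(len(paths)):
                for j in range(i + 1, len(paths)):
                    u, v = paths[i][t], paths[j][t]
                    if u != v and paths[i][t + 1] == v and paths[j][t + 1] == u:
                        # Swap the path tails from time step t + 1 on; both
                        # packages now wait between time steps t and t + 1.
                        tail_i = paths[i][t + 1:]
                        paths[i] = paths[i][:t + 1] + paths[j][t + 1:]
                        paths[j] = paths[j][:t + 1] + tail_i
                        changed = True
    return paths
```

For instance, the solution {π1 = 〈A, B, B〉, π2 = 〈B, A, A〉} with an exchange at time step 0 becomes {〈A, A, A〉, 〈B, B, B〉}, with the same makespan and flowtime.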
The above theorem implies the following corollary:
Corollary 4.19. The optimal makespan of any 1-PERR problem instance is equal to the optimal
makespan of its corresponding Anonymous MAPF problem instance.
Proof. The optimal makespan of any 1-PERR problem instance is at most the optimal makespan
of its corresponding Anonymous MAPF problem instance since any solution with the optimal
makespan to the Anonymous MAPF problem instance can be interpreted as a solution to the
1-PERR problem instance with the same makespan. The optimal makespan of the 1-PERR prob-
lem instance is at most the optimal makespan of the Anonymous MAPF problem instance since
any solution with the optimal makespan to the 1-PERR problem instance can be transformed into
a solution to the Anonymous MAPF problem instance with the same makespan. The corollary
thus follows.
Since the optimal makespan of any Anonymous MAPF problem instance is bounded from
above by M + |V |−1 (Yu & LaValle, 2013a) (which is a tighter upper bound than the one given
by Equation 4.1), the optimal makespan of any 1-PERR problem instance is also bounded from
above by M + |V | − 1.
4.5.3.2 Open Questions for Flowtime Minimization
We now consider a version of 1-PERR and a version of Anonymous MAPF that minimize the
flowtime. An argument similar to that for proving Corollary 4.19 yields the following corollary:
Corollary 4.20. The optimal flowtime of any 1-PERR problem instance is equal to the optimal
flowtime of its corresponding Anonymous MAPF problem instance.
There exists an Anonymous MAPF solution with the optimal flowtime whose makespan is
bounded from above by (M − 1)(M − 2)/2 + |V| (Yu & LaValle, 2013a). Such an Anonymous MAPF
solution can be interpreted as a 1-PERR solution with the same makespan and flowtime due to
Theorem 4.18, which is also optimal for 1-PERR with respect to the flowtime objective due to
Corollary 4.20. Therefore, there also exists a 1-PERR solution with the optimal flowtime whose
makespan is bounded from above by (M − 1)(M − 2)/2 + |V|.
Therefore, both 1-PERR and Anonymous MAPF can be solved optimally with respect to the
flowtime objective by using the ILP formulation introduced in Section 4.5.2.5 with one type of
commodity and setting Uflowtime = (M − 1)(M − 2)/2 + |V| for their respective Uflowtime-step time-expanded
flow networks. Li can be set as the minimum of the length of a shortest path from si′
to gi in G over all i′ ∈ [M]. The complete ILP formulation is as follows.
minimize Σ_{i∈[M]} zi, subject to:

Σ_{e∈δ+(v)} x[e] − Σ_{e∈δ−(v)} x[e] = 0    ∀v ∈ V such that v ≠ (si)^out_0 and v ≠ (gi)^out_T for all i ∈ [M].

Σ_{e∈δ−((si)^out_0)} x[e] = Σ_{e∈δ+((gi)^out_T)} x[e] = 1    ∀i ∈ [M].

x[e] ∈ {0, 1}    ∀e ∈ E.

zi ≥ t · x[e]    ∀i ∈ [M], ∀t ∈ {Li, . . . , Uflowtime}, ∀e ∈ δ+((gi)^in_t).
However, the complexities of optimally solving 1-PERR and Anonymous MAPF for flowtime
minimization are not known. Their complexities are equivalent due to Theorem 4.18 and
Corollary 4.20. We do not address their complexities in this dissertation but pose the following
open questions:
• Can 1-PERR and Anonymous MAPF be solved optimally in polynomial time for flowtime
minimization?
• Are 1-PERR and Anonymous MAPF NP-hard to solve optimally for flowtime minimiza-
tion?
4.6 Summary
In this chapter, we introduced a unified NP-hardness proof structure that stems from formaliz-
ing and studying a novel MAPF variant, called PERR. Our unified NP-hardness proof structure
reduces NP-complete versions of the Boolean satisfiability problem to target-assignment and
path-planning problems. We used our unified NP-hardness proof structure to prove that PERR
is NP-hard to approximate within any constant factor less than 4/3 for makespan minimization
by reducing ≤3,=3-SAT to PERR. We also used our proof structure to prove that 2-PERR is
NP-hard to approximate within any constant factor less than 4/3 for makespan minimization
by reducing 2/2/3-SAT to 2-PERR. The latter proof generalizes to all cases of K-PERR with
K > 1. Studying PERR is a first attempt toward establishing a theoretical understanding for
one-shot target-assignment and path-planning problems that involve payload exchanges. More
importantly, studying PERR also generates new insights into solving other target-assignment
and path-planning problems. For example, we demonstrated that our unified NP-hardness proof
structure can be used to derive fixed-parameter inapproximability results for the one-shot path-
planning problem MAPF, the one-shot combined target-assignment and path-planning problem
TAPF, and the long-term combined target-assignment and path-planning problem MAPD and
NP-hardness results for many other MAPF variants. Specifically, we proved that MAPF, TAPF
with K > 1 teams of agents, and MAPD are all NP-hard to approximate within any constant
factor less than 4/3 for makespan minimization. We also described how optimal MAPF algo-
rithms can be used to solve PERR optimally by viewing MAPF as a relaxation of PERR and that
the special case of 1-PERR can be solved optimally for makespan minimization in polynomial
time. Finally, we posed open questions for optimally solving 1-PERR and Anonymous MAPF
with respect to the flowtime objective.
To summarize, in this chapter, we validated the hypothesis that formalizing and studying
new variants of MAPF can result in new theoretical insights into the one-shot and the long-
term combined target-assignment and path-planning problems for teams of agents. The theo-
rems and corollaries derived in this chapter suggest that it is NP-hard to solve the one-shot and
the long-term combined target-assignment and path-planning problems optimally for makespan
minimization in general, but computing solutions with the optimal makespan for the special case
of one team of agents for the one-shot combined target-assignment and path-planning problem
can be done in polynomial time. These results lay a theoretical foundation for studying the
combined target-assignment and path-planning problems that we introduce in Chapters 5 and 6.
Chapter 5
One-Shot Target Assignment and Path Planning
In this chapter, we present the second major contribution of this dissertation. Specifically, we
formalize and study a new variant of MAPF, called Combined Target Assignment and Path
Finding (TAPF), that models the one-shot coordination of autonomous target-assignment and
path-planning operations of teams of agents. We demonstrate how TAPF can be solved using
a flow-based ILP formulation. We also present a novel TAPF algorithm, called Conflict-Based
Min-Cost Flow (CBM), that exploits the combinatorial structure of TAPF. Both algorithms are
optimal for TAPF. Experimental results show that CBM can compute optimal TAPF solutions
for more than 400 agents in minutes of runtime. We describe how the existing polynomial-time
procedure MAPF-POST can be used in a post-processing step to transform TAPF solutions into
plan-execution schedules that can be safely executed by teams of real-world agents. Therefore,
these results validate the hypothesis that formalizing and studying new variants of MAPF can re-
sult in new algorithms for the one-shot combined target-assignment and path-planning problem
that can benefit real-world applications of multi-agent systems. We follow our formalization of
TAPF in Section 2.3.
The remainder of this chapter is structured as follows. In Section 5.1, we reiterate the motiva-
tion behind formalizing and studying TAPF. In Section 5.2, we establish the relationship between
This chapter is based on Ma, H., & Koenig, S. (2016). Optimal target assignment and path finding for teams ofagents. In International Conference on Autonomous Agents and Multiagent Systems (pp. 1144–1152).
TAPF and flow problems and present an optimal TAPF algorithm based on a flow-based ILP for-
mulation. In Section 5.3, we present CBM and prove that it is correct, complete, and optimal.
In Section 5.4, we experimentally evaluate the scalability of CBM on TAPF problem instances
with dozens of teams and hundreds of agents, compare it with the ILP-based TAPF algorithm,
and adapt CBM to a warehouse map. In Section 5.5, we describe how TAPF algorithms can take
kinematic constraints into account. Finally, we summarize the contributions of this chapter in
Section 5.6.
5.1 Introduction
In this chapter, we develop algorithmic techniques for the one-shot coordination of autonomous
target-assignment and path-planning operations of teams of agents by studying TAPF (see Sec-
tion 2.3). TAPF generalizes both Anonymous MAPF and (Non-Anonymous) MAPF. In TAPF,
M agents are given and are partitioned into K teams. Each team is given the same number of
unique targets as there are agents in the team. The problem is to assign targets to agents and
plan collision-free paths for the agents from their given start vertices to their assigned targets in
a way such that each agent moves to exactly one target given to its team and each target of each
team is reached by an agent in the same team. Any agent in a team can be assigned any target of
the team, and the agents in the same team are thus exchangeable. However, agents in different
teams are not exchangeable. For example, in an Amazon Robotics automated warehouse sys-
tem (see Figure 1.1 for an example), warehouse robots delivering inventory pods to K inventory
stations are partitioned into K teams, one for each inventory station. The problem is to assign
positions in (the entrance queue of) each inventory station to warehouse robots carrying inven-
tory pods that store the products needed by the inventory station and plan collision-free paths
for the warehouse robots from their current locations to their assigned positions in the inventory
stations.
TAPF also models the one-shot coordination of both the target-assignment and the path-
planning operations for many other real-world applications of multi-agent systems, including
teams of video game characters (Ma, Yang, et al., 2017) or swarms of differential-drive robots
and quadcopters (Honig, Kumar, Ma, et al., 2016; Honig, Preiss, et al., 2018; Li, Sun, et al.,
2020; Preiss et al., 2017; Turpin, Michael, & Kumar, 2014) that assign targets among themselves
and move to their assigned targets to form a designed formation together, groups of soccer robots
(MacAlpine, Price, & Stone, 2015) that assign positions of a soccer field among themselves and
move to their assigned positions to fill different roles in a soccer game of RoboCup, and teams
of service robots (Jiang, Yedidsion, Zhang, Sharon, & Stone, 2019) that assign different tasks
among themselves and move to the task locations (where their assigned tasks can be executed)
in an indoor office environment to provide service to the office workers.
As we previously discussed in Section 1.1, solving the one-shot combined target-assignment
and path-planning problem TAPF requires solving both the one-shot target-assignment and the
one-shot path-planning sub-problems for these teams of agents. However, research so far has
concentrated on the two extreme cases of Anonymous MAPF and (Non-Anonymous) MAPF.
Yet, many real-world applications fall between the extreme cases because the number of teams
is larger than one but smaller than the number of agents (1 < K < M ). Straightforward ways of
generalizing (Non-Anonymous) MAPF algorithms face difficulties with either scalability or solution quality: searching over all assignments of targets to agents to find optimal solutions results in prohibitively large state spaces, while assigning targets to agents with target-assignment algorithms (Tovey, Lagoudakis, Jain, & Koenig, 2005; Zheng & Koenig, 2009) and then planning collision-free paths for the agents with (Non-Anonymous) MAPF algorithms (perhaps followed by improving the assignment and iterating (Wagner, Choset, & Ayanian, 2012)) finds only
sub-optimal solutions. It is also unclear how to generalize algorithms for solving Anonymous
MAPF, which is a polynomial-time solvable problem, to the NP-hard problem of TAPF with
multiple teams of agents.
Therefore, we study algorithmic techniques for optimally solving TAPF to bridge the gap
between algorithms for the extreme cases of Anonymous MAPF and (Non-Anonymous) MAPF.
We first demonstrate how TAPF can be solved by generalizing the flow-based ILP formulation
for MAPF (see Section 3.2.3) to TAPF. We then present a TAPF algorithm, called CBM, that
solves TAPF optimally by simultaneously assigning targets to agents and planning collision-
free paths for them. It exploits the combinatorial structure of TAPF by reducing TAPF to the
polynomial-time solvable sub-problems for single teams and the NP-hard sub-problem of coor-
dinating different teams. Specifically, CBM is a hierarchical algorithm that combines ideas from
both Anonymous and (Non-Anonymous) MAPF algorithms. It uses a min-cost max-flow algo-
rithm (Goldberg & Tarjan, 1987) on a time-expanded flow network on the low level to assign
targets to and find paths for agents in single teams and CBS (see Section 3.2.2) on the high level
to resolve any collisions among the paths of agents from different teams. Theoretically, we prove
that CBM is correct, complete, and optimal. Experimentally, we compare CBM to solving the
flow-based ILP formulation of TAPF directly with an ILP solver.
5.2 ILP-Based TAPF Algorithm
Similar to the ILP-based MAPF algorithm (see Section 3.2.3), the ILP-based TAPF algorithm
first reduces TAPF to the integer multi-commodity flow problem on a time-expanded flow net-
work and then uses this reduction to solve TAPF optimally.
5.2.1 Reducing TAPF to Multi-Commodity Flow
We now describe the reduction used by the ILP-based TAPF algorithm.
Given a TAPF problem instance on an undirected graph G = (V, E) and a fixed number of
time steps T, we construct a T-step time-expanded flow network N = (V, E) with vertices
$\mathcal{V} = \bigcup_{v \in V} \left( \{v^{out}_0\} \cup \bigcup_{t=1}^{T} \{v^{in}_t, v^{out}_t\} \right)$
and directed edges E with unit capacity. Each vertex v ∈ V is
translated to a vertex v^out_t ∈ V for all t = 0, . . . , T (which represents vertex v at the end of time
step t) and a vertex v^in_t ∈ V for all t = 1, . . . , T (which represents vertex v in the beginning
of time step t). For each agent a^k_j, we set a supply of one unit of commodity type k at (start)
vertex (s^k_j)^out_0. For each target g^k_j, we set a demand of one unit of commodity type k at (target)
vertex (g^k_j)^out_T. Each vertex v ∈ V is also translated to an edge (v^out_t, v^in_{t+1}) ∈ E for all time steps
t = 0, . . . , T − 1 (which represents an agent waiting at vertex v between time steps t and t + 1).
Figure 5.1: Example of the construction of the gadgets in N for edge (u, v) ∈ E and time step t.
Each vertex v ∈ V is translated to an edge (v^in_t, v^out_t) ∈ E for all time steps t = 1, . . . , T (which
prevents vertex collisions of the form ⟨∗, ∗, v, t⟩ among all agents since only one agent can
occupy vertex v between time steps t and t + 1). Each edge (u, v) ∈ E is translated to a gadget of
vertices in V and edges in E for all time steps t = 0, . . . , T − 1, which consists of two auxiliary
vertices w, w′ ∈ V that are unique to the gadget (but carry no superscripts/subscripts here for
ease of readability) and the edges (u^out_t, w), (v^out_t, w), (w, w′), (w′, u^in_{t+1}), (w′, v^in_{t+1}) ∈ E. This
gadget prevents edge collisions of the forms ⟨∗, ∗, u, v, t⟩ and ⟨∗, ∗, v, u, t⟩ among all agents
since only one agent can move along the edge (u, v) in either direction between time steps t and
t + 1. Figure 5.1 shows an example of the construction of the gadgets.
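The construction above can be made concrete in code. The following Python sketch builds the vertex and edge sets of the T-step time-expanded flow network, including the gadgets; the tuple encodings, auxiliary-vertex tags, and the function name are illustrative choices of this sketch, not the dissertation's implementation:

```python
from itertools import count

def build_time_expanded_network(V, E, T):
    """Sketch of the T-step time-expanded flow network of Section 5.2.1.

    Vertices are tuples ('v', 'in'/'out', t); gadget vertices are tagged
    ('w', id). Every listed edge has unit capacity.
    """
    verts = set()
    edges = []
    for v in V:
        verts.add((v, 'out', 0))
        for t in range(1, T + 1):
            verts.add((v, 'in', t))
            verts.add((v, 'out', t))
    for v in V:
        # waiting at vertex v between time steps t and t+1
        for t in range(T):
            edges.append(((v, 'out', t), (v, 'in', t + 1)))
        # pass-through edge that forbids vertex collisions at v
        for t in range(1, T + 1):
            edges.append(((v, 'in', t), (v, 'out', t)))
    aux = count()  # unique ids for the auxiliary vertices w, w'
    for (u, v) in E:
        for t in range(T):
            w, w2 = ('w', next(aux)), ('w', next(aux))
            verts.update([w, w2])
            # gadget that forbids edge collisions on (u, v) at time step t
            edges += [((u, 'out', t), w), ((v, 'out', t), w), (w, w2),
                      (w2, (u, 'in', t + 1)), (w2, (v, 'in', t + 1))]
    return verts, edges
```

For a graph with |V| vertices and |E| edges, this yields |V|(2T + 1) + 2|E|T network vertices and 2|V|T + 5|E|T network edges.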
The above reduction is similar to reducing MAPF to the integer multi-commodity flow prob-
lem (see Section 3.2.3). The construction of the flow network for a TAPF problem instance is
similar to that for a MAPF problem instance, except that the supplies at all vertices (s^k_j)^out_0 and
the demands at all vertices (g^k_j)^out_T have the same commodity type k for each k ∈ [K]. The
construction implies the following theorem:
Theorem 5.1. There is a correspondence between all feasible integer multi-commodity flows of
a number of units equal to the number of agents on the T-step time-expanded network and all
solutions of the TAPF problem instance with makespans of at most T.
The proof of this theorem mirrors the one for the reduction of MAPF to the integer multi-
commodity flow problem (Yu & LaValle, 2013b).
5.2.2 Solving TAPF via ILP
The ILP-based TAPF algorithm employs the reduction to solve TAPF optimally. Let δ⁺(v)
(respectively δ⁻(v)) be the set of incoming (respectively outgoing) edges of v. Let x_k[e] be the
Boolean variable representing the amount of flow of commodity type k on edge e. An integer
multi-commodity flow problem on the T-step time-expanded network can be written as an ILP
using the following standard formulation:
\[
\begin{aligned}
&0 \le \sum_{k=1}^{K} x_k[e] \le 1
&& \forall e \in \mathcal{E}, \\
&\sum_{e \in \delta^+(v)} x_k[e] \;-\; \sum_{e \in \delta^-(v)} x_k[e] = 0
&& \forall k \in [K],\ \forall j \in [M_k],\ \forall v \in \mathcal{V} \text{ such that } v \neq (s^k_j)^{out}_0 \text{ and } v \neq (g^k_j)^{out}_T, \\
&\sum_{j \in [M_k]} \sum_{e \in \delta^-\left((s^k_j)^{out}_0\right)} x_k[e] \;=\; \sum_{j \in [M_k]} \sum_{e \in \delta^+\left((g^k_j)^{out}_T\right)} x_k[e] = M_k
&& \forall k \in [K], \\
&x_k[e] \in \{0, 1\}
&& \forall k \in [K],\ \forall e \in \mathcal{E}.
\end{aligned}
\]
An optimal solution can thus be found by starting with T = 0 and iteratively checking for in-
creasing values of T whether a feasible integer multi-commodity flow of a number of units equal
to the number of agents exists for the corresponding T -step time-expanded network (which
is an NP-hard problem), until an upper bound on T is reached (such as the one provided in
Section 2.3.1). Each T -step time-expanded network is translated into an ILP, which is then
solved with an ILP solver. We evaluate this ILP-based TAPF algorithm experimentally in Sec-
tion 5.4.1.1.
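The outer loop of this procedure can be sketched as follows, with the network construction, ILP translation, and solver call abstracted into a hypothetical callback `feasible_flow_exists` (an assumed interface of this sketch, not a function from the dissertation):

```python
def solve_tapf_via_ilp(feasible_flow_exists, upper_bound):
    """Outer loop of the ILP-based TAPF algorithm (Section 5.2.2).

    `feasible_flow_exists(T)` is a hypothetical callback that builds the
    T-step time-expanded network, translates it into an ILP, and reports
    whether an ILP solver found a feasible integer multi-commodity flow
    of a number of units equal to the number of agents.

    Returns the optimal makespan, or None if no solution exists with a
    makespan of at most `upper_bound`.
    """
    for T in range(upper_bound + 1):
        if feasible_flow_exists(T):
            return T  # smallest feasible T equals the optimal makespan
    return None
```

Since T increases from 0 and the loop stops at the first feasible value, the returned T is guaranteed to be the minimum makespan (by Theorem 5.1).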
5.2.3 Example
We now demonstrate how to solve the TAPF problem instance shown in Figure 2.5 via the
reduction to the integer multi-commodity flow problem. There is no feasible integer multi-
commodity flow on the 0, 1, or 2-step time-expanded flow network constructed from the problem
instance. Figure 5.2 shows the construction of the 3-step time-expanded flow network from the
problem instance and a feasible integer multi-commodity flow on it. There is a supply of one
unit of commodity type 1 at vertex C^out_0 and a demand of one unit of commodity type 1 at vertex
E^out_3. There is a supply of one unit of commodity type 2 at vertices A^out_0 and B^out_0 each and a
demand of one unit of commodity type 2 at vertices D^out_3 and F^out_3 each. The blue edges represent
a flow for commodity type 1, which corresponds to a path for the agent in team1. The green
edges represent a flow for commodity type 2, which corresponds to paths for the two agents in
team2. This feasible integer multi-commodity flow thus corresponds to the assignments of
target g^1_1 = E to agent a^1_1 of team1 and target g^2_1 = D to agent a^2_1 and target g^2_2 = F to agent a^2_2,
respectively, of team2, namely σ1(a^1_1) = g^1_1, σ2(a^2_1) = g^2_1, and σ2(a^2_2) = g^2_2, and represents the
optimal solution {π^1_1 = ⟨C,D,E⟩, π^2_1 = ⟨A,A,B,D⟩, π^2_2 = ⟨B,B,D,F⟩} with makespan 3.
5.2.4 Special Case of One Team
Anonymous MAPF results from TAPF if only one team exists (which then consists of all agents;
see also Section 3.4.2) and can be solved with a max-flow algorithm in polynomial time.
Corollary 5.2. TAPF with one team of agents can be solved optimally for makespan minimiza-
tion in polynomial time.
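Corollary 5.2 rests on the fact that a single team yields a single-commodity flow problem, which any polynomial-time max-flow algorithm can decide. As a concrete illustration, here is a minimal Edmonds-Karp implementation for unit-capacity networks, a simple stand-in for (not a reimplementation of) the Goldberg-Tarjan algorithm used later in the chapter:

```python
from collections import defaultdict, deque

def max_flow(edges, source, sink):
    """Edmonds-Karp on a network whose listed edges all have unit capacity."""
    cap = defaultdict(int)
    adj = defaultdict(set)
    for u, v in edges:
        cap[(u, v)] += 1
        adj[u].add(v)
        adj[v].add(u)  # residual (reverse) arcs start with capacity 0
    flow = 0
    while True:
        # BFS for a shortest augmenting path in the residual network.
        parent = {source: None}
        queue = deque([source])
        while queue and sink not in parent:
            u = queue.popleft()
            for v in adj[u]:
                if v not in parent and cap[(u, v)] > 0:
                    parent[v] = u
                    queue.append(v)
        if sink not in parent:
            return flow  # no augmenting path left: the flow is maximum
        # Augment one unit of flow along the path found.
        v = sink
        while parent[v] is not None:
            u = parent[v]
            cap[(u, v)] -= 1
            cap[(v, u)] += 1
            v = u
        flow += 1
```

To decide feasibility for one team, one would connect a super-source to all start vertices and all target vertices to a super-sink of the time-expanded network and check whether the maximum flow equals the team size.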
5.3 Conflict-Based Min-Cost Flow
In this section, we present Conflict-Based Min-Cost-Flow (CBM), a hierarchical algorithm that
solves TAPF optimally. On the high level, CBM considers each team to be a meta-agent. It
uses CBS (see Section 3.2.2) to resolve collisions among meta-agents, that is, agents in different
teams. Similar to CBS, the high-level search of CBM performs a best-first search on a constraint
tree, where each node contains a set of constraints and paths for all agents that obey these con-
straints, move all agents to unique targets of their teams, and result in no collisions among agents
in the same team. On the low level, CBM uses a polynomial-time min-cost max-flow algorithm
(Goldberg & Tarjan, 1987) on a time-expanded network to assign all targets of a single team to
Figure 5.2: A feasible integer multi-commodity flow for the TAPF problem instance shown in Figure 2.5.
unique agents in the same team and plan paths for the agents that obey the constraints imposed
by the currently considered high-level node and result in no collisions among the agents in the
team. Since the runtime of CBS on the high level can be exponential in the number of collisions
that need to be resolved (Sharon et al., 2015), CBM uses edge weights on the low level to bias
the paths of agents in the same team so as to reduce the possibility of creating collisions with
agents in different teams.
The idea of biasing the search on the low level has been used before for solving (Non-
Anonymous) MAPF with CBS (as described in Section 3.2.2.2). Similarly, the idea of grouping
some agents into a meta-agent on the high level and planning paths for each group on the low
level has been used before for solving (Non-Anonymous) MAPF with CBS (as described in
Section 3.2.2.5) but faces the difficulty of having to identify good groups of agents (Sharon et
al., 2015). The best way to group agents can often be determined only experimentally and varies
significantly among MAPF problem instances. On the other hand, grouping all agents in a team
into a meta-agent for solving TAPF is a natural way of grouping agents since the assignments of
targets to agents in the same team and the paths of these agents strongly depend on each other
and should thus be planned together on the low level. For example, if an agent is assigned a
different target, then many of the agents in the same team typically need to be assigned different
targets as well and have their paths re-planned. Also, the lower level can then use a polynomial-
time max-flow algorithm on a time-expanded network to assign all targets of a single team to
agents in that team and find paths for the agents, due to the polynomial-time complexity of the
corresponding TAPF sub-problem for one single team (see Section 5.2.4).
5.3.1 High-Level Search of CBM
Similar to the high-level search of CBS (see Section 3.2.2.1), CBM performs a best-first search
on the high level to resolve collisions among agents from different teams and builds a constraint
tree.
Definition 5.1. A constraint for team teamk is either a vertex constraint 〈teamk, v, t〉, that
prohibits any agent in team teamk from occupying a vertex v at timestep t, or an edge con-
straint 〈teamk, u, v, t〉, that prohibits any agent in teamk from moving from vertex u to vertex
v between timesteps t and t+ 1.
Each node N contains a set of constraints N.constraints, a plan (paths for all agents) N.plan
that obeys these constraints, moves all agents to unique targets of their teams, and results in no
collisions among agents in the same team, and a cost N.cost equal to the makespan of its plan.
The list OPEN stores all generated but unexpanded nodes. Algorithm 5.1 shows the high-level
search of CBM. CBM starts with the root node, which has an empty set of constraints [Line 1].
Algorithm 5.1: High-Level Search of CBM
Input: TAPF problem instance
1  Root.constraints ← ∅;
2  Root.plan ← ∅;
3  foreach k ∈ [K] do
4      if LowLevel(teamk, Root) returns no paths then
5          return "No Solution";
6      Add the returned paths to Root.plan;
7  Root.cost ← Makespan(Root.plan);
8  OPEN ← {Root};
9  while OPEN ≠ ∅ do
10     N ← arg min_{N′ ∈ OPEN} N′.cost;
11     OPEN ← OPEN \ {N};
12     if N.plan has no collision then
13         return N.plan;
14     collision ← a vertex or edge collision ⟨teamk, teamk′, . . .⟩ in N.plan;
15     foreach teamk involved in collision do
16         N′ ← new node;
17         N′.plan ← N.plan;
18         N′.constraints ← N.constraints;
19         N′.constraints ← N′.constraints ∪ {⟨teamk, . . .⟩};
20         if LowLevel(teamk, N′) returns paths then
21             Update N′.plan with the returned paths;
22             N′.cost ← Makespan(N′.plan);
23             OPEN ← OPEN ∪ {N′};
24 return "No Solution";
It performs a low-level search to find paths for all agents in each team that move all agents to
unique targets of the team and result in no collisions among agents in the team. It terminates
unsuccessfully if the low-level search for any team returns no path. Otherwise, the plan of the
root node contains paths for all agents [Line 2-6]. The cost of the root node is the makespan of
its plan [Line 7]. CBM inserts the root node into OPEN [Line 8]. If OPEN is empty, then CBM
terminates unsuccessfully [Lines 9 and 24]. Otherwise, CBM expands a node N in OPEN with
the smallest cost [Line 10] and removes the node from OPEN [Line 11]. The cost of a node
is the makespan of its plan. Similar to the high-level search of CBS (see Section 3.2.2.1), ties
are broken in favor of the node whose plan has the smallest number of pairs of colliding teams.
If the plan of node N has no colliding agents, then it is a solution and CBM terminates
successfully with this plan [Lines 12-13]. Otherwise, CBM finds a collision that it needs to
resolve [Line 14]. CBM then generates two child nodes N1 and N2 of node N [Lines 15-16].
Each child node inherits the plan and all constraints from node N [Lines 17-18]. If the collision
is a vertex collision 〈teamk, teamk′ , v, t〉, then CBM adds the vertex constraint 〈teamk, v, t〉 to
the constraints of node N1 and the vertex constraint 〈teamk′ , v, t〉 to the constraints of node N2
[Line 19]. If the collision is an edge collision 〈teamk, teamk′ , u, v, t〉, then CBM adds the edge
constraint 〈teamk, u, v, t〉 to the constraints of node N1 and the edge constraint 〈teamk′ , v, u, t〉
to the constraints of node N2 [Line 19]. For node N1 (respectively N2), CBM performs a low-
level search to assign targets of team teamk (respectively teamk′) to agents in the same team and
find paths for the agents that obey all constraints in N1.constraints (respectively N2.constraints)
relevant to team teamk (respectively teamk′) and result in no collisions among the agents in the
team. If the low-level search successfully returns such paths, CBM replaces the old paths of all
agents in teamk (respectively teamk′) in N1.plan (respectively N2.plan) with the returned ones
[Line 20], updates the cost of N1 (respectively N2) accordingly, and inserts N1 (respectively N2)
into OPEN [Lines 22-23]. Otherwise, it discards the node.
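Algorithm 5.1 translates almost line by line into executable form. In the sketch below, `low_level`, `find_collision`, and `makespan` are hypothetical callback interfaces of this sketch (not the dissertation's data structures), and ties are broken by insertion order rather than by the number of colliding team pairs as in CBM:

```python
import heapq
from itertools import count

def cbm_high_level(K, low_level, find_collision, makespan):
    """Best-first search on the constraint tree, mirroring Algorithm 5.1.

    `low_level(k, constraints)` returns a list of paths for team k or None;
    `find_collision(plan)` returns an iterable of (team, new_constraint)
    pairs for the teams involved in the first collision, or None.
    """
    # Build the root node by planning each team independently [Lines 1-8].
    root_plan = {}
    for k in range(K):
        paths = low_level(k, frozenset())
        if paths is None:
            return None  # "No Solution"
        root_plan[k] = paths
    tie = count()  # unique tie-breaker keeps heap entries totally ordered
    open_list = [(makespan(root_plan), next(tie), frozenset(), root_plan)]
    while open_list:
        cost, _, constraints, plan = heapq.heappop(open_list)  # [Lines 10-11]
        collision = find_collision(plan)
        if collision is None:
            return plan  # collision-free plan found [Lines 12-13]
        # Branch: one child node per team involved in the collision [Lines 15-23].
        for k, new_constraint in collision:
            child_constraints = constraints | {new_constraint}
            paths = low_level(k, child_constraints)
            if paths is not None:
                child_plan = dict(plan)
                child_plan[k] = paths
                heapq.heappush(open_list, (makespan(child_plan), next(tie),
                                           child_constraints, child_plan))
    return None  # OPEN exhausted [Line 24]
```

The unique counter in each heap entry prevents Python from ever comparing the plan dictionaries when two nodes have equal cost.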
5.3.2 Low-Level Search of CBM
On the low level, LowLevel(teamk,N ) assigns the targets of team teamk to agents in the same
team and finds paths for the agents that obey all constraints of node N (namely all vertex con-
straints of the form 〈teamk, ∗, ∗〉 and all edge constraints of the form 〈teamk, ∗, ∗, ∗〉) and result
in no collisions among the agents in the team.
Given a fixed number of time steps T, CBM constructs the T-step time-expanded flow net-
work from Section 5.2 with the following changes: (a) There is only a single commodity type
k since CBM considers only the single team teamk. There is a supply of one unit of this
commodity type at vertex (s^k_j)^out_0 and a demand of one unit of this commodity type at vertex
(g^k_j)^out_T for all j ∈ [Mk]. (b) To obey the vertex constraints, CBM removes the edge (v^in_t, v^out_t)
from E for each vertex constraint of the form ⟨teamk, v, t⟩. (c) To obey the edge constraints,
CBM removes the edges (u^out_t, w) and (w′, v^in_{t+1}) from E for all gadgets that correspond to
edge (u, v) ∈ E for each edge constraint of the form ⟨teamk, u, v, t⟩. Let V′ be the set of
(remaining) vertices and E′ be the set of (remaining) edges.
Similar to the procedure from Section 5.2, CBM iteratively checks for increasing values of
T whether a feasible integer single-commodity flow of Mk units exists for the corresponding
T -step time-expanded flow network, which can be done with the polynomial-time max-flow
algorithm that finds a feasible maximum flow. CBM can start with T being a lower bound on
the cost of node N. One such lower bound is the cost of the parent node of node N since the
constraints of node N are a superset of those of its parent node, which results in a plan with a
makespan no smaller than that of its parent node. (For N = Root, CBM starts with T = 0.)
During the earliest iteration when the max-flow algorithm finds a feasible flow of Mk units, the
call returns successfully with the paths for the agents in the team that correspond to the flow. If T
reaches an upper bound on the makespan of an optimal solution (such as the one provided in Yu
and Rus (2015)) and no feasible flow of Mk units was found, then the call returns unsuccessfully
with no paths.
5.3.3 Example
We now demonstrate how to solve the TAPF problem instance shown in Figure 2.5 using CBM.
Figure 5.3 shows the construction of the root node Root in the constraint tree on the left and the
2-step time-expanded flow network for the low-level search for each team on the bottom right
with a feasible integer flow for the team. Root.constraints of the root node Root is an empty set of
constraints [Line 1]. LowLevel(team1, Root) returns path π^1_1 = ⟨C,D,E⟩, and LowLevel(team2,
Root) returns paths π^2_1 = ⟨A,B,D⟩ and π^2_2 = ⟨B,D,F⟩, which result in no collisions among agents
in team2 [Lines 3-6]. The resulting cost Root.cost of the root node Root is 2 [Line 7]. In the
first iteration, the root node N = Root is popped from OPEN [Line 10]. There is a vertex
collision 〈team1, team2,D, 1〉 in its plan [Line 14]. For team team1, CBM generates the left
child node N ′ and adds a new constraint 〈team1,D, 1〉 [Lines 16-19]. Figure 5.4 shows the
construction of the left child node N ′ = N1 in the constraint tree on the left and the 3-step
Figure 5.3: Construction of the root node Root and the low-level search for team1 and team2. The constraints and the plan of Root are shown on the top right. The colliding teams of each node are shown to explain the tie breaking.
Figure 5.4: Construction of the left child node N1 and the low-level search for team1. The constraints and the plan of N1 are shown on the top right. The constraints and the plans of the other nodes are not shown. The colliding teams of each node are shown to explain the tie breaking.
Figure 5.5: Construction of the right child node N2 and the low-level search for team2. The constraints and the plan of N2 are shown on the top right. The constraints and the plans of the other nodes are not shown. The colliding teams of each node are shown to explain the tie breaking.
time-expanded flow network for the low-level search for team1 on the bottom right, where edge
(D^in_1, D^out_1) is removed to enforce the vertex constraint ⟨team1, D, 1⟩. LowLevel(team1, N1)
returns path π^1_1 = ⟨C,C,D,E⟩, which obeys N′.constraints and results in no collisions among
agents in team1. The plan N ′.plan and the cost N ′.cost of the child node N ′ = N1 are updated
accordingly [Lines 20-23]. Then, the right child node N ′ = N2 is constructed in a similar way
(Figure 5.5). In the next iteration, the two nodes N1 and N2 in OPEN have the same cost. The
right child node N = N2 is popped from OPEN because its plan has the smallest number of
pairs of colliding teams. Since its plan N.plan has no collision, CBM returns its plan as the
solution [Lines 12-13].
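The collision checks in this example can be mechanized. The following sketch (a helper of our own, not part of CBM's implementation) scans plans time step by time step; applied to the root plan above it finds the vertex collision ⟨team1, team2, D, 1⟩, and applied to the collision-free plan of N2 it finds none:

```python
def find_collision(plans):
    """Return the first vertex or edge collision between agents of different
    teams, or None. `plans` maps a team index to a list of paths; each path
    is a list of vertices, one per time step, and is implicitly extended by
    waiting at its last vertex (a simplifying assumption of this sketch).
    """
    def pos(path, t):
        return path[min(t, len(path) - 1)]
    agents = [(k, path) for k, paths in plans.items() for path in paths]
    horizon = max(len(path) for _, path in agents)
    for t in range(horizon):
        for i, (k1, p1) in enumerate(agents):
            for k2, p2 in agents[i + 1:]:
                if k1 == k2:
                    continue  # same-team collisions are ruled out by the low level
                if pos(p1, t) == pos(p2, t):
                    return ('vertex', k1, k2, pos(p1, t), t)
                if (t + 1 < horizon and pos(p1, t) == pos(p2, t + 1)
                        and pos(p1, t + 1) == pos(p2, t)):
                    return ('edge', k1, k2, pos(p1, t), pos(p1, t + 1), t)
    return None
```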
5.3.4 Avoiding the Creation of Collisions with Other Teams in the Low-Level Search
CBM actually implements LowLevel(teamk,N ) in a more sophisticated way to avoid creating
collisions between agents in team teamk and agents in other teams by adding edge weights to
the T -step time-expanded flow network. CBM sets the weights of all edges in E ′ to zero initially
and then modifies them as follows: (a) To reduce vertex collisions, CBM increases the weight
of edge (v^in_t, v^out_t) ∈ E′ by one for each vertex v = π^{k′}_{j′}(t) ∈ V in the paths of node N with
k′ ≠ k to reduce the possibility of an agent of team teamk occupying the same vertex at the
same time step as an agent from a different team. (b) To reduce edge collisions, CBM increases
the weight of edge (v^out_t, w) ∈ E′ by one for each edge (u = π^{k′}_{j′}(t), v = π^{k′}_{j′}(t + 1)) ∈ E in the
paths of node N with k′ ≠ k (where w is the auxiliary vertex of the gadget that corresponds to
edge (u, v) and time step t) to reduce the possibility of an agent of team teamk moving along
the same edge in the opposite direction at the same time step as an agent from a different team.
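These weight increments can be sketched as follows. The ('pass', v, t) and ('entry', u, v, t) keys are an abstract encoding of this sketch, standing for the pass-through edge (v^in_t, v^out_t) and the gadget-entry edge (v^out_t, w) of the gadget of edge (u, v) at time step t, respectively; they are not the dissertation's data structures:

```python
from collections import Counter

def biasing_weights(paths_of_other_teams):
    """Edge-weight increments of Section 5.3.4 for the paths of all teams
    other than the one currently being re-planned. Edges not present in
    the result keep weight zero.
    """
    w = Counter()
    for path in paths_of_other_teams:
        # (a) discourage sharing vertex path[t] at time step t
        for t, v in enumerate(path):
            if t >= 1:  # the pass-through edge only exists for t >= 1
                w[('pass', v, t)] += 1
        # (b) discourage traversing edge (u, v) in the opposite direction
        for t in range(len(path) - 1):
            u, v = path[t], path[t + 1]
            if u != v:  # waiting uses no gadget
                w[('entry', u, v, t)] += 1
    return dict(w)
```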
CBM uses the procedure described above, except that it now uses a min-cost max-flow al-
gorithm (instead of a max-flow algorithm) that finds a flow of minimal weight among all fea-
sible maximum flows. In particular, it uses the successive shortest path algorithm (Goldberg &
Tarjan, 1987), a generalization of the Ford-Fulkerson algorithm (Ford & Fulkerson, 1956) that
uses Dijkstra’s algorithm (Dijkstra, 1959) to find a path of minimal weight for one unit of flow.
The complexity of the successive shortest path algorithm is O(U(|E′| + |V′| log |V′|)), where
O(|E′| + |V′| log |V′|) is the complexity of Dijkstra's algorithm and U is the value of the feasible
maximum flow, which is bounded from above by Mk. The number of times that the successive
shortest path algorithm is executed is bounded from above by the chosen upper bound on the
makespan of an optimal solution, which in turn is bounded from above by O(|V |3). Thus, each
low-level search runs in polynomial time.
The low-level space-time A* search of CBS (see Section 3.2.2.2) uses a similar idea to return
a time-minimal path with the fewest collisions with the paths of other agents.
5.3.5 Analysis of Properties
We use the following properties to prove that CBM is correct, complete, and optimal.
Property 5.1. There is a correspondence between all feasible integer flows of Mk units on the
T -step time-expanded flow network constructed for team teamk and node N and all paths for
agents in team teamk that a) obey the constraints of node N , b) move all agents in team teamk
from their start vertices to unique targets of their team, c) result in no collisions among agents in
team teamk, and d) result in a team cost of team teamk of at most T .
Proof. The property holds by construction and can be proved in a way similar to the one for the
reduction of the (non-anonymous) MAPF problem to the integer multi-commodity flow problem
(Yu & LaValle, 2013b):
Left to right: Assume that a flow is given that has the stated properties. Each unit flow from a
source to a sink corresponds to a path through the time-expanded flow network from a unique
source to a unique sink. Thus, it can be converted to a path for an agent such that all such
paths together have the stated properties: Properties a and d hold by construction of the time-
expanded flow network; Property b holds because a flow of Mk units uses all sources and sinks;
and Property c holds since the flows neither share vertices nor edges.
Right to left: Assume that paths are given that have the stated properties. If necessary, we extend
the paths by letting the agents wait at their targets. Each path now corresponds to a path through
the time-expanded network (due to Properties a and d) from a unique source to a unique sink
(due to Property b) that does not share directed edges with the other such paths (due to Property
c). Thus, it can be converted to a unit flow such that all such unit flows together respect the unit
capacity constraints and form a flow of Mk units.
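The left-to-right direction of this proof is constructive: following unit flows through the network yields the paths. A sketch under the ('v', 'in'/'out', t) vertex encoding and ('w', id) gadget tags used earlier (conventions of this sketch, not the dissertation's):

```python
def flow_to_paths(flow_edges, start_vertices):
    """Decompose a feasible integer flow (given as the set of network edges
    that each carry one unit) into one path per agent, as in the
    left-to-right direction of Property 5.1.
    """
    # Unit capacities guarantee that at most one unit of flow leaves any
    # network vertex, so the flow edges define a successor function.
    succ = dict(flow_edges)
    paths = []
    for s in start_vertices:
        node = (s, 'out', 0)
        path = [s]
        while node in succ:
            node = succ[node]
            # Record the graph vertex each time the flow passes an 'out' copy;
            # gadget vertices ('w', id) and 'in' copies are skipped.
            if len(node) == 3 and node[1] == 'out':
                path.append(node[0])
        paths.append(path)
    return paths
```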
Property 5.2. CBM generates only finitely many nodes.
Proof. The constraint added on Line 19 to a child node is different from the constraints of its
parent node since the paths of its parent node do not obey it. Also, only finitely many different
vertex and edge constraints exist since the graph is finite and there is a finite limit (bounded from
above by O(|V |3)) on the number of time steps. Overall, CBM creates a binary tree of finite
depth and thus generates only finitely many nodes.
Property 5.3. Whenever CBM inserts a node into OPEN, its cost is finite.
Proof (by induction). The property holds for the root node since the graph is connected. Assume
that it holds for the parent node of some child node. The cost of the child node is the maximum
of the cost of the parent node and the team cost of the team whose paths are re-planned for the
child node. The cost of the parent node is finite due to the induction assumption. The team cost
resulting from any paths returned by the low-level search is finite as well.
Property 5.4. Whenever CBM chooses a node on Line 10 with no collisions in its plan, then
CBM correctly terminates with a solution with finite makespan of at most its cost.
Proof. The cost of the node is finite according to Property 5.3, and the makespan of its plan is at
most its cost due to Line 22.
Property 5.5. CBM chooses nodes on Line 10 in non-decreasing order of their costs.
Proof. CBM performs a best-first search on the high level, and the cost of a parent node is at
most the cost of any of its child nodes due to Line 22 since the team cost of the team whose paths
are re-planned (with an additional constraint) for the child node cannot decrease and the team
costs of all other teams remain unchanged in the child node.
Property 5.6. The smallest makespan of any solution that obeys the constraints of a parent node
is at most the smallest makespan of any solution that obeys the constraints of any of its child
nodes.
Proof. The solutions that obey the constraints of a parent node are a superset of the solutions
that obey the constraints of any of its child nodes since the constraints of the parent node are a
subset of the constraints of any of its child nodes.
Property 5.7. The cost of a node is at most the makespan of any solution that obeys its con-
straints.
Proof (by induction). The paths for each team in the root node have the smallest possible team
cost for that team. The property thus holds for the root node. Assume that it holds for the parent
node N of any child node N ′ and that the paths for team teamk were updated in the child node.
Let x be the smallest makespan of any solution that obeys the constraints of the parent node and
y be the smallest makespan of any solution that obeys the constraints of the child node. We show
in the following that the cost of the parent node and the team costs of all teams for the paths of
the child node are all at most y. Then, the cost of the child node is also at most y since it is the
maximum of all these quantities, and the property holds. First, consider the cost of the parent
node. The cost of the parent node is at most x due to the induction assumption, which in turn
is at most y due to Property 5.6. Second, consider any team different from team teamk. Then,
the team cost of the team for the paths of the child node is equal to the team cost of the team
for the paths of the parent node (since the paths were not updated in the child node and are thus
identical), which in turn is at most the cost of the parent node (since the cost of the parent node
is the maximum of several quantities that include the team cost of the team for the paths of the
parent node), which in turn is at most y (as shown directly above). Finally, consider team teamk.
When the low level finds new paths for team teamk, it starts with T being the cost of the parent
node, which is at most y (as shown directly above). Thus, the min-cost max-flow algorithm on a
T -step time-expanded network constructed for team teamk and the constraints of the child node
finds a feasible integer flow of Mk units for T ≤ y since there exists a solution with makespan y
that obeys the constraints of the child node. The team cost of the corresponding paths for team
teamk is at most T due to Property 5.1. The cost of the child node is thus at most y, the smallest
makespan of any solution that obeys its constraints.
Theorem 5.3. CBM is correct, complete, and optimal.
Proof. LowLevel(teamk, Root) on Line 4 solves an Anonymous MAPF sub-problem for teamk
for which a solution always exists (Yu & LaValle, 2013a). CBM thus never terminates unsuc-
cessfully on Line 5.
Assume that no solution to a TAPF problem instance exists. Then, whenever CBM chooses
a node on Line 10, the paths of the node have colliding agents (because otherwise a solution
would exist due to Property 5.4). Thus, OPEN eventually becomes empty and CBM terminates
unsuccessfully on Line 24 since it generates only finitely many nodes due to Property 5.2.
Now assume that a solution exists and the makespan of an optimal solution is x. Assume, for
a proof by contradiction, that CBM does not terminate with a solution with makespan x. Thus,
whenever CBM chooses a node on Line 10 with a cost of at most x, the paths of the node
have colliding agents (because otherwise CBM would correctly terminate with a solution with
makespan at most x due to Property 5.4). A node whose constraints the optimal solution obeys
has a cost of at most x due to Property 5.7. The root node is such a node since the optimal
solution trivially obeys the (empty) constraints of the root node. Whenever CBM chooses such
a node on Line 10, the paths of the node have colliding agents (as shown directly above since
its cost is at most x). CBM thus generates the child nodes of this parent node, the constraints of
at least one of which the optimal solution obeys and which CBM thus inserts into OPEN with a
cost of at most x. Since CBM chooses nodes on Line 10 in non-decreasing order of their costs
due to Property 5.5, it chooses infinitely many nodes on Line 10 with costs of at most x, which
contradicts Property 5.2.
5.4 Experiments
In this section, we describe the results of four experiments on a 2.50 GHz Intel Core i5-2450M
PC with 6 GB RAM. First, we compare CBM to four other TAPF or MAPF algorithms. Second,
we study how CBM scales with the number of agents in each team. Third, we study how CBM
scales with the number of agents. Fourth, we test CBM on a warehouse map.
5.4.1 Experiment 1: Alternative Algorithms
We compared our optimal TAPF algorithm CBM to two optimal (Non-Anonymous) MAPF
algorithms, namely (1) an implementation of CBS for makespan minimization provided by the
authors of Sharon et al. (2015) and (2) an implementation of the ILP-based MAPF algorithm
provided by the authors of Yu and LaValle (2013b), and two optimal TAPF algorithms, namely
(1) an unweighted version of CBM that runs the polynomial-time max-flow algorithm on a time-
expanded network without edge weights (instead of the min-cost max-flow algorithm on a time-
expanded network with edge weights) on the low level and (2) an ILP-based TAPF algorithm
that casts a TAPF problem instance as a series of integer multi-commodity flow problems as
described in Section 5.2, each of which it models as an ILP and solves with the ILP solver
Gurobi 6.0 (www.gurobi.com).
For Experiment 1, each team consists of 5 agents but the number of agents varies from 10
to 50 in increments of 5, resulting in 2 to 10 teams. For each number of agents, we generated
50 TAPF problem instances from 50 different 30× 30 2D 4-neighbor grids with 10% randomly
blocked cells by randomly assigning unique start cells to agents and unique targets to teams.
For the MAPF algorithms, we converted each TAPF problem instance to a (Non-Anonymous)
MAPF problem instance by randomly assigning targets of each team to agents in the same team.
Table 5.1 reports the average makespans and runtimes (in seconds) as well as the success
rates over the problem instances that were solved within a runtime limit of 5 minutes each. Gray
entries indicate that some problem instances were not solved within the time limit, while dashed
Table 5.1: Results for different TAPF and MAPF algorithms on 30×30 4-neighbor grids with randomly blocked cells for different numbers of agents.

        CBM (TAPF)                Unweighted CBM (TAPF)     ILP (TAPF)                CBS (MAPF)                ILP (MAPF)
agents  makespan time(s) success  makespan time(s) success  makespan time(s) success  makespan time(s) success  makespan time(s) success
10      22.34    0.34    1        22.08    0.41    0.72     22.34    18.24   1        36.36    0.03    1        36.36    8.66    1
15      23.88    0.57    1        24.64    1.06    0.44     23.88    35.44   1        37.32    0.05    1        37.32    15.31   1
20      25.06    0.78    1        23.73    2.06    0.22     24.74    62.85   0.94     39.84    0.55    1        39.84    30.30   1
25      25.20    1.07    1        22.25    1.58    0.08     24.76    88.55   0.82     40.44    0.12    1        40.44    43.76   1
30      26.26    1.71    1        31.00    6.73    0.02     24.70    108.75  0.66     41.92    0.21    1        41.92    65.86   1
35      26.50    1.92    1        -        -       0        24.65    121.99  0.46     42.50    1.55    1        42.50    81.83   1
40      27.60    2.95    1        -        -       0        25.29    152.98  0.14     43.69    4.82    0.98     43.53    115.53  0.98
45      27.20    3.66    1        -        -       0        24.29    161.52  0.14     42.41    2.60    0.92     42.37    133.47  0.98
50      27.90    5.32    1        -        -       0        24.50    161.95  0.04     43.96    7.95    0.96     42.86    166.99  0.86
entries indicate that no problem instances were solved within the time limit. CBM solved all
TAPF problem instances within the time limit.
5.4.1.1 CBS and the ILP-Based MAPF Algorithm
Both MAPF algorithms solve most of the MAPF problem instances within the time limit. The
runtimes of CBM and CBS are similar because, on the low level, both the min-cost max-flow
algorithm of CBM (for a single team) and the A* algorithm of CBS (for a single agent) are fast.
Optimal solutions of the TAPF problem instances have smaller makespans than optimal solutions
of the corresponding MAPF problem instances due to the freedom of being able to assign targets
to agents optimally for the TAPF problem instances rather than assigning them randomly for the
MAPF problem instances.
5.4.1.2 Unweighted CBM
Unweighted CBM solves less than half of all TAPF problem instances within the time limit if the
number of agents is larger than 10 due to the large number of collisions among agents in different
teams produced by the max-flow algorithm on the low level in tight spaces with many agents,
which results in a large number of node expansions by CBM on the high level. We conclude that
biasing the search on the low level is important for CBM to solve all TAPF problem instances
within the time limit.
5.4.1.3 ILP-Based TAPF Algorithm
The ILP-based TAPF algorithm solves less than half of all TAPF problem instances within the
runtime limit if the number of agents is larger than 30, and its runtime is much larger than that
of CBM. The runtimes of the ILP-based TAPF algorithm tend to be larger and its success rates
smaller than those of the ILP-based MAPF algorithm. The flow conservation constraints for the sources
and sinks in the ILP formulation of the MAPF problem instance are for a flow of one unit, while
those in the ILP formulation of the TAPF problem instance are for a flow of Mk units (for each
team teamk), which makes the ILPs for TAPF harder to solve than those for MAPF.
Table 5.2: Results for CBM on 30×30 4-neighbor grids with randomly blocked cells for different team sizes.

teams  team size  makespan  time (s)  success
1      100        7.58      0.76      1
2      50         11.10     1.37      1
4      25         15.90     2.94      1
5      20         17.12     2.10      1
10     10         23.04     3.93      1
20     5          29.32     5.33      1
25     4          30.88     6.49      1
50     2          39.76     12.27     1
100    1          46.72     22.83     1
Figure 5.6: Makespans and runtimes for CBM on 30×30 4-neighbor grids with randomly blocked cells for different team sizes.
5.4.2 Experiment 2: Team Size
We experimented with CBM on TAPF problem instances with a fixed total number of agents but
different numbers of teams. For Experiment 2, there are 100 agents but the number of agents
in a team (team size) varies from 100 to 1, resulting in 1 to 100 teams. For each team size,
we generated 50 TAPF problem instances from 50 different 30×30 2D 4-neighbor grids
with 10% randomly blocked cells by randomly assigning unique start cells to agents and unique
targets to teams.
Table 5.2 reports the average makespans and runtimes (in seconds) as well as the success
rates over the problem instances that were solved within a runtime limit of 5 minutes each.
CBM solved all TAPF problem instances within the time limit. For large team sizes and thus
small numbers of teams, the makespans are small because CBM has more freedom to assign
targets to agents. The runtimes are also small because the runtime of CBM is exponential in the
number of collisions it resolves on the high level (Sharon et al., 2015) and CBM tends to resolve
fewer collisions on the high level for smaller numbers of teams. Thus, it is advantageous for
teams to consist of as many agents as possible. Figure 5.6 shows a spectrum that spans from
one single team of 100 agents on the left side of the x-axis (where CBM solves Anonymous
MAPF problem instances) to 100 teams of one agent each (100 single agents) on the right side
of the x-axis (where CBM solves MAPF problem instances). The average runtimes tend to
grow linearly in the number of teams, which indicates that CBM exploits the problem structure
of TAPF problem instances well.
5.4.3 Experiment 3: Number of Agents and Scalability
We experimented with CBM on TAPF problem instances with large numbers of agents to evaluate
how CBM scales with the number of agents. For Experiment 3, each team consists of 5 agents
but the number of agents varies from 100 to 550 in increments of 50, resulting in 20 to 110
teams. For each number of agents, we generated 50 TAPF problem instances from 50 different
30 × 30 2D 4-neighbor grids with 10% randomly blocked cells by randomly assigning unique
start cells to agents and unique targets to teams.
Table 5.3 reports the average makespans and runtimes (in seconds) as well as the success
rates over the problem instances that were solved within a runtime limit of 5 minutes each. Gray
entries indicate that some problem instances were not solved within the time limit, while dashed
entries indicate that no problem instances were solved within the time limit. Figure 5.7 shows
the number of agents on the x-axis and the success rate on the y-axis. For 400 agents or fewer,
the success rate is larger than 90%. As the number of agents increases, the success rate decreases
and the makespan and runtime both increase due to the increasing number of collisions among
Table 5.3: Results for CBM on 30×30 4-neighbor grids with randomly blocked cells for different numbers of agents.

agents  teams  makespan  time (s)  success
100     20     30.10     6.79      1
150     30     29.78     11.47     1
200     40     32.12     16.48     1
250     50     31.38     26.19     0.96
300     60     32.40     38.94     0.96
350     70     32.61     53.39     0.98
400     80     33.28     93.01     0.94
450     90     33.15     164.13    0.78
500     100    35.27     277.24    0.22
550     110    -         -         0
Figure 5.7: Success rates for CBM on 30×30 4-neighbor grids with randomly blocked cells for different numbers of agents.
agents in different teams produced by the min-cost max-flow algorithm on the low level. For
500 agents, for example, more than 60% of the unblocked cells are occupied by agents and thus
many start cells of agents are also targets for other agents, resulting in many collisions.
Figure 5.8: A randomly generated TAPF problem instance on the warehouse map.
5.4.4 Experiment 4: Warehouse Map
We experimented with CBM on TAPF problem instances on the warehouse map shown in Fig-
ure 1.1 that represents the layout of part of an Amazon Robotics automated warehouse system
(Wurman et al., 2008). This map has been used to generate problem instances in recent research
on MAPF and TAPF (Cohen et al., 2016; Felner et al., 2018). The TAPF problem instances
we used have also been considered as benchmark TAPF problem instances by recent research
(Nguyen, Obermeier, Son, Schaub, & Yeoh, 2017). Figure 5.8 shows a randomly generated
TAPF problem instance on this warehouse map, represented as a 2D 4-neighbor grid. The light
gray cells represent free space and are unblocked. The dark gray cells represent storage loca-
tions of inventory pods and are blocked. There are 7 inventory stations on the left side. The
red cells are their exits, and the other 7 cells with blue-green gradient colors are their entrances.
Agents can enter and leave the inventory stations one at a time through their entrances and exits,
respectively. The incoming and outgoing queues of the inventory stations are not modeled for
simplicity. The cells with blue-green gradient colors in the storage area are occupied by agents.
Each agent needs to carry the inventory pod in its current cell to the inventory station of the same
color.
For Experiment 4, we generated 50 TAPF problem instances on the warehouse map. Each
problem instance has 420 agents. 210 “incoming” agents start at randomly determined storage
locations: 30 agents each need to move their inventory pods to the 7 inventory stations. In
order to create difficult simulated warehouse problem instances, we generated the start cells
of these agents randomly among all storage locations rather than clustering them according to
their target inventory stations. 210 “outgoing” agents start at the inventory stations: 30 agents
each need to move their inventory pods from the 7 inventory stations to the storage locations
vacated by the incoming agents. The problem is to assign the vacated storage locations to the 210
outgoing agents and plan collision-free paths for all 420 agents in a way such that the makespan
is minimized. The incoming agents that have the same inventory station as target are a team
(since they can arrive at the inventory station in any order), and all outgoing agents are a team.
So far, we have assumed that, for any TAPF problem instance, all start vertices are unique,
all targets are unique, and each one of the teams is given the same number of targets as there
are agents in the team, but these assumptions are not satisfied here: (a) The outgoing agents that
start at the same inventory station all start at its exit. In this case, we change the construction
of the T -step time-expanded network for the team of outgoing agents so that there is a supply
of one unit at vertex voutt ∈ V ′ for all t = 0, . . . , 29 and all vertices v ∈ V that correspond to
exits of inventory stations. This construction forces the outgoing agents that start at the same
inventory station to leave it one after the other during the first 30 time steps. No further changes
are necessary. (b) The incoming agents that have the same inventory station as target all end
at its entrance. In this case, we change the construction of the T -step time-expanded network
for each team of incoming agents so that there is an auxiliary vertex with a demand of 30 units
and vertex v_t^out ∈ V′ for all t = 0, . . . , T is connected to the auxiliary vertex with an edge with
unit capacity and zero edge weight, where v ∈ V corresponds to the entrance of the inventory
station. This construction forces the incoming agents to enter the inventory station at different
time steps. No further changes are necessary. (c) There could be more empty storage locations
than outgoing agents. In this case, no changes are necessary.
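Construction changes (a) and (b) amount to adding supplies and demands to the time-expanded network. The following Python sketch expresses them in an illustrative edge-list form of (u, v, capacity, cost) tuples; the function and vertex names are made up for this sketch and are not the dissertation's implementation.

```python
def add_station_gadgets(edges, station_exits, station_entrance, T, num_agents=30):
    """Add the warehouse-specific gadgets to a time-expanded network given
    as a list of (u, v, capacity, cost) edges.

    (a) Outgoing agents: one unit of supply at the exit vertex at each of
        the first num_agents time steps, which forces the agents that start
        at the same inventory station to leave it one after the other.
    (b) Incoming agents: an auxiliary vertex with a demand of num_agents
        units, connected to the entrance vertex at every time step by a
        unit-capacity, zero-cost edge, which forces the agents to enter
        the station at distinct time steps.
    """
    for v in station_exits:                       # gadget (a): staggered supply
        for t in range(num_agents):
            edges.append(("source", (v, t), 1, 0))
    aux = ("aux", station_entrance)               # gadget (b): staggered demand
    for t in range(T + 1):
        edges.append(((station_entrance, t), aux, 1, 0))
    edges.append((aux, "sink", num_agents, 0))
    return edges

edges = add_station_gadgets([], ["exit1"], "entrance1", T=5, num_agents=3)
print(len(edges))  # 3 supply edges + 6 entrance edges + 1 demand edge = 10
```

Because both gadgets only add unit-capacity edges into an otherwise unchanged network, the min-cost max-flow low level of CBM can be run on the modified network without any further changes.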
For the TAPF problem instance shown in Figure 5.8, CBM returned a solution with makespan
65 in 25.97 seconds. The cost of the shortest path for both green agents near the top-right corner
to the green inventory station in the bottom-left corner is 64. Thus, at least one of them has to
wait for at least one time step to enter the green inventory station (thus also validating that the
solution found by CBM is optimal).
In general, CBM found solutions for 40 of the 50 simulated warehouse problem instances
within a runtime limit of 5 minutes each, yielding a success rate of 80%. The average makespan
over the solved simulated warehouse problem instances is 63.73, and the average runtime is
39.26 seconds. Since even bounded-suboptimal (Non-Anonymous) MAPF algorithms that were
specifically designed for simulated Amazon Robotics automated warehouse systems do not scale
well to hundreds of agents (Cohen, Uras, & Koenig, 2015), we conclude that CBM is a promising
TAPF algorithm for applications of real-world scale.
5.5 Including Kinematic Constraints
Since any solution of a TAPF problem instance is also a solution of a MAPF problem instance
on the same graph for a suitable assignment of targets to agents, we can directly use MAPF-
POST (see Section 3.3) to transform a TAPF solution returned by any TAPF algorithm into a
plan-execution schedule that can be safely executed by teams of real-world agents. Specifi-
cally, our recent research (Honig, Kumar, Ma, et al., 2016) (not covered as a contribution in this
dissertation) uses MAPF-POST in a post-processing step to transform a TAPF solution into a
plan-execution schedule that works for the coordination of target-assignment and path-planning
operations of teams of differential-drive robots. The resulting plan-execution schedule takes
kinematic constraints of these robots into account and provides a guaranteed safety distance be-
tween them. The results validate the hypothesis that our TAPF algorithms have the potential to
be eventually deployed in real-world applications of multi-agent systems for the one-shot coor-
dination of the autonomous target-assignment and path-planning operations of teams of agents.
5.6 Summary
In this chapter, we formalized and studied TAPF, a new variant of MAPF that models the one-
shot combined target-assignment and path-planning problem for multiple teams of agents. We
developed optimal TAPF algorithms to bridge the gap between the extreme cases of Anony-
mous MAPF (one team of agents) and (Non-Anonymous) MAPF (teams of one agent each).
We demonstrated how TAPF can be solved by generalizing the flow-based ILP formulation for
MAPF (see Section 3.2.3). We then presented CBM, which simultaneously assigns targets to
agents and plans collision-free paths for the agents. CBM exploits the combinatorial structure of
TAPF by breaking it down into the polynomial-time solvable sub-problems of coordinating different
agents in single teams and the NP-hard sub-problem of coordinating different teams. It uses
a min-cost max-flow algorithm (Goldberg & Tarjan, 1987) on a time-expanded network for the
polynomial-time solvable sub-problems and CBS for the NP-hard sub-problem. Theoretically,
we proved that CBM is correct, complete, and optimal. Experimentally, we compared CBM to
the ILP-based algorithm. We also demonstrated the scalability of CBM for TAPF problem in-
stances with dozens of teams and hundreds of agents and adapted it to TAPF problem instances
on a warehouse map. MAPF-POST can be used in a post-processing step to transform TAPF
solutions, like MAPF solutions, into plan-execution schedules that can be safely executed by
teams of real-world agents.
To summarize, in this chapter, we validated the hypothesis that formalizing and studying new
variants of MAPF can result in new algorithms for the one-shot combined target-assignment and
path-planning problem for teams of agents which can potentially be applied to and thus benefit
real-world applications of multi-agent systems.
Chapter 6
Long-Term Target Assignment and Path Planning
In this chapter, we present the third major contribution of this dissertation. Specifically, we
formalize and study a new variant of MAPF, called Multi-Agent Pickup and Delivery (MAPD),
that models the long-term coordination of autonomous target-assignment and path-planning op-
erations of teams of agents. We demonstrate how environmental characteristics can be utilized
to ensure long-term robustness for well-formed MAPD problem instances, a class of MAPD
problem instances that are realistic for many real-world applications of multi-agent systems. We
present three novel MAPD algorithms that can utilize such environmental characteristics of well-
formed MAPD problem instances. Theoretically, we prove that they are long-term robust for
all well-formed MAPD problem instances. Experimentally, we test them with up to 500 agents
and 1,000 tasks. Therefore, these results validate the hypothesis that formalizing and studying
new variants of MAPF can result in new algorithms for the long-term target-assignment and
path-planning problem. We follow our formalization of MAPD in Section 2.5.
The remainder of this chapter is structured as follows. In Section 6.1, we reiterate the motivation behind formalizing and studying MAPD. In Section 6.2, we present motivating MAPD examples. In Section 6.3, we introduce the concept of well-formedness that allows for long-term robustness. In Section 6.4, we present two decoupled MAPD algorithms that are long-term robust for all well-formed MAPD problem instances and discuss their extensions. In Section 6.5, we present a centralized MAPD algorithm that is also long-term robust for all well-formed MAPD problem instances and discuss its extensions. In Section 6.6, we experimentally compare these MAPD algorithms in a simulated warehouse environment. Finally, we summarize the contributions of this chapter in Section 6.7.

This chapter is based on Ma, H., Li, J., Kumar, T. K. S., & Koenig, S. (2017). Lifelong multi-agent path finding for online pickup and delivery tasks. In International Conference on Autonomous Agents and Multiagent Systems (pp. 837–845).
6.1 Introduction
In this chapter, we develop algorithmic techniques for the long-term coordination of autonomous
target-assignment and path-planning operations of teams of agents by studying MAPD (see Sec-
tion 2.5). MAPD generalizes the one-shot problems MAPF, Anonymous MAPF, and TAPF to
a long-term problem. In MAPD, M agents have to attend to a stream of incoming tasks. Each
task enters the system at an unknown time and is characterized by two targets, namely a pickup
vertex and a delivery vertex. A free agent (Definition 2.25), namely one that is currently not
executing any task, can be assigned an unexecuted task. To execute the task, the agent has to
first move from its current vertex to the pickup vertex of the task, become occupied (Definition
2.25) and start to execute the task upon reaching the pickup vertex of the task, and then move
from the pickup vertex to the delivery vertex of the task, while avoiding collisions with other
agents. For example, in an Amazon Robotics automated warehouse system (see Figure 1.1 for
an example), warehouse robots have to attend to tasks of relocating inventory pods from their
storage locations to inventory stations or vice versa. A warehouse robot that is not carrying an
inventory pod can be assigned any task of relocating an inventory pod. To execute the task, it
then has to move from its current location to the pickup location of the inventory pod and from
there to the delivery location of the inventory pod, while avoiding collisions with other warehouse
robots.
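The task model and the free/occupied agent states described above can be captured by a minimal Python sketch; the class and field names here are hypothetical and chosen for this illustration only, not taken from the dissertation.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Task:
    pickup: str        # pickup vertex
    delivery: str      # delivery vertex
    release_time: int  # time step at which the task enters the system

@dataclass
class Agent:
    vertex: str
    task: Optional[Task] = None  # None while the agent is free
    executing: bool = False      # True between pickup and delivery

    def is_free(self) -> bool:
        """A free agent is one that is currently not executing any task."""
        return self.task is None

    def move_to(self, v: str) -> None:
        """Move one step (collision avoidance is handled by the planner)."""
        self.vertex = v
        if self.task and not self.executing and v == self.task.pickup:
            self.executing = True                    # agent becomes occupied
        elif self.task and self.executing and v == self.task.delivery:
            self.task, self.executing = None, False  # task done; free again

a = Agent(vertex="A")
a.task = Task(pickup="B", delivery="C", release_time=0)  # assignment
a.move_to("B")      # reaches the pickup vertex: becomes occupied
a.move_to("C")      # reaches the delivery vertex: free again
print(a.is_free())  # True
```

A "navigation" task is then simply the special case where pickup and delivery are the same vertex, as noted below.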
There are three benefits of modeling “pickup-and-delivery” tasks that have an intermediate
target (the pickup vertex) and a final target (the delivery vertex) each instead of “navigation”
tasks that have only one target each: (a) Modeling “pickup-and-delivery” tasks results in a mix
of both anonymous and non-anonymous agents (see Section 2.5.1), while modeling “naviga-
tion” tasks results in only anonymous agents, which limits its generalizability. (b) The resulting
MAPD algorithms still apply directly to “navigation” tasks because each such task is a special
case of a “pickup-and-delivery” task with the pickup and delivery vertices being the same ver-
tex. (c) Modeling “pickup-and-delivery” tasks also makes it easy to explain the resulting MAPD
algorithms, even though these algorithms can easily be generalized to cases where the tasks have
multiple (ordered) intermediate targets and a subsequent final target each (but it is much harder
to generalize algorithms for “navigation” tasks to these cases).
MAPD also models the long-term coordination of both target-assignment and path-planning
operations for many other real-world applications of multi-agent systems, including autonomous
aircraft-towing vehicles (Morris et al., 2016) that constantly assign incoming towing tasks and
tow aircraft between airport terminal gates and runways and other multi-robot systems that use
fleets of forklift robots (Pecora et al., 2018; Salvado et al., 2018) or teams of service robots
(Ahmadi & Stone, 2006; Khandelwal et al., 2017; Veloso et al., 2015) to relocate items between
different locations. These real-world applications require the assignment of tasks to agents and
the planning of collision-free paths both in a lifelong and online setting. In a lifelong setting,
agents have to attend to a stream of tasks. Therefore, agents cannot stay idle at their targets after
they finish executing tasks. In an online setting, tasks can enter the system at any time. Therefore,
assigning tasks to and planning paths for the agents cannot be done in advance but rather need to
be done online. Solving the combined target-assignment and path-planning problem optimally
in a lifelong setting, even when all the tasks are known a priori, is computationally challenging
as the number of tasks becomes large. For example, a recent CBS-based algorithm requires two
minutes of runtime on average to compute optimal solutions for 4 agents and 4 tasks (Henkel,
Abbenseth, & Toussaint, 2019), assuming that the tasks are known a priori. Moreover, solving
the combined target-assignment and path-planning problem optimally in an online setting is im-
possible, as indicated by the competitive analysis result for the online target-assignment problem
(Azar et al., 1995; Kalyanasundaram & Pruhs, 1993). Therefore, unlike the research on optimal
algorithms for one-shot coordination problems, such as MAPF, PERR, and TAPF, we focus on
developing MAPD algorithms that are suboptimal but long-term robust (Definition 2.29) in this
chapter.
As we have previously discussed in Section 1.1, solving the long-term combined target-
assignment and path-planning problem MAPD requires repeatedly solving both the one-shot
target-assignment and the one-shot path-planning sub-problems as new tasks are added to the
system. However, existing research has considered only either the one-shot coordination prob-
lems, such as MAPF, PERR, and TAPF, or the long-term target-assignment problem. In particu-
lar, as we have discussed in Section 3.4.3, existing research (Gerkey & Mataric, 2002; Khuller et
al., 1994; Parker, 1998; 1999; Werger & Mataric, 2000) in the operations research and robotics
communities has studied the long-term target-assignment problem extensively under the names
“Online Assignment” or “Iterative Multi-Robot Task Allocation” and reduces it to a sequence
of one-shot target-assignment problems. Therefore, one can potentially solves the long-term
combined target-assignment and path-planning problem by reducing it to a sequence of one-
shot target-assignment and path-planning sub-problems in a similar way and adapt one-shot
target-assignment and path-planning algorithms to an online setting to solve these one-shot sub-
problems. However, we show in Section 6.2 that a trivial combination of target-assignment and
path-planning algorithms can result in deadlocks and is thus not long-term robust. We have also
mentioned in Section 3.1.2.3 that some research on Prioritized Planning, including Cooperative
A* (see Section 3.2.1), has utilized environmental characteristics to guarantee completeness for
well-formed MAPF problem instances (Cap et al., 2015; Turpin, Mohta, et al., 2014; Wang &
Botea, 2011; Yu, 2017), but for the one-shot path-planning problem only. Inspired by the above
results, we thus examine how and how well these results can be applied to the long-term com-
bined target-assignment and path-planning problem MAPD and how long-term robustness can
be guaranteed, for example, by utilizing environmental characteristics.
Therefore, we demonstrate how MAPD algorithms can utilize environmental characteris-
tics to ensure long-term robustness for well-formed MAPD problem instances (Definition 6.1),
a class of MAPD problem instances analogous to well-formed MAPF problem instances (Def-
inition 3.1) that are realistic for many real-world multi-agent systems. Well-formed MAPD
s1 a1 a2 g1
Figure 6.1: Example of an unsolvable MAPD problem instance.
problem instances capture the important graph properties of given environments of many multi-
agent systems that allow MAPD algorithms to avoid deadlocks via a simple deadlock-avoidance
mechanism: Agents should only be allowed to park (stay for a long period) at vertices where
they cannot block other agents. For example, in an automated warehouse system, warehouse
robots are only allowed to charge batteries or pick up and drop off inventory pods in locations
where they cannot block other robots. We present two decoupled algorithms, Token Passing (TP)
and Token Passing with Task Swaps (TPTS), and one centralized MAPD algorithm, CENTRAL.
As suggested before, these MAPD algorithms repeatedly apply one-shot target-assignment and
path-planning algorithms to a sequence of one-shot sub-problems and utilize environmental char-
acteristics to combine the target-assignment and the path-planning algorithms in a smart way to
avoid deadlocks. We show that all of these MAPD algorithms are thus long-term robust for
well-formed MAPD problem instances. We compare them experimentally in a simulated ware-
house environment with up to 500 agents and 1,000 tasks, thus showcasing their advantages and
disadvantages for the long-term coordination of target-assignment and path-planning operations
for large-scale multi-agent systems.
6.2 Motivating MAPD Examples
Not every MAPD problem instance is solvable. Figure 6.1 shows an example of a MAPD prob-
lem instance on a 2D 4-neighbor grid with two free agents a1 and a2 and one task τ1 with pickup
vertex s1 and delivery vertex g1 that is added to the system at time step 0. Colored circles are
agents. Dashed circles are pickup and delivery vertices. Neither of the agents can execute task
τ1.
Even if a MAPD problem instance is solvable, a trivial concatenation of individual solutions
for the one-shot target-assignment and one-shot path-planning sub-problems can result in dead-
locks. We consider the MAPD problem instance shown in Figure 2.9, where a direct application
of one-shot target-assignment and one-shot path-finding algorithms might result in a deadlock.
For example, if the target assignment lets agent a1 execute task τ1 and assigns agent a2 its cur-
rent vertex, then there are no collision-free paths for the agents because their paths would have to
end at the same vertex, that is, the resulting one-shot path-planning problem is unsolvable. One
possible solution is to (still) assign task τ1 to agent a1 at time step 0, move agent a2 from vertex
E to vertex B, and then move agent a1 from the pickup vertex s1 = A to the delivery vertex
g1 = E (see Section 2.5.2 for the complete solution). The intuition is to design a mechanism
that lets agent a2 move away from vertex E, which is the delivery vertex of a task, to another
vertex v and rest (that is, stay for a long period without an intention to move away) there so that
it cannot block agent a1 from executing the task. Ideally, one can guarantee that agent a2 can
always find such a vertex v and a collision-free path from its current vertex to vertex v and that
it should not block agents from executing other tasks while resting at vertex v.
6.3 Utilizing Environmental Characteristics: Well-Formedness
We now demonstrate how the environmental characteristics, namely well-formedness, of a class
of MAPD problem instances can provide a sufficient condition to allow for long-term robustness.
Our MAPD algorithms then utilize these environmental characteristics to guarantee long-term
robustness.
The definition of well-formed MAPD problem instances is inspired by that of well-formed
MAPF problem instances (Definition 3.1). As we explained in the above motivating example,
the intuition is that agents should only be allowed to rest (that is, stay for a long period without
an intention to move away) at vertices, called endpoints, where they cannot block other agents.
For example, storage locations of inventory pods are typically placed in a warehouse so as not to
block traffic, and office workspaces are also typically placed in office environments so as not to
e1 e2
e3 e4
e1 e2
e3 e4
e1
e2
e3 e4
Figure 6.2: Three MAPD problem instances.
block traffic. The set Vep of endpoints of a MAPD problem instance contains all initial vertices
of agents, all pickup and delivery vertices of tasks, and perhaps additional designated vertices
that allow the agents to rest. Let Vtsk denote the set of all pickup and delivery vertices of tasks,
called the task endpoints. The set Vep \ Vtsk is called the set of non-task endpoints.
Definition 6.1. A MAPD problem instance is well-formed if and only if:
1. The number of tasks is finite.
2. There are no fewer non-task endpoints than agents.
3. For any two endpoints, there exists a path between them that traverses no other endpoints.
Well-formed MAPD problem instances (with at least one task) have at least M + 1 endpoints.
Figure 6.2 shows three MAPD problem instances. Black cells are blocked. Blue and green
circles are the initial vertices of agents. Red dashed circles are task endpoints. Black dashed
circles are non-task endpoints. We assume each of the MAPD problem instances has finitely
many tasks. The MAPD problem instance on the left is well-formed. The MAPD problem
instance in the center is not well-formed because there are two agents but only one non-task
endpoint. The MAPD problem instance on the right is not well-formed because all paths between
endpoints e2 and e3 (or e3 and e4) traverse endpoint e1. We design MAPD algorithms in the
following sections that are long-term robust for all well-formed MAPD problem instances.
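The three conditions of Definition 6.1 can be checked directly. The following Python sketch (illustrative only; the graph representation, function name, and parameters are our own) verifies Conditions 2 and 3 for a finite task list, testing Condition 3 with one breadth-first search per ordered pair of endpoints that may enter an endpoint only if it is the goal:

```python
from collections import deque

def is_well_formed(graph, agents, tasks, extra_endpoints=frozenset()):
    """Check Conditions 2 and 3 of Definition 6.1 (Condition 1, finitely
    many tasks, holds trivially for the finite task list given here).

    graph: dict mapping each vertex to the set of its neighbors.
    agents: list of initial vertices of the agents.
    tasks: list of (pickup, delivery) vertex pairs.
    extra_endpoints: additional designated endpoints, if any.
    """
    task_endpoints = {v for s, g in tasks for v in (s, g)}
    endpoints = set(agents) | task_endpoints | set(extra_endpoints)
    non_task_endpoints = endpoints - task_endpoints
    # Condition 2: no fewer non-task endpoints than agents.
    if len(non_task_endpoints) < len(agents):
        return False
    # Condition 3: every pair of endpoints is connected by a path
    # that traverses no other endpoints.
    for a in endpoints:
        for b in endpoints:
            if a == b:
                continue
            frontier, seen, found = deque([a]), {a}, False
            while frontier and not found:
                v = frontier.popleft()
                for u in graph[v]:
                    if u == b:
                        found = True
                        break
                    if u not in seen and u not in endpoints:
                        seen.add(u)
                        frontier.append(u)
            if not found:
                return False
    return True
```

On the examples of Figure 6.2, such a check accepts the left instance and rejects the right one, where every path between some endpoints traverses endpoint e1.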
6.4 Decoupled MAPD Algorithms
In this section, we present first a simple decoupled MAPD algorithm, called Token Passing
(TP), and then one of its improved versions, called Token Passing with Task Swaps (TPTS),
that is more effective. Similar to decoupled MAPF algorithms (see Section 3.1.2.3), decoupled
MAPD algorithms make decisions for one agent at a time, where each agent assigns itself a task
and computes a path given some global information about the system.
6.4.1 TP
TP is based on ideas similar to the ones used by an existing greedy algorithm (Kalyanasundaram
& Pruhs, 1993) for the long-term target-assignment problem, where targets are greedily assigned
to the agents whenever there are agents available to navigate to targets, and Cooperative A* (see
Section 3.2.1) for the one-shot path-planning problem, where agents plan their paths one after
the other. The task set of TP contains only the tasks that are not assigned to agents. We describe
a version of TP that uses token passing and can thus potentially be extended to a fully distributed
MAPD algorithm via any token-based mutual exclusion mechanism (Suzuki & Kasami, 1985).
The token is a synchronized shared block of memory that contains the current paths of all agents,
the task set, and the task assignments that record which tasks are currently assigned to which
agent. Similar to the definition of a path for one-shot problems, all MAPD algorithms presented
in this chapter, including TP, always assume that an agent rests at the last vertex of its path in
the token when it reaches the vertex. The idea of passing a token has previously been used to
develop COBRA (Cap et al., 2015), which is a MAPF-like algorithm that does not take into
account that pickup or delivery vertices of tasks can be occupied by agents not executing them
and can thus result in deadlocks for MAPD.
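For concreteness, the contents of the token can be sketched as a small shared data structure (a hypothetical Python rendering; the dissertation prescribes no implementation, and the helper method is our own):

```python
from dataclasses import dataclass, field

@dataclass
class Token:
    """Illustrative stand-in for the shared token: the current path of
    every agent, the task set, and the current task assignments."""
    paths: dict = field(default_factory=dict)        # agent -> [vertex per time step]
    task_set: set = field(default_factory=set)       # unassigned (TP) / unexecuted (TPTS) tasks
    assignments: dict = field(default_factory=dict)  # task -> agent

    def ends(self, except_agent=None):
        """Vertices at which the paths of the other agents end; these are
        the vertices an agent must avoid when choosing a task or a
        resting endpoint (compare Line 5 of Algorithm 6.1)."""
        return {p[-1] for a, p in self.paths.items() if a != except_agent}
```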
Algorithm 6.1 shows the pseudocode of TP, where loc(ai) denotes the current vertex of agent
ai. Agent ai finds all paths via space-time A* searches (Section 3.2.1.1) that do not result in it
colliding with other agents that move along their paths in the token. Since time-minimal paths
need to be found only to endpoints, the arrival times at all endpoints from all vertices (that ignore
Algorithm 6.1: TP

     /* system executes now */
 1   Initialize token with the (trivial) path 〈loc(ai)〉 for each agent ai;
 2   while true do
 3       Add all new tasks, if any, to the task set T ;
 4       while agent ai exists that requests token do
             /* system sends token to ai - ai executes now */
 5           T ′ ← {τj ∈ T | no path of any other agent in token ends at sj or gj};
 6           if T ′ ≠ ∅ then
 7               τ ← arg min τj∈T ′ h(loc(ai), sj);
 8               Assign τ to ai;
 9               Remove τ from T ;
10               Update ai’s path in token with Path1(ai, τ , token);
11           else if no task τj ∈ T exists with gj = loc(ai) then
12               Update ai’s path in token with the path 〈loc(ai)〉;
13           else
14               Update ai’s path in token with Path2(ai, token);
             /* ai returns token to system - system executes now */
15       All agents move along their paths in token for one time step;
         /* system advances to the next time step */
collisions) are computed in a preprocessing phase and then used as consistent h-values for all
space-time A* searches. TP works as follows: The system initializes the token with the trivial
paths where all agents rest at their initial vertices [Line 1]. At each time step, the system adds
all new tasks, if any, to the task set [Line 3]. Any agent that has reached the end of its path in
the token requests the token once per time step. The system then sends the token to each agent
that requests it, one after the other [Line 4]. The agent with the token chooses a task from the
candidate task set T ′ such that no path of other agents in the token ends at the pickup or delivery
vertex of the task [Line 5] because it can then find a path to the pickup vertex and then a path to
the delivery vertex of such a task and safely rest at the delivery vertex.
• If there is at least one such task, then the agent assigns itself the one with the smallest
h-value from its current vertex to the pickup vertex of the task and removes this task from
the task set [Lines 7-9]. The agent then calls Function Path1 to update its path in the token
with a path that (1) is the concatenation of two time-minimal sub-paths, one that moves
from its current vertex to the pickup vertex of the task (and rests there) and one that moves
from there to the delivery vertex of the task (and rests there), and (2) does not collide with
the paths of other agents stored in the token [Line 10].
• If there is no such task, then the agent does not assign itself a task at the current time step.
If the agent is not at the delivery vertex of a task in the task set, then it updates its path in
the token with the trivial path where it rests at its current vertex [Lines 11-12]. Otherwise,
to avoid deadlocks (where it blocks other agents from executing the task whose delivery
vertex is its current vertex), it calls Function Path2 to update its path in the token with a
time-minimal path that (1) moves from its current vertex to an endpoint (and rests there)
such that the delivery vertices of all tasks in the task set are different from the chosen
endpoint and no path of other agents in the token ends at the chosen endpoint and (2) does
not collide with the paths of other agents stored in the token [Line 14]. In either case, the
agent rests at an endpoint that is different from the delivery vertex of any task in the task
set at the end of its path. (The latter case thus subsumes the former one.)
Finally, the agent returns the token to the system and moves along its path in the token [Line
15].
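A minimal sketch of such a space-time A* search with the reservation checks described above follows (hypothetical names; it simplifies the goal test by only resting at the goal once all other agents have reached the ends of their paths, which is more conservative than necessary but safe because endpoints assigned to agents are pairwise different):

```python
import heapq
from itertools import count

def space_time_astar(graph, start, goal, other_paths, h, horizon=200):
    """Plan a time-minimal path from start to goal that avoids vertex and
    swapping conflicts with the given paths, under the chapter's assumption
    that every agent rests forever at the last vertex of its path."""
    T = max((len(p) for p in other_paths), default=0)

    def occupied(v, t):
        # A path occupies its last vertex at all later time steps.
        return any(p[min(t, len(p) - 1)] == v for p in other_paths)

    def swap(v, u, t):
        # Swapping conflict on edge (v, u) between time steps t and t + 1.
        return any(t + 1 < len(p) and p[t] == u and p[t + 1] == v
                   for p in other_paths)

    tie = count()
    open_list = [(h(start), 0, next(tie), start, None)]
    parents = {}
    while open_list:
        _, t, _, v, parent = heapq.heappop(open_list)
        if (v, t) in parents:
            continue
        parents[(v, t)] = parent
        if v == goal and t >= T:  # conservative: all other agents are done
            path, key = [], (v, t)
            while key is not None:
                path.append(key[0])
                key = parents[key]
            return path[::-1]
        if t >= horizon:
            continue
        for u in list(graph[v]) + [v]:  # move to a neighbor or wait
            if not occupied(u, t + 1) and not swap(v, u, t):
                heapq.heappush(open_list,
                               (t + 1 + h(u), t + 1, next(tie), u, (v, t)))
    return None
```

Here h is the consistent heuristic computed in the preprocessing phase; waiting is modeled as a self-loop action.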
We now prove that the agent is always able to find a path because it finds a path only when
it is at an endpoint and thus has to find only a path from an endpoint to an endpoint.
Property 6.1. Function Path1 always returns a path successfully for well-formed MAPD prob-
lem instances.
Proof. We construct a path from the current vertex loc(ai) of agent ai (which is an endpoint) via
the pickup vertex sj of task τj to the delivery vertex gj of task τj that does not collide with the
paths of other agents stored in the token. Due to Definition 6.1, there exists a path from loc(ai)
via sj to gj that traverses no other endpoints. The paths of other agents stored in token end at
endpoints that are different from loc(ai) (because those paths do not collide with the path of ai
stored in token), sj , and gj . Thus, this path does not collide with the paths of the other agents if
ai moves along it after all other agents have moved along their paths.
Property 6.2. Function Path2 always returns a path successfully for well-formed MAPD prob-
lem instances.
Proof. Due to Definition 6.1, there exist at least M non-task endpoints and thus at least one
non-task endpoint such that no path of agents other than agent ai in the token ends at the non-
task endpoint. The delivery vertices (which are task endpoints) of all tasks in the task set are
different from the non-task endpoint as well. We construct a path from the current vertex loc(ai)
of agent ai (which is an endpoint) to the chosen endpoint that does not collide with the paths of
other agents stored in the token. Due to Definition 6.1, there exists a path from loc(ai) to the
chosen endpoint that traverses no other endpoints. The paths of other agents stored in token end
at endpoints that are different from loc(ai) and the chosen endpoint. Thus, this path does not
collide with the paths of the other agents if ai moves along it after all other agents have moved
along their paths.
Theorem 6.1. All well-formed MAPD problem instances are solvable, and TP is long-term
robust for them.
Proof. We show that each task is eventually assigned to some agent and executed by it. Each
agent requests the token after a bounded number of time steps, and no agent rests at the delivery
vertex of a task in the task set due to Line 14. Thus, the condition on Line 6 becomes eventually
satisfied and some agent assigns itself some task on Line 8. The agent is then able to execute it
due to Properties 6.1 and 6.2.
6.4.1.1 Extensions of TP
The design of TP is kept as simple as possible but has the following benefits:
• TP can be used to constructively prove that all well-formed MAPD problem instances are
solvable, as shown above.
• TP is still long-term robust even if each agent has to spend a known or unknown number
of time steps at the pickup or delivery vertex, for example, to pick up or set down a
shelf of products in an automated warehouse system, because the agent can then plan a
path from the pickup (respectively, delivery) vertex only when it has finished the pickup
(respectively, delivery) operation.
• TP can replace the space-time A* search in Functions Path1 and Path2 with other one-
shot single-agent path/motion-planning algorithms to plan a path for an agent as long as
the path avoids collisions with the paths of other agents stored in token. For example,
in Chapter 7, our extension of TP uses a single-agent path-planning algorithm that takes
kinematic constraints of real-world agents into account.
• TP can potentially be generalized to a distributed setting using a token-based mutual ex-
clusion mechanism (Suzuki & Kasami, 1985).
• TP can potentially apply, without many changes, to cases where tasks have more than one
intermediate target (instead of only one—the pickup vertex) each.
TP can potentially be made more effective with changes in the following components of its
design, which, however, can make TP less general:
1. Task assignment can be made more effective: the cost of assigning a task τj ∈ T to a
free agent ai can be defined as (1) h(loc(ai), sj) if no path of other agents ends at sj or gj
(similar to Line 7) and (2) a sufficiently large constant otherwise. On Line 4, the
token can then be sent to the agent ai with the smallest cost of being assigned any task in
the task set over all agents that request the token (instead of an arbitrary agent that requests
the token). This procedure greedily assigns a task to an agent with the smallest cost
but requires significantly more expensive communication between agents to
generalize TP to a distributed setting. A centralized task-assignment procedure such as
the Hungarian method (Kuhn, 1955) can further improve upon this greedy task assignment
to produce one with the smallest total cost but makes it even harder to generalize TP to a
distributed setting. Our centralized algorithm (see Section 6.5) uses the Hungarian method
for task assignment.
2. Task assignment can be changed so that new tasks that have just been added to the system
or agents that have just become free can be taken into account at each time step. Our
extension of TP (see Section 6.4.2) allows reassignment of tasks to agents by transferring
a task from one free agent to another. Our centralized algorithm (see Section 6.5) reas-
signs tasks to agents whenever a new task has been added to the system or an agent has
become free. However, additional considerations need to be made to guarantee long-term
robustness because the agents may need to plan paths from non-endpoints.
3. Path planning can be made more effective. For example, once all agents that request the
token assign themselves (pickup vertices of) tasks (as described in Function Path1) or
endpoints where they can rest (as described in Function Path2), we can use any MAPF
algorithms to plan paths for them together. Based on this idea, our centralized algorithm
(see Section 6.5) uses CBS to plan paths for these agents together. However, it can be hard
to generalize this idea to a distributed setting because a centralized MAPF algorithm is
used. Also, additional considerations need to be made to guarantee that there exist paths
that move the agents to their assigned targets.
4. Deadlock avoidance at the pickup vertex can be made more effective. Function Path1
currently plans two time-minimal paths for an agent ai, the first one from its current vertex
to the pickup vertex sj of its assigned task τj and the second one from sj to gj . Therefore,
the agent can only reach sj at a time step where no path of other agents goes through
sj at a later time step since the agent needs to rest at the last vertex (sj) of the first path.
This guarantees that the second path can be found but can unnecessarily delay the arrival
time of the agent at sj . The space-time A* search can be modified so that it finds a time-
minimal path first to sj and then to gj . This modification can potentially let agent ai start to
execute its assigned task earlier in Function Path1 and can even allow agent ai to consider
tasks whose pickup vertex is the last vertex of the path of another agent ai′ (as long as
agent ai can reach the pickup vertex before agent ai′) on Line 7. Recent research uses
this modification (Grenouilleau, van Hoeve, & Hooker, 2019). However, this modification
cannot generalize to cases where each agent has to spend an unknown number of time
steps at the pickup vertex to finish the pickup operation.
5. TP can be made more effective if it does not assume that each agent rests at the last vertex
of its path since this vertex can then be used by other paths. Recent research uses this idea
(Okumura, Machida, Defago, & Tamura, 2019). However, the resulting algorithm is not
long-term robust for well-formed MAPD problem instances.
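The greedy token-passing order described in item 1 above can be sketched as follows (a hypothetical helper with names of our own; h stands for the preprocessed heuristic, and BIG for the sufficiently large constant):

```python
def next_agent(requesting, tasks, token_ends, h, BIG=10**9):
    """Pick which requesting free agent receives the token next: the one
    with the smallest cost of being assigned any task in the task set.

    requesting: dict mapping each requesting agent to its current vertex.
    tasks: dict mapping each task to its (pickup, delivery) vertex pair.
    token_ends: vertices at which the paths in the token end.
    """
    def cost(v, s, g):
        # Cost of assigning a task: the h-value if its pickup and delivery
        # vertices are free, and a sufficiently large constant otherwise.
        return h(v, s) if s not in token_ends and g not in token_ends else BIG

    return min(requesting,
               key=lambda a: min((cost(requesting[a], s, g)
                                  for s, g in tasks.values()), default=BIG))
```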
6.4.2 TPTS
We use one of the ideas presented in Section 6.4.1.1 to make TP more effective. The resulting
algorithm, TPTS, is similar to TP except that its task set now contains all unexecuted tasks, rather
than only unassigned tasks. This means that an agent with the token can assign itself not only a
task that is not assigned to any agent but also a task that is already assigned to another agent as
long as that agent is still moving to the pickup vertex of the task. This might be beneficial when
the former agent can move to the pickup vertex of the task in fewer time steps than the latter
agent. The latter agent is then no longer assigned the task and no longer needs to execute it. The
former agent thus sends the token to the latter agent so that the latter agent can try to assign itself
a new task.
Algorithm 6.2 shows the pseudocode of TPTS. It uses the same main loop [Lines 2-6] and the
same Functions Path1 and Path2 as TP. The agent ai with the token executes Function GetTask
[Line 5], where it tries to assign itself a task in the task set T and find a path to an endpoint. The
call of Function GetTask returns success (true) if agent ai finds a path to an endpoint and failure
(false) otherwise.
Algorithm 6.3 shows the pseudocode of Function GetTask. When executing Function Get-
Task, agent ai considers all tasks τ from the candidate task set T ′ such that no path of other
agents in the token ends at the pickup or delivery vertex of the task [Line 2], one after the other
Algorithm 6.2: TPTS

     /* system executes now */
 1   Initialize token with the (trivial) path 〈loc(ai)〉 for each agent ai;
 2   while true do
 3       Add all new tasks, if any, to the task set T ;
 4       while agent ai exists that requests token do
             /* system sends token to ai - ai executes now */
 5           GetTask(ai, token);   // function always returns true (Property 6.3)
             /* ai returns token to system - system executes now */
 6       All agents move along their paths in token for one time step and remove tasks
         from T when they start to execute them;
         /* system advances to the next time step */
in order of increasing h-values from its current vertex to the pickup vertices of the tasks [Lines
3-5]. If the task is not assigned to any agent, then (as in TP) agent ai assigns itself the task, up-
dates its path in the token with Function Path1 and returns success [Lines 6-9]. Otherwise, agent
ai remembers the token with the paths, task set, and task assignments in case the reassignment
of the task is not successful [Line 11]. Agent ai then unassigns the task from the agent ai′ that is
assigned the task and assigns itself the task [Lines 12-13]. It removes the path of agent ai′ from the to-
ken and updates its own path in the token with Function Path1 [Lines 14-15]. If agent ai reaches
the pickup vertex of the task with fewer time steps than agent ai′ , then it sends the token to agent
ai′ , which executes Function GetTask to try to assign itself a new task and eventually returns the
token to agent ai [Lines 16-18]. If agent ai′ returns success, then agent ai returns success as well
[Lines 19-20]. In all other cases, agent ai reverses all changes to the paths, the task set, and the
task assignments in the token and then considers the next task τ [Lines 11 and 21].
Once agent ai has considered all tasks τ unsuccessfully, then it does not assign itself a task
at the current time step. If it is not at an endpoint (which can happen only during a call of
Function GetTask on Line 18), then it updates its path in the token with Function Path2 to move
to an endpoint [Line 23]. The call can fail since agent ai is not at an endpoint. Agent ai returns
success or failure depending on whether it was able to find a path [Lines 30 and 25]. Otherwise,
(as in TP) if the agent is not at the delivery vertex of a task in the task set, then it updates its
Algorithm 6.3: Function GetTask(ai, token)

 1   Function GetTask(ai, token)
 2       T ′ ← {τj ∈ T | no path of other agents in token ends at sj or gj};
 3       while T ′ ≠ ∅ do
 4           τ ← arg min τj∈T ′ h(loc(ai), sj);
 5           Remove τ from T ′;
 6           if τ is not assigned to any agent then
 7               Assign τ to ai;
 8               Update ai’s path in token with Path1(ai, τ , token);
 9               return true;
10           else
11               Remember token (with paths, task set, and task assignments);
12               ai′ ← agent that is assigned τ ;
13               Unassign τ from ai′ and assign τ to ai;
14               Remove ai′’s path from token;
15               Path1(ai, τ , token);
16               Compare the arrival time of ai at sj on its path in token to the arrival time
                 of ai′ at sj on its path that has been remembered on Line 11;
17               if ai reaches sj earlier than ai′ then
                     /* ai sends token to ai′ - ai′ executes now */
18                   success ← GetTask(ai′ , token);
                     /* ai′ returns token to ai - ai executes now */
19                   if success then
20                       return true;
21               Restore token (with paths, task set, and task assignments which have been
                 remembered on Line 11);
22       if loc(ai) is not an endpoint then
23           Update ai’s path in token with Path2(ai, token);
24           if no path was found then
25               return false;
26       else if no task τj ∈ T exists with gj = loc(ai) then
27           Update ai’s path in token with the path 〈loc(ai)〉;
28       else
29           Update ai’s path in token with Path2(ai, token);
30       return true;
path in the token with the trivial path where it rests at its current vertex [Line 27]. Otherwise, to
avoid deadlocks (where it blocks other agents from executing the task whose delivery vertex is
its current vertex), it updates its path in the token with Function Path2 [Line 29]. In both cases,
it returns success [Line 30].
Finally, the agent returns the token to the system and moves along its path in the token,
removing the task that it is assigned (if any) from the task set once it reaches the pickup vertex
of the task and thus starts to execute it [Line 6 of Algorithm 6.2].
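The remember/swap/restore logic of Function GetTask can be distilled into the following greatly simplified sketch (illustrative only: arrival times are approximated by h-values instead of planned paths, and all path updates are omitted, so it isolates the task-reassignment behavior rather than implementing TPTS):

```python
import copy

def get_task(agent, token, loc, h):
    """Simplified analogue of Algorithm 6.3: try to take the closest
    task, stealing it from its holder only if this agent would reach the
    pickup vertex earlier, in which case the holder retries recursively."""
    # Consider tasks in order of increasing h-value to their pickup vertex.
    for name in sorted(token["tasks"],
                       key=lambda n: h(loc[agent], token["tasks"][n])):
        holder = token["assigned"].get(name)
        if holder is None:                        # Lines 6-9: take the task
            token["assigned"][name] = agent
            return True
        saved = copy.deepcopy(token["assigned"])  # Line 11: remember token
        token["assigned"][name] = agent           # Lines 12-13: reassign
        s = token["tasks"][name]
        if h(loc[agent], s) < h(loc[holder], s):  # Line 17: earlier at pickup
            if get_task(holder, token, loc, h):   # Line 18: holder retries
                return True
        token["assigned"] = saved                 # Line 21: restore token
    return True                                   # rest at an endpoint (Lines 22-30)
```

On the Figure 6.3 scenario, this sketch reproduces the TPTS behavior: the second agent steals the task it is closer to, and the first agent reassigns itself the remaining task.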
Property 6.3. Function GetTask always returns successfully for well-formed MAPD problem
instances when called on Line 5 of Algorithm 6.2.
Proof. Function GetTask returns in finite time because (1) the number of tasks in the task set is
finite; (2) an agent can unassign a task from another agent only if it reaches the pickup vertex
of the task with fewer time steps than the other agent; (3) a task that is assigned to some agent
always continues to be assigned to some agent until it has been executed; and (4) Functions
Path1 and Path2 return in finite time. Properties (1) to (3) make sure that the number of times
an agent can unassign any task from another agent is bounded during the execution of Function
GetTask and the number of recursive calls of Function GetTask on Line 18 of Algorithm 6.3
is thus bounded. Properties (1) to (4) thus make sure that the execution time of each call of
Function GetTask is also bounded.
We now show that an agent that executes Function GetTask on Line 5 of Algorithm 6.2 finds
a path to an endpoint. The agent is always able to find paths with Functions Path1 and Path2
on all lines but Line 23 of Algorithm 6.3 because it is then at an endpoint and thus has to find
a path from an endpoint to an endpoint. The proofs are similar to those of Properties 6.1 and
6.2. However, the agent is not guaranteed to find a path with Function Path2 on Line 23 of
Algorithm 6.3 because it is then not at an endpoint and thus has to find a path from a non-
endpoint to an endpoint. Since the agent is at an endpoint during the call of Function GetTask on
Line 5 of Algorithm 6.2, it does not execute Line 23 of Algorithm 6.3, finds a path, and returns
success.
Theorem 6.2. TPTS is long-term robust for all well-formed MAPD problem instances.
Proof. The proof is similar to the one of Theorem 6.1 but uses Property 6.3.
Figure 6.3: Example of a MAPD problem instance for the comparison of TP and TPTS.
TPTS is often more effective than TP but Figure 6.3 shows that this is not guaranteed. The
figure shows a MAPD problem instance with two agents a1 and a2 and two tasks τ1 and τ2 that
are added to the system at time step 0. The blue and green circles are the initial vertices of agents.
Dashed circles are the pickup/delivery vertices. The pickup vertex is the same as the delivery
vertex for both tasks. Assume that both a1 and a2 request the token and the system sends it to
agent a1 first. Agent a1 assigns itself τ1. Figure 6.3 (left) shows the path of a1. The system then
sends the token to a2 next. In TP, a2 assigns itself τ2. Figure 6.3 (center) shows the paths of
a1 and a2 for TP. The resulting service time and makespan are both two. In TPTS, however, a2
assigns itself τ1 because it can reach the pickup vertex of τ1 with fewer time steps than a1. In
return, a1 assigns itself τ2. Figure 6.3 (right) shows the paths of a1 and a2 for TPTS. The service
time is three (the average of five and one), and the makespan is five.
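The service time and makespan in this example can be recomputed with a small helper (hypothetical; it assumes the completion and release time step of each task are given):

```python
def service_time_and_makespan(completion_times, release_times):
    """Service time: average, over all tasks, of the completion time step
    minus the time step at which the task was added to the system.
    Makespan: the latest completion time step."""
    spans = [c - r for c, r in zip(completion_times, release_times)]
    return sum(spans) / len(spans), max(completion_times)
```

For the tasks of Figure 6.3, both released at time step 0, TP completes them at time steps 2 and 2, and TPTS at time steps 5 and 1, reproducing the values stated above.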
6.4.2.1 Extensions of TPTS
TPTS requires only additional local communication between pairs of agents and thus has the
same benefits as TP (see Section 6.4.1.1). TPTS can potentially be made more effective by
allowing agents on the way to the pickup vertices of their assigned tasks to consider new tasks that
have just been added to the system. This can be achieved by letting all free agents request the
token at every time step, which makes TPTS much less efficient, or at time steps where a new
task is added to the system, which, in a distributed setting, requires an additional broadcasting
mechanism to inform all free agents (that a new task is added). In both cases, the call of Function
GetTask on Line 5 of Algorithm 6.2 can return failure for an agent that is currently on the way to
the pickup vertex of its assigned task because it is not necessarily at an endpoint, but the agent
can still move along its current path. Our centralized algorithm (see Section 6.5) uses a similar
idea to allow all free agents to consider all tasks, including the ones that have just been added to
the system, in the task set.
6.5 Centralized Algorithm
In this section, we use some of the ideas presented in Sections 6.4.1.1 and 6.4.2.1 to develop a
centralized MAPD algorithm, CENTRAL, that can potentially improve the effectiveness of de-
coupled MAPD algorithms. Unlike the decoupled algorithms TP and TPTS, CENTRAL makes
decisions for multiple agents at a time. We expect CENTRAL to be reasonably efficient and
effective. But unlike centralized MAPF and TAPF algorithms, we do not require CENTRAL
to be optimally effective since (1) state-of-the-art algorithms that can solve MAPD optimally in
an offline setting (by making the assumption that all tasks are known a priori) scale to only 4
agents and 4 tasks (Henkel et al., 2019) and (2) there does not exist any optimal online MAPD
algorithm (Azar et al., 1995; Kalyanasundaram & Pruhs, 1993).
Similar to TPTS, CENTRAL allows agents that have just become free to consider not only
unassigned tasks but also all unexecuted tasks, including the ones that have been assigned to
agents, in the task set. Unlike TPTS, CENTRAL uses a centralized target-assignment algorithm,
the Hungarian method (Kuhn, 1955), to assign (pickup vertices of) tasks to agents and allows
all free agents to consider tasks that have just been added to the system. It uses the centralized
MAPF algorithm CBS (Section 3.2.2) to plan paths for multiple agents.
6.5.1 Task/Endpoint-Assignment Procedure
At each time step, CENTRAL executes the task/endpoint-assignment procedure in two phases
to assign endpoints to agents.
In the first phase, CENTRAL considers each agent, one after the other, that rests at the pickup
vertex of an unexecuted task. If the delivery vertex of the task is currently not assigned to other
agents, CENTRAL assigns the corresponding unexecuted task (if it is not assigned to the agent
already) and its delivery vertex to the agent. The agent then starts to execute the task and thus
becomes occupied. CENTRAL then executes the first stage of the path-planning procedure (see
Section 6.5.2) to plan paths for all agents that become occupied to execute their assigned tasks.
In the second phase, if any new tasks are added to the system or any agents finish executing
their tasks (and thus become free) at the current time step, CENTRAL assigns either the pickup
vertex of an unexecuted task or some other endpoint as parking endpoint to each free agent.
To make the resulting MAPF problem solvable, the endpoints assigned to all agents must be
pairwise different. Agents are assigned pickup vertices of unexecuted tasks in order to execute
the tasks afterward. Thus, when CENTRAL assigns pickup vertices of unexecuted tasks to
agents, we want the delivery vertices of these tasks to be different from the endpoints assigned
to all agents (except for their own pickup vertices in case the pickup and delivery vertices of a
task are the same) and from each other. CENTRAL achieves these constraints as follows:
• First, CENTRAL greedily constructs a set of possible endpoints X for the free agents as
follows: CENTRAL greedily constructs a subset T ′ of unexecuted tasks, starting with the
empty set, by checking for each unexecuted task, one after the other, whether its pickup
and delivery vertices are different from the delivery vertices of all executed tasks and
the pickup and delivery vertices of all unexecuted tasks already added to T ′ and, if so,
adds it to T ′. CENTRAL then sets X to the pickup vertices of all tasks in T ′. If the
number of free agents is larger than |X|, then CENTRAL needs to add endpoints to X as
parking endpoints for some free agents. Since it is not known a priori which free agents
these parking endpoints will be assigned to, there should be one good parking endpoint
available for each free agent, which is possible because there is at least one unique non-
task endpoint for each free agent due to Definition 6.1: CENTRAL greedily determines
a good parking endpoint for each free agent ai, one after the other, as the endpoint e that
minimizes the cost c(ai, e) (“is closest to the agent”) among all endpoints that are different
from the delivery vertices of all executed tasks, the pickup and delivery vertices of all tasks
in T ′, and the parking endpoints already determined, where c(ai, e) is the cost of a time-
minimal path that moves from the current vertex of free agent ai to endpoint e (a unique
non-task endpoint is thus always a good-parking-endpoint candidate for each free agent).
It then adds this endpoint to X .
• Second, CENTRAL assigns each free agent an endpoint in X to satisfy all constraints. It
uses the Hungarian Method (Kuhn, 1955) for this purpose with the modified costs c′(ai, e)
for each pair of free agent ai and endpoint e, where m is the number of free agents, C is
a sufficiently large constant (for example, the maximum over all costs c(ai, e) plus one),
and c′(ai, e) = m · C · c(ai, e) if e is a pickup vertex of a task in T ′ and c′(ai, e) =
m · C² + c(ai, e) if e is a parking endpoint. The modified costs have two desirable properties:
(a) The modified cost of assigning a pickup vertex to a free agent is always smaller than
the modified cost of assigning a parking endpoint to the same agent. Therefore, assigning
pickup vertices is more important than assigning parking endpoints. (b) Assigning a closer
pickup vertex to a single free agent that is assigned a pickup vertex reduces the total
modified cost more than assigning closer parking endpoints to all free agents that are
assigned parking endpoints. Therefore, assigning closer pickup vertices is more important
than assigning closer parking endpoints. The Hungarian Method itself handles the case
where the number of free agents is smaller than |T ′| (and thus |X| = |T ′|) by adding
“dummy” free agents (so that there are |X| “dummy” and “real” free agents in total) that
have a sufficiently large constant cost for every endpoint in X , which does not affect the
assignments of endpoints to “real” free agents.
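The construction of the modified costs can be illustrated as follows (a sketch with hypothetical names; m denotes the number of free agents, and a brute-force search over permutations stands in for the Hungarian method, which is feasible only for tiny examples):

```python
from itertools import permutations

def modified_costs(free_agents, endpoints, pickups, c):
    """Modified costs as described above: m is the number of free agents
    and C a sufficiently large constant (the maximum base cost plus one).
    Pickup vertices get cost m*C*c(a, e); parking endpoints m*C^2 + c(a, e)."""
    m = len(free_agents)
    C = max(c(a, e) for a in free_agents for e in endpoints) + 1
    return {(a, e): (m * C * c(a, e) if e in pickups else m * C * C + c(a, e))
            for a in free_agents for e in endpoints}

def best_assignment(free_agents, endpoints, cost):
    """Brute-force stand-in for the Hungarian method: the assignment of
    pairwise-different endpoints with the smallest total modified cost."""
    return min((dict(zip(free_agents, perm))
                for perm in permutations(endpoints, len(free_agents))),
               key=lambda asg: sum(cost[a, e] for a, e in asg.items()))
```

Note how property (a) appears in the example below: the pickup vertex is assigned in preference to the parking endpoint even though the parking endpoint is closer to both agents.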
6.5.2 Path-Planning Procedure
CENTRAL uses a version of CBS to plan collision-free paths for all agents from their current
vertices to their assigned endpoints simultaneously. This version of CBS (see Section 3.2.2.3)
minimizes the flowtime, that is, the sum of the number of time steps required by all agents to
reach their assigned endpoints and stop moving, because minimizing the makespan can unnec-
essarily delay the arrival times of some agents at their assigned endpoints for which they could
have otherwise reached at an earlier time step.
We noticed that CENTRAL becomes significantly more efficient if it plans paths in two
stages at each time step:
In the first stage, if any agents become occupied (in the first phase of the task/endpoint-
assignment procedure) at the current time step, CENTRAL plans paths for all these agents to
their assigned endpoints (using the approach described above but treating all other agents as
dynamic obstacles that follow their most recently calculated paths and with which collisions
need to be avoided). These occupied agents will then follow the entire planned paths.
In the second stage, if any new tasks are added to the system or any agents become free
at the current time step (and the second phase of the task/endpoint-assignment procedure has
thus been executed), it plans paths for all free agents to their assigned endpoints (again using
the approach described above but treating all other agents as dynamic obstacles that follow their
most recently calculated paths and with which collisions need to be avoided). The free agents
will not necessarily follow their entire planned paths if they are reassigned tasks at future
time steps.
In general, two smaller MAPF problem instances can be solved much faster than their union
due to the NP-hardness of the problem. Also, CENTRAL can then determine a more informed
cost c(ai, e) as the cost of a time-minimal path that (1) moves from the current vertex of agent
ai to endpoint e and (2) does not collide with the paths of the occupied agents (as described for
TP).
Property 6.4. Path planning for all agents that became occupied at the current time step returns
paths successfully for well-formed MAPD problem instances.
Proof. We construct paths for all agents that became occupied at the current time step from their
current vertices to their assigned endpoints that do not collide with the most recently calculated
paths of all other agents: Assume that all other agents move along their most recently calculated
paths. When all of them have reached the ends of their paths, move all agents that became
occupied one after the other to their assigned endpoints, which is possible due to Definition 6.1
since their current vertices are endpoints and their assigned endpoints are different from the
endpoints that all other agents now occupy. Any complete MAPF algorithm thus returns paths
successfully.
Property 6.5. Path planning for all free agents returns paths successfully for well-formed MAPD
problem instances.
Proof. We construct paths for all free agents from their current vertices to their assigned end-
points that do not collide with the most recently calculated paths of all other agents: In the first
step, assume that all (both free and occupied) agents move along their most recently calculated
paths. In the second step, when all of them have reached the ends of their paths, move all free
agents one after the other to their assigned endpoints, which is possible due to Definition 6.1
since the vertices that they now occupy are endpoints. Directly before an agent moves to its
assigned endpoint, check whether this endpoint is blocked by another agent (which must be a
free agent that has not executed the second step due to the second phase of the task/endpoint-
assignment procedure). If so, move this other agent to an unoccupied endpoint first. Such an
endpoint exists since there are at least M + 1 endpoints for M agents due to Definition 6.1.
The free agents that have executed the second step and all occupied agents never move when
other free agents execute the second step because they do not occupy endpoints assigned to free
agents that have not executed the second step. The resulting paths thus move all free agents to
their assigned endpoints. Any complete MAPF algorithm thus returns paths successfully.
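The sequential construction in the second step of this proof can be illustrated with a short sketch (all names are illustrative): it moves the free agents one after the other to their distinct assigned endpoints, parking a blocking free agent on an unoccupied endpoint first, which always exists since there are at least M + 1 endpoints for M agents:

```python
def sequential_endpoint_moves(positions, assigned, endpoints):
    """Sketch of the constructive argument in Property 6.5.

    positions: dict agent -> endpoint currently occupied (all distinct).
    assigned:  dict agent -> assigned endpoint (all distinct).
    endpoints: set of endpoints with len(endpoints) >= len(positions) + 1
               (Definition 6.1), so an unoccupied endpoint always exists.
    Returns a list of (agent, endpoint) moves, each into an unoccupied endpoint.
    """
    occupied = dict(positions)  # agent -> endpoint it currently occupies

    def occupant(e):
        return next((a for a, p in occupied.items() if p == e), None)

    moves = []
    for agent in assigned:      # execute the "second step" agent by agent
        target = assigned[agent]
        blocker = occupant(target)
        if blocker is not None and blocker != agent:
            # The blocker is a free agent that has not executed the second
            # step yet: park it on some unoccupied endpoint first.
            spare = next(e for e in sorted(endpoints) if occupant(e) is None)
            moves.append((blocker, spare))
            occupied[blocker] = spare
        if occupied[agent] != target:
            moves.append((agent, target))
            occupied[agent] = target
    return moves
```

Note that a blocker is always an agent that has not yet executed its own step: agents that have already moved sit on their own assigned endpoints, which are distinct from everyone else's.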
Theorem 6.3. CENTRAL is long-term robust for all well-formed MAPD problem instances.
Proof. We show that each task is eventually assigned to some agent and executed by it. The last
task is added to the system at a bounded time step t1. After that, for a proof by contradiction,
assume that all unexecuted tasks remain unexecuted (any executed task will be finished by the
agent that it is assigned to as stated above). If all agents are free at time step t1, we consider
time step t = t1. Otherwise, there must exist an earliest bounded time step t2 > t1 where all
agents have become free, and we consider time step t = t2. At time step t, no unexecuted task is
assigned to an agent in the first phase of the task/endpoint-assignment procedure because, other-
wise, the task would be executed successfully. The task/endpoint-assignment procedure executes
the second phase since a new task is added or an agent becomes free at time step t. Since all
tasks are unexecuted, the candidate task set T ′ contains at least one task. The pickup vertex of
the task is assigned to an agent. This pickup vertex is never re-assigned to another agent after-
ward because no new task is added and no agent changes from occupied to free (otherwise, an
unexecuted task is finished, which contradicts the assumption). Therefore, the agent reaches the
pickup vertex of the task due to Property 6.5. Then, the agent is assigned the delivery vertex of
the task because no other agents are assigned this vertex as endpoint due to the second phase of
the task/endpoint-assignment procedure executed at time step t. The agent eventually reaches
the delivery vertex of the task due to Property 6.4, which contradicts the assumption. Therefore,
at least one unexecuted task is finished by an agent. We finish the proof by applying the above
arguments to the set of remaining unexecuted tasks.
6.5.3 Extensions of CENTRAL
The design of CENTRAL is kept as simple as possible so that it can easily be extended to
different centralized MAPD algorithms.
• Like TP, CENTRAL can be used to constructively prove that all well-formed MAPD prob-
lem instances are solvable, as shown above.
• Like TP, CENTRAL is also long-term robust even if each agent has to spend a known or
unknown number of time steps at the pickup or delivery vertex because CENTRAL can
then plan a path for the agent from the pickup (respectively delivery) vertex only when the
agent has actually finished the pickup (respectively delivery) operation.
• CENTRAL can replace CBS with any other MAPF algorithm in the path-planning pro-
cedure. The resulting MAPD algorithm is still long-term robust as long as the MAPF
algorithm is complete.
• CENTRAL can potentially apply, without many changes, to cases where tasks have more
than one intermediate target (instead of only one—the pickup vertex) each.
CENTRAL can potentially be made more effective (respectively efficient) with changes in
the following components of its design, which, however, can make CENTRAL less efficient
(respectively effective), robust, or general:
1. The path-planning procedure can combine the MAPF problems in two stages into a larger
MAPF problem (if both stages are executed). Solving such a larger MAPF problem can
potentially make CENTRAL more effective but makes CENTRAL less efficient than solv-
ing the two smaller MAPF problems in two stages. Also, additional considerations are
needed to guarantee long-term robustness.
2. The path-planning procedure can be divided into more stages. For example, in the sec-
ond stage, paths for agents that are assigned pickup vertices of unexecuted tasks can be
planned first, and paths for agents that are assigned parking endpoints, if there are any, can
be planned later. This can make CENTRAL more efficient. However, additional consid-
erations are needed to guarantee long-term robustness.
3. Any MAPF algorithms that are more efficient can be used in the path-planning procedure
to make CENTRAL more efficient. Complete but suboptimal MAPF algorithms can be
used in the path-planning procedure. This can make CENTRAL more efficient but less
effective. Incomplete MAPF algorithms can also be used in the path-planning procedure.
However, this can make CENTRAL no longer long-term robust for well-formed MAPD
problem instances.
4. Like TP, the task/endpoint-assignment procedure can consider only unassigned tasks and
free agents with no task assigned and thus never reassign tasks to agents that have already
been assigned tasks. This can make CENTRAL less effective but makes it more efficient
because the target-assignment and MAPF problems are all smaller. This also allows CEN-
TRAL to use space-time A* searches to plan paths for individual agents (like in TP) and
still guarantees long-term robustness. Some recent research uses this idea (Grenouilleau
et al., 2019; Honig, Kiesel, Tinka, Durham, & Ayanian, 2019).
5. Similar to an idea of extending TP (see Section 6.4.1.1), deadlock avoidance at the pickup
vertex can be made more effective. CENTRAL can assume that agents whose paths end
at the pickup vertices of their assigned tasks do not rest at the pickup vertices. Our recent
research uses this modification (Liu, Ma, Li, & Koenig, 2019) (not covered as a contri-
bution of this dissertation). However, this cannot generalize to cases where each agent
has to spend an unknown number of time steps at the pickup vertex to finish the pickup
operation. Also, additional considerations are needed to guarantee long-term robustness.
6.6 Experiments
In this section, we describe the results of four experiments on a 2.50 GHz Intel Core i5-2450M
laptop with 6 GB RAM. First, we compare the makespans and service times of the solutions
produced by TP, TPTS, and CENTRAL. Second, we compare the runtimes of TP, TPTS, and
CENTRAL. Third, we compare the throughputs of TP, TPTS, and CENTRAL. Fourth, we study
how TP performs in a large simulated warehouse environment with hundreds of agents.
For Experiments 1 to 3, we ran TP, TPTS, and CENTRAL in the small simulated warehouse
environment shown in Figure 6.4. Black cells are blocked. Gray cells in columns of gray cells
are potential initial vertices for the agents. Colored circles are the actual initial vertices of agents,
Figure 6.4: The small simulated warehouse environment with 50 agents.
which are drawn randomly from all potential initial vertices and are the only non-task endpoints.
Gray cells other than the initial vertices are task endpoints (that would house inventory pods in
a warehouse even though we do not model inventory pods here). We generated a sequence of
500 tasks by randomly choosing their pickup and delivery vertices from all task endpoints. We
used 6 different task frequencies (numbers of tasks that are added (in order) from the sequence
to the system at each time step): 0.2 (one task every 5 time steps), 0.5, 1, 2, 5, and 10. For each
task frequency, we used 5 different numbers of agents: 10, 20, 30, 40, and 50. This simulated
warehouse environment is inspired by the warehouse map shown in Figure 1.1 that represents
the layout of part of an Amazon Robotics automated warehouse system (Wurman et al., 2008).
The resulting MAPD problem instances have been considered as benchmark MAPD problem
instances by recent research on MAPD (Grenouilleau et al., 2019; Liu, Ma, et al., 2019; Okumura
et al., 2019).
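The task sequences described above can be generated as follows (a sketch: the function name, the seed, and the without-replacement sampling of pickup and delivery vertices are illustrative assumptions, not the exact benchmark generator):

```python
import random

def generate_tasks(task_endpoints, num_tasks, task_frequency, seed=0):
    """Generate a MAPD task sequence: each task has a pickup and a delivery
    vertex drawn randomly from the task endpoints (distinct here), and a
    release time step determined by the task frequency (tasks added per time
    step), e.g., frequency 0.2 releases one task every 5 time steps."""
    rng = random.Random(seed)
    tasks = []
    for i in range(num_tasks):
        pickup, delivery = rng.sample(task_endpoints, 2)
        release = int(i / task_frequency)
        tasks.append({"pickup": pickup, "delivery": delivery, "release": release})
    return tasks
```

With 500 tasks and a frequency of 10 tasks per time step, all tasks are released over the first 50 time steps, matching the remark below about Table 6.1.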
Table 6.1 reports the makespans, the service times, the runtimes per time step (in ms), the
ratios of the service times of TPTS and TP, and the ratios of the service times of CENTRAL and
TP. The measures for a task frequency of 10 tasks per time step are reasonably representative
of the case where all tasks are added at the beginning of the operation since the tasks are added
over the first 50 time steps only.
145
Table 6.1: Results for TP, TPTS, and CENTRAL in the small simulated warehouse environment.
                      |           TP            |              TPTS               |             CENTRAL
task freq.   agents   | makespan service  time  | makespan service  time    ratio | makespan service  time     ratio
                      |           time    (ms)  |           time    (ms)          |           time    (ms)
 0.2           10     |  2,532    38.54   0.13  |  2,532    29.33     1.86   0.76 |  2,513    27.82     37.77   0.72
 0.2           20     |  2,540    39.77   0.26  |  2,520    25.36     9.82   0.64 |  2,520    24.77    218.17   0.62
 0.2           30     |  2,546    38.71   0.25  |  2,527    23.88    21.57   0.62 |  2,520    23.60    553.20   0.61
 0.2           40     |  2,540    38.88   0.24  |  2,524    23.50    27.49   0.60 |  2,516    23.07  1,042.81   0.59
 0.2           50     |  2,540    40.03   0.32  |  2,524    23.11    47.33   0.58 |  2,518    22.67  1,711.85   0.57
 0.5           10     |  1,309   132.79   0.24  |  1,274   131.15     0.33   0.99 |  1,264   116.79     64.22   0.88
 0.5           20     |  1,094    42.69   1.16  |  1,038    30.74    12.39   0.72 |  1,036    27.63    232.52   0.65
 0.5           30     |  1,069    43.97   1.51  |  1,035    27.14    34.04   0.62 |  1,032    25.84    712.71   0.59
 0.5           40     |  1,090    43.01   0.70  |  1,038    25.98    19.85   0.60 |  1,032    24.94  1,438.38   0.58
 0.5           50     |  1,083    43.66   1.36  |  1,036    25.22    71.48   0.58 |  1,031    24.46  2,564.55   0.56
 1             10     |  1,198   311.78   0.20  |  1,182   301.03     0.37   0.97 |  1,146   290.52     84.78   0.93
 1             20     |    757    95.98   1.03  |    706    88.25     2.80   0.92 |    698    75.20    237.68   0.78
 1             30     |    607    53.80   2.81  |    561    42.84     8.45   0.80 |    560    31.54    394.55   0.59
 1             40     |    624    48.80   2.14  |    563    31.99    39.54   0.66 |    554    27.85  1,110.96   0.57
 1             50     |    597    49.14   3.76  |    554    30.27   128.13   0.62 |    555    27.02  2,118.37   0.55
 2             10     |  1,167   407.62   0.19  |  1,168   407.24     0.37   1.00 |  1,113   382.25     78.45   0.94
 2             20     |    683   190.76   0.96  |    667   181.03     2.38   0.95 |    617   163.97    255.19   0.86
 2             30     |    529   114.39   2.31  |    496   102.69     8.39   0.90 |    479    92.88    516.74   0.81
 2             40     |    464    95.32   3.43  |    425    72.59     7.32   0.76 |    382    55.88    745.52   0.59
 2             50     |    432    75.63   6.28  |    383    58.06   126.47   0.77 |    332    38.96  1,101.63   0.52
 5             10     |  1,162   473.78   0.20  |  1,165   473.18     0.41   1.00 |  1,128   462.34     80.75   0.98
 5             20     |    655   247.08   1.02  |    645   238.02     1.68   0.96 |    590   222.69    219.01   0.90
 5             30     |    478   170.78   2.22  |    474   167.66     8.58   0.98 |    431   147.53    482.46   0.86
 5             40     |    418   155.33   4.15  |    396   131.36    12.31   0.85 |    345   109.55    802.54   0.71
 5             50     |    395   124.59   5.92  |    343   104.86    59.64   0.84 |    289    86.68  1,296.44   0.70
10             10     |  1,163   495.93   0.22  |  1,172   505.26     0.40   1.02 |  1,115   474.97     75.55   0.96
10             20     |    643   275.24   1.09  |    645   258.36     1.87   0.94 |    589   242.18    223.50   0.88
10             30     |    526   192.01   1.98  |    491   198.30    10.82   1.03 |    422   165.13    440.73   0.86
10             40     |    407   154.63   1.65  |    389   152.49    12.62   0.99 |    344   128.40    785.88   0.83
10             50     |    333   131.42   5.62  |    319   126.96    25.32   0.97 |    304   106.70  1,884.28   0.81
6.6.1 Experiment 1: Makespan and Service Time
The MAPD algorithms, in increasing order of their makespans and service times, tend to be:
CENTRAL, TPTS, and TP. For example, the service time of TPTS (and CENTRAL) is up to
about 42 percent (and 48 percent, respectively) smaller than that of TP for some experimental
runs. The makespans tend to be large for low task frequencies and small for high task frequencies
because the number of tasks is constant and thus more time steps are needed to add all tasks
for low task frequencies. On the other hand, the service times tend to be small for low task
frequencies and high for high task frequencies because the agents tend to be able to attend to tasks
fast if the number of tasks in the system is small. The makespans and service times tend to be
large for small numbers of agents and small for large numbers of agents because the agents tend
to be able to attend to tasks fast if the number of agents is large (although congestion increases).
The makespans and service times for a task frequency of 0.2 tasks per time step are about the
same for all numbers of agents because 10 agents already attend to all tasks as fast as the MAPD
algorithms allow. The makespans are similar for all MAPD algorithms and all numbers of agents
for the task frequency of 0.2 tasks per time step because 10 agents already execute tasks faster
than they are added. The makespans then depend largely on how fast the agents execute the last
few tasks. On the other hand, the makespans of MAPD algorithms increase substantially when
tasks pile up because the agents execute them more slowly than they are added. This allows us to
estimate the smallest number of agents needed for a long-term operation as a function of the task
frequency and MAPD algorithm. For example, the makespan of TPTS increases substantially
when the number of agents is reduced from 20 to 10 for a task frequency of 1 task per time step.
Thus, one needs between about 10 and 20 agents for a long-term operation with TPTS.
6.6.2 Experiment 2: Runtime per Time Step
The MAPD algorithms, in increasing order of their runtimes per time step, tend to be: TP, TPTS,
and CENTRAL. For example, the runtime of TPTS (and CENTRAL) is two orders of magnitude
larger than the runtime of TP (and TPTS, respectively) for some experimental runs. The runtimes
of TP are less than 10 milliseconds, the runtimes of TPTS are less than 200 milliseconds, and the
runtimes of CENTRAL are less than 3,000 milliseconds in all experimental runs. We consider
runtimes below one second to allow for real-time long-term operation. The runtimes tend not to
be correlated with the task frequencies. They tend to be small for small numbers of agents and
Figure 6.5: Number of tasks added (gray) and executed by 50 agents per time step in a moving
100-time-step window [t − 99, t] for TP, TPTS, and CENTRAL as a function of the time step t
for different task frequencies: (a) 0.2, (b) 0.5, (c) 1, (d) 2, (e) 5, and (f) 10 tasks per time step.
large for large numbers of agents because all agents need to perform computations, which are
not run in parallel in our experiments.
6.6.3 Experiment 3: Number of Executed Tasks
The service times vary over time since only very few tasks are available at the first and last time
steps. The steady state is in between these two extremes. Figure 6.5 visualizes the number of
Table 6.2: Throughputs for TP, TPTS, and CENTRAL in the small simulated warehouse environment.

task frequency      TP      TPTS    CENTRAL
     0.2           0.191    0.192    0.192
     0.5           0.430    0.446    0.448
     1             0.739    0.780    0.780
     2             0.967    1.062    1.190
     5             1.031    1.152    1.316
    10             1.174    1.214    1.259
Table 6.3: Results for TP in the large simulated warehouse environment.

agents            100       200       300       400       500
service time    463.25    330.19    301.97    289.08    284.24
time (ms)        90.83    538.22  1,854.44  3,881.11  6,121.06
tasks added per time step and the number of tasks executed by 50 agents per time step (that is,
throughput at time step t) during the 100-time-step window [t− 99, t] for all MAPD algorithms
as a function of the time step t. For low task frequencies, the numbers of tasks added match the
numbers of tasks executed closely for all MAPD algorithms. Differences between them arise for
higher task frequencies. For example, for the task frequency of 2 tasks per time step, the number
of tasks executed by CENTRAL increases faster and reaches a higher level than the numbers of
tasks executed by TP and TPTS. The 100-time-step window [150, 249] at time step t = 249 is a
close approximation of the steady state since all tasks are added at a steady rate during the first
250 time steps. CENTRAL executes more tasks during this 100-time-step window than TP and
TPTS and thus has a smaller service time. However, the numbers of tasks executed are smaller
for all MAPD algorithms than the number of tasks added, and tasks thus pile up for all of them
in the steady state. Table 6.2 reports the average throughput over all time steps t whose numbers
of tasks executed during the 100-time-step window [t − 99, t] are positive, measured in tasks
per time step. The MAPD algorithms, in increasing order of their average throughputs, are: TP,
TPTS, and CENTRAL.
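The moving-window throughput reported in Figure 6.5 and Table 6.2 can be computed from the task completion times with a sliding sum (a sketch with illustrative names):

```python
def windowed_throughput(finish_times, horizon, window=100):
    """Tasks executed per time step in the moving window [t - window + 1, t],
    for each time step t up to `horizon`, plus the average throughput over all
    t whose window count is positive (as in Table 6.2).  `finish_times` lists
    the time step at which each task was executed."""
    counts = [0] * (horizon + 1)
    for ft in finish_times:
        counts[ft] += 1
    per_step, total = [], 0
    for t in range(horizon + 1):
        total += counts[t]
        if t >= window:
            total -= counts[t - window]   # drop the step leaving the window
        per_step.append(total / window)   # tasks executed per time step
    positive = [x for x in per_step if x > 0]
    avg = sum(positive) / len(positive) if positive else 0.0
    return per_step, avg
```

The sliding sum makes each time step O(1), so the whole computation is linear in the horizon plus the number of tasks.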
Figure 6.6: The large simulated warehouse environment with 500 agents.
6.6.4 Experiment 4: Scalability
To evaluate how the MAPD algorithms scale in the number of agents, we ran TP, TPTS, and
CENTRAL in the large simulated warehouse environment shown in Figure 6.6. We generated
a sequence of 1,000 tasks by randomly choosing their pickup and delivery vertices from all
task endpoints. The initial vertices of the agents are the only non-task endpoints. We used
a task frequency of 50 tasks per time step and 100, 200, 300, 400, and 500 agents. TPTS
and CENTRAL did not allow for real-time long-term operation for large numbers of agents.
Table 6.3 thus reports, for TP only, the service times and the runtimes per time step (in ms).
Figure 6.7 visualizes the numbers of tasks executed per time step during a 100-time-step window
for different numbers of agents. It does not visualize the number of tasks added per
time step since the number is much larger than the number of tasks executed per time step for
most t. The runtime of TP is smaller than 550 milliseconds for 200 agents, allowing for real-time
long-term operation.
Figure 6.7: Number of tasks executed per time step during the 100-time-step window [t − 99, t]
for TP as a function of the time step t for 100, 200, 300, 400, and 500 agents.
6.7 Summary
In this chapter, we formalized and studied MAPD, a new variant of MAPF that models the long-
term combined target-assignment and path-planning problem for teams of agents. We demon-
strated how MAPD algorithms can utilize the environmental characteristics of well-formed
MAPD problem instances, a realistic subset of MAPD problem instances, to guarantee long-term
robustness for all well-formed MAPD problem instances. We presented two decoupled MAPD algo-
rithms, TP and TPTS, and one centralized MAPD algorithm, CENTRAL. They reduce MAPD
to one-shot target-assignment and path-planning sub-problems for single or multiple agents in
different ways and then iteratively apply one-shot target-assignment and path-planning algo-
rithms to solve these sub-problems. Theoretically, we proved that all three MAPD algorithms
are long-term robust for well-formed MAPD problem instances. Experimentally, we compared
them in a simulated warehouse environment. The MAPD algorithms in increasing order of their
makespans and service times tend to be: CENTRAL, TPTS, and TP. The MAPD algorithms
in increasing order of their runtimes per time step tend to be: TP, TPTS, and CENTRAL. In
particular, TP is the best choice when real-time computation is of primary concern since it re-
mains efficient for MAPD problem instances with hundreds of agents and tasks. Therefore, it
has the potential to address real-time long-term coordination of autonomous target-assignment
and path-planning operations for real-world applications of large-scale multi-agent systems.
To summarize, in this chapter, we validated the hypothesis that formalizing and studying new
variants of MAPF can result in new algorithms for the long-term coordination of autonomous
target-assignment and path-planning operations of teams of agents. In Chapter 7, we conduct a
study in a simulated multi-agent system that includes kinematic constraints of real-world agents
and thus confirm the benefits of our MAPD algorithms for real-world applications of multi-agent
systems.
Chapter 7
MAPD with Kinematic Constraints in a Simulated
System
In this chapter, we present the fourth major contribution of this dissertation. Specifically, we con-
duct a case study of MAPD with kinematic constraints of real-world agents in a simulated sys-
tem. We demonstrate how MAPD algorithms can take kinematic constraints (see Section 1.1) of
real-world agents into account, guarantee a safety distance between the agents, and remain long-
term robust for all well-formed MAPD problem instances. Specifically, we demonstrate how
to adapt the MAPD algorithm TP (see Section 6.4) to this simulated system to take kinematic
constraints of real-world agents into account directly during planning by using a novel one-shot
single-agent path-planning algorithm, called Safe Interval Path Planning with Reservation Table
(SIPPwRT). The resulting algorithm TP-SIPPwRT computes plan-execution schedules that can
be safely executed by real-world agents with off-the-shelf controllers. We also compare TP-
SIPPwRT against a baseline method that uses the existing polynomial-time procedure
MAPF-POST (see Section 3.3) in a post-processing step to transform discrete MAPD solutions
produced by the MAPD algorithms presented in Chapter 6 into plan-execution schedules. Ex-
perimental results on both an agent simulator and a robot simulator validate the hypothesis that
This chapter is based on Ma, H., Honig, W., Kumar, T. K. S., Ayanian, N., & Koenig, S. (2019). Lifelong path planning with kinematic constraints for multi-agent pickup and delivery. In AAAI Conference on Artificial Intelligence (pp. 7651–7658).
our MAPD algorithms introduced in both Chapters 6 and 7 are potentially applicable to and thus
can benefit real-world applications of multi-agent systems.
The remainder of this chapter is structured as follows. In Section 7.1, we reiterate the moti-
vation behind studying MAPD with kinematic constraints in a simulated system. In Section 7.2,
we state the assumptions used in this chapter and describe how TP-SIPPwRT uses them to take
kinematic constraints into account directly during planning. In Section 7.3, we present SIPP-
wRT that is used by TP-SIPPwRT to solve one-shot single-agent path-planning problems. In
Section 7.4, we analyze the properties of TP-SIPPwRT. In Section 7.5, we provide the spec-
ification of the simulated system used in our experiments. In Section 7.6, we experimentally
evaluate TP-SIPPwRT on both an agent simulator and a robot simulator and compare it with us-
ing MAPF-POST in a post-processing step to transform discrete solutions produced by previous
MAPF algorithms into plan-execution schedules. Finally, we summarize the contributions of
this chapter in Section 7.7.
7.1 Introduction
In this chapter, we develop algorithmic techniques that take kinematic constraints (see Sec-
tion 1.1) of real-world agents into account in their solutions of the long-term combined target-
assignment and path-planning problem MAPD. Simulated systems that model such constraints
provide an important test bed for demonstrating the potential of applying multi-agent coordi-
nation algorithms to real-world applications of multi-agent systems. Existing target-assignment
and path-planning algorithms do not directly consider kinematic constraints of real-world agents.
In Chapter 6, we have developed three MAPD algorithms that are long-term robust for well-
formed MAPD problem instances. However, these MAPD algorithms, like MAPF and TAPF
algorithms that we have covered in Chapters 3 and 5, assume very simple agent models. Their
solutions thus cannot be safely executed by teams of real-world agents. For example, all al-
gorithms we have introduced so far assume discrete agent movements with uniform velocity
and thus make it challenging for real-world agents, such as differential-drive robots, to execute
the computed solutions. As we have discussed in Sections 3.3 and 5.5, it has been verified
that MAPF and TAPF algorithms can use MAPF-POST in a post-processing step to transform
their solutions into plan-execution schedules that take kinematic constraints (sizes and veloci-
ties) of real-world agents into account and provide a guaranteed safety distance between them
(Honig, Kumar, Cohen, et al., 2016; Honig, Kumar, Ma, et al., 2016). However, the resulting
plan-execution schedules might then not be effective since planning is oblivious to this trans-
formation. For example, one of the optimal solutions for the MAPF problem instance shown in
Figure 2.1 is {π1 = 〈B,C,D〉, π2 = 〈A,A,C,E〉}. The plan-execution schedule produced by
MAPF-POST for this MAPF solution enforces a temporal constraint (see Section 3.3) that agent
a1 enters vertex C before agent a2 enters it. However, if we assume that agent a1 always moves
with a lower velocity than agent a2, then both agents might arrive at their goal vertices earlier
if the faster agent a2 enters vertex C before the slower agent a1 enters it. The potential loss in
effectiveness might even be larger for MAPD since the effect can cascade. It is thus not known
how well MAPF-POST applies to MAPD. In general, it remains unclear whether and, if so, how
and how well MAPD algorithms can include kinematic constraints of real-world agents in their
solutions as their computation is online and their planning horizon is long.
Therefore, we demonstrate how MAPD algorithms can take kinematic constraints of real-
world agents into account. Specifically, we focus on TP (see Section 6.4), the only algorithm that
remains efficient for MAPD problem instances with hundreds of agents and tasks. TP needs to
repeatedly solve one one-shot single-agent path-planning problem for one agent at a time, while
avoiding collisions with the paths of the other agents. We show how TP can be made even more
efficient by using SIPPwRT, our contribution to improving SIPP (Phillips & Likhachev, 2011)
for MAPD and many other problems. Most importantly, we show how TP can be made more
general by letting SIPPwRT take kinematic constraints of real-world agents into account directly
to compute continuous agent movements. The resulting MAPD algorithm TP-SIPPwRT directly
computes plan-execution schedules and remains long-term robust for well-formed MAPD prob-
lem instances. Experimentally, we demonstrate that TP-SIPPwRT is more efficient and effective
than the baseline method that uses MAPF-POST in a post-processing step to transform MAPD
solutions with discrete agent movements into plan-execution schedules with continuous agent
movements. We demonstrate the benefits of our methods in a simulated automated warehouse
system using both an agent simulator and a standard robot simulator. For example, we demon-
strate that TP-SIPPwRT can compute solutions for hundreds of agents and thousands of tasks in
seconds of runtime.
7.2 Assumptions and TP-SIPPwRT
We follow the problem definition of MAPD (see Section 2.5) but make the following assump-
tions throughout this chapter, even though TP-SIPPwRT could easily be generalized beyond
them, mostly because these assumptions have been proven to work for many types of real-world
agents, such as differential-drive robots, with off-the-shelf controllers (Honig, Kumar, Cohen,
et al., 2016; Honig et al., 2019; Honig, Kumar, Ma, et al., 2016) and make it easier to (1) com-
pare TP-SIPPwRT with using MAPF-POST to post-process MAPD solutions with discrete agent
movements and (2) explain TP-SIPPwRT itself:
• The given graph G is a 2D 4-neighbor grid with blocked and unblocked cells of size L×L
each.
• Each agent ai is a circular agent (disk) with (safety) radius Ri ≤ L/2. We use its center
as its reference point. The radii thus take agent sizes and safety distances into account.
• The configuration of an agent is a pair of its cell and orientation (main compass direction).
• Time is continuous.
• Each agent starts from its given initial configuration at time 0 and always moves from
the center of its current unblocked cell to the center of a neighboring unblocked cell via
the following available actions: a point turn of π/2 rads (ninety degrees) in either the
clockwise or counterclockwise direction with a given rotational velocity, a wait in (the
center of) their current unblocked cell, and a forward movement to (the center of) the
neighboring cell with a given translational velocity.
• The agents can accelerate and decelerate infinitely fast.
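Under these assumptions, the duration of each action follows directly from the given velocities; a sketch (the velocities are parameters here, not the concrete values used in the experiments):

```python
import math

def action_durations(cell_size, v_trans, v_rot):
    """Durations of the available actions under the assumptions above:
    infinitely fast (de)acceleration, point turns of pi/2 rads with rotational
    velocity v_rot (rad per time unit), and forward movements over one cell of
    size cell_size with translational velocity v_trans."""
    return {
        "turn": (math.pi / 2) / v_rot,  # one ninety-degree point turn
        "move": cell_size / v_trans,    # center of a cell to a neighbor's center
    }

def turn_and_move_duration(angle_diff, cell_size, v_trans, v_rot, wait=0.0):
    """Duration of one turn-and-move action: point turn(s) by `angle_diff`
    (a multiple of pi/2), an optional wait, then one forward movement."""
    return abs(angle_diff) / v_rot + wait + cell_size / v_trans
```

Because acceleration is assumed infinitely fast, these durations compose additively along a path, which is what makes the arrival times in a plan-execution schedule easy to compute.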
For the robot simulator used in the experiments, we use controllers for real-world agents that
approximate the above assumptions.
In Section 7.3.6, we argue that, for TP-SIPPwRT, the above assumptions on actions are
equivalent to using only turn-and-move actions, each consisting of a point turn in one of the
four compass directions followed by a wait (when necessary) and then a forward movement to a
neighboring unblocked cell. A path of agent ai is a sequence of configurations, each associated
with the (continuous) time when the agent arrives in it, where any two consecutive configurations
are connected by a turn-and-move action. The paths of all agents, when combined with the given
rotational and translational velocities, provide sufficient statistics for the agents to follow them
during execution under the above assumptions on agent movements and are thus also a plan-
execution schedule for the agents. The paths of two agents are free of collisions if and only if
the interiors of the agent disks never intersect when they follow their paths.
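For two agents that each move with constant velocity over a common time interval, this collision condition reduces to a closest-approach test between disks. The sketch below uses an illustrative helper name; touching disks are allowed since only the interiors must not intersect:

```python
def disks_collide(p1, q1, p2, q2, r1, r2):
    """Check whether two disk agents with radii r1 and r2, moving with constant
    velocity from p1 to q1 and from p2 to q2 over the same unit time interval,
    ever bring their interiors into contact (distance < r1 + r2)."""
    # relative position and velocity of agent 2 with respect to agent 1
    px, py = p2[0] - p1[0], p2[1] - p1[1]
    vx = (q2[0] - p2[0]) - (q1[0] - p1[0])
    vy = (q2[1] - p2[1]) - (q1[1] - p1[1])
    r = r1 + r2
    # minimize |p + t * v|^2 over t in [0, 1] (closest approach)
    a = vx * vx + vy * vy
    t = 0.0 if a == 0 else max(0.0, min(1.0, -(px * vx + py * vy) / a))
    dx, dy = px + t * vx, py + t * vy
    return dx * dx + dy * dy < r * r - 1e-12  # strict: touching is allowed
```

A full path-pair check applies this test once per pair of overlapping motion segments along the two paths.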
For ease of exposition, we refer to a sequence of cells as defined in the previous chapters for
algorithms that assume discrete agent movements also as a path, as defined in previous chapters.
7.3 SIPPwRT
Recall that TP (Algorithm 6.1) assumes discrete agent movements in the main compass direc-
tions with a uniform velocity of typically one cell per time unit (time step). Agents may repeat-
edly call Functions Path1 [Line 10] or Path2 [Line 14], each of which plans a time-minimal path for
the calling agent from one endpoint to another, considering the other agents as dynamic obstacles that follow
their paths in the token and with which collisions need to be avoided. The agents use space-time
A* (see Section 3.2.1.1) for these one-shot single-agent path-planning sub-problems.
TP-SIPPwRT replaces space-time A* with SIPPwRT, a version of SIPP that computes con-
tinuous forward movements and point turns with given velocities rather than discrete agent
movements in the main compass directions with uniform velocity.
7.3.1 A* Search of SIPP
Space-time A* and SIPP are two versions of A* that both plan time-minimal paths for agents
from their current cells to given goal cells, considering the other agents as dynamic obstacles
that follow their paths and with which collisions need to be avoided. They both assume discrete
agent movements in the main compass directions with a uniform velocity of typically one cell
per time unit on a grid. Space-time A* operates on pairs of cells and time steps, while SIPP
groups contiguous time steps during which a cell is not occupied into safe (time) intervals for
that cell, each a closed interval with a lower and an upper (time) bound, and thus operates
on pairs of cells and safe intervals. This affords the A* search of SIPP pruning opportunities
because it is always preferable for an agent to arrive at a cell earlier during the same safe interval
since it can then simply wait at the cell. Thus, if the A* search of SIPP has already found a path
that arrives at some cell at some time during some safe interval and then discovers a path that
arrives at the same cell at a later time in the same safe interval, then it can prune the latter path
without losing optimality.
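This dominance rule can be captured in a few lines. The sketch below is our own helper, not part of the dissertation's pseudocode: it records the earliest known arrival time per (cell, safe interval) pair and prunes later arrivals in the same interval.

```python
def prune_or_record(best_arrival, cell, safe_interval, t):
    """Return True if a path arriving at `cell` at time `t` during
    `safe_interval` is dominated by an earlier known arrival in the
    same safe interval; otherwise record `t` as the best arrival."""
    key = (cell, safe_interval)
    if key in best_arrival and best_arrival[key] <= t:
        return True  # prune: waiting from the earlier arrival is never worse
    best_arrival[key] = t
    return False
```

Arrivals in different safe intervals of the same cell are kept separate, since the agent cannot wait across a reserved interval.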
SIPP has already been used for robotics applications (Narayanan, Phillips, & Likhachev,
2012; Yakovlev & Andreychuk, 2017). We generalize it to continuous forward movements and
point turns with given velocities in the following, where a safe interval for a cell is now a maximal
contiguous interval during which the cell is not occupied by dynamic obstacles. Since SIPPwRT,
the resulting version of SIPP, is guaranteed to discover collision-free paths (like space-time A*)
when used as part of TP, TP-SIPPwRT, like TP, remains long-term robust for all well-formed
MAPD problem instances.
7.3.2 Reservation Table and Safe Intervals
SIPP represents the path of each dynamic obstacle as a chronologically ordered sequence of
cells occupied by the dynamic obstacle, which is not efficient since SIPP has to iterate through
all these sequences to calculate all safe intervals of a given cell. On the other hand, space-time
A* maintains a reservation table that is indexed by a cell and a time step, which allows for
the efficient calculation of all safe time steps of a given cell. For example, in Section 3.2.1.3,
when Cooperative A* uses a space-time A* search to plan a path for agent a2 for the MAPF
problem instance shown in Figure 2.1, it maintains a reservation table that is indexed by the cell
and the time step of the vertex constraints 〈a2,B, 0〉, 〈a2,C, 1〉, 〈a2,D, 2〉, 〈a2,D, 3〉, . . . that are
imposed by the dynamic obstacles (in this example, only one dynamic obstacle, namely agent a1,
exists). Therefore, when the space-time A* search tries to expand state 〈A, 0〉, it can efficiently
infer from the reservation table that state 〈C, 1〉 is unreachable (removed from the state space).
A space-time A* search often uses a similar reservation table that is indexed by an ordered pair
of cells (thus an edge traversal) and a time step for edge constraints.
SIPPwRT improves upon SIPP by using a version of a reservation table that handles contin-
uous agent movements with given velocities and is indexed by a cell. A reservation table entry
of a given cell is a priority queue that contains all reserved intervals for that cell, each an open
interval with a lower and an upper (time) bound, in increasing order of their lower bounds. A
reserved interval for a cell is a maximal contiguous interval during which the cell is occupied by
some dynamic obstacle. The reservation table allows SIPPwRT to implement all operations effi-
ciently that are needed by TP-SIPPwRT, namely to (1) calculate all safe intervals of a given cell,
(2) add reservation table entries after a new path has been calculated, and (3) delete reservation
table entries that refer to irrelevant times in the past in order to keep the reservation table small.
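Operations (1) and (2) can be sketched as follows, with the reserved intervals kept per cell in increasing order of their lower bounds (the class and method names are ours; a real implementation would use a priority queue and also support operation (3), deleting stale entries):

```python
import math
from bisect import insort

class ReservationTable:
    def __init__(self):
        self.reserved = {}  # cell -> reserved (open) intervals, sorted by lower bound

    def add(self, cell, lb, ub):
        """Operation (2): reserve (lb, ub) for `cell` after a path is planned."""
        insort(self.reserved.setdefault(cell, []), (lb, ub))

    def get_safe_intervals(self, cell):
        """Operation (1): the safe intervals of `cell`, i.e., the complement
        of its reserved intervals with respect to [0, infinity), in
        increasing order of their lower bounds."""
        safe, t = [], 0.0
        for lb, ub in self.reserved.get(cell, []):
            if lb > t:
                safe.append((t, lb))
            t = max(t, ub)
        safe.append((t, math.inf))
        return safe
```

For instance, reserving a cell during (2, 3), as in the example of Section 7.3.7, yields the safe intervals [0, 2] and [3, ∞).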
7.3.2.1 Function GetSafeIntervals
Function GetSafeIntervals(cell) returns all safe intervals for cell cell in increasing order of their
lower bounds. The safe intervals for the cell are obtained as the complements of the reserved
intervals for the cell with respect to interval [0,∞]. For safe interval i = [i.lb, i.ub] and a
dynamic obstacle departing from cell cell at time i.lb, dep cfg[cell, i] is the configuration of the
dynamic obstacle at time i.lb.¹ It is NULL if and only if i.lb ≤ current t, where current t is
the time step when the agent starts at a given start configuration. Similarly, for safe interval
¹We strictly mean the instant before time i.lb. We are not strict when it comes to the boundary cases of a safe or reserved interval because they do not affect our algorithm.
i = [i.lb, i.ub] and the dynamic obstacle arriving at cell cell at time i.ub, arr cfg[cell, i] is the
configuration of a dynamic obstacle at time i.ub. It is NULL if and only if i.ub =∞.
7.3.3 Time Offsets
The safe intervals of a cell represent the times during which the cell is not occupied. However,
this does not mean that an agent can arrive at any of the times at the cell since the agent might
still collide with a dynamic obstacle that has just departed from the cell or is about to arrive at the
cell. Thus, the lower and upper bounds of a safe interval have to be tightened using the following
time offsets.
Function Offset(cfg1, cfg2) returns the time offset ∆T that expresses the minimum amount
of time (the center of) some unknown agent a1 with safety radius R1 and translational velocity
vtrans,1 needs to depart from (the center of) a cell cfg1.cell = l with configuration cfg1 before
(the center of) some unknown agent a2 with safety radius R2 and translational velocity vtrans,2,
coming from some cell l′, arrives at (the center of) the same cell cfg2.cell = l with configuration
cfg2 to avoid a collision. The time offset is zero if and only if either cfg1 = NULL or cfg2 =
NULL, meaning that either agent a1 or agent a2 does not exist. The calculation of the time offset
requires only knowledge of the configurations, safety radii, and velocities of both agents.2
Assume that agent a2 departs from cell l′ at time 0 (and thus arrives at cell l at time t′ = L/vtrans,2) and agent a1 departs from cell l at time td ≤ L/vtrans,2. D(t) is the distance between the (centers of the) agents, where t is the amount of time elapsed after agent a2 departs from cell l′. It must hold that D(t) ≥ R1 + R2 to avoid that the two agents collide. We distinguish three cases to calculate the time offset ∆T:
²In the pseudocode of SIPPwRT, we show how to keep track of the configurations but do not include the safety radii and velocities in the configurations for ease of readability (although this needs to be done in case they are not the same for all agents and times).
7.3.3.1 Same Direction
Both agents move in the same direction (meaning that the orientations of configurations cfg1 and
cfg2 are equal), see Figure 7.1 (left), where gray lines connect the centers of cells. In this case,
D(t) = L − vtrans,2 t + vtrans,1(t − td) for t ≥ td. We now distinguish two sub-cases to show that the time offset is ∆T = (R1 + R2)/min(vtrans,1, vtrans,2).
(1) Case vtrans,1 < vtrans,2. This case is shown in Figure 7.1 (middle). D(t) decreases as the time t increases. D(t) thus reaches its minimum at the time t = t′′ = L/vtrans,2 when agent a2 arrives at cell l. Substituting t = L/vtrans,2 back into D(t) ≥ R1 + R2, we have

D(t) = L − vtrans,2 t + vtrans,1(t − td) = vtrans,1(L/vtrans,2 − td) ≥ R1 + R2.

Therefore, td ≤ L/vtrans,2 − (R1 + R2)/vtrans,1. The time offset ∆T is thus

∆T = t′ − max td = L/vtrans,2 − (L/vtrans,2 − (R1 + R2)/vtrans,1) = (R1 + R2)/vtrans,1.
(2) Case vtrans,1 ≥ vtrans,2. This case is shown in Figure 7.1 (right). D(t) decreases before agent a1 starts to move and then increases as the time t increases. D(t) thus reaches its minimum at the time t = td when agent a1 starts to move. Substituting t = td back into D(t) ≥ R1 + R2, we have

D(t) = L − vtrans,2 t + vtrans,1(t − td) = L − vtrans,2 td ≥ R1 + R2.

Therefore, td ≤ (L − (R1 + R2))/vtrans,2. The time offset is thus

∆T = t′ − max td = L/vtrans,2 − (L − (R1 + R2))/vtrans,2 = (R1 + R2)/vtrans,2.
7.3.3.2 Orthogonal Directions
Both agents move in orthogonal directions, see Figure 7.2. In this case, D(t) = √((vtrans,1(t − td))² + (L − vtrans,2 t)²). We determine the time t at which D(t) ≥ 0 reaches
Figure 7.1: Left: Two agents move in the same direction. Middle: D is at its minimum for the vtrans,1 < vtrans,2 case. Right: D is at its minimum for the vtrans,1 ≥ vtrans,2 case.
Figure 7.2: Left: Two agents move in orthogonal directions. Right: D is at its minimum.
its minimum by solving ∂D²/∂t = 0. Substituting the result t = (v²trans,1 td + vtrans,2 L)/(v²trans,1 + v²trans,2) into D² ≥ (R1 + R2)², we have

D²(t) = (L − vtrans,2 t)² + (vtrans,1(t − td))² = (vtrans,1(vtrans,2 td − L))²/(v²trans,1 + v²trans,2) ≥ (R1 + R2)².

Since L ≥ vtrans,2 td, we have vtrans,1(L − vtrans,2 td) ≥ √(v²trans,1 + v²trans,2)(R1 + R2). Therefore, td ≤ (vtrans,1 L − √(v²trans,1 + v²trans,2)(R1 + R2))/(vtrans,1 vtrans,2). The time offset is thus

∆T = t′ − max td = L/vtrans,2 − (vtrans,1 L − √(v²trans,1 + v²trans,2)(R1 + R2))/(vtrans,1 vtrans,2) = √(v²trans,1 + v²trans,2)(R1 + R2)/(vtrans,1 vtrans,2).
7.3.3.3 Opposite Directions
Both agents move in opposite directions, that is, agent a1 moves from cell l to cell l′ and agent
a2 moves from cell l′ to cell l. The time offset is set to allow agent a1 to arrive at cell l′ even
before agent a2 departs from cell l′. In this case, the time offset is
∆T = L/vtrans,1 + L/vtrans,2,
which is the sum of the times that agent a1 needs to move from cell l to cell l′ and that agent
a2 needs to move from cell l′ to cell l. We show in Section 7.4 that SIPPwRT avoids collisions
when it uses all bounds simultaneously.
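The three cases above can be collected into a single function. The sketch below uses our own naming and passes the relative direction explicitly; the pseudocode's Function Offset instead derives the case from the two configurations.

```python
import math

def time_offset(L, R1, v1, R2, v2, relation):
    """Minimum time by which the departing agent (radius R1, velocity v1)
    must precede the arriving agent (radius R2, velocity v2) at the shared
    cell; `relation` is "same", "orthogonal", or "opposite"."""
    if relation == "same":
        return (R1 + R2) / min(v1, v2)
    if relation == "orthogonal":
        return math.hypot(v1, v2) * (R1 + R2) / (v1 * v2)
    if relation == "opposite":
        return L / v1 + L / v2
    raise ValueError(f"unknown relation: {relation}")
```

With the values of the example in Section 7.3.7 (L = 1 m, radii 0.25 m, velocities 1 and 0.5 m per time unit), the orthogonal case gives √5/2 ≈ 1.118 and the opposite case gives 3.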
7.3.4 Increased/Decreased Bounds
The algorithm calls the following functions to tighten the lower and upper bounds of safe interval
i during which an agent can stay at cell l = cfg.cell safely.
The algorithm calls Function GetLB1(cfg, i) for an agent a2 to return maxj(j.lb +
Offset(dep cfg[cfg.cell, j], cfg)). Here, j.lb + Offset(dep cfg[cfg.cell, j], cfg) is the increased
lower bound for each safe interval j in GetSafeIntervals(cfg.cell) with j.lb ≤ i.lb. For agent
a2 that arrives at cell l = cfg.cell from another cell l′ with configuration cfg, the idea is to pre-
vent it from colliding with any dynamic obstacle a1 that departs from cell l with configuration
dep cfg[cfg.cell, j] before a2 arrives at cell l.
The algorithm calls Function GetUB1(cfg, i) for an agent a1 to return minj(j.ub −
Offset(cfg, arr cfg[cfg.cell, j])). Here, j.ub − Offset(cfg, arr cfg[cfg.cell, j]) is the decreased
upper bound for each safe interval j in GetSafeIntervals(cfg.cell) with j.ub ≥ i.ub. For agent
a1 that departs from cell l = cfg.cell with configuration cfg, the idea is to prevent it from col-
liding with any dynamic obstacle a2 that arrives at cell l from another cell l′ with configuration
arr cfg[cfg.cell, j] after a1 departs from cell l.
The algorithm calls Function GetLB2(cfg, i) for an agent a1 to return maxj(j.lb + L/v′trans − L/vtrans). Here, j.lb + L/v′trans − L/vtrans is the increased lower bound for each safe interval j in
GetSafeIntervals(cfg.cell) where the orientation of dep cfg[cfg.cell, j] is the same as that of cfg
and j.lb ≤ i.lb. For agent a1 that departs from cell l = cfg.cell with configuration cfg and
translational velocity vtrans and moves also toward cell l′, the idea is to prevent it from arriving
at cell l′ earlier than (and thus “passing through”) any dynamic obstacle a2 that departs from
cell l before agent a1 with configuration dep cfg[cfg.cell, j] and translational velocity v′trans and
moves also toward cell l′.
The algorithm calls Function GetUB2(cfg, i) for an agent a1 to return minj(j.lb + L/v′trans − L/vtrans). Here, j.lb + L/v′trans − L/vtrans is the decreased upper bound for each safe interval j in
is the decreased upper bound for each safe interval j in
GetSafeIntervals(cfg.cell) where the orientation of dep cfg[cfg.cell, j] is the same as that of cfg
and j.lb ≥ i.ub. For agent a1 that departs from cell l = cfg.cell with configuration cfg and
translational velocity vtrans and moves also toward cell l′, the idea is to prevent it from arriving
at cell l′ later than (and thus “being passed through” by) any dynamic obstacle a2 that departs
from cell l after agent a1 with configuration dep cfg[cfg.cell, j] and translational velocity v′trans
and moves toward cell l′.
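GetLB1 and GetUB1 share a simple scan-and-shift structure. A condensed sketch with our own data layout (safe intervals as (lb, ub) pairs, dep cfg and arr cfg as dictionaries keyed by the relevant interval bound, NULL as None):

```python
import math

def get_lb1(safe_intervals, dep_cfg, cfg, i, offset):
    """max over safe intervals j with j.lb <= i.lb of
    j.lb + Offset(dep_cfg[j], cfg)."""
    return max(j_lb + offset(dep_cfg.get(j_lb), cfg)
               for j_lb, _ in safe_intervals if j_lb <= i[0])

def get_ub1(safe_intervals, arr_cfg, cfg, i, offset):
    """min over safe intervals j with j.ub >= i.ub of
    j.ub - Offset(cfg, arr_cfg[j])."""
    return min(j_ub - offset(cfg, arr_cfg.get(j_ub))
               for _, j_ub in safe_intervals if j_ub >= i[1])
```

In the example of Section 7.3.7, with safe intervals [0, 2] and [3, ∞) for cell C and an orthogonal offset of √5/2, GetLB1 for the second interval returns 3 + √5/2 and GetUB1 for the first interval returns 2 − √5/2.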
7.3.5 Admissible h-Values for Multiple Targets
TP-SIPPwRT combines Lines 7-10 of Algorithm 6.1 by letting Function Path1 directly compute
a time-minimal path from its current configuration to a configuration whose cell is the pickup cell
of any task in the candidate task set T ′, thus solving the one-shot single-agent target-assignment
and path-planning problems jointly for a free agent ai, and then a time-minimal path from there
to the delivery cell of the task after the agent becomes occupied. Experimentally, combining
target assignment and path planning for each free agent produces solutions that are as effective
as, if not more effective than, using the pre-computed h-values for target assignment, which does
not take other paths in the token into account, and then planning paths to execute the assigned
tasks.
Therefore, Function Path1 of TP-SIPPwRT requires an agent to use SIPPwRT twice, namely
(1) to plan a time-minimal path from its current configuration to a candidate set of endpoints
(pickup cells of tasks in the candidate task set T ′) and (2) to plan a time-minimal path from the
resulting configuration to a particular endpoint (a delivery cell). TP-SIPPwRT uses the same Function Path2 (Line 14 of Algorithm 6.1), which requires the agent to use SIPPwRT once to plan a time-minimal path from its current configuration to a candidate set of endpoints (to avoid deadlocks). The agent always moves along each path with given (fixed) translational and rotational
velocities (unless it waits). Therefore, SIPPwRT has to plan only paths to a given set VGoal of
one or more endpoints.
Like in TP, by ignoring the dynamic obstacles, we determine the admissible h-values
needed for the A* search of SIPPwRT to plan time-minimal paths as follows: We calcu-
late a time-minimal path (that excludes waiting) for the agent from each configuration cfg
to each configuration cfg′ whose cell is an endpoint (by searching backward once from each
configuration cfg′). We then use the minimum heuristic (Stern, Goldenberg, & Felner, 2017)
h(cfg, VGoal) = min{cfg′ : cfg′.cell ∈ VGoal} h(cfg, cfg′) as admissible h-value of configuration cfg, where
h(cfg, cfg′) is the calculated time of the time-minimal path from cfg to cfg′. In practice, if set
VGoal is large and endpoints are densely distributed across the grid, it is more efficient to use
h(cfg, VGoal) = 0 (as we do for Function Path2 of TP-SIPPwRT) since it can be calculated faster
even though SIPPwRT might expand more nodes.
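The backward computation can be sketched as a Dijkstra search over (cell, orientation) pairs that ignores dynamic obstacles, seeded with all goal cells at once so that the minimum over VGoal falls out directly. The grid encoding and names below are ours; we assume unit edge length and quarter turns of π/2 at rotational velocity vrot.

```python
import heapq
import math

DIRS = {"E": (1, 0), "S": (0, 1), "W": (-1, 0), "N": (0, -1)}
ORDER = ["E", "S", "W", "N"]

def turn_time(d1, d2, v_rot):
    """Duration of the shortest point turn from orientation d1 to d2."""
    q = abs(ORDER.index(d1) - ORDER.index(d2))
    return min(q, 4 - q) * (math.pi / 2) / v_rot

def multi_target_h(cells, goals, v_trans, v_rot, L=1.0):
    """Admissible h-values h(cfg, VGoal): time of a time-minimal path
    (excluding waiting) from each configuration to the nearest goal cell,
    computed by one backward Dijkstra seeded with all goals."""
    dist, pq = {}, []
    for g in goals:                      # a goal counts in any orientation
        for d in DIRS:
            dist[(g, d)] = 0.0
            heapq.heappush(pq, (0.0, g, d))
    while pq:
        c, cell, d = heapq.heappop(pq)
        if c > dist[(cell, d)]:
            continue
        dx, dy = DIRS[d]                 # reverse a forward move in direction d
        prev = (cell[0] - dx, cell[1] - dy)
        if prev not in cells:
            continue
        for d0 in DIRS:                  # orientation before the point turn
            nc = c + L / v_trans + turn_time(d0, d, v_rot)
            if nc < dist.get((prev, d0), math.inf):
                dist[(prev, d0)] = nc
                heapq.heappush(pq, (nc, prev, d0))
    return dist
```

On a three-cell corridor with the goal at the east end, vtrans = 1, and vrot = π/2, the west-end cell gets h = 2 when facing EAST and h = 4 when facing WEST (a half turn costs 2 time units).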
7.3.6 Pseudocode of SIPPwRT
Algorithm 7.2 shows the pseudocode of SIPPwRT, which plans a time-minimal path for an agent
with translational velocity vtrans and rotational velocity vrot from its configuration start cfg
at time current t to a cell in set VGoal. SIPPwRT performs a regular A* search with nodes
that are pairs of configurations of the agent and safe intervals. The g-value g[n] of a node
n = 〈n.cfg, n.int〉 with configuration n.cfg and safe interval n.int = [n.int.lb, n.int.ub] is the
earliest discovered time in n.int when the agent can be in configuration n.cfg. The start node is
n = 〈start cfg, [current t,∞]〉 with g[n] = current t. The safe interval n.int of the start node
expresses that the agent can wait forever in its current configuration. A node n is a goal node if
and only if the cell of its configuration is in set VGoal and the agent can wait forever in its con-
figuration (n.int.ub = ∞), that is, the agent can rest in the cell of its configuration, as required
by TP.
In our implementation of SIPPwRT, each action is a turn-and-move action. Since only forward movements define the temporal constraints between safe intervals of neighboring cells, the state space of our search remains unaffected by the use of turn-and-move actions instead of separate point-turn, wait, and move actions.
Algorithm 7.1: Function GetSuccessors
1 Function GetSuccessors(n)
2     successors ← ∅;
3     foreach legal turn-and-move action in n do
4         cfg t ← configuration resulting from executing the point turn of action in n.cfg (cfg t.cell = n.cfg.cell);
5         ub1 ← GetUB1(cfg t, n.int);
6         lb2 ← GetLB2(cfg t, n.int);
7         ub2 ← GetUB2(cfg t, n.int);
8         lb ← max((g[n] + ∆tturn(action, vrot)), lb2);
9         ub ← min(ub1, ub2);
10        if lb ≤ ub then
11            cfg′ ← configuration resulting from executing the forward movement of action in cfg t;
12            i′.lb ← lb + ∆tmove(action, vtrans);
13            i′.ub ← ub + ∆tmove(action, vtrans);
14            safeIntervals ← GetSafeIntervals(cfg′.cell);
15            foreach i′′ ∈ safeIntervals do
16                lb1 ← GetLB1(cfg′, i′′);
17                if [lb1, i′′.ub] ∩ i′ ≠ ∅ then
18                    t′ ← max(i′.lb, lb1);
19                    n′ ← NewNode(〈cfg′, i′′〉);
20                    cost[n, n′] ← t′ − g[n];
21                    successors ← successors ∪ {n′};
22    return successors
7.3.6.1 Function GetSuccessors
Algorithm 7.1 shows the pseudocode of Function GetSuccessors(n), which calculates the suc-
cessors of node n by considering all legal turn-and-move actions action available to the agent
in configuration n.cfg [Line 3]. Assume that executing the point turn of action action takes
∆tturn(action,vrot) time units and results in configuration cfg t with which the agent departs
from its current cell [Line 4]. The agent must depart from its current cell no earlier than lb and no
later than ub to avoid colliding with dynamic obstacles that also visit its current cell [Lines 5-
9]. If the agent can depart from its current cell [Line 10], then assume that executing the forward
movement of action action in configuration cfg t takes ∆tmove(action,vtrans) time units and re-
sults in successor configuration cfg′ [Line 11]. The agent waits an appropriate amount of time in
configuration cfg t after the point turn, then executes the forward movement, and arrives in con-
figuration cfg′ in interval i′ = [lb+∆tmove(action,vtrans), ub+∆tmove(action,vtrans)] [Lines 12-
13]. The successors of node n are generated by processing all safe intervals i′′ = [i′′.lb, i′′.ub] for
the new cell cfg′.cell of the agent [Lines 14-15]. The lower bound of safe interval i′′ is increased
from i′′.lb to lb1 to ensure that the agent can arrive at its new cell without colliding with dynamic
obstacles that also visit its new cell [Line 16]. The updated safe interval [lb1, i′′.ub] is intersected
with interval i′ [Line 17]. If their intersection is non-empty, then the agent can arrive at its suc-
cessor configuration during safe interval i′′. Only the earliest time t′ in the intersection needs to
be considered (since the agent can simply wait in its successor configuration and the later times
in the intersection can thus be pruned, as argued earlier) [Line 18]. The resulting successor of
node n is n′ = 〈cfg′, i′′〉 [Line 19], and the cost (here: time) of the transition from node n to node
n′ is cost[n, n′] = t′ − g[n] [Line 20] (consisting of executing the point turn of action action for
∆tturn(action,vrot) time units, waiting for t′−g[n]−∆tturn(action,vrot)−∆tmove(action,vtrans)
time units, and then executing the forward movement of action action for ∆tmove(action,vtrans)
time units), so that g[n′] = g[n] + cost[n, n′] = t′ is the earliest discovered time in n′.int = i′′
when the agent can be in configuration n′.cfg = cfg′.
Algorithm 7.2: SIPPwRT
Input: start cfg, VGoal, current t, vtrans, vrot
1 nstart ← NewNode(〈start cfg, [current t, ∞]〉);
2 g[nstart] ← current t;
3 OPEN ← {nstart};
4 while OPEN ≠ ∅ do
5     n ← arg minn′∈OPEN(g[n′] + h(n′.cfg, VGoal));
6     OPEN ← OPEN \ {n};
7     if n.cfg.cell ∈ VGoal and n.int.ub = ∞ then
8         return path from start cfg to n.cfg;
9     successors ← GetSuccessors(n);
10    foreach n′ ∈ successors do
11        if g[n′] is undefined then
12            g[n′] ← ∞;
13        if g[n′] > g[n] + cost[n, n′] then
14            parent[n′] ← n;
15            g[n′] ← g[n] + cost[n, n′];
16            if n′ ∉ OPEN then
17                OPEN ← OPEN ∪ {n′};
18 return “No Path Exists” (does not happen for well-formed MAPD problem instances);
7.3.6.2 Main Routine
The main routine of SIPPwRT performs a regular A* search. It initializes the g-value of the start
node and inserts the node into the OPEN list [Lines 1-3]. It then repeatedly removes a node n
with the smallest sum of g-value and h-value g[n] + h(n.cfg, VGoal) from the OPEN list [Lines
5-6] and processes it: If the node is a goal node, then it returns the path found by following the
parent pointers from the node to the start node [Lines 7-8]. Otherwise, it generates the successors
of the node [Line 9]. For each successor, it initializes its g-value to infinity if the g-value is still
undefined [Lines 11-12]. It then checks whether the g-value of the successor can be reduced
by changing its parent pointer to node n [Line 13]. If so, it changes the parent pointer of the
successor, reduces its g-value, and inserts it into the OPEN list (if necessary) [Lines 14-17].
Figure 7.3: Example of a MAPD problem instance.
Figure 7.4: Graph representation of the MAPD problem instance shown in Figure 7.3.
7.3.7 Example
We now use an example to describe how SIPPwRT computes a path. Figure 7.3 shows an
example of a MAPD problem instance on a 2D 4-neighbor grid with cells of size 1 m× 1 m
each. Colored circles are agents. The bar on each circle represents the orientation of each agent.
Dashed circles represent pickup and delivery cells. Figure 7.4 shows the corresponding graph
representation of the 2D 4-neighbor grid. Two free agents, a1 (in blue) with initial configuration
〈A,EAST〉 and a2 (in green) with initial configuration 〈E,WEST〉, are given. There is only one
task τ1 with s1 = A and g1 = E which is added to the system at t = 0. Cells B and D are
non-task endpoints. Both agents have a radius of 0.25 m and always turn with rotational velocity
π/2 ≈ 1.57 rad per time unit. Agent a1 always moves with translational velocity 1 m per time
unit, and agent a2 always moves with translational velocity 0.5 m per time unit. We assume that
[Figure: search tree with nodes n0: ⟨A, EAST⟩ [0, ∞]; n1: ⟨C, EAST⟩ [1, 2]; n2: ⟨C, EAST⟩ [3, ∞]; n3: ⟨E, EAST⟩ [0, ∞]; n4: ⟨D, SOUTH⟩ [0, ∞]; n5: ⟨A, WEST⟩ [0, ∞]]
Figure 7.5: The search of SIPPwRT.
agent a2 is assigned the path 〈〈〈E,WEST〉, 0〉, 〈〈C,WEST〉, 2〉, 〈〈B,NORTH〉, 5〉〉 that lets it
move to cell B and rest there. Then, agent a1 is assigned task τ1 and needs to compute a path
from the pickup cell A to the delivery cell E.
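The timestamps in agent a2's assigned path follow directly from the given velocities; a quick check (assuming, as stated, cell centers 1 m apart and a quarter turn from WEST to NORTH):

```python
import math

v_trans_2 = 0.5                 # m per time unit
v_rot = math.pi / 2             # rad per time unit
L = 1.0                         # m between cell centers

move = L / v_trans_2            # each forward move of a2 takes 2 time units
turn = (math.pi / 2) / v_rot    # each quarter point turn takes 1 time unit

arrive_C = 0.0 + move           # depart E at time 0, arrive at C at time 2
depart_C = arrive_C + turn      # quarter turn WEST -> NORTH until time 3
arrive_B = depart_C + move      # arrive at B at time 5
```

This reproduces the timestamps of the path 〈〈E,WEST〉, 0〉, 〈〈C,WEST〉, 2〉, 〈〈B,NORTH〉, 5〉 and, in particular, the reservation of cell C between times 2 and 3 that shapes agent a1's search.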
Figure 7.5 visualizes the search of SIPPwRT for agent a1. The configuration and safe interval
of each node (rectangular box) are shown. Each dashed arrow represents that some legal turn-
and-move action (see Line 3 of Algorithm 7.1) does not result in any successors.
The search starts with node n0 = 〈〈A,EAST〉, [0,∞]〉 with g[n0] = 0.
When the search expands node n0, it calls Function GetSuccessors(n0). The only legal turn-
and-move action is one that moves to cell C, and the search determines that agent a1 can safely
depart from cell A between lb = 0 and ub = ∞ > lb [Lines 8-10] since no dynamic obstacle
visits cell A and affects the bounds. Agent a1 can thus arrive in cell C in interval i′ = [1,∞]
[Lines 12-13]. The safe intervals of cell C are [0, 2] and [3,∞]. For safe interval i′′ = [0, 2],
no dynamic obstacle departs from cell C before agent a1 arrives in the cell and affects lb1 [Line
16]. The search successfully generates successor n1 for node n0 and determines that the earliest
time agent a1 can arrive at cell C for this safe interval is t′ = 1 (which is g[n1] in the main
routine) [Lines 17-19]. For safe interval i′′ = [3,∞], dynamic obstacle a2 departs from cell C at
time t = 3 in an orthogonal direction before agent a1 arrives in cell C from cell A. Function
Offset (the “Orthogonal Directions” case) returns ∆T = √5/2, which increases lb1 to 3 + √5/2 [Line 16]. The search successfully generates successor n2 for node n0 and determines that the earliest time agent a1 can arrive at cell C for this safe interval is t′ = 3 + √5/2 (which is g[n2] in the main routine) [Lines 17-19]. Here, the cost t′ − g[n0] includes waiting at cell A for 2 + √5/2 time units.
The search then expands node n1 in the main routine and calls Function GetSuccessors(n1).
The function call does not generate any successors because no legal turn-and-move action results
in any successors from the configuration of node n1. For example, for the turn-and-move action
that moves agent a1 to cell B, no dynamic obstacle affects lb2 or ub2, but dynamic obstacle a2,
which arrives in cell C at time t = 2 in an orthogonal direction before agent a1 departs from cell C for cell B, decreases ub1 to 2 − √5/2 due to the “Orthogonal Directions” case of Function Offset [Lines 4-7]. The search determines that agent a1 can safely depart from cell C only between lb = 2 (including a point turn for 1 time unit) and ub = 2 − √5/2 < lb, which is not possible
[Lines 8-10]. For the turn-and-move action that moves agent a1 to cell E, no dynamic obstacle
affects lb2 or ub2, but dynamic obstacle a2, which arrives in cell C at time t = 2 in the opposite
direction before agent a1 departs from cell C for cell E, decreases ub1 to 2 − 3 = −1 due to
the “Opposite Directions” case of Function Offset [Lines 4-7] (which implies that agent a1 has
to arrive in cell E before dynamic obstacle a2 departs from the cell at time 0 to avoid colliding
with the dynamic obstacle along the edge between the cells). The search determines that agent
a1 can safely depart from cell C only between lb = 1 and ub = −1 < lb, which is not possible
[Lines 8-10].
The search then expands node n2 in the main routine and calls Function GetSuccessors(n2). The function call successfully generates successors n3 with g[n3] = 4 + √5/2, n4 with g[n4] = 5 + √5/2, and n5 with g[n5] = 6 + √5/2. When the search expands node n3, it determines that the node is a goal node and returns the path 〈〈〈A,EAST〉, 0〉, 〈〈C,EAST〉, 3 + √5/2〉, 〈〈E,EAST〉, 4 + √5/2〉〉. Figure 7.6 visualizes agent a1 moving from cell A to cell C along this path.
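The arrival times in the returned path can be reproduced from the bounds computed during the search; a numeric check using the offset formula derived in Section 7.3.3 (variable names are ours):

```python
import math

# Orthogonal-directions offset: sqrt(v1^2 + v2^2) * (R1 + R2) / (v1 * v2)
# with v1 = 1 (a1), v2 = 0.5 (a2), R1 = R2 = 0.25 -> sqrt(5)/2.
offset_orth = math.hypot(1.0, 0.5) * (0.25 + 0.25) / (1.0 * 0.5)

lb1 = 3.0 + offset_orth      # a2 departs C at time 3 in an orthogonal direction
arrive_C = max(1.0, lb1)     # earliest arrival of a1 at C in safe interval [3, inf)
arrive_E = arrive_C + 1.0    # one forward move at velocity 1, no turn needed
```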
Figure 7.6: Snapshots of agent a1 following its path. Left [t = 2 + (√5 + 1)/2 = 3.62]: 0.5 time units after agent a1 departs from cell A. Middle [t = 3 + 2√5/5 = 3.89]: The distance between agents a1 and a2 is at its minimum (0.5 m). Right [t = 3 + √5/2 = 4.12]: Agent a1 arrives at cell C.
7.4 Analysis of Properties
We use the following theorem to prove that TP-SIPPwRT is correct (returns collision-free paths)
and long-term robust for well-formed MAPD problem instances.
Theorem 7.1. The path returned by SIPPwRT from the start configuration to a goal is free of
collisions.
Proof (by induction). Consider a path returned by SIPPwRT. By definition, the agent starts in
the first configuration nstart.cfg at time current t without any collisions. Assume that the agent
arrives at cell n.cfg.cell with configuration n.cfg all the way from the first configuration nstart.cfg
without any collisions. Let n′.cfg be the (successor) configuration next to configuration n.cfg in
the path. As the agent moves from cell l = n.cfg.cell to the next cell l′ = n′.cfg.cell, we use the
following arguments:
(1) Line 6 (7) of Algorithm 7.1, when node n is being expanded, guarantees that any dynamic
obstacle, which departs from cell l at a time earlier (later) than when the agent departs
from cell l and which then moves in the same direction as the agent, must arrive at the
next cell l′ earlier (later) than when the agent arrives at cell l′.
(2) Line 5 of Algorithm 7.1, when node n is being expanded, guarantees that the agent does
not collide with any dynamic obstacle which arrives at cell l in a non-opposite direction at
a time later than when the agent departs from cell l, until the dynamic obstacle has arrived
at cell l.
(3) Line 16 of Algorithm 7.1, when node n is being expanded and node n′ is being generated,
guarantees that the agent does not collide with any dynamic obstacle which departs from
cell l′ in a non-opposite direction at a time earlier than when the agent arrives at cell l′,
until the agent has arrived at cell l′. The line also guarantees that, if any dynamic obstacle
departs from cell l′ in the opposite direction at a time earlier than when the agent arrives at
cell l′, it also arrives at cell l at a time earlier than when the agent departs from cell l and
thus also departs from cell l at a time earlier than when the agent arrives at cell l because
its corresponding reserved interval does not intersect with the safe interval n.int for cell
l. Therefore, in both cases, the agent does not collide with such a dynamic obstacle (that
departs from cell l′ at a time earlier than when the agent arrives at cell l′) as it moves from
cell l to cell l′ due to the induction assumption.
(4) For a non-goal node n′, Line 5 of Algorithm 7.1, when node n′ is being expanded, guar-
antees that the agent does not collide with any dynamic obstacle which arrives at cell l′ in
a non-opposite direction at a time later than when the agent departs from cell l′, until the
dynamic obstacle has arrived at cell l′.
From the induction assumption, it suffices to prove that the agent does not collide with any given
dynamic obstacle as the agent departs from cell l until it arrives at cell l′. Assume, for a proof
by contradiction, that the agent collides with the dynamic obstacle.
(i) If the dynamic obstacle arrives at cell l at a time earlier than when the agent departs from
cell l, it also departs from cell l at a time earlier than when the agent departs from cell l
because its corresponding reserved interval does not intersect with the safe interval n.int
for cell l. It can collide with the agent only if it then moves from cell l to cell l′ at a fixed
velocity v′trans smaller than the fixed velocity vtrans at which the agent moves from cell l
to cell l′. It must arrive at cell l′ at a time earlier than when the agent arrives at cell l′ due
to (1) and also depart at a time earlier than when the agent arrives at cell l′ because its
corresponding reserved interval does not intersect with the safe interval n′.int for cell l′.
The collision thus remains when the agent arrives at cell l′, which contradicts (3).
(ii.a) If the dynamic obstacle arrives at cell l in a non-opposite direction at a time later than
when the agent departs from cell l, it does not collide with the agent until it arrives at cell
l due to (2). As a consequence, the dynamic obstacle also departs from cell l at a time
later than when the agent departs from cell l. It can collide with the agent only if it then
moves from cell l to cell l′ at a fixed velocity v′trans larger than the fixed velocity vtrans at
which the agent moves from cell l to cell l′. It must arrive at cell l′ at a time later than
when the agent arrives at cell l′ due to (1) and also later than when the agent departs from
cell l′ because its corresponding reserved interval does not intersect with the safe interval
n′.int for cell l′. Node n′ is thus a non-goal node due to Line 7 of Algorithm 7.2 (after n′
is popped from OPEN), and the collision thus remains when the agent departs from cell
l′, which contradicts (4).
(ii.b) If the dynamic obstacle arrives at cell l in the opposite direction at a time later than when
the agent departs from cell l, Line 5 of Algorithm 7.1, when node n is being expanded,
guarantees that it departs from cell l′ at a time later than when the agent arrives at cell l′.
The dynamic obstacle thus also arrives at cell l′ at a time later than when the agent departs
from cell l′ because its corresponding reserved interval does not intersect with the safe
interval n′.int for cell l′. Node n′ is thus a non-goal node due to Line 7 of Algorithm 7.2
(after n′ is popped from OPEN). If the dynamic obstacle arrives at cell l′ from another cell
l′′ in the opposite direction to the one in which the agent departs from cell l′ (toward cell
l′′), Line 5 of Algorithm 7.1 (the “Opposite Directions” case of Function Offset in Sec-
tion 7.3.3), when node n′ is being expanded, guarantees that the dynamic obstacle arrives
at cell l′′ at a time later than when the agent departs from cell l′, ensuring that the agent
does not collide with the dynamic obstacle until the agent departs from cell l′. Otherwise,
any collision thus remains when the agent departs from cell l′, which contradicts (4).
(iii) If the dynamic obstacle does not fall into Case (i) but also arrives at cell l′ at a time earlier
than when the agent arrives at cell l′, it must also depart from cell l′ earlier than when the
agent arrives at cell l′ because its corresponding reserved interval does not intersect with
the safe interval n′.int for cell l′. Any collision thus remains when the agent arrives at cell
l′, which contradicts (3).
(iv.a) If node n′ is a non-goal node and the dynamic obstacle does not fall into Case (ii.b) but also
arrives at cell l′ at a time later than when the agent arrives at cell l′, the dynamic obstacle
must arrive at cell l′ later than when the agent departs from cell l′ because its corresponding
reserved interval does not intersect with the safe interval n′.int for cell l′. We use the same
argument as for Case (ii.b): If the dynamic obstacle arrives at cell l′ from another cell l′′
in the opposite direction to the one in which the agent departs from cell l′ (toward cell l′′),
Line 5 (the “Opposite Directions” case of Function Offset in Section 7.3.3), when node n′
is being expanded, guarantees that the dynamic obstacle arrives at cell l′′ at a time later
than when the agent departs from cell l′, ensuring that the agent does not collide with
the dynamic obstacle until the agent departs from cell l′. Otherwise, any collision thus
remains when the agent departs from cell l′, which contradicts (4).
(iv.b) If node n′ is the goal node ngoal, no dynamic obstacle arrives at cell l′ at a time later than
when the agent arrives at cell l′ due to Line 7 of Algorithm 7.2 (after n′ is popped from
OPEN). No collision is possible until the agent has arrived at cell l′.
Therefore, the agent arrives at cell l′ = n′.cfg.cell with configuration n′.cfg without any colli-
sions.
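Throughout this proof, "the reserved interval does not intersect the safe interval n′.int" is an interval-containment check: the agent may occupy a cell during [t_arrive, t_depart] only if that interval lies entirely inside one of the cell's safe intervals (the maximal intervals not covered by any reservation). A minimal sketch of this check, with hypothetical names rather than the dissertation's implementation:

```python
def fits_safe_interval(safe_intervals, t_arrive, t_depart):
    """Return True if [t_arrive, t_depart] lies entirely inside one of the
    cell's safe intervals, i.e., it intersects no reserved interval."""
    return any(lo <= t_arrive and t_depart <= hi for lo, hi in safe_intervals)

# A cell reserved during [2.0, 3.5] has safe intervals [0, 2.0] and [3.5, inf).
safe = [(0.0, 2.0), (3.5, float("inf"))]
print(fits_safe_interval(safe, 0.5, 1.8))  # True: fits the first safe interval
print(fits_safe_interval(safe, 1.5, 4.0))  # False: overlaps the reservation
```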
Since all heuristics used by SIPPwRT are admissible as argued above, using the argument
in (Phillips & Likhachev, 2011) together with Theorem 7.1, it is straightforward to show that
SIPPwRT returns a time-minimal path to a given set VGoal of one or more endpoints that does
not collide with the paths of other agents in the token and is complete for the single-agent path-
planning problems for function calls Path1 and Path2. We can thus rely on the proof of Theorem
6.1 in (Ma, Li, et al., 2017) to show the following theorem.

Figure 7.7: The small simulated warehouse environment with 50 agents.
Theorem 7.2. TP-SIPPwRT is long-term robust for all well-formed MAPD problem instances.
7.5 Simulated Automated Warehouses
We demonstrate the benefits of TP-SIPPwRT for automated warehouses using both an agent
simulator with perfect path execution and a standard robot simulator with imperfect path execution
resulting from high-order dynamic constraints and motion noise that are not modeled by the MAPD
algorithms. Figure 7.7 shows an example on the agent simulator with 50 agents and cells of
size 1 m× 1 m. Blue gray cells in columns of gray cells are potential initial cells for the agents.
Colored disks mark the actual initial cells, which are drawn randomly from all potential initial
cells and are the non-task endpoints. All agents face north in their initial cells. Gray cells other
than the initial cells are task endpoints (that would house inventory pods in a warehouse even
though we do not model inventory pods here). The pickup and delivery cells of all tasks are
drawn randomly from all task endpoints. White cells are non-endpoints. This simulator is based
on the previous MAPD benchmarks introduced in Section 6.6 but uses a warehouse environment
that is closer to the layout design of a modern real-world Amazon Robotics automated
warehouse system (Wurman et al., 2008) and simulates some realistic constraints that are not
modeled in the previous MAPD benchmarks. For example, there are no blocked cells between
every two rows of task endpoints, robots carrying inventory pods move with a low translational
velocity and cannot move through cells that house inventory pods, and robots without inventory
pods move with a high translational velocity and are allowed to move through cells that house
inventory pods.
We now provide specifications of agent movements in the agent simulator. The agents model
circular warehouse robots. All agents use the same rotational velocity vrot. The following rules
impose restrictions on their legal movements and translational velocities: All free agents can
move with a high translational (free) velocity vtrans = vfree through all cells. All occupied
agents can move with a low translational (task) velocity vtrans = vtask through only the pickup
and delivery endpoints of their tasks and all other non-endpoints.
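The movement rules above can be condensed into a single check. The sketch below uses hypothetical data structures and example velocity values; it illustrates the rules of this section and is not the simulator's code:

```python
V_FREE = 1.00  # translational velocity of free agents (m/s), example value
V_TASK = 0.50  # translational velocity of occupied agents (m/s), one setting

def legal_velocity(occupied, cell, endpoints, own_task_endpoints):
    """Velocity at which an agent may traverse `cell`, or None if illegal.

    Free agents may cross every cell at V_FREE. Occupied agents may cross
    only non-endpoints and the pickup/delivery endpoints of their own task,
    at V_TASK.
    """
    if not occupied:
        return V_FREE
    if cell not in endpoints or cell in own_task_endpoints:
        return V_TASK
    return None

endpoints = {(0, 0), (0, 1), (2, 3)}   # all endpoint cells (hypothetical)
own = {(0, 0), (2, 3)}                 # this task's pickup and delivery cells
print(legal_velocity(False, (0, 1), endpoints, own))  # 1.0 (free agent)
print(legal_velocity(True, (0, 1), endpoints, own))   # None (foreign endpoint)
print(legal_velocity(True, (2, 3), endpoints, own))   # 0.5 (own delivery cell)
```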
7.6 Experiments
In this section, we describe the results of four experiments on a 2.50 GHz Intel Core i5-2450M
laptop with 6 GB RAM. First, we compare TP-SIPPwRT, which directly computes plan-execution
schedules during planning, with using MAPF-POST to transform the discrete MAPD solutions
produced by TP (see Section 6.4) and CENTRAL (see Section 6.5) into plan-execution schedules.
Second, we study how TP-SIPPwRT scales with the number of agents, the task frequency,
and the velocity of occupied agents. Third, we study how TP-SIPPwRT performs in a large simulated
automated warehouse system with hundreds of agents. Fourth, we evaluate TP-SIPPwRT
using a robot simulator.

Table 7.1: Results for TP-SIPPwRT, CENTRAL, and TP-A* in the small simulated warehouse environment.

| algorithm  | vtask | discrete service time | discrete makespan | service time | makespan | planning time (s) | post-processing time (s) | throughput |
|------------|-------|-----------------------|-------------------|--------------|----------|-------------------|---------------------------|------------|
| TP-SIPPwRT | 0.50  | –                     | –                 | 944.03       | 2,475.58 | 0.90              | –                         | 0.397      |
| TP-SIPPwRT | 0.75  | –                     | –                 | 601.69       | 1,755.22 | 0.92              | –                         | 0.552      |
| TP-SIPPwRT | 1.00  | –                     | –                 | 435.26       | 1,392.00 | 0.83              | –                         | 0.689      |
| CENTRAL    | 0.50  | 325.28                | 1,163             | 1,049.51     | 2,617.00 | 1,161.44          | 264.66                    | 0.370      |
| CENTRAL    | 0.75  | 325.28                | 1,163             | 691.90       | 1,895.68 | 1,161.44          | 254.36                    | 0.504      |
| CENTRAL    | 1.00  | 325.28                | 1,163             | 520.36       | 1,553.00 | 1,161.44          | 269.91                    | 0.609      |
| TP-A*      | 0.50  | 329.83                | 1,204             | 1,026.23     | 2,628.22 | 1.00              | 267.38                    | 0.373      |
| TP-A*      | 0.75  | 329.83                | 1,204             | 675.65       | 1,909.45 | 1.00              | 295.54                    | 0.508      |
| TP-A*      | 1.00  | 329.83                | 1,204             | 505.81       | 1,570.77 | 1.00              | 278.74                    | 0.609      |
7.6.1 Experiment 1: MAPD Algorithms and Task Velocity
We compared TP-SIPPwRT for vtask = 0.50, 0.75, and 1.00 m/s on the agent simulator in
the small simulated warehouse environment of Figure 7.7 to two (discrete) MAPD algorithms
that both assume discrete agent movements with uniform velocity to the four neighboring cells,
namely the original TP (labeled as TP-A*) and CENTRAL (see Chapter 6).
An alternative method to include kinematic constraints for the simulated warehouse environ-
ment is to use MAPF-POST (Honig, Kumar, Cohen, et al., 2016; Honig, 2019; Honig, Kumar,
Ma, et al., 2016) (see also Section 3.3) in a post-processing step by treating MAPD solutions
with discrete agent movements as solutions for one-shot coordination problems such as MAPF
and TAPF. To do so, MAPF-POST first converts the discrete MAPD solutions produced by TP
and CENTRAL from containing movements in the four compass directions to containing for-
ward movements and point turns and then adapts the discrete MAPD solutions in polynomial
time to continuous agent movements with given velocities.

[Three plots: (a) vtask = 0.50 m/s, (b) vtask = 0.75 m/s, (c) vtask = 1.00 m/s; one curve each for TP-SIPPwRT, CENTRAL, and TP-A*.]
Figure 7.8: Number of tasks executed by 30 agents per second in a moving 100-second window (t − 100, t] for TP-SIPPwRT, CENTRAL, and TP-A* as a function of time t for different task velocities.

We point out the differences in the assumptions made by TP-SIPPwRT and MAPF-POST:
• MAPF-POST allows agents to change their translational velocities at an auxiliary location
between the centers of two cells but TP-SIPPwRT allows agents to do so only at the center
of a cell.
• MAPF-POST allows a free (respectively, occupied) agent to move with a translational
velocity lower than vfree (respectively, vtask) and rotate with a rotational velocity lower than
vrot, but TP-SIPPwRT allows it to move only with translational velocity vfree (respectively,
vtask) and rotate with rotational velocity vrot.
• MAPF-POST guarantees safety distances of only at most L/√2 = 0.71 m between agents
for a 2D 4-neighbor grid with cells of size 1 m × 1 m each and, in this case, requires that
all agents have the same radius, but TP-SIPPwRT can guarantee safety distances of at
most 1 m. We thus used the same radius of R = 0.5L/√2 = 0.35 m for all agents in this
experiment.
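As a quick check of the arithmetic behind the chosen radius:

```python
import math

L = 1.0                               # cell size (m)
safety = L / math.sqrt(2)             # MAPF-POST's guaranteed safety distance
R = 0.5 * safety                      # common agent radius used in Experiment 1
print(round(safety, 2), round(R, 2))  # 0.71 0.35
```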
For Experiment 1, we used a runtime limit of 5 minutes per instance. We used 30 agents
since CENTRAL, the most runtime-intensive of our MAPD algorithms, can handle only slightly
more than 30 agents without any timeouts in this environment. We used vfree = 1.00 m/s and
vrot = π/2 = 1.57 rad/s. We generated one sequence of 1,000 tasks by randomly choosing
their pickup and delivery vertices from all task endpoints. We used a task frequency of 2 tasks
per second (the number of tasks added, in order, from the sequence to the system at the beginning
of every second).
Figure 7.8 visualizes the throughput at time t (number of tasks executed per second in the
100-second window (t − 100, t]), measured in tasks per second, as a function of t, measured in
seconds. It does not visualize the number of tasks added per second since the number is much
larger than the number of tasks executed per second for most t. The throughput at time t of
TP-SIPPwRT decreases earlier than the ones of TP-A* and CENTRAL because fewer still
unexecuted tasks are available toward the end for TP-SIPPwRT than for them. Thus, TP-SIPPwRT
is more effective than them.
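The moving-window throughput plotted in Figure 7.8 can be computed directly from the task completion times; a small sketch with hypothetical completion times:

```python
def window_throughput(completion_times, t, window=100.0):
    """Tasks executed per second in the moving window (t - window, t]."""
    count = sum(1 for c in completion_times if t - window < c <= t)
    return count / window

completions = [10.0, 50.0, 120.0, 130.0, 180.0]
print(window_throughput(completions, 150.0))  # 2 tasks in (50, 150] -> 0.02
```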
Table 7.1 reports the discrete service time (according to the discrete MAPD solution), dis-
crete makespan (according to the discrete MAPD solution), service time, makespan, planning
time (runtime of the MAPD algorithm), and post-processing time (runtime of MAPF-POST),
as well as the average throughput over all times t whose throughputs are positive, measured in
number of tasks per second. (Inapplicable entries are dashed.) The service time, makespan,
and throughput measure effectiveness, while the planning and post-processing times measure
efficiency. The planning time of TP-SIPPwRT is less than one second for 30 agents and 1,000
tasks. It is on par with the one of TP-A* and smaller than the one of CENTRAL. Furthermore,
TP-SIPPwRT does not have any post-processing time while both TP-A* and CENTRAL have
post-processing times of more than 250 seconds. Thus, TP-SIPPwRT is more efficient than
them. The service time and makespan of TP-SIPPwRT are smaller than the ones of TP-A* and
CENTRAL, while its throughput is larger. Therefore, TP-SIPPwRT is more effective than them.
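Under the reading that a task's service time is measured from the moment it is added to the system and the makespan is the completion time of the last task, the two effectiveness measures in the table can be sketched as:

```python
def service_time_and_makespan(tasks):
    """tasks: list of (time_added, time_completed) pairs, in seconds.
    Returns the average service time and the makespan, assuming a task's
    service time is measured from the moment it enters the system."""
    avg_service = sum(done - added for added, done in tasks) / len(tasks)
    makespan = max(done for _, done in tasks)
    return avg_service, makespan

tasks = [(0.0, 30.0), (1.0, 61.0)]  # hypothetical task log
print(service_time_and_makespan(tasks))  # (45.0, 61.0)
```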
7.6.2 Experiment 2: Number of Agents, Task Frequency, and Task Velocity
We ran TP-SIPPwRT with the same setup as in Experiment 1 (including the same sequence
of 1,000 tasks) for vtask = 0.50, 0.75, and 1.00 m/s, 10, 20, 30, 40, and 50 agents, and task
frequencies of 1, 2, 5, and 10 tasks per second. Table 7.2 shows that the planning time of TP-
SIPPwRT is less than one second for up to 50 agents and 1,000 tasks. As expected, the service
time decreases as the task frequency decreases; the service time and makespan decrease and
the throughput increases as the number of agents increases; and the service time and makespan
decrease and the throughput increases as the task velocity increases.
[Three plots: (a) vtask = 0.50 m/s, (b) vtask = 0.75 m/s, (c) vtask = 1.00 m/s; curves for 100, 150, 200, and 250 agents plus the number of tasks added (gray).]
Figure 7.9: Number of tasks added (gray) and executed per second in a moving 100-second window (t − 100, t] for TP-SIPPwRT as a function of time t for different numbers of agents.
Table 7.2: Results for TP-SIPPwRT in the small simulated warehouse environment. For each task velocity vtask (in m/s), the four values are service time / makespan / planning time (s) / throughput.

| agents | task frequency | vtask = 0.50                       | vtask = 0.75                       | vtask = 1.00                       |
|--------|----------------|------------------------------------|------------------------------------|------------------------------------|
| 10     | 1              | 2,809.72 / 6,771.00 / 0.84 / 0.146 | 1,834.97 / 4,764.28 / 0.86 / 0.213 | 1,357.21 / 3,818.00 / 0.72 / 0.270 |
| 10     | 2              | 3,029.59 / 6,759.41 / 0.85 / 0.157 | 2,077.68 / 4,768.89 / 0.84 / 0.215 | 1,584.62 / 3,784.00 / 0.73 / 0.274 |
| 10     | 5              | 3,181.97 / 6,789.41 / 0.86 / 0.155 | 2,185.29 / 4,748.33 / 0.86 / 0.225 | 1,710.76 / 3,763.71 / 0.75 / 0.274 |
| 10     | 10             | 3,215.43 / 6,775.00 / 0.84 / 0.159 | 2,252.70 / 4,762.45 / 0.88 / 0.219 | 1,750.19 / 3,749.00 / 0.75 / 0.280 |
| 20     | 1              | 1,228.35 / 3,557.58 / 0.90 / 0.295 | 745.48 / 2,540.33 / 0.89 / 0.411   | 502.50 / 2,000.71 / 0.76 / 0.511   |
| 20     | 2              | 1,450.40 / 3,503.00 / 0.89 / 0.298 | 966.27 / 2,493.67 / 0.91 / 0.392   | 714.20 / 1,980.00 / 0.79 / 0.489   |
| 20     | 5              | 1,591.79 / 3,519.83 / 0.89 / 0.292 | 1,088.86 / 2,481.85 / 0.88 / 0.416 | 844.32 / 1,966.00 / 0.81 / 0.507   |
| 20     | 10             | 1,661.62 / 3,502.83 / 0.88 / 0.290 | 1,136.08 / 2,479.45 / 0.90 / 0.417 | 892.22 / 1,964.00 / 0.81 / 0.507   |
| 30     | 1              | 723.03 / 2,482.41 / 0.94 / 0.396   | 389.15 / 1,763.50 / 0.91 / 0.551   | 222.21 / 1,431.71 / 0.82 / 0.672   |
| 30     | 2              | 944.03 / 2,475.58 / 0.90 / 0.397   | 601.69 / 1,755.22 / 0.92 / 0.552   | 435.26 / 1,392.00 / 0.83 / 0.689   |
| 30     | 5              | 1,079.62 / 2,435.83 / 0.90 / 0.398 | 728.33 / 1,724.18 / 0.92 / 0.555   | 563.14 / 1,372.71 / 0.83 / 0.688   |
| 30     | 10             | 1,126.47 / 2,468.00 / 0.93 / 0.393 | 779.67 / 1,737.00 / 0.92 / 0.550   | 612.06 / 1,380.00 / 0.84 / 0.065   |
| 40     | 1              | 484.93 / 2,023.58 / 0.90 / 0.484   | 225.18 / 1,471.12 / 0.95 / 0.657   | 101.16 / 1,252.00 / 0.85 / 0.765   |
| 40     | 2              | 701.23 / 1,945.00 / 0.94 / 0.503   | 432.11 / 1,430.33 / 0.95 / 0.674   | 298.04 / 1,122.71 / 0.89 / 0.847   |
| 40     | 5              | 830.73 / 2,054.00 / 0.89 / 0.470   | 563.25 / 1,368.67 / 0.94 / 0.693   | 427.23 / 1,073.00 / 0.87 / 0.870   |
| 40     | 10             | 880.46 / 1,905.00 / 0.88 / 0.506   | 605.10 / 1,382.67 / 0.94 / 0.686   | 469.92 / 1,095.71 / 0.89 / 0.853   |
| 50     | 1              | 331.66 / 1,680.41 / 0.98 / 0.641   | 122.98 / 1,262.00 / 0.98 / 0.771   | 63.45 / 1,140.41 / 0.96 / 0.845    |
| 50     | 2              | 557.10 / 1,676.58 / 0.97 / 0.581   | 335.58 / 1,192.51 / 0.96 / 0.804   | 219.99 / 968.00 / 0.92 / 0.976     |
| 50     | 5              | 683.56 / 1,674.41 / 0.97 / 0.573   | 454.42 / 1,153.51 / 0.93 / 0.814   | 344.44 / 931.00 / 0.91 / 0.992     |
| 50     | 10             | 729.07 / 1,644.41 / 0.97 / 0.582   | 502.03 / 1,200.94 / 0.98 / 0.784   | 389.78 / 926.00 / 0.93 / 0.996     |
Table 7.3: Results for TP-SIPPwRT in the large simulated warehouse environment. For each task velocity vtask (in m/s), the four values are service time / makespan / planning time (s) / throughput.

| agents | vtask = 0.50                       | vtask = 0.75                       | vtask = 1.00                       |
|--------|------------------------------------|------------------------------------|------------------------------------|
| 100    | 877.94 / 2,891.58 / 5.72 / 0.703   | 489.46 / 2,130.67 / 5.83 / 0.914   | 289.80 / 1,671.00 / 5.15 / 1.157   |
| 150    | 525.07 / 2,269.58 / 6.49 / 0.878   | 253.49 / 1,602.00 / 6.67 / 1.204   | 122.69 / 1,396.71 / 5.57 / 1.369   |
| 200    | 353.46 / 1,905.58 / 7.08 / 1.026   | 154.76 / 1,504.63 / 7.35 / 1.314   | 117.21 / 1,276.12 / 9.50 / 1.504   |
| 250    | 267.07 / 1,762.24 / 9.35 / 1.127   | 147.90 / 1,271.67 / 12.83 / 1.521  | 132.45 / 1,297.00 / 15.76 / 1.479  |
Figure 7.10: The large simulated warehouse environment with 250 agents.
7.6.3 Experiment 3: Scalability, Number of Agents, and Task Velocity
To evaluate how TP-SIPPwRT scales in the number of agents, we ran TP-SIPPwRT with the
same setup as in Experiment 1 but in the large simulated warehouse environment of Figure
7.10 for 100, 150, 200, and 250 agents and vtask = 0.50, 0.75, and 1.00 m/s. We used one
sequence of 2,000 tasks and a task frequency of 2 tasks per second. Figure 7.9 visualizes the
throughput at time t. Table 7.3 shows that the planning time of TP-SIPPwRT is less than 16
seconds for up to 250 agents and 2,000 tasks, justifying our claim that it can compute paths
for hundreds of agents and thousands of tasks in seconds. Similarly to before, the service time
and makespan decrease and the throughput and planning time increase as the number of agents
increases; and the service time and makespan decrease and the throughput increases as the task
velocity increases. There is an exception due to the congestion resulting from many agents for
250 agents and vtask = 1.00 m/s.
7.6.4 Experiment 4: Robot Simulator
We created a custom model of the kinodynamic constraints of a differential-drive Create2 robot
from iRobot for the robot simulator V-REP (Rohmer, Singh, & Freese, 2013). Create2 robots
have a cylindrical shape with radius 0.175 m and can reach a translational velocity of 0.5 m/s
and a rotational velocity of 4.2 rad/s. We used vfree = 0.40 m/s, vtask = 0.20 m/s, vrot =
π = 3.14 rad/s, and R = 0.40 m as conservative values to allow the robots to follow their paths
safely despite high-order dynamic constraints and motion noise that are not modeled by TP-SIPPwRT.
We used an off-the-shelf PID controller (Honig, 2019), which has also been used in recent research to
verify MAPF-POST for MAPF (Honig, Kumar, Cohen, et al., 2016) and TAPF (Honig, Kumar,
Ma, et al., 2016). The controller uses [x, y, θ]T (given by V-REP) as the current state and the
desired next cell with the associated desired arrival time (given by TP-SIPPwRT) as the goal state. The PID
controller corrects for heading errors by orienting the robot to face the desired next cell while
simultaneously adjusting the translational velocity to let the robot arrive at the desired next cell at
the desired arrival time. We limit our experiment to the small warehouse environment of Figure
7.11 with 10 robots due to the slow runtime of V-REP. We used one sequence of 20 tasks and a
task frequency of 2 tasks per second. The planning time of TP-SIPPwRT is 2 ms. All robots
follow their paths safely, resulting in a service time of 90.57 s and a makespan of 171.16 s.

Figure 7.11: Screenshots for Experiment 4 at t = 35 s. Left: Agent simulator. Right: Robot simulator.
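The behavior of such a controller can be illustrated with a simple proportional law on the heading error and the remaining time budget. This sketch uses hypothetical gains and is only an illustration, not the off-the-shelf PID controller used in the experiment:

```python
import math

def control(x, y, theta, goal_x, goal_y, t_now, t_arrive,
            k_rot=2.0, v_max=0.5):
    """Return (translational velocity, rotational velocity) that steers a
    differential-drive robot toward the desired next cell so that it arrives
    roughly at the desired arrival time (proportional terms only)."""
    heading = math.atan2(goal_y - y, goal_x - x)
    err = (heading - theta + math.pi) % (2 * math.pi) - math.pi  # wrap to (-pi, pi]
    omega = k_rot * err                      # turn toward the desired next cell
    dist = math.hypot(goal_x - x, goal_y - y)
    dt = max(t_arrive - t_now, 1e-6)
    v = min(dist / dt, v_max)                # speed needed to arrive on time, capped
    return v, omega

v, w = control(0.0, 0.0, 0.0, 1.0, 0.0, t_now=0.0, t_arrive=2.5)
print(v, w)  # 0.4 0.0: drive straight at 0.4 m/s to cover 1 m in 2.5 s
```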
7.7 Summary
In this chapter, we studied MAPD with kinematic constraints of real-world agents in a simu-
lated system. We demonstrated how to adapt the MAPD algorithm TP to these constraints by
using the novel one-shot single-agent path-planning algorithm SIPPwRT that computes paths
with continuous agent movements with given velocities. The resulting algorithm TP-SIPPwRT
takes kinematic constraints of real-world agents into account directly during planning to com-
pute plan-execution schedules for them and provides guaranteed safety distances between them.
Theoretically, we showed that TP-SIPPwRT remains long-term robust for well-formed
MAPD problem instances. Experimentally, we demonstrated that TP-SIPPwRT can compute
a plan-execution schedule for hundreds of agents and thousands of tasks in seconds. We also
described, as a baseline method, how MAPF-POST can be used in a post-processing step to
include kinematic constraints of real-world agents to transform MAPD solutions with discrete
agent movements into plan-execution schedules. In our experiments, we showed that using
MAPF-POST to post-process discrete MAPD solutions is less efficient and effective than letting
TP-SIPPwRT compute plan-execution schedules directly. We showcased the benefits of both
methods for real-world applications of multi-agent systems in a simulated automated warehouse
system using both an agent simulator and a standard robot simulator.
To summarize, in this chapter, we validated the hypothesis that MAPD algorithms can poten-
tially be applied to and thus benefit the long-term coordination of autonomous target-assignment
and path-planning operations for real-world applications of multi-agent systems.
Chapter 8
Conclusions
In this chapter, we summarize the contributions of this dissertation and their direct impact on
multi-agent target-assignment and path-planning research. We then present potential future re-
search directions.
8.1 Contributions
In many real-world applications of multi-agent systems, teams of autonomous agents must as-
sign targets among themselves (target-assignment operations) and plan collision-free paths to
their targets (path-planning operations) in order to finish tasks cooperatively. The coordina-
tion of target-assignment and path-planning operations is a fundamental building block for these
multi-agent systems. It requires solving either the one-shot or the long-term combined target-
assignment and path-planning problem. Recent research in the AI community has focused on
tackling only the one-shot path-planning problem MAPF. However, none of the existing results
directly contributes to a theoretical understanding of or results in algorithms for either the one-
shot or the long-term combined target-assignment and path-planning problem.
While MAPF techniques can potentially be applied to the one-shot and the long-term com-
bined target-assignment and path-planning problems, three central questions remain: Question
1: How hard is it to jointly assign targets to and plan paths for teams of agents? Question 2:
How and how well can one jointly assign targets to and plan paths for teams of agents? Question
3: How do teams of agents execute the computed solutions?
In this dissertation, we addressed the above central questions by evaluating the following
hypothesis:
Formalizing and studying new variants of MAPF can result in new theoretical in-
sights into or new algorithms for the one-shot and the long-term combined target-
assignment and path-planning problems for teams of agents, which can benefit real-
world applications of multi-agent systems.
We presented the following contributions to validate this hypothesis:
• Contribution 1: In Chapter 4, we introduced a unified NP-hardness proof structure that
can be used to derive complexity results for many target-assignment and path-planning
problems. This unified NP-hardness proof structure stems from formalizing and study-
ing PERR, a variant of MAPF, and its generalization K-PERR. We demonstrated how
to derive NP-hardness and fixed-parameter inapproximability results for both PERR and
K-PERR using this unified NP-hardness proof structure. This unified NP-hardness proof
structure lays the theoretical foundation for studying the one-shot and the long-term com-
bined target-assignment and path-planning problems, namely TAPF and MAPD that are
addressed in this dissertation, and many other target-assignment and path-planning prob-
lems. As a consequence of the theoretical understanding of the relationships between
various target-assignment and path-planning problem formulations (see Chapter 2), we
demonstrated how the theoretical results for PERR and K-PERR can be carried over to
other problems. For example, we improved the state-of-the-art complexity results for
makespan minimization for MAPF. We also used the unified NP-hardness proof structure
to prove that TAPF is NP-hard to approximate within any constant factor less than 4/3,
and that MAPD is NP-hard to solve optimally for service time minimization and NP-hard
to approximate within any constant factor less than 4/3 for makespan minimization.
• Contribution 2: In Chapter 5, we formalized and studied TAPF, a variant of MAPF that
models the one-shot combined target-assignment and path-planning problem for multiple
teams of agents. We presented CBM, a hierarchical algorithm that exploits the combinato-
rial structure of TAPF by breaking it down to the polynomial-time solvable sub-problems
of coordinating the agents in every team on the low-level, which are solved by a min-cost
max-flow algorithm, and the NP-hard sub-problem of coordinating multiple teams on the
high level, which is solved by a best-first tree-search algorithm. We proved that CBM
is complete and optimal for TAPF. We also presented an ILP-based TAPF algorithm us-
ing a multi-commodity flow-based ILP encoding. Experimental results showed that CBM
runs faster than the ILP-based TAPF algorithm and can compute solutions for hundreds
of agents in minutes of runtime. These TAPF algorithms can use an existing polynomial-
time procedure in a post-processing step to transform their solutions into plan-execution
schedules that take kinematic constraints of real-world agents into account.
• Contribution 3: In Chapter 6, we formalized and studied MAPD, a variant of MAPF
that models the long-term combined target-assignment and path-planning problem. We
demonstrated how MAPD algorithms can utilize environmental characteristics of well-
formed MAPD problem instances to guarantee long-term robustness. We presented two
decoupled MAPD algorithms, TP and TPTS, and one centralized MAPD algorithm, CEN-
TRAL. They reduce, in different ways, the long-term problem MAPD to a sequence of
one-shot target-assignment and path-planning sub-problems, which can then be solved
by different one-shot target-assignment and path-planning algorithms. Theoretically, we
proved that all these MAPD algorithms are long-term robust for well-formed problem in-
stances. Experimentally, we compared them in a simulated warehouse environment and
provided guidelines for identifying when each one should be used. In particular, TP re-
mains efficient for MAPD problem instances with hundreds of agents and tasks and thus
has the potential to be deployed in large-scale multi-agent systems.
• Contribution 4: In Chapter 7, we studied MAPD with kinematic constraints of real-
world agents. We demonstrated how MAPD algorithms can take kinematic constraints of
real-world agents into account to produce plan-execution schedules for the agents. We
presented TP-SIPPwRT, an improvement to TP that uses the novel one-shot single-agent
path-planning algorithm SIPPwRT to compute paths with continuous agent movements
with given velocities. As a result, TP-SIPPwRT takes kinematic constraints of real-world
agents into account directly during planning to compute plan-execution schedules for them
and provides guaranteed safety distances between them. We experimentally evaluated TP-
SIPPwRT in a simulated automated warehouse system using both an agent simulator and a robot
simulator and demonstrated that TP-SIPPwRT can compute solutions for hundreds of agents and
thousands of tasks in seconds and is more efficient and effective than the baseline method
that uses an existing polynomial-time procedure in a post-processing step to transform
discrete MAPD solutions into plan-execution schedules.
Therefore, Contribution 1 validates that formalizing and studying new variants of MAPF can
result in new theoretical insights into the one-shot and the long-term combined target-assignment
and path-planning problems for teams of agents. Contribution 2 validates that formalizing and
studying new variants of MAPF can result in algorithms for the one-shot combined target-
assignment and path-planning problems for teams of agents that have the potential to be applied
to and can thus benefit real-world applications of multi-agent systems. Contribution 3 validates
that formalizing and studying new variants of MAPF can result in algorithms for the long-term
combined target-assignment and path-planning problems for teams of agents. Contribution 4
validates that the algorithms we presented for the long-term combined target-assignment and
path-planning problems for teams of agents have the potential to be applied to and can thus
benefit real-world applications of multi-agent systems.
8.2 Direct Impact
Prior to this dissertation, the AI community had applied MAPF techniques only to the one-shot
path-planning problem. Our recent research on MAPF-POST (Honig, Kumar, Cohen, et al.,
2016) demonstrated the potential benefit of MAPF techniques for real-world applications of
multi-agent systems. However, none of the existing results on MAPF directly solves the one-shot
or the long-term combined target-assignment and path-planning problem, which models the
coordination of both the target-assignment and path-planning operations of the agents as required
by many real-world applications of multi-agent systems. This dissertation moves MAPF
techniques one step forward toward establishing theoretical foundations of both the one-shot and the
long-term coordination of target-assignment and path-planning operations of teams of agents and
developing algorithmic solutions for addressing these coordination problems. We now demon-
strate how the contributions presented in this dissertation have already had a direct impact on
state-of-the-art multi-agent target-assignment and path-planning research. In the following, we
list examples of a subset of such research that has been published in top-tier AI and robotics
conferences and high-impact robotics journals (AAAI, IJCAI, AAMAS, ICAPS, ICRA, IROS,
and RA-L) between 2017 and 2019.
• Research that presents new algorithms for TAPF or MAPD:
– “Conflict-Based Search with Optimal Task Assignment” (Honig, Kiesel, Tinka,
Durham, & Ayanian, 2018). This work uses the TAPF problem formulation and
presents a new algorithm that is optimal for the flowtime objective. It is part of
Hoenig’s Ph.D. dissertation (Honig, 2019).
– “A Multi-Label A* Algorithm for Multi-Agent Pathfinding” (Grenouilleau et al.,
2019). This work uses the MAPD problem formulation and improves upon TP by
finding a path that moves an agent first to the pickup vertex and then to the delivery
vertex of its assigned task in one A* search.
– “Priority Inheritance with Backtracking for Iterative Multi-agent Path Finding”
(Okumura et al., 2019). This work uses the MAPD problem formulation and presents
a new prioritized planning algorithm that can solve both MAPF and MAPD.
• Research that studies new variants of TAPF or MAPD:
– “Generalized Target Assignment and Path Finding Using Answer Set Programming”
(Nguyen et al., 2017). This work studies a generalized version of TAPF where each
agent may need to visit more than one target. The problem can also be considered
as an offline (and thus one-shot) version of MAPD. It can be used for long-term
coordination if lookahead computation is allowed.
– “Target Assignment and Path Planning for Multi-Agent Pickup and Delivery” (Liu,
Ma, et al., 2019). Similar to the above work, this work studies an offline version of
MAPD where all tasks are known. It presents a new way of utilizing environmental
characteristics of well-formed MAPD problem instances.
– “An Optimal Algorithm to Solve the Combined Task Allocation and Path Finding
Problem” (Henkel et al., 2019). This work also studies an offline version of MAPD
and presents an optimal search-based algorithm that scales well to four agents and
four tasks.
– “Online Multi-Agent Pathfinding” (Svancara, Vlk, Stern, Atzmon, & Bartak, 2019).
This work studies the long-term path-planning problem where agents appear at un-
known times with preassigned targets.
• Research that studies MAPD with realistic simulations:
– “A Hierarchical Framework for Coordinating Large-Scale Robot Networks” (Liu,
Zhou, et al., 2019). This work studies MAPD in a simulated warehouse system
where the solutions need to be executed in a distributed setting and communication
between warehouse robots needs to be taken into account.
– “Persistent and Robust Execution of MAPF Schedules in Warehouses” (Honig et al.,
2019). This work studies MAPD on a real-time automated warehouse simulator and
develops execution monitoring schemes that handle motion uncertainty. It is also
part of Hoenig’s Ph.D. dissertation (Honig, 2019).
8.3 Limitations and Future Directions
Some promising future directions of this dissertation have already been pointed out in the publi-
cations mentioned in the above section. In the following, we highlight important directions that
result from the limitations of this dissertation and have not yet been addressed by the state of the
art.
• This dissertation does not study the computational complexity of versions of Anonymous
MAPF and 1-PERR that minimize the flowtime. Instead, their complexities have been
posed as open questions in Chapter 4. Future work should thus develop a better theoretical
and algorithmic understanding of Anonymous MAPF and 1-PERR for flowtime
minimization.
• Few theoretical insights and results are known for MAPD besides the complexity results presented in Chapter 4 and the well-formed properties presented in Chapter 6. In particular, the MAPD algorithms presented in Chapters 6 and 7 do not provide any effectiveness guarantees. Future work should thus develop a better theoretical and algorithmic understanding of MAPD. For example, MAPD can be studied from the point of view of competitive analysis, which might result in algorithms that provide effectiveness guarantees or allow one to conclude that no competitive algorithms exist.
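As a concrete handle on what such an effectiveness guarantee could look like, the standard notion from competitive analysis (the general definition, not a result of this dissertation) reads as follows: an online MAPD algorithm ALG is c-competitive if there exists a constant b, independent of the input, such that for every task sequence σ

```latex
\mathrm{cost}_{\mathrm{ALG}}(\sigma) \;\le\; c \cdot \mathrm{cost}_{\mathrm{OPT}}(\sigma) + b,
```

where OPT is an optimal offline algorithm that knows all of σ in advance and cost could, for example, be the makespan or the total service time. Proving such a bound for a MAPD algorithm would yield exactly the kind of effectiveness guarantee discussed above, while a matching lower-bound construction would show that no competitive algorithm exists.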
• The MAPD algorithms presented in Chapters 6 and 7 do not consider dependencies between tasks, task constraints, or different priorities of tasks, which makes them unrealistic for some real-world applications. For example, in an automated warehouse system, an inventory pod cannot be simultaneously delivered to multiple inventory stations that need the pod (that is, some tasks cannot be executed in parallel), some tasks might have deadline constraints (that is, some tasks must be finished before a deadline), and some tasks might have a higher priority than others (for example, some tasks have a hard deadline but others do not). Future work should thus investigate how such task dependencies, constraints, and priorities can be incorporated into MAPD.
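One lightweight way to represent such constraints is sketched below (a hypothetical illustration; the Task fields and the dispatch rule are assumptions, not part of the MAPD algorithms in Chapters 6 and 7): each task carries a priority and a deadline, and a task becomes executable only once all tasks it depends on have finished.

```python
import heapq
from dataclasses import dataclass, field

# Hypothetical sketch: a task record with a priority and a deadline, plus a
# dispatch queue that releases a task only after its dependencies finished.

@dataclass(order=True)
class Task:
    priority: int              # smaller value = more urgent (hard deadlines first)
    deadline: int              # latest timestep by which the task must finish
    task_id: int = field(compare=False)
    depends_on: frozenset = field(compare=False, default=frozenset())

def next_executable_task(queue, finished):
    """Pop the most urgent task whose dependencies have all finished."""
    deferred = []
    result = None
    while queue:
        task = heapq.heappop(queue)
        if task.depends_on <= finished:
            result = task
            break
        deferred.append(task)          # dependency not yet met; re-queue below
    for task in deferred:
        heapq.heappush(queue, task)
    return result
```

A MAPD algorithm could call `next_executable_task` whenever an agent becomes free; the dependency check serializes tasks that share a pod, and the priority ordering serves hard-deadline tasks first.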
• In this dissertation, we have assumed that targets/tasks are either completely known (for the one-shot problems MAPF, PERR, and TAPF) or completely unknown (for the long-term problem MAPD). However, in many real-world applications, some information about unknown targets/tasks (for example, the frequency with which each inventory pod is needed by an inventory station in an automated warehouse system) can be obtained from historical data using data-driven techniques. Future work should thus investigate how real-world data can be used to predict when and where a future task will appear, which might improve the effectiveness of existing target-assignment and path-planning algorithms.
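A minimal form of such a data-driven prediction can be sketched as simple frequency estimation (a hypothetical illustration; the log format and function names are assumptions, not from this work): count how often each pod appears in a historical log of task records and rank pods so that frequently needed pods can be staged near stations.

```python
from collections import Counter

# Hypothetical sketch: estimate per-pod request rates from a historical log
# of (timestep, pod_id) task records observed over a fixed horizon.

def pod_request_rates(history, horizon):
    """Return requests-per-timestep for each pod over a log of given horizon."""
    counts = Counter(pod_id for _timestep, pod_id in history)
    return {pod_id: n / horizon for pod_id, n in counts.items()}

def hottest_pods(history, horizon, k):
    """The k pods most likely to be needed next, by historical request rate."""
    rates = pod_request_rates(history, horizon)
    return sorted(rates, key=rates.get, reverse=True)[:k]
```

More sophisticated predictors (for example, time-dependent or learned models) would follow the same interface: turn past task records into an estimate of when and where the next tasks will appear.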
• Many state-of-the-art algorithmic results for MAPF can be applied to TAPF. In this dissertation, we have focused on formalizing TAPF and presenting the first algorithmic results for it, but we have not focused on improving the efficiency of the TAPF algorithms presented in Chapter 5. Recent research (Felner et al., 2018; Li, Boyarski, et al., 2019; Li, Gange, et al., 2020; Li, Harabor, Stuckey, Felner, et al., 2019; Li, Harabor, Stuckey, Ma, & Koenig, 2019) has presented improved versions of CBS that speed up the high-level search of CBS as well as bounded-suboptimal versions of CBS (Barer et al., 2014; Cohen et al., 2016) that sacrifice optimality for efficiency. Future work should thus investigate whether and, if so, how these results can make CBM more efficient.
• Many state-of-the-art algorithmic results on MAPF can be applied to MAPD. For example, in Chapter 7, we have extended TP to TP-SIPPwRT, which takes kinematic constraints into account directly during planning. However, we have not investigated whether and, if so, how CENTRAL can also take kinematic constraints into account directly during planning. Recent research (Andreychuk, Yakovlev, Atzmon, & Stern, 2019) has presented a version of CBS that computes MAPF solutions with continuous time on 2^k-neighbor grids (where edges have non-uniform costs). Future work should thus investigate whether and, if so, how this algorithm can be used to let CENTRAL take kinematic constraints into account to compute plan-execution schedules directly.
• In this dissertation, we have only considered some of the kinodynamic constraints (which we refer to as kinematic constraints) of real-world agents (see Section 1.1), including their kinematic and first-order dynamic constraints. We have ignored high-order dynamic constraints, such as finite acceleration and deceleration limits, and have assumed that these constraints are handled by the controllers of real-world agents. While these assumptions have been verified to work on many real-world agents, such as differential-drive robots, they can make it unsafe for real-world agents with complex dynamics, such as quadcopters, to follow the plan-execution schedules produced by MAPF-POST or TP-SIPPwRT. Future work should thus investigate whether and, if so, how our target-assignment and path-planning algorithms can take high-order dynamic constraints of real-world agents into account.
Bibliography
Agmon, N., Urieli, D., & Stone, P. (2011). Multiagent patrol generalized to complex environmental conditions. In AAAI Conference on Artificial Intelligence (pp. 1090–1095). (Cit. on p. 1).
Ahmadi, M., & Stone, P. (2006). A multi-robot system for continuous area sweeping tasks. In IEEE International Conference on Robotics and Automation (pp. 1724–1729). (Cit. on pp. 1, 121).
Andreychuk, A., Yakovlev, K., Atzmon, D., & Stern, R. (2019). Multi-agent pathfinding with continuous time. In International Joint Conference on Artificial Intelligence (pp. 39–45). (Cit. on p. 195).
Aronson, J. E. (1989). A survey of dynamic network flows. Annals of Operations Research, 20(1-4), 1–66. (Cit. on p. 51).
Atzmon, D., Stern, R., Felner, A., Wagner, G., Bartak, R., & Zhou, N. (2018). Robust multi-agent path finding. In International Symposium on Combinatorial Search (pp. 2–9). (Cit. on p. 44).
Azar, Y., Naor, J., & Rom, R. (1995). The competitiveness of on-line assignments. Journal of Algorithms, 18(2), 221–237. (Cit. on pp. 6, 8, 15, 121, 137).
Balch, T., & Arkin, R. C. (1998). Behavior-based formation control for multirobot teams. IEEE Transactions on Robotics and Automation, 14(6), 926–939. (Cit. on p. 1).
Banfi, J., Basilico, N., & Amigoni, F. (2017). Intractability of time-optimal multirobot path planning on 2D grid graphs with holes. IEEE Robotics and Automation Letters, 2(4), 1941–1947. (Cit. on p. 39).
Barer, M., Sharon, G., Stern, R., & Felner, A. (2014). Suboptimal variants of the conflict-based search algorithm for the multi-agent pathfinding problem. In International Symposium on Combinatorial Search (pp. 19–27). (Cit. on pp. 51, 195).
Becker, R., Zilberstein, S., Lesser, V., & Goldman, C. V. (2004). Solving transition independent decentralized Markov Decision Processes. Journal of Artificial Intelligence Research, 22(1), 423–455. (Cit. on p. 44).
Bellman, R. (1958). On a routing problem. Quarterly of Applied Mathematics, 16(1), 87–90. (Cit. on p. 56).
Bennewitz, M., Burgard, W., & Thrun, S. (2002). Finding and optimizing solvable priority schemes for decoupled path planning techniques for teams of mobile robots. Robotics and Autonomous Systems, 41(2-3), 89–99. (Cit. on pp. 41, 44, 45).
Bertsekas, D. P. (1992). Auction algorithms for network flow problems: A tutorial introduction. Computational Optimization and Applications, 1(1), 7–66. (Cit. on pp. 7, 11, 15, 57, 61).
Bnaya, Z., & Felner, A. (2014). Conflict-oriented windowed hierarchical cooperative A*. In IEEE International Conference on Robotics and Automation (pp. 3743–3748). (Cit. on p. 42).
Boutilier, C. (1996). Planning, learning and coordination in multiagent decision processes. In Conference on Theoretical Aspects of Rationality and Knowledge (pp. 195–210). (Cit. on p. 44).
Boyarski, E., Felner, A., Stern, R., Sharon, G., Tolpin, D., Betzalel, O., & Shimony, S. E. (2015). ICBS: Improved conflict-based search algorithm for multi-agent pathfinding. In International Joint Conference on Artificial Intelligence (pp. 740–746). (Cit. on p. 50).
Brucker, P. (2010). Scheduling algorithms (5th). Springer. (Cit. on pp. 6, 13, 62).
Burkard, R. E., Dell’Amico, M., & Martello, S. (2009). Assignment problems. Springer. (Cit. on p. 57).
Cap, M., Vokrinek, J., & Kleiner, A. (2015). Complete decentralized method for on-line multi-robot trajectory planning in well-formed infrastructures. In International Conference on Automated Planning and Scheduling (pp. 324–332). (Cit. on pp. 5, 8, 45, 122, 126).
Cohen, L., Uras, T., & Koenig, S. (2015). Feasibility study: Using highways for bounded-suboptimal multi-agent path finding. In International Symposium on Combinatorial Search (pp. 2–8). (Cit. on p. 117).
Cohen, L., Greco, M., Ma, H., Hernandez, C., Felner, A., Kumar, T. K. S., & Koenig, S. (2018). Anytime focal search with applications. In International Joint Conference on Artificial Intelligence (pp. 1434–1441). (Cit. on p. 51).
Cohen, L., Uras, T., Kumar, T. K. S., Xu, H., Ayanian, N., & Koenig, S. (2016). Improved solvers for bounded-suboptimal multi-agent path finding. In International Joint Conference on Artificial Intelligence (pp. 3067–3074). (Cit. on pp. 51, 115, 195).
Coltin, B., & Veloso, M. (2014). Ridesharing with passenger transfers. In IEEE/RSJ International Conference on Intelligent Robots and Systems (pp. 3278–3283). (Cit. on p. 62).
de Wilde, B., ter Mors, A. W., & Witteveen, C. (2013). Push and Rotate: Cooperative multi-agent path planning. In International Conference on Autonomous Agents and Multiagent Systems (pp. 87–94). (Cit. on p. 40).
Derigs, U., & Zimmermann, U. (1978). An augmenting path method for solving linear bottleneck assignment problems. Computing, 19(4), 285–295. (Cit. on p. 57).
Dijkstra, E. W. (1959). A note on two problems in connexion with graphs. Numerische Mathematik, 1(1), 269–271. (Cit. on p. 104).
Dresner, K., & Stone, P. (2008). A multiagent approach to autonomous intersection management. Journal of Artificial Intelligence Research, 31, 591–656. (Cit. on p. 1).
Erdem, E., Kisa, D. G., Oztok, U., & Schueller, P. (2013). A general formal framework for pathfinding problems with multiple agents. In AAAI Conference on Artificial Intelligence (pp. 290–296). (Cit. on p. 40).
Erdmann, M. A., & Lozano-Perez, T. (1987). On multiple moving objects. Algorithmica, 2, 477–521. (Cit. on pp. 41, 44, 45).
Felner, A., Li, J., Boyarski, E., Ma, H., Cohen, L., Kumar, T. K. S., & Koenig, S. (2018). Adding heuristics to conflict-based search for multi-agent pathfinding. In International Conference on Automated Planning and Scheduling (pp. 83–87). (Cit. on pp. 51, 115, 195).
Felner, A., Stern, R., Shimony, S. E., Boyarski, E., Goldenberg, M., Sharon, G., Sturtevant, N. R., Wagner, G., & Surynek, P. (2017). Search-based optimal solvers for the multi-agent pathfinding problem: Summary and challenges. In International Symposium on Combinatorial Search (pp. 29–37). (Cit. on pp. 7, 14, 15, 20, 39).
Ford Jr, L. R., & Fulkerson, D. R. (2015). Flows in networks. Princeton University Press. (Cit. on p. 56).
Ford, L. R., & Fulkerson, D. R. (1956). Maximal flow through a network. Canadian Journal of Mathematics, 8, 399–404. (Cit. on p. 104).
Fulkerson, D. R., Glicksberg, I., & Gross, O. (1953). A production line assignment problem. The Rand Corporation, Paper RM-1102. (Cit. on pp. 11, 57, 61).
Garfinkel, R. S. (1971). An improved algorithm for the bottleneck assignment problem. Operations Research, 19(7), 1747–1751. (Cit. on pp. 7, 11, 15, 57, 61).
Gerkey, B. P., & Mataric, M. J. (2002). Sold!: Auction methods for multirobot coordination. IEEE Transactions on Robotics and Automation, 18(5), 758–768. (Cit. on pp. 58, 122).
Gerkey, B. P., & Mataric, M. J. (2004). A formal analysis and taxonomy of task allocation in multi-robot systems. The International Journal of Robotics Research, 23(9), 939–954. (Cit. on pp. 5, 33, 58).
Goldberg, A. V., & Tarjan, R. E. (1987). Solving minimum-cost flow problems by successive approximation. In Annual ACM Symposium on Theory of Computing (pp. 7–18). (Cit. on pp. 14, 92, 95, 104, 118).
Goldenberg, M., Felner, A., Stern, R., Sharon, G., Sturtevant, N. R., Holte, R. C., & Schaeffer, J. (2014). Enhanced partial expansion A*. Journal of Artificial Intelligence Research, 50, 141–187. (Cit. on p. 42).
Goldman, C. V., & Zilberstein, S. (2004). Decentralized control of cooperative systems: Categorization and complexity analysis. Journal of Artificial Intelligence Research, 22, 143–174. (Cit. on p. 44).
Goldreich, O. (2011). Finding the shortest move-sequence in the graph-generalized 15-puzzle is NP-hard. In Studies in Complexity and Cryptography. Miscellanea on the Interplay between Randomness and Computation (pp. 1–5). Springer. (Cit. on pp. 7, 11, 38, 61).
Goraly, G., & Hassin, R. (2010). Multi-color pebble motion on graphs. Algorithmica, 58(3), 610–636. (Cit. on p. 38).
Grenouilleau, F., van Hoeve, W., & Hooker, J. N. (2019). A multi-label A* algorithm for multi-agent pathfinding. In International Conference on Automated Planning and Scheduling (pp. 181–185). (Cit. on pp. 132, 144, 145, 192).
Gross, O. (1959). The bottleneck assignment problem. The Rand Corporation, Paper P-1630. (Cit. on pp. 7, 11, 15, 57, 61).
Henkel, C., Abbenseth, J., & Toussaint, M. (2019). An optimal algorithm to solve the combined task allocation and path finding problem. In IEEE/RSJ International Conference on Intelligent Robots and Systems. (Cit. on pp. 121, 137, 193).
Honig, W., Kumar, T. K. S., Cohen, L., Ma, H., Xu, H., Ayanian, N., & Koenig, S. (2016). Multi-agent path finding with kinematic constraints. In International Conference on Automated Planning and Scheduling (pp. 477–485). (Cit. on pp. 9, 55, 155, 156, 178, 186, 192).
Honig, W. (2019). Motion coordination for large multi-robot teams in obstacle-rich environments (Doctoral dissertation, University of Southern California). (Cit. on pp. 55, 56, 178, 185, 192, 194).
Honig, W., Kiesel, S., Tinka, A., Durham, J. W., & Ayanian, N. (2018). Conflict-based search with optimal task assignment. In International Conference on Autonomous Agents and Multiagent Systems (pp. 757–765). (Cit. on p. 192).
Honig, W., Kiesel, S., Tinka, A., Durham, J. W., & Ayanian, N. (2019). Persistent and robust execution of MAPF schedules in warehouses. IEEE Robotics and Automation Letters, 4(2), 1125–1131. (Cit. on pp. 144, 156, 194).
Honig, W., Kumar, T. K. S., Ma, H., Ayanian, N., & Koenig, S. (2016). Formation change for robot groups in occluded environments. In IEEE/RSJ International Conference on Intelligent Robots and Systems (pp. 4836–4842). (Cit. on pp. 1, 55, 91, 117, 155, 156, 178, 186).
Honig, W., Preiss, J. A., Kumar, T. S., Sukhatme, G. S., & Ayanian, N. (2018). Trajectory planning for quadrotor swarms. IEEE Transactions on Robotics, 34(4), 856–869. (Cit. on pp. 1, 91).
Jennings, J. S., Whelan, G., & Evans, W. F. (1997). Cooperative search and rescue with a team of mobile robots. In International Conference on Advanced Robotics (pp. 193–200). (Cit. on p. 1).
Jiang, Y., Yedidsion, H., Zhang, S., Sharon, G., & Stone, P. (2019). Multi-robot planning with conflicts and synergies. Autonomous Robots, 43(18), 2011–2032. (Cit. on p. 91).
Johnson, W. W., & Story, W. E. (1879). Notes on the “15” puzzle. American Journal of Mathematics, 2(4), 397–404. (Cit. on p. 38).
Kalyanasundaram, B., & Pruhs, K. (1993). Online weighted matching. Journal of Algorithms, 14(3), 478–488. (Cit. on pp. 6, 8, 15, 58, 121, 126, 137).
Kavraki, L. E., & LaValle, S. M. (2016). Motion planning. Springer Handbook of Robotics, 139–162. (Cit. on p. 6).
Khandelwal, P., Zhang, S., Sinapov, J., Leonetti, M., Thomason, J., Yang, F., Gori, I., Svetlik, M., Khante, P., Lifschitz, V., et al. (2017). BWIBots: A platform for bridging the gap between AI and human–robot interaction research. International Journal of Robotics Research, 36(5-7), 635–659. (Cit. on pp. 1, 121).
Khorshid, M., Holte, R., & Sturtevant, N. R. (2011). A polynomial-time algorithm for non-optimal multi-agent pathfinding. In International Symposium on Combinatorial Search (pp. 76–83). (Cit. on p. 40).
Khuller, S., Mitchell, S. G., & Vazirani, V. V. (1994). On-line algorithms for weighted bipartite matching and stable marriages. Theoretical Computer Science, 127(2), 255–267. (Cit. on pp. 8, 15, 58, 122).
Kloder, S., & Hutchinson, S. (2006). Path planning for permutation-invariant multirobot formations. IEEE Transactions on Robotics, 22(4), 650–665. (Cit. on p. 26).
Kornhauser, D., Miller, G., & Spirakis, P. (1984). Coordinating pebble motion on graphs, the diameter of permutation groups, and applications. In Annual Symposium on Foundations of Computer Science (pp. 241–250). (Cit. on pp. 11, 37, 38, 61).
Korsah, G. A., Stentz, A., & Dias, M. B. (2013). A comprehensive taxonomy for multi-robot task allocation. The International Journal of Robotics Research, 32(12), 1495–1512. (Cit. on p. 58).
Kou, N. M., Peng, C., Ma, H., Kumar, T. K. S., & Koenig, S. (2020). Idle time optimization for target assignment and path finding in sortation centers. In AAAI Conference on Artificial Intelligence (pp. 9925–9932). (Cit. on p. 1).
Kuhn, H. W. (1955). The Hungarian method for the assignment problem. Naval Research Logistics Quarterly, 2, 83–97. (Cit. on pp. 7, 11, 15, 57, 61, 130, 137, 139).
Kurniawati, H., Hsu, D., & Lee, W. S. (2008). SARSOP: Efficient point-based POMDP planning by approximating optimally reachable belief spaces. In Robotics: Science and Systems. (Cit. on p. 44).
Li, J., Boyarski, E., Felner, A., Ma, H., & Koenig, S. (2019). Improved heuristics for multi-agent path finding with conflict-based search. In International Joint Conference on Artificial Intelligence (pp. 442–449). (Cit. on pp. 51, 195).
Li, J., Gange, G., Harabor, D., Stuckey, P. J., Ma, H., & Koenig, S. (2020). New techniques for pairwise symmetry breaking in multi-agent path finding. In International Conference on Automated Planning and Scheduling (pp. 193–201). (Cit. on pp. 51, 195).
Li, J., Harabor, D., Stuckey, P. J., Felner, A., Ma, H., & Koenig, S. (2019). Disjoint splitting for conflict-based search for multi-agent path finding. In International Conference on Automated Planning and Scheduling (pp. 279–283). (Cit. on pp. 51, 195).
Li, J., Harabor, D., Stuckey, P. J., Ma, H., & Koenig, S. (2019). Symmetry-breaking constraints for grid-based multi-agent path finding. In AAAI Conference on Artificial Intelligence (pp. 6087–6095). (Cit. on pp. 51, 195).
Li, J., Sun, K., Ma, H., Felner, A., Kumar, T. K. S., & Koenig, S. (2020). Moving agents in formation in congested environments. In International Conference on Autonomous Agents and Multiagent Systems (pp. 726–734). (Cit. on pp. 1, 91).
Li, J., Surynek, P., Felner, A., Ma, H., Kumar, T. K. S., & Koenig, S. (2019). Multi-agent path finding for large agents. In AAAI Conference on Artificial Intelligence (pp. 7627–7634). (Cit. on p. 44).
Liu, L., & Michael, N. (2016). An MDP-based approximation method for goal constrained multi-MAV planning under action uncertainty. In IEEE/RSJ International Conference on Robotics and Automation (pp. 56–62). (Cit. on p. 44).
Liu, M., Ma, H., Li, J., & Koenig, S. (2019). Target assignment and path planning for multi-agent pickup and delivery. In International Conference on Autonomous Agents and Multiagent Systems (pp. 2253–2255). (Cit. on pp. 144, 145, 193).
Liu, Z., Zhou, S., Wang, H., Shen, Y., Li, H., & Liu, Y. (2019). A hierarchical framework for coordinating large-scale robot networks. In IEEE International Conference on Robotics and Automation (pp. 6672–6677). (Cit. on p. 193).
Luna, R., & Bekris, K. E. (2011). Push and Swap: Fast cooperative path-finding with completeness guarantees. In International Joint Conference on Artificial Intelligence (pp. 294–300). (Cit. on p. 40).
Ma, H., Honig, W., Cohen, L., Uras, T., Xu, H., Kumar, T. K. S., Ayanian, N., & Koenig, S. (2017). Overview: A hierarchical framework for plan generation and execution in multi-robot systems. IEEE Intelligent Systems, 32(6), 6–12. (Cit. on p. 1).
Ma, H., & Koenig, S. (2016). Optimal target assignment and path finding for teams of agents. In International Conference on Autonomous Agents and Multiagent Systems (pp. 1144–1152). (Cit. on p. 89).
Ma, H., & Koenig, S. (2017). AI buzzwords explained: Multi-agent path finding (MAPF). AI Matters, 3(3), 15–19. (Cit. on pp. 3, 7, 14, 15, 37).
Ma, H., Kumar, T. K. S., & Koenig, S. (2017). Multi-agent path finding with delay probabilities. In AAAI Conference on Artificial Intelligence (pp. 3605–3612). (Cit. on p. 43).
Ma, H., Li, J., Kumar, T. K. S., & Koenig, S. (2017). Lifelong multi-agent path finding for online pickup and delivery tasks. In International Conference on Autonomous Agents and Multiagent Systems (pp. 837–845). (Cit. on pp. 119, 176).
Ma, H., & Pineau, J. (2015). Information gathering and reward exploitation of subgoals for POMDPs. In AAAI Conference on Artificial Intelligence (pp. 3320–3326). (Cit. on p. 44).
Ma, H., Wagner, G., Felner, A., Li, J., Kumar, T. K. S., & Koenig, S. (2018a). Multi-agent path finding with deadlines. In International Joint Conference on Artificial Intelligence (pp. 417–423). (Cit. on pp. 7, 43, 73).
Ma, H., Wagner, G., Felner, A., Li, J., Kumar, T. K. S., & Koenig, S. (2018b). Multi-agent path finding with deadlines: Preliminary results. In International Conference on Autonomous Agents and Multiagent Systems (pp. 2004–2006). (Cit. on p. 43).
Ma, H., Yang, J., Cohen, L., Kumar, T. K. S., & Koenig, S. (2017). Feasibility study: Moving non-homogeneous teams in congested video game environments. In AAAI Conference on Artificial Intelligence and Interactive Digital Entertainment (pp. 270–272). (Cit. on pp. 1, 91).
Ma, H., Harabor, D., Stuckey, P. J., Li, J., & Koenig, S. (2019). Searching with consistent prioritization for multi-agent path finding. In AAAI Conference on Artificial Intelligence (pp. 7643–7650). (Cit. on p. 51).
Ma, H., Honig, W., Kumar, T. K. S., Ayanian, N., & Koenig, S. (2019). Lifelong path planning with kinematic constraints for multi-agent pickup and delivery. In AAAI Conference on Artificial Intelligence (pp. 7651–7658). (Cit. on p. 153).
Ma, H., Koenig, S., Ayanian, N., Cohen, L., Honig, W., Kumar, T. K. S., Uras, T., Xu, H., Tovey, C., & Sharon, G. (2016). Overview: Generalizations of multi-agent path finding to real-world scenarios. In IJCAI-16 Workshop on Multi-Agent Path Finding. (Cit. on pp. 3, 7, 14, 15, 37).
Ma, H., Tovey, C., Sharon, G., Kumar, T. K. S., & Koenig, S. (2016). Multi-agent path finding with payload transfers and the package-exchange robot-routing problem. In AAAI Conference on Artificial Intelligence (pp. 3166–3173). (Cit. on p. 60).
MacAlpine, P., Price, E., & Stone, P. (2015). SCRAM: Scalable collision-avoiding role assignment with minimal-makespan for formational positioning. In AAAI Conference on Artificial Intelligence (pp. 2096–2102). (Cit. on p. 91).
Masehian, E., & Nejad, A. H. (2009). Solvability of multi robot motion planning problems on trees. In IEEE/RSJ International Conference on Intelligent Robots and Systems (pp. 5936–5941). (Cit. on p. 40).
Mataric, M., Nilsson, M., & Simsarin, K. (1995). Cooperative multi-robot box-pushing. In IEEE/RSJ International Conference on Intelligent Robots and Systems (pp. 556–561). (Cit. on p. 1).
Melo, F. S., & Veloso, M. (2011). Decentralized MDPs with sparse interactions. Artificial Intelligence, 175(11), 1757–1789. (Cit. on p. 44).
Morris, R., Pasareanu, C., Luckow, K., Malik, W., Ma, H., Kumar, T. K. S., & Koenig, S. (2016). Planning, scheduling and monitoring for airport surface operations. In AAAI-16 Workshop on Planning for Hybrid Systems (pp. 608–614). (Cit. on pp. 1, 2, 121).
Narayanan, V., Phillips, M., & Likhachev, M. (2012). Anytime safe interval path planning for dynamic environments. In IEEE/RSJ International Conference on Intelligent Robots and Systems (pp. 4708–4715). (Cit. on p. 158).
Nguyen, V., Obermeier, P., Son, T. C., Schaub, T., & Yeoh, W. (2017). Generalized target assignment and path finding using answer set programming. In International Joint Conference on Artificial Intelligence (pp. 1216–1223). (Cit. on pp. 115, 193).
Okumura, K., Machida, M., Defago, X., & Tamura, Y. (2019). Priority inheritance with backtracking for iterative multi-agent path finding. In International Joint Conference on Artificial Intelligence (pp. 535–542). (Cit. on pp. 132, 145, 193).
Parker, L. E. (1998). ALLIANCE: An architecture for fault tolerant multirobot cooperation. IEEE Transactions on Robotics and Automation, 14(2), 220–240. (Cit. on p. 122).
Parker, L. E. (1999). Cooperative robotics for multi-target observation. Intelligent Automation and Soft Computing, 5(1), 5–19. (Cit. on pp. 58, 122).
Pecora, F., Andreasson, H., Mansouri, M., & Petkov, V. (2018). A loosely-coupled approach for multi-robot coordination, motion planning and control. In International Conference on Automated Planning and Scheduling (pp. 485–493). (Cit. on pp. 1, 121).
Phillips, M., & Likhachev, M. (2011). SIPP: Safe interval path planning for dynamic environments. In IEEE International Conference on Robotics and Automation (pp. 5628–5635). (Cit. on pp. 155, 175).
Poduri, S., & Sukhatme, G. S. (2004). Constrained coverage for mobile sensor networks. In IEEE International Conference on Robotics and Automation (pp. 165–171). (Cit. on p. 1).
Preiss, J. A., Honig, W., Ayanian, N., & Sukhatme, G. S. (2017). Downwash-aware trajectory planning for large quadrotor teams. In IEEE/RSJ International Conference on Intelligent Robots and Systems (pp. 250–257). (Cit. on pp. 1, 91).
Roger, G., & Helmert, M. (2012). Non-optimal multi-agent pathfinding is solved (since 1984). In International Symposium on Combinatorial Search (pp. 173–174). (Cit. on pp. 11, 38, 61).
Ratner, D., & Warmuth, M. (1986). Finding a shortest solution for the N × N extension of the 15-puzzle is intractable. In AAAI Conference on Artificial Intelligence (pp. 168–172). (Cit. on pp. 7, 11, 38, 61).
Rohmer, E., Singh, S. P. N., & Freese, M. (2013). V-REP: A versatile and scalable robot simulation framework. In IEEE/RSJ International Conference on Intelligent Robots and Systems (pp. 1321–1326). (Cit. on p. 185).
Rus, D., Donald, B., & Jennings, J. (1995). Moving furniture with teams of autonomous robots. In IEEE/RSJ International Conference on Intelligent Robots and Systems (pp. 235–242). (Cit. on p. 1).
Ryan, M. (2008). Exploiting subgraph structure in multi-robot path planning. Journal of Artificial Intelligence Research, 31, 497–542. (Cit. on p. 41).
Ryan, M. (2010). Constraint-based multi-robot path planning. In IEEE International Conference on Robotics and Automation (pp. 922–928). (Cit. on p. 41).
Sajid, Q., Luna, R., & Bekris, K. E. (2012). Multi-agent pathfinding with simultaneous execution of single-agent primitives. In International Symposium on Combinatorial Search. (Cit. on p. 40).
Salvado, J., Krug, R., Mansouri, M., & Pecora, F. (2018). Motion planning and goal assignment for robot fleets using trajectory optimization. In IEEE/RSJ International Conference on Intelligent Robots and Systems (pp. 7939–7946). (Cit. on pp. 1, 121).
Scharpff, J., Roijers, D. M., Oliehoek, F. A., Spaan, M. T. J., & de Weerdt, M. M. (2016). Solving transition-independent multi-agent MDPs with sparse interactions. In AAAI Conference on Artificial Intelligence (pp. 3174–3180). (Cit. on p. 44).
Shapley, L. S., & Shubik, M. (1971). The assignment game I: The core. International Journal of Game Theory, 1(1), 111–130. (Cit. on pp. 7, 11, 15, 57, 61).
Sharon, G. (2015). Novel search techniques for path finding in complex environment (Doctoral dissertation, Ben-Gurion University). (Cit. on p. 43).
Sharon, G., Stern, R., Felner, A., & Sturtevant, N. R. (2015). Conflict-based search for optimal multi-agent pathfinding. Artificial Intelligence, 219, 40–66. (Cit. on pp. 14, 42, 44, 47–50, 77, 96, 97, 109, 113).
Sharon, G., Stern, R., Goldenberg, M., & Felner, A. (2013). The increasing cost tree search for optimal multi-agent pathfinding. Artificial Intelligence, 195, 470–495. (Cit. on p. 42).
Silver, D. (2005). Cooperative pathfinding. In AAAI Conference on Artificial Intelligence and Interactive Digital Entertainment (pp. 117–122). (Cit. on pp. 37, 41, 44–46).
Smith, B. S., Egerstedt, M., & Howard, A. (2009). Automatic generation of persistent formations for multi-agent networks under range constraints. Mobile Networks and Applications, 14(3), 322–335. (Cit. on p. 1).
Solovey, K., & Halperin, D. (2016). On the hardness of unlabeled multi-robot motion planning. The International Journal of Robotics Research, 35(14), 1750–1759. (Cit. on p. 26).
Srivastava, S., Fang, E., Riano, L., Chitnis, R., Russell, S., & Abbeel, P. (2014). Combined task and motion planning through an extensible planner-independent interface layer. In IEEE International Conference on Robotics and Automation (pp. 639–646). (Cit. on p. 8).
Standley, T. S. (2010). Finding optimal solutions to cooperative pathfinding problems. In AAAI Conference on Artificial Intelligence (pp. 173–178). (Cit. on p. 42).
Stern, R., Goldenberg, M., & Felner, A. (2017). Shortest path for K goals. In International Symposium on Combinatorial Search (pp. 167–168). (Cit. on p. 165).
Stern, R., Sturtevant, N. R., Felner, A., Koenig, S., Ma, H., Walker, T. T., Li, J., Atzmon, D., Cohen, L., Kumar, T. K. S., Boyarski, E., & Bartak, R. (2019). Multi-agent pathfinding: Definitions, variants, and benchmarks. In International Symposium on Combinatorial Search. (Cit. on pp. 7, 14, 15).
Sturtevant, N. R., & Buro, M. (2006). Improving collaborative pathfinding using map abstraction. In AAAI Conference on Artificial Intelligence and Interactive Digital Entertainment (pp. 80–85). (Cit. on p. 41).
Surynek, P. (2009). A novel approach to path planning for multiple robots in bi-connected graphs. In IEEE International Conference on Robotics and Automation (pp. 3613–3619). (Cit. on p. 40).
Surynek, P. (2010). An optimization variant of multi-robot path planning is intractable. In AAAI Conference on Artificial Intelligence (pp. 1261–1263). (Cit. on pp. 7, 11, 20, 39, 61, 67).
Surynek, P. (2012). Towards optimal cooperative path planning in hard setups through satisfiability solving. In Pacific Rim International Conference on Artificial Intelligence (pp. 564–576). (Cit. on p. 39).
Surynek, P. (2015). Reduced time-expansion graphs and goal decomposition for solving cooperative path finding sub-optimally. In International Joint Conference on Artificial Intelligence (pp. 1916–1922). (Cit. on p. 40).
Surynek, P., Felner, A., Stern, R., & Boyarski, E. (2016). Efficient SAT approach to multi-agent path finding under the sum of costs objective. In European Conference on Artificial Intelligence (pp. 810–818). (Cit. on p. 40).
Surynek, P., Felner, A., Stern, R., & Boyarski, E. (2017). Modifying optimal SAT-based approach to multi-agent path-finding problem to suboptimal variants. In International Symposium on Combinatorial Search (pp. 169–170). (Cit. on p. 40).
Suzuki, I., & Kasami, T. (1985). A distributed mutual exclusion algorithm. ACM Transactions on Computer Systems, 3(4), 344–349. (Cit. on pp. 126, 130).
Svancara, J., Vlk, M., Stern, R., Atzmon, D., & Bartak, R. (2019). Online multi-agent pathfinding. In AAAI Conference on Artificial Intelligence (pp. 7732–7739). (Cit. on p. 193).
Tanner, H. G., Pappas, G. J., & Kumar, V. (2004). Leader-to-formation stability. IEEE Transactions on Robotics and Automation, 20(3), 443–455. (Cit. on p. 1).
Thurston, T., & Hu, H. (2002). Distributed agent architecture for port automation. In IEEE Computer Society Signature Conference on Computers, Software and Applications (pp. 81–87). (Cit. on p. 1).
Tovey, C. (1984). A simplified NP-complete satisfiability problem. Discrete Applied Mathematics, 8, 85–90. (Cit. on pp. 13, 62, 63, 69).
Tovey, C., Lagoudakis, M., Jain, S., & Koenig, S. (2005). The generation of bidding rules for auction-based robot coordination. In Multi-Robot Systems. From Swarms to Intelligent Automata (Chap. 1, Vol. 3, pp. 3–14). Springer. (Cit. on p. 91).
Turpin, M., Michael, N., & Kumar, V. (2014). CAPT: Concurrent assignment and planning of trajectories for multiple robots. The International Journal of Robotics Research, 33(1), 98–112. (Cit. on p. 91).
Turpin, M., Mohta, K., Michael, N., & Kumar, V. (2014). Goal assignment and trajectory planning for large teams of interchangeable robots. Autonomous Robots, 37(4), 401–415. (Cit. on pp. 5, 8, 122).
Veloso, M., Biswas, J., Coltin, B., & Rosenthal, S. (2015). CoBots: Robust symbiotic autonomous mobile service robots. In International Joint Conference on Artificial Intelligence (pp. 4423–4429). (Cit. on pp. 1, 62, 121).
Wagner, G. (2015). Subdimensional expansion: A framework for computationally tractable multirobot path planning (Doctoral dissertation, Carnegie Mellon University). (Cit. on p. 42).
Wagner, G., & Choset, H. (2015). Subdimensional expansion for multirobot path planning. Artificial Intelligence, 219, 1–24. (Cit. on p. 42).
Wagner, G., & Choset, H. (2017). Path planning for multiple agents under uncertainty. In International Conference on Automated Planning and Scheduling (pp. 577–585). (Cit. on p. 43).
Wagner, G., Choset, H., & Ayanian, N. (2012). Subdimensional expansion and optimal task reassignment. In International Symposium on Combinatorial Search (pp. 177–178). (Cit. on p. 91).
Wang, K. (2012). Scalable cooperative multi-agent pathfinding with tractability and completeness guarantees (Doctoral dissertation, The Australian National University). (Cit. on p. 41).
Wang, K., & Botea, A. (2008). Fast and memory-efficient multi-agent pathfinding. In International Conference on Automated Planning and Scheduling (pp. 380–387). (Cit. on p. 40).
Wang, K., & Botea, A. (2011). MAPP: A scalable multi-agent path planning algorithm with tractability and completeness guarantees. Journal of Artificial Intelligence Research, 42, 55–90. (Cit. on pp. 5, 9, 40, 122).
Werger, B. B., & Mataric, M. J. (2000). Broadcast of local eligibility for multi-target observation. In Distributed Autonomous Robotic Systems 4 (pp. 347–356). Springer. (Cit. on pp. 58, 122).
Wurman, P. R., D'Andrea, R., & Mountz, M. (2008). Coordinating hundreds of cooperative, autonomous vehicles in warehouses. AI Magazine, 29(1), 9–20. (Cit. on pp. 1, 2, 115, 145, 177).
Yakovlev, K., & Andreychuk, A. (2017). Any-angle pathfinding for multiple agents based on SIPP algorithm. In International Conference on Automated Planning and Scheduling (pp. 586–593). (Cit. on p. 158).
Yu, J. (2016). Intractability of optimal multi-robot path planning on planar graphs. IEEE Robotics and Automation Letters, 1(1), 33–40. (Cit. on p. 39).
Yu, J. (2017). Expected constant-factor optimal multi-robot path planning in well-connected environments. In International Symposium on Multi-Robot and Multi-Agent Systems (pp. 48–55). (Cit. on pp. 5, 9, 40, 122).
Yu, J., & LaValle, S. M. (2013a). Multi-agent path planning and network flow. In Algorithmic Foundations of Robotics X, Springer Tracts in Advanced Robotics (Vol. 86, pp. 157–173). Springer. (Cit. on pp. 7, 8, 13, 14, 22, 57, 58, 61, 85, 86, 108).
Yu, J., & LaValle, S. M. (2013b). Planning optimal paths for multiple robots on graphs. In IEEE International Conference on Robotics and Automation (pp. 3612–3617). (Cit. on pp. 39, 40, 44, 51, 52, 54, 79, 93, 105, 109).
Yu, J., & LaValle, S. M. (2013c). Structure and intractability of optimal multi-robot path planning on graphs. In AAAI Conference on Artificial Intelligence (pp. 1444–1449). (Cit. on pp. 7, 11, 20, 39, 61, 67).
Yu, J., & LaValle, S. M. (2016). Optimal multi-robot path planning on graphs: Complete algorithms and effective heuristics. IEEE Transactions on Robotics, 32(5), 1163–1177. (Cit. on p. 37).
Yu, J., & Rus, D. (2015). Pebble motion on graphs with rotations: Efficient feasibility tests and planning algorithms. In Algorithmic Foundations of Robotics XI, Springer Tracts in Advanced Robotics (Vol. 107, pp. 729–746). Springer. (Cit. on pp. 20, 26, 39, 49, 54, 61, 100).
Zheng, X., & Koenig, S. (2009). K-swaps: Cooperative negotiation for solving task-allocation problems. In International Joint Conference on Artificial Intelligence (pp. 373–378). (Cit. on p. 91).