RANDOM DISAMBIGUATION PATHS: MODELS,
ALGORITHMS, AND ANALYSIS
by
Xugang Ye
A dissertation submitted to The Johns Hopkins University in conformity with the
requirements for the degree of Doctor of Philosophy.
Baltimore, Maryland
December, 2008
Abstract
The main problem considered in this thesis is to navigate an agent that is capable of
disambiguation to safely and swiftly traverse a terrain populated with possibly hazardous
regions that can overlap. The problem has three main features. First, the planning is made
under uncertainty but without blindness. Second, the agent can disambiguate the
true-false status of any potential hazard as it approaches the hazard's vicinity. Third,
replanning can be made with less uncertainty if new information is collected en
route. We formulate this problem as a dynamic shortest path problem in a directed,
positively weighted graph; that is, any plan is a shortest path from the agent's location to
the target node in the graph. Each time a plan is made, the agent moves accordingly
until it encounters uncertainty. The agent then disambiguates, at a cost, the local
uncertainty, adds the disambiguation result(s) to the knowledge of the world, and replans
a new shortest path from where it is to the goal.
In consideration of real-world practice and simulation efficiency, we apply the A*
algorithm for the deterministic shortest path subproblem. We also give the A* algorithm
a new explanation within the primal-dual framework. For the terrain, we assume that
besides the natural topological information, which facilitates the use of the A* search,
there is also additional prior information regarding the likelihood of the true-false status
of the potential hazards. This additional prior information forms the initial knowledge of
the world. We present a navigation policy called CR and its various versions. The policy
integrates the prior information and the information collected en route. We provide both
theoretical and experimental results under different settings. As part of this dissertation
research, a computer program that simulates an agent to traverse a minefield was
developed. As an important tool, the program, combined with Monte Carlo simulations,
helps us deal with complicated real-world scenarios that are beyond the reach of any
known analytical method. As an application, we used the program to process the Navy's
reconnaissance data and obtained exciting results.
Acknowledgements
First, I would like to thank my wife, Judy H. Wang. She accompanied me through the
hardship of the Ph.D. years and has been a priceless treasure in my life. Next, I would like
to thank my mother for her great patience in taking care of my little daughter, Vicky.
Without her, it would have been impossible for my wife and me to pursue our academic goals.
I would like to express my deepest appreciation to my advisor Prof. Shih-Ping Han for
his academic guidance. He has been not only a great mentor, but also a great friend. He
patiently spent countless hours in improving my analytical skill. I am grateful for
everything I learned from him and everything he has done for me.
I would like to express my great gratitude to my co-advisor Prof. Carey Priebe, who not
only provided me with continual financial support through his ONR grants but also introduced
me to the RDP project that forms the subject of my dissertation research. He was
always willing to help and he always had great judgement. He provided valuable insight
and suggestions during the development of this dissertation.
I would like to send tons of thanks to Prof. Donniell Fishkind and Prof. Lowell Abrams
for their great help with my efforts in developing the computer program for RDP
simulations and in transforming the technical reports into professional academic papers.
For me, they are not just two project supervisors in our RDP team; they are my good
friends. I feel lucky to have two such friends who are not only knowledgeable but also
considerate.
I would like to thank Prof. Daniel Q. Naiman, Prof. Edward Scheinerman, Prof.
Benjamin Hobbs, and Prof. Justin Williams for serving on my Ph.D. candidacy exam and
graduate board oral exam committees. I would also like to thank Dr. Castello for her kind help.
Finally, I would like to thank the Johns Hopkins University Center for Imaging Science
for providing the JHU CIS 16-CPU/128-GB computational server for my RDP simulations
and extensive experiments.
Contents Page
Abstract ............................................................................................................................... ii
Acknowledgements ............................................................................................................ iv
List of Tables .................................................................................................................... viii
List of Figures .................................................................................................................... ix
1 Introduction .................................................................................................................. 1
1.1 RDP Problems ................................................................................................... 2
1.2 Story of RDP Research ...................................................................................... 5
1.3 Classical Path Planning ..................................................................................... 7
1.4 Recent Developments ........................................................................................ 9
1.4.1 Terrain Modeling .................................................................................. 10
1.4.2 Search Algorithms ................................................................................. 12
1.5 Contributions of This Dissertation Research ................................................... 15
1.6 Organization of the Thesis ............................................................................... 17
2 The A* Algorithm ...................................................................................................... 20
2.1 Best-First Search ............................................................................................. 20
2.2 Primal-Dual ..................................................................................................... 23
2.3 Derivation of the A* ........................................................................................ 26
2.4 Duality ............................................................................................................. 34
2.5 Heuristics ......................................................................................................... 35
2.6 Bidirectional Search ........................................................................................ 39
3 Traversing Probabilistic Graphs ................................................................................. 42
3.1 Probability Markers ......................................................................................... 43
3.2 The CR Policy ................................................................................................. 45
3.3 Parallel Graph .................................................................................................. 49
4 Mark Information via Sensor ..................................................................................... 63
4.1 Setting .............................................................................................................. 64
4.2 Sensor Monotonicity ....................................................................................... 66
4.3 Threshold and Penalty Policies ....................................................................... 67
5 Traversing Minefield .................................................................................................. 77
5.1 Minefield Model .............................................................................................. 78
5.2 Experimental Setting and Results .................................................................... 83
5.3 The COBRA Data ............................................................................................ 95
6 Deterministic Shortest Path...................................................................................... 101
6.1 Sensor Classification ..................................................................................... 102
6.2 Applied to Minefield ..................................................................................... 104
7 Summary, Conclusion, and Future Research ........................................................... 106
7.1 Summary and Conclusions ............................................................................ 106
7.2 Future Research ............................................................................................. 108
Bibliography ................................................................................................................... 112
List of Tables
Table 5.2.1: 1-sided Kolmogorov-Smirnov tests for pairwise comparisons of the distributions of samples ..... 91
Table 5.2.2: 1-sided t tests for pairwise comparisons of the means of samples ..... 92
Table 5.2.3: Hypothesis tests for comparing the distributions and means of samples associated with λ3 = 1.0 and λ6 = 2.5 ..... 94
Table 5.3.1: x, y-coordinates of the risk centers and the associated markers in the COBRA data ..... 96
List of Figures
Figure 2.5.1: An example in which the primal-dual algorithm starting from some π(0) = h′ does not terminate, where the length of any arc is 1 and the heuristic function h′ is {h′(s) = 0, h′(u) = 0, h′(t) = 0, and h′(vi) = i for i = 1, 2, …} ..... 38
Figure 3.3.1: An example of a general (nonparallel) graph where the optimal policy requires a balk ..... 51
Figure 3.3.2: The decision tree of the balk-free policy a1 → a2 → … → am+1 for traversing a parallel graph ..... 52
Figure 3.3.3: The dynamic programming search tree for finding the optimal policy for traversing the parallel graph in which A = {a} and B = {e1, e2} ..... 58
Figure 4.3.1: Analysis of the convergent graph G with a single nondeterministic arc ..... 74
Figure 5.1.1: The grid representation, with 8-adjacency ..... 79
Figure 5.2.1: A collection of m = 100 detections ..... 84
Figure 5.2.2: A realization of a trajectory in a real terrain (upper left) and in one of its marked maps (upper right), with two close-up views (lower left and lower right) in the real terrain ..... 86
Figure 5.2.3: A realization of another trajectory in the same real terrain (upper left) and in one of its marked maps (upper right), with two close-up views (lower left and lower right) in the real terrain ..... 87
Figure 5.2.4: Graphical statistical results of the data from the experiments conditioned on terrain T1 ..... 90
Figure 5.2.5: Graphical statistical results of the data from the unconditional experiments ..... 90
Figure 5.3.1: The COBRA terrain (left) and the projected COBRA terrain (right) ..... 96
Figure 5.3.2: Three trajectories under Cd = 5, 50, 500, displayed in the real terrain (left plots) and in the originally marked map (right plots) ..... 97
Figure 5.3.3: Three trajectories under λ = 0, 0.4, 0.5 in the real terrain (left plots) and in the associated marked maps (right plots) ..... 99
Figure 5.3.4: Plot of total cost vs. improvement parameter for COBRA runs ..... 100
Figure 6.2.1: Deterministic shortest paths vs. nondeterministic traversals under the experimental setting of Section 5.2 ..... 105
Figure 7.2.1: Example of a minefield setting with coastal geography incorporated ..... 109
1 Introduction
This dissertation is centered on a research project called Random Disambiguation Paths
(RDP). The project is supported by the Office of Naval Research (ONR). The main problem,
posed by the Coastal Battlefield Reconnaissance and Analysis (COBRA) Group, is to
navigate a combat unit safely and swiftly through a coastal environment with mine threats
to reach a preferred target location.
The COBRA system consists of three primary components: the COBRA Airborne
Payload, the Tactical Control Software (TCS), and the COBRA Processing Station. The
COBRA Airborne Payload consists of a multi-spectral sensor system that is placed on an
unmanned aerial vehicle (UAV) to conduct reconnaissance and detect threats. The TCS
that is loaded onto the UAV Ground Control Station controls the COBRA Airborne
Payload. Analysis of the data collected by the COBRA Airborne Payload is conducted at
the COBRA Processing Station. A good navigation algorithm, as a functional unit of
the COBRA Processing Station, plays an important role in the Marine Corps' operational
maneuvers.
This dissertation research is conducted on RDP modeling and algorithm design. As
a continuation of the effort of the Johns Hopkins University RDP group, this work explores
reasonable RDP models and proposes practical RDP algorithms. There are two main
tracks in this research: one is theoretical analysis; the other is experimental, or numerical,
analysis. The theoretical analysis is developed on simple settings. Its goal is to capture
important features of RDP models and algorithms and to provide guidelines for designing
methods suitable for complicated real-world scenarios. The experimental analysis is
performed on much more complicated settings. Its goal is to simulate real-world
scenarios and to provide numerical and statistical evidence of the efficiency and
effectiveness of the proposed algorithms.
1.1 RDP Problems
An RDP problem, in its broad setting, is to navigate an agent that is capable of
disambiguation to safely and swiftly traverse a terrain populated with possibly hazardous
regions that can overlap. As prior information, each region is marked with the likelihood
that it is a true hazard. The agent is assumed to be able to disambiguate, at a cost, the
true-false status of a marked region when it reaches the boundary of that region. The
meaning of the disambiguation cost lies in the fact that disambiguation slows the agent
down. Hence, when the agent disambiguates a potentially hazardous region, we may
think of the agent as traveling an additional distance beyond the Euclidean distance, with
the cost added to the Euclidean distance the agent has traveled. The agent should safely
reach a target location with minimum total cost.
The problem has three main features. First, the agent travels in an uncertain environment.
However, the agent is not totally blind, since the likelihood markers at least provide
prior knowledge of the terrain. Second, the agent can disambiguate the true-false status of
any potential hazard in its vicinity as it travels. Third, new information collected en route
enlarges the knowledge of the world, and hence any new decision may face less
uncertainty.
Different versions of the problem can arise from different settings. For example, in a
manner that is relatively convenient for theoretical investigation, one may model the
problem as traversing a directed graph that contains independently marked
nondeterministic arcs. In this setting, the agent can be assumed to be able to
disambiguate a nondeterministic arc once it reaches the tail node of the arc. Although
practical considerations may suggest that the disambiguation could happen somewhere
other than at the tail node, the arc can be split so that the assumption still applies.
Compared with the graph model under the assumption of independent arc markers, a
minefield setting is much more complicated. In a typical minefield model, the markers
are initially allocated to disk-shaped regions that may overlap. A discretization
process usually generates a directed graph with many of its nondeterministic arcs
dependently marked.
Since the disambiguation cost constitutes part of the total cost of traversal, the
assumption on how the disambiguation cost is calculated forms an important feature of an
RDP problem. In the graph model with the independent-arc-marker assumption, the cost of
disambiguating a nondeterministic arc can simply be viewed as a given parameter. In a
minefield model, we are usually given the cost of disambiguating each disk-shaped region.
If we construct a graph to discretize the world, we need to somehow carry this
information over to the graph. Also, there may be constraints on the agent's
disambiguation capability (e.g., the agent can make at most a certain number of
disambiguations, or can afford at most a certain total disambiguation cost).
Marine Corps' practice also poses the challenging problem in which the target location
may change while the agent travels. A reasonable approach is to replace the single target
location with a set of target locations. In the minefield setting, the set of target
locations can be a target region; the mission is then accomplished as soon as the agent
is inside the target region.
More challenging problems can arise when there is more than one agent to navigate. One
issue in a multi-agent RDP problem is that the information collected by one agent can be
shared with the other agents; hence there is additional complexity due to the communication
among the agents.
1.2 Story of RDP Research
The effort of the Johns Hopkins University RDP group has been focused on the RDP
problem with a fixed target and a single agent. Early work also assumes the availability of
the risky regions' probabilities of nontraversability, which are given to the agent at the
outset. The RDP concept was first introduced in Priebe et al. 2005 [1]. An important result
is that, under mild assumptions, an RDP, with positive probability, strictly reduces the
expected cost of traversal compared with any deterministic shortest path. This result
suggests that an RDP algorithm should be able to exploit this benefit as long as it exists.
Both the work in [1] and the follow-up development by Fishkind et al. 2007 [2] explored
the methods of finding a policy that yields small expected cost. Currently known
methods usually take three steps. The first step is to discretize the world by a directed
graph. In [2], the tangent arc graph (TAG) is applied to the minefield setting. TAG is a
precise map representation. The downside is that constructing a TAG is computationally
demanding when the number of mine detections is large. The second step is to assign
weights to the arcs of the graph. This step is the most important, since the weight function
largely determines the quality of the final traversal. This step actually reflects how a
policy uses the markers. The third step is to invoke an efficient shortest path algorithm to
compute a shortest path from where the agent is to the target node in the graph. In [2],
Dijkstra's algorithm with a binary heap implementation was used. In this step, the efficiency
of implementation also depends on the data structure rendered in the first step.
Functionally, completion of the third step specifies the plan of the agent’s next move. If
the planned next move is risk-free, then the agent moves on; otherwise, the agent
disambiguates. The disambiguation result(s) will be incorporated into the new weight
function and both the second and third steps will be repeated.
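The plan-move-disambiguate-replan loop described above can be sketched as follows. This is a minimal illustration under simplifying assumptions, not the simulation program developed for this dissertation: all function names are hypothetical, and the marker-dependent surcharge in assign_weights is a stand-in for the actual weight functions (such as CR) discussed later.

```python
import heapq

def dijkstra(adj, weights, src, dst):
    """Step 3: a standard shortest-path search (binary-heap Dijkstra)."""
    dist, prev, seen = {src: 0.0}, {}, set()
    pq = [(0.0, src)]
    while pq:
        d, u = heapq.heappop(pq)
        if u in seen:
            continue
        seen.add(u)
        if u == dst:
            break
        for v in adj[u]:
            nd = d + weights[(u, v)]
            if nd < dist.get(v, float("inf")):
                dist[v], prev[v] = nd, u
                heapq.heappush(pq, (nd, v))
    path = [dst]
    while path[-1] != src:
        path.append(prev[path[-1]])
    return path[::-1]

def assign_weights(edges, knowledge, penalty):
    """Step 2: arcs known to be clear keep their length, arcs known to be
    blocked are barred, and ambiguous arcs are charged a marker-dependent
    surcharge (a simplification, not the CR weight function itself)."""
    w = {}
    for (u, v), (length, p) in edges.items():
        if knowledge.get((u, v)) == "clear":
            w[(u, v)] = length
        elif knowledge.get((u, v)) == "blocked":
            w[(u, v)] = float("inf")
        else:
            w[(u, v)] = length + penalty * p
    return w

def traverse(adj, edges, src, dst, truth, disamb_cost, penalty):
    """Plan, move until an ambiguous arc is met, disambiguate, replan."""
    knowledge, loc, total = {}, src, 0.0
    while loc != dst:
        weights = assign_weights(edges, knowledge, penalty)
        path = dijkstra(adj, weights, loc, dst)
        for u, v in zip(path, path[1:]):
            if edges[(u, v)][1] > 0 and (u, v) not in knowledge:
                total += disamb_cost              # pay to resolve the arc
                knowledge[(u, v)] = "blocked" if truth[(u, v)] else "clear"
                break                             # replan with new knowledge
            total += edges[(u, v)][0]
            loc = v
    return total
```

On a three-node graph with one cheap but ambiguous arc, the agent first heads for the ambiguous route, disambiguates at its tail node, and falls back to the safe route if the arc turns out to be blocked, accumulating the disambiguation cost on the way.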
A shortcoming of the weight function in [2] is that it does not contain the disambiguation
cost. In [3], a weight function called CR was introduced. The author of this thesis (as a
co-author of [3]) proved that the CR weight function yields the minimum expected cost in a
special setting in which the graph is a parallel graph (with only two nodes s and t) and the
nondeterministic arcs are independently marked with probabilities of
nontraversability. Although this result is very limited, it motivates us to apply the CR
weight function to general settings that are much closer to real-world scenarios.
The work in [4] was mainly done by the author of this thesis. The starting point of that paper
is that, in practice, the markers often represent only estimates of the underlying true-false
status of the potential hazards, and the markers are actually obtained from a sensor's
readings. Hence a natural question is: does improving the sensor lead to less
cost? It turns out that the intuitive answer "yes" does not have a trivial validation. The
focus of [4] is therefore on sensor monotonicity. Due to the requirement of intensive
Monte Carlo simulations, the grid graph and the A* algorithm with a binary heap
implementation were invoked. Massive Monte Carlo simulations under the minefield setting
did produce numerical monotonicity results, which strongly complement two
analytical monotonicity results under simple settings.
1.3 Classical Path Planning
In the literature, path planning is concerned with finding paths connecting different locations
in an environment. If the environment takes the form of a graph, in which the nodes (or
vertices) are defined as the locations and the weights of the arcs (or edges) are defined as
the transition costs, then path planning falls into the range of the classical shortest path
problems (see Cormen et al. 2001 [5] and Ahuja et al. 1993 [6] for a comprehensive
survey). In the field of Artificial Intelligence (AI), path planning deals with computing
desired paths in a geometric space embedded with forbidden areas or risky regions. One
of the most fundamental geometric path planning problems is to find a shortest path in a
plane populated by a finite number of pre-known static polygonal obstacles, without
passing any interior point of any obstacle. This problem was initially solved by
constructing a visibility graph and invoking a shortest path algorithm on the graph (see
Lozano-Perez and Wesley 1979 [7]). This approach fueled intensive research on
computing visibility graphs. For the worst-case time complexity of computing a visibility
graph, see Welzl 1985 [8] and Ghosh and Mount 1991 [9]. In classical path planning, a
visibility graph is usually coupled with Dijkstra's algorithm implemented with a
heap data structure. Representative heap implementations include the binary heap, the
Fibonacci heap, and the radix heap. Detailed information on worst-case time complexity
can be found in [5], [6], [10], [11]. Besides the methods that are based on visibility
graphs, there
are also shortest path map approaches (e.g., Mitchell et al. 1993 [12]). Quite often, a
quad-tree-style subdivision of the plane (see Bern et al. 1990 [13]) is employed. A
representative composite method that combines the shortest path map approach and the
quad-tree-style subdivision of the plane was provided by Hershberger and Suri 1993
[14].
1.4 Recent Developments
Since the 1990s, path planning has found practical applications in real-time strategy (RTS)
computer games (e.g., Command and Conquer and the Age of Empires) and in
real-world navigation systems (e.g., planetary rovers, combat ships, and ground armored
vehicles). To practitioners, the challenges of incorporating the existing geometric shortest
path algorithms into those applications are substantial. For example, rather
than just considering static polygonal obstacles on a perfectly "flat" land, practical
path planning algorithms must be able to deal with very complicated obstacle
shapes, non-flat landscapes, and, if possible, dynamic environments. The cost of a path
may also depend on more "general" factors than the Euclidean distance. Those factors
may include the types of areas passed through, slopes, turning angles, etc. (see Chen 1996
[15]). Many known geometric shortest path algorithms are very environment-specific and
depend on sophisticated data structures and geometric procedures (see Latombe 1991 [16]
and Hwang and Ahuja 1992 [17] for good surveys). Hence, it is necessary to develop
simple-to-implement yet reasonably efficient methods that work for more "general" path
planning systems. It is also highly desirable to develop such methods to be compatible
with some "standard" input (e.g., terrain matrices). Under this requirement, a practically
useful path planning method should include at least two features: 1) flexible terrain
modeling, and 2) an efficient and effective search algorithm.
1.4.1 Terrain Modeling
Terrain modeling is the preprocessing phase of path planning. For simple cases
like polygonal forbidden regions in a flat plane, an exact representation of the world
can be used. Any method of this type needs a special data structure to store the
obstacle information (e.g., vertices and edges). In the literature, methods that establish an exact
representation of the world include the visibility graph, the Voronoi diagram, and
triangulation. Despite its accuracy, this class of methods has little practical application. The more
flexible mapping technique is space discretization, which appears extensively in the
development of real-time strategy computer games and recent path planning systems.
The central idea is to decompose the world into mutually exclusive cells regardless of the
obstacles and the area types. For each cell, the reachability or traversability is defined
deterministically or even probabilistically. The transition cost from one cell to an
adjacent cell is also defined according to the effort of the agent. For the agent, if a state is
defined as the cell in which the agent is located, then a path from the origin to the goal
can be defined as a sequence of indivisible feasible state transitions from the initial
state to the target state. The most notable advantage is that the obstacles can be
approximated as the union of specifically labeled cells. The higher the grid resolution,
the higher the map accuracy. Another advantage is that a grid graph can be easily
extended if the size of the map needs to be enlarged. This is usually done with an
associated coordinate system. Typically, a cell can be a square, a hexagon, or a triangle.
One can find in the literature that the eight-connected square grid graph is very popular
due to its easy implementation and relatively efficient memory requirements. For more
details on the concepts and algorithms for building a grid network and its coordinate
system, see Grunbaum and Shephard 1986 [18] and Chavey 1989 [19]. An excellent web
source can be found at Patel 2006 [20].
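As a concrete sketch, an eight-connected square grid graph can be built from a terrain matrix as follows. This is a minimal illustration only; the convention of cost 1 for straight moves and sqrt(2) for diagonal moves is one common choice, not necessarily the one used in the rest of this dissertation.

```python
def grid_neighbors(rows, cols, blocked):
    """Adjacency lists for an eight-connected square grid.
    blocked is the set of (row, col) cells the agent may not enter."""
    moves = [(-1, -1), (-1, 0), (-1, 1), (0, -1),
             (0, 1), (1, -1), (1, 0), (1, 1)]
    adj = {}
    for r in range(rows):
        for c in range(cols):
            if (r, c) in blocked:
                continue  # blocked cells are not nodes of the graph
            nbrs = []
            for dr, dc in moves:
                nr, nc = r + dr, c + dc
                if 0 <= nr < rows and 0 <= nc < cols and (nr, nc) not in blocked:
                    # straight moves cost 1, diagonal moves cost sqrt(2)
                    nbrs.append(((nr, nc), 2 ** 0.5 if dr and dc else 1.0))
            adj[(r, c)] = nbrs
    return adj
```

Obstacles are handled simply by adding their cells to the blocked set, which is exactly the "union of specifically labeled cells" approximation described above.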
Under some circumstances, non-uniform grid graphs can be used. For example,
when a flat plane sparsely contains some obstacles, a quadtree (see Samet 1988 [21]) is
more efficient than a regular square grid graph (and also than a visibility graph or a Voronoi
diagram). The reason is that the large empty areas are coded with very low
resolution, hence both the storage and the search scale are reduced. The price is the
considerably increased complexity of the data structure (see Kambhampati and Davis 1986
[22]). Another disadvantage of the quadtree is that a path found with such a map representation
is usually jagged. An improvement is to use a representation called the "framed"
quadtree (see Chen et al. 1997 [23] and Yahja et al. 1998 [24]), which is a modified, and
more complicated, version of the quadtree. In a framed quadtree, cells of the highest resolution
are added along the perimeter of each quadtree region. The indivisible state transition
is redefined as the shift from one cell of the highest resolution to a neighboring cell of the
highest resolution within the same quadtree region. It has been empirically shown (see
[24]) that the path quality can be significantly improved if the quadtree is replaced by its
framed version. However, since the grid graph generated by a framed quadtree can be
much denser (i.e., a node has many more incident arcs) than that generated by the
corresponding quadtree, a path planning search algorithm executed on such a graph can
have much higher time complexity. It has also been empirically shown (see [24]) that the
framed quadtree is usually not advantageous over the regular square grid graph when the
environment is uniformly, highly cluttered.
1.4.2 Search Algorithms
Although Dijkstra's algorithm dominates the early path planning literature, recent
favor has been given to the A* algorithm (see Hart et al. 1968 [25], Nilsson 1980 [26],
Pearl 1984 [27], Russell 2003 [28], Lester 2005 [29], and Patel 2006 [30]). Unlike
Dijkstra's search, which is blind, the A* search is informed. But the A* search is applicable
only if there exists a heuristic estimate of the "distance" from every node of the (directed)
graph to the target node. Provably, the final shortest path tree constructed by the A*
algorithm that uses a so-called consistent heuristic is smaller than that constructed by
Dijkstra's algorithm. Empirically, the A* algorithm is much more efficient than
Dijkstra's algorithm for finding a least-cost path from an origin to a goal in a graph that is
embedded in a Euclidean space. Although there is a computational cost for evaluating the
heuristic, the benefit brought by the "informed" search outweighs it. In general, the time
complexity of the A* algorithm depends on the heuristic. A good heuristic has a twofold
meaning: first, it estimates the distance from every node of the graph to the target node
well and the estimate satisfies the triangle inequality; second, it is not expensive to
evaluate. A heuristic that is better in these two senses leads to less search effort.
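A minimal sketch of the A* search might look as follows (illustrative only; h is assumed to be a consistent heuristic, e.g., the Euclidean distance to the goal for a graph embedded in the plane):

```python
import heapq
import math

def astar(adj, h, start, goal):
    """A* search: like Dijkstra's algorithm, but the priority queue is
    ordered by f = g + h, steering the search toward the goal.
    adj maps each node to a list of (neighbor, arc_cost) pairs."""
    g, prev, closed = {start: 0.0}, {}, set()
    pq = [(h(start), start)]
    while pq:
        _, u = heapq.heappop(pq)
        if u in closed:
            continue
        if u == goal:                       # reconstruct the path
            path = [u]
            while path[-1] != start:
                path.append(prev[path[-1]])
            return path[::-1], g[u]
        closed.add(u)
        for v, c in adj[u]:
            ng = g[u] + c
            if ng < g.get(v, math.inf):
                g[v], prev[v] = ng, u
                heapq.heappush(pq, (ng + h(v), v))
    return None, math.inf                   # goal unreachable
```

With h identically zero the search degenerates to Dijkstra's algorithm; with a consistent h, each node is expanded at most once, which is why the resulting search tree is never larger than Dijkstra's.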
Like Dijkstra's algorithm, the main limitation of the A* algorithm is its memory
requirement. This problem is serious when the graph is very large and the distance
between the origin and the goal is long. Several representative memory-bounded variants
of the A* algorithm are IDA* (Korf 1985 [31]), MA* (Chakrabarti et al.
1989 [32]), SMA* (Russell 1992 [33]), and RBFS (Korf 1993 [34]). They were
mainly designed to avoid exponential storage growth in game tree search (see [26], [27],
[28]). A very recent memory-saving variant of the A* algorithm is called Frontier A* (see
Korf 2005 [35]). This algorithm works for sparse graphs and, after one run, returns only the
length of the shortest path (not the path itself). To find the solution path, a
divide-and-conquer technique (also see [35]) is required, hence repeated A* searches of
decreasing scale. It is advised that for problems of moderate scale, the A* algorithm
is still the best choice as long as a good heuristic can be found.
When the environment of path planning is dynamic, the A* algorithm can be
implemented in its replanning (or dynamic) version, that is, finding a new
shortest path given the updated knowledge of the world. Extensive research effort has
been put into the replanning problem where the target is fixed. Zelinsky 1992 [36]
adopted brute-force A* replanning, in which a shortest path from where the agent is to
the goal is found from scratch. Stentz 1994 [37] pointed out that reusing the information
gained by previous searches may improve the replanning efficiency when the
environment is expansive, the goal is far away, and the map update is very local, around
the agent's location. The D* algorithm [37] was designed based on this point. A later
improved version is called the Focused D* algorithm (see Stentz 1995 [38]). Experiments
on partially known or unknown fractally generated large terrains have shown that D*
replanning is far more efficient than brute-force A* replanning. Besides the D*
algorithm, there is another functionally equivalent but algorithmically different replanning
algorithm called D* Lite (see Koenig and Likhachev 2002 [39]), which is the "reversed"
version of an earlier algorithm called LPA* (see Koenig and Likhachev 2002 [40]). The
LPA* algorithm maintains a shortest path from the starting node to the target node in the
graph. It was developed from another algorithm called DynamicSWSF-FP (see
Ramalingam and Reps 1996 [41]), which maintains shortest paths from a single source
node to all the other nodes in the graph by processing the so-called inconsistent node list
in the right order. Which replanning algorithm to choose for a replanning problem should
be based on the specific features of the problem.
1.5 Contributions of This Dissertation Research
The main contributions of this dissertation research are a new formulation of the RDP problem built on the new concept of mark information, and new RDP algorithms. In addition, for the minefield application, a fast, flexible RDP simulation program based on dynamic A* search was delivered.
The theoretical contributions mainly include:
1) We found a new explanation of the A* algorithm based on the primal-dual framework.
More specifically, we have shown that if a consistent heuristic function is available,
then a special initial feasible solution to the dual model of the shortest path problem
can be constructed such that the primal-dual algorithm, with proper implementation,
becomes the A* algorithm.
2) We developed the concept of sensor and the new concept of mark information based
on the sensor’s readings. We proposed the threshold policy and the penalty policy,
both of which incorporate the markers and the disambiguation cost into the planning
and replanning.
3) We proved that the CR policy, which is a special penalty policy that uses the CR
weight function, is an optimal policy in the sense of smallest expected cost for
traversing the probabilistic parallel graph that has an independent probability marker
for each nondeterministic arc.
4) We proved that for the parallel graph with its nondeterministic arcs independently
marked, the threshold policy is weakly monotone with respect to sensor. We also
proved that for any convergent graph with a single nondeterministic arc, both the
threshold policy and the penalty policy are strongly monotone with respect to sensor.
The experimental contributions mainly include:
1) We developed an RDP simulation program that simulates an agent traversing a minefield. The current version, RDP V2.2, assumes a fixed target location; an extended version under development, RDP V2.2.1, assumes a target region. The A* algorithm, using the Euclidean distance as the natural (consistent) heuristic, constitutes the central routine of the programs. The weight function is CR, and the A* algorithm is implemented in its best-first search version with the Open list maintained as a binary heap.
2) We performed extensive Monte Carlo simulations to study the sensor monotonicity in the minefield model. We found numerical and statistical evidence of the sensor monotonicity from both the conditional experiments and the unconditional experiments. We also found, from the empirical distribution of the large samples, that an adjusted CR policy, applied to a general minefield setting, is both weakly monotone and strongly monotone.
3) We performed extensive Monte Carlo simulations to study the distribution of the deterministic shortest path in the minefield model. From the unconditional experiments, we found that the adjusted CR policy yields a higher average cost than the average length of the deterministic shortest paths when the quality of the sensor is poor. This phenomenon implies a trade-off between the sensor quality and the deterministic shortest path, which should be quantified with some critical value(s) of the sensor's parameter(s).
1.6 Organization of the Thesis
The rest of this thesis is organized as follows:
Chapter 2 introduces the A* algorithm and presents a new derivation of it, using a consistent heuristic, from the primal-dual algorithm for linear programming (LP). We also explain how the A* iterations improve the dual objective of the LP model of the shortest path problem and discuss, from the primal-dual point of view, various heuristics and strategies used in the A* search.
Chapter 3 focuses on the CR policy for traversing probabilistic graphs. Special emphasis is given to a theorem on the optimality of the CR policy for traversing the probabilistic parallel graph that has independent probability markers for its nondeterministic arcs.
Chapter 4 is on the sensor and mark information. We introduce the concept of sensor and the new concept of mark information based on the sensor's readings, the important concept of sensor monotonicity, and the threshold policy and the penalty policy. We also present some analytical monotonicity results under simple settings.
Chapter 5 is on the minefield model and the sensor monotonicity. We introduce a new formulation of the minefield model and present an adjusted CR policy that is specifically designed for the minefield application. We graphically demonstrate running cases of the RDP simulation program and present Monte Carlo simulation results supporting the weak monotonicity and strong monotonicity from both the conditional experiments and the unconditional experiments.
Chapter 6 is on the deterministic shortest path in the minefield. Based on the Monte Carlo simulations, we present the comparison between the length of the deterministic shortest path and the cost of nondeterministic traversal under the adjusted CR policy. We also suggest how to incorporate the critical value(s) of the sensor parameter(s) into the design of a policy like the adjusted CR.
Chapter 7 presents summary, conclusions, and suggestions for future research.
2 The A* Algorithm
The A* algorithm is the core of our RDP simulation program. It can be used to find a
shortest path from a starting node to a target node in a positively weighted graph. With a
consistent heuristic, the A* algorithm expands a shortest path tree that is rooted at the
starting node, node by node, favorably toward the target node. The more precise the
heuristic estimate of the distance from every node to the target node, the smaller the final
shortest path tree that covers the target node. In this chapter, we introduce the A*
algorithm from the primal-dual point of view. We first set up the problem domain and introduce the A* algorithm and the primal-dual algorithm; we then use the heuristic to construct an initial feasible solution to the dual and propose a best-first search (see [27]) version of the primal-dual algorithm; we show that this version of the primal-dual algorithm behaves essentially the same as the A* algorithm that uses the same heuristic; finally, we present some interesting implications of this result.
2.1 Best-First Search
As a popular best-first search method, the A* algorithm maintains two node lists throughout. One is the Open list, which consists of those nodes that are temporarily labeled with estimates of their distances from the starting node; the other is the Closed list, which consists of those nodes that are permanently labeled with their exact distances from the starting node. We now set up the problem domain and give a description of the algorithm.
We consider a directed, positively weighted simple graph denoted as G = (V, A, W, δ, b),
where V is the set of nodes, A is the set of arcs, W: A → R is the weight function, δ > 0 is
a constant such that δ ≤ W(a) < +∞ for all a ∈ A, and finally b > 0 is a constant integer
such that |{v | (u, v) ∈ A or (v, u) ∈ A}| ≤ b for all u ∈ V. Suppose we want to find a
shortest s-t (directed) path in G, where s ∈ V is a specified starting node and t ∈ V is a
specified terminal node. Further suppose that there exists a heuristic function h: V→ R
such that h(v) ≥ 0 for all v ∈ V, h(t) = 0, and W(u, v) + h(v) ≥ h(u) for all (u, v) ∈ A. Such
h is called a consistent heuristic. According to [25], [26], [27], the A* algorithm that uses such an h is complete; that is, it can find a shortest s-t path in G as long as an s-t path exists in G. The algorithm, which searches from s to t, can be stated as follows.
The A* Algorithm
Notations:
h: heuristic
O: Open list
E: Closed list
d: distance label
f: node selection key
pred: predecessor
Steps:
Given G, s, t, and h
Step 1. Set O = {s}, d(s) = 0, and E = φ.
Step 2. If O = φ and t ∉ E, then stop (there is no s-t path); otherwise, continue.
Step 3. Find u = arg min_{v∈O} f(v), where f(v) = d(v) + h(v). Set O = O \ {u} and E = E ∪{u}. If t ∈ E, then stop (a shortest s-t path is found); otherwise, continue.
Step 4. For each node v ∈ V such that (u, v) ∈ A and v ∉ E,
if v ∉ O, then
set O = O ∪{v}, d(v) = d(u) + W(u, v), and pred(v) = u;
otherwise,
if d(v) > d(u) + W(u, v), then
set d(v) = d(u) + W(u, v) and pred(v) = u.
Go to Step 2.
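The steps above can be transcribed into a short Python sketch (an illustration only, not the dissertation's RDP implementation; the Open list is kept as a binary heap, and the graph, weight, and heuristic arguments are placeholders supplied by the caller):

```python
import heapq

def a_star(succ, W, h, s, t):
    """A* per Steps 1-4: succ(u) yields neighbors of u, W(u, v) > 0, h consistent."""
    d = {s: 0.0}             # distance labels
    pred = {s: None}         # predecessor pointers
    closed = set()           # Closed list E
    open_heap = [(h(s), s)]  # Open list keyed by f(v) = d(v) + h(v)
    while open_heap:
        f_u, u = heapq.heappop(open_heap)
        if u in closed:      # stale heap entry left over from a label update
            continue
        closed.add(u)
        if u == t:           # Step 3: stop once t is closed
            path = []
            while u is not None:
                path.append(u)
                u = pred[u]
            return d[t], path[::-1]
        for v in succ(u):    # Step 4: relax the outgoing arcs of u
            if v in closed:
                continue
            dv = d[u] + W(u, v)
            if v not in d or dv < d[v]:
                d[v] = dv
                pred[v] = u
                heapq.heappush(open_heap, (dv + h(v), v))
    return None              # Step 2: no s-t path
```

With h = 0 this sketch behaves as the Dijkstra's algorithm, matching the remark below.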
In particular, when h = 0, the A* algorithm stated above reduces to the Dijkstra’s
algorithm. For convenience, for any two nodes u ∈ V and v ∈ V, let dist(u, v) denote the
distance from u to v in G. That is, if there is no u-v path in G, we define dist(u, v) = +∞;
otherwise, we define dist(u, v) to be the length of a shortest u-v path in G. According to
[25], [27], a central property, called strong optimality, of the A* algorithm stated above is
d(u) = dist(s, u) when u ∈ E.
Given a consistent heuristic h, we can define a new weight function Wh such that Wh(u, v) = W(u, v) + h(v) – h(u) for all (u, v) ∈ A. This change of weights results in a new graph Gh = (V, A, Wh, δ, b). It is known from [6] that running the Dijkstra's algorithm to find a shortest s-t path in Gh is equivalent to running the A* algorithm stated above to find a shortest s-t path in G if the two algorithms apply the same tie-breaking rule. The equivalence is due to the fact that the two algorithms construct identical shortest path trees rooted at s, although the distance labels of the same node differ. The equivalence implies that the two algorithms can be derived from each other.
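The reweighting Wh is easy to check numerically. The sketch below (a toy illustration with made-up graph and heuristic values) verifies that Wh is nonnegative when h is consistent, and that the length of any s-t path changes by the constant h(t) − h(s), so shortest paths are preserved:

```python
def reweight(W, h):
    """Return the reduced weight function Wh(u, v) = W(u, v) + h(v) - h(u)."""
    return lambda u, v: W(u, v) + h(v) - h(u)

def path_length(W, path):
    """Sum the weights along a node sequence."""
    return sum(W(u, v) for u, v in zip(path, path[1:]))
```

By telescoping, path_length(Wh, P) = path_length(W, P) + h(t) − h(s) for every s-t path P, so the additive constant cancels out when comparing paths.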
2.2 Primal-Dual
Now consider modeling the shortest path problem as a linear program (LP). For convenience, we define G̃ = (V, Ã, W̃, δ, b), where Ã = {(u, v) | (v, u) ∈ A} and W̃(u, v) = W(v, u) for all (u, v) ∈ Ã; that is, G̃ is formed by reversing the directions of all the arcs of G. Clearly, finding a shortest s-t path in G is equivalent to finding a shortest t-s path in G̃. For each (u, v) ∈ Ã, let x(u, v) denote the decision variable. A primal LP model for finding a shortest t-s path in G̃ is

(P)   Min Σ_{(u,v)∈Ã} W̃(u, v)·x(u, v)   (2.2.1)

Subject to

Σ_{v:(u,v)∈Ã} x(u, v) − Σ_{v:(v,u)∈Ã} x(v, u) = 1 if u = t; −1 if u = s; 0 for all u ∈ V \ {s, t},   (2.2.2)

x(u, v) ≥ 0 for all (u, v) ∈ Ã.   (2.2.3)

As long as there exists an s-t path in G, it can be easily shown that a binary optimal solution to Model (2.2.1-2.2.3) exists. In fact, Model (2.2.1-2.2.3) simply sends a unit flow from a supplier t to a customer s in G̃ with least cost, where the price of sending a unit flow along any (u, v) ∈ Ã is W̃(u, v). One option is to find a shortest t-s path in G̃ and send a unit flow along this path. The general option is to divide the unit flow into pieces; however, to minimize the cost, each piece must be sent along a shortest t-s path in G̃. This backward version of the primal LP model has a very nice dual, which can be expressed with respect to G. It is stated as
(D)   Max π(t) − π(s)   (2.2.4)

Subject to

π(v) − π(u) ≤ W(u, v) for all (u, v) ∈ A,   (2.2.5)

where for each v ∈ V, the decision variable π(v) is called the potential of v. Constraint (2.2.5) can be derived from its original form π(u) − π(v) ≤ W̃(u, v) for all (u, v) ∈ Ã. The constraint says that for each (u, v) ∈ A, a triangle inequality relative to s holds.

An obvious advantage of (D) is that a feasible solution is easy to find; at the least, π = 0 is one. The key idea of the primal-dual algorithm for the shortest path problem, illustrated in [42], is to start from a feasible solution π to (D) and search for a feasible solution x to (P) such that for each (u, v) ∈ A, x(u, v) = 0 whenever W(u, v) − π(v) + π(u) > 0. If such an x is found, then a shortest s-t path in G can be found; in fact, such an x corresponds to an s-t path on which every arc (u, v) satisfies the equality W(u, v) − π(v) + π(u) = 0. If such an x cannot be found, then some procedure is needed to update π such that Constraint (2.2.5) remains satisfied and Objective (2.2.4) improves. An important feature of the primal-dual algorithm is that any equality in Constraint (2.2.5) still holds after π is updated. Another important feature is that after π is updated, some strict inequality in Constraint (2.2.5) may become an equality. The primal-dual algorithm keeps attempting to
construct an s-t path in G by using the arcs that correspond to the equalities in Constraint (2.2.5). According to [42], given the initial feasible solution π = 0 to (D), the primal-dual algorithm behaves essentially the same as the Dijkstra's algorithm that searches from s to t in G. Hence the Dijkstra's algorithm can be derived from the primal-dual algorithm. Since the A* algorithm with a consistent heuristic can be derived from the Dijkstra's algorithm, and the Dijkstra's algorithm can be derived from the primal-dual algorithm, the A* algorithm that uses a consistent heuristic can be derived from the primal-dual algorithm. But this derivation needs the Dijkstra's algorithm as a bridge and involves changing the weight function. In this thesis, we show that if we use h to construct an initial feasible solution to (D), then applying the primal-dual algorithm directly leads to the A* algorithm that searches from s to t in G.
2.3 Derivation of the A*
The key point of our derivation is to choose π(0) = − h as the initial feasible solution to
(D). To justify the dual feasibility of π(0), we notice, by the consistency of h, that W(u, v)
+ h(v) ≥ h(u) for all (u, v) ∈ A. Hence W(u, v) − π(0)(v) ≥ −π(0)(u) for all (u, v) ∈ A. The
inequality can be rewritten as π(0)(v) − π(0)(u) ≤ W(u, v), which is exactly what the dual
feasibility requires.
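Checking the dual feasibility of π(0) = −h is exactly an arc-by-arc consistency check on h. A minimal sketch (the toy graph and heuristic values below are made-up):

```python
def is_dual_feasible(arcs, W, pi, tol=1e-12):
    """Check Constraint (2.2.5): pi(v) - pi(u) <= W(u, v) on every arc (u, v)."""
    return all(pi(v) - pi(u) <= W(u, v) + tol for (u, v) in arcs)
```

Calling it with pi = lambda v: -h(v) tests whether h is consistent, since the two conditions are equivalent as shown above.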
A nice property of (D) is that it does not require its solution to be nonnegative. Although π(0) = − h ≤ 0, what really matters is π(0)(t) − π(0)(s) = − h(t) + h(s) = h(s) ≥ 0. This means π(0) = − h is at least as good an initial feasible solution to (D) as π(0) = 0. But we still need to justify the validity of π(0) = − h; that is, we still need to show that the primal-dual algorithm that starts from the solution π(0) = − h to (D) can find a shortest s-t path in G as long as there exists an s-t path in G. It suffices to show the equivalence between the primal-dual algorithm that starts from − h and the A* algorithm that uses h. We now describe the best-first search version of the primal-dual algorithm that starts from − h.
Algorithm 2.3.1
Notations:
h: heuristic
O: Open list
E: Closed list
π : potential
f1: node selection key
pred: predecessor
θ: potential increment
Φ: cumulative potential increase
Steps:
Given G, s, t, and h
Step 1. Set Φ = 0. Set O = {s}, π(s) = −h(s), pred(s) = s, and E = φ. Set W(s, s) = 0.
Step 2. If O = φ and t ∉ E, then stop (there is no s-t path); otherwise, continue.
Step 3. Find u = arg min_{v∈O} f1(v), where f1(v) = W(pred(v), v) − π(v) + π(pred(v)). Set θ = W(pred(u), u) − π(u) + π(pred(u)). Set Φ = Φ + θ. Set O = O \ {u} and E = E ∪{u}. Set π(u) = −h(u) + Φ. If t ∈ E, then stop (a shortest s-t path is found); otherwise, continue.
Step 4. For each v ∈ O, set π(v) = −h(v) + Φ.
Step 5. For each v ∈ V such that (u, v) ∈ A and v ∉ E,
if v ∉ O, then
set O = O ∪{v}, pred(v) = u, and π(v) = −h(v) + Φ;
otherwise,
if W(pred(v), v) + π(pred(v)) > W(u, v) + π(u), then
set pred(v) = u.
Go to Step 2.
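A direct transcription of Algorithm 2.3.1 follows (an illustrative sketch only; for simplicity the Open list O is scanned linearly in Step 3 rather than kept in a heap, and the graph arguments are placeholders). By Theorem 2.3.2 below, the returned value π(t) − π(s) equals dist(s, t):

```python
def primal_dual_search(succ, W, h, s, t):
    """Algorithm 2.3.1: primal-dual search starting from pi(0) = -h."""
    weight = lambda u, v: 0.0 if (u, v) == (s, s) else W(u, v)  # Step 1: W(s, s) = 0
    Phi = 0.0                      # cumulative potential increase
    pi = {s: -h(s)}
    pred = {s: s}
    O, E = {s}, set()
    while O:                       # Step 2
        # Step 3: select u minimizing f1(v) = W(pred(v), v) - pi(v) + pi(pred(v))
        u = min(O, key=lambda v: weight(pred[v], v) - pi[v] + pi[pred[v]])
        theta = weight(pred[u], u) - pi[u] + pi[pred[u]]
        Phi += theta
        O.remove(u)
        E.add(u)
        pi[u] = -h(u) + Phi        # the potential of u becomes permanent
        if u == t:
            return pi[t] - pi[s]   # equals dist(s, t) by Theorem 2.3.2
        for v in O:                # Step 4: raise the open potentials
            pi[v] = -h(v) + Phi
        for v in succ(u):          # Step 5: scan the outgoing arcs of u
            if v in E:
                continue
            if v not in O:
                O.add(v)
                pred[v] = u
                pi[v] = -h(v) + Phi
            elif weight(pred[v], v) + pi[pred[v]] > weight(u, v) + pi[u]:
                pred[v] = u
    return None                    # Step 2: no s-t path
```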
For theoretical convenience, right after Step 5, for all v ∈ V \ (E∪O), we define π(v) = −h(v) + Φ. We can show that Algorithm 2.3.1, just like the classical version [2] of the primal-dual algorithm, maintains the dual feasibility throughout.
Theorem 2.3.1. The dual feasibility stated as Constraint (2.2.5) is maintained when
running Algorithm 2.3.1.
Proof. The proof is inductive. The base case is upon the completion of the first iteration; at this moment π = π(0), so the claim is trivially true. Suppose that right before the k-th iteration (k > 1), the potential of any v ∈ V is π(v) and π satisfies the dual feasibility. We need to show that right after the k-th iteration, the dual feasibility is still maintained. Right before the k-th iteration, let [E, O] denote the E-O cut, which is the set of arcs from E to O. We only need to show that the node selection rule in Step 3 of Algorithm 2.3.1 is equivalent to finding an arc (u, v) ∈ [E, O] such that W(u, v) − π(v) + π(u) is minimum. In fact,

min_{(u,v)∈[E,O]} [W(u, v) − π(v) + π(u)]
= min_{v∈O} min_{u∈E: (u,v)∈[E,O]} [W(u, v) − π(v) + π(u)]
= min_{v∈O} [ min_{u∈E: (u,v)∈[E,O]} [W(u, v) + π(u)] − π(v) ]
= min_{v∈O} [W(pred(v), v) + π(pred(v)) − π(v)].

Hence, the node selection rule in Step 3 of Algorithm 2.3.1 is equivalent to the arc selection rule in the classical version of the primal-dual algorithm. Since the latter guarantees the satisfaction of the dual feasibility right after the k-th iteration, the theorem is true.
The potential π in Algorithm 2.3.1 is meaningful: it is closely related to the distance labels of the nodes of G in the A* algorithm. The potentials of those nodes that have entered the Closed list E become permanent. We can show that the potential difference between any u ∈ E and s is actually the length of a shortest s-u path in G.
Theorem 2.3.2. After each iteration of Algorithm 2.3.1, π(u) − π(s) = dist(s, u) for any
node u ∈ E.
Proof. When node u enters E, an s-u pointer path, say P: v1 (= s) ~ v2 ~ … ~ vk (= u), is determined. Denote L(P) as the length of P. Note that π(v2) = W(v1, v2) + π(v1), …, π(vk) = W(vk−1, vk) + π(vk−1). By telescoping, we have π(vk) = L(P) + π(v1), i.e., π(u) − π(s) = L(P). Since there is an s-u path in G, there must be a shortest s-u path in G: any s-u path in G with length no longer than L(P) has only a finite number of arcs, hence the number of s-u paths in G with length no longer than L(P) is finite. Let P̂: v̂1 (= s) ~ v̂2 ~ … ~ v̂k (= u) be a shortest s-u path in G and denote L(P̂) as the length of P̂. By Theorem 2.3.1, Algorithm 2.3.1 maintains the dual feasibility stated as Constraint (2.2.5). Hence π(v̂2) ≤ W(v̂1, v̂2) + π(v̂1), …, π(v̂k) ≤ W(v̂k−1, v̂k) + π(v̂k−1). By telescoping, we have π(v̂k) ≤ L(P̂) + π(v̂1), i.e., π(u) − π(s) ≤ L(P̂). The two arguments jointly imply that P is in fact a shortest s-u path and π(u) − π(s) = dist(s, u).
We now show the equivalence between Algorithm 2.3.1 and the A* algorithm listed in Section 2.1.
Theorem 2.3.3. Under the same tie-breaking rule, Algorithm 2.3.1 is equivalent to the
A* algorithm that uses h, searching from s to t.
Proof. The proof is inductive. We need to show that right after each corresponding iteration, the two algorithms have the same Open list and Closed list, and for each node in the Open list, the two algorithms assign the same predecessor in the Closed list. The base case is upon the completion of the first iteration of the two algorithms, respectively. In the base case, s is the only node "closed" by the two algorithms, so the base case is trivially true. Suppose (inductive hypothesis) that right before the k-th iteration (k > 1) of the two algorithms, the claim above is true. We now show that the claim still holds right after the k-th iteration.
Firstly, we need to show that the node selection rule in Step 3 of Algorithm 2.3.1 is
equivalent to the node selection rule in the A* algorithm. Consider the moment
Algorithm 2.3.1 is about to enter its Step 3, and at this moment (arbitrarily) consider a node v ∈ O. Note that v must have a predecessor, say u ∈ E. The selection key of v is W(u, v) − π(v) + π(u). Note that

W(u, v) − π(v) + π(u)
= W(u, v) − (−h(v) + Φ) + π(u)
= W(u, v) + π(u) − π(s) + h(v) − Φ + π(s).

By Theorem 2.3.2, we have π(u) − π(s) = dist(s, u). Hence

W(u, v) − π(v) + π(u)
= W(u, v) + dist(s, u) + h(v) − Φ + π(s).

We can see that W(u, v) + dist(s, u) = W(u, v) + d(u) = d(v) and d(v) + h(v) = f(v). Also note that both Φ and π(s) remain the same when different nodes in O are considered. Hence, by the inductive hypothesis, under the same tie-breaking rule, Algorithm 2.3.1 selects the same node from O as the A* algorithm.
Secondly, we need to show that, under the same tie-breaking rule, after a node, say u, is removed from O and put into E in Step 3 of Algorithm 2.3.1, the predecessor update on any node v ∈ O is the same as in the A* algorithm. In fact, if there is no arc (u, v) ∈ A,
there won’t be any predecessor update on v. If there is an arc (u, v) ∈ A, then in
Algorithm 2.3.1, the update is based on comparing W(pred(v), v) + π(pred(v)) with W(u,
v) + π(u). By Theorem 2.3.2 again,
W(pred(v), v) + π(pred(v))
= W(pred(v), v) + π(pred(v)) − π(s) + π(s)
= W(pred(v), v) + dist(s, pred(v)) + π(s)
and
W(u, v) + π(u)
= W(u, v) + π(u) − π(s) + π(s)
= W(u, v) + dist(s, u) + π(s).
Note that π(s) is common, hence the comparison is actually between W(pred(v), v) +
dist(s, pred(v)) = d(v) and W(u, v) + dist(s, u) = W(u, v) + d(u). Under the same
tie-breaking rule, this is just the predecessor update rule in the A* algorithm.
Finally, note that if there is a node v ∈ V \ (E∪O) such that (u, v) ∈ A, then Algorithm
2.3.1 will put it into O and assign it a predecessor u. Hence W(pred(v), v) + π(pred(v)) =
W(u, v) + π(u) = W(u, v) + dist(s, u) + π(s), which implies that v receives a distance label
d(v) = W(u, v) + dist(s, u). This is just what the A* algorithm does.
Combining the three arguments above, we have shown that the two algorithms maintain the same Open list and Closed list, and for each node in the Open list, they assign the same predecessor in the Closed list.
2.4 Duality
There is a nice property of the A* algorithm that uses a consistent heuristic. For the A* algorithm listed at the beginning of this chapter, suppose a node u1 is closed no later than another node u2; then according to [25], [27], f(u1) ≤ f(u2). This property is called key monotonicity. By key monotonicity, before t is closed, f(u) ≤ dist(s, t) for any closed node u. By combining the key monotonicity property of the A* algorithm and Algorithm 2.3.1, we have the following interesting result:

Theorem 2.4.1. After each iteration of Algorithm 2.3.1, π(t) − π(s) = max_{u∈E} f(u) ≤ dist(s, t).
Proof. The inequality directly follows from the key monotonicity property. We now show the equality. Consider any iteration of Algorithm 2.3.1 and suppose node u is selected in Step 3 during this iteration. Upon the completion of this iteration, by the proof of Theorem 2.3.3, we see that Φ = f(u) + π(s); also note that π(t) = − h(t) + Φ = Φ, hence π(t) − π(s) = f(u). By the key monotonicity property, upon the completion of this iteration, f(u) = max_{u′∈E} f(u′).
If we define d(t) = +∞ when t ∉ E∪O, then π(t) − π(s) ≤ d(t) always holds, because d(t) ≥ dist(s, t) always holds. The inequality π(t) − π(s) ≤ d(t) can be viewed as the weak duality. The duality gap is d(t) − (π(t) − π(s)) = d(t) − max_{u∈E} f(u). As the primal objective, d(t) is monotonically nonincreasing; as the dual objective, π(t) − π(s) = max_{u∈E} f(u) is monotonically nondecreasing. By completeness, if there exists an s-t path in G, then t will eventually be closed. At the moment t is closed, we have max_{u∈E} f(u) = f(t) = d(t). Hence both π(t) − π(s) and d(t) reach their optimal values and the duality gap is eliminated. The final equality is just the so-called strong duality.
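The weak and strong duality can be observed numerically. The sketch below (illustrative only, on a made-up toy graph) runs A*-style iterations and records, after each node is closed, the dual objective max over closed nodes of f(u) (which, by key monotonicity, is the key of the node just closed) and the primal objective d(t); the gap closes exactly when t is closed:

```python
import heapq
import math

def duality_trace(succ, W, h, s, t):
    """Record (dual, primal) = (f of the node just closed, current d(t)) per iteration."""
    d, closed, trace = {s: 0.0}, set(), []
    heap = [(h(s), s)]
    while heap:
        f_u, u = heapq.heappop(heap)
        if u in closed:          # skip stale heap entries
            continue
        closed.add(u)
        trace.append((f_u, d.get(t, math.inf)))  # dual objective, primal objective
        if u == t:               # strong duality: f(t) = d(t) at this point
            return trace
        for v in succ(u):
            dv = d[u] + W(u, v)
            if v not in closed and dv < d.get(v, math.inf):
                d[v] = dv
                heapq.heappush(heap, (dv + h(v), v))
    return trace
```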
Just like the A* algorithm, Algorithm 2.3.1 may terminate at its Step 2. Simple analysis (see [25]) of the A* algorithm shows that termination at Step 2 implies there is no s-t path in G. An explanation within the primal-dual framework is that when Algorithm 2.3.1 terminates at its Step 2, the potentials of all nodes outside E can be raised by the same arbitrary amount without violating Constraint (2.2.5), so the objective (2.2.4) is unbounded. This implies the infeasibility of (P). Also like the A* algorithm, Algorithm 2.3.1 may not terminate at all. This happens when there is no s-t path in G but there are infinitely many nodes connected with s via paths; under this circumstance, the objective (2.2.4) is also unbounded.
2.5 Heuristics
Our analysis so far shows that the selection of π(0) = –h is sound. Actually, –h defines a class of initial feasible solutions to (D). Suppose we have two consistent heuristics h1 and h2, and denote by PDi the primal-dual algorithm starting from –hi, i = 1, 2. By completeness, both PD1 and PD2 will successfully terminate as long as there exists an s-t path in G. If h1(v) > h2(v) for all v ∈ V \ {t} and there exists an s-t path in G, then a dominance theorem (see [25], [26], [27], [28]) on the A* algorithm implies that E1 ⊆ E2, where Ei denotes the final Closed list of PDi, i = 1, 2.
There are two interesting extreme cases. One is π(0) = 0, in which case Algorithm 2.3.1 reduces to the Dijkstra's algorithm; this just indicates the derivation of the Dijkstra's algorithm from the primal-dual algorithm. The other is π(0)(v) = − dist(v, t) for all v ∈ V, in which case Algorithm 2.3.1 closes only the nodes that lie on a shortest s-t path in G; this initial feasible solution to (D) is perfect.
Sometimes there exists some metric function H: V×V → R such that H(u, v) ≥ 0 for all u, v ∈ V, H(v, v) = 0 for all v ∈ V, and W(u, v) + H(v, w) ≥ H(u, w) for all (u, v) ∈ A and all w ∈ V. We then immediately have a consistent heuristic, say hH, defined as hH(v) = H(v, t) for all v ∈ V. Under some conditions, we can also find another consistent heuristic. Suppose we already have a partial solution represented by a shortest path tree T of G̃ rooted at t, found by the Dijkstra's algorithm that searches from t to s in G̃. Let ET and OT denote the Closed list and Open list associated with T. We define

hH,T(v) = min_{τ∈OT} [H(v, τ) + dist(τ, t)]  for all v ∈ V \ ET;
hH,T(v) = dist(v, t)  for all v ∈ ET.   (2.5.1)

It can be easily shown that hH,T is a consistent heuristic and, for any v ∈ V, as an estimate of dist(v, t), hH,T(v) is at least as good as hH(v); that is, hH(v) ≤ hH,T(v) ≤ dist(v, t).

The primal-dual algorithm can start from π(0) = −hH if H is available, or from π(0) = −hH,T if both H and T are available. In the latter case, starting from −hH,T may result in fewer closed nodes upon closing t than starting from −hH, but this does not necessarily mean better efficiency, since evaluating hH,T by (2.5.1) is more costly than evaluating hH. An issue that should be addressed is that the primal-dual algorithm starting from −hH,T may be able to find a solution before the moment t is closed; hence a different termination condition might apply. This is actually related to the bidirectional search discussed later.

Sometimes there also exists another type of heuristic h′ such that h′(v) ≥ 0 for all v ∈ V, h′(s) = 0, and W(u, v) + h′(u) ≥ h′(v) for all (u, v) ∈ A. Such an h′ is called consistent relative to s. Obviously, π(0) = h′ is a feasible solution to (D), and the corresponding objective value of (D) is π(0)(t) − π(0)(s) = h′(t) − h′(s) = h′(t) ≥ 0. It would seem that π(0) = h′ is also a better initial feasible solution to (D) than π(0) = 0; however, the following simple example shows that the primal-dual algorithm that starts from h′ may not terminate at all, even if there exists an s-t path in G.
Figure 2.5.1 shows a simple infinite graph in which we want to find a shortest s-t path. When applying the primal-dual algorithm with π(0) = {h′(s) = 0, h′(u) = 0, h′(t) = 0, and h′(vi) = i for i = 1, 2, …} as the initial feasible solution to (D), the algorithm (Algorithm 2.3.1 with initial node potentials set as this π(0)) will close s first, then v1, then v2, and so on. The node u will never be closed, let alone t. Hence the algorithm cannot successfully terminate.
[Figure 2.5.1 here.]

Figure 2.5.1: An example in which the primal-dual algorithm starting from some π(0) = h′ does not terminate; the length of every arc is 1 and the heuristic function h′ is {h′(s) = 0, h′(u) = 0, h′(t) = 0, and h′(vi) = i for i = 1, 2, …}.
2.6 Bidirectional Search
Although h′ is not a proper choice of the initial feasible solution to (D) for Algorithm
2.3.1 to start from, it can be used for bidirectional search. Consider the primal LP model
with respect to G, in which for each (u, v) ∈ A, we still use x(u, v) to denote the decision
variable:
(P′)   Min Σ_{(u,v)∈A} W(u, v)·x(u, v)   (2.6.1)

Subject to

Σ_{v:(u,v)∈A} x(u, v) − Σ_{v:(v,u)∈A} x(v, u) = 1 if u = s; −1 if u = t; 0 for all u ∈ V \ {s, t},   (2.6.2)

x(u, v) ≥ 0 for all (u, v) ∈ A.   (2.6.3)

This forward version of the primal LP model stands for sending a unit flow from a supplier s to a customer t in G with least cost. It has the following dual:

(D′)   Max π(s) − π(t)   (2.6.4)

Subject to

π(u) − π(v) ≤ W(u, v) for all (u, v) ∈ A.   (2.6.5)

Similar analysis can show that the primal-dual algorithm that uses –h′ as the initial feasible solution to (D′) is essentially the A* algorithm that searches from t to s in G̃
using the heuristic h′. This version of the primal-dual algorithm is exactly the backward
version of Algorithm 2.3.1. If both algorithms are used, searching toward each other, then
a bidirectional A* search can be established. For the backward version of Algorithm 2.3.1,
let O′, π ′ and pred′ denote the corresponding Open list, potential function, and
predecessor function, respectively. When the two search fronts (Open lists) meet, an s-t
path is found, and its length, denoted as L, can be expressed as L = [W(pred(v), v) +
π(pred(v)) − π(s)] + [W(v, pred ′(v)) + π ′(pred ′(v)) − π ′(t)], where v ∈ O∪O′ is the
meeting node. When the search continues, a sequence of lengths, say L1, L2, …, is
generated.
Let L̂ = min{L1, L2, …}, which is the length of the shortest s-t path in G found so far. Since π(t) − π(s) ≤ dist(s, t) ≤ L̂ and π′(s) − π′(t) ≤ dist(s, t) ≤ L̂, we have a termination condition for the bidirectional A* search, expressed via π and π′, as

max{π(t) − π(s), π′(s) − π′(t)} = L̂.   (2.6.6)

This condition is essentially the same as the one that appears in [43], and it can eventually be satisfied.

Again, the bidirectional A* search reduces to the bidirectional Dijkstra's search when h = h′ = 0. An alternative termination condition, according to [35] and Theorem 2.3.2, expressed via π and π′, is

min_{v∈O} [W(pred(v), v) + π(pred(v)) − π(s)] + min_{v′∈O′} [W(v′, pred′(v′)) + π′(pred′(v′)) − π′(t)] = L̂,   (2.6.7)

which can also eventually be satisfied.
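With h = h′ = 0, condition (2.6.7) becomes the familiar bidirectional Dijkstra rule: stop once the sum of the two heaps' minimum keys reaches the best s-t length found so far. A minimal sketch (illustrative only; adj and radj are the forward and reversed adjacency maps supplied by the caller):

```python
import heapq
import math

def bidirectional_dijkstra(adj, radj, s, t):
    """Bidirectional Dijkstra; stops when the two top keys sum to the best length."""
    dist = [{s: 0.0}, {t: 0.0}]       # forward / backward distance labels
    heaps = [[(0.0, s)], [(0.0, t)]]  # forward / backward Open lists
    closed = [set(), set()]
    graphs = [adj, radj]
    best = math.inf                   # length of the best s-t path found so far
    while heaps[0] and heaps[1]:
        # termination rule: condition (2.6.7) specialized to zero heuristics
        if heaps[0][0][0] + heaps[1][0][0] >= best:
            break
        side = 0 if heaps[0][0][0] <= heaps[1][0][0] else 1
        du, u = heapq.heappop(heaps[side])
        if u in closed[side]:         # stale heap entry
            continue
        closed[side].add(u)
        for v, w in graphs[side][u].items():
            dv = du + w
            if dv < dist[side].get(v, math.inf):
                dist[side][v] = dv
                heapq.heappush(heaps[side], (dv, v))
            if v in dist[1 - side]:   # fronts meet: candidate s-t length
                best = min(best, du + w + dist[1 - side][v])
    return best
```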
3 Traversing Probabilistic Graphs
Early efforts formulated the RDP problem as traversing a probabilistic graph, with the objective of finding a policy that has the smallest expected cost. This formulation is known as the Canadian Traveler Problem (CTP) (see Bar-Noy and Schieber 1991 [44]).
Papadimitriou and Yannakakis 1991 [45] proved the intractability of several variants of
CTP. Several modifications and extensions of CTP were discussed in [44], [46], and [47].
CTP is a special case of the stochastic shortest paths with recourse (SPR) problem of
Andreatta and Romeo 1988 [48], who presented a stochastic dynamic programming
formulation for SPR and noted its intractability. Polychronopoulos and Tsitsiklis 1996
[49] also presented a stochastic dynamic programming formulation for SPR and then
proved the intractability of several variants. Provan 2003 [50] proved that SPR is
intractable even if the underlying graph is directed and acyclic.
Although finding an optimal policy in the sense of the smallest expected cost is in general intractable, for some special graphs the problem is tractable, i.e., there exists an algorithm that can find an optimal policy in polynomial time. The aim of this chapter is to present the CR policy and show that it is an optimal policy for traversing a special probabilistic graph, the parallel graph, under the assumption that the traversability status of each nondeterministic arc is independently Bernoulli-distributed.
3.1 Probability Markers
A probabilistic graph is a finite directed graph denoted as G = (V, A, B, l, c, ρ), where V
is the set of nodes that contains a specified starting node s and a specified target node t, A
is the set of deterministic arcs, B is the set of nondeterministic arcs, l: A∪B → R+ is the
length function, c: B → R+ is the disambiguation cost function, ρ: B → (0, 1) is the
probability marker function such that for each e ∈ B, ρ(e) represents the probability that
e is not traversable. Without loss of generality, we can assume that l(a) < +∞ and c(a) < +∞ for all a ∈ A∪B.
For convenience we define X: A∪B → {0, 1} as an indicator function such that in any
realization of G, for each a ∈ A∪B, X(a) = 1 if a is nontraversable in this realization; X(a)
= 0 otherwise. Note that X(a) is deterministic for all a ∈ A and X(e) is random for all e ∈
B. We assume that for each e ∈ B, X(e) is independently Bernoulli(ρ(e))-distributed: P(X(e) = 1) = ρ(e) and P(X(e) = 0) = 1 − ρ(e).
As required by the early formulation of the RDP problem, it is also assumed that for each e ∈ B, the probability marker ρ(e) is known a priori. The probability marker might be
empirically estimated from historical data. Each time the agent traverses the graph, it actually faces a realization of the graph; that is, the indicator X is realized. As mentioned before, the agent does not have the information of the realized X; it only has the probabilistic information ρ a priori. But the agent can disambiguate the status of any e ∈ B once it reaches the tail of e. The disambiguation of an arc e ∈ B updates the graph G by transferring e from the set B to the set A, and in the meantime the value of X(e) is found. Hence the information on the graph G is dynamic. For convenience, we define the function ρ+: A∪B → R as an extension of ρ:

ρ+(a) = 0 if X(a) = 0; 1 if X(a) = 1; ρ(a) if a ∈ B.    (3.1.1)

We define the function l+: A∪B → R as an extension of l:

l+(a) = l(a) if 0 ≤ ρ+(a) < 1; +∞ if ρ+(a) = 1.    (3.1.2)

We define the function c+: A∪B → R as an extension of c:

c+(a) = c(a) if 0 < ρ+(a) < 1; 0 otherwise.    (3.1.3)

The tuple (ρ+, l+, c+) represents the updated knowledge of the graph G.
3.2 The CR Policy
We now introduce the CR policy that forms the core of [3]. The CR policy uses the CR
weight function in its shortest path subproblem. Under the setting of section 3.1, the CR
weight function WCR: A∪B → R+ is defined as

WCR(a) = l+(a) + c+(a) / (1 − ρ+(a)),    (3.2.1)

for all a ∈ A∪B. Note that the extended probability marker ρ+ fully characterizes the uncertainty status of the graph G (i.e., what is known deterministically and what is known only probabilistically). This knowledge, together with the extended length function l+ and the extended disambiguation cost function c+, is incorporated into the setting of the shortest path subproblem. We call a weight function W well posed with respect to the knowledge of the graph G if

W(a) = l(a) if X(a) = 0; W(a) = +∞ if X(a) = 1; l(a) < W(a) < +∞ if a ∈ B.    (3.2.2)

By inspection, for any a ∈ A∪B, WCR(a) = l(a) if ρ+(a) = 0; WCR(a) = +∞ if ρ+(a) = 1; and l(a) < WCR(a) = l(a) + c(a) / (1 − ρ(a)) < +∞ if 0 < ρ+(a) < 1. Hence, the CR weight function is well posed. In finding a shortest path in G, it prohibits any arc that is known to be nontraversable and penalizes any arc that is still nondeterministic.
With the CR weight function, the CR policy can be stated as follows: under the knowledge (ρ+, l+, c+) of the graph G, find a shortest path relative to the CR weight function (3.2.1) in G from the agent's current location to the target node t, and let the agent follow the shortest path plan until it reaches t or encounters a nondeterministic arc. In the former case, the navigation process completes successfully; in the latter case, the agent disambiguates the nondeterministic arc, say e. Upon completion of the disambiguation, e is transferred from the set B to the set A and the value of X(e) is found. If X(e) = 0, then the agent moves on along the planned path; otherwise, find a shortest path relative to the updated CR weight function (3.2.1) in G from the agent's current location to the target node t and let the agent follow the new shortest path plan. In searching for the shortest paths, the tie-breaking favors the deterministic arcs.
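The plan, move, disambiguate, replan loop described above can be sketched in Python. This is a minimal illustration rather than the dissertation's implementation: it uses a plain Dijkstra planner instead of A*, omits the deterministic-arc tie-breaking, assumes a simple digraph encoded as a dict keyed by (tail, head) pairs (so true parallel arcs need auxiliary nodes), and the function names are invented for this sketch.

```python
import heapq

def dijkstra(nodes, weight, s, t):
    # Plain Dijkstra shortest path; assumes t is reachable under finite weights
    # (e.g., G is convergent with respect to t).
    dist = {v: float('inf') for v in nodes}
    pred = {}
    dist[s] = 0.0
    pq = [(0.0, s)]
    while pq:
        d, u = heapq.heappop(pq)
        if d > dist[u]:
            continue
        for v, w in weight.get(u, {}).items():
            if d + w < dist[v]:
                dist[v], pred[v] = d + w, u
                heapq.heappush(pq, (d + w, v))
    path, v = [t], t
    while v != s:
        v = pred[v]
        path.append(v)
    return path[::-1]

def cr_navigate(nodes, arcs, s, t, X):
    # arcs: {(u, v): (length, cost, rho)}, with rho=None for deterministic arcs.
    # X: the realized indicator (1 = nontraversable), revealed on disambiguation.
    B = {a for a, (_, _, r) in arcs.items() if r is not None}
    total, u = 0.0, s
    while u != t:
        # CR weights under current knowledge (rho+ becomes 0 or 1 once disambiguated).
        w = {}
        for (a, b), (l, c, r) in arcs.items():
            if (a, b) in B:
                wt = l + c / (1.0 - r)   # still nondeterministic: penalized
            elif X[(a, b)]:
                wt = float('inf')        # known nontraversable: prohibited
            else:
                wt = l                   # known traversable
            w.setdefault(a, {})[b] = wt
        path = dijkstra(nodes, w, u, t)
        for a, b in zip(path, path[1:]):
            if (a, b) in B:
                total += arcs[(a, b)][1]  # pay the disambiguation cost
                B.discard((a, b))
                if X[(a, b)]:
                    break                 # blocked: replan from the current node
            total += arcs[(a, b)][0]
            u = b
    return total
```

Following Theorem 3.2.1, the loop only replans when a disambiguation discloses a blocked arc; a traversable result lets the agent continue on the already-planned path.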
There are two immediate questions to answer:

1) As the CR policy states, if the disambiguation of some arc e ∈ B discloses that X(e) = 0, then the agent takes the arc e and moves on. The question is: does replanning a new shortest path from where the agent is to t, based on the accrued disambiguation results, yield a strictly shorter path than the original plan?
2) Under the CR policy, can the agent always reach the target t?
The answer to the first question, under the setting of section 3.1 and the definition of the
CR weight function (3.2.1), is "no". That is, as long as a disambiguation says it is OK to go, there is no need to replan a new shortest path. The argument is summarized in the following theorem:
Theorem 3.2.1. Under the setting of section 3.1 and using the CR weight function
defined as (3.2.1), it’s unnecessary to replan a new shortest path from where the agent is
to t upon the moment when the agent completes a disambiguation that discloses the next
arc to be traversable.
Proof. Suppose the agent is at u and the next arc e = (u, v) ∈ B but X(e) = 0. Suppose that
before the disambiguation, the planned path from u to t is P; and after disambiguation,
the replanned path from u to t is P′. Suppose before disambiguation, the weight function
is W; after the disambiguation, the weight function is W ′ (the weight function is updated
because the tuple (ρ+, l+, c+) is updated). Note that P passes through e; we denote by Pvt the subpath of P from v to t.
Before the disambiguation, the weight of e is W(e) = l(e) + c(e) / (1 − ρ(e)), which means the length of P, denoted as L(P), is

L(P) = W(e) + L(Pvt) = l(e) + c(e) / (1 − ρ(e)) + L(Pvt).
After the disambiguation, the new weight of e is W ′(e) = l(e), which means the new
length of P, denoted as L′(P), is
L′(P) = l(e) + L′(Pvt).
Note that L′(Pvt) = L(Pvt), which implies L′(P) = l(e) + L(Pvt).
Let L′(P′) be the length of P′ after the disambiguation. We now show that L′(P) ≤ L′(P′).
In fact, we can discuss two cases. Case 1: P′ passes e. Denote P′vt as the subpath of P′
from v to t. Note that L′(P′vt) = L(P′vt) and Pvt is a v-t shortest path in G before the
disambiguation, hence
L′(P′) = l(e) + L′(P′vt) = l(e) + L(P′vt) ≥ l(e) + L(Pvt) = L′(P).
Case 2: P′ does not pass e. Note that P is a u-t shortest path in G before the
disambiguation, hence
L′(P′) = L(P′) ≥ L(P) > L′(P).
The answer to the second question is "yes" if the graph G satisfies a certain condition. One such condition is that G is a convergent graph, which is defined as follows:

Definition 3.2.2. Graph G is called convergent with respect to t if for any v ∈ V, v ≠ t, there is a v-t path that contains only arcs in A.
Theorem 3.2.3. Under the setting of section 3.1, for a convergent graph G, the CR policy that uses the CR weight function defined in (3.2.1) has finite expected cost.

Proof. The finiteness of G implies there are only finitely many possibilities. Suppose (for contradiction) that there is a positive probability that the agent pays an infinitely large cost to reach t (i.e., the agent can never reach t). Since |B| is finite, the agent pays at most a finite disambiguation cost. Hence it must be stuck in some cycle. Since the CR policy uses positive weight functions and plans shortest paths, by the convergence of G the cycle will eventually be avoided, a contradiction.
A special convergent graph is the so-called parallel graph. Under the setting of section 3.1, a parallel graph is a graph G with V = {s, t}. We next show that the CR policy yields the smallest expected cost for traversing the parallel graph.
3.3 Parallel Graph
Showing the optimality of the CR policy in the sense of expectation in principle requires comparing the CR policy with all other policies. A simple argument shows that there are exponentially many distinct policies even for traversing the parallel graph. In fact, without loss of generality, we can assume |A| = 1 since only the shortest deterministic arc deserves consideration. Let a ∈ A be the only deterministic arc. For convenience, let B = {e1, e2, …, em}. There are at least (m + 1)! distinct policies for traversing the parallel graph, each one simply being a permutation of a, e1, e2, …, em. A permutation, denoted a1 → a2 → … → am+1, of a, e1, e2, …, em means to try a1 first; if X(a1) = 0, then take a1 and go; otherwise, try a2, and so forth. Of course, all these permutations only form one class of policies; there can be other policies that do not belong to this class. Obviously, when m is large, brute-force enumeration must give way to a smarter approach.
Our approach is to prune the dynamic programming search tree (DPST) that contains all
the possible policies and show that the CR policy is an optimal decision sequence in the
DPST. To show this, we first prove a weak result that involves an important concept:
balk.
For general graphs, it may sometimes be advantageous to disambiguate an arc and then, even if the arc is discovered to be traversable, not to traverse it immediately. We call such a delay a balk. For example, in Figure 3.3.1, the optimal policy is to traverse a1, disambiguate e1 at v1, traverse a2, and disambiguate e2 at v2; if e2 is traversable, then traverse it; otherwise, take the path v2 ~ v1 ~ t or v2 ~ v3 ~ t according to whether e1 is or is not traversable. Although the early disambiguation of e1 adds an extra expected cost of (1/2)·1, it saves an expected length of (1/2)·(1/2)·(5 + 5) on a possible later backtrack if both e1 and e2 are discovered not to be traversable. We say a policy is balk-free if it has the property that, upon any disambiguation revealing that an arc is traversable, this arc is immediately traversed. Obviously, the CR policy is a balk-free policy, and the (m+1)! permutations mentioned at the beginning of this section form a class of balk-free policies.
Theorem 3.3.1. Under the setting of section 3.1, the CR policy for traversing the parallel
graph has the minimum expected cost among the class of balk-free policies.
Figure 3.3.1: An example of a general (nonparallel) graph where the optimal policy requires a balk. Each arc is bidirectional and labeled with its length. The dashed arcs e1 and e2 are probabilistic; each has probability 1/2 of being traversable, and both have disambiguation cost 1.
To prove this theorem, we first prove two lemmas.
For convenience, for the balk-free policy a1 → a2 → … → am+1 for traversing the parallel graph, for i = 1, 2, …, m+1, let ρi = 0 if ai ∈ A and ρi = ρ(ai) if ai ∈ B; let li = l(ai); let ci = 0 if ai ∈ A and ci = c(ai) if ai ∈ B; and let hi = li + ci / (1 − ρi). Also, denote by Ebf(a1 → a2 → … → am+1) the expected cost of the policy a1 → a2 → … → am+1.
Lemma 3.3.2. Ebf(a1 → a2 → … → am+1)
= (1 − ρ1)h1 + ρ1(1 − ρ2)h2 + … + ρ1⋯ρm(1 − ρm+1)hm+1 + ρ1⋯ρm+1l(a).
Proof. We use the decision tree in Figure 3.3.2 to calculate Ebf(a1 → a2 → … → am+1).

Figure 3.3.2: The decision tree of the balk-free policy a1 → a2 → … → am+1 for traversing the parallel graph. Each leaf node is a possible outcome of the cost.
As Figure 3.3.2 shows,

Ebf(a1 → a2 → … → am+1)
= (1 − ρ1)(c1 + l1) + ρ1(1 − ρ2)(c1 + c2 + l2) + …
+ ρ1⋯ρm(1 − ρm+1)(c1 + c2 + … + cm+1 + lm+1)
+ ρ1⋯ρmρm+1(c1 + c2 + … + cm+1 + l(a)).
By combining like terms and noting that hi = li + ci / (1 − ρi) for each i = 1, 2, …, m+1, we have that

Ebf(a1 → a2 → … → am+1)
= c1 + (1 − ρ1)l1 + ρ1c2 + ρ1(1 − ρ2)l2 + …
+ ρ1⋯ρmcm+1 + ρ1⋯ρm(1 − ρm+1)lm+1
+ ρ1⋯ρmρm+1l(a)
= (1 − ρ1)h1 + ρ1(1 − ρ2)h2 + … + ρ1⋯ρm(1 − ρm+1)hm+1 + ρ1⋯ρm+1l(a).
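Lemma 3.3.2's closed form can be cross-checked against a direct summation over all 2^m realizations of X. The sketch below is illustrative: the tuple encoding (l_i, c_i, ρ_i) for the nondeterministic arcs and the function names are assumptions, not notation from the text.

```python
from itertools import product

def E_bf(arcs, l_a):
    # Closed-form expected cost of the balk-free policy that tries the
    # nondeterministic arcs in the order given by `arcs` (each (l, c, rho)),
    # falling back to the deterministic arc of length l_a (Lemma 3.3.2).
    exp, prefix = 0.0, 1.0
    for l, c, r in arcs:
        h = l + c / (1.0 - r)
        exp += prefix * (1.0 - r) * h   # first traversable arc is the i-th one
        prefix *= r                      # all earlier arcs were blocked
    return exp + prefix * l_a            # every arc blocked: take a

def E_bf_enumerate(arcs, l_a):
    # Same expectation, computed by summing cost * probability over all 2^m
    # joint realizations of the indicators X(e_i).
    m = len(arcs)
    total = 0.0
    for xs in product([0, 1], repeat=m):
        p = 1.0
        for (l, c, r), x in zip(arcs, xs):
            p *= r if x else (1.0 - r)
        cost = 0.0
        for (l, c, r), x in zip(arcs, xs):
            cost += c                    # pay the disambiguation
            if x == 0:
                cost += l                # traversable: take it and stop
                break
        else:
            cost += l_a                  # all blocked: deterministic fallback
        total += p * cost
    return total
```

For the two-arc instance arcs = [(2, 1, 0.5), (3, 1, 0.25)] with l(a) = 10, both functions evaluate to 4.875.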
Lemma 3.3.3. For two balk-free policies a1 → a2 → … → ak → ak+1 → … → am+1 and a1 → a2 → … → ak+1 → ak → … → am+1 for traversing the parallel graph,

Ebf(a1 → a2 → … → ak+1 → ak → … → am+1) − Ebf(a1 → a2 → … → ak → ak+1 → … → am+1)
= ρ1⋯ρk−1(1 − ρk)(1 − ρk+1)(hk+1 − hk).

Proof. By Lemma 3.3.2 and simple algebraic operations we have

Ebf(a1 → a2 → … → ak+1 → ak → … → am+1) − Ebf(a1 → a2 → … → ak → ak+1 → … → am+1)
= ρ1⋯ρk−1(1 − ρk+1)hk+1 + ρ1⋯ρk−1ρk+1(1 − ρk)hk − ρ1⋯ρk−1(1 − ρk)hk − ρ1⋯ρk(1 − ρk+1)hk+1
= ρ1⋯ρk−1(1 − ρk)(1 − ρk+1)(hk+1 − hk).
Proof of Theorem 3.3.1. By Lemma 3.3.3, hk+1 < hk implies Ebf(a1 → a2 → … → ak+1 → ak → … → am+1) − Ebf(a1 → a2 → … → ak → ak+1 → … → am+1) ≤ 0. Hence if hk+1 < hk, changing the policy a1 → a2 → … → ak → ak+1 → … → am+1 by swapping ak and ak+1 does not increase the expected cost. We can change any balk-free policy into the CR policy simply by reordering the arcs in ascending order of the CR weights via bubble sort (see [5]). Since each single adjacent swap does not raise the expected cost, the expected cost of the CR policy is no greater than that of any other balk-free policy.
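The bubble-sort argument can be checked by brute force on small instances: ordering all m+1 arcs by ascending h_i = l_i + c_i/(1 − ρ_i) should attain the minimum of Ebf over all (m+1)! permutations. In the sketch below (names and encoding are illustrative, not from the text), the deterministic arc a is encoded as an arc with ρ = 0 and c = 0, so once it is reached it is always taken and no later arc matters.

```python
from itertools import permutations

def E_bf_order(order):
    # Expected cost of trying arcs in `order`; each arc is (l, c, rho).
    # The deterministic arc is encoded with rho = 0, c = 0, after which the
    # surviving probability `prefix` drops to 0 and later terms vanish.
    exp, prefix = 0.0, 1.0
    for l, c, r in order:
        exp += prefix * (1.0 - r) * (l + c / (1.0 - r))
        prefix *= r
    return exp

def cr_is_optimal_balk_free(arcs):
    # Brute-force check of Theorem 3.3.1 on one instance: the ascending-h
    # (CR) ordering matches the minimum over all permutations.
    best = min(E_bf_order(list(p)) for p in permutations(arcs))
    cr = E_bf_order(sorted(arcs, key=lambda a: a[0] + a[1] / (1.0 - a[2])))
    return cr <= best + 1e-12
```

This only verifies particular instances, of course; the theorem's proof covers the general case.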
We now show the strong result.
Theorem 3.3.4. Under the setting of section 3.1, the CR policy for traversing the parallel
graph has the minimum expected cost.
To prove this theorem, we first prove two reduction lemmas.
For convenience, let E*({y1, y2, …, yk} | {x1, x2, …, xm+1−k}) denote the minimum
expected cost for the agent to traverse the parallel graph given that the arcs x1, x2, …,
xm+1−k are known to be traversable and the arcs y1, y2, …, yk are nondeterministic, where
y1, y2, …, yk, x1, x2, …, xm+1−k are the distinct members of A∪B. This notation is very
useful for describing the general decision tree for traversing the parallel graph. Since x1, x2, …, xm+1−k are known to be traversable, E*({y1, y2, …, yk} | {x1, x2, …, xm+1−k}) can be simplified because only the deterministic arc with the minimum length matters. Hence we have
E*({y1, y2, …, yk} | {x1, x2, …, xm+1−k})
= E*({y1, y2, …, yk} | argmin{l(x1), l(x2), …, l(xm+1−k)}).
Let x0 = argmin{l(x1), l(x2), …, l(xm+1−k)}, then
E*({y1, y2, …, yk} | {x1, x2, …, xm+1−k})
= E*({y1, y2, …, yk} | x0).
Lemma 3.3.5. E*({y1, y2, …, yk} | x0)
= min{l(x0), min_{i = 1, 2, …, k} [c(yi) + ρ(yi)⋅E*({y1, y2, …, yk} \ yi | x0) + (1 − ρ(yi))⋅E*({y1, y2, …, yk} \ yi | argmin{l(x0), l(yi)})]}.
Proof. Conditioning on first disambiguating any nondeterministic arc yi, the minimum
expected cost of traversal is c(yi) + ρ(yi)⋅E*({y1, y2, …, yk} \ yi | x0) + (1− ρ(yi))⋅E*({y1,
y2, …, yk} \ yi | argmin{l(x0), l(yi)}). Note that this must be compared with l(x0), hence the
lemma is true.
Lemma 3.3.6. If l(x0) ≤ l(yj) for some j, then

E*({y1, y2, …, yk} | x0)
= min{l(x0), min_{i = 1, 2, …, k; i ≠ j} [c(yi) + ρ(yi)⋅E*({y1, y2, …, yk} \ yi | x0) + (1 − ρ(yi))⋅E*({y1, y2, …, yk} \ yi | argmin{l(x0), l(yi)})]}.
Proof. Note that l(x0) ≤ l(yj) implies x0 = argmin{l(x0), l(yj)}. Hence the minimum expected cost conditioned on first disambiguating yj is

ρ(yj)⋅(c(yj) + E*({y1, y2, …, yk} \ yj | x0)) + (1 − ρ(yj))⋅(c(yj) + E*({y1, y2, …, yk} \ yj | argmin{l(x0), l(yj)}))
= c(yj) + E*({y1, y2, …, yk} \ yj | x0).

Note, by Lemma 3.3.5, that

E*({y1, y2, …, yk} \ yj | x0)
= min{l(x0), min_{i = 1, 2, …, k; i ≠ j} [c(yi) + ρ(yi)⋅E*({y1, y2, …, yk} \ {yj, yi} | x0) + (1 − ρ(yi))⋅E*({y1, y2, …, yk} \ {yj, yi} | argmin{l(x0), l(yi)})]}.

Also note that for i ≠ j, removing yj from the pool of available nondeterministic arcs cannot lower the optimal cost:

E*({y1, y2, …, yk} \ {yj, yi} | x0) ≥ E*({y1, y2, …, yk} \ yi | x0)

and

E*({y1, y2, …, yk} \ {yj, yi} | argmin{l(x0), l(yi)}) ≥ E*({y1, y2, …, yk} \ yi | argmin{l(x0), l(yi)}).

We have that

c(yj) + E*({y1, y2, …, yk} \ yj | x0)
> E*({y1, y2, …, yk} \ yj | x0)
≥ min{l(x0), min_{i = 1, 2, …, k; i ≠ j} [c(yi) + ρ(yi)⋅E*({y1, y2, …, yk} \ yi | x0) + (1 − ρ(yi))⋅E*({y1, y2, …, yk} \ yi | argmin{l(x0), l(yi)})]}.

By Lemma 3.3.5 again,

E*({y1, y2, …, yk} | x0)
= min{l(x0), min_{i = 1, 2, …, k} [c(yi) + ρ(yi)⋅E*({y1, y2, …, yk} \ yi | x0) + (1 − ρ(yi))⋅E*({y1, y2, …, yk} \ yi | argmin{l(x0), l(yi)})]}.

Since the i = j term of this minimization is dominated, it can be dropped. Hence

E*({y1, y2, …, yk} | x0)
= min{l(x0), min_{i = 1, 2, …, k; i ≠ j} [c(yi) + ρ(yi)⋅E*({y1, y2, …, yk} \ yi | x0) + (1 − ρ(yi))⋅E*({y1, y2, …, yk} \ yi | argmin{l(x0), l(yi)})]}.
Remark. Lemma 3.3.5 simply says that the minimum expected cost of traversing the parallel graph can be evaluated by dynamic programming, which has exponential complexity. Lemma 3.3.6 says there is a way of pruning the DPST such that the optimal decision sequence still remains in the pruned search tree. More concretely, if the length of a nondeterministic arc is no less than that of the shortest arc known to be traversable, then the search branch that goes to disambiguating this nondeterministic arc can be pruned.
It is instructive to capture the structure of the DPST from the simple parallel graph in which A = {a} and B = {e1, e2}. The tree is shown in Figure 3.3.3.
Figure 3.3.3: The dynamic programming search tree for finding the optimal policy for traversing the parallel graph in which A = {a} and B = {e1, e2}. Here ρ1 = ρ(e1), ρ2 = ρ(e2), l0 = l(a), l1 = l(e1), l2 = l(e2), c1 = c(e1), and c2 = c(e2). The root of the tree is the problem of evaluating E*({e1, e2} | a), which is recursively reduced into subproblems via conditioning.
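The recursion of Lemma 3.3.5 can be written directly as a memoized dynamic program. The sketch below is illustrative (the function names and the (l, c, ρ) tuple encoding are assumptions); the state is the tuple of remaining nondeterministic arcs together with the length of the shortest arc currently known to be traversable.

```python
from functools import lru_cache

def E_star(arcs, l0):
    # Optimal expected cost E*({e1..em} | a) for the parallel graph via the
    # DP of Lemma 3.3.5. arcs: iterable of (l, c, rho); l0 = l(a).
    arcs = tuple(arcs)

    @lru_cache(maxsize=None)
    def E(remaining, best_len):
        # Option 1: stop and take the shortest known-traversable arc.
        options = [best_len]
        # Option 2: disambiguate some remaining arc and recurse on the outcome.
        for i, (l, c, r) in enumerate(remaining):
            rest = remaining[:i] + remaining[i + 1:]
            options.append(c + r * E(rest, best_len)
                             + (1 - r) * E(rest, min(best_len, l)))
        return min(options)

    return E(arcs, l0)
```

On the instance A = {a} with l(a) = 10 and B = {(2, 1, 0.5), (3, 1, 0.25)}, this DP returns 4.875, matching the CR policy's expected cost, as Theorem 3.3.4 predicts.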
Proof of Theorem 3.3.4. The proof is by induction on |B|. For convenience, let ECR denote the expected cost of the CR policy. The base case is |B| = 0, in which the theorem is trivially true. Suppose (inductive hypothesis) that the CR policy has the minimum expected cost when |B| = k ≥ 0. Now, without loss of generality, consider any case in which |B| = k + 1, A = {a}, and B = {e1, e2, …, ek+1}. We need to show that E*({e1, e2, …, ek+1} | a) = ECR({e1, e2, …, ek+1} | a). If l(a) ≤ l(ei) for each i = 1, 2, …, k+1, then E*({e1, e2, …, ek+1} | a) = l(a) = ECR({e1, e2, …, ek+1} | a), and the claim is trivial. We now consider the nontrivial case in which there exists at least one i such that l(a) > l(ei).
In the DPST for evaluating E*({e1, e2, …, ek+1} | a), by Lemma 3.3.6, we only need to consider the branches that go to disambiguating the nondeterministic arcs with lengths strictly less than l(a). By Lemma 3.3.5, suppose (without loss of generality) that

E*({e1, e2, …, ek+1} | a)
= min{l(a), min_{i = 1, 2, …, k+1} [c(ei) + ρ(ei)⋅E*({e1, e2, …, ek+1} \ ei | a) + (1 − ρ(ei))⋅E*({e1, e2, …, ek+1} \ ei | argmin{l(a), l(ei)})]}
= min{l(a), c(ek+1) + ρ(ek+1)⋅E*({e1, e2, …, ek} | a) + (1 − ρ(ek+1))⋅E*({e1, e2, …, ek} | argmin{l(a), l(ek+1)})}
= min{l(a), c(ek+1) + ρ(ek+1)⋅E*({e1, e2, …, ek} | a) + (1 − ρ(ek+1))⋅E*({e1, e2, …, ek} | ek+1)},

with the last equality due to l(ek+1) < l(a).
By the inductive hypothesis, E*({e1, e2, …, ek} | a) = ECR({e1, e2, …, ek} | a) and E*({e1, e2, …, ek} | ek+1) = ECR({e1, e2, …, ek} | ek+1), hence

E*({e1, e2, …, ek+1} | a)
= min{l(a), c(ek+1) + ρ(ek+1)⋅ECR({e1, e2, …, ek} | a) + (1 − ρ(ek+1))⋅ECR({e1, e2, …, ek} | ek+1)}.

Without loss of generality, further suppose that he,1 ≤ he,2 ≤ … ≤ he,k, where he,i = l(ei) + c(ei) / (1 − ρ(ei)) for i = 1, 2, …, k. Let p ≤ k and q ≤ k be integers such that he,1 ≤ he,2 ≤ … ≤ he,p ≤ l(a) and he,1 ≤ he,2 ≤ … ≤ he,q ≤ l(ek+1). Since l(ek+1) < l(a), we have q ≤ p. By Lemma 3.3.2,
ECR({e1, e2, …, ek} | a)
= (1 − ρ(e1))⋅he,1 + ρ(e1)⋅(1 − ρ(e2))⋅he,2 + … + ρ(e1)⋯ρ(ep−1)⋅(1 − ρ(ep))⋅he,p + ρ(e1)⋯ρ(ep)⋅l(a)

and

ECR({e1, e2, …, ek} | ek+1)
= (1 − ρ(e1))⋅he,1 + ρ(e1)⋅(1 − ρ(e2))⋅he,2 + … + ρ(e1)⋯ρ(eq−1)⋅(1 − ρ(eq))⋅he,q + ρ(e1)⋯ρ(eq)⋅l(ek+1).
Hence

E*({e1, e2, …, ek+1} | a)
= min{l(a), c(ek+1) + ρ(ek+1)⋅((1 − ρ(e1))⋅he,1 + ρ(e1)⋅(1 − ρ(e2))⋅he,2 + … + ρ(e1)⋯ρ(ep−1)⋅(1 − ρ(ep))⋅he,p + ρ(e1)⋯ρ(ep)⋅l(a)) + (1 − ρ(ek+1))⋅((1 − ρ(e1))⋅he,1 + ρ(e1)⋅(1 − ρ(e2))⋅he,2 + … + ρ(e1)⋯ρ(eq−1)⋅(1 − ρ(eq))⋅he,q + ρ(e1)⋯ρ(eq)⋅l(ek+1))}
= min{l(a), c(ek+1) + (1 − ρ(ek+1))⋅ρ(e1)⋯ρ(eq)⋅l(ek+1) + (1 − ρ(e1))⋅he,1 + ρ(e1)⋅(1 − ρ(e2))⋅he,2 + … + ρ(e1)⋯ρ(eq−1)⋅(1 − ρ(eq))⋅he,q + ρ(ek+1)⋅(ρ(e1)⋯ρ(eq)⋅(1 − ρ(eq+1))⋅he,q+1 + … + ρ(e1)⋯ρ(ep−1)⋅(1 − ρ(ep))⋅he,p + ρ(e1)⋯ρ(ep)⋅l(a))}.
Note that

c(ek+1) + (1 − ρ(ek+1))⋅ρ(e1)⋯ρ(eq)⋅l(ek+1)
≥ c(ek+1)⋅ρ(e1)⋯ρ(eq) + (1 − ρ(ek+1))⋅ρ(e1)⋯ρ(eq)⋅l(ek+1)
= ρ(e1)⋯ρ(eq)⋅(1 − ρ(ek+1))⋅he,k+1,

hence

E*({e1, e2, …, ek+1} | a) ≥ min{l(a), Ebf(e1 → e2 → … → eq → ek+1 → … → a → …)},

where e1 → e2 → … → eq → ek+1 → … → a → … is a balk-free policy with ek+1 at the (q+1)-th position and a at the (p+1)-th position.
By Theorem 3.3.1, we have

l(a) ≥ ECR({e1, e2, …, ek+1} | a)

and

Ebf(e1 → e2 → … → eq → ek+1 → … → a → …) ≥ ECR({e1, e2, …, ek+1} | a),

hence

E*({e1, e2, …, ek+1} | a) ≥ ECR({e1, e2, …, ek+1} | a).

But by definition

E*({e1, e2, …, ek+1} | a) ≤ ECR({e1, e2, …, ek+1} | a),

hence

E*({e1, e2, …, ek+1} | a) = ECR({e1, e2, …, ek+1} | a).
Remark. Theorem 3.3.4 tells us that the problem of traversing the probabilistic parallel graph, under the assumption of independent probability markers, is computationally tractable. Obviously, to execute the CR policy under this scenario, we only need to find the potentially traversable arc with the smallest CR weight. To facilitate the min-extraction process, we can build a heap (e.g., a binary heap) as the data structure for the arcs. Building a binary heap by starting from an empty heap and inserting the arcs one by one has time complexity O((|B| + 1)⋅log(|B| + 1)) and space complexity O(|B| + 1). If we start with an array consisting of the |B| + 1 arcs, both the time complexity and the space complexity can be O(|B| + 1).
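As a sketch of this remark, the CR policy on a parallel graph amounts to extracting arcs from a binary heap keyed by the CR weight l + c/(1 − ρ). The encoding below is illustrative and not from the text: the deterministic arc is given index −1, and X maps arc indices to realized statuses (1 = blocked).

```python
import heapq

def cr_traverse_parallel(arcs, l_a, X):
    # arcs: list of nondeterministic arcs (l, c, rho); l_a: length of the
    # deterministic arc. Heap items are (CR weight, index, length, cost);
    # heapify builds the heap from the array in O(n).
    heap = [(l + c / (1.0 - r), i, l, c) for i, (l, c, r) in enumerate(arcs)]
    heap.append((l_a, -1, l_a, 0.0))   # deterministic arc: CR weight = l_a
    heapq.heapify(heap)
    cost = 0.0
    while True:
        _, i, l, c = heapq.heappop(heap)
        if i == -1:
            return cost + l            # deterministic arc: always take it
        cost += c                      # disambiguate the cheapest candidate
        if X[i] == 0:
            return cost + l            # traversable: take it (balk-free)
```

Note the extraction order is exactly the ascending-h ordering that Theorem 3.3.1 shows to be optimal among balk-free policies.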
4 Mark Information via Sensor
In the last chapter, we introduced the early formulation of the RDP problem: traversing probabilistic graphs. An important feature of that formulation is the probability marker assumption, that is, before the agent travels, the probability of nontraversability of each nondeterministic arc is given. With this information, the primary objective is to find a policy whose expected cost is as small as possible. The probability marker assumption, however, no longer seems valid in recent real-world practice. A recent navigation system like COBRA's processing station does not provide the traversability probabilities (which might be estimated from historical data). It makes in-situ observations and provides a probabilistic estimate of the traversability of the potentially hazardous regions.
The estimates of the underlying true-false status of the potential hazards are probabilistic and come from some sensor's readings. The focus of this chapter is to systematically formulate the new concept of markers from the perspective of sensors and to investigate how improving the sensor might yield statistically better traversals. We first provide the specific problem setting within which we will work. We then define sensors and a new concept, sensor monotonicity. Finally, we present two classes of policies, threshold policies and penalty policies, and show sensor monotonicity results in two simple cases.
4.1 Setting
The basic setting is the same as that in section 3.1. Let G = (V, A, B, l, c, ρ) be a finite
directed graph, where V is the set of nodes that contains a specified starting node s and a
specified target node t, A is the set of deterministic arcs, B is the set of n nondeterministic
arcs, l: A∪B → R+ is the length function, c: B → R+ is the disambiguation cost function,
ρ: B → (0, 1) is the probability function such that for each e ∈ B, ρ(e) represents the
probability that e is not traversable. Without loss of generality, we can assume that l(a) < +∞ and c(a) < +∞ for all a ∈ A∪B.
For convenience we define X: A∪B → {0, 1} as an indicator function such that in any
realization of G, for each a ∈ A∪B, X(a) = 1 if a is nontraversable in this realization; X(a)
= 0 otherwise. Note that X(a) is deterministic for all a ∈ A and X(e) is random for all e ∈
B. We assume that for each e ∈ B, X(e) is independently Bernoulli(ρ(e))-distributed: P(X(e) = 1) = ρ(e) and P(X(e) = 0) = 1 − ρ(e).
Note that here we do not assume ρ is known a priori. But we assume there exists a marker function Y: B → (0, 1) such that for any realization of G and each e ∈ B in this realization, independently, Y(e) ~ F0 if X(e) = 0 and Y(e) ~ F1 if X(e) = 1, where F0: [0, 1] → [0, 1] and F1: [0, 1] → [0, 1] are two (continuous) distribution functions. We define a sensor S as an ordered pair (F0, F1), denoted S = (F0, F1). We use the notation Y(e) ~ S to denote that the marker Y(e) of the arc e is generated from S.
A sensor S is said to be valid if F0(y) ≥ F1(y) for any 0 ≤ y ≤ 1. A valid sensor is said to be discerning if F0(1/2) > 1/2 and F1(1/2) < 1/2. Consider two sensors S(1) = (F0(1), F1(1)) and S(2) = (F0(2), F1(2)). For any realization of G, suppose Y(1) ~ S(1) and Y(2) ~ S(2). We say that S(1) is stochastically at least as good as S(2), and write Y(1) ≽ Y(2), if for any 0 ≤ y ≤ 1, F0(1)(y) ≥ F0(2)(y) and F1(1)(y) ≤ F1(2)(y).
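These definitions can be checked numerically on a grid of marker values. The sketch below is illustrative only: the power-function CDFs F0(y) = y^0.5 and F1(y) = y^2 (and the blunter pair for S(2)) are invented example distributions, not sensors from the text.

```python
def is_valid(F0, F1, grid):
    # Valid sensor: F0 stochastically dominates F1 in CDF, F0(y) >= F1(y).
    return all(F0(y) >= F1(y) for y in grid)

def is_discerning(F0, F1):
    # Discerning sensor: marker mass splits correctly about 1/2.
    return F0(0.5) > 0.5 and F1(0.5) < 0.5

def at_least_as_good(S1, S2, grid):
    # S1 stochastically at least as good as S2.
    F0a, F1a = S1
    F0b, F1b = S2
    return all(F0a(y) >= F0b(y) and F1a(y) <= F1b(y) for y in grid)

grid = [k / 100 for k in range(101)]
S1 = (lambda y: y ** 0.5, lambda y: y ** 2)    # sharper sensor (example)
S2 = (lambda y: y ** 0.8, lambda y: y ** 1.2)  # blunter sensor (example)
```

Here both example sensors are valid and discerning, and S1 dominates S2 in the sense above but not conversely.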
With the new concept of markers, which is characterized by Y rather than ρ, a policy for traversing G utilizes the information Y(e), e ∈ B, in a realization of G. As under the setting of section 3.1, the objective in designing a policy can be to minimize the expected cost. There can also be other cost measures, such as quantiles (e.g., the median), which lead to alternative objectives.
We use C = C(G, s, t, X, Y, P) to denote the cost (traveling cost plus disambiguation cost) the agent pays to travel from s to t under the policy P. C is a random variable since both X and Y are random. Moreover, there may also be some randomness in P if P contains some randomized algorithm. To measure the utility of P, we can consider the distribution function HC(x) = P(C ≤ x), x ≥ 0, the leftmost P0-quantile C0 = inf{x: HC(x) ≥ P0}, and the mean E(C). A disadvantage of the mean measure is that it requires a finite expectation, which may not always be satisfied. For instance, if there exists a tiny probability that C = +∞, then, theoretically, the mean is not a proper measure of the utility of P. However, the leftmost P0-quantile for some 0 < P0 < 1 still works as a utility measure.
4.2 Sensor Monotonicity
We give three definitions of sensor monotonicity.
Definition 4.2.1. Let C(1) = C(G, s, t, X, Y(1), P) and C(2) = C(G, s, t, X, Y(2), P), where Y(1) ~ S(1) and Y(2) ~ S(2). We say P is strongly monotone with respect to the sensor if Y(1) ≽ Y(2) implies HC(1)(x) ≥ HC(2)(x) for any x ≥ 0.

Definition 4.2.2. Let C(1) = C(G, s, t, X, Y(1), P) and C(2) = C(G, s, t, X, Y(2), P), where Y(1) ~ S(1) and Y(2) ~ S(2). We say P is P0-quantile monotone with respect to the sensor for some 0 ≤ P0 ≤ 1 if Y(1) ≽ Y(2) implies C0(1) ≤ C0(2), where C0(1) = inf{x: HC(1)(x) ≥ P0} and C0(2) = inf{x: HC(2)(x) ≥ P0}.

Definition 4.2.3. Let C(1) = C(G, s, t, X, Y(1), P) and C(2) = C(G, s, t, X, Y(2), P), where Y(1) ~ S(1) and Y(2) ~ S(2) and C(1) < +∞, C(2) < +∞. We say P is weakly monotone with respect to the sensor if Y(1) ≽ Y(2) implies E(C(1)) ≤ E(C(2)).
It can easily be shown (see [51]) that strong monotonicity implies P0-quantile monotonicity and weak monotonicity. Moreover, P0-quantile monotonicity for all 0 ≤ P0 ≤ 1 implies strong monotonicity, and if weak monotonicity holds for all nondecreasing functions of C, then strong monotonicity is implied.
4.3 Threshold and Penalty Policies
Similar to section 3.1, the tuple (Y, l, c) represents the prior knowledge of the graph G. In
consideration of the dynamics of the knowledge of the graph G, for convenience, we define the function Y+: A∪B → R as an extension of Y:

Y+(a) = 0 if X(a) = 0; 1 if X(a) = 1; Y(a) if a ∈ B.    (4.3.1)
We define the function l+: A∪B → R as an extension of l:

l+(a) = l(a) if 0 ≤ Y+(a) < 1; +∞ if Y+(a) = 1.    (4.3.2)

We define the function c+: A∪B → R as an extension of c:

c+(a) = c(a) if 0 < Y+(a) < 1; 0 otherwise.    (4.3.3)

Hence, the tuple (Y+, l+, c+) represents the updated knowledge of the graph G.

We now describe the class of threshold policies and the class of penalty policies. Basically, both plan shortest paths in the same dynamic manner that the CR policy adopts. Any policy in either class has four main features: first, the initial knowledge of the graph G is represented by the tuple (Y, l, c); second, as the agent travels, the updated knowledge of G is represented by the tuple (Y+, l+, c+); third, a shortest path plan is made relative to some weight function of Y+, l+, and c+, and in searching for the shortest path, the tie-breaking favors the deterministic arcs; fourth, the policy is balk-free. What distinguishes one policy from another is the structure of the weight function.

The basic idea of a threshold policy is to screen out those arcs that seem unlikely to be traversable. In a threshold policy, a threshold vector α is predetermined, with its component 0 ≤ αe ≤ 1 representing the threshold of the arc e ∈ B. This vector is the criterion for the screening, i.e., for each e ∈ B, if Y(e) ≥ αe, then view e as nontraversable. The α-threshold weight function in the shortest path planning is

Wα(a) = l(a) if Y+(a) = 0; c(a) + l(a) if 0 < Y+(a) < αa; +∞ if Y+(a) ≥ αa,    (4.3.4)

for all a ∈ A∪B. We use Θ(α) to denote the threshold policy that uses the α-threshold weight function.

The key element of a penalty policy is a penalty function l̃(a) = l̃(Y+(a), l+(a), c+(a)) > 0 for all a ∈ A∪B that is monotonically increasing in each of its three arguments. The penalty function also has the properties that l̃(0, l+(a), c+(a)) = l(a); l̃(Y+(a), l+(a), c+(a)) → c(a) + l(a) as Y+(a) → 0+; and l̃(Y+(a), l+(a), c+(a)) → +∞ as Y+(a) → 1. The l̃-penalty weight function in the shortest path planning is

Wl̃(a) = l̃(Y+(a), l+(a), c+(a)),    (4.3.5)

for all a ∈ A∪B. We use Ψ(l̃) to denote the penalty policy that uses the l̃-penalty weight function.
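The per-arc α-threshold weight (4.3.4) can be sketched as a small Python function. This is an illustrative sketch under the setting of section 4.1; the function name and the scalar encoding of Y+(a) are assumptions, not notation from the text.

```python
def threshold_weight(Yp, l, c, alpha):
    # alpha-threshold weight (4.3.4) of one arc, given the extended marker
    # Yp = Y+(a), length l, disambiguation cost c, and threshold alpha.
    if Yp == 0:
        return l                 # known traversable
    if Yp >= alpha:
        return float('inf')      # screened out as (effectively) nontraversable
    return c + l                 # still uncertain: length plus disambiguation
```

Since α ≤ 1, a disambiguated blocked arc (Y+(a) = 1) always falls into the infinite-weight case, matching l+ in (4.3.2).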
Theorem 4.3.1. Under the setting of section 4.1, when applying any Θ(α) or any Ψ(l̃) to a convergent graph, it is unnecessary to replan a new shortest path from where the agent is to t upon the moment when the agent completes a disambiguation that discloses the next arc to be traversable.

Theorem 4.3.2. Under the setting of section 4.1, for a convergent graph, any Θ(α) and any Ψ(l̃) has finite expected cost.
More importantly, we have the following two sensor monotonicity results:

Theorem 4.3.3. Under the setting of section 4.1, for traversing a parallel graph G with V = {s, t}, Θ(α) is weakly monotone with respect to the sensor for any α ∈ [0, 1]n.
Proof. Let S(1) = (F0(1), F1(1)) and S(2) = (F0(2), F1(2)) be two sensors such that for any 0 ≤ y ≤ 1, F0(1)(y) ≥ F0(2)(y) and F1(1)(y) ≤ F1(2)(y). For any realization of G, suppose Y(1) ~ S(1) and Y(2) ~ S(2). Let C(1) = C(G, s, t, X, Y(1), Θ(α)) and C(2) = C(G, s, t, X, Y(2), Θ(α)). We need to show that E(C(1)) ≤ E(C(2)).
At first, without loss of generality, we assume that |A| = 1 since only the shortest deterministic arc may affect the cost. Suppose A = {a}. The proof is by induction on |B|.
71
The base case is |B| = 0. In base case, C(1) = C(2) = l(a), which is constant, hence the weak
monotonicity trivially holds. Suppose (inductive hypothesis) the weak monotonicity
holds for |B| = k ≥ 0, we then consider the case |B| = k +1.
Consider e0 = arg mine B∈
(c(e) + l(e)). If l(a) ≤ c(e0) + l(e0), then C(1) = C(2) = l(a), the weak
monotonicity is trivially true. We now consider the nontrivial case that l(a) > c(e0) + l(e0).
In this case, Θ(αr ) returns e0 in its first plan.
Suppose 0 ≤ α_{e0} ≤ 1 is the threshold for e0. For convenience, denote γ = E(C | Y(e0) < α_{e0}, X(e0) = 0), ξ = E(C | Y(e0) < α_{e0}, X(e0) = 1), and η = E(C | Y(e0) ≥ α_{e0}), where C = C(G, s, t, X, Y, Θ(α⃗)) and Y(e) ~ S for any e ∈ B. Obviously,

γ = c(e0) + l(e0) ≤ η < η + c(e0) = ξ < +∞,

with the last strict inequality due to Theorem 4.3.2. We can compute (via conditioning)

P(Y(e0) < α_{e0}, X(e0) = 0) = P(Y(e0) < α_{e0} | X(e0) = 0)⋅P(X(e0) = 0) = (1 − ρ(e0))⋅F0(α_{e0}),

P(Y(e0) < α_{e0}, X(e0) = 1) = P(Y(e0) < α_{e0} | X(e0) = 1)⋅P(X(e0) = 1) = ρ(e0)⋅F1(α_{e0}),

and

P(Y(e0) ≥ α_{e0}) = P(Y(e0) ≥ α_{e0} | X(e0) = 0)⋅P(X(e0) = 0) + P(Y(e0) ≥ α_{e0} | X(e0) = 1)⋅P(X(e0) = 1)
= (1 − ρ(e0))⋅[1 − F0(α_{e0})] + ρ(e0)⋅[1 − F1(α_{e0})].

Hence a recursive formula for evaluating E(C) is

E(C) = γ⋅(1 − ρ(e0))⋅F0(α_{e0}) + ξ⋅ρ(e0)⋅F1(α_{e0}) + η⋅(1 − ρ(e0))⋅[1 − F0(α_{e0})] + η⋅ρ(e0)⋅[1 − F1(α_{e0})].
Now

E(C(1)) = γ(1)⋅(1 − ρ(e0))⋅F0^(1)(α_{e0}) + ξ(1)⋅ρ(e0)⋅F1^(1)(α_{e0}) + η(1)⋅(1 − ρ(e0))⋅[1 − F0^(1)(α_{e0})] + η(1)⋅ρ(e0)⋅[1 − F1^(1)(α_{e0})]

and

E(C(2)) = γ(2)⋅(1 − ρ(e0))⋅F0^(2)(α_{e0}) + ξ(2)⋅ρ(e0)⋅F1^(2)(α_{e0}) + η(2)⋅(1 − ρ(e0))⋅[1 − F0^(2)(α_{e0})] + η(2)⋅ρ(e0)⋅[1 − F1^(2)(α_{e0})].

Note that γ(1) = γ(2). By the inductive hypothesis, we have ξ(1) ≤ ξ(2) and η(1) ≤ η(2). Hence,

E(C(1)) ≤ γ(2)⋅(1 − ρ(e0))⋅F0^(1)(α_{e0}) + ξ(2)⋅ρ(e0)⋅F1^(1)(α_{e0}) + η(2)⋅(1 − ρ(e0))⋅[1 − F0^(1)(α_{e0})] + η(2)⋅ρ(e0)⋅[1 − F1^(1)(α_{e0})],

which implies

E(C(1)) − E(C(2)) ≤ γ(2)⋅(1 − ρ(e0))⋅[F0^(1)(α_{e0}) − F0^(2)(α_{e0})] + ξ(2)⋅ρ(e0)⋅[F1^(1)(α_{e0}) − F1^(2)(α_{e0})]
+ η(2)⋅(1 − ρ(e0))⋅[F0^(2)(α_{e0}) − F0^(1)(α_{e0})] + η(2)⋅ρ(e0)⋅[F1^(2)(α_{e0}) − F1^(1)(α_{e0})]
= (γ(2) − η(2))⋅(1 − ρ(e0))⋅[F0^(1)(α_{e0}) − F0^(2)(α_{e0})] + (η(2) − ξ(2))⋅ρ(e0)⋅[F1^(2)(α_{e0}) − F1^(1)(α_{e0})].

Since γ(2) ≤ η(2) ≤ ξ(2), F0^(1)(α_{e0}) ≥ F0^(2)(α_{e0}), and F1^(2)(α_{e0}) ≥ F1^(1)(α_{e0}), both terms on the right-hand side are nonpositive; hence E(C(1)) − E(C(2)) ≤ 0. ∎
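The recursion above can be checked numerically in the base case |B| = 1, where η = l(a), ξ = c(e0) + η, and γ = c(e0) + l(e0). The following Python sketch uses illustrative parameter values and an illustrative pair of dominating sensors; it is not any experiment from the thesis:

```python
# Closed-form expected cost of a threshold policy on the two-node parallel
# graph with one deterministic arc a (length la) and one uncertain arc e
# (length le, disambiguation cost ce, P(X(e) = 1) = rho).  Assumes the
# nontrivial case la > ce + le, so the policy tries e first.
def expected_cost(F0, F1, alpha, la=10.0, le=2.0, ce=1.0, rho=0.4):
    gamma = ce + le   # disambiguated and found traversable
    eta = la          # marker at least alpha: take a without disambiguating
    xi = ce + eta     # disambiguated and found blocked: fall back to a
    return (gamma * (1 - rho) * F0(alpha)
            + xi * rho * F1(alpha)
            + eta * ((1 - rho) * (1 - F0(alpha)) + rho * (1 - F1(alpha))))

# Sensor 1 dominates sensor 2: F0_1 >= F0_2 and F1_1 <= F1_2 on [0, 1].
F0_1, F1_1 = (lambda y: y ** 0.5), (lambda y: y ** 2)
F0_2, F1_2 = (lambda y: y), (lambda y: y)

# Weak monotonicity: the better sensor never costs more, for any threshold.
for k in range(11):
    alpha = 0.1 * k
    assert expected_cost(F0_1, F1_1, alpha) <= expected_cost(F0_2, F1_2, alpha) + 1e-12
```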
Corollary 4.3.4. Under the setting of section 4.1, for traversing the parallel graph, if S(1) is superior to S(2) (that is, F0^(1)(y) ≥ F0^(2)(y) and F1^(1)(y) ≤ F1^(2)(y) for all 0 ≤ y ≤ 1), then E[C(G, s, t, X, Y(1), Θ(α⃗*1))] ≤ E[C(G, s, t, X, Y(2), Θ(α⃗*2))], where Y(1) ~ S(1), Y(2) ~ S(2), α⃗*1 = arg min_{α⃗ ∈ [0,1]^n} E[C(G, s, t, X, Y(1), Θ(α⃗))], and α⃗*2 = arg min_{α⃗ ∈ [0,1]^n} E[C(G, s, t, X, Y(2), Θ(α⃗))].

Proof. By the definitions of α⃗*1 and α⃗*2 and Theorem 4.3.3, we have

E[C(G, s, t, X, Y(1), Θ(α⃗*1))] ≤ E[C(G, s, t, X, Y(1), Θ(α⃗*2))] ≤ E[C(G, s, t, X, Y(2), Θ(α⃗*2))]. ∎
Theorem 4.3.5. Under the setting of section 4.1, for traversing a convergent (relative to t) graph G with |B| = 1, both Θ(α) and Ψ(l̃) are strongly monotone with respect to the sensor, for any threshold α ∈ [0, 1] and any l̃-penalty function associated with the single nondeterministic arc.
Proof. Suppose B = {e = (u, v)}. We first consider any threshold policy Θ(αe) with 0 ≤ αe ≤ 1 being the threshold for e. Suppose P1, with length L1, is a shortest s-t path that does not pass through e; P2, with length L2, is a shortest s-u path; P3, with length L3, is a shortest v-t path; and P4, with length L4, is a shortest u-t path that does not pass through e. Since G is convergent with respect to t, the three paths P1, P3, and P4 must exist. If P2 does not exist, then C(G, s, t, X, Y, Θ(αe)) = L1 and the strong monotonicity trivially holds. We consider the case that P2 exists, as Figure 4.3.1 shows.
Note that L1 ≤ L2 + L4 < c(e) + L2 + L4. If L1 ≤ c(e) + l(e) + L2 + L3, then C(G, s, t, X, Y, Θ(αe)) = L1 and the strong monotonicity trivially holds too. We consider the nontrivial case that L1 > c(e) + l(e) + L2 + L3. In this case, Θ(αe) returns P2 ⊕ e ⊕ P3 in its first plan.

Figure 4.3.1: Analysis of the convergent graph G with a single nondeterministic arc e = (u, v): the paths P1 (length L1), P2 (length L2), P3 (length L3), and P4 (length L4), and the attributes l(e), ρ(e), c(e), X(e), Y(e) of e.

It is easy to find that

C(G, s, t, X, Y, Θ(αe)) =
    c(e) + l(e) + L2 + L3    if Y(e) < αe and X(e) = 0;
    c(e) + L2 + L4           if Y(e) < αe and X(e) = 1;
    L1                       if Y(e) ≥ αe,

which implies that the distribution function of C(G, s, t, X, Y, Θ(αe)) is

HC(c(e) + l(e) + L2 + L3) = (1 − ρ(e))⋅F0(αe);
HC(L1) = (1 − ρ(e))⋅F0(αe) + (1 − ρ(e))⋅[1 − F0(αe)] + ρ(e)⋅[1 − F1(αe)] = 1 − ρ(e)⋅F1(αe);
HC(c(e) + L2 + L4) = 1.

This cost distribution function implies the strong monotonicity, since improving the sensor means that F0(αe) increases and F1(αe) decreases.

We then consider the penalty policy Ψ(l̃) with l̃ = l̃(Y(e), l(e), c(e)) being the penalty function of e. It is crucial to compare L1 with L2 + l̃(Y(e), l(e), c(e)) + L3. Note that l̃(Y(e), l(e), c(e)) > c(e) + l(e). Hence if L1 ≤ c(e) + l(e) + L2 + L3, then C(G, s, t, X, Y, Ψ(l̃)) = L1 and the strong monotonicity trivially holds. We now consider the nontrivial case that L1 > c(e) + l(e) + L2 + L3. It is not hard to find that

C(G, s, t, X, Y, Ψ(l̃)) =
    c(e) + l(e) + L2 + L3    if L1 > L2 + l̃(Y(e), l(e), c(e)) + L3 and X(e) = 0;
    c(e) + L2 + L4           if L1 > L2 + l̃(Y(e), l(e), c(e)) + L3 and X(e) = 1;
    L1                       if L1 ≤ L2 + l̃(Y(e), l(e), c(e)) + L3.
Define α* = inf{α ∈ [0, 1] | L1 ≤ L2 + l̃(α, l(e), c(e)) + L3}. Since l̃ is monotonically increasing in its first argument, L1 ≤ L2 + l̃(Y(e), l(e), c(e)) + L3 if and only if Y(e) ≥ α*. Hence

C(G, s, t, X, Y, Ψ(l̃)) =
    c(e) + l(e) + L2 + L3    if Y(e) < α* and X(e) = 0;
    c(e) + L2 + L4           if Y(e) < α* and X(e) = 1;
    L1                       if Y(e) ≥ α*.

This means that Ψ(l̃) behaves the same as Θ(α*); hence Ψ(l̃) is strongly monotone with respect to the sensor. ∎
5 Traversing Minefield
This chapter is application-oriented: we focus on the problem of traversing a minefield. Unlike the settings in Chapters 3 and 4, for traversing a minefield we are usually given independent markers of some regions in R², and those markers must be properly carried over to the arcs of a graph that discretizes the world. In accordance with the COBRA system, the markers come from sensor readings.
In this chapter, we present the minefield model and an adjusted CR policy for traversing the minefield, and we graphically demonstrate running cases of the adjusted CR policy. After introducing the experimental setting, based on extensive Monte Carlo simulations using the RDP simulation programs we developed, we present numerical and statistical evidence that the adjusted CR policy is both strongly monotone and weakly monotone with respect to the sensor. We show the monotonicity results from both the conditional experiments and the unconditional experiments. We also process a set of potential-mine detection data provided by the COBRA group and show that improving the current quality of detections does improve the quality of the traversals. These monotonicity results form a basis for a cost-benefit analysis regarding the adoption of superior, but presumably more expensive, sensors.
5.1 Minefield Model
According to the COBRA system, we model a minefield as m detected risk centers d1,
d2, …, dm ∈ S ⊂ R2, with each di being either a true detection or a false detection and S
denoting a bounded region. For i = 1, 2, …, m, we let Xi indicate whether di is a true detection, that is, Xi = 1 if di is a true detection and Xi = 0 if di is a false detection. For each risk center di, there is a disk-shaped risk region Di that is centered at di and has radius ri > 0. Suppose a global sensor generates a marker 0 < Yi < 1 for each di; guided by the markers, the agent travels from a starting location s ∈ S ⊂ R² to a target location t ∈ S ⊂ R².
A disambiguation of di, at a cost ci > 0, happens when the agent is right outside Di and about to enter Di, and di has not yet been disambiguated. If the disambiguation discloses that Xi = 1, then di is confirmed to be a true hazard; in this case, the region covered by Di should be forbidden, that is, the marker should be updated to 1. If the disambiguation discloses that Xi = 0, then di should be removed and its marker should be updated to 0, but the region covered by Di may still be questionable, since Di may intersect other risk disks that have not been disambiguated. During the travel, the agent dynamically collects new information through (local) disambiguations and updates the knowledge of the
world. Like the extended marker functions in Chapters 3 and 4, we define the extension of the Yi's as

(5.1.1)
Yi+ =
    0     if di has been disambiguated and Xi = 0;
    1     if di has been disambiguated and Xi = 1;
    Yi    if di has not been disambiguated.

Hence the knowledge of the true-false status of all the di's can be represented by all the Yi+'s.

We discretize S into Z² endowed with the 8-adjacency relation. This is illustrated in Figure 5.1.1, where δ denotes the cell size (or resolution). The grid graph can be viewed as directed, since each edge with end vertices u and v can be viewed as the two directed arcs (u, v) and (v, u).

Figure 5.1.1: The grid representation with 8-adjacency: each vertex vi,j is adjacent to vi±1,j, vi,j±1, vi+1,j±1, and vi−1,j±1, with cell size δ.

To make a shortest path plan in the grid graph, we need to determine the weight function.
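The 8-adjacency neighborhood, with the corresponding Euclidean arc lengths, can be sketched in a few lines of Python (the grid indexing here is hypothetical; the thesis implementation is in Matlab):

```python
import math

def neighbors(i, j, delta=1.0):
    """The 8 grid arcs leaving v_{i,j}, each with its Euclidean length:
    delta for axis-parallel moves, sqrt(2)*delta for diagonal moves."""
    out = []
    for di in (-1, 0, 1):
        for dj in (-1, 0, 1):
            if (di, dj) != (0, 0):
                out.append(((i + di, j + dj), math.hypot(di, dj) * delta))
    return out
```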
The weight of each arc contains three types of information: marker, length, and disambiguation cost. For any arc a = (u, v), let l(a) be the (Euclidean) length, and note that l(a) = δ or √2⋅δ, depending on whether a is axis-parallel or diagonal. As in the previous settings, we use Y+(a) to denote the extended marker, l+(a) to denote the extended length, and c+(a) to denote the extended disambiguation cost.

To define Y+(a), we first define YI, the derived marker of the region covered by ∪_{i∈I} Di, where I ⊆ {1, 2, …, m} is an index set, as

(5.1.2)
YI = 1 − ∏_{i∈I} (1 − Yi+),

with the convention that ∏_{i∈I} (1 − Yi+) = 1 if I is the empty set. For a = (u, v), let Iu = {i | u is covered by Di}, Iv = {i | v is covered by Di}, and Id = {i | di has not been disambiguated}. We define the extended marker of a as

(5.1.3)
Y+(a) = Y_{Iv\Iu}.
Theorem 5.1.1. The extended marker defined as (5.1.3) well characterizes the information on the arc a = (u, v), that is, Y+(a) = 0 if and only if a is known to be traversable; Y+(a) = 1 if and only if a is known to be nontraversable; and 0 < Y+(a) < 1 if and only if a is nondeterministic.
Proof. Note that

Y+(a) = Y_{Iv\Iu} = 1 − [∏_{i ∈ (Iv\Iu)\Id} (1 − Yi+)]⋅[∏_{i ∈ (Iv\Iu)∩Id} (1 − Yi+)].

First, Y_{Iv\Iu} = 1 if and only if ∏_{i ∈ (Iv\Iu)\Id} (1 − Yi+) = 0, which is equivalent to the existence of a j ∈ (Iv\Iu)\Id such that Yj+ = 1, i.e., dj has been disambiguated and Xj = 1; hence Y+(a) = 1 if and only if a is confirmed to be nontraversable. Second, Y_{Iv\Iu} = 0 if and only if ∏_{i ∈ (Iv\Iu)\Id} (1 − Yi+) = 1 and (Iv\Iu) ∩ Id is empty; hence Y+(a) = 0 if and only if a is confirmed to be traversable. Third, 0 < Y_{Iv\Iu} < 1 if and only if ∏_{i ∈ (Iv\Iu)\Id} (1 − Yi+) = 1 and (Iv\Iu) ∩ Id is nonempty; hence 0 < Y+(a) < 1 if and only if a is potentially traversable but nondeterministic. ∎
Once the extended marker Y+(a) is determined, the extended length l+(a) can be defined as

(5.1.4)
l+(a) =
    l(a)    if 0 ≤ Y+(a) < 1;
    +∞      if Y+(a) = 1.

The extended disambiguation cost c+(a) is defined as

(5.1.5)
c+(a) =
    ∑_{i ∈ (Iv\Iu) ∩ Id} ci    if 0 < Y+(a) < 1;
    0                          otherwise.

This formula says that the arc a = (u, v) needs to be disambiguated if and only if it is nondeterministic, and furthermore, that disambiguating a means disambiguating all the undisambiguated risk disks that cover v but not u.
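Definitions (5.1.2) through (5.1.5) can be sketched as follows in Python, representing Iu, Iv, Id as index sets; the data layout, and the simplifying choice l(a) = 1, are assumptions for illustration only:

```python
import math

def arc_attributes(Iu, Iv, Id, Y, cost):
    """Extended marker, length, and cost of an arc a = (u, v).
    Iu/Iv index the disks covering u/v, Id indexes the not-yet-disambiguated
    disks, Y[i] is the extended marker Yi+, cost[i] is ci.  An empty product
    is 1 by convention, so an arc entering no new disk gets marker 0."""
    diff = Iv - Iu                                       # disks newly entered
    Y_a = 1.0 - math.prod(1.0 - Y[i] for i in diff)      # (5.1.2)-(5.1.3)
    l_a = 1.0 if Y_a < 1.0 else math.inf                 # (5.1.4), with l(a)=1
    c_a = sum(cost[i] for i in diff & Id) if 0.0 < Y_a < 1.0 else 0.0  # (5.1.5)
    return Y_a, l_a, c_a
```

For instance, with d1 disambiguated false (Y1+ = 0), d2 undisambiguated (Y2+ = 0.5), and d3 disambiguated true (Y3+ = 1), an arc entering only D1 is known traversable, one entering D2 is nondeterministic with cost c2, and one entering D3 is blocked.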
Let Y⃗ = [Y1 Y2 … Ym]^T and c⃗ = [c1 c2 … cm]^T; then the prior knowledge of the terrain (minefield) can be represented by the tuple (Y⃗, l, c⃗). The updated knowledge of the terrain can be represented by the tuple (Y+, l+, c+), which has the same form as that in section 4.3.

We define the CR weight function for the grid graph as

(5.1.6)
W_CR,Y(a) = l+(a) + c+(a)/(1 − Y+(a))

for all a. We still call this weight function "CR" because we simply replace ρ+(a) in (3.2.1) with Y+(a). Note that W_CR,Y(a) = l(a) if Y+(a) = 0; W_CR,Y(a) = +∞ if Y+(a) = 1; and l(a) < W_CR,Y(a) < +∞ if 0 < Y+(a) < 1. Hence, as mentioned in section 3.2, this CR weight function is well posed with respect to the knowledge of the terrain.
We now present the adjusted CR policy:

Under the knowledge (Y+, l+, c+) of the terrain, find a shortest path relative to W_CR,Y from the agent's current location to the target location t, and let the agent follow the shortest path plan until it reaches t or encounters a nondeterministic arc. In the former case, the navigation process completes successfully; in the latter case, the agent disambiguates the arc by disambiguating all the newly encountered risk disks. The disambiguation results update the knowledge (Y+, l+, c+), and a new shortest path relative to the updated W_CR,Y, from the agent's current location to the target location t, is found for the agent to follow.
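The plan-walk-disambiguate-replan loop can be sketched as follows. This Python toy uses Dijkstra (A* with a zero heuristic) in place of the thesis's A*, a per-arc CR-style weight, and a tiny hand-built graph; the data layout (nodes, length, Y, cost, X dictionaries) is purely illustrative:

```python
import heapq
import math

def shortest_path(nodes, weight, s, t):
    """Dijkstra shortest path; assumes t is reachable from s."""
    dist, prev, done = {s: 0.0}, {}, set()
    heap = [(0.0, s)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == t:
            break
        if u in done:
            continue
        done.add(u)
        for v in nodes[u]:
            w = weight(u, v)
            if w < math.inf and d + w < dist.get(v, math.inf):
                dist[v] = d + w
                prev[v] = u
                heapq.heappush(heap, (dist[v], v))
    path = [t]
    while path[-1] != s:
        path.append(prev[path[-1]])
    return path[::-1]

def traverse(nodes, length, Y, cost, X, s, t):
    """Plan, walk until the first nondeterministic arc, disambiguate at a cost,
    update the marker to the disclosed status, and replan; returns total cost."""
    def weight(u, v):
        y = Y[(u, v)]
        if y >= 1.0:
            return math.inf                                 # known blocked
        return length[(u, v)] + cost[(u, v)] / (1.0 - y)    # CR-style weight
    total, cur = 0.0, s
    while cur != t:
        path = shortest_path(nodes, weight, cur, t)
        for u, v in zip(path, path[1:]):
            if 0.0 < Y[(u, v)] < 1.0:           # nondeterministic arc
                total += cost[(u, v)]           # pay to disambiguate
                Y[(u, v)] = float(X[(u, v)])    # disclosed status: 0 or 1
                cost[(u, v)] = 0.0              # no further cost on this arc
                break                           # replan from cur
            total += length[(u, v)]             # known traversable: walk it
            cur = v
    return total
```

Note that the loop mutates Y and cost in place, mirroring the knowledge update of the policy; callers should pass copies if the prior is to be reused.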
In the minefield model, the updated knowledge of the world makes replanning necessary even if the encountered nondeterministic arc is disclosed to be traversable. Replanning after new discoveries ensures that the agent always follows a shortest path from where it is to t under the updated information. This differs from the previous settings, in which each arc is independently marked and the disambiguation of a nondeterministic arc removes the uncertainty of that arc alone. Under the previous settings, the planned path remains the shortest if the disambiguation of its first nondeterministic arc discloses traversability. Here in the minefield model, however, the disambiguation of an arc also disambiguates other nondeterministic arcs, hence replanning after the discovery of false detections is still needed.
5.2 Experimental Setting and Results
We simulate m = 100 risk disks within [0, 100]×[0, 100]. The locations d1, d2, …, d100 are
independently uniformly drawn from [10, 90]×[10, 90]. For i = 1, 2, …, 100, we set ri =
4.5. Based on this risk disk size, we choose δ = 1, which means the resolution of the grid
is 100×100. We let s = (0, 0) be the starting location and t = (100, 100) be the target
location. This setting is illustrated in Figure 5.2.1.
We simulate the markers from the Beta distribution, which has the probability density function

(5.2.1)
fBeta(x; A, B) =
    [Γ(A + B)/(Γ(A)⋅Γ(B))]⋅x^(A−1)⋅(1 − x)^(B−1)    if 0 < x < 1;
    0                                               otherwise,

where A > 0 and B > 0 are the two parameters and Γ is the Gamma function. In our experiments, for each i = 1, 2, …, 100, we draw Yi | Xi = 0 ~ fBeta(y; 3.5 − λ, 3.5 + λ) and Yi | Xi = 1 ~ fBeta(y; 3.5 + λ, 3.5 − λ), where 0 < λ < 3.5 is the uniparameter. In fact, such parameter
Figure 5.2.1: A collection of m = 100 detections. The little squares are the simulated risk centers; the circles are the boundaries of the simulated risk disks.
setting renders us a discerning (Beta) sensor for each λ. And the larger the value of λ, the
better the sensor. As λ → 0, the sensor approaches “useless”; as λ → 3.5, the sensor
approaches “perfect”.
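A small Python sketch of this Beta sensor follows, using the standard library's betavariate (the thesis simulations are in Matlab). Since the Beta(A, B) mean is A/(A + B), the marker means under Xi = 1 and Xi = 0 are (3.5 + λ)/7 and (3.5 − λ)/7, so their gap 2λ/7 widens as λ grows:

```python
import random

def sample_marker(x, lam, rng):
    """Draw Yi from the Beta sensor: Beta(3.5 - lam, 3.5 + lam) when Xi = 0
    and Beta(3.5 + lam, 3.5 - lam) when Xi = 1, for 0 < lam < 3.5."""
    a, b = (3.5 + lam, 3.5 - lam) if x == 1 else (3.5 - lam, 3.5 + lam)
    return rng.betavariate(a, b)

def mean_gap(lam):
    """Analytic gap between the marker means under Xi = 1 and Xi = 0."""
    return (3.5 + lam) / 7 - (3.5 - lam) / 7   # = 2 * lam / 7
```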
Since all 100 disks have the same size, we assume a constant disambiguation cost Cd > 0 for each di. In our experiments, we use Cd = 2.25. It should be noted that the larger the value of Cd, the more likely it is that the planned path does not traverse any risk disks. We choose Cd = 2.25 so as to observe cases in which disambiguation happens.
Finally, for i = 1, 2, …, 100, we independently draw Xi ~ Bernoulli(0.6).
We apply the A* algorithm to the shortest path subproblems implicit in the adjusted CR policy that uses the CR weight function (5.1.6). The A* algorithm is implemented in its best-first search version, and the Open list is maintained as a binary heap. The algorithm uses the Euclidean distance as the natural heuristic. In our experiments, the A* algorithm, coded in Matlab 7.1, performs efficiently even when many replannings are required.
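The same best-first A* scheme, with the Open list as a binary heap and the Euclidean distance as the heuristic, can be sketched in Python for an 8-adjacent grid of open cells (a simplified stand-in for the Matlab implementation; markers and disambiguation costs are omitted here):

```python
import heapq
import math

def astar_grid(free, s, t):
    """A* on an 8-adjacent grid: `free` is a set of open cells (i, j), the Open
    list is a binary heap (heapq), and the Euclidean distance to t is the
    heuristic; returns the shortest path length, or infinity if unreachable."""
    h = lambda p: math.hypot(p[0] - t[0], p[1] - t[1])
    g = {s: 0.0}
    open_list = [(h(s), s)]
    closed = set()
    while open_list:
        f, u = heapq.heappop(open_list)
        if u == t:
            return g[u]          # Euclidean heuristic is consistent here
        if u in closed:
            continue
        closed.add(u)
        for di in (-1, 0, 1):
            for dj in (-1, 0, 1):
                v = (u[0] + di, u[1] + dj)
                if (di, dj) == (0, 0) or v not in free:
                    continue
                gv = g[u] + math.hypot(di, dj)   # step cost: 1 or sqrt(2)
                if gv < g.get(v, math.inf):
                    g[v] = gv
                    heapq.heappush(open_list, (gv + h(v), v))
    return math.inf
```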
Figures 5.2.2 and 5.2.3 show two trajectory realizations. In Figure 5.2.2, the sensor
Figure 5.2.2: A realization of a trajectory in a real terrain (upper left) and in one of its marked maps (upper right), with two close-up views (lower left and lower right) in the real terrain. The solid circles are real detections; the dotted circles are false detections. The little pluses denote the planned next locations. Sensor parameter λ = 0.5. Total cost: 242.14; traveling cost: 215.14; disambiguation cost: 27. There are 12 disambiguations in total. Total simulation run time on a PC with a Pentium 4 CPU and 1 GB RAM: 21.891 seconds.
Figure 5.2.3: A realization of another trajectory in the same real terrain (upper left) and in one of its marked maps (upper right), with two close-up views (lower left and lower right) in the real terrain. The solid circles are real detections; the dotted circles are false detections. The little pluses denote the planned next locations. Sensor parameter λ = 3.0. Total cost: 174.77; traveling cost: 168.02; disambiguation cost: 6.75. There are 3 disambiguations in total. Total simulation run time on a PC with a Pentium 4 CPU and 1 GB RAM: 10.844 seconds.
parameter is set as λ = 0.5. The upper left plot shows the final trajectory displayed in the real terrain, in which true hazards are represented as solid circles and false detections as dotted circles. The upper right plot shows the same trajectory displayed in one marked map of the same terrain. The marked map is color-coded so that a large sensor reading for a detection produces deep pink (probable danger) and a small sensor reading produces light pink (probable safety). The little pluses in the upper left plot denote the planned next locations. Those locations are not necessarily reached, since each one is either abandoned because it falls inside a disclosed true hazard disk or simply superseded when the new plan differs from the old one. Observe that the little pluses, with close-up views in the two lower plots, indicate which risk disks (besides those along the agent's trajectory) are disambiguated. The case in Figure 5.2.2 shows that bad sensor readings not only lead the agent into dead ends but also result in many unnecessary disambiguations. The final trajectory is long (the length is 215.14) and the total disambiguation cost is high (there are 12 disambiguations with total disambiguation cost 27). In Figure 5.2.3, the sensor parameter is set as λ = 3.0 and the real terrain is the same as that displayed in Figure 5.2.2. Obviously, the final trajectory displayed in Figure 5.2.3 is much better (the length is 168.02) and the total disambiguation cost is low (there are only 3 disambiguations with total disambiguation cost 6.75). Hence we have a vivid example that a better sensor yields a superior traversal.
We seek statistical evidence of strong or weak monotonicity with respect to the marker information via Monte Carlo simulations. We consider 8 values of λ: λ1 = 0.01, λ2 = 0.5, λ3 = 1.0, λ4 = 1.5, λ5 = 2.0, λ6 = 2.5, λ7 = 3.0, λ8 = 3.49.

First, conditioning on the terrain displayed in Figure 5.2.2, denoted as T1, for each i = 1, 2, …, 8 we simulate 400 marked maps using λi. We execute 3200 runs in total, and each run returns a total cost. The plot of the empirical cumulative distribution functions (ECDFs) and the error bar plot of average cost vs. sensor parameter are displayed in Figure 5.2.4. By inspection, both strong monotonicity and weak monotonicity are exhibited. Second, we perform the following "unconditional" experiments: for each i = 1, 2, …, 8, we execute 2500 runs, with each run starting with a randomly generated terrain and an associated simulated marked map under λi. We still maintain the basic settings such as the starting location, the target location, the number of disks, the radii of the disks, the disambiguation cost per disk, and the underlying distribution of the true-false status. Hence the sources of randomness are the terrain (the locations of the detection centers plus the true-false status of the potential hazards) and the markers. There are 20000 runs in total, and each run returns a total cost. The ECDF plot and the error bar plot of average cost vs. sensor parameter are displayed in Figure 5.2.5. Again by inspection, both strong monotonicity and weak monotonicity are exhibited.
Figure 5.2.4: Graphic statistical results of the data from the experiments conditioning on terrain T1. For each i = 1, 2, …, 8, the sample size under λi is 400. Left: plot of ECDFs; Right: error bar plot of average cost vs. sensor parameter
Figure 5.2.5: Graphic statistical results of the data from the unconditional experiments. For each i = 1, 2, …, 8, the sample size under λi is 2500. Left: plot of ECDFs; Right: error bar plot of average cost vs. sensor parameter
We quantify our observation on strong monotonicity by performing one-sided Kolmogorov-Smirnov (KS) tests for pairwise comparisons of the sample distributions. We use F(x|T1, λ) to denote the conditional cost distribution function conditioning on terrain T1 and parameterized by λ, and F(x, λ) to denote the unconditional cost distribution function parameterized by λ. The tests are performed at the 5% significance level. The (asymptotic) p-values are listed in Table 5.2.1. Observe that the small p-values for all pairwise comparisons suggest that F(x|T1, λi) < F(x|T1, λi+1) and F(x, λi) < F(x, λi+1) for i = 1, 2, …, 7. Hence, we have quantified statistical evidence of strong monotonicity from both the conditional and the unconditional experiments.
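For reference, the one-sided two-sample KS statistic behind such tests (for the alternative F1(x) < F2(x)) can be computed directly. The sketch below is stdlib-only and omits the p-value computation, which in practice comes from the asymptotic KS distribution (or a library such as scipy):

```python
def ecdf(sample):
    """Empirical CDF of a sample, returned as a callable."""
    xs = sorted(sample)
    n = len(xs)
    return lambda x: sum(1 for v in xs if v <= x) / n

def ks_stat_less(s1, s2):
    """One-sided KS statistic for the alternative F1(x) < F2(x):
    D = sup_x [F2_hat(x) - F1_hat(x)], attained at a pooled sample point."""
    F1, F2 = ecdf(s1), ecdf(s2)
    return max(F2(x) - F1(x) for x in list(s1) + list(s2))
```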
We quantify our observation on weak monotonicity by performing one-sided t tests for pairwise comparisons of the sample means. We use E(C|T1, λ) to denote the mean
Pair comparison | Alternative hypothesis (conditional, n = 400) | p-value | Alternative hypothesis (unconditional, n = 2500) | p-value
λ1 vs. λ2 | F(x|T1, λ1) < F(x|T1, λ2) | 2.7479e-010 | F(x, λ1) < F(x, λ2) | 6.2707e-005
λ2 vs. λ3 | F(x|T1, λ2) < F(x|T1, λ3) | 0 | F(x, λ2) < F(x, λ3) | 1.6291e-008
λ3 vs. λ4 | F(x|T1, λ3) < F(x|T1, λ4) | 0 | F(x, λ3) < F(x, λ4) | 1.1555e-005
λ4 vs. λ5 | F(x|T1, λ4) < F(x|T1, λ5) | 0 | F(x, λ4) < F(x, λ5) | 2.2300e-007
λ5 vs. λ6 | F(x|T1, λ5) < F(x|T1, λ6) | 0 | F(x, λ5) < F(x, λ6) | 7.6715e-006
λ6 vs. λ7 | F(x|T1, λ6) < F(x|T1, λ7) | 0 | F(x, λ6) < F(x, λ7) | 0.0002
λ7 vs. λ8 | F(x|T1, λ7) < F(x|T1, λ8) | 0 | F(x, λ7) < F(x, λ8) | 0.0383
Table 5.2.1: 1-sided Kolmogorov-Smirnov tests for pairwise comparisons of the distributions of samples. The tests are at 0.05 significance level.
cost conditioning on terrain T1 and under the sensor parameter λ. We use E(C|λ) to denote the unconditional mean cost under the sensor parameter λ. The tests are performed at the 5% significance level. The p-values are listed in Table 5.2.2. Observe that the small p-values for all pairwise comparisons suggest that E(C|T1, λi) > E(C|T1, λi+1) and E(C|λi) > E(C|λi+1) for i = 1, 2, …, 7. Hence, we have quantified statistical evidence of weak monotonicity from both the conditional and the unconditional experiments.
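The t statistic behind such a mean comparison can be sketched as follows. We use the Welch (unequal-variance) form, which is an assumption on our part, since the thesis does not state whether pooled or unpooled variances were used:

```python
import math

def welch_t(s1, s2):
    """Welch two-sample t statistic for the alternative mean(s1) > mean(s2);
    a large positive value favors the alternative (unequal variances assumed)."""
    n1, n2 = len(s1), len(s2)
    m1, m2 = sum(s1) / n1, sum(s2) / n2
    v1 = sum((x - m1) ** 2 for x in s1) / (n1 - 1)
    v2 = sum((x - m2) ** 2 for x in s2) / (n2 - 1)
    return (m1 - m2) / math.sqrt(v1 / n1 + v2 / n2)
```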
Note that if strong (weak) monotonicity holds conditioning on any terrain, then strong
(weak) monotonicity also holds unconditionally. The experimental results introduced so
far strongly suggest unconditional strong (weak) monotonicity. We suspect that if we
condition on terrain structure like the locations of the detection centers and the true-false
status of the potential hazards, strong (weak) monotonicity almost holds, i.e. the
Table 5.2.2: 1-sided t tests for pairwise comparisons of the means of samples. The tests are at 0.05 significance level.
Pair comparison | Alternative hypothesis (conditional, n = 400) | p-value | Alternative hypothesis (unconditional, n = 2500) | p-value
λ1 vs. λ2 | E(C|T1, λ1) > E(C|T1, λ2) | 0.0094 | E(C|λ1) > E(C|λ2) | 7.3382e-005
λ2 vs. λ3 | E(C|T1, λ2) > E(C|T1, λ3) | 0.0018 | E(C|λ2) > E(C|λ3) | 0.0002
λ3 vs. λ4 | E(C|T1, λ3) > E(C|T1, λ4) | 0.0025 | E(C|λ3) > E(C|λ4) | 3.4639e-005
λ4 vs. λ5 | E(C|T1, λ4) > E(C|T1, λ5) | 1.8035e-006 | E(C|λ4) > E(C|λ5) | 2.2253e-009
λ5 vs. λ6 | E(C|T1, λ5) > E(C|T1, λ6) | 0 | E(C|λ5) > E(C|λ6) | 2.1351e-005
λ6 vs. λ7 | E(C|T1, λ6) > E(C|T1, λ7) | 0 | E(C|λ6) > E(C|λ7) | 0
λ7 vs. λ8 | E(C|T1, λ7) > E(C|T1, λ8) | 0 | E(C|λ7) > E(C|λ8) | 2.3210e-008
probability that strong (weak) monotonicity does not hold is almost zero. However, up to now, only one terrain has been conditioned on. We next introduce additional experimental results conditioning on 50 randomly generated (different) terrains. We still maintain the basic settings as in the unconditional experiments. For each fixed terrain, the source of randomness is the markers. Due to limited computing resources, we only consider two values of λ: λ3 = 1.0 and λ6 = 2.5. For each terrain and each of the two values of λ, we simulate 100 marked maps. There are 10000 runs in total, with each run returning a total cost. To compare the conditional performance under λ3 = 1.0 and λ6 = 2.5, we perform both KS tests and t tests on the generated data. We perform all three types of tests: two-tailed, left-tailed, and right-tailed. The p-values of all the tests are listed in Table 5.2.3. The p-values show that if we improve the uniparameter of the (Beta) sensor from λ3 = 1.0 to λ6 = 2.5, then for i = 1, 2, …, 50, both F(x|Ti, λ3) < F(x|Ti, λ6) and E(C|Ti, λ3) > E(C|Ti, λ6) are significant whenever F(x|Ti, λ3) ≠ F(x|Ti, λ6) and E(C|Ti, λ3) ≠ E(C|Ti, λ6) are significant. This supports the conditional strong (weak) monotonicity for almost every terrain.

Based on the experimental results obtained so far, we conjecture that our adjusted CR policy, under the minefield setting and restricted to the particular Beta sensor, is unconditionally weakly monotone and strongly monotone. Also, the monotonicity in the
Entries are p-values. KS test alternative hypotheses: (a) F(x|Ti, λ3) ≠ F(x|Ti, λ6); (b) F(x|Ti, λ3) < F(x|Ti, λ6); (c) F(x|Ti, λ3) > F(x|Ti, λ6). t-test alternative hypotheses: (d) E(C|Ti, λ3) ≠ E(C|Ti, λ6); (e) E(C|Ti, λ3) > E(C|Ti, λ6); (f) E(C|Ti, λ3) < E(C|Ti, λ6).

i of Ti | (a) | (b) | (c) | (d) | (e) | (f)
1 | 0 | 0 | 1 | 0 | 0 | 1
2 | 0 | 0 | 1 | 1.5400E-05 | 7.7000E-06 | 1
3 | 0.9921 | 0.6880 | 1 | 0.0991 | 0.0495 | 0.9505
4 | 2.8500E-06 | 1.4200E-06 | 1 | 1.1500E-07 | 5.7300E-08 | 1
5 | 3.7000E-09 | 1.8500E-09 | 1 | 7.4100E-10 | 3.700E-10 | 1
6 | 1 | 0.9897 | 1 | 0.3197 | 0.1599 | 0.8401
7 | 0.0205 | 0.0102 | 1 | 0.0528 | 0.0264 | 0.9736
8 | 0 | 0 | 0.9593 | 0 | 0 | 1
9 | 1 | 0.9593 | 1 | 0.1583 | 0.0792 | 0.9208
10 | 0.0994 | 0.0497 | 1 | 0.0255 | 0.0128 | 0.9872
11 | 0 | 0 | 1 | 0 | 0 | 1
12 | 0 | 0 | 1 | 0 | 0 | 1
13 | 0 | 0 | 1 | 8.7800E-09 | 4.3900E-09 | 1
14 | 0 | 0 | 1 | 0 | 0 | 1
15 | 0 | 0 | 1 | 0 | 0 | 1
16 | 0 | 0 | 1 | 0 | 0 | 1
17 | 0 | 0 | 1 | 0 | 0 | 1
18 | 0 | 0 | 1 | 0 | 0 | 1
19 | 0 | 0 | 1 | 0 | 0 | 1
20 | 0 | 0 | 1 | 0 | 0 | 1
21 | 0 | 0 | 1 | 0 | 0 | 1
22 | 0.9921 | 0.6880 | 1 | 0.0136 | 0.0068 | 0.9932
23 | 0.0030 | 0.0015 | 1 | 2.2600E-07 | 1.1300E-07 | 1
24 | 0 | 0 | 1 | 0 | 0 | 1
25 | 0.8938 | 0.5144 | 1 | 0.0042 | 0.0021 | 0.9980
26 | 0 | 0 | 1 | 0 | 0 | 1
27 | 1.4700E-09 | 7.3300E-10 | 0.9897 | 1.36E-10 | 0 | 1
28 | 0 | 0 | 1 | 0 | 0 | 1
29 | 1 | 0.8469 | 1 | 0.0459 | 0.0230 | 0.9770
30 | 0 | 0 | 1 | 0 | 0 | 1
31 | 5.2200E-08 | 2.6100E-08 | 1 | 1.9500E-07 | 9.7700E-08 | 1
32 | 0 | 0 | 1 | 0 | 0 | 1
33 | 0 | 0 | 1 | 0 | 0 | 1
34 | 0 | 0 | 1 | 0 | 0 | 1
35 | 0 | 0 | 1 | 0 | 0 | 1
36 | 0 | 0 | 1 | 0 | 0 | 1
37 | 0 | 0 | 1 | 0 | 0 | 1
38 | 0.4431 | 0.2241 | 1 | 0.0004 | 0.0002 | 0.9998
39 | 0 | 0 | 1 | 0 | 0 | 1
40 | 0 | 0 | 1 | 3.2900E-09 | 1.6500E-09 | 1
41 | 0.4431 | 0.2241 | 1 | 0.0127 | 0.0063 | 0.9937
42 | 9.1200E-09 | 4.5600E-09 | 1 | 7.1800E-08 | 3.5900E-08 | 1
43 | 0 | 0 | 1 | 0 | 0 | 1
44 | 9.1200E-09 | 4.5600E-09 | 1 | 4.5400E-08 | 2.2700E-08 | 1
45 | 0 | 0 | 1 | 0 | 0 | 1
46 | 0 | 0 | 1 | 0 | 0 | 1
47 | 0.1400 | 0.0700 | 1 | 0.0002 | 7.6900E-05 | 0.9999
48 | 0 | 0 | 1 | 0 | 0 | 1
49 | 0.0314 | 0.0157 | 1 | 2.7600E-05 | 1.3800E-05 | 1
50 | 0 | 0 | 1 | 0 | 0 | 1

Table 5.2.3: Hypothesis tests for comparing the distributions and means of the samples associated with λ3 = 1.0 and λ6 = 2.5, over 50 randomly generated terrains.
two senses holds when conditioning on almost any terrain structure. Such empirical results provide a basis for a cost-benefit analysis regarding the adoption of superior, but presumably more costly, sensors.
5.3 The COBRA Data
We now present the results of applying our adjusted CR policy to a set of potential-mine
detection data provided by the COBRA group.
The following marked point process realization, referred to in Priebe et al. 1999 [52] and
Olson et al. 2002 [53], has 39 potential mines with x-y coordinates and associated
markers Y listed in Table 5.3.1. The markers were generated by the post-classification
rule in [53]. Each risk disk has radius 50. The later-found true-false status is illustrated in
Figure 5.3.1. The COBRA data uses a coordinate system different from the [0, 100] × [0, 100] square used in our experiments. However, the terrain, displayed in the left plot of Figure 5.3.1, can be proportionally projected into [0, 100] × [0, 100]. The projection yields a projected terrain, displayed in the right plot of Figure 5.3.1. We perform the experiments with the projected terrain, in which the radii of all risk disks become 5.0, and we choose the value of Cd accordingly. To show the trajectories and other results in the original terrain, we simply scale back.
First, as in [1], [2], we select s = (0, 800) and t = (0, 100) in the original terrain. Three
trajectories under Cd = 5, 50, 500 in the real terrain and the marked map are displayed in
x | y | Y | x | y | Y | x | y | Y
321.17 | 158.27 | 0.59017 | -105.75 | 262.2 | 0.25748 | 95.39 | 248.12 | 0.18868
54.23 | 201.12 | 0.54178 | 185.31 | 182.18 | 0.65266 | -78.75 | 396.14 | 0.0731
158.17 | 516.48 | 0.43525 | 116.39 | 110.84 | 0.44124 | -245.28 | 372.05 | 0.52154
215.13 | 428.31 | 0.6189 | -128.6 | 274.12 | 0.62001 | -166.45 | 180.33 | 0.61082
-145.67 | 703.06 | 0.61714 | -61.19 | 345.12 | 0.17183 | -134.53 | 769.27 | 0.19386
-151.01 | 572.15 | 0.56076 | -91.27 | 664.45 | 0.16675 | -258.45 | 641.03 | 0.6567
221.12 | 557.31 | 0.64047 | -82.87 | 248.29 | 0.58308 | 111.6 | 640.1 | 0.56529
-166.36 | 299.42 | 0.49173 | 105.47 | 509.8 | 0.85147 | -219.32 | 313.68 | 0.57449
296.16 | 163.31 | 0.11649 | -19.93 | 568.04 | 0.59937 | -455.72 | 742.57 | 0.63987
163.31 | 186.14 | 0.65636 | -310.23 | 402.92 | 0.65428 | -157.1 | 441.96 | 0.64444
28.31 | 205.03 | 0.15269 | -320.73 | 532.23 | 0.33092 | -242.22 | 321.51 | 0.65655
-79.26 | 709.99 | 0.56085 | -35.11 | 242.61 | 0.1033 | -237.86 | 546.19 | 0.13793
100.4 | 376.47 | 0.51487 | -169.99 | 438.9 | 0.64163 | -269.98 | 379.65 | 0.52802
Table 5.3.1: x, y-coordinates of the risk centers and the associated markers in the COBRA data
Figure 5.3.1: The COBRA terrain (left) and the projected COBRA terrain (right).
Figure 5.3.2: Three trajectories under Cd = 5, 50, 500 displayed in the real terrain (left plots) and in the originally marked map (right plots). When Cd = 5, the total cost is 715, with 3 disambiguations; when Cd = 50, the total cost is 807.99, with one disambiguation; when Cd = 500, the total cost is 1043.3, with no disambiguation. Total simulation run times on a PC with a Pentium 4 CPU and 1 GB RAM in the three cases are 1.609 seconds, 1.906 seconds, and 4.063 seconds, respectively.
Figure 5.3.2. Observe that the larger the value of Cd, the smaller the number of
disambiguations. In all three cases, the respective final costs 715, 807.99, and 1043.3
already attain the minimums (that are realized under perfect markers).
Second, we consider the case in which improvement of the markers yields less cost. This
time we choose s = (−300, 250) and t = (300, 600) in the original terrain, and we choose
Cd = 50. We consider the following improving scheme of the markers:

    Yi′ = λ + (1 − λ)⋅Yi   if Xi = 1;
    Yi′ = (1 − λ)⋅Yi       if Xi = 0,                                    (5.3.1)

with 0 ≤ λ ≤ 1. Note that λ = 0 corresponds to zero improvement, while λ = 1
corresponds to perfect improvement. As λ goes from 0 to 1, the markers go from the Yi's
to perfect markers. We run the experiments over the mesh of values λ = 0.1i, i = 0, 1, …,
10. Three typical cases are λ = 0, λ = 0.4, and λ = 0.5, with respective total costs 951.84,
919.41, and 903.26. The corresponding trajectories in the real terrain and the marked
maps associated with λ = 0, 0.4, 0.5 are displayed in Figure 5.3.3. The plot of total cost
vs. improvement parameter is displayed in Figure 5.3.4. Note that improving the original
markers as λ goes from 0 to 1 improves the traversal taken, and the improvement appears
to be monotone.
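As an illustration, the improving scheme (5.3.1) is straightforward to implement. The following Python sketch (the function name is ours, not part of the dissertation's simulation program) maps a list of markers toward their true statuses:

```python
def improve_markers(X, Y, lam):
    """Improving scheme (5.3.1): push each marker Y_i toward its true
    status X_i. lam = 0 leaves Y unchanged; lam = 1 yields perfect markers."""
    assert 0.0 <= lam <= 1.0
    return [lam + (1.0 - lam) * y if x == 1 else (1.0 - lam) * y
            for x, y in zip(X, Y)]
```

At λ = 1 every marker on a true risk becomes 1 and every marker on a false risk becomes 0, which is exactly the perfect-marker case.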
Figure 5.3.3: Three trajectories under λ = 0, 0.4, 0.5 in the real terrain (left plots) and in the associated marked maps (right plots). Cd = 50. When λ = 0, the total cost is 951.84 with no disambiguation; when λ = 0.4, the total cost is 919.41 with one disambiguation; when λ = 0.5, the total cost is 903.26 with 3 disambiguations. The total simulation run times on a PC with a Pentium 4 CPU and 1 GB of RAM in the three cases are 1.797 seconds, 3.828 seconds, and 5.797 seconds, respectively.
Figure 5.3.4: Plot of total cost vs. improvement parameter for the COBRA runs. The mesh of values of λ is 0.1i, i = 0, 1, …, 10. Starting location: s = (−300, 250); target location: t = (300, 600); disambiguation cost per disk: Cd = 50.
6 Deterministic Shortest Path
All the policies introduced in Chapter 4 and Chapter 5 make shortest path plans that
depend on the mark information, which comes from the sensor's readings. It is
understood, especially from the experimental results so far, that a sensor of bad quality is
unreliable: its misleading information often results in high cost, the unnecessary cost
being due to over-travel and over-disambiguation. A natural idea is to let the agent take a
deterministic shortest path (if one exists) from the starting location to the target location
when the sensor is too bad. Based on this idea, the length of the deterministic shortest
path forms an important criterion for quantifying the quality of a sensor. Once a sensor
classifier is found, any policy mentioned before can be adjusted so that high cost due to
bad sensor quality can be avoided.
In this chapter, we first propose the concept of sensor classification based on the
deterministic shortest path. We then analytically derive a sensor classifier in a simple
case. Finally, based on the results of Monte Carlo simulations of the deterministic
shortest path in a minefield, we empirically derive a classifier for the Beta sensors and
propose an adjustment to the adjusted CR policy introduced in Chapter 5. The goal of
this adjustment is to control the expected cost.
6.1 Sensor Classification
To formulate the concept of sensor classification, we use the same setting and notations
that appear in Chapter 4. Given Y ~ S, the classification rule under the measure of
expectation is for a policy P to use Y if
E[C(G, s, t, X, Y, P)] < Ld
or to ignore Y if
E[C(G, s, t, X, Y, P)] ≥ Ld,
where Ld is the length of the deterministic shortest path.
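When the expected cost can be estimated by simulation, this rule is easy to operationalize. A minimal Python sketch, assuming a user-supplied `simulate_cost` routine (a hypothetical name, not part of the dissertation's simulation program) that draws one realization of C(G, s, t, X, Y, P):

```python
def classify_sensor(simulate_cost, L_d, n_runs=2500):
    """Use Y iff the Monte Carlo estimate of E[C(G, s, t, X, Y, P)]
    falls below L_d, the length of the deterministic shortest path."""
    estimate = sum(simulate_cost() for _ in range(n_runs)) / n_runs
    return estimate < L_d  # True: use Y; False: ignore Y
```

The choice n_runs = 2500 mirrors the number of random terrains used in the minefield experiments of section 6.2.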
We now present an analytical result in a simple case.
Consider a convergent graph G = (V, A, B, l, c, ρ) with |B| = 1, as illustrated in Figure
4.3.1. Let a policy P be either a threshold policy Θ(α) or a penalty policy Ψ(l̃) with the
argument α ∈ [0, 1] and the l̃-penalty function being associated with the single
nondeterministic arc, say e. Note that P either leads to a constant cost L1, which is the
length of the deterministic shortest path, or leads to a random cost of the following form:

C(G, s, t, X, Y, P) =
    c(e) + l(e) + L2 + L3   if Y(e) < α̂ and X(e) = 0;
    L1                      if Y(e) ≥ α̂;
    c(e) + L2 + L4          if Y(e) < α̂ and X(e) = 1,
where α̂ ∈ [0, 1]. We only need to discuss the second case, and note that the expected
cost is

E[C(G, s, t, X, Y, P)]
= (c(e) + l(e) + L2 + L3)⋅P(Y(e) < α̂ and X(e) = 0) + L1⋅P(Y(e) ≥ α̂)
  + (c(e) + L2 + L4)⋅P(Y(e) < α̂ and X(e) = 1)
= (c(e) + l(e) + L2 + L3)⋅(1 − ρ(e))⋅F0(α̂)
  + L1⋅(1 − ρ(e))⋅[1 − F0(α̂)] + L1⋅ρ(e)⋅[1 − F1(α̂)]
  + (c(e) + L2 + L4)⋅ρ(e)⋅F1(α̂)
= L1 + (c(e) + L2 + L4 − L1)⋅ρ(e)⋅F1(α̂) − [L1 − (c(e) + l(e) + L2 + L3)]⋅(1 − ρ(e))⋅F0(α̂).
Hence the classification rule under the measure of expectation is for P to use Y if

(c(e) + L2 + L4 − L1)⋅ρ(e)⋅F1(α̂) < [L1 − (c(e) + l(e) + L2 + L3)]⋅(1 − ρ(e))⋅F0(α̂)

or to ignore Y if

(c(e) + L2 + L4 − L1)⋅ρ(e)⋅F1(α̂) ≥ [L1 − (c(e) + l(e) + L2 + L3)]⋅(1 − ρ(e))⋅F0(α̂).

If F1(α̂) = 0, then the rule reduces to: use Y if F0(α̂) > 0 and ignore Y if F0(α̂) = 0. We
consider the nontrivial case that F1(α̂) > 0, and let r̆(α̂) = F0(α̂) / F1(α̂); then we can
see that the rule becomes comparing the ratio r̆(α̂) with the ratio

rG = [(c(e) + L2 + L4 − L1)⋅ρ(e)] / {[L1 − (c(e) + l(e) + L2 + L3)]⋅(1 − ρ(e))}.

To use Y, we require

r̆(α̂) > rG,                                                              (6.1.1)

which simply says that for policy P, a usable sensor must satisfy a minimum quality
requirement that is specified by the nature of the graph. Note that improving the sensor
S = (F0, F1) means increasing F0(α̂) and decreasing F1(α̂); hence r̆(α̂) becomes larger
and inequality (6.1.1) can be satisfied more easily.
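For concreteness, the closed-form expected cost and the critical ratio rG can be computed directly. The sketch below (the function names are ours, purely illustrative) also lets one check the classification rule numerically:

```python
def expected_cost(c, l, L1, L2, L3, L4, rho, F0a, F1a):
    # E[C] = L1 + (c+L2+L4-L1)*rho*F1(a) - [L1-(c+l+L2+L3)]*(1-rho)*F0(a)
    return (L1 + (c + L2 + L4 - L1) * rho * F1a
            - (L1 - (c + l + L2 + L3)) * (1 - rho) * F0a)

def critical_ratio(c, l, L1, L2, L3, L4, rho):
    # r_G: the sensor must achieve F0(a)/F1(a) > r_G for Y to be worth using
    return (((c + L2 + L4 - L1) * rho)
            / ((L1 - (c + l + L2 + L3)) * (1 - rho)))
```

With the illustrative values c = 1, l = 2, L1 = 10, L2 = 3, L3 = 1, L4 = 10, ρ = 0.5, one finds rG = 4/3; a sensor achieving F0(α̂) = 0.8 and F1(α̂) = 0.5 has ratio 1.6 > rG, and correspondingly the expected cost 9.8 is below L1 = 10.

```python
# (continuing the illustration)
print(critical_ratio(1, 2, 10, 3, 1, 10, 0.5))          # 4/3
print(expected_cost(1, 2, 10, 3, 1, 10, 0.5, 0.8, 0.5)) # 9.8 < L1 = 10
```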
A problem is that the theoretical critical ratio rG is unknown, since ρ(e) is unknown. We
would have to estimate ρ(e) via observations of X(e) in order to apply rG; in practice,
however, there is only one observation of X(e). Analyzing the critical sensor ratio may
therefore require resorting to simulations, which in turn requires that the simulation
model correctly capture the nature of the terrain.
6.2 Applied to Minefield
We now turn to the minefield model in section 5.1 and the experimental setting in
section 5.2. The policy is the adjusted CR policy that uses the CR weight function (5.1.6)
in its shortest path planning. Note that, in the minefield model, the deterministic s-t
shortest path is random, and so is its length.
Under the experimental setting in section 5.2, we randomly generate 2500 terrains and,
for each terrain, we calculate the length of the deterministic shortest path that avoids all
the risk disks. To find a deterministic shortest path, we can simply let the constant
disambiguation cost per disk Cd be sufficiently large (say 450). This setup lets the
adjusted CR policy make only a single plan in each run. Let L̄d denote the expected
length of the deterministic shortest path. As in section 5.2, let E(C|λ) denote the mean
cost under the (Beta) sensor parameter λ. By our experimental results, E(C|λ) is strictly
monotonically decreasing with respect to λ. Hence we may expect E(C|λ) < L̄d if and
only if λ > λ* for some 0 < λ* < 3.5. The comparison between the average cost of
deterministic traversals and the average cost of nondeterministic traversals is illustrated
in Figure 6.2.1. As expected, there is a critical sensor parameter, λ* = 1.8561, which
means: use Yi, i = 1, 2, …, 100 if λ > 1.8561, and ignore those Yi's otherwise.
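The critical parameter λ* can be read off the simulated curve E(C|λ) by locating where it crosses L̄d. A minimal sketch, assuming a decreasing cost curve sampled on the λ-mesh (the function name is ours):

```python
def critical_parameter(lambdas, mean_costs, L_bar_d):
    """Locate lambda* where the decreasing curve E(C|lambda) crosses
    L_bar_d, using linear interpolation between adjacent mesh points."""
    pairs = list(zip(lambdas, mean_costs))
    for (l0, c0), (l1, c1) in zip(pairs, pairs[1:]):
        if c0 >= L_bar_d > c1:
            # interpolate within the bracketing interval [l0, l1]
            return l0 + (l1 - l0) * (c0 - L_bar_d) / (c0 - c1)
    return None  # the curve never crosses L_bar_d on this mesh
```

For example, with mesh costs [190, 180, 160, 150] at λ = 0, 1, 2, 3 and L̄d = 170, the crossing is interpolated at λ* = 1.5.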
Figure 6.2.1: Deterministic shortest paths vs. nondeterministic traversals under the experimental setting of section 5.2. The critical parameter of the Beta sensor is λ* = 1.8561.
7 Summary, Conclusion, and Future Research
7.1 Summary and Conclusions
In this dissertation research, we developed a new formulation of the RDP problem under
the new concept of mark information, together with the associated new RDP algorithms.
Also, for the minefield application, a fast, flexible RDP simulation program that is based
on dynamic A* search was delivered.
We found a new explanation of the A* algorithm from a primal-dual point of view. We
proved the tractability of the problem of traversing a probabilistic graph in the special
case of the parallel graph. We developed the concept of a sensor and the new concept of
mark information based on the sensor's readings. We proposed the class of threshold
policies and the class of penalty policies, both of which consider not only the markers
but also the disambiguation cost. We developed the important concept of sensor
monotonicity and proved some analytical monotonicity results in simple settings. We
developed an RDP simulation program for the minefield model and performed extensive
simulations to provide numerical and statistical evidence of sensor monotonicity under
minefield settings that are beyond the reach of analytical studies. We also made some
experimental comparisons between deterministic shortest paths and nondeterministic
traversals.
Based on the operational features of the COBRA system, our new RDP model, built on
the new concept of mark information, well captures the nature of the navigation and
disambiguation problems posed by the COBRA group. The new formulation also
reflects the trend in the development of modern navigation systems: collecting real-time
information via in-situ observation using technologically enhanced devices such as
UAV-based sensors.
Experimental results show that the new RDP algorithm we developed for the minefield
model is both efficient and effective. A considerable number of running cases show that
our current RDP simulation program can handle one hundred disks well, and the graphic
results appear to be reasonable. More importantly, massive simulations have shown that,
under our adjusted CR policy, the traversal performance has the property of sensor
monotonicity. The simulation tool, thus, is useful for virtually testing multiple sensors
and providing information to support the cost/benefit analysis of whether a superior
(and presumably more expensive) sensor is worth the additional cost.
Although our theoretical analysis is far behind our experimental studies, we did lay the
groundwork for possible future advances. For example, we already know that the
problem of traversing the probabilistic parallel graph, with the advent of the CR policy,
is tractable under the independent probability-marker assumption. We also know that,
for traversing the parallel graph with independently marked nondeterministic arcs, the
class of threshold policies is weakly monotone, and that, for traversing the convergent
graph with only one marked nondeterministic arc, both the class of threshold policies
and the class of penalty policies are strongly monotone. It is natural to extend these
results to more general graphs or more general conditions (e.g., dependence).
7.2 Future Research
We suggest several directions for future research:
The first direction is continual RDP modeling. The goal of this direction is to make new
formulations of the RDP problem more and more realistic. The new concept of mark
information based on the sensor's readings represents a leap in the development of RDP
modeling, but there are also many other issues that should be considered. For example,
modeling a minefield in a coastal area must also take the coastal type into consideration.
A planner must combine the information of the charted minefield with the geographic
information so that the agent not only avoids the mine threats but also detours around
the geographically inaccessible regions. A simulated minefield in a coastal environment
is illustrated in Figure 7.2.1. The terrain is fractally generated using the diamond-square
algorithm of Miller [54]. The future RDP simulation program should be able to interface
with both the detection data and the simulated terrain matrices. As mentioned in
Chapter 1, the target location might change while the agent travels, or, as illustrated in
the right plot of Figure 7.2.1, the target might be a region rather than a point. Hence
attention should also be given to the modeling of the target. A more important concern is
location uncertainty, which is mentioned in [1] as a strong suggestion for the focus of
future endeavors. Various types of naval mines and laying strategies were introduced in
[55], [56]. From the practical point of view, a water minefield pattern changes over time.
For example, some mines drift, either because they are floating mines or because some
moored mines break from their moorings. Usually, a naval ship is also equipped with
short-range scouting helicopters that can provide continual scan information. Therefore,
to match practice, the model with one-time prior knowledge may be replaced by one that
considers multi-stage prior knowledge.

Figure 7.2.1: Example of a minefield setting with coastal geography incorporated. Left: fixed target location; Right: a target region.
The second direction is policy design and simulation building. Once an RDP model is
established, the next steps are the policy design and the building of the simulation
program. Although our current simulation programs can handle 100 disks in R2 well,
their applicability is still very limited. Our suggestions for improvement include
1) When the target is a fixed point in R2 and the grid graph that is used to discretize the
world is large (e.g., a 10^3 × 10^3 subset of Z2 endowed with eight-adjacency), replace
the A* algorithm with the D* algorithm (or the D* Lite algorithm) for replanning, since
D* replanning enables real-time operation.
2) Model the target and enable the simulation program to handle the changing target or
target region. To accelerate the A* planning and replanning when the target is far
away from the starting location, a grid graph with heterogeneous resolutions (e.g.,
quadtree) can be used.
3) Model the changing pattern of the minefield (especially the location uncertainty) and
enable the simulation program to simulate the multi-channel updates on the
knowledge of the world.
4) Design the policies that not only deal with the uncertainty of true-false status but also
deal with the uncertainty of locations. Virtually test various policies and choose those
that have relatively small cost.
5) Enable the simulation program to incorporate the simulated geographic data into the
planning and replanning.
The third direction is purely theoretical investigation. The preliminary analytical results
we have proved in simple cases, together with the experimental results we have obtained
under more general settings, lead us to many conjectures. For example, we conjecture
that the CR policy is strongly monotone with respect to the sensor under a general graph
setting, and this result may hold even without the assumption of independence. We may
continue our efforts toward proving more general monotonicity results. As mentioned in
Chapter 6, the distribution of the deterministic shortest path length under the minefield
setting (in Chapter 5) is very interesting, and we may explore some of its distributional
properties. There are still open questions regarding the formulation of traversing
probabilistic graphs (see Mani et al. [57]). Is the problem of traversing some probabilistic
graph more general than the parallel graph also tractable? How well does the CR policy
perform for traversing general graphs?
Bibliography
[1] C. E. Priebe, D. E. Fishkind, L. Abrams, and C. D. Piatko. Random Disambiguation Paths. Naval
Research Logistics, Vol. 52, pp. 285–292, 2005.
[2] D. E. Fishkind, C. E. Priebe, K. E. Giles, L. N. Smith, and V. Aksakalli. Disambiguation
Protocols Based on Risk Simulation. IEEE Transactions on Systems, Man, and Cybernetics, Part A,
Vol. 37, No. 5, pp. 814–823, 2007.
[3] V. Aksakalli, D. E. Fishkind, C. E. Priebe, and X. Ye. The CR Disambiguation Protocol.
Computers & Operations Research, to appear, 2008.
[4] X. Ye, C. E. Priebe, D. E. Fishkind, and L. Abrams. Sensor Information Monotonicity in
Disambiguation Protocols. Submitted for publication, 2008.
[5] Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest, and Clifford Stein. Introduction to
Algorithms, Second Edition. MIT Press and McGraw-Hill, 2001.
[6] R. K. Ahuja, T. L. Magnanti, and J. B. Orlin. Network Flows: Theory, Algorithms, and
Applications. Prentice Hall, NJ, 1993.
[7] T. Lozano-Perez and M. A. Wesley. An Algorithm for Planning Collision-Free Paths among
Polygonal Obstacles. Communications of the ACM, 22(10), pp. 560-570, 1979.
[8] E. Welzl. Constructing the Visibility Graph for N-Line Segments in O(n2) Time, Information
Processing Letters, 20, pp. 167-171, 1985.
[9] S. K. Ghosh and D. M. Mount. An Output-Sensitive Algorithm for Computing Visibility Graphs.
SIAM Journal on Computing, 20(5), pp. 888-910, 1991.
[10] M. L. Fredman and R. E. Tarjan. Fibonacci Heaps and Their Uses in Improved Network
Optimization Algorithms. Journal of the Association for Computing Machinery, 34(3), pp.
596-615, 1987.
[11] R. Ahuja, K. Mehlhorn, J. B. Orlin, and R. E. Tarjan. Faster Algorithms for the Shortest Path
Problem. Journal of the Association for Computing Machinery, 37(2), pp. 213-223, 1990.
[12] J. S. B. Mitchell. Shortest Paths among Obstacles in the Plane. In Proceedings of the 9th ACM
Symposium on Computational Geometry, pp. 308-317, 1993.
[13] M. Bern, D. Eppstein, and J. R. Gilbert. Provably Good Mesh Generation. In Proceedings of the
31st IEEE Symposium on Foundations of Computer Science, pp. 231-241, 1990.
[14] J. Hershberger and S. Suri. Efficient Computation of Euclidean Shortest Paths in the Plane,
Proceedings. 34th IEEE Annual Symposium on Foundations of Computer Science, pp. 508-517,
1993.
[15] D. Z. Chen. Developing Algorithms and Software for Geometric Path Planning Problems. ACM
Computing Surveys, 28 (4), Article No. 18, 1996.
[16] J.-C. Latombe. Robot Motion Planning. Kluwer Academic Publishers, Norwell, MA, 1991.
[17] Y. K. Hwang and N. Ahuja. Gross Motion Planning: A Survey. ACM Computing Surveys, 24(3),
pp. 219-291, 1992.
[18] B. Grunbaum and G. C. Shephard. Tilings and Patterns. W. H. Freeman and Company, New York,
NY, 1986.
[19] D. Chavey. Tilings by Regular Polygons — II: A Catalog of Tilings. Computers & Mathematics
with Applications 17: 147–165, 1989.
[20] A. Patel. Amit's Thoughts on Grids. 2006. http://www-cs-students.stanford.edu/~amitp/game-programming/grids/
[21] H. Samet. An Overview of Quadtrees, Octrees, and Related Hierarchical Data Structures. NATO
ASI Series, Vol. F40, 1988.
[22] S. Kambhampati, L. Davis. Multiresolution Path Planning for Mobile Robots. IEEE Journal of
Robotics and Automation, 2(3), pp. 135- 145, 1986.
[23] D. Z. Chen, R. J. Szczerba, and J. J. Uhran Jr. A Framed-Quadtree Approach for Determining
Euclidean Shortest Paths in a 2-D Environment. IEEE Transactions on Robotics and Automation,
13(5), pp. 668-681, 1997.
[24] A. Yahja, A. Stentz, S. Singh, and B. Brummit. Framed-Quadtree Path Planning for Mobile
Robots Operating in Sparse Environments. In Proceedings, IEEE Conference on Robotics and
Automation (ICRA), Leuven, Belgium, May 1998.
[25] P. E. Hart, N. J. Nilsson, and B. Raphael. A Formal Basis for the Heuristic Determination of
Minimum Cost Paths. IEEE Transactions on Systems Science and Cybernetics, SSC4 (2), pp.
100–107, 1968.
[26] N. J. Nilsson. Principles of Artificial Intelligence. Morgan Kaufmann, San Mateo, California,
1980.
[27] J. Pearl. Heuristics: Intelligent Search Strategies for Computer Problem Solving. Addison-Wesley,
1984.
[28] S. J. Russell and P. Norvig. Artificial Intelligence: A Modern Approach. Pearson Education,
2003.
[29] P. Lester. 2005. A* Pathfinding for Beginners. http://www.policyalmanac.org/games/aStarTutorial.htm
[30] A. Patel. 2006. Amit’s A* pages. http://theory.stanford.edu/~amitp/GameProgramming/
[31] R. E. Korf. Depth-First Iterative-Deepening: An Optimal Admissible Tree Search. Artificial
Intelligence, 27 (1), pp. 97–109, 1985.
[32] P. P. Chakrabarti, S. Ghose, A. Acharya and S. C. De Sarkar. Heuristic Search in Restricted
Memory. Artificial Intelligence, 41(2), pp. 197-221, 1989.
[33] S. J. Russel. Efficient Memory-Bounded Search Methods. In ECAI 92: 10th European
Conference on Artificial Intelligence Proceedings, pp. 1-5, Vienna, Austria. Wiley, 1992.
[34] R. E. Korf. Linear-Space Best-First Search. Artificial Intelligence, 62(1): pp. 41-78, 1993.
[35] R. E. Korf and W. Zhang, Frontier Search, Journal of the Association for Computing Machinery,
52 (5), pp. 715-748, 2005.
[36] A. Zelinsky. A Mobile Robot Exploration Algorithm. IEEE Transactions on Robotics and
Automation, 8 (6), December, 1992.
[37] A. Stentz. Optimal and Efficient Path Planning for Partially-Known Environments. In
Proceedings of the IEEE International Conference on Robotics and Automation, May 1994.
[38] A. Stentz. The Focussed D* Algorithm for Real-Time Replanning. In Proceedings of the
International Joint Conference on Artificial Intelligence (IJCAI), 1995.
[39] S. Koenig and M. Likhachev. D* Lite. Proceedings of the Eighteenth National Conference on
Artificial Intelligence (AAAI), pp. 476-483, 2002.
[40] S. Koenig and M. Likhachev. Incremental A*. Advances in Neural Information Processing
Systems 14 (NIPS), MIT Press, Cambridge, MA, 2002.
[41] G. Ramalingam and T. Reps. An incremental algorithm for a generalization of the shortest-path
problem. Journal of Algorithms, 21, pp. 267–305, 1996.
[42] C. H. Papadimitriou and K. Steiglitz. Combinatorial Optimization, Dover Publications, INC.,
1998.
[43] I. Pohl. Bidirectional Search, in Machine Intelligence 6, B. Meltzer and D. Michie, Eds.,
American Elsevier, New York, pp. 127-140, 1971.
[44] A. Bar-Noy and B. Schieber. The Canadian Traveller Problem. SODA ’91: Proceedings of the
Second Annual ACM-SIAM Symposium on Discrete Algorithms, 1991.
[45] C.H. Papadimitriou and M. Yannakakis. Shortest Paths without A Map, Theoretical Computer
Science, 84, pp. 127–150, 1991.
[46] D.M. Blei and L.P. Kaelbling. Shortest Paths in a Dynamic Uncertain Domain. IJCAI Workshop
on Adaptive Spatial Representations of Dynamic Environments, 1999.
[47] M. Likhachev, G. Gordon, and S. Thrun. Planning for Markov Decision Processes with Sparse
Stochasticity. In Lawrence K. Saul, Yair Weiss, and Léon Bottou, Editors, Advances in Neural
Information Processing Systems, MIT Press, Cambridge, MA, 2005.
[48] G. Andreatta and L. Romeo. Stochastic Shortest Paths with Recourse. Networks, 18, pp. 193–204,
1988.
[49] G. H. Polychronopoulos and J. N. Tsitsiklis. Stochastic Shortest Path Problems with Recourse.
Networks, 27, pp. 133–143, 1996.
[50] J. S. Provan. A Polynomial-Time Algorithm to Find Shortest Paths with Recourse. Networks,
42, pp. 115–125, 2003.
[51] M. Shaked and J. G. Shanthikumar. Stochastic Orders and Their Applications. Academic Press,
1994.
[52] C.E. Priebe, J.S. Pang, and T. Olson. Optimizing Sensor Fusion for Classification Performance.
Proceedings of the CISST 1999 International Conference, pp. 397–403, 1999.
[53] T. Olson, J. S. Pang, and C. E. Priebe. A Likelihood-MPEC Approach to Target Classification.
Mathematical Programming, 96, pp.1–31, 2002.
[54] G. Miller. The Definition and Rendering of Terrain Maps. Computer Graphics, 20 (4), pp. 9-48,
1986.
[55] G. K. Hartmann and S. C. Truver. Weapons That Wait: Mine Warfare in the U.S. Navy. Naval
Institute Press, Annapolis, 1991.
[56] A. Washburn. Mine Warfare Models. Naval postgraduate school notes, September 2007.
[57] M. Mani, A. Zelikovsky, G. Bhatia, and A. B. Kahng. Traversing Probabilistic Graphs. Technical
Report, University of California at Los Angeles, 1999.