RANDOM DISAMBIGUATION PATHS: MODELS,
ALGORITHMS, AND ANALYSIS
by
Xugang Ye
A dissertation submitted to The Johns Hopkins University in conformity with the
requirements for the degree of Doctor of Philosophy.
Baltimore, Maryland
December, 2008
Abstract
The main problem considered in this thesis is to navigate an agent that is capable of
disambiguation to safely and swiftly traverse a terrain populated with possibly hazardous
regions that can overlap. The problem has three main features. First, the planning is made
under uncertainty but without blindness. Second, the agent can disambiguate the
true-false status of any potential hazard as it approaches the hazard's vicinity. Third,
replanning can be made with less uncertainty if new information is collected en
route. We formulate this problem as a dynamic shortest path problem in a directed,
positively weighted graph; that is, any plan is a shortest path from the agent's location to
the target node in the graph. Each time a plan is made, the agent moves accordingly
until it encounters uncertainty. The agent then disambiguates, at a cost, the local
uncertainty, adds the disambiguation result(s) to the knowledge of the world, and replans
a new shortest path from where it is to the goal.
In consideration of real-world practice and simulation efficiency, we apply the A*
algorithm for the deterministic shortest path subproblem. We also give the A* algorithm
a new explanation within the primal-dual framework. For the terrain, we assume that
besides the natural topological information, which facilitates the use of the A* search,
there is also additional prior information regarding the likelihood of the true-false status
of the potential hazards. This additional prior information forms the initial knowledge of
the world. We present a navigation policy called CR and its various versions. The policy
integrates the prior information and the information collected en route. We provide both
theoretical and experimental results under different settings. As part of this dissertation
research, a computer program that simulates an agent to traverse a minefield was
developed. As an important tool, the program, combined with Monte Carlo simulations,
helps us deal with complicated real-world scenarios that are beyond the reach of any
known analytical method. As an application, we used the program to process the Navy's
reconnaissance data and obtained exciting results.
Acknowledgements
First, I would like to thank my wife, Judy H. Wang. She accompanied me through the
hardship of the Ph.D. years and has been a priceless treasure in my life. Next, I would like
to thank my mother for her great patience in taking care of my little daughter, Vicky.
Without her, it would have been impossible for my wife and me to pursue our academic goals.
I would like to express my deepest appreciation to my advisor Prof. Shih-Ping Han for
his academic guidance. He has been not only a great mentor, but also a great friend. He
patiently spent countless hours in improving my analytical skill. I am grateful for
everything I learned from him and everything he has done for me.
I would like to express my great gratitude to my co-advisor Prof. Carey Priebe, who not
only provided me with continual financial support through his ONR grants but also introduced
me to the RDP project that forms the subject of my dissertation research. He was
always willing to help and he always had great judgement. He provided valuable insight
and suggestions during the development of this dissertation.
I would like to send tons of thanks to Prof. Donniell Fishkind and Prof. Lowell Abrams
for their great help with my efforts in developing the computer program for RDP
simulations and in transforming the technical reports into professional academic papers.
For me, they are not just two project supervisors in our RDP team; they are my good
friends. I feel lucky to have two such friends who are not only knowledgeable but also
considerate.
I would like to thank Prof. Daniel Q. Naiman, Prof. Edward Scheinerman, Prof.
Benjamin Hobbs, and Prof. Justin Williams for serving on my Ph.D. candidacy exam and
graduate board oral exam committees. I would also like to thank Dr. Castello for her kind help.
Finally, I would like to thank the Johns Hopkins University Center for Imaging Science
for providing the JHU CIS 16-CPU/128-GB computational server for my RDP simulations
and extensive experiments.
Contents Page
Abstract ............................................................................................................................... ii
Acknowledgements ............................................................................................................ iv
List of Tables .................................................................................................................... viii
List of Figures .................................................................................................................... ix
1 Introduction .................................................................................................................. 1
1.1 RDP Problems ................................................................................................... 2
1.2 Story of RDP Research ...................................................................................... 5
1.3 Classical Path Planning ..................................................................................... 7
1.4 Recent Developments ........................................................................................ 9
1.4.1 Terrain Modeling .................................................................................. 10
1.4.2 Search Algorithms ................................................................................. 12
1.5 Contributions of This Dissertation Research ................................................... 15
1.6 Organization of the Thesis ............................................................................... 17
2 The A* Algorithm ...................................................................................................... 20
2.1 Best-First Search ............................................................................................. 20
2.2 Primal-Dual ..................................................................................................... 23
2.3 Derivation of the A* ........................................................................................ 26
2.4 Duality ............................................................................................................. 34
2.5 Heuristics ......................................................................................................... 35
2.6 Bidirectional Search ........................................................................................ 39
3 Traversing Probabilistic Graphs ................................................................................. 42
3.1 Probability Markers ......................................................................................... 43
3.2 The CR Policy ................................................................................................. 45
3.3 Parallel Graph .................................................................................................. 49
4 Mark Information via Sensor ..................................................................................... 63
4.1 Setting .............................................................................................................. 64
4.2 Sensor Monotonicity ....................................................................................... 66
4.3 Threshold and Penalty Policies ....................................................................... 67
5 Traversing Minefield .................................................................................................. 77
5.1 Minefield Model .............................................................................................. 78
5.2 Experimental Setting and Results .................................................................... 83
5.3 The COBRA Data ............................................................................................ 95
6 Deterministic Shortest Path...................................................................................... 101
6.1 Sensor Classification ..................................................................................... 102
6.2 Applied to Minefield ..................................................................................... 104
7 Summary, Conclusion, and Future Research ........................................................... 106
7.1 Summary and Conclusions ............................................................................ 106
7.2 Future Research ............................................................................................. 108
Bibliography ................................................................................................................... 112
List of Tables
Table 5.2.1: 1-sided Kolmogorov-Smirnov tests for pairwise comparisons of the distributions of samples ..... 91
Table 5.2.2: 1-sided t tests for pairwise comparisons of the means of samples ..... 92
Table 5.2.3: Hypothesis tests for comparing the distributions and means of samples associated with λ3 = 1.0 and λ6 = 2.5 ..... 94
Table 5.3.1: x, y-coordinates of the risk centers and the associated markers in the COBRA data ..... 96
List of Figures
Figure 2.5.1: An example in which the primal-dual algorithm starting from some π(0) = h′ does not terminate, where the length of any arc is 1 and the heuristic function h′ is {h′(s) = 0, h′(u) = 0, h′(t) = 0, and h′(vi) = i for i = 1, 2, …} ..... 38
Figure 3.3.1: An example of a general (nonparallel) graph where the optimal policy requires a balk ..... 51
Figure 3.3.2: The decision tree of the balk-free policy a1 → a2 → … → am+1 for traversing a parallel graph ..... 52
Figure 3.3.3: The dynamic programming search tree for finding the optimal policy for traversing the parallel graph in which A = {a} and B = {e1, e2} ..... 58
Figure 4.3.1: Analysis of the convergent graph G with a single nondeterministic arc ..... 74
Figure 5.1.1: The grid representation, with 8-adjacency ..... 79
Figure 5.2.1: A collection of m = 100 detections ..... 84
Figure 5.2.2: A realization of a trajectory in a real terrain (upper left) and in one of its marked maps (upper right), with two close-up views (lower left and lower right) in the real terrain ..... 86
Figure 5.2.3: A realization of another trajectory in the same real terrain (upper left) and in one of its marked maps (upper right), with two close-up views (lower left and lower right) in the real terrain ..... 87
Figure 5.2.4: Graphical statistical results of the data from the experiments conditioned on terrain T1 ..... 90
Figure 5.2.5: Graphical statistical results of the data from the unconditional experiments ..... 90
Figure 5.3.1: The COBRA terrain (left) and the projected COBRA terrain (right) ..... 96
Figure 5.3.2: Three trajectories under Cd = 5, 50, 500, displayed in the real terrain (left plots) and in the originally marked map (right plots) ..... 97
Figure 5.3.3: Three trajectories under λ = 0, 0.4, 0.5 in the real terrain (left plots) and in the associated marked maps (right plots) ..... 99
Figure 5.3.4: Plot of total cost vs. improvement parameter for COBRA runs ..... 100
Figure 6.2.1: Deterministic shortest paths vs. nondeterministic traversals under the experimental setting of Section 5.2 ..... 105
Figure 7.2.1: Example of a minefield setting with coastal geography incorporated ..... 109
1 Introduction
This dissertation is centered on a research project called Random Disambiguation Paths
(RDP). The project is supported by the Office of Naval Research (ONR). The main problem,
posed by the Coastal Battlefield Reconnaissance and Analysis (COBRA) Group, is to
navigate a combat unit safely and swiftly through a coastal environment with mine threats
to reach a preferred target location.
The COBRA system consists of three primary components: the COBRA Airborne
Payload, the Tactical Control Software (TCS), and the COBRA Processing Station. The
COBRA Airborne Payload consists of a multi-spectral sensor system that is placed on an
unmanned aerial vehicle (UAV) to conduct reconnaissance and detect threats. The TCS
that is loaded onto the UAV Ground Control Station controls the COBRA Airborne
Payload. Analysis of the data collected by the COBRA Airborne Payload is conducted at
the COBRA Processing Station. A good navigation algorithm, as a functional unit of
the COBRA Processing Station, plays an important role in the Marine Corps' operational
maneuvers.
This dissertation research is conducted on RDP modeling and algorithm design. As
a continuation of the effort of the Johns Hopkins University RDP group, this work explores
reasonable RDP models and proposes practical RDP algorithms. There are two main
tracks in this research: one is theoretical analysis; the other is experimental, or numerical,
analysis. The theoretical analysis is developed on simple settings. Its goal is to capture
important features of RDP models and algorithms and to provide guidelines for designing
methods suitable for complicated real-world scenarios. The experimental analysis is
performed on much more complicated settings. Its goal is to simulate real-world
scenarios and to provide numerical and statistical evidence of the efficiency and
effectiveness of the proposed algorithms.
1.1 RDP Problems
An RDP problem, in its broad setting, is to navigate an agent that is capable of
disambiguation to safely and swiftly traverse a terrain populated with possibly hazardous
regions that can overlap. As prior information, each region is marked with the likelihood
that it is a true hazard. The agent is assumed to be able to disambiguate, at a cost, the
true-false status of a marked region when it reaches the boundary of that region. The
meaning of the disambiguation cost lies in the fact that disambiguation slows the agent
down. Hence, when the agent disambiguates a potentially hazardous region, we may
think of the agent as traveling an additional distance beyond the Euclidean distance, with
the cost added to the Euclidean distance the agent has traveled. The agent should safely
reach a target location with minimum total cost.
The problem has three main features. First, the agent travels in an uncertain environment.
However, the agent is not totally blind, since the likelihood markers at least provide
prior knowledge of the terrain. Second, the agent can disambiguate the true-false status of
any potential hazard in its vicinity as it travels. Third, new information collected en route
enlarges the knowledge of the world, and hence any new decision may face less
uncertainty.
Different versions of the problem can arise from different settings. For example, in a
manner that is relatively convenient for theoretical investigation, one may model the
problem as traversing a directed graph that contains independently marked
nondeterministic arcs. In this setting, the agent can be assumed to be able to
disambiguate a nondeterministic arc once it reaches the tail node of the arc. Although
practical considerations may suggest that the disambiguation could happen somewhere
other than at the tail node, the arc can be split so that the assumption still applies.
Compared with the graph model under the assumption of independent arc markers, a
minefield setting is much more complicated. In a typical minefield model, the markers
are initially allocated to disk-shaped regions that may overlap. A discretization
process usually generates a directed graph with many of its nondeterministic arcs
dependently marked.
Since the disambiguation cost constitutes part of the total cost of traversal, the
assumption on how the disambiguation cost is calculated forms an important feature of an
RDP problem. In the graph model with the independent-arc-marker assumption, the cost of
disambiguating a nondeterministic arc can simply be viewed as a given parameter. In a
minefield model, we are usually given the cost of disambiguating each disk-shaped region.
If we construct a graph to discretize the world, we need to somehow carry this
information over to the graph. Also, there may be constraints on the agent's
disambiguation capability (e.g., the agent can make at most a certain number of
disambiguations, or can afford at most a certain total disambiguation cost).
Marine Corps' practice also poses the challenging problem in which the target location
may change while the agent travels. A reasonable approach is to replace the single target
location with a set of target locations. In the minefield setting, the set of target
locations can be a target region; the mission is then accomplished as soon as the agent
is inside the target region.
More challenging problems can arise when there is more than one agent to navigate. One
issue in a multi-agent RDP problem is that the information collected by one agent can be
shared with the other agents; hence there is additional complexity due to the communication
among the agents.
1.2 Story of RDP Research
The effort of the Johns Hopkins University RDP group has been focused on the RDP
problem with a fixed target and a single agent. Early work also assumes the availability of
the risky regions' probabilities of nontraversability, which are given to the agent at the
outset. The RDP concept was first introduced in Priebe et al. 2005 [1]. An important result
is that, under mild assumptions, an RDP, with positive probability, strictly reduces the
expected cost of traversal compared with any deterministic shortest path. This result
suggests that an RDP algorithm should be able to exploit this benefit as long as it exists.
Both the work in [1] and the follow-up development by Fishkind et al. 2007 [2] explored
the methods of finding a policy that yields small expected cost. Currently known
methods usually take three steps. The first step is to discretize the world by a directed
graph. In [2], the tangent arc graph (TAG) is applied to the minefield setting. TAG is a
precise map representation. The downside is that constructing a TAG is computationally
demanding when the number of mine detections is large. The second step is to assign
weights to the arcs of the graph. This step is the most important, since the weight function
largely determines the quality of the final traversal. This step actually reflects how a
policy uses the markers. The third step is to invoke an efficient shortest path algorithm to
compute a shortest path from where the agent is to the target node in the graph. In [2],
Dijkstra's algorithm with a binary heap implementation was used. In this step, the efficiency
of implementation also depends on the data structure rendered in the first step.
Functionally, completion of the third step specifies the plan of the agent’s next move. If
the planned next move is risk-free, then the agent moves on; otherwise, the agent
disambiguates. The disambiguation result(s) will be incorporated into the new weight
function and both the second and third steps will be repeated.
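The plan-move-disambiguate-replan loop described above can be sketched as follows. This is a minimal illustration under simplifying assumptions, not the simulation program developed for this dissertation: all function names are hypothetical, and the marker-dependent surcharge in assign_weights is a stand-in for the actual weight functions (such as CR) discussed later.

```python
import heapq

def dijkstra(adj, weights, src, dst):
    """Step 3: a standard shortest-path search (binary-heap Dijkstra)."""
    dist, prev, seen = {src: 0.0}, {}, set()
    pq = [(0.0, src)]
    while pq:
        d, u = heapq.heappop(pq)
        if u in seen:
            continue
        seen.add(u)
        if u == dst:
            break
        for v in adj[u]:
            nd = d + weights[(u, v)]
            if nd < dist.get(v, float("inf")):
                dist[v], prev[v] = nd, u
                heapq.heappush(pq, (nd, v))
    path = [dst]
    while path[-1] != src:
        path.append(prev[path[-1]])
    return path[::-1]

def assign_weights(edges, knowledge, penalty):
    """Step 2: arcs known to be clear keep their length, arcs known to be
    blocked are barred, and ambiguous arcs are charged a marker-dependent
    surcharge (a simplification, not the CR weight function itself)."""
    w = {}
    for (u, v), (length, p) in edges.items():
        if knowledge.get((u, v)) == "clear":
            w[(u, v)] = length
        elif knowledge.get((u, v)) == "blocked":
            w[(u, v)] = float("inf")
        else:
            w[(u, v)] = length + penalty * p
    return w

def traverse(adj, edges, src, dst, truth, disamb_cost, penalty):
    """Plan, move until an ambiguous arc is met, disambiguate, replan."""
    knowledge, loc, total = {}, src, 0.0
    while loc != dst:
        weights = assign_weights(edges, knowledge, penalty)
        path = dijkstra(adj, weights, loc, dst)
        for u, v in zip(path, path[1:]):
            if edges[(u, v)][1] > 0 and (u, v) not in knowledge:
                total += disamb_cost              # pay to resolve the arc
                knowledge[(u, v)] = "blocked" if truth[(u, v)] else "clear"
                break                             # replan with new knowledge
            total += edges[(u, v)][0]
            loc = v
    return total
```

On a three-node graph with one cheap but ambiguous arc, the agent first heads for the ambiguous route, disambiguates at its tail node, and falls back to the safe route if the arc turns out to be blocked, accumulating the disambiguation cost on the way.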
A shortcoming of the weight function in [2] is that it does not contain the disambiguation
cost. In [3], a weight function called CR was introduced. The author of this thesis (as a
co-author of [3]) proved that the CR weight function yields the minimum expected cost in a
special setting in which the graph is a parallel graph (with only two nodes s and t) and the
nondeterministic arcs are independently marked with probabilities of
nontraversability. Although this result is very limited, it motivates us to apply the CR
weight function to general settings that are much closer to real-world scenarios.
The work in [4] was mainly done by the author of this thesis. The starting point of that paper
is that, in practice, the markers often represent only estimates of the underlying true-false
status of the potential hazards, and the markers are actually obtained from a sensor's
readings. Hence a natural question is: does improving the sensor lead to less
cost? It turns out that the intuitive answer "yes" does not have a trivial validation. The
focus of [4] is therefore on sensor monotonicity. Due to the requirement of intensive
Monte Carlo simulations, the grid graph and the A* algorithm with a binary heap
implementation were invoked. Massive Monte Carlo simulations under the minefield setting
did produce numerical monotonicity results, which strongly complement two
analytical monotonicity results under simple settings.
1.3 Classical Path Planning
In the literature, path planning is concerned with finding paths connecting different locations
in an environment. If the environment takes the form of a graph, in which the nodes (or
vertices) are defined as the locations and the weights of the arcs (or edges) are defined as
the transition costs, then path planning falls into the range of the classical shortest path
problems (see Cormen et al. 2001 [5] and Ahuja et al. 1993 [6] for a comprehensive
survey). In the field of Artificial Intelligence (AI), path planning deals with computing
desired paths in a geometric space embedded with forbidden areas or risky regions. One
of the most fundamental geometric path planning problems is to find a shortest path in a
plane populated by a finite number of pre-known static polygonal obstacles, without
passing any interior point of any obstacle. This problem was initially solved by
constructing a visibility graph and invoking a shortest path algorithm on the graph (see
Lozano-Perez and Wesley 1979 [7]). This approach fueled intensive research on
computing visibility graphs. For the worst-case time complexity of computing a visibility
graph, see Welzl 1985 [8] and Ghosh and Mount 1991 [9]. In classical path planning, a
visibility graph is usually coupled with Dijkstra's algorithm implemented with a
heap data structure. Representative heap implementations include the binary heap, the
Fibonacci heap, and the radix heap. Detailed information on worst-case time complexity
can be found in [5], [6], [10], [11]. Besides the methods that are based on visibility
graphs, there
are also shortest path map approaches (e.g., Mitchell et al. 1993 [12]). Quite often, a
quad-tree-style subdivision of the plane (see Bern et al. 1990 [13]) is employed. A
representative composite method that combines the shortest path map approach and the
quad-tree-style subdivision of the plane was provided by Hershberger and Suri 1993
[14].
1.4 Recent Developments
Since the 1990s, path planning has found practical applications in real-time strategy (RTS)
computer games (e.g., Command and Conquer and the Age of Empires) and in
real-world navigation systems (e.g., planetary rovers, combat ships, and ground armored
vehicles). To practitioners, the challenges of incorporating the existing geometric shortest
path algorithms into those applications are substantial. For example, rather
than just considering static polygonal obstacles on a perfectly "flat" land, practical
path planning algorithms must be able to deal with very complicated obstacle
shapes, non-flat landscapes, and, if possible, dynamic environments. The cost of a path
may also depend on more "general" factors than the Euclidean distance. Those factors
may include the types of areas passed through, slopes, turning angles, etc. (see Chen 1996
[15]). Many known geometric shortest path algorithms are very environment-specific and
depend on sophisticated data structures and geometric procedures (see Latombe 1991 [16]
and Hwang and Ahuja 1992 [17] for good surveys). Hence, it is necessary to develop
simple-to-implement yet reasonably efficient methods that work for more "general" path
planning systems. It is also highly desirable to develop such methods to be compatible
with some "standard" input (e.g., terrain matrices). Under this requirement, a practically
useful path planning method should include at least two features: 1) flexible terrain
modeling, and 2) an efficient and effective search algorithm.
1.4.1 Terrain Modeling
Terrain modeling is the preprocessing phase of path planning. For simple cases
like polygonal forbidden regions in a flat plane, an exact representation of the world
can be used. Any method of this type needs a special data structure to store the
obstacle information (e.g., vertices and edges). In the literature, methods that establish an exact
representation of the world include the visibility graph, the Voronoi diagram, and
triangulation. Despite its accuracy, this class of methods has little practical application. The more
flexible mapping technique is space discretization, which appears extensively in the
development of real-time strategy computer games and recent path planning systems.
The central idea is to decompose the world into mutually exclusive cells regardless of the
obstacles and the area types. For each cell, the reachability or traversability is defined
deterministically or even probabilistically. The transition cost from one cell to an
adjacent cell is also defined according to the effort of the agent. For the agent, if a state is
defined as the cell in which the agent is located, then a path from the origin to the goal
can be defined as a sequence of indivisible feasible state transitions from the initial
state to the target state. The most notable advantage is that the obstacles can be
approximated as the union of specifically labeled cells. The higher the grid resolution,
the higher the map accuracy. Another advantage is that a grid graph can be easily
extended if the size of the map needs to be enlarged. This is usually done with an
associated coordinate system. Typically, a cell can be a square, a hexagon, or a triangle.
One can find in the literature that the eight-connected square grid graph is very popular
due to its easy implementation and relatively efficient memory requirements. For more
details on the concepts and algorithms for building a grid network and its coordinate
system, see Grunbaum and Shephard 1986 [18] and Chavey 1989 [19]. An excellent web
source can be found at Patel 2006 [20].
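As a concrete sketch, an eight-connected square grid graph can be built from a terrain matrix as follows. This is a minimal illustration only; the convention of cost 1 for straight moves and sqrt(2) for diagonal moves is one common choice, not necessarily the one used in the rest of this dissertation.

```python
def grid_neighbors(rows, cols, blocked):
    """Adjacency lists for an eight-connected square grid.
    blocked is the set of (row, col) cells the agent may not enter."""
    moves = [(-1, -1), (-1, 0), (-1, 1), (0, -1),
             (0, 1), (1, -1), (1, 0), (1, 1)]
    adj = {}
    for r in range(rows):
        for c in range(cols):
            if (r, c) in blocked:
                continue  # blocked cells are not nodes of the graph
            nbrs = []
            for dr, dc in moves:
                nr, nc = r + dr, c + dc
                if 0 <= nr < rows and 0 <= nc < cols and (nr, nc) not in blocked:
                    # straight moves cost 1, diagonal moves cost sqrt(2)
                    nbrs.append(((nr, nc), 2 ** 0.5 if dr and dc else 1.0))
            adj[(r, c)] = nbrs
    return adj
```

Obstacles are handled simply by adding their cells to the blocked set, which is exactly the "union of specifically labeled cells" approximation described above.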
Under some circumstances, non-uniform grid graphs can be used. For example,
when a flat plane sparsely contains some obstacles, a quadtree (see Samet 1988 [21]) is
more efficient than a regular square grid graph (and also than a visibility graph or a Voronoi
diagram). The reason is that the large empty areas are coded with very low
resolution, hence both the storage and the search scale are reduced. The price is the
considerably increased complexity of the data structure (see Kambhampati and Davis 1986
[22]). Another disadvantage of the quadtree is that a path found with such a map representation
is usually jagged. An improvement is to use a representation called the "framed"
quadtree (see Chen et al. 1997 [23] and Yahja et al. 1998 [24]), which is a modified, and
more complicated, version of the quadtree. In a framed quadtree, cells of the highest resolution
are added along the perimeter of each quadtree region. The indivisible state transition
is redefined as the shift from one cell of the highest resolution to a neighboring cell of the
highest resolution within the same quadtree region. It has been empirically shown (see
[24]) that the path quality can be significantly improved if the quadtree is replaced by its
framed version. However, since the grid graph generated by a framed quadtree can be
much denser (i.e., a node has many more incident arcs) than that generated by the
corresponding quadtree, a path planning search algorithm executed on such a graph can
have much higher time complexity. It has also been empirically shown (see [24]) that the
framed quadtree is usually not advantageous over the regular square grid graph when the
environment is uniformly, highly cluttered.
1.4.2 Search Algorithms
Although Dijkstra's algorithm dominates the early path planning literature, recent
favor has been given to the A* algorithm (see Hart et al. 1968 [25], Nilsson 1980 [26],
Pearl 1984 [27], Russell 2003 [28], Lester 2005 [29], and Patel 2006 [30]). Unlike
Dijkstra's search, which is blind, the A* search is informed. But the A* search is applicable
only if there exists a heuristic estimate of the "distance" from every node of the (directed)
graph to the target node. Provably, the final shortest path tree constructed by the A*
algorithm that uses a so-called consistent heuristic is smaller than that constructed by
Dijkstra's algorithm. Empirically, the A* algorithm is much more efficient than
Dijkstra's algorithm for finding a least-cost path from an origin to a goal in a graph that is
embedded in a Euclidean space. Although there is a computational cost for evaluating the
heuristic, the benefit brought by the "informed" search outweighs it. In general, the time
complexity of the A* algorithm depends on the heuristic. A good heuristic has a twofold
meaning: first, it estimates the distance from every node of the graph to the target node
well and the estimate satisfies the triangle inequality; second, it is not expensive to
evaluate. A heuristic that is better in these two senses leads to less search effort.
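A minimal sketch of the A* search might look as follows (illustrative only; h is assumed to be a consistent heuristic, e.g., the Euclidean distance to the goal for a graph embedded in the plane):

```python
import heapq
import math

def astar(adj, h, start, goal):
    """A* search: like Dijkstra's algorithm, but the priority queue is
    ordered by f = g + h, steering the search toward the goal.
    adj maps each node to a list of (neighbor, arc_cost) pairs."""
    g, prev, closed = {start: 0.0}, {}, set()
    pq = [(h(start), start)]
    while pq:
        _, u = heapq.heappop(pq)
        if u in closed:
            continue
        if u == goal:                       # reconstruct the path
            path = [u]
            while path[-1] != start:
                path.append(prev[path[-1]])
            return path[::-1], g[u]
        closed.add(u)
        for v, c in adj[u]:
            ng = g[u] + c
            if ng < g.get(v, math.inf):
                g[v], prev[v] = ng, u
                heapq.heappush(pq, (ng + h(v), v))
    return None, math.inf                   # goal unreachable
```

With h identically zero the search degenerates to Dijkstra's algorithm; with a consistent h, each node is expanded at most once, which is why the resulting search tree is never larger than Dijkstra's.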
Like Dijkstra's algorithm, the main limitation of the A* algorithm is its memory
requirement. This problem is serious when the graph is very large and the distance
between the origin and the goal is long. Several representative memory-bounded variants
of the A* algorithm are IDA* (Korf 1985 [31]), MA* (Chakrabarti et al.
1989 [32]), SMA* (Russell 1992 [33]), and RBFS (Korf 1993 [34]). They were
mainly designed to avoid exponential storage growth in game tree search (see [26], [27],
[28]). A very recent memory-saving variant of the A* algorithm is called Frontier A* (see
Korf 2005 [35]). This algorithm works for sparse graphs and, after one run, returns only the
length of the shortest path (not the path itself). To find the solution path, a
divide-and-conquer technique (also see [35]) is required, hence repeated A* searches of
decreasing scale. It is advised that for problems of moderate scale, the A* algorithm
is still the best choice as long as a good heuristic can be found.
When the environment of path planning is dynamic, the A* algorithm can be
implemented in its replanning (or dynamic) version, that is, finding a new
shortest path given the updated knowledge of the world. Extensive research effort has
been put into the replanning problem where the target is fixed. Zelinsky 1992 [36]
adopted brute-force A* replanning, in which a shortest path from where the agent is to
the goal is found from scratch. Stentz 1994 [37] pointed out that reusing the information
gained by previous searches may improve the replanning efficiency when the
environment is expansive, the goal is far away, and the map update is very local, around
the agent's location. The D* algorithm [37] was designed based on this point. A later
improved version is called the Focused D* algorithm (see Stentz 1995 [38]). Experiments
on partially known or unknown fractally generated large terrains have shown that D*
replanning is far more efficient than brute-force A* replanning. Besides the D*
algorithm, there is another functionally equivalent but algorithmically different replanning
algorithm called D* Lite (see Koenig and Likhachev 2002 [39]), which is the "reversed"
version of an earlier algorithm called LPA* (see Koenig and Likhachev 2002 [40]). The
LPA* algorithm maintains a shortest path from the starting node to the target node in the
graph. It was developed from another algorithm called DynamicSWSF-FP (see
Ramalingam and Reps 1996 [41]), which maintains shortest paths from a single source
node to all the other nodes in the graph by processing the so-called inconsistent node list
in the right order. Which replanning algorithm to choose for a replanning problem should
be based on the specific features of the problem.
1.5 Contributions of This Dissertation Research
The main contributions of this dissertation research are a new formulation of the RDP problem built on the new concept of mark information, and new RDP algorithms. In addition, for the minefield application, a fast, flexible RDP simulation program based on dynamic A* search was delivered.
The theoretical contributions mainly include:
1) We found a new explanation of the A* algorithm based on the primal-dual framework.
More specifically, we have shown that if a consistent heuristic function is available,
then a special initial feasible solution to the dual model of the shortest path problem
can be constructed such that the primal-dual algorithm, with proper implementation,
becomes the A* algorithm.
2) We developed the concept of sensor and the new concept of mark information based
on the sensor’s readings. We proposed the threshold policy and the penalty policy,
both of which incorporate the markers and the disambiguation cost into the planning
and replanning.
3) We proved that the CR policy, which is a special penalty policy that uses the CR
weight function, is an optimal policy in the sense of smallest expected cost for
traversing the probabilistic parallel graph that has an independent probability marker
for each nondeterministic arc.
4) We proved that for the parallel graph with its nondeterministic arcs independently
marked, the threshold policy is weakly monotone with respect to sensor. We also
proved that for any convergent graph with a single nondeterministic arc, both the
threshold policy and the penalty policy are strongly monotone with respect to sensor.
The experimental contributions mainly include:
1) We developed an RDP simulation program that simulates an agent traversing a minefield. The current version, RDP V2.2, assumes a fixed target location; an extended version under development, RDP V2.2.1, assumes a target region. The A* algorithm, using the Euclidean distance as the natural (consistent) heuristic, constitutes the central routine of the programs. The weight function is CR, and the A* algorithm is implemented in its best-first search version with the Open list maintained as a binary heap.
2) We performed extensive Monte Carlo simulations to study the sensor monotonicity in the minefield model. We found numerical and statistical evidence of the sensor monotonicity from both the conditional experiments and the unconditional experiments. We also found, from the empirical distribution of the large samples, that an adjusted CR policy, applied to a general minefield setting, is both weakly monotone and strongly monotone.
3) We performed extensive Monte Carlo simulations to study the distribution of the deterministic shortest path in the minefield model. From the unconditional experiments, we found that the adjusted CR policy yields a higher average cost than the average length of the deterministic shortest paths when the quality of the sensor is poor. This phenomenon implies a trade-off between the sensor quality and the deterministic shortest path, which should be quantified with some critical value(s) of the sensor's parameter(s).
1.6 Organization of the Thesis
The rest of this thesis is organized as follows:
Chapter 2 introduces the A* algorithm and presents a new derivation of it, using a consistent heuristic, from the primal-dual algorithm for linear programming (LP). We also explain how the A* iterations improve the dual objective of the LP model of the shortest path problem and discuss, from the primal-dual point of view, various heuristics and strategies used in the A* search.
Chapter 3 focuses on the CR policy for traversing probabilistic graphs. Special emphasis is given to a theorem on the optimality of the CR policy for traversing the probabilistic parallel graph that has independent probability markers for its nondeterministic arcs.
Chapter 4 is on the sensor and mark information. We introduce the concept of sensor and the new concept of mark information based on the sensor's readings, the important concept of sensor monotonicity, and the threshold policy and the penalty policy. We also present some analytical monotonicity results under simple settings.
Chapter 5 is on the minefield model and the sensor monotonicity. We introduce a new formulation of the minefield model and present an adjusted CR policy that is specifically designed for the minefield application. We graphically demonstrate running cases of the RDP simulation program and present Monte Carlo simulation results supporting the weak monotonicity and strong monotonicity from both the conditional experiments and the unconditional experiments.
Chapter 6 is on the deterministic shortest path in the minefield. Based on the Monte Carlo simulations, we present the comparison between the length of the deterministic shortest path and the cost of nondeterministic traversal under the adjusted CR policy. We also suggest how to incorporate the critical value(s) of the sensor parameter(s) into the design of a policy like the adjusted CR.
Chapter 7 presents summary, conclusions, and suggestions for future research.
2 The A* Algorithm
The A* algorithm is the core of our RDP simulation program. It can be used to find a
shortest path from a starting node to a target node in a positively weighted graph. With a
consistent heuristic, the A* algorithm expands a shortest path tree that is rooted at the
starting node, node by node, favorably toward the target node. The more precise the
heuristic estimate of the distance from every node to the target node, the smaller the final
shortest path tree that covers the target node. In this chapter, we introduce the A*
algorithm from the primal-dual point of view. We first set up the problem domain and introduce the A* algorithm and the primal-dual algorithm; we then use the heuristic to construct an initial feasible solution to the dual and propose a best-first search (see [27]) version of the primal-dual algorithm; we show that this version of the primal-dual algorithm behaves essentially the same as the A* algorithm that uses the same heuristic; finally, we present some interesting implications of this result.
2.1 Best-First Search
As a popular best-first search method, the A* algorithm maintains two node lists throughout. One is the Open list, which consists of those nodes that are temporarily labeled with estimates of their distances from the starting node; the other is the Closed list, which consists of those nodes that are permanently labeled with their exact distances from the starting node. We now set up the problem domain and give a description of the algorithm.
We consider a directed, positively weighted simple graph denoted as G = (V, A, W, δ, b),
where V is the set of nodes, A is the set of arcs, W: A → R is the weight function, δ > 0 is
a constant such that δ ≤ W(a) < +∞ for all a ∈ A, and finally b > 0 is a constant integer
such that |{v | (u, v) ∈ A or (v, u) ∈ A}| ≤ b for all u ∈ V. Suppose we want to find a
shortest s-t (directed) path in G, where s ∈ V is a specified starting node and t ∈ V is a
specified terminal node. Further suppose that there exists a heuristic function h: V→ R
such that h(v) ≥ 0 for all v ∈ V, h(t) = 0, and W(u, v) + h(v) ≥ h(u) for all (u, v) ∈ A. Such
h is called a consistent heuristic. According to [25], [26], [27], the A* algorithm that uses such an h is complete; that is, it can find a shortest s-t path in G as long as an s-t path exists in G. The algorithm, which searches from s to t, can be stated as follows.
The A* Algorithm
Notations:
h: heuristic
O: Open list
E: Closed list
d: distance label
f: node selection key
pred: predecessor
Steps:
Given G, s, t, and h
Step 1. Set O = {s}, d(s) = 0, and E = φ.
Step 2. If O = φ and t ∉ E, then stop (there is no s-t path); otherwise, continue.
Step 3. Find u = arg min_{v∈O} f(v), where f(v) = d(v) + h(v). Set O = O \ {u} and E = E ∪{u}. If t ∈ E, then stop (a shortest s-t path is found); otherwise, continue.
Step 4. For each node v ∈ V such that (u, v) ∈ A and v ∉ E,
if v ∉ O, then
set O = O ∪{v}, d(v) = d(u) + W(u, v), and pred(v) = u;
otherwise,
if d(v) > d(u) + W(u, v), then
set d(v) = d(u) + W(u, v) and pred(v) = u.
Go to Step 2.
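The steps above can be transcribed into a short Python sketch (an illustration only, not the dissertation's RDP implementation; the Open list is kept as a binary heap, and the graph, weight, and heuristic arguments are placeholders supplied by the caller):

```python
import heapq

def a_star(succ, W, h, s, t):
    """A* per Steps 1-4: succ(u) yields neighbors of u, W(u, v) > 0, h consistent."""
    d = {s: 0.0}             # distance labels
    pred = {s: None}         # predecessor pointers
    closed = set()           # Closed list E
    open_heap = [(h(s), s)]  # Open list keyed by f(v) = d(v) + h(v)
    while open_heap:
        f_u, u = heapq.heappop(open_heap)
        if u in closed:      # stale heap entry left over from a label update
            continue
        closed.add(u)
        if u == t:           # Step 3: stop once t is closed
            path = []
            while u is not None:
                path.append(u)
                u = pred[u]
            return d[t], path[::-1]
        for v in succ(u):    # Step 4: relax the outgoing arcs of u
            if v in closed:
                continue
            dv = d[u] + W(u, v)
            if v not in d or dv < d[v]:
                d[v] = dv
                pred[v] = u
                heapq.heappush(open_heap, (dv + h(v), v))
    return None              # Step 2: no s-t path
```

With h = 0 this sketch behaves as the Dijkstra's algorithm, matching the remark below.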
In particular, when h = 0, the A* algorithm stated above reduces to the Dijkstra’s
algorithm. For convenience, for any two nodes u ∈ V and v ∈ V, let dist(u, v) denote the
distance from u to v in G. That is, if there is no u-v path in G, we define dist(u, v) = +∞;
otherwise, we define dist(u, v) to be the length of a shortest u-v path in G. According to
[25], [27], a central property, called strong optimality, of the A* algorithm stated above is
d(u) = dist(s, u) when u ∈ E.
Given a consistent heuristic h, we can define a new weight function Wh such that Wh(u, v) = W(u, v) + h(v) – h(u) for all (u, v) ∈ A. This change of weights results in a new graph Gh = (V, A, Wh, δ, b). It is known from [6] that running the Dijkstra's algorithm to find a shortest s-t path in Gh is equivalent to running the A* algorithm stated above to find a shortest s-t path in G if the two algorithms apply the same tie-breaking rule. The equivalence is due to the fact that the two algorithms construct identical shortest path trees rooted at s, although the distance labels of the same node differ. The equivalence implies that the two algorithms can be derived from each other.
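The reweighting Wh is easy to check numerically. The sketch below (a toy illustration with made-up graph and heuristic values) verifies that Wh is nonnegative when h is consistent, and that the length of any s-t path changes by the constant h(t) − h(s), so shortest paths are preserved:

```python
def reweight(W, h):
    """Return the reduced weight function Wh(u, v) = W(u, v) + h(v) - h(u)."""
    return lambda u, v: W(u, v) + h(v) - h(u)

def path_length(W, path):
    """Sum the weights along a node sequence."""
    return sum(W(u, v) for u, v in zip(path, path[1:]))
```

By telescoping, path_length(Wh, P) = path_length(W, P) + h(t) − h(s) for every s-t path P, so the additive constant cancels out when comparing paths.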
2.2 Primal-Dual
Now consider modeling the shortest path problem as a linear program (LP). For convenience, we define G̃ = (V, Ã, W̃, δ, b), where Ã = {(u, v) | (v, u) ∈ A} and W̃(u, v) = W(v, u) for all (u, v) ∈ Ã; that is, G̃ is formed by reversing the directions of all the arcs of G. Clearly, finding a shortest s-t path in G is equivalent to finding a shortest t-s path in G̃. For each (u, v) ∈ Ã, let x(u, v) denote the decision variable. A primal LP model for finding a shortest t-s path in G̃ is

(P)   Min Σ_{(u,v)∈Ã} W̃(u, v)·x(u, v)   (2.2.1)

Subject to

Σ_{v:(u,v)∈Ã} x(u, v) − Σ_{v:(v,u)∈Ã} x(v, u) = 1 if u = t; −1 if u = s; 0 for all u ∈ V \ {s, t},   (2.2.2)

x(u, v) ≥ 0 for all (u, v) ∈ Ã.   (2.2.3)

As long as there exists an s-t path in G, it can be easily shown that a binary optimal solution to Model (2.2.1-2.2.3) exists. In fact, Model (2.2.1-2.2.3) simply sends a unit flow from a supplier t to a customer s in G̃ with least cost, where the price of sending a unit flow along any (u, v) ∈ Ã is W̃(u, v). One option is to find a shortest t-s path in G̃ and send a unit flow along this path. The general option is to divide the unit flow into pieces; however, to minimize the cost, each piece must be sent along a shortest t-s path in G̃. This backward version of the primal LP model has a very nice dual, which can be expressed with respect to G. It is stated as
(D)   Max π(t) − π(s)   (2.2.4)

Subject to

π(v) − π(u) ≤ W(u, v) for all (u, v) ∈ A,   (2.2.5)

where for each v ∈ V, the decision variable π(v) is called the potential of v. Constraint (2.2.5) can be derived from its original form π(u) − π(v) ≤ W̃(u, v) for all (u, v) ∈ Ã. The constraint says that for each (u, v) ∈ A, a triangle inequality relative to s holds.

An obvious advantage of (D) is that a feasible solution is easy to find; at the least, π = 0 is one. The key idea of the primal-dual algorithm for the shortest path problem, illustrated in [42], is to start from a feasible solution π to (D) and search for a feasible solution x to (P) such that for each (u, v) ∈ A, x(u, v) = 0 whenever W(u, v) − π(v) + π(u) > 0. If such an x is found, then a shortest s-t path in G can be found; in fact, such an x corresponds to an s-t path on which every arc (u, v) satisfies the equality W(u, v) − π(v) + π(u) = 0. If such an x cannot be found, then some procedure is needed to update π such that Constraint (2.2.5) remains satisfied and Objective (2.2.4) improves. An important feature of the primal-dual algorithm is that any equality in Constraint (2.2.5) still holds after π is updated. Another important feature is that after π is updated, some strict inequality in Constraint (2.2.5) may become an equality. The primal-dual algorithm keeps attempting to
construct an s-t path in G by using the arcs that correspond to the equalities in Constraint (2.2.5). According to [42], given the initial feasible solution π = 0 to (D), the primal-dual algorithm behaves essentially the same as the Dijkstra's algorithm that searches from s to t in G. Hence the Dijkstra's algorithm can be derived from the primal-dual algorithm. Since the A* algorithm with a consistent heuristic can be derived from the Dijkstra's algorithm, and the Dijkstra's algorithm can be derived from the primal-dual algorithm, the A* algorithm that uses a consistent heuristic can be derived from the primal-dual algorithm. But this derivation needs the Dijkstra's algorithm as a bridge and involves changing the weight function. In this thesis, we show that if we use h to construct an initial feasible solution to (D), then applying the primal-dual algorithm directly leads to the A* algorithm that searches from s to t in G.
2.3 Derivation of the A*
The key point of our derivation is to choose π(0) = − h as the initial feasible solution to
(D). To justify the dual feasibility of π(0), we notice, by the consistency of h, that W(u, v)
+ h(v) ≥ h(u) for all (u, v) ∈ A. Hence W(u, v) − π(0)(v) ≥ −π(0)(u) for all (u, v) ∈ A. The
inequality can be rewritten as π(0)(v) − π(0)(u) ≤ W(u, v), which is exactly what the dual
feasibility requires.
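Checking the dual feasibility of π(0) = −h is exactly an arc-by-arc consistency check on h. A minimal sketch (the toy graph and heuristic values below are made-up):

```python
def is_dual_feasible(arcs, W, pi, tol=1e-12):
    """Check Constraint (2.2.5): pi(v) - pi(u) <= W(u, v) on every arc (u, v)."""
    return all(pi(v) - pi(u) <= W(u, v) + tol for (u, v) in arcs)
```

Calling it with pi = lambda v: -h(v) tests whether h is consistent, since the two conditions are equivalent as shown above.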
A nice property of (D) is that it does not require its solution to be nonnegative. Although π(0) = − h ≤ 0, what really matters is π(0)(t) − π(0)(s) = − h(t) + h(s) = h(s) ≥ 0. This means π(0) = − h is at least as good an initial feasible solution to (D) as π(0) = 0. But we still need to justify the validity of π(0) = − h; that is, we still need to show that the primal-dual algorithm that starts from the solution π(0) = − h to (D) can find a shortest s-t path in G as long as there exists an s-t path in G. It suffices to show the equivalence between the primal-dual algorithm that starts from − h and the A* algorithm that uses h. We now describe the best-first search version of the primal-dual algorithm that starts from − h.
Algorithm 2.3.1
Notations:
h: heuristic
O: Open list
E: Closed list
π : potential
f1: node selection key
pred: predecessor
θ: potential increment
Φ: cumulative potential increase
Steps:
Given G, s, t, and h
Step 1. Set Φ = 0. Set O = {s}, π(s) = −h(s), pred(s) = s, and E = φ. Set W(s, s) = 0.
Step 2. If O = φ and t ∉ E, then stop (there is no s-t path); otherwise, continue.
Step 3. Find u = arg min_{v∈O} f1(v), where f1(v) = W(pred(v), v) − π(v) + π(pred(v)). Set θ = W(pred(u), u) − π(u) + π(pred(u)). Set Φ = Φ + θ. Set O = O \ {u} and E = E ∪{u}. Set π(u) = −h(u) + Φ. If t ∈ E, then stop (a shortest s-t path is found); otherwise, continue.
Step 4. For each v ∈ O, set π(v) = −h(v) + Φ.
Step 5. For each v ∈ V such that (u, v) ∈ A and v ∉ E,
if v ∉ O, then
set O = O ∪{v}, pred(v) = u, and π(v) = −h(v) + Φ;
otherwise,
if W(pred(v), v) + π(pred(v)) > W(u, v) + π(u), then
set pred(v) = u.
Go to Step 2.
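A direct transcription of Algorithm 2.3.1 follows (an illustrative sketch only; for simplicity the Open list O is scanned linearly in Step 3 rather than kept in a heap, and the graph arguments are placeholders). By Theorem 2.3.2 below, the returned value π(t) − π(s) equals dist(s, t):

```python
def primal_dual_search(succ, W, h, s, t):
    """Algorithm 2.3.1: primal-dual search starting from pi(0) = -h."""
    weight = lambda u, v: 0.0 if (u, v) == (s, s) else W(u, v)  # Step 1: W(s, s) = 0
    Phi = 0.0                      # cumulative potential increase
    pi = {s: -h(s)}
    pred = {s: s}
    O, E = {s}, set()
    while O:                       # Step 2
        # Step 3: select u minimizing f1(v) = W(pred(v), v) - pi(v) + pi(pred(v))
        u = min(O, key=lambda v: weight(pred[v], v) - pi[v] + pi[pred[v]])
        theta = weight(pred[u], u) - pi[u] + pi[pred[u]]
        Phi += theta
        O.remove(u)
        E.add(u)
        pi[u] = -h(u) + Phi        # the potential of u becomes permanent
        if u == t:
            return pi[t] - pi[s]   # equals dist(s, t) by Theorem 2.3.2
        for v in O:                # Step 4: raise the open potentials
            pi[v] = -h(v) + Phi
        for v in succ(u):          # Step 5: scan the outgoing arcs of u
            if v in E:
                continue
            if v not in O:
                O.add(v)
                pred[v] = u
                pi[v] = -h(v) + Phi
            elif weight(pred[v], v) + pi[pred[v]] > weight(u, v) + pi[u]:
                pred[v] = u
    return None                    # Step 2: no s-t path
```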
For theoretical convenience, right after Step 5, for all v ∈ V \ (E∪O), we define π(v) = −h(v) + Φ. We can show that Algorithm 2.3.1, just like the classical version [2] of the primal-dual algorithm, maintains the dual feasibility throughout.
Theorem 2.3.1. The dual feasibility stated as Constraint (2.2.5) is maintained when
running Algorithm 2.3.1.
Proof. The proof is inductive. The base case is upon the completion of the first iteration; at this moment π = π(0), so the claim is trivially true. Suppose that right before the k-th iteration (k > 1), the potential of any v ∈ V is π(v) and π satisfies the dual feasibility. We need to show that right after the k-th iteration, the dual feasibility is still maintained. Right before the k-th iteration, let [E, O] denote the E-O cut, which is the set of arcs from E to O. We only need to show that the node selection rule in Step 3 of Algorithm 2.3.1 is equivalent to finding an arc (u, v) ∈ [E, O] such that W(u, v) − π(v) + π(u) is minimum. In fact,

min_{(u,v)∈[E,O]} [W(u, v) − π(v) + π(u)]
= min_{v∈O} min_{u∈E: (u,v)∈[E,O]} [W(u, v) − π(v) + π(u)]
= min_{v∈O} [ min_{u∈E: (u,v)∈[E,O]} [W(u, v) + π(u)] − π(v) ]
= min_{v∈O} [W(pred(v), v) + π(pred(v)) − π(v)].

Hence, the node selection rule in Step 3 of Algorithm 2.3.1 is equivalent to the arc selection rule in the classical version of the primal-dual algorithm. Since the latter guarantees the satisfaction of the dual feasibility right after the k-th iteration, the theorem is true.
The potential π in Algorithm 2.3.1 is meaningful: it is closely related to the distance labels of the nodes of G in the A* algorithm. The potentials of those nodes that have entered the Closed list E become permanent. We can show that the potential difference between any u ∈ E and s is actually the length of a shortest s-u path in G.
Theorem 2.3.2. After each iteration of Algorithm 2.3.1, π(u) − π(s) = dist(s, u) for any
node u ∈ E.
Proof. When node u enters E, an s-u pointer path, say P: v1 (= s) ~ v2 ~ … ~ vk (= u), is determined. Denote L(P) as the length of P. Note that π(v2) = W(v1, v2) + π(v1), …, π(vk) = W(vk−1, vk) + π(vk−1). By telescoping, we have π(vk) = L(P) + π(v1), i.e., π(u) − π(s) = L(P). Since there is an s-u path in G, there must be a shortest s-u path in G: any s-u path in G with length no longer than L(P) has only a finite number of arcs, hence the number of s-u paths in G with length no longer than L(P) is finite. Let P̂: v̂1 (= s) ~ v̂2 ~ … ~ v̂k (= u) be a shortest s-u path in G and denote L(P̂) as the length of P̂. By Theorem 2.3.1, Algorithm 2.3.1 maintains the dual feasibility stated as Constraint (2.2.5). Hence π(v̂2) ≤ W(v̂1, v̂2) + π(v̂1), …, π(v̂k) ≤ W(v̂k−1, v̂k) + π(v̂k−1). By telescoping, we have π(v̂k) ≤ L(P̂) + π(v̂1), i.e., π(u) − π(s) ≤ L(P̂). The two arguments jointly imply that P is in fact a shortest s-u path and π(u) − π(s) = dist(s, u).
We now show the equivalence between Algorithm 2.3.1 and the A* algorithm listed in Section 2.1.
Theorem 2.3.3. Under the same tie-breaking rule, Algorithm 2.3.1 is equivalent to the
A* algorithm that uses h, searching from s to t.
Proof. The proof is inductive. We need to show that right after each corresponding iteration, the two algorithms have the same Open list and Closed list, and for each node in the Open list, the two algorithms assign the same predecessor in the Closed list. The base case is upon the completion of the first iteration of the two algorithms, respectively. In the base case, s is the only node "closed" by the two algorithms, so the base case is trivially true. Suppose (inductive hypothesis) that right before the k-th iteration (k > 1) of the two algorithms, the claim above is true. We now show that the claim still holds right after the k-th iteration.
Firstly, we need to show that the node selection rule in Step 3 of Algorithm 2.3.1 is
equivalent to the node selection rule in the A* algorithm. Consider the moment
Algorithm 2.3.1 is about to enter its Step 3, and at this moment (arbitrarily) consider a node v ∈ O. Note that v must have a predecessor, say u ∈ E. The selection key of v is W(u, v) − π(v) + π(u). Note that

W(u, v) − π(v) + π(u)
= W(u, v) − (−h(v) + Φ) + π(u)
= W(u, v) + π(u) − π(s) + h(v) − Φ + π(s).

By Theorem 2.3.2, we have π(u) − π(s) = dist(s, u). Hence

W(u, v) − π(v) + π(u)
= W(u, v) + dist(s, u) + h(v) − Φ + π(s).

We can see that W(u, v) + dist(s, u) = W(u, v) + d(u) = d(v) and d(v) + h(v) = f(v). Also note that both Φ and π(s) remain the same when different nodes in O are considered. Hence, by the inductive hypothesis, under the same tie-breaking rule, Algorithm 2.3.1 selects the same node from O as the A* algorithm.
Secondly, we need to show that, under the same tie-breaking rule, after a node, say u, is removed from O and put into E in Step 3 of Algorithm 2.3.1, the predecessor update on any node v ∈ O is the same as in the A* algorithm. In fact, if there is no arc (u, v) ∈ A,
there won’t be any predecessor update on v. If there is an arc (u, v) ∈ A, then in
Algorithm 2.3.1, the update is based on comparing W(pred(v), v) + π(pred(v)) with W(u,
v) + π(u). By Theorem 2.3.2 again,
W(pred(v), v) + π(pred(v))
= W(pred(v), v) + π(pred(v)) − π(s) + π(s)
= W(pred(v), v) + dist(s, pred(v)) + π(s)
and
W(u, v) + π(u)
= W(u, v) + π(u) − π(s) + π(s)
= W(u, v) + dist(s, u) + π(s).
Note that π(s) is common, hence the comparison is actually between W(pred(v), v) +
dist(s, pred(v)) = d(v) and W(u, v) + dist(s, u) = W(u, v) + d(u). Under the same
tie-breaking rule, this is just the predecessor update rule in the A* algorithm.
Finally, note that if there is a node v ∈ V \ (E∪O) such that (u, v) ∈ A, then Algorithm
2.3.1 will put it into O and assign it a predecessor u. Hence W(pred(v), v) + π(pred(v)) =
W(u, v) + π(u) = W(u, v) + dist(s, u) + π(s), which implies that v receives a distance label
d(v) = W(u, v) + dist(s, u). This is just what the A* algorithm does.
Combining the three arguments above, we have shown that the two algorithms maintain the same Open list and Closed list, and for each node in the Open list, they assign the same predecessor in the Closed list.
2.4 Duality
There is a nice property of the A* algorithm that uses a consistent heuristic. For the A* algorithm listed at the beginning of this chapter, suppose a node u1 is closed no later than another node u2; then according to [25], [27], f(u1) ≤ f(u2). This property is called key monotonicity. By key monotonicity, before t is closed, f(u) ≤ dist(s, t) for any closed node u. By combining the key monotonicity property of the A* algorithm and Algorithm 2.3.1, we have the following interesting result:

Theorem 2.4.1. After each iteration of Algorithm 2.3.1, π(t) − π(s) = max_{u∈E} f(u) ≤ dist(s, t).
Proof. The inequality directly follows from the key monotonicity property. We now show the equality. Consider any iteration of Algorithm 2.3.1 and suppose node u is selected in Step 3 during this iteration. Upon the completion of this iteration, by the proof of Theorem 2.3.3, we see that Φ = f(u) + π(s); also note that π(t) = − h(t) + Φ = Φ, hence π(t) − π(s) = f(u). By the key monotonicity property, upon the completion of this iteration, f(u) = max_{u′∈E} f(u′).
If we define d(t) = +∞ when t ∉ E∪O, then π(t) − π(s) ≤ d(t) always holds, because d(t) ≥ dist(s, t) always holds. The inequality π(t) − π(s) ≤ d(t) can be viewed as the weak duality. The duality gap is d(t) − (π(t) − π(s)) = d(t) − max_{u∈E} f(u). As the primal objective, d(t) is monotonically nonincreasing; as the dual objective, π(t) − π(s) = max_{u∈E} f(u) is monotonically nondecreasing. By completeness, if there exists an s-t path in G, then t will eventually be closed. At the moment t is closed, we have max_{u∈E} f(u) = f(t) = d(t). Hence both π(t) − π(s) and d(t) reach their optimal values and the duality gap is eliminated. The final equality is just the so-called strong duality.
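The weak and strong duality can be observed numerically. The sketch below (illustrative only, on a made-up toy graph) runs A*-style iterations and records, after each node is closed, the dual objective max over closed nodes of f(u) (which, by key monotonicity, is the key of the node just closed) and the primal objective d(t); the gap closes exactly when t is closed:

```python
import heapq
import math

def duality_trace(succ, W, h, s, t):
    """Record (dual, primal) = (f of the node just closed, current d(t)) per iteration."""
    d, closed, trace = {s: 0.0}, set(), []
    heap = [(h(s), s)]
    while heap:
        f_u, u = heapq.heappop(heap)
        if u in closed:          # skip stale heap entries
            continue
        closed.add(u)
        trace.append((f_u, d.get(t, math.inf)))  # dual objective, primal objective
        if u == t:               # strong duality: f(t) = d(t) at this point
            return trace
        for v in succ(u):
            dv = d[u] + W(u, v)
            if v not in closed and dv < d.get(v, math.inf):
                d[v] = dv
                heapq.heappush(heap, (dv + h(v), v))
    return trace
```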
Just like the A* algorithm, Algorithm 2.3.1 may terminate at its Step 2. Simple analysis (see [25]) of the A* algorithm shows that termination at Step 2 implies there is no s-t path in G. An explanation within the primal-dual framework is that when Algorithm 2.3.1 terminates at its Step 2, the potentials of all nodes outside E can be raised by the same arbitrary amount without violating Constraint (2.2.5), so the objective (2.2.4) is unbounded. This implies the infeasibility of (P). Also like the A* algorithm, Algorithm 2.3.1 may not terminate at all. This happens when there is no s-t path in G but there are infinitely many nodes connected with s via paths; under this circumstance, the objective (2.2.4) is also unbounded.
2.5 Heuristics
Our analysis so far shows that the selection of π(0) = –h is sound. Actually, –h defines a class of initial feasible solutions to (D). Suppose we have two consistent heuristics h1 and h2, and denote by PDi the primal-dual algorithm starting from –hi, i = 1, 2. By completeness, both PD1 and PD2 will successfully terminate as long as there exists an s-t path in G. If h1(v) > h2(v) for all v ∈ V \ {t} and there exists an s-t path in G, then a dominance theorem (see [25], [26], [27], [28]) on the A* algorithm implies that E1 ⊆ E2, where Ei denotes the final Closed list of PDi, i = 1, 2.
There are two interesting extreme cases. One is π(0) = 0, in which case Algorithm 2.3.1 reduces to the Dijkstra's algorithm; this just indicates the derivation of the Dijkstra's algorithm from the primal-dual algorithm. The other is π(0)(v) = − dist(v, t) for all v ∈ V, in which case Algorithm 2.3.1 closes only the nodes that lie on a shortest s-t path in G; this initial feasible solution to (D) is perfect.
Sometimes there exists some metric function H: V×V → R such that H(u, v) ≥ 0 for all u, v ∈ V, H(v, v) = 0 for all v ∈ V, and W(u, v) + H(v, w) ≥ H(u, w) for all (u, v) ∈ A and all w ∈ V. We then immediately have a consistent heuristic, say hH, defined as hH(v) = H(v, t) for all v ∈ V. Under some conditions, we can also find another consistent heuristic. Suppose we already have a partial solution represented by a shortest path tree T of G̃ rooted at t, found by the Dijkstra's algorithm that searches from t to s in G̃. Let ET and OT denote the Closed list and Open list associated with T. We define

hH,T(v) = min_{τ∈OT} [H(v, τ) + dist(τ, t)]  for all v ∈ V \ ET;
hH,T(v) = dist(v, t)  for all v ∈ ET.   (2.5.1)

It can be easily shown that hH,T is a consistent heuristic and, for any v ∈ V, as an estimate of dist(v, t), hH,T(v) is at least as good as hH(v); that is, hH(v) ≤ hH,T(v) ≤ dist(v, t).

The primal-dual algorithm can start from π(0) = −hH if H is available, or from π(0) = −hH,T if both H and T are available. In the latter case, starting from −hH,T may result in fewer closed nodes upon closing t than starting from −hH, but this does not necessarily mean better efficiency, since evaluating hH,T by (2.5.1) is more costly than evaluating hH. An issue that should be addressed is that the primal-dual algorithm starting from −hH,T may be able to find a solution before the moment t is closed; hence a different termination condition might apply. This is actually related to the bidirectional search discussed later.

Sometimes there also exists another type of heuristic h′ such that h′(v) ≥ 0 for all v ∈ V, h′(s) = 0, and W(u, v) + h′(u) ≥ h′(v) for all (u, v) ∈ A. Such an h′ is called consistent relative to s. Obviously, π(0) = h′ is a feasible solution to (D), and the corresponding objective value of (D) is π(0)(t) − π(0)(s) = h′(t) − h′(s) = h′(t) ≥ 0. It would seem that π(0) = h′ is also a better initial feasible solution to (D) than π(0) = 0; however, the following simple example shows that the primal-dual algorithm that starts from h′ may not terminate at all, even if there exists an s-t path in G.
Figure 2.5.1 shows a simple infinite graph in which we want to find a shortest s-t path. When applying the primal-dual algorithm with π(0) = {h′(s) = 0, h′(u) = 0, h′(t) = 0, and h′(vi) = i for i = 1, 2, …} as the initial feasible solution to (D), the algorithm (Algorithm 2.3.1 with initial node potentials set as this π(0)) will close s first, then v1, then v2, and so on. The node u will never be closed, let alone t. Hence the algorithm cannot successfully terminate.
[Figure 2.5.1 here.]

Figure 2.5.1: An example in which the primal-dual algorithm starting from some π(0) = h′ does not terminate; the length of every arc is 1 and the heuristic function h′ is {h′(s) = 0, h′(u) = 0, h′(t) = 0, and h′(vi) = i for i = 1, 2, …}.
2.6 Bidirectional Search
Although h′ is not a proper choice of the initial feasible solution to (D) for Algorithm
2.3.1 to start from, it can be used for bidirectional search. Consider the primal LP model
with respect to G, in which for each (u, v) ∈ A, we still use x(u, v) to denote the decision
variable:
(P′)   Min Σ_{(u,v)∈A} W(u, v)·x(u, v)   (2.6.1)

Subject to

Σ_{v:(u,v)∈A} x(u, v) − Σ_{v:(v,u)∈A} x(v, u) = 1 if u = s; −1 if u = t; 0 for all u ∈ V \ {s, t},   (2.6.2)

x(u, v) ≥ 0 for all (u, v) ∈ A.   (2.6.3)

This forward version of the primal LP model stands for sending a unit flow from a supplier s to a customer t in G with least cost. It has the following dual:

(D′)   Max π(s) − π(t)   (2.6.4)

Subject to

π(u) − π(v) ≤ W(u, v) for all (u, v) ∈ A.   (2.6.5)

Similar analysis can show that the primal-dual algorithm that uses –h′ as the initial feasible solution to (D′) is essentially the A* algorithm that searches from t to s in G̃
using the heuristic h′. This version of the primal-dual algorithm is exactly the backward
version of Algorithm 2.3.1. If both algorithms are used, searching toward each other, then
a bidirectional A* search can be established. For the backward version of Algorithm 2.3.1,
let O′, π ′ and pred′ denote the corresponding Open list, potential function, and
predecessor function, respectively. When the two search fronts (Open lists) meet, an s-t
path is found, and its length, denoted as L, can be expressed as L = [W(pred(v), v) +
π(pred(v)) − π(s)] + [W(v, pred ′(v)) + π ′(pred ′(v)) − π ′(t)], where v ∈ O∪O′ is the
meeting node. When the search continues, a sequence of lengths, say L1, L2, …, is
generated.
Let L̂ = min{L1, L2, …}, which is the length of the shortest s-t path in G found so far. Since π(t) − π(s) ≤ dist(s, t) ≤ L̂ and π′(s) − π′(t) ≤ dist(s, t) ≤ L̂, we have a termination condition for the bidirectional A* search, expressed via π and π′, as

max{π(t) − π(s), π′(s) − π′(t)} = L̂.   (2.6.6)

This condition is essentially the same as the one that appears in [43], and it can eventually be satisfied.

Again, the bidirectional A* search reduces to the bidirectional Dijkstra's search when h = h′ = 0. An alternative termination condition, according to [35] and Theorem 2.3.2, expressed via π and π′, is

min_{v∈O} [W(pred(v), v) + π(pred(v)) − π(s)] + min_{v′∈O′} [W(v′, pred′(v′)) + π′(pred′(v′)) − π′(t)] = L̂,   (2.6.7)

which can also eventually be satisfied.
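With h = h′ = 0, condition (2.6.7) becomes the familiar bidirectional Dijkstra rule: stop once the sum of the two heaps' minimum keys reaches the best s-t length found so far. A minimal sketch (illustrative only; adj and radj are the forward and reversed adjacency maps supplied by the caller):

```python
import heapq
import math

def bidirectional_dijkstra(adj, radj, s, t):
    """Bidirectional Dijkstra; stops when the two top keys sum to the best length."""
    dist = [{s: 0.0}, {t: 0.0}]       # forward / backward distance labels
    heaps = [[(0.0, s)], [(0.0, t)]]  # forward / backward Open lists
    closed = [set(), set()]
    graphs = [adj, radj]
    best = math.inf                   # length of the best s-t path found so far
    while heaps[0] and heaps[1]:
        # termination rule: condition (2.6.7) specialized to zero heuristics
        if heaps[0][0][0] + heaps[1][0][0] >= best:
            break
        side = 0 if heaps[0][0][0] <= heaps[1][0][0] else 1
        du, u = heapq.heappop(heaps[side])
        if u in closed[side]:         # stale heap entry
            continue
        closed[side].add(u)
        for v, w in graphs[side][u].items():
            dv = du + w
            if dv < dist[side].get(v, math.inf):
                dist[side][v] = dv
                heapq.heappush(heaps[side], (dv, v))
            if v in dist[1 - side]:   # fronts meet: candidate s-t length
                best = min(best, du + w + dist[1 - side][v])
    return best
```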
3 Traversing Probabilistic Graphs
Early efforts formulated the RDP problem as traversing a probabilistic graph, with the objective of finding a policy that has the smallest expected cost. This formulation is known as the Canadian Traveler Problem (CTP) (see Bar-Noy and Schieber 1991 [44]).
Papadimitriou and Yannakakis 1991 [45] proved the intractability of several variants of
CTP. Several modifications and extensions of CTP were discussed in [44], [46], and [47].
CTP is a special case of the stochastic shortest paths with recourse (SPR) problem of
Andreatta and Romeo 1988 [48], who presented a stochastic dynamic programming
formulation for SPR and noted its intractability. Polychronopoulos and Tsitsiklis 1996
[49] also presented a stochastic dynamic programming formulation for SPR and then
proved the intractability of several variants. Provan 2003 [50] proved that SPR is
intractable even if the underlying graph is directed and acyclic.
Although finding an optimal policy in the sense of the smallest expected cost is in general intractable, for some special graphs the problem is tractable, i.e., there exists an algorithm that can find an optimal policy in polynomial time. The aim of this chapter is to present the CR policy and show that it is an optimal policy for traversing a special probabilistic graph, the parallel graph, under the assumption that the traversability status of each nondeterministic arc is independently Bernoulli-distributed.
3.1 Probability Markers
A probabilistic graph is a finite directed graph denoted as G = (V, A, B, l, c, ρ), where V
is the set of nodes that contains a specified starting node s and a specified target node t, A
is the set of deterministic arcs, B is the set of nondeterministic arcs, l: A∪B → R+ is the
length function, c: B → R+ is the disambiguation cost function, ρ: B → (0, 1) is the
probability marker function such that for each e ∈ B, ρ(e) represents the probability that
e is not traversable. Without loss of generality, we can assume that l(a) < +∞ and c(a) < +∞ for all a ∈ A∪B.
For convenience we define X: A∪B → {0, 1} as an indicator function such that in any
realization of G, for each a ∈ A∪B, X(a) = 1 if a is nontraversable in this realization; X(a)
= 0 otherwise. Note that X(a) is deterministic for all a ∈ A and X(e) is random for all e ∈
B. We assume that for each e ∈ B, X(e) is independently Bernoulli(ρ(e))-distributed: P(X(e) = 1) = ρ(e) and P(X(e) = 0) = 1 − ρ(e).
As required by the early formulation of the RDP problem, it is also assumed that for each e ∈ B, the probability marker ρ(e) is known a priori. The probability marker might be
empirically estimated from historical data. Each time the agent traverses the graph, it actually faces a realization of the graph; that is, the indicator X is realized. As mentioned before, the agent does not have the information of the realized X; it only has the probabilistic information ρ a priori. But the agent can disambiguate the status of any e ∈ B once it reaches the tail of e. The disambiguation of an arc e ∈ B updates the graph G by transferring e from the set B to the set A, and in the meantime the value of X(e) is found. Hence the information on the graph G is dynamic. For convenience, we define the function ρ+: A∪B → R as an extension of ρ:

ρ+(a) = 0 if X(a) = 0; 1 if X(a) = 1; ρ(a) if a ∈ B.    (3.1.1)

We define the function l+: A∪B → R as an extension of l:

l+(a) = l(a) if 0 ≤ ρ+(a) < 1; +∞ if ρ+(a) = 1.    (3.1.2)

We define the function c+: A∪B → R as an extension of c:

c+(a) = c(a) if 0 < ρ+(a) < 1; 0 otherwise.    (3.1.3)

The tuple (ρ+, l+, c+) represents the updated knowledge of the graph G.
3.2 The CR Policy
We now introduce the CR policy that forms the core of [3]. The CR policy uses the CR
weight function in its shortest path subproblem. Under the setting of section 3.1, the CR
weight function WCR: A∪B → R+ is defined as

WCR(a) = l+(a) + c+(a) / (1 − ρ+(a)),    (3.2.1)

for all a ∈ A∪B. Note that the extended probability marker ρ+ fully characterizes the uncertainty status of the graph G (i.e., what is known deterministically and what is known only probabilistically). This knowledge, together with the extended length function l+ and the extended disambiguation cost function c+, is incorporated into the setting of the shortest path subproblem. We call a weight function W well posed with respect to the knowledge of the graph G if

W(a) = l(a) if X(a) = 0; W(a) = +∞ if X(a) = 1; l(a) < W(a) < +∞ if a ∈ B.    (3.2.2)

By inspection, for any a ∈ A∪B, WCR(a) = l(a) if ρ+(a) = 0; WCR(a) = +∞ if ρ+(a) = 1; and l(a) < WCR(a) = l(a) + c(a) / (1 − ρ(a)) < +∞ if 0 < ρ+(a) < 1. Hence, the CR weight function is well posed. In finding a shortest path in G, it prohibits any arc that is known to be nontraversable and penalizes any arc that is still nondeterministic.
With the CR weight function, the CR policy can be stated as follows: under the knowledge (ρ+, l+, c+) of the graph G, find a shortest path relative to the CR weight function (3.2.1) in G from the agent's current location to the target node t, and let the agent follow the shortest path plan until it reaches t or encounters a nondeterministic arc. In the former case, the navigation process completes successfully; in the latter case, the agent disambiguates the nondeterministic arc, say e. Upon completion of the disambiguation, e is transferred from the set B to the set A and the value of X(e) is found. If X(e) = 0, then the agent moves on along the planned path; otherwise, find a shortest path relative to the updated CR weight function (3.2.1) in G from the agent's current location to the target node t and let the agent follow the new shortest path plan. In searching for the shortest paths, the tie-breaking favors the deterministic arcs.
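The plan, move, disambiguate, replan loop described above can be sketched in Python. This is a minimal illustration rather than the dissertation's implementation: it uses a plain Dijkstra planner instead of A*, omits the deterministic-arc tie-breaking, assumes a simple digraph encoded as a dict keyed by (tail, head) pairs (so true parallel arcs need auxiliary nodes), and the function names are invented for this sketch.

```python
import heapq

def dijkstra(nodes, weight, s, t):
    # Plain Dijkstra shortest path; assumes t is reachable under finite weights
    # (e.g., G is convergent with respect to t).
    dist = {v: float('inf') for v in nodes}
    pred = {}
    dist[s] = 0.0
    pq = [(0.0, s)]
    while pq:
        d, u = heapq.heappop(pq)
        if d > dist[u]:
            continue
        for v, w in weight.get(u, {}).items():
            if d + w < dist[v]:
                dist[v], pred[v] = d + w, u
                heapq.heappush(pq, (d + w, v))
    path, v = [t], t
    while v != s:
        v = pred[v]
        path.append(v)
    return path[::-1]

def cr_navigate(nodes, arcs, s, t, X):
    # arcs: {(u, v): (length, cost, rho)}, with rho=None for deterministic arcs.
    # X: the realized indicator (1 = nontraversable), revealed on disambiguation.
    B = {a for a, (_, _, r) in arcs.items() if r is not None}
    total, u = 0.0, s
    while u != t:
        # CR weights under current knowledge (rho+ becomes 0 or 1 once disambiguated).
        w = {}
        for (a, b), (l, c, r) in arcs.items():
            if (a, b) in B:
                wt = l + c / (1.0 - r)   # still nondeterministic: penalized
            elif X[(a, b)]:
                wt = float('inf')        # known nontraversable: prohibited
            else:
                wt = l                   # known traversable
            w.setdefault(a, {})[b] = wt
        path = dijkstra(nodes, w, u, t)
        for a, b in zip(path, path[1:]):
            if (a, b) in B:
                total += arcs[(a, b)][1]  # pay the disambiguation cost
                B.discard((a, b))
                if X[(a, b)]:
                    break                 # blocked: replan from the current node
            total += arcs[(a, b)][0]
            u = b
    return total
```

Following Theorem 3.2.1, the loop only replans when a disambiguation discloses a blocked arc; a traversable result lets the agent continue on the already-planned path.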
There are two immediate questions to answer:

1) As the CR policy states, if the disambiguation of some arc e ∈ B discloses that X(e) = 0, then the agent takes the arc e and moves on. The question is: does replanning a new shortest path from where the agent is to t, based on the accrued disambiguation results, yield a strictly shorter path than the original plan?
2) Under the CR policy, can the agent always reach the target t?
The answer to the first question, under the setting of section 3.1 and the definition of the
CR weight function (3.2.1), is "no". That is, as long as a disambiguation says it is OK to go, there is no need to replan a new shortest path. The argument is summarized in the following theorem:
Theorem 3.2.1. Under the setting of section 3.1 and using the CR weight function
defined as (3.2.1), it’s unnecessary to replan a new shortest path from where the agent is
to t upon the moment when the agent completes a disambiguation that discloses the next
arc to be traversable.
Proof. Suppose the agent is at u and the next arc e = (u, v) ∈ B but X(e) = 0. Suppose that
before the disambiguation, the planned path from u to t is P; and after disambiguation,
the replanned path from u to t is P′. Suppose before disambiguation, the weight function
is W; after the disambiguation, the weight function is W ′ (the weight function is updated
because the tuple (ρ+, l+, c+) is updated). Note that P passes through e; we denote by Pvt the subpath of P from v to t.
Before the disambiguation, the weight of e is W(e) = l(e) + c(e) / (1 − ρ(e)), which means the length of P, denoted as L(P), is

L(P) = W(e) + L(Pvt) = l(e) + c(e) / (1 − ρ(e)) + L(Pvt).
After the disambiguation, the new weight of e is W ′(e) = l(e), which means the new
length of P, denoted as L′(P), is
L′(P) = l(e) + L′(Pvt).
Note that L′(Pvt) = L(Pvt), which implies L′(P) = l(e) + L(Pvt).
Let L′(P′) be the length of P′ after the disambiguation. We now show that L′(P) ≤ L′(P′).
In fact, we can discuss two cases. Case 1: P′ passes e. Denote P′vt as the subpath of P′
from v to t. Note that L′(P′vt) = L(P′vt) and Pvt is a v-t shortest path in G before the
disambiguation, hence
L′(P′) = l(e) + L′(P′vt) = l(e) + L(P′vt) ≥ l(e) + L(Pvt) = L′(P).
Case 2: P′ does not pass e. Note that P is a u-t shortest path in G before the
disambiguation, hence
L′(P′) = L(P′) ≥ L(P) > L′(P).
The answer to the second question is "yes" if the graph G satisfies a certain condition. One such condition is that G is a convergent graph, which is defined as follows:

Definition 3.2.2. Graph G is called convergent with respect to t if for any v ∈ V, v ≠ t, there is a v-t path that contains only arcs in A.
Theorem 3.2.3. Under the setting of section 3.1, for a convergent graph G, the CR policy that uses the CR weight function defined in (3.2.1) has finite expected cost.

Proof. The finiteness of G implies there are only finitely many possibilities. Suppose (for contradiction) that there is a positive probability that the agent pays an infinitely large cost to reach t (i.e., the agent can never reach t). Since |B| is finite, the agent pays at most a finite disambiguation cost. Hence it must be stuck in some cycle. Since the CR policy uses positive weight functions and plans shortest paths, by the convergence of G the cycle will eventually be avoided, a contradiction.
A special convergent graph is the so-called parallel graph. Under the setting of section 3.1, a parallel graph is a graph G with V = {s, t}. We next show that the CR policy yields the smallest expected cost for traversing the parallel graph.
3.3 Parallel Graph
Showing the optimality of the CR policy in the sense of expectation in principle requires comparing the CR policy with all other policies. A simple argument shows that there are exponentially many distinct policies even for traversing the parallel graph. In fact, without loss of generality, we can assume |A| = 1 since only the shortest deterministic arc deserves consideration. Let a ∈ A be the only deterministic arc. For convenience, let B = {e1, e2, …, em}. There are at least (m + 1)! distinct policies for traversing the parallel graph, each one simply being a permutation of a, e1, e2, …, em. A permutation, denoted a1 → a2 → … → am+1, of a, e1, e2, …, em means to try a1 first; if X(a1) = 0, then take a1 and go; otherwise, try a2, and so forth. Of course, all these permutations only form one class of policies; there can be other policies that do not belong to this class. Obviously, when m is large, brute-force enumeration must give way to a smarter approach.
Our approach is to prune the dynamic programming search tree (DPST) that contains all
the possible policies and show that the CR policy is an optimal decision sequence in the
DPST. To show this, we first prove a weak result that involves an important concept:
balk.
For general graphs, it may sometimes be advantageous to disambiguate an arc and then, even if the arc is discovered to be traversable, not to traverse it immediately. We call such a delay a balk. For example, in Figure 3.3.1, the optimal policy is to traverse a1, disambiguate e1 at v1, traverse a2, and disambiguate e2 at v2; if e2 is traversable, then traverse it; otherwise, take the path v2 ~ v1 ~ t or v2 ~ v3 ~ t according to whether e1 is or is not traversable. Although the early disambiguation of e1 adds an extra expected cost of (1/2)·1, it saves an expected length of (1/2)·(1/2)·(5 + 5) on a possible later backtrack if both e1 and e2 are discovered not to be traversable. We say a policy is balk-free if it has the property that, upon any disambiguation revealing that an arc is traversable, this arc is immediately traversed. Obviously, the CR policy is a balk-free policy, and the (m+1)! permutations mentioned at the beginning of this section form a class of balk-free policies.
Theorem 3.3.1. Under the setting of section 3.1, the CR policy for traversing the parallel
graph has the minimum expected cost among the class of balk-free policies.
Figure 3.3.1: An example of a general (nonparallel) graph where the optimal policy requires a balk. Each arc is bidirectional and labeled with its length. The dashed arcs e1 and e2 are probabilistic; each has probability 1/2 of being traversable, and both have disambiguation cost 1.
To prove this theorem, we first prove two lemmas.
For convenience, for the balk-free policy a1 → a2 → … → am+1 for traversing the parallel graph, for i = 1, 2, …, m+1, let ρi = 0 if ai ∈ A and ρi = ρ(ai) if ai ∈ B; let li = l(ai); let ci = 0 if ai ∈ A and ci = c(ai) if ai ∈ B; and let hi = li + ci / (1 − ρi). Also, denote by Ebf(a1 → a2 → … → am+1) the expected cost of the policy a1 → a2 → … → am+1.
Lemma 3.3.2. Ebf(a1 → a2 → … → am+1)
= (1 − ρ1)h1 + ρ1(1 − ρ2)h2 + … + ρ1⋯ρm(1 − ρm+1)hm+1 + ρ1⋯ρm+1l(a).
Proof. We use the decision tree in Figure 3.3.2 to calculate Ebf(a1 → a2 → … → am+1).

Figure 3.3.2: The decision tree of the balk-free policy a1 → a2 → … → am+1 for traversing the parallel graph. Each leaf node is a possible outcome of the cost.
As Figure 3.3.2 shows,

Ebf(a1 → a2 → … → am+1)
= (1 − ρ1)(c1 + l1) + ρ1(1 − ρ2)(c1 + c2 + l2) + …
+ ρ1⋯ρm(1 − ρm+1)(c1 + c2 + … + cm+1 + lm+1)
+ ρ1⋯ρmρm+1(c1 + c2 + … + cm+1 + l(a)).
By combining like terms and noting that hi = li + ci / (1 − ρi) for each i = 1, 2, …, m+1, we have that

Ebf(a1 → a2 → … → am+1)
= c1 + (1 − ρ1)l1 + ρ1c2 + ρ1(1 − ρ2)l2 + …
+ ρ1⋯ρmcm+1 + ρ1⋯ρm(1 − ρm+1)lm+1
+ ρ1⋯ρmρm+1l(a)
= (1 − ρ1)h1 + ρ1(1 − ρ2)h2 + … + ρ1⋯ρm(1 − ρm+1)hm+1 + ρ1⋯ρm+1l(a).
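Lemma 3.3.2's closed form can be cross-checked against a direct summation over all 2^m realizations of X. The sketch below is illustrative: the tuple encoding (l_i, c_i, ρ_i) for the nondeterministic arcs and the function names are assumptions, not notation from the text.

```python
from itertools import product

def E_bf(arcs, l_a):
    # Closed-form expected cost of the balk-free policy that tries the
    # nondeterministic arcs in the order given by `arcs` (each (l, c, rho)),
    # falling back to the deterministic arc of length l_a (Lemma 3.3.2).
    exp, prefix = 0.0, 1.0
    for l, c, r in arcs:
        h = l + c / (1.0 - r)
        exp += prefix * (1.0 - r) * h   # first traversable arc is the i-th one
        prefix *= r                      # all earlier arcs were blocked
    return exp + prefix * l_a            # every arc blocked: take a

def E_bf_enumerate(arcs, l_a):
    # Same expectation, computed by summing cost * probability over all 2^m
    # joint realizations of the indicators X(e_i).
    m = len(arcs)
    total = 0.0
    for xs in product([0, 1], repeat=m):
        p = 1.0
        for (l, c, r), x in zip(arcs, xs):
            p *= r if x else (1.0 - r)
        cost = 0.0
        for (l, c, r), x in zip(arcs, xs):
            cost += c                    # pay the disambiguation
            if x == 0:
                cost += l                # traversable: take it and stop
                break
        else:
            cost += l_a                  # all blocked: deterministic fallback
        total += p * cost
    return total
```

For the two-arc instance arcs = [(2, 1, 0.5), (3, 1, 0.25)] with l(a) = 10, both functions evaluate to 4.875.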
Lemma 3.3.3. For two balk-free policies a1 → a2 → … → ak → ak+1 → … → am+1 and a1 → a2 → … → ak+1 → ak → … → am+1 for traversing the parallel graph,

Ebf(a1 → a2 → … → ak+1 → ak → … → am+1) − Ebf(a1 → a2 → … → ak → ak+1 → … → am+1)
= ρ1⋯ρk−1(1 − ρk)(1 − ρk+1)(hk+1 − hk).

Proof. By Lemma 3.3.2 and simple algebraic operations we have

Ebf(a1 → a2 → … → ak+1 → ak → … → am+1) − Ebf(a1 → a2 → … → ak → ak+1 → … → am+1)
= ρ1⋯ρk−1(1 − ρk+1)hk+1 + ρ1⋯ρk−1ρk+1(1 − ρk)hk − ρ1⋯ρk−1(1 − ρk)hk − ρ1⋯ρk(1 − ρk+1)hk+1
= ρ1⋯ρk−1(1 − ρk)(1 − ρk+1)(hk+1 − hk).
Proof of Theorem 3.3.1. By Lemma 3.3.3, hk+1 < hk implies Ebf(a1 → a2 → … → ak+1 → ak → … → am+1) − Ebf(a1 → a2 → … → ak → ak+1 → … → am+1) ≤ 0. Hence if hk+1 < hk, changing the policy a1 → a2 → … → ak → ak+1 → … → am+1 by swapping ak and ak+1 does not increase the expected cost. We can change any balk-free policy into the CR policy simply by reordering the arcs in ascending order of the CR weights via bubble sort (see [5]). Since each single adjacent swap does not raise the expected cost, the expected cost of the CR policy is no greater than that of any other balk-free policy.
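The bubble-sort argument can be checked by brute force on small instances: ordering all m+1 arcs by ascending h_i = l_i + c_i/(1 − ρ_i) should attain the minimum of Ebf over all (m+1)! permutations. In the sketch below (names and encoding are illustrative, not from the text), the deterministic arc a is encoded as an arc with ρ = 0 and c = 0, so once it is reached it is always taken and no later arc matters.

```python
from itertools import permutations

def E_bf_order(order):
    # Expected cost of trying arcs in `order`; each arc is (l, c, rho).
    # The deterministic arc is encoded with rho = 0, c = 0, after which the
    # surviving probability `prefix` drops to 0 and later terms vanish.
    exp, prefix = 0.0, 1.0
    for l, c, r in order:
        exp += prefix * (1.0 - r) * (l + c / (1.0 - r))
        prefix *= r
    return exp

def cr_is_optimal_balk_free(arcs):
    # Brute-force check of Theorem 3.3.1 on one instance: the ascending-h
    # (CR) ordering matches the minimum over all permutations.
    best = min(E_bf_order(list(p)) for p in permutations(arcs))
    cr = E_bf_order(sorted(arcs, key=lambda a: a[0] + a[1] / (1.0 - a[2])))
    return cr <= best + 1e-12
```

This only verifies particular instances, of course; the theorem's proof covers the general case.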
We now show the strong result.
Theorem 3.3.4. Under the setting of section 3.1, the CR policy for traversing the parallel
graph has the minimum expected cost.
To prove this theorem, we first prove two reduction lemmas.
For convenience, let E*({y1, y2, …, yk} | {x1, x2, …, xm+1−k}) denote the minimum
expected cost for the agent to traverse the parallel graph given that the arcs x1, x2, …,
xm+1−k are known to be traversable and the arcs y1, y2, …, yk are nondeterministic, where
y1, y2, …, yk, x1, x2, …, xm+1−k are the distinct members of A∪B. This notation is very
useful for describing the general decision tree for traversing the parallel graph. Since x1, x2, …, xm+1−k are known to be traversable, E*({y1, y2, …, yk} | {x1, x2, …, xm+1−k}) can be simplified because only the deterministic arc with the minimum length matters. Hence we have
E*({y1, y2, …, yk} | {x1, x2, …, xm+1−k})
= E*({y1, y2, …, yk} | argmin{l(x1), l(x2), …, l(xm+1−k)}).
Let x0 = argmin{l(x1), l(x2), …, l(xm+1−k)}, then
E*({y1, y2, …, yk} | {x1, x2, …, xm+1−k})
= E*({y1, y2, …, yk} | x0).
Lemma 3.3.5. E*({y1, y2, …, yk} | x0)
= min{l(x0), min_{i = 1, 2, …, k} [c(yi) + ρ(yi)⋅E*({y1, y2, …, yk} \ yi | x0) + (1 − ρ(yi))⋅E*({y1, y2, …, yk} \ yi | argmin{l(x0), l(yi)})]}.
Proof. Conditioning on first disambiguating any nondeterministic arc yi, the minimum
expected cost of traversal is c(yi) + ρ(yi)⋅E*({y1, y2, …, yk} \ yi | x0) + (1− ρ(yi))⋅E*({y1,
y2, …, yk} \ yi | argmin{l(x0), l(yi)}). Note that this must be compared with l(x0), hence the
lemma is true.
Lemma 3.3.6. If l(x0) ≤ l(yj) for some j, then

E*({y1, y2, …, yk} | x0)
= min{l(x0), min_{i = 1, 2, …, k; i ≠ j} [c(yi) + ρ(yi)⋅E*({y1, y2, …, yk} \ yi | x0) + (1 − ρ(yi))⋅E*({y1, y2, …, yk} \ yi | argmin{l(x0), l(yi)})]}.
Proof. Note that l(x0) ≤ l(yj) implies x0 = argmin{l(x0), l(yj)}. Hence the minimum expected cost conditioned on first disambiguating yj is

ρ(yj)⋅(c(yj) + E*({y1, y2, …, yk} \ yj | x0)) + (1 − ρ(yj))⋅(c(yj) + E*({y1, y2, …, yk} \ yj | argmin{l(x0), l(yj)}))
= c(yj) + E*({y1, y2, …, yk} \ yj | x0).

Note, by Lemma 3.3.5, that

E*({y1, y2, …, yk} \ yj | x0)
= min{l(x0), min_{i = 1, 2, …, k; i ≠ j} [c(yi) + ρ(yi)⋅E*({y1, y2, …, yk} \ {yj, yi} | x0) + (1 − ρ(yi))⋅E*({y1, y2, …, yk} \ {yj, yi} | argmin{l(x0), l(yi)})]}.

Also note that for i ≠ j, removing yj from the pool of available nondeterministic arcs cannot lower the optimal cost:

E*({y1, y2, …, yk} \ {yj, yi} | x0) ≥ E*({y1, y2, …, yk} \ yi | x0)

and

E*({y1, y2, …, yk} \ {yj, yi} | argmin{l(x0), l(yi)}) ≥ E*({y1, y2, …, yk} \ yi | argmin{l(x0), l(yi)}).

We have that

c(yj) + E*({y1, y2, …, yk} \ yj | x0)
> E*({y1, y2, …, yk} \ yj | x0)
≥ min{l(x0), min_{i = 1, 2, …, k; i ≠ j} [c(yi) + ρ(yi)⋅E*({y1, y2, …, yk} \ yi | x0) + (1 − ρ(yi))⋅E*({y1, y2, …, yk} \ yi | argmin{l(x0), l(yi)})]}.

By Lemma 3.3.5 again,

E*({y1, y2, …, yk} | x0)
= min{l(x0), min_{i = 1, 2, …, k} [c(yi) + ρ(yi)⋅E*({y1, y2, …, yk} \ yi | x0) + (1 − ρ(yi))⋅E*({y1, y2, …, yk} \ yi | argmin{l(x0), l(yi)})]}.

Since the i = j term of this minimization is dominated, it can be dropped. Hence

E*({y1, y2, …, yk} | x0)
= min{l(x0), min_{i = 1, 2, …, k; i ≠ j} [c(yi) + ρ(yi)⋅E*({y1, y2, …, yk} \ yi | x0) + (1 − ρ(yi))⋅E*({y1, y2, …, yk} \ yi | argmin{l(x0), l(yi)})]}.
Remark. Lemma 3.3.5 simply says that the minimum expected cost of traversing the parallel graph can be evaluated by dynamic programming, which has exponential complexity. Lemma 3.3.6 says there is a way of pruning the DPST such that the optimal decision sequence still remains in the pruned search tree. More concretely, if the length of a nondeterministic arc is no less than that of the shortest arc known to be traversable, then the search branch that goes to disambiguating this nondeterministic arc can be pruned.
It is instructive to capture the structure of the DPST from the simple parallel graph in which A = {a} and B = {e1, e2}. The tree is shown in Figure 3.3.3.
Figure 3.3.3: The dynamic programming search tree for finding the optimal policy for traversing the parallel graph in which A = {a} and B = {e1, e2}. Here ρ1 = ρ(e1), ρ2 = ρ(e2), l0 = l(a), l1 = l(e1), l2 = l(e2), c1 = c(e1), and c2 = c(e2). The root of the tree is the problem of evaluating E*({e1, e2} | a), which is recursively reduced into subproblems via conditioning.
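The recursion of Lemma 3.3.5 can be written directly as a memoized dynamic program. The sketch below is illustrative (the function names and the (l, c, ρ) tuple encoding are assumptions); the state is the tuple of remaining nondeterministic arcs together with the length of the shortest arc currently known to be traversable.

```python
from functools import lru_cache

def E_star(arcs, l0):
    # Optimal expected cost E*({e1..em} | a) for the parallel graph via the
    # DP of Lemma 3.3.5. arcs: iterable of (l, c, rho); l0 = l(a).
    arcs = tuple(arcs)

    @lru_cache(maxsize=None)
    def E(remaining, best_len):
        # Option 1: stop and take the shortest known-traversable arc.
        options = [best_len]
        # Option 2: disambiguate some remaining arc and recurse on the outcome.
        for i, (l, c, r) in enumerate(remaining):
            rest = remaining[:i] + remaining[i + 1:]
            options.append(c + r * E(rest, best_len)
                             + (1 - r) * E(rest, min(best_len, l)))
        return min(options)

    return E(arcs, l0)
```

On the instance A = {a} with l(a) = 10 and B = {(2, 1, 0.5), (3, 1, 0.25)}, this DP returns 4.875, matching the CR policy's expected cost, as Theorem 3.3.4 predicts.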
Proof of Theorem 3.3.4. The proof is by induction on |B|. For convenience, let ECR denote the expected cost of the CR policy. The base case is |B| = 0, in which the theorem is trivially true. Suppose (inductive hypothesis) that the CR policy has the minimum expected cost when |B| = k ≥ 0. Now, without loss of generality, consider any case in which |B| = k + 1, A = {a}, and B = {e1, e2, …, ek+1}. We need to show that E*({e1, e2, …, ek+1} | a) = ECR({e1, e2, …, ek+1} | a). If l(a) ≤ l(ei) for each i = 1, 2, …, k+1, then E*({e1, e2, …, ek+1} | a) = l(a) = ECR({e1, e2, …, ek+1} | a), and the claim is trivial. We now consider the nontrivial case in which there exists at least one i such that l(a) > l(ei).
In the DPST for evaluating E*({e1, e2, …, ek+1} | a), by Lemma 3.3.6, we only need to consider the branches that go to disambiguating the nondeterministic arcs with lengths strictly less than l(a). By Lemma 3.3.5, suppose (without loss of generality) that

E*({e1, e2, …, ek+1} | a)
= min{l(a), min_{i = 1, 2, …, k+1} [c(ei) + ρ(ei)⋅E*({e1, e2, …, ek+1} \ ei | a) + (1 − ρ(ei))⋅E*({e1, e2, …, ek+1} \ ei | argmin{l(a), l(ei)})]}
= min{l(a), c(ek+1) + ρ(ek+1)⋅E*({e1, e2, …, ek} | a) + (1 − ρ(ek+1))⋅E*({e1, e2, …, ek} | argmin{l(a), l(ek+1)})}
= min{l(a), c(ek+1) + ρ(ek+1)⋅E*({e1, e2, …, ek} | a) + (1 − ρ(ek+1))⋅E*({e1, e2, …, ek} | ek+1)},

with the last equality due to l(ek+1) < l(a).
By the inductive hypothesis, E*({e1, e2, …, ek} | a) = ECR({e1, e2, …, ek} | a) and E*({e1, e2, …, ek} | ek+1) = ECR({e1, e2, …, ek} | ek+1), hence

E*({e1, e2, …, ek+1} | a)
= min{l(a), c(ek+1) + ρ(ek+1)⋅ECR({e1, e2, …, ek} | a) + (1 − ρ(ek+1))⋅ECR({e1, e2, …, ek} | ek+1)}.

Without loss of generality, further suppose that he,1 ≤ he,2 ≤ … ≤ he,k, where he,i = l(ei) + c(ei) / (1 − ρ(ei)) for i = 1, 2, …, k. Let p ≤ k and q ≤ k be integers such that he,1 ≤ he,2 ≤ … ≤ he,p ≤ l(a) and he,1 ≤ he,2 ≤ … ≤ he,q ≤ l(ek+1). Since l(ek+1) < l(a), we have q ≤ p. By Lemma 3.3.2,
ECR({e1, e2, …, ek} | a)
= (1 − ρ(e1))⋅he,1 + ρ(e1)⋅(1 − ρ(e2))⋅he,2 + … + ρ(e1)⋯ρ(ep−1)⋅(1 − ρ(ep))⋅he,p + ρ(e1)⋯ρ(ep)⋅l(a)

and

ECR({e1, e2, …, ek} | ek+1)
= (1 − ρ(e1))⋅he,1 + ρ(e1)⋅(1 − ρ(e2))⋅he,2 + … + ρ(e1)⋯ρ(eq−1)⋅(1 − ρ(eq))⋅he,q + ρ(e1)⋯ρ(eq)⋅l(ek+1).
Hence

E*({e1, e2, …, ek+1} | a)
= min{l(a), c(ek+1) + ρ(ek+1)⋅((1 − ρ(e1))⋅he,1 + ρ(e1)⋅(1 − ρ(e2))⋅he,2 + … + ρ(e1)⋯ρ(ep−1)⋅(1 − ρ(ep))⋅he,p + ρ(e1)⋯ρ(ep)⋅l(a)) + (1 − ρ(ek+1))⋅((1 − ρ(e1))⋅he,1 + ρ(e1)⋅(1 − ρ(e2))⋅he,2 + … + ρ(e1)⋯ρ(eq−1)⋅(1 − ρ(eq))⋅he,q + ρ(e1)⋯ρ(eq)⋅l(ek+1))}
= min{l(a), c(ek+1) + (1 − ρ(ek+1))⋅ρ(e1)⋯ρ(eq)⋅l(ek+1) + (1 − ρ(e1))⋅he,1 + ρ(e1)⋅(1 − ρ(e2))⋅he,2 + … + ρ(e1)⋯ρ(eq−1)⋅(1 − ρ(eq))⋅he,q + ρ(ek+1)⋅(ρ(e1)⋯ρ(eq)⋅(1 − ρ(eq+1))⋅he,q+1 + … + ρ(e1)⋯ρ(ep−1)⋅(1 − ρ(ep))⋅he,p + ρ(e1)⋯ρ(ep)⋅l(a))}.
Note that

c(ek+1) + (1 − ρ(ek+1))⋅ρ(e1)⋯ρ(eq)⋅l(ek+1)
≥ c(ek+1)⋅ρ(e1)⋯ρ(eq) + (1 − ρ(ek+1))⋅ρ(e1)⋯ρ(eq)⋅l(ek+1)
= ρ(e1)⋯ρ(eq)⋅(1 − ρ(ek+1))⋅he,k+1,

hence

E*({e1, e2, …, ek+1} | a) ≥ min{l(a), Ebf(e1 → e2 → … → eq → ek+1 → … → a → …)},

where e1 → e2 → … → eq → ek+1 → … → a → … is a balk-free policy with ek+1 at the (q+1)-th position and a at the (p+1)-th position.
By Theorem 3.3.1, we have

l(a) ≥ ECR({e1, e2, …, ek+1} | a)

and

Ebf(e1 → e2 → … → eq → ek+1 → … → a → …) ≥ ECR({e1, e2, …, ek+1} | a),

hence

E*({e1, e2, …, ek+1} | a) ≥ ECR({e1, e2, …, ek+1} | a).

But by definition

E*({e1, e2, …, ek+1} | a) ≤ ECR({e1, e2, …, ek+1} | a),

hence

E*({e1, e2, …, ek+1} | a) = ECR({e1, e2, …, ek+1} | a).
Remark. Theorem 3.3.4 tells us that the problem of traversing the probabilistic parallel graph, under the assumption of independent probability markers, is computationally tractable. Obviously, to execute the CR policy under this scenario, we only need to find the potentially traversable arc with the smallest CR weight. To facilitate the min-extraction process, we can build a heap (e.g., a binary heap) as the data structure for the arcs. Building a binary heap by starting from an empty heap and inserting the arcs one by one has time complexity O((|B| + 1)⋅log(|B| + 1)) and space complexity O(|B| + 1). If we start with an array consisting of the |B| + 1 arcs, both the time complexity and the space complexity can be O(|B| + 1).
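As a sketch of this remark, the CR policy on a parallel graph amounts to extracting arcs from a binary heap keyed by the CR weight l + c/(1 − ρ). The encoding below is illustrative and not from the text: the deterministic arc is given index −1, and X maps arc indices to realized statuses (1 = blocked).

```python
import heapq

def cr_traverse_parallel(arcs, l_a, X):
    # arcs: list of nondeterministic arcs (l, c, rho); l_a: length of the
    # deterministic arc. Heap items are (CR weight, index, length, cost);
    # heapify builds the heap from the array in O(n).
    heap = [(l + c / (1.0 - r), i, l, c) for i, (l, c, r) in enumerate(arcs)]
    heap.append((l_a, -1, l_a, 0.0))   # deterministic arc: CR weight = l_a
    heapq.heapify(heap)
    cost = 0.0
    while True:
        _, i, l, c = heapq.heappop(heap)
        if i == -1:
            return cost + l            # deterministic arc: always take it
        cost += c                      # disambiguate the cheapest candidate
        if X[i] == 0:
            return cost + l            # traversable: take it (balk-free)
```

Note the extraction order is exactly the ascending-h ordering that Theorem 3.3.1 shows to be optimal among balk-free policies.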
4 Mark Information via Sensor
In the last chapter, we introduced the early formulation of the RDP problem: traversing probabilistic graphs. An important feature of that formulation is the probability marker assumption, that is, before the agent travels, the probability of nontraversability of each nondeterministic arc is given. With this information, the primary objective is to find a policy whose expected cost is as small as possible. The probability marker assumption, however, no longer seems valid in recent real-world practice. A recent navigation system like COBRA's processing station does not provide the traversability probabilities (which might be estimated from historical data). It makes in-situ observations and provides a probabilistic estimate of the traversability of the potentially hazardous regions.
The estimates of the underlying true-false status of the potential hazards are probabilistic and come from some sensor's readings. The focus of this chapter is to systematically formulate the new concept of markers from the perspective of sensors and to investigate how improving the sensor might yield statistically better traversals. We first provide the specific problem setting within which we will work. We then define sensors and a new concept, sensor monotonicity. Finally, we present two classes of policies, threshold policies and penalty policies, and show sensor monotonicity results in two simple cases.
4.1 Setting
The basic setting is the same as that in section 3.1. Let G = (V, A, B, l, c, ρ) be a finite
directed graph, where V is the set of nodes that contains a specified starting node s and a
specified target node t, A is the set of deterministic arcs, B is the set of n nondeterministic
arcs, l: A∪B → R+ is the length function, c: B → R+ is the disambiguation cost function,
ρ: B → (0, 1) is the probability function such that for each e ∈ B, ρ(e) represents the
probability that e is not traversable. Without loss of generality, we can assume that l(a) < +∞ and c(a) < +∞ for all a ∈ A∪B.
For convenience we define X: A∪B → {0, 1} as an indicator function such that in any
realization of G, for each a ∈ A∪B, X(a) = 1 if a is nontraversable in this realization; X(a)
= 0 otherwise. Note that X(a) is deterministic for all a ∈ A and X(e) is random for all e ∈
B. We assume that for each e ∈ B, X(e) is independently Bernoulli(ρ(e))-distributed: P(X(e) = 1) = ρ(e) and P(X(e) = 0) = 1 − ρ(e).
Note that here we do not assume ρ is known a priori. But we assume there exists a marker function Y: B → (0, 1) such that for any realization of G and each e ∈ B in this realization, independently, Y(e) ~ F0 if X(e) = 0 and Y(e) ~ F1 if X(e) = 1, where F0: [0, 1] → [0, 1] and F1: [0, 1] → [0, 1] are two (continuous) distribution functions. We define a sensor S as an ordered pair (F0, F1), denoted S = (F0, F1). We use the notation Y(e) ~ S to denote that the marker Y(e) of the arc e is generated from S.
A sensor S is said to be valid if F0(y) ≥ F1(y) for any 0 ≤ y ≤ 1. A valid sensor is said to be discerning if F0(1/2) > 1/2 and F1(1/2) < 1/2. Consider two sensors S(1) = (F0(1), F1(1)) and S(2) = (F0(2), F1(2)). For any realization of G, suppose Y(1) ~ S(1) and Y(2) ~ S(2). We say that S(1) is stochastically at least as good as S(2), and write Y(1) ≽ Y(2), if for any 0 ≤ y ≤ 1, F0(1)(y) ≥ F0(2)(y) and F1(1)(y) ≤ F1(2)(y).
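These definitions can be checked numerically on a grid of marker values. The sketch below is illustrative only: the power-function CDFs F0(y) = y^0.5 and F1(y) = y^2 (and the blunter pair for S(2)) are invented example distributions, not sensors from the text.

```python
def is_valid(F0, F1, grid):
    # Valid sensor: F0 stochastically dominates F1 in CDF, F0(y) >= F1(y).
    return all(F0(y) >= F1(y) for y in grid)

def is_discerning(F0, F1):
    # Discerning sensor: marker mass splits correctly about 1/2.
    return F0(0.5) > 0.5 and F1(0.5) < 0.5

def at_least_as_good(S1, S2, grid):
    # S1 stochastically at least as good as S2.
    F0a, F1a = S1
    F0b, F1b = S2
    return all(F0a(y) >= F0b(y) and F1a(y) <= F1b(y) for y in grid)

grid = [k / 100 for k in range(101)]
S1 = (lambda y: y ** 0.5, lambda y: y ** 2)    # sharper sensor (example)
S2 = (lambda y: y ** 0.8, lambda y: y ** 1.2)  # blunter sensor (example)
```

Here both example sensors are valid and discerning, and S1 dominates S2 in the sense above but not conversely.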
With the new concept of markers, which is characterized by Y rather than ρ, a policy for traversing G utilizes the information Y(e), e ∈ B, in a realization of G. As under the setting of section 3.1, the objective in designing a policy can be to minimize the expected cost. There can also be other cost measures, such as quantiles (e.g., the median), which lead to alternative objectives.
We use C = C(G, s, t, X, Y, P) to denote the cost (traveling cost plus disambiguation cost) the agent pays to travel from s to t under the policy P. C is a random variable since both X and Y are random. Moreover, there may also be some randomness in P if P contains some randomized algorithm. To measure the utility of P, we can consider the distribution function HC(x) = P(C ≤ x), x ≥ 0, the leftmost P0-quantile C0 = inf{x: HC(x) ≥ P0}, and the mean E(C). A disadvantage of the mean measure is that it requires a finite expectation, which may not always be satisfied. For instance, if there exists a tiny probability that C = +∞, then, theoretically, the mean is not a proper measure of the utility of P. However, the leftmost P0-quantile for some 0 < P0 < 1 still works as a utility measure.
4.2 Sensor Monotonicity
We give three definitions of sensor monotonicity.
Definition 4.2.1. Let C(1) = C(G, s, t, X, Y(1), P) and C(2) = C(G, s, t, X, Y(2), P), where Y(1) ~ S(1) and Y(2) ~ S(2). We say P is strongly monotone with respect to the sensor if Y(1) ≽ Y(2) implies HC(1)(x) ≥ HC(2)(x) for any x ≥ 0.

Definition 4.2.2. Let C(1) = C(G, s, t, X, Y(1), P) and C(2) = C(G, s, t, X, Y(2), P), where Y(1) ~ S(1) and Y(2) ~ S(2). We say P is P0-quantile monotone with respect to the sensor for some 0 ≤ P0 ≤ 1 if Y(1) ≽ Y(2) implies C0(1) ≤ C0(2), where C0(1) = inf{x: HC(1)(x) ≥ P0} and C0(2) = inf{x: HC(2)(x) ≥ P0}.

Definition 4.2.3. Let C(1) = C(G, s, t, X, Y(1), P) and C(2) = C(G, s, t, X, Y(2), P), where Y(1) ~ S(1) and Y(2) ~ S(2) and C(1) < +∞, C(2) < +∞. We say P is weakly monotone with respect to the sensor if Y(1) ≽ Y(2) implies E(C(1)) ≤ E(C(2)).
It can easily be shown (see [51]) that strong monotonicity implies P0-quantile monotonicity and weak monotonicity. Moreover, P0-quantile monotonicity for all 0 ≤ P0 ≤ 1 implies strong monotonicity, and if weak monotonicity holds for all nondecreasing functions of C, then strong monotonicity is implied.
4.3 Threshold and Penalty Policies
Similar to section 3.1, the tuple (Y, l, c) represents the prior knowledge of the graph G. In
consideration of the dynamics of the knowledge of the graph G, for convenience, we define the function Y+: A∪B → R as an extension of Y:

Y+(a) = 0 if X(a) = 0; 1 if X(a) = 1; Y(a) if a ∈ B.    (4.3.1)
We define the function l+: A∪B → R as an extension of l:

l+(a) = l(a) if 0 ≤ Y+(a) < 1; +∞ if Y+(a) = 1.    (4.3.2)

We define the function c+: A∪B → R as an extension of c:

c+(a) = c(a) if 0 < Y+(a) < 1; 0 otherwise.    (4.3.3)

Hence, the tuple (Y+, l+, c+) represents the updated knowledge of the graph G.

We now describe the class of threshold policies and the class of penalty policies. Basically, both plan shortest paths in the same dynamic manner that the CR policy adopts. Any policy in either class has four main features: first, the initial knowledge of the graph G is represented by the tuple (Y, l, c); second, as the agent travels, the updated knowledge of G is represented by the tuple (Y+, l+, c+); third, a shortest path plan is made relative to some weight function of Y+, l+, and c+, and in searching for the shortest path, the tie-breaking favors the deterministic arcs; fourth, the policy is balk-free. What distinguishes one policy from another is the structure of the weight function.

The basic idea of a threshold policy is to screen out those arcs that seem unlikely to be traversable. In a threshold policy, a threshold vector α is predetermined, with its component 0 ≤ αe ≤ 1 representing the threshold of the arc e ∈ B. This vector is the criterion for the screening, i.e., for each e ∈ B, if Y(e) ≥ αe, then view e as nontraversable. The α-threshold weight function in the shortest path planning is

Wα(a) = l(a) if Y+(a) = 0; c(a) + l(a) if 0 < Y+(a) < αa; +∞ if Y+(a) ≥ αa,    (4.3.4)

for all a ∈ A∪B. We use Θ(α) to denote the threshold policy that uses the α-threshold weight function.

The key element of a penalty policy is a penalty function l̃(a) = l̃(Y+(a), l+(a), c+(a)) > 0 for all a ∈ A∪B that is monotonically increasing in each of its three arguments. The penalty function also has the properties that l̃(0, l+(a), c+(a)) = l(a); l̃(Y+(a), l+(a), c+(a)) → c(a) + l(a) as Y+(a) → 0+; and l̃(Y+(a), l+(a), c+(a)) → +∞ as Y+(a) → 1. The l̃-penalty weight function in the shortest path planning is

Wl̃(a) = l̃(Y+(a), l+(a), c+(a)),    (4.3.5)

for all a ∈ A∪B. We use Ψ(l̃) to denote the penalty policy that uses the l̃-penalty weight function.
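The per-arc α-threshold weight (4.3.4) can be sketched as a small Python function. This is an illustrative sketch under the setting of section 4.1; the function name and the scalar encoding of Y+(a) are assumptions, not notation from the text.

```python
def threshold_weight(Yp, l, c, alpha):
    # alpha-threshold weight (4.3.4) of one arc, given the extended marker
    # Yp = Y+(a), length l, disambiguation cost c, and threshold alpha.
    if Yp == 0:
        return l                 # known traversable
    if Yp >= alpha:
        return float('inf')      # screened out as (effectively) nontraversable
    return c + l                 # still uncertain: length plus disambiguation
```

Since α ≤ 1, a disambiguated blocked arc (Y+(a) = 1) always falls into the infinite-weight case, matching l+ in (4.3.2).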
Theorem 4.3.1. Under the setting of section 4.1, when applying any Θ(α) or any Ψ(l̃) to a convergent graph, it is unnecessary to replan a new shortest path from where the agent is to t upon the moment when the agent completes a disambiguation that discloses the next arc to be traversable.

Theorem 4.3.2. Under the setting of section 4.1, for a convergent graph, any Θ(α) and any Ψ(l̃) has finite expected cost.
More importantly, we have the following two sensor monotonicity results:

Theorem 4.3.3. Under the setting of section 4.1, for traversing a parallel graph G with V = {s, t}, Θ(α) is weakly monotone with respect to the sensor for any α ∈ [0, 1]n.
Proof. Let S(1) = (F0(1), F1(1)) and S(2) = (F0(2), F1(2)) be two sensors such that for any 0 ≤ y ≤ 1, F0(1)(y) ≥ F0(2)(y) and F1(1)(y) ≤ F1(2)(y). For any realization of G, suppose Y(1) ~ S(1) and Y(2) ~ S(2). Let C(1) = C(G, s, t, X, Y(1), Θ(α)) and C(2) = C(G, s, t, X, Y(2), Θ(α)). We need to show that E(C(1)) ≤ E(C(2)).
At first, without loss of generality, we assume that |A| = 1 since only the shortest deterministic arc may affect the cost. Suppose A = {a}. The proof is by induction on |B|.
71
The base case is |B| = 0. In base case, C(1) = C(2) = l(a), which is constant, hence the weak
monotonicity trivially holds. Suppose (inductive hypothesis) the weak monotonicity
holds for |B| = k ≥ 0, we then consider the case |B| = k +1.
Consider e0 = arg mine B∈
(c(e) + l(e)). If l(a) ≤ c(e0) + l(e0), then C(1) = C(2) = l(a), the weak
monotonicity is trivially true. We now consider the nontrivial case that l(a) > c(e0) + l(e0).
In this case, Θ(αr ) returns e0 in its first plan.
Suppose 0 ≤ α_{e0} ≤ 1 is the threshold for e0. For convenience, denote γ = E(C | Y(e0) < α_{e0}, X(e0) = 0), ξ = E(C | Y(e0) < α_{e0}, X(e0) = 1), and η = E(C | Y(e0) ≥ α_{e0}), where C = C(G, s, t, X, Y, Θ(α⃗)) and Y(e) ~ S for any e ∈ B. Obviously,

γ = c(e0) + l(e0) ≤ η < η + c(e0) = ξ < +∞,

with the last strict inequality due to Theorem 4.3.2. We can compute (via conditioning)

P(Y(e0) < α_{e0}, X(e0) = 0) = P(Y(e0) < α_{e0} | X(e0) = 0)⋅P(X(e0) = 0) = (1 − ρ(e0))⋅F0(α_{e0}),

P(Y(e0) < α_{e0}, X(e0) = 1) = P(Y(e0) < α_{e0} | X(e0) = 1)⋅P(X(e0) = 1) = ρ(e0)⋅F1(α_{e0}),

and

P(Y(e0) ≥ α_{e0}) = P(Y(e0) ≥ α_{e0} | X(e0) = 0)⋅P(X(e0) = 0) + P(Y(e0) ≥ α_{e0} | X(e0) = 1)⋅P(X(e0) = 1)
= (1 − ρ(e0))⋅[1 − F0(α_{e0})] + ρ(e0)⋅[1 − F1(α_{e0})].

Hence a recursive formula for evaluating E(C) is

E(C) = γ⋅(1 − ρ(e0))⋅F0(α_{e0}) + ξ⋅ρ(e0)⋅F1(α_{e0}) + η⋅(1 − ρ(e0))⋅[1 − F0(α_{e0})] + η⋅ρ(e0)⋅[1 − F1(α_{e0})].
Now

E(C(1)) = γ(1)⋅(1 − ρ(e0))⋅F0^(1)(α_{e0}) + ξ(1)⋅ρ(e0)⋅F1^(1)(α_{e0}) + η(1)⋅(1 − ρ(e0))⋅[1 − F0^(1)(α_{e0})] + η(1)⋅ρ(e0)⋅[1 − F1^(1)(α_{e0})]

and

E(C(2)) = γ(2)⋅(1 − ρ(e0))⋅F0^(2)(α_{e0}) + ξ(2)⋅ρ(e0)⋅F1^(2)(α_{e0}) + η(2)⋅(1 − ρ(e0))⋅[1 − F0^(2)(α_{e0})] + η(2)⋅ρ(e0)⋅[1 − F1^(2)(α_{e0})].

Note that γ(1) = γ(2). By the inductive hypothesis, we have ξ(1) ≤ ξ(2) and η(1) ≤ η(2). Hence,

E(C(1)) ≤ γ(2)⋅(1 − ρ(e0))⋅F0^(1)(α_{e0}) + ξ(2)⋅ρ(e0)⋅F1^(1)(α_{e0}) + η(2)⋅(1 − ρ(e0))⋅[1 − F0^(1)(α_{e0})] + η(2)⋅ρ(e0)⋅[1 − F1^(1)(α_{e0})],

which implies

E(C(1)) − E(C(2)) ≤ γ(2)⋅(1 − ρ(e0))⋅[F0^(1)(α_{e0}) − F0^(2)(α_{e0})] + ξ(2)⋅ρ(e0)⋅[F1^(1)(α_{e0}) − F1^(2)(α_{e0})]
+ η(2)⋅(1 − ρ(e0))⋅[F0^(2)(α_{e0}) − F0^(1)(α_{e0})] + η(2)⋅ρ(e0)⋅[F1^(2)(α_{e0}) − F1^(1)(α_{e0})]
= (γ(2) − η(2))⋅(1 − ρ(e0))⋅[F0^(1)(α_{e0}) − F0^(2)(α_{e0})] + (η(2) − ξ(2))⋅ρ(e0)⋅[F1^(2)(α_{e0}) − F1^(1)(α_{e0})].

Since γ(2) ≤ η(2) ≤ ξ(2), F0^(1)(α_{e0}) ≥ F0^(2)(α_{e0}), and F1^(2)(α_{e0}) ≥ F1^(1)(α_{e0}), both terms on the right-hand side are nonpositive; hence E(C(1)) − E(C(2)) ≤ 0. ∎
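The recursion above can be checked numerically in the base case |B| = 1, where η = l(a), ξ = c(e0) + η, and γ = c(e0) + l(e0). The following Python sketch uses illustrative parameter values and an illustrative pair of dominating sensors; it is not any experiment from the thesis:

```python
# Closed-form expected cost of a threshold policy on the two-node parallel
# graph with one deterministic arc a (length la) and one uncertain arc e
# (length le, disambiguation cost ce, P(X(e) = 1) = rho).  Assumes the
# nontrivial case la > ce + le, so the policy tries e first.
def expected_cost(F0, F1, alpha, la=10.0, le=2.0, ce=1.0, rho=0.4):
    gamma = ce + le   # disambiguated and found traversable
    eta = la          # marker at least alpha: take a without disambiguating
    xi = ce + eta     # disambiguated and found blocked: fall back to a
    return (gamma * (1 - rho) * F0(alpha)
            + xi * rho * F1(alpha)
            + eta * ((1 - rho) * (1 - F0(alpha)) + rho * (1 - F1(alpha))))

# Sensor 1 dominates sensor 2: F0_1 >= F0_2 and F1_1 <= F1_2 on [0, 1].
F0_1, F1_1 = (lambda y: y ** 0.5), (lambda y: y ** 2)
F0_2, F1_2 = (lambda y: y), (lambda y: y)

# Weak monotonicity: the better sensor never costs more, for any threshold.
for k in range(11):
    alpha = 0.1 * k
    assert expected_cost(F0_1, F1_1, alpha) <= expected_cost(F0_2, F1_2, alpha) + 1e-12
```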
Corollary 4.3.4. Under the setting of section 4.1, for traversing the parallel graph, if S(1) is superior to S(2) (that is, F0^(1)(y) ≥ F0^(2)(y) and F1^(1)(y) ≤ F1^(2)(y) for all 0 ≤ y ≤ 1), then E[C(G, s, t, X, Y(1), Θ(α⃗*1))] ≤ E[C(G, s, t, X, Y(2), Θ(α⃗*2))], where Y(1) ~ S(1), Y(2) ~ S(2), α⃗*1 = arg min_{α⃗ ∈ [0,1]^n} E[C(G, s, t, X, Y(1), Θ(α⃗))], and α⃗*2 = arg min_{α⃗ ∈ [0,1]^n} E[C(G, s, t, X, Y(2), Θ(α⃗))].

Proof. By the definitions of α⃗*1 and α⃗*2 and Theorem 4.3.3, we have

E[C(G, s, t, X, Y(1), Θ(α⃗*1))] ≤ E[C(G, s, t, X, Y(1), Θ(α⃗*2))] ≤ E[C(G, s, t, X, Y(2), Θ(α⃗*2))]. ∎
Theorem 4.3.5. Under the setting of section 4.1, for traversing a convergent (relative to t) graph G with |B| = 1, both Θ(α) and Ψ(l̃) are strongly monotone with respect to the sensor, for any threshold α ∈ [0, 1] and any l̃-penalty function associated with the single nondeterministic arc.
Proof. Suppose B = {e = (u, v)}. We first consider any threshold policy Θ(αe) with 0 ≤ αe ≤ 1 being the threshold for e. Suppose P1, with length L1, is a shortest s-t path that does not pass through e; P2, with length L2, is a shortest s-u path; P3, with length L3, is a shortest v-t path; and P4, with length L4, is a shortest u-t path that does not pass through e. Since G is convergent with respect to t, the three paths P1, P3, and P4 must exist. If P2 does not exist, then C(G, s, t, X, Y, Θ(αe)) = L1 and the strong monotonicity trivially holds. We consider the case that P2 exists, as Figure 4.3.1 shows.
Note that L1 ≤ L2 + L4 < c(e) + L2 + L4. If L1 ≤ c(e) + l(e) + L2 + L3, then C(G, s, t, X, Y, Θ(αe)) = L1 and the strong monotonicity trivially holds too. We consider the nontrivial case that L1 > c(e) + l(e) + L2 + L3. In this case, Θ(αe) returns P2 ⊕ e ⊕ P3 in its first plan.

Figure 4.3.1: Analysis of the convergent graph G with a single nondeterministic arc e = (u, v): the paths P1 (length L1), P2 (length L2), P3 (length L3), and P4 (length L4), and the attributes l(e), ρ(e), c(e), X(e), Y(e) of e.

It is easy to find that

C(G, s, t, X, Y, Θ(αe)) =
    c(e) + l(e) + L2 + L3    if Y(e) < αe and X(e) = 0;
    c(e) + L2 + L4           if Y(e) < αe and X(e) = 1;
    L1                       if Y(e) ≥ αe,

which implies that the distribution function of C(G, s, t, X, Y, Θ(αe)) is

HC(c(e) + l(e) + L2 + L3) = (1 − ρ(e))⋅F0(αe);
HC(L1) = (1 − ρ(e))⋅F0(αe) + (1 − ρ(e))⋅[1 − F0(αe)] + ρ(e)⋅[1 − F1(αe)] = 1 − ρ(e)⋅F1(αe);
HC(c(e) + L2 + L4) = 1.

This cost distribution function implies the strong monotonicity, since improving the sensor means that F0(αe) increases and F1(αe) decreases.

We then consider the penalty policy Ψ(l̃) with l̃ = l̃(Y(e), l(e), c(e)) being the penalty function of e. It is crucial to compare L1 with L2 + l̃(Y(e), l(e), c(e)) + L3. Note that l̃(Y(e), l(e), c(e)) > c(e) + l(e). Hence if L1 ≤ c(e) + l(e) + L2 + L3, then C(G, s, t, X, Y, Ψ(l̃)) = L1 and the strong monotonicity trivially holds. We now consider the nontrivial case that L1 > c(e) + l(e) + L2 + L3. It is not hard to find that

C(G, s, t, X, Y, Ψ(l̃)) =
    c(e) + l(e) + L2 + L3    if L1 > L2 + l̃(Y(e), l(e), c(e)) + L3 and X(e) = 0;
    c(e) + L2 + L4           if L1 > L2 + l̃(Y(e), l(e), c(e)) + L3 and X(e) = 1;
    L1                       if L1 ≤ L2 + l̃(Y(e), l(e), c(e)) + L3.
Define α* = inf{α ∈ [0, 1] | L1 ≤ L2 + l̃(α, l(e), c(e)) + L3}. Since l̃ is monotonically increasing in its first argument, L1 ≤ L2 + l̃(Y(e), l(e), c(e)) + L3 if and only if Y(e) ≥ α*. Hence

C(G, s, t, X, Y, Ψ(l̃)) =
    c(e) + l(e) + L2 + L3    if Y(e) < α* and X(e) = 0;
    c(e) + L2 + L4           if Y(e) < α* and X(e) = 1;
    L1                       if Y(e) ≥ α*.

This means that Ψ(l̃) behaves the same as Θ(α*); hence Ψ(l̃) is strongly monotone with respect to the sensor. ∎
5 Traversing Minefield
This chapter is application-oriented: we focus on the problem of traversing a minefield. Unlike the settings in Chapters 3 and 4, for traversing a minefield we are usually given independent markers of some regions in R², and those markers must be properly carried over to the arcs of a graph that discretizes the world. In accordance with the COBRA system, the markers come from sensor readings.
In this chapter, we present the minefield model and an adjusted CR policy for traversing the minefield, and we graphically demonstrate running cases of the adjusted CR policy. After introducing the experimental setting, based on extensive Monte Carlo simulations using the RDP simulation programs we developed, we present numerical and statistical evidence that the adjusted CR policy is both strongly monotone and weakly monotone with respect to the sensor. We show the monotonicity results from both the conditional experiments and the unconditional experiments. We also process a set of potential-mine detection data provided by the COBRA group and show that improving the current quality of detections does improve the quality of the traversals. These monotonicity results form a basis for a cost-benefit analysis regarding the adoption of superior, but presumably more expensive, sensors.
5.1 Minefield Model
According to the COBRA system, we model a minefield as m detected risk centers d1,
d2, …, dm ∈ S ⊂ R2, with each di being either a true detection or a false detection and S
denoting a bounded region. For i = 1, 2, …, m, we let Xi indicate whether di is a true detection, that is, Xi = 1 if di is a true detection and Xi = 0 if di is a false detection. For each risk center di, there is a disk-shaped risk region Di that is centered at di and has radius ri > 0. Suppose a global sensor generates a marker 0 < Yi < 1 for each di; guided by the markers, the agent travels from a starting location s ∈ S ⊂ R² to a target location t ∈ S ⊂ R².
A disambiguation of di, at a cost ci > 0, happens when the agent is right outside Di and about to enter Di, and di has not yet been disambiguated. If the disambiguation discloses that Xi = 1, then di is confirmed to be a true hazard; in this case, the region covered by Di should be forbidden, that is, the marker should be updated to 1. If the disambiguation discloses that Xi = 0, then di should be removed and its marker should be updated to 0, but the region covered by Di may still be questionable, since Di may intersect other risk disks that have not been disambiguated. During the travel, the agent dynamically collects new information through (local) disambiguations and updates the knowledge of the
world. Like the extended marker functions in Chapters 3 and 4, we define the extension of the Yi's as

(5.1.1)
Yi+ =
    0     if di has been disambiguated and Xi = 0;
    1     if di has been disambiguated and Xi = 1;
    Yi    if di has not been disambiguated.

Hence the knowledge of the true-false status of all the di's can be represented by all the Yi+'s.

We discretize S into Z² endowed with the 8-adjacency relation. This is illustrated in Figure 5.1.1, where δ denotes the cell size (or resolution). The grid graph can be viewed as directed, since each edge with end vertices u and v can be viewed as the two directed arcs (u, v) and (v, u).

Figure 5.1.1: The grid representation with 8-adjacency: each vertex vi,j is adjacent to vi±1,j, vi,j±1, vi+1,j±1, and vi−1,j±1, with cell size δ.

To make a shortest path plan in the grid graph, we need to determine the weight function.
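The 8-adjacency neighborhood, with the corresponding Euclidean arc lengths, can be sketched in a few lines of Python (the grid indexing here is hypothetical; the thesis implementation is in Matlab):

```python
import math

def neighbors(i, j, delta=1.0):
    """The 8 grid arcs leaving v_{i,j}, each with its Euclidean length:
    delta for axis-parallel moves, sqrt(2)*delta for diagonal moves."""
    out = []
    for di in (-1, 0, 1):
        for dj in (-1, 0, 1):
            if (di, dj) != (0, 0):
                out.append(((i + di, j + dj), math.hypot(di, dj) * delta))
    return out
```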
The weight of each arc contains three types of information: marker, length, and disambiguation cost. For any arc a = (u, v), let l(a) be the (Euclidean) length, and note that l(a) = δ or √2⋅δ, depending on whether a is axis-parallel or diagonal. As in the previous settings, we use Y+(a) to denote the extended marker, l+(a) to denote the extended length, and c+(a) to denote the extended disambiguation cost.

To define Y+(a), we first define YI, the derived marker of the region covered by ∪_{i∈I} Di, where I ⊆ {1, 2, …, m} is an index set, as

(5.1.2)
YI = 1 − ∏_{i∈I} (1 − Yi+),

with the convention that ∏_{i∈I} (1 − Yi+) = 1 if I is the empty set. For a = (u, v), let Iu = {i | u is covered by Di}, Iv = {i | v is covered by Di}, and Id = {i | di has not been disambiguated}. We define the extended marker of a as

(5.1.3)
Y+(a) = Y_{Iv\Iu}.
Theorem 5.1.1. The extended marker defined as (5.1.3) well characterizes the information on the arc a = (u, v), that is, Y+(a) = 0 if and only if a is known to be traversable; Y+(a) = 1 if and only if a is known to be nontraversable; and 0 < Y+(a) < 1 if and only if a is nondeterministic.
Proof. Note that

Y+(a) = Y_{Iv\Iu} = 1 − [∏_{i ∈ (Iv\Iu)\Id} (1 − Yi+)]⋅[∏_{i ∈ (Iv\Iu)∩Id} (1 − Yi+)].

First, Y_{Iv\Iu} = 1 if and only if ∏_{i ∈ (Iv\Iu)\Id} (1 − Yi+) = 0, which is equivalent to the existence of a j ∈ (Iv\Iu)\Id such that Yj+ = 1, i.e., dj has been disambiguated and Xj = 1; hence Y+(a) = 1 if and only if a is confirmed to be nontraversable. Second, Y_{Iv\Iu} = 0 if and only if ∏_{i ∈ (Iv\Iu)\Id} (1 − Yi+) = 1 and (Iv\Iu) ∩ Id is empty; hence Y+(a) = 0 if and only if a is confirmed to be traversable. Third, 0 < Y_{Iv\Iu} < 1 if and only if ∏_{i ∈ (Iv\Iu)\Id} (1 − Yi+) = 1 and (Iv\Iu) ∩ Id is nonempty; hence 0 < Y+(a) < 1 if and only if a is potentially traversable but nondeterministic. ∎
Once the extended marker Y+(a) is determined, the extended length l+(a) can be defined as

(5.1.4)
l+(a) =
    l(a)    if 0 ≤ Y+(a) < 1;
    +∞      if Y+(a) = 1.

The extended disambiguation cost c+(a) is defined as

(5.1.5)
c+(a) =
    ∑_{i ∈ (Iv\Iu) ∩ Id} ci    if 0 < Y+(a) < 1;
    0                          otherwise.

This formula says that the arc a = (u, v) needs to be disambiguated if and only if it is nondeterministic, and furthermore, that disambiguating a means disambiguating all the undisambiguated risk disks that cover v but not u.
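Definitions (5.1.2) through (5.1.5) can be sketched as follows in Python, representing Iu, Iv, Id as index sets; the data layout, and the simplifying choice l(a) = 1, are assumptions for illustration only:

```python
import math

def arc_attributes(Iu, Iv, Id, Y, cost):
    """Extended marker, length, and cost of an arc a = (u, v).
    Iu/Iv index the disks covering u/v, Id indexes the not-yet-disambiguated
    disks, Y[i] is the extended marker Yi+, cost[i] is ci.  An empty product
    is 1 by convention, so an arc entering no new disk gets marker 0."""
    diff = Iv - Iu                                       # disks newly entered
    Y_a = 1.0 - math.prod(1.0 - Y[i] for i in diff)      # (5.1.2)-(5.1.3)
    l_a = 1.0 if Y_a < 1.0 else math.inf                 # (5.1.4), with l(a)=1
    c_a = sum(cost[i] for i in diff & Id) if 0.0 < Y_a < 1.0 else 0.0  # (5.1.5)
    return Y_a, l_a, c_a
```

For instance, with d1 disambiguated false (Y1+ = 0), d2 undisambiguated (Y2+ = 0.5), and d3 disambiguated true (Y3+ = 1), an arc entering only D1 is known traversable, one entering D2 is nondeterministic with cost c2, and one entering D3 is blocked.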
Let Y⃗ = [Y1 Y2 … Ym]^T and c⃗ = [c1 c2 … cm]^T; then the prior knowledge of the terrain (minefield) can be represented by the tuple (Y⃗, l, c⃗). The updated knowledge of the terrain can be represented by the tuple (Y+, l+, c+), which has the same form as that in section 4.3.

We define the CR weight function for the grid graph as

(5.1.6)
W_CR,Y(a) = l+(a) + c+(a)/(1 − Y+(a))

for all a. We still call this weight function "CR" because we simply replace ρ+(a) in (3.2.1) with Y+(a). Note that W_CR,Y(a) = l(a) if Y+(a) = 0; W_CR,Y(a) = +∞ if Y+(a) = 1; and l(a) < W_CR,Y(a) < +∞ if 0 < Y+(a) < 1. Hence, as mentioned in section 3.2, this CR weight function is well posed with respect to the knowledge of the terrain.
We now present the adjusted CR policy:

Under the knowledge (Y+, l+, c+) of the terrain, find a shortest path relative to W_CR,Y from the agent's current location to the target location t, and let the agent follow the shortest path plan until it reaches t or encounters a nondeterministic arc. In the former case, the navigation process completes successfully; in the latter case, the agent disambiguates the arc by disambiguating all the newly encountered risk disks. The disambiguation results update the knowledge (Y+, l+, c+), and a new shortest path relative to the updated W_CR,Y, from the agent's current location to the target location t, is found for the agent to follow.
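The plan-walk-disambiguate-replan loop can be sketched as follows. This Python toy uses Dijkstra (A* with a zero heuristic) in place of the thesis's A*, a per-arc CR-style weight, and a tiny hand-built graph; the data layout (nodes, length, Y, cost, X dictionaries) is purely illustrative:

```python
import heapq
import math

def shortest_path(nodes, weight, s, t):
    """Dijkstra shortest path; assumes t is reachable from s."""
    dist, prev, done = {s: 0.0}, {}, set()
    heap = [(0.0, s)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == t:
            break
        if u in done:
            continue
        done.add(u)
        for v in nodes[u]:
            w = weight(u, v)
            if w < math.inf and d + w < dist.get(v, math.inf):
                dist[v] = d + w
                prev[v] = u
                heapq.heappush(heap, (dist[v], v))
    path = [t]
    while path[-1] != s:
        path.append(prev[path[-1]])
    return path[::-1]

def traverse(nodes, length, Y, cost, X, s, t):
    """Plan, walk until the first nondeterministic arc, disambiguate at a cost,
    update the marker to the disclosed status, and replan; returns total cost."""
    def weight(u, v):
        y = Y[(u, v)]
        if y >= 1.0:
            return math.inf                                 # known blocked
        return length[(u, v)] + cost[(u, v)] / (1.0 - y)    # CR-style weight
    total, cur = 0.0, s
    while cur != t:
        path = shortest_path(nodes, weight, cur, t)
        for u, v in zip(path, path[1:]):
            if 0.0 < Y[(u, v)] < 1.0:           # nondeterministic arc
                total += cost[(u, v)]           # pay to disambiguate
                Y[(u, v)] = float(X[(u, v)])    # disclosed status: 0 or 1
                cost[(u, v)] = 0.0              # no further cost on this arc
                break                           # replan from cur
            total += length[(u, v)]             # known traversable: walk it
            cur = v
    return total
```

Note that the loop mutates Y and cost in place, mirroring the knowledge update of the policy; callers should pass copies if the prior is to be reused.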
In the minefield model, the updated knowledge of the world makes replanning necessary even if the encountered nondeterministic arc is disclosed to be traversable. Replanning after new discoveries ensures that the agent always follows a shortest path from where it is to t under the updated information. This differs from the previous settings, in which each arc is independently marked and the disambiguation of a nondeterministic arc removes the uncertainty of that arc alone. Under the previous settings, the planned path remains the shortest if the disambiguation of its first nondeterministic arc discloses traversability. Here in the minefield model, however, the disambiguation of an arc also disambiguates other nondeterministic arcs, hence replanning after the discovery of false detections is still needed.
5.2 Experimental Setting and Results
We simulate m = 100 risk disks within [0, 100]×[0, 100]. The locations d1, d2, …, d100 are
independently uniformly drawn from [10, 90]×[10, 90]. For i = 1, 2, …, 100, we set ri =
4.5. Based on this risk disk size, we choose δ = 1, which means the resolution of the grid
is 100×100. We let s = (0, 0) be the starting location and t = (100, 100) be the target
location. This setting is illustrated in Figure 5.2.1.
We simulate the markers from the Beta distribution, which has the probability density function

(5.2.1)
fBeta(x; A, B) =
    [Γ(A + B)/(Γ(A)⋅Γ(B))]⋅x^(A−1)⋅(1 − x)^(B−1)    if 0 < x < 1;
    0                                               otherwise,

where A > 0 and B > 0 are the two parameters and Γ is the Gamma function. In our experiments, for each i = 1, 2, …, 100, we draw Yi | Xi = 0 ~ fBeta(y; 3.5 − λ, 3.5 + λ) and Yi | Xi = 1 ~ fBeta(y; 3.5 + λ, 3.5 − λ), where 0 < λ < 3.5 is the uniparameter. In fact, such parameter
Figure 5.2.1: A collection of m = 100 detections. The little squares are the simulated risk centers; the circles are the boundaries of the simulated risk disks.
setting renders us a discerning (Beta) sensor for each λ. And the larger the value of λ, the
better the sensor. As λ → 0, the sensor approaches “useless”; as λ → 3.5, the sensor
approaches “perfect”.
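A small Python sketch of this Beta sensor follows, using the standard library's betavariate (the thesis simulations are in Matlab). Since the Beta(A, B) mean is A/(A + B), the marker means under Xi = 1 and Xi = 0 are (3.5 + λ)/7 and (3.5 − λ)/7, so their gap 2λ/7 widens as λ grows:

```python
import random

def sample_marker(x, lam, rng):
    """Draw Yi from the Beta sensor: Beta(3.5 - lam, 3.5 + lam) when Xi = 0
    and Beta(3.5 + lam, 3.5 - lam) when Xi = 1, for 0 < lam < 3.5."""
    a, b = (3.5 + lam, 3.5 - lam) if x == 1 else (3.5 - lam, 3.5 + lam)
    return rng.betavariate(a, b)

def mean_gap(lam):
    """Analytic gap between the marker means under Xi = 1 and Xi = 0."""
    return (3.5 + lam) / 7 - (3.5 - lam) / 7   # = 2 * lam / 7
```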
Since all 100 disks have the same size, we assume a constant disambiguation cost Cd > 0 for each di. In our experiments, we use Cd = 2.25. It should be noted that the larger the value of Cd, the more likely it is that the planned path does not traverse any risk disks. We choose Cd = 2.25 so as to observe cases in which disambiguation happens.
Finally, for i = 1, 2, …, 100, we independently draw Xi ~ Bernoulli(0.6).
We apply the A* algorithm to the shortest path subproblems implicit in the adjusted CR policy that uses the CR weight function (5.1.6). The A* algorithm is implemented in its best-first search version, and the Open list is maintained as a binary heap. The algorithm uses the Euclidean distance as the natural heuristic. In our experiments, the A* algorithm, coded in Matlab 7.1, performs efficiently even when many replannings are required.
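The same best-first A* scheme, with the Open list as a binary heap and the Euclidean distance as the heuristic, can be sketched in Python for an 8-adjacent grid of open cells (a simplified stand-in for the Matlab implementation; markers and disambiguation costs are omitted here):

```python
import heapq
import math

def astar_grid(free, s, t):
    """A* on an 8-adjacent grid: `free` is a set of open cells (i, j), the Open
    list is a binary heap (heapq), and the Euclidean distance to t is the
    heuristic; returns the shortest path length, or infinity if unreachable."""
    h = lambda p: math.hypot(p[0] - t[0], p[1] - t[1])
    g = {s: 0.0}
    open_list = [(h(s), s)]
    closed = set()
    while open_list:
        f, u = heapq.heappop(open_list)
        if u == t:
            return g[u]          # Euclidean heuristic is consistent here
        if u in closed:
            continue
        closed.add(u)
        for di in (-1, 0, 1):
            for dj in (-1, 0, 1):
                v = (u[0] + di, u[1] + dj)
                if (di, dj) == (0, 0) or v not in free:
                    continue
                gv = g[u] + math.hypot(di, dj)   # step cost: 1 or sqrt(2)
                if gv < g.get(v, math.inf):
                    g[v] = gv
                    heapq.heappush(open_list, (gv + h(v), v))
    return math.inf
```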
Figures 5.2.2 and 5.2.3 show two trajectory realizations. In Figure 5.2.2, the sensor
Figure 5.2.2: A realization of a trajectory in a real terrain (upper left) and in one of its marked maps (upper right), with two close-up views (lower left and lower right) in the real terrain. The solid circles are real detections; the dotted circles are false detections. The little pluses denote the planned next locations. Sensor parameter λ = 0.5. Total cost: 242.14; traveling cost: 215.14; disambiguation cost: 27. There are 12 disambiguations in total. Total simulation run time on a PC with a Pentium 4 CPU and 1 GB RAM: 21.891 seconds.
Figure 5.2.3: A realization of another trajectory in the same real terrain (upper left) and in one of its marked maps (upper right), with two close-up views (lower left and lower right) in the real terrain. The solid circles are real detections; the dotted circles are false detections. The little pluses denote the planned next locations. Sensor parameter λ = 3.0. Total cost: 174.77; traveling cost: 168.02; disambiguation cost: 6.75. There are 3 disambiguations in total. Total simulation run time on a PC with a Pentium 4 CPU and 1 GB RAM: 10.844 seconds.
parameter is set as λ = 0.5. The upper left plot shows the final trajectory displayed in the real terrain, in which true hazards are represented as solid circles and false detections as dotted circles. The upper right plot shows the same trajectory displayed in one marked map of the same terrain. The marked map is color-coded so that a large sensor reading for a detection produces deep pink (probable danger) and a small sensor reading produces light pink (probable safety). The little pluses in the upper left plot denote the planned next locations. Those locations are not necessarily reached, since each one is either abandoned because it falls inside a disclosed true hazard disk or simply superseded when the new plan differs from the old one. Observe that the little pluses, with close-up views in the two lower plots, indicate which risk disks (besides those along the agent's trajectory) are disambiguated. The case in Figure 5.2.2 shows that bad sensor readings not only lead the agent into dead ends but also result in many unnecessary disambiguations. The final trajectory is long (the length is 215.14) and the total disambiguation cost is high (there are 12 disambiguations with total disambiguation cost 27). In Figure 5.2.3, the sensor parameter is set as λ = 3.0 and the real terrain is the same as that displayed in Figure 5.2.2. Obviously, the final trajectory displayed in Figure 5.2.3 is much better (the length is 168.02) and the total disambiguation cost is low (there are only 3 disambiguations with total disambiguation cost 6.75). Hence we have a vivid example that a better sensor yields a superior traversal.
We seek statistical evidence of strong or weak monotonicity with respect to the marker information via Monte Carlo simulations. We consider 8 values of λ: λ1 = 0.01, λ2 = 0.5, λ3 = 1.0, λ4 = 1.5, λ5 = 2.0, λ6 = 2.5, λ7 = 3.0, λ8 = 3.49.

First, conditioning on the terrain displayed in Figure 5.2.2, denoted as T1, for each i = 1, 2, …, 8 we simulate 400 marked maps using λi. We execute 3200 runs in total, and each run returns a total cost. The plot of the empirical cumulative distribution functions (ECDFs) and the error bar plot of average cost vs. sensor parameter are displayed in Figure 5.2.4. By inspection, both strong monotonicity and weak monotonicity are exhibited. Second, we perform the following "unconditional" experiments: for each i = 1, 2, …, 8, we execute 2500 runs, with each run starting with a randomly generated terrain and an associated simulated marked map under λi. We still maintain the basic settings such as the starting location, the target location, the number of disks, the radii of the disks, the disambiguation cost per disk, and the underlying distribution of the true-false status. Hence the sources of randomness are the terrain (the locations of the detection centers plus the true-false status of the potential hazards) and the markers. There are 20000 runs in total, and each run returns a total cost. The ECDF plot and the error bar plot of average cost vs. sensor parameter are displayed in Figure 5.2.5. Again by inspection, both strong monotonicity and weak monotonicity are exhibited.
Figure 5.2.4: Graphic statistical results of the data from the experiments conditioning on terrain T1. For each i = 1, 2, …, 8, the sample size under λi is 400. Left: plot of ECDFs; Right: error bar plot of average cost vs. sensor parameter
Figure 5.2.5: Graphic statistical results of the data from the unconditional experiments. For each i = 1, 2, …, 8, the sample size under λi is 2500. Left: plot of ECDFs; Right: error bar plot of average cost vs. sensor parameter
We quantify our observation on strong monotonicity by performing one-sided Kolmogorov-Smirnov (KS) tests for pairwise comparisons of the sample distributions. We use F(x|T1, λ) to denote the conditional cost distribution function conditioning on terrain T1 and parameterized by λ, and F(x, λ) to denote the unconditional cost distribution function parameterized by λ. The tests are performed at the 5% significance level. The (asymptotic) p-values are listed in Table 5.2.1. Observe that the small p-values for all pairwise comparisons suggest that F(x|T1, λi) < F(x|T1, λi+1) and F(x, λi) < F(x, λi+1) for i = 1, 2, …, 7. Hence, we have quantified statistical evidence of strong monotonicity from both the conditional and the unconditional experiments.
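For reference, the one-sided two-sample KS statistic behind such tests (for the alternative F1(x) < F2(x)) can be computed directly. The sketch below is stdlib-only and omits the p-value computation, which in practice comes from the asymptotic KS distribution (or a library such as scipy):

```python
def ecdf(sample):
    """Empirical CDF of a sample, returned as a callable."""
    xs = sorted(sample)
    n = len(xs)
    return lambda x: sum(1 for v in xs if v <= x) / n

def ks_stat_less(s1, s2):
    """One-sided KS statistic for the alternative F1(x) < F2(x):
    D = sup_x [F2_hat(x) - F1_hat(x)], attained at a pooled sample point."""
    F1, F2 = ecdf(s1), ecdf(s2)
    return max(F2(x) - F1(x) for x in list(s1) + list(s2))
```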
We quantify our observation on weak monotonicity by performing one-sided t tests for pairwise comparisons of the sample means. We use E(C|T1, λ) to denote the mean
Pair comparison | Alternative hypothesis (conditional, n = 400) | p-value | Alternative hypothesis (unconditional, n = 2500) | p-value
λ1 vs. λ2 | F(x|T1, λ1) < F(x|T1, λ2) | 2.7479e-010 | F(x, λ1) < F(x, λ2) | 6.2707e-005
λ2 vs. λ3 | F(x|T1, λ2) < F(x|T1, λ3) | 0 | F(x, λ2) < F(x, λ3) | 1.6291e-008
λ3 vs. λ4 | F(x|T1, λ3) < F(x|T1, λ4) | 0 | F(x, λ3) < F(x, λ4) | 1.1555e-005
λ4 vs. λ5 | F(x|T1, λ4) < F(x|T1, λ5) | 0 | F(x, λ4) < F(x, λ5) | 2.2300e-007
λ5 vs. λ6 | F(x|T1, λ5) < F(x|T1, λ6) | 0 | F(x, λ5) < F(x, λ6) | 7.6715e-006
λ6 vs. λ7 | F(x|T1, λ6) < F(x|T1, λ7) | 0 | F(x, λ6) < F(x, λ7) | 0.0002
λ7 vs. λ8 | F(x|T1, λ7) < F(x|T1, λ8) | 0 | F(x, λ7) < F(x, λ8) | 0.0383
Table 5.2.1: 1-sided Kolmogorov-Smirnov tests for pairwise comparisons of the distributions of samples. The tests are at 0.05 significance level.
cost conditioning on terrain T1 and under the sensor parameter λ. We use E(C|λ) to denote the unconditional mean cost under the sensor parameter λ. The tests are performed at the 5% significance level. The p-values are listed in Table 5.2.2. Observe that the small p-values for all pairwise comparisons suggest that E(C|T1, λi) > E(C|T1, λi+1) and E(C|λi) > E(C|λi+1) for i = 1, 2, …, 7. Hence, we have quantified statistical evidence of weak monotonicity from both the conditional and the unconditional experiments.
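The t statistic behind such a mean comparison can be sketched as follows. We use the Welch (unequal-variance) form, which is an assumption on our part, since the thesis does not state whether pooled or unpooled variances were used:

```python
import math

def welch_t(s1, s2):
    """Welch two-sample t statistic for the alternative mean(s1) > mean(s2);
    a large positive value favors the alternative (unequal variances assumed)."""
    n1, n2 = len(s1), len(s2)
    m1, m2 = sum(s1) / n1, sum(s2) / n2
    v1 = sum((x - m1) ** 2 for x in s1) / (n1 - 1)
    v2 = sum((x - m2) ** 2 for x in s2) / (n2 - 1)
    return (m1 - m2) / math.sqrt(v1 / n1 + v2 / n2)
```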
Note that if strong (weak) monotonicity holds conditioning on any terrain, then strong
(weak) monotonicity also holds unconditionally. The experimental results introduced so
far strongly suggest unconditional strong (weak) monotonicity. We suspect that if we
condition on terrain structure like the locations of the detection centers and the true-false
status of the potential hazards, strong (weak) monotonicity almost holds, i.e. the
Table 5.2.2: 1-sided t tests for pairwise comparisons of the means of samples. The tests are at 0.05 significance level.
Pair comparison | Alternative hypothesis (conditional, n = 400) | p-value | Alternative hypothesis (unconditional, n = 2500) | p-value
λ1 vs. λ2 | E(C|T1, λ1) > E(C|T1, λ2) | 0.0094 | E(C|λ1) > E(C|λ2) | 7.3382e-005
λ2 vs. λ3 | E(C|T1, λ2) > E(C|T1, λ3) | 0.0018 | E(C|λ2) > E(C|λ3) | 0.0002
λ3 vs. λ4 | E(C|T1, λ3) > E(C|T1, λ4) | 0.0025 | E(C|λ3) > E(C|λ4) | 3.4639e-005
λ4 vs. λ5 | E(C|T1, λ4) > E(C|T1, λ5) | 1.8035e-006 | E(C|λ4) > E(C|λ5) | 2.2253e-009
λ5 vs. λ6 | E(C|T1, λ5) > E(C|T1, λ6) | 0 | E(C|λ5) > E(C|λ6) | 2.1351e-005
λ6 vs. λ7 | E(C|T1, λ6) > E(C|T1, λ7) | 0 | E(C|λ6) > E(C|λ7) | 0
λ7 vs. λ8 | E(C|T1, λ7) > E(C|T1, λ8) | 0 | E(C|λ7) > E(C|λ8) | 2.3210e-008
probability that strong (weak) monotonicity does not hold is almost zero. However, up to now, only one terrain has been conditioned on. We next introduce additional experimental results conditioning on 50 randomly generated (different) terrains. We still maintain the basic settings as in the unconditional experiments. For each fixed terrain, the source of randomness is the markers. Due to limited computing resources, we only consider two values of λ: λ3 = 1.0 and λ6 = 2.5. For each terrain and each of the two values of λ, we simulate 100 marked maps. There are 10000 runs in total, with each run returning a total cost. To compare the conditional performance under λ3 = 1.0 and λ6 = 2.5, we perform both KS tests and t tests on the generated data. We perform all three types of tests: two-tailed, left-tailed, and right-tailed. The p-values of all the tests are listed in Table 5.2.3. The p-values show that if we improve the uniparameter of the (Beta) sensor from λ3 = 1.0 to λ6 = 2.5, then for i = 1, 2, …, 50, both F(x|Ti, λ3) < F(x|Ti, λ6) and E(C|Ti, λ3) > E(C|Ti, λ6) are significant whenever F(x|Ti, λ3) ≠ F(x|Ti, λ6) and E(C|Ti, λ3) ≠ E(C|Ti, λ6) are significant. This supports the conditional strong (weak) monotonicity for almost every terrain.

Based on the experimental results obtained so far, we conjecture that our adjusted CR policy, under the minefield setting and restricted to the particular Beta sensor, is unconditionally weakly monotone and strongly monotone. Also, the monotonicity in the
Entries are p-values. KS test alternative hypotheses: (a) F(x|Ti, λ3) ≠ F(x|Ti, λ6); (b) F(x|Ti, λ3) < F(x|Ti, λ6); (c) F(x|Ti, λ3) > F(x|Ti, λ6). t-test alternative hypotheses: (d) E(C|Ti, λ3) ≠ E(C|Ti, λ6); (e) E(C|Ti, λ3) > E(C|Ti, λ6); (f) E(C|Ti, λ3) < E(C|Ti, λ6).

i of Ti | (a) | (b) | (c) | (d) | (e) | (f)
1 | 0 | 0 | 1 | 0 | 0 | 1
2 | 0 | 0 | 1 | 1.5400E-05 | 7.7000E-06 | 1
3 | 0.9921 | 0.6880 | 1 | 0.0991 | 0.0495 | 0.9505
4 | 2.8500E-06 | 1.4200E-06 | 1 | 1.1500E-07 | 5.7300E-08 | 1
5 | 3.7000E-09 | 1.8500E-09 | 1 | 7.4100E-10 | 3.700E-10 | 1
6 | 1 | 0.9897 | 1 | 0.3197 | 0.1599 | 0.8401
7 | 0.0205 | 0.0102 | 1 | 0.0528 | 0.0264 | 0.9736
8 | 0 | 0 | 0.9593 | 0 | 0 | 1
9 | 1 | 0.9593 | 1 | 0.1583 | 0.0792 | 0.9208
10 | 0.0994 | 0.0497 | 1 | 0.0255 | 0.0128 | 0.9872
11 | 0 | 0 | 1 | 0 | 0 | 1
12 | 0 | 0 | 1 | 0 | 0 | 1
13 | 0 | 0 | 1 | 8.7800E-09 | 4.3900E-09 | 1
14 | 0 | 0 | 1 | 0 | 0 | 1
15 | 0 | 0 | 1 | 0 | 0 | 1
16 | 0 | 0 | 1 | 0 | 0 | 1
17 | 0 | 0 | 1 | 0 | 0 | 1
18 | 0 | 0 | 1 | 0 | 0 | 1
19 | 0 | 0 | 1 | 0 | 0 | 1
20 | 0 | 0 | 1 | 0 | 0 | 1
21 | 0 | 0 | 1 | 0 | 0 | 1
22 | 0.9921 | 0.6880 | 1 | 0.0136 | 0.0068 | 0.9932
23 | 0.0030 | 0.0015 | 1 | 2.2600E-07 | 1.1300E-07 | 1
24 | 0 | 0 | 1 | 0 | 0 | 1
25 | 0.8938 | 0.5144 | 1 | 0.0042 | 0.0021 | 0.9980
26 | 0 | 0 | 1 | 0 | 0 | 1
27 | 1.4700E-09 | 7.3300E-10 | 0.9897 | 1.36E-10 | 0 | 1
28 | 0 | 0 | 1 | 0 | 0 | 1
29 | 1 | 0.8469 | 1 | 0.0459 | 0.0230 | 0.9770
30 | 0 | 0 | 1 | 0 | 0 | 1
31 | 5.2200E-08 | 2.6100E-08 | 1 | 1.9500E-07 | 9.7700E-08 | 1
32 | 0 | 0 | 1 | 0 | 0 | 1
33 | 0 | 0 | 1 | 0 | 0 | 1
34 | 0 | 0 | 1 | 0 | 0 | 1
35 | 0 | 0 | 1 | 0 | 0 | 1
36 | 0 | 0 | 1 | 0 | 0 | 1
37 | 0 | 0 | 1 | 0 | 0 | 1
38 | 0.4431 | 0.2241 | 1 | 0.0004 | 0.0002 | 0.9998
39 | 0 | 0 | 1 | 0 | 0 | 1
40 | 0 | 0 | 1 | 3.2900E-09 | 1.6500E-09 | 1
41 | 0.4431 | 0.2241 | 1 | 0.0127 | 0.0063 | 0.9937
42 | 9.1200E-09 | 4.5600E-09 | 1 | 7.1800E-08 | 3.5900E-08 | 1
43 | 0 | 0 | 1 | 0 | 0 | 1
44 | 9.1200E-09 | 4.5600E-09 | 1 | 4.5400E-08 | 2.2700E-08 | 1
45 | 0 | 0 | 1 | 0 | 0 | 1
46 | 0 | 0 | 1 | 0 | 0 | 1
47 | 0.1400 | 0.0700 | 1 | 0.0002 | 7.6900E-05 | 0.9999
48 | 0 | 0 | 1 | 0 | 0 | 1
49 | 0.0314 | 0.0157 | 1 | 2.7600E-05 | 1.3800E-05 | 1
50 | 0 | 0 | 1 | 0 | 0 | 1

Table 5.2.3: Hypothesis tests for comparing the distributions and means of the samples associated with λ3 = 1.0 and λ6 = 2.5, over 50 randomly generated terrains.
two senses holds when conditioning on almost any terrain structure. Such empirical results provide a basis for a cost-benefit analysis regarding the adoption of superior, but presumably more costly, sensors.
5.3 The COBRA Data
We now present the results of applying our adjusted CR policy to a set of potential-mine
detection data provided by the COBRA group.
The following marked point process realization, referred to in Priebe et al. 1999 [52] and
Olson et al. 2002 [53], has 39 potential mines with x-y coordinates and associated
markers Y listed in Table 5.3.1. The markers were generated by the post-classification
rule in [53]. Each risk disk has radius 50. The later-found true-false status is illustrated in
Figure 5.3.1. The COBRA data uses a coordinate system different from the [0, 100] × [0, 100] square used in our experiments. However, the terrain, displayed in the left plot of Figure 5.3.1, can be proportionally projected into [0, 100] × [0, 100]. The projection yields a projected terrain, displayed in the right plot of Figure 5.3.1. We perform the experiments with the projected terrain, in which the radii of all risk disks become 5.0, and we choose the value of Cd accordingly. To show the trajectories and other results in the original terrain, we simply scale back.
First, as in [1], [2], we select s = (0, 800) and t = (0, 100) in the original terrain. Three
trajectories under Cd = 5, 50, 500 in the real terrain and the marked map are displayed in
x | y | Y | x | y | Y | x | y | Y
321.17 | 158.27 | 0.59017 | -105.75 | 262.2 | 0.25748 | 95.39 | 248.12 | 0.18868
54.23 | 201.12 | 0.54178 | 185.31 | 182.18 | 0.65266 | -78.75 | 396.14 | 0.0731
158.17 | 516.48 | 0.43525 | 116.39 | 110.84 | 0.44124 | -245.28 | 372.05 | 0.52154
215.13 | 428.31 | 0.6189 | -128.6 | 274.12 | 0.62001 | -166.45 | 180.33 | 0.61082
-145.67 | 703.06 | 0.61714 | -61.19 | 345.12 | 0.17183 | -134.53 | 769.27 | 0.19386
-151.01 | 572.15 | 0.56076 | -91.27 | 664.45 | 0.16675 | -258.45 | 641.03 | 0.6567
221.12 | 557.31 | 0.64047 | -82.87 | 248.29 | 0.58308 | 111.6 | 640.1 | 0.56529
-166.36 | 299.42 | 0.49173 | 105.47 | 509.8 | 0.85147 | -219.32 | 313.68 | 0.57449
296.16 | 163.31 | 0.11649 | -19.93 | 568.04 | 0.59937 | -455.72 | 742.57 | 0.63987
163.31 | 186.14 | 0.65636 | -310.23 | 402.92 | 0.65428 | -157.1 | 441.96 | 0.64444
28.31 | 205.03 | 0.15269 | -320.73 | 532.23 | 0.33092 | -242.22 | 321.51 | 0.65655
-79.26 | 709.99 | 0.56085 | -35.11 | 242.61 | 0.1033 | -237.86 | 546.19 | 0.13793
100.4 | 376.47 | 0.51487 | -169.99 | 438.9 | 0.64163 | -269.98 | 379.65 | 0.52802
Table 5.3.1: x, y-coordinates of the risk centers and the associated markers in the COBRA data
Figure 5.3.1: The COBRA terrain (left) and the projected COBRA terrain (right).
Figure 5.3.2: Three trajectories under Cd = 5, 50, 500 displayed in the real terrain (left plots) and in the originally marked map (right plots). When Cd = 5, the total cost is 715, with 3 disambiguations; when Cd = 50, the total cost is 807.99, with one disambiguation; when Cd = 500, the total cost is 1043.3, with no disambiguation. Total simulation run times on a PC with a Pentium 4 CPU and 1 GB RAM in the three cases are 1.609 seconds, 1.906 seconds, and 4.063 seconds, respectively.
Figure 5.3.2. Observe that the larger the value of Cd, the smaller the number of
disambiguations. In all three cases, the respective final costs 715, 807.99, and 1043.3
already attain the minimums (that are realized under perfect markers).
Second, we consider the case in which improvement of the markers yields less cost. This
time we choose s = (−300, 250) and t = (300, 600) in the original terrain, and we choose
Cd = 50. We consider the following improving scheme of the markers:

    Yi′ = λ + (1 − λ)⋅Yi   if Xi = 1;
    Yi′ = (1 − λ)⋅Yi       if Xi = 0,                                    (5.3.1)

with 0 ≤ λ ≤ 1. Note that λ = 0 corresponds to zero improvement, while λ = 1
corresponds to perfect improvement. As λ goes from 0 to 1, the markers go from the Yi's
to perfect markers. We run the experiments over the mesh of values λ = 0.1i, i = 0, 1, …,
10. Three typical cases are λ = 0, λ = 0.4, and λ = 0.5, with respective total costs 951.84,
919.41, and 903.26. The corresponding trajectories in the real terrain and the marked
maps associated with λ = 0, 0.4, 0.5 are displayed in Figure 5.3.3. The plot of total cost
vs. improvement parameter is displayed in Figure 5.3.4. Note that improving the original
markers as λ goes from 0 to 1 improves the traversal taken, and the improvement appears
to be monotone.
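As an illustration, the improving scheme (5.3.1) is straightforward to implement. The following Python sketch (the function name is ours, not part of the dissertation's simulation program) maps a list of markers toward their true statuses:

```python
def improve_markers(X, Y, lam):
    """Improving scheme (5.3.1): push each marker Y_i toward its true
    status X_i. lam = 0 leaves Y unchanged; lam = 1 yields perfect markers."""
    assert 0.0 <= lam <= 1.0
    return [lam + (1.0 - lam) * y if x == 1 else (1.0 - lam) * y
            for x, y in zip(X, Y)]
```

At λ = 1 every marker on a true risk becomes 1 and every marker on a false risk becomes 0, which is exactly the perfect-marker case.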
Figure 5.3.3: Three trajectories under λ = 0, 0.4, 0.5 in the real terrain (left plots) and in the associated marked maps (right plots). Cd = 50. When λ = 0, the total cost is 951.84 with no disambiguation; when λ = 0.4, the total cost is 919.41 with one disambiguation; when λ = 0.5, the total cost is 903.26 with 3 disambiguations. The total simulation run times on a PC with a Pentium 4 CPU and 1 GB of RAM in the three cases are 1.797 seconds, 3.828 seconds, and 5.797 seconds, respectively.
Figure 5.3.4: Plot of total cost vs. improvement parameter for the COBRA runs. The mesh of values of λ is 0.1i, i = 0, 1, …, 10. Starting location: s = (−300, 250); target location: t = (300, 600); disambiguation cost per disk: Cd = 50.
6 Deterministic Shortest Path
All the policies introduced in Chapter 4 and Chapter 5 make shortest path plans that
depend on the mark information, which comes from the sensor's readings. It is
understood, especially from the experimental results so far, that a sensor of bad quality is
unreliable: its misleading information often results in high cost, the unnecessary cost
being due to over-travel and over-disambiguation. A natural idea is to let the agent take a
deterministic shortest path (if one exists) from the starting location to the target location
when the sensor is too bad. Based on this idea, the length of the deterministic shortest
path forms an important criterion for quantifying the quality of a sensor. Once a sensor
classifier is found, any policy mentioned before can be adjusted so that high cost due to
bad sensor quality can be avoided.
In this chapter, we first propose the concept of sensor classification based on the
deterministic shortest path. We then analytically derive a sensor classifier in a simple
case. Finally, based on the results of Monte Carlo simulations of the deterministic
shortest path in a minefield, we empirically derive a classifier for the Beta sensors and
propose an adjustment to the adjusted CR policy introduced in Chapter 5. The goal of
this adjustment is to control the expected cost.
6.1 Sensor Classification
To formulate the concept of sensor classification, we use the same setting and notations
that appear in Chapter 4. Given Y ~ S, the classification rule under the measure of
expectation is for a policy P to use Y if
E[C(G, s, t, X, Y, P)] < Ld
or to ignore Y if
E[C(G, s, t, X, Y, P)] ≥ Ld,
where Ld is the length of the deterministic shortest path.
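When the expected cost can be estimated by simulation, this rule is easy to operationalize. A minimal Python sketch, assuming a user-supplied `simulate_cost` routine (a hypothetical name, not part of the dissertation's simulation program) that draws one realization of C(G, s, t, X, Y, P):

```python
def classify_sensor(simulate_cost, L_d, n_runs=2500):
    """Use Y iff the Monte Carlo estimate of E[C(G, s, t, X, Y, P)]
    falls below L_d, the length of the deterministic shortest path."""
    estimate = sum(simulate_cost() for _ in range(n_runs)) / n_runs
    return estimate < L_d  # True: use Y; False: ignore Y
```

The choice n_runs = 2500 mirrors the number of random terrains used in the minefield experiments of section 6.2.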
We now present an analytical result in a simple case.
Consider a convergent graph G = (V, A, B, l, c, ρ) with |B| = 1, as illustrated in Figure
4.3.1. Let a policy P be either a threshold policy Θ(α) or a penalty policy Ψ(l̃) with the
argument α ∈ [0, 1] and the l̃-penalty function being associated with the single
nondeterministic arc, say e. Note that P either leads to a constant cost L1, which is the
length of the deterministic shortest path, or leads to a random cost of the following form:

C(G, s, t, X, Y, P) =
    c(e) + l(e) + L2 + L3   if Y(e) < α̂ and X(e) = 0;
    L1                      if Y(e) ≥ α̂;
    c(e) + L2 + L4          if Y(e) < α̂ and X(e) = 1,
where α̂ ∈ [0, 1]. We only need to discuss the second case, and note that the expected
cost is

E[C(G, s, t, X, Y, P)]
= (c(e) + l(e) + L2 + L3)⋅P(Y(e) < α̂ and X(e) = 0) + L1⋅P(Y(e) ≥ α̂)
  + (c(e) + L2 + L4)⋅P(Y(e) < α̂ and X(e) = 1)
= (c(e) + l(e) + L2 + L3)⋅(1 − ρ(e))⋅F0(α̂)
  + L1⋅(1 − ρ(e))⋅[1 − F0(α̂)] + L1⋅ρ(e)⋅[1 − F1(α̂)]
  + (c(e) + L2 + L4)⋅ρ(e)⋅F1(α̂)
= L1 + (c(e) + L2 + L4 − L1)⋅ρ(e)⋅F1(α̂) − [L1 − (c(e) + l(e) + L2 + L3)]⋅(1 − ρ(e))⋅F0(α̂).
Hence the classification rule under the measure of expectation is for P to use Y if

(c(e) + L2 + L4 − L1)⋅ρ(e)⋅F1(α̂) < [L1 − (c(e) + l(e) + L2 + L3)]⋅(1 − ρ(e))⋅F0(α̂)

or to ignore Y if

(c(e) + L2 + L4 − L1)⋅ρ(e)⋅F1(α̂) ≥ [L1 − (c(e) + l(e) + L2 + L3)]⋅(1 − ρ(e))⋅F0(α̂).

If F1(α̂) = 0, then the rule reduces to: use Y if F0(α̂) > 0 and ignore Y if F0(α̂) = 0. We
consider the nontrivial case that F1(α̂) > 0, and let r̆(α̂) = F0(α̂) / F1(α̂); then we can
see that the rule becomes comparing the ratio r̆(α̂) with the ratio

rG = [(c(e) + L2 + L4 − L1)⋅ρ(e)] / {[L1 − (c(e) + l(e) + L2 + L3)]⋅(1 − ρ(e))}.

To use Y, we require

r̆(α̂) > rG,                                                              (6.1.1)

which simply says that for policy P, a usable sensor must satisfy a minimum quality
requirement that is specified by the nature of the graph. Note that improving the sensor
S = (F0, F1) means increasing F0(α̂) and decreasing F1(α̂); hence r̆(α̂) becomes larger
and inequality (6.1.1) can be satisfied more easily.
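For concreteness, the closed-form expected cost and the critical ratio rG can be computed directly. The sketch below (the function names are ours, purely illustrative) also lets one check the classification rule numerically:

```python
def expected_cost(c, l, L1, L2, L3, L4, rho, F0a, F1a):
    # E[C] = L1 + (c+L2+L4-L1)*rho*F1(a) - [L1-(c+l+L2+L3)]*(1-rho)*F0(a)
    return (L1 + (c + L2 + L4 - L1) * rho * F1a
            - (L1 - (c + l + L2 + L3)) * (1 - rho) * F0a)

def critical_ratio(c, l, L1, L2, L3, L4, rho):
    # r_G: the sensor must achieve F0(a)/F1(a) > r_G for Y to be worth using
    return (((c + L2 + L4 - L1) * rho)
            / ((L1 - (c + l + L2 + L3)) * (1 - rho)))
```

With the illustrative values c = 1, l = 2, L1 = 10, L2 = 3, L3 = 1, L4 = 10, ρ = 0.5, one finds rG = 4/3; a sensor achieving F0(α̂) = 0.8 and F1(α̂) = 0.5 has ratio 1.6 > rG, and correspondingly the expected cost 9.8 is below L1 = 10.

```python
# (continuing the illustration)
print(critical_ratio(1, 2, 10, 3, 1, 10, 0.5))          # 4/3
print(expected_cost(1, 2, 10, 3, 1, 10, 0.5, 0.8, 0.5)) # 9.8 < L1 = 10
```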
A problem is that the theoretical critical ratio rG is unknown, since ρ(e) is unknown. We
would have to estimate ρ(e) via observations of X(e) in order to apply rG; in practice,
however, there is only one observation of X(e). Analyzing the critical sensor ratio may
therefore require resorting to simulations, which in turn requires that the simulation
model correctly capture the nature of the terrain.
6.2 Applied to Minefield
We now turn to the minefield model in section 5.1 and the experimental setting in
section 5.2. The policy is the adjusted CR policy that uses the CR weight function (5.1.6)
in its shortest path planning. Note that, in the minefield model, the deterministic s-t
shortest path is random, and so is its length.
Under the experimental setting in section 5.2, we randomly generate 2500 terrains and,
for each terrain, we calculate the length of the deterministic shortest path that avoids all
the risk disks. To find a deterministic shortest path, we can simply let the constant
disambiguation cost per disk Cd be sufficiently large (say 450). This setup lets the
adjusted CR policy make only a single plan in each run. Let L̄d denote the expected
length of the deterministic shortest path. As in section 5.2, let E(C|λ) denote the mean
cost under the (Beta) sensor parameter λ. By our experimental results, E(C|λ) is strictly
monotonically decreasing with respect to λ. Hence we may expect E(C|λ) < L̄d if and
only if λ > λ* for some 0 < λ* < 3.5. The comparison between the average cost of
deterministic traversals and the average cost of nondeterministic traversals is illustrated
in Figure 6.2.1. As expected, there is a critical sensor parameter, λ* = 1.8561, which
means: use Yi, i = 1, 2, …, 100 if λ > 1.8561, and ignore those Yi's otherwise.
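The critical parameter λ* can be read off the simulated curve E(C|λ) by locating where it crosses L̄d. A minimal sketch, assuming a decreasing cost curve sampled on the λ-mesh (the function name is ours):

```python
def critical_parameter(lambdas, mean_costs, L_bar_d):
    """Locate lambda* where the decreasing curve E(C|lambda) crosses
    L_bar_d, using linear interpolation between adjacent mesh points."""
    pairs = list(zip(lambdas, mean_costs))
    for (l0, c0), (l1, c1) in zip(pairs, pairs[1:]):
        if c0 >= L_bar_d > c1:
            # interpolate within the bracketing interval [l0, l1]
            return l0 + (l1 - l0) * (c0 - L_bar_d) / (c0 - c1)
    return None  # the curve never crosses L_bar_d on this mesh
```

For example, with mesh costs [190, 180, 160, 150] at λ = 0, 1, 2, 3 and L̄d = 170, the crossing is interpolated at λ* = 1.5.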
Figure 6.2.1: Deterministic shortest paths vs. nondeterministic traversals under the experimental setting of section 5.2. The critical parameter of the Beta sensor is λ* = 1.8561.
7 Summary, Conclusion, and Future Research
7.1 Summary and Conclusions
In this dissertation research, we developed a new formulation of the RDP problem under
the new concept of mark information, together with the associated new RDP algorithms.
Also, for the minefield application, a fast, flexible RDP simulation program that is based
on dynamic A* search was delivered.
We found a new explanation of the A* algorithm from a primal-dual point of view. We
proved the tractability of the problem of traversing a probabilistic graph in the special
case of the parallel graph. We developed the concept of a sensor and the new concept of
mark information based on the sensor's readings. We proposed the class of threshold
policies and the class of penalty policies, both of which consider not only the markers
but also the disambiguation cost. We developed the important concept of sensor
monotonicity and proved some analytical monotonicity results in simple settings. We
developed an RDP simulation program for the minefield model and performed extensive
simulations to provide numerical and statistical evidence of sensor monotonicity under
minefield settings that are beyond the reach of analytical studies. We also made some
experimental comparisons between deterministic shortest paths and nondeterministic
traversals.
Based on the operational features of the COBRA system, our new RDP model, built on
the new concept of mark information, well captures the nature of the navigation and
disambiguation problems posed by the COBRA group. The new formulation also
reflects the trend in the development of modern navigation systems: collecting real-time
information via in-situ observation using technologically enhanced devices such as
UAV-based sensors.
Experimental results show that the new RDP algorithm we developed for the minefield
model is both efficient and effective. A considerable number of running cases show that
our current RDP simulation program can handle one hundred disks well, and the graphic
results appear to be reasonable. More importantly, massive simulations have shown that,
under our adjusted CR policy, the traversal performance has the property of sensor
monotonicity. The simulation tool, thus, is useful for virtually testing multiple sensors
and providing information to support the cost/benefit analysis of whether a superior
(and presumably more expensive) sensor is worth the additional cost.
Although our theoretical analysis is far behind our experimental studies, we did lay the
groundwork for possible future advances. For example, we already know that the
problem of traversing the probabilistic parallel graph, with the advent of the CR policy,
is tractable under the independent probability-marker assumption. We also know that,
for traversing the parallel graph with independently marked nondeterministic arcs, the
class of threshold policies is weakly monotone, and that, for traversing the convergent
graph with only one marked nondeterministic arc, both the class of threshold policies
and the class of penalty policies are strongly monotone. It is natural to extend these
results to more general graphs or more general conditions (e.g., dependence).
7.2 Future Research
We suggest several directions for future research:
The first direction is continual RDP modeling. The goal of this direction is to make new
formulations of the RDP problem more and more realistic. The new concept of mark
information based on the sensor's readings represents a leap in the development of RDP
modeling, but there are also many other issues that should be considered. For example,
modeling a minefield in a coastal area must also take the coastal type into consideration.
A planner must combine the information of the charted minefield with the geographic
information so that the agent not only avoids the mine threats but also detours around
the geographically inaccessible regions. A simulated minefield in a coastal environment
is illustrated in Figure 7.2.1. The terrain is fractally generated using the diamond-square
algorithm of Miller [54]. The future RDP simulation program should be able to interface
with both the detection data and the simulated terrain matrices. As mentioned in
Chapter 1, the target location might change while the agent travels, or, as illustrated in
the right plot of Figure 7.2.1, the target might be a region rather than a point. Hence
attention should also be given to the modeling of the target. A more important concern is
location uncertainty, which is mentioned in [1] as a strong suggestion for the focus of
future endeavors. Various types of naval mines and laying strategies were introduced in
[55], [56]. From the practical point of view, a water minefield pattern changes over time.
For example, some mines drift, either because they are floating mines or because some
moored mines break from their moorings. Usually, a naval ship is also equipped with
short-range scouting helicopters that can provide continual scan information. Therefore,
to match practice, the model with one-time prior knowledge may be replaced by one that
considers multi-stage prior knowledge.

Figure 7.2.1: Example of a minefield setting with coastal geography incorporated. Left: fixed target location; Right: a target region.
The second direction is policy design and simulation building. Once an RDP model is
established, the next steps are the policy design and the building of the simulation
program. Although our current simulation programs can handle 100 disks in R2 well,
their applicability is still very limited. Our suggestions for improvement include
1) When the target is a fixed point in R2 and the grid graph that is used to discretize the
world is large (e.g., a 10^3 × 10^3 subset of Z2 endowed with eight-adjacency), replace
the A* algorithm with the D* algorithm (or the D* Lite algorithm) for replanning, since
D* replanning enables real-time operation.
2) Model the target and enable the simulation program to handle the changing target or
target region. To accelerate the A* planning and replanning when the target is far
away from the starting location, a grid graph with heterogeneous resolutions (e.g.,
quadtree) can be used.
3) Model the changing pattern of the minefield (especially the location uncertainty) and
enable the simulation program to simulate the multi-channel updates on the
knowledge of the world.
4) Design the policies that not only deal with the uncertainty of true-false status but also
deal with the uncertainty of locations. Virtually test various policies and choose those
that have relatively small cost.
5) Enable the simulation program to incorporate the simulated geographic data into the
planning and replanning.
The third direction is purely theoretical investigation. The preliminary analytical results
we have proved in simple cases, together with the experimental results we have obtained
under more general settings, lead us to many conjectures. For example, we conjecture
that the CR policy is strongly monotone with respect to the sensor under a general graph
setting, and this result may hold even without the assumption of independence. We may
continue our efforts toward proving more general monotonicity results. As mentioned in
Chapter 6, the distribution of the deterministic shortest path length under the minefield
setting (in Chapter 5) is very interesting, and we may explore some of its distributional
properties. There are still open questions regarding the formulation of traversing
probabilistic graphs (see Mani et al. [57]). Is the problem of traversing some probabilistic
graph more general than the parallel graph also tractable? How well does the CR policy
perform for traversing general graphs?
Bibliography
[1] C. E. Priebe, D. E. Fishkind, L. Abrams, and C. D. Piatko. Random Disambiguation Paths. Naval
Research Logistics, Vol. 52, pp. 285–292, 2005.
[2] D. E. Fishkind, C. E. Priebe, K. E. Giles, L. N. Smith, and V. Aksakalli. Disambiguation
Protocols Based on Risk Simulation. IEEE Transactions on Systems, Man, and Cybernetics, Part A,
Vol. 37, No. 5, pp. 814–823, 2007.
[3] V. Aksakalli, D. E. Fishkind, C. E. Priebe, and X. Ye. The CR Disambiguation Protocol.
Computers & Operations Research, to appear, 2008.
[4] X. Ye, C. E. Priebe, D. E. Fishkind, and L. Abrams. Sensor Information Monotonicity in
Disambiguation Protocols. Submitted for publication, 2008.
[5] Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest, and Clifford Stein. Introduction to
Algorithms, Second Edition. MIT Press and McGraw-Hill, 2001.
[6] R. K. Ahuja, T. L. Magnanti, and J. B. Orlin. Network Flows: Theory, Algorithms, and
Applications. Prentice Hall, NJ, 1993.
[7] T. Lozano-Perez and M. A. Wesley. An Algorithm for Planning Collision-Free Paths among
Polygonal Obstacles. Communications of the ACM, 22(10), pp. 560-570, 1979.
[8] E. Welzl. Constructing the Visibility Graph for N-Line Segments in O(n2) Time, Information
Processing Letters, 20, pp. 167-171, 1985.
[9] S. K. Ghosh and D. M. Mount. An Output-Sensitive Algorithm for Computing Visibility Graphs.
SIAM Journal on Computing, 20(5), pp. 888-910, 1991.
[10] M. L. Fredman and R. E. Tarjan. Fibonacci Heaps and Their Uses in Improved Network
Optimization Algorithms. Journal of the Association for Computing Machinery, 34(3), pp.
596-615, 1987.
[11] R. Ahuja, K. Mehlhorn, J. B. Orlin, and R. E. Tarjan. Faster Algorithms for the Shortest Path
Problem. Journal of the Association for Computing Machinery, 37(2), pp. 213-223, 1990.
[12] J. S. B. Mitchell. Shortest Paths among Obstacles in the Plane. In Proceedings of the 9th ACM
Symposium on Computational Geometry, pp. 308-317, 1993.
[13] M. Bern, D. Eppstein, and J. R. Gilbert. Provably Good Mesh Generation. In Proceedings of the
31st IEEE Symposium on Foundations of Computer Science, pp. 231-241, 1990.
[14] J. Hershberger and S. Suri. Efficient Computation of Euclidean Shortest Paths in the Plane,
Proceedings. 34th IEEE Annual Symposium on Foundations of Computer Science, pp. 508-517,
1993.
[15] D. Z. Chen. Developing Algorithms and Software for Geometric Path Planning Problems. ACM
Computing Surveys, 28 (4), Article No. 18, 1996.
[16] J.-C. Latombe. Robot Motion Planning. Kluwer Academic Publishers, Norwell, MA, 1991.
[17] Y. K. Hwang and N. Ahuja. Gross Motion Planning: A Survey. ACM Computing Surveys, 24(3),
pp. 219-291, 1992.
[18] B. Grunbaum and G. C. Shephard. Tilings and Patterns. W. H. Freeman and Company, New York,
NY, 1986.
[19] D. Chavey. Tilings by Regular Polygons — II: A Catalog of Tilings. Computers & Mathematics
with Applications 17: 147–165, 1989.
[20] A. Patel. Amit's Thoughts on Grids. 2006. http://www-cs-students.stanford.edu/~amitp/game-programming/grids/
[21] H. Samet. An Overview of Quadtrees, Octrees, and Related Hierarchical Data Structures. NATO
ASI Series, Vol. F40, 1988.
[22] S. Kambhampati, L. Davis. Multiresolution Path Planning for Mobile Robots. IEEE Journal of
Robotics and Automation, 2(3), pp. 135- 145, 1986.
[23] D. Z. Chen, R. J. Szczerba, and J. J. Uhran Jr. A Framed-Quadtree Approach for Determining
Euclidean Shortest Paths in a 2-D Environment. IEEE Transactions on Robotics and Automation,
13(5), pp. 668-681, 1997.
[24] A. Yahja, A. Stentz, S. Singh, and B. Brummit. Framed-Quadtree Path Planning for Mobile
Robots Operating in Sparse Environments. In Proceedings, IEEE Conference on Robotics and
Automation (ICRA), Leuven, Belgium, May 1998.
[25] P. E. Hart, N. J. Nilsson, and B. Raphael. A Formal Basis for the Heuristic Determination of
Minimum Cost Paths. IEEE Transactions on Systems Science and Cybernetics, SSC4 (2), pp.
100–107, 1968.
[26] N. J. Nilsson. Principles of Artificial Intelligence. Morgan Kaufmann, San Mateo, California,
1980.
[27] J. Pearl. Heuristics: Intelligent Search Strategies for Computer Problem Solving. Addison-Wesley,
1984.
[28] S. J. Russell and P. Norvig. Artificial Intelligence: A Modern Approach. Pearson Education,
2003.
[29] P. Lester. 2005. A* Pathfinding for Beginners. http://www.policyalmanac.org/games/aStarTutorial.htm
[30] A. Patel. 2006. Amit’s A* pages. http://theory.stanford.edu/~amitp/GameProgramming/
[31] R. E. Korf. Depth-First Iterative-Deepening: An Optimal Admissible Tree Search. Artificial
Intelligence, 27 (1), pp. 97–109, 1985.
[32] P. P. Chakrabarti, S. Ghose, A. Acharya and S. C. De Sarkar. Heuristic Search in Restricted
Memory. Artificial Intelligence, 41(2), pp. 197-221, 1989.
[33] S. J. Russel. Efficient Memory-Bounded Search Methods. In ECAI 92: 10th European
Conference on Artificial Intelligence Proceedings, pp. 1-5, Vienna, Austria. Wiley, 1992.
[34] R. E. Korf. Linear-Space Best-First Search. Artificial Intelligence, 62(1): pp. 41-78, 1993.
[35] R. E. Korf and W. Zhang, Frontier Search, Journal of the Association for Computing Machinery,
52 (5), pp. 715-748, 2005.
[36] A. Zelinsky. A Mobile Robot Exploration Algorithm. IEEE Transactions on Robotics and
Automation, 8 (6), December, 1992.
[37] A. Stentz. Optimal and Efficient Path Planning for Partially-Known Environments. In
Proceedings of the IEEE International Conference on Robotics and Automation, May 1994.
[38] A. Stentz. The Focussed D* Algorithm for Real-Time Replanning. In Proceedings of the
International Joint Conference on Artificial Intelligence (IJCAI), 1995.
[39] S. Koenig and M. Likhachev. D* Lite. Proceedings of the Eighteenth National Conference on
Artificial Intelligence (AAAI), pp. 476-483, 2002.
[40] S. Koenig and M. Likhachev. Incremental A*. Advances in Neural Information Processing
Systems 14 (NIPS), MIT Press, Cambridge, MA, 2002.
[41] G. Ramalingam and T. Reps. An incremental algorithm for a generalization of the shortest-path
problem. Journal of Algorithms, 21, pp. 267–305, 1996.
[42] C. H. Papadimitriou and K. Steiglitz. Combinatorial Optimization, Dover Publications, INC.,
1998.
[43] I. Pohl. Bidirectional Search, in Machine Intelligence 6, B. Meltzer and D. Michie, Eds.,
American Elsevier, New York, pp. 127-140, 1971.
[44] A. Bar-Noy and B. Schieber. The Canadian Traveller Problem. SODA ’91: Proceedings of the
Second Annual ACM-SIAM Symposium on Discrete Algorithms, 1991.
[45] C.H. Papadimitriou and M. Yannakakis. Shortest Paths without A Map, Theoretical Computer
Science, 84, pp. 127–150, 1991.
[46] D.M. Blei and L.P. Kaelbling. Shortest Paths in a Dynamic Uncertain Domain. IJCAI Workshop
on Adaptive Spatial Representations of Dynamic Environments, 1999.
[47] M. Likhachev, G. Gordon, and S. Thrun. Planning for Markov Decision Processes with Sparse
Stochasticity. In Lawrence K. Saul, Yair Weiss, and Léon Bottou, Editors, Advances in Neural
Information Processing Systems, MIT Press, Cambridge, MA, 2005.
[48] G. Andreatta and L. Romeo. Stochastic Shortest Paths with Recourse. Networks, 18, pp. 193–204,
1988.
[49] G. H. Polychronopoulos and J. N. Tsitsiklis. Stochastic Shortest Path Problems with Recourse.
Networks, 27, pp. 133–143, 1996.
[50] J. S. Provan. A Polynomial-Time Algorithm to Find Shortest Paths with Recourse. Networks,
42, pp. 115–125, 2003.
[51] M. Shaked and J. G. Shanthikumar. Stochastic Orders and Their Applications. Academic Press,
1994.
[52] C.E. Priebe, J.S. Pang, and T. Olson. Optimizing Sensor Fusion for Classification Performance.
Proceedings of the CISST 1999 International Conference, pp. 397–403, 1999.
[53] T. Olson, J. S. Pang, and C. E. Priebe. A Likelihood-MPEC Approach to Target Classification.
Mathematical Programming, 96, pp.1–31, 2002.
[54] G. Miller. The Definition and Rendering of Terrain Maps. Computer Graphics, 20 (4), pp. 9-48,
1986.
[55] G. K. Hartmann and S. C. Truver. Weapons That Wait: Mine Warfare in the U.S. Navy. Naval
Institute Press, Annapolis, 1991.
[56] A. Washburn. Mine Warfare Models. Naval postgraduate school notes, September 2007.
[57] M. Mani, A. Zelikovsky, G. Bhatia, and A. B. Kahng. Traversing Probabilistic Graphs. Technical
Report, University of California at Los Angeles, 1999.