
Finding Optimal Travel Routes with Uncertain Cost Data

Denis J. Dean

School of Economic, Political and Policy Sciences, University of Texas at Dallas

Abstract

Geospatial data analysis techniques are widely used to find optimal routes from specified starting points to specified destinations. Optimality is defined in terms of minimizing some impedance value over the length of the route – the value to be minimized might be distance, travel time, financial cost, or any other metric. Conventional analysis procedures assume that impedance values of all possible travel routes are known a priori, and when this assumption holds, efficient solution strategies exist that allow truly optimal solutions to be found for even very large problems. When impedance values are not known with certainty a priori, exact solution strategies do not exist and heuristics must be employed. This study evaluated how the quality of the solutions generated by one such heuristic was impacted by the nature of the uncertainty in the cost database, the nature of the costs themselves, and the parameters used in the heuristic algorithm. It was found that all of these factors influenced the quality of the solutions produced by the heuristic, but encouragingly, an easily controlled parameter of the heuristic algorithm itself played the most important role in controlling solution quality.

1 Introduction

Finding optimal travel routes is arguably one of the most common contemporary applications of geospatial information science. Web sites such as Google Maps and MapQuest have made routefinding one of the key components of their services, and many consumer GPS devices include routefinding capabilities as part of their basic functionality. One common factor in these systems is a lack of uncertainty; each of the applications just mentioned assumes that all pertinent information impacting the routefinding problem is known a priori and with complete certainty. Under many circumstances this is not an unreasonable assumption, but it is easy to find situations where this assumption fails to approximate reality.

Uncertain routefinding problems occur whenever a priori information about potential routes is incomplete (e.g. an automobile navigation system may not contain up-to-date information regarding construction projects), inaccurate (e.g. a robotic navigation system may indicate that an obstacle is in a certain location, but it might have been moved), or insufficiently detailed (e.g. a cross-country routefinding problem might be based upon a terrain database whose resolution is not fine enough to identify small features that impact routefinding decisions). Regardless of its cause, uncertainty renders a priori solutions to routefinding problems suspect, because there is no guarantee that such solutions will be optimal or even feasible.

Solving routefinding problems under uncertainty is a nontrivial undertaking. In most cases, exact solutions do not exist, so the problem is approached using heuristics. Details of these heuristics vary, based on the particular problems they are designed to address. However, the majority of uncertain routefinding problems that must be solved in real time – including navigation by autonomous or semi-autonomous robots, in-vehicle navigational aids facing uncertain traffic conditions, routefinding by autonomous automobiles, and so forth – are addressed using variations of a technique referred to here as the Route Finding with Periodic Revision (or RFPR) approach. RFPR is an iterative technique that plans a route with the best data available, moves a certain distance along that route, then uses inputs from sensors (in the case of autonomous robots), drivers (in the case of in-vehicle navigation aids) or other sources to reevaluate and revise its route. Obviously, the effectiveness of this sort of algorithm will be influenced by factors such as the quality and level of detail present in the a priori database, the range and accuracy of the sensing systems used, and the frequency with which reevaluations and revisions are made. The purpose of this article is to evaluate how these and other factors impact RFPR solutions. A Monte Carlo simulation approach was used to compare RFPR routes to optimal routes under various conditions of initial data quality/level of detail, robot sensor characteristics, and update frequency. Analysis of these comparisons reveals that the quality of RFPR routes is influenced by each of these factors, but reevaluation frequency is the single most influential factor.

Address for correspondence: Denis J. Dean, University of Texas at Dallas, Mail Stop GR31, 800 West Campbell Road, Richardson, TX 75080-3021, USA. E-mail: [email protected]

Research Article Transactions in GIS, 2013, 17(2): 159–181

© 2012 Blackwell Publishing Ltd doi: 10.1111/j.1467-9671.2012.01360.x

2 Literature Review

There is a very substantial body of literature concerning uncertainty in optimal routefinding. Complete evaluation of all of this literature is impossible in the space available, but some of the more pertinent themes will be reviewed.

Much of the literature formulates the pathfinding problem using a network approach, and addresses the problem using graph theory. In GIS, this is implemented using a vector line database in arc-node format (Burrough and McDonnell 1998). Arcs represent possible travel paths and nodes represent decision points where two or more paths converge. Each arc (and sometimes each node) is assigned a weight (referred to as an impedance) that reflects how difficult it is to travel the length of the arc or pass through the node. In this approach, the problem of finding an optimal path boils down to identifying the set of arcs and nodes that: (1) connects the starting and ending nodes; and (2) results in the lowest total impedance of all sets that meet criterion (1). In the deterministic case, this problem is solved using Dijkstra's algorithm (Dijkstra 1959).
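As a minimal sketch of the deterministic case just described, the following computes the least-total-impedance path over an arc-node network stored as an adjacency list. The network, node names, and impedance values are illustrative, not drawn from the paper:

```python
import heapq

def dijkstra(graph, start, target):
    """Least-total-impedance path over an arc-node network.

    graph maps node -> list of (neighbor, arc_impedance) pairs.
    Returns (total_impedance, path), or (inf, []) if unreachable.
    """
    best = {start: 0.0}
    prev = {}
    heap = [(0.0, start)]
    visited = set()
    while heap:
        cost, node = heapq.heappop(heap)
        if node in visited:
            continue
        visited.add(node)
        if node == target:
            # Reconstruct the path by walking predecessor links backwards.
            path = [node]
            while path[-1] != start:
                path.append(prev[path[-1]])
            return cost, path[::-1]
        for nbr, imp in graph.get(node, ()):
            new_cost = cost + imp
            if new_cost < best.get(nbr, float('inf')):
                best[nbr] = new_cost
                prev[nbr] = node
                heapq.heappush(heap, (new_cost, nbr))
    return float('inf'), []

# Hypothetical four-node network with two candidate routes from A to D.
network = {
    'A': [('B', 2.0), ('C', 5.0)],
    'B': [('D', 4.0)],
    'C': [('D', 2.0)],
    'D': [],
}
```

Here `dijkstra(network, 'A', 'D')` selects the A-B-D route (total impedance 6.0) over the costlier A-C-D alternative.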

Uncertainty is introduced into this model by treating impedance values as uncertain. In this approach, the network structure (i.e. the number and connectivity of arcs and nodes) remains deterministic, but the impedance associated with the arcs and/or nodes becomes uncertain. There is no universally accepted procedure for addressing this sort of problem. Chen and Ji (2005), Fan and Nie (2006) and Miller-Hooks (2001), among many others, have addressed this problem using a variety of heuristic techniques, including adaptive systems and genetic approaches. Rilett and Park (2001), among others, have extended the basic uncertain routefinding problem to include multiple criteria to judge optimality. For example, the optimality of a route along roads may be judged not only on its traverse time, but also on the number of intersections encountered along the route (e.g. simpler routes with fewer intersections are more optimal than routes with more intersections). Multicriteria problems are outside the scope of this study, so will not be discussed here.

Another area where pathfinding problems have generated a substantial body of literature concerns movement of autonomous or semi-autonomous robots. In this context, the problem calls for a robot to plan a route to a desired destination under the constraint that the route minimize one or more forms of risk – e.g. the risk to the robot posed by terrain where it might get damaged or stuck, the risk to the robot's surroundings posed by collisions while the robot is underway, and so forth. This sort of problem is increasingly common in real-world industrial situations, where robots maneuver around humans, equipment, and each other on assembly lines, in transportation and warehousing environments, and so on.

Unlike the models discussed previously, in robot pathfinding the environment is typically represented using raster databases (Schultz et al. 1999). In GIS, raster-based pathfinding problems are usually addressed using cost spreading (Dean 1997). Cost spreading is perhaps most easily envisioned as a form of dynamic programming (Smith 1989, Huriot et al. 1989), but it can also be shown to be equivalent to the network approach where a node falls at the center of each raster cell, and each node is connected by arcs to the nodes in the centers of neighboring cells (Xu and Lathrop 1995). Cost spreading is computationally intensive, although some authors have developed techniques that dramatically improve the efficiency of conventional cost spreading algorithms (Douglas 1994). When the raster model is viewed as a network, uncertainty in pathfinding is handled as discussed previously: a network representation of the raster database is constructed, and impedance values along this network are treated stochastically (Hu and Brady 1997).
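The raster-as-network equivalence can be made concrete: each cell center becomes a node joined by directed arcs to its (up to eight) neighbors. A minimal sketch, not the paper's implementation, which enumerates those arcs for a grid of a given size (in practice each arc would carry an impedance derived from the DEM gradient between the two cell centers):

```python
def raster_to_arcs(nrows, ncols):
    """Enumerate directed arcs of the 8-neighbor graph implied by a raster.

    Each arc is a ((row, col), (nrow, ncol)) pair; arcs leaving the
    raster's edge are simply omitted.
    """
    offsets = [(-1, -1), (-1, 0), (-1, 1),
               (0, -1),           (0, 1),
               (1, -1),  (1, 0),  (1, 1)]
    arcs = []
    for r in range(nrows):
        for c in range(ncols):
            for dr, dc in offsets:
                nr, nc = r + dr, c + dc
                if 0 <= nr < nrows and 0 <= nc < ncols:
                    arcs.append(((r, c), (nr, nc)))
    return arcs
```

For a 3 x 3 raster this yields 40 directed arcs: the four corner cells have three neighbors each, the four edge cells five, and the center cell eight.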

3 Case Study Framework

Perhaps no application of real-time uncertain routefinding captures the imagination quite so powerfully as does the case of robot navigation on the surface of Mars. In 2004, NASA (the National Aeronautics and Space Administration; the U.S. government's civilian space agency) landed two robotic rovers at widely dispersed locations on the Martian surface. The two rovers, named Spirit and Opportunity, were designed to operate for 90 Martian days, but defied all expectations and, as of this writing, one continues to move across the Martian surface (albeit with some limitations due to hardware failures) seven Earth years after landing on the red planet (Soderblom et al. 2008). These two rovers were joined by a third, larger rover (named Curiosity) in 2012.

The rovers navigate primarily via choreographed movements sent verbatim from human controllers based on Earth, but they can employ autonomous techniques based on onboard pathfinding software. Manual navigation is considered the safest way to maneuver the rovers, but is very slow, especially in light of the fact that one-way radio signals from Earth take between 3 and 20 minutes to reach the rovers (depending on the relative positions of Earth and Mars in their respective orbits around the Sun). Automated routefinding is unquestionably faster, but increases the risk of the rovers getting mired in non-traversable terrain or being damaged or destroyed by moving into hazardous locations (Arvidson et al. 2008, Maurette 2003).

The remainder of this paper is inspired by the robotic cross-country route finding scenario faced by these Mars rovers. Thus, the routefinding problem discussed here involves a robot navigating cross country through a static terrain (i.e. the obstacles that the robot encounters do not move; they remain fixed within the landscape) that is imperfectly known (i.e. the general nature of the terrain is known a priori through an imperfect DEM, but the details of the landscape can only be determined by inspection from the robot's sensors). It must be recognized that other scenarios are possible. For example, consider an industrial situation where a robot is required to move about in a manufacturing facility. The overall layout of this facility may well be known a priori and with absolute certainty (i.e. the a priori database may accurately describe the dimensions of the floor space, the location of fixed machinery, and so forth), but the locations of dynamic obstacles like people and other robots moving through the space will remain unknown until they are detected in real time by the robot. Thus, the results presented here pertain to a single type of real-time uncertain routefinding problem, and may or may not be generalizable to other variations of this problem.

The current generation of robots typically update their databases using information gleaned from optical devices or sonar, radar, or LiDAR sensors (Angelova et al. 2007, Chakravorty and Junkins 2007, Olson et al. 2003). All sensor systems have limited fields of view and/or ranges, so the information they supply must have a limited geographic scope. This means that as a robot moves through its environment, it can only gather information to update that portion of its model of its surroundings that can be observed by its sensors. Furthermore, when using even a moderately sophisticated sensor and a model of the environment of modest size, the amount of time needed to interpret the sensor data, update the network information, and solve the new shortest path problem is not insignificant. Therefore, continuous scanning, interpreting, updating and route finding is frequently impractical; model updating and subsequent route revision can only take place at selected times. Thus, a practical methodology for robot movement (and the methodology which will be simulated in this case study) involves the following six steps:

1. Remain stationary (or if currently in motion, come to a stop); scan the portion of the environment that can be imaged from the robot's current location,
2. Interpret the sensor data,
3. Use the interpreted sensor data to update the model of the surrounding environment,
4. Solve a shortest path problem from the robot's current location to the desired target using the current model of the environment,
5. Move a certain distance along the path from step (4) until either (a) the target is reached (in which case the pathfinding process terminates), or (b) some predefined stopping criterion (which prevents the robot from assuming too much risk by moving through its uncertain environment) is reached, and
6. Return to step (1).
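The six-step loop can be sketched on a toy grid world. In this illustration (all names hypothetical), a 4-neighbor breadth-first search stands in for the anisotropic cost-spreading solver, a fixed Manhattan sensing radius stands in for the sensor model, and the robot moves a fixed stride between replans:

```python
from collections import deque

def bfs_path(known_obstacles, start, target, shape):
    """Step 4: shortest path on the current model (4-neighbor BFS)."""
    nrows, ncols = shape
    prev = {start: None}
    queue = deque([start])
    while queue:
        cell = queue.popleft()
        if cell == target:
            path = [cell]
            while prev[path[-1]] is not None:
                path.append(prev[path[-1]])
            return path[::-1]
        r, c = cell
        for nxt in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            if (0 <= nxt[0] < nrows and 0 <= nxt[1] < ncols
                    and nxt not in known_obstacles and nxt not in prev):
                prev[nxt] = cell
                queue.append(nxt)
    return None  # target unreachable in the current model

def rfpr(true_obstacles, start, target, shape,
         sensor_range=2, stride=2, max_iters=1000):
    """Steps 1-6: sense, update the model, replan, move a bounded distance."""
    pos, known = start, set()
    for _ in range(max_iters):
        if pos == target:
            return True, pos                       # success
        # Steps 1-3: scan; fold detected obstacles into the model.
        known |= {o for o in true_obstacles
                  if abs(o[0] - pos[0]) + abs(o[1] - pos[1]) <= sensor_range}
        path = bfs_path(known, pos, target, shape)  # step 4
        if path is None:
            return False, pos                       # no feasible route
        for step in path[1:stride + 1]:             # step 5
            if step in true_obstacles:
                return False, pos   # ran into an undetected obstacle
            pos = step
            if pos == target:
                break
        # Step 6: loop back and rescan from the new position.
    return False, pos
```

On an obstacle-free 5 x 5 grid the loop reaches the target; if the target is sealed off by known obstacles, the planner reports infeasibility instead.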

Note that while this case study was inspired by the Mars rovers, it is also applicable closer to home. Cross-country navigation by Autonomous Land Vehicles (ALVs) is an active research area in the field of robotics (Green and Kelly 2011, Kelly et al. 2006, Wellington et al. 2006). On some level, conventional in-vehicle navigational aids face this problem when it is determined that routes identified in their databases are no longer passable due to construction, accidents, and so forth. Even humans navigating cross country across unfamiliar terrain employ techniques similar to those encountered here; i.e., we plan a general route using a map that provides a general overview of the terrain, but we revise that route as obstacles not shown on the map come into view.

4 Methodology

4.1 General Approach

This study employed a Monte Carlo simulation approach. The simulator was designed to investigate the impact of the following factors on the performance of the RFPR algorithm:

1. The roughness of the terrain;
2. The accuracy of the initial terrain model and the distributional characteristics of whatever error is present in the model;
3. The length of the route being evaluated;
4. The characteristics of the sensors (range and accuracy); and
5. The criteria used to determine how far along its planned path the robot moves prior to re-evaluating its path.

A computerized system was developed to conduct this Monte Carlo analysis using a combination of ArcObjects (Esri 2010), Python scripts (Rossum and Drake 2006) and Visual Basic programs (Microsoft Corporation 2010). Parameters designed to reflect each of the five factors just listed were entered as inputs into this system, which used them to: (1) develop a simulated error-free database that could be used to find a truly optimal route from some arbitrary starting point within the database to some other arbitrary target point; (2) artificially introduce a controlled amount of error (with known distributional characteristics) into the database; and (3) use the RFPR method to find the route from the same starting point to the same target point through the database containing introduced error. As output, the simulator produced a series of comparisons between the optimal and RFPR routes. This simulation process was used repeatedly to generate a database containing multiple route comparisons. A flowchart showing the general operation of the simulation system is shown in Figure 1.

The simulation starts by creating four congruent raster databases. The first is a synthetic DEM featuring a controlled amount of spatial autocorrelation. DEMs featuring high autocorrelation reflect gentle terrain, because individual raster cells tend to contain elevation values similar to those of neighboring cells. Conversely, low levels of spatial autocorrelation produce highly convoluted terrain, because individual cell values tend to be dissimilar from their neighbors. Synthetic DEMs were created using the MidPoint Displacement Method (MPDM), described in detail by Saupe (1988). The amount of autocorrelation in a database produced by the MPDM is controlled using a parameter denoted as h; this ranges from zero to one, with low h values producing databases with little autocorrelation. However, since the MPDM is stochastic, the actual amount of autocorrelation present in any given MPDM database will vary; e.g. two MPDM databases produced using identical h values will likely have somewhat different levels of actual autocorrelation. Thus, h is best thought of as a measure of the targeted amount of spatial autocorrelation; the actual amount of autocorrelation present in any MPDM database must be measured empirically.

Figure 1 General flowchart of the Monte Carlo simulation process. The explanatory variables listed here as inputs are described in more detail in Tables 1 and 2
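Saupe (1988) gives the exact MPDM formulation; as an illustrative stand-in, the closely related diamond-square variant of midpoint displacement shows how the roughness parameter h works. At each subdivision the random displacement is scaled by 2^(-h), so high h yields smooth, highly autocorrelated surfaces and low h yields rough ones. All function and parameter names here are hypothetical:

```python
import random

def midpoint_displacement(n, h, rng=random.Random(0)):
    """Square synthetic DEM of size (2**n + 1) via diamond-square
    midpoint displacement. h in [0, 1] controls autocorrelation."""
    size = 2 ** n + 1
    dem = [[0.0] * size for _ in range(size)]
    # Seed the four corners with random elevations.
    for r in (0, size - 1):
        for c in (0, size - 1):
            dem[r][c] = rng.uniform(-1.0, 1.0)
    scale, step = 1.0, size - 1
    while step > 1:
        half = step // 2
        # Diamond step: center of each square = mean of corners + noise.
        for r in range(half, size, step):
            for c in range(half, size, step):
                mean = (dem[r - half][c - half] + dem[r - half][c + half] +
                        dem[r + half][c - half] + dem[r + half][c + half]) / 4.0
                dem[r][c] = mean + rng.uniform(-scale, scale)
        # Square step: edge midpoints = mean of in-bounds neighbors + noise.
        for r in range(0, size, half):
            for c in range((r + half) % step, size, step):
                nbrs = [dem[rr][cc]
                        for rr, cc in ((r - half, c), (r + half, c),
                                       (r, c - half), (r, c + half))
                        if 0 <= rr < size and 0 <= cc < size]
                dem[r][c] = sum(nbrs) / len(nbrs) + rng.uniform(-scale, scale)
        scale *= 2 ** (-h)   # higher h shrinks noise faster: smoother terrain
        step = half
    return dem
```

Because the displacements are random, two surfaces built with the same h will differ in their measured autocorrelation, which is the point the paragraph above makes about h being a target rather than a guarantee.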

The databases produced by the MPDM were rescaled so that the values they contained ranged from 1 to 10. This was done to ensure consistency; the stochastic nature of the MPDM algorithm ensured that the range of values it produced varied from iteration to iteration. By standardizing the range of synthetic DEM values, the relative magnitudes of the elevation values and error values (to be described shortly) could be made comparable between iterations.
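The rescaling step is a straightforward linear transformation; a minimal sketch (function name hypothetical):

```python
def rescale(values, lo=1.0, hi=10.0):
    """Linearly rescale a flat list of DEM values into [lo, hi]."""
    vmin, vmax = min(values), max(values)
    return [lo + (v - vmin) * (hi - lo) / (vmax - vmin) for v in values]
```

For example, `rescale([3.0, 7.0, 5.0])` maps the minimum to 1.0, the maximum to 10.0, and the midpoint value to 5.5, preserving relative magnitudes.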

Synthetic DEMs were also subjected to hydrologic fill analyses designed to ensure that the landscapes they represented did not contain basins from which no drainage was possible (Tarboton et al. 1991). This was done to: (1) ensure that the resulting landscapes were more intuitively realistic than would otherwise be the case; and (2) ensure that the resulting synthetic DEMs contained identifiable drainages and ridges, which (as shall be seen shortly) was necessary for one form of DEM error investigated in this study.

The second database created by the simulator represented point obstacles that had to be avoided but would not be reflected in the DEM. An example might be an area of deep, soft sand in which a robot might become mired. This database was binary; each cell was simply classified as either being an obstacle or not. This database was created by initially assigning all cells to the "not obstacle" category, and then iteratively picking random cells (without replacement, so that no cell could be picked more than once) to place in the "obstacle" category. This process terminated when a specified number of obstacle cells were created. The process was uniform; i.e. all cells had an equal probability of being placed into the obstacle category.
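Uniform sampling without replacement maps directly onto `random.sample`; a minimal sketch of the obstacle-raster construction (names hypothetical):

```python
import random

def place_obstacles(nrows, ncols, n_obstacles, rng=random.Random(0)):
    """Binary obstacle raster: n_obstacles distinct cells drawn uniformly.

    random.sample draws without replacement, so no cell is picked twice.
    """
    grid = [[False] * ncols for _ in range(nrows)]
    for idx in rng.sample(range(nrows * ncols), n_obstacles):
        grid[idx // ncols][idx % ncols] = True
    return grid
```

Sampling flat cell indices and converting back to (row, column) keeps every cell equally likely, matching the uniform process described above.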

The third database created by the simulator represented the error in the initial DEM. This database contained real values, and was added to the synthetic DEM (which was assumed to be error-free) to produce the error-filled DEM. Three different techniques were used to create this error surface, resulting in what were termed three different error scenarios. These scenarios will be described in the Error Scenarios section.

The final initial database identified the starting and ending points of the robot's route through the landscape. In order to reduce variability between iterations of the simulation, each pair of starting and ending points was chosen so that they fell between 50 and 100 cell widths (as measured along a straight line connecting the centers of the starting and ending cells) of one another. This was accomplished by randomly picking a starting cell (where every non-obstacle cell in the database had an equal probability of being picked as the starting point cell – neither the starting nor ending points were allowed to fall in raster cells containing obstacles), conducting a Euclidean distance analysis from that cell, and randomly selecting an ending point within the prescribed range of distances from the starting point. If the Euclidean distance analysis indicated that no non-obstacle cell in the database fell within the prescribed range of distances, a new starting point was randomly chosen and the process of finding an ending point was repeated.
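The endpoint-selection procedure can be sketched directly from that description, with the 50 and 100 cell-width bounds left as parameters (all names hypothetical, and a brute-force distance scan standing in for the GIS Euclidean distance analysis):

```python
import math
import random

def pick_endpoints(obstacles, nrows, ncols, dmin, dmax,
                   rng=random.Random(0), max_tries=100):
    """Random start/end cells dmin..dmax cell widths apart, off obstacles.

    Draw a start uniformly from non-obstacle cells, then draw an end
    uniformly from the non-obstacle cells whose straight-line distance
    from the start lies in [dmin, dmax]; if none exists, redraw the start.
    """
    free = [(r, c) for r in range(nrows) for c in range(ncols)
            if (r, c) not in obstacles]
    for _ in range(max_tries):
        start = rng.choice(free)
        candidates = [cell for cell in free
                      if dmin <= math.dist(start, cell) <= dmax]
        if candidates:
            return start, rng.choice(candidates)
    raise RuntimeError("no endpoint pair found within max_tries")
```

In the study's configuration the bounds would be `dmin=50, dmax=100`; smaller bounds work for smaller test rasters.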

Once these initial databases were created, the Monte Carlo simulator used them in two parallel analyses. The first ignored uncertainty and found the optimal path from the starting point to the ending point. The second introduced simulated uncertainty into the analysis and used the RFPR approach to find a route from the starting to the ending points. Both of these analyses were accomplished using anisotropic cost spreading.

Anisotropic cost spreading requires as inputs: (1) one or more raster databases that describe the impedance values associated with traversing from the center of each raster cell to the centers of each neighboring cell; and (2) the starting and ending points of the route. The anisotropic cost spreading implementation used in this study required eight raster databases to describe cell-center-to-cell-center impedance values, one for each possible direction of travel (Dean 1997). Gradients (e.g. vertical angles along the terrain surface in specific directions) were used as the basic impedance values. These basic values were modified to account for the features identified in the obstacles database.

Precisely how the unit cost databases were constructed varied somewhat between the analysis designed to find the optimal path and the analysis designed to replicate RFPR navigation. When finding optimal paths, unit costs were constructed via a two-step process. First, simple map algebra was used to construct eight gradient databases (one representing each of the eight possible directions of movement from any given raster cell center to an adjacent cell center) from the synthetic DEM. Gradients were expressed in absolute degrees; thus, each cell contained a value between zero and 90. Next, raster cells identified as obstacles were assigned "NoData" values in each of the eight gradient databases; these values indicated that obstacle cells could not be part of any optimal path. The resulting databases were used as unit costs in the analysis that found the true optimal path from the source to the target.
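One of the eight directional gradient databases can be sketched as follows: for a given direction offset, the gradient at each cell is the absolute vertical angle (in degrees) from that cell's center to the neighboring cell's center, with `None` standing in for "NoData" where the neighbor is off the raster (obstacle cells would likewise be overwritten with `None`). Names and conventions here are illustrative, not the paper's map-algebra implementation:

```python
import math

def gradient_degrees(dem, dr, dc, cellsize=1.0):
    """Absolute gradient (degrees, 0..90) from each cell toward offset (dr, dc).

    dem is a list of rows of elevations; (dr, dc) is one of the eight
    neighbor offsets. Returns None where the neighbor is off the edge.
    """
    nrows, ncols = len(dem), len(dem[0])
    run = math.hypot(dr, dc) * cellsize   # horizontal distance to neighbor
    out = [[None] * ncols for _ in range(nrows)]
    for r in range(nrows):
        for c in range(ncols):
            nr, nc = r + dr, c + dc
            if 0 <= nr < nrows and 0 <= nc < ncols:
                rise = dem[nr][nc] - dem[r][c]
                out[r][c] = abs(math.degrees(math.atan2(rise, run)))
    return out
```

A flat DEM yields 0 degrees everywhere; a rise of one elevation unit over one cell width yields 45 degrees, consistent with the zero-to-90 range described above.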

The process used to create the unit cost databases used in the RFPR process started with the same synthetic DEM, but prior to the construction of the unit cost databases the synthetic DEM was modified to take into account both simulated error and information gathered by the robot's sensors. This was accomplished by modifying the error surface created previously to account for the reduction in error created by processing information from the robot's sensors (this process is described in the Simulating Sensor Inputs section), and then adding the modified error surface to the DEM. Gradients were then constructed using map algebra. A second sensor input simulation (once again described in the Simulating Sensor Inputs section) was then used to determine which obstacles had been identified by the robot, and identified obstacle cells were assigned "NoData" values.

An anisotropic routefinding analysis was then conducted, and the robot moved a predefined number of cell widths along the resulting path. The error surface was then modified again to reflect the additional information obtained by the robot's sensors at its new location, and the set of identified obstacles was expanded to include any additional obstacles detected by the sensors at the new location. The process was iteratively repeated until either the robot encountered an undetected obstacle (in which case the RFPR solution was deemed unsuccessful) or the target point was reached (in which case the RFPR solution was successful and compared to the true optimal solution). An example of truly optimal and RFPR routes generated in this fashion is shown in Figure 2.

4.2 Comparing Routes

Optimal routes through the error-free DEM were compared to routes found via the RFPR technique using the five metrics summarized in Table 1. The success metric was the simplest; this was a binary measure that recorded whether or not the RFPR route reached the target without encountering an obstacle. If the RFPR route did encounter an obstacle, none of the other metrics were computed.

Assuming the RFPR algorithm was successful, Goodchild and Hunter's (1997) techniques were used to construct the next two metrics. These techniques quantify the spatial similarity of two lines. Basically, Goodchild and Hunter's metric is the width of a buffer around one line that captures a predefined percentage of the length of the second line. Obviously, smaller values of this metric indicate lines closer together in space. Unfortunately, Goodchild and Hunter's procedure can produce different results depending upon which line is considered "first" and which is considered "second." Thus, each time the metric was applied in this study, it was computed both ways (i.e. once with the optimal path considered "first" and a second time with the RFPR path considered "first") and the results were averaged. This metric was computed twice, once using 90% as the predefined target percentage and once using 95%.

Figure 2 Examples of truly optimal (shown in green) and RFPR (shown in blue) routes. Both routes start at the upper left and end in the lower right. Red points along the RFPR route are locations where the robot updated its route. Background shading represents elevation (lighter shades represent higher elevations)

Table 1 RFPR performance metrics evaluated in this study. Names in italics are used to identify the variables in the text and subsequent tables

1. Success: A binary variable indicating that either the target was reached without encountering any fatal obstacles (value = 1) or not (value = 0).
2. Buffer 90: Goodchild and Hunter's (1997) linear feature similarity metric (at 90% confidence) comparing the actual minimum cost path and the path produced via RFPR. Measured in cell widths.
3. Buffer 95: Goodchild and Hunter's (1997) linear feature similarity metric (at 95% confidence) comparing the actual minimum cost path and the path produced via RFPR. Measured in cell widths.
4. Added Cost: The difference in route costs (i.e. the total traversing cost of the RFPR route minus the total traversing cost of the true optimal route) divided by the total cost of the true optimal route.
5. Added Length: The difference in route lengths (i.e. the total length of the RFPR route minus the total length of the true optimal route) divided by the total length of the true optimal route.
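A point-sampled approximation of this symmetric buffer comparison can be sketched as follows. This simplification treats each route as a list of sample points rather than a true polyline buffer, and estimates the buffer width as the pct-quantile of nearest-point distances; it is illustrative, not Goodchild and Hunter's exact construction:

```python
import math

def buffer_metric(line_a, line_b, pct=0.90):
    """One-directional buffer width, point sampled.

    Approximates the smallest buffer around line_a (a list of (x, y)
    sample points) capturing pct of line_b's sample points.
    """
    dists = sorted(min(math.dist(p, q) for q in line_a) for p in line_b)
    k = max(0, math.ceil(pct * len(dists)) - 1)
    return dists[k]

def symmetric_buffer_metric(line_a, line_b, pct=0.90):
    """Average of the metric computed in both directions, as in the study."""
    return 0.5 * (buffer_metric(line_a, line_b, pct) +
                  buffer_metric(line_b, line_a, pct))
```

Identical routes score 0; two parallel routes one cell width apart score 1, and averaging both directions removes the asymmetry the paragraph above notes.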

The final two metrics were measures of length and cost. The difference in length between the two routes (i.e. length of RFPR route minus length of the optimal route) was expressed as a percentage of the length of the optimal route. Each route also has a total cost (i.e. the sum of the unit costs of all the cells the route traverses), and the difference in total costs (cost of the RFPR route minus cost of the optimal route) was again expressed as a percentage of the cost of the optimal route.

4.3 Error Scenarios

The third raster database created by the simulator represented the error present in the initial DEM. In order to investigate the impacts of DEM errors with differing distributional characteristics, three separate techniques were used to create this error surface (spatial distributions of DEM errors have been studied extensively; see Erdogan (2010) for discussion of this topic). Examples of each of these error databases are shown in Figure 3.
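As an illustrative sketch of how such error surfaces can be generated (names hypothetical): one helper draws a spatially uncorrelated Gaussian field, and a second normalizes any surface (for example, an autocorrelated MPDM raster) to a mean of exactly zero and a specified standard deviation:

```python
import math
import random

def gaussian_error(nrows, ncols, sd, rng=random.Random(0)):
    """Spatially uncorrelated Gaussian error: mean 0, given sd per cell."""
    return [[rng.gauss(0.0, sd) for _ in range(ncols)] for _ in range(nrows)]

def normalize(surface, sd):
    """Shift and scale a surface to mean exactly 0 and std dev exactly sd,
    preserving its spatial pattern (e.g. MPDM autocorrelation)."""
    flat = [v for row in surface for v in row]
    mean = sum(flat) / len(flat)
    var = sum((v - mean) ** 2 for v in flat) / len(flat)
    scale = sd / math.sqrt(var)
    return [[(v - mean) * scale for v in row] for row in surface]
```

Because normalization only shifts and rescales, it fixes the error magnitude without altering where the errors fall, which is what lets the same machinery serve both the uncorrelated and the autocorrelated scenarios.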

The first set of error databases was created by populating a raster matrix with random variates drawn from a Gaussian distribution with a mean of zero and a specified standard deviation. The errors in databases constructed in this fashion exhibited no spatial pattern. A second set of error databases was constructed by normalizing the values in a raster database constructed via the MPDM (created using a specified h value) to a mean of zero and a specified standard deviation. Error surfaces constructed in this fashion exhibited spatial autocorrelation (the extent of which was controlled by the MPDM's h factor), but no correlation with the DEM. Finally, a third set of error databases contained errors correlated with the distances from ridgelines and valley bottoms (with errors at their maximum along ridgelines and valley bottoms and decreasing with distance from these features). Databases of this sort were created via the following procedure:

1. Valley bottoms were identified by subjecting the synthetic DEM to a hydrologic analysis designed to determine the number of raster cells whose overland flow drains into the current cell (i.e. a flow accumulation analysis) and identifying all the cells whose flow accumulation exceeded 5% of the total number of cells in the database as valley bottoms (Tarboton et al. 1991).

2. Ridges were identified by inverting the DEM and repeating the valley bottom identification process. Flow accumulations exceeding 1% of the cells in the database were identified as ridges.

3. Euclidean distances were computed from each cell in the database to the nearest ridge or valley.

4. Distances from the previous step were reclassed so that (a) raster cells containing distances of zero (i.e. the ridges and valley bottoms themselves) were reclassed to one, (b) raster cells containing distances above a specified maximum distance (the Correlated Range) were reclassed to a specified minimum value (this value was termed the Correlated Decay and was always less than one), and (c) all other cells were reclassed to values between one and the specified minimum value. The values falling under reclass rule (c) were inversely proportional to distance; for example, if the specified minimum value was 0.25, a cell whose distance to the nearest ridgeline or valley bottom was one third the specified maximum distance would contain a value of 0.25 + (1 − 1/3) × (1 − 0.25) = 0.75. This computation is depicted graphically in part A of Figure 4.

Figure 3 Examples of the error databases created in this study. Dark colors represent large absolute values of error; lighter colors represent errors with small absolute values

5. The reclassed values just described were multiplied by a base standard deviation value to produce the actual standard deviation value for each cell. Each cell's error value was then produced by drawing a random variate from a Gaussian distribution with a mean of zero and the cell's standard deviation.

Error databases constructed in this third fashion show a spatial pattern correlated to the synthetic DEM; errors tend to be largest along ridges and valley bottoms and decrease with increasing distance from these features. The range over which this correlation applies is determined by the maximum distance used in the reclassing operations of step (4); the strength of the correlation between error and synthetic DEM features is determined by the specified minimum value for the multiplier used in step (4); and the overall magnitude of the error is determined by the base standard deviation used in the final step.
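Steps 4 and 5 amount to a linear distance-decay multiplier applied to a base standard deviation. A minimal numpy sketch of that computation follows (our own illustration; the function names and array-based framing are assumptions, not the paper's code):

```python
import numpy as np

def decay_multiplier(dist, corr_range, corr_decay):
    """Step 4: multiplier of 1.0 at a ridge/valley cell, falling
    linearly to corr_decay at corr_range cell widths, flat beyond."""
    d = np.asarray(dist, dtype=float)
    mult = 1.0 - (d / corr_range) * (1.0 - corr_decay)
    return np.clip(mult, corr_decay, 1.0)

def correlated_error(dist, corr_range, corr_decay, base_sd, rng):
    """Step 5: per-cell standard deviation, then a zero-mean
    Gaussian draw for each cell."""
    sd = decay_multiplier(dist, corr_range, corr_decay) * base_sd
    return rng.normal(0.0, sd)
```

With corr_decay = 0.25, a cell one third of the way to corr_range gets a multiplier of 0.75, matching the worked example in step 4.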

4.4 Simulating Sensor Inputs

It was assumed that: (1) sensors could only obtain information for that portion of the landscape that was within the robot's viewshed; (2) within the viewshed, the sensors could only gather information out to some maximum range; and (3) the information gathered by the sensors was 100% accurate at zero distance from the robot's location, but sensor accuracy dropped linearly to some predefined minimum level at the maximum range of the sensors.

These ideas were operationalized by conducting both viewshed and Euclidean distance analyses from the robot's current location. Map algebra was used to combine the two resulting databases into a new database where each raster cell both within the viewshed and within the specified sensor range of the robot contained the cell's distance from the robot, and all other cells contained zeros. A “reverse distance” database was then created by subtracting each nonzero cell's value from the maximum distance in the database and dividing the result by the maximum distance (cells containing zeros retained their values in the reverse distance database). This produced a database (here called the sensor accuracy database) where all cells in the viewshed and within sensor range contained values between zero and one, with larger values located at proportionately shorter distances from the robot.

Figure 4 The correlated elevation error, sensor accuracy for obstacles, and sensor accuracy for elevation error distance decay functions
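The map-algebra combination just described can be sketched as follows. This is a simplified numpy stand-in for the GIS operations: the viewshed and distance rasters are assumed to be given, and treating the robot's own zero-distance cell as fully accurate is our simplification of the text.

```python
import numpy as np

def sensor_accuracy(viewshed, dist, sensor_range):
    """Build the sensor accuracy raster: cells that are both visible
    (viewshed == 1) and within sensor_range get (d_max - d) / d_max,
    where d_max is the largest in-view distance; all other cells get 0."""
    visible = (viewshed == 1) & (dist <= sensor_range)
    d_max = dist[visible].max() if visible.any() else 0.0
    if d_max == 0.0:
        # nothing in view beyond the robot's own cell
        return np.where(visible, 1.0, 0.0)
    return np.where(visible, (d_max - dist) / d_max, 0.0)
```

A cell at the robot's location scores 1, a visible cell at the edge of sensor range scores 0, and cells out of view score 0.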

This sensor accuracy database was used in two different sensor simulations. The first determined whether or not the robot's sensors detected an obstacle (this simulation is depicted graphically in part B of Figure 4). To accomplish this, the nonzero cells in the sensor accuracy database were rescaled so that they ran from one to some predefined minimum value (which was always between zero and one) instead of their original range from zero to one. Each obstacle cell within the sensor's scanning area was then identified, and represented by a random variate drawn from a uniform distribution that ranged from zero to one. If the random variate was less than the value in the rescaled sensor accuracy database, the obstacle was identified. If the random variate exceeded the value in the rescaled database, the obstacle remained undetected. Note that once an obstacle was detected, it remained detected in all subsequent iterations of the RFPR algorithm; the robot's route finding system was assumed to “remember” everything it learned from previous scans.
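One scan of this obstacle test can be sketched as below (our own rendering; the linear rescaling formula is one way to map the accuracy values onto the [min_acc, 1] interval the text describes):

```python
import numpy as np

def scan_for_obstacles(accuracy, obstacles, detected, min_acc, rng):
    """accuracy: sensor accuracy raster (0 outside the scan area);
    obstacles, detected: boolean rasters; min_acc: detection
    probability at the sensor's maximum range."""
    # rescale in-view accuracy values so they run from min_acc to 1
    rescaled = np.where(accuracy > 0, min_acc + accuracy * (1.0 - min_acc), 0.0)
    u = rng.uniform(size=accuracy.shape)
    newly_seen = obstacles & (u < rescaled)
    # detections persist across scans: the robot "remembers" them
    return detected | newly_seen
```

An obstacle at the robot's own location (rescaled value 1) is always detected, while obstacles outside the scan area (rescaled value 0) never are.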

The second simulation determined how much elevation error from the DEM was detected (this simulation is depicted graphically in part C of Figure 4). This was accomplished by once again rescaling all the nonzero cells in the sensor accuracy database to a second predefined minimum, which was once again less than one. A uniform random variate (again between zero and one) was then drawn for each rescaled cell. If the random variate was less than the rescaled cell value, the sensors were assumed to have corrected all of the error present in the cell, so the corresponding cell value in the error surface was set to zero. If the random variate was greater than the rescaled cell value, the proportion of DEM error corrected by the sensors was computed using the following formula: (1 − Random Variate Value) / (1 − Rescaled Cell Value).
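The same pattern, applied to the error surface rather than to obstacles, can be sketched as follows (again a hedged illustration, not the paper's code; the rescaling step mirrors the obstacle case):

```python
import numpy as np

def correct_elevation_error(error, accuracy, min_acc, rng):
    """Reduce each in-view cell's DEM error by the fraction the sensors
    are assumed to detect; out-of-view cells keep their full error."""
    rescaled = np.where(accuracy > 0, min_acc + accuracy * (1.0 - min_acc), 0.0)
    u = rng.uniform(size=np.shape(error))
    # full correction when u < rescaled, partial correction otherwise
    frac = np.where(u < rescaled, 1.0,
                    (1.0 - u) / np.maximum(1.0 - rescaled, 1e-12))
    return np.where(accuracy > 0, error * (1.0 - frac), error)
```

Because the corrected fraction always lies between zero and one, the remaining error magnitude never exceeds the original.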

4.5 Experimental Design

In order to understand the experimental design used in this study, a distinction must be made between experimental and explanatory variables. Experimental variables were controlled as inputs into the simulator; systematically varying their values constituted the experimental design used here. Explanatory variables were derived from the outputs of the simulator and were used in statistical analyses designed to explain the variation of the RFPR performance metrics listed in Table 1. While in many cases the explanatory variables were identical to the experimental variables, they were not always so.

The experimental variables and their values are summarized in Table 2. Eight experimental variables were found in every iteration of the Monte Carlo simulator, and one additional variable was needed in iterations involving the random DEM error. Two additional experimental variables were needed in iterations involving spatially autocorrelated DEM error, and three additional parameters were needed in iterations involving DEM error correlated with ridgelines and valley bottoms. Thus, the smallest number of experimental variables used in any iteration was nine, and the largest was 11. By multiplying the number of values for each variable investigated (second column of Table 2), we can see there were a total of 3,888 unique combinations of experimental variable values investigated under the random error scenario, 15,552 unique combinations under the spatially autocorrelated error scenario, and 34,992 combinations were investigated under the error correlated with landforms scenario. Since the simulator evaluated each unique combination of experimental variables 50 times, there were a total of 3,888 × 50 = 194,400 iterations under the random error scenario, 15,552 × 50 = 777,600 iterations under the autocorrelated error scenario, and 34,992 × 50 = 1,749,600 iterations evaluated under the error correlated with terrain features scenario.
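These combination counts follow directly from multiplying the number of levels per controlled variable (the six multi-valued shared variables in Table 2, plus the scenario-specific ones):

```python
from math import prod

# levels of the shared multi-valued variables from Table 2:
# DEM Autocorrelation (4), Number of Obstacles (3), Sensor Range (3),
# Obstacle Accuracy (3), Elevation Accuracy (3), Revision Interval (4)
base = prod([4, 3, 3, 3, 3, 4])        # 1,296 shared combinations

random_combos = base * 3               # + Random Magnitude (3 levels)
autocorr_combos = base * 4 * 3         # + Error Autocorrelation, Magnitude
terrain_combos = base * 3 * 3 * 3      # + Magnitude, Decay, Range

print(random_combos, autocorr_combos, terrain_combos)  # 3888 15552 34992
print(50 * random_combos, 50 * autocorr_combos, 50 * terrain_combos)
```

Multiplying by the 50 replications per combination reproduces the 194,400 / 777,600 / 1,749,600 iteration totals quoted above.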


The right-hand column of Table 2 lists the explanatory variables associated with each experimental variable. The differences between the two are: (1) while the simulator used MPDM h values to record the targeted amount of spatial autocorrelation in synthetic DEMs and correlated error surfaces, Moran's I values (computed using a 3-by-3 moving window) were used as explanatory variables because they measured actual amounts of autocorrelation (Griffith and Paelinck 2011); and (2) while the distance between the starting and ending point of each route was constrained within the simulator to ranges between 50 and 100 cell widths, distance was not rigorously controlled (i.e. assigned specific values to be investigated) during the simulation process. Instead, simple straight-line distance was used as an explanatory variable.
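Moran's I computed over a raster with a 3-by-3 moving window is equivalent to using queen-contiguity weights (w_ij = 1 for each cell's up-to-eight neighbours). A compact numpy version of this standard statistic (our own sketch, not the study's code):

```python
import numpy as np

def morans_i(grid):
    """Moran's I for a 2-D raster with queen (3-by-3 window) contiguity:
    I = (n / W) * sum_ij w_ij z_i z_j / sum_i z_i^2, where z is the
    mean-deviated raster and W the total number of neighbour pairs."""
    z = np.asarray(grid, dtype=float)
    z = z - z.mean()
    num, w_sum = 0.0, 0.0
    for di in (-1, 0, 1):
        for dj in (-1, 0, 1):
            if di == 0 and dj == 0:
                continue
            # overlapping slices pair each cell with its (di, dj) neighbour
            s0 = slice(max(0, -di), z.shape[0] - max(0, di))
            s1 = slice(max(0, -dj), z.shape[1] - max(0, dj))
            t0 = slice(max(0, di), z.shape[0] - max(0, -di))
            t1 = slice(max(0, dj), z.shape[1] - max(0, -dj))
            num += (z[s0, s1] * z[t0, t1]).sum()
            w_sum += z[s0, s1].size
    return (z.size / w_sum) * num / (z ** 2).sum()
```

A smooth elevation gradient yields a strongly positive I, while a checkerboard pattern yields a negative one.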

4.6 Computer Platform

The model just described was implemented on a variety of Intel-based desktop PCs running the Windows XP operating system. The model utilized version 9.3 of the ArcGIS software (Esri 2010), version 2.6 of the Python scripting language (Rossum and Drake 2006) and the 2010 edition of the Visual Basic programming system (Microsoft Corporation 2010). As many as 26 machines were used simultaneously to complete all of the required iterations in approximately three months.

4.7 Statistical Analysis

The goal of the statistical analysis was to investigate the relationships between the RFPR performance metrics from Table 1 and the explanatory variables from Table 2. Due to the

Table 2 Values of experimental/explanatory variables investigated in the Monte Carlo analysis

Variable                   Experimental Values Investigated             Explanatory Values Investigated

Variables Pertinent Across all Error Scenarios
DEM Autocorrelation        0.3, 0.5, 0.7, 0.9 h units                   Moran's I values
Distance                   Not controlled as an experimental variable   Random distances between 50 and 100 cell widths
Number of Obstacles        50, 100, 200 obstacles                       Same as experimental values
Sensor Range               5, 10, 15 cell widths                        Same as experimental values
Obstacle Accuracy          25%, 50%, 75%                                Same as experimental values
Elevation Accuracy         25%, 50%, 75%                                Same as experimental values
Revision Interval          0, 1, 5, 10 cell widths between revisions    Same as experimental values

Variable Pertinent only under the Random Error Scenario
Random Magnitude           0.5, 1.0, 2.0 elevation units                Same as experimental values

Variables Pertinent only under the Autocorrelated Error Scenario
Error Autocorrelation      0.3, 0.5, 0.7, 0.9 h units                   Moran's I values
Autocorrelated Magnitude   0.5, 1.0, 2.0 elevation units                Same as experimental values

Variables Pertinent only under the Error Correlated with DEM Features Scenario
Correlated Magnitude       0.5, 1.0, 2.0 elevation units                Same as experimental values
Correlation Decay          0.75, 0.50, 0.25                             Same as experimental values
Correlation Range          3, 6, 9 cell widths                          Same as experimental values


differing number of explanatory variables under each of the three error scenarios, separate analyses were conducted for each error scenario as well as each performance metric. All of the explanatory variables were continuous, as were four of the five performance metrics. These four metrics were evaluated via 12 (four variables multiplied by three error scenarios) analyses of variance. The remaining performance metric (Success) was binary; it was analyzed via three logistic regression procedures.

The large number of observations posed a problem. Given the number of observations, conventional statistical tests concluded that virtually all of the explanatory variables were highly significant predictors of the RFPR performance metrics. Thus, rather than rely on insensitive statistical tests, a pseudo-jackknife procedure was used. Under this approach, a full model was constructed using all of the explanatory variables. A reduced model was then constructed by eliminating a single explanatory variable. The reduction in the reduced model's predictive power (relative to the predictive power of the full model) was used as a measure of the eliminated variable's ability to explain variation in the performance metric.
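For the linear-regression metrics, this drop-one-variable procedure can be sketched as follows. This is a minimal numpy illustration of the idea, not the study's statistical code; the R-squared-based framing matches the tables that follow.

```python
import numpy as np

def r_squared(X, y):
    """R^2 of an ordinary least-squares fit with an intercept."""
    A = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    return 1.0 - resid.var() / y.var()

def pseudo_jackknife(X, y, names):
    """Percent reduction in R^2 when each explanatory variable is
    dropped from the full model, one at a time."""
    full = r_squared(X, y)
    return {name: 100.0 * (full - r_squared(np.delete(X, k, axis=1), y)) / full
            for k, name in enumerate(names)}
```

A strongly predictive column produces a large percent reduction when removed, while a nearly irrelevant column produces almost none, regardless of how "significant" both would look in a conventional test at this sample size.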

5 Results and Discussion

Means, standard deviations and sample sizes of each of the RFPR performance metrics are shown in Table 3. This table reveals a number of interesting trends.

Overall success was extremely high; the RFPR procedure found successful routes in 96 to 98% of the simulations. There appears to be a trend toward higher rates of success when moving from simple random error to spatially autocorrelated error to error correlated with terrain features, but all of the success rates are so high that any apparent trend might be an aberration. However, the same trend is seen in the Buffer 90 and Buffer 95 statistics (the mean buffer widths decrease as one moves from simple random error to spatially autocorrelated error to error correlated with terrain features) as well as in the added cost results (added costs decrease as one moves along the spectrum of error scenarios). The only metric for which this trend is not apparent is added length. However, length is not a measure of route quality. While comparative length is an obvious and intuitive metric of one aspect of route similarity, routefinding techniques (both the RFPR technique and the conventional routefinding process) are designed to find routes that minimize total cumulative cost, not length. Thus, the observed trends imply that the RFPR algorithm produces its least optimal results when error is random and its best results when error is correlated with terrain features. RFPR performance is intermediate when error is autocorrelated.

Table 3 Means, standard deviations (in parentheses) and sample sizes [in brackets] of RFPR performance metrics. Note that the statistics for the Success metric were computed using all observations; statistics for the other metrics were computed from only those observations where the RFPR technique found a successful path

RFPR Performance   Error Scenario
Metric             Random        Autocorrelated   Terrain Correlated
Success            0.96          0.97             0.98
                   (0.17)        (0.17)           (0.18)
                   [194,400]     [777,600]        [1,749,600]
Buffer 90          8.24          7.08             6.94
                   (3.59)        (3.56)           (3.64)
                   [186,561]     [753,036]        [1,722,176]
Buffer 95          10.29         9.10             8.51
                   (4.67)        (4.63)           (4.73)
                   [186,561]     [753,036]        [1,722,176]
Added Cost         18.50%        16.47%           12.78%
                   (15.09%)      (13.72%)         (16.96%)
                   [186,561]     [753,036]        [1,722,176]
Added Length       9.86%         8.74%            11.07%
                   (10.18%)      (9.65%)          (10.90%)
                   [186,561]     [753,036]        [1,722,176]

Another feature to note is the magnitudes of the various metrics. The 90% buffer ranges from approximately 7 to 8 1/4 cell widths; given the 100-by-100 cell width dimensions of the databases analyzed in this study, this represents a significant distance. The 95% buffers range from approximately 8 1/4 to 10 1/4 cell widths; the relatively modest increase in size relative to the 90% buffers indicates that the dissimilarity between the RFPR and optimal routes is not caused by small portions of the routes that differ significantly while the remainder of the routes are relatively similar. Instead, these results imply that the dissimilarity of the routes is relatively evenly distributed along the entire length of the routes.

Table 3 also indicates that the total cumulative cost of the RFPR routes increased more (relative to the cost of the truly optimal routes) under each error scenario than did the total length of the RFPR routes. This indicates that not only did the RFPR routes traverse more cells than the truly optimal routes, but the average traversing cost of each cell along the RFPR routes was higher than the average cost of cells along the truly optimal routes. This is not surprising; the 100% accurate data available to the truly optimal routefinding system allowed it to find and take advantage of low-cost cells that were hidden from the RFPR system under the error in its databases.

Tables 4, 5 and 6 describe the relationships between the RFPR performance metrics and the explanatory variables from Table 2. The first column in each table lists the performance metric being described and indicates what index(es) were used to evaluate the predictive model that attempted to predict the value of the metric. In the case of the Success metric, logistic regression was used as the predictive model, which was evaluated by four indexes: Somers' D, the Goodman-Kruskal Gamma, Kendall's T, and the Receiver Operating Curve (ROC) index (Menard 2010). Linear regression was used to model all of the remaining performance metrics, and the Coefficient of Determination (R2) was used to evaluate the model. The first column of each table lists the values for each of these indexes for the full model (i.e. the model that included all of the explanatory variables) as well as the number of observations used to build the model. The models for the Success metric used all available observations, while the models for all subsequent metrics used only those observations where the RFPR method produced a successful route.

The % reduction column of each table indicates the percent reduction in each index for the various reduced models. For example, Table 4 includes an entry of 18.0 associated with the Somers' D index relating to the Success metric and the Number of Obstacles explanatory variable. This indicates that when the reduced model that omitted the Number of Obstacles variable was constructed, the value of Somers' D decreased by 18.0% relative to the Somers' D value of the full model. The Relation column in each table indicates whether increases in the


Table 4 Relationships between explanatory variables and RFPR performance metrics under the random error scenario. See text for details

Random Error Scenario

Metric (index)             Explanatory variable    % reduction                Relation

Success (Somers' D / Gamma / Kendall's T / ROC)
Full Model: 0.902 / 0.908 / 0.054 / 0.951; n = 194,400
                           DEM Autocorrelation     0.0 / 0.0 / 0.0 / 0.0      Positive
                           Distance                0.0 / 0.1 / 0.0 / 0.0      Negative
                           Num. Obstacles          18.0 / 14.4 / 17.2 / 8.0   Negative
                           Sensor Range            5.9 / 4.1 / 5.5 / 2.7      Positive
                           Obstacle Accuracy       18.2 / 15.1 / 17.9 / 8.9   Positive
                           Elevation Accuracy      0.0 / 0.0 / 0.0 / 0.0      Negative
                           Revision Interval       22.3 / 19.8 / 21.0 / 9.8   Negative
                           Random Magnitude        2.2 / 1.7 / 1.8 / 1.1      Negative

Buffer 90 (R2)
Full Model: 0.6233; n = 186,561
                           DEM Autocorrelation     0.6                        Negative
                           Distance                0.1                        Negative
                           Num. Obstacles          0.1                        Positive
                           Sensor Range            12.9                       Negative
                           Obstacle Accuracy       1.1                        Negative
                           Elevation Accuracy      9.4                        Negative
                           Revision Interval       76.9                       Positive
                           Random Magnitude        14.3                       Positive

Buffer 95 (R2)
Full Model: 0.5041; n = 186,561
                           DEM Autocorrelation     1.2                        Negative
                           Distance                0.1                        Positive
                           Num. Obstacles          0.2                        Positive
                           Sensor Range            16.1                       Negative
                           Obstacle Accuracy       0.9                        Negative
                           Elevation Accuracy      13.4                       Negative
                           Revision Interval       59.8                       Positive
                           Random Magnitude        12.1                       Positive

Added Cost (R2)
Full Model: 0.1963; n = 186,561
                           DEM Autocorrelation     19.7                       Negative
                           Distance                4.9                        Positive
                           Num. Obstacles          0.6                        Positive
                           Sensor Range            14.8                       Negative
                           Obstacle Accuracy       0.2                        Negative
                           Elevation Accuracy      11.5                       Negative
                           Revision Interval       29.9                       Positive
                           Random Magnitude        24.1                       Positive

Added Length (R2)
Full Model: 0.0809; n = 186,561
                           DEM Autocorrelation     9.7                        Negative
                           Distance                4.3                        Positive
                           Num. Obstacles          0.8                        Positive
                           Sensor Range            16.3                       Negative
                           Obstacle Accuracy       0.3                        Negative
                           Elevation Accuracy      22.3                       Negative
                           Revision Interval       27.4                       Negative
                           Random Magnitude        22.2                       Positive


Table 5 Relationships between explanatory variables and RFPR performance metrics under the spatially autocorrelated error scenario. See text for details

Autocorrelated Error Scenario

Metric (index)             Explanatory variable    % reduction                Relation

Success (Somers' D / Gamma / Kendall's T / ROC)
Full Model: 0.901 / 0.911 / 0.055 / 0.951; n = 777,600
                           DEM Autocorrelation     0.0 / 0.1 / 0.0 / 0.0      Positive
                           Distance                0.1 / 0.0 / 0.1 / 0.0      Negative
                           Num. Obstacles          20.3 / 16.7 / 20.0 / 9.7   Negative
                           Sensor Range            6.0 / 2.7 / 5.5 / 2.8      Positive
                           Obstacle Accuracy       16.8 / 12.1 / 16.4 / 8.0   Positive
                           Elevation Accuracy      0.0 / 0.0 / 0.0 / 0.0      Positive
                           Revision Interval       23.0 / 19.9 / 21.0 / 11.1  Negative
                           Error Autocorrelation   1.6 / 1.3 / 1.4 / 0.9      Negative
                           Auto. Magnitude         1.0 / 0.8 / 0.8 / 0.5      Negative

Buffer 90 (R2)
Full Model: 0.6378; n = 753,036
                           DEM Autocorrelation     4.7                        Negative
                           Distance                0.0                        Positive
                           Num. Obstacles          0.1                        Positive
                           Sensor Range            12.8                       Negative
                           Obstacle Accuracy       1.3                        Negative
                           Elevation Accuracy      9.2                        Negative
                           Revision Interval       70.3                       Positive
                           Error Autocorrelation   9.2                        Positive
                           Auto. Magnitude         11.7                       Positive

Buffer 95 (R2)
Full Model: 0.4912; n = 753,036
                           DEM Autocorrelation     3.1                        Negative
                           Distance                0.0                        Negative
                           Num. Obstacles          0.6                        Positive
                           Sensor Range            20.2                       Negative
                           Obstacle Accuracy       1.1                        Negative
                           Elevation Accuracy      16.8                       Negative
                           Revision Interval       55.9                       Positive
                           Error Autocorrelation   8.1                        Positive
                           Auto. Magnitude         6.9                        Positive

Added Cost (R2)
Full Model: 0.3347; n = 753,036
                           DEM Autocorrelation     13.9                       Negative
                           Distance                5.1                        Positive
                           Num. Obstacles          0.7                        Positive
                           Sensor Range            15.8                       Negative
                           Obstacle Accuracy       0.3                        Negative
                           Elevation Accuracy      20.9                       Negative
                           Revision Interval       31.6                       Positive
                           Error Autocorrelation   10.3                       Positive
                           Auto. Magnitude         16.8                       Positive

Added Length (R2)
Full Model: 0.1195; n = 753,036
                           DEM Autocorrelation     10.2                       Negative
                           Distance                4.7                        Positive
                           Num. Obstacles          1.0                        Positive
                           Sensor Range            23.4                       Negative
                           Obstacle Accuracy       0.2                        Positive
                           Elevation Accuracy      17.2                       Negative
                           Revision Interval       34.1                       Negative
                           Error Autocorrelation   9.5                        Positive
                           Auto. Magnitude         14.4                       Positive


Table 6 Relationships between explanatory variables and RFPR performance metrics under the terrain correlated error scenario. See text for details

Terrain Correlated Error Scenario

Metric (index)             Explanatory variable    % reduction                Relation

Success (Somers' D / Gamma / Kendall's T / ROC)
Full Model: 0.902 / 0.912 / 0.056 / 0.951; n = 1,749,600
                           DEM Autocorrelation     0.0 / 0.0 / 0.0 / 0.0      Negative
                           Distance                0.0 / 0.0 / 0.0 / 0.0      Negative
                           Num. Obstacles          20.4 / 16.9 / 21.4 / 9.7   Negative
                           Sensor Range            5.9 / 3.1 / 7.1 / 2.7      Positive
                           Obstacle Accuracy       16.9 / 14.0 / 17.9 / 8.0   Positive
                           Elevation Accuracy      0.0 / 0.0 / 0.0 / 0.0      Positive
                           Revision Interval       22.8 / 19.7 / 20.9 / 10.2  Negative
                           Correlated Decay        0.8 / 0.6 / 0.7 / 0.4      Positive
                           Correlated Range        0.8 / 0.7 / 0.7 / 0.5      Positive
                           Correlated Magnitude    1.1 / 0.9 / 0.9 / 0.6      Negative

Buffer 90 (R2)
Full Model: 0.6059; n = 1,722,176
                           DEM Autocorrelation     9.6                        Negative
                           Distance                0.1                        Positive
                           Num. Obstacles          0.1                        Negative
                           Sensor Range            13.1                       Negative
                           Obstacle Accuracy       0.9                        Negative
                           Elevation Accuracy      9.4                        Negative
                           Revision Interval       72.4                       Positive
                           Correlated Decay        6.1                        Positive
                           Correlated Range        10.9                       Positive
                           Correlated Magnitude    8.8                        Positive

Buffer 95 (R2)
Full Model: 0.5198; n = 1,722,176
                           DEM Autocorrelation     8.7                        Negative
                           Distance                0.1                        Positive
                           Num. Obstacles          0.8                        Positive
                           Sensor Range            21.1                       Negative
                           Obstacle Accuracy       0.6                        Negative
                           Elevation Accuracy      18.3                       Negative
                           Revision Interval       63.4                       Positive
                           Correlated Decay        5.5                        Positive
                           Correlated Range        6.4                        Positive
                           Correlated Magnitude    6.2                        Positive

Added Cost (R2)
Full Model: 0.3918; n = 1,722,176
                           DEM Autocorrelation     17.0                       Negative
                           Distance                5.2                        Positive
                           Num. Obstacles          4.9                        Positive
                           Sensor Range            16.6                       Negative
                           Obstacle Accuracy       0.4                        Negative
                           Elevation Accuracy      24.9                       Negative
                           Revision Interval       36.8                       Positive
                           Correlated Decay        8.1                        Positive
                           Correlated Range        10.8                       Positive
                           Correlated Magnitude    11.6                       Positive

Added Length (R2)
Full Model: 0.1679; n = 1,722,176
                           DEM Autocorrelation     17.7                       Negative
                           Distance                5.1                        Positive
                           Num. Obstacles          2.1                        Positive
                           Sensor Range            11.2                       Negative
                           Obstacle Accuracy       0.3                        Negative
                           Elevation Accuracy      27.1                       Negative
                           Revision Interval       39.2                       Negative
                           Correlated Decay        9.9                        Negative
                           Correlated Range        11.3                       Negative
                           Correlated Magnitude    10.2                       Positive


explanatory variable produce increases (a positive relation) or decreases (a negative relation) in the corresponding RFPR performance metric. These relations were derived from the full models. Note that in the case of the Success metric, a positive relation indicates an increased probability of a success, while a negative relation indicates a decreased probability.

The results in these tables show certain trends. Regardless of the type of error investigated, Success was explained largely by three factors: the number of cells traversed between route revisions, the number of obstacles, and the accuracy of the sensors in detecting obstacles. To a much lesser extent, Sensor Range also had explanatory power. All of these factors play obvious roles in determining whether or not the robot becomes mired in an obstacle, so it is not surprising that they explained Success. In addition, the relationships between these variables and success make intuitive sense. Under all error scenarios, increased values of the Revision Interval variable (the number of cells between route revisions) and/or the Number of Obstacles variable produced decreases in the probability of success. Increased values of the Sensor Range variable and the Obstacle Accuracy variable (the probability that an obstacle will be detected) resulted in increased probability of success. However, another trend was not expected. Revision Interval had the most explanatory power of any single variable, and the magnitude of this power was remarkably consistent. For example, looking only at the Somers' D criterion, removing Revision Interval from the full model decreased Somers' D by 22.7 ± 0.4% regardless of which error scenario was evaluated.

Surprisingly, Buffer 90 was overwhelmingly explained by a single variable. Removing Revision Interval from the full model resulted in a decrease in R2 values of between 70.3 and 76.9%, depending upon the error scenario. Revision Interval has an obvious influence on the spatial similarity of the RFPR and truly optimal routes, so it is not surprising that Revision Interval is highly explanatory. However, the results in Tables 4, 5 and 6 indicate that other variables that intuitively could be expected to have similar degrees of impact on Buffer 90 have dramatically less impact. For example, it is certainly plausible that Sensor Range and Elevation Accuracy could explain a significant portion of Buffer 90, and while the results from the tables indicate that these variables do have explanatory power, it is generally less than 1/5 of that of Revision Interval. The magnitude of this discrepancy was not expected. Despite this, the relationships between these variables are as intuitively expected: increased values of Revision Interval result in increased buffer width, while increased values of Sensor Range and/or Elevation Accuracy result in decreased buffer width.

It is also worth noting that when comparing between error scenarios, the variables describing DEM error had more collective explanatory power as one moved from random error (where a single variable described DEM error) to autocorrelated error (two variables) to error correlated with terrain features (three variables). The values shown in Tables 4 through 6 hint at this, but it may not be valid to simply add percent reduction values from the tables across explanatory variables; doing so assumes complete independence of the explanatory variables. Nevertheless, when all variables that describe DEM error are eliminated simultaneously, the single variable that describes random error reduces R2 by 14.3%, the two variables that describe autocorrelated error reduce R2 by 19.0%, and the three variables describing error correlated with terrain features collectively reduce R2 by 22.1%. This same progression is visible in the DEM Autocorrelation results. The degree of autocorrelation in the DEM had negligible explanatory power in the case of random error, modest power in the case of autocorrelated error, and significantly more explanatory power in the case of errors correlated with terrain features.

The relationships between the DEM error variables and Buffer 90 range from simple to complex. In the case of simple random error, the magnitude of error (as measured by standard deviation) is positively related to buffer width. The same is true for autocorrelated error, but the degree of autocorrelation is also positively correlated with buffer width. This can be explained if the RFPR pathways follow “corridors” of low error. Such corridors may exist in autocorrelated error databases; they become more sharply defined as the degree of autocorrelation increases. These corridors are randomly oriented relative to the cost surface, so an RFPR pathway following a corridor would have little likelihood of being in close proximity to a truly optimal route that is not subject to DEM error. Finally, in the case of error correlated with terrain features, error magnitude was once again positively related to buffer width, as were both Correlated Decay and Correlated Range. These last two relations imply that as the terrain error becomes increasingly localized around ridgelines and valley bottoms, buffer width decreases.

The Buffer 95 results were a muted version of the Buffer 90 results. Revision Interval still had exceptional explanatory power, but not to the extremes seen previously. The Sensor Range and Accuracy variables were stronger predictors of Buffer 95 than they were for Buffer 90. The progression of DEM error variables was also present (12.1% for random error, 14.2% for autocorrelated error, and 16.7% for terrain correlated error), but once again, this progression was not as dramatic as it was previously. Relationships between the explanatory variables and Buffer 95 were identical to those seen in the Buffer 90 results.

This muting effect was accompanied by an overall decrease in explanatory power. For each error scenario, the Buffer 95 full model explained substantially less variation in its dependent variable than did the corresponding Buffer 90 full model. Together, these results imply that as the buffer criteria became more stringent, random effects became more pronounced, thereby causing individual variables and overall models to lose predictive power.

The Added Cost models had substantially less explanatory power than did either of the Buffer models. Nevertheless, with R2 values in the 0.20 to 0.40 range, these models did have some explanatory power. This power was spread among many explanatory variables, including Revision Interval (which, regardless of error scenario, had the most significant impact on R2 when it was removed from the model), Elevation Accuracy, DEM Autocorrelation, and the variables measuring DEM error (the precise ranking of these variables varied between error scenarios), Sensor Range, and Distance (which had a modest but not insignificant impact). The relationships between these variables were consistent across error scenarios and generally intuitive. Added Cost was positively related to Revision Interval and Distance, and negatively related to Elevation Accuracy, DEM Autocorrelation, and Sensor Range. Relationships with the variables measuring DEM error followed similar lines as observed in the Buffer 90 results.

The Added Length models had the least explanatory power of any of the models investigated. However, the explanatory power of these models was influenced by error scenario. The progression from random error to autocorrelated error to error correlated with terrain features was again evidenced, corresponding to an increase in R² from 0.08 to 0.17. Thus, while explaining Added Length was always problematic, as DEM error became more structured and more related to DEM features, Added Length became more explainable.

The power of the Added Length models was spread among many explanatory variables, including Revision Interval (which again had the greatest impact on R² values of any variable, regardless of error scenario), Elevation Accuracy (which produced the second greatest impact across all error scenarios), and DEM Autocorrelation, Sensor Range and the variables measuring DEM error (whose specific ranking varied between error scenarios). All of these variables were negatively related to Added Length, with the exception of some of the variables measuring DEM error. The DEM error variables were an inconsistent mix of positive and negative relations; it is difficult to interpret these relations as anything other than artifacts of the regression process.


6 Conclusions

This study evaluated the quality of routes produced by the RFPR algorithm in three general areas: their ability to successfully reach desired targets, their optimality, and their spatial similarity to truly optimal routes. The results indicate that these qualities of RFPR routes are influenced by interactions among terrain characteristics, DEM error traits, sensor attributes, and parameters of the RFPR algorithm itself.

Overall success – the ability of the RFPR algorithm to plot a route to a desired target without becoming mired in an obstacle – was extremely high (96 to 98%). No dramatic changes in success rates were apparent across any of the error scenarios evaluated here. This bodes well for the applicability of the RFPR algorithm; at the most fundamental level, the algorithm is capable of producing successful routes in a very high percentage of cases. Furthermore, the explanatory variable that most heavily predicted success – the number of raster cells traversed between route revisions – is easily controlled by the designer of the routefinding system. Decreasing the number of cells traversed between revisions improves the likelihood of producing a successful route; the only downside to such a decrease is the increased computational demand imposed by frequent revisions. System designers can decide how to trade off probability of success against system performance and computational requirements in ways that are appropriate for their particular applications.
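The revise-as-you-go strategy described above can be illustrated with a simplified replanning loop: plan a least-cost path over the believed cost surface, advance at most a fixed number of cells (the revision interval), correct the beliefs with sensed true costs, and replan. The sketch below uses plain Dijkstra on a 4-connected grid and is only a stand-in for the RFPR algorithm as implemented in the study; the sensing model (a square window of side 2 × sensor range + 1), grid setup, and function names are illustrative assumptions.

```python
import heapq
import numpy as np

def dijkstra_path(cost, start, goal):
    """Least-cost path on a 4-connected grid; entering a cell pays that cell's cost."""
    rows, cols = cost.shape
    dist = {start: 0.0}
    prev = {}
    pq = [(0.0, start)]
    while pq:
        d, cell = heapq.heappop(pq)
        if cell == goal:
            break
        if d > dist.get(cell, np.inf):
            continue  # stale queue entry
        r, c = cell
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if 0 <= nr < rows and 0 <= nc < cols:
                nd = d + cost[nr, nc]
                if nd < dist.get((nr, nc), np.inf):
                    dist[(nr, nc)] = nd
                    prev[(nr, nc)] = cell
                    heapq.heappush(pq, (nd, (nr, nc)))
    path = [goal]
    while path[-1] != start:
        path.append(prev[path[-1]])
    return path[::-1]

def replan_route(true_cost, believed_cost, start, goal, revision_interval, sensor_range):
    """Travel toward goal, replanning every `revision_interval` cells after
    correcting believed costs within `sensor_range` cells of the current position."""
    believed = believed_cost.astype(float).copy()
    rows, cols = true_cost.shape
    pos, route = start, [start]
    while pos != goal:
        # Sense: overwrite beliefs near the current position with true costs.
        r, c = pos
        r0, r1 = max(0, r - sensor_range), min(rows, r + sensor_range + 1)
        c0, c1 = max(0, c - sensor_range), min(cols, c + sensor_range + 1)
        believed[r0:r1, c0:c1] = true_cost[r0:r1, c0:c1]
        plan = dijkstra_path(believed, pos, goal)
        route.extend(plan[1:revision_interval + 1])  # advance along the plan
        pos = route[-1]
    return route

# A priori beliefs omit an expensive "wall" that the traveler discovers en route.
true_cost = np.ones((20, 20))
true_cost[5:15, 10] = 50.0
believed = np.ones((20, 20))
route = replan_route(true_cost, believed, (10, 0), (10, 19),
                     revision_interval=3, sensor_range=2)
```

Shortening the revision interval makes the traveler react to sensed surprises sooner, at the price of more frequent replanning – the success-versus-computation tradeoff discussed above.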

The optimality of a route is defined by its ability to minimize cost. Thus, the optimality of RFPR routes was measured by the Added Cost metric. Recall that the costs involved here do not have to be economic or financial costs; “cost” can be any mappable quantity to be minimized over the length of the optimal route.
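As a minimal illustration, Added Cost can be computed as the difference between the accumulated cost of the heuristic route and that of the ideal route. The cost surface, the example routes, and the convention of counting each visited cell's cost once are illustrative assumptions, not the study's exact accounting.

```python
import numpy as np

def route_cost(cost_surface, route):
    """Total cost of a route: the sum of the cell costs accumulated along it."""
    return float(sum(cost_surface[r, c] for r, c in route))

def added_cost(cost_surface, heuristic_route, ideal_route):
    """Added Cost: how much more the heuristic route costs than the ideal one."""
    return route_cost(cost_surface, heuristic_route) - route_cost(cost_surface, ideal_route)

cost = np.array([[1.0, 1.0, 9.0],
                 [1.0, 5.0, 1.0],
                 [1.0, 1.0, 1.0]])
ideal = [(0, 0), (1, 0), (2, 0), (2, 1), (2, 2)]    # total cost 5: skirts the 5-cell
detour = [(0, 0), (0, 1), (1, 1), (2, 1), (2, 2)]   # total cost 9: crosses the 5-cell
print(added_cost(cost, detour, ideal))  # → 4.0
```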

RFPR routes were significantly less optimal than ideal routes. This is hardly surprising; the lack of complete a priori information about costs that characterizes RFPR problems makes them impossible to solve optimally. There was a clear trend toward improved performance of the RFPR approach as the error in the cost surface became more structured – random error produced the least optimal results, spatially autocorrelated error produced intermediate results, and error correlated with cost features produced RFPR paths whose total costs were closest to the costs of ideal routes. While it is tempting to conclude that this is simply a result of the localization of errors (e.g. under the spatially autocorrelated error scenario, a route that travelled through a region of relatively low errors is unlikely to be as adversely impacted by error as a route traveling through a random error field, where both high and low errors can be expected), the results of this study do not allow for any such conclusion. This issue awaits further study.

However, the results shown here do clearly indicate that within any error scenario, the single factor with the greatest impact on RFPR route optimality was once again the interval between revisions. Given the controllability of revision frequency, this again bodes well for the applicability of the RFPR approach.

Finally, the spatial similarity of RFPR and truly optimal routes was measured by the Buffer 90, Buffer 95 and Added Length response variables. In this analysis, spatial similarity is more of an intellectual curiosity than a measure of RFPR performance; two identically optimal routes (i.e. routes with the same cost) could be spatially very dissimilar. However, it stands to reason that under conditions of spatially autocorrelated costs (which in this study correspond to high DEM autocorrelations), spatially similar routes should have at least somewhat similar costs.
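A vertex-based approximation of the Buffer 90 and Buffer 95 statistics – the smallest buffer radius around the ideal route that captures a given percentage of the heuristic route's vertices – might look like the following sketch. Measuring vertex-to-vertex distance is a simplifying assumption; a production implementation would measure distance to the ideal route's line segments.

```python
import numpy as np

def buffer_width(ideal_route, test_route, pct=90.0):
    """Smallest buffer radius around the ideal route containing `pct` percent
    of the test route's vertices (vertex-to-vertex approximation)."""
    ideal = np.asarray(ideal_route, dtype=float)
    test = np.asarray(test_route, dtype=float)
    # Distance from each test vertex to its nearest ideal vertex.
    d = np.sqrt(((test[:, None, :] - ideal[None, :, :]) ** 2).sum(axis=2)).min(axis=1)
    return float(np.percentile(d, pct))

ideal = [(i, 0) for i in range(10)]
wander = [(i, 1) for i in range(10)]  # runs parallel, one cell away
print(buffer_width(ideal, wander, 90))  # → 1.0
```

A narrower Buffer 90 width indicates a heuristic route that hews closely to the ideal route; raising the percentile to 95 tightens the criterion, which in this study corresponded to the weaker, more noise-dominated Buffer 95 models.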

Somewhat surprisingly, these results indicate that the spatial similarity of routes, when measured in terms of buffers, was quite predictable and highly influenced by revision frequency, while spatial similarity measured in terms of added length was quite unpredictable. The expected relationship between spatial autocorrelation of costs and buffer width was seen (to a modest degree) in the spatially autocorrelated error and error correlated with terrain features scenarios, but was largely absent in the random scenario. The expected relationship was more apparent in the added length results, but once again, revision frequency had a more significant impact than did autocorrelation of costs. It is possible that these results were influenced by the relative magnitudes of the unit costs and the error values, but further investigation will be needed to prove or disprove this hypothesis. In this study, these relative magnitudes were controlled by the Random Magnitude, Autocorrelated Magnitude and Correlated Magnitude explanatory variables, and these variables showed modest to strong correlation to buffer width and route length. However, these variables measure targeted magnitudes of errors; actual magnitudes were stochastic. It is entirely possible that true measures of magnitude (i.e. the RMSE of the error surface) might be more predictive than were the variables used here.
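For reference, the RMSE of a realized error surface mentioned above is straightforward to compute from the true and erroneous DEMs (the array inputs here are illustrative):

```python
import numpy as np

def error_rmse(true_dem, erroneous_dem):
    """Root-mean-square error of a realized DEM error surface."""
    diff = np.asarray(erroneous_dem, dtype=float) - np.asarray(true_dem, dtype=float)
    return float(np.sqrt(np.mean(diff ** 2)))

print(error_rmse(np.zeros((2, 2)), np.array([[3.0, -3.0], [3.0, -3.0]])))  # → 3.0
```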

Despite these uncertainties, the results of this study indicate that while RFPR routes are not ideal, in most cases they are successful. Furthermore, the optimality of RFPR routes is heavily dependent upon revision frequency, which is controllable by the designers of systems likely to employ RFPR techniques. The RFPR approach is a viable method of solving routefinding problems where a priori data is incomplete or uncertain.
