Robotica (1995) volume 13, pp 565-573. © 1995 Cambridge University Press

Comparing two algorithms for automatic planning by robots in stochastic environments*
Alan D. Christiansen† and Kenneth Y. Goldberg‡
(Received in final form: December 12, 1994)

SUMMARY
Planning a sequence of robot actions is especially difficult when the outcome of actions is uncertain, as is inevitable when interacting with the physical environment. In this paper we consider the case of finite state and action spaces where actions can be modeled as Markov transitions. Finding a plan that achieves a desired state with maximum probability is known to be an NP-Complete problem. We consider two algorithms: an exponential-time algorithm that maximizes probability, and a polynomial-time algorithm that maximizes a lower bound on the probability. As these algorithms trade off plan time for plan quality, we compare their performance on a mechanical system for orienting parts. Our results lead us to identify two properties of stochastic actions that can be used to choose between these planning algorithms for other applications.

KEYWORDS: Automatic planning; Robots; Stochastic environments; Algorithms.

1. INTRODUCTION
To plan, a robot must anticipate the outcome of its actions. If we choose to model the robot's environment as a deterministic mapping on a finite set of possible states, a robot operating in the physical environment often encounters apparent non-determinism: a given action applied to a given state does not always yield the same result. As the robot's model of the environment is an approximation to the real environment, such uncertainty can arise from: errors in sensing (either the initial or next state is not measured accurately), errors in control (the commanded action is not repeated accurately), or a combination of the two caused by physical effects too fine to be modeled, such as friction or collision dynamics.

* A portion of this work was performed by the authors at Carnegie Mellon University.
† Computer Science Department, Tulane University, New Orleans, LA 70118-5674 (USA). Supported at CMU by an AT&T Bell Laboratories Ph.D. Scholarship and by the National Science Foundation under grant DMC-8520475. A portion of this work was completed during a visit to the Laboratoire d'Informatique Fondamentale et d'Intelligence Artificielle (LIFIA) in Grenoble, France, supported by INRIA.
‡ Institute for Robotics and Intelligent Systems, University of Southern California, Los Angeles, CA 90089-0273 (USA). Supported by the National Science Foundation under Awards No. IRI-9123747 and DDM-9215362 (Strategic Manufacturing Initiative).

To generate plans that will be reliable under such conditions, we can model events stochastically. When an action is applied to a given state, the probability of reaching a particular next state is predicted using a discrete probability distribution. We model the state of the environment as a random variable with an associated probability distribution.

This distribution can be thought of as a hyperstate with "probability mass" assigned to each state of the environment. Each action is a mapping from one hyperstate to another. If the current probability distribution depends only on the action and previous hyperstate, then the sequence of hyperstates is known as a Markov chain.

The conditional probabilities that define each action can be represented using a matrix. Depending on the application, the field of statistics offers a variety of approaches to estimating the value of such matrices. In this paper we use a Bayesian estimator to estimate the values in these matrices based on observations of a physical robot system. We also test the resulting plans on a particular part orienting application. Though we use a specific robot application, we are motivated to identify task characteristics that are common to a wide variety of robot manipulation tasks.

Our problem can be defined as follows: Given a known initial state, desired final state and a finite set of actions represented with stochastic transition matrices, find a sequence of actions that will maximize the probability of reaching the desired final state. We consider the "unobserved" planning problem (where we do not observe the state between actions during plan execution) [4, 5]. The exponential-time algorithm to solve the problem (by checking all possibilities) is straightforward. It has been shown [6] that finding an optimal plan is NP-Complete; thus it is unlikely that a polynomial-time algorithm exists. This motivates us to consider a faster algorithm that yields suboptimal plans. Since the general approach is to trade off planning time against plan quality, this paper compares two planners on our application task.
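Stated in the matrix notation developed in Sections 4 and 5 (our restatement; $A_0$ denotes the initial hyperstate, with all probability mass in the known initial state, and $g$ the desired final state), the problem is to choose a sequence of actions $(u_1, \dots, u_T)$ maximizing

$$\bigl[ A_0 \, P_{u_1} P_{u_2} \cdots P_{u_T} \bigr]_g ,$$

the probability mass that the composed transition matrices place in the goal state.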

We have found that planning is affected by two contrasting properties of stochastic actions. Some actions have the property that a single initial state tends to map onto more than one result state. We say that the action exhibits state divergence when, in executing the action, there is a tendency to distribute probability mass. Alternatively, some actions have the property that multiple initial states map into a single result state. In these cases, we say that an action exhibits state convergence - it tends to focus distributed probability mass into a single state. An action can exhibit both divergence and convergence if it spreads out probability mass from one state while combining mass from other states. Of course, some actions may exhibit neither state divergence nor state convergence.
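As a toy illustration (ours, not from the paper), the following Python sketch shows a divergent action spreading the probability mass of one state over several successors, and a convergent action refocusing that mass into a single state:

import numpy as np

# Toy 3-state system; row i of a transition matrix is the distribution
# over next states when the action is applied in state i (rows sum to 1).

# A "divergent" action: state 0 spreads its mass over states 0, 1 and 2.
P_div = np.array([[0.4, 0.3, 0.3],
                  [0.0, 1.0, 0.0],
                  [0.0, 0.0, 1.0]])

# A "convergent" action: every state tends to end up in state 2.
P_conv = np.array([[0.0, 0.1, 0.9],
                   [0.0, 0.1, 0.9],
                   [0.0, 0.0, 1.0]])

start = np.array([1.0, 0.0, 0.0])   # hyperstate: all mass in state 0
print(start @ P_div)                # mass spread out:  [0.4  0.3  0.3 ]
print(start @ P_div @ P_conv)       # mass refocused:   [0.   0.07 0.93]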

A stochastic planner evaluates plans by tracking how probability mass is distributed by sequences of actions. The objective is to find a plan that transfers a maximal amount of probability mass into the desired final state. This can be accomplished in several ways. One way is to use actions that cause state divergence followed by actions that cause convergence toward the desired state. Another way to plan is to avoid actions that cause state divergence. These two ways of transferring probability mass suggest two planning methods: Method I propagates hyperstates, using a forward exhaustive search (to a fixed plan length) to find a plan most likely to produce the desired goal. Method II propagates only the most likely state, using a backward best-first search to find a plan that maximizes a lower bound on the probability of reaching the goal.

We describe the two planning methods in detail and analyze the computational complexity of each method. We compare the plans produced by each method and the planning time required by each planner. We conclude the paper by revisiting the concepts of state divergence and state convergence and their importance to planning with stochastic actions.

2. RELATED WORK

2.1 Stochastic reasoning
The general problem of planning with stochastic actions has been considered in the AI literature beginning with the landmark paper [7] that advocated the use of decision theory with numerical cost and probability models. Today, decision theory is a standard tool in AI [8, 9]; in fact, it has been argued [10] that Bayesian decision theory subsumes many alternative approaches to coping with uncertainty. One factor that is not always considered is the computation time required to find an optimal plan: often we are willing to accept a satisfactory plan now rather than an optimal plan later. Building on pioneering work by Simon [11], Etzioni [12] incorporated a model of deliberation cost into a single Bayesian planning framework. The present paper addresses the tradeoff between plan time and plan quality by comparing the measured performance of two algorithms on the same problem.

In the field of control theory, stochastic dynamic programming has been shown [13, 14] to be a powerful tool for finding an optimal control policy (i.e. a mapping from states to actions). During execution, the robot looks up the current state in the table and performs the action stored therein. Unfortunately, these methods do not apply to the "unobserved" planning problem that we consider in this paper, where the robot is not allowed to sense the state of the system and must proceed open-loop.

2.2 Empirical models
One can also view the systems of the current paper as learning systems, because the models of the task are developed by the robot from direct observations of the effects of actions. (These observations are made off-line, before planning occurs, in keeping with the unobserved planning problem that we consider. There are variants of our problem where sensing is allowed during plan execution, and this sensing may be used to improve the probability of achieving the goal.) The observations allow the robot to adapt its model of action effects. If these adaptations reflect physical reality, then the robot improves its ability to generate correct plans, and also its ability to achieve its goals. Learning systems for robotics and control applications have been studied for more than a decade. The peg-in-hole problem was considered [15] using a learning automata approach [16]. Dufay and Latombe [17] considered the problem of inducing robot control programs from example program traces. Barto [18] provided a survey of neural net approaches for control problems. Buckley [19] considered the problem of human-guided robot teaching in the context of robot motion planning. Gross [20] described a method for learning backprojections of desired part orientations under pushing actions. Bennett [21] presented a method for tuning stochastic plans based upon prior models of the task actions. Christiansen's PhD Thesis [22] provides a detailed review of learning systems for robotic manipulation.

As stated earlier, a planning algorithm must be able to predict the outcome of alternative actions. This prediction can either be based on analytic (a priori) models of actions, or on empirical observations. In this paper we consider the latter (although we do employ an a priori discretization of the state space).

Research in robotic motion planning has tended to focus on analytic models, even though such models are often underspecified when friction is present [23-27]. An analytic model for the part orienting application considered in this paper, based on a novel representation for multiple frictional contacts in the plane, was originally described by Erdmann and Mason [28]. Erdmann and Mason considered the problem of finding plans that achieve a unique final orientation for a part regardless of its initial orientation. The authors based their analytic model on two strong assumptions: that friction obeyed Coulomb's Law with a uniform constant coefficient throughout the task space, and that inertial effects could be ignored. Since this model does not uniquely specify the outcome of every action, the planning algorithm kept track of all possible outcomes. Although this planner performed remarkably well in experiments (approximately 50% success rate), the prospect of adding more elaborate analysis of friction and dynamics was daunting. In contrast, our approach is to plan with a stochastic model based solely on physical observations.



3. TRAY-TILTING
To compare the two planning algorithms, we applied them to a method for orienting parts called "tray-tilting" [28]. The objective is to manipulate planar objects in a walled tray by tilting the tray so that the objects slide to a desired position and orientation. (See Figure 1.) The physics of this task are subtle and include frictional and impact effects. Tray-tilting has been studied before, both analytically and empirically.

Our robot arm was programmed to perform tilts in any desired direction. As the tray is tilted by the robot, gravity acts on the object and causes it to slide. For the purposes of this paper, we define tilts by a parameter which we call the tilt azimuth. This angle is the direction, in the plane of the tray bottom, of the force of gravity on the object during the tilt. (See Figure 2.) Other parameters of the tilt, such as steepness of the tilt and tilt speed, had constant values during all tilts. (The tilt steepness was thirty degrees from the horizontal and the tilt speed was set at a relatively slow speed: 25% of the nominal monitor speed for the Puma 560 robot that we used.) Although there is an infinite number of possible tilt azimuths, and therefore an infinite number of tilts, we sampled this space of azimuths at 30 degree intervals. In all cases, the robot restricted its actions to one of these 12 tilts.

In addition to defining tilts by a single parameter, we simplified the description of task state by discretizing the object's position and orientation. As indicated in Figure 2, we divided the tray (conceptually) into nine regions, corresponding to the four corners, the four sides, and the middle of the tray. We gave each of these regions symbolic names, as shown in Figure 2. We described an object's position by one of these names - the name of the region in which the object's center is located. A camera above the tray and an industrial image processor provided this position information. We described orientations by a similar discretization - our objects were described as being either horizontal (h) or vertical (v), depending on the orientation of the object's major axis, as reported by our vision system. The rectangular tile of Figure 2 would be classified as (sw h). This discretization is coarse and somewhat arbitrary. However, because the object tends to be in contact with tray walls as a result of the applied tilting actions, the states defined by our discretization correspond to qualitatively distinct physical configurations.

Fig. 1. The robot performing a tilting action.

Fig. 2. The tray (viewed from above), its states, and examples of tilt azimuths.

Because of our discretization of state, errors in our vision system, inaccuracies of the robot, and unmodeled physics (such as impact), this task exhibits apparent non-determinism. Our robot system, given only coarse sensory capabilities and no knowledge of physics, observes that its actions are not perfectly repeatable. This lack of repeatability makes the task seem non-deterministic. However, it is possible for the robot to build a stochastic model of its environment by observing the relative frequencies with which events occur in response to its actions. The observed non-determinism may or may not be due to a non-deterministic world. Noisy sensors and effectors are sufficient to make the world seem non-deterministic to the robot.

4. DEVELOPING A STOCHASTIC MODEL FROM OBSERVATIONS
Given a set of observations from physical experiments, what is an appropriate stochastic model of actions? If the probabilities of future states depend only on the current state, then we can use a Markov chain to represent the actions. We represented each tilting action u with a stochastic transition matrix P_u, where p_ij is the (conditional) probability that the system will be in state j after action u is applied to state i. We assume that the set of states and the set of actions are both finite.

In the physical experiments, each observation consists of an initial state, a tilting action, and a final state. For each tilting action u, consider the matrix X_u, where x_ij is the number of observations with initial state i and final state j. Given an observation matrix X_u, how do we generate a stochastic transition matrix P_u?

One possible approach is to use the observed frequencies. The difficulty is that some observed transition frequencies may be zero. For such a transition, it isn't clear whether the transition is truly impossible - maybe the transition just has a low probability and hasn't yet been observed. A standard approach from statistical estimation is to use the following estimator for each action u,

$$p_{ij} = \frac{\alpha_{ij} + x_{ij}}{\sum_{j'=1}^{k} \left( \alpha_{ij'} + x_{ij'} \right)} \qquad (1)$$

where the numbers α_ij for j = 1, 2, ..., k are Dirichlet parameters based on a priori assumptions. This is equivalent to a Bayes estimator using a squared-error loss criterion [2].

We could set α_ij = 1.0 to represent the prior assumption that the conditional probability distribution is uniform: after an action is applied, the system is as likely to be in any one state as in any other. For the tray tilting problem, we set α_ij = 0.01 to represent our prior assumption that the conditional probability distribution for each action will not be uniformly distributed, but will in fact be skewed toward some subset of states.

We generated 2000 tilt azimuths by random selection (with replacement) from the set of twelve azimuths described previously. Our robot performed the corresponding tilts of the tray, and observed the outcome of each tilt. The X_u matrices were defined by this data and we used equation (1) to generate the corresponding stochastic transition matrices.
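As a concrete illustration of equation (1), the following Python sketch (ours; the authors' planners were written in Common Lisp, and the function and variable names here are hypothetical) builds a transition matrix P_u from an observation-count matrix X_u with the Dirichlet parameter α_ij = 0.01 used above:

import numpy as np

def estimate_transition_matrix(X_u, alpha=0.01):
    # X_u[i, j] counts observed transitions from state i to state j
    # under action u; alpha is the Dirichlet prior added to every count.
    A = X_u + alpha                              # posterior Dirichlet parameters
    return A / A.sum(axis=1, keepdims=True)      # normalize each row (equation (1))

# Example: 3 states, 10 observed executions of one action from each state.
X_u = np.array([[8.0, 2.0, 0.0],
                [0.0, 9.0, 1.0],
                [0.0, 0.0, 10.0]])
P_u = estimate_transition_matrix(X_u)
print(P_u)   # unseen transitions get small but nonzero probabilities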

5. PLANNING WITH METHOD I
We now turn to the planning problem: given a stochastic model of actions, a known initial state and desired final state, find a sequence of open-loop tilting actions - a plan - to reach the desired final state. We will present two planning methods and their resulting plans. In this section we describe a method that maximizes the probability of reaching the desired final state. This method has been applied to programming parts feeders [33].

Method I is based on the control of Markov chains. Consider a system with finite state space. Let us refer to a probability distribution on this state space as a hyperstate. Each action is a mapping between hyperstates. A plan is a sequence of actions, which also corresponds to a mapping between hyperstates. For a given initial hyperstate, each plan generates a final hyperstate. To compare plans, we compare their final hyperstates. To rank hyperstates, we introduce a function that maps each hyperstate into a real number called its cost. The best plan is the plan with the lowest cost.

Let the vector A refer to a hyperstate. In a Markov chain, the hyperstate that results from applying action u to A is given by post-multiplying A by P_u,

$$A' = A P_u .$$

A plan is a sequence of actions. The outcome of a plan is the composite effect of its inputs; the transition matrix for a plan is the product of the transition matrices of its actions. That is, for a plan consisting of the sequence of actions (u_1, u_4, u_2), the final hyperstate is

$$A' = A P_{u_1} P_{u_4} P_{u_2} .$$

For the tray tilting task we are given a known initial state and desired final state. In this case the initial hyperstate is a vector with a 1 corresponding to the initial state and zeros elsewhere. Each action (and hence plan) defines a final hyperstate using the stochastic transition matrices described in Section 4. The cost function depends on the desired final state. If we want to reach state i, let the cost of a hyperstate A be

$$C(A) = 1 - A_i ,$$

where A_i is the probability mass assigned to state i, so that the minimum cost hyperstate corresponds to the highest probability that the system is in state i. Note that there may be more than one minimum-cost hyperstate.

To compare plans, we compare their final hyperstates. We express our desire for a particular outcome with a cost function on the set of hyperstates. Typically, a hyperstate's cost is determined by a weighted average of its constituent states.

To find the best plan, we consider all plans and find one with minimum cost. The difficulty is that there is an infinite number of plans to consider, requiring an infinite amount of computation (unless we have prior information that the optimal plan is not infinite in length). We compromise and ignore plans longer than some cutoff threshold. This threshold depends on how much time we have and how fast we can evaluate potential plans. In our case we considered all plans with length ≤ 3 to find a plan with minimal cost.
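A minimal sketch of Method I (ours, not the authors' Common Lisp implementation; the function name, the dictionary representation of the action set, and the cost convention C(A) = 1 - A_goal are our assumptions): enumerate every plan of length at most k, propagate the initial hyperstate through the corresponding transition matrices, and keep the plan that leaves the most probability mass in the goal state.

import itertools
import numpy as np

def method_one(P, start, goal, k):
    # P: dict mapping each action label to its stochastic transition matrix.
    # start, goal: state indices.  Returns the best plan found and the
    # probability mass it places in the goal state.
    n = next(iter(P.values())).shape[0]
    init = np.zeros(n)
    init[start] = 1.0                      # initial hyperstate
    best_plan, best_prob = (), 0.0
    for length in range(1, k + 1):
        for plan in itertools.product(P.keys(), repeat=length):
            hyper = init.copy()
            for action in plan:            # hyperstate propagation: A' = A P_u
                hyper = hyper @ P[action]
            if hyper[goal] > best_prob:
                best_plan, best_prob = plan, hyper[goal]
    return best_plan, best_prob

With m actions this examines all m^k plans of length k, consistent with the O(n^2 m^k) bound derived in Section 7.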

6. PLANNING WITH METHOD II
An alternative planning method is based on heuristic graph search. The transition probability matrices described in Section 4 define a graph, or more correctly, a multigraph. The vertices of the graph are the states that were defined previously (the object configurations) and the graph's edges are the actions that cause one state to be changed to another. Associated with each edge is an estimate of the probability that the action will provide the indicated state change. The graph is a multigraph because there may be multiple edges possible between a single pair of vertices, corresponding to multiple actions causing the same state transition.



Fig. 3. A portion of the robot's task model.

Figure 3 shows a small portion of the graph defined by our data. Above each edge is a tilt azimuth. Below each edge is the associated transition probability. Note that when the object is in state (ne v), and a tilt of azimuth 300 is applied, the object's new state is uncertain. It is estimated that it has a 61% chance of achieving the (nw h) configuration, and a 38% chance of moving to the (n h) configuration. Of course, not all transitions are shown in Figure 3. The sum of the probabilities for a particular action executed from a particular state must be equal to one.

The planning problem of finding a sequence of actions that will transform a given current state to a desired goal state can now be viewed as finding a path in the graph that links the two states. Since our model of state transitions is stochastic, we naturally wish to find a plan (path) with high probability of achieving the goal. When a plan is executed, several actions are performed in succession. The action probabilities for the plan steps must be combined to give an estimate of the probability of success for the whole plan.

A lower bound on the probability that a plan will reach a desired state can be computed by multiplying the probabilities along one path between initial and desired states. Such a computation produces only a lower bound because there can exist multiple paths between the initial and desired states that share the same sequence of action labels. We can find a plan that maximizes this lower bound by using shortest-path graph search. This leads to an efficient algorithm for finding plans that we will call Method II.

Method II is an example of uniform-cost search [34]. While uniform-cost search is usually cast as finding a minimum cost path in a graph (where the cost of a path is the sum of the costs of the edges in the path), it is easy to adapt the algorithm to our problem by changing the evaluation function. In finding a minimum cost path in a graph, one desires to find a sequence of edges linking the start and goal vertices such that the sum of the costs of those edges is as small as that of any other such sequence. Our problem is to find a sequence of edges linking the start and goal vertices such that the product of the probabilities is as large as any other such sequence. It is possible to map our problem exactly onto a shortest path problem by transforming the values associated with edges. For each edge probability p, we consider a new edge value (-log p). In this way, we guarantee that finding a shortest path in the transformed graph will correspond exactly to a maximum product probability path in the original graph.
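A sketch of the same idea in Python (ours; the paper's implementation works with probability products directly and searches backward from the goal, whereas this hypothetical version searches forward using the -log p transformation, which yields the same maximum-product path):

import heapq
import math

def method_two(P, start, goal):
    # P: dict mapping each action label to its stochastic transition matrix.
    # An edge i -> j under action u with probability p > 0 gets weight
    # -log(p), so the shortest path maximizes the product of probabilities.
    n = next(iter(P.values())).shape[0]
    dist = {start: 0.0}
    frontier = [(0.0, start, ())]              # (cost so far, state, plan so far)
    while frontier:
        cost, state, plan = heapq.heappop(frontier)
        if state == goal:
            return plan, math.exp(-cost)       # plan and its probability lower bound
        if cost > dist.get(state, float("inf")):
            continue                           # stale queue entry
        for action, matrix in P.items():
            for nxt in range(n):
                p = matrix[state, nxt]
                if p <= 0.0:
                    continue
                new_cost = cost - math.log(p)
                if new_cost < dist.get(nxt, float("inf")):
                    dist[nxt] = new_cost
                    heapq.heappush(frontier, (new_cost, nxt, plan + (action,)))
    return None, 0.0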

Since Method II only considers single paths, it sometimes misses good plans. Consider when Method II is applied to the problem of getting from state (ne v) to (ne h), and its stochastic action model is defined by Figure 3. Method II returns the plan (180 330 90). This path's product of action probability estimates is larger than any other single path in the graph. Method II has found a good plan, but note that the plan (300 90) would be better. Under this plan, the configuration achieved after the first tilt is very likely to be either (n h) or (nw h), and the second tilt is highly likely to achieve the goal no matter which intermediate configuration was actually achieved. The combination of paths yields a high probability even though neither single path has higher probability than the path (180 330 90). Note that Method I would return the better plan in this case but we shall show that Method I requires significantly more computational effort to achieve this rigor.

7. COMPUTATIONAL COMPLEXITY
Our planning problem can be related to finding an optimal control policy for an unobserved Markov chain where the control and observation sets are finite [13]. When system state is perfectly observed at each stage and either the transition probabilities are non-stationary and the time horizon is finite or the transition probabilities are stationary and the cost function decays with time, the problem can be solved in polynomial time by dynamic programming techniques. For cases where the system state is only partially observed (in a known element of a partition) at each stage, the problem is PSPACE-Hard, even for stationary, finite-horizon problems.

Finding optimal open-loop plans with stochastic actions has been shown to be NP-Complete [6]. This suggests that an algorithm with good worst-case running time may not exist.

Recall that Method I considers all plans up to some length limit, k. Let n be the number of states and m be the number of actions. There are m^k k-step plans. We can visualize the search for an optimal strategy as proceeding through a tree, where the root node contains the initial hyperstate and has a branch for each action in the action space. Each branch leads to a new hyperstate which in turn has branches for each action. We expand the tree to some fixed depth (horizon) and select the optimal path. To generate each node in the tree we must perform O(n^2) multiplications. The total time for finding the best k-step plan is O(n^2 m^k).

Recall that Method II is based on uniform-cost graph search. Because the edge values (-log p) are nonnegative, uniform cost search on this problem has a monotone heuristic, which implies that whenever a node is expanded, a best path to that node has been found. This means that nodes will never have to be re-expanded, and in a finite graph of n vertices, there can be no more than n node expansions. If there are m actions available at each state, then m is the maximum number of edges that can be between any pair of vertices. Therefore, the maximum amount of work that will have to be done at each node expansion is O(nm), and the complexity of the algorithm is O(n^2 m). The implemented algorithm deals with probabilities and products directly instead of transforming the problem to a shortest path problem, but it performs analogous steps to those of uniform cost search on the transformed graph. So, the complexity of Method II is also O(n^2 m).

Note that if Method I and Method II were applied in a situation where only 1-step plans were desired, then the complexity for planning with the two methods would differ by no more than a constant factor.

Note the exponential factor m^k in Method I's complexity. For Method II, if the number of actions is increased by a constant factor, the planning time increases by a constant factor. However, if Method I is applied to a task with a constant factor c more actions, then the total planning time would grow by the factor c^k, where k is the length of the longest plan to be considered. In this paper, we consider 12 actions available at each step of a plan. The average planning time was 62 seconds for Method I and 0.46 seconds for Method II. If we considered the case where the tray could be tilted at one degree intervals (360 possible actions), and kept the three step plan length bound, then the number of actions would increase by a factor of 30. Method II's planning time would increase to 30 × 0.46, or 14 seconds. However, Method I's planning time would swell to 30^3 × 62 seconds, which is more than 19 days. Thus, Method I becomes less practical as the number of actions to be considered increases.
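As a quick check of these figures:

$$30 \times 0.46\ \text{s} \approx 13.8\ \text{s}, \qquad 30^{3} \times 62\ \text{s} = 27{,}000 \times 62\ \text{s} = 1{,}674{,}000\ \text{s} \approx 19.4\ \text{days}.$$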

8. EMPIRICAL COMPARISONS OF THE PLANNERS
The two planning methods were implemented in Common Lisp. To explore performance tradeoffs, we performed several experiments with the tray-tilting task. In all experiments reported in this paper, we used an 11 inch square tray and a 1 by 3 inch rectangular tile (Figure 2).

In Section 3, we described the state space of possible tile configurations for the tray-tilting tasks. There are nine possible discrete positions and two discrete orientations, making a total of eighteen possible configurations. It turns out that only twelve of these configurations occur in practice. When the tile is in one of the four corners, both orientations are possible, for a total of eight configurations. The two configurations in the middle of the tray are impossible. For the remaining positions - where the rectangular tile is against a side of the tray - only the orientation where the tile's major axis parallels the tray wall occurs in practice. This adds four more possible configurations, for the total of twelve.

For twelve possible tile configurations, there are 144 pairs of configurations defining start state and goal state. Let us refer to each of these pairs as a problem. There are 132 non-trivial problems for our tray domain. (The twelve problems with start state and goal state equal to each other are trivial, since a null plan always solves the problem.) We ran the two planners on each of the 132 non-trivial problems. Table I lists some of the resulting plans.

For 51 (39%) of the problems, the two planners produced identical plans. In many other cases, the two planners produced similar or symmetrically equivalent plans. In nearly every case, the estimated success ratios of the two methods were within a few percentage points of each other. Note that for plans of length ≤ 3, the estimated success ratio P for Method II is less than or equal to that for Method I, since in those cases, Method II's estimate is a lower bound on the Method I estimate.

On some problems, the planners did not agree. In Table I we have listed three such problems (the last three entries). Since Method I was limited to searching for plans of three steps or shorter, there were occasions when Method II was able to search deeper and find a superior plan. The four and five step plans listed in the table are two such examples. On some problems Method I was able to find a superior plan within its length bound by taking advantage of state convergence. The plan (300 90) is an example. In this problem, the initial state is the northeast corner of the tray in a vertical configuration, and the desired goal is the same corner, but in a horizontal configuration. Figure 4 shows a trace of the anticipated object locations as each of the plans proceeds. Method II (below) finds an adequate plan: It tilts the tray at 180, moving the object to the (se v) configuration. Then it tilts at 330, moving the object nominally to (n h). Finally, it tilts at 90, moving the object to (ne h). This plan is good, but it can fail by (for example) the tilt 330 not aligning the object horizontally. Method I's plan (above) is better since its intermediate hyperstate aligns the object horizontally but causes divergence of its position. The second tilt of the plan causes all such intermediate states to converge to (ne h).

Both planners were implemented as compiled Common Lisp programs and were tested on the same computer, a Sun 3/280. Method I's search was truncated at depth three, and so all of its plans were between one and three steps in length. Method II's plans were all between one and six steps in length. Over the 132 problems, Method I took an average of 62 seconds real time per problem, with a standard deviation of 2.8 seconds. The average time for Method II on the same problems was 0.46 seconds, with a standard deviation of 0.62 seconds. The minimum planning time for Method I was 59 seconds and the maximum was 87 seconds. The minimum planning time for Method II was 0.040 seconds and the maximum was 6.5 seconds. The average length of plans found by Method I was 2.4 tilts with a standard deviation of 0.75 tilts. The average plan length for the Method II planner was also 2.4 tilts, but with a standard deviation of 1.2 tilts.



Table I. Comparison of plans generated by the two methods.

Problem              Method I                   Method II
Start     Goal       Plan             P         Plan                   P
(n h)     (ne h)     (90)             0.98      (90)                   0.98
(n h)     (ne v)     (240 180 60)     0.97      (240 180 60)           0.96
(n h)     (e v)      (120)            0.97      (120)                  0.97
(n h)     (se h)     (90 180 90)      0.98      (270 150)              0.96
(n h)     (se v)     (150 30 180)     0.98      (240 0 120)            0.97
(n h)     (s h)      (240 150)        0.97      (240 150)              0.97
(n h)     (sw h)     (270 270 180)    0.97      (270 180)              0.96
(n h)     (sw v)     (240 180)        0.98      (240 180)              0.97
(n h)     (w v)      (240)            0.99      (240)                  0.99
(n h)     (nw h)     (270 30 270)     0.97      (270)                  0.97
(n h)     (nw v)     (240 0)          0.98      (240 0)                0.98
(sw v)    (n h)      (120 300 30)     0.62      (60 180 240 300 30)    0.75
(ne v)    (n h)      (270 30 60)      0.67      (180 240 300 30)       0.75
(ne v)    (ne h)     (300 90)         0.98      (180 330 90)           0.87

8.1 Physical test of resulting plans
We further tested the last two problems of Table I with the robot. These two problems represented cases where the methods produced significantly different plans. In terms of predicted success ratios (probability of reaching the goal), Method II found a better plan for the first problem because it was able to search deeper. In the second problem, Method I found a better plan because it considered state divergence and state convergence.

Table II summarizes the head-to-head competition. Each of the four plans was executed 200 times. In the first problem, the estimated success ratio was lower than the observed success ratio. There are at least two explanations for this. First, the estimates for these plans may be low because insufficient data had been collected to predict the correct transition probabilities for the actions comprising the plans. This could be corrected by taking more experimental trials prior to planning. A second explanation is that the circumstances under which the data were collected did not truly reflect the circumstances under which the plans were executed. The data were collected by selecting tilt azimuths at random from the set of twelve nominal azimuths, and this may have led to a greater variety of tilt outcomes than is the case when a sequence of high-quality plans is executed. Since we don't know a priori which plans are the best, it is unclear how to correct this possible source of error.

Fig. 4. Two tray-tilting plans. The plans are read from left to right, with intermediate states shown as views from above the tray. Tray tilt directions are shown between the states that they link. The problem is to re-orient the object from vertical to horizontal, leaving the object in the upper right corner of the tray. Above: Method I's plan (300 90). Below: Method II's plan (180 330 90).

In the second problem, the estimated and observed success rates were nearly equal. In both cases selected for detailed testing, the plan with higher estimated success rate performed better.

9. DISCUSSION
We began this paper by acknowledging that it is not always possible to predict the exact outcome of actions in robotic manipulation due to factors such as control error, friction, and dynamics. In response, we considered stochastic models of action. When actions are viewed as stochastic, the conventional planning paradigm must be modified to search for plans that achieve a goal with high probability.

For the class of problems where state and action spaces are finite, we can view planning as finding a sequence of actions that transfer probability mass from the initial state to the desired final state. Certainly it is true that the state and action spaces in many robot planning problems are not finite. However, they can often be discretized using a uniform grid or preferably, with a "criticality"-based partition justified by system geometry [35].

Planning is affected by two contrasting properties of stochastic actions. We say that an action exhibits state divergence when it distributes probability mass. We say that an action exhibits state convergence when it focuses probability mass. These properties affect the choice of planning method. We considered two methods.



Table II. Summary of execution trials for four plans. The estimated and measured success columns show the number of successful plan executions (out of 200 trials) along with the corresponding decimal fractions. Method II finds a better plan in the first instance because Method I neglects plans longer than 3 steps. However, in the second instance, Method I finds a better plan because Method II neglects plans which incur state divergence.

Start     Goal      Planning     Best                 Estimated     Measured
State     State     Approach     Plan                 Successes     Successes
(ne v)    (n h)     Method I     (270 30 60)          134 (0.67)    170 (0.85)
(ne v)    (n h)     Method II    (180 240 300 30)     150 (0.75)    192 (0.96)
(ne v)    (ne h)    Method I     (300 90)             196 (0.98)    198 (0.99)
(ne v)    (ne h)    Method II    (180 330 90)         174 (0.87)    171 (0.855)

Method I uses a forward exhaustive search (to a fixed plan length) to find a plan most likely to produce the desired goal. Method II uses a backward best-first search to find a plan that maximizes a lower bound on the probability of reaching the goal. Method I keeps track of all probability mass as it evaluates plans, monitoring both state divergence and state convergence as the plan progresses. However, keeping track of all probability mass requires substantial computation; Method I can only consider short plans. Method II keeps track of only the most probable trajectory, monitoring state divergence and ignoring state convergence. Accordingly, Method II is faster and can consider longer plans, but it sometimes misses good plans.

Which method is better? It depends on the available planning time and the degree of state divergence in the available actions. Because Method I checks all plans requiring fewer than k actions, Method II cannot find a better plan within this depth. However, depending on the number of available actions, the achievable value of k could be quite small, prompting us to prefer Method II, especially when we can avoid or minimize state divergence.

Sometimes divergence is unavoidable. For example, consider a tray-tilting problem where we want to achieve a particular object configuration but the initial configuration is unknown. This is equivalent to a highly divergent action occurring before plan execution. It is as if someone randomly shook the tray prior to the robot carrying out its plan. Consider a case where the initial hyperstate reflects a uniformly distributed state probability. Method I can find the best plan for getting the tile into the northeast corner. Method II can't solve this problem. As another example, consider programming a robot to pick up parts on an assembly line. Each time a part arrives, its initial position and orientation relative to the robot will be slightly different. This results from a divergent action earlier in the assembly line.

Sometimes state divergence is desirable. For example, it may be more efficient to allow divergence followed by effective convergence. Consider a typical plan for causing two gears to mesh: we jiggle the gears at random (divergence), until the gears fall into alignment (convergence). This plan is more efficient than trying to avoid divergence by carefully aligning the gears. The process of deliberately incurring divergence is known as randomization, and has been recently identified as an important component of manipulation [36].

For most problems in the tray-tilting domain, Method II found plans that were comparable to those found by Method I. Studying the resulting plans, we discovered it was often possible to avoid significant state divergence. This is a property of the tray-tilting task, and is not guaranteed to be present in all manipulation tasks. For a few problems, it was better to incur state divergence followed by state convergence. An example of such a problem was given in Figure 3. On a problem like this, Method I is superior.

It may be possible to build a hybrid of these two methods: an efficient planner that considers some state convergence. Instead of tracking the probabilities for every state, as Method I does, or tracking the probability of only the most likely state, as Method II does, maybe the hypothetical planner could track the probabilities of the two or three most likely states. Like Method II, the hypothetical planner would, in most cases, find a plan of near-optimal quality. Even more desirable would be an approximation algorithm, where we might be able to guarantee, for all problems, that the plans produced would be suboptimal by no more than a fixed constant factor.

Uncertainty is inevitable for any robot operating in the physical environment. When the outcome of each potential action is unpredictable, planning a reliable sequence of robot actions is extremely difficult. Yet reliability is essential if robots are to be adopted in new applications. The past thirty years of research in robotics has taught us that it is extremely difficult to derive accurate analytic models for simple actions such as pushing a part across a table. More complex actions are even harder to model analytically. One alternative is to consider stochastic models obtained from physical observations. In this paper we have considered the problem of planning with such models.

Since finding optimal plans is known to be NP-Complete, we compared an exponential-time planner that finds optimal plans with a polynomial-time planner that finds sub-optimal plans. Intuitively, there should be a tradeoff between planning time and plan performance. We explored this tradeoff by comparing the two algorithms in the context of a physical application. These experiments led us to identify two properties of actions that can aid in choosing between the algorithms for future applications where state and action spaces are finite.

Acknowledgements
We thank Matt Mason for suggesting this collaboration. We also thank Matt, Rob Kass, Kevin Lynch, Tom Mitchell, Steve Shreve, and Manuela Veloso for helpful comments. This paper is a revised and expanded version of a paper that appeared in the proceedings of the DARPA Workshop on Innovative Approaches to Planning, Scheduling, and Control, in November 1990.

References
1. K.J. Astrom, "Adaptive feedback control" Proceedings of the IEEE 75(2) 185-217 (1987).
2. M.H. DeGroot, Optimal Statistical Decisions (McGraw-Hill, New York, 1970).
3. J.O. Berger, Statistical Decision Theory and Bayesian Analysis (Springer-Verlag, Berlin, 1985).
4. R.A. Howard, Dynamic Probabilistic Systems (2 volumes) (John Wiley, New York, 1971).
5. E.B. Dynkin and A.A. Yushkevich, Controlled Markov Processes (Springer-Verlag, Berlin, 1979).
6. C.H. Papadimitriou and J.N. Tsitsiklis, "The complexity of Markov decision processes" Mathematics of Operations Research 12(3) 441-450 (August 1987).
7. J.A. Feldman and R.F. Sproull, "Decision theory and artificial intelligence II: The hungry monkey" Cognitive Science 1, 158-192 (1977).
8. J. Pearl, Probabilistic Reasoning in Intelligent Systems (Morgan-Kaufmann, Los Altos, California, 1988).
9. S. Russell and E. Wefald, "Decision-theoretic control of reasoning: General theory and an application to game playing" Technical Report UCB/CSD 88/435 (UC Berkeley, October 1988).
10. P. Cheeseman, "In defense of probability" Proceedings of the Ninth International Joint Conference on Artificial Intelligence, Los Angeles, CA (August 1985) pp. 1002-1009.
11. H.A. Simon, "A behavioral model of rational choice" Quart. J. Economics 69, 99-118 (1955). Reprinted in Models of Thought (Yale University Press, 1979).
12. O. Etzioni, "Tractable decision-analytic control" Proceedings of the First International Conference on Principles of Knowledge Representation and Reasoning (1989) (Morgan Kaufmann, San Mateo, 1989). An expanded version is available as technical report CMU-CS-89-119.
13. D.P. Bertsekas, Dynamic Programming: Deterministic and Stochastic Models (Prentice-Hall, Englewood Cliffs, New Jersey, 1987).
14. T.L. Dean and M.P. Wellman, Planning and Control (Morgan Kaufmann, San Mateo, California, 1991).
15. J. Simons, H. Van Brussel, J. De Schutter and J. Verhaert, "A self-learning automaton with variable resolution for high precision assembly by industrial robots" IEEE Transactions on Automatic Control AC-27(5) 1109-1113 (October 1982).
16. K.S. Narendra and M.A.L. Thathachar, Learning Automata: An Introduction (Prentice Hall, Englewood Cliffs, New Jersey, 1988).
17. B. Dufay and J.-C. Latombe, "An approach to automatic robot programming based on inductive learning" International Symposium on Robotics Research (1983) pp. 97-115.
18. A.G. Barto, "Connectionist learning for control: An overview" Technical Report COINS 89-89 (University of Massachusetts-Amherst, September 1989).
19. S.J. Buckley, "Teaching compliant motion strategies" IEEE J. Robotics and Automation 5(1) 112-118 (1989).
20. K.P. Gross, "Concept Acquisition through Attribute Evolution and Experimentation" PhD thesis (Carnegie Mellon University, School of Computer Science, May 1991).
21. S. Bennett and G. DeJong, "Comparing stochastic planning to the acquisition of increasingly permissive plans for complex, uncertain domains" International Workshop on Machine Learning (June 1991) pp. 586-590.
22. A.D. Christiansen, "Automatic Acquisition of Task Theories for Robotic Manipulation" PhD thesis (Carnegie Mellon University, School of Computer Science, March 1992).
23. M.T. Mason, "Mechanics and planning of manipulator pushing operations" Int. J. Robotics Research 5(3) 53-71 (1986).
24. R.C. Brost, "Automatic grasp planning in the presence of uncertainty" Int. J. Robotics Research 7(1) 3-17 (February 1988).
25. J.C. Trinkle, J.M. Abel and R.P. Paul, "An investigation of frictionless enveloping grasping in the plane" Int. J. Robotics Research 7(3) 33-51 (June 1988).
26. M.A. Peshkin, "The motion of a pushed, sliding workpiece" IEEE Transactions on Robotics and Automation 4(6) 569-598 (December 1988).
27. R.C. Brost, "Analysis and Planning of Planar Manipulation Tasks" PhD thesis (Carnegie Mellon University, School of Computer Science, January 1991).
28. M.A. Erdmann and M.T. Mason, "An exploration of sensorless manipulation" IEEE J. Robotics and Automation 4(4) 369-379 (August 1988).
29. R.H. Taylor, M.T. Mason and K.Y. Goldberg, "Sensor-based manipulation planning as a game with nature" International Symposium on Robotics Research (August 1987) pp. 421-429.
30. A.D. Christiansen, "Manipulation planning from empirical backprojections" IEEE International Conference on Robotics and Automation (April 1991) pp. 762-768.
31. A.D. Christiansen, M.T. Mason and T.M. Mitchell, "Learning reliable manipulation strategies without initial physical models" Robotics and Autonomous Systems 8, 7-18 (November 1991).
32. A.D. Christiansen, "Learning to predict in uncertain continuous tasks" International Conference on Machine Learning (July 1992) pp. 72-81.
33. K.Y. Goldberg, "Stochastic Plans for Robotic Manipulation" PhD thesis (Carnegie Mellon University, School of Computer Science, August 1990).
34. J. Pearl, Heuristics: Intelligent Search Strategies for Computer Problem Solving (Addison-Wesley, Reading, Massachusetts, 1984).
35. R. Wilson and J.-C. Latombe, "On the qualitative structure of a mechanical assembly" National Conference on Artificial Intelligence (1992) pp. 697-702.
36. M.A. Erdmann, "On Probabilistic Strategies for Robot Tasks" PhD thesis (MIT, Cambridge, MA, 1989).