Concurrent Probabilistic Temporal Planning (CPTP) Mausam Joint work with Daniel S. Weld University of Washington Seattle


Page 1: Concurrent Probabilistic Temporal Planning (CPTP)

Concurrent Probabilistic Temporal Planning (CPTP)

Mausam (joint work with Daniel S. Weld), University of Washington, Seattle

Page 2: Concurrent Probabilistic Temporal Planning (CPTP)

Motivation

Three features of real-world planning domains:
  • Durative actions: all actions (navigation between sites, placing instruments, etc.) take time.
  • Concurrency: some instruments may warm up while others perform their tasks and others shut down to save power.
  • Uncertainty: all actions (pick up the rock, send data, etc.) have a probability of failure.

Page 3: Concurrent Probabilistic Temporal Planning (CPTP)

Motivation (contd.)

Concurrent temporal planning (widely studied with deterministic effects)
  • Extends classical planning
  • Doesn't easily extend to probabilistic outcomes.

Concurrent planning with uncertainty (Concurrent MDPs, AAAI'04)
  • Handles combinations of actions over an MDP
  • Actions take unit time.

Few planners handle the three in concert!

Page 4: Concurrent Probabilistic Temporal Planning (CPTP)

Outline of the talk

  • MDP and CoMDP
  • Concurrent Probabilistic Temporal Planning
      • Concurrent MDP in augmented state space.
  • Solution Methods for CPTP
      • Two heuristics to guide the search
      • Hybridisation
  • Experiments & Conclusions
  • Related & Future Work

Page 5: Concurrent Probabilistic Temporal Planning (CPTP)

Markov Decision Process

  • S : a set of states, factored into Boolean variables.
  • A : a set of actions, each of unit duration
  • Pr : S × A × S → [0,1], the transition model
  • C : A → ℝ, the cost model
  • s0 : the start state
  • G : a set of absorbing goals

Page 6: Concurrent Probabilistic Temporal Planning (CPTP)

GOAL of an MDP

Find a policy π : S → A which:
  • minimises the expected cost of reaching a goal
  • for a fully observable Markov decision process
  • if the agent executes for an indefinite horizon.

Page 7: Concurrent Probabilistic Temporal Planning (CPTP)

Equations : optimal policy

Define J*(s) {optimal cost} as the minimum expected cost to reach a goal from s.

J* should satisfy:
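In the notation of the previous slides, the standard Bellman optimality conditions for this kind of goal-directed MDP can be written as below. This is a reconstruction under the usual stochastic shortest-path assumptions; Ap(s) denotes the set of actions applicable in s, and Pr(s' | s, a) is the transition model read as a conditional probability.

```latex
J^*(s) =
\begin{cases}
0 & \text{if } s \in G, \\[4pt]
\displaystyle \min_{a \in Ap(s)} \Big[ C(a) + \sum_{s' \in S} \Pr(s' \mid s, a)\, J^*(s') \Big] & \text{otherwise.}
\end{cases}
```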

Page 8: Concurrent Probabilistic Temporal Planning (CPTP)

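Bellman Backup

[Figure: Bellman backup at state s. For each applicable action in Ap(s) (a1, a2, a3), Qn+1(s,a) is computed from the current values Jn of the successor states; Jn+1(s) is the minimum of these Q-values.]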
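The backup pictured on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the dictionary-based encoding of the transition model, the function name `bellman_backup`, and the toy two-action problem are all assumptions made for the example.

```python
from typing import Callable, Dict, Hashable, Iterable, List, Optional, Set, Tuple

State = Hashable
Action = Hashable
# Assumed encoding: transitions[(s, a)] is a list of (next_state, probability).
Transitions = Dict[Tuple[State, Action], List[Tuple[State, float]]]


def bellman_backup(
    s: State,
    applicable: Callable[[State], Iterable[Action]],
    cost: Dict[Action, float],
    transitions: Transitions,
    J: Dict[State, float],
    goals: Set[State],
) -> Tuple[float, Optional[Action]]:
    """Return (J_{n+1}(s), greedy action), computed from the current values J_n."""
    if s in goals:
        return 0.0, None
    best_q, best_a = float("inf"), None
    for a in applicable(s):
        # Q_{n+1}(s, a) = C(a) + sum over s' of Pr(s' | s, a) * J_n(s')
        q = cost[a] + sum(p * J.get(s2, 0.0) for s2, p in transitions[(s, a)])
        if q < best_q:
            best_q, best_a = q, a
    return best_q, best_a


# Toy problem (hypothetical): a1 is slow but certain, a2 is cheap but may fail.
transitions = {
    ("s", "a1"): [("g", 1.0)],
    ("s", "a2"): [("g", 0.5), ("s", 0.5)],
}
cost = {"a1": 3.0, "a2": 1.0}
J = {"s": 0.0, "g": 0.0}
value, action = bellman_backup("s", lambda _s: ["a1", "a2"], cost, transitions, J, {"g"})
```

With the all-zero initial values, the backup prefers a2 because its one-step cost is lower; repeated backups would raise J(s) towards its fixed point.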

Page 9: Concurrent Probabilistic Temporal Planning (CPTP)

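RTDP Trial

[Figure: an RTDP trial. A Bellman backup at state s over Ap(s) = {a1, a2, a3} yields Qn+1(s,a) and Jn+1(s); the greedy action amin = a2 is followed, and the trial continues towards the Goal.]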

Page 10: Concurrent Probabilistic Temporal Planning (CPTP)

Real Time Dynamic Programming (Barto, Bradtke and Singh'95)

Trial: simulate the greedy policy; perform a Bellman backup on visited states.

Repeat RTDP trials until the cost function converges
  • Anytime behaviour
  • Only expands reachable state space
  • Complete convergence is slow

Labeled RTDP (Bonet & Geffner'03)
  • Admissible, if started with an admissible cost function (optimistic, i.e. a lower bound).
  • Monotonic; converges quickly
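A trial-based loop of this kind might look as follows. It is a simplified sketch of plain RTDP (no labelling, a fixed number of trials instead of a convergence test); the function name `rtdp`, the dictionary encoding of the model, and the toy problem are assumptions for illustration only.

```python
import random
from typing import Callable, Dict, Hashable, Iterable, List, Set, Tuple

State = Hashable
Action = Hashable


def rtdp(
    s0: State,
    goals: Set[State],
    applicable: Callable[[State], Iterable[Action]],
    cost: Dict[Action, float],
    transitions: Dict[Tuple[State, Action], List[Tuple[State, float]]],
    J: Dict[State, float],
    trials: int = 200,
    max_steps: int = 100,
    seed: int = 0,
) -> Dict[State, float]:
    """Run a fixed number of RTDP trials, updating the cost function J in place."""
    rng = random.Random(seed)
    for _ in range(trials):
        s = s0
        for _ in range(max_steps):
            if s in goals:
                break
            # Bellman backup on the visited state.
            q = {
                a: cost[a] + sum(p * J.get(t, 0.0) for t, p in transitions[(s, a)])
                for a in applicable(s)
            }
            greedy = min(q, key=q.get)
            J[s] = q[greedy]
            # Simulate the greedy action by sampling one of its outcomes.
            successors, probs = zip(*transitions[(s, greedy)])
            s = rng.choices(successors, weights=probs)[0]
    return J


# Toy problem (hypothetical): a1 costs 3 and always succeeds; a2 costs 1 and
# succeeds half the time, so its expected cost to the goal is 2.
transitions = {
    ("s", "a1"): [("g", 1.0)],
    ("s", "a2"): [("g", 0.5), ("s", 0.5)],
}
cost = {"a1": 3.0, "a2": 1.0}
J = rtdp("s", {"g"}, lambda _s: ["a1", "a2"], cost, transitions, {"s": 0.0})
```

Starting from the admissible (all-zero) cost function, J("s") rises monotonically towards the optimal expected cost of 2 and never overshoots it.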

Page 11: Concurrent Probabilistic Temporal Planning (CPTP)

Concurrent MDP (CoMDP) (Mausam & Weld'04)

Allows concurrent combinations of actions

Safe execution: inherit mutex definitions from classical planning:
  • Conflicting preconditions
  • Conflicting effects
  • Interfering preconditions and effects

Page 12: Concurrent Probabilistic Temporal Planning (CPTP)

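Bellman Backup (CoMDP)

[Figure: Bellman backup at state s over every combination of the applicable actions Ap(s): a1, a2, a3, {a1,a2}, {a1,a3}, {a2,a3} and {a1,a2,a3}. Each combination is evaluated from the values Jn of its successor states, and Jn+1(s) is the minimum over all combinations.]

Exponential blowup to calculate a Bellman backup!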

Page 13: Concurrent Probabilistic Temporal Planning (CPTP)

Sampled RTDP

RTDP with stochastic (partial) backups:
  • Approximate
  • Always try the last best combination
  • Randomly sample a few other combinations

In practice
  • Close to optimal solutions
  • Converges very fast
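The partial backup described on this slide could be sketched as follows. The sketch assumes uniform random sampling of combinations (the sampling distribution is not given here), omits mutex filtering, and uses a hypothetical `q_value` callback in place of a real Q-value computation.

```python
import random
from itertools import combinations
from typing import Callable, Dict, FrozenSet, Hashable, Iterable, List, Tuple

Combo = FrozenSet[str]


def all_combinations(actions: Iterable[str]) -> List[Combo]:
    """Every non-empty subset of the actions: 2**n - 1 combinations for n actions."""
    acts = list(actions)
    return [frozenset(c) for r in range(1, len(acts) + 1) for c in combinations(acts, r)]


def sampled_backup(
    s: Hashable,
    combos: List[Combo],
    q_value: Callable[[Hashable, Combo], float],
    last_best: Dict[Hashable, Combo],
    num_samples: int,
    rng: random.Random,
) -> Tuple[float, Combo]:
    """Back up s over a random sample of combinations plus the last best one."""
    candidates = set(rng.sample(combos, min(num_samples, len(combos))))
    if s in last_best:
        candidates.add(last_best[s])  # always retry the previous winner
    best = min(candidates, key=lambda c: q_value(s, c))
    last_best[s] = best
    return q_value(s, best), best


# Hypothetical Q-values: longest duration in the combination plus a penalty
# for every action left out of it.
duration = {"a1": 1, "a2": 2, "a3": 3}
q = lambda _s, c: max(duration[a] for a in c) + 5 * (3 - len(c))

combos = all_combinations(duration)
rng = random.Random(0)
last_best: Dict[Hashable, Combo] = {}
full_value, full_best = sampled_backup("s", combos, q, last_best, len(combos), rng)
# With zero fresh samples, the backup still evaluates the remembered combination.
cached_value, cached_best = sampled_backup("s", combos, q, last_best, 0, rng)
```

Remembering the last best combination means a good choice, once found, is never lost to an unlucky sample.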

Page 14: Concurrent Probabilistic Temporal Planning (CPTP)

Outline of the talk

  • MDP and CoMDP
  • Concurrent Probabilistic Temporal Planning
      • Concurrent MDP in augmented state space.
  • Solution Methods for CPTP
      • Two heuristics to guide the search
      • Hybridisation
  • Experiments & Conclusions
  • Related & Future Work

Page 15: Concurrent Probabilistic Temporal Planning (CPTP)

Modelling CPTP as CoMDP

CoMDP → CPTP
  • Model explicit action durations
  • Minimise expected make-span.

If we initialise C(a) as its duration Δ(a):

[Figure: aligned epochs contrasted with interwoven epochs]

Page 16: Concurrent Probabilistic Temporal Planning (CPTP)

Augmented state space

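[Figure: timeline (Time axis marked 0, 3, 6, 9) of overlapping actions a to h executed from world state X.]

Example augmented states:
  • <X, ∅>
  • <X1, {(a,1), (c,3)}>, where X1 is the application of b on X.
  • <X2, {(h,1)}>, where X2 is the application of a, b, c, d and e over X.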

Page 17: Concurrent Probabilistic Temporal Planning (CPTP)

Simplifying assumptions

  • All actions have deterministic durations.
  • All action durations are integers.
  • Action model
      • Preconditions must hold until the end of the action.
      • Effects are usable only at the end of the action.

Properties:
  • Mutex rules are still required.
  • It is sufficient to consider only epochs when an action ends.

Page 18: Concurrent Probabilistic Temporal Planning (CPTP)

Completing the CoMDP

Redefine
  • Applicability set
  • Transition function
  • Start and goal states.

Example: the transition function is redefined
  • The agent moves forward in time to an epoch where some action completes.

Start state: <s0, ∅>, etc.
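The time-advance part of the redefined transition function can be illustrated with a small sketch. It assumes, purely for this example, that each executing action is stored with its remaining time (the talk's own pair notation may follow a different convention), and it leaves the probabilistic application of effects to the caller. The names `start_actions` and `advance` are hypothetical.

```python
from typing import Dict, FrozenSet, Iterable, Tuple

# Assumed encoding: (world state, {(action, remaining time)}).
Running = FrozenSet[Tuple[str, int]]
Interwoven = Tuple[str, Running]


def start_actions(state: Interwoven, new_actions: Iterable[str],
                  duration: Dict[str, int]) -> Interwoven:
    """Add newly started actions, each with its full (integer) duration remaining."""
    world, running = state
    return world, running | frozenset((a, duration[a]) for a in new_actions)


def advance(state: Interwoven) -> Tuple[int, FrozenSet[str], Interwoven]:
    """Move forward to the next epoch at which some action completes.

    Returns (elapsed time, actions that just finished, new state). The world
    state is returned unchanged; applying the finished actions' effects is
    left to the caller.
    """
    world, running = state
    if not running:
        return 0, frozenset(), state
    dt = min(rem for _, rem in running)
    finished = frozenset(a for a, rem in running if rem == dt)
    still_running = frozenset((a, rem - dt) for a, rem in running if rem > dt)
    return dt, finished, (world, still_running)


# Hypothetical durations for three actions started together in world state X.
duration = {"a": 2, "b": 8, "c": 4}
state = start_actions(("X", frozenset()), ["a", "b", "c"], duration)
dt, finished, next_state = advance(state)
```

Because durations are integers and effects appear only when an action ends, jumping directly to the next completion time skips no decision point that matters.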

Page 19: Concurrent Probabilistic Temporal Planning (CPTP)

Solution

CPTP = CoMDP in interwoven state space.

Thus one may use our sampled RTDP (etc.).

PROBLEM: Exponential blowup in the size of the state space.

Page 20: Concurrent Probabilistic Temporal Planning (CPTP)

Outline of the talk

  • MDP and CoMDP
  • Concurrent Probabilistic Temporal Planning
      • Concurrent MDP in augmented state space.
  • Solution Methods for CPTP
      • Solution 1 : Two heuristics to guide the search
      • Solution 2 : Hybridisation
  • Experiments & Conclusions
  • Related & Future Work

Page 21: Concurrent Probabilistic Temporal Planning (CPTP)

Max Concurrency Heuristic (MC)

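Define c : the maximum number of actions executable concurrently in the domain.

Shown here for c = 2:
  • J*(X) ≤ 2 × J*(<X,∅>)
  • J*(<X,∅>) ≥ J*(X)/2

[Figure: a concurrent plan from X to G, with a executing in parallel with b and c, has J*(<X,∅>) = 10; its serialisation a, b, c from X to G gives J*(X) ≤ 20.]

Admissible heuristic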
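An informal argument for the bound, generalising the factor 2 on the slide to an arbitrary c, runs as follows. The symbols T (make-span of one execution trace), n(t) (number of actions executing at time t) and Δ(a) (duration of action a) are introduced here for illustration; the step from a single trace to expected values is sketched rather than proved.

```latex
% Along one execution trace, at most c actions run at any time, so the
% total duration of its serialisation is bounded by c times the make-span:
\sum_{a \in \text{trace}} \Delta(a) \;=\; \sum_{t=0}^{T-1} n(t) \;\le\; c \cdot T .

% The optimal serial plan is no worse than this serialisation, hence
J^*(X) \;\le\; c \cdot J^*(\langle X, \emptyset \rangle)
\quad\Longleftrightarrow\quad
J^*(\langle X, \emptyset \rangle) \;\ge\; \frac{J^*(X)}{c} .
```

Since J*(X)/c never overestimates the interwoven cost, it can serve as an admissible starting cost function for RTDP.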

Page 22: Concurrent Probabilistic Temporal Planning (CPTP)

Eager Effects Heuristic : Solving a relaxed problem

State space : S × ℤ

Let (X, δ) be a state where
  • X is the world state.
  • δ : time remaining for all actions (started anytime in the history) to complete execution.

Start state : (s0, 0)
Goal states : { (X, 0) | X ∈ G }

Page 23: Concurrent Probabilistic Temporal Planning (CPTP)

Eager Effects Heuristic (contd.)

[Figure: actions a, b and c start in world state X (the durations shown are 2, 8 and 4); after 2 units the relaxed problem is in state (V, 6).]

  • Allow all actions even when mutex with a or c! Allowing inapplicable actions to execute, thus optimistic!
  • Assuming information of action effects ahead of time, thus optimistic!

Hence the name: Eager Effects!

Admissible heuristic

Page 24: Concurrent Probabilistic Temporal Planning (CPTP)

Solution 2 : Hybridisation

Observations
  • Aligned epoch policy is sub-optimal but fast to compute.
  • Interwoven epoch policy is optimal but slow to compute.

Solution: produce a hybrid policy, i.e.:
  • Output the interwoven policy for probable states.
  • Output the aligned policy for improbable states.

Page 25: Concurrent Probabilistic Temporal Planning (CPTP)

Path to goals

[Figure: paths from s to goal states G, with one branch marked as low probability.]

Page 26: Concurrent Probabilistic Temporal Planning (CPTP)

Hybrid algorithm (contd.)

Observation: RTDP explores probable branches much more than others.

Algorithm(m, k, r) :
Loop
  • Do m RTDP trials: let the current value of the start state be J(s0) (less than optimal).
  • Output a hybrid policy π:
      • Interwoven policy for states visited > k times
      • Aligned policy for other states.
  • Evaluate the policy: Jπ(s0) (greater than optimal).
  • Stop if {Jπ(s0) − J(s0)} < r × J(s0)
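The loop can be sketched in Python with the expensive parts passed in as callbacks. Everything named here (`hybrid`, `run_rtdp_trials`, `evaluate_policy`, `interwoven_action`, `aligned_action`, `visits`, `max_rounds`) is a hypothetical stand-in; the stub values in the example are made up to show the stopping test and are not experimental results.

```python
from typing import Callable, Dict, Hashable, Optional, Tuple

Policy = Callable[[Hashable], str]


def hybrid(
    run_rtdp_trials: Callable[[int], float],      # runs m trials, returns J(s0)
    evaluate_policy: Callable[[Policy], float],   # returns J_pi(s0)
    interwoven_action: Callable[[Hashable], str],
    aligned_action: Callable[[Hashable], str],
    visits: Dict[Hashable, int],                  # RTDP visit counts per state
    m: int,
    k: int,
    r: float,
    max_rounds: int = 1000,
) -> Tuple[Optional[Policy], float, float]:
    policy: Optional[Policy] = None
    lower = upper = float("nan")
    for _ in range(max_rounds):
        lower = run_rtdp_trials(m)  # J(s0): a lower bound on the optimal cost

        def policy(state: Hashable) -> str:
            # Interwoven policy for frequently visited states, aligned otherwise.
            if visits.get(state, 0) > k:
                return interwoven_action(state)
            return aligned_action(state)

        upper = evaluate_policy(policy)  # J_pi(s0): an upper bound
        if upper - lower < r * lower:
            break
    return policy, lower, upper


# Stubs standing in for the real planner (values are illustrative only).
lower_bounds = iter([8.0, 9.0, 9.8])
rounds = []
policy, lower, upper = hybrid(
    run_rtdp_trials=lambda m: (rounds.append(m), next(lower_bounds))[1],
    evaluate_policy=lambda _pi: 10.0,
    interwoven_action=lambda _s: "interwoven",
    aligned_action=lambda _s: "aligned",
    visits={"s0": 5, "rare": 0},
    m=100, k=2, r=0.05,
)
```

Because the optimal cost lies between the two bounds, the gap tested by the stopping rule also bounds how far the hybrid policy can be from optimal.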

Page 27: Concurrent Probabilistic Temporal Planning (CPTP)

Hybridisation

  • Outputs a proper policy:
      • Policy defined at all reachable policy states
      • Policy guaranteed to take the agent to a goal.
  • Has an optimality ratio (r) parameter
      • Controls the balance between optimality and running time.
  • Can be used as an anytime algorithm.
  • Is general: we can hybridise two algorithms in other cases, e.g. in solving the original concurrent MDP.

Page 28: Concurrent Probabilistic Temporal Planning (CPTP)

Outline of the talk

  • MDP and CoMDP
  • Concurrent Probabilistic Temporal Planning
      • Concurrent MDP in augmented state space.
  • Solution Methods for CPTP
      • Two heuristics to guide the search
      • Hybridisation
  • Experiments & Conclusions
  • Related & Future Work

Page 29: Concurrent Probabilistic Temporal Planning (CPTP)

Experiments

Domains
  • Rover
  • MachineShop
  • Artificial

State variables: 14-26
Durations: 1-20

Page 30: Concurrent Probabilistic Temporal Planning (CPTP)

Speedups in Rover domain

[Figure: Efficiency of different methods. x-axis: different Rover problems (1 to 6); y-axis: time in sec (logarithmic scale, 1 to 10000). Series: Interwoven Epoch, Max Concurrency, Eager Effects, Hybrid Algorithm, Aligned epochs.]

Page 31: Concurrent Probabilistic Temporal Planning (CPTP)

Qualities of solution

[Figure: Solution quality of different methods. x-axis: different Rover problems (1 to 6); y-axis: ratio of make-span to the optimal (0.8 to 1.7). Series: Interwoven Epoch, Max Concurrency, Eager Effects, Hybrid Algorithm, Aligned epochs.]

Page 32: Concurrent Probabilistic Temporal Planning (CPTP)

Experiments : Summary

Max Concurrency heuristic
  • Fast to compute
  • Speeds up the search.

Eager Effects heuristic
  • High quality
  • Can be expensive in some domains.

Hybrid algorithm
  • Very fast
  • Produces good quality solutions.

Aligned epoch model
  • Superfast
  • Outputs poor quality solutions at times.

Page 33: Concurrent Probabilistic Temporal Planning (CPTP)

Related Work

Prottle (Little, Aberdeen, Thiebaux’05)

Generate, test and debug paradigm (Younes & Simmons’04)

Concurrent options (Rohanimanesh & Mahadevan’04)

Page 34: Concurrent Probabilistic Temporal Planning (CPTP)

Future Work

Other applications of hybridisation
  • CoMDP
  • MDP
  • OverSubscription Planning

Relaxing the assumptions
  • Handling mixed costs
  • Extending to PDDL2.1
  • Stochastic action durations

Extensions to metric resources

State space compression/aggregation