
Page 1: A Framework for Planning in Continuous-time Stochastic Domains

A Framework for Planning in Continuous-time Stochastic Domains

Håkan L. S. Younes, Carnegie Mellon University

David J. Musliner, Honeywell Laboratories

Reid G. Simmons, Carnegie Mellon University

Page 2: A Framework for Planning in Continuous-time Stochastic Domains

Introduction

Policy generation for complex domains:
- Uncertainty in outcome and timing of actions and events
- Time as a continuous quantity
- Concurrency

Rich goal formalism:
- Achievement, maintenance, prevention
- Deadlines

Page 3: A Framework for Planning in Continuous-time Stochastic Domains

Motivating Example

Deliver package from CMU to Honeywell

[Map: delivery route from CMU via PIT (Pittsburgh) and MSP (Minneapolis) airports to Honeywell]

Page 4: A Framework for Planning in Continuous-time Stochastic Domains

Elements of Uncertainty

Uncertain duration of flight and taxi ride

Plane can get full without reservation

Taxi might not be at airport when arriving in Minneapolis

Package can get lost at airports

Page 5: A Framework for Planning in Continuous-time Stochastic Domains

Modeling Uncertainty

Associate a delay distribution F(t) with each action/event a

F(t) is the cumulative distribution function for the delay from when a is enabled until it triggers

[Diagram: Pittsburgh taxi; the "arrive" event, with delay distribution U(20,40), takes it from "driving" to "at airport"]
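The delay-distribution idea can be sketched in code. This is a minimal illustration under assumed representations (the `Event` class and the lambda-based sampler are not from the slides):

```python
import random

# Minimal sketch (assumed representation, not the authors' code): an
# event carries a delay distribution F(t); when the event becomes
# enabled, we sample the time until it triggers. The Pittsburgh taxi's
# "arrive" event has a uniform U(20, 40) delay, as on the slide.

class Event:
    def __init__(self, name, sample_delay):
        self.name = name
        self.sample_delay = sample_delay  # draws a delay from F(t)

arrive = Event("arrive", lambda: random.uniform(20, 40))

delay = arrive.sample_delay()  # time from enabling until triggering
assert 20 <= delay <= 40
```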

Page 6: A Framework for Planning in Continuous-time Stochastic Domains

Concurrency

Concurrent semi-Markov processes

[Diagram: two concurrent semi-Markov processes. Pittsburgh taxi: "arrive" with delay U(20,40) takes it from "driving" to "at airport". Minneapolis taxi: "move" with delay Exp(1/40) takes it from "at airport" to "moving", and "return" with delay U(10,20) takes it back. Timeline of joint states: at t=0, PT driving and MT at airport; at t=24 the MT "move" event triggers, giving PT driving and MT moving]

Generalized semi-Markov process
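The GSMP semantics can be sketched as a race between enabled events: each samples a clock from its delay distribution, the event with the smallest clock triggers first, and the others keep their remaining time. A hypothetical one-step sketch (event names taken from the slide, everything else assumed):

```python
import random

# One step of generalized semi-Markov process semantics (a sketch of
# the assumed reading of the slide, not the authors' code): each
# enabled event holds a clock sampled from its delay distribution;
# the event with the smallest clock triggers first.

random.seed(0)  # for repeatability of this sketch

clocks = {
    "PT arrive": random.uniform(20, 40),      # Pittsburgh taxi, U(20,40)
    "MT move":   random.expovariate(1 / 40),  # Minneapolis taxi, Exp(1/40)
}

winner = min(clocks, key=clocks.get)          # first event to trigger
elapsed = clocks[winner]
# Events that stay enabled keep their remaining time (clock aging).
remaining = {e: c - elapsed for e, c in clocks.items() if e != winner}

assert all(c >= 0 for c in remaining.values())
```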

Page 7: A Framework for Planning in Continuous-time Stochastic Domains

Rich Goal Formalism

Goals specified as CSL formulae
φ ::= true | a | φ ∧ φ | ¬φ | Pr≥θ(ρ)
ρ ::= φ U≤t φ | ◊≤t φ | □≤t φ

Page 8: A Framework for Planning in Continuous-time Stochastic Domains

Goal for Motivating Example

Probability at least 0.9 that the package reaches Honeywell within 300 minutes without getting lost on the way:

Pr≥0.9(¬pkg lost U≤300 pkg@Honeywell)
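As an illustration of checking this goal against a single execution, one can represent a sample path as timed states and test the time-bounded until directly. The path representation and state names here are assumptions for the sketch:

```python
# Sketch: a sample path is a list of (time, state) pairs; we check the
# time-bounded until "not lost U<=300 at Honeywell" against one path.
# (Representation and state names are assumed for illustration.)

def until(path, holds, target, bound):
    """True iff a target state is reached by time `bound`, with
    `holds` true in every state visited before it."""
    for t, state in path:
        if t > bound:
            return False
        if target(state):
            return True
        if not holds(state):
            return False
    return False

path = [(0, "pkg@CMU"), (45, "pkg@PIT"), (160, "pkg@MSP"),
        (205, "pkg@Honeywell")]
ok = until(path,
           holds=lambda s: s != "pkg lost",
           target=lambda s: s == "pkg@Honeywell",
           bound=300)
assert ok  # this particular path satisfies the goal
```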

Page 9: A Framework for Planning in Continuous-time Stochastic Domains

Problem Specification

Given:
- Complex domain model (stochastic discrete-event system)
- Initial state
- Probabilistic temporally extended goal (CSL formula)

Wanted:
- Policy satisfying the goal formula in the initial state

Page 10: A Framework for Planning in Continuous-time Stochastic Domains

Generate, Test and Debug [Simmons 88]

1. Generate initial policy
2. Test if the policy is good
3. If bad: debug and repair the policy, then repeat from step 2

Page 11: A Framework for Planning in Continuous-time Stochastic Domains

Generate

Ways of generating an initial policy:
- Generate a policy for a relaxed problem
- Use an existing policy for a similar problem
- Start with a null policy
- Start with a random policy

Not the focus of this talk!

Page 12: A Framework for Planning in Continuous-time Stochastic Domains

Test

Use discrete event simulation to generate sample execution paths

Use acceptance sampling to verify probabilistic CSL goal conditions


Page 13: A Framework for Planning in Continuous-time Stochastic Domains

Debug

Analyze sample paths generated in test step to find reasons for failure

Change policy to reflect outcome of failure analysis


Page 14: A Framework for Planning in Continuous-time Stochastic Domains

More on Test Step

[Generate-test-debug loop with the Test step highlighted]

Page 15: A Framework for Planning in Continuous-time Stochastic Domains

Error Due to Sampling

Probability of false negative ≤ α: rejecting a good policy

Probability of false positive ≤ β: accepting a bad policy

(1−α)-soundness

Page 16: A Framework for Planning in Continuous-time Stochastic Domains

Acceptance Sampling

Hypothesis: Pr≥θ(ρ)

Page 17: A Framework for Planning in Continuous-time Stochastic Domains

Performance of Test

[Plot: probability of accepting Pr≥θ(ρ) as true, versus the actual probability of ρ holding; the level 1−α is marked on the vertical axis]

Page 18: A Framework for Planning in Continuous-time Stochastic Domains

Ideal Performance

[Plot: ideal performance; probability of accepting Pr≥θ(ρ) as true versus the actual probability of ρ holding, with the regions of false negatives and false positives marked]

Page 19: A Framework for Planning in Continuous-time Stochastic Domains

Realistic Performance

[Plot: realistic performance; probability of accepting Pr≥θ(ρ) as true versus the actual probability of ρ holding, with an indifference region (θ−δ, θ+δ) where neither error bound is enforced; false negatives and false positives marked outside it]

Page 20: A Framework for Planning in Continuous-time Stochastic Domains

Sequential Acceptance Sampling [Wald 45]

Hypothesis: Pr≥θ(ρ)

After each sample: true, false, or take another sample?

Page 21: A Framework for Planning in Continuous-time Stochastic Domains

Graphical Representation of Sequential Test

[Plot axes: number of positive samples versus number of samples]

Page 22: A Framework for Planning in Continuous-time Stochastic Domains

Graphical Representation of Sequential Test

We can find an acceptance line A_θ,δ,α,β(n) and a rejection line R_θ,δ,α,β(n) given θ, δ, α, and β

[Plot: number of positive samples versus number of samples n; accept when the count rises above A_θ,δ,α,β(n), reject when it falls below R_θ,δ,α,β(n), continue sampling in between]
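The acceptance and rejection lines come from Wald's sequential probability ratio test. A sketch in terms of the log-likelihood ratio, which is equivalent to testing the count against the two lines; the exact parameterization here follows the standard SPRT and is an assumption, not taken from the slides:

```python
import math

# Sketch of Wald's sequential probability ratio test for the
# hypothesis Pr>=theta(rho), with indifference region
# (theta - delta, theta + delta) and error bounds alpha, beta.
# (Parameterization assumed from the standard SPRT.)

def sprt(samples, theta, delta, alpha, beta):
    p0, p1 = theta + delta, theta - delta  # accept vs. reject boundary
    accept_llr = math.log(beta / (1 - alpha))
    reject_llr = math.log((1 - beta) / alpha)
    llr = 0.0
    for n, positive in enumerate(samples, start=1):
        if positive:
            llr += math.log(p1 / p0)
        else:
            llr += math.log((1 - p1) / (1 - p0))
        if llr <= accept_llr:
            return "accept", n  # hypothesis holds
        if llr >= reject_llr:
            return "reject", n
    return "undecided", len(samples)

decision, n = sprt([True] * 100, theta=0.9, delta=0.05,
                   alpha=0.01, beta=0.01)
assert decision == "accept"  # enough positives to accept Pr>=0.9
```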

Page 23: A Framework for Planning in Continuous-time Stochastic Domains

Graphical Representation of Sequential Test

Reject hypothesis: the trajectory of positive-sample counts crosses the rejection line

[Plot: sample trajectory crossing into the reject region]

Page 24: A Framework for Planning in Continuous-time Stochastic Domains

Graphical Representation of Sequential Test

Accept hypothesis: the trajectory of positive-sample counts crosses the acceptance line

[Plot: sample trajectory crossing into the accept region]

Page 25: A Framework for Planning in Continuous-time Stochastic Domains

Anytime Policy Verification

Find the best acceptance and rejection lines after each sample in terms of α and β

[Plot: number of positive samples versus number of samples, with the lines recomputed after each sample]

Page 26: A Framework for Planning in Continuous-time Stochastic Domains

Verification Example

Initial policy for example problem

[Plot: error probability (0 to 0.5) versus CPU time (0 to 0.12 seconds), with "accept" and "reject" error curves and the 0.01 error level marked]

Page 27: A Framework for Planning in Continuous-time Stochastic Domains

More on Debug Step

[Generate-test-debug loop with the Debug step highlighted]

Page 28: A Framework for Planning in Continuous-time Stochastic Domains

Role of Negative Sample Paths

Negative sample paths provide evidence of how a policy can fail: "counterexamples"

Page 29: A Framework for Planning in Continuous-time Stochastic Domains

Generic Repair Procedure

1. Select some state along some negative sample path

2. Change the action planned for the selected state

Need heuristics to make informed state/action choices

Page 30: A Framework for Planning in Continuous-time Stochastic Domains

Scoring States

Assign −1 to the last state along each negative sample path and propagate backwards, multiplying by a discount factor γ at each step (scores −1, −γ, −γ², …)

Add the scores over all negative sample paths

[Diagram: two negative sample paths, s1 → s5 → s2 → s9 → failure and s1 → s5 → s3 → failure, with per-state scores; s5 accumulates blame from both paths]
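A sketch of this scoring scheme, assuming a discount factor γ chosen by the implementer (the exact propagation used on the slide may differ):

```python
# Sketch (assumed discount factor gamma): give the last state of each
# negative sample path score -1, propagate backwards multiplying by
# gamma at each step, and sum the scores over all paths.

def score_states(negative_paths, gamma=0.5):
    scores = {}
    for path in negative_paths:
        value = -1.0
        for state in reversed(path):
            scores[state] = scores.get(state, 0.0) + value
            value *= gamma
    return scores

paths = [["s1", "s5", "s2", "s9"],  # ends in failure
         ["s1", "s5", "s3"]]        # ends in failure
scores = score_states(paths)
# s5 appears in both paths, so it accumulates blame from each.
assert scores["s9"] == -1.0
assert scores["s5"] == -0.75
```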

Page 31: A Framework for Planning in Continuous-time Stochastic Domains

Example

Package gets lost at Minneapolis airport while waiting for the taxi

Repair: store package until taxi arrives

Page 32: A Framework for Planning in Continuous-time Stochastic Domains

Verification of Repaired Policy

[Plot: error probability (0 to 0.5) versus CPU time (0 to 0.45 seconds) for the repaired policy, with "accept" and "reject" error curves and the 0.01 error level marked]

Page 33: A Framework for Planning in Continuous-time Stochastic Domains

Comparing Policies

Use acceptance sampling:
- Pair samples from the verification of the two policies
- Count the pairs where the policies differ
- Prefer the first policy if, with probability at least 0.5, the first policy is better in a differing pair
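The paired comparison can be sketched as follows. Note that the slide applies acceptance sampling to the 0.5 hypothesis, whereas this illustration just uses the observed fraction of wins; the function name and the boolean outcome encoding are assumptions:

```python
# Sketch of the paired comparison (names assumed): pair sample-path
# outcomes from verifying two policies, keep only the pairs where the
# outcomes differ, and prefer the first policy if it wins at least
# half of those. (The slide tests this 0.5 hypothesis with acceptance
# sampling; here we use the plain observed fraction.)

def prefer_first(outcomes1, outcomes2):
    differing = [(a, b) for a, b in zip(outcomes1, outcomes2) if a != b]
    if not differing:
        return True  # no evidence either way; keep the incumbent
    wins = sum(1 for a, b in differing if a and not b)
    return wins / len(differing) >= 0.5

# Policy 1 succeeds where policy 2 fails in every differing pair,
# so it is preferred here.
assert prefer_first([True, True, False, True],
                    [True, False, False, False])
```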

Page 34: A Framework for Planning in Continuous-time Stochastic Domains

Summary

Framework for dealing with complex stochastic domains

Efficient sampling-based anytime verification of policies

Initial work on debug and repair heuristics