Hierarchical Methods for Planning under Uncertainty
Thesis Proposal
Joelle Pineau
Thesis Committee:
Sebastian Thrun, Chair
Matthew Mason
Andrew Moore
Craig Boutilier, U. of Toronto
Thesis Proposal: Hierarchical Methods for Planning under Uncertainty Joelle Pineau
Integrating robots in living environments
The robot's role:
- Social interaction
- Mobile manipulation
- Intelligent reminding
- Remote-operation
- Data collection / monitoring
A broad perspective
GOAL = Selecting appropriate actions

[Figure: control loop. The robot applies ACTIONS to the USER + WORLD, receives OBSERVATIONS, and maintains a belief state over the true STATE]
Why is this a difficult problem? UNCERTAINTY.

Cause #1: Non-deterministic effects of actions
Cause #2: Partial and noisy sensor information
Cause #3: Inaccurate model of the world and the user
A solution: Partially Observable Markov Decision Processes (POMDPs)
[Figure: three-state POMDP with states S1, S2, S3, each emitting observations o1, o2; actions a1 and a2 drive the transitions]
The truth about POMDPs
• Bad news:
– Finding an optimal POMDP action selection policy is computationally intractable for complex problems.
• Good news:
– Many real-world decision-making problems exhibit structure inherent to the problem domain.
– By leveraging structure in the problem domain, I propose an algorithm that makes POMDPs tractable, even for large domains.
How is it done?
• Use a “Divide-and-conquer” approach:
– We decompose a large monolithic problem into a collection of loosely-related smaller problems.
[Figure: decomposition into a Dialogue manager, Health manager, Social manager, and Reminding manager]
Thesis statement
Decision-making under uncertainty can be made tractable for complex problems by exploiting hierarchical structure in the problem domain.
Outline
• Problem motivation
Partially observable Markov decision processes
• The hierarchical POMDP algorithm
• Proposed research
POMDPs within the family of Markov models
                              Control problem?
                              no                       yes
Uncertainty in      no        Markov Chain             Markov Decision Process (MDP)
sensor input?       yes       Hidden Markov Model      Partially Observable MDP
                              (HMM)                    (POMDP)
What are POMDPs?

Components:
- Set of states: s ∈ S
- Set of actions: a ∈ A
- Set of observations: o ∈ O

POMDP parameters:
- Initial belief: b0(s) = Pr(s0 = s)
- Observation probabilities: O(s,a,o) = Pr(o | s, a)
- Transition probabilities: T(s,a,s') = Pr(s' | s, a)
- Rewards: R(s,a)

[Figure: HMM transition structure (probabilities 0.5, 0.5, 1) plus MDP actions a1, a2, over states S1 (Pr(o1)=0.5, Pr(o2)=0.5), S2 (Pr(o1)=0.9, Pr(o2)=0.1), and S3 (Pr(o1)=0.2, Pr(o2)=0.8)]
A POMDP example: The tiger problem
S1 = "tiger-left": Pr(o=growl-left) = 0.85, Pr(o=growl-right) = 0.15
S2 = "tiger-right": Pr(o=growl-left) = 0.15, Pr(o=growl-right) = 0.85

Actions = { listen, open-left, open-right }

Reward function:
  R(a=listen) = -1
  R(a=open-right, s=tiger-left) = 10
  R(a=open-left, s=tiger-left) = -100
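The model above can be written down directly; a minimal Python sketch (the uniform reset of the tiger after a door is opened is the standard tiger-problem convention, assumed here because the slide omits the transition function):

```python
# Tiger POMDP model as plain Python (a sketch; the post-opening reset
# behaviour is an assumption, not stated on the slide).
STATES = ["tiger-left", "tiger-right"]
ACTIONS = ["listen", "open-left", "open-right"]
OBSERVATIONS = ["growl-left", "growl-right"]

def observation_prob(s, a, o):
    """O(s, a, o) = Pr(o | s, a): listening is 85% accurate."""
    if a != "listen":
        return 0.5  # growls carry no information after opening a door
    correct = {"tiger-left": "growl-left", "tiger-right": "growl-right"}
    return 0.85 if o == correct[s] else 0.15

def transition_prob(s, a, s2):
    """T(s, a, s') = Pr(s' | s, a): listen leaves the state unchanged."""
    if a == "listen":
        return 1.0 if s2 == s else 0.0
    return 0.5  # opening a door resets the tiger uniformly (assumption)

def reward(s, a):
    """R(s, a) as given on the slide."""
    if a == "listen":
        return -1.0
    opened = "tiger-left" if a == "open-left" else "tiger-right"
    return -100.0 if s == opened else 10.0
```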
What can we do with POMDPs?
1) State tracking: after an action, what is the state of the world, st? (Not so hard.)

2) Computing a policy: which action, aj, should the controller apply next? (Very hard!)

[Figure: the world transitions from state st-1 to st; the robot's control layer maintains belief bt-1, issues action at-1, and receives observation ot]
The tiger problem: State tracking
Starting from belief vector b0 over S1 = "tiger-left" and S2 = "tiger-right", after action = listen and observation = growl-left the belief becomes b1:

  b1(sj) = Pr(o | sj, a) Σ_{si ∈ S} Pr(sj | si, a) b0(si) / Pr(o | a, b0)
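The update above, specialized to the tiger problem (listening leaves the state unchanged, so the transition sum collapses to b0(s)), can be sketched as:

```python
# Belief update for the tiger problem after action=listen, obs=growl-left:
# b1(s) ∝ Pr(o | s, a) * sum_{s'} Pr(s | s', a) * b0(s'), and under listen
# the transition sum collapses to b0(s).
def update_belief(b0, p_obs_given_state):
    unnorm = {s: p_obs_given_state[s] * b0[s] for s in b0}
    z = sum(unnorm.values())             # Pr(o | a, b0), the normalizer
    return {s: v / z for s, v in unnorm.items()}

b0 = {"tiger-left": 0.5, "tiger-right": 0.5}
p_growl_left = {"tiger-left": 0.85, "tiger-right": 0.15}
b1 = update_belief(b0, p_growl_left)
print(b1["tiger-left"])   # 0.85: one growl-left shifts belief toward tiger-left
```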
Policy Optimization
• Which action, aj, should the controller apply next?
– In MDPs:
  • Policy is a mapping from state to action, π: si → aj
– In POMDPs:
  • Policy is a mapping from belief to action, π: b → aj

• Recursively calculate the expected long-term reward for each state/belief:

  V(si) = max_a [ R(si, a) + Σ_{j=1..N} Pr(sj | si, a) V(sj) ]

• Find the action that maximizes the expected reward:

  π(si) = argmax_a [ R(si, a) + Σ_{j=1..N} Pr(sj | si, a) V(sj) ]
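The MDP recursion above can be sketched as a single value-iteration sweep (undiscounted, as on the slide; the two-state example at the bottom is hypothetical, purely for illustration):

```python
# One sweep of the MDP value-iteration recursion. V maps states to values,
# R(s, a) gives rewards, P(s, a) returns a dict of successor probabilities.
def value_iteration_step(V, states, actions, R, P):
    newV, policy = {}, {}
    for s in states:
        q = {a: R(s, a) + sum(p * V[s2] for s2, p in P(s, a).items())
             for a in actions}
        policy[s] = max(q, key=q.get)  # pi(s) = argmax_a
        newV[s] = q[policy[s]]         # V(s)  = max_a
    return newV, policy

# Hypothetical 2-state example: action "a" earns 1 in s1 and loops there.
states, actions = ["s1", "s2"], ["a", "b"]
R = lambda s, a: 1.0 if (s == "s1" and a == "a") else 0.0
P = lambda s, a: {"s1": 1.0} if a == "a" else {"s2": 1.0}
V, pi = value_iteration_step({"s1": 0.0, "s2": 0.0}, states, actions, R, P)
print(V["s1"], pi["s1"])   # 1.0 a
```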
The tiger problem: Optimal policy
[Figure: optimal policy over the belief vector. Open-right when belief is concentrated on S1 "tiger-left"; listen in the uncertain middle; open-left when belief is concentrated on S2 "tiger-right"]
• Finite-horizon POMDPs are in the worst case doubly exponential.
• Infinite-horizon undiscounted stochastic POMDPs are EXPTIME-hard, and may not be decidable.
Complexity (per step of value iteration), where Vn-1 is the set of value-function components from the previous step:

                                   Time                     Space
  POMDP                            |S|² |A| |Vn-1|^|O|      |A| |Vn-1|^|O|
  MDP (recursive upper bound)      |S|² |A|                 |S|
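The doubly exponential behaviour comes from the recurrence |Vn| = |A| · |Vn-1|^|O| on the number of value-function components; iterating it numerically shows the blowup (a sketch, using the tiger problem's |A| = 3, |O| = 2):

```python
# Growth of the number of value-function components (alpha-vectors) under
# exact value iteration: |V_n| = |A| * |V_{n-1}| ** |O|, the source of the
# doubly exponential worst case.
def gamma_sizes(num_actions, num_obs, horizon):
    sizes = [num_actions]            # horizon-1 plans: one per action
    for _ in range(horizon - 1):
        sizes.append(num_actions * sizes[-1] ** num_obs)
    return sizes

print(gamma_sizes(3, 2, 4))   # [3, 27, 2187, 14348907]
```

In practice most of these components are dominated and can be pruned, but the worst case stands.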
The essence of the problem
• How can we find good policies for complex POMDPs?
• Is there a principled way to provide near-optimal policies in reasonable time?
Outline
• Problem motivation
• Partially observable Markov decision processes
The hierarchical POMDP algorithm
• Proposed research
A hierarchical approach to POMDP planning
• Key Idea: Exploit hierarchical structure in the problem domain to break a problem into many “related” POMDPs.
• What type of structure?
Action set partitioning:

  Act
  ├── InvestigateHealth   (subtask; appears in Act as an abstract action)
  │   ├── CheckPulse
  │   └── CheckMeds
  └── Move
      ├── AskWhere
      └── Navigate
          ├── Left
          ├── Right
          ├── Forward
          └── Backward
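A partitioning graph like this can be represented as a nested dictionary; a sketch (the exact placement of AskWhere under Move is an assumption read off the slide layout):

```python
# Action-set partition as a nested dict: leaves are primitive actions,
# internal keys are subtasks that appear as abstract actions in their
# parent controller. (Tree structure reconstructed from the slide.)
HIERARCHY = {
    "Act": {
        "InvestigateHealth": {"CheckPulse": {}, "CheckMeds": {}},
        "Move": {
            "AskWhere": {},
            "Navigate": {"Left": {}, "Right": {}, "Forward": {}, "Backward": {}},
        },
    }
}

def primitive_actions(tree):
    """Flatten a subtree into the set of primitive actions it controls."""
    prims = set()
    for name, sub in tree.items():
        if sub:                      # internal node: recurse into subtask
            prims |= primitive_actions(sub)
        else:                        # leaf: primitive action
            prims.add(name)
    return prims

print(sorted(primitive_actions(HIERARCHY)))
```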
Assumptions
• Each POMDP controller has a subset of the full action set A0.
• Each POMDP controller has the full state set S0 and observation set O0.
• Each controller includes discriminative reward information.
• We are given the action set partitioning graph.
• We are given a full POMDP model of the problem: {S0, A0, O0, M0}.
The tiger problem: An action hierarchy
Pinvestigate = {S0, Ainvestigate, O0, Minvestigate}
Ainvestigate = {listen, open-right}

  act
  ├── open-left
  └── investigate
      ├── listen
      └── open-right
Optimizing the “investigate” controller
[Figure: locally optimal policy for the investigate controller over the belief vector. Open-right when belief is concentrated on S1 "tiger-left"; listen otherwise]
The tiger problem: An action hierarchy
Pact = {S0, Aact, O0, Mact}
Aact = {open-left, investigate}

  act
  ├── open-left
  └── investigate
      ├── listen
      └── open-right

But... R(s, a=investigate) is not defined!
Modeling abstract actions
Insight: use the local policy of the corresponding low-level controller.

General form: R(si, ak) = R(si, Policy(controllerk, si))

Example: R(s=tiger-left, ak=investigate) = ?

                 open-right   listen   open-left
  tiger-left         10         -1       -100
  tiger-right      -100         -1         10

Policy(investigate, s=tiger-left) = open-right, so R(s=tiger-left, ak=investigate) = R(tiger-left, open-right) = 10.
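A minimal sketch of this rule for the tiger problem (the local policy at the two corner beliefs, open-right for tiger-left and listen for tiger-right, is read off the investigate controller's policy figure):

```python
# R(s, investigate) inherited from the investigate controller's local policy,
# following R(s_i, a_k) = R(s_i, Policy(controller_k, s_i)).
R = {("tiger-left", "open-right"): 10,  ("tiger-left", "listen"): -1,
     ("tiger-left", "open-left"): -100, ("tiger-right", "open-right"): -100,
     ("tiger-right", "listen"): -1,     ("tiger-right", "open-left"): 10}

# Local policy of the investigate controller at the corner beliefs
# (read off its policy figure; an assumption of this sketch).
local_policy = {"tiger-left": "open-right", "tiger-right": "listen"}

def abstract_reward(s, local_policy, R):
    """Reward of the abstract action = reward of the action its controller picks."""
    return R[(s, local_policy[s])]

print(abstract_reward("tiger-left", local_policy, R))    # 10
print(abstract_reward("tiger-right", local_policy, R))   # -1
```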
Optimizing the “act” controller
[Figure: locally optimal policy for the act controller over the belief vector. Open-left when belief is concentrated on S2 "tiger-right"; investigate otherwise]
The complete hierarchical policy
[Figure: the hierarchical policy over the belief vector (open-right, listen, open-left from S1 "tiger-left" to S2 "tiger-right") coincides with the optimal policy]
Results for larger simulation domains
                                 POMDP      H-POMDP    MDP
  Navigation problem (|S|=11, |A|=6, |O|=6):
    CPU time (secs)              1119.93    2.84       0.000654
    Average reward               12.5       12.2       0.0
  Dialogue problem (|S|=20, |A|=30, |O|=27):
    CPU time (secs)              >24 hrs    77.99      6.46
    Average reward               -          64.43      53.33
    % Correct actions            -          93.2       80.0
Related work on hierarchical methods
• Hierarchical HMMs: Fine et al., 1998.
• Hierarchical MDPs: Dayan & Hinton, 1993; Dietterich, 1998; McGovern et al., 1998; Parr & Russell, 1998; Singh, 1992.
• Loosely-coupled MDPs: Boutilier et al., 1997; Dean & Lin, 1995; Meuleau et al., 1998; Singh & Cohn, 1998; Wang & Mahadevan, 1999.
• Factored state POMDPs: Boutilier et al., 1999; Boutilier & Poole, 1996; Hansen & Feng, 2000.
• Hierarchical POMDPs: Castanon, 1997; Hernandez-Gardiol & Mahadevan, 2001; Theocharous et al., 2001; Wiering & Schmidhuber, 1997.
Outline
• Problem motivation
• Partially observable Markov decision processes
• The hierarchical POMDP algorithm
Proposed research
Proposed research
1) Algorithmic design
2) Algorithmic analysis
3) Model learning
4) System development and application
Research block #1: Algorithmic design
• Goal 1.1: Developing/implementing hierarchical POMDP algorithm.
• Goal 1.2: Extending H-POMDP for factorized state representation.
• Goal 1.3: Using state/observation abstraction.
• Goal 1.4: Planning for controllers with no local reward information.
Goal 1.3: State/observation abstraction

• Assumption #2: "Each POMDP controller has full state set S0, and observation set O0."
• Can we reduce the number of states/observations, |S| and |O|?

Yes! Each controller only needs a subset of the state/observation features. For example, the Navigate controller {Left, Right, Forward, Backward} and the InvestigateHealth controller {CheckPulse, CheckMeds} each depend only on features relevant to their subtask.

• What is the computational speed-up? Per step of value iteration, the POMDP recursive upper bound on time is |S|² |A| |Vn-1|^|O|, so shrinking each controller's |S| and |O| reduces both the base and the exponent.
Goal 1.4: Local controller reward information
• Assumption #3:
“Each controller includes some amount of discriminative reward information.”
• Can we relax this assumption?
Possibly. Use reward shaping to select a policy-invariant reward function.
• What is the benefit?
– H-POMDP could solve problems with sparse reward functions.
Research block #2: Algorithmic analysis
• Goal 2.1: Evaluating performance of the H-POMDP algorithm.
• Goal 2.2: Quantifying the loss due to the hierarchy.
• Goal 2.3: Comparing different possible decompositions of a problem.
Goal 2.1: Performance evaluation
• How does the hierarchical POMDP algorithm compare to:
  – Exact value function methods: Sondik, 1971; Monahan, 1982; Littman, 1996; Cassandra et al., 1997.
  – Policy search methods: Hansen, 1998; Kearns et al., 1999; Ng & Jordan, 2000; Baxter & Bartlett, 2000.
  – Value approximation methods: Parr & Russell, 1995; Thrun, 2000.
  – Belief approximation methods: Nourbakhsh, 1995; Koenig & Simmons, 1996; Hauskrecht, 2000; Roy & Thrun, 2000.
  – Memory-based methods: McCallum, 1996.
• Consider problems from the POMDP literature and the dialogue management domain.
Goal 2.2: Quantifying the loss
• The hierarchical POMDP planning algorithm provides an approximately-optimal policy.
• How “near-optimal” is the policy?
• Subject to some (very restrictive) conditions: "The value function of the top-level controller is an upper bound on the value of the approximation."

  V_top(b) ≥ V_actual(b)

• Can we loosen the restrictions? Tighten the bound? Find a lower bound?
Goal 2.3: Comparing different decompositions
• Assumption #4:
“We are given an action set partitioning graph.”
• What makes a good hierarchical action decomposition?
• Comparing decompositions is the first step towards automatic decomposition.
[Figure: two alternative action hierarchies over the actions {Manufacture, Examine, Inspect, Replace}, one using abstract actions a1 and a2, the other using a1, a2, and a3]
Research block #3: Model learning
• Goal 3.1: Automatically generating good action hierarchies.
– Assumption #4: “We are given an action set partitioning graph.”
– Can we automatically generate a good hierarchical decomposition?
– Maybe. It is being done for hierarchical MDPs.
• Goal 3.2: Including parameter learning.
– Assumption #5: “We are given a full POMDP model of the problem.”
– Can we introduce parameter learning?
– Yes! Maximum-likelihood parameter optimization (Baum-Welch) can be used for POMDPs.
Research block #4: System development and application

• Goal 4.1: Building an extensive dialogue manager

[Figure: system architecture. The Dialogue Manager connects the User (touchscreen input/messages, speech utterances) with the Robot module (robot sensor readings, motion commands), the Reminding module (reminder messages, status information), and the Teleoperation module (remote-control commands, facemail operations)]
An implemented scenario
[Figure: environment map showing the physiotherapy room, the patient room, and the robot home]

Problem size: |S|=288, |A|=14, |O|=15
State features: {RobotLocation, UserLocation, UserStatus, ReminderGoal, UserMotionGoal, UserSpeechGoal}
Test subjects: 3 elderly residents in an assisted living facility
Contributions
• Algorithmic contribution: A novel POMDP algorithm based on hierarchical structure.
Enables use of POMDPs for much larger problems.
• Application contribution: Application of POMDPs to dialogue management is novel.
Allows design of robust robot behavioural managers.
Research schedule
1) Algorithmic design/implementation: fall 01
2) Algorithmic analysis: spring/summer 02
3) Model learning: spring/summer/fall 02
4) System development and application: ongoing
5) Thesis writing: fall 02 / spring 03
Questions?
A simulated robot navigation example
Domain size: |S|=11, |A|=6, |O|=6
[Figure: action hierarchy with root Act, subtask Navigate(t) over {GoLeft, GoRight, GoBack, GoForward}, and actions Read, ReadMap, OpenDoor, and GetReward(t) ($$)]
A dialogue management example
Act subtasks and their actions:
- Move: AskGoWhere, GoToRoom, GoToKitchen, GoToFollow, VerifyRoom, VerifyKitchen, VerifyFollow
- Greet: GreetGeneral, GreetMorning, GreetNight, RespondThanks
- CheckWeather: AskWeatherTime, SayCurrent, SayToday, SayTomorrow, SayTime
- DoMeds: StartMeds, NextMeds, ForceMeds, QuitMeds
- Phone: AskCallWho, Call911, CallNurse, CallRelative, Verify911, VerifyNurse, VerifyRelative
- CheckHealth: AskHealth, OfferHelp

Domain size: |S|=20, |A|=30, |O|=27
Action hierarchy for implemented scenario
[Figure: action hierarchy with root Act and subtasks Remind, Assist (with sub-subtasks Move, Contact, Inform), and Rest, over actions including BringtoPhysio, CheckUserPresent, DeliverUser, SayWeather, VerifyRequest, SayTime, RemindPhysio, PublishStatus, RingBell, GotoRoom, VerifyBring, VerifyRelease, Recharge, GotoHome]
Sondik’s parts manufacturing problem
Decomposition 1: [Figure: hierarchy over {Manufacture, Examine, Inspect, Replace} with abstract actions a1, a2, a3]

Decomposition 2: [Figure: hierarchy with {Manufacture, Examine, Inspect} separated from {Replace}, with abstract actions a1, a2]

+ 5 more decompositions
Manufacturing task results
[Figure: bar chart of average reward (0 to 0.5) by planning method: POMDP, decompositions D1 through D7, and MDP]
Using state/observation abstraction

Action set / state set per controller:
- CheckHealth: actions {AskHealth, OfferHelp}; UserHealth = {good, poor, emergency}
- Phone: actions {AskCallWho, CallHelp, CallNurse, CallRelative, VerifyHelp, VerifyNurse, VerifyRelative}; CommunicationGoal = {none, nurse, 911, relative}
- DoMeds: ReminderGoal = {none, medsX}
- Abstracted features: ReminderGoal = {none, medsX}, CommunicationGoal = {none, personX}
Related work on robot planning and control
• Manually-scripted dialogue strategies: Denecke & Waibel, 1997; Walker et al., 1997.
• Markov decision processes (MDPs) for dialogue management: Levin et al., 1997; Fromer, 1998; Walker et al., 1998; Goddeau & Pineau, 2000; Singh et al., 2000; Walker, 2000.
• Robot interfaces: Torrance, 1996; Asoh et al., 1999.
• Classical planning: Fikes & Nilsson, 1971; Simmons, 1987; McAllester & Rosenblitt, 1991; Penberthy & Weld, 1992; Kushmerick, 1995; Veloso et al., 1995; Smith & Weld, 1998.
• Execution architectures: Firby, 1987; Musliner, 1993; Simmons, 1994; Bonasso & Kortenkamp, 1996.
Decision-theoretic planning models
The tiger problem: Value function solution

[Figure: value function V over belief, from S=tiger-left to S=tiger-right, y-axis from -100 to 40; the piecewise-linear upper surface is formed by the open-right, listen, and open-left components]
Optimizing the "investigate" controller

[Figure: value function V over belief, from S=tiger-left to S=tiger-right, y-axis from -120 to 0, with components for open-right and listen]
Optimizing the "act" controller

[Figure: value function V over belief, from S=tiger-left to S=tiger-right, y-axis from -60 to 80, with components for open-left and investigate]