Alborz Geramifard
Logic Programming and MDPs for Planning
Winter 2009
Index
Introduction
Logic Programming
MDP
MDP + Logic + Programming
Why do we care about planning?
Sequential Decision Making
Desired Property?
Feasible: Logic (focuses on feasibility)
Optimal: MDPs (scaling is hard)
Logic
Goal-Directed Search
State: a set of propositions (e.g. ¬Quiet, Garbage, ¬Dinner)
Feasible Plan: a sequence of actions leading from the start state to the goal
Actions
Precondition → Action → Effect
Example: Clean Hands, Ingredients → Cook → Dinner Ready
STRIPS, Graphplan (16.410/16.413)
Logic Programming
Graphplan or forward search: not easily scalable
GOLOG: ALGOL in LOGIC
Restricts the action space with programs
Scales more easily
Results depend on the high-level programs
[Levesque, et al. 97]
Situation Calculus (Temporal Logic)
Situation: S0, do(A,S)
Example: do(putdown(A), do(walk(L), do(pickup(A), S0)))
Relational Fluent: relations with values depending on situations
Example: is_carrying(robot, p, s)
Functional Fluent: situation-dependent functions
Example: loc(robot, s)
More expressive than LTL
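To make the term structure concrete, here is a minimal Python sketch; the nested-tuple encoding and the is_carrying check are illustrative assumptions of mine, not the axiomatization used in the references.

```python
# Minimal sketch: situations as nested do(action, situation) terms.
# The representation and the is_carrying fluent below are illustrative
# assumptions, not the GOLOG/Prolog encoding from the references.

S0 = "S0"                                   # the initial situation

def do(action, situation):
    """Build the successor situation term do(a, s)."""
    return ("do", action, situation)

def actions(situation):
    """Return the action history of a situation, oldest first."""
    history = []
    while situation != S0:
        _, action, situation = situation
        history.append(action)
    return list(reversed(history))

def is_carrying(obj, situation):
    """Relational fluent: true if obj was picked up and not yet put down."""
    carrying = False
    for action in actions(situation):
        if action == ("pickup", obj):
            carrying = True
        elif action == ("putdown", obj):
            carrying = False
    return carrying

# Example from the slide: do(putdown(A), do(walk(L), do(pickup(A), S0)))
s = do(("putdown", "A"), do(("walk", "L"), do(("pickup", "A"), S0)))
print(is_carrying("A", s))   # False: A was picked up, then put down again
```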
GOLOG: Syntax
[Figure: GOLOG syntax table (Scott Sanner, ICAPS08 Tutorial)]
? Aren't we hard-coding the whole solution?
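The syntax covers primitive actions, tests (φ?), sequences (δ1 ; δ2), nondeterministic choice of programs (δ1 | δ2) and of arguments (πx. δ), nondeterministic iteration (δ*), conditionals, and while loops. As a rough illustration of how a few of these read operationally, here is a minimal Python sketch; the encoding is a toy stand-in of mine, not the Prolog-based interpreter from the references.

```python
# Toy interpreter sketch for a few GOLOG-like constructs (a simplified encoding
# for illustration, not the Prolog-based GOLOG of Levesque et al.).
# A "situation" is just the tuple of actions executed so far; Do(delta, s)
# yields every situation in which the program delta can legally terminate.

def Do(delta, s):
    kind = delta[0]
    if kind == "act":                      # primitive action a
        yield s + (delta[1],)
    elif kind == "test":                   # test phi?  (phi is a predicate on s)
        if delta[1](s):
            yield s
    elif kind == "seq":                    # sequence delta1 ; delta2
        for s1 in Do(delta[1], s):
            yield from Do(delta[2], s1)
    elif kind == "choice":                 # nondeterministic choice delta1 | delta2
        yield from Do(delta[1], s)
        yield from Do(delta[2], s)
    elif kind == "while":                  # while phi do delta1
        if delta[1](s):
            for s1 in Do(delta[2], s):
                yield from Do(delta, s1)
        else:
            yield s

# Example: nondeterministically pick one of two actions, then stop.
prog = ("choice", ("act", "pickup(A)"), ("act", "pickup(B)"))
print(list(Do(prog, ())))   # [('pickup(A)',), ('pickup(B)',)]
```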
Blocks World
Actions: pickup(b), putOnTable(b), putOn(b1,b2)
Fluents: onTable(b, s), on(b1, b2, s)
GOLOG Program:
while (∃b) ¬onTable(b) then (pickup(b), putOnTable(b)) endWhile
? What does it do?
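Read procedurally, the program repeatedly picks a block that is not yet on the table and puts it on the table, so it terminates with every block lying on the table. A concrete sketch (the Python state encoding is a toy assumption; in GOLOG the legal order of pickups falls out of action preconditions and backtracking):

```python
# What the while-program does on a concrete state (toy encoding; the dict below
# is an assumption, not the situation-calculus axioms).
# on[b] is the block that b sits on, or "table".
on = {"A": "B", "B": "table", "C": "A"}          # C is on A, A is on B, B is on the table

def not_on_table(state):
    return [b for b, below in state.items() if below != "table"]

def clear(state, b):
    return all(below != b for below in state.values())

# while (∃b) ¬onTable(b): pickup(b); putOnTable(b)
# GOLOG would find a legal order via preconditions and backtracking;
# here we simply pick any clear block that is not yet on the table.
while not_on_table(on):
    b = next(x for x in not_on_table(on) if clear(on, x))
    on[b] = "table"                              # pickup(b); putOnTable(b)

print(on)   # {'A': 'table', 'B': 'table', 'C': 'table'}
```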
GOLOG Execution
Given start S0, goal S', and program δ:
Call Do(δ, S0, S')
Execution is a first-order logic proof, carried out in Prolog
[Figure: nondeterministic branching over situations S0, S1, S2, S3, S4 during execution]
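As a rough picture of what Do(δ, S0, S') does, here is a small backtracking-search sketch; the list-of-choice-points encoding, the transition function, and the action names are hypothetical stand-ins for Prolog's resolution-based search.

```python
# Sketch of Do(delta, S0, S') as backtracking search (a toy stand-in for Prolog;
# each program step offers a set of alternative actions).
def search(steps, state, goal, transition):
    """Return a list of actions realizing the program, or None if no branch reaches the goal."""
    if not steps:
        return [] if goal(state) else None
    for action in steps[0]:                          # nondeterministic choice point
        plan = search(steps[1:], transition(state, action), goal, transition)
        if plan is not None:                         # on failure, backtrack and try the next branch
            return [action] + plan
    return None

# Hypothetical two-step program: first walk somewhere, then pick something up.
program = [["walk(L1)", "walk(L2)"], ["pickup(A)", "pickup(B)"]]
transition = lambda state, action: state | {action}          # just record the action
goal = lambda state: {"walk(L2)", "pickup(A)"} <= state
print(search(program, frozenset(), goal, transition))        # ['walk(L2)', 'pickup(A)']
```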
MDP
An MDP is the tuple ⟨S, A, P^a_ss', R^a_ss', γ⟩.
[Figure: example MDP with states B.S., M.S., and Ph.D. and actions Studying and Working; edges are labeled with transition probabilities and rewards]
Sample trajectory: B.S., Working, +60, B.S., Studying, -50, M.S., ...
MDP
Policy π(s): how to act in each state
Value Function V(s): how good is it to be in each state?
V^π(s) = E[ Σ_{t=1}^∞ γ^{t-1} r_t | s_0 = s, π ]
MDP
Goal: find the optimal policy π*, which maximizes the value function for all states.
V*(s) = max_a E[ r + γ V*(s') ]
π*(s) = argmax_a ···
Example (Value Iteration)
Reward: -1 per step (0 at the terminal goal state in the top-left corner)
Actions: move up, down, left, or right
V(s) = max_a E[ r + γ V(s') ]

Successive sweeps of value iteration on the 4x4 grid:

k = 0:
 0  0  0  0
 0  0  0  0
 0  0  0  0
 0  0  0  0

k = 1:
 0 -1 -1 -1
-1 -1 -1 -1
-1 -1 -1 -1
-1 -1 -1 -1

k = 2:
 0 -1 -2 -2
-1 -2 -2 -2
-2 -2 -2 -2
-2 -2 -2 -2

Converged:
 0 -1 -2 -3
-1 -2 -3 -4
-2 -3 -4 -5
-3 -4 -5 -6
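A minimal value-iteration sketch that reproduces the converged grid above; the encoding (4x4 grid, deterministic moves, reward -1 per move, terminal goal at the top-left, γ = 1) is an assumption consistent with the slides rather than code taken from them.

```python
# Value iteration on the 4x4 gridworld sketched above (toy illustration).
# Assumptions: deterministic up/down/left/right moves, reward -1 per move,
# the top-left cell (0, 0) is terminal with value 0, and gamma = 1.
N, GAMMA = 4, 1.0
GOAL = (0, 0)
MOVES = [(-1, 0), (1, 0), (0, -1), (0, 1)]

V = {(r, c): 0.0 for r in range(N) for c in range(N)}

def step(state, move):
    """Deterministic transition; moving off the grid leaves the state unchanged."""
    r, c = state[0] + move[0], state[1] + move[1]
    return (r, c) if 0 <= r < N and 0 <= c < N else state

for _ in range(20):                               # more than enough sweeps to converge here
    V = {s: 0.0 if s == GOAL else
            max(-1 + GAMMA * V[step(s, m)] for m in MOVES)
         for s in V}

for r in range(N):                                # prints the grid: 0 -1 -2 ... down to -6
    print(" ".join(f"{int(V[(r, c)]):3d}" for c in range(N)))
```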
DT-GOLOG: Decision-Theoretic GOLOG
GOLOG alone: no optimization
MDP alone: scaling is challenging
DT-GOLOG: all non-deterministic actions are optimized using MDP concepts (uncertainty and reward).
[Boutilier, et al. 00]
Decision-Theoretic GOLOG
Programming (GOLOG): for the parts with a known solution and high-level structure.
Planning (MDP): for the parts that are not well understood (low-level structure).
Example
Task: build a tower with the least effort.
GOLOG: pick a block as the base; stack all other blocks on top of it.
MDP: which block to use as the base? In which order to pick up the blocks?
? Guaranteed to be optimal?
DT-GOLOG Execution
Given start S0, goal S', and program δ:
Add rewards to the transitions
Formulate the problem as an MDP
[Figure: the situation tree from the GOLOG execution slide, with rewards such as -1, -2, +3, -5 attached to its branches]
Existentials in the program are expanded over the finite set of blocks:
(∃b) ¬onTable(b), b ∈ {b1, ..., bn}  becomes  ¬onTable(b1) ∨ ¬onTable(b2) ∨ ... ∨ ¬onTable(bn)
First-Order Dynamic Programming
The resulting MDP can still be intractable.
Idea: exploit the logical structure to build an abstract value function and avoid the curse of dimensionality!
[Sanner 07]
Symbolic Dynamic Programming (Deterministic)
Tabular: V(s) = max_a [ r + γ V(s') ]
Symbolic: ? Four ingredients are needed:
Representation of rewards and values
Adding rewards and values
Max operator
Find s
Reward and Value Representation
Case representation:
rCase =
  ∃b, b≠b1, on(b,b1) : 10
  ∄b, b≠b1, on(b,b1) : 0
Add Symbolically
(A : 10, ¬A : 20) ⊕ (B : 1, ¬B : 2) =
  A∧B : 11
  A∧¬B : 12
  ¬A∧B : 21
  ¬A∧¬B : 22
⊖ and ⊗ are defined similarly.
[Scott Sanner - ICAPS08 Tutorial]
max Operator
Sorting the partitions from the cases of actions a_1 and a_2 by value (Φ1 : 10, Φ2 : 5, Φ3 : 3, Φ4 : 0), the symbolic max gives each partition precedence over all lower-valued ones:
max_a =
  Φ1 : 10
  ¬Φ1∧Φ2 : 5
  ¬Φ1∧¬Φ2∧Φ3 : 3
  ¬Φ1∧¬Φ2∧¬Φ3∧Φ4 : 0
[Scott Sanner - ICAPS08 Tutorial]
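A toy sketch of both case operations; formulas are kept as opaque strings and only conjoined, which is a simplification of mine (a real symbolic DP implementation would also simplify formulas and prune inconsistent partitions).

```python
# Toy case representation: a "case" is a list of (formula, value) pairs, where each
# formula is an opaque string. Illustrative simplification only; real symbolic DP
# uses logical simplification on the partitions.

def case_add(c1, c2):
    """Cross-sum (⊕): conjoin every pair of partitions and add their values."""
    return [(f"{p1} ∧ {p2}", v1 + v2) for p1, v1 in c1 for p2, v2 in c2]

def case_max(*cases):
    """Symbolic max: order partitions by value (descending) and give each one
    precedence over all lower-valued partitions."""
    parts = sorted((p for c in cases for p in c), key=lambda pv: -pv[1])
    result, negated = [], []
    for phi, v in parts:
        prefix = " ∧ ".join(negated)
        result.append((f"{prefix} ∧ {phi}" if prefix else phi, v))
        negated.append(f"¬({phi})")
    return result

# The slide's addition example:
for phi, v in case_add([("A", 10), ("¬A", 20)], [("B", 1), ("¬B", 2)]):
    print(phi, ":", v)      # A ∧ B : 11, A ∧ ¬B : 12, ¬A ∧ B : 21, ¬A ∧ ¬B : 22

# The slide's max example (how Φ1..Φ4 split between a_1 and a_2 is hypothetical):
for phi, v in case_max([("Φ1", 10), ("Φ3", 3)], [("Φ2", 5), ("Φ4", 0)]):
    print(phi, ":", v)      # Φ1 : 10, ¬(Φ1) ∧ Φ2 : 5, ...
```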
Find s? Isn't it obvious?
Dynamic programming: given V(s'), find V(s).
In ordinary MDPs we have s explicitly.
In the symbolic representation we have it only implicitly, so we have to build it.
Find s = Goal Regression
regress(Φ1, a): the weakest condition that ensures Φ1 holds after taking action a.
Example: for Φ1 = clear(b1) and a = put(A,B):
regress(clear(b1), put(A,B)) = (Clear(b1) ∧ B≠b1) ∨ On(A,b1)
vCase = max_a [ rCase ⊕ γ · regr(vCase, a) ]
Symbolic Dynamic Programming (Deterministic)
Tabular: V(s) = max_a [ r + γ V(s') ]
Symbolic: vCase = max_a [ rCase ⊕ γ · regr(vCase, a) ]
Classical Example (BoxWorld: boxes, trucks, cities)
Goal: have a box in Paris.
rCase =
  ∃b, BoxIn(b,Paris) : 10
  else : 0
Classical Example
Actions: drive(t,c1,c2), load(b,t), unload(b,t), noop
load and unload have a 10% chance of failure
Fluents: BoxIn(b,c), BoxOn(b,t), TruckIn(t,c)
Assumptions:
All cities are connected.
γ = 0.9
Example [Sanner 07]
s | V*(s) | π*(s)
∃b, BoxIn(b,Paris) | 100 | noop
else, ∃b,t TruckIn(t,Paris) ∧ BoxOn(b,t) | 89 | unload(b,t)
else, ∃b,c,t BoxOn(b,t) ∧ TruckIn(t,c) | 80 | drive(t,c,Paris)
else, ∃b,c,t BoxIn(b,c) ∧ TruckIn(t,c) | 72 | load(b,t)
else, ∃b,c1,c2,t BoxIn(b,c1) ∧ TruckIn(t,c2) | 65 | drive(t,c2,c1)
else | 0 | noop
? What did we gain by going through all of this?
Conclusion
Logic Programming: Situation Calculus, GOLOG, Planning
MDP: Review, Value Iteration
MDP + Logic + Programming: DT-GOLOG, Symbolic DP
References
Levesque, H., Reiter, R., Lespérance, Y., Lin, F., and Scherl, R., "GOLOG: A Logic Programming Language for Dynamic Domains", Journal of Logic Programming, 31:59-84, 1997.
Sutton, R. S. and Barto, A. G., "Reinforcement Learning: An Introduction", MIT Press, Cambridge, 1998.
Boutilier, C., Reiter, R., Soutchanski, M., and Thrun, S., "Decision-Theoretic, High-Level Agent Programming in the Situation Calculus", AAAI/IAAI 2000, pp. 355-362.
Sanner, S. and Kersting, K., "Symbolic Dynamic Programming", chapter in C. Sammut, editor, Encyclopedia of Machine Learning, Springer-Verlag, 2007.
Boutilier, C., Reiter, R., and Price, B., "Symbolic Dynamic Programming for First-Order MDPs", Proceedings of the Seventeenth International Joint Conference on Artificial Intelligence (IJCAI), Seattle, pp. 690-697, 2001.
Questions?