Alborz Geramifard
Logic Programming and MDPs for Planning
Winter 2009
Index
Introduction
Logic Programming
MDP
MDP + Logic + Programming
Why do we care about planning?
Sequential Decision Making
Desired Property?
Feasible: Logic (focuses on feasibility)
Optimal: MDPs (scaling is hard)
Logic
Goal-Directed Search
State: a set of propositions (e.g. ¬Quiet, Garbage, ¬Dinner)
Feasible Plan: a sequence of actions leading from the start state to the goal
Actions
Precondition → Action → Effect
Example: Clean Hands, Ingredients → Cook → Dinner Ready
STRIPS, Graphplan (16.410/16.413)
Logic Programming
Graphplan or forward search: not easily scalable
GOLOG: ALGOL in LOGIC
Restricts the action space with programs
Scales more easily
Results depend on the high-level programs
[Levesque, et al. 97]
Situation Calculus (Temporal Logic)
Situation: S0, do(A,S)
Example: do(putdown(A), do(walk(L), do(pickup(A), S0)))
Relational Fluent: relations with values depending on situations
Example: is_carrying(robot, p, s)
Functional Fluent: situation-dependent functions
Example: loc(robot, s)
More expressive than LTL
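To make the term structure concrete, here is a minimal Python sketch; the nested-tuple encoding and the is_carrying check are illustrative assumptions of mine, not the axiomatization used in the references.

```python
# Minimal sketch: situations as nested do(action, situation) terms.
# The representation and the is_carrying fluent below are illustrative
# assumptions, not the GOLOG/Prolog encoding from the references.

S0 = "S0"                                   # the initial situation

def do(action, situation):
    """Build the successor situation term do(a, s)."""
    return ("do", action, situation)

def actions(situation):
    """Return the action history of a situation, oldest first."""
    history = []
    while situation != S0:
        _, action, situation = situation
        history.append(action)
    return list(reversed(history))

def is_carrying(obj, situation):
    """Relational fluent: true if obj was picked up and not yet put down."""
    carrying = False
    for action in actions(situation):
        if action == ("pickup", obj):
            carrying = True
        elif action == ("putdown", obj):
            carrying = False
    return carrying

# Example from the slide: do(putdown(A), do(walk(L), do(pickup(A), S0)))
s = do(("putdown", "A"), do(("walk", "L"), do(("pickup", "A"), S0)))
print(is_carrying("A", s))   # False: A was picked up, then put down again
```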
GOLOG: Syntax
[Figure: GOLOG syntax table (Scott Sanner, ICAPS08 Tutorial)]
? Aren't we hard-coding the whole solution?
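The syntax covers primitive actions, tests (φ?), sequences (δ1 ; δ2), nondeterministic choice of programs (δ1 | δ2) and of arguments (πx. δ), nondeterministic iteration (δ*), conditionals, and while loops. As a rough illustration of how a few of these read operationally, here is a minimal Python sketch; the encoding is a toy stand-in of mine, not the Prolog-based interpreter from the references.

```python
# Toy interpreter sketch for a few GOLOG-like constructs (a simplified encoding
# for illustration, not the Prolog-based GOLOG of Levesque et al.).
# A "situation" is just the tuple of actions executed so far; Do(delta, s)
# yields every situation in which the program delta can legally terminate.

def Do(delta, s):
    kind = delta[0]
    if kind == "act":                      # primitive action a
        yield s + (delta[1],)
    elif kind == "test":                   # test phi?  (phi is a predicate on s)
        if delta[1](s):
            yield s
    elif kind == "seq":                    # sequence delta1 ; delta2
        for s1 in Do(delta[1], s):
            yield from Do(delta[2], s1)
    elif kind == "choice":                 # nondeterministic choice delta1 | delta2
        yield from Do(delta[1], s)
        yield from Do(delta[2], s)
    elif kind == "while":                  # while phi do delta1
        if delta[1](s):
            for s1 in Do(delta[2], s):
                yield from Do(delta, s1)
        else:
            yield s

# Example: nondeterministically pick one of two actions, then stop.
prog = ("choice", ("act", "pickup(A)"), ("act", "pickup(B)"))
print(list(Do(prog, ())))   # [('pickup(A)',), ('pickup(B)',)]
```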
Blocks World
Actions: pickup(b), putOnTable(b), putOn(b1,b2)
Fluents: onTable(b, s), on(b1, b2, s)
GOLOG Program:
while (∃b) ¬onTable(b) then (pickup(b), putOnTable(b)) endWhile
? What does it do?
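Read procedurally, the program repeatedly picks a block that is not yet on the table and puts it on the table, so it terminates with every block lying on the table. A concrete sketch (the Python state encoding is a toy assumption; in GOLOG the legal order of pickups falls out of action preconditions and backtracking):

```python
# What the while-program does on a concrete state (toy encoding; the dict below
# is an assumption, not the situation-calculus axioms).
# on[b] is the block that b sits on, or "table".
on = {"A": "B", "B": "table", "C": "A"}          # C is on A, A is on B, B is on the table

def not_on_table(state):
    return [b for b, below in state.items() if below != "table"]

def clear(state, b):
    return all(below != b for below in state.values())

# while (∃b) ¬onTable(b): pickup(b); putOnTable(b)
# GOLOG would find a legal order via preconditions and backtracking;
# here we simply pick any clear block that is not yet on the table.
while not_on_table(on):
    b = next(x for x in not_on_table(on) if clear(on, x))
    on[b] = "table"                              # pickup(b); putOnTable(b)

print(on)   # {'A': 'table', 'B': 'table', 'C': 'table'}
```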
GOLOG Execution
Given start S0, goal S', and program δ:
Call Do(δ, S0, S')
Execution is a first-order logic proof, carried out in Prolog
[Figure: nondeterministic branching over situations S0, S1, S2, S3, S4 during execution]
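As a rough picture of what Do(δ, S0, S') does, here is a small backtracking-search sketch; the list-of-choice-points encoding, the transition function, and the action names are hypothetical stand-ins for Prolog's resolution-based search.

```python
# Sketch of Do(delta, S0, S') as backtracking search (a toy stand-in for Prolog;
# each program step offers a set of alternative actions).
def search(steps, state, goal, transition):
    """Return a list of actions realizing the program, or None if no branch reaches the goal."""
    if not steps:
        return [] if goal(state) else None
    for action in steps[0]:                          # nondeterministic choice point
        plan = search(steps[1:], transition(state, action), goal, transition)
        if plan is not None:                         # on failure, backtrack and try the next branch
            return [action] + plan
    return None

# Hypothetical two-step program: first walk somewhere, then pick something up.
program = [["walk(L1)", "walk(L2)"], ["pickup(A)", "pickup(B)"]]
transition = lambda state, action: state | {action}          # just record the action
goal = lambda state: {"walk(L2)", "pickup(A)"} <= state
print(search(program, frozenset(), goal, transition))        # ['walk(L2)', 'pickup(A)']
```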
MDP
An MDP is the tuple ⟨S, A, P^a_ss', R^a_ss', γ⟩.
[Figure: example MDP with states B.S., M.S., and Ph.D. and actions Studying and Working; edges are labeled with transition probabilities and rewards]
Sample trajectory: B.S., Working, +60, B.S., Studying, -50, M.S., ...
MDP
Policy π(s): how to act in each state
Value Function V(s): how good is it to be in each state?
V^π(s) = E[ Σ_{t=1}^∞ γ^{t-1} r_t | s_0 = s, π ]
MDP
Goal: find the optimal policy π*, which maximizes the value function for all states.
V*(s) = max_a E[ r + γ V*(s') ]
π*(s) = argmax_a ···
Example (Value Iteration)
Reward: -1 per step (0 at the terminal goal state in the top-left corner)
Actions: move up, down, left, or right
V(s) = max_a E[ r + γ V(s') ]

Successive sweeps of value iteration on the 4x4 grid:

k = 0:
 0  0  0  0
 0  0  0  0
 0  0  0  0
 0  0  0  0

k = 1:
 0 -1 -1 -1
-1 -1 -1 -1
-1 -1 -1 -1
-1 -1 -1 -1

k = 2:
 0 -1 -2 -2
-1 -2 -2 -2
-2 -2 -2 -2
-2 -2 -2 -2

Converged:
 0 -1 -2 -3
-1 -2 -3 -4
-2 -3 -4 -5
-3 -4 -5 -6
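A minimal value-iteration sketch that reproduces the converged grid above; the encoding (4x4 grid, deterministic moves, reward -1 per move, terminal goal at the top-left, γ = 1) is an assumption consistent with the slides rather than code taken from them.

```python
# Value iteration on the 4x4 gridworld sketched above (toy illustration).
# Assumptions: deterministic up/down/left/right moves, reward -1 per move,
# the top-left cell (0, 0) is terminal with value 0, and gamma = 1.
N, GAMMA = 4, 1.0
GOAL = (0, 0)
MOVES = [(-1, 0), (1, 0), (0, -1), (0, 1)]

V = {(r, c): 0.0 for r in range(N) for c in range(N)}

def step(state, move):
    """Deterministic transition; moving off the grid leaves the state unchanged."""
    r, c = state[0] + move[0], state[1] + move[1]
    return (r, c) if 0 <= r < N and 0 <= c < N else state

for _ in range(20):                               # more than enough sweeps to converge here
    V = {s: 0.0 if s == GOAL else
            max(-1 + GAMMA * V[step(s, m)] for m in MOVES)
         for s in V}

for r in range(N):                                # prints the grid: 0 -1 -2 ... down to -6
    print(" ".join(f"{int(V[(r, c)]):3d}" for c in range(N)))
```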
DT-GOLOG: Decision-Theoretic GOLOG
GOLOG alone: no optimization
MDP alone: scaling is challenging
DT-GOLOG: all non-deterministic actions are optimized using MDP concepts (uncertainty and reward).
[Boutilier, et al. 00]
Decision-Theoretic GOLOG
Programming (GOLOG): for the parts with a known solution and high-level structure.
Planning (MDP): for the parts that are not well understood (low-level structure).
Example
Task: build a tower with the least effort.
GOLOG: pick a block as the base; stack all other blocks on top of it.
MDP: which block to use as the base? In which order to pick up the blocks?
? Guaranteed to be optimal?
DT-GOLOG Execution
Given start S0, goal S', and program δ:
Add rewards to the transitions
Formulate the problem as an MDP
[Figure: the situation tree from the GOLOG execution slide, with rewards such as -1, -2, +3, -5 attached to its branches]
Existentials in the program are expanded over the finite set of blocks:
(∃b) ¬onTable(b), b ∈ {b1, ..., bn}  becomes  ¬onTable(b1) ∨ ¬onTable(b2) ∨ ... ∨ ¬onTable(bn)
First-Order Dynamic Programming
The resulting MDP can still be intractable.
Idea: exploit the logical structure to build an abstract value function and avoid the curse of dimensionality!
[Sanner 07]
Symbolic Dynamic Programming (Deterministic)
Tabular: V(s) = max_a [ r + γ V(s') ]
Symbolic: ? Four ingredients are needed:
Representation of rewards and values
Adding rewards and values
Max operator
Find s
Reward and Value Representation
Case representation:
rCase =
  ∃b, b≠b1, on(b,b1) : 10
  ∄b, b≠b1, on(b,b1) : 0
Add Symbolically
(A : 10, ¬A : 20) ⊕ (B : 1, ¬B : 2) =
  A∧B : 11
  A∧¬B : 12
  ¬A∧B : 21
  ¬A∧¬B : 22
⊖ and ⊗ are defined similarly.
[Scott Sanner - ICAPS08 Tutorial]
max Operator
Sorting the partitions from the cases of actions a_1 and a_2 by value (Φ1 : 10, Φ2 : 5, Φ3 : 3, Φ4 : 0), the symbolic max gives each partition precedence over all lower-valued ones:
max_a =
  Φ1 : 10
  ¬Φ1∧Φ2 : 5
  ¬Φ1∧¬Φ2∧Φ3 : 3
  ¬Φ1∧¬Φ2∧¬Φ3∧Φ4 : 0
[Scott Sanner - ICAPS08 Tutorial]
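A toy sketch of both case operations; formulas are kept as opaque strings and only conjoined, which is a simplification of mine (a real symbolic DP implementation would also simplify formulas and prune inconsistent partitions).

```python
# Toy case representation: a "case" is a list of (formula, value) pairs, where each
# formula is an opaque string. Illustrative simplification only; real symbolic DP
# uses logical simplification on the partitions.

def case_add(c1, c2):
    """Cross-sum (⊕): conjoin every pair of partitions and add their values."""
    return [(f"{p1} ∧ {p2}", v1 + v2) for p1, v1 in c1 for p2, v2 in c2]

def case_max(*cases):
    """Symbolic max: order partitions by value (descending) and give each one
    precedence over all lower-valued partitions."""
    parts = sorted((p for c in cases for p in c), key=lambda pv: -pv[1])
    result, negated = [], []
    for phi, v in parts:
        prefix = " ∧ ".join(negated)
        result.append((f"{prefix} ∧ {phi}" if prefix else phi, v))
        negated.append(f"¬({phi})")
    return result

# The slide's addition example:
for phi, v in case_add([("A", 10), ("¬A", 20)], [("B", 1), ("¬B", 2)]):
    print(phi, ":", v)      # A ∧ B : 11, A ∧ ¬B : 12, ¬A ∧ B : 21, ¬A ∧ ¬B : 22

# The slide's max example (how Φ1..Φ4 split between a_1 and a_2 is hypothetical):
for phi, v in case_max([("Φ1", 10), ("Φ3", 3)], [("Φ2", 5), ("Φ4", 0)]):
    print(phi, ":", v)      # Φ1 : 10, ¬(Φ1) ∧ Φ2 : 5, ...
```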
Find s? Isn't it obvious?
Dynamic programming: given V(s'), find V(s).
In ordinary MDPs we have s explicitly.
In the symbolic representation we have it only implicitly, so we have to build it.
Find s = Goal Regression
regress(Φ1, a): the weakest condition that ensures Φ1 holds after taking action a.
Example: for Φ1 = clear(b1) and a = put(A,B):
regress(clear(b1), put(A,B)) = (Clear(b1) ∧ B≠b1) ∨ On(A,b1)
vCase = max_a [ rCase ⊕ γ · regr(vCase, a) ]
Symbolic Dynamic Programming (Deterministic)
Tabular: V(s) = max_a [ r + γ V(s') ]
Symbolic: vCase = max_a [ rCase ⊕ γ · regr(vCase, a) ]
Classical Example (BoxWorld: boxes, trucks, cities)
Goal: have a box in Paris.
rCase =
  ∃b, BoxIn(b,Paris) : 10
  else : 0
Classical Example
Actions: drive(t,c1,c2), load(b,t), unload(b,t), noop
load and unload have a 10% chance of failure
Fluents: BoxIn(b,c), BoxOn(b,t), TruckIn(t,c)
Assumptions:
All cities are connected.
γ = 0.9
Example [Sanner 07]
s | V*(s) | π*(s)
∃b, BoxIn(b,Paris) | 100 | noop
else, ∃b,t TruckIn(t,Paris) ∧ BoxOn(b,t) | 89 | unload(b,t)
else, ∃b,c,t BoxOn(b,t) ∧ TruckIn(t,c) | 80 | drive(t,c,Paris)
else, ∃b,c,t BoxIn(b,c) ∧ TruckIn(t,c) | 72 | load(b,t)
else, ∃b,c1,c2,t BoxIn(b,c1) ∧ TruckIn(t,c2) | 65 | drive(t,c2,c1)
else | 0 | noop
? What did we gain by going through all of this?
Conclusion
Logic Programming: Situation Calculus, GOLOG, Planning
MDP: Review, Value Iteration
MDP + Logic + Programming: DT-GOLOG, Symbolic DP
References
Levesque, H., Reiter, R., Lespérance, Y., Lin, F., and Scherl, R., "GOLOG: A Logic Programming Language for Dynamic Domains", Journal of Logic Programming, 31:59-84, 1997.
Sutton, R. S. and Barto, A. G., "Reinforcement Learning: An Introduction", MIT Press, Cambridge, 1998.
Boutilier, C., Reiter, R., Soutchanski, M., and Thrun, S., "Decision-Theoretic, High-Level Agent Programming in the Situation Calculus", AAAI/IAAI 2000, pp. 355-362.
Sanner, S. and Kersting, K., "Symbolic Dynamic Programming", chapter in C. Sammut, editor, Encyclopedia of Machine Learning, Springer-Verlag, 2007.
Boutilier, C., Reiter, R., and Price, B., "Symbolic Dynamic Programming for First-Order MDPs", Proceedings of the Seventeenth International Joint Conference on Artificial Intelligence (IJCAI), Seattle, pp. 690-697, 2001.
Questions?