103
Alborz Geramifard Logic Programming and MDPs for Planning Winter 2009

Logic Programming and MDPs for Planningpeople.csail.mit.edu/agf/Files/Presentation/DT-GOLOG - 16.412 Class.pdfBuild a tower with least effort Pick a block as base Stack all other blocks

  • Upload
    others

  • View
    4

  • Download
    1

Embed Size (px)

Citation preview

Page 1: Logic Programming and MDPs for Planningpeople.csail.mit.edu/agf/Files/Presentation/DT-GOLOG - 16.412 Class.pdfBuild a tower with least effort Pick a block as base Stack all other blocks

Alborz Geramifard

Logic Programming and MDPs for Planning

Winter 2009

Page 2: Logic Programming and MDPs for Planningpeople.csail.mit.edu/agf/Files/Presentation/DT-GOLOG - 16.412 Class.pdfBuild a tower with least effort Pick a block as base Stack all other blocks

MDP+ Logic + Programming

MDP

Index

2

Introduction

Logic Programming

Page 3: Logic Programming and MDPs for Planningpeople.csail.mit.edu/agf/Files/Presentation/DT-GOLOG - 16.412 Class.pdfBuild a tower with least effort Pick a block as base Stack all other blocks

MDP+ Logic + Programming

MDP

Index

2

Introduction

Logic Programming

Page 4: Logic Programming and MDPs for Planningpeople.csail.mit.edu/agf/Files/Presentation/DT-GOLOG - 16.412 Class.pdfBuild a tower with least effort Pick a block as base Stack all other blocks

Why do we care about planning?

3

Page 5: Logic Programming and MDPs for Planningpeople.csail.mit.edu/agf/Files/Presentation/DT-GOLOG - 16.412 Class.pdfBuild a tower with least effort Pick a block as base Stack all other blocks

Sequential Decision Making

4

?

Page 6: Logic Programming and MDPs for Planningpeople.csail.mit.edu/agf/Files/Presentation/DT-GOLOG - 16.412 Class.pdfBuild a tower with least effort Pick a block as base Stack all other blocks

Sequential Decision Making

4

?

Page 7: Logic Programming and MDPs for Planningpeople.csail.mit.edu/agf/Files/Presentation/DT-GOLOG - 16.412 Class.pdfBuild a tower with least effort Pick a block as base Stack all other blocks

Sequential Decision Making

4

?

Page 8: Logic Programming and MDPs for Planningpeople.csail.mit.edu/agf/Files/Presentation/DT-GOLOG - 16.412 Class.pdfBuild a tower with least effort Pick a block as base Stack all other blocks

Sequential Decision Making

4

?

Desired Property?

Page 9: Logic Programming and MDPs for Planningpeople.csail.mit.edu/agf/Files/Presentation/DT-GOLOG - 16.412 Class.pdfBuild a tower with least effort Pick a block as base Stack all other blocks

Sequential Decision Making

4

?

Desired Property?

Feasible: Logic (Focuses on Feasibility)

Page 10: Logic Programming and MDPs for Planningpeople.csail.mit.edu/agf/Files/Presentation/DT-GOLOG - 16.412 Class.pdfBuild a tower with least effort Pick a block as base Stack all other blocks

Sequential Decision Making

4

?

Desired Property?

Feasible: Logic (Focuses on Feasibility)

Optimal: MDPs (Scaling is Hard)

Page 11: Logic Programming and MDPs for Planningpeople.csail.mit.edu/agf/Files/Presentation/DT-GOLOG - 16.412 Class.pdfBuild a tower with least effort Pick a block as base Stack all other blocks

MDP+ Logic + Programming

MDP

Index

5

Introduction

Logic Programming

Page 12: Logic Programming and MDPs for Planningpeople.csail.mit.edu/agf/Files/Presentation/DT-GOLOG - 16.412 Class.pdfBuild a tower with least effort Pick a block as base Stack all other blocks

MDP+ Logic + Programming

MDP

Index

5

Introduction

Logic Programming

Page 13: Logic Programming and MDPs for Planningpeople.csail.mit.edu/agf/Files/Presentation/DT-GOLOG - 16.412 Class.pdfBuild a tower with least effort Pick a block as base Stack all other blocks

Logic

6

Goal Directed Search

Start Goal

State: Set of propositions (e.g. ¬Quite, Garbage, ¬Dinner)

Feasible Plan: Sequence of actions from start to goal

Page 14: Logic Programming and MDPs for Planningpeople.csail.mit.edu/agf/Files/Presentation/DT-GOLOG - 16.412 Class.pdfBuild a tower with least effort Pick a block as base Stack all other blocks

Actions

7

Clean Hands

Ingredients

Cook

Dinner Ready

Precondition Action Effect

STRIPS, Graph Plan (16.410/16.413)

Page 15: Logic Programming and MDPs for Planningpeople.csail.mit.edu/agf/Files/Presentation/DT-GOLOG - 16.412 Class.pdfBuild a tower with least effort Pick a block as base Stack all other blocks

8

Graph Plan or Forward Search

Not easily scalable

GOLOG: ALGOL in LOGIC

Restrict action space with programs

Scales easier

Results are dependent on high-level programs

Logic Programming

[Levesque, et. al. 97]

Page 16: Logic Programming and MDPs for Planningpeople.csail.mit.edu/agf/Files/Presentation/DT-GOLOG - 16.412 Class.pdfBuild a tower with least effort Pick a block as base Stack all other blocks

Situation Calculus (Temporal Logic)

9

Situation: S0, do(A,S)

Page 17: Logic Programming and MDPs for Planningpeople.csail.mit.edu/agf/Files/Presentation/DT-GOLOG - 16.412 Class.pdfBuild a tower with least effort Pick a block as base Stack all other blocks

Situation Calculus (Temporal Logic)

9

Situation: S0, do(A,S)

Example: do(putdown(A), do(walk(L), do(pickup(A), S0)))

Page 18: Logic Programming and MDPs for Planningpeople.csail.mit.edu/agf/Files/Presentation/DT-GOLOG - 16.412 Class.pdfBuild a tower with least effort Pick a block as base Stack all other blocks

Situation Calculus (Temporal Logic)

9

Situation: S0, do(A,S)

Example: do(putdown(A), do(walk(L), do(pickup(A), S0)))

Relational Fluent: relations with values depending on situations

Page 19: Logic Programming and MDPs for Planningpeople.csail.mit.edu/agf/Files/Presentation/DT-GOLOG - 16.412 Class.pdfBuild a tower with least effort Pick a block as base Stack all other blocks

Situation Calculus (Temporal Logic)

9

Situation: S0, do(A,S)

Example: do(putdown(A), do(walk(L), do(pickup(A), S0)))

Relational Fluent: relations with values depending on situations

Example: is_carrying(robot, p, s)

Page 20: Logic Programming and MDPs for Planningpeople.csail.mit.edu/agf/Files/Presentation/DT-GOLOG - 16.412 Class.pdfBuild a tower with least effort Pick a block as base Stack all other blocks

Situation Calculus (Temporal Logic)

9

Situation: S0, do(A,S)

Example: do(putdown(A), do(walk(L), do(pickup(A), S0)))

Relational Fluent: relations with values depending on situations

Example: is_carrying(robot, p, s)

Functional Fluent: situation dependent functions

Page 21: Logic Programming and MDPs for Planningpeople.csail.mit.edu/agf/Files/Presentation/DT-GOLOG - 16.412 Class.pdfBuild a tower with least effort Pick a block as base Stack all other blocks

Situation Calculus (Temporal Logic)

9

Situation: S0, do(A,S)

Example: do(putdown(A), do(walk(L), do(pickup(A), S0)))

Relational Fluent: relations with values depending on situations

Example: is_carrying(robot, p, s)

Functional Fluent: situation dependent functions

Example: loc(robot, s)

Page 22: Logic Programming and MDPs for Planningpeople.csail.mit.edu/agf/Files/Presentation/DT-GOLOG - 16.412 Class.pdfBuild a tower with least effort Pick a block as base Stack all other blocks

Situation Calculus (Temporal Logic)

9

Situation: S0, do(A,S)

Example: do(putdown(A), do(walk(L), do(pickup(A), S0)))

Relational Fluent: relations with values depending on situations

Example: is_carrying(robot, p, s)

Functional Fluent: situation dependent functions

Example: loc(robot, s)

More expressive than LTL

Page 23: Logic Programming and MDPs for Planningpeople.csail.mit.edu/agf/Files/Presentation/DT-GOLOG - 16.412 Class.pdfBuild a tower with least effort Pick a block as base Stack all other blocks

10

GOLOG: Syntax

[Scott Sanner - ICAPS08 Tutorial]

Page 24: Logic Programming and MDPs for Planningpeople.csail.mit.edu/agf/Files/Presentation/DT-GOLOG - 16.412 Class.pdfBuild a tower with least effort Pick a block as base Stack all other blocks

10

GOLOG: Syntax

[Scott Sanner - ICAPS08 Tutorial]

Page 25: Logic Programming and MDPs for Planningpeople.csail.mit.edu/agf/Files/Presentation/DT-GOLOG - 16.412 Class.pdfBuild a tower with least effort Pick a block as base Stack all other blocks

10

GOLOG: Syntax

[Scott Sanner - ICAPS08 Tutorial]

Page 26: Logic Programming and MDPs for Planningpeople.csail.mit.edu/agf/Files/Presentation/DT-GOLOG - 16.412 Class.pdfBuild a tower with least effort Pick a block as base Stack all other blocks

10

GOLOG: Syntax

[Scott Sanner - ICAPS08 Tutorial]

Page 27: Logic Programming and MDPs for Planningpeople.csail.mit.edu/agf/Files/Presentation/DT-GOLOG - 16.412 Class.pdfBuild a tower with least effort Pick a block as base Stack all other blocks

10

GOLOG: Syntax

[Scott Sanner - ICAPS08 Tutorial]

Page 28: Logic Programming and MDPs for Planningpeople.csail.mit.edu/agf/Files/Presentation/DT-GOLOG - 16.412 Class.pdfBuild a tower with least effort Pick a block as base Stack all other blocks

10

GOLOG: Syntax

[Scott Sanner - ICAPS08 Tutorial]

Page 29: Logic Programming and MDPs for Planningpeople.csail.mit.edu/agf/Files/Presentation/DT-GOLOG - 16.412 Class.pdfBuild a tower with least effort Pick a block as base Stack all other blocks

10

GOLOG: Syntax

[Scott Sanner - ICAPS08 Tutorial]

Page 30: Logic Programming and MDPs for Planningpeople.csail.mit.edu/agf/Files/Presentation/DT-GOLOG - 16.412 Class.pdfBuild a tower with least effort Pick a block as base Stack all other blocks

10

GOLOG: Syntax

[Scott Sanner - ICAPS08 Tutorial]

Page 31: Logic Programming and MDPs for Planningpeople.csail.mit.edu/agf/Files/Presentation/DT-GOLOG - 16.412 Class.pdfBuild a tower with least effort Pick a block as base Stack all other blocks

10

GOLOG: Syntax

[Scott Sanner - ICAPS08 Tutorial]

Page 32: Logic Programming and MDPs for Planningpeople.csail.mit.edu/agf/Files/Presentation/DT-GOLOG - 16.412 Class.pdfBuild a tower with least effort Pick a block as base Stack all other blocks

10

GOLOG: Syntax

[Scott Sanner - ICAPS08 Tutorial]

Page 33: Logic Programming and MDPs for Planningpeople.csail.mit.edu/agf/Files/Presentation/DT-GOLOG - 16.412 Class.pdfBuild a tower with least effort Pick a block as base Stack all other blocks

10

GOLOG: Syntax

[Scott Sanner - ICAPS08 Tutorial]

? Aren’t we hard-coding the whole solution?

Page 34: Logic Programming and MDPs for Planningpeople.csail.mit.edu/agf/Files/Presentation/DT-GOLOG - 16.412 Class.pdfBuild a tower with least effort Pick a block as base Stack all other blocks

Blocks World

11

Actions: pickup(b), putOnTable(b), putOn(b1,b2)

Fluents: onTable(b, s), on(b1, b2, s)

GOLOG Program:

while (∃b) ¬onTable(b) then (pickup(b), putOnTable(b)) endWhile

Page 35: Logic Programming and MDPs for Planningpeople.csail.mit.edu/agf/Files/Presentation/DT-GOLOG - 16.412 Class.pdfBuild a tower with least effort Pick a block as base Stack all other blocks

Blocks World

11

Actions: pickup(b), putOnTable(b), putOn(b1,b2)

Fluents: onTable(b, s), on(b1, b2, s)

GOLOG Program:

while (∃b) ¬onTable(b) then (pickup(b), putOnTable(b)) endWhile

? What does it do?

Page 36: Logic Programming and MDPs for Planningpeople.csail.mit.edu/agf/Files/Presentation/DT-GOLOG - 16.412 Class.pdfBuild a tower with least effort Pick a block as base Stack all other blocks

GOLOG Execution

12

Page 37: Logic Programming and MDPs for Planningpeople.csail.mit.edu/agf/Files/Presentation/DT-GOLOG - 16.412 Class.pdfBuild a tower with least effort Pick a block as base Stack all other blocks

Given Start S0, Goal S’, and program δ

GOLOG Execution

12

Page 38: Logic Programming and MDPs for Planningpeople.csail.mit.edu/agf/Files/Presentation/DT-GOLOG - 16.412 Class.pdfBuild a tower with least effort Pick a block as base Stack all other blocks

Given Start S0, Goal S’, and program δ

Call Do(δ,S0,S’)

GOLOG Execution

12

Page 39: Logic Programming and MDPs for Planningpeople.csail.mit.edu/agf/Files/Presentation/DT-GOLOG - 16.412 Class.pdfBuild a tower with least effort Pick a block as base Stack all other blocks

Given Start S0, Goal S’, and program δ

Call Do(δ,S0,S’)

First-order logic proof system

GOLOG Execution

12

Page 40: Logic Programming and MDPs for Planningpeople.csail.mit.edu/agf/Files/Presentation/DT-GOLOG - 16.412 Class.pdfBuild a tower with least effort Pick a block as base Stack all other blocks

Given Start S0, Goal S’, and program δ

Call Do(δ,S0,S’)

First-order logic proof system

Prolog

GOLOG Execution

12

Page 41: Logic Programming and MDPs for Planningpeople.csail.mit.edu/agf/Files/Presentation/DT-GOLOG - 16.412 Class.pdfBuild a tower with least effort Pick a block as base Stack all other blocks

Given Start S0, Goal S’, and program δ

Call Do(δ,S0,S’)

First-order logic proof system

Prolog

S0

GOLOG Execution

12

Page 42: Logic Programming and MDPs for Planningpeople.csail.mit.edu/agf/Files/Presentation/DT-GOLOG - 16.412 Class.pdfBuild a tower with least effort Pick a block as base Stack all other blocks

Given Start S0, Goal S’, and program δ

Call Do(δ,S0,S’)

First-order logic proof system

Prolog

S0

S1

GOLOG Execution

12

Page 43: Logic Programming and MDPs for Planningpeople.csail.mit.edu/agf/Files/Presentation/DT-GOLOG - 16.412 Class.pdfBuild a tower with least effort Pick a block as base Stack all other blocks

Given Start S0, Goal S’, and program δ

Call Do(δ,S0,S’)

First-order logic proof system

Prolog

S0

S1

S2 S3

GOLOG Execution

12

Page 44: Logic Programming and MDPs for Planningpeople.csail.mit.edu/agf/Files/Presentation/DT-GOLOG - 16.412 Class.pdfBuild a tower with least effort Pick a block as base Stack all other blocks

Given Start S0, Goal S’, and program δ

Call Do(δ,S0,S’)

First-order logic proof system

Prolog

S0

S1

S2 S3

GOLOG Execution

12

Non-deterministic

Page 45: Logic Programming and MDPs for Planningpeople.csail.mit.edu/agf/Files/Presentation/DT-GOLOG - 16.412 Class.pdfBuild a tower with least effort Pick a block as base Stack all other blocks

S4

Given Start S0, Goal S’, and program δ

Call Do(δ,S0,S’)

First-order logic proof system

Prolog

S0

S1

S2 S3

GOLOG Execution

12

Non-deterministic

Page 46: Logic Programming and MDPs for Planningpeople.csail.mit.edu/agf/Files/Presentation/DT-GOLOG - 16.412 Class.pdfBuild a tower with least effort Pick a block as base Stack all other blocks

MDP+ Logic + Programming

MDP

Index

13

Introduction

Logic Programming

Page 47: Logic Programming and MDPs for Planningpeople.csail.mit.edu/agf/Files/Presentation/DT-GOLOG - 16.412 Class.pdfBuild a tower with least effort Pick a block as base Stack all other blocks

MDP+ Logic + Programming

MDP

Index

13

Introduction

Logic Programming

Page 48: Logic Programming and MDPs for Planningpeople.csail.mit.edu/agf/Files/Presentation/DT-GOLOG - 16.412 Class.pdfBuild a tower with least effort Pick a block as base Stack all other blocks

MDP

!S,A,Pass! ,Ra

ss! , !"

14

B.S.

100%,+85

M.S.

100%,+120

Studying

10%,-50

80%,-50 Studying

30%,-200

70%,-200Ph.D.

10%,-300

Working

100%,+60

Working

Working

B.S., Working, +60, B.S. Studying, -50, M.S. , ...

Page 49: Logic Programming and MDPs for Planningpeople.csail.mit.edu/agf/Files/Presentation/DT-GOLOG - 16.412 Class.pdfBuild a tower with least effort Pick a block as base Stack all other blocks

MDP

15

V !(s) = E

! !"

t=1

!t"1rt

####s0 = s, "

$.

Policy π(s): How to act in each state

Value Function V(s): How good is it to be in each state?

Page 50: Logic Programming and MDPs for Planningpeople.csail.mit.edu/agf/Files/Presentation/DT-GOLOG - 16.412 Class.pdfBuild a tower with least effort Pick a block as base Stack all other blocks

MDP

16

Goal

Find the optimal policy (π*), which maximizes the value function for all states

V !(s) = maxa

E

!r + !V !(s")

"

!!(s) = argmaxa · · ·

Page 51: Logic Programming and MDPs for Planningpeople.csail.mit.edu/agf/Files/Presentation/DT-GOLOG - 16.412 Class.pdfBuild a tower with least effort Pick a block as base Stack all other blocks

Example (Value Iteration)

17

0 -1

Reward:

Actions:

V (s) = maxa

E

!r + !V (s!)

"

Page 52: Logic Programming and MDPs for Planningpeople.csail.mit.edu/agf/Files/Presentation/DT-GOLOG - 16.412 Class.pdfBuild a tower with least effort Pick a block as base Stack all other blocks

Example (Value Iteration)

17

0 -10 0 0 0

0 0 0 0

0 0 0 0

0 0 0 0

Reward:

Actions:

V (s) = maxa

E

!r + !V (s!)

"

Page 53: Logic Programming and MDPs for Planningpeople.csail.mit.edu/agf/Files/Presentation/DT-GOLOG - 16.412 Class.pdfBuild a tower with least effort Pick a block as base Stack all other blocks

Example (Value Iteration)

17

0 -10 0 0 0

0 0 0 0

0 0 0 0

0 0 0 0

0 -1 -1 -1

-1 -1 -1 -1

-1 -1 -1 -1

-1 -1 -1 -1

Reward:

Actions:

V (s) = maxa

E

!r + !V (s!)

"

Page 54: Logic Programming and MDPs for Planningpeople.csail.mit.edu/agf/Files/Presentation/DT-GOLOG - 16.412 Class.pdfBuild a tower with least effort Pick a block as base Stack all other blocks

Example (Value Iteration)

17

0 -10 0 0 0

0 0 0 0

0 0 0 0

0 0 0 0

0 -1 -1 -1

-1 -1 -1 -1

-1 -1 -1 -1

-1 -1 -1 -1

0 -1 -2 -2

-2 -2 -2 -2

-1 -2 -2 -2

-2 -2 -2 -2

Reward:

Actions:

V (s) = maxa

E

!r + !V (s!)

"

Page 55: Logic Programming and MDPs for Planningpeople.csail.mit.edu/agf/Files/Presentation/DT-GOLOG - 16.412 Class.pdfBuild a tower with least effort Pick a block as base Stack all other blocks

Example (Value Iteration)

17

0 -10 0 0 0

0 0 0 0

0 0 0 0

0 0 0 0

0 -1 -1 -1

-1 -1 -1 -1

-1 -1 -1 -1

-1 -1 -1 -1

0 -1 -2 -2

-2 -2 -2 -2

-1 -2 -2 -2

-2 -2 -2 -2

0 -1 -2 -3

-3 -4 -5 -6

-1 -2 -3 -4

-2 -3 -4 -5

Reward:

Actions:

V (s) = maxa

E

!r + !V (s!)

"

Page 56: Logic Programming and MDPs for Planningpeople.csail.mit.edu/agf/Files/Presentation/DT-GOLOG - 16.412 Class.pdfBuild a tower with least effort Pick a block as base Stack all other blocks

MDP+ Logic + Programming

MDP

Index

18

Introduction

Logic Programming

Page 57: Logic Programming and MDPs for Planningpeople.csail.mit.edu/agf/Files/Presentation/DT-GOLOG - 16.412 Class.pdfBuild a tower with least effort Pick a block as base Stack all other blocks

MDP+ Logic + Programming

MDP

Index

18

Introduction

Logic Programming

Page 58: Logic Programming and MDPs for Planningpeople.csail.mit.edu/agf/Files/Presentation/DT-GOLOG - 16.412 Class.pdfBuild a tower with least effort Pick a block as base Stack all other blocks

DT-GOLOG: Decision Theoretic GOLOG

19

[Boutilier, et. al 00]

Page 59: Logic Programming and MDPs for Planningpeople.csail.mit.edu/agf/Files/Presentation/DT-GOLOG - 16.412 Class.pdfBuild a tower with least effort Pick a block as base Stack all other blocks

DT-GOLOG: Decision Theoretic GOLOG

19

GOLOG

No Optimization

[Boutilier, et. al 00]

Page 60: Logic Programming and MDPs for Planningpeople.csail.mit.edu/agf/Files/Presentation/DT-GOLOG - 16.412 Class.pdfBuild a tower with least effort Pick a block as base Stack all other blocks

DT-GOLOG: Decision Theoretic GOLOG

19

MDP

Scaling is challenging

GOLOG

No Optimization

[Boutilier, et. al 00]

Page 61: Logic Programming and MDPs for Planningpeople.csail.mit.edu/agf/Files/Presentation/DT-GOLOG - 16.412 Class.pdfBuild a tower with least effort Pick a block as base Stack all other blocks

DT-GOLOG: Decision Theoretic GOLOG

19

MDP

Scaling is challenging

GOLOG

No Optimization

All non-deterministic actions are optimized using MDP concepts.

[Boutilier, et. al 00]

Page 62: Logic Programming and MDPs for Planningpeople.csail.mit.edu/agf/Files/Presentation/DT-GOLOG - 16.412 Class.pdfBuild a tower with least effort Pick a block as base Stack all other blocks

DT-GOLOG: Decision Theoretic GOLOG

19

MDP

Scaling is challenging

GOLOG

No Optimization

All non-deterministic actions are optimized using MDP concepts.

Uncertainty, Reward [Boutilier, et. al 00]

Page 63: Logic Programming and MDPs for Planningpeople.csail.mit.edu/agf/Files/Presentation/DT-GOLOG - 16.412 Class.pdfBuild a tower with least effort Pick a block as base Stack all other blocks

Decision Theoretic GOLOG

20

Programming(GOLOG)

Planning(MDP)

Page 64: Logic Programming and MDPs for Planningpeople.csail.mit.edu/agf/Files/Presentation/DT-GOLOG - 16.412 Class.pdfBuild a tower with least effort Pick a block as base Stack all other blocks

Decision Theoretic GOLOG

20

Programming(GOLOG)

Planning(MDP)

Known Solution

High-level Structure

Page 65: Logic Programming and MDPs for Planningpeople.csail.mit.edu/agf/Files/Presentation/DT-GOLOG - 16.412 Class.pdfBuild a tower with least effort Pick a block as base Stack all other blocks

Decision Theoretic GOLOG

20

Programming(GOLOG)

Planning(MDP)

Not enough grasping

Low-level Structure

Known Solution

High-level Structure

Page 66: Logic Programming and MDPs for Planningpeople.csail.mit.edu/agf/Files/Presentation/DT-GOLOG - 16.412 Class.pdfBuild a tower with least effort Pick a block as base Stack all other blocks

Example

21

Build a tower with least effort

Page 67: Logic Programming and MDPs for Planningpeople.csail.mit.edu/agf/Files/Presentation/DT-GOLOG - 16.412 Class.pdfBuild a tower with least effort Pick a block as base Stack all other blocks

Example

21

Build a tower with least effort

Pick a block as base

Stack all other blocks on top of it

Page 68: Logic Programming and MDPs for Planningpeople.csail.mit.edu/agf/Files/Presentation/DT-GOLOG - 16.412 Class.pdfBuild a tower with least effort Pick a block as base Stack all other blocks

Example

21

Build a tower with least effort

Use which block for base?

In which order pick up the blocks

Pick a block as base

Stack all other blocks on top of it

Page 69: Logic Programming and MDPs for Planningpeople.csail.mit.edu/agf/Files/Presentation/DT-GOLOG - 16.412 Class.pdfBuild a tower with least effort Pick a block as base Stack all other blocks

Example

21

Build a tower with least effort

Use which block for base?

In which order pick up the blocks

Pick a block as base

Stack all other blocks on top of itGOLOG

MDP

Page 70: Logic Programming and MDPs for Planningpeople.csail.mit.edu/agf/Files/Presentation/DT-GOLOG - 16.412 Class.pdfBuild a tower with least effort Pick a block as base Stack all other blocks

Example

21

Build a tower with least effort

Use which block for base?

In which order pick up the blocks

Pick a block as base

Stack all other blocks on top of itGOLOG

MDP

? Guaranteed to be optimal?

Page 71: Logic Programming and MDPs for Planningpeople.csail.mit.edu/agf/Files/Presentation/DT-GOLOG - 16.412 Class.pdfBuild a tower with least effort Pick a block as base Stack all other blocks

DT-GOLOG Execution

22

Given Start S0, Goal S’, and program δ

Add rewards

Formulate the problem as an MDP

Page 72: Logic Programming and MDPs for Planningpeople.csail.mit.edu/agf/Files/Presentation/DT-GOLOG - 16.412 Class.pdfBuild a tower with least effort Pick a block as base Stack all other blocks

S4

S0

S1

S2 S3

DT-GOLOG Execution

22

Given Start S0, Goal S’, and program δ

Add rewards

Formulate the problem as an MDP

Page 73: Logic Programming and MDPs for Planningpeople.csail.mit.edu/agf/Files/Presentation/DT-GOLOG - 16.412 Class.pdfBuild a tower with least effort Pick a block as base Stack all other blocks

S4

S0

S1

S2 S3

DT-GOLOG Execution

22

Given Start S0, Goal S’, and program δ

Add rewards

Formulate the problem as an MDP

-1

+3-2

-5

Page 74: Logic Programming and MDPs for Planningpeople.csail.mit.edu/agf/Files/Presentation/DT-GOLOG - 16.412 Class.pdfBuild a tower with least effort Pick a block as base Stack all other blocks

S4

S0

S1

S2 S3

DT-GOLOG Execution

22

Given Start S0, Goal S’, and program δ

Add rewards

Formulate the problem as an MDP

(∃b) ¬onTable(b), b∈{b1, ..., bn}

¬onTable(b1)∨¬onTable(b2)∨... ∨¬onTable(bn)

-1

+3-2

-5

Page 75: Logic Programming and MDPs for Planningpeople.csail.mit.edu/agf/Files/Presentation/DT-GOLOG - 16.412 Class.pdfBuild a tower with least effort Pick a block as base Stack all other blocks

23

First Order Dynamic Programming

Idea:

Logical Structure

Abstract Value Function

Avoid curse of dimensionality!

[Sanner 07]

Resulting MDP can still be intractable.

Page 76: Logic Programming and MDPs for Planningpeople.csail.mit.edu/agf/Files/Presentation/DT-GOLOG - 16.412 Class.pdfBuild a tower with least effort Pick a block as base Stack all other blocks

24

Symbolic Dynamic Programming (Deterministic)

Tabular:

Symbolic ?

V (s) = maxa

!r + !V (s!)

"

Page 77: Logic Programming and MDPs for Planningpeople.csail.mit.edu/agf/Files/Presentation/DT-GOLOG - 16.412 Class.pdfBuild a tower with least effort Pick a block as base Stack all other blocks

24

Symbolic Dynamic Programming (Deterministic)

Tabular:

Symbolic ?

V (s) = maxa

!r + !V (s!)

"

Page 78: Logic Programming and MDPs for Planningpeople.csail.mit.edu/agf/Files/Presentation/DT-GOLOG - 16.412 Class.pdfBuild a tower with least effort Pick a block as base Stack all other blocks

24

Symbolic Dynamic Programming (Deterministic)

Tabular:

Symbolic ?

V (s) = maxa

!r + !V (s!)

"

Page 79: Logic Programming and MDPs for Planningpeople.csail.mit.edu/agf/Files/Presentation/DT-GOLOG - 16.412 Class.pdfBuild a tower with least effort Pick a block as base Stack all other blocks

24

Symbolic Dynamic Programming (Deterministic)

Tabular:

Symbolic ?

V (s) = maxa

!r + !V (s!)

"

Page 80: Logic Programming and MDPs for Planningpeople.csail.mit.edu/agf/Files/Presentation/DT-GOLOG - 16.412 Class.pdfBuild a tower with least effort Pick a block as base Stack all other blocks

24

Symbolic Dynamic Programming (Deterministic)

Tabular:

Symbolic ?

V (s) = maxa

!r + !V (s!)

"

Page 81: Logic Programming and MDPs for Planningpeople.csail.mit.edu/agf/Files/Presentation/DT-GOLOG - 16.412 Class.pdfBuild a tower with least effort Pick a block as base Stack all other blocks

24

Symbolic Dynamic Programming (Deterministic)

Tabular:

Symbolic ?

V (s) = maxa

!r + !V (s!)

"

Representation of Reward and Values

Adding rewards and values

Max Operator

Find S

Page 82: Logic Programming and MDPs for Planningpeople.csail.mit.edu/agf/Files/Presentation/DT-GOLOG - 16.412 Class.pdfBuild a tower with least effort Pick a block as base Stack all other blocks

25

Reward and Value Representation

Case Representation

∃b, b≠b1, on(b,b1) 10

∄b, b≠b1, on(b,b1) 0

b1

rCase =

Page 83: Logic Programming and MDPs for Planningpeople.csail.mit.edu/agf/Files/Presentation/DT-GOLOG - 16.412 Class.pdfBuild a tower with least effort Pick a block as base Stack all other blocks

26

Add Symbolically

! !Similarly defined for and

[Scott Sanner - ICAPS08 Tutorial]

⊕A 10

¬A 20

B 1

¬B 2

A∧B 11

A∧¬B 12

¬A∧B 21

¬A∧¬B 22

=

Page 84: Logic Programming and MDPs for Planningpeople.csail.mit.edu/agf/Files/Presentation/DT-GOLOG - 16.412 Class.pdfBuild a tower with least effort Pick a block as base Stack all other blocks

27

Max operator

max Operator

Φ1 10

Φ2 5

Φ3 3

Φ4 0

max_a =

Φ1 10

¬Φ1∧Φ2 5

¬Φ1∧¬Φ2∧Φ3 3

¬Φ1∧¬Φ2∧¬Φ3∧Φ4 0

a_1

a_2

[Scott Sanner - ICAPS08 Tutorial]

Page 85: Logic Programming and MDPs for Planningpeople.csail.mit.edu/agf/Files/Presentation/DT-GOLOG - 16.412 Class.pdfBuild a tower with least effort Pick a block as base Stack all other blocks

28

Find s? Isnʼt it obvious?

Page 86: Logic Programming and MDPs for Planningpeople.csail.mit.edu/agf/Files/Presentation/DT-GOLOG - 16.412 Class.pdfBuild a tower with least effort Pick a block as base Stack all other blocks

28

Find s? Isnʼt it obvious?

s s’a

Page 87: Logic Programming and MDPs for Planningpeople.csail.mit.edu/agf/Files/Presentation/DT-GOLOG - 16.412 Class.pdfBuild a tower with least effort Pick a block as base Stack all other blocks

28

Find s? Isnʼt it obvious?

s s’a

Dynamic Programming: Given V(s’) find V(s)

In MDPs, we have s explicitly.

In symbolic representation we have it implicitly so we have to build it.

Page 88: Logic Programming and MDPs for Planningpeople.csail.mit.edu/agf/Files/Presentation/DT-GOLOG - 16.412 Class.pdfBuild a tower with least effort Pick a block as base Stack all other blocks

29

Find s = Goal Regression

b1B

b1

regress(Φ1,a)

Weakest relation that ensures Φ1 after taking a

b1

A

Clear(b1) ∧ B≠b1

On(A,b1)

Φ1=clear(b1)

put(A,B)∨?

Page 89: Logic Programming and MDPs for Planningpeople.csail.mit.edu/agf/Files/Presentation/DT-GOLOG - 16.412 Class.pdfBuild a tower with least effort Pick a block as base Stack all other blocks

29

Find s = Goal Regression

b1B

b1

regress(Φ1,a)

Weakest relation that ensures Φ1 after taking a

b1

A

Clear(b1) ∧ B≠b1

On(A,b1)

Φ1=clear(b1)

put(A,B)∨

Page 90: Logic Programming and MDPs for Planningpeople.csail.mit.edu/agf/Files/Presentation/DT-GOLOG - 16.412 Class.pdfBuild a tower with least effort Pick a block as base Stack all other blocks

vCase = maxa

!rCase! ! " regr(vCase, a)

"

30

Symbolic Dynamic Programming (Deterministic)

Tabular:

Symbolic:

?

V (s) = maxa

!r + !V (s!)

"

Page 91: Logic Programming and MDPs for Planningpeople.csail.mit.edu/agf/Files/Presentation/DT-GOLOG - 16.412 Class.pdfBuild a tower with least effort Pick a block as base Stack all other blocks

vCase = maxa

!rCase! ! " regr(vCase, a)

"

30

Symbolic Dynamic Programming (Deterministic)

Tabular:

Symbolic:

V (s) = maxa

!r + !V (s!)

"

Page 92: Logic Programming and MDPs for Planningpeople.csail.mit.edu/agf/Files/Presentation/DT-GOLOG - 16.412 Class.pdfBuild a tower with least effort Pick a block as base Stack all other blocks

31

Classical Example

Box Truck City

Goal: Have a box in Paris

∃b, BoxIn(b,Paris) 10

else 0rCase =

Page 93: Logic Programming and MDPs for Planningpeople.csail.mit.edu/agf/Files/Presentation/DT-GOLOG - 16.412 Class.pdfBuild a tower with least effort Pick a block as base Stack all other blocks

32

Classical Example

Actions: drive(t,c1,c2), load(b,t), unload(b,t), noop

load and unload have 10% chance of failure

Fluents: BoxIn(b,c), BoxOn(b,t), TruckIn(t,c)

Assumptions:

All cities are connected.

ϒ = .9

Page 94: Logic Programming and MDPs for Planningpeople.csail.mit.edu/agf/Files/Presentation/DT-GOLOG - 16.412 Class.pdfBuild a tower with least effort Pick a block as base Stack all other blocks

33

Example[Sanner 07]

∃b, BoxIn(b,Paris) 100 noop

else, ∃b,t TruckIn(t,Paris) ∧ BoxOn(b,t) 89 unload(b,t)

else, ∃b,c,t BoxOn(b,t) ∧ TruckIn(t,c) 80 drive(t,c,paris)

else, ∃b,c,t BoxIn(b,c) ∧ TruckIn(t,c) 72 load(b,t)

else, ∃b,c1,c2,t BoxIn(b,c1) ∧ TruckIn(t,c2) 65 drive(t,c2,c1)

else 0 noop

s V*(s) π*(s)

Page 95: Logic Programming and MDPs for Planningpeople.csail.mit.edu/agf/Files/Presentation/DT-GOLOG - 16.412 Class.pdfBuild a tower with least effort Pick a block as base Stack all other blocks

33

Example[Sanner 07]

∃b, BoxIn(b,Paris) 100 noop

else, ∃b,t TruckIn(t,Paris) ∧ BoxOn(b,t) 89 unload(b,t)

else, ∃b,c,t BoxOn(b,t) ∧ TruckIn(t,c) 80 drive(t,c,paris)

else, ∃b,c,t BoxIn(b,c) ∧ TruckIn(t,c) 72 load(b,t)

else, ∃b,c1,c2,t BoxIn(b,c1) ∧ TruckIn(t,c2) 65 drive(t,c2,c1)

else 0 noop

? What did we gain by going through all of this?

s V*(s) π*(s)

Page 96: Logic Programming and MDPs for Planningpeople.csail.mit.edu/agf/Files/Presentation/DT-GOLOG - 16.412 Class.pdfBuild a tower with least effort Pick a block as base Stack all other blocks

Conclusion

34

Page 97: Logic Programming and MDPs for Planningpeople.csail.mit.edu/agf/Files/Presentation/DT-GOLOG - 16.412 Class.pdfBuild a tower with least effort Pick a block as base Stack all other blocks

Conclusion

34

Logic Programming

Situation Calculus

GOLOG

Planning

Page 98: Logic Programming and MDPs for Planningpeople.csail.mit.edu/agf/Files/Presentation/DT-GOLOG - 16.412 Class.pdfBuild a tower with least effort Pick a block as base Stack all other blocks

Conclusion

34

Logic Programming

Situation Calculus

GOLOG

Planning

MDP

Review

Value Iteration

Page 99: Logic Programming and MDPs for Planningpeople.csail.mit.edu/agf/Files/Presentation/DT-GOLOG - 16.412 Class.pdfBuild a tower with least effort Pick a block as base Stack all other blocks

Conclusion

34

Logic Programming

Situation Calculus

GOLOG

Planning

MDP

Review

Value Iteration

MDP+ Logic + Programming

DT-GOLOG Symbolic DP

Page 100: Logic Programming and MDPs for Planningpeople.csail.mit.edu/agf/Files/Presentation/DT-GOLOG - 16.412 Class.pdfBuild a tower with least effort Pick a block as base Stack all other blocks

References

35

Levesque, H.,Reiter, R., Lespérance, Y., Lin, F., and Scherl, R. “ GOLOG: A Logic Programming Language for Dynamic Domains “, Journal of Logic Programming, 31:59--84, 1997

Richard S. Sutton, Andrew G. Barto, “Reinforcement Learning: An Introduction”, MIT Press, Cambridge, 1998

Craig Boutilier, Raymond Reiter, Mikhail Soutchanski, Sebastian Thrun, “Decision-Theoretic, High-Level Agent Programming in the Situation Calculus”. AAAI/IAAI 2000: 355-362

S. Sanner, and K. Kersting, ”Symbolic dynamic programming”. Chapter to appear in C. Sammut, editor, Encyclopedia of Machine Learning, Springer-Verlag, 2007

Page 101: Logic Programming and MDPs for Planningpeople.csail.mit.edu/agf/Files/Presentation/DT-GOLOG - 16.412 Class.pdfBuild a tower with least effort Pick a block as base Stack all other blocks

References

36

Craig Boutilier, Ray Reiter and Bob Price, “Symbolic Dynamic Programming for First-order MDPs”, Proceedings of the Seventeenth International Joint Conference on Artificial Intelligence (IJCAI), Seattle, pp.690--697 (2001).

Page 102: Logic Programming and MDPs for Planningpeople.csail.mit.edu/agf/Files/Presentation/DT-GOLOG - 16.412 Class.pdfBuild a tower with least effort Pick a block as base Stack all other blocks

37

Questions ?

Page 103: Logic Programming and MDPs for Planningpeople.csail.mit.edu/agf/Files/Presentation/DT-GOLOG - 16.412 Class.pdfBuild a tower with least effort Pick a block as base Stack all other blocks

37

Questions ?