
Page 1

Modified MDPs for Concurrent Execution

AnYuan Guo

Victor Lesser

University of Massachusetts

Page 2

Concurrent Execution

A set of tasks, each relatively easy to solve on its own, whose concurrent execution introduces new interactions that complicate the composite task.

Single agent executing multiple tasks in parallel (example: office robot)

Multiple agents acting in parallel (example: a team)

Page 3

Cross Product MDP

The problem of concurrent execution can be solved optimally by solving the cross-product MDP formed by the separate processes.

Problem: exponential blow-up in the size of the joint state and action spaces (illustrated below).
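
The blow-up can be seen directly from the sizes of the joint spaces. A minimal sketch, with made-up task sizes (three tasks, 100 states and 8 actions each):

```python
# Illustration of the cross-product blow-up: the joint MDP over n independent
# tasks has the Cartesian product of the per-task state spaces as its state
# space (and likewise for actions). The task sizes below are made up.
from math import prod

task_state_sizes = [100, 100, 100]   # three hypothetical tasks, 100 states each
task_action_sizes = [8, 8, 8]        # 8 actions each

joint_states = prod(task_state_sizes)    # 100^3 = 1,000,000
joint_actions = prod(task_action_sizes)  # 8^3   = 512

print(f"joint states: {joint_states}, joint actions: {joint_actions}")
# One value-iteration sweep over the joint MDP costs roughly O(|S|^2 * |A|),
# so the cost grows exponentially with the number of concurrent tasks.
```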

Page 4

Related Work

- Deterministic planning: situation calculus [Reiter96]; extending STRIPS [Boutilier97, Knoblock94]
- Termination schemes for temporally extended actions [Rohanimanesh03]
- Planning in the cross-product MDP [Singh98]
- Learning: W-learning [Humphrys96], MAXQ [Dietterich00]

Page 5

The Goal

Break apart the interactions and encapsulate them within each agent, so that the tasks can again be solved independently.

Page 6

Algorithm Summary

1) Define the types of events and interactions of interest.

2) Summarize the other agent's effect on the current agent as a statistic: how often the constraining event occurs.

3) Modify the current agent's own model to reflect this statistic (a schematic sketch follows below).
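
A schematic of these steps for a single one-way constraint, written as a minimal Python sketch. The helpers (`solve`, `event_frequency`, `modify_transitions`) are placeholders for the computations detailed on the following slides, not functions defined in the paper:

```python
def solve_with_summary(mdp1, mdp2, constraining_event, constraint,
                       solve, event_frequency, modify_transitions):
    """Schematic pipeline for one constraint running from agent 1 to agent 2.

    The three callables are placeholders: `solve` solves a single MDP (e.g.
    value iteration), `event_frequency` computes how often the constraining
    event occurs under a policy, and `modify_transitions` folds that
    frequency into the affected agent's transition probability table.
    """
    policy1 = solve(mdp1)                                        # solve the constraining task on its own
    freq = event_frequency(mdp1, policy1, constraining_event)    # summarize agent 1's effect as a frequency
    mdp2_modified = modify_transitions(mdp2, constraint, freq)   # reflect the statistic in agent 2's model
    policy2 = solve(mdp2_modified)                               # agent 2 is solved independently again
    return policy1, policy2
```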

Page 7

Events in an MDP:
- State-based events (e.g. the agent enters s5)
- Action-based events (e.g. the agent moves north one step)
- State-action-based events (e.g. the agent moves north one step from s4)

Events in MDP1 can affect events in MDP2, giving 3 × 3 = 9 types of interactions (one possible encoding is sketched below).
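
One possible encoding of the three event types; the class and field names here are illustrative, not taken from the paper:

```python
from dataclasses import dataclass
from typing import Hashable, Optional

@dataclass(frozen=True)
class Event:
    """State-based if only `state` is set, action-based if only `action`
    is set, state-action-based if both are set."""
    state: Optional[Hashable] = None
    action: Optional[Hashable] = None

enter_s5 = Event(state="s5")                             # state-based
move_north = Event(action="north")                       # action-based
move_north_from_s4 = Event(state="s4", action="north")   # state-action-based
# Any of the 3 source types can constrain any of the 3 target types,
# giving the 3 x 3 = 9 interaction types mentioned above.
```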

Page 8

Assumptions

- The list of possible interactions between the MDPs is given.
- The constraints are one-way only: the effects do not propagate back to the originator of the constraint.

Page 9

Directed Acyclic Constraints

The constraints between events form a directed acyclic graph (a sketch of how this acyclicity might be exploited follows below).
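
A minimal sketch of one way to exploit the acyclicity: treat tasks as nodes and constraints as directed edges, then process the tasks in topological order so each MDP is modified only after all of its constrainers have been solved. The topological-order scheduling is my own reading of the assumption, and the task names are made up:

```python
from graphlib import TopologicalSorter   # Python 3.9+

# Directed edges: "constrainer -> constrained". Acyclic by assumption.
constrains = {
    "task1": ["task2", "task3"],
    "task2": ["task3"],
}

ts = TopologicalSorter()
for src, targets in constrains.items():
    for dst in targets:
        ts.add(dst, src)    # dst can only be modified after src is solved

print(list(ts.static_order()))   # e.g. ['task1', 'task2', 'task3']
```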

Page 10

Event Frequency & MDP Modification

(Diagram: event 1 in one task constrains event 2 in another.)

1) Calculate the frequency of the constraining event (event 1).

2) Modify the MDP of the affected task.

Page 11

Calculating State Visitation Frequency

Given a policy π, solve the system of simultaneous linear equations

$F(s') = \sum_{s \in S} F(s)\, T(s, \pi(s), s')$

under the constraint that

$\sum_{s \in S} F(s) = 1$
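
This system is the stationary-distribution equation of the Markov chain induced by π. A minimal numpy sketch with a made-up 3-state chain, where P[s, s'] = T(s, π(s), s'):

```python
import numpy as np

# Hypothetical policy-induced chain: P[s, s'] = T(s, pi(s), s').
P = np.array([[0.8, 0.2, 0.0],
              [0.1, 0.7, 0.2],
              [0.0, 0.3, 0.7]])

n = P.shape[0]
# F(s') = sum_s F(s) P[s, s']  is  F = F P; rewrite as (P^T - I) F = 0 and
# replace one redundant row with the normalization constraint sum(F) = 1.
A = P.T - np.eye(n)
A[-1, :] = 1.0
b = np.zeros(n)
b[-1] = 1.0
F = np.linalg.solve(A, b)
print(F, F.sum())   # visitation frequencies, summing to 1
```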

Page 12

Calculating Action Frequencies

Given a policy π, the action frequency F(a) is the sum of the visitation frequencies of all the states in which action a is executed:

$F(a) = \sum_{s \in B} F(s)$, where $B = \{ s \mid \pi(s) = a \}$
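
A direct translation of this formula, assuming a deterministic policy represented as a state-to-action mapping; the states, frequencies, and policy below are made up:

```python
from collections import defaultdict

def action_frequencies(F, policy):
    """F: state -> visitation frequency, policy: state -> action."""
    freq = defaultdict(float)
    for s, f in F.items():
        freq[policy[s]] += f      # F(a) = sum of F(s) over {s | pi(s) = a}
    return dict(freq)

F = {"s1": 0.25, "s2": 0.50, "s3": 0.25}
policy = {"s1": "up", "s2": "up", "s3": "right"}
print(action_frequencies(F, policy))   # {'up': 0.75, 'right': 0.25}
```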

Page 13

Calculating State-Action Frequencies

Now both the action and the state in which it is executed matter:

$F(s, a) = \begin{cases} F(s) & \text{if } \pi(s) = a \\ 0 & \text{otherwise} \end{cases}$

This also generalizes to a set of states and actions.
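
The state-action case in code, again assuming a deterministic policy; the helper for a set of state-action pairs simply sums the individual terms:

```python
def state_action_frequency(F, policy, s, a):
    # F(s, a) = F(s) if pi(s) = a, and 0 otherwise.
    return F[s] if policy[s] == a else 0.0

def set_frequency(F, policy, pairs):
    # Generalization to a set of state-action pairs.
    return sum(state_action_frequency(F, policy, s, a) for s, a in pairs)

F = {"s1": 0.25, "s2": 0.50, "s3": 0.25}
policy = {"s1": "up", "s2": "up", "s3": "right"}
print(state_action_frequency(F, policy, "s2", "up"))            # 0.5
print(set_frequency(F, policy, [("s1", "up"), ("s3", "up")]))   # 0.25
```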

Page 14

Account for the Effects of Constraints

Modify the model $\langle S, A, T, R \rangle$: specifically, modify the transition probability table T.

Intuition: other agents can change the dynamics of my environment.

(Example diagram: two agents A1 and A2.)

Page 15

Account for State Based Events

A constraint from another task can affect the current task's ability to enter certain states.

A slice of the transition probability table (TPT) under action a1 (rows: from state, columns: to state):

from \ to       s1              s2              s3
s1          P(s1, a1, s1)   P(s1, a1, s2)   P(s1, a1, s3)
s2          P(s2, a1, s1)   P(s2, a1, s2)   P(s2, a1, s3)
s3          P(s3, a1, s1)   P(s3, a1, s2)   P(s3, a1, s3)
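
The slides do not spell out the exact update, so the following is a hedged sketch of one plausible state-based modification: if the constraining event occurs with frequency f and, when it occurs, blocks entry into a state, scale every transition into that state by (1 - f) and return the lost probability mass to the originating state (the agent stays put). The redistribution rule is an assumption of mine:

```python
import numpy as np

def modify_state_based(T_a, blocked, f):
    """T_a: TPT slice for one action, shape (n_states, n_states).
    blocked: index of the state whose entry is constrained.
    f: frequency of the constraining event in the other task."""
    T = T_a.copy()
    lost = f * T[:, blocked]            # mass removed from transitions into `blocked`
    T[:, blocked] -= lost
    idx = np.arange(T.shape[0])
    T[idx, idx] += lost                 # assumed rule: the agent stays in place instead
    return T

T_a1 = np.array([[0.7, 0.2, 0.1],
                 [0.1, 0.6, 0.3],
                 [0.0, 0.2, 0.8]])
print(modify_state_based(T_a1, blocked=2, f=0.5))   # rows still sum to 1
```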

Page 16

Account for Action Based Events

A constraint from another task can affect the current task's ability to carry out certain actions.

TPT for the affected action a1 (rows: from state, columns: to state):

from \ to       s1              s2              s3
s1          P(s1, a1, s1)   P(s1, a1, s2)   P(s1, a1, s3)
s2          P(s2, a1, s1)   P(s2, a1, s2)   P(s2, a1, s3)
s3          P(s3, a1, s1)   P(s3, a1, s2)   P(s3, a1, s3)
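
For an action-based constraint (the kind used in the experiment later), one plausible modification blends the affected action's whole TPT slice with the identity: with frequency f the action fails and the agent stays where it is. The "failure means no movement" rule is an assumption, not taken from the slides:

```python
import numpy as np

def modify_action_based(T_a, f):
    """With frequency f the affected action has no effect (self-transition)."""
    n = T_a.shape[0]
    return (1.0 - f) * T_a + f * np.eye(n)

T_a1 = np.array([[0.7, 0.2, 0.1],
                 [0.1, 0.6, 0.3],
                 [0.0, 0.2, 0.8]])
print(modify_action_based(T_a1, f=0.75))   # rows still sum to 1
```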

Page 17

Account for State-Action Based Events

A constraint from another task can affect the current task's ability to carry out certain actions in certain states.

TPT for the affected action a1 (rows: from state, columns: to state):

from \ to       s1              s2              s3
s1          P(s1, a1, s1)   P(s1, a1, s2)   P(s1, a1, s3)
s2          P(s2, a1, s1)   P(s2, a1, s2)   P(s2, a1, s3)
s3          P(s3, a1, s1)   P(s3, a1, s2)   P(s3, a1, s3)
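
The state-action case touches only a single row of the affected action's TPT slice: the row for the constrained state. Same hedged "stay in place on failure" assumption as above:

```python
import numpy as np

def modify_state_action_based(T_a, s, f):
    """Only the row for state s (under the affected action) is modified."""
    T = T_a.copy()
    stay = np.zeros(T.shape[0])
    stay[s] = 1.0                       # assumed failure outcome: remain in s
    T[s] = (1.0 - f) * T[s] + f * stay
    return T

T_a1 = np.array([[0.7, 0.2, 0.1],
                 [0.1, 0.6, 0.3],
                 [0.0, 0.2, 0.8]])
print(modify_state_action_based(T_a1, s=1, f=0.5))   # only row s=1 changes
```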

Page 18

Experiments

The mountain-climbing scenario (a configuration sketch follows below):
- States: the location of the agent
- Actions: move up, down, left, right, or any of the 4 diagonal steps (8 total)
- Transitions: probability 0.05 of slipping to an adjacent state rather than the intended one
- Rewards: -1 per step, -3 for a diagonal step, 100 for the goal
- Constraint: agent 1 taking the "up" action prevents agent 2 from doing so
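
A sketch of this configuration in code. The grid dimensions and goal location are not given on the slide, so the values here are placeholders:

```python
GRID_W, GRID_H = 10, 10                 # hypothetical grid size
GOAL = (9, 9)                           # hypothetical goal cell
SLIP_PROB = 0.05                        # chance of slipping to an adjacent state

ACTIONS = {                             # 8 moves: 4 cardinal + 4 diagonal
    "up": (0, 1), "down": (0, -1), "left": (-1, 0), "right": (1, 0),
    "up-left": (-1, 1), "up-right": (1, 1),
    "down-left": (-1, -1), "down-right": (1, -1),
}

def step_reward(action, reached_goal):
    if reached_goal:
        return 100.0
    return -3.0 if "-" in action else -1.0   # diagonal steps cost more

# Constraint (action-based): whenever agent 1 executes "up", agent 2 cannot.
# Agent 2's model is repaired with the action-based modification shown
# earlier, using F("up") computed from agent 1's solved policy.
```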

Page 19

Results: Policies

Policies when executing independently

Policies when executed concurrently, after we apply the algorithm

Page 20

Results

(Result plots: size of state space; average value of policy.)

Page 21

Improvements

Explore different ways to modify the MDP (e.g. shrink action set)

Relax the directed-acyclic constraint restriction (take an iterative approach)

Show that it is optimal for summaries that consist of a single random variable

Page 22

New Directions

Different types of summaries:
- steady-state behavior (current work)
- multi-state summaries
- summaries with temporal information

Dynamic task arrival/departure:
- given some model of arrival
- without a model (learning)

Positive interactions (e.g. enable)

Page 23

The End