Influence Diagrams for Robust Decision Making in Multiagent Settings



DESCRIPTION

Influence Diagrams for Robust Decision Making in Multiagent Settings. Prashant Doshi, University of Georgia, USA. http://thinc.cs.uga.edu. Yifeng Zeng, Reader, Teesside Univ. (previously Assoc. Prof., Aalborg Univ.). Yingke Chen, postdoctoral student. Muthu Chandrasekaran. PowerPoint PPT presentation.

Citation preview

Page 1: Influence Diagrams for Robust Decision Making in  Multiagent  Settings

Influence Diagrams for Robust Decision Making in Multiagent Settings

Page 2: Influence Diagrams for Robust Decision Making in  Multiagent  Settings

Prashant Doshi University of Georgia, USA

Page 3: Influence Diagrams for Robust Decision Making in  Multiagent  Settings

http://thinc.cs.uga.edu

Page 4: Influence Diagrams for Robust Decision Making in  Multiagent  Settings

Yingke Chen, postdoctoral student

Yifeng Zeng, Reader, Teesside Univ. (previously Assoc. Prof., Aalborg Univ.)

Muthu Chandrasekaran, doctoral student

Page 5: Influence Diagrams for Robust Decision Making in  Multiagent  Settings

Influence diagram

Page 6: Influence Diagrams for Robust Decision Making in  Multiagent  Settings

[ID with decision node A_i, utility node R_i, observation node O_i, and state node S]

An ID for decision making where the state may be partially observable
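The evaluation of such an ID can be sketched as Bayes-rule inference followed by expected-utility maximization. The sketch below uses illustrative tiger-problem numbers (not taken from the talk); the node names mirror the slide.

```python
# Sketch: evaluating a one-shot influence diagram by enumeration.
# Nodes mirror the slide: state S (tiger location), observation O (growl),
# decision A, utility R. All numbers are illustrative placeholders.

P_S = {"left": 0.5, "right": 0.5}                      # prior over tiger location
P_O = {("left", "GL"): 0.85, ("left", "GR"): 0.15,     # P(growl | tiger location)
       ("right", "GL"): 0.15, ("right", "GR"): 0.85}
R = {("left", "open-left"): -100, ("left", "open-right"): 10,
     ("right", "open-left"): 10, ("right", "open-right"): -100,
     ("left", "listen"): -1, ("right", "listen"): -1}

def best_decision(obs):
    """Pick the action maximizing expected utility given the observation."""
    # Posterior over S by Bayes rule, then expected utility per action.
    post = {s: P_S[s] * P_O[(s, obs)] for s in P_S}
    z = sum(post.values())
    post = {s: p / z for s, p in post.items()}
    eu = {a: sum(post[s] * R[(s, a)] for s in post)
          for a in ("open-left", "open-right", "listen")}
    return max(eu, key=eu.get), eu

act, eu = best_decision("GL")   # a single growl is not worth opening a door
```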

Page 7: Influence Diagrams for Robust Decision Making in  Multiagent  Settings

How do we generalize IDs to multiagent settings?

Page 8: Influence Diagrams for Robust Decision Making in  Multiagent  Settings

Adversarial tiger problem

Page 9: Influence Diagrams for Robust Decision Making in  Multiagent  Settings

Multiagent influence diagram (MAID) (Koller & Milch 2001)

MAIDs offer a richer representation for a game and may be transformed into a normal- or extensive-form game. A strategy of an agent is an assignment of a decision rule to every decision node of that agent.

[MAID for the adversarial tiger problem: decision nodes Open or Listen_i and Open or Listen_j, chance nodes Growl_i, Growl_j, and Tiger loc, utility nodes R_i and R_j]

Page 10: Influence Diagrams for Robust Decision Making in  Multiagent  Settings

The expected utility of a strategy profile to agent i is the sum of the expected utilities at each of i's decision nodes.

A strategy profile is in Nash equilibrium if each agent's strategy in the profile is optimal given the others' strategies.

[MAID for the adversarial tiger problem: decision nodes Open or Listen_i and Open or Listen_j, chance nodes Growl_i, Growl_j, and Tiger loc, utility nodes R_i and R_j]
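After a MAID is transformed into normal form, the equilibrium condition above becomes a simple no-profitable-deviation check. The payoff matrix below is an illustrative tiger-style game invented for the sketch, not the talk's actual numbers.

```python
# Sketch: checking whether a pure-strategy profile is a Nash equilibrium
# in the normal form that a MAID can be transformed into. The payoff
# matrix is illustrative, not from the talk.

actions = ("listen", "open")
U = {  # U[(a_i, a_j)] = (payoff to i, payoff to j)
    ("listen", "listen"): (-1, -1),
    ("listen", "open"):   (5, -10),
    ("open", "listen"):   (-10, 5),
    ("open", "open"):     (-8, -8),
}

def is_nash(ai, aj):
    """A profile is Nash iff no agent gains by unilaterally deviating."""
    best_i = all(U[(ai, aj)][0] >= U[(d, aj)][0] for d in actions)
    best_j = all(U[(ai, aj)][1] >= U[(ai, d)][1] for d in actions)
    return best_i and best_j
```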

Page 11: Influence Diagrams for Robust Decision Making in  Multiagent  Settings

Strategic relevance

Consider two strategy profiles that differ only in the decision rule at D'. A decision node D strategically relies on another, D', if D's decision rule does not remain optimal in both profiles.

Page 12: Influence Diagrams for Robust Decision Making in  Multiagent  Settings

Is there a way of finding all decision nodes that are strategically relevant to D using the graphical structure?

Yes: s-reachability, analogous to d-separation for determining conditional independence in BNs.

Page 13: Influence Diagrams for Robust Decision Making in  Multiagent  Settings

Evaluating whether a decision rule at D is optimal in a given strategy profile involves removing the decision nodes that are not s-relevant to D and transforming the remaining decision and utility nodes into chance nodes.

[MAID for the adversarial tiger problem: decision nodes Open or Listen_i and Open or Listen_j, chance nodes Growl_i, Growl_j, and Tiger loc, utility nodes R_i and R_j]

Page 14: Influence Diagrams for Robust Decision Making in  Multiagent  Settings

What if the agents are using differing models of the same game to make decisions, or are

uncertain about the mental models others are using?

Page 15: Influence Diagrams for Robust Decision Making in  Multiagent  Settings

Let agent i believe with probability p that j will listen, and with probability 1 − p that j will play the best response. Analogously, j believes that i will open a door with probability q, and otherwise play the best response.

[MAID for the adversarial tiger problem: decision nodes Open or Listen_i and Open or Listen_j, chance nodes Growl_i, Growl_j, and Tiger loc, utility nodes R_i and R_j]

Page 16: Influence Diagrams for Robust Decision Making in  Multiagent  Settings

Network of IDs (NID) (Gal & Pfeffer 2008)

Let agent i believe with probability p that j will likely listen, and with probability 1 − p that j will play the best response. Analogously, j believes that i will mostly open a door with probability q, and otherwise play the best response.

Block L ("likely listen"):  L = 0.9, OL = 0.05, OR = 0.05
Block O ("mostly open"):    L = 0.1, OL = 0.45, OR = 0.45

[Top-level block links to Block L with probability p and to Block O with probability q]

Page 17: Influence Diagrams for Robust Decision Making in  Multiagent  Settings

Let agent i believe with probability p that j will likely listen, and with probability 1 − p that j will play the best response. Analogously, j believes that i will mostly open a door with probability q, and otherwise play the best response.

[Top-level block of the NID is the tiger MAID: decision nodes Open or Listen_i and Open or Listen_j, chance nodes Growl_i, Growl_j, and Tiger loc, utility nodes R_i and R_j]

Top-level block -- MAID

Page 18: Influence Diagrams for Robust Decision Making in  Multiagent  Settings

MAID representation for the NID

[MAID nodes: BR[i]^TL, BR[j]^TL, Tiger loc^TL, Growl^TL_i, Growl^TL_j, R^TL_i, R^TL_j, Open or Listen^TL_i, Open or Listen^TL_j; mode node Mod[j; D_i] selects between Open^O and Open or Listen^TL_j, and Mod[i; D_j] selects between Listen^L and Open or Listen^TL_i]

Page 19: Influence Diagrams for Robust Decision Making in  Multiagent  Settings

MAIDs and NIDs: rich languages for games based on IDs that model problem structure by exploiting conditional independence.

Page 20: Influence Diagrams for Robust Decision Making in  Multiagent  Settings

MAIDs and NIDs: the focus is on computing equilibria, which does not allow a best response to a distribution of non-equilibrium behaviors.

They also do not model dynamic games.

Page 21: Influence Diagrams for Robust Decision Making in  Multiagent  Settings

Generalize IDs to dynamic interactions in multiagent settings

Page 22: Influence Diagrams for Robust Decision Making in  Multiagent  Settings

Challenge: Other agents could be updating beliefs and changing strategies

Page 23: Influence Diagrams for Robust Decision Making in  Multiagent  Settings

Model node M_{j,l-1}: the models of agent j at level l − 1

Policy link (dashed arrow): the distribution over the other agent's actions given its models

Belief over M_{j,l-1}: Pr(M_{j,l-1} | s)

[Level l I-ID: decision node Open or Listen_i, utility node R_i, chance nodes Growl_i and Tiger loc_i, and model node M_{j,l-1} with a policy link to the chance node Open or Listen_j]

Level l I-ID

Page 24: Influence Diagrams for Robust Decision Making in  Multiagent  Settings

Members of the model node: the different chance nodes are solutions of the models m_{j,l-1}. Mod[M_j] represents the different models of agent j.

[Model node M_{j,l-1}: Mod[M_j] selects among action nodes A_j^1 and A_j^2, the solutions of models m_{j,l-1}^1 and m_{j,l-1}^2; S is a parent]

m_{j,l-1}^1 and m_{j,l-1}^2 could be I-IDs, IDs, or simple distributions.

Page 25: Influence Diagrams for Robust Decision Making in  Multiagent  Settings

The CPT of the chance node A_j is a multiplexer: it assumes the distribution of each of the action nodes (A_j^1, A_j^2) depending on the value of Mod[M_j].

[Model node M_{j,l-1} with Mod[M_j], action nodes A_j^1 and A_j^2 for models m_{j,l-1}^1 and m_{j,l-1}^2, multiplexer node A_j, and state S]
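The multiplexer behavior can be sketched in a few lines: conditioned on Mod[M_j], the node A_j simply copies the selected model's action distribution, and marginalizing out Mod[M_j] mixes the solutions. The distributions below are illustrative placeholders.

```python
# Sketch of the multiplexer CPT for chance node A_j: given the value of
# Mod[M_j], A_j copies the action distribution of the corresponding
# model's solution node. The distributions are illustrative.

solutions = {            # action distribution predicted by each model of j
    "m1": {"listen": 0.9, "open-left": 0.05, "open-right": 0.05},
    "m2": {"listen": 0.1, "open-left": 0.45, "open-right": 0.45},
}

def aj_cpt(mod_value, action):
    """P(A_j = action | Mod[M_j] = mod_value): a pure pass-through."""
    return solutions[mod_value][action]

def marginal_aj(belief_over_models, action):
    """Marginalize out Mod[M_j] under the subject agent's belief."""
    return sum(p * aj_cpt(m, action) for m, p in belief_over_models.items())
```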

Page 26: Influence Diagrams for Robust Decision Making in  Multiagent  Settings

Could I-IDs be extended over time?

We must address the challenge

Page 27: Influence Diagrams for Robust Decision Making in  Multiagent  Settings

[Two time slices of the I-ID: at time t, nodes A_i^t, O_i^t, S^t, A_j^t, M_{j,l-1}^t, R_i; at time t+1, their counterparts A_i^{t+1}, O_i^{t+1}, S^{t+1}, A_j^{t+1}, M_{j,l-1}^{t+1}, R_i; the model update link connects M_{j,l-1}^t to M_{j,l-1}^{t+1}]

Model update link

Page 28: Influence Diagrams for Robust Decision Making in  Multiagent  Settings

Interactive dynamic influence diagram (I-DID)

Page 29: Influence Diagrams for Robust Decision Making in  Multiagent  Settings

How do we implement the model update link?

Page 30: Influence Diagrams for Robust Decision Making in  Multiagent  Settings

[Model update: at time t, model node M_{j,l-1}^t contains models m_{j,l-1}^{t,1} and m_{j,l-1}^{t,2}, with Mod[M_j^t] selecting among their action nodes A_j^1 and A_j^2, which feed A_j^t; observation nodes O_j^1 and O_j^2 feed O_j; the updated node M_{j,l-1}^{t+1} contains four models m_{j,l-1}^{t+1,1} through m_{j,l-1}^{t+1,4}, with action nodes A_j^1 through A_j^4 feeding A_j^{t+1}]

Page 31: Influence Diagrams for Robust Decision Making in  Multiagent  Settings


These models differ in their initial beliefs, each of which is the result of j updating its beliefs due to its actions and possible observations
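The update step can be sketched as expanding every model by every (action, observation) pair and Bayes-updating its belief. The transition and observation tables passed in below are hypothetical placeholders; a real I-DID would use the problem's dynamics.

```python
# Sketch of the model update link: each model of j at time t spawns one
# updated model per (action, observation) pair, with its new belief
# obtained by a Bayes update. T and O are caller-supplied dictionaries.

def bayes_update(belief, action, obs, T, O):
    """b'(s') proportional to O(s', action, obs) * sum_s T(s, action, s') * b(s)."""
    new = {s2: O[(s2, action, obs)] *
               sum(T[(s, action, s2)] * p for s, p in belief.items())
           for s2 in {s2 for (_, _, s2) in T}}
    z = sum(new.values())
    return {s: p / z for s, p in new.items()} if z > 0 else None

def update_models(models, actions, observations, T, O):
    """Expand every model by every action/observation it may experience."""
    out = []
    for belief in models:
        for a in actions:
            for o in observations:
                b2 = bayes_update(belief, a, o, T, O)
                if b2 is not None:        # skip impossible observations
                    out.append(b2)
    return out
```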

Page 32: Influence Diagrams for Robust Decision Making in  Multiagent  Settings

Recap

Page 33: Influence Diagrams for Robust Decision Making in  Multiagent  Settings

Prashant Doshi, Yifeng Zeng and Qiongyu Chen, “Graphical Models for Interactive POMDPs: Representations and Solutions”, Journal of AAMAS, 18(3):376-416, 2009

Daphne Koller and Brian Milch, “Multi-Agent Influence Diagrams for Representing and Solving Games”, Games and Economic Behavior, 45(1):181-221, 2003

Ya'akov Gal and Avi Pfeffer, "Networks of Influence Diagrams: A Formalism for Representing Agents' Beliefs and Decision-Making Processes", Journal of AI Research, 33:109-147, 2008

Page 34: Influence Diagrams for Robust Decision Making in  Multiagent  Settings

How large is the behavioral model space?

Page 35: Influence Diagrams for Robust Decision Making in  Multiagent  Settings

How large is the behavioral model space?

General definitionA mapping from the agent’s history of

observations to its actions

Page 36: Influence Diagrams for Robust Decision Making in  Multiagent  Settings

How large is the behavioral model space?

The set of mappings H → Δ(A_j): uncountably infinite

Page 37: Influence Diagrams for Robust Decision Making in  Multiagent  Settings

How large is the behavioral model space?

Let's assume computable models: the space becomes countable.

But a very large portion of the full model space is not computable!

Page 38: Influence Diagrams for Robust Decision Making in  Multiagent  Settings

Daniel Dennett, philosopher and cognitive scientist

Intentional stance: ascribe beliefs, preferences, and intent to explain others' actions (analogous to theory of mind, ToM)

Page 39: Influence Diagrams for Robust Decision Making in  Multiagent  Settings

Organize the mental models

Intentional models

Subintentional models

Page 40: Influence Diagrams for Robust Decision Making in  Multiagent  Settings

Organize the mental models

Intentional models, e.g., POMDP = ⟨b_j, A_j, T_j, Ω_j, O_j, R_j, OC_j⟩ (using DIDs); BDI, ToM

Subintentional models

Frame (may give rise to recursive modeling)

Page 41: Influence Diagrams for Robust Decision Making in  Multiagent  Settings

Organize the mental models

Intentional models, e.g., POMDP = ⟨b_j, A_j, T_j, Ω_j, O_j, R_j, OC_j⟩ (using DIDs); BDI, ToM

Subintentional models, e.g., Δ(A_j), finite state controller, plan

Frame

Page 42: Influence Diagrams for Robust Decision Making in  Multiagent  Settings

Finite model space grows as the interaction progresses

Page 43: Influence Diagrams for Robust Decision Making in  Multiagent  Settings

Growth in the model space

The other agent may receive any one of |Ω_j| observations:

|M_j^0| → |M_j^0||Ω_j| → |M_j^0||Ω_j|^2 → ... → |M_j^0||Ω_j|^t   (at times 0, 1, 2, ..., t)
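The growth pattern above is plain arithmetic: each surviving model branches once per possible observation at every step. A one-line sketch, with illustrative counts:

```python
# Candidate model count after t steps: each of the initial models branches
# on every one of the other agent's observations at every step.

def model_space_size(num_models, num_observations, t):
    return num_models * num_observations ** t

# e.g. 2 initial models and 2 observations (growl left / growl right):
size = model_space_size(2, 2, 10)   # exponential in the horizon
```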

Page 44: Influence Diagrams for Robust Decision Making in  Multiagent  Settings

Growth in the model space

Exponential

Page 45: Influence Diagrams for Robust Decision Making in  Multiagent  Settings

General model space is large and grows exponentially as the interaction progresses

Page 46: Influence Diagrams for Robust Decision Making in  Multiagent  Settings

It would be great if we could compress this space!

Lossless: no loss in value to the modeler
Lossy: flexible loss in value for greater compression

Page 47: Influence Diagrams for Robust Decision Making in  Multiagent  Settings

Model space compression is expansively useful in many areas:

1. Sequential decision making in multiagent settings using I-DIDs
2. Bayesian plan recognition
3. Games of imperfect information

Page 48: Influence Diagrams for Robust Decision Making in  Multiagent  Settings

General and domain-independent approach for compression

Establish equivalence relations that partition the model space and retain representative models from each equivalence class

Page 49: Influence Diagrams for Robust Decision Making in  Multiagent  Settings

Approach #1: Behavioral equivalence (Rathanasabapathy et al. 2006; Pynadath & Marsella 2007)

Intentional models whose complete solutions are identical are considered equivalent
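Behavioral-equivalence compression amounts to grouping models by their solutions and keeping one representative per group. A minimal sketch, where the threshold policy used as `solve` is a hypothetical stand-in for a full model solution:

```python
# Sketch of behavioral-equivalence compression: group intentional models
# by their complete solution (policy) and keep one representative per
# equivalence class. Policies here are hashable stand-ins for policy trees.

def compress_be(models, solve):
    """models: iterable; solve(m) -> hashable complete solution of m."""
    classes = {}
    for m in models:
        classes.setdefault(solve(m), []).append(m)
    # one representative per behavioral equivalence class
    return [ms[0] for ms in classes.values()]
```

For instance, belief points that induce the same optimal policy collapse into one class, yielding the behaviorally minimal set.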

Page 50: Influence Diagrams for Robust Decision Making in  Multiagent  Settings

Approach #1: Behavioral equivalence

Behaviorally minimal set of models

Page 51: Influence Diagrams for Robust Decision Making in  Multiagent  Settings

Lossless

Works when intentional models have differing frames

Approach #1: Behavioral equivalence

Page 52: Influence Diagrams for Robust Decision Making in  Multiagent  Settings

Approach #1: Behavioral equivalence

Impact on I-DIDs in multiagent settings

[Results charts: multiagent tiger and multiagent MM]

Page 53: Influence Diagrams for Robust Decision Making in  Multiagent  Settings

Approach #1: Behavioral equivalence

Utilize model solutions (policy trees) to mitigate model growth: model representatives that are not BE may become BE from the next step onwards. Preemptively identify such models and do not update all of them.

Page 54: Influence Diagrams for Robust Decision Making in  Multiagent  Settings

Thank you for your time

Page 55: Influence Diagrams for Robust Decision Making in  Multiagent  Settings

Approach #2: Revisit BE (Zeng et al. 2011, 2012)

Intentional models whose partial depth-d solutions are identical, and whose vectors of updated beliefs at the leaves of the partial trees are identical, are considered equivalent.

Sufficient but not necessary

Lossless if frames are identical

Page 56: Influence Diagrams for Robust Decision Making in  Multiagent  Settings

Approach #2: (ε, d)-Behavioral equivalence

Two models are (ε, d)-BE if their partial depth-d solutions are identical and the vectors of updated beliefs at the leaves of the partial trees differ by at most ε.

Example: the models are (0.33, 1)-BE

Lossy
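The (ε, d)-BE test can be sketched directly from the definition. The L1 distance between leaf beliefs is an assumption of this sketch; the talk does not fix the metric.

```python
# Sketch of the (eps, d)-behavioral-equivalence test: two models are
# grouped if their depth-d partial policies match and the belief vectors
# at the leaves of the partial trees differ by at most eps (L1 distance
# assumed here).

def eps_d_be(sol1, sol2, eps):
    """sol*: (partial_policy, leaf_beliefs), where leaf_beliefs is a list
    of belief vectors, one per leaf of the depth-d policy tree."""
    policy1, leaves1 = sol1
    policy2, leaves2 = sol2
    if policy1 != policy2:
        return False
    return all(sum(abs(a - b) for a, b in zip(b1, b2)) <= eps
               for b1, b2 in zip(leaves1, leaves2))
```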

Page 57: Influence Diagrams for Robust Decision Making in  Multiagent  Settings

Approach #2: ε-Behavioral equivalence

Lemma (Boyen&Koller98): KL divergence between two distributions in a discrete Markov stochastic process reduces or remains the same after a transition, with the mixing rate acting as a discount factor

Mixing rate represents the minimal amount by which the posterior distributions agree with each other after one transition

Property of a problem and may be pre-computed

Page 58: Influence Diagrams for Robust Decision Making in  Multiagent  Settings

Given the mixing rate and a bound, ε, on the divergence between two belief vectors, the lemma allows computing the depth, d, at which the bound is reached.

Approach #2: ε-Behavioral equivalence

Compare two solutions up to depth d for equality.
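One way to derive d is to assume, per the Boyen-Koller lemma, that the divergence contracts by a factor of (1 − F) per transition, where F is the pre-computed mixing rate. Both the contraction form and the initial divergence bound `kl0` are assumptions of this sketch.

```python
import math

# Sketch: smallest depth d at which an assumed geometric contraction
# kl0 * (1 - F)**d of the divergence falls below the bound eps.

def comparison_depth(F, eps, kl0):
    """F: mixing rate in [0, 1]; eps: divergence bound; kl0: initial bound."""
    if F >= 1.0:          # posteriors agree after one step
        return 1
    if F <= 0.0:          # no contraction: caller falls back to the horizon
        return None
    d = math.ceil(math.log(eps / kl0) / math.log(1.0 - F))
    return max(d, 1)
```

The edge cases mirror the slide that follows: F = 1 yields d = 1, and F = 0 leaves d unbounded, so it is set to the horizon.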

Page 59: Influence Diagrams for Robust Decision Making in  Multiagent  Settings

Approach #2: ε-Behavioral equivalence

Impact on dt-planning in multiagent settings

[Results chart: multiagent concert problem, discount factor F = 0.5]

On a UAV reconnaissance problem in a 5x5 grid, the method allows the solution to scale to a 10-step look-ahead in 20 minutes.

Page 60: Influence Diagrams for Robust Decision Making in  Multiagent  Settings

Approach #2: ε-Behavioral equivalence

What is the value of d when a problem exhibits F = 0 or F = 1?

F = 1 implies that the KL divergence is 0 after one step: set d = 1.

F = 0 implies that the KL divergence does not reduce: arbitrarily set d to the horizon.

Page 61: Influence Diagrams for Robust Decision Making in  Multiagent  Settings

Approach #3: Action equivalence (Zeng et al. 2009, 2012)

Intentional or subintentional models whose predictions (action distributions) at time step t are identical are considered equivalent at t.
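Action equivalence is again a grouping operation, this time keyed on the predicted action distribution at the current step, so the number of classes is bounded by the number of distinct distributions. A minimal sketch with hypothetical model names:

```python
# Sketch of action-equivalence compression at step t: models whose
# predicted action distributions at t coincide fall into one class, and
# one representative is kept per class.

def compress_ae(models, predict):
    """predict(m) -> action distribution of m at time t, as a dict."""
    classes = {}
    for m in models:
        key = tuple(sorted(predict(m).items()))   # hashable distribution
        classes.setdefault(key, []).append(m)
    return {key: ms[0] for key, ms in classes.items()}  # representatives
```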

Page 62: Influence Diagrams for Robust Decision Making in  Multiagent  Settings

Approach #3: Action equivalence

Page 63: Influence Diagrams for Robust Decision Making in  Multiagent  Settings

Approach #3: Action equivalence

Lossy

Works when intentional models have differing frames

Page 64: Influence Diagrams for Robust Decision Making in  Multiagent  Settings

Approach #3: Action equivalence

Impact on dt-planning in multiagent settings

[Results chart: multiagent tiger]

AE bounds the model space at each time step to the number of distinct actions.

Page 65: Influence Diagrams for Robust Decision Making in  Multiagent  Settings

Approach #4: Influence equivalence (related to Witwicki & Durfee 2011)

Intentional or subintentional models whose predictions at time step t influence the subject agent's plan identically are considered equivalent at t.

Example: regardless of whether the other agent opened the left or the right door, the tiger resets, thereby affecting the agent's plan identically.

Page 66: Influence Diagrams for Robust Decision Making in  Multiagent  Settings

Influence may be measured as the change in the subject agent’s belief due to the action

Approach #4: Influence equivalence

Group more models at time step t compared to AE

Lossy

Page 67: Influence Diagrams for Robust Decision Making in  Multiagent  Settings

Compression due to approximate equivalence may violate the absolute continuity condition (ACC).

Regain ACC by appending a covering model to the compressed set of representatives.

Page 68: Influence Diagrams for Robust Decision Making in  Multiagent  Settings

Open questions

Page 69: Influence Diagrams for Robust Decision Making in  Multiagent  Settings

N > 2 agents

Under what conditions could equivalent models belonging to different agents be grouped together into an equivalence class?

Page 70: Influence Diagrams for Robust Decision Making in  Multiagent  Settings

Can we avoid solving models by using heuristics for identifying approximately equivalent models?

Page 71: Influence Diagrams for Robust Decision Making in  Multiagent  Settings

Modeling Strategic Human Intent

Page 72: Influence Diagrams for Robust Decision Making in  Multiagent  Settings

Yifeng Zeng, Reader, Teesside Univ. (previously Assoc. Prof., Aalborg Univ.)

Yingke Chen, doctoral student

Hua Mao, doctoral student

Muthu Chandrasekaran, doctoral student

Xia Qu, doctoral student

Roi Ceren, doctoral student

Matthew Meisel, doctoral student

Adam Goodie, Professor of Psychology, UGA

Page 73: Influence Diagrams for Robust Decision Making in  Multiagent  Settings

Computational modeling of human recursive thinking in sequential games

Computational modeling of probability judgment in stochastic games

Page 74: Influence Diagrams for Robust Decision Making in  Multiagent  Settings

Human strategic reasoning is generally hobbled by low levels of recursive thinking

(Stahl & Wilson 1995; Hedden & Zhang 2002; Camerer et al. 2004; Ficici & Pfeffer 2008)

(I think what you think that I think...)

Page 75: Influence Diagrams for Robust Decision Making in  Multiagent  Settings

You are Player I and II is human. Will you move or stay?

[Four-stage sequential game; the player to move alternates I, II, I, II, and the mover chooses Move or Stay at each stage. Payoffs (for I, for II) at the four terminal positions: (3, 1), (1, 3), (2, 4), (4, 2)]

Page 76: Influence Diagrams for Robust Decision Making in  Multiagent  Settings

Less than 40% of the sample population performed the rational action!

Page 77: Influence Diagrams for Robust Decision Making in  Multiagent  Settings

Thinking about how others think (...) is hard in general contexts

Page 78: Influence Diagrams for Robust Decision Making in  Multiagent  Settings

[Four-stage sequential game; the player to move alternates I, II, I, II, and the mover chooses Move or Stay at each stage. Payoff for I at the four terminal positions: 0.6, 0.4, 0.2, 0.8; payoff for II is 1 − that decimal]

Page 79: Influence Diagrams for Robust Decision Making in  Multiagent  Settings

About 70% of the sample population performed the rational action in this simpler and strictly competitive game

Page 80: Influence Diagrams for Robust Decision Making in  Multiagent  Settings

Simplicity, competitiveness and embedding the task in intuitive representations seem to facilitate

human reasoning (Flobbe et al.08, Meijering et al.11, Goodie et al.12)

Page 81: Influence Diagrams for Robust Decision Making in  Multiagent  Settings

3-stage game

Myopic opponents default to staying (level 0) while predictive opponents think about the player’s

decision (level 1)

Page 82: Influence Diagrams for Robust Decision Making in  Multiagent  Settings

Can we computationally model these strategic behaviors using process models?

Page 83: Influence Diagrams for Robust Decision Making in  Multiagent  Settings

Yes! Using a parameterized Interactive POMDP framework

Page 84: Influence Diagrams for Robust Decision Making in  Multiagent  Settings

Replace I-POMDP's normative Bayesian belief update with Bayesian learning that underweights evidence, parameterized by a learning parameter.

Notice that the achievement score increases as more games are played, indicating learning of the opponent models. Learning is slow and partial.
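Underweighting of evidence is often modeled by raising the likelihood to a power in (0, 1] inside the Bayes update; the exponent form below is one common choice and an assumption of this sketch, not necessarily the talk's exact parameterization.

```python
# Sketch of an evidence-underweighting Bayes update: gamma = 1 recovers
# the normative update, and smaller gamma learns more slowly. The
# hypotheses and numbers in the test are illustrative.

def underweighted_update(prior, likelihood, gamma):
    post = {h: prior[h] * likelihood[h] ** gamma for h in prior}
    z = sum(post.values())
    return {h: p / z for h, p in post.items()}
```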

Page 85: Influence Diagrams for Robust Decision Making in  Multiagent  Settings

Replace I-POMDP's normative expected utility maximization with a quantal response model that selects actions proportionally to their utilities, parameterized by a precision parameter.

Notice the presence of rationality errors in the participants' choices (the action is inconsistent with the prediction). Errors appear to reduce with time.

Page 86: Influence Diagrams for Robust Decision Making in  Multiagent  Settings

Underweighting evidence during learning and quantal response for

choice have prior psychological support

Page 87: Influence Diagrams for Robust Decision Making in  Multiagent  Settings

Use participants' predictions of the other's action to learn the learning parameter, and participants' actions to learn the choice parameter.

Page 88: Influence Diagrams for Robust Decision Making in  Multiagent  Settings

Use participants' actions to learn both parameters, and let the choice parameter vary linearly.

Page 89: Influence Diagrams for Robust Decision Making in  Multiagent  Settings

Insights revealed by process modeling:

1. Much evidence that participants did not make rote use of backward induction (BI), and instead engaged in recursive thinking.
2. Rationality errors cannot be ignored when modeling human decision making, and they may vary.
3. Evidence that participants could be attributing surprising observations of others' actions to their rationality errors.

Page 90: Influence Diagrams for Robust Decision Making in  Multiagent  Settings

Open questions:

1. What is the impact on strategic thinking if action outcomes are uncertain?
2. Is there a damping effect on reasoning levels if participants need to concomitantly think ahead in time?

Page 91: Influence Diagrams for Robust Decision Making in  Multiagent  Settings

A suite of general and domain-independent approaches for compressing agent model spaces based on equivalence

Computational modeling of human behavioral data pertaining to strategic thinking

Page 92: Influence Diagrams for Robust Decision Making in  Multiagent  Settings

2. Bayesian plan recognition under uncertainty

The plan recognition literature has paid scant attention to finding general ways of reducing the set of feasible plans (Carberry 2001).

Page 93: Influence Diagrams for Robust Decision Making in  Multiagent  Settings

3. Games of imperfect information (Bayesian games)

Real-world applications often involve many player types. Examples:
• Ad hoc coordination in a spontaneous team
• Automated poker-playing agent

Page 94: Influence Diagrams for Robust Decision Making in  Multiagent  Settings

3. Games of imperfect information (Bayesian games)

Real-world applications often involve many player types

Model space compression facilitates equilibrium computation