
Sequential Decision Making in Repeated Coalition Formation under Uncertainty

Georgios Chalkiadakis and Craig Boutilier
School of Electronics and Computer Science, University of Southampton, Southampton, United Kingdom
Department of Computer Science, University of Toronto, Toronto, Canada

Ongoing and Future Work
• We have also applied VPI in a disaster-management setting.
• We are investigating overlapping coalition formation models.
• We have recast RL algorithms and sequential decision-making ideas within a computational trust framework, beating the winner of the international ART competition. Paper in this AAMAS: W.T.L. Teacy, Georgios Chalkiadakis, A. Rogers and N.R. Jennings, “Sequential Decision Making with Untrustworthy Service Providers”.

[Figure: nine agents (p1, p2, p3, c1, c2, c3, e1, e2, e3) grouped into coalitions C0, C1, and C2.]

“I believe that some guys are better than my current partners… but is there any possible coalition that can guarantee me a higher payoff share?”

Beliefs are over types. Types reflect capabilities (private information).

Agents have to:
• decide who to join,
• decide how to act, and
• decide how to share the coalitional value / utility.

Coalition structure CS = {C0, C1, C2}

Coalition C0 = {p1, c2, e1}
[Figure: the agents partitioned into the coalition structure CS.]

Action vector: a = (a_C0, a_C1, a_C2)
Coalitional value: u(C0 | a_C0) = 30
Allocation: <p1 = 12, c2 = 3, e1 = 15>
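For concreteness, a toy rendering of this running example's data in Python (our illustration, with a hypothetical action label; not code from the poster or paper):

    # Coalition C0, its chosen coalitional action, its realized value,
    # and the agreed allocation of that value among C0's members.
    coalition_C0 = {"p1", "c2", "e1"}
    action_C0 = "task_A"                           # hypothetical label for a_C0
    value_C0 = 30                                  # u(C0 | a_C0) = 30
    allocation_C0 = {"p1": 12, "c2": 3, "e1": 15}  # payoff shares

    # The shares are distributed among exactly C0's members and exhaust its value.
    assert set(allocation_C0) == coalition_C0
    assert sum(allocation_C0.values()) == value_C0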

Coalition Formation Reasoning under Type Uncertainty

Type Uncertainty: It Matters!

Coalition structure CS = {C0, C1, C2}
+ Action-related uncertainty
+ Action outcomes are stochastic
+ No superadditivity assumptions

Agents have their own beliefs about the types (capabilities) of others. Type uncertainty thus translates into value uncertainty: according to agent i, what is the value (quality) of <C, a>?

A Bayesian Coalition Formation Model
• N agents; each agent i has a type t_i ∈ T_i.
• Set of type profiles: T = T_1 × … × T_N.
• For any coalition C of agents, agent i has beliefs about the types of C's members.
• Coalitional actions (i.e., choices of task) for C: A_C.
• An action's outcome s ∈ S depends on the actual members' types, and occurs with probability Pr(s | a_C, t_C).
• Each outcome s results in some reward R(s).
• Each agent i thus has a (possibly different) estimate of the value of any coalition C.
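Written out, that estimate presumably takes a form along the following lines (a sketch in our notation, based on the definitions above rather than the paper's exact equation; μ_i(t_C) denotes i's belief that C's members have joint type vector t_C):

    V_i(C) = \max_{a_C \in A_C} \sum_{t_C \in T_C} \mu_i(t_C) \sum_{s \in S} \Pr(s \mid a_C, t_C) \, R(s)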

Example experiment: The Good, the Bad, and the Ugly
[Plots: discounted accumulated rewards; total actual rewards gathered during the “Big Crime” phase.]

Optimal Repeated Coalition Formation
Belief-state MDP formulation to address the induced exploration-exploitation problem; i.e., the equations account for the sequential value of coalitional agreements (sketched below).
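A sketch of those belief-state equations in their standard form (our notation; the paper's exact formulation may differ). Here b is the agent's belief state, b^s its Bayesian update after observing outcome s, and γ the discount factor:

    Q(\langle C, a_C \rangle, b) = \sum_{s \in S} \Pr(s \mid a_C, b) \big[ R(s) + \gamma \, V(b^{s}) \big],
    \qquad
    V(b) = \max_{\langle C, a_C \rangle} Q(\langle C, a_C \rangle, b)

where \Pr(s \mid a_C, b) = \sum_{t_C} b(t_C) \Pr(s \mid a_C, t_C).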

Approximation Algorithms
• One-step lookahead (OSLA): performs a one-step lookahead in belief space.
• VPI exploration: estimates the value of perfect information (VPI) regarding coalitional agreements.
• VPI-over-OSLA: combines VPI with OSLA.
• Maximum a posteriori (MAP): uses the most likely type vector given the beliefs.
• Myopic: calculates the expectation while disregarding the sequential value of formation decisions.

The full formulation takes into account both the immediate reward from forming a coalition and executing an action, and the long-term impact of a coalitional agreement (i.e., the value of information: belief-state updating, and the incorporation of the belief-state value into the calculations).
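For concreteness, the Myopic and MAP estimates listed above are presumably along these lines (a sketch in our notation, not taken from the poster):

    Q_{\text{myopic}}(\langle C, a_C \rangle, b) = \sum_{t_C} b(t_C) \sum_{s \in S} \Pr(s \mid a_C, t_C) \, R(s)
    \qquad
    Q_{\text{MAP}}(\langle C, a_C \rangle, b) = \sum_{s \in S} \Pr(s \mid a_C, t_C^{*}) \, R(s), \quad t_C^{*} = \arg\max_{t_C} b(t_C)

i.e., the myopic estimate averages over types but ignores future belief updates, while MAP simply plugs in the most likely type vector.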


VPI is a winner!

It balances the expected gain against the expected cost of executing a suboptimal action:
• Use the current model to myopically evaluate each action's expected utility (EU).
• Assume an action yields perfect information regarding its Q-value.
• This perfect information has non-zero value only if it leads to a change of policy.
• The EVPI is calculated and accounted for in action selection (act greedily with respect to EU + EVPI); see the sketch below.

The approach is (a) Bayesian and yet (b) efficient: it uses a myopic evaluation of actions, but boosts their desirability with EVPI estimates.
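A minimal Python sketch of this selection rule (our illustration, with hypothetical names and a toy discrete distribution over each agreement's Q-value; it is not the authors' implementation):

    # q_samples maps each candidate coalitional agreement to a list of
    # (probability, q_value) pairs describing the agent's current uncertainty
    # about that agreement's Q-value.

    def expected_utility(q_dist):
        """Myopic expected utility of an agreement under current beliefs."""
        return sum(p * q for p, q in q_dist)

    def gain(agreement, q, best, second_best, eu):
        """Value of learning that the agreement's true Q-value is q.
        Non-zero only if this knowledge would change the greedy policy."""
        if agreement == best:
            # Learning the presumed-best agreement is worse than the runner-up
            # changes the policy.
            return max(eu[second_best] - q, 0.0)
        # Learning a non-best agreement beats the presumed best changes the policy.
        return max(q - eu[best], 0.0)

    def vpi_select(q_samples):
        eu = {a: expected_utility(dist) for a, dist in q_samples.items()}
        ranked = sorted(eu, key=eu.get, reverse=True)
        best, second_best = ranked[0], ranked[1]
        evpi = {
            a: sum(p * gain(a, q, best, second_best, eu) for p, q in dist)
            for a, dist in q_samples.items()
        }
        # Act greedily with respect to EU + EVPI.
        return max(eu, key=lambda a: eu[a] + evpi[a])

    # Hypothetical usage: three candidate coalitional agreements.
    agreements = {
        "join_C0": [(0.5, 20.0), (0.5, 40.0)],
        "join_C1": [(0.8, 25.0), (0.2, 35.0)],
        "stay":    [(1.0, 22.0)],
    }
    print(vpi_select(agreements))  # -> "join_C0"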

Consistently outperforms the other approximation algorithms, and scales to dozens or hundreds of agents, unlike the lookahead approaches.