
Sequential Decision Making in Repeated Coalition Formation under Uncertainty

Georgios Chalkiadakis and Craig Boutilier
School of Electronics and Computer Science, University of Southampton, Southampton, United Kingdom
Department of Computer Science, University of Toronto, Toronto, Canada

Ongoing and Future Work
• We have also applied VPI in a disaster-management setting.
• We are investigating overlapping coalition formation models.
• We have recast RL algorithms and sequential decision-making ideas within a computational trust framework, beating the winner of the international ART competition. Paper in this AAMAS: W.T.L. Teacy, Georgios Chalkiadakis, A. Rogers and N.R. Jennings, “Sequential Decision Making with Untrustworthy Service Providers”.

[Figure: nine agents (p1, p2, p3, c1, c2, c3, e1, e2, e3) grouped into coalitions C0, C1, and C2.]

“I believe that some guys are better than my current partners… but is there any possible coalition that can guarantee me a higher payoff share?”

Beliefs are over types. Types reflect capabilities (private information).

Agents have to:
• decide who to join,
• decide how to act, and
• decide how to share the coalitional value / utility.

Coalition structure CS = {C0, C1, C2}

Coalition C0 = {p1, c2, e1}
[Figure: the agents partitioned into the coalition structure CS.]

Action vector: a = (a_C0, a_C1, a_C2)
Coalitional value: u(C0 | a_C0) = 30
Allocation: <p1 = 12, c2 = 3, e1 = 15>
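For concreteness, a toy rendering of this running example's data in Python (our illustration, with a hypothetical action label; not code from the poster or paper):

    # Coalition C0, its chosen coalitional action, its realized value,
    # and the agreed allocation of that value among C0's members.
    coalition_C0 = {"p1", "c2", "e1"}
    action_C0 = "task_A"                           # hypothetical label for a_C0
    value_C0 = 30                                  # u(C0 | a_C0) = 30
    allocation_C0 = {"p1": 12, "c2": 3, "e1": 15}  # payoff shares

    # The shares are distributed among exactly C0's members and exhaust its value.
    assert set(allocation_C0) == coalition_C0
    assert sum(allocation_C0.values()) == value_C0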

Coalition Formation Reasoning under Type Uncertainty

Type Uncertainty: It Matters!

Coalition structure CS = {C0, C1, C2}
+ Action-related uncertainty
+ Action outcomes are stochastic
+ No superadditivity assumptions

Agents have their own beliefs about the types (capabilities) of others. Type uncertainty thus translates into value uncertainty: according to agent i, what is the value (quality) of <C, a>?

A Bayesian Coalition Formation Model
• N agents; each agent i has a type t_i ∈ T_i.
• Set of type profiles: T = T_1 × … × T_N.
• For any coalition C of agents, agent i has beliefs about the types of C's members.
• Coalitional actions (i.e., choices of task) for C: A_C.
• An action's outcome s ∈ S depends on the actual members' types, and occurs with probability Pr(s | a_C, t_C).
• Each outcome s results in some reward R(s).
• Each agent i thus has a (possibly different) estimate of the value of any coalition C.
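Written out, that estimate presumably takes a form along the following lines (a sketch in our notation, based on the definitions above rather than the paper's exact equation; μ_i(t_C) denotes i's belief that C's members have joint type vector t_C):

    V_i(C) = \max_{a_C \in A_C} \sum_{t_C \in T_C} \mu_i(t_C) \sum_{s \in S} \Pr(s \mid a_C, t_C) \, R(s)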

Example experiment: The Good, the Bad, and the Ugly
[Plots: discounted accumulated rewards; total actual rewards gathered during the “Big Crime” phase.]

Optimal Repeated Coalition Formation
Belief-state MDP formulation to address the induced exploration-exploitation problem; i.e., the equations account for the sequential value of coalitional agreements (sketched below).
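A sketch of those belief-state equations in their standard form (our notation; the paper's exact formulation may differ). Here b is the agent's belief state, b^s its Bayesian update after observing outcome s, and γ the discount factor:

    Q(\langle C, a_C \rangle, b) = \sum_{s \in S} \Pr(s \mid a_C, b) \big[ R(s) + \gamma \, V(b^{s}) \big],
    \qquad
    V(b) = \max_{\langle C, a_C \rangle} Q(\langle C, a_C \rangle, b)

where \Pr(s \mid a_C, b) = \sum_{t_C} b(t_C) \Pr(s \mid a_C, t_C).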

Approximation Algorithms
• One-step lookahead (OSLA): performs a one-step lookahead in belief space.
• VPI exploration: estimates the value of perfect information (VPI) regarding coalitional agreements.
• VPI-over-OSLA: combines VPI with OSLA.
• Maximum a posteriori (MAP): uses the most likely type vector given the beliefs.
• Myopic: calculates the expectation while disregarding the sequential value of formation decisions.

The full formulation takes into account both the immediate reward from forming a coalition and executing an action, and the long-term impact of a coalitional agreement (i.e., the value of information: belief-state updating, and the incorporation of the belief-state value into the calculations).
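For concreteness, the Myopic and MAP estimates listed above are presumably along these lines (a sketch in our notation, not taken from the poster):

    Q_{\text{myopic}}(\langle C, a_C \rangle, b) = \sum_{t_C} b(t_C) \sum_{s \in S} \Pr(s \mid a_C, t_C) \, R(s)
    \qquad
    Q_{\text{MAP}}(\langle C, a_C \rangle, b) = \sum_{s \in S} \Pr(s \mid a_C, t_C^{*}) \, R(s), \quad t_C^{*} = \arg\max_{t_C} b(t_C)

i.e., the myopic estimate averages over types but ignores future belief updates, while MAP simply plugs in the most likely type vector.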


VPI is a winner!

It balances the expected gain against the expected cost of executing a suboptimal action:
• Use the current model to myopically evaluate each action's expected utility (EU).
• Assume an action yields perfect information regarding its Q-value.
• This perfect information has non-zero value only if it leads to a change of policy.
• The EVPI is calculated and accounted for in action selection (act greedily with respect to EU + EVPI); see the sketch below.

The approach is (a) Bayesian and yet (b) efficient: it uses a myopic evaluation of actions, but boosts their desirability with EVPI estimates.
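A minimal Python sketch of this selection rule (our illustration, with hypothetical names and a toy discrete distribution over each agreement's Q-value; it is not the authors' implementation):

    # q_samples maps each candidate coalitional agreement to a list of
    # (probability, q_value) pairs describing the agent's current uncertainty
    # about that agreement's Q-value.

    def expected_utility(q_dist):
        """Myopic expected utility of an agreement under current beliefs."""
        return sum(p * q for p, q in q_dist)

    def gain(agreement, q, best, second_best, eu):
        """Value of learning that the agreement's true Q-value is q.
        Non-zero only if this knowledge would change the greedy policy."""
        if agreement == best:
            # Learning the presumed-best agreement is worse than the runner-up
            # changes the policy.
            return max(eu[second_best] - q, 0.0)
        # Learning a non-best agreement beats the presumed best changes the policy.
        return max(q - eu[best], 0.0)

    def vpi_select(q_samples):
        eu = {a: expected_utility(dist) for a, dist in q_samples.items()}
        ranked = sorted(eu, key=eu.get, reverse=True)
        best, second_best = ranked[0], ranked[1]
        evpi = {
            a: sum(p * gain(a, q, best, second_best, eu) for p, q in dist)
            for a, dist in q_samples.items()
        }
        # Act greedily with respect to EU + EVPI.
        return max(eu, key=lambda a: eu[a] + evpi[a])

    # Hypothetical usage: three candidate coalitional agreements.
    agreements = {
        "join_C0": [(0.5, 20.0), (0.5, 40.0)],
        "join_C1": [(0.8, 25.0), (0.2, 35.0)],
        "stay":    [(1.0, 22.0)],
    }
    print(vpi_select(agreements))  # -> "join_C0"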

Consistently outperforms the other approximation algorithms, and scales to dozens or hundreds of agents, unlike the lookahead approaches.