Sequential Decision Making in Repeated Coalition Formation under Uncertainty

Georgios Chalkiadakis
School of Electronics and Computer Science, University of Southampton, Southampton, United Kingdom

Craig Boutilier
Department of Computer Science, University of Toronto, Toronto, Canada
[Figure: a coalition structure with coalitions C0, C1, and C2 over agents p1, p2, p3, c1, c2, c3, e1, e2, e3. One agent wonders: "I believe that some guys are better than my current partners… but is there any possible coalition that can guarantee me a higher payoff share?"]
Beliefs are over types; types reflect capabilities (private information).
Agents have to:
• decide whom to join,
• decide how to act,
• decide how to share the coalitional value (utility).
[Figure: the nine agents partitioned into coalitions C0, C1, and C2.]
Coalition structure: CS = {C0, C1, C2}
Coalition C0 = {p1, c2, e1}
Action vector: a = <a_C0, a_C1, a_C2>
Coalitional value: u(C0 | a_C0) = 30
Allocation: <p1 = 12, c2 = 3, e1 = 15>
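To make the bookkeeping concrete, a minimal encoding of this example (the variable names are purely illustrative):

```python
# The example above: one coalition, its value, and the agreed payoff split.
C0 = {"p1", "c2", "e1"}                      # coalition C0
u_C0 = 30                                    # coalitional value u(C0 | a_C0)
allocation = {"p1": 12, "c2": 3, "e1": 15}   # payoff shares
assert set(allocation) == C0                 # everyone in C0 gets a share
assert sum(allocation.values()) == u_C0      # shares exhaust the value
```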
Coalition Formation Reasoning under Type Uncertainty
Type uncertainty: it matters! Consider a coalition structure CS = {C0, C1, C2}, now with action-related uncertainty, stochastic action outcomes, and no superadditivity assumptions.
Agents have their own beliefs about the types (capabilities) of others. Type uncertainty then translates into value uncertainty: according to agent i, what is the value (quality) of a coalition-action pair <C, a>?
A Bayesian Coalition Formation Model
N agents; each agent i has a type t_i ∈ T_i.
Set of type profiles: T = T_1 × … × T_N; t_C denotes the profile of the types of the members of a coalition C.
Agent i has beliefs B_i(t_C) about the types of the members of any coalition C of agents.
Coalitional actions (i.e., choices of task) for C: A_C.
An action's outcome s ∈ S (given the actual members' types) occurs with probability Pr(s | a_C, t_C).
Each s results in some reward R(s). Each i therefore has a (possibly different) estimate of the value of any coalition C:
V_i(C) = max_{a ∈ A_C} Σ_{t_C} B_i(t_C) Σ_s Pr(s | a, t_C) R(s)
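A minimal sketch of this value computation, assuming discrete type and outcome spaces and independent beliefs over members' types (names such as `coalition_value` are illustrative, not from the paper):

```python
from itertools import product

def coalition_value(members, actions, type_beliefs, outcome_prob, reward):
    """Agent i's estimated value of coalition C = `members`.

    type_beliefs[j] : dict mapping each possible type of agent j to i's belief
    outcome_prob    : function (action, type_profile) -> dict {outcome: prob}
    reward          : function outcome -> real-valued reward R(s)
    Returns the max over coalitional actions of expected reward under i's beliefs.
    """
    best = float("-inf")
    for a in actions:                                 # choice of coalitional task
        exp_reward = 0.0
        # Sum over all joint type profiles t_C of the members of C.
        type_spaces = [type_beliefs[j].items() for j in members]
        for profile in product(*type_spaces):
            types = tuple(t for t, _ in profile)
            belief = 1.0
            for _, p in profile:                      # B_i(t_C): product of beliefs
                belief *= p
            for s, p_s in outcome_prob(a, types).items():
                exp_reward += belief * p_s * reward(s)
        best = max(best, exp_reward)
    return best
```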
Optimal Repeated Coalition Formation
Agents seek to maximize their discounted accumulated rewards, E[Σ_t γ^t r_t].
A belief-state MDP formulation addresses the induced exploration-exploitation problem; that is, the value equations account for the sequential value of coalitional agreements.
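The backup underlying this formulation has the standard Bayesian RL form over belief states; a sketch of it, with notation assumed rather than copied from the slides (b is a belief state, b_a^s the posterior after observing outcome s):

```latex
% Sketch of a belief-state Bellman backup for repeated coalition formation:
% the value of belief state b is the best coalitional agreement <C, a>,
% counting the immediate expected reward and the value of the updated beliefs.
V(b) = \max_{\langle C, a \rangle} \sum_{t_C} b(t_C) \sum_{s \in S}
       \Pr(s \mid a, t_C) \big[ R(s) + \gamma \, V(b_a^s) \big]
```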
Approximation Algorithms
• One-step lookahead (OSLA): performs a one-step lookahead in belief space.
• VPI exploration: estimates the Value of Perfect Information regarding coalitional agreements.
• VPI-over-OSLA: combines VPI with OSLA.
• Maximum a Posteriori (MAP): uses the most likely type vector given the beliefs.
• Myopic: calculates expectations disregarding the sequential value of formation decisions.
A myopic evaluation takes into account only the immediate reward from forming a coalition and executing an action. A lookahead evaluation also takes into account the long-term impact of a coalitional agreement, i.e., the value of information, realized through belief-state updating and the incorporation of the belief-state value into the calculations; see the sketch below.
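A minimal sketch of the two evaluation modes, assuming generic outcome, update, and value routines (all names here are illustrative, not from the paper):

```python
def myopic_value(belief, agreement, expected_reward):
    """Immediate expected reward of a coalitional agreement <C, a> only."""
    return expected_reward(belief, agreement)

def one_step_lookahead_value(belief, agreement, expected_reward,
                             outcome_dist, update, state_value, gamma=0.95):
    """Immediate reward plus the discounted value of the updated belief state.

    outcome_dist(belief, agreement) -> dict {outcome: probability}
    update(belief, agreement, s)    -> posterior belief after observing s
    state_value(belief)             -> estimated value of a belief state
    """
    value = expected_reward(belief, agreement)
    for s, p in outcome_dist(belief, agreement).items():
        value += gamma * p * state_value(update(belief, agreement, s))
    return value
```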
VPI is a winner!
VPI exploration balances the expected gain against the expected cost of executing a suboptimal action:
• Use the current model to myopically evaluate the actions' expected utility (EU).
• Assume an action results in perfect information regarding its Q-value.
• This perfect information has non-zero value only if it results in a change in policy.
• The EVPI is calculated and accounted for in action selection (act greedily with respect to EU + EVPI).
VPI exploration is thus (a) Bayesian and yet (b) efficient: it uses a myopic evaluation of actions, but boosts their desirability with EVPI estimates.
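A minimal sketch of the EVPI computation described above, in the style of Dearden et al.'s VPI exploration for Bayesian Q-learning, using samples from the agent's Q-value posterior (all names are illustrative; it assumes at least two candidate actions):

```python
def evpi(action, q_samples, mean_q):
    """Expected Value of Perfect Information for one action.

    q_samples : sampled Q-values for `action` from the current posterior
    mean_q    : dict {action: current expected Q-value}
    """
    best = max(mean_q, key=mean_q.get)
    q1 = mean_q[best]                                     # best expected value
    q2 = max(v for a, v in mean_q.items() if a != best)   # runner-up value
    gains = []
    for q in q_samples:
        if action == best:
            # Learning the best action is actually worse than the runner-up
            # changes the policy; the gain is the avoided loss.
            gains.append(max(q2 - q, 0.0))
        else:
            # Learning a non-best action beats the current best changes
            # the policy; the gain is the improvement.
            gains.append(max(q - q1, 0.0))
    return sum(gains) / len(gains)

# Action selection: act greedily with respect to EU + EVPI, e.g.
# chosen = max(mean_q, key=lambda a: mean_q[a] + evpi(a, samples[a], mean_q))
```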
It consistently outperforms the other approximation algorithms, and it scales to dozens or even hundreds of agents, unlike the lookahead approaches.

Example experiment: The Good, the Bad, and the Ugly
[Results figure: total actual rewards gathered during the "Big Crime" phase.]

Ongoing and Future Work
• We have recast RL algorithms and sequential decision making ideas within a computational trust framework, beating the winner of the international ART competition. Paper in this AAMAS: W. T. L. Teacy, G. Chalkiadakis, A. Rogers and N. R. Jennings, "Sequential Decision Making with Untrustworthy Service Providers".
• We have also applied VPI in a disaster management setting.
• We are investigating overlapping coalition formation models.