History-Dependent Graphical Multiagent Models
Quang Duong, Michael P. Wellman, Satinder Singh
Computer Science and Engineering, University of Michigan, USA

Yevgeniy Vorobeychik
Computer and Information Sciences, University of Pennsylvania, USA
Modeling Dynamic Multiagent Behavior
• Design a representation that:
– expresses a joint probability distribution over agent actions over time
– supports inference (e.g., prediction)
– exploits locality of interaction
• Our solution: history-dependent graphical multiagent models (hGMMs)
Example
Consensus Voting [Kearns et al. ’09], shown from agent 1’s perspective

[Figure: network of six agents (1–6) voting over time; snapshot at t = 10s]

Agent   Blue consensus   Red consensus   Neither
1       1.0              0.5             0
2       0.5              1.0             0
Graphical Representations
• Exploit locality in agent interactions
– MAIDs [Koller & Milch ’01], NIDs [Gal & Pfeffer ’08], action-graph games [Jiang et al. ’08]
– Graphical games [Kearns et al. ’01] and Markov random fields for graphical games [Daskalakis & Papadimitriou ’06]
Graphical Multiagent Models (GMMs)
• [Duong, Wellman, and Singh UAI-08]
– Nodes: agents
– Edges: dependencies between agents
– Neighborhood N_i includes agent i and its neighbors
• Accommodates multiple sources of belief about agent behavior in static (one-shot) scenarios
[Figure: example graph over agents 1–6]

Pr(a) = (1/Z) Π_i π(a_{N_i})

– Pr(a): joint probability distribution of the system’s actions
– π(a_{N_i}): potential of neighborhood N_i’s joint action
– Z: normalization constant
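To make the factorization concrete, here is a minimal Python sketch (the function names and the toy agreement-favoring potential are hypothetical illustrations, not from the paper) that builds the normalized joint distribution over binary actions from per-neighborhood potentials:

```python
# Hypothetical sketch of a static GMM: the joint distribution over actions
# factors into per-neighborhood potentials, normalized over all joint actions.
from itertools import product

def gmm_joint(n_agents, neighborhoods, potential):
    """Pr(a) = (1/Z) * prod_i potential(i, a restricted to N_i)."""
    weights = {}
    for a in product([0, 1], repeat=n_agents):  # all binary joint actions
        w = 1.0
        for i, Ni in neighborhoods.items():
            w *= potential(i, tuple(a[j] for j in Ni))
        weights[a] = w
    Z = sum(weights.values())  # normalization constant
    return {a: w / Z for a, w in weights.items()}

# Toy example: 3 agents on a line (0-1-2); the potential favors agreement.
neigh = {0: [0, 1], 1: [0, 1, 2], 2: [1, 2]}
pot = lambda i, a_Ni: 2.0 if len(set(a_Ni)) == 1 else 1.0
dist = gmm_joint(3, neigh, pot)
```

Unanimous joint actions such as (0, 0, 0) receive higher probability than mixed ones, illustrating how local potentials shape the global distribution.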
Contribution
Extend the static GMM to model dynamic joint behaviors by conditioning on local history
History-dependent GMM (hGMM)
• Extends the static GMM: conditions joint agent behavior on an abstracted history of actions
• Directly captures joint behavior using limited action history

Pr(a^t | H^t) = (1/Z_{H^t}) Π_i π(a^t_{N_i} | H^t_{N_i})

– Pr(a^t | H^t): joint probability distribution of the system’s actions at time t, given abstracted history H^t
– π(a^t_{N_i} | H^t_{N_i}): potential of neighborhood N_i’s joint action at t, given the neighborhood-relevant abstracted history H^t_{N_i}
– Z_{H^t}: normalization constant
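A hedged sketch of the history dependence, assuming a simple frequency-based abstraction over the last h joint actions (the abstraction and the toy potential below are illustrative choices, not the paper’s exact ones):

```python
# Hypothetical hGMM sketch: neighborhood potentials are conditioned on an
# abstracted history (here: each neighbor's action-1 frequency over horizon h).
from itertools import product

def abstract_history(history, Ni, h):
    """Frequency of action 1 for each agent in N_i over the last h steps."""
    window = history[-h:] if h else []
    return tuple(sum(a[j] for a in window) / max(len(window), 1) for j in Ni)

def hgmm_step(n_agents, neighborhoods, potential, history, h):
    """Pr(a^t | H^t) = (1/Z_H) * prod_i pi(a_{N_i}^t | H_{N_i}^t)."""
    weights = {}
    for a in product([0, 1], repeat=n_agents):
        w = 1.0
        for i, Ni in neighborhoods.items():
            freqs = abstract_history(history, Ni, h)
            w *= potential(i, tuple(a[j] for j in Ni), freqs)
        weights[a] = w
    Z = sum(weights.values())  # history-dependent normalization
    return {a: w / Z for a, w in weights.items()}

# Toy potential: favor joint actions that match neighbors' past frequencies.
pot = lambda i, a_Ni, f: 1.0 + sum(1 - abs(x - y) for x, y in zip(a_Ni, f))
neigh = {0: [0, 1], 1: [0, 1, 2], 2: [1, 2]}
hist = [(1, 1, 0), (1, 1, 1)]  # two past joint actions
dist = hgmm_step(3, neigh, pot, hist, h=2)
```

Because past play leaned toward action 1, the conditioned distribution places more mass on all-1 joint actions than on all-0 ones.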
Joint vs. Individual Behavior Models

• Individual behavior models (IBMM): autonomous agents’ behaviors are conditionally independent given the complete history. Agent i’s actions depend on past observations, as specified by a strategy function σ_i(H^t):

Pr(a^t | H^t) = Π_i σ_i(H^t)

• Joint behavior models (hGMM): history is often abstracted/summarized (limited horizon h, frequency function f, etc.), which induces correlations in observed behavior, so no independence assumption is made.
[Figure: three agents choosing actions independently via individual strategies σ_1(H^t_1), σ_2(H^t_2), σ_3(H^t_3)]
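The IBMM factorization can be sketched in a few lines of Python (the persistence strategies below are hypothetical, chosen only to illustrate the product form):

```python
# Hypothetical IBMM sketch: given the full history, agents act independently,
# so the joint probability is a product of individual strategies sigma_i.
def ibmm_joint_prob(joint_action, history, strategies):
    """Pr(a^t | H^t) = prod_i sigma_i(a_i | H^t)."""
    p = 1.0
    for i, a_i in enumerate(joint_action):
        p *= strategies[i](a_i, history)
    return p

# Toy strategies: each agent repeats its own last action with probability 0.9.
def sigma(i):
    return lambda a_i, H: 0.9 if a_i == H[-1][i] else 0.1

hist = [(1, 0, 1)]
p = ibmm_joint_prob((1, 0, 1), hist, [sigma(i) for i in range(3)])
# p = 0.9 * 0.9 * 0.9 = 0.729
```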
Voting Consensus Simulation
• Simulation (treated as the true model): smooth fictitious play [Camerer & Ho ’99]
– agents respond probabilistically in proportion to expected rewards (given the reward function and beliefs about others’ behavior)
• Note:
– This generative model is an individual behavior model
– Given abstracted history, joint behavior models may better capture behavior even when it is generated by an individual behavior model
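A rough Python sketch of such a simulator, assuming a logit (smoothed) response to expected consensus rewards; the belief update, smoothing parameter, and reward encoding are simplifying assumptions rather than the exact Camerer & Ho model:

```python
# Hypothetical smooth fictitious play sketch for the consensus game: each
# agent tracks empirical frequencies of neighbors' actions and responds
# probabilistically (logit) in proportion to expected reward.
import math, random

def smooth_fp_consensus(neighborhoods, rewards, T=100, lam=5.0, seed=0):
    rng = random.Random(seed)
    n = len(neighborhoods)
    counts = [[1, 1] for _ in range(n)]       # per-agent counts of actions seen
    actions = [rng.randint(0, 1) for _ in range(n)]
    for _ in range(T):
        new = []
        for i in range(n):
            total = counts[i][0] + counts[i][1]
            belief = [counts[i][a] / total for a in (0, 1)]   # beliefs about others
            er = [rewards[i][a] * belief[a] for a in (0, 1)]  # expected consensus reward
            w = [math.exp(lam * x) for x in er]               # smoothed (logit) response
            new.append(0 if rng.random() < w[0] / (w[0] + w[1]) else 1)
        for i, Ni in neighborhoods.items():   # update beliefs from observed actions
            for j in Ni:
                if j != i:
                    counts[i][new[j]] += 1
        actions = new
        if len(set(actions)) == 1:            # stop once the vote converges
            break
    return actions

neigh = {0: [0, 1], 1: [0, 1, 2], 2: [1, 2]}
rew = {i: (0.5, 1.0) for i in range(3)}  # (blue, red) consensus payoffs
final = smooth_fp_consensus(neigh, rew)
```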
Voting Consensus Models

Individual Behavior Multiagent Model (IBMM), components:
– reward for action a_i, regardless of neighbors’ actions
– frequency with which action a_i was previously chosen by each of i’s neighbors
– normalization

Joint Behavior Multiagent Model (hGMM), components:
– expected reward for a_{N_i}, discounted by the number of dissenting neighbors
– frequency with which a_{N_i} was previously chosen by neighborhood N_i
Model Learning and Evaluation
• Given a sequence of joint actions over m time periods, X = {a^0, …, a^m}, the log likelihood induced by model M is L_M(X; θ), where θ denotes the model’s parameters
• Potential function learning:
– assumes a known graphical structure
– employs gradient descent
• Evaluation: computes L_M(X; θ) on held-out data to evaluate M
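A minimal sketch of the learning loop, using a tiny one-parameter model and numerical gradients for brevity (the model, parameterization, and learning rate are all illustrative assumptions; the paper fits neighborhood potentials, not this toy):

```python
# Hypothetical sketch of maximum-likelihood learning by gradient ascent on
# L_M(X; theta), using central-difference numerical gradients for brevity.
import math

def log_likelihood(X, theta, model_prob):
    """L_M(X; theta) = sum_t log Pr_M(a^t | a^0..a^{t-1}; theta)."""
    return sum(math.log(model_prob(X[t], X[:t], theta)) for t in range(1, len(X)))

def fit(X, theta, model_prob, lr=0.05, steps=200, eps=1e-5):
    for _ in range(steps):
        grad = []
        for k in range(len(theta)):
            tp = list(theta); tp[k] += eps
            tm = list(theta); tm[k] -= eps
            grad.append((log_likelihood(X, tp, model_prob)
                         - log_likelihood(X, tm, model_prob)) / (2 * eps))
        theta = [th + lr * g for th, g in zip(theta, grad)]  # ascend the likelihood
    return theta

# Toy model over binary actions: repeat the previous action with
# probability sigmoid(theta[0]), otherwise switch.
def model_prob(a, hist, theta):
    p = 1 / (1 + math.exp(-theta[0]))
    return p if a == hist[-1] else 1 - p

X = [0, 0, 0, 1, 0, 0]          # 3 repeats, 2 switches among 5 transitions
theta = fit(X, [0.0], model_prob)
```

The fitted parameter recovers the empirical repeat rate of 3/5, i.e. theta[0] ≈ log(0.6/0.4).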
Experiments
• 10 agents
• i.i.d. payoffs, drawn between 0 and 1, for red and blue consensus outcomes; payoff 0 if no consensus is reached
• maximum node degree d
• each run lasts T = 100 periods, or ends earlier once the vote converges
• 20 smooth fictitious play game runs generated per game configuration (10 for training, 10 for testing)
Results
1. hGMMs outperform IBMMs in predicting outcomes for shorter history lengths.
2. Shorter history horizon → more abstraction of history → more induced behavior correlation → hGMM > IBMM.
3. hGMMs outperform IBMMs in predicting outcomes across different values of d.

Evaluation metric: (log likelihood for hGMM) / (log likelihood for IBMM)

[Table: ratios for maximum degree d ∈ {3, 6} and history horizon h ∈ {1, …, 8}; green cells mark hGMM > IBMM, yellow cells mark hGMM < IBMM]
Asynchronous Belief Updates
• hGMMs outperform IBMMs by a wider margin for longer summarization intervals v (which induce more behavior correlation)
Direct Sampling

• Compute the joint distribution of actions as the empirical distribution of the training data
• Evaluation metric: (log likelihood for hGMM) / (log likelihood for direct sampling)
• Direct sampling is computationally more expensive yet less powerful
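A short sketch of this baseline (the smoothing constant is an added assumption, to keep unseen joint actions from receiving zero probability):

```python
# Hypothetical direct-sampling baseline: the joint action distribution is
# just the empirical frequency of joint actions in the training runs.
from collections import Counter

def direct_sampling(training_runs, smoothing=1e-3):
    counts = Counter(a for run in training_runs for a in run)
    total = sum(counts.values())
    def prob(a):
        # add-constant smoothing so unseen joint actions keep nonzero mass
        return (counts.get(a, 0) + smoothing) / (total + smoothing * (len(counts) + 1))
    return prob

runs = [[(0, 0), (0, 1), (0, 0)], [(0, 0), (1, 1)]]
prob = direct_sampling(runs)  # prob((0, 0)) is highest: seen 3 of 5 times
```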
Conclusions

• hGMMs support efficient and effective inference about system dynamics, using abstracted history, for scenarios exhibiting locality of interaction
• hGMMs provide better predictions of dynamic behavior than IBMMs and direct fictitious play sampling
• History abstraction does not degrade predictive performance
• Future work:
– more domain applications: real voting experiment data, other scenarios
– a (fully) dynamic GMM that supports reasoning about unobserved past states