Modeling Two-Player Games in the Sigma Graphical Cognitive Architecture
David V. Pynadath, Paul S. Rosenbloom, Stacy C. Marsella and Lingshan Li
8.1.2013

Overall Desiderata for Sigma (𝚺)
A new breed of cognitive architecture that is
  Grand unified: cognitive + key non-cognitive (perception, motor, affective, …)
  Functionally elegant: broadly capable yet simple and theoretically elegant
    “cognitive Newton’s laws”
  Sufficiently efficient: fast enough for anticipated applications
For virtual humans & intelligent agents/robots that are
  Broadly, deeply and robustly cognitive
  Interactive with their physical and social worlds
  Adaptive given their interactions and experience
Hybrid: Discrete + Continuous
Mixed: Symbolic + Probabilistic

Sample ICT Virtual Humans
For education, training, interfaces, health, entertainment, …
  Ada & Grace
  SASO
  Gunslinger
  INOTS

Theory of Mind (ToM) in Sigma
ToM models the minds of others, to enable for example:
  Understanding multiagent situations
  Participating in social interactions
ToM approach based on PsychSim (Marsella & Pynadath)
  Decision theoretic problem solving based on POMDPs
  Recursive agent modeling
Questions to be answered
  Can Sigma elegantly extend to comparable ToM?
  What are the benefits for ToM?
  What new phenomena emerge from this combination?
Results reported here concern:
  Multiagent Sigma
  Implementation of single shot, two player games
    Both simultaneous and sequential moves

The Structure of Sigma
Constructed in layers, in analogy to computer systems

Computer System            𝚺 Cognitive System
Programs & Services        Knowledge & Skills
Computer Architecture      Cognitive Architecture
Microcode Architecture     Graphical Architecture
Hardware                   Lisp

Cognitive architecture: predicates (WM) and conditionals (LTM); memory access, perception, decision, learning, action
Graphical architecture: graphical models and piecewise linear functions; graph solution, graph modification
Conditionals: Deep blending of rules and probabilistic networks
Graphical models: Factor graphs + summary product algorithm
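
As a rough illustration of the summary product idea, the Python sketch below passes a single sum-product message along a two-variable chain X - f - Y. It is not Sigma's implementation (Sigma is built on Lisp and works over piecewise linear functions); the function name `marginal_of_y`, the value labels and all numbers are hypothetical.

```python
def marginal_of_y(prior_x, factor_xy):
    """One sum-product step on the chain X - f - Y.

    prior_x:   dict mapping each value of X to its weight
    factor_xy: dict mapping (x, y) pairs to factor values
    Returns the normalized marginal over Y:
        m(y) = sum_x prior_x[x] * factor_xy[(x, y)]
    """
    y_values = sorted({y for (_, y) in factor_xy})
    unnormalized = {
        y: sum(prior_x[x] * factor_xy[(x, y)] for x in prior_x)
        for y in y_values
    }
    total = sum(unnormalized.values())
    return {y: value / total for y, value in unnormalized.items()}


# Hypothetical numbers, chosen only to make the arithmetic easy to follow.
prior = {"x0": 0.2, "x1": 0.8}
factor = {
    ("x0", "y0"): 0.9, ("x0", "y1"): 0.1,
    ("x1", "y0"): 0.1, ("x1", "y1"): 0.9,
}
result = marginal_of_y(prior, factor)
```

The "product" is the multiplication of the incoming message by the factor, and the "summary" is the sum over X; other summarization operators (such as max) fit the same pattern.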

Control Structure: Soar-like Nesting of Three Layers
A reactive layer
  One (internally parallel) graph/cognitive cycle
  Which acts as the inner loop for
A deliberative layer
  Serial selection and application of operators
  Which acts as the inner loop for
A reflective layer
  Recursive, impasse-driven, meta-level generation
The layers differ in
  Time scales
  Serial versus parallel
  Controlled versus uncontrolled
[Figure labels: Tie, No-Change]
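
A much simplified Python sketch of how a deliberative decision might either select an operator or signal one of the two impasse labels shown on the slide. This is an assumption-laden illustration rather than Soar's or Sigma's actual decision procedure, which distinguish more cases; the function `decide`, its `tolerance` parameter and the operator names are hypothetical.

```python
def decide(ratings, tolerance=1e-9):
    """Pick an operator from a dict of {operator: rating}.

    Returns a (status, operators) pair:
      ("selected", [op])   one operator is clearly best
      ("tie", [op, ...])   several operators share the best rating
      ("no-change", [])    nothing is available to select
    In a Soar-like design, "tie" and "no-change" would hand control
    to the reflective layer.
    """
    if not ratings:
        return ("no-change", [])
    best = max(ratings.values())
    top = sorted(op for op, r in ratings.items() if best - r <= tolerance)
    if len(top) > 1:
        return ("tie", top)
    return ("selected", top)


# Hypothetical ratings for illustration only.
clear_case = decide({"accept": 0.7, "reject": 0.2})
tied_case = decide({"accept": 0.5, "reject": 0.5})
empty_case = decide({})
```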

Single-Shot, Simultaneous-Move, Two-Player Games
Two players move simultaneously
Played only once (not repeated)
  So no need to look beyond current decision
Symmetric and asymmetric games
Socially preferred outcome: optimum in some sense
Nash equilibrium: Neither player can unilaterally increase their payoff by altering their own choice
Key result: Sigma found the best Nash equilibrium in one memory access (i.e., graph solution)
  Although linear combination in article can’t always guarantee it

Prisoner’s Dilemma payoffs (rows: A's move; columns: B's move; B's payoff in parentheses where the slide gives it)
              Cooperate    Defect
Cooperate     .3           .1 (,.4)
Defect        .4 (,.1)     .2

Prisoner’s Dilemma results
              Cooperate    Defect    A Result    B Result
Cooperate     .3           .1        .43         .43
Defect        .4           .2        .57         .57

Stag Hunt results
              Cooperate    Defect    A Result    B Result
Cooperate     .25          0         .54         .54
Defect        .1           .1        .46         .46

Message counts shown on the slide: 602 Messages, 962 Messages
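
The Nash equilibrium definition above can be checked directly by enumerating the four pure-strategy outcomes. The Python sketch below does this for the payoffs on the slide; it is a brute-force check, not Sigma's graph solution. The helper names `pure_nash` and `symmetric_b` are hypothetical, and B's Stag Hunt payoffs are assumed to mirror A's, since the slide lists only one value per cell.

```python
MOVES = ("Cooperate", "Defect")


def symmetric_b(payoff_a):
    """B's payoffs for a symmetric game: B's payoff at (a, b) is A's at (b, a)."""
    n = len(payoff_a)
    return [[payoff_a[j][i] for j in range(n)] for i in range(n)]


def pure_nash(payoff_a, payoff_b):
    """Return all (A move, B move) pairs where neither player gains by deviating.

    payoff_x[i][j] is player x's payoff when A plays MOVES[i] and B plays MOVES[j].
    """
    equilibria = []
    n = len(MOVES)
    for i in range(n):
        for j in range(n):
            a_cannot_improve = all(payoff_a[i][j] >= payoff_a[k][j] for k in range(n))
            b_cannot_improve = all(payoff_b[i][j] >= payoff_b[i][k] for k in range(n))
            if a_cannot_improve and b_cannot_improve:
                equilibria.append((MOVES[i], MOVES[j]))
    return equilibria


# A's payoffs from the slide (rows: A's move, columns: B's move).
prisoners_dilemma_a = [[0.3, 0.1], [0.4, 0.2]]
stag_hunt_a = [[0.25, 0.0], [0.1, 0.1]]

pd_equilibria = pure_nash(prisoners_dilemma_a, symmetric_b(prisoners_dilemma_a))
sh_equilibria = pure_nash(stag_hunt_a, symmetric_b(stag_hunt_a))
```

Under these payoffs the Prisoner's Dilemma has a single pure equilibrium (both defect), while the Stag Hunt has two (both cooperate, both defect), of which mutual cooperation pays more. That matches the direction of the results on the slide, where Defect is favored in the Prisoner's Dilemma and Cooperate in the Stag Hunt.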

Sequential Games
Players (A, B) alternate moves
  E.g., Ultimatum, centipede and negotiation
Decision-theoretic approach with softmax combination
  Use expected value at each level of search
  Action probabilities assumed exponential in their utilities (à la Boltzmann)
There may be many Nash equilibria
  Instead seek stricter concept of subgame perfection
    Overall strategy is an equilibrium strategy over any subgame
Key result: Games solvable in two modes:
  Automatic/reactive/system-1
  Controlled/deliberate/system-2
  Both modes well documented in humans for general processing
  Combination not found previously in ToM models

The Ultimatum Game
A starts with a fixed amount of money (3)
A decides how much (in 0-3) to offer B
B decides whether or not to accept the offer
  If B accepts, each gets the resulting amount
  If B rejects, both get 0
Each has a utility function over money
  E.g., <.1, .4, .7, 1>
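
A minimal Python sketch of the game just described, solved by backward induction with softmax choice at each level. It assumes a temperature of 1 and that both players share the example utility function. It does not replicate Sigma's message passing, so its output need not match the distributions Sigma reports. All function names are hypothetical.

```python
import math

UTILITY = (0.1, 0.4, 0.7, 1.0)  # utility of ending with 0, 1, 2 or 3 units (from the slide)
TOTAL = 3                       # A's starting amount


def softmax(values):
    """Boltzmann distribution: probabilities proportional to exp(value)."""
    peak = max(values)
    weights = [math.exp(v - peak) for v in values]  # subtract peak for numerical stability
    total = sum(weights)
    return [w / total for w in weights]


def b_accept_probability(offer):
    """B compares the utility of accepting (gets `offer`) with rejecting (gets 0)."""
    return softmax([UTILITY[offer], UTILITY[0]])[0]


def a_expected_utility(offer):
    """A keeps TOTAL - offer if B accepts and gets 0 if B rejects."""
    p_accept = b_accept_probability(offer)
    return p_accept * UTILITY[TOTAL - offer] + (1.0 - p_accept) * UTILITY[0]


def a_offer_distribution():
    """A's softmax distribution over offers 0..TOTAL, given the model of B."""
    return softmax([a_expected_utility(offer) for offer in range(TOTAL + 1)])


offer_distribution = a_offer_distribution()
```

The structure mirrors the decision-theoretic approach: B's choice is modeled first, then A takes an expectation over B's response before applying softmax to its own options.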

Automatic/Reactive Approach
A trellis (factor) graph in LTM with one stage per move
Focus on backwards messages from reward(s)
[Figure labels: TA, TB, offer, accept, money, exp, reward]

CONDITIONAL Transition-A
  Conditions: Money(agent:A quantity:moneya)
              Accept-E(offer:offer acceptance:choice)
  Condacts: Offer(agent:A quantity:offer)
  Function(choice,offer,moneya): 1<T,0,3>, 1<T,1,2>, 1<T,2,1>, 1<T,3,0>, 1<F,*,0>

CONDITIONAL Transition-B
  Conditions: Money(agent:B quantity:moneyb)
  Condacts: Accept(offer:offer acceptance:choice)
  Function(choice,offer,moneyb): 1<T,0,0>, 1<T,1,1>, 1<T,2,2>, 1<T,3,3>, 1<F,*,0>

CONDITIONAL Reward
  Condacts: Money(agent:agent quantity:money)
  Function(agent,money): .1<*,0>, .4<*,1>, .7<*,2>, 1<*,3>
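
To make the Function entries easier to read, the Python sketch below expands the sparse `value<pattern>` notation into dense lookup tables, treating `*` as a wildcard. This is only a reading aid: the `expand` helper is hypothetical and is not Sigma's API, and unlisted combinations are assumed to default to 0.

```python
from itertools import product

CHOICES = ("T", "F")     # B accepts (T) or rejects (F)
AMOUNTS = (0, 1, 2, 3)   # possible offers and money quantities
AGENTS = ("A", "B")


def expand(entries, domains, default=0.0):
    """Expand (value, pattern) entries into a dense table keyed by tuples.

    A "*" in a pattern matches every value of that dimension.
    Combinations not covered by any entry keep `default` (an assumption).
    """
    table = {key: default for key in product(*domains)}
    for value, pattern in entries:
        for key in table:
            if all(p == "*" or p == k for p, k in zip(pattern, key)):
                table[key] = value
    return table


# Function(choice, offer, moneya) from CONDITIONAL Transition-A
transition_a = expand(
    [(1, ("T", 0, 3)), (1, ("T", 1, 2)), (1, ("T", 2, 1)),
     (1, ("T", 3, 0)), (1, ("F", "*", 0))],
    (CHOICES, AMOUNTS, AMOUNTS),
)

# Function(choice, offer, moneyb) from CONDITIONAL Transition-B
transition_b = expand(
    [(1, ("T", 0, 0)), (1, ("T", 1, 1)), (1, ("T", 2, 2)),
     (1, ("T", 3, 3)), (1, ("F", "*", 0))],
    (CHOICES, AMOUNTS, AMOUNTS),
)

# Function(agent, money) from CONDITIONAL Reward
reward = expand(
    [(0.1, ("*", 0)), (0.4, ("*", 1)), (0.7, ("*", 2)), (1, ("*", 3))],
    (AGENTS, AMOUNTS),
)
```

Read this way, Transition-A says that an accepted offer of k leaves A with 3 - k, Transition-B says it leaves B with k, and a rejection leaves either player with 0 whatever the offer.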

Controlled/Deliberate (Reflective) Approach
Decision-theoretic problem-space search across metalevels
  Very Soar-like, but with softmax combination
  Depends on summary product and Sigma’s mixed aspect
  Corresponds to PsychSim’s online reasoning
[Figure: problem-space search across metalevels for agents A and B; labels include offers 0-3, accept/reject, E(2), E(accept), tie, no-change, none]

Comments on the Ultimatum Game
Automatic version (5 conditionals)
  A’s normalized distribution over offers: <.315, .399, .229, .057>
  1 decision (94 messages) and .02 s (on a MacBook Air)
Controlled version (19 conditionals)
  A’s normalized distribution over offers: <.314, .400, .229, .057>
  72 decisions (868 messages/decision) and 126.69 s
Distributions comparable; speed ratio >6000
Same result, with distinct computational properties
  Automatic is fast and occurs in parallel with other memory processing, but is not (easily) penetrable by new bits of other knowledge
  Controlled is slow, sequential, but can (easily) integrate new knowledge
  Distinction also maps onto expert versus novice behavior in general
Raises possibility of a generalization of Soar’s chunking mechanism
  Compile/learn automatic trellises from controlled problem solving
  Finer grained, mixed(/hybrid) learning mechanism
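
The comparison between the two versions can be checked with simple arithmetic on the figures quoted on this slide. In the Python sketch below the variable names are hypothetical, and the controlled message total treats 868 messages/decision as applying to every one of the 72 decisions, which is an assumption.

```python
# Figures quoted on the slide for each version.
automatic = {
    "messages": 94,
    "seconds": 0.02,
    "offers": (0.315, 0.399, 0.229, 0.057),
}
controlled = {
    "decisions": 72,
    "messages_per_decision": 868,
    "seconds": 126.69,
    "offers": (0.314, 0.400, 0.229, 0.057),
}

# Time ratio: 126.69 s / .02 s, roughly 6334, consistent with ">6000".
speed_ratio = controlled["seconds"] / automatic["seconds"]

# Approximate total messages for the controlled run (assumes a uniform per-decision count).
controlled_messages = controlled["decisions"] * controlled["messages_per_decision"]
message_ratio = controlled_messages / automatic["messages"]

# Largest per-offer difference between the two distributions.
max_gap = max(abs(a - c) for a, c in zip(automatic["offers"], controlled["offers"]))
```

The two distributions differ by about .001 in any single entry, which supports calling them comparable, while the controlled run takes thousands of times longer and sends several hundred times as many messages.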

Conclusion
Simultaneous games are solvable within a single decision
  Yield Nash equilibria (although linear combination doesn’t guarantee)
Sequential games are solvable in either an automatic or a controlled manner
  Raises possibility of a mixed variant of chunking that automatically learns probabilistic trellises (HMMs, DBNs, …) from problem solving
    May yield a novel form of general structure learning for graphical models
Two architectural modifications to Sigma were required
  Multiagent decision making (and reflection)
  Optional exponentiation of outgoing WM messages (for softmax)
Future work includes
  More complex games
  Belief updating (learning models of others)

Overall Progress in Sigma
Memory [ICCM 10]
  Procedural (rule)
  Declarative (semantic/episodic)
  Constraint
Problem solving
  Preference based decisions [AGI 11]
  Impasse-driven reflection [AGI 13]
  Decision-theoretic (POMDP) [BICA 11b]
  Theory of Mind [AGI 13]
Learning [ICCM 13]
  Episodic
  Concept (supervised/unsupervised)
  Reinforcement [AGI 12b]
  Action modeling [AGI 12b]
  Map (as part of SLAM)
Mental imagery [BICA 11a; AGI 12a]
  1-3D continuous imagery buffer
  Object transformation
  Feature & relationship detection
Perception [BICA 11b]
  Object recognition (CRFs)
  Localization
Natural language
  Question answering (selection)
  Word sense disambiguation [ICCM 13]
  Part of speech tagging [ICCM 13]
  Isolated word speech recognition
Graph integration [BICA 11b]
  CRF + Localization + POMDP
Some of these are still just beginnings