Overall Desiderata for Sigma (𝚺 )

Modeling Two-Player Games in the Sigma Graphical Cognitive Architecture
David V. Pynadath, Paul S. Rosenbloom, Stacy C. Marsella and Lingshan Li

8.1.2013


Overall Desiderata for Sigma (𝚺)

A new breed of cognitive architecture that is:
  Grand unified: cognitive + key non-cognitive (perception, motor, affective, …)
  Functionally elegant: broadly capable yet simple and theoretically elegant (“cognitive Newton’s laws”)
  Sufficiently efficient: fast enough for anticipated applications

For virtual humans & intelligent agents/robots that are:
  Broadly, deeply and robustly cognitive
  Interactive with their physical and social worlds
  Adaptive given their interactions and experience

Hybrid: Discrete + Continuous
Mixed: Symbolic + Probabilistic


Sample ICT Virtual Humans

For education, training, interfaces, health, entertainment, …
  Ada & Grace
  SASO
  Gunslinger
  INOTS


Theory of Mind (ToM) in Sigma

ToM models the minds of others, to enable, for example:
  Understanding multiagent situations
  Participating in social interactions

ToM approach based on PsychSim (Marsella & Pynadath):
  Decision-theoretic problem solving based on POMDPs
  Recursive agent modeling

Questions to be answered:
  Can Sigma elegantly extend to comparable ToM?
  What are the benefits for ToM?
  What new phenomena emerge from this combination?

Results reported here concern:
  Multiagent Sigma
  Implementation of single-shot, two-player games
    Both simultaneous and sequential moves


The Structure of Sigma

Constructed in layers, in analogy to computer systems:

  Computer System            𝚺 Cognitive System
  Programs & Services        Knowledge & Skills
  Computer Architecture      Cognitive Architecture
  Microcode Architecture     Graphical Architecture
  Hardware                   Lisp

Cognitive architecture: predicates (WM) and conditionals (LTM)
  Memory access, perception, decision, learning, action
Graphical architecture: graphical models and piecewise linear functions
  Graph solution, graph modification

Conditionals: deep blending of rules and probabilistic networks
Graphical models: factor graphs + summary product algorithm
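To make "factor graphs + summary product algorithm" concrete, here is a minimal Python sketch of one message-passing step on a tiny factor graph, using sum as the summary operation. It is not Sigma's implementation (which the slide places in Lisp over piecewise linear functions); the graph, the factor values and all function names are hypothetical and chosen only for illustration.

```python
from itertools import product

# Hypothetical factor graph: f1(x) -- x -- f2(x, y) -- y, with binary x and y.
# The numbers below are illustrative only.
f1 = {0: 0.6, 1: 0.4}                      # unary factor on x
f2 = {(0, 0): 0.9, (0, 1): 0.1,
      (1, 0): 0.2, (1, 1): 0.8}            # pairwise factor on (x, y)

def normalize(msg):
    """Scale a message so its values sum to 1."""
    total = sum(msg.values())
    return {k: v / total for k, v in msg.items()}

def marginal_y_by_messages():
    """Marginal of y via message passing: product at the factor, then summary."""
    msg_x_to_f2 = f1  # x has no other neighbors, so it forwards f1's message
    msg_f2_to_y = {y: sum(f2[(x, y)] * msg_x_to_f2[x] for x in (0, 1))
                   for y in (0, 1)}
    return normalize(msg_f2_to_y)

def marginal_y_brute_force():
    """Same marginal computed from the full joint, as a cross-check."""
    joint = {(x, y): f1[x] * f2[(x, y)]
             for x, y in product((0, 1), repeat=2)}
    return normalize({y: sum(joint[(x, y)] for x in (0, 1)) for y in (0, 1)})

if __name__ == "__main__":
    print(marginal_y_by_messages())
    print(marginal_y_brute_force())
```

Both functions should agree; the message-passing version avoids building the full joint, which is the point of solving a graph by local messages.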


Control Structure: Soar-like Nesting of Three Layers

A reactive layer
  One (internally parallel) graph/cognitive cycle
  Which acts as the inner loop for
A deliberative layer
  Serial selection and application of operators
  Which acts as the inner loop for
A reflective layer
  Recursive, impasse-driven, meta-level generation

The layers differ in:
  Time scales
  Serial versus parallel
  Controlled versus uncontrolled

[Figure: impasse types labeled Tie and No-Change]


Single-Shot, Simultaneous-Move, Two-Player Games

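Prisoner’s Dilemma (rows: A’s move; columns: B’s move; entries: A’s payoff, with B’s payoff in parentheses where shown)

               Cooperate      Defect
  Cooperate    .3             .1 (B: .4)
  Defect       .4 (B: .1)     .2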

Two players move simultaneously
Played only once (not repeated)
  So no need to look beyond current decision
Symmetric and asymmetric games
Socially preferred outcome: optimum in some sense
Nash equilibrium: neither player can unilaterally increase their payoff by altering their own choice
Key result: Sigma found the best Nash equilibrium in one memory access (i.e., graph solution)
  Although the linear combination in the article can’t always guarantee it

Prisoner’s Dilemma

               Cooperate   Defect   A Result   B Result
  Cooperate    .3          .1       .43        .43
  Defect       .4          .2       .57        .57

Stag Hunt

               Cooperate   Defect   A Result   B Result
  Cooperate    .25         0        .54        .54
  Defect       .1          .1       .46        .46

602 Messages 962 Messages
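The Nash equilibrium definition on this slide can be checked directly against the payoff matrices. The sketch below enumerates pure-strategy equilibria by brute force; it is not how Sigma solves the games (Sigma uses a single graph solution). It assumes both games are symmetric, so B's payoffs are the transpose of A's; the slide confirms this only for the two off-diagonal Prisoner's Dilemma cells. The function names are hypothetical.

```python
from itertools import product

MOVES = ("Cooperate", "Defect")

def pure_nash(payoff_a, payoff_b):
    """Return all pure-strategy Nash equilibria of a two-player matrix game.

    payoff_a[i][j] and payoff_b[i][j] are the payoffs to A and B
    when A plays move i and B plays move j.
    """
    rows, cols = len(payoff_a), len(payoff_a[0])
    equilibria = []
    for i, j in product(range(rows), range(cols)):
        # A cannot gain by switching rows while B stays on column j.
        a_ok = all(payoff_a[i][j] >= payoff_a[k][j] for k in range(rows))
        # B cannot gain by switching columns while A stays on row i.
        b_ok = all(payoff_b[i][j] >= payoff_b[i][k] for k in range(cols))
        if a_ok and b_ok:
            equilibria.append((MOVES[i], MOVES[j]))
    return equilibria

def transpose(matrix):
    return [list(row) for row in zip(*matrix)]

# A's payoffs as given on the slides; B's payoffs by the symmetry assumption.
pd_a = [[0.3, 0.1], [0.4, 0.2]]
pd_b = transpose(pd_a)
stag_a = [[0.25, 0.0], [0.1, 0.1]]
stag_b = transpose(stag_a)

if __name__ == "__main__":
    print("Prisoner's Dilemma:", pure_nash(pd_a, pd_b))
    print("Stag Hunt:", pure_nash(stag_a, stag_b))
```

Under these assumptions the Prisoner's Dilemma has a single pure equilibrium (mutual defection), while the Stag Hunt has two, which is why the slide speaks of the "best" Nash equilibrium.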


Sequential Games

Players (A, B) alternate moves
  E.g., ultimatum, centipede and negotiation

Decision-theoretic approach with softmax combination
  Use expected value at each level of search
  Action probabilities assumed exponential in their utilities (à la Boltzmann)

There may be many Nash equilibria
  Instead seek the stricter concept of subgame perfection
  Overall strategy is an equilibrium strategy over any subgame

Key result: games solvable in two modes
  Automatic/reactive/system-1
  Controlled/deliberate/system-2
  Both modes well documented in humans for general processing
  Combination not found previously in ToM models


The Ultimatum Game

A starts with a fixed amount of money (3)
A decides how much (in 0-3) to offer B
B decides whether or not to accept the offer
  If B accepts, each gets the resulting amount
  If B rejects, both get 0
Each has a utility function over money
  E.g., <.1, .4, .7, 1>
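The game rules above, combined with the softmax idea from the sequential-games slide, can be sketched as a textbook backward induction in Python. This is not Sigma's message-passing computation, and its output should not be expected to match the offer distributions reported for Sigma; the function names and the unit softmax temperature are assumptions made for illustration.

```python
import math

TOTAL = 3                          # A's starting money, from the slide
UTILITY = [0.1, 0.4, 0.7, 1.0]     # utility of holding 0..3 units, from the slide

def softmax(values):
    """Probabilities proportional to exp(value) (Boltzmann, temperature 1)."""
    weights = [math.exp(v) for v in values]
    total = sum(weights)
    return [w / total for w in weights]

def payoffs(offer, accept):
    """Money received by (A, B) for a given offer and B's response."""
    return (TOTAL - offer, offer) if accept else (0, 0)

def b_accept_probability(offer):
    """B's softmax choice between accepting and rejecting an offer."""
    u_accept = UTILITY[payoffs(offer, True)[1]]
    u_reject = UTILITY[payoffs(offer, False)[1]]
    return softmax([u_accept, u_reject])[0]

def a_offer_distribution():
    """A's softmax over offers, using expected utility under B's modeled response."""
    expected = []
    for offer in range(TOTAL + 1):
        p = b_accept_probability(offer)
        u_if_accept = UTILITY[payoffs(offer, True)[0]]
        u_if_reject = UTILITY[payoffs(offer, False)[0]]
        expected.append(p * u_if_accept + (1 - p) * u_if_reject)
    return softmax(expected)

if __name__ == "__main__":
    print([round(p, 3) for p in a_offer_distribution()])
```

The structure mirrors the slide's description: A reasons about B's decision first (a one-level model of the other agent), then takes an expected value and applies softmax at its own level.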


Automatic/Reactive Approach

A trellis (factor) graph in LTM with one stage per move
  Focus on backwards messages from reward(s)

[Figure: trellis graph with nodes labeled offer, TA, accept, exp, TB, money and reward]

CONDITIONAL Transition-A
  Conditions: Money(agent:A quantity:moneya)
              Accept-E(offer:offer acceptance:choice)
  Condacts: Offer(agent:A quantity:offer)
  Function(choice,offer,moneya): 1<T,0,3>, 1<T,1,2>, 1<T,2,1>, 1<T,3,0>, 1<F,*,0>

CONDITIONAL Transition-B
  Conditions: Money(agent:B quantity:moneyb)
  Condacts: Accept(offer:offer acceptance:choice)
  Function(choice,offer,moneyb): 1<T,0,0>, 1<T,1,1>, 1<T,2,2>, 1<T,3,3>, 1<F,*,0>

CONDITIONAL Reward
  Condacts: Money(agent:agent quantity:money)
  Function(agent,money): .1<*,0>, .4<*,1>, .7<*,2>, 1<*,3>
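One way to read the Function entries in these conditionals is as sparse tables in which `*` matches any value. The Python sketch below transcribes them into dictionaries and checks that they encode the ultimatum game's rules. The encoding, the names and the reading of `*` as a wildcard over offers are assumptions for illustration, not Sigma syntax or semantics.

```python
OFFERS = range(4)  # possible offers 0..3

# (choice, offer, money) triples with function value 1, transcribed from the slide.
# "*" is read here as "any offer" (an assumption about the notation).
TRANSITION_A = [(True, 0, 3), (True, 1, 2), (True, 2, 1), (True, 3, 0), (False, "*", 0)]
TRANSITION_B = [(True, 0, 0), (True, 1, 1), (True, 2, 2), (True, 3, 3), (False, "*", 0)]

# Reward function over money, identical for every agent.
REWARD = {0: 0.1, 1: 0.4, 2: 0.7, 3: 1.0}

def expand(entries):
    """Map (choice, offer) -> resulting money, expanding the wildcard."""
    table = {}
    for choice, offer, money in entries:
        for o in (OFFERS if offer == "*" else [offer]):
            table[(choice, o)] = money
    return table

MONEY_A = expand(TRANSITION_A)
MONEY_B = expand(TRANSITION_B)

def utility_for_b(choice, offer):
    """Reward reaching B's decision when the money outcome is traced backwards."""
    return REWARD[MONEY_B[(choice, offer)]]

if __name__ == "__main__":
    for offer in OFFERS:
        print(offer, MONEY_A[(True, offer)], MONEY_B[(True, offer)],
              utility_for_b(True, offer), utility_for_b(False, offer))
```

Read this way, an accepted offer always splits the full amount of 3 between the players, and a rejection leaves both with 0, matching the game description.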


Controlled/Deliberate (Reflective) Approach

Decision-theoretic problem-space search across metalevels
  Very Soar-like, but with softmax combination
  Depends on summary product and Sigma’s mixed aspect
  Corresponds to PsychSim’s online reasoning

[Figure: nested problem-space search for agents A and B, showing tie impasses over offers 0-3 and over accept/reject, evaluation operators E(2) and E(accept), and no-change impasses]


Comments on the Ultimatum Game

Automatic version (5 conditionals)
  A’s normalized distribution over offers: <.315, .399, .229, .057>
  1 decision (94 messages) and .02 s (on a MacBook Air)

Controlled version (19 conditionals)
  A’s normalized distribution over offers: <.314, .400, .229, .057>
  72 decisions (868 messages/decision) and 126.69 s

Same result, with distinct computational properties
  Automatic is fast and occurs in parallel with other memory processing, but is not (easily) penetrable by new bits of other knowledge
  Controlled is slow, sequential, but can (easily) integrate new knowledge
  Distinction also maps onto expert versus novice behavior in general

Raises possibility of a generalization of Soar’s chunking mechanism
  Compile/learn automatic trellises from controlled problem solving
  Finer grained, mixed(/hybrid) learning mechanism

Speed ratio > 6000
Distributions comparable
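The two summary claims (speed ratio and comparable distributions) follow from the figures reported on this slide. A small Python check is sketched below; the dictionary layout and variable names are hypothetical, and the total message count for the controlled version assumes 868 messages in each of the 72 decisions, as the slide's per-decision figure suggests.

```python
# Figures transcribed from the slide.
automatic = {"decisions": 1, "messages_per_decision": 94, "seconds": 0.02,
             "offers": [0.315, 0.399, 0.229, 0.057]}
controlled = {"decisions": 72, "messages_per_decision": 868, "seconds": 126.69,
              "offers": [0.314, 0.400, 0.229, 0.057]}

# Wall-clock ratio between the controlled and automatic versions (roughly 6334).
speed_ratio = controlled["seconds"] / automatic["seconds"]

# Total messages in the controlled version, assuming 868 per decision throughout.
total_messages = controlled["decisions"] * controlled["messages_per_decision"]

# Largest per-offer difference between the two reported distributions.
max_gap = max(abs(a - c) for a, c in zip(automatic["offers"], controlled["offers"]))

if __name__ == "__main__":
    print(f"speed ratio: {speed_ratio:.1f}")
    print(f"controlled messages: {total_messages}")
    print(f"largest distribution gap: {max_gap:.3f}")
```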


Conclusion

Simultaneous games are solvable within a single decision
  Yield Nash equilibria (although linear combination doesn’t guarantee it)

Sequential games are solvable in either an automatic or a controlled manner
  Raises possibility of a mixed variant of chunking that automatically learns probabilistic trellises (HMMs, DBNs, …) from problem solving
  May yield a novel form of general structure learning for graphical models

Two architectural modifications to Sigma were required
  Multiagent decision making (and reflection)
  Optional exponentiation of outgoing WM messages (for softmax)

Future work includes
  More complex games
  Belief updating (learning models of others)


Overall Progress in Sigma

Memory [ICCM 10]
  Procedural (rule)
  Declarative (semantic/episodic)
  Constraint

Problem solving
  Preference based decisions [AGI 11]
  Impasse-driven reflection [AGI 13]
  Decision-theoretic (POMDP) [BICA 11b]
  Theory of Mind [AGI 13]

Learning [ICCM 13]
  Episodic
  Concept (supervised/unsupervised)
  Reinforcement [AGI 12b]
  Action modeling [AGI 12b]
  Map (as part of SLAM)

Mental imagery [BICA 11a; AGI 12a]
  1-3D continuous imagery buffer
  Object transformation
  Feature & relationship detection

Perception [BICA 11b]
  Object recognition (CRFs)
  Localization

Natural language
  Question answering (selection)
  Word sense disambiguation [ICCM 13]
  Part of speech tagging [ICCM 13]
  Isolated word speech recognition

Graph integration [BICA 11b]
  CRF + Localization + POMDP

Some of these are still just beginnings
