Interaction settings for measuring (social) intelligence in multi-agent systems


Javier Insa-Cabrera, José Hernández-Orallo
Dep. de Sistemes Informàtics i Computació, Universitat Politècnica de València

II Workshop ReteCog INTERACTION, Zaragoza, 17-18 January 2013

OUTLINE

1. Introduction
2. Interactive general tests
3. Some findings and caveats
4. Configurations
5. Difficulty estimation
6. Conclusions


1. INTRODUCTION



Why are tests (for machines) methodologically good?

An intelligence test can be seen as a definition of intelligence. Note that a definition of intelligence does not automatically yield an intelligence test.

Cognitive tests can be refuted by experimentation, especially universal ones, since they must put very different kinds of subjects on the same scale.

Cognitive tests can be used to evaluate systems and assess the progress of a discipline.

They will become more and more necessary in the future.

They help us formulate new questions and address new challenges.


Can we construct ‘universal’ intelligence tests?



Project anYnt (Anytime Universal Intelligence)

http://users.dsic.upv.es/proy/anynt/

Any kind of system (biological, non-biological, human).
Any system now or in the future.
Any moment in its development (child, adult).
Any degree of intelligence.
Any speed.
Evaluation can be stopped at any time.


Intelligence as a cognitive ability:
General intelligence: the capacity to perform well in any kind of environment.
Social intelligence: the ability to perform well in an environment by interacting with other agents. It is related to, but different from, collective intelligence and emotional intelligence.

Why is social intelligence so important?

It has been shown to be one of the distinctive traits of the intelligence of humans and other animals: Herrmann, Call, Hernández-Lloreda, Hare, Tomasello, “Humans have evolved specialized skills of social cognition: The cultural intelligence hypothesis”, Science, 2007.

It shows the ability to create “mind models”.


Approach:
Tests must be universal.
Tests must have a formal background of what we are measuring.

Following the “tradition” of tests based on compression, Kolmogorov complexity and related ideas:

Turing Test enhanced with compression (Dowe and Hajek, “A non-behavioural, computational extension to the Turing Test”, ICCIMA, 1998).

C-tests: intelligence tests based on Kolmogorov complexity (Hernández-Orallo, “Beyond the Turing Test”, J. Logic, Language & Information, 2000).
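Kolmogorov complexity K is not computable, so any practical test has to approximate it. A common stand-in, assumed here purely for illustration (it is not necessarily the approximation used by the tests cited above), is the length of a compressed description: regular sequences compress well, random ones do not. A minimal sketch with Python's standard zlib module; the function name is hypothetical:

```python
import os
import zlib


def compressed_length(data: bytes) -> int:
    """Bytes needed by zlib at maximum compression: a crude, computable
    upper-bound proxy for the Kolmogorov complexity of `data`."""
    return len(zlib.compress(data, 9))


regular = b"ab" * 500       # highly regular: compresses to a few bytes
noisy = os.urandom(1000)    # random bytes: essentially incompressible

print(compressed_length(regular), compressed_length(noisy))
```

The compressed length is only a proxy: it depends on the chosen compressor and carries a constant overhead.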

2. INTERACTIVE GENERAL TESTS



Universal Intelligence (Legg and Hutter, “Universal intelligence: A definition of machine intelligence”, 2007):
An interactive extension of C-tests.
Agents are evaluated in a classical reinforcement learning setting.
Environments are chosen, and results averaged, using a universal distribution.

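This leads to the following definition, i.e., performance over a universal distribution of environments:

$$\Upsilon(\pi) := \sum_{\mu \in E} 2^{-K(\mu)}\, V_{\mu}^{\pi}, \qquad V_{\mu}^{\pi} := \mathbb{E}\left(\sum_{i=1}^{\infty} r_i\right)$$

where E is the class of environments, K(μ) is the Kolmogorov complexity of environment μ and V_μ^π is the expected sum of the rewards r_i obtained by agent π in μ.

[Figure: agent π interacting with environment μ through actions a_i, observations o_i and rewards r_i.]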
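Legg and Hutter's definition sums 2^{-K(μ)} V_μ^π over all environments, which no test can do. A minimal sketch of a finite-sample version, assuming the complexities K(μ) have already been estimated (for instance by compression) and renormalising the 2^{-K(μ)} weights over the sample; the renormalisation and the function name are assumptions for illustration:

```python
from typing import Sequence


def weighted_score(complexities: Sequence[float],
                   values: Sequence[float]) -> float:
    """Finite-sample version of sum_mu 2^-K(mu) * V_mu^pi.

    complexities: estimated K(mu), in bits, of each sampled environment.
    values: expected aggregated reward V_mu^pi of the agent in each one.
    The weights 2^-K(mu) are renormalised to sum to 1 over the sample.
    """
    if not values or len(complexities) != len(values):
        raise ValueError("need one value per sampled environment")
    weights = [2.0 ** -k for k in complexities]
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)


# The simplest environment (K = 2 bits) dominates the score.
print(weighted_score([2, 3, 10], [1.0, 0.5, 0.0]))
```

An alternative with the same expectation is to sample environments from the universal distribution itself and then average their values uniformly.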




Anytime Intelligence Test (Hernández-Orallo and Dowe, “Measuring universal intelligence: Towards an anytime intelligence test”, Artificial Intelligence, 2010). An interactive setting following (Legg and Hutter 2007) which addresses:
Issues about the difficulty of environments.
The definition of discriminative environments.
Finite samples and (practical) finite interactions.
Time (speed) of agents and environments.
Reward aggregation and convergence issues.
Anytime and adaptive application.




An environment class (Hernández-Orallo, “A (hopefully) Unbiased Universal Environment Class for Measuring Intelligence of Biological and Artificial Systems”, Artificial General Intelligence, 2010):
Spaces are defined as strongly connected graphs.
Actions are the arrows in the graphs.
Observations are the ‘contents’ of each vertex/cell in the graph.

[Figure: example of a space.]

Agents can perform actions inside the space.
Rewards: two special agents, Good (⊕) and Evil (⊖), are responsible for the rewards: they leave a trail of rewards.
With regular graphs, the space resembles a cellular automaton (and other computational models).
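A minimal sketch of such a space, assuming a directed graph stored as a dictionary in which each cell maps action labels to target cells, and observations reduced to the names of the agents in a cell. The class, the three-cell ring and the agent names are hypothetical, and rewards (the ⊕/⊖ trails) are left out:

```python
# Hypothetical 3-cell ring: each cell maps action labels to target cells.
# Cells are the vertices of the graph; actions are its arrows (edges).
RING = {
    0: {"left": 2, "right": 1, "stay": 0},
    1: {"left": 0, "right": 2, "stay": 1},
    2: {"left": 1, "right": 0, "stay": 2},
}


class Space:
    def __init__(self, edges):
        self.edges = edges
        self.position = {}  # agent name -> cell it occupies

    def place(self, agent, cell):
        self.position[agent] = cell

    def act(self, agent, action):
        """Follow the arrow labelled `action` from the agent's current cell."""
        current = self.position[agent]
        self.position[agent] = self.edges[current][action]

    def contents(self, cell):
        """Observation: the (sorted) names of the agents in `cell`."""
        return sorted(a for a, c in self.position.items() if c == cell)


space = Space(RING)
space.place("pi", 0)
space.place("good", 1)
space.act("pi", "right")     # pi moves from cell 0 to cell 1
print(space.contents(1))     # ['good', 'pi']
```

Since every cell of this ring has the same out-degree, it is a (tiny) regular graph, the case in which the space resembles a cellular automaton.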




With the test definitions and this environment class, we have been evaluating the ‘general intelligence’ of different systems. Experiments showed that the test prototype is not universal (Insa-Cabrera et al., “Comparing Humans and AI agents”, Artificial General Intelligence, 2011).

Environments rarely contain social behaviour. Environment distributions should be reconsidered:

Darwin-Wallace distribution (Hernández-Orallo et al., “On more realistic environment distributions for defining, evaluating and developing intelligence”, Artificial General Intelligence, 2011).




Towards social tests:
Goal: modify the setting to include some social behaviour.
See whether social behaviour better discriminates between humans and machines.
How: introduce simple agents in the environments.
Convert the environment into a true Multi-Agent System (MAS).
Examine the impact of other agents on agent performance using competitive and cooperative scenarios.

3. SOME FINDINGS AND CAVEATS



Agents compared:
Reinforcement learning algorithms: Q-learning, SARSA, QV-learning.
Simple algorithm: Random.

[Figure: results when alone in the environment (only with ⊕ and ⊖).]
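The RL agents differ mainly in how they update their action-value estimates. For reference, a minimal tabular sketch of the textbook Q-learning and SARSA updates (QV-learning, which additionally learns a state-value function, is omitted). The learning rate, discount factor, states and actions are illustrative values, not the settings used in these experiments:

```python
from collections import defaultdict

ALPHA, GAMMA = 0.5, 0.9   # illustrative learning rate and discount factor


def q_learning_update(Q, s, a, r, s2, actions):
    """Off-policy: bootstrap on the best action available in s2."""
    target = r + GAMMA * max(Q[(s2, b)] for b in actions)
    Q[(s, a)] += ALPHA * (target - Q[(s, a)])


def sarsa_update(Q, s, a, r, s2, a2):
    """On-policy: bootstrap on the action a2 actually chosen in s2."""
    target = r + GAMMA * Q[(s2, a2)]
    Q[(s, a)] += ALPHA * (target - Q[(s, a)])


actions = ["left", "right"]
Q1, Q2 = defaultdict(float), defaultdict(float)
for Q in (Q1, Q2):
    Q[("s1", "right")] = 1.0      # "right" already looks good in s1

q_learning_update(Q1, "s0", "left", 0.0, "s1", actions)
sarsa_update(Q2, "s0", "left", 0.0, "s1", "left")   # agent explored "left"
print(Q1[("s0", "left")], Q2[("s0", "left")])       # 0.45 0.0
```

The same transition yields different estimates: SARSA's value depends on the (possibly exploratory) action actually taken next, which is one reason the two algorithms can react differently to the ‘noise’ introduced by other agents.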




Competition: all the agents compete for rewards.

[Figure: competition results with four agents, including the random agent.]

[Figure: competition results with three agents, without the random agent.]




Cooperation: each agent receives the average of the rewards obtained by all the agents.

[Figure: cooperation results with four agents, including the random agent.]

[Figure: cooperation results with three agents, without the random agent.]
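The two reward schemes can be stated in a few lines. A minimal sketch, assuming the rewards of one step are given as a list with one entry per agent; the function names and numbers are hypothetical:

```python
def competitive(rewards):
    """Competition: each agent keeps the reward it obtained itself."""
    return list(rewards)


def cooperative(rewards):
    """Cooperation: every agent receives the average of the obtained rewards."""
    mean = sum(rewards) / len(rewards)
    return [mean] * len(rewards)


step = [1.0, 0.0, 0.0, 0.5]   # hypothetical rewards of four agents in one step
print(competitive(step))       # [1.0, 0.0, 0.0, 0.5]
print(cooperative(step))       # [0.375, 0.375, 0.375, 0.375]
```

Under cooperation a weak agent (such as the random one) lowers everyone's reward, whereas under competition it mainly affects the others by taking rewards or getting in the way.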




Teams: two teams (2 Q-learning vs. 2 SARSA agents) compete for rewards, combining competition (between teams) and cooperation (within each team).




[Figure: team results in the competitive scenario and in the cooperative scenario.]




Findings:
The inclusion of other agents (even random ones) makes the other agents perform worse.
RL algorithms increase the size of their cost matrix. Algorithms should learn to deal with ‘noise’.

Complexity increases with the inclusion of social behaviour.
The complexity of the environment is more related to the complexity of the other agents.
We first need to calculate the complexity (or intelligence) of the other agents included in the environment.

The overall complexity gets too large (which also means that its approximation is much more difficult).




We need a more minimalistic setting:
The use of complex agents such as Q-learning (hundreds of lines of code) or a random agent makes the connection between difficulty and environment complexity (including the agents) much more intricate.
We need to simplify the setting and consider simple agents: we analyse several configurations next.
We need to derive and analyse the difficulty in a different way: we analyse several distributional approaches.

4. CONFIGURATIONS



Multi-agent environment configurations:

We look for minimalist configurations:
The behaviour is given mostly by the other agents, not by the environment.
Agents can have simple action, perception and reward schemas.
Simple agents may be easy to define: colliders, evaders, random, etc.


Space and actions: we keep the previous configuration, a graph where the edges are the actions and the vertices are the cells. As before, the graph is not necessarily regular.

Observations: agents see some of the cells (e.g., adjacent cells and their content). Agents see who is in each cell (and not only how many).

This is important for social intelligence, since agents need to identify different mind models.


Rewards: we consider all agents equal. There are no special agents ⊕ and ⊖ generating rewards.

No heavens, no hells:
The number of rewards shared by / included in the system must be finite and remain constant.
There must always be a way to prevent one agent from getting all the rewards.

We relax the ‘balancedness’ property (random agents score 0), since it is difficult to ensure in general. Rewards are now always positive.


Current configuration. Definition:
Agents can be arranged into teams of at least one agent. The team of agent a is denoted by T(a).
There is a fixed number of indivisible (energy) units.

Start:
The number of units each agent a stores is denoted by U(a). Agents are originally empty: U(a) = 0.
The units are originally spread at random on the space cells. The number of units in cell c is denoted by U(c). The number of agents in c is denoted by A(c).

Reward rule: if A(c) = n and U(c) = m, then for each agent a in c we have $U(a) \leftarrow U(a) + 1$, provided $m \ge n$. If $m < n$, then for each agent a in c we have $U(a) \leftarrow U(a) + U(c)/A(c)$.

At each step, each agent’s reward is the sum of the units that all its team’s members carry, divided by the number of agents in the team:

$$r(a) = \frac{1}{|T(a)|} \sum_{b \in T(a)} U(b)$$
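A minimal sketch of one application of the reward rule and of the team reward. Two points are assumptions not stated in the definition: units collected by agents are removed from the cell (so that the total number of units stays constant), and in the m < n case each agent receives the equal share m/n, which is fractional and therefore only approximates indivisible units. All names and data are hypothetical:

```python
from collections import Counter


def collect(agent_cell, cell_units, stored):
    """Apply the reward rule once, in every occupied cell.

    agent_cell: agent -> cell; cell_units: cell -> U(c); stored: agent -> U(a).
    `cell_units` and `stored` are updated in place.
    ASSUMPTIONS: collected units leave the cell, and if m < n every agent
    in the cell gets the equal (fractional) share m / n.
    """
    crowd = Counter(agent_cell.values())              # A(c) per occupied cell
    for agent, cell in agent_cell.items():
        n, m = crowd[cell], cell_units.get(cell, 0)
        stored[agent] += 1 if m >= n else m / n
    for cell, n in crowd.items():
        m = cell_units.get(cell, 0)
        cell_units[cell] = m - min(m, n)              # n units taken, or all m


def team_reward(agent, team, stored):
    """r(a): units carried by the members of T(a), divided by |T(a)|."""
    members = [b for b in stored if team[b] == team[agent]]
    return sum(stored[b] for b in members) / len(members)


stored = {"a1": 0, "a2": 0, "a3": 0}                   # U(a) = 0 at the start
team = {"a1": "T1", "a2": "T1", "a3": "T2"}
cells = {"c1": 3, "c2": 1}
collect({"a1": "c1", "a2": "c2", "a3": "c2"}, cells, stored)
print(stored)                          # {'a1': 1, 'a2': 0.5, 'a3': 0.5}
print(team_reward("a1", team, stored)) # 0.75
print(cells)                           # {'c1': 2, 'c2': 0}
```

Here a1 is alone in a cell with 3 units and takes one, while a2 and a3 crowd a cell with a single unit and split it; a1’s team reward is then shared with its teammate a2.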


Current configuration. Properties:
If agents are optimal, some equilibria appear.
For suboptimal and diverse populations of agents, interesting strategies emerge. These strategies depend mostly on how the other agents behave.
Capturing the behaviour of the other agents is crucial for succeeding in this game.
Co-operation can take place, especially when using teams.


Defining and playing with simple agents. Examples:
Random agents.
Colliders: go to the nearby cell with most agents.
Evaders: go to the nearby cell with fewest agents.
Gluttons: go to the nearby cell with most energy.
Regular: follow regular patterns.
…

A very simple agent description language is being designed to describe most of them.
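Each of these simple agents can be written as a one-line policy. A minimal sketch, assuming each observation maps nearby cells to a pair (number of agents, units of energy); the real observations also identify who is in each cell, which these particular agents do not need. This is not the agent description language mentioned above, and all names are hypothetical:

```python
import random
from itertools import cycle

# Observation (assumed format): nearby cell -> (number of agents, energy units).
# Ties are broken by the order of the cells in the observation.


def collider(obs):
    """Go to the nearby cell with most agents."""
    return max(obs, key=lambda cell: obs[cell][0])


def evader(obs):
    """Go to the nearby cell with fewest agents."""
    return min(obs, key=lambda cell: obs[cell][0])


def glutton(obs):
    """Go to the nearby cell with most energy."""
    return max(obs, key=lambda cell: obs[cell][1])


def random_agent(obs, rng=random):
    """Go to a nearby cell chosen uniformly at random."""
    return rng.choice(sorted(obs))


def regular(pattern):
    """Ignore observations and repeat a fixed sequence of moves."""
    moves = cycle(pattern)
    return lambda obs: next(moves)


obs = {"north": (2, 0), "east": (0, 3), "south": (1, 1)}
print(collider(obs), evader(obs), glutton(obs))   # north east east
walker = regular(["north", "east"])
print([walker(obs) for _ in range(3)])            # ['north', 'east', 'north']
```

Because each policy is only a few tokens long, their complexity is easy to bound, which is what makes such agents convenient for relating difficulty to the agents present.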

5. DIFFICULTY ESTIMATION



Difficulty is not complexity:
An environment full of very complex agents can still be very benevolent and easy. The other agents may not compete for the rewards.
There may be shortcuts leading to very simple (and possibly non-social) good policies in very complex and chaotic situations.

Furthermore, using the complexity of the environment (and everything it contains), as done for non-social environments, only bounds the difficulty D from above by the complexity K of the environment.

Again, this relation is only unidirectional (a difficult environment must be complex, but not the other way round): a high D implies a high K.

But with other agents, this is a very loose upper bound and is not useful as a definition or approximation of D.




A solution-centred view of difficulty:
When do we say that a (social) environment is easy? What are good results?

If there are many agents (policies) leading to a good $R_i$, then we say that the environment is easy.

An environment (or a task) is said to be easy when simple policies get good results.




Complexity on the agent’s side: now we need to calculate the complexity of the agents instead: K(π).

We can parametrise a class of agents depending on their complexity.

From here, we can calculate the distribution (and the maximum) of the expected aggregated rewards for each complexity k, and plot these as functions of k.
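Given a sample of policies with their complexity k = K(π) and their expected aggregated reward, the per-k summary is a simple grouping. A minimal sketch, assuming the (k, reward) pairs have already been obtained; the function name and the data are hypothetical:

```python
from collections import defaultdict
from statistics import mean


def reward_profile(samples):
    """Summarise sampled policies by complexity.

    samples: iterable of (k, reward) pairs, where k = K(pi) and reward is
    the expected aggregated reward of pi in the environment.
    Returns {k: (mean reward, max reward, number of policies)}, sorted by k.
    """
    by_k = defaultdict(list)
    for k, reward in samples:
        by_k[k].append(reward)
    return {k: (mean(rs), max(rs), len(rs)) for k, rs in sorted(by_k.items())}


samples = [(1, 0.2), (1, 0.4), (2, 0.1), (2, 0.9), (3, 0.5)]
for k, (avg, best, count) in reward_profile(samples).items():
    print(k, round(avg, 2), best, count)
```

Plotting the mean and the maximum against k gives the curves discussed here: in an easy environment, high rewards already appear at small k.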




[Figure: two plots of reward against policy complexity k, titled “Good simple policies abound” and “Good policies are rare and complex”.]




Can we derive some numerical indicators?

We may also derive some other statistical indicators for discrimination (sparseness). We do not want environments that are easy or difficult regardless of the policy used (i.e., where all the agents score similarly).
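One possible indicator of this kind, used here only as an illustrative stand-in for the sparseness indicators mentioned above, is the spread (population standard deviation) of the rewards obtained by the sampled policies: a spread near zero means that all agents score similarly. The function name and the threshold are hypothetical:

```python
from statistics import pstdev


def discrimination(rewards, threshold=0.05):
    """Spread of the rewards R_i obtained by the sampled policies.

    Returns (spread, discriminative). A spread close to zero means all
    policies score about the same, so the environment is easy or hard
    regardless of the policy and tells agents apart poorly.
    `threshold` is a hypothetical cut-off, not a value from this work.
    """
    spread = pstdev(rewards)
    return spread, spread >= threshold


print(discrimination([0.5, 0.5, 0.51, 0.49]))  # small spread: not discriminative
print(discrimination([0.0, 0.2, 0.7, 1.0]))    # large spread: discriminative
```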




Can these plots and indicators be estimated?

Instead of a difficult upper bound, which requires calculating the complexity of the environment and its components, we need to calculate the complexity of each agent in a sample: K(π1), K(π2), …, K(πm), where m is usually large (much larger than n), and let them interact, always using the same role i (1 ≤ i ≤ n).

All this is consistent with (and gives further justification to) our previous search for minimalistic environments and agents.
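The estimation procedure amounts to a sampling loop. A minimal sketch, assuming three hypothetical callables: one that samples a policy, one that estimates its complexity, and one that plays it in role i and returns its aggregated reward. The toy stand-ins (cyclic move strings, string length as complexity, a fixed pattern followed by the other agents) are illustrative only:

```python
import random
from typing import Callable, List, Tuple


def estimate(sample_policy: Callable[[random.Random], str],
             complexity: Callable[[str], float],
             play: Callable[[str, int], float],
             m: int, role: int, seed: int = 0) -> List[Tuple[float, float]]:
    """Sample m policies and return (K(pi_j), reward of pi_j in `role`) pairs."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(m):
        pi = sample_policy(rng)
        pairs.append((complexity(pi), play(pi, role)))
    return pairs


# Toy stand-ins (hypothetical). A policy is a move string repeated cyclically,
# its complexity is the string length, and the reward is the fraction of
# steps that match the pattern followed by the other agents.
PATTERN = "LRLRLRLR"


def sample_policy(rng):
    return "".join(rng.choice("LRS") for _ in range(rng.randint(1, 4)))


def play(pi, role):
    # This toy environment is symmetric, so `role` is ignored.
    moves = (pi * len(PATTERN))[:len(PATTERN)]
    return sum(a == b for a, b in zip(moves, PATTERN)) / len(PATTERN)


pairs = estimate(sample_policy, len, play, m=1000, role=1)
print(max(reward for _, reward in pairs))
```

The resulting pairs are exactly the input needed for a per-complexity summary: the distribution and maximum of rewards for each k.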

6. CONCLUSIONS



The inclusion of many agents in an environment makes environments more unpredictable (as expected), and also much more difficult to analyse in terms of difficulty and discriminative power.

Calculating the complexity of the environment is no longer a good approach for estimating difficulty, especially because the value becomes very large when other agents abound (a very loose upper bound).

Instead, we evaluate environments by how a distribution of policies/agents performs on them. For the approximation of environment difficulty we need:
Minimalistic agent and environment descriptions.
Graphical and statistical summarisation of agent behaviour.


Social behaviour (even a primitive one) is not just the inclusion of other agents: these agents must play a role. With this approach, we do not completely rule out that other optimal, but non-social, solutions may exist for some multi-agent environments, but we have more control.

Experimentation on the current configuration will surely detect flaws and trigger improvements.

There are dozens of similar settings in multi-agent systems, artificial life, cognitive models, etc. A complete knowledge and analysis of all this is impossible. We are open to suggestions about how ideas from those areas can be useful here (spaces, reward generation, agent description language, …).

THANK YOU!



Most especially to the other members of the anYnt project:

http://users.dsic.upv.es/proy/anynt/

for their joint work, ideas, material, software, experiments, patience and support:
David L. Dowe, Computer Science and Software Engineering Dept., Monash, Australia.
M. Victoria Hernández-Lloreda, Dpto. de Metodología de las Ciencias del Comportamiento, UAM, Spain.
Sergio España, DSIC, UPV, Spain.
