
In social systems anticipatory information is reduced to produce sub-systems

Anonymous Author(s)
Affiliation
Address
email

Abstract

We propose an anticipatory information measure and apply it to social systems, showing that social differentiation, as Luhmann suggested, is achieved by reducing the amount of sensory information and thus the complexity of the perceived environment from the agent's perspective. This anticipatory information can be measured for adaptive controllers using Hebbian learning, input correlation learning (ICO/ISO) and temporal difference learning.

1 Introduction

Information measures are usually defined on input/output systems, where they determine the quality of the transmission. Behaving agents, however, act as closed loop systems where there is no clearly defined difference between input and output. What matters for the organism is to compensate disturbances introduced by the environment in the perception-action loop. If there is no disturbance the organism cannot differentiate between "itself" and the environment. Consequently, the concept of information in these systems needs to be revised [3].

A way of defining closed loop information has been proposed by Ashby as the so-called requisite variety principle. The idea behind that measure is that closed loop systems aim to maintain a desired state. Such a state is measured at the input of the controller or the organism. The goal of a feedback loop is then to minimise the deviation from the desired state. In order to quantify the information in such a closed loop system, the measure of "requisite variety" was introduced by Ashby [1] in the early 50s. This measure defines the number of bits needed to successfully compensate a disturbance acting on the feedback loop. In this way it quantifies the variety, or bits, originating from the disturbance. For example, if the disturbance has a variety of 10 bits and survival requires a desired state of 2 bits, then each reaction to the disturbance must provide a variety of at least 8 bits.
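This bookkeeping can be stated compactly (our notation, not Ashby's original formulation):

$$H(E) \;\ge\; H(D) - H(R)$$

where $H(D)$ is the variety of the disturbance, $H(R)$ the variety of the regulator's reactions and $H(E)$ the variety remaining at the desired state; with $H(D) = 10$ bits and $H(E) = 2$ bits, the reactions must supply $H(R) \ge 10 - 2 = 8$ bits.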

While the requisite variety is a good measure to describe pure feedback systems, closed loop measures for adaptive systems are rare. The first adaptations of Shannon's communication theory to closed loop systems were introduced by Touchette et al. [13], with the analysis of controllability and observability, and by Tishby et al. [12], with the bottleneck principle, which relies on the information relevant for an agent to predict or compress a signal; nevertheless, they did not consider the most important case: learning on top of a feedback loop. A reactive system represented by a feedback loop can be seen as non-optimal because it only reacts after a deviation from its desired state has happened: the controller does not exploit causality to predict future events. Closed loop controllers are exposed to structured sensory information [9], thus bio-inspired controllers can be provided with a predictive signal (like vision) and a reflexive signal (like touch). Learning then has the task of avoiding the trigger of the reflexive reaction. A useful information measure in this context is one which detects which information is gained during learning to successfully avoid triggering the reflex. In this paper we present an information measure, called anticipatory information, which can be applied to any error-signal driven learning system that calculates predictions in order to minimise this error signal. Here, we will use a closed loop correlation based learning rule [10]. It correlates predictive signals with a reflexive signal. The weight update rule is such that the controller learns to exploit the causal relation between the predictive signal and the error signal.

Our closed loop information measure will then be applied to a social system of agents. Social systems are systems where agents interact with each other and change their behaviour adaptively. Luhmann proposed that differentiation of social systems is achieved through the formation of subsystems which no longer process all information available from the environment but only a fraction of it. In terms of our anticipatory information this means that as soon as the agents differentiate themselves into subsystems they will use less predictive information. The paper is divided into a methods section, where we explain the agent configuration and the formula to compute the anticipatory information, an application of the measure to adaptive obstacle avoidance, an application of the measure to a social system, and a discussion.

2 Predictive learning

For our simulations we are using Braitenberg [4] vehicles that have 2 lateral wheels and 2 antennas (one left, one right). The antennas are able to discriminate between different objects, such as walls, agents and food places, and generate delta pulses when in contact with them. The next section describes how the controller learns.

2.1 Learning rule

The difference between the left and right far antennas provides $x_1$ and the difference between the left and right short antennas provides $x_0$. The band pass filters in Fig. 1 generate $u_0$, $u_1$, which are damped waves if $x_0$, $x_1$ are delta pulses. The transfer function of the band pass filter is specified in the Laplace domain as:

$$h(t) \leftrightarrow H(s) = \frac{1}{(s+p)(s+p^{*})} \qquad (1)$$

$$h(t) = \frac{1}{b}\,e^{at}\sin(bt), \qquad a = -\frac{\pi f}{q}, \qquad b = \sqrt{(2\pi f)^2 - a^2} \qquad (2)$$

where $p^{*}$ represents the complex conjugate of the pole $p = a + ib$, $f$ is the oscillation frequency and $q$ is the quality factor of the filter. ICO correlates the predictive signal $u_1$ with the reflexive signal $u_0$ according to the formula:

$$\frac{d\omega_1}{dt} = \mu \cdot u_1 \cdot \frac{du_0}{dt} \qquad (3)$$

Then the output $z$ of the controller is used to control the steering angle of the robot such that an obstacle on the left ($u_0 > 0$) will produce an anticlockwise turn, whereas an obstacle on the right ($u_0 < 0$) will produce a clockwise turn. The controller learns to avoid the error signal $u_0$ using the predictive signal $u_1$. Fig. 1(A) illustrates how the learning is achieved and (B) describes how the agents interact with the world. A purely reactive agent has only a reflexive behaviour via $u_0$ and will never learn to avoid the loop error signal $u_0$: it will touch the obstacle and produce trajectories like (1). When the agent starts to learn ($\omega_1 > 0$) it will use $u_1$ to prevent $u_0$, thus avoiding the obstacle before touching it, like the trajectories in (2). Fig. 1(C) shows how the reflex signal is shifted forward in time and reduced in amplitude due to the anticipatory motor reaction of the controller.
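As an illustration of Eqs. (1)-(3), the following Python sketch filters the raw antenna pulses and applies the ICO weight update in an open-loop fashion (in the real agent the motor output $z$ feeds back through the environment). All parameter values and names here are our own assumptions, not taken from the original setup.

```python
import numpy as np

def bandpass_impulse_response(f, q, dt, duration=5.0):
    """Sampled impulse response h(t) = (1/b) exp(a t) sin(b t) of the
    two-pole band pass filter of Eq. (2), with pole p = a + ib."""
    a = -np.pi * f / q
    b = np.sqrt((2.0 * np.pi * f) ** 2 - a ** 2)
    t = np.arange(0.0, duration, dt)
    return np.exp(a * t) * np.sin(b * t) / b

def ico_update(x0, x1, f=0.1, q=0.6, dt=0.01, mu=0.01):
    """Apply the ICO rule of Eq. (3), dw1/dt = mu * u1 * du0/dt, to the
    band pass filtered reflexive (u0) and predictive (u1) pathways."""
    h = bandpass_impulse_response(f, q, dt)
    u0 = np.convolve(x0, h)[:len(x0)] * dt   # reflexive pathway
    u1 = np.convolve(x1, h)[:len(x1)] * dt   # predictive pathway
    w1 = np.zeros(len(x0))
    for k in range(1, len(x0)):
        du0 = (u0[k] - u0[k - 1]) / dt       # derivative of the reflex
        w1[k] = w1[k - 1] + mu * u1[k] * du0 * dt
    return u0, u1, w1
```

With a delta pulse on $x_1$ followed after a fixed delay by a delta pulse on $x_0$, the weight $w_1$ grows, since $u_1$ is still positive while $du_0/dt$ rises; this is the causal pairing the rule exploits.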

2.2 Anticipatory information

We are looking for an information measure which grows when the agent is using anticipatory information and is zero when the agent is not able to predict its reflex. The agent is not learning when a $u_1$ impulse is always followed by a $u_0$ impulse of the same amplitude, while a successfully learning agent significantly reduces the amplitude of $u_0$. An ideal learner will be able to reduce $u_0$ completely to 0. Fig. 2 shows a typical temporal diagram of events for a non-learning agent (A) and for a learning one (B). If one takes the cross correlation between $u_1$ and $u_0$ for both cases, it is possible to tell from the cross correlation which agent is learning and which one is not. For instance, the non-learning agent has a cross correlation that is neither shifted in time nor reduced in amplitude, like in Fig. 2(C), while a learning agent has a cross correlation that is shifted in time and reduced in amplitude, like in Fig. 2(D).


Figure 1: A) Schematic diagram of the closed-loop learning system with inputs $x_0$ and $x_1$, synaptic weights $\omega_0$ and $\omega_1$ and motor output $z$. $P_0$ and $P_1$ are the transfer functions of the reflexive and the predictive pathway. The BP block is a 2-pole band pass filter. B) Agent setup with short antennas (reflexive inputs, $x_0$) and long antennas (predictive inputs, $x_1$). The agent is learning to avoid obstacles and walls using its short and long antennas. The motor reaction will reduce the intensity of the painful reflex $x_0$ as well as delay its occurrence. C) Schematic diagram of the input correlation learning rule and the signal structure [10]. $u_0$ and $u_1$ are respectively the differences between the filtered values of the left and right antennas of the agent. During learning the $u_0$ peak will be shifted in time and reduced in amplitude.

Figure 2: A) Illustration of the signals $u_1$, $u_0$ of a non-learning agent. The peaks are periodic, out of phase and of the same amplitude. B) Same time diagram for a learning agent. C) Cross correlation of $u_1$ and $u_0$ for the non-learning case. D) Cross correlation of $u_1$ and $u_0$ for the learning case.


How do we quantify this reduction in terms of bits? The anticipatory information $AI$ is computed in 2 steps:

$$cc(t) = \max_{\tau=-W/2}^{\tau=W/2}\big(u_1(t) \cdot u_0(t+\tau)\big) \qquad (4)$$

$$AI(t) = -\log_2\!\left(\frac{cc(t)}{cc(0)}\right) \qquad (5)$$

$$0 \le \frac{cc(t)}{cc(0)} \le 1 \qquad (6)$$

$cc(t)$ is the maximum of the cross correlation between the error signal and the predictive signal in the time window $W$, which must be sufficiently large to capture at least one pairing of $u_1$ with $u_0$ when learning is off. $cc(0)$ is the maximum of the cross correlation computed when the agent is not learning, whereas $AI(t)$ takes the ratio between the current and the initial cross correlation; the argument of the logarithm thus ranges from 0 to 1, because the correlations following the first one can at most be equal to the first one. When learning is off, the agent's predictive signal precedes the reflex signal, whose amplitude is not reduced, hence $AI \simeq 0$; when learning is on, the agent learns to reduce the error $u_0$ using an earlier motor reaction elicited by $u_1$, thus $AI \gg 0$ in the ideal case of perfect learning. In terms of information, if the agent is learning continuously the number of bits of $AI$ will increase in time until the agent has completely avoided the reflex. In the next section we benchmark $AI$ on a classical task, obstacle avoidance. We then compare $AI$ for the learning and the non-learning case and see the difference in terms of information reduction.
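As a concrete reading of Eqs. (4)-(6), the sketch below computes $cc(t)$ over consecutive windows and derives $AI(t)$; the discretisation and all names are our own assumptions.

```python
import numpy as np

def anticipatory_information(u1, u0, dt, W):
    """AI(t) per Eqs. (4)-(5): per time window of length W, take the
    maximum of the cross correlation between u1 and u0 over all lags,
    then the negative log2 of the ratio against the first window, in
    which learning is assumed to be switched off (this gives cc(0))."""
    n = int(W / dt)                        # samples per window
    cc = []
    for w in range(len(u1) // n):
        a = u1[w * n:(w + 1) * n]
        b = u0[w * n:(w + 1) * n]
        cc.append(np.max(np.correlate(a, b, mode="full")))
    cc = np.array(cc)
    ratio = np.clip(cc / cc[0], 1e-12, 1.0)   # Eq. (6): ratio in (0, 1]
    return -np.log2(ratio)                    # bits; zero when not learning
```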

3 Avoidance case application

In this simple case there are 2 agents navigating in a rectangular space with 2 obstacles placed randomly in the environment (see Appendix A). The software used to simulate the agents is Enki¹, an open source simulator for multiple robots interacting on a flat surface. The simulator implements collisions and physics support (like slip, friction etc.) and features 4 realistic robots. For our simulations we used a group of Alice robots. Every simulation is run for $T = 30$ minutes with a time step of $\Delta = 0.01$ s and a correlation window $W = 10$ seconds; therefore $cc(t)$ is computed for $t = 1, 2, \ldots, 180$. The anticipatory information is averaged over 100 simulations; every simulation is randomised in the initial positions of the robots and obstacles. In the first time window the learning of the agents is switched off and $cc(0) \simeq 0.7071$, because the reflex of the agent is already preventing a full-force impact, whereas $cc(t > 0) \le 0.7071$ because the impact of the collision can only stay equal or decrease. Fig. 3(B) shows that $AI$ increases and stabilises at 6 bits when the agents have learned successfully: they are using 6 bits in the predictive loop to reduce the reflex. Non-learning agents, instead, are not using any bits to reduce the reflex. The noise visible in Fig. 3 is due to the random repositioning of the obstacles and to random collisions. In a perfect world, if the agents are able to avoid perfectly, $cc(t) \simeq 0$ and so $AI(t) \gg 0$.
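As a worked check of Eq. (5) against these numbers (our own arithmetic, not from the paper):

$$AI(t) = 6 \text{ bits} \;\Rightarrow\; cc(t) = cc(0)\cdot 2^{-6} \approx 0.7071 \cdot 0.0156 \approx 0.011,$$

i.e. a stable $AI$ of 6 bits means the residual correlation between prediction and reflex has dropped to about 1.6% of its initial value.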

4 Social system application

We now apply the measure to a social system like the one described by Di Prodi et al. [8]: a social system whose task is cooperative food foraging. As in the avoidance case, the agents learn how to use the distal sensors to approach food (to increase their energy) or other agents (to take their energy). Agents can forage directly from the food patches (see Fig. 4(B)) or reduce the energy of other agents who have previously got food (see Fig. 4(D)). Thus every agent has two competitive signals: one from the food patches and one indicating the energy level of the other agents. Indeed, when the antennas are in contact with another agent with high energy they produce impulses on $x_{1,e}$ for far contacts and on $x_{0,e}$ for near contacts, whereas when they are in contact with a food patch they produce impulses on $x_{1,f}$ for far contacts and on $x_{0,f}$ for near contacts. Therefore the agent has 2 learning weights, $\omega_{1,e}$ for energy and $\omega_{1,f}$ for food, which both contribute to the motor output. When the simulation starts all agents have $\omega_{1,e} = \omega_{1,f}$ and therefore they approach any object, because according to the situation they will choose either the food or the nearby agent.

1http://home.gna.org/enki/ 


Figure 3: A) 2 agents are learning to avoid obstacles in a closed rectangular world. At the very left the agent is only reacting to the obstacle, thus $AI \simeq 0$; at the very right the agent has learnt successfully how to exploit the predictive signal $u_1$ to avoid the contact, thus $AI > 0$. B) Anticipatory information computed for 2 learning agents. When learning is stable it reaches a baseline level, $AI(t) \simeq 6$ bits. If agents are not learning the anticipatory information is low, $AI(t) \simeq 0.5$ bits, because $cc(0) \simeq 0.7071$.


Nevertheless, during the simulation some agents (the seekers) will become more attracted by the food ($\omega_{1,e} < \omega_{1,f}$), while the others (the parasites) will become more attracted by the other agents with energy ($\omega_{1,e} > \omega_{1,f}$). An agent changes class or behaviour when the weights are swapped, i.e. a seeker ($\omega_{1,e} < \omega_{1,f}$) becomes a parasite ($\omega_{1,e} > \omega_{1,f}$) or vice versa, thus contributing to the system instability. The bar diagram of Fig. 4(E) shows for every time window how many times this swap has happened. The system self-stabilises to a number of seekers and parasites that depends on the available resources [8]: in this case, with 4 food resources, there are 6 parasites and 4 seekers. With more resources, like 10, there will be 6 seekers and 4 parasites. Luhmann theorised that sub-systems are formed to reduce the complexity of the perceived environment: in this case it means that agents are discarding part of the closed loop information. This process is shown in Fig. 4 by computing $AI$ for the energy and for the food signal. $AI(u_{1,e}, u_{0,e}) = AI_{energy}$ represents the anticipatory information for the energy attraction and $AI(u_{1,f}, u_{0,f}) = AI_{food}$ that for the food attraction. For seekers, $AI_{food} > AI_{energy}$, increasingly so as the system differentiates, and conversely for the parasites $AI_{energy} > AI_{food}$. In terms of information this means that seekers are using $AI_{food} - AI_{energy} = 2$ bits more to reduce the food signal, whereas parasites are using $AI_{energy} - AI_{food} = 2$ bits more to reduce the energy signal.
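A minimal sketch of how the two competing pathways and the class bookkeeping could look in code, assuming one ICO weight per pathway as in Section 2.1 (all names and signatures are our own assumptions):

```python
import numpy as np

def motor_output(u1_e, u1_f, u0_e, u0_f, w1_e, w1_f, w0=1.0):
    """Both learned pathways (energy, food) and both reflexes
    contribute to the steering output z of a foraging agent."""
    return w0 * (u0_e + u0_f) + w1_e * u1_e + w1_f * u1_f

def classify(w1_e, w1_f):
    """An agent is a seeker while w1_e < w1_f (food preferred) and a
    parasite while w1_e > w1_f (energy of other agents preferred)."""
    return "seeker" if w1_e < w1_f else "parasite"

def count_swaps(w1_e_trace, w1_f_trace):
    """Count class changes over time, as in the bar diagram of Fig. 4(E)."""
    labels = [classify(e, f) for e, f in zip(w1_e_trace, w1_f_trace)]
    return sum(a != b for a, b in zip(labels, labels[1:]))
```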

Figure 4: A) A parasite is an agent that prefers an energy signal to a food one. D) A seeker is an agent that prefers a food signal to an energy one produced by another agent. B) For parasites the anticipatory information for food is lower than that for energy. C) For seekers the anticipatory information for energy is lower than that for food. E) There are a total of $N = 10$ agents, of whom 6 turn into parasites and 4 turn into seekers after 20 time windows. The system stabilisation is a function of time: at the very beginning agents are using both signals and their behaviour is unpredictable because they are switching between the 2 competitive behaviours. After 20 time windows the agents have a more predictable behaviour, resulting from the selection of the information. Minor oscillations are again due to the noise resulting from the interaction between learning agents.


5 Discussion

In this paper we presented a novel closed loop predictive information measure and used it to demonstrate the formation of subsystems. In brief, this means that the agents use less information to successfully minimise their error signals than in the case of no subsystems. Luhmann theorised that sub-systems are formed to reduce the perceived complexity of the environment: here agents can discard either the food signal or the energy signal. Indeed, we found different $AI$s for the 2 different signals: for the food seekers the $AI$ mainly comes from the sensors which sense the food, whereas the parasites' $AI$ mainly comes from the energy signals of the other agents. Thus, we conclude that predictive learning in a social context leads to the formation of subsystems, which could be demonstrated with the help of $AI$. The $AI$ measure is unique for the following reasons:

• it can be applied to social systems

• it does not depend on the learning algorithm

• it is a natural continuation of Ashby’s requisite variety

As a comparative summary we can mention previous work done in the closed loop case by Polani, Lungarella and Ay, who use information theoretic cost functions to optimise the agent's behaviour:

• Polani [5] evolves controllers to maximise the information transfer of the sensory-motor loop and discovers that, to use memory efficiently, the controllers perform compression.

• Lungarella [9] uses mutual information to generate information structure by motor feedback.

• Ay [2] maximises the excess entropy (the mutual information between past and present) of the agent's input, changing the controller's parameters to achieve a working regime (exploratory and sensitive to the environment) for the robot.

There is a substantial difference between the aforementioned approaches and ours: we use a correlation based measure to quantify the learning ability of a general adaptive controller, while the others use statistical measures such as probabilities as a learning signal for their controllers. Indeed, our approach does not suffer from the problem of the large sampling time required to estimate probabilities for entropy and is easy to compute, because the only parameter to choose is the time window's length.

There is a similarity of results with a recent experiment by Lungarella et al. [7], who computed entropy measures on a saliency-driven attention task in which a camera foveates red blocks. The entropy for the foveation case is lower than for the random case: this means that a closed loop system induces statistical regularities in the information flow. Our results from the social system application yield a similar concept: agents regularise their input by selecting information, which in turn affects the motor behaviour. There is a mutual relation between perception, information and action: agents select the information, which in turn changes their behaviour and their predictability. Indeed, in Fig. 4(E) the system is unstable when every agent is using all the information and thus producing a non-predictable behaviour, but when agents start to select the relevant information they simplify their behaviour and intrinsically reinforce the stability of the system, since agents mutually benefit from the increased predictability. Another similar result was produced by Kulvicius et al. [6], who correlate the peak location $\tau$ of the cross correlation with the weight change development and show that for more complex environments $\tau$ has a larger deviation compared to simpler environments. Anticipatory information can also be applied to reinforcement learning: temporal difference learning [11] uses incremental error correction to update a prediction function that predicts future values; if one applies the $AI$ between the actual and predicted future values, the agent is learning when the error signal is reduced and thus the number of bits increases. In future work we would like to find an analytical model of social systems to verify which information approach is most suitable to understand how information is reduced to form sub-systems.
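To make the connection to temporal difference learning concrete, here is a hedged sketch, entirely our own construction, of a tabular TD(0) learner whose per-episode prediction error plays the role of the reflex amplitude in an $AI$-style measure in the spirit of Eq. (5):

```python
import numpy as np

def td0_with_ai(rewards, transitions, n_states, episodes=200,
                alpha=0.1, gamma=0.9):
    """Tabular TD(0) on a fixed chain of transitions; the mean absolute
    TD error per episode stands in for the reflex amplitude, and an
    AI-style measure is the -log2 ratio against the first episode."""
    V = np.zeros(n_states)
    err = np.empty(episodes)
    for ep in range(episodes):
        deltas = []
        for s, s_next in transitions:
            delta = rewards[s] + gamma * V[s_next] - V[s]  # TD error
            V[s] += alpha * delta
            deltas.append(abs(delta))
        err[ep] = np.mean(deltas)
    ai = -np.log2(np.clip(err / err[0], 1e-12, 1.0))       # bits "gained"
    return V, ai

# Example: 5-state chain 0 -> 1 -> 2 -> 3 -> 4 with a reward of 1
# delivered on the transition 3 -> 4:
# V, ai = td0_with_ai(rewards=[0, 0, 0, 1, 0],
#                     transitions=[(i, i + 1) for i in range(4)],
#                     n_states=5)
```

As the value predictions converge, the TD errors shrink and the measure grows, mirroring how $AI$ grows as the reflex is avoided.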

6 Appendix A

There are $N = 2$ or $4$ agents and $M = 2$ food places. The agent's area is $A_{agent} = 5U^2$, where $U$ is the unit of measure used in the simulations. The food place's area is $A_{food} = 28.36U^2$. The area of the world is proportional to the areas of the agent and food disks and to their numbers:

$$A_{world} = (K_o \cdot A_{food} \cdot M) + (K_a \cdot A_{agent} \cdot N) \qquad (7)$$


where $K_o = 50$ and $K_a = 60$. The world is a square with a side of $L = \sqrt{A_{world}}$. For $N = 2$ and $M = 2$: $A_{world} = 3436U^2$ and $L = 58.617$.
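As a worked check of Eq. (7) with these constants (our own arithmetic):

$$A_{world} = 50 \cdot 28.36U^2 \cdot 2 + 60 \cdot 5U^2 \cdot 2 = 2836U^2 + 600U^2 = 3436U^2, \qquad L = \sqrt{3436} \simeq 58.617$$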

References

[1] W. Ashby. Design for a Brain. Chapman & Hall, 1952.

[2] N. Ay, N. Bertschinger, R. Der, F. Guttler, and E. Olbrich. Predictive information and explorative behavior of autonomous robots. The European Physical Journal B, 63(3):329-339, June 2008.

[3] Bernd Porr and Florentin Worgotter. Inside embodiment: what means embodiment to radical constructivists? Kybernetes, pages 105-117, 2005.

[4] V. Braitenberg. Vehicles: Experiments in synthetic psychology. MIT Press, 1984.

[5] A.S. Klyubin, D. Polani, and C.L. Nehaniv. Organization of the information flow in theperception-action loop of evolved agents. pages 177–180, June 2004.

[6] T. Kulvicius, C. Kolodziejski, and M. Tamosiunaite. Behavioral analysis of differential hebbian learning in closed-loop systems. Submitted to Neural Computation, 2009.

[7] M. Lungarella, T. Pegors, D. Bulwinkle, and O. Sporns. Methods for quantifying the informational structure of sensory and motor data. Neuroinformatics, pages 1539-2791, 2005.

[8] Paolo Di Prodi, Bernd Porr, and Florentin Worgotter. Adaptive communication promotes sub-system formation in a multi agent system with limited resources. In LAB-RS '08: Proceedings of the 2008 ECSIS Symposium on Learning and Adaptive Behaviors for Robotic Systems, pages 89-96, 2008.

[9] R. Pfeifer, M. Lungarella, O. Sporns, and Y. Kuniyoshi. On the information theoretic implications of embodiment - principles and methods. In Proc. of the 50th Anniversary Summit of Artificial Intelligence, pages 76-86, 2008.

[10] Bernd Porr and Florentin Worgotter. Strongly improved stability and faster convergence of temporal sequence learning by utilising input correlations only. Neural Computation, 18(6):1380-1412, 2006.

[11] R.S. Sutton and A.G. Barto. Reinforcement learning: An introduction. MIT Press, 1998.

[12] Naftali Tishby, Fernando C. Pereira, and William Bialek. The information bottleneck method,April 2000.

[13] Hugo Touchette and Seth Lloyd. Information-theoretic limits of control. Phys. Rev. Lett.,84(6):1156–1159, Feb 2000.
