Available online at http://www.idealibrary.com on doi:10.1006/bulm.2000.0207
Bulletin of Mathematical Biology (2001) 63, 117–134

A Model for the Interaction of Learning and Evolution

H. DOPAZO*†, M. B. GORDON‡, R. PERAZZO† AND S. RISAU-GUSMAN†‡

†Centro de Estudios Avanzados, Universidad de Buenos Aires, Uriburu 950, 1114 Buenos Aires, Argentina. E-mail: [email protected]

‡CEA Grenoble, Département de Recherche Fondamentale sur la Matière Condensée, 17, rue des Martyrs, 38054 Grenoble Cedex 9, France

We present a simple model in order to discuss the interaction of the genetic and behavioral systems throughout evolution. This considers a set of adaptive perceptrons in which some of their synapses can be updated through a learning process. This framework provides an extension of the well-known Hinton and Nowlan model by blending together some learning capability and other (rigid) genetic effects that contribute to the fitness. We find a halting effect in the evolutionary dynamics, in which the transcription of environmental data into genetic information is hindered by learning, instead of stimulated as is usually understood by the so-called Baldwin effect. The present results are discussed and compared with those reported in the literature. An interpretation is provided of the halting effect.

© 2001 Society for Mathematical Biology

1. INTRODUCTION

The interaction between the adaptive abilities of organisms and their evolution has been a matter of discussion ever since Lamarck suggested that acquired characters could be inherited. At the end of the last century, Baldwin (1896) addressed the problem of the interaction of learning and evolution. His basic suggestion is that what had to be learned in previous generations appears genetically encoded in later ones. Although this process suggests a Lamarckian mechanism of evolution, Baldwin emphasized that this can be achieved within a pure Darwinian framework.

*Author to whom correspondence should be addressed.

Now affiliated with Lab. Biología del Comportamiento, IByME, Vuelta de Obligado 2490, (1428) Buenos Aires, Argentina.

0092-8240/01/010117 + 18 $35.00/0 © 2001 Society for Mathematical Biology



More recently, Waddington (1942) showed that a character whose development originally depends on an environmental stimulus becomes genetically fixed and independent of it. Waddington called this process genetic assimilation, suggesting that natural selection favors those genetic combinations that most readily respond to the environmental stimulus. Further experiments (Ho et al., 1983; Scharloo, 1991) confirmed his results and interpretations.

The Baldwin effect and genetic assimilation share a common Darwinian mechanism with similar consequences on the evolutionary process. This was summarized by Maynard Smith (1987): 'If individuals vary genetically in their capacity to learn, or to adapt developmentally, then those most able to adapt will leave more descendants, and the genes responsible will increase in frequency. In a fixed environment, when the best thing to learn remains constant, this can lead to the genetic determination of a character that, in earlier generations, had to be acquired afresh each generation'. More recently, Jablonka and Lamb (1995) summarized these ideas by defining the Baldwin effect as that 'seen when the environmental induction of a physiological or behavioral adaptation allows a population to survive long enough for the accumulation by selection of similar constitutive hereditary changes'.

Hinton and Nowlan (1987) (H&N) wrote a seminal paper to provide a theoretical framework for the discussion of the Baldwin effect. They considered a population of haploid individuals, each one having a neural network with L potential connections defined by three allelic forms. The allele 1 (−1), hereafter called 'fixed', represents that the connection is present (absent). A ? allele generates instead a flexible connection that can be adapted during the life of the individual through a random search that emulates a learning process. Fitness depends on the effectiveness of such a learning process and is therefore nothing but a (probabilistic) measure of the distance of the individual's genotype to the optimum.

The H&N model was simulated using genetic algorithms (Goldberg, 1989), in which mutation and cross-over were included. The results show that natural selection rapidly eliminates the wrong alleles −1 and gradually replaces the plastic ? alleles by the optimal values 1. It is further found that the evolution towards a population composed of optimal individuals is greatly accelerated with respect to the one in which plastic alleles are absent. These results were confirmed numerically, and thoroughly discussed, by Fontanari and Meir (1990), Belew (1990), Ackley and Littmann (1992), Harvey (1993) and French and Messinger (1994). Since no mechanism is explicitly introduced to fix in the gametes the result of learning, it is concluded that Baldwin's conjecture actually takes place.

The H&N model provides a framework to account for the interaction of the genetic and behavioral systems throughout the evolutionary process. In particular, it shows that the evolution towards the optimal genotype is greatly accelerated by learning. It is, however, easy to overstate the conclusions that can be extracted from this model. One should note that this acceleration with respect to a pure random search should not be considered a surprising result. It is instead a natural consequence of the Darwinian selection allowed through the explicit introduction of a fitness function. As the only behavioral feature that confers fitness is the individuals' learning ability, this model cannot address the relevant question of whether learning accelerates the evolution with respect to other evolutionary paths; in particular, those followed by populations with phenotypic features expressing fixed genetic information.

The purpose of the present paper is to go one step further than the H&N model by endowing the individuals with a richer genotype–phenotype structure. We aim at blending into a single picture both some learning capability and other genetic epistatic effects that contribute to the selection process in the absence of genetic plasticity.

Within the present wider framework the evolutionary paths result from the contributions to the fitness of fixed and flexible genetic information. This allows us to render visible an effect that has been suggested by Mitchell (1996): '[...] if the learning procedure were too sophisticated [...] there would be little selection pressure for evolution to move from the ability to learn the trait to the genetic encoding of that trait'. This hindrance (instead of a stimulation) of the transcription of the environmental data falls beyond the H&N model. In fact, the latter corresponds to the very particular situation in which genetic plasticity constitutes the only (narrow) road to reach the optimum. A second but equally plausible particular case is the trivial situation of a population of ('rigid') individuals that obtains fitness simply through the (Hamming) closeness to the optimal genetic information. The consequence of endowing the individuals with both fixed and adaptable features is that evolution can proceed along less restricted paths in the fitness landscape.

In Section 2 we present the model. It has the following properties: firstly, each individual is endowed with a phenotype that processes data from the environment in order to provide a response to it. Secondly, the reproductive success of each individual is defined in terms of the performance of the phenotype, arising from its processing ability. Thirdly, phenotypes may also undergo a learning process that takes advantage of some genetic plasticity and that is regulated by their processing ability. In Section 3 we discuss two extreme particular cases of the model within the same mean field analytic approximation used by Fontanari and Meir (1990). The results are presented in Section 4 and the final discussions are left to Section 5.

2. THE MODEL

2.1. The population. We consider a haploid population in which each individual is represented by a neural network. We choose the simplest kind of net, having L input ports that are assumed to receive data from the environment and a single output neuron that is assumed to provide the response of the individual. This type of network is called a perceptron (Minsky and Papert, 1969). The connections between the input ports k = 1, 2, ..., L and the output neuron have synaptic efficacies w ≡ (w1, w2, ..., wL). We restrict our model by considering 'Ising perceptrons' in which wk = ±1. This assumption does not introduce any essential limitation, but provides a convenient simplification of the algebra.

The inputs of the network that convey data from the environment are represented by pattern vectors ξ ≡ (ξ1, ξ2, ..., ξL). We also assume that ξk = ±1 (k = 1, 2, ..., L). The response of each individual to the input data is determined by the signal produced by the output neuron. This is taken to be the sign of the sum of all the inputs weighted by the synaptic efficacies, namely:

σ = sign(w · ξ). (1)

From equation (1) it follows that perceptrons are only able to successfully classify sets of input vectors that can be separated by a single hyperplane, normal to w, in the input space of L dimensions. The two possible values of the classification σ = ±1 may be considered as two different behavioral responses.
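For concreteness, the response rule of equation (1) can be sketched in a few lines of Python (the weight and pattern values below are hypothetical examples, not taken from the paper):

```python
def response(w, xi):
    """Behavioral response of an Ising perceptron, equation (1):
    the sign of the input pattern weighted by the synaptic efficacies."""
    s = sum(wk * xk for wk, xk in zip(w, xi))
    return 1 if s > 0 else -1

# Hypothetical example with L = 5 (odd L, so w . xi is never zero)
w = [1, -1, 1, 1, -1]    # synaptic efficacies, wk = +/-1
xi = [1, 1, 1, -1, -1]   # input pattern, xik = +/-1
print(response(w, xi))   # -> 1, one of the two behavioral responses
```

Here w plays the role of a single individual's connection vector; the hyperplane normal to w separates the patterns classified as +1 from those classified as −1.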

In order to obtain an intuitive picture of this model, one could think that each component of the input vectors ξ encodes the presence (ξk = +1) or the absence (ξk = −1) of some attribute of, say, a fruit (i.e., a particular color, smell, texture, etc.). The perceptrons are asked to choose which fruits can be eaten and which should be discarded. We assume that there is an optimal classification of fruits into those that are edible and those that are not. We further assume that such an optimal classification is provided by a 'reference or teacher perceptron'† that is assumed to remain unchanged throughout the evolutionary process. Those individuals that have many synaptic connections different from those of the reference will make many mistakes in the classification task. Conversely, those whose synaptic connections are similar to the ones of the reference will be more successful in choosing food‡.

In order to study the evolutionary dynamics using genetic algorithms we assume that each individual is specified by a genotype consisting of a string of L loci, in which the synaptic connections are encoded. In order to endow the perceptrons with genetic plasticity we assume, in the same fashion as in the H&N model, that each locus of the genotype can be occupied by one of three possible alleles: 1, −1 and ?. The first two alternatives correspond to fixed synaptic connections while the last represents a flexible link that is allowed to change during the lifetime of the individual, acquiring the values ±1 in accordance with some (fixed) updating algorithm. Each genotype is therefore characterized by the three integers P, R and Q (0 ≤ P, Q, R ≤ L, with P + Q + R = L) that, respectively, specify the number of 1, −1 and ? alleles.

†The name 'teacher perceptron' is usual in the neural networks literature. Here we prefer 'reference' because in the present context 'teacher' may convey the somewhat inappropriate idea that a cultural transmission is taking place.

‡The reason for choosing a perceptron as a reference to provide an optimal classification is only a matter of mathematical convenience. Resorting to this artifice one can be sure that there is a unique optimal genetic configuration. If a classification surface different from a hyperplane were chosen, it is in general possible to find several optimal perceptrons.

The phenotype of each individual is the result of the interaction of the genetic information with a set of M environmental stimuli {ξ^µ}, µ = 1, ..., M. The behavioral response is the output σ = ±1 classifying the input stimulus into one of the two possible categories. Individuals are also assumed to undergo an updating process of the Q flexible alleles according to a (fixed) algorithm that we will explain shortly.

This updating process could be regarded as a learning protocol. Think for instance of the pleasant or unpleasant taste of some food that has been eaten. As a consequence, some adjustments of the flexible synapses may be performed to improve the classification scheme before the next stimulus from the environment is received. Within our model this feedback loop arises from the comparison of the classification performed by the individual and the one provided by the 'reference' perceptron.

To update the synaptic efficacies we use a simple version of the 'on line' learning algorithms extensively studied in the neural networks literature [see, e.g., Hertz et al. (1991)]. The procedure is as follows:

Learning protocol.

• If the classification of the input pattern is correct, no changes are produced.
• Else, determine the minimum number of synapses nmin that have to change in order to flip the sign of the output signal.
• The updating consists in flipping either nmin synapses at random among the Q flexible ones, or all of them if nmin > Q.

This learning protocol is repeated each time an example is presented.
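A minimal Python sketch of this protocol (the function names are ours; the rule for nmin uses the fact that flipping one synapse changes w · ξ by ±2, and assumes odd L so the weighted sum is never zero):

```python
import random

def output(w, xi):
    """Sign of w . xi, as in equation (1)."""
    return 1 if sum(wk * xk for wk, xk in zip(w, xi)) > 0 else -1

def learn(w, flexible, xi, ref_out, rng=random):
    """One application of the learning protocol.
    w: synaptic efficacies (+/-1), modified in place.
    flexible: indices of the Q flexible synapses.
    ref_out: classification given by the reference perceptron."""
    if output(w, xi) == ref_out:
        return                               # correct: no changes
    s = abs(sum(wk * xk for wk, xk in zip(w, xi)))
    nmin = (s + 1) // 2                      # flips needed to change the output sign
    # flip nmin flexible synapses at random, or all Q of them if nmin > Q
    for k in rng.sample(flexible, min(nmin, len(flexible))):
        w[k] = -w[k]
```

For instance, with w = (1, 1, 1), all synapses flexible, and a reference output of −1 on ξ = (1, 1, 1), we have s = 3 and nmin = 2, so two of the three synapses are flipped and the new output is −1.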

2.2. The fitness function. To put the above picture into a more formal language we assume that the reference perceptron has synaptic connections w∗. The 'correct' classifications of the input patterns ξ^µ ≡ (ξ_1^µ, ξ_2^µ, ..., ξ_L^µ) are therefore σ_*^µ = sign(w∗ · ξ^µ) and the optimal genetic information is w = w∗.

The interaction of each individual with the environment is represented by the following process:

(1) Flexible synapses are randomly assigned to ±1, so that the individual's weights are w, and we initialize F_M = 0.

(2) For µ = 1 to M:

• An input pattern ξ^µ ≡ (ξ_1^µ, ξ_2^µ, ..., ξ_L^µ) is drawn at random, and the output σ^µ = sign(w · ξ^µ) is compared to the reference class, σ_*^µ = sign(w∗ · ξ^µ).
• If σ^µ ≠ σ_*^µ the flexible synapses are updated following the learning protocol described above.
• Else, we increment F_M: F_M = F_M + 1.

(3) The fraction of patterns that the individual did not need to learn is f_M = F_M/M.
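The three steps above can be sketched as follows (a hypothetical Python rendering; the helper names are ours, and the update rule is the learning protocol of Section 2.1):

```python
import random

def evaluate(genotype, w_ref, M, rng):
    """Interaction of one individual with the environment: returns f_M,
    the fraction of the M random patterns classified correctly without learning.
    genotype: L entries, each +1, -1 or '?' (flexible locus)."""
    L = len(genotype)
    flexible = [k for k, a in enumerate(genotype) if a == '?']
    # Step (1): flexible synapses assigned at random, F_M initialized to 0.
    w = [rng.choice((-1, 1)) if a == '?' else a for a in genotype]
    F = 0
    out = lambda v, x: 1 if sum(vk * xk for vk, xk in zip(v, x)) > 0 else -1
    for _ in range(M):                      # Step (2): present M random patterns
        xi = [rng.choice((-1, 1)) for _ in range(L)]
        if out(w, xi) != out(w_ref, xi):    # mistake: apply the learning protocol
            s = abs(sum(wk * xk for wk, xk in zip(w, xi)))
            for k in rng.sample(flexible, min((s + 1) // 2, len(flexible))):
                w[k] = -w[k]
        else:
            F += 1
    return F / M                            # Step (3): f_M = F_M / M
```

An individual genetically identical to the reference obtains f_M = 1, and its reproductive success according to equation (2) below is then Φ = L.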

We define the performance of each individual by the fraction f_M of examples that the individual did not need to learn. Note, however, that the classifications performed last are more likely to be correct due to the learning of the previously presented patterns.

We assume that the reproductive success of an individual is proportional to its performance. We scale it in such a way that individuals with the highest possible performance f_M = 1 have a reproductive success Φ = L. On the other hand, individuals that miss all the classifications are assumed to have Φ(f_M = 0) = 1. We thus define

Φ(f_M) = 1 + (L − 1) f_M.   (2)

If M = 1 the adaptive effects stem from the fact that the initial random guess of the synaptic efficacies corresponding to the ? alleles may be favorable. f_M increases slowly with M because it always bears the cost of all the misclassifications performed during the early stages of the interaction with the environment. As the number of examples needed to learn Q synapses is proportional to Q, (2) penalizes individuals with large Q, as they need more examples to fix the corresponding synapses. Except for this effect no explicit cost is assumed for the learning process. Note that an individual can have a sizable success having only fixed synapses. In fact, an individual with a rigid genotype with few wrong alleles can outperform another with many flexible synapses that are hard to fix into the right values using the learning protocol.

Individuals leave descendants to the next generation with a probability that is proportional to Φ. Selection operates because poorly performing individuals are in a worse situation to leave descendants than those that make fewer mistakes.

In order to model the evolutionary process we ask each generation to face a new set of M randomly chosen environmental stimuli, keeping the reference perceptron unchanged. The model is completed by allowing for the usual operations of the genetic algorithm. Mutations are applied to the selected genotypes by inspecting sequentially all the genome's loci and changing at random the corresponding alleles with a prescribed probability P_mut. Recombination can then be taken into consideration. This is an important source of diversity and accelerates significantly the evolutionary process. This process, however, adds no new conceptual ingredients to the evolutionary dynamics in the present problem and therefore will not be included.

The evolutionary process may be described by the fraction of individuals corresponding to each genotype composition (P, Q, R) at each generation. In the case of very large populations, the histogram is centered on the average value. The latter approaches the optimum genotype by mutations and selection. In the following we assume that the population is sufficiently large, and we describe its evolution by that of the allele frequencies, p = 〈P〉/L, q = 〈Q〉/L and r = 〈R〉/L, where 〈A〉 is the average number of alleles A (A ∈ {1, ?, −1}) in the population. Within this approximation, the evolutionary process corresponds to a walk in the three-dimensional space spanned by p, q and r. Since P + Q + R = L for each individual, the evolutionary path is confined to the triangle p + q + r = 1, p ≥ 0, q ≥ 0, r ≥ 0. To make the visualization easier, we project the paths onto the q, p plane below the line p + q = 1, limited by p ≥ 0, q ≥ 0. The axis q = 0 corresponds to a population composed of individuals having only rigid genotypes while the line p + q = 1 corresponds to one without 'wrong' connections.

The evolutionary processes can be regarded as a climb of the fitness landscape defined within that triangle in the p, q plane. To generate these landscapes, for each point of the (p, q) plane an ensemble of random populations is generated, each one having the corresponding mean frequency of the 1 and ? alleles. The reproductive success of each individual is evaluated and is next averaged over each population and over the ensemble. In the four panels of Fig. 1 we show the fitness landscapes 〈Φ〉(p, q) with learning schedules involving several values of M. We also show the fitness landscape of the H&N model for comparison.

The H&N surface displays a localized region§ of high fitness on the line p + q = 1 with a flat plateau 〈Φ〉 = 1 outside it. This is associated to the fact that within the H&N model, individuals with −1 alleles have minimal reproductive success (Φ = 1), and that the only way of increasing it is through mutations transforming the 'wrong' alleles into ? or 1 alleles. Evolutionary paths with increasing fitness can therefore mainly take place within the limited subspace p + q = 1. Outside, evolution can only proceed by a random search through mutations acting on individuals having equal (low) reproductive rates.

A limiting case of our model may be obtained by excluding the ? alleles and restricting the mutations to changes of the 1 into −1 alleles and vice versa. Because of these restrictions imposed on the mutations, the evolutionary paths can only take place along the p-axis. Contrary to the H&N model, individuals with some −1 alleles may have a large reproductive rate and leave descendants, because a perceptron with some wrong weights may be able to classify successfully an appreciable fraction of the examples. This situation is discussed analytically in the next section.

The fitness landscapes shown in the remaining panels of Fig. 1 correspond to our model. These are seen to always have gradual slopes due to the combined effects of learning and the genetic epistatic effects of rigid alleles. These landscapes are seen to change with the number of training examples M due to the longer learning sessions that all the individuals of the population undergo.

§Outside the line P + Q = L the reproductive success of each individual is strictly equal to its minimum, Φ = 1. The landscape shown is not discontinuous because of the averaging procedure described in the preceding paragraph. In a population with mean values p and q that are close to the line p + q = 1, there is a sizable probability of finding individuals with P + Q = L. These contribute to the mean fitness, producing a value of 〈Φ〉 that is slightly larger than the minimum.

Figure 1. Fitness landscapes. (a) Landscape obtained for the H&N model. Landscapes for the adaptive perceptron model: (b) no learning, (c) learning protocol with M = 100, and (d) with M = 500.

3. ANALYTIC TREATMENT OF PARTICULAR CASES

Some particular cases of our model can be studied analytically. One corresponds to a population of rigid perceptrons that have no ? alleles at all, so that learning is impossible. Fitness is only acquired through 'rigid' epistatic effects. Another case involves a population of adaptive perceptrons whose ability is tested with only one pattern; that is, with M = 1 in (2). Finally, we also analyse the extreme case of perfect learning, in which the initial guess of the weights is the best possible one for all the perceptrons in the population. In the following we compare the evolution of these limiting cases of our model with the model of H&N. As we will shortly see, both bear a great conceptual similarity.

We use the same analytical approach of Fontanari and Meir (1990), which considers an infinite¶ haploid, asexual population. At each generation, a new population is obtained by selection and random mutations. One can introduce the probabilities p_n, q_n and r_n of finding a given proportion of 1, ? and −1 alleles, respectively, in the nth generation. The fraction p_n of 'correct' alleles 1 may be written in terms of p_{n−1}, q_{n−1} and r_{n−1}, and of the individuals' reproductive rates. The latter is a random function of the patterns presented during the learning process. We assume that the reproductive rate of individuals depends only on their genotype composition, defined by the values of P, Q and R, and that it is well described by Φ̄(P, Q), the average of Φ(P, Q) taken over the pattern distribution. Then:

¶Within this limit there is no genetic drift. Instead, it is present in the numerical simulations with finite populations that are presented later.

p_n = P_mut + [(1 − d P_mut)/(L Z_{n−1})] Σ_{P=1}^{L} Σ_{Q=0}^{L−P} P Φ̄(P, Q) [L!/(P! Q! R!)] p_{n−1}^P q_{n−1}^Q r_{n−1}^R,   (3)

where p_{n−1} + q_{n−1} + r_{n−1} = 1, R = L − P − Q, P_mut is the mutation rate, d is the number of different alleles, and Z_n is a normalization constant given by

Z_n = Σ_{P=0}^{L} Σ_{Q=0}^{L−P} Φ̄(P, Q) [L!/(P! Q! R!)] p_n^P q_n^Q r_n^R.   (4)

Equations similar to (3) give the evolution of q_n and r_n, the fractions of plastic and 'wrong' alleles.

Equations (3) and (4) can be used recursively to generate the evolutionary process once the average reproductive success is known. In our model,

Φ̄(P, Q) = 1 + (L − 1) f̄_M(P, Q),   (5)

where f̄_M(P, Q) represents the average of f_M(P, Q) over the possible sequences of M test patterns. This average can be determined analytically in the three particular cases mentioned above.

Consider first the population of rigid perceptrons (Q = 0). To write down the average fitness we note that the probability that an individual makes a mistake in the classification of one randomly chosen example, i.e., the probability that its output differs from that of the reference, is given by the generalization error. This is a random variable that depends on the distribution of examples. If these are selected at random with uniform probability, its average ε_g determines the fitness through

f̄_M = (1 − ε_g)^M.   (6)

ε_g can be expressed in terms of the angle between the separating hyperplanes of the reference and the individual. More precisely, if the perceptron has weights w, then ε_g = arccos(w · w∗/|w||w∗|)/π. In the case of the Ising perceptrons considered here, this can immediately be expressed in terms of the total number of alleles, L, and the number P of correct connections:

ε_g(P) = (1/π) arccos(2P/L − 1).   (7)

The corresponding fitness function (6) is gradual in genotype space even if the individuals have no plastic alleles. This is so because there is always some probability that an individual with some wrong connections can properly classify some of the M environmental stimuli. In the limit of a large population, 〈P〉/L ≅ p, so that function (6) corresponds to the intersection of the surfaces shown in Fig. 1(b), (c) and (d) with the plane q = 0.

Equations (3) and (4) are directly applicable to the case of rigid perceptrons by setting Q = 0, d = 2 and by using (6) and (7) for the fitness (5). The corresponding evolution curves, obtained by starting with the same initial population (p_0 = 0.25), are represented in Fig. 2 for a particular value of the mutation rate P_mut. The introduction of other mutation rates only decreases the values of p_n in the asymptotic regime due to random fluctuations in the genotypes of the population. This, however, does not alter the general picture. The corresponding evolutionary path in the fitness landscape is a climb along the line q = 0.
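As a concrete sketch, the recursion (3)–(4) restricted to rigid perceptrons (Q = 0, d = 2), with the fitness given by (5)–(7), can be iterated numerically as follows (only L = 20, P_mut = 0.01 and p_0 = 0.25 are taken from the text; the value M = 20 is our assumption for illustration):

```python
import math

def rigid_evolution(L=20, Pmut=0.01, p0=0.25, M=20, generations=300):
    """Mean-field recursion (3)-(4) for rigid perceptrons (Q = 0, d = 2)."""
    def avg_fitness(P):
        eg = math.acos(2 * P / L - 1) / math.pi   # generalization error, eq. (7)
        return 1 + (L - 1) * (1 - eg) ** M        # eqs (5) and (6)
    p, history = p0, [p0]
    for _ in range(generations):
        # binomial population composition at frequency p of correct alleles
        terms = [avg_fitness(P) * math.comb(L, P) * p**P * (1 - p)**(L - P)
                 for P in range(L + 1)]
        Z = sum(terms)                             # normalization, eq. (4)
        mean_P = sum(P * t for P, t in enumerate(terms)) / Z
        p = Pmut + (1 - 2 * Pmut) * mean_P / L     # eq. (3) with d = 2
        history.append(p)
    return history

traj = rigid_evolution()   # p climbs along the q = 0 line of the landscape
```

Since Φ̄ increases with P, the selection term satisfies mean_P/L ≥ p at every step, so p grows monotonically until mutation balances selection, in qualitative agreement with the curves of Fig. 2.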

It is worth comparing the evolutionary path of the population of rigid perceptrons to the one obtained within the H&N model. In the latter case, as genomes having one or more −1 alleles have minimum reproductive success, these are eliminated during the first generations. After this irrelevant short transient, consisting of a random search of the 'ridge' located along the line p + q = 1, a steady climb takes place that is restricted to that line. If one assumes that all the −1 alleles have already been eliminated, the reproductive success, averaged over the number of learning trials, depends upon L and the number P of (correct) inherited 1 alleles through (Fontanari and Meir, 1990):

Φ̄_G^{HN}(P) = L − (L − 1) [1 − (1 − 2^{P−L})^G] / (G 2^{P−L}),   (8)

where G stands for the maximum number of guesses in the (random) learning protocol of the H&N model. For a large population, function (8) corresponds to the profile of the ridge along the line p + q = 1 in Fig. 1(a). Thus, both in the case of our model with rigid perceptrons as well as within the H&N model, the evolutionary paths proceed within restricted subspaces (the line q = 0 or the line p + q = 1) to reach the optimal phenotype. Equations (3) and (4) with R = 0, d = 2 and (8) for the fitness give the evolution of the H&N model after the transient. The result is represented in Fig. 2. These figures show the conceptual similarity of the H&N model and our case of rigid perceptrons.
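Equation (8) can be checked numerically in its limiting regimes (a small sketch; the limiting values follow from the text: Φ̄ is close to L when P = L, and close to the minimum reproductive success 1 when correct guessing is hopeless):

```python
def hn_fitness(P, L=20, G=500):
    """Average H&N reproductive success, equation (8).
    P fixed correct alleles, L - P plastic '?' alleles, at most G random guesses."""
    x = 2.0 ** (P - L)    # probability that one random guess of the ? alleles is correct
    return L - (L - 1) * (1 - (1 - x) ** G) / (G * x)

print(hn_fitness(20))  # P = L: exactly L - (L - 1)/G, close to L = 20
print(hn_fitness(0))   # all alleles plastic: close to the minimum, 1
```

The steep growth of this function along the ridge p + q = 1, contrasted with the gradual profile of (6)–(7) along q = 0, is what makes the two evolutionary paths of Fig. 2 comparable only in their conceptual structure.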

Within our model, the evolution is very different if the perceptrons have plastic alleles. It is possible to study analytically the evolution if the adaptive weights are selected at random, so that the average fitness does not depend on the details of the learning process. This is the case in two extreme scenarios in which the synaptic connections used to evaluate the reproductive success are those set at random 'at birth'. This is the case when M = 1, because a single pattern is used to test the individuals' classification ability before learning starts. Then, the reproductive success is determined by the (average) generalization error, which is dominated by the most probable choice of the Q adaptive weights. This corresponds to one half of these equal to 1, and the other half to −1. The generalization error is that of a perceptron with P_c synaptic efficacies equal to those of the reference,

f̄_M(P, Q) = 1 − (1/π) arccos(2P_c/L − 1),   (9)

where P_c = P + Q/2. In the other extreme case of our model, namely that of perceptrons whose weights are systematically set to their correct values at birth, the average fitness is also given by (9), but now P_c = P + Q. The evolution of the adaptive perceptrons in these extreme situations follows from (3) and (4) with d = 3 and by using (9) for the fitness (5). The fractions of 1, ? and −1 alleles through the successive generations, p_n, q_n and r_n, respectively, are represented in Fig. 2.

Figure 2. p_n, q_n and r_n as a function of the generation number (genome length L = 20) for mutation probability P_mut = 0.01, for the H&N model and the three extreme cases of our model. The allowed number of (random) learning trials of the H&N model is G = 500.

The results of Fig. 2 indicate that the assimilation of 1's into the genotype halts at greater distances as the learning effects become more noticeable in the evolutionary process. Note that p_n saturates far from p = 1. In fact, for the case of perfect learning (P_c = P + Q), evolution stops when the adaptive walks hit the line p + q = 1 and not when the optimal genotype (P = L) is reached. Within this elementary model this stems from the fact that ? alleles are completely equivalent to 1 alleles. A similar, somewhat hindered, effect is found in the numerical simulations that are described in the next section.

4. THE ADAPTIVE WALKS

The adaptive walks have been calculated using a genetic algorithm, as explained before. However, in order to obtain statistically significant results some averaging


is performed. An ensemble of random populations is generated with the same given mean values p and q. Each of the members of this ensemble is taken as an initial population to run independent evolutionary processes using the genetic algorithm. We have chosen to stop the process after 300 generations. This number is such that the diversity introduced by mutation is not enough to give rise to a significant evolutionary change‖. In each generation, we average the values of p, q, r and Φ(p, q) that are obtained in each independent evolutionary process. The values pn, qn, rn and Φ(pn, qn) that are obtained in this way are henceforth referred to as 'adaptive walks' or 'evolutionary paths'. These paths lie on top of the corresponding surfaces shown in Fig. 1, showing no appreciable departure from them.
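This averaging procedure can be sketched as follows (the function names and the genome encoding are our own assumptions): draw an ensemble of random populations with prescribed mean frequencies p and q, and measure the allele frequencies to be averaged over independent runs.

```python
import random

ALLELES = (1, '?', -1)

def random_genome(L, p, q, rng):
    # each locus is a 1 with probability p, a ? with probability q,
    # and a -1 otherwise
    return rng.choices(ALLELES, weights=(p, q, 1.0 - p - q), k=L)

def random_population(size, L, p, q, rng):
    return [random_genome(L, p, q, rng) for _ in range(size)]

def mean_frequencies(population):
    # returns (p, q, r): the fractions of 1, ? and -1 alleles
    flat = [a for genome in population for a in genome]
    n = float(len(flat))
    return (flat.count(1) / n, flat.count('?') / n, flat.count(-1) / n)
```

Each member of such an ensemble would then be evolved independently with the genetic algorithm, and pn, qn and rn obtained by averaging `mean_frequencies` over the runs at each generation.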

In what follows we therefore only use the variables p and q. Since the reproductive rates turn out to also be averaged over the training set of examples, we again drop the symbols 〈· · ·〉 and the bar over Φ to ease the notation.

In the general situation of adaptive perceptrons (AP), evolution proceeds along a direction that approaches that of the maximum gradient of the surface Φ(p, q). It is possible to obtain an approximate picture of the orientation of the adaptive walks from the corresponding contour plots [see Fig. 3(a)].

As mentioned above, the fraction fM of correct classifications cannot be calculated analytically for each value of M. It can, however, be estimated in two extreme situations: the case M = 1, in which the learning algorithm is not applied, and the case of a perfect learning process. These two cases were discussed in the preceding section. The corresponding fitness landscapes are therefore:

Φ1 = 1 + (L − 1) [1 − (1/π) arccos(2p + q − 1)],   (10)

Φ∞ = 1 + (L − 1) [1 − (1/π) arccos(2(p + q) − 1)].   (11)

The lines of constant Φ are straight lines having slopes −1/2 or −1, respectively, for Φ1 and Φ∞. In Fig. 3(a) we also show the contour plots of the average fitness landscape for M = 500 obtained numerically. Note that these fall in between those of Φ1 and Φ∞. A similar contour plot for purely rigid perceptrons with no flexible alleles corresponds to horizontal lines because the fitness function is independent of q. We therefore note that the slope of the contour lines increases as plasticity plays an increasingly significant role. Correspondingly, evolutionary paths that have a common starting point but less important learning effects should proceed to the left of those in which learning is more important. This can be checked in Fig. 3(b).
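The two limiting landscapes of (10) and (11) can be written down directly (a sketch; the function names are ours):

```python
import math

def phi_1(p, q, L=20):
    # eq. (10): M = 1, plastic weights set at random at birth;
    # contours are lines of constant 2p + q
    return 1.0 + (L - 1) * (1.0 - math.acos(2.0 * p + q - 1.0) / math.pi)

def phi_inf(p, q, L=20):
    # eq. (11): perfect learning, every ? allele acts as a 1;
    # contours are lines of constant p + q
    return 1.0 + (L - 1) * (1.0 - math.acos(2.0 * (p + q) - 1.0) / math.pi)
```

A quick numerical check: on the line p + q = 1, `phi_inf` already takes its maximal value L, whereas `phi_1` still grows with p; the intermediate M = 500 landscape interpolates between the two.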

The numerical examples were calculated with Pmut = 0.01, M = 100 and M = 500. In addition, two sets of paths are generated corresponding, respectively, to

‖ For large mutation probabilities the evolutionary process is not possible because successful genotypes are not preserved from one generation to the next.


Figure 3. (a) Contour plot of the fitness landscape for the adaptive perceptron model for different learning protocols. Increasing slopes are associated with improved learning abilities. (b) Evolution of the average population frequencies p and q for several models. Observe the different stagnation points of the different models. The one farthest from the optimum (p = 1, q = 0) corresponds to individuals with greater learning abilities. The label 'no learning' corresponds to the case M = 1 in which the learning algorithm is not used.


the case in which the learning algorithm is used throughout the lifetime of the individual, and the case in which it is not applied and the plastic weights are determined at random at 'birth'.

In both cases, the evolution is qualitatively similar. If the starting random population is very close to the line p + q = 1, the fraction of 1's increases upon evolution, mainly to the detriment of the ?'s.

When the initial population has small q, plasticity noticeably influences the evolutionary path. In that case there is a differential selective pressure to increase the ?'s at the expense of diminishing the −1's. This is mainly because ? alleles are always beneficial: even if there is no learning, they have at least a probability of 1/2 of becoming a 1. In later stages the adaptive walks approach the line p + q = 1. Individuals that are allowed to learn for longer get closer to that line, showing the relevance of the full use of genetic plasticity through the learning algorithm.

An inspection of the evolutionary paths obtained with the AP model leads to the surprising result that the use of learning may not be an effective way of transcribing environmental data into the genetic information. Individuals that are allowed to learn have a reproductive success that outperforms that of individuals that do not learn. However, upon evolution, the latter get closer to the optimum. Therefore, as long as the evolutionary process is only regarded in terms of the transcription of environmental data into genetic information, we reach the conclusion that learning gives rise to a halting of this process, in opposition to what is usually understood as the Baldwin effect. This effect has already been noticed in the schematic models of the preceding section.

The evolutionary paths within the AP model can be compared to those of the two extreme, conceptually similar situations mentioned in the preceding section, namely the H&N model without 'wrong' alleles, and perceptrons without ? alleles. In the latter case the evolutionary path is confined to the p-axis, as q = 0, and mutations leading to the appearance of ? alleles are not allowed. In both cases selection is seen to drive the population to the immediate neighborhood of the optimal genome, getting closer to that target than populations of AP.

In the same figure we also present a typical evolutionary path obtained within the (full) H&N model (also with 300 generations), which shows the two different regimes mentioned in the preceding section: a random search of the configurations without −1's on a flat fitness landscape, followed by the evolution confined to the p + q = 1 line. As in the case of a population having only rigid alleles, the optimal genome is rapidly reached. Experiments with initial conditions with similar values of p but a smaller value of q can hardly be considered a case of Darwinian evolution because the corresponding adaptive walks spend most of the time performing a random search of the 'fitness ridge' at p + q = 1.

The features discussed above can also be analysed by looking at the curves of Fig. 4. The stagnation effect displayed by the adaptive walks of Fig. 3 shows up as an asymptotic constant value of the average frequency of the 1 alleles, in correspondence with a constant (high) value of the average fitness. In spite of having a


Figure 4. (a) Fitness as a function of the generation number. (b) Frequency of 1 alleles, (c) of ? alleles and (d) of −1 alleles as a function of the generation number. The initial conditions are p = 0.25 and q = 0.01. Full circles correspond to a learning protocol with M = 100, empty circles correspond to no learning and crossed circles correspond to a population without plastic alleles. The results of H&N are not included because with these initial conditions evolution corresponds to a random search during more than the allowed 300 generations.

lower value of p, the average fitness of the populations of learning individuals is greater. The opposite happens in the extreme case in which the ?'s are forbidden. In this case the asymptotic value of p is the largest while fitness reaches a lower asymptotic value.

In Fig. 4(c) and (d) one can check that a population of learning individuals is indeed less efficient at eliminating the ? alleles while it is the best at eliminating the −1's. The worst situation, as far as fitness is concerned, corresponds to the case M = 1 in which individuals have plastic alleles but the learning protocol is not used. In this case selection is not efficient at eliminating both −1 and ? alleles. It is indeed reasonable to expect that if individuals are endowed with plasticity, this should be used for some ontogenetic adaptation.

5. CONCLUSIONS

We have presented a stylized model to study the interaction of the genetic and behavioral systems during evolution. We considered a population of perceptrons having some plastic synapses that can be adjusted through learning using an algorithm that is a simple version of 'on line' learning. Each perceptron is presented with M examples, and its flexible synapses are adjusted whenever its classification is different from that of a reference perceptron that represents the environmental


conditions. The reference is assumed to have all synapses codified by 1 alleles, without any loss of generality. The reproductive success of each individual is measured in terms of the fraction of examples that are properly classified. Within this model, therefore, only a single optimal genotype exists, equal to that of the reference perceptron.
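The learning step described above can be sketched as follows (a minimal illustration under our own assumptions: a clipped Hebbian update restricted to the plastic synapses; the paper's exact update rule may differ):

```python
import random

def sign(x):
    return 1 if x >= 0 else -1

def learn(weights, plastic_idx, M, rng):
    """Present M random +/-1 examples; whenever the pupil's output
    disagrees with the reference (all synapses +1, so the target is
    sign(sum(xi))), update only the plastic synapses."""
    L = len(weights)
    for _ in range(M):
        xi = [rng.choice((-1, 1)) for _ in range(L)]
        target = sign(sum(xi))
        if sign(sum(w * x for w, x in zip(weights, xi))) != target:
            for i in plastic_idx:
                # clipped Hebbian move: align the plastic synapse
                # with the current example and the target label
                weights[i] = target * xi[i]
    return weights
```

Fitness would then be the fraction of examples this perceptron classifies like the reference after (or, for M = 1, before) learning.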

Our framework is an extension of the H&N model, which also assumes a single optimal genotype and includes genetic plasticity through the ? alleles. Learning is represented as a random search, and stops either if the correct configuration is found or if the maximum 'learning time' is exceeded. This process is a non-trivial way of defining a smooth fitness landscape in genotype space. However, this only has non-vanishing slopes in the immediate neighborhood of the subspace p + q = 1, involving only sequences with 1 and ? alleles.

In our case the fitness landscape is the result of a much more delicate trade-off between how similar the genotype is to the optimal one and how much the classification performance can be improved by learning. The result of such a balance is that the fitness landscape has gradual slopes in the whole genotype space. Consequently, the corresponding adaptive walks unfold in the whole space spanned by the frequencies p, q and r of the alleles 1, ? and −1.

The evolutionary process amounts to substituting the −1's and ?'s by 1's, thus 'transcribing' the environmental data into genetic information. This process corresponds to an evolutionary path in the space of the frequencies p, q and r. Due to the limited features of the H&N model this substitution only takes place in two phases. The first consists of the random search of the fitness 'ridge' in the region p + q = 1 that only involves genotypes with 1 and ? alleles. This stage can hardly be considered a Darwinian process, precisely because it proceeds at random due to the absence of any differential fitness to drive it. The second phase consists of climbing the ridge in the single possible direction, which amounts to substituting ?'s by 1's.

In our model the average fitness is the compound result of the constitutive classification performance of each individual and its learning ability. The 'greater distance' that is thus introduced between genotype and phenotype gives rise to a variety of adaptive walks in which the transcription of environmental information involves the simultaneous substitution of −1's and ?'s by 1's, at rates that are respectively established by the slopes of the fitness landscape. This model has been studied analytically within a schematic framework of infinite populations and also by performing numerical simulations in finite populations.

In the last stages of evolution we find that the simultaneous presence of the two elements effectively halts the transcription of the environmental data into genetic information. This can be considered as a sort of hindrance of the Baldwin effect, just as suggested by Mitchell (1996). It is important to stress that this evolutionary halting is a direct consequence only of the learning ability. This in fact provides the extra success that is needed to compensate for a greater distance to the optimal genotype. This effect has also been found, in a more dramatic fashion, within the analytic approach. Within that framework, and in the case of perfect


learning, it is clear that the population is not forced to reach the optimum at p = 1 because ? and 1 alleles are both equally useful.

These effects are clearly seen by comparing the evolutionary path of a population that is able to learn with one in which learning is inhibited, or even with the extreme situation of a population having only fixed alleles. In the last stages of the evolutionary process, the overhead paid by the ? and −1 alleles that are left is so small that there is no appreciable selection pressure to eliminate them. The result is a stagnation of the transcription process. On the other hand, a population of rigid genotypes is very good at getting close to the optimal genotype because that is the only way of surviving. A plastic system that is able to undergo developmental adaptations appears more efficient at eliminating the wrong information by changing it into plastic alleles. All these effects are magnified if learning is enhanced by allowing the algorithm to run with a greater number M of 'training examples'.

The model that has been presented allows for several extensions that are presently in progress. One is to study the effects of an additional cost of the learning process, in order to take into consideration that there are no 'free lunches' in nature. As far as the Baldwin effect is concerned, more important effects are, however, to be expected from the inclusion of changes in the environmental parameters.

ACKNOWLEDGEMENTS

One of the authors (HD) wishes to acknowledge the support of CONICET through a postdoctoral fellowship, and the Santa Fe Institute for support of his participation in the Summer School on Complex Systems (1997), where some early discussions about this subject took place. SR-G, MG and RP acknowledge economic support from the EU research contract ARG/B7-3011/94/97. HD and RP hold a UBA research contract, UBACYT PS021, 1998/2000. MG is a member of the CNRS.

REFERENCES

Ackley, D. and M. Littmann (1992). Interactions between learning and evolution, in G. C. Langton, C. Taylor, J. Farmer and S. Rasmussen (Eds), Artificial Life II, Santa Fe Institute Studies in the Science of Complexity, Proc. Vol. X. Redwood City, CA: Addison-Wesley, pp. 487–509.

Baldwin, J. M. (1896). A new factor in evolution. Am. Nat. 30, 441–451.

Belew, R. K. (1990). Evolution, learning, and culture: Computational metaphors for adaptive algorithms. Complex Syst. 4, 11–49.

Fontanari, J. F. and R. Meir (1990). The effect of learning on the evolution of asexual populations. Complex Syst. 4, 401–414.


French, R. and A. Messinger (1994). Genes, phenes and the Baldwin effect: Learning and evolution in a simulated population, in R. Brooks and P. Maes (Eds), Artificial Life IV. Cambridge, MA: MIT Press, pp. 277–282.

Goldberg, D. (1989). Genetic Algorithms in Search, Optimization and Machine Learning. Redwood City, CA: Addison-Wesley.

Harvey, I. (1993). The puzzle of the persistent question marks: A case study of genetic drift, in S. Forrest (Ed.), Proceedings of the Fifth International Conference on Genetic Algorithms. San Mateo, CA: Morgan Kaufmann, pp. 15–22.

Hertz, J., A. Krogh and R. G. Palmer (1991). Introduction to the Theory of Neural Computation. Redwood City, CA: Addison-Wesley.

Hinton, G. E. and S. J. Nowlan (1987). How learning can guide evolution. Complex Syst. 1, 495–502.

Ho, M. W., C. Tucker, D. Keeley and P. T. Saunders (1983). Effect of successive generations of ether treatment on penetrance and expression of the bithorax phenotype in D. melanogaster. J. Exp. Zool. 225, 357–368.

Jablonka, E. and M. Lamb (1995). Epigenetic Inheritance and Evolution: The Lamarckian Dimension. Oxford: Oxford University Press.

Maynard Smith, J. (1987). When learning guides evolution. Nature 329, 761–762.

Minsky, M. and S. Papert (1969). Perceptrons. Cambridge, MA: MIT Press.

Mitchell, M. (1996). An Introduction to Genetic Algorithms. Cambridge, MA: MIT Press.

Scharloo, W. (1991). Canalization: genetic and developmental aspects. Ann. Rev. Ecol. Syst. 22, 65–93.

Waddington, C. H. (1942). Canalization of development and the inheritance of acquired characters. Nature 150, 563–565.

Received 15 June 2000 and accepted 18 September 2000