


Available online at http://www.idealibrary.com on doi:10.1006/bulm.2000.0207
Bulletin of Mathematical Biology (2000) 00, 1–18

A Model for the Interaction of Learning and Evolution

H. DOPAZO∗†, M. B. GORDON‡, R. PERAZZO† AND S. RISAU-GUSMAN†‡

†Centro de Estudios Avanzados, Universidad de Buenos Aires, Uriburu 950, 1114 Buenos Aires, Argentina. E-mail: [email protected]
‡CEA Grenoble, Département de Recherche Fondamentale sur la Matière Condensée, 17, rue des Martyrs, 38054 Grenoble Cedex 9, France

We present a simple model in order to discuss the interaction of the genetic and behavioral systems throughout evolution. This considers a set of adaptive perceptrons in which some of their synapses can be updated through a learning process. This framework provides an extension of the well-known Hinton and Nowlan model by blending together some learning capability and other (rigid) genetic effects that contribute to the fitness. We find a halting effect in the evolutionary dynamics, in which the transcription of environmental data into genetic information is hindered by learning, instead of stimulated as is usually understood by the so-called Baldwin effect. The present results are discussed and compared with those reported in the literature. An interpretation is provided of the halting effect.

© 2000 Society for Mathematical Biology

1. INTRODUCTION

The interaction between the adaptive abilities of organisms and their evolution has been a matter of discussion ever since Lamarck suggested that acquired characters could be inherited. At the end of the last century, Baldwin (1896) addressed the problem of the interaction of learning and evolution. His basic suggestion is that what had to be learned in previous generations appears genetically encoded in later ones. Although this process suggests a Lamarckian mechanism of evolution, Baldwin emphasized that this can be achieved within a pure Darwinian framework.

More recently, Waddington (1942) showed that a character whose development originally depends on an environmental stimulus becomes genetically fixed and independent of it. Waddington called this process genetic assimilation, suggesting

∗Author to whom correspondence should be addressed.

0092-8240/00/000001 + 18 $35.00/0 © 2000 Society for Mathematical Biology


that natural selection favors those genetic combinations that most readily respond to the environmental stimulus. Further experiments (Ho et al., 1983; Scharloo, 1991) confirmed his results and interpretations.

The Baldwin effect and genetic assimilation share a common Darwinian mechanism with similar consequences for the evolutionary process. This was summarized by Maynard Smith (1987): 'If individuals vary genetically in their capacity to learn, or to adapt developmentally, then those most able to adapt will leave more descendants, and the genes responsible will increase in frequency. In a fixed environment, when the best thing to learn remains constant, this can lead to the genetic determination of a character that, in earlier generations, had to be acquired afresh each generation'. More recently, Jablonka and Lamb (1995) summarized these ideas by defining the Baldwin effect as that 'seen when the environmental induction of a physiological or behavioral adaptation allows a population to survive long enough for the accumulation by selection of similar constitutive hereditary changes'.

Hinton and Nowlan (1987) (H&N) wrote a seminal paper to provide a theoretical framework for the discussion of the Baldwin effect. They considered a population of haploid individuals, each one having a neural network with L potential connections defined by three allelic forms. Allele 1 (−1), hereafter called 'fixed', represents a connection that is present (absent). A ? allele instead generates a flexible connection that can be adapted during the life of the individual through a random search that emulates a learning process. Fitness depends on the effectiveness of such a learning process and is therefore nothing but a (probabilistic) measure of the distance of the individual's genotype to the optimum.

The H&N model was simulated using genetic algorithms (Goldberg, 1989), in which mutation and cross-over were included. The results show that natural selection rapidly eliminates the wrong alleles −1 and gradually replaces the plastic alleles ? by optimal values 1. It is further found that the evolution towards a population composed of optimal individuals is greatly accelerated with respect to one in which plastic alleles are absent. These results were confirmed numerically, and thoroughly discussed, by Fontanari and Meir (1990), Belew (1990), Ackley and Littmann (1992), Harvey (1993) and French and Messinger (1994). Since no mechanism is explicitly introduced to fix in the gametes the result of learning, it is concluded that Baldwin's conjecture actually takes place.

The H&N model provides a framework to account for the interaction of the genetic and behavioral systems throughout the evolutionary process. In particular, it shows that the evolution towards the optimal genotype is greatly accelerated by learning. It is, however, easy to overstate the conclusions that can be extracted from this model. One should note that this acceleration with respect to a pure random search should not be considered a surprising result. It is instead a natural consequence of the Darwinian selection allowed through the explicit introduction of a fitness function. As the only behavioral feature that confers fitness is the individuals' learning ability, this model cannot address the relevant question of whether


learning accelerates the evolution with respect to other evolutionary paths; in particular, those followed by populations with phenotypic features expressing fixed genetic information.

The purpose of the present paper is to go one step further than the H&N model by endowing the individuals with a richer genotype–phenotype structure. We aim at blending into a single picture both some learning capability and other genetic epistatic effects that contribute to the selection process in the absence of genetic plasticity.

Within the present wider framework the evolutionary paths result from the contributions to the fitness of fixed and flexible genetic information. This allows us to render visible an effect that has been suggested by Mitchell (1996): '[...] if the learning procedure were too sophisticated [...] there would be little selection pressure for evolution to move from the ability to learn the trait to the genetic encoding of that trait'. This hindrance (instead of a stimulation) of the transcription of the environmental data falls beyond the H&N model. In fact, the latter corresponds to the very particular situation in which genetic plasticity constitutes the only (narrow) road to reach the optimum. A second but equally plausible particular case is the trivial situation of a population of ('rigid') individuals that obtains fitness simply through the (Hamming) closeness to the optimal genetic information. The consequence of endowing the individuals with both fixed and adaptable features is that evolution can proceed along less restricted paths in the fitness landscape.

In Section 2 we present the model. It has the following properties: firstly, each individual is endowed with a phenotype that processes data from the environment in order to provide a response to it. Secondly, the reproductive success of each individual is defined in terms of the performance of the phenotype, arising from its processing ability. Thirdly, phenotypes may also undergo a learning process that takes advantage of some genetic plasticity and that is regulated by their processing ability. In Section 3 we discuss extreme particular cases of the model within the same mean field analytic approximation used by Fontanari and Meir (1990). The results are presented in Section 4 and the final discussions are left to Section 5.

2. THE MODEL

2.1. The population. We consider a haploid population in which each individual is represented by a neural network. We choose the simplest kind of net, having L input ports that are assumed to receive data from the environment and a single output neuron that is assumed to provide the response of the individual. This type of network is called a perceptron (Minsky and Papert, 1969). The connections between the input ports k = 1, 2, . . . , L and the output neuron have synaptic efficacies w ≡ (w1, w2, . . . , wL). We restrict our model by considering 'Ising perceptrons' in which wk = ±1. This assumption does not introduce any essential limitation, but provides a convenient simplification of the algebra.


The inputs of the network that convey data from the environment are represented by pattern vectors ξ ≡ (ξ1, ξ2, . . . , ξL). We also assume that ξk = ±1 (k = 1, 2, . . . , L). The response of each individual to the input data is determined by the signal produced by the output neuron. This is taken to be the sign of the sum of all the inputs weighted by the synaptic efficacies, namely:

σ = sign(w · ξ). (1)

From equation (1) it follows that perceptrons are only able to successfully classify sets of input vectors that can be separated by a single hyperplane, normal to w, in the input space of L dimensions. The two possible values of the classification σ = ±1 may be considered as two different behavioral responses.
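As a minimal illustration (a sketch of ours, not the authors' code), equation (1) can be written directly:

```python
def perceptron_output(w, xi):
    """Equation (1): the response is the sign of the input sum weighted
    by the synaptic efficacies (here ties are broken towards +1)."""
    s = sum(wk * xk for wk, xk in zip(w, xi))
    return 1 if s >= 0 else -1

# An Ising perceptron with L = 4 synapses classifying one pattern.
print(perceptron_output([1, -1, 1, 1], [1, 1, 1, 1]))  # → 1
```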

In order to obtain an intuitive picture of this model, one could think that each component of the input vectors ξ encodes the presence (ξk = +1) or the absence (ξk = −1) of some attribute of, say, a fruit (i.e., a particular color, smell, texture, etc.). The perceptrons are asked to choose which fruits can be eaten and which should be discarded. We assume that there is an optimal classification of fruits into those that are edible and those that are not. We further assume that such optimal classification is provided by a 'reference or teacher perceptron'† that is assumed to remain unchanged throughout the evolutionary process. Those individuals that have many synaptic connections different from those of the reference will make many mistakes in the classification task. Conversely, those whose synaptic connections are similar to the ones of the reference will be more successful in choosing food‡.

In order to study the evolutionary dynamics using genetic algorithms we assume that each individual is specified by a genotype consisting of a string of L loci, in which the synaptic connections are encoded. In order to endow the perceptrons with genetic plasticity we assume, in the same fashion as in the H&N model, that each locus of the genotype can be occupied by one of the three possible alleles 1, −1 and ?. The first two alternatives correspond to fixed synaptic connections while the last represents a flexible link that is allowed to change during the lifetime of the individual, acquiring the values ±1 in accordance with some (fixed) updating algorithm. Each genotype is therefore characterized by the three integers P, R and Q (0 ≤ P, Q, R ≤ L, with P + Q + R = L) that, respectively, specify the number of 1, −1 and ? alleles.

The phenotype of each individual is the result of the interaction of the genetic information with a set of M environmental stimuli {ξ^µ}µ=1,...,M. The behavioral

†The name 'teacher perceptron' is usual in the neural networks literature. Here we prefer 'reference' because in the present context 'teacher' may convey the somewhat inappropriate idea that a cultural transmission is taking place.

‡The reason for choosing a perceptron as a reference to provide an optimal classification is only a matter of mathematical convenience. Resorting to this artifice one can be sure that there is a unique optimal genetic configuration. If a classification surface different from a hyperplane were chosen, it would in general be possible to find several optimal perceptrons.


response is the output σ = ±1 classifying the input stimulus into one of the two possible categories. Individuals are also assumed to undergo an updating process of the Q flexible alleles according to a (fixed) algorithm that we will explain shortly.

This updating process could be regarded as a learning protocol. Think for instance of the pleasant or unpleasant taste of some food that has been eaten. As a consequence, some adjustments of the flexible synapses may be performed to improve the classification scheme before the next stimulus from the environment is received. Within our model this feedback loop arises from the comparison of the classification performed by the individual and the one provided by the 'reference' perceptron.

To update the synaptic efficacies we use a simple version of the 'on line' learning algorithms extensively studied in the neural networks literature [see, e.g., Hertz et al. (1991)]. The procedure is as follows:

Learning protocol.

• If the classification of the input pattern is correct, no changes are produced.
• Else, determine the minimum number of synapses nmin that have to change in order to flip the sign of the output signal.
• The updating consists in flipping either nmin synapses at random among the Q flexible ones, or all of them if nmin > Q.

This learning protocol is repeated each time an example is presented.
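The protocol above can be sketched in code (our own illustration; the choice nmin = |s|//2 + 1, with s the weighted input sum, is one way of counting the required flips for an Ising perceptron, assuming ties are broken towards +1):

```python
import random

def learn_step(w, flexible, xi, target):
    """One application of the learning protocol: on a misclassification,
    flip n_min randomly chosen flexible synapses, or all of them if
    n_min exceeds their number. Flipping synapse k changes the weighted
    sum s by -2*w[k]*xi[k], i.e., by at most 2 in absolute value."""
    s = sum(wk * xk for wk, xk in zip(w, xi))
    output = 1 if s >= 0 else -1
    if output == target:
        return w                       # correct: no changes are produced
    n_min = abs(s) // 2 + 1            # flips needed to reverse the sign
    for k in random.sample(flexible, min(n_min, len(flexible))):
        w[k] = -w[k]                   # random flips among the Q flexible ones
    return w
```

Because the flipped synapses are chosen at random, a single application need not produce a correct classification; the protocol is simply repeated with each new example.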

2.2. The fitness function. To put the above picture into a more formal language we assume that the reference perceptron has synaptic connections w∗. The 'correct' classifications of the input patterns ξ^µ ≡ (ξ^µ_1, ξ^µ_2, . . . , ξ^µ_L) are therefore σ^µ∗ = sign(w∗ · ξ^µ) and the optimal genetic information is w = w∗.

The interaction of each individual with the environment is represented by the following process:

(1) Flexible synapses are randomly assigned to ±1, so that the individual's weights are w, and we initialize F_M = 0.
(2) For µ = 1 to M:
    • An input pattern ξ^µ ≡ (ξ^µ_1, ξ^µ_2, . . . , ξ^µ_L) is drawn at random, and the output σ^µ = sign(w · ξ^µ) is compared to the reference class, σ^µ∗ = sign(w∗ · ξ^µ).
    • If σ^µ ≠ σ^µ∗ the flexible synapses are updated following the learning protocol described above.
    • Else, we increment F_M: F_M = F_M + 1.
(3) The fraction of patterns that the individual did not need to learn is f_M = F_M/M.


We define the performance of each individual by the fraction f_M of examples that the individual did not need to learn. Note, however, that the classifications performed last are more likely to be correct due to the learning of the previously presented patterns.

We assume that the reproductive success of an individual is proportional to its performance. We scale it in such a way that individuals with the highest possible performance f_M = 1 have a reproductive success Ω = L. On the other hand, individuals that miss all the classifications are assumed to have Ω(f_M = 0) = 1. We thus define

Ω(f_M) = 1 + (L − 1) f_M.   (2)
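Putting the interaction process and equation (2) together gives the following self-contained sketch (under the same assumptions as the protocol sketch above; names such as `evaluate` are ours, not the paper's):

```python
import random

def evaluate(w, w_star, flexible, M, rng=random):
    """Interaction of one individual with the environment: estimate the
    performance f_M (fraction of the M random patterns classified correctly
    without needing to learn) and the reproductive success of equation (2)."""
    L = len(w)
    w = list(w)
    for k in flexible:                 # (1) random initial guess for ? alleles
        w[k] = rng.choice([1, -1])
    F = 0
    for _ in range(M):                 # (2) M random environmental stimuli
        xi = [rng.choice([1, -1]) for _ in range(L)]
        s = sum(wk * xk for wk, xk in zip(w, xi))
        target = 1 if sum(wk * xk for wk, xk in zip(w_star, xi)) >= 0 else -1
        if (1 if s >= 0 else -1) == target:
            F += 1                     # correct: no learning needed
        elif flexible:                 # else apply the learning protocol
            n_min = abs(s) // 2 + 1
            for k in rng.sample(flexible, min(n_min, len(flexible))):
                w[k] = -w[k]
    f_M = F / M                        # (3) performance
    return f_M, 1 + (L - 1) * f_M      # equation (2)
```

For an individual whose rigid genotype already coincides with the reference (P = L), every pattern is classified correctly, so f_M = 1 and Ω = L.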

If M = 1 the adaptive effects stem from the fact that the initial random guess of the synaptic efficacies corresponding to the ? alleles may be favorable. f_M increases slowly with M because it always bears the cost of all the misclassifications performed during the early stages of the interaction with the environment. As the number of examples needed to learn Q synapses is proportional to Q, (2) penalizes individuals with large Q, as they need more examples to fix the corresponding synapses. Except for this effect no explicit cost is assumed for the learning process. Note that an individual can have a sizable success having only fixed synapses. In fact, an individual with a rigid genotype with few wrong alleles can outperform another with many flexible synapses that are hard to fix into the right values using the learning protocol.

Individuals leave descendants to the next generation with a probability that is proportional to Ω. Selection operates because poorly performing individuals are in a worse position to leave descendants than those that make fewer mistakes.

In order to model the evolutionary process we ask each generation to face a new set of M randomly chosen environmental stimuli, keeping the reference perceptron unchanged. The model is completed by allowing for the usual operations of the genetic algorithm. Mutations are applied to the selected genotypes by inspecting sequentially all the genome's loci and changing at random the corresponding alleles with a prescribed probability Pmut. Recombination could then be taken into consideration. It is an important source of diversity and accelerates the evolutionary process significantly. It adds, however, no new conceptual ingredients to the evolutionary dynamics in the present problem and therefore will not be included.

The evolutionary process may be described by the fraction of individuals corresponding to each genotype composition (P, Q, R) at each generation. In the case of very large populations, the histogram is centered on the average value. The latter approaches the optimum genotype by mutations and selection. In the following we assume that the population is sufficiently large, and we describe its evolution by that of the alleles' frequencies, p = 〈P〉/L, q = 〈Q〉/L and r = 〈R〉/L, where 〈A〉 is the average number of alleles A (A ∈ {1, ?, −1}) in the population.


Within this approximation, the evolutionary process corresponds to a walk in the three-dimensional space spanned by p, q and r. Since P + Q + R = L for each individual, the evolutionary path is confined to the triangle p + q + r = 1, p ≥ 0, q ≥ 0, r ≥ 0. To make the visualization easier, we project the paths onto the q, p plane below the line p + q = 1, limited by p ≥ 0, q ≥ 0. The axis q = 0 corresponds to a population composed of individuals having only rigid genotypes while the line p + q = 1 corresponds to one without 'wrong' connections.

The evolutionary processes can be regarded as a climb of the fitness landscape defined within that triangle in the p, q plane. To generate these landscapes, for each point of the (p, q) plane an ensemble of random populations is generated, each one having the corresponding mean frequency of the 1 and ? alleles. The reproductive success of each individual is evaluated and is next averaged over each population and over the ensemble. In the four panels of Fig. 1 we show the fitness landscapes 〈Ω〉(p, q) with learning schedules involving several values of M. We also show the fitness landscape of the H&N model for comparison.

The H&N surface displays a localized region§ of high fitness on the line p + q = 1 with a flat plateau 〈Ω〉 = 1 outside it. This is associated with the fact that within the H&N model, individuals with −1 alleles have minimal reproductive success (Ω = 1), and that the only way of increasing it is through mutations transforming the 'wrong' alleles into ? or 1 alleles. Evolutionary paths with increasing fitness can therefore mainly take place within the limited subspace p + q = 1. Outside, evolution can only proceed by a random search through mutations acting on individuals having equal (low) reproductive rates.

A limiting case of our model may be obtained by excluding the ? alleles and restricting the mutations to changes of the 1 into −1 alleles and vice versa. Because of these restrictions imposed on the mutations, the evolutionary paths can only take place along the p-axis. Contrary to the H&N model, individuals with some −1 alleles may have a large reproductive rate and leave descendants, because a perceptron with some wrong weights may be able to classify successfully an appreciable fraction of the examples. This situation is discussed analytically in the next section.

The fitness landscapes shown in the remaining panels of Fig. 1 correspond to our model. These always have gradual slopes due to the combined effects of learning and the genetic epistatic effects of rigid alleles. The landscapes change with the number of training examples M due to the longer learning sessions that all the individuals of the population undergo.

§Outside the line P + Q = L the reproductive success of each individual is strictly equal to its minimum, Ω = 1. The landscape shown is not discontinuous because of the averaging procedure described in the preceding paragraph. In a population with mean values p and q that are close to the line p + q = 1, there is a sizable probability of finding individuals with P + Q = L. These contribute to the mean fitness, producing a value of 〈Ω〉 that is slightly larger than the minimum.



Figure 1. Fitness landscapes. (a) Landscape obtained for the H&N model. Landscapes for the adaptive perceptron model: (b) no learning, (c) learning protocol with M = 100, and (d) with M = 500.

3. ANALYTIC TREATMENT OF PARTICULAR CASES

Some particular cases of our model can be studied analytically. One corresponds to a population of rigid perceptrons that have no ? alleles at all, so that learning is impossible. Fitness is only acquired through 'rigid' epistatic effects. Another case involves a population of adaptive perceptrons whose ability is tested with only one pattern; that is, with M = 1 in (2). Finally, we also analyse the extreme case of perfect learning, in which the initial guess of the weights is the best possible one for all the perceptrons in the population. In the following we compare the evolution of these limiting cases of our model with the model of H&N. As we will shortly see, both bear a great conceptual similarity.

We use the same analytical approach as Fontanari and Meir (1990), which considers an infinite¶ haploid, asexual population. At each generation, a new population is obtained by selection and random mutations. One can introduce the probabilities p_n, q_n and r_n of finding a given proportion of 1, ? and −1 alleles, respectively, in the nth generation. The fraction p_n of 'correct' alleles 1 may be written in terms of p_{n−1}, q_{n−1} and r_{n−1}, and of the individuals' reproductive rates. This is a random function of the patterns presented during the learning process. We assume that the reproductive rate of individuals only depends on their genotype composition, defined by the values of P, Q and R, and that it is well described by Ω̄(P, Q), the

¶Within this limit there is no genetic drift. Genetic drift is instead present in the numerical simulations with finite populations that are presented later.


average of Ω(P, Q) taken over the pattern distribution. Then:

p_n = Pmut + [(1 − d Pmut)/(L Z_{n−1})] Σ_{P=0}^{L} Σ_{Q=0}^{L−P} P Ω̄(P, Q) [L!/(P! Q! R!)] p_{n−1}^P q_{n−1}^Q r_{n−1}^R,   (3)

where p_{n−1} + q_{n−1} + r_{n−1} = 1, R = L − P − Q, Pmut is the mutation rate, d is the number of different alleles, and Z_n is a normalization constant given by

Z_n = Σ_{P=0}^{L} Σ_{Q=0}^{L−P} Ω̄(P, Q) [L!/(P! Q! R!)] p_n^P q_n^Q r_n^R.   (4)

Equations similar to (3) give the evolution of q_n and r_n, the fractions of plastic and 'wrong' alleles.
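One iteration of this recursion can be sketched as follows (our own illustration, with the averaged reproductive success Ω̄ passed in as a function and the sums running over all genotype compositions):

```python
from math import comb

def next_frequencies(p, q, Pmut, L, omega, d=3):
    """One step of the mean-field recursion, equations (3)-(4): the new
    allele frequencies are mutation plus the selection-weighted mean
    genotype composition of the previous generation."""
    r = 1 - p - q
    Z = sum_P = sum_Q = 0.0
    for P in range(L + 1):
        for Q in range(L - P + 1):
            R = L - P - Q
            # multinomial weight L!/(P! Q! R!) times genotype probability
            weight = (comb(L, P) * comb(L - P, Q)
                      * p**P * q**Q * r**R * omega(P, Q))
            Z += weight
            sum_P += P * weight
            sum_Q += Q * weight
    p_new = Pmut + (1 - d * Pmut) * sum_P / (L * Z)
    q_new = Pmut + (1 - d * Pmut) * sum_Q / (L * Z)
    return p_new, q_new
```

With a flat fitness Ω̄ ≡ 1 selection is absent and the recursion reduces to p_n = Pmut + (1 − d Pmut) p_{n−1}, which is a useful sanity check.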

Equations (3) and (4) can be used recursively to generate the evolutionary process once the average reproductive success is known. In our model,

Ω̄(P, Q) = 1 + (L − 1) f̄_M(P, Q),   (5)

where f̄_M(P, Q) represents the average of f_M over the possible sequences of M test patterns. This average can be determined analytically in the three particular cases mentioned above.

Consider first the population of rigid perceptrons (Q = 0). To write down the average fitness we note that the probability that an individual makes a mistake in the classification of one randomly chosen example, i.e., the probability that its output differs from that of the reference, is given by the generalization error. This is a random variable that depends on the distribution of examples. If these are selected at random with uniform probability, its average ε_g determines the fitness through

f̄_M = (1 − ε_g)^M.   (6)

ε_g can be expressed in terms of the angle between the separating hyperplanes of the reference and the individual. More precisely, if the perceptron has weights w, then ε_g = arccos(w · w∗/(|w||w∗|))/π. In the case of the Ising perceptrons considered here, this can immediately be expressed in terms of the total number of alleles, L, and the number P of correct connections:

ε_g(P) = (1/π) arccos(2P/L − 1).   (7)
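Equations (6) and (7) translate directly into code (a sketch; the function names are ours):

```python
from math import acos, pi

def generalization_error(P, L):
    """Equation (7): average generalization error of a rigid Ising
    perceptron with P of its L connections equal to the reference's."""
    return acos(2 * P / L - 1) / pi

def mean_performance(P, L, M):
    """Equation (6): average performance f_M of a rigid perceptron
    over a sequence of M random test patterns."""
    return (1 - generalization_error(P, L)) ** M
```

A perceptron with all connections correct (P = L) has ε_g = 0 and f̄_M = 1 for any M; one with half of them correct has ε_g = 1/2.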

The corresponding fitness function (6) is gradual in genotype space even if the individuals have no plastic alleles. This is so because there is always some probability that an individual with some wrong connections can properly classify some


of the M environmental stimuli. In the limit of a large population, 〈P〉/L ≅ p, so that function (6) corresponds to the intersection of the surfaces shown in Fig. 1(b), (c) and (d) with the plane q = 0.

Equations (3) and (4) are directly applicable to the case of rigid perceptrons by setting Q = 0, d = 2 and by using (6) and (7) for the fitness (5). The corresponding evolution curves, obtained by starting from the same initial population (p_0 = 0.25), are represented in Fig. 2 for a particular value of the mutation rate Pmut. The introduction of other mutation rates only decreases the values of p_n in the asymptotic regime due to random fluctuations in the genotypes of the population. This, however, does not alter the general picture. The corresponding evolutionary path in the fitness landscape is a climb along the line q = 0.

It is worth comparing the evolutionary path of the population of rigid perceptrons to the one obtained within the H&N model. In the latter case, as genomes having one or more −1 alleles have minimum reproductive success, these are eliminated during the first generations. After this irrelevant short transient, consisting of a random search of the 'ridge' located along the line p + q = 1, a steady climb takes place that is restricted to that line. If one assumes that all the −1 alleles have already been eliminated, the reproductive success, averaged over the number of learning trials, depends upon L and the number P of (correct) inherited 1 alleles through (Fontanari and Meir, 1990):

Ω^HN_G(P) = L − (L − 1) [1 − (1 − 2^(P−L))^G] / (G 2^(P−L)),   (8)

where G stands for the maximum number of guesses in the (random) learning protocol of the H&N model. For a large population, function (8) corresponds to the profile of the ridge along the line p + q = 1 in Fig. 1(a). Thus, both in the case of our model with rigid perceptrons as well as within the H&N model, the evolutionary paths proceed within restricted subspaces (the line q = 0 or the line p + q = 1) to reach the optimal phenotype. Equations (3) and (4) with R = 0, d = 2 and (8) for the fitness give the evolution of the H&N model after the transient. The result is represented in Fig. 2. These figures show the conceptual similarity of the H&N model and our case of rigid perceptrons.

Within our model, the evolution is very different if the perceptrons have plastic alleles. It is possible to study the evolution analytically if the adaptive weights are selected at random, so that the average fitness does not depend on the details of the learning process. This is the case in two extreme scenarios in which the synaptic connections used to evaluate the reproductive success are those set at random 'at birth'. This is the case when M = 1, because a single pattern is used to test the individuals' classification ability before learning starts. Then, the reproductive success is determined by the (average) generalization error, which is dominated by the most probable choice of the Q adaptive weights. This corresponds to one half of these equal to 1, and the other half to −1. The generalization error is that of a

Page 11: A Model for the Interaction of Learning and Evolution · Bulletin of Mathematical Biology (2000) 00, 1–18 A Model for the Interaction of Learning and Evolution H. DOPAZO∗ †,

Learning and Evolution 11

1.0

0.8

0.6

0.4

0.2

0.0

1.0

0.8

0.6

0.4

0.2

0.0

1.0

0.8

0.6

0.4

0.2

0.0

0 50 100 150 200 250 300 0 50 100 150 200 250 300

0 50 100 150 200 250 300

p q

r

Generations Generations

Generations

Pmut

= 0.01H & NQ = 0M = 1perfect

Pmut

= 0.01Q = 0M = 1perfect

Pmut

= 0.01H & NM = 1perfect

Figure 2. pn , qn and rn as a function of the generation number (genome length L = 20) formutation probability Pmut = 0.01, for the H&N model and the three extreme cases of ourmodel. The allowed number of (random) learning trials of the H&N model is G = 500.

perceptron with Pc synaptic efficacies equal to those of the reference,

$$
f_M(P, Q) = 1 - \frac{1}{\pi}\arccos\!\Bigl(\frac{2P_c}{L} - 1\Bigr),
\tag{9}
$$

where Pc = P + Q/2. In the other extreme case of our model, namely that of perceptrons whose weights are systematically set to their correct values at birth, the average fitness is also given by (9), but now with Pc = P + Q. The evolution of the adaptive perceptrons in these extreme situations follows from (3) and (4) with d = 3 and by using (9) for the fitness (5). The fractions of 1, ? and −1 alleles through the successive generations, pn, qn and rn, respectively, are represented in Fig. 2.
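The two analytic extremes of eq. (9) can be sketched directly (function names are ours; `f_gen` is the average classification success as a function of the overlap Pc with the reference):

```python
import math

def f_gen(Pc, L=20):
    """Eq. (9): average fraction of correct classifications of a perceptron
    sharing Pc of its L binary weights with the reference,
    f = 1 - (1/pi) * arccos(2*Pc/L - 1)."""
    return 1.0 - math.acos(2.0 * Pc / L - 1.0) / math.pi

def fitness_no_learning(P, Q, L=20):
    # M = 1: the Q plastic weights are random, half of them right on average
    return f_gen(P + Q / 2.0, L)

def fitness_perfect_learning(P, Q, L=20):
    # perfect learning: every plastic weight ends up correct at birth
    return f_gen(P + Q, L)
```

Note that the perfect-learning fitness depends only on P + Q, so every genotype on the line p + q = 1 is already optimal; this is the analytic form of the halting effect discussed below.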

The results of Fig. 2 indicate that the assimilation of 1’s into the genotype halts at greater distances as the learning effects become more noticeable in the evolutionary process. Note that pn saturates far from p = 1. In fact, for the case of perfect learning (Pc = P + Q), evolution stops when the adaptive walks hit the line p + q = 1 and not when the optimal genotype (P = L) is reached. Within this elementary model this stems from the fact that ? alleles are completely equivalent to 1 alleles. A similar, somewhat hindered, effect is found in the numerical simulations that are described in the next section.

4. THE ADAPTIVE WALKS

The adaptive walks have been calculated using a genetic algorithm, as explained before. However, in order to obtain statistically significant results, some averaging is performed. An ensemble of random populations is generated with the same given mean values p and q. Each of the members of this ensemble is taken as an initial population to run independent evolutionary processes using the genetic algorithm. We have chosen to stop the process after 300 generations. This number is such that the diversity introduced by mutation is not enough to give rise to a significant evolutionary change‖. In each generation, we average the values of p, q, r and Φ(p, q) that are obtained in each independent evolutionary process. The values pn, qn, rn and Φ(pn, qn) that are obtained in this way are henceforth referred to as ‘adaptive walks’ or ‘evolutionary paths’. The results are such that these paths lie on top of the corresponding surfaces shown in Fig. 1, showing no appreciable departure from them.
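A single evolutionary process of the kind averaged over above can be sketched as follows. The selection and mutation details here (roulette-wheel selection, per-site resampling) are our assumptions rather than the authors' exact genetic algorithm, and the fitness used is the perfect-learning landscape of eq. (9) with Pc = P + Q; the '?' allele is encoded as 0.

```python
import math
import random

def run_adaptive_walk(L=20, pop_size=100, generations=50, p_mut=0.01,
                      p0=0.25, q0=0.25, seed=1):
    """One evolutionary process: a population of genotypes over {+1, ?, -1}
    under fitness-proportional selection and per-site mutation.  Returns the
    final allele frequencies (p, q, r)."""
    rng = random.Random(seed)

    def new_genome():
        return rng.choices([1, 0, -1], weights=[p0, q0, 1 - p0 - q0], k=L)

    def fitness(g):
        Pc = sum(1 for a in g if a != -1)   # perfect learning: ? counts as 1
        return 1 + (L - 1) * (1 - math.acos(2 * Pc / L - 1) / math.pi)

    pop = [new_genome() for _ in range(pop_size)]
    for _ in range(generations):
        w = [fitness(g) for g in pop]
        # fitness-proportional (roulette-wheel) selection with replacement
        idx = rng.choices(range(pop_size), weights=w, k=pop_size)
        pop = [list(pop[i]) for i in idx]
        for g in pop:                        # mutation: resample the site
            for i in range(L):
                if rng.random() < p_mut:
                    g[i] = rng.choice([1, 0, -1])
    n = pop_size * L
    p = sum(g.count(1) for g in pop) / n
    q = sum(g.count(0) for g in pop) / n
    return p, q, 1 - p - q
```

Averaging the outputs of this routine over an ensemble of seeds yields the adaptive walks pn, qn and rn described in the text.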

In what follows we therefore only use the variables p and q. Since the reproductive rates turn out to also be averaged over the training set of examples, we again drop the symbols 〈· · ·〉 and the bar over Φ to ease the notation.

In the general situation of adaptive perceptrons (AP), evolution proceeds along a direction that approaches that of the maximum gradient of the surface Φ(p, q). It is possible to obtain an approximate picture of the orientation of the adaptive walks from the corresponding contour plots [see Fig. 3(a)].

As mentioned above, the fraction fM of correct classifications cannot be calculated analytically for each value of M. It can, however, be estimated in two extreme situations: the case M = 1, in which the learning algorithm is not applied, and the case of a perfect learning process. These two cases were discussed in the preceding section. The corresponding fitness landscapes therefore are:

$$
\Phi_1 = 1 + (L-1)\Bigl(1 - \frac{1}{\pi}\arccos(2p + q - 1)\Bigr),
\tag{10}
$$

$$
\Phi_\infty = 1 + (L-1)\Bigl(1 - \frac{1}{\pi}\arccos\bigl(2(p + q) - 1\bigr)\Bigr).
\tag{11}
$$

The lines of constant Φ are straight lines having slopes −1/2 or −1, respectively, for Φ1 and Φ∞. In Fig. 3(a) we also show the contour plots of the average fitness landscape for M = 500, obtained numerically. Note that these fall in between those of Φ1 and Φ∞. A similar contour plot for purely rigid perceptrons with no flexible alleles corresponds to horizontal lines, because the fitness function is independent of q. We therefore note that the slope of the contour lines increases as the plasticity plays an increasingly significant role. Correspondingly, evolutionary paths that have a common starting point and with less important learning effects should proceed to the left of those in which learning is more important. This can be checked in Fig. 3(b).
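The slope claim can be checked numerically: Φ1 depends on p and q only through 2p + q, so its gradient components satisfy ∂Φ1/∂p = 2 ∂Φ1/∂q and a contour obeys Δp/Δq = −1/2 (reading the slope as Δp/Δq, consistent with the quoted values), while Φ∞ depends only on p + q, giving Δp/Δq = −1. A small sketch with our own function names:

```python
import math

def phi_1(p, q, L=20):
    # Eq. (10): fitness landscape without learning (M = 1)
    return 1 + (L - 1) * (1 - math.acos(2 * p + q - 1) / math.pi)

def phi_inf(p, q, L=20):
    # Eq. (11): fitness landscape with infinitely effective learning
    return 1 + (L - 1) * (1 - math.acos(2 * (p + q) - 1) / math.pi)

def gradient(f, p, q, h=1e-6):
    """Central-difference estimate of (dF/dp, dF/dq)."""
    return ((f(p + h, q) - f(p - h, q)) / (2 * h),
            (f(p, q + h) - f(p, q - h)) / (2 * h))
```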

The numerical examples were calculated with Pmut = 0.01, M = 100 and M = 500. In addition, two sets of paths are generated corresponding, respectively, to the cases in which the learning algorithm is used throughout the lifetime of the individual and in which the learning algorithm is not applied and the plastic weights are determined at random at ‘birth’.

‖For large mutation probabilities the evolutionary process is not possible because successful genotypes are not preserved from one generation to the next.

Figure 3. (a) Contour plot of the fitness landscape for the adaptive perceptron model for different learning protocols. Increasing slopes are associated with improved learning abilities. (b) Evolution of the average population frequencies p and q for several models. Observe the different stagnation points of the different models. The one farther from the optimum (p = 1, q = 0) corresponds to individuals with greater learning abilities. The label ‘no learning’ corresponds to the case M = 1 in which the learning algorithm is not used.

In both cases, the evolution is qualitatively similar. If the starting random population is very close to the line p + q = 1, the fraction of 1’s increases upon evolution, mainly to the detriment of the ?’s.

When the initial population has small q, plasticity noticeably influences the evolutionary path. In that case there is a differential selective pressure to increase the ?’s at the expense of diminishing the −1’s. This is mainly because ? alleles are always beneficial: even if there is no learning, they have at least a probability of 1/2 of becoming a 1. In later stages the adaptive walks approach the line p + q = 1. Individuals that are allowed to learn for longer get closer to that line, showing the relevance of the full use of genetic plasticity through the learning algorithm.

An inspection of the evolutionary paths obtained with the AP model leads to the surprising result that the use of learning may not be an effective way of transcribing environmental data into the genetic information. Individuals that are allowed to learn have a reproductive success that outperforms that of individuals that do not learn. However, upon evolution, the latter get closer to the optimum. Therefore, as long as the evolutionary process is only regarded in terms of the transcription of environmental data into genetic information, we reach the conclusion that learning gives rise to a halting of this process, in opposition to what is usually understood as the Baldwin effect. This effect has already been noticed in the schematic models of the preceding section.

The evolutionary paths within the AP model can be compared to those of the two extreme, conceptually similar situations mentioned in the preceding section, namely the H&N model without ‘wrong’ alleles, and perceptrons without ? alleles. In the latter case the evolutionary path is confined to the p-axis, as q = 0, and mutations leading to the appearance of ? alleles are not allowed. In both cases selection is seen to drive the population to the immediate neighborhood of the optimal genome, getting closer to that target than populations of APs.

In the same figure we also present a typical evolutionary path obtained within the (full) H&N model (also with 300 generations), which shows the two different regimes mentioned in the preceding section: a random search of the configurations without −1’s on a flat fitness landscape, followed by the evolution confined to the p + q = 1 line. As in the case of a population having only rigid alleles, the optimal genome is rapidly reached. Experiments with initial conditions with similar values of p but a smaller value of q can hardly be considered a case of Darwinian evolution because the corresponding adaptive walks spend most of the time performing a random search of the ‘fitness ridge’ at p + q = 1.

Figure 4. (a) Fitness as a function of the generation number. (b) Frequency of 1 alleles, (c) of ? alleles and (d) of −1 alleles as a function of the generation number. The initial conditions are p = 0.25 and q = 0.01. Full circles correspond to a learning protocol with M = 100, empty circles correspond to no learning and crossed circles correspond to a population without plastic alleles. The results of H&N are not included because with these initial conditions evolution corresponds to a random search during more than the allowed 300 generations.

The features discussed above can also be analysed by looking at the curves of Fig. 4. The stagnation effect displayed by the adaptive walks of Fig. 3 shows up as an asymptotic constant value of the average frequency of the 1 alleles, in correspondence with a constant (high) value of the average fitness. In spite of having a lower value of p, the average fitness of the populations of learning individuals is greater. The opposite happens in the extreme case in which the ?’s are forbidden. In this case the asymptotic value of p is the largest, while fitness reaches a lower asymptotic value.

In Fig. 4(c) and (d) one can check that a population of learning individuals is indeed less efficient at eliminating the ? alleles, while it is the best at eliminating the −1’s. The worst situation, as far as fitness is concerned, corresponds to the case M = 1, in which individuals have plastic alleles but the learning protocol is not used. In this case selection is not efficient at eliminating either the −1 or the ? alleles. It is indeed reasonable to expect that if individuals are endowed with plasticity, this should be used for some ontogenetic adaptation.

5. CONCLUSIONS

We have presented a stylized model to study the interaction of the genetic and behavioral systems during evolution. We considered a population of perceptrons having some plastic synapses that can be adjusted through learning, using an algorithm that is a simple version of ‘on-line learning’. Each perceptron is presented with M examples, and its flexible synapses are adjusted whenever its classification is different from that of a reference perceptron that represents the environmental


conditions. The reference is assumed to have all synapses codified by 1 alleles, without any loss of generality. The reproductive success of each individual is measured in terms of the fraction of examples that are properly classified. Within this model, therefore, only a single optimal genotype exists, equal to that of the reference perceptron.
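The lifetime of one individual in this model can be sketched as follows. The specific on-line update used here (on a misclassification, each plastic weight is set to x_i · y_ref) is an illustrative assumption, not necessarily the authors' exact rule; the '?' allele is encoded as 0.

```python
import random

def classify(w, x):
    """Perceptron output: the sign of the weighted sum (ties count as +1)."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) >= 0 else -1

def lifetime(genome, M=100, seed=0):
    """Life of one individual: +1/-1 alleles are rigid weights, '?' alleles
    (encoded as 0) are plastic and drawn at random at 'birth'.  Each of the
    M random patterns is labelled by the reference perceptron (all weights
    +1); on error, plastic weights move toward x_i * y_ref.  Returns the
    fraction of correctly classified examples (the reproductive success)."""
    rng = random.Random(seed)
    L = len(genome)
    w = [a if a != 0 else rng.choice([-1, 1]) for a in genome]
    plastic = [i for i, a in enumerate(genome) if a == 0]
    correct = 0
    for _ in range(M):
        x = [rng.choice([-1, 1]) for _ in range(L)]
        y_ref = classify([1] * L, x)       # the environmental 'teacher'
        if classify(w, x) == y_ref:
            correct += 1
        else:
            for i in plastic:
                w[i] = x[i] * y_ref        # binary (clipped) Hebbian step
    return correct / M
```

An individual whose genome equals the reference (all 1’s) classifies every example correctly, which is why the optimal genotype is unique in this model.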

Our framework is an extension of the H&N model, which also assumes a single optimal genotype and includes genetic plasticity through the ? alleles. Learning is represented as a random search, and stops either if the correct configuration is found or if the maximum ‘learning time’ is exceeded. This process is a non-trivial way of defining a smooth fitness landscape in genotype space. However, this landscape only has non-vanishing slopes in the immediate neighborhood of the subspace p + q = 1 involving only sequences with 1 and ? alleles.

In our case the fitness landscape is the result of a much more delicate trade-off between how similar the genotype is to the optimal one and how much the classification performance can be improved by learning. The result of such a balance is that the fitness landscape has gradual slopes in the whole genotype space. Consequently, the corresponding adaptive walks unfold in the whole space spanned by the frequencies p, q and r of the alleles 1, ? and −1.

The evolutionary process amounts to substituting the −1’s and ?’s by 1’s, thus ‘transcribing’ the environmental data into genetic information. This process corresponds to an evolutionary path in the space of the frequencies p, q and r. Due to the limited features of the H&N model, this substitution only takes place in two phases. The first consists of the random search of the fitness ‘ridge’ in the region p + q = 1 that only involves genotypes with 1 and ? alleles. This stage can hardly be considered a Darwinian process, precisely because it proceeds at random, due to the absence of any differential fitness to drive it. The second phase consists of climbing the ridge in the single possible direction, which amounts to substituting ?’s by 1’s.

In our model the average fitness is the compound result of the constitutive classification performance of each individual and its learning ability. The ‘greater distance’ that is thus introduced between genotype and phenotype gives rise to a variety of adaptive walks in which the transcription of environmental information involves the simultaneous substitution of −1’s and ?’s by 1’s, at rates that are respectively established by the slopes of the fitness landscape. This model has been studied analytically within a schematic framework of infinite populations and also by performing numerical simulations in finite populations.

In the last stages of evolution we find that the result of the simultaneous presence of the two elements effectively halts the transcription of the environmental data into genetic information. This can be considered as a sort of hindrance of the Baldwin effect, just as suggested by Mitchell (1996). It is important to stress that this evolutionary halting is a direct consequence only of the learning ability. This in fact provides the extra success that is needed to compensate for a greater distance to the optimal genotype. This effect has also been found in a more dramatic fashion within the analytic approach. Within that framework, and in the case of perfect


learning, it is clear that the population is not forced to reach the optimum at p = 1because ? and 1 alleles are both equally useful.

These effects are clearly seen by comparing the evolutionary path of a population that is able to learn with one in which learning is inhibited, or even with the extreme situation of a population having only fixed alleles. In the last stages of the evolutionary process, the overhead paid by the ? and −1 alleles that are left is so small that there is no appreciable selection pressure to eliminate them. The result is a stagnation of the transcription process. On the other hand, a population of rigid genotypes is very good at getting close to the optimal genotype because that is the only way of surviving. A plastic system that is able to undergo developmental adaptations appears more efficient at eliminating the wrong information by changing it into plastic alleles. All these effects are magnified if learning is enhanced by allowing the algorithm to run with a greater number M of ‘training examples’.

The model that has been presented allows for several extensions that are currently in progress. One is to study the effect of an additional cost of the learning process, in order to take into consideration that there are no ‘free lunches’ in nature. As far as the Baldwin effect is concerned, more important effects are, however, to be expected from the inclusion of changes in the environmental parameters.

ACKNOWLEDGEMENTS

One of the authors (HD) wishes to acknowledge the support of CONICET through a postdoctoral fellowship, and the Santa Fe Institute for support of his participation in the Summer School on Complex Systems (1997), where some early discussions about this subject took place. SR-G, MG and RP acknowledge economic support from the EU research contract ARG/B7-3011/94/97. HD and RP hold a UBA research contract, UBACYT PS021, 1998/2000. MG is a member of the CNRS.

REFERENCES

Ackley, D. and M. Littmann (1992). Interactions between learning and evolution, in Artificial Life II, G. C. Langton, C. Taylor, J. Farmer and S. Rasmussen (Eds), Addison-Wesley.

Baldwin, J. M. (1896). A new factor in evolution. Am. Nat. 30, 441–451.

Belew, R. K. (1990). Evolution, learning, and culture: Computational metaphors for adaptive algorithms. Complex Syst. 4, 11–49.

Fontanari, J. F. and R. Meir (1990). The effect of learning on the evolution of asexual populations. Complex Syst. 4, 401–414.


French, R. and A. Messinger (1994). Genes, phenes and the Baldwin effect: Learning and evolution in a simulated population, in Artificial Life IV, R. Brooks and P. Maes (Eds), MIT Press.

Goldberg, D. (1989). Genetic Algorithms in Search, Optimization and Machine Learning, Addison-Wesley.

Harvey, I. (1993). The puzzle of the persistent question marks: A case study of genetic drift, in Proceedings of the Fifth International Conference on Genetic Algorithms, S. Forrest (Ed.), Morgan Kaufmann.

Hertz, J., A. Krogh and R. G. Palmer (1991). Introduction to the Theory of Neural Computation, Redwood City, CA: Addison-Wesley.

Hinton, G. E. and S. J. Nowlan (1987). How learning can guide evolution. Complex Syst. 1, 495–502.

Ho, M. W., C. Tucker, D. Keeley and P. T. Saunders (1983). Effect of successive generations of ether treatment on penetrance and expression of the bithorax phenotype in D. melanogaster. J. Exp. Zool. 225, 357–368.

Jablonka, E. and M. Lamb (1995). Epigenetic Inheritance and Evolution: The Lamarckian Dimension, Oxford: Oxford University Press.

Maynard Smith, J. (1987). When learning guides evolution. Nature 329, 761–762.

Minsky, M. and S. Papert (1969). Perceptrons, MIT Press.

Mitchell, M. (1996). An Introduction to Genetic Algorithms, MIT Press.

Scharloo, W. (1991). Canalization: genetic and developmental aspects. Ann. Rev. Ecol. Syst. 22, 65–93.

Waddington, C. H. (1942). Canalization of development and the inheritance of acquired characters. Nature 150, 563–565.

Received xx month 2000 and accepted xx month 2000