12
J . theor . Biol . (1995) 175, 161–171 0022–5193/95/140161+11 $12.00/0 7 1995 Academic Press Limited Evolution of Altriusm in Optional and Compulsory Games J B P KDepartments of Cognitive Science and Philosophy , University of California at San Diego , La Jolla , CA 92093-0302, U.S.A. (Received on 14 September 1993, Accepted in revised form on 3 January 1995) In ‘‘optional’’ variants of the iterated prisoner’s dilemma, players may choose whether or not to participate. Members of evolving populations playing optional variants of the iterated prisoner’s dilemma by following inherited strategies tend to cooperate more than do members of populations playing the standard, ‘‘compulsory’’ version. This result is due to dynamical properties of the evolving systems: the populations playing the compulsory game can become stuck in states of low cooperation that last many generations, while the optional games provide routes out of such states to states of high cooperation. Computational simulations of the evolution of populations playing these games support these analytic results and illustrate the interactions between the genetic representation of strategies and the composition of populations in which those strategies are deployed. 7 1995 Academic Press Limited 1. Introduction Contemporary biological discussions of the evolution of altruism define altruistic behavior as that which increases the fitness of another animal at a cost in fitness to the animal engaging in the behavior. In the past decades, study of the evolution of genetic dispositions to altruistic behavior in this sense has been advanced by considering two special instances: (i) cases in which the animal’s own reproductive losses are made up through increases in the reproductive successes of kin, and (ii) cases in which the short-term losses of altruistic actions are made up through a system of reciprocation. Separating these instances is heuristi- cally helpful, but artificial, since it is possible that interactions with relatives might involve short-term fitness losses that are made up through contributions from both sources. Our concern is with cases of type (ii) and with a modification of what has become the standard way of dealing with such cases. Reciprocal altruism was originally introduced in a seminal paper by Robert Trivers (1971). Following the work of Axelrod and Hamilton (Axelrod & Hamilton, 1981; Axelrod, 1984), reciprocal altruism has been explored by considering strategies for playing iterated Prisoner’s Dilemma (henceforth PD). Initially it appeared that the strategy Tit -for -Tat would be evolutionarily stable (in the sense of Maynard Smith & Price, 1973; see Maynard-Smith, 1982), but further investigation has shown that it is not so (Boyd & Lorberbaum, 1987; Farrell & Ware, 1989; Mesterton-Gibbons, 1992). Indeed, the many discus- sions of the Axelrod–Hamilton approach have revealed unsuspected complexities in the selection of strategies for playing the iterated Prisoner’s Dilemma (Dugatkin & Wilson, 1991; Peck & Feldman, 1986; Lindgren 1992). Our aim in this paper is to suggest a modification of the Axelrod–Hamilton scenario, and to explore the dynamics of the selection of altruistic strategies. We begin by presenting a new version of the Prisoner’s Dilemma in which players have the option of not participating in interactions. Analytical evaluations of strategies for this game are presented to show that populations playing the optional games will achieve states of high cooperation more reliably than will populations playing the standard, compulsory, Prisoner’s Dilemma. We then present the results of E-mail: batali.cogsci.ucsd.edu or pkitcher.ucsd.edu 161

Evolution of altriusm in optional and compulsory games

Embed Size (px)

Citation preview

Page 1: Evolution of altriusm in optional and compulsory games

J. theor. Biol. (1995) 175, 161–171

0022–5193/95/140161+11 $12.00/0 7 1995 Academic Press Limited

Evolution of Altriusm in Optional and Compulsory Games

J B† P K‡

Departments of †Cognitive Science and ‡Philosophy, University of California at San Diego,La Jolla, CA 92093-0302, U.S.A.

(Received on 14 September 1993, Accepted in revised form on 3 January 1995)

In ‘‘optional’’ variants of the iterated prisoner’s dilemma, players may choose whether or not toparticipate. Members of evolving populations playing optional variants of the iterated prisoner’s dilemmaby following inherited strategies tend to cooperate more than do members of populations playing thestandard, ‘‘compulsory’’ version. This result is due to dynamical properties of the evolving systems: thepopulations playing the compulsory game can become stuck in states of low cooperation that last manygenerations, while the optional games provide routes out of such states to states of high cooperation.Computational simulations of the evolution of populations playing these games support these analyticresults and illustrate the interactions between the genetic representation of strategies and the compositionof populations in which those strategies are deployed.

7 1995 Academic Press Limited

1. Introduction

Contemporary biological discussions of the evolutionof altruism define altruistic behavior as that whichincreases the fitness of another animal at a cost infitness to the animal engaging in the behavior. Inthe past decades, study of the evolution of geneticdispositions to altruistic behavior in this sense has beenadvanced by considering two special instances: (i) casesinwhich the animal’s own reproductive losses aremadeup through increases in the reproductive successes ofkin, and (ii) cases in which the short-term losses ofaltruistic actions are made up through a system ofreciprocation. Separating these instances is heuristi-cally helpful, but artificial, since it is possible thatinteractions with relatives might involve short-termfitness losses that are made up through contributionsfrom both sources.

Our concern is with cases of type (ii) and with amodification of what has become the standard wayof dealing with such cases. Reciprocal altruism wasoriginally introduced in a seminal paper by RobertTrivers (1971). Following the work of Axelrod and

Hamilton (Axelrod & Hamilton, 1981; Axelrod, 1984),reciprocal altruism has been explored by consideringstrategies for playing iterated Prisoner’s Dilemma(henceforth PD). Initially it appeared that the strategyTit-for-Tat would be evolutionarily stable (in the senseof Maynard Smith & Price, 1973; see Maynard-Smith,1982), but further investigation has shown that it is notso (Boyd & Lorberbaum, 1987; Farrell & Ware, 1989;Mesterton-Gibbons, 1992). Indeed, the many discus-sions of the Axelrod–Hamilton approach haverevealed unsuspected complexities in the selection ofstrategies for playing the iterated Prisoner’s Dilemma(Dugatkin & Wilson, 1991; Peck & Feldman, 1986;Lindgren 1992). Our aim in this paper is to suggest amodification of the Axelrod–Hamilton scenario, andto explore the dynamics of the selection of altruisticstrategies.

We begin by presenting a new version of thePrisoner’s Dilemma in which players have the optionof not participating in interactions. Analyticalevaluations of strategies for this game are presented toshow that populations playing the optional games willachieve states of high cooperation more reliably thanwill populations playing the standard, compulsory,Prisoner’s Dilemma. We then present the results ofE-mail: batali.cogsci.ucsd.edu or pkitcher.ucsd.edu

161

Page 2: Evolution of altriusm in optional and compulsory games

. . 162

computer simulations of these games that confirm theanalytic results, and also illustrate the dynamics ofstrategy changes in different kinds of game.

1.1. ’

In each round of the standard Prisoner’s Dilemma,each of two players must choose one of two actions: C,(Cooperate) or D (Defect). After each player haschosen, a payoff for each is computed from a payoffmatrix as follows:

C D

R TC R S

S PD T P

withR=reward formutual cooperation;T=‘‘traitor’’payoff to defector if other cooperates; S= ‘‘sucker’’payoff to cooperator if other defects; P= payoff formutual defection, where TqRqPqS and 2RqT+S.

This standard game is compulsory in the followingsense: first, theplayers haveno choice of partner but areforced to interact with the assigned individual; second,they have no choice but to play: there is no individualasocial behavior available to them. In focusing on theiterated Prisoner’s Dilemma, Axelrod and Hamiltonconsidered the evolution of altruism in a population ofanimals that is forced into social activity, so that theproblem is posed in terms of the victory of cooperationover anti-social behavior.Aswe shall explainbelow,webelieve that it may be more realistic to consider theevolution of altruism in situations in which thepossibility of asociality is present, and that this willmake a difference to the evolutionary scenarios.

Various authors have considered the possibility thatanimals facing repeated game-like situations mighteither have an asocial option (Miller, 1967; Lima,1989) or be able to discriminate partners (Noe et al.,1991; Peck, 1990; Bull & Rice, 1991), but we believethat the concept of an optional game hasnot previously been clearly articulated (the closestapproach we know is that of Stanley et al., 1994). Inan optional game, individuals have the possibility ofsignaling willingness to play to other individuals in thepopulation. When signals of mutual willingness aregiven, the individuals play the game together.An animal that is unwilling to play with any of theanimals willing to play with it is forced to opt out,performing some asocial behavior. We allow that suchasocial behavior might have some intermediate benefitto the player, which we represent by the payoff W. Inthe cases of interest, TqRqWqPqS, i.e. theopting-out payoff is less advantageous to both players

than is mutual cooperation, but is more advantageousto both players than is mutual defection. Optionalgames of this sort will be referred to as ‘‘fully optional’’versions of the iterated Prisoner’s Dilemma.

Intermediate between fully optional and compul-sory games are ‘‘semi-optional’’ games. These are oftwo types. In one version, individuals have no choiceof partners, but they do have the ability to opt out. Inthe other, they cannot opt out, but can signalwillingness to play with particular partners. The lattertype of game will be of no concern to us, and we shallhenceforth use ‘‘semi-optional’’ game to refer only tothose games in which there is the possibility of optingout, but no choice of partners. We hope that thedistinction between compulsory, fully optional, andsemi-optional games is now sufficiently clear, andproceed with an illustrative example.

1.2. -

Mutual grooming among primates serves thefunction of removing parasites from the fur. An animalthat grooms another incurs a cost, depending on thelength of time taken up in grooming, an animal that isgroomed receives a benefit, which we can also assumeto be proportional to the amount of time spend ingrooming. Assume that there are two possibilities foreach animal: to groom for a long period (C) or togroom for a short period (D). Let the cost of groomingfor a short period be c0, the benefit from a short periodof grooming be b0, the cost of a long period ofgrooming be c2, and the benefit from a long period ofgrooming be b2. Then, in a single interaction betweentwo animals, the payoff matrix will be

C D

b2−c2 b2−c0

C b2−c2 b0−c2

b0−c2 b0−c0

D b2−c0 b0−c0

Since b2qb0 and c2qc0, it is clear that b2−c0 is thehighest payoff and b0−c2 is the lowest. We mayreasonably assume that a more thorough groomingis worth the extra time spent, and thus thatb2−c2qb0−c0. Given this inequality, it follows that2(b2−c2)qb2−c2+b0−c0=(b2−c0)+(b0−c2). Thusthe payoff matrix may be rewritten as:

C D

R TC R S

S PD T P

Page 3: Evolution of altriusm in optional and compulsory games

163

with: R=b2−c2; T=b2−c0; S=b0−c2; P=b0−c0; andso TqRqPqS and 2Rq(T+S). These are theconditions for a compulsory Prisoner’s Dilemma.

But our primates surely have another option—theycan groom themselves. Presumably the payoff fromself-grooming is less than one would receive from anearnest grooming job from another, but greater thanthe payoff from a more desultory performance. Thiscan be modeled by assigning an intermediate costc1, and an intermediate benefit b1 for self-grooming.The payoff from self-grooming is thus b1−c1, andwe will assume that b2−c2qb1−c1qb0−c0. Giventhat primates can also signal willingness to engage ingrooming interactions with some animals and toforswear grooming interactions with others, we canexpect that grooming interactions among primatestake the form of an optional Prisoner’s Dilemma,where the opt-out payoff W is intermediate betweenR (the reward for mutual cooperation) and P (thepenalty for mutual defection). To understand theevolution of mutual grooming under natural selec-tion, we therefore need to consider the evolution ofcooperative behavior in iterated optional Prisoner’sDilemmas.

Our illustration is loosely based on recentdiscussions of the social lives of primates (forexample Smuts et al., 1987). Plainly, it would bepossible to test our assumptions about the ordering ofcosts and benefits by assessing the contributions tofitness of various commitments of time and of variousstates of parasitic infestation. Until such testing isdone, we can only suggest that it is plausible that thispart of primate social life may be understood interms of optional Prisoner’s Dilemmas. We alsothink it likely that optional games will prove usefulin understanding cooperative hunting, cooperativeforaging, mate-seeking coalitions, systems ofdefense against predation, and cooperation amongfemales in exercising mate choice. But in allthese cases the promise of the approach werecommend must be assessed in the light of fieldstudies.

2. Analytic Results for Simple Strategies

Consider the following very simple strategies forplaying an iterated optional Prisoner’s Dilemmawith the standard payoffs T, R, P, S and the opt outpayoff W:

Solo: always opt out;Undiscriminating Altruist: always interact and

always play C;Discriminating Altruist: interact with any animal

that is willing to interact with you provided that thatanimal has never previously defected on you, andcooperate in any such interaction, (or, if there are nosuch animals, opt out);

Undiscriminating Defector: always interact andalways play D;

Discriminating Defector: interact with any animalthat is willing to interact with you, provided that thatanimal has never previously defected on you, anddefect in any such interaction (or, if there are no suchanimals, opt out).

We now consider, from a standard evolutionarygame-theoretic perspective, the fitness relationshipsamong some of these strategies under variousconditions. Throughout we shall suppose that thepopulation of animals with we are dealing has size N,and that the average number of occasions that ananimal has to play the optional Prisoner’s Dilemmaduring its lifetime is M. We assume that M issignificantly larger than N, an assumption that we baseon the idea that animals have frequent opportunitiesfor playing the game and that they interact with arelatively small number of conspecifics. (Think of thenumber of times primates groom one another duringtheir lifetimes and the relatively small sizes ofthe groups of conspecifics with which they interact.)Since our strategies involve the ability of animals torecognize one another, to recognize when another hasdefected on them, and to remember past defections, weare obviously supposing that our animals havesubstantial cognitive capacities. We are encouragedboth by recent studies of the cognitive lives of primates(Cheney & Seyfarth, 1990) and by investigationsof cooperative behavior in guppies (Dugatkin &Alfieri, 1991). However, these assumptions aboutthe cognitive capacities of the animals concernedare plainly in tension with the ‘‘like begets like’’assumption of evolutionary game theory (MaynardSmith, 1982). For the moment, we shall simplysuppose that ‘‘like begets like’’ is a good approximaterule of thumb. Later sections will explore thedynamics of the evolutionary game in a less simplisticway.

1.

Suppose that the population contains N−nDiscriminating Defectors and n Solos (nq0). Thepayoff to each Solo is MW (on each of the M occasionson which the opportunity arises, a Solo opts out

Page 4: Evolution of altriusm in optional and compulsory games

. . 164

for payoff W). The payoff to each DiscriminatingDefector is:

(N−n−1)P+(M−N+n+1)W,

since each Discriminating Defector plays with eachother Discriminating Defector just once, each attempt-ing to exploit the other and receiving payoff P; when allhave been tested, all opt out. Since WqP, the payoffto Solos is greater than that to DiscriminatingDefectors. A population fixed for DiscriminatingDefector should be invaded by Solo, and Solo shouldsweep to fixation.

1.1.

The payoff to Undiscriminating Defector is less thanthat for Discriminating Defector. The payoff to Solo isthe same as in Result 1.

2.

In a population consisting entirely of Solos, a singlemutant Discriminating Altruist is indistinguishablefrom the other members. If there are n (q1)Discriminating Altruists in a population with N−nSolos, then the payoff to each Discriminating Altruistis MR (provided that n is even) and (n−1)Mr/n+MW/n (when n is odd). (These provisions are neededsince, with an odd number of Discriminating Altruists,if the decision opportunities arise for all simul-taneously, arithmetical considerations dictate that onewill have to opt out; the probability that any particularDiscriminatingAltruist is the unlucky one is 1/n.)Givennq1, it is trivial that the payoff to a DiscriminatingAltruist exceeds that to aSolo, for all values of n.HenceDiscriminating Altruist can be expected to invade apopulation of Solos, and to sweep to fixation.

2.1.

,

;

,

If a population of Solos containing a singleDiscriminating Altruist mutant comes to have a singleDiscriminating Defector mutant, then the Discriminat-ing Altruist mutant will be selected against. Thecondition for Discriminating Altruists to enter thepopulation when Discriminating Defector mutationsare also likely to arise is that the order of mutations be[Discriminating Altruist, Discriminating Altruist]rather than [Discriminating Altruist, DiscriminatingDefector]. If both mutations arise at the samefrequency, we can expect that the probability that

Discriminating Altruists will drift into a populationof Solos will be p/2, where p is the probability thatDiscriminating Altruists would increase in frequency ina population of Solos without Discriminating Defectormutations (i.e. a population like that described inResult 2).

3.

,

,

.

Suppose that a population contains n DiscriminatingAltruists and N−n Discriminating Defectors wherenq1. The worst case for a Discriminating Altruist is tobe exploited by each Discriminating Defector and thento spend the rest of the decision opportunitiescooperating with other Discriminating Altruists (or,occasionally, opting out if the number of Discriminat-ing Altruists is odd). This means that the payoff to aDiscriminating Altruist is bounded below by

(N−n)S+(M−N+n)[(n−1)R/n+W/n].

The best a Discriminating Defector can expect todo is to exploit each of the Discriminating Altruistsonce, and opt out on the remaining occasions. Hencethe payoff to a Discriminating Defector is boundedabove by

nT+(M−n)W.

The condition for Discriminating Altruists to increasein frequency under selection is thus

(N−n)S+(M−N+n)[(n−1)R/n+W/n]

qnT+(M−n)W.

This reduces to

M/NqK+Hn/N,

where

K=[(n−1)R+W−nS]/(n−1)(R−W);

H=[n(T+S)−(n+1)W−(n−1)R]/(n−1)(R−W).

When n is large (approximately N), the crucialcondition for the maintenance of DiscriminatingAltruists is

M/Nq(T−W)/(R−W),

Page 5: Evolution of altriusm in optional and compulsory games

165

which is clearly satisfied if the reward for cooperatingis significantly larger than the payoff for opting out, thepayoff to exploiters is not too big, and the number ofdecision opportunities is sufficiently large relative tothe population size. Hence it will be possible forDiscriminating Altruists to resist invasion by Discrimi-nating Defectors. At the other extreme, when n is small,the worst case is given by n=3. Since N�3, we canignore the term in H, and approximate the inequalityby

M/Nq(2R+W−3S)/2(R−W).

As before, if the reward for cooperation is largein relation to the payoff for opting out, and ifthe number of decision opportunities is large inrelation to population size, Discriminating Altruistscan increase in frequency against DiscriminatingDefector.

These results are encouraging, for they suggestthat, contrary to our naıve expectations, it might bevery hard for anti-social or asocial behavior to beevolutionarily sustainable. By Result 1, anti-socialpopulations are likely to decay into states of asociality.ByResult 2, asocial populationsare likely tobe invadedby Discriminating Altruists, and, given Result 3,Discriminating Altruists can resist invasionby Discriminating Defectors. The only problem fora population of Discriminating Altruists is thatUndiscriminating Altruists can drift in unnoticed, and,once thepopulationhas a sufficient numberof them it isripe for invasionbyDiscriminatingDefector (or evenbyUndiscriminating Defector). Nevertheless, the combi-nation of Results 1 and 2, and also Result 3, show thatDiscriminating Altruists can stage a comeback. Ouranalysis reveals that, while altruism may not be stable,the absence of altruism is also unstable. Moreover,when fitness differences are marked, we can expect thatpopulations will spend most of their time in states ofhigh cooperation, with occasional crashes and briefrecovery periods. As we shall see later, thisoptimistic expectation is confirmed by computersimulations.

We now briefly explore the consequences ofsupposing that interactions are not fully optional.Animals are paired at random, and can either playwith their assigned partner or opt out. As before,Solos always opt out, Discriminating Altruists playif and only if the assigned partner has not previouslydefected on them (and opt out otherwise) andthey cooperate when they play, DiscriminatingDefectors also play if and only if the assigned partnerhas not defected on them and they defect when theyplay.

4.

Initially Discriminating Altruists are indistinguish-able from Solos. Once there are two (or more)Discriminating Altruists there is a non-zero probabilitythat they will be paired, and, on such occasions,each will receive R, a payoff that exceeds the opt outpayoff W. So the fitness of Discriminating Altruists canbe written as M(rR+(1−r)W) where rq0, whichexceeds the payoff for Solos of MW.

5.

In a population of Discriminating Defectors,the payoff to a Discriminating Defector will beM(rP+(1−r)W), where rq0 (r is now the probabilitythat two Discriminating Defectors who have neverpreviously met are paired). The payoff to Solo is MW,and, since WqP, Solo has a selective advantage.

6.

,

-

Suppose the population has n DiscriminatingAltruists and (N−n) Discriminating Defectors. Theexpected number of encounters of a DiscriminatingAltruist with a Discriminating Defector is M(N−n)/N.The total payoff to a maximally unlucky Discriminat-ing Altruist from these encounters is(N−n)S+M(N−n)W/N. The rest of the timeDiscriminating Altruists are paired with one anotherfor total payoff MnR/N. The total payoff forDiscriminating Altruists is thus bounded below by

(N−n)S+M[(N−n)W+nR]/N.

The total payoff for Discriminating Defector isbounded above by

(N−n)T+(M−N+n)W.

Discriminating Altruists will have a selective advantageprovided that

Mn(R−W)/Nq(N−n)(T−S−W).

When Discriminating Altruists are prevalent (n is closeto N), this condition becomes

M/Nq(T−S−W)/(R−W),

which is very similar to the condition in thefully-optional game.WhenDiscriminatingAltruists are

Page 6: Evolution of altriusm in optional and compulsory games

. . 166

rare (nq1, but small in relation to N), increase ofDiscriminating Altruists requires that

2M/N2q(T−S−W)/(R−W).

This is farmore exacting than the condition ofResult 3,and it is thus only in special cases (M extremely large)that we could expect a small number of DiscriminatingAltruists to invade a population fixed for Discriminat-ing Defector in the semi-optional game.

From Results 4–6, we can expect that the dynamicsof the evolution of cooperative behavior in semi-optional games will resemble that in the fully-optionalcase, but that recovery from crashes is likely tobe slower and mediated by the presence of Solos.Intuitively, the direct route from DiscriminatingDefector to Discriminating Altruist is now partiallyblocked, and, very frequently, only the trajectory fromDiscriminating Defector to Solo to DiscriminatingAltruist will be available.

3. Computational Simulations

The above analytical results are based on assumingthat the populations are relatively simple in two ways:first, that the populations consist of individuals whouse one of a small set of strategies; second, that thestrategies are chosen from the set of simple strategiesdescribed in Section 2. In order to investigate theproperties of more heterogeneous populations, and ofpopulations containing individuals following morecomplex strategies, we performed a number ofcomputational simulations of populations of playerswho participate in the iterated Prisoner’s Dilemma byfollowing inherited strategies. Our simulation resultssupport and expand upon the analytic results, and alsoillustrate how the genetic representation of strategiescan influence the evolutionary dynamics of popu-lations whose members deploy those strategies.

This section describes the algorithms used in thecomputational simulations. The next section presentsthe results of those simulations, and describes thedynamics of a few of the runs we performed.

In the simulations, the actions performed by eachplayer are represented as a history sequence. Thelengths of histories recorded in our simulations variedfrom 2 to 4. For example the sequence (C C) indicatesthat a player cooperated on both of the previous tworounds; the sequence (D C) indicates that a playerdefected two rounds ago, but cooperated the lastround. The symbol N is used when the players haven’tplayed as many rounds as the history records. Thus fora history of length 2, the sequence (N N) indicates thatno rounds at all have been played; the sequence (N C)

means that a single round was played, and the playercooperated.

Strategies are represented by pairing each possiblehistory of opponent’s actions with the action to makein response the next round. This pairing of a historywith a response action will be called a ‘‘move’’. Thefollowing move represents the response of defecting ifthe opponent cooperated twice in a row:

((C C) D)

This move represents the action of cooperating in thefirst round:

((N N) C)

Given a specific history length, a complete strategycontains a move for each possible history sequence ofthat length. For example this strategy representsthe Tit-for-Tat strategy in which two steps of historyare recorded:

((N N) C)((N C) C)((N D) D)((C C) C)((C D) D)((D C) C)((D D) D)

In this strategy, the player begins by cooperating andthen responds with whatever its opponent did the lastround.

A sequence of rounds between two players issimulated by using the strategies of the two players todetermine their moves for each round, dependingon what the other player did the last rounds. Eachplayer receives an increment to a ‘‘fitness’’ valueaccording to this payoff schedule:

Payoff Explanation Value

T Defect if other cooperates 7R Both cooperate 5W Opt out 3P Both defect 2S Cooperate if other defects 0

In a generation, each player plays against each otherplayer in the population some number of rounds.

For example consider a simulation of thecompulsory game in which one step of history isrecorded. Two players with the following strategies arechosen to play the game:

((N) C) ((N) D)((C) C) ((C) D)((D) D) ((D) C)Player one Player two

Page 7: Evolution of altriusm in optional and compulsory games

167

The players begin their interactions with fitnessvalues of 0. In the first round both of the histories are(N), so player one will cooperate and player two willdefect. Player one’s fitness will remain 0, while playertwo’s fitness will be set to 7. In the second round, playerone consults its strategy with the history (D) anddefects. Player two uses the history (C) and alsodefects. Both players fitness values are incrementedby 2. On the third round, player one defects again,but player two cooperates. Hence player one receives7 and player two receives 0. The next round player onecooperates again and so does player two, so they bothreceive 5 points. The next round player one cooperatesbut player two defects. After this point the histories areidentical to that after the first move, and so the playersperformas they did then, and continue to cycle throughthe same sequence of moves.

In the first generation of a simulation, each of theplayers is assigned a random strategy—the responseaction to each possible history sequence is randomlychosen from the available moves. At the end of eachsubsequent generation, the set of players is sorted inorder of decreasing value of the total fitness payoffseach received while playing against the other players.The top third of the players is preserved into the nextgeneration, and those players are also used to createthe strategies of the rest of the players in the nextgeneration. Each new strategy is created by mixingthe strategies of two of the most successfulplayers—for each possible history, the responseaction is taken randomly from one or the otherparent’s strategy. Mixing strategies in this way has theeffect of rapidly distributing advantageous movesthrough the population. A small fraction (for most ofour runs: 1%) of the moves are then mutated byreplacing the action part of the move with a randomlychosen action.

In each generation, a record is kept of the totalnumber of moves of each type: ‘‘cooperate’’, ‘‘defect’’,and for the optional games, ‘‘opt out’’. At the end ofa generation the average fitness of the population isalso recorded. A sample run of the compulsory gameis shown in Fig. 1. As is typical for the runs reportedhere, the population moves through a number of statesin which the levels of cooperation and defection arefairly stable for tens of generations or longer. Rapidtransitions then occur, yielding other stable states.

In this run the population quickly enters a state ofhigh cooperation and fitness. Around generation 70, itreverts to a state of virtually 100% defection and lowfitness. This is followed at generation 150 with a stateof 50% cooperation, 50% defection and anintermediate fitness value. Around generation 310, thepopulation again finds a state of very high cooperation.

The results of an entire run are summarized with twonumerical values:

The ‘‘cooperativity’’ measure is meant to quantifythe degree to which cooperative behavior dominatedduring the run. Cooperativity is defined as the fractionof generations during a run when the differencebetween the percentages of cooperative moves anddefection moves is greater than a threshold of 25.The cooperativity value for the run shown in Fig. 1 is0.494. (The precise value of the threshold forcomputing the cooperativity value is not crucial. Forexample changing the threshold to 70 for the run inFig. 1 changes the cooperativity value from 0.494 to0.488. This is because the runs tend to remain in stateswhere either cooperation or defection is relatively high,and the other is correspondingly low.)

The ‘‘instability’’ measure is meant to quantify thedegree to which the amount of cooperation varies fromgeneration to generation. This is defined as the averageof the square of the difference between the number ofcooperative moves in successive generations. Theinstability value for the run shown Fig. 1 is 19.41. Theinstabilitymeasure increases if a run entersmore states,or if the states that it enters do not have constant valuesof cooperation. The value of the instability measure forsimulations depends on the simulation parameters.For example in a set of 20 runs of 500 generations ofthe compulsory game with 36 players, a history oflength 2, and 10 rounds between each pair of players,increasing the fraction of moves mutated eachgeneration from 0.1% to 10% changed the averageinstability value from 3.28 to 28.37. As is shown below,the instability value is also strongly affected by whetherthe game is compulsory or optional.

Two versions of the optional game were simulated:the ‘‘semi-optional’’ and the ‘‘fully-optional’’ game. Inthe semi-optional game players are paired off as in thecompulsory game, and each pair plays some number ofrounds against each other. If either player chooses toopt out in any move, both players receive the opt-outpayoff W.

In the fully optional game, players first attempt tolocate other players who will not opt out against them.Such pairs then play one round against each other asin the compulsory game, and each player keeps arecord of what its opponent does which it consults thenext time they are looking for partners.All playerswhodo not find a willing partner in a given round areawarded the opt-out payoff. The fully optional gamerequires more computational overhead to simulate,and the runs take much longer, than the semi-optionalgame, because each player must record all of thehistories of its interactions with all of the players it hasplayed against in a generation; and because the process

Page 8: Evolution of altriusm in optional and compulsory games

. . 168

T 1Mean Standard Mean Standard

Game cooperativity deviation instability deviation

Compulsory 0.105 0.160 5.28 3.80Semi-optional 0.719 0.283 13.2 5.96

of pairing off the players is more complicated than forthe semi-optional game.

Runs of the optional games are illustrated anddiscussed below.

4. Simulation Results

The results of this study can be summarized inTables 1 and 2. The statistics in Table 1 are for a setof 27 runs, each of 500 generations. There were 60players in each run, two steps of history were recorded,and each pair of players played 30 rounds eachgeneration. As can be seen from Table 1, thecompulsory game yields populations which are morestable but less cooperative than those that play thesemi-optional game.

Fewer runs of the fully optional game were done, asthey take much longer. However the general result issimilar. Table 2 shows results for 12 runs of each of theindicated games of 500 generations each, with 30players, two steps of history were recorded and theplayers played against each other an average of tenrounds.

The reason that the statistics for the compulsorygame in Table 2 are different from those in Table 1 isthat the algorithm for pairing off players in this set ofruns corresponded to that used for the fully optionalgame, except that no player could opt out. In eachround of the game, a player was paired with anotherplayer in the population at random. The fact that somepairs played fewer (and some played more) roundsthan the average of ten is reflected in the higherinstability values for these runs as compared with thefirst table. Again however, the optional version of thegame yields more cooperative generations than doesthe compulsory version of the game, and the instabilityof the optional game is higher than that of thecompulsory game.

4.1.

Runs of the compulsory game tend to become stuckin a small number of states, either with very highcooperation, very high defection, or half cooperationand half defection. Often a run will be stuck in a statefor many generations. This is reflected in the relativelylow instability value for the compulsory game, and thehigh ratio of the standard deviation of its cooperativityto the mean value.

For example, when one step of history is recorded,a run often first enters a state where all of theplayers defect each round. This is because the initialstrategies are random, and so there are a large numberof players who cooperate no matter what theiropponents do. Hence the defectors receive the high Tpayoff, and their offspring take over the population.

Subsequent events can be understood by examiningthe strategies shown in Fig. 2, beginning with theUndiscriminating Defector strategy shown in Fig. 2a.Two of the single-move mutations of this strategy willbe at a disadvantage playing against it because theywillcooperate in a round where the original will defect. Ifeveryone in the population is defecting all the time,however, the move ((C) D) is never exercised. So amutation from the move ((C) D) to ((C) C) will notaffect the behavior of (nor the fitness payoffsaccumulated by) a player using the strategy.

After several generations of this kind of ‘‘geneticdrift’’, a population initially containing only playerswith strategy 2a can be expected to contain a fractionof players with strategy 2b. From here, a singlemutation in the ((C) D) move can change the strategyto Tit-for-Tat, as shown in Fig. 2c. Provided thatenough of these mutations occur at about the sametime, players using Tit-for-Tat can dominate thepopulation, which will enter a state of very highcooperation.This is essentiallywhat happens in the runshown in Fig. 1 around generation 10.

T 2Mean Standard Mean Standard

Game cooperativity deviation instability deviation

Compulsory 0.248 0.267 14.8 8.85Fully-optional 0.668 0.292 29.9 5.10

Page 9: Evolution of altriusm in optional and compulsory games

J. theor. Biol.

J. B P. K (facing p. 168)

F. 4. A run of the semi-optional game, with one step of historyF. 1. A run of the compulsory game, with two steps of historyrecorded. The trace marked O records the percentage of ‘‘opt out’’recorded. The trace marked C records the percentage of ‘‘cooperate’’

moves each generation; the trace marked D records the percentage moves each generation. The ‘‘cooperativity’’ value for this run is 0.581;of ‘‘defect’’ moves each generation; the trace marked F records the its ‘‘instability’’ is 45.1.average fitness of the population as a percentage of the maximumpossible value. The ‘‘cooperativity’’ value for this run is 0.494; its‘‘instability’’ is 19.41.

F. 6. A run of the fully optional game with two steps of historyF. 5.A portion of a run of the semi-optional game, showing tworecorded. The ‘‘cooperativity’’ value for this run is 0.716; its‘‘predator–prey’’ cycles, and the beginning of a state of high

cooperation. ‘‘instability’’ is 31.7.

Page 10: Evolution of altriusm in optional and compulsory games

169

((N) D) ((N) D) ((N) C) ((N) C) ((N) C) ((N) D)((C) D) ((C) C) ((C) C) ((C) C) ((C) D) ((C) D)((D) D) ((D) D) ((D) D) ((D) C) ((D) C) ((D) C)

a b c d e f

F. 2. Some strategies for the compulsory game, when one step of history is recorded.

However Tit-for-Tat is not immune to variation.For example the ((N) C) move could mutate back to((N) D) and return the population to strategies like 2band a state of high defection. This is what happens neargeneration 70 in the run shown in Fig. 1.

Another mutation from Tit-for-Tat can yield theUndiscriminating Altruist strategy 2d. If the populationis in a state where everyone else is followingTit-for-Tat, following the Undiscriminating Altruiststrategy will not affect the fitness of the player, asthe ((D) C) move in Tit-for-Tat is never exercised.However if defectors appear, they can exploit theplayer who always cooperates. For example, strategy2e is one mutation away from UndiscriminatingAltruist. If it appears, the number of players using itwill increase in the population as they prey on thecooperators. A single mutation from 2e is 2f, whichincreases defection in the population even more.

Strategies like the one shown in Fig. 2f can leadto very stable populations, with relatively lowcooperation and fitness values. A pair of playersplaying this strategy will cooperate 50% of the timeand defect 50% of the time. Furthermore there is nopossibility of genetic drift with this strategy as eachmove of it is exercised and each one-mutation variantof this strategy is at a disadvantage against it. Thepopulation in the run shown in Fig. 1 enters a statewhere each member of the population is playinga variant of this strategy around generation 155.Note that the state remains steady for around 60generations, before the population returns to a state ofrelatively high cooperation.

With longer history lengths, strategies like 2f, inwhich pairs of players using the strategies alternatebetween cooperation and defection, can be very stable.For example the strategy shown in Fig. 3 is a variantof 2e for a history length of 2. In Figure 3, the ‘*’character indicates moves that are exercised whenplayers using this strategy play against each other. Fiveof the seven moves are used in such an encounter, andtherefore cannot mutate away from those shownwithout having an immediate effect. Mutations to theother two moves actually serve to reinforce thepatterns of interactions seen when the rest of theplayers are using this strategy. It is important to notethat the stability of this strategy is not a matter of anystatic superiority of the strategy compared with allpossible competitors: instead this strategy is stable

because its genetic representation is such that there areno better alternatives to it via single mutations. So apopulation in which all players are following such astrategy constitutes a robust local optimum in theevolutionary search. We have observed runs of thecompulsory game in which the population remainsstuck in states where populations are using suchstrategies for thousands of generations, thoughtransitions to states of high cooperation or highdefection eventually occur.

4.2.

With the option of opting out, populations in theoptional games can escape from states in which thereis a significant amount of defection. Since the payofffor opting out, W, is larger than both S, the payoff forcooperating when the opponent defects, and P, thepayoff for mutual defection, the presence of defectionin the population makes opting out an advantageousalternative. Thus populations playing the optionalgames will tend to revert to states of high opting outwhenever a number of defectors appear. This factalone accounts for some of the reason why the optionalgame leads to higher cooperativity—it just can’tbecome stuck in states of high defection.

A run of the semi-optional game with one step ofhistory recorded is shown in Fig. 4. As is typicalfor runs of the optional games, the population entersmore states than otherwise equivalent runs of thecompulsory game, and the states that it enters aremuchless stable.

Since the initial strategies are random, and thereforeinclude many defectors, opting out is initially favored,and most runs of the semi-optional game enter statesof virtually 100% opting out in the early generations,often as early as generation 10. From then one, thepopulations tend to go through cycles of various sorts.One kind of cycle involves the appearance of defectors,

((N N) D) (((N C) C)((N D) C) (((C C) D) (((C D) C) (((D C) C) (((D D) D)

F. 3. A stable sub-optimal strategy for the compulsory game,when two steps of history are recorded. The moves marked * areexercised when a pair of players with this strategy compete againsteach other.

Page 11: Evolution of altriusm in optional and compulsory games

. . 170

T 3Optional Mean Standard Mean Standard

game cooperativity deviation instability deviation

Semi- 0.401 0.263 27.8 10.02Fully- 0.668 0.292 29.9 5.10

which is usually soon followed by a reversion to highopting out.

Another kind of cycle seen in the semi-optionalgames involves the appearance and subsequentdisappearance of players who always cooperate.If almost every other member of the population isopting out each round, there is no danger to the fewplayers who mutate to strategies that involve somecooperation. Indeed, when these players play againsteach other, and receive the reward R for mutualcooperation (which is larger than the W opting outpayoff), they will increase in the population. Howeverwhen there are lots of careless cooperators in thepopulation, there is an advantage to be gained bydefecting. If mutations occur to create strategies thatinvolve some defection, defectors will rapidly increase,effectively destroying the cooperators. Since theresultant level of defection is high, opting out is nowrelatively advantageous, and the population reverts toa state where everyone is opting out. This pattern issimilar to the ‘‘predator–prey’’ cycles seen inpopulation biology.

A typical run of the semi-optional game will gothrough a number of cycles. In some cases it is possiblethat the mutations that increase defection eitherinclude or are followed by mutations that increase thediscriminatingness of the strategy, either by playingTit-for-Tat, or by opting out when an opponentdefects, i.e. the Discriminating Altruist strategy. If suchmutations occur before defection rises significantly, itis possible for the players possessing these strategies tocontinue to increase in the population even whendefection rises temporarily. Thus the population canenter and remain in a relatively stable state of highcooperation.

This process is illustrated in Fig. 5. This showsa portion of a run of the semi-optional game. Atgeneration 60, virtually all of the players are opting outin each game. Around generation 70, a few playersbegin cooperating. Since they manage to find othercooperators, their numbers increase. Within twogenerations, however, a few defectors appear. Sincethese defectors will prevail over the cooperators, theirnumbers increase rapidly, and by generation 80 or so,the cooperators are gone. A similar but less dramaticpattern of this sort begins almost immediately, and isover by generation 90.

At generation 91, another set of cooperatorsappears, followed closely by defectors. However in thiscase, at least some of the cooperators are playing adiscriminating strategy, and in fact by generation 108,the defectors begin disappearing from the population.By generation 110, virtually all of the players arecooperating all of the time.

As with the compulsory game, states of highcooperation are not stable either. With all of themembers of the population cooperating, genetic driftcan set in, and mutate some of the discriminatingstrategies to their careless versions, providing fodderfor defectorswhen they appear bymutation.As before,the high rate of defection will ultimately be followed byan increase in opting out.

The general dynamics of the fully optional game aresimilar to those for the semi-optional game. Slightlyhigher cooperativity is seen, as illustrated in Table 3,but it is not clear if this is significant. The parametersfor these runs are: 30 players, 10 game interactions,history length of 2. The statistics are for 12 runs of 500generations each.

A run of the fully optional game with 2 steps ofhistory recorded is shown in Fig. 6. The populationquickly finds a state of high cooperation without anintervening period of opting out, as predicted bythe analysis in Section 2. Around generation 70, thisstate crashes and yields a period of high opting out thatlasts (with one short glitch) until generation 205.At this point, the population enters a state wherecooperation is still relatively high, but the fraction ofdefection moves is around 30%. This state lasts untilgeneration 440, when a state of high cooperationoccurs. Again, the transition to the higher level ofcooperation happens without an intervening periodof opting out.

5. Conclusion

The superiority of the optional games in reachingstates of high cooperation can be demonstratedanalytically, and is supported by the dynamicproperties that simulations of such games manifest.There is no way for a population playing thecompulsory game to escape from a state of highdefection, except if several favorable mutations appearsimultaneously. In the optional games there are routes

Page 12: Evolution of altriusm in optional and compulsory games

171

out of states of high defection. The option of asocialbehavior facilitates the appearance and maintenanceof altruistic behavior.

In thinking about the evolution of social behavior itis important to recognize that such behavior occursagainst a changing environment consisting of thebehaviors of the other members of the populations.Thus suchanevolutionaryprocess isa feedbacksystem,and the global properties of such a process should beexpected to fluctuate, perhaps chaotically. The relativefitness of a given behavior or strategy cannot beassessed statically, with respect to a specific, or to afixed, environment. In the long run, the evolutionarydynamical properties of strategies and their geneticrepresentations, may have the most significant effecton the careers of populations using those strategies.

Obviously, in addition to more detailed analysis andsimulation of the optional games, it is important to seeif the optional games provide a more ethologicallyvalid model of some animal interactions. One wouldhave to be able to distinguish between an animal’srefusal to participate in an interaction (‘‘opting out’’)and its failure to reciprocate an altruistic action ofanother animal (‘‘defection’’). Our model predicts thatgeographically separate, but genetically equivalent,populations of the same species might differ markedlyin their social interactions, with some populationsexhibiting high cooperation, some behaving asocially,and others enduring periods of highly antisocialbehavior.

There are many ways in which to introducecomplications into the study of altruism in optionaland semi-optional games. For example, animals maymake various types of errors of recognition, potentialpartners may vary in quality, and different typesof game-theoretic situations may arise. Our prelimi-nary analyses reveal that these complications do notmarkedly affect the results presented herein. A slightlymore detailed survey of some possible complications isgiven in Kitcher (1993), where the evolution ofspecifically human types of altruism is also addressed.

REFERENCES

A,R. (1984). The Evolution of Cooperation. New York: BasicBooks.

A, R. & H, W. D. (1981). The evolution ofcooperation. Science 211, 1390–1396.

B, R. & L, J. P. (1987). No pure strategy isevolutionarily stable in the repeated prisoner’s dilemma game.Nature 327, 58–59.

B, J. J. & R, W. R. (1991). Distinguishing mechanisms for theevolution of co-operation. J. theor. Biol. 149, 63–74.

C, D. L. & S, R. M. (1990). How Monkeys See theWorld. Chicago: University of Chicago Press.

D, L. A. & A, M. (1992). Guppies and the tit fortat strategy: Preference based on past interaction. Behav. Ecol.Sociobiol. 28, 243–246.

D, L. A. & W, D. S. (1991). Rover: A strategy forexploiting cooperators in a patchy environment. Am. Nat. 138(3),687–701.

F, J. & W, R. (1989). Evolutionary stability in therepeated prisoner’s dilemma. Theor. Popul. Biol. 36, 161–168.

K, P. S. (1993). The evolution of human altruism. J. Phil. 90,

497–516.L, S. L. (1989). Iterated prisoner’s dilemma: an approach

to evolutionarily stable cooperation. Am. Nat. 134, 828–834.

L,K. (1992). Evolutionary phenomena in simple dynamics.In: Artificial Life II (Langton, C. G., Taylor, C., Farmer, J. D. &Rasmussen, S., eds) pp. 295–311. Redwood City, CA:Addison-Wesley.

M S, J. (1982). Evolution and the Theory of Games.Cambridge: Cambridge University Press.

M S, J. & P, G. R. (1973). The logic of animalconflict. Nature 246, pp. 15–18.

M-G,M. (1992).On the iterated prisoner’s dilemma.Bull. math. Biol. 54, 423–443.

M, R. R. (1967). No play: a means of conflict resolution.J. Pers. & Soc. Psych. 6(2), 150–156.

N,R.,S,C.P. & H, J. (1991). The market effect:an explanation for pay-off assymmetries among collaboratinganimals. Ethology 87, 97–118.

P, J. R. (1990). Evolution of outsider exclusion. J. Theor. Biol.148, 563–571.

P, J. R. & F, M. W. (1986). The evolution of helpingbehavior in large, randomly mixed populations. Am. Nat. 127,

209–221.S, B. B., C,D. L., S, R.M.,W, R.W. &

S, T. T. (eds) (1987). Primate Societies. Chicago:University of Chicago Press.

S, E. A., A, D. & T, L. (1994). Iteratedprisoner’s dilemma with choice and refusal of partners. In:Artificial Life III (Langton, C. G., ed.) pp. 131–175. RedwoodCity, CA: Addison-Wesley.

T, R. (1971). The evolution of reciprocal altruism. Q. Rev.Biol. 46, 33–57.