
The Effects of Diversity Maintenance on Coevolution for

an Intransitive Numbers Problem

Tirtha R. Ranjeet, Martin Masek, Philip Hingston and Chiou-Peng Lam

School of Computer and Security Science,

Edith Cowan University

{t.ranjeet, m.masek, p.hingston, p.lam}@ ecu.edu.au

Abstract. In this paper, we investigate the effectiveness of several techniques

commonly recommended for overcoming convergence problems with

coevolutionary algorithms. In particular, we investigate effects of the Hall of

Fame, and of several diversity maintenance methods, on a problem designed to

test the ability of coevolutionary algorithms to deal with an intransitive

superiority relation between solutions. We measure and analyse the effects of

these methods on population diversity and on solution quality.

Keywords: coevolution, diversity maintenance, HOF, fitness sharing

1 Introduction

Evolutionary algorithms are population-based, stochastic search algorithms

modelled on evolutionary processes in nature. Potential solutions to a problem are

assigned a fitness that reflects how well they solve the problem, and these values

guide the search. In a coevolutionary algorithm (CEA), this fitness value depends on

interactions with other potential solutions. CEAs offer advantages over ordinary

evolutionary algorithms in certain situations: when there is no objective function to

measure the fitness of a solution; in a large search space where there are two or more

interacting subspaces; and in certain complex problem domains [1-7]. However, CEAs

can also suffer from pathologies which interfere with convergence. Many techniques

have been proposed to address these pathologies. One approach is to use an archive of

high quality solutions - the Hall of Fame is a well-known example [8]. Another

idea is to use a diversity maintenance mechanism, such as fitness sharing [9-13].

This work is an empirical study, using a recent method for estimating solution set

quality [9,14], to investigate how diversity maintenance techniques can improve the

effectiveness of CEAs, both with and without the additional use of an archive. More

specifically, we empirically test variants of a standard CEA with different mutation

rates, with and without competitive fitness sharing, and with and without a Hall of

Fame, on a test problem designed to challenge CEAs. We examine how solution set

diversity and quality are affected in the variants.

The aim of fitness sharing and HOF is to improve the quality of solutions found by

the CEA, yet for many problems, there is no predefined quality metric; rather, quality

can only be judged based on how evolved solutions interact with other solutions. In

[9,14], Chong et al. proposed that the appropriate quality measure for CEAs is

generalization performance, and introduced a set of methods for estimating it. They

explored the relationship between diversity and quality, using various implicit and

explicit diversity methods, and concluded that appropriate diversity improves quality.

In our paper, we have adapted the methods of Chong et al. to a different kind of

problem. They used a problem with a single population of interacting agents, Iterated

Prisoner’s Dilemma (IPD), whereas our intransitive number test problem uses two

competing populations, as is suitable when evolving competing sets of solutions in an

asymmetric domain. As well as diversity and quality, we also investigate the effect of

HOF, and its interaction with diversity maintenance.

The remainder of this paper is structured as follows. In Section 2, we review the

basics of CEAs and the Hall of Fame and some common diversity maintenance

methods, as well as describing methods for measuring diversity and quality in CEAs.

In Section 3, we give a description of the design of our experiments. In the final two

sections, we describe our results and conclude.

2 Coevolutionary Algorithms

Evolutionary algorithms (EAs) are stochastic search methods inspired by

biological evolution. EAs work with populations of solutions (individuals). Each

individual’s fitness depends on its performance against a criterion. Individuals with

high fitness are selected preferentially to produce “offspring” individuals for the next

generation. Two selected parents produce several offspring by exchanging genes

(crossover). Then, each offspring alters its gene structure with some probability

(mutation) and becomes a new individual in the next generation. This process of

variation and selection is repeated until some stopping condition is met.
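As a concrete illustration, the variation-and-selection loop just described can be sketched as follows. This is a minimal, hypothetical sketch: the operators and parameters here (fitness-proportional selection, a toy crossover and mutation on two-gene individuals) are illustrative choices, not those used in the experiments later in this paper.

```python
import random

def evolve(init_population, fitness, crossover, mutate, generations, rng=random):
    # Minimal generational EA: evaluate, select parents in proportion to
    # fitness, recombine, mutate, and repeat until the generation budget is spent.
    pop = list(init_population)
    for _ in range(generations):
        fits = [fitness(ind) for ind in pop]
        next_pop = []
        while len(next_pop) < len(pop):
            p1, p2 = rng.choices(pop, weights=fits, k=2)  # fitness-proportional selection
            next_pop.append(mutate(crossover(p1, p2)))
        pop = next_pop
    return pop

# Toy usage: maximise the sum of two genes, each kept in [0, 100].
rng = random.Random(1)
init = [(rng.uniform(0, 100), rng.uniform(0, 100)) for _ in range(10)]
final = evolve(init,
               fitness=lambda g: sum(g) + 1e-9,        # small offset avoids zero weights
               crossover=lambda a, b: (a[0], b[1]),    # single-point crossover on 2 genes
               mutate=lambda g: tuple(min(100.0, max(0.0, v + rng.gauss(0, 1))) for v in g),
               generations=20)
```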

A coevolutionary algorithm (CEA) is an evolutionary algorithm in which the

fitness of each individual depends on interactions between it and other individuals [1].

In CEAs, individuals are organised into sub-populations which coevolve [2,3,15,16].

The fitness calculation in CEAs is subjective: each individual interacts with

individuals from another population. Unlike objective fitness, subjective fitness is

dependent on the composition of the populations. A typical subjective fitness

calculates the average score of an individual in interactions with opposing individuals

in the current populations.
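Such a subjective fitness evaluation might be sketched as follows. The `score` function here is a hypothetical placeholder for illustration; the actual interaction payoff depends on the problem (ours is given later in Equation (5)).

```python
def score(a, b):
    # Hypothetical toy payoff: 1 if a's genes sum higher than b's, else 0.
    return 1.0 if sum(a) > sum(b) else 0.0

def subjective_fitness(individual, opposing_population, score_fn=score):
    # Average score of the individual over interactions with every member
    # of the current opposing population.
    return sum(score_fn(individual, opp) for opp in opposing_population) / len(opposing_population)

pop_b = [(11.0, 88.0), (8.0, 89.0), (50.0, 40.0)]
f = subjective_fitness((10.0, 90.0), pop_b)
```

Because the opposing population changes every generation, the same individual can receive a different fitness value from one generation to the next.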

2.1 Hall of Fame

The Hall of Fame (HOF) is a technique that allows the population to interact with a

set of the best individuals from previous generations of the opponent population. The

best individuals from both populations in every generation are collected and stored in

an archive, which interacts with the populations during the fitness evaluation. The

purpose of the HOF is to preserve some individuals from earlier generations, to avoid the cycling and

forgetting pathologies. When the HOF is used, subjective fitness is modified to be the

average score of an individual in interactions with opposing individuals in the current

populations and also in the Hall of Fame [8].
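Under this scheme, the evaluation simply pools the current opponents with the archive. A sketch, again with a hypothetical toy `score` standing in for the real interaction payoff:

```python
def score(a, b):
    # Hypothetical toy payoff for illustration only.
    return 1.0 if sum(a) > sum(b) else 0.0

def subjective_fitness_hof(individual, opposing_population, hall_of_fame, score_fn=score):
    # Average score over the current opposing population plus the archived
    # best-of-generation individuals stored in the Hall of Fame [8].
    opponents = list(opposing_population) + list(hall_of_fame)
    return sum(score_fn(individual, opp) for opp in opponents) / len(opponents)

# (60,60) beats the current opponent (50,50) but loses to the archived (70,70).
f = subjective_fitness_hof((60.0, 60.0), [(50.0, 50.0)], [(70.0, 70.0)])
```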

2.2 Diversity

Too much selective pressure and/or not enough exploration in an evolutionary process

can cause premature convergence [9]. Maintaining diversity in the population has

been shown to prevent premature convergence [12] in many instances. Chong et al. [9,14]

categorize diversity maintenance methods into two types, implicit and explicit:

Implicit diversity maintenance methods use the selection process. A typical

implicit method is competitive fitness sharing (FS), where diversity is maintained in

the population by discouraging individuals with similar characteristics. Fitness values

are reduced for individuals with common gene structures. The shared fitness of an

individual f’i is calculated by dividing simple fitness by the niche count:

f'_i = f_i / c_i                                                    (1)

The symbol ci is a niche count, which is calculated on the basis of the individual’s

gene structure variation (dj) in the population. The following formulas are used to

calculate gene variation and niche count respectively.

d_j = sqrt( Σ_{m=1..u} (x_m − y_{j,m})² )

c_i = Σ_{j=1..n} sh(d_j),   where sh(d_j) = 1 − (d_j / n_r)^τ  if d_j < n_r, and 0 otherwise

The symbol u is the genome length, x is an individual and yj is an individual from

the same population, and xm and yj,m are their mth

gene values. The symbol τ is a

constant. The symbol n_r is a constant niche radius and n is the population size.
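Putting these pieces together, competitive fitness sharing might be sketched as follows. This is a minimal sketch of Equation (1) and the sharing function; the niche radius and τ values in the usage example are illustrative, not the settings used in our experiments.

```python
import math

def gene_variation(x, y):
    # d_j: Euclidean distance between the gene values of two individuals.
    return math.sqrt(sum((xm - ym) ** 2 for xm, ym in zip(x, y)))

def niche_count(x, population, niche_radius, tau=1.0):
    # c_i: sum of the sharing function over the population. Individuals within
    # the niche radius contribute; closer individuals contribute more.
    c = 0.0
    for y in population:
        d = gene_variation(x, y)
        if d < niche_radius:
            c += 1.0 - (d / niche_radius) ** tau
    return c

def shared_fitness(fitness, x, population, niche_radius, tau=1.0):
    # Equation (1): f'_i = f_i / c_i.
    return fitness / niche_count(x, population, niche_radius, tau)

pop = [(10.0, 90.0), (10.5, 90.5), (50.0, 50.0)]
crowded = shared_fitness(1.0, pop[0], pop, niche_radius=5.0)   # has a near neighbour
isolated = shared_fitness(1.0, pop[2], pop, niche_radius=5.0)  # alone in its niche
```

An individual with near neighbours receives a larger niche count and hence a lower shared fitness, which is exactly the pressure that discourages crowding.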

Explicit diversity maintenance methods achieve diversity through variation. A

simple method is to increase the mutation rate.

Two types of diversity are genotypic and phenotypic diversity. Genotypic diversity

in a population is a measure of the gene structure variation, calculated as the average

gene variation over the population. Phenotypic diversity is calculated based on the

entropy [11,12] of the distribution of fitness values. The fitness values present in the

population are divided among N equal sized buckets, and then equation (2) is applied.

H = − Σ_{k=1..N} p_k ln(p_k),  where p_k is the proportion of fitness values in bucket k        (2)
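Both measures are straightforward to compute. A sketch follows; the bucket count and the use of the natural logarithm are our assumptions, as is measuring genotypic diversity as the mean pairwise distance.

```python
import math

def genotypic_diversity(population):
    # Average pairwise gene variation (Euclidean distance) over the population.
    n = len(population)
    dists = [math.sqrt(sum((a - b) ** 2 for a, b in zip(population[i], population[j])))
             for i in range(n) for j in range(i + 1, n)]
    return sum(dists) / len(dists)

def phenotypic_diversity(fitnesses, n_buckets=10):
    # Equation (2): entropy of the fitness distribution over N equal-sized
    # buckets, H = -sum_k p_k ln(p_k), with p_k the fraction in bucket k.
    lo, hi = min(fitnesses), max(fitnesses)
    width = (hi - lo) / n_buckets or 1.0   # degenerate case: all values equal
    counts = [0] * n_buckets
    for f in fitnesses:
        counts[min(int((f - lo) / width), n_buckets - 1)] += 1
    total = len(fitnesses)
    return -sum((c / total) * math.log(c / total) for c in counts if c)
```

A converged population (all fitness values in one bucket) has zero entropy; a population spread evenly over the buckets has the maximum entropy ln(N).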

2.3 Quality

We adopt the approach of Chong et al. to measure quality, i.e. we use a statistical

estimate of the generalization performance of a solution, but we modify it slightly to

account for the fact that we are using two populations. Chong et al. begin by defining

generalization performance as the mean score of a solution in all possible test cases.

This intuitively appealing idea is usually impractical to calculate. Therefore, they

propose a statistical approximation approach, in which a mean score is computed for a

suitable sample of test cases. In many cases, scores against “high quality” test cases

might be considered more important. They therefore propose two different methods

for sampling the space of test cases: unbiased sampling (purely random) and biased

sampling (favours higher quality). In the present study, due to space limitations, we

report only on results using biased sampling. To obtain a biased test set, we follow the

procedure in Chong et al., using a sample size of 200. Once we have generated test

sets, we can use them to estimate the quality of each solution as its mean score against

the test set solutions, and we can combine these in various ways to obtain an overall

quality measure for an evolved population of solutions.

Estimated Average Quality In an evolutionary algorithm, we are usually most

interested in the top few evolved solutions. Thus, we first sort the population

according to internal fitness, and then consider only the top few. Average quality is

then estimated as

Q_avg = (1 / nBest) Σ_{i=1..nBest} E_i,   where E_i = (1 / nTest) Σ_{j=1..nTest} s_{ij}        (3)

where Ei is the estimated quality of solution i, nTest is the size of the test set, and

nBest is the number used in the estimate (i.e. we use only the best nBest).

Estimated Best Quality This is the quality of the best solution amongst the top

nBest solutions in the population, when they are sorted on internal fitness:

Q_best = max_{1 ≤ i ≤ nBest} E_i                                                               (4)
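The two estimates (Equations 3 and 4) might be computed as follows. This is a sketch: the toy payoff and the particular population are hypothetical, and we assume solutions are ranked by their internal (subjective) fitness as described above.

```python
def estimate_quality(solution, test_set, score_fn):
    # E_i: mean score of one solution against every test case in the sample.
    return sum(score_fn(solution, t) for t in test_set) / len(test_set)

def estimated_average_and_best(population, internal_fitness, test_set, score_fn, n_best):
    # Sort on internal (subjective) fitness, keep the top n_best, then report
    # the mean of their estimated qualities (Eq. 3) and the best one (Eq. 4).
    top = sorted(population, key=internal_fitness, reverse=True)[:n_best]
    estimates = [estimate_quality(s, test_set, score_fn) for s in top]
    return sum(estimates) / len(estimates), max(estimates)

# Toy usage with a hypothetical payoff and internal fitness:
toy_score = lambda a, b: 1.0 if sum(a) > sum(b) else 0.0
pop = [(90.0, 90.0), (10.0, 10.0), (50.0, 50.0)]
avg_q, best_q = estimated_average_and_best(
    pop, internal_fitness=sum, test_set=[(40.0, 40.0), (60.0, 60.0)],
    score_fn=toy_score, n_best=2)
```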

3 Experiments

In this section, we describe our experimental design. We describe the test problem

we have chosen to study, the algorithm variants that we test, and the measurements

that we gather during the testing.

As our test problem, we chose an intransitive number problem which was

introduced by Watson and Pollack [17]. It has advantages over the test problem used

by Chong et al., the IPD. The IPD is an important and widely studied problem, but it is

an extremely difficult one for a CEA, with complex evolutionary dynamics and an

enormous search space (in practice, researchers restrict their search to solutions

that can be represented using some restricted representation). The intransitive number

problem has one specific feature that makes it difficult (intransitive superiority) and a

simple representation, as well as a known objective quality criterion, making it very

suitable for testing.

Watson and Pollack [17] introduced intransitive number test problems to test the

functionality of CEAs. We pose a version with two populations. Individual solutions

in both populations consist of pairs of real numbers in (0, 100), which we call x and y.

The score when solution a from one population meets solution b from the other

population is given in Equation (5):

score((x_a, y_a), (x_b, y_b)) =

    1   if |x_a − x_b| < |y_a − y_b| and x_a > x_b

    1   if |x_a − x_b| ≥ |y_a − y_b| and y_a > y_b                  (5)

    0   otherwise

Consider three solutions: A = <10;90>, B = <11;88> and C = <8;89>. Now score(A,

B) is 0 (B beats A), because 10 and 11 are closer than 90 and 88, so the score is

determined by which solution has the larger x value. Similarly, C beats B (based on a

larger y), and yet A beats C. Thus the superiority relation between solutions is

intransitive. Although this is problematic, generally speaking, the closer the solution

is to <100;100>, i.e. the larger both its x and y values are, the higher the quality of the

solution. We define the actual quality of solution i as Ai = (x+y)/2, the average of the

solution's x and y values. We can then define measures for the actual quality of a

population, in a similar way as for estimated quality.
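Under our reading of Equation (5) (a reconstruction from the worked example: the dimension in which the two solutions are closer decides the winner), the score function and the intransitive cycle A beats C beats B beats A can be checked directly:

```python
def score(a, b):
    # Reconstruction of Eq. (5): compare on the dimension in which the two
    # solutions are closer; the solution with the larger value there wins.
    (xa, ya), (xb, yb) = a, b
    if abs(xa - xb) < abs(ya - yb):
        return 1.0 if xa > xb else 0.0
    return 1.0 if ya > yb else 0.0

def actual_quality(solution):
    # A_i = (x + y) / 2, the average of the solution's x and y values.
    return sum(solution) / 2.0

A, B, C = (10.0, 90.0), (11.0, 88.0), (8.0, 89.0)
# B beats A (the x values are closer and B's x is larger), C beats B on y,
# and A beats C on y: the superiority relation is intransitive.
```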

3.1 Algorithms tested

For this experiment, four algorithms, naïve CEA, CEA with fitness sharing (CEAFS),

CEA with HOF (CEAHOF) and a combination of FS and HOF (CEAFH), were

considered. For each, the mutation rate was varied from 5% to 25% in steps of 5%.

In all algorithms tested, single point crossover [20] and polynomial mutation [18]

were used for the reproduction process. Parents were selected using a stochastic

universal sampling method [19], and an elite individual was copied to the next

generation. Initial gene values were randomly generated between 0 and 100.

Population size (25) and crossover rate (60%) are as recommended by Watson and

Pollack, and we chose 300 generations based on initial testing showing that the algorithms

had stabilised well before this point. Each run of an algorithm was repeated 60 times to

account for variation.
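For reference, the parent selection step can be sketched as follows. This is our own sketch of Baker's stochastic universal sampling [19] under its standard formulation, not the exact code used in our experiments.

```python
import random

def stochastic_universal_sampling(population, fitnesses, n_select, rng=random):
    # Baker's SUS: n_select equally spaced pointers over the cumulative
    # fitness "wheel", giving fitness-proportional selection with low variance.
    total = sum(fitnesses)
    spacing = total / n_select
    start = rng.uniform(0.0, spacing)   # single random offset for all pointers
    selected, cumulative, idx = [], 0.0, 0
    for k in range(n_select):
        pointer = start + k * spacing
        while cumulative + fitnesses[idx] < pointer:
            cumulative += fitnesses[idx]
            idx += 1
        selected.append(population[idx])
    return selected

# An individual with 3/4 of the total fitness receives 3 of 4 selection slots.
rng = random.Random(42)
parents = stochastic_universal_sampling(['a', 'b'], [3.0, 1.0], 4, rng)
```

Because the pointers are equally spaced, each individual is selected either floor or ceil of its expected number of times, unlike repeated roulette-wheel spins.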

4 Results and analysis

In this section, we review the results of our experiments by examining quality and

diversity in the evolved populations produced using each algorithm. First we examine

the quality. In Fig. 1, a convergence plot for the CEA naïve algorithm is shown. Each

data point is an average across 60 runs of the algorithm for a specific generation. The

y-axis is the estimated best quality. By about 100 generations, the algorithm has

converged, except in the case of 5% mutation, which needs around 200 generations.

The best mutation rate in terms of estimated quality appears to be 25%. The actual best

quality plot is similar except that the mutation rate has little effect.

In order to quantify this visual impression, we compute average figures over the

last 60 generations (as the algorithms appear to have converged by then) and all 60

runs (i.e. an average of 3600 data values) for each mutation rate. These averages are

presented in Table 1 (along with diversity data). From the table we can see that, in the

case of the naïve algorithm CEA, higher mutation rates tend to give higher best

quality (both estimated and actual), and that there is little effect on average quality.

Convergence plots for average quality are qualitatively similar to those for best

quality, and are omitted.

Looking at CEAFS, we see that best quality is not sensitive to mutation rate, and

that estimated best quality is high when compared with CEA, and actual best quality is

also improved. Thus, fitness sharing is effective in increasing the

performance of the algorithm (higher best quality). Average quality is reduced when

compared with CEA, and decreases with higher mutation rates. The reduction in

average quality is due at least in part to the increased diversity of the population, as

expected. Convergence plots are quite similar to those for CEA, apart from the final

quality levels being different.

CEAHOF has improved quality compared to CEA, with estimated best quality very

similar to CEAFS, and the actual best quality also similar, but more sensitive to

mutation rate. In fact the best performance over all the algorithms on this measure

was CEAHOF with 25% mutation. However average quality levels are actually higher

than for CEA, suggesting that the improved performance is not due to an increase in

diversity.

Fig. 1. Convergence plot for CEANaive with different mutation rates, showing average

estimated best quality over 60 runs.

Finally, the performance of CEAFH is rather erratic, with best quality levels

similar to the naïve algorithm, along with a lower average quality. We conjecture that

this is because the mechanism of HOF and diversity maintenance methods interfere

and conflict with each other, rendering both ineffective.

As well as solution quality, we also focus on the role of diversity. Following

Chong et al., we measured both genotypic and phenotypic diversity. Fig. 2 is a

generational plot showing the progress of genotypic diversity for CEA: diversity

drops swiftly, with a slight recovery, before levelling out. Phenotypic

diversity behaves similarly. This low diversity might be expected to cause

problems such as premature convergence. Higher mutation rates reduce the loss of

diversity.

Fig. 2. Generational plot of genotypic diversity with CEANaive. Data values are averaged over

60 runs.

Table 1. Population quality and diversity figures for all algorithm variants. Each column shows

the mean for the last 60 generations, over 60 runs of the algorithm.

Algorithm Est.Average Est.Best Act.Average Act.Best Geno Pheno

CEANaive05 0.85 0.93 75.45 75.36 6.72 0.98

CEANaive10 0.82 0.92 74.95 83.47 10.25 1.12

CEANaive15 0.84 0.95 74.50 84.16 11.42 1.17

CEANaive20 0.82 0.96 72.93 83.79 12.50 1.32

CEANaive25 0.84 0.98 74.67 84.74 14.18 1.25

CEAFS05 0.69 0.95 71.11 91.74 25.99 1.51

CEAFS10 0.68 0.96 70.04 91.39 26.11 1.55

CEAFS15 0.66 0.96 70.00 91.12 26.08 1.64

CEAFS20 0.63 0.95 68.89 91.61 27.17 1.70

CEAFS25 0.63 0.96 69.03 91.41 27.19 1.71

CEAHOF05 0.88 0.95 82.15 88.58 6.32 0.88

CEAHOF10 0.88 0.95 84.76 90.95 8.58 0.95

CEAHOF15 0.86 0.95 83.94 91.81 9.62 0.98

CEAHOF20 0.84 0.96 83.31 91.14 10.79 1.09

CEAHOF25 0.85 0.96 88.09 95.51 11.20 0.93

CEAFH05 0.49 0.95 61.98 82.78 23.18 1.16

CEAFH10 0.50 0.90 65.06 85.95 23.41 1.62

CEAFH15 0.48 0.88 63.95 83.97 23.72 1.67

CEAFH20 0.46 0.87 63.82 83.60 23.56 1.71

CEAFH25 0.45 0.87 65.20 85.77 24.12 1.71

The last two columns of Table 1 summarise diversity values for variants of each

algorithm. It can be seen that higher mutation rates increase diversity, as expected,

and that this effect is much smaller when fitness sharing is used, as diversity is

already effectively maintained. Also, the level of diversity is much higher in every

case when fitness sharing is used than in any case where fitness sharing is not used.

The effect of HOF is to reduce diversity, again emphasising that the improvement in

quality when HOF is used is due to a different mechanism.

Fig. 3. Scatter plot of diversity versus quality for each of the four algorithms, with a

mutation rate of 5%. For each data point, the x value is the mean value of genotypic

diversity over the last 60 generations in one run of the particular algorithm, while the

y value is the corresponding mean of the actual best quality measure.

Due to space restrictions, we have omitted generational diversity plots for CEAFS,

CEAHOF and CEAFH, but we can provide a qualitative description of them as

follows: For CEAFS, the plots show a small but rapid rise in genotypic diversity, after

which the level remains steady. There is an initial small increase in phenotypic

diversity, then a quick drop and a levelling out at about the initial diversity level.

The overall shape of the plots for CEAHOF is similar to those for CEA, except that

the final diversity levels are a little lower. CEAFH is similar to CEAFS, with

genotypic diversity levels slightly lower. The fact that the performance of CEAFH is

so poor, even though diversity is only slightly reduced, again suggests that HOF and

diversity maintenance are interfering with each other.

To further scrutinize the relationship between diversity and quality, we present Fig.

3, a scatter plot of genotypic diversity versus actual best quality, for all algorithms,

with a mutation rate of 5%. It is clear that the naïve algorithm and CEAHOF provide

all the points on the left of the plot, i.e. those with lower diversity, and that their

quality values are widely spread, i.e. the algorithm is unreliable (though it sometimes

converges on very high quality). In contrast, the two algorithms with fitness sharing

contribute all the higher diversity points, and reliable quality, with CEAFS being

more consistent than CEAFH.

5 Conclusion

In this paper, we have described our experiments with different variations on a

naïve CEA, introducing combinations of fitness sharing, Hall of Fame, and a range of

mutation rates. We have tested these variations on a test problem designed to be

difficult for CEAs due to an intransitive superiority relationship between solutions.

We have measured the effects of these variations on the performance of the algorithm

in terms of population diversity and solution quality. With regards to diversity, our

results are in broad agreement with those found by Chong et al. on a different

problem, the Iterated Prisoner’s Dilemma: fitness sharing is an effective way to maintain

population diversity in a CEA, and a moderate amount of diversity helps to ensure

that high quality solutions are reliably found. In addition, we found that the Hall of

Fame method can also improve quality, but not as reliably as fitness sharing, and that

the diversity maintenance methods that we tested do not combine well with Hall of

Fame.

In future, we intend to carry out similar tests on further test problems having

different characteristics, such as multi-modal problems, to try to improve

understanding of which methods are most effective for which kinds of problems. We

would also like to investigate whether there are ways to combine diversity

maintenance with HOF effectively.

6 References

1. Axelrod, R.: The evolution of strategies in the iterated Prisoner's Dilemma. Genetic

Algorithms and Simulated Annealing. 32--41 (1987)

2. deJong, E., Stanley, K., Wiegand, P.: Introductory tutorial on coevolution. In: Proceedings

of the 2007 Genetic and Evolutionary Computation Conference (GECCO 2007), pp. 3133--

3157. ACM, New York (2007)

3. Ficici, S. G.: Solution concepts in coevolutionary algorithms. Ph.D. Dissertation. Brandeis

University (2004)

4. Hillis, W. D.: Coevolving parasites improve simulated evolution as an optimization

procedure. Physica D: Nonlinear Phenomena. 42, 228--234 (1990)

5. Potter, M. A., De Jong, K. A.: A cooperative coevolutionary approach to function

optimization. In: The Third Conference on Parallel Problem Solving from Nature, pp. 249-

-257, Springer-Verlag, London (1994)

6. Rosin, C. D.: Coevolutionary search among adversaries. Ph.D. Dissertation. University of

California, San Diego (1997)

7. Wiegand, R. P.: An analysis of cooperative coevolutionary algorithms. George Mason

University, Virginia (2003)

8. Rosin, C. D., Belew, R. K.: New methods for competitive coevolution. Evolutionary

Computation, 5, 1--29 (1997)

9. Chong, S. Y., Tino, P., Yao, X.: Relationship between generalization and diversity in

coevolutionary learning. IEEE Transactions on Computational Intelligence and AI in

Games, 1, 214--232 (2009)

10. McKay, R. I.: Fitness sharing in genetic programming. In: Proceedings of the Genetic

and Evolutionary Computation Conference, Las Vegas (2000)

11. Ray, T. S.: Evolution, complexity, entropy and artificial reality. Physica D: Nonlinear

Phenomena, pp. 239--263 (1993)

12. Rosca, J. P.: Entropy-driven adaptive representation. In: Proceedings of the Workshop on

Genetic Programming: From Theory to Real-World Applications, pp. 23--32 (1995)

13. Yao, X., Liu, Y.: How to Make Best Use of Evolutionary Learning. Complex Systems -

From Local Interactions to Global Phenomena, 229-242. (1996)

14. Chong, S. Y., Tino, P., Yao, X.: Measuring Generalization Performance in Coevolutionary

Learning. IEEE Transactions on Evolutionary Computation, 12, 479-505 (2008)

15. Casillas, J., Cordon, O., Herrera, F., Merelo, J. J.: A cooperative coevolutionary algorithm

for jointly learning fuzzy rule bases and membership functions, pp. 1075-1105. Artificial

Evolution (2002)

16. Ficici, S. G., Pollack, J. B.: Pareto Optimality in Coevolutionary Learning. In: Proceedings

of the 6th European Conference on Advances in Artificial Life, pp. 316--325. Springer-

Verlag, London (2001)

17. Watson, R. A., Pollack, J. B.: Coevolutionary dynamics in a minimal substrate. In:

Proceedings of the Genetic and Evolutionary Computation Conference

GECCO-01, Morgan Kaufmann, San Francisco (2001)

18. Deb, K., Goyal, M.: A combined genetic adaptive search (gene AS) for Engineering

Design. Computer Science and Informatics, 26, 30-45 (1996)

19. Baker, J. E.: Adaptive Selection Methods for Genetic Algorithms. In: Proceedings of the

1st International Conference on Genetic Algorithms, pp. 101--111. Hillsdale, NJ (1985)

20. Poli, R., Langdon, W.B.: A new schema theorem for genetic programming with one-point

crossover and point mutation. Evolutionary Computation. 6, 231--252 (1998)