The Effects of Diversity Maintenance on Coevolution for
an Intransitive Numbers Problem
Tirtha R. Ranjeet, Martin Masek, Philip Hingston and Chiou-Peng Lam
School of Computer and Security Science,
Edith Cowan University
{t.ranjeet, m.masek, p.hingston, p.lam}@ecu.edu.au
Abstract. In this paper, we investigate the effectiveness of several techniques
commonly recommended for overcoming convergence problems with
coevolutionary algorithms. In particular, we investigate effects of the Hall of
Fame, and of several diversity maintenance methods, on a problem designed to
test the ability of coevolutionary algorithms to deal with an intransitive
superiority relation between solutions. We measure and analyse the effects of
these methods on population diversity and on solution quality.
Keywords: coevolution, diversity maintenance, HOF, fitness sharing
1 Introduction
Evolutionary algorithms are population-based, stochastic search algorithms
modelled on evolutionary processes in nature. Potential solutions to a problem are
assigned a fitness that reflects how well they solve the problem, and these values
guide the search. In a coevolutionary algorithm (CEA), this fitness value depends on
interactions with other potential solutions. CEAs offer advantages over ordinary
evolutionary algorithms in certain situations: when there is no objective function to
measure the fitness of a solution; in a large search space where there are two or more
interacting subspaces; and in certain complex problem domains [1-7]. However, CEAs
can also suffer from pathologies which interfere with convergence. Many techniques
have been proposed to address these pathologies. One approach is to use an archive of
high quality solutions - the Hall of Fame is a well-known example [8]. Another
idea is to use a diversity maintenance mechanism, such as fitness sharing [9-13].
This work is an empirical study, using a recent method for estimating solution set
quality [9,14], to investigate how diversity maintenance techniques can improve the
effectiveness of CEAs, both with and without the additional use of an archive. More
specifically, we empirically test variants of a standard CEA with different mutation
rates, with and without competitive fitness sharing, and with and without a Hall of
Fame, on a test problem designed to challenge CEAs. We examine how solution set
diversity and quality are affected in each variant.
The aim of fitness sharing and HOF is to improve the quality of solutions found by
the CEA, yet for many problems there is no predefined quality metric; rather, quality
can only be judged based on how evolved solutions interact with other solutions. In
[9,14], Chong et al. proposed that the appropriate quality measure for CEAs is
generalization performance, and introduced a set of methods for estimating it. They
explored the relationship between diversity and quality, using various implicit and
explicit diversity methods, and concluded that appropriate diversity improves quality.
In our paper, we have adapted the methods of Chong et al. to a different kind of
problem. They used a problem with a single population of interacting agents, Iterated
Prisoner’s Dilemma (IPD), whereas our intransitive number test problem uses two
competing populations, as is suitable when evolving competing sets of solutions in an
asymmetric domain. As well as diversity and quality, we also investigate the effect of
HOF, and its interaction with diversity maintenance.
The remainder of this paper is structured as follows. In Section 2, we review the
basics of CEAs, the Hall of Fame, and some common diversity maintenance
methods, as well as describing methods for measuring diversity and quality in CEAs.
In Section 3, we give a description of the design of our experiments. In the final two
sections, we describe our results and conclude.
2 Coevolutionary Algorithms
Evolutionary algorithms (EAs) are stochastic search methods inspired by
biological evolution. EAs work with populations of solutions (individuals). Each
individual’s fitness depends on its performance against a criterion. Individuals with
high fitness are selected preferentially to produce “offspring” individuals for the next
generation. Two selected parents produce several offspring by exchanging genes
(crossover). Then, each offspring alters its gene structure with some probability
(mutation) and becomes a new individual in the next generation. This process of
variation and selection is repeated until some stopping condition is met.
A coevolutionary algorithm (CEA) is an evolutionary algorithm in which the
fitness of each individual depends on interactions between it and other individuals [1].
In CEAs, individuals are organised into sub-populations which coevolve [2,3,15,16].
The fitness calculation in CEAs is subjective: each individual interacts with
individuals from another population. Unlike objective fitness, subjective fitness is
dependent on the composition of the populations. A typical subjective fitness
calculates the average score of an individual in interactions with opposing individuals
in the current populations.
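For concreteness, a minimal sketch of this calculation (ours, not the authors' implementation) is given below; the `score` function and the list-based population representation are placeholder assumptions.

```python
def subjective_fitness(individual, opponents, score):
    """Subjective fitness: the average score of `individual` over
    interactions with every member of the opposing population."""
    return sum(score(individual, opp) for opp in opponents) / len(opponents)
```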
2.1 Hall of Fame
The Hall of Fame (HOF) is a technique that allows the population to interact with a
set of the best individuals from previous generations of the opponent population. The
best individuals from both populations in every generation are collected and stored in
an archive, which interacts with the populations during the fitness evaluation. The
function of the HOF is to preserve some older individuals, in order to avoid the cycling and
forgetting pathologies. When the HOF is used, subjective fitness is modified to be the
average score of an individual in interactions with opposing individuals in the current
populations and also in the Hall of Fame [8].
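A sketch of how the HOF modifies this calculation might look as follows, under the same placeholder assumptions; the archive policy (one best individual per population per generation) follows the description above.

```python
def subjective_fitness_hof(individual, opponents, hall_of_fame, score):
    """Average score over the current opposing population plus the
    archived best-of-generation opponents in the Hall of Fame."""
    pool = list(opponents) + list(hall_of_fame)
    return sum(score(individual, opp) for opp in pool) / len(pool)


def update_hall_of_fame(hall_of_fame, population, fitness):
    """After each generation, archive the best individual of the population."""
    hall_of_fame.append(max(population, key=fitness))
```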
2.2 Diversity
Too much selective pressure and/or not enough exploration in an evolutionary process
can cause premature convergence [9]. Maintaining diversity in the population has
been shown to prevent premature convergence [12] in many instances. Chong et al. [9,14]
categorize diversity maintenance methods into two types, implicit and explicit:
Implicit diversity maintenance methods use the selection process. A typical
implicit method is competitive fitness sharing (FS), where diversity is maintained in
the population by discouraging individuals with similar characteristics. Fitness values
are reduced for individuals with common gene structures. The shared fitness of an
individual $f'_i$ is calculated by dividing simple fitness by the niche count:

$$ f'_i = \frac{f_i}{c_i} \tag{1} $$

The symbol $c_i$ is the niche count, which is calculated on the basis of the
individual's gene structure variation ($d_j$) in the population. The following
formulas are used to calculate gene variation and niche count respectively:

$$ d_j = \sqrt{\sum_{m=1}^{u} (x_m - y_{j,m})^2} $$

$$ c_i = \sum_{j=1}^{n} sh(d_j), \qquad sh(d_j) = \begin{cases} 1 - \left( d_j / n_r \right)^{\tau} & \text{if } d_j < n_r \\ 0 & \text{otherwise} \end{cases} $$

The symbol $u$ is the genome length, $x$ is an individual and $y_j$ is an individual
from the same population, and $x_m$ and $y_{j,m}$ are their $m$th gene values. The
symbol $\tau$ is a constant. The symbol $n_r$ is a constant niche radius and $n$ is
the population size.
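Putting equation (1) and the two formulas above together, competitive fitness sharing could be sketched as follows; genomes as tuples of floats, and the exact form of the sharing function, are our assumptions.

```python
import math


def gene_distance(x, y):
    """d_j: Euclidean distance between two genomes of length u."""
    return math.sqrt(sum((xm - ym) ** 2 for xm, ym in zip(x, y)))


def shared_fitnesses(population, fitnesses, niche_radius, tau):
    """f'_i = f_i / c_i, where the niche count c_i sums the sharing
    function over all individuals within the niche radius n_r."""
    shared = []
    for x, f in zip(population, fitnesses):
        c = sum(
            1.0 - (gene_distance(x, y) / niche_radius) ** tau
            for y in population
            if gene_distance(x, y) < niche_radius
        )
        # c >= 1 because x lies in its own niche (distance 0),
        # so no division by zero can occur
        shared.append(f / c)
    return shared
```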
Explicit diversity maintenance methods achieve diversity through variation. A
simple method is to increase the mutation rate.
Two types of diversity are genotypic and phenotypic diversity. Genotypic diversity
in a population is a measure of the gene structure variation, calculated as the average
gene variation over the population. Phenotypic diversity is calculated based on the
entropy [11,12] of the distribution of fitness values. The fitness values present in the
population are divided among $N$ equal-sized buckets, and then equation (2) is applied:

$$ H = -\sum_{k=1}^{N} p_k \ln p_k \tag{2} $$

where $p_k$ is the proportion of the population whose fitness falls into bucket $k$.
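Both diversity measures can be sketched in a few lines; binning fitness values into equal-width buckets across the observed range is our reading of "equal sized buckets".

```python
import math
from itertools import combinations


def genotypic_diversity(population):
    """Average pairwise Euclidean gene distance over the population."""
    pairs = list(combinations(population, 2))
    return sum(math.dist(x, y) for x, y in pairs) / len(pairs)


def phenotypic_diversity(fitnesses, n_buckets):
    """Entropy of the fitness distribution (equation 2): fitness values
    are binned into N equal-width buckets spanning the observed range."""
    lo, hi = min(fitnesses), max(fitnesses)
    if hi == lo:
        return 0.0  # all fitness values identical: zero entropy
    counts = [0] * n_buckets
    for f in fitnesses:
        k = min(int((f - lo) / (hi - lo) * n_buckets), n_buckets - 1)
        counts[k] += 1
    total = len(fitnesses)
    return -sum(c / total * math.log(c / total) for c in counts if c > 0)
```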
2.3 Quality
We adopt the approach of Chong et al. to measure quality, i.e. we use a statistical
estimate of the generalization performance of a solution, but we modify it slightly to
account for the fact that we are using two populations. Chong et al. begin by defining
generalization performance as the mean score of a solution in all possible test cases.
This intuitively appealing idea is usually impractical to calculate. Therefore, they
propose a statistical approximation approach, in which a mean score is computed for a
suitable sample of test cases. In many cases, scores against “high quality” test cases
might be considered more important. They therefore propose two different methods
for sampling the space of test cases: unbiased sampling (purely random) and biased
sampling (favours higher quality). In the present study, due to space limitations, we
report only on results using biased sampling. To obtain a biased test set, we follow the
procedure in Chong et al., using a sample size of 200. Once we have generated test
sets, we can use them to estimate the quality of each solution as its mean score against
the test set solutions, and we can combine these in various ways to obtain an overall
quality measure for an evolved population of solutions.
Estimated Average Quality In an evolutionary algorithm, we are usually most
interested in the top few evolved solutions. Thus, we first sort the population
according to internal fitness, and then consider only the top few. Average quality is
then estimated as
$$ \bar{E} = \frac{1}{nBest} \sum_{i=1}^{nBest} E_i, \qquad E_i = \frac{1}{nTest} \sum_{j=1}^{nTest} \mathrm{score}(i, t_j) \tag{3} $$

where $E_i$ is the estimated quality of solution $i$ (its mean score against the test
set solutions $t_j$), $nTest$ is the size of the test set, and $nBest$ is the number
used in the estimate (i.e. we use only the best $nBest$).
Estimated Best Quality This is the quality of the best solution amongst the top
nBest solutions in the population, when they are sorted on internal fitness:
$$ E^{*} = \max_{1 \le i \le nBest} E_i \tag{4} $$
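Under the same placeholder `score` and `fitness` assumptions as earlier, both estimates reduce to a few lines:

```python
def estimated_qualities(population, fitness, test_set, score, n_best):
    """Equations (3) and (4): estimated average and estimated best
    quality of the top n_best solutions, ranked by internal fitness."""
    top = sorted(population, key=fitness, reverse=True)[:n_best]
    # E_i: mean score of solution i against the biased test set
    E = [sum(score(s, t) for t in test_set) / len(test_set) for s in top]
    return sum(E) / n_best, max(E)
```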
3 Experiments
In this section, we describe our experimental design. We describe the test problem
we have chosen to study, the algorithm variants that we test, and the measurements
that we gather during the testing.
As our test problem, we chose an intransitive number problem which was
introduced by Watson and Pollack [17]. It has advantages over the test problem used
by Chong et al., the IPD. The IPD is an important and widely studied problem, but it is
extremely difficult for a CEA, with complex evolutionary dynamics and an enormous
search space (in practice, researchers restrict their search to solutions expressible
in some limited representation). The intransitive number
problem has one specific feature that makes it difficult (intransitive superiority) and a
simple representation, as well as a known objective quality criterion, making it very
suitable for testing.
Watson and Pollack [17] introduced intransitive number test problems to test the
functionality of CEAs. We pose a version with two populations. Individual solutions
in both populations consist of pairs of real numbers in (0, 100), which we call x and y.
The score when solution a from one population meets solution b from the other
population is given in Equation (5):
$$ \mathrm{score}\big( (a_x, a_y), (b_x, b_y) \big) = \begin{cases} 1 & \text{if } |a_x - b_x| < |a_y - b_y| \text{ and } a_x > b_x \\ 1 & \text{if } |a_x - b_x| \ge |a_y - b_y| \text{ and } a_y > b_y \\ 0 & \text{otherwise} \end{cases} \tag{5} $$
Consider three solutions: A = <10;90>, B = <11;88> and C = <8;89>. Now score(A, B)
is 0 (B beats A), because 10 and 11 are closer than 90 and 88, so the score is
determined by which solution has the larger x value. Similarly, C beats B (based on a
larger y), and yet A beats C. Thus the superiority relation between solutions is
intransitive. Although this is problematic, generally speaking, the closer a solution
is to <100;100>, i.e. the larger both its x and y values are, the higher its quality.
We define the actual quality of solution i as $A_i = (x_i + y_i)/2$, the average of the
solution's x and y values. We can then define measures for the actual quality of a
population, in a similar way as for estimated quality.
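A direct transcription of our reconstruction of equation (5), together with the worked example above, makes the intransitive cycle easy to verify:

```python
def score(a, b):
    """Equation (5): the contest is decided on the dimension in which
    the two solutions are closest; a scores 1 for a win, 0 otherwise."""
    (ax, ay), (bx, by) = a, b
    if abs(ax - bx) < abs(ay - by):
        return 1 if ax > bx else 0  # x values are closer: compare on x
    return 1 if ay > by else 0      # otherwise: compare on y


def actual_quality(solution):
    """A_i = (x + y) / 2, the known objective quality of a solution."""
    x, y = solution
    return (x + y) / 2


# The intransitive cycle from the example: B beats A, C beats B, A beats C.
A, B, C = (10, 90), (11, 88), (8, 89)
assert score(A, B) == 0 and score(B, C) == 0 and score(A, C) == 1
```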
3.1 Algorithms tested
For this experiment, four algorithms were considered: the naïve CEA, CEA with fitness
sharing (CEAFS), CEA with HOF (CEAHOF), and CEA with a combination of FS and HOF
(CEAFH). For each, the mutation rate was varied from 5% to 25% in steps of 5%.
In all algorithms tested, single point crossover [20] and polynomial mutation [18]
were used for the reproduction process. Parents were selected using a stochastic
universal sampling method [19], and an elite individual was copied to the next
generation. Initial gene values were randomly generated between 0 and 100.
Population size (25) and crossover rate (60%) are as recommended by Watson and
Pollack, and we chose 300 generations based on initial testing that showed the
algorithms had stabilised well before this point. Each run of an algorithm was repeated 60 times to
account for variation.
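For reference, the parameter settings above can be gathered into a single configuration; the dictionary form is ours, but each value comes from the text.

```python
EXPERIMENT_CONFIG = {
    "algorithms": ["CEANaive", "CEAFS", "CEAHOF", "CEAFH"],
    "mutation_rates": [0.05, 0.10, 0.15, 0.20, 0.25],
    "population_size": 25,    # as recommended by Watson and Pollack
    "crossover_rate": 0.60,   # single-point crossover
    "generations": 300,       # algorithms stabilise well before this
    "runs_per_variant": 60,   # repetitions to account for variation
    "gene_range": (0, 100),   # initial gene values drawn uniformly
}
```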
4 Results and analysis
In this section, we review the results of our experiments by examining quality and
diversity in the evolved populations produced using each algorithm. First we examine
the quality. In Fig. 1, a convergence plot for the CEA naïve algorithm is shown. Each
data point is an average across 60 runs of the algorithm for a specific generation. The
y-axis is the estimated best quality. By about 100 generations, the algorithm has
converged, except in the case of 5% mutation, which needs around 200 generations.
The best mutation rate in terms of estimated quality appears to be 25%. The actual
best quality plot is similar, except that the mutation rate has little effect.
In order to quantify this visual impression, we compute average figures over the
last 60 generations (as the algorithms appear to have converged by then) and all 60
runs (i.e. an average of 3600 data values) for each mutation rate. These averages are
presented in Table 1 (along with diversity data). From the table we can see that, in the
case of the naïve algorithm CEA, higher mutation rates tend to give higher best
quality (both estimated and actual), and that there is little effect on average quality.
Convergence plots for average quality are qualitatively similar to those for best
quality, and are omitted.
Looking at CEAFS, we see that best quality is not sensitive to mutation rate, that
estimated best quality is high compared with CEA, and that actual best quality is
clearly improved. Thus, fitness sharing is effective in increasing the
performance of the algorithm (higher best quality). Average quality is reduced when
compared with CEA, and decreases with higher mutation rates. The reduction in
average quality is due at least in part to the increased diversity of the population, as
expected. Convergence plots are quite similar to those for CEA, apart from the final
quality levels being different.
CEAHOF has improved quality compared to CEA, with estimated best quality very
similar to CEAFS, and the actual best quality also similar, but more sensitive to
mutation rate. In fact the best performance over all the algorithms on this measure
was CEAHOF with 25% mutation. However average quality levels are actually higher
than for CEA, suggesting that the improved performance is not due to an increase in
diversity.
Fig. 1. Convergence plot for CEANaive with different mutation rates, showing average
estimated best quality over 60 runs.
Finally, the performance of CEAFH is rather erratic, with best quality levels
similar to the naïve algorithm, along with a lower average quality. We conjecture that
this is because the mechanism of HOF and diversity maintenance methods interfere
and conflict with each other, rendering both ineffective.
As well as solution quality, we also focus on the role of diversity. Following
Chong et al., we measured both genotypic and phenotypic diversity. Fig. 2 is a
generational plot showing the progress of genotypic diversity for CEA: diversity
drops swiftly, with a slight recovery, before levelling out. Phenotypic diversity
behaves similarly. This low diversity might be expected to cause
problems such as premature convergence. Higher mutation rates reduce the loss of
diversity.
Fig. 2. Generational plot of genotypic diversity with CEANaive. Data values are averaged over
60 runs.
Table 1. Population quality and diversity figures for all algorithm variants. Each column shows
the mean for the last 60 generations, over 60 runs of the algorithm.
Algorithm     Est.Average  Est.Best  Act.Average  Act.Best   Geno   Pheno
CEANaive05    0.85         0.93      75.45        75.36       6.72   0.98
CEANaive10    0.82         0.92      74.95        83.47      10.25   1.12
CEANaive15    0.84         0.95      74.50        84.16      11.42   1.17
CEANaive20    0.82         0.96      72.93        83.79      12.50   1.32
CEANaive25    0.84         0.98      74.67        84.74      14.18   1.25
CEAFS05       0.69         0.95      71.11        91.74      25.99   1.51
CEAFS10       0.68         0.96      70.04        91.39      26.11   1.55
CEAFS15       0.66         0.96      70.00        91.12      26.08   1.64
CEAFS20       0.63         0.95      68.89        91.61      27.17   1.70
CEAFS25       0.63         0.96      69.03        91.41      27.19   1.71
CEAHOF05      0.88         0.95      82.15        88.58       6.32   0.88
CEAHOF10      0.88         0.95      84.76        90.95       8.58   0.95
CEAHOF15      0.86         0.95      83.94        91.81       9.62   0.98
CEAHOF20      0.84         0.96      83.31        91.14      10.79   1.09
CEAHOF25      0.85         0.96      88.09        95.51      11.20   0.93
CEAFH05       0.49         0.95      61.98        82.78      23.18   1.16
CEAFH10       0.50         0.90      65.06        85.95      23.41   1.62
CEAFH15       0.48         0.88      63.95        83.97      23.72   1.67
CEAFH20       0.46         0.87      63.82        83.60      23.56   1.71
CEAFH25       0.45         0.87      65.20        85.77      24.12   1.71
The last two columns of Table 1 summarise diversity values for variants of each
algorithm. It can be seen that higher mutation rates increase diversity, as expected,
and that this effect is much smaller when fitness sharing is used, as diversity is
already effectively maintained. Also, the level of diversity is much higher in every
case when fitness sharing is used than in any case where fitness sharing is not used.
The effect of HOF is to reduce diversity, again emphasising that the improvement in
quality when HOF is used is due to a different mechanism.
Fig. 3. Scatter plot of diversity versus quality for each of the four algorithms, with a
mutation rate of 5%. For each data point, the x value is the mean value of genotypic
diversity over the last 60 generations in one run of the particular algorithm, while the
y value is the corresponding mean of the actual best quality measure.
Due to space restrictions, we have omitted generational diversity plots for CEAFS,
CEAHOF and CEAFH, but we can provide a qualitative description of them as
follows: for CEAFS, the plots show a small but rapid rise in genotypic diversity, after
which the level remains steady. There is an initial small increase in phenotypic
diversity, then a quick drop, and a levelling out at about the initial diversity level.
The overall shape of the plots for CEAHOF is similar to those for CEA, except that
the final diversity levels are a little lower. CEAFH is similar to CEAFS, with
genotypic diversity levels slightly lower. The fact that the performance of CEAFH is
so poor, even though diversity is only slightly reduced, again suggests that HOF and
diversity maintenance are interfering with each other.
To further scrutinize the relationship between diversity and quality, we present Fig.
3, a scatter plot of genotypic diversity versus actual best quality, for all algorithms,
with a mutation rate of 5%. It is clear that the naïve algorithm and CEAHOF provide
all the points on the left of the plot, i.e. those with lower diversity, and that their
quality values are widely spread, i.e. the algorithm is unreliable (though it sometimes
converges on very high quality). In contrast, the two algorithms with fitness sharing
contribute all the higher diversity points, and reliable quality, with CEAFS being
more consistent than CEAFH.
5 Conclusion
In this paper, we have described our experiments with different variations on a
naïve CEA, introducing combinations of fitness sharing, Hall of Fame, and a range of
mutation rates. We have tested these variations on a test problem designed to be
difficult for CEAs due to an intransitive superiority relationship between solutions.
We have measured the effects of these variations on the performance of the algorithm
in terms of population diversity and solution quality. With regards to diversity, our
results are in broad agreement with those found by Chong et al. on a different
problem (Iterated Prisoner's Dilemma): fitness sharing is an effective way to maintain
population diversity in a CEA, and a moderate amount of diversity helps to ensure
that high quality solutions are reliably found. In addition, we found that the Hall of
Fame method can also improve quality, but not as reliably as fitness sharing, and that
the diversity maintenance methods that we tested do not combine well with Hall of
Fame.
In future, we intend to carry out similar tests on further test problems having
different characteristics, such as multi-modal problems, to try to improve
understanding of which methods are most effective for which kinds of problems. We
would also like to investigate whether there are ways to combine diversity
maintenance with HOF effectively.
6 References
1. Axelrod, R.: The evolution of strategies in the iterated Prisoner's Dilemma. Genetic
Algorithms and Simulated Annealing. 32--41 (1987)
2. de Jong, E., Stanley, K., Wiegand, P.: Introductory tutorial on coevolution. In: Proceedings
of the 2007 Genetic and Evolutionary Computation Conference (GECCO 2007), pp. 3133--
3157. ACM, New York (2007)
3. Ficici, S. G.: Solution concepts in coevolutionary algorithms. Ph.D. Dissertation. Brandeis
University (2004)
4. Hillis, W. D.: Coevolving parasites improve simulated evolution as an optimization
procedure. Physica D: Nonlinear Phenomena. 42, 228--234 (1990)
5. Potter, M. A., De Jong, K. A.: A cooperative coevolutionary approach to function
optimization. In: The Third Conference on Parallel Problem Solving from Nature, pp.
249--257. Springer-Verlag, London (1994)
6. Rosin, C. D.: Coevolutionary search among adversaries. Ph.D. Dissertation. University of
California, San Diego (1997)
7. Wiegand, R. P.: An analysis of cooperative coevolutionary algorithms. George Mason
University, Virginia (2003)
8. Rosin, C. D., Belew, R. K.: New methods for competitive coevolution. Evolutionary
Computation, 5, 1--29 (1997)
9. Chong, S. Y., Tino, P., Yao, X.: Relationship between generalization and diversity in
coevolutionary learning. IEEE Transactions on Computational Intelligence and AI in
Games, 1, 214--232 (2009)
10. McKay, R. I.: Fitness sharing in genetic programming. In: Proceedings of the Genetic
and Evolutionary Computation Conference (GECCO 2000), Las Vegas (2000)
11. Ray, T. S.: Evolution, complexity, entropy and artificial reality. Physica D: Nonlinear
Phenomena, pp. 239--263 (1993)
12. Rosca, J. P.: Entropy-driven adaptive representation. In: Proceedings of the Workshop on
Genetic Programming: From Theory to Real-World Applications, pp. 23--32 (1995)
13. Yao, X., Liu, Y.: How to Make Best Use of Evolutionary Learning. Complex Systems -
From Local Interactions to Global Phenomena, pp. 229--242 (1996)
14. Chong, S. Y., Tino, P., Yao, X.: Measuring Generalization Performance in Coevolutionary
Learning. IEEE Transactions on Evolutionary Computation, 12, 479--505 (2008)
15. Casillas, J., Cordon, O., Herrera, F., Merelo, J. J.: A cooperative coevolutionary algorithm
for jointly learning fuzzy rule bases and membership functions. In: Artificial Evolution,
pp. 1075--1105 (2002)
16. Ficici, S. G., Pollack, J. B.: Pareto Optimality in Coevolutionary Learning. In: Proceedings
of the 6th European Conference on Advances in Artificial Life, pp. 316--325. Springer-
Verlag, London (2001)
17. Watson, R. A., Pollack, J. B.: Coevolutionary dynamics in a minimal substrate. In:
Proceedings of the Genetic and Evolutionary Computation Conference (GECCO-2001).
Morgan Kaufmann, San Francisco (2001)
18. Deb, K., Goyal, M.: A combined genetic adaptive search (GeneAS) for engineering
design. Computer Science and Informatics, 26, 30--45 (1996)
19. Baker, J. E.: Adaptive Selection Methods for Genetic Algorithms. In: Proceedings of the
1st International Conference on Genetic Algorithms, pp. 101--111. Hillsdale, NJ (1985)
20. Poli, R., Langdon, W.B.: A new schema theorem for genetic programming with one-point
crossover and point mutation. Evolutionary Computation. 6, 231--252 (1998)