An Adaptive Selection Scheme for Balancing
Exploitation and Exploration
By
Muhammad Riyad Parvez
Student ID: 200605043
Department of Computer Science and Engineering
Bangladesh University of Engineering and Technology
March 2012
DECLARATION
I hereby declare that the work presented in this thesis is the outcome of the investigation performed by me under the supervision of Dr. Md. Monirul Islam, Associate Professor, Department of Computer Science and Engineering, Bangladesh University of Engineering and Technology, Dhaka. I also declare that neither this thesis nor any part thereof has been or is being submitted elsewhere for the award of any degree or diploma.
Signature
(Muhammad Riyad Parvez)
Candidate
Countersigned
(Dr. Md. Monirul Islam)
Supervisor
Acknowledgements
Firstly, I would like to express my profound gratitude and respect to Dr. Md. Monirul Islam, Associate Professor, Department of Computer Science and Engineering, Bangladesh University of Engineering and Technology, for his continuous support, advice and care. His endless patience, scholarly guidance, continual encouragement, constant and energetic supervision, constructive criticism, valuable advice, and his reading and correcting of many inferior drafts at all stages have made it possible to complete this thesis.
I would also like to thank all the faculty members and staff of the Department of CSE, BUET, for their support and cooperation.
Finally, I would like to express my deep respect and gratitude to my parents and my family, and to Charles Darwin, from whom evolutionary algorithms draw their aesthetic beauty.
Abstract
Evolutionary algorithms are successfully applied to problems where typical optimization algorithms and local search methods fail. Like any search algorithm, an evolutionary algorithm (EA) has to balance the conflicting goals of local exploitation and global exploration during the search process, and its success lies in its ability to explore and exploit simultaneously. However, an EA is also prone to losing population diversity in early generations, which leaves the population trapped in local optima and degrades its performance. Although numerous schemes have been suggested for various stages of an EA to prevent premature convergence, maintaining a balance between exploration and exploitation has not been attempted at the survivor selection stage: existing selection strategies focus either on exploitation or on exploration.
This thesis introduces the Adaptive Survivor Selection Strategy (ASSS), a new concept that tries to maintain the required amount of diversity at the survivor selection stage. Along with the new selection strategy, this thesis also presents new criteria for measuring diversity, both for an individual and for the whole population. The key concept of this strategy is to measure diversity across the population, calculate the amount of diversity needed at that time, and try to reach that diversity level by selecting survivors with the newly introduced diversity measurement technique. ASSS uses both the fitness of an individual and how diverse the individual is with respect to the current population to calculate an adaptive survivor selection fitness function. Using that fitness, we simply weed out individuals that are inferior in terms of fitness and diversity. Because this survivor fitness function is adaptive, it gives control over selection pressure.
Performance in maintaining the required level of diversity at any time during a run is evaluated on a number of benchmark numerical optimization problems, and the results are compared with several existing selection schemes. Experimental results show that ASSS achieves a significant performance gain in managing diversity for any sort of EA.
Table of Contents
Chapter 1: Introduction
1.1 General Introduction
1.2 Thesis Objective
1.3 Thesis Organization
Chapter 2: Background
2.1 When EA is Needed
2.2 Advantages of EA
2.3 Disadvantages of EA
2.4 Canonical Structure of EA
2.5 Representation of Gene
2.6 Major Branches of EA
2.6.1 Genetic Algorithm
2.6.2 Evolutionary Programming
2.6.3 Evolutionary Strategies
2.6.4 Genetic Programming
2.6.5 Memetic Algorithm
2.7 Existing Works
2.7.1 Dynamic Parameter Control
2.7.2 Maintaining Diversity and Multi-population GAs
2.7.3 Memory Based Genetic Algorithm
2.7.4 Mutation Based Work
2.7.5 Survivor Selection Based Work
Chapter 3: Proposed Algorithms
3.1 Dual Population Genetic Algorithm
3.1.1 Advantages
3.1.2 Disadvantages
3.1.3 Recommendation
3.2 Modified DPGA Proposal
3.2.1 Structure of Individual
3.2.2 Initialization
3.2.3 Parent Selection
3.2.4 Generating Parent Individual on the Fly
3.2.5 Mutation
3.2.6 Survivor Selection
3.2.6.1 Exploited Individual
3.2.6.2 Explored Individual
3.2.6.3 Normal Individual
3.2.7 Schedule of T
3.2.8 Advantages
3.3 New Survivor Selection Strategy
3.4 New Mutation Strategy
3.4.1 Laplace Distribution
3.4.2 Slash Distribution
3.4.3 Student's t-Distribution
Chapter 4: Experimental Study
4.1 Modified DPGA
4.1.1 Pitfalls of Modified DPGA
4.2 Adaptive Survivor Selection Strategy
Chapter 5: Conclusion
5.2 Future Works
5.2.1 Modified DPGA
5.2.2 Adaptive Survivor Selection
5.2.3 New Distribution Based Mutation
References
Appendix
List of Figures
Figure 1.1(a): Change in best fitness (best solution) with number of generations
Figure 1.1(b): Comparison between Random Search, EA and Problem Tailored Search
Figure 2.4: Basic skeleton of an Evolutionary Algorithm
Figure 2.6.4: Individual structure of GP
Figure 2.7.4: Probability Density Function (PDF) of Gaussian distribution
Figure 2.7.5: Truncation Selection
Figure 3.1(a): Offspring Generation of DPGA
Figure 3.1(b): Reserve Population Fitness Function
Figure 3.4: Probability Density Function of Stable Family
Figure 3.4.1(a): Probability Density Function of Laplace Distribution
Figure 3.4.1(b): Comparison of Gaussian and Laplace Distribution
Figure 3.4.2: Probability Density Function of Slash Distribution at different parameters
Figure 3.4.3: Probability Density Function for Student's t-distribution with different degrees of freedom
Figure 4.2(a): Change in diversity across generations
Figure 4.2(b): Number of buckets searched
List of Symbols
MPGA Multi Population Genetic Algorithm
DPGA Dual Population Genetic Algorithm
GA Genetic Algorithm
StGA Standard Genetic Algorithm
EA Evolutionary Algorithm
EC Evolutionary Computing
EP Evolutionary Programming
MA Memetic Algorithm
FEP Fast Evolutionary Programming
ASSS Adaptive Survivor Selection Strategy
Chapter 1
Introduction
1.1 General Introduction
An Evolutionary Algorithm (EA) is a computational system that uses ideas and draws inspiration from natural evolution. It is a generic, population-based meta-heuristic optimization algorithm and falls into the category of bio-inspired computing. It uses selection, crossover and mutation mechanisms borrowed from natural evolution, and the survival-of-the-fittest principle lies at the heart of EA [1] [2]. Evolutionary algorithms are often viewed as function optimizers, although the range of problems to which EAs are applied is quite broad. One of the many advantages of EAs is that they do not require extensive domain knowledge, although such knowledge can be incorporated into an EA when it is available.
The fitness curve of an EA over generations is asymptotic in nature. Fitness improves rapidly in the early generations, but at a decreasing rate, and after a certain number of generations the improvement in best fitness from one generation to the next becomes negligible. At that point we say the population has converged. Ideally, the population converges to a good enough solution, but sometimes it converges to a local optimum, which is not an acceptable result. This phenomenon is called premature convergence.
Figure 1.1(a): Change in best fitness (best solution) with number of generations
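The idea that improvement becomes "negligible" can be made operational with a simple stagnation test on the best-fitness history. The sketch below is an illustration rather than a criterion taken from this thesis: the function name `has_converged`, the window length and the tolerance are all assumptions, and fitness is assumed to be maximized.

```python
def has_converged(best_history, window=20, tol=1e-6):
    """Stagnation test on a list of best-fitness values, one per generation.

    Returns True when the best fitness has improved by less than `tol`
    over the last `window` generations (maximization assumed).
    """
    if len(best_history) <= window:
        return False  # not enough generations observed yet
    improvement = best_history[-1] - best_history[-1 - window]
    return improvement < tol


# An asymptotic curve: fast early gains, then a plateau.
history = [1.0 - 0.5 ** g for g in range(100)]
print(has_converged(history[:10]))  # False: too few generations
print(has_converged(history))       # True: gains have become negligible
```

Such a test only detects that the population has stopped improving; it cannot distinguish convergence to a good solution from premature convergence to a local optimum, which is why diversity has to be monitored as well.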
EAs perform better than random search because of their exploitative behavior: they use a random walk, but they also try to exploit good solutions. EAs also outperform local greedy search. Local greedy searches are exploitative in nature and are often trapped in local maxima, whereas an EA combines a random walk with a maintained level of diversity, which makes it less likely to be trapped in local maxima. Problem-tailored searches outperform EAs only on the problem for which they are tailored, because they use deep domain knowledge of that problem. Such knowledge is not readily available, and incorporating it into a problem-tailored search is difficult.
Figure 1.1(b): Comparison between Random Search, EA and Problem Tailored Search [4]
1.2 Thesis Objective
This thesis mainly focuses on maintaining the diversity of single-population algorithms. It is frequently observed that populations lose diversity too early and their individuals become trapped in local optima. Lacking diversity, the trapped individuals cannot escape the basin of the local optimum. This phenomenon is called premature convergence. The objective of this thesis is to investigate better schemes that can maintain the diversity of a population and also give control over that diversity; the goal is an adaptive diversity-maintaining scheme.
The work focuses on three areas:
1. Modifying the Dual Population Genetic Algorithm (DPGA) so that it can properly manage diversity.
2. Seeking a survivor selection technique that is adaptive and gives more control over diversity at any stage of the algorithm.
3. Examining probability distributions other than those already in use that can produce an appropriate amount of jumps at any stage of evolution.
1.3 Thesis Organization
The rest of the thesis is organized as follows. Chapter 2 introduces the fundamentals of evolutionary algorithms, with their operators and processes. The essential terms related to evolutionary algorithms are explained with examples, and the strengths, limitations and applications of evolutionary algorithms are also discussed.
In Chapter 3, we introduce new evolutionary strategies, named Modified DPGA, Adaptive Survivor Selection Strategy (ASSS) and New Mutation Based on Distributions, to balance the exploitative and explorative features of the standard evolutionary algorithm. The different stages, operators and procedures of Modified DPGA, ASSS and Mutation Based on Distribution are described in detail, and we explain how they differ substantially from other existing works.
Chapter 4 evaluates Modified DPGA and ASSS on a number of benchmark numerical optimization problems and compares them with several other existing works. Although Modified DPGA did not perform well, we gained valuable insight into how it can be modified further to improve its performance. An in-depth experimentation with the parameters, operators and stages of ASSS, and their effects on population fitness and diversity, is also carried out. Finally, in Chapter 5, we summarize our work and provide directions for future research.
Chapter 2
Background
Evolutionary Algorithms (EAs) comprise several heuristics that are able to solve optimization tasks by imitating some aspects of natural evolution. They may use different levels of abstraction, but they always work on whole populations of possible solutions for a given task. EAs are an established set of heuristics that are flexible to use and impose only negligible requirements on the optimization task.
2.1 When EA is Needed
• The search space is large, complex or poorly understood.
• Domain knowledge is scarce, or expert knowledge is difficult to encode to narrow the search space.
• Only a target (fitness) function is provided.
• No mathematical analysis is available.
• Traditional search methods fail.
• A good enough solution, rather than the best solution, is needed.
• Local search methods cannot give good enough solutions.
• The problem is a continuous optimization problem.
2.2 Advantages of EA
• Applicable to a wide range of problems.
• Useful in areas without good problem-specific techniques.
• No explicit assumptions about the search space are necessary.
• Easy to implement.
• Any-time behavior: a usable solution is available whenever the run is stopped.
2.3 Disadvantages of EA
• The problem representation must be robust.
• No general guarantee of finding an optimum.
• No solid theoretical foundations (yet).
• Parameter tuning is a trial-and-error process (although self-adaptive variants exist in evolution strategies).
• Sometimes high memory requirements.
• Implementation: high degree of freedom.
2.4 Canonical Structure of EA
EAs are a family of algorithms, and no single definite structure is shared by all of them. However, most EAs follow, more or less, the structure below:
1. Initialization: The initial population of candidate solutions is usually generated randomly across the search space. However, domain-specific or other knowledge can easily be incorporated.
2. Evaluation: Once the population is initialized or offspring population is created,
the fitness value of the candidate solutions is evaluated.
3. Parent Selection: Selection allocates more copies of the solutions with higher fitness values and thus imposes the survival-of-the-fittest mechanism on the candidate solutions. The main idea of selection is to prefer better solutions to worse ones, and many selection procedures have been proposed to accomplish this, including roulette-wheel selection, stochastic universal sampling, ranking selection and tournament selection.
4. Recombination: Recombination combines parts of two or more parental solutions to create new, possibly better solutions (i.e. offspring). There are many ways of accomplishing this, and competent performance depends on a properly designed recombination mechanism. The offspring produced by recombination will not be identical to any particular parent and will instead combine parental traits in a novel manner.
5. Mutation: While recombination operates on two or more parental chromosomes,
mutation locally but randomly modifies a solution. Again, there are many
variations of mutation, but it usually involves one or more changes being made to
an individual’s trait or traits. In other words, mutation performs a random walk in
the vicinity of a candidate solution.
6. Replacement: The offspring population created by selection, recombination and mutation replaces the original parental population. Many replacement techniques, such as elitist replacement, generation-wise replacement and steady-state replacement, are used in GAs.
7. Repeat steps 2–6 until a terminating condition is met.
Figure 2.4: Basic skeleton of an Evolutionary Algorithm
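The seven steps can be sketched as a minimal generational EA on bit-strings. Binary tournament parent selection, one-point crossover, bit-flip mutation with rate `p_mut`, keeping the single best individual (elitist replacement) and a fixed generation budget as the terminating condition are our own illustrative choices, as are the names `evolve` and `tournament`; other EAs instantiate each step differently.

```python
import random


def tournament(scored, rng):
    """Binary tournament: return the fitter of two randomly drawn individuals."""
    (fa, a), (fb, b) = rng.sample(scored, 2)
    return a if fa >= fb else b


def evolve(fitness, n_genes, pop_size=20, generations=100, p_mut=0.05, seed=0):
    """Minimal generational EA on bit-strings (maximization)."""
    rng = random.Random(seed)
    # 1. Initialization: random bit-strings spread over the search space
    pop = [[rng.randint(0, 1) for _ in range(n_genes)] for _ in range(pop_size)]
    for _ in range(generations):  # 7. terminate after a fixed budget
        # 2. Evaluation
        scored = [(fitness(ind), ind) for ind in pop]
        elite = max(scored, key=lambda s: s[0])[1]
        offspring = [elite[:]]  # elitism: the best individual survives unchanged
        while len(offspring) < pop_size:
            # 3. Parent selection
            p1, p2 = tournament(scored, rng), tournament(scored, rng)
            # 4. Recombination: one-point crossover
            cut = rng.randrange(1, n_genes)
            child = p1[:cut] + p2[cut:]
            # 5. Mutation: flip each bit independently with probability p_mut
            child = [1 - g if rng.random() < p_mut else g for g in child]
            offspring.append(child)
        # 6. Replacement: offspring (plus the elite) replace the parents
        pop = offspring
    return max(pop, key=fitness)


best = evolve(sum, n_genes=30)  # OneMax: fitness = number of 1 bits
print(sum(best), "of 30 bits set")
```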
2.5 Representation of Gene
Individual representations are typically divided into two types:
1. Genotypic Representation: Genes are internal structures that determine the physical characteristics of an individual. In nature they are represented by an array of letters, as in human DNA; in EAs, the genotype is represented by a bit-string. Genotypic representation is used extensively in Genetic Algorithms, but it has some limitations. Most real-world problems are not naturally expressed in genotypic form, so we have to devise a scheme to encode solutions as bit-strings, and the performance of the algorithm depends on that encoding.
2. Phenotypic Representation: Individuals are represented by real-valued vectors, so there is no need to convert them to any other representation; the algorithm works directly on the real-valued vectors of the problem. This representation is used extensively in Evolutionary Strategies and Evolutionary Programming for real-valued function optimization.
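To make the genotype/phenotype distinction concrete, the sketch below shows one common way of encoding a real variable as a bit-string: linear quantization of an interval [lo, hi]. The scheme and the helper names `encode` and `decode` are illustrative assumptions, not the encoding used in this thesis.

```python
def encode(x, lo, hi, n_bits):
    """Genotype from phenotype: quantize x in [lo, hi] onto 2**n_bits - 1
    levels and write the level index as an n_bits-long binary string."""
    levels = 2 ** n_bits - 1
    k = round((x - lo) / (hi - lo) * levels)
    return format(k, f"0{n_bits}b")


def decode(bits, lo, hi):
    """Phenotype from genotype: map the bit-string back to a real number."""
    levels = 2 ** len(bits) - 1
    return lo + int(bits, 2) * (hi - lo) / levels


g = encode(1.5, -5.0, 5.0, 10)
print(g)                    # 1010011001
print(decode(g, -5.0, 5.0)) # close to, but not exactly, 1.5
```

A phenotypic EA (ES or EP) would skip both functions and mutate x directly. In the genotypic version, precision is limited to (hi - lo)/(2**n_bits - 1) per gene, one reason performance depends on how the bit-string is designed.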
2.6 Major Branches of EA
EAs are divided into four major branches.
2.6.1 Genetic Algorithm
The Genetic Algorithm (GA) was first formulated by John Holland. Holland's original GA, called the standard Genetic Algorithm, uses two parents to produce two offspring. It simulates Darwinian evolution. Search operators are applied only to the genotypic representation; hence it is called a genotypic algorithm. It emphasizes the role of crossover, with mutation as a background operator. GAs extensively use binary strings to represent individuals.
2.6.2 Evolutionary Programming
Evolutionary Programming (EP) was first proposed by Lawrence J. Fogel and later developed further by David Fogel [2]. It is closer to Lamarckian evolution. It does not use any kind of crossover; mutation alone is used for both exploitation and exploration. Individuals are represented by two parts, the object variables $x$ and the mutation step sizes $\eta$. Both are real-valued vectors, i.e. phenotypes, so EP is called a phenotypic algorithm.
2.6.3 Evolutionary Strategies
Evolutionary Strategies (ES) were first proposed by Ingo Rechenberg. Individuals are represented by real-valued vectors, and ES are good optimizers of real-valued functions. Like EP, they are phenotypic algorithms. Mutation plays the main role, although crossover is also used. ES have a special self-adapting mutation step size. ES use the following basic notation, where p is the number of parents and c the number of children:
1. (p,c): The p parents 'produce' c children using mutation. Each of the c children is then assigned a fitness value, depending on its quality in the problem-specific environment. The best (fittest) p children become the next generation's parents; that is, the c children are sorted by fitness value and the first p individuals are selected (c must be greater than or equal to p).
2. (p+c): The p parents 'produce' c children using mutation, and each child is assigned a fitness value as above. The best p individuals of parents and children combined become the next generation's parents; that is, the c children together with the p parents are sorted by fitness value and the first p individuals are selected.
3. (p/r,c): The p parents 'produce' c children using mutation and recombination, where r is the number of parents that take part in creating each child. As in (p,c), only the best p children become the next generation's parents (c must be greater than or equal to p).
4. (p/r+c): The p parents 'produce' c children using mutation and recombination, with r parents per child. As in (p+c), the best p individuals of parents and children combined become the next generation's parents.
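The difference between the comma and plus schemes lies entirely in which pool the survivors are drawn from. The minimal sketch below shows only that survivor-selection step, with p written as `mu` and fitness maximized; the function names and the toy objective are illustrative assumptions.

```python
def comma_selection(parents, children, mu, fitness):
    """(p,c): the next parents are the best mu children; parents are discarded.
    `parents` is accepted only to keep the same signature as plus_selection."""
    if len(children) < mu:
        raise ValueError("(p,c) selection requires c >= p")
    return sorted(children, key=fitness, reverse=True)[:mu]


def plus_selection(parents, children, mu, fitness):
    """(p+c): the best mu individuals among parents and children together."""
    return sorted(parents + children, key=fitness, reverse=True)[:mu]


def f(x):
    return -x * x  # maximize -x^2, i.e. move towards 0


parents, children = [0.1, 2.0], [0.5, -0.3, 3.0]
print(comma_selection(parents, children, 2, f))  # [-0.3, 0.5]
print(plus_selection(parents, children, 2, f))   # [0.1, -0.3]
```

Under (p,c) the good parent 0.1 is lost because only children compete, while (p+c) keeps it. The plus scheme is therefore elitist, whereas the comma scheme can forget good solutions but is better at leaving a local optimum.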
2.6.4 Genetic Programming
Genetic Programming (GP) was put forward by John Koza. GP evolves computer programs: it is a specialization of genetic algorithms (GA) in which each individual is a computer program. It is a machine learning technique used to optimize a population of computer programs according to a fitness landscape determined by each program's ability to perform a given computational task. Programs are usually represented as trees, which can easily be evaluated recursively: every internal node holds an operator function and every terminal (leaf) node holds an operand, which makes mathematical expressions easy to evolve and evaluate. Genetic programming starts with a primordial ooze of thousands of randomly created computer programs, and this population is progressively evolved over a series of generations. The evolutionary search uses the Darwinian principle of natural selection (survival of the fittest) and analogs of various naturally occurring operations, including crossover (sexual recombination), mutation, gene duplication and gene deletion. Genetic programming sometimes also employs developmental processes by which an embryo grows into a fully developed organism. It uses both mutation and crossover. Although trees are the most common data structure for individuals, non-tree representations have also been suggested and successfully implemented. While the other fields of EA have entered mainstream use, GP is still in its infancy: the representation of programs, the huge search space and the complex operations needed to generate better individuals have so far kept it out of the mainstream.
Figure 2.6.4: Individual structure of GP
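The recursive evaluation of a GP tree can be sketched in a few lines. The tuple layout (operator symbol, left subtree, right subtree), the operator set and the name `evaluate` are hypothetical choices for illustration; division is left out because GP systems usually need a protected version of it.

```python
import operator

# Internal nodes are tuples (operator symbol, left subtree, right subtree);
# terminals are numbers (constants) or strings (variable names).
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def evaluate(node, env):
    """Recursively evaluate a GP expression tree under the bindings in env."""
    if isinstance(node, tuple):      # internal node: apply its operator
        op, left, right = node
        return OPS[op](evaluate(left, env), evaluate(right, env))
    if isinstance(node, str):        # terminal: variable
        return env[node]
    return node                      # terminal: constant


# (x * x) + (3 - y)
tree = ("+", ("*", "x", "x"), ("-", 3, "y"))
print(evaluate(tree, {"x": 2, "y": 1}))  # 6
```

Crossover and mutation in GP then operate on this structure by swapping or regenerating whole subtrees, and the fitness of a program is obtained by evaluating its tree on test cases.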
2.6.5 Memetic algorithm
Although memetic algorithms do not strictly fall into the EA category, they incorporate other search techniques into EAs. The combination of evolutionary algorithms with local search operators that work within the EA loop has been termed “Memetic Algorithms” (MA). Quite often, MAs are also referred to in the literature as Baldwinian EAs, Lamarckian EAs, cultural algorithms or genetic local search. After individuals are generated, local search is performed on them. The frequency and intensity of individual learning directly define the degree of evolution (exploration) against individual learning (exploitation) in the MA search, for a given, fixed and limited computational budget. Clearly, more intense individual learning provides a greater chance of convergence to the local optima, but it limits the amount of evolution that can be carried out without incurring excessive computational cost. Therefore, care should be taken when setting these two parameters to balance the available computational budget and achieve maximum search performance. When only a portion of the population undergoes learning, the question of which subset of individuals to improve must also be considered to maximize the utility of the MA search.
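The two parameters discussed above can be seen in a minimal sketch of the learning step of an MA: `frequency` is the fraction of the population that undergoes local search and `intensity` is the number of local steps each learner takes. The hill-climbing local search, the random choice of learners, the sphere objective (minimized) and all function names are illustrative assumptions.

```python
import random


def sphere(v):
    return sum(xi * xi for xi in v)


def hill_climb(x, f, steps, step_size, rng):
    """Local search (minimization): accept a Gaussian perturbation only if it improves f."""
    best, best_f = list(x), f(x)
    for _ in range(steps):
        cand = [xi + rng.gauss(0.0, step_size) for xi in best]
        cand_f = f(cand)
        if cand_f < best_f:
            best, best_f = cand, cand_f
    return best


def memetic_refine(population, f, frequency, intensity, step_size=0.1, rng=random):
    """Apply `intensity` hill-climbing steps to a random fraction `frequency`
    of the population. Improved genes are written back (Lamarckian learning)."""
    n_learn = round(frequency * len(population))
    chosen = set(rng.sample(range(len(population)), n_learn))
    return [hill_climb(ind, f, intensity, step_size, rng) if i in chosen else ind
            for i, ind in enumerate(population)]


rng = random.Random(3)
pop = [[rng.uniform(-5, 5) for _ in range(3)] for _ in range(10)]
refined = memetic_refine(pop, sphere, frequency=0.3, intensity=50, rng=rng)
```

A Baldwinian variant would keep the original genes and only assign each learner the fitness of its improved version. Raising either parameter spends more of the budget on exploitation and leaves less for evolution.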
2.7 Existing Works
2.7.1 Dynamic Parameter Control
A variety of previous works have proposed methods of dynamically adjusting the parameters of GAs or other evolutionary algorithms. These methods include deterministic parameter control, adaptive parameter control and self-adaptive parameter control. The simplest technique is deterministic parameter control, which adjusts parameters according to a predetermined policy. Since it controls the parameters deterministically, it cannot adapt to changes that occur during the execution of an algorithm.
Adaptive parameter control exploits feedback from the evolution of a population to control the parameters. A notable example is the 1/5 success rule for adaptive Gaussian mutation, widely used in evolution strategy algorithms. According to this rule, the mutation step size is increased if more than 20% (one fifth) of the mutations are successful and reduced otherwise. However, this method cannot be applied to algorithms that use representations other than real numbers. Finally, self-adaptive parameter control encodes the parameters into chromosomes and lets them evolve together with the other genes. Although elegant, its applicability and effectiveness over a broad range of problems have not yet been shown.
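The 1/5 success rule can be sketched on a (1+1)-ES, where a mutation counts as successful if the child is better than its parent. The observation window of 10 generations, the adjustment factor c = 0.85 and the sphere test function (minimized) are assumed values chosen for illustration; following the description above, the step size shrinks whenever the success ratio does not exceed one fifth.

```python
import random


def sphere(v):
    return sum(xi * xi for xi in v)


def one_fifth_es(f, x, sigma, generations, window=10, c=0.85, rng=random):
    """(1+1)-ES with the 1/5 success rule (minimization).

    Every `window` generations the success ratio is checked: sigma grows
    (divided by c < 1) if more than 1/5 of the mutations succeeded and
    shrinks (multiplied by c) otherwise.
    """
    fx = f(x)
    successes = 0
    for g in range(1, generations + 1):
        y = [xi + rng.gauss(0.0, sigma) for xi in x]   # Gaussian mutation
        fy = f(y)
        if fy < fx:                                    # successful mutation
            x, fx = y, fy
            successes += 1
        if g % window == 0:
            if successes / window > 0.2:
                sigma /= c   # many successes: take larger steps
            else:
                sigma *= c   # few successes: take smaller steps
            successes = 0
    return x, fx, sigma


x, fx, sigma = one_fifth_es(sphere, [3.0] * 5, 1.0, 2000, rng=random.Random(0))
```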
2.7.2 Maintaining Diversity and Multi-population Genetic Algorithms
Multi-population GAs (MPGAs) maintain diversity by evolving multiple subpopulations that are spatially separated [6]. The island-model GA (IMGA), a typical example of an MPGA, evolves two or more subpopulations and uses periodic migration to exchange information between them. The number and size of the populations in IMGA are predetermined and kept unchanged during the algorithm's execution. However, other MPGAs, such as the multinational GA, the forking GA, the bi-objective multi-population algorithm and the variable island GA, can adjust the number and size of populations dynamically by splitting a population into two smaller ones or combining two similar ones. The performance of IMGA is sensitive to the migration policy, the migration rate and size, and the particular topology used, because these determine how fast good solutions spread among the subpopulations. A variety of previous works have studied the effect of these migration parameters both theoretically and experimentally.
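A single migration event of an island model can be sketched as follows; between such events, each island would evolve on its own for a fixed number of generations. The ring topology, the best-replace-worst policy and the function name `migrate_ring` are assumptions, and they are exactly the kind of choices the paragraph above says IMGA performance is sensitive to.

```python
def migrate_ring(islands, n_migrants, fitness):
    """One migration event on a ring of islands (maximization).

    Island i sends copies of its best `n_migrants` individuals to island
    (i + 1) % k, where they replace that island's worst individuals, so
    every subpopulation keeps its size.
    """
    k = len(islands)
    emigrants = [sorted(isl, key=fitness, reverse=True)[:n_migrants] for isl in islands]
    new_islands = []
    for i, isl in enumerate(islands):
        incoming = emigrants[(i - 1) % k]  # the ring neighbour sends to island i
        keep = sorted(isl, key=fitness, reverse=True)[: len(isl) - n_migrants]
        new_islands.append(keep + list(incoming))
    return new_islands


islands = [[1, 2, 3], [10, 20, 30]]
print(migrate_ring(islands, 1, fitness=lambda v: v))  # [[3, 2, 30], [30, 20, 3]]
```

A larger `n_migrants` or more frequent migration spreads good solutions faster but also makes the islands more alike, which erodes the diversity the separation was meant to preserve.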
2.7.3 Memory Based Genetic Algorithm
Diploid GA, GA with unexpressed genes (GAUG), dual GA (dGA) and primal-dual GA (PDGA) have adopted complementary and dominance mechanisms to maintain or provide population diversity. Most organisms in nature have a great number of genes in their chromosomes, and only some of the dominant genes are expressed in a particular environment. The repressed genes are considered a means of storing additional information and a latent source of population diversity. Diploid GAs use diploid chromosomes, which differ from natural ones in that their two strands are not complementary. Only some genes in a diploid chromosome are expressed and used for fitness evaluation, according to predetermined dominance rules. GAUG differs from diploid GA in that it uses haploid chromosomes, but it also incorporates some unexpressed genes into its chromosomes. The unexpressed genes in GAUG are not used for fitness evaluation but for preserving diversity.
dGAs and PDGAs also use haploid chromosomes, but the chromosomes are sometimes interpreted complementarily to provide additional diversity. In dGA, each chromosome carries an additional bit that indicates whether it should be interpreted as it is or as complemented. In PDGA, some bad-looking chromosomes are interpreted both as complemented and as original, and the original is replaced by the complemented one if the latter is evaluated better. Since the additional diversity provided by memory-based algorithms makes it easier to adapt to extreme environmental changes, these methods are frequently used for dynamic optimization problems.
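The complement-based interpretations of dGA and PDGA can be sketched on bit-lists. Placing the dGA interpretation bit in the first position, the function names and the OneMax fitness (`sum`) in the usage lines are assumptions for illustration; the published algorithms add further details not shown here.

```python
def complement(bits):
    return [1 - b for b in bits]


def interpret_dga(chromosome):
    """dGA-style reading: the extra bit (assumed here to be the first one)
    says whether the rest is read as it is (0) or complemented (1)."""
    head, body = chromosome[0], chromosome[1:]
    return body if head == 0 else complement(body)


def pdga_step(bits, fitness):
    """PDGA-style step for a poorly performing chromosome: keep the
    complement instead of the original if it evaluates better."""
    comp = complement(bits)
    return comp if fitness(comp) > fitness(bits) else bits


print(interpret_dga([1, 0, 0, 1, 0]))  # [1, 1, 0, 1]
print(pdga_step([0, 0, 0, 1], sum))    # [1, 1, 1, 0]
```

Because a single flipped interpretation bit moves a dGA individual to the opposite corner of the search space, a population can jump to very different regions without losing its stored genes, which is what makes these methods useful after abrupt environmental changes.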
2.7.4 Mutation Based Work
EP uses mutation exclusively, and ES relies mainly on it, both for maintaining diversity and for exploitation. Mutations can be divided into several categories. Classified by their uniformity across generations, mutations are:
1. Uniform Mutation: When the mutation step size or mutation rate stays the same regardless of the generation, it is called uniform mutation. It is not used much, because its behavior does not change as the search progresses.
2. Non-uniform Mutation: If the mutation step size or mutation rate varies with the generation, it is called non-uniform mutation. Usually the step size or mutation rate is high in the initial generations and is decreased gradually as the generations proceed. It is used frequently because it gives control over the diversity rate: when diversity is needed, it is provided by large step sizes, and when convergence is needed, it is facilitated by small step sizes.
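A well-known instance of non-uniform mutation is Michalewicz's operator, sketched below as one example rather than as the variant used in this thesis. Each gene v in [lo, hi] moves towards one of its bounds by delta(t, y) = y(1 - r^((1 - t/T)^b)), where r is uniform in [0, 1), t is the current generation, T the final one, and b (assumed 5 here) controls how fast steps shrink. At t = T the step is exactly zero.

```python
import random


def delta(t, y, T, b, rng):
    """Michalewicz's step: a random value in [0, y] whose expected size
    shrinks as generation t approaches the final generation T."""
    r = rng.random()
    return y * (1.0 - r ** ((1.0 - t / T) ** b))


def non_uniform_mutation(x, lower, upper, t, T, b=5.0, rng=random):
    """Move every gene towards its upper or lower bound (chosen at random)
    by a generation-dependent amount, keeping it inside [lower, upper]."""
    child = []
    for v, lo, hi in zip(x, lower, upper):
        if rng.random() < 0.5:
            child.append(v + delta(t, hi - v, T, b, rng))
        else:
            child.append(v - delta(t, v - lo, T, b, rng))
    return child


rng = random.Random(1)
print(non_uniform_mutation([0.5, -1.0], [-5, -5], [5, 5], t=0, T=100, rng=rng))   # early: steps may span the range
print(non_uniform_mutation([0.5, -1.0], [-5, -5], [5, 5], t=95, T=100, rng=rng))  # late: steps are tiny
```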
In genetic algorithms, random bit-flipping is used for mutation, and it has some issues. For example, flipping a bit in a higher position of the bit-string has a larger effect than flipping a bit in a lower position. Also, for some bit-strings, moving to the immediately next or previous value requires changing all bits (for example, 0111 to 1000), so exploitation becomes difficult. This is called the Hamming cliff problem. Using Gray code can mitigate the effect of this problem.
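The standard reflected binary Gray code removes Hamming cliffs: consecutive integers always differ in exactly one bit. The sketch below shows the usual conversions in both directions and compares Hamming distances for the 7 to 8 step mentioned above; the function names are our own.

```python
def to_gray(n):
    """Binary to reflected Gray code."""
    return n ^ (n >> 1)


def from_gray(g):
    """Reflected Gray code back to binary (XOR of all right shifts)."""
    n = 0
    while g:
        n ^= g
        g >>= 1
    return n


def hamming(a, b):
    """Number of differing bits between two non-negative integers."""
    return bin(a ^ b).count("1")


print(hamming(7, 8))                    # 4: 0111 vs 1000, a Hamming cliff
print(hamming(to_gray(7), to_gray(8)))  # 1: 0100 vs 1100 in Gray code
```

With Gray coding, a single bit flip can always reach the neighbouring integer value, so small exploitative moves stay available; large jumps can still occur when high-order bits flip.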
For mutation, a random step size is needed to introduce a random walk into the search space. The Gaussian distribution is the most widely used distribution for generating these random steps. Its probability density function is a bell-shaped curve defined by two parameters, a location parameter (the mean, $\mu$) and a scale parameter (the standard deviation, $\sigma$), and the distribution is denoted by $N(\mu, \sigma^2)$. For random number generation (RNG), $\mu = 0$ is always used and usually $\sigma = 3$, i.e. $N(0, 3^2)$. Mutation using the Gaussian distribution is called Gaussian mutation, and algorithms built on such mutations are said to use distribution-based mutation.
Figure 2.7.4: Probability Density Function (PDF) of the Gaussian distribution
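Gaussian mutation itself is a one-line operation: every gene receives an independent step drawn from $N(0, \sigma^2)$. The sketch below uses σ = 3 as in the text; the function name is our own, and practical implementations often use a smaller or self-adapted σ.

```python
import random


def gaussian_mutation(x, sigma, rng=random):
    """Gaussian mutation: add an independent N(0, sigma^2) step to each gene."""
    return [xi + rng.gauss(0.0, sigma) for xi in x]


rng = random.Random(0)
child = gaussian_mutation([1.0, 2.0, 3.0], sigma=3.0, rng=rng)
print(child)
```

Because about 99.7% of Gaussian steps fall within 3σ of the parent, this operator mostly makes moderate moves and only rarely long jumps.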
Xin Yao uses two more distributions for RNG:
1. Cauchy distribution
2. Lévy distribution
Like the Gaussian, the Cauchy and Lévy distributions used for mutation have symmetric, bell-shaped PDFs, and both are described by a location and a scale parameter (the Lévy distribution additionally has a stability index $\alpha$). Both have fatter tails than the Gaussian, which means they produce long jumps more often; this yields more diverse individuals and makes the search less prone to being trapped in local optima. Mutations that use the Cauchy and Lévy distributions for RNG are called Cauchy mutation and Lévy mutation, respectively. Xin Yao uses an adaptive mutation parameter: every individual is represented by a pair of real-valued vectors $(x_i, \eta_i)$, where $x_i$ holds the object variables and $\eta_i$ holds the adaptive mutation parameters, i.e. the scale parameter (the standard deviation, in the Gaussian case) of the distribution for each variable.
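The self-adaptive scheme can be sketched in the style of Yao's fast EP: each $x_i(j)$ takes a Cauchy step scaled by $\eta_i(j)$, and each $\eta_i(j)$ is then multiplied by $\exp(\tau' N(0,1) + \tau N_j(0,1))$, with one $N(0,1)$ draw shared by the whole individual and a fresh $N_j(0,1)$ per component. The values $\tau = 1/\sqrt{2\sqrt{n}}$ and $\tau' = 1/\sqrt{2n}$ are the commonly cited settings; the helper names and the inverse-CDF Cauchy sampler are our own choices, and this is a sketch rather than the thesis's implementation.

```python
import math
import random


def standard_cauchy(rng):
    """Standard Cauchy sample (location 0, scale 1) by inverting the CDF."""
    return math.tan(math.pi * (rng.random() - 0.5))


def fep_mutation(x, eta, rng):
    """Self-adaptive Cauchy mutation in the style of Yao's fast EP.

    Each object variable x[j] takes a Cauchy step scaled by eta[j]; then every
    eta[j] is updated log-normally, using one N(0,1) draw shared by the whole
    individual and one fresh N(0,1) draw per component.
    """
    n = len(x)
    tau = 1.0 / math.sqrt(2.0 * math.sqrt(n))
    tau_prime = 1.0 / math.sqrt(2.0 * n)
    shared = rng.gauss(0.0, 1.0)
    new_x = [xj + ej * standard_cauchy(rng) for xj, ej in zip(x, eta)]
    new_eta = [ej * math.exp(tau_prime * shared + tau * rng.gauss(0.0, 1.0))
               for ej in eta]
    return new_x, new_eta


rng = random.Random(42)
x, eta = [0.0] * 10, [3.0] * 10  # eta starts at 3, as in the text
child_x, child_eta = fep_mutation(x, eta, rng)

# Fat tails: jumps larger than 3 scale units are far more common for Cauchy.
cauchy_big = sum(abs(standard_cauchy(rng)) > 3 for _ in range(10_000))
gauss_big = sum(abs(rng.gauss(0.0, 1.0)) > 3 for _ in range(10_000))
print(cauchy_big, gauss_big)
```

Theoretically, about 20% of standard Cauchy samples exceed 3 in magnitude, against about 0.3% for the standard Gaussian. Replacing `standard_cauchy(rng)` with `rng.gauss(0.0, 1.0)` gives classical Gaussian EP, and a Lévy mutation would instead use a symmetric α-stable sampler, which is not shown here.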
2.7.5 Survivor Selection Based Work
Survivor selection is usually deterministic. In this phase of the algorithm, selection pressure is applied to individuals. Several survivor selection schemes exist:
1. Naïve Survivor Selection: Basically follows the survival-of-the-fittest principle. Individuals are selected for the next generation based on their fitness values, and lower-fitness individuals are weeded out. This is sometimes risky, because lower-fitness individuals can carry latent genes that could produce better individuals in later generations.
2. Elitist Selection: The population reserves places for the best individuals so that they are not lost across generations. A certain portion of the best individuals is transferred directly to the next generation without any modification. This ensures that even if the algorithm cannot produce any solution better than the current ones, the best solution is retained and returned at the end of the algorithm.
3. Truncation Selection: Truncation selection simply retains the fittest x% of the population. These fittest individuals are duplicated into the next generation so that the population size is maintained. Less fit candidates are culled without even being given the opportunity to evolve into something better. Truncation selection very often results in premature convergence; its only advantage is rapid convergence.
Figure 2.7.5: Truncation Selection
4. Fitness Proportionate Reproduction: The same as the roulette wheel selection scheme. Individuals are transferred directly to the next generation in proportion to their fitness values. Individuals with lower fitness still have some chance to survive, so latent genes can survive through generations even if they have not yet produced good individuals.
5. Niching Methods: Niching methods strive to maintain niches [9] [10]; that is, they ensure that individuals of one niche do not have to compete with individuals of other niches. The advantage is that pre-existing diversity is maintained, but convergence becomes harder because selection pressure is lower. Niching methods are divided into two categories:
i) Fitness Sharing: In nature, individuals of the same species compete with each other for fixed resources [13]. Likewise, in fitness sharing, individuals in the same region share the fixed fitness assigned to that region; fitness is a shared resource of the population. The population is first divided into niches, each region being defined by the sharing radius $\sigma_{share}$, which sets the niche size. This scheme is very sensitive to the fitness assigned per region and to the sharing radius. The population does not converge as a whole; instead, convergence takes place within the niches. Sharing can be done at the genotypic level (using Hamming distance) or at the phenotypic level (using Euclidean distance). If the sharing radius is too small, it has practically no effect on the process; if it is too large, several individual peaks will 'melt' into one.
ii) Crowding: Similar individuals in a natural population, often of the same species,
compete against each other for limited resources, while dissimilar individuals tend to
occupy different niches and typically do not compete. Crowding uses individuals
newly entering the population to replace similar individuals: a random sample of
CF (crowding factor) individuals is taken from the population, and the new individual
replaces the most similar member of that sample. A larger crowding factor indicates
less tolerance for similar solutions; smaller values indicate that similar solutions are
more welcome. New members of a particular species replace older members of that
species, not members of other species. Crowding does not increase the diversity of the
population; rather, it strives to maintain the pre-existing diversity. It is not directly
influenced by fitness value. Crowding is divided into:
(1) Deterministic Crowding: A new individual always replaces the most similar
individual if it has a better fitness value.
(2) Probabilistic Crowding: Primarily a distance-based niching method. The main
difference is the use of a probabilistic rather than deterministic acceptance
function. Stronger individuals no longer always win over weaker individuals;
they win with probability proportional to their fitness, and thus we get
restorative pressure. The two core ideas of probabilistic crowding are to hold
tournaments between similar individuals and to let the tournaments be probabilistic.
6. Deterministic Sampling: The average fitness of the population is calculated. The
fitness of each individual is divided by the average fitness, and only the integer part of
the result is stored. If this value is greater than or equal to one, the individual is copied
to the next generation. The remaining free places in the new population are filled with
the individuals having the greatest fractional parts.
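The fitness sharing, probabilistic crowding and deterministic sampling schemes described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not a reference implementation: the triangular sharing function sh(d) = 1 − (d/σshare)^α is the classic choice assumed here, the acceptance probability fchild / (fchild + fparent) is the usual probabilistic crowding rule, and deterministic sampling is assumed to copy each individual as many times as the integer part of its fitness ratio. All function names are hypothetical, and fitness is assumed non-negative (positive for deterministic sampling).

```python
import math
import random


def shared_fitness(fitness, positions, sigma_share, alpha=1.0):
    """Fitness sharing with the triangular sharing function
    sh(d) = 1 - (d / sigma_share) ** alpha for d < sigma_share, else 0."""
    shared = []
    for i, xi in enumerate(positions):
        niche_count = 0.0
        for xj in positions:
            d = math.dist(xi, xj)  # phenotypic (Euclidean) distance
            if d < sigma_share:
                niche_count += 1.0 - (d / sigma_share) ** alpha
        # niche_count >= 1 because each individual is at distance 0 from itself
        shared.append(fitness[i] / niche_count)
    return shared


def probabilistic_crowding_accept(child_fit, parent_fit, rng=random):
    """True if the child replaces its most similar parent; the child wins
    with probability f_child / (f_child + f_parent)."""
    total = child_fit + parent_fit
    if total == 0:
        return rng.random() < 0.5
    return rng.random() < child_fit / total


def deterministic_sampling(fitness, size):
    """Return the indices that form the next generation."""
    avg = sum(fitness) / len(fitness)
    expected = [f / avg for f in fitness]
    chosen = []
    for i, e in enumerate(expected):
        chosen.extend([i] * int(e))  # integer part: guaranteed copies
    # Fill the remaining places with the largest fractional parts first.
    by_fraction = sorted(range(len(fitness)),
                         key=lambda i: expected[i] - int(expected[i]),
                         reverse=True)
    for i in by_fraction:
        if len(chosen) >= size:
            break
        chosen.append(i)
    return chosen[:size]
```

For example, with fitness values [1, 2, 3, 2] the ratios are [0.5, 1, 1.5, 1], so individuals 1, 2 and 3 are copied once and the last place goes to individual 0, whose fractional part (0.5) ties for the largest.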
Chapter 3
Algorithm Proposal
3.1 Dual Population Genetic Algorithm
Dual Population Genetic Algorithm (DPGA) is a genetic algorithm that uses two populations
with two different evolutionary objectives, instead of one, to avoid premature convergence
[11] [12] [13]. The main population plays the role of an ordinary genetic algorithm
population: it evolves to find a good solution of high fitness value. The additional population,
called the reserve population, is employed as a reservoir of chromosomes that are rather
different from the chromosomes of the main population. Two different fitness functions are
used. The main population uses the actual fitness function (as in a normal GA), and the reserve
population uses a fitness function that gives better fitness to chromosomes that are more
different from the chromosomes of the main population. Multi-population genetic algorithms
use migration of chromosomes from one population to another to exchange information.
DPGA does not use migration; instead it uses another novel approach called crossbreeding.
Crossbreeding is performed by taking one parent from the main population and another parent
from the reserve population and applying crossover to them. The newly born offspring are
called crossbred offspring. Crossbred offspring are then evaluated for survival in both the main
population and the reserve population. DPGA also employs inbreeding, which takes two
parents from the same population and produces offspring by crossover. These inbred offspring
compete for survival in their parents’ population.
Figure (3.1a): Offspring Generation of DPGA
Mutation plays a minimal role in DPGA; diversity is mainly provided by the reserve
population through crossbreeding. The amount of diversity needed at any step of DPGA is
specified by a self-adaptive parameter δ (0 < δ < 1). δ defines the distance between parents
from the main population and parents from the reserve population. As δ determines which
individuals will participate in crossbreeding, δ is roughly analogous to a step size. The fitness
function of the reserve population is
$$f_r(x) = 1 - \left|\delta - d(M, x)\right| \qquad (1)$$
Figure (3.1b): Reserve Population Fitness Function
d(M, x) is the average distance of individual x from the main population. So we have turned
our focus to crossbreeding and the fitness function of the reserve population. δ defines how
distant the reserve population will be from the main population. δ is set to lower values for
exploitation and to higher values for exploration. If δ is kept similar for several generations,
the reserve population will start to converge at distance δ from the main population.
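A minimal Python sketch of the reserve fitness in equation (1), assuming it has the form f = 1 − |δ − d(M, x)| (consistent with the discussion below, where individuals at distance δ gain the most fitness) and assuming d(M, x) is the mean Euclidean distance to the main population, normalized by a hypothetical constant d_max so that it lies in [0, 1]:

```python
import math


def avg_distance(main_pop, x, d_max):
    """d(M, x): mean Euclidean distance from x to every main-population
    member, divided by d_max so that the result lies in [0, 1]."""
    return sum(math.dist(m, x) for m in main_pop) / (len(main_pop) * d_max)


def reserve_fitness(main_pop, x, delta, d_max):
    """Assumed reserve fitness 1 - |delta - d(M, x)|: it is maximal (1.0)
    when x lies at average distance exactly delta from the main population."""
    return 1.0 - abs(delta - avg_distance(main_pop, x, d_max))
```

Each call costs O(n) distance computations for a main population of size n, so re-evaluating a reserve population of size m after δ changes costs O(nm).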
DPGA has both advantages and disadvantages:
3.1.1 Advantages
1. The reserve population preserves genes that have become extinct in the main
population. Since a surviving crossbred offspring holds genes inherited from the best
individual of the main population, those genes can be recovered from the reserve
population even if they are lost from the main population in later generations.
2. DPGA utilizes information from successful breeding. A value of δ that produces
surviving offspring is used later when selecting parents. If crossover is unsuccessful,
δ is set to its maximum, which influences the selection of future parents.
3.1.2 Disadvantages
1. The reserve population introduces space and computational overhead. For the main
population, individuals need only one fitness evaluation, when they are created or
modified. But every reserve population individual must be re-evaluated whenever δ
changes value, in addition to being evaluated when created or modified. If the main
population has n chromosomes and the reserve population has m, then re-evaluating the
reserve fitness function for the whole reserve population costs O(nm) distance computations.
2. When selecting parents, DPGA does not measure the diversity of the reserve
population. The reserve population should be diverse enough for exploration of the
search space, but the selected parent from the reserve population may not be very
dissimilar to the parent from the main population. In crossbreeding, the distance
between parents may therefore not be δ; in the worst case, it may be far less or far
greater than δ.
3. If a crossbred child survives in both the main and the reserve population, diversity
decreases, because the same individual is copied into both populations. This gives
another insight: the crossbred child has higher fitness (according to the reserve fitness
function) than the inbred child, and the parents from the reserve population may not be
at the desired distance from the parents of the main population, i.e., the reserve
population may not contain individuals that can breed offspring at the desired step size.
4. If crossbred offspring cannot survive in the main population, DPGA degenerates into
a single-population algorithm. If this happens for several generations, measures must
be taken to increase the diversity of the reserve population, which incurs overhead.
5. If crossbred offspring manage to survive in the reserve population, the reserve
population will contain replicated genes of an individual of the main population,
further decreasing the diversity of the reserve population.
6. When converging, DPGA keeps δ low for several generations to facilitate
convergence. From equation (1), individuals at distance δ gain more fitness than other
individuals, so individuals whose distance d(M, x) is much greater or much less than δ
are replaced. Since δ is set to its minimum during convergence, the individuals most
distant from the main population begin to disappear (DPGA uses the best n individuals
for survivor selection in both populations, with the same fitness function as for parent
selection), and individuals similar to those of the main population begin to take over
the reserve population after a few generations. Hence the reserve population also
begins to converge like the main population, but at distance δ from it. When DPGA
detects that the main population has converged to a local optimum, it sets δ to its
maximum to escape from it. DPGA then picks the individual most distant from the
main population, but the reserve population is already similar to the main population
and cannot provide any further diversity.
7. Diversity also depends on the success of crossover. Even if the parents for
crossbreeding are at the desired distance, they may not produce the fittest individuals.
Crossover always makes a big jump to an area somewhere “in between” the two parent
areas; offspring seldom go beyond their parents.
8. In DPGA, the total gene frequency remains constant from the very beginning. Since
crossover is the only operator used, new genes are never introduced. Crossbreeding
changes the gene frequency within each population, but the total frequency remains
unchanged. In the worst case, if the best gene is missing from the initial populations,
DPGA never reaches the optimum.
9. Inbreeding in the reserve population does not introduce new genes. And if the
distances of two parents are δ and −δ (selection is based on distance, not direction),
the inbred offspring will be more similar to the main population.
3.1.3 Recommendation
One of the biggest drawbacks of DPGA is that the reserve population converges along with the
main population. As we have seen, the current survivor selection of the reserve population in
DPGA leads to this convergence. Probabilistic crowding should therefore be used for survivor
selection in the reserve population (with the same fitness function as before for parent
selection). Probabilistic crowding prevents similar genes from taking over the whole
population while simultaneously preserving genes from extinction.
3.2 Modified DPGA Proposal
We have seen above that the parent selected from the reserve population may not be different
enough from the parent from the main population; this parent of the reserve population is the
best of a bad bunch. As a result, crossbred offspring are not very different from their parents.
And once δ is set to a low value, near zero, for several generations, the reserve population
also becomes almost identical to the main population. This is no problem if the main
population converges to a good enough solution or the termination criterion is met. But if we
detect premature convergence, we have to increase the diversity of the main population to
escape the local optimum. Since the reserve population is identical to the main population, it
cannot give diversity to the main population, so the algorithm remains trapped in the local
optimum.
To address this problem, we propose eliminating the reserve population; instead, we generate
an individual on the fly that plays the role of the reserve population parent. This generated
individual will be at exactly distance δ from the parent from the main population.
3.2.1 Structure Of Individual
Every individual in the main population consists of pairs (xi, δi), where xi is the real-valued
component in dimension i and δi determines how large a jump the on-the-fly generated
individual makes in that dimension. δi is called the jump parameter.
Another parameter, the temperature T, is also introduced. It plays a role similar to that in
simulated annealing: we bring in the concept of simulated annealing as a local search for
rigorously searching newly found, potentially good search regions. The value of T is larger at
the beginning of the algorithm, so that the search region is larger and more uniform in all
dimensions, i.e., the search region is shaped like an n-dimensional sphere. In the final stage of
evolution, T is scheduled to decrease to facilitate a more exploitative local search, and the
search exploits more in the dimensions where solutions are improving. The local search region
then becomes elliptical, with the major axis of the ellipse pointing towards the local (possibly
global) optimum of that region.
3.2.2 Initialization
xi is initialized in the regular fashion. For δi, we generate n random numbers r1, …, rn and normalize them by their sum, δi = ri / ∑j rj, so that ∑δi = 1.
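The individual structure and initialization can be sketched as below. This assumes, as the condition ∑δi = 1 stated among the advantages in Section 3.2.8 suggests, that the n random numbers are normalized by their sum, and that xi is drawn uniformly between its bounds; the class and function names are hypothetical.

```python
import random
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Individual:
    x: List[float]      # real-valued components x_i, one per dimension
    delta: List[float]  # jump parameters delta_i, normalized to sum to 1
    fitness: Optional[float] = None


def random_jump_parameters(n, rng=random):
    """Draw n random numbers and normalize them so that sum(delta_i) == 1."""
    r = [rng.random() + 1e-12 for _ in range(n)]  # tiny offset avoids an all-zero draw
    total = sum(r)
    return [ri / total for ri in r]


def init_individual(lower, upper, rng=random):
    """x_i uniform in [LB_i, UB_i], plus freshly drawn jump parameters."""
    x = [rng.uniform(lo, hi) for lo, hi in zip(lower, upper)]
    return Individual(x=x, delta=random_jump_parameters(len(x), rng))
```

The same normalization can be reused whenever a parent’s jump parameters have to be redrawn.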
3.2.3 Parent Selection
Any selection method can be used, but we prefer tournament selection or restricted
tournament selection (RTS).
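For reference, a plain tournament selection step might look like the following minimal sketch for maximization; the tournament size k and the function name are assumptions, and restricted tournament selection is not shown.

```python
import random


def tournament_select(fitness, k=2, rng=random):
    """Return the index of the fittest of k contestants drawn uniformly at
    random with replacement (maximization)."""
    contestants = [rng.randrange(len(fitness)) for _ in range(k)]
    return max(contestants, key=lambda i: fitness[i])
```

Larger k raises selection pressure, since weak individuals are less likely to win a bigger tournament.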
3.2.4 Generating Parent Individual On The Fly
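One parent is selected from the main population and the other parent is generated on the fly
based on δ. Let yi be the real-valued component of the generated individual in dimension i; it
is obtained by displacing the main parent’s component xi according to the jump parameter δi,
with the overall displacement scaled so that
$$\sqrt{\sum_{i} (y_i - x_i)^2} = \delta \cdot D_{max}$$
where
$$D_{max} = \sqrt{\sum_{i} (UB_i - LB_i)^2}$$
is the maximum possible Euclidean distance between two points in the search space, and LBi
and UBi are the lower and upper bounds of dimension i.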
Then we reset the main parent’s jump parameters. If this parent is selected again with
unchanged jump parameters, the same individual will be generated again; as a result, the same
offspring will be produced and the computation of a generation will be wasted. So we reset the
jump parameters as in initialization.
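The on-the-fly partner generation can be sketched as follows. The exact per-dimension formula is an assumption made for illustration: the displacement direction is taken proportional to the jump parameters δi with random signs, and is then scaled so that the partner lies at Euclidean distance δ·Dmax from the main parent, where Dmax = √(∑(UBi − LBi)²). Function names are hypothetical.

```python
import math
import random


def max_distance(lower, upper):
    """D_max: the largest Euclidean distance between two points of the box."""
    return math.sqrt(sum((hi - lo) ** 2 for lo, hi in zip(lower, upper)))


def generate_partner(x, jump, delta, lower, upper, rng=random):
    """Hypothetical partner generation: move from x along a random-sign
    direction weighted by the jump parameters, scaled so that
    ||y - x|| == delta * D_max. No clipping is applied."""
    direction = [rng.choice((-1.0, 1.0)) * j for j in jump]
    norm = math.sqrt(sum(d * d for d in direction))  # > 0 since jumps sum to 1
    step = delta * max_distance(lower, upper) / norm
    return [xi + step * di for xi, di in zip(x, direction)]
```

Because no clipping is applied, a large δ can place the partner outside the search space; clipping it back to the bounds would shorten the distance below δ·Dmax and push partners onto the boundary.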
3.2.5 Mutation
DPGA uses non-uniform mutation. Instead, we use Cauchy mutation when the algorithm is in the
exploration stage, since it produces more long jumps to facilitate exploration. When the
algorithm is in the exploitation stage, we use Gaussian mutation, which produces short jumps
to facilitate convergence of individuals.
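A minimal sketch of the stage-dependent mutation, assuming an externally supplied flag `exploring` and a hypothetical step size `scale`; the Cauchy variate is drawn by inverting its CDF, tan(π(u − 1/2)), and mutated genes are clipped to the search bounds.

```python
import math
import random


def cauchy(rng=random):
    """Standard Cauchy variate via the inverse CDF tan(pi * (u - 1/2))."""
    return math.tan(math.pi * (rng.random() - 0.5))


def mutate(x, scale, exploring, lower, upper, rng=random):
    """Cauchy steps while exploring (heavy tails, more long jumps);
    Gaussian steps while exploiting (mostly short jumps)."""
    out = []
    for xi, lo, hi in zip(x, lower, upper):
        z = cauchy(rng) if exploring else rng.gauss(0.0, 1.0)
        out.append(min(hi, max(lo, xi + scale * z)))  # clip to [LB_i, UB_i]
    return out
```

About 20% of standard Cauchy draws exceed 3 in magnitude, compared with roughly 0.3% of standard Gaussian draws, which is why the Cauchy operator suits the exploration stage.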
3.2.6 Survivor Selection
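The new algorithm evaluates both the on-the-fly generated individuals and their inbred offspring.
If the offspring survive, we divide them into 3 categories:
1. Exploited Individual: when the distance falls below a lower threshold
2. Normal Individual: when the distance lies between the lower and upper thresholds
3. Explored Individual: when the distance exceeds an upper threshold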
Here distance is the Euclidean distance between the offspring and the main parent. We divide
offspring into 3 categories so that we can explore and exploit at the same time. In the
exploitation stage, the algorithm can still explore other potential regions of the search space
while exploiting the current region. On the other hand, in the exploration stage, if a potential
region is found, we can exploit that region by conducting a local search, as in a memetic
algorithm, while still exploring other regions.
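The three-way classification can be sketched as below. The thesis’s exact thresholds are not reproduced here; `low` and `high` are hypothetical distance thresholds used only for illustration.

```python
import math


def classify_offspring(child, main_parent, low, high):
    """Label a surviving offspring by its Euclidean distance from the main
    parent. low < high are hypothetical thresholds."""
    d = math.dist(child, main_parent)
    if d < low:
        return "exploited"  # close to the parent: region already being searched
    if d > high:
        return "explored"   # far from the parent: possibly unsearched region
    return "normal"
```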
3.2.6.1 Exploited Individual
⁄ when is the explored dimension
⁄
⁄ when is the exploited dimension
Here,
The rationale is that an exploited individual comes from a region that is already explored, or
being explored, by another individual, so the surrounding region does not need to be explored twice.
3.2.6.2 Explored Individual
⁄
for every dimension
This individual is far away from its parent, so it can be assumed that it lies in a region the
algorithm has never searched before, and this potential region needs exploration.
Exploration is provided because . Even if the algorithm is in exploitation mode, it
can still explore a newly found, unsearched, potentially good region.
3.2.6.3 Normal Individual
Randomly select dimensions.
⁄
⁄ for randomly selected dimension
⁄ for other dimensions
The individual is at neither an exploited nor an explored distance. We select some dimensions
at random because we want to introduce some variation based on the fitness
difference.
3.2.7 Schedule of T
The initial value of T depends on the optimization problem. For a complex, multimodal, rough
search space, T should be larger to facilitate more exploration in the local search; for a simple,
unimodal search space, a smaller value of T is better. The value of T is a function of the
generation count and of offspring survival. We propose that T should decrease with the
generation count; if no offspring survive, T should remain the same, and if offspring survive,
T should be decreased. Surviving offspring mean that we are making progress towards
convergence, whereas no survivors mean that we still need to explore more regions.
3.2.8 Advantages
1. The extra space for the reserve population is no longer needed, and the evaluation of
reserve population individuals is also eliminated.
2. The on-the-fly generated individual is at exactly distance δ from the parent from the
main population, so diversity can be incorporated as much as we want.
3. The new proposal introduces a jump parameter δi for each dimension, with ∑δi = 1.
δi determines how much exploitation or exploration takes place in each dimension.
If we find a dimension in which better individuals can be found, we can continue to
explore or exploit in that dimension. On the other hand, if the population is trapped in
a deep local optimum, we can experiment with changing the values of δi to escape it.
4. Every individual has its own δi, so we have granular control over every
individual in each dimension.
5. DPGA does not facilitate exploitation of newly found good regions of the fitness
landscape, whereas the proposed algorithm can exploit a newly found region at full
throttle even when the algorithm is globally in exploration mode, giving full local
search capability like a memetic algorithm.
6. DPGA does not evaluate inbred reserve offspring and reserve parents for survival
in the main population. This algorithm, however, evaluates both the on-the-fly generated
individuals and the inbred offspring. Since these individuals are already generated as
by-products of crossbreeding, evaluating them has very little overhead, and if any of
them survives, it can introduce more diversity into the main population and open a new
region to search for the global maximum.
3.3 New Survivor Selection Strategy
Current survivor selection schemes fall into two categories:
1. Schemes that focus solely on survival of the fittest, i.e., exploitation; for
example, elitist selection, rank selection and fitness proportionate reproduction.
2. Schemes that focus solely on maintaining diversity; for example, niching
methods such as fitness sharing, deterministic crowding and probabilistic crowding.
The above two categories lie at two extreme ends. Schemes focused on exploitation do not take
diversity into account; on the other hand, schemes focused on diversity do not take
exploitation into account. Survivor selection should be based on both diversity and
exploitation, so we propose a new survivor selection scheme that takes both into account. The
fitness function for survivor selection is
$$F(x) = \lambda f(x) + (1 - \lambda)\, g(x)$$
Here, f is the real fitness function, g is a function of gene variation with respect to the
chromosomes of the current generation, and λ is an adaptive parameter that determines how
much weight is put on the functions f and g.
Usually the value of λ will be lower in early generations to preserve diversity and larger in
final generations to facilitate convergence. We propose that λ be a function of the generation
count, the survival of offspring, and the difference between the fitness of the best individual
and the desired fitness.
Measuring gene variation can be crucial. One naïve approach is to measure the Euclidean
distance to all individuals of the current population, which is O(n) per individual and O(n²)
for the whole population. We can improve on this with some simple modifications. First, we
take a point, or individual, as a reference for measuring distances. Let the individual
(LowerBound0, LowerBound1, …, LowerBoundn-1) be our reference individual. At the
beginning, we measure the distance from the reference individual to every individual of the
current population. The normalized distance of individual k is
$$d_k = \frac{\sqrt{\sum_{j=0}^{n-1} (x_{kj} - LB_j)^2}}{\sqrt{\sum_{j=0}^{n-1} (UB_j - LB_j)^2}}$$
So the mean normalized diversity of the current population of N individuals is
$$\mu = \frac{1}{N} \sum_{k=1}^{N} d_k$$
and the standard deviation using µ as reference is
$$\sigma = \sqrt{\frac{1}{N} \sum_{k=1}^{N} (d_k - \mu)^2}$$
Every individual includes an additional real value called relative diversity, a measure of the
diversity of an individual relative to the rest of the population. Relative diversity is found by
calculating the standard deviation of the population using the corresponding individual as
reference:
$$r_k = \sqrt{\frac{1}{N} \sum_{i=1}^{N} (d_i - d_k)^2}$$
So diversity fitness function is
is monotonically increasing function of . A generic fitness function will be of form
where α is scaling factor dependent on optimization problem
The scaling factor α is needed to make the diversity fitness function comparable to the real fitness
function.
Usually if , then offspring improves diversity .
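Now suppose an offspring with normalized distance d_c survives and replaces an individual with
normalized distance d_r. Since the relative diversity r_k of individual k is the root mean square
of (d_i − d_k) over the population, the new relative diversity of every other individual k is
$$r_k' = \sqrt{r_k^2 + \frac{(d_c - d_k)^2 - (d_r - d_k)^2}{N}}$$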
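The normalized distance and relative diversity computations, together with the update after a replacement, can be sketched in Python as follows. Here dk is the Euclidean distance of individual k from the reference corner (LowerBound0, …, LowerBoundn-1) divided by the maximum possible distance, and rk is the root mean square of (di − dk) over the population. The sketch relies on the identity (1/N)∑i(di − dk)² = σ² + (µ − dk)², which gives all rk in O(N) total; function names are hypothetical.

```python
import math


def normalized_distances(pop, lower, upper):
    """d_k: distance of individual k from the lower-bound corner, divided by
    the box diagonal so that it lies in [0, 1]."""
    d_max = math.dist(lower, upper)
    return [math.dist(x, lower) / d_max for x in pop]


def relative_diversities(d):
    """r_k = sqrt(mean_i (d_i - d_k)^2) for every k, in O(N) total, using
    mean_i (d_i - d_k)^2 = sigma^2 + (mu - d_k)^2."""
    n = len(d)
    mu = sum(d) / n
    var = sum((di - mu) ** 2 for di in d) / n
    return [math.sqrt(var + (mu - dk) ** 2) for dk in d]


def replace_and_update(d, r, idx, d_child):
    """An offspring with normalized distance d_child replaces individual idx.
    Every other r_k is updated in O(1); the newcomer's r is computed directly."""
    n = len(d)
    d_old = d[idx]
    new_d = list(d)
    new_d[idx] = d_child
    new_r = []
    for k, dk in enumerate(new_d):
        if k == idx:
            new_r.append(math.sqrt(sum((di - d_child) ** 2 for di in new_d) / n))
        else:
            sq = r[k] ** 2 + ((d_child - dk) ** 2 - (d_old - dk) ** 2) / n
            new_r.append(math.sqrt(max(sq, 0.0)))  # guard tiny negative rounding
    return new_d, new_r
```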
We adopt elitist selection: the top 10% of individuals according to the real fitness function and
the top 10% according to gene variation are reserved, so this scheme emphasizes both
exploration and exploitation. The remaining 80% of individuals have to survive through the
proposed fitness function.
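A sketch of the survivor selection step, assuming the survival fitness is the weighted combination λ·f + (1 − λ)·g of the real fitness f and the diversity fitness g, with λ in [0, 1]. The elite fraction parameter and function name are assumptions, and f and g are assumed to be already scaled to comparable ranges (the role of the scaling factor α above).

```python
def select_survivors(pop, f, g, lam, size, elite_frac=0.10):
    """Keep the top elite_frac (of size) by real fitness f and by diversity
    fitness g, then fill the remaining places by lam * f + (1 - lam) * g."""
    n = len(pop)
    k = int(elite_frac * size)
    order_f = sorted(range(n), key=lambda i: f[i], reverse=True)
    order_g = sorted(range(n), key=lambda i: g[i], reverse=True)
    chosen, seen = [], set()
    for i in order_f[:k] + order_g[:k]:  # both elite groups
        if i not in seen:
            seen.add(i)
            chosen.append(i)
    combined = sorted(range(n), key=lambda i: lam * f[i] + (1 - lam) * g[i],
                      reverse=True)
    for i in combined:
        if len(chosen) >= size:
            break
        if i not in seen:
            seen.add(i)
            chosen.append(i)
    return [pop[i] for i in chosen]
```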
Careful observation reveals that the normalized diversity of an individual lies in the range [0, 1],
so the standard deviation of the normalized diversities is certainly between 0 and 0.5. We
could use this standard deviation as the adaptive parameter. But if two groups of individuals
are at maximum distance while group members are in the same neighborhood, the standard
deviation will be nearly 0.5, even though the population is not diverse at all; it has just
converged into two groups situated far away from each other. We could instead take the
average relative diversity as the adaptive parameter, but the same problem persists.
First, we have to determine the size of a neighborhood (β) in terms of normalized diversity.
The optimal neighborhood size depends on the optimization problem. We introduce an
n-dimensional array of buckets. A bucket is a small region specified by the neighborhood
size, essentially an n-dimensional hypercube; individuals located in a bucket’s region fall
into that bucket. So there will be (1/β)^n buckets. Each element of the array holds the number
of individuals in that bucket, and every element is initialized to zero. The bucket array index
of an individual in dimension j is
$$b_j = \left\lfloor \frac{x_j - LB_j}{\beta\,(UB_j - LB_j)} \right\rfloor$$
For a huge search space, this bucket array may require an enormous amount of memory; we can
use a sparse matrix as the data structure to address this issue.
When an individual is found in a bucket’s region, the value of that bucket is incremented. Each
individual also stores the location of its bucket for easy removal.
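The sparse bucket array can be sketched with a dictionary that stores only non-empty buckets. This assumes each dimension is normalized to [0, 1] by its bounds, so that a bucket is a hypercube of edge β in normalized coordinates; the class and method names are hypothetical.

```python
import math
from collections import defaultdict


class BucketGrid:
    """Sparse n-dimensional bucket array: only non-empty buckets are stored."""

    def __init__(self, lower, upper, beta):
        self.lower, self.upper, self.beta = lower, upper, beta
        self.per_dim = math.ceil(1.0 / beta)   # buckets along each dimension
        self.counts = defaultdict(int)          # bucket index tuple -> count

    def index(self, x):
        """Per dimension: floor(normalized coordinate / beta)."""
        idx = []
        for xi, lo, hi in zip(x, self.lower, self.upper):
            b = math.floor((xi - lo) / ((hi - lo) * self.beta))
            idx.append(min(max(b, 0), self.per_dim - 1))  # x_i == UB_i -> last bucket
        return tuple(idx)

    def add(self, x):
        key = self.index(x)
        self.counts[key] += 1
        return key  # the individual keeps this key for O(1) removal

    def remove(self, key):
        self.counts[key] -= 1
        if self.counts[key] == 0:
            del self.counts[key]  # keep the structure sparse

    def occupied(self):
        return len(self.counts)
```

The number of occupied buckets relative to the population size is one possible indicator of how spread out the population is; using it to drive the adaptive parameter is an illustration here, not a formula taken from the text.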
We can see that . If we have to remove an individual we simply
decrement the value of corresponding bucket. We can use this as parameter .
When the value of this parameter is higher, the algorithm puts more weight on the real fitness
function, because the current population is diverse enough. On the other hand, if the value is
lower, the algorithm puts more weight on the diversity fitness function, as the current
population is losing diversity.
Even so, this function needs to be modified, because it does not take the generation number into
account. When the algorithm is converging, the value of this function is low, which slows the
rate of convergence. A simple approach can be
( (
))
It works because is a monotonically increasing function. As generation increases value of
(
) also increases. And maximum value of (
) can be
1, then value of will be less than 1.
Using buckets, we can also define the regions that have already been searched. We introduce a
boolean variable named isSearched (true if the region has already been searched, false if not)
for each bucket. The criteria for declaring a region searched are defined separately for each
optimization problem; for example, we can declare a bucket region searched when it has at
least three individuals, each surviving at least 50 generations. The point of marking regions as
searched is that, of two individuals with the same relative diversity, the one belonging to an
unsearched bucket gets more weight in diversity than the one in a searched bucket. For
example, we reduce the diversity fitness of an individual in a searched bucket by dividing it by
a penalty factor.
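The isSearched bookkeeping and the penalty can be sketched as follows. The criterion (at least three individuals, each surviving at least 50 generations) is the example given above; dividing the diversity fitness by a penalty factor is an assumed form of the reduction, and the names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class BucketState:
    ages: dict = field(default_factory=dict)  # individual id -> generations survived
    is_searched: bool = False                 # once set, it stays set


def update_searched(bucket, min_individuals=3, min_age=50):
    """Mark the bucket searched when at least min_individuals members
    have each survived min_age generations."""
    old_enough = sum(1 for age in bucket.ages.values() if age >= min_age)
    if old_enough >= min_individuals:
        bucket.is_searched = True
    return bucket.is_searched


def penalized_diversity(g, bucket, penalty=10.0):
    """Hypothetical penalty: diversity fitness is divided in searched buckets."""
    return g / penalty if bucket.is_searched else g
```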
A rather extreme scheme is to eliminate all individuals of a searched bucket except the best
individual in that bucket. These individuals are replaced by new individuals taken from buckets
that are not searched and whose individual count is 0. But this can be detrimental for complex
optimization problems.
The above adaptive survival fitness function is for function maximization problems, but many
real-life problems involve function minimization, i.e., cost minimization. For function
minimization, the above approach will not work, because both the real fitness function and the
diversity fitness function should have lower values for better individuals. So we need an
approach that assigns a lower diversity fitness value to more diverse individuals and a higher
diversity fitness value to less diverse individuals. A simple approach can be
The above equation assigns more diversity fitness to less diverse individuals and less diversity
fitness to more diverse individuals, so we can also apply this diversity management
technique to function minimization.
Another approach is to apply local search to the individuals, as in a memetic algorithm;
steepest-ascent hill climbing is adopted here. If an individual passes through the region of a
bucket during hill climbing, that bucket is marked as searched. The advantage of this scheme is
that we can easily identify searched regions even if they do not meet the criteria for being
marked as searched.
In cases of premature convergence, we can override this function and set the value of the
adaptive parameter manually.
Although niching methods (crowding and fitness sharing) maintain diversity, they lack control
over it. They solely try to keep the pre-existing diversity level; they neither increase diversity in
case of premature convergence nor decrease it to facilitate exploitation. The proposed
survivor selection scheme, on the other hand, gives full control over the diversity level needed
at any time during evolution.
3.4 New Mutation Strategy
Mutation introduces random variation into real-valued vectors using random numbers drawn
from a probability distribution. So far, only three distributions have been used successfully in
mutation:
1. Gaussian Mutation
2. Cauchy Mutation
3. Lévy Mutation
Careful observation reveals that all three distributions are members of the stable family of
distributions. The stable family is a family of distributions in which a linear combination of
two independent random variables with the same distribution has that same distribution, up to
location and scale parameters. In fact, the three distributions above are special cases of the
stable distribution. All stable distributions are infinitely divisible, and non-degenerate stable
distributions are absolutely continuous and unimodal. A random variable X is called stable
(has a stable distribution) if, for n independent copies Xi of X, there exist constants cn > 0 and
dn such that
$$X_1 + X_2 + \cdots + X_n \overset{d}{=} c_n X + d_n$$
where $\overset{d}{=}$ denotes equality in distribution.
Figure (3.4): Probability Density Function of Stable Family
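So we can try other distributions to generate random numbers for mutation. Two candidates are
listed below. Strictly speaking, neither is a member of the stable family (for example, the sum
of two independent Laplace random variables is not Laplace distributed), but both have
heavier tails than the Gaussian distribution:
1. Laplace Distribution
2. Slash Distribution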
3.4.1 Laplace Distribution
Like the Gaussian distribution, it has two parameters: a location parameter µ and a scale
parameter σ. The Fourier transform of the Laplace density has the form of a Cauchy density.
The probability density function of the Laplace distribution is also reminiscent of the Gaussian
density; however, whereas the Gaussian density is expressed in terms of the squared difference
from the mean µ, the Laplace density is expressed in terms of the absolute difference from the
mean. Consequently, the Laplace distribution has fatter tails than the Gaussian distribution.
Figure (3.4.1a): Probability Density Function of Laplace Distribution
Figure (3.4.1b): Comparison of Gaussian and Laplace Distribution
The figure above shows Gaussian and Laplace distributions with the same scale and location
parameters. The Laplace distribution has a fatter tail and a sharper peak than the Gaussian, and
its density falls off rather quickly from the peak compared with the Gaussian. It is therefore
expected to have a higher probability of escaping from a local optimum or moving away from
a plateau, especially when “the basin of attraction” of the local optimum or plateau is large
relative to the mean step size. On the other hand, the Gaussian has greater probability in the
mid-range. From this observation we conclude that the sharp peak of the Laplace distribution
facilitates exploitation, since it has a higher probability of producing short jumps, while it can
also produce more long jumps than the Gaussian. For mid-range jumps, however, the Gaussian
gives better results.
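The comparison above can be checked numerically. The sketch below draws Laplace variates as the difference of two independent Exp(1) variates (a standard construction) and compares interval probabilities of |X − µ| for Laplace(µ, b = 1) and Gaussian(µ, σ = 1), i.e. the same scale parameter, as in the figure; the function names are hypothetical.

```python
import math
import random


def laplace(mu=0.0, b=1.0, rng=random):
    """Laplace(mu, b) variate as mu + b * (E1 - E2) with E1, E2 ~ Exp(1)."""
    return mu + b * (rng.expovariate(1.0) - rng.expovariate(1.0))


def laplace_interval_prob(t1, t2, b=1.0):
    """P(t1 < |X - mu| <= t2) for Laplace(mu, b)."""
    return math.exp(-t1 / b) - math.exp(-t2 / b)


def gauss_interval_prob(t1, t2, sigma=1.0):
    """P(t1 < |X - mu| <= t2) for Gaussian(mu, sigma)."""
    s = sigma * math.sqrt(2.0)
    return math.erfc(t1 / s) - math.erfc(t2 / s)
```

With b = σ = 1, the Laplace distribution puts more mass below 0.5 (about 0.39 versus 0.38) and beyond 3 (about 0.050 versus 0.0027), while the Gaussian puts more mass between 1 and 2 (about 0.27 versus 0.23), matching the short, long and mid-range behaviour described above.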
3.4.2 Slash Distribution
The Slash distribution is a continuous, unbounded distribution developed as a deviation from the
Gaussian distribution that allows fatter tails (higher kurtosis) by altering the κ parameter, as
illustrated in the plot below. When κ = 0, the distribution reduces to Gaussian(μ, σ). If a
standard Gaussian random variable is divided by an independent standard uniform random
variable, the resulting distribution is the (standard) Slash distribution; it is an example of a
ratio distribution.
It has three parameters: like the Gaussian distribution, a location parameter μ and a scale
parameter σ, plus an extra parameter κ.
Figure (3.4.2): Probability Density Function of Slash Distribution at different parameters
From the graph above, we see that as κ grows, the tail and peak of the Slash distribution grow,
the slope becomes steeper and the mid-range becomes smaller. By controlling κ, we can obtain
an adaptive probability distribution that serves both extreme ends: exploitation and
exploration.
The Slash distribution is used to fit data that are approximately Gaussian but have a kurtosis
greater than 3, i.e., greater than that of the Gaussian distribution. The Slash distribution can
readily be compared to a Gaussian distribution, since they share the same location μ and scale
σ parameters.
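A minimal sampler, assuming the parametrization X = µ + σZ/U^κ with Z standard normal and U uniform on (0, 1]. This choice is an assumption made so that κ = 0 reduces to Gaussian(µ, σ) as stated above and κ = 1 gives the standard ratio Z/U; the source of the plotted densities may parametrize κ differently.

```python
import random


def slash_from(z, u, mu=0.0, sigma=1.0, kappa=1.0):
    """Map a standard normal z and a uniform u in (0, 1] to
    mu + sigma * z / u**kappa. Larger kappa makes small u blow up the
    step more often, giving heavier tails."""
    return mu + sigma * z / u ** kappa


def slash(mu=0.0, sigma=1.0, kappa=1.0, rng=random):
    z = rng.gauss(0.0, 1.0)
    u = 1.0 - rng.random()  # in (0, 1], avoids division by zero
    return slash_from(z, u, mu, sigma, kappa)
```

Adapting κ during the run would then move the mutation distribution between Gaussian-like short steps and heavy-tailed long jumps.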
Another family of distributions that contains the Gaussian distribution as a limiting case is
Student’s t-distribution.
3.4.3 Student’s t-distribution
Student’s t-distribution (or simply the t-distribution) is a family of continuous probability
distributions that arises when estimating the mean of a normally distributed population in
situations where the sample size is small and population standard deviation is unknown. The
t-distribution is symmetric and bell-shaped, like the normal distribution, but has heavier tails,
meaning that it is more prone to producing values that fall far from its mean.
Figure (3.4.3): Probability Density Function for Student’s t-distribution with different degrees of
freedom
The overall shape of the probability density function of the t-distribution resembles the bell
shape of a normally distributed variable with mean 0 and variance 1, except that it is a bit
lower and wider. As the number of degrees of freedom grows, the t-distribution approaches
the normal distribution with mean 0 and variance 1; it converges to the Gaussian as the number
of degrees of freedom tends to infinity, although for practical purposes it is already close to
Gaussian at 30 degrees of freedom. Careful observation reveals that when the number of
degrees of freedom (DOF) is low, the t-distribution has much fatter tails and a lower peak. As
DOF increases, the tails become thinner and the peak becomes higher. That means that at low
DOF this distribution gives more long jumps, and as DOF increases it gives shorter jumps.
We can exploit this behavior of the t-distribution. At the beginning of the EA, when diversity is
needed, the DOF will be low, so the t-distribution produces long jumps to facilitate diversity.
As the generation count increases, we increase the DOF, so the t-distribution produces fewer
long jumps and the algorithm becomes more exploitative.
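The DOF schedule can be sketched as below. The t variate is drawn as Z/√(V/ν), with V a chi-square variate with ν degrees of freedom obtained from `random.gammavariate(ν/2, 2)`; the linear schedule from 1 to 30 degrees of freedom and all function names are assumptions for illustration.

```python
import math
import random


def student_t(dof, rng=random):
    """Student's t variate: Z / sqrt(V / dof), V ~ chi-square(dof) drawn as
    Gamma(shape=dof/2, scale=2)."""
    z = rng.gauss(0.0, 1.0)
    v = rng.gammavariate(dof / 2.0, 2.0)
    return z / math.sqrt(v / dof)


def dof_schedule(gen, max_gen, dof_min=1.0, dof_max=30.0):
    """Hypothetical linear schedule: low DOF (long jumps) early,
    high DOF (near-Gaussian, short jumps) late."""
    frac = min(max(gen / max_gen, 0.0), 1.0)
    return dof_min + frac * (dof_max - dof_min)


def t_mutate(x, scale, gen, max_gen, rng=random):
    """Add scaled t-distributed noise whose DOF follows the schedule."""
    dof = dof_schedule(gen, max_gen)
    return [xi + scale * student_t(dof, rng) for xi in x]
```

At ν = 1 the t-distribution is the Cauchy distribution, so this schedule starts with Cauchy-like exploration and ends close to Gaussian mutation.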
Chapter 4
Experimental Study
4.1 Modified DPGA
We have implemented
1. Standard GA
2. DPGA
3. Modified DPGA
Parameter setting for algorithms:
Maximum generation = 1000
Population size = 500.
Main population parents = 2
Reserve population parents = 2
Inbred main population offspring = 2
Inbred reserve population offspring = 2
Crossbred offspring (are produced by taking 1 parent individual from main population
and another parent from reserve population) = 2
For crossover blend crossover method was used with parameter .
Uniform Gaussian mutation with is applied for both of them.
Tournament selection was used for parent selection, and a naïve survivor selection method was
adopted.
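For reference, the blend crossover used in these experiments can be sketched as follows. This is the standard BLX-α operator; the default α = 0.5 in the sketch is only a placeholder, not the value used in the experiments.

```python
import random


def blx_alpha(p1, p2, alpha=0.5, rng=random):
    """Blend crossover (BLX-alpha): each child gene is drawn uniformly from
    [min - alpha * I, max + alpha * I], where I = |p1_i - p2_i|."""
    child = []
    for a, b in zip(p1, p2):
        lo, hi = min(a, b), max(a, b)
        span = hi - lo
        child.append(rng.uniform(lo - alpha * span, hi + alpha * span))
    return child
```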
4.1.1 Pitfalls of Modified DPGA
Theoretically, the proposed algorithm should work better than DPGA, but in practice it does not. At the beginning of the algorithm, we set δ = 1, which means that the generated individual is placed at the maximum possible distance. As a result, individuals generated on the fly always land on the edge of the search space, so they only explore its extreme ends. Offspring produced by crossover with a generated individual as a parent therefore also lie on the boundary of the search space or in its neighborhood. Mutating these offspring seldom helps, because short or mid-range jumps keep them near the other individuals, and the long jumps needed to introduce diversity are very unlikely under the current mutation operators.
Even if we design a mutation operator that produces such jumps, it risks moving individuals out of the search space, or moving sufficiently diverse individuals into regions that have already been searched. If the search space is n-dimensional and every dimension has the same lower and upper bounds, the search space is an n-dimensional hypercube. In effect, this algorithm searches only the faces of the hypercube and their neighborhoods, while the core region of the hypercube remains unsearched.
Another modification is to start the algorithm with a lower value of δ. This prevents generated individuals, and their offspring, from being placed at the boundary of the search space. However, a lower value of δ also means that the algorithm is unable to make long jumps.
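The boundary effect of δ = 1 follows from geometry: in a box, the point farthest (in Euclidean distance) from any reference point is a vertex, because each squared coordinate difference is maximized independently at the farther bound. The sketch below assumes, hypothetically, that a generated individual is obtained by moving from a reference individual toward that farthest vertex by a fraction δ; the actual generation rule of Modified DPGA may differ, and the function names are illustrative.

```python
import random


def farthest_corner(x, lower, upper):
    """Vertex of the box [lower, upper]^n farthest from x: per gene, take the farther bound."""
    return [lower if (xi - lower) > (upper - xi) else upper for xi in x]


def generate_far_individual(x, delta, lower, upper):
    """Hypothetical generator: move from x toward the farthest corner by a fraction delta in [0, 1]."""
    corner = farthest_corner(x, lower, upper)
    return [xi + delta * (ci - xi) for xi, ci in zip(x, corner)]


def on_boundary(y, lower, upper, tol=1e-12):
    """True if at least one gene sits on a bound, i.e. y lies on a face of the hypercube."""
    return any(abs(yi - lower) <= tol or abs(yi - upper) <= tol for yi in y)


if __name__ == "__main__":
    rng = random.Random(0)
    for delta in (1.0, 0.5):
        pts = [
            generate_far_individual([rng.uniform(-1.0, 1.0) for _ in range(10)], delta, -1.0, 1.0)
            for _ in range(1000)
        ]
        share = sum(on_boundary(p, -1.0, 1.0) for p in pts) / len(pts)
        print(f"delta={delta}: share of generated individuals on the boundary = {share:.2f}")
```

Under this assumption, every individual generated with δ = 1 is a vertex of the hypercube, while δ = 0.5 places it strictly inside the box, which mirrors the trade-off between boundary-only search and the loss of long jumps.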
4.2 Adaptive Survivor Selection Strategy
We used a standard GA with two different survivor selection schemes:
1. Naïve survivor selection
2. Adaptive survivor selection
Initial parameter settings for adaptive survivor selection:
Population size = 2000
Maximum generation = 500
Bucket edge length = 1.90734863e-6
Minimum number of generations required for a bucket to be declared searched = 70
Minimum number of individuals required for a bucket to be declared searched = 3
Survival adaptive parameter = 0.3
Diversity scaling factor = 50
Penalty factor = 10
Reserved number of best individuals (elite) = 20
Reserved number of most diverse individuals (elite) = 20
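The bucket parameters above suggest grid-based bookkeeping of the search space. The sketch below assumes that each bucket is an axis-aligned hypercube cell of the given edge length, that the number of occupied buckets can serve as a simple diversity count, and that a bucket is declared searched once it has held at least the minimum number of individuals in at least the minimum number of generations (not necessarily consecutive). These are assumptions for illustration; the diversity measure defined earlier in the thesis may differ in detail, and all names are hypothetical.

```python
import math
from collections import Counter


def bucket_of(x, lower, edge):
    """Grid cell (bucket) of individual x: one integer index per dimension."""
    return tuple(math.floor((xi - lower) / edge) for xi in x)


def occupancy(pop, lower, edge):
    """Number of individuals in each occupied bucket; len() of the result counts occupied buckets."""
    return Counter(bucket_of(x, lower, edge) for x in pop)


def update_searched(occ, crowded_gens, searched, min_gens=70, min_inds=3):
    """Hypothetical rule: a bucket is declared searched once it has held at least
    min_inds individuals in at least min_gens generations."""
    for b, count in occ.items():
        if count >= min_inds:
            crowded_gens[b] += 1
            if crowded_gens[b] >= min_gens:
                searched.add(b)
    return searched


if __name__ == "__main__":
    pop = [[0.05, 0.05], [0.06, 0.02], [0.95, 0.95], [0.07, 0.01]]
    occ = occupancy(pop, 0.0, 0.1)
    print("occupied buckets:", len(occ), dict(occ))
```

Calling `update_searched` once per generation with `min_gens=70` and `min_inds=3` would correspond to the settings listed above.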
Experimental results on the Ackley function for both adaptive survivor selection and naïve survivor selection are given below:
Figure (4.2a): Change in diversity across generations
Figure (4.2b): Number of buckets searched
(Legend: Adaptive Survivor Selection, Naive Survivor Selection)
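For reference, the Ackley function in its commonly used form (a = 20, b = 0.2, c = 2π) is sketched below. The dimension, bounds and constants used in this experiment are not stated here, so these defaults are assumptions.

```python
import math


def ackley(x, a=20.0, b=0.2, c=2.0 * math.pi):
    """Ackley benchmark function (minimization); global minimum f(0, ..., 0) = 0."""
    n = len(x)
    mean_sq = sum(xi * xi for xi in x) / n
    mean_cos = sum(math.cos(c * xi) for xi in x) / n
    return -a * math.exp(-b * math.sqrt(mean_sq)) - math.exp(mean_cos) + a + math.e


if __name__ == "__main__":
    print(ackley([0.0, 0.0]), ackley([1.0, 1.0]))
```

Its many regularly spaced local minima around a single global minimum make it a common test of whether a selection scheme keeps enough diversity to avoid premature convergence.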
Chapter 5
Conclusion
5.1
Adaptive survivor selection is a novel approach that introduces survivor selection driven by a new diversity measure. Its main advantage is that the selection method can be adapted according to the generation and the current diversity level. It monitors diversity so that diversity never falls below the lowest permitted value. It also applies elitism not only to the best individuals but also to the most diverse individuals, so that when the population is trapped in a deep local optimum, the most diverse individuals found so far help it escape.
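The two-sided elitism described above can be sketched as follows. This is a hypothetical illustration, not the thesis's selection rule: diversity is approximated here by the distance to the nearest other individual (a swapped-in stand-in for the bucket-based measure), minimization is assumed, and the defaults of 20 best and 20 most diverse elites are taken from the parameter list in Section 4.2. The adaptive parameters (survival adaptive parameter, diversity scaling factor, penalty factor) are not modeled.

```python
import math


def nearest_neighbor_distance(i, pop):
    """Euclidean distance from pop[i] to its closest other individual (needs len(pop) >= 2)."""
    return min(math.dist(pop[i], pop[j]) for j in range(len(pop)) if j != i)


def dual_elite_survivors(pop, fitness, n_survivors, n_best=20, n_diverse=20):
    """Hypothetical survivor selection with two elite groups: keep the n_best fittest,
    then the n_diverse most isolated among the rest, then fill remaining slots by fitness."""
    order = sorted(range(len(pop)), key=lambda i: fitness[i])
    chosen = order[:n_best]
    rest = order[n_best:]
    diversity = {i: nearest_neighbor_distance(i, pop) for i in rest}
    chosen += sorted(rest, key=lambda i: diversity[i], reverse=True)[:n_diverse]
    taken = set(chosen)
    for i in order:
        if len(chosen) >= n_survivors:
            break
        if i not in taken:
            chosen.append(i)
            taken.add(i)
    return [pop[i] for i in chosen[:n_survivors]]
```

For example, with `pop = [[0.0], [0.1], [0.2], [10.0], [0.3]]`, squared values as fitness, three survivors and one elite of each kind, the isolated individual `[10.0]` survives even though a purely fitness-based selection would discard it.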
Experiments show that adaptive survivor selection outperforms the widely used naïve survivor selection specifically in terms of maintaining diversity. Although niching methods sometimes maintain pre-existing diversity better than adaptive survivor selection, this performance gap can be reduced by properly initializing the adaptive diversity parameter and choosing a suitable update rule. This scheme addresses one drawback of niching methods: they cannot control the amount of diversity needed at a given generation. In fact, niching methods and adaptive survivor selection have different goals. Niching methods mainly focus on growing and maintaining niches of individuals, whereas our scheme focuses on maintaining a level of diversity that can guide individuals toward the global optimum.
5.2 Future Work
5.2.1 Modified DPGA
It is evident that the value of δ is the main cause of the poor performance of this algorithm. If the initialization and update rules of δ are changed, this algorithm may perform better. One approach is to initialize δ to a lower value instead of 1. This would reduce the risk of individuals going beyond the search space or residing only on its boundary. However, this approach has a flaw: restricting δ to lower values makes the algorithm less capable of escaping local optima and hence more prone to premature convergence. Alternatively, the value of δ could be drawn from a probability distribution, so that δ is not consistently too high or too low. Even after the initialization problem of δ is solved, the update rule of δ still needs to be revised.
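One way to realize the suggestion of drawing δ from a probability distribution is a Beta distribution, whose support is the interval [0, 1]. The choice of Beta and the shape parameters a = 2 and b = 5 are assumptions for illustration; the thesis does not specify a distribution.

```python
import random


def sample_delta(rng: random.Random, a: float = 2.0, b: float = 5.0) -> float:
    """Draw delta from Beta(a, b), whose support is the interval [0, 1].

    With the assumed defaults a = 2, b = 5 the mean is 2/7 (about 0.29), so most
    draws are moderate while values near 1 remain possible but rare.
    """
    return rng.betavariate(a, b)


def beta_mean(a: float, b: float) -> float:
    """Mean of Beta(a, b), useful when choosing shape parameters."""
    return a / (a + b)


if __name__ == "__main__":
    rng = random.Random(7)
    draws = [sample_delta(rng) for _ in range(10000)]
    print("empirical mean:", sum(draws) / len(draws), "theoretical mean:", beta_mean(2.0, 5.0))
    print("share of draws above 0.8:", sum(d > 0.8 for d in draws) / len(draws))
```

For Beta(2, 5), the probability that δ exceeds 0.8 is about 0.0016, so boundary-seeking generations still happen but only occasionally; shifting a and b over the generations would be one possible update rule.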
5.2.2 Adaptive Survivor Selection
We have investigated a new diversity measurement technique and, based on it, proposed a new survivor selection strategy that works better than existing survivor selection schemes. One pitfall of the new diversity measure is that, in some edge cases, it reports a high diversity value even though the population is not diverse at all. Detecting these edge cases and mitigating the error they cause is left for future work. We could also adopt fitness sharing at the bucket level: each bucket is assigned a fitness that depends on how diverse that bucket is, and the individuals in that bucket share this fitness among themselves. In other words, instead of measuring the diversity of individuals, we measure the diversity of the buckets that contain them.
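The bucket-level sharing step can be sketched as below. How a bucket's fitness is derived from its diversity is left open in the text, so it is passed in as a precomputed mapping; the even split among a bucket's members and the maximization convention follow classic fitness sharing and are assumptions here.

```python
from collections import defaultdict


def shared_fitness(bucket_keys, bucket_fitness):
    """Split each bucket's fitness evenly among the individuals it contains.

    bucket_keys[i] is the bucket of individual i; bucket_fitness maps a bucket key
    to that bucket's (hypothetical) diversity-based fitness. Higher is better.
    """
    counts = defaultdict(int)
    for key in bucket_keys:
        counts[key] += 1
    return [bucket_fitness[key] / counts[key] for key in bucket_keys]


if __name__ == "__main__":
    keys = [(0, 0), (0, 0), (0, 0), (5, 5)]
    print(shared_fitness(keys, {(0, 0): 6.0, (5, 5): 6.0}))
```

With equal bucket fitness, each of the three individuals crowded into one bucket receives a third of what the lone individual in the other bucket receives, which pushes selection toward less crowded regions.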
5.2.3 New Distribution Based Mutation
We have investigated distributions that have properties similar to those of the currently used mutation distributions or that belong to the same family. These distributions have bell-shaped curves similar to the Gaussian and depend on a similar set of parameters as the Gaussian, Cauchy, and Lévy distributions. The three distributions presented earlier have the potential to replace current distribution-based mutations. All of them have fatter tails, and the Laplace and slash distributions have higher peaks, so in theory both should perform better at both exploration and exploitation. Since Student's t-distribution is practically indistinguishable from the Gaussian at 30 DOF, future experiments can determine the initial DOF with which to start the algorithm and how to change the DOF as the generations progress.
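Samplers for the Laplace and slash distributions, which could replace the Gaussian step in a distribution-based mutation, can be sketched with the standard library alone. The unit scale and the threshold of 3 used to count long jumps are arbitrary choices for illustration.

```python
import random


def laplace_step(rng: random.Random, scale: float = 1.0) -> float:
    """Laplace(0, scale) variate: the difference of two i.i.d. exponentials with mean `scale`."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def slash_step(rng: random.Random, scale: float = 1.0) -> float:
    """Slash variate: a standard normal divided by an independent Uniform(0, 1] variate."""
    u = 1.0 - rng.random()  # in (0, 1], avoids division by zero
    return scale * rng.gauss(0.0, 1.0) / u


def tail_fraction(sampler, rng, n=20000, threshold=3.0):
    """Share of draws whose magnitude exceeds `threshold` (a rough long-jump rate)."""
    return sum(abs(sampler(rng)) > threshold for _ in range(n)) / n


if __name__ == "__main__":
    rng = random.Random(0)
    samplers = [("gaussian", lambda r: r.gauss(0.0, 1.0)), ("laplace", laplace_step), ("slash", slash_step)]
    for name, f in samplers:
        print(name, tail_fraction(f, rng))
```

In theory, the long-jump rate P(|X| > 3) is about 0.003 for the standard Gaussian, about 0.05 for Laplace(0, 1), and about 0.27 for the standard slash distribution, so both candidates make long jumps far more often than Gaussian mutation at the same scale.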
References
[1] D. E. Goldberg, Genetic Algorithms in Search, Optimization, and Machine Learning. Reading, MA: Addison-Wesley, 1989.
[2] L. J. Fogel, A. J. Owens, and M. J. Walsh, Artificial Intelligence through Simulated Evolution. New York: John Wiley & Sons, 1966.
[3] A. E. Eiben, R. Hinterding, and Z. Michalewicz, “Parameter control in evolutionary algorithms,” IEEE Trans. Evol. Comput., vol. 3, no. 2, pp. 124–141, Jul. 1999.
[4] D. H. Wolpert and W. G. Macready, “No free lunch theorems for optimization,” IEEE Trans. Evol. Comput., vol. 1, no. 1, pp. 67–82, 1997.
[5] D. E. Goldberg and J. Richardson, “Genetic algorithms with sharing for multimodal
function optimization,” in Proc. 2nd Int. Conf. Genetic Algorithms (ICGA), 1987, pp. 41–49.
[6] T. Jumonji, G. Chakraborty, H. Mabuchi, and M. Matsuhara, “A novel distributed genetic
algorithm implementation with variable number of islands,” in Proc. IEEE Congr. Evolut.
Comput., 2007, pp. 4698–4705.
[7] Y. Yoshida and N. Adachi, “A diploid genetic algorithm for preserving population
diversity-pseudo-Meiosis GA,” in Proc. 3rd Parallel Problem Solving Nature (PPSN), 1994,
pp. 36–45.
[8] M. Kominami and T. Hamagami, “A new genetic algorithm with diploid chromosomes by
using probability decoding for nonstationary function optimization,” in Proc. IEEE Int. Conf.
Syst., Man, Cybern., 2007, pp. 1268–1273.
[9] S. W. Mahfoud, “Crowding and preselection revisited,” in Proc. 2nd Parallel Problem
Solving Nature (PPSN), 1992, pp. 27–37.
[10] S. W. Mahfoud, “Niching methods for genetic algorithms,” Ph.D. dissertation, Dept. General Eng., Univ. Illinois, Urbana-Champaign, 1995.
[11] T. Park and K. R. Ryu, “A dual population genetic algorithm with evolving diversity,” in Proc. IEEE Congr. Evol. Comput., 2007, pp. 3516–3522.
[12] T. Park and K. R. Ryu, “Adjusting population distance for dual-population genetic
algorithm,” in Proc. Aust. Joint Conf. Artif. Intell., 2007, pp. 171–180.
[13] T. Park and K. R. Ryu, “A dual-population genetic algorithm for adaptive diversity control,” in Proc. Aust. Joint Conf. Artif. Intell., 2009, pp. 191–210.
[13] R. McKay, “Fitness sharing in genetic programming,” in Proc. of the Genetic and
Evolutionary Computation Conference, Las Vegas, Nevada, 2000, pp. 435–442.
[14] R. K. Ursem, “Diversity-guided evolutionary algorithms,” in Proc. Parallel Problem Solving from Nature (PPSN) VII, vol. 2439, J. J. Merelo, P. Adamidis, and H.-P. Schwefel, Eds. Granada, Spain, 2002, pp. 462–471.
[15] T. Bäck and H.-P. Schwefel, “An overview of evolutionary algorithms for parameter
optimization,” Evol. Comput., vol. 1, pp. 1–23, 1993.
[16] K. Chellapilla, “Combining mutation operators in evolutionary programming,” IEEE
Trans. Evol. Comput., vol. 2, pp. 91–96, Sept. 1998.
[17] R. Mantegna, “Fast, accurate algorithm for numerical simulation of Lévy stable
stochastic process,” Phys. Rev. E, vol. 49, no. 5, pp. 4677–4683, 1994.
[18] X. Yao, G. Lin, and Y. Liu, “An analysis of evolutionary algorithms based on neighborhood and step size,” in Proc. 6th Int. Conf. Evolutionary Programming, 1997, pp. 297–307.
[19] D. Thierens, “Adaptive mutation rate control schemes in genetic algorithms,” in Proc. Congr. Evol. Comput., vol. 1, 2002, pp. 980–985.
[20] G. Rudolph, “On takeover times in spatially structured populations: Array and ring,” in
Proc. 2nd Asia-Pacific Conf. Genetic Algorithms Applicat., 2000, pp. 144–151.
[21] X. Yao, Y. Liu, and G. Lin, “Evolutionary programming made faster,” IEEE Trans.
Evol. Comput., vol. 3, no. 2, pp. 82–102, Jul. 1999.