antonia-blankenship
2101INT – Principles of Intelligent Systems
Lecture 11
Biological Overview of Genetic Algorithms
Organisms produce a number of offspring similar, but not entirely so, to themselves
– Variations are caused by mutations – random changes in the genome, which are often environmentally induced
– Variations are caused by sexual recombination – genes inherited from both parents, giving characteristics of each
On an evolutionary scale, the better adapted offspring are more likely to survive and produce their own offspring
Over time this reinforces their particular genetic characteristics
Genetic Algorithms use this process to evolve better solutions to problems
Classes of search techniques
Search Techniques
– Calculus-Based Techniques: Fibonacci, Sort
– Local Search Techniques: Tabu Search, Hill Climbing, Simulated Annealing, Evolutionary Algorithms (Genetic Programming, Genetic Algorithms)
– Enumerative Techniques: BFS, DFS, Dynamic Programming
What is a GA?
An allele is the simplest piece of genetic information, a base-pair in humans, a bit in computers
A gene is a string of alleles. In humans it is responsible for the production of a protein; in a GA it describes one particular feature of a solution
A chromosome is a sequence of genes. Humans have many chromosomes; a GA is generally considered to have just one, being the complete set of genes describing all features of the problem
The genotype refers to the genes of an organism, the phenotype refers to the observable characteristics
What is a GA? - Example
Consider the problem of matching observed data to a polynomial curve – ax^4 + bx^3 + cx^2 + dx + e
Our environment variables are a, b, c, etc. and it is these that will form our genes. Let us assume we have an 8-bit signed integer for each variable. One random chromosome is then:
{01010101 11110011 11001010 00001000 11010010}
Each 8-bit word represents a gene. Each single bit is an allele. In total we have a 40-bit chromosome.
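As a rough illustration, the chromosome above can be decoded into its five coefficients. The slide only says "8-bit signed integer", so this sketch assumes a two's-complement encoding; the function names are made up for the example.

```python
def decode_gene(bits: str) -> int:
    """Read an 8-bit string as a signed integer (two's complement is assumed)."""
    value = int(bits, 2)
    return value - 256 if value >= 128 else value


def decode_chromosome(chromosome: str) -> list[int]:
    """Split a space-separated chromosome into genes and decode each gene."""
    return [decode_gene(gene) for gene in chromosome.split()]


chromosome = "01010101 11110011 11001010 00001000 11010010"
a, b, c, d, e = decode_chromosome(chromosome)
print(a, b, c, d, e)  # 85 -13 -54 8 -46
```

Under that assumption the example chromosome describes the curve 85x^4 - 13x^3 - 54x^2 + 8x - 46.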
Genotypes and phenotypes
Continuing the curve fitting example, we can discuss the differences between genotypes and phenotypes
Genotype-space is the space of possible chromosomes, i.e. the space of possible 40-bit strings, so has 2^40 elements.
Phenotype-space is the space of possible 4th order polynomial curves (with 8-bit signed coefficients) which are the characteristics given by the genes. There may be fewer elements in the phenotype space, as multiple genotypes can map to a single phenotype.
Metaphor
Nature | Genetic Algorithm
Environment | Optimisation problem
Individuals – humans | Feasible solutions
Degree of adaptation | Solution quality/fitness
A population | A set of feasible solutions
Selection, recombination & mutation | Genetic operators – analogues of the biological operators
Life | Iteratively applying operators to population
Flowchart of a Genetic Algorithm
Initialize Population
Evaluate Fitness
Terminate?
– Yes: Output solution
– No: Perform selection, crossover and mutation, Evaluate Fitness, then return to Terminate?
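The flowchart can be sketched as a short loop. This is only one possible reading: the termination test is assumed to be a fixed number of generations, selection is assumed to be a binary tournament, and the toy fitness function (count the 1 bits) is not from the lecture. `run_ga` and its parameters are hypothetical names.

```python
import random


def run_ga(fitness, n_bits, pop_size=20, generations=40, p_mutation=0.02, rng=None):
    """Evolve fixed-length bit lists; higher fitness is assumed to be better."""
    rng = rng or random.Random()

    def pick(population, scores):
        # Selection: binary tournament (one of several possible schemes).
        i, j = rng.sample(range(len(population)), 2)
        return population[i] if scores[i] >= scores[j] else population[j]

    # Initialize population, then evaluate fitness.
    population = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    scores = [fitness(ind) for ind in population]

    # Terminate? Here the stopping rule is a fixed generation budget.
    for _ in range(generations):
        offspring = []
        while len(offspring) < pop_size:
            mother, father = pick(population, scores), pick(population, scores)
            cut = rng.randint(1, n_bits - 1)  # single-point crossover
            child = mother[:cut] + father[cut:]
            # Mutation: flip each bit with a small probability.
            child = [bit ^ 1 if rng.random() < p_mutation else bit for bit in child]
            offspring.append(child)
        population = offspring
        scores = [fitness(ind) for ind in population]  # evaluate fitness again

    # Output solution: the best individual of the final population.
    best = max(range(pop_size), key=lambda i: scores[i])
    return population[best], scores[best]


best, score = run_ga(fitness=sum, n_bits=16, rng=random.Random(0))
print(best, score)
```

A real application would swap in its own fitness function and often a different stopping rule, such as stopping once fitness stops improving.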
Problem Encoding
Each problem variable becomes a gene with sufficient bits to represent the domain of possible values
– Genes can represent integers, real numbers, permutations of a list, lists of rules, sequences of instructions
Must be careful (particularly with last three) that solutions remain feasible
After instantiating the genes, need some measure of fitness which can judge which genes give the best performance in the phenotype space
Fitness
Generally consider that higher fitness equates to a better solution. When it doesn’t, it is necessary to standardise the fitness function so that it does
Continuing the curve fitting example, we could find the sum of squared errors between the true curve (given by data points) and our evolved curve (given by its coefficients)
Of course, this function ideally should be zero, if the evolved curve is a perfect match
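$$ f_p = \sum_{x \in \text{points}} \left( T(x) - (a x^4 + b x^3 + c x^2 + d x + e) \right)^2 $$
where T(x) is the true value at data point x, and a, b, c, d, e are the coefficients encoded by population element p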
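A minimal sketch of this sum-of-squared-errors fitness, assuming the data points are given as (x, true value) pairs. The sample coefficients are invented for the example.

```python
def polynomial(coeffs, x):
    """Evaluate a*x^4 + b*x^3 + c*x^2 + d*x + e."""
    a, b, c, d, e = coeffs
    return a * x**4 + b * x**3 + c * x**2 + d * x + e


def sum_squared_error(coeffs, points):
    """Sum of squared differences between the data and the evolved curve."""
    return sum((true_y - polynomial(coeffs, x)) ** 2 for x, true_y in points)


# Hypothetical "true" curve, sampled at seven integer x values.
true_coeffs = (1, 0, -2, 0, 3)
points = [(x, polynomial(true_coeffs, x)) for x in range(-3, 4)]

print(sum_squared_error(true_coeffs, points))       # 0: a perfect match
print(sum_squared_error((1, 0, -2, 0, 4), points))  # 7: off by 1 at each of 7 points
```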
Standardising Fitness
The easiest way to standardise fitness is to subtract it from the largest fitness in the current population
p refers to a single population element, P to the current set of all population elements
$$ f'_p = f^{P}_{\mathrm{MAX}} - f_p $$
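As a small sketch of this standardisation, assuming the raw fitness values are errors where lower is better (as in the curve fitting example):

```python
def standardise(raw_fitnesses):
    """Subtract each raw fitness from the population maximum, so higher is better."""
    largest = max(raw_fitnesses)
    return [largest - f for f in raw_fitnesses]


# Raw sum-of-squared-error values for three hypothetical population elements.
print(standardise([0, 7, 28]))  # [28, 21, 0]
```

The element with zero error ends up with the highest standardised fitness, and the worst element ends up with zero.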
Selection
Many different types of selection
– Fitness proportionate: randomly select elements based on their fitness proportionate to total population fitness
– Fitness guaranteed: select each element a guaranteed minimum number of times based on its fitness
– Rank selection: select based on fitness rank, rather than on true fitness. Useful when the fitness function increases exponentially and some elements could dominate.
– Tournament selection: extract k elements from the population and make them compete against each other for selection. Useful when evolving strategies.
All selection methods give a pool of individuals that may participate in reproduction
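Three of these schemes can be sketched in a few lines each. The sketch assumes non-negative fitness values where higher is better, and a linear rank weighting for rank selection, which is only one of several common choices. The function names are illustrative.

```python
import random


def fitness_proportionate(population, fitnesses, rng):
    """Pick one element with probability proportional to its fitness."""
    return rng.choices(population, weights=fitnesses, k=1)[0]


def rank_selection(population, fitnesses, rng):
    """Pick one element with probability proportional to its fitness rank."""
    order = sorted(range(len(population)), key=lambda i: fitnesses[i])
    ranks = [0] * len(population)
    for rank, index in enumerate(order, start=1):  # worst gets rank 1
        ranks[index] = rank
    return rng.choices(population, weights=ranks, k=1)[0]


def tournament_selection(population, fitnesses, rng, k=3):
    """Draw k distinct elements at random and return the fittest of them."""
    contenders = rng.sample(range(len(population)), k)
    return population[max(contenders, key=lambda i: fitnesses[i])]


rng = random.Random(1)
population = ["a", "b", "c", "d"]
fitnesses = [1, 2, 4, 1000]  # "d" dominates under fitness proportionate selection
print([fitness_proportionate(population, fitnesses, rng) for _ in range(5)])
print([rank_selection(population, fitnesses, rng) for _ in range(5)])
print([tournament_selection(population, fitnesses, rng, k=2) for _ in range(5)])
```

Calling any of these repeatedly builds the pool of individuals that may take part in reproduction.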
Genetic Operators
Primary operators are crossover, mutation and to some extent cloning.
Many other specialised operators have been formulated for particular problems
Genetic operators can be considered as performing a local search of the phenotype space. The local area of a genotype is the set of all other genotypes that can be reached by applying the genetic operators.
As such, the local area depends on the larger population, rather than a single genotype
Crossover/Recombination
Analogue of sexual reproduction – combines the genetic material of two parents to form two new offspring
This is the main genetic operator. Different types: gene based, crossover, random
Gene Preserving recombination
Gene based. Don’t break genes.
Parent 1: 01101001 01001110 10101101 10110101
Parent 2: 11010100 01011010 10110100 10100101
Offspring: 11011001 01011010 10101101 10100101
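One way to recombine without breaking genes is to take each whole gene from one parent or the other. The slide does not say how genes are chosen, so this sketch assumes an independent 50/50 choice per gene; the function name is hypothetical.

```python
import random


def gene_preserving_recombination(parent_a, parent_b, rng):
    """Build two children by swapping whole genes between the parents at random."""
    child_a, child_b = [], []
    for gene_a, gene_b in zip(parent_a, parent_b):
        if rng.random() < 0.5:  # assumed: each gene is swapped with probability 1/2
            gene_a, gene_b = gene_b, gene_a
        child_a.append(gene_a)
        child_b.append(gene_b)
    return child_a, child_b


parent_a = "01101001 01001110 10101101 10110101".split()
parent_b = "11010100 01011010 10110100 10100101".split()
child_a, child_b = gene_preserving_recombination(parent_a, parent_b, random.Random(0))
print(" ".join(child_a))
print(" ".join(child_b))
```

Every gene in each child is an unbroken copy of the gene at the same position in one of the parents.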
Crossover recombination
Pick a point and take all genes to the left from one parent, and all from the right of another. Can preserve gene integrity:
Parent 1: 01101001 01001110|10101101 10110101
Parent 2: 11010100 01011010|10110100 10100101
Offspring: 01101001 01001110|10110100 10100101
But more often is just a random point:
Parent 1: 01101001 0|1001110 10101101 10110101
Parent 2: 11010100 0|1011010 10110100 10100101
Offspring: 01101001 0|1011010 10110100 10100101
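Single-point crossover on bit strings is a pair of slices. The sketch below uses the two parents from the slide with the spaces removed, first with the cut on a gene boundary (bit 16) and then at an arbitrary point (bit 9). The function name is illustrative.

```python
def single_point_crossover(parent_a: str, parent_b: str, point: int):
    """Swap everything to the right of `point` between the two parents."""
    child_a = parent_a[:point] + parent_b[point:]
    child_b = parent_b[:point] + parent_a[point:]
    return child_a, child_b


parent_a = "01101001 01001110 10101101 10110101".replace(" ", "")
parent_b = "11010100 01011010 10110100 10100101".replace(" ", "")

# Cut on a gene boundary: genes stay intact.
print(single_point_crossover(parent_a, parent_b, 16)[0])
# 01101001010011101011010010100101

# Cut inside the second gene: that gene becomes a mix of both parents.
print(single_point_crossover(parent_a, parent_b, 9)[0])
# 01101001010110101011010010100101
```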
Multi-point crossover
Single-point crossover is less disruptive to a genotype than multi-point crossover
Two-point crossover treats the genotype as a ring, where the start and end allele are considered joined
There is some evidence to suggest that multi-point crossover can be useful with smaller population sizes
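A two-point crossover can be sketched as exchanging the segment between two cut points. Seen as a ring, each child keeps one arc from one parent (the part that wraps around from the end to the start) and the other arc from the other parent. The function name and example strings are made up for illustration.

```python
def two_point_crossover(parent_a: str, parent_b: str, first: int, second: int):
    """Exchange the segment [first, second) between the two parents."""
    child_a = parent_a[:first] + parent_b[first:second] + parent_a[second:]
    child_b = parent_b[:first] + parent_a[first:second] + parent_b[second:]
    return child_a, child_b


print(two_point_crossover("00000000", "11111111", 2, 5))
# ('00111000', '11000111')
```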
Mutation
Asexual reproduction, using just a single parent.
Flip the bits at a number of random positions in a chromosome.
Used to re/introduce diversity into the population.
Before: 01101001 01001110 10101101 10110101
After: 01111001 01001110 10001101 10010101
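A minimal sketch of bit-flip mutation, assuming the number of flips is fixed in advance (many implementations instead flip each bit independently with a small probability). The function name is illustrative.

```python
import random


def mutate(chromosome: str, n_flips: int, rng) -> str:
    """Flip the bits at n_flips distinct random positions."""
    positions = rng.sample(range(len(chromosome)), n_flips)
    bits = list(chromosome)
    for position in positions:
        bits[position] = "1" if bits[position] == "0" else "0"
    return "".join(bits)


parent = "01101001 01001110 10101101 10110101".replace(" ", "")
child = mutate(parent, n_flips=3, rng=random.Random(0))
print(parent)
print(child)
```

The child differs from the parent in exactly three positions, as in the slide's example.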
Cloning
Cloning exactly duplicates/copies an existing genotype into the successive generation
Often used to copy the best element(s) of one population to the next, in which case the GA is described as elitist, since the elite elements remain
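Elitism can be sketched as cloning the best few elements and filling the rest of the next generation with offspring. The function name and the `n_elite` parameter are hypothetical, and higher fitness is assumed to be better.

```python
def next_generation_with_elitism(population, fitnesses, offspring, n_elite=1):
    """Clone the n_elite fittest elements, then fill up with offspring."""
    ranked = sorted(zip(fitnesses, population), key=lambda pair: pair[0], reverse=True)
    elite = [individual for _, individual in ranked[:n_elite]]
    return elite + offspring[: len(population) - n_elite]


population = ["a", "b", "c"]
fitnesses = [1, 3, 2]
offspring = ["x", "y", "z"]
print(next_generation_with_elitism(population, fitnesses, offspring))
# ['b', 'x', 'y']
```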
Genetic Programming
GP was developed by Koza around 1990
Extends the GA to a non-linear, tree-based structure.
Instead of single bits, nodes in the tree represent functions and terminals (constants).
– Internal nodes are functions
– Leaf nodes are terminals or 0-arity functions, such as rand()
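A GP individual can be sketched as a nested tuple, with a function name at the head of each internal node and terminals at the leaves. The representation, the three-function set and the variable name "x" are all assumptions made for this example, not the lecture's own design.

```python
import operator

# Assumed function set: each name maps to a two-argument Python function.
FUNCTIONS = {"add": operator.add, "sub": operator.sub, "mul": operator.mul}


def evaluate(node, x):
    """Recursively evaluate a tree for a given value of the variable x."""
    if isinstance(node, tuple):  # internal node: (function name, child, child)
        name, *children = node
        return FUNCTIONS[name](*(evaluate(child, x) for child in children))
    if node == "x":  # leaf: the input variable
        return x
    return node  # leaf: a constant terminal


tree = ("add", ("mul", "x", "x"), 3)  # represents x*x + 3
print(evaluate(tree, 2))  # 7
```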
Unrestricted Size and Bloat
Beyond practical limitations, the chromosomes of GP are not restricted to a particular size or shape
The trees will continue to grow as long as there is no appreciable reduction in fitness
So as an example, consider curve fitting. If you didn’t know it was a 4th order polynomial, you could use GP instead, which could learn 4th, 5th etc order polynomials
Introduces the problem of bloat – trees will continue to increase in size even if there is no increase in fitness, as long as it remains constant
Multi-criteria Optimisation
Bloat is often difficult to control because it introduces a multi-criteria problem – that is, give me the equation of the best fitting curve that is also smallest
How do you trade-off an improved curve with a smaller description?
No easy solution – introduces the concept of Pareto optimisation, and Pareto fronts.
One answer is said to dominate another if it is at least as good according to every measure and strictly better according to at least one.
The current set of non-dominated solutions is termed the Pareto-optimal front
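Dominance and the non-dominated set can be sketched directly from the definition. The sketch assumes every measure is to be maximised, so tree size is stored as a negative number; the candidate values are invented for the example.

```python
def dominates(a, b):
    """True if a is at least as good as b everywhere and strictly better somewhere."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(solutions):
    """Keep only the solutions that no other solution dominates."""
    return [s for s in solutions if not any(dominates(other, s) for other in solutions)]


# Each candidate is (fitness, -tree size), so bigger is better on both measures.
candidates = [(10, -5), (8, -3), (8, -6), (10, -7)]
print(pareto_front(candidates))  # [(10, -5), (8, -3)]
```

Here (10, -5) and (8, -3) are both kept because neither beats the other on both fitness and size.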
Use of GP
GP can be used to evolve programs
– That can be computer programs, functions, strategies
– What you can evolve is limited only by the choices of functions and terminals
GP is usually strongly typed to ensure that solutions remain feasible. For example, adding an Int and a String has no meaning, and would not be allowed to occur – interchanged subtrees are always of the same type
GP Operators
Extends crossover to interchange subtrees of two parents
Mutation generates an entirely new subtree rooted at a random point
Cloning is identical to cloning in a GA
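Subtree crossover and mutation can be sketched on trees stored as nested tuples of the form (function name, child, child). That representation is an assumption for this example, and all nodes are taken to be numeric, so no type check is shown. The replacement subtree for mutation is passed in by the caller rather than generated, to keep the sketch short.

```python
import random


def subtree_paths(tree, path=()):
    """Yield the path (a tuple of child indices) to every node in the tree."""
    yield path
    if isinstance(tree, tuple):
        for index, child in enumerate(tree[1:], start=1):
            yield from subtree_paths(child, path + (index,))


def get_subtree(tree, path):
    """Follow a path of child indices down from the root."""
    for index in path:
        tree = tree[index]
    return tree


def replace_subtree(tree, path, new):
    """Return a copy of tree with the node at path replaced by new."""
    if not path:
        return new
    index = path[0]
    replaced = replace_subtree(tree[index], path[1:], new)
    return tree[:index] + (replaced,) + tree[index + 1:]


def subtree_crossover(tree_a, tree_b, rng):
    """Interchange one randomly chosen subtree of each parent."""
    path_a = rng.choice(list(subtree_paths(tree_a)))
    path_b = rng.choice(list(subtree_paths(tree_b)))
    return (replace_subtree(tree_a, path_a, get_subtree(tree_b, path_b)),
            replace_subtree(tree_b, path_b, get_subtree(tree_a, path_a)))


def subtree_mutation(tree, new_subtree, rng):
    """Replace the subtree rooted at a random point with a new one."""
    path = rng.choice(list(subtree_paths(tree)))
    return replace_subtree(tree, path, new_subtree)


parent_a = ("add", ("mul", "x", "x"), 3)
parent_b = ("sub", "x", ("mul", 2, "x"))
print(subtree_crossover(parent_a, parent_b, random.Random(0)))
print(subtree_mutation(parent_a, ("add", "x", 1), random.Random(0)))
```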
Comparing GAs and GP
There are extensions to GAs that allow those linear data structures to represent non-linear, tree-like data. After all, computer memory is linear and it still manages to store trees.
These GAs are generally termed messy GAs, and do not have a predetermined or fixed length
But generally, if you know the form of the solution you are searching for, you would use a GA; if not, use GP
Bibliography
I direct you towards the following books
– Richard Dawkins, “The Selfish Gene”
– John Holland, “Adaptation in Natural and Artificial Systems”
– John Koza, “Genetic Programming”
The first one is popular science
The second two are the original GA and GP text books respectively