View
216
Download
2
Tags:
Embed Size (px)
Citation preview
1
Genetic Algorithms
Data Mining and Distributed Computing Group
Facultad de Informática
2
DATSI, Universidad Politécnica de Madrid
Overview
A class of probabilistic optimization algorithms Inspired by the biological evolution process Uses concepts of “Natural Selection” and “Genetic
Inheritance” (Darwin 1859) Originally developed by John Holland (1975) Particularly well suited for hard problems where little
is known about the underlying search space Widely-used in business, science and engineering
3
DATSI, Universidad Politécnica de Madrid
Natural Evolution
Darwin: The Origin of Species– Search of optimal forms (environment)– Based on:
assortment : recombination of genetic material
randomness selection : survival of the fittest
4
DATSI, Universidad Politécnica de Madrid
Biological Concepts (Cell)
• A set of many small “factories” working together
• Center: cell nucleus
• The nucleus contains the genetic information
5
DATSI, Universidad Politécnica de Madrid
Biological Concepts (Chromosome)
• Chromosomes store genetic information
• Each chromosome is build of DNA
• Chromosomes in humans form pairs (23 pairs)
• The chromosome is divided in parts: genes
• Genes code for properties
• Possible values of the genes: allele
• Position of the gene in the
chromosome: locus
6
DATSI, Universidad Politécnica de Madrid
Biological Concepts (Genetics)
• The entire combination of genes: genotype
• A genotype is expressed as a phenotype
• Alleles can be either dominant or recessive
• Dominant alleles will always express from the genotype to the phenotype
• Recessive alleles can survive in the population for many generations, without being expressed
7
DATSI, Universidad Politécnica de Madrid
Biological Concepts (Reproduction)
Mitosis: copying the same genetic information to new offspring: there is no exchange of information
Normal way of growing of multicell structures, like organs.
8
DATSI, Universidad Politécnica de Madrid
Biological Concepts (Reproduction)
• Meiosis: the basis of sexual reproduction
• After meiotic division 2 gametes appear in the process
• In reproduction two gametes conjugate to a zygote wich will become the new individual
• Genetic information is shared between the parents in order to create new offspring
9
DATSI, Universidad Politécnica de Madrid
Biological Concepts (Reproduction)
• During reproduction “errors” occur
• Due to these “errors” genetic variation exists
• Most important “errors” are:
• Recombination (cross-over)
• Mutation
10
DATSI, Universidad Politécnica de Madrid
Biological Concepts (Natural Selection)
• The origin of species: “Preservation of favourable variations and rejection of unfavourable variations.”
• Survival of the fittest
• Mathematical expresses as fitness: success in life
11
DATSI, Universidad Politécnica de Madrid
Genetic Algorithms (GAs)
John Holland (~1973)– GAs use the principle of natural selection to solve
complicated optimization problems.– An alternative of brute-force search on a complex
solution space. Instead of complete exploration Estocastic-driven search
12
DATSI, Universidad Politécnica de Madrid
Genetic Algorithms (GAs)
F inonacc i N ew ton
D irect m ethods Indirect m ethods
C alcu lus-based techn iques
Evolu tionary s trategies
C entra l ized D is tr ibuted
Para l le l
S teady-s ta te G enera tiona l
Sequentia l
G enetic a lgori thm s
Evolutionary a lgori thm s S im u lated annealing
G uided random search techniques
D ynam ic program m ing
Enum erative techn iques
Search techniques
© W. Williams, ‘99
13
DATSI, Universidad Politécnica de Madrid
Genetic Algorithms: Definitions
Concepts:– Individual: A possible solution of the problem.– Gene: A solution component (a.k.a. an attribute) [eye color]– Allele: A gene value [eye color=green]– Population: group of potential solutions
10011101011011
-19 4 108 46
0.433 -33.345 0.0013
+
A x
B 3
14
DATSI, Universidad Politécnica de Madrid
Genetic Algorithms: Definitions
Concepts:– Genotype: Genetic codification (seq. of genes)– Phenotype: Actual individual (with its capabilities)
For GAs, phenotype represents the fitness value of the solution:
Asumption: there exists a relationship between genetic information and individual fitness.
15
DATSI, Universidad Politécnica de Madrid
Fitness Landscapes
Fitness: Function that evaluates individual capabilities.
Graphical representation
N+1 dimensions
fitness
© L.J. Pit, ‘96
16
DATSI, Universidad Politécnica de Madrid
Natural Evolution and Operators
How new individuals are created?– Crossover (recombination)
mixing existing genetic material, assortment Inheritage of usefull characteristics
– Mutation: rare (very rare in biological perspective) Randomness
They simulate the evolution process
17
DATSI, Universidad Politécnica de Madrid
Natural Evolution and Operators
Crossover
1001110101101100010100010001
#9
00010100011011
Variants: 2, 3, n-points crossover Uniform crossover Other...
18
DATSI, Universidad Politécnica de Madrid
Natural Evolution and Operators
Mutation
00010100011011
#5
00011100011011
Variants: Mutation-by-swap Biased mutation Cataclysmic mutation
19
DATSI, Universidad Politécnica de Madrid
Natural Evolution and Operators
Selection:1. Evaluation of individuals (fitness)
2. Parent selection (for each crossover operation).
Selective pressure: relationship between capacities (fitness) of the individual and its possibilities to generate new individuals (participate in the reproduction).
20
DATSI, Universidad Politécnica de Madrid
Natural Evolution and Operators
Selection (alternatives):– Tournament selection– Rank-based: Continuous function– Roulette wheel
Variants:– Previous population selection (worst individual
elimination).
21
DATSI, Universidad Politécnica de Madrid
Canonical Genetic Algorithm
P0
Initial population
end condition?
selection P’iSelected population
crossover/mutation
Fitness-function
Pi+1
Next population
Includes the chances of participation in reproduction
22
DATSI, Universidad Politécnica de Madrid
GA Variants (Replacement)
Elitism:– The best(s) individuals in Pi stay in population Pi.
– Provides monotonic fitness increment
Steady-state:– Only one individual is created per generation.– New one replaces worst individual.
23
DATSI, Universidad Politécnica de Madrid
Replacement Overview
Pi
Current population
SelectionCrossover/mutation
P’iNext population
Offi
Offspring population
Canonical Steady-sate Elitism
Offspring size N 1 >, = ó < N
Current Next 0 N-1 << N
Replacement All Worst Worst
Size=N
24
DATSI, Universidad Politécnica de Madrid
The Metaphor
Genetic Algorithm Nature
Optimization problem Environment
Feasible solutions Individuals living in that environment
Solutions quality (fitness function) Individual’s degree of adaptation to its surrounding environment
A set of feasible solutions A population of organisms (species)
Stochastic operators Selection, recombination and mutation in nature’s evolutionary process
Iteratively applying a set of stochastic operators on a set of feasible solutions
Evolution of populations to suit their environment
25
DATSI, Universidad Politécnica de Madrid
Aspects to consider: Population Size
What’s the optimal size of the population?– Too small: premature convergence (weak
exploration)– Too large: waste of resources
26
DATSI, Universidad Politécnica de Madrid
Aspects to consider:Problem representation
A key aspect: chromosomes representation– Initially, a binary string– Nowadays, other possibilities:
Bit strings (0101 ... 1100) Real numbers (43.2 -33.1 ... 0.0 89.2) Permutations of element (E11 E3 E7 ... E1 E15) Lists of rules (R1 R2 R3 ... R22 R23) Program elements (genetic programming) ... any data structure ...
– Great influence in the problem resolution
27
DATSI, Universidad Politécnica de Madrid
GA elements to define
Chromosome representation Creation of initial population Fitness function Genetic operators Parameters (population size,
probabilities of the genetic operators, etc.)
Population size: 50 – 100
Children per generation:
= population size
Crossovers: > 85%
Mutations: < 5%
Generations: 20 – 20,000
Typical configuration for small problems:
28
DATSI, Universidad Politécnica de Madrid
Example (The MAXONE problem)
• Suppose we want to maximize the number of ones in a string of m binary digits
Is it a trivial problem?
• It may seem so because we know the answer in advance
• However, we can think of it as maximizing the number of correct answers, each encoded by 1, to m yes/no difficult questions`
29
DATSI, Universidad Politécnica de Madrid
MAXONE problem
An individual is encoded (naturally) as a string of m binary digits
The fitness f of a candidate solution to the MAXONE problem is the number of ones in its genetic code
We start with a population of n random strings. Suppose that m = 10 and n = 6
30
DATSI, Universidad Politécnica de Madrid
MAXONE problem (initialization)
• We toss a fair coin 60 times and get the following initial population:
s1 = 1111010101 f (s1) = 7
s2 = 0111000101 f (s2) = 5
s3 = 1110110101 f (s3) = 7
s4 = 0100010011 f (s4) = 4
s5 = 1110111101 f (s5) = 8
s6 = 0100110000 f (s6) = 3
31
DATSI, Universidad Politécnica de Madrid
MAXONE problem (selection)
• Next we apply fitness proportionate selection with the roulette wheel method:
21n
3
Area is Proportional to fitness value
Individual i will have a
probability to be chosen
i
if
if
)(
)(
4
• We repeat the extraction as many times as the number of individuals we need to have the same parent population size (6 in our case)
32
DATSI, Universidad Politécnica de Madrid
MAXONE problem (selection)
• Suppose that, after performing selection, we get the following population:
s1` = 1111010101 (s1)
s2` = 1110110101 (s3)
s3` = 1110111101 (s5)
s4` = 0111000101 (s2)
s5` = 0100010011 (s4)
s6` = 1110111101 (s5)
33
DATSI, Universidad Politécnica de Madrid
MAXONE problem (crossover)
• Next we mate strings for crossover. For each couple we decide according to crossover probability (for instance 0.6) whether to actually perform crossover or not
• Suppose that we decide to actually perform crossover only for couples (s1`, s2`) and (s5`, s6`). For each couple, we randomly extract a crossover point, for instance 2 for the first and 5 for the second
34
DATSI, Universidad Politécnica de Madrid
MAXONE problem (crossover)
s1` = 1111010101 s2` = 1110110101
s5` = 0100010011 s6` = 1110111101
Before crossover:
After crossover:
s1`` = 1110110101 s2`` = 1111010101
s5`` = 0100011101 s6`` = 1110110011
35
DATSI, Universidad Politécnica de Madrid
MAXONE problem (mutation)
• The final step is to apply random mutation: for each bit that we are to copy to the new population we allow a small probability of error (for instance 0.1)
• Before applying mutation:
s1`` = 1110110101 s4`` = 0111000101
s2`` = 1111010101 s5`` = 0100011101
s3`` = 1110111101 s6`` = 1110110011
36
DATSI, Universidad Politécnica de Madrid
MAXONE problem (mutation)
After applying mutation:
s1``` = 1110100101 f (s1``` ) = 6
s2``` = 1111110100 f (s2``` ) = 7
s3``` = 1110101111 f (s3``` ) = 8
s4``` = 0111000101 f (s4``` ) = 5
s5``` = 0100011101 f (s5``` ) = 5
s6``` = 1110110001 f (s6``` ) = 6
37
DATSI, Universidad Politécnica de Madrid
MAXONE problem
• In one generation, the total population fitness changed from 34 to 37, thus improved by ~9%.
• At this point, we go through the same process all over again, until a stopping criterion is met
38
DATSI, Universidad Politécnica de Madrid
Local Optimizations (Lamark)
Fitness calculation performs an improved solution in the neibourghood:– Lamarkian Evolution: the inheritance of acquired
characteristics.– Methods:
Hill-climbing Simulated annealing
39
DATSI, Universidad Politécnica de Madrid
Local Optimizations (Baldwin)
Other alternatives:– Evolve fitness
function– But keep
genetic representation
As a result:– New fitness
landscape
© L.J. Pit, ‘96
40
DATSI, Universidad Politécnica de Madrid
GAs: Why do they work?
• Schema Theorem
• Building Blocks Hypothesis
41
DATSI, Universidad Politécnica de Madrid
Schema Notation
• {0,1,#} is the symbol alphabet, where # is a special wild card symbol
• A schema is a template consisting of a string composed of these three symbols
• Example: the schema [1#00#] matches the strings: [10000], [10001], [11000] and [11001]
100
1 000 0
1 000 1
1 100 0
1 100 1
42
DATSI, Universidad Politécnica de Madrid
Schema
“A schema is a similarity template describing a subset of strings with similarities at certain positions” John Holland
Crossover causes low order schema, to get increased representation in the next generation.
Mutation causes long schema to be destroyed! Thus short schema have a better chance of surviving to the next generation
43
DATSI, Universidad Politécnica de Madrid
Schema Theorem and Building Block
GAs explore the search space by short, low-order schemata which, subsequently, are used for information exchange during crossover
Objective: Building blocks (set of alleles with representative participation on fitness/solution quality).
44
DATSI, Universidad Politécnica de Madrid
Building Blocks Hypothesis
“A genetic algorithm seeks near-optimal performance through the juxtaposition of short, low-order, high-performance schemata, called the building blocks”
Some building blocks (short, low-order schemata) can mislead GA and cause its convergence to suboptimal points
45
DATSI, Universidad Politécnica de Madrid
Differences betwwen GA and Conventional Search Algorithms
GA works on a coding of the parameters set, not the parameter themselves
GA searches from a population of points, not a single point GA uses only a payoff function, and no domain knowledge GA uses probabilistic transition rules, not deterministic
ones GA can provide a number of potential solutions to a given
problem. The final choice is left to the user.
46
DATSI, Universidad Politécnica de Madrid
Conclusions
“Genetic Algorithms are good at taking large, potentially huge search spaces and navigating them, looking for optimal
combinations of things, solutions you might not otherwise find in a lifetime.”
Salvatore Mangano
Computer Design, May 1995