Tutorial by David E. Goldberg at 2009 ACM Genetic and Evolutionary Computation Summit in Shanghai, China.
Fast, Effective GAs for Large, Hard Problems
David E. Goldberg
Illinois Genetic Algorithms Laboratory
University of Illinois at Urbana-Champaign
Urbana, IL 61801 USA
Email: [email protected]; Web: http://www.illigal.uiuc.edu
GAs Had Their Warhol 15, Right?
Evolution is timeless, but GAs are so 90s.
First-generation GA results were mixed in practice: sometimes they worked, sometimes not, and first impressions stuck.
But GAs had legs. In the 90s, the logical continuation of GA thinking led to
◦ completion of the theory in a certain sense, and
◦ GAs that solve large, hard problems quickly, reliably, and accurately.
Consider the design theory and the designs that have led to reliable solution of difficult problems.
Andy Warhol (1928-1987)
Roadmap
One-minute genetic algorithmist.
The unreasonableness of GAs.
Airplane & toaster design: A lesson from the Wright Brothers.
Goals of GA design.
Design decomposition step by step.
Competent GA design from the fast messy GA to hBOA.
From competence to efficiency: When merely fast is not enough.
A billion bits or bust.
One-Minute Genetic Algorithmist
What is a GA?
Solutions as chromosomes.
Means of evaluating fitness to purpose.
Create an initial population.
Apply selection and genetic operators:
◦ Survival of the fittest.
◦ Mutation.
◦ Crossover.
Repeat until good enough.
Puzzle: the operators by themselves are uninteresting.
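The loop above can be sketched in a few lines. This is a minimal illustration, not the tutorial's own code: it assumes bit-string chromosomes, the OneMax fitness (number of ones, introduced later in the deck), binary tournament selection, one-point crossover, and bitwise mutation at rate 1/l. All function names and parameter values are hypothetical.

```python
import random

def onemax(bits):
    """OneMax fitness: the number of ones in the string."""
    return sum(bits)

def tournament(pop, rng):
    """Binary tournament: the fitter of two random individuals survives."""
    a, b = rng.choice(pop), rng.choice(pop)
    return a if onemax(a) >= onemax(b) else b

def simple_ga(length=20, pop_size=40, p_c=0.9, generations=60, seed=0):
    rng = random.Random(seed)
    p_m = 1.0 / length                                # per-bit mutation rate
    # Create an initial population of random chromosomes.
    pop = [[rng.randint(0, 1) for _ in range(length)] for _ in range(pop_size)]
    for _ in range(generations):
        if max(map(onemax, pop)) == length:           # "good enough": optimum found
            break
        children = []
        while len(children) < pop_size:
            mom, dad = tournament(pop, rng), tournament(pop, rng)
            kid1, kid2 = mom[:], dad[:]
            if rng.random() < p_c:                    # one-point crossover
                cut = rng.randint(1, length - 1)
                kid1, kid2 = mom[:cut] + dad[cut:], dad[:cut] + mom[cut:]
            for kid in (kid1, kid2):                  # bitwise mutation
                for i in range(length):
                    if rng.random() < p_m:
                        kid[i] = 1 - kid[i]
                children.append(kid)
        pop = children[:pop_size]
    return max(pop, key=onemax)
```

Each operator is trivial on its own; only the loop that combines them does anything interesting, which is the puzzle the slide poses.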
Selection
Darwinian survival of the fittest.
Give more copies to better guys.
Ways to do it:
◦ roulette wheel
◦ tournament
◦ truncation
Gedanken experiment: run repeatedly without crossover or mutation.
By itself, selection picks the best.
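A minimal sketch of the three selection schemes named above, plus the selection-only Gedanken experiment. The function names are hypothetical. Roulette-wheel selection as written assumes non-negative fitness values that are not all zero.

```python
import random

def roulette_wheel(pop, fitness, rng):
    """Fitness-proportionate selection: pick one individual with probability f_i / sum(f)."""
    weights = [fitness(x) for x in pop]
    return rng.choices(pop, weights=weights, k=1)[0]

def tournament(pop, fitness, rng, size=2):
    """Pick `size` individuals uniformly at random; the fittest one wins."""
    return max((rng.choice(pop) for _ in range(size)), key=fitness)

def truncation(pop, fitness, s=2):
    """Make s copies each of the top 1/s of the population (deterministic)."""
    ranked = sorted(pop, key=fitness, reverse=True)
    top = ranked[: len(pop) // s]
    return [x for x in top for _ in range(s)]

# Gedanken experiment: selection only, no crossover or mutation.
pop = list(range(1, 9))          # individual i has fitness i
for _ in range(3):
    pop = truncation(pop, fitness=lambda x: x, s=2)
print(pop)  # → [8, 8, 8, 8, 8, 8, 8, 8]
```

After three rounds of truncation selection with s = 2, the best of eight individuals has taken over: selection alone only picks the best it already has.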
Crossover
Combine bits and pieces of good parents.
Speculate on new, possibly better children.
Gedanken experiment: 50 copies of 11111 and 50 copies of 00000, crossover only.
By itself, crossover is a random shuffle.
Example (one-point crossover after the second bit):
Before crossover   After crossover
11111              11000
00000              00111
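One-point crossover reproduces the example above when the cut falls after the second bit. The sketch below also runs the Gedanken experiment with crossover only; the helper name and the random cut points are illustrative assumptions.

```python
import random
from typing import Tuple

def one_point_crossover(a: str, b: str, cut: int) -> Tuple[str, str]:
    """Swap the tails of two equal-length parents after position `cut`."""
    assert len(a) == len(b) and 0 < cut < len(a)
    return a[:cut] + b[cut:], b[:cut] + a[cut:]

print(one_point_crossover("11111", "00000", 2))  # → ('11000', '00111')

# Gedanken experiment: 50 x 11111 and 50 x 00000, crossover only.
rng = random.Random(1)
pop = ["11111"] * 50 + ["00000"] * 50
for _ in range(20):
    rng.shuffle(pop)
    nxt = []
    for i in range(0, len(pop), 2):
        nxt.extend(one_point_crossover(pop[i], pop[i + 1], rng.randint(1, 4)))
    pop = nxt
# Crossover only reshuffles: every position still holds exactly fifty 1s.
print([sum(s[j] == "1" for s in pop) for j in range(5)])  # → [50, 50, 50, 50, 50]
```

Because crossover only exchanges existing bits between mates, the count of ones at each position never changes: by itself it is a shuffle.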
Mutation
Mutation is a random alteration of a string.
Change a gene: a small movement in the neighborhood.
Gedanken experiment: 100 copies of 11111, mutation only.
By itself, mutation is a random walk.
Example:
Before mutation   After mutation
11111             11011
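A minimal sketch of mutation on bit strings. `flip` reproduces the single-gene change in the example (third position), and `bitwise_mutation` is the common per-bit variant; both names are hypothetical.

```python
import random

def flip(s: str, i: int) -> str:
    """Flip the gene at index i: one small move in the neighborhood."""
    return s[:i] + ("0" if s[i] == "1" else "1") + s[i + 1:]

def bitwise_mutation(s: str, p_m: float, rng: random.Random) -> str:
    """Flip each gene independently with probability p_m."""
    return "".join(("0" if c == "1" else "1") if rng.random() < p_m else c for c in s)

print(flip("11111", 2))  # → 11011
```

Applied repeatedly with no selection, either function just wanders through the neighborhood, which is the random walk of the Gedanken experiment.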
The Unreasonableness of GAs
How do individually uninteresting operators yield interesting behavior?
Others talk about emergence.
1983 innovation intuition: genetic algorithm power is like that of human innovation.
Separate
◦ selection + mutation as hillclimbing or kaizen;
◦ selection + recombination. Let’s examine.
Different modes or facets of innovation or invention.
Selection + Recombination = Innovation
Combine notions to form ideas.
“It takes two to invent anything. The one makes up combinations; the other chooses, recognizes what he wishes and what is important to him in the mass of the things which the former has imparted to him.” P. Valéry
Paul Valéry (1871-1945)
Airplane & Toaster Design
Airport story.
Why do the rules change?
Legacy of Descartes: separation of mind and body.
Material machines (airplanes, toasters, autos, etc.) vs. conceptual machines (GAs, neural nets, computer programs, algorithms).
Design is design is design.
Two Bicycle Mechanics from Ohio
Four years, 1899-1903.
Three gliders.
Orville and Wilbur Wright created powered flight from scratch.
Query: Why were the Wright brothers the first to fly?
Hypotheses
The Wrights flew because they were bicycle mechanics.
The Wrights flew because it was part of the zeitgeist.
The Wrights flew because they were bachelors!
Maybe the Wrights flew because they were better inventors.
December 17, 1903: The Most Famous Moment in Aviation History
The Wright Brothers’ Secret
Functional decomposition.
Three subproblems:
◦ Stability: wing warping plus elevator in the 1899 glider model; the 1902 glider had three-axis active control.
◦ Lift and drag: wing shape improved on Lilienthal’s through wind tunnel experiments.
◦ Propulsion: a rotary wing with forward lift is a propeller.
But Decomposition is Old Hat to Moderns
Computer science is about one thing: busting big problems up into lots of little problems.
Descartes’s theory of decomposition (1637): Discourse on the Method of Rightly Conducting One’s Reason and of Seeking Truth in the Sciences.
What else distinguishes the Wrights’ method of invention? Their method of modeling and integration was different.
Lessons of the Wright Brothers
Effective design decomposition of your problem.
Facetwise, economic models of subproblem facets.
Bounding empirical study and calibration.
Scaling laws (dimensional analysis) are particularly important.
Goals of GA Design
Solve
◦ hard problems,
◦ quickly,
◦ accurately,
◦ and reliably.
Call a GA that achieves these goals competent.
Can we design competent GAs?
Effective Theory in GA Design
Many GAs don’t scale, and much GA theory is inapplicable.
We need design theory that works:
◦ Understand building blocks (BBs), notions or subideas.
◦ Ensure BB supply.
◦ Ensure BB growth.
◦ Control BB speed.
◦ Ensure good BB decisions.
◦ Ensure good BB mixing (exchange).
◦ Know BB challengers.
We can use theory to design scalable & efficient GAs.
Play the GA Game
I give you a population of strings S_i.
I give you a list of associated fitness values f_i (bigger is better).
I ask you to create a better string.
Blind: no equation relates f_i and S_i.
String   Fitness
10111    10
01000    5
11010    20
00011    3
What Are We Processing?
Similarities among strings.
Schemata are similarity subsets.
Schemata are described by similarity templates.
Example: *1*** = {strings with a 1 in the second position}.
Each string of length l belongs to $2^l$ schemata, so a population of n strings contains between $2^l$ and $n \cdot 2^l$ schemata.
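A small sketch of schema processing on the GA-game population. It checks schema membership, computes the observed average fitness of *1*** (the strings 01000 and 11010), and counts the schemata the four strings represent. The helper names are hypothetical.

```python
from itertools import product

# Population and fitness values from the GA game.
POP = {"10111": 10, "01000": 5, "11010": 20, "00011": 3}

def matches(schema: str, s: str) -> bool:
    """A string is an instance of a schema if it agrees on every fixed (non-*) position."""
    return all(h == "*" or h == c for h, c in zip(schema, s))

def schema_average(schema: str, pop: dict) -> float:
    """Average fitness of the population members that are instances of `schema`."""
    vals = [f for s, f in pop.items() if matches(schema, s)]
    return sum(vals) / len(vals)

def schemata_of(s: str) -> set:
    """All 2**l schemata a string belongs to: each position is either its bit or *."""
    return {"".join(t) for t in product(*[(c, "*") for c in s])}

print(schema_average("*1***", POP))  # → 12.5
all_schemata = set().union(*(schemata_of(s) for s in POP))
print(2 ** 5 <= len(all_schemata) <= len(POP) * 2 ** 5)  # → True
```

Four evaluations yield fitness information about many more than four similarity subsets, which is the implicit parallelism a GA exploits.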
String Fitness10111 1001000 511010 2000011 3
Schema Theorem (Holland 1975)
$m(H, t+1) \ge m(H, t)\,\frac{f(H, t)}{\bar{f}(t)}\left[1 - p_c\,\frac{\delta(H)}{l-1} - p_m\, o(H)\right]$
where
f: fitness ($\bar{f}(t)$: average population fitness)
H: schema
m: number of schema representatives
δ: defining length
o: schema order
$p_c$: probability of crossover
$p_m$: probability of mutation
l: string length
Little schemata grow logistically.
A necessary condition for BB growth.
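The bound can be evaluated directly. This sketch compares two hypothetical schemata with the same above-average fitness ratio (1.2) and order 2 in strings of length 11: one short (δ = 1) and one long (δ = 10). The parameter values are illustrative assumptions.

```python
def schema_growth_bound(m, f_H, f_avg, p_c, p_m, delta, order, length):
    """Schema-theorem lower bound on E[m(H, t+1)] (argument names are illustrative)."""
    survival = 1.0 - p_c * delta / (length - 1) - p_m * order
    return m * (f_H / f_avg) * survival

short = schema_growth_bound(10, 12.0, 10.0, p_c=0.9, p_m=0.01, delta=1, order=2, length=11)
long_ = schema_growth_bound(10, 12.0, 10.0, p_c=0.9, p_m=0.01, delta=10, order=2, length=11)
print(round(short, 2), round(long_, 2))  # → 10.68 0.96
```

The short schema's guaranteed count grows from 10 to at least 10.68, while the long one is guaranteed almost nothing. This is why the theorem only promises growth for little schemata.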
Practical Schema Theorem for Design
Fitness multiplier = s.
Overall disruption = $p_c$.
Condition for BB growth: $s(1 - p_c) > 1$, that is, $p_c < 1 - \frac{1}{s}$.
Goldberg & Sastry, 2001
Problem Difficulty
There are hard problems & easy problems.
A three-way decomposition. The core:
◦ deception: intra-BB difficulty,
◦ scaling: inter-BB difficulty,
◦ noise: extra-BB difficulty.
Easy Problems & Hard Problems
The OneMax problem: linear, uniformly scaled.
$f(\mathbf{x}) = \sum_{i=1}^{l} x_i, \quad x_i \in \{0, 1\}$
Define u, the unitation variable: the number of ones in a binary string, $u = \sum_{i=1}^{l} x_i$.
Needle-in-a-haystack (NIAH) problem:
$f(\mathbf{x}) = \begin{cases} f^* & \mathbf{x} = \mathbf{x}^* \\ 0 & \text{elsewhere} \end{cases}$
No regularity to infer where good solutions might lie.
Nothing does better than enumeration or random search.
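A sketch of the two extremes, with hypothetical function names. On OneMax every single-bit improvement is visible. On NIAH every neighbor of a non-optimal point scores zero, so there is no regularity for any search to exploit.

```python
def unitation(x):
    """u(x): the number of ones in the binary string x."""
    return sum(x)

def onemax(x):
    """Linear, uniformly scaled: f(x) = u(x)."""
    return unitation(x)

def needle(x, x_star, f_star=1.0):
    """NIAH: f* at the single point x*, zero everywhere else."""
    return f_star if list(x) == list(x_star) else 0.0

l = 8
x, x_star = [0] * l, [1] * l
neighbors = [x[:i] + [1] + x[i + 1:] for i in range(l)]
# OneMax: each single-bit flip 0 -> 1 raises fitness by exactly 1.
gains = [onemax(y) - onemax(x) for y in neighbors]
# NIAH: every neighbor scores 0, so nothing points toward x*.
flat = [needle(y, x_star) for y in neighbors]
print(gains, flat)
```

On NIAH a search can only hit x* by luck, so random sampling needs about $2^l$ evaluations on average.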
Designing a Harder Problem
Low-order estimates mislead the GA.
Let $x^* = 111$: $f_{111} > f_i$ for all $i \ne 111$.
Require the complementary schemata to be better than their competitors (e.g., $f(0{*}{*}) > f(1{*}{*})$).
[Figure: squashed Hamming cube representation]
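4-bit Deceptive Trap
[Figure: 4-bit deceptive trap function]
$d = 1$, $\sigma_{bb}^2 = 1.215$, $m = 20$ (d: signal between best and second-best BB; $\sigma_{bb}^2$: BB fitness variance; m: number of BBs)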
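The listed parameters can be recovered from a trap function. This sketch assumes the standard fully deceptive k-bit trap, $f(u) = k$ if $u = k$ and $f(u) = k - 1 - u$ otherwise. The figure that defined the function is lost, so that form is an assumption; it does reproduce the stated $d = 1$ and $\sigma_{bb}^2 = 1.215$ exactly.

```python
from itertools import product
from statistics import pvariance

K = 4  # bits per building block

def trap(u: int, k: int = K) -> int:
    """Assumed fully deceptive k-bit trap: optimum at u = k, deceptive slope toward u = 0."""
    return k if u == k else k - 1 - u

values = [trap(sum(bits)) for bits in product((0, 1), repeat=K)]
d = trap(K) - trap(0)              # signal: best BB (1111) vs. deceptive attractor (0000)
var_bb = float(pvariance(values))  # BB fitness variance over all 2**k settings
print(d, round(var_bb, 3))         # → 1 1.215
```

The exact variance is 311/256 ≈ 1.2148. Every step away from 0000 lowers fitness until the last one, so low-order statistics point the GA the wrong way.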
Good Decisions: 2- & k-Armed Bandit
Competing order-one schemata form a two-armed bandit. Example: 0**** versus 1****.
Give exponentially increasing trials to the observed best.
fff** (three fixed positions) is a $2^3 = 8$-armed bandit.
Many bandit problems are played in parallel.
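Gambler’s Ruin Population Size
Make $P_{bb} = Q$ and solve for n, with $\alpha = 1 - Q$ and p the probability that the best BB wins a single competition:
$n = \frac{2^k \ln(\alpha)}{\ln\left(\frac{1-p}{p}\right)}$
In terms of signal and noise:
$n = -2^{k-1} \ln(\alpha)\,\frac{\sigma_{bb}\sqrt{\pi m}}{d}$
Compare with the populationwise population-sizing equation:
$n = 2\,c(\alpha)\,2^k\, m\,\frac{\sigma_M^2}{d^2}$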
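Both gambler's-ruin forms are easy to evaluate. This sketch plugs in the 4-bit trap statistics ($k = 4$, $d = 1$, $\sigma_{bb}^2 = 1.215$, $m = 20$) with an assumed failure tolerance $\alpha = 0.1$. The choice of α and the function names are illustrative.

```python
import math

def pop_size_from_p(k, p, alpha):
    """n = 2^k ln(alpha) / ln((1-p)/p), where p > 0.5 is the chance the best BB wins one competition."""
    return 2 ** k * math.log(alpha) / math.log((1 - p) / p)

def pop_size_signal_noise(k, m, d, sigma_bb, alpha):
    """n = -2^(k-1) ln(alpha) sigma_bb sqrt(pi m) / d."""
    return -(2 ** (k - 1)) * math.log(alpha) * sigma_bb * math.sqrt(math.pi * m) / d

n = pop_size_signal_noise(k=4, m=20, d=1.0, sigma_bb=math.sqrt(1.215), alpha=0.1)
print(math.ceil(n))  # → 161
```

The signal-and-noise form scales as $2^k$ in building-block order, as $1/d$ in signal, and as $\sqrt{m}$ in the number of BBs. That is why halving the signal doubles the required population.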
100-bit OneMax
The Complexity Temptation
First complexity results for GAs.
Calculate $W = nt$: function evaluations as the product of population size n and run duration t (single epoch).
A Sense of Time
Truncation selection: make s copies each of the top 1/s-th of the population.
$P(t+1) = s\,P(t)$ until $P(t) = 1$, so $P(t) = s^t P(0)$.
Solve for the takeover time $t^*$: the time to go from one good guy to all good guys (or all but one). With $P(0) = 1/n$:
$t^* = \frac{\ln n}{\ln s}$
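A small check of the takeover-time result, assuming one best individual in a population of n (so $P(0) = 1/n$). The simulation multiplies the number of best copies by s each generation, capped at n. The last line evaluates the complexity estimate $W = nt$ with $t = t^*$. Names are hypothetical.

```python
import math

def takeover_generations(n: int, s: int) -> int:
    """Generations of truncation selection until one best individual fills the population."""
    count, t = 1, 0
    while count < n:
        count = min(n, count * s)   # best copies sit in the top 1/s, so they multiply by s
        t += 1
    return t

n, s = 1024, 2
print(takeover_generations(n, s), math.log(n) / math.log(s))
print("W = n t* =", n * takeover_generations(n, s))  # → W = n t* = 10240
```

For n = 1024 and s = 2 the simulated takeover takes 10 generations, matching $\ln 1024 / \ln 2 = 10$. When n is not a power of s, the simulation returns the ceiling of $t^*$.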
So What?
Who cares about selection alone? I want to analyze a “real GA”.
How can selection-only analysis help me?
Answer: imagine another characteristic time, the innovation or mixing time.
The Innovation Time, $t_i$
Innovation time is the average time to create an individual better than the best so far.
Under crossover, let $p_i$ be the probability that a recombination event creates a better individual.
Innovation probability in Goldberg, Deb & Thierens (1993) and Thierens & Goldberg (1993).
$t_i = \frac{c_i}{n\,p_c\,p_i}$
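Selection-only analysis pays off when the two characteristic times are compared. Mixing must innovate before selection takes over, so we need $t_i < t^*$. This sketch treats the constant $c_i$ as a hypothetical placeholder (default 1) and uses truncation selection's $t^* = \ln n / \ln s$.

```python
import math

def innovation_time(n, p_c, p_i, c_i=1.0):
    """t_i = c_i / (n p_c p_i); c_i is an unspecified constant (1.0 here is a placeholder)."""
    return c_i / (n * p_c * p_i)

def takeover_time(n, s):
    """t* = ln n / ln s for truncation selection."""
    return math.log(n) / math.log(s)

def innovation_wins_race(n, s, p_c, p_i, c_i=1.0):
    """The GA keeps innovating only if t_i < t*."""
    return innovation_time(n, p_c, p_i, c_i) < takeover_time(n, s)

print(innovation_wins_race(1000, 2, 0.9, 0.01))   # → True
print(innovation_wins_race(100, 2, 0.9, 1e-5))    # → False
```

Because $t_i$ falls as $1/n$ while $t^*$ grows only as $\ln n$, larger populations help innovation win. When $p_i$ is tiny, as on hard problems, the race is lost unless n grows dramatically.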
Schematic of the RaceSchematic of the Race
Golf Clubs Have Sweet Spots
So do GAs.
Easy problems, big sweet spots: a monkey can set GA parameters.
Hard problems, vanishing sweet spots.
[Goldberg, Deb, & Thierens, 1993]
Dr. Jekyll & Mr. Hyde in Practice
The GA literature is full of evidence for this problem.
Evolution of the “typical practitioner”:
◦ First application goes swimmingly.
◦ More complex application needs TLC.
◦ The Big Kahuna needs compute time on the order of the age of the universe.
Why are we fiddling with codings and operators? Aren’t GAs robust?
No. First-generation GAs are not.
Simple GAs Are Mixing Limited
With growing difficulty, the “sweet spot” vanishes.
Or populations must grow exponentially.
The Key: Not the Schema Theorem
Much theory focuses on Holland’s schema theorem.
The schema theorem is a piece of cake.
Make sure the GA fires on all seven cylinders of the design decomposition.
Surprise: mixing is the key.
To mix well, the GA must get building blocks right.
Effective GAs identify the structure of the problem.
Data mine early samples for the structure of the landscape.
GA Kitty Hawk: 1993
1993 & the fast messy GA.
Moveable bits, cutting and splicing, building-block filtering mechanism.
Original mGA complexity estimated: $O(l^5)$.
Compares favorably to hillclimbing, too (Mühlenbein 1992).
[Goldberg, Deb, Kargupta, & Harik, 1993]
Look Ma, No Genetics: hBOA
Replace genetics with probabilistic model building: a probabilistic model-building GA (PMBGA), or estimation of distribution algorithm (EDA).
3 main elements:
◦ Decomposition (structural learning): learn what to mix and what to keep intact.
◦ Representation of BBs (chunking): means of representing alternative solutions.
◦ Diversification of BBs (niching): preserve alternative chunks of solutions.
Test on adversarially designed functions so it works on yours.
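The select, build model, sample cycle of an EDA can be shown with the simplest possible model. This sketch is a univariate EDA in the style of UMDA, not hBOA. It learns independent bit marginals, so it performs no structural learning or niching and would fail on the deceptive traps above; it only illustrates "genetics replaced by a model" on OneMax. The probability clamp to $[1/l, 1 - 1/l]$ is a common tweak, assumed here.

```python
import random

def umda_onemax(length=30, pop_size=100, generations=50, seed=1):
    """Univariate EDA on OneMax: select, fit bit marginals, sample a new population."""
    rng = random.Random(seed)
    probs = [0.5] * length                      # model: P(x_i = 1), independent bits
    lo, hi = 1.0 / length, 1.0 - 1.0 / length   # clamp keeps every bit value sampleable
    best = None
    for _ in range(generations):
        # Sample a population from the current probabilistic model.
        pop = [[int(rng.random() < p) for p in probs] for _ in range(pop_size)]
        pop.sort(key=sum, reverse=True)
        if best is None or sum(pop[0]) > sum(best):
            best = pop[0]
        # Truncation selection: keep the top half.
        selected = pop[: pop_size // 2]
        # Model building: re-estimate each marginal from the selected individuals.
        probs = [min(hi, max(lo, sum(x[i] for x in selected) / len(selected)))
                 for i in range(length)]
    return best
```

hBOA replaces the independent marginals with a Bayesian network learned from the selected individuals (decomposition and chunking) and adds niching. That is what lets it mix whole building blocks instead of single bits.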
Schematic of BOA Structure
[Figure: current population → selection → Bayesian network → new population]
Results on Spin Glasses
Pelikan et al. (2002)
[Figure: number of evaluations vs. problem size (64 to 400) for hBOA; fitted growth $O(n^{3.51})$]
From Competence to Efficiency
Motivation: even competent GAs require $O(l^2)$ time.
For l = 1000, 1000 × 1000 = a million function evaluations.
In real problems, this can be a problem.
How can we systematically achieve speedups?
IlliGAL Decomposition of Efficiency
1. Space: parallelization
2. Time: continuation
3. Fitness: evaluation relaxation
4. Specialization: hybridization
Computation time (P processors, $T_f$ per function evaluation): $\frac{n T_f}{P}$
Communications time ($T_c$ per processor): $P T_c$
Less computation, more communication.
Master-Slave Parallel GAs
Account for Time (and Quality)
Use the perspective of the master:
$T_p = \frac{n T_f}{P} + P T_c$
Minimize time:
$P^* = \sqrt{\frac{n T_f}{T_c}}$
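The optimum follows from setting the derivative of $T_p$ with respect to P to zero, treating P as continuous. The second derivative $2nT_f/P^3$ is positive, so this is a minimum:

```latex
\frac{dT_p}{dP} = -\frac{n T_f}{P^2} + T_c = 0
\;\Longrightarrow\; P^* = \sqrt{\frac{n T_f}{T_c}},
\qquad
T_p(P^*) = \frac{n T_f}{P^*} + P^* T_c = 2\sqrt{n T_f T_c}.
```

At the optimum, computation and communication each take half of the master's time.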
Master-Slave Example
Dummy function: $T_f = 4$ ms.
Communications time: $T_c = 19$ ms.
Population size: 120; string length: 80.
$P^* = \sqrt{120 \cdot 4 / 19} \approx 5.03$
Cantú-Paz, E., & Goldberg, D. E. (1999). On the scalability of parallel genetic algorithms. Evolutionary Computation, 7(4), 429–449.
Speedups and Efficiency
$S_p = \frac{T_s}{T_p} = \frac{n T_f}{\frac{n T_f}{P} + P T_c}$
[Figure: speedup vs. P for $T_f / T_c = 1, 10, 100$]
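The master-slave model can be evaluated numerically with the example's numbers ($T_f = 4$ ms, $T_c = 19$ ms, n = 120). This sketch assumes the serial time is $T_s = n T_f$ and ignores any other master overhead. Function names are hypothetical.

```python
import math

def parallel_time(n, t_f, t_c, p):
    """Master's view of one generation: T_p = n T_f / P + P T_c."""
    return n * t_f / p + p * t_c

def optimal_slaves(n, t_f, t_c):
    """P* = sqrt(n T_f / T_c), from dT_p/dP = 0."""
    return math.sqrt(n * t_f / t_c)

def speedup(n, t_f, t_c, p):
    """S_p = T_s / T_p with serial time T_s = n T_f."""
    return n * t_f / parallel_time(n, t_f, t_c, p)

n, t_f, t_c = 120, 4.0, 19.0
print(round(optimal_slaves(n, t_f, t_c), 2))   # → 5.03
print(parallel_time(n, t_f, t_c, 5))           # → 191.0
print(round(speedup(n, t_f, t_c, 5), 2))       # → 2.51
```

At $P^*$ the speedup is exactly $P^*/2$, because half the time goes to communication. Cheap evaluations relative to communication (small $T_f / T_c$) therefore cap the useful number of slaves quickly.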
My Dr. Evil Moment
Lunchtime question: do real large problems draw attention to theoretical & design findings?
Dr. Evil’s mistake: wondered if GAs could go to $10^6$ variables.
Decided to go for a billion.
Use a simple underlying problem (OneMax) with Gaussian noise (variance 0.1 times that of the deterministic problem).
Don’t try this at home!!!
“We get the warhead and then hold the world ransom for... 1 MILLION DOLLARS!”
Road to Billion Paved with Speedups
Naïve implementation: 100 terabytes & $2^{72}$ random number calls.
cGA memory: $O(\ell)$ vs. $O(\ell^{1.5})$.
Parallelization: speedup $n_p$.
Vectorize four bits at a time: speedup 4.
Other doodads (bitwise ops, limit flops, inline functions, precomputed evals): speedup 15.
Generations & population size scale as expected.
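The memory claim comes from the compact GA (cGA), which stores only one probability per bit instead of a population. Below is a minimal, unoptimized sketch on OneMax with optional additive Gaussian noise. It includes none of the vectorization or parallel speedups listed above. The noise level is a free parameter here, not the billion-bit setup, and the function name is hypothetical.

```python
import random

def cga_onemax(length=20, n=100, noise_sd=0.0, seed=3, max_steps=1_000_000):
    """Compact GA on (optionally noisy) OneMax; stores only a probability vector, O(l) memory.

    Probabilities are kept as integer counts c_i / n so that steps of 1/n are exact.
    """
    rng = random.Random(seed)
    counts = [n // 2] * length                      # p_i = counts[i] / n, start near 0.5

    def fitness(x):
        # OneMax plus optional Gaussian noise.
        return sum(x) + (rng.gauss(0.0, noise_sd) if noise_sd > 0 else 0.0)

    for _ in range(max_steps):
        if all(c in (0, n) for c in counts):       # every p_i has converged to 0 or 1
            break
        a = [int(rng.random() * n < c) for c in counts]
        b = [int(rng.random() * n < c) for c in counts]
        winner, loser = (a, b) if fitness(a) >= fitness(b) else (b, a)
        for i in range(length):
            if winner[i] != loser[i]:
                counts[i] += 1 if winner[i] == 1 else -1   # shift p_i by 1/n toward the winner
    return [int(c == n) for c in counts]
```

Each step generates two individuals, lets them compete, and nudges each probability by 1/n toward the winner. This mimics a GA of population size n with tournament selection while never storing that population.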
A Billion Bits or Bust
A simple hillclimber solves $1.6 \times 10^4$ ($2^{14}$) variables.
A souped-up cGA solves 33 million ($2^{25}$) variables to full convergence.
It solves 1.1 billion ($2^{30}$) variables with relaxed convergence.
The growth rate is the same; solvable to convergence.
Design Fast, Effective GAs
GA design advanced by taking GA ideas and running with them.
Large, difficult problems are within grasp.
Theory and practice are in sync.
These direct lessons are crucial.
The meta-lessons of this style of thinking are as important for complex systems & interdisciplinary work generally.
This style of theory works for all GEC.
More Information
Goldberg, D. E. (2002). The design of innovation: Lessons from and for competent genetic algorithms. Boston, MA: Kluwer Academic Publishers.
Lab: www.illigal.org
iFoundry: www.ifoundry.illinois.edu
Email: [email protected]