Pkl Seminar


Runtime Analysis of Evolutionary Algorithms

Per Kristian Lehre

School of Computer Science, University of Nottingham

Functional Programming Laboratory Seminar

Nottingham, November 25th 2011



Black Box Optimisation

Function class F

f : X → R

Photo: E. Gerhard (1846).

Queries: $x_1, x_2, x_3, \ldots, x_t$

Responses: $f(x_1), f(x_2), f(x_3), \ldots, f(x_t)$

[Droste et al., 2006]
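The black-box setting can be illustrated with a small sketch in which the optimiser only learns about f through queries. The names `BlackBox`, `random_search` and `onemax` are illustrative assumptions, not taken from the slides, and uniform random search is used only as the simplest possible query strategy.

```python
import random
from typing import Callable, List, Sequence, Tuple


class BlackBox:
    """Hides the objective f; the optimiser only sees answers to queries."""

    def __init__(self, f: Callable[[Sequence[int]], float]) -> None:
        self._f = f
        self.queries = 0  # number of f-evaluations made so far

    def __call__(self, x: Sequence[int]) -> float:
        self.queries += 1
        return self._f(x)


def onemax(x: Sequence[int]) -> int:
    """Example objective (an assumption): number of one-bits."""
    return sum(x)


def random_search(oracle: BlackBox, n: int, budget: int,
                  rng: random.Random) -> Tuple[List[int], float]:
    """Query `budget` uniformly random bitstrings and keep the best one."""
    best_x: List[int] = []
    best_val = float("-inf")
    for _ in range(budget):
        x = [rng.randint(0, 1) for _ in range(n)]
        val = oracle(x)
        if val > best_val:
            best_x, best_val = x, val
    return best_x, best_val


if __name__ == "__main__":
    oracle = BlackBox(onemax)
    x, val = random_search(oracle, n=10, budget=50, rng=random.Random(0))
    print(x, val, oracle.queries)
```

Counting queries rather than wall-clock time is the cost measure that the runtime definitions later in the talk rely on.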



Evolutionary Algorithms

Selection

Variation



Meta-heuristics in Operations Research

Meta-heuristics are practical optimisation techniques

a pragmatic approach to NP-hard optimisation problems

easy to implement, adaptable to many problem domains

often produce solutions of strikingly good quality



A Theory of Meta-heuristics?

Weak theoretical foundation has been a major critique

lack of performance guarantees

impact of parameter settings poorly understood

no deep understanding of how and why they work

Largely ignored by the general theory community

not “real algorithms”

no guarantees about performance

mathematically very challenging



An Important Challenge...

“Developing the mathematical methodology for

explaining and predicting the performance of these and

other heuristics is one of the most important challenges

facing the fields of optimization and algorithms today.”

Papadimitriou and Steiglitz (1998)



Outline

Introduction

Runtime Analysis
    Basic Definitions
    Overview of Results

Evolutionary Algorithms
    Exploration vs Exploitation
    Analytical Techniques

Directions for Further Work
    Heuristic Understanding
    Systems to Build Systems

Conclusion



Runtime Analysis of Meta-heuristics

General Practical Question

Under what conditions will a given heuristic return solutions of acceptable quality within reasonable time?



Theoretical Approach

The runtime of a heuristic on a problem is the number of iterations until an optimal (or approximate) solution is found.

Analysis of how the runtime depends on

1. problem characteristics
2. parameters of the heuristic



Meta-heuristics are Randomised Algorithms

[Figure: density of the number of iterations of the (1+1) EA on an Easy FSM instance (n = 200). Horizontal axis: number of iterations. Vertical axis: density.]

Definition

The expected runtime of an algorithm A on function class F is

$T_{A,F} := \max_{f \in F} E[T_{A,f}]$

where $T_{A,f}$ is the number of f-evaluations before the optimum is found.
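The runtime of a single run is a random variable, so its expectation can only be estimated empirically by averaging over independent runs. The sketch below does this for a (1+1) EA, under the assumptions that the function is OneMax (rather than the FSM instance in the figure) and that the mutation rate is 1/n. It estimates the expectation for one function f only, not the maximum over a whole class F.

```python
import random
import statistics
from typing import List


def onemax(x: List[int]) -> int:
    """Number of one-bits; the optimum is the all-ones string."""
    return sum(x)


def one_plus_one_ea(n: int, rng: random.Random, max_evals: int = 10**6) -> int:
    """Run a (1+1) EA on OneMax and return the number of f-evaluations used."""
    x = [rng.randint(0, 1) for _ in range(n)]
    fx = onemax(x)
    evals = 1
    while fx < n and evals < max_evals:
        # Bit-wise mutation: flip each bit independently with probability 1/n.
        y = [b ^ 1 if rng.random() < 1.0 / n else b for b in x]
        fy = onemax(y)
        evals += 1
        if fy >= fx:  # elitist acceptance: never accept a worse offspring
            x, fx = y, fy
    return evals


def estimate_expected_runtime(n: int, runs: int, seed: int = 0) -> float:
    """Average the runtime over independent runs (one fixed f, one fixed n)."""
    rng = random.Random(seed)
    return statistics.mean(one_plus_one_ea(n, rng) for _ in range(runs))


if __name__ == "__main__":
    for n in (10, 20, 40):
        print(n, estimate_expected_runtime(n, runs=50))
```

Such estimates show how the runtime grows with n, but only a proof gives a guarantee that holds for all n.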



Expected runtime as a function of problem instance size

[Figure: RS on the Easy FSM instance class. Horizontal axis: number of states in the FSM (n). Vertical axis: iterations of RS.]

Exponential ⟹ Algorithm impractical on problem.


[Figure: (1+1) EA on the Easy FSM instance class. Horizontal axis: number of states in the FSM (n). Vertical axis: iterations of the (1+1) EA.]

Polynomial ⟹ Possibly efficient algorithm.



Analytical Tool Box

Artificial Fitness Levels [Wegener and Witt, 2005, Lehre, 2011a]

Concentration of Measure [Dubhashi and Panconesi, 2009]

Typical Runs

Expected Multiplicative Weight Decrease [Neumann and Wegener, 2007]

Drift Analysis [Hajek, 1982]

Branching Processes [Lehre and Yao, 2009]

Electrical Resistive Networks [Lehre and Haddow, 2006]

Yao’s Minimax Principle [Motwani and Raghavan, 1995]



OneMax
    (1+1) EA: $O(n \log n)$ [Mühlenbein, 1992]
    (1+λ) EA: $O(\lambda n + n \log n)$ [Jansen et al., 2005]
    (µ+1) EA: $O(\mu n + n \log n)$ [Witt, 2006]
    1-ANT: $O(n^2)$ w.h.p. [Neumann and Witt, 2006]
    (µ+1) IA: $O(\mu n + n \log n)$ [Zarges, 2009]

Linear Functions
    (1+1) EA: $\Theta(n \log n)$ [Droste et al., 2002] and [He and Yao, 2003]
    cGA: $\Theta(n^{2+\varepsilon})$, $\varepsilon > 0$ const. [Droste, 2006]

Max. Matching
    (1+1) EA: $e^{\Omega(n)}$, PRAS [Giel and Wegener, 2003]

Sorting
    (1+1) EA: $\Theta(n^2 \log n)$ [Scharnow et al., 2002]

SS Shortest Path
    (1+1) EA: $O(n^3 \log(n w_{\max}))$ [Baswana et al., 2009]
    MO (1+1) EA: $O(n^3)$ [Scharnow et al., 2002]

MST
    (1+1) EA: $\Theta(m^2 \log(n w_{\max}))$ [Neumann and Wegener, 2007]
    (1+λ) EA: $O(n \lambda \log(n w_{\max}))$, $\lambda = m^2/n$ [Neumann and Wegener, 2007]
    1-ANT: $O(mn \log(n w_{\max}))$ [Neumann and Witt, 2008]

Max. Clique (rand. planar)
    (1+1) EA: $\Theta(n^5)$ [Storch, 2006]
    (16n+1) RLS: $\Theta(n^{5/3})$ [Storch, 2006]

Eulerian Cycle
    (1+1) EA: $\Theta(m^2 \log m)$ [Doerr et al., 2007]

MinCut
    ACO with h = O(1): O(n2(h/)) [Kötzing et al., 2010]

Partition
    (1+1) EA: PRAS, avg. [Witt, 2005]

Vertex Cover
    (1+1) EA: $e^{\Omega(n)}$, arb. bad approx. [Friedrich et al., 2007] and [Oliveto et al., 2007]

Set Cover
    (1+1) EA: $e^{\Omega(n)}$, arb. bad approx. [Friedrich et al., 2007]
    SEMO: Pol. $O(\log n)$-approx. [Friedrich et al., 2007]

MaxLeafST (k leaves)
    (1+1) EA: $m^{\Omega(k)}$ [Kratsch et al., 2011]
    (1+1) EA edge-exch.: $O(2^{15 k^2 \log k})$ [Kratsch et al., 2011]

Intersection of p ≥ 3 matroids
    (1+1) EA: 1/p-approximation in $O(|E|^{p+2} \log(|E| w_{\max}))$ [Reichel and Skutella, 2008]

UIO/FSM conf.
    (1+1) EA: $e^{\Omega(n)}$ [Lehre and Yao, 2007]



What about population-based EAs?



“Given the mathematical difficulty of the infinite population size model, we doubt that a mathematical analysis of finite populations will be possible.”

[Mühlenbein, 1997]



Exploration vs Exploitation...

Selection

Variation



Evolutionary Algorithm for $\max_{x \in \{0,1\}^n} f(x)$

for t = 0, 1, 2, ... until termination condition do
    for i = 1 to λ do
        Sample i-th parent x according to p_sel(P_t, f)
        Sample i-th offspring P_{t+1}(i) according to p_mut(x)
    end for
end for
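The loop above can be sketched in Python as follows. The slide leaves p_sel and p_mut abstract; this sketch assumes k-tournament selection and bit-wise mutation with rate p, the two operators discussed on the following slides. All function and parameter names are illustrative assumptions, and a fixed number of generations stands in for the termination condition.

```python
import random
from typing import Callable, List, Tuple

Bitstring = List[int]


def tournament_select(pop: List[Bitstring], f: Callable[[Bitstring], float],
                      k: int, rng: random.Random) -> Bitstring:
    """p_sel: draw k individuals uniformly with replacement, return the fittest."""
    contestants = [rng.choice(pop) for _ in range(k)]
    return max(contestants, key=f)


def bitwise_mutation(x: Bitstring, p: float, rng: random.Random) -> Bitstring:
    """p_mut: flip each bit independently with probability p."""
    return [b ^ 1 if rng.random() < p else b for b in x]


def evolutionary_algorithm(f: Callable[[Bitstring], float], n: int, lam: int,
                           k: int, p: float, generations: int,
                           rng: random.Random) -> Tuple[List[Bitstring], Bitstring]:
    """Non-elitist EA: every generation is built only from mutated offspring."""
    pop = [[rng.randint(0, 1) for _ in range(n)] for _ in range(lam)]
    best = max(pop, key=f)  # best individual seen so far (bookkeeping only)
    for _ in range(generations):
        next_pop = []
        for _ in range(lam):
            parent = tournament_select(pop, f, k, rng)
            next_pop.append(bitwise_mutation(parent, p, rng))
        pop = next_pop
        candidate = max(pop, key=f)
        if f(candidate) > f(best):
            best = candidate
    return pop, best


if __name__ == "__main__":
    pop, best = evolutionary_algorithm(sum, n=20, lam=30, k=4, p=1 / 20,
                                       generations=50, rng=random.Random(0))
    print(sum(best), best)
```

Note that `best` is only recorded for reporting. It is never copied back into the population, which is what makes the algorithm non-elitist.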


Selection and Variation


How large tournament size k?

k = 1 No selective pressure

Unbiased random walk

Efficient optimisation is impossible

k = λ Highest selective pressure

Only fittest individual reproduced

No population diversity



[Figure: runtime regime (exp or poly) over the mutation rate p and the tournament size k.]

Example

The runtime T of a non-elitist EA with

    tournament size k
    bit-wise mutation rate p
    population size λ > log(nr)

on any unimodal function with

    n Boolean variables
    r distinct function values

has expected value

$E[T] = \begin{cases} e^{\Omega(n)} & \text{if } k < e^{pn} \\ O(\lambda^2 r + nr) & \text{if } k > e^{pn} \end{cases}$

Lehre (PPSN’10), Lehre (GECCO’11)
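As a rough illustration, the threshold between the two regimes can be evaluated numerically. The helper names below are assumptions, and the check applies the stated inequality literally, ignoring any constants or side conditions that the full theorems may require.

```python
import math


def tournament_regime(k: int, p: float, n: int) -> str:
    """Compare the tournament size k with the threshold e^(p*n)."""
    threshold = math.exp(p * n)
    if k > threshold:
        return "poly"      # polynomial expected runtime regime
    if k < threshold:
        return "exp"       # exponential expected runtime regime
    return "boundary"      # the slide makes no statement for k = e^(p*n)


def min_tournament_size(p: float, n: int) -> int:
    """Smallest integer k that is strictly larger than e^(p*n)."""
    return math.floor(math.exp(p * n)) + 1


if __name__ == "__main__":
    n = 100
    for k in (2, 3, 7, 8):
        print(k, tournament_regime(k, 1 / n, n), tournament_regime(k, 2 / n, n))
```

For the common mutation rate p = 1/n the threshold is e ≈ 2.72, so a tournament size of 3 is already on the polynomial side, while for p = 2/n the threshold rises to about 7.39.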




How close to the global optimum in polynomial time?

Theorem ([Lehre, 2011b])

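With overwhelming probability (w.o.p.), the Hamming distance from any individual in the first $e^{cn}$ generations to the global optimum is at least

$n \left( \frac{1}{2} - \sqrt{ \frac{\ln k}{4pn} \left( 2 - \frac{\ln k}{pn} \right) } \right).$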

Other Example Applications




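Expected runtime of EA with bit-wise mutation rate χ/n

Selection Mechanism      High Selective Pressure             Low Selective Pressure
Fitness Proportionate    $\nu > f_{\max} \ln(2e^{\chi})$     $\nu < \chi / \ln 2$ and $\lambda \ge n^3$
Linear Ranking           $\eta > e^{\chi}$                   $\eta < e^{\chi}$
k-Tournament             $k > e^{\chi}$                      $k < e^{\chi}$
(µ, λ)                   $\lambda > \mu e^{\chi}$            $\lambda < \mu e^{\chi}$
Cellular EAs             -                                   $\Delta(G) < e^{\chi}$

Problem                  High Selective Pressure             Low Selective Pressure
OneMax                   $O(n\lambda^2)$                     $e^{\Omega(n)}$
LeadingOnes              $O(n\lambda^2 + n^2)$               $e^{\Omega(n)}$
Linear Functions         $O(n\lambda^2 + n^2)$               $e^{\Omega(n)}$
r-Unimodal               $O(r\lambda^2 + nr)$                $e^{\Omega(n)}$
Jump_r                   $O(n\lambda^2 + (n/\chi)^r)$        $e^{\Omega(n)}$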

Markov Chain Analysis often Difficult



Drift Analysis: Long-term behaviour of X from ∆



Theorem (Positive drift)

If there exists δ > 0 such that for all t ≥ 0

$E[\Delta_t \mid 0 < g(X_t)] \ge \delta$

...

then $E[T] \le s_{\max} / \delta$.

Theorem (Negative drift)

If there exists δ > 0 such that for all t ≥ 0

$E[\Delta_t \mid 0 < g(X_t) < s] \le -\delta$

...

then $E[T] \ge e^{cs}$.

[Hajek, 1982, Oliveto and Witt, 2010, He and Yao, 2001]
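A short worked example may help to see how the positive drift theorem is applied. It assumes a setting not spelled out on the slide: the (1+1) EA with mutation rate 1/n on OneMax, with distance function g(x) equal to the number of zero-bits, so that $s_{\max} = n$ and $\Delta_t = g(X_t) - g(X_{t+1})$.

```latex
% Assumptions (not from the slide): (1+1) EA, mutation rate 1/n, OneMax,
% g(x) = number of zero-bits in x, hence s_max = n.
\begin{align*}
\Pr[\text{exactly one given zero-bit flips, nothing else}]
  &= \frac{1}{n}\left(1 - \frac{1}{n}\right)^{n-1} \ge \frac{1}{en}, \\
E[\Delta_t \mid g(X_t) = i]
  &\ge \frac{i}{en} \ge \frac{1}{en} =: \delta
  \qquad (i \ge 1,\ \Delta_t \ge 0 \text{ by elitism}), \\
E[T] &\le \frac{s_{\max}}{\delta} = \frac{n}{1/(en)} = e n^2 .
\end{align*}
```

The resulting bound $O(n^2)$ is valid but weaker than the $O(n \log n)$ bound listed for the (1+1) EA on OneMax, because the constant drift bound ignores that progress is faster when many zero-bits remain.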

State Aggregation Problem



How to come up with an appropriate distance function g?

Should reflect the progress of the heuristic, given its state.

Often hard to find for single-individual heuristics.

Highly non-trivial to find for population-based heuristics.

A New Approach for Finite Populations


Fitness Levels (upper bounds)

Concentration of measure

Drift analysis




Population Drift (lower bounds)

Branching processes

Drift analysis

Population Drift


Central Parameters

Reproductive rate

$\alpha_0 = \max_{1 \le j \le \lambda} E[\#\text{offspring from parent } j]$

Drift of variation operator

$X_{t+1} \sim p_{\mathrm{mut}}(X_t)$

$\Delta_{\mathrm{mut}} = g(X_t) - g(X_{t+1})$
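For a concrete selection mechanism the reproductive rate can be computed directly. The sketch below assumes k-tournament selection with contestants drawn uniformly with replacement, λ selections per generation, and distinct fitness values so that individuals can be ranked. The function names are illustrative assumptions.

```python
def tournament_offspring_expectation(rank: int, lam: int, k: int) -> float:
    """Expected number of offspring of the individual with the given rank.

    rank 1 is the fittest of lam individuals. The individual wins a
    tournament if no better individual is sampled and it is sampled at
    least once among the k contestants.
    """
    no_better = ((lam - rank + 1) / lam) ** k   # all contestants have rank >= rank
    all_worse = ((lam - rank) / lam) ** k       # all contestants have rank > rank
    return lam * (no_better - all_worse)        # lam independent tournaments


def reproductive_rate(lam: int, k: int) -> float:
    """alpha_0: the maximum expected offspring count over all parents."""
    return max(tournament_offspring_expectation(j, lam, k)
               for j in range(1, lam + 1))


if __name__ == "__main__":
    for k in (1, 2, 3, 7):
        print(k, reproductive_rate(100, k))
```

Under these assumptions the maximum is attained by the fittest individual and equals $\lambda (1 - (1 - 1/\lambda)^k)$, which is at most k and close to k when λ is much larger than k.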


Population Drift: Decoupling Selection & Variation [1]

Population drift [Lehre, 2011b]

If there exists a κ > 0 such that

$M_{\Delta_{\mathrm{mut}}}(\kappa) < 1/\alpha_0$

where

$\Delta_{\mathrm{mut}} = g(X_t) - g(X_{t+1})$

$X_{t+1} \sim p_{\mathrm{mut}}(X_t)$

and

$\alpha_0 = \max_j E[\#\text{offspring from parent } j]$,

then the runtime is exponential.

[1] This slide only shows the main conditions of the theorems.

Classical drift [Hajek, 1982], for comparison:

$M_{\Delta}(\kappa) < 1$

where

$\Delta = h(P_t) - h(P_{t+1})$

Population Drift - Implications


$M_{\Delta_{\mathrm{mut}}}(\kappa) < 1/\alpha_0 \implies$ Inefficient algorithm

High negative drift induced by the variation operator must be compensated with a high reproductive rate.

Analysis of the algorithm can be decoupled into analyses of

    the drift of the variation operator, $\Delta_{\mathrm{mut}}$
    the reproductive rate of the selection mechanism, $\alpha_0$

Feasible to analyse highly complex processes!

Proof Idea



Population drift as a multi-type branching process.

The Perron root of the mean matrix satisfies

∀κ


Heuristics

Runtime Analysis

Performance Guarantees

Explicit Problem Structure

Heuristic Understanding

Directions for Further Work


Heuristics

Industrial Problems

Problem Characterisation

Runtime Analysis

Performance Guarantees

Problem Insight

Heuristic Understanding

Design Guidelines

Combine Runtime Analysis with Fitness Landscape Theory

“Heuristic Understanding” cluster

“Systems to Build Systems” cluster

Conclusion


Strong empirical evidence for the benefit of meta-heuristics

Recently, it has become possible to prove mathematical statements that describe the relationship between

    a) problem characteristics and parameters of the meta-heuristic
    b) the expected runtime of the meta-heuristic

Progress made possible by appropriate analytical tools

    Population drift theorem

Questions?



References

Baswana, S., Biswas, S., Doerr, B., Friedrich, T., Kurur, P. P., and Neumann, F. (2009). Computing single source shortest paths using single-objective fitness. In FOGA 09: Proceedings of the tenth ACM SIGEVO workshop on Foundations of genetic algorithms, pages 59–66, New York, NY, USA. ACM.

Doerr, B., Klein, C., and Storch, T. (2007). Faster evolutionary algorithms by superior graph representation. In Proceedings of the 1st IEEE Symposium on Foundations of Computational Intelligence (FOCI 2007), pages 245–250.

Droste, S. (2006). A rigorous analysis of the compact genetic algorithm for linear functions. Natural Computing, 5(3):257–283.

Droste, S., Jansen, T., and Wegener, I. (2002). On the analysis of the (1+1) Evolutionary Algorithm. Theoretical Computer Science, 276:51–81.

Droste, S., Jansen, T., and Wegener, I. (2006). Upper and lower bounds for randomized search heuristics in black-box optimization. Theory of Computing Systems, 39(4):525–544.

Dubhashi, D. and Panconesi, A. (2009). Concentration of Measure for the Analysis of Randomized Algorithms. Cambridge University Press.

Friedrich, T., Hebbinghaus, N., Neumann, F., He, J., and Witt, C. (2007). Approximating covering problems by randomized search heuristics using multi-objective models. In Proceedings of the 9th annual conference on Genetic and evolutionary computation (GECCO 2007), pages 797–804, New York, NY, USA. ACM Press.

Giel, O. and Wegener, I. (2003). Evolutionary algorithms and the maximum matching problem. In Proceedings of the 20th Annual Symposium on Theoretical Aspects of Computer Science (STACS 2003), pages 415–426.

Hajek, B. (1982). Hitting-time and occupation-time bounds implied by drift analysis with applications. Advances in Applied Probability, 14(3):502–525.

He, J. and Yao, X. (2001). Drift analysis and average time complexity of evolutionary algorithms. Artificial Intelligence, 127(1):57–85.

He, J. and Yao, X. (2003). Towards an analytic framework for analysing the computation time of evolutionary algorithms. Artificial Intelligence, 145(1-2):59–97.

Jansen, T., Jong, K. A. D., and Wegener, I. (2005). On the choice of the offspring population size in evolutionary algorithms. Evolutionary Computation, 13(4):413–440.

Kötzing, T., Lehre, P. K., Neumann, F., and Oliveto, P. S. (2010). Ant colony optimization and the minimum cut problem. In Proceedings of the 12th annual conference on Genetic and evolutionary computation (GECCO 2010), pages 1393–1400, New York, NY, USA. ACM.

Kratsch, S., Lehre, P. K., Neumann, F., and Oliveto, P. S. (2011). Fixed parameter evolutionary algorithms and maximum leaf spanning trees: A matter of mutations. In Proceedings of Parallel Problem Solving from Nature (PPSN XI), volume 6238 of LNCS, pages 204–213. Springer Berlin / Heidelberg.

Lehre, P. K. (2011a). Fitness-levels for non-elitist populations. To appear in Proceedings of 2011 Genetic and Evolutionary Computation Conference (GECCO 2011).

Lehre, P. K. (2011b). Negative drift in populations. In Proceedings of Parallel Problem Solving from Nature (PPSN XI), volume 6238 of LNCS, pages 244–253. Springer Berlin / Heidelberg.

Lehre, P. K. and Haddow, P. C. (2006). Accessibility and runtime between convex neutral networks. In Wang, T.-D., Li, X., Chen, S.-H., Wang, X., Abbass, H. A., Iba, H., Chen, G., and Yao, X., editors, SEAL, volume 4247 of Lecture Notes in Computer Science, pages 734–741. Springer.

Lehre, P. K. and Yao, X. (2007). Runtime analysis of (1+1) EA on computing unique input output sequences. In Proceedings of 2007 IEEE Congress on Evolutionary Computation (CEC 2007), pages 1882–1889. IEEE Press.

Lehre, P. K. and Yao, X. (2009). On the impact of the mutation-selection balance on the runtime of evolutionary algorithms. In Proceedings of the tenth ACM SIGEVO workshop on Foundations of genetic algorithms (FOGA 2009), pages 47–58, New York, NY, USA. ACM.

Motwani, R. and Raghavan, P. (1995). Randomized Algorithms. Cambridge University Press.

Mühlenbein, H. (1992). How genetic algorithms really work I. Mutation and Hillclimbing. In Proceedings of the Parallel Problem Solving from Nature 2 (PPSN-II), pages 15–26. Elsevier.

Mühlenbein, H. (1997). The equation for response to selection and its use for prediction. Evolutionary Computation, 5(3):303–346.

Neumann, F. and Wegener, I. (2007). Randomized local search, evolutionary algorithms, and the minimum spanning tree problem. Theoretical Computer Science, 378(1):32–40.

Neumann, F. and Witt, C. (2006). Runtime analysis of a simple ant colony optimization algorithm. In Proceedings of The 17th International Symposium on Algorithms and Computation (ISAAC 2006), number 4288 in LNCS, pages 618–627.

Neumann, F. and Witt, C. (2008). Ant colony optimization and the minimum spanning tree problem. In Proceedings of Learning and Intelligent Optimization (LION 2008), pages 153–166.

Oliveto, P. and Witt, C. (2010). Simplified drift analysis for proving lower bounds in evolutionary computation. Algorithmica, pages 1–18. doi:10.1007/s00453-010-9387-z.

Oliveto, P. S., He, J., and Yao, X. (2007). Evolutionary algorithms and the vertex cover problem. In Proceedings of the IEEE Congress on Evolutionary Computation (CEC 2007).

Reichel, J. and Skutella, M. (2008). Evolutionary algorithms and matroid optimization problems. Algorithmica.

Scharnow, J., Tinnefeld, K., and Wegener, I. (2002). Fitness landscapes based on sorting and shortest paths problems. In Proceedings of 7th Conf. on Parallel Problem Solving from Nature (PPSN–VII), number 2439 in LNCS, pages 54–63.

Storch, T. (2006). How randomized search heuristics find maximum cliques in planar graphs. In Proceedings of the 8th annual conference on Genetic and evolutionary computation (GECCO 2006), pages 567–574, New York, NY, USA. ACM Press.

Wegener, I. and Witt, C. (2005). On the analysis of a simple evolutionary algorithm on quadratic pseudo-boolean functions. Journal of Discrete Algorithms, 3(1):61–78.

Witt, C. (2005). Worst-case and average-case approximations by simple randomized search heuristics. In Proceedings of the 22nd Annual Symposium on Theoretical Aspects of Computer Science (STACS 05), number 3404 in LNCS, pages 44–56.

Witt, C. (2006). Runtime Analysis of the (µ + 1) EA on Simple Pseudo-Boolean Functions. Evolutionary Computation, 14(1):65–86.

Zarges, C. (2009). On the utility of the population size for inversely fitness proportional mutation rates. In FOGA 09: Proceedings of the tenth ACM SIGEVO workshop on Foundations of genetic algorithms, pages 39–46, New York, NY, USA. ACM.

Drift Analysis - Upper bounds


[Figure: one step of the process on the interval from 0 to B, moving from $g(X_t)$ to $g(X_{t+1})$ by ∆.]

Theorem ([He and Yao, 2001])

Given a stochastic process $(X_t)_{t \ge 0}$ in Ω and $g : \Omega \to \mathbb{R}_0^+$. Define T to be the first time t such that $g(X_t) = 0$. If there exists a constant D > 0 such that for all t ≥ 0

1. $\Pr[g(X_t) < B] = 1$, and

2. $E[g(X_t) - g(X_{t+1}) \mid T > t] \ge D$,

then

$E[T] \le B / D.$
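The bound can be checked empirically on a toy process. The sketch below assumes a process that is not part of the slides: g starts at a value below B and, in each step, decreases by 1 with probability q and otherwise stays put, so the drift is exactly D = q. All names are illustrative assumptions.

```python
import random


def hitting_time(start: int, q: float, rng: random.Random) -> int:
    """Steps until g reaches 0 when g decreases by 1 with probability q."""
    g, t = start, 0
    while g > 0:
        if rng.random() < q:
            g -= 1
        t += 1
    return t


def mean_hitting_time(start: int, q: float, runs: int, seed: int = 0) -> float:
    """Empirical estimate of E[T] over independent runs."""
    rng = random.Random(seed)
    return sum(hitting_time(start, q, rng) for _ in range(runs)) / runs


def drift_bound(B: float, D: float) -> float:
    """Upper bound E[T] <= B / D from the drift theorem."""
    return B / D


if __name__ == "__main__":
    B, start, q = 20, 10, 0.5
    print("empirical mean:", mean_hitting_time(start, q, runs=200))
    print("drift bound:   ", drift_bound(B, q))
```

For this process the exact expectation is start/q, so the empirical mean should settle well below B/D whenever the starting value is clearly below B.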

Negative Drift for Populations


Theorem ([Lehre, 2011b])

Given the Population Selection-Variation Algorithm with

    transition matrix $p_{\mathrm{mut}}$ over search space Ω and distance function $g : \Omega \to \mathbb{N}^+$
    runtime $T := \min\{t \ge 0 \mid g(P_0) \ge b \text{ and } g(P_t) < a\}$, $b - a = \Omega(n)$

if there exists a κ > 0, and constants $\alpha_0, \delta_1, \delta_2 > 0$, such that for all t ≥ 0,

1. $E[R_t(i) \mid a < g(P_t(i)) < b] \le \alpha_0$ for all i, 1 ≤ i ≤ λ,

2. $E\left[e^{-\kappa \Delta_t(i)} \mid a < g(X_t) < b\right] < \frac{1}{\alpha_0 (1 + \delta_1)}$

3. $E\left[e^{-\kappa (g(X_{t+1}) - b)} \mid g(X_t) \ge b\right] = O(1)$

4. $\forall\, 1 \le \ell + k \le j$: $\frac{\Pr[\Delta_t(i) = -\ell \,\wedge\, \Delta_{t+1}(i - \ell) = -k]}{\Pr[\Delta_t(i) = -\ell - k]} \le e^{\kappa (b - a)(1 - \delta_2)}$

5. $\forall\, 1 \le \ell + k \le j$: $\frac{\Pr[\Delta_t(i) = -j]}{\Pr[\Delta_t(i - k) = -\ell]} = O(1)$

where

    $R_t(i) := \sum_{j=1}^{\lambda} [I_t(j) = i]$ is the number of offspring from individual i
    $(X_t)_{t \ge 0}$ is the Markov process on Ω associated with $p_{\mathrm{mut}}$, and
    $\Delta_t(i) := (g(X_{t+1}) - g(X_t) \mid g(X_t) = i)$,

then the runtime satisfies $\Pr[T \le e^{cn}] = e^{-\Omega(n)}$, for some constant c > 0.
