Evolution Strategies
LIACS Natural Computing Group, Leiden University

Slide 2: Overview
- Introduction: Optimization and EAs
- Evolution Strategies
- Further developments: multi-criteria optimization, mixed-integer ES, soft constraints, response surface approximation
- Examples

Slide 3: Background I
- "Biology = Engineering" (Daniel Dennett)

Slide 4: Introduction: Modeling, Simulation, Optimization
- Modeling: input known (measured), output known (measured), interrelation unknown
- Simulation: input will be given, model already exists; what is the result for this input?
- Optimization: objective will be given; how (with which parameter settings) can this objective be achieved?

Slide 5: Simulation vs. Optimization
- Simulation answers "what happens if?": feed a trial input into the simulator and inspect the result (trial and error)
- Optimization answers "how do I achieve the best result?": an optimizer repeatedly calls the simulator and maximizes or minimizes the result, possibly for multiple objectives

Slide 6: Introduction: Optimization and Evolutionary Algorithms

Slide 7: Optimization
- Objective function f over a search space M: high-dimensional; non-linear, multimodal; discontinuous, noisy, dynamic
- M = M1 x M2 x ... x Mn may be heterogeneous; restrictions are possible
- A good local, robust optimum is desired rather than the global minimum alone; realistic landscapes are like that
- (Figure: a rugged landscape with the global minimum and a local, robust optimum)

Slide 8: Illustrative Example: Optimizing Efficiency
- Comparison of an initial design with the evolved design: 32% improvement in efficiency
- Optimization can create innovation

Slide 9: Dynamic Optimization
- A dynamic objective function, 30-dimensional, shown as a 3D projection

Slide 10: Classification of Optimization Algorithms
- Direct optimization algorithms: evolutionary algorithms
- First-order optimization algorithms: e.g., gradient methods
- Second-order optimization algorithms: e.g., Newton's method

Slide 11: Iterative Optimization Methods
- General description: x(t+1) = x(t) + s(t) * v(t), with current point x(t), new point x(t+1), direction vector v(t), and scalar step size s(t)
- At every iteration: choose a direction, then determine a step size
- Direction: gradient-based or random
- Step size: one-dimensional optimization, random, or self-adaptive
- (A code sketch of this scheme follows after Slide 14)

Slide 12: The Fundamental Challenge
- Global convergence with probability one: general, but useless for practical purposes
- Convergence velocity: local analysis only, for specific (convex) functions

Slide 13: Theoretical Statements
- Global convergence (with probability 1): a general statement, as it holds for all functions
- Useless in practical situations: time plays a major role in practice, and not all objective functions are relevant in practice

Slide 14: An Infinite Number of Pathological Cases
- NFL theorem: all optimization algorithms perform equally well iff performance is averaged over all possible optimization problems
- Fortunately, we are not interested in all possible problems
- (Figure: an objective function f(x1, x2) with optimum (x1*, x2*))
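
To make the generic scheme of Slide 11 concrete, here is a minimal sketch in Python; the function names and the choice of a random unit direction with a fixed step size, accepting only improvements, are illustrative assumptions rather than content of the slides:

    import math
    import random

    def sphere(x):
        # Convex test function: f(x) = x1^2 + ... + xn^2.
        return sum(xi * xi for xi in x)

    def iterative_search(f, x, step_size=0.5, iterations=1000):
        # Generic scheme from Slide 11: x_new = x + s * v, here with
        # a random unit direction v and a fixed scalar step size s.
        fx = f(x)
        for _ in range(iterations):
            v = [random.gauss(0.0, 1.0) for _ in x]
            norm = math.sqrt(sum(vi * vi for vi in v))
            x_new = [xi + step_size * vi / norm for xi, vi in zip(x, v)]
            fx_new = f(x_new)
            if fx_new < fx:  # keep only improvements
                x, fx = x_new, fx_new
        return x, fx

    best, value = iterative_search(sphere, [10.0] * 5)
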
Slide 15: Theoretical Statements
- Convergence velocity: very specific statements for convex objective functions
- Describes convergence into local optima
- A very extensive analysis exists for evolution strategies

Slide 16: Evolution Strategies

Slide 17: Model-Optimization-Action
- The optimizer proposes parameter settings; an evaluation of the business process model returns the objective function value(s)
- The evaluation can be a simulation, a function, a model built from data, an experiment, or subjective function(s)

Slide 18: Evolutionary Algorithms Taxonomy
- Evolutionary algorithms comprise evolution strategies, genetic algorithms, and others: evolutionary programming, differential evolution, GP, PSO, EDA, real-coded GAs
- Evolution strategies: mixed-integer capabilities, emphasis on mutation, self-adaptation, small population sizes, deterministic selection, developed in Germany, theory focused on convergence velocity
- Genetic algorithms: discrete representations, emphasis on crossover, constant parameters, larger population sizes, probabilistic selection, developed in the USA, theory focused on schema processing

Slide 19: Generalized Evolutionary Algorithm

    t := 0;
    initialize(P(t));
    evaluate(P(t));
    while not terminate do
        P'(t) := mating_selection(P(t));
        P''(t) := variation(P'(t));
        evaluate(P''(t));
        P(t+1) := environmental_selection(P''(t) ∪ Q);
        t := t + 1;
    od

Slide 20: Evolution Strategy Basics
- Mostly real-valued search space IR^n, but also mixed-integer and discrete spaces
- Emphasis on mutation: n-dimensional normal distribution with expectation zero
- Different recombination operators
- Deterministic selection: (μ,λ)-selection allows deterioration, (μ+λ)-selection only accepts improvements
- λ > μ, i.e., creation of an offspring surplus
- Self-adaptation of strategy parameters

Slide 21: Representation of Search Points
- Simple ES with 1/5 success rule: exogenous adaptation of the step size σ; mutation: N(0, σ)
- Self-adaptive ES with a single step size: one σ controls the mutation of all x_i; mutation: N(0, σ)

Slide 22: Representation of Search Points
- Self-adaptive ES with individual step sizes: one σ_i per x_i; mutation: N_i(0, σ_i)
- Self-adaptive ES with correlated mutations: individual step sizes and one rotation angle per coordinate pair; mutation according to the covariance matrix: N(0, C)

Slide 23: Evolution Strategy Algorithms: Mutation

Slide 24: Operators: Mutation with One σ
- Self-adaptive ES with one step size: one σ controls the mutation of all x_i
- Step 1, mutation of the step size: σ' = σ · exp(τ0 · N(0,1))
- Step 2, mutation of the objective variables: x_i' = x_i + σ' · N_i(0,1)
- Note: the new σ' is already used in step 2

Slide 25: Operators: Mutation with One σ
- τ0 is the so-called learning rate; it affects the speed of the σ-adaptation: a bigger τ0 is faster but more imprecise, a smaller τ0 slower but more precise
- How to choose τ0? According to the recommendation of Schwefel*: τ0 = 1/sqrt(n)
- *H.-P. Schwefel: Evolution and Optimum Seeking, Wiley, NY, 1995.
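
A minimal Python sketch of the operator from Slides 24 and 25, assuming τ0 = 1/sqrt(n); the function name is hypothetical:

    import math
    import random

    def mutate_one_sigma(x, sigma):
        # Slide 24: first mutate the step size, then the variables,
        # using tau0 = 1/sqrt(n) as the learning rate (Slide 25).
        n = len(x)
        tau0 = 1.0 / math.sqrt(n)
        sigma_new = sigma * math.exp(tau0 * random.gauss(0.0, 1.0))
        # The NEW sigma is used to mutate the objective variables.
        x_new = [xi + sigma_new * random.gauss(0.0, 1.0) for xi in x]
        return x_new, sigma_new
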
Slide 26: Operators: Mutation with One σ
- (Figure: positions of the parents (here: 5) and contour lines of the objective function; the offspring of a parent lie on a hypersphere around it (for n > 10), with uniformly distributed positions)

Slide 27: Pros and Cons: One σ
- Advantages: simple adaptation mechanism; self-adaptation is usually fast and precise
- Disadvantages: bad adaptation in case of complicated contour lines; bad adaptation in case of very differently scaled object variables, e.g., -100 < x_i < 100 and -1 < x_j < 1

Slide 28: Operators: Mutation with Individual σ_i
- Self-adaptive ES with individual step sizes: one σ_i per x_i
- Step 1, mutation of the individual step sizes: σ_i' = σ_i · exp(τ' · N(0,1) + τ · N_i(0,1))
- Step 2, mutation of the object variables: x_i' = x_i + σ_i' · N_i(0,1)
- Note: the new individual σ_i' are already used in step 2

Slide 29: Operators: Mutation with Individual σ_i
- τ' and τ are again learning rates: τ' is the global learning rate, for which N(0,1) is realized only once per individual; τ is the local learning rate, for which N_i(0,1) is realized n times, once per coordinate
- Suggested by Schwefel*: τ' = 1/sqrt(2n) and τ = 1/sqrt(2·sqrt(n))
- (A code sketch follows after Slide 31)
- *H.-P. Schwefel: Evolution and Optimum Seeking, Wiley, NY, 1995.

Slide 30: Operators: Mutation with Individual σ_i
- (Figure: positions of the parents (here: 5) and contour lines; the offspring are located on a hyperellipsoid around the parent (for n > 10), with uniformly distributed positions)

Slide 31: Pros and Cons: Individual σ_i
- Advantages: individual scaling of the object variables; increased global convergence reliability
- Disadvantages: slower convergence due to the increased learning effort; no rotation of the coordinate system possible, although this is required for badly conditioned objective functions
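
The announced sketch of the mutation with individual step sizes (Slides 28 and 29), assuming Schwefel's recommended learning rates; the function name is hypothetical:

    import math
    import random

    def mutate_individual_sigmas(x, sigmas):
        # Slides 28-29: global and local learning rates after Schwefel.
        n = len(x)
        tau_global = 1.0 / math.sqrt(2.0 * n)
        tau_local = 1.0 / math.sqrt(2.0 * math.sqrt(n))
        g = random.gauss(0.0, 1.0)  # one global realization per individual
        sigmas_new = [s * math.exp(tau_global * g +
                                   tau_local * random.gauss(0.0, 1.0))
                      for s in sigmas]
        # The NEW step sizes are used to mutate the object variables.
        x_new = [xi + s * random.gauss(0.0, 1.0)
                 for xi, s in zip(x, sigmas_new)]
        return x_new, sigmas_new
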
Slide 32: Operators: Correlated Mutations
- Self-adaptive ES with correlated mutations: individual step sizes and one rotation angle for each pair of coordinates; mutation according to the covariance matrix: N(0, C)
- Step 1: mutation of the individual step sizes
- Step 2: mutation of the rotation angles: α_j' = α_j + β · N_j(0,1)
- Step 3: mutation of the object variables: x' = x + N(0, C')
- Note: the new covariance matrix C' is already used in step 3

Slide 33: Operators: Correlated Mutations
- Interpretation of the rotation angles α_ij: they map onto the covariances according to tan(2·α_ij) = 2·c_ij / (σ_i^2 - σ_j^2)
- (Figure: coordinates x1, x2 with step sizes σ1, σ2 and rotation angle α12)

Slide 34: Operators: Correlated Mutations
- τ', τ, and β are again learning rates; τ' and τ as before, and β = 0.0873 (corresponds to 5 degrees)
- Out-of-boundary correction for the angles: if |α_j'| > π, then α_j' := α_j' - 2π · sign(α_j')

Slide 35: Correlated Mutations for ES
- (Figure: positions of the parents (here: 5) and contour lines; the offspring are located on a rotatable hyperellipsoid around the parent (for n > 10), with uniformly distributed positions)

Slide 36: Operators: Correlated Mutations
- How to create N(0, C)? Multiply the uncorrelated mutation vector with n(n-1)/2 rotation matrices
- This generates only feasible (positive definite) correlation matrices

Slide 37: Operators: Correlated Mutations
- Structure of the rotation matrix R(α_ij): it equals the identity matrix except for the entries r_ii = r_jj = cos(α_ij), r_ij = -sin(α_ij), and r_ji = sin(α_ij)

Slide 38: Operators: Correlated Mutations
- Implementation of correlated mutations:

    nq := n*(n-1)/2;
    { generation of the uncorrelated mutation vector }
    for i := 1 to n do u[i] := sigma[i] * N_i(0,1);
    { rotations }
    for k := 1 to n-1 do
        n1 := n - k;
        n2 := n;
        for i := 1 to k do
            d1 := u[n1]; d2 := u[n2];
            u[n2] := d1 * sin(alpha[nq]) + d2 * cos(alpha[nq]);
            u[n1] := d1 * cos(alpha[nq]) - d2 * sin(alpha[nq]);
            n2 := n2 - 1;
            nq := nq - 1;
        od
    od

Slide 39: Pros and Cons: Correlated Mutations
- Advantages: individual scaling of the object variables; rotation of the coordinate system possible; increased global convergence reliability
- Disadvantages: much slower convergence; the effort for mutations scales quadratically with n; self-adaptation is very inefficient

Slide 40: Operators: Mutation Addendum
- How to generate N(0,1)-distributed random numbers from uniform ones? With the polar method: draw u1, u2 uniformly from [-1, 1] and set w = u1^2 + u2^2; if w > 1, reject the pair and draw again; otherwise u1 · sqrt(-2·ln(w)/w) and u2 · sqrt(-2·ln(w)/w) are two independent N(0,1) samples

Slide 41: Evolution Strategy Algorithms: Recombination

Slide 42: Operators: Recombination
- Applied only for μ > 1, directly after selection
- Iteratively generates λ offspring:

    for i := 1 to lambda do
        choose recombinant r1 uniformly at random from the parent population;
        choose recombinant r2 ≠ r1 uniformly at random from the parent population;
        offspring := recombine(r1, r2);
        add offspring to the offspring population;
    od

Slide 43: Operators: Recombination
- Discrete recombination: the variable at position i is copied at random (uniformly distributed) from parent 1 or parent 2, position i

Slide 44: Operators: Recombination
- Intermediate recombination: the variable at position i is the arithmetic mean of parent 1 and parent 2 at position i

Slide 45: Operators: Recombination
- Global discrete recombination: considers all μ parents; the variable at position i is copied from a parent chosen anew, uniformly at random, for every position

Slide 46: Operators: Recombination
- Global intermediary recombination: considers all μ parents; the variable at position i is the arithmetic mean over all parents
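
Minimal Python sketches of three of these recombination variants (Slides 43, 44, and 46); the function names are hypothetical, and parents are plain lists of object variables:

    import random

    def discrete_recombination(parent1, parent2):
        # Slide 43: each position is copied from parent 1 or 2 at random.
        return [random.choice(pair) for pair in zip(parent1, parent2)]

    def intermediate_recombination(parent1, parent2):
        # Slide 44: each position is the arithmetic mean of both parents.
        return [(a + b) / 2.0 for a, b in zip(parent1, parent2)]

    def global_intermediary_recombination(parents):
        # Slide 46: each position is the mean over all parents.
        n = len(parents[0])
        return [sum(p[i] for p in parents) / len(parents) for i in range(n)]
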
Slide 47: Evolution Strategy Algorithms: Selection

Slide 48: Operators: (μ+λ)-Selection
- (μ+λ)-selection means: μ parents produce λ offspring by (recombination and) mutation; recombination may be left out, mutation is always applied
- These μ+λ individuals are considered together, and the best μ out of them are selected (survive)
- Deterministic selection; this method guarantees monotony: deteriorations will never be accepted

Slide 49: Operators: (μ,λ)-Selection
- (μ,λ)-selection means: μ parents produce λ > μ offspring by (recombination and) mutation
- The λ offspring are considered alone, and the best μ out of the λ offspring are selected
- Deterministic selection; the method does not guarantee monotony: deteriorations are possible, i.e., the best objective function value in generation t+1 may be worse than the best in generation t

Slide 50: Operators: Selection
- Example: (2,3)-selection: the parents do not survive; the best 2 of the 3 offspring are selected, even if one of them is worse than a parent
- Example: (2+3)-selection: parents and offspring compete together, so a parent that is better than the offspring survives

Slide 51: Operators: Selection
- Possible occurrences of selection:
- (1+1)-ES: one parent, one offspring, 1/5 rule
- (1,λ)-ES: one parent, λ offspring (example: (1,10)-strategy); one step size or n self-adaptive step sizes; mutative step size control; derandomized strategies
- (μ,λ)-ES: μ > 1 parents, λ offspring (example: (2,15)-strategy); includes recombination; can overcome local optima
- (μ+λ)-strategies: elitist strategies, the exception

Slide 52: Evolution Strategy: Self-Adaptation of Step Sizes

Slide 53: Self-Adaptation
- No deterministic step size control; rather, an evolution of step sizes
- Biological analogues: repair enzymes, mutator genes
- Why should this work at all? Indirect coupling of step sizes and progress: good step sizes improve individuals, bad ones make them worse; this yields an indirect selection of step sizes

Slide 54: Self-Adaptation: Example
- How can we test this at all? We need to know the optimal step size, which is possible only for very simple, convex objective functions
- Here: the sphere model, and a dynamic sphere model in which the location of the optimum changes occasionally

Slide 55: Self-Adaptation: Example
- (Figure: objective function value and the smallest, average, and largest step size in the population; the step sizes follow the theoretically optimal step size)

Slide 56: Self-Adaptation
- Self-adaptation of one step size: perfect adaptation; the learning time for back adaptation is proportional to n; proofs exist only for convex functions
- Individual step sizes: experiments by Schwefel
- Correlated mutations: adaptation is much slower
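
The building blocks seen so far (recombination, self-adaptive mutation, and (μ,λ)-selection) can be combined into a complete minimal strategy. The following sketch runs a (2,15)-ES (cf. Slide 51) with one self-adaptive step size on the sphere model of Slide 54; all names and parameter values are illustrative:

    import math
    import random

    def sphere(x):
        return sum(xi * xi for xi in x)

    def comma_es(f, n=10, mu=2, lam=15, generations=200):
        # (mu, lambda)-ES with one self-adaptive step size per individual:
        # intermediate recombination, log-normal step size mutation, and
        # comma selection (parents never survive, cf. Slide 49).
        tau0 = 1.0 / math.sqrt(n)
        population = [([random.uniform(-5.0, 5.0) for _ in range(n)], 1.0)
                      for _ in range(mu)]
        for _ in range(generations):
            offspring = []
            for _ in range(lam):
                (x1, s1), (x2, s2) = random.sample(population, 2)
                # intermediate recombination of variables and step size
                x = [(a + b) / 2.0 for a, b in zip(x1, x2)]
                sigma = (s1 + s2) / 2.0
                # self-adaptive mutation: step size first, then variables
                sigma *= math.exp(tau0 * random.gauss(0.0, 1.0))
                x = [xi + sigma * random.gauss(0.0, 1.0) for xi in x]
                offspring.append((x, sigma))
            offspring.sort(key=lambda ind: f(ind[0]))
            population = offspring[:mu]  # best mu offspring survive
        return population[0]

    best_x, best_sigma = comma_es(sphere)
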
Slide 57: Evolution Strategy: Derandomization

Slide 58: Derandomization
- Goals: fast convergence speed, fast step size adaptation, precise step size adaptation, and a compromise between convergence velocity and convergence reliability
- Idea: the actual realizations of N(0, σ) are important! Step sizes and realizations can differ much from each other, so information is accumulated over time

Slide 59: Derandomized (1,λ)-ES
- Mutation (k = 1, ..., λ): offspring k is generated from the current parent of generation g, using the global step size and the individual step sizes of generation g
- Selection: choice of the best of the λ offspring of generation g

Slide 60: Derandomized (1,λ)-ES
- Accumulation of selected mutations: the particular mutation vector which created the parent, plus a weighted history of good mutation vectors
- A suitable initialization and a weight factor, which balances the most recent mutation against the history, complete the scheme

Slide 61: Derandomized (1,λ)-ES
- Step size adaptation: the norm of the accumulated vector adapts the global step size, and the vector of the absolute values of its components adapts the individual step sizes
- A parameter regulates the adaptation speed and precision

Slide 62: Derandomized (1,λ)-ES
- Explanations: the average variations are normalized so that there is no bias in case of missing selection
- Correction for small n: 1/(5n)
- Learning rates: again chosen as functions of the dimension n

Slide 63: Evolution Strategy: Rules of Thumb

Slide 64: Some Theory Highlights
- Convergence velocity: for (1,λ)-strategies it grows only logarithmically with λ and decreases with the problem dimensionality n; for (μ/μ,λ)-strategies with discrete and intermediary recombination, recombination provides an additional speedup: the genetic repair effect of recombination
- Since the speedup by λ is just logarithmic, more processors are only of limited use for increasing λ

Slide 65: Rules of Thumb
- Recommended values as a function of the dimension n, for strategies with global intermediary recombination (rule 1) and as a good heuristic for (1,λ) (rule 2):

    n      rule 1   rule 2
    10     5.45     10.91
    20     6.49     12.99
    30     7.10     14.20
    40     7.53     15.07
    50     7.87     15.74
    60     8.14     16.28
    70     8.37     16.75
    80     8.57     17.15
    90     8.75     17.50
    100    8.91     17.82
    110    9.05     18.10
    120    9.18     18.36
    130    9.30     18.60
    140    9.41     18.82
    150    9.52     19.03

Slide 66: Mixed-Integer Evolution Strategies

Slide 67: Mixed-Integer Evolution Strategy
- Generalized optimization problem: real-valued, integer, and discrete (nominal) decision variables may occur together in one objective function

Slide 68: Mixed-Integer ES: Mutation
- Real-valued variables: self-adaptive mutation with (global) learning rates, as before
- Integer variables: mutation based on the geometrical distribution
- Discrete (nominal) variables: mutation controlled by mutation probabilities

Slide 69: Literature
- H.-P. Schwefel: Evolution and Optimum Seeking, Wiley, NY, 1995.
- I. Rechenberg: Evolutionsstrategie '94, frommann-holzboog, Stuttgart, 1994.
- Th. Bäck: Evolutionary Algorithms in Theory and Practice, Oxford University Press, NY, 1996.
- Th. Bäck, D.B. Fogel, Z. Michalewicz (eds.): Handbook of Evolutionary Computation, Vols. 1 and 2, Institute of Physics Publishing, 2000.