
Introduction to Genetic Algorithms

Federico Nesti, f.nesti@santannapisa.it

Outline

• Introduction
• Working principles
• Tuning
• Applications of genetic algorithms

A group of bacteria is fighting for food. First come, first served.

A swimming competition

How would you design the best swimmer? You can select the parameters 𝜃 including:

• Length of the flagella

• Dimension of the head

• Color of flagella

… and imagine you can evaluate the function

𝑓: 𝜃 → ℝ, that is, the swimming time.

A swimming competition

We are looking for the optimal bacterium (aka the optimal parameters)

𝜃∗ = argmin𝜃 𝑓(𝜃)

…or an optimization problem?

[Plot: the cost function 𝑓(𝜃) as a function of 𝜃.]

Mathematical optimization is the process of finding the best solution with respect to a certain criterion (cost function).

Optimization methods

E.g., find the minimum of the function above.

How would you do that?

There are LOTS of optimization methods.

Just to list a few:

• Convex optimization

• Non-linear optimization

• Stochastic optimization (e.g., SGD)

• Dynamic programming

• Heuristics

• … and many more!

Optimization methods

Now, think that our function 𝑓 is a highly nonlinear and non-convex function, in many dimensions.

Optimization methods

How would you find the absolute minimum of such a function?

Genetic Algorithms

Gradient descent is fine, but you will eventually get stuck in a local minimum.

Other optimization methods (e.g., simulated annealing) are OK, but the local-minimum problem is still there.

Genetic algorithms might manage this problem!

Genetic algorithms are a broad set of heuristic optimization methods.

Genetic Algorithms

• Evolutionary Strategies (ES)
• Evolutionary Programming (EP)
• Genetic Algorithms (GA)
• Genetic Programming (GP)

✓ Most used.
✓ Other approaches are converging towards GAs.
✓ Easily applicable.

Biological inspiration of GAs

Outline

• Introduction
• Working principles
• Tuning
• Applications of genetic algorithms

Genetic Algorithms
Step 1 – Initialization

The population is initialized with N random individuals 𝑥1, …, 𝑥𝑁.

Genetic Algorithms
Step 2 – Fitness Evaluation

The fitness 𝑓 of each individual 𝑥𝑖 is evaluated and the solutions are ranked.

The worst solutions are discarded from the population.

Genetic Algorithms

Step 3 – Reproduction/recombination

The surviving individuals reproduce and their genes are recombined in the offspring to refill the population.


Genetic Algorithms

Step 4 – Mutation

Stochastically, a mutation might occur in one or more of the individuals.

Genetic Algorithms

Iterate steps 2-4: a generation is composed of evaluation, survival, reproduction, and mutation (a minimal code sketch of one full run follows below).

[Diagram: Initialization → Ranking and survival → Reproduction → Mutation, looping back to ranking.]
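As a reference, here is a minimal Python sketch of steps 1-4 iterated over generations. It assumes a real-valued encoding and uses a toy quadratic cost as a stand-in fitness; all names, operators, and hyperparameter values are illustrative and not taken from the slides.

import numpy as np

rng = np.random.default_rng(0)

def fitness(x):
    # Toy cost to minimize (illustrative stand-in for, e.g., the swimming time).
    return np.sum(x ** 2)

N, n_genes, n_survivors, generations = 20, 5, 10, 50

# Step 1 - Initialization: N random individuals.
population = rng.uniform(-1.0, 1.0, size=(N, n_genes))

for g in range(generations):
    # Step 2 - Fitness evaluation: rank, keep the best, discard the rest.
    ranked = population[np.argsort([fitness(x) for x in population])]
    survivors = ranked[:n_survivors]

    # Step 3 - Reproduction/recombination: uniform crossover of random parents.
    children = []
    while len(children) < N - n_survivors:
        p1, p2 = survivors[rng.integers(n_survivors, size=2)]
        mask = rng.random(n_genes) < 0.5
        children.append(np.where(mask, p1, p2))
    children = np.array(children)

    # Step 4 - Mutation: small Gaussian perturbation of some offspring genes.
    mutate = rng.random(children.shape) < 0.1
    children = children + mutate * rng.normal(0.0, 0.1, size=children.shape)

    population = np.vstack([survivors, children])

best = min(population, key=fitness)
print(fitness(best))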

Pros and cons

Outline

• Introduction
• Working principles
• Tuning
• Applications of genetic algorithms

Recap of GAs


Tuning of GAs: Crucial hyperparameters

• Gene encoding
• Initialization
• Selection
• Crossover
• Mutation
• Termination

Gene encoding

Encoding strictly depends on the type of problem (a small code sketch of these encodings follows the list):

• Binary encoding (e.g., the knapsack problem)
• Real encoding
• Permutation of elements (traveling salesman / scheduling)
• Mixed encoding (custom problems)
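A minimal sketch of how these encodings might be represented in code; all sizes are illustrative.

import random

random.seed(0)

n_items, n_cities, n_params = 8, 5, 3

# Binary encoding (e.g., knapsack: gene i says whether item i is packed).
binary_individual = [random.randint(0, 1) for _ in range(n_items)]

# Real encoding (e.g., continuous design parameters such as flagellum length).
real_individual = [random.uniform(0.0, 1.0) for _ in range(n_params)]

# Permutation encoding (e.g., traveling salesman: order in which cities are visited).
permutation_individual = random.sample(range(n_cities), n_cities)

print(binary_individual, real_individual, permutation_individual)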

Population Initialization

The starting population is fundamental: the number of individuals and the randomness should ensure population diversity and a correct sampling of the search space.

Increasing the number of individuals influences both convergence and computational time, so a tradeoff is needed.
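A minimal sketch of a uniform random initialization over per-gene bounds; the bounds and population size are illustrative.

import numpy as np

rng = np.random.default_rng(42)

def init_population(n_individuals, bounds):
    # Sample individuals uniformly inside per-gene (low, high) bounds.
    low = np.array([b[0] for b in bounds])
    high = np.array([b[1] for b in bounds])
    return rng.uniform(low, high, size=(n_individuals, len(bounds)))

# Example: 30 individuals, 3 genes with different (illustrative) ranges.
pop = init_population(30, [(0.0, 1.0), (-5.0, 5.0), (0.1, 2.0)])
print(pop.shape, pop.min(axis=0), pop.max(axis=0))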

Selection

Once the fitness of the population has been evaluated, how do we select the survivors?

The ratio of surviving to extinct individuals is related to the concepts of exploration and exploitation.

If almost every individual survives, we are encouraging the algorithm to also explore not-so-fit solutions. This could slow convergence down.

Conversely, when keeping only the very best solutions (elitism), we are exploiting the high fitness of a few solutions. This could lead to instability, especially when the problem is highly random.

So, as always, a tradeoff should be considered. Typically, 50-75% of survivors is OK. Sometimes stochastic selection is considered.

The same is true for the fractions of survivors, offspring, and mutated individuals. These hyperparameters influence the convergence rate (i.e., they can be compared to the learning rate).
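A minimal sketch of two selection schemes consistent with the discussion above: truncation (keep the best fraction) and stochastic, fitness-proportional selection. The function names and fractions are illustrative.

import numpy as np

rng = np.random.default_rng(1)

def truncation_selection(population, fitnesses, survival_fraction=0.5):
    # Keep the best fraction of the population (lower fitness = better here).
    n_keep = max(1, int(len(population) * survival_fraction))
    order = np.argsort(fitnesses)
    return population[order[:n_keep]]

def roulette_selection(population, fitnesses, n_keep):
    # Stochastic selection: survival probability proportional to (inverted) fitness.
    scores = np.max(fitnesses) - fitnesses + 1e-9   # higher score = better
    probs = scores / scores.sum()
    idx = rng.choice(len(population), size=n_keep, replace=False, p=probs)
    return population[idx]

population = rng.uniform(-1, 1, size=(10, 4))
fitnesses = np.array([np.sum(x ** 2) for x in population])
print(truncation_selection(population, fitnesses).shape)
print(roulette_selection(population, fitnesses, n_keep=5).shape)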

Crossover

Two good solutions are combined to generate offspring that exploit the (hopefully) good genes of the parents. Parents can be chosen uniformly at random or proportionally to their fitness score.

The crossover type (several can be used in the same problem) depends on the encoding and the problem.
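A minimal sketch of two common crossover operators, one-point crossover (binary or real encodings) and order crossover (permutation encodings); names are illustrative.

import random

random.seed(0)

def one_point_crossover(parent1, parent2):
    # Cut both parents at the same random point and swap the tails.
    point = random.randint(1, len(parent1) - 1)
    return parent1[:point] + parent2[point:], parent2[:point] + parent1[point:]

def order_crossover(parent1, parent2):
    # For permutations: copy a slice from parent1, fill the rest with the
    # remaining elements in the order they appear in parent2.
    a, b = sorted(random.sample(range(len(parent1)), 2))
    child = [None] * len(parent1)
    child[a:b] = parent1[a:b]
    rest = [g for g in parent2 if g not in child]
    for i in range(len(child)):
        if child[i] is None:
            child[i] = rest.pop(0)
    return child

print(one_point_crossover([0, 0, 0, 0, 0], [1, 1, 1, 1, 1]))
print(order_crossover([0, 1, 2, 3, 4], [4, 3, 2, 1, 0]))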

Mutation

A local operation that causes a random movement in the search space; it is inherently stochastic, similar to the exploration rate in reinforcement learning.

Different methods exist, depending on the encoding and the problem.
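A minimal sketch of two common mutation operators matching the encodings above (bit-flip for binary genes, Gaussian perturbation for real genes); the probabilities are illustrative.

import random

random.seed(0)

def bitflip_mutation(individual, p=0.05):
    # Binary encoding: flip each bit independently with probability p.
    return [1 - g if random.random() < p else g for g in individual]

def gaussian_mutation(individual, p=0.1, sigma=0.1):
    # Real encoding: perturb each gene with small Gaussian noise with probability p.
    return [g + random.gauss(0.0, sigma) if random.random() < p else g
            for g in individual]

print(bitflip_mutation([0, 1, 0, 1, 1, 0, 0, 1]))
print(gaussian_mutation([0.5, -0.2, 1.3, 0.0]))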

Termination

There are many different ways to stop the evolution (a minimal check is sketched below):

• Maximum number of generations
• Satisfactory fitness achieved
• Loss of population diversity
• Improvements becoming too small
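A minimal sketch of how such stopping criteria might be combined (the diversity check is omitted for brevity; all thresholds are illustrative).

def should_terminate(generation, best_history, fitness_target=None,
                     max_generations=200, patience=20, min_improvement=1e-6):
    # Stop after a maximum number of generations.
    if generation >= max_generations:
        return True
    # Stop when a satisfactory fitness has been reached (minimization).
    if fitness_target is not None and best_history and best_history[-1] <= fitness_target:
        return True
    # Stop when improvements have been too small for `patience` generations.
    if len(best_history) > patience:
        improvement = best_history[-patience - 1] - best_history[-1]
        if improvement < min_improvement:
            return True
    return False

# Example: a flat best-fitness history triggers the stagnation criterion.
history = [1.0] * 30
print(should_terminate(generation=30, best_history=history))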

Outline

• Introduction
• Working principles
• Tuning
• Applications of genetic algorithms

Applications of EAs

Computer-Aided Design

https://www.youtube.com/watch?v=aR5N2Jl8k14&t=207s

Applications of EAs

Evolvable Electronics

A. Thompson, “An Evolved Circuit, Intrinsic in Silicon, Entwined with Physics” (1996) https://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.50.9691

Applications of EAs

Molecular Design

Applications of EAs

…and many more!

• Image Processing

• Climatology

• Finance and Economics

• Social Sciences

• Quality Control

• Scheduling

T. Alam, “Genetic Algorithm: Reviews, Implementations, and Applications”

https://www.preprints.org/manuscript/202006.0028/v1

Other approaches

How to combine these two approaches?

Hard Computing:
• Theoretically sound
• Robust
• Hardly adaptable/flexible
• Interpretable

Soft Computing:
• Data-driven
• No guarantees of stability
• Very adaptable/flexible
• Black-box (non-interpretable)

Other approaches

Isolated: used alternatively, with a «judge» deciding which one to use

example: following slides

Parallel: used together, merging the outputs

example: voting, average response, or sum (linear controller + neural network compensating the non-linearities)


Other approaches

Cascaded: HC for pre-processing, SC as main algorithm.

example: HC for feature extraction in an image (e.g., SIFT features), SC for classification (MLP).

Assisted/Designed: SC used to refine parameters of the HC

example: following slides

Genetic Algorithms can be combined with other computing units, including other soft computing units.

For instance, a genetic algorithm can be used to tune the hyperparameters of the training process of an ML model, or to find the best rule set of a fuzzy system.
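A hedged sketch of this idea (not an implementation from the slides): a tiny GA over two hyperparameters, where train_and_validate is a hypothetical placeholder that would train a model and return its validation error; here a synthetic function stands in for it.

import random

random.seed(0)

def train_and_validate(learning_rate, hidden_units):
    # Hypothetical placeholder: in practice, train the ML model and return a
    # validation error; a synthetic function stands in for it here.
    return (learning_rate - 0.01) ** 2 * 1e4 + (hidden_units - 64) ** 2 / 1e3

def random_config():
    return [10 ** random.uniform(-4, -1), random.randint(8, 256)]

population = [random_config() for _ in range(12)]
for generation in range(20):
    population.sort(key=lambda c: train_and_validate(*c))   # rank by validation error
    survivors = population[:6]                               # keep the best half
    children = []
    while len(children) < 6:
        p1, p2 = random.sample(survivors, 2)
        child = [random.choice(genes) for genes in zip(p1, p2)]  # uniform crossover
        if random.random() < 0.3:                                # mutation
            child[0] *= 10 ** random.uniform(-0.2, 0.2)
            child[1] = max(8, child[1] + random.randint(-16, 16))
        children.append(child)
    population = survivors + children

best = min(population, key=lambda c: train_and_validate(*c))
print(best)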

Genetic Algorithms for Control

It is possible to use Genetic Algorithms to optimize the weights of a network without computing gradients: the population is a set of randomized weights Θ = {𝜃𝑖, 𝑖 = 1, …, 𝑁}.

Genetic Algorithms are gradient-free methods, since it is sufficient to evaluate and rank the solutions in the population to advance the optimization.

The critical hyperparameters become: population initialization, fraction of survivors, rate and scope of mutation, and kind of recombination.


EAs for Control

[Diagram: a neural network with parameters 𝜃𝑖 receives observations and produces actions; the environment returns a reward. For each individual 𝜃𝑖, run one or more episodes and use the accumulated reward as fitness, 𝑓(𝜃𝑖) = Σ𝑡 𝑅𝑡,𝑖. Rank by fitness, then apply survival, mutation, and recombination. Run for M generations.]
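A minimal sketch of this loop, assuming the network parameters are flattened into a plain vector. The episode_return function is a hypothetical placeholder for running episodes with a network parameterized by 𝜃𝑖 and summing the rewards 𝑅𝑡,𝑖; a synthetic score stands in for it here.

import numpy as np

rng = np.random.default_rng(0)
n_params, N, n_survivors, M = 16, 30, 10, 40

def episode_return(theta):
    # Hypothetical placeholder for running the environment with parameters theta
    # and summing the rewards; here a synthetic score stands in for it.
    return -np.sum((theta - 0.5) ** 2)

population = rng.normal(0.0, 1.0, size=(N, n_params))
for generation in range(M):
    # Fitness of each individual = accumulated episode reward (to maximize).
    fitness = np.array([episode_return(theta) for theta in population])
    survivors = population[np.argsort(fitness)[-n_survivors:]]
    # Refill the population: recombine two random survivors, then mutate.
    parents = survivors[rng.integers(n_survivors, size=(N - n_survivors, 2))]
    children = parents.mean(axis=1) + rng.normal(0.0, 0.05, size=(N - n_survivors, n_params))
    population = np.vstack([survivors, children])

best = population[np.argmax([episode_return(t) for t in population])]
print(episode_return(best))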

There are many genetic algorithms, classified into lots of different categories. Some of them are intuitively described in this great blog post.

In this lecture we are going to use only CMA-ES (Covariance Matrix Adaptation Evolution Strategy), one of the most popular evolutionary algorithms. For full implementation details, refer to this tutorial.

Evolutionary Algorithms

Covariance Matrix Adaptation

1) Initialize parameters 𝜇0, 𝜎0² ∈ ℝ𝑛, with 𝑛 the number of parameters
2) Initialize the population by sampling a multivariate Gaussian: 𝑥𝑖,0 ~ 𝒩(𝜇0, 𝜎0²), 𝑖 = 1, …, 𝑁
3) Evaluate and rank the solutions. Keep only the best 𝑁𝑏𝑒𝑠𝑡
4) Compute the survivors' mean 𝜇𝑔+1 and variance 𝜎𝑔+1²
5) Sample new individuals from the multivariate Gaussian: 𝑥𝑖,𝑔+1 ~ 𝒩(𝜇𝑔+1, 𝜎𝑔+1²)

Iterate for M generations.

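A minimal Python sketch of the loop above, assuming a diagonal variance and a toy objective (names and values are illustrative). Note that the full CMA-ES additionally adapts a full covariance matrix, with evolution paths and step-size control; see the referenced tutorial for the complete algorithm.

import numpy as np

rng = np.random.default_rng(0)

def f(x):
    # Toy objective to minimize (stand-in for the real fitness).
    return np.sum((x - 3.0) ** 2)

n, N, N_best, M = 5, 50, 10, 60

# 1) Initialize mean and (diagonal) variance of the search distribution.
mu, sigma2 = np.zeros(n), np.ones(n)

for g in range(M):
    # 2) / 5) Sample the population from the multivariate Gaussian.
    x = mu + np.sqrt(sigma2) * rng.standard_normal((N, n))
    # 3) Evaluate, rank, and keep only the best N_best individuals.
    best = x[np.argsort([f(xi) for xi in x])[:N_best]]
    # 4) Update mean and variance from the survivors.
    mu, sigma2 = best.mean(axis=0), best.var(axis=0) + 1e-12

print(mu, f(mu))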

Reinforcement Learning Car

[Diagram: a shallow NN maps LIDAR distance readings 𝑑 = (𝑑1, …, 𝑑8) to acceleration and steering commands; e.g., 𝑑 = (1, 0.7, 0.5, 0.7, 1, 0.5, 0.3, 0.5).]

CMA-ES for Car Control

No obstacles, 60 generations, 20 individuals, keep the 10 best.

CMA-ES for obstacle avoidance

With obstacles, same hyperparameters. Reward: +1 per obstacle passed, −100 per collision.

CMA-ES for pendulum control

Won 2nd prize at the Huawei University Challenge.

Evolution of fuzzy systems


Conclusions

Genetic Algorithms are promising for AI, since:

• The search for the optimal solution is not done sequentially, but «in parallel» for each different solution. This allows a much broader exploration and could lead to better solutions.

• There is no need to have a strong mathematical background to optimize a complex problem!

• Tuning of the hyperparameters is more «intuitive» than with neural networks.