Genetic Programming Dinesh Dharme Prateek Srivastav Pankhil Chheda Harshad Kanhere

Genetic Programming

• Dinesh Dharme
• Prateek Srivastav
• Pankhil Chheda
• Harshad Kanhere

Outline

• Introduction
• Motivation
• Theory
• Symbolic Regression using GP
• Applications
• Summary
• References

Inspiration

• Based on Darwin’s concept of “survival of the fittest”.

• Fitter individuals survive and reproduce at a higher rate.

• The structure of individuals in the population changes because of natural selection.

What is Genetic Programming?

• Part of the larger discipline called machine learning, which can best be described as "the study of computer algorithms that improve automatically through experience" (Mitchell 1996).

• It attempts to answer the question: how can computers be made to do what needs to be done without being told exactly how to do it?

• This is where the aspect of Artificial Intelligence comes into play.

Why Genetic Programming?

• It saves time by freeing the human from having to design complex algorithms, and in particular from having to create ones that give optimal solutions.

• “It combines genetic algorithms with the basic thrust of AI, which was to get computers to do things automatically – to evolve a population of programs” - John R. Koza

Algorithm

• Randomly create an initial population of programs from the available primitives.

• Repeat:
    • Execute each program and ascertain its fitness.
    • Select one or two program(s) from the population with a probability based on fitness to participate in genetic operations.
    • Create new individual program(s) by applying genetic operations with specified probabilities.

• Until an acceptable solution is found or some other stopping condition is met.

• Return the best-so-far individual.
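
The loop above can be sketched in Python as follows. This is a minimal skeleton under stated assumptions, not a reference implementation: `run_gp`, `create_individual`, `fitness` and `vary` are hypothetical names, fitness is treated as an error (lower is better), and tournament selection stands in for "selection with a probability based on fitness".

```python
import random


def tournament(population, fitness, size, rng):
    """Pick the best of `size` randomly drawn individuals (lower fitness is better)."""
    return min(rng.sample(population, size), key=fitness)


def run_gp(create_individual, fitness, vary, pop_size=50, max_generations=30,
           target_fitness=0.0, tournament_size=3, rng=None):
    """Generic GP loop; the three callables are supplied by the caller (assumption)."""
    rng = rng if rng is not None else random.Random()
    # Randomly create the initial population.
    population = [create_individual(rng) for _ in range(pop_size)]
    best = min(population, key=fitness)
    for _ in range(max_generations):
        if fitness(best) <= target_fitness:
            break  # an acceptable solution has been found
        # Select parents by fitness and create new individuals from them.
        population = [
            vary(tournament(population, fitness, tournament_size, rng),
                 tournament(population, fitness, tournament_size, rng),
                 rng)
            for _ in range(pop_size)
        ]
        generation_best = min(population, key=fitness)
        if fitness(generation_best) < fitness(best):
            best = generation_best
    # Return the best-so-far individual.
    return best
```

Here `vary` is expected to apply crossover, mutation or reproduction with whatever probabilities the caller chooses; the skeleton only fixes the order of the steps.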

Tree Structure

(+ 1 2 (IF (> TIME 10) 3 4))

• Programs are represented as trees. In the expression above, + is the root and 1, 2 and the IF subtree are its children.
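
As an illustration, the expression above can be written as nested Python tuples and evaluated recursively. The tuple encoding and the `evaluate` helper are assumptions made for this sketch; the slide itself only shows the Lisp-style expression.

```python
# The slide's program (+ 1 2 (IF (> TIME 10) 3 4)) as nested tuples:
# the first element names the function, the rest are its argument subtrees.
PROGRAM = ("+", 1, 2, ("IF", (">", "TIME", 10), 3, 4))


def evaluate(node, env):
    """Evaluate a tree depth first; `env` maps variable names to values."""
    if isinstance(node, tuple):
        name, *args = node
        if name == "IF":
            condition, then_branch, else_branch = args
            chosen = then_branch if evaluate(condition, env) else else_branch
            return evaluate(chosen, env)
        values = [evaluate(arg, env) for arg in args]
        if name == "+":
            return sum(values)
        if name == ">":
            return values[0] > values[1]
        raise ValueError(f"unknown function: {name}")
    if isinstance(node, str):
        return env[node]  # a variable terminal such as TIME
    return node  # a constant terminal


print(evaluate(PROGRAM, {"TIME": 12}))  # 1 + 2 + 3 = 6
print(evaluate(PROGRAM, {"TIME": 5}))   # 1 + 2 + 4 = 7
```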

Basics

• Terminal: The variables and constants in the program (x, y and 3), which form the leaves of the tree.

• Function: The arithmetic operations (+, * and max), which form the internal nodes of the tree.

• Primitive set: Functions and terminals together form the primitive set.

Initialization of population

• Full Method: Select internal nodes from the function set until the maximum depth is reached, then select only from the terminal set. All leaves are therefore at the same level.

• Grow Method: Select nodes from the whole primitive set (functions and terminals) until the maximum depth is reached, then select only from the terminal set. Branches can end early, so trees vary in size and shape.

• Ramped half-and-half: Half the initial population is constructed using full and half using grow, over a range of depth limits rather than a single fixed one.
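
The three initialization methods can be sketched as below. This is an illustrative sketch only: the primitive set reuses the symbols from the Basics slide, all functions are assumed to take two arguments, and the way depth limits are cycled in `ramped_half_and_half` is one plausible choice among several.

```python
import random

FUNCTIONS = {"+": 2, "*": 2, "max": 2}  # function name -> arity (assumed binary)
TERMINALS = ["x", "y", 3]


def gen_tree(max_depth, method, rng):
    """Build a random tree as nested tuples using the 'full' or 'grow' method."""
    if max_depth == 0:
        return rng.choice(TERMINALS)  # depth limit reached: terminals only
    if method == "grow":
        # Grow draws from the whole primitive set, so a branch may stop early.
        primitive = rng.choice(list(FUNCTIONS) + TERMINALS)
        if primitive not in FUNCTIONS:
            return primitive
    else:
        # Full draws only functions until the depth limit is reached.
        primitive = rng.choice(list(FUNCTIONS))
    children = [gen_tree(max_depth - 1, method, rng)
                for _ in range(FUNCTIONS[primitive])]
    return (primitive, *children)


def ramped_half_and_half(pop_size, min_depth, max_depth, rng):
    """Alternate full and grow while cycling through a range of depth limits."""
    depths = range(min_depth, max_depth + 1)
    population = []
    for i in range(pop_size):
        depth_limit = depths[(i // 2) % len(depths)]
        method = "full" if i % 2 == 0 else "grow"
        population.append(gen_tree(depth_limit, method, rng))
    return population


population = ramped_half_and_half(6, 1, 3, random.Random(0))
```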

Initialization of population (Contd.)

• Initialization need not be entirely random.

• If properties of the desired solution are known, produce trees accordingly. This creates an initial bias in the population for faster convergence.

Selection Strategies

• Tournament Selection: The best individual from a set of randomly selected individuals is chosen to be a parent.

• Advantage: Only the ranking within a tournament matters, so fitness is automatically rescaled across the population. A single best program cannot flood the next generation with its children and reduce diversity.

• Disadvantage: An individual that is only marginally better than its competitor still wins the tournament, so a small difference in fitness is amplified in the next generation.
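
A minimal sketch of tournament selection, assuming that higher fitness is better and that `tournament_select` and its parameter `k` (the tournament size) are names chosen for this example:

```python
import random


def tournament_select(population, fitness, k=2, rng=random):
    """Return the fittest of k individuals drawn at random (higher is better)."""
    return max(rng.sample(population, k), key=fitness)


programs = ["a", "b", "c"]
mild = {"a": 1, "b": 2, "c": 3}.get
extreme = {"a": 1, "b": 2, "c": 1000}.get

# Same random draws, different fitness scales: the winners are identical,
# because only the ordering of the contestants matters.
winners_mild = [tournament_select(programs, mild, 2, random.Random(s)) for s in range(10)]
winners_extreme = [tournament_select(programs, extreme, 2, random.Random(s)) for s in range(10)]
print(winners_mild == winners_extreme)  # True
```

The comparison at the end illustrates both bullets above: a fitness of 1000 gives "c" no more weight than a fitness of 3, and a fitness of 3 beats 2 just as surely as 1000 does.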

Selection Strategies (Contd.)

• Fitness-Proportionate Selection: Select each individual with a probability proportional to its fitness value.

[Figure: roulette wheel divided into slices of 0.5, 0.25, 0.17 and 0.08.]
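
Fitness-proportionate (roulette-wheel) selection can be sketched with the standard library's weighted sampling. The four shares are the ones shown in the figure; the program labels "A" to "D" and the function name are assumptions for this example, and fitness is taken to be non-negative with higher values being better.

```python
import random
from collections import Counter


def roulette_select(population, fitnesses, rng=random):
    """Return one individual, chosen with probability fitness / total fitness."""
    return rng.choices(population, weights=fitnesses, k=1)[0]


programs = ["A", "B", "C", "D"]
shares = [0.5, 0.25, 0.17, 0.08]

rng = random.Random(0)
counts = Counter(roulette_select(programs, shares, rng) for _ in range(10_000))
# Over many draws, each program is picked roughly in proportion to its share.
frequencies = {name: counts[name] / 10_000 for name in programs}
```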

Breeding Next Generation

• Crossover, mutation and reproduction are the main operators applied to parents to generate offspring.

• Subtree Crossover: Create the offspring by replacing the subtree rooted at the crossover point in a copy of the first parent with a copy of the subtree rooted at the crossover point in the second parent.

• Reproduction: Simply copy an elite individual to the next generation. This ensures that good genes are passed on.

Subtree Crossover

Crossover contd.

• Most of the nodes of a tree are leaves, so choosing crossover points uniformly at random usually exchanges only a small amount of genetic material.

• John R. Koza suggested choosing functions 90% of the time and terminals 10% of the time as crossover points.
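
Subtree crossover with the 90/10 rule can be sketched as follows, assuming trees are stored as nested tuples (function name first, then children). The helper names `paths`, `replace_at` and `pick_point` are hypothetical; because tuples are immutable, the parents are never modified and the "copy" step is implicit.

```python
import random


def paths(tree, prefix=()):
    """Yield (path, subtree) for every node; a path is a tuple of child indices."""
    yield prefix, tree
    if isinstance(tree, tuple):
        for i, child in enumerate(tree[1:], start=1):
            yield from paths(child, prefix + (i,))


def replace_at(tree, path, new_subtree):
    """Return a new tree with the subtree at `path` replaced by `new_subtree`."""
    if not path:
        return new_subtree
    i = path[0]
    return tree[:i] + (replace_at(tree[i], path[1:], new_subtree),) + tree[i + 1:]


def pick_point(tree, rng, p_function=0.9):
    """Pick a function node with probability p_function, otherwise a terminal."""
    nodes = list(paths(tree))
    functions = [n for n in nodes if isinstance(n[1], tuple)]
    terminals = [n for n in nodes if not isinstance(n[1], tuple)]
    if functions and rng.random() < p_function:
        return rng.choice(functions)
    return rng.choice(terminals)


def subtree_crossover(parent1, parent2, rng):
    """Replace a subtree of parent1 with a subtree taken from parent2."""
    path1, _ = pick_point(parent1, rng)
    _, donor = pick_point(parent2, rng)
    return replace_at(parent1, path1, donor)


child = subtree_crossover(("+", "x", ("*", "x", 3)), ("max", "y", 1), random.Random(0))
```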

Mutation

• Subtree Mutation: Replace the subtree rooted at a randomly chosen mutation point with a randomly generated tree.

• Point Mutation: Randomly select a node and replace it with another primitive of the same arity.
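
Both mutation operators can be sketched on nested-tuple trees. The primitive set, the 0.3 early-stop probability in `random_tree` and all helper names are assumptions made for this example.

```python
import random

FUNCTIONS = {"+": 2, "-": 2, "*": 2}  # function name -> arity
TERMINALS = ["x", 1, 2]


def random_tree(depth, rng):
    """Grow a small random tree; 0.3 is an arbitrary early-stop probability."""
    if depth == 0 or rng.random() < 0.3:
        return rng.choice(TERMINALS)
    name = rng.choice(list(FUNCTIONS))
    return (name, *(random_tree(depth - 1, rng) for _ in range(FUNCTIONS[name])))


def node_paths(tree, prefix=()):
    """Yield the path (tuple of child indices) of every node in the tree."""
    yield prefix
    if isinstance(tree, tuple):
        for i, child in enumerate(tree[1:], start=1):
            yield from node_paths(child, prefix + (i,))


def get_at(tree, path):
    for i in path:
        tree = tree[i]
    return tree


def replace_at(tree, path, new_subtree):
    if not path:
        return new_subtree
    i = path[0]
    return tree[:i] + (replace_at(tree[i], path[1:], new_subtree),) + tree[i + 1:]


def subtree_mutation(tree, rng, max_depth=2):
    """Replace the subtree at a random point with a freshly generated tree."""
    path = rng.choice(list(node_paths(tree)))
    return replace_at(tree, path, random_tree(max_depth, rng))


def point_mutation(tree, rng):
    """Swap one node for a different primitive of the same arity."""
    path = rng.choice(list(node_paths(tree)))
    node = get_at(tree, path)
    if isinstance(node, tuple):
        arity = len(node) - 1
        options = [f for f, a in FUNCTIONS.items() if a == arity and f != node[0]]
        new_node = (rng.choice(options), *node[1:]) if options else node
    else:
        new_node = rng.choice([t for t in TERMINALS if t != node])
    return replace_at(tree, path, new_node)
```

Point mutation keeps the shape of the tree and changes a single label, whereas subtree mutation can change both size and shape.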

Constraints on Operators: Closure

• Type Consistency: Subtree crossover can mix and join nodes arbitrarily. Hence all functions used must be type consistent, i.e. they must return values of the same type and accept arguments of that type.

• Evaluation safety: Every subtree must evaluate to a valid value that its parent can use. For this, protected versions of functions are used (e.g. “/” with a second argument of 0 returns 1).
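
A protected division following the convention stated above (a zero divisor yields 1) might look like this; the function name is chosen for the example.

```python
def protected_div(numerator, denominator):
    """Division that returns 1 instead of failing when the divisor is 0."""
    if denominator == 0:
        return 1.0
    return numerator / denominator


x = 4.0
# An evolved subtree such as x / (x - x) would crash with ordinary division;
# the protected version still hands a usable number to the parent node.
print(protected_div(x, x - x))  # 1.0
print(protected_div(6.0, 3.0))  # 2.0
```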

Sufficiency

• Sufficiency means that a solution to the problem can be expressed as a combination of the primitives used.

• In general, sufficiency cannot be guaranteed. In areas where the theory is well developed, it can guide the choice of primitives.

Fitness Function

• Gives a measure of how good a computer program is at solving the given problem.

• Proper encoding of problem through fitness functions is important for GP's success.

• Measuring fitness of a given program means executing the program.

• Evaluation of program tree is done in depth first search order.

GP Parameters

Control parameters:

• Population size, N

• Maximum number of generations, Gmax

• Probability Pc of crossover

• Probability Pm of mutation

• Probability Pr of reproduction

Five Preparatory steps to set up GP

• Determining the set T of terminals.

• Determining the set F of functions.

• Determining the fitness measures.

• Determining the GP parameters.

• Determining the termination criteria and result designation.

Symbolic Regression using GP

• Symbolic regression: The process of mechanically creating a computer program that fits given numerical data.

• We want to evolve an expression whose values match those of the quadratic polynomial x² + x + 1 in the range [−1, 1].

SYMBOLIC REGRESSION

Independent variable X    Dependent variable Y
-1.00                     1.00
-0.80                     0.84
-0.60                     0.76
-0.40                     0.76
-0.20                     0.84
 0.00                     1.00
 0.20                     1.24
 0.40                     1.56
 0.60                     1.96
 0.80                     2.44
 1.00                     3.00

Preparatory Steps

Objective: Find a computer program with one input (independent variable X) whose output equals the given data

1 Terminal set: T = {X, Random-Constants}

2 Function set: F = {+, -, *, %}, where % denotes protected division

3 Fitness: The sum of the absolute value of the differences between the candidate program’s output and the given data (computed over numerous values of the independent variable x from –1.0 to +1.0)

4 Parameters: Population size M = 4

5 Termination: An individual emerges whose sum of absolute errors is less than 0.1
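
The fitness measure and termination test from these steps can be sketched as follows. The sketch assumes the eleven sample points from the data table, binary functions only, and nested-tuple trees; the names `evaluate`, `fitness` and `is_solution` are hypothetical.

```python
import operator


def protected_div(a, b):
    """The % primitive: division that returns 1 when the divisor is 0."""
    return a / b if b != 0 else 1.0


OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul, "%": protected_div}

# Sample points -1.0, -0.8, ..., 1.0 and the target values x^2 + x + 1.
XS = [i / 5 - 1 for i in range(11)]
YS = [x * x + x + 1 for x in XS]


def evaluate(tree, x):
    """Evaluate a nested-tuple expression for a given value of X."""
    if isinstance(tree, tuple):
        op, left, right = tree
        return OPS[op](evaluate(left, x), evaluate(right, x))
    return x if tree == "X" else tree


def fitness(tree):
    """Sum of absolute errors over the sample points (lower is better)."""
    return sum(abs(evaluate(tree, x) - y) for x, y in zip(XS, YS))


def is_solution(tree):
    """Termination criterion: total absolute error below 0.1."""
    return fitness(tree) < 0.1


candidate = ("+", "X", 1)                      # x + 1
exact = ("+", ("*", "X", "X"), ("+", "X", 1))  # x*x + (x + 1)
```

With these sample points, `candidate` misses the target by x² at every point, so its error is the sum of the squared sample values, while `exact` satisfies the termination criterion.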

Symbolic Regression (initialization)

• Two parental chromosomes exchange part of their genetic information to create new hybrid combinations (recombinant).

• No loss of genes, but an exchange of genes between two previous chromosomes.

• No new genes are created; preexisting ones are mixed together.

Generation 0 with 4 individuals

Symbolic Regression (fitness)

Fitness (error) of the four individuals in generation 0:

(a) x + 1: 0.67
(b) x² + 1: 1.00
(c) 2: 1.70
(d) x: 2.67
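
The four fitness values are consistent with the area between each candidate's curve and the target x² + x + 1 over [−1, 1]. That reading is an inference from the numbers, not something the slide states. A short numerical check, using a midpoint rule with an arbitrarily chosen number of steps:

```python
def area_between(candidate, target, lo=-1.0, hi=1.0, steps=20_000):
    """Approximate the integral of |candidate - target| with the midpoint rule."""
    width = (hi - lo) / steps
    total = 0.0
    for i in range(steps):
        x = lo + (i + 0.5) * width
        total += abs(candidate(x) - target(x))
    return total * width


def target(x):
    return x * x + x + 1


candidates = {
    "x + 1": lambda x: x + 1,
    "x^2 + 1": lambda x: x * x + 1,
    "2": lambda x: 2.0,
    "x": lambda x: x,
}

errors = {name: round(area_between(f, target), 2) for name, f in candidates.items()}
print(errors)  # {'x + 1': 0.67, 'x^2 + 1': 1.0, '2': 1.7, 'x': 2.67}
```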

Symbolic Regression (Generation 1)

Reproduction: Copy of (a)

Mutation of (c): Picking “2” as mutation point

First offspring of crossover of (a) and (b) picking “+” of parent (a) and left-most “x” of parent (b) as crossover points

Second offspring of crossover of (a) and (b) picking “+” of parent (a) and left-most “x” of parent (b) as crossover points
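
The crossover step can be traced by hand. The sketch below assumes that (a) is the tree (+ x 1) and (b) is the tree (+ (* x x) 1); the slide's figures are not available here, so the exact tree shapes are an assumption. Under that assumption the two offspring are x and (x + 1) * x + 1, and the second one equals the target polynomial.

```python
def get_at(tree, path):
    """Return the subtree reached by following child indices in `path`."""
    for i in path:
        tree = tree[i]
    return tree


def replace_at(tree, path, new_subtree):
    """Return a new tree with the subtree at `path` replaced."""
    if not path:
        return new_subtree
    i = path[0]
    return tree[:i] + (replace_at(tree[i], path[1:], new_subtree),) + tree[i + 1:]


def evaluate(tree, x):
    if isinstance(tree, tuple):
        op, left, right = tree
        a, b = evaluate(left, x), evaluate(right, x)
        return a + b if op == "+" else a * b
    return x if tree == "x" else tree


parent_a = ("+", "x", 1)              # x + 1 (assumed shape)
parent_b = ("+", ("*", "x", "x"), 1)  # x*x + 1 (assumed shape)
point_a = ()       # the "+" at the root of (a)
point_b = (1, 1)   # the left-most "x" of (b)

# Swap the two selected subtrees to obtain the two offspring.
first = replace_at(parent_a, point_a, get_at(parent_b, point_b))
second = replace_at(parent_b, point_b, get_at(parent_a, point_a))

print(first)   # x
print(second)  # ('+', ('*', ('+', 'x', 1), 'x'), 1)
```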

Areas where GP will do well

• The interrelationships among the relevant variables are unknown or poorly understood.

• Finding the size and shape of the ultimate solution is a major part of the problem.

• Search domain is very large and solutions are sparsely distributed.

• There are good simulators to test the performance of tentative solutions to a problem, but poor methods to directly obtain good solutions.

Applications

• GP was used to design an antenna for deployment on NASA’s Space Technology 5 Mission.

• Designs generated are complex, non-intuitive, and 100% compliant with the mission antenna performance requirements.

Applications contd.

• Result of Genetic Programming which was allowed to do branching.

• Below we show the genotype for this antenna.

Ref. [4]

• forward(L,R): creates a rod of length L and radius R.

• rotate-x(rad): rotates the rod by angle rad around the x axis. The other rotations are defined similarly.

Applications contd.

• Result of genetic algorithm.

• Branching not allowed.

Ref. [4]

Summary

• Genetic programming now routinely delivers high-return human-competitive machine intelligence.

• The AI ratio (the “artificial-to-intelligence” ratio) of a problem-solving method is the ratio of that which is delivered by the automated operation of the artificial method to the amount of intelligence that is supplied by the human applying the method to a particular problem.

• GP has very high AI ratio.

Summary

• Routine: A problem-solving method is routine if it is general and relatively little human effort is required to get the method to successfully handle new problems within a particular domain and to successfully handle new problems from a different domain.

• Human-Competitiveness: A result is human-competitive if, for example, it was previously patented, is an improvement over a patented invention, would be patentable today, or is publishable in its own right as a new scientific result.

References

[1] Book: “A Field Guide to Genetic Programming” by Poli, Langdon, McPhee. 2008.
[2] www.genetic-programming.com
[3] http://en.wikipedia.org/wiki/Genetic_programming
[4] Figures: “An Evolved Antenna for Deployment on NASA’s ST5 Mission” by J. Lohn, G. Hornby, D. Linden, GTPT, May-2004

Suggested Readings

1. Book: “Genetic Programming: On the Programming of Computers by Means of Natural Selection” by John R. Koza, MIT Press
2. Book: “Adaptation in Natural and Artificial Systems” by John H. Holland, MIT Press