Digression: Symbolic Regression

Suppose you are a criminologist, and you have some data about recidivism.

Years in Prison   Holds Ph.D.   IQ    Injects Heroin in Eyeballs   Recidivist
10                0             87    1                            1
4                 1             86    0                            0
22                1             186   1                            1
6                 0             108   0                            1
8                 0             143   0                            0
:                 :             :     :                            :

Criminology 101

You want a formula that predicts if someone will go back to jail after being released.

The formula will be based on the data collected, so the “independent variables” are
– x1 = number of years in jail
– x2 = holds Ph.D.
– x3 = IQ
– etc.

This is usually done with “regression”. Here is a simpler example, with one independent variable.

Symbolic Regression

A simple data set with one independent variable, called x. What’s the relationship between x and y?

(Figure: scatter plot of y against x.)

x   y
1   2.1
2   3.3
4   3.1
5   1.8
7   3.2
:   :

Symbolic Regression

You might try “linear regression”:

(Figure: y plotted against x.)

$y = mx + b$
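
For a straight line, the best slope m and intercept b have a closed form (ordinary least squares). The sketch below is only an illustration, not the course's implementation; the class name LinearFit is a hypothetical choice. It fits the five sample points from the table above.

```java
/** Illustrative ordinary least-squares fit of y = m*x + b (hypothetical class name). */
public class LinearFit {
    /** Returns {m, b} minimizing the sum of squared residuals. */
    static double[] fit(double[] x, double[] y) {
        int n = x.length;
        double sx = 0, sy = 0, sxx = 0, sxy = 0;
        for (int i = 0; i < n; i++) {
            sx += x[i];
            sy += y[i];
            sxx += x[i] * x[i];
            sxy += x[i] * y[i];
        }
        double m = (n * sxy - sx * sy) / (n * sxx - sx * sx); // slope
        double b = (sy - m * sx) / n;                           // intercept
        return new double[] {m, b};
    }

    public static void main(String[] args) {
        double[] x = {1, 2, 4, 5, 7};
        double[] y = {2.1, 3.3, 3.1, 1.8, 3.2};
        double[] mb = fit(x, y);
        System.out.printf("y = %.4f x + %.4f%n", mb[0], mb[1]);
    }
}
```

On those five points the formulas give m = 1/19 ≈ 0.053 and b = 2.5, an almost flat line.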

Symbolic Regression

You might try “quadratic regression”:

(Figure: y plotted against x.)

$y = ax^2 + bx + c$
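
Quadratic regression is still linear in the unknown coefficients a, b and c, so it can be solved exactly from the 3 × 3 “normal equations.” The sketch below assumes that plain Gauss-Jordan elimination is good enough for such a small system; QuadraticFit is a hypothetical name.

```java
/** Illustrative least-squares fit of y = a*x^2 + b*x + c via the normal equations (hypothetical name). */
public class QuadraticFit {
    /** Returns {a, b, c}. */
    static double[] fit(double[] x, double[] y) {
        // Augmented 3x4 matrix [A^T A | A^T y]; each data point contributes the row (x^2, x, 1).
        double[][] m = new double[3][4];
        for (int i = 0; i < x.length; i++) {
            double[] row = {x[i] * x[i], x[i], 1.0};
            for (int r = 0; r < 3; r++) {
                for (int c = 0; c < 3; c++) m[r][c] += row[r] * row[c];
                m[r][3] += row[r] * y[i];
            }
        }
        // Gauss-Jordan elimination with partial pivoting.
        for (int col = 0; col < 3; col++) {
            int piv = col;
            for (int r = col + 1; r < 3; r++) {
                if (Math.abs(m[r][col]) > Math.abs(m[piv][col])) piv = r;
            }
            double[] tmp = m[col];
            m[col] = m[piv];
            m[piv] = tmp;
            for (int r = 0; r < 3; r++) {
                if (r == col) continue;
                double f = m[r][col] / m[col][col];
                for (int c = col; c < 4; c++) m[r][c] -= f * m[col][c];
            }
        }
        // The left 3x3 block is now diagonal, so each coefficient is a single division.
        return new double[] {m[0][3] / m[0][0], m[1][3] / m[1][1], m[2][3] / m[2][2]};
    }

    public static void main(String[] args) {
        double[] x = {1, 2, 4, 5, 7};
        double[] y = {2.1, 3.3, 3.1, 1.8, 3.2};
        double[] abc = fit(x, y);
        System.out.printf("y = %.4f x^2 + %.4f x + %.4f%n", abc[0], abc[1], abc[2]);
    }
}
```

Both this fit and the linear one fix the form of the equation in advance and only search for its coefficients.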

Symbolic Regression

You might try “power-law regression”:

(Figure: y plotted against x.)

$y = ax^b + c$

Symbolic Regression

How would you choose? Maybe there is some underlying “mechanism” that produced the data. But you may not know…

“Symbolic regression” finds the form of the equation and its coefficients simultaneously.

How To Do Symbolic Regression?

One way: genetic programming, “the evolution of computer programs through natural selection.”

The brainchild of John Koza, extending John Holland’s work on genetic algorithms.

A very bizarre idea that actually works! We will do this.

Regression via Genetic Programming

We know how to produce “algebraic expression trees.” We can even form them randomly.

Koza says, “Make a generation of random trees, evaluate their fitnesses, then let the more fit have sex to produce children.”

Maybe the children will be more fit?

Expression Trees Again

A one-variable tree is a regression equation:

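(Figure: the expression tree for this equation, with + at the root, a - subtree computing (x + 0.5) - x, and a * subtree computing 2 * x.)

$y = (((x + 0.5) - x) + (2 \cdot x))$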
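
In Java, an expression tree maps naturally onto a small set of node types: a leaf for the variable x, a leaf for a constant, and an operator node whose eval method evaluates its two children recursively. This is a sketch under assumptions: the names ExprTree, Node, Var, Const and BinOp are hypothetical, it uses Java 16+ records, and it supports only the +, - and * operators that appear in this example. It rebuilds the tree above and evaluates it.

```java
/** Illustrative expression-tree classes (hypothetical names, Java 16+ records). */
public class ExprTree {
    /** Any node can be evaluated for a given value of x. */
    interface Node {
        double eval(double x);
    }

    /** Leaf holding the independent variable x. */
    record Var() implements Node {
        public double eval(double x) { return x; }
        public String toString() { return "x"; }
    }

    /** Leaf holding a numeric constant. */
    record Const(double value) implements Node {
        public double eval(double x) { return value; }
        public String toString() { return Double.toString(value); }
    }

    /** Interior node: applies a binary operator to the values of its two children. */
    record BinOp(char op, Node left, Node right) implements Node {
        public double eval(double x) {
            double l = left.eval(x);
            double r = right.eval(x);
            return switch (op) {
                case '+' -> l + r;
                case '-' -> l - r;
                case '*' -> l * r;
                default -> throw new IllegalStateException("unsupported operator " + op);
            };
        }
        public String toString() { return "(" + left + " " + op + " " + right + ")"; }
    }

    /** Builds the tree for y = (((x + 0.5) - x) + (2 * x)). */
    static Node slideTree() {
        Node x = new Var();
        return new BinOp('+',
                new BinOp('-', new BinOp('+', x, new Const(0.5)), x),
                new BinOp('*', new Const(2), x));
    }

    public static void main(String[] args) {
        Node tree = slideTree();
        System.out.println(tree);          // (((x + 0.5) - x) + (2.0 * x))
        System.out.println(tree.eval(4));  // 8.5
    }
}
```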

Evaluating Expression Trees

$y^p = (((x + 0.5) - x) + (2 \cdot x))$

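x   $y^o$   $y^p$   $|y^o - y^p|^2$
1   2.1      2.5       0.16
2   3.3      4.5       1.44
4   3.1      8.5      29.16
5   1.8     10.5      75.69
7   3.2     14.5     127.69
                     234.14 = “fitness”

Superscripts: “o” for “observed,” “p” for “predicted.” The fitness is the sum of the squared errors in the last column, so a smaller fitness value means a better fit.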
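
Computing this fitness takes only a loop over the data. To keep the sketch self-contained, the candidate tree is passed as a java.util.function.DoubleUnaryOperator rather than as a tree object; the class Fitness and the method sse are hypothetical names.

```java
import java.util.Locale;
import java.util.function.DoubleUnaryOperator;

/** Illustrative fitness computation (hypothetical names). */
public class Fitness {
    /** Sum over the data of (observed - predicted)^2; lower means a better fit. */
    static double sse(DoubleUnaryOperator model, double[] x, double[] yObserved) {
        double total = 0;
        for (int i = 0; i < x.length; i++) {
            double diff = yObserved[i] - model.applyAsDouble(x[i]);
            total += diff * diff;
        }
        return total;
    }

    public static void main(String[] args) {
        double[] x = {1, 2, 4, 5, 7};
        double[] y = {2.1, 3.3, 3.1, 1.8, 3.2};
        // The tree (((x + 0.5) - x) + (2 * x)) written as a lambda.
        DoubleUnaryOperator tree = v -> ((v + 0.5) - v) + (2 * v);
        System.out.printf(Locale.ROOT, "fitness = %.2f%n", sse(tree, x, y)); // fitness = 234.14
    }
}
```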

A Generation of Random Trees

(Figure: four randomly generated trees, labeled Tree 1 to Tree 4.)

Tree   Fitness
1       335
2      1530
3       950
4       146
:         :

(Most of these are really rotten!)
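
One common way to generate a random tree is Koza's “grow” method: at each level, either stop with a random terminal (x or a constant) or continue with a random operator, never exceeding a maximum depth. The depth limit of 4, the 30% chance of stopping early, the constant range [-5, 5) and the names RandomTrees and grow are assumptions chosen for illustration, not values from these slides.

```java
import java.util.Locale;
import java.util.Random;

/** Illustrative random-tree generation in the style of the "grow" method (hypothetical names). */
public class RandomTrees {
    interface Node { double eval(double x); }

    record Var() implements Node {
        public double eval(double x) { return x; }
        public String toString() { return "x"; }
    }

    record Const(double v) implements Node {
        public double eval(double x) { return v; }
        public String toString() { return String.format(Locale.ROOT, "%.2f", v); }
    }

    record BinOp(char op, Node l, Node r) implements Node {
        public double eval(double x) {
            double a = l.eval(x), b = r.eval(x);
            return switch (op) {
                case '+' -> a + b;
                case '-' -> a - b;
                default -> a * b; // '*'
            };
        }
        public String toString() { return "(" + l + " " + op + " " + r + ")"; }
    }

    static final char[] OPS = {'+', '-', '*'};

    /** Returns a random tree no deeper than maxDepth (a lone leaf has depth 0). */
    static Node grow(Random rng, int maxDepth) {
        // Stop with a terminal at the depth limit, or with an assumed 30% chance before it.
        if (maxDepth == 0 || rng.nextDouble() < 0.3) {
            return rng.nextBoolean() ? new Var() : new Const(rng.nextDouble() * 10 - 5);
        }
        char op = OPS[rng.nextInt(OPS.length)];
        return new BinOp(op, grow(rng, maxDepth - 1), grow(rng, maxDepth - 1));
    }

    /** Depth of a tree: 0 for a leaf, 1 + the deeper child for an operator node. */
    static int depth(Node n) {
        return n instanceof BinOp b ? 1 + Math.max(depth(b.l()), depth(b.r())) : 0;
    }

    public static void main(String[] args) {
        Random rng = new Random(42);
        Node[] generation = new Node[500]; // Generation 1: 500 random trees
        for (int i = 0; i < generation.length; i++) generation[i] = grow(rng, 4);
        for (int i = 0; i < 4; i++) {
            System.out.println("Tree " + (i + 1) + ": " + generation[i]);
        }
    }
}
```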

Choosing Parents

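(Figure: Trees 1 to 4 of Generation 1, two of which are marked as parents.)

Tree   Fitness
1       335
2      1530
3       950
4       146
:         :

Choose these two, randomly, “proportional to their fitness.” Because the fitness values above are errors, the fitter trees are those with smaller values, and they should be the ones most likely to be chosen.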
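
Fitness-proportional (“roulette wheel”) selection gives each tree a slice of the wheel in proportion to its fitness. Since the numbers in this table are errors, they must be inverted first; this sketch assumes Koza's “adjusted fitness,” 1 / (1 + error). RouletteSelection, adjusted and select are hypothetical names.

```java
import java.util.Random;

/** Illustrative roulette-wheel selection on error-style fitness values (hypothetical names). */
public class RouletteSelection {
    /** Adjusted fitness: maps an error >= 0 into (0, 1]; a smaller error gives a larger value. */
    static double adjusted(double error) {
        return 1.0 / (1.0 + error);
    }

    /** Returns an index chosen with probability proportional to its adjusted fitness. */
    static int select(double[] errors, Random rng) {
        double total = 0;
        for (double e : errors) total += adjusted(e);
        double spin = rng.nextDouble() * total; // where the wheel stops
        double running = 0;
        for (int i = 0; i < errors.length; i++) {
            running += adjusted(errors[i]);
            if (spin < running) return i;
        }
        return errors.length - 1; // only reached through floating-point round-off
    }

    public static void main(String[] args) {
        double[] errors = {335, 1530, 950, 146}; // Trees 1-4 from the table above
        int[] counts = new int[errors.length];
        Random rng = new Random(1);
        for (int k = 0; k < 100_000; k++) counts[select(errors, rng)]++;
        for (int i = 0; i < counts.length; i++) {
            System.out.println("Tree " + (i + 1) + " chosen " + counts[i] + " times");
        }
    }
}
```

With these four errors the probabilities come out to roughly 59% for Tree 4, 26% for Tree 1, 9% for Tree 3 and 6% for Tree 2.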

“Sexual Reproduction”

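Choose “crossover points” at random, one in each parent tree.

Then swap the subtrees rooted at those points to make two new child trees.

(Figure: two parent trees from Generation 1 and the two child trees they produce for Generation 2.)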
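
Subtree crossover is easiest to code if every node is addressed by its position in a preorder walk (root first, then the left subtree, then the right). The sketch below picks one position in each parent, takes out the two subtrees, and builds each child by putting the other parent's subtree in its place. To keep it short, the leaves are bare labels and the trees are not evaluated; the class Crossover and its methods are hypothetical names.

```java
import java.util.Random;

/** Illustrative subtree crossover on immutable expression trees (hypothetical names). */
public class Crossover {
    interface Node {}

    record Leaf(String name) implements Node { // leaves are bare labels in this sketch
        public String toString() { return name; }
    }

    record BinOp(char op, Node l, Node r) implements Node {
        public String toString() { return "(" + l + " " + op + " " + r + ")"; }
    }

    /** Number of nodes; preorder positions run from 0 to size - 1. */
    static int size(Node n) {
        return n instanceof BinOp b ? 1 + size(b.l()) + size(b.r()) : 1;
    }

    /** Subtree rooted at preorder position k (root = 0, then left subtree, then right). */
    static Node nodeAt(Node n, int k) {
        if (k == 0) return n;
        BinOp b = (BinOp) n; // k > 0 means n must have children
        int leftSize = size(b.l());
        return k <= leftSize ? nodeAt(b.l(), k - 1) : nodeAt(b.r(), k - 1 - leftSize);
    }

    /** A new tree equal to n except that the subtree at position k is replaced by repl. */
    static Node replaceAt(Node n, int k, Node repl) {
        if (k == 0) return repl;
        BinOp b = (BinOp) n;
        int leftSize = size(b.l());
        return k <= leftSize
                ? new BinOp(b.op(), replaceAt(b.l(), k - 1, repl), b.r())
                : new BinOp(b.op(), b.l(), replaceAt(b.r(), k - 1 - leftSize, repl));
    }

    /** Chooses one random crossover point per parent and swaps the subtrees. */
    static Node[] crossover(Node mom, Node dad, Random rng) {
        int i = rng.nextInt(size(mom));
        int j = rng.nextInt(size(dad));
        Node fromMom = nodeAt(mom, i);
        Node fromDad = nodeAt(dad, j);
        return new Node[] {replaceAt(mom, i, fromDad), replaceAt(dad, j, fromMom)};
    }

    public static void main(String[] args) {
        Node x = new Leaf("x");
        Node mom = new BinOp('+', new BinOp('*', new Leaf("2"), x), x); // ((2 * x) + x)
        Node dad = new BinOp('-', x, new Leaf("0.5"));                  // (x - 0.5)
        Node[] kids = crossover(mom, dad, new Random(1));
        System.out.println("child 1: " + kids[0]);
        System.out.println("child 2: " + kids[1]);
    }
}
```

Because the nodes are immutable records, the children share unchanged subtrees with their parents, and the parents themselves are never modified, so they remain available for later selections.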

The Steps

1. Create Generation 1 by randomly generating 500 trees.
2. Find the fitness of each tree.
3. Choose pairs of parent trees, proportional to their fitness.
4. Perform crossover on each pair to make two child trees, adding them to Generation 2.
5. Continue until there are 500 child trees in Generation 2.
6. Repeat steps 2 to 5 for 50 generations, keeping the best (most fit) tree over all generations.
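
The six steps can be combined into one short program. This is a compact sketch, not the author's implementation, and it fills in details the slides leave open: trees use only +, - and * over x and one-decimal constants in [-5, 5]; initial trees are grown to depth 4; selection uses 1 / (1 + error); and any child larger than 100 nodes is replaced by its parent, a simple guard against trees growing without bound. SymbolicRegressionGP and all of its methods are hypothetical names.

```java
import java.util.Random;

/** Illustrative GP loop for the slide's five data points (hypothetical names throughout). */
public class SymbolicRegressionGP {
    interface Node { double eval(double x); int size(); }
    record Leaf(boolean isX, double v) implements Node { // either the variable x or a constant
        public double eval(double x) { return isX ? x : v; }
        public int size() { return 1; }
        public String toString() { return isX ? "x" : Double.toString(v); }
    }

    record BinOp(char op, Node l, Node r) implements Node {
        public double eval(double x) {
            double a = l.eval(x), b = r.eval(x);
            return switch (op) { case '+' -> a + b; case '-' -> a - b; default -> a * b; };
        }
        public int size() { return 1 + l.size() + r.size(); }
        public String toString() { return "(" + l + " " + op + " " + r + ")"; }
    }

    static final double[] X = {1, 2, 4, 5, 7};
    static final double[] Y = {2.1, 3.3, 3.1, 1.8, 3.2};
    static final char[] OPS = {'+', '-', '*'};
    static final int MAX_SIZE = 100; // assumed cap on child size

    static Node grow(Random rng, int depth) { // "grow"-style random tree
        if (depth == 0 || rng.nextDouble() < 0.3) {
            double c = Math.round(rng.nextDouble() * 100 - 50) / 10.0; // one decimal place, in [-5, 5]
            return rng.nextBoolean() ? new Leaf(true, 0) : new Leaf(false, c);
        }
        return new BinOp(OPS[rng.nextInt(OPS.length)], grow(rng, depth - 1), grow(rng, depth - 1));
    }

    static double error(Node t) { // sum of squared errors; lower is better
        double s = 0;
        for (int i = 0; i < X.length; i++) {
            double d = Y[i] - t.eval(X[i]);
            s += d * d;
        }
        return Double.isFinite(s) ? s : Double.MAX_VALUE; // overflowing trees get the worst score
    }

    static Node nodeAt(Node n, int k) { // subtree at preorder position k
        if (k == 0) return n;
        BinOp b = (BinOp) n;
        int ls = b.l().size();
        return k <= ls ? nodeAt(b.l(), k - 1) : nodeAt(b.r(), k - 1 - ls);
    }
    static Node replaceAt(Node n, int k, Node repl) { // copy of n with position k replaced
        if (k == 0) return repl;
        BinOp b = (BinOp) n;
        int ls = b.l().size();
        return k <= ls ? new BinOp(b.op(), replaceAt(b.l(), k - 1, repl), b.r())
                       : new BinOp(b.op(), b.l(), replaceAt(b.r(), k - 1 - ls, repl));
    }
    static Node select(Node[] pop, double[] err, Random rng) { // roulette wheel on 1 / (1 + error)
        double total = 0;
        for (double e : err) total += 1 / (1 + e);
        double spin = rng.nextDouble() * total, acc = 0;
        for (int i = 0; i < pop.length; i++) {
            acc += 1 / (1 + err[i]);
            if (spin < acc) return pop[i];
        }
        return pop[pop.length - 1];
    }

    static Node run(long seed, int popSize, int generations) {
        Random rng = new Random(seed);
        Node[] pop = new Node[popSize];
        for (int i = 0; i < popSize; i++) pop[i] = grow(rng, 4);          // step 1
        Node best = pop[0];
        double bestErr = error(best);
        for (int g = 1; g <= generations; g++) {
            double[] err = new double[popSize];
            for (int i = 0; i < popSize; i++) {                               // step 2
                err[i] = error(pop[i]);
                if (err[i] < bestErr) { bestErr = err[i]; best = pop[i]; }    // step 6: best so far
            }
            if (g == generations) break;                                      // no breeding after the last one
            Node[] next = new Node[popSize];
            for (int i = 0; i < popSize; i += 2) {                            // steps 3 to 5
                Node mom = select(pop, err, rng), dad = select(pop, err, rng);
                int a = rng.nextInt(mom.size()), b = rng.nextInt(dad.size());
                Node kid1 = replaceAt(mom, a, nodeAt(dad, b));
                Node kid2 = replaceAt(dad, b, nodeAt(mom, a));
                next[i] = kid1.size() <= MAX_SIZE ? kid1 : mom;
                if (i + 1 < popSize) next[i + 1] = kid2.size() <= MAX_SIZE ? kid2 : dad;
            }
            pop = next;
        }
        return best;
    }

    public static void main(String[] args) {
        Node best = run(2024, 500, 50);
        System.out.println(best + "   error = " + error(best));
    }
}
```

The call run(2024, 500, 50) uses the population size and generation count from the slide; the seed is arbitrary, and different seeds will give different best trees.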

How Could This Possibly Work?

No one seems to be able to say… John Holland proved something called the “schema theorem,” but it really doesn’t explain much.

It’s a highly “parallel” process that recombines “good” building blocks.

It really does work very well for a huge variety of hard problems!

Why This, in a Java Course?

Because we’re going to implement it! Because writing code to implement this isn’t too hard. Because it illustrates a large number of O-O and Java ideas. Because it’s fun!

Here is what my implementation looks like:
