Symbolic Regression with Interaction-Transformation
Prof. Fabricio Olivetti de França
Federal University of ABC
Center for Mathematics, Computation and Cognition (CMCC)
Heuristics, Analysis and Learning Laboratory (HAL)
July 12, 2018


Symbolic Regression with Interaction-Transformation

Prof. Fabricio Olivetti de França

Federal University of ABC
Center for Mathematics, Computation and Cognition (CMCC)
Heuristics, Analysis and Learning Laboratory (HAL)

July 12, 2018

Topics

1. Goals

2. Interaction-Transformation

3. Experiments

4. Conclusions


Goals

Find the function

Find a function f̂ that minimizes the approximation error:

minimize ‖ε‖²
subject to f(x) = f̂(x) + ε

Simple is better

• Ideally, this function should be as simple as possible.
• Conflicting interests:
  • minimize approximation error (use universal approximators)
  • maximize simplicity (walk away from generic approximators)

The good. . .

The Linear Regression:

f (x,w) = w · x.

• Very simple (and yet useful) model.
• Clear interpretation.
• The variables may be non-linear transformations of the original variables.
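As a minimal sketch of that last point (variable names are illustrative, not from the talk): fit the single weight of f(x, w) = w · x by least squares, where the input fed to the linear model is a non-linear transformation of the original variable.

```python
import math

# Sketch: fitting f(x, w) = w * x by least squares, where the model
# input is a non-linear transformation (log) of the original variable.
# All names here are illustrative.
def fit_weight(xs, ys):
    """Closed-form least squares for y ~ w * x (no intercept)."""
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)

raw = [1.5, 2.0, 3.0, 4.0]                # original variable
transformed = [math.log(v) for v in raw]  # non-linear transformation
target = [2.0 * t for t in transformed]   # generated from y = 2*log(x)

w = fit_weight(transformed, target)       # recovers w = 2
```

The model stays linear in w, so the clear interpretation survives even though the feature itself is non-linear.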

. . . the bad. . .

The mean:

f̂(x) = f̄ (the mean of the observed values)

• The average can lie!
• (It's just here for the sake of the pun.)

. . . the Deep Learning

A deep chaining of non-linear transformations that just works!

• Universal approximation
• Alien mathematical expressions

The best?

Despite its success with error minimization, it raises some questions:

• What does the answer mean?
• What if the data is wrong?

Symbolic Regression

Searches for both the function form and its parameters.

Hopefully, a simple function.

Symbolic Regression

Disclaimer: I have extensive experience with evolutionary algorithms, but limited experience with Symbolic Regression; I started studying it last year.

Symbolic Regression

This was my first experience with GP:

6.379515826309025e-3
- 0.00 * id(x1^-4.0 * x2^3.0 * x3^1.0)
- 0.00 * id(x1^-4.0 * x2^3.0 * x3^2.0)
- 0.01 * id(x1^-4.0 * x2^3.0 * x3^3.0)
- 0.02 * id(x1^-4.0 * x2^3.0 * x3^4.0)
+ 0.01 * cos(x1^-3.0 * x2^-1.0)
+ 0.01 * cos(x1^-3.0)
+ 0.01 * cos(x1^-3.0 * x3^1.0)
+ 0.01 * cos(x1^-3.0 * x2^1.0)
+ 0.01 * cos(x1^-2.0 * x2^-2.0)
- 0.06 * log(x1^-2.0 * x2^-2.0)
+ 0.01 * cos(x1^-2.0 * x2^-1.0)
+ 0.01 * cos(x1^-2.0 * x2^-1.0 * x3^1.0)
+ 0.01 * cos(x1^-2.0)
+ 0.01 * cos(x1^-2.0 * x3^1.0)
+ 0.01 * cos(x1^-2.0 * x3^2.0)
+ 0.01 * cos(x1^-2.0 * x2^1.0)
+ 0.01 * cos(x1^-2.0 * x2^1.0 * x3^1.0)
- 0.00 * id(x1^-2.0 * x2^2.0)
- 0.00 * sin(x1^-2.0 * x2^2.0)
+ 0.01 * cos(x1^-2.0 * x2^2.0)
+ ...

Why?

• Infinite search space
• Redundancy
• Rugged space

Redundancy

f(x) = x − x³/6 + x⁵/120 − x⁷/5040

f(x) = 16x(π − x) / (5π² − 4x(π − x))

f(x) = sin(x)

Three syntactically different expressions for (approximately) the same function.
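The redundancy can be checked numerically; a small sketch comparing the polynomial (Taylor) and rational (Bhaskara) forms against sin(x):

```python
import math

# Two syntactically different expressions approximating sin(x),
# illustrating redundancy in the symbolic regression search space.
def taylor_sin(x):
    return x - x**3/6 + x**5/120 - x**7/5040

def bhaskara_sin(x):  # rational approximation, valid for x in [0, pi]
    return 16*x*(math.pi - x) / (5*math.pi**2 - 4*x*(math.pi - x))

x = 1.0
abs(taylor_sin(x) - math.sin(x)) < 1e-3    # True
abs(bhaskara_sin(x) - math.sin(x)) < 1e-2  # True
```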

Rugged space

[Plots: two expression trees differing in a single node, x + cos(x) and x × cos(x), produce very different curves over [−10, 10].]

What I wanted

• A few additive terms (a linear regression of transformed variables)
• Each term is an interaction of a couple of variables
• A maximum of one non-linear function applied to each interaction (no chaining)

Interaction-Transformation

Interaction-Transformation

Constrains the search space to what I want: a linear combination of different transformation functions applied to interactions of the original variables.

Interaction-Transformation

Essentially, this pattern:

f(x) = Σ_i w_i · t_i(p_i(x))

p_i(x) = Π_{j=1..d} x_j^{k_j}

t_i ∈ {id, sin, cos, tan, √, log, ...}

Interaction-Transformation

Valid expressions:

• 5.1 · x1 + 0.2 · x2
• 3.5 · sin(x1² · x2) + 5 · log(x2³ / x1)
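A small sketch (helper names are mine, not from the talk) of evaluating an IT expression represented as a list of (weight, transformation, exponent-vector) terms; the first valid expression, 5.1 · x1 + 0.2 · x2, becomes:

```python
# Sketch: an IT expression as a list of (weight, transformation,
# exponent-vector) terms, following f(x) = sum_i w_i * t_i(p_i(x)).
def interaction(x, k):
    """p_i(x) = prod_j x_j ** k_j"""
    p = 1.0
    for xj, kj in zip(x, k):
        p *= xj ** kj
    return p

def evaluate_it(terms, x):
    return sum(w * t(interaction(x, k)) for w, t, k in terms)

# 5.1*x1 + 0.2*x2 as IT terms (id transformation, single-variable interactions):
expr = [(5.1, lambda z: z, (1, 0)),
        (0.2, lambda z: z, (0, 1))]

evaluate_it(expr, (1.0, 1.0))  # 5.1 + 0.2 = 5.3
```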

Interaction-Transformation

Invalid expressions:

• tanh(tanh(tanh(w · x)))
• sin(x1² + x2) / x3

Interaction-Transformation

We can control the complexity of the expression by limiting the number of additive terms and the number of interactions:

f(x) = Σ_{i=1..k} w_i · t_i(p_i(x))

p_i(x) = Π_{j=1..d} x_j^{k_j}

s.t. |{k_j | k_j ≠ 0}| ≤ n
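The interaction constraint says each term may touch at most n variables; a sketch of checking it (function name is mine):

```python
# Sketch: the constraint |{k_j : k_j != 0}| <= n means each interaction
# may involve at most n variables (a zero exponent drops a variable).
def within_interaction_limit(exponents, n):
    return sum(1 for k in exponents if k != 0) <= n

within_interaction_limit([2, 0, -1, 0], n=2)  # True: two variables used
within_interaction_limit([1, 1, 1, 0], n=2)   # False: three variables used
```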

Interaction-Transformation

Describing it as an Algebraic Data Type can help us generalize to other tasks:

IT x    = 0 | Weight (Term x) `add` (IT x)
Term x  = Trans (Inter x)
Trans   = a -> a
Inter (x:xs) = 1 | x `mul` (Inter xs)

The meanings of `add` and `mul` can lead us to boolean expressions, decision trees, and program synthesis.
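One way to see why reinterpreting `mul` changes the task; a toy sketch (not from the talk) where the same fold evaluates either a numeric interaction or a boolean conjunction:

```python
import operator

# Toy sketch: the interaction fold parameterized by `mul` and its unit.
# With (*, 1.0) it is the numeric interaction p(x); with (and, True) the
# same structure evaluates a boolean conjunction of literals.
def inter(xs, mul, unit):
    acc = unit
    for x in xs:
        acc = mul(acc, x)
    return acc

inter([2.0, 3.0], operator.mul, 1.0)       # 6.0 (numeric product)
inter([True, False], operator.and_, True)  # False (conjunction)
```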

SymTree

Simple search heuristic:

symtree x leaves | stop      = best leaves
                 | otherwise = symtree x leaves'
  where leaves' = [expand leaf | leaf <- leaves]

symtree x [linearRegression x]

SymTree

expand leaf = expand' leaf terms
  where terms = interaction leaf `union` transformation leaf

expand' leaf terms = node : expand' leaf leftover
  where (node, leftover) = greedySearch leaf terms
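A toy sketch of the overall loop (the `expand` and `score` arguments below are stand-ins, not the real interaction/transformation term search):

```python
# Toy sketch of the SymTree loop: each iteration expands every leaf
# expression into children, then the best-scoring expression is returned.
def symtree(initial, expand, score, iterations=3):
    leaves = [initial]
    for _ in range(iterations):
        leaves = [child for leaf in leaves for child in expand(leaf)]
    return max(leaves, key=score)

# Toy problem: expressions are tuples of term ids; higher sum is better.
best = symtree((), lambda e: [e + (1,), e + (2,)], lambda e: sum(e))
# best == (2, 2, 2)
```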

IT-ELM

The Interaction-Transformation Extreme Learning Machine generates lots of random interactions, enumerates the transformations for each interaction, and then adjusts the weights of the terms using l0 or l1 regularization.
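A sketch of the feature-construction stage only (names are mine; the l0/l1 weight-fitting step is omitted):

```python
import math
import random

# Sketch of IT-ELM's first stage: draw random exponent vectors, then
# enumerate every transformation for each interaction, producing the
# random feature matrix whose weights would later be fit with l0/l1
# regularization (fitting omitted here).
def random_features(X, n_interactions, transformations, seed=0):
    rng = random.Random(seed)
    d = len(X[0])
    exps = [[rng.randint(-2, 2) for _ in range(d)]
            for _ in range(n_interactions)]
    rows = []
    for x in X:
        row = []
        for k in exps:
            p = 1.0
            for xj, kj in zip(x, k):
                p *= xj ** kj            # interaction p(x)
            row.extend(t(p) for t in transformations)
        rows.append(row)
    return rows

F = random_features([[1.0, 2.0]], 3, [abs, math.cos])
# one sample -> 3 interactions x 2 transformations = 6 features
```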

IT-ELM

[Network diagram: inputs x1..x4 feed interaction nodes (Π, with exponents e_ij); each interaction feeds transformation nodes (id, cos, sqrt); a weighted sum (weights w_k) produces the output y.]

Experiments

Data sets

Data set       Features   5-Fold / Train-Test
Airfoil        5          5-Fold
Concrete       8          5-Fold
CPU            7          5-Fold
energyCooling  8          5-Fold
energyHeating  8          5-Fold
TowerData      25         5-Fold
wineRed        11         5-Fold
wineWhite      11         5-Fold
yacht          6          5-Fold
Chemical-I     57         Train-Test
F05128-f1      3          Train-Test
F05128-f2      3          Train-Test
Tower          25         Train-Test

Methods

For the sets with folds:

• Each algorithm was run 6 times per fold and the median of the RMSE of the test set is reported

• SymTree was run 1 time per fold (deterministic)

For the sets with train-test split:

• Each algorithm was run 10 times and the median of the RMSE for the test set is reported

• SymTree was run 1 time per data set


Results

For a complete table:

Binder

Cell → Run All

Results

[Comparison plots omitted from this text version.]

Sample equation

CPU:

≈ 0.86 · cache + (0.12 · 10⁻⁶ · maxMem) / √(maxChan · minMem)

Sample equation

CPU:

≈ 0.86 · cache + (0.12 · maxMem(MB)) / √(maxChan · minMem(MB))

Sample equation

CPU:

≈ 0.86 · cache + (0.12 · maxMem(MB)) / (√maxChan · √minMem(MB))

More cache = more performance! (sounds about right)

Sample equation

CPU:

≈ 0.86 · cache + (0.12 · maxMem(MB)) / (√maxChan · √minMem(MB))

The original paper isn't clear about it, but it seems that maxMem/minMem refer to the range of memory in the machines tested with a given CPU. So the second term may represent unmeasured variables proportional to the memory and channels of the tested machines.

Conclusions

Summary

• The Interaction-Transformation representation can help eliminate complicated expressions from the symbolic regression search space.

• Two algorithms created so far: SymTree, a search-based heuristic, and IT-ELM, based on extreme learning machines.

• The results show a good compromise between model accuracy and simplicity.

Future research

• Generalization as an Algebraic Data Type
• Use this representation for classification, program synthesis, etc.
• Broaden the search space a little bit
• Explore other search heuristics (evolutionary-based)

Try it!

You can try a lightweight version of SymTree at:

https://galdeia.github.io/

It works even on midrange smartphones!

Some problems with the provided data sets

The folded data sets were provided by authors who have used them extensively for GP performance comparisons, but:

Forest Fires contains many samples with target = 0, because most of the time the forests did not catch fire.

CPU should use the second-to-last column as the target variable; the last column is just the predicted values from the original paper.