System Identification and Curve Fitting with a Genetic Algorithm Hierarchy
Alice E. Smith and Mehmet Gulsen
Department of Industrial Engineering
University of Pittsburgh
INFORMS Fall 1997

Curve Fitting
Process of approximating a closed form function to a given data set of independent variables and a dependent variable (variable selection, closed form function selection, coefficient estimation).
Used for:
– System identification
– Judging the strength of relationship
– Identifying main variables and interaction between variables
– Interpolating/extrapolating to new data

Conventional Approaches
– Various regression techniques
– Time series analysis
– Spline fitting
– Neural networks

Genetic Algorithm Hierarchy
Upper Module: Function and Variable Selection
Lower Module: Coefficient Estimation
The upper module passes candidate functions to the lower module, for example
$y = c_1 x_1 + c_2 \cos(c_3 + c_4 x_2^2)$
The lower module returns optimized coefficients for the functions, for example
$y = 9.234 x_1 + 2.123 \cos(0.093 + 4.823 x_2^2)$, SSE = 0.34627

Search Structure
[Figure: Search Structure, with blocks labeled Upper GA Search, Upper GA Population, Lower GA Search, Lower GA Population, and Data.]

Genetic Search Process
[Figure: Genetic Search Process. Labels: Initial Population (n), Offspring and Mutants (n1, n2), best (n), Final Population (n), Top Half Selection, Uniform Selection.]

Upper GA - Function Selection
Explore the possible functional forms that could represent the underlying relationship between the independent and dependent variables of a data set.
Objective Function: Minimize the “adjusted” total error corresponding to the functional form. The adjustment penalizes more complex representations (more variables, higher order terms).
Stopping Criteria: The search is terminated when no improvement is observed for a specified number of generations.
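
The stopping rule above can be sketched as a small loop. This is a minimal illustration, not the authors' implementation: `run_until_stagnant`, `step`, and `patience` are hypothetical names, and `step` stands in for whatever produces the best error of each new generation.

```python
def run_until_stagnant(step, initial_best, patience):
    """Run generations until `patience` consecutive ones bring no improvement.

    step() is a hypothetical callback that advances the search by one
    generation and returns the best (lowest) error found in it.
    """
    best = initial_best
    stagnant = 0       # generations since the last improvement
    generations = 0    # total generations executed
    while stagnant < patience:
        candidate = step()
        generations += 1
        if candidate < best:
            best = candidate
            stagnant = 0
        else:
            stagnant += 1
    return best, generations


# Toy usage: the "search" simply replays a fixed sequence of best errors.
errors = iter([5.0, 4.0, 4.0, 3.5, 3.5, 3.5, 3.5, 1.0])
best, generations = run_until_stagnant(lambda: next(errors), float("inf"), patience=3)
```

With a patience of 3 the toy run stops before it ever sees the final value 1.0, which shows the trade-off that the choice of this parameter involves.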

Upper GA Function Selection - Encoding
Tree Structure
$y = C_1 + C_2 x_1^3 + C_3 \cos(C_4 x_2) + C_5 x_1 x_2$
[Figure: expression tree for this function, with operator nodes +, * and cos, and leaf nodes C1 to C5, x1, x2 and 1.]
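
A tree encoding of this kind can be sketched with nested tuples. The node layout below is an assumption made for illustration (the slide does not specify a data structure): leaves are `("const", index)` or `("var", index)` with 0-based indices, and internal nodes are `(operator, child, ...)`. The example tree encodes a function of the form C1 + C2 x1^3 + C3 cos(C4 x2) + C5 x1 x2, with the cube written as repeated multiplication.

```python
import math


def evaluate(node, coeffs, x):
    """Evaluate an expression tree for coefficient values `coeffs` and inputs `x`."""
    kind = node[0]
    if kind == "const":
        return coeffs[node[1]]
    if kind == "var":
        return x[node[1]]
    args = [evaluate(child, coeffs, x) for child in node[1:]]
    if kind == "+":
        return args[0] + args[1]
    if kind == "*":
        return args[0] * args[1]
    if kind == "cos":
        return math.cos(args[0])
    raise ValueError(f"unknown node type: {kind}")


def C(i):
    return ("const", i)


X1, X2 = ("var", 0), ("var", 1)
cube = ("*", X1, ("*", X1, X1))                      # x1^3 as x1 * (x1 * x1)
tree = ("+",
        ("+",
         ("+", C(0), ("*", C(1), cube)),             # C1 + C2 * x1^3
         ("*", C(2), ("cos", ("*", C(3), X2)))),     # + C3 * cos(C4 * x2)
        ("*", C(4), ("*", X1, X2)))                  # + C5 * x1 * x2

value = evaluate(tree, [1.0, 2.0, 3.0, 0.0, 5.0], [2.0, 3.0])
```

Keeping the coefficients outside the tree, as an indexed vector, mirrors the split between function selection and coefficient estimation.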
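
Upper GA Function Selection - Penalty Function
[Figure: expression tree with operator nodes +, * and cos, and leaf nodes C1 to C5, x1, x2 and 1.]
$\left[\frac{\text{number of nodes}}{m}\right]^{\text{constant}}$
$\left(\frac{14}{5}\right)^{0.05} = 1.0528$
Penalty Factor = 0.05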
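
A minimal sketch of the penalty, reading the slide's formula as (number of nodes / m) raised to the penalty factor; with 14 nodes, m = 5 and a factor of 0.05 this reproduces the value 1.0528 shown. The meaning of m is not stated on the slide, and applying the penalty as a multiplier on the error is an assumption, as are the function names.

```python
def complexity_penalty(num_nodes, m, factor=0.05):
    """Penalty that grows slowly with tree size: (num_nodes / m) ** factor."""
    return (num_nodes / m) ** factor


def adjusted_error(total_error, num_nodes, m, factor=0.05):
    """Assumption: the adjusted error is the total error scaled by the penalty."""
    return total_error * complexity_penalty(num_nodes, m, factor)


penalty = complexity_penalty(14, 5, 0.05)
```

Because the exponent is small, doubling the tree size raises the penalty by only a few percent, so accuracy still dominates the comparison between candidate functions.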

Upper GA Function Selection - Crossover
Before:
Parent 1: $y = C_1 + C_2 x_1^3 + C_3 \cos(C_4 x_2) + C_5 x_1 x_2$
Parent 2: $y = C_1 + \sin\left(\frac{C_2 + C_3 x_1}{C_4 \ln(x_2)} + C_5\right)$
After:
Offspring 1: $y = C_1 + \frac{C_2 + C_3 x_1}{C_4 \ln(x_2)} + C_5 \cos(C_6 x_2) + C_7 x_1 x_2$
Offspring 2: $y = C_1 + \sin(C_2 x_1^3 + C_3)$
[Figure: expression trees of Parent 1 and Parent 2, with the subtrees exchanged by crossover marked.]
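
Subtree crossover of two expression trees can be sketched as below. The nested-tuple layout (leaves `("const", i)` and `("var", i)`, internal nodes `(operator, child, ...)`) and the uniform choice of crossover points are assumptions for illustration; the slide does not say how the exchange points are chosen.

```python
import random


def paths(tree, prefix=()):
    """Yield the path (tuple of child positions) to every node of the tree."""
    yield prefix
    if tree[0] not in ("const", "var"):
        for i, child in enumerate(tree[1:], start=1):
            yield from paths(child, prefix + (i,))


def get_subtree(tree, path):
    """Follow `path` from the root and return the subtree found there."""
    for i in path:
        tree = tree[i]
    return tree


def replace_subtree(tree, path, new):
    """Return a copy of `tree` with the subtree at `path` replaced by `new`."""
    if not path:
        return new
    i = path[0]
    return tree[:i] + (replace_subtree(tree[i], path[1:], new),) + tree[i + 1:]


def crossover(parent1, parent2, rng=random):
    """Swap one randomly chosen subtree of each parent, giving two offspring."""
    p1 = rng.choice(list(paths(parent1)))
    p2 = rng.choice(list(paths(parent2)))
    s1, s2 = get_subtree(parent1, p1), get_subtree(parent2, p2)
    return replace_subtree(parent1, p1, s2), replace_subtree(parent2, p2, s1)
```

Since the trees are immutable tuples, the parents are left untouched and can stay in the population. After the exchange the coefficients would need renumbering, as the slide does for its offspring.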

Upper GA Function Selection - Mutation
Before:
Parent 1: $y = C_1 + C_2 x_1^3 + C_3 \cos(C_4 x_2) + C_5 x_1 x_2$
After:
Mutant: $y = C_1 + C_2 x_1^3 + C_3 \cos(C_4 x_2) + C_5 x_1 + C_6 x_1 \exp(C_7 x_1^2)$
[Figure: expression tree of Parent 1, with one subtree replaced by a randomly generated tree to form the mutant.]
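
Replacing a subtree with a randomly generated tree can be sketched as follows. Everything specific here is an assumption for illustration: the operator set in `FUNCTIONS`, the 0.3 probabilities, the depth limit, and the convention that a coefficient leaf is the unnumbered tuple `("const",)` to be numbered and fitted afterwards.

```python
import random

FUNCTIONS = {"+": 2, "*": 2, "cos": 1, "exp": 1}  # operator -> number of children


def random_tree(depth, n_vars, rng):
    """Grow a random expression tree with at most `depth` levels below the root."""
    if depth == 0 or rng.random() < 0.3:
        if rng.random() < 0.5:
            return ("var", rng.randrange(n_vars))
        return ("const",)  # unnumbered coefficient placeholder
    op = rng.choice(sorted(FUNCTIONS))
    children = tuple(random_tree(depth - 1, n_vars, rng) for _ in range(FUNCTIONS[op]))
    return (op,) + children


def mutate(tree, n_vars, rng, max_depth=2):
    """Walk down from the root and replace one subtree with a random tree."""
    children = tree[1:] if tree[0] in FUNCTIONS else ()
    if not children or rng.random() < 0.3:
        return random_tree(max_depth, n_vars, rng)
    i = rng.randrange(len(children))
    new_child = mutate(children[i], n_vars, rng, max_depth)
    return (tree[0],) + children[:i] + (new_child,) + children[i + 1:]


parent = ("+", ("const",), ("*", ("var", 0), ("var", 1)))
mutant = mutate(parent, n_vars=2, rng=random.Random(1))
```

Passing an explicit `random.Random` instance keeps runs reproducible, which helps when comparing search settings.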

Lower GA - Coefficient Estimation
Estimate the coefficients of a given closed form function which minimize the total error over the set of data points.
Objective Function: Minimize total squared error
Minimize $\sum_{i=1}^{K} \left(y_{\text{actual},i} - y_{\text{model},i}\right)^2$
K: number of data points
Stopping Criteria: The search is terminated when no improvement is observed for a specified number of generations.
Detailed results are published in “International Journal of Production Research”, Vol. 33, No. 7, 1995.
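
The objective, the sum of squared differences between actual and model values over the K data points, is simple to state in code. The function name is illustrative.

```python
def total_squared_error(actual, predicted):
    """Sum over all K data points of (actual - model) squared."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    return sum((a - p) ** 2 for a, p in zip(actual, predicted))


sse = total_squared_error([1.0, 2.0, 3.0], [1.0, 2.0, 5.0])
```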

Lower GA Coefficient Estimation - Encoding
$y = C_1 + C_2 x_1^3 + C_3 \cos(C_4 x_2) + C_5 x_1 x_2$
Coefficient string: C1 C2 C3 C4 C5

Lower GA - Selection/Breeding
– Parents are selected for breeding uniformly from the superior half of the population.
– The values of the offspring’s coefficients are determined by calculating the arithmetic mean of the corresponding coefficients of the two parents.
Parent A:  45.876 32.958 12.098  -3.892 0.2356
Parent B:  12.988 35.832  0.234 -12.984 2.4576
Offspring: 29.432 34.395  6.166  -8.438 1.3466
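
The selection and breeding rules can be sketched as below, using the Parent A and Parent B values from the slide. The function names are illustrative, and drawing two distinct parents without replacement is an assumption; the slide only says the choice is uniform over the superior half.

```python
import random


def select_parents(population, errors, rng=random):
    """Pick two parents uniformly from the better (lower-error) half."""
    ranked = [ind for _, ind in sorted(zip(errors, population), key=lambda pair: pair[0])]
    top_half = ranked[: max(2, len(ranked) // 2)]
    return rng.sample(top_half, 2)


def breed(parent_a, parent_b):
    """Offspring coefficients are the arithmetic means of the parents' coefficients."""
    return [(a + b) / 2 for a, b in zip(parent_a, parent_b)]


parent_a = [45.876, 32.958, 12.098, -3.892, 0.2356]
parent_b = [12.988, 35.832, 0.234, -12.984, 2.4576]
offspring = breed(parent_a, parent_b)
```

Averaging always places the offspring between its parents, so on its own it contracts the population; the mutation step is what reintroduces spread.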

Lower GA - Mutation
– Existing solutions are perturbed to explore new regions of the search space.
– The perturbation value is obtained by multiplying the current population range by a random factor.
For each coefficient C_j (j = 1, ..., 5), the perturbation is k_j times the current population range of C_j.
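
A minimal sketch of range-based mutation: each coefficient is shifted by a random factor times the current spread of that coefficient across the population. Drawing the factor uniformly from [-scale, scale] is an assumption, as are the names; the slide does not give the distribution of the random factor.

```python
import random


def mutate_coefficients(individual, population, rng=random, scale=1.0):
    """Perturb each coefficient by k_j * (population range of coefficient j)."""
    mutant = []
    for j, c in enumerate(individual):
        column = [member[j] for member in population]
        value_range = max(column) - min(column)
        k = rng.uniform(-scale, scale)  # assumed symmetric random factor
        mutant.append(c + k * value_range)
    return mutant


population = [[0.0, 10.0], [2.0, 10.0]]
mutant = mutate_coefficients([1.0, 10.0], population, random.Random(0))
```

Tying the step size to the population range makes the mutation self-scaling: perturbations shrink as the population converges on a coefficient.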

Test Problem
$y = C_1 + C_2 x_1 + C_3 x_2 + C_4 x_3 + C_5 x_1^2 + C_6 x_2^2 + C_7 x_3^2 + C_8 x_1 x_2 + C_9 x_1 x_3 + C_{10} x_2 x_3$

C    Run 1   Run 2   Run 3   Run 4   Run 5   Run 6   Mean    Std. Dev.
1    9.986   9.998   10.002  10.000  9.996   10.001  9.997   0.005
2    9.999   10.000  10.000  10.000  10.000  10.000  10.000  0.000
3    10.000  10.000  10.000  10.000  10.000  10.000  10.000  0.000
4    10.000  10.000  10.000  10.000  10.000  10.000  10.000  0.000
5    10.000  10.000  10.000  10.000  10.000  10.000  10.000  0.000
6    10.000  10.000  10.000  10.000  10.000  10.000  10.000  0.000
7    10.000  10.000  10.000  10.000  10.000  10.000  10.000  0.000
8    10.000  10.000  10.000  10.000  10.000  10.000  10.000  0.000
9    10.000  10.000  10.000  10.000  10.000  10.000  10.000  0.000
10   10.000  10.000  10.000  10.000  10.000  10.000  10.000  0.000
SE   0.0017  0.000   0.0000  0.000   0.000   0.000   0.000   -
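
Reading the test function as a full second-order polynomial in x1, x2 and x3 with ten coefficients (constant, three linear, three squared and three cross terms), it can be evaluated as below. The function name and argument layout are illustrative.

```python
from itertools import combinations


def quadratic_model(c, x):
    """Evaluate the ten-coefficient quadratic; c = [C1..C10], x = (x1, x2, x3)."""
    linear = sum(ci * xi for ci, xi in zip(c[1:4], x))
    squares = sum(ci * xi ** 2 for ci, xi in zip(c[4:7], x))
    # combinations yields (x1, x2), (x1, x3), (x2, x3), matching C8, C9, C10
    cross = sum(ci * xa * xb for ci, (xa, xb) in zip(c[7:10], combinations(x, 2)))
    return c[0] + linear + squares + cross


y = quadratic_model([10.0] * 10, (1.0, 2.0, 3.0))
```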

Test Problem: Different Error Metrics
[Figure: Log10 of Squared Error (0 to 8) versus Number of Generations (0 to 1500), with curves for Squared Error, Absolute Error and Maximum Error.]
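
The three metrics named in the legend can be sketched as follows. The exact definitions are assumptions (sum of squared residuals, sum of absolute residuals, and largest absolute residual); the slide gives only the names.

```python
def squared_error(actual, predicted):
    """Sum of squared residuals."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted))


def absolute_error(actual, predicted):
    """Sum of absolute residuals."""
    return sum(abs(a - p) for a, p in zip(actual, predicted))


def maximum_error(actual, predicted):
    """Largest absolute residual over all data points."""
    return max(abs(a - p) for a, p in zip(actual, predicted))


actual, predicted = [1.0, 2.0, 3.0], [1.0, 4.0, 0.0]
metrics = (squared_error(actual, predicted),
           absolute_error(actual, predicted),
           maximum_error(actual, predicted))
```

Because a GA needs only a fitness value and no gradient, any of these can be swapped in as the objective without other changes.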

Test Problem: Different Numbers of Data Points
[Figure: Log10 of Squared Error (-8 to 8) versus Number of Generations (0 to 4000), with curves for 25 Points and 100 Points.]

Empirical Data Sets
Five benchmark problems from the literature:
1. onion growth
2. children growth
3. sunspots
4. chemical plant
5. slip casting
Single variable/50 observations to 13 variables/1000 observations
Nonlinear regression, time series analysis, model identification

Test Problem 3, Sunspot Data
– Sunspot data from 1700 to 1995
– Highly cyclic, with peaks and troughs recurring approximately every 11.1 years
– The cycle is not symmetric: the count rises to its maximum faster than it falls to its minimum
– Training range: 1700-1979
– Validation range: 1980-1995

Functions Identified
In the equations, (t-k) denotes the sunspot number k years before year t.

Model A (SSE 61964):
$1.1965(t-1) - 0.4585(t-2) + 0.2471(t-9)$

Model B (SSE 45533):
$0.8337(t-1) - 1.1989\exp(-0.3512(t-9)) + 15.7476\exp(-2.7260\exp(-0.3263(t-1)) - 0.6271(t-2))$

Model C (SSE 40341):
$1.2410\exp(-0.6099\cos(1.4097(t-4) + 0.4282(t-1)) - 0.8446(t-2))(t-1) + 0.8064(t-1) - 0.1316(t-4) + 0.1148(t-9)$

Model D (SSE 38715):
$1.6258\exp(-0.5564\cos(-1.4893(t-4) + 0.6979\exp(-3.3442\cos(0.2561(t-4)) + 2.8807(t-2)) - 3.1756(t-2)) - 0.7485(t-2)) - 0.9362(t-2)(t-1) + 0.8253(t-1) - 0.1413(t-4) + 0.1046(t-9)$
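
As an illustration of how such a lagged model is applied, the sketch below evaluates Model A, read here as 1.1965 times the value one year back, minus 0.4585 times the value two years back, plus 0.2471 times the value nine years back. That reading of the notation is an assumption, and the function names and toy series are hypothetical; no real sunspot data is used.

```python
def model_a(series, t):
    """Predict series[t] from the values 1, 2 and 9 steps earlier (Model A as read above)."""
    if t < 9:
        raise ValueError("Model A needs at least 9 earlier observations")
    return 1.1965 * series[t - 1] - 0.4585 * series[t - 2] + 0.2471 * series[t - 9]


def sse_over_range(series, model, start, stop):
    """Sum of squared one-step prediction errors for t in [start, stop)."""
    return sum((series[t] - model(series, t)) ** 2 for t in range(start, stop))


toy_series = [1.0] * 12          # placeholder values, not sunspot counts
prediction = model_a(toy_series, 9)
```

The same `sse_over_range` helper could score a model separately on a training range and a validation range, given the yearly series indexed from its first year.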

Model D
[Figure: Model D. x-axis: Year (1700 to 2000); y-axis: Annual Number of Sunspots (x 20), scale 0 to 10.]

Extrapolation of Model D
[Figure: Data and Fitted Function for 1980 to 1995. x-axis: Year; y-axis scale 0 to 9.]

Conclusions
– A unique approach for curve fitting problems
– Provides a closed form function for the given data set
– Can handle non-linear, discontinuous functions
– Flexible in terms of error metric
– Can be used separately for function selection and coefficient optimization
– Computationally intensive and needs a priori setting of search parameters and penalty function components
Forthcoming paper: “A hierarchical genetic algorithm for system identification and curve fitting with a supercomputer implementation,” Mehmet Gulsen and Alice E. Smith, Institute for Mathematics and its Applications, Volumes in Mathematics and its Applications, Volume on Evolutionary Computing.