Dr Muhammad Al-Salamah, Industrial Engineering, KFUPM
Curve Fitting
Curve fitting techniques are used to fit curves to data to obtain intermediate estimates or to derive a simpler function from a complicated function.
• Least squares regression is used when the data exhibits a significant degree of error or noise.
• Interpolation is used to fit curves that pass directly through each of the points.
Least-Squares Regression
• Least-squares regression techniques are used to fit a curve to experimental data.
• These techniques are used to derive an approximate function that fits the shape or general trend of the data.
Techniques: linear regression, polynomial regression, multiple linear regression
Linear Regression
• Fit a straight line to a set of paired observations (x1,y1), (x2,y2), …, (xn,yn)
• The mathematical expression for the straight line is
y = a0 + a1 x + e
e is called the error or “residual”
The residual is the difference between the observation and the line: e = y - a0 - a1 x
• What are the values of a0 and a1?
Minimize the sum of the squares of the residuals
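$$S_r = \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} \left(y_{i,\text{measured}} - y_{i,\text{model}}\right)^2 = \sum_{i=1}^{n} \left(y_i - a_0 - a_1 x_i\right)^2$$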
This criterion yields a unique line for a given set of data.
Least-Squares Fit of a Straight Line
•To determine the values of a0 and a1, differentiate Sr with respect to each of the coefficients and set to zero:
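$$\frac{\partial S_r}{\partial a_0} = -2 \sum \left(y_i - a_0 - a_1 x_i\right) = 0$$
$$\frac{\partial S_r}{\partial a_1} = -2 \sum \left(y_i - a_0 - a_1 x_i\right) x_i = 0$$
•The equations become:
$$0 = \sum y_i - \sum a_0 - \sum a_1 x_i$$
$$0 = \sum x_i y_i - \sum a_0 x_i - \sum a_1 x_i^2$$
•The normal equations are
$$n a_0 + \left(\sum x_i\right) a_1 = \sum y_i$$
$$\left(\sum x_i\right) a_0 + \left(\sum x_i^2\right) a_1 = \sum x_i y_i$$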
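•The slope and the y-intercept are given by
$$a_1 = \frac{n \sum x_i y_i - \sum x_i \sum y_i}{n \sum x_i^2 - \left(\sum x_i\right)^2}$$
$$a_0 = \frac{\sum y_i}{n} - a_1 \frac{\sum x_i}{n} = \bar{y} - a_1 \bar{x}$$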
Example
Fit a straight line to the data
xi    yi
1     0.5
2     2.5
3     2
4     4
5     3.5
6     6
7     5.5
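As a minimal sketch, the slope and intercept formulas can be applied to this data set in plain Python. The function name `fit_line` is an illustrative choice and does not come from the slides.

```python
def fit_line(xs, ys):
    """Least-squares straight line y = a0 + a1*x using the closed-form formulas."""
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    # Slope obtained from the two normal equations
    a1 = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    # Intercept: mean of y minus slope times mean of x
    a0 = sum_y / n - a1 * sum_x / n
    return a0, a1


# Data from the example
xs = [1, 2, 3, 4, 5, 6, 7]
ys = [0.5, 2.5, 2, 4, 3.5, 6, 5.5]

a0, a1 = fit_line(xs, ys)
print(f"a0 = {a0:.4f}, a1 = {a1:.4f}")
```

For this data the sums are n = 7, Σx = 28, Σy = 24, Σxy = 119.5 and Σx² = 140, so the sketch gives a1 ≈ 0.8393 and a0 ≈ 0.0714.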
Linearization of Nonlinear Relationships
•Transformations can be used to express the data in a form that is compatible with the linear regression.
•Suppose the relationship between x and y is
$$y = a_1 e^{b_1 x}$$
It can be linearized by taking the ln of both sides:
$$\ln y = \ln a_1 + b_1 x$$
•Consider
$$y = a_2 x^{b_2}$$
It can be transformed into the linear form
$$\log y = \log a_2 + b_2 \log x$$
•Consider
$$y = a_3 \frac{x}{b_3 + x}$$
It can be linearized by inverting both sides
$$\frac{1}{y} = \frac{b_3 + x}{a_3 x} = \frac{b_3}{a_3} \frac{1}{x} + \frac{1}{a_3}$$
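The transformations above can be sketched in Python as follows. The data here are synthetic and noise-free, generated from assumed parameter values purely for illustration; `fit_line` and all variable names are hypothetical and not part of the slides.

```python
import math


def fit_line(xs, ys):
    """Least-squares straight line y = a0 + a1*x (closed-form formulas)."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    a1 = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    a0 = sum_y / n - a1 * sum_x / n
    return a0, a1


# Assumed "true" parameters used only to generate synthetic data
a1_true, b1_true = 2.0, 0.5   # exponential model: y = a1 * exp(b1 * x)
a3_true, b3_true = 4.0, 1.5   # saturation-growth model: y = a3 * x / (b3 + x)
xs = [1.0, 2.0, 3.0, 4.0, 5.0]

# Exponential model: regress ln(y) on x; intercept = ln(a1), slope = b1
ys_exp = [a1_true * math.exp(b1_true * x) for x in xs]
c0, c1 = fit_line(xs, [math.log(y) for y in ys_exp])
a1_fit, b1_fit = math.exp(c0), c1

# Saturation-growth model: regress 1/y on 1/x; intercept = 1/a3, slope = b3/a3
ys_sat = [a3_true * x / (b3_true + x) for x in xs]
d0, d1 = fit_line([1 / x for x in xs], [1 / y for y in ys_sat])
a3_fit = 1 / d0
b3_fit = d1 * a3_fit

print(a1_fit, b1_fit, a3_fit, b3_fit)
```

Because the synthetic data contain no noise, the fitted values should match the assumed parameters up to floating-point error. With real, noisy data the transformed fit minimizes the residuals of the transformed variables, not those of the original ones.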
Problem 2
Fit $y = a_2 x^{b_2}$ to the data
x    y
1    0.5
2    1.7
3    3.4
4    5.7
5    8.4
Problem 2
Fit $y = a_2 x^{b_2}$ to the data
Answer
Linear form: y' = a0 + a1 x' + e, with
x' = log(x)    y' = log(y)
0              -0.301
0.301          0.23
0.477          0.531
0.602          0.756
0.699          0.924
$$\log y = -0.300 + 1.75 \log x$$
$$a_2 = 10^{-0.300} = 0.5, \qquad b_2 = 1.75$$
$$y = 0.5\, x^{1.75}$$
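A short Python sketch of this log-log fit, assuming the straight-line formulas from the earlier slides, is given below. The helper `fit_line` is an illustrative name and not part of the slides.

```python
import math


def fit_line(xs, ys):
    """Least-squares straight line y = a0 + a1*x (closed-form formulas)."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    a1 = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    a0 = sum_y / n - a1 * sum_x / n
    return a0, a1


# Data from Problem 2
xs = [1, 2, 3, 4, 5]
ys = [0.5, 1.7, 3.4, 5.7, 8.4]

# Transform both variables with the base-10 logarithm
log_x = [math.log10(x) for x in xs]
log_y = [math.log10(y) for y in ys]

# Fit log(y) = log(a2) + b2 * log(x)
intercept, slope = fit_line(log_x, log_y)
a2 = 10 ** intercept   # back-transform the intercept
b2 = slope

print(f"a2 = {a2:.3f}, b2 = {b2:.3f}")
```

The result should be close to the rounded values a2 = 0.5 and b2 = 1.75 reported in the answer.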
Polynomial Regression
•We need to fit a polynomial to data using polynomial regression.
•A second-order polynomial or quadratic fit is
y = a0 + a1 x + a2 x^2 + e
•The sum of squares of the residuals:
$$S_r = \sum_{i=1}^{n} \left(y_i - a_0 - a_1 x_i - a_2 x_i^2\right)^2$$
•Differentiate Sr with respect to all parameters:
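The derivatives themselves did not survive in this copy of the slide. Following the same steps as in the straight-line case, they would presumably read:

$$\frac{\partial S_r}{\partial a_0} = -2 \sum \left(y_i - a_0 - a_1 x_i - a_2 x_i^2\right)$$
$$\frac{\partial S_r}{\partial a_1} = -2 \sum x_i \left(y_i - a_0 - a_1 x_i - a_2 x_i^2\right)$$
$$\frac{\partial S_r}{\partial a_2} = -2 \sum x_i^2 \left(y_i - a_0 - a_1 x_i - a_2 x_i^2\right)$$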
•Set the partials to zero and arrange
•These equations are called the normal equations.
•They form a system of linear equations with 3 equations and 3 unknowns.
•In general, an mth order polynomial requires solving a system of m+1 linear equations.
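The normal equations are not legible in this copy. For the quadratic case, setting each partial derivative of Sr to zero and rearranging should give a system of the following form (a reconstruction, not a transcription of the slide):

$$n a_0 + \left(\sum x_i\right) a_1 + \left(\sum x_i^2\right) a_2 = \sum y_i$$
$$\left(\sum x_i\right) a_0 + \left(\sum x_i^2\right) a_1 + \left(\sum x_i^3\right) a_2 = \sum x_i y_i$$
$$\left(\sum x_i^2\right) a_0 + \left(\sum x_i^3\right) a_1 + \left(\sum x_i^4\right) a_2 = \sum x_i^2 y_i$$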
Problem 3
Fit a second-order polynomial to the data
x    y
0    2.1
1    7.7
2    14
3    27
4    41
5    61
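One possible way to work this problem is to build the normal equations and solve them by Gaussian elimination. The sketch below does this in plain Python; `solve_linear_system` and `fit_polynomial` are hypothetical helper names, and the approach is only one of several that could be used.

```python
def solve_linear_system(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix, copied so the inputs are not modified
    M = [[float(v) for v in row] + [float(rhs)] for row, rhs in zip(A, b)]
    for col in range(n):
        # Swap in the row with the largest pivot in this column
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        # Eliminate the entries below the pivot
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    # Back substitution
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def fit_polynomial(xs, ys, order):
    """Build and solve the (order + 1) normal equations of a polynomial fit."""
    m = order + 1
    # Entry (j, k) is the sum of x^(j+k); right-hand side j is the sum of x^j * y
    A = [[sum(x ** (j + k) for x in xs) for k in range(m)] for j in range(m)]
    b = [sum((x ** j) * y for x, y in zip(xs, ys)) for j in range(m)]
    return solve_linear_system(A, b)


# Data from Problem 3
xs = [0, 1, 2, 3, 4, 5]
ys = [2.1, 7.7, 14, 27, 41, 61]

a0, a1, a2 = fit_polynomial(xs, ys, order=2)
print(f"y = {a0:.4f} + {a1:.4f} x + {a2:.4f} x^2")
```

Passing a different `order` builds the corresponding system of order + 1 equations, in line with the general statement that an mth order polynomial requires m+1 linear equations.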
Multiple Linear Regression
•The function y is a linear function of 2 or more independent variables, such as
y = a0 + a1 x1 + a2 x2 + e
•The sum of the squares of the residuals
•To minimize Sr,
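The expressions for these two bullets are missing from this copy. By analogy with the straight-line case, and writing x_{1i} and x_{2i} for the ith observations of the two independent variables (an assumed notation), they would presumably be:

$$S_r = \sum_{i=1}^{n} \left(y_i - a_0 - a_1 x_{1i} - a_2 x_{2i}\right)^2$$
$$\frac{\partial S_r}{\partial a_0} = -2 \sum \left(y_i - a_0 - a_1 x_{1i} - a_2 x_{2i}\right) = 0$$
$$\frac{\partial S_r}{\partial a_1} = -2 \sum x_{1i} \left(y_i - a_0 - a_1 x_{1i} - a_2 x_{2i}\right) = 0$$
$$\frac{\partial S_r}{\partial a_2} = -2 \sum x_{2i} \left(y_i - a_0 - a_1 x_{1i} - a_2 x_{2i}\right) = 0$$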
•The normal equations are
A system of 3 linear equations and 3 unknowns
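The system is not legible in this copy. For two independent variables, the normal equations can be written in matrix form as follows (a reconstruction using the assumed notation x_{1i}, x_{2i} for the ith observations):

$$\begin{bmatrix} n & \sum x_{1i} & \sum x_{2i} \\ \sum x_{1i} & \sum x_{1i}^2 & \sum x_{1i} x_{2i} \\ \sum x_{2i} & \sum x_{1i} x_{2i} & \sum x_{2i}^2 \end{bmatrix} \begin{bmatrix} a_0 \\ a_1 \\ a_2 \end{bmatrix} = \begin{bmatrix} \sum y_i \\ \sum x_{1i} y_i \\ \sum x_{2i} y_i \end{bmatrix}$$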
Q1. An electric heating coil is immersed in a stirred tank. Solvent at 15 °C with heat capacity 2.1 kJ kg⁻¹ °C⁻¹ is fed into the tank at a rate of 15 kg h⁻¹. Heated solvent is discharged at the same flow rate. The tank is filled initially with 125 kg of cold solvent at 10 °C. The rate of heating by the electric coil is 800 W. Calculate the time required for the temperature of the solvent to reach 60 °C.
Tutorial: 5
Regression in Matlab
Use the in polymath
Additional Example
The natural gas consumption for electric power generation in the Kingdom from 1977 to 2000 is shown in the graph below.
[Figure: natural gas consumption for electric power generation versus year. x-axis: Year (1975 to 2005); y-axis: Million Cubic Meters (0 to 12000).]
•Observations from the data:
There is an upward trend in the observations.
The relation between gas consumption and year appears to be linear; i.e. the general trend of the data is linear.
•Can regression be used?
Yes, because the gas consumption values are not precise (there are errors in the measurements).
We can assume that normality holds.
[Figure: x-axis: Year (1976 to 1996); y-axis: Million Cubic Meters (0 to 12000).]
•Using the equations:
a1 = 393.94
a0 = - 777828
•The coefficient of determination = 0.8811
Quantification of Error of Linear Regression
•To quantify the error reduction due to describing the data in terms of a straight line, we use the coefficient of determination which is defined as
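$$r^2 = \frac{S_t - S_r}{S_t}, \quad \text{where } S_t = \sum \left(y_i - \bar{y}\right)^2$$
•It represents the fraction of the variability in y that can be explained by the variability in x (how close the points are to the line).
•r2 = 1 signifies that the line explains 100% of the variability of the data.
$$S_r = \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} \left(y_{i,\text{measured}} - y_{i,\text{model}}\right)^2 = \sum_{i=1}^{n} \left(y_i - a_0 - a_1 x_i\right)^2$$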
P448/Num-methods
If r^2 = 0.87, then 87% of the original uncertainty has been explained by the model.
Example
Compute the coefficient of determination for the linear regression in the previous example
Answer
•St = 22.7143
•Sr = 2.9911
•r2 = 0.868
•This indicates that 86.8% of the original uncertainty is explained by the linear model.
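The computation can be sketched in Python as follows, assuming the data of the earlier straight-line example (x = 1 to 7). The helper name `fit_line` is illustrative only.

```python
def fit_line(xs, ys):
    """Least-squares straight line y = a0 + a1*x (closed-form formulas)."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    a1 = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    a0 = sum_y / n - a1 * sum_x / n
    return a0, a1


# Data from the straight-line example
xs = [1, 2, 3, 4, 5, 6, 7]
ys = [0.5, 2.5, 2, 4, 3.5, 6, 5.5]

a0, a1 = fit_line(xs, ys)
y_mean = sum(ys) / len(ys)

# Total sum of squares around the mean of y
St = sum((y - y_mean) ** 2 for y in ys)
# Sum of squares of the residuals around the fitted line
Sr = sum((y - a0 - a1 * x) ** 2 for x, y in zip(xs, ys))
# Coefficient of determination
r2 = (St - Sr) / St

print(f"St = {St:.4f}, Sr = {Sr:.4f}, r2 = {r2:.3f}")
```

The sketch should reproduce values close to those listed in the answer.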