8/20/2019 Numerical Ch17
http://slidepdf.com/reader/full/numerical-ch17 1/32
Least-Squares Regression
Chapter 17
July 2013
Physics I/II, English 123, Statics, Dynamics, Strength, Structure I/II, C++, Java, Data, Algorithms, Numerical, Economy (eng-hs.com, eng-hs.net)
Where substantial error is associated with data, polynomial interpolation is inappropriate and may yield unsatisfactory results when used to predict intermediate values. Experimental data is often of this type. For example, the following figure (a) shows seven experimentally derived data points exhibiting significant variability. The data indicates that higher values of y are associated with higher values of x.

Now, if a sixth-order interpolating polynomial is fitted to this data (fig b), it will pass exactly through all of the points. However, because of the variability in the data, the curve oscillates widely in the intervals between the points. In particular, the interpolated values at x = 1.5 and x = 6.5 appear to be well beyond the range suggested by the data.

A more appropriate strategy is to derive an approximating function that fits the shape of the data. Fig (c) illustrates how a straight line can be used to generally characterize the trend of the data without passing through any particular point.

One way to determine the line in figure (c) is to look at the plotted data and then sketch a "best" line through the points. Such approaches are inadequate because they are arbitrary. That is, unless the points define a perfect straight
line (in which case interpolation would be appropriate), different analysts would draw different lines. To avoid this, some criterion must be devised to establish a basis for the fit. One way to do this is to derive a curve that minimizes the discrepancy between the data points
and the curve. One technique for doing this is called least-squares regression.
17.1 Linear Regression
The simplest example of a least-squares approximation is fitting a straight line to a set of paired observations: (x_1, y_1), (x_2, y_2), …, (x_n, y_n). The mathematical expression for the straight line is

y = a0 + a1 x + e

where a0 and a1 are coefficients representing the intercept and the slope, respectively, and e is the error, or residual, between the model and the observations, which can be represented by rearranging the previous equation as

e = y − a0 − a1 x

Thus, the error is the discrepancy between the true value of y and the approximate value, a0 + a1 x, predicted by the linear equation.
17.1.1 Criteria for a "Best" Fit
One strategy for fitting a "best" line through the data would be to minimize the sum of the residual errors for all the available data, as in
N$"8ر K$9% MBر ?+8 2G0HHHHI?>ر E+ *$"Fي */ر%D "ر ?+", ."/($%*"$%45 689"8:;< =>?>."@A*Bي * $()'& %$#"! ير
hysics I/II, English 123, Statics, Dynamics, Strength, Structure I/II, C++, Java, Data, Algorithms,
umerical, Economy
O"N#P Q"A ! [email protected]*$)T UV"? WرP" X+", 4 ي#Y$"8eng-hs. com, eng-hs. net
" )5 UETK$ b#K/ O> "c )5 d'+ /N,KF ?> bTر
f ي)5 UET/ " bA
8/20/2019 Numerical Ch17
http://slidepdf.com/reader/full/numerical-ch17 5/32
July 2013
∑_{i=1}^{n} e_i = ∑_{i=1}^{n} (y_i − a0 − a1 x_i)
where n = total number of points. However, this is an inadequate criterion, as illustrated by the next figure, which shows the fit of a straight line to two points.
Obviously, the best fit is the line connecting the points. However, any straight line passing through the midpoint of the connecting line results in a minimum value of the previous equation equal to zero, because the positive and negative errors cancel. Therefore, another logical criterion might be to minimize the sum of the absolute values of the discrepancies, as in
∑_{i=1}^{n} |e_i| = ∑_{i=1}^{n} |y_i − a0 − a1 x_i|
The previous fig (b) demonstrates why this criterion is also inadequate. For the four points shown, any straight line falling within the dashed lines will minimize the sum of the absolute values. Thus, this criterion also does not yield a unique best fit.
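The cancellation that defeats the raw-sum criterion above is easy to check numerically. A minimal sketch with two hypothetical points (not from the text), showing that every line forced through the midpoint of the connecting line gives a zero sum of residuals:

```python
# Two hypothetical data points (an assumption for the demo, not from the text).
pts = [(1.0, 2.0), (3.0, 6.0)]
xm = (pts[0][0] + pts[1][0]) / 2   # midpoint of the connecting line
ym = (pts[0][1] + pts[1][1]) / 2

for a1 in (-5.0, 0.0, 1.7, 42.0):  # arbitrary slopes
    a0 = ym - a1 * xm              # force the line through (xm, ym)
    residual_sum = sum(y - (a0 + a1 * x) for x, y in pts)
    assert abs(residual_sum) < 1e-9  # the errors cancel for every such line
```

Any slope at all passes this test, which is exactly why the sum of raw residuals cannot single out a unique best line.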
A third strategy for fitting a best line is the minimax criterion. In this technique, the line is chosen that minimizes the maximum distance that an individual point falls from the line. As shown in the previous fig (c), this strategy is ill-suited for regression because it gives undue influence to an outlier, that is, a single point with a large error.
A strategy that overcomes the shortcomings of the previous approaches is to minimize the sum of the squares of the residuals between the measured y and the y calculated with the linear model:
S_r = ∑_{i=1}^{n} e_i² = ∑_{i=1}^{n} (y_{i,measured} − y_{i,model})² = ∑_{i=1}^{n} (y_i − a0 − a1 x_i)²
This criterion has a number of advantages, including the fact that it yields a unique line for a given set of data.
17.1.2 Least-Squares Fit of a Straight Line
To determine values of a0 and a1, the previous equation is differentiated with respect to each coefficient:
∂S_r/∂a0 = −2 ∑ (y_i − a0 − a1 x_i)

∂S_r/∂a1 = −2 ∑ [(y_i − a0 − a1 x_i) x_i]
Note that we have simplified the summation symbols; unless otherwise indicated, all summations are from i = 1 to n. Setting these derivatives equal to zero will result in a minimum S_r:
0 = ∑ y_i − ∑ a0 − ∑ a1 x_i
0 = ∑ x_i y_i − ∑ a0 x_i − ∑ a1 x_i²
Now, realizing that ∑ a0 = n·a0, we can express the equations as a set of two simultaneous linear equations with two unknowns (a0 and a1):
n·a0 + (∑ x_i) a1 = ∑ y_i        (17.4)

(∑ x_i) a0 + (∑ x_i²) a1 = ∑ x_i y_i
These are called the normal equations. They can be solved simultaneously for
a1 = [n ∑ x_i y_i − ∑ x_i ∑ y_i] / [n ∑ x_i² − (∑ x_i)²]
This result can then be used in conjunction with Eq. (17.4) to solve for
a0 = ȳ − a1 x̄
where ȳ and x̄ are the means of y and x, respectively.
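As a concrete sketch, the slope and intercept formulas above can be coded directly (`fit_line` is a hypothetical helper name, not from the text):

```python
def fit_line(x, y):
    """Least-squares straight line y = a0 + a1*x via the normal equations."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    sxx = sum(xi * xi for xi in x)
    a1 = (n * sxy - sx * sy) / (n * sxx - sx * sx)   # slope
    a0 = sy / n - a1 * (sx / n)                      # intercept: ybar - a1*xbar
    return a0, a1
```

Applied to a dataset consistent with the sums quoted in Example 17.1 (an assumption, since the raw table is not reproduced here), this returns a0 ≈ 0.07142857 and a1 ≈ 0.8392857.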
Example 17.1 Linear Regression
Problem Statement.
Fit a straight line to the x and y values in the first two columns of the next table.
Solution.
The following quantities can be computed:
n = 7        ∑ x_i y_i = 119.5        ∑ x_i² = 140
∑ x_i = 28        x̄ = 28/7 = 4
∑ y_i = 24        ȳ = 24/7 = 3.428571
Using the previous two equations,
a1 = [7(119.5) − 28(24)] / [7(140) − (28)²] = 0.8392857

a0 = 3.428571 − 0.8392857(4) = 0.07142857
Therefore, the least-squares fit is

y = 0.07142857 + 0.8392857 x

The line, along with the data, is shown in the first figure (c).
17.1.3 Quantification of Error of Linear Regression
Any line other than the one computed in the previous example results in a larger sum of the squares of the residuals. Thus, the line is unique and, in terms of our chosen criterion, is a "best" line through the points.
A number of additional properties of this fit can be explained by examining more closely the way in which the residuals were computed. Recall that the sum of the squares is defined as
S_r = ∑_{i=1}^{n} e_i² = ∑_{i=1}^{n} (y_i − a0 − a1 x_i)²
Notice the similarity between the previous equation and
S_t = ∑ (y_i − ȳ)²
The similarity can be extended further for cases where (1) the spread of the points around the line is of similar magnitude along the entire range of the data and (2) the distribution of these points about the line is normal. It can be demonstrated that if these criteria are met, least-squares regression will provide the best (that is, the most likely) estimates of a0 and a1. In addition, if these criteria are met, a "standard deviation" for the regression line can be determined as
s_y/x = √(S_r / (n − 2))
where s_y/x is called the standard error of the estimate. The subscript notation "y/x" designates that the error is for a predicted value of y corresponding to a particular value of x.
Also, notice that we now divide by n − 2 because two data-derived estimates, a0 and a1, were used to compute S_r; thus, we have lost two degrees of freedom. Another justification for dividing by n − 2 is that there is no such thing as the "spread of data" around a straight line connecting two points.
Just as with the standard deviation, the standard error of the estimate quantifies the spread of the data. However, s_y/x quantifies the spread around the regression line, as shown in the next figure (b), in contrast to the original standard deviation s_y, which quantified the spread around the mean (fig (a)).

The above concepts can be used to quantify the "goodness" of our fit. This is particularly useful for comparing several regressions (next figure). To do this, we return to the original data and
determine the total sum of the squares around the mean for
the dependent variable (in our case, y). This quantity is designated S_t. It is the magnitude of the residual error associated with the dependent variable prior to regression. After performing the regression, we can compute S_r, the sum of the squares of the residuals around the regression line. This characterizes the residual error that remains after the regression. It is, therefore, sometimes called the unexplained sum of the squares. The difference between the two quantities, S_t − S_r, quantifies the improvement or error reduction due to
describing the data in terms of a straight line rather than as an average value. Because the magnitude of this quantity is scale-dependent, the difference is normalized to S_t to yield
r² = (S_t − S_r) / S_t
where r² is called the coefficient of determination and r is the correlation coefficient (r = √r²). For a perfect fit, S_r = 0 and r = r² = 1, signifying that the line explains 100 percent of the variability of the data. For r = r² = 0, S_r = S_t and the fit represents no improvement.
An alternative formulation for r that is more convenient for computer implementation is
r = [n ∑ x_i y_i − (∑ x_i)(∑ y_i)] / [√(n ∑ x_i² − (∑ x_i)²) √(n ∑ y_i² − (∑ y_i)²)]
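A quick numerical check, using hypothetical data, that this computational formula agrees with the definition r = √((S_t − S_r)/S_t):

```python
import math

x = [1.0, 2.0, 3.0, 4.0, 5.0]   # hypothetical data, not from the text
y = [1.2, 1.9, 3.2, 3.8, 5.1]

n = len(x)
sx, sy = sum(x), sum(y)
sxy = sum(a * b for a, b in zip(x, y))
sxx = sum(a * a for a in x)
syy = sum(b * b for b in y)

# Fit the line first, then form St and Sr for the definitional r.
a1 = (n * sxy - sx * sy) / (n * sxx - sx * sx)
a0 = sy / n - a1 * sx / n
ybar = sy / n
St = sum((yi - ybar) ** 2 for yi in y)
Sr = sum((yi - a0 - a1 * xi) ** 2 for xi, yi in zip(x, y))
r_def = math.sqrt((St - Sr) / St)

# Computational formula quoted in the text (signed; positive slope here).
r_alt = (n * sxy - sx * sy) / (
    math.sqrt(n * sxx - sx ** 2) * math.sqrt(n * syy - sy ** 2))
```

The two values agree to rounding error, which is the identity the computational formula relies on.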
Example 17.2 Estimate of Errors for the Linear Least-Squares Fit
Problem Statement.
Compute the total standard deviation, the standard error of the estimate, and the correlation coefficient for the data in Example 17.1.
Solution.
The summations are performed and presented in the previous example's table. The standard deviation is
s_y = √(S_t / (n − 1)) = √(22.7143 / 6) = 1.9457
and the standard error of the estimate is
s_y/x = √(S_r / (n − 2)) = √(2.9911 / 5) = 0.7735
Thus, because s_y/x < s_y, the linear regression model has merit.
The extent of the improvement is quantified by
r² = (S_t − S_r) / S_t = (22.7143 − 2.9911) / 22.7143 = 0.868

or

r = √0.868 = 0.932
These results indicate that 86.8 percent of the original uncertainty has been explained by the linear model.
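The quantities in this example can be reproduced end to end. The raw x-y table is not shown above, so the dataset used here is an assumption chosen to match the quoted sums (∑x = 28, ∑y = 24, ∑xy = 119.5, ∑x² = 140):

```python
import math

# Assumed dataset consistent with the sums quoted in Examples 17.1 and 17.2.
x = [1, 2, 3, 4, 5, 6, 7]
y = [0.5, 2.5, 2.0, 4.0, 3.5, 6.0, 5.5]

n = len(x)
a1 = 0.8392857       # slope and intercept from Example 17.1
a0 = 0.07142857
ybar = sum(y) / n

St = sum((yi - ybar) ** 2 for yi in y)                       # spread about the mean
Sr = sum((yi - a0 - a1 * xi) ** 2 for xi, yi in zip(x, y))   # spread about the line

s_y = math.sqrt(St / (n - 1))    # standard deviation (quoted as 1.9457)
s_yx = math.sqrt(Sr / (n - 2))   # standard error of the estimate (quoted as 0.7735)
r2 = (St - Sr) / St              # coefficient of determination (quoted as 0.868)
```

All three values match the quoted figures to rounding.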
17.1.5 Linearization of Nonlinear Relationships
Linear regression provides a powerful technique for fitting a best line to data. However, it is predicated on the fact that the relationship between the dependent and independent variables is linear. This is not always the case, and the first step in any regression analysis should be to plot and visually inspect the data to determine whether a linear model applies. For example, the next figure shows some data that is obviously curvilinear. In some cases, techniques such as polynomial regression are appropriate. For others, transformations can be used to express the data in a form that is compatible with linear regression.
One example is the exponential model

y = α1 e^(β1 x)        (17.12)

where α1 and β1 are constants. As shown in the next figure, the equation represents a nonlinear relationship (for β1 ≠ 0) between x and y.
Another example of a nonlinear model is the simple power equation

y = α2 x^β2        (17.13)
where α2 and β2 are constant coefficients. As shown in the previous figure, the equation (for β2 ≠ 0 or 1) is nonlinear.
A third example of a nonlinear model is the saturation-growth-rate equation

y = α3 x / (β3 + x)        (17.14)

where α3 and β3 are constant coefficients. This model also represents a nonlinear relationship between y and x that levels off as x increases.
A simple alternative is to use mathematical manipulations to transform the equations into a linear form. Then, simple linear regression can be employed to fit the equations to the data.
Equation (17.12) can be linearized by taking its natural logarithm:

ln y = ln α1 + β1 x ln e

But because ln e = 1,

ln y = ln α1 + β1 x
Thus, a plot of ln y versus x will yield a straight line with a slope of β1 and an intercept of ln α1 (previous fig d).
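A sketch of this transformation in code, using synthetic noise-free data generated from known constants (α1 = 2 and β1 = 0.5 are assumptions for the demo, not values from the text):

```python
import math

alpha1_true, beta1_true = 2.0, 0.5          # assumed "true" constants for the demo
x = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0]
y = [alpha1_true * math.exp(beta1_true * xi) for xi in x]

# Regress ln(y) on x: ln y = ln(alpha1) + beta1 * x.
z = [math.log(yi) for yi in y]
n = len(x)
sx, sz = sum(x), sum(z)
sxz = sum(a * b for a, b in zip(x, z))
sxx = sum(a * a for a in x)
beta1 = (n * sxz - sx * sz) / (n * sxx - sx * sx)   # slope recovers beta1
alpha1 = math.exp(sz / n - beta1 * sx / n)          # exp(intercept) recovers alpha1
```

Because the demo data is noise-free, the fit recovers the generating constants to machine precision; with real data the recovered values would only approximate them.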
Equation (17.13) is linearized by taking its base-10 logarithm to give

log y = β2 log x + log α2

Thus, a plot of log y versus log x will yield a straight line with a slope of β2 and an intercept of log α2 (previous fig e).
Equation (17.14) is linearized by inverting it to give

1/y = (β3/α3)(1/x) + 1/α3

Thus, a plot of 1/y versus 1/x will be linear, with a slope of β3/α3 and an intercept of 1/α3 (previous fig f).
In their transformed forms, these models can be fit with linear regression to evaluate the constant coefficients. They can then be transformed back to their original state and used for predictive purposes. Example 17.4 illustrates this procedure for Eq. (17.13).
Example 17.4 Linearization of a Power Equation
Problem Statement.
Fit Eq. (17.13) to the data in the next table using a logarithmic transformation of the data.
Solution.
The next figure (a) is a plot of the original data in its untransformed state. Figure (b) shows the plot of the transformed data. A linear regression of the log-transformed data yields the result

log y = 1.75 log x − 0.300
Thus, the intercept, log α2, equals −0.300, and therefore, by taking the antilogarithm, α2 = 10^(−0.3) = 0.5. The slope is β2 = 1.75. Consequently, the power equation is

y = 0.5 x^1.75
This curve, as plotted in the next figure (a), indicates a good fit.
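This example can be reproduced in code. The table itself is not reproduced above, so the x-y values below are an assumption chosen to be consistent with the quoted result (slope ≈ 1.75, intercept ≈ −0.300):

```python
import math

# Assumed dataset consistent with Example 17.4's quoted fit.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [0.5, 1.7, 3.4, 5.7, 8.4]

u = [math.log10(xi) for xi in x]    # transformed abscissa
v = [math.log10(yi) for yi in y]    # transformed ordinate

n = len(u)
su, sv = sum(u), sum(v)
suv = sum(a * b for a, b in zip(u, v))
suu = sum(a * a for a in u)
beta2 = (n * suv - su * sv) / (n * suu - su * su)   # slope, close to 1.75
logA = sv / n - beta2 * su / n                      # intercept, close to -0.300
alpha2 = 10 ** logA                                 # close to 0.5
```

The fitted constants round to the quoted y = 0.5 x^1.75.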
17.1.6 General Comments on Linear Regression
We have focused on the simple derivation and practical use of equations to fit data. Some statistical assumptions that are inherent in the linear least-squares procedures are:
1. Each x has a fixed value; it is not random and is known without error.
2. The y values are independent random variables and all have the same variance.
3. The y values for a given x must be normally distributed.
Such assumptions are relevant to the proper derivation and use of regression. For example, the first assumption means that (1) the x values must be error-free and (2) the regression of y versus x is not the same as that of x versus y.
17.2 Polynomial Regression
Some engineering data, although exhibiting a marked pattern, is poorly represented by a straight line. For these cases, a curve would be better suited to fit the data. One method to accomplish this objective is to use transformations. Another alternative is to fit polynomials to the data using polynomial regression.

The least-squares procedure can be readily extended to fit the data with a higher-order polynomial. For example, suppose that we fit a second-order polynomial, or quadratic:
y = a0 + a1 x + a2 x² + e

For this case the sum of the squares of the residuals is
S_r = ∑_{i=1}^{n} (y_i − a0 − a1 x_i − a2 x_i²)²
Following the procedure of the previous section, we take the derivative of the previous equation with respect to each of the unknown coefficients of the polynomial, as in
∂S_r/∂a0 = −2 ∑ (y_i − a0 − a1 x_i − a2 x_i²)

∂S_r/∂a1 = −2 ∑ x_i (y_i − a0 − a1 x_i − a2 x_i²)

∂S_r/∂a2 = −2 ∑ x_i² (y_i − a0 − a1 x_i − a2 x_i²)
These equations can be set equal to zero and rearranged to develop the following set of normal equations:
n·a0 + (∑ x_i) a1 + (∑ x_i²) a2 = ∑ y_i

(∑ x_i) a0 + (∑ x_i²) a1 + (∑ x_i³) a2 = ∑ x_i y_i

(∑ x_i²) a0 + (∑ x_i³) a1 + (∑ x_i⁴) a2 = ∑ x_i² y_i
where all summations are from i = 1 through n. Note that the above three equations are linear and have three unknowns: a0, a1, and a2. The coefficients of the unknowns can be calculated directly from the observed data. For this case, we see that the problem of determining a least-squares second-order polynomial is equivalent to solving a system of three simultaneous linear equations.
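A sketch of the full procedure: build the normal equations from the data, then solve them with naive Gaussian elimination (the helper names `gauss_solve` and `quad_fit` are assumptions, not from the text):

```python
def gauss_solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting (small systems)."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]     # augmented matrix
    for k in range(n):
        p = max(range(k, n), key=lambda i: abs(M[i][k]))   # pivot row
        M[k], M[p] = M[p], M[k]
        for i in range(k + 1, n):
            f = M[i][k] / M[k][k]
            for j in range(k, n + 1):
                M[i][j] -= f * M[k][j]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):                   # back substitution
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x

def quad_fit(x, y):
    """Least-squares quadratic y = a0 + a1*x + a2*x^2 via the normal equations."""
    S = [sum(xi ** k for xi in x) for k in range(5)]              # S[k] = sum x^k
    T = [sum(yi * xi ** k for xi, yi in zip(x, y)) for k in range(3)]
    A = [[S[0], S[1], S[2]],
         [S[1], S[2], S[3]],
         [S[2], S[3], S[4]]]
    return gauss_solve(A, T)
```

Run on a dataset consistent with the sums used in the example that follows (an assumption, since the raw table is not reproduced here), `quad_fit` returns approximately (2.47857, 2.35929, 1.86071).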
Example: Polynomial Regression
Problem Statement.
Fit a second-order polynomial to the data in the first two columns of the next table.
Solution.
From the given data,

m = 2          ∑ x_i = 15        ∑ x_i⁴ = 979
n = 6          ∑ y_i = 152.6     ∑ x_i y_i = 585.6
x̄ = 2.5       ∑ x_i² = 55       ∑ x_i² y_i = 2488.8
ȳ = 25.433    ∑ x_i³ = 225
Therefore, the simultaneous linear equations are

[  6    15    55 ] {a0}   {  152.6 }
[ 15    55   225 ] {a1} = {  585.6 }
[ 55   225   979 ] {a2}   { 2488.8 }

Solving these equations with a technique such as Gauss elimination gives a0 = 2.47857, a1 = 2.35929, and a2 = 1.86071.
Continued.
Therefore, the least-squares quadratic equation for this case is

y = 2.47857 + 2.35929 x + 1.86071 x²
The standard error of the estimate based on the regression polynomial is

s_y/x = √(S_r / (n − (m + 1))) = √(3.74657 / (6 − 3)) = 1.12
The coefficient of determination is

r² = (S_t − S_r) / S_t = (2513.39 − 3.74657) / 2513.39 = 0.99851

and the correlation coefficient is r = 0.99925.
These results indicate that 99.851 percent of the original uncertainty has been explained by the model. This result supports the conclusion that the quadratic equation represents an excellent fit, as is also evident from the next figure.
17.3 Multiple Linear Regression
A useful extension of linear regression is the case where y is a linear function of two or more independent variables. For example, y might be a linear function of x1 and x2, as in

y = a0 + a1 x1 + a2 x2 + e

Such an equation is particularly useful when fitting experimental data where the variable being studied is often a function of two other variables. For this two-dimensional case, the regression "line" becomes a "plane" (next figure).
As with the previous cases, the "best" values of the coefficients are determined by setting up the sum of the squares of the residuals,

S_r = ∑_{i=1}^{n} (y_i − a0 − a1 x_{1i} − a2 x_{2i})²

and differentiating with respect to each of the unknown coefficients:
and di0erentiating with respect to each of the unnown
coe7cients.
∂Sr
∂a0
=−2∑ ( y i−a0−a1 x1i−a2 x2 i)
∂Sr
∂a1
=−2∑ x1 i( yi−a0−a1 x1 i−a2 x2 i)
∂Sr
∂a2
=−2∑ x2 i( yi−a0−a1 x1i−a2 x2 i)
The coefficients yielding the minimum sum of the squares of the residuals are obtained by setting the partial derivatives equal to zero and expressing the result in matrix form as

[ n            ∑ x_{1i}           ∑ x_{2i}         ] {a0}   { ∑ y_i        }
[ ∑ x_{1i}     ∑ x_{1i}²          ∑ x_{1i} x_{2i}  ] {a1} = { ∑ x_{1i} y_i }
[ ∑ x_{2i}     ∑ x_{1i} x_{2i}    ∑ x_{2i}²        ] {a2}   { ∑ x_{2i} y_i }
Example 17.6 Multiple Linear Regression
Problem Statement.
The following data was calculated from the equation y = 5 + 4x1 − 3x2. Use multiple linear regression to fit this data.
Solution.
The summations required to develop the normal equations are computed from the data. The result is

[  6     16.5    14 ] {a0}   {  54   }
[ 16.5   76.25   48 ] {a1} = { 243.5 }
[ 14     48      54 ] {a2}   { 100   }

which can be solved using a method such as Gauss elimination for

a0 = 5        a1 = 4        a2 = −3

which is consistent with the original equation from which the data was derived.
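The solution can be reproduced in code. The six (x1, x2, y) triples below are an assumption consistent with the sums in the matrix above (they were generated from y = 5 + 4x1 − 3x2); since the system is fixed at 3x3, Cramer's rule is an adequate solver for the sketch:

```python
# Assumed data consistent with the example's normal equations.
x1 = [0.0, 2.0, 2.5, 1.0, 4.0, 7.0]
x2 = [0.0, 1.0, 2.0, 3.0, 6.0, 2.0]
y  = [5.0, 10.0, 9.0, 0.0, 3.0, 27.0]

n = len(y)
s12 = sum(a * b for a, b in zip(x1, x2))
A = [[n,        sum(x1),                sum(x2)],
     [sum(x1),  sum(a * a for a in x1), s12],
     [sum(x2),  s12,                    sum(a * a for a in x2)]]
b = [sum(y),
     sum(a * c for a, c in zip(x1, y)),
     sum(a * c for a, c in zip(x2, y))]

def det3(M):
    """Determinant of a 3x3 matrix by cofactor expansion."""
    return (M[0][0] * (M[1][1] * M[2][2] - M[1][2] * M[2][1])
          - M[0][1] * (M[1][0] * M[2][2] - M[1][2] * M[2][0])
          + M[0][2] * (M[1][0] * M[2][1] - M[1][1] * M[2][0]))

D = det3(A)
coeffs = []
for col in range(3):                 # Cramer's rule: replace one column with b
    Ac = [row[:] for row in A]
    for r in range(3):
        Ac[r][col] = b[r]
    coeffs.append(det3(Ac) / D)
# coeffs recovers [a0, a1, a2] = [5, 4, -3]
```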
The foregoing two-dimensional case can be easily extended to m dimensions, as in

y = a0 + a1 x1 + a2 x2 + … + am xm + e

where the standard error is formulated as

s_y/x = √(S_r / (n − (m + 1)))

and the coefficient of determination is computed as in Eq. (17.10).
Although there may be certain cases where a variable is linearly related to two or more other variables, multiple linear regression has additional utility in the derivation of power equations of the general form

y = a0 x1^a1 x2^a2 ⋯ xm^am

Such equations are extremely useful when fitting experimental data. To use multiple linear regression, the equation is transformed by taking its logarithm to yield

log y = log a0 + a1 log x1 + a2 log x2 + … + am log xm

This transformation is similar in spirit to the one used to fit a power equation when y is a function of a single variable x.
Problem 17.4
=se least"s1uares regression to t a straight line to
x : && &' &: 3& 3D 3H 3H D:
DH
y 3H 3& 3H &; 3& &' : : &D 5
D
Compute the standard error of the estimate and the correlation coefficient. Plot the data and the regression line. If someone made an additional measurement of x = 10, y = 10, would you suspect that the measurement was valid or faulty? Justify your conclusion.
Solution.
The results can be summarized as

y = 31.0589 − 0.78055 x        (s_y/x = 4.476306;  r = 0.901489)
At x = 10, the best-fit equation gives 23.2534. The line and data can be plotted along with the point (10, 10). The value y = 10 is nearly 3 standard errors away from the line:

23.2534 − 10 = 13.2534 ≈ 3(4.476306) = 13.43

Thus, we can conclude that the measured value is probably erroneous.
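A sketch that reproduces these numbers and the outlier check, using the tabulated points as reconstructed above:

```python
import math

x = [6, 7, 11, 15, 17, 21, 23, 29, 29, 37, 39]
y = [29, 21, 29, 14, 21, 15, 7, 7, 13, 0, 3]

n = len(x)
sx, sy = sum(x), sum(y)
sxy = sum(a * b for a, b in zip(x, y))
sxx = sum(a * a for a in x)
a1 = (n * sxy - sx * sy) / (n * sxx - sx * sx)   # slope, close to -0.78055
a0 = sy / n - a1 * sx / n                        # intercept, close to 31.0589

Sr = sum((yi - a0 - a1 * xi) ** 2 for xi, yi in zip(x, y))
s_yx = math.sqrt(Sr / (n - 2))                   # close to 4.476

y_pred = a0 + a1 * 10                            # prediction at x = 10
residual = y_pred - 10
suspect = residual > 2 * s_yx   # the measured 10 sits far below the line
```

The residual of about 13.25 exceeds two (and nearly three) standard errors, supporting the conclusion that the measurement is suspect.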
Problem 17.13

An investigator has reported the data tabulated below for an experiment to determine the growth rate of bacteria k (per d), as a function of oxygen concentration c (mg/L). It is
known that such data can be modeled by the following equation:

k = k_max c² / (c_s + c²)

where c_s and k_max are parameters. Use a transformation to linearize this equation. Then use linear regression to estimate c_s and k_max and predict the growth rate at c = 2 mg/L.
c (mg/L):  0.5   0.8   1.5   2.5   4.0
k (/d):    1.1   2.4   5.3   7.6   8.9
Solution.
The equation can be linearized by inverting it to yield

1/k = (c_s/k_max)(1/c²) + 1/k_max

Consequently, a plot of 1/k versus 1/c² should yield a straight line with an intercept of 1/k_max and a slope of c_s/k_max.
c (mg/L)   k (/d)   1/c²        1/k        (1/c²)(1/k)   (1/c²)²
0.5        1.1      4.000000    0.909091   3.636364      16.000000
0.8        2.4      1.562500    0.416667   0.651042      2.441406
1.5        5.3      0.444444    0.188679   0.083857      0.197531
2.5        7.6      0.160000    0.131579   0.021053      0.025600
4.0        8.9      0.062500    0.112360   0.007022      0.003906
Sum                 6.229444    1.758375   4.399338      18.668443
Continued:
The slope and the intercept can be computed as

a1 = [5(4.399338) − (6.229444)(1.758375)] / [5(18.66844) − (6.229444)²] = 0.202489

a0 = 1.758375/5 − 0.202489(6.229444/5) = 0.099396
Therefore, k_max = 1/0.099396 = 10.06074 and c_s = 10.06074(0.202489) = 2.037189, and the fit is

k = 10.06074 c² / (2.037189 + c²)
This equation can be plotted together with the data. It can also be used to compute the growth rate at c = 2 mg/L:

k = 10.06074(2)² / (2.037189 + (2)²) = 6.666
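The whole calculation can be scripted; this sketch repeats the inversion, the straight-line fit, and the prediction at c = 2 mg/L:

```python
c = [0.5, 0.8, 1.5, 2.5, 4.0]
k = [1.1, 2.4, 5.3, 7.6, 8.9]

u = [1.0 / ci ** 2 for ci in c]     # transformed abscissa, 1/c^2
v = [1.0 / ki for ki in k]          # transformed ordinate, 1/k

n = len(u)
su, sv = sum(u), sum(v)
suv = sum(a * b for a, b in zip(u, v))
suu = sum(a * a for a in u)
slope = (n * suv - su * sv) / (n * suu - su * su)   # cs/kmax, close to 0.202489
intercept = sv / n - slope * su / n                 # 1/kmax, close to 0.099396

kmax = 1.0 / intercept              # close to 10.06
cs = kmax * slope                   # close to 2.037

k_at_2 = kmax * 2.0 ** 2 / (cs + 2.0 ** 2)          # close to 6.666
```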