Upload
deborahrosales
View
223
Download
0
Embed Size (px)
Citation preview
7/28/2019 06 Simple Modeling
1/16
Linear fit of a theoretical relationship
concentration
abso
rbance,
550nm
Were just trying to minimize the standard error (SE) for
our model.
The optimum value for b1 is the one that minimizes the
standard error.
We also must assume that the values are homoscedastic:
(unweighted least squares)
7/28/2019 06 Simple Modeling
2/16
homoscedasticnot homoscedastic
Weighted least squares
Used when the data is not homoscedastic.
Each measurement is weighted by thereciprocal of its variance.
v y12 ! v y2
2 ! .... ! v yn2 !_ i
b1=xi
2!xi yi! =0.0204 soY=0.0204X
An ANOVA can can be conducted to determin
the goodness of fit for the model.
Source of Variance Sum of Squares DF
Model p
Residual n-p-1
Total n-1
p = number of parameters.
n = number of data points.
7/28/2019 06 Simple Modeling
3/16
ppm absexp abscalc model residual
1 0.020 0.020 0.0402 0.0004
2 0.038 0.0408 0.0198 0.0028
3 0.064 0.0612 -0.0006 -0.0028
4 0.077 0.0816 -0.021 0.0046
5 0.105 0.102 -0.003 -0.0030
0.00416 0.000046
Calculate absorbances based on Y = b1X
b1=0.0204 Y=0.0608
F test will show that the model accounts for asignificant amount of the variance comparedto the residual. F = 272
Both examples show a linear regression fit of
a data set. The one on the right indicates that
the best model may not be a linear one.
Y
+r
-r
0 r
7/28/2019 06 Simple Modeling
4/16
Correlation, |r| 100 r2
0.10 1
0.20 4
0.50 25
0.80 64
0.90 81
0.95 90
0.99 99
1.00 100
Relationship between correlation coefficient (r)
and proportion of variance (r2).
|r| valuesbelow 0.9
indicate a poor
relationship.
Y
X
Y
X
Y
X
Y
X
r = 0.90
r = 1.00
r = -1.00
r = 0.60
7/28/2019 06 Simple Modeling
5/16
7/28/2019 06 Simple Modeling
6/16
Regression of absorbance by ppm
0
0.02
0.04
0.06
0.08
0.1
0.12
0 1 2 3 4 5 6
ppm
absorbance
95% CI
95% CI of
the mean
Pred(absorbance) / absorbance
0
0.02
0.04
0.06
0.08
0.1
0.12
0 0.02 0.04 0.06 0.08 0.1 0.12
Pred(absorbance)
absorbance
Standardized residuals / ppm
-1.5
-1
-0.5
0
0.5
1
0 1 2 3 4 5 6
ppm
Standardizedr
esiduals
7/28/2019 06 Simple Modeling
7/16
In some cases, it is best to do a
simple transformation of your data
prior to attempting a linear
regression fit.
The goal of the transformation is tomake the resulting relationship
linear.
An ANOVA analysis can be conducted
on the transformed data.
Transform Equation of the lin
Y X Y = b X + a
Y 1/X Y = a + b / X
1/Y X Y = 1 / ( a + b X )
X/Y X Y = X / ( a + b X )
log Y X Y = a bX
log Y log X Y = a Xb
Y Xn Y = a + b Xn
b = slope, a = intercept.
MLR assumes a linear relationship between
Xi and y, with superimposed noise (e). It
also assumes that there are no
interactions between X1, X2, Xn.
Y = b0 + b1X1 + b2X2 + . . . . bnXn + e
We then fit a regression equation. The bn
values are considered to be estimates of
the true population parameters, n.
Y=b1X1+b2X2
b1 X12! +b2 X1X2= YX1!!
b2 X22! +b1 X1X2= YX2!!
R2 simply looks a how well all of the X
account for Y. The adjusted R2 is weighted
by the number of X used.
R2=
SSTotalSSTotal-SSresidual
adjusted R2=MSTotal
MSTotal-MSresidual
7/28/2019 06 Simple Modeling
8/16
One advantage of an MLR approach is that you
can obtain multiple measurements (X) for a
single response. This can help eliminate noise.
Well look at one example using MLR -Determination of Octane Number by NIR.
Well revisit this example in a later unit when
we compare it to other multivariate calibration
methods.
A 915 nm, CH2
stretch B 1021 nm, CH2
/CH3
combination band
C 1151 nm, aromatic and CH3
stretch D 1194 nm, CH3
stretch
E 1394 nm, CH2
combination bands F 1412 nm aromatic & CH2
combination bands
G 1435 nm aromatic & CH2 combination bands
0.0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
900 1000 1100 1200 1300 1400 1500 1600
Wavelength
Absorbance
A
C
B
D
E
F
G
Single variable - 900 - just for comparison.
80
82
84
86
88
90
92
94
96
98
0.21 0.215 0.22 0.225 0.23 0.235
900
Octane#
Active
Validation
Model
Conf. interv
(Mean 95%
Conf. interv
(Obs. 95%
7/28/2019 06 Simple Modeling
9/16
Standardized residuals / 900
-2
-1.5
-1
-0.5
0
0.5
1
1.5
2
0.21 0.215 0.22 0.225 0.23 0.235
900
Standardized
residuals
Active
Validation
Using all values results in a significant
improvement in the fit..
Sum of squares analysis shows which lines are
the most significant for the fit.
What happened to the other variables?
Pred(Octane #) / Octane #
83
85
87
89
91
93
83 85 87 89 91 93
Pred(Octane #)
Octane#
Active Validation
Octane # / Standardized residuals
-2.5
-2
-1.5
-1
-0.5
0
0.5
1
1.5
2
83 85 87 89 91
Octane #
Standardizedr
esiduals
Active Validation
7/28/2019 06 Simple Modeling
10/16
Pred(Octane #) / Octane #
83
85
87
89
91
93
83 85 87 89 91 93
Pred(Octane #)
Octane#
No significant change in the
quality of the fit.
Means you only need 4 lines
to determine octane number.
Pred(Octane #) / Standardized residuals
-2.5
-2
-1.5
-1
-0.5
0
0.5
1
1.5
2
83 85 87 89 91 93
Pred(Octane #)
Standardizedr
esiduals
Number of X
variablesQual Quant Mixed
1Simple
ANOVALR -
2 or more ANOCA MLR ANCOVA
Both treatment andcovariate variables
are significant with
the same model.
Covariate variable
is significant,treatment is not.
Neither are
significant.
Treatment is
significant but
covariate isnt.
Both treatment and
covariate variables aresignificant but the
models are different.
7/28/2019 06 Simple Modeling
11/16
For simplicity, well look at a
single near IR region. It has
one of the highest correlations
with octane number of those
evaluated earlier.
Start with a simple linear
regression analysis, ignoring
the fact there are both summer
and winter blends included.
80
82
84
86
88
90
92
94
96
0.34 0.36 0.38 0.4 0.42
a1360
Octane
#
Active Model
Conf. interval (Mean 95%) Conf. interval (Obs. 95%)
Standardized residuals / a1360
-2
-1.5
-1
-0.5
0
0.5
1
1.5
2
0.34 0.36 0.38 0.4 0.42
a1360
Standardized
residuals
Octane # / Standardized residuals
-2
-1.5
-1
-0.5
0
0.5
1
1.5
2
83 85 87 89 91 93
Octane #
Sta
ndardizedr
esiduals
It seems pretty clear that asimple linear regression has a
problem. It is also obvious that
there are two types of samples.
Evaluating a simple
scatterplot may help.y = 0.0023x + 0.159
R2
= 0.9072
y = 0.0021x + 0.2249
R2
= 0.8963
0.35
0.36
0.37
0.38
0.39
0.40
0.41
0.42
0.43
83.0 84.0 85.0 86.0 87.0 88.0 89.0 90.0 91.0 92.0 93.0
Summer
Winter
Linear (Summer)
Linear (Winter)
Lets try an ANCOVA
using a1360 as thecovariate and the blen
type as an additional
factor.
Pred(Octane #) / Octane #
83
85
87
89
91
93
83 85 87 89 91 93
Pred(Octane #)
Octan
e#
Octane # / Standardized residuals
-2
-1.5
-1
-0.5
0
0.5
1
1.5
2
2.5
3
83 85 87 89 91 93
Octane #
Standardizedr
esiduals
7/28/2019 06 Simple Modeling
12/16
x y
1 2
2 43 8
For our simple model, the program will
attempt to find this minimum value.
Yi
f(x)
error
7/28/2019 06 Simple Modeling
13/16
Yi
f(x)
error
This can be more difficult as the number of
adjustable parameters increases. There can
be several false minima that must be avoided.
Typical options
Initial estimate of parameters
An initial guess of your values can not only
save processing time but avoid false
convergence as well.
Scaling of parameters
The magnitude of your parameters can varygreatly. Scaling will give them comparable
weight during the fit.
Maximum iterations
How many guesses it should attemptbefore quitting.
Limits
Do you want to set any upper/lowerlimits for the parameters?
Convergence
How good does the estimate need to be.
7/28/2019 06 Simple Modeling
14/16
For this model, the goal is to determine the
minimum error solution for the following:
We will be minimizinby tracking the sum
of squares - that iswhat is held in thedelta2 column
A A1
Column B contains known constants for theexperiment
Column E contains the values to beadjusted - Dp, Pp and Hp. SSE is the sumof squares that will be tested.
7/28/2019 06 Simple Modeling
15/16
7/28/2019 06 Simple Modeling
16/16
Pred(A1) / A1
0
0.5
1
1.5
2
2.5
0 0.5 1 1.5 2 2.5
Pred(A1)
A1