Correlation and Regression Analysis
• Definitions
• Regression
• Linear Least Squares
• Continuous Least Squares
ITCS 4133/5133: Numerical Comp. Methods 1 Regression
Regression
• Frequently, in engineering problems, estimates of variables with minimal error are needed.
• Simply using a function of aggregate measures (mean, standard deviation) is insufficient.
• Building a model to relate a random variable to measured values of other variables is termed optimization or model calibration.
• Uses concepts of calculus and numerical analysis to determine unknowns of a function that minimizes or maximizes some objective.
Statistical Optimization
Consists of the following elements:
• Objective Function: the function to be optimized
• Criterion Variable: the dependent variable
• Predictor Variables: the independent variables
• Regression Coefficients: unknowns to be determined
• Mathematical Model: the function that is used to estimate values of the criterion variable
• Dataset: measured values of criterion and predictor variables
Optimization requires (1) an objective function, (2) the mathematical model, and (3) a dataset.
Statistical Optimization (contd)
• Bivariate Model:
\[ E = aT + b \]
where the evaporation $E$ is related to the temperature $T$ via a linear model, parameterized by the two constants $a$ (slope) and $b$ (intercept).
• Multivariate Model:
\[ Y = a_1 X_1 + a_2 X_2 + \cdots + a_p X_p + b \]
where $p$ is the number of predictor variables, $X_i$ is the $i$th predictor variable, $b$ is the intercept coefficient, and the $a_i$ are partial regression coefficients.
• Models may be linear, quadratic, cubic, etc.
Principle of Least Squares: Motivation
• Oil Reservoir Model: Data relates the equilibrium constant of a reaction to pressure.
Principle of Least Squares: Motivation (contd)
Least Squares Approximation
• The $a_i$ coefficients can be determined using the Principle of Least Squares.
• Least Squares is an example of a regression method.
• The principle of least squares is used to regress $Y$ on the $X_i$ so as to bring the expected value of the estimated random variable toward the mean of the measured set.
Least Squares Approximation: Procedure
• Error (or residual) is defined as
\[ e_i = \hat{Y}_i - Y_i \]
where $e_i$, $\hat{Y}_i$, $Y_i$ are the $i$th error, predicted, and measured values respectively.
• Objective Function:
\[ F = \min \sum_{i=1}^{n} (\hat{Y}_i - Y_i)^2 \]
• Minimize the objective function by taking its derivative w.r.t. each unknown and setting it to zero.
• Solve the resulting set of equations.
Linear LSQ Example: Bivariate Model
\[ F = \min \sum_{i=1}^{n} (\hat{Y}_i - Y_i)^2 = \min \sum_{i=1}^{n} (aX_i + b - Y_i)^2 \]
Taking derivatives,
\[ \frac{\partial \sum_{i=1}^{n} (\hat{Y}_i - Y_i)^2}{\partial a} = 2 \sum_{i=1}^{n} (aX_i + b - Y_i)(X_i) = 0 \]
\[ \frac{\partial \sum_{i=1}^{n} (\hat{Y}_i - Y_i)^2}{\partial b} = 2 \sum_{i=1}^{n} (aX_i + b - Y_i) = 0 \]
Linear LSQ Example: Bivariate Model (contd)
• Upon simplification these become,
\[ a \sum_{i=1}^{n} X_i^2 + b \sum_{i=1}^{n} X_i = \sum_{i=1}^{n} X_i Y_i \]
\[ a \sum_{i=1}^{n} X_i + b\,n = \sum_{i=1}^{n} Y_i \]
• Let $S_x = \sum_{i=1}^{n} X_i$, $S_{xx} = \sum_{i=1}^{n} X_i^2$, etc. The equations become
\[ a S_{xx} + b S_x = S_{xy} \]
\[ a S_x + b\,n = S_y \]
Linear LSQ Example: Bivariate Model (contd)
\[ a S_{xx} + b S_x = S_{xy} \]
\[ a S_x + b\,n = S_y \]
Can use Cramer's rule for small systems:
\[ a = \det(A_1)/\det(A), \qquad b = \det(A_2)/\det(A) \]
\[ a = \frac{n S_{xy} - S_x S_y}{n S_{xx} - S_x^2} \]
\[ b = \frac{S_{xx} S_y - S_{xy} S_x}{n S_{xx} - S_x^2} \]
Linear LSQ Approximation: Algorithm
Linear LSQ Approximation: Example: Noisy Data
Quadratic LSQ Approximation
• Approximate the function $f(x)$ with a quadratic,
\[ f(x) = ax^2 + bx + c \]
• Error function is
\[ E = [a x_1^2 + b x_1 + c - y_1]^2 + \cdots + [a x_n^2 + b x_n + c - y_n]^2 \]
• Partially differentiate w.r.t. $a$, $b$ and $c$ and equate to zero.
• The normal equations are given by
\[ a \sum_{i=1}^{n} x_i^4 + b \sum_{i=1}^{n} x_i^3 + c \sum_{i=1}^{n} x_i^2 = \sum_{i=1}^{n} x_i^2 y_i \]
\[ a \sum_{i=1}^{n} x_i^3 + b \sum_{i=1}^{n} x_i^2 + c \sum_{i=1}^{n} x_i = \sum_{i=1}^{n} x_i y_i \]
\[ a \sum_{i=1}^{n} x_i^2 + b \sum_{i=1}^{n} x_i + c\,n = \sum_{i=1}^{n} y_i \]
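The three normal equations can be assembled from power sums and solved with Gaussian elimination. A minimal sketch, assuming nothing beyond the slides' formulas (the function name and test data are illustrative):

```python
def fit_quadratic(xs, ys):
    """Quadratic least-squares fit f(x) = a*x^2 + b*x + c by building
    the 3x3 normal equations and solving with Gaussian elimination."""
    n = len(xs)
    S = [sum(x**k for x in xs) for k in range(5)]                    # S[k] = sum x_i^k
    T = [sum((x**k) * y for x, y in zip(xs, ys)) for k in range(3)]  # T[k] = sum x_i^k y_i
    # Augmented matrix; unknown order is (a, b, c).
    A = [[S[4], S[3], S[2], T[2]],
         [S[3], S[2], S[1], T[1]],
         [S[2], S[1], n,    T[0]]]
    for col in range(3):                       # forward elimination, partial pivoting
        piv = max(range(col, 3), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        for r in range(col + 1, 3):
            f = A[r][col] / A[col][col]
            for c in range(col, 4):
                A[r][c] -= f * A[col][c]
    coeffs = [0.0, 0.0, 0.0]
    for r in (2, 1, 0):                        # back substitution
        coeffs[r] = (A[r][3] - sum(A[r][c] * coeffs[c]
                                   for c in range(r + 1, 3))) / A[r][r]
    return coeffs  # [a, b, c]

# Data lying exactly on (x - 1)^2 recovers a = 1, b = -2, c = 1.
a, b, c = fit_quadratic([-1, 0, 1, 2], [4, 1, 0, 1])
```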
Quadratic LSQ Approximation: Algorithm
Quadratic LSQ Approximation: Example
General LSQ Approximation
• Similarly, LSQ approximation can be extended to fit cubics; this results in 4 equations in 4 unknowns.
• More generally, data can be approximated by a function that is a linear combination of a fixed set of functions, also known as basis functions.
  ◦ Linear case: $g_1(x) = 1$, $g_2(x) = x$.
  ◦ Quadratic case: $g_1(x) = 1$, $g_2(x) = x$, $g_3(x) = x^2$.
• For the general case with four basis functions,
\[ f(x) = a_1 g_1(x) + a_2 g_2(x) + a_3 g_3(x) + a_4 g_4(x) \]
where $g_i(x)$, $i \in \{1, 2, 3, 4\}$ are the basis functions, and the error $E$ is
\[ E = \sum_{i=1}^{n} [f(x_i) - y_i]^2 \]
General LSQ Approximation (contd)
• For $n$ data points,
\[ E = [a_1 g_1(x_1) + a_2 g_2(x_1) + a_3 g_3(x_1) + a_4 g_4(x_1) - y_1]^2 + \cdots + [a_1 g_1(x_n) + a_2 g_2(x_n) + a_3 g_3(x_n) + a_4 g_4(x_n) - y_n]^2 \]
• Setting $\frac{\partial E}{\partial a_1} = 0$,
\[ a_1 \sum_{i=1}^{n} g_1(x_i) g_1(x_i) + a_2 \sum_{i=1}^{n} g_1(x_i) g_2(x_i) + a_3 \sum_{i=1}^{n} g_1(x_i) g_3(x_i) + a_4 \sum_{i=1}^{n} g_1(x_i) g_4(x_i) = \sum_{i=1}^{n} g_1(x_i) y_i \]
and similarly for $\frac{\partial E}{\partial a_2} = 0$, etc.
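The same construction works for any list of basis functions: row $k$ of the normal equations pairs $g_k$ with every $g_j$. A minimal sketch, with an illustrative function name and sample data (the slides do not give code):

```python
def fit_basis(xs, ys, basis):
    """General least-squares fit f(x) = sum_j a_j * g_j(x).
    Row k of the normal equations:
      sum_j a_j * sum_i g_k(x_i) g_j(x_i) = sum_i g_k(x_i) y_i."""
    m = len(basis)
    G = [[g(x) for g in basis] for x in xs]   # row i = basis values at x_i
    # Augmented normal-equation matrix, m rows of m+1 entries.
    A = [[sum(G[i][k] * G[i][j] for i in range(len(xs))) for j in range(m)]
         + [sum(G[i][k] * y for i, y in enumerate(ys))]
         for k in range(m)]
    for col in range(m):                      # elimination with partial pivoting
        piv = max(range(col, m), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        for r in range(col + 1, m):
            f = A[r][col] / A[col][col]
            for c in range(col, m + 1):
                A[r][c] -= f * A[col][c]
    a = [0.0] * m
    for r in range(m - 1, -1, -1):            # back substitution
        a[r] = (A[r][m] - sum(A[r][c] * a[c]
                              for c in range(r + 1, m))) / A[r][r]
    return a

# The linear basis {1, x} recovers y = 3x - 2 from consistent data.
coef = fit_basis([0, 1, 2], [-2, 1, 4], [lambda x: 1.0, lambda x: x])
```

Choosing `basis = [1, x, x**2]` reproduces the quadratic normal equations of the previous slides.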
Continuous Least-Squares Approximation
• Extension to a continuously defined function.
• Define the problem as fitting a function to a function defined over an interval, say $[0, 1]$.
• Summations are replaced by integrals.
\[ E = \int_0^1 [ax^2 + bx + c - s(x)]^2 \, dx \]
resulting in
\[ \frac{a}{5} + \frac{b}{4} + \frac{c}{3} = \int_0^1 x^2 s(x)\,dx \]
\[ \frac{a}{4} + \frac{b}{3} + \frac{c}{2} = \int_0^1 x\,s(x)\,dx \]
\[ \frac{a}{3} + \frac{b}{2} + c = \int_0^1 s(x)\,dx \]
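These three equations can be solved numerically for any target $s(x)$: evaluate the right-hand integrals by quadrature, then apply Cramer's rule as in the bivariate case. A minimal sketch (Simpson's rule and the function names are my choices, not from the slides):

```python
def simpson(f, a, b, n=200):
    # Composite Simpson's rule over [a, b] with n (even) subintervals.
    h = (b - a) / n
    s = f(a) + f(b) + sum((4 if i % 2 else 2) * f(a + i * h) for i in range(1, n))
    return s * h / 3

def fit_quadratic_continuous(s):
    """Continuous least-squares quadratic on [0, 1]: solve
    a/5 + b/4 + c/3 = int x^2 s, a/4 + b/3 + c/2 = int x s,
    a/3 + b/2 + c = int s, using Cramer's rule."""
    rhs = [simpson(lambda x: x * x * s(x), 0, 1),
           simpson(lambda x: x * s(x), 0, 1),
           simpson(s, 0, 1)]
    M = [[1 / 5, 1 / 4, 1 / 3],
         [1 / 4, 1 / 3, 1 / 2],
         [1 / 3, 1 / 2, 1.0]]
    def det(m):
        return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
              - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
              + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))
    D = det(M)
    sol = []
    for j in range(3):  # replace column j with the right-hand side
        Mj = [[rhs[i] if c == j else M[i][c] for c in range(3)] for i in range(3)]
        sol.append(det(Mj) / D)
    return sol  # [a, b, c]

# Fitting s(x) = x^2 recovers a = 1, b = 0, c = 0 up to quadrature error.
a, b, c = fit_quadratic_continuous(lambda x: x * x)
```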