Andrew Lonardelli
December 20, 2013
Multiple Linear Regression
Table Of Contents
Introduction
Multiple Linear Regression Model
Least Squares Estimation of the Parameters
The Matrix Approach to Linear Regression
Estimating σ²
Properties of the Least Squares Estimators
Test for Significance of Regression
R² and Adjusted R²
Tests on Individual Regression Coefficients and Subsets of Coefficients
Hypothesis for General Regression Test
Confidence Intervals on Individual Regression Coefficients
Confidence Interval on the Mean Response
Prediction of New Observations
Residual Analysis
Influential Observations
Polynomial Regression Models
Categorical Regressors and Indicator Variables
Selection of Variables in Model Building
Stepwise Regression
Forward Selection
Backward Elimination
Multicollinearity
Data/Analysis
Introduction:
In class, we covered simple linear regression, where there is a single predictor/regressor variable. This regressor variable comes with the slope of a best-fit line, which tries to capture as much of the information in the data as possible. Learning how to build multiple linear regression models gives ideas and insights into the relationships between different variables and different responses. Engineers and scientists usually use multiple linear regression when working with experiments in which many different variables affect the outcome.
Multiple Linear Regression Model
There are many situations in which there is more than one regressor variable; the model that handles these is called the multiple regression model. With k regressors we get

Y = β0 + β1x1 + β2x2 + ... + βkxk + ε   (1)

This is a multiple linear regression model with k regressors, and we assume the error term ε has mean zero. We say linear because equation (1) is a linear function of the unknown parameters β0, β1, β2, ..., βk.
A multiple linear regression model/equation describes a surface, where β0 is the intercept of the hyperplane and the coefficients of the regressors are known as the partial regression coefficients. β1 measures the expected change in Y per unit change in x1 while x2, ..., and xk are all held constant. The same can be said for the other partial regression coefficients.
The dependent variable is Y, while the independent variables are the different x's. Multiple linear regression models are used to approximate the response Y from the x variables.
Least Squares Estimation of the Parameters
The least squares method is used to estimate the regression coefficients in the multiple regression equation. Suppose that n > k observations are available, and let xij denote the ith observation of variable xj. The observations are:
Data for Multiple Linear Regression
y    x1   x2   …   xk
y1   x11  x12  …   x1k
y2   x21  x22  …   x2k
⋮    ⋮    ⋮        ⋮
yn   xn1  xn2  …   xnk
(This table is laid out the same way as the NHL data table used below.)
Then the model would be:

yi = β0 + β1xi1 + β2xi2 + ... + βkxik + εi,  i = 1, 2, ..., n

The least squares function is

L = Σ(i=1..n) εi² = Σ(i=1..n) (yi − β0 − Σ(j=1..k) βjxij)²

We want to minimize the least squares function with respect to β0, β1, ..., βk. The least squares estimates β̂0, β̂1, ..., β̂k must satisfy

∂L/∂β0 = −2 Σ(i=1..n) (yi − β̂0 − Σ(j=1..k) β̂jxij) = 0

and

∂L/∂βj = −2 Σ(i=1..n) (yi − β̂0 − Σ(l=1..k) β̂lxil) xij = 0,  j = 1, 2, ..., k

Simplifying these equations gives the scalar least squares normal equations:

n β̂0 + β̂1 Σ xi1 + β̂2 Σ xi2 + ... + β̂k Σ xik = Σ yi
β̂0 Σ xi1 + β̂1 Σ xi1² + ... + β̂k Σ xi1xik = Σ xi1yi
⋮
β̂0 Σ xik + β̂1 Σ xikxi1 + ... + β̂k Σ xik² = Σ xikyi
Given data, solutions for all the regression coefficients can be obtained with standard linear algebra
techniques.
The Matrix Approach to Linear Regression
When fitting a multiple linear regression model, it is much simpler to express the operations in matrix notation. If there are k regressor variables and n observations, (xi1, xi2, ..., xik, yi), i = 1, 2, ..., n, the model relating the regressors to the response is:

yi = β0 + β1xi1 + β2xi2 + ... + βkxik + εi,  i = 1, 2, ..., n
This model can be expressed in matrix notation:
y = Xβ + ε

where

y = [y1, y2, ..., yn]′ is the (n × 1) vector of observations,

X =
[ 1  x11  x12  ...  x1k ]
[ 1  x21  x22  ...  x2k ]
[ ⋮   ⋮    ⋮         ⋮  ]
[ 1  xn1  xn2  ...  xnk ]

is the (n × p) matrix of the levels of the regressor variables,

β = [β0, β1, ..., βk]′ is the (p × 1) vector of regression coefficients, and

ε = [ε1, ε2, ..., εn]′ is the (n × 1) vector of random errors.
The X matrix is called the model matrix.
The least squares estimator β̂ is the value of β that minimizes

L = Σ(i=1..n) εi² = ε′ε = (y − Xβ)′(y − Xβ)

and β̂ is the solution for β in the partial derivative equations:

∂L/∂β |β̂ = 0
These equations can be shown to be equivalent to the following normal equations:

X′X β̂ = X′y

This is the least squares equation in matrix form, identical to the scalar least squares equations given before.
The Least Squares Estimate of β

β̂ = (X′X)⁻¹X′y

This is the same equation as before; we have just isolated β̂. It is the matrix form of the normal equations and, as you can see, closely resembles the scalar normal equations.
With this, the fitted regression model is:

ŷi = β̂0 + Σ(j=1..k) β̂j xij,  i = 1, 2, ..., n

In matrix notation it looks like this:

ŷ = Xβ̂

The residuals are the differences between the observed yi and the fitted values ŷi, collected in the (n × 1) vector of residuals:

e = y − ŷ
Later on I will calculate the residual for my data.
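To make the matrix computations concrete, here is a minimal Python sketch using numpy. The small dataset is made up purely for illustration (the NHL data later in this report could be substituted), and later sketches in this report reuse the names defined here:

```python
import numpy as np

# Small made-up example: n = 5 observations, k = 2 regressors.
x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
x2 = np.array([2.0, 1.0, 4.0, 3.0, 5.0])
y = np.array([3.1, 3.9, 7.2, 7.8, 10.9])

# Model matrix X: a column of ones (intercept) plus one column per regressor.
X = np.column_stack([np.ones_like(x1), x1, x2])

# Least squares estimate beta_hat solves the normal equations X'X b = X'y;
# lstsq is numerically safer than forming (X'X)^-1 explicitly.
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)

y_hat = X @ beta_hat   # fitted values, y_hat = X beta_hat
e = y - y_hat          # (n x 1) vector of residuals
```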
Estimating σ2
Estimating the variance σ² of the error term in multiple linear regression is similar to estimating σ² in a simple linear regression model. In simple linear regression we divide the sum of squared residuals by n − 2 because there are only 2 parameters. In a multiple linear regression model there are p parameters, so we divide the sum of squared residuals SSE by n − p:

σ̂² = SSE / (n − p)

For my hockey data that you will see later on, there are 15 parameters in total (14 categories + 1 intercept).
The formula for SSE is

SSE = Σ(i=1..n) (yi − ŷi)² = e′e

Substituting e = y − ŷ = y − Xβ̂ into the equation above, we obtain

SSE = y′y − β̂′X′y
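Continuing the sketch above, the estimate is one line:

```python
# Error variance estimate, continuing the sketch (X and e defined above).
n, p = X.shape                # p = k + 1 parameters (intercept included)
SSE = e @ e                   # sum of squared residuals, e'e
sigma2_hat = SSE / (n - p)    # sigma^2 estimate = SSE / (n - p)
```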
Properties of the Least Squares Estimators
The properties of the least squares estimators follow from certain assumptions on the error terms. We assume that the errors εi are statistically independent with mean zero and variance σ². Under these assumptions, the least squares estimators are unbiased estimators of the regression coefficients. This property is shown like this:

E(β̂) = E[(X′X)⁻¹X′Y] = E[(X′X)⁻¹X′(Xβ + ε)] = β

Notice we assumed that E(ε) = 0 and used (X′X)⁻¹X′X = I (the identity matrix). Then β̂ is an unbiased estimator of β. The variances of the β̂j's are expressed in terms of the inverse of the X′X
matrix: the inverse (X′X)⁻¹ multiplied by σ² gives the covariance matrix of the regression coefficients. If there are 2 regressors, the covariance matrix looks like this:

C = (X′X)⁻¹ =
[ C00  C01  C02 ]
[ C10  C11  C12 ]
[ C20  C21  C22 ]
Then we can see that C10 = C01, C20 = C02, and C12 = C21, because (X′X)⁻¹ is symmetric.
Hence we have:

V(β̂j) = σ²Cjj  and  cov(β̂i, β̂j) = σ²Cij
In general the covariance matrix of β̂ is a symmetric (p × p) matrix whose jjth element is the variance of β̂j and whose ijth element is the covariance between β̂i and β̂j:

cov(β̂) = σ²(X′X)⁻¹

To obtain estimates of the variances of these regression coefficients, we replace σ² with an estimate σ̂². The square root of the estimated variance of the jth regression coefficient is called the estimated standard error of β̂j, or se(β̂j) = √(σ̂²Cjj). These standard errors measure the precision of estimation for the regression coefficients; a small standard error means good precision.
Test for Significance of Regression
The test for significance of regression checks whether there is a linear relationship between the response variable y and the regressor variables x1, x2, ..., xk. The hypotheses are

H0: β1 = β2 = ··· = βk = 0
H1: βj ≠ 0 for at least one j

Rejecting the null hypothesis means that at least one regressor variable contributes significantly to the model. Just as in simple linear regression, a similar procedure applies in this more general case. First the total sum of squares SST is partitioned into a sum of squares due to the model (regression) and a sum of squares due to the error:

SST = SSR + SSE
Now if the null hypothesis is true, SSR/σ² is a chi-squared random variable with k degrees of freedom (the number of regressors). SSE/σ² is also a chi-squared random variable, with n − p (observations minus parameters) degrees of freedom. The test statistic for H0: β1 = β2 = ··· = βk = 0 is

F0 = (SSR/k) / (SSE/(n − p)) = MSR/MSE

and it follows the F-distribution. We reject H0 if the computed f0 is greater than fα,k,n−p. The procedure is usually summarized in an analysis of variance table like this one.
Analysis of Variance for Testing Significance of Regression in Multiple Regression
Source of Variation Sum of Squares Degrees of Freedom Mean Square F0
Regression SSR k MSR MSR/MSE
Error or residual SSE n - p MSE
Total SST n - 1
Since SST is

SST = y′y − (Σ yi)²/n

we can write SSE as

SSE = y′y − β̂′X′y

Therefore SSR (the regression sum of squares) is

SSR = SST − SSE = β̂′X′y − (Σ yi)²/n
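A sketch of this ANOVA computation, continuing the Python example above (scipy is assumed available for the F-distribution tail probability):

```python
from scipy import stats

# ANOVA partition for the significance-of-regression test,
# continuing the sketch (X, y, e, n, p, SSE defined above).
k = p - 1                           # number of regressors
SST = y @ y - y.sum() ** 2 / n      # total sum of squares
SSR = SST - SSE                     # regression sum of squares
MSR, MSE = SSR / k, SSE / (n - p)
f0 = MSR / MSE                      # follows F(k, n - p) under H0
p_value = stats.f.sf(f0, k, n - p)  # reject H0 if p_value < alpha
```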
R² and Adjusted R²
We can also use the coefficient of determination R² from the simple linear regression model in the general multiple linear regression setting:

R² = SSR/SST = 1 − SSE/SST

The R² statistic is used to evaluate the fit of the model.
When working with multiple linear regression, many people prefer the adjusted R² statistic,

R²adj = 1 − [SSE/(n − p)] / [SST/(n − 1)]

because SSE/(n − p) is the error (residual) mean square and SST/(n − 1) is constant no matter how many variables are in the model. R² will only increase when a variable is added, so we consider R²adj instead. The adjusted R² statistic penalizes the analyst for adding terms to the model, which helps guard against overfitting, that is, including regressors that aren't useful. R²adj will be used when we look at variable selection.

If we add a regressor variable to the model, the regression sum of squares always increases while the error sum of squares decreases, so R² always increases; even adding an unimportant variable makes R² go up. We therefore look at R²adj instead, because it is a better measure of fit: R²adj only increases if the variable added to the model reduces the error mean square.
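In the running Python sketch these two statistics are:

```python
# R^2 and adjusted R^2, continuing the sketch (SSE, SST, n, p above).
R2 = 1.0 - SSE / SST
R2_adj = 1.0 - (SSE / (n - p)) / (SST / (n - 1))
```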
Tests On Individual Regression Coefficients and Subsets of Coefficients
We can test hypotheses on the individual regression coefficients, and these tests determine the potential value of each regressor variable in the regression model. This helps make the model more effective by showing which variables can be deleted and which might be added. The hypotheses for testing whether an individual regression coefficient βj equals a value βj0 are

H0: βj = βj0
H1: βj ≠ βj0

and the test statistic for this hypothesis is

T0 = (β̂j − βj0) / √(σ̂²Cjj)
where Cjj is the diagonal element of (X′X)⁻¹ corresponding to β̂j. The denominator of the test statistic is the standard error of the regression coefficient β̂j. The null hypothesis H0: βj = βj0 is rejected if |t0| > tα/2,n−p. This is known as the partial or marginal test because the regression coefficient β̂j depends on all the other regressors xi (i ≠ j). In the special case where H0: βj = 0 is not rejected, the regressor xj can be deleted from the model.
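Continuing the sketch, the marginal t-tests for all coefficients at once (with βj0 = 0):

```python
# t statistics for the individual coefficients (beta_j0 = 0),
# continuing the sketch; C = (X'X)^-1 and se(beta_j) = sqrt(sigma2_hat*Cjj).
C = np.linalg.inv(X.T @ X)
se = np.sqrt(sigma2_hat * np.diag(C))
t0 = beta_hat / se
p_values = 2 * stats.t.sf(np.abs(t0), n - p)   # two-sided p-values
```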
Partial F Test
We may also want to test a subset of the regression coefficients at once. Partition the coefficient vector as β = [β1′, β2′]′, where β1 contains r of the coefficients, and test

H0: β1 = 0
H1: β1 ≠ 0

where 0 is a vector of zeroes and β1 is a subset of the regression coefficients. With the regressors split into two groups, the model can be written as

y = Xβ + ε = X1β1 + X2β2 + ε
X1 represents the columns of X associated with β1, and X2 represents the columns associated with β2. For the full model with both β1 and β2 we know that β̂ = (X′X)⁻¹X′y. The regression sum of squares for all variables (including the intercept) is

SSR(β) = β̂′X′y  (p degrees of freedom)

and the error mean square of the full model is

MSE = (y′y − β̂′X′y) / (n − p)

The regression sum of squares of β1 given that β2 is already in the model is

SSR(β1|β2) = SSR(β) − SSR(β2)
This sum of squares has r degrees of freedom and is called the extra sum of squares due to β1. SSR(β1|β2) is the increase in the regression sum of squares obtained by including the variables x1, x2, ..., xr in the model. Under the null hypothesis β1 = 0, the test statistic is

F0 = [SSR(β1|β2)/r] / MSE

This is called the partial F-test, and if f0 > fα,r,n−p we reject H0 and conclude that at least one of the parameters in β1 is not zero, which means at least one of the variables x1, x2, ..., xr in X1 contributes significantly to the model. The partial F-test can measure the contribution of each individual regressor in the model as if it were the last variable added.
For a single regressor xj, the extra sum of squares SSR(βj|β0, ..., βj−1, βj+1, ..., βk) is the increase in the regression sum of squares caused by adding xj to a model that already includes x1, ..., xj−1, xj+1, ..., xk. The F-test can also measure the effect of sets of variables.
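Here is a minimal sketch of the extra-sum-of-squares computation; the function name and its arguments are illustrative, not from the textbook:

```python
def partial_f_test(X_full, X_reduced, y):
    """Extra-sum-of-squares F test: H0 says the regressors present in
    X_full but not in X_reduced contribute nothing (both matrices are
    assumed to include the intercept column)."""
    def sse(X):
        b, *_ = np.linalg.lstsq(X, y, rcond=None)
        res = y - X @ b
        return res @ res
    n, p = X_full.shape
    r = p - X_reduced.shape[1]                 # coefficients being tested
    ssr_extra = sse(X_reduced) - sse(X_full)   # SSR(beta1 | beta2)
    mse_full = sse(X_full) / (n - p)
    f0 = (ssr_extra / r) / mse_full
    return f0, stats.f.sf(f0, r, n - p)        # reject H0 for small p-value
```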
Confidence Intervals on Individual Regression Coefficients
A 100(1 − α)% confidence interval on the regression coefficient βj, j = 0, 1, ..., k, in a multiple linear regression model is

β̂j − tα/2,n−p √(σ̂²Cjj) ≤ βj ≤ β̂j + tα/2,n−p √(σ̂²Cjj)

We can also write it this way:

β̂j − tα/2,n−p se(β̂j) ≤ βj ≤ β̂j + tα/2,n−p se(β̂j)

because √(σ̂²Cjj) is the standard error of the regression coefficient β̂j. We use the t distribution in the confidence interval because the observations Yi are independently normally distributed with mean β0 + Σj βjxij and variance σ². Since the least squares estimator β̂ is a linear combination of the observations, it follows that β̂ is normally distributed with mean vector β and covariance matrix σ²(X′X)⁻¹. Cjj is the jjth element of the (X′X)⁻¹ matrix, and σ̂² is the estimate of the error variance.
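In the running sketch, with α = 0.05:

```python
# 95% confidence intervals on the individual coefficients,
# continuing the sketch (beta_hat, se, n, p defined above).
alpha = 0.05
t_crit = stats.t.ppf(1 - alpha / 2, n - p)
ci_lower = beta_hat - t_crit * se
ci_upper = beta_hat + t_crit * se
```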
Confidence Interval on the Mean Response
We can also construct a confidence interval on the mean response at a particular point (x01, x02, ..., x0k). We need to define the vector

x0 = [1, x01, x02, ..., x0k]′

The mean response at this point is E(Y|x0) = μY|x0 = x0′β, estimated by

μ̂Y|x0 = x0′β̂

The variance of this estimator is

V(μ̂Y|x0) = σ² x0′(X′X)⁻¹x0
The 100(1 − α)% confidence interval is constructed from the following t-distributed variable:

T = (μ̂Y|x0 − μY|x0) / √(σ̂² x0′(X′X)⁻¹x0)

The 100(1 − α)% confidence interval on the mean response at the point (x01, x02, ..., x0k) is therefore

μ̂Y|x0 − tα/2,n−p √(σ̂² x0′(X′X)⁻¹x0) ≤ μY|x0 ≤ μ̂Y|x0 + tα/2,n−p √(σ̂² x0′(X′X)⁻¹x0)
Prediction of New Observations
Given a point x01, x02, ..., x0k, we can also predict a future observation of the response variable Y. If x0 = [1, x01, x02, ..., x0k]′, a point estimate of the future observation Y0 at the point (x01, x02, ..., x0k) is

ŷ0 = x0′β̂

The 100(1 − α)% prediction interval for the future observation is

ŷ0 − tα/2,n−p √(σ̂²(1 + x0′(X′X)⁻¹x0)) ≤ Y0 ≤ ŷ0 + tα/2,n−p √(σ̂²(1 + x0′(X′X)⁻¹x0))

This prediction interval is always wider than the confidence interval on the mean response because of the additional 1 under the radical: there is more error in predicting a single new observation than in estimating the mean response.
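Continuing the sketch with a hypothetical new point x0 for the two-regressor example:

```python
# Mean-response CI and prediction interval at a new point x0,
# continuing the sketch (x0 is a made-up point for illustration).
x0 = np.array([1.0, 2.5, 3.0])             # [1, x01, x02]
y0_hat = x0 @ beta_hat
var_mean = sigma2_hat * (x0 @ C @ x0)      # variance of the mean response
var_pred = sigma2_hat * (1 + x0 @ C @ x0)  # variance for a new observation
ci = (y0_hat - t_crit * np.sqrt(var_mean), y0_hat + t_crit * np.sqrt(var_mean))
pi = (y0_hat - t_crit * np.sqrt(var_pred), y0_hat + t_crit * np.sqrt(var_pred))
```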
Residual Analysis
The residuals, defined by ei = yi − ŷi, help judge model adequacy. Plotting the residuals against variables that are excluded from the model but are possible candidates can show whether the fit would improve when a candidate variable is added.
The standardized residual

di = ei / √(σ̂²)

can be useful when assessing residual magnitude; standardized residuals are scaled so that their standard deviation is approximately unity. Then there is the studentized residual

ri = ei / √(σ̂²(1 − hii))

where hii is the ith diagonal element of the matrix

H = X(X′X)⁻¹X′

The H matrix is called the "hat" matrix, since

ŷ = Xβ̂ = X(X′X)⁻¹X′y = Hy

Thus H transforms the observed values of y into the vector of fitted values ŷ. Since each row of the matrix X corresponds to a vector xi′ = [1, xi1, xi2, ..., xik], another way to write the diagonal elements of the hat matrix is

hii = xi′(X′X)⁻¹xi

and σ²hii is the variance of the fitted value ŷi.
Under the assumption that the model errors are independently distributed with mean zero and variance σ², the variance of the ith residual ei is

V(ei) = σ²(1 − hii)

This means the hii elements must fall in the interval 0 < hii ≤ 1, which implies that the standardized residuals understate the true residual magnitude; the studentized residuals are therefore better for examining potential outliers.
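In the running sketch, the hat matrix and both residual scalings are:

```python
# Hat matrix, standardized and studentized residuals, continuing the sketch.
H = X @ C @ X.T                          # hat matrix H = X (X'X)^-1 X'
h = np.diag(H)                           # leverages h_ii
d = e / np.sqrt(sigma2_hat)              # standardized residuals
r = e / np.sqrt(sigma2_hat * (1 - h))    # studentized residuals
```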
Influential Observations
There may be points that are different and remote from the rest of the data. Such points can be influential in determining R², the estimates of the regression coefficients, and the magnitude of the error mean square. We can detect influential points by measuring a distance: the squared distance between the least squares estimate of β based on all n observations, β̂, and the estimate β̂(i) obtained when the ith point is removed. We use Cook's distance

Di = (β̂(i) − β̂)′ X′X (β̂(i) − β̂) / (p σ̂²),  i = 1, 2, ..., n

If the ith point is influential, its removal changes β̂(i) considerably from the value β̂, so a large value of Di means that the ith point is influential. The statistic Di is actually computed using

Di = (ri²/p) · hii/(1 − hii)

In this formula Di consists of the squared studentized residual ri², which reflects how well the model fits the ith observation yi, and the leverage term hii/(1 − hii), which measures how remote the point is from the rest of the data. A value of Di > 1 indicates that the point is influential; either component of Di (or both) may contribute to a large value.
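Continuing the sketch, Cook's distance follows directly from the studentized residuals and leverages computed above:

```python
# Cook's distance from the studentized residuals r and leverages h above.
D = (r ** 2 / p) * h / (1 - h)       # D_i > 1 flags an influential point
influential = np.where(D > 1)[0]
```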
Polynomial Regression Models
The second-degree polynomial in one variable is

Y = β0 + β1x + β11x² + ε

and the second-degree polynomial in two variables is

Y = β0 + β1x1 + β2x2 + β11x1² + β22x2² + β12x1x2 + ε

Both are linear regression models, because they are linear in the unknown β's. Polynomial regression models are used when the response is curvilinear, and the general principles of multiple linear regression still apply.
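In the running sketch, fitting the one-variable quadratic only changes the model matrix:

```python
# The same least squares machinery fits a polynomial: the model matrix
# simply gains an x^2 column (illustrative, reusing x1 and y from above).
X_poly = np.column_stack([np.ones_like(x1), x1, x1 ** 2])
beta_poly, *_ = np.linalg.lstsq(X_poly, y, rcond=None)
```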
Categorical Regressors and Indicator Variables
Categorical regressors arise when we take into account qualitative variables instead of quantitative variables. To define the different levels of a qualitative variable, we use numerical indicator variables. For example,
if red, blue, and green are the levels of a qualitative color variable, we can use two 0/1 indicator variables x1 and x2, with (x1, x2) = (0, 0) for red, (1, 0) for blue, and (0, 1) for green. In general, a qualitative variable with r levels can be represented by r − 1 indicator variables, each assigned the value zero or one.
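A small sketch of this r − 1 indicator coding (the color data here is made up):

```python
# Building r - 1 = 2 indicator columns for a 3-level categorical variable;
# "red" is the baseline level coded (0, 0).
colors = ["red", "blue", "green", "blue", "red"]
x_blue = np.array([1.0 if c == "blue" else 0.0 for c in colors])
x_green = np.array([1.0 if c == "green" else 0.0 for c in colors])
```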
Selection of Variables in Model Building
All the models contain an intercept β0, so with K candidate regressors there are K + 1 terms. The problem is figuring out which variables are the right ones to include in the model. Preferably we would like a model that uses only a few regressor variables, but we don't want to remove any important regression variables; such a model helps with prediction.

One criterion used to evaluate and compare regression models is R² (or R²adj). The analyst adds variables until the increase in R² or R²adj becomes small. Often R²adj will stabilize and then decrease as variables are added to the model. The model that maximizes R²adj is a good candidate for the best regression equation, and the model that maximizes R²adj also minimizes the mean squared error.
Another criterion is the Cp statistic, which measures the total mean square error of the fitted values for the regression model. The total standardized mean square error is

Γp = (1/σ²) [ Σ(i=1..n) (E(Ŷi) − E(Yi))² + Σ(i=1..n) V(Ŷi) ]

We use the mean square error from the full K + 1 term model as an estimate of σ²; that is, σ̂² = MSE(full model). The estimator of Γp is the Cp statistic:

Cp = SSE(p)/σ̂² − n + 2p

If the p-term model has negligible bias, then E(Cp) ≈ p.
The values of Cp for each regression model under consideration should be compared to p. Regression equations with negligible bias have values of Cp close to p, while those with significant bias have values of Cp significantly greater than p. We then choose as the "best" regression equation either a model with minimum Cp or a model with a slightly larger Cp.
The prediction error sum of squares (PRESS) statistic is another way to evaluate competing regression models. It is defined as the sum of squared differences between each observation yi and the corresponding predicted value ŷ(i) based on a model fit to the remaining n − 1 points. PRESS measures how well the model is likely to perform when predicting new data that were not used to fit the regression model. The formula for PRESS is

PRESS = Σ(i=1..n) (yi − ŷ(i))² = Σ(i=1..n) (ei / (1 − hii))²
Models with small values of PRESS are preferred.
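PRESS can be computed without refitting n models, using the hat-matrix identity e(i) = ei / (1 − hii); in the running sketch:

```python
# PRESS via the hat-matrix shortcut: each leave-one-out residual is
# e_i / (1 - h_ii), so no refitting is needed (e and h from above).
PRESS = np.sum((e / (1 - h)) ** 2)
```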
Stepwise Regression
This procedure constructs a regression model by adding or deleting variables at each step. The criterion for adding or removing a variable is the partial F-test. Let fin be the value of the F-random variable required for adding a variable to the model, and let fout be the value required for deleting a variable from the model. We must have fin ≥ fout, and usually fin = fout.
Stepwise regression starts by building a one-variable model using the regressor variable that has the highest correlation with the response Y; this regressor also produces the largest F-statistic. After a new variable is entered, each variable already in the model is re-examined: if its calculated value f < fout, that variable is removed; otherwise we keep it, and at the next step we test a new candidate variable together with each variable that has been kept.
At each step the set of remaining candidate regressors is examined, and the regressor with the largest partial F-statistic is entered if its observed value of f exceeds fin. Then the partial F-statistic for each regressor already in the model is calculated, and the regressor with the smallest observed value of F is deleted if that observed f < fout. The procedure continues until no other regressors can be added to or removed from the model.
Forward Selection
This procedure is a variation of stepwise regression in which we only add regressors to the model, one at a time, until there are no remaining candidate regressors that produce a significant increase in the regression sum of squares (variables are added one at a time as long as their partial F-value exceeds fin). Forward selection is a simplification of stepwise regression that does not use the partial F-test to remove variables added at previous steps. This is a potential weakness of forward selection: previously added variables are never re-checked.
Backward Elimination
This begins with all K candidate regressors in the model. Then the regressor with the smallest partial F-statistic is
deleted if this F-statistic is insignificant, that is, if f < fout. Next, the model with K - 1 regressors is fit, and the next
regressor for potential elimination is found. The algorithm terminates when no further regressor can be deleted.
(This technique will be used later on in my data.)
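Since backward elimination is the procedure applied to the NHL data below, here is a minimal sketch of the idea. It eliminates by individual-coefficient p-value with a 0.05 cutoff, as in the trials below, rather than by an explicit fout threshold; the function name is illustrative:

```python
def backward_eliminate(X, y, names, alpha=0.05):
    """Drop the regressor with the largest individual-t p-value until all
    remaining p-values are below alpha (column 0 of X is the intercept;
    names holds one label per column of X)."""
    cols = list(range(1, X.shape[1]))           # candidate regressor columns
    while True:
        Xc = X[:, [0] + cols]
        n, p = Xc.shape
        b, *_ = np.linalg.lstsq(Xc, y, rcond=None)
        res = y - Xc @ b
        s2 = (res @ res) / (n - p)
        se = np.sqrt(s2 * np.diag(np.linalg.inv(Xc.T @ Xc)))
        pvals = 2 * stats.t.sf(np.abs(b / se), n - p)
        worst = int(np.argmax(pvals[1:]))       # ignore the intercept
        if pvals[1:][worst] < alpha:            # every regressor significant
            return [names[c] for c in cols], b
        del cols[worst]                         # remove the worst regressor
```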
Multicollinearity
Normally, we expect to find dependencies between the response variable Y and the regressors xj. But, we can
also find that there are dependencies between the regressor variables xj. In situations where these
dependencies are strong, we say that multicollinearity exists.
The effects of multicollinearity can be evaluated. The diagonal elements of the matrix C = (X′X)⁻¹ can be written as

Cjj = 1 / (1 − Rj²),  j = 1, 2, ..., k

where Rj² is the coefficient of multiple determination resulting from regressing xj on the other k − 1 regressor variables. We can think of Rj² as a measure of the correlation between xj and the other regressors. The stronger the linear dependency of xj on the remaining regressor variables, and hence the stronger the multicollinearity, the larger the value of Rj² will be. Recall that V(β̂j) = σ²Cjj. Therefore, we say that the variance of β̂j is "inflated" by the quantity (1 − Rj²)⁻¹. Consequently, we define the variance inflation factor for βj as

VIF(βj) = 1 / (1 − Rj²),  j = 1, 2, ..., k

If the columns of the model matrix X are orthogonal, then the regressors are completely uncorrelated and the variance inflation factors are all unity; any VIF exceeding one indicates some level of multicollinearity.
If a VIF exceeds 10, multicollinearity is a problem. Another sign that multicollinearity may be present is when the F-test for significance of regression is significant but the tests on the individual regression coefficients are not. Collecting more observations and possibly deleting some variables can reduce the level of multicollinearity.
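A sketch of the VIF computation under the definition above (the function name is illustrative):

```python
def vif(X):
    """Variance inflation factors: regress each regressor column on the
    others and return 1 / (1 - Rj^2). Column 0 of X is the intercept."""
    out = []
    for j in range(1, X.shape[1]):
        others = [c for c in range(X.shape[1]) if c != j]
        b, *_ = np.linalg.lstsq(X[:, others], X[:, j], rcond=None)
        res = X[:, j] - X[:, others] @ b
        xj = X[:, j]
        r2 = 1 - (res @ res) / ((xj - xj.mean()) @ (xj - xj.mean()))
        out.append(1.0 / (1.0 - r2))
    return np.array(out)
```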
Data/Analysis:
Now that we have finished summarizing multiple linear regression, we are going to look at the data we will use.
NHL 2012-2013 Stats of 30 Teams
W   GF  GA  ADV PPGF PCTG PEN BMI AVG  SHT PPGA PKPCT SHGF SHGA FG
25 112 100 157 25 15.9 655 6 13.6 167 20 88 2 1 25
26 145 128 166 31 18.7 776 12 16.2 157 19 87.9 1 4 32
36 149 97 150 25 16.7 444 6 9.2 141 18 87.2 5 5 29
28 127 106 122 18 14.8 584 14 12.2 163 21 87.1 5 2 26
23 132 139 171 37 21.6 755 18 15.7 184 26 85.9 2 3 26
25 116 112 169 34 20.1 521 4 10.9 147 22 85 3 4 25
29 124 114 149 29 19.5 531 16 11.1 150 23 84.7 3 5 29
26 122 115 165 26 15.8 609 4 12.7 169 27 84 2 3 31
19 123 131 169 34 20.1 522 16 10.9 175 29 83.4 4 1 24
27 131 114 166 33 19.9 481 10 10 161 27 83.2 2 1 26
24 115 115 155 22 14.2 605 10 12.6 144 25 82.6 3 2 20
24 122 110 185 34 18.4 469 6 9.8 164 30 81.7 3 5 25
30 134 115 135 29 21.5 535 10 11.1 162 30 81.5 5 5 24
19 128 157 155 31 20 518 6 10.8 151 28 81.5 6 2 25
26 126 108 153 24 15.7 444 12 9.2 148 28 81.1 5 4 21
19 110 122 176 28 15.9 509 14 10.6 169 32 81.1 11 6 19
22 128 141 171 29 17 623 18 13 179 34 81 2 3 23
26 118 125 151 27 17.9 545 6 11.4 135 26 80.7 1 0 23
18 147 147 163 31 19 577 10 12 155 30 80.6 0 4 19
16 114 150 140 21 15 598 8 12.5 183 36 80.3 3 3 16
24 135 136 156 31 19.9 461 6 9.6 142 28 80.3 2 0 23
21 121 125 169 25 14.8 504 12 10.5 169 34 79.9 2 2 19
29 146 124 203 42 20.7 636 12 13.2 173 35 79.8 0 2 29
24 126 141 145 20 13.8 535 8 11.1 138 28 79.7 1 3 23
36 162 119 170 42 24.7 563 12 11.7 167 34 79.6 2 3 31
21 118 139 163 23 14.1 630 8 13.1 178 37 79.2 7 7 24
27 146 130 164 44 26.8 516 8 10.8 163 36 77.9 3 4 26
19 127 159 165 24 14.6 538 8 11.2 161 36 77.6 3 4 19
16 109 133 140 24 17.1 471 10 9.8 139 34 75.5 1 4 18
15 109 170 142 29 20.14 541 6 11.3 151 39 74.2 4 1 17
(2012-2013 STATISTICS GATHERED ON NHL.COM)
Before going through the different calculations, we should first understand what each category means.
W(Y) = WINS
GF(x1) = Goals For
GA(x2) = Goals Against
ADV(x3) = Total Advantage. Power-play opportunities
PPGF(x4) = Power-play Goals For
PCTG(x5) = Power-play Percentage. Power-play Goals For Divided by Total Advantages
PEN(x6) = Total Penalty Minutes Including Bench Minors
BMI(x7) = Total Bench Minor Minutes
AVG(x8) = Average Penalty Minutes Per Game
SHT(x9) = Total Times Short-handed. Measures Opponent Opportunities
PPGA(x10) = Power-play Goals Against
PKPCT(x11) = Penalty Killing Percentage. Measures a Team's Ability to Prevent Goals While
its Opponent is on a Power-play. Opponent Opportunities Minus Power-play Goals Divided by
Opponents' Opportunities
SHGF(x12) = Short-handed Goals For
SHGA(x13) = Short-handed Goals Against
FG(x14) = Games Scored First
With this data, I will investigate a multiple linear regression model with the response variable Y being wins; the other variables will be my regressor variables. To build a good model, I will use backward elimination: first placing all my regressor variables in Minitab, then removing the variables whose individual tests for significance show p-values greater than 0.05. The variable with the highest p-value is removed, one at a time per trial, until no p-values greater than 0.05 remain. The highlighted variables are the ones being removed in the next trial.
(Trial 1) Regression Analysis: W versus Gf, GA, ... The regression equation is
W = - 381 + 0.0761 Gf - 0.170 GA - 0.080 ADV + 0.024 PPGF + 0.13 PCTG
+ 0.520 PEN + 0.073 BMI - 24.6 AVG - 0.669 SHT + 3.47 PPGA + 5.09 PKPCT
- 0.004 SHGF - 0.378 SHGA + 0.557 FG
Predictor Coef SE Coef T P
Constant -381.5 162.2 -2.35 0.033
Gf 0.07607 0.05823 1.31 0.211
GA -0.16994 0.03515 -4.84 0.000
ADV -0.0798 0.1687 -0.47 0.643
PPGF 0.0244 0.9283 0.03 0.979
PCTG 0.131 1.458 0.09 0.930
PEN 0.5198 0.2946 1.76 0.098
BMI 0.0735 0.1058 0.69 0.498
AVG -24.60 14.18 -1.73 0.103
SHT -0.6689 0.2427 -2.76 0.015
PPGA 3.474 1.292 2.69 0.017
PKPCT 5.087 2.061 2.47 0.026
SHGF -0.0044 0.2327 -0.02 0.985
SHGA -0.3781 0.2549 -1.48 0.159
FG 0.5569 0.1519 3.67 0.002
S = 1.82634 R-Sq = 93.7% R-Sq(adj) = 87.8%
Analysis of Variance
Source DF SS MS F P
Regression 14 743.967 53.141 15.93 0.000
Residual Error 15 50.033 3.336
Total 29 794.000
Source DF Seq SS
Gf 1 317.950
GA 1 332.049
ADV 1 10.149
PPGF 1 2.036
PCTG 1 0.125
PEN 1 1.048
BMI 1 5.217
AVG 1 0.575
SHT 1 1.771
PPGA 1 4.566
PKPCT 1 18.115
SHGF 1 0.038
SHGA 1 5.509
FG 1 44.820
Unusual Observations
Obs Gf W Fit SE Fit Residual St Resid
5 132 23.000 20.196 1.434 2.804 2.48R
R denotes an observation with a large standardized residual.
Predicted Values for New Observations
New Obs Fit SE Fit 95% CI 95% PI
1 340.657 100.285 (126.906, 554.409) (126.871, 554.444)XX
XX denotes a point that is an extreme outlier in the predictors.
Values of Predictors for New Observations
New Obs Gf GA ADV PPGF PCTG PEN BMI AVG SHT PPGA PKPCT SHGF
1 130 114 155 30.0 18.5 600 4.00 10.0 11.2 155 26.0 81.4
New Obs SHGA FG
1 3.00 23.0
(Trial 2) Regression Analysis: W versus Gf, GA, ... The regression equation is
W = - 380 + 0.0766 Gf - 0.170 GA - 0.080 ADV + 0.024 PPGF + 0.13 PCTG
+ 0.518 PEN + 0.073 BMI - 24.5 AVG - 0.667 SHT + 3.46 PPGA + 5.07 PKPCT
- 0.380 SHGA + 0.557 FG
Predictor Coef SE Coef T P
Constant -380.2 142.6 -2.67 0.017
Gf 0.07662 0.04895 1.57 0.137
GA -0.17013 0.03268 -5.21 0.000
ADV -0.0796 0.1630 -0.49 0.632
PPGF 0.0240 0.8987 0.03 0.979
PCTG 0.130 1.412 0.09 0.928
PEN 0.5181 0.2714 1.91 0.074
BMI 0.0731 0.1006 0.73 0.478
AVG -24.51 13.02 -1.88 0.078
SHT -0.6673 0.2199 -3.03 0.008
PPGA 3.464 1.146 3.02 0.008
PKPCT 5.070 1.808 2.80 0.013
SHGA -0.3800 0.2268 -1.68 0.113
FG 0.5566 0.1460 3.81 0.002
S = 1.76837 R-Sq = 93.7% R-Sq(adj) = 88.6%
R²adj increased, which means the model now accounts for 88.6% of the variability in the data, and the error mean square decreased.
Analysis of Variance
Source DF SS MS F P
Regression 13 743.966 57.228 18.30 0.000
Residual Error 16 50.034 3.127
Total 29 794.000
Source DF Seq SS
Gf 1 317.950
GA 1 332.049
ADV 1 10.149
PPGF 1 2.036
PCTG 1 0.125
PEN 1 1.048
BMI 1 5.217
AVG 1 0.575
SHT 1 1.771
PPGA 1 4.566
PKPCT 1 18.115
SHGA 1 4.953
FG 1 45.412
Unusual Observations
Obs Gf W Fit SE Fit Residual St Resid
5 132 23.000 20.196 1.388 2.804 2.56R
R denotes an observation with a large standardized residual.
Predicted Values for New Observations
New Obs Fit SE Fit 95% CI 95% PI
1 340.241 94.739 (139.402, 541.079) (139.367, 541.114)XX
XX denotes a point that is an extreme outlier in the predictors.
Values of Predictors for New Observations
New Obs Gf GA ADV PPGF PCTG PEN BMI AVG SHT PPGA PKPCT SHGA
1 130 114 155 30.0 18.5 600 4.00 10.0 11.2 155 26.0 3.00
New Obs FG
1 23.0
(Trial 3) Regression Analysis: W versus Gf, GA, ... The regression equation is
W = - 380 + 0.0770 Gf - 0.170 GA - 0.0753 ADV + 0.168 PCTG + 0.517 PEN
+ 0.0731 BMI - 24.4 AVG - 0.666 SHT + 3.46 PPGA + 5.06 PKPCT - 0.382 SHGA
+ 0.557 FG
Predictor Coef SE Coef T P
Constant -380.1 138.3 -2.75 0.014
Gf 0.07702 0.04520 1.70 0.107
GA -0.17033 0.03082 -5.53 0.000
ADV -0.07532 0.02465 -3.06 0.007
PCTG 0.1676 0.1573 1.07 0.301
PEN 0.5168 0.2591 1.99 0.062
BMI 0.07315 0.09761 0.75 0.464
AVG -24.45 12.42 -1.97 0.066
SHT -0.6663 0.2098 -3.18 0.006
PPGA 3.459 1.096 3.16 0.006
PKPCT 5.061 1.721 2.94 0.009
SHGA -0.3816 0.2123 -1.80 0.090
FG 0.5566 0.1417 3.93 0.001
S = 1.71561 R-Sq = 93.7% R-Sq(adj) = 89.2%
R²adj increased; the model accounts for 89.2% of the variability in the data, and the error mean square was reduced.
Analysis of Variance
Source DF SS MS F P
Regression 12 743.964 61.997 21.06 0.000
Residual Error 17 50.036 2.943
Total 29 794.000
Source DF Seq SS
Gf 1 317.950
GA 1 332.049
ADV 1 10.149
PCTG 1 2.116
PEN 1 0.923
BMI 1 5.297
AVG 1 0.607
SHT 1 1.780
PPGA 1 4.340
PKPCT 1 17.818
SHGA 1 5.505
FG 1 45.430
Unusual Observations
Obs Gf W Fit SE Fit Residual St Resid
5 132 23.000 20.191 1.336 2.809 2.61R
R denotes an observation with a large standardized residual.
Predicted Values for New Observations
New Obs Fit SE Fit 95% CI 95% PI
1 339.757 90.220 (149.409, 530.105) (149.375, 530.139)XX
XX denotes a point that is an extreme outlier in the predictors.
Values of Predictors for New Observations
New Obs Gf GA ADV PCTG PEN BMI AVG SHT PPGA PKPCT SHGA FG
1 130 114 155 18.5 600 4.00 10.0 11.2 155 26.0 3.00 23.0
(Trial 4) Regression Analysis: W versus Gf, GA, ... The regression equation is
W = - 353 + 0.0867 Gf - 0.173 GA - 0.0748 ADV + 0.163 PCTG + 0.495 PEN
- 23.4 AVG - 0.616 SHT + 3.23 PPGA + 4.72 PKPCT - 0.358 SHGA + 0.533 FG
Predictor Coef SE Coef T P
Constant -353.3 132.0 -2.68 0.015
Gf 0.08673 0.04277 2.03 0.058
GA -0.17274 0.03028 -5.71 0.000
ADV -0.07481 0.02434 -3.07 0.007
PCTG 0.1633 0.1552 1.05 0.307
PEN 0.4955 0.2543 1.95 0.067
AVG -23.41 12.19 -1.92 0.071
SHT -0.6162 0.1965 -3.14 0.006
PPGA 3.232 1.040 3.11 0.006
PKPCT 4.717 1.639 2.88 0.010
SHGA -0.3578 0.2073 -1.73 0.101
FG 0.5328 0.1364 3.91 0.001
S = 1.69459 R-Sq = 93.5% R-Sq(adj) = 89.5%
R²adj increased; the model accounts for 89.5% of the variability in the data, and the error mean square was reduced.
Analysis of Variance
Source DF SS MS F P
Regression 11 742.311 67.483 23.50 0.000
Residual Error 18 51.689 2.872
Total 29 794.000
Source DF Seq SS
Gf 1 317.950
GA 1 332.049
ADV 1 10.149
PCTG 1 2.116
PEN 1 0.923
AVG 1 0.500
SHT 1 4.319
PPGA 1 4.713
PKPCT 1 20.018
SHGA 1 5.742
FG 1 43.831
Unusual Observations
Obs Gf W Fit SE Fit Residual St Resid
5 132 23.000 20.056 1.308 2.944 2.73R
R denotes an observation with a large standardized residual.
Predicted Values for New Observations
New Obs Fit SE Fit 95% CI 95% PI
1 320.809 85.543 (141.090, 500.528) (141.054, 500.564)XX
XX denotes a point that is an extreme outlier in the predictors.
Values of Predictors for New Observations
New Obs Gf GA ADV PCTG PEN AVG SHT PPGA PKPCT SHGA FG
1 130 114 155 18.5 600 10.0 11.2 155 26.0 3.00 23.0
(Trial 5) Regression Analysis: W versus Gf, GA, ... The regression equation is
W = - 294 + 0.112 Gf - 0.174 GA - 0.0712 ADV + 0.375 PEN - 17.7 AVG - 0.530 SHT
+ 2.79 PPGA + 3.98 PKPCT - 0.363 SHGA + 0.566 FG
Predictor Coef SE Coef T P
Constant -294.3 119.8 -2.46 0.024
Gf 0.11172 0.03566 3.13 0.005
GA -0.17371 0.03035 -5.72 0.000
ADV -0.07115 0.02416 -2.95 0.008
PEN 0.3749 0.2277 1.65 0.116
AVG -17.70 10.95 -1.62 0.122
SHT -0.5297 0.1789 -2.96 0.008
PPGA 2.7884 0.9532 2.93 0.009
PKPCT 3.977 1.484 2.68 0.015
SHGA -0.3633 0.2078 -1.75 0.097
FG 0.5657 0.1331 4.25 0.000
S = 1.69931 R-Sq = 93.1% R-Sq(adj) = 89.5%
R²adj stayed the same; the model still accounts for 89.5% of the variability in the data, and the error mean square barely changed.
Analysis of Variance
Source DF SS MS F P
Regression 10 739.135 73.913 25.60 0.000
Residual Error 19 54.865 2.888
Total 29 794.000
Source DF Seq SS
Gf 1 317.950
GA 1 332.049
ADV 1 10.149
PEN 1 0.563
AVG 1 0.005
SHT 1 3.928
PPGA 1 5.859
PKPCT 1 10.621
SHGA 1 5.852
FG 1 52.158
Unusual Observations
Obs Gf W Fit SE Fit Residual St Resid
5 132 23.000 19.656 1.254 3.344 2.92R
R denotes an observation with a large standardized residual.
Predicted Values for New Observations
New Obs Fit SE Fit 95% CI 95% PI
1 278.982 75.946 (120.025, 437.939) (119.986, 437.979)XX
XX denotes a point that is an extreme outlier in the predictors.
Values of Predictors for New Observations
New Obs Gf GA ADV PEN AVG SHT PPGA PKPCT SHGA FG
1 130 114 155 600 10.0 11.2 155 26.0 3.00 23.0
(Trial 6) Regression Analysis: W versus Gf, GA, ... The regression equation is
W = - 243 + 0.136 Gf - 0.188 GA - 0.0670 ADV + 0.00696 PEN - 0.460 SHT
+ 2.40 PPGA + 3.34 PKPCT - 0.304 SHGA + 0.504 FG
Predictor Coef SE Coef T P
Constant -243.0 120.1 -2.02 0.057
Gf 0.13569 0.03372 4.02 0.001
GA -0.18826 0.03013 -6.25 0.000
ADV -0.06697 0.02497 -2.68 0.014
PEN 0.006962 0.006255 1.11 0.279
SHT -0.4598 0.1805 -2.55 0.019
PPGA 2.3953 0.9581 2.50 0.021
PKPCT 3.342 1.488 2.25 0.036
SHGA -0.3037 0.2126 -1.43 0.169
FG 0.5043 0.1326 3.80 0.001
S = 1.76653 R-Sq = 92.1% R-Sq(adj) = 88.6%
R²adj decreased slightly, which is acceptable; the model accounts for 88.6% of the variability in the data, and the error mean square increased, while there are fewer insignificant regressors.
Analysis of Variance
Source DF SS MS F P
Regression 9 731.587 81.287 26.05 0.000
Residual Error 20 62.413 3.121
Total 29 794.000
Source DF Seq SS
Gf 1 317.950
GA 1 332.049
ADV 1 10.149
PEN 1 0.563
SHT 1 3.929
PPGA 1 5.738
PKPCT 1 10.637
SHGA 1 5.445
FG 1 45.127
Unusual Observations
Obs Gf W Fit SE Fit Residual St Resid
2 145 26.000 28.802 1.231 -2.802 -2.21R
5 132 23.000 19.433 1.296 3.567 2.97R
19 147 18.000 20.628 1.213 -2.628 -2.05R
R denotes an observation with a large standardized residual.
Predicted Values for New Observations
New Obs Fit SE Fit 95% CI 95% PI
1 210.621 65.582 (73.819, 347.422) (73.770, 347.472)XX
XX denotes a point that is an extreme outlier in the predictors.
Values of Predictors for New Observations
New Obs Gf GA ADV PEN SHT PPGA PKPCT SHGA FG
1 130 114 155 600 11.2 155 26.0 3.00 23.0
(Trial 7) Regression Analysis: W versus Gf, GA, ADV, SHT, PPGA, PKPCT, SHGA, FG
The regression equation is
W = - 211 + 0.135 Gf - 0.171 GA - 0.0651 ADV - 0.387 SHT + 2.07 PPGA
+ 2.92 PKPCT - 0.280 SHGA + 0.539 FG
Predictor Coef SE Coef T P
Constant -210.5 117.2 -1.80 0.087
Gf 0.13489 0.03390 3.98 0.001
GA -0.17085 0.02590 -6.60 0.000
ADV -0.06512 0.02505 -2.60 0.017
SHT -0.3869 0.1692 -2.29 0.033
PPGA 2.0723 0.9183 2.26 0.035
PKPCT 2.923 1.448 2.02 0.056
SHGA -0.2801 0.2127 -1.32 0.202
FG 0.5387 0.1297 4.15 0.000
S = 1.77656 R-Sq = 91.7% R-Sq(adj) = 88.5%
R²adj decreased slightly, which is acceptable, and so did R² (R² will always decrease when a regressor is removed); the model accounts for 88.5% of the variability in the data, and the error mean square increased, while there are fewer insignificant regressors.
Analysis of Variance
Source DF SS MS F P
Regression 8 727.721 90.965 28.82 0.000
Residual Error 21 66.279 3.156
Total 29 794.000
Source DF Seq SS
Gf 1 317.950
GA 1 332.049
ADV 1 10.149
SHT 1 2.016
PPGA 1 0.974
PKPCT 1 6.341
SHGA 1 3.795
FG 1 54.446
Unusual Observations
Obs Gf W Fit SE Fit Residual St Resid
5 132 23.000 19.375 1.302 3.625 3.00R
R denotes an observation with a large standardized residual.
Predicted Values for New Observations
New Obs Fit SE Fit 95% CI 95% PI
1 181.882 60.628 (55.800, 307.964) (55.746, 308.018)XX
XX denotes a point that is an extreme outlier in the predictors.
Values of Predictors for New Observations
New Obs Gf GA ADV SHT PPGA PKPCT SHGA FG
1 130 114 155 11.2 155 26.0 3.00 23.0
(Trial 8) Regression Analysis: W versus Gf, GA, ADV, SHT, PPGA, PKPCT, FG The regression equation is
W = - 157 + 0.138 Gf - 0.166 GA - 0.0621 ADV - 0.312 SHT + 1.63 PPGA
+ 2.25 PKPCT + 0.530 FG
Predictor Coef SE Coef T P
Constant -156.5 111.6 -1.40 0.175
Gf 0.13778 0.03439 4.01 0.001
GA -0.16594 0.02605 -6.37 0.000
ADV -0.06214 0.02536 -2.45 0.023
SHT -0.3121 0.1620 -1.93 0.067
PPGA 1.6260 0.8675 1.87 0.074
PKPCT 2.250 1.377 1.63 0.116
FG 0.5296 0.1317 4.02 0.001
S = 1.80593 R-Sq = 91.0% R-Sq(adj) = 88.1%
R²adj decreased slightly, which is acceptable; the model accounts for 88.1% of the variability in the data, and the error mean square increased, while there are fewer insignificant regressors. R² also decreased, but that is expected.
Analysis of Variance
Source DF SS MS F P
Regression 7 722.25 103.18 31.64 0.000
Residual Error 22 71.75 3.26
Total 29 794.00
Source DF Seq SS
Gf 1 317.95
GA 1 332.05
ADV 1 10.15
SHT 1 2.02
PPGA 1 0.97
PKPCT 1 6.34
FG 1 52.77
Unusual Observations
Obs Gf W Fit SE Fit Residual St Resid
5 132 23.000 19.831 1.276 3.169 2.48R
19 147 18.000 20.993 1.180 -2.993 -2.19R
29 109 16.000 19.006 1.252 -3.006 -2.31R
R denotes an observation with a large standardized residual.
Predicted Values for New Observations
New Obs Fit SE Fit 95% CI 95% PI
1 152.049 57.164 (33.499, 270.599) (33.439, 270.658)XX
XX denotes a point that is an extreme outlier in the predictors.
Values of Predictors for New Observations
New Obs Gf GA ADV SHT PPGA PKPCT FG
1 130 114 155 11.2 155 26.0 23.0
(Trial 9) Regression Analysis: W versus Gf, GA, ADV, SHT, PPGA, FG The regression equation is
W = 25.6 + 0.153 Gf - 0.175 GA - 0.0571 ADV - 0.0516 SHT + 0.217 PPGA + 0.517 FG
Predictor Coef SE Coef T P
Constant 25.552 5.968 4.28 0.000
Gf 0.15307 0.03428 4.47 0.000
GA -0.17550 0.02629 -6.67 0.000
ADV -0.05713 0.02608 -2.19 0.039
SHT -0.05156 0.02952 -1.75 0.094
PPGA 0.21653 0.09688 2.23 0.035
FG 0.5167 0.1361 3.80 0.001
S = 1.87036 R-Sq = 89.9% R-Sq(adj) = 87.2%
R²adj decreased slightly, which is acceptable; the model accounts for 87.2% of the variability in the data, and the error mean square increased, while there are fewer insignificant regressors. R² also decreased, but that is expected.
Analysis of Variance
Source DF SS MS F P
Regression 6 713.54 118.92 33.99 0.000
Residual Error 23 80.46 3.50
Total 29 794.00
Source DF Seq SS
Gf 1 317.95
GA 1 332.05
ADV 1 10.15
SHT 1 2.02
PPGA 1 0.97
FG 1 50.40
Unusual Observations
Obs Gf W Fit SE Fit Residual St Resid
19 147 18.000 21.263 1.210 -3.263 -2.29R
29 109 16.000 20.392 0.954 -4.392 -2.73R
R denotes an observation with a large standardized residual.
Predicted Values for New Observations
New Obs Fit SE Fit 95% CI 95% PI
1 61.457 14.450 (31.564, 91.350) (31.315, 91.599)XX
XX denotes a point that is an extreme outlier in the predictors.
Values of Predictors for New Observations
New Obs Gf GA ADV SHT PPGA FG
1 130 114 155 11.2 155 23.0
(Trial 10) Regression Analysis: W versus Gf, GA, ADV, PPGA, FG The regression equation is
W = 20.9 + 0.163 Gf - 0.174 GA - 0.0689 ADV + 0.156 PPGA + 0.461 FG
Predictor Coef SE Coef T P
Constant 20.868 5.554 3.76 0.001
Gf 0.16264 0.03525 4.61 0.000
GA -0.17388 0.02738 -6.35 0.000
ADV -0.06889 0.02625 -2.62 0.015
PPGA 0.15614 0.09429 1.66 0.111
FG 0.4608 0.1378 3.34 0.003
S = 1.94861 R-Sq = 88.5% R-Sq(adj) = 86.1%
R²adj decreased slightly, which is acceptable; the model accounts for 86.1% of the variability in the data, and the error mean square increased, while there are fewer insignificant regressors. R² also decreased, but that is expected.
Analysis of Variance
Source DF SS MS F P
Regression 5 702.87 140.57 37.02 0.000
Residual Error 24 91.13 3.80
Total 29 794.00
Source DF Seq SS
Gf 1 317.95
GA 1 332.05
ADV 1 10.15
PPGA 1 0.29
FG 1 42.43
Unusual Observations
Obs Gf W Fit SE Fit Residual St Resid
19 147 18.000 21.427 1.257 -3.427 -2.30R
R denotes an observation with a large standardized residual.
Predicted Values for New Observations
New Obs Fit SE Fit 95% CI 95% PI
1 46.311 12.042 (21.457, 71.165) (21.134, 71.488)XX
XX denotes a point that is an extreme outlier in the predictors.
Values of Predictors for New Observations
New Obs Gf GA ADV PPGA FG
1 130 114 155 155 23.0
*(Trial 11) Regression Analysis: W versus Gf, GA, ADV, FG* The regression equation is
W = 20.9 + 0.172 Gf - 0.152 GA - 0.0514 ADV + 0.365 FG
Predictor Coef SE Coef T P
Constant 20.923 5.744 3.64 0.001
Gf 0.17239 0.03594 4.80 0.000
GA -0.15227 0.02489 -6.12 0.000
ADV -0.05144 0.02486 -2.07 0.049
FG 0.3648 0.1293 2.82 0.009
S = 2.01536 R-Sq = 87.2% R-Sq(adj) = 85.2%
R²adj decreased slightly, which is acceptable; the model accounts for 85.2% of the variability in the data, and the error mean square increased, while there are no more insignificant regressors. R² also decreased, but that is expected.
Analysis of Variance
Source DF SS MS F P
Regression 4 692.46 173.11 42.62 0.000
Residual Error 25 101.54 4.06
Total 29 794.00
Source DF Seq SS
Gf 1 317.95
GA 1 332.05
ADV 1 10.15
FG 1 32.31
Unusual Observations
Obs Gf W Fit SE Fit Residual St Resid
19 147 18.000 22.427 1.141 -4.427 -2.66R
R denotes an observation with a large standardized residual.
Predicted Values for New Observations
New Obs Fit SE Fit 95% CI 95% PI
1 26.392 0.585 (25.188, 27.596) (22.070, 30.714)
Values of Predictors for New Observations
New Obs Gf GA ADV FG
1 130 114 155 23.0
*The highlight represents the variables that were removed in the next trial. *
Now we will apply the least squares method to the NHL data using my last trial. With n = 30 and k = 4, the calculation is too large to do manually, so instead I used the program Minitab and obtained this linear regression model:
W = 20.9 + 0.172 Gf - 0.152 GA - 0.0514 ADV + 0.365 FG
Some aspects of the model make sense while others don't:
Remember, the 4 regressor variables form a linear function with the response Y (wins). β̂0 is the intercept and it is 20.9. In practical terms the intercept doesn't really make sense, because it says that if a team shows up to its games and does absolutely nothing, it will still finish the season with about 21 wins.
The coefficient β̂1 is the expected change in Y (wins) per unit change in x1 (GF), if the other variables are held constant. β̂1 is +0.172, and it makes sense: goals for (GF) counts the goals your team scores, and each goal gives that team a better chance to win. The more goals you score, the better the chance of getting a win.
β̂2 is −0.152, which also makes sense. Goals against (GA) is the number of goals the other team scores on you, which reduces the chance of winning a game. β̂2 is the expected change in wins (Y) per unit change in goals against (x2).
β̂3 is −0.0514, and this coefficient doesn't make sense. ADV is the number of power-plays your team has, which is an advantage and can help you win hockey games. This coefficient should be positive, because the more power-plays a team has, the greater its chance of winning a game. Just like the other coefficients, it is the expected change in wins per unit change in x3.
Finally, β̂4 is +0.365, and this makes sense. FG counts the games in which your team scores first; a team that scores first takes an early lead and is one step closer to winning. β̂4 is the expected change in wins per unit change in games scored first.
This fitted regression model is practical for predicting wins for an NHL team, given values of the other regressor variables.
β̂0 = 20.9 with p-value of 0.001
β̂1 = 0.172 with p-value of 0.000
β̂2 = −0.152 with p-value of 0.000
β̂3 = −0.0514 with p-value of 0.049
β̂4 = 0.365 with p-value of 0.009
The p-values come from the individual coefficient t-test described earlier, T0 = β̂j / se(β̂j).
The R² is 87.2% while the R²adj is 85.2%. R² alone shouldn't be relied on, because adding any variable never decreases R², even when the errors rise. Many people use R²adj instead because it is a better measure of fit. This is not the largest R²adj seen: in trials 4 and 5 the R²adj was 89.5%, meaning those models accounted for 89.5% of the variability and were significant overall, but some of their regressor variables had p-values larger than 0.05, which made those regressors not significant. Removing them led to an R²adj of 85.2%, which is still good: the model accounts for about 85% of the variability. R²adj is better because it guards against overfitting, while R² encourages overfitting when a not-so-useful variable is added. We can say that R²adj penalizes the analyst for adding terms to the model.
My estimated error variance (σ̂²) is 4.06.

The 95% prediction interval (α = 0.05) at the point (Gf = 130, GA = 114, ADV = 155, FG = 23) is

22.070 < Y0 < 30.714

The 95% confidence interval for the mean response at the same point is

25.188 < μY|x0 < 27.596

For the one unusual observation (yi = 18), the residual is −4.427 and the standardized residual is −2.66.
Work Cited
Montgomery, D. C., and G. C. Runger. Applied Statistics and Probability for Engineers, 5th edition. Wiley.