Model Selection
A Forward Selection Heuristic
1. Regress Y on each of the k potential X variables.
2. Determine the best single-variable model.
3. Regress Y on the best variable and each of the remaining k − 1 variables.
4. Determine the best model that includes the previous best variable and one new best variable.
5. If the adjusted R² declines, the standard error of the regression increases, the t-statistic of the best variable is insignificant, or the coefficients are theoretically inconsistent, STOP and use the previous best model.
Repeat steps 3-5 until stopped or until the model includes all the variables.
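As an illustration, here is a minimal sketch of the forward loop in Python, using plain numpy least squares and adjusted R² as the only stopping criterion (the helper names are mine; the full heuristic also watches the standard error, the t-statistics, and the signs of the coefficients):

```python
import numpy as np

def adjusted_r2(y, X):
    """Fit OLS of y on X (with an intercept) and return the adjusted R-squared."""
    n, k = X.shape
    X1 = np.column_stack([np.ones(n), X])          # add intercept column
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)  # least-squares coefficients
    resid = y - X1 @ beta
    sse = resid @ resid
    sst = ((y - y.mean()) ** 2).sum()
    r2 = 1 - sse / sst
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)    # penalize for k predictors

def forward_select(y, X):
    """Greedy forward selection: add the variable that most improves adjusted R²,
    stop when no addition improves it (a simplified version of step 5)."""
    remaining = list(range(X.shape[1]))
    chosen = []
    best_adj = -np.inf
    while remaining:
        scores = [(adjusted_r2(y, X[:, chosen + [j]]), j) for j in remaining]
        adj, j = max(scores)
        if adj <= best_adj:          # adjusted R² declined: keep the previous model
            break
        best_adj = adj
        chosen.append(j)
        remaining.remove(j)
    return chosen, best_adj
```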
The idea behind Forward Selection
- If the adjusted R² declines when an additional variable is added, then the added value of the variable does not outweigh its modeling cost.
- If the standard error increases, then the additional variable has not improved estimation.
- If the t-statistic of one of the variables is insignificant, then there may be too many variables.
- If the coefficients are inconsistent with theory, this may indicate multicollinearity effects.
The Backward Elimination Heuristic
1. Regress Y on all k potential X variables.
2. Use t-tests to determine which X is the least significant.
3. If this X does not meet some minimum level of significance, remove it from the model.
4. Regress Y on the remaining set of k − 1 X variables.
Repeat steps 2-4 until all remaining Xs meet the minimum significance level.
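A compact sketch of this loop, assuming the candidate variables are the columns of a pandas DataFrame X and the response is a Series y, and using statsmodels t-test p-values (alpha plays the role of the minimum significance level to stay):

```python
import statsmodels.api as sm

def backward_eliminate(y, X, alpha=0.10):
    """Backward elimination sketch: start with all candidate columns of X,
    repeatedly drop the least significant variable until every remaining
    variable meets the significance level alpha."""
    cols = list(X.columns)
    while cols:
        fit = sm.OLS(y, sm.add_constant(X[cols])).fit()
        pvals = fit.pvalues.drop("const")   # p-values of the X variables only
        worst = pvals.idxmax()              # least significant variable
        if pvals[worst] <= alpha:           # everyone meets the threshold: stop
            return cols, fit
        cols.remove(worst)                  # otherwise drop it and refit
    return cols, None
```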
Use Tests One at a Time
The t-tests should be used one at a time, because each t-test evaluates a variable's contribution given that all the other variables remain in the model.
• T1 can tell you to drop X1 and keep X2-X6
• T2 can tell you to drop X2 and keep X1 and X3-X6
• Together, they don't necessarily tell you to drop both and keep only X3-X6
The idea behind Backward Elimination
If a variable's t-statistic is not significant, we can remove that X and simplify the model while still maintaining the model's high R².
Typical stopping rule
Continue until all Xs meet some target "significance level to stay" (often 0.10 or 0.15, to keep more Xs).
Concordance
The forward and backward heuristics may or may not arrive at the same final model. Generally, however, the resulting models should be quite similar.
Backward elimination requires starting from a model that includes all candidate explanatory variables. But, for example, Excel will only run a regression with up to 16 variables.
Multi-collinearity
When a regression uses many variables, some of the explanatory variables may be highly correlated with other explanatory variables. In the extreme case, when two of the variables are exactly linearly related, the multiple regression fails because the coefficient estimates become unstable (they cannot be determined uniquely).
Simple indicators are: a failure of the F-test; an increase in the standard error; an insignificant t-statistic for a previously significant variable; theoretically inconsistent coefficients.
Recall also that when using a categorical variable, one of the categories must be "left out" to avoid this kind of perfect collinearity.
VIF as a measure of multi-collinearity
Variance inflation factors (VIFs) should be calculated after reaching a supposed stopping point in a multiple regression selection method.
The VIF for each independent variable is computed by regressing that independent variable against the other independent variables and then taking VIF = 1 / (1 − R²), where R² comes from that auxiliary regression.
A simple rule of thumb is that each VIF should be less than 4.
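A minimal way to compute them, sketched with plain numpy (X is the matrix of independent variables, one column per variable; the helper name is mine):

```python
import numpy as np

def vif(X):
    """Variance inflation factors: regress each column of X on the others
    (plus an intercept) and report 1 / (1 - R²) for each column."""
    n, k = X.shape
    out = []
    for j in range(k):
        others = np.delete(X, j, axis=1)
        Z = np.column_stack([np.ones(n), others])        # intercept + other predictors
        beta, *_ = np.linalg.lstsq(Z, X[:, j], rcond=None)
        resid = X[:, j] - Z @ beta
        sst = ((X[:, j] - X[:, j].mean()) ** 2).sum()
        r2 = 1 - (resid @ resid) / sst
        out.append(1.0 / (1.0 - r2))
    return out   # rule of thumb: flag any value above about 4
```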
Subsets of Variables
The forward and backward heuristics rely on adding or deleting one variable at a time.
It is, however, possible to evaluate the statistical significance of including an entire set of variables by constructing the partial F-statistic.
The "full" and "reduced" models
Suppose there are r variables in the group.
Define the full model to be the one with all the Xs (all k predictors).
Define the reduced model to be the one with the group left out (it has k − r variables).
Partial F Statistic
Look at the increase in the sum of squared errors, SSE_Reduced − SSE_Full, to see how much of the explained variation is lost.
Divide this by r, the number of variables in the group.
Put this in ratio to the MSE of the full model.
This is called the partial F statistic.
Partial F Statistic
F_Partial = [(SSE_Reduced − SSE_Full) / r] / MSE_Full
This statistic has an F distribution with r numerator and (n − k − 1) denominator degrees of freedom.
Two regression runs
[Regression output for the Full model and for the Reduced model]
The Partial F for 4 variables
H0: the coefficients of the four variables in the group are all insignificant
H1: at least one variable coefficient in the group is useful

F = [(889.042 − 765.939) / 4] / 9.456 = 30.776 / 9.456 = 3.255

The correct F distribution to test against has 4 numerator and 81 denominator degrees of freedom. The tabled value for a (4, 60) distribution is 2.53 at a significance level of .05 and 3.65 at a significance level of .01. Since 3.255 exceeds 2.53 but not 3.65, the group of four variables is jointly significant at the .05 level but not at the .01 level.
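The arithmetic above can be reproduced directly; here is a short sketch using scipy to obtain the exact critical value for (4, 81) degrees of freedom (the 2.53 and 3.65 quoted above are the nearest tabled (4, 60) values):

```python
from scipy import stats

# Numbers taken from the example above
sse_reduced, sse_full = 889.042, 765.939
r, mse_full, df_denom = 4, 9.456, 81

f_partial = ((sse_reduced - sse_full) / r) / mse_full   # = 3.255, as computed above
p_value = stats.f.sf(f_partial, r, df_denom)            # right-tail probability
crit_05 = stats.f.ppf(0.95, r, df_denom)                # exact .05 critical value for (4, 81)
print(f_partial, p_value, crit_05)
```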
Multiple Regression 4: Indicator Variables
Extensions
• Two lines, different slopes
• More than two categories
• Multicategory, multislope
Fit two lines with different slopes
Recall that using the Executive variable alone created a salary model with two lines having different intercepts.
Adding the variable Alpha Experience resulted in a model also having two lines with different intercepts.
But what if there is an interaction effect between Executive status and Alpha experience?
Create two new variables
The Executive status variable has two categories: 0 and 1.
Create two variables from Alpha experience so that:
◦ when Executive = 0, Alpha retains its value; otherwise it equals 0.
◦ when Executive = 1, Alpha retains its value; otherwise it equals 0.
Using these three variables (Executive status and the two Alpha variables) results in a model with two lines having different intercepts and different slopes, capturing a simple interaction effect between the variables.
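A minimal sketch of this construction in Python, assuming a pandas DataFrame with hypothetical column names "Executive" (0/1) and "Alpha":

```python
import pandas as pd

def add_interaction_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Split Alpha experience into two columns, one per Executive category,
    so that a single regression fits two lines with different slopes.
    Assumes columns 'Executive' (0/1) and 'Alpha'; the names are hypothetical."""
    out = df.copy()
    out["Alpha_NonExec"] = out["Alpha"] * (out["Executive"] == 0)  # Alpha when Executive = 0, else 0
    out["Alpha_Exec"] = out["Alpha"] * (out["Executive"] == 1)     # Alpha when Executive = 1, else 0
    return out

# Regressing Salary on Executive, Alpha_NonExec and Alpha_Exec then yields
# two lines with different intercepts and different slopes.
```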
Executive Status variable
Executive Status and Alpha Experience
Executive Status and Alpha Experience with Interaction