G89.2229 Lect 5W
• Regression assumptions
• Inferences about regression
• Predicting Day 29 Anxiety
• Analysis of sets of variables: partitioning the sums of squares
• Predicting Day 29 Anger with two Day 28 support measures, after adjusting for Day 28 Mood
G89.2229 Multiple Regression Week 5 (Wednesday)
Usual OLS Regression Assumptions
• Needed for unbiased estimates
  » Model is properly specified
    • Linear model?
    • Selection characteristics included as IVs?
    • Reliable IVs?
• Needed for efficient estimates
  » Independent observations
  » Homoscedastic residuals
• Needed for inference
  » Independent residuals
  » Homoscedastic residuals
  » Normally distributed residuals
Inferences about regression
• Suppose we believe that reaction time, Y, is inversely and linearly related to the amount of cereal subjects eat.
  » Y = A + BX + e, where B < 0
• We collect data from 20 students on the grams (x10) of cereal they ate, and we measure their reaction time in identifying ambiguous stimuli.
• Suppose we obtain estimates of Â = 453 and B̂ = -0.6.
• Question: Is there really evidence that X and Y are related?
  » Can we reject H0: B = 0? (In this case H0 is credible!)
B estimates are random variables
• Even if B is truly zero, it is unlikely that B̂ will be exactly zero.
  » The least squares criterion guarantees that B̂ will fit even chance association between Y and X.
  » Especially for small samples, chance associations can be striking.
• Example of chance results:
TEN RANDOM SETS OF Y NUMBERS ASSOCIATED WITH A FIXED X

            X       Y1       Y2       Y3       Y4       Y5       Y6       Y7       Y8       Y9      Y10
      -0.9051   -0.542   1.5483   -0.097    -0.19   -0.588   -0.468   0.9165   1.5759   -0.873   -1.609
       -0.751   0.3759   0.6454   2.7952    0.469   0.7946   -0.364   0.3553   -0.739   -0.928   -1.176
      -0.7463   -0.473   1.4903   0.4562    -0.36   0.5954   0.4532   0.8936   0.8749   0.1414   -0.552
       -0.486     -0.2    0.139   -1.277   0.0163   0.4325   0.4834   -0.217   0.7097   1.1672   0.0017
      -0.4719    1.848   1.3726   0.6994   -1.035   -0.125   -2.007   -1.878   -1.746   -0.545   1.1778
      -0.3786    -0.52   1.4496   1.2535   -0.593   0.7131   -0.071   1.3424   -1.695   1.6091   -0.017
      -0.3161   1.3836   0.2476   -0.051   0.4506   0.2617   2.5619     -1.3   -1.554   0.1389   -1.344
      -0.3044   1.6534   -2.135     0.26   0.8263   -1.075     -1.7   -0.952    0.656    1.425   -0.783
      -0.2244   -0.982   -0.647   -2.371   1.7112   1.0909   -0.249   0.4269   0.8333   -0.364   1.4492
      -0.1795   -0.441   0.4023   0.0411   -0.441    1.761    -0.57   -0.335   -1.603    -1.75   1.3185
      -0.1444   1.5766   -0.824   0.3415    -0.45   1.1703   -1.566   0.0787   -0.502   -0.316   0.1037
       -0.016    0.876   0.4681   1.3639   1.1135   1.8292   -1.536    0.008    0.071    -0.26   -0.619
      -0.0103   -1.228   -0.204   1.0044   -0.229   -0.919   0.9972   1.0638   -0.823   -1.495   -0.144
      0.00974   -0.909   -0.377    -0.39   -1.071   -1.536   0.4459   0.6276   0.3031   0.3773   0.4183
       0.0373    -0.52   -0.185   -1.028   0.7013   0.5918   -0.525   1.4214   -2.273   1.9763    0.183
      0.27863   -3.006   0.8735   -0.416   0.0158    -1.02   0.1071   0.9589    0.571   0.3113   -0.374
      0.27863    -0.43   -0.508   -0.871   -0.395   -1.006    -0.22   -0.144    0.849   -0.147   -2.193
      0.28961   0.0259    0.532   -0.503   -0.682   -0.474   1.0917   0.0648   0.1892   -1.744   -1.463
      0.30179   -2.106   0.7699   -0.758   -0.673   -0.006   -2.402   -0.458     -0.5   1.1049   -0.777
      0.82075   -1.436   -0.965   0.4483   -1.416   0.7005   -0.543   -0.494   0.5125   0.0249   1.8384
SD    0.42417   1.2668   0.9555   1.1128   0.7874   0.9757    1.179   0.8795   1.1049   1.0753     1.09

OLS slopes: -1.227 -0.895 -0.676 -0.509 -0.303 -0.162 -0.147 0.0224 0.1099 0.4948
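The point of this table can be reproduced with a short simulation. The sketch below is illustrative only: the X values, the seed, and the function name `chance_slopes` are assumptions and are not the numbers used on the slide. It draws ten sets of pure-noise Y values, regresses each on the same fixed X, and returns the sorted OLS slopes.

```python
import numpy as np

def chance_slopes(x, n_sets=10, seed=0):
    """Sorted OLS slopes of n_sets pure-noise Y vectors regressed on a fixed x."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    # Y is unrelated to x by construction, so the true slope B is zero.
    y = rng.standard_normal((x.size, n_sets))
    xc = x - x.mean()
    # Simple-regression slope: sum((x - xbar) * y) / sum((x - xbar)^2)
    return np.sort(xc @ y / (xc @ xc))

# Hypothetical fixed X with 20 observations (not the X column from the slide).
x = np.random.default_rng(1).normal(0.0, 0.4, size=20)
slopes = chance_slopes(x)
print(np.round(slopes, 3))
```

None of the ten estimated slopes will be exactly zero even though B = 0 for every set, and with only 20 observations the most extreme one can look like a real effect.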
The compelling nature of Random patterns
• Formal statistical inference methods tell us how often to expect such striking patterns by chance alone.
• Two approaches
  » Wald test (ratio of B̂ to sd(B̂))
  » ANOVA test
[Figure: "One in Ten Random Patterns". Scatterplot of Random Y (series Y1) against X.]
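A minimal sketch of the Wald test for a single slope follows. The data are hypothetical (simulated with a true slope of zero, loosely echoing the cereal example), and the function name `wald_t` is an assumption. The statistic is the ratio of B̂ to its estimated standard deviation, referred to a t distribution on n-2 df.

```python
import numpy as np

def wald_t(x, y):
    """Return (slope, standard error of slope, Wald t ratio) for y = A + B*x + e."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = x.size
    xc = x - x.mean()
    b = xc @ y / (xc @ xc)               # OLS slope
    a = y.mean() - b * x.mean()          # OLS intercept
    resid = y - (a + b * x)
    mse = resid @ resid / (n - 2)        # residual variance estimate on n-2 df
    se_b = np.sqrt(mse / (xc @ xc))      # estimated sd of the slope
    return b, se_b, b / se_b

# Hypothetical data: 20 subjects, reaction time unrelated to cereal eaten.
rng = np.random.default_rng(0)
x = rng.uniform(0, 50, size=20)
y = 453 + rng.normal(0, 40, size=20)
b, se_b, t = wald_t(x, y)
# With n = 20 the test has 18 df; reject H0: B = 0 at the two-sided 5% level if |t| > 2.101.
print(round(b, 3), round(se_b, 3), round(t, 3))
```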
Multiple Regression Inference: Single variables
• Y = B0 + B1X1 + B2X2 + ...+ BqXq + e
• Formal question: What can be said about an individual coefficient, Bq, in the context of the full model (i.e., "adjusting for X1, X2, ..., Xq-1")?
  » Test the null hypothesis, H0: Bq = 0
  » Compute a 95% CI, (Lq, Uq), around Bq
  » How much variance in Y does Xq account for, given that some variance is already fitted by X1, X2, ..., Xq-1?
• Example from CCWA: Does gender add to the prediction of salary when experience and productivity are included in the model?
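The test and confidence interval for a single coefficient in the full model can be sketched as below. This is one possible implementation using NumPy and SciPy, not the procedure used in the course; the data are simulated and the function name `coef_inference` is hypothetical.

```python
import numpy as np
from scipy import stats

def coef_inference(X, y, alpha=0.05):
    """t tests and (1 - alpha) CIs for every coefficient in y = B0 + X @ B + e."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, q = X.shape
    D = np.column_stack([np.ones(n), X])            # design matrix with intercept
    b, *_ = np.linalg.lstsq(D, y, rcond=None)       # OLS estimates
    resid = y - D @ b
    df = n - q - 1
    mse = resid @ resid / df
    se = np.sqrt(mse * np.diag(np.linalg.inv(D.T @ D)))
    t = b / se                                      # each tests H0: coefficient = 0
    p = 2 * stats.t.sf(np.abs(t), df)               # two-sided p values
    half = stats.t.ppf(1 - alpha / 2, df) * se
    return b, se, t, p, np.column_stack([b - half, b + half])

# Hypothetical data: three predictors, only the first truly related to y.
rng = np.random.default_rng(0)
X = rng.normal(size=(68, 3))
y = 1.0 + 0.5 * X[:, 0] + rng.normal(size=68)
b, se, t, p, ci = coef_inference(X, y)
print(np.round(ci, 3))   # rows: intercept, X1, X2, X3; columns: lower, upper
```

Each row of `ci` is the interval (L, U) for one coefficient, adjusted for all other predictors in the model.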
Example: Predicting Depressed mood day 29
• In the bar exam study, let's revisit the prediction of depression on day 29 as a function of depression and anxiety on day 28.
• What can we say about
  » The relation of anxiety28 to depression29 when depression28 is adjusted?
  » The residual distribution?
  » Homoscedasticity?
  » Adequacy of the linear model?
  » Alternative scaling of depression?
Multiple Regression Inference: Fit of whole equation
• Example: Suppose that the outcome is productivity of workgroups in a corporation and the X's are characteristics of the work setting, such as space/employee, ambient noise level, distance to restrooms, etc.
• Y = B0 + B1X1 + B2X2 + ... + BqXq + e
• What can be said about the whole set of variables (i.e., X1, X2, ..., Xq) in relation to Y?
  » Test the null hypothesis, H0: B1 = B2 = ... = Bq = 0
  » Alternative formulation, H0: R² = 0
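The two formulations of the null hypothesis lead to the same test, which can be seen by writing the overall F statistic in terms of R². This is a sketch using the standard identities R² = SSR/SST and SST = SSR + SSE, where SSR and SSE are the regression and residual sums of squares and n (assumed here) is the number of observations:

```latex
F = \frac{SSR/q}{SSE/(n-q-1)} = \frac{R^2/q}{(1-R^2)/(n-q-1)}
```

F is zero exactly when R² is zero and grows as R² grows, so testing whether all q slopes are zero is the same as testing whether the population R² is zero.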
Decomposition of Regression and Residual Variance
• Step 1: Estimate regression coefficients using OLS and compute predicted (fitted) values of Y (Ŷ).
• Step 2: Estimate Regression Sums of Squares as SSR = Σ(Ŷ - Ȳ)², MSR = SSR/df
• Step 3: Estimate Residual Sums of Squares as SSE = Σê², MSE = SSE/df
• Under H0, MSR/MSE is distributed as central F on (q, n-q-1) df

Source       df       SS            MS
Regression   q        Σ(Ŷ - Ȳ)²     SSR/q
Residual     n-q-1    Σê²           SSE/(n-q-1)
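The three steps above can be sketched in a few lines of NumPy. The data are simulated and the function name `anova_table` is an assumption made for illustration.

```python
import numpy as np

def anova_table(X, y):
    """Regression/residual sums of squares, mean squares, and overall F."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, q = X.shape
    D = np.column_stack([np.ones(n), X])        # design matrix with intercept
    b, *_ = np.linalg.lstsq(D, y, rcond=None)   # Step 1: OLS coefficients
    y_hat = D @ b                               #         and fitted values
    ssr = np.sum((y_hat - y.mean()) ** 2)       # Step 2: regression SS
    sse = np.sum((y - y_hat) ** 2)              # Step 3: residual SS
    msr, mse = ssr / q, sse / (n - q - 1)
    return {"SSR": ssr, "SSE": sse, "MSR": msr, "MSE": mse,
            "F": msr / mse, "df": (q, n - q - 1)}

# Hypothetical data: n = 50 observations, q = 3 predictors.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
y = 2.0 + X @ np.array([0.8, 0.0, -0.4]) + rng.normal(size=50)
tab = anova_table(X, y)
print(tab["df"], round(tab["F"], 2))
```

Because the model includes an intercept, SSR and SSE add up to the total sum of squares of Y around its mean.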
Test of Incremental R2 due to Xq
• Hierarchical Regression
  » Fit reference model with X1, X2, ..., Xq-1
    • Determine Regression Sums of Squares
    • This determines R² of reference model
  » Fit expanded model with Xq added to reference model
    • Determine increase in Regression Sums of Squares (SSq)
      » on 1 df for single predictor Xq
    • Determines R² increment
      » "semipartial squared correlation"
    • Determine Sums of Squares & Mean Squares for residual from expanded model
      » MSE is mean square for residual
      » on (n-q-1) degrees of freedom
  » Under null hypothesis, H0: Bq = 0
    • MSq is simply fitted random variation
    • MSq/MSE ~ F[1, (n-q-1)]
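A minimal sketch of the hierarchical comparison follows, assuming simulated data and the hypothetical helper names `sse` and `incremental_f`. It fits the reference and expanded models, takes the drop in residual SS (which equals the increase in regression SS), and divides the resulting mean square by the MSE of the expanded model. It is written for a set of k added predictors, with k = 1 giving the single-variable case.

```python
import numpy as np

def sse(X, y):
    """Residual sum of squares from an OLS fit of y on X plus an intercept."""
    D = np.column_stack([np.ones(len(y)), X])
    b, *_ = np.linalg.lstsq(D, y, rcond=None)
    r = y - D @ b
    return r @ r

def incremental_f(X_ref, X_add, y):
    """F test for the predictors in X_add over and above those in X_ref."""
    X_ref = np.asarray(X_ref, dtype=float)
    y = np.asarray(y, dtype=float)
    X_full = np.column_stack([X_ref, X_add])
    n, q = X_full.shape
    k = q - X_ref.shape[1]                      # number of added predictors
    sse_full = sse(X_full, y)
    ss_inc = sse(X_ref, y) - sse_full           # increase in regression SS
    mse = sse_full / (n - q - 1)                # residual MS of expanded model
    sst = np.sum((y - y.mean()) ** 2)
    return {"F": (ss_inc / k) / mse, "df": (k, n - q - 1),
            "R2_increment": ss_inc / sst}

# Hypothetical data: two reference predictors and one added predictor.
rng = np.random.default_rng(0)
X_ref = rng.normal(size=(68, 2))
x_new = rng.normal(size=68)
y = 0.5 * X_ref[:, 0] + 0.3 * x_new + rng.normal(size=68)
res = incremental_f(X_ref, x_new, y)
print(res["df"], round(res["F"], 2), round(res["R2_increment"], 3))
```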
Example: Predicting Anger on Day 29 with Day 28 Measures
• Does Anger on day 28 improve the fit of Anger on day 29 after four other moods have been included in the model?
• Do two emotional support variables on day 28 improve the fit of Anger 29 after five moods have been included?
Numerical Results
Source                                Cum R df   Cum R SS   Incrm df   Incrm SS   Mean Sq   Cum F   Incrm F
4 Moods (ignoring anger & support)           4       9.90          4       9.90               8.1       8.1
4 Moods + Anger (ignoring support)           5      25.14          1      15.24              16.5      50.1
5 Moods + Support                            7      25.29          2       0.15              11.9       0.2
Residual                                    60      18.27                            0.3044
Total                                       67      43.56
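The F ratios in this table can be checked from its own entries. The sketch below uses only the printed values: each cumulative F is (Cum SS / Cum df) / MSE and each incremental F is (Incrm SS / Incrm df) / MSE, with MSE taken as the printed residual mean square. The cumulative R² column it prints is not in the table; it is derived here as Cum SS / Total SS.

```python
sst = 43.56      # total sum of squares (67 df)
mse = 0.3044     # residual mean square as printed (18.27 / 60 from the rounded SS gives 0.3045)

# (label, cumulative df, cumulative SS, incremental df, incremental SS)
steps = [
    ("4 Moods",           4,  9.90, 4,  9.90),
    ("4 Moods + Anger",   5, 25.14, 1, 15.24),
    ("5 Moods + Support", 7, 25.29, 2,  0.15),
]

results = {}
for label, cum_df, cum_ss, inc_df, inc_ss in steps:
    cum_f = (cum_ss / cum_df) / mse      # test of all predictors entered so far
    inc_f = (inc_ss / inc_df) / mse      # test of the set added at this step
    r2 = cum_ss / sst                    # cumulative R squared (derived)
    results[label] = (round(cum_f, 1), round(inc_f, 1), round(r2, 3))
    print(f"{label:20s} Cum F = {cum_f:5.1f}  Incrm F = {inc_f:5.1f}  Cum R2 = {r2:.3f}")
```

The recomputed values agree with the table: Day 28 anger adds a large increment on 1 df, while the two support variables add almost nothing on 2 df.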