
Page 1:

G89.2229 Multiple Regression Week 5 (Wednesday)

• Regression assumptions

• Inferences about regression

• Predicting Day 29 Anxiety

• Analysis of sets of variables: partitioning the sums of squares

• Predicting Day 29 Anger with two Day 28 support measures, after adjusting for Day 28 Mood


Page 2:

Usual OLS Regression Assumptions

• Needed for unbiased estimates
  » Model is properly specified
    • Linear model?
    • Selection characteristics included as IVs?
    • Reliable IVs?

• Needed for efficient estimates
  » Independent observations
  » Homoscedastic residuals

• Needed for inference
  » Independent residuals
  » Homoscedastic residuals
  » Normally distributed residuals
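The residual-based assumptions in the last two groups can be examined empirically. A minimal sketch, assuming simulated data and using numpy, statsmodels, and scipy (variable names are illustrative, not from the lecture data):

    import numpy as np
    import statsmodels.api as sm
    from scipy import stats

    rng = np.random.default_rng(0)
    x = rng.normal(size=100)
    y = 2.0 + 0.5 * x + rng.normal(size=100)

    fit = sm.OLS(y, sm.add_constant(x)).fit()

    # Homoscedasticity: |residual| should be unrelated to the fitted values.
    print(np.corrcoef(np.abs(fit.resid), fit.fittedvalues)[0, 1])

    # Normality of residuals: Shapiro-Wilk statistic and p-value.
    print(stats.shapiro(fit.resid))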

Page 3:

Inferences about regression

• Suppose we believe that reaction time, Y, is inversely and linearly related to the amount of cereal subjects eat.
  » Y = A + BX + e, where B < 0

• We collect data from 20 students on the grams (×10) of cereal they ate, and we measure their reaction time in identifying ambiguous stimuli.

• Suppose we obtain estimates of Â = 453 and B̂ = -0.6

• Question: Is there really evidence that X and Y are related?

» Can we reject H0: B = 0? (In this case H0 is credible!)

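A minimal sketch of this test, assuming simulated data in place of the actual cereal and reaction-time measurements (the true slope is set to zero here, so H0 holds by construction):

    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(1)
    grams = rng.uniform(0, 50, size=20)            # grams (x10) of cereal eaten
    rt = 450 + 0 * grams + rng.normal(0, 40, 20)   # reaction time; true B = 0 here

    fit = sm.OLS(rt, sm.add_constant(grams)).fit()
    print(fit.params)    # estimates of A (intercept) and B (slope)
    print(fit.pvalues)   # p-value for H0: B = 0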

Page 4:

B̂ estimates are random variables

• Even if B is truly zero, it is unlikely that B̂ will be zero
  » The least squares criterion guarantees that B̂ will fit even chance association between Y and X.
  » Especially for small samples, chance associations can be striking.

• Example of chance results:

[Table: "TEN RANDOM SETS OF Y NUMBERS ASSOCIATED WITH A FIXED X": a fixed column of X values paired with ten independently generated random Y columns (Y1-Y10).]

OLS slopes: -1.227 -0.895 -0.676 -0.509 -0.303 -0.162 -0.147 0.0224 0.1099 0.4948
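A sketch of how such a table could be generated: ten random Y vectors paired with one fixed X, each fit by OLS. The numbers will not reproduce the slide's values, which come from a different random draw:

    import numpy as np

    rng = np.random.default_rng(2)
    n = 20                                      # sample size chosen arbitrarily
    x = np.sort(rng.normal(size=n))             # fixed X
    xc = x - x.mean()

    slopes = []
    for _ in range(10):
        y = rng.normal(size=n)                  # Y unrelated to X by construction
        b = (xc @ (y - y.mean())) / (xc @ xc)   # OLS slope
        slopes.append(b)

    print(np.round(np.sort(slopes), 3))         # some slopes look "striking" by chance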

Page 5:

The compelling nature of random patterns

• Formal statistical inference methods tell us how often to expect such striking patterns by chance alone.

• Two approaches
  » Wald test (ratio of B̂ to sd_B̂)
  » ANOVA test
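For a single predictor the two approaches agree exactly, since the squared Wald t statistic equals the ANOVA F. A minimal sketch with simulated data:

    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(3)
    x = rng.normal(size=20)
    y = rng.normal(size=20)

    fit = sm.OLS(y, sm.add_constant(x)).fit()
    t = fit.tvalues[1]          # Wald statistic for the slope
    print(t ** 2, fit.fvalue)   # equal (up to rounding) for a one-predictor model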

[Figure: "One in Ten Random Patterns": scatterplot of one random Y set (Y1) against X; vertical axis Random Y (-4 to 3), horizontal axis X (-1 to 1).]

Page 6:

Multiple Regression Inference: Single variables

• Y = B0 + B1X1 + B2X2 + ...+ BqXq + e

• Formal question: What can be said about an individual coefficient, Bq, in the context of the full model (i.e., "adjusting for X1, X2, ..., Xq-1")?

» Test null hypothesis, H0: Bq= 0

» Compute 95% CI, (Lq,Uq) around Bq

» How much variance in Y does Xq account for, given that some variance is already fitted by X1, X2, ..., Xq-1?

• Example from CCWA: Does gender add to the prediction of salary when experience and productivity are included in the model?
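A sketch of this kind of single-coefficient test in statsmodels, using simulated data with illustrative variable names (not the actual CCWA salary data):

    import numpy as np
    import pandas as pd
    import statsmodels.formula.api as smf

    rng = np.random.default_rng(4)
    n = 60
    df = pd.DataFrame({
        "experience": rng.uniform(0, 20, n),
        "productivity": rng.normal(50, 10, n),
        "female": rng.integers(0, 2, n),
    })
    df["salary"] = 30 + 1.5 * df.experience + 0.4 * df.productivity + rng.normal(0, 5, n)

    fit = smf.ols("salary ~ experience + productivity + female", data=df).fit()
    print(fit.summary().tables[1])          # t test and 95% CI for each coefficient
    print(fit.conf_int().loc["female"])     # CI for the adjusted gender effect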

Page 7:

Example: Predicting Depressed mood day 29

• In the bar exam study, let's revisit the prediction of depression on day 29 as a function of depression and anxiety on day 28.

• What can we say about
  » The relation of anxiety28 to depression29 when depression28 is adjusted?
  » The residual distribution?
  » Homoscedasticity?
  » Adequacy of the linear model?
  » Alternative scaling of depression?

Page 8:

Multiple Regression Inference: Fit of whole equation

• Example: Suppose that outcome is productivity of workgroups in a corporation and X’s are characteristics of work setting, such as space/employee, ambient noise level, distance to restrooms, etc.

• Y = B0 + B1X1 + B2X2 + ...+ BqXq + e

• What can be said about the whole set of variables (i.e., X1, X2, ..., Xq) in relation to Y?
  » Test the null hypothesis, H0: B1 = B2 = ... = Bq = 0

» Alternative formulation, H0: R2=0
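For reference, this null hypothesis is tested with the overall F statistic, which can be written directly in terms of R2 (a standard form, not spelled out on the slide, that matches the MSR/MSE ratio on the next slide):

    F = \frac{R^2 / q}{(1 - R^2)/(n - q - 1)} \sim F_{q,\; n-q-1} \quad \text{under } H_0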

Page 9:

Decomposition of Regression and Residual Variance

• Step 1: Estimate regression coefficients using OLS and compute predicted (fitted) values of Y (Ŷ).

• Step 2: Estimate the Regression Sum of Squares as SSR = Σ(Ŷ - Ȳ)², with MSR = SSR/df.

• Step 3: Estimate the Residual Sum of Squares as SSE = Σê², with MSE = SSE/df.

• Under H0, MSR/MSE is distributed as central F on (q,n-q-1) df

Source        df        SS            MS
Regression    q         Σ(Ŷ - Ȳ)²     SSR/q
Residual      n-q-1     Σê²           SSE/(n-q-1)
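A sketch of the decomposition on simulated data (q = 2 predictors), computing SSR, SSE, and F = MSR/MSE by hand and checking them against the statsmodels result:

    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(5)
    n, q = 68, 2
    X = rng.normal(size=(n, q))
    y = 1.0 + X @ np.array([0.5, -0.3]) + rng.normal(size=n)

    fit = sm.OLS(y, sm.add_constant(X)).fit()
    yhat = fit.fittedvalues

    ssr = np.sum((yhat - y.mean()) ** 2)       # regression sum of squares
    sse = np.sum(fit.resid ** 2)               # residual sum of squares
    F = (ssr / q) / (sse / (n - q - 1))
    print(F, fit.fvalue)                       # the two should agree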

Page 10:

Test of Incremental R2 due to Xq

• Hierarchical Regression
  » Fit reference model with X1, X2, ..., Xq-1
    • Determine Regression Sums of Squares
    • This determines R2 of reference model
  » Fit expanded model with Xq added to reference model
    • Determine increase in Regression Sums of Squares (SSq)
      » on 1 df for single predictor Xq
    • Determines R2 increment
      » "semipartial squared correlation"

• Determine Sums of Squares & Mean Squares for residual from expanded model
  » MSE is mean square for residual, on (n-q-1) degrees of freedom
  » Under null hypothesis, H0: Bq = 0
    • MSq is simply fitted random variation
    • MSq/MSE ~ F[1, (n-q-1)]
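A sketch of the incremental test with simulated data: fit the reference and expanded models, take the change in regression sums of squares, and divide by the MSE of the expanded model (statsmodels' compare_f_test carries out the same computation):

    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(6)
    n = 68
    X = rng.normal(size=(n, 3))                       # X1, X2 reference; X3 = Xq added
    y = 0.8 * X[:, 0] + 0.4 * X[:, 2] + rng.normal(size=n)

    ref = sm.OLS(y, sm.add_constant(X[:, :2])).fit()  # reference model
    full = sm.OLS(y, sm.add_constant(X)).fit()        # expanded model

    ss_q = ref.ssr - full.ssr          # drop in residual SS = increase in regression SS
    mse = full.ssr / full.df_resid     # MSE from the expanded model
    F = (ss_q / 1) / mse
    print(F, full.compare_f_test(ref)) # compare_f_test returns (F, p, df difference)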

Page 11:

Example: Predicting Anger on Day 29 with Day 28 Measures

• Does Anger on day 28 improve the fit of Anger on day 29 after four other moods have been included in the model?

• Do two emotional support variables on day 28 improve the fit of Anger 29 after five moods have been included?

Page 12:

Numerical Results

Source                                Cum R df   Cum R SS   Incrm df   Incrm SS   Mean Sq   Cum F   Incrm F
4 Moods (ignoring anger & support)        4         9.90        4         9.90                 8.1      8.1
4 Moods + Anger (ignoring support)        5        25.14        1        15.24                16.5     50.1
5 Moods + Support                         7        25.29        2         0.15                11.9      0.2
Residual                                 60        18.27                            0.3044
Total                                    67        43.56
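As a check on the Incrm F column, each incremental F is the incremental mean square divided by the residual mean square (0.3044):

    F_{\text{Anger}}   = \frac{15.24 / 1}{0.3044} \approx 50.1 \quad \text{on } (1, 60) \text{ df}
    F_{\text{Support}} = \frac{0.15 / 2}{0.3044} \approx 0.2  \quad \text{on } (2, 60) \text{ df}

Adding Anger 28 therefore clearly improves the prediction of Anger 29, whereas the two Day 28 support measures add essentially nothing once the five moods are in the model.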