SADC Course in Statistics Inferences about the regression line (Session 03)




Learning Objectives

At the end of this session, you will be able to

• make inferences concerning the slope of the regression line
  – through the use of a t-test
  – using an analysis of variance F-test

• describe and interpret the components of an ANOVA table

• explain the meaning of $s^2$ in the analysis of variance and the importance of attention to the corresponding degrees of freedom


Smoking and death rates again!

We consider again the example used in the previous session concerning the average number of cigarettes smoked per adult in 1930 and the death rate per million in 1952 for sixteen countries.

Previously we described this relationship.

We now ask whether this relationship is a real one, or whether it could be just a chance occurrence.


Recall model estimates

---------------------------------------------------------------
deathrate |    Coef.  Std.Err.      t   P>|t|   [95% Conf.Int.]
----------+----------------------------------------------------
cigars    |    .2410     .0544   4.43   0.001    .1245    .3577
const.    |    28.31     46.92   0.60   0.556   -72.34   128.95
---------------------------------------------------------------

Estimates $\hat{\beta}_0$ and $\hat{\beta}_1$ of the unknown parameters $\beta_0$ and $\beta_1$ of the model $y = \beta_0 + \beta_1 x + \varepsilon$

Estimated equation is: $\hat{y} = 28.31 + 0.241\,x$
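As a quick check of the output, the short Python sketch below evaluates the fitted equation and rebuilds the 95% confidence interval for the slope from its estimate and standard error. The function names, the input value x = 1000 and the tabulated two-sided 5% t-value of 2.145 for 14 d.f. are illustrative assumptions that do not appear on the slide.

```python
# Fitted coefficients copied from the regression output above.
INTERCEPT = 28.31
SLOPE = 0.2410
SLOPE_SE = 0.0544

# Assumption: tabulated two-sided 5% t-value for 14 d.f. (not shown on the slide).
T_CRIT_5PCT_14DF = 2.145


def predict_death_rate(cigarettes):
    """Fitted death rate per million for a given cigarette consumption."""
    return INTERCEPT + SLOPE * cigarettes


def slope_confidence_interval():
    """95% confidence interval for the slope: estimate +/- t * s.e."""
    margin = T_CRIT_5PCT_14DF * SLOPE_SE
    return SLOPE - margin, SLOPE + margin


if __name__ == "__main__":
    print(f"Fitted death rate at x = 1000: {predict_death_rate(1000):.2f}")
    low, high = slope_confidence_interval()
    print(f"95% CI for slope: ({low:.4f}, {high:.4f})")
```

The interval agrees with the one in the output up to rounding of the displayed coefficient and standard error.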


Assessing the regression line

Is there a real relationship between y and x?

In the model $y = \beta_0 + \beta_1 x + \varepsilon$, we need to test the hypotheses:

H0: no linear relationship, i.e. slope = 0

H1: y is linearly related to x, i.e. slope ≠ 0

One approach is to use a t-test, i.e. first calculate t below.

(Same as the t-value for “cigars” in the “Recall model estimates” output)

$t = \dfrac{\text{slope} - 0}{\text{s.e.(slope)}} = \dfrac{0.241}{0.0544} = 4.43$


Interpreting results about the slope

Compare calculated t of 4.43 with tabulated t-value with 14 d.f.

The 2-sided tabulated value is 2.98 at a 1% significance level, and 4.14 at a 0.1% sig. level.

It may be concluded that there is strong evidence to reject the null hypothesis H0.

i.e. there is strong evidence of a linear relationship between smoking and death rates.

Note: In practice, only the P>|t| value in the computer output is interpreted. This is the p-value for the test.
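The comparison with tabulated values and the p-value can be reproduced with the sketch below. It uses only the Python standard library and approximates the tail area of the t-distribution by Simpson's rule; the function names and the number of integration steps are assumptions made for illustration, and statistical software would normally supply the p-value directly.

```python
import math


def t_density(x, df):
    """Probability density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)


def two_sided_p_value(t_value, df, steps=2000):
    """P(|T| > |t_value|), integrating the density over [0, |t_value|] by Simpson's rule."""
    upper = abs(t_value)
    h = upper / steps  # steps must be even for Simpson's rule
    total = t_density(0.0, df) + t_density(upper, df)
    for i in range(1, steps):
        weight = 4 if i % 2 else 2
        total += weight * t_density(i * h, df)
    central_area = 2 * (h / 3) * total  # area between -|t| and +|t|
    return 1 - central_area


# Slope, standard error and residual d.f. taken from the slides.
slope, se_slope, df = 0.241, 0.0544, 14
t_value = (slope - 0) / se_slope

if __name__ == "__main__":
    print(f"t = {t_value:.2f}")
    print(f"exceeds 1% tabulated value 2.98:   {t_value > 2.98}")
    print(f"exceeds 0.1% tabulated value 4.14: {t_value > 4.14}")
    print(f"two-sided p-value = {two_sided_p_value(t_value, df):.4f}")
```

Because the calculated t exceeds even the 0.1% tabulated value, the p-value falls below 0.001, in line with the P>|t| column of the output once rounded to three decimals.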


Another approach…

The same hypothesis as above can also be tested using an analysis of variance (ANOVA).

This involves splitting the overall variation in y into two components:

• Variation due to the regression, i.e. due to the presence of the explanatory variable x

• Balance (or residual) variation, i.e. variation that is not explained by the explanatory variable


Deviations from overall mean

[Figure: death rate (y) plotted against cigarettes smoked (x), with the overall mean (Mean=215) and a deviation from the mean marked]


Deviations from regression and residual deviation

[Figure: death rate (y) and fitted values plotted against cigarettes smoked (x), with a residual deviation and a deviation from regression marked]
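The split of each deviation from the mean into a deviation from regression and a residual deviation can be checked numerically. The sketch below uses a small made-up data set (hypothetical values, not the smoking data, which are not listed in these slides) to show that the total sum of squares equals the regression sum of squares plus the residual sum of squares.

```python
# Hypothetical toy data for illustration only; NOT the smoking data set.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]

n = len(x)
x_mean = sum(x) / n
y_mean = sum(y) / n

# Least-squares slope and intercept.
sxx = sum((xi - x_mean) ** 2 for xi in x)
sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
slope = sxy / sxx
intercept = y_mean - slope * x_mean
fitted = [intercept + slope * xi for xi in x]

# Sums of squared deviations.
ss_total = sum((yi - y_mean) ** 2 for yi in y)                   # deviations from mean
ss_regression = sum((fi - y_mean) ** 2 for fi in fitted)         # fitted value minus mean
ss_residual = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))   # observed minus fitted

if __name__ == "__main__":
    print(f"Total S.S.      = {ss_total:.2f}")
    print(f"Regression S.S. = {ss_regression:.2f}")
    print(f"Residual S.S.   = {ss_residual:.2f}")
    print(f"Regression + Residual = {ss_regression + ss_residual:.2f}")
```

For these toy values the fitted line is y = 2.2 + 0.6x, and the three sums of squares are 6.0, 3.6 and 2.4.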


Analysis of Variance (ANOVA)

Source       d.f.       S.S.       M.S.      F    Prob.
Regression      1   132934.7   132934.7   19.7   0.0006
Residual       14    94637.0     6759.8
Total          15   227571.8    15171.5

ANOVA shows breakdown of total variation into

• Variation due to regression, and

• Residual variation


Analysis of Variance (ANOVA) ctd…

Source       d.f.       S.S.       M.S.      F    Prob.
Regression      1   132934.7   132934.7   19.7   0.0006
Residual       14    94637.0     6759.8
Total          15   227571.8    15171.5

• Mean square (M.S.) = Sum of squares (S.S.) ÷ degrees of freedom (d.f.)

• Need sufficient d.f. for residual M.S. for reliable significance testing

• Regression has 1 d.f. because 1 slope is being estimated
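The arithmetic behind the M.S. and F columns can be reproduced from the d.f. and S.S. columns alone. The sketch below does this in Python; the variable and function names are illustrative, and the F ratio is taken as regression M.S. divided by residual M.S., the usual definition for this table.

```python
# Degrees of freedom and sums of squares copied from the ANOVA table above.
DF_REGRESSION, SS_REGRESSION = 1, 132934.7
DF_RESIDUAL, SS_RESIDUAL = 14, 94637.0


def mean_square(sum_of_squares, degrees_of_freedom):
    """Mean square = sum of squares divided by its degrees of freedom."""
    return sum_of_squares / degrees_of_freedom


ms_regression = mean_square(SS_REGRESSION, DF_REGRESSION)
ms_residual = mean_square(SS_RESIDUAL, DF_RESIDUAL)

# F is the ratio of the regression mean square to the residual mean square.
f_ratio = ms_regression / ms_residual

if __name__ == "__main__":
    print(f"Regression M.S. = {ms_regression:.1f}")
    print(f"Residual M.S.   = {ms_residual:.1f}")
    print(f"F               = {f_ratio:.1f}")
```

Rounded to one decimal place, the results match the M.S. and F entries of the table.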


Interpretation of the Residual Mean Square

• Residual Mean Square ($s^2$) estimates the underlying variation ($\sigma^2$) in y that is not explained by the x variable

• It is used in the calculation of standard errors of model estimates (& other estimates derived from the model)

• Hence it plays a role in determining the precision of such estimates

• For a simple linear regression model, the residual degrees of freedom = n – 2.
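To make the link between $s^2$ and standard errors concrete, the sketch below computes the residual mean square with n – 2 degrees of freedom and then the standard error of the slope as $\sqrt{s^2 / S_{xx}}$, where $S_{xx}$ is the sum of squared deviations of x from its mean. That formula is the standard result for simple linear regression and is not stated on the slide; the data are hypothetical toy values, not the smoking data, and the function names are illustrative.

```python
import math


def residual_mean_square(x, y):
    """Return (s2, residual d.f., Sxx) for a simple linear regression of y on x."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    ss_residual = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    residual_df = n - 2  # two parameters (intercept and slope) are estimated
    return ss_residual / residual_df, residual_df, sxx


def slope_standard_error(x, y):
    """Standard error of the slope: sqrt(s2 / Sxx)."""
    s2, _, sxx = residual_mean_square(x, y)
    return math.sqrt(s2 / sxx)


# Hypothetical toy data for illustration only; NOT the smoking data set.
toy_x = [1.0, 2.0, 3.0, 4.0, 5.0]
toy_y = [2.0, 4.0, 5.0, 4.0, 5.0]

if __name__ == "__main__":
    s2, df, _ = residual_mean_square(toy_x, toy_y)
    print(f"s2 = {s2:.3f} on {df} d.f.")
    print(f"s.e.(slope) = {slope_standard_error(toy_x, toy_y):.4f}")
```

A larger $s^2$ gives a larger standard error and hence a less precise slope estimate. In the smoking example, $s^2 = 6759.8$ on 16 – 2 = 14 d.f., so $s = \sqrt{6759.8} \approx 82.2$.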


Interpretation of ANOVA table

Significance test:

H0: no linear relationship between death rate and number of cigarettes smoked (slope $\beta_1 = 0$)

H1: there is a linear relationship (slope $\beta_1 \neq 0$)

• F-value of 19.7

• Compare with F-distribution with (1,14) d.f.

• Highly significant: p-value = 0.0006

Conclusion: there is strong evidence of a linear relationship between death rates and number of cigarettes smoked.


ANOVA versus t-test

In our example, ANOVA and t-test were testing the same hypothesis, so conclusions identical!

However, note that

• the ANOVA can be extended to include more than one regressor variable

• the t-test can be used to test general hypotheses concerning the slope, e.g. H0: slope=1 for testing if a new, simpler poverty index behaves similarly to a standard measure previously used.
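Both points can be illustrated with the numbers from the smoking example. The sketch below first checks that the squared t-value is close to the ANOVA F-value, as expected when a single slope is tested, and then computes a t-statistic for a non-zero hypothesised slope. The hypothesised value 0.2 is purely hypothetical and chosen only to show the calculation, and 2.145 is the tabulated two-sided 5% t-value for 14 d.f., which does not appear on the slides.

```python
# Values copied from the regression output and the ANOVA table.
slope, se_slope = 0.241, 0.0544
f_ratio = 19.7

# Assumption: tabulated two-sided 5% t-value for 14 d.f. (not on the slides).
T_CRIT_5PCT_14DF = 2.145


def t_statistic(estimate, std_error, hypothesised=0.0):
    """t = (estimate - hypothesised value) / standard error."""
    return (estimate - hypothesised) / std_error


# Test of H0: slope = 0; its square should match F up to rounding.
t_zero = t_statistic(slope, se_slope)

# Hypothetical general hypothesis H0: slope = 0.2 (illustrative value only).
t_general = t_statistic(slope, se_slope, hypothesised=0.2)

if __name__ == "__main__":
    print(f"t for H0: slope = 0   -> {t_zero:.2f}, t squared = {t_zero ** 2:.1f}, F = {f_ratio}")
    print(f"t for H0: slope = 0.2 -> {t_general:.2f}")
    print(f"reject H0: slope = 0.2 at 5%? {abs(t_general) > T_CRIT_5PCT_14DF}")
```

The small gap between the squared t-value and F comes from rounding in the displayed slope and standard error. The ANOVA F-test cannot be adapted to a non-zero hypothesised slope in this simple way, which is why the t-test is the more flexible of the two for a single slope.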


Practical work follows to ensure learning objectives are achieved…