DEPARTMENT OF STATISTICS Stats Questions We Are Often Asked


Stats questions we are often asked

When can I use r and R²?

When can I make a ‘causal-type’ claim?

Why should I be careful with a media reported margin of error?

When can I say a confidence interval gives support to a claim?



r – little r – what is it?

r is the correlation coefficient between y and x

r measures the strength of a linear relationship

r is a multiple of the slope
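
As a small illustration of the last point, the sketch below computes r for some made-up data and checks that it equals the least-squares slope multiplied by the ratio of the standard deviations of x and y. The data and the helper names (`correlation`, `slope`, `sd`) are assumptions made for this example only.

```python
import math

def mean(values):
    return sum(values) / len(values)

def correlation(x, y):
    """Pearson correlation coefficient r between x and y."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def slope(x, y):
    """Least-squares slope of the straight line fitted to y versus x."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

def sd(values):
    """Sample standard deviation."""
    m = mean(values)
    return math.sqrt(sum((v - m) ** 2 for v in values) / (len(values) - 1))

# Made-up, roughly linear data (not from the slides)
x = [1, 2, 3, 4, 5, 6]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 11.9]

r = correlation(x, y)
b = slope(x, y)

# r is the slope rescaled by sd(x) / sd(y)
r_from_slope = b * sd(x) / sd(y)

print(f"r = {r:.4f}, slope = {b:.4f}, slope * sd(x)/sd(y) = {r_from_slope:.4f}")
```

The multiplier sd(x)/sd(y) is always positive, so r and the slope always have the same sign.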


r – when can it be used?

Only use r if the scatter plot is linear

Don’t use r if the scatter plot is non-linear!
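
A quick sketch of why r is misleading for a non-linear pattern: in the made-up data below, y is determined exactly by x (y = x²), yet r comes out as zero because the relationship is not linear. The data and function name are assumptions for illustration.

```python
import math

def correlation(x, y):
    """Pearson correlation coefficient r between x and y."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Made-up data: a perfect curved (quadratic) relationship
x = [-3, -2, -1, 0, 1, 2, 3]
y = [v ** 2 for v in x]

r = correlation(x, y)
print(f"r = {r:.2f}")  # r is 0 even though y depends entirely on x
```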

[Figure: scatter plot of y versus x, r = 0.99]


r – what does it tell you?

How close the points in the scatter plot come to lying on the line

[Figures: two scatter plots of y versus x, one with r = 0.99 and one with r = 0.57]


R² – big R² – what is it?

R² is the coefficient of determination

Measures how close the points in the scatter plot come to lying on the fitted line or curve

[Figures: two scatter plots of y versus x]


R² – big R² – when can it be used?

When the scatter plot of y versus x is linear or non-linear

[Figures: two scatter plots of y versus x]


R² – what does it tell you?

[Figure: scatter plot of y versus x with the fitted model, alongside dotplots of the y's and of the fitted values ŷ]

Dotplot of the y's: shows the variation in the y's.

Dotplot of the ŷ's: shows the variation in the ŷ's. This amount of variation can be explained by the model.

We see some additional variation in the y's. The excess is not explained by the model.

$$R^2 = \frac{\text{Variation in } \hat{y}\text{'s}}{\text{Variation in } y\text{'s}} = \frac{\text{Variation in fitted values}}{\text{Variation in } y \text{ values}}$$

When expressed as a percentage, R² is the percentage of the variation in Y that our regression model can explain

R² near 100%: model fits well

R² near 0%: model doesn’t fit well
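
The ratio behind R² can be sketched in a few lines. The example below assumes that "variation" is measured as the sum of squared deviations from the mean, and fits the curve y = a + b·x² by least squares to made-up data; the data, the choice of curve and the helper names are all assumptions for illustration.

```python
def mean(values):
    return sum(values) / len(values)

def variation(values):
    """Sum of squared deviations from the mean (assumed measure of variation)."""
    m = mean(values)
    return sum((v - m) ** 2 for v in values)

def fit_line(u, y):
    """Least-squares intercept and slope for y = a + b * u."""
    mu, my = mean(u), mean(y)
    b = sum((p - mu) * (q - my) for p, q in zip(u, y)) / sum((p - mu) ** 2 for p in u)
    a = my - b * mu
    return a, b

# Made-up curved data (not from the slides)
x = [0, 1, 2, 3, 4, 5]
y = [1.2, 1.9, 5.1, 9.8, 17.2, 25.9]

# Fit the curve y = a + b * x^2 by regressing y on u = x^2
u = [v ** 2 for v in x]
a, b = fit_line(u, y)
fitted = [a + b * v for v in u]

# R^2 = variation in fitted values / variation in y values
r_squared = variation(fitted) / variation(y)
print(f"R^2 = {100 * r_squared:.1f}%")
```

For a least-squares fit with an intercept, this ratio gives the same number as 1 minus (unexplained variation / variation in the y's).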



90% of the variation in Y is explained by our regression model.

[Figure: scatter plot of y versus x, R² = 90%]


R² – pearls of wisdom!

R² and r² have the same value ONLY when using a linear model

DON’T use R² to pick your model

Use your eyes!
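
The first pearl can be checked numerically. In this sketch (made-up data, hypothetical helper names), R² for a straight-line fit matches r², while R² for a curved model fitted to the same data does not.

```python
import math

def mean(values):
    return sum(values) / len(values)

def variation(values):
    m = mean(values)
    return sum((v - m) ** 2 for v in values)

def fit_line(u, y):
    """Least-squares intercept and slope for y = a + b * u."""
    mu, my = mean(u), mean(y)
    b = sum((p - mu) * (q - my) for p, q in zip(u, y)) / variation(u)
    return my - b * mu, b

def r_squared(u, y):
    """R^2 for the least-squares fit of y on u: fitted variation / total variation."""
    a, b = fit_line(u, y)
    fitted = [a + b * v for v in u]
    return variation(fitted) / variation(y)

def correlation(x, y):
    mx, my = mean(x), mean(y)
    sxy = sum((p - mx) * (q - my) for p, q in zip(x, y))
    return sxy / math.sqrt(variation(x) * variation(y))

# Made-up data following an exact curve, y = x^2
x = [0, 1, 2, 3, 4, 5]
y = [v ** 2 for v in x]

r = correlation(x, y)
r2_linear = r_squared(x, y)                    # straight-line model: y = a + b*x
r2_curve = r_squared([v ** 2 for v in x], y)   # curved model: y = a + b*x^2

print(f"r^2 = {r ** 2:.4f}, linear R^2 = {r2_linear:.4f}, curve R^2 = {r2_curve:.4f}")
```

Here the curved model has the higher R², but the slides' advice still applies: look at the scatter plot before choosing a model.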


R² and Excel & Graphics Calculators


Damaged for life by too much TV



NZ Herald (04/10/2005)

[Figure: scatter plot of Health Score versus TV watching, r = –0.93]

Causal relationship?


Causal relationships

Two general types of studies: experiments and observational studies

In an experiment, the experimenter determines which experimental units receive which treatments.




In an observational study, we simply compare units that happen to have received different levels of the factor of interest.



Only well designed and carefully executed experiments can reliably demonstrate causation.

An observational study is often useful for identifying possible causes of effects, but it cannot reliably establish causation.


Causal relationships - Summary

In observational studies, strong relationships are not necessarily causal relationships.

Correlation does not imply causation.

Be aware of the possibility of lurking variables.
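
A simulated sketch of a lurking variable: below, a hypothetical variable z drives both x and y, while x has no effect on y at all. The two still end up strongly correlated. All variable names, the seed and the noise levels are assumptions made for this illustration and are not taken from any study.

```python
import math
import random

def correlation(x, y):
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

rng = random.Random(1)  # fixed seed so the sketch is repeatable

# z is the lurking variable; x and y each depend on z plus independent noise.
# y is generated without any reference to x, so x cannot be causing y.
z = [rng.gauss(0, 1) for _ in range(500)]
x = [zi + rng.gauss(0, 0.3) for zi in z]
y = [2 * zi + rng.gauss(0, 0.3) for zi in z]

r = correlation(x, y)
print(f"r between x and y = {r:.2f}")
```

An observational study that recorded only x and y would see a strong relationship here and could not tell it apart from a causal one.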




Margin of Error

Sunday Star Times (n = 540): National 44%, Labour 37.2%, NZ First 4.7%; margin of error: 4.4%

Herald on Sunday (n = 400): Labour 42%, National 38.5%, NZ First 5.5%; margin of error: 4.9%



Confidence interval: estimate ± margin of error


Margin of Error

Survey errors: sampling error and nonsampling errors

Sampling error

caused by the act of sampling

has potential to be bigger in smaller samples

we can determine how large it can be – the margin of error

unavoidable (price of sampling)

Nonsampling errors

e.g., nonresponse bias, behavioural, . . .

can be much larger than sampling errors

impossible to correct for after completion of survey

impossible to determine how badly they affect results
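
The point that sampling error has the potential to be bigger in smaller samples can be illustrated by simulation. The sketch below draws many polls from a hypothetical population in which 40% support a party, and compares how much the poll estimates vary for small and large samples. The population value, sample sizes, number of polls and seed are all assumptions for illustration. Nonsampling errors are not simulated here.

```python
import random
import statistics

def simulate_poll(true_p, n, rng):
    """Sample proportion from one simulated poll of n respondents."""
    hits = sum(1 for _ in range(n) if rng.random() < true_p)
    return hits / n

def spread_of_estimates(true_p, n, polls, rng):
    """Standard deviation of the estimates across many simulated polls."""
    estimates = [simulate_poll(true_p, n, rng) for _ in range(polls)]
    return statistics.stdev(estimates)

rng = random.Random(2005)  # fixed seed so the sketch is repeatable
true_p = 0.40              # hypothetical population proportion

spread_small = spread_of_estimates(true_p, 100, 300, rng)
spread_large = spread_of_estimates(true_p, 1600, 300, rng)

print(f"n = 100:  spread of estimates = {spread_small:.3f}")
print(f"n = 1600: spread of estimates = {spread_large:.3f}")
```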




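Margin of Error

Approx. 95% confidence interval for p:

$$\hat{p} \pm 1.96\sqrt{\frac{p(1-p)}{n}} \;\approx\; \hat{p} \pm 1.96\sqrt{\frac{\hat{p}(1-\hat{p})}{n}} \;\approx\; \hat{p} \pm 2\sqrt{\frac{0.5 \times 0.5}{n}} \;=\; \hat{p} \pm \frac{1}{\sqrt{n}}$$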

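Margin of error (single proportion):

$$\text{margin of error} \approx \frac{1}{\sqrt{n}} \quad \text{for } 0.3 \le \hat{p} \le 0.7$$

$$\text{margin of error} < \frac{1}{\sqrt{n}} \quad \text{for } \hat{p} < 0.3 \text{ or } \hat{p} > 0.7$$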
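
A short sketch of the arithmetic, using the Herald on Sunday figures quoted earlier (n = 400, Labour 42%, NZ First 5.5%). The function names are hypothetical. The sketch compares the margin of error from the 1.96 formula with the 1/√n rule of thumb.

```python
import math

def margin_of_error(p_hat, n, z=1.96):
    """Approximate 95% margin of error for a single proportion."""
    return z * math.sqrt(p_hat * (1 - p_hat) / n)

def rule_of_thumb(n):
    """The 1 / sqrt(n) approximation, intended for p_hat between 0.3 and 0.7."""
    return 1 / math.sqrt(n)

n = 400
worst_case = margin_of_error(0.5, n)    # largest possible margin for this n
labour = margin_of_error(0.42, n)       # p_hat inside the 0.3 to 0.7 range
nz_first = margin_of_error(0.055, n)    # p_hat well below 0.3
approx = rule_of_thumb(n)

print(f"worst case (p_hat = 0.5): {100 * worst_case:.1f}%")
print(f"Labour (42%):             {100 * labour:.1f}%")
print(f"NZ First (5.5%):          {100 * nz_first:.1f}%")
print(f"1/sqrt(n):                {100 * approx:.1f}%")
```

The worst-case value reproduces the 4.9% quoted for that poll, which suggests the reported figure is the margin at p̂ = 0.5. For a small party such as NZ First the margin of error from the formula is considerably smaller.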




Bank Dissatisfaction Scores – 95% CIs

C – A: 0.5 to 20.7

A – W: –9.8 to 6.6



With 95% confidence, the mean dissatisfaction score for Canterbury customers is somewhere between 0.5 and 20.7 larger than the mean dissatisfaction score for Auckland customers.




With 95% confidence, the mean dissatisfaction score for Auckland customers is somewhere between 9.8 less than and 6.6 greater than the mean dissatisfaction score for Wellington customers.






A – W: –9.8 to 6.6

Does this confidence interval support the proposition that there is a difference between the two population means?

Supports A – W ≠ 0?

No, it doesn’t support the proposition.

Since 0 is in the confidence interval, 0 is a believable value for the difference. There could be no difference between the two means.

A – W = 0 (no diff)

A – W ≠ 0 (a diff)



A – W: –9.8 to 6.6

Does this confidence interval support the proposition that there is NO difference between the two population means?

Supports A – W = 0?

No, it doesn’t support the proposition.

Since there are non-zero numbers in the interval, A – W could be non-zero: there could be a difference.

A – W ≠ 0 (a diff)



C – A: 0.5 to 20.7

Does this confidence interval support the proposition that there is a difference between the two population means?

Supports C – A ≠ 0?

Yes, it does support the proposition.

Since zero is not in the interval, it is not believable that the difference is zero. No difference between the means is not believable.

C – A ≠ 0 (a diff)



C – A: 0.5 to 20.7

Does this confidence interval support the proposition that there is NO difference between the two population means?

Supports C – A = 0?

No, it doesn’t support the proposition.

In fact, it provides evidence against it. 0 is not in the interval. No difference between the means is not believable.

C – A ≠ 0 (a diff)
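
The reasoning on these slides reduces to checking whether 0 lies inside the interval. The sketch below applies that check to the two bank intervals; the function name and the wording of the returned verdicts are assumptions for illustration.

```python
def interpret_difference_ci(lower, upper):
    """Summarise what a CI for a difference between two means supports."""
    if lower <= 0 <= upper:
        # 0 is believable, but so are non-zero values: neither claim is supported
        return "supports neither 'a difference' nor 'no difference'"
    # 0 is not believable, so the interval supports the claim of a difference
    return "supports 'a difference'"

c_minus_a = (0.5, 20.7)   # Canterbury minus Auckland
a_minus_w = (-9.8, 6.6)   # Auckland minus Wellington

verdict_ca = interpret_difference_ci(*c_minus_a)
verdict_aw = interpret_difference_ci(*a_minus_w)

print("C - A:", verdict_ca)
print("A - W:", verdict_aw)
```

Note that no branch returns support for "no difference": an interval that contains 0 also contains non-zero values, so it cannot support that claim.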
