7.11
Using Statistics To Make Inferences 7
Summary
Single sample test of variance.
Comparison of two variances.
Tuesday 18 April 2023 09:31 PM
7.22
Goals
To perform and interpret χ² and F tests.
These tests are not available individually within SPSS, but embedded within more complex procedures.
Practical
Revert to the data from practical 5 and employ the Mann-Whitney test where you previously employed a t test.
Perform a t test on reading ability data.
Chi squared
7.33
Recall
In lecture 4 we compared the mean of two samples using an appropriate t test.
What assumption did we make about the variances of the two samples?
It was assumed that the two sample variances were effectively equal.
7.44
Recall
What notation do we employ for a population mean and for a sample mean?
Population mean μ
Sample mean x̄
Typically assessed with a t test
7.55
Recall
What notation do we employ for a population variance and for a sample variance?
Population variance σ²
Sample variance s²
Typically assessed with a χ² or F test
7.66
Examination of the Variance
Equality of means does not imply equality of variances.
It is often important to control (minimise) the variance.
7.77
Examination of the Variance
How do we compare a sample variance against the expected population value?
7.88
Single Sample Variance Test
The null hypothesis is that a population standard deviation is equal to a particular value σ. Assuming that the data are normally distributed, a sample of size n is obtained from the population and a standard deviation, s, calculated. The test statistic is
\[ \chi^2_{calc} = \frac{(n-1)\,s^2}{\sigma^2} \]
7.99
Conclusion
This statistic follows a chi-squared distribution with ν = n − 1 degrees of freedom. At significance level α it is compared with the tabulated value χ²ν(α).
7.1010
Example
From past records, students' marks have a standard deviation of 10. A group of 20 students is taught by a new method; the standard deviation of their marks is 7.6. Is the group significantly more, or less, variable?
H0 is that σ = 10
H1 is that σ ≠ 10
7.1111
Conclusion
n = 20, s = 7.6, σ = 10, ν = n − 1 = 19
\[ \chi^2_{calc} = \frac{(n-1)\,s^2}{\sigma^2} = \frac{(20-1)\times 7.6^2}{10^2} = 10.97 \]
7.1212
Use of Tables
ν     p=0.1    p=0.05   p=0.025  p=0.01   p=0.005  p=0.002
19    27.204   30.144   32.852   36.191   38.582   41.610

ν     p=0.9    p=0.95   p=0.975  p=0.99   p=0.995  p=0.998
19    11.651   10.117   8.907    7.633    6.844    5.970

χ²₁₉(0.05) = 30.14 and χ²₁₉(0.95) = 10.12
χ²calc = 10.97, ν = 19
Here the 5% and 95% values are 30.14 and 10.12. Since 10.12 < 10.97 < 30.14, the value is not significant at the 10% level (two-tail test), so the null hypothesis is not rejected: there is no evidence that the new method affects the variability of the marks.
For the 5% level (2.5% and 97.5%) the corresponding values are 32.85 and 8.91, and the conclusion is unchanged.
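The arithmetic and the table comparison for this example can be checked with a short Python sketch. The function name `chi2_variance_stat` is hypothetical (it does not come from the lecture), and the critical values are copied from the ν = 19 table rows rather than computed.

```python
def chi2_variance_stat(n, s, sigma0):
    """Single-sample variance test statistic (n - 1) * s^2 / sigma0^2."""
    return (n - 1) * s**2 / sigma0**2

# Marks example: n = 20, s = 7.6, hypothesised sigma = 10.
chi2_calc = chi2_variance_stat(20, 7.6, 10)

# Critical values copied from the nu = 19 rows of the slide tables.
lower_95, upper_05 = 10.117, 30.144    # two-tail test at the 10% level
lower_975, upper_025 = 8.907, 32.852   # two-tail test at the 5% level

# Significant only if the statistic falls outside the tabulated pair.
significant_10 = not (lower_95 < chi2_calc < upper_05)
significant_05 = not (lower_975 < chi2_calc < upper_025)

print(f"chi2_calc = {chi2_calc:.2f}")
print("significant at 10% (two-tail):", significant_10)
print("significant at 5% (two-tail):", significant_05)
```

Both flags come out false here, matching the conclusion that the null hypothesis is not rejected.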
7.1313
Example
The following data are obtained.
36.2 38.1 35.3 34.8 39.6 39.3
31.4 34.6 40.2 32.2 35.2 37.2
Past experiments suggest that the standard deviation is never more than 2.
H0 is that σ = 2
H1 is that σ > 2 (a one-sided test)
7.1414
Calculation
You might find the following sums useful: Σx = 434.1 and Σx² = 15790.71
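\[ s^2 = \frac{1}{n-1}\left(\sum x^2 - \frac{1}{n}\Bigl(\sum x\Bigr)^2\right) = \frac{1}{12-1}\left(15790.71 - \frac{1}{12}\times 434.1^2\right) = 7.922 \]
\[ s = 2.815 \]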
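As a cross-check on the sums and on s, the calculation can be sketched in Python using only the standard library. The variable names are illustrative; the data are the twelve observations listed in the example.

```python
import math
import statistics

data = [36.2, 38.1, 35.3, 34.8, 39.6, 39.3,
        31.4, 34.6, 40.2, 32.2, 35.2, 37.2]

n = len(data)
sum_x = sum(data)                    # sum of the observations
sum_x2 = sum(x * x for x in data)    # sum of the squared observations

# Computational formula: s^2 = (sum x^2 - (sum x)^2 / n) / (n - 1)
s_squared = (sum_x2 - sum_x**2 / n) / (n - 1)
s = math.sqrt(s_squared)

# Independent check with the library's sample standard deviation.
s_check = statistics.stdev(data)

print(f"n = {n}, sum x = {sum_x:.1f}, sum x^2 = {sum_x2:.2f}")
print(f"s^2 = {s_squared:.3f}, s = {s:.3f}")
```

`statistics.stdev` uses the same n − 1 divisor, so the two values of s should agree to rounding error.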
7.1616
Calculation
n = 12, s = 2.815 (direct calculation), σ = 2, ν = n − 1 = 11
\[ \chi^2_{calc} = \frac{(n-1)\,s^2}{\sigma^2} = \frac{(12-1)\times 2.815^2}{2^2} = 21.79 \]
ν     p=0.1    p=0.05   p=0.025  p=0.01   p=0.005  p=0.002
11    17.275   19.675   21.920   24.725   26.757   29.354

χ²₁₁(0.05) = 19.68
Since 19.68 < 21.79, the result is significant at the 5% level (one-tail test). The null hypothesis is rejected: the variability is significantly higher than the value of 2 suggested by past experiments.
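A minimal Python sketch of this one-sided decision follows. The critical values are copied from the ν = 11 table row, and the variable names are illustrative only.

```python
# One-sided single-sample variance test for the second example.
n, s, sigma0 = 12, 2.815, 2
chi2_calc = (n - 1) * s**2 / sigma0**2

# Upper-tail critical values copied from the nu = 11 table row.
crit_05 = 19.675    # p = 0.05
crit_025 = 21.920   # p = 0.025

# H1 is sigma > 2, so only the upper tail matters.
reject_at_5_percent = chi2_calc > crit_05
reject_at_2_5_percent = chi2_calc > crit_025

print(f"chi2_calc = {chi2_calc:.2f}")
print("reject H0 at 5%:", reject_at_5_percent)
print("reject H0 at 2.5%:", reject_at_2_5_percent)
```

The statistic lies between the 5% and 2.5% table values, so the rejection holds at the 5% level but would not at the 2.5% level.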
7.1717
Confidence Interval
A confidence interval for the variance, with confidence level 1 − α, is
\[ \frac{(n-1)\,s^2}{\chi^2_{n-1}(\alpha/2)} < \sigma^2 < \frac{(n-1)\,s^2}{\chi^2_{n-1}(1-\alpha/2)} \]
7.1818
Confidence Interval
For example, if n = 31 and s = 27.63, our degrees of freedom are n − 1 = 30, so that if the confidence level is 95% (α = 0.05), we look up
\[ \chi^2_{n-1}(\alpha/2) = \chi^2_{30}(0.025) = 46.979 \]
ν     P=0.1    P=0.05   P=0.025  P=0.01   P=0.005  P=0.002
30    40.256   43.773   46.979   50.892   53.672   57.167
7.1919
Confidence Interval
For the same example (n = 31, s = 27.63, ν = 30, α = 0.05), we also look up
\[ \chi^2_{n-1}(1-\alpha/2) = \chi^2_{30}(0.975) = 16.791 \]
ν     P=0.9    P=0.95   P=0.975  P=0.99   P=0.995  P=0.998
30    20.599   18.493   16.791   14.953   13.787   12.461
7.2020
Confidence Interval
Recall n = 31 and s = 27.63, with
\[ \chi^2_{30}(0.025) = 46.979, \qquad \chi^2_{30}(0.975) = 16.791 \]
\[ \frac{(31-1)\times 27.63^2}{46.979} < \sigma^2 < \frac{(31-1)\times 27.63^2}{16.791} \]
\[ 487.51 < \sigma^2 < 1363.98 \]
7.2121
Confidence Interval
So we can be 95% certain that σ lies in the interval [22.08, 36.93] (on taking the square root).
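The interval can be reproduced with a few lines of Python. This is a sketch only: `variance_ci` is a hypothetical helper name, and the two chi-squared values are the tabulated ν = 30 entries rather than computed quantiles.

```python
import math

def variance_ci(n, s, chi2_upper, chi2_lower):
    """Return (low, high) bounds for sigma^2.

    chi2_upper is the tabulated value at alpha/2 (the larger number),
    chi2_lower the tabulated value at 1 - alpha/2 (the smaller number).
    """
    scaled = (n - 1) * s**2
    return scaled / chi2_upper, scaled / chi2_lower

# n = 31, s = 27.63, table values for nu = 30 at 0.025 and 0.975.
var_low, var_high = variance_ci(31, 27.63, 46.979, 16.791)

# Square roots give the interval for sigma itself.
sd_low, sd_high = math.sqrt(var_low), math.sqrt(var_high)

print(f"{var_low:.2f} < sigma^2 < {var_high:.2f}")
print(f"{sd_low:.2f} < sigma < {sd_high:.2f}")
```

Note that the larger table value produces the lower bound, because it appears in the denominator.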
7.2222
Aside
Consider Boys and Girls and the desire to predict gender based on some simple test. Assume that 50% of births are Boys so that Prob(Boy) = Prob(Girl) = ½.
A simple, inexpensive, non-invasive gender testing procedure indicates that it is "perfect" for boys, Prob(Test Boy|Boy) = 1, implying Prob(Test Girl|Boy) = 0. Unfortunately, this simple gender testing procedure for girls is a "coin toss," Prob(Test Girl|Girl) = Prob(Test Boy|Girl) = ½.
By evaluating Prob(Boy|Test Boy) and Prob(Girl|Test Girl) assess which is the most likely.
What approach is appropriate (simplest)?
7.2525
Solution
[Tree diagram]
What is the grand total of the probabilities?
7.2626
Prob(girl|test says girl)
Test says girl
7.2727
Prob(boy|test says boy)
Test says boy
7.2828
Solution
The tree diagram or application of Bayes' theorem yields what seems to be a strange inversion, Prob(Boy|Test Boy) = ⅔ and Prob(Girl|Test Girl) = 1.
That is, somehow, "perfection" switched from Boy to Girl. The test itself was perfect in "confirming" that a Boy was a Boy and had a 50% error rate in confirming that a Girl was a Girl.
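For reference, the Bayes' theorem route can be written out as a worked equation. This uses only the probabilities stated in the aside; the abbreviations TB and TG (for "Test Boy" and "Test Girl") are shorthand introduced here.

```latex
\[
P(\text{Boy}\mid TB)
  = \frac{P(TB\mid \text{Boy})\,P(\text{Boy})}
         {P(TB\mid \text{Boy})\,P(\text{Boy}) + P(TB\mid \text{Girl})\,P(\text{Girl})}
  = \frac{1\times\tfrac12}{1\times\tfrac12 + \tfrac12\times\tfrac12}
  = \frac{2}{3}
\]
\[
P(\text{Girl}\mid TG)
  = \frac{P(TG\mid \text{Girl})\,P(\text{Girl})}
         {P(TG\mid \text{Girl})\,P(\text{Girl}) + P(TG\mid \text{Boy})\,P(\text{Boy})}
  = \frac{\tfrac12\times\tfrac12}{\tfrac12\times\tfrac12 + 0\times\tfrac12}
  = 1
\]
```

The four joint probabilities in the denominators (½, ¼, ¼ and 0) sum to 1, which answers the "grand total" question.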
7.2929
Alternate Approach
What if we tested 100 boys and 100 girls?
Complete the following table.
7.3030
Alternate Approach
         Test says boy   Test says girl   Total
Boy                                       100
Girl                                      100
Total                                     200
Complete the table
Prob(Test Boy|Boy) = 1
Prob(Test Girl|Girl) = Prob(Test Boy|Girl) = ½.
7.3131
Alternate Approach
         Test says boy   Test says girl   Total
Boy      100             0                100
Girl     50              50               100
Total    150             50               200
7.3232
Prob(girl|test says girl)
         Test says boy   Test says girl   Total
Boy      100             0                100
Girl     50              50               100
Total    150             50               200
Prob(Girl|Test says girl) = 50/50 = 1
7.3333
Prob(boy|test says boy)
         Test says boy   Test says girl   Total
Boy      100             0                100
Girl     50              50               100
Total    150             50               200
Prob(boy|test says boy) = 100/150 = ⅔
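The counting argument can also be sketched in Python, building the same 100 boys and 100 girls table and reading the two conditional probabilities from its columns. The dictionary layout and names are illustrative choices, not part of the lecture.

```python
from fractions import Fraction

boys = girls = 100

# Test behaviour from the slides: always right for boys, a coin toss for girls.
table = {
    ("Boy", "says boy"): boys,
    ("Boy", "says girl"): 0,
    ("Girl", "says boy"): girls // 2,
    ("Girl", "says girl"): girls // 2,
}

# Column totals: how often the test gives each answer.
says_boy = table[("Boy", "says boy")] + table[("Girl", "says boy")]
says_girl = table[("Boy", "says girl")] + table[("Girl", "says girl")]

# Conditional probabilities are read down each column.
p_boy_given_says_boy = Fraction(table[("Boy", "says boy")], says_boy)
p_girl_given_says_girl = Fraction(table[("Girl", "says girl")], says_girl)

print("Prob(boy | test says boy)   =", p_boy_given_says_boy)
print("Prob(girl | test says girl) =", p_girl_given_says_girl)
```

Using `Fraction` keeps the answers exact, so the ⅔ appears as a ratio rather than a rounded decimal.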
7.3434
Conclusion
The previous result follows.
Of course!
More of this next week.
7.3535
Comparison of Two Sample Variances
We know that a t test may be used to compare two sample means.
We now compare two sample variances, assuming that the data are normally distributed.
7.3636
Two Sample Variance Test
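Note that the tables only give upper tail significance levels, so the larger sample variance must be placed in the numerator.
\[ F_{calc} = \frac{s_1^2}{s_2^2}, \qquad s_1^2 \ge s_2^2 \]
From tables, with ν1 = n1 − 1 and ν2 = n2 − 1, obtain the critical value
\[ F_{\nu_1,\nu_2} \]
Significant if
\[ F_{calc} > F_{\nu_1,\nu_2} \]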
7.3737
Two Sample Variance Test
The tables only give upper tail significance levels. What if the lower tail is required?
\[ F_{\nu_1,\nu_2}(1-\alpha) = \frac{1}{F_{\nu_2,\nu_1}(\alpha)} \]
So, swap the degrees of freedom and reciprocate.
7.3838
Two Sample Variance Test
To illustrate swapping the degrees of freedom and reciprocating:
ν1   ν2   α       Fcrit
5    4    0.025   9.36
4    5    0.025   7.39
5    4    0.975   0.14 (reciprocal of 7.39)
\[ F_{5,4}(0.975) = \frac{1}{F_{4,5}(0.025)} \]
Calculator
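The swap-and-reciprocate rule can be sketched as a small lookup in Python. The dictionary holds only the two upper-tail 2.5% values quoted on the slide, and `lower_tail_f` is a hypothetical helper name.

```python
# Upper-tail 2.5% F values copied from the slide, keyed by (nu1, nu2).
F_UPPER_025 = {(5, 4): 9.36, (4, 5): 7.39}

def lower_tail_f(nu1, nu2, upper_table):
    """Lower-tail critical value for (nu1, nu2): 1 / upper value for (nu2, nu1)."""
    return 1.0 / upper_table[(nu2, nu1)]

# F for nu1 = 5, nu2 = 4 at the 0.975 level.
f_lower = lower_tail_f(5, 4, F_UPPER_025)

print(f"lower-tail F for (5, 4): {f_lower:.2f}")
```

The lookup deliberately reverses the key, which is the "swap the degrees of freedom" step.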
7.3939
Two Sample Variance Test
\[ F_{calc} = \frac{s_1^2}{s_2^2}, \qquad s_1^2 \ge s_2^2 \]
7.4040
Example
Two samples are taken to check for equality of their variances.
sample 1 - 16 observations with standard deviation 8.4
sample 2 - 20 observations with standard deviation 5.2
7.4141
Hypothesis
H0 is that σ1² = σ2²
\[ F_{calc} = \frac{s_1^2}{s_2^2} = \frac{8.4^2}{5.2^2} = 2.61 \]
Note s1² ≥ s2².
So n1 = 16, n2 = 20
and ν1 = n1 − 1 = 15, ν2 = n2 − 1 = 19
7.4242
Tables
Upper 5% values, F(0.05), with ν1 across and ν2 down
ν1    11    12    13    14    15    16    17    18    19    20    40    60    100
ν2
19    2.34  2.31  2.28  2.26  2.23  2.22  2.22  2.18  2.17  2.16  2.03  1.98  1.94
F₁₅,₁₉(0.05) = 2.23

Upper 2.5% values, F(0.025), with ν1 across and ν2 down
ν1    11    12    13    14    15    16    17    18    19    20    40    60    100
ν2
19    2.77  2.72  2.68  2.65  2.62  2.59  2.59  2.55  2.53  2.51  2.33  2.27  2.22
F₁₅,₁₉(0.025) = 2.62
7.4343
Conclusion
F₁₅,₁₉(0.05) = 2.23, F₁₅,₁₉(0.025) = 2.62, Fcalc = 2.61
At 90% the upper cut-off is 2.23 (2.23 < 2.61).
At 95% the upper cut-off is 2.62 (2.61 < 2.62).
The result is significant at the 10% level but not at the 5% level (two-tail test). Reject H0 at the 10% level: the variances are probably inconsistent, but further work is probably required.
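A short Python sketch of this two-sample comparison is given below. The helper `f_ratio` is hypothetical; it simply puts the larger sample variance in the numerator, as the tables require, and the critical values are the two read from the ν2 = 19 rows.

```python
def f_ratio(s_a, n_a, s_b, n_b):
    """Return (F, nu1, nu2) with the larger sample variance in the numerator."""
    if s_a**2 >= s_b**2:
        return s_a**2 / s_b**2, n_a - 1, n_b - 1
    return s_b**2 / s_a**2, n_b - 1, n_a - 1

# Sample 1: 16 observations, s = 8.4. Sample 2: 20 observations, s = 5.2.
f_calc, nu1, nu2 = f_ratio(8.4, 16, 5.2, 20)

# Upper-tail values for nu1 = 15, nu2 = 19, copied from the tables.
crit_05, crit_025 = 2.23, 2.62

significant_10_two_tail = f_calc > crit_05    # 5% in each tail
significant_05_two_tail = f_calc > crit_025   # 2.5% in each tail

print(f"F_calc = {f_calc:.2f} with nu1 = {nu1}, nu2 = {nu2}")
print("significant at 10%:", significant_10_two_tail)
print("significant at 5%:", significant_05_two_tail)
```

The statistic falls narrowly below the 2.5% cut-off, which is why the slide hedges the conclusion.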
7.4444
Example
In a clinical test the following scores were obtained for “normal” and “diseased” patients.
Normal    10.3 11.8 12.6  8.6  9.2 10.1 10.2  7.4
Diseased  10.1 12.7 14.3 13.6  9.8 15.0 11.2 11.4
Is there a significant difference between the mean test scores for the two groups?
A t test was performed previously, see lecture 4 example 1. This assumed “equality” of the variances!
7.4545
Previously
H0 is that μ1 = μ2
H1 is that μ1 ≠ μ2 under a two tail test
n1 = 8, x̄1 = 10.025, s1 = 1.669
n2 = 8, x̄2 = 12.262, s2 = 1.936
Because s1 and s2 are similar we assumed that σ1 = σ2 (chapter 4). Was this justified?
7.4646
Conclusion
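H0 is that σ1² = σ2²
Placing the larger sample variance (here s2²) in the numerator,
\[ F_{calc} = \frac{s_2^2}{s_1^2} = \frac{1.936^2}{1.669^2} = 1.345 \]
With ν1 = ν2 = 7, F₇,₇(0.05) = 3.79
ν1    1     2     3     4     5     6     7     8     9     10
ν2
7     5.59  4.74  4.35  4.12  3.97  3.87  3.79  3.73  3.68  3.64
Since 1.345 < 3.79, there would appear to be no significant difference and the original assumption was justified.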
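The same check can be run from the raw clinical scores in Python, using only the standard library. The variable names are illustrative, and the critical value 3.79 is the tabulated entry for ν1 = ν2 = 7.

```python
import statistics

normal = [10.3, 11.8, 12.6, 8.6, 9.2, 10.1, 10.2, 7.4]
diseased = [10.1, 12.7, 14.3, 13.6, 9.8, 15.0, 11.2, 11.4]

# Sample standard deviations (n - 1 divisor).
s_normal = statistics.stdev(normal)
s_diseased = statistics.stdev(diseased)

# Larger sample variance goes in the numerator.
var_large = max(s_normal**2, s_diseased**2)
var_small = min(s_normal**2, s_diseased**2)
f_calc = var_large / var_small

crit_05 = 3.79  # upper 5% value for nu1 = nu2 = 7, from the table
variances_differ = f_calc > crit_05

print(f"s_normal = {s_normal:.3f}, s_diseased = {s_diseased:.3f}")
print(f"F_calc = {f_calc:.3f}, significant: {variances_differ}")
```

Working from unrounded standard deviations can shift the last digit of F slightly compared with the slide, which uses s rounded to three decimals.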
7.4747
Confidence Interval
The confidence interval for the ratio of the two variances is
\[ \frac{s_1^2}{s_2^2}\,\frac{1}{F_{n_1-1,\,n_2-1}(\alpha/2)} < \frac{\sigma_1^2}{\sigma_2^2} < \frac{s_1^2}{s_2^2}\,F_{n_2-1,\,n_1-1}(\alpha/2) \]
Note the change in the degrees of freedom for the two choices of F. In fact one gives the upper tail and one the lower.
It is not necessary that s1² ≥ s2².
7.4848
Confidence Interval
It is not necessary that s1² ≥ s2². The two bounds are always … and … .
7.4949
α = 0.05 Confidence Interval
For example, if s1² = 0.056, n1 = 6 and s2² = 1.797, n2 = 5, then
\[ F_{n_1-1,\,n_2-1}(\alpha/2) = F_{5,4}(0.025) = 9.36 \]
First value from table
First value from table
7.5050
α = 0.05 Confidence Interval
For example, if s1² = 0.056, n1 = 6 and s2² = 1.797, n2 = 5, then
\[ F_{n_2-1,\,n_1-1}(\alpha/2) = F_{4,5}(0.025) = 7.39 \]
Second value from table
7.5151
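α = 0.05 Confidence Interval
For example, if s1² = 0.056, n1 = 6 and s2² = 1.797, n2 = 5, then
\[ F_{n_1-1,\,n_2-1}(\alpha/2) = F_{5,4}(0.025) = 9.36, \qquad F_{n_2-1,\,n_1-1}(\alpha/2) = F_{4,5}(0.025) = 7.39 \]
\[ \frac{0.056}{1.797}\times\frac{1}{9.36} < \frac{\sigma_1^2}{\sigma_2^2} < \frac{0.056}{1.797}\times 7.39 \]
\[ 0.0033 < \frac{\sigma_1^2}{\sigma_2^2} < 0.2303 \]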
Switching the roles of the groups gives bounds 4.342 and 300.356, the reciprocals of the values reported above.
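Both sets of bounds can be reproduced with a small Python sketch. The helper `variance_ratio_ci` is hypothetical, and the two F values are the tabulated 2.5% entries (9.36 for degrees of freedom 5 and 4, 7.39 for 4 and 5).

```python
def variance_ratio_ci(s_num_sq, s_den_sq, f_same_order, f_swapped_order):
    """Bounds for sigma_num^2 / sigma_den^2.

    f_same_order:    upper alpha/2 F value with df (n_num - 1, n_den - 1)
    f_swapped_order: upper alpha/2 F value with df (n_den - 1, n_num - 1)
    """
    ratio = s_num_sq / s_den_sq
    return ratio / f_same_order, ratio * f_swapped_order

# Group 1: s^2 = 0.056, n = 6. Group 2: s^2 = 1.797, n = 5.
low, high = variance_ratio_ci(0.056, 1.797, 9.36, 7.39)

# Switching the roles of the groups also switches the two F values.
low_swapped, high_swapped = variance_ratio_ci(1.797, 0.056, 7.39, 9.36)

print(f"{low:.4f} < sigma1^2/sigma2^2 < {high:.4f}")
print(f"{low_swapped:.3f} < sigma2^2/sigma1^2 < {high_swapped:.3f}")
```

The swapped bounds are the reciprocals of the original ones in reverse order, as the slide states.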
7.5252
What if I have lost my statistical tables?
Most tabulated statistical values may be obtained from Excel
Excel Statistical Calculator
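Outside Excel, the same tabulated values can be obtained from a statistics library. The sketch below assumes SciPy is installed, which the lecture does not mention; it uses `scipy.stats.chi2.isf` and `scipy.stats.f.isf`, which return the value with a given upper-tail probability. The function name `table_values` is hypothetical.

```python
try:
    from scipy.stats import chi2, f
except ImportError:  # SciPy is an assumption here, so fail gracefully
    chi2 = f = None

def table_values():
    """Return a few upper-tail table entries, or None if SciPy is missing."""
    if chi2 is None:
        return None
    return {
        "chi2 nu=19 p=0.05": float(chi2.isf(0.05, 19)),
        "chi2 nu=19 p=0.95": float(chi2.isf(0.95, 19)),
        "F nu1=5 nu2=4 p=0.025": float(f.isf(0.025, 5, 4)),
        "F nu1=4 nu2=5 p=0.025": float(f.isf(0.025, 4, 5)),
    }

values = table_values()
if values is None:
    print("SciPy not available; use printed tables or Excel instead.")
else:
    for label, value in values.items():
        print(f"{label}: {value:.3f}")
```

The entries chosen are ones already quoted in the slides, so the output can be compared directly with the printed tables.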
7.5353
Next Week
Bring your calculators next week
7.5454
Read
Read Howitt and Cramer 181-186
Read Davis and Smith pages 434-448
7.5555
Solution To The First Assignment
The individual solutions to the first assignment should now be available on the module web page.
Please access the “SPSS Verification” which employs the syntax window. You will find this particularly useful at Stage III.
7.5656
Practical 7
This material is available from the module web page.
http://www.staff.ncl.ac.uk/mike.cox
Module Web Page
7.5757
Practical 7
This material for the practical is available.
Instructions for the practical
Practical 7
Material for the practical
Practical 7
7.5858
Whoops!
Last week, a formatting error led to us inadvertently suggesting that there was a one in 1,019 chance of the world ending before this edition. That should have read, er, one in 10¹⁹, rather less likely. Sorry. Feel free to remove the crash helmet.
Independent
13/09/08
7.5959
Whoops!
Yeah... that's not the quadratic formula.