School of Information - The University of Texas at Austin LIS 397.1, Introduction to Research in Library and Information Science
LIS 397.1 Introduction to Research in Library and Information Science
Student’s t-Test and ANOVA
R. E. Wyllys
Copyright 2003 by R. E. Wyllys
Last revised 2003 Jan 15
Standardized Tests of Statistical Hypotheses
• To each type of statistical hypothesis corresponds a particular standardized test procedure or procedures
• Each test procedure includes a formula, the “test statistic”
• You
– place, into the test statistic, data from the observed sample or samples
– obtain a number, the observed value of the test statistic
Standardized Tests of Statistical Hypotheses
• Traditional Method: Compare the absolute value of the observed value of the test statistic against the threshold value from the pertinent table
– If |test statistic| ≤ tabled threshold → Accept H0
– If |test statistic| > tabled threshold → Reject H0
• Computer-Era Method: Use the probability of getting the observed value of the test statistic when the null hypothesis H0 is true (OVTSWNHT)
– If P(OVTSWNHT) ≥ α → Accept H0
– If P(OVTSWNHT) < α → Reject H0
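The two decision rules can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, the default significance level of 0.05 is an assumption, and the numbers passed in are the t statistic, two-tail critical value, and two-tail probability reported in the paired-samples Excel output later in these slides.

```python
def decide_traditional(test_statistic: float, threshold: float) -> str:
    """Traditional method: compare |test statistic| with a tabled threshold."""
    return "Reject H0" if abs(test_statistic) > threshold else "Accept H0"


def decide_by_probability(p_value: float, alpha: float = 0.05) -> str:
    """Computer-era method: compare P(OVTSWNHT) with alpha (0.05 is assumed)."""
    return "Reject H0" if p_value < alpha else "Accept H0"


# Values taken from the paired-samples example in these slides.
print(decide_traditional(2.645751, 2.364623))
print(decide_by_probability(0.033146))
```

Both rules lead to the same decision for a given test and significance level; the second simply replaces the table lookup with a probability computed by software.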
Common Types of Single-Variable Statistical Hypotheses
• H0: μ = μ₀
– Population mean is some number: “Average daily circulation total is 123”
– Handled by t-test
• H0: μ₁ = μ₂
– Means of two populations are equal: “Average cost per online search using Service A = average cost using Service B”
– Handled by t-test and ANOVA
– Samples can be independent or dependent (paired, repeated)
Common Types of Single-Variable Statistical Hypotheses
• H0: μ₁ = μ₂ = μ₃ = ..., etc.
– Means of Populations 1, 2, 3, ..., etc. are all equal: “Average number of books borrowed per student per semester is the same for freshmen, sophomores, juniors, and seniors.”
– Handled by ANOVA
– Samples can be independent or dependent (repeated, replicated)
H0: μ = μ₀
Test statistic:
\[ t = \frac{\bar{X} - \mu_0}{s_{\bar{x}}} = \frac{\bar{X} - \mu_0}{s / \sqrt{n}} \]
H0: μ = 25
Example¹: n = 50, x̄ = 30, s = 8.43
\[ t = \frac{30 - 25}{8.43 / \sqrt{50}} = \frac{5}{8.43 / 7.07} = \frac{5}{1.19} = 4.19 \]
¹ Adapted from Hinton, pp. 62-67
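As a check on the arithmetic, the one-sample test statistic can be computed from the summary figures of this example (n = 50, sample mean 30, s = 8.43, hypothesized mean 25). The function name is hypothetical; this is a minimal sketch using only the standard library.

```python
import math


def one_sample_t(sample_mean: float, mu0: float, s: float, n: int) -> float:
    """t = (sample mean - hypothesized mean) / (s / sqrt(n))."""
    standard_error = s / math.sqrt(n)
    return (sample_mean - mu0) / standard_error


t_obs = one_sample_t(sample_mean=30, mu0=25, s=8.43, n=50)
print(round(t_obs, 2))  # 4.19
```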
H0: μ = 25
Example: n = 50, x̄ = 30, s = 8.43
Output of Excel’s Descriptive Statistics procedure:
Mean                       30
Standard Error             1.192151
Median                     30
Mode                       30
Standard Deviation         8.429782
Sample Variance            71.06122
Kurtosis                   -0.33067
Skewness                   -2.4E-18
Range                      36
Minimum                    12
Maximum                    48
Sum                        1500
Count                      50
Confidence Level(95.0%)    2.395716
Confidence Level(99.0%)    3.19491
H0: μ₁ = μ₂ with Dependent Samples
• Dependent samples consist of pairs of observations for each sample element
• The crucial evidence is the set of pairwise differences: D_i = d_{i2} - d_{i1}
• Test statistic:
\[ t = \frac{\bar{D} - 0}{s_{\bar{D}}} = \frac{\bar{D}}{s / \sqrt{n}} \]
H0: μ₁ = μ₂ with Dependent Samples
Example¹: n = 8, D̄ = 1.50, s = 1.604
\[ t = \frac{1.50}{1.604 / \sqrt{8}} = \frac{1.50}{0.567} = 2.646 \]
¹ Adapted from Hinton, pp. 83-86
H0: μ₁ = μ₂ with Dependent Samples
Example: n = 8, D̄ = 1.50, s = 1.604
Output of Excel’s t-Test: Paired Two-Sample for Means procedure:
                         Morning     Afternoon
Mean                     5.25        3.75
Variance                 1.642857    1.071429
Observations             8           8
Pearson Correlation      0.053838
Hypoth'd. Mean Diff.     0
df                       7
t Stat                   2.645751
P(T<=t) one-tail         0.016573
t Critical one-tail      1.894578
P(T<=t) two-tail         0.033146
t Critical two-tail      2.364623
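The paired test is a one-sample test on the differences with a hypothesized mean of 0, so it can be reproduced from the summary figures alone. The sketch below uses a hypothetical function name, the rounded values n = 8, D̄ = 1.50, s = 1.604, and the two-tail critical value for df = 7 quoted in the Excel output above.

```python
import math


def paired_t(mean_diff: float, sd_diff: float, n: int) -> float:
    """t = mean difference / (standard deviation of differences / sqrt(n))."""
    return mean_diff / (sd_diff / math.sqrt(n))


t_obs = paired_t(mean_diff=1.50, sd_diff=1.604, n=8)
t_critical_two_tail = 2.364623  # df = 7, copied from the Excel output

decision = "Reject H0" if abs(t_obs) > t_critical_two_tail else "Accept H0"
print(f"t = {t_obs:.3f}; {decision}")
```

Because s is rounded to 1.604 here, the result agrees with Excel's 2.645751 only to about two decimal places.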
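H0: μ₁ = μ₂ with Independent Samples
Test statistic:
\[ t = \frac{\bar{x}_1 - \bar{x}_2}{s_{\bar{x}_1 - \bar{x}_2}} \]
where
\[ s_{\bar{x}_1 - \bar{x}_2} = s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}} \]
and
\[ s_p^2 = \frac{(n_1 - 1) s_1^2 + (n_2 - 1) s_2^2}{(n_1 - 1) + (n_2 - 1)} \]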
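H0: μ₁ = μ₂ with Independent Samples
Example¹:
n₁ = 6, x̄₁ = 5.00, s₁ = 0.8944, s₁² = 0.8000
n₂ = 8, x̄₂ = 6.25, s₂ = 1.4880, s₂² = 2.2143
\[ s_p^2 = \frac{(6 - 1)(0.8000) + (8 - 1)(2.2143)}{5 + 7} = 1.6250; \quad s_p = 1.2748 \]
\[ s_{\bar{x}_1 - \bar{x}_2} = 1.2748 \sqrt{\frac{1}{6} + \frac{1}{8}} = 1.2748 \sqrt{0.2917} = 1.2748 \times 0.5401 = 0.6884 \]
\[ t = \frac{5.00 - 6.25}{0.6884} = -1.8157 \]
¹ Adapted from Hinton, pp. 88-90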
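The pooled-variance calculation can be retraced step by step from the summary figures of this example. The function name and return layout are hypothetical; the sketch assumes equal population variances, as the pooled formula does.

```python
import math


def pooled_t(n1, mean1, var1, n2, mean2, var2):
    """Return (pooled variance, standard error of the difference, t)."""
    pooled_var = ((n1 - 1) * var1 + (n2 - 1) * var2) / ((n1 - 1) + (n2 - 1))
    se_diff = math.sqrt(pooled_var) * math.sqrt(1 / n1 + 1 / n2)
    t = (mean1 - mean2) / se_diff
    return pooled_var, se_diff, t


# Group 1: n = 6, mean 5.00, variance 0.8000
# Group 2: n = 8, mean 6.25, variance 2.2143
pooled_var, se_diff, t_obs = pooled_t(6, 5.00, 0.8000, 8, 6.25, 2.2143)
print(f"pooled variance = {pooled_var:.4f}")
print(f"standard error  = {se_diff:.4f}")
print(f"t               = {t_obs:.4f}")
```

The sign of t depends only on which group is listed first; the degrees of freedom are (n₁ - 1) + (n₂ - 1) = 12.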
H0: μ₁ = μ₂ with Independent Samples
Example:
n₁ = 6, x̄₁ = 5.00, s₁ = 0.8944, s₁² = 0.8000
n₂ = 8, x̄₂ = 6.25, s₂ = 1.4880, s₂² = 2.2143
Output of Excel’s t-Test: Two-Sample Assuming Equal Variances procedure:
                         Men         Women
Mean                     5           6.25
Variance                 0.8         2.214286
Observations             6           8
Pooled Variance          1.625
Hypoth'd. Mean Diff.     0
df                       12
t Stat                   -1.81568
P(T<=t) one-tail         0.047236
t Critical one-tail      1.782287
P(T<=t) two-tail         0.094472
t Critical two-tail      2.178813
H0: μ₁ = μ₂ with Independent Samples
Example:
n₁ = 6, x̄₁ = 5.00, s₁ = 0.8944, s₁² = 0.8000
n₂ = 8, x̄₂ = 6.25, s₂ = 1.4880, s₂² = 2.2143
Output of Excel’s ANOVA: Single Factor procedure:
SUMMARY
Groups    Count    Sum    Average    Variance
Men       6        30     5          0.8
Women     8        50     6.25       2.214286
ANOVA
Source of Variation    SS          df    MS          F           P-value     F crit
Between Groups         5.357143    1     5.357143    3.296703    0.094472    4.747221
Within Groups          19.5        12    1.625
Total                  24.85714    13
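For two groups, the ANOVA F statistic is the square of the pooled-variance t statistic, which is why the ANOVA P-value (0.094472) matches the two-tail probability from the t-test. A short derivation, using the symbols of the independent-samples formulas in these slides and the fact that with k = 2 groups the between-groups degrees of freedom equal 1:

```latex
\[
F = \frac{MS_B}{MS_W}
  = \frac{\dfrac{n_1 n_2}{n_1 + n_2}\,(\bar{x}_1 - \bar{x}_2)^2}{s_p^2}
  = \left( \frac{\bar{x}_1 - \bar{x}_2}{s_p \sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}} \right)^{2}
  = t^2
\]
\[
\frac{6 \cdot 8}{6 + 8}\,(5.00 - 6.25)^2 = 5.3571, \qquad
F = \frac{5.3571}{1.625} = 3.2967 \approx (-1.81568)^2
\]
```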
An ANOVA Table
Source     SS                 DF                                          MS                         F
Total      SST                DFT = (total number of observations) - 1
Between    SSB                DFB = (no. of groups) - 1 = k - 1           MSB = σ̂²_B = SSB / DFB     F_obs = σ̂²_B / σ̂²_W
Within     SSW = SST - SSB    DFW = DFT - DFB                             MSW = σ̂²_W = SSW / DFW
H0: μ₁ = μ₂ = μ₃ with Independent Samples
Example from pp. 122-124 of Hinton
Output of Excel’s ANOVA: Single Factor procedure:
SUMMARY
Groups          Count    Sum    Average    Variance
First Letter    10       150    15         8.666667
Last Letter     10       240    24         12
No Letter       10       280    28         19.33333
ANOVA
Source of Variation    SS          df    MS          F        P-value     F crit
Between Groups         886.6667    2     443.3333    33.25    5.22E-08    3.354131
Within Groups          360         27    13.33333
Total                  1246.667    29
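The ANOVA rows can be rebuilt from the SUMMARY figures alone (count, average, and variance per group). The sketch below is one way to do that; the function name and the dictionary keys are hypothetical, and the F crit value is copied from the Excel output rather than computed.

```python
def one_way_anova(groups):
    """groups: list of (count, mean, variance) tuples, one per group."""
    k = len(groups)
    total_n = sum(n for n, _, _ in groups)
    grand_mean = sum(n * mean for n, mean, _ in groups) / total_n

    ss_between = sum(n * (mean - grand_mean) ** 2 for n, mean, _ in groups)
    ss_within = sum((n - 1) * var for n, _, var in groups)
    df_between = k - 1
    df_within = total_n - k  # same as (total_n - 1) - (k - 1)

    ms_between = ss_between / df_between
    ms_within = ss_within / df_within
    return {
        "SSB": ss_between, "SSW": ss_within, "SST": ss_between + ss_within,
        "DFB": df_between, "DFW": df_within,
        "MSB": ms_between, "MSW": ms_within,
        "F": ms_between / ms_within,
    }


# (count, average, variance) for First Letter, Last Letter, No Letter
table = one_way_anova([(10, 15, 8.666667), (10, 24, 12), (10, 28, 19.33333)])
f_crit = 3.354131  # copied from the Excel output
print(f"F = {table['F']:.2f}")
print("Reject H0" if table["F"] > f_crit else "Accept H0")
```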
ANOVA
The ANOVA procedure is capable of dealing, in a single experiment, not only with several levels of one factor but also with several levels of each of several factors. Since R. A. Fisher invented ANOVA in the 1920s, this capability has contributed, in a major way, to research in countless fields of science and technology, ranging from agriculture (e.g., the Green Revolution) to pharmaceuticals to the manufacturing of integrated circuits.
Without ANOVA, we would not eat as well and cheaply as we do, we would not be as healthy, and we would be nowhere near as far along in the computer and communications revolution.
ANOVA and Sir Ronald Fisher
An even more important achievement was Fisher's origination of the concept of the analysis of variance. This is a statistical procedure used to design experiments that answer several questions at once, instead of just one. Fisher's principal idea was to arrange an experiment as a set of partitioned subexperiments that differ from each other in one or several of the factors or treatments applied in them. The subexperiments are designed in such a way as to permit differences in their outcome to be attributed to the different factors or combinations of factors by means of statistical analysis. This was a notable advance over the prevailing scientific method of varying only one factor at a time in an experiment, which was a relatively inefficient procedure. It was later found that the problems of bias and multivariate analysis that Fisher had solved in his plant-breeding research are encountered in a great deal of other experimental work in biology, and indeed in many other scientific fields as well.
From: Encyclopedia Britannica Online, 2002
Sir Ronald Aylmer Fisher 1890-1962