School of Information - The University of Texas at Austin LIS 397.1, Introduction to Research in Library and Information Science LIS 397.1 Introduction to Research in Library and Information Science Student’s t -Test and ANOVA R. E. Wyllys Copyright 2003 by R. E. Wyllys Last revised 2003 Jan 15


LIS 397.1 Introduction to Research in Library and Information Science

Student’s t-Test and ANOVA

R. E. Wyllys

Copyright 2003 by R. E. Wyllys

Last revised 2003 Jan 15


Standardized Tests of Statistical Hypotheses

• To each type of statistical hypothesis corresponds a particular standardized test procedure or procedures

• Each test procedure includes a formula, the “test statistic”

• You

– place, into the test statistic, data from the observed sample or samples

– obtain a number, the observed value of the test statistic


Standardized Tests of Statistical Hypotheses

• Traditional Method: Compare absolute value of observed value of test statistic against threshold value from pertinent table

– If |test statistic| ≤ tabled threshold → Accept H0

– If |test statistic| > tabled threshold → Reject H0

• Computer-Era Method: Use probability of getting observed value of test statistic when the null hypothesis H0 is true (OVTSWNHT), and compare it with the chosen significance level α

– If P(OVTSWNHT) ≥ α → Accept H0

– If P(OVTSWNHT) < α → Reject H0
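The two decision rules can be sketched in a few lines of Python. This is only an illustrative sketch, not part of the original slides: the function names and the default significance level of 0.05 are assumptions. The numbers in the usage lines are taken from the paired-samples Excel output that appears later in these slides (t Stat 2.645751, t Critical two-tail 2.364623, two-tail probability 0.033146).

```python
def decide_traditional(t_observed, tabled_threshold):
    """Traditional method: compare |test statistic| with the tabled threshold."""
    if abs(t_observed) > tabled_threshold:
        return "Reject H0"
    return "Accept H0"


def decide_by_probability(p_value, alpha=0.05):
    """Computer-era method: compare P(OVTSWNHT) with the significance level."""
    if p_value < alpha:
        return "Reject H0"
    return "Accept H0"


# Values from the paired-samples Excel output in these slides (two-tailed test).
traditional = decide_traditional(2.645751, 2.364623)
computer_era = decide_by_probability(0.033146, alpha=0.05)
print(traditional, "|", computer_era)
```

When the tabled threshold and the probability refer to the same significance level and the same number of tails, the two methods lead to the same decision.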


Common Types of Single-Variable Statistical Hypotheses

• H0: μ = μ0

– Population mean is some number: “Average daily circulation total is 123”

– Handled by t-test

• H0: μ1 = μ2

– Means of two populations are equal: “Average cost per online search using Service A = average cost using Service B”

– Handled by t-test and ANOVA

– Samples can be independent or dependent (paired, repeated)


Common Types of Single-Variable Statistical Hypotheses

• H0: μ1 = μ2 = μ3 = ..., etc.

– Means of Populations 1, 2, 3, ..., etc. are all equal: “Average number of books borrowed per student per semester is the same for freshmen, sophomores, juniors, and seniors.”

– Handled by ANOVA

– Samples can be independent or dependent (repeated, replicated)


H0: μ = μ0

Test statistic:

$$ t = \frac{\bar{X} - \mu_0}{s_{\bar{x}}} = \frac{\bar{X} - \mu_0}{s / \sqrt{n}} $$


H0: μ = 25

Example¹: $n = 50,\ \bar{x} = 30,\ s = 8.43$

$$ t = \frac{30 - 25}{8.43 / \sqrt{50}} = \frac{5}{8.43 / 7.07} = \frac{5}{1.19} = 4.19 $$

¹ Adapted from Hinton, pp. 62-67
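The arithmetic of this example can be checked with a short Python sketch. The function name `one_sample_t` is an assumption made for illustration; the inputs are the summary statistics given on the slide.

```python
import math


def one_sample_t(sample_mean, mu_0, s, n):
    """Return (standard error of the mean, t) for H0: mu = mu_0."""
    standard_error = s / math.sqrt(n)
    t = (sample_mean - mu_0) / standard_error
    return standard_error, t


# Summary statistics from the slide: n = 50, mean = 30, s = 8.43; H0: mu = 25.
standard_error, t = one_sample_t(sample_mean=30, mu_0=25, s=8.43, n=50)
print(f"standard error = {standard_error:.2f}, t = {t:.2f}")
# prints: standard error = 1.19, t = 4.19
```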


H0: μ = 25

Example: $n = 50,\ \bar{x} = 30,\ s = 8.43$

Output of Excel’s Descriptive Statistics procedure:

Mean                        30
Standard Error              1.192151
Median                      30
Mode                        30
Standard Deviation          8.429782
Sample Variance             71.06122
Kurtosis                    -0.33067
Skewness                    -2.4E-18
Range                       36
Minimum                     12
Maximum                     48
Sum                         1500
Count                       50
Confidence Level(95.0%)     2.395716
Confidence Level(99.0%)     3.19491


H0: μ1 = μ2 with Dependent Samples

• Dependent samples consist of pairs of observations for each sample element

• The crucial evidence is the set of pairwise differences: $D_i = d_{i2} - d_{i1}$

• Test statistic

$$ t = \frac{\bar{D} - 0}{s_{\bar{D}}} = \frac{\bar{D}}{s / \sqrt{n}} $$


H0: μ1 = μ2 with Dependent Samples

Example¹: $n = 8,\ \bar{D} = 1.50,\ s = 1.604$

$$ t = \frac{1.50}{1.604 / \sqrt{8}} = \frac{1.50}{0.567} = 2.646 $$

¹ Adapted from Hinton, pp. 83-86
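A minimal Python sketch of the paired-samples statistic follows. Both function names are assumptions for illustration. The first works from the list of pairwise differences, the second from the summary values on the slide (the raw observations are not reproduced in these slides).

```python
import math
import statistics


def paired_t_from_differences(differences):
    """t statistic for H0: mu_1 = mu_2, computed from the pairwise differences."""
    n = len(differences)
    mean_d = statistics.mean(differences)
    s_d = statistics.stdev(differences)  # sample standard deviation (n - 1 divisor)
    return mean_d / (s_d / math.sqrt(n))


def paired_t_from_summary(mean_d, s_d, n):
    """Same statistic when only the mean and standard deviation of D are known."""
    return mean_d / (s_d / math.sqrt(n))


# Summary statistics from the slide: n = 8, mean difference = 1.50, s = 1.604.
t_slide = paired_t_from_summary(1.50, 1.604, 8)
print(f"t = {t_slide:.3f}")
```

The result agrees with the slide's 2.646 to two decimal places; the small difference in the third decimal comes from the slide rounding the denominator to 0.567.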


H0: μ1 = μ2 with Dependent Samples

Example: $n = 8,\ \bar{D} = 1.50,\ s = 1.604$

Output of Excel’s t-Test: Paired Two-Sample for Means procedure:

                         Morning     Afternoon
Mean                     5.25        3.75
Variance                 1.642857    1.071429
Observations             8           8
Pearson Correlation      0.053838
Hypoth'd. Mean Diff.     0
df                       7
t Stat                   2.645751
P(T<=t) one-tail         0.016573
t Critical one-tail      1.894578
P(T<=t) two-tail         0.033146
t Critical two-tail      2.364623
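As a cross-check, the standard deviation of the differences and the t statistic can be recovered from the figures Excel reports, using the general identity Var(D) = Var1 + Var2 - 2·r·s1·s2. The sketch below is an illustration added here, not something shown on the slide; the variable names are assumptions, and the numbers are copied from the Excel output.

```python
import math

# Values reported in the Excel paired-test output on this slide.
var_morning, var_afternoon = 1.642857, 1.071429
mean_morning, mean_afternoon = 5.25, 3.75
r = 0.053838  # Pearson correlation between the paired observations
n = 8

# Variance of the differences: Var(D) = Var1 + Var2 - 2 * r * s1 * s2
var_d = var_morning + var_afternoon - 2 * r * math.sqrt(var_morning * var_afternoon)
s_d = math.sqrt(var_d)

mean_d = mean_morning - mean_afternoon
t = mean_d / (s_d / math.sqrt(n))
print(f"s_D = {s_d:.3f}, t = {t:.3f}")
```

This reproduces, up to rounding, the s = 1.604 and t Stat = 2.645751 shown for the example.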


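H0: μ1 = μ2 with Independent Samples

Test statistic:

$$ t = \frac{\bar{x}_1 - \bar{x}_2}{s_{\bar{x}_1 - \bar{x}_2}} $$

where

$$ s_{\bar{x}_1 - \bar{x}_2} = s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}} $$

and

$$ s_p^2 = \frac{(n_1 - 1)\, s_1^2 + (n_2 - 1)\, s_2^2}{(n_1 - 1) + (n_2 - 1)} $$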


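H0: μ1 = μ2 with Independent Samples

Example¹:

$$ n_1 = 6,\quad \bar{x}_1 = 5.00,\quad s_1 = 0.8944,\quad s_1^2 = 0.8000 $$

$$ n_2 = 8,\quad \bar{x}_2 = 6.25,\quad s_2 = 1.4880,\quad s_2^2 = 2.2143 $$

$$ s_p^2 = \frac{(6 - 1)(0.8000) + (8 - 1)(2.2143)}{5 + 7} = 1.6250; \qquad s_p = 1.2748 $$

$$ s_{\bar{x}_1 - \bar{x}_2} = 1.2748 \sqrt{\frac{1}{6} + \frac{1}{8}} = 1.2748 \sqrt{0.2917} = 1.2748 \times 0.5401 = 0.6884 $$

$$ t = \frac{5.00 - 6.25}{0.6884} = -1.8157 $$

¹ Adapted from Hinton, pp. 88-90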
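The pooled-variance calculation in this example can be retraced with a short Python sketch. The function name `pooled_t` and its return layout are assumptions for illustration; the inputs are the summary statistics from the slide.

```python
import math


def pooled_t(n1, mean1, var1, n2, mean2, var2):
    """Independent-samples t statistic assuming equal population variances."""
    df = (n1 - 1) + (n2 - 1)
    pooled_var = ((n1 - 1) * var1 + (n2 - 1) * var2) / df
    se_diff = math.sqrt(pooled_var) * math.sqrt(1 / n1 + 1 / n2)
    t = (mean1 - mean2) / se_diff
    return pooled_var, se_diff, t, df


# Summary statistics from the slide (sample 1 first, then sample 2).
pooled_var, se_diff, t, df = pooled_t(6, 5.00, 0.8000, 8, 6.25, 2.2143)
print(f"pooled variance = {pooled_var:.4f}, s(x1 - x2) = {se_diff:.4f}, "
      f"t = {t:.4f}, df = {df}")
```

The sign of t depends only on which sample is listed first; here sample 1 has the smaller mean, so t is negative.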


H0: μ1 = μ2 with Independent Samples

Example:

$$ n_1 = 6,\quad \bar{x}_1 = 5.00,\quad s_1 = 0.8944,\quad s_1^2 = 0.8000 $$

$$ n_2 = 8,\quad \bar{x}_2 = 6.25,\quad s_2 = 1.4880,\quad s_2^2 = 2.2143 $$

Output of Excel’s t-Test: Two-Sample Assuming Equal Variances procedure:

                         Men         Women
Mean                     5           6.25
Variance                 0.8         2.214286
Observations             6           8
Pooled Variance          1.625
Hypoth'd. Mean Diff.     0
df                       12
t Stat                   -1.81568
P(T<=t) one-tail         0.047236
t Critical one-tail      1.782287
P(T<=t) two-tail         0.094472
t Critical two-tail      2.178813


H0: μ1 = μ2 with Independent Samples

Example:

$$ n_1 = 6,\quad \bar{x}_1 = 5.00,\quad s_1 = 0.8944,\quad s_1^2 = 0.8000 $$

$$ n_2 = 8,\quad \bar{x}_2 = 6.25,\quad s_2 = 1.4880,\quad s_2^2 = 2.2143 $$

Output of Excel’s ANOVA: Single Factor procedure:

SUMMARY

Groups    Count    Sum    Average    Variance
Men       6        30     5          0.8
Women     8        50     6.25       2.214286

ANOVA

Source of Variation    SS          df    MS          F           P-value     F crit
Between Groups         5.357143    1     5.357143    3.296703    0.094472    4.747221
Within Groups          19.5        12    1.625
Total                  24.85714    13
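With only two groups, the single-factor ANOVA and the equal-variances t-test are the same test: F equals t squared, the critical F equals the squared two-tail critical t, and the probabilities coincide. The sketch below checks this numerically with the values from the two Excel outputs for the Men/Women example; the variable names are assumptions for illustration.

```python
# Values copied from the two Excel outputs for the Men/Women example.
t_stat = -1.81568            # t Stat, two-sample t-test assuming equal variances
t_crit_two_tail = 2.178813   # t Critical two-tail
p_t_two_tail = 0.094472      # P(T<=t) two-tail

f_stat = 3.296703            # F, ANOVA: Single Factor
f_crit = 4.747221            # F crit
p_anova = 0.094472           # P-value

t_squared = t_stat ** 2
t_crit_squared = t_crit_two_tail ** 2

print(f"t^2      = {t_squared:.4f}   F      = {f_stat:.4f}")
print(f"t_crit^2 = {t_crit_squared:.4f}   F crit = {f_crit:.4f}")
print(f"same probability: {p_t_two_tail == p_anova}")
```

The tiny discrepancies in the later decimals come from the rounding of the printed Excel values.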


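An ANOVA Table

Source     SS           DF
Total      SST          (total number of observations) - 1
Between    SSB          (no. of groups) - 1 = k - 1
Within     SST - SSB    DFT - DFB

The MS and F columns are obtained from the SS and DF columns:

$$ MS_B = \frac{SS_B}{DF_B} = \hat{\sigma}_B^2, \qquad MS_W = \frac{SS_W}{DF_W} = \hat{\sigma}_W^2, \qquad F_{obs} = \frac{MS_B}{MS_W} = \frac{\hat{\sigma}_B^2}{\hat{\sigma}_W^2} $$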


H0: μ1 = μ2 = μ3 with Independent Samples

Example from pp. 122-124 of Hinton

Output of Excel’s ANOVA: Single Factor procedure:

SUMMARY

Groups          Count    Sum    Average    Variance
First Letter    10       150    15         8.666667
Last Letter     10       240    24         12
No Letter       10       280    28         19.33333

ANOVA

Source of Variation    SS          df    MS          F        P-value     F crit
Between Groups         886.6667    2     443.3333    33.25    5.22E-08    3.354131
Within Groups          360         27    13.33333
Total                  1246.667    29
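The SS, df, MS, and F entries of this ANOVA table can be rebuilt from the SUMMARY rows alone. The following Python sketch is one way to do so; the function name and the dictionary keys are assumptions for illustration, and the P-value and F crit columns are not reproduced because they require the F distribution.

```python
def one_way_anova_from_summary(groups):
    """Build one-way ANOVA quantities from (count, mean, variance) per group."""
    total_n = sum(n for n, _, _ in groups)
    grand_mean = sum(n * mean for n, mean, _ in groups) / total_n

    # Between-groups SS: spread of the group means around the grand mean.
    ss_between = sum(n * (mean - grand_mean) ** 2 for n, mean, _ in groups)
    # Within-groups SS: each sample variance times its degrees of freedom.
    ss_within = sum((n - 1) * var for n, _, var in groups)

    df_between = len(groups) - 1
    df_within = total_n - len(groups)
    ms_between = ss_between / df_between
    ms_within = ss_within / df_within
    return {
        "SSB": ss_between, "SSW": ss_within, "SST": ss_between + ss_within,
        "DFB": df_between, "DFW": df_within,
        "MSB": ms_between, "MSW": ms_within,
        "F": ms_between / ms_within,
    }


# (Count, Average, Variance) rows from the SUMMARY table on this slide.
summary = [(10, 15, 8.666667), (10, 24, 12), (10, 28, 19.33333)]
table = one_way_anova_from_summary(summary)
for name, value in table.items():
    print(f"{name}: {value:.4f}")
```

The same function applied to the Men/Women rows, (6, 5, 0.8) and (8, 6.25, 2.214286), gives the F of about 3.2967 reported in the two-group ANOVA output.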


ANOVA

The ANOVA procedure is capable of dealing, in a single experiment, not only with several levels of one factor but also with several levels of each of several factors. Since R. A. Fisher invented ANOVA in the 1920s, this capability has contributed, in a major way, to research in countless fields of science and technology, ranging from agriculture (e.g., the Green Revolution) to pharmaceuticals to the manufacturing of integrated circuits.

Without ANOVA, we would not eat as well and cheaply as we do, we would not be as healthy, and we would be nowhere near as far along in the computer and communications revolution.


ANOVA and Sir Ronald Fisher

An even more important achievement was Fisher's origination of the concept of the analysis of variance. This is a statistical procedure used to design experiments that answer several questions at once, instead of just one. Fisher's principal idea was to arrange an experiment as a set of partitioned subexperiments that differ from each other in one or several of the factors or treatments applied in them. The subexperiments are designed in such a way as to permit differences in their outcome to be attributed to the different factors or combinations of factors by means of statistical analysis. This was a notable advance over the prevailing scientific method of varying only one factor at a time in an experiment, which was a relatively inefficient procedure. It was later found that the problems of bias and multivariate analysis that Fisher had solved in his plant-breeding research are encountered in a great deal of other experimental work in biology, and indeed in many other scientific fields as well.

From: Encyclopedia Britannica Online, 2002


Sir Ronald Aylmer Fisher 1890-1962