
Experimental Statistics - week 3

Chapter 8: Inferences about More Than 2 Population Central Values



PC SAS on Campus

Library

BIC

Student Center

http://support.sas.com/rnd/le/index.html

SAS Learning Edition $125


Hypothetical Sample Data

Scenario A

Pop 1   Pop 2
  5       8
  7       9
  6       6
  3       8
  4       9

$\bar{X}_1 = 5$, $\bar{X}_2 = 8$

Scenario B

Pop 1   Pop 2
  3       7
 10       4
  3      12
  1       4
  8      13

$\bar{X}_1 = 5$, $\bar{X}_2 = 8$

$H_0: \mu_A = \mu_B$

$H_a: \mu_A \ne \mu_B$

Reject $H_0$ if $|t| > 2.306$

For one scenario, |t| = 1.17

For the other scenario, |t| = 3.35
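The two |t| values can be checked with a short calculation. The sketch below is not part of the original slides; it assumes the flattened data rows are (Pop 1, Pop 2) pairs and uses the pooled-variance two-sample t statistic. The function name `pooled_t` is illustrative.

```python
from math import sqrt
from statistics import mean, variance

def pooled_t(sample1, sample2):
    """Two-sample t statistic using a pooled variance estimate."""
    n1, n2 = len(sample1), len(sample2)
    # Pooled variance: df-weighted average of the two sample variances.
    sp2 = ((n1 - 1) * variance(sample1) + (n2 - 1) * variance(sample2)) / (n1 + n2 - 2)
    return (mean(sample1) - mean(sample2)) / sqrt(sp2 * (1 / n1 + 1 / n2))

# Data as read from the slide (assumed pairing: first value Pop 1, second Pop 2).
scenario_a = ([5, 7, 6, 3, 4], [8, 9, 6, 8, 9])
scenario_b = ([3, 10, 3, 1, 8], [7, 4, 12, 4, 13])

t_a = pooled_t(*scenario_a)
t_b = pooled_t(*scenario_b)
print(f"Scenario A: |t| = {abs(t_a):.2f}")
print(f"Scenario B: |t| = {abs(t_b):.2f}")
```

Both scenarios have the same difference in sample means; only the spread within each sample changes, and that alone moves |t| across the 2.306 cutoff.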


In general, for 2-sample t-tests:

To show significance, we want the difference between groups (i.e. $\bar{X}_1 - \bar{X}_2$) to be ___________ compared to the variability within groups (as measured by $S_p\sqrt{\frac{1}{n_1}+\frac{1}{n_2}}$)


Completely Randomized Design

1-Factor Analysis of Variance (ANOVA)

Setting (Assumptions):

- t populations

- populations are normal

- $\mu_i$ and $\sigma_i^2$ denote the mean and variance of the ith population

- $\sigma_1^2 = \sigma_2^2 = \cdots = \sigma_t^2$

- mutually independent random samples are taken from the populations

- the sample sizes do not have to all be equal


1-Factor ANOVA

. . .


Question:

Is $\mu_1 = \mu_2 = \cdots = \mu_t$ ?

i.e. we test the null hypothesis

$H_0: \mu_1 = \mu_2 = \cdots = \mu_t$

$H_a$: the means are not all equal

Notes:

- not directional, i.e. no “1-sided / 2-sided” issues

- alternative doesn’t say that all means are distinct


Completely Randomized Design

1-Factor Analysis of Variance

Example data setup where t = 5 and n = 4


Notation:

- $y_{ij}$ denotes the jth observation from the ith population

- $n_i$ denotes the sample size from the ith population

- $\bar{y}_{i.}$ denotes the sample average from the ith population

- $\bar{y}_{..}$ denotes the sample average of all observations


A Sum-of-Squares Identity

$$\sum_{i=1}^{t}\sum_{j=1}^{n}(y_{ij}-\bar{y}_{..})^2 = n\sum_{i=1}^{t}(\bar{y}_{i.}-\bar{y}_{..})^2 + \sum_{i=1}^{t}\sum_{j=1}^{n}(y_{ij}-\bar{y}_{i.})^2$$

Note: This is for the case in which all sample sizes are equal (n)

The 3 sums of squares measure:

- variability between samples

- variability within samples

- total variability

Question: Which measures what?


$$\sum_{i=1}^{t}\sum_{j=1}^{n}(y_{ij}-\bar{y}_{..})^2 = n\sum_{i=1}^{t}(\bar{y}_{i.}-\bar{y}_{..})^2 + \sum_{i=1}^{t}\sum_{j=1}^{n}(y_{ij}-\bar{y}_{i.})^2$$

Notation: TSS = SSB + SSW

where

TSS (total SS) = total sample variability

SSB (SS between samples) = variability due to factor effects

SSW (within sample SS) = variability due to uncontrolled error

In words: Total SS = SS between samples + within sample SS

Note: Formula for unequal sample sizes given on page 388
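The decomposition TSS = SSB + SSW follows from splitting each deviation from the overall average into a within-sample part and a between-sample part. The derivation below is the standard argument for the equal-sample-size case; it is not taken from the slides.

```latex
\begin{align*}
y_{ij}-\bar{y}_{..} &= (y_{ij}-\bar{y}_{i.}) + (\bar{y}_{i.}-\bar{y}_{..}) \\[4pt]
\sum_{i=1}^{t}\sum_{j=1}^{n}(y_{ij}-\bar{y}_{..})^2
  &= \sum_{i=1}^{t}\sum_{j=1}^{n}(y_{ij}-\bar{y}_{i.})^2
   + n\sum_{i=1}^{t}(\bar{y}_{i.}-\bar{y}_{..})^2
   + 2\sum_{i=1}^{t}(\bar{y}_{i.}-\bar{y}_{..})\sum_{j=1}^{n}(y_{ij}-\bar{y}_{i.}) \\[4pt]
\sum_{j=1}^{n}(y_{ij}-\bar{y}_{i.}) &= 0 \quad \text{for every } i
\end{align*}
```

Because deviations from a sample's own average sum to zero, the cross-product term vanishes and only the within-sample and between-sample sums of squares remain.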


Pop 1 5 5 5 5

Pop 2 9 9 9 9

Pop 3 7 7 7 7

What is $SSB = n\sum_{i=1}^{t}(\bar{y}_{i.}-\bar{y}_{..})^2$ ?

What is $SSW = \sum_{i=1}^{t}\sum_{j=1}^{n}(y_{ij}-\bar{y}_{i.})^2$ ?


Pop 1 4 8 3 9

Pop 2 6 10 2 6

Pop 3 5 8 7 4

What is $SSB = n\sum_{i=1}^{t}(\bar{y}_{i.}-\bar{y}_{..})^2$ ?

What is $SSW = \sum_{i=1}^{t}\sum_{j=1}^{n}(y_{ij}-\bar{y}_{i.})^2$ ?
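The two small data sets can be worked by hand, but a quick check is easy to script. The following is a minimal sketch, assuming equal sample sizes as on the slides; the function name `ssb_ssw` and the variable names are illustrative.

```python
from statistics import mean

def ssb_ssw(groups):
    """Return (SSB, SSW) for a list of equal-sized samples."""
    n = len(groups[0])
    group_means = [mean(g) for g in groups]
    grand_mean = mean(y for g in groups for y in g)
    # Between-sample SS: spread of the sample averages around the overall average.
    ssb = n * sum((m - grand_mean) ** 2 for m in group_means)
    # Within-sample SS: spread of each observation around its own sample average.
    ssw = sum((y - m) ** 2 for g, m in zip(groups, group_means) for y in g)
    return ssb, ssw

# First data set: every observation equals its sample average.
constant_within = [[5, 5, 5, 5], [9, 9, 9, 9], [7, 7, 7, 7]]
# Second data set: all three sample averages are equal.
equal_means = [[4, 8, 3, 9], [6, 10, 2, 6], [5, 8, 7, 4]]

print("SSB, SSW =", ssb_ssw(constant_within))
print("SSB, SSW =", ssb_ssw(equal_means))
```

In the first data set all of the variability is between samples (SSW is zero); in the second all of it is within samples (SSB is zero).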


To show significance, we want the difference between groups (i.e. $\bar{y}_1 - \bar{y}_2$) to be large compared to the variability within groups (as measured by $s_p\sqrt{\frac{1}{n_1}+\frac{1}{n_2}}$)

Recall: For the 2-sample t-test, we tested $H_0: \mu_1 = \mu_2$ using

$$t = \frac{\bar{y}_1 - \bar{y}_2}{s_p\sqrt{\frac{1}{n_1}+\frac{1}{n_2}}}$$


Note: Our test statistic for testing

$H_0: \mu_1 = \mu_2 = \cdots = \mu_t$

$H_a$: the means are not all equal

will be of the form

$$F = \frac{SSB/(t-1)}{SSW/(tn-t)}$$

This has an F distribution with $t-1$ and $tn-t$ df when $H_0$ is true

Question: What type of F values lead you to believe the null is NOT TRUE?


Analysis of Variance Table

Note:

$n_T = tn$ if sample sizes are equal

$n_T = n_1 + n_2 + \cdots + n_t$ otherwise

We reject $H_0$ at significance level $\alpha$ if

$$F = \frac{s_B^2}{s_W^2} > F_\alpha(t-1,\ n_T-t)$$


Note:

$s_W^2$ is a generalization of $s_p^2$
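To see the connection, write the within-sample sum of squares in terms of the individual sample variances $s_i^2$ and set t = 2. This is a short worked equation added for illustration, using the standard definition of the pooled variance $s_p^2$ from the 2-sample t-test.

```latex
\begin{align*}
s_W^2 &= \frac{SSW}{n_T - t}
       = \frac{\sum_{i=1}^{t}\sum_{j=1}^{n_i}(y_{ij}-\bar{y}_{i.})^2}{n_T - t}
       = \frac{\sum_{i=1}^{t}(n_i-1)\,s_i^2}{n_T - t} \\[4pt]
t = 2:\quad s_W^2 &= \frac{(n_1-1)\,s_1^2 + (n_2-1)\,s_2^2}{n_1+n_2-2} = s_p^2
\end{align*}
```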


CAR DATA Example

For this analysis, 5 gasoline types (A - E) were to be tested. Twenty cars were selected for testing and were assigned randomly to the groups (i.e. the gasoline types). Thus, in the analysis, each gasoline type was tested on 4 cars. A performance-based octane reading was obtained for each car, and the question is whether the gasolines differ with respect to this octane reading.

Gasoline    A        B        C        D        E
            91.7     91.7     92.4     91.8     93.1
            91.2     91.9     91.2     92.2     92.9
            90.9     90.9     91.6     92.0     92.4
            90.6     90.9     91.0     91.4     92.4
means       91.10    91.35    91.55    91.85    92.70


ANOVA Table Output - car data

Source             SS       df    MS       F       p-value
Between samples    6.108     4    1.527    6.80    0.0025
Within samples     3.370    15    0.225
Totals             9.478    19
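The sums of squares, mean squares, and F value in this table can be reproduced directly from the octane readings. The sketch below is an illustration rather than the course's own code; the function name `one_factor_anova` and the dictionary keys are made up for this example. It does not reproduce the p-value, since the Python standard library has no F distribution.

```python
from statistics import mean

# Octane readings from the CAR DATA example, one list per gasoline type.
car_data = {
    "A": [91.7, 91.2, 90.9, 90.6],
    "B": [91.7, 91.9, 90.9, 90.9],
    "C": [92.4, 91.2, 91.6, 91.0],
    "D": [91.8, 92.2, 92.0, 91.4],
    "E": [93.1, 92.9, 92.4, 92.4],
}

def one_factor_anova(groups):
    """Compute the entries of a 1-factor ANOVA table from a list of samples."""
    observations = [y for g in groups for y in g]
    grand_mean = mean(observations)
    t, n_total = len(groups), len(observations)
    group_means = [mean(g) for g in groups]
    # Weighting by len(g) also covers unequal sample sizes.
    ssb = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    ssw = sum((y - m) ** 2 for g, m in zip(groups, group_means) for y in g)
    tss = sum((y - grand_mean) ** 2 for y in observations)
    msb, msw = ssb / (t - 1), ssw / (n_total - t)
    return {"SSB": ssb, "SSW": ssw, "TSS": tss,
            "df_between": t - 1, "df_within": n_total - t,
            "MSB": msb, "MSW": msw, "F": msb / msw}

table = one_factor_anova(list(car_data.values()))
for name, value in table.items():
    print(f"{name:>10}: {value:.3f}")
```

TSS is computed separately from SSB and SSW here, so comparing it with their sum gives a numerical check of the sum-of-squares identity.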


F-table -- p.1106


Extracted from Ex. 8.2, pages 390-391

3 Methods for Reducing Hostility

12 students displaying similar hostility were randomly assigned to 3 treatment methods. Scores (HLT) at end of study recorded.

Method 1 96 79 91 85

Method 2 77 76 74 73

Method 3 66 73 69 66

Test: $H_0: \mu_1 = \mu_2 = \mu_3$


ANOVA Table Output - hostility data

Source             SS    df    MS    F    p-value
Between samples
Within samples
Totals


SPSS ANOVA Table for Hostility Data


ANOVA Models

Consider the random sample $y_1, y_2, \ldots, y_n$

Example: $y_1 = 5.5,\ y_2 = 3.8,\ y_3 = 6.0$, etc.

Population has mean $\mu$.

Note: If $y_1, y_2, \ldots, y_n$ is a sample from a population that is normal with mean $\mu$ and variance $\sigma^2$, then we can write

$$y_i = \mu + \varepsilon_i, \quad i = 1, \ldots, n$$


For 1-factor ANOVA

We can write

$y_{11} = \mu_1 + \varepsilon_{11}$

$y_{12} = \mu_1 + \varepsilon_{12}$

. . .

$y_{21} =$ ______

$y_{42} =$ ______


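Alternative form of the 1-Factor ANOVA Model

(pages 394-395)

Starting from $y_{ij} = \mu_i + \varepsilon_{ij}$, let

$$\mu = \frac{1}{t}\sum_{i=1}^{t}\mu_i, \qquad \alpha_i = \mu_i - \mu$$

General Form of Model: $y_{ij} = \mu + \alpha_i + \varepsilon_{ij}$

Note: $\sum_{i=1}^{t}\alpha_i = 0$

$\varepsilon_{ij}$'s are NID$(0, \sigma^2)$

- random errors follow a Normal distribution, are independently distributed, and have zero mean and constant variance

-- i.e. variability does not change from group to group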


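Testing the hypotheses:

$H_0: \mu_1 = \mu_2 = \cdots = \mu_t$

$H_a$: at least 2 means are unequal

is equivalent to testing the hypotheses:

$H_0: \alpha_1 = \alpha_2 = \cdots = \alpha_t = 0$

$H_a$: at least one $\alpha_i \ne 0$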


Analysis of Variance Table

Recall: We reject $H_0$ at significance level $\alpha$ if

$$F = \frac{s_B^2}{s_W^2} > F_\alpha(t-1,\ n_T-t)$$

Note:

- if no factor effects, we expect F _____

- if factor effects, we expect F _____


The CAR data set as SAS needs to see it:

A 91.7
A 91.2
A 90.9
A 90.6
B 91.7
B 91.9
B 90.9
B 90.9
C 92.4
C 91.2
C 91.6
C 91.0
D 91.8
D 92.2
D 92.0
D 91.4
E 93.1
E 92.9
E 92.4
E 92.4


SAS file for CAR data

Case 1: Data within SAS FILE:

    DATA one;
    INPUT gas$ octane;
    DATALINES;
    A 91.7
    A 91.2
      . . .
    E 92.4
    E 92.4
    ;
    PROC GLM;
      CLASS gas;
      MODEL octane=gas;
      TITLE 'Gasoline Example - Completely Randomized Design';
      MEANS gas;
    RUN;
    PROC MEANS mean var;
    RUN;
    PROC MEANS mean var;
      class gas;
    RUN;


The SAS Output for CAR data:

Gasoline Example - Completely Randomized Design

General Linear Models Procedure

Dependent Variable: OCTANE

                          Sum of         Mean
Source              DF    Squares        Square        F Value    Pr > F
Model                4    6.10800000     1.52700000    6.80       0.0025
Error               15    3.37000000     0.22466667
Corrected Total     19    9.47800000

R-Square    C.V.        Root MSE     OCTANE Mean
0.644440    0.516836    0.4739902    91.710000

Source    DF    Type I SS     Mean Square    F Value    Pr > F
GAS        4    6.10800000    1.52700000     6.80       0.0025
GAS        4    6.10800000    1.52700000     6.80       0.0025

Textbook Format for ANOVA Table Output - car data

Source             SS       df    MS       F       p-value
Between samples    6.108     4    1.527    6.80    0.0025
Within samples     3.370    15    0.225
Totals             9.478    19


Problem 1. Descriptive Statistics for CAR Data

The MEANS Procedure

Analysis Variable : octane

Mean          Std Dev      Minimum       Maximum
---------------------------------------------------
91.7100000    0.7062876    90.6000000    93.1000000
---------------------------------------------------


Problem 3. Descriptive Statistics by Gasoline

------------------------------------ gas=A -------------------------------------

The MEANS Procedure

Analysis Variable : octane

Mean          Std Dev      Minimum       Maximum
---------------------------------------------------
91.1000000    0.4690416    90.6000000    91.7000000
---------------------------------------------------

------------------------------------ gas=B -------------------------------------

Analysis Variable : octane

Mean          Std Dev      Minimum       Maximum
---------------------------------------------------
91.3500000    0.5259911    90.9000000    91.9000000
---------------------------------------------------

------------------------------------ gas=C -------------------------------------

Analysis Variable : octane

Mean          Std Dev      Minimum       Maximum
---------------------------------------------------
91.5500000    0.6191392    91.0000000    92.4000000
---------------------------------------------------

------------------------------------ gas=D -------------------------------------

Analysis Variable : octane

Mean          Std Dev      Minimum       Maximum
---------------------------------------------------
91.8500000    0.3415650    91.4000000    92.2000000
---------------------------------------------------

------------------------------------ gas=E -------------------------------------

The MEANS Procedure

Analysis Variable : octane

Mean          Std Dev      Minimum       Maximum
---------------------------------------------------
92.7000000    0.3559026    92.4000000    93.1000000
---------------------------------------------------


Question 1: Which gasolines are different?

Question 2: Why didn’t we just do t-tests to compare all combinations of gasolines?

i.e. compare

A vs B

A vs C

. . .

D vs E


Simulation:

i.e. using a computer to generate data under certain known conditions and observing the outcomes


Setting:

Normal population with: and

Simulation Experiment: Generate 2 samples of size n = 10 from this population and run a t-test to compare the sample means.

i.e. test:

$H_0: \mu_1 = \mu_2$

$H_a: \mu_1 \ne \mu_2$

Question:

What do we expect to happen?


t-test procedure:

Reject H0 if | t | > 2.101

Simulation Results:

Sample    $\bar{y}$    $s$
1         21.6         4.0
2         21.1         5.4

t = .235 so we do not reject H0

(which is what we expected)


Now - suppose we obtain 10 samples and test $H_0: \mu_1 = \mu_2 = \cdots = \mu_{10}$ by doing all possible t-tests?

Simulation results:

Sample    $\bar{y}$    $s$
1         21.6         4.0
2         21.1         5.4
3         20.9         6.2
4         18.3         3.2
5         23.1         6.7
6         18.6         4.8
7         22.2         5.8
8         19.1         5.9
9         20.3         2.5
10        19.3         3.2

Note: Comparing means 4 vs 5 we get t = 2.33

-- i.e. we reject the null (but it’s true!!)


Suppose we run all possible t-tests at significance level $\alpha = .05$ to compare 10 sample means, each based on a sample of size n = 10 from this population

- it can be shown that there is a 63% chance that at least one pair of means will be declared significantly different from each other

The F-test in ANOVA controls the overall significance level.


Probability of finding at least 2 of k means significantly different using multiple t-tests at the $\alpha = .05$ level when all means are actually equal.

k      Prob.
2      .05
3      .13
4      .21
5      .29
10     .63
20     .92
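The growth of this error rate with k can be explored with a small simulation in the spirit of the experiment described earlier. The sketch below is an illustration, not the course's own simulation: the population mean 20 and standard deviation 5 are assumed values (the t statistic does not depend on them when all means are equal), each pair is compared with its own pooled t-test at the 2.101 cutoff, and the function names are made up. The estimates are Monte Carlo approximations and need not match the tabled probabilities exactly.

```python
import random
from itertools import combinations
from math import sqrt

T_CRIT = 2.101  # two-sided 5% cutoff for 18 df, as used on the earlier slide

def mean_var(xs):
    """Sample mean and sample variance (n - 1 divisor)."""
    m = sum(xs) / len(xs)
    return m, sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

def any_pair_rejected(k, n, rng, mu=20.0, sigma=5.0):
    """Draw k samples of size n from ONE normal population and report
    whether any pairwise pooled t-test exceeds the cutoff."""
    stats = [mean_var([rng.gauss(mu, sigma) for _ in range(n)]) for _ in range(k)]
    for (m1, v1), (m2, v2) in combinations(stats, 2):
        sp2 = (v1 + v2) / 2  # pooled variance when both samples have size n
        if abs(m1 - m2) / sqrt(sp2 * 2 / n) > T_CRIT:
            return True
    return False

def familywise_error_rate(k, n=10, reps=2000, seed=1):
    """Estimate the chance of at least one false rejection among all pairs."""
    rng = random.Random(seed)
    return sum(any_pair_rejected(k, n, rng) for _ in range(reps)) / reps

rates = {k: familywise_error_rate(k) for k in (2, 3, 4, 5, 10)}
for k, rate in rates.items():
    print(f"k = {k:2d}: estimated probability = {rate:.3f}")
```

With k = 2 there is a single test, so the estimate should sit near .05; the number of pairwise comparisons grows as k(k-1)/2, which is what drives the error rate upward.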


Fisher’s Least Significant Difference (LSD)

Protected LSD: Preceded by an F-test for overall significance.

$\bar{y}_1$ and $\bar{y}_2$ are significantly different if

$$|\bar{y}_1 - \bar{y}_2| \ge \text{LSD}$$

where

$$\text{LSD} = t_{\alpha/2}\sqrt{s_W^2\left(\frac{1}{n_1}+\frac{1}{n_2}\right)}$$

and $t_{\alpha/2}$ is based on the within (error) df

Unprotected: Not preceded by an F-test (like individual t-tests).

Only use the LSD if F is significant.


Gasoline Example - Completely Randomized Design -- All 5 Gasolines

The GLM Procedure

Dependent Variable: octane

                          Sum of
Source              DF    Squares        Mean Square    F Value    Pr > F
Model                4    6.10800000     1.52700000     6.80       0.0025
Error               15    3.37000000     0.22466667
Corrected Total     19    9.47800000

R-Square    Coeff Var    Root MSE    octane Mean
0.644440    0.516836     0.473990    91.71000

Source    DF    Type I SS     Mean Square    F Value    Pr > F
gas        4    6.10800000    1.52700000     6.80       0.0025
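Since the overall F-test is significant here (p = 0.0025), the protected LSD can be applied to the gasoline means. The sketch below is an illustration under stated assumptions: it takes $\alpha = .05$, uses the error mean square 0.22466667 with 15 df from the GLM output as $s_W^2$, and hard-codes the tabled critical value $t_{.025} = 2.131$ for 15 df because the Python standard library has no t quantile function. Variable names are illustrative.

```python
from itertools import combinations
from math import sqrt

# Sample means from the CAR DATA example (4 cars per gasoline).
means = {"A": 91.10, "B": 91.35, "C": 91.55, "D": 91.85, "E": 92.70}
n = 4
mse = 0.22466667  # s_W^2: Error mean square from the GLM output (15 df)
t_crit = 2.131    # assumed alpha = .05: t_{.025} with 15 df, from a t table

# Fisher's LSD for two samples of equal size n.
lsd = t_crit * sqrt(mse * (1 / n + 1 / n))

# Pairs whose sample means differ by at least the LSD.
significant = [(a, b) for a, b in combinations(means, 2)
               if abs(means[a] - means[b]) >= lsd]

print(f"LSD = {lsd:.3f}")
for a, b in significant:
    print(f"{a} vs {b}: difference = {abs(means[a] - means[b]):.2f}")
```

Because every gasoline has the same sample size, one LSD value serves for all ten pairwise comparisons; with unequal sample sizes the LSD would have to be recomputed for each pair.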