Lecture 7: POWER IN STATISTICAL TESTING



The caffeine experiment

In the Caffeine experiment, there were two groups:

1. the CAFFEINE group;

2. the PLACEBO group.


Statistical summary


Statistics and parameters

• We use Roman letters to denote STATISTICS, that is, characteristics of the sample; we use Greek letters to denote PARAMETERS, that is, characteristics of the population.


The null hypothesis

The null hypothesis (H0) states that, in the population, the Caffeine and Placebo means have the same value.

• H0: μ1 = μ2


The alternative hypothesis

The alternative hypothesis (H1) states that the Caffeine and Placebo means are NOT equal.

• H1: μ1 ≠ μ2


The test statistic

• To test the null hypothesis, we shall need a TEST STATISTIC, in this case one which reflects the size of the difference between the sample means M1 and M2.

• The t statistic fits the bill.


The value of t

• We do not know the population variance σ², which is assumed to be the same in both populations.

• Our pooled estimate of σ² is s² = 10.37.


Sampling distribution of t

• There are many t distributions.

• To test H0, we must locate our value of t in the appropriate t distribution.

• That distribution is specified by the DEGREES OF FREEDOM df, which is given by df = n1 + n2 – 2. In our example, df = 20 + 20 – 2 = 38.
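The calculation can be sketched in a few lines of Python. This is only an illustration: it assumes the summary statistics reported later in the lecture (Caffeine M = 11.90, SD = 3.28; Placebo M = 9.25, SD = 3.16; 20 participants per group), and the function name `pooled_t` is ours rather than part of any package.

```python
import math

def pooled_t(m1, sd1, n1, m2, sd2, n2):
    """Independent-samples t from summary statistics (equal variances assumed)."""
    df = n1 + n2 - 2
    # Pooled variance: the two sample variances weighted by their degrees of freedom.
    s2 = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    # Standard error of the difference between the two means.
    se = math.sqrt(s2 * (1 / n1 + 1 / n2))
    return (m1 - m2) / se, df, s2

t, df, s2 = pooled_t(11.90, 3.28, 20, 9.25, 3.16, 20)
print(f"s2 = {s2:.2f}, t({df}) = {t:.2f}")  # prints: s2 = 10.37, t(38) = 2.60
```

With equal group sizes the weighted formula reduces to the simple mean of the two variance estimates.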


Appearance of a t distribution

• A t distribution is very like the standard normal distribution.

• They are both centred on zero.

• The greater the difference between the two means, the further out in either tail your value of t will be.

• If the null hypothesis is true, the value of (M1 – M2) and therefore that of t will usually be small.

• A large value of t is taken to be evidence against the null hypothesis.

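[Figure: the standard normal distribution, Z ~ N(0, 1), and the t distribution on 2 degrees of freedom, t(2), both centred on 0, each with its central .95 area marked.]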


Significance

• To make a test of SIGNIFICANCE, we locate a CRITICAL region of values in the t distribution with 38 degrees of freedom.

• A SIGNIFICANCE LEVEL or ALPHA-LEVEL is a small probability α, such as 0.05 or 0.01, fixed by convention. In psychology, the 0.05 level is generally accepted.

• We locate the critical region in the tails of the t distribution, so that the probability of a value in EITHER one tail OR the other is α. In our case, α = 0.05. The probability of a value in one PARTICULAR tail is α/2 = 0.025.


The critical region

• We shall reject the null hypothesis if the value of t falls within EITHER tail of the t distribution on 38 degrees of freedom.

• To be significant beyond the .05 level, our value of t must be greater than +2.02 OR less than –2.02.

• (To be significant beyond the .01 level, our value of t must be greater than +2.71 or less than –2.71.)
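Critical values are normally read from tables or software. As a rough check, they can also be approximated with the Python standard library alone. The sketch below integrates the t density numerically (Simpson's rule) and finds the critical value by bisection; the function names are ours, and the step counts are arbitrary choices.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)

def t_cdf(x, df, steps=2000):
    """P(T <= x): Simpson's rule on [0, |x|], then symmetry about zero."""
    a = abs(x)
    h = a / steps
    total = t_pdf(0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return 0.5 + area if x >= 0 else 0.5 - area

def t_critical(alpha, df, tails=2):
    """Upper critical value: the point leaving alpha/tails in the upper tail."""
    target = 1 - alpha / tails
    lo, hi = 0.0, 50.0
    for _ in range(60):  # bisection on the CDF
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

for alpha in (0.05, 0.01):
    print(f"alpha = {alpha}: critical t(38) = +/-{t_critical(alpha, 38):.2f}")
```

Calling `t_critical(0.05, 38, tails=1)` puts the whole of α in one tail, which is the situation discussed under one-tail tests.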


The p-value

• The TOTAL blue area is the probability, under the null hypothesis, of getting a value of t at least as extreme as your value.

• If the p-value is less than .05, you are in the critical region, and your value of t is significant beyond the .05 level.

[Figure: t distribution centred on 0 with both tails shaded blue; one tail is labelled ‘Pr of a value at least as small as yours’.]


The sign of t

• The sign of t is IRRELEVANT.

• If t is negative, it’s in the lower tail; if it’s positive, it’s in the upper tail.

• In either case, the p-value is the TOTAL blue area, because an extreme value in EITHER direction is evidence against the null hypothesis.


Direction of subtraction in the t test

• The value of t is based upon the difference between means M1 – M2.

• If the first mean is larger than the second, the difference (and t) will be positive.

• When you complete the Define Groups dialog in the t-test procedure, the mean of the group entered second will be subtracted from the mean of the group entered first.

• We entered the values in the order: 1, 0, ensuring that the Placebo mean would be subtracted from the Caffeine mean.


Avoiding the negative sign

• By entering the Caffeine mean first, we ensure that the sign of t is positive.

• Had the Placebo mean been larger, the sign of t could still have been kept positive by entering that mean first when defining the groups in the t-test dialog.

• That’s fine – a difference in EITHER direction can lead to rejection of the null hypothesis.


Result of the t test

• The p-value for t = 2.60 is .01 (to 2 places of decimals).

• Our t test has shown significance beyond the .05 level.

• But the unrounded p-value (about .013) is greater than .01, so the result, although significant beyond the .05 level, is not significant beyond the .01 level.
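The two-sided p-value is the area in both tails beyond ±t. A minimal sketch of that calculation, using numerical integration of the t density rather than a statistics package (the function names are ours):

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)

def two_tailed_p(t, df, steps=2000):
    """P(|T| >= |t|) under H0: 1 minus the central area between -|t| and +|t|."""
    a = abs(t)
    h = a / steps
    total = t_pdf(0, df) + t_pdf(a, df)
    for i in range(1, steps):  # Simpson's rule on [0, |t|]
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central = 2 * total * h / 3  # the density is symmetric about zero
    return 1 - central

p = two_tailed_p(2.60, 38)
print(f"p = {p:.3f}")
print("beyond .05 level:", p < 0.05, "| beyond .01 level:", p < 0.01)
```

Because the function works on |t|, the sign of t makes no difference to the result.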


Your report

“The scores of the Caffeine group (M = 11.90; SD = 3.28) were higher than those of the Placebo group (M = 9.25; SD = 3.16). With an alpha-level of 0.05, the difference is significant: t(38) = 2.60; p = .01.”

• In t(38), the figure in brackets is the degrees of freedom.

• 2.60 is the value of t.

• The p-value is expressed to two places of decimals.


Type I error

• Suppose the null hypothesis is true.

• If you keep sampling a large number of times, every now and again (in 5% of samples), you will get a value of t in one of the tail areas (the critical region) and reject the null hypothesis.

• You will have made a TYPE I ERROR.

• The probability of a Type I error is the significance level, which is also termed the ALPHA-LEVEL.
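The idea of 'sampling a large number of times' can be acted out by simulation. In the sketch below both groups are drawn from the same normal population, so the null hypothesis is true by construction. The population mean (10) and SD (3), the number of repetitions and the seed are arbitrary assumptions made for illustration.

```python
import math
import random

def t_stat(x, y):
    """Pooled-variance t for two independent samples."""
    n1, n2 = len(x), len(y)
    m1, m2 = sum(x) / n1, sum(y) / n2
    ss = sum((v - m1) ** 2 for v in x) + sum((v - m2) ** 2 for v in y)
    s2 = ss / (n1 + n2 - 2)
    return (m1 - m2) / math.sqrt(s2 * (1 / n1 + 1 / n2))

def type_i_rate(n=20, crit=2.02, reps=5000, seed=1):
    """Share of simulated experiments that reject H0 although H0 is true."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(reps):
        # Both groups come from the SAME population, so any rejection is an error.
        caffeine = [rng.gauss(10, 3) for _ in range(n)]
        placebo = [rng.gauss(10, 3) for _ in range(n)]
        if abs(t_stat(caffeine, placebo)) > crit:
            rejections += 1
    return rejections / reps

rate = type_i_rate()
print(f"Proportion of Type I errors: {rate:.3f}")
```

The proportion should settle near the alpha-level of 0.05, give or take sampling error in the simulation itself.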


Type II error

• A Type II error is said to occur when a test fails to show significance even though the null hypothesis is FALSE.

• The probability of a Type II error is symbolised as β, and is also known as the TYPE II ERROR RATE or BETA RATE.


The beta rate

• The light-grey area is part of the critical region.

• Any value of t outside the critical region in the tail of the H0 distribution is not significant.

• The dark area is the probability that the null hypothesis will be accepted, even though it is false.

• This is the BETA RATE.

[Figure: H0 distribution centred on 0 and H1 distribution centred on the real difference; tail area α/2 marked.]


Power

The POWER of a statistical test is the probability that the null hypothesis will be rejected, given that it is FALSE.


Power

• The power of the statistical test is the area under the H1 distribution to the right of the dark-grey area.

• This is the POWER of the test.

[Figure: H0 distribution centred on 0 and H1 distribution centred on the real difference; tail area α/2 marked.]


Power and the beta rate

Since the entire area under either curve is 1, the area in the H1 distribution to the right of the dark area (the power) is 1 – β.

[Figure: H0 distribution centred on 0 and H1 distribution centred on the real difference; tail area α/2 marked.]


Increasing the power

By REDUCING the beta-rate, you INCREASE the power of the test, because beta-rate + power = 1.
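The relation beta-rate + power = 1 can be seen by simulating experiments in which the null hypothesis is false. The sketch below assumes a true difference of 0.8 SD, 20 participants per group and the two-tailed critical value of 2.02; scores are in SD units, and the repetition count and seed are arbitrary.

```python
import math
import random

def t_stat(x, y):
    """Pooled-variance t for two independent samples."""
    n1, n2 = len(x), len(y)
    m1, m2 = sum(x) / n1, sum(y) / n2
    ss = sum((v - m1) ** 2 for v in x) + sum((v - m2) ** 2 for v in y)
    s2 = ss / (n1 + n2 - 2)
    return (m1 - m2) / math.sqrt(s2 * (1 / n1 + 1 / n2))

def simulate_power(d=0.8, n=20, crit=2.02, reps=4000, seed=7):
    """Estimate (beta, power) when the population means really differ by d SDs."""
    rng = random.Random(seed)
    misses = 0
    for _ in range(reps):
        caffeine = [rng.gauss(d, 1) for _ in range(n)]  # H1 is true by construction
        placebo = [rng.gauss(0, 1) for _ in range(n)]
        if abs(t_stat(caffeine, placebo)) <= crit:
            misses += 1  # H0 retained although false: a Type II error
    beta = misses / reps
    return beta, 1 - beta

beta, power = simulate_power()
print(f"beta = {beta:.2f}, power = {power:.2f}")
```

Every simulated experiment either rejects H0 or fails to, so the two proportions necessarily sum to 1.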


Type I and Type II errors: power

Type I error: Pr = α

Type II error: Pr = β


A change in emphasis

• Traditionally, the emphasis has been on controlling Type I errors.

• This is achieved by insisting upon statistical significance.

• Even more control would be achieved by adopting the 0.01 significance level, rather than the 0.05 level.

• So why not fix alpha at 0.01, rather than 0.05?


Significance level and the beta-rate

• Suppose you decide upon a smaller significance level (lower figure).

• The probability of a type II error (green) increases.

• The power (P) decreases.

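[Figure: two panels, each marking the Type II error probability β and the power P; the lower panel shows the smaller significance level.]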


The need to strike a balance

• Adopting the 0.01 level reduces the Type I error rate.

• But that INCREASES the Type II error rate and REDUCES the power of the test.

• It is now considered that, in the past, there was insufficient concern with high beta-rates and low power.

• The 0.05 level is thought to achieve the right BALANCE between Type I and Type II errors.


Power and sample size

• Increasing the sample size reduces the overlap of the sampling distributions under H0 and H1.

• The beta-rate is reduced and so the power increases.

[Figure: H0 distribution centred on 0 and H1 distribution centred on the real difference, drawn for small samples and for large samples.]


Small samples

• Small samples bring the distributions together, increase the size of the dark-grey area and reduce the power of the test.

• Large samples separate the distributions, reduce the dark-grey area and increase the power of the test.

[Figure: H0 distribution centred on 0 and H1 distribution centred on the real difference; tail area α/2 marked.]


Factors affecting the Type II error rate

• An insufficiency of data (too few participants) means that the sampling distributions under H0 and under H1 overlap too much and most of the H1 distribution lies below the critical value for rejection of H0.

• A similar effect arises from unreliable data, which inflate random variation and increase the overlap.

• Finally, as we have seen, fixing a lower significance level (0.01 rather than 0.05) also increases the beta-rate.
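These three factors can be compared numerically. The sketch below uses a normal approximation to the power of a two-tailed, two-group test; it is not the exact calculation (which involves the noncentral t distribution), and the difference of 1.5 points, the SDs and the group sizes are hypothetical values chosen only to show the direction of each effect.

```python
import math
from statistics import NormalDist

Z = NormalDist()  # standard normal distribution

def approx_power(diff, sd, n_per_group, alpha=0.05):
    """Approximate two-tailed power for two independent groups of equal size."""
    # Expected size of the test statistic when the real difference is `diff`.
    delta = diff / (sd * math.sqrt(2 / n_per_group))
    z_crit = Z.inv_cdf(1 - alpha / 2)
    # Probability of landing in either tail of the critical region under H1.
    return Z.cdf(delta - z_crit) + Z.cdf(-delta - z_crit)

baseline = approx_power(diff=1.5, sd=3.0, n_per_group=20)
more_data = approx_power(diff=1.5, sd=3.0, n_per_group=80)
noisier = approx_power(diff=1.5, sd=4.5, n_per_group=20)
stricter = approx_power(diff=1.5, sd=3.0, n_per_group=20, alpha=0.01)

for label, value in [("baseline", baseline), ("more participants", more_data),
                     ("noisier data", noisier), ("alpha = 0.01", stricter)]:
    print(f"{label:>18}: power = {value:.2f}, beta = {1 - value:.2f}")
```

More participants raise the power; noisier data and a stricter alpha both lower it.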


How much power do we need?

• Cohen (1988) observed a general tendency for psychologists to be preoccupied with avoiding Type I errors and insufficiently concerned with the possibility of Type II errors.

• Most tests had insufficient power. Power levels as low as 0.05 were not uncommon.

• A MINIMUM POWER OF 0.75 IS RECOMMENDED.


Effect size in the t-test

• We obtained a difference between the Caffeine and Placebo means of (11.90 – 9.25) = 2.65 score points.

• If we take the spread of the scores to be the average of the Caffeine and Placebo SDs, we have an average SD of about 3.25 score points.

• So the means of the Caffeine and Placebo groups differ by about 0.8 SD.


Measuring effect size: Cohen’s d statistic

• In our example, the value of Cohen’s d is 2.65/3.25 ≈ .8.

• Is this a ‘large’ difference?


Levels of effect size

• Jacob Cohen proposed that we regard a d of .2 as a SMALL effect size, a d of .5 as a MEDIUM effect size and a d of .8 as a LARGE effect size.

• So our experimental result is a ‘large’ effect.
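The effect-size calculation can be sketched as follows. The sketch takes the spread to be the average of the two SDs, as in the lecture, and treats Cohen's three benchmarks as lower bounds for each label; that cut-off rule is our assumption, since the benchmarks are guidelines rather than strict boundaries.

```python
def cohens_d(m1, sd1, m2, sd2):
    """Difference between the means in units of the average of the two SDs."""
    return (m1 - m2) / ((sd1 + sd2) / 2)

def cohen_label(d):
    """Label an effect size using .2, .5 and .8 as lower bounds (an assumption)."""
    size = abs(d)
    if size >= 0.8:
        return "large"
    if size >= 0.5:
        return "medium"
    if size >= 0.2:
        return "small"
    return "below small"

d = cohens_d(11.90, 3.28, 9.25, 3.16)
print(f"d = {d:.2f} ({cohen_label(d)})")  # prints: d = 0.82 (large)
```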


How many participants?

• That depends upon the minimum effect size that you want to pick up with your significance tests.

• You also want to make sure your power is at least at the 0.75 level.

• You can obtain the number of participants necessary by looking up tables (Cohen, 1988; Clark-Carter, 2004).


Books with power tables

• Clark-Carter, D. (2004). Quantitative psychological research: a student’s handbook (2nd ed.). Hove: Psychology Press.

• Cohen, J. (1988). Statistical power analysis for the behavioral sciences (2nd ed.). Hillsdale, NJ: Lawrence Erlbaum.


Using the Web

• The Web is a very useful source of up-to-date information on all statistical topics, including power and effect size.

• Use the Google search engine.

• Their address is http://www.google.com

• Use the phrases ‘statistical power’ and ‘effect size’.


Software

• Software is available for finding the numbers of participants you will need to make a test at a specified level of power.

• An example is G*Power (Erdfelder et al., 1996).

• G*Power is available on the Web.


The Google window


Useful information on the Web


The G*Power opening window


Using G*Power

• You can choose the ‘a priori’ option to find out how many participants you would need to achieve a power of .95 of rejecting the null hypothesis for a medium effect on a one-tail test.

• You would need 176 participants!

• That’s why we usually settle for a power level of 0.75 rather than 0.95!
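For a rough idea of where such figures come from, the sample size can be approximated with a textbook normal-approximation formula. This is not how G*Power works (it uses the noncentral t distribution), so the answers can differ by a few participants; the function name and defaults are ours.

```python
import math
from statistics import NormalDist

def n_per_group(d, alpha=0.05, power=0.75, tails=1):
    """Approximate participants per group for a two-group comparison."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / tails)  # critical value of the test
    z_power = z.inv_cdf(power)              # quantile for the desired power
    return math.ceil(2 * ((z_alpha + z_power) / d) ** 2)

high = n_per_group(0.5, power=0.95)   # medium effect, one-tail test
usual = n_per_group(0.5, power=0.75)
print(f"power .95: {high} per group, {2 * high} in total")
print(f"power .75: {usual} per group, {2 * usual} in total")
```

Under these assumptions the sketch gives 87 per group (174 in all) for a power of .95, close to the 176 quoted above, and 44 per group for a power of .75.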

[Figure: the G*Power window, showing the values you fill in and the answers it returns.]


Directional hypotheses

• The null hypothesis states simply that, in the population, the Caffeine and Placebo means are equal.

• H0 is refuted by a sufficiently large difference between the means in EITHER direction.

• But some argue that if our scientific (alternative) hypothesis is that Caffeine improves performance, we should be looking at differences in only ONE direction.


One-tail tests

• Suppose we are only interested in the possibility of a difference in ONE DIRECTION.

• Some argue that we should locate the critical region for our test in one tail only of the t distribution.

• The critical value of t is now lower.

[Figure: two-tailed test, with 0.025 (2.5%) in each tail beyond –2.02 and +2.02; one-tailed test, with 0.05 (5%) in the upper tail beyond +1.69.]


Another result

• Suppose that the mean score of the Caffeine group had been, not 11.90, but 10.97.

• In other words, the Caffeine and Placebo means differ by only 1.72 points, rather than 2.65 points, as in the original example.

• The value of t is now 1.69, which is the critical value on a one-tail test.


Report of the one-tailed test

“The scores of the Caffeine group (M = 10.97; SD = 3.28) were significantly higher than those of the Placebo group (M = 9.25; SD = 3.16): t(38) = 1.69; p = .05 (one-tailed).”

• In t(38), the figure in brackets is the degrees of freedom.

• 1.69 is the value of t.

• The p-value is expressed to two places of decimals.


Advantage of the one-tailed test

• Our t value of 1.69 would have failed to achieve significance on the two-tailed test, since the critical value was +2.02.

• On the one-tailed test, however, t is in the critical region and the null hypothesis is rejected.


More power

• In locating the entire critical region in the upper tail of the H0 distribution, we increase the light-grey area and reduce the dark-grey area.

• In other words, we increase the POWER of the test to reject H0.

[Figure: H0 and H1 distributions separated by the real difference.]


An unexpected result

• Now suppose that, against expectation, the Placebo group had outperformed the Caffeine group.

• The mean for the Caffeine group is 9.25 and that for the Placebo group is 10.20.

• Were we to subtract the smaller mean from the larger (as in the traditional approach), we should obtain t = +2.02.

• On a two-tailed test, this is in the critical region (p < .05) and we should reject the null hypothesis.


The null hypothesis


Direction of subtraction

• The direction of subtraction is now crucial.

• You MUST subtract the Placebo mean from the Caffeine mean.

• Only a POSITIVE value of t can falsify the directional null hypothesis.


One-sided p-value

• In the one-tailed test, the p-value is ONE-SIDED, that is, it is the probability, under the one-sided null hypothesis, that the Caffeine mean would have been at least as LARGE as the one we obtained.


The one-sided p-value

• The one-sided p-value is the entire area under the curve to the right of your value of t. This includes the black area.

• That area is 0.975.

[Figure: t distribution with your value of t, –2.02, marked in the lower tail.]


Report of the one-tail test

“The scores of the Caffeine group (M = 9.25; SD = 3.16) were not significantly higher than those of the Placebo group (M = 10.20; SD = 3.28): t(38) = –2.02; p = 0.975 (one-tailed).”


Changing horses

• Oh dear! I haven’t got significance! I won’t get my paper published!

• I know: I’ll make a two-tail test instead.

• There we are, the result is significant!

• Problem solved!


The true Type I error rate.

• If you follow that practice, your true Type I error rate is now the black area (0.05) PLUS the area in the lower tail (0.025). This is 0.05 + 0.025 = 0.075, a level many feel is too high.
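The inflated error rate can be checked by simulating the 'changing horses' strategy when the null hypothesis is true: make the planned one-tail test and, if that fails, fall back on the two-tail test. The critical values 1.69 and 2.02 are those used in the lecture; the population values, repetition count and seed are arbitrary assumptions.

```python
import math
import random

def t_stat(x, y):
    """Pooled-variance t for two independent samples."""
    n1, n2 = len(x), len(y)
    m1, m2 = sum(x) / n1, sum(y) / n2
    ss = sum((v - m1) ** 2 for v in x) + sum((v - m2) ** 2 for v in y)
    s2 = ss / (n1 + n2 - 2)
    return (m1 - m2) / math.sqrt(s2 * (1 / n1 + 1 / n2))

def switching_rate(n=20, one_tail_crit=1.69, two_tail_crit=2.02,
                   reps=5000, seed=2):
    """Type I error rate when the tester switches tests after seeing the data."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        caffeine = [rng.gauss(10, 3) for _ in range(n)]  # H0 true: same population
        placebo = [rng.gauss(10, 3) for _ in range(n)]
        t = t_stat(caffeine, placebo)
        # Significant on the planned one-tail test, OR on the fallback two-tail test.
        if t > one_tail_crit or abs(t) > two_tail_crit:
            hits += 1
    return hits / reps

rate = switching_rate()
print(f"True Type I error rate: {rate:.3f}")
```

The simulated rate should come out near 0.075 rather than the nominal 0.05.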

[Figure: t distribution with an area of 0.025 marked in the lower tail.]


One-tailed tests?

• I have concerns about the making of one-tailed tests.

• There are two reasons for this.

1. You could find yourself unable to report a surprising result.

2. Many journal editors are suspicious of one-tail tests – and with good reason.


A special situation

• Neuropsychologists often want to know whether a score is so far BELOW the norm that there is evidence for brain damage.

• They want to establish a cut-off point below which a score is deemed to be indicative of impairment.

• A one-tail test is appropriate there.

• But I think one-tail tests are seldom justified in RESEARCH, as opposed to clinical practice.


Making your choice: some guidelines


Advice

• In Chapter 1 of Kinnear & Gray (2004, 2006), there is some advice on the choosing of an appropriate statistical test.

• Your choice depends upon three main considerations:

1. Your RESEARCH QUESTION

2. The plan or DESIGN of your research

3. The kind of DATA you have.


1. Five research questions


The first decision

• Are you comparing averages on the SAME VARIABLE over different conditions or groups?

OR

• Are you investigating a possible association between TWO measured variables?


The caffeine experiment


Making a comparison

• There is only ONE measured variable: the dependent variable, Performance.

• We want to COMPARE the AVERAGE scores of the two groups.


The violence study

• There are TWO variables, Actual violence and Exposure to screen violence.

• We are investigating a possible ASSOCIATION between them.


Comparison or association?

Comparing or associating?

• Comparing: t-tests, ANOVA

• Associating: correlation, regression


2. Design of the study

Two important aspects

1. COMPLEXITY: how many conditions or groups were there? How many measured variables?

2. SAMPLING STRATEGY: was your experiment of between subjects or within subjects design?


Complexity

Comparing

• Simple (2 groups or conditions): t-test

• Complex (three or more): ANOVA


Complexity …

Associating

• Simple: Pearson correlation, simple regression

• Complex: multiple regression, factor analysis


Sampling strategy

Comparing

• Between subjects: independent samples

• Within subjects (repeated measures): related samples


3. Your data

Two aspects are important

1. Their LEVEL OF MEASUREMENT

2. Their DISTRIBUTION


Levels of measurement

1. SCALE data are measures on an independent scale with units: heights, weights, IQs.

2. ORDINAL data are ranks.

3. NOMINAL data are assignments to categories: so-many males, so-many females.


Implications of level of measurement

• If you have nominal data, you can’t run a t-test or an ANOVA.

• If you have two or more sets of ranks, you can’t run a Pearson correlation.

• Other statistics are available for those purposes.


The distribution

• Many statistical techniques such as t-tests and ANOVA assume normality of distribution.

• There may be additional requirements such as homogeneity of variance.

• If the data violate the assumptions of the model underlying the test, the p-value of the test statistic may be inaccurate.


Decision charts

• There is no safe, automated way of choosing a statistical test correctly.

• The various charts in Chapter 1 of Kinnear & Gray (2004, 2006) are intended to be GUIDELINES, rather than prescriptions.

• There has been considerable controversy about the correct choice in some situations.

• Here is one such chart.


Decision chart for choosing a test of the difference between means, medians, or frequencies


‘Well-behaved’ data

• You have run an experiment of simple, two-group between subjects design.

• At the bottom of the chart you see listed the independent-samples t-test and the Mann-Whitney test, which is a nonparametric equivalent.

• If your data conform to t-test requirements, the t-test is preferred: it has more POWER than the Mann-Whitney test.


Nonconforming data

• But suppose your data violate the requirements of the t-test.

• Suppose the distributions are far from normal and there appears to be marked heterogeneity of variance.

• Some would say you should run a Mann-Whitney test instead of a t-test.

• Others, however, would urge you to clean up your data and run the version of the t-test that doesn’t make the assumption of homogeneity of variance.
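The version of the t-test that drops the homogeneity assumption is commonly known as Welch's test; we assume that is the version meant here. A minimal sketch from summary statistics follows, with a function name of our own choosing. It is applied to the caffeine figures only to show the mechanics, not because those data need it.

```python
import math

def welch_t(m1, sd1, n1, m2, sd2, n2):
    """Welch's t and its approximate (Welch-Satterthwaite) degrees of freedom."""
    v1, v2 = sd1 ** 2 / n1, sd2 ** 2 / n2  # squared standard error of each mean
    t = (m1 - m2) / math.sqrt(v1 + v2)     # the variances are NOT pooled
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df

t, df = welch_t(11.90, 3.28, 20, 9.25, 3.16, 20)
print(f"t = {t:.2f} on about {df:.1f} degrees of freedom")
```

With equal group sizes and similar SDs, as here, the result is almost identical to the ordinary t-test. The two diverge when the variances and group sizes are both unequal.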


Example

[Figure: example chart with areas labelled Type I error rate and Type II error rate.]


Appendix 1

EQUIVALENCE OF ANOVA AND THE t-TEST IN THE SPECIAL CASE OF TWO GROUPS OR CONDITIONS


The two-group case

In the first lecture, I revised the independent-samples t-test and illustrated the procedure with the data from the caffeine experiment.

Returning to the caffeine experiment, what would happen if, instead of making a t test, we were to run an ANOVA to test the null hypothesis of equality of the means?


The two-group case: comparison of ANOVA with the t-test

• Observe that F = t².

• Observe also that the p-value is the same for both tests.

• The ANOVA and the independent-samples t test are EXACTLY EQUIVALENT and produce the same decision about the null hypothesis.


Equivalence of the F and t tests


The t statistic

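In the present example (where n1 = n2), the pooled estimate s² of σ² is simply the mean of the variance estimates from the two samples.

\[ t = \frac{M_1 - M_2}{\sqrt{s^2 \left( \dfrac{1}{n_1} + \dfrac{1}{n_2} \right)}} \]

\[ s^2 = \frac{s_1^2 + s_2^2}{2} = \frac{3.28^2 + 3.16^2}{2} = 10.37 \]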


In the two-group case …

• The pooled variance estimate s² in the t statistic is identical with the pooled variance estimate MS_within in the one-way ANOVA.

• When we specify a t statistic, we put the degrees of freedom in brackets, as in t(38).

• In the present comparative context, this is t(df_within).


Equivalence of F and t in the two-group case

• When there are only two groups, the value of F is the square of the value of t.

• So if t is significant, then so is F and vice versa.
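The equivalence is easy to verify numerically. The sketch below computes the independent-samples t and the one-way ANOVA F from the same two groups of scores. The scores are invented purely for illustration and are not the caffeine data.

```python
import math

def mean(xs):
    return sum(xs) / len(xs)

def t_and_f(g1, g2):
    """Return (t, F) for two independent groups of scores."""
    n1, n2 = len(g1), len(g2)
    m1, m2 = mean(g1), mean(g2)
    grand = mean(g1 + g2)
    ss_within = sum((x - m1) ** 2 for x in g1) + sum((x - m2) ** 2 for x in g2)
    ss_between = n1 * (m1 - grand) ** 2 + n2 * (m2 - grand) ** 2
    ms_within = ss_within / (n1 + n2 - 2)  # identical to the pooled variance s2
    ms_between = ss_between / 1            # df_between = 2 groups - 1
    f = ms_between / ms_within
    t = (m1 - m2) / math.sqrt(ms_within * (1 / n1 + 1 / n2))
    return t, f

caffeine = [12, 15, 9, 11, 14, 10]  # hypothetical scores
placebo = [8, 11, 9, 7, 12, 10]     # hypothetical scores
t, f = t_and_f(caffeine, placebo)
print(f"t = {t:.3f}, t squared = {t * t:.3f}, F = {f:.3f}")
```

Swapping the two groups changes the sign of t but leaves t² and F unchanged.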


Explanation


The related-samples t-test

• The same equivalence holds with the within subjects ANOVA and the related-samples t-test.