This morning’s programme

• 9:05am – 10:50am The t tests

• 10:50am – 11:20am A break for coffee.

• 11:20am – 12:30pm Approximate chi-square tests


SESSION 2

From description to inference: hypothesis testing


INDUCTIVE REASONING

• Traditional Aristotelian logic is DEDUCTIVE: it argues from the general to the particular.

• Statistical inference reverses this process by arguing INDUCTIVELY from the particular (the sample) to the general (the population).

• Statistical inference, therefore, is subject to error and inferences must be expressed in terms of probabilities.


Kinds of statistical inference

1. Estimation (of parameter values);

2. Hypothesis testing


Estimates

There are two types of estimate:

1. POINT ESTIMATES. For example, we might use the sample mean as an estimate of the value of the population mean.

2. INTERVAL ESTIMATES. On the basis of sample data, we can specify a range of values within which we can assume with specified levels of CONFIDENCE that the population value lies. I discuss confidence intervals in the appendix to this talk.


‘Confirming’ our data

• Suppose that we have found, in our results, a pattern that we would like to confirm, such as a difference between means.

• Could this pattern have arisen merely through sampling error? Would another research team who collect data of this type obtain a similar result?

• Hypothesis testing can provide an answer to questions of this sort.


Statistical hypotheses

• A statistical hypothesis is a statement about a population, usually to the effect that a parameter, such as the mean, has a specified value, or that the means of two or more populations have the same value or different values.

• Here, by “population” is always meant a probability distribution, a hypothetical population of numbers.


Two hypotheses

• In hypothesis testing, as widely practised at present by researchers, a decision is made between two complementary hypotheses, which are set up in opposition to each other:

• the null hypothesis (H0);

• the alternative hypothesis (H1).


The null hypothesis

• The null hypothesis (H0) is the statistical equivalent of the hypothesis of NO EFFECT, the negation of the scientific hypothesis.

• For example, if a researcher thinks a set of scores is from a population with a different mean than a control population, the null hypothesis will state that there is NO such difference.

• The alternative hypothesis (H1) is that the null hypothesis is false.

• In traditional statistical testing, it is the null hypothesis, not the alternative hypothesis, that is tested.


Number of samples

• Tests of some hypotheses can be made by drawing a single sample of scores.

• Other hypotheses, however, can only be tested by drawing two or more samples.

• It is easiest to consider the elements of hypothesis testing by considering one-sample tests first.


One-sample tests


Situation (a)

The population standard deviation is known


An example

• Let us suppose that in the island of Erewhon, men’s heights have an approximately normal distribution with a mean of 69 inches and an SD of 3.2 inches.

• A researcher wonders whether there might be a tendency for those in the north of the island to be taller than the general population.

• A sample of 100 northerners has a mean height of 69.8 inches.

• Remembering that this is merely a sample from the population of northerners, do we have evidence that northerners are taller?


Steps in testing a hypothesis

• Formulate the null and alternative HYPOTHESES.

• Decide upon a SIGNIFICANCE LEVEL.

• Decide upon an appropriate TEST STATISTIC.

• Decide upon the CRITICAL REGION, a range of “unlikely” values for the test statistic, that is, values whose total probability under the null hypothesis equals the significance level.

• If the value of the test statistic falls within the critical region, the null hypothesis is rejected.


The null and alternative hypotheses

• The null hypothesis is that, contrary to the researcher’s speculation, the height of northerners is no different from that of the general population.

• The alternative hypothesis is that northerners are of different height.


Significance level

• The significance level is a small probability fixed by tradition.

• The significance level is commonly set at .05, but in some areas researchers insist upon a lower level, such as .01.

• We shall set the level at .05.


Revision

• We are talking about a situation in which a single sample has been drawn from a population.

• Here the reference set is the population or probability distribution of such samples, which is known as the SAMPLING DISTRIBUTION OF THE MEAN.

• Its SD is known as the STANDARD ERROR OF THE MEAN (σM ).


Sampling distribution of the mean


The standard normal distribution

• Questions about ranges of values in any normal distribution can always be referred to questions about corresponding values in the STANDARD NORMAL DISTRIBUTION.

• We do this by transforming the original values to values of z, the STANDARD NORMAL VARIATE.


The standard normal distribution

• We transform the original value to z by subtracting the mean, then dividing by the standard deviation.

• In this case, we must divide by σM, not σ.


The test statistic

• Since we know the SD σ, we can use as our test statistic z, where the denominator is the STANDARD ERROR OF THE MEAN, that is, the SD of the sampling distribution of the mean.


The critical region

• We want the total probability of a value in the critical region to be .05, that is the significance level.

• We distribute that probability equally between the two tails of the distribution: .025 in each tail.


Calculate the value of z

• Since this value falls within the critical region, the null hypothesis is rejected.

• We have evidence that the northerners are taller.
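
As a minimal sketch of the calculation for the Erewhon example, assuming the usual one-sample statistic z = (M − μ)/σM with σM = σ/√n. The variable names are illustrative and do not come from the slides.

```python
from math import sqrt
from statistics import NormalDist

mu_0 = 69.0    # population mean under the null hypothesis (inches)
sigma = 3.2    # known population SD (inches)
n = 100        # number of northerners sampled
m = 69.8       # sample mean (inches)
alpha = 0.05   # significance level chosen earlier

se_mean = sigma / sqrt(n)                      # standard error of the mean
z = (m - mu_0) / se_mean                       # test statistic
z_crit = NormalDist().inv_cdf(1 - alpha / 2)   # two-tailed boundary, about 1.96

reject_h0 = abs(z) > z_crit
print(f"z = {z:.2f}, critical value = ±{z_crit:.2f}, reject H0: {reject_h0}")
```

The obtained z of 2.5 lies beyond the upper boundary, which is why the null hypothesis is rejected.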


The p-value

• The p-value of a test statistic is the probability, assuming that the null hypothesis is true, of obtaining a value of the test statistic at least as extreme as the one obtained.

• The p-value must be clearly distinguished from the significance level (say .05): the significance level is fixed beforehand; but the p-value is determined by your own data.


Use of the p-value

• If the p-value is less than the significance level, the value of your test statistic must have fallen within the critical region.

• But the p-value tells you more than this.

• A high p-value means that the value of the test statistic is well short of being significant; whereas a low p-value means we are well over the line.


The one-tailed p-value

• The ONE-TAILED p-value is the probability of a value of the test statistic at least as extreme (in the same direction) as the value actually obtained.


The one-tailed p-value

• We obtain the one-tailed p-value by subtracting the cumulative probability of 2.5 from 1: 1 - .9938 = .0062


One-tailed and two-tailed p-values

• If the region of rejection is located in both tails of the sampling distribution, as in the present example, a TWO-TAILED p-value must be calculated.

• We must DOUBLE the one-tailed p-value.

• If we didn’t do that, a value only marginally significant would seem to have a probability of only .025, not .05 as previously decided.

• So if the p-value in either direction is less than .025, the two-sided p-value is less than .05, and we have significance.


The two-tailed p-value of 2.5

• We must now double the one-tailed p-value:

• .0062 × 2 = .0124 .
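
The two p-values for z = 2.5 can be checked with a short sketch. It assumes a positive z, so that the relevant tail is the upper one, and the variable names are illustrative.

```python
from statistics import NormalDist

z = 2.5
p_one_tailed = 1 - NormalDist().cdf(z)   # probability in the upper tail beyond z
p_two_tailed = 2 * p_one_tailed          # the same area counted in both tails

print(f"one-tailed p = {p_one_tailed:.4f}")
print(f"two-tailed p = {p_two_tailed:.4f}")
```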


One-tailed tests


Directional hypothesis

• Our researcher suspects that the northerners are TALLER, not simply that they are of DIFFERENT height. This is a DIRECTIONAL hypothesis.

• On this basis, it could be (and is) argued that the critical region, with a probability of .05, should be located entirely in the UPPER tail of the standard normal distribution.


Critical region for a one-tailed test


Comparison of the critical regions

• If you are only interested in the possibility of a difference in ONE direction, you might decide to locate the critical region entirely in one tail of the distribution.

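[Figure: the two-tailed critical region has 0.025 (2.5%) in each tail; the one-tailed critical region has 0.05 (5%) in a single tail.]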


Easier to get a significant result

• Note that, on a one-tail test, you only need a z-value of 1.64 for significance, rather than a value of 1.96 for a two-tail test.

• So, on a one-tail test, it’s easier to get significance IN THE DIRECTION YOU EXPECT.
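
The two boundaries can be recovered from the standard normal distribution. This is a small sketch with illustrative names; the 1.64 quoted above is the one-tailed value (about 1.645) truncated to two decimal places.

```python
from statistics import NormalDist

alpha = 0.05
std_normal = NormalDist()

z_one_tail = std_normal.inv_cdf(1 - alpha)       # all of alpha in the upper tail
z_two_tail = std_normal.inv_cdf(1 - alpha / 2)   # alpha / 2 in each tail

print(f"one-tailed boundary: {z_one_tail:.3f}")
print(f"two-tailed boundary: {z_two_tail:.3f}")
```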


Errors in hypothesis testing


Type I errors

• Suppose the null hypothesis is true, but the value of z falls within the critical region.

• We shall reject the null hypothesis, but, in so doing, we shall have made a Type I or alpha (α) error.

• The probability of a Type I error is simply the chosen significance level and in our example its value is .05 .


Probability of a Type I error

• Suppose H0 is true.

• If the value of z falls within either tail, we shall reject H0 and make a Type I error.

• The probability that we shall do this is the significance level, .05.

• Accordingly, the significance level is also referred to as the ALPHA-LEVEL.


Type II (beta) errors

• Suppose the null hypothesis is false.

• The value of the test statistic, however, does not fall within the critical region and the null hypothesis is accepted.

• We have made a Type II or beta (β) error.


Power

• The POWER of a statistical test is the probability that, if the null hypothesis is false, it will be rejected by the statistical test.

• When the power of a test is low, a non-significant test result is impossible to interpret: there may indeed be nothing to report; but the researcher has no way of knowing this.


Two distributions

• The following diagram shows the relationships among the Type I error rate (the significance level), the Type II error rate and Power when the null hypothesis is tested against a one-sided alternative hypothesis that the mean has a higher value. (This is a one-tailed test.)

• The overlapping curves represent the sampling distributions of the mean under the null hypothesis (left) and the alternative hypothesis (right).

• In the diagram, μ0 and μ1 are the values of the population mean according to the null and alternative hypotheses, respectively.


Power and the type I and type II error rates


Points

• Any value of M to the left of the grey area will result in the acceptance of H0.

• If H1 is true (distribution on the right), a Type II error will have been made.

• Notice that the Power and Type II error rates sum to unity.
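
The relationship between the Type II error rate and power can be sketched numerically for a one-tailed z test. The figures reuse the Erewhon set-up, and the value chosen for μ1 is an assumption made purely for illustration; the slides do not specify one.

```python
from math import sqrt
from statistics import NormalDist

mu_0, sigma, n, alpha = 69.0, 3.2, 100, 0.05
mu_1 = 69.8   # ASSUMED true mean under H1, chosen only for illustration

se_mean = sigma / sqrt(n)

# Smallest sample mean that falls in the upper-tail critical region under H0
m_crit = mu_0 + NormalDist().inv_cdf(1 - alpha) * se_mean

h1_sampling = NormalDist(mu_1, se_mean)   # sampling distribution of M if H1 is true
beta = h1_sampling.cdf(m_crit)            # P(accept H0 | H1 true): Type II error rate
power = 1 - beta                          # P(reject H0 | H1 true)

print(f"beta = {beta:.3f}, power = {power:.3f}, sum = {beta + power:.1f}")
```

Under these assumed values the power comes out at roughly 0.8, and beta and power necessarily sum to one.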


Summary of errors and correct decisions


Factors affecting the power of a statistical test


Significance level and power

• In the upper figure, the red area is the .05 significance level; the green area is the Type II error rate.

• The lower figure shows that a lower significance level (e.g. .01) reduces the probability of making a Type I error, but the probability of a type II error (green) increases and the power (P) decreases.
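
The trade-off can be illustrated with a small helper for a one-tailed (upper-tail) z test. The function name and all the numerical inputs are assumptions for illustration, not values taken from the figures.

```python
from math import sqrt
from statistics import NormalDist

def power_upper_tail(mu_0, mu_1, sigma, n, alpha):
    """Power of a one-tailed (upper-tail) z test of H0: mean = mu_0."""
    se_mean = sigma / sqrt(n)
    m_crit = mu_0 + NormalDist().inv_cdf(1 - alpha) * se_mean
    return 1 - NormalDist(mu_1, se_mean).cdf(m_crit)

# ASSUMED inputs: only alpha differs between the two calls
power_05 = power_upper_tail(69.0, 69.8, 3.2, 100, alpha=0.05)
power_01 = power_upper_tail(69.0, 69.8, 3.2, 100, alpha=0.01)

print(f"alpha = .05: power = {power_05:.3f}, beta = {1 - power_05:.3f}")
print(f"alpha = .01: power = {power_01:.3f}, beta = {1 - power_01:.3f}")
```

With everything else held fixed, the stricter alpha moves the critical value further out and so lowers the power.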

[Figure labels: β marks the Type II error rate and P marks the power in each panel.]


Size of the difference between μ1 and μ0.

• The greater the difference between the real population mean and the value assumed by the null hypothesis, the less the overlap between the sampling distributions.

• The less the overlap, the greater will be the area of the H1 (right) curve beyond the critical value under H0 and the greater the power of the test to reject the null hypothesis.

• The researcher has no control over this determinant of power.


A small difference


A large difference


Sample size

• Now we come to another important determinant of the power of a statistical test: sample size.

• This is the factor over which the researcher usually has the most control.


Revision

• The larger the sample, the smaller the standard error of the mean and the taller and narrower will be the sampling distribution if drawn to the same scale as the original distribution.
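
A quick sketch of how the standard error shrinks with sample size, for samples of 16 and 64. The population SD of 15 is an assumption (the conventional SD of IQ scores); the slides do not state it.

```python
from math import sqrt

sigma = 15.0   # ASSUMED population SD (conventional for IQ scores)

# Standard error of the mean for each sample size: sigma / sqrt(n)
se_by_n = {n: sigma / sqrt(n) for n in (16, 64)}

for n, se_mean in se_by_n.items():
    print(f"n = {n:2d}: standard error of the mean = {se_mean:.3f}")
```

Because the standard error depends on the square root of n, quadrupling the sample size halves it.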


Effect of increasing the sample size n

[Figure: the IQ distribution, centred on μ, with the sampling distributions of the mean for n = 16 and n = 64.]


Sample size

• When there are two samples, therefore, larger samples will result in greater separation of the sampling distributions, reduction in the Type II error rate and more power.


Power and sample size

• Increasing the sample size reduces the overlap of the sampling distributions under H0 and H1 by making them taller and narrower.

• The beta-rate is reduced and the power (green area) increases.
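
The effect of sample size on power can be sketched for a one-tailed z test. The means, SD and sample sizes below are assumed values for illustration only.

```python
from math import sqrt
from statistics import NormalDist

mu_0, mu_1, sigma, alpha = 69.0, 69.8, 3.2, 0.05   # mu_1 is an ASSUMED true mean
z_crit = NormalDist().inv_cdf(1 - alpha)           # one-tailed boundary under H0

powers = {}
for n in (25, 100, 400):
    se_mean = sigma / sqrt(n)
    # Under H1, z is centred on (mu_1 - mu_0) / se_mean rather than on zero
    powers[n] = 1 - NormalDist().cdf(z_crit - (mu_1 - mu_0) / se_mean)
    print(f"n = {n:3d}: power = {powers[n]:.3f}")
```

The true difference is the same in each case; only the narrowing of the sampling distributions changes the power.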

[Figure: two panels, labelled “Small samples” and “Large samples”.]


Reliability of measurement

• Greater precision or RELIABILITY of measurement also reduces the standard error of the mean and improves the separation of the null and alternative distributions.

• The more separation between the sampling distributions, the greater the power of the statistical test.

• Jeanette Jackson will discuss the topic of reliability.


Situation (b)

The population standard deviation is unknown


Very rarely do we know the population standard deviation


Vocabulary

• A researcher suspects that a new intake to a college of further education may require extra coaching to enrich their vocabulary.

• The College has been using a vocabulary test, students’ performance on which, over the years, has been found to have a mean of 50. The standard deviation is not known with certainty (estimates have varied and the records are incomplete); but the population distribution seems to be approximately normal.

• The 36 new students have vocabulary scores with a mean of 49 and a sample standard deviation of 2.4 .

• Is this evidence that their vocabulary scores are not from the usual College student population?


Sampling distribution of the mean


Estimating the standard error

• When we don’t know σ, we must use the statistics of our sample to estimate the standard error of the mean.


The t statistic


In our example …
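
A minimal sketch of the calculation for the vocabulary example, assuming the standard one-sample definition t = (M − μ0)/(s/√n), in which the sample SD s replaces the unknown σ. The variable names are illustrative.

```python
from math import sqrt

mu_0 = 50.0   # long-run College mean, the value under H0
n = 36        # number of new students
m = 49.0      # sample mean
s = 2.4       # sample standard deviation

se_est = s / sqrt(n)      # estimated standard error of the mean (0.4)
t = (m - mu_0) / se_est   # one-sample t statistic
df = n - 1                # degrees of freedom

print(f"t({df}) = {t:.2f}")
```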


Distribution of t

• Like the standard normal variate z, the distribution of t has a mean of zero.

• The t statistic, however, is not normally distributed.

• Although, like the normal distribution, it is symmetrical and bell-shaped, it has thicker tails: that is, large absolute values of t are more likely than large absolute values of z.


The family of t distributions

• There is only one standard normal distribution, to which any other normal distribution can be transformed; but there is a whole family of t distributions.

• A normal distribution has two parameters: the mean and the SD.

• A t distribution has ONE parameter, known as the DEGREES OF FREEDOM (df ).


Degrees of freedom

• The term is borrowed from physics.

• The degrees of freedom of a system is the number of constraints that must be placed upon it to determine its state completely.

• By analogy, the variance of n scores is calculated from the squares of n deviations from the mean; but deviations from the mean sum to zero, so if you know the values of (n – 1) deviations, you know the nth deviation.

• The degrees of freedom of the variance is therefore (n – 1).
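
The constraint on the deviations can be seen in a tiny example. The four scores are hypothetical, chosen only to keep the arithmetic simple.

```python
scores = [4.0, 7.0, 9.0, 12.0]   # hypothetical scores, for illustration only
n = len(scores)
mean = sum(scores) / n
deviations = [x - mean for x in scores]

# The first n - 1 deviations fix the last one, because all n sum to zero.
last_from_others = -sum(deviations[:-1])

# Variance estimate: sum of squared deviations over its n - 1 degrees of freedom.
variance = sum(d * d for d in deviations) / (n - 1)

print(f"deviations = {deviations}, sum = {sum(deviations)}")
print(f"last deviation recovered from the others = {last_from_others}")
print(f"variance on {n - 1} df = {variance:.3f}")
```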


Degrees of freedom of t

• The degrees of freedom of the one-sample t statistic is (n – 1), where n is the size of the sample. This is the degrees of freedom of the variance estimate from the sample.

• In our case, the degrees of freedom of the t statistic = (n – 1) = 36 – 1 = 35.

• As the size of n increases, the t distribution becomes more and more like the standard normal distribution.


Extreme values of t are more likely than extreme values of z


The critical region

• Arguably, since the researcher’s concern is with low scores, we can justify a one-tailed test here and locate the critical region exclusively in the lower tail of the distribution of t on 35 degrees of freedom.

• We want the critical region to lie to the left of the 5th percentile of that distribution, that is, in the lower tail.


Critical region for a one-tailed t test


Boundary of critical region for the t test lies further out in the tail

• Notice that the boundary (–1.69) of the critical region lies further out in the lower tail than does the 5th percentile of the standard normal distribution (–1.64).


A significant result

• Our value of t (–2.5) lies within the critical region.

• The null hypothesis is therefore rejected and we have evidence that our sample is from a population with a mean score of less than 50.
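
The boundary of the critical region can be checked without statistical tables. The sketch below is one possible approach, not the method used to prepare the slides: it integrates the t density numerically (Simpson's rule) and finds the 5th percentile by bisection, using only the Python standard library. All function names are illustrative.

```python
from math import exp, lgamma, log, pi

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = lgamma((df + 1) / 2) - lgamma(df / 2) - 0.5 * log(df * pi)
    return exp(log_c - (df + 1) / 2 * log(1 + x * x / df))

def t_cdf(x, df, steps=2000):
    """P(T <= x), by Simpson's rule on [0, |x|] plus symmetry about zero."""
    a = abs(x)
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return 0.5 + area if x >= 0 else 0.5 - area

def t_quantile(p, df):
    """Value of t with cumulative probability p, found by bisection."""
    lo, hi = -50.0, 50.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

df = 35
t_crit = t_quantile(0.05, df)   # 5th percentile: boundary of the lower-tail region
t_obtained = -2.5
reject_h0 = t_obtained < t_crit

print(f"critical t({df}) = {t_crit:.2f}; reject H0: {reject_h0}")
```

In practice a statistics package would supply the percentile directly; the numerical version is shown only so that the example has no external dependencies.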


Two-sample tests


Results of the caffeine experiment


Is this a repeatable result?

• The difference between the Caffeine and Placebo means is (11.90 – 9.25) = 2.65 hits.

• Could this difference have arisen merely from sampling error?


Independent samples

• The Caffeine experiment yielded two sets of scores - one set from the Caffeine group, the other from the Placebo group.

• There is NO BASIS FOR PAIRING THE SCORES.

• We have INDEPENDENT SAMPLES.

• We shall make an INDEPENDENT-SAMPLES t test.


The null hypothesis

• The null hypothesis states that, in the population, the Caffeine and Placebo means have the same value.


The alternative hypothesis

• The alternative hypothesis states that, in the population, the Caffeine and Placebo means do not have the same value.


Revision

• We are talking about a situation in which two samples have been drawn from identical normal populations and the difference between their means M1 – M2 has been calculated.

• Here the reference set is the population or probability distribution of such differences, which is known as the SAMPLING DISTRIBUTION OF THE DIFFERENCE (between means).

• Its SD is known as the STANDARD ERROR OF THE DIFFERENCE.


Sampling distribution of the difference


The standard normal distribution

• Questions about ranges of values in any normal distribution can always be referred to questions about corresponding values in the STANDARD NORMAL DISTRIBUTION.

• We do this by transforming the original values to values of z, the STANDARD NORMAL VARIATE.


If we knew the population standard deviation …


The standard normal distribution

• We could transform the original value to z by subtracting the mean, then dividing by the standard deviation.

• In this case, we would divide by the standard error of the difference.


The test statistic

• We could have calculated z in the usual way:


But we don’t know σ!

• So we must estimate the standard error of the difference from the statistics of our samples:


The pooled variance estimate


Estimate of the standard error of the difference


The independent samples t statistic

• The degrees of freedom of this t statistic is the sum of the dfs of the two sample variance estimates.

• In our example, df = 20 + 20 – 2 = 38.


The value of t
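
As a minimal sketch, the steps from the pooled variance estimate to the value of t can be reproduced in Python from summary statistics. The means and SDs are those quoted in the report of this experiment (11.90 and 9.25; 3.28 and 3.16), and n = 20 per group is taken from df = 20 + 20 – 2. The function name is purely illustrative.

```python
import math

def independent_t(m1, s1, n1, m2, s2, n2):
    """Independent-samples t from summary statistics, using the pooled variance."""
    df = (n1 - 1) + (n2 - 1)                      # sum of the two sample dfs
    pooled_var = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / df
    se_diff = math.sqrt(pooled_var * (1 / n1 + 1 / n2))  # estimated SE of the difference
    return (m1 - m2) / se_diff, df

# Caffeine minus Placebo
t, df = independent_t(11.90, 3.28, 20, 9.25, 3.16, 20)
print(f"t({df}) = {t:.2f}")
```

With equal group sizes the pooled variance is simply the average of the two sample variances.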

The critical region

• We shall reject the null hypothesis if the value of t falls within EITHER tail of the t distribution on 38 degrees of freedom.

• To be significant beyond the .05 level, our value of t must be greater than +2.02 OR less than –2.02. Since our value for t (2.60) falls within the critical region, the null hypothesis is rejected.

The p-value

• The one-tailed p-value is (1 – the cumulative probability of the t value 2.60), that is, .0066 .

• To obtain the 2-tailed p-value, we must double this value: 2 × .0066 = .0132 .
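
A statistics package would normally supply these tail areas. As a hedged illustration using only the Python standard library, the sketch below integrates the density of the t distribution numerically (Simpson's rule) to get the upper-tail area. The helper names `t_pdf` and `upper_tail` are assumptions made for this example, not part of any package.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

def upper_tail(t, df, steps=2000):
    """P(T > t): 0.5 minus (or plus) the area between 0 and |t|, by Simpson's rule."""
    a = abs(t)
    h = a / steps                                  # steps must be even
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return 0.5 - area if t >= 0 else 0.5 + area

p_one = upper_tail(2.60, 38)   # one-tailed p-value
p_two = 2 * p_one              # two-tailed p-value
print(f"one-tailed p = {p_one:.4f}; two-tailed p = {p_two:.4f}")
```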

Your report

“The scores of the Caffeine group (M = 11.90; SD = 3.28) were higher than those of the Placebo group (M = 9.25; SD = 3.16). With an alpha-level of 0.05, the difference is significant: t(38) = 2.60; p = .0132 (two-tailed).”

(The number in brackets after t, here 38, is the degrees of freedom.)

Representing very small p-values

• Suppose, in the caffeine experiment, that the p-value had been very small indeed. (Suppose t = 6.0). The computer would have given your p-value as ‘.000’. NEVER write, ‘p = .000’. This is unacceptable in a scientific article.

• Write, ‘p < .01’, or ‘p < .001’.

• You would have written the present result as ‘t(38) = 6.0; p < .001’.

Lisa DeBruine’s guidelines

• Lisa DeBruine has compiled a very useful document describing the most important of the APA guidelines for the reporting of the results of statistical tests.

• I strongly recommend this document, which is readily available on the Web.

• http://www.facelab.org/debruine/Teaching/Meth_A/

• Sometimes the APA manual is unclear. In such cases, Lisa has opted for what seems to be the most reasonable interpretation.

• If you follow Lisa’s guidelines, your submitted paper won’t draw fire on account of poor presentation of your statistics!

A one-tailed test?

• The null hypothesis states simply that, in the population, the Caffeine and Placebo means are equal.

• H0 is refuted by a sufficiently large difference between the means in EITHER direction.

• But some argue that if our scientific hypothesis is that Caffeine improves performance, we should be looking at differences in only ONE direction.

Assumption

• In what follows, we shall assume that the researcher, on the basis of sound theory, has planned to make a one-tailed test .

• Accordingly, the critical region is located entirely in the upper tail of the distribution of t on 38 degrees of freedom.

The null hypothesis again

• The null and alternative hypotheses must be complementary: that is, they must exhaust the possibilities.

• If the alternative hypothesis says that the Caffeine mean is greater, the null hypothesis must say that it is not greater: that is, it is equal to OR LESS THAN the Placebo mean.

A one-sided null hypothesis

Direction of subtraction

• The direction of subtraction of one sample mean from the other is now crucial.

• You MUST subtract the Placebo mean from the Caffeine mean.

• Only a POSITIVE value of t can falsify the directional null hypothesis.

A smaller difference between the means

• Suppose that the mean score of the Caffeine group had been, not 11.90, but 10.99. The cell variances are the same as before.

• In other words, the Caffeine and Placebo means differ by only 1.74 points, rather than 2.65 points, as in the original example.

The critical region

The result

• The value of t is now 1.71, which is greater than the critical value (1.69) on a one-tailed test.

• The null hypothesis that the Caffeine mean is no greater than the Placebo mean is rejected.
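
The arithmetic behind this result can be checked with a short sketch. It assumes n = 20 per group and the same SDs as before (3.28 and 3.16), and it takes the critical values 1.69 (one-tailed) and 2.02 (two-tailed) for 38 degrees of freedom from the slides rather than computing them.

```python
import math

n = 20
sd_caffeine, sd_placebo = 3.28, 3.16

# With equal n, the pooled variance is the average of the two sample variances.
pooled_var = (sd_caffeine**2 + sd_placebo**2) / 2
se_diff = math.sqrt(pooled_var * (2 / n))

critical_one_tailed = 1.69   # quoted in the slides for df = 38
critical_two_tailed = 2.02   # quoted in the slides for df = 38

# Caffeine mean reduced to 10.99; Placebo mean unchanged at 9.25.
t_small = (10.99 - 9.25) / se_diff
print(f"t = {t_small:.2f}")
print("one-tailed test rejects H0:", t_small > critical_one_tailed)
print("two-tailed test rejects H0:", abs(t_small) > critical_two_tailed)
```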

Report of the one-tailed test

“The scores of the Caffeine group (M = 10.99; SD = 3.28) were significantly higher than those of the Placebo group (M = 9.25; SD = 3.16): t(38) = 1.71; p = .0477 (one-tailed).”

Advantage of the one-tailed test

• Our t value of 1.71 would have failed to achieve significance on the two-tailed test, since the critical value there was +2.02.

• On the one-tailed test, however, t lies in the critical region and the null hypothesis is rejected.

More power

• In locating the entire critical region in the upper tail of the H0 distribution, we increase the light-grey area and reduce the dark-grey area - the beta rate.

• In other words, we increase the POWER of the test to reject H0.

An unexpected result

• Now suppose that, against expectation, the Placebo group had outperformed the Caffeine group.

• The mean for the Caffeine group is 9.25 and that for the Placebo group is 10.20.

• If we subtract the Placebo mean from the Caffeine mean as before, we obtain t = – 2.02.

• On a two-tailed test, this would have been in the critical region (p <.05) and we should have rejected the null hypothesis.

One-sided p-value

• We cannot, however, change horses and declare this unexpected result to be significant.

• In the one-tailed test, the null hypothesis is also one-sided.

• Accordingly, the p-value is also one-sided, that is, it is the probability that the (Caffeine – Placebo) difference would have been at least as LARGE in the positive direction as the one we obtained.

The one-sided p-value

• The one-sided p-value is the entire area under the curve TO THE RIGHT of your value of t.

• That area is 0.975.

• You have nothing to report.

Correct report of the one-tail test

“The scores of the Caffeine group (M = 9.25; SD = 3.16) were not significantly higher than those of the Placebo group (M = 10.20; SD = 3.28): t(38) = -2.02 ; p = 0.975 (one-tailed) .”

Why you can’t change horses

• Having decided upon a one-tailed test, you cannot change to a two-tailed test when you get a result in the opposite direction to that expected.

• If you do, the Type I error rate increases.

The true Type I error rate.

• If you switch to a two-tailed test, your true Type I error rate is now the black area (0.05) PLUS the green area in the lower tail (0.025). This is 0.05 + 0.025 = 0.075, a level many would feel is too high.

• (See the OR rule in the appendix to my first talk.)
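
The inflated error rate can also be seen by simulation. The sketch below is an illustration under stated assumptions: both groups are drawn from the same normal population (so H0 is true), n = 20 per group, and the researcher rejects H0 if t > 1.69 (the planned one-tailed test) or, having changed horses, if t < –2.02. The function names, seed and number of replications are arbitrary choices for this example.

```python
import math
import random

def t_statistic(x, y):
    """Pooled-variance t for two independent samples (mean of x minus mean of y)."""
    n1, n2 = len(x), len(y)
    m1, m2 = sum(x) / n1, sum(y) / n2
    ss1 = sum((v - m1) ** 2 for v in x)
    ss2 = sum((v - m2) ** 2 for v in y)
    pooled_var = (ss1 + ss2) / (n1 + n2 - 2)
    return (m1 - m2) / math.sqrt(pooled_var * (1 / n1 + 1 / n2))

def switching_error_rate(reps=20000, n=20, seed=1):
    """Proportion of true-H0 experiments declared significant by a 'switcher'."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(reps):
        caffeine = [rng.gauss(0, 1) for _ in range(n)]
        placebo = [rng.gauss(0, 1) for _ in range(n)]
        t = t_statistic(caffeine, placebo)
        # Upper tail: planned one-tailed test. Lower tail: the illegitimate switch.
        if t > 1.69 or t < -2.02:
            rejections += 1
    return rejections / reps

rate = switching_error_rate()
print(f"estimated Type I error rate = {rate:.3f}")
```

The estimate should fall close to 0.075 rather than the nominal 0.05, although the exact figure depends on the seed and the number of replications.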

Effect size

A preoccupation with significance

• For many years, following R. A. Fisher, the first to develop a system of testing, there was a preoccupation with significance and insufficient regard for the MAGNITUDE of the effect one was investigating.

• Fisher himself observed that, on a sufficiently powerful test, even the most minute difference will be statistically “significant”, however “insubstantial” it may be.

A ‘substantial’ difference?

• We obtained a difference between the Caffeine and Placebo means of (11.90 – 9.25) = 2.65 score points.

• This difference, as we have seen, is “significant” in the statistical sense; but is it SUBSTANTIAL, that is, worth reporting?

Measuring effect size: Cohen’s d statistic

In our example …

Levels of effect size

• On the basis of scrutiny of a large number of studies, Jacob Cohen proposed that we regard a d of .2 as a SMALL effect size, a d of .5 as a MEDIUM effect size and a d of .8 as a LARGE effect size.

• So our experimental result is a ‘large’ effect.

• When you report the results of a statistical test, you are now expected to provide a measure of the size of the effect you are reporting.
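
As a minimal sketch, Cohen's d for the caffeine experiment can be computed from the summary statistics. The sketch assumes that the difference between the means is divided by the pooled standard deviation and that n = 20 per group. Treating Cohen's benchmarks (.2, .5, .8) as lower cut-offs in `cohen_label` is also an assumption made for illustration.

```python
import math

def cohens_d(m1, s1, n1, m2, s2, n2):
    """Difference between the means in units of the pooled standard deviation."""
    pooled_var = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)
    return (m1 - m2) / math.sqrt(pooled_var)

def cohen_label(d):
    """Rough verbal label, using Cohen's benchmarks as lower cut-offs (an assumption)."""
    size = abs(d)
    if size >= 0.8:
        return "large"
    if size >= 0.5:
        return "medium"
    if size >= 0.2:
        return "small"
    return "below small"

d = cohens_d(11.90, 3.28, 20, 9.25, 3.16, 20)
print(f"Cohen's d = {d:.2f} ({cohen_label(d)})")
```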

Cohen’s classification of effect size

Complete report of your test

“The scores of the Caffeine group (M = 11.90; SD = 3.28) were higher than those of the Placebo group (M = 9.25; SD = 3.16). With an alpha-level of 0.05, the difference is significant: t(38) = 2.60; p = .0132 (two-tailed). Cohen’s d = .82, a ‘large’ effect.”

Coffee break

The analysis of nominal data

Nominal data

• A NOMINAL data set consists of records of membership of the categories making up QUALITATIVE VARIABLES, such as gender or blood group.

• Nominal data must be distinguished from SCALAR, CONTINUOUS or INTERVAL data, which are measurements of QUANTITATIVE variables on an independent scale with units.

• Nominal data sets merely carry information about the frequencies of observations in different categories.

A set of nominal data

• A medical researcher wishes to test the hypothesis that people with a certain type of body tissue (Critical) are more likely to have a potentially harmful antibody.

• Data are obtained on 79 people, who are classified with respect to 2 attributes:

1. Tissue Type;

2. Presence/Absence of the antibody.

A question of association

• Do more of the people in the critical group have the antibody?

• We are asking whether there is an ASSOCIATION between the variables of category membership (tissue type) and presence/absence of the antibody.

• The SCIENTIFIC hypothesis is that there is such an association.

The null hypothesis

• The NULL HYPOTHESIS is the negation of the scientific hypothesis.

• The null hypothesis states that there is NO association between tissue type and presence of the antibody.

Contingency tables (cross-tabulations)

• When we wish to investigate whether an association exists between qualitative or categorical variables, the starting point is usually a display known as a CONTINGENCY TABLE, whose rows and columns represent the categories of the qualitative variables we are studying.

• Contingency tables are also known as CROSS-TABULATIONS, or CROSSTABS.

The equivalent of a scatterplot

• The contingency table is the equivalent, for use with nominal data, of the scatterplot that is used to display bivariate continuous data sets.

A contingency table

Interpretation

• Is there an association between Tissue Type and Presence of the antibody?

• The antibody is indeed more in evidence in the ‘Critical’ tissue group.

• It looks as if there may be an association.

Some terms

Observed and expected cell frequencies

• Let O be the frequency of observations in a cell of the contingency table.

• From the marginal totals, we calculate the cell frequencies E that we should expect if there were NO ASSOCIATION between the two attributes Tissue Type and Presence/Absence of the antibody.

Testing the null hypothesis

• We test the null hypothesis by comparing the values of O and E.

• Large (O – E ) differences cast doubt upon the null hypothesis of no association.

What cell frequencies can be expected?

• The pattern of the OBSERVED FREQUENCIES (O) would suggest that there is a greater incidence of the antibody in the Critical tissue group.

• But the marginal totals showing the frequencies of the various groups in the sample also vary.

• What cell frequencies would we expect under the independence hypothesis?

More terms

Expected cell frequencies (E)

• According to the null hypothesis, the presence of the antibody and membership of a particular tissue type are independent events.

• The probability of the joint occurrence of independent events is the product of their separate probabilities. (See the appendix of the first talk.)

• On this basis, we find the expected frequencies (E) by multiplying together the marginal totals that intersect at the cells concerned and dividing by the total number of observations.
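
The rule can be sketched in a few lines of Python. The 4 × 2 table of counts below is HYPOTHETICAL (it is not the antibody data set, which has 79 cases); it is there only to show the row total × column total ÷ N calculation.

```python
# Hypothetical observed frequencies: rows are tissue types, columns are (No, Yes).
observed = [
    [5, 15],   # Critical
    [12, 8],   # A
    [14, 6],   # B
    [13, 7],   # C
]

def expected_frequencies(table):
    """E for each cell = (row total x column total) / total number of observations."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]

expected = expected_frequencies(observed)
for obs_row, exp_row in zip(observed, expected):
    print(obs_row, [round(e, 2) for e in exp_row])
```

The expected frequencies reproduce the marginal totals of the observed table, as they must.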

Formula for E

Example calculation of E

Marked (O – E ) differences

• In both cells of the Critical group, there seem to be large differences between O and E: there are many fewer No’s than expected and many more Yes’s.

The chi-square (χ²) statistic

• We need a statistic which compares the differences between the O and E, so that a large value will cast doubt upon the null hypothesis of independence.

• The approximate CHI-SQUARE (χ²) statistic fits the bill.

Formula for chi-square

• The element of this summation expresses the square of the difference between O and E as a proportion of E.

• Add up these proportional squared differences for all the cells in the contingency table.
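
As a hedged sketch, the summation described above can be written as follows. The counts are HYPOTHETICAL and are not the antibody data; the function name is illustrative. The degrees of freedom are computed by the usual rule for an R × C table, (R – 1)(C – 1).

```python
def chi_square(table):
    """Sum of (O - E)^2 / E over all cells, plus df = (R - 1)(C - 1)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / n   # expected frequency under independence
            statistic += (o - e) ** 2 / e
    df = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, df

# Hypothetical 4 x 2 table: rows are tissue types, columns are (No, Yes).
observed = [[5, 15], [12, 8], [14, 6], [13, 7]]
chi2, df = chi_square(observed)
print(f"chi-square({df}) = {chi2:.2f}")
```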

The value of chi-square

There are 8 terms in the summation, but only the first two and the last are shown in the calculation below.

Degrees of freedom

• To decide whether a given value of chi-square is significant, we must specify the DEGREES OF FREEDOM df of the chi-square statistic.

• If a contingency table has R rows and C columns, the degrees of freedom is given by

• df = (R – 1)(C – 1)

• In our example, R = 4, C = 2 and so

• df = (4 – 1)(2 – 1) = 3.

Significance

• The p-value of a chi-square with a value of 10.655 in the chi-square distribution with three degrees of freedom is .014.

• We should write this result as: χ²(3) = 10.66; p = .014 .

• Since the result is significant beyond the .05 level, we have evidence against the null hypothesis of independence and evidence for the scientific hypothesis.
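
The p-value can be checked without a statistics package, because the chi-square distribution with exactly three degrees of freedom has a closed-form upper-tail probability. The sketch below relies on that special case only, so the (illustrative) function `chi2_sf_df3` is NOT valid for other degrees of freedom.

```python
import math

def chi2_sf_df3(x):
    """Upper-tail probability P(chi-square > x) for exactly 3 degrees of freedom."""
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)

p = chi2_sf_df3(10.655)
print(f"p = {p:.3f}")
print("significant beyond the .05 level:", p < 0.05)
```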

The odds and the odds ratio

A 2 × 2 contingency table

The odds

The odds and probability

• Like the probability p, the odds is a measure of likelihood. The two measures are related according to odds = p / (1 – p) or, equivalently, p = odds / (1 + odds).

Example of a calculation of the odds

• The odds in favour of the antibody in the Critical group are

The odds ratio

• The ODDS RATIO (OR ) compares the odds in favour of an event between two groups.

Example

• Moving to the critical group multiplies the odds in favour of the antibody being present by nearly five.
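
A short sketch shows how the odds and the odds ratio come out of a 2 × 2 table. The counts are HYPOTHETICAL, chosen for easy arithmetic, so the resulting odds ratio is not the value obtained for the antibody data. The function names are illustrative.

```python
def odds(yes, no):
    """Odds in favour of an event: cases with the event divided by cases without."""
    return yes / no

def odds_to_probability(o):
    """Convert odds back to a probability: p = odds / (1 + odds)."""
    return o / (1 + o)

# Hypothetical 2 x 2 table: (antibody present, antibody absent).
critical = (15, 5)
other = (12, 28)

odds_critical = odds(*critical)
odds_other = odds(*other)
odds_ratio = odds_critical / odds_other

print(f"odds (Critical) = {odds_critical:.2f}")
print(f"odds (Other)    = {odds_other:.2f}")
print(f"odds ratio      = {odds_ratio:.2f}")
print(f"p (Critical)    = {odds_to_probability(odds_critical):.2f}")
```

An odds ratio of 1 would mean that the odds are the same in both groups.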

APPENDIX

Confidence Intervals

Confidence intervals

• A CONFIDENCE interval is a range of values centred on the value of the sample statistic and which one can assume with a specified level of “confidence” includes the true value of the parameter.

Sampling distribution of the mean

Equivalent probability statement

• An expression with terms such as < is known as an INEQUALITY.

• There are special rules for manipulating inequalities.

Inference about the population mean

• Notice that the population mean is now at the centre of the inequality and the sample mean is in the terms denoting the lower and upper limits of the interval.

• We have changed a statement about the sample mean to one about the population mean.

The 95% confidence interval on the sample mean

• You can be 95% “confident” that the value of the population mean lies within this range.

Example

• A sample of 100 people has a mean height of 69.8 inches.

• Suppose (very unrealistically) that we know that the population SD is 3.2 inches, but we don’t know the value of the population mean.

• Construct the 95% confidence interval on the mean.

The first step

• Calculate the standard error of the mean.

The 95% confidence interval

• You can be 95% confident that the population mean lies within this range.
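
The two steps of the height example can be sketched as follows, using the figures given (n = 100, sample mean 69.8 inches, known population SD 3.2 inches). The function name is illustrative, and the normal-based interval applies only because the population SD is assumed to be known.

```python
import math
from statistics import NormalDist

def z_confidence_interval(mean, sigma, n, level=0.95):
    """Confidence interval on the population mean when sigma is known."""
    se = sigma / math.sqrt(n)                    # standard error of the mean
    z = NormalDist().inv_cdf(0.5 + level / 2)    # about 1.96 for a 95% interval
    return mean - z * se, mean + z * se

lower, upper = z_confidence_interval(69.8, 3.2, 100)
print(f"95% CI: [{lower:.2f}, {upper:.2f}]")
```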

Using the confidence interval to test the null hypothesis

• Notice that the 95% confidence interval on the mean, that is, [69.17, 70.43], does not include the value 69.

• If the confidence interval does not include the value specified by the null hypothesis, the hypothesis can be rejected.

• The two approaches lead to exactly the same decision about the null hypothesis.
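
A brief sketch illustrates the agreement between the two approaches for the height example, taking 69 inches as the value specified by the null hypothesis and assuming a two-tailed z test at the .05 level (the population SD being treated as known).

```python
import math
from statistics import NormalDist

mean, sigma, n, null_value = 69.8, 3.2, 100, 69.0
se = sigma / math.sqrt(n)

# Approach 1: the z test of H0 (population mean = 69).
z = (mean - null_value) / se
p_two_tailed = 2 * (1 - NormalDist().cdf(abs(z)))
reject_by_test = p_two_tailed < 0.05

# Approach 2: does the 95% confidence interval include 69?
z_crit = NormalDist().inv_cdf(0.975)
lower, upper = mean - z_crit * se, mean + z_crit * se
reject_by_interval = not (lower <= null_value <= upper)

print(f"z = {z:.2f}; p = {p_two_tailed:.4f}; reject H0: {reject_by_test}")
print(f"95% CI = [{lower:.2f}, {upper:.2f}]; reject H0: {reject_by_interval}")
```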

Interpretation of a confidence interval

• The 95% confidence interval on our sample mean is [69.17, 70.43].

• We cannot say, “The probability that the mean lies between 69.17 and 70.43 is .95”. A confidence interval is not a sample space. (See the appendix to my first talk.)

• A classical probability refers to a hypothetical future. Here, the die has already been cast and either the interval fell over the population mean or it didn’t. In view of the manner in which the interval was constructed, however, we can be “95% confident” that it fell over the true value of the population mean.