Hypothesis testing Behavioural Science II Week 1, Semester 2, 2002

Hypothesis testing

Behavioural Science II

Week 1, Semester 2, 2002

Hypothesis testing

• Null hypothesis is that there is no systematic relationship between independent variables (IVs) and dependent variables (DVs).

• Research hypothesis is that any relationship observed in the data is real.

• Whereas the research hypothesis tends to be imprecise about numerical differences between groups (e.g., the difference in reaction times), the null hypothesis states very specifically that the difference should be zero.

Null hypothesis versus alternative hypothesis

• The null hypothesis assumes that scores for different levels of the IV are random samples from the same population.

• The alternative hypothesis is that samples come from different populations.

• For any single experiment, we are bound to see a difference, just as we see a difference between the means of two random samples in a distribution of sample means.

• If the null hypothesis is true, then the two mean scores differ only because they come from two random samples drawn from the same population.

Testing the null hypothesis

• A statistical test assesses the probability of obtaining the observed sample or samples of scores (or ones more extreme), assuming the null hypothesis is correct.

• If the probability is low enough (e.g., p<.05), then the null hypothesis is rejected in favour of the alternative (research) hypothesis, and the IV is deemed to have a systematic effect.

• If the probability is not sufficiently low (e.g., p>.05), then the null hypothesis is not rejected but retained: the observed differences are treated as consistent with chance, which is not the same as showing that the IV has no effect.

Statistical significance

• Statistical significance refers to the probability of the data obtained, given that the null hypothesis is true.

• A statistically significant result does not mean that the null hypothesis is improbable.

• Statistical significance is not the same as substantive significance: a result can be statistically significant yet too small to matter in practice.

Hypothesis testing and sampling distributions

• The decision to reject or not reject the null hypothesis usually is made with reference to the sampling distribution of a statistic of some kind (e.g., z-distribution, t-distribution).

Example of hypothesis testing using z-distribution

• Null hypothesis population parameters: $\mu = 100$, $\sigma = 15$

• Random sample statistics: Mean = 110, N = 9

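Applying formulae

$$\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{N}} = \frac{15}{\sqrt{9}} = \frac{15}{3} = 5$$

$$z = \frac{\bar{X} - \mu}{\sigma_{\bar{X}}} = \frac{110 - 100}{5} = \frac{10}{5} = 2$$

• Given that a z-score of 1.96 corresponds to p = .05 (two-tailed) and the obtained z of 2 exceeds it (p < .05), we would reject the null hypothesis.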
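The z-test arithmetic above can be checked with a short script. This is a minimal sketch using only the Python standard library; the function name `z_test` is a hypothetical helper, and the inputs (null mean 100, population standard deviation 15, sample mean 110, N = 9) are the values used in the worked example.

```python
from math import sqrt
from statistics import NormalDist

def z_test(sample_mean, mu, sigma, n):
    """Return (z, two-tailed p) for a sample mean under the null hypothesis."""
    standard_error = sigma / sqrt(n)          # sigma_xbar = sigma / sqrt(N)
    z = (sample_mean - mu) / standard_error   # distance from the null mean in SE units
    p = 2 * (1 - NormalDist().cdf(abs(z)))    # area in both tails beyond |z|
    return z, p

z, p = z_test(sample_mean=110, mu=100, sigma=15, n=9)
print(f"z = {z:.2f}, two-tailed p = {p:.4f}")
print("reject null" if p < 0.05 else "retain null")
```

Working from the p-value rather than the 1.96 cut-off gives the same decision; the cut-off is simply the z-score at which the two-tailed p equals .05.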

Example of hypothesis testing using t-distribution

• Null hypothesis population parameters: $\mu = 100$

• Random sample statistics: Mean = 110, N = 9, $\sum x^2 = 960$

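Applying formulae

$$\tilde{\sigma} = \sqrt{\frac{\sum x^2}{N - 1}} = \sqrt{\frac{960}{9 - 1}} = \sqrt{\frac{960}{8}} = 10.95$$

$$\tilde{\sigma}_{\bar{X}} = \frac{\tilde{\sigma}}{\sqrt{N}} = \frac{10.95}{\sqrt{9}} = \frac{10.95}{3} = 3.65$$

$$t = \frac{\bar{X} - \mu}{\tilde{\sigma}_{\bar{X}}} = \frac{110 - 100}{3.65} = \frac{10}{3.65} = 2.74$$

• Given that a t-score of 2.306 (df = 8) corresponds to p = .05 (two-tailed) and the obtained t of 2.74 exceeds it (p < .05), we would reject the null hypothesis.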
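The same steps for the t-test can be sketched in Python. The helper `one_sample_t` is a hypothetical name, and the sketch assumes that the sum of squares of 960 is the sum of squared deviations from the sample mean, as the division by N − 1 suggests. The critical value 2.306 is copied from the slide rather than computed, because the standard library has no t-distribution.

```python
from math import sqrt

def one_sample_t(sample_mean, mu, sum_sq_dev, n):
    """Return (t, df) using the sum of squared deviations from the sample mean."""
    sd_estimate = sqrt(sum_sq_dev / (n - 1))   # estimated population SD
    standard_error = sd_estimate / sqrt(n)     # estimated SE of the mean
    t = (sample_mean - mu) / standard_error
    return t, n - 1

t, df = one_sample_t(sample_mean=110, mu=100, sum_sq_dev=960, n=9)
T_CRITICAL = 2.306   # two-tailed .05 cut-off for df = 8, taken from the slide
print(f"t({df}) = {t:.2f}")
print("reject null" if abs(t) > T_CRITICAL else "retain null")
```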

Hypothesis testing using confidence intervals

• We reject the null hypothesis when the null population mean lies outside the confidence interval.

• We infer that the alternative population mean is higher than the null population mean if the lower limit of the confidence interval lies to the right of the null population mean, and lower if the upper limit lies to the left of it.
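The confidence-interval rule can be illustrated with the numbers from the z-distribution example (sample mean 110, population standard deviation 15, N = 9, null mean 100). This is a sketch under those assumptions; `z_confidence_interval` and `ci_decision` are hypothetical helper names, and a 95% interval is assumed to match the two-tailed .05 level.

```python
from math import sqrt
from statistics import NormalDist

def z_confidence_interval(sample_mean, sigma, n, confidence=0.95):
    """Return (lower, upper) limits of a z-based confidence interval for the mean."""
    z_crit = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # about 1.96 for 95%
    margin = z_crit * sigma / sqrt(n)
    return sample_mean - margin, sample_mean + margin

def ci_decision(null_mean, lower, upper):
    """Apply the confidence-interval rule described above."""
    if lower > null_mean:
        return "reject null: alternative mean is higher"
    if upper < null_mean:
        return "reject null: alternative mean is lower"
    return "retain null"

lower, upper = z_confidence_interval(sample_mean=110, sigma=15, n=9)
print(f"95% CI: [{lower:.1f}, {upper:.1f}]")
print(ci_decision(100, lower, upper))
```

The lower limit sits just above 100, which mirrors the z-score of 2 being only slightly beyond the 1.96 cut-off.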

Errors in hypothesis testing

• Given the gap between statistical and substantive significance, a decision based on probability to retain or reject the null hypothesis can be wrong.

When null hypothesis is true (Type I error)

• When the null hypothesis is true and it is rejected, this decision is called a Type I error.

• The probability of making such an error is designated alpha ($\alpha$) and is equivalent to the significance level (e.g., .05).

• If the null hypothesis is true and the alpha level is set at .05, then the null hypothesis will be rejected 5% of the time even though it is true.

• One way to safeguard against a Type I error is to set a more stringent alpha level (e.g., p<.01).
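The claim that a true null hypothesis is rejected about 5% of the time at alpha = .05 can be checked by simulation. The sketch below repeatedly draws samples from the null population itself and counts how often a two-tailed z-test rejects. The population values (mean 100, standard deviation 15, N = 9) are borrowed from the z-distribution example; the function name, number of simulated experiments, and seed are arbitrary assumptions.

```python
import random
from math import sqrt

def type_i_error_rate(n_experiments=20000, n=9, mu=100, sigma=15, z_crit=1.96, seed=1):
    """Estimate how often a true null hypothesis is rejected by a two-tailed z-test."""
    rng = random.Random(seed)
    standard_error = sigma / sqrt(n)
    rejections = 0
    for _ in range(n_experiments):
        # Draw a sample from the null population, so the null is true by construction.
        sample_mean = sum(rng.gauss(mu, sigma) for _ in range(n)) / n
        if abs((sample_mean - mu) / standard_error) > z_crit:
            rejections += 1
    return rejections / n_experiments

rate_05 = type_i_error_rate(z_crit=1.96)    # alpha = .05
rate_01 = type_i_error_rate(z_crit=2.576)   # alpha = .01, more stringent
print(f"alpha .05: {rate_05:.3f}, alpha .01: {rate_01:.3f}")
```

The estimated rates should land close to .05 and .01 respectively, showing how the stricter cut-off reduces false rejections.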

When null hypothesis is false (Type II or III errors)

• When alternative hypothesis is true, and the statistic (mean) from alternative distribution falls within cut-off points (i.e., p>.05), then null hypothesis would be retained.

Type II error

• Retaining null hypothesis when alternative hypothesis is true is called a Type II error.

• The probability of making a Type II error usually is symbolized as beta ($\beta$).

• The size of beta depends on how much the sampling distribution under the alternative hypothesis overlaps the retention region of the sampling distribution under the null hypothesis.

Type III error

• It is also possible to make a Type III error, by rejecting a null hypothesis but inferring the incorrect alternative hypothesis.

• The probability of making a Type III error usually is symbolized as gamma ($\gamma$) and is equivalent to whatever proportion of the alternative sampling distribution falls in the rejection region at the far end of the null hypothesis distribution (the tail on the opposite side from the true alternative). This probability is usually quite small.
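For a two-tailed z-test, the Type II and Type III error probabilities can be computed directly once a specific alternative mean is assumed. The sketch below reuses the null population from the z-distribution example (mean 100, standard deviation 15, N = 9) and assumes, purely for illustration, that the true mean is 110. The function name is hypothetical, and the case where the two means are equal is not handled.

```python
from math import sqrt
from statistics import NormalDist

def error_probabilities(mu_null, mu_alt, sigma, n, alpha=0.05):
    """Return (beta, gamma, power) for a two-tailed z-test when mu_alt is the true mean."""
    se = sigma / sqrt(n)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    lower_cut, upper_cut = mu_null - z_crit * se, mu_null + z_crit * se
    alt = NormalDist(mu_alt, se)            # sampling distribution under the alternative
    below, above = alt.cdf(lower_cut), 1 - alt.cdf(upper_cut)
    beta = 1 - below - above                # sample mean lands in the retention region
    # Rejecting in the tail on the wrong side of mu_null is the Type III error.
    gamma = below if mu_alt > mu_null else above
    power = 1 - beta - gamma                # reject and infer the correct direction
    return beta, gamma, power

beta, gamma, power = error_probabilities(mu_null=100, mu_alt=110, sigma=15, n=9)
print(f"beta = {beta:.3f}, gamma = {gamma:.5f}, power = {power:.3f}")
```

With these assumed values gamma is tiny compared with beta, which is consistent with the remark that Type III errors are usually rare.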

The power of a test

• The power of a test is the probability of rejecting a false null hypothesis and correctly inferring the position or direction of the alternative hypothesis with respect to the null hypothesis.

• Factors affecting power and error rates

Power is affected by significance (alpha) level

• Setting a less stringent significance level (e.g., .10 rather than .05) widens the rejection region and so increases the power of the test, as long as the alternative hypothesis is true. The cost is a higher Type I error rate.

Power is affected by magnitude of difference between sample means

• Increasing the difference between the means at different levels of the IV increases the power of the test.

Power is affected by sample size

• An increase in sample size increases the power of the test, if the alternative hypothesis is true.

• This is because as sample size increases, the standard error of the mean decreases, thus reducing the overlap between the sampling distributions under the null and alternative hypotheses.
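The effect of alpha level, size of the difference, and sample size on power can be seen in a small calculation. This is a sketch for a two-tailed z-test with power defined as rejecting in the correct direction; the function name is hypothetical, and the null mean of 100, alternative mean of 110, and standard deviation of 15 are illustrative assumptions carried over from the earlier example.

```python
from math import sqrt
from statistics import NormalDist

def power_two_tailed_z(mu_null, mu_alt, sigma, n, alpha=0.05):
    """Probability of rejecting the null in the tail on the side of the true mean."""
    se = sigma / sqrt(n)                       # shrinks as n grows
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    shift = abs(mu_alt - mu_null) / se         # separation of the two distributions in SE units
    return 1 - NormalDist().cdf(z_crit - shift)

for n in (9, 16, 25, 36):
    print(n, round(power_two_tailed_z(100, 110, 15, n), 3))

# Same sample size, but a less stringent alpha or a larger difference between means.
print("alpha .10:", round(power_two_tailed_z(100, 110, 15, 9, alpha=0.10), 3))
print("difference of 15:", round(power_two_tailed_z(100, 115, 15, 9), 3))
```

Each change (larger N, larger alpha, larger difference) pushes the alternative sampling distribution further past the cut-off in standard-error units, which is why power rises in all three cases.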

Effect size

• In order to gauge the effect of the IV, it makes sense to contrast the difference between the population mean for the null hypothesis and the population mean for the alternative hypothesis.

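Effect size formula

$$\text{Effect size} = \frac{\mu_0 - \mu_1}{\sigma}$$

• where $\mu_0$ and $\mu_1$ are the population means under the null and alternative hypotheses, and $\sigma$ is the standard deviation of the population of dependent measure scores.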

Judging effect sizes

• According to Cohen (1988):
  .20 = small effect size
  .50 = medium effect size
  .80 = large effect size
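As an illustration, the effect size for a null mean of 100, an alternative mean of 110, and a population standard deviation of 15 (values assumed from the earlier z-distribution example) can be computed and labelled as follows. The sketch reports the magnitude of the standardised difference and treats Cohen's benchmarks as lower bounds for each label, which is a simplifying assumption; the function names and the "below small" label are hypothetical.

```python
def effect_size(mu_null, mu_alt, sigma):
    """Standardised difference between null and alternative population means (magnitude)."""
    return abs(mu_null - mu_alt) / sigma

def cohen_label(d):
    """Label an effect size using the Cohen (1988) benchmarks quoted above."""
    if d >= 0.80:
        return "large"
    if d >= 0.50:
        return "medium"
    if d >= 0.20:
        return "small"
    return "below small"

d = effect_size(mu_null=100, mu_alt=110, sigma=15)
print(f"effect size = {d:.2f} ({cohen_label(d)})")
```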

Do we really need the null hypothesis?

• A significant test of the null hypothesis does not mean the data are not a product of chance.

• The significant result may simply be a Type I error (falsely rejecting null hypothesis).

• It is better to test the research hypothesis directly, if the size and direction of the effect are known.

• Even better is to report a combination of outcome values (e.g., effect sizes, confidence intervals, strength of relationship).

One-tailed versus two-tailed tests

• Conventionally reject null hypothesis if obtained z-score or t-score falls beyond certain values in either tail of the relevant sampling distribution (i.e., a two-tailed test).

• In specific contexts, a one-tailed test might seem appropriate (e.g., reject the null hypothesis only if the test statistic falls in the 5% left-hand tail of the distribution).

• Generally, two-tailed tests are preferred to one-tailed tests.

• The IV may have an effect in opposite direction to the one predicted.
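The difference between the two kinds of test comes down to where the cut-off sits. The sketch below derives z cut-offs for alpha = .05 and then applies both rules to a hypothetical z-score of −2.0, i.e. an effect in the opposite direction to an upper-tail prediction. The function name and the example z-score are assumptions for illustration.

```python
from statistics import NormalDist

def critical_z(alpha=0.05, tails=2):
    """Positive z cut-off that leaves alpha in one tail or alpha/2 in each of two tails."""
    if tails not in (1, 2):
        raise ValueError("tails must be 1 or 2")
    return NormalDist().inv_cdf(1 - alpha / tails)

two_tailed = critical_z(tails=2)   # reject if |z| exceeds this
one_tailed = critical_z(tails=1)   # reject only if z falls beyond this in the predicted tail
print(f"two-tailed: {two_tailed:.3f}, one-tailed: {one_tailed:.3f}")

z = -2.0   # hypothetical result in the opposite direction to an upper-tail prediction
print("two-tailed rejects:", abs(z) > two_tailed)
print("one-tailed (upper) rejects:", z > one_tailed)
```

The one-tailed cut-off is easier to exceed in the predicted direction, but an effect in the unpredicted direction goes undetected, which is the reason given above for preferring two-tailed tests.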