Lecture 2: Null Hypothesis Significance Testing Continued Laura McAvinue School of Psychology Trinity College Dublin


Null Hypothesis Significance Testing

• Previous lecture, Steps of NHST

– Specify the alternative/research hypothesis

– Set up the null hypothesis

– Collect data

– Run the appropriate statistical test

– Obtain the test statistic and associated p value

– Decide whether to reject or fail to reject the null hypothesis on the basis of p value


• Decision to reject or fail to reject Ho

– P value

– Probability of obtaining the observed results, or results more extreme, if Ho is true

– By convention, use the significance level of p < .05

– Conclude that it is highly unlikely that we would obtain these results by chance, so we reject Ho

– Caveat! The fact that there is a significance level does not mean that there is a simple ‘yes’ or ‘no’ answer to your research question


• If you obtain results that are not statistically significant (p>.05), this does not necessarily mean that the relationship you are interested in does not exist

• There are a number of factors that affect whether your results come out as statistically significant

– One and two-tailed tests

– Type I and Type II errors

– Power

One and Two-tailed Tests

• One-tailed / Directional Test

– Run this when you have a prediction about the direction of the results

• Two-tailed / Non-Directional Test

– Run this when you don’t have a prediction about the direction of the results

Recall previous example…

• Research Question

– Do anxiety levels of students differ from anxiety levels of young people in general?

• Prediction

– Due to the pressure of exams and essays, students are more stressed than young people in general

• Method

– You know the mean score for the normal young population on the anxiety measure = 50

– You predict that your sample will have mean > 50

– Run a one-tailed one-sample t test at p < .05 level

One-tailed Test

• Compare the mean of your sample to the sampling distribution for the population mean

• Decide to reject Ho if your sample mean falls into the highest 5% of the sampling distribution
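
The one-tailed decision rule for the anxiety example can be sketched in Python using only the standard library. The ten scores below are hypothetical, invented purely for illustration; the population mean of 50 comes from the example, and 1.833 is the tabled one-tailed critical t value for df = 9 at the .05 level.

```python
import math
import statistics

# Hypothetical anxiety scores for a sample of 10 students (not real data).
scores = [52, 58, 49, 61, 55, 47, 60, 53, 57, 56]
population_mean = 50  # mean of the general young population (from the example)

n = len(scores)
sample_mean = statistics.mean(scores)
standard_error = statistics.stdev(scores) / math.sqrt(n)

# One-sample t statistic: how many standard errors the sample mean lies above 50.
t_statistic = (sample_mean - population_mean) / standard_error

# Tabled critical value for a one-tailed test at the .05 level with df = n - 1 = 9.
critical_t = 1.833
reject_h0 = t_statistic > critical_t

print(f"mean = {sample_mean:.1f}, t({n - 1}) = {t_statistic:.2f}, reject Ho: {reject_h0}")
```

Because the test is one-tailed, only a t statistic in the upper tail can lead to rejection; a very low sample mean would never reject Ho under this rule.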

Dilemma

• But! What if your prediction is wrong?

– Perhaps students are less stressed than the general young population

• Their own bosses, summers off, no mortgages

– With previous one-tailed test, you could only reject Ho if you got an extremely high sample mean

– What if you get an extremely low sample mean?

• Run a two-tailed test

– Hedge your bets

– Reject Ho if you obtain scores at either extreme of the distribution, very high or very low sample mean

Two-tailed Test

• You will reject Ho when a score appears in the highest 2.5% of the distribution or the lowest 2.5%

• Note that it’s not the highest 5% and the lowest 5% as then you’d be operating at p = .1 level, rejecting Ho for 10% of the distribution

• So, we gain ability to reject Ho for extreme values at either end but values must be more extreme
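
To see why values must be more extreme in a two-tailed test, the cut-offs can be compared directly. This is a small sketch that uses the standard normal distribution as a stand-in for the sampling distribution (an assumption made for simplicity; a t test would use slightly larger cut-offs that depend on df).

```python
from statistics import NormalDist

alpha = 0.05
z = NormalDist()  # standard normal distribution (mean 0, SD 1)

# One-tailed: all 5% of the rejection region sits in the upper tail.
one_tailed_cutoff = z.inv_cdf(1 - alpha)

# Two-tailed: 2.5% in each tail, so the cut-off moves further from the centre.
two_tailed_cutoff = z.inv_cdf(1 - alpha / 2)

print(f"one-tailed: reject Ho if z > {one_tailed_cutoff:.3f}")
print(f"two-tailed: reject Ho if z > {two_tailed_cutoff:.3f} or z < {-two_tailed_cutoff:.3f}")
```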

Errors in NHST

• Howell (2008) p. 157

– “Whenever we reach a decision with a statistical test, there is always a chance that our decision is the wrong one”

• Misleading nature of NHST

– Because there is a significance level (p = .05), people interpret NHST as a definitive exercise

– Results are statistically significant or not

– We reject Ho or we don’t

– The Ho is wrong or right

Errors in NHST

• Remember we are dealing with probabilities

– We make our decision on the basis of the likelihood of obtaining the results if Ho is true

– There is always the chance that we are making an error

• Two kinds of Error

– We reject Ho when it is true (Type I error)

• We say there’s a significant difference when there’s not

– We fail to reject Ho when it is false (Type II error)

• We say there is no significant difference when there is

Type I Error

• Our anxiety example

• Predict that students will have a greater anxiety score than young people in general

• Test Ho that students’ anxiety levels do not differ from young people

• One-tailed one sample t-test at p < .05

• Compare sample mean with sampling distribution of mean for the population (Ho)


• Decide to reject Ho if your sample mean falls in the top 5% of the distribution

• But!

• This 5%, even though at the extreme end, still belongs to the distribution

• If your sample mean falls within this top 5%, there is still a chance that your sample came from the Ho population


• For example, if p = .04, a sample mean this extreme would occur only 4% of the time if your sample came from the Ho population

– But this is still a chance, you could be rejecting Ho when it is in fact true

• Researchers are willing to accept this small risk (5%) of making a Type I error, of rejecting Ho when it is in fact true

• Probability of making Type I error = alpha (α) = the significance level that you chose

– .05, .01
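
The idea that alpha is the Type I error rate can be illustrated with a small simulation. This is a sketch under stated assumptions: every simulated study draws its sample from the Ho population itself (mean 50, as in the example), the population standard deviation of 10 and the sample size of 25 are invented for illustration, and a one-tailed z test with known standard deviation stands in for the t test.

```python
import math
import random
from statistics import NormalDist

random.seed(1)  # fixed seed so the run is repeatable

MU_0 = 50          # population mean under Ho (from the example)
SIGMA = 10         # assumed population standard deviation
N = 25             # assumed sample size per study
ALPHA = 0.05
N_STUDIES = 10_000

# Sample means above this value fall in the top 5% of the Ho sampling distribution.
standard_error = SIGMA / math.sqrt(N)
critical_mean = MU_0 + NormalDist().inv_cdf(1 - ALPHA) * standard_error

false_rejections = 0
for _ in range(N_STUDIES):
    # Ho is true here: the sample really does come from the Ho population.
    sample = [random.gauss(MU_0, SIGMA) for _ in range(N)]
    if sum(sample) / N > critical_mean:
        false_rejections += 1  # a Type I error

type_i_rate = false_rejections / N_STUDIES
print(f"Proportion of studies rejecting a true Ho: {type_i_rate:.3f}")
```

Across many such studies the proportion of false rejections should settle close to the chosen alpha of .05.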

Type II Error

• So why not set a very low significance level to minimise your risk of making a Type I error?

– Set p < .01 rather than p < .05

• As you decrease the probability of making a Type I error you increase the probability of making a Type II error

• Type II Error

– Fail to reject Ho when it is false

– Fail to detect a significant relationship in your data when a true relationship exists

• For argument’s sake, imagine that H1 is correct

• Sampling Distribution under Ho

• Sampling Distribution under H1

• Reject Ho if sample mean equals any value to the right of the critical value (red region)

– Correct Decision

• Fail to reject Ho if sample mean equals any value to the left of the critical value

– Type II Error
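
The trade-off between the two kinds of error can be worked through numerically. This is a minimal sketch, assuming normal sampling distributions with a known standard deviation; the H1 mean of 55, the standard deviation of 10 and the sample size of 25 are hypothetical values chosen for illustration. The probability of a Type II error is the area of the H1 sampling distribution that lies to the left of the critical value.

```python
import math
from statistics import NormalDist

MU_0, MU_1 = 50, 55   # Ho mean from the example; the H1 mean is an assumption
SIGMA, N = 10, 25     # assumed population SD and sample size
standard_error = SIGMA / math.sqrt(N)

def type_ii_error(alpha):
    """Type II error rate for a one-tailed test: H1 area left of the critical value."""
    critical_value = NormalDist(MU_0, standard_error).inv_cdf(1 - alpha)
    return NormalDist(MU_1, standard_error).cdf(critical_value)

for alpha in (0.05, 0.01):
    beta = type_ii_error(alpha)
    print(f"alpha = {alpha:.2f}: Type II error = {beta:.3f}, correct rejection = {1 - beta:.3f}")
```

Moving from .05 to .01 pushes the critical value to the right, so more of the H1 distribution falls on the "fail to reject" side.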

Four Outcomes of Decision Making

                     True State of Nature
Decision             Ho is True          Ho is False
Fail to reject Ho    Correct Decision    Type II Error
Reject Ho            Type I Error        Correct Decision

Power

• You should minimise both Type I and Type II errors

– In reality, people are often very careful about Type I (i.e. strict about α) but ignore Type II altogether

• If you ignore Type II error, your experiment could be doomed before it begins

– Even if a true effect exists (i.e. H1 is correct), if β (the probability of a Type II error) is high, the results may not show a statistically significant effect

• How do you reduce the probability of a Type II error?

– Increase the power of the experiment

Power

• Power

– The probability of correctly rejecting a false Ho

– A measure of the ability of your experiment to detect a significant effect when one truly exists

– Power = 1 - β, where β is the probability of a Type II error

How do we increase the power of our experiment?

• Factors affecting power

– The significance level (α)

– One-tailed v two-tailed test

– The true difference between Ho and H1 (μo - μ1)

– Sample Size (n)
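
The four factors listed above can be explored with a single hypothetical helper, `power`, sketched here for a one-sample z test with known standard deviation (a simplification of the t test). All numeric values other than the Ho mean of 50 are assumptions chosen for illustration.

```python
import math
from statistics import NormalDist

def power(mu_0, mu_1, sigma, n, alpha=0.05, tails=1):
    """Power of a one-sample z test when the true mean is mu_1 (assumes mu_1 > mu_0)."""
    standard_error = sigma / math.sqrt(n)
    z_crit = NormalDist().inv_cdf(1 - alpha / tails)
    h1 = NormalDist(mu_1, standard_error)  # sampling distribution under H1
    upper = 1 - h1.cdf(mu_0 + z_crit * standard_error)
    # A two-tailed test can also reject in the lower tail (a tiny area when mu_1 > mu_0).
    lower = h1.cdf(mu_0 - z_crit * standard_error) if tails == 2 else 0.0
    return upper + lower

baseline = power(50, 55, sigma=10, n=25)
print(f"baseline (alpha .05, one-tailed, n 25): {baseline:.3f}")
print(f"stricter alpha (.01):                   {power(50, 55, 10, 25, alpha=0.01):.3f}")
print(f"two-tailed test:                        {power(50, 55, 10, 25, tails=2):.3f}")
print(f"smaller true difference (52 v 50):      {power(50, 52, 10, 25):.3f}")
print(f"larger sample (n 100):                  {power(50, 55, 10, 100):.3f}")
```

Each line changes one factor relative to the baseline, so the direction of its effect on power can be read off directly.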

The Influence of α on Power

• Reduce the significance level (α)…

– Reduce the probability of making a Type I error

• Rejecting the Ho when it is true

– Increase the probability of making a Type II error

• Failing to reject the Ho when it is false

– Reduce the power of the experiment to detect a true effect as statistically significant

– Reduce the power of the experiment to detect a true effect as statistically significant

Reduce α and reduce power

Increase α and increase power

But! You increase the probability of a Type I error!

The Influence of One v Two-tailed Tests on Power

• We lose power with a two-tailed test

– The significance level (α) is divided across the two tails of the distribution

– Values must be more extreme to be statistically significant

The Influence of the True Difference between Ho and H1

• The bigger the difference between μo and μ1, the easier it is to detect it

The Influence of Sample Size on Power

• The bigger the sample size, the more power you have

• A big sample provides a better estimate of the population mean

• With bigger sample sizes, the sampling distribution for the mean clusters more tightly around the population mean

• Standard deviation of the sampling distribution, known as the standard error of the mean, is reduced

• There is less overlap between the sampling distributions under Ho and H1

• The power to detect a significant difference increases


Sample Size Exercise

• Open the following dataset

– Software / Kevin Thomas / Power dataset (revised)

– Explores the effects of Therapy on Depression

• Perform two Independent Samples t-tests

– Analyse / Compare means / Independent Samples t test

– Group represents Therapy v Control

– Score represents post-treatment depression

– 1. Group1 & Score1

– 2. Group 2 & Score 2

Complete the following table

                              Analysis 1    Analysis 2
Size of sample
Therapy mean score
Therapy standard deviation
Control mean score
Control standard deviation
Mean difference
T statistic
df
P-value

What explains these results?

                              Analysis 1    Analysis 2
Size of sample                20            200
Therapy mean score            5.5           5.5
Therapy standard deviation    3.03          2.89
Control mean score            6.3           6.3
Control standard deviation    2.75          2.62
Mean difference               -.8           -.8
T statistic                   -.618         -2.051
df                            18            198
P-value                       .54           .042
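
The two analyses can be approximately reproduced from the summary statistics alone, which makes the role of sample size visible. This sketch assumes equal group sizes (10 per group and 100 per group, consistent with the reported df of 18 and 198) and equal variances; the function names are hypothetical, and the p value is obtained by numerically integrating the t density with the standard library rather than by calling a statistics package.

```python
import math

def pooled_t(mean_1, sd_1, mean_2, sd_2, n_per_group):
    """Independent-samples t (equal variances, equal group sizes) from summary statistics."""
    pooled_variance = (sd_1 ** 2 + sd_2 ** 2) / 2
    standard_error = math.sqrt(pooled_variance * 2 / n_per_group)
    return (mean_1 - mean_2) / standard_error, 2 * n_per_group - 2

def two_tailed_p(t, df, steps=2000):
    """Two-tailed p value: integrate the t density from 0 to |t| with Simpson's rule."""
    log_const = math.lgamma((df + 1) / 2) - math.lgamma(df / 2) - 0.5 * math.log(df * math.pi)

    def density(x):
        return math.exp(log_const - (df + 1) / 2 * math.log1p(x * x / df))

    h = abs(t) / steps
    total = density(0) + density(abs(t))
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * density(i * h)
    area_0_to_t = total * h / 3
    return 1 - 2 * area_0_to_t

analyses = (("Analysis 1", 3.03, 2.75, 10), ("Analysis 2", 2.89, 2.62, 100))
for label, sd_therapy, sd_control, n in analyses:
    t, df = pooled_t(5.5, sd_therapy, 6.3, sd_control, n)
    print(f"{label}: t({df}) = {t:.3f}, p = {two_tailed_p(t, df):.3f}")
```

The mean difference is the same in both analyses; what changes is the standard error in the denominator of t, which shrinks as the groups grow.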

So, how do I increase the power of my study?

• You can’t manipulate the true difference between Ho and H1

• You could increase your significance level (α) but then you would increase the risk of a Type I error

• If you have a strong prediction about the direction of the results, you should run a one-tailed test

• The factor that is most under your control is sample size

– Increase it!
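
How far to increase the sample can be estimated before collecting data. The sketch below uses the standard normal-approximation formula for a one-tailed, one-sample z test, n = ((z_alpha + z_power) × SD / difference)², rounded up. The function name `required_n`, the standard deviation of 10, the target power of .80 and the assumed true means are all illustrative assumptions.

```python
import math
from statistics import NormalDist

def required_n(mu_0, mu_1, sigma, alpha=0.05, target_power=0.80):
    """Smallest n giving the target power for a one-tailed, one-sample z test."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha)       # cut-off under Ho
    z_power = z.inv_cdf(target_power)    # distance needed under H1
    return math.ceil(((z_alpha + z_power) * sigma / abs(mu_1 - mu_0)) ** 2)

n_large_difference = required_n(50, 55, sigma=10)  # hypothetical 5-point true difference
n_small_difference = required_n(50, 52, sigma=10)  # hypothetical 2-point true difference

print(f"n needed for a 5-point difference: {n_large_difference}")
print(f"n needed for a 2-point difference: {n_small_difference}")
```

A smaller true difference calls for a much larger sample to reach the same power, which is why the sample size has to be planned around the effect you expect.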
