Lecture 2: Null Hypothesis Significance Testing Continued
Laura McAvinue
School of Psychology
Trinity College Dublin
Null Hypothesis Significance Testing
• Previous lecture, Steps of NHST
– Specify the alternative/research hypothesis
– Set up the null hypothesis
– Collect data
– Run the appropriate statistical test
– Obtain the test statistic and associated p value
– Decide whether to reject or fail to reject the null hypothesis on the basis of the p value
Null Hypothesis Significance Testing
• Decision to reject or fail to reject Ho
– P value
– Probability of obtaining the observed results, or results more extreme, if Ho is true
– By convention, use the significance level of p < .05
– Conclude that it is highly unlikely that we would obtain these results by chance, so we reject Ho
– Caveat! The fact that there is a significance level does not mean that there is a simple ‘yes’ or ‘no’ answer to your research question
Null Hypothesis Significance Testing
• If you obtain results that are not statistically significant (p>.05), this does not necessarily mean that the relationship you are interested in does not exist
• There are a number of factors that affect whether your results come out as statistically significant
– One and two-tailed tests
– Type I and Type II errors
– Power
One and Two-tailed Tests
• One-tailed / Directional Test
– Run this when you have a prediction about the direction of the results
• Two-tailed / Non-Directional Test
– Run this when you don’t have a prediction about the direction of the results
Recall previous example…
• Research Question
– Do anxiety levels of students differ from anxiety levels of young people in general?
• Prediction
– Due to the pressure of exams and essays, students are more stressed than young people in general
• Method
– You know the mean score for the normal young population on the anxiety measure = 50
– You predict that your sample will have mean > 50
– Run a one-tailed one-sample t test at p < .05 level
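The method above can be sketched in a few lines of Python. This is only an illustration: the ten anxiety scores are hypothetical (the slides give no raw data), and the critical value 1.833 is the standard t-table value for a one-tailed test at the .05 level with df = 9.

```python
import math
import statistics

# Hypothetical anxiety scores for a sample of 10 students (illustration only).
scores = [52, 58, 49, 61, 55, 47, 60, 54, 57, 53]
population_mean = 50  # mean of the general young population, from the slide

n = len(scores)
sample_mean = statistics.mean(scores)
sample_sd = statistics.stdev(scores)        # sample SD (n - 1 denominator)
standard_error = sample_sd / math.sqrt(n)

# One-sample t statistic: how many standard errors the sample mean lies above 50.
t_stat = (sample_mean - population_mean) / standard_error

# Critical t for a one-tailed test at the .05 level with df = 9 (standard t table).
t_critical = 1.833
reject_h0 = t_stat > t_critical

print(f"mean = {sample_mean:.1f}, t({n - 1}) = {t_stat:.3f}, reject Ho: {reject_h0}")
```

Because the test is one-tailed, only a large positive t can lead to rejecting Ho; a very low sample mean would never be flagged.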
One-tailed Test
• Compare the mean of your sample to the sampling distribution for the population mean
• Decide to reject Ho if your sample mean falls into the highest 5% of the sampling distribution
Dilemma
• But! What if your prediction is wrong?
– Perhaps students are less stressed than the general young population
  • Their own bosses, summers off, no mortgages
– With previous one-tailed test, you could only reject Ho if you got an extremely high sample mean
– What if you get an extremely low sample mean?
• Run a two-tailed test
– Hedge your bets
– Reject Ho if you obtain scores at either extreme of the distribution, very high or very low sample mean
Two-tailed Test
• You will reject Ho when a score appears in the highest 2.5% of the distribution or the lowest 2.5%
• Note that it’s not the highest 5% and the lowest 5% as then you’d be operating at p = .1 level, rejecting Ho for 10% of the distribution
• So, we gain the ability to reject Ho for extreme values at either end, but those values must be more extreme
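A rough sketch of why values "must be more extreme" in a two-tailed test. It uses the standard normal (z) distribution as a simplifying assumption in place of the t distribution from the example, so the cutoffs are approximate for small samples.

```python
from statistics import NormalDist

alpha = 0.05
standard_normal = NormalDist()  # mean 0, SD 1

# One-tailed: all of alpha sits in the upper tail (highest 5%).
one_tailed_cutoff = standard_normal.inv_cdf(1 - alpha)

# Two-tailed: alpha is split, 2.5% in each tail.
two_tailed_cutoff = standard_normal.inv_cdf(1 - alpha / 2)

print(f"one-tailed: reject Ho if z > {one_tailed_cutoff:.3f}")
print(f"two-tailed: reject Ho if z > {two_tailed_cutoff:.3f} "
      f"or z < {-two_tailed_cutoff:.3f}")
```

The two-tailed cutoff is further from zero than the one-tailed cutoff, which is the price paid for being able to reject Ho in either direction.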
Errors in NHST
• Howell (2008) p. 157
– “Whenever we reach a decision with a statistical test, there is always a chance that our decision is the wrong one”
• Misleading nature of NHST
– Because there is a significance level (p = .05), people interpret NHST as a definitive exercise
– Results are statistically significant or not
– We reject Ho or we don’t
– The Ho is wrong or right
Errors in NHST
• Remember we are dealing with probabilities
– We make our decision on the basis of the likelihood of obtaining the results if Ho is true
– There is always the chance that we are making an error
• Two kinds of Error
– We reject Ho when it is true (Type I error)
  • We say there’s a significant difference when there’s not
– We accept Ho when it is false (Type II error)
  • We say there is no significant difference when there is
Type I Error
• Our anxiety example
• Predict that students will have greater anxiety score than young people in general
• Test Ho that students’ anxiety levels do not differ from young people
• One-tailed one sample t-test at p < .05
• Compare sample mean with sampling distribution of mean for the population (Ho)
Type I Error
• Decide to reject Ho if your sample mean falls in the top 5% of the distribution
• But!
• This 5%, even though at the extreme end, still belongs to the distribution
• If your sample mean falls within this top 5%, there is still a chance that your sample came from the Ho population
Type I Error
• For example, if p = .04, a sample mean this extreme or more extreme would occur only 4% of the time if Ho were true
– But this is still a chance, you could be rejecting Ho when it is in fact true
• Researchers are willing to accept this small risk (5%) of making a Type I error, of rejecting Ho when it is in fact true
• Probability of making Type I error = alpha = the significance level that you chose
– .05, .01
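A small simulation can make the link between alpha and the Type I error rate concrete. This is a sketch under stated assumptions: the population SD (10), sample size (25) and number of simulated experiments are hypothetical, and a z test with known SD stands in for the t test to keep the code short.

```python
import math
import random
from statistics import NormalDist, mean

random.seed(1)  # fixed seed so the run is repeatable

mu0, sigma, n = 50, 10, 25     # sigma and n are hypothetical values
alpha = 0.05
z_critical = NormalDist().inv_cdf(1 - alpha)  # one-tailed cutoff
n_experiments = 4000

false_rejections = 0
for _ in range(n_experiments):
    # Ho is true here: every sample really comes from the population with mean 50.
    sample = [random.gauss(mu0, sigma) for _ in range(n)]
    z = (mean(sample) - mu0) / (sigma / math.sqrt(n))
    if z > z_critical:
        false_rejections += 1  # rejecting a true Ho is a Type I error

type_i_rate = false_rejections / n_experiments
print(f"Proportion of Type I errors: {type_i_rate:.3f}")
```

Over many repetitions the proportion of false rejections should settle close to the chosen alpha, here .05.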
Type II Error
• So why not set a very low significance level to minimise your risk of making a Type I error?
– Set p < .01 rather than p < .05
• As you decrease the probability of making a Type I error you increase the probability of making a Type II error
• Type II Error
– Fail to reject Ho when it is false
– Fail to detect a significant relationship in your data when a true relationship exists
• For argument’s sake, imagine that H1 is correct
• Sampling Distribution under Ho
• Sampling Distribution under H1
• Reject Ho if sample mean equals any value to the right of the critical value (red region)
– Correct Decision
• Accept Ho if sample mean equals any value to the left of the critical value
– Type II Error
Four Outcomes of Decision Making
             True State of Nature
Decision   | Ho is True       | Ho is False
Accept Ho  | Correct Decision | Type II Error
Reject Ho  | Type I Error     | Correct Decision
Power
• You should minimise both Type I and Type II errors
– In reality, people are often very careful about Type I (i.e. strict about α) but ignore Type II altogether
• If you ignore Type II error, your experiment could be doomed before it begins
– even if a true effect exists (i.e. H1 is correct), if β is high, the results may not show a statistically significant effect
• How do you reduce the probability of a Type II error?
– Increase the power of the experiment
Power
• Power
– The probability of correctly rejecting a false Ho
– A measure of the ability of your experiment to detect a significant effect when one truly exists
– 1 - β
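As a rough illustration of power = 1 - β, the sketch below computes β as the part of the H1 sampling distribution that falls below the critical value set under Ho. The true mean (55), population SD (10) and sample size (25) are hypothetical, and a normal sampling distribution with known SD is assumed.

```python
import math
from statistics import NormalDist

mu0 = 50      # population mean under Ho (from the anxiety example)
mu1 = 55      # hypothetical true mean under H1
sigma = 10    # hypothetical population SD
n = 25        # hypothetical sample size
alpha = 0.05

standard_error = sigma / math.sqrt(n)

# Critical sample mean for a one-tailed test: the cutoff under Ho.
critical_mean = NormalDist(mu0, standard_error).inv_cdf(1 - alpha)

# Beta: area of the H1 sampling distribution that falls below the cutoff.
beta = NormalDist(mu1, standard_error).cdf(critical_mean)
power = 1 - beta

print(f"critical mean = {critical_mean:.2f}, beta = {beta:.3f}, power = {power:.3f}")
```

Under these assumed values, roughly one experiment in five would miss a true effect of this size.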
How do we increase the power of our experiment?
• Factors affecting power
– The significance level (α)
– One-tailed v two-tailed test
– The true difference between Ho and H1 (μo - μ1)
– Sample Size (n)
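The four factors listed above can be compared side by side with a small helper. This is a sketch only: `power` is a hypothetical function written for this illustration, it assumes a z test with known population SD, and the baseline values (difference of 5, SD of 10, n of 25) are made up.

```python
import math
from statistics import NormalDist

def power(n, alpha=0.05, tails=1, difference=5, sigma=10):
    """Approximate power of a z test when the true mean lies above the Ho mean."""
    z = NormalDist()
    shift = difference * math.sqrt(n) / sigma  # true difference in standard-error units
    z_critical = z.inv_cdf(1 - alpha / tails)  # alpha is split when tails == 2
    upper = 1 - z.cdf(z_critical - shift)      # rejections in the predicted tail
    lower = z.cdf(-z_critical - shift) if tails == 2 else 0.0  # opposite tail
    return upper + lower

baseline = power(n=25)
print(f"baseline (n=25, alpha=.05, one-tailed, diff=5): {baseline:.3f}")
print(f"stricter alpha (.01):   {power(n=25, alpha=0.01):.3f}")
print(f"two-tailed test:        {power(n=25, tails=2):.3f}")
print(f"larger true difference: {power(n=25, difference=8):.3f}")
print(f"larger sample (n=100):  {power(n=100):.3f}")
```

A stricter alpha or a two-tailed test lowers power relative to the baseline, while a larger true difference or a larger sample raises it.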
The Influence of α on Power
• Reduce the significance level (α)…
– Reduce the probability of making a Type I error
  • Rejecting the Ho when it is true
– Increase the probability of making a Type II error
  • Accepting the Ho when it is false
– Reduce the power of the experiment to detect a true effect as statistically significant
Reduce α and reduce power
Increase α and increase power
But! You increase the probability of a Type I error!
The Influence of One v Two-tailed Tests on Power
• We lose power with a two-tailed test
– The significance level is divided across the two tails of the distribution
– Values must be more extreme to be statistically significant
The Influence of the True Difference between Ho and H1
• The bigger the difference between μo and μ1, the easier it is to detect it
The Influence of Sample Size on Power
• The bigger the sample size, the more power you have
• A big sample provides a better estimate of the population mean
• With bigger sample sizes, the sampling distribution for the mean clusters more tightly around the population mean
• The standard deviation of the sampling distribution, known as the standard error of the mean, is reduced
• There is less overlap between the sampling distributions under Ho and H1
• The power to detect a significant difference increases
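The shrinking standard error can be shown with the usual formula, standard error = σ / √n. The population SD of 10 and the sample sizes below are hypothetical, and `standard_error_of_mean` is just an illustrative helper name.

```python
import math

def standard_error_of_mean(sigma, n):
    """Standard deviation of the sampling distribution of the mean."""
    return sigma / math.sqrt(n)

sigma = 10  # hypothetical population standard deviation

for n in (10, 40, 160):
    se = standard_error_of_mean(sigma, n)
    print(f"n = {n:>3}: standard error of the mean = {se:.2f}")
```

Each fourfold increase in sample size halves the standard error, which is why the sampling distributions under Ho and H1 overlap less with larger samples.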
Sample Size Exercise
• Open the following dataset
– Software / Kevin Thomas / Power dataset (revised)
– Explores the effects of Therapy on Depression
• Perform two Independent Samples t-tests
– Analyse / Compare means / Independent Samples t test
– Group represents Therapy v Control
– Score represents post-treatment depression
– 1. Group 1 & Score 1
– 2. Group 2 & Score 2
Complete the following table
                           | Analysis 1 | Analysis 2
Size of sample             |            |
Therapy mean score         |            |
Therapy standard deviation |            |
Control mean score         |            |
Control standard deviation |            |
Mean difference            |            |
t statistic                |            |
df                         |            |
p-value                    |            |
What explains these results?
                           | Analysis 1 | Analysis 2
Size of sample             | 20         | 200
Therapy mean score         | 5.5        | 5.5
Therapy standard deviation | 3.03       | 2.89
Control mean score         | 6.3        | 6.3
Control standard deviation | 2.75       | 2.62
Mean difference            | -.8        | -.8
t statistic                | -.618      | -2.051
df                         | 18         | 198
p-value                    | .54        | .042
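One way to see what explains these results is to recompute the t statistics from the summary figures. The sketch below uses the pooled-variance formula for an independent samples t test. The group sizes are an assumption: the table gives only total sample sizes (20 and 200), so equal groups of 10 and 100 are assumed here, which is consistent with the reported df. `independent_t` is an illustrative helper, not part of any statistics package.

```python
import math

def independent_t(mean1, sd1, n1, mean2, sd2, n2):
    """Pooled-variance independent-samples t statistic and its df."""
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / df
    standard_error = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean1 - mean2) / standard_error, df

# Summary statistics from the table; equal group sizes are an assumption.
t1, df1 = independent_t(5.5, 3.03, 10, 6.3, 2.75, 10)     # Analysis 1
t2, df2 = independent_t(5.5, 2.89, 100, 6.3, 2.62, 100)   # Analysis 2

print(f"Analysis 1: t({df1}) = {t1:.3f}")
print(f"Analysis 2: t({df2}) = {t2:.3f}")
```

The mean difference and the standard deviations are almost identical in the two analyses. Only the sample size changes, which shrinks the standard error in the denominator and so makes t larger in magnitude.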
So, how do I increase the power of my study?
• You can’t manipulate the true difference between Ho and H1
• You could increase your significance level (α) but then you would increase the risk of a Type I error
• If you have a strong prediction about the direction of the results, you should run a one-tailed test
• The factor that is most under your control is sample size
– Increase it!