Hypothesis generation and testing
Dr Kirsten Challinor
COMMONWEALTH OF AUSTRALIA
Copyright Regulations 1969
WARNING
This material has been reproduced and communicated to you by or on behalf of the
University of New South Wales pursuant to Part VB of the Copyright Act 1968 (the Act).
The material in this communication may be subject to copyright under the Act. Any further
reproduction or communication of this material by you may be the subject of copyright protection
under the Act.
Do not remove this notice.
Why does one collect data?
Lecture topics so far:
• Epidemiology
• Measures of disease frequency
• Intro to Biostatistics
• Variables, scales and descriptive statistics
• Intro to study designs
• Survey methods
• Sampling methods
You collect data because you have an interesting QUESTION or IDEA or
THEORY about the world.
• Statistics is about helping you answer your question.
Image from http://www.aamu.edu/Academics/EHBS/HSHPCD/csd/Pages/Program-Data.aspx
Initial Observation
• Find something that needs explaining
– Observe the real world
– Read other research
• Test the concept: collect data
– Collect data to see whether your hunch is correct
– To do this you need to define variables
o Anything that can be measured and can differ across entities
or time.
• Analyse data
– Fit a statistical model to the data
Types of research
• Correlational research:
– Observing what naturally goes on in
the world without directly interfering
with it.
• Cross-sectional research:
– This term implies that data come from
people at different age points with
different people representing each
age point.
• Experimental research:
– One or more variables are
systematically manipulated to see
their effect (alone or in combination)
on an outcome variable.
– Statements can be made about
cause and effect.
Image from http://www.leichtmanresearch.com/research.html
Research hypothesis
Hypothesis = A proposition for reasoning
= A suggestion as to why
something might be as it is
= A prediction from a theory.
A testable statement of the state of the world.
Examples of testable and non-testable statements:
– The Beatles were the most influential band ever = Non-scientific
statement.
– The Beatles were the best-selling band ever = Scientific/testable
statement.
Good theories produce hypotheses that are scientific statements.
Scientific statements are ones that can be verified with reference to empirical
evidence.
The research process
• Initial observation (research question/hypothesis)
• Generate theory
• Generate stats hypotheses
• Collect data to test theory
– Identify variables
– Measure variables
• Analyse data
– Graph data
– Fit a model
Adapted from Field, A. Page 3
Research hypothesis & Statistical hypotheses
• Research hypothesis: Do ‘great’ supervisors produce better students?
• Null hypothesis: No difference between students.
• Experimental hypothesis: Students with highly rated supervisors will be rated better than students with lower rated supervisors.
The experimental hypothesis
• The hypothesis or prediction that comes from your theory is usually saying
that an effect will be present. It says that there will be a difference between
groups.
• This is called the
– Experimental hypothesis (so called because it relates to a type of
methodology) OR
– The alternative hypothesis
• It is labelled like this: H1
• Examples of an experimental hypothesis:
– H1 = The Beatles have sold more records than Michael Jackson.
– H1 = The prevalence of dry eye is different in men and women.
– H1 = Students with highly rated supervisors will score better marks
than students with lower rated supervisors.
The null hypothesis
• It is the opposite of the experimental hypothesis. It states
that nothing interesting will happen.
• This states that there is no effect.
• It is labelled like this: H0
• Examples of a null hypothesis:
– H0 = The number of records sold by The Beatles and
Michael Jackson will not be different.
– H0 = There is no difference in the prevalence of dry eye
in men and women.
– H0 = There is no difference between the marks of students who
had highly rated supervisors and those who had lower
rated supervisors.
• Remember that the null hypothesis does not necessarily
state that the size of the effect is zero.
Footnote slide, for those who are curious:
Why do we need the null hypothesis?
• The issue of truth.
• We can’t prove the truth, but we can talk in terms of probability.
• We can never really prove our hypothesis.
• We cannot prove the experimental (alternative) hypothesis using statistics,
but we can reject the null hypothesis.
• If we get data that give us confidence to reject our null hypothesis, that
gives us support for our experimental hypothesis.
• However, even if we reject our null hypothesis, that still doesn’t prove our
experimental hypothesis.
• A basic understanding of hypothesis testing is focused on accepting or
rejecting the null hypothesis. However, what we really should talk about is
‘the chances of obtaining the data, assuming the null hypothesis is true’.
Summary of H1 and H0 examples
• Music
– H1: The Beatles have sold more records than Michael Jackson.
– H0: The number of records sold by The Beatles and Michael Jackson will not be different.
• Optometry (correlational research)
– H1: The prevalence of dry eye is different in men and women.
– H0: There is no difference in the prevalence of dry eye in men and women.
• Optometry (experimental research)
– H1: Students with highly rated supervisors will have higher marks than students with lower rated supervisors.
– H0: There is no difference between the marks of students who had highly rated supervisors and those who had lower rated supervisors.
Example
Group A.
Students with an ‘OK’ Supervisor
Name Student mark
Peter
Sarah
Alex
John
….
Mean Mean of Group A
Group B.
Students with a ‘Great’ Supervisor
Name Student mark
Tom
Jill
Sally
Louise
….
Mean Mean of Group B
Logic behind testing the hypotheses
We evaluate our statistical hypotheses by
thinking about the chance of getting the data
we found if the null hypothesis were true.
Chance / probability / likelihood
• Experimental hypothesis (H1): Students with highly rated supervisors will have higher marks than students with lower rated supervisors.
• Null hypothesis (H0): There is no difference between the marks of students who had highly rated supervisors and those who had lower rated supervisors.
Logic behind testing the hypothesis: example 1.
Let’s say that 75% of students with highly rated supervisors got high marks (say A or A+).
What are the chances that we got this result by accident?
Let’s consider the null hypothesis: No difference between students.
E.g. If the null hypothesis is true (that there is no difference between students), what are the chances that 75% of students in the ‘great supervisor’ group had high marks just by chance?
Not very likely. It is pretty unlikely that we accidentally got 75% of high-mark students in the ‘great supervisor’ group.
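One way to put a number on "not very likely" is sketched below in Python. The lecture gives neither a class size nor a baseline rate of high marks, so the 100 students and the 10% baseline used here are assumptions made purely for illustration, and comparing one group against a fixed baseline is a simplification of the two-group comparison. The sketch computes the exact binomial probability of seeing at least 75 high marks out of 100 if every student had the same 10% chance of a high mark.

```python
from math import comb

# Hypothetical numbers, assumed for illustration only.
N_STUDENTS = 100       # students in the 'great supervisor' group
BASELINE_RATE = 0.10   # assumed chance of an A or A+ if supervisors make no difference
OBSERVED_HIGH = 75     # 75% of 100 students


def binomial_tail(n, p, k):
    """Return P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# Probability of a result at least this extreme if the null hypothesis is true.
p_value = binomial_tail(N_STUDENTS, BASELINE_RATE, OBSERVED_HIGH)
print(f"P(at least {OBSERVED_HIGH} of {N_STUDENTS} high marks | H0) = {p_value:.3g}")
```

Under these assumptions the probability is vanishingly small, which agrees with the informal judgement that the result is unlikely to be an accident.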
Our conclusion…
• Therefore we were unlikely to have gotten the data we did if the null
hypothesis were true.
OR said differently:
• If there is no true difference between the groups, it is pretty unlikely that
when we collected data we randomly got 75% of students with high marks
in our ‘great supervisor’ group.
THEREFORE
A basic understanding = we reject the null hypothesis and feel that we have
support for our experimental hypothesis*
So we think it is unlikely that there is no difference
between the groups. It seems plausible that the 2 groups of students are in
fact different. We will need to look at the means of the groups to see the
direction of the difference.
* remember that, even if we reject our null hypothesis that still doesn’t prove
our experimental hypothesis.
Logic behind testing the hypothesis: example 2.
But what if our result was not 75%, but something like 8%?
(Let’s say that 8% of students with highly rated supervisors got high marks of A or A+.)
If we assume the null hypothesis is true:
What is the chance that 8% of the students in the ‘great supervisor’ group got high marks just by random chance?
In my opinion 8% is a low percentage…
Well, maybe this result could be a random accident.
…now it seems that getting 8% with high marks by chance is quite possible, or even likely, so we might feel uncomfortable rejecting the null hypothesis, and we say that it is possible that the 2 groups are not different.
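The "random accident" question can also be explored by simulation, as in the Python sketch below. The class size of 100 and the 10% baseline rate of high marks are assumptions invented for illustration (the lecture states neither), and the function name is hypothetical. The sketch repeatedly generates a class in which supervisors make no difference and counts how often at least 8 of the 100 students get a high mark.

```python
import random

# Hypothetical numbers, assumed for illustration only.
N_STUDENTS = 100       # students in the 'great supervisor' group
BASELINE_RATE = 0.10   # assumed chance of an A or A+ if supervisors make no difference
OBSERVED_HIGH = 8      # 8% of 100 students


def simulate_tail_probability(n_students, rate, observed, n_trials=5_000, seed=1):
    """Estimate P(at least `observed` high marks) when the null hypothesis is true."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    hits = 0
    for _ in range(n_trials):
        # One simulated class: each student independently gets a high mark with chance `rate`.
        high_marks = sum(rng.random() < rate for _ in range(n_students))
        if high_marks >= observed:
            hits += 1
    return hits / n_trials


estimate = simulate_tail_probability(N_STUDENTS, BASELINE_RATE, OBSERVED_HIGH)
print(f"Estimated P(at least {OBSERVED_HIGH} of {N_STUDENTS} high marks | H0) = {estimate:.2f}")
```

Under these assumptions a result like 8% turns up in most simulated classes, so it gives little reason to reject the null hypothesis.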
Fitting statistical models to data
• We have looked at testing the hypotheses in a very
informal manner by casually asking “what is the chance
that we got a result by accident?”.
• Statistical testing formalises this process. You will be
learning about this process in more detail in
– Lecture 17 – Statistical test selection by Mr Krishnaiah Sannapaneni
– Lecture 18 – Statistical Analysis by Mr Hasnat Ali
• It is all about this logic of seeing if the data you got are a
‘real’ representation of the world or if they are just due to
chance.
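As one possible illustration of how the informal question can be formalised, the Python sketch below runs an exact permutation test on two small groups of marks. It is not necessarily the method covered in Lectures 17 and 18, the marks are invented for illustration, and the function name is hypothetical. If the null hypothesis is true, the 'OK' and 'great' labels are interchangeable, so the sketch tries every way of relabelling the students and counts how often the 'great' group does at least as well as it actually did.

```python
from itertools import combinations
from statistics import mean

# Hypothetical marks, invented purely for illustration.
ok_marks = [55, 62, 58, 64, 60]      # Group A: 'OK' supervisor
great_marks = [70, 66, 75, 68, 72]   # Group B: 'great' supervisor


def exact_permutation_p_value(group_a, group_b):
    """One-sided p-value: share of relabellings in which group B's advantage
    is at least as large as the advantage actually observed."""
    pooled = group_a + group_b
    observed_b_sum = sum(group_b)
    at_least_as_extreme = 0
    n_splits = 0
    for indices in combinations(range(len(pooled)), len(group_b)):
        n_splits += 1
        b_sum = sum(pooled[i] for i in indices)
        # Group sizes are fixed, so a larger sum for B means a larger mean difference.
        if b_sum >= observed_b_sum:
            at_least_as_extreme += 1
    return at_least_as_extreme / n_splits


observed_difference = mean(great_marks) - mean(ok_marks)
p_value = exact_permutation_p_value(ok_marks, great_marks)
print(f"Observed difference in means: {observed_difference:.1f}")
print(f"Exact one-sided p-value: {p_value:.4f}")
```

With these invented marks every 'great' student outscores every 'OK' student, so only 1 of the 252 possible relabellings is as extreme as the observed one, and the null hypothesis looks hard to sustain.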
Lecture Summary
• The process of doing research.
• Research questions are formulated through observing phenomena or collecting data about a hunch.
• Once the observation has been confirmed, theories can be generated about why something happens.
• From these theories we formulate hypotheses that we can test. To test hypotheses we need to measure things and this leads us to think about the variables that we need to measure and how to measure them.
• Then we collect some data.
• The beginning of the final stage is to analyse these data. We can begin analysing the data by looking at the shape of it in graphs/descriptive stats (lecture 4), but ultimately we should end up fitting a statistical model to the data (lecture 17).
• Then you can interpret your results and think about your next research question!
End of lecture
…oh and the Beatles are the best selling band of all time.
http://ifpi.org
References
Chapter 1 in Field, A. P. (2009). Discovering statistics using SPSS. London, England : SAGE.
http://www.statisticshell.com/html/limbo.html
http://www.uk.sagepub.com/field3e/MCQch01.htm?gm=PracticeQuiz&ds=ALL&sk=wct&arg=ttttf%27%29
New words
• Theory
– A hypothesized general principle or set of principles that explains known findings about a
topic and from which new hypotheses can be generated.
• Hypothesis
– A prediction from a theory.
• Falsification
– The act of disproving a theory or hypothesis.
Homework task: find definitions for these:
• Experimental hypothesis/Alternative hypothesis
• Null hypothesis
• Independent variable
• Dependent variable
Self-test Quiz
The aim of experimental research is to:
a) be a phenomenon
b) cause a phenomenon
c) investigate what caused a phenomenon
d) prevent a phenomenon
Turning a research question into a testable hypothesis
i) Identify whether each statement in the list below is a scientific statement or not.
Remember that scientific statements are testable: they can be verified with reference to empirical evidence.
ii) For the statements that are not testable, can you change their wording to make them scientific?
List of statements
• Chocolate is the best food.
• Watching television makes you happy.
• Cricket is the world’s most popular sport to watch.
• Coke is the worst drink.