Hypothesis generation and testing

Dr Kirsten Challinor

COMMONWEALTH OF AUSTRALIA

Copyright Regulations 1969

WARNING

This material has been reproduced and communicated to you by or on behalf of the

University of New South Wales pursuant to Part VB of the Copyright Act 1968 (the Act).

The material in this communication may be subject to copyright under the Act. Any further

reproduction or communication of this material by you may be the subject of copyright protection

under the Act.

Do not remove this notice.

Why does one collect data?

Lecture topics so far:

• Epidemiology

• Measures of disease frequency

• Intro to Biostatistics

• Variables, scales and descriptive statistics

• Intro to study designs

• Survey methods

• Sampling methods

You collect data because you have an interesting QUESTION or IDEA or THEORY about the world.

• Statistics is about helping you answer your question.

Image from http://www.aamu.edu/Academics/EHBS/HSHPCD/csd/Pages/Program-Data.aspx

Initial Observation

• Find something that needs explaining

– Observe the real world

– Read other research

• Test the concept: collect data

– Collect data to see whether your hunch is correct

– To do this you need to define variables

o Anything that can be measured and can differ across entities

or time.

• Analyse data

– Fit a statistical model to the data

Types of research

• Correlational research:

– Observing what naturally goes on in

the world without directly interfering

with it.

• Cross-sectional research:

– This term implies that data come from

people at different age points with

different people representing each

age point.

• Experimental research:

– One or more variables are systematically manipulated to see their effect (alone or in combination) on an outcome variable.

– Statements can be made about

cause and effect.

Image from http://www.leichtmanresearch.com/research.html

Research hypothesis

Hypothesis = A proposition for reasoning

= A suggestion as to why

something might be as it is

= A prediction from a theory.

A testable statement of the state of the world.

Examples of testable and non-testable statements:

–The Beatles were the most influential band ever = Non-scientific

statement.

–The Beatles were the best selling band ever = Scientific/testable

statement.

Good theories produce hypotheses that are scientific statements.

Scientific statements are ones that can be verified with reference to empirical

evidence.

The research process (adapted from Field, A., page 3)

• Initial observation (research question/hypothesis)

• Generate theory

• Generate stats hypotheses

• Collect data to test theory

• Analyse data

Side labels in the diagram: Data; Identify variables; Measure variables; Graph data; Fit a model.

Research hypothesis & Statistical hypotheses

Research hypothesis: Do ‘great’ supervisors produce better students?

Null hypothesis: No difference between students.

Experimental hypothesis: Students with highly rated supervisors will be rated better than students with lower rated supervisors.

The experimental hypothesis

• The hypothesis or prediction that comes from your theory is usually saying

that an effect will be present. It says that there will be a difference between

groups.

• This is called the

– Experimental hypothesis (a name that relates to a type of methodology) OR

– The alternative hypothesis

• It is labelled like this: H1

• Examples of an experimental hypothesis:

– H1 = The Beatles have sold more records than Michael Jackson.

– H1 = The prevalence of dry eye is different in men and women.

– H1 = Students with highly rated supervisors will score better marks

than students with lower rated supervisors.

The null hypothesis

• It is the opposite of the experimental hypothesis. It states

that nothing interesting will happen.

• This states that there is no effect.

• It is labelled like this: H0

• Examples of a null hypothesis:

– H0 = The number of records sold by The Beatles and

Michael Jackson will not be different.

– H0 = There is no difference in the prevalence of dry eye

in men and women.

– H0 = There is no difference between the marks of students who had highly rated supervisors and those who had lower rated supervisors.

The null hypothesis

• Remember that the null hypothesis does not necessarily state that the size of the effect is zero.

Footnote slide, for those who are curious:

Why do we need the null hypothesis?

• The issue of truth.

• We can’t prove the truth, but we can talk in terms of probability.

• Can never really prove our hypothesis.

• We cannot prove the experimental (alternate) hypothesis using statistics,

but we can reject the null hypothesis.

• If we get data that gives us confidence to reject our null hypothesis, that

gives us support to our experimental hypothesis.

• However, even if we reject our null hypothesis that still doesn’t prove our

experimental hypothesis.

• A basic understanding of hypothesis testing focuses on accepting or rejecting the null hypothesis. However, what we really should talk about is ‘the chances of obtaining the data, assuming the null hypothesis is true’.

Summary of H1 and H0 examples

• Music

– Experimental hypothesis (H1): The Beatles have sold more records than Michael Jackson.

– Null hypothesis (H0): The number of records sold by The Beatles and Michael Jackson will not be different.

• Optometry (correlational research)

– Experimental hypothesis (H1): The prevalence of dry eye is different in men and women.

– Null hypothesis (H0): There is no difference in the prevalence of dry eye in men and women.

• Optometry (experimental research)

– Experimental hypothesis (H1): Students with highly rated supervisors will have higher marks than students with lower rated supervisors.

– Null hypothesis (H0): There is no difference between the marks of students who had highly rated supervisors and those who had lower rated supervisors.

Example

Group A.

Students with an ‘OK’ Supervisor

Name Student mark

Peter

Sarah

Alex

John

….

Mean Mean of Group A

Group B.

Students with a ‘Great’ Supervisor

Name Student mark

Tom

Jill

Sally

Louise

….

Mean Mean of Group B

Logic behind testing the hypotheses

We evaluate our statistical hypotheses by thinking about the chance of getting the data we found if the null hypothesis were true.

Chance/ probability / likelihood

Experimental hypothesis (H1): Students with highly rated supervisors will have higher marks than students with lower rated supervisors.

Null hypothesis (H0): No difference between the marks of students who had highly rated supervisors and those who had lower rated supervisors.

Logic behind testing the hypothesis: example 1.

Let’s say that 75% of students with highly rated supervisors got high marks (say A or A+).

What are the chances that we got this result by accident?

Let’s consider the null hypothesis: No difference between students.

E.g. if the null hypothesis is true (that there is no difference between students), what are the chances that 75% of students in the ‘great supervisor’ group had high marks just by chance?

Not very likely. It is pretty unlikely that we accidentally got 75% of students with high marks in the ‘great supervisor’ group.


Our conclusion…

• Therefore we were unlikely to have gotten the data we did if the null hypothesis were true.

OR said differently:

• If there is no true difference between the groups, it is pretty unlikely that

when we collected data we randomly got 75% of students with high marks

in our ‘great supervisor’ group.

THEREFORE

A basic understanding = we reject the null hypothesis and feel that we have

support for our experimental hypothesis*

So we think that it is not likely to be the case that there is no difference

between the groups. It seems possible that the 2 groups of students are in

fact different. We will need to look at the means of the groups to see the

direction of the difference.

* remember that, even if we reject our null hypothesis that still doesn’t prove

our experimental hypothesis.
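The reasoning in example 1 can be put into numbers with a small sketch. It is only an illustration: the group size (20 students) and the share of students who get an A or A+ when the null hypothesis is true (30%) are assumptions made up for this example, not figures from the lecture. Under those assumptions, the number of high marks in the ‘great supervisor’ group follows a binomial distribution, and we can ask how often 75% or more would turn up by chance.

```python
from math import comb


def binomial_tail(n, k, p):
    """Probability of k or more successes in n trials, each with success chance p."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# Hypothetical numbers chosen for illustration only (not from the lecture).
n_students = 20       # assumed size of the 'great supervisor' group
baseline_rate = 0.30  # assumed chance of a high mark if H0 is true
observed_high = 15    # 75% of 20 students got A or A+

# Chance of seeing 15 or more high marks if there is no real difference.
p_value = binomial_tail(n_students, observed_high, baseline_rate)
print(f"P(at least {observed_high} of {n_students} high marks | H0) = {p_value:.6f}")
```

With these made-up inputs the probability comes out very small, which is the formal version of "not very likely". Changing the assumed baseline rate or group size changes the answer, so the conclusion always depends on what the null hypothesis says we should expect.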

Logic behind testing the hypothesis: example 2.

But what if our result was not 75%, but something like 8%?

(let’s say that 8% of students with highly rated supervisors got high marks of A or A+).

If we assume the null hypothesis is true:

What is the chance that 8% of the students in the ‘great supervisor’ group got high marks just by random chance?

In my opinion 8% is a low percentage…

Well maybe this result could be a random accident.

…now it seems that 8% with high marks could easily happen by chance, so we might feel uncomfortable rejecting the null hypothesis, and we say that it is possible that the 2 groups are not different.
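Example 2 can also be explored by simulation. The sketch below repeatedly "collects" a group of students in a world where the null hypothesis is true and counts how often at least as many high marks appear as we observed. The group size (25 students) and the chance of a high mark under the null hypothesis (10%) are assumptions invented for this illustration; the lecture does not give them.

```python
import random


def share_at_least(n_students, baseline_rate, observed_high, n_sims=10_000, seed=0):
    """Share of simulated groups with observed_high or more high marks under H0."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    hits = 0
    for _ in range(n_sims):
        # Each student independently gets a high mark with chance baseline_rate.
        high_marks = sum(rng.random() < baseline_rate for _ in range(n_students))
        if high_marks >= observed_high:
            hits += 1
    return hits / n_sims


# Hypothetical numbers chosen for illustration only (not from the lecture).
n_students = 25       # assumed size of the 'great supervisor' group
baseline_rate = 0.10  # assumed chance of a high mark if H0 is true
observed_high = 2     # 8% of 25 students got A or A+

share = share_at_least(n_students, baseline_rate, observed_high)
print(f"Share of simulated groups with at least {observed_high} high marks: {share:.3f}")
```

Under these assumptions, well over half of the simulated groups reach 8% or more by chance alone, so the data give us little reason to reject the null hypothesis.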


Fitting statistical models to data

• We have looked at testing the hypotheses in a very informal manner by casually asking “what is the chance that we got a result by accident”.

• Statistical testing formalises this process. You will be

learning about this process in more detail in

– Lecture 17 – Statistical test selection by Mr Krishnaiah Sannapaneni

– Lecture 18 – Statistical Analysis by Mr Hasnat Ali

• Statistical testing is all about this logic of seeing whether the data you got are a ‘real’ representation of the world or are just due to chance.
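One way to formalise "what is the chance that we got a result by accident" for the supervisor example is a permutation test, sketched below. The lecture does not say which test the later lectures use, so this is a stand-in chosen for illustration. The marks are hypothetical (the example table leaves them blank); only the student names come from the slides. The idea is: if the null hypothesis is true, the group labels are arbitrary, so we try every way of splitting the eight marks into two groups of four and see how often the difference in means is at least as large as the one observed.

```python
from itertools import combinations
from statistics import mean

# Hypothetical marks for illustration only; the lecture gives names but no marks.
group_a = {"Peter": 62, "Sarah": 58, "Alex": 71, "John": 65}   # 'OK' supervisor
group_b = {"Tom": 74, "Jill": 81, "Sally": 69, "Louise": 78}   # 'great' supervisor


def permutation_p_value(marks_a, marks_b):
    """Exact one-sided permutation test for mean(marks_b) - mean(marks_a)."""
    pooled = list(marks_a) + list(marks_b)
    total = sum(pooled)
    observed = mean(marks_b) - mean(marks_a)
    as_extreme = 0
    n_splits = 0
    # Every way of choosing which marks are labelled 'group B'.
    for chosen in combinations(range(len(pooled)), len(marks_b)):
        b_sum = sum(pooled[i] for i in chosen)
        diff = b_sum / len(marks_b) - (total - b_sum) / len(marks_a)
        n_splits += 1
        if diff >= observed - 1e-12:  # small tolerance for floating point
            as_extreme += 1
    return observed, as_extreme / n_splits


observed_diff, p_value = permutation_p_value(group_a.values(), group_b.values())
print(f"Observed difference in means (B - A): {observed_diff}")
print(f"Share of label shuffles with a difference at least this large: {p_value:.4f}")
```

A small share means the observed difference would rarely arise from arbitrary labelling, which counts as evidence against the null hypothesis. As with the informal reasoning, it still does not prove the experimental hypothesis.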

Lecture Summary

• The process of doing research.

• Research questions are formulated through observing phenomena or collecting data about a hunch.

• Once the observation has been confirmed, theories can be generated about why something happens.

• From these theories we formulate hypotheses that we can test. To test hypotheses we need to measure things and this leads us to think about the variables that we need to measure and how to measure them.

• Then we collect some data.

• The beginning of the final stage is to analyse these data. We can begin analysing the data by looking at the shape of it in graphs/descriptive stats (lecture 4), but ultimately we should end up fitting a statistical model to the data (lecture 17).

• Then you can interpret your results and think about your next research question!

End of lecture

…oh and the Beatles are the best selling band of all time.

http://ifpi.org

References

Field, A. P. (2009). Discovering statistics using SPSS (Chapter 1). London, England: SAGE.

http://www.statisticshell.com/html/limbo.html

http://www.uk.sagepub.com/field3e/MCQch01.htm?gm=PracticeQuiz&ds=ALL&sk=wct&arg=ttttf%27%29

New words

• Theories

– A hypothesized general principle or set of principles that explains known findings about a topic and from which new hypotheses can be generated.

• Hypothesis

– A prediction from a theory.

• Falsification

– The act of disproving a theory or hypothesis.

Homework task: find definitions for these:

• Experimental hypothesis/Alternate hypothesis

• Null hypothesis

• Independent variable

• Dependent variable

Self-test Quiz

The aim of experimental research is to:

a) be a phenomenon

b) cause a phenomenon

c) investigate what caused a phenomenon

d) prevent a phenomenon

Turning a research question into a testable hypothesis

i) Identify whether each statement in the list below is a scientific statement.

Remember that scientific statements are testable: they can be verified with reference to empirical evidence.

ii) For the statements that are not testable, can you change their wording to make them scientific?

List of statements

• Chocolate is the best food.

• Watching television makes you happy.

• Cricket is the world’s most popular sport to watch.

• Coke is the worst drink.