Unit 2: Comparing Two Groups In Unit 1, we learned the basic process of statistical inference using...
69
Unit 2: Comparing Two Groups In Unit 1, we learned the basic process of statistical inference using tests and confidence intervals. We did all this by focusing on a single proportion. In Unit 2, we will take these ideas and extend them to comparing two groups. We will compare two proportions, two independent means, and paired data.
Unit 2: Comparing Two Groups In Unit 1, we learned the basic process of statistical inference using tests and confidence intervals. We did all this by
Unit 2: Comparing Two Groups In Unit 1, we learned the basic
process of statistical inference using tests and confidence
intervals. We did all this by focusing on a single proportion. In
Unit 2, we will take these ideas and extend them to comparing two
groups. We will compare two proportions, two independent means, and
paired data.
Slide 2
Chapter 5: Comparing Two Proportions 5.1: Descriptive (Two-Way
Tables) 5.2: Inference with Simulation-Based Methods 5.3: Inference
with Theory-Based Methods
Slide 3
Positive and Negative Perceptions Consider these two questions:
Are you having a good year? Are you having a bad year? Do people
answer each question in such a way that would indicated the same
answer? (e.g. Yes for the first one and No for the second.)
Slide 4
Researchers questioned 30 students (randomly giving them one of
the two questions). They then recorded if a positive or negative
response was given. Is this an observational study or randomized
experiment? Positive and Negative Perceptions
Slide 5
Observational units The 30 students Variables Question wording
(good year or bad year) Perception of their year (positive or
negative) Which is the explanatory and which is the response?
Positive and Negative Perceptions
Slide 6
Individual Type of Question ResponseIndividual Type of Question
Response 1Good YearPositive16Good YearPositive 2Good
YearNegative17Bad YearPositive 3Bad YearPositive18Good YearPositive
4Good YearPositive19Good YearPositive 5Good YearNegative20Good
YearPositive 6Bad YearPositive21Bad YearNegative 7Good
YearPositive22Good YearPositive 8Good YearPositive23Bad
YearNegative 9Good YearPositive24Good YearPositive 10Bad
YearNegative25Bad YearNegative 11Good YearNegative26Good
YearPositive 12Bad YearNegative27Bad YearNegative 13Good
YearPositive28Good YearPositive 14Bad YearNegative29Bad
YearPositive 15Good YearPositive30Bad YearNegative Raw Data in a
Spreadsheet
Slide 7
A two-way table organizes data Summarizes two categorical
variables Also called contingency table Are students more likely to
give a positive response if they were given the good year question?
Two-Way Tables Good YearBad YearTotal Positive response 15419
Negative response 3811 Total 181230
Slide 8
Conditional proportions will help us better determine if there
is an association between the question asked and the type of
response. We can see that those given the positive question were
more likely to respond positively. Two-Way Tables Good YearBad
YearTotal Positive response 15/18 0.834/12 0.3319 Negative response
3811 Total 181230
Slide 9
We can use segmented bar graphs to see this association.
Remember that variables are associated if the conditional
proportion of the outcomes for one group differ from the
conditional proportion of outcomes in other groups. Segmented Bar
Graphs
Slide 10
Those responding to the good year question were more likely to
answer positively (83% to 33%) than those responding positively to
the bad year question. The statistic we will be using to measure
this is the difference in proportions. 0.83 - 0.33 = 0.50 higher
for the good year question than the bad year question.
Perceptions
Slide 11
In the next section we will conduct tests of significance to
compare two proportions and I want to give you a preview of that
here. We will assume there is no association between the variables
(i.e. the two population proportions are the same) and decide if
two sample proportions differ enough to conclude this would be very
unlikely just by random chance. A Sneak Peak at Comparing Two
Proportions: Simulation-Based Approach
Slide 12
Hypotheses Null Hypothesis: There is no association between
which question is asked and the type of response. (The proportion
of positive responses will be the same in each group. ) Alternative
Hypothesis: There is an association between which question is asked
and the type of response. (The proportion of positive responses
will be different in each group. )
Slide 13
Results Good YearBad YearTotal Positive response 15 (83%)4
(33%)19 Negative response 3811 Total 181230 The difference in
proportions of positive responses is 0.83 0.33 = 0.50. How likely
is a difference this great or greater if the type of question asked
made no difference in how the student would respond?
Slide 14
Random Reassignment Notice that 19 students gave a positive
response. If the null hypothesis is true, these 19 would have given
a positive response no matter which question was asked. Therefore,
under a true null hypothesis, we can randomly place these 19 people
into either group and they will still give a positive response.
This replicates the random assignment that was done in the
experiment. We will also keep constant the 18 that receive the
positive question and 12 that receive the negative question. Good
YearBad YearTotal Positive response random 19 Negative response
random 11 Total181230
Slide 15
You can think about this random reassignment with the raw data
as well. It doesnt matter which question was asked, the responses
will be the same. Therefore, we can shuffle the type of question
and leave the responses fixed. This is equivalent to keeping the
same column and row totals and just shuffling the inside of the
two-way table as described earlier. IndividualType of
QuestionResponseIndividualType of QuestionResponse 1Good
YearPositive16Good YearPositive 2Good YearNegative17Bad
YearPositive 3Bad YearPositive18Good YearPositive 4Good
YearPositive19Good YearPositive 5Good YearNegative20Good
YearPositive 6Bad YearPositive21Bad YearNegative 7Good
YearPositive22Good YearPositive 8Good YearPositive23Bad
YearNegative 9Good YearPositive24Good YearPositive 10Bad
YearNegative25Bad YearNegative 11Good YearNegative26Good
YearPositive 12Bad YearNegative27Bad YearNegative 13Good
YearPositive28Good YearPositive 14Bad YearNegative29Bad
YearPositive 15Good YearPositive30Bad YearNegative
Slide 16
Random Reassignment I did this once and found a difference in
the proportions of positive responses for the two questions of 0.50
0.83 = 0.33 Good YearBad YearTotal Positive response 9 (50%)10
(83%) 19 Negative response 92 11 Total 181230
Slide 17
Random Reassignment I did this again and found a difference in
the proportions of positive responses for the two questions of 0.61
0.67 = 0.06 Good YearBad YearTotal Positive response 11 (61%)8
(67%) 19 Negative response 74 11 Total 181230
Slide 18
Random Reassignment I did this again and found a difference in
the proportions of positive responses for the two questions of 0.67
0.58 = 0.09 Good YearBad YearTotal Positive response 12 (67%)7
(58%) 19 Negative response 65 11 Total 181230
Slide 19
Random Reassignment In my three randomizations, I have yet to
see a difference in proportions that is as far away from zero as
the observed difference of 0.5. Lets do some more randomizations to
develop a null distribution.
Slide 20
Random Reassignment After 1000 randomizations, only 7 were as
far away from zero as our observed proportion.
Slide 21
Conclusion Since we have a p-value of 7/1000 or 0.007, we can
conclude the alternative hypothesis and say we have strong evidence
that how the question is phrased affects the response.
Slide 22
Applets Lets look at how this is done in two applets Simulation
for Two Proportions Simulation for Multiple Proportions
Slide 23
Exploration 5.1: Murderous Nurse?
Slide 24
Example 5.2: Swimming With Dolphins
Slide 25
Is swimming with dolphins therapeutic for patients suffering
from clinical depression? Researchers recruited 30 subjects aged
18- 65 with a clinical diagnosis of mild to moderate depression.
Discontinued antidepressants and psychotherapy 4 weeks prior to and
throughout the experiment 30 subjects went to an island near
Honduras Randomly assigned to two treatment groups Swimming with
Dolphins
Slide 26
Both groups engaged in one hour of swimming and snorkeling each
day. One group swam in the presence of dolphins and the other group
did not. Participants in both groups had identical conditions
except for the dolphins After 2 weeks, each subjects level of
depression was evaluated, as it had been at the beginning of the
study The response variable is if the subject achieved substantial
reduction in depression. Swimming with Dolphins
Slide 27
Observational units The 30 subjects with mild to moderate
depression. Explanatory variable Swimming with dolphins or not
Response variable Reduction in depression or not Are the variables
quantitative or categorical? Swimming with Dolphins
Slide 28
Is this study an observational study or an experiment? Are the
subjects in this study a random sample from a larger population?
Swimming with Dolphins
Slide 29
Results Swimming with Dolphins Dolphin group Control group
Total Improved10 (67%)3 (20%)13 Did Not Improve51217 Total15
30
Slide 30
The difference in proportions of improvers is 0.67 0.20 = 0.47.
There are two possible explanations for an observed difference of
0.47. A tendency to be more likely to improve with dolphins The 13
subjects were going to show improvement with or without dolphins
and random chance assigned more improvers to the dolphins Swimming
with Dolphins
Slide 31
Null hypothesis: Dolphins dont help Swimming with dolphins is
not associated with substantial improvement in depression
Alternative hypothesis: Dolphins help Swimming with dolphins
increases the probability of substantial improvement in depression
symptoms Swimming with Dolphins
Slide 32
Slide 33
Null Hypothesis: The probability someone exhibits substantial
improvement after swimming with dolphins is the same as the
probability someone exhibits substantial improvement after swimming
without dolphins. Alternative Hypothesis: The probability someone
exhibits substantial improvement after swimming with dolphins is
higher than the probability someone exhibits substantial
improvement after swimming without dolphins. Swimming with
Dolphins
Slide 34
Slide 35
If the null hypothesis is true (dolphin therapy is not better)
we would have 13 improvers and 17 non-improvers regardless of the
group they were in. Any differences we see between groups arise
solely from the randomness in the assignment to the groups.
Swimming with Dolphins
Slide 36
We can perform this simulation with cards. 13 black cards
represent the improvers 17 red cards represent the non-improvers We
assume these outcomes would happen no matter which treatment group
subjects were in. Shuffle the cards and put 15 in one pile (dolphin
therapy) and 15 in another (control group) An improver is equally
likely to be assigned to each group Swimming with Dolphins
Slide 37
In the actual study, there were 10 improvers (diff of 0.47) in
the dolphin group. We conducted 3 simulations and got 8, 5, and 6
improvers in the dolphin therapy group. (notice the diff in
proportions) Swimming with Dolphins
Slide 38
We did 1000 repetitions to develop a null distribution. Why is
it centered at about 0? What does each dot represent? Swimming with
Dolphins
Slide 39
Slide 40
A 95% confidence interval for the difference in the probability
using the standard deviation from the null distribution is 0.467 +
2(0.178) = 0.467 + 0.356 or (0.111to 0.823) We are 95% confident
that when allowed to swim with dolphins, the probability of
improving is between 0.111 and 0.823 higher than when no dolphins
are present. How does this interval back up our conclusion from the
test of significance? Swimming with Dolphins
Slide 41
Can we say that the presence of dolphins caused this
improvement? Since this was a randomized experiment, and assuming
everything was identical between the groups, we have strong
evidence that dolphins were the cause Can we generalize to a larger
population? Maybe mild to moderately depressed 18-65 year old
patients willing to volunteer for this study We have no evidence
that random selection was used to find the 30 subjects. Swimming
with Dolphins
Slide 42
Exploration 5.2: Contagious Yawns? MythBusters investigated
this. 50 subjects were ushered into a small room by co-host Kari.
She yawned as she ushered 34 in the room and for 16 she didnt yawn.
We will assume she randomly decided who would received the
yawns.
Slide 43
Comparing Two Proportions: Theory-Based Approach Section
5.3
Slide 44
Introduction Just as with a single proportion, we can often
predict results of a simulation using a theory-based approach. The
theory-based approach also gives a simpler way to generate a
confidence intervals.
Slide 45
Smoking and Birth Gender
Slide 46
Smoking and Gender How does parents behavior affect the sex of
their children? Fukuda et al., 2002 (Japan) found the following:
255 of 565 births (45.1%) where both parents smoked more than a
pack a day were boys. 1975 of 3602 births (54.8%) where both
parents did not smoke were boys. Other studies have shown a reduced
male to female birth ratio where high concentrations of other
environmental chemicals are present (e.g. industrial pollution,
pesticides)
Slide 47
Smoking and Gender A segmented bar graph and 2-way table Lets
compare the proportions to see if the difference is statistically
significantly.
Slide 48
Smoking and Gender
Slide 49
What are the observational units in the study? What are the
variables in this study? Which variable should be considered the
explanatory variable and which the response variable? Can you draw
cause-and-effect conclusions for this study?
Slide 50
Smoking and Gender OK to shuffle? In the last section we
re-randomized subjects to treatment groups to simulate the null
distribution. In this study the parents werent randomized to the
treatment, since its observational, but we can still represent the
null hypothesis of no association through randomization.
Slide 51
Smoking and Gender Use the 3S Strategy to asses the strength 1.
Statistic: The proportion of boys born to nonsmokers minus boys
born to smokers is 0.548 0.451 = 0.097.
Slide 52
Smoking and Gender 2. Simulate: Use the Multiple Proportions
applet to simulate Many repetitions of shuffling the 2230 boys and
1937 girls to the 565 smoking and 3602 nonsmoking parents Calculate
the difference in proportions of boys between the groups for each
repetition. Shuffling simulates the null hypothesis of no
association
Slide 53
Smoking and Gender 3. Strength of evidence: Nothing as extreme
as our observed statistic ( 0.097 or 0.097) occurred in 5000
repetitions, How many SDs is 0.097 above the mean?
Slide 54
Smoking and Gender Notice the null distribution is centered at
zero and is bell-shaped. This, along with its standard deviation
can be predicted using normal distributions.
Slide 55
Smoking and Gender We can use either the Multiple Proportion
applet or the Theory-Based Inference applet to find the
p-value
Slide 56
Smoking and Gender
Slide 57
From our test of significance, do we expect 0 to be in the
interval of plausible values for the difference in the population
proportions?
Slide 58
Smoking and Gender Again, either applet can be used to
determine a confidence interval. We are 95% confident that the
probability of a boy baby is 0.053 to 0.141 higher for families
where neither parent smokes compared to families with two smoking
parents
Slide 59
Smoking and Gender We can also write the confidence interval in
the form: statistic margin of error. Our statistic is the observed
sample difference in proportions, 0.097. We can find the margin of
error by subtracting the statistic (center) from the upper endpoint
or 0.141 0.097 = 0.044. 0.097 0.044 Is the margin of error about
the standard deviation?
Slide 60
Smoking and Gender How would the interval change if the
confidence level was 99%?
Slide 61
Smoking and Gender Written as the statistic margin of error
0.097 0.058. Margin of error 0.058 for the 99% confidence interval
0.044 for the 95% confidence interval
Slide 62
Smoking and Gender
Slide 63
( 0.141, 0.053) or 0.097 0.044 instead of (0.053, 0.141) or
0.097 0.044 The negative signs indicate the probability of a boy
born to smoking parents is lower than that for nonsmoking
parents.
Slide 64
Smoking and Gender Validity Conditions of Theory-Based Same as
with a single proportion. Should have at least 10 observations in
each of the cells of the 2 x 2 table. Smoking Parents Non- smoking
Parents Total Male 25519752230 Female 31016271937 Total
56536024167
Slide 65
Smoking and Gender The strong significant result in this study
yielded quite a bit of press when it came out Soon other studies
came out which found no relationship between smoking and gender One
article argued that confounding variables like social factors,
diet, environmental exposure or stress were the reason for
different studys results. (These are all possible since it was an
observational study.)
Slide 66
Formulas
Slide 67
Formulas
Slide 68
Strength of Evidence As the proportions move farther away from
each other, the strength of evidence increases. As sample size
increases, the strength of evidence increases.
Slide 69
Lets run this previous test using both the Simulation-Based and
the Theory- Based Applets. Donating Blood Exploration 5.3 Questions
1-14 (skip 2 and 5)