AP Statistics – Ch. 10 Notes
Comparing Two Proportions
Situations in which we perform inference about the difference between two proportions ($p_1 - p_2$):
• Comparing the proportions of individuals with a certain characteristic in two different populations: The parameters of interest are the true proportions of individuals with the characteristic in each population, $p_1$ and $p_2$. We estimate these proportions by taking separate random samples from the two populations and calculating the proportion of individuals in each sample with the characteristic ($\hat{p}_1$ and $\hat{p}_2$).
• Comparing the proportions of successful outcomes for two treatment groups in a completely randomized experiment: The parameters of interest are the true proportions of successful outcomes for each treatment, $p_1$ and $p_2$. We estimate these proportions using the proportions of successes in the two treatment groups of our randomized experiment ($\hat{p}_1$ and $\hat{p}_2$).
We compare the populations or treatments by doing inference about the difference $p_1 - p_2$ between the parameters. The statistic that we use to estimate this difference in a confidence interval or hypothesis test is the difference between the two sample proportions, $\hat{p}_1 - \hat{p}_2$.
The Sampling Distribution of $\hat{p}_1 - \hat{p}_2$
Choose an SRS of size $n_1$ from Population 1 with proportion of successes $p_1$ and an independent SRS of size $n_2$ from Population 2 with proportion of successes $p_2$.
• Shape: When $n_1p_1$, $n_1q_1$, $n_2p_2$, and $n_2q_2$ are all at least 10 (where $q_1 = 1 - p_1$ and $q_2 = 1 - p_2$), the sampling distribution of $\hat{p}_1 - \hat{p}_2$ is approximately Normal.
• Center: $\mu_{\hat{p}_1 - \hat{p}_2} = p_1 - p_2$. That is, the difference in sample proportions is an unbiased estimator of the difference in population proportions.
• Spread: $\sigma_{\hat{p}_1 - \hat{p}_2} = \sqrt{\dfrac{p_1q_1}{n_1} + \dfrac{p_2q_2}{n_2}}$ as long as the 10% condition is met.
Example: A researcher reports that 80% of high school graduates but only 40% of high school dropouts would
pass a basic literacy test. Assume the researcher’s claim is true. Suppose we give a basic literacy test to a
random sample of 60 high school graduates and a separate random sample of 75 high school dropouts. Let
$\hat{p}_{\text{graduate}}$ and $\hat{p}_{\text{dropout}}$ be the proportions of graduates and dropouts in the samples who pass the test, respectively.
a) What is the shape of the sampling distribution of $\hat{p}_{\text{graduate}} - \hat{p}_{\text{dropout}}$? How do you know?
b) Find the mean and standard deviation of the sampling distribution of $\hat{p}_{\text{graduate}} - \hat{p}_{\text{dropout}}$. Interpret these values in context.
c) Find the probability that in your samples the proportion of graduates who pass the test is no more than
0.20 higher than the proportion of dropouts who pass.
d) Suppose that the difference in the sample proportions (graduate – dropout) who pass the test is exactly
0.20. Based on your result in part (c), would this give you reason to doubt the researcher’s claim?
Explain.
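The calculations for parts (a) through (c) can be sketched in Python with the standard library's `statistics.NormalDist`. This is a minimal sketch that assumes the researcher's claimed values (0.80 and 0.40) are the true proportions and that both populations are large enough for the 10% condition; the variable names are illustrative only.

```python
import math
from statistics import NormalDist

# Parameters taken from the researcher's claim in the example
p_grad, n_grad = 0.80, 60   # high school graduates
p_drop, n_drop = 0.40, 75   # high school dropouts

# Shape: the Normal approximation is reasonable if n*p and n*q are all at least 10
counts = [n_grad * p_grad, n_grad * (1 - p_grad),
          n_drop * p_drop, n_drop * (1 - p_drop)]
normal_ok = all(c >= 10 for c in counts)

# Center and spread of the sampling distribution of p_hat_grad - p_hat_drop
mean_diff = p_grad - p_drop
sd_diff = math.sqrt(p_grad * (1 - p_grad) / n_grad
                    + p_drop * (1 - p_drop) / n_drop)

# Part (c): P(p_hat_grad - p_hat_drop <= 0.20) using the Normal approximation
prob_at_most_020 = NormalDist(mean_diff, sd_diff).cdf(0.20)

print(f"Normal condition met: {normal_ok}")
print(f"mean = {mean_diff:.2f}, sd = {sd_diff:.4f}")
print(f"P(difference <= 0.20) = {prob_at_most_020:.4f}")
```

Under these assumptions the mean is 0.40, the standard deviation is about 0.077, and the probability in part (c) comes out to roughly 0.0045, which is the kind of small value part (d) asks you to weigh.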
Just as we can construct confidence intervals and perform hypothesis tests for one-sample situations, we can do
the same for two-sample situations.
When we are constructing a confidence interval for $p_1 - p_2$, we don’t know the values of $p_1$ or $p_2$, so we have to use $\hat{p}_1$ and $\hat{p}_2$ to estimate these values in the formula for standard deviation.
Standard Error (or Estimated Standard Deviation) of $\hat{p}_1 - \hat{p}_2$:
$$SE_{\hat{p}_1 - \hat{p}_2} = \sqrt{\frac{\hat{p}_1\hat{q}_1}{n_1} + \frac{\hat{p}_2\hat{q}_2}{n_2}}$$
Two-Sample z Interval for a Difference between Two Proportions (or Two-Proportion z Interval)
An approximate level C confidence interval for $p_1 - p_2$ is
$$(\hat{p}_1 - \hat{p}_2) \pm z^* \sqrt{\frac{\hat{p}_1\hat{q}_1}{n_1} + \frac{\hat{p}_2\hat{q}_2}{n_2}}$$
where $z^*$ is the critical value for the standard Normal curve with area C between $-z^*$ and $z^*$.
Conditions:
• Random: The data come from independent random samples from two different populations.
• Normal: The counts of “successes” and “failures” in each sample, $n_1\hat{p}_1$, $n_1\hat{q}_1$, $n_2\hat{p}_2$, and $n_2\hat{q}_2$, are all at least 10.
• 10% Condition: Check that both populations are at least 10 times as large as their corresponding samples.
Confidence Interval for the Difference between Two Proportions on TI-83/TI-84 Calculators
1. Choose “2-PropZInt” on the STAT → TESTS menu.
2. Enter the requested information:
x1: number of successes in sample 1
n1: sample size of sample 1
x2: number of successes in sample 2
n2: sample size of sample 2
C-Level: confidence level (as a decimal)
3. Choose “Calculate”
Example: Did the proportion of U.S. adults who would report having read a book in the past year change
between 2015 and 2018? In a random sample of 1,403 U.S. adults in April 2015, 72% reported having read a
book in the past year. In another random sample of 2,001 U.S. adults in January 2018, 76% reported having
read a book in the past year.
a) Calculate the standard error of the sampling distribution of the difference in the sample proportions
(2018 – 2015). Interpret this value.
b) Construct and interpret a 90% confidence interval for the difference in the proportions of all U.S. adults
who would report having read a book in the past year in 2018 and in 2015 (2018-2015).
c) Based on your interval, is there convincing evidence that the proportion of U.S. adults who would report
having read a book in the past year changed between 2015 and 2018? Explain.
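A worked version of parts (a) and (b), as a sketch: it uses the reported sample proportions directly (0.76 for 2018, 0.72 for 2015) and the 90% critical value $z^* = 1.645$. The Normal condition holds comfortably, since every count, such as $2001(0.76) \approx 1521$, is far above 10.

```latex
SE = \sqrt{\frac{(0.76)(0.24)}{2001} + \frac{(0.72)(0.28)}{1403}} \approx 0.0153

(0.76 - 0.72) \pm 1.645(0.0153) = 0.04 \pm 0.0252 \;\Rightarrow\; (0.0148,\ 0.0652)
```

Because the entire interval lies above 0, these numbers point toward a change (an increase) between 2015 and 2018, which is what part (c) asks you to justify.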
Example: Are teens or adults more likely to be online “almost constantly”? The Pew Internet and American
Life Project asked a random sample of 1060 teens (September 2014 to March 2015) and a separate random
sample of 2001 adults (June to July 2015) how often they use the Internet. In these two surveys, 257 of the teens
and 420 of the adults said they are online “almost constantly”. Construct and interpret a 95% confidence
interval for $p_{\text{teens}} - p_{\text{adults}}$.
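The two-proportion z interval can be sketched in Python as follows. The helper name `two_prop_z_interval` is hypothetical and simply mirrors the inputs that 2-PropZInt asks for (x1, n1, x2, n2, C-Level); it uses `statistics.NormalDist().inv_cdf` to find $z^*$ and assumes the Random, Normal, and 10% conditions have already been checked.

```python
import math
from statistics import NormalDist


def two_prop_z_interval(x1, n1, x2, n2, conf=0.95):
    """Two-proportion z interval for p1 - p2 (hypothetical helper).

    Inputs mirror 2-PropZInt: success counts x1, x2, sample sizes n1, n2,
    and the confidence level as a decimal.
    """
    p1_hat, p2_hat = x1 / n1, x2 / n2
    # Standard error uses the sample proportions (no pooling for intervals)
    se = math.sqrt(p1_hat * (1 - p1_hat) / n1 + p2_hat * (1 - p2_hat) / n2)
    # z* leaves area (1 - conf) / 2 in each tail of the standard Normal curve
    z_star = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    diff = p1_hat - p2_hat
    return diff - z_star * se, diff + z_star * se


# Teens (sample 1) vs. adults (sample 2), 95% confidence
low, high = two_prop_z_interval(257, 1060, 420, 2001, conf=0.95)
print(f"95% CI for p_teens - p_adults: ({low:.4f}, {high:.4f})")
```

With the teen (257 of 1060) and adult (420 of 2001) counts, the sketch gives an interval of roughly 0.001 to 0.064. Since the whole interval is above 0, it is consistent with teens being somewhat more likely than adults to be online almost constantly.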
Significance Tests for $p_1 - p_2$
Very often, we want to test the null hypothesis that there is no difference between two proportions, so we test $H_0: p_1 - p_2 = 0$ or, alternatively, $H_0: p_1 = p_2$. The alternative hypothesis specifies what kind of difference we expect.
Since the null hypothesis assumes that $p_1 = p_2$, we use a pooled sample proportion to calculate the standard deviation:
$$\hat{p}_C = \frac{X_1 + X_2}{n_1 + n_2} = \frac{\text{count of successes in both samples combined}}{\text{count of individuals in both samples combined}}$$
where $\hat{p}_C$ is the pooled sample proportion.
Two-Sample z Test for the Difference between Two Proportions
To test the hypothesis $H_0: p_1 - p_2 = 0$ or $H_0: p_1 = p_2$:
Test statistic:
$$z = \frac{(\hat{p}_1 - \hat{p}_2) - 0}{\sqrt{\dfrac{\hat{p}_C\hat{q}_C}{n_1} + \dfrac{\hat{p}_C\hat{q}_C}{n_2}}}$$
P-value: Find the probability of getting a z statistic at least this extreme in the direction specified by the alternative hypothesis $H_a$. The P-value is the shaded area. For a two-tailed test, it is the total area in both tails.
Conditions:
• Random: The data come from independent random samples from two different populations.
• Normal: The counts of “successes” and “failures” in each sample, $n_1\hat{p}_1$, $n_1\hat{q}_1$, $n_2\hat{p}_2$, and $n_2\hat{q}_2$, are all at least 10.
• 10% Condition: Check that both populations are at least 10 times as large as their corresponding samples.
Hypothesis Test for the Difference between Two Proportions on TI-83/TI-84 Calculators
1. Choose “2-PropZTest” on the STAT → TESTS menu.
2. Enter the requested information:
x1: number of successes in sample 1
n1: sample size of sample 1
x2: number of successes in sample 2
n2: sample size of sample 2
3. Specify which proportion the alternative hypothesis says is higher.
4. Choose “Calculate” to see results, or “Draw” to see a shaded Normal curve.
Example: Are teenagers going deaf? In a study of 3000 randomly selected teenagers in 1988-1994, 15%
showed some hearing loss. In a similar study of 1800 teenagers in 2005-2006, 19.5% showed some hearing loss.
(These data are reported in Arizona Daily Star, August 18, 2010.)
a) Do these data give convincing evidence that the proportion of all teens with hearing loss has increased?
b) Between the two studies, Apple introduced the iPod. If the results of the test are statistically significant,
can we blame iPods for the increased hearing loss in teenagers?
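The pooled two-proportion z test can be sketched in Python. Because the example reports percentages rather than counts, this sketch assumes the success counts are 19.5% of 1800 = 351 and 15% of 3000 = 450; the helper `two_prop_z_test` is a hypothetical name, not a calculator or library function.

```python
import math
from statistics import NormalDist


def two_prop_z_test(x1, n1, x2, n2, alternative="greater"):
    """Pooled two-proportion z test of H0: p1 - p2 = 0 (hypothetical helper).

    alternative: "greater" (Ha: p1 > p2), "less" (Ha: p1 < p2),
    or "two-sided" (Ha: p1 != p2).
    """
    p1_hat, p2_hat = x1 / n1, x2 / n2
    p_pool = (x1 + x2) / (n1 + n2)          # pooled sample proportion
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
    z = (p1_hat - p2_hat) / se
    std_normal = NormalDist()
    if alternative == "greater":
        p_value = 1 - std_normal.cdf(z)
    elif alternative == "less":
        p_value = std_normal.cdf(z)
    else:
        p_value = 2 * (1 - std_normal.cdf(abs(z)))
    return z, p_value


# Assumed counts: 19.5% of 1800 (2005-2006) and 15% of 3000 (1988-1994)
z, p_value = two_prop_z_test(351, 1800, 450, 3000, alternative="greater")
print(f"z = {z:.2f}, P-value = {p_value:.6f}")
```

Here z comes out near 4.05 with a one-sided P-value far below 0.001, so these data would give convincing evidence of an increase. For part (b), keep in mind that these are separate observational studies rather than an experiment, so a significant result alone would not show that iPods caused the increase.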
Example: The Centers for Disease Control and Prevention selected a random sample of Americans age 65 and
older. They found that 411 of the 1012 men and 535 of the 1062 women suffered from some form of arthritis.
Do these data provide convincing evidence that arthritis is more likely to afflict senior women than senior men?
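A worked sketch of this test, taking $H_0: p_W - p_M = 0$ and $H_a: p_W - p_M > 0$, where the subscripts W and M (chosen here for illustration) stand for senior women and senior men. All four success and failure counts (535, 527, 411, 601) are well above 10.

```latex
\hat{p}_W = \frac{535}{1062} \approx 0.5038, \qquad
\hat{p}_M = \frac{411}{1012} \approx 0.4061, \qquad
\hat{p}_C = \frac{535 + 411}{1062 + 1012} = \frac{946}{2074} \approx 0.4561

z = \frac{0.5038 - 0.4061}{\sqrt{(0.4561)(0.5439)\left(\frac{1}{1062} + \frac{1}{1012}\right)}}
  \approx \frac{0.0977}{0.0219} \approx 4.46

\text{P-value} = P(Z \ge 4.46) < 0.0001
```

With a P-value this small, the sketch would reject $H_0$: these data give convincing evidence that a higher proportion of senior women than senior men suffer from arthritis.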
Comparing Two Means
Situations in which we perform inference about the difference between two means ($\mu_1 - \mu_2$):
• Comparing the mean of some quantitative variable for the individuals in two different populations: The parameters of interest are the population means in each population, $\mu_1$ and $\mu_2$. We estimate these means by taking separate random samples from each population and calculating the sample means $\bar{x}_1$ and $\bar{x}_2$.
• Comparing the average effectiveness of two treatments in a completely randomized experiment: The parameters of interest are the true mean responses for treatment 1 and treatment 2, $\mu_1$ and $\mu_2$. We use the mean responses in the two groups, $\bar{x}_1$ and $\bar{x}_2$, to make the comparison.
We compare the populations or treatments by doing inference about the difference $\mu_1 - \mu_2$ between the parameters. The statistic that we use to estimate this difference in a confidence interval or hypothesis test is the difference between the two sample means, $\bar{x}_1 - \bar{x}_2$.
The Sampling Distribution of $\bar{x}_1 - \bar{x}_2$
Choose an SRS of size $n_1$ from Population 1 with mean $\mu_1$ and standard deviation $\sigma_1$ and an independent SRS of size $n_2$ from Population 2 with mean $\mu_2$ and standard deviation $\sigma_2$.
• Shape: When both population distributions are Normal, the sampling distribution of $\bar{x}_1 - \bar{x}_2$ is Normal. In other cases, the sampling distribution of $\bar{x}_1 - \bar{x}_2$ is approximately Normal if the sample sizes are large enough ($n_1 \ge 30$ and $n_2 \ge 30$).
• Center: $\mu_{\bar{x}_1 - \bar{x}_2} = \mu_1 - \mu_2$. That is, the difference in sample means is an unbiased estimator of the difference in population means.
• Spread: $\sigma_{\bar{x}_1 - \bar{x}_2} = \sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}$ as long as the 10% condition is met.
Example: A potato chip manufacturer buys potatoes from two different suppliers, Riderwood Farms and
Camberley, Inc. The weights of potatoes from Riderwood Farms are approximately Normally distributed with a
mean of 175 grams and a standard deviation of 25 grams. The weights of potatoes from Camberley are
approximately Normally distributed with a mean of 180 grams and a standard deviation of 30 grams. When
shipments arrive at the factory, inspectors randomly select a sample of 20 potatoes from each shipment and
weigh them. They are surprised when the average weight of the potatoes in the sample from Riderwood Farms,
$\bar{x}_R$, is higher than the average weight of the potatoes in the sample from Camberley, $\bar{x}_C$.
a) Describe the shape, center, and spread of the sampling distribution of $\bar{x}_C - \bar{x}_R$. Interpret the values of the mean and standard deviation in context.
b) Find the probability that the mean weight of the Riderwood sample is larger than the mean weight of the
Camberley sample. Should the inspectors have been surprised that the Riderwood sample had a higher
mean weight than the Camberley sample?
c) Review from Ch. 6: Find the probability that a single potato from Riderwood Farms weighs more than a single potato from Camberley.
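A minimal sketch of parts (a) through (c) in Python, assuming the two shipments are independent; the variable names are illustrative. Part (c) differs from part (b) only in that single potatoes are compared, so the variances are not divided by the sample size.

```python
import math
from statistics import NormalDist

# Population values from the example (grams)
mu_r, sigma_r = 175, 25   # Riderwood Farms
mu_c, sigma_c = 180, 30   # Camberley, Inc.
n = 20                    # potatoes sampled from each shipment

# Part (a): x_bar_C - x_bar_R is Normal because both populations are Normal
mean_diff = mu_c - mu_r
sd_diff = math.sqrt(sigma_c ** 2 / n + sigma_r ** 2 / n)

# Part (b): P(x_bar_R > x_bar_C) = P(x_bar_C - x_bar_R < 0)
p_sample = NormalDist(mean_diff, sd_diff).cdf(0)

# Part (c): one potato from each farm, so no division by n
sd_single = math.sqrt(sigma_c ** 2 + sigma_r ** 2)
p_single = NormalDist(mean_diff, sd_single).cdf(0)

print(f"mean = {mean_diff} g, sd = {sd_diff:.2f} g")
print(f"P(Riderwood sample mean larger) = {p_sample:.4f}")
print(f"P(single Riderwood potato heavier) = {p_single:.4f}")
```

Under these assumptions, $\bar{x}_C - \bar{x}_R$ has mean 5 grams and standard deviation about 8.73 grams, the probability in part (b) is about 0.28, and the single-potato probability in part (c) is about 0.45, noticeably closer to one half.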
Standard Error (or Estimated Standard Deviation) of $\bar{x}_1 - \bar{x}_2$:
$$SE_{\bar{x}_1 - \bar{x}_2} = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$$
Two-Sample t Statistic:
$$t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}$$
Like any other z or t statistic, this statistic tells us how many standard errors the sample statistic $\bar{x}_1 - \bar{x}_2$ is from its mean. The two-sample t statistic has approximately a t distribution. There are two options for determining the degrees of freedom.
• Option 1 (Technology): Use the t distribution with degrees of freedom calculated from the data by the lovely formula below. With this option, the degrees of freedom may not be a whole number.
$$df = \frac{\left(\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}\right)^2}{\dfrac{1}{n_1 - 1}\left(\dfrac{s_1^2}{n_1}\right)^2 + \dfrac{1}{n_2 - 1}\left(\dfrac{s_2^2}{n_2}\right)^2}$$
• Option 2 (Conservative): Use the t distribution with degrees of freedom equal to the smaller of $n_1 - 1$ and $n_2 - 1$. This always gives a confidence interval that is wider than necessary (higher margin of error) for the desired confidence level, and a P-value that is greater than or equal to the true P-value.
Robustness of Two-Sample t Procedures
Two-sample t procedures are even more robust against non-Normality than the one-sample t procedures. This is especially true if the two populations being compared have distributions with similar shapes. The two-sample t procedures are most robust against non-Normality when the sample sizes are equal or very similar.
Two-Sample t Interval for a Difference between Two Means (or Two-Sample t Interval)
An approximate level C confidence interval for $\mu_1 - \mu_2$ is
$$(\bar{x}_1 - \bar{x}_2) \pm t^* \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$$
where $t^*$ is the critical value for confidence level C for the t distribution with degrees of freedom approximated by technology or the smaller of $n_1 - 1$ and $n_2 - 1$.
Conditions:
• Random: The data come from independent random samples from two different populations.
• Normal/Large Sample Size: Both samples are large ($n_1 \ge 30$ and $n_2 \ge 30$) OR no strong skewness or outliers can be seen in the graph of either distribution of sample data. (You are checking to make sure it is reasonable to believe that both population distributions are approximately Normal.)
• 10% Condition: Check that both populations are at least 10 times as large as their corresponding samples.
Confidence Interval for the Difference between Two Means on TI-83/TI-84 Calculators
1. Choose “2-SampTInt” on the STAT → TESTS menu.
2. Choose “Data” if you have a list of sample data. Choose “Stats” if you have values for $\bar{x}_1$, $\bar{x}_2$, $s_1$, and $s_2$.
3. Enter the requested information:
For “Data” option, input the sample values into two lists and indicate which lists they are in.
For “Stats” option,
x̄1: mean of sample 1    x̄2: mean of sample 2
Sx1: sample st. dev. of sample 1    Sx2: sample st. dev. of sample 2
n1: size of sample 1    n2: size of sample 2
C-Level: confidence level (as a decimal)
Always choose “NO” pooling!
4. Choose “Calculate”
Example: A team of anthropologists headed by researcher Nicole Waguespack studied the difference between
stone-tipped and wooden-tipped arrows. Stone arrow tips are tougher, but also take longer to make. Many
cultures used both types of arrow tips, including the Apache in North America and the Tiwi in Australia. The
researchers set up a compound bow with 60 lbs. of force. They shot arrows of both types into a hide-covered
ballistics gel and measured the penetration depth in mm. Arrows that penetrate more deeply into their targets
make deadlier weapons. Here are the penetration depths for seven shots with each type of arrow tip.
Wooden (mm) 216 211 192 208 203 210 203
Stone (mm) 240 208 213 225 232 214 240
a) Draw parallel dotplots of the data. Just from looking at the graphs, do you expect to find evidence of a
significant difference in penetration depth for the two types of arrow tips?
b) Construct and interpret a 95% confidence interval for the difference in mean penetration depth for the
two types of arrow tips.
c) Does your interval provide convincing evidence that there is a difference in the mean penetration depth
for the two types of arrow tips? Does the difference seem worth the extra time, effort, and cost of stone
tips?
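Python's standard library has no t distribution, so this sketch approximates the t CDF by numerically integrating the t density (Simpson's rule) and finds $t^*$ by bisection; the helper names (`t_pdf`, `t_cdf`, `t_star`, `two_sample_t_interval`) are hypothetical. It computes the interval for $\mu_{\text{stone}} - \mu_{\text{wooden}}$ both ways described under the degrees-of-freedom options: Option 1's formula and Option 2's conservative $\min(n_1, n_2) - 1$.

```python
import math
from statistics import mean, variance


def t_pdf(x, df):
    """Density of the t distribution; df may be a non-integer."""
    c = math.exp(math.lgamma((df + 1) / 2) - math.lgamma(df / 2)) / math.sqrt(df * math.pi)
    return c * (1 + x * x / df) ** (-(df + 1) / 2)


def t_cdf(x, df, steps=2000):
    """P(T <= x): Simpson's rule on [0, |x|], then symmetry about 0."""
    h = abs(x) / steps
    total = t_pdf(0, df) + t_pdf(abs(x), df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return 0.5 + area if x >= 0 else 0.5 - area


def t_star(conf, df):
    """Critical value with area `conf` between -t* and t* (bisection)."""
    target = 1 - (1 - conf) / 2
    lo, hi = 0.0, 50.0
    for _ in range(50):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def two_sample_t_interval(x1, x2, conf=0.95, conservative=False):
    """Two-sample t interval for mu1 - mu2 (no pooling)."""
    n1, n2 = len(x1), len(x2)
    v1, v2 = variance(x1) / n1, variance(x2) / n2    # s^2 / n for each sample
    se = math.sqrt(v1 + v2)
    if conservative:
        df = min(n1, n2) - 1                          # Option 2
    else:
        df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))  # Option 1
    diff = mean(x1) - mean(x2)
    margin = t_star(conf, df) * se
    return diff - margin, diff + margin, df


wooden = [216, 211, 192, 208, 203, 210, 203]
stone = [240, 208, 213, 225, 232, 214, 240]

low, high, df = two_sample_t_interval(stone, wooden)
low_c, high_c, df_c = two_sample_t_interval(stone, wooden, conservative=True)
print(f"Option 1: df = {df:.2f}, 95% CI = ({low:.2f}, {high:.2f}) mm")
print(f"Option 2: df = {df_c}, 95% CI = ({low_c:.2f}, {high_c:.2f}) mm")
```

For the arrow data, Option 1 gives about 9.67 degrees of freedom and an interval of roughly 5.5 mm to 31.4 mm, while Option 2 (df = 6) gives a slightly wider interval, as the notes predict.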
Two-Sample t Test for the Difference between Two Means
To test the hypothesis $H_0: \mu_1 - \mu_2 = \text{hypothesized value}$, compute the two-sample t statistic
Test statistic:
$$t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}$$
P-value: Find the probability of getting a t statistic at least this extreme in the direction specified by the alternative hypothesis $H_a$. Use the t distribution with degrees of freedom approximated by technology or the smaller of $n_1 - 1$ and $n_2 - 1$. The P-value is the shaded area. For a two-tailed test, it is the total area in both tails.
$H_a: \mu_1 - \mu_2 > \text{hypothesized value}$    $H_a: \mu_1 - \mu_2 < \text{hypothesized value}$    $H_a: \mu_1 - \mu_2 \neq \text{hypothesized value}$
Conditions:
• Random: The data come from independent random samples from two different populations.
• Normal/Large Sample Size: Both samples are large ($n_1 \ge 30$ and $n_2 \ge 30$) OR no strong skewness or outliers can be seen in the graph of either distribution of sample data. (You are checking to make sure it is reasonable to believe that both population distributions are approximately Normal.)
• 10% Condition: Check that both populations are at least 10 times as large as their corresponding samples.
Significance Tests for the Difference between Two Means on TI-83/TI-84 Calculators
1. Choose “2-SampTTest” on the STAT → TESTS menu.
2. Choose “Data” if you have a list of sample data. Choose “Stats” if you have values for $\bar{x}_1$, $\bar{x}_2$, $s_1$, and $s_2$.
3. Enter the requested information:
For “Data” option, input the sample values into two lists and indicate which lists they are in.
For “Stats” option,
x̄1: mean of sample 1    x̄2: mean of sample 2
Sx1: sample st. dev. of sample 1    Sx2: sample st. dev. of sample 2
n1: size of sample 1    n2: size of sample 2
Always choose “NO” pooling!
4. Specify which mean the alternative hypothesis says is higher.
5. Choose “Calculate” to see results, or “Draw” to see a shaded t curve.
Example: In commercials for Bounty paper towels, the manufacturer claims that they are the “quicker picker-upper.” But
are they also the stronger picker-upper? Two AP Statistics students, Wesley and Maverick, decided to find out.
They selected a random sample of 30 Bounty paper towels and a random sample of 30 generic paper towels and
measured their strength when wet. To do this, they uniformly soaked each paper towel with 4 ounces of water,
held two opposite edges of the paper towel, and counted how many quarters each paper towel could hold until
ripping, alternating brands. Here are their results:
Bounty: 106 111 106 120 103 112 115 125 116 120 126 125 116 117 114
118 126 120 115 116 121 113 111 128 124 125 127 123 115 114
Generic: 77 103 89 79 88 86 100 90 81 84 84 96 87 79 90
86 88 81 91 94 90 89 85 83 89 84 90 100 94 87
a) Display these distributions using parallel boxplots and briefly compare these distributions. Based only
on the boxplots, discuss whether or not you think the mean for Bounty is significantly higher than the
mean for generic.
b) Use a significance test to determine whether there is convincing evidence that wet Bounty paper towels
can hold more weight, on average, than wet generic paper towels can.
c) Interpret the P-value from part (b) in the context of this question.
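A sketch of the test in part (b) with $H_0: \mu_B - \mu_G = 0$ and $H_a: \mu_B - \mu_G > 0$. So that the block runs on its own, it repeats the numerical t CDF helper (Simpson's rule on the t density) rather than relying on a statistics package; the function names are hypothetical, and the degrees of freedom use Option 1's formula.

```python
import math
from statistics import mean, variance


def t_pdf(x, df):
    """Density of the t distribution; df may be a non-integer."""
    c = math.exp(math.lgamma((df + 1) / 2) - math.lgamma(df / 2)) / math.sqrt(df * math.pi)
    return c * (1 + x * x / df) ** (-(df + 1) / 2)


def t_cdf(x, df, steps=2000):
    """P(T <= x): Simpson's rule on [0, |x|], then symmetry about 0."""
    h = abs(x) / steps
    total = t_pdf(0, df) + t_pdf(abs(x), df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return 0.5 + area if x >= 0 else 0.5 - area


def two_sample_t_test(x1, x2, hypothesized=0.0, alternative="greater"):
    """Two-sample t test of H0: mu1 - mu2 = hypothesized (no pooling)."""
    n1, n2 = len(x1), len(x2)
    v1, v2 = variance(x1) / n1, variance(x2) / n2
    t = (mean(x1) - mean(x2) - hypothesized) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))  # Option 1
    if alternative == "greater":
        p_value = 1 - t_cdf(t, df)
    elif alternative == "less":
        p_value = t_cdf(t, df)
    else:
        p_value = 2 * (1 - t_cdf(abs(t), df))
    return t, df, max(p_value, 0.0)   # guard against tiny negative round-off


bounty = [106, 111, 106, 120, 103, 112, 115, 125, 116, 120, 126, 125, 116, 117, 114,
          118, 126, 120, 115, 116, 121, 113, 111, 128, 124, 125, 127, 123, 115, 114]
generic = [77, 103, 89, 79, 88, 86, 100, 90, 81, 84, 84, 96, 87, 79, 90,
           86, 88, 81, 91, 94, 90, 89, 85, 83, 89, 84, 90, 100, 94, 87]

t, df, p_value = two_sample_t_test(bounty, generic, alternative="greater")
print(f"means: {mean(bounty):.2f} vs {mean(generic):.2f}")
print(f"t = {t:.2f}, df = {df:.1f}, P-value = {p_value:.2e}")
```

With these data the sample means are 117.6 and about 88.13 quarters and t is about 17.6, so the P-value is essentially 0: if wet Bounty and generic towels held the same mean number of quarters, a difference this large would almost never occur by chance.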
Independent Samples vs. Paired Samples – Which Is It?
a) To test the effect of background music on productivity, several workers are observed. For one month
they had no music. For another month they had background music.
b) A random sample of 10 workers in Plant A are to be compared to a sample of 10 workers in Plant B.
c) A new weight reducing diet was tried on ten women. The weight of each woman was measured before
the diet, and again after being on the diet for ten weeks.
d) To compare the average weight gain of pigs fed two different rations, nine pairs of pigs were used. The
pigs in each pair were litter-mates.
e) To test the effects of a new fertilizer, 100 plots are treated with one fertilizer, and 100 plots are treated
with the other.
f) A sample of college teachers is taken. We wish to compare the average salaries of male and female
teachers.
g) A new fertilizer is tested on 100 plots. Each plot is divided in half. Fertilizer A is applied to one half and
B to the other.
h) Consumers Union wants to compare two types of calculators. They get 100 volunteers and ask them to
carry out a series of 50 routine calculations (such as figuring discounts, sales tax, totaling a bill, etc.).
Each volunteer does each calculation on both types of calculator, and the time required for each
calculation is recorded.
Inference for Experiments
Important Differences
Parameters:
• Proportions of individuals like those in the study who would respond a certain way to each treatment.
• Mean responses to each treatment for individuals like those in the study.
Caution: Avoid past tense when defining your parameters and in your conclusion. If you refer to how individuals did respond rather than how they would respond, you are talking about your treatment groups and are referring to values of statistics ($\hat{p}$'s or $\bar{x}$'s) that can be calculated directly rather than the unknown parameters ($p$'s or $\mu$'s) for which you are doing inference. Also, do not refer to subjects when defining your parameter or in your conclusion for the same reason: subjects are people who actually took part in your experiment, not individuals similar to them.
Conditions:
• Random: The data come from two groups in a randomized experiment.
• Normal:
  o For 2-sample inference about proportions:
    - $n_1\hat{p}_1$, $n_1\hat{q}_1$, $n_2\hat{p}_2$, and $n_2\hat{q}_2$ must all be at least 10. That is, there must be at least 10 successes and 10 failures in each treatment group.
  o For 2-sample inference about means:
    - The two treatment groups are both large ($n_1 \ge 30$ and $n_2 \ge 30$) OR no strong skewness or outliers can be seen in the graph of either distribution of sample data. (You are checking to make sure it is reasonable to believe that the true distributions of responses to the two treatments are approximately Normal.)
• Independent: The outcomes for the individuals in the study must be independent of each other. (The
outcome for any individual shouldn’t give you any new information about the likely outcome for any
other individual.) In a well-designed experiment that includes controls and random assignment, this
should be true. (If you want to study the effects of the treatments, you must control any other variables
that might affect the outcome, including any influence the subjects might have on each other.)
DO NOT CHECK THE 10% CONDITION! (You didn’t take a random sample, so it isn’t correct to check the 10% condition, and YOU WILL LOSE POINTS if you do!)
Example: High levels of cholesterol in the blood are associated with a higher risk of heart attacks. Will using a
drug to lower blood cholesterol reduce heart attacks? The Helsinki Heart Study recruited middle-aged men with
high cholesterol but no history of other serious medical problems to investigate this question. The volunteer
subjects were assigned at random to one of two treatments: 2051 men took the drug gemfibrozil to reduce their
cholesterol levels, and a control group of 2030 men took a placebo. During the next five years, 56 men in the
gemfibrozil group and 84 men in the placebo group had heart attacks.
a) Do the results of this study give convincing evidence at the $\alpha = 0.01$ level that gemfibrozil is effective in
preventing heart attacks?
b) Interpret the P-value you got in part a) in the context of this experiment.
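A worked sketch of the pooled two-proportion z test for part (a), with $H_0: p_G - p_C = 0$ and $H_a: p_G - p_C < 0$. The pooled proportion is written $\hat{p}_{\text{pool}}$ here (a label chosen for this sketch) so it does not clash with the subscript C used for the control group. The success and failure counts (56, 1995, 84, 1946) are all at least 10.

```latex
\hat{p}_G = \frac{56}{2051} \approx 0.0273, \qquad
\hat{p}_C = \frac{84}{2030} \approx 0.0414, \qquad
\hat{p}_{\text{pool}} = \frac{56 + 84}{2051 + 2030} = \frac{140}{4081} \approx 0.0343

z = \frac{0.0273 - 0.0414}{\sqrt{(0.0343)(0.9657)\left(\frac{1}{2051} + \frac{1}{2030}\right)}}
  \approx \frac{-0.0141}{0.0057} \approx -2.47

\text{P-value} = P(Z \le -2.47) \approx 0.0068 < \alpha = 0.01
```

Since the P-value is below 0.01, this sketch would reject $H_0$ in favor of $H_a$.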
The logic behind this example: There are two possible reasons why we might have observed a difference in
the proportions of subjects in our two groups who experienced heart attacks. Either a lower proportion of the
gemfibrozil group experienced heart attacks because gemfibrozil is more effective at preventing heart attacks
than a placebo, or all 140 people in the study who experienced heart attacks would have had a heart attack
regardless of which treatment they received, and the researchers just happened to put a higher proportion of
them in the gemfibrozil group by chance.
Let’s assume that $H_0: p_G - p_C = 0$ is true. That is, there is no difference in the effectiveness of gemfibrozil and the placebo. All 140 people in the study who experienced heart attacks would have had a heart attack no matter which treatment they received. We can think about what would happen if we were to repeat the random assignment many times, dividing the 4081 subjects into two treatment groups, one with 2051 subjects and one with 2030 subjects. Each time, we count how many of those who end up having a heart attack end up in each group, then we calculate the difference in sample proportions, $\hat{p}_G - \hat{p}_C$, for each randomization. The result is called a randomization distribution.
The figure to the right shows the result of 3000 re-
randomizations for this scenario. Notice that the
distribution is approximately Normal with a mean of
0 and a standard deviation of 0.0057, which are very
close to the same mean and standard deviation we get
from the formulas used for situations where we have
two independent random samples! Very convenient!
In the Helsinki Heart Study, the observed difference in the proportions of subjects who had a heart attack in the
gemfibrozil and placebo groups was 0.0273 – 0.0414 = – 0.0141. Only 25 of the 3000 re-randomizations
resulted in a difference in proportions this low or lower by chance, so the estimated P-value is about 0.0083,
which isn’t far off from the P-value we calculated with our formulas. Basically, the P-value is the probability
that we see at least this much of a difference in the sample proportions simply due to chance variation in
random assignment if gemfibrozil is no more effective than a placebo at preventing heart attacks.
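The re-randomization described above can be simulated in a few lines of Python. This sketch leans on one simplification: under $H_0$ the same 140 people have heart attacks whatever their group, so a fresh random assignment amounts to dropping those 140 people into random positions among the 4081 slots, with the first 2051 slots standing for the gemfibrozil group. The seed and the function name `rerandomize` are arbitrary choices, so the counts will not match the 25 out of 3000 in the notes exactly.

```python
import random
from statistics import mean, stdev

N_G, N_C = 2051, 2030           # gemfibrozil and placebo group sizes
ATTACKS = 140                   # 56 + 84 heart attacks in total
OBSERVED = 56 / N_G - 84 / N_C  # about -0.0141


def rerandomize(reps=3000, seed=2024):
    """Simulated values of p_hat_G - p_hat_C under H0 (hypothetical helper)."""
    rng = random.Random(seed)
    n_total = N_G + N_C
    diffs = []
    for _ in range(reps):
        # Under H0 the same 140 people have heart attacks in either group.
        # Random assignment drops each of them into a uniformly random slot;
        # slots 0 .. N_G - 1 represent the gemfibrozil group.
        slots = rng.sample(range(n_total), ATTACKS)
        in_g = sum(1 for s in slots if s < N_G)
        diffs.append(in_g / N_G - (ATTACKS - in_g) / N_C)
    return diffs


diffs = rerandomize()
p_value = sum(1 for d in diffs if d <= OBSERVED + 1e-12) / len(diffs)
print(f"mean = {mean(diffs):.4f}, sd = {stdev(diffs):.4f}, estimated P-value = {p_value:.4f}")
```

Runs of this sketch give a mean near 0, a standard deviation near 0.0057, and an estimated P-value in the same neighborhood as the 0.0083 reported above; the exact values depend on the seed.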
Example: Does increasing the amount of calcium in our diet reduce blood pressure? Observational studies have
suggested a link, but researchers designed a randomized comparative experiment to investigate the question of
causation. The subjects were 21 healthy men who volunteered to take part in the experiment. They were
randomly assigned to two groups: 10 of the men received a calcium supplement for 12 weeks, while the control
group of 11 men received a placebo pill that looked identical. The experiment was double-blind. The response
variable is the decrease in systolic blood pressure for a subject after 12 weeks, in millimeters of mercury. (An
increase appears as a negative number.) Here are the data:
Calcium: 7 –4 18 17 –3 –5 1 10 11 –2
Placebo: –1 12 –1 –3 3 –5 5 2 –11 –1 –3
a) Draw parallel dotplots of the data. Based only on the dotplots, do you suspect there will be a significant
difference between the true mean decrease in systolic blood pressure for healthy men like those in the
study who take calcium and those who take a placebo?
b) Do the data provide convincing evidence that a calcium supplement reduces blood pressure more than a
placebo, on average, for subjects like the ones in this study?
The logic behind this example: There are two possible reasons why we observed a difference in mean blood
pressure reduction for the two groups as large as we did. Either the mean blood pressure reduction was higher
for the calcium group because calcium is more effective at lowering blood pressure, or the two treatments are
equally effective, and any differences we saw were simply because of chance variation due to random
assignment.
Assume that $H_0: \mu_C - \mu_P = 0$ is true. That is, assume that calcium and the placebo are equally effective at lowering blood pressure. If we reassign the 21 subjects to the two groups many times, assuming the treatment doesn’t affect each individual’s change in blood pressure, and then calculate the new difference in sample mean decrease in systolic blood pressure ($\bar{x}_C - \bar{x}_P$) for each re-randomization, we can estimate how likely it is that we’d see a difference at least as extreme as the one we actually observed by chance if the two treatments don’t differ in effectiveness.
The randomization distribution is approximately
Normal with a mean of 0.014 and a standard
deviation of 3.400, which agree well with the mean
and standard deviation we calculated using the
formulas for situations involving two independent
random samples (0 and 3.29). Very convenient!
The observed difference in the mean reduction in
blood pressure in the calcium and placebo groups
was 5.000 – (–0.273) = 5.273. About 660 of the re-
randomizations resulted in differences this high or
higher by chance, so the estimated P-value is about
0.066, which isn’t far off from the P-value we calculated with our formulas. Basically, the P-value is the
probability that we see at least this much of a difference in the sample means simply due to chance variation in
random assignment if calcium is no more effective than a placebo at reducing blood pressure.
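A similar sketch for the calcium experiment, assuming (as $H_0$ does) that each subject's blood-pressure change would be the same under either treatment: shuffle the 21 observed responses, call the first 10 the calcium group, and record $\bar{x}_C - \bar{x}_P$. The number of repetitions (10,000) and the seed are arbitrary choices for this sketch, and the helper names are hypothetical.

```python
import random
from statistics import stdev

calcium = [7, -4, 18, 17, -3, -5, 1, 10, 11, -2]
placebo = [-1, 12, -1, -3, 3, -5, 5, 2, -11, -1, -3]


def avg(values):
    return sum(values) / len(values)


OBSERVED = avg(calcium) - avg(placebo)   # 5.000 - (-0.273) = 5.273


def rerandomize(reps=10000, seed=2024):
    """Differences in means from repeated random reassignment (hypothetical helper)."""
    rng = random.Random(seed)
    responses = calcium + placebo        # H0: each subject's response is fixed
    n_calcium = len(calcium)
    diffs = []
    for _ in range(reps):
        rng.shuffle(responses)           # a fresh random assignment of all 21 subjects
        diffs.append(avg(responses[:n_calcium]) - avg(responses[n_calcium:]))
    return diffs


diffs = rerandomize()
p_value = sum(1 for d in diffs if d >= OBSERVED - 1e-9) / len(diffs)
print(f"observed = {OBSERVED:.3f}")
print(f"mean = {avg(diffs):.3f}, sd = {stdev(diffs):.3f}, estimated P-value = {p_value:.3f}")
```

Runs of this sketch produce a standard deviation of about 3.4 and an estimated P-value in the vicinity of the 0.066 quoted above; the exact values depend on the seed.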
AP Statistics – 12.1 Notes
Inference for Linear Regression
Regression Line: A line that describes how a response variable y changes as an explanatory variable x changes.
Correlation (r): A number between –1 and 1 that measures the direction and strength of the linear relationship
between two variables.
Coefficient of Determination ($r^2$): The proportion of the variation in the values of y that is explained by the
least-squares regression line.
Residuals: The differences between the observed value of the response variable and the value predicted by the
regression equation: residual $= y - \hat{y}$.
Residual Plot: A plot of the explanatory variable (x) vs. the residuals.
If we have all the data for a population, we can calculate the true regression line: $\mu_y = \alpha + \beta x$.
Regression line based on the entire population (all eruptions of Old Faithful in a month)
What can we do if we have a sample? Can a regression line based on a sample tell us anything about the true
regression line of the population?
Regression lines based on samples of 20 eruptions from that month.
Notice how the slope of the regression line is different for each sample, even though they all come from the
same population.
Population Regression Equation: $\mu_y = 33.97 + 10.36x$
Conditions for Regression Inference (LINER): Suppose we have n observations on an explanatory variable x and a response variable y. Our goal is to study or predict the behavior of y for given values of x.
• Linear: The actual relationship between x and y is linear. The mean values of y for each value of x line up along the population (true) regression line $\mu_y = \alpha + \beta x$.
• Independent: Individual observations are independent of each other (or the 10% condition is met).
• Normal: For each value of x, the y-values are Normally distributed around the regression line.
• Equal Variance: The standard deviation of y, called $\sigma$, is the same for all values of x.
• Random: The data come from a well-designed random sample or randomized experiment.
How to Check Conditions:
• Linear: Look at the scatterplot to make sure the overall pattern is roughly linear. Make sure there are no
curved patterns in the residual plot. Check to see that the residuals appear randomly scattered and are
centered around the “residual = 0” line.
• Independent: Look at how the data were produced. If sampling is done without replacement, check the 10% condition. If the study is a randomized experiment, good design with proper controls and random assignment helps ensure the independence of individual observations.
• Normal: Make a stemplot, histogram, or Normal probability plot of the residuals and check to make
sure there isn’t extreme skewness, outliers, or other major departures from Normality.
• Equal Variance: Look at the scatter of the residuals above and below the “residual = 0” line in the
residual plot. The amount of scatter should be roughly the same from the smallest to the largest x-value.
Make sure you don’t see a “fan” pattern – a tight cluster at one end of the graph and a spread out pattern
at the other end.
• Random: The data come from a well-designed random sample or randomized experiment.
Example: Many people believe that students learn better if they sit closer to the front of the classroom. Does
sitting closer cause higher achievement or do better students simply choose to sit nearer to the front? To
investigate, an AP Statistics teacher randomly assigned students to seat locations in his classroom for a
particular chapter and recorded the test score for each student at the end of the chapter. The explanatory variable
in this experiment is which row the student was assigned to (Row 1 is closest to the front and Row 7 is farthest
away). Here are the results.
Row 1: 76, 77, 94, 99
Row 2: 83, 85, 74, 77
Row 3: 90, 88, 68, 78
Row 4: 94, 72, 101, 70, 79
Row 5: 76, 65, 90, 67, 96
Row 6: 88, 79, 90, 83
Row 7: 79, 76, 77, 63
Predictor   Coef      SE Coef   T       P
Constant    85.706    4.239     20.22   0.000
Row         -1.1171   0.9472    -1.18   0.248
S = 10.0673   R-Sq = 4.7%   R-Sq(adj) = 1.3%
a) What is the equation of the least-squares regression line?
b) Interpret the slope of the least-squares regression line in this context.
c) Interpret the value of r² in this context.
d) Why was it important to randomly assign the students to seats rather than letting each student choose
where to sit?
e) Check to see if the conditions for inference are met. A residual plot and a histogram of the residuals are
shown below.
[Figures: residual plot (Residual vs. Row), histogram of the residuals (Frequency vs. Residual), and scatterplot (Score vs. Row)]
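As a quick arithmetic check for parts (a) and (c), here is a minimal Python sketch that plugs the coefficients from the output into ŷ = a + bx and recovers r from R-Sq, using the fact that r has the same sign as the slope. The variable and function names are illustrative, not part of the original example.

```python
import math

# Coefficients copied from the computer output for the seat-location example
a = 85.706      # Constant (y-intercept)
b = -1.1171     # Row (slope)
r_sq = 0.047    # R-Sq = 4.7%


def predicted_score(row):
    # Least-squares line: predicted score = 85.706 - 1.1171 * row
    return a + b * row


# r takes the sign of the slope, so here it is the negative square root of R-Sq
r = math.copysign(math.sqrt(r_sq), b)

for row in (1, 7):
    print(f"Row {row}: predicted score = {predicted_score(row):.2f}")
print(f"r = {r:.3f}")
```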
Parameters for the True Regression Line: $\mu_y = \alpha + \beta x$
• α is the true y-intercept.
• β is the true slope.
• σ is the standard deviation. It describes the variability of the response y about the population (true)
regression line. In other words, it measures how tightly the observations are packed around the line.
Estimating α and β: In the regression line for a sample, $\hat{y} = a + bx$, the slope b is an unbiased estimator of
the true slope β, and the intercept a is an unbiased estimator of the true intercept α.
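To see what "unbiased" means here, a minimal simulation sketch: it assumes a hypothetical true line (α = 10, β = 2, σ = 3, x = 1 to 20, Normal errors), draws many samples, fits each one by least squares, and averages the estimates. Those parameter values are made up for illustration and do not come from any example in these notes.

```python
import random
import statistics

# Hypothetical "true" population line, chosen only for illustration
ALPHA, BETA, SIGMA = 10.0, 2.0, 3.0
x = list(range(1, 21))  # n = 20 fixed x-values


def least_squares(x, y):
    # Return (a, b) for the least-squares line y-hat = a + b*x
    x_bar, y_bar = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b = sxy / sxx
    return y_bar - b * x_bar, b


random.seed(1)
intercepts, slopes = [], []
for _ in range(2000):
    # Each response = true mean (alpha + beta*x) plus Normal error with SD sigma
    y = [ALPHA + BETA * xi + random.gauss(0, SIGMA) for xi in x]
    a, b = least_squares(x, y)
    intercepts.append(a)
    slopes.append(b)

print(f"mean of a: {statistics.fmean(intercepts):.3f} (alpha = {ALPHA})")
print(f"mean of b: {statistics.fmean(slopes):.3f} (beta = {BETA})")
```

Individual estimates vary from sample to sample, but their averages should land very close to α and β.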
Estimating σ: We are often interested in how tightly the data are clustered around the regression line. Since σ
is unknown, we use s, which is the standard deviation of the residuals. Remember that s can be interpreted as
the typical prediction error, or the typical or average distance of the observed values from the predicted values.
$$s = \sqrt{\frac{\sum \text{residuals}^2}{n-2}} = \sqrt{\frac{\sum (y - \hat{y})^2}{n-2}}$$
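The formula for s can be checked on the carnation data from the sugar example later in these notes. This sketch fits the least-squares line, computes the residuals, and divides the sum of squared residuals by n − 2; its result should match S = 7.52596 in that example's output. The variable names are illustrative.

```python
import math

# Carnation data from the sugar example in these notes
sugar = [0, 0, 0, 1, 1, 1, 2, 2, 2, 3, 3, 3]
fresh = [168, 180, 192, 192, 204, 204, 204, 210, 210, 222, 228, 234]
n = len(sugar)

# Least-squares slope and intercept
x_bar = sum(sugar) / n
y_bar = sum(fresh) / n
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(sugar, fresh))
sxx = sum((x - x_bar) ** 2 for x in sugar)
b = sxy / sxx
a = y_bar - b * x_bar

# Residual = observed y minus predicted y-hat
residuals = [y - (a + b * x) for x, y in zip(sugar, fresh)]

# s divides the sum of squared residuals by n - 2, not n - 1
s = math.sqrt(sum(e ** 2 for e in residuals) / (n - 2))
print(f"a = {a:.3f}, b = {b:.3f}, s = {s:.5f}")
```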
Inference about the True Slope, β
Usually, the most important parameter in a regression problem is the true slope, β. The slope tells us how much
y is predicted to change, on average, each time x changes by 1 unit.
The standard error of the slope ($SE_b$) is the standard deviation of the
sampling distribution of b – the standard deviation of the slopes of regression
lines formed by taking repeated samples of the same size from the population.
It measures how much the slopes of the sample regression lines from repeated
samples typically vary from the slope of the population regression line.
The graph at the left shows the slopes of the regression lines for 1000 samples
of size n = 20 from the Old Faithful data.
Normally, we get the value of $SE_b$ from computer output, but the formula is $SE_b = \frac{s}{s_x\sqrt{n-1}}$.
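Plugging the carnation example's numbers into this formula is a useful check. The sketch below takes s from that example's output and the sugar amounts from its data table; `statistics.stdev` computes s_x with the n − 1 divisor, which is what the formula expects.

```python
import math
import statistics

# Values from the sugar/carnation example in these notes
s = 7.52596                                   # S from the output
sugar = [0, 0, 0, 1, 1, 1, 2, 2, 2, 3, 3, 3]  # explanatory variable x
n = len(sugar)

s_x = statistics.stdev(sugar)                 # sample SD of x (divides by n - 1)
se_b = s / (s_x * math.sqrt(n - 1))
print(f"SE_b = {se_b:.3f}")                   # compare with SE Coef for Sugar in the output
```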
• If β = 0, that means the mean of y does not change at all when x changes. In other words, it means there
is no true linear relationship between x and y.
• When data from a random sample or a randomized experiment suggest that there is an association
between two variables, there are two possible explanations for why the sample slope differs from 0. We do
inference to decide which explanation seems more plausible.
o Explanation 1: There really is no association between the variables, and we got a nonzero slope
because of sampling variability or the chance variation from random assignment.
o Explanation 2: There really is an association between the two variables.
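Explanation 1 can be made concrete with a simulation. This sketch assumes, purely for illustration, that the true slope is 0 in the seat-location setting: it keeps that example's row assignments, uses a hypothetical true mean score of 81, and draws Normal errors with standard deviation 10.07 (close to S in its output). It then counts how often chance alone produces a sample slope at least as far from 0 as the observed b = −1.1171.

```python
import random
import statistics

# Hypothetical setup: same rows as the seat-location example, but the TRUE
# slope is assumed to be 0 (Explanation 1). Mean and SD are illustrative.
rows = [1] * 4 + [2] * 4 + [3] * 4 + [4] * 5 + [5] * 5 + [6] * 4 + [7] * 4
TRUE_MEAN, SIGMA = 81.0, 10.07


def slope(x, y):
    # Least-squares slope b = Sxy / Sxx
    x_bar, y_bar = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    return sxy / sxx


random.seed(2)
slopes = [slope(rows, [TRUE_MEAN + random.gauss(0, SIGMA) for _ in rows])
          for _ in range(5000)]

# How often does chance alone give a slope at least as far from 0 as -1.1171?
frac = sum(abs(b) >= 1.1171 for b in slopes) / len(slopes)
print(f"mean slope = {statistics.fmean(slopes):.3f}")
print(f"fraction with |b| >= 1.1171: {frac:.3f}")
```

Because this simulation treats σ as known, the fraction it reports is only a rough analogue of the P-value 0.248 in the output (which uses the t distribution), but the two should be in the same neighborhood.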
Inference about the true slope, β, involves a t curve with n − 2 degrees of freedom.
Confidence Interval for the True Slope, β:
statistic ± (critical value)(standard error of the statistic)
$b \pm t^* SE_b$
Use the t curve with n − 2 degrees of freedom.
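Calculators and t tables give t* directly. To show where it comes from, this sketch (standard library only) approximates the t distribution's CDF by integrating its density with Simpson's rule, then bisects for the value that leaves the desired central area. The helper names `t_pdf`, `t_cdf`, and `t_star` are hypothetical, not from any particular library.

```python
import math


def t_pdf(x, df):
    # Density of Student's t distribution with df degrees of freedom
    log_const = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                 - 0.5 * math.log(df * math.pi))
    return math.exp(log_const) * (1 + x * x / df) ** (-(df + 1) / 2)


def t_cdf(x, df, steps=2000):
    # P(T <= x) = 0.5 + (area under the density from 0 to x);
    # the area uses composite Simpson's rule (steps must be even)
    h = x / steps
    total = t_pdf(0.0, df) + t_pdf(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 == 1 else 2) * t_pdf(i * h, df)
    return 0.5 + total * h / 3


def t_star(confidence, df):
    # Find t* with P(-t* <= T <= t*) = confidence by bisection
    target = (1 + confidence) / 2
    lo, hi = 0.0, 100.0
    for _ in range(50):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


print(f"95%, df = 28: t* = {t_star(0.95, 28):.3f}")
print(f"99%, df = 10: t* = {t_star(0.99, 10):.3f}")
```

These two values should agree with the table values 2.048 and 3.169 used in the examples that follow.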
Example: Here is the computer output from the previous example (test score vs. row #):
Predictor   Coef      SE Coef   T       P
Constant    85.706    4.239     20.22   0.000
Row         -1.1171   0.9472    -1.18   0.248
S = 10.0673   R-Sq = 4.7%   R-Sq(adj) = 1.3%
a) Identify the standard error of the slope, $SE_b$, from the computer output. Interpret this value in context.
b) Calculate a 95% confidence interval for the true slope. Show your work. Interpret the interval in context.
c) Based on your interval, is there convincing evidence that seat location affects scores?
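For part (b), a minimal sketch of the interval arithmetic, assuming n = 30 (the number of scores listed in the data) and the table value t* = 2.048 for 95% confidence with 28 degrees of freedom.

```python
# Seat-location output: slope b and its standard error SE_b
b, se_b = -1.1171, 0.9472
n = 30                      # 4 + 4 + 4 + 5 + 5 + 4 + 4 students
df = n - 2                  # 28 degrees of freedom
t_star = 2.048              # t* for 95% confidence, df = 28 (from a t table)

margin = t_star * se_b
lower, upper = b - margin, b + margin
print(f"95% CI for beta: ({lower:.3f}, {upper:.3f})")
print("0 is in the interval" if lower <= 0 <= upper else "0 is NOT in the interval")
```

Because 0 is a plausible value for β, the interval does not by itself give convincing evidence that seat location affects scores.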
Example: For their second-semester project, two AP Statistics students decided to investigate the effect of
sugar on the life of cut flowers. They went to the local grocery store and randomly selected 12 carnations. All
the carnations seemed equally healthy when they were selected. When the students got home, they prepared 12
identical vases with exactly the same amount of water in each vase. They put one tablespoon of sugar in 3
vases, two tablespoons of sugar in 3 vases, and three tablespoons of sugar in 3 vases. In the remaining 3 vases, they
put no sugar. After the vases were prepared and placed in the same location, the students randomly assigned one
flower to each vase and observed how many hours each flower continued to look fresh. Here are the data along
with computer output from a least-squares regression analysis:
Predictor       Coef      SE Coef   T       P
Constant        181.200   3.635     49.84   0.000
Sugar (Tbsp.)   15.200    1.943     7.82    0.000
S = 7.52596   R-Sq = 86.0%   R-Sq(adj) = 84.5%
Sugar (Tbsp.) Freshness (hrs.)
0 168
0 180
0 192
1 192
1 204
1 204
2 204
2 210
2 210
3 222
3 228
3 234
[Figures: scatterplot (Freshness (hrs.) vs. Sugar (Tbsp.)), residual plot (Residual vs. Sugar (Tbsp.)), and histogram of the residuals (Frequency vs. Residual)]
a) Construct and interpret a 99% confidence interval for the slope of the true regression line.
b) Would you feel confident predicting the hours of freshness if 10 tablespoons of sugar are used? Explain.
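For part (a), the interval arithmetic with the output's b and SE_b, assuming the table value t* = 3.169 for 99% confidence and 10 degrees of freedom. The last lines flag part (b) as extrapolation, because 10 tablespoons lies outside the 0 to 3 tablespoon range in the data.

```python
# Sugar/carnation output: slope b and SE_b, with n = 12 flowers
b, se_b = 15.200, 1.943
df = 12 - 2                 # 10 degrees of freedom
t_star = 3.169              # t* for 99% confidence, df = 10 (from a t table)

lower, upper = b - t_star * se_b, b + t_star * se_b
print(f"99% CI for beta: ({lower:.2f}, {upper:.2f}) hours per tablespoon")

# Part (b): the data only cover 0 to 3 tablespoons of sugar
sugar_used = [0, 1, 2, 3]
x_new = 10
is_extrapolation = not (min(sugar_used) <= x_new <= max(sugar_used))
print("extrapolation" if is_extrapolation else "within the range of the data")
```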
Significance Tests about the True Slope, β:
Hypotheses:
$H_0: \beta = 0$ (There is no true linear relationship between x and y.)
$H_a: \beta > 0$ (There is a positive correlation – y increases as x increases)
-or- $\beta < 0$ (There is a negative correlation – y decreases as x increases)
-or- $\beta \neq 0$ (There is a linear relationship – y changes when x changes)
Test Statistic: $t = \frac{b}{SE_b}$ with n − 2 degrees of freedom.
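P-value: Assuming $H_0$ is true, the probability of getting a t statistic as large as or larger than the observed one in the
direction specified by the alternative hypothesis (for a two-sided test, at least as far from 0 in either direction).
• The P-value given by computer output is for a two-sided test. If you are doing a one-sided test and the
sample slope b is in the direction specified by $H_a$, divide it by two.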
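A sketch of the test statistic and P-value computation, using a hypothetical Simpson's-rule CDF for the t distribution (standard library only) so the block stands alone. It is applied to the buffet output in the next example (b = 0.03013, SE_b = 0.02448, n = 12 receipts, so df = 10): the two-sided P-value should land near the 0.247 printed by the software, and the one-sided P-value is half of it because b is positive. The function names are illustrative.

```python
import math


def t_pdf(x, df):
    # Density of Student's t distribution with df degrees of freedom
    log_const = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                 - 0.5 * math.log(df * math.pi))
    return math.exp(log_const) * (1 + x * x / df) ** (-(df + 1) / 2)


def t_cdf(x, df, steps=2000):
    # P(T <= x) via composite Simpson's rule on [0, x] (steps must be even)
    h = x / steps
    total = t_pdf(0.0, df) + t_pdf(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 == 1 else 2) * t_pdf(i * h, df)
    return 0.5 + total * h / 3


def slope_t_test(b, se_b, df, alternative="two-sided"):
    # Test statistic for H0: beta = 0
    t = b / se_b
    if alternative == "two-sided":       # Ha: beta != 0
        p = 2 * (1 - t_cdf(abs(t), df))
    elif alternative == "greater":       # Ha: beta > 0
        p = 1 - t_cdf(t, df)
    elif alternative == "less":          # Ha: beta < 0
        p = t_cdf(t, df)
    else:
        raise ValueError("alternative must be 'two-sided', 'greater', or 'less'")
    return t, p


# Buffet-tip output: b = 0.03013, SE_b = 0.02448, n = 12 receipts
t, p_two = slope_t_test(0.03013, 0.02448, df=10)
_, p_one = slope_t_test(0.03013, 0.02448, df=10, alternative="greater")
print(f"t = {t:.2f}, two-sided P = {p_two:.3f}, one-sided P = {p_one:.3f}")
```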
Example: Do customers who stay longer at buffets give larger tips? An AP Statistics student who worked at an
Asian buffet decided to investigate this question. While doing her job as a hostess, she obtained a random
sample of receipts, which included the length of time (in minutes) the party was in the restaurant and the
amount of the tip (in dollars). Do these data provide convincing evidence that customers who stay longer give
larger tips? Here are the data and computer output.
Predictor     Coef      SE Coef   T      P
Constant      4.535     1.657     2.74   0.021
Time (min.)   0.03013   0.02448   1.23   0.247
S = 1.77931   R-Sq = 13.2%   R-Sq(adj) = 4.5%
a) Describe what the scatterplot tells you about the relationship between the two variables.
b) What is the equation of the least-squares regression line for predicting the amount of the tip from the
length of the stay? Define any variables you use.
c) Interpret the slope and the y-intercept of the least-squares regression line in context.
d) Carry out an appropriate test to determine whether the data provide convincing evidence that customers
who stay longer give larger tips. Assume the conditions for inference have been met.
Time (min.) Tip ($)
23 5.00
39 2.75
44 7.75
55 5.00
61 7.00
65 8.88
67 9.01
70 5.00
74 7.29
85 7.50
90 6.00
99 6.50
[Figures: scatterplot (Tip ($) vs. Time (min.)), residual plot (Residual vs. Time (min.)), and histogram of the residuals (Frequency vs. Residual)]
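For part (b), the least-squares line can be recomputed from the data table. This sketch uses only the listed times and tips; its coefficients should reproduce the output (Constant 4.535, slope 0.03013). The variable names are illustrative.

```python
# Buffet data: length of stay (minutes) and tip (dollars) for 12 receipts
minutes = [23, 39, 44, 55, 61, 65, 67, 70, 74, 85, 90, 99]
tips = [5.00, 2.75, 7.75, 5.00, 7.00, 8.88, 9.01, 5.00, 7.29, 7.50, 6.00, 6.50]

n = len(minutes)
x_bar, y_bar = sum(minutes) / n, sum(tips) / n
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(minutes, tips))
sxx = sum((x - x_bar) ** 2 for x in minutes)

b = sxy / sxx             # predicted change in tip per extra minute
a = y_bar - b * x_bar     # predicted tip at 0 minutes (outside the data: extrapolation)
print(f"predicted tip = {a:.3f} + {b:.5f} * (time in minutes)")
```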
Example: A random sample of 11 used Honda CR-Vs from the 2002-2006 model years was selected from the
inventory at www.carmax.com. The number of miles driven and the advertised price were recorded for each
CR-V. A 95% confidence interval for the slope of the true least-squares regression line for predicting advertised
price from number of miles (in thousands) driven is (−122.3, −50.1). Based on this interval, what conclusion
should we draw from a test of $H_0: \beta = 0$ versus $H_a: \beta \neq 0$ at the $\alpha = 0.05$ significance level?
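The link between confidence intervals and two-sided tests can be written as a one-line check. This sketch also recovers the sample slope b as the interval's midpoint, since the interval has the form b ± t*·SE_b. Variable names are illustrative.

```python
# 95% confidence interval for the true slope (dollars per thousand miles)
lower, upper = -122.3, -50.1
alpha = 0.05

# A two-sided test at level alpha rejects H0: beta = 0 exactly when the
# (1 - alpha) confidence interval does not contain 0
reject_h0 = not (lower <= 0 <= upper)
b = (lower + upper) / 2   # the interval is b +/- margin, so b is its midpoint
print(f"b = {b:.1f}, reject H0 at alpha = {alpha}: {reject_h0}")
```

Since 0 is not in the interval, we would reject $H_0$ at the α = 0.05 level.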