gwendolyn-caldwell
Lecture 3 Preview: Interval Estimates and the Central Limit Theorem
Review
Populations, Samples, Estimation Procedures, and the Estimate’s Probability Distribution
Why Is the Mean of the Estimate’s Probability Distribution Important?
Why Is the Variance of the Estimate’s Probability Distribution Important?
Normal Distribution: A Way to Estimate Probabilities
Relative Frequency Interpretation of Probability
Random Variables
Clint’s Dilemma and His Opinion Poll
Interval Estimates
Central Limit Theorem
Properties of the Normal Distribution
Using the Normal Distribution Table: An Example
Justifying the Use of the Normal Distribution
Normal Distribution’s Rules of Thumb
Mean and Variance of the Estimate’s Probability Distribution for a Sample Size of T
Review
Populations, Samples, and Estimation Procedures
Question: How can we use sample information to draw inferences about a population?
Random Variables: Before the experiment is conducted:
Bad news. What we do not know: We cannot determine the numerical value of the random variable with certainty before the experiment is conducted.
Good news. What we do know: On the other hand, we can often calculate the random variable’s probability distribution telling us how likely it is for the random variable to equal each of its possible numerical values.
Relative Frequency Interpretation of Probability: After many, many repetitions of the experiment the distribution of the numerical values from the experiment mirrors the random variable’s probability distribution.
The mean reflects the center of the distribution. The variance reflects the spread of the distribution.
An example, Clint’s poll: 12 of the 16 individuals polled support Clint. EstFrac = 12/16 = .75
Question: Does this poll definitely prove that Clint is ahead?
Answer: No. It is possible for 12 (or more) individuals to support Clint in one poll even when the election is a toss-up.
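To put a number on "it is possible", here is a minimal sketch that computes the chance of seeing 12 or more supporters among 16 polled when the election is a toss-up. It assumes independent draws with replacement and reads "toss-up" as an actual fraction of p = .50; the function name `prob_at_least` is illustrative, not part of the lecture.

```python
from math import comb

def prob_at_least(k_min, T, p):
    """Probability that at least k_min of T independent draws are supporters,
    when each draw is a supporter with probability p."""
    return sum(comb(T, k) * p**k * (1 - p)**(T - k) for k in range(k_min, T + 1))

if __name__ == "__main__":
    # Toss-up election (assumed p = .50), 16 individuals polled, 12 or more support Clint
    print(round(prob_at_least(12, 16, 0.5), 4))
```

Under these assumptions the probability is small but not zero (a little under 4 percent), which is why one poll cannot definitely prove that Clint is ahead.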
Question: How do we describe a distribution?
Answer: Center (Mean) and Spread (Variance)
[Diagram: Distribution of the numerical values, after many, many repetitions, mirrors the probability distribution]
Opinion Poll: Sample Size Equals T
Write the name of every individual in the population on a card.
Perform the following procedure T times, where T = Sample Size:
Thoroughly shuffle the cards.
Randomly draw one card.
Ask that individual if he/she supports Clint; the answer determines the numerical value of vt: vt equals 1 if the tth individual polled supports Clint; 0 otherwise.
Replace the card.
Calculate the fraction of those polled supporting Clint:
EstFrac = (v1 + v2 + … + vT)/T
The estimated fraction, EstFrac, is a random variable.
Question: What do we know about the vt’s?
From our last class (Sample Size of 2): Mean[v1] = Mean[v2] = p
Mean[vt] = p for each t; that is, Mean[v1] = Mean[v2] = … = Mean[vT] = p
From our last class (Sample Size of 2): Var[v1] = Var[v2] = p(1 − p)
Var[vt] = p(1 − p) for each t; that is, Var[v1] = Var[v2] = … = Var[vT] = p(1 − p)
From our last class (Sample Size of 2): v1 and v2 are independent; their covariance equals 0
The vt’s are independent; hence, their covariances equal 0.
where p = ActFrac = Actual fraction of the population supporting Clint
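The polling procedure can be sketched in a few lines. This is an illustration under stated assumptions, not the lecture's lab software: the population is represented as a hypothetical list of 0/1 "cards" (1 = supports Clint), and `run_poll` is an illustrative name.

```python
import random

def run_poll(population, T, rng):
    """Conduct one poll of sample size T.

    population: list of 0/1 cards, 1 meaning the individual supports Clint.
    Each draw picks one card at random and then replaces it, so the same
    individual can be polled more than once.
    Returns the list of v_t values and the estimated fraction, EstFrac.
    """
    v = [rng.choice(population) for _ in range(T)]
    est_frac = sum(v) / T
    return v, est_frac

if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical population of 1,000 individuals with ActFrac = .50
    population = [1] * 500 + [0] * 500
    v, est_frac = run_poll(population, 16, rng)
    print(v, est_frac)
```

Rerunning with a different seed gives a different numerical value of EstFrac, which is the sense in which EstFrac is a random variable.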
Distribution Center: Mean of the Estimate’s Probability Distribution
Arithmetic of means: Mean[cx] = cMean[x] and Mean[x + y] = Mean[x] + Mean[y]
What we know: Mean[v1] = Mean[v2] = … = Mean[vT] = p, where p = ActFrac = Actual fraction of the population supporting Clint
Mean[EstFrac] = Mean[(1/T)(v1 + v2 + … + vT)]
= (1/T)Mean[v1 + v2 + … + vT], since Mean[cx] = cMean[x]
= (1/T)(Mean[v1] + Mean[v2] + … + Mean[vT]), since Mean[x + y] = Mean[x] + Mean[y]
= (1/T)(p + p + … + p), since Mean[v1] = Mean[v2] = … = Mean[vT] = p
How many p terms are there? T
= (1/T)(T × p)
= p
Distribution Spread: Variance of the Estimate’s Probability Distribution
Arithmetic of variances: Var[cx] = c²Var[x] and Var[x + y] = Var[x] + 2Cov[x, y] + Var[y]
The vt’s are independent; hence, all their covariances equal 0 and Var[x + y] = Var[x] + Var[y]
What we know: Var[v1] = Var[v2] = … = Var[vT] = p(1 − p), where p = ActFrac = Actual fraction of the population supporting Clint
Var[EstFrac] = Var[(1/T)(v1 + v2 + … + vT)]
= (1/T²)Var[v1 + v2 + … + vT], since Var[cx] = c²Var[x]
= (1/T²)(Var[v1] + Var[v2] + … + Var[vT]), since Var[x + y] = Var[x] + Var[y]
= (1/T²)(p(1 − p) + p(1 − p) + … + p(1 − p)), since Var[v1] = Var[v2] = … = Var[vT] = p(1 − p)
How many p(1 − p) terms are there? T
= (1/T²)(T × p(1 − p))
Summary: Var[EstFrac] = p(1 − p)/T
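The two results, Mean[EstFrac] = p and Var[EstFrac] = p(1 − p)/T, can also be checked exactly rather than by simulation. The sketch below relies on a standard fact not stated on the slides: the number of supporters in T independent draws follows a binomial distribution. The function names are illustrative.

```python
from math import comb

def est_frac_distribution(T, p):
    """Probability distribution of EstFrac: maps each possible value k/T
    to the binomial probability of drawing exactly k supporters in T draws."""
    return {k / T: comb(T, k) * p**k * (1 - p)**(T - k) for k in range(T + 1)}

def mean_and_variance(dist):
    """Mean and variance of a discrete probability distribution {value: probability}."""
    mean = sum(x * pr for x, pr in dist.items())
    var = sum((x - mean) ** 2 * pr for x, pr in dist.items())
    return mean, var

if __name__ == "__main__":
    p = 0.5
    for T in (1, 2, 25, 100, 400):
        mean, var = mean_and_variance(est_frac_distribution(T, p))
        # Last column is the formula p(1 - p)/T for comparison
        print(T, round(mean, 6), round(var, 6), p * (1 - p) / T)
```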
Simulations: Confirming the equations.
Mean[EstFrac] = ActFrac = p
Var[EstFrac] = p(1 − p)/T

          Mean of       Variance of                    Mean (Average) of       Variance of
          EstFrac’s     EstFrac’s                      Numerical Values of     Numerical Values of
Sample    Prob Dist     Prob Dist       Simulation     EstFrac from            EstFrac from
Size      (p)           (p(1 − p)/T)    Repetitions    the Experiments         the Experiments
1         .50           .25             >1,000,000     .50                     .25
2         .50           .125            >1,000,000     .50                     .125
25        .50           .01             >1,000,000     .50                     .01
100       .50           .0025           >1,000,000     .50                     .0025
400       .50           .000625         >1,000,000     .50                     .000625
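A simulation along these lines can be sketched as follows. It is not the lab software used in the lecture: it draws each vt directly with probability p instead of shuffling cards, uses far fewer repetitions than the lab so that it runs quickly, and `simulate_est_fracs` is an illustrative name.

```python
import random
from statistics import fmean, pvariance

def simulate_est_fracs(T, p, repetitions, rng):
    """Repeat the poll many times; return the numerical value of EstFrac
    from each repetition. Each of the T draws is a supporter with probability p."""
    return [sum(rng.random() < p for _ in range(T)) / T for _ in range(repetitions)]

if __name__ == "__main__":
    rng = random.Random(12345)
    p = 0.5
    for T in (1, 2, 25, 100, 400):
        values = simulate_est_fracs(T, p, 5_000, rng)
        # Compare the simulated mean and variance with p and p(1 - p)/T
        print(T, round(fmean(values), 4), round(pvariance(values), 6), p * (1 - p) / T)
```

With only a few thousand repetitions the simulated mean and variance will be close to, but not exactly equal to, the values from the equations; the match tightens as the number of repetitions grows.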
Two Questions
Why is the distribution center (mean) important?
Why is the distribution spread (variance) important?
Relative Frequency Interpretation of Probability: After many, many repetitions of the experiment, the distribution of the actual numerical values mirrors the probability distribution of the random variable. Both distributions have the same mean and variance.
Lab 3.1
Question: Why is the mean of the estimate’s probability distribution important?
A mean describes the center of its probability distribution. More specifically, Mean[EstFrac] = ActFrac. Why is this important?
Unbiased Estimation Procedure
Formally, an estimation procedure is unbiased whenever the mean of the estimated fraction’s probability distribution equals the actual population fraction: Mean[EstFrac] = ActFrac
So, we have already shown that Clint’s estimation procedure is unbiased.
Conceptually, an estimation procedure is unbiased whenever it does not systematically underestimate or overestimate the actual population fraction.
Relative Frequency Interpretation of Probability:
Average of the estimate’s numerical values after many, many repetitions = Mean[EstFrac] = ActFrac
Now we have some intuition.
If the probability distribution is symmetric, we have even more intuition. In one poll, the chances that the estimated fraction is too low equal the chances that the estimated fraction is too high.
[Figure: Probability distribution of EstFrac, centered at Mean[EstFrac] = ActFrac; horizontal axis: EstFrac]
Lab 3.2
Question: Why is the variance of the estimate’s probability distribution important when the estimation procedure is unbiased?
Claim: When the estimation procedure is unbiased, the reliability of the estimated fraction depends on the variance of the estimated fraction’s probability distribution.
Interval Estimate Question: What is the probability that the estimated fraction from a single poll lies close to the actual value?
Small probability: Estimate is unreliable
Large probability: Estimate is reliable
Decide on a close to criterion: .05
Population Fraction = ActFrac = p = .50
Question: After many, many repetitions, how frequently is the estimated fraction close to, within .05 of, the actual population fraction?
Simulations:
                                           Percent of Repetitions in which
Sample    Variance of        Simulation    the Numerical Value of EstFrac
Size      EstFrac            Repetitions   Lies between .45 and .55
25        .01                >1,000,000    39%
100       .0025              >1,000,000    69%
400       .000625            >1,000,000    95%
Lab 3.3
Quantifying Reliability:
Strategy: Run a simulation and apply the relative frequency interpretation of probability.
Interval Estimate Question: What is the probability that the estimated fraction from a single poll lies close to, within .05 of, the actual value?
                                   Probability that the Numerical Value
Sample    Variance of EstFrac’s    of EstFrac Lies between .45 and .55
Size      Probability Distribution in a Single Poll (One Repetition)
25        .01                      .39
100       .0025                    .69
400       .000625                  .95
Interval Estimate Question: What is the probability that the numerical value of the estimated fraction from one repetition of the experiment lies close to, within .05 of, the actual population fraction?
ActFrac = .50
Simulations:
                                                  Percent of Repetitions in which
Sample    Variance of EstFrac’s     Simulation    the Numerical Value of EstFrac
Size      Probability Distribution  Repetitions   Lies between .45 and .55
25        .01                       >1,000,000    39%
100       .0025                     >1,000,000    69%
400       .000625                   >1,000,000    95%
Relative Frequency Interpretation of Probability: After many, many repetitions of the experiment, the distribution of the numerical values mirrors the probability distribution.
How can we use the simulation results to answer the interval estimate question?
The portion of estimates that lie within .05 of the actual value, between .45 and .55, after many, many repetitions
equals
the probability that the estimate lies within .05 of the actual value, between .45 and .55, in a single poll (one repetition).
Reconsider the interval estimate question:
Sample    Variance of EstFrac’s     In a Single Poll (One Repetition):
Size      Probability Distribution  Prob[.45 ≤ Numerical Value ≤ .55]
25        .01                       .39
100       .0025                     .69
400       .000625                   .95
Variance large: Small probability that the numerical value of the estimated fraction, EstFrac, from one repetition of the experiment will be close to the actual population fraction, ActFrac. Estimate is unreliable.
Variance small: Large probability that the numerical value of the estimated fraction, EstFrac, from one repetition of the experiment will be close to the actual population fraction, ActFrac. Estimate is reliable.
[Figure: Probability distributions of EstFrac for a large variance and a small variance, each centered at Mean[EstFrac] = ActFrac; horizontal axis: EstFrac]
Summary: When the estimation procedure is unbiased, the variance
tells us how reliable the estimate is.
Generalizing, when an estimation procedure is unbiased:
[Figure: Probability distributions of EstFrac for Sample Size = T = 25, T = 100, and T = 400, each with Mean[EstFrac] = p]
Strategy for Motivating and Illustrating the Central Limit Theorem: Four Steps
Central Limit Theorem Motivation: Role of the Standard Deviation
Central Limit Theorem: As the sample size becomes larger and larger, we can use the normal distribution to calculate better and better approximations of interval estimates.
Step 1: Use the equations to calculate the mean, variance, and standard deviation of EstFrac’s probability distribution for three sample sizes, 25, 100, and 400.
Step 2: Use simulations to calculate the percent of repetitions that fall within 1, 2, and 3 standard deviations of Mean[EstFrac], the mean of EstFrac’s probability distribution.
Step 3: Observe an interesting similarity.
Step 4: Introduce the normal distribution and use it to calculate the percent of repetitions that fall within 1, 2, and 3 standard deviations of Mean[EstFrac].
Summary of Mean and SD Calculations
Sample Size      25      100     400
Mean[EstFrac]    .500    .500    .500
SD[EstFrac]      .100    .050    .025
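The Step 1 calculations follow directly from Var[EstFrac] = p(1 − p)/T, with the standard deviation being the square root of the variance. A minimal sketch, assuming p = .50 as in the lecture's example (the function names are illustrative):

```python
from math import sqrt

def est_frac_sd(T, p):
    """Standard deviation of EstFrac's probability distribution: sqrt(p(1 - p)/T)."""
    return sqrt(p * (1 - p) / T)

def sd_interval(T, p, k):
    """Interval lying within k standard deviations of Mean[EstFrac] = p."""
    sd = est_frac_sd(T, p)
    return p - k * sd, p + k * sd

if __name__ == "__main__":
    p = 0.5
    for T in (25, 100, 400):
        intervals = [tuple(round(x, 3) for x in sd_interval(T, p, k)) for k in (1, 2, 3)]
        print(T, round(est_frac_sd(T, p), 3), intervals)
```

Note that quadrupling the sample size halves the standard deviation, so each interval narrows by half.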
Sample Size                   25           100          400
Interval: 1 SD
  From-To Values              .400-.600    .450-.550    .475-.525
  Percent of Repetitions      69.2%        68.5%        68.3%
Interval: 2 SD’s
  From-To Values              .300-.700    .400-.600    .450-.550
  Percent of Repetitions      96.3%        95.6%        95.5%
Interval: 3 SD’s
  From-To Values              .200-.800    .350-.650    .425-.575
  Percent of Repetitions      99.9%        99.8%        99.7%
Question: What do these results suggest?
Central Limit Theorem Motivation: Role of the Standard Deviation
Central Limit Theorem: As the sample size becomes larger and larger, the normal distribution provides better and better approximations of interval estimates.
Step 2: Use simulations to calculate the percent of repetitions that fall within 1, 2, and 3 standard deviations of Mean[EstFrac], the mean of EstFrac’s probability distribution.
Step 3: Observe an interesting similarity.
Answer: The standard deviations, the SD’s, appear to be critical.
Lab 3.4
Normal Distribution: The Famed Bell-Shaped Curve
The variable z: the “normalized” value of the random variable.
z equals the number of standard deviations the value lies from the random variable’s mean:
z = (Value of random variable − Distribution mean)/(Distribution standard deviation)
Normal Distribution: Three Important Properties
The normal distribution is bell shaped.
The area beneath the normal curve equals 1.
The normal distribution is symmetric around its mean (center).
Normal Distribution Table
The number in the body of the table estimates the probability that the random variable lies more than z standard deviations above its mean.
The row specifies the z value’s whole number and its tenths.
The column specifies the z value’s hundredths.
For example, suppose that z = 1.53:
What is the probability that the random variable would lie more than 1.53 standard deviations above its mean?
Answer: .0630 (row 1.5, column 0.03)
[Figure: Normal distribution; the area to the right of z SD’s above the distribution mean is the probability of being more than z standard deviations above the distribution mean]
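The table lookup can be reproduced numerically. The sketch below is an alternative to the printed table, not the lecture's method: it uses the complementary error function from Python's standard library, and `upper_tail` is an illustrative name.

```python
from math import erfc, sqrt

def upper_tail(z):
    """Probability that a normally distributed random variable lies more than
    z standard deviations above its mean (the number in the body of the table)."""
    return 0.5 * erfc(z / sqrt(2))

if __name__ == "__main__":
    # Row 1.5, column 0.03 of the normal distribution table
    print(round(upper_tail(1.53), 4))
```

Because the normal distribution is symmetric, the same number is also the probability of lying more than z standard deviations below the mean.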
Normal Distribution Rules of Thumb
Standard Deviations within      Probability of
Random Variable’s Mean          being within
1                               .68
2                               .95
3                               >.99

Simulations: Percent of Repetitions within Interval
Interval: Standard Deviations        Sample Size                  Normal Distribution
within Random Variable’s Mean        25       100      400        Percentages
1                                    69.2%    68.5%    68.3%      68.26%
2                                    96.3%    95.6%    95.5%      95.44%
3                                    99.9%    99.8%    99.7%      99.74%

Normal distribution table excerpts:
z      0.00      0.01
0.9    0.1841    0.1814
1.0    0.1587    0.1562
1.1    0.1357    0.1335
1.9    0.0287    0.0281
2.0    0.0228    0.0222
2.1    0.0179    0.0174
2.9    0.0019    0.0018
3.0    0.0013    0.0013

The area beneath the normal curve equals 1. The normal distribution is symmetric around its mean (center). Hence:
Within 1 SD: 1 − (.1587 + .1587) = .6826
Within 2 SD’s: 1 − (.0228 + .0228) = .9544
Within 3 SD’s: 1 − (.0013 + .0013) = .9974
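The rules of thumb come from subtracting the two symmetric tails from the total area of 1. A short sketch of that calculation, using the standard library's complementary error function in place of the printed table (the function names are illustrative):

```python
from math import erfc, sqrt

def upper_tail(z):
    """Probability of lying more than z standard deviations above the mean."""
    return 0.5 * erfc(z / sqrt(2))

def prob_within(k):
    """Probability of lying within k standard deviations of the mean:
    total area 1 minus the two equal tails."""
    return 1 - (upper_tail(k) + upper_tail(k))

if __name__ == "__main__":
    for k in (1, 2, 3):
        print(k, round(prob_within(k), 4))
```

The results can differ from the table-based figures in the fourth decimal place because the table entries are themselves rounded to four decimals.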
Summary
Central Limit Theorem: As the sample size becomes larger and larger, we can use the normal distribution to calculate better and better approximations of interval estimates.
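As an illustration of the theorem's use, the sketch below applies the normal distribution to the interval estimate question: the probability that EstFrac lies within .05 of the actual fraction. It is an approximation under the assumption that EstFrac is roughly normal with mean p and variance p(1 − p)/T; `prob_est_frac_within` is an illustrative name.

```python
from math import erf, sqrt

def prob_est_frac_within(T, p, margin):
    """Normal approximation to the probability that EstFrac lies within
    `margin` of the actual fraction p, for a sample of size T."""
    sd = sqrt(p * (1 - p) / T)   # SD of EstFrac's probability distribution
    z = margin / sd              # margin expressed in standard deviations
    return erf(z / sqrt(2))      # area within z SD's of the mean

if __name__ == "__main__":
    for T in (25, 100, 400):
        print(T, round(prob_est_frac_within(T, 0.5, 0.05), 4))
```

For sample sizes of 25, 100, and 400 the margin of .05 corresponds to .5, 1, and 2 standard deviations, so the approximations land near the 39, 69, and 95 percent figures reported from the simulations.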
Revisiting Clint’s Dilemma
On the eve of the election, Clint must decide whether or not to hold a pre-election party:
If he is comfortably ahead, he will not hold the party; he will save his campaign funds for a future political endeavor (or a trip to Cancun).
If he is not comfortably ahead, he will hold the party hoping to capture more votes.
There is not enough time to canvass everyone, however. What should he do?
Econometrician’s Philosophy: If you lack the information to determine the value directly, estimate the value to the best of your ability using the information you do have.
Clint’s Estimation Procedure
Questionnaire: Are you voting for Clint?
Procedure: Clint selects 16 students at random.
Results: 12 students report that they will vote for Clint and 4 against Clint.
Estimated fraction of population supporting Clint = EstFrac = 12/16 = .75
Clint uses the information collected from the sample to draw inferences about the entire population. Seventy-five percent, .75, of the sample supports Clint.
This poll suggests that Clint leads.
Question: Should Clint be confident that he has the election in hand or should he fund the party?