Confidence Intervals I 3/25/2015 NYU Statistics for Social Research Spring 2015


Sample vs. Population: Example

Is the unemployment rate (UR) we know a population-based rate? Yes or no?

Current Population Survey (CPS)
- A monthly survey of a sample of about 50,000 adults regarding job-related activities, by the Bureau of Labor Statistics
- So vital that job-related estimates from the CPS often cause fluctuations in the stock market and influence economic policies

How confident should we be in the estimated UR?


Estimation: Definition

A process whereby a random sample is selected from a population and a sample statistic is used to estimate a population parameter
- e.g., using the % unemployed in the CPS sample to estimate the actual % unemployed in the population

Why do this?
- In most cases, we don't know the values of the population parameters, nor do we have enough resources to survey the entire population
- With sampling theory and statistical inference, we can approximate the population parameters

Point and Interval Estimation: Point estimates

Sample statistics used to estimate the exact value of a population parameter

Examples
- Rate: unemployment rate, infant mortality rate, college entrance rate, ...
- Mean: years of education, duration of unemployment, ...
- Proportion: % of Americans watching Fox News, MSNBC, or CNN, ...
- Others

Main concerns
- How accurate are sample statistics?
- How do we reflect the uncertainty due to sampling variation?

Point and Interval Estimation: Interval estimates

A range of values within which the population parameter may fall

Confidence interval (CI)
- A range of values, defined by the confidence level, within which the population parameter is estimated to fall
- Often reported with a margin of error: point estimate ± margin of error

Confidence level
- The likelihood that a specified confidence interval will contain the population parameter
- Expressed as a percentage or a probability
- Commonly used confidence levels: 95%, 99%, 90%


CI for the Population Mean: Inferential statistics
- Provide a best guess of the population parameters based entirely on information from a sample of the population
- With the sample, we know a sample mean, a standard deviation, and a sample size
- Combine these statistics with sampling theory (i.e., the central limit theorem) to produce confidence intervals
- Infer the population parameters

CI for the Population Mean: Average income of single-parent families

[Figure: (1) a sampling procedure draws a sample of single-parent families (mean = $36,000, n = 100) from the population of single-parent families; (2) estimation gives $36,000 ± $2,000; (3) inference about the population.]

CI for the Population Mean: Recall the central limit theorem

The 90-95-99 rule
- 90% of all random sample means fall within 1.65 standard errors of the true population mean
- 95% of all random sample means fall within 1.96 (≈ 2) standard errors of the true population mean
- 99% of all random sample means fall within 2.58 standard errors of the true population mean
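The multipliers in the rule are two-sided critical values of the standard normal distribution. As a minimal sketch (the function name `z_for_confidence` is only illustrative), they can be recovered with Python's standard library:

```python
from statistics import NormalDist


def z_for_confidence(level: float) -> float:
    """Return Z such that a standard normal variable lies in (-Z, Z) with probability `level`."""
    # The tails outside (-Z, Z) hold (1 - level) in total, so the upper cut-off
    # sits at cumulative probability 0.5 + level / 2.
    return NormalDist().inv_cdf(0.5 + level / 2)


for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%} confidence: Z = {z_for_confidence(level):.3f}")
```

The unrounded values are close to 1.645, 1.960, and 2.576, which the slides round to 1.65, 1.96, and 2.58.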

CI for the Population Mean: e.g., 95% confidence level
- 95 times out of 100, the sample mean is within 2 s.e. of the population mean
- Equivalently, in 95% of samples, the population mean is within 2 s.e. of the sample mean

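CI for the Population Mean: Formula

CI = sample mean ± Z × (s.e. of the mean), where s.e. of the mean = population S.D. / sqrt(n)

Determining the CI
1. Calculate the s.e. of the mean
2. Decide the confidence level
3. Find the corresponding Z value
4. Calculate the CI
5. Interpret the result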

CI for the Population Mean: e.g., Average income of single-parent families
- Mean = $36,000, n = 100
- Suppose the population S.D. is $10,000
- s.e. = 10,000 / sqrt(100) = $1,000

Say we decide on a 95% confidence level
- The corresponding Z value is 1.96, or approximately 2
- 95% CI = 36,000 ± 2(1,000) = [$34,000, $38,000]
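The calculation for this example can be sketched in a few lines. The helper name `mean_ci_known_sd` is an assumption made for illustration; it simply computes the sample mean plus or minus Z times the standard error, with the population S.D. treated as known.

```python
import math


def mean_ci_known_sd(mean: float, sd: float, n: int, z: float) -> tuple[float, float]:
    """Confidence interval for a population mean when the population S.D. is known."""
    se = sd / math.sqrt(n)   # standard error of the mean
    margin = z * se          # margin of error
    return mean - margin, mean + margin


# Income of single-parent families: mean = $36,000, n = 100, S.D. = $10,000
lower, upper = mean_ci_known_sd(mean=36_000, sd=10_000, n=100, z=1.96)
print(f"95% CI with Z = 1.96: [{lower:,.0f}, {upper:,.0f}]")

lower2, upper2 = mean_ci_known_sd(mean=36_000, sd=10_000, n=100, z=2)
print(f"95% CI with Z = 2:    [{lower2:,.0f}, {upper2:,.0f}]")
```

With Z = 1.96 the limits are $34,040 and $37,960; rounding Z to 2 gives the $34,000 to $38,000 interval.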

CI for the Population Mean: Interpretation

Correct interpretation
- "We are 95% confident that ...", "In 95% of samples, ...", or "95 times out of 100, ..."
  the true population mean, μ, of the income of single-parent families is between $34,000 and $38,000
- This means that 5 times out of 100, μ is not included in the specified CI (demonstration)

Incorrect interpretation (think about why)
- "The probability that μ is in the specified interval is 95%" (for a specified interval, this probability is 0 or 1)
- "The probability that the sample mean is in the specified interval is 95%" (this probability is always 1)
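One way to run the demonstration is a small simulation. This is only a sketch under loud assumptions: the population is taken to be normal, and the "true" mean of $36,000 is a value chosen for the simulation, not something the slides claim to know. The function name `coverage` is likewise illustrative.

```python
import math
import random


def coverage(mu: float, sigma: float, n: int, z: float, trials: int, seed: int = 0) -> float:
    """Share of simulated samples whose CI (known sigma) contains the true mean mu."""
    rng = random.Random(seed)          # fixed seed so the run is repeatable
    se = sigma / math.sqrt(n)
    hits = 0
    for _ in range(trials):
        # Draw one sample of size n from the assumed normal population
        sample_mean = sum(rng.gauss(mu, sigma) for _ in range(n)) / n
        if sample_mean - z * se <= mu <= sample_mean + z * se:
            hits += 1
    return hits / trials


rate = coverage(mu=36_000, sigma=10_000, n=100, z=1.96, trials=2_000)
print(f"Share of intervals containing mu: {rate:.3f}")
```

Each simulated interval either contains μ or does not; the 95% describes how often the procedure succeeds across many samples, which should come out near 0.95 here.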

CI for the Population Mean: Varying the confidence level

What happens? Examine the width of the CI as the confidence level changes
- [lower limit, upper limit]

Trade-off between confidence and precision
- The higher the confidence, the less precise the estimate
- The more precise the estimate, the lower the confidence

[Figure: CI widths in the income example (s.e. = $1,000): $3,300 at 90%, $3,920 at 95%, $5,160 at 99%]
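The trade-off can be checked numerically. The sketch below assumes the income example (population S.D. = $10,000, n = 100) and the rounded Z values from the 90-95-99 rule; `ci_width` is a hypothetical helper name.

```python
import math


def ci_width(sd: float, n: int, z: float) -> float:
    """Full width of the CI: twice the margin of error Z * S.D. / sqrt(n)."""
    return 2 * z * sd / math.sqrt(n)


z_values = {"90%": 1.65, "95%": 1.96, "99%": 2.58}
widths = {level: ci_width(sd=10_000, n=100, z=z) for level, z in z_values.items()}

for level, width in widths.items():
    print(f"{level} confidence level: width = ${width:,.0f}")
```

The width grows from $3,300 to $3,920 to $5,160 as the confidence level rises, so more confidence is bought with a less precise interval.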

CI for the Population Mean: Estimating the population S.D.

So far, we have assumed that we know the population S.D. But do we really know it?

Applying the central limit theorem in a slightly different way, we can use the estimated s.e.
- Replace the actual population S.D. with the sample S.D.: estimated s.e. = sample S.D. / sqrt(n)
- CI = sample mean ± Z × (estimated s.e.)

CI for the Population Mean: e.g., Average TV watching hours per day, GSS 2008
- n = 562, mean = 2.98 hours, S.D. = 2.66 hours
- Calculate the estimated standard error: s.e. = 2.66 / sqrt(562) = 0.11
- 95% CI (Z-score is 1.96): 2.98 ± 1.96(0.11) = 2.98 ± 0.22 = [2.76, 3.20]

Interpretation
- We are 95% confident that the actual average TV watching time is between 2.76 and 3.20 hours per day.

CI for the Population Mean: Factors affecting the CI

Look at the second part of the formula for the CI (Z × S.D. / sqrt(n))
- If the confidence level is higher, Z is larger and the CI is wider (less precise)
- If the standard deviation is larger, the CI is wider (less precise)
- If n is larger, the CI is narrower (more precise)

e.g., Average TV watching hours per day, GSS 2008: mean = 2.98 hours, S.D. = 2.66 hours

n      s.e.   95% CI         Interval width
195    0.19   [2.61, 3.35]   0.74
562    0.11   [2.76, 3.20]   0.44
1987   0.06   [2.86, 3.10]   0.24

- To cut the CI width roughly in half, we had to nearly quadruple the sample size
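The table can be reproduced with a short loop. This is a sketch, not the instructor's own computation: it assumes Z = 1.96 and takes the interval width from the limits after rounding them to two decimals, which appears to be how the slide's widths were obtained (widths from unrounded limits can differ by 0.01).

```python
import math

mean, sd, z = 2.98, 2.66, 1.96   # GSS 2008 TV hours: sample mean, sample S.D., Z for 95%

rows = []
for n in (195, 562, 1987):
    se = sd / math.sqrt(n)                   # estimated standard error
    lower = round(mean - z * se, 2)
    upper = round(mean + z * se, 2)
    width = round(upper - lower, 2)          # width taken from the rounded limits
    rows.append((n, round(se, 2), lower, upper, width))

print(f"{'n':>5} {'s.e.':>5} {'95% CI':>14} {'width':>6}")
for n, se, lower, upper, width in rows:
    print(f"{n:>5} {se:>5.2f} [{lower:.2f}, {upper:.2f}] {width:>6.2f}")
```

Because the standard error shrinks with sqrt(n), halving the width takes about four times as many observations.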

CI for the Population Mean: Example. Earnings differential among Hispanics, 2000 Census

Group           n       Earnings   S.D.
Cubans          29233   $24018     $36298
Puerto Ricans   66933   $18748     $25694
Mexicans        34620   $16537     $23502

Cubans: s.e. = 36298/sqrt(29233) = 212.29; 95% CI = 24018 ± 1.96(212.29) = 24018 ± 416 = [23602, 24434]
Puerto Ricans: s.e. = 25694/sqrt(66933) = 99.32; 95% CI = 18748 ± 1.96(99.32) = 18748 ± 195 = [18553, 18943]
Mexicans: s.e. = 23502/sqrt(34620) = 126.31; 95% CI = 16537 ± 1.96(126.31) = 16537 ± 248 = [16289, 16785]

- Interpret the results
- Locate the numbers on a horizontal line and see if there are any overlaps
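The overlap check can also be done in code rather than on a number line. The sketch below uses the n, mean earnings, and S.D. values from the example; the helper names `ci95` and `overlaps` are illustrative assumptions.

```python
import math

# (n, mean earnings, S.D.) for each group, from the 2000 Census example
groups = {
    "Cubans": (29233, 24018, 36298),
    "Puerto Ricans": (66933, 18748, 25694),
    "Mexicans": (34620, 16537, 23502),
}


def ci95(n: int, mean: float, sd: float, z: float = 1.96) -> tuple[float, float]:
    """95% CI for the mean using the estimated standard error sd / sqrt(n)."""
    margin = z * sd / math.sqrt(n)
    return mean - margin, mean + margin


def overlaps(a: tuple[float, float], b: tuple[float, float]) -> bool:
    """True if the two intervals share at least one point."""
    return a[0] <= b[1] and b[0] <= a[1]


intervals = {name: ci95(*stats) for name, stats in groups.items()}
for name, (lower, upper) in intervals.items():
    print(f"{name}: [{lower:,.0f}, {upper:,.0f}]")

names = list(intervals)
for i, first in enumerate(names):
    for second in names[i + 1:]:
        print(f"{first} vs. {second}: overlap = {overlaps(intervals[first], intervals[second])}")
```

None of the three intervals overlap, which is consistent with a real earnings difference among the groups rather than sampling variation alone.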