Error and Sample Sizes PHC 6716 June 1, 2011 Chris McCarty



Types of error

• Non-sampling error – Error introduced while collecting and analyzing the data

• Sampling error – Error that arises because only a sample, rather than the entire population, is interviewed


Non-Sampling Error

• Coverage error
▫ Wrong population definition
▫ Flawed sampling frame
▫ Interviewer or management error in following the sampling frame

• Response error
▫ Badly worded question results in an invalid or incorrect response
▫ Interviewer bias changes the response

• Non-response error
▫ Respondent refuses to take the survey or is away
▫ Respondent refuses to answer certain questions

• Processing errors
▫ Errors in data entry or recording of responses

• Analysis errors
▫ Inappropriate analytical techniques, weighting, or imputation are applied


Sampling Error

• Sampling error can be quantified after the data are collected by calculating the Margin of Error and confidence intervals

• Surveys don’t have a Margin of Error; questions do

• Power analyses use estimates of the parameters involved in calculating the margin of error

• It is common to see sample sizes of 400 and 1000 for surveys (for a proportion near 50% at 95% confidence, these give margins of error of roughly 5% and 3%)

• In most cases the size of the population being sampled from is irrelevant, as long as the sample is a small fraction of it

• When reporting results for a subgroup, calculate the margin of error using the number of respondents in that subgroup, not the full sample size
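A minimal Python sketch of why subgroup size matters, assuming the standard margin-of-error formula for a proportion with p = 0.5 and z = 1.96. The survey of 1,000 completes and the 250-person subgroup are hypothetical numbers chosen for illustration; `moe_proportion` is a hypothetical helper name.

```python
import math

def moe_proportion(p: float, n: int, z: float = 1.96) -> float:
    """Margin of error H for a sample proportion p based on n respondents."""
    return z * math.sqrt(p * (1 - p) / n)

# Hypothetical survey: 1,000 completes, of which 250 fall in one subgroup.
full_sample = moe_proportion(0.5, 1000)
subgroup = moe_proportion(0.5, 250)
print(f"Full sample H: {full_sample:.3f}")  # about 0.031
print(f"Subgroup H:    {subgroup:.3f}")     # about 0.062
```

Because H shrinks with the square root of n, a subgroup one quarter the size of the full sample has twice the margin of error.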


Margin of Error Formula

$$H = \frac{z s}{\sqrt{n}}$$

• H = half-width of the confidence interval (the margin of error), in the same units as the variable and its standard deviation

• z = z score associated with level of confidence (typically 95%)

• s = standard deviation

• n = sample size
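The formula above translates directly into code. This is a minimal Python sketch; the function name `margin_of_error` and the standard deviation of 10 in the usage lines are hypothetical values for illustration.

```python
import math

def margin_of_error(z: float, s: float, n: int) -> float:
    """Half-width H of the confidence interval for a mean: z * s / sqrt(n)."""
    if n <= 0:
        raise ValueError("sample size must be positive")
    return z * s / math.sqrt(n)

# Quadrupling the sample size halves the margin of error.
h_100 = margin_of_error(1.96, 10.0, 100)
h_400 = margin_of_error(1.96, 10.0, 400)
print(round(h_100, 3), round(h_400, 3))  # 1.96 0.98
```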


The z score

• The z value is the z score associated with a level of confidence

• Typically (almost exclusively) surveys use 95%

• This means that if the survey were replicated 100 times, about 95 of the 100 estimates would fall within the margin of error of the true population value

• The z score associated with 95% is 1.96
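Where 1.96 comes from: for a two-sided interval at confidence level c, z is the (1 + c) / 2 quantile of the standard normal distribution. A short sketch using the standard library's `statistics.NormalDist` (Python 3.8+); the helper name `z_for_confidence` is hypothetical.

```python
from statistics import NormalDist

def z_for_confidence(level: float) -> float:
    """Two-sided z score: the (1 + level) / 2 quantile of the standard normal."""
    return NormalDist().inv_cdf((1 + level) / 2)

for level in (0.90, 0.95, 0.99):
    print(level, round(z_for_confidence(level), 3))
# 0.9 1.645
# 0.95 1.96
# 0.99 2.576
```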


The standard deviation (s)

• For a continuous variable the standard deviation is typically not known before the survey

• Previous research may suggest a reasonable range for the standard deviation, and therefore for the margin of error

• After you have collected the data the standard deviation is known


Example: Age of Floridians

$$H = \frac{1.96(17.6)}{\sqrt{406}} = \frac{34.496}{20.149} = 1.712$$

• Sample of 406 Floridians

• Age range 18 to 92

• Mean age of sample = 52.3

• Standard deviation = 17.6

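• 95% confidence interval for the mean age: 50.59 to 54.01 (Frequentist interpretation: if the survey were repeated many times, about 95 out of 100 intervals built this way would contain the true mean age)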
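The age example can be checked with a few lines of Python. This sketch simply plugs the slide's values (n = 406, mean 52.3, s = 17.6, z = 1.96) into the margin-of-error formula and builds the interval as mean ± H.

```python
import math

# Values from the age example: sample size, sample mean, standard deviation, z.
n, mean_age, s, z = 406, 52.3, 17.6, 1.96

h = z * s / math.sqrt(n)
lower, upper = mean_age - h, mean_age + h
print(f"H = {h:.3f}; 95% CI = ({lower:.2f}, {upper:.2f})")
# H = 1.712; 95% CI = (50.59, 54.01)
```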


Margin of Error for a Proportion

$$H = z\sqrt{\frac{p(1-p)}{n}}$$

• p = sample proportion (or an assumed proportion when planning a survey)
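For a proportion, p(1 − p) plays the role of s² in the earlier formula. Because p(1 − p) is largest at p = 0.5, assuming p = 0.5 gives the largest (most conservative) H for a given n when the true proportion is unknown. A quick numerical check over a grid of proportions:

```python
# p * (1 - p) is the variance of a yes/no response with proportion p.
# Search a grid of proportions 0.00, 0.01, ..., 1.00 for the maximum.
grid = [i / 100 for i in range(101)]
best_p = max(grid, key=lambda p: p * (1 - p))
print(best_p, best_p * (1 - best_p))  # 0.5 0.25
```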


Example: Floridians employed

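$$H = z\sqrt{\frac{p(1-p)}{n}} = 1.96\sqrt{\frac{.5529(.4471)}{415}} \approx .048$$

• Sample of 415 Floridians

• 55.29 percent employed

• 44.71 percent not employed

• 95% confidence interval for the percent employed: about 50.51 to 60.07 (if the survey were repeated many times, about 95 out of 100 intervals built this way would contain the true percentage)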
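The employment example worked in Python, using the slide's values (n = 415, p = 0.5529, z = 1.96). The interval is the sample proportion ± H, converted to percent.

```python
import math

# Values from the employment example.
n, p, z = 415, 0.5529, 1.96

h = z * math.sqrt(p * (1 - p) / n)
print(f"H = {h:.4f}")
print(f"95% CI: {100 * (p - h):.2f}% to {100 * (p + h):.2f}%")
```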


Margin of Error with Finite Population Adjustment

$$H = z\sqrt{\frac{p(1-p)}{n}}\sqrt{\frac{N-n}{N-1}}$$
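A minimal sketch of the adjusted formula, where N is the population size. The function name `moe_proportion_fpc` is a hypothetical helper; the usage line shows that when the sample is the whole population (a census), the correction factor is zero and so is H.

```python
import math

def moe_proportion_fpc(p: float, n: int, N: int, z: float = 1.96) -> float:
    """Margin of error for a proportion with the finite population correction."""
    if N < 2 or not 1 <= n <= N:
        raise ValueError("need N >= 2 and 1 <= n <= N")
    fpc = math.sqrt((N - n) / (N - 1))  # shrinks toward 0 as n approaches N
    return z * math.sqrt(p * (1 - p) / n) * fpc

print(moe_proportion_fpc(0.5, 500, 500))  # 0.0: a census has no sampling error
```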


Example: Floridians employed with finite population adjustment

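$$H = z\sqrt{\frac{p(1-p)}{n}}\sqrt{\frac{N-n}{N-1}} = 1.96\sqrt{\frac{.5529(.4471)}{415}}\sqrt{\frac{6{,}949{,}759-415}{6{,}949{,}759-1}} \approx .0478$$

• With the finite population adjustment the margin of error is essentially unchanged (it falls by less than 0.001 percentage points), because the sample is a tiny fraction of the population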
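To see how small the adjustment is here, this sketch computes H with and without the finite population correction for the slide's values (N = 6,949,759, n = 415, p = 0.5529, z = 1.96).

```python
import math

# Values from the employment example with finite population adjustment.
N, n, p, z = 6_949_759, 415, 0.5529, 1.96

h = z * math.sqrt(p * (1 - p) / n)            # unadjusted margin of error
h_adj = h * math.sqrt((N - n) / (N - 1))      # finite population adjustment
print(f"unadjusted H = {h:.6f}, adjusted H = {h_adj:.6f}")
print(f"difference = {100 * (h - h_adj):.5f} percentage points")
```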


H adjusted versus not adjusted as sample size increases

[Figure: margin of error (H) and H adjusted plotted against sample size n]

• The adjustment makes little practical difference until the sample reaches about 10 percent of the population

• H adjusted falls to zero as the sample approaches a census

• H unadjusted never does
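The ratio of adjusted to unadjusted H is just the correction factor √((N − n)/(N − 1)), so it depends only on what fraction of the population is sampled. This sketch tabulates that factor for a hypothetical population of N = 10,000; `fpc_factor` is a hypothetical helper name.

```python
import math

def fpc_factor(n: int, N: int) -> float:
    """Ratio of adjusted to unadjusted margin of error."""
    return math.sqrt((N - n) / (N - 1))

N = 10_000  # hypothetical population size
for frac in (0.01, 0.05, 0.10, 0.25, 0.50, 0.90):
    n = round(frac * N)
    print(f"n = {frac:>4.0%} of N: adjusted H is {fpc_factor(n, N):.3f} x unadjusted H")
```

At 10 percent of the population the adjustment trims H by only about 5 percent; the factor drops steeply only as n approaches N.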


Formula to determine sample size given a desired margin of error

$$n = \frac{z^2 s^2}{H^2}$$
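Solving the margin-of-error formula for n gives this expression; in practice the result is rounded up to a whole respondent. A minimal sketch, with `required_sample_size` as a hypothetical helper name. For a proportion, s² = p(1 − p), so the usage lines assume the conservative p = 0.5 (s = 0.5).

```python
import math

def required_sample_size(z: float, s: float, h: float) -> int:
    """Smallest n with z * s / sqrt(n) <= h, i.e. n = z^2 s^2 / H^2 rounded up."""
    return math.ceil((z * s / h) ** 2)

# For a proportion with assumed p = 0.5, the standard deviation is 0.5.
print(required_sample_size(1.96, 0.5, 0.05))  # 385
print(required_sample_size(1.96, 0.5, 0.03))  # 1068
```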


Calculator sites

• http://www.americanresearchgroup.com/moe.html

• http://www.surveysystem.com/sscalc.htm


Power Analysis

n       H (%)
100     9.8
200     6.9
300     5.7
400     4.9
500     4.4
600     4.0
700     3.7
800     3.5
900     3.3
1000    3.1
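The table can be regenerated from the proportion formula. This sketch assumes it was built with z = 1.96 (95% confidence) and p = 0.5; those inputs reproduce every row. `moe_percent` is a hypothetical helper name.

```python
import math

def moe_percent(n: int, p: float = 0.5, z: float = 1.96) -> float:
    """Margin of error in percentage points for a proportion p and sample size n."""
    return 100 * z * math.sqrt(p * (1 - p) / n)

print(f"{'n':>5} {'H (%)':>6}")
for n in range(100, 1001, 100):
    print(f"{n:>5} {moe_percent(n):>6.1f}")
```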


Dillman formula

$$N_s = \frac{(N_p)(p)(1-p)}{(N_p - 1)(B/C)^2 + (p)(1-p)}$$

Where:

• Ns = completed sample size needed for the desired level of precision

• Np = size of population (in this case assume 80,000)

• p = proportion of the population expected to choose one of the two response categories (in this case either owner or renter)

• B = acceptable amount of sampling error (in this case assume ±5% = 0.05)

• C = z statistic associated with the confidence level (in this case assume a 95% confidence level = 1.96)
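A direct translation of the Dillman formula into Python, using the slide's Np = 80,000, B = 0.05, and C = 1.96. The slide does not give a value for p, so this sketch assumes the conservative p = 0.5; `dillman_sample_size` is a hypothetical helper name.

```python
import math

def dillman_sample_size(Np: int, p: float, B: float, C: float) -> float:
    """Completed sample size Ns from the Dillman formula (before rounding up)."""
    return (Np * p * (1 - p)) / ((Np - 1) * (B / C) ** 2 + p * (1 - p))

# Np, B, C from the slide; p = 0.5 is an assumed value.
ns = dillman_sample_size(80_000, 0.5, 0.05, 1.96)
print(f"Ns = {ns:.1f} -> {math.ceil(ns)} completed surveys")
```

The result is close to the 385 obtained without any population adjustment, which is consistent with the earlier point that population size rarely matters once the population is large.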


Relationship between cost and sampling error with increases in sample size

[Figure: cost in dollars and margin of error H (%) plotted against sample size n]
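The shape of this trade-off follows from the formulas: cost grows roughly in proportion to n, while H shrinks only with √n, so halving H requires about four times the sample and four times the cost. A sketch assuming a flat, hypothetical cost of $25 per completed interview and p = 0.5:

```python
import math

COST_PER_COMPLETE = 25.0  # hypothetical dollars per completed interview

def moe_percent(n: int, p: float = 0.5, z: float = 1.96) -> float:
    """Margin of error in percentage points for a proportion p and sample size n."""
    return 100 * z * math.sqrt(p * (1 - p) / n)

for n in (100, 400, 1600):
    cost = COST_PER_COMPLETE * n
    print(f"n = {n:>4}: cost ${cost:>8,.0f}, H = {moe_percent(n):.2f}%")
```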