Sampling

MICS3 Regional Workshop

“Survey Design”

MICS Sample Design

MICS is a complex survey (multi-stage, stratified). MICS is a worldwide program, so consistency and comparability are important issues. We will discuss only a few of the highlights, including:
- Sample size determination
- Stratification and sample allocation
- Number of Primary Sampling Units and cluster sizes
- Use of an existing sample or a new sample
- A few special topics

Sample Size for MICS

Sample size is the most important feature of MICS with respect to survey costs. We will discuss:

DETERMINANTS – factors, constraints

INDICATORS to use

FORMULA to calculate sample size

Determinants of Sample Size (Factors and Constraints)

Sample size (households) depends on many factors:
- Expected size estimate of indicators
- Expected size estimate of target population(s)
- Average household size
- Margin of error wanted
- Level of confidence wanted
- “Design effect” (increase in sampling error due to use of a cluster survey instead of a simple random sample)
- Expected non-response rate
- Number of clusters or PSUs
- Cluster size (number of households per sample cluster)
- Number of sub-national areas for separate estimates (domains)
- Survey budget and implementing capability

MICS Recommendations on Sample Size Determinants

FACTOR                                             RECOMMENDATION
1. Expected size estimate of indicators            (next slide)
2. Expected size estimate of target population     12-23 mos [3%]
3. Average household size                          6 persons
4. Relative margin of error wanted                 12% of coverage rate
5. Level of confidence wanted                      95 percent
6. Design effect in cluster surveys                1.5
7. Expected non-response rate                      10 percent
8. Number of clusters or PSUs                      minimum [300-400]
9. Cluster size                                    [15-35]
10. Number of estimation “domains” wanted          [5 or fewer]
11. Survey budget                                  (country specific)

For items 2, 3, 6, 7 use available country data (recent survey or census); if not available, use value above.

Indicators for Sample Size Determination

Sample size is different for each MICS indicator. A key indicator must be chosen, since only one sample size can be used in MICS.

Recommendations for choosing the key indicator:
- Choose from among the main indicators of interest in your country.
- Choose the one which will yield the largest sample size.
- Usually for a single-year age group, and
- Usually DPT, measles, polio or tuberculosis immunization, or birth weight below 2.5 kg

Exceptions:
- Do not choose infant or maternal mortality rates as the key indicators.
- Do not choose a low coverage indicator that is desirably low (such as malnutrition prevalence).
- Do not choose breast-feeding indicators for 4-month age groups.

Checklist for Target Group and Indicator

To decide on the appropriate target group and indicator that you need to determine your sample size:
1. Pick children 12-23 months old, the target population that comprises the smallest percentage of the total population (probably about 3 percent).
2. For that target group, pick the lowest from among the following coverage rates:
   - DPT immunization level
   - Measles immunization level
   - Polio immunization level
   - Tuberculosis immunization level
3. Do not pick a desirably low coverage indicator that is already acceptably low.
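Why the lowest coverage rate gives the largest sample: when the margin of error is a fixed percentage of the coverage rate r, as in the MICS3 sample size formula, the required sample size is proportional to (1 - r) / r, which grows as r falls. The sketch below illustrates this. The coverage rates are hypothetical values chosen for illustration only, and the names `candidate_coverage` and `relative_size` are not part of any MICS tool.

```python
# Hypothetical coverage rates for children 12-23 months old
# (illustrative values only, not taken from any survey).
candidate_coverage = {
    "DPT": 0.45,
    "measles": 0.40,
    "polio": 0.50,
    "tuberculosis": 0.60,
}


def relative_size(r):
    """Sample size is proportional to (1 - r) / r when the margin of
    error is a fixed fraction of the coverage rate r."""
    return (1 - r) / r


# The key indicator is the candidate that needs the largest sample,
# which is the one with the lowest coverage rate.
key_indicator = max(
    candidate_coverage,
    key=lambda name: relative_size(candidate_coverage[name]),
)
print(key_indicator)
```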

Formula for Sample Size

Different formula than MICS2000
- The MICS2005 formula emphasizes the relative margin of error* instead of a 5% absolute error (high coverage indicator) or 3% (low coverage indicator).
- Less confusing
- Does not depend on high or low coverage

* The Relative Margin of Error is the tolerable difference between the estimated proportion and its true value, expressed as a percentage of that proportion, at a given confidence level. It determines the relative length of the confidence interval.

Formula

n = [4 (r) (1 - r) (deff) (1.1)] / [(0.12r)^2 (p) (ave-size)]

where
- n is the required sample size, expressed as number of households, for the KEY indicator,
- 4 is the factor to achieve the 95 percent level of confidence,
- r is the anticipated prevalence (coverage) rate for the key indicator,
- 1.1 is the factor to raise the sample size by 10 percent for potential nonresponse,
- deff is the shortened symbol for design effect,
- 0.12r is the margin of error to be tolerated, defined as 12 percent of r (12 percent thus represents the relative sampling error of r),
- p is the proportion of the total population that the smallest group comprises, and
- ave-size is the average household size.

You may use the table on the next page instead of formula if all conditions are satisfied for that table in your country.
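The formula can be written as a small function. This is a minimal sketch: the function name and parameter names are illustrative, and the defaults are the recommended values from this presentation (p = 0.03, average household size 6, deff 1.5, 10 percent nonresponse, 12 percent relative margin of error).

```python
def mics_sample_size(r, p=0.03, ave_size=6.0, deff=1.5,
                     nonresponse_factor=1.1, rel_margin=0.12,
                     confidence_factor=4.0):
    """Required number of households, n, for the key indicator.

    r                  anticipated coverage rate of the key indicator
    p                  share of the total population in the target group
    ave_size           average household size
    deff               design effect
    nonresponse_factor inflation for expected nonresponse (1.1 = 10 percent)
    rel_margin         relative margin of error (0.12 = 12 percent of r)
    confidence_factor  4 corresponds to the 95 percent level of confidence
    """
    numerator = confidence_factor * r * (1 - r) * deff * nonresponse_factor
    denominator = (rel_margin * r) ** 2 * p * ave_size
    return numerator / denominator


n = mics_sample_size(r=0.30)
print(round(n))
```

With r = 0.30 and the default values, the result rounds to 5,941 households.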

Sample Size (Households) Calculation for Proportion Estimation Using Smallest Target Population

Average household size    Coverage rate   Coverage rate   Coverage rate   Coverage rate
(number of persons)       r = 0.15        r = 0.20        r = 0.30        r = 0.40
4.5                       19,239          13,580          7,922           5,093
5.0                       17,315          12,222          7,130           4,583
5.5                       15,741          11,111          6,481           4,167
6.0                       14,429          10,185*         5,941           3,819
6.5                       13,319          9,402           5,484           3,526

Use this table when:

1. Your target population is 3 percent of the total population; this is generally children 12-23 months old.

2. The sample design effect, deff, is assumed to be 1.5 and nonresponse is expected to be 10 percent.

3. The relative margin of error is set at 12 percent of the estimate of the coverage rate, r.

Example 1

- Target group: Children 12 to 23 months old
- Percent of population: 3 percent
- Key indicator: DPT immunization coverage
- Prevalence (coverage): 30 percent
- Deff: No information
- Non-response: No information
- Average household size: 6

Checking the table => n = 5,941
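The table can be regenerated from the formula, which also confirms the lookup in Example 1. The sketch below assumes the table conditions (p = 0.03, deff = 1.5, nonresponse factor 1.1, 12 percent relative margin of error); the function and variable names are illustrative.

```python
def households_needed(r, ave_size, p=0.03, deff=1.5, nonresponse_factor=1.1):
    """MICS sample size formula under the conditions stated for the table."""
    return (4 * r * (1 - r) * deff * nonresponse_factor
            / ((0.12 * r) ** 2 * p * ave_size))


coverage_rates = [0.15, 0.20, 0.30, 0.40]
household_sizes = [4.5, 5.0, 5.5, 6.0, 6.5]

# One row per average household size, one column per coverage rate.
table = {
    size: [round(households_needed(r, size)) for r in coverage_rates]
    for size in household_sizes
}

for size, row in table.items():
    print(f"{size:>4}  " + "  ".join(f"{n:>6,}" for n in row))

# Example 1: coverage 30 percent, average household size 6.
example_1 = table[6.0][coverage_rates.index(0.30)]
print(example_1)
```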

Checklist for Use of Sample Size Formula

The formula to determine your sample size:

n = [4 (r) (1 - r) (f) (1.1)] / [(0.12r)^2 (p) (nh)]

Use it if any (one or more) of the following applies in your country:

1) p – the proportion of one-year-old children is other than 3%
2) nh – the average household size is less than 4.5 persons or greater than 6.5
3) r – the coverage rate of your key indicator is under 20 or over 40 percent
4) f – the sample design effect for your key indicator is different from 1.5, according to accepted estimates from other surveys in your country
5) your anticipated non-response rate is more or less than 10 percent, in which case the factor 1.1 changes accordingly.

Example 2

- Target group: Children 12 to 23 months old
- Percent of population: 3.5 percent
- Key indicator: DPT immunization coverage
- Prevalence (coverage): 25 percent
- Deff: 1.6
- Non-response adjustment: 1.05 (response rate 95%)
- Average household size: 6

n = [4 (0.25) (0.75) (1.6) (1.05)] / [(0.12 * 0.25)^2 (0.035) (6)] = 1.26 / 0.000189 = 6,667
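As a check on the reasoning behind the formula, the result of Example 2 can be turned back into a margin of error. The sketch below estimates how many responding children 12-23 months old the 6,667 households should yield, then computes the approximate 95 percent margin of error for the coverage estimate. It assumes the response rate is the inverse of the 1.05 adjustment and uses the usual approximation of two standard errors for 95 percent confidence; the variable names are illustrative.

```python
import math

n_households = 6667
ave_size = 6
p = 0.035                  # share of the population aged 12-23 months
response_rate = 1 / 1.05   # inverse of the non-response adjustment in Example 2
deff = 1.6
r = 0.25                   # anticipated coverage rate

# Expected number of responding children 12-23 months old in the sample.
children = n_households * ave_size * p * response_rate

# Standard error of the coverage estimate, inflated by the design effect.
standard_error = math.sqrt(deff * r * (1 - r) / children)

# Half-width of the approximate 95 percent confidence interval
# (2 standard errors; 2 squared is the factor 4 in the formula).
margin = 2 * standard_error
relative_margin = margin / r

print(round(children), round(margin, 3), round(relative_margin, 2))
```

The margin comes out at about 0.03, that is 12 percent of the 25 percent coverage rate, which is the precision the formula was set up to deliver.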

Stratification & Sample Allocation

Stratification is the process of regrouping similar PSUs into sub-groups (strata).

Effects: better precision, flexible design, small sub-population coverage (or oversampling).

How to do stratification? (region) X (residence type)

Sample allocation: proportional, power allocation, equal size allocation (if budget is too tight).
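The three allocation rules can be seen as one rule with a different exponent. The sketch below is one common way to express this and is an assumption rather than a MICS prescription: the sample is allocated in proportion to stratum size raised to a power, where 1 gives proportional allocation, 0 gives equal allocation, and a value in between (here 0.5) gives a power allocation. The stratum names and sizes are hypothetical.

```python
def allocate(stratum_sizes, total_sample, power=1.0):
    """Allocate total_sample across strata in proportion to size ** power.

    power = 1.0 -> proportional allocation
    power = 0.0 -> equal size allocation
    0 < power < 1 -> power allocation (a compromise between the two)
    Results are rounded, so they may not sum exactly to total_sample.
    """
    measures = {name: size ** power for name, size in stratum_sizes.items()}
    total = sum(measures.values())
    return {name: round(total_sample * m / total) for name, m in measures.items()}


# Hypothetical numbers of households per stratum (illustrative only).
strata = {"North": 400_000, "Central": 900_000, "South": 100_000}

proportional = allocate(strata, 6000, power=1.0)
square_root = allocate(strata, 6000, power=0.5)
equal = allocate(strata, 6000, power=0.0)

print(proportional)
print(square_root)
print(equal)
```

In this illustration the smallest stratum receives 429 households under proportional allocation, 1,000 under the square-root power allocation, and 2,000 under equal allocation, which shows how the choice trades national precision against precision for small strata.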

Implicit stratification: sort the sampling frame according to certain characteristics such as regions, urban-rural residence, sub-regions, districts, etc., then select a PPS (probability proportionate to size) sample.
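A minimal sketch of implicit stratification followed by systematic PPS selection. The frame, the field names (`ea`, `region`, `residence`, `households`) and the function name are hypothetical; a real frame would come from the census.

```python
import random


def systematic_pps(frame, n_select, size_key="households",
                   sort_keys=("region", "residence"), seed=None):
    """Sort the frame (implicit stratification), then select n_select
    units systematically with probability proportionate to size."""
    rng = random.Random(seed)
    ordered = sorted(frame, key=lambda unit: tuple(unit[k] for k in sort_keys))
    total = sum(unit[size_key] for unit in ordered)
    interval = total / n_select
    start = rng.random() * interval          # random start in [0, interval)
    targets = [start + i * interval for i in range(n_select)]

    selected = []
    cumulative = 0
    t = 0
    for unit in ordered:
        cumulative += unit[size_key]
        # A unit larger than the interval can be hit more than once.
        while t < n_select and targets[t] < cumulative:
            selected.append(unit)
            t += 1
    return selected


# Hypothetical frame: 3 regions x 2 residence types x 10 EAs each.
frame = []
for region in ("East", "North", "West"):
    for residence in ("rural", "urban"):
        for _ in range(10):
            i = len(frame)
            frame.append({"ea": f"EA{i:03d}", "region": region,
                          "residence": residence,
                          "households": 80 + 7 * (i % 9)})

sample = systematic_pps(frame, n_select=12, seed=1)
for unit in sample:
    print(unit["ea"], unit["region"], unit["residence"], unit["households"])
```

Because the frame is sorted before the systematic pass, the selected EAs are spread across regions and residence types roughly in proportion to their size, without defining explicit strata.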

There is no unique rule for stratification; it depends on the country's situation.

Number of PSUs and Cluster Size

Survey costs depend not only on number of households but their distribution among Primary Sampling Units (PSUs).

In general, the more PSUs the better for reliability but the greater the cost (usually travel costs).

We recommend 300 to 400 PSUs or more.

Number of PSUs also depends on cluster size.

Cluster size should be as small as practical for reliability.

Example: 8,000 households selected in 400 PSUs of 20 households each is a much more reliable sample than 200 PSUs of 40 each, but more expensive.
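The reliability difference can be illustrated with the common approximation deff = 1 + (m - 1) * rho, where m is the cluster size and rho is the intra-cluster correlation. This is a sketch under stated assumptions: the value rho = 0.02 is purely illustrative, and cluster size is counted in households for simplicity, whereas the true effect depends on the number of target persons per cluster and on the indicator.

```python
def design_effect(cluster_size, rho):
    """Approximate design effect for a cluster sample."""
    return 1 + (cluster_size - 1) * rho


ASSUMED_RHO = 0.02  # illustrative intra-cluster correlation, not a MICS value

for n_psus, cluster_size in [(400, 20), (200, 40)]:
    households = n_psus * cluster_size
    deff = design_effect(cluster_size, ASSUMED_RHO)
    effective = households / deff  # equivalent simple random sample size
    print(f"{n_psus} PSUs x {cluster_size}: deff = {deff:.2f}, "
          f"effective sample = {effective:.0f} of {households}")
```

Under this assumption both designs interview 8,000 households, but the design with 400 smaller clusters has the lower design effect and therefore the larger effective sample size.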

MICS Sampling Option 1

USE AN EXISTING SAMPLE
- Piggy-back MICS onto DHS or other survey if timely and feasible.
- Or, use the sample from a previous survey and re-interview households for MICS.
- Or, use old survey sample EAs (enumeration areas) and construct a new listing of households to select for MICS.
- Old sample must be probability-based and national in scope.
- Possibilities – DHS, other national health survey, recent labour force or household expenditure surveys
- Important: design parameters must be known (such as selection probability, stratification, etc.)

OPTION 1 - USE OF AN EXISTING SAMPLE, continued

Advantages of old sample
- cost savings
- maps available for interviewers
- design rigor
- simplicity

Limitations of old sample
- burden on respondents
- sample design may need modification
  * sample size
  * sub-national coverage
  * number of PSUs or clusters

=> Balance between loss and gain


MICS Sampling Option 2

USE NEW SAMPLE WITH HOUSEHOLD LISTING OPERATION
- Design new MICS sample based on prototype
- Two stages with census as frame (see comprehensive discussion in Chapter 4 on frame construction and up-dating old frames)
- Use of implicit stratification, systematic selection of census EAs at first stage with pps
- Create standard segments (DHS approach)
- List households in selected segments
- Select households systematically from list
- Interview only the selected households; no replacement will be allowed

OPTION 2 - NEW SAMPLE WITH HOUSEHOLD LISTING, continued

Advantages of option 2
- simple design
- probability-based
- if possible self-weighting (national level)

Limitations of option 2
- expense of listing households
- time necessary to list households

[Example: a sample size of 5,000 households may need 25,000 to 50,000 households to be listed.]

DHS Method - Option 2

- Create “standard” segments.
- Divide census population in each EA by 500 to determine number of standard segments.
- Map sketch segments in each EA.
- Choose 1 segment at random.
- List households in selected segment only (instead of entire EA).
- Purpose is to reduce listing workload to a manageable size.
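The segment count and the random choice of a segment can be sketched as follows. The divisor of 500 comes from the rule above; rounding to the nearest whole number with a minimum of one segment is an assumption, since the rounding rule is not stated here, and the EA population of 1,900 is hypothetical.

```python
import math
import random

STANDARD_SEGMENT_POPULATION = 500  # from the rule above


def number_of_segments(ea_census_population):
    """Number of standard segments in an EA.

    Assumption: round to the nearest whole number, with at least 1 segment.
    """
    ratio = ea_census_population / STANDARD_SEGMENT_POPULATION
    return max(1, math.floor(ratio + 0.5))


def choose_segment(ea_census_population, rng):
    """Return (selected segment number, number of segments) for one EA."""
    k = number_of_segments(ea_census_population)
    return rng.randint(1, k), k


rng = random.Random(2005)
chosen, k = choose_segment(1900, rng)  # hypothetical EA with 1,900 persons
print(f"{k} segments, segment {chosen} selected for listing")
```

Only the households in the selected segment are listed, so in this illustration the listing workload is about one quarter of what listing the whole EA would require.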


MICS Sampling Option 3

USE NEW SAMPLE WITHOUT HOUSEHOLD LISTING OPERATION
(Modified Segment, or Cluster, Design)
- Design new MICS sample based on prototype.
- Two stages with census as frame
- Use of implicit stratification, systematic selection of census EAs at first stage with pps
- Pre-determine number of segments based on desired cluster size.
- Map sketch segments in each EA.
- Choose 1 segment at random.
- Interview all households in selected segment

OPTION 3 - NEW SAMPLE WITHOUT HOUSEHOLD LISTING, continued

Illustration:
- Suppose desired cluster size is 20 households.
- Suppose first sample EA contains 112 census households (according to frame).
- Divide 112 by 20 = 5.6 (round to 6).
- Map sketch exactly 6 segments based on canvass of EA.
- Select one segment at random.
- Interview all households (no matter how many are currently in the selected segment).
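The illustration can be reproduced in a few lines. This is a sketch: the function name is illustrative, and rounding to the nearest whole number follows the "round to 6" step above.

```python
import math
import random


def modified_segment_design(ea_census_households, desired_cluster_size, rng):
    """Return (number of segments, selected segment, expected cluster size)."""
    ratio = ea_census_households / desired_cluster_size
    segments = max(1, math.floor(ratio + 0.5))   # 5.6 -> 6
    selected = rng.randint(1, segments)          # one segment at random
    expected_cluster_size = ea_census_households / segments
    return segments, selected, expected_cluster_size


rng = random.Random(3)
segments, selected, expected = modified_segment_design(112, 20, rng)
print(f"{segments} segments, segment {selected} selected, "
      f"about {expected:.1f} households expected")
```

The expected cluster size (about 18.7 households here) is based on the census count in the frame. The number actually interviewed depends on how many households are found in the selected segment, which is why sample size is difficult to control under this option.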

OPTION 3 - NEW SAMPLE WITHOUT HOUSEHOLD LISTING, continued

Advantages of option 3
- avoids listing completely
- probability-based
- self-weighting (national level)

Limitations of option 3
- less reliable than option 2 (households are “clustered” together in compact segments)
- segmentation itself can be time-consuming and complicated
- difficult to control sample size

Special Topics

- Sub-national estimates, domains
- Water and sanitation estimates
- Survey weighting, sampling errors
- Other – sample frame construction, selection techniques
- Country examples

Sub-national Estimates, Domains

- The number of separate areas (domains) for which separate, equally reliable estimates are wanted affects sample size.
- If, say, 5 regional estimates are wanted, then, theoretically, the sample should be increased by a factor of 5.
- Must be careful therefore in producing separate estimates for domains.
- Either limit the number of domains to avoid a large increase in sample size,
- Or be prepared to accept domain estimates with much higher sampling errors than national.
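The trade-off can be quantified by solving the MICS sample size formula for the relative margin of error instead of n. The sketch below assumes the national sample is split equally among the domains and that the coverage rate and design effect are the same in every domain; these are simplifying assumptions, and the function name is illustrative.

```python
import math


def relative_margin(n_households, r, p=0.03, ave_size=6.0,
                    deff=1.5, nonresponse_factor=1.1):
    """Relative margin of error implied by the MICS formula for a given n."""
    variance_term = (4 * r * (1 - r) * deff * nonresponse_factor
                     / (n_households * p * ave_size))
    return math.sqrt(variance_term) / r


national_n = 5941   # households for r = 0.30, average household size 6
domains = 5

national = relative_margin(national_n, r=0.30)
per_domain = relative_margin(national_n / domains, r=0.30)

print(f"national: {national:.1%}, each of {domains} domains: {per_domain:.1%}")
print(f"households needed for 12% in every domain: {national_n * domains:,}")
```

With the sample held at its national size, the margin of error in each of 5 equal domains is larger by a factor of the square root of 5 (roughly 27 percent instead of 12 percent). Keeping 12 percent in every domain requires about 5 times as many households.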

Water and Sanitation Estimates

- These are an important component of MICS.
- Sampling errors will be high, however (extremely high in some cases).
- The MICS sample is designed primarily for person variables rather than household variables such as water/sanitation.
- Sample design effects for water and sanitation indicators will be much higher than for other indicators.
- Consequently, sampling reliability is very low.
- Estimates can nevertheless be useful to estimate trends in water/sanitation if previous surveys exist upon which to make comparisons.

Survey Weighting and Sampling Errors

- All analysis based on survey data must apply survey weights in order to prevent biased results.
- Survey weighting is design-specific. Non-response must be taken into account.
- Formulas for calculating weights depend on the exact sample design used in each country.
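As an illustration only, the sketch below computes a household weight for one particular design: a two-stage sample in which EAs are selected with PPS and households are then selected systematically from a fresh listing, with no segmentation and with a non-response adjustment made within the cluster. These design details, the function name and all the numbers are assumptions; the actual formulas must follow the design used in the country.

```python
def household_weight(n_eas_selected, ea_size, total_size,
                     hh_selected, hh_listed, hh_interviewed):
    """Weight for households in one sample EA (assumed two-stage PPS design)."""
    # First stage: probability that the EA is selected with PPS.
    p1 = n_eas_selected * ea_size / total_size
    # Second stage: probability that a listed household is selected.
    p2 = hh_selected / hh_listed
    design_weight = 1.0 / (p1 * p2)
    # Non-response adjustment within the cluster.
    nonresponse_adjustment = hh_selected / hh_interviewed
    return design_weight * nonresponse_adjustment


# Hypothetical cluster: 300 EAs selected from a frame of 1,200,000 households;
# this EA had 120 households in the frame, 130 were listed, 20 were selected
# and 18 were interviewed.
weight = household_weight(
    n_eas_selected=300, ea_size=120, total_size=1_200_000,
    hh_selected=20, hh_listed=130, hh_interviewed=18,
)
print(round(weight, 2))
```

Each interviewed household in this hypothetical cluster represents roughly 241 households in the population.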

Sampling Error Estimation

- Calculation of sampling errors is necessary to evaluate the reliability of survey estimates
- Should be done for 30-50 important indicators
- Methodology is complex and design-specific
- There are several options for sampling error calculations:
  * May use existing software (Clusters, WesVar, CenVar, PCCarp, etc.)
  * The latest version of SPSS is currently being evaluated to determine whether its new routines on sampling error are appropriate for MICS3 surveys
  * Routines in CSPro can be used
  * Or use a simple variance spreadsheet that will be available on the MICS website, www.childinfo.org

Sampling Error Estimation, continued

With the spreadsheet, it is only necessary to enter:
- Survey weights for each cluster
- Unweighted indicator estimate for each cluster

Then:
- Sampling error is automatically calculated
- Confidence limits and design effect are automatically calculated
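To show the kind of calculation involved, the sketch below estimates a weighted proportion and its sampling error from cluster-level data using the ultimate-cluster (Taylor linearization) approach for a ratio estimate. It is not the formula set of the MICS spreadsheet, which is not described here. It treats the sample as a single stratum, ignores the finite population correction, and additionally assumes that the number of cases in each cluster is known. All data are hypothetical.

```python
import math


def cluster_sampling_error(clusters):
    """clusters: list of (weight, cases, proportion), one per sample cluster."""
    a = len(clusters)
    weighted_cases = sum(w * m for w, m, _ in clusters)
    weighted_hits = sum(w * m * p for w, m, p in clusters)
    r = weighted_hits / weighted_cases            # weighted estimate

    # Linearized residuals w_i * (y_i - r * m_i), where y_i = m_i * p_i.
    residuals = [w * m * (p - r) for w, m, p in clusters]
    variance = a / (a - 1) * sum(z * z for z in residuals) / weighted_cases ** 2
    se = math.sqrt(variance)

    # Variance under simple random sampling with the same number of cases.
    n_cases = sum(m for _, m, _ in clusters)
    srs_variance = r * (1 - r) / n_cases

    return {
        "estimate": r,
        "se": se,
        "lower": r - 2 * se,     # approximate 95 percent confidence limits
        "upper": r + 2 * se,
        "deff": variance / srs_variance,
    }


# Hypothetical cluster data: (survey weight, cases, unweighted proportion).
clusters = [
    (1.2, 10, 0.30), (0.9, 12, 0.50), (1.1, 8, 0.25),
    (1.0, 11, 0.45), (0.8, 9, 0.33), (1.3, 10, 0.60),
]
result = cluster_sampling_error(clusters)
for name, value in result.items():
    print(f"{name}: {value:.3f}")
```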

Other Topics

Other key information to be included in the MICS3 manual for the sampling statistician to review:
- Sample frame construction
  * When new sample is used for MICS
  * Especially important if frame is old
- Selection techniques
  * Details of systematic sampling
  * PPS sampling (probability proportionate to size)
- Country examples from MICS2000
  * Papua New Guinea, Lebanon, Angola
