Copyright 2010, The World Bank Group. All Rights Reserved.
Estimation and Weighting, Part I
Goal of Estimation
Minimize a survey’s total error
• Sampling error is error arising solely from the sampling process (measure: variance)
  – Mainly a function of sample size
• Surveys are also subject to biases from nonsampling errors such as:
  – Coverage errors and non-probability sampling
  – Response errors
  – Nonresponse
Typical Estimation Steps
The estimation steps for a typical household survey avoid or help control some nonsampling errors
• Editing and Imputation are aimed at controlling response errors
• Basic Weighting based on probabilities of selection produces essentially unbiased estimates when there is 100% response and no response error
• Nonresponse Adjustment helps avoid some obvious biases that arise when nonrespondents are ignored
• Population Controls help minimize some coverage problems
Editing and Imputation
Editing
– deleting or correcting unacceptable data values
– coding/combining data to classify respondents
Imputation: insert values for missing data
– for missing items (imputation is common)
– for missing HHs or persons (not used as often)
– modeling methods
– hot deck methods
Item Nonresponse Imputation
When a household is interviewed and a small amount of data is not obtained for a person, imputing for the missing data creates a complete data set.
Hot Deck Method: Use answers from another similar unit (a “nearest neighbor”) to impute answers for an item nonresponse
Modeling Method: Mathematically impute an answer for an item nonresponse
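A minimal sketch of a nearest-neighbor hot deck follows. The slide does not specify a distance measure, so the sum of absolute differences, the function name `hot_deck_impute`, and the three sample records are assumptions made purely for illustration.

```python
def hot_deck_impute(records, target, item, match_keys):
    """Return the value of `item` from the donor record most similar to `target`.

    Similarity here is the sum of absolute differences over `match_keys`
    (an assumed distance; real surveys define their own matching rules).
    """
    # Donors are records that actually reported the item.
    donors = [r for r in records if r.get(item) is not None]
    if not donors:
        return None  # nothing to borrow from

    def distance(donor):
        return sum(abs(donor[k] - target[k]) for k in match_keys)

    nearest = min(donors, key=distance)
    return nearest[item]


# Hypothetical records: the third person did not report hours worked.
sample = [
    {"age": 31, "hh_size": 4, "hours_worked": 40},
    {"age": 58, "hh_size": 2, "hours_worked": 20},
    {"age": 27, "hh_size": 3, "hours_worked": None},
]
imputed = hot_deck_impute(sample, sample[2], "hours_worked", ["age", "hh_size"])
```

In this made-up data the 31-year-old is the closest donor, so that person's reported value is copied into the incomplete record.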
Example of Imputation
Suppose a woman aged 29 was employed last month. This month, we were not able to obtain her labor force status. Construct a “transition matrix” using records of “similar” persons with labor force status coded in both months, for example females aged 24-45. (EMP = employed, UE = unemployed, NILF = not in labor force.)

                   Last Month
This Month     EMP     UE    NILF    Total
EMP            120     10       7      137
UE               2     20       5       27
NILF             5      2      50       57
Total          127     32      62      221
Example of Imputation
Based on Frequencies, Compute Probabilities

Employed Last Month
This Month               EMP            UE                  NILF
Sample Frequency         120            2                   5
Estimated Probability    0.9449         0.0157              0.0394
Range for 0 ≤ rn ≤ 1     [0, 0.9449]    [0.9449, 0.9606]    [0.9606, 1]
Example of Imputation
• Generate a random number rn between 0 and 1
• If rn = .7221, for example, then rn falls in the range [0, .9449] and “employed” is imputed for this month
  – Will happen 94.49% of the time
• No guarantee that this is right for the particular data item that is imputed
• Imputed data set is complete and preserves known relationships
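The random-draw step can be sketched as below. The counts are the frequencies for persons employed last month (120, 2, 5); the function name `impute_status` and the use of Python's `random` module are illustrative choices, not part of the original slides.

```python
import random

# This month's status for persons who were employed last month.
counts = {"EMP": 120, "UE": 2, "NILF": 5}


def impute_status(counts, rn):
    """Map a random number rn in [0, 1] to a status via cumulative probabilities."""
    total = sum(counts.values())
    cumulative = 0.0
    for status, n in counts.items():
        cumulative += n / total
        if rn <= cumulative:
            return status
    return status  # guards against floating-point round-off when rn is 1


# Estimated probabilities, rounded to four places as on the slide.
probs = {s: round(n / sum(counts.values()), 4) for s, n in counts.items()}

example = impute_status(counts, 0.7221)             # the slide's example draw
random_draw = impute_status(counts, random.random())  # a fresh draw
```

With rn = 0.7221 the draw lands in the first range, so "EMP" is imputed. A draw between 0.9449 and 0.9606 would give "UE", and anything above that "NILF".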
Example of Imputation
Would you impute a labor force status? Maybe not:
• Usually a determination will be made concerning how much data is required for a response to be accepted by a survey
• For a labor force survey, enough information to determine LF status will probably be required
Purpose of Weighting
Estimate the number of persons each person in a sample household represents
Each person interviewed helps represent
– the not-in-sample population of the area (geographic stratum) where the person lives
– sample persons not interviewed
– generally, persons of the same age, race, gender, and ethnic origin as the person interviewed
Basic Weights
Applied at the household level (all persons in HH have the same basic weight)
Inverse of the probability of selection
In a typical HH sample there are two stages of sampling and two probabilities
– 1st stage probability for an EA: EAprob
– 2nd stage probability for a HH in that EA: HHprob
– TOTprob = EAprob * HHprob
– Baseweight = 1/TOTprob
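The two-stage calculation can be sketched as a small function. The selection probabilities used here (0.02 for the EA, 0.1 for the household) are hypothetical values chosen only to show the arithmetic.

```python
def base_weight(ea_prob, hh_prob):
    """Base weight = 1 / (EAprob * HHprob) for a two-stage household sample."""
    if not (0 < ea_prob <= 1 and 0 < hh_prob <= 1):
        raise ValueError("selection probabilities must lie in (0, 1]")
    tot_prob = ea_prob * hh_prob  # TOTprob: overall probability of selection
    return 1.0 / tot_prob


# Hypothetical probabilities: 1-in-50 EAs selected, then 1-in-10 HHs within the EA.
w = base_weight(0.02, 0.1)
```

Under these assumed probabilities each sample household carries a weight of about 500, meaning it stands in for roughly 500 households on the frame.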
Base Weights
• Self weighting samples are not common
• Primary stratifier for HH surveys is geography, such as state
  – often the base weights in a state are all equal
  – OR nearly the same
• For a self-weighting stratum use N/n:
  – N = number of HHs on the frame
  – n = number of HHs in the sample
Example of Basic Weighting
           Sample Count      Sample    HHs on     Base      Estimates After Basic Weighting
           EMP       UE      HHs       Frame      Weight    EMP        UE
State A    3,000     400     2,000     500,000    250       750,000    100,000
State B    2,750     250     1,750     175,000    100       275,000    25,000
Example of Basic Weighting
• Self-weighting within state
• State A has N = 500,000 and sample n = 2,000
  – baseweight = N/n = 500,000/2,000 = 250
  – An estimate of employment is obtained by multiplying the sample count (EMP = 3,000) by the baseweight
    • 3,000 x 250 = 750,000
• State B has N = 175,000 and sample n = 1,750
  – baseweight = N/n = 175,000/1,750 = 100
  – An estimate of unemployment is obtained by multiplying the sample count (UE = 250) by the baseweight
    • 250 x 100 = 25,000
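The same arithmetic can be checked for both states with a short script. The figures come from the example; the dictionary layout and the function name `weighted_estimates` are illustrative assumptions.

```python
# Frame sizes, sample sizes, and sample counts from the example.
states = {
    "State A": {"frame_hhs": 500_000, "sample_hhs": 2_000, "EMP": 3_000, "UE": 400},
    "State B": {"frame_hhs": 175_000, "sample_hhs": 1_750, "EMP": 2_750, "UE": 250},
}


def weighted_estimates(state):
    """Apply the self-weighting base weight N/n to the sample counts."""
    weight = state["frame_hhs"] / state["sample_hhs"]
    return {
        "weight": weight,
        "EMP": state["EMP"] * weight,
        "UE": state["UE"] * weight,
    }


results = {name: weighted_estimates(s) for name, s in states.items()}
for name, est in results.items():
    print(name, est)
```

This reproduces the base weights of 250 and 100 and the four weighted estimates in the example.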
Simple Weighted Estimates
Estimate x of a Total X
• A Simple Weighted Estimate adds persons using their weights (w_i is the weight for the ith person)
• Sum across all m persons in the sample
• x_i is a data value for person i
  – for example x_i = 1 for employed, 0 otherwise

$$x = \sum_{i=1}^{m} w_i x_i$$
Simple Weighted Estimates Example
Continue the previous example for State A
• Simple Weighted Estimate of employment
  – x_i = 1 for employed, 0 otherwise
• Can restrict the sum to the 3,000 employed
  – since x_i = 0 for the other responding persons

$$x = \sum_{i=1}^{m} w_i x_i = \sum_{i=1}^{4{,}000} w_i x_i = \sum_{i=1}^{3{,}000} 250\,(1) = 3{,}000 \times 250 = 750{,}000$$
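A minimal sketch of the simple weighted estimate for State A follows. It assumes 4,000 responding persons, all with base weight 250, of whom 3,000 are employed; the function name `weighted_total` and the way the lists are built are illustrative only.

```python
def weighted_total(weights, values):
    """Simple weighted estimate: sum of w_i * x_i over all sample persons."""
    if len(weights) != len(values):
        raise ValueError("weights and values must have the same length")
    return sum(w * x for w, x in zip(weights, values))


# Assumed State A sample: 4,000 responding persons, each with base weight 250.
m = 4_000
weights = [250] * m
# x_i = 1 for the 3,000 employed persons, 0 for the others.
employed = [1] * 3_000 + [0] * (m - 3_000)

total_employed = weighted_total(weights, employed)

# Restricting the sum to the employed gives the same answer, since x_i = 0 elsewhere.
restricted = weighted_total(weights[:3_000], employed[:3_000])
```

Both sums give 750,000, matching the State A employment estimate obtained with the base weight.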