Variance Estimation When Donor Imputation is Used to Fill in Missing Values
Jean-François Beaumont and Cynthia Bocci, Statistics Canada
Third International Conference on Establishment Surveys
Montréal, June 18-21, 2007

Overview
- Context
- Donor imputation
- Variance estimation
- Simulation study
- Conclusion

Context
- Population parameter to be estimated: the domain total
  $$t_{dy} = \sum_{k \in U} d_k y_k$$
- Estimator in the case of full response:
  $$\hat{t}_{dy} = \sum_{k \in s} w_k d_k y_k$$
  - Calibration estimator
  - Horvitz-Thompson estimator

Donor Imputation
- Imputed estimator:
  $$\hat{t}_{dy}^{I} = \sum_{k \in s} w_k d_k \tilde{y}_k, \quad \text{with } \tilde{y}_k = \begin{cases} y_k, & k \in s_r \\ y_k^{*}, & \text{otherwise} \end{cases}$$
- With donor imputation, the imputed value is
  $$y_k^{*} = y_{l(k)}, \quad \text{for a donor } l(k) \in s_r$$
- A variety of methods can be considered in order to find a donor $l(k)$ for the recipient $k$
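
As a minimal sketch of the imputed estimator, the following Python function sums $w_k d_k y_k$ over the sample, borrowing the donor's value when a unit did not respond. The function name `imputed_domain_total`, the use of `None` for missing values and the `donor` mapping are illustrative assumptions, not part of the presentation; `d` is assumed to be a 0/1 domain indicator and the numbers are made up.

```python
from typing import Dict, Optional, Sequence


def imputed_domain_total(
    w: Sequence[float],
    d: Sequence[int],
    y: Sequence[Optional[float]],
    donor: Dict[int, int],
) -> float:
    """Sum of w_k * d_k * (y_k if observed, else the donor's value y_{l(k)}).

    Hypothetical layout: y[k] is None for a nonrespondent k, and donor[k]
    gives the index l(k) of the responding unit whose value is borrowed.
    """
    total = 0.0
    for k, yk in enumerate(y):
        if yk is None:
            borrowed = y[donor[k]]
            if borrowed is None:
                raise ValueError(f"donor of unit {k} is itself a nonrespondent")
            yk = borrowed
        total += w[k] * d[k] * yk
    return total


# Toy sample (made-up numbers): units 1 and 3 did not respond.
w = [5.0, 5.0, 5.0, 5.0]
d = [1, 0, 1, 1]
y = [10.0, None, 30.0, None]
donor = {1: 0, 3: 2}
print(imputed_domain_total(w, d, y, donor))  # 5 * (10 + 0 + 30 + 30) = 350.0
```

Unit 1 lies outside the domain (`d[1] == 0`), so its imputed value does not contribute to the total.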

Donor Imputation
- Two simple examples:
  - Random hot-deck imputation within classes
  - Nearest-neighbour imputation
- Practical considerations that add some complexity to the imputation process:
  - Post-imputation edit rules
  - Hierarchical imputation classes
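
The two simple donor-selection rules can be sketched as follows. This is only an illustration under simplifying assumptions: a single matching variable `x`, absolute distance, ties broken by lowest index, and no edit rules or hierarchical classes. The function names and data layout are hypothetical.

```python
import random
from typing import Dict, Hashable, List, Sequence


def nearest_neighbour_donors(
    x: Sequence[float], responded: Sequence[bool]
) -> Dict[int, int]:
    """Map each nonrespondent k to the respondent whose x is closest to x[k].

    Ties are broken by taking the lowest index (an arbitrary choice).
    """
    respondents = [i for i, r in enumerate(responded) if r]
    if not respondents:
        raise ValueError("at least one respondent is needed to act as a donor")
    return {
        k: min(respondents, key=lambda i: (abs(x[i] - x[k]), i))
        for k, r in enumerate(responded)
        if not r
    }


def random_hot_deck_donors(
    classes: Sequence[Hashable], responded: Sequence[bool], rng: random.Random
) -> Dict[int, int]:
    """Map each nonrespondent to a respondent drawn at random from its class."""
    pools: Dict[Hashable, List[int]] = {}
    for i, (c, r) in enumerate(zip(classes, responded)):
        if r:
            pools.setdefault(c, []).append(i)
    donors: Dict[int, int] = {}
    for k, (c, r) in enumerate(zip(classes, responded)):
        if not r:
            if c not in pools:
                raise ValueError(f"imputation class {c!r} has no respondent")
            donors[k] = rng.choice(pools[c])
    return donors


# Made-up example: units 1 and 3 are nonrespondents.
responded = [True, False, True, False]
print(nearest_neighbour_donors([1.0, 2.0, 4.0, 7.0], responded))  # {1: 0, 3: 2}
print(random_hot_deck_donors(["a", "a", "b", "b"], responded, random.Random(0)))
```

Both functions return a mapping from recipient to donor, which is the $l(k)$ notation used for the imputed value.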

Imputation Model
- Most imputation methods can be justified by an imputation model:
  $$E_m(y_k \mid s, s_r) = \mu_k$$
  $$V_m(y_k \mid s, s_r) = \sigma_k^2$$
  $$\mathrm{Cov}_m(y_k, y_l \mid s, s_r) = 0, \quad k \neq l$$
- The donor imputed estimator is assumed to be approximately unbiased under the model:
  $$E_m(\hat{t}_{dy}^{I} - \hat{t}_{dy} \mid s, s_r) = \sum_{k \in s_m} w_k d_k (\mu_{l(k)} - \mu_k) \approx 0$$
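
The form of the model bias can be traced in two lines. The sketch below assumes that the donor assignment $l(k)$ is fixed given $s$ and $s_r$ (as when donors are matched on an auxiliary variable only), writes $s_m$ for the set of nonrespondents, and uses the `align*` environment from `amsmath`.

```latex
\begin{align*}
\hat{t}_{dy}^{I} - \hat{t}_{dy}
  &= \sum_{k \in s_m} w_k d_k \,\bigl(y_{l(k)} - y_k\bigr), \\
E_m\!\left(\hat{t}_{dy}^{I} - \hat{t}_{dy} \mid s, s_r\right)
  &= \sum_{k \in s_m} w_k d_k \,\bigl(\mu_{l(k)} - \mu_k\bigr).
\end{align*}
```

The first line holds because the two estimators differ only on nonrespondents. The bias is therefore small whenever each donor has a model mean close to that of its recipient.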

Current Variance Estimation Methods
- Assuming negligible sampling fractions:
  - Chen and Shao (2000, JOS) for NN imputation
  - Resampling methods
- Our method is closely related to:
  - Rancourt, Särndal and Lee (1994, Proc. SRMS): assumes a ratio model holds
  - Brick, Kalton and Kim (2004, SM): condition on the selected donors

Imputation Model Approach
- Variance decomposition of Särndal (1992, SM):
  $$E_{mpq}(\hat{t}_{dy}^{I} - t_{dy})^2 = E_m V_p(\hat{t}_{dy}) + E_{mpq}(\hat{t}_{dy}^{I} - \hat{t}_{dy})^2 + V_{MIX} = V_{SAM} + V_{NR} + V_{MIX}$$
- For any donor imputation method, we have:
  $$\hat{t}_{dy}^{I} = \sum_{k \in s_r} W_{dk}\, y_k, \quad \text{where } W_{dk} = w_k d_k + \sum_{i \in s_m} w_i d_i \, I(l(i) = k)$$
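
The weights $W_{dk}$ simply add to each respondent's own $w_k d_k$ the $w_i d_i$ of every recipient it donates to. A minimal sketch, with a hypothetical function name and made-up data, that also checks the identity against a respondent-only sum:

```python
from typing import Dict, List, Sequence


def donor_adjusted_weights(
    w: Sequence[float],
    d: Sequence[int],
    responded: Sequence[bool],
    donor: Dict[int, int],
) -> List[float]:
    """Return W_dk for every sampled unit (0.0 for nonrespondents).

    donor maps each nonrespondent i to the index l(i) of its donor.
    """
    W = [w[k] * d[k] if responded[k] else 0.0 for k in range(len(w))]
    for recipient, donor_index in donor.items():
        if responded[recipient] or not responded[donor_index]:
            raise ValueError("donor must map a nonrespondent to a respondent")
        W[donor_index] += w[recipient] * d[recipient]
    return W


# Made-up example: units 1 and 3 are nonrespondents.
w = [5.0, 5.0, 5.0, 5.0]
d = [1, 0, 1, 1]
responded = [True, False, True, False]
donor = {1: 0, 3: 2}
y = [10.0, None, 30.0, None]

W = donor_adjusted_weights(w, d, responded, donor)
total = sum(W[k] * y[k] for k in range(len(y)) if responded[k])
print(W, total)  # [5.0, 0.0, 10.0, 0.0] 350.0
```

Writing the imputed estimator as a weighted sum over respondents only is what makes the model variances of the next steps easy to compute, since the respondent values are independent under the imputation model.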

Estimation of the nonresponse variance
- The estimation of the nonresponse variance is achieved by estimating
  $$E_m\{(\hat{t}_{dy}^{I} - \hat{t}_{dy})^2 \mid s, s_r\} \approx V_m(\hat{t}_{dy}^{I} - \hat{t}_{dy} \mid s, s_r)$$
- Noting that the nonresponse error is:
  $$\hat{t}_{dy}^{I} - \hat{t}_{dy} = \sum_{k \in s_r} (W_{dk} - w_k d_k)\, y_k - \sum_{k \in s_m} w_k d_k y_k$$
- Then, the nonresponse variance estimator is:
  $$\hat{V}_m(\hat{t}_{dy}^{I} - \hat{t}_{dy} \mid s, s_r) = \sum_{k \in s_r} (W_{dk} - w_k d_k)^2 \hat{\sigma}_k^2 + \sum_{k \in s_m} (w_k d_k)^2 \hat{\sigma}_k^2$$
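
A minimal sketch of this estimator, assuming the donor-adjusted weights and the per-unit variance estimates $\hat{\sigma}_k^2$ have already been computed and are passed in as lists. The function name and the numbers are illustrative.

```python
from typing import Sequence


def nonresponse_variance(
    w: Sequence[float],
    d: Sequence[int],
    W: Sequence[float],
    responded: Sequence[bool],
    sigma2_hat: Sequence[float],
) -> float:
    """Respondents contribute (W_dk - w_k d_k)^2 * sigma2_k,
    nonrespondents contribute (w_k d_k)^2 * sigma2_k."""
    total = 0.0
    for k in range(len(w)):
        if responded[k]:
            total += (W[k] - w[k] * d[k]) ** 2 * sigma2_hat[k]
        else:
            total += (w[k] * d[k]) ** 2 * sigma2_hat[k]
    return total


# Made-up example: unit 2 donates to unit 3; unit 1 is outside the domain.
w = [5.0, 5.0, 5.0, 5.0]
d = [1, 0, 1, 1]
W = [5.0, 0.0, 10.0, 0.0]
responded = [True, False, True, False]
sigma2_hat = [4.0, 4.0, 4.0, 4.0]
print(nonresponse_variance(w, d, W, responded, sigma2_hat))  # 0 + 0 + 100 + 100 = 200.0
```

A respondent that is never used as a donor has $W_{dk} = w_k d_k$ and contributes nothing, as unit 0 does here.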

Estimation of the mixed component
- Similarly, the estimation of the mixed component is achieved by estimating
  $$2\,\mathrm{Cov}_m(\hat{t}_{dy}^{I} - \hat{t}_{dy},\ \hat{t}_{dy} - t_{dy} \mid s, s_r)$$
- The mixed component estimator is:
  $$\hat{V}_{MIX} = 2 \sum_{k \in s_r} (w_k - 1)(W_{dk} - w_k d_k)\, d_k \hat{\sigma}_k^2 - 2 \sum_{k \in s_m} w_k (w_k - 1)\, d_k \hat{\sigma}_k^2$$
- This component can be either positive or negative and may not always be negligible
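
The mixed component can be sketched in the same style. The code assumes `d` is a 0/1 domain indicator (so that $d_k^2 = d_k$) and that the donor-adjusted weights `W` and the variance estimates `sigma2_hat` are given. The function name and the data are hypothetical.

```python
from typing import Sequence


def mixed_component(
    w: Sequence[float],
    d: Sequence[int],
    W: Sequence[float],
    responded: Sequence[bool],
    sigma2_hat: Sequence[float],
) -> float:
    """Twice the estimated model covariance between the nonresponse error
    and the sampling error; respondents add, nonrespondents subtract."""
    total = 0.0
    for k in range(len(w)):
        if responded[k]:
            total += (w[k] - 1.0) * (W[k] - w[k] * d[k]) * d[k] * sigma2_hat[k]
        else:
            total -= w[k] * (w[k] - 1.0) * d[k] * sigma2_hat[k]
    return 2.0 * total


# Made-up example with unequal variance estimates.
w = [5.0, 5.0, 5.0, 5.0]
d = [1, 0, 1, 1]
W = [5.0, 0.0, 10.0, 0.0]
responded = [True, False, True, False]
sigma2_hat = [4.0, 1.0, 9.0, 16.0]
print(mixed_component(w, d, W, responded, sigma2_hat))  # 2 * (180 - 320) = -280.0
```

The toy result is negative because the recipient has a larger variance estimate than its donor; with equal weights, all units in the domain and equal variances the two sums cancel.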

Estimation of the sampling variance
- Let $v(\mathbf{y}) = \hat{V}_p(\hat{\theta})$ be the full response variance estimator
- The strategy consists of:
  - Estimating $E_m\{v(\mathbf{y}) \mid s, s_r, \mathbf{Y}_r\}$
  - Replacing the unknown $\mu_k$ and $\sigma_k^2$ by their estimates
- This leads to the sampling variance estimator:
  $$v(\hat{\mathbf{y}}) + \sum_{k \in s_m} (1 - \pi_k)\, w_k^2 d_k \hat{\sigma}_k^2$$
  where $\hat{y}_k = y_k$ for $k \in s_r$, $\hat{y}_k = \hat{\mu}_k$ for $k \in s_m$, and $\pi_k$ is the inclusion probability of unit $k$
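
The model expectation of a full response variance estimator can be worked out explicitly when that estimator is a quadratic form in the $y$-values. The sketch below assumes such a form with generic coefficients $\Omega_{kl}$ (a notation introduced here for illustration), independent $y_k$ under the model with mean $\mu_k$ and variance $\sigma_k^2$, and $s_m$ for the set of nonrespondents.

```latex
\begin{align*}
v(\mathbf{y}) &= \sum_{k \in s}\sum_{l \in s} \Omega_{kl}\, y_k y_l, \\
E_m\{v(\mathbf{y}) \mid s, s_r, \mathbf{Y}_r\}
  &= \sum_{k \in s}\sum_{l \in s} \Omega_{kl}\, \bar{y}_k \bar{y}_l
     + \sum_{k \in s_m} \Omega_{kk}\, \sigma_k^2,
\qquad
\bar{y}_k =
\begin{cases}
  y_k,   & k \in s_r, \\
  \mu_k, & k \in s_m.
\end{cases}
\end{align*}
```

The extra sum appears because $E_m(y_k^2) = \mu_k^2 + \sigma_k^2$ for a nonrespondent, while all cross products factor into products of means. For the Horvitz-Thompson variance estimator of a domain total the diagonal coefficient is $\Omega_{kk} = (1 - \pi_k)\, w_k^2 d_k^2$ with $w_k = 1/\pi_k$.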

Estimation of the sampling variance
- This strategy is essentially equivalent to:
  - Randomly imputing the missing values using the imputation model
  - Computing the full response sampling variance estimator by treating these imputed values as true values
  - Repeating this process a large number of times and taking the average of the sampling variance estimates
- Similar to the multiple imputation sampling variance estimator
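
The equivalence can be checked numerically on a toy case. The sketch below makes several assumptions that are not in the slides: a Poisson sampling design (so the full response variance estimator has only diagonal terms), normal draws from the imputation model, and given estimates `mu_hat` and `sigma2_hat`. All names and numbers are illustrative.

```python
import random
from typing import List, Optional, Sequence


def poisson_variance(w: Sequence[float], d: Sequence[int], y: Sequence[float]) -> float:
    """Full response variance estimator under Poisson sampling (assumed design):
    sum of (1 - pi_k) * w_k^2 * d_k * y_k^2, with pi_k = 1 / w_k and d_k in {0, 1}."""
    return sum((1.0 - 1.0 / wk) * wk**2 * dk * yk**2 for wk, dk, yk in zip(w, d, y))


def closed_form_sampling_variance(
    w: Sequence[float], d: Sequence[int], y: Sequence[Optional[float]],
    mu_hat: Sequence[float], sigma2_hat: Sequence[float],
) -> float:
    """v() evaluated with mu_hat in place of missing values, plus the
    diagonal correction summed over nonrespondents."""
    filled = [mu_hat[k] if yk is None else yk for k, yk in enumerate(y)]
    correction = sum(
        (1.0 - 1.0 / w[k]) * w[k] ** 2 * d[k] * sigma2_hat[k]
        for k, yk in enumerate(y) if yk is None
    )
    return poisson_variance(w, d, filled) + correction


def monte_carlo_sampling_variance(
    w: Sequence[float], d: Sequence[int], y: Sequence[Optional[float]],
    mu_hat: Sequence[float], sigma2_hat: Sequence[float],
    reps: int, rng: random.Random,
) -> float:
    """Average of v() over repeated random imputations from the model."""
    total = 0.0
    for _ in range(reps):
        drawn: List[float] = [
            rng.gauss(mu_hat[k], sigma2_hat[k] ** 0.5) if yk is None else yk
            for k, yk in enumerate(y)
        ]
        total += poisson_variance(w, d, drawn)
    return total / reps


# Made-up sample: units 1 and 3 are nonrespondents.
w = [5.0, 5.0, 5.0, 5.0]
d = [1, 1, 1, 1]
y = [10.0, None, 30.0, None]
mu_hat = [10.0, 20.0, 30.0, 35.0]
sigma2_hat = [4.0, 4.0, 9.0, 9.0]

closed = closed_form_sampling_variance(w, d, y, mu_hat, sigma2_hat)
simulated = monte_carlo_sampling_variance(w, d, y, mu_hat, sigma2_hat, 5000, random.Random(12345))
print(closed, simulated)
```

The two numbers should agree up to Monte Carlo error, which is why the closed form can replace the repeated random imputation.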

Simulation study
- Generated a population of size 1000
- Two y-variables:
  - LIN: linear relationship between y and x
  - NLIN: nonlinear relationship between y and x
- Two different sample sizes:
  - Small sampling fraction: n=50
  - Large sampling fraction: n=500
- Response probability depends on x, with an average of 0.5
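
A set-up of this kind could be generated along the following lines. The slides do not give the models, so everything specific here is an assumption chosen for illustration: a uniform `x`, the particular linear and quadratic forms, the noise level, the response probability `0.2 + 0.6 * x` (which averages 0.5 for uniform `x`), and simple random sampling without replacement.

```python
import random
from typing import List, Tuple

Unit = Tuple[float, float, float, float]  # (x, y_lin, y_nlin, response probability)


def generate_population(size: int, rng: random.Random) -> List[Unit]:
    """Build a hypothetical population; all model forms are assumptions."""
    units: List[Unit] = []
    for _ in range(size):
        x = rng.random()                                      # auxiliary variable in [0, 1)
        y_lin = 10.0 + 20.0 * x + rng.gauss(0.0, 2.0)         # linear in x
        y_nlin = 10.0 + 20.0 * x**2 + rng.gauss(0.0, 2.0)     # nonlinear in x
        p = 0.2 + 0.6 * x                                     # response probability
        units.append((x, y_lin, y_nlin, p))
    return units


def draw_sample_and_response(
    population: List[Unit], n: int, rng: random.Random
) -> Tuple[List[Unit], List[bool]]:
    """Simple random sample without replacement, then independent response draws."""
    sample = rng.sample(population, n)
    responded = [rng.random() < unit[3] for unit in sample]
    return sample, responded


rng = random.Random(2007)
population = generate_population(1000, rng)
small_sample, small_resp = draw_sample_and_response(population, 50, rng)
large_sample, large_resp = draw_sample_and_response(population, 500, rng)
print(len(small_sample), len(large_sample))  # 50 500
```

With n=500 out of 1000 the sampling fraction is 0.5, which is the case where methods that assume a negligible sampling fraction are expected to struggle.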

Simulation study
- Imputation: nearest-neighbour imputation using x as the matching variable
- Estimation of $\mu_k$ and $\sigma_k^2$:
  - LIN: linear model in perfect agreement with the LIN y-variable
  - NPAR: nonparametric estimation using the SAS procedure TPSPLINE
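
For the linear option, the model mean and variance could be estimated from the respondents by ordinary least squares. The sketch below assumes a simple regression of y on x with a constant residual variance, which may be simpler than the model actually used in the study; function names are hypothetical. The nonparametric option relies on SAS and is not reproduced here.

```python
from typing import List, Sequence, Tuple


def fit_linear_model(
    x_resp: Sequence[float], y_resp: Sequence[float]
) -> Tuple[float, float, float]:
    """Least-squares fit on respondents; returns (intercept, slope, sigma2),
    where sigma2 is the residual variance with n - 2 degrees of freedom."""
    n = len(x_resp)
    if n < 3 or n != len(y_resp):
        raise ValueError("need at least three respondents with matching x and y")
    x_bar = sum(x_resp) / n
    y_bar = sum(y_resp) / n
    sxx = sum((x - x_bar) ** 2 for x in x_resp)
    if sxx == 0.0:
        raise ValueError("x must vary among respondents")
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(x_resp, y_resp))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    rss = sum((y - intercept - slope * x) ** 2 for x, y in zip(x_resp, y_resp))
    return intercept, slope, rss / (n - 2)


def model_moments(
    x_all: Sequence[float], intercept: float, slope: float, sigma2: float
) -> Tuple[List[float], List[float]]:
    """Estimated mean and variance for every sampled unit, respondent or not."""
    mu_hat = [intercept + slope * x for x in x_all]
    return mu_hat, [sigma2] * len(x_all)


# Made-up respondent data, then predictions for the whole sample.
intercept, slope, sigma2 = fit_linear_model([0.0, 1.0, 2.0, 3.0], [1.2, 2.9, 5.1, 6.8])
mu_hat, sigma2_hat = model_moments([0.0, 0.5, 1.0, 2.0, 2.5, 3.0], intercept, slope, sigma2)
print(round(intercept, 3), round(slope, 3), round(sigma2, 4))
```

The resulting lists are exactly what the nonresponse, mixed and sampling variance components need as inputs.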

Simulation study
- Two objectives:
  - Compare the two ways of estimating $\mu_k$ and $\sigma_k^2$: LIN and NPAR
  - Compare three nonparametric methods:
    - NPAR
    - NPAR_Naïve: NPAR with the sampling variance estimated by the naïve sampling variance, $v(\cdot)$ computed on the imputed data (Brick, Kalton and Kim, 2004)
    - CS: method of Chen and Shao (2000)

Results: Large sampling fraction

Method    Relative bias (%)       RRMSE (%)
          y-LIN     y-NLIN        y-LIN     y-NLIN
LIN        -2.4      358.4         15.7      514.1
NPAR       -0.3      -18.8         21.5       54.6

Results: Small sampling fraction

Method        Relative bias (%)       RRMSE (%)
              y-LIN     y-NLIN        y-LIN     y-NLIN
NPAR           -4.9      -13.3         41.8      245.4
NPAR_Naïve     -5.9      -10.4         42.1      265.8
CS             -9.1       -9.4         52.8      257.8

Results: Large sampling fraction

Method        Relative bias (%)       RRMSE (%)
              y-LIN     y-NLIN        y-LIN     y-NLIN
NPAR           -0.3      -18.8         21.5       54.6
NPAR_Naïve     -0.3      -12.0         21.8       69.1
CS             33.9       59.6         53.7      118.7
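
The slides do not spell out how the two summary measures are computed. The sketch below uses the usual Monte Carlo definitions, which is an assumption: relative bias is the mean error of the variance estimates as a percentage of the true variance, and RRMSE is the root mean squared error as a percentage of the true variance. The function name is hypothetical.

```python
import math
from typing import Sequence, Tuple


def relative_bias_and_rrmse(
    estimates: Sequence[float], true_value: float
) -> Tuple[float, float]:
    """Return (relative bias in %, relative root mean squared error in %)."""
    if not estimates or true_value == 0.0:
        raise ValueError("need at least one estimate and a nonzero true value")
    reps = len(estimates)
    mean_estimate = sum(estimates) / reps
    mse = sum((v - true_value) ** 2 for v in estimates) / reps
    relative_bias = 100.0 * (mean_estimate - true_value) / true_value
    rrmse = 100.0 * math.sqrt(mse) / true_value
    return relative_bias, rrmse


# Two made-up variance estimates around a true variance of 100.
print(relative_bias_and_rrmse([90.0, 110.0], 100.0))  # (0.0, 10.0)
```

Under these definitions an estimator can be nearly unbiased and still have a large RRMSE if it is unstable, which is why the tables report both.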

Conclusion
- Nonparametric estimation of $\mu_k$ and $\sigma_k^2$ seems beneficial (robust) with nearest-neighbour imputation
- Our proposed method is valid even for large sampling fractions
- It seems to be slightly better to use our sampling variance estimator instead of the naïve sampling variance estimator

Conclusion
- Work done in the context of developing a variance estimation system (SEVANI)
- Methodology implemented in the next version, 2.0, of SEVANI
- Estimation of $\mu_k$ and $\sigma_k^2$:
  - Linear model
  - Nonparametric estimation

Thanks
For more information, please contact:
Jean-François Beaumont
Cynthia Bocci (Cynthia.Bocci@statcan.ca)