Outlier detection and accommodation for business surveys utilizing multiple linear regression models in edit and imputation
Robert Philips
ICES-III, June 21st, 2007
Presentation Outline
E & I for the Monthly Wholesale Retail Trade Survey (MWRTS)
Outlier Model and Theory
Illustrative Example
Outlier Procedure for “large” imputation cells
Simulation results
Conclusion
E & I for the MWRTS
Statistical edits are run before imputation and, in part, identify which respondent data will be used to impute for non-respondents.
Statistical editing is done at the industrial grouping by geography level; if there are not enough units, the groups are collapsed over geography.
The Hidiroglou-Berthelot method (1986) is used in conjunction with monthly, yearly and administrative-data trend edits.
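The Hidiroglou-Berthelot edit can be sketched as follows. This is a minimal Python sketch of the method as it is commonly described (period-to-period ratios centred on their median, scaled by unit size, then compared with quartile-based bounds), not the MWRTS production code. The function name `hidiroglou_berthelot` and the tuning constants `U`, `A` and `C`, with their default values, are illustrative assumptions.

```python
import numpy as np


def hidiroglou_berthelot(x_prev, x_curr, U=0.5, A=0.05, C=7.0):
    """Flag units whose trend ratio x_curr / x_prev is unusual (HB edit).

    x_prev, x_curr: positive values for the same units in two periods.
    U, A, C: tuning constants (illustrative defaults, not MWRTS settings).
    Returns a boolean array, True where the unit fails the edit.
    """
    x_prev = np.asarray(x_prev, dtype=float)
    x_curr = np.asarray(x_curr, dtype=float)
    r = x_curr / x_prev                       # trend ratios
    r_med = np.median(r)
    # Centre the ratios so that decreases and increases are treated symmetrically.
    s = np.where(r < r_med, 1.0 - r_med / r, r / r_med - 1.0)
    # Scale by unit size so that large units carry more weight.
    e = s * np.maximum(x_prev, x_curr) ** U
    e_q1, e_med, e_q3 = np.percentile(e, [25, 50, 75])
    # A guards against a degenerate (near-zero) interquartile spread.
    d_q1 = max(e_med - e_q1, abs(A * e_med))
    d_q3 = max(e_q3 - e_med, abs(A * e_med))
    return (e < e_med - C * d_q1) | (e > e_med + C * d_q3)
```

For example, with nine units changing by a few percent and one unit tripling, only the last unit is flagged. Flagged units would then be excluded from the donor pool used for imputation.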
E & I for the MWRTS (cont.)
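The imputation models are weighted linear regressions of the form
$$\mathbf{y} = X\boldsymbol{\beta} + \sigma W^{1/2}\boldsymbol{\varepsilon}, \qquad W = \operatorname{diag}(w_1,\dots,w_n).$$
(I) $y_{t,i}$, the value for unit $i$ in month $t$, is regressed on the unit's own lagged values, such as $y_{t-1,i}$ and $y_{t-12,i}$, with weight $w_i$.
In general, for most E & I classes in the MWRTS, the model is of the following form:
(II) $y_{t,i}$ is regressed on a lagged value $y_{t-m,i}$, with weight $w_i$.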
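Fitting the general weighted model amounts to generalized least squares with $W^{-1}$ as the weight matrix. A minimal sketch, assuming $W$ is diagonal with known positive weights $w_i$; the function name `wls_fit` is hypothetical.

```python
import numpy as np


def wls_fit(X, y, w):
    """Weighted least squares for y = X b + sigma * W^{1/2} e, with W = diag(w).

    Returns (beta_hat, sse), where sse = (y - X b)' W^{-1} (y - X b).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    winv = 1.0 / np.asarray(w, dtype=float)   # diagonal of W^{-1}
    xtwx = X.T @ (winv[:, None] * X)          # X' W^{-1} X
    xtwy = X.T @ (winv * y)                   # X' W^{-1} y
    beta = np.linalg.solve(xtwx, xtwy)
    resid = y - X @ beta
    sse = float(resid @ (winv * resid))
    return beta, sse
```

A nonrespondent with covariate row `x_new` would then be imputed as `x_new @ beta`.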
E & I for the MWRTS (cont.)
The imputation classes are at a finer level of detail than the statistical edit groupings.
The principal imputation method is the bivariate model (60%), and respondents that have passed the univariate statistical edits may still behave as outliers during the imputation process.
There is therefore a clear need for an outlier detection routine in the imputation module.
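Outlier Model and Theory
Let $M(\mathbf{i})$, with $\mathbf{i} = (i_1,\dots,i_k)$, represent the model
$$y_{i_m} = x_{i_m}^t\boldsymbol{\beta}_{i_m} + \sigma_{i_m} w_{i_m}^{1/2}\varepsilon_{i_m}, \qquad m = 1,\dots,k,$$
while the majority of observations satisfy
$$\mathbf{y}_{(\mathbf{i})} = X_{(\mathbf{i})}\boldsymbol{\beta} + \sigma W_{(\mathbf{i})}^{1/2}\boldsymbol{\varepsilon}_{(\mathbf{i})},$$
where $\mathbf{y}_{(\mathbf{i})}$ is the vector of the $n-k$ values of $y$ omitting $y_{i_1},\dots,y_{i_k}$, and similarly for $X_{(\mathbf{i})}$, $W_{(\mathbf{i})}$, etc.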
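Outlier Model and Theory cont…
With normal errors,
$$y_{i_m} \mid \boldsymbol{\beta}_{i_m}, \sigma^2_{i_m} \sim N\!\left(x_{i_m}^t\boldsymbol{\beta}_{i_m},\ \sigma^2_{i_m} w_{i_m}\right), \qquad m = 1,\dots,k.$$
Given $k$, each model is equally likely:
$$P\big(M(\mathbf{i}) \mid k\big) = \binom{n}{k}^{-1}, \qquad \mathbf{i} \in S_{k,n},$$
where $S_{k,n}$ is the set of all $\binom{n}{k}$ index sets $\mathbf{i} = (i_1 < \dots < i_k)$ drawn from $\{1,\dots,n\}$.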
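The priors for the parameters are:
$p(\boldsymbol{\beta}, \sigma^2) \propto 1/\sigma^2$, with $\boldsymbol{\beta} \in \mathbb{R}^p$;
the outlier parameters $(\boldsymbol{\beta}_i, \sigma_i^2)$, $i = 1,\dots,n$, are iid;
$k$ has a Poisson prior.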
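Outlier Model and Theory cont…
Let
$$q(\mathbf{i}) = |W_{(\mathbf{i})}|^{-1/2}\,\big|X_{(\mathbf{i})}^t W_{(\mathbf{i})}^{-1} X_{(\mathbf{i})}\big|^{-1/2}\, S_{(\mathbf{i})}^{-(n-k-p)/2}, \qquad Q(k) = \sum_{\mathbf{i} \in S_{k,n}} q(\mathbf{i}),$$
where $S_{(\mathbf{i})} = \mathrm{SSE}(\mathbf{i})$ is the sum of squared errors after omitting $y_{i_1},\dots,y_{i_k}$ from the regression, and $p$ is the number of regression coefficients.
For $k = 0, 1, \dots, k_{\max}$,
$$p_k^* = \Pr(k \text{ outliers} \mid \mathbf{y}, X, W) = C\,\frac{(0.1)^k}{k!}\binom{n}{k}^{-1} Q(k),$$
where $C$ is the normalizing constant.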
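For small cells, $q(\mathbf{i})$, $Q(k)$ and $p_k^*$ can be computed by brute-force enumeration of $S_{k,n}$. The following is a minimal Python sketch, not the author's code. It assumes that $p_k^*$ is proportional to $\lambda^k/k! \cdot \binom{n}{k}^{-1} Q(k)$ with $\lambda = 0.1$ (read from the $(0.1)^k$ factor), works on the log scale to avoid underflow, and uses the hypothetical function names `log_q` and `posterior_k`.

```python
import math
from itertools import combinations

import numpy as np


def log_q(X, y, w, omit):
    """log q(i) for the model M(i) that treats the rows listed in `omit` as outliers."""
    keep = np.setdiff1d(np.arange(len(y)), np.asarray(omit, dtype=int))
    Xk, yk, wk = X[keep], y[keep], w[keep]
    winv = 1.0 / wk                                   # diagonal of W_(i)^{-1}
    xtwx = Xk.T @ (winv[:, None] * Xk)                # X_(i)' W_(i)^{-1} X_(i)
    beta = np.linalg.solve(xtwx, Xk.T @ (winv * yk))  # beta_hat_(i)
    resid = yk - Xk @ beta
    sse = float(resid @ (winv * resid))               # S_(i)
    n, p, k = len(y), X.shape[1], len(omit)
    return (-0.5 * float(np.sum(np.log(wk)))          # |W_(i)|^{-1/2}
            - 0.5 * np.linalg.slogdet(xtwx)[1]        # |X' W^{-1} X|^{-1/2}
            - 0.5 * (n - k - p) * math.log(sse))      # S_(i)^{-(n-k-p)/2}


def posterior_k(X, y, w, k_max, lam=0.1):
    """p*_k for k = 0..k_max by enumerating every i in S_{k,n}.

    Assumes p*_k is proportional to lam**k / k! * Q(k) / C(n, k).
    """
    n = len(y)
    log_terms = []
    for k in range(k_max + 1):
        log_qs = [log_q(X, y, w, c) for c in combinations(range(n), k)]
        log_Q = np.logaddexp.reduce(log_qs)           # log Q(k), computed stably
        log_terms.append(k * math.log(lam) - math.lgamma(k + 1)
                         - math.log(math.comb(n, k)) + log_Q)
    log_terms = np.array(log_terms)
    probs = np.exp(log_terms - log_terms.max())
    return probs / probs.sum()                        # dividing by the sum plays the role of C
```

Because $S_{(\mathbf{i})}$ is raised to a power that depends on $k$, the relative weights of different $k$ in this sketch change with the units of $y$, so `lam` is best treated as a tuning constant.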
Outlier Model and Theory cont…
The posterior estimate of the number of outliers is
$$k^* = E(k \mid \mathbf{y}, X, W) = \sum_{k=0}^{k_{\max}} k\, p_k^*.$$
The set of indices $(i_1,\dots,i_{k^*})$ where $\max_{\mathbf{i}} q(\mathbf{i})$ is attained determines the most likely outliers, and
$$\Pr\big(y_{i_1},\dots,y_{i_{k^*}} \text{ are the outliers} \mid \mathbf{y}, X, W, k^*\big) = \frac{q(\mathbf{i})}{Q(k^*)}.$$
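Given $p_k^*$ and the values $q(\mathbf{i})$ from an enumeration, these posterior summaries reduce to a weighted sum and an argmax. A minimal sketch; the function name `summarize_outliers` is hypothetical, and rounding $k^*$ to the nearest integer to pick the size of the outlier set is an assumption of this sketch.

```python
import numpy as np


def summarize_outliers(p_star, q_by_k):
    """Posterior mean k* and the most likely outlier set with its probability.

    p_star: sequence with p_star[k] = Prob(k outliers | y, X, W).
    q_by_k: dict mapping k -> {index tuple i: q(i)} over S_{k,n}.
    """
    ks = np.arange(len(p_star))
    k_star = float(np.dot(ks, p_star))   # E(k | y, X, W)
    k_use = int(round(k_star))           # assumption: nearest integer set size
    q = q_by_k[k_use]
    Q = sum(q.values())                  # Q(k)
    best = max(q, key=q.get)             # index set attaining max q(i)
    return k_star, best, q[best] / Q     # q(i) / Q(k)
```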
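Outlier Model and Theory cont…
Given $k$ and $M(\mathbf{i})$,
$$\boldsymbol{\beta} \mid \mathbf{y}, X, W, k, M(\mathbf{i}) \sim t_{n-k-p}$$
is a multivariate $t$ variate with mean
$$\hat{\boldsymbol{\beta}}_{(\mathbf{i})} = \big(X_{(\mathbf{i})}^t W_{(\mathbf{i})}^{-1} X_{(\mathbf{i})}\big)^{-1} X_{(\mathbf{i})}^t W_{(\mathbf{i})}^{-1}\mathbf{y}_{(\mathbf{i})}$$
and variance
$$\frac{S_{(\mathbf{i})}}{n-k-p-2}\big(X_{(\mathbf{i})}^t W_{(\mathbf{i})}^{-1} X_{(\mathbf{i})}\big)^{-1}.$$
The posterior estimate of $\boldsymbol{\beta}$ is
$$\hat{\boldsymbol{\beta}}^* = \sum_{k=0}^{k_{\max}} p_k^* \sum_{\mathbf{i}\in S_{k,n}} \frac{q(\mathbf{i})}{Q(k)}\,\hat{\boldsymbol{\beta}}_{(\mathbf{i})}.$$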
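The posterior estimate $\hat{\boldsymbol{\beta}}^*$ is a model average of the per-model fits $\hat{\boldsymbol{\beta}}_{(\mathbf{i})}$. A minimal sketch, assuming the pairs $(q(\mathbf{i}), \hat{\boldsymbol{\beta}}_{(\mathbf{i})})$ have already been computed for every $\mathbf{i} \in S_{k,n}$; the function name `posterior_beta` is hypothetical.

```python
import numpy as np


def posterior_beta(p_star, fits_by_k):
    """beta* = sum_k p*_k * sum_{i in S_{k,n}} (q(i) / Q(k)) * beta_hat_(i).

    fits_by_k: dict mapping k -> list of (q(i), beta_hat_(i)) pairs.
    """
    beta_star = 0.0
    for k, fits in fits_by_k.items():
        q = np.array([qi for qi, _ in fits], dtype=float)
        betas = np.array([b for _, b in fits], dtype=float)
        weights = q / q.sum()                          # q(i) / Q(k)
        beta_star = beta_star + p_star[k] * (weights @ betas)
    return beta_star
```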
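Outlier Model and Theory cont…
The posterior variance of $\boldsymbol{\beta}$ is
$$V(\boldsymbol{\beta} \mid \mathbf{y}, X, W) = \sum_{k=0}^{k_{\max}} p_k^* D_k - \hat{\boldsymbol{\beta}}^*\hat{\boldsymbol{\beta}}^{*t},$$
where
$$D_k = \sum_{\mathbf{i}\in S_{k,n}} \frac{q(\mathbf{i})}{Q(k)}\left[\frac{S_{(\mathbf{i})}}{n-k-p-2}\big(X_{(\mathbf{i})}^t W_{(\mathbf{i})}^{-1} X_{(\mathbf{i})}\big)^{-1} + \hat{\boldsymbol{\beta}}_{(\mathbf{i})}\hat{\boldsymbol{\beta}}_{(\mathbf{i})}^t\right].$$
Similarly, the posterior estimate of $\sigma^2$ is
$$\hat\sigma^{2*} = \sum_{k=0}^{k_{\max}} p_k^* \sum_{\mathbf{i}\in S_{k,n}} \frac{q(\mathbf{i})}{Q(k)}\,\frac{S_{(\mathbf{i})}}{n-k-p-2}.$$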
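The variance formula is the law of total variance applied over models: the average within-model covariance plus the spread of the model means. A minimal sketch, assuming each model's $q(\mathbf{i})$, $\hat{\boldsymbol{\beta}}_{(\mathbf{i})}$, $S_{(\mathbf{i})}$ and $(X_{(\mathbf{i})}^t W_{(\mathbf{i})}^{-1} X_{(\mathbf{i})})^{-1}$ are already available; the function name `posterior_moments` is hypothetical.

```python
import numpy as np


def posterior_moments(p_star, fits_by_k, n, p):
    """Return (beta*, V(beta | y, X, W), sigma^2*).

    fits_by_k: dict mapping k -> list of (q(i), beta_hat_(i), S_(i), (X'W^{-1}X)^{-1}).
    Requires n - k - p > 2 for every k used.
    """
    beta_star = np.zeros(p)
    second = np.zeros((p, p))                 # accumulates sum_k p*_k D_k
    sigma2_star = 0.0
    for k, fits in fits_by_k.items():
        Q = sum(f[0] for f in fits)           # Q(k)
        dof = n - k - p - 2
        for q, b, sse, xtwx_inv in fits:
            b = np.asarray(b, dtype=float)
            wt = p_star[k] * q / Q            # p*_k * q(i) / Q(k)
            beta_star += wt * b
            second += wt * (sse / dof * np.asarray(xtwx_inv, dtype=float) + np.outer(b, b))
            sigma2_star += wt * sse / dof
    return beta_star, second - np.outer(beta_star, beta_star), sigma2_star
```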
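An illustrative example
$$y_i = 30 + 0.95\,x_i + 25\,\varepsilon_i, \qquad i = 1,\dots,25, \qquad x_i \sim \text{Exp}(200),$$
and the errors were from a contaminated normal:
$$\varepsilon_i \sim 0.95\,N(0,1) + 0.05\,N(0,10^2).$$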
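Data with the same design can be generated as below. This sketch assumes that Exp(200) denotes an exponential distribution with mean 200 and reads $N(0, 10^2)$ as a normal with standard deviation 10; it will not reproduce the particular sample plotted in the example. The function name `simulate_example` is hypothetical.

```python
import numpy as np


def simulate_example(n=25, seed=None):
    """Draw (x, y, contaminated) from y_i = 30 + 0.95 x_i + 25 e_i.

    e_i comes from 0.95 N(0, 1) + 0.05 N(0, 10^2); `contaminated` marks
    the draws taken from the wide component.
    """
    rng = np.random.default_rng(seed)
    x = rng.exponential(scale=200.0, size=n)          # assumed: mean 200
    contaminated = rng.random(n) < 0.05
    e = np.where(contaminated, rng.normal(0.0, 10.0, n), rng.normal(0.0, 1.0, n))
    y = 30.0 + 0.95 * x + 25.0 * e
    return x, y, contaminated
```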
[Figure: Plot of simulated data, Y against X, with observations 10, 19 and 25 labelled]
Example cont…
k   p*_k
0   0.0000
1   0.3168
2   0.3652
3   0.2143
4   0.1037
The posterior estimate of the number of outliers is 2.105, with estimates of 34.755 for the intercept and 0.967 for the slope.
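The value 2.105 is the posterior mean $k^* = \sum_k k\,p_k^*$ of the table above; the arithmetic can be checked directly.

```python
k_values = [0, 1, 2, 3, 4]
p_star = [0.0000, 0.3168, 0.3652, 0.2143, 0.1037]   # posterior probabilities from the table

total = sum(p_star)                                  # should be 1 up to rounding
k_star = sum(k * p for k, p in zip(k_values, p_star))
print(f"sum = {total:.4f}, k* = {k_star:.4f}")
```

It prints `sum = 1.0000, k* = 2.1049`, which rounds to the reported 2.105.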
Example cont…
The high-breakdown MM-estimator (Yohai 1987) indicates that observation 25 is an outlier with high leverage and that observation 10 is merely a high-leverage point.
Its estimates are intercept = 35.465 and slope = 0.9629.
Example cont…
i_1  i_2  p_i    Intercept  Slope
19   25   0.713  33.492     0.965
16   25   0.036  37.393     0.973
1    25   0.029  40.039     0.958
15   25   0.021  37.090     0.955
6    25   0.019  31.667     0.983
22   25   0.015  37.280     0.970
4    25   0.014  37.524     0.968
5    25   0.013  38.032     0.965
13   25   0.013  32.614     0.980
10   25   0.012  33.747     0.987
If there are only two possible outliers, then (19, 25) is 60 times more likely than (10, 25)!
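The factor of 60 follows from the conditional probabilities $p_i = q(\mathbf{i})/Q(2)$ in the table, since the common denominator $Q(2)$ cancels in the ratio.

```python
# Conditional probabilities p_i of two candidate pairs (k = 2), from the table.
p_pair = {(19, 25): 0.713, (10, 25): 0.012}
ratio = p_pair[(19, 25)] / p_pair[(10, 25)]
print(f"(19, 25) is {ratio:.1f} times as likely as (10, 25)")
```

It prints a ratio of 59.4, which the slide rounds to 60.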
Outlier Model and Theory cont…
Strengths: the method works well in detecting outliers and estimating the relevant parameters robustly, and all of the data are used.
Drawback: the method becomes impractical as the imputation class size n increases, since the number of possible subsets of size k, $\binom{n}{k}$, becomes astronomically large.
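The growth in the number of candidate models is easy to quantify. This sketch counts the index sets in $S_{k,n}$ for $k = 0,\dots,k_{\max}$, taking $k_{\max} = 4$ as in the illustrative example; the function name `models_to_enumerate` is hypothetical.

```python
import math


def models_to_enumerate(n, k_max):
    """Number of candidate outlier sets i in S_{k,n}, summed over k = 0..k_max."""
    return sum(math.comb(n, k) for k in range(k_max + 1))


for n in (25, 50, 200, 1000):
    print(n, models_to_enumerate(n, 4))
```

For n = 25 this is 15,276 candidate sets, each needing its own regression fit, while for n = 1000 it is roughly 4.2 × 10^10.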
Outlier Procedure for “large” imputation cells
For large $n$, on each pass test whether $p_{i_m} > p_0^*$ (with $k \in \{0, 1\}$), using $p_{i_m}$ and the leverage $h_{i_m i_m}$ to identify the outliers.
ident  h_ii    p_i     P_0     b0       b1      outlier
25     0.1849  1.0000  0.0000  35.6930  0.9708  Y
19     0.0443  0.6768  0.6676  35.2458  0.9695  Y
16     0.0443  0.1338  0.8207  33.5584  0.9651  N
The standard errors for $b_0$ and $b_1$ are 8.5023 and 0.0343 respectively, and the estimate $\hat\sigma^*$ is 25.0655.
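A sequential version of the procedure can be sketched as below. This is a minimal sketch, not the author's implementation. It assumes that each pass compares $k = 0$ with $k = 1$, using the prior weight $\lambda^k/k! \cdot \binom{n}{k}^{-1}$ with $\lambda = 0.1$, and that the most likely single outlier $j$ is flagged when $q(j)/Q(1) > p_0^*$. The function names `log_q` and `sequential_outliers` are hypothetical.

```python
import math

import numpy as np


def log_q(X, y, w, omit):
    """log q(i) when the rows in `omit` are treated as outliers."""
    keep = np.setdiff1d(np.arange(len(y)), np.asarray(omit, dtype=int))
    Xk, yk, winv = X[keep], y[keep], 1.0 / w[keep]
    xtwx = Xk.T @ (winv[:, None] * Xk)
    beta = np.linalg.solve(xtwx, Xk.T @ (winv * yk))
    resid = yk - Xk @ beta
    sse = float(resid @ (winv * resid))
    n, p, k = len(y), X.shape[1], len(omit)
    return (0.5 * float(np.sum(np.log(winv))) - 0.5 * np.linalg.slogdet(xtwx)[1]
            - 0.5 * (n - k - p) * math.log(sse))


def sequential_outliers(X, y, w, lam=0.1, max_passes=10):
    """Flag outliers one at a time, comparing k = 0 with k = 1 on each pass.

    Assumed rule: flag the most likely single outlier j when its conditional
    probability q(j) / Q(1) exceeds p*_0; otherwise stop.
    """
    idx = np.arange(len(y))                    # rows still in the cell
    flagged = []
    for _ in range(max_passes):
        Xc, yc, wc = X[idx], y[idx], w[idx]
        n = len(yc)
        lq0 = log_q(Xc, yc, wc, [])
        lq1 = np.array([log_q(Xc, yc, wc, [j]) for j in range(n)])
        log_Q1 = np.logaddexp.reduce(lq1)
        # Unnormalised log weights for k = 0 and k = 1 (lam / 1! / C(n, 1) = lam / n).
        log_w0, log_w1 = lq0, math.log(lam) - math.log(n) + log_Q1
        p0 = math.exp(log_w0 - np.logaddexp(log_w0, log_w1))   # p*_0
        j = int(np.argmax(lq1))
        p_j = math.exp(lq1[j] - log_Q1)                         # q(j) / Q(1)
        if p_j <= p0:
            break
        flagged.append(int(idx[j]))
        idx = np.delete(idx, j)                # drop the flagged unit and refit
    return flagged
```

This rule is consistent with the table: observation 19 is flagged on the second pass because its p_i (0.6768) just exceeds P_0 (0.6676), while observation 16 (0.1338 against 0.8207) is not.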
Simulation results
Data from the MRTS were selected for 3 imputation classes in which the number of respondents for the bivariate imputation model exceeds 50.
In each simulation run, (1 - p)% of the respondents in each cell were selected to impute for the remaining units.
The method presented here was compared with the MM-estimator using the relative difference of the average predictions.
Simulation results for 200 runs
p% Method Cell 1 Cell 2 Cell 3
5 MRTS -0.002 -0.020 0.013
5 MM -0.008 -0.028 0.001
10 MRTS -0.000 -0.007 0.015
10 MM -0.008 -0.015 0.003
15 MRTS -0.000 -0.011 0.014
15 MM -0.008 -0.016 0.002
Conclusions
The procedure for outlier detection works well and produces fairly robust estimates.
It would also allow more covariates to be included in the E&I process.
Although the normality assumption is what leads to the closed-form solution for the estimator, the method is still applicable when there are modest departures from normality.
For more information, please contact:
www.statcan.ca
Robert Philips, e-mail: [email protected], telephone: (613) 951-1493
Thank you!