Outlier detection and accommodation for business surveys utilizing multiple linear regression models in edit and imputation

Robert Philips

ICES-III, June 21st, 2007


Presentation Outline

E & I for the Monthly Wholesale Retail Trade Survey (MWRTS)

Outlier Model and Theory

Illustrative Example

Outlier Procedure for “large” imputation cells

Simulation results

Conclusion

E & I for the MWRTS

Statistical edits are run prior to imputation and in part identify which of the respondent data will be used to impute non-respondents.

Statistical editing is done at the industrial grouping by geography level; if there are not enough units, the groups are collapsed over geography.

The Hidiroglou-Berthelot method (1986) is used in conjunction with monthly, yearly and administrative data trend edits.
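As a rough illustration of this kind of trend edit, the sketch below follows the Hidiroglou-Berthelot procedure as it is commonly described: period-to-period ratios are centred on their median, scaled by unit size, and compared with quartile-based bounds. The function name `hb_outliers`, the toy data, and the default values of `U`, `A` and `C` are assumptions made for the example; they are not MWRTS production settings.

```python
from statistics import median, quantiles

def hb_outliers(prev, curr, U=0.5, A=0.05, C=4.0):
    """Flag units whose period-to-period change is unusual (HB-style edit).

    prev, curr: positive values for the previous and current period.
    U, A, C: tuning constants (illustrative defaults, assumed here).
    """
    ratios = [c / p for p, c in zip(prev, curr)]
    r_med = median(ratios)
    # Symmetric transformation of the ratios around their median.
    s = [1 - r_med / r if r < r_med else r / r_med - 1 for r in ratios]
    # Size-adjusted effects: larger units get more weight.
    effects = [si * max(p, c) ** U for si, p, c in zip(s, prev, curr)]
    q1, _, q3 = quantiles(effects, n=4)
    e_med = median(effects)
    # Quartile distances, floored by |A * median| to avoid tiny intervals.
    d_q1 = max(e_med - q1, abs(A * e_med))
    d_q3 = max(q3 - e_med, abs(A * e_med))
    lower, upper = e_med - C * d_q1, e_med + C * d_q3
    return [e < lower or e > upper for e in effects]

# Toy data: the ninth unit reports a tenfold increase.
prev = [100, 120, 80, 150, 90, 110, 130, 95, 105, 115]
curr = [104, 127, 83, 160, 93, 116, 135, 101, 1050, 120]
flags = hb_outliers(prev, curr)
print([i for i, f in enumerate(flags) if f])
```

Units that are flagged would be excluded from the set of respondents used to impute non-respondents.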

E & I for the MWRTS (cont.)

In general for most E & I classes in the MWRTS the model is of the following form:

$$y = X\beta + \sigma W^{1/2}\varepsilon;$$

$$\text{(I)}\quad y_{t,i} = \beta_1 y_{t-1,i} + \beta_2 y_{t-12,i} + \sigma w_i^{1/2}\varepsilon_i, \qquad w_i = y_{t-1,i}$$

$$\text{(II)}\quad y_{t,i} = \beta_3 y_{t-m,i} + \sigma w_i^{1/2}\varepsilon_i, \qquad w_i = y_{t-m,i}$$

E & I for the MWRTS (cont.)

The imputation classes are at a finer level of detail than the statistical edit groupings.

The principal method for imputation is the bivariate model (60%), and respondents who have passed the univariate statistical edits might actually be considered as outliers during the imputation process.

There is clearly a need for an outlier detection routine for the imputation module.

Outlier Model and Theory

Let $M_{(i)}$, with $(i) = (i_1, \ldots, i_k)$, represent the model in which $y_{i_m}$, $m = 1, \ldots, k$, are the outlying observations [their model equation, in terms of $x_{i_m}^t\beta$ and $w_{i_m}$, is not recoverable from the extracted text], while the majority of observations satisfy

$$y_{(i)} = X_{(i)}\beta + \sigma W_{(i)}^{1/2}\varepsilon_{(i)},$$

where $y_{(i)}$ is the $(n-k) \times 1$ vector of $y$'s omitting $y_{i_1}, \ldots, y_{i_k}$, etc.

Outlier Model and Theory cont…

[Conditional distribution given $W$ involving the weights $w_{i_m}$, $m = 1, \ldots, n$; the full expression is not recoverable from the extracted text.]

Each model is equally likely:

$$P(M_{(i)} \mid k) = \binom{n}{k}^{-1}, \qquad (i) \in S_{n,k}.$$

The priors for the parameters are:

[A Poisson prior on $k$, and priors on $\beta \in R^p$ and $\sigma^2$; the full expressions are not recoverable from the extracted text.]

Outlier Model and Theory cont…

Let

$$q_{(i)} = \frac{|X^t W^{-1} X|^{1/2}}{|X_{(i)}^t W_{(i)}^{-1} X_{(i)}|^{1/2}}\, S_{(i)}^{-(n-p)/2},$$

where $S_{(i)}$ is the SSE after omitting $y_{i_1}, \ldots, y_{i_k}$ from the regression, and

$$Q_k = \sum_{(i) \in S_{n,k}} q_{(i)}.$$

For $k = 0, 1, \ldots, k_{\max}$,

$$p^*_k = \mathrm{Prob}(k \text{ outliers} \mid y, X, W)$$

[an expression in $Q_k$, $k!$, $(0.1)$ and a constant $C$; its exact form is not recoverable from the extracted text].

Outlier Model and Theory cont…

The posterior estimate of the number of outliers is

$$k^* = E(k \mid y, X, W) = \sum_{k=0}^{k_{\max}} k\, p^*_k,$$

and

$$\mathrm{Prob}(y_{i_1}, \ldots, y_{i_k} \text{ are the outliers} \mid k, X, W, y) = p_{(i)} = \frac{q_{(i)}}{Q_k}.$$

The set of indices $(i^*_1, \ldots, i^*_k)$ where $\max_{(i)} p_{(i)}$ is attained determines the most likely outliers.
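The subset probabilities $p_{(i)} = q_{(i)}/Q_k$ can be computed by brute force when $n$ is small. The sketch below is a minimal illustration for a straight-line model ($p = 2$), assuming $q_{(i)}$ is the determinant ratio times $S_{(i)}^{-(n-p)/2}$ and that the error variance is proportional to $w_i$. The helper names `wls_fit` and `subset_posteriors` and the toy data are hypothetical, and indices are 0-based.

```python
from itertools import combinations
from math import exp, log

def wls_fit(x, y, w):
    """Fit y = b0 + b1*x when Var(error_i) is proportional to w_i.

    Returns (b0, b1, sse, det) with det = |X^t W^-1 X| for X = [1, x].
    """
    a = sum(1.0 / wi for wi in w)
    b = sum(xi / wi for xi, wi in zip(x, w))
    c = sum(xi * xi / wi for xi, wi in zip(x, w))
    d = sum(yi / wi for yi, wi in zip(y, w))
    e = sum(xi * yi / wi for xi, yi, wi in zip(x, y, w))
    det = a * c - b * b
    b0 = (c * d - b * e) / det
    b1 = (a * e - b * d) / det
    sse = sum((yi - b0 - b1 * xi) ** 2 / wi for xi, yi, wi in zip(x, y, w))
    return b0, b1, sse, det

def subset_posteriors(x, y, w, k, p=2):
    """Return {(i_1, ..., i_k): p_(i)} over all subsets of size k."""
    n = len(y)
    det_full = wls_fit(x, y, w)[3]
    log_q = {}
    for omit in combinations(range(n), k):
        keep = [j for j in range(n) if j not in omit]
        _, _, sse, det = wls_fit([x[j] for j in keep],
                                 [y[j] for j in keep],
                                 [w[j] for j in keep])
        # log q_(i): half the log determinant ratio minus (n - p)/2 * log S_(i)
        log_q[omit] = 0.5 * (log(det_full) - log(det)) - 0.5 * (n - p) * log(sse)
    top = max(log_q.values())
    q = {s: exp(v - top) for s, v in log_q.items()}  # rescaled against underflow
    q_total = sum(q.values())                        # plays the role of Q_k
    return {s: v / q_total for s, v in q.items()}

# Toy data: a clean line with two gross errors planted at positions 3 and 7.
x = [float(v) for v in range(1, 11)]
noise = [0.1, -0.2, 0.15, -0.05, 0.2, -0.1, 0.05, -0.15, 0.1, -0.1]
y = [2 + 3 * xi + ei for xi, ei in zip(x, noise)]
y[3] += 40.0
y[7] -= 35.0
w = [1.0] * len(x)

post = subset_posteriors(x, y, w, k=2)
best = max(post, key=post.get)
print(best, round(post[best], 3))
```

Working with logarithms keeps the large negative power of $S_{(i)}$ from overflowing or underflowing; the common rescaling cancels when the probabilities are normalised.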

Outlier Model and Theory cont…

$\beta \mid k, M_{(i)}, X, W, y$ is a $p$-variate $t$ variate on $n-p$ degrees of freedom with mean

$$\hat\beta_{(i)} = (X_{(i)}^t W_{(i)}^{-1} X_{(i)})^{-1} X_{(i)}^t W_{(i)}^{-1} y_{(i)}$$

and variance

$$\frac{S_{(i)}}{n-p-2}\,(X_{(i)}^t W_{(i)}^{-1} X_{(i)})^{-1};$$

the posterior estimate of $\beta$ is

$$\beta^* = \sum_{k=0}^{k_{\max}} p^*_k \sum_{(i) \in S_{n,k}} p_{(i)}\, \hat\beta_{(i)}.$$

Outlier Model and Theory cont…

The posterior variance of $\beta$ is

$$V(\beta \mid X, W, y) = \sum_{k=0}^{k_{\max}} p^*_k D_k,$$

$$D_k = \sum_{(i) \in S_{n,k}} p_{(i)} \left[ \frac{S_{(i)}}{n-p-2}\,(X_{(i)}^t W_{(i)}^{-1} X_{(i)})^{-1} + (\hat\beta_{(i)} - \beta^*)(\hat\beta_{(i)} - \beta^*)^t \right].$$

Similarly, the posterior estimate of $\sigma^2$ is

$$\sigma^{*2} = \sum_{k=0}^{k_{\max}} p^*_k \sum_{(i) \in S_{n,k}} p_{(i)}\, \frac{S_{(i)}}{n-p-2}.$$

An illustrative example

$$x_i \sim \mathrm{Exp}(200), \qquad y_i = 30 + 0.95\,x_i + 25\,\varepsilon_i, \quad i = 1, \ldots, 25,$$

and the errors were from a contaminated normal:

$$\varepsilon_i \sim 0.95\,N(0, 1) + 0.05\,N(0, 10^2).$$
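A data set of this kind can be generated as in the sketch below. It assumes that Exp(200) denotes an exponential distribution with mean 200 and that the contaminating component has standard deviation 10; the function name `simulate` and the seed are arbitrary choices, so the output will not reproduce the slide's data.

```python
import random

def simulate(n=25, seed=0):
    """Draw (x, y) pairs with contaminated-normal errors (assumed setup)."""
    rng = random.Random(seed)
    # expovariate takes a rate, so rate = 1/200 gives mean 200.
    x = [rng.expovariate(1 / 200) for _ in range(n)]
    # With probability 0.05 the error comes from the wide component.
    eps = [rng.gauss(0, 10) if rng.random() < 0.05 else rng.gauss(0, 1)
           for _ in range(n)]
    y = [30 + 0.95 * xi + 25 * ei for xi, ei in zip(x, eps)]
    return x, y

x_sim, y_sim = simulate()
print(len(x_sim), round(min(x_sim), 1), round(max(x_sim), 1))
```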

Plot of Simulated data

[Figure: scatter plot of Y (0 to 1200) against X (0 to 600), with observations 10, 19 and 25 labelled.]

Example cont…

k   p*_k
0   0.0000
1   0.3168
2   0.3652
3   0.2143
4   0.1037

The posterior estimate of the number of outliers is 2.105, with estimates of 34.755 and 0.967 for the intercept and slope.
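The figure of 2.105 is the probability-weighted mean of $k$ under the tabulated $p^*_k$, which can be checked directly. The variable names below are illustrative only.

```python
# Posterior probabilities p*_k for k = 0..4, copied from the table above.
p_star = {0: 0.0000, 1: 0.3168, 2: 0.3652, 3: 0.2143, 4: 0.1037}

total = sum(p_star.values())                      # should be 1
k_star = sum(k * p for k, p in p_star.items())    # E(k | y, X, W)
print(round(total, 4), round(k_star, 3))
```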


Example cont…

The MM-estimator (Yohai 1987), a high-breakdown M-estimator, indicates that observation 25 is an outlier with high leverage and that observation 10 has high leverage only.

The estimates for the parameters are intercept = 35.465 and slope = 0.9629.

Example cont…

i_1  i_2  p_i    Intercept  Slope
19   25   0.713  33.492     0.965
16   25   0.036  37.393     0.973
1    25   0.029  40.039     0.958
15   25   0.021  37.090     0.955
6    25   0.019  31.667     0.983
22   25   0.015  37.280     0.970
4    25   0.014  37.524     0.968
5    25   0.013  38.032     0.965
13   25   0.013  32.614     0.980
10   25   0.012  33.747     0.987

If there are only two possible outliers, then (19, 25) are 60 times more likely than (10, 25)!

Plot of Simulated data

[Figure: the scatter plot of Y against X shown again, with observations 10, 19 and 25 labelled.]

Outlier Model and Theory cont…

Strengths: method works well in detecting outliers and estimating the relevant parameters robustly. All of the data is used.

Drawback: method becomes impractical as the imputation class size n increases, since the number of possible subsets of size k will become astronomically large.
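The growth in the number of candidate subsets is easy to quantify with the binomial coefficient. The cell sizes and the cap of four outliers below are illustrative values, not figures from the survey.

```python
from math import comb

def total_models(n, k_max):
    """Number of candidate outlier subsets of size 0..k_max among n units."""
    return sum(comb(n, k) for k in range(k_max + 1))

for n in (25, 50, 100, 500):
    print(n, comb(n, 4), total_models(n, 4))
```

Each subset requires its own regression fit, so the cost rises roughly like $n^k$ for fixed $k$.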

Outlier Procedure for “large” imputation cells

For $n$ large: on each pass, with $k = 0, 1$, test using $p^*_0$ or $p_{(i_m)}$ and the leverage $h_{i_m i_m}$ to identify the outliers.

ident  hii     pi      P_0     b0       b1      outlier
25     0.1849  1.0000  0.0000  35.6930  0.9708  Y
19     0.0443  0.6768  0.6676  35.2458  0.9695  Y
16     0.0443  0.1338  0.8207  33.5584  0.9651  N

The standard errors for $\beta_0$ and $\beta_1$ are 8.5023 and 0.0343 respectively. The estimate $\sigma^*$ is 25.0655.

Simulation results

Data from the MRTS were selected for 3 imputation classes in which the number of respondents for the bivariate imputation model exceeded 50.

For a given simulation, (100 − p)% of the units in each cell were selected to impute for the remaining p%.

The method presented here was compared to the MM M-estimator using the relative difference of the average predictions.
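The slides do not define the comparison measure precisely. One plausible reading, assumed here, is the difference between the average imputed value and the average true value for the held-out units, divided by the average true value. The function name `relative_difference` and the numbers are illustrative only.

```python
def relative_difference(predicted, actual):
    """(mean of predictions - mean of actual values) / mean of actual values."""
    mean_pred = sum(predicted) / len(predicted)
    mean_actual = sum(actual) / len(actual)
    return (mean_pred - mean_actual) / mean_actual

# Made-up held-out values and their imputations.
actual = [100.0, 250.0, 80.0, 170.0]
predicted = [98.0, 255.0, 79.0, 166.0]
rd = relative_difference(predicted, actual)
print(round(rd, 4))
```

Under this reading, a value near zero means the imputed values reproduce the held-out average closely, and the sign shows the direction of any bias.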

Simulation results for 200 runs

p% Method Cell 1 Cell 2 Cell 3

5 MRTS -0.002 -0.020 0.013

5 MM -0.008 -0.028 0.001

10 MRTS -0.000 -0.007 0.015

10 MM -0.008 -0.015 0.003

15 MRTS -0.000 -0.011 0.014

15 MM -0.008 -0.016 0.002

Conclusions

The procedure for outlier detection works well and produces fairly robust estimates. It would also allow for more covariates to be included in the E&I process.

Even though the assumption of normality led to the closed-form solution of the estimator, it is still applicable to situations where modest departures from normality arise.

For more information, please contact:

www.statcan.ca

Robert Philips, e-mail: [email protected], telephone: (613) 951-1493

Thank you!