1 Selective data editing Development & implementation Q 2010 Helsinki Jörgen Svensson Process...


Selective data editing: Development & implementation

Q 2010 Helsinki

Jörgen Svensson

Process Owner

Statistics Sweden (SCB)

Standardization at SCB

• Decentralized production

• Development of CBMs

• Editing costly, 33% of budgets

• Data collection departments, 2006

• Standardization – the Lotta project, in 2006


Nine case studies

Purpose of the project:

• Try using selective data editing

• What is the potential gain using the method?

• Would it be possible to develop and use a common tool?


Some results from case studies

Survey                                                     Reduction (%)
Short term employment, private sector                      60
Business activity indicators                               50
Price indices in producer & import stages                  50
Short term statistics, wages & salaries, private sector    40
Wage & salary structures in the private sector             25
Foreign trade                                              (5)
Structural business statistics                             ---

SUSPICION

• SUSP(j, k) = Suspicion of variable j for unit k

• SUSP(j, k) = 0 if variable value falls within acceptance interval

• SUSP(j, k) → 1 as the value deviates further from the acceptance limit

• 0 ≤ SUSP(j,k) ≤ 1
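The slides give only the properties of the suspicion measure, not its functional form. As a minimal sketch, assuming a ratio of the form distance / (distance + scale), the function below satisfies the three stated properties. The function name, the `scale` parameter and the choice of formula are illustrative assumptions, not the definition used in SELEKT.

```python
def suspicion(value, lower, upper, scale):
    """Hypothetical suspicion measure for one variable value.

    Returns 0 when the value lies inside the acceptance interval
    [lower, upper] and approaches 1 as the value moves away from the
    nearest acceptance limit. The functional form is an assumption.
    """
    if scale <= 0:
        raise ValueError("scale must be positive")
    if lower > upper:
        raise ValueError("lower must not exceed upper")
    if lower <= value <= upper:
        return 0.0
    # Distance from the nearest acceptance limit.
    distance = lower - value if value < lower else value - upper
    # Bounded in (0, 1) and increasing in distance.
    return distance / (distance + scale)


if __name__ == "__main__":
    for v in (5.0, 12.0, 30.0, 1000.0):
        print(v, round(suspicion(v, lower=0.0, upper=10.0, scale=2.0), 3))
```

A larger `scale` makes the measure rise more slowly, so a value has to lie further outside the interval before it is treated as highly suspicious.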

POTENTIAL IMPACT

• POTIMP = Potential impact

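• POTIMP is the weighted absolute difference between the observed and the predicted value:

• $\mathrm{POTIMP}(j,k,d) = w_k \, 1_k(d) \, |y_{j,k} - \hat{y}_{j,k}|$

for variable j and unit k in domain d, where $w_k$ is the sampling weight, $1_k(d)$ is the domain indicator, $y_{j,k}$ is the observed value and $\hat{y}_{j,k}$ is the predicted value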

• SELEKT supports several ways to establish the predicted value: from time series data and from cross-sectional analysis within homogeneous groups of units
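Read literally, the potential impact is the sampling weight times the domain indicator times the absolute difference between the observed and the predicted value. A minimal sketch under that reading follows. The function and parameter names are illustrative, and the predicted value is simply passed in, since the slides do not say how SELEKT derives it.

```python
def potential_impact(observed, predicted, weight, in_domain):
    """Weighted absolute difference between observed and predicted value.

    observed, predicted: values of variable j for unit k
    weight:              sampling weight w_k of unit k
    in_domain:           True if unit k belongs to domain d
    """
    if weight < 0:
        raise ValueError("sampling weight must be non-negative")
    indicator = 1.0 if in_domain else 0.0  # domain indicator for unit k
    return weight * indicator * abs(observed - predicted)


if __name__ == "__main__":
    # Illustrative numbers only: the prediction here is last period's value.
    print(potential_impact(observed=120.0, predicted=100.0,
                           weight=2.5, in_domain=True))   # 50.0
    print(potential_impact(observed=120.0, predicted=100.0,
                           weight=2.5, in_domain=False))  # 0.0
```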

Flagging suspected errors

[Figure: plot with axes log(Potential impact) and log(Suspicion); flagged observations are marked.]

LOCAL SCORE

Local (item) score LScore (j,k,d):

LScore (j,k,d) = SUSP(j,k)*|POTIMP(j,k,d)|*Cello(j,d)

Cello(j,d) is inversely proportional to the standard error based on previous data
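As a minimal sketch, the local score can be computed directly from its three factors. The proportionality constant relating Cello(j,d) to the standard error is not given in the slides, so the `constant` parameter below, like the function names, is an assumption.

```python
def cello(standard_error, constant=1.0):
    """Importance factor, inversely proportional to the standard error
    estimated from previous data. The constant is a hypothetical choice."""
    if standard_error <= 0:
        raise ValueError("standard error must be positive")
    return constant / standard_error


def local_score(susp, potimp, cello_value):
    """LScore(j,k,d) = SUSP(j,k) * |POTIMP(j,k,d)| * Cello(j,d)."""
    if not 0.0 <= susp <= 1.0:
        raise ValueError("suspicion must lie in [0, 1]")
    return susp * abs(potimp) * cello_value


if __name__ == "__main__":
    # Illustrative numbers only.
    print(local_score(susp=0.5, potimp=50.0,
                      cello_value=cello(standard_error=4.0)))  # 6.25
```

Dividing by the standard error puts the impact on a scale relative to the sampling uncertainty of the estimate, so the same absolute deviation scores higher in a domain that is estimated precisely.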

GLOBAL SCORE

• Global (unit) score GScore(k) is obtained by aggregation of local scores

• LScore(j,k,d) → LScore(j,k) → GScore(k)

• Each → denotes aggregation by summation, Euclidean summation or maximum

• Only those units with GScore larger than a pre-decided threshold are followed up
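The two-step aggregation and the threshold rule can be sketched as follows. The nested-dictionary layout, the function names and the use of the same aggregation method at both steps are assumptions for illustration. The slides name the three aggregation options but not how they are combined.

```python
import math


def aggregate(scores, method):
    """Combine scores by summation, Euclidean summation or maximum."""
    scores = list(scores)
    if method == "sum":
        return sum(scores)
    if method == "euclidean":
        return math.sqrt(sum(s * s for s in scores))
    if method == "max":
        return max(scores, default=0.0)
    raise ValueError(f"unknown aggregation method: {method}")


def global_score(local_scores, method="sum"):
    """Aggregate the local scores of one unit k into GScore(k).

    local_scores: {variable j: {domain d: LScore(j,k,d)}}
    Step 1 aggregates over domains, step 2 over variables.
    """
    per_variable = [aggregate(by_domain.values(), method)
                    for by_domain in local_scores.values()]
    return aggregate(per_variable, method)


def units_to_follow_up(scores_by_unit, threshold, method="sum"):
    """Return the units whose global score exceeds the threshold."""
    return [unit for unit, scores in scores_by_unit.items()
            if global_score(scores, method) > threshold]


if __name__ == "__main__":
    # Illustrative local scores for two hypothetical units.
    scores_by_unit = {
        "A": {"turnover": {"d1": 3.0, "d2": 4.0}, "employees": {"d1": 0.0}},
        "B": {"turnover": {"d1": 0.5}},
    }
    print(units_to_follow_up(scores_by_unit, threshold=2.0))
```

Summation lets many small local scores add up to a follow-up, whereas the maximum flags a unit only when at least one single item is serious enough on its own. Euclidean summation sits between the two.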

SELEKT, EDIT and process data


Implementation of SELEKT

So far three surveys:

• Business activity indicators

• Wage & salary structures in the private sector

• Commodity flow survey


Documentation

A General Methodology for Selective Data Editing

jorgen.svensson@scb.se

anders.norberg@scb.se