Micro-Fusion


Eurostat


Presented by

• Marco Di Zio

• Istat – Italian National Institute of Statistics

Outline

• Micro-Fusion in Memobust

• Objectives characterising Micro-Fusion settings

• Focus on some methods

• Structure of the Memobust handbook section concerning micro-fusion

Micro-Fusion in Memobust

• Micro-Fusion in Memobust

– Integration of data sources composed of units (input: micro) to obtain a data set that is still composed of units (output: micro).

– It focuses on statistical techniques.

Main settings of M-F in Memobust

• Integration of data sources composed of the same units (Record linkage-Object matching)

• Integration of sources composed of different units (Statistical matching)

• Making the integrated data consistent (Microintegration)

Example of integration with the same units (Record linkage-object matching)

• A register holds the main variables; we want to integrate it with information from administrative sources and sample surveys

– Register of businesses with main characteristics: NUTS, NACE, number of employees, ...

– Financial statements and the Tax Authority sources

– Small and medium enterprise survey

Frameworks for record linkage

1. A unique unit identifier is available, without error

2. The unit identifier has to be built from the available variables, which are free of error

3. The unit identifier has to be built from the available variables, which are affected by errors
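The second framework can be illustrated with a minimal sketch: a key is built from variables assumed to be error-free and records are linked when the keys coincide. The function name `link_on_key`, the record layout and the choice of key variables are hypothetical, and the sketch assumes the key is unique within source B.

```python
def link_on_key(source_a, source_b, key_vars):
    """Link records of A and B whose constructed key agrees exactly.

    Assumes (hypothetically) that the key variables are error-free and
    that the key identifies at most one record in source_b.
    """
    # Index source B by the constructed identifier.
    index_b = {tuple(rec[v] for v in key_vars): rec for rec in source_b}
    links = []
    for rec in source_a:
        match = index_b.get(tuple(rec[v] for v in key_vars))
        if match is not None:
            links.append((rec, match))
    return links


# Illustrative records; all values are made up.
register = [
    {"name": "ALFA", "nuts": "ITC4", "nace": "10.1", "employees": 25},
    {"name": "BETA", "nuts": "ITI4", "nace": "47.1", "employees": 8},
]
tax_source = [
    {"name": "ALFA", "nuts": "ITC4", "nace": "10.1", "turnover": 950},
    {"name": "GAMMA", "nuts": "ITF3", "nace": "62.0", "turnover": 120},
]

links = link_on_key(register, tax_source, ("name", "nuts", "nace"))
print(len(links))  # 1
```

When the key variables are affected by errors (the third framework), exact agreement of this kind no longer suffices and a probabilistic rule is needed.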

The Fellegi-Sunter decision rule

• Data sources A and B (with NA and NB observations)

• Choose k common matching variables X1,…,Xk

• Compare them (e.g. ci=1 if Xi in A equals Xi in B, ci=0 otherwise) to obtain the comparison vector c=(c1,…,ck) for each pair of units (a,b)

Fellegi-Sunter (1969)

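• Compute

\[
r = \frac{m(c)}{u(c)} = \frac{P(c \mid (a,b) \in M)}{P(c \mid (a,b) \in U)}
\]

• Pairs (a,b) can be ordered and classified into the sets M* and U* (or the undefined set Q*) according to r

\[
\begin{cases}
r(a,b) > T_m & \Rightarrow (a,b) \in M^* \\
T_m \ge r(a,b) \ge T_u & \Rightarrow (a,b) \in Q^* \\
r(a,b) < T_u & \Rightarrow (a,b) \in U^*
\end{cases}
\]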

Fellegi-Sunter

• The thresholds are set by solving equations that minimize the size of the set Q* together with the false match rate and the false non-match rate
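The decision rule can be sketched in a few lines of Python. This is only an illustration: it assumes that the k comparisons are conditionally independent (so that m(c) and u(c) factorise), and the agreement probabilities `m_probs`, `u_probs` and the thresholds `t_u`, `t_m` are made-up values, not estimates from real data.

```python
from math import prod


def comparison_vector(rec_a, rec_b, keys):
    """c = (c1, ..., ck): ci = 1 if the i-th matching variable agrees, else 0."""
    return tuple(1 if rec_a[k] == rec_b[k] else 0 for k in keys)


def likelihood_ratio(c, m_probs, u_probs):
    """r = m(c) / u(c), assuming conditionally independent comparisons.

    m_probs[i] = P(ci = 1 | pair in M), u_probs[i] = P(ci = 1 | pair in U).
    """
    m = prod(mi if ci else 1 - mi for ci, mi in zip(c, m_probs))
    u = prod(ui if ci else 1 - ui for ci, ui in zip(c, u_probs))
    return m / u


def classify(r, t_u, t_m):
    """Assign a pair to M* (link), U* (non-link) or Q* (undecided)."""
    if r > t_m:
        return "M*"
    if r < t_u:
        return "U*"
    return "Q*"


keys = ("name", "nuts", "nace")      # hypothetical matching variables
m_probs = (0.8, 0.9, 0.9)            # hypothetical P(agree | match)
u_probs = (0.05, 0.2, 0.1)           # hypothetical P(agree | non-match)
t_u, t_m = 1.0, 100.0                # hypothetical thresholds

a = {"name": "ALFA", "nuts": "ITC4", "nace": "10.1"}
b_same = {"name": "ALFA", "nuts": "ITC4", "nace": "10.1"}
b_typo = {"name": "ALPHA", "nuts": "ITC4", "nace": "10.1"}
b_other = {"name": "BETA", "nuts": "ITI4", "nace": "47.1"}

for b in (b_same, b_typo, b_other):
    c = comparison_vector(a, b, keys)
    print(c, classify(likelihood_ratio(c, m_probs, u_probs), t_u, t_m))
```

With these illustrative numbers, full agreement falls in M*, agreement on two of the three variables falls in Q*, and full disagreement falls in U*.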

Modules for record linkage-object matching in the handbook

1. Object matching (record linkage)

2. Object identifier matching

3. Unweighted matching of object characteristics

4. Weighted matching of object characteristics

5. Probabilistic record linkage

6. Fellegi-Sunter method for record linkage

Example of integration of data sources with different units (Statistical matching)

• Combining Farms

– Farm Structure Survey

– Farm Accountancy Data survey

• Combining Income - Consumption

– Household Budget Survey

– Bank of Italy survey on Income

Statistical matching

[Figure: statistical matching scheme; only the label z̃ survives]

Statistical matching methods

• Imputation methods

– Parametric methods

– Non-parametric methods (donor imputation)

– Mixed methods

Mixed methods

1. Estimate a parametric model (e.g. regression)

2. Use the model from step 1 to predict values in both data sets (e.g. recipient A, donor B)

3. Use predicted values for finding a donor to impute in the recipient A (e.g. find the nearest neighbour in B according to a distance computed on regressed values)
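The three steps can be sketched as follows, under simplifying assumptions that are not in the slides: a single common variable X, a simple linear regression of Z on X estimated on the donor file B, and absolute difference as the distance on the predicted values. The function names and the data are hypothetical. With one regressor, matching on predicted values is equivalent to matching on X itself, so the sketch only shows the mechanics.

```python
def fit_line(xs, zs):
    """Ordinary least squares for z = intercept + slope * x."""
    n = len(xs)
    mean_x, mean_z = sum(xs) / n, sum(zs) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxz = sum((x - mean_x) * (z - mean_z) for x, z in zip(xs, zs))
    slope = sxz / sxx
    return mean_z - slope * mean_x, slope


def mixed_match(recipient_x, donor_x, donor_z):
    """Impute Z in the recipient file with observed donor values."""
    # Step 1: estimate the parametric model on the donor file.
    intercept, slope = fit_line(donor_x, donor_z)

    def predict(x):
        return intercept + slope * x

    # Step 2: predict Z in both files.
    donor_pred = [predict(x) for x in donor_x]
    imputed = []
    for x in recipient_x:
        target = predict(x)
        # Step 3: nearest donor on the predicted values; the donor's
        # observed (not predicted) Z is imputed.
        j = min(range(len(donor_x)), key=lambda i: abs(donor_pred[i] - target))
        imputed.append(donor_z[j])
    return imputed


donor_x = [1, 2, 3, 4]
donor_z = [2.1, 3.9, 6.2, 7.8]
recipient_x = [1.2, 3.6]
print(mixed_match(recipient_x, donor_x, donor_z))  # [2.1, 7.8]
```

Imputing the donor's observed value rather than the regression prediction keeps the imputed values among those actually observed in B.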

Limitations and alternatives in SM

1. Naïve methods are implicitly based on the conditional independence assumption (Y and Z are independent given common vars X).

2. To overcome 1, use auxiliary information on Y and Z, e.g., outdated data, proxy variables…

3. Alternatively, compute uncertainty bounds, i.e., the bounds of the parameters that cannot be identified (e.g., the correlation of Y and Z).
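For a single common variable X, the bounds on the correlation of Y and Z follow from requiring the 3x3 correlation matrix of (X, Y, Z) to be positive semidefinite. The sketch below computes them; the function name and the input correlations are illustrative. The midpoint of the interval is the value implied by conditional independence.

```python
from math import sqrt


def correlation_bounds(rho_xy, rho_xz):
    """Range of corr(Y, Z) compatible with corr(X, Y) and corr(X, Z).

    Assumes a single common variable X; the bounds come from positive
    semidefiniteness of the correlation matrix of (X, Y, Z).
    """
    centre = rho_xy * rho_xz  # value under conditional independence
    half_width = sqrt((1 - rho_xy ** 2) * (1 - rho_xz ** 2))
    return centre - half_width, centre + half_width


low, high = correlation_bounds(0.8, 0.6)
print(round(low, 6), round(high, 6))
```

The stronger the correlation of X with Y and with Z, the narrower the interval, which is why good common variables or auxiliary information reduce the uncertainty.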

Modules for statistical matching in the handbook

1. Matching different observations from different sources (statistical matching)

2. Statistical matching methods

Obtaining consistent data - Example

• Key variables are available from reliable administrative data (e.g., Turnover, number of Employees, total wages paid - Wages).

• The SBS (Structural Business Statistics) requires more detail, so

• a sample survey is conducted to obtain the additional details.

• For Turnover and the other key variables the register values are used; for the remaining variables the survey values are used.

Integration of data sources with different units - Example

| Variable | Name | Survey values | Composite (I) |
|----------|------|---------------|---------------|
| x1 | Profit | 330 | 330 |
| x2 | Employees (Number of employees) | 20 | 25 |
| x3 | Turnover main (Turnover main activity) | 1000 | 1000 |
| x4 | Turnover other (Turnover other activities) | 30 | 30 |
| x5 | Turnover (Total turnover) | 1030 | 950 |
| x6 | Wages (Costs of wages and salaries) | 500 | 550 |
| x7 | Other costs | 200 | 200 |
| x8 | Total costs | 700 | 700 |

Obtaining consistent data - Example

• Business records have to adhere to a number of accounting rules and logical constraints, e.g.

1. e1: x1 – x5 + x8 = 0 (Profit = Turnover – Total Costs)

2. e2: –x3 + x5 – x4 = 0 (Turnover = Turnover main + Turnover other)

3. e3: –x6 – x7 + x8 = 0 (Total Costs = Wages + Other costs)

Integration of data sources with different units - Example

and…

• violation of the edit rules

• to obtain a consistent record some of the values have to be changed or “adjusted”.
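The violations can be checked directly by evaluating the left-hand side of each edit rule on the survey record and on the composite record from the example table. The helper name `edit_residuals` is hypothetical; the coefficients are those of e1, e2 and e3, and the values are those of the table.

```python
# Records in the order x1, ..., x8, taken from the example table.
survey = [330, 20, 1000, 30, 1030, 500, 200, 700]
composite = [330, 25, 1000, 30, 950, 550, 200, 700]

# Coefficients of the edit rules e1, e2, e3 (each must evaluate to 0).
EDITS = {
    "e1": [1, 0, 0, 0, -1, 0, 0, 1],    # x1 - x5 + x8
    "e2": [0, 0, -1, -1, 1, 0, 0, 0],   # -x3 + x5 - x4
    "e3": [0, 0, 0, 0, 0, -1, -1, 1],   # -x6 - x7 + x8
}


def edit_residuals(record):
    """Left-hand side of every edit rule; a non-zero value is a violation."""
    return {
        name: sum(coef * value for coef, value in zip(coefs, record))
        for name, coefs in EDITS.items()
    }


print(edit_residuals(survey))     # {'e1': 0, 'e2': 0, 'e3': 0}
print(edit_residuals(composite))  # {'e1': 80, 'e2': -80, 'e3': -50}
```

The survey record satisfies all three rules, while the composite record, which mixes register and survey values, violates each of them.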

Adjusting methods

1. Prorating

2. Minimum adjustment methods

3. Generalised ratio adjustment

Minimum adjustment methods

• e1: x1 – x5 + x8 = 0 (Profit = Turnover – Total Costs)

• e2: –x3 + x5 – x4 = 0 (Turnover = Turnover main + Turnover other)

• e3: –x6 – x7 + x8 = 0 (Total Costs = Wages + Other costs)

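can be expressed in the form Ex = c with

\[
E = \begin{pmatrix}
1 & 0 & 0 & 0 & -1 & 0 & 0 & 1 \\
0 & 0 & -1 & -1 & 1 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & -1 & -1 & 1
\end{pmatrix},
\qquad
c = \begin{pmatrix} 0 \\ 0 \\ 0 \end{pmatrix}
\]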

Minimum adjustment methods

• More generally, edits can be expressed as Ax >= b

The minimum adjustment method consists in finding a solution for

\[
\tilde{x} = \arg\min_{x} D(x, x_0) \quad \text{s.t.} \quad A\tilde{x} \ge b
\]

x0: observed values of the variables that can be modified
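A minimal sketch of a minimum adjustment, under assumptions that go beyond the slides: D is taken to be the squared Euclidean distance, only the equality edits Ex = c are imposed (inequality edits would need a quadratic programming solver), and the register-based values x2, x5 and x6 are treated as fixed. The function names and the choice of adjustable variables are hypothetical. The closed-form solution is x = x0 - Ef' (Ef Ef')^{-1} (E x0 - c), where Ef holds the columns of E for the adjustable variables.

```python
def solve(G, r):
    """Solve the small linear system G * lam = r by Gauss-Jordan elimination."""
    n = len(r)
    M = [list(map(float, row)) + [float(ri)] for row, ri in zip(G, r)]
    for col in range(n):
        # Partial pivoting: bring the largest remaining entry to the diagonal.
        piv = max(range(col, n), key=lambda i: abs(M[i][col]))
        M[col], M[piv] = M[piv], M[col]
        for i in range(n):
            if i != col:
                f = M[i][col] / M[col][col]
                M[i] = [a - f * b for a, b in zip(M[i], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]


def min_adjust(x0, E, c, free):
    """Least-squares adjustment of the variables in `free` so that E x = c."""
    resid = [sum(e * x for e, x in zip(row, x0)) - ci for row, ci in zip(E, c)]
    Ef = [[row[j] for j in free] for row in E]       # adjustable columns only
    G = [[sum(a * b for a, b in zip(r1, r2)) for r2 in Ef] for r1 in Ef]
    lam = solve(G, resid)                            # Lagrange multipliers
    x = [float(v) for v in x0]
    for k, j in enumerate(free):
        x[j] -= sum(Ef[i][k] * lam[i] for i in range(len(E)))
    return x


E = [
    [1, 0, 0, 0, -1, 0, 0, 1],
    [0, 0, -1, -1, 1, 0, 0, 0],
    [0, 0, 0, 0, 0, -1, -1, 1],
]
c = [0, 0, 0]
composite = [330, 25, 1000, 30, 950, 550, 200, 700]
free = [0, 2, 3, 6, 7]  # x1, x3, x4, x7, x8 (assumed adjustable)

adjusted = min_adjust(composite, E, c, free)
print([round(v, 6) for v in adjusted])
```

With this unweighted distance the adjusted record satisfies all three edits, but x4 (Turnover other) becomes slightly negative, which illustrates why inequality edits of the form Ax >= b, such as non-negativity, are part of the general formulation.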

Microintegration modules in the handbook

1. Reconciling conflicting micro-data

2. Prorating

3. Minimum adjustment methods

4. Generalised ratio adjustments

Authors of the modules

1. Introductory module

• Di Zio M. (Istat)

2. Modules on record linkage - object matching

• Willenborg L., Van de Laar R. (CBS)

• Tuoto T., Cibella N. (Istat)

3. Modules on statistical matching

• Scanu M., D’Orazio M. (Istat)

4. Modules on microintegration

• Pannekoek J. (CBS)
