Eurostat Micro-Fusion

Page 1: Micro-Fusion

Eurostat

Micro-Fusion

Page 2: Micro-Fusion

Presented by

• Marco Di Zio

• Istat – Italian National Institute of Statistics

Page 3: Micro-Fusion

Outline

• Micro-Fusion in Memobust

• Objectives characterising Micro-Fusion settings

• Focus on some methods

• Structure of the Memobust handbook section concerning micro-fusion

Page 4: Micro-Fusion

Micro-Fusion in Memobust

• Micro Fusion in Memobust

– Integration of data sources composed of units (input: micro) to obtain a data set that is still composed of units (output: micro).

– It is focused on statistical techniques

Page 5: Micro-Fusion

Main settings of M-F in Memobust

• Integration of data sources composed of the same units (Record linkage-Object matching)

• Integration of sources composed of different units (Statistical matching)

• Making the integrated data consistent (Microintegration)

Page 6: Micro-Fusion

Example of Integration with the same units – (Record linkage-object matching)

• A register records the main variables; we want to integrate it with information from administrative sources and sample surveys

– Register of businesses with main characteristics: NUTS, NACE, number of employees, ...

– Financial statements and Tax Authority sources

– Small and medium enterprise survey

Page 7: Micro-Fusion

Frameworks for record linkage

1. A unique unit identifier is available, without error

2. The unit identifier must be constructed from the available variables, which are free of error

3. The unit identifier must be constructed from the available variables, which are affected by errors

Page 8: Micro-Fusion

The Fellegi-Sunter decision rule

• Data sources A and B (with N_A and N_B observations)

• Choose k common matching variables X_1,…,X_k

• For each couple of units (a,b), compare the matching variables (e.g. c_i = 1 if X_i in A equals X_i in B, and c_i = 0 otherwise) to obtain the comparison vector c = (c_1,…,c_k)
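The comparison step can be illustrated with a minimal sketch. The records, field names and values below are made up for illustration; only the exact-agreement rule (c_i = 1 on equality, 0 otherwise) comes from the slide.

```python
def comparison_vector(rec_a, rec_b, match_vars):
    """Return c = (c_1, ..., c_k): 1 where the two records agree, 0 otherwise."""
    return tuple(1 if rec_a[v] == rec_b[v] else 0 for v in match_vars)


# Hypothetical business records from sources A and B.
unit_a = {"nuts": "ITC4", "nace": "47.11", "employees": 12}
unit_b = {"nuts": "ITC4", "nace": "47.19", "employees": 12}

c = comparison_vector(unit_a, unit_b, ["nuts", "nace", "employees"])
print(c)  # (1, 0, 1)
```

Exact equality is only one possible comparison function; string similarity measures could be substituted when the variables are affected by errors.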

Page 9: Micro-Fusion

Fellegi-Sunter (1969)

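• Compute

\[
r = \frac{m(c)}{u(c)} = \frac{P\big(c \mid (a,b) \in M\big)}{P\big(c \mid (a,b) \in U\big)}
\]

• Couples (a,b) can be ordered and classified in M* and U* (or undefined Q*) sets according to r

\[
\begin{cases}
r(a,b) \ge T_m & \Rightarrow\ (a,b) \in M^* \\
T_m > r(a,b) > T_u & \Rightarrow\ (a,b) \in Q^* \\
r(a,b) \le T_u & \Rightarrow\ (a,b) \in U^*
\end{cases}
\]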
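As a rough illustration of the decision rule, the sketch below computes r for a comparison vector and assigns the couple to M*, Q* or U*. It assumes that the comparison fields are conditionally independent, so that m(c) and u(c) factorise into per-variable probabilities; that assumption, the probabilities `m_probs` and `u_probs`, and the thresholds are all hypothetical and not taken from the slides.

```python
from math import prod


def likelihood_ratio(c, m_probs, u_probs):
    """r = m(c) / u(c), assuming independent comparison fields."""
    m = prod(mi if ci else 1 - mi for ci, mi in zip(c, m_probs))
    u = prod(ui if ci else 1 - ui for ci, ui in zip(c, u_probs))
    return m / u


def classify(r, t_m, t_u):
    """Assign a couple to M* (link), U* (non-link) or Q* (undecided)."""
    if r >= t_m:
        return "M*"
    if r <= t_u:
        return "U*"
    return "Q*"


# Hypothetical agreement probabilities among true matches (m) and non-matches (u).
m_probs = [0.9, 0.8, 0.95]
u_probs = [0.1, 0.2, 0.05]
T_M, T_U = 100.0, 1.0  # illustrative thresholds

for c in [(1, 1, 1), (1, 0, 1), (0, 0, 0)]:
    r = likelihood_ratio(c, m_probs, u_probs)
    print(c, round(r, 4), classify(r, T_M, T_U))
```

In practice the m and u probabilities are unknown and have to be estimated, for instance with a latent class model; the fixed values here only serve to show the mechanics of the rule.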

Page 10: Micro-Fusion

Fellegi-Sunter

• The thresholds are set by solving equations that minimize the size of the set Q* for fixed levels of the false match rate and the false non-match rate

Page 11: Micro-Fusion

Modules for record linkage-object matching in the handbook

1. Object matching (record linkage)

2. Object identifier matching

3. Unweighted matching of object characteristics

4. Weighted matching of object characteristics

5. Probabilistic record linkage

6. Fellegi-Sunter method for record linkage

Page 12: Micro-Fusion

Example of integration of data sources with different units – (Statistical matching)

• Combining Farms

– Farm Structure Survey

– Farm Accountancy Data survey

• Combining Income - Consumption

– Household Budget Survey

– Bank of Italy survey on Income

Page 13: Micro-Fusion

Statistical matching

[Figure: statistical matching scheme; the only surviving label is \tilde{z}]

Page 14: Micro-Fusion

Statistical matching methods

• Imputation methods

– Parametric methods

– Non-parametric methods (donor imputation)

– Mixed methods

Page 15: Micro-Fusion

Mixed methods

1. Estimate a parametric model (e.g. regression)

2. Use the model in step 1 to predict values in both the data sets (e.g. recipient A, donor B)

3. Use predicted values for finding a donor to impute in the recipient A (e.g. find the nearest neighbour in B according to a distance computed on regressed values)
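The three steps can be sketched as follows. This is only an illustration under stated assumptions: the model is a linear regression of Z on X estimated on the donor file B (where Z is observed), the distance is the absolute difference between predicted values, and the function name `mixed_match` and the toy data are hypothetical.

```python
import numpy as np


def mixed_match(x_a, x_b, z_b):
    """Impute Z in recipient A with observed ('live') values from donor B."""
    x_a, x_b, z_b = (np.asarray(v, dtype=float) for v in (x_a, x_b, z_b))
    design_a = np.column_stack([np.ones(len(x_a)), x_a])
    design_b = np.column_stack([np.ones(len(x_b)), x_b])

    # Step 1: estimate the regression of Z on X in the donor file B.
    beta, *_ = np.linalg.lstsq(design_b, z_b, rcond=None)

    # Step 2: predict Z in both data sets.
    z_hat_a = design_a @ beta
    z_hat_b = design_b @ beta

    # Step 3: for each recipient, pick the donor with the closest predicted value
    # and impute that donor's observed Z.
    donor_idx = np.abs(z_hat_a[:, None] - z_hat_b[None, :]).argmin(axis=1)
    return z_b[donor_idx], donor_idx


# Toy data: X is common to A and B, Z is observed only in B.
x_b = [1.0, 2.0, 3.0, 4.0]
z_b = [2.1, 3.9, 6.2, 7.8]
x_a = [1.1, 3.9]

z_imputed, donors = mixed_match(x_a, x_b, z_b)
print(donors, z_imputed)
```

Because the imputed values are taken from actual donors rather than from the regression line, the imputed Z keeps values that were really observed, which is the usual motivation for the mixed approach.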

Page 16: Micro-Fusion

Limitations and alternatives in SM

1. Naïve methods are implicitly based on the conditional independence assumption (Y and Z are independent given common vars X).

2. To overcome 1, use auxiliary information on Y and Z, e.g., outdated data, proxy variables…

3. Alternatively, compute uncertainty bounds, i.e., the bounds of the unidentifiable parameters (e.g., the correlation of Y and Z).
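For the correlation example in point 3, a minimal sketch is given below. It assumes a single common variable X and uses the fact that the 3x3 correlation matrix of (X, Y, Z) must be positive semi-definite, which restricts the unobserved correlation of Y and Z to an interval; the function name and the input values are hypothetical.

```python
from math import sqrt


def correlation_bounds(rho_xy, rho_xz):
    """Interval of values of corr(Y, Z) compatible with corr(X, Y) and corr(X, Z)."""
    centre = rho_xy * rho_xz  # value implied by conditional independence
    half_width = sqrt((1 - rho_xy**2) * (1 - rho_xz**2))
    return centre - half_width, centre + half_width


lower, upper = correlation_bounds(0.8, 0.6)
print(round(lower, 4), round(upper, 4))
```

The midpoint of the interval is the value implied by the conditional independence assumption, so the width of the interval gives an idea of how much the matching result depends on that assumption.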

Page 17: Micro-Fusion

Modules for statistical matching in the handbook

1. Matching different observations from different sources (statistical matching)

2. Statistical matching methods

Page 18: Micro-Fusion

Obtaining consistent data - Example

• Key variables are available from reliable administrative data (e.g., Turnover, number of Employees, total wages paid - Wages).

• The SBS requires more detail, so a sample survey is conducted to obtain the additional variables.

• For Turnover and the other key variables the register values are used; for the remaining variables the survey values are used.

Page 19: Micro-Fusion

Integration of data sources with different units - Example

Variable  Name                                      Survey values  Composite (I)
x1        Profit                                    330            330
x2        Employees (Number of employees)           20             25
x3        Turnover main (Turnover main activity)    1000           1000
x4        Turnover other (Turnover other activities) 30            30
x5        Turnover (Total turnover)                 1030           950
x6        Wages (Costs of wages and salaries)       500            550
x7        Other costs                               200            200
x8        Total costs                               700            700

Page 20: Micro-Fusion

Obtaining consistent data - Example

• Business records have to adhere to a number of accounting rules and logical constraints, e.g.

1. e1: x1 – x5 + x8 = 0 (Profit = Turnover – Total Costs)

2. e2: –x3 + x5 – x4 = 0 (Turnover = Turnover main + Turnover other)

3. e3: –x6 – x7 + x8 = 0 (Total Costs = Wages + Other costs)

Page 21: Micro-Fusion

Integration of data sources with different units - Example

• The composite record violates the edit rules

• To obtain a consistent record, some of the values have to be changed or “adjusted”.
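The violation can be checked by evaluating the three edits e1, e2 and e3 on the survey record and on the composite record from the table. The sketch below does this with NumPy; the variable names are illustrative, while the values and the edits are those of the example.

```python
import numpy as np

# Rows encode e1, e2, e3 over (x1, ..., x8); each edit requires row @ x == 0.
E = np.array([
    [1, 0,  0,  0, -1,  0,  0, 1],   # e1: x1 - x5 + x8 = 0
    [0, 0, -1, -1,  1,  0,  0, 0],   # e2: -x3 + x5 - x4 = 0
    [0, 0,  0,  0,  0, -1, -1, 1],   # e3: -x6 - x7 + x8 = 0
])

survey = np.array([330, 20, 1000, 30, 1030, 500, 200, 700])
composite = np.array([330, 25, 1000, 30, 950, 550, 200, 700])


def edit_residuals(x):
    """Return E @ x; a non-zero entry means the corresponding edit is violated."""
    return E @ x


print(edit_residuals(survey))     # [0 0 0]
print(edit_residuals(composite))  # [ 80 -80 -50]
```

The survey record satisfies all three edits, whereas replacing Turnover and Wages with the register values breaks each of them.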

Page 22: Micro-Fusion

Adjusting methods

1. Prorating

2. Minimum adjustment methods

3. Generalised ratio adjustment.

Page 23: Micro-Fusion

Minimum adjustment methods

• e1: x1 – x5 + x8 = 0 (Profit = Turnover – Total Costs)

• e2: –x3 + x5 – x4 = 0 (Turnover = Turnover main + Turnover other)

• e3: –x6 – x7 + x8 = 0 (Total Costs = Wages + Other costs)

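can be expressed in the form Ex = c with

\[
E = \begin{pmatrix}
1 & 0 & 0 & 0 & -1 & 0 & 0 & 1 \\
0 & 0 & -1 & -1 & 1 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & -1 & -1 & 1
\end{pmatrix},
\qquad
c = \begin{pmatrix} 0 \\ 0 \\ 0 \end{pmatrix}
\]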

Page 24: Micro-Fusion

Minimum adjustment methods

• More generally, edits can be expressed as Ax ≥ b

The min. adj. method consists in finding a solution for

\[
\tilde{x} = \arg\min_{x} D(x, x_0) \quad \text{s.t.} \quad A\tilde{x} \ge b
\]

x_0: observed values of the variables that can be modified
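A minimal sketch of one special case is given below. It assumes that D is the squared Euclidean distance and that all edits are equalities Ex = c, in which case the solution has a closed form. It is applied to the composite record of the example, keeping the register variables (Employees, Turnover, Wages) fixed and adjusting only the survey variables; the choice of distance, the function name and the split into fixed and free variables are assumptions for illustration.

```python
import numpy as np


def least_squares_adjust(x0, E, c, free):
    """Minimise ||x - x0||^2 over the free variables subject to E @ x == c."""
    x0 = np.asarray(x0, dtype=float)
    free = np.asarray(free)
    fixed = np.setdiff1d(np.arange(len(x0)), free)

    E_free = E[:, free]
    rhs = c - E[:, fixed] @ x0[fixed]          # move fixed values to the right-hand side
    resid = E_free @ x0[free] - rhs            # current violation of each edit
    lam = np.linalg.solve(E_free @ E_free.T, resid)

    x = x0.copy()
    x[free] = x0[free] - E_free.T @ lam        # projection onto the constraint set
    return x


E = np.array([
    [1, 0,  0,  0, -1,  0,  0, 1],   # e1: x1 - x5 + x8 = 0
    [0, 0, -1, -1,  1,  0,  0, 0],   # e2: -x3 + x5 - x4 = 0
    [0, 0,  0,  0,  0, -1, -1, 1],   # e3: -x6 - x7 + x8 = 0
], dtype=float)
c = np.zeros(3)

composite = [330, 25, 1000, 30, 950, 550, 200, 700]
free = [0, 2, 3, 6, 7]  # x1, x3, x4, x7, x8 come from the survey and may change

adjusted = least_squares_adjust(composite, E, c, free)
print(adjusted)  # [260.  25. 960. -10. 950. 550. 140. 690.]
```

The adjusted record satisfies all three edits, but Turnover other (x4) becomes negative. Handling inequality edits such as non-negativity requires the general formulation Ax ≥ b and a constrained optimisation solver rather than this closed form.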

Page 25: Micro-Fusion

Microintegration modules in the handbook

1. Reconciling conflicting micro-data

2. Prorating

3. Minimum adjustment methods

4. Generalised ratio adjustments

Page 26: Micro-Fusion

Authors of the modules

1. Introductory module

• Di Zio M. (Istat)

2. Modules on record linkage - object matching

• Willenborg L., Van de Laar R. (CBS)

• Tuoto T., Cibella N. (Istat)

3. Modules on statistical matching

• Scanu M., D’Orazio M. (Istat)

4. Modules on microintegration

• Pannekoek J. (CBS)