Micro-Fusion. Presented by. Marco Di Zio Istat – Italian National institute of Statistics. Outline. Micro-Fusion in Memobust Objectives characterising Micro-Fusion settings Focus on some methods Structure of the Memobust handbook section concerning micro-fusion. Micro-Fusion in Memobust. - PowerPoint PPT Presentation
Eurostat
Micro-Fusion
Presented by
• Marco Di Zio
• Istat – Italian National Institute of Statistics
Outline
• Micro-Fusion in Memobust
• Objectives characterising Micro-Fusion settings
• Focus on some methods
• Structure of the Memobust handbook section concerning micro-fusion
Micro-Fusion in Memobust
• Micro Fusion in Memobust
– Integration of data sources composed of units (input: micro) in order to obtain a data set that is still composed of units (output: micro).
– It is focused on statistical techniques
Main settings of M-F in Memobust
• Integration of data sources composed of the same units (Record linkage-Object matching)
• Integration of sources composed of different units (Statistical matching)
• Making the integrated data consistent (Microintegration)
Example of Integration with the same units – (Record linkage-object matching)
• There is a register in which the main variables are observed; we want to integrate it with information from administrative sources and sample surveys
– Register of businesses with main characteristics: NUTS, NACE, number of employees, ...
– Financial statements and the Tax Authority sources
– Small and medium enterprise survey
Frameworks for record linkage
1. Unique unit identifier without error
2. Unit identifier has to be constructed from the available variables, which are free of errors
3. Unit identifier has to be constructed from the available variables, which are affected by errors
The Fellegi-Sunter decision rule
• Data sources A and B (with N_A and N_B observations)
• Choose k common matching variables X1,…,Xk
• Compare them (e.g. ci = 1 if Xi in A equals Xi in B, and ci = 0 otherwise) and obtain the comparison vector c = (c1,…,ck) for each couple of units (a,b)
Fellegi-Sunter (1969)
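• Compute the ratio

$$r(a,b) = \frac{m(c)}{u(c)} = \frac{P\big(c \mid (a,b) \in M\big)}{P\big(c \mid (a,b) \in U\big)}$$

• Couples (a,b) can be ordered and classified in M* and U* (or undefined Q*) sets according to r:

$$
\begin{aligned}
r(a,b) \ge T_m \;&\Rightarrow\; (a,b) \in M^* \\
T_m > r(a,b) > T_u \;&\Rightarrow\; (a,b) \in Q^* \\
r(a,b) \le T_u \;&\Rightarrow\; (a,b) \in U^*
\end{aligned}
$$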
Fellegi-Sunter
• The thresholds T_m and T_u are obtained by solving equations that minimise the size of the set Q* while keeping the false match rate and the false non-match rate at acceptable levels
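The decision rule can be sketched in a few lines of Python. This is only an illustration, not the handbook's implementation: it assumes that the components of the comparison vector are conditionally independent (a common simplification that the slides do not state), and the m- and u-probabilities and the thresholds `t_u`, `t_m` are hypothetical values chosen for the example.

```python
from math import prod

def comparison_vector(a, b, keys):
    """c_i = 1 if the two records agree on key i, 0 otherwise."""
    return tuple(1 if a[k] == b[k] else 0 for k in keys)

def likelihood_ratio(c, m, u):
    """r = m(c) / u(c), assuming conditional independence of the c_i.

    m[i] = P(c_i = 1 | pair is a match), u[i] = P(c_i = 1 | pair is a non-match).
    """
    m_c = prod(mi if ci else 1 - mi for ci, mi in zip(c, m))
    u_c = prod(ui if ci else 1 - ui for ci, ui in zip(c, u))
    return m_c / u_c

def classify(r, t_u, t_m):
    """Assign a pair to M* (link), U* (non-link) or Q* (undecided)."""
    if r >= t_m:
        return "M*"
    if r <= t_u:
        return "U*"
    return "Q*"

# Hypothetical parameters for two matching variables.
m = (0.9, 0.8)
u = (0.1, 0.2)
t_u, t_m = 0.5, 10.0

a = {"name": "ACME SRL", "nace": "2511"}
b = {"name": "ACME SRL", "nace": "2512"}
c = comparison_vector(a, b, ["name", "nace"])   # agrees on name only
print(c, classify(likelihood_ratio(c, m, u), t_u, t_m))
```

In practice the m- and u-probabilities are not known and have to be estimated, and the thresholds follow from the error rates one is willing to accept.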
Modules for record linkage-object matching in the handbook
1. Object matching (record linkage)
2. Object identifier matching
3. Unweighted matching of object characteristics
4. Weighted matching of object characteristics
5. Probabilistic record linkage
6. Fellegi-Sunter method for record linkage
Example of integration of data sources with different units – (Statistical matching)
• Combining Farms
– Farm Structure Survey
– Farm Accountancy Data survey
• Combining Income - Consumption
– Household Budget Survey
– Bank of Italy survey on Income
Statistical matching
Statistical matching methods
• Imputation methods
– Parametric methods
– Non-parametric methods (donor imputation)
– Mixed methods
Mixed methods
1. Estimate a parametric model (e.g. regression)
2. Use the model estimated in step 1 to predict values in both data sets (e.g. recipient A, donor B)
3. Use predicted values for finding a donor to impute in the recipient A (e.g. find the nearest neighbour in B according to a distance computed on regressed values)
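The three steps above can be sketched as follows. This is a minimal illustration under stated assumptions: the model is an ordinary least-squares regression of Z on X fitted on the donor file B, the distance is the absolute difference between predicted values, and the function name `mixed_method_impute` and the synthetic data are hypothetical.

```python
import numpy as np

def mixed_method_impute(X_A, X_B, z_B):
    """Impute Z in recipient A using donors from B (regression + nearest neighbour)."""
    XA1 = np.column_stack([np.ones(len(X_A)), X_A])   # add intercept
    XB1 = np.column_stack([np.ones(len(X_B)), X_B])
    # Step 1: estimate the parametric model Z ~ X on the donor file B.
    beta, *_ = np.linalg.lstsq(XB1, z_B, rcond=None)
    # Step 2: predict Z in both data sets.
    zhat_A = XA1 @ beta
    zhat_B = XB1 @ beta
    # Step 3: for each recipient, pick the donor with the closest predicted value
    # and impute the donor's *observed* value of Z.
    donor_idx = np.abs(zhat_A[:, None] - zhat_B[None, :]).argmin(axis=1)
    return z_B[donor_idx], donor_idx

# Synthetic example: A observes (X, Y), B observes (X, Z); Y is not needed here.
rng = np.random.default_rng(0)
X_A = rng.normal(size=(5, 2))
X_B = rng.normal(size=(50, 2))
z_B = 1.0 + X_B @ np.array([2.0, -1.0]) + rng.normal(scale=0.5, size=50)

z_imputed, donors = mixed_method_impute(X_A, X_B, z_B)
print(z_imputed)
```

Because the imputed values are observed values taken from B, they are always "live" values, while the regression step only guides the choice of donor.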
Limitations and alternatives in SM
1. Naïve methods are implicitly based on the conditional independence assumption (Y and Z are independent given the common variables X).
2. To overcome the limitation in point 1, use auxiliary information on Y and Z, e.g. outdated data, proxy variables, ...
3. Alternatively, compute uncertainty bounds, i.e. the bounds of the parameters that cannot be identified from the data (e.g. the correlation of Y and Z).
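As a small worked example of uncertainty bounds, consider the simplest case of a single common variable X. The correlation matrix of (X, Y, Z) must be positive semidefinite, which restricts the unobserved correlation of Y and Z to an interval determined by the two observed correlations. The sketch below assumes this single-X setting; the function name `corr_yz_bounds` is hypothetical.

```python
from math import sqrt

def corr_yz_bounds(r_xy, r_xz):
    """Admissible range of corr(Y, Z) given corr(X, Y) and corr(X, Z).

    Follows from requiring the 3x3 correlation matrix to be positive
    semidefinite. The midpoint r_xy * r_xz is the value implied by
    conditional independence of Y and Z given X.
    """
    centre = r_xy * r_xz
    half_width = sqrt((1 - r_xy**2) * (1 - r_xz**2))
    return centre - half_width, centre + half_width

lower, upper = corr_yz_bounds(0.8, 0.6)
print(lower, upper)   # approximately 0.0 and 0.96
```

The stronger the common variable is correlated with Y and Z, the narrower the interval becomes.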
Modules for statistical matching in the handbook
1. Matching different observations from different sources (statistical matching)
2. Statistical matching methods
Obtaining consistent data - Example
• Key variables are available from reliable administrative data (e.g. Turnover, number of Employees, total wages paid - Wages).
• The SBS requires more detail, so a sample survey is conducted to obtain the additional details.
• For Turnover and the other key variables the register values are used; for the remaining variables the survey values are used.
Integration of data sources with different units - Example
Variable  Name                                     Survey values  Composite (I)
x1        Profit                                   330            330
x2        Employees (Number of employees)          20             25
x3        Turnover main (Turnover main activity)   1000           1000
x4        Turnover other (Turnover other activities)  30          30
x5        Turnover (Total turnover)                1030           950
x6        Wages (Costs of wages and salaries)      500            550
x7        Other costs                              200            200
x8        Total costs                              700            700
Obtaining consistent data - Example
• Business records have to adhere to a number of accounting rules and logical constraints, e.g.
1. e1: x1 – x5 + x8 = 0 (Profit = Turnover – Total Costs)
2. e2: –x3 + x5 – x4 = 0 (Turnover = Turnover main + Turnover other)
3. e3: –x6 – x7 + x8 = 0 (Total Costs = Wages + Other costs)
Integration of data sources with different units - Example
• The composite record violates the edit rules.
• To obtain a consistent record, some of the values have to be changed or "adjusted".
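The violation can be verified directly. The sketch below encodes the edit rules e1 to e3 as rows of a coefficient matrix (variables ordered x1,…,x8) and evaluates them on the survey record and on the composite record from the table; the helper name `edit_residuals` is hypothetical.

```python
import numpy as np

# Rows: e1, e2, e3; columns: x1 ... x8.
E = np.array([
    [1, 0,  0,  0, -1,  0,  0, 1],   # e1:  x1 - x5 + x8 = 0
    [0, 0, -1, -1,  1,  0,  0, 0],   # e2: -x3 - x4 + x5 = 0
    [0, 0,  0,  0,  0, -1, -1, 1],   # e3: -x6 - x7 + x8 = 0
])

survey    = np.array([330, 20, 1000, 30, 1030, 500, 200, 700])
composite = np.array([330, 25, 1000, 30,  950, 550, 200, 700])

def edit_residuals(x):
    """Left-hand sides of e1..e3; all zeros means the record is consistent."""
    return E @ x

print(edit_residuals(survey))      # residuals 0, 0, 0: consistent
print(edit_residuals(composite))   # residuals 80, -80, -50: all three edits fail
```

The survey record satisfies every rule, whereas replacing Turnover and Wages by the register values breaks all three.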
Adjusting methods
1. Prorating
2. Minimum adjustment methods
3. Generalised ratio adjustment.
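The slides name prorating without defining it. As a hedged illustration, the sketch below assumes its simplest form: the components of a single additive edit are rescaled by a common factor so that they sum to a total that is kept fixed. The function name `prorate` is hypothetical, and the numbers come from the composite record (Turnover main and Turnover other against the register Turnover of 950).

```python
import numpy as np

def prorate(components, total):
    """Rescale the components proportionally so that they add up to `total`."""
    components = np.asarray(components, dtype=float)
    return components * (total / components.sum())

# x3 and x4 from the survey, x5 fixed at the register value.
x3_adj, x4_adj = prorate([1000, 30], total=950)
print(x3_adj, x4_adj)
```

Both components are reduced by the same factor 950/1030, so their ratio is preserved and edit e2 holds after the adjustment.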
Minimum adjustment methods
• e1: x1 – x5 + x8 = 0 (Profit = Turnover – Total Costs)
• e2: –x3 + x5 – x4 = 0 (Turnover = Turnover main + Turnover other)
• e3: –x6 – x7 + x8 = 0 (Total Costs = Wages + Other costs)
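can be expressed in the form Ex = c with

$$
E = \begin{pmatrix}
1 & 0 & 0 & 0 & -1 & 0 & 0 & 1 \\
0 & 0 & -1 & -1 & 1 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & -1 & -1 & 1
\end{pmatrix},
\qquad
c = \begin{pmatrix} 0 \\ 0 \\ 0 \end{pmatrix}
$$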
Minimum adjustment methods
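• More generally, edits can be expressed as Ax ≥ b
The minimum adjustment method consists in finding a solution of

$$\tilde{x} = \arg\min_{x} D(x, x_0) \quad \text{s.t.} \quad A\tilde{x} \ge b$$

where x0 denotes the observed values of the variables that can be modified and D is a distance function.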
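A minimal sketch of the minimum adjustment idea for the composite record, under assumptions that go beyond the slides: D is taken to be the squared Euclidean distance, only the equality edits Ex = c are imposed, and the register-based variables (x2, x5, x6) are held fixed while the remaining ones may change. With these choices the solution is the orthogonal projection of x0 onto the constraint set. The function name `minimum_adjustment` is hypothetical.

```python
import numpy as np

def minimum_adjustment(E, c, x0, free):
    """Least-squares adjustment of the `free` entries of x0 so that E x = c.

    Entries not listed in `free` are kept at their observed values.
    Assumes the rows of E restricted to the free columns are linearly independent.
    """
    x0 = np.asarray(x0, dtype=float)
    free = np.asarray(free)
    fixed = np.setdiff1d(np.arange(len(x0)), free)
    A = E[:, free]                            # constraints on adjustable variables
    rhs = c - E[:, fixed] @ x0[fixed]         # move the fixed part to the right-hand side
    lam = np.linalg.solve(A @ A.T, A @ x0[free] - rhs)   # Lagrange multipliers
    x = x0.copy()
    x[free] = x0[free] - A.T @ lam
    return x

E = np.array([
    [1, 0,  0,  0, -1,  0,  0, 1],   # e1
    [0, 0, -1, -1,  1,  0,  0, 0],   # e2
    [0, 0,  0,  0,  0, -1, -1, 1],   # e3
], dtype=float)
c = np.zeros(3)
composite = [330, 25, 1000, 30, 950, 550, 200, 700]
free = [0, 2, 3, 6, 7]               # x1, x3, x4, x7, x8 (0-based indices)

x_adj = minimum_adjustment(E, c, composite, free)
print(x_adj)
print(E @ x_adj)                     # all edits satisfied up to rounding
```

With these inputs the adjusted value of x4 comes out slightly negative, which illustrates why inequality edits of the form Ax ≥ b (for instance non-negativity) may also be needed, and why other adjustment methods can be preferable.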
Microintegration modules in the handbook
1. Reconciling conflicting micro-data
2. Prorating
3. Minimum adjustment methods
4. Generalised ratio adjustments
Authors of the modules
1. Introductory module
• Di Zio M. (Istat)
2. Modules on record linkage - object matching
• Willenborg L., Van de Laar R. (CBS)
• Tuoto T., Cibella N. (Istat)
3. Modules on statistical matching
• Scanu M., D’Orazio M. (Istat)
4. Modules on microintegration
• Pannekoek J. (CBS)