ENP Course Georgia
2018
Combining data from different sources and modes
Register system and some extra on Datafusion
Statistical unit centric model
Source: Holmberg, A Discussion on coverage in Administrative data, Journal of Official Statistics, Vol. 31, No. 3, 2015, pp. 515–525
Statistical Units in Practice
Domains with statistical registers
A system of statistical registers has:
1. Base or core registers – with important statistical sets
2. Other statistical sources – to access important variables
3. Linkage options between entities in different base registers, and between base registers and other statistical sources
4. Standard variables (fundamental variables)
5. Tailored statistical methods, quality assurance
6. Metadata
7. IT-tools for processing and maintenance
8. Rules for protecting Confidentiality and Privacy
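The linkage options in item 3 can be illustrated with a minimal sketch. All register contents, identifiers, and function names below are hypothetical and chosen only for illustration; real registers would use official identification keys and far more careful matching rules.

```python
# Hypothetical base registers, keyed by made-up unit identifiers.
persons = {
    "P1": {"dwelling_id": "D1"},
    "P2": {"dwelling_id": "D1"},
    "P3": {"dwelling_id": "D2"},
}
enterprises = {
    "E1": {"industry": "C10"},
    "E2": {"industry": "G47"},
}

# Hypothetical linkage table connecting units of the two base registers
# (for example, a register of jobs).
jobs = [("P1", "E1"), ("P2", "E2"), ("P3", "E1")]

# Hypothetical other statistical source that supplies one variable per person.
income = {"P1": 21000, "P3": 18500}


def link_persons_to_enterprises(persons, enterprises, jobs):
    """Link units across two base registers through the jobs table."""
    linked = []
    for person_id, enterprise_id in jobs:
        # Keep only links whose both ends exist in the base registers.
        if person_id in persons and enterprise_id in enterprises:
            linked.append({
                "person_id": person_id,
                "enterprise_id": enterprise_id,
                "industry": enterprises[enterprise_id]["industry"],
            })
    return linked


def add_variable(records, source, key, name):
    """Attach a variable from another source; unmatched units get None."""
    return [{**record, name: source.get(record[key])} for record in records]


linked = link_persons_to_enterprises(persons, enterprises, jobs)
linked_with_income = add_variable(linked, income, "person_id", "income")
```

Units without a match in the other source keep a missing value rather than being dropped, so coverage gaps stay visible for quality assurance.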
Properties of a Base register
Phase 1 – Create an ‘augmented’ register
Phase 2 – Create Statistical Register (SR) for the targeted population
Typical situation of a statistical register
Source: Falorsi, Fortini, Di Zio, DIME&ITDG Steering Group, Hungary, 19 October 2016 ISTAT
Building a statistical register cont.
Phase 3 – Compute Estimates from the SR in Main Domains
Phase 4 – Quality feedback from maintenance and validation surveys
http://www.afdb.org/en/knowledge/publications/guidelines-for-building-statistical-business-registers-in-africa/
SBR Guidelines: Economic Units Model
Source: African Development Bank
Data Fusion or Statistical Matching
[Figure: one file observing (X, Y) and another file observing (Y, Z)]
D’Orazio, M., Di Zio, M., and Scanu, M. (2006) Statistical Matching: Theory and Practice. Wiley and Sons, Chichester. http://www.wiley.com/go/matching
[Figure: a file containing (X, Y, Z)]
The microdata objective means creation of a synthetic dataset
Statistical matching aims at determining information on the joint distribution of (X, Y, Z), or at least on the pair of variables that are not observed jointly, (X, Z)
Data Fusion or Statistical Matching
[Figure: a file with (X, Y, Z) and a file with (Y, Z)]
D’Orazio, M., Di Zio, M., and Scanu, M. (2006) Statistical Matching for Categorical Data: Displaying Uncertainty and Using Logical Constraints. Journal of Official Statistics, Vol. 22, No. 1, pp. 137–157
Usually through imputation using the Conditional Independence Assumption (CIA)
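Written out, the conditional independence assumption states that X and Z are independent given the common variables Y. The notation below is a standard way to express this and is not taken from the slides; f denotes a density or probability function.

```latex
% Conditional independence of X and Z given Y
f(x, y, z) = f(y)\, f(x \mid y)\, f(z \mid y)
```

Under this assumption, f(x | y) can be estimated from the file that observes (X, Y), f(z | y) from the file that observes (Y, Z), and f(y) from either file or both. The assumption itself cannot be tested from the two files alone, because X and Z are never observed jointly.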
[Figure labels: File A; File B (donor)]
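As a rough illustration of imputation from a donor file, the sketch below uses random hot deck within classes of Y, which is one common technique consistent with the conditional independence assumption; it is not necessarily the method shown on the slide. The function name, variable names, and data are hypothetical, and File A is assumed to be the recipient holding (X, Y) while File B holds (Y, Z).

```python
import random
from collections import defaultdict


def hot_deck_impute(recipients, donors, y_key="y", z_key="z", seed=0):
    """Impute Z in the recipient file by drawing a donor value at random
    among donors that share the same value of the common variable Y."""
    rng = random.Random(seed)

    # Group donor Z values by their Y class.
    pool = defaultdict(list)
    for donor in donors:
        pool[donor[y_key]].append(donor[z_key])

    fused = []
    for record in recipients:
        candidates = pool.get(record[y_key])
        # No donor in this Y class: leave Z missing rather than guess.
        z_value = rng.choice(candidates) if candidates else None
        fused.append({**record, z_key: z_value})
    return fused


# Hypothetical recipient file (X, Y) and donor file (Y, Z).
file_a = [
    {"x": 34, "y": "urban"},
    {"x": 51, "y": "rural"},
    {"x": 27, "y": "urban"},
]
file_b = [
    {"y": "urban", "z": 1200},
    {"y": "urban", "z": 1500},
    {"y": "rural", "z": 900},
]

fused = hot_deck_impute(file_a, file_b)
```

Because Z is drawn using only Y, any association between X and Z in the fused file is the one implied by Y; the synthetic file carries no direct evidence about the relationship between X and Z.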
A ‘New’ Opportunity: Networks and the Semantic Web
• World Wide Web Consortium (W3C), RDF (Resource Description Framework), Neo4j, etc.
• Networks…
• Linked Open Data Initiative
• LOD and LOD2 (EU’s 7th Framework Programme)
The Semantic Web
Man-Made Technology Networks
Nature/Bio/Cognitive Networks
Information/Knowledge Networks
LESS STRUCTURED DATA, TRIPLES AND RDF
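[Figure: graph of statistical units connected by named relations.
Labels for units and attributes: Location, Person, Job, LKAU, Dwelling, Household, Enterprise, Industry, Employees.
Labels for relations: Located in (twice), Works at, Has job, Employee of, Member of (twice), Lives in, Is owned by, Employs.]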
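A minimal sketch of how such units and relations could be held as subject–predicate–object triples, using plain Python tuples rather than an RDF library. The relation names come from the figure labels, but which units each relation connects, and all identifiers such as "person:1", are assumptions made for illustration.

```python
# Each triple is (subject, predicate, object). The identifiers and the
# choice of which units each relation connects are assumed, not sourced.
triples = [
    ("person:1", "lives_in", "dwelling:1"),
    ("person:1", "member_of", "household:1"),
    ("person:1", "has_job", "job:1"),
    ("person:1", "employee_of", "enterprise:1"),
    ("job:1", "works_at", "lkau:1"),
    ("lkau:1", "located_in", "location:1"),
    ("lkau:1", "is_owned_by", "enterprise:1"),
    ("dwelling:1", "located_in", "location:2"),
]


def objects(triples, subject, predicate):
    """Return all objects linked to a subject by the given predicate."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def follow(triples, start, path):
    """Walk a chain of predicates from a start node and return the end nodes."""
    frontier = [start]
    for predicate in path:
        frontier = [o for node in frontier for o in objects(triples, node, predicate)]
    return frontier


# Where does person:1 work, and where do they live?
workplace = follow(triples, "person:1", ["has_job", "works_at", "located_in"])
residence = follow(triples, "person:1", ["lives_in", "located_in"])
```

Storing relations as triples means a new relation type can be added without changing a table schema, which is the sense in which the data is less structured than a conventional register.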