Eurostat
Big data in the European Statistical System
Michail SKALIOTIS – EUROSTAT, Head of Task Force 'Big Data'
Conference by STATEC and EUROSTAT
Savoir pour agir: la statistique publique au service des citoyens (Knowing in order to act: official statistics at the service of citizens)
Digital footprint
Datafication
Sensors
Proclamation of Pope Benedict XVI
2005
Proclamation of Pope Francis
2013
African proverb: “When the music changes, so does the dance”
If we fail to listen we will be out of step!
(Denise Lievesley)
Big data @ ESS – key points
ESS (European Statistical System) Scheveningen Memorandum, September 2013:
Examine the potential of big data sources for official statistics
Official Statistics big data strategy as part of wider government strategy
Address privacy and data protection
Collaboration at European and global level
Address need for skills
Partnerships between different stakeholders (government, academics, private sector)
Developments in methodology, quality assessment and IT
Adopt action plan and roadmap for the ESS
Big data @ ESS – key points
ESS (European Statistical System) Scheveningen Memorandum, September 2013
Task Force Big Data
Big Data Action Plan and Roadmap 1.0, September 2014
ESS pilots, 2016-2019
Implementation of ESS Vision 2020: Big Data project integral part of the portfolio
European Commission Communication "Towards a thriving data-driven economy"
Public Private Partnership on big data
International cooperation (UNSD, UNECE, etc.)
Areas in Big data roadmap
Policy
Quality framework
Skills
Experience sharing
Legislation
IT infrastructures
Methods
Ethics / Communication
Pilots
Action (example)
▫ Pilot projects, carried out by the Member States (ESSnet)
  • 2015-2019 (FPA / SGA construction)
  • Exploring different big data sources (but also IT architecture, partnerships), developing generic guidelines and frameworks
  • Enable the ESS to gradually integrate big data sources into the production of European and national statistics
Challenges
▫ cooperation, sharing of know-how
▫ development of a sound methodology ("from design-based to model-based approach")
▫ exploration & tentative implementation
Action (example)
▫ Training program for European statisticians (ESTP)
  • In the next years: dedicated courses on big data
  • Focus on big data sources and on big data tools
  • Acquiring the skills needed to assess sources and their quality, and the skills to use tools and to explore big data sources
Challenges
▫ new skills for NSI staff: statisticians vs. data scientists?
▫ computing capacity, hardware?
▫ analytical tools, software?
▫ storage?
Action (example)
▫ Project on the analysis of legislation and strategy (but also ethics and communication)
  • 2015-2017 (22 months)
  • Analysis for EU and for Member States at national level
▫ See also the Feasibility study on the use of mobile positioning data for tourism statistics (report on feasibility of access)
Challenges
▫ integrating official statistics in big data strategies
▫ getting access to data & continuity of access
▫ data security & privacy concerns
▫ pay for data?
Action (example)
▫ Cooperation with UN (lead) on a quality framework for big data
▫ Project on the analysis of ethics and communication (but also legislation and strategy)
  • 2015-2017 (22 months)
  • Analysis for EU and for Member States at national level
Challenges
▫ transversal challenges to all big data activities: quality and ethics & communication
▫ big data vs. statistics: "goodness of fit" (concepts, representativeness, …)
▫ impact of privacy and security concerns on public opinion?
Big data = multiple sources & multiple outputs
One source, multiple outputs: mobile phone data → tourism statistics, population statistics, migration statistics, traffic statistics, commuting statistics
Multiple sources, one output: mobile phone data, smart meters, VGI (volunteered geographic information) websites, satellite images → population statistics
Statistical domains
Tourism
Employment
Population
Migration
Balance of payments
Regional and GIS
Transport
ICT usage
Prices and inflation
Land use
Agriculture
National initiatives as a driver
CBS (Netherlands), ISTAT (Italy), ONS (UK), CSO (Ireland), Statistics Finland, SURS (Slovenia), …
Insights for world heritage sites from Wikipedia use
• Source
  • Hourly page views for each Wikipedia article
  • Content of Wikipedia articles
  • High timeliness, temporal detail and transparency, no geographical information
• Processing
  • Big Data Sandbox: computer cluster with 4 nodes
  • Tools: Pig, Map-Reduce, Python, R
  • Association of Wikipedia articles with specific World Heritage Sites (WHS)
• Output
  • Exposure of world heritage via Wikipedia
[Figure: Page views of English Wikipedia articles related to World Heritage Sites]
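The processing step of the Wikipedia pilot (associating articles with World Heritage Sites and summing their page views) can be illustrated with a minimal Python sketch. It is not the pilot's actual code, which ran on a cluster with Pig and Map-Reduce. The article-to-site mapping, the record layout and all counts below are hypothetical, made up only to show the aggregation.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical mapping from Wikipedia article titles to World Heritage Sites.
ARTICLE_TO_WHS = {
    "Acropolis_of_Athens": "Acropolis, Athens",
    "Parthenon": "Acropolis, Athens",
    "Stonehenge": "Stonehenge, Avebury and Associated Sites",
}


def daily_exposure(hourly_views, article_to_whs):
    """Sum hourly page views into daily totals per World Heritage Site.

    hourly_views: iterable of (article, ISO hour string, views) tuples.
    Articles without a WHS association are ignored.
    """
    totals = defaultdict(int)
    for article, hour, views in hourly_views:
        site = article_to_whs.get(article)
        if site is None:
            continue  # article is not linked to any site
        day = datetime.fromisoformat(hour).date().isoformat()
        totals[(site, day)] += views
    return dict(totals)


# Made-up illustrative records, not real page-view counts.
SAMPLE = [
    ("Parthenon", "2015-03-01T10:00", 120),
    ("Acropolis_of_Athens", "2015-03-01T11:00", 80),
    ("Stonehenge", "2015-03-01T10:00", 200),
    ("Parthenon", "2015-03-02T09:00", 50),
    ("Main_Page", "2015-03-01T10:00", 9999),
]

if __name__ == "__main__":
    exposure = daily_exposure(SAMPLE, ARTICLE_TO_WHS)
    for (site, day), views in sorted(exposure.items()):
        print(site, day, views)
```

Several articles can map to the same site, so the exposure of a site is the sum over all of its associated articles. On real data the same grouping would be distributed across the cluster nodes.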
Nowcasting Unemployment
• Source
  • Google Trends (others to be explored)
  • High timeliness, geo info available, low transparency
• Processing
  • Low computing power required
  • Time-series modelling (machine learning to be explored)
  • Tools: R
• Output
  • Nowcasting of unemployment: from a 1-month lag to the current time
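The nowcasting idea can be sketched as follows: official unemployment figures arrive with a one-month lag, while a search-interest index is already available for the current month. The slide does not specify the model (only "time-series modelling" in R), so the sketch below is an assumption: a simple least-squares regression of the monthly change in unemployment on the monthly change in the search index, written in Python for consistency with the other example. All function names and figures are hypothetical.

```python
def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def nowcast(unemployment, search_index):
    """Nowcast the current month's unemployment rate.

    unemployment: official rates up to last month (published with a lag).
    search_index: search-interest index up to the current month, so it has
                  exactly one more observation than `unemployment`.
    Assumed model: the monthly change in unemployment is linear in the
    monthly change in the search index.
    """
    if len(search_index) != len(unemployment) + 1:
        raise ValueError("search_index must have one more value than unemployment")
    d_u = [b - a for a, b in zip(unemployment, unemployment[1:])]
    d_s = [b - a for a, b in zip(search_index, search_index[1:])]
    # Fit on the months where both series are observed.
    intercept, slope = fit_line(d_s[:-1], d_u)
    # Apply the fitted relation to the latest change in the search index.
    return unemployment[-1] + intercept + slope * d_s[-1]


# Made-up figures for illustration only.
UNEMPLOYMENT = [7.0, 7.2, 7.1, 7.5, 7.8]   # official rate, ends last month
SEARCH_INDEX = [50, 52, 51, 55, 58, 60]    # search interest, ends this month

if __name__ == "__main__":
    print(round(nowcast(UNEMPLOYMENT, SEARCH_INDEX), 2))
```

A production nowcast would need seasonal adjustment, more regressors and out-of-sample validation. The sketch only shows how a timely auxiliary series closes the one-month publication gap.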
The statistical office of the future
Data flows in addition to surveys and censuses
Embedded in data flow – statistics 'everywhere'
Product designers in addition to data collection designers
Statistical modelling will be a major activity
From descriptive indicators to nowcasting (and forecasting)
Trust and quality will be key
New role in teaching digital literacy
Accreditation and certification instead of pure production
Address issues linked to quality & transparency, privacy & confidentiality, access to third party data sources & data sharing, scientific standards & methodology, professional ethics, skills, …
The NSI of the future: Official Statistics in a full-fledged IoT world
Svein Nordbotten: Use of electronically observed data in official statistics
Thank you for your attention!