Development of guidelines for publishing statistical data as linked open data
MERGING STATISTICS AND GEOSPATIAL INFORMATION IN MEMBER STATES - POLAND

Mirosław Migacz
INSPIRE Conference 2016
Barcelona, 26 September 2016

Overall objective

Support decision-making processes by providing standardized, usable and open georeferenced statistical data.

What is linked open data?

• Internet: a collection of documents published online, each accessible at a Web location identified by a URL.

• These documents are mainly human-readable and cannot be understood by machines.

• Linked open data: data published in machine-readable formats, described and connected using Uniform Resource Identifiers (URIs), so that people and machines can collect the data and put it together to do all kinds of things with it (as permitted by the licence).

source:  https://joinup.ec.europa.eu/community/ods/description (CC 2.0)

Linked open data

• URI – for names

• RDF – to describe data

• SPARQL – to query for data

source:  https://joinup.ec.europa.eu/community/ods/description (CC 2.0)

Uniform Resource Identifier (URI)

To "make a long story short": an object identified by an internet address.

A country, e.g. Belgium

http://publications.europa.eu/resource/authority/country/BEL 

A dataset, e.g. Countries Named Authority List

http://publications.europa.eu/resource/authority/country/

In official statistics it can look like this:

http://teryt.stat.gov.pl/32/18/05/3 ‐ gmina Węgorzyno

source:  https://joinup.ec.europa.eu/community/ods/description (CC 2.0)

RDF & SPARQL

Resource Description Framework (RDF) is a standard model for representing data and resources on the Web.

RDF breaks every piece of information down in triples:

• Subject – a resource, which may be identified with a URI.

• Predicate – a URI‐identified reused specification of the relationship.

• Object – a resource or literal to which the subject is related.

source:  https://joinup.ec.europa.eu/community/ods/description (CC 2.0)

Example (subject, predicate, object):

http://example.org/place/Brussels is the capital of "Belgium" (object given as a literal)

OR

http://example.org/place/Brussels is the capital of http://example.org/place/Belgium (object given as a resource)

SPARQL is a standardised language for querying RDF data.
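
To make the triple model concrete, here is a minimal sketch in plain Python (standard library only, not an RDF library). It stores the two Brussels triples from the slide and matches them against a pattern in which `None` plays the role of a SPARQL variable. The predicate URI `http://example.org/ontology/isCapitalOf` and the `("literal", value)` convention are assumptions introduced for illustration; the slide only gives the phrase "is the capital of".

```python
BRUSSELS = "http://example.org/place/Brussels"
BELGIUM = "http://example.org/place/Belgium"
# Hypothetical predicate URI: the slide only says "is the capital of".
IS_CAPITAL_OF = "http://example.org/ontology/isCapitalOf"

# A triple is (subject, predicate, object). URIs are plain strings;
# a literal is marked by wrapping it in a ("literal", value) tuple.
triples = [
    (BRUSSELS, IS_CAPITAL_OF, BELGIUM),                 # object is a resource
    (BRUSSELS, IS_CAPITAL_OF, ("literal", "Belgium")),  # object is a literal
]


def match(graph, subject=None, predicate=None, obj=None):
    """Return the triples that fit the pattern; None acts as a wildcard,
    much like a variable in a SPARQL triple pattern."""
    return [
        (s, p, o)
        for (s, p, o) in graph
        if (subject is None or s == subject)
        and (predicate is None or p == predicate)
        and (obj is None or o == obj)
    ]


# Roughly: SELECT ?capital WHERE { ?capital <.../isCapitalOf> <.../Belgium> }
capitals = [s for (s, _, _) in match(triples, predicate=IS_CAPITAL_OF, obj=BELGIUM)]
print(capitals)
```

Only the first triple matches the query, because a URI and a literal are different objects even when they name the same country. This is one reason the fifth star asks for links to other resources rather than plain text labels.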

Five stars of linked open data

source:  https://joinup.ec.europa.eu/community/ods/description (CC 2.0)

★ Make your stuff available on the Web (whatever format) under an open license.

★★ Make it available as structured data (e.g., Excel instead of an image scan of a table).

★★★ Use non-proprietary formats (e.g., CSV instead of Excel).

★★★★ Use URIs to denote things, so that people can point at your stuff.

★★★★★ Link your data to other data to provide context.

Now

powiat łobeski (LAU 1) is identified in several different ways:

• 3218

• 4.4.32.64.18

• lobeski

• 4326418

Aim

powiat łobeski

http://nts.stat.gov.pl/4/4/32/64/18

Specific objectives

• identification of statistical units for which data can be published, with harmonization of their geometries for respective years

• building standardized URIs for statistical units

• identification and analysis of potential data sources

• plan for transformation of existing data sources into open formats

• creation of RDF metadata for data sources

• feasibility analysis for publishing linked open data

Identification of data sources

• Three major databases:
  • Local Data Bank
    • biggest set of statistical information available for a wide range of years
    • updated monthly
  • Demography Database
    • integrated data source for state and structure of population, vital statistics and migrations
  • Development monitoring system STRATEG
    • a system for facilitating and monitoring the development policy
    • key measures to monitor execution of strategies at local, regional, transregional and EU level

Identification of data sources

• Other data sources:
  • publications
  • tables
  • communiques
  • announcements
  • articles

Identification of data sources

• Metadata:
  • thematic category
  • format (PDF, DOC, XLS, CSV)
  • spatial reference (country, NUTS, LAU, functional areas, urban areas)
  • temporal reference (years)
  • presence of identifiers (TERYT, NTS, NUTS)
  • update cycle
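
One possible way to hold these metadata items per data source is sketched below as a Python dataclass. The class name, field names and the example record are assumptions made for illustration; the slide only lists which attributes were collected, not how they were stored.

```python
from dataclasses import dataclass, asdict, field


@dataclass
class DataSourceMetadata:
    """One record per data source, with the attributes listed on the slide."""
    name: str
    thematic_category: str
    file_format: str            # e.g. "PDF", "DOC", "XLS", "CSV"
    spatial_reference: list     # e.g. ["country", "NUTS", "LAU"]
    temporal_reference: list    # years covered
    identifiers: list = field(default_factory=list)  # e.g. ["TERYT", "NTS"]
    update_cycle: str = "unknown"

    def is_open_format(self) -> bool:
        # Of the formats listed on the slide, CSV is the one the five-star
        # scheme gives as its example of a non-proprietary format.
        return self.file_format.upper() == "CSV"


# Hypothetical record; every value here is invented for illustration.
record = DataSourceMetadata(
    name="example population table",
    thematic_category="demography",
    file_format="xls",
    spatial_reference=["LAU"],
    temporal_reference=[2014, 2015],
    identifiers=["TERYT"],
    update_cycle="yearly",
)
print(asdict(record))
print(record.is_open_format())
```

A flat record like this maps naturally onto RDF later on: each field becomes a predicate and each data source becomes a subject.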

Preliminary analysis of data sources

• Key aspects:
  • openness
  • redundancy of information
  • popularity (based on view and download statistics)

• Inclusion / exclusion of the data source

Statistical units harmonization

• Basis: NTS (Nomenclature of Territorial Units for Statistical Purposes)

Name        NTS level  NUTS/LAU  Identifier
Region      1          NUTS 1    1.6
Voivodship  2          NUTS 2    2.6.22
Subregion   3          NUTS 3    3.6.22.40
Powiat      4          LAU 1     4.6.22.40.11
Gmina       5          LAU 2     5.6.22.40.11.01.1

Statistical units harmonization

• Input data:
  • administrative boundaries since 2002 for LAU 2 (gmina), excluding 2007

• Harmonization process:
  • structure standardization
  • standardization of identifiers (creating NTS identifiers)
  • aggregation to higher-level units (LAU 1 -> NUTS 1)
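
The aggregation step can be sketched from the identifier structure alone. The code below assumes the layout seen in the example identifiers of the NTS table (the first component is the level, and a unit at level k is described by the next k components); this rule is inferred from that single example, not taken from an official specification. Plain numbers stand in for geometries to keep the sketch dependency-free; real geometries would be unioned per parent unit in the same grouping. The second gmina identifier and both values are hypothetical.

```python
def nts_ancestors(identifier: str) -> list:
    """Return the identifiers of all higher-level units, nearest first.

    Assumes the first component is the NTS level (1-5) and that a unit at
    level k is described by the next k components (a gmina at level 5
    carries one extra trailing component).
    """
    parts = identifier.split(".")
    level = int(parts[0])
    codes = parts[1:]
    return [".".join([str(k)] + codes[:k]) for k in range(level - 1, 0, -1)]


def aggregate(values_by_unit: dict, target_level: int) -> dict:
    """Sum the values of lower-level units into their ancestor at target_level."""
    totals = {}
    for unit, value in values_by_unit.items():
        for ancestor in nts_ancestors(unit):
            if ancestor.startswith(f"{target_level}."):
                totals[ancestor] = totals.get(ancestor, 0) + value
    return totals


print(nts_ancestors("5.6.22.40.11.01.1"))

# Hypothetical values for two gminas in the same powiat.
gmina_values = {"5.6.22.40.11.01.1": 10, "5.6.22.40.11.02.2": 5}
print(aggregate(gmina_values, target_level=4))
```

For the gmina in the table this yields the powiat, subregion, voivodship and region identifiers shown in the rows above it.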

Statistical units harmonization

• Non-standard statistical units:
  • functional areas
  • urban areas

• Groups of NTS units

• Derive mostly from strategic documents

• Changes of geometries in time to be determined

Statistical units URIs

• NTS as basic classification

Name        NTS level  NUTS/LAU  Identifier         URI (http://nts.stat.gov.pl/...)
Region      1          NUTS 1    1.6                …1/6
Voivodship  2          NUTS 2    2.6.22             …2/6/22
Subregion   3          NUTS 3    3.6.22.40          …3/6/22/40
Powiat      4          LAU 1     4.6.22.40.11       …4/6/22/40/11
Gmina       5          LAU 2     5.6.22.40.11.01.1  …5/6/22/40/11/01/1

Full example:

http://nts.stat.gov.pl/5/6/22/40/11/01/1
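
The mapping from identifier to URI shown here amounts to replacing dots with slashes under a fixed base address. A minimal sketch, assuming the base `http://nts.stat.gov.pl/` proposed in the slides (whether these URIs actually resolve is not stated in the talk); the function names are illustrative:

```python
BASE_URI = "http://nts.stat.gov.pl/"


def nts_to_uri(identifier: str) -> str:
    """Turn a dotted NTS identifier into a URI by swapping dots for slashes."""
    return BASE_URI + identifier.replace(".", "/")


def uri_to_nts(uri: str) -> str:
    """Inverse of nts_to_uri; raises ValueError for URIs outside BASE_URI."""
    if not uri.startswith(BASE_URI):
        raise ValueError(f"not an NTS URI: {uri}")
    return uri[len(BASE_URI):].strip("/").replace("/", ".")


print(nts_to_uri("5.6.22.40.11.01.1"))
print(uri_to_nts("http://nts.stat.gov.pl/4/4/32/64/18"))
```

Because the path mirrors the hierarchy, dropping trailing path segments (and lowering the leading level digit) leads to the URI of the parent unit.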

Data transformation plan

• Test workflow from ontology to SPARQL endpoint

• Decide what will be published as Open Data
  • three major databases
  • other data sources

• Create ontology

• Map to existing databases

• Export to RDF data store

• Publish on linked data server

• Workflow tested on STRATEG database
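
The "map to existing databases" and "export to RDF" steps can be illustrated without any RDF tooling. The sketch below is a hand-written stand-in, not the mapping mechanism used in the project: it reads rows from an in-memory SQLite table and writes them as N-Triples lines. The table name, column names, the `http://example.org/` vocabulary and the inserted row are all hypothetical.

```python
import sqlite3

XSD = "http://www.w3.org/2001/XMLSchema#"
EX = "http://example.org/"  # hypothetical namespace for this sketch


def literal(value, datatype=None) -> str:
    """Format an N-Triples literal, escaping backslashes and double quotes."""
    text = str(value).replace("\\", "\\\\").replace('"', '\\"')
    return f'"{text}"^^<{datatype}>' if datatype else f'"{text}"'


def rows_to_ntriples(conn) -> list:
    """Map each database row to four triples describing one observation."""
    lines = []
    query = (
        "SELECT rowid, unit_nts, indicator, year, value "
        "FROM indicator_values ORDER BY rowid"
    )
    for rowid, unit, indicator, year, value in conn.execute(query):
        obs = f"<{EX}observation/{rowid}>"
        area = "<http://nts.stat.gov.pl/" + unit.replace(".", "/") + ">"
        lines += [
            f"{obs} <{EX}refArea> {area} .",
            f"{obs} <{EX}indicator> {literal(indicator)} .",
            f"{obs} <{EX}year> {literal(year, XSD + 'gYear')} .",
            f"{obs} <{EX}value> {literal(value, XSD + 'decimal')} .",
        ]
    return lines


# Hypothetical table and row, standing in for an existing statistical database.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE indicator_values "
    "(unit_nts TEXT, indicator TEXT, year INTEGER, value REAL)"
)
conn.execute(
    "INSERT INTO indicator_values VALUES ('4.4.32.64.18', 'population', 2015, 1234.0)"
)
ntriples = rows_to_ntriples(conn)
print("\n".join(ntriples))
```

The resulting lines could be loaded into an RDF data store. A mapping platform does the same job declaratively, which avoids maintaining export code like this by hand.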

Ontology - methods and tools

• Ontop: a platform to query databases as Virtual RDF Graphs using SPARQL
  • SPARQL 1.0 support
  • supports an interface for ontology development
  • intuitive and powerful mapping language
  • support for free and commercial DBMS
  • SPARQL endpoint

Mapping ontology on database

SPARQL query on mapped data

SPARQL endpoint tools for the web

• Apache Jena Fuseki
  • Fuseki is a SPARQL server. It allows REST-style SPARQL queries.
  • RDF data generated by Ontop is imported into Apache Jena

• Pubby
  • a Linked Data frontend for SPARQL endpoints
  • Pubby makes it easy to turn a SPARQL endpoint into a Linked Data server. It is implemented as a Java web application.
  • provides data at a given linked data URI

Fuseki SPARQL endpoint query
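
A REST-style query against such an endpoint can be sketched with the Python standard library, following the SPARQL Protocol (the query is passed as the `query` parameter of a GET request) and the SPARQL JSON results format. The endpoint address, the dataset name `stat` and the query itself are assumptions for illustration; the slide does not show the actual configuration.

```python
import json
import urllib.parse
import urllib.request

# Assumed local Fuseki endpoint; the dataset name "stat" is hypothetical.
ENDPOINT = "http://localhost:3030/stat/sparql"

QUERY = """
SELECT ?s ?p ?o
WHERE { ?s ?p ?o }
LIMIT 10
"""


def build_request(endpoint: str, query: str) -> urllib.request.Request:
    """Build a SPARQL Protocol GET request that asks for JSON results."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    return urllib.request.Request(
        url, headers={"Accept": "application/sparql-results+json"}
    )


def parse_bindings(payload: str) -> list:
    """Flatten SPARQL JSON results into a list of {variable: value} dicts."""
    data = json.loads(payload)
    return [
        {var: cell["value"] for var, cell in row.items()}
        for row in data["results"]["bindings"]
    ]


def run_query(endpoint: str = ENDPOINT, query: str = QUERY) -> list:
    """Send the query; this needs a running endpoint, so it is not called here."""
    with urllib.request.urlopen(build_request(endpoint, query)) as response:
        return parse_bindings(response.read().decode("utf-8"))
```

With a server running, `run_query()` would return up to ten rows, each a dictionary keyed by `s`, `p` and `o`.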

Query result facilitated by Pubby

Further work

• Consultation of the designed workflow during a study visit at the Madrid University of Technology

• Setting up an internal test linked data server to implement web tools

• Creating ontologies and workflows for databases and other data sources

Summary – results so far

• Harmonized geometries for statistical units

• Identified data sources with comprehensive metadata

• Preliminary data transformation plan with tools tested

Poland's data opening strategy

• launched this year

• aimed at opening data resources of government institutions in line with the goals of the five stars of linked open data

• the grant results (guidelines) are in line with the strategy

• increased probability of acquiring financing for a fully fledged implementation

INSPIRE Thematic Clusters

https://themes.jrc.ec.europa.eu (collaboration platform)

Statistical Cluster:

• statistical units

• population distribution (demography)

• human health and safety

Informal meeting of Cluster members during the coffee break (15:30-16:00)

Questions?