Publishing Census as Linked Open Data. A Case Study Irene Petrou George Papastefanatos Institute for the Management of Information Systems RC “Athena” 2nd International Workshop on Open Data (WOD 2013) @ Paris Theodore Dalamagas

Publishing Census as Linked Open Data.

A Case Study

Irene Petrou

George Papastefanatos

Institute for the Management of Information Systems

RC “Athena”

2nd International Workshop on Open Data (WOD 2013) @ Paris

Theodore Dalamagas

Outline

− Introduction to LOD and statistical data

− Greek census data overview

− LOD Technology adopted

− Case Study: Publishing Population Data

− Conclusions and Future Work

Linked Open Data (LOD)

• Principles of Linked Data by Tim Berners-Lee:

1. Use URIs as names for things

2. Use HTTP URIs so that people can look up those names

3. When someone looks up a URI, provide useful information, using the standards (RDF, SPARQL)

4. Include links to other URIs so that they can discover more things

Statistical data and related vocabularies

• Statistical data are everywhere!

  – Consumed by governmental institutions, private organizations, journalists, scientists, etc.

• Statistical vocabularies and standards:

  – SDMX (Statistical Data and Metadata eXchange) standard

    • Sponsored by EUROSTAT, UN, World Bank, BIS, ECB, IMF, OECD

    • SDMX Content-Oriented Guidelines (2009) (COGs)

      – Cross-Domain Concepts

      – Cross-Domain Code Lists

      – Statistical Subject-Matter Domains

      – Metadata Common Vocabulary

  – SCOVO (Statistical Core Vocabulary)

    • Simple and minimal

    • Concepts: dataset, data item and dimension

  – Data Cube Vocabulary

Greek Census Data Overview

Inhabitants (Π-1.2)

• sex, marital status, nationality, educational level, address, etc.

Households and dwellings (Π-1.1)

• type of residence, total area, facilities, number of residents, etc.

Motivation - Why publish these data as LOD?

─ Easier-to-process format

─ Crawling and querying via SPARQL

─ Identifiable and linkable

─ Comparable against other datasets

─ Consumption by third parties

─ Data exploration and development of novel applications

─ Consistency and uniformity between datasets

Outline of the Data Cube Vocabulary

• Multidimensional model (Cube)

• Components: dimensions, attributes, measures

• Dimensions: what the observation applies to

• Measures: what phenomenon is being observed

• Attributes: how it was measured
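
The three component roles can be made concrete with a small sketch in plain Python. The class, field names and values below are hypothetical, chosen only for illustration; they are not part of the Data Cube Vocabulary, and the figures are placeholders rather than census results.

```python
from dataclasses import dataclass, field


@dataclass
class Observation:
    """One cell of a statistical cube (illustrative only, not a qb: API)."""
    dimensions: dict  # what the observation applies to
    measures: dict    # the phenomenon being observed
    attributes: dict = field(default_factory=dict)  # how it was measured


def describe(o: Observation) -> str:
    """Render an observation as a one-line summary."""
    where = ", ".join(f"{k}={v}" for k, v in o.dimensions.items())
    what = ", ".join(f"{k}={v}" for k, v in o.measures.items())
    how = ", ".join(f"{k}={v}" for k, v in o.attributes.items())
    return f"[{where}] {what} ({how})"


# Placeholder values, not real census figures.
obs = Observation(
    dimensions={"geocode": "0102"},
    measures={"population": 1000},
    attributes={"unitOfMeasure": "inhabitants"},
)
print(describe(obs))
```

Keeping the three roles in separate fields mirrors the cube model: the dimensions identify the cell, the measure carries the value, and the attributes qualify how to read it.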

Google Refine

• Simple and powerful tool

• Clean up messy data

• Process data with its own expression language (GREL)

• Transform data between different formats, such as TSV, CSV, *SV, Excel, JSON, XML, and Google Data documents

• RDF Refine plugin to convert files to RDF

URI Scheme

• Base URI: http://linked-statistics.gr

• The URI scheme distinguishes between:

− Schema – Structural components (DSD, Component Specifications)

  {BASE_URI}/schema#{ComponentName}

  @prefix schema:<http://linked-statistics.gr/schema/>

− Dataset and observations

  {BASE_URI}/data/{DatasetName}

  {BASE_URI}/data/{DatasetName}#{DatasetKey}

  @prefix data:<http://linked-statistics.gr/data/>

− Concepts and their values

  {BASE_URI}/dic/{ConceptName}

  {BASE_URI}/dic/{ConceptName}#{value}

  @prefix dic:<http://linked-statistics.gr/dic/>
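
A minimal sketch of how URIs could be minted under this scheme, assuming the base URI is used with its `http://` scheme included. The function names are hypothetical, and the sketch follows the URI patterns as listed (with `#` fragments) rather than the `@prefix` declarations.

```python
from urllib.parse import quote

BASE_URI = "http://linked-statistics.gr"  # base URI given on the slide


def _seg(name: str) -> str:
    """Percent-encode one path or fragment segment."""
    return quote(str(name), safe="")


def schema_uri(component_name: str) -> str:
    """Structural component (DSD, component specification)."""
    return f"{BASE_URI}/schema#{_seg(component_name)}"


def dataset_uri(dataset_name: str) -> str:
    """A dataset."""
    return f"{BASE_URI}/data/{_seg(dataset_name)}"


def observation_uri(dataset_name: str, dataset_key: str) -> str:
    """One observation inside a dataset, identified by its key."""
    return f"{dataset_uri(dataset_name)}#{_seg(dataset_key)}"


def concept_uri(concept_name: str) -> str:
    """A concept (code list)."""
    return f"{BASE_URI}/dic/{_seg(concept_name)}"


def concept_value_uri(concept_name: str, value: str) -> str:
    """One value of a concept."""
    return f"{concept_uri(concept_name)}#{_seg(value)}"


print(observation_uri("PopulationPerGeocodeCensus2011", "0102"))
print(concept_value_uri("geocode", "0102"))
```

Centralising URI construction in a few helpers keeps dataset, observation and concept identifiers consistent across the conversion.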

Converting to RDF

• Step 1 – Download the census results in .xls format from the Hellenic Statistical Authority website and import the data into Google Refine

  • Census results: permanent population in Greece for 2011, based on the place of residence

  • Columns: geographical code (Dimension), Permanent population (Measure)

• Step 2 – Clean up the data (remove unwanted empty rows and cells)

• Step 3 – Build the skeleton with the Data Cube Vocabulary:

  a. Define the components

  b. Define the DSD, Dataset and Component Specifications

  c. Define Observations

• Step 4 – Export the data to an RDF file as RDF/XML
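
The clean-up and observation steps can be approximated in a few lines of plain Python. This is only a sketch of the idea, not the workflow used in the case study: it swaps Google Refine and the RDF Refine plugin for the standard library, emits N-Triples instead of RDF/XML, and uses placeholder rows rather than real census figures. The `schema#` form of the property URIs is an assumption based on the URI patterns.

```python
import csv
import io

QB = "http://purl.org/linked-data/cube#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
XSD_INT = "http://www.w3.org/2001/XMLSchema#integer"
BASE = "http://linked-statistics.gr"
DATASET = f"{BASE}/data/PopulationPerGeocodeCensus2011"

# Placeholder rows standing in for the spreadsheet export (not real figures).
RAW = "geocode,population\n0102,1000\n,\n0103, 2500 \n\n"


def clean(rows):
    """Step 2: drop empty rows and strip stray whitespace from cells."""
    for row in rows:
        cells = [c.strip() for c in row]
        if len(cells) == 2 and all(cells):
            yield cells


def to_ntriples(geocode, population):
    """Step 3c: emit the triples of one qb:Observation."""
    obs = f"<{DATASET}#{geocode}>"
    return [
        f"{obs} <{RDF_TYPE}> <{QB}Observation> .",
        f"{obs} <{QB}dataSet> <{DATASET}> .",
        f"{obs} <{BASE}/schema#geocodeDim> <{BASE}/dic/geocode#{geocode}> .",
        f'{obs} <{BASE}/schema#population> "{population}"^^<{XSD_INT}> .',
    ]


reader = csv.reader(io.StringIO(RAW))
next(reader)  # skip the header row
triples = [t for g, p in clean(reader) for t in to_ntriples(g, p)]
print("\n".join(triples))
```

Each cleaned row yields one observation whose URI reuses the geocode as its key, so the same key links the observation to its administrative division.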

Modelling with Data Cube Vocabulary

Data Cube Vocabulary | Custom Vocabulary | Instances

qb:DimensionProperty, qb:CodedProperty | schema:geocodeDim | URI of an administrative division (geographical code, “geocode”)

qb:MeasureProperty | schema:population | number

qb:AttributeProperty | schema:UnitOfMeasure | URI representing that the population is measured by number of inhabitants

qb:DataStructureDefinition | schema:PopulationPerGeocodeCensus2011 |

qb:DataSet | data:PopulationPerGeocodeCensus2011 |

qb:Observation | data:PopulationPerGeocodeCensus2011#{geocode} * |

* Each observation is linked to its administrative division through the schema:geocodeDim property, and the population of that geocode is given by the schema:population property.
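
As a rough illustration of how these pieces fit together, the sketch below renders the data structure definition and the dataset as a Turtle fragment. The function name is hypothetical and the prefix declarations are omitted; `qb:component`, `qb:dimension`, `qb:measure`, `qb:attribute` and `qb:structure` are Data Cube Vocabulary terms, while the `schema:` and `data:` names are the ones listed above.

```python
# (role, property) pairs for the three component specifications.
COMPONENTS = [
    ("qb:dimension", "schema:geocodeDim"),
    ("qb:measure", "schema:population"),
    ("qb:attribute", "schema:UnitOfMeasure"),
]


def dsd_turtle(dsd, dataset, components):
    """Render a DSD with its component specifications, plus the dataset."""
    specs = [f"    qb:component [ {role} {prop} ]" for role, prop in components]
    lines = [
        f"{dsd} a qb:DataStructureDefinition ;",
        " ;\n".join(specs) + " .",
        "",
        f"{dataset} a qb:DataSet ;",
        f"    qb:structure {dsd} .",
    ]
    return "\n".join(lines)


turtle = dsd_turtle(
    "schema:PopulationPerGeocodeCensus2011",
    "data:PopulationPerGeocodeCensus2011",
    COMPONENTS,
)
print(turtle)
```

The dataset points at its structure through `qb:structure`, so a consumer can discover which dimension, measure and attribute every observation is expected to carry.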

Modelling Administrative divisions of residence

• Characteristics of each division:

– hierarchical geocode value: dic:geocode#{geocodeValue}

– description: skos:prefLabel property

– administrative level: dic:haslevel property
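
A short sketch of the triples that could describe one division, written as N-Triples from plain Python. The function name and the label text are hypothetical, and the `geolevel#{level}` URI form is an assumption; only the `dic:` namespace, `skos:prefLabel` and `dic:haslevel` come from the slides.

```python
DIC = "http://linked-statistics.gr/dic/"
SKOS_PREF_LABEL = "http://www.w3.org/2004/02/skos/core#prefLabel"


def division_triples(geocode, label, level):
    """Describe one administrative division: its label and its level."""
    subject = f"<{DIC}geocode#{geocode}>"
    # Escape backslashes and quotes so the literal stays well-formed.
    escaped = label.replace("\\", "\\\\").replace('"', '\\"')
    return [
        f'{subject} <{SKOS_PREF_LABEL}> "{escaped}" .',
        f"{subject} <{DIC}haslevel> <{DIC}geolevel#{level}> .",
    ]


# "Example division" is a placeholder label, not the real name of geocode 0102.
triples = division_triples("0102", "Example division", 5)
print("\n".join(triples))
```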

Example – Population of the division with the geocode 0102

[Figure: graph with nodes labelled observation, geocode#0102 and geolevel#5]

RDF Output Example

[Figure: RDF output listings labelled observation and geocode]

Conclusions and Future Work

• Case study on publishing census data as LOD

• The data come from Greece’s 2011 census of the resident population

• Tabular data (.xls file) converted to RDF

• Further work aims at extending the proposed data model for representing more statistical indexes and more complex census datasets

• New release of Data Cube Vocabulary (13/03/13)

• Configuration of SPARQL endpoint service

Thank you very much!

Questions?

All published data are available at:

http://linked-statistics.gr