
Virtual meeting 4

3 June 2016

ISA Programme Action 1.1

StatDCAT-AP

Opening, agenda, tour de table

Agenda

1. Opening, agenda, tour de table

2. Outcomes Rome meeting, 13 May 2016

3. Objectives of this meeting

4. Overview of DCAT-AP and StatDCAT-AP

5. Extensions: Dimensions and attributes, Quality aspects, Visualisation, Other extensions

6. Mapping theme vocabulary (Eurostat MDR)

7. SDMX Transformation mechanism

8. Any other business

9. Next steps

Tour de table

Outcome of Rome meeting 13 May 2016

Key points from Rome meeting (1)

• Decisions on extensions:

o Do not include ‘Number of observations’ and ‘Dimensions as keywords’

o Include ‘Number of data series’

o Further discussion on ‘Link to visualisation’, ‘Dimension as property’, ‘Quality aspects’, ‘Statistical population’, ‘Statistical unit’ and ‘Length of time series’

o Discuss proposals for RDF terms

Key points from Rome meeting (2)

• SDMX transformation mechanism

o Not part of the specifications (relevant for SDMX implementers migrating to StatDCAT-AP) → move to annex

o Slight preference for the use of a Metadata Structure Definition (recommendation?)

• Theme vocabulary mapping

o Publications Office to propose mapping from Eurostat themes to Metadata Registry (MDR) data themes

Objectives of this meeting

Intended outcome

• Agree on proposed extensions

• Review mapping of Eurostat themes to Metadata Registry (MDR) data themes

• Reach common understanding of proposed SDMX transformation mechanism

• Identify further issues for the future

• Prepare for public review period

Overview of DCAT-AP and StatDCAT-AP

Model diagram DCAT-AP

Catalogue

• Mandatory: dcat:dataset, dct:description, dct:publisher, dct:title

• Recommended: foaf:homepage, dct:language, dct:license, dct:issued, dcat:themeTaxonomy, dct:modified

• Optional: dct:hasPart, dct:isPartOf, dcat:record, dct:rights, dct:spatial

Dataset

• Mandatory: dct:description, dct:title

• Recommended: dcat:contactPoint, dcat:distribution, dcat:keyword, dct:publisher, dcat:theme

• Optional: adms:identifier, adms:sample, adms:versionNotes, dcat:landingPage, dct:accessRights, dct:accrualPeriodicity, dct:conformsTo, dct:hasVersion, dct:isVersionOf, dct:identifier, dct:issued, dct:language, dct:modified, dct:provenance, dct:relation, dct:source, dct:spatial, dct:temporal, dct:type, foaf:page, owl:versionInfo

StatDCAT-AP to add optional properties: dqv:hasQualityAnnotation, qb:attribute or stat:attribute, qb:dimension or stat:dimension, schema:population, stat:numSeries, stat:statUnit

Distribution

• Mandatory: dcat:accessURL

• Recommended: dct:description, dct:format, dct:license

• Optional: adms:status, dcat:byteSize, dcat:downloadURL, dcat:mediaType, dct:conformsTo, dct:issued, dct:language, dct:modified, dct:rights, dct:title, foaf:page, spdx:checksum

StatDCAT-AP to add optional property: dct:type
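To make the mandatory columns concrete, a minimal DCAT-AP description carrying only the mandatory properties of each class (plus the recommended dcat:distribution link) might look as follows. This is a sketch: the example.org URIs, titles and publisher are hypothetical.

```turtle
@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix dct:  <http://purl.org/dc/terms/> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix :     <http://example.org/> .   # hypothetical base for the example resources

# Catalogue: mandatory dcat:dataset, dct:description, dct:publisher, dct:title
:Catalogue-001 a dcat:Catalog ;
    dct:title "Example statistical catalogue"@en ;
    dct:description "Catalogue of statistical datasets."@en ;
    dct:publisher :Publisher-001 ;
    dcat:dataset :Dataset-001 .

:Publisher-001 a foaf:Agent ;
    foaf:name "Example statistical office"@en .

# Dataset: mandatory dct:description, dct:title; dcat:distribution is recommended
:Dataset-001 a dcat:Dataset ;
    dct:title "Example dataset"@en ;
    dct:description "Example statistical dataset."@en ;
    dcat:distribution :Distribution-001 .

# Distribution: mandatory dcat:accessURL
:Distribution-001 a dcat:Distribution ;
    dcat:accessURL <http://example.org/data/dataset-001> .
```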

Dimensions and attributes

Dimensions and attributes

• Requirement to expose information about:

o Dimensions: e.g. observations related to sex, age, etc.

o Attributes: e.g. observations expressed in certain units

• Option 1: re-use properties qb:dimension and qb:attribute from Data Cube Vocabulary

• Option 2: define new properties stat:dimension and stat:attribute in StatDCAT-AP namespace

Option 1: Data Cube properties

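• Expected values: URI of a qb:DimensionProperty or qb:AttributeProperty

• However, in Data Cube these properties are attached to the component specifications of a data structure definition, not directly to qb:DataSet

• Re-using them directly on a dataset description may therefore lead to confusion for datasets that are also published as qb:DataSet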
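A sketch of where these properties sit in a Data Cube description, using SDMX-RDF component properties as example values (the example.org resources are hypothetical). qb:dimension and qb:attribute appear on component specifications of the qb:DataStructureDefinition, which the qb:DataSet reaches via qb:structure; attaching them directly to a dataset would give them a second, different meaning.

```turtle
@prefix qb:             <http://purl.org/linked-data/cube#> .
@prefix sdmx-dimension: <http://purl.org/linked-data/sdmx/2009/dimension#> .
@prefix sdmx-attribute: <http://purl.org/linked-data/sdmx/2009/attribute#> .
@prefix sdmx-measure:   <http://purl.org/linked-data/sdmx/2009/measure#> .
@prefix :               <http://example.org/> .   # hypothetical

# The dataset only points to its structure
:Cube-001 a qb:DataSet ;
    qb:structure :DSD-001 .

# Dimensions, attributes and measures are declared on the components of the DSD
:DSD-001 a qb:DataStructureDefinition ;
    qb:component [ qb:dimension sdmx-dimension:sex ] ,
                 [ qb:dimension sdmx-dimension:age ] ,
                 [ qb:attribute sdmx-attribute:unitMeasure ] ,
                 [ qb:measure   sdmx-measure:obsValue ] .
```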

Option 2: StatDCAT-AP properties

• Allows semantics to be precisely defined to meet StatDCAT-AP requirements

• Expected values can still be URIs of qb:DimensionProperty and qb:AttributeProperty

• For Data Cube datasets, values can be derived (copied) from qb:dimension and qb:attribute
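Under Option 2, the component properties of a Data Cube structure could be copied onto the dcat:Dataset as in the sketch below. The stat: namespace URI is a placeholder (no namespace URI is fixed here), and the resource URIs are hypothetical.

```turtle
@prefix dcat:           <http://www.w3.org/ns/dcat#> .
@prefix sdmx-dimension: <http://purl.org/linked-data/sdmx/2009/dimension#> .
@prefix sdmx-attribute: <http://purl.org/linked-data/sdmx/2009/attribute#> .
@prefix stat:           <http://example.org/statdcat-ap#> .   # placeholder for the StatDCAT-AP namespace
@prefix :               <http://example.org/> .               # hypothetical

# Values copied from the qb:dimension / qb:attribute components of the dataset's DSD
:Dataset-001 a dcat:Dataset ;
    stat:dimension sdmx-dimension:sex , sdmx-dimension:age ;
    stat:attribute sdmx-attribute:unitMeasure .
```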

Quality aspects

Short-term vs. longer-term approach

• Quality aspects are very important for datasets in general and statistical datasets in particular

• Due to time and resource constraints we cannot fully address the issue now

• Propose to address it in two phases:

o Short-term: provide mechanism to link to existing quality information in StatDCAT-AP, version 1

o Longer-term: consider integrated quality framework as basis for extensions to StatDCAT-AP, version 2

Combining ESMS and ESQRS: the “Single Integrated Metadata Structure” (SIMS) of the European Statistical System

Short-term: annotation

• Link to existing document/webpage with quality information, or provide plain text

• Use property: dqv:hasQualityAnnotation from W3C Data Quality Vocabulary (in development)

• Expected value: a dqv:QualityAnnotation (a subclass of oa:Annotation) whose body is a URI (e.g. a webpage) or plain text

Example 1, quality information on a webpage:

:Dataset-001 a dcat:Dataset ;
    dqv:hasQualityAnnotation :Annot-001 .
:Annot-001 a dqv:QualityAnnotation ;
    oa:hasBody <URL> ;
    oa:hasTarget :Dataset-001 ;
    oa:motivatedBy oa:commenting .

Example 2, quality information as plain text:

:Dataset-001 a dcat:Dataset ;
    dqv:hasQualityAnnotation :Annot-001 .
:Annot-001 a dqv:QualityAnnotation ;
    oa:hasBody [ a oa:TextualBody ; rdf:value "Some text" ] ;
    oa:hasTarget :Dataset-001 ;
    oa:motivatedBy oa:commenting .

Longer-term: Quality aspects of SIMS

• Eurostat's Single Integrated Metadata Structure includes specific quality aspects:

o e.g. Accessibility and clarity; Quality management; Relevance; Accuracy and reliability; Timeliness and punctuality; Coherence and comparability

• This set of aspects can form the basis for future extensions to StatDCAT-AP, or even to DCAT-AP

Property: Statistical unit

• ESMS concept STAT_UNIT:

o Defined as “entity for which information is sought and for which statistics are ultimately compiled”

o Usage note: “list the basic units of statistical observation for which data are provided. These observation units (e.g. the enterprise, the local unit, private households,...) can be different from the reporting units used in the underlying statistical surveys”

• New property in StatDCAT-AP namespace: stat:statUnit

• Expected value: free text

• Or: defer to deeper discussion on quality aspects?
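A minimal sketch of the proposed property, using one of the example units from the ESMS usage note as the free-text value; the stat: namespace URI and the dataset URI are placeholders.

```turtle
@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix stat: <http://example.org/statdcat-ap#> .   # placeholder for the StatDCAT-AP namespace
@prefix :     <http://example.org/> .               # hypothetical

# Statistical unit expressed as free text
:Dataset-001 a dcat:Dataset ;
    stat:statUnit "Private households"@en .
```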

Property: Statistical population

• ESMS concept STAT_POP:

o Defined as: “total membership or population or "universe" of a defined class of people, objects or events”

o Usage note: “describe the target statistical population (one or more) which the data set refers to, i.e. the population about which information is to be sought”

• Use property: schema:population, “Any characteristics of the population used in the study, e.g. 'males under 65'.”

• Expected value: free text

• Or: defer to deeper discussion on quality aspects?
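Similarly, a sketch of schema:population on a dataset, reusing the example text from its schema.org definition as the free-text value; the dataset URI is hypothetical.

```turtle
@prefix dcat:   <http://www.w3.org/ns/dcat#> .
@prefix schema: <http://schema.org/> .
@prefix :       <http://example.org/> .   # hypothetical

# Target statistical population expressed as free text
:Dataset-001 a dcat:Dataset ;
    schema:population "Males under 65"@en .
```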

Visualisation

Link to visualisation

• Visualisation can be seen as a type of Distribution

• Use dcat:accessURL and dct:type

• Type: MDR ../distribution-type/VISUALIZATION (to be added)

• Example:

:Visual-001 a dcat:Distribution ;
    dcat:accessURL <URL of page> ;
    dct:type <MDR distribution type> .

Type of distribution

• To support modelling approach for visualisation

• Use property: dct:type

• Expected value: URI of type (e.g. visualisation)

• Also to be discussed for DCAT-AP as part of larger discussion on Distributions that are not files, see:

https://joinup.ec.europa.eu/asset/dcat-ap_implementation_guidelines/issue/service-based-data-access

Other extensions

Number of data series

• Information on how values in the Dataset are grouped

o e.g. if a Dataset contains data for three regions with three values for each region, the number of series is three while the number of observations is nine

• New property in StatDCAT-AP namespace: stat:numSeries

• Expected value: integer
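The worked example above (three regions, three values each) could be recorded as in the sketch below. Only the series count appears, since ‘Number of observations’ was not included; the stat: namespace URI and the dataset URI are placeholders.

```turtle
@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix stat: <http://example.org/statdcat-ap#> .   # placeholder for the StatDCAT-AP namespace
@prefix :     <http://example.org/> .               # hypothetical

# Three regions x three values: 3 series (9 observations, not recorded)
:Dataset-001 a dcat:Dataset ;
    stat:numSeries 3 .   # Turtle shorthand for "3"^^xsd:integer
```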

Time coverage of the data series

• Proposed to express this as start and end of data series, e.g. 2011-2012 rather than “two years”

• Use property: dct:temporal

• Expected value: time period with schema:startDate and schema:endDate

• Already in DCAT-AP; no extension necessary
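A sketch of a 2011-2012 coverage using dct:PeriodOfTime, the class DCAT-AP gives as the range of dct:temporal. Expressing whole years as first and last calendar days with xsd:date is an assumption for illustration; the dataset URI is hypothetical.

```turtle
@prefix dcat:   <http://www.w3.org/ns/dcat#> .
@prefix dct:    <http://purl.org/dc/terms/> .
@prefix schema: <http://schema.org/> .
@prefix xsd:    <http://www.w3.org/2001/XMLSchema#> .
@prefix :       <http://example.org/> .   # hypothetical

# Time coverage as explicit start and end, not as a duration ("two years")
:Dataset-001 a dcat:Dataset ;
    dct:temporal [
        a dct:PeriodOfTime ;
        schema:startDate "2011-01-01"^^xsd:date ;
        schema:endDate   "2012-12-31"^^xsd:date
    ] .
```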

Mapping theme vocabulary

Eurostat themes mapped to Metadata Registry (MDR) data themes

| Theme | Eurostat title | MDR data theme title | Code | Type |
|---|---|---|---|---|
| Theme 1 | General and regional statistics | | general | folder |
| Theme 2 | Economy and finance | Economy and finance | economy | folder |
| Theme 3 | Population and social conditions | Population and society | popul | folder |
| Theme 4 | Industry, trade and services | Economy and finance | icts | folder |
| Theme 5 | Agriculture, forestry and fisheries | Agriculture, fisheries, forestry, foods | agric | folder |
| Theme 6 | International trade | Economy and finance | external | folder |
| Theme 7 | Transport | Transport | transp | folder |
| Theme 8 | Environment and energy | Environment; Energy | envir | folder |
| Theme 9 | Science and technology | Science and technology | science | folder |

Legend:

Red clear: no equivalence

Red: partial equivalence

Orange: equivalence by combining two terms

Green: equivalence
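One way such a mapping could be published as linked data is with SKOS mapping properties. The use of SKOS, the estat: URIs built from the Eurostat codes, and the match types chosen below (exact match where the titles coincide, close matches for the combined Theme 8) are assumptions for illustration, not part of the proposal. The ECON, ENVI and ENER URIs are entries of the Publications Office data-theme authority table.

```turtle
@prefix skos:  <http://www.w3.org/2004/02/skos/core#> .
@prefix theme: <http://publications.europa.eu/resource/authority/data-theme/> .
@prefix estat: <http://example.org/eurostat-theme/> .   # hypothetical URIs for Eurostat themes

# Theme 2: identical titles, assumed full equivalence
estat:economy skos:exactMatch theme:ECON .

# Theme 8: equivalence by combining two MDR terms
estat:envir skos:closeMatch theme:ENVI , theme:ENER .
```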

SDMX Transformation mechanism

How to produce DCAT-AP metadata

• Organisations are free to choose how to create DCAT-AP from their systems

o For SDMX users the specification defines a mapping between SDMX-ML structural metadata and DCAT-AP

o For those not wishing to use SDMX, the organisation must define its own mapping between the metadata in its system and DCAT-AP

• Two approaches are under consideration:

o SDMX structural metadata to DCAT-AP

o SDMX metadata set to DCAT-AP

Figure: SDMX Structural Metadata (example) to DCAT-AP Dataset: an SDMX Dataflow → a DCAT-AP Dataset
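For the structural-metadata approach, the DCAT-AP output for a single dataflow might look like the sketch below. The field-by-field correspondence (dataflow name to dct:title, maintenance agency to dct:publisher, DSD dimensions to stat:dimension) is an assumption for illustration, not the mapping defined in the specification; the dataflow, agency and stat: namespace URIs are hypothetical.

```turtle
@prefix dcat:           <http://www.w3.org/ns/dcat#> .
@prefix dct:            <http://purl.org/dc/terms/> .
@prefix sdmx-dimension: <http://purl.org/linked-data/sdmx/2009/dimension#> .
@prefix stat:           <http://example.org/statdcat-ap#> .   # placeholder for the StatDCAT-AP namespace
@prefix :               <http://example.org/> .               # hypothetical

# Hypothetical dataflow DF_EXAMPLE, version 1.0, maintained by EXAMPLE_AGENCY
:Dataset-DF_EXAMPLE a dcat:Dataset ;
    dct:identifier "EXAMPLE_AGENCY:DF_EXAMPLE(1.0)" ;         # agency, ID and version of the dataflow
    dct:title "Example dataflow"@en ;                          # dataflow name
    dct:description "Description of the example dataflow."@en ;  # dataflow description
    dct:publisher :EXAMPLE_AGENCY ;                            # maintenance agency
    stat:dimension sdmx-dimension:refArea ,                    # dimensions of the dataflow's DSD
                   sdmx-dimension:refPeriod .
```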

Figure: SDMX Metadata Set (valid content defined by the Metadata Structure Definition, MSD) to DCAT-AP: Metadata Attributes defined in the MSD; Metadata Report; Metadata Target; Attribute Values

Figure: DCAT-AP Transformation Mechanism. Components: SDMX Structural Metadata Repository, Other Metadata Sources, SDMX Data Reader, SDMX Data Validator, MSD, SDMX Data Writer, Intermediary File or Data Stream, DCAT-AP; Data Publisher Organisation. Choices: SDMX-ML Structure or SDMX Metadata Set.

These components can be developed in Java and .NET and integrated into SDMX systems or used in SDMX conversion tools.

Any other business

More issues? Comments, questions?

Next steps

Planning

• December 2015: invitations to stakeholders, set up collaboration infrastructure

• January 2016: collect requirements and suggestions

• 5 February 2016: Familiarisation Webinar

• February 2016: first draft based on initial analysis and issues raised

• 11 March 2016: first virtual WG meeting to discuss first draft

• 15 April 2016: second WG meeting to discuss draft mapping and implementation options

• 6 May 2016: second draft available for review, incorporating comments and further development

• 13 May 2016: third WG meeting (face-to-face plus Adobe Connect) in Rome to discuss mapping issues in practice

• End of May 2016: third draft, including full mapping proposal and usage of controlled vocabularies

• 3 June 2016: fourth virtual WG meeting to agree schedule for public review

• June 2016: preparation of final draft for public review

• July and August 2016: public review period

• Mid-September 2016: fifth virtual WG meeting to discuss and resolve public comments received

• End of September 2016: approval of StatDCAT-AP version 1 for publication

Public review period planning

• Preparation of final draft – early to mid-June

• Review of final draft by ISA – mid- to late June

• Announcement (Joinup, mailing lists) – late June

• Gathering comments – July and August

• Final WG meeting to resolve received comments – mid-September (Doodle to be issued)

• Publication of StatDCAT-AP 1.0 – end September