Agenda
1. Opening, agenda, tour de table
2. Outcomes Rome meeting, 13 May 2016
3. Objectives of this meeting
4. Overview of DCAT-AP and StatDCAT-AP
5. Extensions: Dimensions and attributes, Quality aspects, Visualisation, Other extensions
6. Mapping theme vocabulary (Eurostat MDR)
7. SDMX Transformation mechanism
8. Any other business
9. Next steps
Key points from Rome meeting (1)
• Decisions on extensions:
o Do not include ‘Number of observations’ and ‘Dimensions as keywords’
o Include ‘Number of data series’
o Further discussion on ‘Link to visualisation’, ‘Dimension as property’, ‘Quality aspects’, ‘Statistical population’, ‘Statistical unit’ and ‘Length of time series’
o Discuss proposals for RDF terms
Key points from Rome meeting (2)
• SDMX transformation mechanism
o Not part of the specification (relevant for SDMX implementers migrating to StatDCAT-AP) → move to annex
o Slight preference for the use of a Metadata Structure Definition (recommendation?)
• Theme vocabulary mapping
o Publications Office to propose mapping from Eurostat themes to Metadata Registry (MDR) data themes
Intended outcome
• Agree on proposed extensions
• Review mapping of Eurostat themes to Metadata Registry (MDR) data themes
• Reach common understanding of proposed SDMX transformation mechanism
• Identify further issues for the future
• Prepare for public review period
Catalogue
• Mandatory: dcat:dataset, dct:description, dct:publisher, dct:title
• Recommended: foaf:homepage, dct:language, dct:license, dct:issued, dcat:themeTaxonomy, dct:modified
• Optional: dct:hasPart, dct:isPartOf, dcat:record, dct:rights, dct:spatial
Dataset
• Mandatory: dct:description, dct:title
• Recommended: dcat:contactPoint, dcat:distribution, dcat:keyword, dct:publisher, dcat:theme
• Optional: adms:identifier, adms:sample, adms:versionNotes, dcat:landingPage, dct:accessRights, dct:accrualPeriodicity, dct:conformsTo, dct:hasVersion, dct:isVersionOf, dct:identifier, dct:issued, dct:language, dct:modified, dct:provenance, dct:relation, dct:source, dct:spatial, dct:temporal, dct:type, foaf:page, owl:versionInfo
StatDCAT-AP to add optional properties:
dqv:hasQualityAnnotation, qb:attribute or stat:attribute, qb:dimension or stat:dimension, schema:population, stat:numSeries, stat:statUnit
Distribution
• Mandatory: dcat:accessURL
• Recommended: dct:description, dct:format, dct:license
• Optional: adms:status, dcat:byteSize, dcat:downloadURL, dcat:mediaType, dct:conformsTo, dct:issued, dct:language, dct:modified, dct:rights, dct:title, foaf:page, spdx:checksum
StatDCAT-AP to add optional property: dct:type
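A minimal sketch of how the mandatory properties of Catalogue, Dataset and Distribution fit together, written in Turtle like the other examples in this deck. The ex: namespace and all resource names, titles and URLs are hypothetical; only the dcat:, dct: and foaf: terms come from DCAT-AP.

```turtle
@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix dct:  <http://purl.org/dc/terms/> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix ex:   <http://example.org/> .   # hypothetical namespace

# Catalogue with its four mandatory properties
ex:catalogue a dcat:Catalog ;
    dct:title "Example statistical catalogue"@en ;
    dct:description "Catalogue of statistical datasets (illustrative)"@en ;
    dct:publisher ex:publisher ;
    dcat:dataset ex:dataset-001 .

ex:publisher a foaf:Agent ;
    foaf:name "Example statistical office"@en .

# Dataset with its two mandatory properties plus the recommended dcat:distribution
ex:dataset-001 a dcat:Dataset ;
    dct:title "Population by sex and age"@en ;
    dct:description "Annual population figures by sex and age group (illustrative)"@en ;
    dcat:distribution ex:distribution-001 .

# Distribution with its single mandatory property
ex:distribution-001 a dcat:Distribution ;
    dcat:accessURL <http://example.org/data/population> .
```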
Dimensions and attributes
• Requirement to expose information about:
o Dimensions: e.g. observations related to sex, age, etc.
o Attributes: e.g. observations expressed in certain units
• Option 1: re-use properties qb:dimension and qb:attribute from the Data Cube Vocabulary
• Option 2: define new properties stat:dimension and stat:attribute in the StatDCAT-AP namespace
Option 1: Data Cube properties
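• Expected values: URIs of qb:DimensionProperty and qb:AttributeProperty instances
• However, in the Data Cube Vocabulary these properties are not attached directly to a qb:DataSet, but to the component specifications of its data structure definition
• Re-using them directly on a dataset may lead to confusion for datasets that are also published as qb:DataSet instances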
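To illustrate the concern, this is roughly how qb:dimension and qb:attribute are used in the Data Cube Vocabulary itself: they hang off component specifications of the data structure definition rather than off the dataset. The ex: resources are hypothetical; sdmx-dimension:sex, sdmx-dimension:age and sdmx-attribute:unitMeasure are taken from the SDMX-RDF vocabulary published alongside Data Cube.

```turtle
@prefix qb:             <http://purl.org/linked-data/cube#> .
@prefix sdmx-dimension: <http://purl.org/linked-data/sdmx/2009/dimension#> .
@prefix sdmx-attribute: <http://purl.org/linked-data/sdmx/2009/attribute#> .
@prefix ex:             <http://example.org/> .   # hypothetical namespace

# The dataset points to its structure, not to its dimensions directly
ex:dataset-001 a qb:DataSet ;
    qb:structure ex:dsd-001 .

# Dimensions and attributes are declared on component specifications
ex:dsd-001 a qb:DataStructureDefinition ;
    qb:component [ a qb:ComponentSpecification ; qb:dimension sdmx-dimension:sex ] ,
                 [ a qb:ComponentSpecification ; qb:dimension sdmx-dimension:age ] ,
                 [ a qb:ComponentSpecification ; qb:attribute sdmx-attribute:unitMeasure ] .
```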
Option 2: StatDCAT-AP properties
• Allows semantics to be precisely defined to meet StatDCAT-AP requirements
• Expected values can still be URIs of qb:DimensionProperty and qb:AttributeProperty instances
• For Data Cube datasets, values can be derived (copied) from qb:dimension and qb:attribute
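Under Option 2, the same information could be exposed directly on the DCAT-AP dataset, for example as below. The stat: namespace URI is a placeholder (the slides do not give one), ex: is hypothetical, and the values are simply copied from the component specifications of the Data Cube structure.

```turtle
@prefix dcat:           <http://www.w3.org/ns/dcat#> .
@prefix sdmx-dimension: <http://purl.org/linked-data/sdmx/2009/dimension#> .
@prefix sdmx-attribute: <http://purl.org/linked-data/sdmx/2009/attribute#> .
@prefix stat:           <http://example.org/statdcat-ap#> .  # placeholder, not the official namespace
@prefix ex:             <http://example.org/> .              # hypothetical namespace

# Dimensions and attributes stated directly on the DCAT-AP dataset;
# the values are copied from the qb:dimension / qb:attribute component
# specifications of the dataset's Data Cube structure
ex:dataset-001 a dcat:Dataset ;
    stat:dimension sdmx-dimension:sex , sdmx-dimension:age ;
    stat:attribute sdmx-attribute:unitMeasure .
```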
Quality aspects: short-term vs. longer-term approach
• Quality aspects are very important for datasets in general and statistical datasets in particular
• Due to time and resource constraints, we cannot fully address the issue now
• Propose to address it in two phases:
o Short-term: provide mechanism to link to existing quality information in StatDCAT-AP, version 1
o Longer-term: consider integrated quality framework as basis for extensions to StatDCAT-AP, version 2
Combining ESMS and ESQRS: the “Single Integrated Metadata Structure” (SIMS) of the European Statistical System
Short-term: annotation
• Link to existing document/webpage with quality information, or provide plain text
• Use property: dqv:hasQualityAnnotation from W3C Data Quality Vocabulary (in development)
• Expected value: a dqv:QualityAnnotation (a subclass of oa:Annotation) whose body is a URI (e.g. a webpage) or plain text
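Prefixes for the two examples (the empty prefix stands for an illustrative namespace):
@prefix :     <http://example.org/> .
@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix dqv:  <http://www.w3.org/ns/dqv#> .
@prefix oa:   <http://www.w3.org/ns/oa#> .
@prefix rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
Link to a webpage:
:Dataset-001 a dcat:Dataset ;
    dqv:hasQualityAnnotation :Annot-001 .
:Annot-001 a dqv:QualityAnnotation ;
    oa:hasBody <URL> ;
    oa:hasTarget :Dataset-001 ;
    oa:motivatedBy oa:commenting .
Plain text:
:Dataset-001 a dcat:Dataset ;
    dqv:hasQualityAnnotation :Annot-001 .
:Annot-001 a dqv:QualityAnnotation ;
    oa:hasBody [ a oa:TextualBody ; rdf:value "Some text" ] ;
    oa:hasTarget :Dataset-001 ;
    oa:motivatedBy oa:commenting .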
Longer-term: Quality aspects of SIMS
• Eurostat's Single Integrated Metadata Structure includes specific quality aspects:
o e.g. Accessibility and clarity; Quality management; Relevance; Accuracy and reliability; Timeliness and punctuality; Coherence and comparability
• This set of aspects can form the basis for future extensions to StatDCAT-AP, or even to DCAT-AP
Property: Statistical unit
• ESMS concept STAT_UNIT:
o Defined as “entity for which information is sought and for which statistics are ultimately compiled”
o Usage note: “list the basic units of statistical observation for which data are provided. These observation units (e.g. the enterprise, the local unit, private households,...) can be different from the reporting units used in the underlying statistical surveys”
• New property in StatDCAT-AP namespace: stat:statUnit
• Expected value: free text
• Or: defer to deeper discussion on quality aspects?
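If stat:statUnit is adopted as proposed, a dataset description might carry it as a plain literal. A sketch assuming a placeholder stat: namespace URI and a hypothetical dataset; the value follows the examples in the ESMS usage note.

```turtle
@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix stat: <http://example.org/statdcat-ap#> .  # placeholder, not the official namespace
@prefix ex:   <http://example.org/> .              # hypothetical namespace

# Statistical unit as free text, following the ESMS STAT_UNIT usage note
ex:dataset-002 a dcat:Dataset ;
    stat:statUnit "Enterprises and local units"@en .
```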
Property: Statistical population
• ESMS concept STAT_POP:
o Defined as: “total membership or population or "universe" of a defined class of people, objects or events”
o Usage note: “describe the target statistical population (one or more) which the data set refers to, i.e. the population about which information is to be sought”
• Use property: schema:population, “Any characteristics of the population used in the study, e.g. 'males under 65'.”
• Expected value: free text
• Or: defer to deeper discussion on quality aspects?
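Similarly, a sketch of schema:population with a free-text value; the dataset URI is hypothetical and the value reuses the example from the schema.org definition quoted above.

```turtle
@prefix dcat:   <http://www.w3.org/ns/dcat#> .
@prefix schema: <http://schema.org/> .
@prefix ex:     <http://example.org/> .   # hypothetical namespace

# Target statistical population as free text
ex:dataset-002 a dcat:Dataset ;
    schema:population "Males under 65"@en .
```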
Link to visualisation
• Visualisation can be seen as a type of Distribution
• Use dcat:accessURL and dct:type
• Type: MDR ../distribution-type/VISUALIZATION (to be added)
• Example:
:Visual-001 a dcat:Distribution ;
    dcat:accessURL <URL of page> ;
    dct:type <MDR distribution type> .
Type of distribution
• To support modelling approach for visualisation
• Use property: dct:type
• Expected value: URI of type (e.g. visualisation)
• Also to be discussed for DCAT-AP as part of larger discussion on Distributions that are not files, see:
https://joinup.ec.europa.eu/asset/dcat-ap_implementation_guidelines/issue/service-based-data-access
Number of data series
• Information on how values in the Dataset are grouped
o e.g. if a Dataset contains data for three regions with three values for each region, the number of series is three while the number of observations is nine
• New property in StatDCAT-AP namespace: stat:numSeries
• Expected value: integer
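A sketch of the three-regions example as metadata, assuming a placeholder stat: namespace URI and a hypothetical dataset. Only the number of series is recorded; the number of observations (nine) is not, in line with the Rome decision not to include it.

```turtle
@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix stat: <http://example.org/statdcat-ap#> .  # placeholder, not the official namespace
@prefix ex:   <http://example.org/> .              # hypothetical namespace

# Three regions, three values per region:
# 3 data series (recorded) and 3 x 3 = 9 observations (not recorded)
ex:dataset-003 a dcat:Dataset ;
    stat:numSeries 3 .
```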
Time coverage of the data series
• Proposed to express this as start and end of data series, e.g. 2011-2012 rather than “two years”
• Use property: dct:temporal
• Expected value: time period with schema:startDate and schema:endDate
• Already in DCAT-AP; no extension necessary
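The 2011-2012 example expressed with the existing DCAT-AP property. The dataset URI is hypothetical, and using full dates for the start and end of the period is an illustrative choice.

```turtle
@prefix dcat:   <http://www.w3.org/ns/dcat#> .
@prefix dct:    <http://purl.org/dc/terms/> .
@prefix schema: <http://schema.org/> .
@prefix xsd:    <http://www.w3.org/2001/XMLSchema#> .
@prefix ex:     <http://example.org/> .   # hypothetical namespace

# Time coverage 2011-2012 as an explicit start and end, not as "two years"
ex:dataset-001 a dcat:Dataset ;
    dct:temporal [
        a dct:PeriodOfTime ;
        schema:startDate "2011-01-01"^^xsd:date ;
        schema:endDate   "2012-12-31"^^xsd:date
    ] .
```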
Eurostat themes → MDR data themes
Eurostat theme | Eurostat title | MDR data theme title | Code | Type
Theme 1 | General and regional statistics | | general | folder
Theme 2 | Economy and finance | Economy and finance | economy | folder
Theme 3 | Population and social conditions | Population and society | popul | folder
Theme 4 | Industry, trade and services | Economy and finance | icts | folder
Theme 5 | Agriculture, forestry and fisheries | Agriculture, fisheries, forestry, foods | agric | folder
Theme 6 | International trade | Economy and finance | external | folder
Theme 7 | Transport | Transport | transp | folder
Theme 8 | Environment and energy | Environment; Energy | envir | folder
Theme 9 | Science and technology | Science and technology | science | folder
Eurostat Metadata Registry (MDR)
Legend:
Light red: no equivalence
Red: partial equivalence
Orange: equivalence by combining two terms
Green: equivalence
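Once the mapping is agreed, a dataset could carry the corresponding MDR data themes through dcat:theme. This sketch assumes the Publications Office data-theme URIs under http://publications.europa.eu/resource/authority/data-theme/ with the codes ENVI, ENER and TRAN; the ex: datasets are hypothetical. A combined equivalence such as Theme 8 simply becomes two dcat:theme values.

```turtle
@prefix dcat:  <http://www.w3.org/ns/dcat#> .
@prefix theme: <http://publications.europa.eu/resource/authority/data-theme/> .
@prefix ex:    <http://example.org/> .   # hypothetical namespace

# Eurostat Theme 8 "Environment and energy": equivalence by combining two MDR terms
ex:dataset-004 a dcat:Dataset ;
    dcat:theme theme:ENVI , theme:ENER .

# Eurostat Theme 7 "Transport": direct equivalence
ex:dataset-005 a dcat:Dataset ;
    dcat:theme theme:TRAN .
```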
How to produce DCAT-AP metadata
• Organisations are free to choose how to create DCAT-AP from their systems
o For SDMX users the specification defines a mapping between SDMX-ML structural metadata and DCAT-AP
o For those not wishing to use SDMX, the organisation must define its own mapping between the metadata in its system and DCAT-AP
• Two approaches are under consideration:
o SDMX structural metadata to DCAT-AP
o SDMX metadata set to DCAT-AP
Figure: SDMX Metadata Set, whose valid content is defined by a Metadata Structure Definition (MSD); labels: Metadata Attributes Defined in MSD, Metadata Report, Metadata Target, Attribute Values.
Figure: DCAT-AP Transformation Mechanism; labels: SDMX Structural Metadata Repository, Other Metadata Sources, MSD, SDMX Data Reader, SDMX Data Validator, SDMX Data Writer, Intermediary File or Data Stream, DCAT-AP, Data Publisher Organisation; input choices: SDMX-ML Structure or SDMX Metadata Set.
These components can be developed in Java and .NET and integrated into SDMX systems or used in SDMX conversion tools.
Planning
• December 2015: invitations to stakeholders, set up collaboration infrastructure
• January 2016: collect requirements and suggestions
• 5 February 2016: Familiarisation Webinar
• February 2016: first draft based on initial analysis and issues raised
• 11 March 2016: first virtual WG meeting to discuss first draft
• 15 April 2016: second meeting, to discuss draft mapping and implementation options
• 6 May 2016: second draft available for review, incorporating comments and further development
• 13 May 2016: third meeting (face-to-face plus Adobe Connect) in Rome, to discuss mapping issues in practice
• End of May 2016: third draft, including full mapping proposal and usage of controlled vocabularies
• 3 June 2016: fourth virtual WG meeting to agree schedule for public review
• June 2016: preparation of final draft for public review
• July and August 2016: public review period
• Mid-September 2016: fifth virtual WG meeting to discuss and resolve public comments received
• End of September 2016: approval of StatDCAT-AP version 1 for publication
Public review period planning
• Preparation of final draft – early to mid-June
• Review of final draft by ISA – mid- to late June
• Announcement (Joinup, mailing lists) – late June
• Gathering comments – July and August
• Final WG meeting to resolve received comments – mid-September (Doodle to be issued)
• Publication of StatDCAT-AP 1.0 – end September
Join the SEMIC group on LinkedIn
Follow @SEMICeu on Twitter
Join the SEMIC community on Joinup
Project Officers [email protected]
Get involved
Visit our initiatives