07-Apr-08 Metadata use in the Statistical Value Chain UNECE-Eurostat-OECD Meeting on Management of Statistical Information Systems MSIS 2008 Luxembourg, 7-9 April 2008 Georges Pongas Adam Wroński

Page 1: Metadata use in the Statistical Value Chain

07-Apr-08

Metadata use in the Statistical Value Chain

UNECE-Eurostat-OECD Meeting on Management of Statistical Information Systems (MSIS 2008)

Luxembourg, 7-9 April 2008

Georges Pongas, Adam Wroński

Page 2: Metadata use in the Statistical Value Chain

Content

1. Introduction

2. Operational Characteristics of Metadata

3. Technical Characteristics of the Metadata

4. Metadata types needed in the various steps of the SVC (statistical value chain)

5. Conclusion

Page 3: Metadata use in the Statistical Value Chain

Seven SVC steps

1. Expression of the need

2. Data collection design

3. Specification and development of the tools needed for the data collection

4. Data collection

5. Data editing and imputation

6. Data processing

7. Data dissemination

Page 4: Metadata use in the Statistical Value Chain

Basics

Keep statistical notions separate from the technical (implementation-oriented) characteristics of the metadata.

Design the technical characteristics so that the same metadata structures cover both statistical and non-statistical requirements.

Page 5: Metadata use in the Statistical Value Chain

Operational Characteristics of Metadata

Static nature
Long production process
Located in various places (resources)
Critical link with statistical data
– depends on statistical data changes
Strong coupling of structural metadata with the statistical data
Large number of metadata entities needed in the SVC

Page 6: Metadata use in the Statistical Value Chain

Technical Characteristics of Metadata

Terminology often complex
Technical characteristics and statistical notions frequently mixed

Page 7: Metadata use in the Statistical Value Chain

Statistical Notions and Metadata

Examples
– Classification, keyword list and set of information related to the SDDS standard
– Correspondence table between two classifications, and table containing the links (access rights) between the user names and the statistical datasets of a database

The items in each example share the same technical structure; the only difference is the context, i.e., the user interface

Thus develop separately:
– a common set of functionalities and
– the interface layer for an application

Page 8: Metadata use in the Statistical Value Chain

Metadata Technical Structure Categories

Three categories proposed:

1. Simple Metadata Entities (SME)

2. Binary Relationships (BR)

3. Clustered Metadata Entities (CME)

Page 9: Metadata use in the Statistical Value Chain

Simple Metadata Entities (SME)

Simple key
Variable number of attributes
Appropriate for vertical type storage

                  Example 1        Example 2
Entity            NACE             user name
Entity element    2122             gpongas
Attribute name    English label    phone no
Attribute value   “Mining”         430139
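One way to read "vertical type storage" is as an entity-attribute-value table, where each attribute value of an entity element is stored as its own row. The sketch below is only an illustration under that assumption: the table name `sme`, the column names and the `attributes_of` helper are hypothetical and do not come from the presentation. It loads the two examples from this slide into an in-memory SQLite database.

```python
import sqlite3

# Hypothetical vertical table: one row per (entity, element, attribute).
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE sme (
        entity          TEXT NOT NULL,  -- e.g. 'NACE' or 'user name'
        element         TEXT NOT NULL,  -- the simple key within the entity
        attribute_name  TEXT NOT NULL,
        attribute_value TEXT,
        PRIMARY KEY (entity, element, attribute_name)
    )
    """
)

# The two examples from the slide, stored in the same structure.
rows = [
    ("NACE", "2122", "English label", "Mining"),
    ("user name", "gpongas", "phone no", "430139"),
]
conn.executemany("INSERT INTO sme VALUES (?, ?, ?, ?)", rows)


def attributes_of(entity, element):
    """Collect all attribute rows of one entity element into a dict."""
    cursor = conn.execute(
        "SELECT attribute_name, attribute_value FROM sme "
        "WHERE entity = ? AND element = ?",
        (entity, element),
    )
    return dict(cursor.fetchall())


# A further attribute is just another row, not a schema change.
# The e-mail address is a made-up value for illustration only.
conn.execute(
    "INSERT INTO sme VALUES (?, ?, ?, ?)",
    ("user name", "gpongas", "email", "gpongas@example.org"),
)

print(attributes_of("NACE", "2122"))
print(attributes_of("user name", "gpongas"))
```

Because every attribute is a row, a classification and a user directory can live in the same table even though they carry different attributes. This is the sense in which a variable number of attributes fits vertical storage.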

Page 10: Metadata use in the Statistical Value Chain

Examples of SMEs

SDDS documents
Dublin Core
Classifications
Keywords
Administrative entities
Programs
Publications

Page 11: Metadata use in the Statistical Value Chain

Binary Relationships (BR)

Two types:
Between two different entities
– correspondence tables, access rights definitions
Inside the same entity
– thesauri, classification hierarchies, links between regulations, statistical documents

Example
Relationship id       UN thesaurus
First entity id       EUROPE
First entity role     Parent
Second entity id      FR
Second entity role    Child
Reason of link        Broader term
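The six fields of the example can be carried by a single record type that serves both kinds of binary relationship. The following is a minimal sketch, not the authors' implementation: the class name `BinaryRelationship`, the helper `partners` and the access-rights row (including the dataset name `dataset_A`) are assumptions made for illustration. Only the thesaurus row is taken from the slide.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class BinaryRelationship:
    """One link between two entity elements, with the fields listed on the slide."""
    relationship_id: str
    first_entity_id: str
    first_entity_role: str
    second_entity_id: str
    second_entity_role: str
    reason_of_link: str


links = [
    # Inside the same entity: the thesaurus row from the slide.
    BinaryRelationship("UN thesaurus", "EUROPE", "Parent", "FR", "Child", "Broader term"),
    # Between two different entities: a hypothetical access-rights row.
    BinaryRelationship("access rights", "gpongas", "User", "dataset_A", "Dataset", "Read access"),
]


def partners(relationship_id: str, first_entity_id: str) -> List[str]:
    """Return the second-entity ids linked to first_entity_id in one relationship."""
    return [
        link.second_entity_id
        for link in links
        if link.relationship_id == relationship_id
        and link.first_entity_id == first_entity_id
    ]


print(partners("UN thesaurus", "EUROPE"))
print(partners("access rights", "gpongas"))
```

The same lookup answers "which terms are narrower than EUROPE" and "which datasets may this user read". Only the interface layer built on top of it would differ.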

Page 12: Metadata use in the Statistical Value Chain

Clustered Metadata Entities (CME)

Complex entities characterised by keys of variable cardinality and by references to other entities of type CME, SME and BR

Description techniques – XML schema is appropriate
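To make the idea concrete, the sketch below parses a small XML instance describing a dataset whose key has three dimensions and which points to SME, BR and CME entities. It is an assumption-laden illustration: the element and attribute names (`datasetDefinition`, `smeRef`, `brRef`, `cmeRef`) and all identifiers are hypothetical, they do not follow SDMX or any other real schema, and the example uses a plain XML document rather than an XML Schema definition.

```python
import xml.etree.ElementTree as ET

# Hypothetical clustered metadata entity: a dataset definition whose key
# can hold any number of dimensions and which references other entities.
DATASET_XML = """
<datasetDefinition id="DS_EXAMPLE">
  <key>
    <dimension name="activity" smeRef="NACE"/>
    <dimension name="geo" smeRef="COUNTRY"/>
    <dimension name="time" smeRef="PERIOD"/>
  </key>
  <relationshipRef brRef="access rights"/>
  <annotationRef cmeRef="DS_EXAMPLE_NOTES"/>
</datasetDefinition>
"""


def summarise(xml_text):
    """Extract the key dimensions and the references to SME, BR and CME entities."""
    root = ET.fromstring(xml_text.strip())
    dimensions = root.findall("./key/dimension")
    return {
        "id": root.get("id"),
        "key": [d.get("name") for d in dimensions],
        "sme_refs": [d.get("smeRef") for d in dimensions],
        "br_refs": [r.get("brRef") for r in root.findall("relationshipRef")],
        "cme_refs": [a.get("cmeRef") for a in root.findall("annotationRef")],
    }


summary = summarise(DATASET_XML)
print(summary)
```

Adding a fourth dimension changes only the document, not the code that reads it. Under these assumptions, that is why a nested description format suits entities with keys of variable cardinality.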

Page 13: Metadata use in the Statistical Value Chain

Examples of CMEs

SDMX, Gesmes definitions

Dataset definitions

Annotations to dataset cells

Confidentiality definitions linked to datasets

Page 14: Metadata use in the Statistical Value Chain

Metadata in the various steps of the SVC

Page 15: Metadata use in the Statistical Value Chain

Collection Metadata

Mostly of type BR and SME

Among others they contain:
– source agencies
– data file descriptions
– codelists
– validation rules linked to initial data checks

Page 16: Metadata use in the Statistical Value Chain

Editing, Imputation and Processing Metadata

More complex than the collection metadata (more CME entities needed)

Among others they contain:
– Dataset definitions
– Formulas, programs, scripts
– Conditional and ordinary annotations
– Dissemination feeding information

Page 17: Metadata use in the Statistical Value Chain

Dissemination Metadata

The most complex metadata types are located here.

They contain almost all the previously described metadata plus their own.

Reasons for this complexity:
• Dissemination contains all the statistical domains
• It must cover all user types
• It has tight delivery deadlines
• It must offer very user-friendly navigation, presentation and extraction facilities

Page 18: Metadata use in the Statistical Value Chain

Among others, dissemination metadata contain:

Sitemap description
Release calendars
Dataset links to publication tables
Questionnaire definitions linked to datasets
Units of measurement
Ready-made queries

Page 19: Metadata use in the Statistical Value Chain

Conclusion

Separating the statistical notions (context) from the structure (functionality) of metadata minimises the number of structural metadata types.

Consequently, it becomes easier to build and implement a complex statistical (metadata and data) system.