Case Studies: Statistics Canada (WP 11) Alice Born alice.born@statcan.ca Statistics Canada UNECE Workshop on Statistical Metadata July 4 to 6, 2007

Case Studies: Statistics Canada (WP 11)

Alice Born alice.born@statcan.ca Statistics Canada

UNECE Workshop on Statistical Metadata

July 4 to 6, 2007

Outline

1. Overview

2. Statistical metadata systems and the statistical cycle

3. Statistical metadata in each phase of the statistical cycle

4. Systems and design issues

5. Organizational and cultural issues

Overview of Integrated Metadatabase (IMDB)

• To support interpretation of the data – dissemination phase

• Responsibility of Standards Division (metadata, classifications and standard definitions)

• Adherence to Policy on Informing Users on Data Quality and Methodology, Policy on Standards and Quality Assurance Framework

• In general, metadata goes back to November 2001

Overview of Integrated Metadatabase (IMDB)

• Contains metadata on 350 active and 250 inactive surveys and statistical programs
– Purpose
– Methodology used to produce the data
– Measures of data accuracy
– Variables, classifications for the data
– Location of clean master datafile
– Contacts

• Survey managers cannot release data without the prescribed metadata – mandatory
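
The mandatory-metadata rule can be pictured as a simple release gate. The sketch below is illustrative only: the field names are hypothetical keys derived from the metadata categories listed above, and `SurveyRecord` is an assumed stand-in, not the IMDB schema.

```python
from dataclasses import dataclass, field

# Hypothetical keys, one per metadata category named on the slide.
REQUIRED_FIELDS = (
    "purpose",
    "methodology",
    "data_accuracy",
    "variables_and_classifications",
    "master_datafile_location",
    "contacts",
)


@dataclass
class SurveyRecord:
    """Illustrative stand-in for one survey entry (not the IMDB schema)."""
    name: str
    metadata: dict = field(default_factory=dict)


def missing_metadata(record: SurveyRecord) -> list:
    """Return the required fields that are absent or blank, in listed order."""
    return [
        key for key in REQUIRED_FIELDS
        if not str(record.metadata.get(key) or "").strip()
    ]


def can_release(record: SurveyRecord) -> bool:
    """Release is allowed only when every required field is filled in."""
    return not missing_metadata(record)


draft = SurveyRecord("Example survey", {"purpose": "Illustration only"})
print(can_release(draft))       # the draft lacks most required fields
print(missing_metadata(draft))  # lists what still has to be supplied
```

Returning the list of missing fields, rather than only a yes/no answer, lets a survey manager see what remains to be documented before release.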

Overview of Integrated Metadatabase (IMDB)

Next priorities:
• Complete documentation of variables
• Complete questionnaire model
• Determine metadata for archived datafiles – may require additional metadata

Lessons learned:
• Opportunities in collecting metadata in the first phase of the statistical cycle – not at the time of dissemination

Statistical metadata systems and the statistical cycle

Relationship with survey planning and design phase
• IMDB expanded its role as part of the Household Survey Content Harmonization
• Standardize concepts, questions, question blocks across household surveys
• Variables follow ISO/IEC 11179
• Questions and question blocks, with associated response choices linked to variables and classifications, are stored in the IMDB at the beginning
• Survey Specification Manager pulls metadata from the IMDB but contains specifications and code

Statistical metadata systems and the statistical cycle

Relationship to dissemination systems
• Metadata for information modules on the STC website – mandatory
• Information for survey respondents – requires metadata prior to release of data
• Data Liberation Initiative – public-use microdata files documented in DDI
• Metadata to support data exchange – SDMX, DDI, XBRL, Wiki, HTML, etc.

Statistical metadata systems and the statistical cycle

Relationship to aggregation and analysis phase
• Analytical data warehouses use IMDB to organize their tables (variables and classifications)

Relationship to archive phase
• IMDB contains location of master datafile, record layout, contact information
• Currently developing business rules for archived datafiles

Statistical metadata systems and the statistical cycle

Relationship with management systems
• Software Register – registry of Agency’s software and applications organized by survey and statistical program – IMDB is the inventory
• Quality management assessment and questionnaire – based on inventory of surveys in the IMDB; reuse of existing metadata

IMDB in the survey life cycle

[Diagram labels: Design, Collect, Edit, Estimate, Tabulate, Publish, Archive; Operations Management, Quality Assurance, Analysis, Dissemination; Operational Data, Registers, Survey Data, Administrative Data, Operational Data Stores, Data Warehouses; Metadata (IMDB)]

Statistical metadata for phases in the statistical cycle

Metadata describing statistical business processes
– Data dissemination for interpretation of data
– IMDB serves as the corporate inventory of all surveys and statistical programs, questionnaires, master datafiles
– Metadata or paradata resides in other metainformation systems – SSM, IQMS

Statistical metadata for phases in the statistical cycle

Metadata for data elements
– Supports: Survey planning and design; Analysis; Dissemination; Archiving
– Metadata objects tracked over time for changes (versioning) and validity (registration)
– Output to online data tables and STC products
– For discovery – inventory of data elements (DE) on STC website and STCWiki (internal review before going public)
– Links to questions, question blocks, datafiles
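
Tracking a metadata object over time can be sketched as a version history in which each version carries a validity period. This is a minimal illustration under assumptions: `ItemVersion`, `revise` and `version_on` are hypothetical names, the dates are arbitrary, and the registration workflow itself is not modelled.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class ItemVersion:
    """One version of a metadata object (hypothetical structure)."""
    version: int
    definition: str
    valid_from: date
    valid_to: Optional[date] = None  # None means the version is still valid


def revise(history: list, definition: str, effective: date) -> list:
    """Close the latest version on the effective date and append a new one."""
    updated = list(history)
    if updated:
        last = updated[-1]
        updated[-1] = ItemVersion(
            last.version, last.definition, last.valid_from, effective
        )
    next_number = updated[-1].version + 1 if updated else 1
    updated.append(ItemVersion(next_number, definition, effective))
    return updated


def version_on(history: list, day: date) -> Optional[ItemVersion]:
    """Return the version whose validity period covers the given day."""
    for item in history:
        ends = item.valid_to or date.max
        if item.valid_from <= day < ends:
            return item
    return None


# Arbitrary illustrative dates and wording.
history = revise([], "First wording of the definition", date(2000, 1, 1))
history = revise(history, "Revised wording of the definition", date(2005, 1, 1))
print(version_on(history, date(2003, 5, 1)).version)
print(version_on(history, date(2007, 7, 4)).version)
```

Keeping closed versions instead of overwriting them is what allows data released under an earlier definition to remain interpretable.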

STCWiki – Type of marital status of person

Statistical metadata for phases in the statistical cycle

Metadata for survey planning and design
– Questions, standard question blocks and standard response choices in IMDB
– Mapped to value domains, data elements and surveys in the IMDB
– These metadata are assembled into collection instruments in other metainformation systems outside the IMDB

Systems and design issues

• IMDB started in 1998
– Phase 1 Consolidation of existing metadata stores
– Phase 2 Metadata describing statistical business processes
– Phase 3 Metadata for data elements, etc.

• MetaStat system – Statistical activity, survey, instance, frame, universe, instrument, datafiles, survey methodology, documentation, data accuracy

• MetaWeb system – object class, property, data element, value domain, question, response choices, question block, value meaning manager
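
The MetaWeb object types follow the ISO/IEC 11179 pattern, in which an object class and a property form a data element concept, and that concept represented by a value domain forms a data element. The sketch below shows only that relationship; the class layout, the naming rule and the codes are assumptions for illustration and do not reproduce IMDB content.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataElementConcept:
    """An object class paired with one of its properties."""
    object_class: str
    property_name: str

    @property
    def name(self) -> str:
        # Hypothetical naming rule, used here only for display.
        return f"{self.property_name} of {self.object_class.lower()}"


@dataclass(frozen=True)
class ValueDomain:
    """Permitted codes mapped to their value meanings."""
    name: str
    values: dict


@dataclass(frozen=True)
class DataElement:
    """A data element concept represented by a value domain."""
    concept: DataElementConcept
    domain: ValueDomain

    def is_valid(self, code: str) -> bool:
        """Check a collected code against the value domain."""
        return code in self.domain.values


concept = DataElementConcept("Person", "Marital status")
# Illustrative codes only, not an official classification.
domain = ValueDomain("Example marital status codes", {"1": "Married", "2": "Single"})
element = DataElement(concept, domain)

print(concept.name)
print(element.is_valid("2"), element.is_valid("9"))
```

Separating the concept from its value domain means the same concept can be paired with different classifications without redefining what is being measured.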

Phase 2 Input Screens

Text strings related to data components

Directives Resource Bundle
Key          Value
SurveySDDS   Statistical Data Doc…
…            ...

Labels Resource Bundle
Key          Value
SurveySDDS   SDDS
…            ...

IMDB database

Phase 2 Input Screen: Administered Item

Phase 2 - Identification Tab

Systems and design issues

Dissemination and information discovery systems

• Web publication from IMDB is through HTML, dynamically generated with Perl scripts

• Conforms to government standards – CLF

• Survey-centric view and developing DE-centric view

• Discovery from Wiki solution – non-linear view of Phase 2 and 3 metadata

• Allows users to view links among administered items in the IMDB

Organizational and cultural issues

• Information management

• Assist in harmonization / usage of standards

• Knowledge sharing

• Corporate memory

• Reuse of our metainformation assets

Knowledge Sharing/Corporate Memory: Survey Life Cycle

[Diagram labels: IMDB; Design, Collect, Edit, Estimate, Tabulate, Publish; Survey, Universe, Frame, Instance, Collection Instrument, Methodology, Data Files; Enterprise Architecture; Concepts (Object Class, Property, Data Element Concept), Data Elements, Questions, Question Blocks, Classifications (Conceptual Domain, Value Domain)]

Corporate Memory: Data Files

[Diagram labels: IMDB; Operational Data, Registers, Survey Data, Administrative Data; Operational Data Stores; Clean Master File; Public Use Master File; Archival information; Archived Data]

Reuse of Information Assets: Information Discovery/Dissemination

[Diagram labels: IMDB; Wiki, HTML, SDMX, DDI, ?]

One metadata source, many uses for the information, many output formats
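
The "one source, many output formats" idea can be sketched as a single record passed through interchangeable renderers. This is an assumption-laden illustration: the record keys and function names are hypothetical, only HTML and JSON are shown, and no attempt is made to produce valid SDMX or DDI.

```python
import html
import json

# Hypothetical record; keys and values are placeholders, not IMDB content.
record = {
    "survey": "Example survey",
    "purpose": "Illustrates rendering only <not real content>",
    "contact": "Hypothetical contact",
}


def to_html(rec: dict) -> str:
    """Render the record as an escaped HTML definition list."""
    rows = "".join(
        f"<dt>{html.escape(key)}</dt><dd>{html.escape(value)}</dd>"
        for key, value in rec.items()
    )
    return f"<dl>{rows}</dl>"


def to_json(rec: dict) -> str:
    """Render the record as JSON with a stable key order."""
    return json.dumps(rec, sort_keys=True)


# Adding an output format means registering one more renderer here.
RENDERERS = {"html": to_html, "json": to_json}


def render(rec: dict, fmt: str) -> str:
    """Dispatch to the renderer registered for the requested format."""
    if fmt not in RENDERERS:
        raise ValueError(f"unsupported format: {fmt}")
    return RENDERERS[fmt](rec)


print(render(record, "html"))
print(render(record, "json"))
```

Because every renderer reads the same record, a correction made once in the source appears in all outputs.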

Reuse of Information Assets: Applications Development

[Diagram labels: IMDB; Classification coding; Collection instrument development; Publishing; Other applications]

Reuse of Information Assets: Integration with Data

[Diagram labels: IMDB; Data Warehouses; CANSIM]

Organizational and cultural issues

• STC is one of the most integrated statistical systems in the world

• As part of its Enterprise Architecture strategy – moving towards centralized and generalized systems, including the IMDB

• IMDB was built initially to support interpretation of disseminated data

• Pressure is to provide metadata up (and down) the statistical value chain and into management systems

• Opportunities at the Survey planning and design phase – reuse of existing metadata (variables, classifications, questions, etc) registered in the IMDB – coherence