3rd International Digital Curation Conference Washington, DC, Dec 2007 Paper Presentations: Interoperability, Metadata & Standards Data Documentation Initiative: Toward a Standard for the Social Sciences Mary Vardigan, Pascal Heus, Wendy Thomas ICPSR/University of Michigan / Open Data Foundation / Minnesota Population Center [email protected] / [email protected] / [email protected]


3rd International Digital Curation Conference, Washington, DC, Dec 2007

Paper Presentations: Interoperability, Metadata & Standards

Data Documentation Initiative: Toward a Standard for the Social Sciences

Mary Vardigan, Pascal Heus, Wendy Thomas

ICPSR/University of Michigan / Open Data Foundation / Minnesota Population Center

[email protected] / [email protected] / [email protected]


DDI Alliance – http://www.ddialliance.org

What is Metadata?

• Common definition: Data about Data

[Figure: “Unlabeled stuff” versus “Labeled stuff”]

The bean example is taken from: A Manager’s Introduction to Adobe eXtensible Metadata Platform, http://www.adobe.com/products/xmp/pdfs/whitepaper.pdf


Managing data and metadata is challenging!

We are in charge of the data. We support our users but also need to protect our respondents!

We want easy access to high quality and well documented data!

We need to collect the information from the producers, preserve it, and provide access to our users!

Producers

Librarians

Users

General Public

Policy Makers

Sponsors

Media/Press

Academic

Business

Government

We have an information management problem


Metadata issues

• Without producer / archive metadata
  – Researchers can’t discover data or perform efficient analysis

• Without researcher metadata
  – The research process is not documented and cannot be reproduced (Gary King replication standard!)
  – Other researchers are not aware of what has been done (duplication / lack of visibility)
  – Producers don’t know about data usage and quality issues

• Without standards
  – Such information can’t be properly managed and exchanged between actors or with the public

• Without tools
  – We can’t capture, preserve or share knowledge


XML to the rescue!

• XML stands for eXtensible Markup Language

• Technology that is driving today’s web service oriented architecture of the Internet and intranets

• Using XML, we can capture, structure, transform, discover, exchange, query, edit and secure metadata and data

• XML is platform & language independent and can be used by everyone

• XML is both machine and human readable

• XML is non-proprietary, public domain and many open tools exist

• Domain specific standards are available!
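To make the capture / exchange / query idea concrete, here is a minimal sketch using Python's standard xml.etree.ElementTree module. The element names (study, title, variable) and the sample study are made up for illustration; they are not taken from DDI or any other standard.

```python
import xml.etree.ElementTree as ET


def build_record(title, variables):
    """Capture: structure a study title and variable labels as XML.

    Element names here are hypothetical, chosen for illustration only.
    """
    study = ET.Element("study")
    ET.SubElement(study, "title").text = title
    for name, label in variables.items():
        var = ET.SubElement(study, "variable", name=name)
        var.text = label
    return study


def variable_labels(xml_text):
    """Query: parse serialized XML and return {variable name: label}."""
    root = ET.fromstring(xml_text)
    return {v.get("name"): v.text for v in root.findall("variable")}


# Illustrative sample data, not a real study.
record = build_record(
    "Household Survey",
    {"age": "Age in years", "sex": "Sex of respondent"},
)

# Exchange: the record is now plain text that any platform can read.
xml_text = ET.tostring(record, encoding="unicode")

labels = variable_labels(xml_text)
```

The same text can be read by a person, validated by a schema, or transformed by another tool, which is the point of the bullets above.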


Suggested XML metadata specifications for socio-economic data

• Statistical Data and Metadata Exchange (SDMX)
  – Macrodata, time series, indicators, registries
  – http://www.sdmx.org

• Data Documentation Initiative (DDI)
  – Microdata (surveys, studies)
  – http://www.ddialliance.org

• ISO 11179
  – Semantic modeling, concepts, registries
  – http://metadata-standards.org/11179/

• ISO 19115
  – Geography
  – http://www.isotc211.org/

• Dublin Core
  – Resources (documentation, images, multimedia)
  – http://www.dublincore.org
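As an illustration of resource-level metadata, the sketch below describes this presentation with three Dublin Core elements (title, creator, date) in the Dublin Core element set namespace. The wrapper element named record is an assumption made for the example, not part of Dublin Core, and the date encoding is one possible choice.

```python
import xml.etree.ElementTree as ET

# Namespace of the Dublin Core Metadata Element Set, version 1.1.
DC = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC)


def dublin_core_record(title, creators, date):
    """Build a small record using dc:title, dc:creator and dc:date."""
    record = ET.Element("record")  # hypothetical wrapper element
    ET.SubElement(record, f"{{{DC}}}title").text = title
    for creator in creators:
        ET.SubElement(record, f"{{{DC}}}creator").text = creator
    ET.SubElement(record, f"{{{DC}}}date").text = date
    return record


record = dublin_core_record(
    "Data Documentation Initiative: Toward a Standard for the Social Sciences",
    ["Mary Vardigan", "Pascal Heus", "Wendy Thomas"],
    "2007-12",
)

dc_xml = ET.tostring(record, encoding="unicode")
creators = [e.text for e in record.findall(f"{{{DC}}}creator")]
```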


The Data Documentation Initiative (DDI)

• International XML based specification for the documentation of social and behavioral data
  – Started in 1995, now driven by the DDI Alliance (30+ members)
  – Became an XML specification in 2000 (v1.0)
  – Current version is 2.1, with a focus on archiving (survey/codebook)

• New Version 3.0 (2008)
  – Focuses on the entire survey “Life Cycle”
  – Provides comprehensive metadata on the entire survey process and usage
  – Aligned with other metadata standards (DC, MARC, ISO 11179, SDMX, …)
  – Includes machine actionable elements to facilitate processing, discovery and analysis

• DDI is being adopted by producers/archives but needs to extend to the researchers (who are using the data!)


DDI 3.0 and the Survey Life Cycle

• A survey is not a static process: it evolves dynamically over time and involves many agencies/individuals

• DDI 2.x is about archiving; DDI 3.0 covers the entire “life cycle”

• 3.0 focuses on metadata reuse (minimizes redundancies/discrepancies, supports comparison)

• Also supports multilingual content, grouping, geography, and others

• 3.0 is extensible


Metadata Components

• Producer metadata
  – Codebook, questionnaires, reports, methodologies, processing, scripts, quality, admin, etc.

• Research metadata
  – Recodes, analysis, tables, scripts, papers, logs, data quality, usage
  – Citations, references
  – Activities, discussions, knowledge base

• Outputs
  – Papers, presentations, tables, reports


When to capture metadata?

• Metadata must be captured at the time the event occurs (not after the fact)!

• Documenting after the fact leads to considerable loss of information

• This is true for producers and researchers
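One way to capture metadata at the moment a processing step runs is to have the code record it. The sketch below is only an illustration: the documented decorator, the PROCESSING_LOG list and the age groups are all hypothetical, and a real project would write the log to a file or a metadata store.

```python
from datetime import datetime, timezone
from functools import wraps

# Hypothetical in-memory log; a real project would persist these entries.
PROCESSING_LOG = []


def documented(description):
    """Record what a processing step does each time it is called."""
    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            result = func(*args, **kwargs)
            PROCESSING_LOG.append({
                "step": func.__name__,
                "description": description,
                "timestamp": datetime.now(timezone.utc).isoformat(),
            })
            return result
        return wrapper
    return decorator


@documented("Collapse age in years into three groups")
def recode_age(age):
    # Illustrative cut points, not taken from any real survey.
    if age < 18:
        return "0-17"
    if age < 65:
        return "18-64"
    return "65+"


groups = [recode_age(a) for a in (12, 40, 70)]
```

Because the log entry is written by the same call that performs the recode, the documentation cannot drift away from what was actually done.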


Solutions?

• Simple solutions: use good practices
  – File and variable naming conventions, sound statistical methods (metadata in names!)
  – Comment source code
  – Document your work

• Adopt DDI & other standard based metadata solutions
  – DDI tools, citation database, source code level metadata capture, variable recodes, table disclosure, data quality feedback, comparability

• Take advantage of web based collaborative tools
  – Wiki, blogs, discussion groups, lists
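“Metadata in names” can be as simple as a file naming convention that a script can check and read back. The convention below (study, year, wave, version) is an assumption made up for this sketch; any consistent scheme would serve the same purpose.

```python
import re

# Hypothetical convention: <study>_<year>_w<wave>_v<version>.<extension>
FILENAME_PATTERN = re.compile(
    r"^(?P<study>[a-z]+)_(?P<year>\d{4})_w(?P<wave>\d+)"
    r"_v(?P<version>\d+)\.(?P<ext>[a-z0-9]+)$"
)


def parse_filename(filename):
    """Return the metadata encoded in a file name, or None if it does not conform."""
    match = FILENAME_PATTERN.match(filename)
    return match.groupdict() if match else None


info = parse_filename("hhsurvey_2007_w2_v3.csv")   # conforms to the convention
bad = parse_filename("final data (new).csv")       # does not conform
```

A check like this can run whenever a file is saved, so nonconforming names are caught before the context behind them is forgotten.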


Benefits

• Comprehensive data documentation
  – Through good metadata practices, comprehensive documentation captured by producers, librarians and users is available to ALL researchers

• Preservation, integration and sharing of knowledge
  – Research process is captured and preserved in standard formats
  – Research knowledge becomes an integral part of the survey and available to all
  – Reduces duplication of effort and facilitates reuse
  – Producer gets feedback from the data users (usage, quality issues), which leads to better and more relevant data

• Research outputs and dissemination
  – Facilitates production of research outputs
  – Facilitates dissemination and fosters broader visibility of research results


Conclusions

• Metadata is a crucial component of social and behavioral science

• The Data Documentation Initiative (DDI) is a globally accepted specification for capturing microdata documentation and knowledge

• Latest version 3.0 extends into the entire survey Life Cycle

• Producers and data archives are rapidly adopting metadata standards

• This adoption process should extend into the research community

• Best practices in data and metadata management benefit all users and have the potential to change the way we conduct research

• http://www.ddialliance.org or [email protected]