The Alliance for Data Archive Technologies: Looking towards a Common Future
Myron Gutmann, ICPSR
Ben Evans, ASSDA
Deborah Mitchell, ASSDA
Kevin Schürer, UK Data Archive
Overview
• Why?
• What?
• Why Now?
• Early Steps
• Understanding Process
• Understanding Needs
• Next Steps
Why?
• Data curation has been an ad hoc process, with local practices & expertise
• Since the 1990s
– Enormous investment in technology
– Significant successes in social science (SDA, Nesstar, DVN, IPUMS, even ICPSR)
– Major new ways to find & use content (Google) & architectures to deliver content (web services)
More Why
• Proprietary systems unsustainable
• Market too small for commercial systems
• Partnerships will help avoid unnecessary duplication of effort & assure efficiency
• Need to be truly global
What?
• New organization to support technologies for curation, preservation, & delivery that are:
– Open
– Community-developed
– Standards-based
• Built on existing networks of social science data archives & technology centers, and …
• Open to all who want to contribute
Why Now? Three Standards
• DDI – Metadata Standard
• OAIS – Preservation Reference Model
• Repository Architecture Standards:
- Fedora, DSpace & DuraSpace
• Organizational models like the DDI Alliance, CESSDA, Data-PASS (even the new HathiTrust)
Why Now? Community Tech
• Community-developed software has become widely used
• Examples: Drupal/Plone
• Examples: Fedora
• Examples: SOLR/Lucene
• But we shouldn’t ignore all the challenges that this software has faced
Why Now? Workflows
• Improved workflow technologies are operating in many of our institutions
• Some are shared in CESSDA & Data-PASS
• And in other communities: Virtual Observatory
• Another challenge: not the same as sharing business practices in complex organizations
Why Now? Progress So Far
• SDA
• Nesstar
• DVN
• All used in more than one archive
• Not all open-source
• Potential shared technologies that we can leverage in the future
1st Steps: October 2008 Meeting
• ICPSR
• ASSDA
• UKDA
• Roper Center - UConn
• Odum Institute – N. Carolina
• Harvard - IQSS
• Minnesota Pop. Center
• Berkeley – SDA
• DANS – Netherlands
• DDA Denmark
• Gesis – ZA
• South Africa
• DDI Alliance
• IASSIST
• Library of Congress
• U.S. NSF
• U.S. NIH
• Canadian SSHRC
***Thanks to Library of Congress for hosting
1st Steps: After October, 2008
• Solicit needs in the form of wish lists
• Authorize creation of an organization at an appropriate time
• Work on raising money and finding common ground for future work
Process: Begin with OAIS Model
Design OAIS for ICPSR
Focus on Ingest
ICPSR: Standards Compliance
OAIS Workflow
• Ingest tools
• AIP Creation-Validation
• SIP Creation-Validation
• DIP Creation-Validation
• Audit tools
DDI Workflow
• Tools for full variable-level metadata creation not dependent on proprietary software (such as SPSS)
• DDI Editor
• DDI Converter
• DDI 2 to 3 translator
Needs: Wish Lists from …
• ICPSR
• UKDA
• ASSDA
• Harvard
• Roper Center
• Odum Institute
• DANS (Netherlands)
• DDA (Denmark)
• GESIS (Germany)
• NSD (Norway)
• Minnesota Pop. Center
Needs: A Catalog
Ingest
• Open metadata curation
• Confidentiality
• Software/algorithm archiving
Data Management
• Open metadata curation
• Data format curation
• Data management & analysis
• Qualitative data management
• Data integration
• Metadata registries
• Survey question management
• Data citation
Archival Storage
• Storage fabric/architecture (FEDORA or ?)
• Replication (LOCKSS)
• Persistent identifiers
• Content model development
Access
• Data format conversion
• Setup file creation
• International data sharing
• Community data/User comments/Web 2.0
• Search
• Confidentiality
• Persistent identifiers
• Visualization
• Data citation
• Semantic data access
• Security
Administration
• Identity management
• OAIS workflow & audit (SIP/AIP/DIP)
Production
• Data producer tools
Next Steps: Canberra Meeting
• Prime Goal: Strategic Planning
• What’s the business model?
• What are the links to…
– Standards?
– Security?
– Archiving practice & workflows?
– Training & Research?
• How do we measure success?
Three Major Outcomes
• Goal 1: A few critical decisions
– Standards, repository framework, software approaches
• Goal 2: Initial Common Interests. Examples:
– Fedora data/content models
– Open source metadata tools (DDI 3?)
• Goal 3: How do we collaborate?