Data Repositories and Services
Xiamen University Library June 8, 2012
Jian Qin
School of Information Studies, Syracuse University
http://eslib.ischool.syr.edu/jqin/
Agenda
• What is a repository? Repository software?
• What does it do?
• How does it work?
• Case studies:
– Dryad: an international repository of data and publications for basic and applied biosciences
– Dataverse: a data repository system
2 Data repositories and services 6/8/12
What is a data repository?
Data Repository is a logical (and sometimes physical) partitioning of data where multiple databases which apply to specific applications or sets of applications reside.
http://www.learn.geekinterview.com/data-warehouse/dw-basics/what-is-data-repository.html
Repository commonly refers to a location for storage, often for safety or preservation.
http://en.wikipedia.org/wiki/Repository
WHAT CAN WE EXPECT IN A DATA REPOSITORY?
Technical features
• Standards
– OAI-PMH
– Z39.50 protocol
– Open source license
• Hardware
– Minimum hardware requirements
– SAN support
• Software
– OS
– Programming language
– Database
– Web server
– Java servlet engine
– Search engine
– Other
• Staff requirements
– UNIX systems administrator
– Java programmer
– Perl programmer
– Python programmer
Open Society Institute. (2004). A guide to institutional repository software. 3rd ed. http://www.soros.org/openaccess/pdf/OSI_Guide_to_IR_Software_v3.pdf
Features and functions
• Repository & system administration
– User registration, authentication & password administration
– Module-level APIs
• Content submission administration
– Define multiple collections within the same instance of the system
– Submission stages
– Submission support
– System-generated usage stats and reports
Open Society Institute. (2004). A guide to institutional repository software. 3rd ed. http://www.soros.org/openaccess/pdf/OSI_Guide_to_IR_Software_v3.pdf
Functions of repositories
• Content management
– Content import/export
– Document/object formats
– Metadata
– Real-time updating and indexing of accepted content
• Dissemination
– User interface
– Search capability
  • Full text
  • All descriptive metadata
  • Selected metadata fields
  • Browse
  • Sort search results
– Indexed by Google/other search engines
• Archiving
– Persistent document identification
– Data preservation report
– Object history/version control
• System maintenance
– System support
  • Documentation/manual
  • Listserv
  • Bug tracking/feature request system
  • Formal support/help desk
Open Society Institute. (2004). A guide to institutional repository software. 3rd ed. http://www.soros.org/openaccess/pdf/OSI_Guide_to_IR_Software_v3.pdf
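The search capabilities listed under Dissemination (all descriptive metadata, selected metadata fields, sorted results) can be illustrated with a toy in-memory sketch. The records, field names, and the `search` and `sort_results` helpers below are hypothetical and invented for illustration; real repository software would typically delegate this work to a search engine.

```python
# Hypothetical metadata records; the field names are illustrative only.
RECORDS = [
    {"title": "Bird song recordings", "creator": "Lee", "year": 2011},
    {"title": "Soil samples", "creator": "Garcia", "year": 2009},
    {"title": "Bird migration tracks", "creator": "Chen", "year": 2010},
]

def search(records, term, fields=None):
    """Return records whose metadata contains `term`.

    With fields=None every metadata field is searched; otherwise only
    the selected fields are searched.
    """
    term = term.lower()
    hits = []
    for record in records:
        searched = fields if fields is not None else list(record.keys())
        if any(term in str(record.get(name, "")).lower() for name in searched):
            hits.append(record)
    return hits

def sort_results(hits, key):
    """Sort search results by a single metadata field."""
    return sorted(hits, key=lambda record: record[key])

# Search only the title field, then sort the results by year.
bird_hits = sort_results(search(RECORDS, "bird", fields=["title"]), "year")
for record in bird_hits:
    print(record["year"], record["title"])
```

Searching a selected field narrows the match: `search(RECORDS, "lee", fields=["title"])` finds nothing, while the same term searched across all fields matches the creator.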
The context of repositories
[Diagram: within the research community, a data repository (datasets) and an institutional repository (publications, presentations, reports, etc.), framed by disciplines, standards, and technology]
Institutional repositories
• An institutional repository (IR) consists of formally organized and managed collections of digital content generated by faculty, staff, and students at an institution
• Types of IRs:
– Collection-based digital repositories managed by library professionals
– Course management systems and associated file stores
– Collections of research data and reports managed by research units (centers, laboratories, etc.)
– Student academic portfolio systems
– Institutional file storage systems
– Digital asset management workflow systems
– Web content management systems used by institutions or departments to store and stage web content
EDUCAUSE Evolving Technologies Committee. (2003). Institutional repositories: Enhancing teaching, learning, and research. http://net.educause.edu/ir/library/pdf/DEC0303.pdf
Data repositories
• No one agreed-upon definition
• Characteristics:
– A repository operated by an academic institution/unit or a research organization
– A system for storing, managing, preserving, and providing access to data
– Centered on a discipline or a research field involving multiple disciplines
– Policies governing intellectual property rights, management, access, sharing, and citation
Dryad: a repository for data and publications
http://datadryad.org/
• As a data repository, Dryad provides a platform to associate data with underlying publications.
• Content acquisition: user submission
• How to motivate users to submit data?
– Make it simple and rewarding
– Provide detailed support information about:
  • Depositing data
  • Managing data
  • Intellectual property rights (CC0)
  • Downloading data packages
  • Viewing usage statistics
Dryad metadata record example
http://datadryad.org/handle/10255/dryad.8085
Dryad metadata record example (cont’d)
Individual files in the data package. The metadata shows:
• Number of downloads
• File technical data
• Copyright type
• Documentation for the data file
Dryad Backend
• Built on DSpace, using its core features with modifications or complete replacements
• Uses OAI-PMH to allow metadata harvesting
– Metadata formats available for harvesting include METS/MODS, OAI-DC (Dublin Core), OAI-ORE/Atom, and RDF/DC
• Uses DOIs to identify Dryad data packages and files
http://wiki.datadryad.org/Category:Technical_Documentation
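As a rough illustration of what metadata harvesting looks like from the harvester's side, the sketch below builds two OAI-PMH request URLs. The verbs `ListMetadataFormats` and `ListRecords` and the `metadataPrefix` argument come from the OAI-PMH protocol; the base URL and the `oai_request_url` helper are hypothetical and do not represent Dryad's actual endpoint.

```python
from urllib.parse import urlencode

# Hypothetical endpoint, used only for illustration; not taken from the slides.
BASE_URL = "https://repository.example.org/oai/request"

def oai_request_url(base_url, verb, **params):
    """Build an OAI-PMH request URL for the given verb and arguments."""
    query = {"verb": verb}
    query.update(params)
    return base_url + "?" + urlencode(query)

# Ask which metadata formats the repository can disseminate.
formats_url = oai_request_url(BASE_URL, "ListMetadataFormats")

# Request records in unqualified Dublin Core (the oai_dc prefix).
records_url = oai_request_url(BASE_URL, "ListRecords", metadataPrefix="oai_dc")

print(formats_url)
print(records_url)
```

A harvester would send these as plain HTTP GET requests and receive XML responses; swapping `oai_dc` for another prefix reported by `ListMetadataFormats` would request a different metadata format.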
DOI Examples
• Data packages
– doi:10.5061/dryad.1664
– doi:10.5061/dryad.642
– doi:10.5061/dryad.1307
• Data files
– doi:10.5061/dryad.1664/1
– doi:10.5061/dryad.642/1
– doi:10.5061/dryad.1307/1
– doi:10.5061/dryad.1307/2
– doi:10.5061/dryad.1307/3
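The examples suggest a simple convention: a data file's DOI is its package DOI followed by a slash and a file number. A minimal sketch of splitting such identifiers is shown below. The pattern is inferred only from the examples on this slide, and the helper names `parse_dryad_doi` and `resolver_url` are hypothetical.

```python
import re

# Pattern inferred from the examples on this slide only; other DOIs may differ.
DRYAD_DOI = re.compile(r"^doi:(10\.5061/dryad\.\w+)(?:/(\d+))?$")

def parse_dryad_doi(doi):
    """Return (package_doi, file_number); file_number is None for a package."""
    match = DRYAD_DOI.match(doi)
    if match is None:
        raise ValueError(f"not a recognized Dryad DOI: {doi!r}")
    package, file_number = match.groups()
    return package, int(file_number) if file_number else None

def resolver_url(doi):
    """Turn 'doi:10.xxxx/...' into a link through the public DOI resolver."""
    if not doi.startswith("doi:"):
        raise ValueError(f"expected a 'doi:' prefix: {doi!r}")
    return "https://doi.org/" + doi[len("doi:"):]

print(parse_dryad_doi("doi:10.5061/dryad.1307"))    # a data package
print(parse_dryad_doi("doi:10.5061/dryad.1307/2"))  # second file in that package
print(resolver_url("doi:10.5061/dryad.642/1"))
```

Keeping the package DOI as a prefix of every file DOI means a file can always be traced back to its package without a lookup.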
DATA REPOSITORY SOFTWARE
Dataverse metadata editing interface
Dataverse metadata editing interface (cont’d)
Standards and tools for repositories
• Open Archives Initiative (OAI) and its Protocol for Metadata Harvesting (OAI-PMH)
• Tools (open source):
– DSpace (http://www.dspace.org)
– Fedora (http://www.fedora-commons.org/)
– Dataverse (http://thedata.org/)
– EPrints (http://www.eprints.org/)
– More: http://oad.simmons.edu/oadwiki/Free_and_open-source_repository_software
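To give a feel for what a harvester receives over OAI-PMH, the sketch below parses a small response carrying one Dublin Core record. The three XML namespaces are the standard OAI-PMH, oai_dc, and Dublin Core ones; the sample record, its identifier, and the `extract_records` helper are invented for illustration and are not output from any of the tools listed above.

```python
import xml.etree.ElementTree as ET

NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Invented sample response (trimmed); a real one carries more elements.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:example.org:123</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Example dataset</dc:title>
          <dc:creator>Doe, Jane</dc:creator>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""

def extract_records(xml_text):
    """Collect the identifier and Dublin Core fields of each harvested record."""
    root = ET.fromstring(xml_text)
    records = []
    for record in root.findall("oai:ListRecords/oai:record", NS):
        identifier = record.findtext("oai:header/oai:identifier", namespaces=NS)
        dc = record.find("oai:metadata/oai_dc:dc", NS)
        fields = {}
        if dc is not None:
            for element in dc:
                # Strip the '{namespace}' part of the tag to get 'title', 'creator', ...
                name = element.tag.split("}", 1)[1]
                fields.setdefault(name, []).append(element.text)
        records.append({"identifier": identifier, "fields": fields})
    return records

harvested = extract_records(SAMPLE)
print(harvested)
```

Each field maps to a list because Dublin Core elements are repeatable, so a record may carry several creators or subjects.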