Sharing Data-Rich Research Through Repository Layering Stephen Abrams California Digital Library Angela Rizk-Jackson Julia Kochi University of California, San Francisco Noah Wittman University of California, Berkeley


Sharing Data-Rich Research Through Repository Layering

Stephen Abrams, California Digital Library

Angela Rizk-Jackson and Julia Kochi, University of California, San Francisco

Noah Wittman, University of California, Berkeley


Why is data curation important?

Accelerating scientific progress
Enabling appropriate scrutiny and verification of results
Promoting integrity and debate
Facilitating new collaborations
Avoiding needless duplication of effort
Increasingly, complying with institutional policies, publication requirements, and funder mandates

Cf. Whyte and Tedds (2011), “Making the case for research data management,” DCC briefing paper, www.dcc.ac.uk/resources/briefing-papers/making-case-rdm


The library’s role

A continuation of its long-standing mission and practice to connect patrons with content of interest in meaningful ways across barriers of space and time

Cf. Tenopir et al. (2012), “Academic librarians and research data services: Preparation and attitudes,” 78th IFLA General Conference and Assembly, Helsinki, conference.ifla.org/past/ifla78/116-tenopir-en.pdf

Offering solutions that enhance the natural points of alignment between the scholarly research and information lifecycles

[Diagram: the scholarly lifecycle and the information lifecycle, with stages labeled Publish, Reuse, Share, Create, Discover, Collect, Preserve, Access, Research, and Curation]


Merritt

Curation repository available to the UC community and external partners
Preservation and access
Content agnostic, model free
Highly decentralized micro-services architecture

Cf. Abrams, Cruse, Kunze, and Minor (2011), “Curation micro-services: A pipeline metaphor for repositories,” Journal of Digital Information 12(2), journals.tdl.org/jodi/article/view/1605

26 curatorial units, 271 collections, 325,000 objects, 450,000 versions, 4,500,000 files, 13 TB

www.cdlib.org/uc3/merritt
merritt.cdlib.org


Merritt

[Architecture diagram. Components shown: User agent, Load balancers, UI/API (three instances), Ingest (three instances), Message queue, Inventory, Fixity, Storage broker, Storage nodes, SAN, ONEShare UNM storage node, EZID, DataCite, IDF, DataONE member node, Web of Knowledge, Primo, …]


(Some) issues to address

Scale
  Individual objects ranging from 0 to 47,000 files
  Individual files ranging from 0 to 14 GB

Maintaining control
  Concern over potential loss of control over dissemination and use of data

User experience
  Switch from organizational to individual interaction



Augment repository function by composition (when possible) and addition (when necessary)
  Loosely-coupled integration with external community-supported systems and services


Scale

Avoiding client timeout
  ≤ 2 GB: File-based stream-based AIP-to-DIP processing
  > 2 GB: Asynchronous delivery
    Email notification with personalized, time-limited URL

Streamlined storage provisioning
  SDSC cloud, cloud.sdsc.edu
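
The size-based delivery rule and the personalized, time-limited URL described above can be sketched as follows. This is a minimal illustration, not Merritt's implementation: the function names, the signing key, the query parameter names, and the choice of HMAC-SHA256 are all assumptions, and the 2 GB threshold is taken to mean binary gigabytes.

```python
import hashlib
import hmac
import time
from typing import Optional
from urllib.parse import parse_qs, urlencode, urlparse

SYNC_LIMIT_BYTES = 2 * 1024**3   # the 2 GB threshold from the slide (binary GB assumed)
SECRET_KEY = b"replace-me"       # hypothetical signing key, for illustration only


def delivery_mode(size_bytes: int) -> str:
    """Stream objects up to the limit; deliver larger ones asynchronously."""
    return "stream" if size_bytes <= SYNC_LIMIT_BYTES else "async"


def _signature(user: str, object_id: str, expires: int) -> str:
    payload = f"{user}|{object_id}|{expires}".encode()
    return hmac.new(SECRET_KEY, payload, hashlib.sha256).hexdigest()


def signed_url(base_url: str, user: str, object_id: str,
               ttl_seconds: int, now: Optional[float] = None) -> str:
    """Build a URL tied to one user and one object that expires after ttl_seconds."""
    issued = time.time() if now is None else now
    expires = int(issued + ttl_seconds)
    query = urlencode({
        "user": user,
        "object": object_id,
        "expires": expires,
        "sig": _signature(user, object_id, expires),
    })
    return f"{base_url}?{query}"


def is_valid(url: str, now: Optional[float] = None) -> bool:
    """Accept the URL only if the signature matches and it has not expired."""
    params = {k: v[0] for k, v in parse_qs(urlparse(url).query).items()}
    try:
        expires = int(params["expires"])
        expected = _signature(params["user"], params["object"], expires)
        supplied = params["sig"]
    except (KeyError, ValueError):
        return False
    current = time.time() if now is None else now
    return hmac.compare_digest(supplied, expected) and current < expires
```

In this sketch an object larger than the limit would be packaged in the background, and the URL returned by `signed_url` would be sent in the notification email; tampering with the user or object in the query string invalidates the signature.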



Control

Data use agreements (DUAs)
  Explicit assertion of license requirements and terms of use
  Curatorial and consumer notification of acceptance

Cf. Brazhnik and Jones (2007), “Anatomy of data integration,” Journal of Biomedical Informatics 40(3): 252-69, doi:10.1016/j.jbi.2006.09.001

From: [email protected]: Merritt DUA acceptance

Name: Stephen Abrams
Affiliation: California Digital Library
Collection: UCSF DataShare
Object: Frontotemporal Lobar Degeneration (FTLD)
Date: 2013-05-31 09:50:34 PDT
Terms of use: As part of this agreement, Consumer submits to the following statements:
(1) I will receive access to de-identified data and will not attempt to establish the identity of any of the study subjects.
(2) I will share these data only with my immediate co-workers, and I will not transfer these data to other research groups. I understand that these data are available to other research groups through the process by which I obtain them.
(3) I will require anyone in my group who utilizes these data, or anyone with whom I share these data to comply with this data use agreement ...
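
A notification like the sample above could be assembled along these lines. This is a hedged sketch using Python's standard `email` package: the function name, its parameters, and the example addresses are hypothetical, and the field labels are simply copied from the sample message rather than from any documented Merritt format.

```python
from email.message import EmailMessage


def dua_acceptance_message(sender, recipients, name, affiliation,
                           collection, obj, date, terms):
    """Build a DUA acceptance notice for the curator and the consumer.

    Field labels mirror the sample message; everything else is illustrative.
    """
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = ", ".join(recipients)
    msg["Subject"] = "Merritt DUA acceptance"
    body = "\n".join([
        f"Name: {name}",
        f"Affiliation: {affiliation}",
        f"Collection: {collection}",
        f"Object: {obj}",
        f"Date: {date}",
        f"Terms of use: {terms}",
    ])
    msg.set_content(body)
    return msg


# Example addresses are placeholders, not real Merritt contacts.
notice = dua_acceptance_message(
    sender="repository@example.org",
    recipients=["curator@example.org", "consumer@example.org"],
    name="Stephen Abrams",
    affiliation="California Digital Library",
    collection="UCSF DataShare",
    obj="Frontotemporal Lobar Degeneration (FTLD)",
    date="2013-05-31 09:50:34 PDT",
    terms="As part of this agreement, Consumer submits to the following statements: ...",
)
```

Sending one message to both parties is only one way to satisfy "curatorial and consumer notification"; separate messages would work equally well.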



User experience

Due to its open eligibility policy, Merritt will always provide a more generic UX than special-purpose or disciplinary systems

Shifting user roles, shifting expectations
  Institutional → individual researcher
  Behavioral expectations set by the commercial/mobile web


Integration with extant services that better provide the desired UX
  DataShare
  Research Hub


DataShare

“The goal of the DataShare project is to catalyze widespread sharing of scientific research data”
datashare.ucsf.edu

UCSF Clinical and Translational Science Institute, ctsi.ucsf.edu
UCSF Library, www.library.ucsf.edu
UCSF Center for Imaging of Neurodegenerative Disease, www.radiology.ucsf.edu/cind

Architecture
  DataShare submission client (Ruby/Rails)
  Merritt curation repository
  DataShare discovery portal (XTF/Java)

DataShare

Prepare, Describe, Upload, Curate, Discover, Share

Prepare: best practice advice
Describe: schema-directed metadata editor; DataCite schema, schema.datacite.org
Upload: file browse or drag-n-drop
Curate: manage datasets
Discover: faceted search and browse
Share: DataONE, DataCite (soon), Primo, Web of Knowledge, SEO
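
To make the Describe step concrete, here is a small sketch of the kind of check a schema-directed editor might run before submission. It tests for the five properties the DataCite schema treated as mandatory at the time (identifier, creator, title, publisher, publication year); later schema versions may require more. The flat dictionary layout, the function name, and the sample values are assumptions made for illustration and do not describe DataShare's actual editor.

```python
# Property names follow DataCite's mandatory properties; the flat dict layout
# is a hypothetical simplification, not the DataCite XML structure.
REQUIRED_PROPERTIES = ("identifier", "creator", "title", "publisher", "publicationYear")


def missing_properties(record):
    """Return the mandatory properties that are absent or blank in a record."""
    missing = []
    for name in REQUIRED_PROPERTIES:
        value = record.get(name)
        if value is None or not str(value).strip():
            missing.append(name)
    return missing


# A made-up record with an empty publisher field.
record = {
    "identifier": "doi:10.99999/EXAMPLE",
    "creator": "Abrams, Stephen",
    "title": "Example dataset",
    "publisher": "",
    "publicationYear": 2013,
}
problems = missing_properties(record)
```

An editor could use the returned list to highlight the fields that still need input before the upload step is enabled.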


Merritt + DataShare

[Architecture diagram. The Merritt components shown earlier (User agent, Load balancers, UI/API, Ingest, Message queue, Inventory, Fixity, Storage broker, Storage nodes, SAN, ONEShare UNM storage node, EZID, DataCite, IDF, DataONE member node, Web of Knowledge, Primo, …) with the DataShare layer added: DataShare upload, Collection Atom feed, XTF (xtf.cdlib.org), DataShare portal]
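
The diagram places a collection Atom feed between the repository and the XTF-based portal. As a rough illustration of how a discovery layer might read such a feed, the sketch below parses standard Atom elements with Python's standard library. The sample feed, its identifiers, and the function name are invented for the example; the actual contents of Merritt's feed are not described in the slides.

```python
import xml.etree.ElementTree as ET

ATOM_NS = {"atom": "http://www.w3.org/2005/Atom"}

# Hypothetical feed content, for illustration only.
SAMPLE_FEED = """<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Hypothetical collection feed</title>
  <updated>2013-05-31T09:50:34-07:00</updated>
  <entry>
    <id>urn:example:object-1</id>
    <title>Example dataset one</title>
    <updated>2013-05-30T12:00:00-07:00</updated>
    <link rel="alternate" href="https://example.org/objects/1"/>
  </entry>
  <entry>
    <id>urn:example:object-2</id>
    <title>Example dataset two</title>
    <updated>2013-05-31T09:50:34-07:00</updated>
  </entry>
</feed>"""


def feed_entries(feed_xml):
    """Extract id, title, updated, and link href from each Atom entry."""
    root = ET.fromstring(feed_xml)
    records = []
    for entry in root.findall("atom:entry", ATOM_NS):
        link = entry.find("atom:link", ATOM_NS)
        records.append({
            "id": entry.findtext("atom:id", default="", namespaces=ATOM_NS),
            "title": entry.findtext("atom:title", default="", namespaces=ATOM_NS),
            "updated": entry.findtext("atom:updated", default="", namespaces=ATOM_NS),
            "href": link.get("href") if link is not None else None,
        })
    return records
```

A portal could poll the feed, compare each entry's `updated` value with what it last indexed, and fetch only new or changed objects, which keeps the two systems loosely coupled.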


Research Hub

“Research Hub provides powerful tools for content management and collaboration”

hub.berkeley.edu

Alfresco CMS, www.alfresco.com

770 projects, 3,900 users
Personal file management
Project collaboration
Departmental resource pooling
Research data management

Desktop sync, mobile app, Adobe Creative Suite

UC Berkeley Information Services and Technology, ist.berkeley.edu

Research Hub

Prepare, Describe, Upload, Curate, Discover, Share

Prepare: acquire and arrange
Describe: schema-directed metadata editors
Upload: direct action, policy-based workflow rules, drag-and-drop, confirmation
Curate: manage datasets
Discover
Share


Merritt + DataShare + Research Hub

[Architecture diagram. The Merritt and DataShare components shown earlier (User agent, Load balancers, UI/API, Ingest, Message queue, Inventory, Fixity, Storage broker, Storage nodes, SAN, ONEShare UNM storage node, EZID, DataCite, IDF, DataONE member node, Web of Knowledge, Primo, DataShare upload, Collection Atom feed, XTF (xtf.cdlib.org), DataShare portal, …) with Research Hub added]


Next steps

Self-service account registration
  UCTrust and InCommon Shibboleth federations

Additional cloud-based replication

Outreach

Integration with Open Context archaeological portal, opencontext.org
  Atom-based submission

Integration with Nuxeo, www.nuxeo.com
  UC system-wide DAMS solution

Integration with Islandora, islandora.ca
  Collaboration with UCLA Library
  Tuque API

Integration with DPN, www.dpn.org


Sharing research through repositories

Conform to institutional policy, publication requirements, and funder mandates

Pro-active curation of valuable research outputs
  Stable citation and access
  High visibility publication and discovery
  Use metrics

Repository layering as an appropriate division of labor
  Exploiting existing capabilities already in local use


For more information

Merritt
www.cdlib.org/uc3/[email protected] Abrams
Patricia Cruse, Shirin Faenza, Scott Fisher, Erik Hetzner, Joshua Hubbard, Greg Janée, John Kunze, Rosalie Lack, David Loy, Mark Reyes, Joan Starr, Carly Strasser, Marisa Strong, Bhavitavya Vedula, Kenneth Weiss, Perry Willet

DataShare
datashare.ucsf.edu
Geoffrey Boushey, Anirvan Chatterjee, Maninder Kahlon, Julia Kochi, Angela Rizk-Jackson, Michael Weiner

Research Hub
hub.berkeley.edu
Ian Crew, Michael McCarthy (Tribloom), Noah Wittman, Patrick McGrath

www.slideshare.net/UC3/or-2013abramssharingdatarichresearch