An Introduction to the Merritt Curation Repository University of California Curation Center Team California Digital Library June 9, 2011 UC3 Summer Webinar Series



Page 1: An Introduction to the Merritt Curation Repository

An Introduction to the Merritt Curation Repository

University of California Curation Center Team

California Digital Library

June 9, 2011

UC3 Summer Webinar Series

Page 2: An Introduction to the Merritt Curation Repository

First, a word about the webinar series…

• A forum for timely topics of interest to the UC community

– Highlighting projects, services, and developments in the areas of digital preservation, web archiving, and data curation

– Intended to raise awareness of issues and provide information on useful resources and services available to the UC community

– 2nd and 4th Thursday of the month, and as scheduled, featuring UC3 staff and UC librarians, content managers, and technologists

Teleconference: +1 (866) 740-1260, access code 9879016#

Webconference: http://bit.ly/jdjMAP

Page 3: An Introduction to the Merritt Curation Repository

First, a word about the webinar series…

• Some logistics…

– Participant phones will be muted during the formal presentation, but we will be monitoring the online chat

– Slides, Q & A, and web and voice recordings will be posted after each presentation

– Schedule available at http://www.cdlib.org/uc3/uc3webinars.html

– Please suggest additional [email protected]

– Take the short survey: http://www.surveymonkey.com/s/XSGWP8R

Page 4: An Introduction to the Merritt Curation Repository

Now on with the show…

• Today’s topic is an introduction to the Merritt curation repository

– Who is it for?

– What can it do?

– Why use it?

– What does it cost?

– Next steps?

– Q & A

Page 5: An Introduction to the Merritt Curation Repository

What keeps you up at night?

Are there standards or best practices I should be aware of?

How much will it cost?

How can I transfer my content to an appropriate curation environment?

How do I know my content is safe?

What’s the best strategy to ensure permanent availability?

Do I need to create new derivatives just for preservation purposes?

How can I get a persistent reference to my content?

What if my content needs to evolve over time?

Can I control who can see my content?

I have a good discovery platform; how can I add preservation services?

Page 6: An Introduction to the Merritt Curation Repository

“There’s an app for that”

Are there standards or best practices I should be aware of?

How much will it cost?

How can I transfer my content to an appropriate curation environment?

How do I know my content is safe?

What’s the best strategy to ensure permanent availability?

Do I need to create new derivatives just for preservation purposes?

How can I get a persistent reference to my content?

What if my content needs to evolve over time?

Can I control who can see my content?

I have a good discovery platform; how can I add preservation services?

Automatic replication and high-availability redundancy

Periodic fixity audit

Simple submission UI/API

METS “feeder” duplicates existing DPR workflow

Model free

No packaging, format, or metadata requirements

Strongly versioned

Integration with EZID and DataCite

Curator-defined access control rules

Modular micro-services “toolkit”

UC3 consultation

Storage at $1.04/GB/year

Page 7: An Introduction to the Merritt Curation Repository

Merritt repository

• Merritt is available for use by all members of the UC community

– Libraries/archives/museums

– ORU/MRUs

– Faculty/staff

• Centrally hosted by UC3/CDL on behalf of the UC community

– Economies of scale

– Shared experience and expertise

Mediated through campus libraries

Page 8: An Introduction to the Merritt Curation Repository

Modes of use: dark archive

• Pro-active preservation, but no expectation of direct end user access

– Legacy DPR content contributed by campus libraries

– Cultural heritage texts, master images, sound, moving image, data sets

– All DPR content will be automatically migrated to Merritt

Page 9: An Introduction to the Merritt Curation Repository

Modes of use: bright archive

• Provide preservation and end user access

– NIH Healthy Pathways project on bio-demographics

• Multi-institutional: UC Davis, University of Colorado, University of Virginia, Syddansk University (Denmark)

• Need to restrict access to project partners initially, with eventual public access

Page 10: An Introduction to the Merritt Curation Repository

Modes of use: bright archive

• Content discovery: search

Page 11: An Introduction to the Merritt Curation Repository

Modes of use: bright archive

• Content discovery: search

Page 12: An Introduction to the Merritt Curation Repository

Modes of use: bright archive

• Content discovery: browse

Page 13: An Introduction to the Merritt Curation Repository

Modes of use: bright archive

• Content discovery: browse

Page 14: An Introduction to the Merritt Curation Repository

Modes of use: preservation “back end”

• Preservation only; content discovery/delivery provided by well-known external systems

– Using direct hooks into Merritt to retrieve content

– eScholarship: open access publishing

– Open Context: archaeological data publishing

– Investigating integration with Islandora/Drupal and Alfresco

Page 15: An Introduction to the Merritt Curation Repository

Modes of use: distributed data grids

• DataONE “Enable new science and knowledge creation through universal access to data about life on earth and the environment that sustains it”

Page 16: An Introduction to the Merritt Curation Repository

More information

• Online help http://merritt.cdlib.org/help

• FAQ http://merritt.cdlib.org/docs/merritt_handout.pdf

• User’s guide http://merritt.cdlib.org/docs/merritt_user_guide.pdf

• UC3 contact http://www.cdlib.org/uc3/[email protected]

Page 17: An Introduction to the Merritt Curation Repository

Merritt cost model

• UC3 provides technical infrastructure, data center hosting, staff, monitoring, maintenance, enhancements, help, outreach, consultation, etc.

• Contributors are charged only for storage used, at the UC3 recovery rate of $1.04/GB/year

• Developing an “endowment” model: Pay once, preserve forever

• Will soon extend model for non-UC contributors

How does this compare?

• Cost of a physical book in RLF †: $4.62/year

• Cost of a digital book in HathiTrust ‡: $0.15/year

• Cost of a digital book in Merritt: $0.06/year

† Gary Lawrence (2007) Internal analysis, CDL; ‡ Paul Courant and Matthew Nielsen (2010), On the cost of keeping a book, HathiTrust.

Page 18: An Introduction to the Merritt Curation Repository

Average collection sizes and costs

Collection        Objects   Size        Annual cost
CA DOE reports      8,000    12.0 GB    $12.48
Cal Cultures          420    65.6 GB    $68.22
eScholarship       46,425   118.6 GB    $123.34

A “cost calculator” spreadsheet is available at http://www.cdlib.org/uc3/docs/Merritt-cost-calculator-v3.xlsx
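The annual costs in this table follow from the flat storage rate of $1.04/GB/year quoted on the cost model slide. As a rough sketch (not the UC3 cost calculator itself), the arithmetic can be reproduced as below; the function name `annual_cost` and the half-up rounding to whole cents are assumptions made for illustration.

```python
from decimal import Decimal, ROUND_HALF_UP

# UC3 recovery rate quoted on the slides, in dollars per GB per year.
RATE_PER_GB_YEAR = Decimal("1.04")


def annual_cost(size_gb: str) -> Decimal:
    """Annual storage charge in dollars for size_gb gigabytes.

    Rounding to whole cents (half up) is an assumption; the slides do not
    say how fractional cents are handled.
    """
    cost = Decimal(size_gb) * RATE_PER_GB_YEAR
    return cost.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


# Collection sizes as listed on the slide.
collections = {
    "CA DOE reports": "12.0",
    "Cal Cultures": "65.6",
    "eScholarship": "118.6",
}

for name, size in collections.items():
    print(f"{name}: {size} GB -> ${annual_cost(size)}/year")
```

With the sizes from the slide, this yields the same three annual costs that the table lists.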

Page 19: An Introduction to the Merritt Curation Repository

Average ETD size and cost

Campus            ETD titles   Size       Annual cost
Berkeley                 797   12.4 GB    $12.88
Davis                    837   13.0 GB    $13.52
Irvine                   390    6.1 GB    $6.30
Los Angeles              720   11.2 GB    $11.63
Riverside                192    2.9 GB    $3.10
San Diego                558    8.7 GB    $9.02
San Francisco *          560    8.7 GB    $9.05
Santa Barbara            325    5.0 GB    $5.25
Santa Cruz               155    2.4 GB    $2.50

Based on 2009 holdings in ProQuest

* UCSF based on total ETD holdings in Merritt

Page 20: An Introduction to the Merritt Curation Repository

Average research data size and cost

• Almost 50% of all research data is less than 1 GB

Source: Science 331:6018 (February 11, 2011): 692-693 <DOI: 10.1126/science.331.6018.692>

Size             Percentage   Annual cost
< 1 GB           48.3 %       < $1.04
1 – 100 GB       32.0 %       $1.04 – $104.00
100 GB – 1 TB    12.1 %       $104.00 – $1,040.00
> 1 TB            7.6 %       > $1,040.00

Page 21: An Introduction to the Merritt Curation Repository

Next steps

• UC3 is working with campus partners to determine ongoing development and collection priorities

Replication

IdM/Authn/Authz

Ingest, Access Inventory, Queuing

Storage and Identity

Technology watch

Metadata standards

Policy and business model

Data management guidelines

Object and collection modeling

New content acquisition

Page 22: An Introduction to the Merritt Curation Repository

Next steps

In production

• Model-free objects

• Submission via UI and API

• Persistent identifiers

• Format identification

• Version provenance

• Automated replication

• Automated fixity audit

• Role-based access control

• Collections

• Semantic index and search

• Object/version/file download

In progress

• Simplified update

• Enhanced characterization (JHOVE2)

• Faceted search and browse (XTF)

• CMS/DAMS-like function (Islandora)

In planning

• Simplified batch

• UCTrust integration

• Linked data

• Transformation

• Notification

• Annotation

• Support for NGTS/DLSTF recommendations

We welcome your feedback on needs and priorities!

http://www.cdlib.org/uc3/[email protected]

Page 23: An Introduction to the Merritt Curation Repository

Simplified update

• Variant form of object update requiring the submission of only the changed components

• Client-side tools to simplify the creation of batch manifests

#%checkm_0.7
#%profile | http://uc3.cdlib.org/registry/ingest/mani
#%prefix | mrt: | http://merritt.cdlib.org/terms#
#%prefix | nfo: | http://www.semanticdesktop.org/onto
#%fields | nfo:fileUrl | nfo:hashAlgorithm | nfo:hash
http://merritt.cdlib.org/samples/goldenDragon.jpg | m
http://merritt.cdlib.org/samples/tumbleBug.jpg | md5
http://merritt.cdlib.org/samples/generalDrapery.jpg |
http://merritt.cdlib.org/samples/generalDrapery.jpg |
#%eof
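To illustrate what a client-side manifest tool might do, here is a minimal sketch that emits a manifest in the shape shown on this slide. It is not a Merritt tool: the helper names `md5_hex` and `build_manifest` are hypothetical, only the three fields visible on the slide are written, and the `#%profile` and `#%prefix` lines are left to the caller through `header_lines` because their values are cut off on the slide. The example.org URL is a placeholder.

```python
import hashlib


def md5_hex(data: bytes) -> str:
    """Return the MD5 digest of data as a lowercase hex string."""
    return hashlib.md5(data).hexdigest()


def build_manifest(entries, header_lines=()):
    """Build manifest text from (file_url, content_bytes) pairs.

    header_lines is where a caller would pass the '#%profile' and
    '#%prefix' lines; they are not hard-coded here because the slide
    shows them only in truncated form.
    """
    lines = ["#%checkm_0.7"]
    lines.extend(header_lines)
    lines.append("#%fields | nfo:fileUrl | nfo:hashAlgorithm | nfo:hash")
    for url, content in entries:
        # One row per file: URL, digest algorithm, digest value.
        lines.append(f"{url} | md5 | {md5_hex(content)}")
    lines.append("#%eof")
    return "\n".join(lines) + "\n"


# Usage with a placeholder URL and in-memory bytes standing in for a file.
print(build_manifest([("http://example.org/a.jpg", b"hello")]), end="")
```

Computing the digest on the client side lets the repository confirm that the file it retrieves matches what the submitter intended to deposit.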

Page 24: An Introduction to the Merritt Curation Repository

Enhanced characterization

• JHOVE2 next-generation framework for format-aware characterization http://jhove2.org/

– Automated extraction and inference of extensive technical metadata significant for preservation analysis and planning

"Module": {
  "scope": "ICCModule",
  "Header": {
    "scope": "ICCHeader",
    "ProfileSize": { "unit": "byte", "value": 60960 },
    "ProfileVersionNumber": "4.2.0.0",
    "ProfileDeviceClass_raw": "spac",
    "ProfileDeviceClass_descriptive": "ColorSpace Conversion profile",
    "ColourSpace_raw": "RGB ",
    "ColourSpace_descriptive": "rgbData",
    "ProfileConnectionSpace_raw": "Lab ",
    "ProfileConnectionSpace_descriptive": "labData"

Page 25: An Introduction to the Merritt Curation Repository

Enhanced discovery via XTF

• eXtensible Text Framework http://xtf.cdlib.org/

– CDL developed/supported open source discovery platform

– Robust, scalable faceted search and browse

Page 26: An Introduction to the Merritt Curation Repository

CMS/DAMS-like function

• Many campuses are looking for CMS/DAMS solutions

• Investigating integration with Islandora to provide a Drupal CMS/DAMS front-end to Merritt

http://islandora.ca/ http://drupal.org/

Page 27: An Introduction to the Merritt Curation Repository

Questions?

Page 28: An Introduction to the Merritt Curation Repository

Upcoming webinars

Date/time                       Topic
Wednesday, June 15, 12:30 pm    Data Sharing by Scientists: Practices and Perceptions
                                Carol Tenopir, Univ. Tennessee; Mike Frame, USGS
Thursday, June 30, 2:00 pm      The Data Management Planning Tool (DMP Tool)
                                Trisha Cruse, UC3
Thursday, July 14, 2:00 pm      Data as Publication
                                John Kunze, UC3; Catherine Mitchell, CDL Publishing Program
Thursday, July 28, 2:00 pm      Merritt: Depositing Content and Providing Access
Thursday, August 11, 2:00 pm    DCXL (Data Curation Excel)

http://www.cdlib.org/uc3/uc3webinars.html

Please take the webinar survey http://www.surveymonkey.com/s/XSGWP8R

Page 29: An Introduction to the Merritt Curation Repository

For more information

UC Curation Center

http://www.cdlib.org/uc3

http://www.cdlib.org/uc3/[email protected]

Stephen Abrams     Margaret Low
Lisa Colvin        David Loy
Patricia Cruse     Mark Reyes
Scott Fisher       Tracy Seneca
Erik Hetzner       Joan Starr
Greg Janée         Marisa Strong
John Kunze         Perry Willett

UC3 webinar series

http://www.cdlib.org/uc3/uc3webinars.html

Merritt repository

http://merritt.cdlib.org/

http://merritt.cdlib.org/help

http://merritt.cdlib.org/docs/merritt_handout.pdf

http://merritt.cdlib.org/docs/merritt_user_guide.pdf