HathiTrust and the Ecology of Shared Collections Paul N. Courant 21 May 2009

HathiTrust and the Ecology of Shared Collections

Paul N. Courant

21 May 2009

The Big Picture

www.hathitrust.org

Why Collaborate on Shared Digital?

• It used to make sense for libraries to compete on collections

• Now it only makes sense to compete in a very small area of collecting: the rare and unique (and sometimes, sadly, the expensive)

• For everything else, it makes economic sense to collaborate

Why not Google?

Because Google is not a library.

Persistence

• Persistence is essential for scholarship
• The libraries that care about persistence are relatively few. Most of them are in ARL.
• This makes it even more important that those of us who do care about persistence work to make it happen.

Two (and a half) models of participation

1) Contributing both collections and financial support

2) Using the collection and contributing financial support

2.5) Using the collection and contributing nothing. A.k.a. Free riders

What does it take for me to be able to show in my catalog a work that is persistently available and held elsewhere?

What is HathiTrust?

• origins
• intentions
• size and growth projections
• aspirations

current members

Preservation: OAIS Reference Model

[Diagram: HathiTrust components mapped onto the OAIS reference model. Labels: GRIN; Internal Data Loading; Google; [OCA]; In-house Conversion; MARC record extensions (Aleph); Rights DB; Page Turner; HathiTrust API; OAI; GeoIP DB; CNRI Handles; [Solr]; METS/PREMIS object; TIFF G4/JPEG2000; OCR; MD5 checksums; METS object; PNG; OCR; PDF; Isilon; Site Replication; TSM; MD5 checksum validation; GROOVE (JHOVE)]

Mission and Goals

• to contribute to the common good by collecting, organizing, preserving, communicating, and sharing the record of human knowledge
– materials converted from print
– improve access … to meet the needs of the co-owning institutions
– reliable and accessible electronic representations
– coordinate shared storage strategies
– “public good” … free-riders
– simultaneously … centralized … open

growth trajectory

accomplishments to date

1. 25 partners
2. successful ingest and millions of vols online
3. mirroring and backup
4. rich access

books and journals online?

Search inside in-copyright

accomplishments to date

1. 25 partners
2. successful ingest and millions of vols online
3. mirroring and backup
4. rich access
5. “collection builder”

Collection Builder

accomplishments to date

1. 25 partners
2. successful ingest and millions of vols online
3. mirroring and backup
4. rich access
5. collection builder
6. soon, full text search and data API

Project staff review comments and enrich cataloging records.

Title: Wasīlat al-ṭullāb li-ma‘rifat a‘māl al-layl wa-al-nahār bi-ṭarīq al-ḥisāb: manuscript [between 1525? and 1861]

وسيلة الطلاب لمعرفة أعمال الليل والنهار بطريق الحساب

Author: Ḥaṭṭāb, Yaḥyá ibn Muḥammad, 1496 or 7-1586 or 7.

يحيى بن محمد الحطاب.

Comment 1

Comment 2

Comment 3

[Diagram labels: Catalog records; Local OPAC; Page images; HathiTrust; Project Website; Comments; Enriched records]

next up …

• non-Google ingest (OCA & local digitization)
• corpus research support
– SEASR
– Data export
– Research center
• openness strategies
• binding together shared print and digital in strategy to manage local print

Universal Library?

• collaborative work around collaborative problem

• preserving the published record
• comprehensiveness through consolidation and sense-making
• commitment to perpetuity

opportunities

• economies of scale
• comprehensive collection
• combining print and digital strategies
• more effective digital preservation
• stepping stone to preserving other forms of digital content
• platform for new methods of discovery
• non-consumptive research

challenges

• digital preservation
• collaboration
• understanding what the right services are
• The Silence of the Archive: The USPS problem

thank you!

• http://www.hathitrust.org/
• [email protected]
• [email protected]
• [email protected]