Web Archiving Service (WAS) Rosalie Lack [email protected] Data Curation for Practitioners 2012 Workshop

Was uc3-nov2012wkshps-final


Page 1: Was uc3-nov2012wkshps-final

Web Archiving Service (WAS)

Rosalie Lack

[email protected]

Data Curation for Practitioners 2012 Workshop

Page 2: Was uc3-nov2012wkshps-final

Imagine a world …

Page 3: Was uc3-nov2012wkshps-final

This is our world …

Page 4: Was uc3-nov2012wkshps-final

WAS … is

A service of the UC Curation Center to collect, manage, preserve and publish websites and documents.

Page 5: Was uc3-nov2012wkshps-final

WAS Snapshot

53 public archives

120+ archives total

7,500+ sites

50+ TB

23 institutions

Page 6: Was uc3-nov2012wkshps-final

WAS Institutions

• Institute of Governmental Studies Library, UCB
• UC Berkeley Office of Public Affairs
• UC Davis Libraries
• UC Irvine Libraries
• UC Los Angeles Libraries
• UC Riverside Libraries
• UC San Diego Libraries
• UC San Francisco Libraries
• UC Santa Barbara
• UC Santa Cruz McHenry Library
• Emory University Library
• Institute for Research on Labor and Employment
• New York University
• Northwestern University Library
• Purdue University
• Stanford University Libraries
• Temple University
• University of Arkansas Libraries
• University of Illinois at Urbana-Champaign Libraries
• University of Michigan, Bentley Historical Library
• USDA Economic Research Service
• Water Resources Collections and Archives

Page 7: Was uc3-nov2012wkshps-final

WAS Overview

A) Curator Tools

Page 8: Was uc3-nov2012wkshps-final

Curator Workflow

Page 9: Was uc3-nov2012wkshps-final

1. Create Site

• Enter site name, URL and description
• Scope
• Capture frequency
• Robots.txt
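The settings on this slide can be pictured as a small record plus a robots.txt check. The sketch below is illustrative only: `SiteRecord`, its field values, `may_capture`, and the `example-crawler` user agent are hypothetical names, not the WAS data model or crawler. The only real API used is Python's standard `urllib.robotparser`.

```python
from dataclasses import dataclass
from urllib.robotparser import RobotFileParser


@dataclass
class SiteRecord:
    """Hypothetical record mirroring the fields on the slide (not the WAS schema)."""
    name: str
    url: str
    description: str
    scope: str               # assumed labels, e.g. "host" or "single page"
    capture_frequency: str   # assumed labels, e.g. "once", "weekly"
    honor_robots_txt: bool


def may_capture(site: SiteRecord, robots_txt: str, path: str,
                agent: str = "example-crawler") -> bool:
    """Return True if `path` on the site may be captured under the robots.txt setting."""
    if not site.honor_robots_txt:
        return True  # curator chose to ignore robots.txt for this site
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # parse rules from text, no network access
    return parser.can_fetch(agent, site.url.rstrip("/") + path)


# Example with made-up values
site = SiteRecord(
    name="Example Campus News",
    url="https://example.org",
    description="University news and events",
    scope="host",
    capture_frequency="weekly",
    honor_robots_txt=True,
)
robots = "User-agent: *\nDisallow: /private/\n"

print(may_capture(site, robots, "/news/"))               # True
print(may_capture(site, robots, "/private/memo.html"))   # False
```

With `honor_robots_txt` set to `False`, the same call returns `True` for every path, which is the trade-off a curator makes when choosing the Robots.txt setting.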

Page 10: Was uc3-nov2012wkshps-final

2. Capture Sites

Page 11: Was uc3-nov2012wkshps-final

3. View Captures

• View captures
• QA
• Compare

Page 12: Was uc3-nov2012wkshps-final

4. Public Access

• Customize the archive
• Write description
• Create custom banner and icon

Page 13: Was uc3-nov2012wkshps-final

WAS Overview

B) Public Archives

Page 14: Was uc3-nov2012wkshps-final

Web Archive ‘home page’

Page 15: Was uc3-nov2012wkshps-final

Browse: Site List + Tags

Page 16: Was uc3-nov2012wkshps-final

Search: All Sites in an Archive

Page 17: Was uc3-nov2012wkshps-final

Integration with your Systems

Page 18: Was uc3-nov2012wkshps-final

How are people using WAS?

Page 19: Was uc3-nov2012wkshps-final

Institution’s website

• Preserve institutional history

• Capture university news and events

Page 20: Was uc3-nov2012wkshps-final

Geographically focused

Page 21: Was uc3-nov2012wkshps-final

Topical

Support special research collections

Page 22: Was uc3-nov2012wkshps-final

Event

• Sudden action required
• May need many selectors
• Start date / end date

Page 23: Was uc3-nov2012wkshps-final

Researcher’s Perspective

• Building collections for research
  – Study the topic / event
  – Study site change or web-based communication
  – Websites are datasets for analysis and data mining
• Preservation of research
  – Archive grant-funded websites
  – Selected sites
• Create stable citations for publications

Page 24: Was uc3-nov2012wkshps-final

Get started!

• Each library has WAS administrator(s)

• Unlimited number of curators per account

• What’s the cost?
  – UC does not pay a service fee
  – Storage only: $1,040 per TB (average site is $1.46 annually); storage costs are expected to go down
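The two figures on this slide imply an average site size. The sketch below works that out, assuming the per-TB rate covers the same one-year period as the per-site figure and that 1 TB = 1000 GB; neither assumption is stated on the slide, and the function names are illustrative.

```python
STORAGE_COST_PER_TB = 1040.00  # USD, from the slide
AVERAGE_SITE_COST = 1.46       # USD per year, from the slide
GB_PER_TB = 1000               # assumption: decimal terabytes


def implied_site_size_gb(site_cost: float, cost_per_tb: float) -> float:
    """Size in GB that would cost `site_cost` at `cost_per_tb`."""
    return site_cost / cost_per_tb * GB_PER_TB


def storage_cost(size_gb: float, cost_per_tb: float) -> float:
    """Cost in USD of storing `size_gb` at `cost_per_tb`."""
    return size_gb / GB_PER_TB * cost_per_tb


avg_size = implied_site_size_gb(AVERAGE_SITE_COST, STORAGE_COST_PER_TB)
print(f"Implied average site size: {avg_size:.2f} GB")  # 1.40 GB
print(f"Cost of a 25 GB site: ${storage_cost(25, STORAGE_COST_PER_TB):.2f}")  # $26.00
```

Under these assumptions the average archived site occupies roughly 1.4 GB, which gives a quick way to estimate storage cost for a larger collection.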

Page 25: Was uc3-nov2012wkshps-final

Challenges

• Shared collection development
• Metadata issues
• Workflow and cost models for faculty projects
• Time!
• Limitations of web crawlers
• Websites are messy

Page 26: Was uc3-nov2012wkshps-final

Contact me!

Rosalie Lack

WAS Service [email protected]

Page 27: Was uc3-nov2012wkshps-final

Imagine a world …

“Imagine a world in which libraries and archives had never existed. No institutions had ever systematically collected or preserved our collective cultural past: every book, letter, or document was created, read and then immediately thrown away. What would we know about our past?”

Page 28: Was uc3-nov2012wkshps-final

This is our world …

“Yet, that is precisely what is happening with the web: more and more of our daily lives occur within the digital world, yet more than two decades after the birth of the modern web, the “libraries” and “archives” of this world are still just being formed.”

A Vision Of The Role And Future Of Web Archives

Kalev H. Leetaru, Graduate School of Library and Information Science, University of Illinois. Presented as the keynote address at the 2012 IIPC General Assembly in Washington, DC.

http://netpreserve.org/sites/default/files/resources/VisionRoles.pdf
