Office of Strategic Initiatives. Challenges in Web Archiving: Library of Congress Edition. Abbie Grotke, Web Archiving Team. NDIIPP Partner Meeting, July 21, 2010

Page 1: Office of Strategic Initiatives

Office of Strategic Initiatives

All Hands Meeting-March 2010

Challenges in Web Archiving:

Library of Congress Edition

Abbie Grotke, Web Archiving Team

NDIIPP Partner Meeting, July 21, 2010

Page 2: Office of Strategic Initiatives

Library of Congress Web Archiving Program

• 10 years of archiving

• 5 full-time OSI staff on our team, plus 2 contractors, and other IT and Web Services support

• 80+ staff selecting content for our collections: Library Services, Law Library, and Congressional Research Service

• 30+ event and thematic collections

• 12,500+ URLs processed and permissions sent

• 181 TB of content collected

Page 3: Office of Strategic Initiatives

What We Do Pretty Well At This Point

• Web archiving workflows and processes have evolved and become more institutionalized

• Improved crawling strategies so we can react more quickly, manage our archive data better, and better serve our customers at LC

• Large-scale contract crawling by Internet Archive

• A move from collection-by-collection crawling to monthly and weekly “crawl buckets”

• Small-scale in-house crawling now available (tests, emergency crawls)

Page 4: Office of Strategic Initiatives

What We Do Pretty Well At This Point

• Better tools now to more easily manage our team’s work and all data about various activities: nomination, permissions, crawling, quality review, reporting, etc.

• Automation of manual activities to reduce time spent processing URLs for our nominators and our team

Page 5: Office of Strategic Initiatives

Ongoing Challenges

• Selection

• What to select - so many URLs, so little time

• No full-time selection staff, everyone is busy

• Quality Review

• Training to involve Nominators more in the process – “Did we get what you wanted us to get?”

• Team Resources:

• 14 web archive projects actively crawling

• Testing our bandwidth

Page 6: Office of Strategic Initiatives

Ongoing Challenges

• Legal

• Permissions: still only about 50% response rate

• Access for Researchers

• Harvesting:

• Collection of specific types of content: rapidly changing news content, YouTube

• Training Nominators re: frequency of collection

• Ramping up in-house crawling (Can we? Should we?)

• The Data:

• How do we transfer this content easily, both from IA and within LC?

• How do we manage it, store it, and preserve it?

Page 7: Office of Strategic Initiatives

More Information

• Web Archiving Team Public Page (about the activity):

http://www.loc.gov/webarchiving/

• Library of Congress Web Archives (our collections):

http://lcweb2.loc.gov/diglib/lcwa/html/lcwa-home.html

• Digital Preservation Video on Web Archiving:

http://www.digitalpreservation.gov/videos/webarch09/index.html

• Contact: Abbie Grotke, [email protected]