Web Archiving: An Overview Karl-Rainer Blumenthal, Internet Archive Sumitra Duncan, Frick Art Reference Library Metropolitan New York Library Council January 7, 2015

Page 1: Web Archiving: An Overview

Web Archiving: An Overview

Karl-Rainer Blumenthal, Internet Archive

Sumitra Duncan, Frick Art Reference Library

Metropolitan New York Library Council

January 7, 2015

Page 2: Web Archiving: An Overview

What is web archiving?

Web archiving is the process of collecting, preserving, and enabling access to web-native materials.

Page 3: Web Archiving: An Overview

Why archive the web?

> Collect web-native resources in your traditional collecting scope.

> Fulfill a records retention requirement.

> Document spontaneous/online events.

> Combat link rot and content drift (no more 404s!).

Page 4: Web Archiving: An Overview

How does it work?

> Web crawlers navigate live websites and download their source code to Web ARChive (WARC) files.
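
As a rough illustration of what a crawler writes to disk, the sketch below serializes one captured HTTP response as a WARC `response` record using only the Python standard library. The function name `build_warc_record`, the sample URL, and the sample payload are hypothetical. Production crawlers typically also write `warcinfo` and `request` records, add payload digests, and gzip each record, all of which this sketch leaves out.

```python
import uuid
from datetime import datetime, timezone


def build_warc_record(target_uri: str, http_response: bytes, captured_at: datetime) -> bytes:
    """Serialize one HTTP response as a minimal WARC/1.0 'response' record."""
    warc_date = captured_at.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    headers = [
        "WARC/1.0",
        "WARC-Type: response",
        f"WARC-Record-ID: <urn:uuid:{uuid.uuid4()}>",
        f"WARC-Date: {warc_date}",
        f"WARC-Target-URI: {target_uri}",
        "Content-Type: application/http; msgtype=response",
        # Content-Length counts the bytes of the content block only.
        f"Content-Length: {len(http_response)}",
    ]
    # A blank line separates the WARC headers from the content block,
    # and two CRLF pairs terminate the record.
    head = "\r\n".join(headers).encode("utf-8") + b"\r\n\r\n"
    return head + http_response + b"\r\n\r\n"


# Hypothetical captured response (HTTP headers plus HTML body).
sample_response = (
    b"HTTP/1.1 200 OK\r\n"
    b"Content-Type: text/html\r\n"
    b"\r\n"
    b"<html><body>Hello, archive!</body></html>"
)
record = build_warc_record(
    "http://example.org/",
    sample_response,
    datetime(2015, 1, 7, 12, 0, 0, tzinfo=timezone.utc),
)
print(record.decode("utf-8"))
```

Appending records like this one to a single file is, in essence, how a crawl accumulates into a WARC file.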

Page 5: Web Archiving: An Overview

How does it work?

> Replay technologies render the archived websites as they appeared at the time they were crawled.
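
One piece of replay can be sketched in a few lines: given the capture times recorded for a URL, pick the one nearest to the moment the user asks for. The example below is a simplified illustration, not how any particular replay tool is implemented. The `closest_capture` function and the in-memory `index` are hypothetical, and the 14-digit timestamps follow the `YYYYMMDDhhmmss` style seen in Wayback Machine URLs. Real replay software also rewrites links and scripts so that the page loads archived resources rather than live ones.

```python
from datetime import datetime

TIMESTAMP_FORMAT = "%Y%m%d%H%M%S"  # 14-digit timestamps, e.g. 20150107120000


def closest_capture(captures: list[str], requested: str) -> str:
    """Return the capture timestamp nearest to the requested timestamp."""
    if not captures:
        raise ValueError("no captures recorded for this URL")
    target = datetime.strptime(requested, TIMESTAMP_FORMAT)
    return min(
        captures,
        key=lambda ts: abs(datetime.strptime(ts, TIMESTAMP_FORMAT) - target),
    )


# Hypothetical index: capture timestamps for one URL.
index = {
    "http://example.org/": ["20130301000000", "20140615093000", "20150107120000"],
}

url = "http://example.org/"
best = closest_capture(index[url], "20140801000000")
print(f"replay {url} as captured at {best}")
```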

Page 6: Web Archiving: An Overview

Web archiving tools and services

The Wayback Machine

https://archive.org/web/

The largest publicly available web archive in existence.

> 450+ billion URLs

> 100+ million websites

> 40+ languages

> ~1 billion URLs added per week
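
To check whether a page is already in the Wayback Machine, one option is the Internet Archive's availability endpoint. The sketch below builds the query and parses the response. The helper names are hypothetical, the JSON shape (`archived_snapshots` → `closest`) is assumed from the Internet Archive's API documentation, and `sample_body` is a made-up response used so the example runs offline.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

AVAILABILITY_ENDPOINT = "https://archive.org/wayback/available"


def availability_query(url: str, timestamp: str = "") -> str:
    """Build the availability query URL for a page and optional timestamp."""
    params = {"url": url}
    if timestamp:
        params["timestamp"] = timestamp
    return f"{AVAILABILITY_ENDPOINT}?{urlencode(params)}"


def closest_snapshot(response_body: str):
    """Return the closest snapshot URL from a response body, or None."""
    data = json.loads(response_body)
    closest = data.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest["url"]
    return None


def lookup(url: str, timestamp: str = ""):
    """Perform the live request (requires network access)."""
    with urlopen(availability_query(url, timestamp), timeout=30) as resp:
        return closest_snapshot(resp.read().decode("utf-8"))


# Offline demonstration with a made-up response body in the assumed shape.
sample_body = json.dumps({
    "archived_snapshots": {
        "closest": {
            "available": True,
            "url": "http://web.archive.org/web/20150107000000/http://example.org/",
            "timestamp": "20150107000000",
            "status": "200",
        }
    }
})
print(availability_query("example.org", "20150107"))
print(closest_snapshot(sample_body))
```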

Page 9: Web Archiving: An Overview

Web archiving tools and services

Heritrix, HTTrack, Umbra, warcprox, Wget

ARC, WARC

Wayback Machine, OpenWayback, pywb (Python Wayback), Webenact, oldweb.today

Archive-It, NetarchiveSuite (DK/FR), PANDAS (AUS), Web Curator (UK/NZ), Webrecorder

Page 10: Web Archiving: An Overview

Who archives the web?

Society of American Archivists Web Archiving Roundtable

> 900+ member participants

Archive-It

> 400+ partner organizations (software service subscribers)

National Digital Stewardship Alliance (NDSA)

> Surveyed web archivists in 2011, 2013, 2015...

Page 11: Web Archiving: An Overview

Who archives the web?

Organizations with web archiving programs by type

Source: NDSA, Web Archiving in the United States: A 2013 Survey

[Chart; category labels not preserved. Values: 52%, 15%, 13%, 8%, 5%, 4%, 1%, 2%]

Page 12: Web Archiving: An Overview

Who archives the web?

Use of external service vs. in-house archiving

Source: NDSA, Web Archiving in the United States: A 2013 Survey

[Chart; category labels not preserved. Values: 63%, 16%, 20%]

Page 13: Web Archiving: An Overview

Who archives the web?

Staff dedicated to web archiving program

Source: NDSA, Web Archiving in the United States: A 2013 Survey

[Chart; category labels not preserved. Values: 36%, 19%, 25%, 6%, 7%, 7%]

Page 14: Web Archiving: An Overview

Who archives the web?

Participation in a collaborative web archive

Source: NDSA, Web Archiving in the United States: A 2013 Survey

[Chart; category labels not preserved. Values: 48%, 33%, 17%, 2%]

Page 15: Web Archiving: An Overview

Web archiving issues and trends

> Access and discovery

> Big data analysis

> Appraisal, provenance, and metadata

> Spontaneous events and social media

> Permissions and privacy policies

Page 20: Web Archiving: An Overview

NYARC (New York Art Resources Consortium)

Page 21: Web Archiving: An Overview

Why web archiving at NYARC?

> Drift from print to born-digital

> Alignment with traditional collecting strengths & unique holdings

> Ephemeral nature of websites & risk of impermanence

> Not addressed elsewhere = risk of gap in art historical record

> Leverage consortial collaboration = better able to be nimble

Willem de Ridder. European Mail-Order Warehouse/Fluxshop inventory with Dorothea Meijer, seated, in the home of the artist, Amsterdam. 1964-65. Gelatin silver print. The Museum of Modern Art, New York.

Page 22: Web Archiving: An Overview

How NYARC got started

> 2010 Auction House Pilot Study with Archive-It

> 2012 Planning Study

> 2013-2015 Mellon Grant for Web Archive Implementation

Page 23: Web Archiving: An Overview

Web archiving life cycle at NYARC

Page 24: Web Archiving: An Overview

Collection development / curation

Page 25: Web Archiving: An Overview

Collection scope

> Art Resources

> Artists’ Websites

> Auction Catalogs

> Catalogues Raisonnés

> Institutional Web Presence

> NYC Galleries

> Restitution of Lost or Looted Art

Page 34: Web Archiving: An Overview

Curation & Quality assurance

Page 35: Web Archiving: An Overview

Challenges & Lessons learned

> Scale

> Rapidly evolving and new technologies

> Cost

> Infrastructure/tools

> Permissions/intellectual property considerations

Page 36: Web Archiving: An Overview

Goals & Lessons learned

> Rich and substantial collections

> Permanence and long-term preservation

> Scalability and sustainability

> Networked collections

> Greater collaboration = crucial to work together

Page 38: Web Archiving: An Overview

Where can/should I get started?

NDSA Web Archiving in the United States Surveys

http://1.usa.gov/1z1H3jo

SAA Web Archiving Roundtable

www2.archivists.org/groups/web-archiving-roundtable

METRO Web Archiving Special Interest Group

libguides.metro.org/webarchiving

International Internet Preservation Consortium

netpreserve.org

Jill Lepore, “The Cobweb: Can the Internet be Archived?” The New Yorker, 1/26/2015

http://www.newyorker.com/magazine/2015/01/26/cobweb

Page 39: Web Archiving: An Overview

Thanks!

...and keep in touch!

Karl-Rainer Blumenthal

Web Archivist, Internet Archive

[email protected]

@LandLibrarian

Sumitra Duncan

NYARC Web Archiving Coordinator, Frick Art Reference Library

[email protected]

@artlibrariannyc

Image credits:

Condé Nast

International Internet Preservation Consortium

Susan Kare, Museum of Modern Art

National Digital Stewardship Alliance

Archive-It

Society of American Archivists

Brian Ejar

Simple Icons

Creative Stall

Iconathon

Museum of Modern Art

The Frick Collection

Brooklyn Museum

New York Art Resources Consortium