Web Archiving: An Overview
Karl-Rainer Blumenthal, Internet Archive
Sumitra Duncan, Frick Art Reference Library
Metropolitan New York Library Council
January 7, 2015
What is web archiving?
Web archiving is the process of collecting, preserving, and enabling access to web-native materials.
Why archive the web?
> Collect web-native resources in your traditional collecting scope.
> Fulfill a records retention requirement.
> Document spontaneous/online events.
> Combat link rot and content drift (no more 404s!).
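Link rot and content drift, named in the last bullet above, are two different failures: the link stops resolving, or it resolves to something that has changed. A minimal sketch of how the two might be told apart, assuming Python's standard library only. The function names `is_link_rot` and `has_drifted` are hypothetical, and the rules used here (any 4xx/5xx status counts as rot, any byte-level change counts as drift) are simplifying assumptions rather than an established definition.

```python
import hashlib


def is_link_rot(status_code: int) -> bool:
    """Treat any HTTP client or server error (4xx/5xx) as a rotted link.

    Assumption: a fuller checker would also handle timeouts, DNS failures,
    and "soft 404" pages that answer 200 with an error message.
    """
    return 400 <= status_code < 600


def has_drifted(archived_body: bytes, live_body: bytes) -> bool:
    """Report content drift when the live page no longer matches the capture.

    Assumption: byte-for-byte comparison via SHA-256. Live pages often change
    trivially (ads, timestamps), so a practical check might compare
    extracted text instead.
    """
    archived_digest = hashlib.sha256(archived_body).hexdigest()
    live_digest = hashlib.sha256(live_body).hexdigest()
    return archived_digest != live_digest


if __name__ == "__main__":
    print(is_link_rot(404))
    print(has_drifted(b"<p>2014 exhibitions</p>", b"<p>2015 exhibitions</p>"))
```

An archived capture addresses both cases: it remains available when the live link returns an error, and it preserves the earlier version when the live page has changed.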
How does it work?
> Web crawlers navigate live websites and download their source code to Web ARChive (WARC) files.
> Replay technologies render the archived websites as they appeared at the time they were crawled.
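To make the capture step concrete, here is a minimal sketch of what a crawler writes for each download: one WARC "response" record wrapping the raw HTTP response. It assumes Python's standard library only and the WARC/1.0 record layout; the function name `build_warc_response_record` is hypothetical. Production crawlers typically write additional record types and fields, and often compress each record.

```python
import uuid
from datetime import datetime, timezone


def build_warc_response_record(target_uri: str, http_response: bytes,
                               captured_at: datetime) -> bytes:
    """Serialize one captured HTTP response as a WARC/1.0 'response' record.

    Assumption: captured_at is timezone-aware; it is converted to UTC
    because WARC-Date is written in UTC.
    """
    warc_date = captured_at.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    header_lines = [
        "WARC/1.0",
        "WARC-Type: response",
        f"WARC-Target-URI: {target_uri}",
        f"WARC-Date: {warc_date}",
        f"WARC-Record-ID: <urn:uuid:{uuid.uuid4()}>",
        "Content-Type: application/http; msgtype=response",
        f"Content-Length: {len(http_response)}",
    ]
    # A blank line separates the WARC headers from the content block.
    header_block = ("\r\n".join(header_lines) + "\r\n\r\n").encode("utf-8")
    # Each record is terminated by two CRLF pairs after the content block.
    return header_block + http_response + b"\r\n\r\n"


if __name__ == "__main__":
    # Illustrative response bytes, not a real capture.
    response = b"HTTP/1.1 200 OK\r\nContent-Type: text/html\r\n\r\n<html>hello</html>"
    record = build_warc_response_record(
        "http://example.org/",
        response,
        datetime(2015, 1, 7, 12, 0, 0, tzinfo=timezone.utc),
    )
    print(record.decode("utf-8"))
```

Because the record keeps the original HTTP headers and body together with the capture date and target URI, replay software has what it needs to serve the page back as it was received.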
Web archiving tools and services
The Wayback Machine
https://archive.org/web/
The largest publicly available web archive in existence.
> 450+ billion URLs
> 100+ million websites
> 40+ languages
> ~1 billion URLs added per week
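Captures in the Wayback Machine are addressed by a timestamp plus the original URL. A minimal sketch of building such a replay link, assuming the commonly seen `https://web.archive.org/web/<14-digit timestamp>/<original URL>` pattern; that prefix does not appear in these slides, and the function name `wayback_replay_url` is hypothetical.

```python
from datetime import datetime

# Assumption: the public replay prefix, not stated in the slides.
WAYBACK_PREFIX = "https://web.archive.org/web"


def wayback_replay_url(original_url: str, when: datetime) -> str:
    """Build a replay link for original_url at (or near) the given moment."""
    # Timestamps are written as YYYYMMDDhhmmss.
    timestamp = when.strftime("%Y%m%d%H%M%S")
    return f"{WAYBACK_PREFIX}/{timestamp}/{original_url}"


if __name__ == "__main__":
    print(wayback_replay_url("http://example.org/", datetime(2015, 1, 7)))
    # prints https://web.archive.org/web/20150107000000/http://example.org/
```

The timestamp need not match a capture exactly; the service is generally understood to serve the capture closest to the requested time.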
Crawlers and capture tools:
> Heritrix
> HTTrack
> Umbra
> warcprox
> Wget
File formats:
> ARC
> WARC
Replay tools:
> Wayback Machine
> OpenWayback
> pywb (Python Wayback)
> Webenact
> oldweb.today
Curation services and platforms:
> Archive-It
> NetarchiveSuite (DK/FR)
> PANDAS (AUS)
> Web Curator (UK/NZ)
> Webrecorder
Who archives the web?
Society of American Archivists Web Archiving Roundtable
> 900+ member participants
Archive-It
> 400+ partner organizations (software service subscribers)
National Digital Stewardship Alliance (NDSA)
> Surveyed web archivists in 2011, 2013, 2015...
Organizations with web archiving programs by type
NDSA, Web Archiving in the United States: A 2013 Survey
[Chart values: 52%, 15%, 13%, 8%, 5%, 4%, 1%, 2%]
Use of external service vs. in-house archiving
NDSA, Web Archiving in the United States: A 2013 Survey
[Chart values: 63%, 16%, 20%]
Staff dedicated to web archiving program
NDSA, Web Archiving in the United States: A 2013 Survey
[Chart values: 36%, 19%, 25%, 6%, 7%, 7%]
Participation in a collaborative web archive
NDSA, Web Archiving in the United States: A 2013 Survey
[Chart values: 48%, 33%, 17%, 2%]
Web archiving issues and trends
> Access and discovery
> Big data analysis
> Appraisal, provenance, and metadata
> Spontaneous events and social media
> Permissions and privacy policies
NYARC
Why web archiving at NYARC?
> Drift from print to born-digital
> Alignment with traditional collecting strengths & unique holdings
> Ephemeral nature of websites & risk of impermanence
> Not addressed elsewhere = risk of gap in art historical record
> Leverage consortial collaboration = better able to be nimble
Willem de Ridder. European Mail-Order Warehouse/Fluxshop inventory with Dorothea Meijer, seated, in the home of the artist, Amsterdam. 1964-65. Gelatin silver print. The Museum of Modern Art, New York.
How NYARC got started
> 2010 Auction House Pilot Study with Archive-It
> 2012 Planning Study
> 2013-2015 Mellon Grant for Web Archive Implementation
Web archiving life cycle at NYARC
Collection development / curation
Collection scope
> Art Resources
> Artists’ Websites
> Auction Catalogs
> Catalogues Raisonnés
> Institutional Web Presence
> NYC Galleries
> Restitution of Lost or Looted Art
Curation & Quality assurance
Challenges & Lessons learned
> Scale
> Rapidly evolving and new technologies
> Cost
> Infrastructure/tools
> Permissions/intellectual property considerations
Goals & Lessons learned
> Rich and substantial collections
> Permanence and long-term preservation
> Scalability and sustainability
> Networked collections
> Greater collaboration = crucial to work together
Where can/should I get started?
NDSA Web Archiving in the United States Surveys
http://1.usa.gov/1z1H3jo
SAA Web Archiving Roundtable
www2.archivists.org/groups/web-archiving-roundtable
METRO Web Archiving Special Interest Group
libguides.metro.org/webarchiving
International Internet Preservation Consortium
netpreserve.org
Jill Lepore, “The Cobweb: Can the Internet be Archived?” The New Yorker, 1/26/2015
http://www.newyorker.com/magazine/2015/01/26/cobweb
Thanks!
...and keep in touch!
Karl-Rainer Blumenthal
Web Archivist, Internet Archive
[email protected]
@LandLibrarian
Sumitra Duncan
NYARC Web Archiving Coordinator, Frick Art Reference Library
[email protected]
@artlibrariannyc
Image credits:
Condé Nast
International Internet Preservation Consortium
Susan Kare, Museum of Modern Art
National Digital Stewardship Alliance
Archive-It
Society of American Archivists
Brian Ejar
Simple Icons
Creative Stall
Iconathon
Museum of Modern Art
The Frick Collection
Brooklyn Museum
New York Art Resources Consortium