Web Archiving at the National Library of Australia

Russell Latham, Senior Web Archivist, National Library of Australia



“The Web's ever-expanding size, the dynamic and ephemeral nature of its content, and how this is to be captured, stored and made accessible for the long-term are some of the key questions being addressed by electronic archiving programs.”

PADI, http://www.nla.gov.au/padi/topics/92.html

What is web archiving?

A web archive is not the same as the live web
  Brings a different value to web content
Creating artefacts from the web
  Preserved snapshots, slices, gobbets of time
Challenge of timeliness
  At certain times some things are more interesting and valuable
Focus on the future and long term access (preservation objective)

History: web archiving at the NLA

April Fools Day 1996: ‘Electronic Unit’ established
May 1998: public access to PANDORA titles
July 1998: first PANDORA ‘partner’ began participation
  10th participant joined in 2003
June 2001: PANDAS v.1 released
  Web archiving workflow system developed by NLA
2002: Digital Archiving Branch
  Our own identity at last!
  Began first trial of ‘mainstreaming’ web archiving in Serials and Govt Deposit sections

History: web archiving at the NLA

August 2002: PANDAS v.2 released
July 2003: joined IIPC
2004: PANDORA added to UNESCO Australian Memory of the World Register
July 2005: first .au domain harvest
  Subsequent harvests in 2006, 2007, 2008 & 2009
December 2006: “Web Archiving and Digital Preservation Branch”
July 2007: PANDAS v.3 released (at last!)
2010: PANDORA search moved to Trove
May 2010: Proposal for whole-of-govt ‘opt-out’ arrangements through SIGB

PANDORA Participants


What we collect

Selective approach
Collaboration with PANDORA participating agencies
Modest in size
High quality, timely, high value collection, described and searchable
Accessible to the public

Searching the collections

Subjects
Browse list
Collections
Agency based
Trove – Archived Websites
Trove – bibliographic
Search engines

Collections

National Events: Iraq war, 2003; Asia Tsunami, 2004; Bali Bombing, 2002
Political Events: Elections; CHOGM; National Apology
Topic Based: Extreme sports; Seven Network
Natural events: Floods; Cyclones; Bushfires

Subjects/Browsing

When looking for non-specific resources

Wish to browse a topic area

Agency based

Use the partners page

http://pandora.nla.gov.au/partners.html

Collections

Election campaigns

1996 Federal Election
1998 Federal Election
2001 Federal Election
2004 Federal Election
2007 Federal Election
2010 Federal Election

The Future


Australian web domain harvests

Annual domain harvests 2005-2009
Working with the Internet Archive
Covers .au top level domain and a bit more …
No public access
Quantity over quality; content not assessed or described; opportunistic rather than timely


Comparative statistics

PANDORA

Files: 115 million
Size: 5.03 TB

Domain Harvest    2005          2006          2007          2008          2009
Unique files      185 million   596 million   516 million   1 billion     765 million
Hosts crawled     811,523       1,046,038     1,247,614     3,038,658     1,074,645
Size (TB)         6.69          19.04         18.47         34.55         24.29

Domain Harvests

Files: 3 billion
Size: 103 TB
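
The aggregate Domain Harvests figures appear to be the sums of the five yearly harvests. The slides do not say so explicitly, so the short Python check below treats that as an assumption. It also assumes the rounded "1 billion" for 2008 means 1,000 million, so the file total is only approximate. The function and variable names are illustrative only.

```python
# Per-year figures copied from the Domain Harvest statistics (2005-2009).
# Assumption: "1 billion" (2008) is treated as 1000 million, a rounded value.
unique_files_millions = {2005: 185, 2006: 596, 2007: 516, 2008: 1000, 2009: 765}
size_tb = {2005: 6.69, 2006: 19.04, 2007: 18.47, 2008: 34.55, 2009: 24.29}


def harvest_totals(files_millions, sizes_tb):
    """Sum the yearly file counts (millions) and sizes (TB)."""
    total_files = sum(files_millions.values())
    total_size = round(sum(sizes_tb.values()), 2)
    return total_files, total_size


total_files_millions, total_size_tb = harvest_totals(unique_files_millions, size_tb)

print(f"Files: about {total_files_millions / 1000:.2f} billion")
print(f"Size: {total_size_tb} TB")
```

Under these assumptions the yearly figures add up to about 3.06 billion files and 103.04 TB, which is consistent with the rounded totals of 3 billion files and 103 TB.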

Current status
