Separating the wheat from the chaff: Identifying key elements in the NLA .au domain harvest

Preservation for Ongoing Accessibility: research group

Professor Ross Harvey

Dr Bob Pymm

Dr Anne Lloyd

Geoff Fellows

Jake Wallis




Pandora - http://pandora.nla.gov.au

• NLA solution to website preservation

• Archive of over 1.7 terabytes of data

• selective - identifies specific sites for harvest and gains permission to archive


Internet Archive - http://www.archive.org/

• Automated

• Harvests ‘the web’

• Issues?

– cost

– reliability of the crawl, e.g. the deep web


.au Harvest by Internet Archive

• first run in 2005, producing 6.9 terabytes of data and 185 million unique files

• Issues?

– difficulties with certain file types

– password-protected sites

– difficulty in accessing the ‘deep’ web


.au Harvest

• September 2006 – more sophisticated crawl

• 19 terabytes of data, 596 million files

• predominant dataset for POA group


Research potential?

• digital preservation

• Australian digital culture


3 broad questions

• What are the contents of the harvests?

• How can access be provided to this content?

• What is the value of the domain harvests in relation to the NLA’s overall web preservation interests?


Blogs

• low skill threshold technology

• as barometer of engagement

• social space

• catalyst for online community

• a new and important collecting point for digital cultural heritage


Archiving and preserving blogs

• how to identify Australian-specific material?

• what to capture

– selection criteria?

– linked material?

• frequency of capture to ensure accurate representation

• provision of access to harvested blog content


Aspirations

• a conceptual framework for studies in digital anthropology

• a broadening of voices within the Australian public sphere


Questions/comments?