Hadar Miller, National Library of Israel: “ArchioNet” Israel Internet Domain Archive. PPTX file of the presentation at the EVA/Minerva Jerusalem International Conference on Digitisation of Culture, The Jerusalem Van Leer Institute, Jerusalem, 12-13 November 2013 (http://www.digital-heritage.org.il). Presentations available at: http://2013.minervaisrael.org.il
“ArchioNet” Israeli Internet Domain Archive
Agenda
• NLI Digital Library Infrastructure
• “ArchioNet” Project Scope
• Technical Issues
• The Project in Numbers
• Legislation
• What’s Next
NLI Digital Library Infrastructure
“ArchioNet” Project Scope
• Why do we need this project?
• What do we harvest?
  • Phase A: websites under the .IL domain
  • Phase B: websites in Hebrew characters
• How do we provide access?
  • Phase A: Wayback Machine at NLI only, ArchioNet content only
  • Phase B: over the Web, with cross-reference discovery
• When did we start?
  • Phase A: two full crawls annually, starting September 2013
  • Phase B: four additional subject-based crawls annually
• Where is the harvest executed?
  • Phase A: at NLI, together with the Internet Archive
  • Phase B: on NLI infrastructure
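The Phase A and Phase B harvest scopes can be sketched as simple predicates. This is a minimal illustration, not NLI's actual scope logic: the `.il` hostname test follows Phase A directly, while treating Phase B as ".IL or mostly-Hebrew text" and the letter-ratio rule with its 0.3 `threshold` are hypothetical choices for deciding what counts as a "Hebrew characters site".

```python
import re
from urllib.parse import urlsplit

# Letters in the Unicode Hebrew block (U+0590 to U+05FF)
HEBREW_CHAR = re.compile(r"[\u0590-\u05FF]")


def in_phase_a_scope(url: str) -> bool:
    """Phase A: the host is registered under the .IL country-code domain."""
    host = urlsplit(url).hostname or ""  # urlsplit already lowercases hostname
    return host.endswith(".il")


def hebrew_ratio(text: str) -> float:
    """Fraction of alphabetic characters in `text` that are Hebrew letters."""
    letters = [ch for ch in text if ch.isalpha()]
    if not letters:
        return 0.0
    hebrew = sum(1 for ch in letters if HEBREW_CHAR.match(ch))
    return hebrew / len(letters)


def in_phase_b_scope(url: str, page_text: str, threshold: float = 0.3) -> bool:
    """Phase B (hypothetical rule): .IL sites, plus any site whose text is
    at least `threshold` Hebrew letters, wherever it is hosted."""
    return in_phase_a_scope(url) or hebrew_ratio(page_text) >= threshold
```

For example, `https://www.nli.org.il/` is in scope for Phase A, while a `.com` page reading "שלום world" (4 of 9 letters Hebrew) only enters under the hypothetical Phase B rule.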
Technical Issues
• Which crawler (and which version) to use?
• Cataloguing and search tools
• What to harvest?
  • Which seeds are needed
  • Crawl depth per site
  • robots.txt
  • The Deep Web
• How to store and preserve WARC files
• Virus detection
• System architecture
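The seed, depth and robots.txt questions interact in every crawl. The sketch below shows one way they combine, assuming the archive chooses to honour robots.txt (the slides leave this open): a breadth-first frontier that starts from the seeds, stops following links beyond `max_depth`, and skips URLs that Python's standard `urllib.robotparser` reports as disallowed. `get_links`, the in-memory `robots_txt` map and the `archionet-bot` user agent are hypothetical stand-ins for real HTTP fetching; a production crawler would handle all of this itself.

```python
from collections import deque
from typing import Callable, Dict, Iterable, List
from urllib.parse import urlsplit
from urllib.robotparser import RobotFileParser


def crawl_order(
    seeds: Iterable[str],
    get_links: Callable[[str], List[str]],
    robots_txt: Dict[str, str],
    max_depth: int = 2,
    user_agent: str = "archionet-bot",  # hypothetical user-agent string
) -> List[str]:
    """Return URLs in the order a depth-limited, robots.txt-aware BFS visits them.

    `get_links(url)` returns the outgoing links of a page and `robots_txt`
    maps host -> robots.txt body; both are hypothetical stand-ins.
    """
    parsers: Dict[str, RobotFileParser] = {}

    def allowed(url: str) -> bool:
        host = urlsplit(url).netloc
        if host not in parsers:
            rp = RobotFileParser()
            # A missing robots.txt is treated as empty, i.e. everything allowed
            rp.parse(robots_txt.get(host, "").splitlines())
            parsers[host] = rp
        return parsers[host].can_fetch(user_agent, url)

    seen = set()
    order: List[str] = []
    queue = deque((url, 0) for url in seeds)  # seeds sit at depth 0
    while queue:
        url, depth = queue.popleft()
        if url in seen or not allowed(url):
            continue
        seen.add(url)
        order.append(url)
        if depth < max_depth:  # only expand links while under the depth limit
            for link in get_links(url):
                queue.append((link, depth + 1))
    return order
```

With `max_depth=2`, a page three links away from the seed is never requested, and anything under a disallowed path such as `/private/` is skipped even when an allowed page links to it.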
The Project in Numbers
• ~220K websites
• 0.5 GB per site
• ~100 TB per harvest
• Average page lifetime: ~100 days
• 2 full harvests annually
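These figures are mutually consistent: 220K sites at 0.5 GB each come to about 110 TB, in line with the ~100 TB per harvest, or roughly 220 TB a year for two full harvests. The sketch below reproduces that arithmetic (decimal units, 1 TB = 1000 GB) and adds a deliberately simple coverage model that is our assumption, not from the slides: if every page lived exactly 100 days, was created at a uniformly random time, and the two full harvests were evenly spaced and instantaneous, a page would be caught by a full harvest with probability lifetime / interval = 100 / 182.5 ≈ 0.55.

```python
SITES = 220_000             # "~220K websites"
GB_PER_SITE = 0.5           # "0.5 GB per site"
FULL_HARVESTS_PER_YEAR = 2  # "2 full harvests annually"
AVG_PAGE_LIFETIME_DAYS = 100  # "Average page lifetime: ~100 days"


def harvest_size_tb(sites: int, gb_per_site: float) -> float:
    """Size of one full harvest in TB (decimal units: 1 TB = 1000 GB)."""
    return sites * gb_per_site / 1000


def capture_probability(lifetime_days: float, crawls_per_year: int) -> float:
    """Chance a page is captured by at least one crawl, under a simplified model:
    fixed lifetime, uniformly random creation time, evenly spaced instant crawls."""
    interval_days = 365 / crawls_per_year
    return min(1.0, lifetime_days / interval_days)


per_harvest = harvest_size_tb(SITES, GB_PER_SITE)
print(f"Per harvest: {per_harvest:.0f} TB")  # Per harvest: 110 TB
print(f"Per year: {per_harvest * FULL_HARVESTS_PER_YEAR:.0f} TB")  # Per year: 220 TB
p = capture_probability(AVG_PAGE_LIFETIME_DAYS, FULL_HARVESTS_PER_YEAR)
print(f"Capture probability: {p:.2f}")  # Capture probability: 0.55
```

Under this toy model, roughly half of short-lived pages would fall between two full harvests, which is one way to read the value of the additional subject-based crawls.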
Legislation
• Can NLI harvest?
• Where is it accessible?
• Intellectual property
• What can/should we block?
Thank You