“ArchioNet” Israeli Internet Domain Archive

F1 hadar miller__israeli_internet_archive-nli

DESCRIPTION

Hadar Miller, National Library of Israel: “ArchioNet” Israeli Internet Domain Archive. PPTX file of the presentation at the EVA/Minerva Jerusalem International Conference on Digitisation of Culture, Jerusalem, The Jerusalem Van Leer Institute, 12-13 November 2013. http://www.digital-heritage.org.il Presentations available at: http://2013.minervaisrael.org.il

Page 1

“ArchioNet” Israeli Internet Domain Archive

Page 2

Agenda

• NLI Digital Library Infrastructure

• “ArchioNet” Project Scope

• Technical Issues

• The Project in Numbers

• Legislation

• What’s Next

Page 3

NLI Digital Library Infrastructure

Page 4

“ArchioNet” Project Scope

• Why do we need this project?

• What do we harvest?

  • Phase A: .IL web sites

  • Phase B: Hebrew-character sites

• How do we enable accessibility?

  • Phase A: “Wayback Machine” in NLI only, “ArchioNet” only.

  • Phase B: over the Web, cross-reference discovery.

• When did we start?

  • Phase A: 2 full crawls annually, started September 2013

  • Phase B: 4 additional subject-based crawls annually

• Where do we execute the harvest?

  • Phase A: NLI with Internet Archive

  • Phase B: NLI infrastructure
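The two harvest scopes above can be illustrated with a small Python sketch. It is not the project's actual selection logic: the function names, the 0.5 threshold, and the use of the Unicode Hebrew block (U+0590 to U+05FF) as the test for "Hebrew characters" are all assumptions made for illustration.

```python
from urllib.parse import urlsplit

# Assumption: "Hebrew characters" means code points in the Unicode Hebrew block.
HEBREW_BLOCK = range(0x0590, 0x0600)


def in_phase_a(url: str) -> bool:
    """Phase A scope: the host sits under the .il top-level domain."""
    host = (urlsplit(url).hostname or "").lower().rstrip(".")
    return host == "il" or host.endswith(".il")


def hebrew_ratio(text: str) -> float:
    """Share of alphabetic characters that are Hebrew letters."""
    letters = [ch for ch in text if ch.isalpha()]
    if not letters:
        return 0.0
    hebrew = sum(1 for ch in letters if ord(ch) in HEBREW_BLOCK)
    return hebrew / len(letters)


def in_phase_b(page_text: str, threshold: float = 0.5) -> bool:
    """Phase B scope (hypothetical rule): the page text is mostly Hebrew."""
    return hebrew_ratio(page_text) >= threshold
```

A real harvest would also have to decide how to treat mixed-language pages and Hebrew sites hosted outside .il; the threshold here only marks where such a decision would go.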

Page 5

Technical Issues

• Which crawler (version) to use?

• Cataloguing and search tool

• What to harvest?

• Seeds are needed

• Depth of a site

• Robots.txt

• The Deep Web

• How to store and preserve a WARC file

• Virus detection

• System architecture
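Three of the issues listed above (seeds, depth of a site, and robots.txt) interact when a crawl is planned. The sketch below is one possible illustration, not the crawler used by the project: it walks a hypothetical in-memory link graph breadth-first from the seeds, stops at a hop limit, and skips URLs that a robots.txt policy disallows. `plan_crawl`, the `links` dictionary, the agent name `demo-bot`, and the example URLs are assumptions; `RobotFileParser` is from the Python standard library. A single parser is used for simplicity, so the sketch assumes all URLs belong to one host.

```python
from collections import deque
from urllib.robotparser import RobotFileParser


def plan_crawl(seeds, links, robots, agent, max_depth):
    """Breadth-first walk from the seeds, honouring robots.txt and a hop limit.

    links  -- dict mapping a URL to the URLs it links to (stand-in for fetching)
    robots -- a RobotFileParser already loaded with the host's rules
    """
    queue = deque((seed, 0) for seed in seeds)
    seen = set(seeds)
    allowed = []
    while queue:
        url, depth = queue.popleft()
        if not robots.can_fetch(agent, url):
            continue  # disallowed by robots.txt: neither stored nor expanded
        allowed.append(url)
        if depth == max_depth:
            continue  # depth limit reached: keep the page, ignore its links
        for target in links.get(url, []):
            if target not in seen:
                seen.add(target)
                queue.append((target, depth + 1))
    return allowed


# Hypothetical robots.txt rules and link graph for one site.
robots = RobotFileParser()
robots.parse(["User-agent: *", "Disallow: /private/"])

links = {
    "http://example.co.il/": ["http://example.co.il/a", "http://example.co.il/private/x"],
    "http://example.co.il/a": ["http://example.co.il/a/b"],
    "http://example.co.il/a/b": ["http://example.co.il/a/b/c"],
}

plan = plan_crawl(["http://example.co.il/"], links, robots, "demo-bot", max_depth=2)
```

With these inputs the plan contains the seed, `/a`, and `/a/b`; `/private/x` is excluded by the robots rule and `/a/b/c` lies beyond the hop limit. Whether a national archive should honour robots.txt at all is a policy question the slide raises, not one this sketch answers.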

Page 6

The Project in Numbers

• ~220K web sites

• 0.5 GB/site

• ~100 TB/harvest

• Avg. page lifetime ~100 days

• 2 full harvests annually
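The figures above are consistent with one another, as a quick back-of-the-envelope check shows. The sketch below uses only the numbers on the slide; the decimal conversion (1 TB = 1000 GB) and the variable names are assumptions.

```python
SITES = 220_000          # ~220K web sites
GB_PER_SITE = 0.5        # average size per site
HARVESTS_PER_YEAR = 2    # full harvests annually
GB_PER_TB = 1000         # assumption: decimal units


def harvest_size_tb(sites: int, gb_per_site: float) -> float:
    """Estimated size of one full harvest in terabytes."""
    return sites * gb_per_site / GB_PER_TB


per_harvest_tb = harvest_size_tb(SITES, GB_PER_SITE)
per_year_tb = per_harvest_tb * HARVESTS_PER_YEAR
print(f"{per_harvest_tb:.0f} TB per harvest, {per_year_tb:.0f} TB per year")
```

This gives about 110 TB per harvest, in line with the ~100 TB quoted, and roughly 220 TB per year for two full harvests, before any deduplication or compression.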

Page 7

Legislation

• Can NLI harvest?

• Where is it accessible?

• Intellectual property

• What can/should we block?

Page 8

Thank You

Page 9

Back
