
How to Build a Digital Library


DESCRIPTION

Large-Scale Digital Archives: Publisher and Library Case Studies. Speakers: Thijs Willems, Project Manager, Online Archives, Springer; Jasper Faase, Project Manager, Newspaper Digitization Project, National Library of the Netherlands. This session will present two large-scale digitization projects, the Springer Book Archives and the National Library of the Netherlands (aka the Dutch KB). The audience will learn the ‘nuts and bolts’ of these unique projects: key decisions, timelines, consequences for internal and external stakeholders, production matters and clearing hurdles such as rights and permissions. The impact these key initiatives may have on long-term preservation, the physical library, metadata and discoverability, author relations and the long tail of usage are topics for discussion with the audience.


Page 1: How to Build a Digital Library

How to build a digital library?

A case of newspaper digitization at the National Library of the Netherlands

3rd of November 2011

Jasper Faase
Project Manager Digitization
Email: [email protected]

Page 2: How to Build a Digital Library

Mission

• The Koninklijke Bibliotheek is the national library of the Netherlands: we bring people and information together.

• Our core values are: accessibility, sustainability, innovation and cooperation.

Koninklijke Bibliotheek – Nationale bibliotheek van Nederland

Page 3: How to Build a Digital Library

Vision

We:

• Offer everyone everywhere access to everything published in the Netherlands

• Play a central role in the (scientific) information infrastructure of the Netherlands

• Promote permanent access to digital information nationally and internationally


Page 4: How to Build a Digital Library

How do we translate this vision into practice?

• Mass-digitization: when possible in public-private partnerships

• Speeding up digitization: by the end of 2013, 10% of all Dutch books, newspapers and magazines will be digitized (60 million pages).


Page 5: How to Build a Digital Library

Digitization at the Dutch National Library (KB)

Project | Timeline | Pages
Dutch Parliamentary Newspapers | 2004-2010 | 2.500.000
Dutch Daily Newspapers | 2007-2012 | 9.000.000
Early Dutch Books Online | 2008-2010 | 2.000.000
Magazines | 2009-2011 | 1.500.000
Google | 2011-2014 | 30.000.000
ProQuest | 2011-2016 | 6.000.000
Metamorfoze (Books, Newspapers and Magazines) | 2012-2016 | 9.000.000
Total | | 60.000.000
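As a quick cross-check, the per-project page counts on this slide can be added up and compared with the stated total of 60 million pages. The sketch below only transcribes the slide's figures; the dictionary and function names are illustrative and not part of the original presentation.

```python
# Page counts per project, transcribed from the slide's table.
PROJECT_PAGES = {
    "Dutch Parliamentary Newspapers": 2_500_000,
    "Dutch Daily Newspapers": 9_000_000,
    "Early Dutch Books Online": 2_000_000,
    "Magazines": 1_500_000,
    "Google": 30_000_000,
    "ProQuest": 6_000_000,
    "Metamorfoze": 9_000_000,
}


def total_pages(projects):
    """Return the sum of all page counts."""
    return sum(projects.values())


def share_of_total(projects, name):
    """Return one project's fraction of the overall page count."""
    return projects[name] / total_pages(projects)


if __name__ == "__main__":
    print(f"Total pages: {total_pages(PROJECT_PAGES):,}")
    for name in PROJECT_PAGES:
        print(f"{name}: {share_of_total(PROJECT_PAGES, name):.1%}")
```

The rows do add up to the slide's total, and the public-private partnerships (Google and ProQuest) together account for more than half of the planned pages.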


Page 6: How to Build a Digital Library

Project workflow

1. Selection: KB + partners

2. Material preparation: KB

3. Scanning, OCR + metadata: Outsourced

4. Quality assessment: KB

5. Presentation & storage: KB + partners


Page 7: How to Build a Digital Library

The scope is ‘everything’: but where to start?


Page 8: How to Build a Digital Library

Selection of newspapers (1618-1995)


Page 9: How to Build a Digital Library

Copyright

• Freelancers own rights until 70 years after the death of the ‘author’

• Publishers own rights until 70 years after publication

• Online publication is only possible if permitted by the copyright holders

• KB negotiated an agreement with representative bodies of freelancers, journalists and photographers (Lira/Pictoright)

• KB negotiated successfully with 15 publishers to clear copyrights

• Result: 102 Dutch newspapers published up to 1995 will be made available online and can be accessed free of charge
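The two 70-year terms above decide which issues need clearance before they can go online. The sketch below is a simplified illustration only, not legal advice: it assumes the term is counted from 1 January of the year after the triggering event (death of the author or publication), which is an assumption added here and not stated on the slide. The function names are hypothetical.

```python
def first_rights_free_year(event_year, term_years=70):
    """Return the first calendar year in which the rights have lapsed.

    Assumption (not from the slide): the term starts on 1 January of the
    year following the event, so it ends on 31 December of
    event_year + term_years.
    """
    return event_year + term_years + 1


def needs_clearance(event_year, online_year, term_years=70):
    """Return True if rights still apply in the year of online publication."""
    return online_year < first_rights_free_year(event_year, term_years)


if __name__ == "__main__":
    # An issue published in 1995 and put online in 2011 still needs permission.
    print(needs_clearance(1995, 2011))
    # An issue published in 1900 no longer falls under the publisher's term.
    print(needs_clearance(1900, 2011))
```

Under these assumptions most of the 1618-1995 selection published in the twentieth century still needs permission, which is consistent with the slide's emphasis on negotiated agreements.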

Page 10: How to Build a Digital Library

Material preparation

• Every page is checked

• Small repairs are carried out

• Metadata is added

• Bindings are prepared for transportation

• Cut and digitize is not an option


Page 11: How to Build a Digital Library

A workflow of 50.000 pages per week
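To put the weekly throughput in perspective, the sketch below estimates how many weeks a given page volume takes at 50.000 pages per week. Pairing this rate with the 9.000.000 pages listed for the Dutch Daily Newspapers project is an assumption made for illustration, as is the idea of a constant throughput; the function name is hypothetical.

```python
def weeks_needed(total_pages, pages_per_week):
    """Return the number of whole weeks needed, rounding up."""
    if pages_per_week <= 0:
        raise ValueError("pages_per_week must be positive")
    # Integer ceiling division avoids floating-point rounding.
    return (total_pages + pages_per_week - 1) // pages_per_week


if __name__ == "__main__":
    weeks = weeks_needed(9_000_000, 50_000)
    print(f"{weeks} weeks, roughly {weeks / 52:.1f} years")
```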


Page 12: How to Build a Digital Library

Digitization

• Digitization is outsourced

• Final goal (e.g. web service & digital preservation) drives technical choices

• Metadata enrichment to improve quality of the automated process of segmentation and Optical Character Recognition (OCR)

• Output:
  • JPEG2000 (master images & access images)
  • PDF
  • Descriptive, structural and technical metadata (DCX, MPEG21/METS, ALTO, MIX)
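ALTO is the format that carries the OCR result: recognized words are stored as `String` elements with a `CONTENT` attribute, grouped into `TextLine` elements. The sketch below shows one way to pull plain text lines out of such a file using only the Python standard library. The sample document is invented for illustration and is not KB data, and the code ignores the namespace so that it does not depend on a particular ALTO version.

```python
import xml.etree.ElementTree as ET

# Invented, minimal ALTO-like sample; real files contain far more detail
# (coordinates, styles, page metadata).
SAMPLE_ALTO = """<alto xmlns="http://www.loc.gov/standards/alto/ns-v2#">
  <Layout><Page><PrintSpace><TextBlock>
    <TextLine>
      <String CONTENT="Koninklijke"/>
      <SP/>
      <String CONTENT="Bibliotheek"/>
    </TextLine>
    <TextLine>
      <String CONTENT="Den"/>
      <SP/>
      <String CONTENT="Haag"/>
    </TextLine>
  </TextBlock></PrintSpace></Page></Layout>
</alto>"""


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree adds to tag names."""
    return tag.rsplit("}", 1)[-1]


def alto_lines(xml_text):
    """Return the OCR text of each TextLine as a list of strings."""
    root = ET.fromstring(xml_text)
    lines = []
    for elem in root.iter():
        if local_name(elem.tag) != "TextLine":
            continue
        words = [
            child.get("CONTENT", "")
            for child in elem
            if local_name(child.tag) == "String"
        ]
        lines.append(" ".join(words))
    return lines


if __name__ == "__main__":
    for line in alto_lines(SAMPLE_ALTO):
        print(line)
```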


Page 13: How to Build a Digital Library

Quality assessment

• QA takes a lot of time, so we automate it as much as possible

• Automatic checks on validity of XML files, file names, correlation between different files, completeness

• Samples for all aspects that cannot be checked automatically (results of correction of OCR in headers, segmentation, etc.)

• Focus on fixing structural problems rather than incidental errors

• Balance between quantity and quality
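The automatic checks listed above (XML validity, file names, completeness) and the manual sampling can be sketched as follows. This is a minimal illustration, not KB's actual QA tooling: the file naming pattern, the required file types per page and all function names are assumptions, and "validity" is reduced here to XML well-formedness rather than validation against a schema.

```python
import random
import re
import xml.etree.ElementTree as ET

# Hypothetical naming convention: <batch>_<4-digit page number>.<extension>
FILENAME_PATTERN = re.compile(r"^[A-Za-z0-9]+_\d{4}\.(jp2|xml|pdf)$")
# Hypothetical completeness rule: every page needs an image and an XML file.
REQUIRED_EXTENSIONS = ("jp2", "xml")


def is_well_formed(xml_text):
    """Return True if the text parses as XML (no schema validation)."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


def invalid_names(filenames):
    """Return the file names that do not follow the naming convention."""
    return [name for name in filenames if not FILENAME_PATTERN.match(name)]


def missing_files(filenames):
    """Map each page stem to the required extensions that are absent."""
    present = {}
    for name in filenames:
        stem, _, ext = name.rpartition(".")
        present.setdefault(stem, set()).add(ext)
    report = {}
    for stem, extensions in sorted(present.items()):
        missing = [ext for ext in REQUIRED_EXTENSIONS if ext not in extensions]
        if missing:
            report[stem] = missing
    return report


def draw_sample(page_ids, size, seed=0):
    """Pick a reproducible random sample of pages for manual inspection."""
    rng = random.Random(seed)
    pages = list(page_ids)
    return sorted(rng.sample(pages, min(size, len(pages))))


if __name__ == "__main__":
    delivery = ["batch1_0001.jp2", "batch1_0001.xml", "batch1_0002.jp2"]
    print(invalid_names(delivery + ["scan 3.tif"]))
    print(missing_files(delivery))
    print(draw_sample(["batch1_0001", "batch1_0002"], 1))
```

Aggregating such reports per batch, rather than correcting individual pages, fits the slide's focus on structural problems over incidental errors.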


Page 14: How to Build a Digital Library

Presentation & storage

• Generic architecture to support all our content
  • central metadata store and search engine
  • open architecture by use of standards (Dublin Core) and protocols (OAI, SRU)

• Article level access

• Advanced search options for: date of publication, city of publication, specific newspaper(s), article type

• Storage: long-term preservation of results in KB’s E-Depot
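Because the architecture exposes content through SRU, a client can search it with a plain HTTP GET request. The sketch below builds a standard SRU `searchRetrieve` URL. The base URL is a placeholder, and which CQL indexes a given server supports (here `dc.title` from the Dublin Core context set) is an assumption that would have to be checked against that server's explain record.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Placeholder endpoint; replace with the address of a real SRU server.
EXAMPLE_BASE_URL = "https://example.org/sru"


def build_sru_url(base_url, cql_query, start_record=1, max_records=10):
    """Build an SRU 1.2 searchRetrieve request URL for a CQL query."""
    params = {
        "operation": "searchRetrieve",
        "version": "1.2",
        "query": cql_query,
        "startRecord": start_record,
        "maximumRecords": max_records,
    }
    return f"{base_url}?{urlencode(params)}"


def query_params(url):
    """Decode the query string of a URL back into a dict of lists."""
    return parse_qs(urlsplit(url).query)


if __name__ == "__main__":
    url = build_sru_url(EXAMPLE_BASE_URL, 'dc.title = "Amsterdam"')
    print(url)
    print(query_params(url)["query"])
```

Paging through results then only requires increasing `startRecord` by `maximumRecords` on each request.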

Page 15: How to Build a Digital Library


Page 16: How to Build a Digital Library

Mass digitization – lessons learned

• Do your homework: perform desk research, develop a clear functional design and implement a pilot phase

• Define detailed specifications and workflows for different source types, and stick to them

• Start early with negotiations to clear titles for online publication

• Planning is vital to stay in control. Perform regular transports to suppliers. Agree on a detailed delivery schedule

• Don’t underestimate the costs of developing technical and organisational infrastructure for mass digitization


Page 17: How to Build a Digital Library

Costs of newspaper digitization

Newspapers: price per page (all in)

Labour costs (€ 0,59)

Digitization (€ 0,68)

Infrastructure and software (€ 0,16)

Conservation (€ 0,01)

Miscellaneous costs (€ 0,05)
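Adding up the components gives the all-in price per page. The sketch below works in euro cents to avoid floating-point rounding. Applying the resulting price to the 9.000.000 pages of the Dutch Daily Newspapers project is an illustrative assumption rather than a figure from the presentation, and the names used are hypothetical.

```python
# Cost components per page in euro cents, transcribed from the slide.
COST_PER_PAGE_CENTS = {
    "Labour": 59,
    "Digitization": 68,
    "Infrastructure and software": 16,
    "Conservation": 1,
    "Miscellaneous": 5,
}


def price_per_page_cents(costs):
    """Return the all-in price per page in euro cents."""
    return sum(costs.values())


def project_cost_euros(pages, costs):
    """Return the total cost in euros for a given number of pages."""
    return pages * price_per_page_cents(costs) / 100


if __name__ == "__main__":
    cents = price_per_page_cents(COST_PER_PAGE_CENTS)
    print(f"Price per page: EUR {cents / 100:.2f}")
    print(f"9,000,000 pages: EUR {project_cost_euros(9_000_000, COST_PER_PAGE_CENTS):,.0f}")
```

Labour and outsourced digitization together make up 127 of the 149 cents, which is why cutting the price of digitization is a recurring concern.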


Page 18: How to Build a Digital Library

Challenges

• Cutting down the price of digitization

• Bringing the physical and digital library together

• Linking initiatives of potential partners to our ambitions

• Bringing digital collections together by providing a digital platform for all Dutch Books, Newspapers and Magazines

• Improving quality of OCR for historical text (IMPACT)
