DESCRIPTION
Large-Scale Digital Archives: Publisher and Library Case Studies
Speakers: Thijs Willems, Project Manager, Online Archives, Springer; Jasper Faase, Project Manager, Newspaper Digitization Project, National Library of the Netherlands.
This session will present two large-scale digitization projects: the Springer Book Archives and newspaper digitization at the National Library of the Netherlands (aka the Dutch KB). The audience will learn the ‘nuts and bolts’ of these unique projects: key decisions, timelines, consequences for internal and external stakeholders, production matters and clearing hurdles such as rights and permissions. The impact these key initiatives may have on long-term preservation, the physical library, metadata and discoverability, author relations and the long tail of usage are topics for discussion with the audience.
How to build a digital library?
A case of newspaper digitization at the National Library of the Netherlands
3rd of November 2011
Jasper Faase
Project Manager Digitization
Email: jasper.faase@kb.nl
Mission
• The Koninklijke Bibliotheek is the national library of the Netherlands: we bring people and information together.
• Our core values are: accessibility, sustainability, innovation and cooperation.
Koninklijke Bibliotheek – Nationale bibliotheek van Nederland
Vision
We:
• Offer everyone everywhere access to everything published in the Netherlands
• Play a central role in the (scientific) information infrastructure of the Netherlands
• Promote permanent access to digital information nationally and internationally
How do we translate this vision into practice?
• Mass-digitization: when possible in public-private partnerships
• Speeding up digitization: by the end of 2013, 10% of all Dutch books, newspapers and magazines will be digitized (60 million pages).
Digitization at the Dutch National Library (KB)
Project                                          Timeline     Pages
Dutch Parliamentary Newspapers                   2004-2010     2.500.000
Dutch Daily Newspapers                           2007-2012     9.000.000
Early Dutch Books Online                         2008-2010     2.000.000
Magazines                                        2009-2011     1.500.000
Google                                           2011-2014    30.000.000
ProQuest                                         2011-2016     6.000.000
Metamorfoze (Books, Newspapers and Magazines)    2012-2016     9.000.000
Total                                                         60.000.000
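The total can be checked against the individual rows. The short Python sketch below simply re-adds the page counts from the table; the variable names are illustrative and not part of the original slides.

```python
# Page counts per project, copied from the table above.
pages_by_project = {
    "Dutch Parliamentary Newspapers": 2_500_000,
    "Dutch Daily Newspapers": 9_000_000,
    "Early Dutch Books Online": 2_000_000,
    "Magazines": 1_500_000,
    "Google": 30_000_000,
    "ProQuest": 6_000_000,
    "Metamorfoze": 9_000_000,
}

total_pages = sum(pages_by_project.values())

# Share of the total contributed by each project, largest first.
for project, pages in sorted(pages_by_project.items(), key=lambda kv: -kv[1]):
    print(f"{project:32s} {pages:>11,d}  {pages / total_pages:6.1%}")

print(f"{'Total':32s} {total_pages:>11,d}")
```

The rows add up to the 60 million pages stated on the slide, with the Google partnership accounting for half of them.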
Project workflow
1. Selection: KB + partners
2. Material preparation: KB
3. Scanning, OCR + metadata: outsourced
4. Quality assessment: KB
5. Presentation & storage: KB + partners
The scope is ‘everything’: but where to start?
Selection of newspapers (1618-1995)
Copyright
• Freelancers own rights until 70 years after death of ‘author’
• Publishers own rights until 70 years after publication
• Online publication is only possible if permitted by the
copyright holders
• KB negotiated an agreement with representative bodies of
freelancers, journalists and photographers (Lira/Pictoright)
• KB negotiated successfully with 15 publishers to clear
copyrights
• Result: 102 Dutch newspapers, covering issues up to 1995, will be published online and can be accessed free of charge
Material preparation
• Every page is checked
• Small repairs are carried out
• Metadata is added
• Bindings are prepared for transportation
• Cut and digitize is not an option
A workflow of 50.000 pages per week
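To get a feel for what this rate means, the sketch below estimates how long the 9.000.000 pages of the Dutch Daily Newspapers project would take at 50.000 pages per week. It assumes, purely for illustration, that the rate is constant and applies to that project; the slides do not state this.

```python
# Both figures come from the slides; combining them is an assumption.
PAGES_PER_WEEK = 50_000
DAILY_NEWSPAPER_PAGES = 9_000_000

# Ceiling division: a partially filled week still counts as a week.
weeks_needed = -(-DAILY_NEWSPAPER_PAGES // PAGES_PER_WEEK)
years_needed = weeks_needed / 52

print(f"{weeks_needed} weeks, roughly {years_needed:.1f} years")
```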
Digitization
• Digitization is outsourced
• Final goal (e.g. web service & digital preservation) drives technical choices
• Metadata enrichment to improve quality of the automated process of segmentation and Optical Character Recognition (OCR)
• Output:
  • JPEG2000 (master images & access images)
  • PDF
  • Descriptive, structural and technical metadata (DCX, MPEG21/METS, ALTO, MIX)
Quality assessment
• QA takes a lot of time, so we automate QA as much as possible
• Automatic checks on validity of XML files, file names, correlation between different files, completeness
• Samples for all aspects that cannot be checked automatically (results of correction of OCR in headers, segmentation, etc.)
• Focus on fixing structural problems instead of incidental errors
• Balance between quantity and quality
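The automatic checks listed above could look something like the following Python sketch. It is not the KB's actual tooling: the file naming convention (`<issue>_<page>.jp2` with a matching `.xml`), the function names and the batch layout are all hypothetical, and the XML check only tests well-formedness, not validity against a schema such as ALTO.

```python
import re
import xml.etree.ElementTree as ET
from pathlib import Path

# Hypothetical naming convention: <issue-id>_<4-digit page>.<jp2|xml>
NAME_PATTERN = re.compile(r"^(?P<issue>[A-Za-z0-9]+)_(?P<page>\d{4})\.(?P<ext>jp2|xml)$")


def is_well_formed(xml_path: Path) -> bool:
    """Return True if the file parses as XML (no schema validation)."""
    try:
        ET.parse(xml_path)
        return True
    except ET.ParseError:
        return False


def check_batch(batch_dir: Path) -> list[str]:
    """Collect human-readable problems found in one delivered batch."""
    problems = []
    pages = {"jp2": set(), "xml": set()}  # (issue, page) keys seen per file type

    for path in sorted(batch_dir.iterdir()):
        match = NAME_PATTERN.match(path.name)
        if match is None:
            problems.append(f"bad file name: {path.name}")
            continue
        pages[match["ext"]].add((match["issue"], int(match["page"])))
        if match["ext"] == "xml" and not is_well_formed(path):
            problems.append(f"malformed XML: {path.name}")

    # Correlation: every image needs an XML file and vice versa.
    for issue, page in sorted(pages["jp2"] ^ pages["xml"]):
        problems.append(f"unpaired file for {issue} page {page}")

    # Completeness: page numbers per issue should run 1..n without gaps.
    by_issue = {}
    for issue, page in pages["jp2"] | pages["xml"]:
        by_issue.setdefault(issue, set()).add(page)
    for issue, seen in sorted(by_issue.items()):
        missing = set(range(1, max(seen) + 1)) - seen
        if missing:
            problems.append(f"{issue}: missing pages {sorted(missing)}")

    return problems
```

Calling `check_batch(Path("some_batch"))` returns an empty list for a clean batch. Checks like these catch structural problems across a whole delivery, leaving manual sampling for what cannot be automated.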
Presentation & storage
• Generic architecture to support all our content
  • central metadata store and search engine
  • open architecture by use of standards (Dublin Core) and protocols (OAI, SRU)
• Article-level access
• Advanced search options for: date of publication, city of publication, specific newspaper(s), article type
• Storage: long-term preservation of results in KB’s E-Depot
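As an illustration of what an open, protocol-based architecture allows, the sketch below builds an SRU `searchRetrieve` request URL in Python. The endpoint is a placeholder, not the KB's service address, and the `dc.date` index is an assumption: which CQL indexes a given server supports is server-specific. Only the parameter names (`operation`, `version`, `query`, `maximumRecords`) come from the SRU 1.2 protocol itself.

```python
from urllib.parse import urlencode

# Placeholder endpoint for illustration only.
SRU_ENDPOINT = "https://example.org/sru"


def build_sru_url(text, date_from=None, date_to=None, max_records=10):
    """Build an SRU 1.2 searchRetrieve URL with a simple CQL query."""
    escaped = text.replace('"', '\\"')
    clauses = [f'"{escaped}"']
    # Assumed index name: dc.date (Dublin Core context set).
    if date_from:
        clauses.append(f'dc.date >= "{date_from}"')
    if date_to:
        clauses.append(f'dc.date <= "{date_to}"')

    params = {
        "operation": "searchRetrieve",
        "version": "1.2",
        "query": " and ".join(clauses),
        "maximumRecords": max_records,
    }
    return f"{SRU_ENDPOINT}?{urlencode(params)}"


print(build_sru_url("watersnood", date_from="1953-01-01", date_to="1953-12-31"))
```

Because the query travels as a plain URL, any client that can issue an HTTP GET can search the collection, which is the practical benefit of building on standard protocols.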
Mass digitization – lessons learned
• Do your homework: perform desk research, develop a clear functional design and implement a pilot phase
• Define detailed specifications and workflows for different source types, and stick to them
• Start early with negotiations to clear titles for online publication
• Planning is vital to stay in control. Perform regular transports to suppliers. Agree on a detailed schedule for deliveries
• Don’t underestimate costs of developing technical and organisational infrastructure for mass digitization
Costs of newspaper digitization
Newspapers: price per page (all in)
Labour costs (€ 0,59)
Digitization (€ 0,68)
Infrastructure and software (€ 0,16)
Conservation (€ 0,01)
Miscellaneous costs (€ 0,05)
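Adding up the components gives the all-in price per page. The Python sketch below does that sum and, as a rough illustration only, multiplies it by the 9.000.000 pages of the Dutch Daily Newspapers project; the slides do not say that this per-page price applies uniformly to that whole project, so the second figure is an assumption.

```python
from decimal import Decimal

# Cost components per page in euros, as listed above.
cost_per_page = {
    "Labour": Decimal("0.59"),
    "Digitization": Decimal("0.68"),
    "Infrastructure and software": Decimal("0.16"),
    "Conservation": Decimal("0.01"),
    "Miscellaneous": Decimal("0.05"),
}

total_per_page = sum(cost_per_page.values())
print(f"All-in price per page: EUR {total_per_page}")  # EUR 1.49

for item, cost in cost_per_page.items():
    print(f"  {item:28s} {cost / total_per_page:6.1%}")

# Assumption: the same price holds for every page of the project.
ASSUMED_PAGES = 9_000_000
print(f"Indicative cost for {ASSUMED_PAGES:,d} pages: EUR {total_per_page * ASSUMED_PAGES:,.0f}")
```

Labour and outsourced digitization together make up the bulk of the per-page price, which puts the challenge of cutting the price of digitization in context.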
Challenges
• Cutting down the price of digitization
• Bringing the physical and digital library together
• Linking initiatives of potential partners to our ambitions
• Bringing digital collections together by providing a digital platform for all Dutch Books, Newspapers and Magazines
• Improving quality of OCR for historical text (IMPACT)