Planning a digital library How to Build a Digital Library Ian H. Witten and David Bainbridge


Page 1

Planning a digital library

How to Build a Digital Library
Ian H. Witten and David Bainbridge

Page 2

Planning a Digital Library

- Responsibilities
- Technology to be used
  - Greenstone, DSpace, Fedora, Eprints
- Metadata standard to be used
  - Dublin Core, METS, etc.
- Types of access
- Retrospective or born digital?
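To make the metadata choice concrete, here is a minimal sketch of a simple Dublin Core record built with Python's standard library. The namespace URI is the standard one for the Dublin Core element set; the `record` wrapper element and the `make_dc_record` helper are assumptions made for illustration, not part of any particular digital library system.

```python
import xml.etree.ElementTree as ET

# Namespace of the Dublin Core Metadata Element Set, version 1.1.
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)


def make_dc_record(fields):
    """Build a flat record from (element, value) pairs.

    The wrapper element name "record" is a hypothetical choice;
    real systems wrap Dublin Core in their own container formats.
    """
    record = ET.Element("record")
    for element, value in fields:
        child = ET.SubElement(record, f"{{{DC_NS}}}{element}")
        child.text = value
    return record


# Dublin Core elements are repeatable, so each author gets its own dc:creator.
fields = [
    ("title", "How to Build a Digital Library"),
    ("creator", "Ian H. Witten"),
    ("creator", "David Bainbridge"),
]
xml_text = ET.tostring(make_dc_record(fields), encoding="unicode")
print(xml_text)
```

Whichever standard is chosen, the planning question stays the same: which fields are required, and who supplies their values.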

Page 3

Responsibilities

- Legal issues
  - Distributing information carries responsibilities
  - Copyright
- Social issues
  - Respect customs of the community
  - Both source and use communities
- Ethical issues

Page 4

Ideology

- Ideology: a clear conception of what you plan to achieve with the collection of information
- Ideology of a collection:
  - Purpose
  - Objectives
  - Principles
    - guide what is to be included in the collection
- Placed in Introduction to Digital Library

Page 5

Document versus Work

- Work
  - The disembodied content of a message
  - Pure information
- Document
  - Traditional library: a physical object that embodies the work
  - Digital library: a particular electronic encoding of a work
- How are distinctions made between different manifestations of a single work?

Page 6

Converting an Existing Library

- Digitizing an existing paper-based collection is the most expensive kind of project
- Consider whether it is worth the effort and expense
  - 16th Century Mexican Library
  - Incunabula
  - Broadsides

Page 7

Advantages of Digital Libraries

- Easier to access remotely than conventional libraries
- Powerful search and browsing
- Easier to add additional services
- Easier to organize and reorganize
- Easier to maintain?
- Easier to preserve?
- Does your collection have these advantages?

Page 8

Questions to Address

- Will the digital library coexist with an existing physical one?
- What is the collection's growth rate?
- How dynamic is the collection?
- Should you consider outsourcing the whole digital library operation?
- Could user needs be satisfied in alternative ways?

Page 9

Prioritizing Materials

- Special collections and unique materials
  - Rare books and manuscripts
- High-use items
  - Research and teaching materials
- Low-use items

Page 10

Criteria for Digital Conversion

- Intellectual content
  - Scholarly value
  - Desire to enhance access to information
  - Funding available
- Educational value
  - Classroom support
  - Background reading
  - Distance education
- Institutional
  - Resource sharing
  - Promote strengths of an institution
- Reduce handling of fragile originals
- Cost and space savings

Page 11

Building a New Collection

- New material
  - The copyright holder may be the best one to create a digital collection
- Metadata
  - Where will it come from?

Page 12

Bibliographic Entities

- Documents
- Works
  - Distinction between document and work
- Editions
  - Electronic documents use terms such as version, release and revision
- Authors
  - Authority control: standardized names for authors
- Titles
  - Attributes of works
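Authority control can be pictured as a lookup from the name variants found in documents to one standardized form. The sketch below is a toy illustration: the `AUTHORITY` table, its variants, and the "Surname, Forename" convention are all hypothetical, and a real library would rely on a maintained authority file instead of a hand-written dictionary.

```python
# Hypothetical authority table: lower-cased variant -> standardized heading.
AUTHORITY = {
    "ian h. witten": "Witten, Ian H.",
    "i. h. witten": "Witten, Ian H.",
    "witten, i. h.": "Witten, Ian H.",
    "david bainbridge": "Bainbridge, David",
}


def normalize(name):
    """Lower-case the name and collapse runs of whitespace."""
    return " ".join(name.lower().split())


def authorized_name(name):
    """Return the standardized form, or the input unchanged if unknown."""
    return AUTHORITY.get(normalize(name), name)


print(authorized_name("Ian H.  Witten"))
print(authorized_name("WITTEN, I. H."))
print(authorized_name("Unknown Person"))
```

Returning unknown names unchanged keeps the sketch safe to run over a whole collection; in practice such names would be queued for a cataloguer to review.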

Page 13

Bibliographic Entities

- Subjects
  - Two approaches to automatically assign subject:
    - Key-phrase extraction
    - Key-phrase assignment
  - Literary and artistic works
    - Style, form, content, genre
  - Library of Congress Subject Headings (LCSH)
    - Controlled vocabularies: 30,000 pages, 2,000,000 entries
    - Hierarchical relationship of broader and narrower topics
- Subject classifications
  - Traditional libraries have a linear arrangement
  - Digital collection can be rearranged at the click of a mouse
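The difference between the two automatic approaches can be shown with a deliberately simple sketch: extraction picks phrases that occur in the document itself, while assignment picks terms from a controlled vocabulary. The frequency heuristic, the stopword list, and the small `vocabulary` with its cue words are assumptions made for illustration; they are not the algorithms used by any particular system.

```python
import re
from collections import Counter

STOPWORDS = {"a", "an", "and", "the", "of", "to", "in", "is", "for", "with"}


def tokenize(text):
    """Split text into lower-case alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


def extract_keyphrases(text, top_n=3):
    """Extraction: return the most frequent two-word phrases from the text."""
    words = tokenize(text)
    bigrams = [
        f"{a} {b}"
        for a, b in zip(words, words[1:])
        if a not in STOPWORDS and b not in STOPWORDS
    ]
    return [phrase for phrase, _ in Counter(bigrams).most_common(top_n)]


def assign_keyphrases(text, vocabulary):
    """Assignment: return vocabulary terms whose cue words all occur."""
    words = set(tokenize(text))
    return [term for term, cues in vocabulary.items() if cues <= words]


text = (
    "A digital library stores documents. "
    "Building a digital library requires metadata for the documents."
)
# Hypothetical controlled vocabulary: term -> cue words that trigger it.
vocabulary = {
    "Digital libraries": {"digital", "library"},
    "Metadata": {"metadata"},
    "Copyright": {"copyright"},
}

print(extract_keyphrases(text, top_n=1))
print(assign_keyphrases(text, vocabulary))
```

Note that assignment can return a heading such as "Digital libraries" that never appears verbatim in the text, which is what makes it suitable for controlled vocabularies like LCSH.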

Page 14

Digitizing Documents

- Digitization
  - The process of taking traditional library materials and converting them to electronic form
  - Allows storage and manipulation by a computer
  - The process is time-consuming and expensive

Page 15

Stages of Digitization

- Scanning
  - Creates a digitized image of each page
  - Usually presented to the user
- Optical Character Recognition (OCR)
  - Creates an encoded representation of the textual content of the pages
  - Necessary for full-text indexing
  - Allows searching

Page 16

Decisions in Scanning

- Black-and-white, grayscale or color
- Resolution
  - Number of pixels per linear unit
- Bits per pixel
  - Monochrome display: 16 or 256 levels of gray
  - Color display: up to 24 or 32 bpp
- Quality
  - Higher quality increases storage space and time to access
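The trade-off between scan quality and storage follows directly from resolution and bits per pixel. The sketch below computes the uncompressed size of one scanned page; the 8.5 x 11 inch page and the 300 dpi resolution are assumed values chosen for illustration, and real files are normally compressed, so actual sizes will be smaller.

```python
def uncompressed_page_bytes(width_in, height_in, dpi, bits_per_pixel):
    """Return the raw size in bytes of one scanned page, before compression."""
    pixels = round(width_in * dpi) * round(height_in * dpi)
    # Round up to a whole number of bytes.
    return (pixels * bits_per_pixel + 7) // 8


# Assumed example: a US Letter page scanned at 300 dpi.
for label, bpp in [("black-and-white", 1), ("grayscale", 8), ("color", 24)]:
    size = uncompressed_page_bytes(8.5, 11, 300, bpp)
    print(f"{label:16s} {bpp:2d} bpp: {size / 1_000_000:.1f} MB")
```

Doubling the resolution quadruples the pixel count, so resolution affects storage even more sharply than bit depth does.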

Page 17

Optical Character Recognition

- Manual cleanup is necessary
- Less efficient than manual keying when accuracy drops below 95 percent
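One way to apply the 95 percent threshold is to key in a small sample by hand and compare the OCR output against it. The sketch below measures character accuracy as one minus the edit distance divided by the length of the reference text; this definition, the function names, and the sample strings are assumptions for illustration, since the slide does not say how the percentage is measured.

```python
def edit_distance(a, b):
    """Levenshtein distance: minimum insertions, deletions, substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def character_accuracy(reference, ocr_output):
    """Fraction of reference characters the OCR effectively got right."""
    if not reference:
        raise ValueError("reference text must not be empty")
    return 1 - edit_distance(reference, ocr_output) / len(reference)


def needs_rekeying(reference, ocr_output, threshold=0.95):
    """True when accuracy falls below the threshold."""
    return character_accuracy(reference, ocr_output) < threshold


reference = "digital library"   # hand-keyed sample
ocr_output = "digitel 1ibrary"  # hypothetical OCR result with two errors
print(round(character_accuracy(reference, ocr_output), 3))
print(needs_rekeying(reference, ocr_output))
```

A sample this short is only for demonstration; a meaningful estimate would use several full pages drawn from across the collection.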

Page 18

Interactive OCR

- Optical character recognition should be done as an interactive process
- Acquisition
  - Input from scanner or read a file
- Cleanup
  - Filtering, deskewing and manual cleanup of unwanted areas
- Page analysis
  - Examine layout
- Recognition
  - The "OCR" part
- Checking
- Saving
  - Plain text, HTML, RTF, PDF, MS Word

Page 19

Page Handling

- Unbinding
- Microfiche or microfilm
- Two most expensive parts
  - Handling the paper
  - OCR

Page 20

Planning a Digitization Project

- Outsourcing
- Cost
  - $1 to $2 for scanning and OCR
- Quality control
- Verification
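A rough budget range can be worked out from the quoted rate. The sketch below assumes the $1 to $2 figure is a per-page cost, which the slide does not state explicitly, and the collection size used is hypothetical.

```python
def cost_range(pages, low_per_page=1.0, high_per_page=2.0):
    """Return (low, high) cost estimates in dollars.

    Assumes the $1 to $2 rate applies per page; the slide gives no unit.
    """
    return pages * low_per_page, pages * high_per_page


# Hypothetical collection: 1,000 books averaging 300 pages each.
pages = 1_000 * 300
low, high = cost_range(pages)
print(f"{pages:,} pages: ${low:,.0f} to ${high:,.0f}")
```

The estimate covers scanning and OCR only; quality control and verification would add to it.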