DIGITAL LIBRARIES

Feasibility, Features, Functionalities and the Future

Dr. M.G. Sreekumar
UNESCO Coordinator, Greenstone Support, South Asia
Librarian & Head, CDDL, IIM Kozhikode

Agenda

• Digital Library – Concepts, Principles and Technologies, Architecture…
• Open (Source) Digital Libraries
• Metadata – Concepts, Functions and Standards
• DL : Functional Components, Workflows & Procedures
• DL : Build up Strategies
• Hardware / Storage / Space
• Software Selection
• DL Architecture
• Major Tasks
• DL Hardships

Foreword

• Digital Libraries gaining increasing social attention, academic and research interest
• Demand for improved information and knowledge management solutions - universities, enterprises and institutions
• Need for integrated access to disparate information resources
• Key challenge - how to create online information environments facilitating internal content publishing and single point access to internal/external information sources
• Latest DL technologies Vs Traditional libraries and knowledge management
• Fortunately we have a large number of operational digital libraries and services

World of Digital Information: Features

• Great Potential and Dynamic
• Easy to access, disseminate, store, retrieve, archive, copy, transmit ...
• Ubiquity of the Net / Web
• Information - Any time / Anywhere / Anyone
• Access by a wide spectrum of Users
• Easiness of access - Plug & Play
• Currency of the material / information
• Increase in value

Unique Features of the Net/Web

• Reach - unprecedented
• Richness - unquestioned
• Feedback - excellent
• Content Holder
• Content Publisher
• Content Communicator
• Asynchronous
• Death of Distance / Time

Technology Requirements

• Bandwidth
• Communication Speeds
• Processing Power
• World Wide Connectivity
• Application Support

The Current Environment

• Fascinating times in the history of libraries, information systems and electronic publishing
• Possibilities of building large-scale services
• Materials are stored on computers
• Network connects the computers to personal computers on the users' desks
• In a complete digital library, nothing need ever reach paper

Top Tech Trends in IT / LIS

• Web 2.0 / Library 2.0
• Blogs / RSS Feeds / Wikis / Podcasts / Webcasts
• Open Source Software, Open Standards, Open URL
• User Tagging, Automated Tagging
• Web OPACs and Interface Design
• Seamless Integration / Aggregation
• OA -> OAP + OAA
• Open Resource Discovery Tools - Google Scholar
• E-Books, E-Journals, E-Resources
• Harvesting, Federation, Metasearching
• Digital Rights Management

Multimedia Library Info System

[Diagram: Internet / Intranet; Gateway-out; Data capture; USER @ anywhere (access to information from anywhere)]

Challenges of the Day

• Collection Building – Acquisition, Subscriptions, Licensing…
• Diverse Datastreams - Content Categories, Publication Types
• Multimedia, Polymedia, Multiformats
• Copyright, Intellectual Property, Fair Use…
• Technology Complexities, Infrastructure Issues
• Publishers’ Stringent Policies / Monopolies
• Integration of legacy systems and the new genre

The Information Landscape

• Popular Information: Books, eBooks, POD, JLs, eJLs, Newspapers, AV media
• Scholarly Information: Books, eBooks, JLs, eJournals, Scholarly Articles, ePrint Archives, ETDs, eCourses
• Digitized Information (DL Initiatives): Commercial, National, State & Local Level, NGOs
• Web Resources: Surface Web, Deep Web, Multi-Modal, Semantic Web

Penetration of E-Content in Libraries

PUBLICATION TYPES

• E-Books, E-Journals…

• Aggregated Scholarly E-Journal Databases

• Databases, CBT/ WBT

• Portals, Vortals…

• Value added services

• Preprints, Eprints, E-Documents….

DOCUMENT FORMATS

• ASCII, RTF, HTML, SGML, Postscript, PDF, Proprietary, Native Application Formats

• Images, Graphics

• Audio

• Video

• XHTML, ASP, PHP, XML ...

What’s a DL?

"Digital libraries are organized collections of digital information. They combine the structuring and gathering of information, which libraries and archives have always done, with the digital representation that computers have made possible." (Michael Lesk)

"[A digital library] is a managed collection of information, with associated services, where the information is stored in digital formats and accessible over a network. A crucial part of this definition is that the information is managed. A stream of data sent to earth from a satellite is not a library. The same data, when organized systematically, becomes a digital library collection." (William Arms)

A digital library is "a focused collection of digital objects, including text, video, and audio, along with methods for access and retrieval, and for selection, organization, and maintenance of the collection." (Ian Witten and David Bainbridge)

"Digital libraries are different [from traditional library automation] in that they are designed to support the creation, maintenance, management, access to, and preservation of digital content." (Bernie Hurley, Director for Library Technologies at U.C. Berkeley, quoted in Digital library technology trends, Sun Microsystems, August 2002)

What is a “digital library”?

• Traditional user/librarian distinction is blurred
• Computers make information active
• Kitchens for knowledge preparation
• WWW ≠ DL! – organization, selectivity
• Nice Web site ≠ DL! – import new documents easily

Collection of digital objects (text, video, audio) along with methods for access and retrieval [user], and for selection, organization, and maintenance [lib]

Ian Witten

Digital Libraries as ‘Collections’

Digital Libraries as ‘Institutions’

• Digital libraries are organizations that provide the resources, including the specialized staff, towards building and operating DLs
• Digital libraries as a dynamic, growing organism
• As digital libraries evolve and become the predominant mode of access to knowledge and learning, institutionalization of digital libraries appears to be an increasing possibility

Benefits of DLs

• Outreach - Library goes to the user
• Seamless Access - Searching and browsing
• Borderless Dissemination
• Instantaneous and Current
• Always (24X7) available
• Long term preservation…

Limitations of DLs

• Technological obsolescence
– Hardware
– Software
• Quite Tender and hence Fragile too
• Security Issues – Being rigorously addressed
• Highly sensitive to Commands – even slight ignorance or carelessness can at times be fatal
• Resources, Cost, Manpower
• Bandwidth
• Rights Management…

Functional Components

Creation of DLs

Digital Objects

• Digital objects of analogue/physical equivalents: pictures, video clips, music, publications, maps, artifacts (e.g. museum objects), living beings (plants, animals, people), animations, slide shows, print publications, etc. In case of some of these entities (for example, artifacts like buildings and museum objects and living beings) digital objects may only carry relevant metadata information and possibly some form of multimedia representation of the entity (e.g. photographs).
• Digital objects that do not have physical counterparts and those created dynamically and in real-time: electronic publications, software, spread sheets, databases, data gathered from remote sensors, software agents, and live capture of digital versions of speech, music and video.

Space Requirements: For 100,000 Articles (Text) having 5 pages each

Space Requirements: For 100,000 Images (640X480 in 256 colours)

Space Requirements: For 100,000 Audio Recordings (Half Sound, 8 Bit 11 KHz Mono and 16 Bit 44 KHz Stereo, 10 Mins each)

Space Requirements: For 100,000 Video Clips (320X200 and 256 colours at 15 fps)
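A rough sketch of the raw arithmetic behind space estimates of this kind, for the image, audio and video parameters named in the slide titles. Everything here is an assumption for illustration: 256 colours is taken as 1 byte per pixel, "11 KHz" and "44 KHz" are taken as nominal 11,000 and 44,000 samples per second, and no compression or file-format overhead is counted. The function names are hypothetical. Since no clip length is given for video, only a per-second rate is computed.

```python
# Uncompressed storage estimates; all figures are raw sample/pixel counts,
# not the values of any particular file format.

ITEMS = 100_000  # collection size used in the slide titles


def image_bytes(width, height, bytes_per_pixel=1):
    """Raw size of one bitmap; 256 colours assumed to be 1 byte per pixel."""
    return width * height * bytes_per_pixel


def audio_bytes(sample_rate_hz, bits_per_sample, channels, seconds):
    """Raw PCM size of one recording."""
    return sample_rate_hz * (bits_per_sample // 8) * channels * seconds


def video_bytes_per_second(width, height, fps, bytes_per_pixel=1):
    """Raw frame data per second of video, ignoring any audio track."""
    return image_bytes(width, height, bytes_per_pixel) * fps


def to_gb(n_bytes):
    """Convert bytes to decimal gigabytes (10**9 bytes)."""
    return n_bytes / 1e9


image_total = ITEMS * image_bytes(640, 480)
mono_total = ITEMS * audio_bytes(11_000, 8, 1, 10 * 60)
stereo_total = ITEMS * audio_bytes(44_000, 16, 2, 10 * 60)
video_rate = video_bytes_per_second(320, 200, 15)

print(f"Images:       {to_gb(image_total):.2f} GB")
print(f"Audio mono:   {to_gb(mono_total):.2f} GB")
print(f"Audio stereo: {to_gb(stereo_total):.2f} GB")
print(f"Video:        {video_rate} bytes per second per clip")
```

Under these assumptions the image collection comes to about 30.72 GB, the mono audio to 660 GB and the stereo audio to 10,560 GB, which is why compression and format choice matter so much when sizing DL storage.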

Bandwidth Requirements

Libraries - Shifts

Traditional / Automated
• Organization is physical
• Shelving of documents - Based on Subject Classification
• Key - Index / Catalogues / Cards / Digital Catalogs
• Cards - Real/Virtual - Author, Title, Descriptions

Digital
• Organization in terms of digital files / objects
• Contains material in digitized form
• Contains digital material
• Architecture
• Key - Metadata

Shift in Approaches

• Traditional (Limited / Rigid): AACR2, CCC, CC / LCCS, DDC / UDC, Thesauri / LCSH
• Automated (Improved): AACR2, ISO 2709, CCF, MARC, Thesauri
• Dig. Library (Efficient / Flexible): Metadata, DCMI -- W3C, EAD, TEI, DTD, METS, MODS, Z39.50, MARC21, OAI-PMH

Features of Digital Libraries…

• Dynamic Electronic Information Systems
• Seamless Aggregation and Integration of Scholarly Content
• Create / Maintain Local Content
• Strengthens - mechanisms and capacity - Information Systems / Services
• Increase Portability
• Efficiency of Access
• Flexibility
• Availability
• Long term preservation

UNESCO

Special Requirements

• Infrastructure
• Acceptability
• Access Restrictions
• Readability
• Standardization
• Authentication
• Preservation
• Copyright
• User Interface

Need for Content Integration / Organization

• Assuring Seamless Access to the Content
• Need for a single Info. Gateway / Access Point
• Multi - Formats, Media, Platforms (Content / Data in different formats)
• Data encoding (role of markup languages)
• Role of Metadata (role of Standards)
• Structured Metadata (role of XML)
• Need for Interoperability
• Interface / Delivery / Presentation
• Exorbitant cost of proprietary DL S/W

Digital Library Technologies

• Open architectures (Open DLs)

• Componentized vs Monolithic systems

• Interoperability (role of Z39.50, OAI etc.)

• Unified interface for heterogeneous libraries

• Metadata mapping across different libraries

• OAI-compliant data and service providers

• Multilingual digital libraries

• Scalable digital library architectures

• Publication tools

• Searching tools
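To make the interoperability bullet a little more concrete, here is a minimal sketch of how a harvester might form OAI-PMH requests. The verbs (`Identify`, `ListRecords`) and the arguments (`metadataPrefix`, `from`) come from the OAI-PMH protocol; the base URL and the helper name `oai_request` are hypothetical, and nothing is actually fetched.

```python
from urllib.parse import urlencode

# Hypothetical repository endpoint, used only for illustration.
BASE_URL = "https://repository.example.org/oai"


def oai_request(base_url, verb, params=None):
    """Build an OAI-PMH request URL for the given verb and optional arguments."""
    query = {"verb": verb}
    query.update(params or {})
    return f"{base_url}?{urlencode(query)}"


# Ask the repository to describe itself.
identify_url = oai_request(BASE_URL, "Identify")

# Harvest unqualified Dublin Core records changed since a given date.
list_url = oai_request(
    BASE_URL,
    "ListRecords",
    {"metadataPrefix": "oai_dc", "from": "2010-01-01"},
)

print(identify_url)
print(list_url)
```

A data provider answers such requests with XML; a service provider (harvester) issues them on a schedule and builds unified search over the collected metadata.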

Software Selection

• Goals and Requirement Specification

• Proprietary Vs Open Source

• Fit the existing Information System

• Accommodate future migration

• Embrace all possible/predominant formats

• Support standard DL technologies/platforms

• Easy installation, population, maintenance

• Comprehensive Documentation

• Software Development Team

• Active User Groups, E-Mail Lists (Users / Developers)

What Distinguishes a DL?

• Site Neutrality (3 in 1 Access - Anytime, Anywhere, by Anyone)
• Open Access
• Greater variety and granularity of information
• Sharing of information ‘Sharium’
• Up-to-dateness
• Always available (24x7x365)
• New forms of rendering (New Genre)

Digital Libraries: An Overview

Digital Libraries

Computing Networking Content Collections Services Community

What are digital libraries for?

• Knowledge/content management
– Manage and access internal information assets
• Scholarly communication, education, research
– E-journals, e-prints, e-books, data sets, e-learning
• Access to cultural collections
– Cultural, heritage, historical & special collections, museums, biodiversity
• E-governance
– Improved access to government policies, plans, procedures, rules and regulations
• Archiving and preservation
• Many more …

DL Software: Alternatives

• What are your expectations?
• Develop local web-based application?
• Commercial DL solution?
• Adopt open source software?
– Greenstone
– Eprints
– DSpace
– Fedora
– …

Digital Library Technologies

• Interoperability
• Unified interface for heterogeneous libraries
• Metadata mapping across different libraries
• OAI-compliant data and service providers
• Multilingual digital libraries
• Scalable digital library architectures
• Publication tools
• Searching tools
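Metadata mapping across libraries is usually done with a crosswalk table. The sketch below is a deliberately simplified illustration, not an official crosswalk: the tag-to-element table covers only a handful of common MARC fields, the record is a plain dictionary rather than a real MARC structure, and the names `MARC_TO_DC` and `marc_to_dc` are hypothetical.

```python
# Simplified, illustrative MARC tag -> Dublin Core element table (an assumption,
# covering only a few fields; real crosswalks also handle subfields and indicators).
MARC_TO_DC = {
    "245": "title",
    "100": "creator",
    "650": "subject",
    "520": "description",
    "020": "identifier",
}


def marc_to_dc(marc_fields):
    """Map a {tag: [values]} dictionary to a {dc_element: [values]} dictionary.

    Tags missing from the crosswalk table are skipped.
    """
    dc = {}
    for tag, values in marc_fields.items():
        element = MARC_TO_DC.get(tag)
        if element is None:
            continue
        dc.setdefault(element, []).extend(values)
    return dc


sample = {
    "245": ["Digital Libraries"],
    "100": ["Sreekumar, M.G."],
    "650": ["Digital libraries", "Metadata"],
    "999": ["local note"],  # no mapping defined, so this field is dropped
}
print(marc_to_dc(sample))
```

The dropped `999` field shows the usual cost of a crosswalk: mapping a rich format onto a simpler one loses whatever has no counterpart.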

DLs: Workflows and Processes

• Content selection
• Content acquisition
• Content publishing
• Metadata preparation
• Content loading
• Content indexing & storage
• Content access & delivery
• Preservation
• Access management
• Usage monitoring and evaluation
• Networking and interoperation
• Maintenance

DL Software: Key requirements

• Document types (book, journal article, lecture …)
• Document formats (text, PDF, Word, PS, …)
• Content acquisition (online and offline)
– Metadata description, content tagging
– Content uploading
• Indexing and retrieval
– Structured/ full text indexing
– Automatic metadata extraction
• Storage
– Data compression
– Efficient storage for metadata
– Efficient location of metadata and documents
• Access and delivery
– Structured search, browse, hierarchical browsing
– CD-ROM distribution
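As an illustration of the full text indexing requirement, here is a minimal inverted-index sketch. It is a toy under stated assumptions (lower-cased alphanumeric tokens, AND semantics for multi-word queries, everything held in memory) and does not describe how Greenstone or any other DL package indexes content; the function names and sample documents are hypothetical.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Split text into lower-cased alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(documents):
    """Build an inverted index: term -> set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index


def search(index, query):
    """Return ids of documents containing every term in the query."""
    terms = tokenize(query)
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


DOCS = {
    1: "Digital library metadata standards",
    2: "Metadata harvesting with OAI",
    3: "Library automation",
}
INDEX = build_index(DOCS)
print(search(INDEX, "library metadata"))
```

Structured search works the same way, with one such index per metadata field (title, creator, subject) instead of one over the full text.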

DL Software: More requirements

• Scaling up – for large collections
• Multilingual support
• Access management and security
• Usage monitoring and reporting
• Standards compliance
– XML, Dublin Core, Unicode
• Interoperation
– OAI, Z39.50 compliance, MARC, CDS/ISIS, …

Traditional Library Standards: MARC

History:

• Originally devised by the Library of Congress, 1966: MARC 1

• Format designed with magnetic tape in mind!

• 1967/8 expanded through collaboration with British Library

• Led to two broad versions: UK … subfields …

• Many international variations: tend to follow US MARC or UK MARC

• Used as an exchange format or a communication format

USMARC, DANMARC, CAN/MARC, UNIMARC, FINMARC, UKMARC, CHINA-MARC

MARC21

Metadata

General Definition
• Metadata in its broadest sense is data about data
• Documentation about documents and objects
• Describing (Tagging) the contents of the object
• For Information Discovery from the Resource Base

Internet context
• Data describing the attributes of an electronic resource on the net
• Dublin Core (DCMI) – WWW Consortium Standard
• XML - The tool

Dublin Core Metadata Elements

The Basics: 15 Elements

Groupings: Content, Responsibility, Manifestation

Metadata: Definition
• Title: The name given to the resource by the creator or publisher
• Creator: The person responsible for the intellectual content of the resource
• Subject: The topic of the resource
• Description: A textual description of the content of the resource
• Publisher: The entity responsible for making the resource available
• Contributor: A person or organization (other than the Creator) who is responsible for making significant contributions to the intellectual content of the resource
• Date: A date associated with the creation or availability of the resource
• Type: The nature or genre of the content of the resource
• Format: The physical or digital manifestation of the resource
• Identifier: An unambiguous reference that uniquely identifies the resource within a given context
• Source: A reference to a second resource from which the present resource is derived
• Language: The language of the intellectual content of the resource
• Relation: A reference to a related resource, and the nature of its relationship
• Coverage: Spatial locations and temporal durations characteristic of the content of the resource
• Rights: Information about rights held in the resource
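As a sketch of how the elements listed above might be recorded in XML, the snippet below serializes a small description using Python's standard library. The Dublin Core namespace URI is the published one; the wrapper element `record`, the function name `dc_record_to_xml` and the sample values are assumptions made for illustration only.

```python
import xml.etree.ElementTree as ET

# Namespace of the Dublin Core Metadata Element Set, version 1.1.
DC_NS = "http://purl.org/dc/elements/1.1/"

DC_ELEMENTS = (
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
)


def dc_record_to_xml(record):
    """Serialize {element: [values]} as XML; every element is optional and repeatable."""
    ET.register_namespace("dc", DC_NS)
    root = ET.Element("record")  # hypothetical wrapper element
    for name, values in record.items():
        if name not in DC_ELEMENTS:
            raise ValueError(f"not a Dublin Core element: {name}")
        for value in values:
            child = ET.SubElement(root, f"{{{DC_NS}}}{name}")
            child.text = value
    return ET.tostring(root, encoding="unicode")


sample_record = {
    "title": ["Digital Libraries"],
    "creator": ["Sreekumar, M.G."],
    "subject": ["Digital libraries", "Metadata"],
    "language": ["en"],  # illustrative value
}
print(dc_record_to_xml(sample_record))
```

Rejecting unknown element names keeps records within the simple element set, which is what makes them easy to exchange between repositories.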

DL - Hardships

• Copyright Issues
• Technology Complexities
• Infrastructure Issues
• Publications/Formats – Diverse Datastreams
• Digital Objects/Formats - Multiple
• Publishers’ Policies – Stringent, Inconsistent

Major Tasks

• Content identification (internal / external)
• Content Creation
• Content Collation/Signposts
• Organisation
• Updating
• Retrieval / Dissemination
• User Training
• Archiving

DIGITAL LIBRARY ARCHITECTURE

[Diagram: Data/Objects; METS/MODS, EAD, TEI, DCMI; OS; Z39.50 / OAI-PMH; Network; DL Software]
