Digital Library

The networked collections of digital text, documents, images, sounds, scientific data, and software that are the core of today’s Internet and tomorrow’s universally accessible digital repositories of all human knowledge

The President’s Information Technology Advisory Committee

Traditional Library

Digital libraries have been positioned at the intersection of:

• Library and Information Science
• Computer Science
• Networked Systems

Digital Library

The networked collections of digital text, documents, images, sounds, scientific data, and software that are the core of today’s Internet and tomorrow’s universally accessible digital repositories of all human knowledge

The President’s Information Technology Advisory Committee

History of Digital Library

• Janus Digital Library: 1993, $105,000
• Digital Library Phase I: 1994-1998, $24 million, 6 major projects
• Digital Library Phase II: 1998 to now, about $145 million, about 30 projects each year

Janus Digital Library: 1993, $105,000, “electronic preservation”

Digital Library Phase I

1994-1998, $24 million, 6 major projects

1, Integrated Speech, Image and Language Understanding for Creating a Digital Video Library

Carnegie Mellon University. This is the only one of the six projects focused on the video medium.

2, Interoperation mechanisms among heterogeneous services

Stanford University. This project focuses on providing a uniform way to access a variety of servers and information sources: the InfoBus protocol.

3, A prototype of a scalable, intelligent, distributed electronic library

University of California at Berkeley. A prototype for environmental information.

4, Towards a Distributed Digital Library

University of California at Santa Barbara. This project is about “Digital Earth”, a collection of information about the world.

5, Digital Library infrastructure for a University Engineering Community

University of Illinois at Urbana-Champaign. It provides effective access to engineering and physics journal articles.

6, Intelligent agents for information location

University of Michigan. The project combines traditional library and Internet technologies to provide the best support for its users.

Digital Library Phase II

Started in 1998, $145 million, about 30 projects each year.

The Theory of Digital Library

In earlier years, the theory of digital libraries was based on their structures and behaviors.

In 2001, Edward A. Fox of Virginia Polytechnic Institute and State University proposed the fundamental abstractions of Streams, Structures, Spaces, Scenarios and Societies (5S), known as the 5S theory.

Structure and Behavior

• Hypertext
• Information Storage (Database System)
• Information Retrieval
• Multimedia Services
• Human Computer Interaction
• Program Language Interoperation

Information Storage

A digital library must be capable of storing a large amount of data in a variety of formats and be able to access this data as quickly as possible.

<1>, Relational Database
<2>, Active Database
<3>, Mobile Database
<4>, Multiple Database
<5>, Object-Oriented Database

Relational Database

Data is stored in tables, with relationships defined between the tables.
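To make the idea of a relationship between tables concrete, here is a minimal sketch using Python's built-in `sqlite3` module. The `author` and `document` tables, their columns, and the sample rows are invented for illustration and do not come from any system named in these slides.

```python
import sqlite3

def titles_by_author(author_name):
    """Return the titles written by one author by joining two related tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE author (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
        CREATE TABLE document (
            id INTEGER PRIMARY KEY,
            title TEXT NOT NULL,
            author_id INTEGER NOT NULL REFERENCES author(id)
        );
        INSERT INTO author VALUES (1, 'Ada'), (2, 'Grace');
        INSERT INTO document VALUES
            (1, 'Notes on Engines', 1),
            (2, 'Compilers', 2),
            (3, 'More Notes', 1);
    """)
    # The relationship is the author_id column, which points at author.id.
    rows = conn.execute(
        "SELECT d.title FROM document d JOIN author a ON d.author_id = a.id "
        "WHERE a.name = ? ORDER BY d.id",
        (author_name,),
    ).fetchall()
    conn.close()
    return [title for (title,) in rows]
```

Calling `titles_by_author("Ada")` follows the `author_id` link from each document back to its author row.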

Active Database

An active database reacts automatically to events by applying event-condition-action rules.
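The event-condition-action idea can be sketched in a few lines of Python. This is a toy in-memory store, not a real active database; the class name `ActiveStore`, the `on_insert` method, and the 500-page threshold are all assumptions made for illustration.

```python
class ActiveStore:
    """A toy store that fires event-condition-action rules on every insert."""

    def __init__(self):
        self.rows = []
        self.rules = []  # (condition, action) pairs bound to the insert event
        self.log = []

    def on_insert(self, condition, action):
        """Register a rule: when a row is inserted and condition holds, run action."""
        self.rules.append((condition, action))

    def insert(self, row):
        self.rows.append(row)              # the event
        for condition, action in self.rules:
            if condition(row):             # the condition
                action(self, row)          # the action

store = ActiveStore()
store.on_insert(
    condition=lambda row: row["pages"] > 500,
    action=lambda db, row: db.log.append(f"large item: {row['title']}"),
)
store.insert({"title": "Short Report", "pages": 12})
store.insert({"title": "Full Thesis", "pages": 730})
```

Only the second insert satisfies the condition, so only that row triggers the action.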

Mobile Database

Dynamic data and location, Currency Protocol

Multiple Database

A Multiple Database System consists of a collection of autonomous and heterogeneous local databases.

Object-Oriented Database

Object-oriented databases are designed to work well with object-oriented programming languages such as Java, C#, and C++. This is because object-oriented databases use the same object model as object-oriented programming languages.

Information Retrieval

• Metadata searching
• Full-text searching
• Union search platform

Metadata Searching

Metadata: data about data; structured data; data about “Who, What, Where, When”.

Metadata tags: Title, Creator, Subject, Description, Publisher, Contributor, Date, Type, Format, Identifier, Source, Language, Relation, Coverage, Rights.

Metadata Attributes: Name, Identifier, Version, Registration, Authority, Language, Definition, Obligation, Data Type, Maximum Occurrence, Comment.
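As a rough illustration of metadata searching, the sketch below matches records on their tags rather than on their content. The sample records and the `search_metadata` helper are hypothetical; only the tag names (Title, Creator, Date, Language) are taken from the list above.

```python
records = [
    {"Title": "Digital Libraries", "Creator": "A. Author", "Date": "2001", "Language": "en"},
    {"Title": "Video Indexing", "Creator": "B. Writer", "Date": "1998", "Language": "en"},
    {"Title": "Bibliotecas Digitales", "Creator": "A. Author", "Date": "2003", "Language": "es"},
]

def search_metadata(records, **criteria):
    """Return records whose metadata equals every tag=value pair, ignoring case."""
    def matches(record):
        return all(
            str(record.get(tag, "")).lower() == str(value).lower()
            for tag, value in criteria.items()
        )
    return [record for record in records if matches(record)]
```

For example, `search_metadata(records, Creator="A. Author", Language="es")` narrows the result by two tags at once without reading any document text.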

Full-Text Searching

This is an example of searching for the string “Visual basic, Oracle”. The search scans the full text of each document to find a match.
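A minimal sketch of full-text searching follows. It assumes, for illustration only, that the comma in the query separates terms that must all appear in a document; the slide does not say how the query is interpreted. The sample documents and the function name are invented.

```python
def full_text_search(documents, query):
    """Return ids of documents whose text contains every comma-separated term."""
    terms = [term.strip().lower() for term in query.split(",") if term.strip()]
    if not terms:
        return []
    hits = []
    for doc_id, text in documents.items():
        body = text.lower()
        # Scan the whole document body for each term, ignoring case.
        if all(term in body for term in terms):
            hits.append(doc_id)
    return hits

documents = {
    "d1": "Connecting Visual Basic applications to an Oracle database.",
    "d2": "An introduction to Oracle administration.",
    "d3": "Visual Basic for beginners.",
}
```

A linear scan like this is only practical for small collections; larger systems typically build an index first.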

Union Search Platform

Various providers produce the many types of database retrieval systems that exist today.

End users want the ability to access different types of data using a universal interface.

The solution to this problem is to create a new application that integrates multiple search requests into a union search platform.
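The union search idea can be sketched as one function that forwards a query to several back ends and merges what comes back. The two sources, their contents, and the `union_search` name are hypothetical stand-ins for real retrieval systems.

```python
CATALOG = ["Digital Libraries", "Library Science Basics"]
ARTICLES = ["Digital Libraries", "Digital Video Search"]

def search_catalog(query):
    return [title for title in CATALOG if query.lower() in title.lower()]

def search_articles(query):
    return [title for title in ARTICLES if query.lower() in title.lower()]

def union_search(query, sources):
    """Send one query to every source and merge the results, tagged by origin."""
    merged = []
    seen = set()
    for name, search in sources.items():
        for title in search(query):
            key = title.lower()
            if key not in seen:  # drop duplicates returned by several sources
                seen.add(key)
                merged.append((name, title))
    return merged

sources = {"catalog": search_catalog, "articles": search_articles}
```

The user issues a single call such as `union_search("digital", sources)` and never needs to know how many systems sit behind it.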

Problems:

1, Why should each digital library start from scratch?

2, Interoperability across heterogeneous digital library systems.

A Fundamental Digital Library Theory: 5S Theory

Streams

Definition: A stream is a sequence whose codomain is a nonempty set.

• A sequence of abstract items, used to describe both static and dynamic content.

• It can be text, video, audio, or a software program.

Structures

Definition: A structure is a tuple $(G, L, F)$, where $G = (V, E)$ is a directed graph with vertex set $V$ and edge set $E$, $L$ is a set of label values, and $F$ is a labeling function $F: (V \cup E) \rightarrow L$.

• A labeled directed graph, which imposes organization.

• Collection, catalog, hypertext, document, metadata, organizational tool.

• How is the information organized?
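One possible way to picture a structure is as a small labeled directed graph. The sketch below is an illustrative assumption, not part of the 5S formalism itself: the `Structure` class, its method names, and the thesis example are invented, and a single dictionary plays the role of the labeling function for both vertices and edges.

```python
from dataclasses import dataclass, field

@dataclass
class Structure:
    vertices: set = field(default_factory=set)   # V
    edges: set = field(default_factory=set)      # E, as (source, target) pairs
    labels: dict = field(default_factory=dict)   # F: vertex or edge -> label

    def add_vertex(self, vertex, label):
        self.vertices.add(vertex)
        self.labels[vertex] = label

    def add_edge(self, source, target, label):
        if source not in self.vertices or target not in self.vertices:
            raise ValueError("both endpoints must already be vertices")
        self.edges.add((source, target))
        self.labels[(source, target)] = label

    def children(self, vertex):
        """Return the targets of all edges leaving the given vertex, sorted."""
        return sorted(t for (s, t) in self.edges if s == vertex)

thesis = Structure()
thesis.add_vertex("thesis", "document")
thesis.add_vertex("chapter1", "section")
thesis.add_vertex("chapter2", "section")
thesis.add_edge("thesis", "chapter2", "contains")
thesis.add_edge("thesis", "chapter1", "contains")
```

Here the set of label values in use is `{"document", "section", "contains"}`, and the edges record how the document is organized.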

Spaces

Definition: A space is a measurable space, measure space, probability space, vector space or a topological space.

• Contains rules to operate on the abstract items.

• User interface, index, retrieval model.

• Different logic and presentational properties. The operation of digital library components.
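As one example of a space used as a retrieval model, the sketch below places queries and documents in a term-frequency vector space and ranks by cosine similarity. This particular model, the whitespace tokenizer, and the sample documents are assumptions chosen for illustration; the slides do not prescribe a specific retrieval model.

```python
import math
from collections import Counter

def to_vector(text):
    """Map text to a term-frequency vector (one dimension per word)."""
    return Counter(text.lower().split())

def cosine(u, v):
    """Cosine of the angle between two term-frequency vectors."""
    dot = sum(u[term] * v[term] for term in u if term in v)
    norm_u = math.sqrt(sum(count * count for count in u.values()))
    norm_v = math.sqrt(sum(count * count for count in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

def rank(query, documents):
    """Return ids of documents with nonzero similarity, best match first."""
    q = to_vector(query)
    scored = [(cosine(q, to_vector(text)), doc_id) for doc_id, text in documents.items()]
    ordered = sorted(scored, key=lambda pair: (-pair[0], pair[1]))
    return [doc_id for score, doc_id in ordered if score > 0]

documents = {
    "a": "digital library theory",
    "b": "video library",
    "c": "relational database",
}
```

The vector space supplies the rule (an angle between vectors) that operates on the abstract items, which is the role the bullets above assign to a space.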

Scenarios

• A sequence of events or actions that accomplishes a functional requirement.

• Service, event, condition, action

• Communication between users and software developers.

Definition: A scenario is a sequence of related transition events $(e_1, e_2, \ldots, e_n)$ on state set $S$ such that $e_k = (s_k, s_{k+1})$, for $1 \le k \le n$.
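The definition can be checked mechanically: each event is a pair of states, and each event must begin where the previous one ended. The sketch below assumes events are represented as `(source, target)` tuples; the state names and the `is_scenario` helper are hypothetical.

```python
def is_scenario(events, states):
    """Check that events form a chain of transitions over the given state set."""
    if not events:
        return False
    for source, target in events:
        if source not in states or target not in states:
            return False
    # Each event must start in the state where the previous event ended.
    return all(events[k][1] == events[k + 1][0] for k in range(len(events) - 1))

states = {"idle", "query entered", "results shown", "document viewed"}

search_session = [
    ("idle", "query entered"),
    ("query entered", "results shown"),
    ("results shown", "document viewed"),
]

broken_session = [
    ("idle", "query entered"),
    ("results shown", "document viewed"),
]
```

`search_session` is a valid chain, while `broken_session` skips a transition and so does not satisfy the definition.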

Societies

Definition: A society is a tuple $(C, R)$, where

$C = \{c_1, c_2, \ldots, c_n\}$ is a set of conceptual communities.

$R = \{r_1, r_2, \ldots, r_n\}$ is a set of relationships.

• A set of entities and activities, and the relationships between them.

• Community, managers, actors, classes, relationships, attributes, operations.

• Actors and managers act together to carry out the digital library behavior.

A digital library is a collection of digital objects.

Definition: A digital library is a 4-tuple (R, DM, Serv, Soc), where

• R is a repository;
• DM is a metadata catalog;
• Serv is a set of services containing at least services for indexing, searching, and browsing;
• Soc is a society of users of the digital library.
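The 4-tuple can be written down as a small record type. This is a sketch under stated assumptions: the field types (dictionaries and sets), the class name `DigitalLibrary`, and the `is_minimal` check are illustrative choices, not part of the formal definition.

```python
from dataclasses import dataclass

REQUIRED_SERVICES = {"indexing", "searching", "browsing"}

@dataclass
class DigitalLibrary:
    repository: dict        # R: object id -> digital object
    metadata_catalog: dict  # DM: object id -> metadata record
    services: set           # Serv: names of the services offered
    society: set            # Soc: the users of the library

    def is_minimal(self):
        """True when Serv contains at least indexing, searching, and browsing."""
        return REQUIRED_SERVICES <= self.services

library = DigitalLibrary(
    repository={"etd-1": "full text of a thesis"},
    metadata_catalog={"etd-1": {"Title": "A Thesis", "Creator": "A. Student"}},
    services={"indexing", "searching", "browsing", "recommending"},
    society={"students", "librarians"},
)

incomplete = DigitalLibrary({}, {}, {"searching"}, set())
```

The subset test mirrors the phrase "containing at least": extra services are allowed, missing ones are not.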

5S Language (5SL) is an XML realization of the 5S model.

Case Study

NDLTD (Networked Digital Library of Theses and Dissertations)

• Based at Virginia Tech.

• 177 universities and 27 institutions worldwide.

• A student creates an ETD (Electronic Thesis and Dissertation) file from his or her thesis or dissertation. The ETD file is then checked for formatting errors and quantity requirements. The ETD file is then cataloged and placed on an electronic bookshelf.

<1>, Stream Model:

<2>, Structural Model:

Electronic Thesis and Dissertation – Metadata Structure

This is a part of the code.

<3>, Spatial Model:

<4>, Scenario Model:

An example scenario of a searching service in an NDLTD DL.

<5>, Societal Model:

Digital Library Generation Process with 5SL

5SGraph, A Domain-Specific Visual Modeling Tool

Digital Library In a Box

• Simplifies and enables the creation of a digital library

• Can be developed with little or no programming

• Built with an interoperable design

• Creates a minimal digital library in less than an hour

Open Digital Library

The goal is universal access to digital libraries and information services.

Report to the President, Digital Libraries: Universal Access to Human Knowledge
