C.W. Post Campus, Long Island U. (23 April 2008) “Digital libraries: From Theory to CS/LIS Curricula” Edward A. Fox, fox@vt.edu, http://fox.cs.vt.edu • Dept. of Computer Science, Virginia Tech • Blacksburg, VA 24061 USA


C.W. Post Campus, Long Island U. (23 April 2008)

“Digital libraries: From Theory to CS/LIS Curricula”

Edward A. Fox

fox@vt.edu http://fox.cs.vt.edu

• Dept. of Computer Science, Virginia Tech

• Blacksburg, VA 24061 USA


Acknowledgements (selected)

• Colleagues: Lillian Cassel, Debra Dudley, Weiguo Fan, Marcos Gonçalves, Doug Gorton, Rohit Kelapure, Neill Kipp, Aaron Krowne, Ming Luo, Yi Ma, Uma Murthy, Manuel Perez, Ananth Raghavan, Rao Shen, Venkat Srinivasan, Hussein Suleman, Srinivas Vemuri, Layne Watson, Seungwon Yang, …

• Sponsors: ACM, AOL, CAPES, DFG, Google, IBM, IMLS, INL, Microsoft, NSF (CCF-0722259; IIS-9986089, 0080748, 0086227, 0307867, 0325579, 0535057, 0535060, 0736055 ; DUE-0121679, 0121741, 0136690, 0333531, 0333601, 0435059, 0532825), SUN, …


Acknowledgements - Mentors

• JCR Licklider – undergrad advisor (1969-71)
– Author in 1965 of “Libraries of the Future”
– Before, at ARPA, funded start of Internet

• Michael Kessler – BS thesis advisor
– Project TIP (technical information project)
– Defined bibliographic coupling

• Gerard Salton – graduate advisor (1978-83)
– “Father of Information Retrieval”


Living In the KnowlEdge Society (LIKES)

North Carolina A & T
Santa Clara University
Villanova University
Virginia Tech

NSF CPATH: CCF-0722259, 0722276, 0722289, and 0752865


LIKES Vision - Disciplines

[Figure: diagram linking the Knowledge Society, computing topics, and other disciplines.]

• Computing topics: HCI, Visualization, Knowledge Management, Systems Analysis & Design, Programming, Database, Algorithms, Architecture, Net-Centricity, Intelligent Systems, Social & Ethical

• Disciplines: Library / Information Science, Sociology, Simulation, Communications, Political Science, Architecture, Healthcare, Economics, Finance, Psychology, Marketing, Physics, Music, Engineering, History, Biology, Art, Chemistry, Geography, Math, Geology, English


LIKES Vision - Applications

[Figure: diagram linking the Knowledge Society, computing topics, and application areas.]

• Computing topics: HCI, Visualization, Knowledge Management, Systems Analysis & Design, Programming, Database, Algorithms, Architecture, Net-Centricity, Intelligent Systems, Social & Ethical

• Applications: Library / Information Science, GIS, Simulation, Online Shopping, Multimedia, Semantic Web, CSCW, Digital Government, Healthcare, Services


Four Workshops

• Workshop 1 – Theme: Defining Problems and Applications of the Knowledge Society
– Santa Clara University, Dec. 2007

• Workshop 2 – Theme: Testing LIKES Vision
– North Carolina Agricultural and Technical State University
– Completed April 18-19

• Workshop 3 – Theme: LIKES Pedagogy
– Virginia Tech, Fall 2008

• Workshop 4 – Theme: LIKES in Practice
– Villanova, Spring 2009


LIKES Vision

Build a community, leading the way to change how computing concepts are taught in both computing-related disciplines and in the disciplines of the broader workforce & society.

Reach a broader audience of potential students and produce a larger number of professionals with the computing competencies and skills for LIKES.

Improve computing competencies and skills of people in all disciplines, to help them address the pervasive and growing needs for computing in society.


Transform CS Education

• Find Interesting Problems to Bring into Computing Courses for Learning in Context

• Thus, in a database class, students can:
– See the value of hierarchical data structures to biology by representing the taxonomy of species.
– See the value of hierarchical data structures to political science and management by representing the organization chart of the executive branch of U.S. government.
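As a minimal sketch of the database-class example above, a hierarchy can be stored as parent links and walked to recover a lineage. The sample entries (a deliberately simplified taxonomy that skips several ranks) and the names `PARENT`, `lineage`, and `children` are illustrative assumptions, not material from the talk.

```python
# Hypothetical, simplified sample data: each taxon maps to its parent.
# None marks the root of the hierarchy.
PARENT = {
    "Animalia": None,
    "Chordata": "Animalia",
    "Mammalia": "Chordata",
    "Primates": "Mammalia",
    "Homo sapiens": "Primates",
}


def lineage(taxon, parent=PARENT):
    """Return the path from the root of the hierarchy down to `taxon`."""
    path = []
    while taxon is not None:
        path.append(taxon)
        taxon = parent[taxon]
    return list(reversed(path))


def children(taxon, parent=PARENT):
    """Return the direct children of `taxon`, sorted by name."""
    return sorted(child for child, p in parent.items() if p == taxon)
```

The same parent-link table could hold an organization chart by replacing taxa with offices; only the data changes, not the traversal.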


Potential Course Areas/Courses

• Personal Knowledge Management
– Computer Science and Information Systems, e.g., multi-media, process design and evaluation, and Human-Computer / Human-Information interaction.
– Psychology, e.g., knowledge organization principles, human cognitive processes.
– Industrial Systems Engineering, e.g., ergonomic factors of knowledge environments.
– Ethics, e.g., ethical issues of information disclosure.

• Communication and Collaboration
– Communications, e.g., communication using digital visualizations, using knowledge access in constructing digital messages.
– Information Systems and Computer Science, e.g., computer supported cooperative work and group support systems.
– Marketing, e.g., influence of knowledge presentation on on-line customer behavior.

• Organization
– Information Systems, e.g., service innovation and development, system design and development.
– Management Science, e.g., decision support systems concepts, capabilities, techniques, and tools.
– Management, Marketing, Accounting, and Finance, e.g., business in the information age.

• Society
– Sociology, e.g., impact of knowledge differentials across society and countries.
– Political Science, e.g., governmental collection and use of knowledge, impact of technology on elections and government.


Interdisciplinary Work Example: Virtual Jamestown

• Project Director
– Prof. Crandall Shifflett, Dept. of History, VT
– In 1996 he conceived the idea of combining technology, history, and Jamestown 2007.

• Project Staff Members
– Julie Richter: Ph.D. in early American history
– Matthew Parrott: computer science major, chief modeler, animator

• Virtual Jamestown is a product of collaboration between Virginia Tech, the University of Virginia, and the Virginia Center for Digital History at the University of Virginia.


Information Life Cycle

[Figure: stages of the information life cycle.]

• Authoring, Modifying
• Organizing, Indexing
• Storing, Retrieving
• Distributing, Networking
• Retention / Mining
• Accessing, Filtering
• Using, Creating


DLs Shorten the Chain to

[Figure: the Digital Library connecting Author, Reader, Editor, Reviewer, Teacher, Learner, and Librarian.]


DL Definitions - 1

• “A digital library is an organized and focused collection of digital objects, including text, images, video, and audio, along with methods of access and retrieval, and for selection, creation, organization, maintenance, and sharing of the collection.”

• Witten & Bainbridge – “How to Build a Digital Library” – Morgan Kaufmann 2003


DL Definitions - 2

• “Digital libraries are organizations that provide the resources, including the specialized staff, to select, structure, offer intellectual access to, interpret, distribute, preserve the integrity of, and ensure the persistence over time of collections of digital works so that they are readily and economically available for use by a defined community or set of communities”

• Waters, D.J. CLIR Issues, July/August 1998
• www.clir.org/pubs/issues/issues04.html


DL Definitions - 3

• Issues and Spectra

– Collection vs. Institution

– Content vs. System

– Access vs. Preservation

– “Free” vs. Quality

– Managed vs. Comprehensive

– Centralized vs. Distributed


DL Definitions - 4

• NOT a “digitized library”
• NOT a “deconstruction” of existing systems and institutions, moving them to an electronic box in a Library

• IS a new way to deal with knowledge
– Authoring, Self-archiving, Collecting
– Organizing, Preserving
– Accessing, Propagating, Re-using


Digital Library Content

[Figure: content types held in a digital library.]

• Articles, Reports, Books
• Text Documents
• Speech, Music
• Video, Audio
• (Aerial) Photos
• Geographic Information
• Models, Simulations
• Software, Programs
• Genome: Human, animal, plant
• Bio Information
• 2D, 3D, VR, CAT
• Images and Graphics


Informal 5S & DL Definitions

DLs are complex systems that

• help satisfy info needs of users (societies)

• provide info services (scenarios)

• organize info in usable ways (structures)

• present info in usable ways (spaces)

• communicate info with users (streams)


Hypotheses

• A formal theory for DLs can be built based on 5S.

• The formalization can serve as a basis for modeling and building high-quality DLs.


5Ss

• Streams
– Examples: text; video; audio; image
– Objectives: describes properties of the DL content such as encoding and language for textual material or particular forms of multimedia data

• Structures
– Examples: collection; catalog; hypertext; document; metadata
– Objectives: specifies organizational aspects of the DL content

• Spaces
– Examples: measure; measurable, topological, vector, probabilistic
– Objectives: defines logical and presentational views of several DL components

• Scenarios
– Examples: searching, browsing, recommending
– Objectives: details the behavior of DL services

• Societies
– Examples: service managers, learners, teachers, etc.
– Objectives: defines managers, responsible for running DL services; actors, that use those services; and relationships among them


ETANA Societies

1. Historic and pre-historic societies (being studied)

2. Archaeologists (in academic institutes, fieldwork settings, or local and national governmental bodies)

3. Project directors

4. Technical staff (consisting of photographers, technical illustrators, and their assistants)

5. Field staff (responsible for the actual work of excavation)

6. Camp staff (e.g., camp managers, registrars, tool stewards)

7. General public (e.g., educators, learners, citizens)


ETANA Societies

• Social issues

1. Who owns the finds?

2. Where should they be preserved?

3. What nationality and ethnicity do they represent?

4. Who has publication rights?

5. What interactions took place between those at the site studied, and others? What theories are proposed by whom about this?


ETANA Scenarios

1. Life in the site in former times
2. Digital recording: the planning stage and the excavation stage
3. Planning stage: remote sensing, fieldwalking, field surveys, building surveys, consulting historical and other documentary sources, and managing the sites and monuments
4. Excavation
    1. Detailed information is recorded, including for each layer of soil, and for features such as pole holes, pits, and ditches.
    2. Data about each artifact is recorded together with information about its exact find spot.
    3. Numerous environmental and other samples are taken for laboratory analysis, and the location and purpose of each is carefully recorded.
    4. Large numbers of photographs are taken, both general views of the progress of excavation and detailed shots showing the contexts of finds.
5. Organization and storage of material
6. Analysis and hypotheses generation and testing
7. Publications, museum displays
8. Information services for the general public


ETANA Spaces

1. Geographic distribution of found artifacts
2. Temporal dimension (as inferred by archaeologists)
3. Metric or vector spaces
    1. used to support retrieval operations, and to calculate distance (and similarity)
    2. used to browse / constrain searches spatially
4. 3D models of the past, used to reconstruct and visualize archaeological ruins
5. 2D interfaces for human-computer interaction
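To illustrate how a vector space supports the distance and similarity calculations mentioned in item 3, here is a minimal sketch. The function names and the plain-list vector representation are assumptions made for illustration; the talk does not specify how ETANA-DL computes these values.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def euclidean_distance(u, v):
    """Straight-line distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))
```

Cosine similarity ignores vector length, which suits term-weight vectors of documents; Euclidean distance suits coordinates such as find spots on a site grid.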


ETANA Structures

1. Site Organization
    1. Region, site, partition, sub-partition, locus,

2. Temporal orderings (ages, periods)

3. Taxonomies
    1. for bones, seeds, building materials, …

4. Stratigraphic relationships
    1. above, beneath, coexistent
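One way to work with stratigraphic relationships is to treat “above” as an edge in a directed graph and sort it topologically. The sketch below assumes that a lower layer was deposited before the layer above it, handles only the “above” relation, and uses a hypothetical function name and layer names; it is not drawn from ETANA-DL itself.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def deposition_order(above):
    """Order layers oldest first, given (upper, lower) pairs meaning 'upper is above lower'.

    Assumption: a lower layer was deposited before any layer above it.
    """
    sorter = TopologicalSorter()
    for upper, lower in above:
        sorter.add(upper, lower)  # `lower` must precede `upper` in the result
    return list(sorter.static_order())
```

For example, `deposition_order([("topsoil", "floor"), ("floor", "bedrock")])` yields the layers from `bedrock` up to `topsoil`. Contradictory input (a cycle) makes `static_order` raise `graphlib.CycleError`, which is a useful consistency check on recorded relationships.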


ETANA Streams

1. successive photos and drawings of excavation sites, loci, unearthed artifacts

2. audio and video recordings of excavation activities and discussions

3. textual reports

4. 3D models used to reconstruct and visualize archaeological ruins.


5S and DL formal definitions and compositions (April 2004 TOIS)

[Figure: dependency graph of the formal definitions; “d.N” gives each definition number as printed on the slide.]

• Mathematical preliminaries: relation (d.1), function (d.2), sequence (d.3), tuple (d.4)*, language (d.5), graph (d.6), grammar (d.7)

• Spaces: measurable (d.12), measure (d.13), probability (d.14), vector (d.15), topological (d.16)

• 5S: streams (d.9), structures (d.10), spaces (d.18), scenarios (d.21), societies (d.24)

• Other definitions: event (d.10), state (d.18), services (d.22), transmission (d.23), structural metadata specification (d.25), descriptive metadata specification (d.26), structured stream (d.29), digital object (d.30), collection (d.31), metadata catalog (d.32), repository (d.33), indexing service (d.34), searching service (d.35), hypertext (d.36), browsing service (d.37)

• digital library (minimal) (d.38)


Fox & Gonçalves Book Outline

• Ch. 1. Introduction (Motivation, Synopsis)

• Part 1 – The “Ss”

• Part 2 – Higher DL Constructs

• Part 3 – Advanced Topics

• Appendix


Book Parts and Chapters - 1

• Ch. 1. Introduction (Motivation, Synopsis)

• Part 1 – The “Ss”

– Ch. 2: Streams

– Ch. 3: Structures

– Ch. 4: Spaces

– Ch. 5: Scenarios

– Ch. 6: Societies


Book Parts and Chapters - 2

• Part 2 – Higher DL Constructs

– Ch. 7: Collections

– Ch. 8: Catalogs

– Ch. 9: Repositories and Archives

– Ch. 10: Services

– Ch. 11: Systems

– Ch. 12: Case Studies


Book Parts and Chapters - 3

• Part 3 – Advanced Topics
– Ch. 13: Quality
– Ch. 14: Integration
– Ch. 15: How to build a digital library
– Ch. 16: Research Challenges, Future Perspectives

• Appendix
– A: Mathematical preliminaries
– B: Formal Definitions: Ss
– C: Formal Definitions: DL terms, Minimal DL
– D: Formal Definitions: Archeological DL
– E: Glossary of terms, mappings


Chapter 3: (Degree of) Structure

[Figure: spectrum of degree of structure.]

• Chaotic: Web
• Organized: DLs
• Structured: DBs


Digital Objects (DOs)

• Born digital

• Digitized version of “real” object
– Is the DO version the same, better, or worse?
– Decision for ETDs: structured + rendered

• Surrogate for “real” object
– Not covered explicitly in metamodel for a minimal DL
– Crucial in metamodel for archaeology DL


Also Important: Epub, SGML, XML

• 5S perspective: streams, structures, scenarios

• Authoring

• Rendering, presenting

• Tagging, Markup, DOM

• Semi-structured information

• Dual-publishing, eBooks

• Styles (XSL, XSLT)

• Structured queries


Chapter 4 Overview (Spaces)

• Retrieval models

– Boolean, extended Boolean

– Vector, LSI

– Probabilistic: classical, belief network, inference network, language models

• User interfaces and visualization – cont’d
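The Boolean retrieval model listed above can be sketched with an inverted index that maps each term to the set of documents containing it. The whitespace tokenizer and the function names are simplifying assumptions for illustration only.

```python
from collections import defaultdict


def build_index(docs):
    """Build an inverted index {term: set of doc ids} from {doc id: text}."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():  # naive whitespace tokenizer
            index[term].add(doc_id)
    return index


def boolean_and(index, *terms):
    """Documents containing every one of the given terms."""
    postings = [index.get(term.lower(), set()) for term in terms]
    return set.intersection(*postings) if postings else set()


def boolean_or(index, *terms):
    """Documents containing at least one of the given terms."""
    return set().union(*(index.get(term.lower(), set()) for term in terms))
```

Boolean matching returns an unranked set; the vector and probabilistic models in the same list exist largely to add ranking on top of this kind of index.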


User interfaces and visualization

• 2D interfaces

• 3D interfaces

• GIS

• Other paradigms: trees, graphs, bubbles, coordinated views, …

• Stepping Stones and Pathways
– http://fox.cs.vt.edu/SSP/


Chapter 6 Overview (Societies)

• User communities
– Authors, editors, teachers, students, readers
– Personal(ization), group(ware), community, global
– Accessibility, universal access

• Librarians: reference, acquisition, operations

• Research community
– Associations, conferences, publications, labs, projects

• Economics
– Copyright, intellectual property rights, digital rights management, authorization, authentication, security, privacy, self-archiving (eprints)
– Publishers, catalogers, distributors, sustainability
– Open source, commercial, hybrid


Chapter 9 Archives & Repositories

• Open Archives Initiative (OAI)
• Institutional Repositories

• Persistent storage of digital objects
• Coupling of metadata with digital objects
• Use of “handles” as identifiers for digital objects

• Put, get, harvest
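The put, get, and harvest operations can be pictured with a small in-memory sketch that couples each digital object with its metadata under a handle. The class name, method signatures, and handle strings are hypothetical; a real repository would add persistent storage and access control.

```python
from datetime import datetime, timezone


class Repository:
    """Toy in-memory repository keyed by handle (illustrative only)."""

    def __init__(self):
        self._records = {}  # handle -> (digital_object, metadata, datestamp)

    def put(self, handle, digital_object, metadata, stamp=None):
        """Store an object together with its metadata under `handle`."""
        stamp = stamp or datetime.now(timezone.utc)
        self._records[handle] = (digital_object, metadata, stamp)

    def get(self, handle):
        """Return (digital_object, metadata) for `handle`; raises KeyError if unknown."""
        digital_object, metadata, _ = self._records[handle]
        return digital_object, metadata

    def harvest(self, since=None):
        """Return {handle: metadata} for records stamped at or after `since`."""
        return {
            handle: metadata
            for handle, (_, metadata, stamp) in self._records.items()
            if since is None or stamp >= since
        }
```

Note that `harvest` returns metadata only, mirroring the idea that harvesting moves descriptions between libraries while the objects stay in place.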


OAI - Open Archives Initiative

• Advocacy for interoperability

• Standard for transferring metadata among digital libraries
– Protocol for Metadata Harvesting (PMH)

• Simplicity
• Generality
• Extensibility

• Support for PMH => Open Archive (OA)
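A harvesting request in OAI-PMH is an HTTP request whose query string names a verb and its arguments. The sketch below only builds a `ListRecords` URL; the function name and the base URL in the usage note are placeholders, and no network call is made.

```python
from urllib.parse import urlencode


def list_records_url(base_url, metadata_prefix="oai_dc", from_date=None, set_spec=None):
    """Build an OAI-PMH ListRecords request URL for the given base URL.

    `from_date` (e.g. "2008-04-01") and `set_spec` are optional selective-harvesting
    arguments; they are omitted from the query string when not supplied.
    """
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if from_date:
        params["from"] = from_date
    if set_spec:
        params["set"] = set_spec
    return f"{base_url}?{urlencode(params)}"
```

For instance, `list_records_url("http://example.org/oai", from_date="2008-04-01")` asks a (hypothetical) open archive for Dublin Core records changed since that date. A full harvester would also follow the resumption tokens that the protocol uses for paging.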


OAI – Repository Perspective

Required: Protocol

[Figure: a repository containing digital objects (DOs) and MDOs.]


OAI – Black Box Perspective

[Figure: seven open archives, OA 1 through OA 7.]


Institutional Repositories - 1

• “Institutional repositories are digital collections that capture and preserve the intellectual output of a single university or a multiple institution community of colleges and universities.”

• Crow, R. “Institutional repository checklist and resource guide”, SPARC, Washington, D.C., USA

• www.arl.org/sparc/IR/IR_Guide_v1.pdf


Chapter 10 Services

• Taxonomy of services

• Ontology, composition, reuse

• Evaluation

• Key services in-depth:
– Crawling, indexing
– Clustering, classifying
– Recommending, using social networks
– Logging


[Figure: taxonomy of DL services.]

• Infrastructure Services
– Repository-Building, Creational: Acquiring, Cataloging, Crawling (focused), Describing, Digitizing, Federating, Harvesting, Purchasing, Submitting
– Repository-Building, Preservational: Conserving, Converting, Copying/Replicating, Emulating, Renewing, Translating (format)
– Add Value: Annotating, Classifying, Clustering, Evaluating, Extracting, Indexing, Measuring, Publicizing, Rating, Reviewing (peer), Surveying, Translating (language)

• Information Satisfaction Services: Browsing, Collaborating, Customizing, Filtering, Providing access, Recommending, Requesting, Searching, Visualizing


Ontology: Applications

• Expand definition of minimal DL by characterizing
– typical DL services
– in the context of “employs” and “produces” relationships

• Use characterization to:
– Reason about how DL services can be built from other DL components
– As well as be composed with other services through extension or reuse


[Figure: ontology relating the 5S concepts.]

• Streams: text, audio, image, video
• Structures: digital object, Repository, Collection, Catalog, Index, metadata specifications
• Spaces: Topological, Probabilistic, Metric, Measurable, Measure, Vector
• Scenarios: Service, Scenario, event, operation
• Societies: Service Manager, Actor
• Relationships shown: describes, stores, contains, is_a, is_version_of/cites/links_to, extends, reuses, redefines, invokes, precedes, happens_before, executes, participates_in, recipient, runs, inherits_from/includes, association, uses, employs, produces


Ontology: Applications


[Figure: DL services linked to the components they use; edges are labeled “e” and “p”.]

• Information Satisfaction Services (fundamental and composite): Searching, Browsing, Requesting, Recommending, Filtering, Binding, Visualizing, Expanding query
• Infrastructure Services (Add_Value): Rating, Training, Indexing
• Components shown: Society, actor, query, anchor, Collection, {digital object}, user model, query/category, binder, space, query’, handle, {(digital object, actor, rate)}, classifier, Index, transformer


5S and Generating DLs

• 5S Framework

• 5S definitions, services taxonomy, ontology

• 5SL (specification language)

• 5SGraph (to prepare 5SL)

• 5SGen (for DL development, incl. DSpace)

• SchemaMapper for development of union DL


5SL: a DL design language

• Domain specific languages
– Address a particular class of problems by offering specific abstractions and notations for the domain at hand
– Advantages: domain-specific analysis, program management, visualization, testing, maintenance, modeling, and rapid prototyping.

• XML-based realization of 5S
– Interoperability
– Use of many sub-languages (e.g., MIME types, XML Schemas, UML notations)


5SGraph: A DL Modeling Tool

• Help users model their own instances of a digital library (DL) in the 5S language (5SL).

• A simple modeling process which enables rapid generation of digital libraries

• Features
– 5SGraph loads and displays a metamodel in a structured toolbox.
– The structured editor of 5SGraph provides a top-down visual building environment for the DL designer.
– 5SGraph produces syntactically correct 5SL files according to the visual model built by the designer.


Overview of 5SGraph

[Figure: 5SGraph screenshot with two labeled areas.]

• Workspace (instance model)
• Structured toolbox (metamodel)


5SGen

• Version 1 – MARIAN as the target system
– Focused on rich structures: semantic networks
– Behavior attached to nodes/links

• Version 2 – Shifted for later work to componentized (ODL) approach
– Focused on scenarios/societies
– Structures/Spaces encapsulated within components (e.g., relational tables, indexes)
– Only textual streams supported

• Version 3 – Into DSpace (practical DL)


5SLGen – Version 2: ODL, Services, Scenarios

[Figure: 5SLGen generation pipeline; numbers are the step labels printed on the slide.]

• Societies path: 5SL-Societies Model (1), XPath/JDOM Transform (2), XMI: Class Model (3), Xmi2Java (4), Java Classes Model (5)
• Scenarios path: 5SL-Scenario Model (6), XPath/JDOM Transform (7), StateChart Model (8), Scenario Synthesis (9), Deterministic FSM (10), SMC (11), Java Finite State Machine Class Controller (12)
• JSP User Interface View (13)
• Component Pool: ODL Search, ODL Browse, …, each imported through a Java wrapping
• Other labels: DL Designer, superclass, binds, Generated DL Services
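The pipeline above turns each scenario into a deterministic finite state machine that controls a generated service. As a rough sketch of that idea (in Python rather than the Java the slide mentions), the class below rejects any event that the scenario does not allow in the current state. The class name, state names, and event names are all hypothetical.

```python
class ScenarioFSM:
    """Deterministic finite state machine driven by scenario events (illustrative)."""

    def __init__(self, start, transitions):
        self.state = start
        self._transitions = transitions  # {(state, event): next_state}

    def fire(self, event):
        """Apply `event` in the current state and return the new state."""
        key = (self.state, event)
        if key not in self._transitions:
            raise ValueError(f"event {event!r} not allowed in state {self.state!r}")
        self.state = self._transitions[key]
        return self.state


# Hypothetical search scenario: states and events are made up for this example.
SEARCH_SCENARIO = {
    ("idle", "submit_query"): "showing_results",
    ("showing_results", "select_item"): "showing_item",
    ("showing_item", "back"): "showing_results",
    ("showing_results", "new_search"): "idle",
}
```

Because each (state, event) pair maps to exactly one next state, the machine is deterministic, which is what lets a controller be generated mechanically from the scenario model.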


Tools/Applications

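[Figure: 5S tool chain and the people who use it.]

• Artifacts and tools: 5S MetaModel, 5SGraph, 5SL DL Model, 5SLGen, Tailored DL
• Roles: DL Expert, DL Designer, Practitioner, Researcher, Teacher
• Component pool: ODLSearch, ODLBrowse, ODLRate, ODLReview, …
• Logging Module, XMLLog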


[Figure: generation of ETANA-DL with 5SGraph and 5SGen.]

• Modeling: 5S Archaeology MetaModel, 5SGraph, ArchDL Expert, ArchDL Designer, Structure Sub-model, Scenario Sub-model
• Metadata: VN Metadata Format, HD Metadata Format, ETANA-DL Metadata Format, Mapping Tool
• Harvesting: Wrapper4VN, Wrapper4HD, XOAI, VN Catalog, HD Catalog, Union Catalog
• ETANA-DL Union Services Descriptions: Harvesting, Mapping, Searching, Browsing
• Generation: 5SGen, Component Pool (Browsing, …)
• Services: Search Service, Browse Service, Index, Inverted Files, Browse DB, Services DB, Other ETANA-DL Services, Web Interface
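The mapping step, in which records in site-specific metadata formats are converted to a union format, can be sketched as a field-renaming table. The field names and the `VN_TO_UNION` table below are invented for illustration and do not reflect the actual VN, HD, or ETANA-DL schemas.

```python
def map_record(record, mapping):
    """Rename fields of `record` using {source_field: target_field}.

    Fields without a mapping entry are dropped from the result.
    """
    return {
        target: record[source]
        for source, target in mapping.items()
        if source in record
    }


# Hypothetical mapping from one site's local field names to a union schema.
VN_TO_UNION = {"site_name": "site", "locus_no": "locus", "obj_type": "artifact_type"}
```

With one such table per partner site, records harvested from every site can be merged into a single union catalog that the search and browse services index.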


Computing and Information Technology Interactive Digital Educational Library (CITIDEL)

• Domain: computing / information technology

• Genre: one-stop-shopping for teachers & learners: courseware (CSTC, JERIC), leading DLs (ACM, IEEE-CS, DB&LP, CiteSeer), PlanetMath.org, NCSTRL (technical reports), …

• Submission & Collection: sub/partner collections

www.citidel.org


Digital library architecture for local and interoperable CITIDEL services

[Figure: layered architecture of portals, services, and repositories.]

• Portals: Educators, Administrators, Learners
• Services: Multilingual Searching, Revising, Annotating, Filtering, Browsing, Administering
• Repositories: Annotations, Filtering Profiles, User Profiles, Union Metadata
• Interoperability: OAI Data Harvester, OAI Data Provider, Remote and Peer Digital Libraries (e.g., NSDL-CIS)


CITIDEL -> NSDL

• A collection project in the National STEM (science, technology, engineering, and mathematics) education Digital Library (NSDL)

• National Science Digital Library

• www.nsdl.org

• (Next slides courtesy Lee Zia, NSF)

Page 67

Page 68

Page 69

NSDL Information Architecture

Essentially as developed by the Technical Infrastructure Workgroup

[Diagram; labels: referenced items & collections; Special Databases; NSDL Services; Other NSDL Services; CI Services (annotation, discussion, personalization, authentication, browsing); Core Services (information retrieval, metadata gathering); Core Collection-Building Services (harvesting, protocols); Portals & Clients; Usage Enhancement; Collection Building; User Interfaces; NSDL Collections; Core NSDL “Bus”]

Page 70

A Digital Library Case Study

• Domain: graduate education, research

• Genre: ETDs (electronic theses & dissertations)

• Submission: ETD-db, DSpace, ProQuest, …

• Collection: local archives, regional collaborations, global union catalog

Project: Networked Digital Library of Theses & Dissertations (NDLTD) www.ndltd.org

Page 71

Page 72

Student Gets Committee Signatures and Submits ETD

Signed

Grad School

Page 73

What are we doing?

• Aiding universities to enhance graduate education, publishing and IPR efforts

• Helping improve the availability and content of theses and dissertations

• Educating ALL future scholars so they can publish electronically and effectively use digital libraries (i.e., are Information Literate and can be more expressive)

Page 74

Why ETD? Short Answer

• For Students:
– Gain knowledge and skills for the Information Age
– Richer communication (digital information, multimedia, …)

• For Universities:
– Easy way to enter the digital library field and benefit thereby

• For the World:
– Global digital library: large, useful, many services

• General:
– Save time and money
– Increased visibility for all associated with research results

Page 75

Metamodels in the 5S Framework

• Modeling archaeological information systems using the 5S theory to better understand the domain and design the system and the supported services

• Minimal DL

• Minimal ArchDL

• …

Page 76

A Minimal DL in the 5S Framework

[Diagram; columns: Streams, Structures, Spaces, Scenarios, Societies. Concept labels: Structured Stream; hypertext; services (indexing, browsing, searching); Descriptive Metadata Specification; Structural Metadata Specification; Digital Object; Metadata Catalog; Collection; Repository; Minimal DL]
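As a rough illustration of the minimal DL concepts on this slide, the Python sketch below models digital objects, a collection, a metadata catalog, and indexing, searching, and browsing services. All class, field, and method names are hypothetical simplifications chosen for the example; they are not the formal 5S definitions.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class DigitalObject:
    """A handle plus a stream of content (simplified here to a string)."""
    handle: str
    stream: str


@dataclass
class MinimalDL:
    """Hypothetical minimal DL: a collection, a catalog, and three services."""
    collection: dict = field(default_factory=dict)  # handle -> DigitalObject
    catalog: dict = field(default_factory=dict)     # handle -> descriptive metadata
    index: dict = field(default_factory=dict)       # term -> set of handles

    def add(self, obj, metadata):
        """Store an object with its metadata and index its terms."""
        self.collection[obj.handle] = obj
        self.catalog[obj.handle] = metadata
        for term in obj.stream.lower().split():
            self.index.setdefault(term, set()).add(obj.handle)

    def search(self, term):
        """Searching service: handles whose stream contains the term."""
        return sorted(self.index.get(term.lower(), set()))

    def browse(self, key):
        """Browsing service: handles ordered by one metadata field."""
        return sorted(self.catalog, key=lambda h: self.catalog[h].get(key, ""))


dl = MinimalDL()
dl.add(DigitalObject("etd:1", "digital library theory"), {"title": "Theory"})
dl.add(DigitalObject("etd:2", "library services"), {"title": "Services"})
hits = dl.search("library")
ordering = dl.browse("title")
```

Here indexing happens as a side effect of `add`, so searching and browsing stay read-only services over the same collection and catalog.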

Page 77

A Minimal ArchDL in the 5S Framework

[Diagram; columns: Streams, Structures, Spaces, Scenarios, Societies. Concept labels: Structured Stream; hypertext; services (indexing, browsing, searching); Descriptive Metadata specification; SpaTemOrg; StraDia; Arch Descriptive Metadata specification; ArchObj; ArchDO; ArchColl; ArchDColl; Arch Metadata catalog; ArchDR; Minimal ArchDL]

Page 78

Moving from a minimal DL towards a DL reference model (1/2)

Minimal DL DL reference model

Multimedia

Annotation

Knowledge management Practical DL

systems

PIMDL quality

Domain-specific DLs

Page 79

Moving from a minimal DL towards a DL reference model (2/2)

• Content-based image retrieval services in a DL

• A superimposed-information-supported DL

• Practical DL generation

Page 80

Superimposing information

Superimposed layer: New information/structures

Base layer: Existing information from heterogeneous sources: text, images, audio/video documents

Mark: Reference to base information element
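The relationship between the superimposed layer, the base layer, and marks can be sketched in Python. The example assumes a base layer of plain-text documents and a mark that addresses a character span; the names `Mark` and `resolve`, the document id `doc1`, and the offset-based addressing are hypothetical choices made for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Mark:
    """Reference to a span inside one base-layer document (assumed addressing)."""
    source: str  # identifier of the base document
    start: int   # inclusive character offset
    end: int     # exclusive character offset


def resolve(mark, base_layer):
    """Return the base information element that a mark refers to."""
    return base_layer[mark.source][mark.start:mark.end]


# Base layer: existing information, here a single plain-text document.
text = "Existing information from heterogeneous sources"
base_layer = {"doc1": text}

# Superimposed layer: a new labeled note that points at the base via a mark.
start = text.index("heterogeneous")
note = {"label": "key term", "mark": Mark("doc1", start, start + len("heterogeneous"))}
excerpt = resolve(note["mark"], base_layer)
```

The note stores only the mark, not a copy of the text, so the base document stays the single source of the content.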

Page 81

Preliminary SI-DL metamodel

Page 82

Minimal CBIR DL

[Diagram; columns: Stream, Structure, Space, Service, Society. Concept labels: ImageStream; FeatureVector; StructuredFeatureVector; Image Descriptor; Composite Descriptor; ImageContent Description; ImageObject; ImageDigitalObject; ImageCollection; Image Descriptor Metadata Catalog; User InfoNeed; KNNQ; RQ; VisualizationOperation; Content-based Image Searching Service; Minimal CBIR DL]
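Assuming that KNNQ and RQ on this slide stand for k-nearest-neighbor query and range query, a content-based image search over feature vectors can be sketched as below. The Euclidean distance, the two-dimensional vectors, and the function names are assumptions made for the example, not the descriptors or services of the actual model.

```python
import math


def distance(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_query(query, vectors, k):
    """Return the ids of the k images closest to the query vector."""
    ranked = sorted(vectors, key=lambda image_id: distance(query, vectors[image_id]))
    return ranked[:k]


def range_query(query, vectors, radius):
    """Return the ids of all images within `radius` of the query vector."""
    return sorted(i for i, v in vectors.items() if distance(query, v) <= radius)


# Hypothetical image collection: id -> feature vector.
vectors = {"img1": (0.0, 0.0), "img2": (3.0, 4.0), "img3": (1.0, 0.0)}
nearest = knn_query((0.0, 0.0), vectors, k=2)
within = range_query((0.0, 0.0), vectors, radius=2.0)
```

A k-nearest-neighbor query always returns k results when the collection is large enough, while a range query returns however many images fall inside the radius, possibly none.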

Page 83

Summary

• 5S and Generating DLs
– 5S Framework
– 5S definitions, services taxonomy, ontology
– 5SL
– 5SGraph
– 5SGen (and DL development)
– DL development of union DL
– 5SGen into DSpace

• 5S Metamodels
– Minimal DL
– Archaeology DL
– Multimedia (CBIR) DL
– Union DL
– Practical DL, superimposed information, personal DL, …

Page 84

DL Curriculum Project (NSF supporting VT, UNC-CH)

• Identify, develop and test educational DL modules, guided by

- Experts, international collaborators

- Computing Curriculum 2001

- 5S framework

- Analysis of DL course syllabi

Page 85

CC2001 Information Management Areas

IM1. Information models and systems*
IM2. Database systems*
IM3. Data modeling*
IM4. Relational DBs
IM5. Database query languages
IM6. Relational DB design
IM7. Transaction processing
IM8. Distributed DBs
IM9. Physical DB design
IM10. Data mining
IM11. Information storage and retrieval
IM12. Hypertext and hypermedia
IM13. Multimedia information & systems
IM14. Digital libraries

Page 86

Why Modular Design

• Flexibility, e.g., for ETD programs:
– Self-study by NDLTD trainers
– Self-study by ETD authors
– Short courses by NDLTD trainers of ETD authors
– A course based on a single module
– Course sequence (program) from multiple modules
– Plug in modules into an existing course (enhancement)

• Module 1. Overview + Module 10. DL Education & Research

Page 87

Modules

1. Collection Development
2. Digital objects / Composites / Packages
3. Metadata, Cataloging, Author submission
4. Architecture, Interoperability
5. Data visualization
6. Services
7. Intellectual property rights management, Privacy, Protection
8. Social issues / Future of DLs
9. Archiving and Preservation

Page 88

Ascertaining Priority Topics

• We’ve manually classified and analyzed publications using 9 Modules:

Source                                    Count
Proceedings, JCDL ’01 – ’05               354
Proceedings, ACM DL ’96 – ’00             189
Magazine articles, D-Lib ’95 – ’06        521
Session titles, JCDL, ACM DL, ECDL        264

Page 89

Conference papers x modules

[Figure: bar chart. x-axis: Module ID (1 to 9); y-axis: Number of conference papers (0 to 200); series: ACM DL 96 through ACM DL 00 and JCDL 01 through JCDL 05.]

Page 90

• Analysis Results:

- Total of 543 proceedings papers (354 from JCDL, 189 from ACM DL):

Most popular topics were architecture (module 4) and services (module 6)

Page 91

Distribution of D-Lib Magazine Articles across Module Topics

[Figure: bar chart. x-axis: Module ID (1 to 9); y-axis: Number of D-Lib articles (0 to 200); series: D-Lib 95 through D-Lib 06.]

Page 92

• Analysis Results:

- Total of 521 articles:

Most popular topics were architecture (module 4), services (module 6), and social issues (module 8)

Page 93

Distribution of Session Titles across Module Topics

[Figure: bar chart. x-axis: Module ID (1 to 9); y-axis: Number of panel sessions (0 to 35); series: JCDL & ACM DL, ECDL, ICADL.]

Page 94

• Analysis Results:

- Total of 264 session titles (JCDL, ECDL, ICADL):

Most popular topic was services (module 6), followed by architecture (module 4)

Page 95

Pointers and Summary

• http://fox.cs.vt.edu

• http://fox.cs.vt.edu/talks

• www.dlib.vt.edu

• fox@vt.edu

• DL, 5S

• Education: CITIDEL, NSDL, NDLTD, LIKES, DLcurric