ETDs for Beginners: History and Approach
Edward A. Fox, Executive Director, NDLTD
(plus slides from Vinod Chachra, Thom Hickey, Joan Lippincott, and Gail McMillan)
Professor, Dept. of Computer Science, Virginia Tech (VPI&SU), Blacksburg, VA, USA
http://fox.cs.vt.edu [email protected]
ETD 2003 Humboldt University, Berlin 21-24 May 2003
ACKNOWLEDGEMENTS
• ETD 2003 organizers and attendees
• Wonderful service of NDLTD Board of Directors, and previous Steering Committee, other committees
• Bold efforts by those running ETD initiatives in universities, regions, and countries
• Helpful sponsorship by many organizations, especially Adobe, Brocade Communications, c.a.r.u.s. Information Technology, Cisco Systems, CONACyT, Controlware, DFG, Enterasys Networks, Ex Libris, FIPSE, IBM, ImageWare Components, LIB-IT, Microsoft, Nionex, NSF, OCLC, VTLS, SOLINET, Springer-Verlag, SUN, SURA, T-Systems, UNESCO, many governments (Australia, Germany, India, …), …
PERSPECTIVE
Digital Libraries --- Virginia Tech
• MARIAN (NLM, NSF)
• CS DL Prototype - ENVISION (NSF, ACM)
• TULIP (Elsevier, OCLC)
• BEV History Base (NSF, Blacksburg)
• DL for CS Education - EI (NSF, ACM)
• WATERS, NCSTRL (NSF)
• NDLTD (SURA, US Dept. of Education, NSF)
• CSTC (NSF, ACM), CRIM (NSF, SIGMM)
• WCA (Log) Repository (W3C)
• VT-PetaPlex-1 (Knowledge Systems)
• NSDL (NSF): CITIDEL, DL-in-a-Box, GetSmart
• AmericanSouth.Org (Mellon)
DL Examples
• IBM Digital Library
• Virtua (www.vtlc.com)
• Greenstone (www.greenstone.org)
• Eprints (www.eprints.org)
• Many systems in NSF DLI projects
• VT systems: MARIAN, CSTC, NDLTD
• Work on ODL, DL-in-a-box, CITIDEL, NCSTRL
Libraries of the Future
J.C.R. Licklider, 1965, MIT Press
World
Nation
State
City
Community
Digital Libraries
SGML (1985)
PDF (1992)
NSF DLI (1994)
Library Cancellations (1988)
University Scholarly Electronic Pub. (1988)
Info. Literacy (1995)
Improving Education
Internet (1984)
WWW (1994)
Multimedia (1986)
Synchronous Scholarly Communication
Same time, Same or different place
Asynchronous, Digital Library Mediated Scholarly Communication
Different time and/or place
Borgman et al.: Workshop Report on Social Aspects of Digital Libraries: http://www-lis.gseis.ucla.edu/DL/
Information Life Cycle
Authoring, Modifying
Organizing, Indexing
Storing, Retrieving
Distributing, Networking
Retention / Mining
Accessing, Filtering
Using, Creating
Locating Digital Libraries in Computing and Communications Technology Space
[Figure: axes are Computing (flops), Digital content, and Communications (bandwidth, connectivity), each running from less to more.]
Digital Libraries technology trajectory: intellectual access to globally distributed information
Digital Library Content
Articles, Reports, Books
Text Documents
Speech, Music
Video, Audio
(Aerial) Photos
Geographic Information
Models, Simulations
Software, Programs
Genome: Human, animal, plant
Bio Information
2D, 3D, VR, CAT
Images and Graphics
Content Types
Digital Libraries Shorten the Chain from
Editor
Publisher
A&I
Consolidator
Library
Reviewer
DLs Shorten the Chain to
Author
Reader
Digital Library
Editor
Reviewer
Teacher
Learner
Librarian
Digital Libraries --- Objectives
• World Lit.: 24hr / 7day / from desktop
• Integrated “super” information systems: 5S: streams, structures, spaces, scenarios, societies
• Ubiquitous, Higher Quality, Lower Cost
• Education, Knowledge Sharing, Discovery
• Disintermediation -> Collaboration
• Universities Reclaim Property
• Interactive Courseware, Student Works
• Scalable, Sustainable, Usable, Useful
Benefits
• Ease of use
• Effectiveness
• “The benefits of digital libraries will not be appreciated unless they are easy to use effectively.” - IITA Workshop report
DLs: Why of Global Interest?
• National projects can preserve antiquities and heritage: cultural, historical, linguistic, scholarly
• Knowledge and information are essential to economic and technological growth, education
• DL - a domain for international collaboration
• wherein all can contribute and benefit
• which leverages investment in networking
• which provides useful content on Internet & WWW
• which will tie nations and peoples together more strongly and through deeper understanding
Application Domain | Related Institutions | Examples | Technical Challenges | Benefit / Impact
Publishing | Publishers, Eprint archives | OAI | Quality control, openness | Aggregation, organization
Education | Schools, colleges, universities | NSDL, NCSTRL | Knowledge management, reusability | Access to data
Art, Culture | Museum | AMICO, PRDLA | Digitization, describing, cataloging | Global understanding
Science | Government, Academia, Commerce | NVO, PDG, SwissProt, UK eScience, European Union Commission | Data models | Reproducibility, faster reuse, faster advance
(e) Government | Government Agencies (all levels) | Census | Intellectual property rights, privacy, multi-national | Accountability, homeland security
(e) Commerce, (e) Industry | Legal institutions | Court cases, patents | Developing standards | Standardization, economic development
History, Heritage | Foundations | American Memory | Content, context, interpretation | Long term view, perspective, documentation, recording, facilitating, interpretation, understanding
Cross-cutting | Library, Archive | Web, personal collections | Multi-language, preservation, scalability, interoperability, dynamic behavior, workflow, sustainability, ontologies, distributed data, infrastructure | Reduced cost, increased access, preservation, democratization, leveling, peace, competitiveness
Reagan Moore and Ed Fox, June 2002, for NSF
Digital Libraries
• Online course materials at http://ei.cs.vt.edu/~dlib/rcontents.htm
• Topical outlines:
Topical Outline - Foundations
• Early visions
• Definitions
• Resources
• References
• Projects
Topical Outline – IR Areas
• Search, Retrieval, Resource Discovery
• Information storage and retrieval
• Boolean vs. natural language
• Search engines
• Indexing, phrases, thesauri, concepts
• Federated search and harvesting, OAI
• Integrating links and ratings
• Crawlers, spiders, metasearch, fusion
• Details following – Li Wang indep. study
Topical Outline - Multimedia
• Multiple media types, representations
• Text, audio, image, video, graphics, animation
• Capture, digitization, standards, interchange
• Compression, content-based retrieval
• Playback (Real), SMIL, QoS
• JPEG, MPEG (and versions)
Topical Outline - Architectures
• Distributed, centralized
• Modular, componentized
• Bus (InfoBus), hierarchical, star
• Mediators, wrappers (TSIMMIS)
• Light weight protocols
• Architecture of OAI and XOAI
Topical Outline – Interfaces
• Taxonomy of interface components
• Workflow
• Visualization
• Environments
• Design
• Usability testing
Topical Outline – Metadata
• MARC
• Dublin Core
• RDF
• IMS
• OAI (Open Archives Initiative)
• Crosswalks, mappings
• Ontologies
• Topic maps, concept maps
Topical Outline – Epub, SGML, XML
• Authoring
• Rendering, presenting
• Structure
• Tagging, Markup, DOM
• Semi-structured information
• Dual-publishing, eBooks
• Styles (XSL, XSLT)
• Structure queries
Topical Outline – Databases
• Extending database technology
• Structured and unstructured info
• Multimedia databases
• Link databases
• Performance
• Replicated storage, I2-DSI (details following)
Topical Outline – Agents
• Protocols
• Knowledge interchange
• Negotiation, registries
• Distributed issues
• Ontologies (standard upper)
• Webbots (automatic indexing)
Topical Outline – Economics
• E-commerce
• Sustainability
• Preservation and archiving
• DLF, Besser, Lorie, Gladney
• Self-archiving
• Open collections
• Economic models, business plans
Topical Outline – IPR
• Intellectual property rights (IPR)
• Legal issues
• Terms and conditions
• Copyright
• Patents, trademarks
• Distributed rights management
• Security
Topical Outline – Social Issues
• Cooperation, collaboration
• Annotation, ratings
• Digital divide
• Educational applications
• Cultural heritage
• Museums (AMICO)
• Organizational acceptance
• Personalization
• Internationalization
DL Challenges
• Preservation - so people will trust DLs
• Supporting infrastructure - networks, ...
• Scalability, sustainability, interoperability
• DL industry - critical mass by covering libraries, archives, museums, corporate info, govt info, personal info - “quality WWW” integrating IR, HT, MM, ...
• Need tools & methods to make them easier to build
Definitions
• Library ++ (library+archive+museum+…)
• Distributed information system + organization + effective interface
• User community + collection + services
• Digital objects, repositories, IPR management, handles, indexes, federated search, hyperbase, annotation
Definition: Digital Libraries are complex systems that
• help satisfy info needs of users (societies)
• provide info services (scenarios)
• organize info in usable ways (structures)
• present info in usable ways (spaces)
• communicate info with users (streams)
5S Layers
Societies
Scenarios
Spaces
Structures
Streams
5S Model
Models | Examples | Objectives
Stream | Text; video; audio; image | Describes properties of the DL content such as encoding and language for textual material or particular forms of multimedia data
Structures | Collection; catalog; hypertext; document; metadata; organization tools | Specifies organizational aspects of the DL content
Spatial | Measure; measurable, topological, vector, probabilistic | Defines logical and presentational views of several DL components
Scenarios | Searching, browsing, recommending | Details the behavior of DL services
Societies | Service managers, learners, teachers, etc. | Defines managers, responsible for running DL services; actors, that use those services; and relationships among them
5S Model for DLs
5S | Definition
Streams | Sequences of elements of an arbitrary type
Structures | Labeled directed graphs
Spatial | Sets and operations on those sets
Scenarios | Sequences of events that modify states of a computation in order to accomplish some functional requirement
Societies | Sets of communities and relationships among them
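The five definitions above map directly onto simple data types. The following is a minimal Python sketch of that mapping, not the 5S formalism itself: the class names, the overlap measure, and the sample identifiers are illustrative assumptions.

```python
from dataclasses import dataclass, field

# Stream: a sequence of elements of an arbitrary type (a tuple is enough here).
Stream = tuple


@dataclass
class Structure:
    """Labeled directed graph stored as (source, label, target) triples."""
    edges: set = field(default_factory=set)

    def add(self, source, label, target):
        self.edges.add((source, label, target))

    def successors(self, source, label):
        # All nodes reachable from `source` over an edge carrying `label`.
        return {t for (s, l, t) in self.edges if s == source and l == label}


@dataclass
class Space:
    """A set plus operations on it (one hypothetical overlap measure)."""
    points: set = field(default_factory=set)

    def overlap(self, a, b):
        # Jaccard overlap of two term sets; an illustrative stand-in for a
        # vector or probabilistic measure.
        a, b = set(a), set(b)
        return len(a & b) / len(a | b) if (a | b) else 0.0


def run_scenario(state, events):
    """Scenario: a sequence of events, each mapping a state to the next state."""
    for event in events:
        state = event(state)
    return state


@dataclass
class Society:
    """Communities and (community, relation, community) relationships."""
    communities: set
    relationships: set
```

For example, `Structure().add("etd-1", "hasChapter", "ch1")` records one labeled edge, and `run_scenario(0, [lambda s: s + 1])` applies a single event to an initial state.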
5SLGen: Automatic DL Generation
[Figure labels: 5S Meta Model, 5SLGraph, DL Expert, DL Designer, 5SL DL Model, 5SLGen, component pool (ODLSearch, ODLBrowse, ODLRate, ODLReview, …), Tailored DL Services, and users (Practitioner, Researcher, Teacher). Process steps: Requirements (1), Analysis (2), Design (3), Implementation (4).]
OCKHAM
• Simplicity (a la OCCAM’s razor)
• Support by Mellon and DLF
• Next meeting in Atlanta Jan. 8, 2003
• Four main ideas:
1. Components
2. Lightweight protocols
3. Open reference models (e.g., 5S, OAIS)
4. Community perspective and involvement
Problem
Why do DL developers continue to “reinvent the wheel”? The top 10 reasons are:
1. The library budget won’t allow purchase of a commercial DL system.
2. Unless the development effort is local, there won’t be any control.
3. DLs are extensions of DBMSs, so they are simple applications to develop.
4. Since DLs operate on the Web, one must adopt the newest W3C proposal.
Problem – cont’d
5. Since technology moves so quickly, it is essential to follow the latest fad.
6. CS students always develop from scratch.
7. This team knows it can do it better.
8. This system must have more capabilities than any other system.
9. This DL has to be more flexible and extensible.
10. This is the right system architecture – at last!
Problem Approach
We
• address the problem of how to develop DLs;
• build on experience in building many DLs;
• strive for simplicity as per OCKHAM initiative;
• build upon the Open Archives Initiative;
• demonstrate our approach in diverse situations;
• and invite all to
• use DL-in-a-box and
• help build Open Digital Libraries.
NUDL (www.nudl.org)
Int’l Research Support (1997)
• Networked University Digital Library
• Partners: Germany, Mexico (Puebla and Monterrey), Brazil
• Problems: Multilingual search, high performance DLs, requirements/usability, …
• Start with ETDs, then expand to other student works, portfolios, data sets, (CS) courseware, ... -> institutional repositories
ALPHABET SOUP, NOT ROCKET SCIENCE
Alphabet Soup
• E and T or D = ETD
• (electronic)
• (thesis)
• (dissertation)
Alphabet Soup
• ET and ED = ETDs
Alphabet Soup
• DL and ET or ED = DLTD
• (digital library)
Alphabet Soup
• SURA and DLs and ETDs = Regional DLTD
• (Southeastern Universities Research Association)
Alphabet Soup
• FIPSE and DLs and ETDs = National DLTD
• (Fund for the Improvement of Post Secondary Education – US Dept. of Ed)
Alphabet Soup
• International and DLs and ETDs = Networked DLTD = NDLTD
• (Recall “n” in CNI –> Coalition for Networked Information)
Alphabet Soup - Factoring
• NDLTD = ND LTD
• (Paul Mather – from UK)
• NDLTD = NDL TD
• (Edie Rasmussen)
• (Later, Networked University Digital Library = NUDL)
A Digital Library Case Study
• Electronic theses and dissertations (ETDs)
• Submission: http://etd.vt.edu
• Collection: http://www.theses.org
• Networked Digital Library of Theses and Dissertations (NDLTD) http://www.ndltd.org (formerly “National” because of Fed. funds, before international members started joining)
SLIDES FROM 1998
What led to today’s situation?
• 1987 mtg in Ann Arbor: UMI, VT, …
• 1992 mtg in Washington: CNI, CGS, UMI, VT and 10 universities with 3 reps each
• 1993 mtg in Atlanta to start Monticello Electronic Library (MEL): SURA, SOLINET
• 1994 mtg in Blacksburg re ETD project: std of PDF + SGML + multimedia objects
• 1996 funding by SURA and US Dept. of Education (FIPSE) for regional, national projects (NDLTD)
VISION, BENEFITS, APPROACH, POSSIBILITIES
What are we doing?
• Aiding universities to enhance grad educ., publishing and IPR efforts: to help improve the availability and content of theses and dissertations
• Educating ALL future scholars so they can publish electronically and effectively use digital libraries (i.e., are Information Literate and can be more expressive)
• Demonstrating how for other organizations
What are the key ideas?
• Scalability
• Empower authors to submit to DL, as a natural part of the educational process
• Study workflow & apply automation, so institutions streamline processing and build their part of the DL
• Federate along most suitable cultural/political lines
• People can switch to electronic documents
• Becoming more expressive with hypermedia
• Mandating ETDs will change all future scholarship
What are the benefits?
• Save students money
• Save handling, shelf space in libraries
• Build the Networked Digital Library of Theses and Dissertations: with faster, broader, and less expensive access
• Demonstrate how universities can work together directly (vs. indirectly through publishers or associations)
What are the long term goals?
• 400K US students / year getting grad degrees are exposed / involved
• 200K/yr rich hypermedia ETDs that may turn into electronic portfolios
• Dramatic increase in knowledge sharing: lit. reviews, bibliographies, …
Grad Student Workstation?
• Services providing lifelong access for students/researchers: browse, search, prior searches, citation links
• Record all work with NDLTD, return to prior situation, prepare bibliography
• Powerful (multilingual, text, image) searching, browsing (with categories), following citation links
• Support collaboration with others in same field: help with literature review, sharing tools and data sets, applying their methods
Social Capital?
• Increase local interchange among students, faculty, library, graduate school
• Increase international understanding, building many more invisible colleges, with students more empowered
• Connect graduate researchers with undergrads, who can access ETDs / them
• Facilitate direct university collaboration, explicitly, in reshaping publishing world
How are ETDs being done at Virginia Tech?
• Produced using standard word processing packages as PDF files
• LaTeX class, outline fonts
• Word template, PDFwriter
• Reviewed by the Graduate School
• Cataloged and archived by the library
• Downloaded by UMI from server (if payment has been made)
Convene Local Planning Group
Build An ETD Site
Student Prepares Thesis or Dissertation
Student Defends and Finalizes ETD
Student Gets Committee Signatures and Submits ETD
Graduate School Approves ETD; Student is Graduated
Library Catalogs ETD and New Students Have Access to the New Research
[Figure labels across these slides: ETD, Digital Library, Policies, Inspection/Approval, Workshop/Training, NDLTD, Computer Resources, Research, Literature, My Thesis, Signed, Grad School, Ph.D., WWW]
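The student-facing steps in the slide sequence above form a strictly linear workflow, which can be sketched as a small state machine. This is an illustrative Python sketch only: the stage names and the `advance` helper are assumptions, not part of the Virginia Tech submission software.

```python
from enum import Enum


class Stage(Enum):
    # Stage names are hypothetical labels for the slide captions.
    PREPARING = "student prepares thesis or dissertation"
    DEFENDED = "student defends and finalizes ETD"
    SUBMITTED = "committee signs and student submits ETD"
    APPROVED = "graduate school approves ETD"
    CATALOGED = "library catalogs ETD"


# Enum members iterate in definition order, which is the workflow order.
ORDER = list(Stage)


def advance(stage):
    """Return the stage that follows `stage`; the last stage has no successor."""
    position = ORDER.index(stage)
    if position == len(ORDER) - 1:
        raise ValueError("ETD is already cataloged")
    return ORDER[position + 1]
```

Calling `advance` four times from `Stage.PREPARING` reaches `Stage.CATALOGED`; a real system would also record who approved each transition.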
Status of the Local Project
• Approved by university governance Spring 1996; required starting 1/1/97
• Submission & access software in place
• Submission workshops for students (and faculty) occur often: beginner/adv.
• Faculty training as part of Faculty Development Initiative
• Over 700 ETDs in collection by 1/98
How can a university get involved?
• Select planning/implementation team
• Graduate School
• Library
• Computing / Information Technology
• Institutional Research / Educ. Tech.
• Send us letter, give us contact names
• Adapt Virginia Tech solution
• Build interest and consensus
• Start trial / allow optional submission
CONCERNS, PROBLEMS, OPPOSITION
Some Barriers at Universities
• Lethargy; Not invented here (esp. large univ’s)
• Anger with unfunded, added, required work
• Last straw: using more frustrating technology
• Lack of experience in working together: graduate school, library, computing staff
• Lack of interest in (quality of) student work
• More loyalty to discipline than to campus
• Unwillingness to accept responsibility for $ problems with libraries, publishers
MECCA Conf. 6/11/98
• Armbruster, U. Tennessee, Memphis
• Bennett, Robert C., U. Texas Med Sch
• Brown, Melinda, Vanderbilt
• Eaton, John, Graduate School, Va Tech
• Fox, Ed, Computer Science, Va Tech
• Gherman, Paul, Library, Vanderbilt
• Goodstein, Lynn, Penn St. U.
• Hagen, John H., Library, WVU
• Hardemon, James, U. Florida
• Helmstetter, Wendy, Library, FIT
• Liston, Rick, NCSU
• Lutz, Richard, Graduate School, Florida
• McFarland, Mark, U. Texas, Austin
• McMillan, Gail, Library, Va Tech
• Minsker, Tom, Penn St U.
• Mortara, Antionet, FIT
• Painter, Linda, U. Tennessee
• Sowell, Robert, Graduate School, NCSU
• Tague, Larry, U. Tennessee, Memphis
• Vaughan, Mary Ann, Vanderbilt
ETD Overview
Spirit of NDLTD
• Help make a better (smaller) world
• Win-win-win (everyone can benefit)
• Have fun helping others
• Helpers/teachers learn more than those they work with
• Cooperation, friendly competition
• When you “1-up” VT, share your software, documents!
• “Doing better” requires both “doing”, “better”
• Balance (and build on standards)
• New, popular, powerful, expressive, exciting, “better”
• Doable, feasible, learnable, affordable, sharable, preservable
• We can always do more, enhancing quality and knowledge!
The Networked Digital Library of Theses and Dissertations
www.NDLTD.org
Leader of the Worldwide ETD (Electronic Thesis and Dissertation) Initiative
Training Authors
Expanding Access
Preserving Knowledge
Improving Graduate Education
Enhancing Scholarly Communication
Empowering Students & Universities
[Figure labels: Grad Program, IT, Ed. (Tech), Library, NDLTD]
Key Ideas: Networked infrastructure
Scalability
Education is the rationale
University collaboration
Workflow, automation
Authors must submit
Maximal Access
PDF, SGML, MM, MARC, DC, URNs, Federated search
Standards
8th graders vs. grads
What led to today’s meeting?
• 1987 mtg in Ann Arbor: UMI, VT, …
• 1992 mtg in Washington: CNI, CGS, UMI, VT and 10 universities with 3 reps each
• 1993 mtg in Atlanta to start Monticello Electronic Library (regional, US Southeast): SURA, SOLINET
• 1994 mtg at VT: std: PDF + SGML + multimedia objects
• 1996 funding by SURA, US Dept. of Education (FIPSE)
• 1997 meetings in UK, Germany, ...
• 1998 – 1st symposium – Memphis (20)
• 1999 – 2nd symposium – Blacksburg (70)
• 2000 – 3rd symposium – St. Petersburg (225)
• 2001 – 4th symposium – Caltech (200)
• 2002 – 5th symposium – BYU, Provo, Utah
• 2003 – 6th symposium – Berlin (215)
• 2004 – 7th symposium – U. Kentucky
• 2005 – 8th symposium – Sydney, Australia
NDLTD Membership
• As of 5/17/2003 there were at least:
• 176 members, including:
• 155 individual universities
• 6 consortia
• 21 institutional members
National / Regional Projects
• Australia
  • U. New South Wales (lead)
  • U. of Melbourne
  • U. of Queensland
  • U. of Sydney
  • Australian National U.
  • Curtin U. of Technology
  • Griffith U.
• Belgium
• Brazil
• Germany
  • Humboldt University (lead)
  • 3 other universities
  • 5 learned societies: Math, Physics, Chemistry, Sociology, Education
  • 1 computing center
  • 2 major libraries
• India
• Lithuania
• Spain: Consorci de Biblioteques Universitàries de Catalunya, as group, www.cbuc.es: 9 sites
• Sudan
• UK (British Library, JISC, Edinburgh)
• UNESCO (especially Latin America, Eastern Europe, Africa)
• USA:
  • CIC (“Big 10”)
  • Ohio: OhioLINK: 79 colleges/univs
  • SOLINET
  • …
OhioLINK
• Statewide Consortium
• Represents 79 colleges, universities, libraries
• Public Universities
• Private Universities and Colleges
• 2-Year Colleges
• Only a few (e.g., Miami U. of Ohio) are also NDLTD members on their own
US University Members
• Air University (Alabama)
• Baylor University
• Boston University
• Brigham Young University
• Caltech
• Clemson University
• College of William & Mary
• Concordia University (Illinois)
• Drexel University – required 4/2002
• East Carolina University
• East Tenn. State U. – required 1/2001
• Florida Institute of Technology
• Florida International University
• Florida State University
• Florida Tech
• George Washington University
• Georgetown University
• Johns Hopkins University
• Louisiana State University – required 1/2002
• Marshall University (W. Va.)
• Miami University of Ohio
• Michigan Tech
• Mississippi State University
• MIT
• Montana State University
• Naval Postgraduate School (CA)
• New Jersey Inst. of Technology
• New Mexico Tech
• North Carolina State University – required 9/2002
• Northwestern University
• Penn. State University
• Regis University
• Rochester Institute of Tech.
• Texas A&M
• U. of Central Florida
• U. of Colorado Health Science Center
• U. of Florida – required 8/2001
• U. of Georgia – required 9/2001
• U. of Hawaii, Manoa
• U. of Illinois, Urbana-Champaign
• U. of Iowa
• U. of Kentucky – required in CS only
• U. of Maine – required in CS, Spatial Info Sci/Eng
• U. of Missouri-Columbia
• U. of Nevada, Las Vegas
• U. of New Orleans
• U. of North Texas – required 8/1999
• U. of Oklahoma
• U. of Pittsburgh
• U. of Rochester
• U. of South Florida – required 8/2002
• U. of Tennessee, Knoxville
• U. of Tennessee, Memphis
• U. of Texas at Austin – required 6/2001
• U. of Virginia – required 1/2003
• U. of West Florida
• U. of Wisconsin - Madison – part reqt 12/1999
• Vanderbilt U.
• Virginia Commonwealth U.
• Virginia Tech - required 1/97
• Wake Forest U.
• West Virginia U. - required 8/1998
• Western Kentucky U. – required 9/2004
• Western Michigan U.
• Worcester Polytechnic Inst. – required 7/2002
• Yale U.
Other Countries (selected)
• Australia
• Belgium
• Brazil
• Canada
• Chile
• China, Hong Kong
• Colombia
• Finland
• France
• Germany
• Greece
• India
• Italy
• Jamaica
• Korea
• Lithuania
• Mexico
• Netherlands
• Norway
• Poland
• Russia
• Singapore
• S. Africa
• S. Korea
• Spain
• Sudan
• Sweden
• Taiwan
• Thailand
• UK
• Venezuela
Institutional Members
• Australian Digital Theses Program
• British Library
• Cinemedia
• Coalition for Networked Information (CNI)
• Committee on Institutional Cooperation (CIC)
• Consorci de Biblioteques Universitàries de Catalunya
• Diplomica.com
• Dissertation.com
• Dissertationen Online (Germany)
• ETDweb, a Division of Answer4.com
• Ibero-American Science & Technology Education Consortium (ISTEC)
• MathDISS International
• National Documentation Centre (NDC), Greece
• National Library of Canada
• National Library of Portugal
• OCLC Online Computer Library Center
• Office of Scientific and Technical Info (US Dept of Energy)
• OhioLINK
• Organization of American States (SEDI/OAS)
• Southeastern Library Network (SOLINET)
• Sudanese National Electronic Library
• UNESCO (www.unesco.org/webworld/etd)
UNESCO and ETDs
• Promoting the use of the Internet as a tool for disseminating scientific knowledge
• Facilitating the transfer of ETD expertise from developed to developing countries
• 1998: Member of the NDLTD Steering Committee
• 1999: First UNESCO ETD meeting on ETD internationalisation
• 2002: “UNESCO Guide to Electronic Theses and Dissertations”
• 2003: Model training programmes and training courses
• 2003: Sponsor pilot projects
• 2003: Pilot projects (Africa, Europe, Latin-America)
Access Possibilities
[Figure labels: Web search engines; library catalog clients; www.theses.org; www.openarchives.org; 3rd Party Services (e.g., Bell & Howell); Virginia Tech; National Library of Portugal; CBUC (Spain); OhioLink; MIT; National Projects: AU, GE, …]
ETD Initiative (and ProQuest)
Students Learn about DL, EPub
TDs become more expressive
N. Amer. (T)Ds are accessible, archived
Global TDs become more accessible, archived
UMI
Universities
Why ETD? Short Answer
• For Students:
• Gain knowledge and skills for the Information Age
• Richer communication (digital information, multimedia, …)
• For Universities:
• Easy way to enter the digital library field and benefit thereby
• For the World:
• Global digital library – large, useful, many services
• General:
• Save time and money
• Increased visibility for all associated with research results
The Process? Short Answer
• For Students:
• Plan on ETD from day 1
• Secure knowledge from: workshops, online info, colleagues
• Work with faculty to plan approach
• PDF? XML? TEI? Multi/hypermedia? Data sets? Viz?
• Get signed approval form: access, ©, proxy assignment
• After defense and approval, submit ETD to university
• For Universities:
• Form team
• Adapt solution from work at other universities, attend ETD conference
• Pilot -> Option -> Requirement
Assistance
• Software, documentation, tech support
• Email, listservs ([email protected])
• UNESCO sponsored etdguide.org
• English in 2001, Spanish & French in 2002
• Training sessions in Latin America …
• Marcel Dekker book soon in press
• www.ndltd.org
Technical Umbrella for Practical Interoperability…
…that can be exploited by different communities
[Figure labels: Reference Libraries, Publishers, E-Print Archives, Museums; Discovery, Current Awareness, Preservation; Service Providers; Data Providers; Metadata harvesting]
The World According to OAI
Tiered Model of Interoperability
Mediator services
Metadata harvesting
Document models
Repository of Digital Objects
Repository Access Protocol
handle
Digital object
terms and conditions
OAI – Black Box Perspective
[Figure: seven Open Archives (OA 1 to OA 7) shown as black boxes, with three layers. Services: Browse, Summarize, Search, Visualize. Docs: digital objects (DO). Metadata.]
Protocol for Metadata Harvesting
• Service Requests
• Identify
• ListMetadataFormats
• ListSets
• GetRecord
• ListIdentifiers
• ListRecords
• Metadata Multiplicity
• Date/Time Ranges
• Sets (with semantics depending on local data providers)
• Resumption Tokens
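Each of the six service requests above is an ordinary HTTP request whose `verb` parameter names the request. The sketch below builds such request URLs in Python. The base URL is a placeholder, and the helper name `build_request` is an assumption; the verb names and the argument names (`metadataPrefix`, `from`, `until`, `set`, `resumptionToken`, `identifier`) are those of the OAI protocol.

```python
from urllib.parse import urlencode

# The six request types listed on the slide.
VERBS = {
    "Identify",
    "ListMetadataFormats",
    "ListSets",
    "GetRecord",
    "ListIdentifiers",
    "ListRecords",
}


def build_request(base_url, verb, **args):
    """Build an OAI request URL; this helper only builds the URL, it does not send it."""
    if verb not in VERBS:
        raise ValueError(f"unknown OAI verb: {verb}")
    params = {"verb": verb}
    for key, value in args.items():
        if value is not None:
            # `from` is a Python keyword, so callers pass `from_` instead.
            params[key.rstrip("_")] = value
    return base_url + "?" + urlencode(params)


# Hypothetical repository address, used only for illustration.
url = build_request(
    "http://example.org/oai",
    "ListRecords",
    metadataPrefix="oai_dc",
    from_="2003-01-01",
    until="2003-05-24",
)
```

Here `metadataPrefix` selects one of the metadata formats a repository offers (metadata multiplicity), and `from` and `until` bound the datestamp range.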
Key Features of the OAI Metadata Harvesting Protocol
• definitions & concepts
• repository
• record
• identifier
• datestamp
• set
• protocol features
• HTTP encoding
• metadata prefix & schema
• flow control
• protocol requests
• supporting requests
• harvesting requests
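[Figure: a harvester talks to a repository through the OAI protocol; supporting requests return support data, and harvesting requests return harvesting data about the repository's items.]
selective harvesting - datestamps
[Figure: a harvester collects from a repository only the records whose datestamps fall within a date range.]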
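Selective harvesting by datestamp and flow control with resumption tokens can be illustrated with an in-memory stand-in for a repository. This Python sketch makes several simplifying assumptions: the records are invented, the token is just a list offset, and the date range is passed again on every call, whereas a real repository encodes that state in the token itself.

```python
from datetime import date

# Hypothetical records; identifiers and datestamps are made up.
RECORDS = [
    {"identifier": "oai:example:1", "datestamp": date(2003, 1, 10)},
    {"identifier": "oai:example:2", "datestamp": date(2003, 2, 5)},
    {"identifier": "oai:example:3", "datestamp": date(2003, 3, 1)},
    {"identifier": "oai:example:4", "datestamp": date(2003, 4, 20)},
    {"identifier": "oai:example:5", "datestamp": date(2003, 5, 17)},
]


def list_records(records, from_=None, until=None, token=None, page_size=2):
    """Repository side: return one page of matching records and the next token."""
    selected = [
        r for r in records
        if (from_ is None or r["datestamp"] >= from_)
        and (until is None or r["datestamp"] <= until)
    ]
    start = int(token) if token else 0
    page = selected[start:start + page_size]
    next_start = start + page_size
    # No token means the list is complete.
    next_token = str(next_start) if next_start < len(selected) else None
    return page, next_token


def harvest(records, from_=None, until=None):
    """Harvester side: keep requesting pages until no resumption token remains."""
    harvested, token = [], None
    while True:
        page, token = list_records(records, from_, until, token)
        harvested.extend(page)
        if token is None:
            return harvested


recent = harvest(RECORDS, from_=date(2003, 2, 1), until=date(2003, 4, 30))
```

With a page size of two, the three matching records arrive in two responses, the first of which carries a resumption token.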
DL Components
User Interfaces
Workflow Mgr
DBMS
Search Engines, Classifiers, …
Data, MM Info
Gateways
Repository
Rights Mgr
MM/ HT Renderer
Open Digital Library (ODL) Hypothesis (Hussein Suleman)
• Can we leverage the successful model of the OAI Protocol for Metadata Harvesting to alleviate our architectural problems?
Maybe … if
Digital Libraries can be modeled as
• networks of extended Open Archives, where
• each extended Open Archive is a
• source of data and/or a provider of services.
[Figure: users on one side and digital objects (programs, documents, images, videos) on the other, with a question mark between them.]
[Figure: componentized digital library, with question marks marking the undefined connections among its components.]
[Figure: open digital library, with Open Archives (OA) connected by PMH and XPMH links.]
Component System Approach
• (Open) DL = Network of Extended OAs
[Figure labels: Local Archive, Data Input, Remote Archive, Metadata Repository, Browse, Search, Recommend, Resource Discovery, User Interface. Legend: OAI/ODL archive, OAI/ODL protocol.]
Example Architecture (NDLTD)
[Figure labels: Humboldt, Duisburg, MIT, MIT Filter, Virginia Tech, PhysNet, CalTech, Dresden, Union Catalog, Browse, Search, Recent, User Interface. Legend: User Interface, OAI/ODL archive, OAI/ODL protocol.]
ODL Demonstration - FrontPage
ODL Component Requirements
• Search
• Retrieve a list of items
• Index new items
• Annotate
• Add annotation to item
• Retrieve a list of annotations for an item
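The two requirement groups above amount to two small component interfaces with two operations each. The following Python sketch shows one possible in-memory shape for them; the class and method names are illustrative assumptions and are not the ODL components or their protocol.

```python
class SearchComponent:
    """Indexes new items and retrieves a list of items matching a query."""

    def __init__(self):
        self._index = {}  # term -> set of item identifiers

    def index(self, item_id, text):
        for term in text.lower().split():
            self._index.setdefault(term, set()).add(item_id)

    def search(self, query):
        terms = query.lower().split()
        if not terms:
            return []
        # An item matches only if it contains every query term.
        hits = set.intersection(*(self._index.get(t, set()) for t in terms))
        return sorted(hits)


class AnnotateComponent:
    """Adds annotations to items and lists the annotations for an item."""

    def __init__(self):
        self._notes = {}  # item identifier -> list of annotations

    def annotate(self, item_id, note):
        self._notes.setdefault(item_id, []).append(note)

    def annotations(self, item_id):
        return list(self._notes.get(item_id, []))
```

Keeping each component to a couple of operations is what lets them be combined over a lightweight protocol instead of being built into one monolithic system.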
Open Digital Library Components
• Running now
• XML-File (data provider from file system)
• Union, search, browse, recent, filter
• E-journal/review, Submit, Edit, Annotation
• Class projects
• High performance multilingual search
• Recommender, Rating; Mirroring (see JCDL’02)
• Working with NCSA: from DB, unstructured text
• Others discussed
• Classification/categorization
• DL-Viz interconnection (VIDI – Jun Wang ETD)
Harvest from data providers
DBUnion Archive Merger Component
DBBrowse Browse Engine
IRDB-1 Search Engine
As Metadata Search Service Provider
As Metadata Browse Service Provider
XML File Coll. & Data Provider 1
XML File Coll. & Data Provider 2
XML File Coll. & Data Provider 3
Open Digital Library: Extended
What’s New Engine
As What’s New Service Provider
OAI-PMH Data Provider
Submit Archive
OAIB (NCSA: from RDBMS)
Filter
Recommend
Rate Engine
Annotation Engine
IRDB-2 Search Engine
As Annotation Search Service Provider
As Recommend & Rate Service Provider
[Figure: digital objects, including programs, documents, images, videos, and ETD-1 through ETD-4.]
Digital Library for the Networked Digital Libraryof Theses and Dissertations (www.ndltd.org)
SearchFilter
Filter
Union
Recent
Browse
PMH
PMH
PMH
ODLRecent
ODLBrowse
ODLUnion
ODLUnion
ODLSearch
ODLUnionPMH
PMH
USER INTERFACE
Students and researchers
ETD collections
Example Open Digital Library
DBReview Box: Reviews
USER INTERFACE
Box: Resources
under Review
DBUnion: Metadata
Union
User Interface OAI/ODL component OAI/ODL protocol
Box: Accepted
Resources
IRDB
Box: Users
DBUnion: Legacy
Metadata
Thread
DBRate
Suggest
DBBrowse
Example Open Digital Library
Digital Library for the Computer Science Teaching Center (www.cstc.org)
Digital Library in a Box
• Domain: helping DL projects
• Genre: any domain, but especially those involved in NSDL (since it is funded in part through NSDL, with U. FL and NCSA)
• Software and Documentation: http://dlbox.nudl.org
DL Standardized Log Format - Design
5S | Definition | Use in Log Design
Streams | Represent static and dynamic multimedia content | Temporal events, types of digital objects
Structures | Labeled directed graphs; provide organization within the DL | Structured documents and metadata; structured searches, collection, metadata catalog; hypertext, classification scheme
Spaces | Sets, properties and operations on those sets | Retrieval mode, presentation information
Scenarios | Sequences of events that modify states of a computation in order to accomplish some functional requirement | Organization of the user and system actions into transactions, statements, events and actions; DL services as sets of scenarios
Societies | Sets of communities and relationships among them | User information
ETDs and Libraries
Gail McMillan
Digital Library and Archives, University Libraries
Virginia Polytechnic Institute and State University
Ohio State University/Virginia Tech Video Conference
October 24, 2002
Goals for Libraries and Archives
• Improve services
  • Better turn-around time
  • Always available
• Reduce work (save $)
  • Catalog from etext
  • Eliminate handling
• Save space
ETDs at Virginia Tech
• Partnership: Library, Graduate School, and Faculty
• Approved by university governance: Mar. 1996
  • Full implementation: Jan. 1997
  • Web submission
    • Students: http://etd.vt.edu
    • Programmers: http://scholar.lib.vt.edu/ETD-db/
• Workshops for students (and faculty)
• Over 5000 ETDs approved
How are ETDs managed?
• Graduate student creates ETD
  • Word processor, multimedia
  • Saves as PDF, usually
• Graduate student submits ETD
  • Directly to library server/permanent archive
  • Archiving fee replaces binding fee
• Graduate School approves
  • E-mails author, advisor, UMI (VT scripts)
  • Authors/advisors prescribe Internet access
• Library catalogs and archives
• UMI downloads
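The management steps above form a simple linear workflow, which can be sketched as a small state machine. The stage names and the `advance` and `notifications` helpers are hypothetical and only mirror the bullets; they are not taken from the VT ETD-db scripts.

```python
from enum import Enum


class Stage(Enum):
    CREATED = "created by graduate student"
    SUBMITTED = "submitted to library server"
    APPROVED = "approved by Graduate School"
    CATALOGED = "cataloged and archived by library"


# Each stage has exactly one successor; CATALOGED is final.
_NEXT = {
    Stage.CREATED: Stage.SUBMITTED,
    Stage.SUBMITTED: Stage.APPROVED,
    Stage.APPROVED: Stage.CATALOGED,
}


def advance(stage):
    """Move an ETD to the next stage, or fail if it is already final."""
    if stage not in _NEXT:
        raise ValueError(f"{stage.name} is the final stage")
    return _NEXT[stage]


def notifications(stage):
    """Who is e-mailed on entering a stage (here, only on approval)."""
    if stage is Stage.APPROVED:
        return ["author", "advisor", "UMI"]
    return []


# Walk one ETD through the whole workflow.
history = [Stage.CREATED]
while history[-1] in _NEXT:
    history.append(advance(history[-1]))
```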
http://scholar.lib.vt.edu/theses/available/etd-2227102539751141/
Library Resources
• Hardware: server
  • Maintenance and security
  • Started small: NeXT 3.3 (HP; 1989-97)
  • Grew: Sun dual-processor Enterprise 250, Solaris 2.7 (Apache web server)
• Software
  • Submission scripts written by DLA
    • Includes e-mail notifications to authors, advisors, UMI
    • Use it too: http://scholar.lib.vt.edu/ETD-db/
  • Log files analyzed with Analog
  • Survey scripts written by DLA
    • Data from authors and readers
    • Use it too: http://lumiere.lib.vt.edu/surveys/
  • Search Engine
    • Started small: freeWAIS; grew: InfoSeek’s ULTRASEEK
Financial Concerns
• At VT: start-up costs = $0
  • On-hand staff, equipment, software, freeware
• From zero base: estimate $65,000
  • $24,000 Staff (part time)
  • $36,000 Equipment
  • $15,000 Software
http://scholar.lib.vt.edu/theses/data/setup.html
Costs/Savings at VT
• Graduate School stopped shipping to the library 3000 copies of paper TDs/year
• Library stopped handling (e.g., shipping, binding, shelving, and circulating) 3000 copies of TDs/year
• 166 ft of shelf space saved yearly by the Library
• VT used existing equipment in Library (vs. start-up costs for staff, hardware and software)
Digital Library Benefits: Low margin, high use
• Incorporate ETDs with other digital library activities
  • Ejournals, online class materials, digital images, etc.
  • Additional equipment, staff may not be necessary
  • http://scholar.lib.vt.edu/theses/data/setup.html
• Use VT programs, scripts, etc.
  • http://scholar.lib.vt.edu/ETD-db/
• Online accesses vs. circulation of copies
  • VT theses 1990-1994, combined average circulation per copy: 2.24/yr
  • VT dissertations 1990-1994, combined average circulation per copy: 3.2/yr
Access to VT’s ETDs
http://scholar.lib.vt.edu/theses/
Year | ETD files requested | Abstracts requested
1997/98 | 231,709 | 165,710
1998/99 | 483,030 | 215,493
1999/00 | 578,152 | 260,699
2000/01 | 2,173,420 | 573,149
2001/02 | 4,497,199 | 471,917
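To put the access figures in perspective, the growth between the first and last reporting years can be worked out directly from the numbers on this slide. The helper name `growth_factor` is just for illustration.

```python
# Yearly request counts as given on the slide, first year to last year.
etd_files_requested = [231_709, 483_030, 578_152, 2_173_420, 4_497_199]
abstracts_requested = [165_710, 215_493, 260_699, 573_149, 471_917]


def growth_factor(series):
    """Ratio of the last reported value to the first."""
    return series[-1] / series[0]


etd_growth = growth_factor(etd_files_requested)
abstract_growth = growth_factor(abstracts_requested)

print(f"ETD file requests grew about {etd_growth:.1f}x")
print(f"Abstract requests grew about {abstract_growth:.1f}x")
```

ETD file requests grew roughly 19-fold over the period, while abstract requests grew a little under 3-fold.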
Why are ETDs so popular?
• User surveys
  • 67% found VT ETDs easily
  • 61% found them by searching
  • 22% browsed by department
  • 16% browsed by author
  • 53% downloaded 1 or more ETDs
• Author surveys
  • Conversion and submission processes less difficult than anticipated
  • Over half plan to publish articles from their ETDs
  • Why did they restrict access?
http://lumiere.lib.vt.edu/surveys/
Availability of 4224 VT ETDs
Withheld: 17.2%
Restricted VT-only: 27.3%
Mixed: 2.9%
Available/Unrestricted: 52.6%
Reasons for Restricted Access
Advice of others: 9.49%
Patent pending: 3.23%
Advice of faculty: 48.58%
Personal choice: 25.81%
Other reasons: 8.73%
Advice of publisher: 4.17%
ETDs and Accessibility
• Inaccessible ETDs
  • Patents pending
  • Future publication fears
• Broken links
  • Quality of work remains
  • Similar to out-of-print articles
• Media standards
  • Open source software (e.g., PDF reader)
  • Typical commercial software
  • Few esoteric programs, include original scripts
ETDs and Publishing
• Early controversies waning
  • Faculty: prior publication?
    • Protective of future academics
• Surveys of publishers
  • Largely no specific policies
  • Consider submissions individually
• VT ETD Alumni
  • None had problems getting published
• Authors
  • Retain some rights, e.g., link to curriculum vitae, online course materials
ETDs and Copyright
• Author’s rights
  • Reproduction, modification, distribution, public performance, public display
  • Retain rights
  • Share non-exclusive rights
    • Permit library to store and to provide access
    • Publishers
• Author’s obligations: fair use
  • Balance factors or get permission
• Notification: optional
  • Copyright 2002 by Gail McMillan ALL RIGHTS RESERVED
• Registration: optional
  • Possibly receive greater compensation, with less documentation, if filing an infringement lawsuit
ETDs and Long-term Preservation
• Concerns: Access without paper
  • Long term preservation
  • Standard multimedia formats
    • PDF Reader: an open source
    • http://scholar.lib.vt.edu/theses/archive.html
• Addressed Concerns
  • Cooperatives
    • OhioLink
    • Why not: OCLC, NDLTD?
  • Commercial options
    • UMI: traditional microfilming
  • Frequent, regular back-ups available on-site and off-site
Ensuring Access to VT ETDs
• Every 15 minutes back-ups made of newest, not-yet-approved submissions
• Hourly back-ups of newly approved ETDs
• Weekly back-ups of entire ETD collection
• Multiple copies stored on-site and off-site
• NDLTD: let’s reciprocate, cooperative mirroring
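The tiered back-up schedule above can be expressed as a small table of intervals plus a check for which back-ups are due. This is a sketch only: the target names and the `due_backups` helper are hypothetical, and the slides do not say what tooling VT actually used.

```python
from datetime import datetime, timedelta

# Back-up intervals as stated on the slide.
SCHEDULE = {
    "pending submissions": timedelta(minutes=15),
    "newly approved ETDs": timedelta(hours=1),
    "entire collection": timedelta(weeks=1),
}


def due_backups(last_run, now):
    """Return the targets whose interval has elapsed since their last run."""
    return [
        target
        for target, interval in SCHEDULE.items()
        if now - last_run[target] >= interval
    ]


# Hypothetical state: when each back-up last completed.
now = datetime(2003, 5, 21, 12, 0)
last_run = {
    "pending submissions": now - timedelta(minutes=20),
    "newly approved ETDs": now - timedelta(minutes=30),
    "entire collection": now - timedelta(days=8),
}

due = due_backups(last_run, now)
```

Backing up the newest, least-replicated material most often keeps the window of possible loss small without copying the whole collection every few minutes.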
Lessons from ETDs
• Implementation of new formats slower than expected
  • Text oriented
  • Not planning for online readers
• If you build it, it will get used.
  • Access exceeded expectations
  • Disappointing number are inaccessible
• Remarkable increase in exposure to graduate student research
• Requiring institutions slower than expected
  • No longer experimental
• Increase in number and diversity of NDLTD institutions
Available at VT
• Information
http://scholar.lib.vt.edu/theses
• Automated submission system ready for customization
http://scholar.lib.vt.edu/ETD-db/
• Student guidelines, training materials, FAQ's, multimedia educational materials
http://etd.vt.edu
• NDLTD: Network educational institutions
  • Annual conferences: Berlin 2003, U of Kentucky 2004
http://www.ndltd.org
Union Catalog
(with Vinod Chachra, Thom Hickey)
NDLTD Union Catalog Architecture
TD OAI
Repository
ETD OAI
Repository
WorldCat
VT ODL Demo Search/Browse
Virtua
UnionCatalog
email FTP
OAI-PMH
OAI-PMH
OAI-PMH
OAI-PMH
20+ sites
OCLC
VTLS
SRU/SRW (search)
Try: Z39.50 harvest
Union Catalog Creation
NDLTD Site / Member
Local DB
OAI Server
Local Search / Browse
Student Entry
NDLTD Central
OAI Harvester
Name Authority Service
(e.g. OCLC)
MARIAN Union
Catalog
VTLS Union Catalog
MARC DB
Virtua
Conversion
Alternate MARC Transport (ftp? tapes?)
Librarian Verification / Validation / Enrichment / Maintenance
OCLC Capabilities
• Harvesting
  • OAI-PMH versions 1.1 and 2.0
• Harvestable sets
  • Sets by institution
• Searching
  • SRU (Z39.50 on the Web)
  • VTLS
  • Virginia Tech Open Digital Library demo
• Unicode support
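As an illustration of harvesting one institution's set, an OAI-PMH 2.0 `ListRecords` request can be assembled as below. The base URL and set name are placeholders, not OCLC's actual endpoint or set names, and the helper `list_records_url` is hypothetical.

```python
from urllib.parse import urlencode


def list_records_url(base_url, metadata_prefix="oai_dc", set_spec=None,
                     resumption_token=None):
    """Build an OAI-PMH 2.0 ListRecords request URL.

    In OAI-PMH 2.0 a resumptionToken is an exclusive argument, so when one
    is given it replaces metadataPrefix and set.
    """
    if resumption_token:
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
        if set_spec:
            params["set"] = set_spec
    return f"{base_url}?{urlencode(params)}"


# Placeholder endpoint and set name, for illustration only.
first_page = list_records_url("http://example.org/oai", set_spec="vt")
next_page = list_records_url("http://example.org/oai", resumption_token="abc123")
```

A harvester would fetch the first URL, parse the response, and keep requesting with each returned resumption token until none is supplied.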
OCLC Statistics
• 19 Sources
• 61,998 records
  • Probably some overlap
• Adding 1-2 new sites/month
OCLC Metadata Formats
• Dublin Core – All
• ETDMS – 9
• MARC – 5
Complex to Simple
MARC ($50) Dublin Core (DC)
+thesis
ETD-MS
• ETD Metadata Standard
  • XML-encoded metadata standard (content and encoding) for Electronic Theses and Dissertations (ETDs)
• in part conforming to Dublin Core (DC)
• using UNICODE
• (optionally / later using RDF)
• Well specified relationship with MARC
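To make the idea of Dublin Core plus thesis-specific fields concrete, the sketch below builds a small XML record. The Dublin Core namespace is the standard one, but the `thesis` namespace URI, the `record`, `degree`, and `level` element names, and the sample values are placeholders chosen for illustration; consult the ETD-MS specification for the real schema.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"
# Placeholder namespace, NOT the real ETD-MS namespace.
THESIS_NS = "http://example.org/thesis"

ET.register_namespace("dc", DC_NS)
ET.register_namespace("thesis", THESIS_NS)


def build_record(title, creator, language, degree_level):
    """Build a Dublin Core style record with one thesis-specific extension."""
    record = ET.Element(f"{{{THESIS_NS}}}record")
    for name, value in (("title", title), ("creator", creator),
                        ("language", language)):
        ET.SubElement(record, f"{{{DC_NS}}}{name}").text = value
    degree = ET.SubElement(record, f"{{{THESIS_NS}}}degree")
    ET.SubElement(degree, f"{{{THESIS_NS}}}level").text = degree_level
    return record


# Sample values are invented; the non-ASCII title exercises Unicode handling.
record = build_record("Ein Beispiel für eine Dissertation", "Muster, Erika",
                      "de", "doctoral")
xml_bytes = ET.tostring(record, encoding="utf-8")
```

Serializing as UTF-8 keeps titles in any language intact, which is the point of the Unicode requirement.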
NDLTD Members and ETD-MS
• NDLTD members
  • Share metadata for their ETDs
    • Providing that in ETD-MS
    • Or, if they use a version of MARC locally, work to have that eventually shared in either MARC21 or UNIMARC
  • Run OAI, either locally or in consortia, so their metadata can be harvested, according to necessary terms and conditions
The OAI Static Repository Model
• Components of the model
  • The static repository
    • An XML file with a well-defined structure, containing information similar to that in OAI-PMH responses
    • Accessible at a persistent network location
  • The static repository gateway
    • Makes one or more Static Repositories harvestable
    • Assigns a unique base URL to each such Static Repository
    • Responds to OAI-PMH requests
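The gateway's bookkeeping role can be sketched as a mapping from assigned base URLs to static repository files. The class name, the URL-encoding scheme used to derive a base URL, and the example addresses are all assumptions made for illustration; the OAI static repository specification defines the actual rules.

```python
from urllib.parse import quote


class StaticRepositoryGateway:
    """Hypothetical gateway that makes static repository files harvestable."""

    def __init__(self, gateway_url):
        self.gateway_url = gateway_url.rstrip("/")
        # Maps each assigned base URL to the static repository's location.
        self._repositories = {}

    def register(self, static_repo_url):
        """Assign a unique base URL to a static repository (assumed scheme)."""
        base_url = f"{self.gateway_url}/{quote(static_repo_url, safe='')}"
        self._repositories[base_url] = static_repo_url
        return base_url

    def resolve(self, base_url):
        """Find the XML file that backs a harvester's request to a base URL."""
        return self._repositories[base_url]


gateway = StaticRepositoryGateway("http://gateway.example.org/oai/")
base_a = gateway.register("http://a.example.org/etds.xml")
base_b = gateway.register("http://b.example.org/etds.xml")
```

A harvester sends ordinary OAI-PMH requests to the assigned base URL; the gateway reads the corresponding XML file and answers on the repository's behalf.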
The OAI Static Repository Model
NDLTD Union Catalog Statistics
1. Participating Countries
So far, ETDs from 7 countries are included in the database: Canada, Germany, Greece, Korea, Portugal, Spain, U.S.
UK to be added by June 30, 2002. Brazil to be added soon.
NDLTD Union Catalog Statistics
2. Interface Languages in Union Catalog
The language here is the language of the interface. The VTLS NDLTD Union Catalog has 14 languages: English, Arabic, Catalan, Chinese, French, German, Hebrew, Korean, Polish, Portuguese, Russian, Slovak, Spanish, and Swedish.
Example follows
German
NDLTD Union Catalog Statistics
3. Languages in the Union Catalog
The language here is the language of the content of the ETD. The VTLS NDLTD Union Catalog has data in 6 different languages: English, German, Greek, Korean, Portuguese, and Spanish.
Examples follow
Language = German; hits = 137
Full record display
Language = Greek
In Greek
In English
Other Topics
• Extended services: linking
• Retrospective conversion
• Z39.50
• Requiring ETDs
• …
Collaborative Development
(Joan Lippincott)
Why Collaboration?
• Expertise in aspects of the digital environment
• Pooling of resources
Collaboration and digital projects
• Distributed systems
• Digital course content
• Digital library resources
• Delivery of services
• Development of policies
Collaborations involve:
• Shared goals
• Common vision
• Shared vocabulary
Two views of an ETD program
• Have staff scan
• Implement now
• Increase university visibility
• Teach students to write and submit ETDs
• Implement soon
• Develop electronic authors
In a collaboration...
• Each contributes resources
• Partners acknowledge and value contributions
• Partners develop a clear process
• Group and individual accountability
ETD project participants
• Academic administrators
• Faculty
• Students
• Staff
• Graduate school / provost / registrar
• Information technologists
• Librarians
Collaboration and NDLTD
• Common goals of members
• Diverse sets of skills and expertise
• Need for strategies and tactics to surmount any problems -> advocacy
Collaborative project strategy
• Champion initiates project
• Leadership establishes initial goal and parameters
• Issue a call for participants
• Conduct procedure to select participants
Collaborative project strategy
• Initial meeting
  • Develop shared goals
  • Develop clear process
• Continue work at institutions
  • Establish communication channels
  • Establish project milestones
  • Evaluate progress, refine approach
Collaborative project strategy
• Disseminate results
• Online documentation
• In-person event
• Disseminate a product
• Regional workshops
• Session at ETD 20XX
NDLTD project areas
• Training materials
• Promotional materials
• Identify and recommend standards
• Local, national, regional policies
Your Plans
(Ana Pavani)