
Digital Libraries with Greenstone: an open source solution

Tod Olson - University of Chicago

Fred Miller - Illinois Wesleyan University

Curtis Kelch - Illinois Wesleyan University

Copyright Tod Olson, Fred Miller, and Curtis Kelch 2004. This work is the intellectual property of the authors. Permission is granted for this material to be shared for non-commercial, educational purposes, provided that this copyright statement appears on the reproduced materials and notice is given that the copying is by permission of the author. To disseminate otherwise or to republish requires written permission from the author.

Digital Libraries with Greenstone

• Introduction

• About digital libraries

• Greenstone overview

• Examples

• Future

• Live demos

• Q & A

The World of Digital Libraries

• Access to Digital Collections
  – Text, images, audio, video
  – Searching and metadata

• Digital libraries versus repositories
  – Access and preservation

• Digital Preservation Tutorial http://www.library.cornell.edu/iris/tutorial/dpm/

Sorting Out the Ingredients

• Raw materials

• User interface

• Elements of organization

• Building the collection

Greenstone

New Zealand Digital Library Project at the University of Waikato
• with UNESCO, Human Info NGO

International, every continent

Examples:
• Academic
  – Digitization projects
  – Classes on digital libraries
• Non-academic
  – UNESCO humanitarian documentation

Greenstone features

• Works with existing documents
  – Imports several formats
• Searching: full text and metadata
  – Dublin Core, custom metadata
• Browse
• Structured documents
  – Indexing, access
• Extensible & customizable
• Open source software (GPL)

Greenstone Architecture

[Figure: architecture diagram. Receptionists communicate through the Protocol with Collection Servers; each Collection is imported into the DB & Indexes that a Collection Server uses.]

Redrawn from Witten & Bainbridge, How to Build a Digital Library, p. 356

Greenstone Architecture

Receptionist
• Provides user interface
• Accept user input
• Send to appropriate collection server
• Accept results
• Dynamic page generation

Collection Server
• Handle collection content
• Search and filter information
• Return results
• Multiple collections
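The split of responsibilities above can be sketched in Python. This is a toy model, not Greenstone code: the class names `Receptionist` and `CollectionServer`, the in-memory dictionaries, the substring search, and the sample documents are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class CollectionServer:
    """Toy collection server: holds content, searches it, returns results."""
    # Maps collection name -> {document id -> text}; one server may hold several collections.
    collections: dict = field(default_factory=dict)

    def search(self, collection, query):
        docs = self.collections.get(collection, {})
        needle = query.lower()
        return sorted(doc_id for doc_id, text in docs.items() if needle in text.lower())


class Receptionist:
    """Toy receptionist: accepts user input, routes it, builds the result page."""

    def __init__(self, servers):
        self.servers = servers

    def handle(self, collection, query):
        # Send the request to the server that holds the requested collection.
        for server in self.servers:
            if collection in server.collections:
                hits = server.search(collection, query)
                return self.render(collection, query, hits)
        return f"<p>Unknown collection: {collection}</p>"

    @staticmethod
    def render(collection, query, hits):
        # Dynamic page generation: the page is assembled at request time.
        # (No HTML escaping here; a real interface would need it.)
        items = "".join(f"<li>{doc_id}</li>" for doc_id in hits)
        return f"<h1>{collection}: {query}</h1><ul><li>".replace("<ul><li>", "<ul>") + items + "</ul>"


# Made-up sample data for two collections on two servers.
server_a = CollectionServer({"chopin": {"op10": "Etudes, Op. 10", "op28": "Preludes, Op. 28"}})
server_b = CollectionServer({"argus": {"1894-01": "First issue of the Argus"}})
front_desk = Receptionist([server_a, server_b])
page = front_desk.handle("chopin", "etudes")
print(page)
```

The point of the separation is that the receptionist never touches collection data directly, so collection servers can be added or moved without changing the user interface.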

Building Collections

[Figure: HTML, PDF, and other (???) source documents → Import → GSAF → Build → DB & Indexes]

Building collections

• Create a collection framework
  – or work with an old collection
• Select documents
• Import documents
  – Converts to internal XML format (GSAF)
• Build collection
  – creates search indexes and browse listings
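To make the build step concrete, here is a minimal Python sketch of what "search indexes and browse listings" can mean. It does not reproduce Greenstone's own indexing; the functions `build_search_index` and `build_browse_list` and the two sample documents are hypothetical.

```python
import re
from collections import defaultdict

# Hypothetical imported documents: metadata plus text, standing in for GSAF files.
documents = {
    "doc1": {"Title": "Mazurkas", "text": "Early edition of the mazurkas"},
    "doc2": {"Title": "Ballades", "text": "Early edition of the ballades"},
}


def build_search_index(docs):
    """Map each lowercased word to the sorted ids of the documents containing it."""
    index = defaultdict(set)
    for doc_id, doc in docs.items():
        for word in re.findall(r"\w+", doc["text"].lower()):
            index[word].add(doc_id)
    return {word: sorted(ids) for word, ids in index.items()}


def build_browse_list(docs, field="Title"):
    """Group document ids under the first letter of a metadata field, in title order."""
    browse = defaultdict(list)
    for doc_id, doc in sorted(docs.items(), key=lambda item: item[1][field]):
        browse[doc[field][0].upper()].append(doc_id)
    return dict(browse)


search_index = build_search_index(documents)
browse_list = build_browse_list(documents)
print(search_index["early"])
print(browse_list)
```

Both structures are computed once at build time, which is why a collection has to be rebuilt after new documents are imported.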

GSAF: internal XML format

Section:
• Description
  – Metadata fields
• Content
  – Text, internal markup, images
• Section
  – No limit in number or depth

GSAF: internal XML format

Hierarchical documents: sections nest, tree structure

<Section>
  <Description><Metadata name="Title" value="…"/></Description>
  <Content>[Text, images, links, etc.]</Content>
  <Section>
    <Description><Metadata name="Title" …/></Description>
    <Content>…</Content>
    <Section>…</Section>
  </Section>
  <Section>…</Section>
  <Section>…</Section>
</Section>
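Because sections nest to any depth, a recursive walk is the natural way to read a GSAF document. The sketch below assumes the schematic element and attribute layout shown on the slide (the files Greenstone actually writes may differ in detail); the sample document and the `outline` helper are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical document following the slide's schematic layout.
SAMPLE = """
<Section>
  <Description><Metadata name="Title" value="Score"/></Description>
  <Content>Cover text</Content>
  <Section>
    <Description><Metadata name="Title" value="Page 1"/></Description>
    <Content>First page</Content>
  </Section>
  <Section>
    <Description><Metadata name="Title" value="Page 2"/></Description>
    <Content>Second page</Content>
  </Section>
</Section>
"""


def outline(section, depth=0):
    """Return (depth, title) pairs for a section and every section nested in it."""
    title = None
    for meta in section.findall("Description/Metadata"):
        if meta.get("name") == "Title":
            title = meta.get("value")
    rows = [(depth, title)]
    # findall("Section") matches direct children only, so recursion tracks depth.
    for child in section.findall("Section"):
        rows.extend(outline(child, depth + 1))
    return rows


root = ET.fromstring(SAMPLE.strip())
toc = outline(root)
for depth, title in toc:
    print("  " * depth + str(title))
```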

Config file: collect.cfg

Collection-specific configuration file, collect.cfg, specifies:

• File types to import
• Indexes and browse lists
  – Document or section level
  – Paragraph (text index only)
• Display of results and browse listings
• Document displays
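As a rough illustration of how index levels are expressed, the Python sketch below parses a small collect.cfg-style fragment. The fragment is hypothetical: the directive names (`indexes`, `plugin`, `classify`) and the `level:source` notation reflect my understanding of Greenstone 2 configuration files and should be checked against the Greenstone documentation. The parser itself is not part of Greenstone.

```python
import shlex

# Hypothetical fragment in the style of a Greenstone 2 collect.cfg.
SAMPLE_CFG = """\
# which documents to import, what to index, how to browse
indexes document:text section:Title paragraph:text
plugin HTMLPlug
plugin PDFPlug
classify AZList -metadata Title
"""


def parse_config(text):
    """Collect directives into {directive: [argument list, ...]}."""
    config = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        directive, *args = shlex.split(line)
        config.setdefault(directive, []).append(args)
    return config


def index_specs(config):
    """Split each 'level:source' index entry into a (level, source) pair."""
    specs = []
    for args in config.get("indexes", []):
        for entry in args:
            level, _, source = entry.partition(":")
            specs.append((level, source))
    return specs


config = parse_config(SAMPLE_CFG)
specs = index_specs(config)
print(specs)
```

In this fragment the paragraph level appears only with the text source, matching the restriction noted on the slide.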

Chopin Early Editions

Over 400 early edition Chopin scores, 1830s to 1880s

Target audience: music scholars & musicians.

On web, page-turnable JPEG images. Online in March 2003

Currently 374 scores in online collection

Usage: nearly 100 hits per day; more than 30% of use is international.

Build overview

[Figure: catalog records, scanned images, and structural metadata → METS → XSLT → Greenstone Archive Format → Greenstone Digital Library Software. The stages are labeled "Human processing" and "XML-based automated processing".]

METS to GSAF

METS:

dmdSec
  MODS: Title, …
fileSec
  page1.jpg
  page2.jpg
structMap
  div: Score
    div: Page 1
    div: Page 2

GSAF:

Section
  Description
    Metadata: Title, …
  Content: Title, …
  Section
    Content: Page 1, page1.jpg
  Section
    Content: Page 2, page2.jpg
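The project performed this mapping with XSLT. Purely to illustrate the same idea, here is a Python sketch that walks a heavily trimmed, hypothetical METS record and emits nested sections in the slide's schematic GSAF layout. The sample record, the `mets_to_gsaf` function, and the choice to give every section a Title are assumptions, not the project's stylesheet.

```python
import xml.etree.ElementTree as ET

NS = {
    "mets": "http://www.loc.gov/METS/",
    "mods": "http://www.loc.gov/mods/v3",
    "xlink": "http://www.w3.org/1999/xlink",
}

# Hypothetical, heavily trimmed METS record for a two-page score.
METS = """
<mets:mets xmlns:mets="http://www.loc.gov/METS/"
           xmlns:mods="http://www.loc.gov/mods/v3"
           xmlns:xlink="http://www.w3.org/1999/xlink">
  <mets:dmdSec ID="dmd1"><mets:mdWrap MDTYPE="MODS"><mets:xmlData>
    <mods:mods><mods:titleInfo><mods:title>Sample Score</mods:title></mods:titleInfo></mods:mods>
  </mets:xmlData></mets:mdWrap></mets:dmdSec>
  <mets:fileSec><mets:fileGrp>
    <mets:file ID="f1"><mets:FLocat LOCTYPE="URL" xlink:href="page1.jpg"/></mets:file>
    <mets:file ID="f2"><mets:FLocat LOCTYPE="URL" xlink:href="page2.jpg"/></mets:file>
  </mets:fileGrp></mets:fileSec>
  <mets:structMap><mets:div LABEL="Score">
    <mets:div LABEL="Page 1"><mets:fptr FILEID="f1"/></mets:div>
    <mets:div LABEL="Page 2"><mets:fptr FILEID="f2"/></mets:div>
  </mets:div></mets:structMap>
</mets:mets>
"""


def mets_to_gsaf(mets_xml):
    root = ET.fromstring(mets_xml.strip())
    # dmdSec: descriptive metadata (the MODS title) becomes the top-level Title.
    title = root.findtext(".//mods:title", default="", namespaces=NS)
    # fileSec: map each file ID to the file name it points at.
    href = "{%s}href" % NS["xlink"]
    files = {
        f.get("ID"): f.find("mets:FLocat", NS).get(href)
        for f in root.iterfind(".//mets:file", NS)
    }

    def convert(div, section_title):
        # structMap: each div becomes a Section; nested divs become nested Sections.
        section = ET.Element("Section")
        description = ET.SubElement(section, "Description")
        ET.SubElement(description, "Metadata", name="Title", value=section_title)
        names = [files[p.get("FILEID")] for p in div.findall("mets:fptr", NS)]
        ET.SubElement(section, "Content").text = " ".join([section_title] + names)
        for child in div.findall("mets:div", NS):
            section.append(convert(child, child.get("LABEL", "")))
        return section

    return convert(root.find("mets:structMap/mets:div", NS), title)


gsaf = mets_to_gsaf(METS)
print(ET.tostring(gsaf, encoding="unicode"))
```

The mapping is direct because both formats are trees: the METS structMap supplies the shape, while dmdSec and fileSec supply what goes into each node.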

Greenstone benefits for Chopin

• Robust, mature system
• Recovered time in project
  – Fast to bring up
  – UI out of the box
  – Dynamic page generation
  – Incremental customization
• XML compliant
  – Natural mapping from METS to GSAF

The Argus Digital Collection

• Illinois Wesleyan Student Newspaper
  – 1894 to 2000

• Preservation and Access

• Image PDF versus full text

• Web interface for building metadata

• Customized searches

Argus Metadata Maintenance

Argus Search

Argus Issue “front door”

Ongoing work: Greenstone

• Greenstone Librarian Interface (GLI)

• Greenstone 3

Greenstone Librarian Interface (GLI)

• Collection management
  – Informed by work at GS sites
  – Assist collection designer
  – Support all phases of collection build process
  – Do not specify workflow
• Java-based GUI tool
  – Formerly called the “Gatherer”
• 2 yrs in development
  – Beta sites: Bangalore and elsewhere
• Training sessions
  – UNESCO sessions in Asia, Africa
  – JCDL 2004 tutorial

Greenstone 3

GS2: mature, 5+ yrs., wide deployment
  – Constraints: support legacy systems
  – Other technologies have matured: Java, XML

GS3: rewrite in Java, XML, XSLT
• Distributed architecture, SOAP
• METS as internal format
  – Group assembled for Greenstone METS profile(s)
• OAI support planned
• 1 year in dev; alpha testing in lab

Links & Further Information

Greenstone: http://www.greenstone.org/

Chopin Early Editions: http://chopin.lib.uchicago.edu/

Argus Digital Collection: http://www.iwu.edu/library/services/argus1.htm

Argus Greenstone Documentation: http://www.iwu.edu/~ckelch/ArgusProjectDoc12.pdf

Witten & Bainbridge. How to Build a Digital Library. Morgan Kaufmann, 2003.
