
Page 1: Document management  (aka ‘digital libraries’)

Document management (aka ‘digital libraries’)

The Greenstone Group:

Professor Ian Witten (leader); David Bainbridge, Dave Nichols, S.J. Cunningham, Steve Jones, Te Taka Keegan, Annika Hinze

Page 2: Document management  (aka ‘digital libraries’)

Our work includes…

• Document management
• Content management
• Metadata management
• Multimedia documents
• Alerting and event notification support
• OCR-ing services
• Document & collection visualization
• User needs analysis
• Text mining
• Automatic metadata extraction

Page 3: Document management  (aka ‘digital libraries’)

Greenstone software

• ‘digital library’ construction, use, and maintenance software

• Developed at Waikato (www.greenstone.org)
• Open Source
• Widely used internationally (UNESCO, FAO, Texas A&M Uni, Kyrgyz Republic, …)

Digital library: A collection of digital objects (text, video, audio) along with methods for access and retrieval [user], and for selection, organisation, and maintenance [librarian]

Page 4: Document management  (aka ‘digital libraries’)

Greenstone software features

• “Library” = set of separate collections
• “Collection” = set of separate documents
• Multigigabyte collections
• Hierarchical document model
• Multimedia: picture, voice, music, video collections
• Multi-language documents: Unicode throughout
• Multi-language interfaces: French, Chinese, Arabic …
• Web browser or CD-ROM
• Searching: full-text and fielded, ranked or boolean
• Browsing: hierarchical indexes created from metadata
• Metadata: Dublin Core + collection-specific extensions
• Plugins: different document types and metadata specifications
• Classifiers: create browsing indexes (collection editor decides)
• Compression techniques throughout: uses MG
• Distributed collections: coming soon, with CORBA
• Open-source software: free, extensible

Collections

Documents

Access

Importing

Distributing
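The "full-text, ranked or boolean" searching listed among the features can be illustrated with a small inverted index. This is only a sketch under stated assumptions: the function names, the sample documents, and the tf × idf weighting are all illustrative choices made here, and none of it is Greenstone's or MG's actual code or scoring scheme.

```python
import math
import re
from collections import Counter, defaultdict

def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(docs):
    """Map each term to {doc_id: term frequency} (a simple inverted index)."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for term, tf in Counter(tokenize(text)).items():
            index[term][doc_id] = tf
    return index

def boolean_and(index, query):
    """Boolean search: ids of documents that contain every query term."""
    terms = tokenize(query)
    if not terms:
        return set()
    postings = [set(index.get(term, {})) for term in terms]
    return set.intersection(*postings)

def ranked(index, query, n_docs):
    """Ranked search: order documents by a simple tf * idf score (assumed weighting)."""
    scores = defaultdict(float)
    for term in tokenize(query):
        postings = index.get(term, {})
        if not postings:
            continue
        idf = math.log(1 + n_docs / len(postings))
        for doc_id, tf in postings.items():
            scores[doc_id] += tf * idf
    # Highest score first; ties broken by document id for stable output.
    return sorted(scores, key=lambda d: (-scores[d], d))

# Hypothetical three-document collection.
docs = {
    "d1": "digital library software for building collections",
    "d2": "music retrieval in a digital music library",
    "d3": "metadata extraction from legacy documents",
}
index = build_index(docs)
print(boolean_and(index, "digital library"))
print(ranked(index, "music library", len(docs)))
```

The boolean query returns an unordered set of matching documents, while the ranked query also returns partial matches, ordered by how strongly they match.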

Page 5: Document management  (aka ‘digital libraries’)

Greenstone supports: multilanguage documents

Page 6: Document management  (aka ‘digital libraries’)

Greenstone supports: hierarchically structured documents

A book
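As a rough illustration of what a hierarchical document model might look like, the sketch below represents a book as nested sections and walks it to produce a numbered outline. The `Section` class and `outline` function are hypothetical names introduced here, not Greenstone's internal representation.

```python
from dataclasses import dataclass, field

@dataclass
class Section:
    """One node in a document hierarchy (hypothetical structure)."""
    title: str
    text: str = ""
    children: list["Section"] = field(default_factory=list)

def outline(section, number=""):
    """Yield (number, title) pairs for a section and all of its descendants."""
    yield number, section.title
    for i, child in enumerate(section.children, start=1):
        child_number = f"{number}.{i}" if number else str(i)
        yield from outline(child, child_number)

# A small example book with two chapters.
book = Section("A book", children=[
    Section("Chapter 1", children=[Section("Section 1.1"), Section("Section 1.2")]),
    Section("Chapter 2"),
])

for number, title in outline(book):
    print(number, title)
```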

Page 7: Document management  (aka ‘digital libraries’)

Greenstone supports: collection design, maintenance

Designing a collection with the Gatherer


Page 8: Document management  (aka ‘digital libraries’)

Greenstone supports: a wide (and growing) set of file formats

• DOC
• PDF
• XLS
• LaTeX
• Refer
• MARC
• …
• highly extensible through ‘plugin’ mechanism
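One way to picture a plugin mechanism of this kind is a registry that maps file types to import functions, so that supporting a new format means registering one more function. The sketch below is an assumption-laden illustration: the `plugin` decorator, the `PLUGINS` registry, and the record layout are invented for this example and are not Greenstone's plugin API. The Refer handler covers only the `%T`, `%A`, and `%D` tags.

```python
from pathlib import Path

PLUGINS = {}  # hypothetical registry: file extension -> import function

def plugin(*extensions):
    """Decorator that registers an import function for the given extensions."""
    def register(func):
        for ext in extensions:
            PLUGINS[ext.lower()] = func
        return func
    return register

@plugin(".txt")
def import_text(path, data):
    """Plain text: use the first non-empty line as the title."""
    text = data.decode("utf-8", errors="replace")
    title = next((ln.strip() for ln in text.splitlines() if ln.strip()), path.stem)
    return {"title": title, "format": "text", "text": text}

@plugin(".refer")
def import_refer(path, data):
    """Refer-style records: '%T' title, '%A' author (repeatable), '%D' date."""
    fields = {"%T": "title", "%A": "creator", "%D": "date"}
    record = {"format": "refer", "creator": []}
    for line in data.decode("utf-8", errors="replace").splitlines():
        tag, _, value = line.partition(" ")
        name = fields.get(tag)
        if name == "creator":
            record["creator"].append(value.strip())
        elif name:
            record[name] = value.strip()
    return record

def import_document(path, data):
    """Dispatch to whichever plugin is registered for the file's extension."""
    path = Path(path)
    handler = PLUGINS.get(path.suffix.lower())
    if handler is None:
        raise ValueError(f"no plugin registered for {path.suffix or path.name}")
    return handler(path, data)

# Hypothetical input record.
sample = b"%A A. Author\n%A B. Author\n%T An Example Title\n%D 2001\n"
print(import_document("paper.refer", sample))
```

Keeping the dispatch in one place means the importing code never needs to change when a format is added; only the registry grows.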

Page 9: Document management  (aka ‘digital libraries’)

Mobile document access

• handheld information access
• browsing methods for varying screen sizes
• studies on search behaviour (on- and off-line)
• support for non-text documents (FunkyZoom views of maps, images)

Page 10: Document management  (aka ‘digital libraries’)

Browsing and exploration: hierarchical phrase index

• What’s in this collection?
• Is it any good?
• What coverage for topic X?
• My query returned too much/little, what now?
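To make the idea of a hierarchical phrase index concrete, the sketch below counts word n-grams across a collection and lets a reader expand a phrase into the longer phrases that contain it, most frequent first. It is a simplified illustration only: the function names, the n-gram length limit, and the sample documents are assumptions made here, and this is not the algorithm used in Greenstone's phrase browser.

```python
import re
from collections import Counter

def phrase_counts(docs, max_len=3):
    """Count every word n-gram of length 1..max_len across all documents."""
    counts = Counter()
    for text in docs:
        words = re.findall(r"[a-z]+", text.lower())
        for n in range(1, max_len + 1):
            for i in range(len(words) - n + 1):
                counts[tuple(words[i:i + n])] += 1
    return counts

def expand(counts, phrase):
    """List phrases one word longer that contain `phrase`, most frequent first."""
    base = tuple(phrase.lower().split())
    longer = [
        (p, c) for p, c in counts.items()
        if len(p) == len(base) + 1 and (p[:-1] == base or p[1:] == base)
    ]
    longer.sort(key=lambda pc: (-pc[1], pc[0]))
    return [(" ".join(p), c) for p, c in longer]

# Hypothetical mini-collection.
docs = [
    "forest management and forest products",
    "sustainable forest management in practice",
    "community forest management",
]
counts = phrase_counts(docs)

# Start from a single word, then drill down into a longer phrase.
print(expand(counts, "forest"))
print(expand(counts, "forest management"))
```

Browsing from a word to its most frequent expansions gives a quick sense of what a collection covers for a topic, without having to guess the right query first.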

Page 11: Document management  (aka ‘digital libraries’)

Recent and proposed projects

• Making documents mobile: moving between large online collections and a PDA

• Text mining: extracting quality metadata from legacy documents

• User needs analysis: what sort of documents do a given set of users require, and how can the collection be managed?

• Visualization: making it easy to ‘see’ what’s in a collection, and supporting effective browsing

Page 12: Document management  (aka ‘digital libraries’)

Recent and proposed projects

• Multi-language collections: tailoring a document collection interface and interaction mechanisms to the language of its users

• Alerting services: bringing potentially useful documents to the user’s attention, without overwhelming them

• Supporting unusual users: collections for the physically disabled, illiterate or semi-literate, children, …

• Audio and image collections: novel browsing and searching mechanisms

Page 13: Document management  (aka ‘digital libraries’)

Recent and proposed projects

• Storage and searching: developed highly efficient techniques for storing, indexing, and searching text documents; implemented in Greenstone, but portable to other document management software

• Usability analysis: how easy is it to use your current document collection? How can access be improved?

• And a host of wacky and cool things: collaging document collections, music retrieval systems, ‘aerial’ views of documents, …