Botanicus.org: Applying emerging technology to historic scientific literature
Chris Freeland, Doug Holland
Missouri Botanical Garden


Botany 2007, Chicago, IL.




Published literature is the foundation on which biological science is based

Botany & systematics are sciences built on accumulated knowledge


Taxonomic Literature

• Over 250 years of systematic description of life

• Systema naturae (10th ed. 1758) by Carl von Linné


The cited half-life of publications in taxonomy is longer than in any other scientific discipline

The decay rate is slower than in any other scientific discipline

- Macro-economic case for open access, Tom Moritz

Taxonomic Literature
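In bibliometrics, cited half-life is usually defined as the median age of the cited items: half of the citations point to items published more recently than that age, half to older ones. The sketch below computes it with a plain median over a hypothetical list of cited publication years; citation-index services interpolate within years, so treat this as an approximation of their figures, not a reimplementation.

```python
from statistics import median


def cited_half_life(citing_year, cited_years):
    """Median age, in years, of the items cited during `citing_year`.

    Simplified: uses a plain median instead of the within-year
    interpolation that citation-index services apply.
    """
    ages = [citing_year - year for year in cited_years]
    return median(ages)


# Hypothetical reference list from a 2007 taxonomic paper.
print(cited_half_life(2007, [1753, 1900, 1990, 2000, 2005]))
```

A field where papers routinely cite 18th- and 19th-century descriptions pushes this median far higher than a field that mostly cites the last few years.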


How historic literature is used


Taxonomic Impediment

• Specimen collections
• Databases
• Publications
• Observations
• ‘Gray’ literature
• Index cards
• Field notebooks


www.botanicus.org

A freely accessible, Web-based encyclopedia of digitized botanical literature, sponsored by the Missouri Botanical Garden Library

• 650,000+ pages of text

• 1,300 volumes, 200 titles

• 145,000 linked protologues

• ~10TB of data
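The collection figures above imply some rough averages. Because the slide gives lower bounds and approximations ("650,000+", "~10TB"), these are ballpark numbers only, and the storage figure assumes decimal terabytes and that the 10 TB is spread evenly across all pages (the slide does not say how it splits between master images and derivatives):

```latex
\[
\frac{1{,}300\ \text{volumes}}{200\ \text{titles}} = 6.5\ \text{volumes per title},
\qquad
\frac{650{,}000\ \text{pages}}{1{,}300\ \text{volumes}} = 500\ \text{pages per volume}
\]
\[
\frac{10\ \text{TB}}{650{,}000\ \text{pages}}
= \frac{10^{13}\ \text{bytes}}{6.5 \times 10^{5}\ \text{pages}}
\approx 1.5 \times 10^{7}\ \text{bytes}
\approx 15\ \text{MB per page}
\]
```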


Workflow

[Workflow diagram; stages shown: Selection, Preparation, Conservation, Digitization, Post Production, Enhancement, Metadata, Publication]


Selection


Preparation

• Review bibliographic metadata in MOBOT library catalog
  – Clean up, if needed

• Extract MARC
  – Transform to MARCXML
  – Parse into Botanicus DB

• Review title & determine which scanning device to use
  – Possible trip through Conservation
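The "parse into Botanicus DB" step can be sketched with Python's standard library. This assumes the MARC record has already been transformed to MARCXML (Library of Congress MARC21 "slim" namespace) and loads a few fields into a hypothetical `title` table; the table layout, the sample record and its control number are illustrative, not the actual Botanicus schema or catalog data.

```python
import sqlite3
import xml.etree.ElementTree as ET

# MARCXML elements live in the Library of Congress MARC21 "slim" namespace.
NS = {"marc": "http://www.loc.gov/MARC21/slim"}


def first_subfield(record, tag, code):
    """Return the text of the first $code subfield of datafield `tag`, or None."""
    for field in record.findall(f"marc:datafield[@tag='{tag}']", NS):
        sub = field.find(f"marc:subfield[@code='{code}']", NS)
        if sub is not None and sub.text:
            # Catalog records often end fields with ISBD punctuation (" :", " /", ",").
            return sub.text.strip().rstrip(" /:;,.").strip()
    return None


def parse_marcxml(xml_text):
    """Turn a MARCXML <record> or <collection> into simple title rows."""
    root = ET.fromstring(xml_text)
    records = [root] if root.tag.endswith("}record") else root.findall("marc:record", NS)
    rows = []
    for rec in records:
        control = rec.find("marc:controlfield[@tag='001']", NS)
        rows.append({
            "control_number": control.text.strip() if control is not None and control.text else None,
            "title": first_subfield(rec, "245", "a"),   # title statement
            "author": first_subfield(rec, "100", "a"),  # main entry, personal name
        })
    return rows


def load_titles(rows, conn):
    """Insert or refresh rows in a hypothetical `title` table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS title ("
        "control_number TEXT PRIMARY KEY, title TEXT, author TEXT)"
    )
    conn.executemany(
        "INSERT OR REPLACE INTO title VALUES (:control_number, :title, :author)", rows
    )
    conn.commit()


# Hypothetical, simplified record; the control number is made up.
SAMPLE = """<record xmlns="http://www.loc.gov/MARC21/slim">
  <controlfield tag="001">example-0001</controlfield>
  <datafield tag="100" ind1="1" ind2=" ">
    <subfield code="a">Linné, Carl von,</subfield>
  </datafield>
  <datafield tag="245" ind1="1" ind2="0">
    <subfield code="a">Systema naturae :</subfield>
  </datafield>
</record>"""

rows = parse_marcxml(SAMPLE)
conn = sqlite3.connect(":memory:")
load_titles(rows, conn)
```

`INSERT OR REPLACE` keyed on the control number lets a cleaned-up catalog record be re-extracted and reloaded without creating duplicates. The earlier MARC-to-MARCXML transform is not shown here.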


Digitization

5 full-time scanners

3 Indus 5002 book scanners

1 Kodak i280 sheet-feed scanner


Post Production – Custom Apps

• PageConvert
  – JPEG2000 (*.jp2) creation
  – Thumbnail creation
  – Moves derivative images to server
  – Updates item records to prepare for publishing
  – Runs on each scanning workstation

• PagePublish
  – Looks for items ready to publish
  – Creates or updates page records
  – Guesses page “types” (text or illustration)
  – Triggers OCR generation and PDF creation
  – Updates title and item records to “publish ready”
  – Runs centrally
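A PagePublish-style pass can be pictured as a small status machine: pick up items that PageConvert has finished, build page records, hand the item to OCR and PDF creation, and mark it publish-ready. The sketch below is hypothetical throughout. The status names, the callables standing in for the OCR and PDF hand-offs, and especially the mid-tone heuristic for guessing text versus illustration pages are assumptions, since the slide does not describe how the real application works.

```python
from dataclasses import dataclass, field
from enum import Enum


class Status(Enum):
    SCANNED = "scanned"              # raw page images exist
    CONVERTED = "converted"          # derivatives (JP2, thumbnails) are done
    PUBLISH_READY = "publish ready"  # pages built, OCR and PDF queued


@dataclass
class Page:
    sequence: int
    page_type: str  # "text" or "illustration"


@dataclass
class Item:
    item_id: str
    status: Status
    page_images: list  # one flat list of 0-255 grayscale values per page
    pages: list = field(default_factory=list)


def guess_page_type(pixels, midtone_share=0.25):
    """Hypothetical heuristic: printed text is mostly near-black or near-white,
    while continuous-tone illustrations contain many mid-gray pixels."""
    midtones = sum(1 for p in pixels if 64 <= p <= 192)
    return "illustration" if midtones / len(pixels) > midtone_share else "text"


def page_publish(items, queue_ocr, queue_pdf):
    """One polling pass: publish every item whose derivatives are finished."""
    published = []
    for item in items:
        if item.status is not Status.CONVERTED:
            continue
        # Create (or rebuild) one page record per image, in scan order.
        item.pages = [
            Page(seq, guess_page_type(img))
            for seq, img in enumerate(item.page_images, start=1)
        ]
        queue_ocr(item.item_id)  # hand off to OCR generation
        queue_pdf(item.item_id)  # hand off to PDF creation
        item.status = Status.PUBLISH_READY
        published.append(item.item_id)
    return published
```

Running one central job that only reacts to status changes keeps the scanning workstations decoupled from OCR and PDF production: a workstation just flips an item to "converted" and moves on.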


Post Production – Packaged Apps

• PrimeOCR
  – 6 voting engines
  – Multi-language support
  – Character coordinates
  – Outputs ASCII text, other formats

• LuraTech PDF Compressor
  – 2 GB of TIF page images compressed to a 30 MB PDF
  – PDF/A
  – OCR (ABBYY FineReader)
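The idea behind "voting engines" is that several OCR engines read the same page and their disagreements are resolved by agreement. PrimeOCR's actual voting method is proprietary; the sketch below is only a simplified character-level majority vote, and it assumes the engines' outputs are already aligned to equal length, which real systems cannot take for granted.

```python
from collections import Counter


def vote(readings):
    """Character-level majority vote across OCR engine outputs.

    Assumes the readings are already aligned character-for-character;
    real voting engines must also reconcile insertions and deletions.
    Ties go to the engine listed first, because Counter.most_common
    keeps equal counts in first-encountered order.
    """
    if len({len(r) for r in readings}) != 1:
        raise ValueError("sketch only handles pre-aligned, equal-length readings")
    return "".join(Counter(column).most_common(1)[0][0] for column in zip(*readings))


# Six hypothetical engines, each making at most one misreading.
readings = ["Quercus alba", "Qucrcus alba", "Quercus aIba",
            "Quercus alba", "0uercus alba", "Quercus alha"]
print(vote(readings))
```

Because each engine tends to fail on different characters, the majority at each position recovers the correct name even though four of the six readings contain an error.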


Enhancement - Paginator


View


Web 2.0 Features

• AJAX interface – JPEG2000 (image compression with zoom)

• Web services – taxonomic intelligence from uBio TaxonFinder and NameBank

• RSS feeds – volumes added and news

• Mash-ups – geocoded subject headings plotted on Google Maps

• Tag clouds
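A "volumes added" RSS feed is a simple XML document. The sketch below builds an RSS 2.0 feed with Python's standard library; the channel link is the site address from these slides, but the channel title and description, the item URLs and the input tuple format are hypothetical, and this is not the feed Botanicus actually served.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone
from email.utils import format_datetime


def volumes_added_feed(volumes):
    """Build an RSS 2.0 document listing recently added volumes, newest first.

    `volumes` is a list of (title, url, added) tuples, where `added` is a
    timezone-aware datetime.
    """
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = "Botanicus: volumes added"
    ET.SubElement(channel, "link").text = "http://www.botanicus.org"
    ET.SubElement(channel, "description").text = "Newly digitized volumes"
    for title, url, added in sorted(volumes, key=lambda v: v[2], reverse=True):
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = title
        ET.SubElement(item, "link").text = url
        ET.SubElement(item, "guid").text = url
        # RSS 2.0 dates use the RFC 822 format, which format_datetime produces.
        ET.SubElement(item, "pubDate").text = format_datetime(added)
    return ET.tostring(rss, encoding="unicode")


# Hypothetical volumes and URLs.
print(volumes_added_feed([
    ("Volume A", "http://example.org/item/a", datetime(2007, 3, 1, tzinfo=timezone.utc)),
    ("Volume B", "http://example.org/item/b", datetime(2007, 4, 1, tzinfo=timezone.utc)),
]))
```

Using the item URL as the `guid` gives feed readers a stable identifier, so a volume is not announced twice when the feed is regenerated.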


9. Page View


• Distributed taxonomic indexing
  – Public-resource computing application that identifies name-like strings in OCR text
  – Bundles of text pages sent to volunteer computers for indexing & results reporting

• Runs as a screensaver

• Built on BOINC, the open-source framework behind SETI@home
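The division of labor described above (bundle OCR pages into work units, let each volunteer machine find name-like strings, report results keyed back to pages) can be sketched as follows. The name finder here is deliberately crude: a capitalized word followed by a lowercase word, kept only if the first word is in a tiny, hypothetical genus list. It is not the bTaxonGrab or TaxonFinder algorithm, and the work-unit format is an assumption rather than SciLINC's actual BOINC setup.

```python
import re

# Hypothetical, tiny genus dictionary; a real indexer would load a full name list.
KNOWN_GENERA = {"Acer", "Quercus", "Rosa"}

# A capitalized word followed by a lowercase word of 3+ letters: a rough
# "binomial-shaped" string, kept only if the first word is a known genus.
CANDIDATE = re.compile(r"\b([A-Z][a-z]+)\s+([a-z]{3,})\b")


def find_names(text, genera=KNOWN_GENERA):
    return [f"{genus} {epithet}" for genus, epithet in CANDIDATE.findall(text)
            if genus in genera]


def make_work_units(pages, size):
    """Bundle OCR page texts into work units of `size` pages, each tagged with
    the index of its first page so results can be mapped back."""
    return [(start, pages[start:start + size]) for start in range(0, len(pages), size)]


def index_work_unit(start, unit_pages):
    """What a volunteer client would compute and report for one work unit."""
    return {start + offset: find_names(text) for offset, text in enumerate(unit_pages)}


# Hypothetical OCR text for three pages.
pages = ["Quercus alba and Acer rubrum grow here.",
         "No names on this page.",
         "Rosa canina L."]
results = {}
for start, unit in make_work_units(pages, 2):
    results.update(index_work_unit(start, unit))
print(results)
```

Tagging each bundle with its starting page index means work units can be processed in any order, on any machine, and still be merged back into a page-level index.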


TIF image from scanner → converted to text via PrimeOCR → name finding via bTaxonGrab → extract names → submit to TaxonFinder → SOAP response

SciLINC in action…


Prof. Newton wrote me that he is extremely excited about your digitization project. At the moment he and his graduate botany students in Kenya have access to very few resources. He spends his summer terms at Kew doing his research for the next year's teaching and writing, but he tells me that now, because of what is already on your site, he will not have to carry so much back to Kenya for his research and his students but can download and work with your resources right there.

-- excerpt re: Botanicus from email August 2006

Taxonomic Impedectomy


The Future

• User accounts
  – User-defined views
  – MyBookshelf: favoriting & sharing

• Wiki-type editing & tagging
  – Metadata enrichment
  – OCR correction by users

• Bibliographic intelligence
  – Improved “click-through” citations
  – Citation finding & linking

• Increased geospatial extraction and visualization


Biodiversity Heritage Library

• American Museum of Natural History (New York)
• Field Museum (Chicago)
• Natural History Museum (London)
• Smithsonian Institution (Washington)
• Missouri Botanical Garden
• New York Botanical Garden
• Royal Botanic Gardens, Kew
• Botany Libraries, Harvard University
• Ernst Mayr Library of the Museum of Comparative Zoology, Harvard University
• Marine Biological Laboratory / Woods Hole Oceanographic Institution


• Core literature pre-1923: 400,000 (80 million pages)

• All pre-1923: 600,000–750,000 (120–150 million pages)

• All literature: 1.4–1.6 million (280–320 million pages)

Biodiversity Heritage Library
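Every page estimate on this slide is consistent with one uniform assumption of about 200 pages per counted item. The slide does not say whether the counts are titles or volumes, so "item" is used here as a neutral placeholder:

```latex
\[
\frac{80 \times 10^{6}}{400{,}000}
= \frac{120 \times 10^{6}}{600{,}000}
= \frac{150 \times 10^{6}}{750{,}000}
= \frac{280 \times 10^{6}}{1.4 \times 10^{6}}
= \frac{320 \times 10^{6}}{1.6 \times 10^{6}}
= 200\ \text{pages per item}
\]
```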


www.biodiversitylibrary.org

Botanicus.org brought to you by:

Botanicus.org brought to you by:

Andrew W. Mellon Foundation, 2000–2004

W. M. Keck Foundation, 2005–2007

Institute of Museum and Library Services (IMLS), 2006–2008


Botanicus.org

Please comment and send questions and suggestions to:

[email protected]

[email protected]