Biodiversity Heritage Library
© 2011 Biodiversity Heritage Library www.biodiversitylibrary.org
Scientific Disciplines: From Discovery to Delivery
Cathy Norton, Deputy Director, BHL
Princeton, January 18, 2011
“The launch of the Encyclopedia of Life will have a profound and creative effect in science… this effort will lay out new directions for research in every branch of biology”
– E.O. Wilson
[Word cloud of terms associated with the Encyclopedia of Life, including: Biodiversity Heritage Library, Tree of Life, taxonomic intelligence, NameBank, ClassificationBank, SpeciesBase, WorkBench, Synthesis Center, Education and Outreach, GBIF, OBIS, FishBase, Atlas of Living Australia, TDWG, EDIT ScratchPad, MBL, MOBOT, NHM, AMNH, NY Botanical, Smithsonian, Harvard, Field Museum, MacArthur Foundation, Sloan Foundation, National Geographic, E. O. Wilson, TED, February 2008]
• Clouds – Building word clouds
• http://tagcrowd.com/ is a service that will make word clouds from arbitrary text or a URL
• Count how many times a word appears
• Decide on font size, large to small
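The two steps above (count the words, then pick a font size) can be sketched in a few lines. This is only an illustration, not how tagcrowd.com works internally: the function name `word_cloud_sizes`, the point-size range, and the linear scaling are all assumptions made for the example.

```python
import re
from collections import Counter


def word_cloud_sizes(text, top_n=10, min_pt=10, max_pt=48):
    """Map the most frequent words in `text` to font sizes in points."""
    # Step 1: count how many times each word appears.
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words).most_common(top_n)
    if not counts:
        return {}
    hi = counts[0][1]
    lo = counts[-1][1]
    span = hi - lo
    sizes = {}
    for word, n in counts:
        # Step 2: scale linearly, so the rarest kept word gets min_pt
        # and the most frequent word gets max_pt.
        scale = (n - lo) / span if span else 1.0
        sizes[word] = round(min_pt + scale * (max_pt - min_pt))
    return sizes


sizes = word_cloud_sizes("species species species names names library")
```

A real service would also drop stop words ("the", "of") before counting; that step is left out here to keep the sketch short.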
Serine Molecule
Biodiversity Heritage Library
Synthesis Center: Field Museum
Informatics: Marine Biological Laboratory & MOBOT
Education & Outreach: Smithsonian/Harvard
Secretariat: Smithsonian
Missouri Botanical Garden
EOL Hierarchy
• The EOL Steering Committee comprises senior authorities from
Harvard University, the Smithsonian Institution, the Field Museum of
Chicago, the Marine Biological Laboratory at Woods Hole, the
Biodiversity Heritage Library consortium, Missouri Botanical Garden,
and the MacArthur and Sloan Foundations.
• The EOL Institutional Council contains more than 25 institutions from
around the world and provides EOL with global perspectives and
outreach capabilities. The Distinguished Advisory Board consists of
13 global leaders from the scientific and policy communities.
Working groups (cont'd)
• The Species Sites Group works with contributors and data providers and handles IP issues.
• The Biodiversity Informatics Group is responsible for the software development of tools and for open access delivery of species information through a single portal.
• The Education and Outreach Group works to ensure widespread awareness of the EOL.
• The Biodiversity Synthesis Group will facilitate cross-disciplinary involvement and will explore integrative topics, including taxonomy, evolution, biogeography, phylogenetics and biodiversity informatics.
• The Scanning and Digitization Group, led by the Biodiversity Heritage Library, is a consortium of 12 natural history, botanical and research libraries that will scan out-of-copyright and permissioned works for the public commons.
Data Partners (cont'd)
• 2008: FishBase (www.fishbase.org), a global information system with data on practically every fish species known to science. FishBase is serving information on more than 30,000 fish species through the EOL.
• The Catalogue of Life Partnership (CoLp) (www.catalogueoflife.org), an informal partnership dedicated to creating an index of the world’s organisms. The EOL currently uses CoLp as its taxonomic backbone.
• Tree of Life web project (ToL) (www.tolweb.org), a collaborative effort of biologists from around the world. On more than 9,000 Web pages, the project provides information about the diversity of organisms on Earth, their evolutionary history (phylogeny), and characteristics. The ToL project illustrates the genetic connections between all living things.
• The Global Biodiversity Information Facility (GBIF) (www.gbif.org), the world’s premier source for information on biological specimen and observational data, providing on-line access to more than 135 million data records from around the world. GBIF is providing range maps for the EOL species pages.
• AmphibiaWeb (http://amphibiaweb.org), an online system enabling anyone with a Web browser to search and retrieve information relating to amphibian biology and conservation.
• 2011: Now over 120 partners! See them at http://www.eol.org/content/partners
Encyclopedia of Life
• Major project to create a single Web page for every known species (1.9 million!)
• Total funding will reach at least $50M by 2012
• EOL needs literature, hence the BHL project
• BHL is a key partner in the EOL project
• Launched on 9th May, 2007
– First 30,000 pages launched at TED Feb 27th, 2008
This library serves the MBL, WHOI, USGS, NMFS, SEA, WHRC,
and other scientific groups in the area.
Facing a new dynamic phase
NMFS - 1871
MBL - 1888
WHOI - 1930
USGS - 1960
SEA - 1971
WHRC - 1985
Woods Hole Scientific Community
Biodiversity Heritage Library
It began and begat
Reptilia and Batrachia. (1885-1902) by Albert C.L.G. Günther
Open Access: all content can be reused, repurposed, reformatted, sliced, diced, scraped, harvested, integrated.
2003 Telluride. Encyclopedia of Life Meeting
2005 London. Library and Laboratory: the Marriage of Research, Data, and Taxonomic Literature
June 2006 Washington. Organization and Technical Meeting
October 2006 St Louis/San Francisco. Technical Meeting
Reptilia and Batrachia. (1885-1902) by Albert C.L.G. Günther
February 2007 MCZ Harvard. Organizational Meeting
May 2007 Washington DC. Encyclopedia of Life Launch
From then on, at least one annual meeting per year for the Institutional Council and one or two technical meetings per year. Next one in March in Washington.
Collaborators
Sanborn Tenney, Natural History of Animals . . . 1868.
Internet Archive: set up scanning centers in London, New York, Washington, Boston, etc. High-quality, non-destructive scanning. Image files and text derived from OCR.
• Internet Archive
• International Commission on Zoological Nomenclature
• Open Content Alliance
• European Distributed Institute of Taxonomy
• Global Biodiversity Information Facility (GBIF)
• Many more under negotiation
Mission: Provide Open Access to Biodiversity Literature
Goals: Digitize the core published literature on biodiversity and put it on the Web
Agree on approaches with the global taxonomic community, rights holders and others
Jacob Christian Schäffer, Elementa entomologica . . . 1766.
BHL Portal: http://www.biodiversitylibrary.org
Serve image and text files; create volume, part and piece metadata; ingest page-level metadata at scanning level; apply Globally Unique Identifiers (GUIDs) for linking to other taxonomic services.
How big is the Biodiversity domain?
• Over 5.4 million books dating back to 1469
• 800,000 monographs
• 40,000 journal titles (12,500 current)
• 50% pre-1923
Why now?
• Cost low – 10-19 cents a page
• Other projects funded – BL/Microsoft/Google big ten, BLC
• Tractable, well-defined scientific domain
• Taxonomic information has exceptional longevity
• Supports GBIF and other international initiatives
• Taxonomists and other scientists will have access to biodiversity literature - globally
• Will provide the developing world with access to the historical literature
• Scientists working in many biological domains – and other areas like meteorology, geology, ecology, genomics, etc – will get access
Benefits
• Less space needed for Library collections in Lillie – space freed for other uses
• % material can be stored off-site in ‘dark storage’. FTP
• Our scientists will get access at their desk or in the field
• Library focus will shift to informatics
• Virtual web library will increase public access
• Library staff will change
Benefits to the MBLWHOI Library
• Key partner of Encyclopedia of Life
• Working Groups have agreed technical plan, metadata standards and image standards
• Internet Archive to be technical partner – scanning and hosting
• ‘Scribe’ scanners now installed in NHM NYC, Boston, Library of Congress, Univ. Ill., China, Egypt
• 33 million pages already available
Where are we now?
• Legal issues - BHL organisational structure, content licensing, contracts being developed by EFF
• BHL will take responsibility for long-term sustainability of the scanned material
• Blackwells Publishing/Wiley back-files possibly available through the BHL
• Zoological Record will provide their index as route to BHL articles
• OCR and name recognition tools identified and linked to project - Taxonomic Intelligence
• GLOBAL BHL
– BHL Europe – 47 libraries
– BHL China – National Academy of China
– BHL Australia – Atlas of Living Australia
– BHL South America – Brazil
– BHL Egypt
Global Coordinating Committee
BHL-China
BHL-Egypt
BHL-Australia
BHL-South America
BHL-Europe
BHL-US/UK
Classes of texts
Public Domain – pre-1923 US, pre-1955 Australia, all different
Non-profit society journals
Post-1923 monographs
– some with copyright renewals
– some without copyright renewals
Commercial journals
BHL Seeks Permissions
BHL will digitize learned society backfiles and mount them through the BHL Portal at no cost.
Will provide a set of files to the learned society for reuse as they see fit.
Will index the issues using Taxonomic Intelligence increasing their usability.
Benefits
Use of the articles will increase as evidenced by citation upsurge.
Long-term management of the digital assets is provided by the BHL at no cost to its contributors.
Content will be integrated into EOL project through TI nomenclatural linking.
Levinus Vincent, Elenchus tabularum, pinacothecarum, 1719
The cited half-life of publications in taxonomy is longer than in any other scientific discipline.
The decay rate is slower than in most scientific disciplines.
– Macro-economic case for open access, Tom Moritz
Current taxonomic literature often relies on texts and specimens >100 years old.
Georges Louis Leclerc, comte de Buffon, Histoire naturelle : générale et particulière (Oiseaux), 1799-1808
Convention on Biological Diversity: Article 17
Institutions that are creating the BHL exist to persist through time.
– The future is uncertain, the technology landscape changes, people pass on. So create consortial structures that are low-overhead, flexible, and can respond quickly.
– Interoperability is the key. Repository islands will sink.
The Long NOW Strategy
Biologia Centrali-Americana
[Chart: Physical Distribution… by region (US & Canada, Europe, Mexico & C. America, South America); vertical axis 0 to 8]
Now… you can parse data and harvest out data.
The wealth of information locked on the pages is now liberated!
Henry Walter Bates, The Naturalist on the River Amazons, 1863
Most literature is in the developed world: the Northern Hemisphere
Most biodiversity is in the developing world: the Southern Hemisphere
Progne subis – Purple Martin. Illustrations of the nest and eggs of birds of Ohio, 1879-1886
Library and Laboratory: the Marriage of Research, Data and Taxonomic Literature. London, February 2005
Eighty participants from 22 countries gathered to discuss the status and future of access to the taxonomic literature and to propose an agenda for actions that would improve the research environment for taxonomy. The participants were taxonomists; librarians; publishers; representatives of learned and professional societies, private foundations and government agencies; and specialists in information and communications technology.
Scalable Mass Scanning
• Contracts
• Firewalls
• Security
• Loading docks
• Trucks
• 180 mile round trip!
Ernest Ingersoll, Hand-book to the National Museum … Smithsonian Institution, 1886
Mass Scanning Workflow
• Bid lists
• Pick lists
• Packing lists
• Serials management
• Monographic management
• Stickers for media and carts
• Rare books – vaults
Internet Archive Scribe: Boston
US/UK PROCESS: multiple institutions
• American Museum of Natural History
• Botany Libraries, Harvard University
• Ernst Mayr Library, Museum of Comparative Zoology, Harvard University
• Field Museum
• Marine Biological Laboratory / Woods Hole Oceanographic Institution Library
• Missouri Botanical Garden
• Natural History Museum (London)
• New York Botanical Garden
• Royal Botanic Gardens, Kew
• Smithsonian Institution Libraries
• Academy of Natural Sciences (Philadelphia)
• California Academy of Sciences
US PROCESS: communication
• Telephone conversations
• Email strings
• Working documents
• bhl.wikispaces.com
• Face to face meetings
• Presentations
• Articles
• Going beyond self expectations was the norm
The Work-Flow Process
• Select book, pull from shelf
• Review physically and check metadata
• Establish viability and create pick/pack list / Wonderfetch
• Send to IA scanning center
THE TITLE:
•Requested by patron/curator?
•EOL need?
•Gap fill?
Is the title a serial? …need to bid in scanlist
Is the title a Monograph?
…need to run through deduper
THE SCANNING:
Picklist creation
Metadata check
Physically move volumes from stacks to scanning area/building/off campus
Are pages torn?-what are parameters for rejection?
POST SCANNING:
Volumes returned post scanning
QA/QC checking – what are parameters for quality?
Where will volumes be viewed?
Integration with ILS
Integration with web, tools
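The "run through deduper" step for monographs can be pictured as a check against what has already been scanned. The sketch below is not the BHL deduper itself; the function names, the (title, year) matching key, and the sample titles are assumptions chosen only to illustrate the idea.

```python
import re


def normalize_title(title):
    """Lowercase, strip punctuation and collapse whitespace for comparison."""
    cleaned = re.sub(r"[^a-z0-9 ]+", " ", title.lower())
    return " ".join(cleaned.split())


def find_unscanned(candidates, already_scanned):
    """Return candidate monographs whose (title, year) is not already scanned."""
    seen = {(normalize_title(t), y) for t, y in already_scanned}
    picklist = []
    for title, year in candidates:
        key = (normalize_title(title), year)
        if key not in seen:
            picklist.append((title, year))
            seen.add(key)  # also drops duplicates within the candidate list
    return picklist


scanned = [("Natural History of Animals", 1868)]
wanted = [
    ("Natural history of animals.", 1868),
    ("Elementa entomologica", 1766),
    ("Elementa Entomologica", 1766),
]
picklist = find_unscanned(wanted, scanned)
```

Matching on a normalized title and year is deliberately crude; a production tool would more likely compare catalogue identifiers, and editions or multi-volume sets would need extra care.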
PROCESS –
Documentation,Feedback
Tweaking, Completing
Process
is a continual process!
; - )
Biodiversity Informatics
Period of explosive growth
• NCL Centre for Biodiversity Informatics (India) – 2000
• Speciation event: Biodiversity Informatics – 2004
• Ocean Biodiversity Informatics conferences – 2004, 2007
• Species-based sites: FishBase, AntWeb, AmphibiaWeb, North American Mammals, Swedish ArtDatabanken, Atlas of Living Australia, Netherlands species compendium …
• Specimen-based networks: HerpNet, MANIS, ORNIS
• Regional networks: IABIN, OBIS, …
• Biogeomancer – 2005
• IdentifyLife – 2005
• JRS Biodiversity Foundation – 2005
• European Distributed Institute of Taxonomy (EDIT) – 2006
• BDI curricula
– University of Illinois Master of Science in Biological Informatics – 2006
• Encyclopedia of Life (EOL) – 2007
Crowdsourced Articles
• http://www.biodiversitylibrary.org/pdfgen/17298
Demo: http://youtube.com/watch?v=oidf3b26jVs
Crowdsourced Articles
• 12,000 PDFs generated through September 2009
– 4,900 submitted with article metadata
– Analysis: http://bit.ly/4Jqu9
Great, but how to…
• display / manage?
• meet community demands for bibliography / citation management?
• build from more open source tools?
Development goals re: citations
• Create a repository for community-vetted taxonomic bibliographies.
• Ability to ingest, display, download, and index articles so that the BHL can operate as an article repository.
• Build from existing community of work around Drupal / Biblio.
– In use by collaborators
“something like GenBank or NameBank for citations…”
So, CitationBank…or CiteBank (saves chars)
NEED…
http://citebank.biodiversitylibrary.org/
Crowdsourced Articles
• PDFs from BHL pushed into Drupal/Biblio:
CiteBank boundaries
Book
Citation
Page-turning UI, PDF, OCR
eBook/Kindle
Stored *somewhere* & retrievable via HTTP URI
Citation, Citation, Citation
Bibliography
CiteBank
BHL Data Flow – Sep 2009
CiteBank
Copyright
Bold statements that need some good legal counsel:
• Citations don’t have copyright
– Unless you get them from OCLC, other services
• Bibliographies have copyright
– They’re a scholarly work
• Underlying content has copyright
– Except when it doesn’t
Reuse, don’t rebuild
“All accumulated information of a species is tied to a scientific name, a name that serves as a link between what has been learned in the past and what we today add to the body of knowledge.”
~ Grimaldi & Engel, 2005, Evolution of the Insects
Who knoweth not the name, knoweth not the subject
Linnaeus, 1737, Critica Botanica n 210.
• Information about named groups (taxa) of organisms (taxon-related information)
• Extends back at least 1000 years
• Books, journals, surveys
• Museum specimens, herbaria
• In many languages and is distributed
From T.E. Glover, The Fishes of Southwestern Japan, c.1870
The challenge for contemporary DIGITAL libraries
Goal:
Use one name to find the content for all names
Names – the only universal metadata for Biology
Names offer a logical way to search for and index content
• Names annotate data objects
• All names annotate all data objects
• A compilation of all names ever used is the foundation of a universal index for biology or for a semantic web for biology
Libraries
Publishers
Museums
Federal Agencies
Who is affected by these problems?
Serious challenges in federated environments
One organism
4 scientific names
4 maps
We want one map
Reconciliation – linking alternative names for the same organism
A query initiated with any name can be expanded to all names and will unify the data associated with each
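As a rough illustration of reconciliation, the sketch below groups alternative names for one organism and expands a query on any one of them to the whole group. It is not the uBio or NameBank implementation; the function names, the record layout, and the sample reconciliation group (Progne subis with the older combination Hirundo subis) are illustrative assumptions.

```python
def build_index(groups):
    """Map every name (case-insensitively) to the index of its reconciliation group."""
    name_to_group = {}
    for gid, names in enumerate(groups):
        for name in names:
            name_to_group[name.lower()] = gid
    return name_to_group


def expand_query(name, groups, index):
    """Expand one name to all names reconciled with it."""
    gid = index.get(name.lower())
    if gid is None:
        return [name]  # unknown name: search for it as given
    return sorted(groups[gid])


def unified_search(name, groups, index, records):
    """Return records filed under any name in the query name's group."""
    names = {n.lower() for n in expand_query(name, groups, index)}
    return [r for r in records if r["name"].lower() in names]


# Illustrative data only.
groups = [{"Progne subis", "Hirundo subis"}]
records = [
    {"name": "Hirundo subis", "source": "page A"},
    {"name": "Progne subis", "source": "page B"},
    {"name": "Sialia sialis", "source": "page C"},
]
index = build_index(groups)
hits = unified_search("Progne subis", groups, index, records)
```

With this structure the "4 scientific names, 4 maps" problem becomes one lookup: every source is queried with the expanded list, and the results are merged into one map.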
• All names & all classifications: ClassificationBank
• Alternative names reconciled
• Similar names disambiguated
• Exploit hierarchies to browse and search, build a comprehensive classification
• Improve performance with federated systems
• Read documents, web sites and databases, and taxonomically index the content
• Create a unified portal to information about organisms on the internet
Taxonomic intelligence is the inclusion of taxonomic practices, skills and knowledge within informatics services to manage information about organisms
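Taxonomic indexing of content can be sketched as matching the words on each page against a lexicon of known names. The example below is a simplified stand-in, not the BHL name-finding tool: the function names, the tiny lexicon, and the one- and two-word exact matching are assumptions, and real OCR text would also need fuzzy matching to cope with recognition errors.

```python
import re
from collections import defaultdict


def find_names(page_text, lexicon):
    """Return lexicon names found on a page, in order of first appearance."""
    tokens = re.findall(r"[A-Za-z]+", page_text)
    found = []
    for i in range(len(tokens)):
        # Try the two-word candidate (a binomial) first, then the single word.
        for size in (2, 1):
            if i + size > len(tokens):
                continue
            candidate = " ".join(tokens[i:i + size])
            if candidate in lexicon and candidate not in found:
                found.append(candidate)
    return found


def index_pages(pages, lexicon):
    """Build a name -> list of page numbers index from {page number: text}."""
    index = defaultdict(list)
    for page_no, text in pages.items():
        for name in find_names(text, lexicon):
            index[name].append(page_no)
    return dict(index)


# Illustrative lexicon and page text only.
lexicon = {"Progne subis", "Progne"}
pages = {
    1: "The Purple Martin (Progne subis) nests in colonies.",
    2: "Swallows of the genus Progne are widespread.",
}
name_index = index_pages(pages, lexicon)
```

The resulting index is what lets a reader start from a name and land on the pages that mention it.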
• data from various sources may be merged
• red dots on the map link back to the website that provided the geographical co-ordinates
Specimen distribution data from remote sources
BHL Taxonomic Intelligence Tool
Georges Louis Leclerc, comte de Buffon, Histoire naturelle : générale et particulière (Oiseaux), 1799-1808
uBio
• 10.7 Million+ Name Strings
• Reconciliation Groups
• http://www.ubio.org
uBioRSS Taxonomically Intelligent RSS Feed Aggregator
MBL WHOI Library – Woods Hole authors’ publications
MBL WHOI Library – Woods Hole species publications
Taxonomically intelligent scientific text parsing
Taxonomic Intelligence
• Lexicon of Scientific Names
• Reconciliation and Disambiguation
• Hierarchical Inclusion
• Integration into Information Retrieval
• Linkage to Other Data Types (e.g., Molecular, Morphological, Phenotype)
BEYOND
AND
EMF Biology of Aging
Ellison Medical Foundation (EMF)
“Enable the Study of Aging Across the Spectrum of Life”
Officially began January 2008
FEDORA Commons
EMF Biology of Aging
Conditions
Locations
Organisms
Genes
EMF/EOL Key Resources
• Medline, BHL (Literature)
• GenBank (Molecular)
• EOL (Habitat & Location)
• All organisms are affected by aging
• Not all aging is associated with disease
• The flip side: Understanding aging might give insights to regeneration
A constant
Biomedical Focus
• Expand the scope of organisms beyond the “classic” models:
Goals of EMF (years one & two)
• What genes are associated with aging conditions?
• What are the conditions associated with these genes?
• What organisms are associated with the aging genes and conditions?
• What other organisms might also have aging genes?
• Where do the identified organisms live, and in what types of habitats?
• What are the demographic patterns associated with organisms across the spectrum of life?
• What are common phenotypes associated with organisms that share common aging genes?
• Proven the concept of mass scanning of general collections
• Proven the concept of automated structured markup, done in collaboration with Penn State and the Internet Archive
• Built proof-of-concept portal on a proprietary (.NET) environment
• High levels of OCR accuracy in late 19th and 20th century printing
• Applied taxonomic intelligence (species name finding) across millions of pages against nearly 11 million names in NameBank
• Data mining BHL for other bioinformatics projects (EOL/BOA)
• Obtained buy-in from a diverse group of learned societies for the BHL opt-in copyright model
• Support and encouragement from our traditional bibliophile and scientific audiences
• Collaboration with an international group of competitive organizations
Status today
• Get equal cost efficiencies and speed for special collections
• Nail down automated structural markup to a high level of accuracy
• Improve OCR for publications in other languages with little human intervention
• Broaden the use of the taxonomic intelligence algorithm
• Data mining BHL for other bioinformatics projects
• Work with commercial publishers for fair and equitable use of their publications
• Expand audiences through social networking and repurposing content for new audiences
• Expand the consortium to bring in more partners, and more partners in Asia, South and the developing world
Status Tomorrow
www.eol.org
www.ubio.org
www.biodiversitylibrary.org
Acknowledgments
Jewett Foundation
A.W. Mellon Foundation
Alfred P. Sloan Foundation
John D. & Catherine T. MacArthur Foundation
Internet Archive