Encyclopedia of Life: Redefining Publication and Access to the Primary Literature
David P. Shorthouse
Marine Biological Laboratory
Woods Hole, MA
Imagine an electronic page for each species of organism on Earth, available
everywhere by single access on command. The page contains the scientific name of
the species, a pictorial or genomic presentation of the primary type specimen on
which its name is based, and a summary of its diagnostic traits. The page opens out
directly or by linkage with other databases such as ARKive, Ecoport, and
GenBank. It comprises a summary of everything known about the species’
genome, proteome, geographic distribution, phylogenetic position, habitat,
ecological relationships, and, not least, its perceived practical importance for
humanity.
Steering Committee
Biodiversity Heritage Library
Executive Smithsonian
Marine Biological Laboratory
Biodiversity Informatics
Atlas of Living Australia
Missouri Botanical Garden: Plants
Harvard University: Education and outreach
Field Museum: Research Community
MacArthur Foundation
Sloan Foundation
Not the first time…
Tree of Life
Catalogue of Life
SpeciesBase
Discover Life
What makes this distinct…
Grandeur of the vision
Taxonomically intelligent, names-based cyberinfrastructure
Aggregation of content
Participatory
Open source content and software
…not just a website
David “Paddy” Patterson
Peter Mangiafico
Patrick Leary
David Shorthouse
Kristen Lans
Pam Fournier
Alexey Shipunov
Vitthal Kudal
Jeremy Rice
Dmitry Mozzherin
Anne Thessen
Exemplar Pages
Anne Pringle • Brian Farrell • Alta Buden • Margaret Thayer • Michael Ashburner • Christy Geraci • Lilibeth Miranda • Senjie Lin
Rick Wilkerson
Jonathan Losos • David Langor • David Shorthouse • Mary Hennen • Judy Stoffer • George Yatskievych • Kendra Buresch
Tonia Hsieh • David Patterson • Christian Thompson • Rod Eastwood • Jerry Louton • Seth Bordenstein
Rich Pyle • Roger Hanlon • Tamara Clark • Wendy Applequist • Grace Servat • Bob Magill • Sandy Knapp • Vicki Funk
Exemplar Process
Current Rate ≈ 100 pages / year
Biodiversity Heritage Library
Missouri Botanical Garden
New York Botanical Garden
Royal Botanic Gardens, Kew
Field Museum
Natural History Museum (London)
Smithsonian Institution
American Museum of Natural History
Botany Libraries, Harvard University
Ernst Mayr Library of the Museum of Comparative Zoology, Harvard University
Marine Biological Laboratory / Woods Hole Oceanographic Institution Library (MBL/WHOI)
WHAT?
Digitize the core literature of biodiversity. Full works, not bits & pieces.
Open Access: all content can be repurposed, reused, reformatted.
Congruent: must fit into a dynamic knowledge ecology.
BHL Status
9.2M pages
Challenges: metadata extraction & search trajectories
– Penn State collaborations
– Honing name-matching algorithms, natural language processing
Names are Messy
Aa paleacea
Limulus polyphemus
Kiwa hirsuta
Osedax frankpressi
Kingia australis
Pieris japonica
Pieris rapae
Trypanosoma brucei
Homo sapiens
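Finding names in running text is where the mess starts. The sketch below is a minimal illustration, assuming a naive regular expression and a hypothetical four-genus index (`KNOWN_GENERA`); it is not the method used by any particular EOL or BHL name-finding service. The pattern catches short genera such as Aa, but it also picks up Latin-looking phrases such as "Anorexia nervosa" and "Habeas corpus", which only a names index can rule out.

```python
import re

# Naive pattern (an assumption, not any particular tool's rule): a capitalized
# genus word followed by a lowercase epithet of two or more letters.
BINOMIAL = re.compile(r"\b([A-Z][a-z]+) ([a-z]{2,})\b")

# Hypothetical miniature names index; real systems consult millions of names.
KNOWN_GENERA = {"Aa", "Limulus", "Kiwa", "Homo"}


def find_candidates(text):
    """Return every 'Genus epithet' string that matches the naive pattern."""
    return [f"{genus} {epithet}" for genus, epithet in BINOMIAL.findall(text)]


def filter_known(candidates, genera=KNOWN_GENERA):
    """Keep only candidates whose first word is a genus in the index."""
    return [c for c in candidates if c.split()[0] in genera]


text = ("Aa paleacea and Limulus polyphemus were examined; a patient with "
        "Anorexia nervosa filed a Habeas corpus petition.")
candidates = find_candidates(text)
print(candidates)
# ['Aa paleacea', 'Limulus polyphemus', 'Anorexia nervosa', 'Habeas corpus']
print(filter_known(candidates))
# ['Aa paleacea', 'Limulus polyphemus']
```

The index lookup removes the false positives here, but a genus-only check would still accept a real genus paired with a non-existent epithet; full-name lookup is the stricter choice.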
More than One Meaning (Polysemes)
Aotus trivirgatus
Aotus Illiger 1811
Aotus
Aotus Smith 1805
Aotus ericoides
Resolve with intelligent disambiguation: authority, species, contextual data
Contextual data
Primate, monkey, eyes, food, Panama, Aotus nancymaae
Contextual data
Legume, plant, flower, Mirbelieae, Australia, Aotus mollis
Anorexia nervosa, Habeas corpus, Etcetera etcetera
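One simple way to use contextual data is to score each homonym by how many of its typical context words appear near the name. This is a minimal sketch, assuming hypothetical context vocabularies built from the words on this slide; a production resolver would also weigh authority strings and co-occurring names, and would use far richer vocabularies.

```python
import re

# Hypothetical context vocabularies built from the contextual data above.
HOMONYMS = {
    "Aotus Illiger 1811": {"primate", "monkey", "eyes", "food", "panama",
                           "nancymaae", "trivirgatus"},
    "Aotus Smith 1805": {"legume", "plant", "flower", "mirbelieae",
                         "australia", "mollis", "ericoides"},
}


def disambiguate(text, homonyms=HOMONYMS):
    """Pick the homonym whose context words overlap most with the text.

    Returns None when the best score is shared (including an all-zero tie),
    i.e. when the context gives no basis for choosing.
    """
    words = set(re.findall(r"[a-z]+", text.lower()))
    scores = {name: len(words & vocab) for name, vocab in homonyms.items()}
    best = max(scores.values())
    winners = [name for name, score in scores.items() if score == best]
    return winners[0] if len(winners) == 1 else None


print(disambiguate("Aotus nancymaae, a night monkey trapped in Panama"))
# Aotus Illiger 1811
print(disambiguate("Aotus mollis in flower, Western Australia"))
# Aotus Smith 1805
print(disambiguate("Aotus"))
# None
```

Returning None on a tie, rather than guessing, leaves the bare name "Aotus" flagged as ambiguous.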
Many names for one species…
Koko
Горилла
Guerilla
Eastern Lowland Gorilla
Gorilla graueri
Gorilla berengei
Gorilla beringei Matschie
Gorilla beringei mikenensis
King kong
Gorilla gorilla
Virunga
Gorila
Gorille
Mountain gorilla
大猩猩
ゴリラ
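Some of these variants can be handled mechanically. Misspellings such as "Gorilla berengei" and appended authorships such as "Matschie" can be reduced to a canonical string and then fuzzy-matched. The sketch below assumes a hypothetical three-name reference list and a 0.85 similarity cutoff, and it uses Python's standard `difflib` in place of a dedicated name-matching algorithm. Vernacular and non-Latin names ("Mountain gorilla", "ゴリラ") would need a separate lookup table.

```python
import difflib

# Hypothetical reference list of accepted strings (names taken from the slide).
REFERENCE = ["Gorilla beringei", "Gorilla gorilla", "Gorilla graueri"]


def canonical(name):
    """Keep the genus and lowercase epithets; drop authorship such as 'Matschie'."""
    words = name.split()
    return " ".join(words[:1] + [w for w in words[1:] if w.islower()])


def match(name, reference=REFERENCE, cutoff=0.85):
    """Return the closest reference name, or None if nothing is similar enough."""
    hits = difflib.get_close_matches(canonical(name), reference, n=1, cutoff=cutoff)
    return hits[0] if hits else None


print(match("Gorilla berengei"))           # Gorilla beringei (misspelled epithet)
print(match("Gorilla beringei Matschie"))  # Gorilla beringei (authorship dropped)
print(match("King kong"))                  # None (no close match)
```

A high cutoff keeps false matches down, at the cost of missing heavier misspellings.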
EOL the Aggregator
“Content partner” schema
– Media elements, species profile model
– Attribution, licensing
Pyle, R. L., J. L. Earle and B. D. Greene. 2008. Five new species of the damselfish genus Chromis (Perciformes: Labroidei: Pomacentridae) from deep coral reefs in the tropical western Pacific. Zootaxa. 1671: 3–31.
Licensing Policy for Content Partners
To the greatest extent possible, the Encyclopedia of Life promotes an open-source, open-access approach.
All information currently in the public domain will remain in the public domain.
Content providers are required to adopt a Creative Commons license for the information that they serve through the EOL. Except for public-domain content, the default and preferred license is CC-BY.
Content providers who request some restrictions on re-use of their information may select CC-BY-SA, CC-BY-NC, or CC-BY-NC-SA.
The EOL will provide attribution information for all content that it serves. EOL will also indicate the Creative Commons license attached to each object (text, structured data, graphics, multimedia, etc.).
V5.0, 5 April 2008
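A content-partner pipeline could check each incoming object against this policy before serving it. The sketch below is an illustration only. The `DataObject` record and its fields are hypothetical and are not the EOL content partner schema. The accepted license strings are taken from the policy above.

```python
from dataclasses import dataclass

# Options from the licensing policy; CC-BY is the default and preferred license.
DEFAULT_LICENSE = "CC-BY"
RESTRICTED_OPTIONS = {"CC-BY-SA", "CC-BY-NC", "CC-BY-NC-SA"}
ALLOWED = {"public domain", DEFAULT_LICENSE} | RESTRICTED_OPTIONS


@dataclass
class DataObject:
    """Hypothetical served object (text, image, ...), not the EOL schema."""
    title: str
    media_type: str
    attribution: str
    license: str = DEFAULT_LICENSE


def validate(obj):
    """Return a list of policy problems; an empty list means the object passes."""
    problems = []
    if obj.license not in ALLOWED:
        problems.append(f"license {obj.license!r} is not an accepted option")
    if not obj.attribution.strip():
        problems.append("missing attribution")
    return problems


print(validate(DataObject("Species description", "text", "Example partner")))
# []
print(validate(DataObject("Photo", "image", "  ", "All rights reserved")))
# ["license 'All rights reserved' is not an accepted option", 'missing attribution']
```

Returning a list of problems instead of raising on the first failure lets a partner fix every issue in one pass.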
EOL the Enabler
Curate species pages
Partial funds for post-doctoral positions
Cybertaxonomy Drivers
Pool of active taxonomists is evaporating
Shift to online workflow must:
– Be meaningful (foster engagement with organisms)
– Attract funding
– Provide personal and institutional visibility
– Be scholarly (e.g. citation metrics)
– Be simple and task-oriented
– Federate workloads
Hine, Christine. 2008. Systematics as Cyberscience: Computers, Change, and Continuity in Science. MIT Press. Cambridge, Massachusetts. 307pp.
Nearctic Spider Database: Can This be a Template?
Meaningful
Provides personal & institutional visibility
Simple, task-oriented
Shares the workload
What’s in My Backyard?
<?xml version="1.0" encoding="windows-1252"?>
<!--Zoom Search Engine Version 5.0 (1002) PRO-->
<rss version="2.0" xmlns:opensearch="http://a9.com/-/spec/opensearch/1.1/" xmlns:zoom="http://www.wrensoft/zoom/response/5.0/schema/">
  <channel>
    <title>Nearctic Spider Database</title>
    <description>Search species pages in The Nearctic Spider Database</description>
    <link>http://canadianarachnology.dyndns.org/data/canada_spiders/</link>
    <opensearch:link rel="search" href="./data/canada_spiders/search/search.xml" type="application/opensearchdescription+xml" />
    <zoom:searchquery>pardosa moesta</zoom:searchquery>
    <zoom:searchcategory>All</zoom:searchcategory>
    <opensearch:totalResults>27</opensearch:totalResults>
    <opensearch:startIndex>10</opensearch:startIndex>
    <opensearch:itemsPerPage>10</opensearch:itemsPerPage>
    <item> ............
Can I Share or Get Help?
Can I Track My Searches?
OpenSearch
Can I Grab That Image?
HTML (JavaScript)
& bbCode Gadgets
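Because the search results come back as OpenSearch-flavoured RSS, any client can read the paging information with a standard XML parser. This is a minimal sketch using Python's `xml.etree.ElementTree`. `SAMPLE` is a trimmed copy of the Nearctic Spider Database response shown earlier: the XML declaration, the `<item>` elements, and some children are omitted, and closing tags are added. The `has_more` logic assumes `startIndex` is 1-based, which is the OpenSearch 1.1 default; the Zoom engine's convention is not confirmed here.

```python
import xml.etree.ElementTree as ET

OPENSEARCH = "{http://a9.com/-/spec/opensearch/1.1/}"
ZOOM = "{http://www.wrensoft/zoom/response/5.0/schema/}"

# Trimmed copy of the response: declaration and <item> elements omitted.
SAMPLE = """<rss version="2.0"
  xmlns:opensearch="http://a9.com/-/spec/opensearch/1.1/"
  xmlns:zoom="http://www.wrensoft/zoom/response/5.0/schema/">
  <channel>
    <title>Nearctic Spider Database</title>
    <zoom:searchquery>pardosa moesta</zoom:searchquery>
    <opensearch:totalResults>27</opensearch:totalResults>
    <opensearch:startIndex>10</opensearch:startIndex>
    <opensearch:itemsPerPage>10</opensearch:itemsPerPage>
  </channel>
</rss>"""


def paging_info(xml_text):
    """Read the query and OpenSearch paging elements from an RSS response."""
    channel = ET.fromstring(xml_text).find("channel")
    total = int(channel.findtext(OPENSEARCH + "totalResults"))
    start = int(channel.findtext(OPENSEARCH + "startIndex"))
    per_page = int(channel.findtext(OPENSEARCH + "itemsPerPage"))
    # Assumes startIndex is 1-based (OpenSearch 1.1 default).
    last_shown = start + per_page - 1
    return {
        "query": channel.findtext(ZOOM + "searchquery"),
        "total": total,
        "has_more": last_shown < total,
    }


print(paging_info(SAMPLE))
# {'query': 'pardosa moesta', 'total': 27, 'has_more': True}
```

Namespaced elements are addressed with the `{uri}tag` form, so the prefixes used in the document (`opensearch:`, `zoom:`) do not matter to the parser.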
Taxa-Centric Approach
http://lifedesk.eol.org
Mid-December alpha testing
Customization (skinning)
Image gallery
Facile names & classification management
Species page creation:
– Images, “chapters”
Licensing and attribution
Granular roles and permissions
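"Granular roles and permissions" usually means that each member of a site holds one or more roles, and each role grants a specific set of actions. This is a minimal sketch of that idea. The role names and permission set are hypothetical and are not LifeDesk's actual configuration.

```python
from enum import Enum, auto


class Perm(Enum):
    VIEW = auto()
    EDIT_PAGE = auto()
    EDIT_CLASSIFICATION = auto()
    MANAGE_MEMBERS = auto()


# Hypothetical roles for a LifeDesk-style site; not LifeDesk's real roles.
ROLES = {
    "viewer": {Perm.VIEW},
    "contributor": {Perm.VIEW, Perm.EDIT_PAGE},
    "curator": {Perm.VIEW, Perm.EDIT_PAGE, Perm.EDIT_CLASSIFICATION},
    "owner": set(Perm),  # every permission
}


def can(user_roles, perm):
    """True if any of the user's roles grants perm; unknown roles grant nothing."""
    return any(perm in ROLES.get(role, set()) for role in user_roles)


print(can(["contributor"], Perm.EDIT_PAGE))            # True
print(can(["contributor"], Perm.EDIT_CLASSIFICATION))  # False
```

Keeping permissions separate from roles lets a site owner define a new role without touching the code that checks access.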
Why Data Centricity?
Observe some features
Of some individuals
On a few occasions
In a few places
Record them incompletely
Convert un-interpreted data into interpreted assertions
Construct a narrative
Loss of data
The Future
Raw data
Triple store
Correlations
Filters:
Faceted searches: What was that tree with pink flowers that we saw in Washington last May?
Visualizations
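A faceted search over a triple store treats each facet as a filter on one predicate. This sketch shows how the "tree with pink flowers in Washington last May" question might be answered. The observation triples, taxon placeholders, and dates are hypothetical, and an in-memory list stands in for a real triple store and its query language (such as SPARQL).

```python
from datetime import date

# Hypothetical observations as (subject, predicate, object) triples.
TRIPLES = [
    ("obs1", "taxon", "Taxon X"), ("obs1", "growth_form", "tree"),
    ("obs1", "flower_color", "pink"), ("obs1", "place", "Washington"),
    ("obs1", "date", date(2008, 5, 10)),
    ("obs2", "taxon", "Taxon Y"), ("obs2", "growth_form", "tree"),
    ("obs2", "flower_color", "pink"), ("obs2", "place", "Washington"),
    ("obs2", "date", date(2008, 7, 2)),
    ("obs3", "taxon", "Taxon Z"), ("obs3", "growth_form", "shrub"),
    ("obs3", "flower_color", "pink"), ("obs3", "place", "Washington"),
    ("obs3", "date", date(2008, 5, 21)),
]

# One test per facet; a subject must satisfy every facet.
FACETS = {
    "growth_form": lambda v: v == "tree",
    "flower_color": lambda v: v == "pink",
    "place": lambda v: v == "Washington",
    "date": lambda v: v.month == 5,  # "last May", ignoring the year for brevity
}


def matching_subjects(triples, facets):
    """Return subjects that have, for every facet, at least one passing value."""
    props = {}
    for s, p, o in triples:
        props.setdefault(s, {}).setdefault(p, []).append(o)
    return sorted(
        s for s, values in props.items()
        if all(any(test(v) for v in values.get(p, [])) for p, test in facets.items())
    )


def taxa_for(triples, subjects):
    """Look up the 'taxon' values recorded for the given subjects."""
    return [o for s, p, o in triples if s in subjects and p == "taxon"]


hits = matching_subjects(TRIPLES, FACETS)
print(hits, taxa_for(TRIPLES, hits))  # ['obs1'] ['Taxon X']
```

Storing raw observations as triples, instead of as a finished narrative, is what makes this kind of after-the-fact question answerable.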
EOL is…
NOT the mother of all catalogues
A prelude to biocentric data management
– Enable a shift from narrative taxonomy to datacentric taxonomy
A web hosting infrastructure for taxa-centric pursuits