The emerging biodiversity data ecosystem
Cynthia Parr, Katja Schulz, Jennifer Hammock, Smithsonian Institution
Nathan Wilson, Patrick Leary, Marine Biological Laboratory
Richard Allen, Environmental Protection Agency
Today’s story
What is EOL
Core questions
Network analysis
Hotlist development
Page richness algorithm
Conclusion: improving the health and richness of our knowledge network advances understanding
What is EOL
http://www.eol.org
• Global access to knowledge about life on earth
• All species
• Freely accessible & reusable: open access, open source
• Available from a single portal in a common format
• Quality
• Always growing
EOL Topics
Associations Behaviour ConservationStatus Cyclicity Cytology DiagnosticDescription Diseases Dispersal Distribution Evolution GeneralDescription Genetics Growth Habitat Legislation LifeCycle LifeExpectancy LookAlikes Management Migration MolecularBiology Morphology Physiology PopulationBiology Procedures Reproduction RiskStatement Size Threats Trends TrophicStrategy Uses Description Conservation Key Biology Ecology Introduction Education Barcode CitizenScience EducationResources Genome NucleotideSequences FunctionalAdaptations FossilHistory SystematicsOrPhylogenetics Development IdentificationResources
Content providers: Databases, Journals, LifeDesks, Public contributions
Curating
Commenting
Tagging
http://www.eol.org
EOL is a content curation community
Aggregation
Core questions
Where is our knowledge about biodiversity?
Where are the gaps?
What are the most effective ways to fill gaps given our limited resources?
Implications and next steps
Need more data
Identify isolated projects & mechanisms for connecting them to the network
Improve resilience & redundancy
Distribute annotation & quality control
Model data flow quantity and impact
Developing the EOL hot list
Consultation with taxonomic experts
Development of criteria
Assembly of critical lists
Establishing targets for rich taxon pages, lesser known pages
EOL’s hot lists
Hot List
70,000 taxa
Conservation concern
Invasives
Model organisms
Ecologically important
Pests
Charismatics
Data availability
Red Hot List
2,800 taxa
Most searched
Top 100 invasives
Crops (food)
Zoos & aquaria
High traffic
Higher taxa
Taxon page richness algorithm
Richness = a × Breadth + b × Depth + c × Diversity
Breadth (a = 60%): Images, topics of text objects, references, maps, videos, sounds, conservation status
Depth (b = 30%): # words per text object, # words total
Diversity (c = 10%): Sources (partners)
Score range: 0 to 1; threshold: 0.4
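The weighted sum on this slide can be sketched in a few lines of Python. This is only an illustration, not EOL's implementation: it assumes each component (breadth, depth, diversity) has already been normalized to the 0 to 1 range, that the 60% / 30% / 10% weights apply to breadth, depth and diversity in that order, and that a page counts as rich when its score is at or above the 0.4 threshold. The names `ComponentScores`, `richness` and `is_rich` are hypothetical.

```python
from dataclasses import dataclass

# Weights as read from the slide (assumed order: breadth, depth, diversity).
WEIGHTS = {"breadth": 0.6, "depth": 0.3, "diversity": 0.1}
# Assumption: "rich" means score >= threshold; the slide only gives the value 0.4.
RICH_THRESHOLD = 0.4


@dataclass
class ComponentScores:
    """Hypothetical container; each field is assumed to be normalized to 0..1."""
    breadth: float
    depth: float
    diversity: float


def richness(scores: ComponentScores) -> float:
    """Return the weighted sum a*Breadth + b*Depth + c*Diversity."""
    total = 0.0
    for name, weight in WEIGHTS.items():
        value = getattr(scores, name)
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be in [0, 1], got {value}")
        total += weight * value
    return total


def is_rich(scores: ComponentScores) -> bool:
    """True when the richness score reaches the threshold."""
    return richness(scores) >= RICH_THRESHOLD


if __name__ == "__main__":
    page = ComponentScores(breadth=0.6, depth=0.2, diversity=0.0)
    print(round(richness(page), 2), is_rich(page))
```

Because the weights sum to 1 and each component is bounded by 1, the score stays in the 0 to 1 range stated on the slide.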
Summary of EOL page richness
Overall
640,000 have content
2% are rich
25% have only links to literature
Hot List
28 % of 75K are rich
Average richness = 0.30
Red Hot List
56 % of 3K are rich
Average richness = 0.43
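Summary figures like these (share of rich pages, average richness) could be derived from a list of per-page scores roughly as follows. This is a hedged sketch: the function name `summarize_richness` and the sample scores are made up for illustration, and the 0.4 threshold is taken from the richness algorithm slide.

```python
from statistics import mean


def summarize_richness(scores, threshold=0.4):
    """Summarize per-page richness scores (each assumed to lie in 0..1)."""
    if not scores:
        raise ValueError("need at least one score")
    rich_count = sum(1 for s in scores if s >= threshold)
    return {
        "pages": len(scores),
        "percent_rich": 100.0 * rich_count / len(scores),
        "average_richness": mean(scores),
    }


if __name__ == "__main__":
    # Illustrative scores only, not EOL data.
    print(summarize_richness([0.1, 0.5, 0.4, 0.2]))
```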
Strategies for improving richness
Crowd-sourcing
Collections
Communities
Mobile apps
Leveraging
Enabling platforms
Enabling journals
Data mining BHL etc.
Version 2: Coming in Fall 2011!
The page richness index
Helps fill gaps with existing knowledge
Helps prioritize funding and training so that it has maximum impact on closing true gaps
Will be available via API
Computing and storing richness index on EOL is a step towards storing and serving computable data
Summarize data within a partner, then across partners.
For example: compute an average value for one taxon (x specimens), compare to range of values across all taxa (621,393 samples)
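The example above (average for one taxon, compared with the range across all taxa) might look like this in code. It is a minimal sketch under stated assumptions: the measurement is a single numeric trait per sample, the function name `compare_taxon_to_all` is hypothetical, and the numbers in the usage lines are invented rather than taken from any partner's data.

```python
from statistics import mean


def compare_taxon_to_all(taxon_values, all_values):
    """Place one taxon's average within the range observed across all taxa."""
    if not taxon_values or not all_values:
        raise ValueError("both inputs must be non-empty")
    taxon_mean = mean(taxon_values)
    low, high = min(all_values), max(all_values)
    # Relative position in the overall range: 0 = minimum, 1 = maximum.
    position = (taxon_mean - low) / (high - low) if high > low else None
    return {
        "taxon_mean": taxon_mean,
        "n_specimens": len(taxon_values),
        "overall_min": low,
        "overall_max": high,
        "n_samples": len(all_values),
        "relative_position": position,
    }


if __name__ == "__main__":
    # Invented values for illustration only.
    one_taxon = [100, 200, 300]
    everything = [0, 100, 200, 300, 1000]
    print(compare_taxon_to_all(one_taxon, everything))
```

Summarizing within a partner first and then across partners would amount to running the same comparison on each partner's samples before pooling them.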
Dynamic data summaries = new knowledge
Jen Hammock (EOL), Edward van den Berge (OBIS)
Atlantic Cod (Gadus morhua)
Conclusions
There is a lot of data out there in a lot of knowledge bases
Understanding how it is connected can help us improve the ecosystem
• Quality control
• Resilience
• Richness assessment
Large-scale data summaries can foster gap-filling and standing, dynamic knowledge analyses
Thank you
http://www.eol.org
160+ content partners
2000 Flickr contributors
1000s Wikipedia contributors
43,000 EOL members
Funding: John D. and Catherine T. MacArthur Foundation, Alfred P. Sloan Foundation, Cornerstone Institutions, Private Donors
Leadership: Erick Mata, Bob Corrigan, Mark Westneat, Marie Studer, Tom Garnett, Jim Edwards, David Patterson
Developers: Peter Mangiafico, Jeremy Rice, Dimitri Mozzherin, David Shorthouse, Lisa Whalley and others
Biologists: Tanya Dewey, Audrey Aronowsky, Leo Shapiro
See Demo and Version 2 sneak peek in Software Bazaar