Frontiers of discovery with Encyclopedia of Life TraitBank research and other case studies Cyndy Parr Smithsonian Institution National Museum of Natural History [email protected] @cydparr http://www.slideshare.net/csparr


DESCRIPTION

Presented at the National Museum of Natural History, Smithsonian Institution 18 June 2014 Describes, among other things, development of the TraitBank repository of species attributes, and the use of EOL and TraitBank in scientific research.


Page 1: Frontiers of discovery with Encyclopedia of Life

Frontiers of discovery with Encyclopedia of Life
TraitBank research and other case studies

Cyndy Parr
Smithsonian Institution National Museum of Natural History
[email protected] @cydparr http://www.slideshare.net/csparr

Page 2: Frontiers of discovery with Encyclopedia of Life

Outline

• How is EOL different
• How EOL gets used
• Introducing TraitBank
• Loading up TraitBank
• EOL & TraitBank in research
• Future of EOL & TraitBank

Page 3: Frontiers of discovery with Encyclopedia of Life

Take home messages

• EOL can be useful for research
• TraitBank is already awesome
• Mutualism between collections, EOL, citizen science
• Let us know how we can help

Page 4: Frontiers of discovery with Encyclopedia of Life

Third party applications

How EOL is different

Page 5: Frontiers of discovery with Encyclopedia of Life

started 2008

text, media, literature

all species, genera, etc.

names infrastructure

data curation

2.6 million images

1.3 million taxa with content

Over 5 million visitors/year

75,000 registered members

eol.org

Page 6: Frontiers of discovery with Encyclopedia of Life

How EOL gets used

http://www.notesfromnature.org/

Page 7: Frontiers of discovery with Encyclopedia of Life

http://www.onezoom.org/ http://yanwong.me/

Links and images…what about research?

Page 8: Frontiers of discovery with Encyclopedia of Life

Search groups for “EOL papers” at Mendeley.com

Page 9: Frontiers of discovery with Encyclopedia of Life
Page 10: Frontiers of discovery with Encyclopedia of Life

Anatolia Zooarchaeology Case Study led by Alexandria Archive Institute
1. 14 different sites
2. 34+ zooarchaeologists
3. Decoding, cleanup, metadata documentation
4. 220,000+ specimens
5. 450 entities linked to 143 EOL taxon concepts
6. Anatomical entities linked to Uberon.org
7. Biometrics linked to measurement ontology
8. Collaborative analysis

http://opencontext.org/

Kansa, E., Kansa, S. W., & Arbuckle, B. (2014). Publishing and Pushing: Mixing Models for Communicating Research Data in Archaeology. International Journal of Digital Curation, 9.

Page 11: Frontiers of discovery with Encyclopedia of Life

Page, R. D. M. (2013). BioNames: linking taxonomy, texts, and trees. PeerJ, 1, e190. doi:10.7717/peerj.190

BioNames.org
Rod Page

Page 12: Frontiers of discovery with Encyclopedia of Life

But can we do more?

Introducing TraitBank

Page 13: Frontiers of discovery with Encyclopedia of Life

GenBank 60 million DNA sequence records

900,000 species 4,000 genomes

How are these related to traits?

Page 14: Frontiers of discovery with Encyclopedia of Life

Quick math

In Phenoscape, 57 publications had 565,158 anatomical trait descriptions for 2,527 kinds of organisms = 223 traits/organism

In ZFIN 38,189 trait descriptions for 4,727 genes for Zebrafish

1.9 million species on the planet

= LOTS OF TRAITS
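The arithmetic on this slide can be reproduced in a few lines. This is only a sketch: the input figures are the ones quoted above, while the final extrapolation (applying the Phenoscape rate to every species) is an assumption added here to illustrate what "lots" could mean, not a number from the talk. The exact quotient is about 223.6, which the slide reports as 223.

```python
# Figures quoted on the slide.
PHENOSCAPE_DESCRIPTIONS = 565_158
PHENOSCAPE_TAXA = 2_527
ZFIN_DESCRIPTIONS = 38_189
ZFIN_GENES = 4_727
SPECIES_ON_PLANET = 1_900_000

# Average number of anatomical trait descriptions per kind of organism.
traits_per_organism = PHENOSCAPE_DESCRIPTIONS / PHENOSCAPE_TAXA

# Average number of trait descriptions per zebrafish gene in ZFIN.
descriptions_per_gene = ZFIN_DESCRIPTIONS / ZFIN_GENES

# Assumption (not on the slide): every species would be described
# at the same rate as the organisms covered by Phenoscape.
projected_descriptions = traits_per_organism * SPECIES_ON_PLANET

print(f"Phenoscape: {traits_per_organism:.1f} traits per organism")
print(f"ZFIN: {descriptions_per_gene:.1f} descriptions per gene")
print(f"Projected: about {projected_descriptions / 1e6:.0f} million descriptions")
```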

Page 15: Frontiers of discovery with Encyclopedia of Life

Why Smithsonian + EOL

Page 16: Frontiers of discovery with Encyclopedia of Life

• Numeric data (measurements)

• Categorical data (controlled vocabulary)

• Species interactions

• Mostly summaries for populations, species

• Individual specimens

• Higher taxa

http://eol.org/traitbank released January 2014

Page 17: Frontiers of discovery with Encyclopedia of Life

TraitBank Quick facts

Page 18: Frontiers of discovery with Encyclopedia of Life

TraitBank Data tab

Page 19: Frontiers of discovery with Encyclopedia of Life

TraitBank Metadata

Page 20: Frontiers of discovery with Encyclopedia of Life

TraitBank Search & download

Page 21: Frontiers of discovery with Encyclopedia of Life

TraitBank Search & download

Page 22: Frontiers of discovery with Encyclopedia of Life

TraitBank Data glossary
http://eol.org/data_glossary

Page 23: Frontiers of discovery with Encyclopedia of Life

Download

Page 24: Frontiers of discovery with Encyclopedia of Life

Making TraitBank data available to Google Knowledge Graph and anyone

Page 25: Frontiers of discovery with Encyclopedia of Life

TraitBank data sources

Sources include:

Databases (OBIS, AnAge, Paleodb, Phenoscape)

Literature (Dryad, Pangaea, Ecological Archives)

Natural History Collections (Label data)

Legacy/unpublished data

Loading up TraitBank

Page 26: Frontiers of discovery with Encyclopedia of Life

TraitBank

~7 million records

326 traits

1.2 million taxa

40+ datasets
http://eol.org/collections/97700

Page 27: Frontiers of discovery with Encyclopedia of Life

Text mining
Environments-EOL
Evangelos Pafilis, Hellenic Centre for Marine Research (HCMR), Institute of Marine Biology, Biotechnology and Aquaculture (IMBBC), Crete, Greece

491,616 habitat terms for 136,548 taxa

Page 28: Frontiers of discovery with Encyclopedia of Life

Text mining

Automated annotation Manual annotation

Page 29: Frontiers of discovery with Encyclopedia of Life

Morphological Data from NMNH KE-Emu
Abi Nishimura

Project: Clean up morphological data from NMNH catalog and publish to TraitBank
Goal: Make it easier to access and analyze this valuable morphological data

Sakurai Midori, http://eol.org/data_objects/26918624

Spectral Tarsier Tarsius tarsier
Raw data from database search

Page 30: Frontiers of discovery with Encyclopedia of Life

RESULTS
• Primate data published (320 taxa)
• Comprehensive mammals data to be published soon (4662 taxa)
• Bird catalog currently being mined

Wan Hong, http://eol.org/data_objects/29203274

Page 31: Frontiers of discovery with Encyclopedia of Life

Mineralization of tissue in marine organisms
Jen Hammock with Steve Cairns

For modeling impacts of ocean acidification
143,000 records for 119,000 species and subspecies of Micro- and Macroalgae, Cnidaria, Polychaetes, Bryozoans, Brachiopods, Sponges, Mollusks, Echinoderms and Arthropods

Mineralized tissue =
● Biogenic silica
● Calcium carbonate
  ○ Calcite
  ○ and/or Aragonite

Page 32: Frontiers of discovery with Encyclopedia of Life

Other work in progress at NMNH
• Sarah Miller: growth form, habitat, and elevation data from botany collection specimen labels, summarizing elevation
• Reid Rumelt: behavior and other data from Cornell University Macaulay Library sound files and captions
• Katja Schulz: PaleoBiology DataBase
• BHL-MoBot: IMLS Mining biodiversity

© Donald E. Hurlbert/Smithsonian National Museum of Natural History

Page 33: Frontiers of discovery with Encyclopedia of Life

2013-14 EOL Rubenstein Fellows

EOL & TraitBank research

1. EnvO habitat terms (Pafilis et al.)

2. Altitude Specificity of Flower Coloration (Wright & Seltmann)

3. Morphological impacts of extinction risk in fish (Chang)

4. Butterfly-host plant associations (Ferrer-Paris et al.)

5. Global Biotic Interactions (GloBI, Poelen & Mungall et al.)

6. Reol: An R interface for EOL (Banbury, O’Meara)

7. Taxon Tree Tool (Lin)

Page 34: Frontiers of discovery with Encyclopedia of Life

Chang crowdsourcing
Jonathan Chang, UCLA
http://jonathanchang.org/

Amazon Mechanical Turk

Page 35: Frontiers of discovery with Encyclopedia of Life

EOL-BHL Research Sprint

Page 36: Frontiers of discovery with Encyclopedia of Life

1. Character displacement across the Tree of Life

2. Illuminating the Dark Parts of the Tree of Life

3. Evolution in the usage of anatomical concepts in the biodiversity literature

4. Planning for global change: using species interactions in conservation

5. No place like home: Defining “habitat” for biodiversity science

6. Assessing risk status of Mexican amphibians

7. Quantifying color from digital imagery: color may determine species’ responses to habitat edges and to climate change

8. More is less - Identifying global trends in species’ niche width

9. Identifying key species traits associated with climate change vulnerability

NESCent-EOL-BHL Research Sprint

Page 37: Frontiers of discovery with Encyclopedia of Life

Quantifying color from digital imagery
1. Automate processing of almost 300k images (of EOL’s 2.4 million)
2. Identify pinned specimen images
3. Process these for color and pattern information
4. Put this info into TraitBank

Elise Larsen, Yan Wong
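Step 3 of the plan above (processing an image for color information) could look roughly like the following. This is a minimal sketch and not the sprint team's method: it assumes the specimen has already been segmented into a list of (R, G, B) pixel tuples, and the function names and sample pixel values are hypothetical. It summarizes color as the mean RGB value plus the hue, saturation and value of that mean, using the standard-library `colorsys` module.

```python
import colorsys

def mean_rgb(pixels):
    """Average each channel over a list of (R, G, B) tuples in 0-255."""
    n = len(pixels)
    return tuple(sum(p[i] for p in pixels) / n for i in range(3))

def color_summary(pixels):
    """Summarize specimen color as mean RGB plus HSV of that mean."""
    r, g, b = mean_rgb(pixels)
    # colorsys expects channel values scaled to the range 0-1.
    h, s, v = colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)
    return {
        "mean_rgb": (r, g, b),
        "hue_degrees": h * 360,
        "saturation": s,
        "value": v,
    }

# Hypothetical pixels from a mostly red specimen.
specimen_pixels = [(200, 10, 10), (220, 30, 30), (180, 20, 20)]
summary = color_summary(specimen_pixels)
print(summary)
```

A summary like this is a single record per image, which is the kind of value that could then be loaded into a trait repository in step 4. Pattern information would need a different approach than a single average.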

Page 38: Frontiers of discovery with Encyclopedia of Life

Illuminating the Dark Parts of the Tree of Life

Jessica Oswald, Karen Cranston, Gordon Burleigh, Cyndy Parr

1. Query EOL, GBIF, GenBank for # records

2. Create score for amount of information available

3. Map score to phylogeny
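The three steps above can be sketched as follows. Everything specific here is an assumption for illustration: the slide does not say how the score is defined, so this sketch uses the sum of log10(1 + count) over the three sources, represents the phylogeny as nested tuples of tip names, and uses made-up taxon names and record counts.

```python
import math

def information_score(counts):
    """Score one taxon from its record counts per source.

    Assumption: log-scaling keeps a single very large source
    from dominating the score.
    """
    return sum(math.log10(1 + n) for n in counts.values())

def tips(tree):
    """List the tip names of a tree given as nested tuples."""
    if isinstance(tree, str):
        return [tree]
    return [name for child in tree for name in tips(child)]

def clade_mean(tree, tip_scores):
    """Map scores onto a clade as the mean score of its tips."""
    names = tips(tree)
    return sum(tip_scores[name] for name in names) / len(names)

# Hypothetical record counts from step 1.
record_counts = {
    "taxon_a": {"eol": 99, "gbif": 999, "genbank": 9},
    "taxon_b": {"eol": 0, "gbif": 0, "genbank": 0},
    "taxon_c": {"eol": 9, "gbif": 9, "genbank": 9},
}
scores = {name: information_score(c) for name, c in record_counts.items()}

# Hypothetical phylogeny: (taxon_a, taxon_b) is sister to taxon_c.
phylogeny = (("taxon_a", "taxon_b"), "taxon_c")
print(scores)
print(clade_mean(phylogeny, scores))
```

Clades with a low mean score are the "dark" parts of the tree under this definition.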

Page 39: Frontiers of discovery with Encyclopedia of Life

Global Genome Initiative Data Portal

For every family:
• Use TraitBank to assemble counts of records in repositories
• Compute a score (percentile) to assess knowledge available relative to other families
• Make it easy to browse to find families that require effort

Beta launch end of June
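The percentile score described for the portal could be computed along these lines. This is a sketch under stated assumptions, not the portal's implementation: the percentile is taken here as the share of families with strictly fewer records, and the family names and counts are placeholders.

```python
from bisect import bisect_left

def percentile_scores(record_counts):
    """Percentile of each family relative to all other families.

    Assumption: a family's percentile is the percentage of
    families with strictly fewer records than it has.
    """
    ordered = sorted(record_counts.values())
    total = len(ordered)
    return {
        family: 100.0 * bisect_left(ordered, count) / total
        for family, count in record_counts.items()
    }

# Placeholder record counts per family.
family_counts = {
    "family_w": 10,
    "family_x": 20,
    "family_y": 30,
    "family_z": 40,
}
family_scores = percentile_scores(family_counts)

# Families that most need effort are the ones with the lowest score.
for family in sorted(family_scores, key=family_scores.get):
    print(family, family_scores[family])
```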

Page 40: Frontiers of discovery with Encyclopedia of Life

• NSF Genealogy of Life
• NSF Big Data
• TMON themed portals & traits
• Bocas del Toro revisionary taxonomy workshops
• NSF ABI Isotopes and Interactions
• Microsoft/WCMC Global Ecosystem Models
• And more mutualisms…

EOL & TraitBank future plans

Page 41: Frontiers of discovery with Encyclopedia of Life

Leveraging social networks

Ahn, J., et al. (2012). Visually Exploring Social Participation in Encyclopedia of Life. In 2012 International Conference on Social Informatics (pp. 149–156). IEEE.

Rotman, D., et al. (2014). Motivations affecting initial and long-term participation in citizen science projects in three countries. In iConference 2014 Proceedings (pp. 110-124).

http://biotracker.umd.edu

• motivation model for citizen scientists
• international attitudes of scientists and citizens to working together
• factors that increase curation network activity
• currently working on motivations of EOL content partners

Page 42: Frontiers of discovery with Encyclopedia of Life

Annotation of a specimen record

• Ovary size and reproductive state
• Age markers
• Fat status
• Body mass and other size attributes

Page 43: Frontiers of discovery with Encyclopedia of Life

Annotation of an observation record

Page 44: Frontiers of discovery with Encyclopedia of Life

For more information

• See & cite Parr, et al. 2014 Biodiv. Data Journal
• See our TraitBank paper (in review)
  http://www.semantic-web-journal.net/content/traitbank-practical-semantics-organism-attribute-data
• Talk to your favorite EOL person
• Become an EOL Curator
• See our NMNH collection of collections http://eol.org/collections/743

Page 45: Frontiers of discovery with Encyclopedia of Life

Take home messages

• EOL can be useful for research
• TraitBank is already awesome
• Mutualism between collections, EOL, citizen science
• Let us know how we can help

Page 46: Frontiers of discovery with Encyclopedia of Life

Atlas of Living Australia • Biodiversity Heritage Library Consortium • Chinese Academy of Sciences • La Comisión Nacional para el Conocimiento y Uso de la Biodiversidad (CONABIO) • The Field Museum • Harvard University • El Instituto Nacional de Biodiversidad (INBio) • Marine Biological Laboratory • Missouri Botanical Garden • Muséum national d’Histoire naturelle • Naturalis Netherlands • New Library of Alexandria • Smithsonian Institution • South African National Biodiversity Institute • All of our content providers and curators

Steve Cairns • John Keltner • Katie Barker • Jonathan Coddington • Sean Brady • Tom Orrell • Chris Meyers • Patricia Gentilis • Sylvia Orli • Kate Lyons • Yan Wong • Jon Norenburg • Torsten Dikow • Yurong He • Jenny Preece and others on BioTracker team • Pensoft Publishing • EOL Science Advisory Board

Katja Schulz, Jen Hammock, Marie Studer, Jeff Holmes, Nathan Wilson, Patrick Leary, Jeremy Rice, Lisa Walley, Bob Corrigan, Erick Mata, Dmitry Mozzherin, Abi Nishimura • Sarah Miller • Anthony Goddard, Mark Westneat and former BioSynC staff

http://eol.org @eol [email protected]

Major funding for TraitBank provided by the Alfred P. Sloan Foundation. Fellows program supported by Daniel M. Rubenstein; research sprint supported by the Richard Lounsbery Foundation.