Biodiversity Informatics and the Biodiversity Literature

Page 1: Biodiversity  Informatics and the Biodiversity Literature

Biodiversity Informatics and the Biodiversity Literature

Page 2: Biodiversity  Informatics and the Biodiversity Literature

Overview

• Progress over the last decade
  – Organism occurrence data
  – Taxonomic databases
• The next challenge
  – Describing diversity

Page 3: Biodiversity  Informatics and the Biodiversity Literature

Organism Occurrence Data

Tools and standards created in biodiversity informatics enable data to be aggregated from around the world.

The Global Biodiversity Information Facility (GBIF) is the largest aggregator of organism occurrence data.

[Diagram: institutional collection databases feed data.GBIF.org, which serves an end user anywhere.]

Institutions:
• CAS – California Academy of Sciences, San Francisco
• USNM – National Museum of Natural History, Washington
• FMNH – Field Museum of Natural History, Chicago
• NHM – The Natural History Museum, London
• MNHN – Muséum national d'Histoire naturelle, Paris
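
The aggregation idea on this slide can be sketched in a few lines of Python. This is a minimal illustration, not GBIF's actual pipeline: the `Occurrence` class, the `aggregate` and `by_name` helpers, and the sample records are all hypothetical. The field names loosely mirror the Darwin Core terms institutionCode, catalogNumber, and scientificName.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Occurrence:
    institution_code: str   # e.g. "CAS"; mirrors Darwin Core institutionCode
    catalog_number: str     # mirrors Darwin Core catalogNumber
    scientific_name: str    # mirrors Darwin Core scientificName


def aggregate(*sources):
    """Merge records from several collection databases into one index.

    Records are keyed by (institution_code, catalog_number), so a record
    published twice by the same institution is stored only once.
    """
    index = {}
    for source in sources:
        for record in source:
            key = (record.institution_code, record.catalog_number)
            index.setdefault(key, record)
    return index


def by_name(index, scientific_name):
    """Return all aggregated records for one name, sorted by institution."""
    hits = [r for r in index.values() if r.scientific_name == scientific_name]
    return sorted(hits, key=lambda r: (r.institution_code, r.catalog_number))


# Made-up sample records, for illustration only.
cas = [Occurrence("CAS", "0001", "Danio rerio")]
nhm = [Occurrence("NHM", "A-17", "Danio rerio"),
       Occurrence("NHM", "A-17", "Danio rerio")]  # same record published twice
merged = aggregate(cas, nhm)
```

A shared record shape is what lets an end user query all institutions at once, for example `by_name(merged, "Danio rerio")`.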

Page 4: Biodiversity  Informatics and the Biodiversity Literature

Organism occurrence data

Page 5: Biodiversity  Informatics and the Biodiversity Literature
Page 6: Biodiversity  Informatics and the Biodiversity Literature
Page 7: Biodiversity  Informatics and the Biodiversity Literature

Distribution models

Page 8: Biodiversity  Informatics and the Biodiversity Literature

Remaining challenges with occurrence data

• Lots of digitization still to do
• Taxonomic identifications need to be updated
• Georeferencing still needs to be done

Relationship to literature:
• Specimens and observations are primary data
• Literature contains both reports of primary data and summarized data
• Large-scale digitization efforts in museums might (will) swamp the content in literature

Page 9: Biodiversity  Informatics and the Biodiversity Literature

Taxonomic Databases

• Nomenclator
• Checklist – valid/accepted taxa (plus synonyms)
• Catalog of uses in taxonomic works
• Index – all unique name-strings mapped to valid names/concepts

>20M

Increasing density of names in relevant corpus
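
The "index" layer, in which every unique name-string maps to a valid name, can be sketched as a lookup table. This is a minimal sketch under stated assumptions: the `NameIndex` class and `normalize` function are hypothetical, and the normalization (case and whitespace only) is far simpler than what a real name index would need. The sample synonym is included purely as an illustration.

```python
def normalize(name_string):
    """Collapse whitespace and case so trivially different strings match."""
    return " ".join(name_string.split()).lower()


class NameIndex:
    """Maps every known name-string to the valid/accepted name it refers to."""

    def __init__(self):
        self._valid_for = {}

    def add(self, valid_name, synonyms=()):
        # The valid name resolves to itself; each synonym resolves to it too.
        for name_string in (valid_name, *synonyms):
            self._valid_for[normalize(name_string)] = valid_name

    def resolve(self, name_string):
        """Return the valid name for a string, or None if it is unknown."""
        return self._valid_for.get(normalize(name_string))


# Illustrative entry: one accepted name with one synonym.
index = NameIndex()
index.add("Danio rerio", synonyms=["Brachydanio rerio"])
```

With this in place, `index.resolve("brachydanio  RERIO")` returns the accepted name, while an unknown string returns `None` instead of a guess.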

Page 10: Biodiversity  Informatics and the Biodiversity Literature

Emergent consensus

• Philosophical/methodological debates
  – Species concepts
    • Biological
    • Evolutionary
    • Phylogenetic
  – Taxonomic definitions
    • Circumscription
    • Synonymized types
    • Set of specimens identified by taxon author
    • Tree- or lineage-based definition

Page 11: Biodiversity  Informatics and the Biodiversity Literature

Anchor name usage to publication metadata and to the actual publication; enable validation

[Diagram: a Name Usage links a Name to a Citation (publication metadata), with "begin" and "end" markers.]
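
One way to read this slide is as a small data model in which each name usage carries a pointer into a cited publication. The sketch below is hypothetical throughout: the class and field names are invented, the citation is made up, and treating "begin" and "end" as a page range is an assumption, since the slide does not say what they delimit.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Citation:
    """Publication metadata that a name usage is anchored to."""
    authors: str
    year: int
    title: str


@dataclass(frozen=True)
class NameUsage:
    """One use of a name in one publication."""
    name: str
    citation: Citation
    begin: Optional[int] = None  # assumed here to be a first page number
    end: Optional[int] = None    # assumed here to be a last page number

    def is_anchored(self):
        """True if the usage points at a checkable location in the work."""
        return (self.begin is not None
                and self.end is not None
                and self.begin <= self.end)


# A made-up citation, for illustration only.
work = Citation(authors="Example Author", year=1900, title="A hypothetical revision")
anchored = NameUsage("Danio rerio", work, begin=12, end=14)
unanchored = NameUsage("Danio rerio", work)
```

Only an anchored usage can be validated against the publication itself, which is the point the slide makes with "enable validation".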

Page 12: Biodiversity  Informatics and the Biodiversity Literature

Remaining challenges with taxonomic data

• Taxa are concepts created in literature

• Physical instances of the same published work are “equivalent”

• Develop shared logical identifiers
• Reconciliation across "authoritative" databases; fewer "same as" records

Page 13: Biodiversity  Informatics and the Biodiversity Literature

Recap

• Taxonomic names are key to
  – Information retrieval
  – Information summary and grouping
  – Publication metadata are critical to anchoring taxonomic concepts, and
  – Providing the semantic touchstones for collaboration (critical)
• Occurrence data gives us species distributions
  – Direct relationship to literature is small
  – But taxonomy is critical to integrating occurrence data, so the literature is still fundamental

Page 14: Biodiversity  Informatics and the Biodiversity Literature

What’s next?


Page 15: Biodiversity  Informatics and the Biodiversity Literature

What other classes of information remain in the literature that could be extracted and structured to be really useful?

Page 16: Biodiversity  Informatics and the Biodiversity Literature

Genetic and genomic data?

… are not communicated or stored in the literature

Page 17: Biodiversity  Informatics and the Biodiversity Literature

A Model Organism

Danio rerio

the zebrafish

Page 18: Biodiversity  Informatics and the Biodiversity Literature

Understanding the origins of species through structured descriptions of diversity

[Diagram labels: Genotype A, Genotype B (Genomic Diversity); Phenotype A, Phenotype B (Morphological Diversity); Development; mutation; evolution]

Page 19: Biodiversity  Informatics and the Biodiversity Literature

Morphological variation across species is difficult to find and synthesize

Page 20: Biodiversity  Informatics and the Biodiversity Literature

Information retrieval from free-text is difficult

Page 21: Biodiversity  Informatics and the Biodiversity Literature

(Lundberg and Akama 2005)

Not computable across studies

Page 22: Biodiversity  Informatics and the Biodiversity Literature

What is an ontology?

• A set of well-defined terms and the logical relationships that hold between them

• Represents knowledge of a discipline

Page 23: Biodiversity  Informatics and the Biodiversity Literature

Teleost Anatomy Ontology terms and relationships

[Diagram. Relationship types: is_a, part_of, develops_from. Terms: basihyal bone, basihyal cartilage, basihyal element, ventral hyoid arch, pharyngeal arch cartilage, replacement bone.]
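
The terms and relationship types on this slide can be stored as simple (subject, relation, object) triples and then queried. This is a minimal sketch: the term and relation names come from the slide, but the specific edges below are an assumed reading of the flattened diagram, not confirmed by the source, and the `related` and `ancestors` helpers are hypothetical.

```python
# (subject, relation, object) triples. The edges are an ASSUMED reading of
# the slide's diagram; only the term and relation names come from the slide.
TRIPLES = [
    ("basihyal bone", "is_a", "replacement bone"),
    ("basihyal bone", "is_a", "basihyal element"),
    ("basihyal bone", "develops_from", "basihyal cartilage"),
    ("basihyal cartilage", "is_a", "pharyngeal arch cartilage"),
    ("basihyal cartilage", "is_a", "basihyal element"),
    ("basihyal element", "part_of", "ventral hyoid arch"),
]


def related(term, relation, triples=TRIPLES):
    """Direct targets of one relation from one term, sorted by name."""
    return sorted(o for s, r, o in triples if s == term and r == relation)


def ancestors(term, relation="is_a", triples=TRIPLES):
    """Follow one relation transitively and return every term reached."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        for parent in related(current, relation, triples):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return sorted(seen)
```

Because the relationships are explicit, a query such as `related("basihyal bone", "develops_from")` is computable across studies in a way that free-text descriptions are not.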

Page 24: Biodiversity  Informatics and the Biodiversity Literature


Ontologies quickly become large and complex; guiding philosophy required

Dahdul et al., 2010, Systematic Biology

The Teleost Anatomy Ontology contains 3,039 terms, with >600 skeletal terms

Page 25: Biodiversity  Informatics and the Biodiversity Literature

Fig. 1, Washington et al., 2010

Translational medicine

Translation from model organisms to humans

Page 26: Biodiversity  Informatics and the Biodiversity Literature

Phenoscape II & Research Coordination Network (RCN)

• Extended to include other model organisms and taxonomic groups, e.g.:
  – Amphibian Anatomy Ontology (AAO) – Blackburn, CAS
  – Hymenoptera Anatomy Ontology (HAO) – Deans, NCSU
  – Plant Ontology – Huala, Stanford

• NLP and term extraction (Hong Cui, Univ of Arizona)

Page 27: Biodiversity  Informatics and the Biodiversity Literature

What’s next?

• Description of biological phenomena
• Determining how best to do this will take time
• Top-down design, guided by functional demonstration
• Bottom-up curation of existing descriptions into structured knowledge through iteration