Donat Agosti Plazi http://plazi.org Systematics Association Oxford, 28. August 2015 Nothing in taxonomy makes sense except in the light of Open Access

Nothing in taxonomy makes sense except in the light of Open Access

Vision and hope

I want to be able to access, mine and analyse a significant body of published and digitized taxonomic knowledge at any time, anywhere.

I want to build the Catalogue of Life by machine.

I hope taxonomic communication arrives in the 21st century.

1. The demand

Before antbase.org, Harvard’s Museum of Comparative Zoology could claim to be the only location with a complete set of ant systematics publications from 1758 to the present. Through antbase.org’s digital library, access to this body of literature is worldwide, and it is actively used (>10,000 visits in one month alone).

2004

2. The corpus of taxonomic literature

3. The core corpus of taxonomic knowledge: Treatments

Build and establish a TreatmentBank, such as Plazi, as a basis for content mining of, and linking to, the taxonomic literature

4. Make use of the semantic linked WWW

Avoid all the current wasteful publishing!

• Publish structured data

• Publish open access

• Make taxonomic literature first-class literature by minting DOIs and making digital copies accessible

• Add links to names, treatments, articles, DNA sequences, digital objects

• Help by building your own public corpus of citable data

Pensoft journals (e.g. Biodiversity Data Journal, ZooKeys, PhytoKeys) are the gold standard.

Surfing or the seduction of science (for a young kid)

Surfing or the seduction of science (for an adult)

Get a copy of the Cyclothone paper

Surfing or the imperative for science

Linking treatments and data with external resources

NCBI

Establish Plazi as, or use Plazi to build, TreatmentBank as a source for content mining of the taxonomic literature

TreatmentBank

What are the species in Amazonia?

Text mining tools: Visualization of treatment content

Summary of the content of 37 Zootaxa spider publications and 8 Biodiversity Data Journal publications (Miller et al., 2015).

Text mining tools: Visualization of treatment content

Pseudomyrmex ants and Vachellia ant-acacias are a classic example of mutualism in biology.

Figure: Transbiotic link network. Associated species (ants and plants) are linked through references in taxonomic treatments, and treatments are linked through citations.

Example: acacia-ant species Pseudomyrmex gracilis; treatment: redescription; associated ant-acacia: Acacia gentlei.

Node labels (species epithets): allenii, melanoceras, ruddiae, chiapensis, collinsii, cookii, cornigera, globulifera, hindsii, janzenii, mayana, sphaerocephala, boopis, flavicornis, hesperius, ita, janzeni, kuenckeli, mixtecus, nigrocinctus, nigropilosus, opaciceps, particeps, peperi, reconditus, satanicus, simulans, spinicola, subtilissimus, veneficus, ferrugineus, gentlei, gracilis.

Photo credits: Alex Wild

What does this mean?

The Linking Open Data cloud diagram

Linked Open Data Cloud

The demand: scientists and citizen scientists

Before antbase.org, Harvard’s Museum of Comparative Zoology could claim to be the only location with a complete set of ant systematics publications from 1758 to the present. Through antbase.org’s digital library, access to this body of literature is worldwide, and it is actively used (>10,000 visits in one month alone).

Online catalogue, open access, online library (2004)

Online catalogue

The interest of big science

2004

2005

The scientific challenge: Bridging the gap

  1 tnntttccca cgaataaata atataagatt ttgattatta cctccttctt taattttatt
 61 attatcaaga agattagttt ataaaggagt aggaacagga tgaactgttt atcctccttt
121 atctaataat ttatatcata atggattttc aactgattta gcaatttttt ctttacatat
181 tgcaggaata tcatcaatta taggagcaat taattttatt tcaacaattt taaatataca
241 tcataaaaat ttatcattag ataaaattcc attgttagtt tgatcaattt taattacagc
301 tattttatta ttattatctt tacctgtatt agcaggtgca attactatat tattaactga
361 tcgaaatcta aatacaactt tttttgatcc ttcgggtgga ggagatccaa ttttatatca
421 acatttattt
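The sequence on this slide looks like the ORIGIN block of a GenBank flat-file record: a position counter followed by blocks of ten bases. Assuming that layout, a minimal Python sketch for turning such a block into one contiguous string could look like this. The function name `parse_origin` is illustrative, and only the first two lines of the slide's sequence are used.

```python
import re


def parse_origin(block: str) -> str:
    """Join a GenBank-style ORIGIN block into one contiguous sequence.

    Assumption: each line is a position counter followed by groups of
    lowercase base codes; counters and whitespace are simply dropped.
    """
    return "".join(re.findall(r"[a-z]+", block.lower()))


# First two lines of the sequence shown on the slide.
origin = """
  1 tnntttccca cgaataaata atataagatt ttgattatta cctccttctt taattttatt
 61 attatcaaga agattagttt ataaaggagt aggaacagga tgaactgttt atcctccttt
"""

seq = parse_origin(origin)
print(len(seq), seq[:10])
```

Once the sequence is a plain string it can be linked to a name or a treatment like any other data element.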

Where do we stand?

The bristlemouths are a rapacious family of deep-sea fishes that include the wildly successful genus Cyclothone

In contrast, ichthyologists put the likely figure for bristlemouths at hundreds of trillions — and perhaps quadrillions, or thousands of trillions.

Taxonomy? Source?

Issue: USD 266.00

Article: USD 48.00

Get a copy of the Cyclothone paper

Our contribution for a better understanding of biodiversity

Access to ant taxonomic publications through antbase.org / Smithsonian Institution, currently including the entire body of non-copyrighted publications since 1758 (>4,000 publications or 85,000 pages). Source: Agosti (2005).

Access

• Limited access (copyright)

• Limited discoverability of content

• Research results cannot be cited

• Data mining does not work

Issues of access

Provide an open access, linked corpus of taxonomic literature

A solution

Surfing at breakfast table

article

treatment

cites (HTTP URI)

cites (DOI)

Scientific name

https://www.wikidata.org/wiki/Property:P1992

Feed Wikipedia with taxonomic data

Surfing or the imperative for science

LOD, PDF

HNS

HNS

Surfing or the imperative for science: Use of name services

The goal

Create a citable open corpus of taxonomic publications

Biodiversity Literature Repository: Record

Biodiversity Literature Repository: Record, Treatment

Illustration

http://plazi.org/wiki/Blue_List

Patterson et al., 2014: http://dx.doi.org/10.1186/1756-0500-7-79

Legal issues

Workflow

Plazi SRS

find → scan → «OCR» → markup → store + access

Text

<tax:treatment>
  <tax:nomenclature>
    <tax:name>
      <tax:xid source="HNS" identifier="193329"/>
      <tax:xmldata>
        <dc:Genus>Mystrium</dc:Genus>
        <dc:Species>leonie</dc:Species>
      </tax:xmldata>
      Mystrium leonie
    </tax:name>
    <tax:status>n. sp.</tax:status>
    Fig 1 D - F
  </tax:nomenclature>
  <tax:div type="description">
    <tax:p>HOLOTYPE WORKER: TL 3.95, HL 1.02, HW 0.95, CI 93, SL
    1.30, SI 137, PW 0.73, ML 0.38. Mandible outer margin
    to a sharp apical tooth, the apex parallel to the anterior
    (Holotype with material in mandibles, so mandibles and
    $ described below from paratypes.) Median clypeus
    ....</tax:p>
  </tax:div>
</tax:treatment>

Semantically enhanced text (TaxonX)

… alternatives: From human to machine readable text

RDF
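To illustrate what "machine readable" buys here, the following Python sketch reads the nomenclature part of a TaxonX-style fragment with the standard library. It is only a sketch under assumptions: the slide does not show the namespace declarations, so the two namespace URIs below are placeholders, and `summarize_treatment` is an illustrative helper, not part of any Plazi tool.

```python
import xml.etree.ElementTree as ET

# Placeholder namespace URIs (assumption): the slide omits the real ones.
NS = {
    "tax": "http://example.org/taxonx",
    "dc": "http://example.org/darwincore",
}

FRAGMENT = """
<tax:treatment xmlns:tax="http://example.org/taxonx"
               xmlns:dc="http://example.org/darwincore">
  <tax:nomenclature>
    <tax:name>
      <tax:xid source="HNS" identifier="193329"/>
      <tax:xmldata>
        <dc:Genus>Mystrium</dc:Genus>
        <dc:Species>leonie</dc:Species>
      </tax:xmldata>
      Mystrium leonie
    </tax:name>
    <tax:status>n. sp.</tax:status>
  </tax:nomenclature>
</tax:treatment>
"""


def summarize_treatment(xml_text: str) -> dict:
    """Pull the name parts, status and external identifier from a treatment."""
    root = ET.fromstring(xml_text)
    xid = root.find(".//tax:xid", NS)
    return {
        "genus": root.findtext(".//dc:Genus", namespaces=NS),
        "species": root.findtext(".//dc:Species", namespaces=NS),
        "status": root.findtext(".//tax:status", namespaces=NS),
        "xid_source": xid.get("source"),
        "xid": xid.get("identifier"),
    }


info = summarize_treatment(FRAGMENT)
print(info)
```

The same text a human reads as "Mystrium leonie n. sp." becomes a small record that can be indexed, linked or exported.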

Plazi tools: table extraction

«Treatment», scientific species name, distribution record, bibliographic records

Cataglyphis tartessica workers

Variable          mean ± SD
Head length       11.23 ± 0.12
Head width        11.15 ± 0.12
Scape length      11.47 ± 0.12
Mesosoma length   11.94 ± 0.16
Femur length      12.03 ± 0.14
Cephalic index    0 93.60 ± 3.940
Scape index       128.10 ± 7.660
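As a rough illustration of table extraction, the Python sketch below turns "variable mean ± SD" rows into structured values. It is not the Plazi implementation; the regular expression and the helper `parse_row` are assumptions for this example, and the sample rows are copied as they appear in the extracted slide text.

```python
import re

# One row: free-text variable name, then "mean ± SD" as decimal numbers.
ROW = re.compile(
    r"^(?P<variable>.+?)\s+(?P<mean>\d+(?:\.\d+)?)\s*±\s*(?P<sd>\d+(?:\.\d+)?)$"
)


def parse_row(line: str):
    """Return (variable, mean, sd) for a data row, or None for anything else."""
    m = ROW.match(line.strip())
    if m is None:
        return None
    return m["variable"], float(m["mean"]), float(m["sd"])


rows = [
    "Variable mean ± SD",          # header row, expected to be skipped
    "Head length 11.23 ± 0.12",
    "Scape index 128.10 ± 7.660",
]
parsed = [parse_row(r) for r in rows]
print(parsed)
```

Rows that do not fit the pattern, such as the header, come back as `None`, so they can be reviewed by hand rather than silently mis-parsed.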

Plazi tools: discovery of scientific names

Plazi tools: discovery and parsing of bibliographic references

Plazi tools: discovery and parsing of observation data

Plazi tools: discovery of treatments

Treatment: a well-defined part of an article that defines the particular usage of a scientific name by an authority at a given time (one or more pages in a publication).

Treatment

The special case of taxonomic literature: the cited elements are treatments, not articles

Formica obsoleta Linnaeus, 1758: 580
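A citation such as the one above already has a regular shape: name, author, year, page. The Python sketch below splits it into those parts. It assumes exactly this "Genus species Author, year: page" pattern, and real citations vary far more widely. The pattern and the helper `parse_citation` are illustrative.

```python
import re

# Assumed shape: "Genus species Author, YYYY: page"
CITATION = re.compile(
    r"^(?P<genus>[A-Z][a-z]+)\s+(?P<species>[a-z]+)\s+"
    r"(?P<author>[^,]+),\s*(?P<year>\d{4}):\s*(?P<page>\d+)$"
)


def parse_citation(text: str):
    """Split a treatment citation into its parts, or return None."""
    m = CITATION.match(text.strip())
    return m.groupdict() if m else None


cite = parse_citation("Formica obsoleta Linnaeus, 1758: 580")
print(cite)
```

The page number is what points at the treatment rather than at the publication as a whole.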

Original combinations

Reference to an original combination

Subsequent usages of names cite the referenced treatment

What is a treatment?

Treatment and treatment reference and citation

Treatment citation

Treatment references

Treatment

Citing of treatments or linking of treatments to treatments

By minting persistent HTTP URIs for treatments, treatments can be cited like a bibliographic reference

http://treatment.plazi.org/id/A9FFD1FC-4629-FFB4-968F-AD38386521BA
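Because the treatment URI is stable, software can handle it like any other identifier. The Python sketch below extracts the identifier from such a URI without making a network request. The identifier pattern (hexadecimal groups of 8-4-4-4-12 characters) is inferred from the single example above and is an assumption, as is the helper name `treatment_id`.

```python
import re
from urllib.parse import urlparse

# Assumed identifier shape, inferred from the one example URI on the slide.
ID_PATTERN = re.compile(r"^[0-9A-F]{8}(?:-[0-9A-F]{4}){3}-[0-9A-F]{12}$")


def treatment_id(uri: str):
    """Return the treatment identifier from a treatment URI, or None."""
    parts = urlparse(uri)
    prefix = "/id/"
    if parts.netloc != "treatment.plazi.org" or not parts.path.startswith(prefix):
        return None
    candidate = parts.path[len(prefix):]
    return candidate if ID_PATTERN.match(candidate) else None


uri = "http://treatment.plazi.org/id/A9FFD1FC-4629-FFB4-968F-AD38386521BA"
tid = treatment_id(uri)
print(tid)
```

A reference manager or link checker could use a check like this to tell treatment citations apart from ordinary article links.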

Status quo

• 50,000+ treatments live, growing daily

• RDF in beta version

• GoldenGate Imagine (PDF and text mining tool) in beta version

• Data provider for NCBI, Wikidata, GBIF, EOL, antweb

• Biodiversity Literature Repository functional

Next steps

• Collaborate with ContentMine to extract >50 treatments/day

Planned collaboration with ContentMine to extract treatments on a daily basis

http://www.slideshare.net/petermurrayrust/?

BioDiv

• Collaborate with ContentMine to extract 50 treatments/day

• 1 million treatments live

• RDF version accessible

• GoldenGate Imagine (text mining tool)

• Data provider for NCBI, GBIF, EOL, antweb

• Biodiversity Literature Repository with 100,000 bibliographic references and digital copies (PDF, images, etc.)

BUT

Avoid all this waste (our next generation will have to clean up)!

• Publish structured data

• Publish open access

• Publish in journals with DOI

• Add links to names, treatments, articles, DNA sequences, digital objects

• Help build your own corpus of citable data

Pensoft journals (e.g. Biodiversity Data Journal, ZooKeys, PhytoKeys) are the gold standard.

Thanks!

Donat Agosti

Acknowledgment: Pensoft, Zenodo/CERN, NCBI, Wikidata, ContentMine