A Schema for Description and Exchange of Taxonomic Publication's Content Donat Agosti, Terry Catapano, Lyubomir Penev & Guido Sautter Plazi, Bern, Switzerland 25. July 2011, IBC, Melbourne

DESCRIPTION

This is the preprint of my lecture at the International Botanical Congress, http://www.ibc2011.com/

Page 1: 20110725 ibc xml

A Schema for Description and Exchange of Taxonomic Publication's Content

Donat Agosti, Terry Catapano, Lyubomir Penev & Guido Sautter

Plazi, Bern, Switzerland

25. July 2011, IBC, Melbourne

Page 2: 20110725 ibc xml

WHY?

Page 3: 20110725 ibc xml

disseminate

access

knowledge

Page 4: 20110725 ibc xml

New York Times, July 19, 2011

Page 5: 20110725 ibc xml

“JSTOR's the one that should be in prison, man, for locking up knowledge.”

HuffPost Politics, July 19, 2011, http://www.huffingtonpost.com/2011/07/19/huffpost-hill----gang-vio_n_904027.html

Page 6: 20110725 ibc xml

Open Access

Page 7: 20110725 ibc xml

An example from the Neurocommons text mining pilot:

• PubMed abstracts: > 16,000,000
• CNS classified abstracts: 874,727
• text mining recognized: 368,688
• text mining processed: 94,381

• extracted graph of 30,000+ relationships and 5,500 genes and proteins

“protein-protein interaction networks” John Wilbanks, Neurocommons

Page 8: 20110725 ibc xml

In a semantic Web environment (where machines talk to each other and do most of our work), data need to be able to talk to each other:

27,266 papers

4,563 papers

41,985 papers

10,365 papers

128,437 papers

“protein-protein interaction networks” John Wilbanks, Neurocommons

Page 9: 20110725 ibc xml

It will open up scientific literature for data mining

“protein-protein interaction networks” John Wilbanks, Neurocommons

Page 10: 20110725 ibc xml

HOW?

Page 11: 20110725 ibc xml

access for human

AND machine

Page 12: 20110725 ibc xml

It is about digesting millions of pages:

>>100 M pages taxonomic literature

25M scientific publications / year

25K journals

>2K with zoological taxonomic descriptions

18K descriptions of new species / year

Page 13: 20110725 ibc xml

PDF is not enough

Page 14: 20110725 ibc xml

data and information in context

Page 15: 20110725 ibc xml

semantic markup

Page 16: 20110725 ibc xml

context of content

Page 17: 20110725 ibc xml

XML

eXtensible Markup Language

Page 18: 20110725 ibc xml

<tax:treatment>
  <tax:nomenclature>
    <tax:name>
      <tax:xid source="HNS" identifier="193329"/>
      <tax:xmldata>
        <dc:Genus>Mystrium</dc:Genus>
        <dc:Species>leonie</dc:Species>
      </tax:xmldata>
      Mystrium leonie Bihn &amp; Verhaagh, new species
    </tax:name>
    <tax:status>n. sp.</tax:status>
    Fig 1 D - F
  </tax:nomenclature>
  <tax:div type="description">
    <tax:p>HOLOTYPE WORKER: TL 3.95, HL 1.02, HW 0.95, CI 93, SL 1.30, SI 137, PW 0.73, ML 0.38. Mandible outer margin strongly curving to a sharp apical tooth, the apex parallel to the anterior clypeal margin. (Holotype with material in mandibles, so mandibles and anterior clypeus described below from paratypes.) Median clypeus....</tax:p>
  </tax:div>
</tax:treatment>
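To illustrate what "content in context" buys a machine reader, here is a minimal Python sketch that pulls the normalized genus, species and external identifier out of a treatment like the one on this slide. The namespace URIs are placeholders (the slide does not show the real declarations), and the function name `extract_names` is made up for this example; only the standard library is used.

```python
import xml.etree.ElementTree as ET

# Placeholder namespace URIs: assumptions, not the real TaxonX declarations.
NS = {
    "tax": "http://example.org/ns/taxonx",
    "dc": "http://example.org/ns/darwincore",
}

# Shortened copy of the treatment on the slide, with namespaces declared.
TREATMENT = """\
<tax:treatment xmlns:tax="http://example.org/ns/taxonx"
               xmlns:dc="http://example.org/ns/darwincore">
  <tax:nomenclature>
    <tax:name>
      <tax:xid source="HNS" identifier="193329"/>
      <tax:xmldata>
        <dc:Genus>Mystrium</dc:Genus>
        <dc:Species>leonie</dc:Species>
      </tax:xmldata>
      Mystrium leonie Bihn &amp; Verhaagh, new species
    </tax:name>
    <tax:status>n. sp.</tax:status>
  </tax:nomenclature>
</tax:treatment>
"""


def extract_names(xml_text):
    """Return one dict per tax:name element found in the document."""
    root = ET.fromstring(xml_text)
    records = []
    for name in root.iter("{%s}name" % NS["tax"]):
        xid = name.find("tax:xid", NS)
        records.append({
            "genus": name.findtext("tax:xmldata/dc:Genus", namespaces=NS),
            "species": name.findtext("tax:xmldata/dc:Species", namespaces=NS),
            "source": xid.get("source") if xid is not None else None,
            "identifier": xid.get("identifier") if xid is not None else None,
        })
    return records


if __name__ == "__main__":
    for record in extract_names(TREATMENT):
        print(record)
```

The same query against a PDF or plain text would need name recognition and guesswork; with the markup in place it is a path lookup.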

Page 19: 20110725 ibc xml

content in a complex e-environment

Page 20: 20110725 ibc xml

linking

Page 21: 20110725 ibc xml

Azteca instabilis

would then read like:

<tax:name>
  <!-- Link to external database -->
  <tax:xid source="LSID" identifier="urn:lsid:biosci.ohio-state.edu.osuc_concetps:13452"/>
  <!-- Normalization of data -->
  <tax:xmldata>
    <dc:Genus>Azteca</dc:Genus>
    <dc:Species>instabilis</dc:Species>
  </tax:xmldata>
  Azteca instabilis
</tax:name>
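The step from the plain name to the linked, normalized markup can be sketched in a few lines of Python. This is an illustration only: the namespace URIs are placeholders, `mark_up_name` is a hypothetical helper, and it assumes the verbatim string is a simple "Genus species" binomial. The LSID is copied from the slide as shown.

```python
import xml.etree.ElementTree as ET

# Placeholder namespace URIs: assumptions, not the real schema locations.
TAX = "http://example.org/ns/taxonx"
DC = "http://example.org/ns/darwincore"
ET.register_namespace("tax", TAX)
ET.register_namespace("dc", DC)


def mark_up_name(verbatim, lsid):
    """Wrap a verbatim binomial in tax:name with a link and normalized parts."""
    # Assumes exactly "Genus species"; real names need a proper parser.
    genus, species = verbatim.split()[:2]

    name = ET.Element("{%s}name" % TAX)
    # Link to external database
    ET.SubElement(name, "{%s}xid" % TAX, {"source": "LSID", "identifier": lsid})
    # Normalization of data
    xmldata = ET.SubElement(name, "{%s}xmldata" % TAX)
    ET.SubElement(xmldata, "{%s}Genus" % DC).text = genus
    ET.SubElement(xmldata, "{%s}Species" % DC).text = species
    # The verbatim text of the publication stays in place after the metadata.
    xmldata.tail = verbatim
    return name


if __name__ == "__main__":
    element = mark_up_name(
        "Azteca instabilis",
        "urn:lsid:biosci.ohio-state.edu.osuc_concetps:13452",
    )
    print(ET.tostring(element, encoding="unicode"))
```

Keeping the verbatim string next to the normalized fields means the original wording of the publication is never lost, even if the normalization later turns out to be wrong.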

Page 22: 20110725 ibc xml

definition of XML tags

DTD

schema

Page 23: 20110725 ibc xml

transformations from XML

html

pdf

print

rdf

archiving

database
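As a rough illustration of "one XML source, many outputs", the Python sketch below reads a marked-up name once and sends it to two of the targets listed above, an HTML fragment and a database table. The namespace URIs, the HTML class name, the table layout and all function names are assumptions made for this example; a production pipeline would more likely use XSLT.

```python
import sqlite3
import xml.etree.ElementTree as ET
from html import escape

# Placeholder namespace URIs: assumptions, not the real schema locations.
NS = {
    "tax": "http://example.org/ns/taxonx",
    "dc": "http://example.org/ns/darwincore",
}

NAME_XML = (
    '<tax:name xmlns:tax="http://example.org/ns/taxonx" '
    'xmlns:dc="http://example.org/ns/darwincore">'
    '<tax:xid source="HNS" identifier="193329"/>'
    "<tax:xmldata><dc:Genus>Mystrium</dc:Genus>"
    "<dc:Species>leonie</dc:Species></tax:xmldata>"
    "Mystrium leonie</tax:name>"
)


def read_name(xml_text):
    """Parse one tax:name element into a (genus, species, identifier) tuple."""
    name = ET.fromstring(xml_text)
    return (
        name.findtext("tax:xmldata/dc:Genus", namespaces=NS),
        name.findtext("tax:xmldata/dc:Species", namespaces=NS),
        name.find("tax:xid", NS).get("identifier"),
    )


def to_html(genus, species, identifier):
    """Render the name as an italic HTML fragment carrying its identifier."""
    return '<i class="taxon" data-id="%s">%s %s</i>' % (
        escape(identifier), escape(genus), escape(species))


def to_database(conn, genus, species, identifier):
    """Store the same record as a row in a hypothetical 'names' table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS names "
        "(genus TEXT, species TEXT, identifier TEXT)")
    conn.execute("INSERT INTO names VALUES (?, ?, ?)",
                 (genus, species, identifier))


if __name__ == "__main__":
    record = read_name(NAME_XML)
    print(to_html(*record))
    connection = sqlite3.connect(":memory:")
    to_database(connection, *record)
    print(connection.execute("SELECT * FROM names").fetchall())
```

The point of the design is that the XML stays the single source; each output format is a disposable view that can be regenerated at any time.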

Page 24: 20110725 ibc xml

legacy: TaxonX

prospective: TaxPub

Page 25: 20110725 ibc xml

how to use XML?

Page 26: 20110725 ibc xml

legacy publications

Page 27: 20110725 ibc xml

Plazi workflow: GoldenGate editor based mark up and linking

- Get LSID from Hymenoptera Name Server for names; ZooBank?
- Add new names
- Get bibliographic metadata from HNS (MODS)
- Get bibliographic GUIDs from bioguid (or EDIT?)
- Get geographic long/lat from geonames.org
- Get GUIDs for:
  - CBOL
  - NCBI
  - specimen
  - images
  - .....

Legacy publications

Page 28: 20110725 ibc xml

linked data

Page 29: 20110725 ibc xml

last resort

Page 30: 20110725 ibc xml

prospective publications

Page 31: 20110725 ibc xml
Page 32: 20110725 ibc xml

the future

Page 33: 20110725 ibc xml

dissemination - access

Page 34: 20110725 ibc xml

Plazi: access to treatments

TAPIR, SPM, etc.

You

You

You

human

machine

Page 35: 20110725 ibc xml

It will open up scientific literature for data mining and extraction

“protein-protein interaction networks” John Wilbanks, Neurocommons

Page 36: 20110725 ibc xml

http://plazi.org

Thank you very much!

Donat Agosti, Terry Catapano, Lyubomir Penev & Guido Sautter

Page 37: 20110725 ibc xml

JSTOR did not permit users: c. to make other than personal use of individually downloaded articles.

Aaron Swartz indictment, July 14, 2011