This is the preprint of my lecture at the International Botanical Congress (http://www.ibc2011.com/).
A Schema for Description and Exchange of Taxonomic
Publication's Content
Donat Agosti, Terry Catapano, Lyubomir Penev & Guido Sautter
Plazi, Bern, Switzerland
25. July 2011, IBC, Melbourne
WHY?
disseminate
access
knowledge
New York Times, July 19, 2011
“JSTOR's the one that should be in prison, man, for locking up
knowledge.”
HuffPost Politics, July 19, 2011
http://www.huffingtonpost.com/2011/07/19/huffpost-hill----gang-vio_n_904027.html
Open Access
An example from the Neurocommons text mining pilot:
• PubMed abstracts: > 16,000,000
• CNS classified abstracts: 874,727
• text mining recognized: 368,688
• text mining processed: 94,381
• extracted graph of 30,000+ relationships and 5,500 genes and proteins
“protein-protein interaction networks” John Wilbanks, Neurocommons
In a semantic Web environment (where machines talk to each other and do most of our work), data need to be able to talk to each other:
27,266 papers
4,563 papers
41,985 papers
10,365 papers
128,437 papers
It will open up scientific literature for data mining
HOW?
access for human
AND machine
It is about digesting millions of pages:
>>100 M pages taxonomic literature
25M scientific publications / year
25K journals
>2K with zoological taxonomic descriptions
18K descriptions of new species / year
PDF is not enough
data and information in context
semantic markup
context of content
XML: eXtensible Markup Language
<tax:treatment>
  <tax:nomenclature>
    <tax:name>
      <tax:xid source="HNS" identifier="193329"/>
      <tax:xmldata>
        <dc:Genus>Mystrium</dc:Genus>
        <dc:Species>leonie</dc:Species>
      </tax:xmldata>
      Mystrium leonie Bihn &amp; Verhaagh, new species
    </tax:name>
    <tax:status>n. sp.</tax:status>
    Fig 1 D - F
  </tax:nomenclature>
  <tax:div type="description">
    <tax:p>HOLOTYPE WORKER: TL 3.95, HL 1.02, HW 0.95, CI 93, SL 1.30, SI 137, PW 0.73, ML 0.38. Mandible outer margin strongly curving to a sharp apical tooth, the apex parallel to the anterior clypeal margin. (Holotype with material in mandibles, so mandibles and anterior clypeus described below from paratypes.) Median clypeus....</tax:p>
  </tax:div>
</tax:treatment>
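A minimal sketch of what "semantic markup" buys a machine reader: once the name is tagged, a few lines of Python can pull out the normalized genus, species and external identifier. The namespace URIs below are placeholders (the slide shows only the `tax` and `dc` prefixes, not their declarations), and `read_name` is a hypothetical helper, not part of any Plazi tool.

```python
import xml.etree.ElementTree as ET

# Placeholder namespace URIs: the slide does not give the real declarations.
NS = {"tax": "http://example.org/taxonx", "dc": "http://example.org/dc"}

SAMPLE = """
<tax:treatment xmlns:tax="http://example.org/taxonx"
               xmlns:dc="http://example.org/dc">
  <tax:nomenclature>
    <tax:name>
      <tax:xid source="HNS" identifier="193329"/>
      <tax:xmldata>
        <dc:Genus>Mystrium</dc:Genus>
        <dc:Species>leonie</dc:Species>
      </tax:xmldata>
      Mystrium leonie Bihn &amp; Verhaagh, new species
    </tax:name>
    <tax:status>n. sp.</tax:status>
  </tax:nomenclature>
</tax:treatment>
""".strip()


def read_name(xml_text):
    """Return the normalized name data of the first tax:name in a treatment."""
    root = ET.fromstring(xml_text)
    name = root.find(".//tax:name", NS)
    xid = name.find("tax:xid", NS)
    return {
        "genus": name.findtext("tax:xmldata/dc:Genus", namespaces=NS),
        "species": name.findtext("tax:xmldata/dc:Species", namespaces=NS),
        "source": xid.get("source"),
        "identifier": xid.get("identifier"),
        "status": root.findtext(".//tax:status", namespaces=NS),
    }


record = read_name(SAMPLE)
print(record)
```

The same lookup on a PDF would require layout analysis and name recognition; here the context is already explicit in the tags.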
content in a complex e-environment
linking
Azteca instabilis
would then read as
<tax:name>
  <!-- link to external database -->
  <tax:xid source="LSID" identifier="urn:lsid:biosci.ohio-state.edu.osuc_concetps:13452"/>
  <!-- normalization of data -->
  <tax:xmldata>
    <dc:Genus>Azteca</dc:Genus>
    <dc:Species>instabilis</dc:Species>
  </tax:xmldata>
  Azteca instabilis
</tax:name>
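As an illustration of the linking step, the sketch below wraps a plain name string in a `tax:name` element carrying an external identifier and the normalized genus and species. The namespace URIs and the `link_name` function are assumptions made for this example; the identifier is copied verbatim from the slide.

```python
import xml.etree.ElementTree as ET

# Placeholder namespace URIs; the real ones are not shown on the slide.
TAX = "http://example.org/taxonx"
DC = "http://example.org/dc"
ET.register_namespace("tax", TAX)
ET.register_namespace("dc", DC)


def link_name(genus, species, source, identifier):
    """Build a tax:name element with an external link and normalized data."""
    name = ET.Element(f"{{{TAX}}}name")
    # Link to the external database.
    ET.SubElement(name, f"{{{TAX}}}xid", {"source": source, "identifier": identifier})
    # Normalized, machine-readable form of the name.
    data = ET.SubElement(name, f"{{{TAX}}}xmldata")
    ET.SubElement(data, f"{{{DC}}}Genus").text = genus
    ET.SubElement(data, f"{{{DC}}}Species").text = species
    # The human-readable name follows the data block as ordinary text.
    data.tail = f"{genus} {species}"
    return name


element = link_name(
    "Azteca",
    "instabilis",
    "LSID",
    "urn:lsid:biosci.ohio-state.edu.osuc_concetps:13452",
)
xml_text = ET.tostring(element, encoding="unicode")
print(xml_text)
```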
definition of XML tags
DTD
schema
transformations from XML
HTML
PDF
RDF
archiving
database
legacy: TaxonX
prospective: TaxPub
how to use XML?
legacy publications
- Get LSID from Hymenoptera Name Server for names; ZooBank?
- Add new names
- Get bibliographic metadata from HNS (MODS)
- Get bibliographic GUIDs from bioguid (or EDIT?)
- Get geographic long/lat from geonames.org
Plazi workflow: GoldenGATE editor-based markup and linking
- Get GUIDs for:
  - CBOL
  - NCBI
  - specimen
  - images
  - .....
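One of the enrichment steps listed above, looking up coordinates for a locality, can be sketched as follows. This is a sketch under stated assumptions, not the Plazi workflow's own code: it assumes the public GeoNames `searchJSON` web service with its `q`, `maxRows` and `username` parameters, and both function names are hypothetical. No network request is made; the sample payload is made up to mirror the shape of a GeoNames response.

```python
from urllib.parse import urlencode

# Assumed endpoint of the GeoNames search web service.
GEONAMES_SEARCH = "http://api.geonames.org/searchJSON"


def geonames_search_url(place, username, max_rows=1):
    """Build the query URL for a locality string (a GeoNames account name is required)."""
    query = urlencode({"q": place, "maxRows": max_rows, "username": username})
    return f"{GEONAMES_SEARCH}?{query}"


def first_coordinates(payload):
    """Return (lat, long) of the first hit in a decoded response, or None if empty."""
    hits = payload.get("geonames", [])
    if not hits:
        return None
    return float(hits[0]["lat"]), float(hits[0]["lng"])


url = geonames_search_url("Melbourne", "demo")
# Made-up payload for illustration only; real responses carry more fields.
sample = {"geonames": [{"name": "Melbourne", "lat": "-37.8", "lng": "144.9"}]}
coords = first_coordinates(sample)
print(url)
print(coords)
```

The resulting pair would then be written back into the marked-up document next to the locality string, in the same way the name identifiers are attached to names.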
Legacy publications
linked data
last resort
prospective publications
the future
dissemination - access
Plazi: access to treatments
TAPIR, SPM, etc.
You
human
machine
It will open up scientific literature for data mining and extraction
http://plazi.org
Thank you very much!
Donat Agosti, Terry Catapano, Lyubomir Penev & Guido Sautter
JSTOR did not permit users: c. to make other than personal use of individually downloaded articles.
Aaron Swartz indictment, July 14, 2011