20110122 vibrant final


DESCRIPTION

Lecture presented at the ViBRANT meeting in Paris, January 20, 2011


The Future of Scientific Publishing

Donat Agosti (Plazi, Bern) 21 January 2011

Paris

I don't know the future, but I have a dream…

Immersing in the knowledge

I want to ask a publication a question, not have the author tell me what I have to read.

I want to find out:

• how many and which species are there?
• how are they related?
• do they disappear?
• how are they distributed?

Other people have different interests

An example from the Neurocommons text mining pilot:

• PubMed abstracts: > 16,000,000
• CNS classified abstracts: 874,727
• text mining recognized: 368,688
• text mining processed: 94,381
• extracted graph of 30,000+ relationships and 5,500 genes and proteins

“protein-protein interaction networks” John Wilbanks, Neurocommons
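To make the narrowing of this pipeline easier to see, the counts on the slide can be turned into stage-to-stage shares. The following is only a minimal sketch: it assumes each stage is a subset of the one before it (the slide does not say so explicitly), and it treats "> 16,000,000" as a lower bound. The names `STAGES` and `funnel` are illustrative and not part of Neurocommons.

```python
# Counts as listed on the slide; the first value is given as "> 16,000,000",
# so it is used here as a lower bound (assumption).
STAGES = [
    ("PubMed abstracts", 16_000_000),
    ("CNS classified abstracts", 874_727),
    ("text mining recognized", 368_688),
    ("text mining processed", 94_381),
]


def funnel(stages):
    """Return (label, count, share_of_previous_stage) for each stage.

    The share is None for the first stage, which has no predecessor.
    """
    rows = []
    previous = None
    for label, count in stages:
        share = None if previous is None else count / previous
        rows.append((label, count, share))
        previous = count
    return rows


if __name__ == "__main__":
    for label, count, share in funnel(STAGES):
        kept = "" if share is None else f" ({share:.1%} of previous stage)"
        print(f"{label}: {count:,}{kept}")
```

Read this way, the extracted relationship graph rests on a small fraction of the abstracts that were available at the start.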

In a semantic Web environment (where machines talk to each other and do most of our work), data need to be able to talk to each other:

[Figure: paper counts shown: 27,266; 4,563; 41,985; 10,365; 128,437. Source: “protein-protein interaction networks”, John Wilbanks, Neurocommons]

It will open up scientific literature for data mining


An example from the taxonomy text mining pilot:

• Every year: > 17,000 new species described
• Every year: > 100,000 species redescribed
• Total journals: > 2,000 with taxonomic content
• Total: 1,900,000 species described
• Total: > 20,000,000 treatments
• text mining processed: 0
• extracted graph of 0 species, 0 relationships

Taxon mining project

1996

Conservation, Phylogeny, Systematics, Curiosity, Aesthetics, Fascination

2011

Experience, Frustration, Wonder, Excitement, Satisfaction, Determination

Modeling taxonomic literature: TaxonX

TaxPub NLM DTD

Plazi

- Get LSID from Hymenoptera Name Server for names; ZooBank?

- Add new names

- Get bibliographic Metadata from HNS (MODS)

- Get bibliographic Guids from bioguid (or EDIT?)

- Get geographic long/lat from geonames.org

Plazi workflow: GoldenGate mark up as an example

- Get Guids for:
  - CBOL
  - NCBI
  - specimen
  - images
  - .....

The semantically enhanced treatments, extracted, stored on Plazi.org, and served in a human readable form, are linked to the underlying data: Fisher & Smith, 2008, PLoS ONE.

Plazi Search and Retrieval Server: Access to data

[Diagram: TAPIR and SPM interfaces giving both human and machine users (“You”) access to the data]

The conversion comes at a cost, even though GoldenGate and other editors exist

[Chart: Ann. Soc. Entomol. Belg. Time in minutes (y-axis, 0 to 7) to produce clean OCR using ABBYY, per publication (x-axis, HNS ID); publications in chronological order.]

Production metrics to measure effort and compare various approaches and algorithms

How to mark up large body of legacy publications?

• In-house?
• Build / use commercial services?
• Use the community, e.g. volunteers?

Activation energy

[Diagram: cost per knowledge, Gutenberg versus Semantic Web]

Training and demos...

Avoid it

Prospective publications: ZooKeys / PhytoKeys

Semantic enhancements to published texts

2036

?

Why do we publish?

Public funded research

Contribute to the welfare of the nations…

Dissemination

Access

Before antbase.org, Harvard's Museum of Comparative Zoology could claim to be the only location with a complete set of ant systematics publications from 1758 to the present.

Through antbase.org's digital library, this body of literature is accessible worldwide, and it is actively used (>10,000 visits in a single month).

Access to ant taxonomic publications through antbase.org / Smithsonian Institution currently includes the entire body of non-copyrighted publications since 1758 (>4,000 publications or 85,000 pages).

The Biodiversity Heritage Library is currently digitizing and making accessible >100 million pages, most of them out of copyright, i.e. older than 1925 ... to be finished in 2048...

What is a publication from public funded science?

Open Access

What is a scientific publication?

Print, journal, article, treatment, public funding, pdf, xml

Tool to disseminate scientific knowledge

Why do we publish the way we publish?

What kind of publications serve our needs?

IPBES

Access

Beyond the PDF

Access to what?

Scratchpad, EOL page, Wikipage, species page

Treatment

Treatments come with a lot of overhead

The structure of a systematics publication

• Title
• Author
• Abstract
• Introduction
• Taxon descriptions
  - Genus: Diagnosis, Notes, Biology, Distribution, Key to sp., Species descriptions
  - Species descriptions: Species 1, Species 2, Species 3, Species 4, Species .., Species n
  - Species treatments (e.g. Species 1): Nomenclature, Diagnosis, Distribution, Material Examined, Comments, Description, Graphic art
• Suppl. Materials
• Acknowledgments
• References

Treatments are highly structured


Content is defined

XML can define it

This can also be applied to entire sections of text, such as the descriptions of a species and its parts.

<tax:treatment>
  <tax:nomenclature>
    <tax:name>
      <tax:xid source="HNS" identifier="193329"/>
      <tax:xmldata>
        <dc:Genus>Mystrium</dc:Genus>
        <dc:Species>leonie</dc:Species>
      </tax:xmldata>
      Mystrium leonie
    </tax:name>
    <tax:status>n. sp.</tax:status>
    Fig 1 D - F
  </tax:nomenclature>
  <tax:div type="description">
    <tax:p>HOLOTYPE WORKER: TL 3.95, HL 1.02, HW 0.95, CI 93, SL 1.30, SI 137, PW 0.73, ML 0.38. Mandible outer margin strongly curving to a sharp apical tooth, the apex parallel to the anterior clypeal margin. (Holotype with material in mandibles, so mandibles and anterior clypeus described below from paratypes.) Median clypeus....</tax:p>
  </tax:div>
</tax:treatment>
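Once a treatment is marked up like this, a machine can answer simple questions about it directly. The sketch below uses Python's standard `xml.etree.ElementTree` to pull the name, status, identifier and description out of a shortened copy of the excerpt. It is an illustration only: the namespace URIs are placeholders (the slide does not declare them), the description text is truncated, and `summarize_treatment` is a hypothetical helper, not part of Plazi's tooling.

```python
import xml.etree.ElementTree as ET

# Placeholder namespace URIs (assumption): the slide excerpt uses the
# prefixes "tax" and "dc" but does not show their declarations.
NS = {
    "tax": "http://example.org/taxonx",
    "dc": "http://example.org/darwincore",
}

# Shortened copy of the excerpt, with the namespaces declared so it parses.
SAMPLE = """\
<tax:treatment xmlns:tax="http://example.org/taxonx"
               xmlns:dc="http://example.org/darwincore">
  <tax:nomenclature>
    <tax:name>
      <tax:xid source="HNS" identifier="193329"/>
      <tax:xmldata>
        <dc:Genus>Mystrium</dc:Genus>
        <dc:Species>leonie</dc:Species>
      </tax:xmldata>
      Mystrium leonie
    </tax:name>
    <tax:status>n. sp.</tax:status>
  </tax:nomenclature>
  <tax:div type="description">
    <tax:p>HOLOTYPE WORKER: TL 3.95, HL 1.02, HW 0.95, CI 93</tax:p>
  </tax:div>
</tax:treatment>
"""


def summarize_treatment(xml_text):
    """Extract a few structured fields from one marked-up treatment."""
    root = ET.fromstring(xml_text)
    name = root.find("tax:nomenclature/tax:name", NS)
    xid = name.find("tax:xid", NS)
    description = root.find("tax:div[@type='description']/tax:p", NS)
    return {
        "genus": name.findtext("tax:xmldata/dc:Genus", namespaces=NS),
        "species": name.findtext("tax:xmldata/dc:Species", namespaces=NS),
        "status": root.findtext("tax:nomenclature/tax:status", namespaces=NS),
        "id_source": xid.get("source"),
        "identifier": xid.get("identifier"),
        "description": description.text.strip(),
    }


if __name__ == "__main__":
    for field, value in summarize_treatment(SAMPLE).items():
        print(f"{field}: {value}")
```

The same lookups, run over many treatments, are what would allow questions such as "which new species were described in this genus?" to be asked of the literature rather than of the author.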

Treatments come with a lot of overhead

Treatments are highly structured

Content is defined

XML defines them

The question is, how to get them

Mark-up of legacy publications

$$$$$$$$$$$$$$$$$

Prospective semantic mark-up and linking to external sources is the future

Treatment repository + external resources

BHL-Modern

The future is writable.

Happy Birthday! January 15, 2001

What is a scientific publication?

Wikipedia entry as a publication?

Quality control

What is a scientific publication?

Centrifugal versus centripetal forces, or: are we attractive enough?

Continuity

$$$$$$$

http://plazi.org

Thank you very much!

Donat Agosti

agosti@plazi.org
