Asking the scientific literature to tell us about metabolism


Extraction of metabolism data from the literature

Peter Murray-Rust, Dept of Chemistry and TheContentMine

Lhasa, Leeds, UK, 2017-01-12

contentmine.org is supported by a grant to PMR as a

Thousands of scientists have to type the literature.

Machines should be doing it!

Special Thanks

Molecular Informatics, Cambridge

Peter Corbett, Andy Howlett, Daniel Lowe, Lezan Hawizy, Mark Williamson

OSCAR (chemical entities), OPSIN (name 2 structure), ChemicalTagger (recipes), “OSIRIS” (graphical chemistry)

What is “Content”?

http://www.plosone.org/article/fetchObject.action?uri=info:doi/10.1371/journal.pone.0111303&representation=PDF CC-BY

SECTIONS, MAPS, TABLES, CHEMISTRY, TEXT, MATH

contentmine.org tackles these

Example papers – what do you want? What can we find?

Entity extraction

Reaction Schemes

Tables

Graphs

Entities

Plot

Maths?

Models?

Semantics in Wikidata

What’s the title?

Some demos

“… simulated by 21cmFAST is in principle independent”

“it is a feature of the 21cmFAST code, and is explained in §3.1.”

SciCodes[1]: Searching for software in arXiv[2]

[1] Proposal to LJ Arnold Foundation (Alice Allen ASCL and PMR)

Using the semi-numerical simulation, 21cmFAST,

[2] arxiv.org: the preprint server for physics, maths, astronomy, etc.

The language identifies the software!

arXiv has >500 mentions of “21cmFAST”
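The idea that the surrounding language identifies the software can be sketched with a few cue phrases. This is only an illustration, not the SciCodes method: the pattern list and the function name find_software_mentions are assumptions made here, modelled on the three quoted sentences above.

```python
import re
from collections import Counter

# Hypothetical cue phrases (an assumption for this sketch): wording that
# tends to introduce a software name in running text.
NAME = r"(?P<name>[A-Za-z0-9][A-Za-z0-9_-]*)"
CUE_PATTERNS = [
    r"simulated by " + NAME,
    r"simulation, " + NAME + r",",
    r"the " + NAME + r" code\b",
]


def find_software_mentions(text):
    """Count candidate software names introduced by one of the cue phrases."""
    counts = Counter()
    for pattern in CUE_PATTERNS:
        for match in re.finditer(pattern, text):
            name = match.group("name")
            # Crude filter: keep only tokens with a digit or capital letter,
            # so that ordinary phrases such as "the source code" are dropped.
            if any(ch.isdigit() or ch.isupper() for ch in name):
                counts[name] += 1
    return counts


sample = (
    "... simulated by 21cmFAST is in principle independent. "
    "It is a feature of the 21cmFAST code, and is explained in section 3.1. "
    "Using the semi-numerical simulation, 21cmFAST, we compute the signal."
)
mentions = find_software_mentions(sample)
```

A real run over arXiv would need many more cue phrases and a check of the candidates against a software registry; the filter above is deliberately crude.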

http://chemicaltagger.ch.cam.ac.uk/

Typical chemical synthesis

Automatic semantic markup of chemistry

Could be used for analytical, crystallization, etc.

AMI https://bitbucket.org/petermr/xhtml2stm/wiki/Home

Example reaction scheme, taken from MDPI Metabolites 2012, 2, 100-133; page 8, CC-BY:

AMI reads the complete diagram, recognizes the paths and generates the molecules. She then creates a stop-frame animation showing how the 12 reactions lead into each other.

Plot annotations: UNITS, TICKS, QUANTITY, SCALE, TITLES, DATA!! (2000+ points)

VECTOR PDF

Dumb PDF

CSV
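Going from a vector PDF plot to CSV comes down to calibrating each axis from its ticks and then rescaling every plotted point. The following is a minimal sketch of that step only, assuming linear axes and that two ticks per axis have already been read as (page coordinate, data value) pairs. The names make_axis_mapper and points_to_csv are invented for this illustration and are not AMI's API.

```python
import csv
import io


def make_axis_mapper(tick_a, tick_b):
    """Return a function mapping a page coordinate to a data value.

    Each tick is a (page_coordinate, data_value) pair; the axis is
    assumed to be linear between and beyond the two ticks.
    """
    (p0, v0), (p1, v1) = tick_a, tick_b
    if p0 == p1:
        raise ValueError("ticks must lie at different page coordinates")
    return lambda p: v0 + (p - p0) * (v1 - v0) / (p1 - p0)


def points_to_csv(points, x_map, y_map, header=("x", "y")):
    """Convert page-space points to data space and return them as CSV text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(header)
    for page_x, page_y in points:
        writer.writerow([x_map(page_x), y_map(page_y)])
    return buffer.getvalue()


# Made-up calibration: page y grows downwards, so the y ticks are inverted.
x_map = make_axis_mapper((100.0, 0.0), (300.0, 10.0))
y_map = make_axis_mapper((500.0, 0.0), (100.0, 2.0))
csv_text = points_to_csv([(200.0, 300.0), (300.0, 100.0)], x_map, y_map)
```

Logarithmic axes would need the same mapping applied to the logarithm of the tick values.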

Semantic Spectrum

2nd Derivative

Smoothing Gaussian Filter
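The two operations named here, Gaussian smoothing and the second derivative, are commonly combined to locate peaks: a peak shows up as a negative minimum of the smoothed second derivative. The sketch below is a generic pure-Python version under that assumption, not AMI's actual code; the function names and the default sigma are chosen only for illustration.

```python
import math


def gaussian_kernel(sigma):
    """Normalised Gaussian weights covering about three sigma on each side."""
    radius = max(1, int(3 * sigma))
    weights = [math.exp(-0.5 * (i / sigma) ** 2) for i in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def smooth(values, sigma):
    """Convolve values with a Gaussian, clamping indices at both edges."""
    kernel = gaussian_kernel(sigma)
    radius = len(kernel) // 2
    last = len(values) - 1
    smoothed = []
    for i in range(len(values)):
        acc = 0.0
        for k, weight in enumerate(kernel):
            j = min(max(i + k - radius, 0), last)
            acc += weight * values[j]
        smoothed.append(acc)
    return smoothed


def second_derivative(values, step=1.0):
    """Central-difference second derivative; the two end points are left at 0."""
    d2 = [0.0] * len(values)
    for i in range(1, len(values) - 1):
        d2[i] = (values[i - 1] - 2.0 * values[i] + values[i + 1]) / (step * step)
    return d2


def find_peaks(values, sigma=2.0):
    """Indices where the smoothed second derivative has a negative local minimum."""
    d2 = second_derivative(smooth(values, sigma))
    return [
        i
        for i in range(1, len(d2) - 1)
        if d2[i] < 0 and d2[i] < d2[i - 1] and d2[i] <= d2[i + 1]
    ]


# Synthetic spectrum: a single Gaussian band centred on index 50.
spectrum = [math.exp(-0.5 * ((i - 50) / 5.0) ** 2) for i in range(101)]
peaks = find_peaks(spectrum)
```

On noisy data the smoothing width trades peak resolution against noise rejection, so sigma would need tuning per spectrum.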

Automatic extraction

C) What’s the problem with this spectrum?

Org. Lett., 2011, 13 (15), pp 4084–4087

Original thanks to ChemBark

After AMI2 processing…

… AMI2 has detected a square
