Asking the scientific literature to tell us about metabolism
Peter Murray-Rust, Reader Emeritus, Dept of Chemistry, Univ Cambridge
and Founder TheContentMine
Lhasa, Leeds, UK, 2017-01-12
contentmine.org is supported by a grant to PMR as a
Thousands of scientists have to re-type the literature.
Machines should be doing it!
Treat them as friends.
100 clinical trials a day, 5000 articles a day
Software and Special Thanks
Molecular Informatics, Cambridge
Peter Corbett, OSCAR (chemical entities)
Andy Howlett, “OSIRIS” (graphical chemistry)
Daniel Lowe, OPSIN (name 2 structure)
Lezan Hawizy, ChemicalTagger (recipes)
Mark Williamson, integration and deployment
ContentMine
Rik Smith-Unna, getpapers, quickscrape (discovery)
Tom Arrow, WikiFactMine (Wikimedia semantics)
PM-R, norma, AMI (platform), CML (semantics)
ALL SOFTWARE IS OPEN (Apache2)
AMI! Tell me what YOU know about monoxidine?
Wikipedia
Wikidata for monoxidine
Wikidata for moxonidine
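The two Wikidata lookups differ only by a transposition (“monoxidine” vs “moxonidine”). A minimal sketch of how a misspelled name could be mapped to a known one, assuming a small hypothetical vocabulary (`DRUG_NAMES`) and Python's standard `difflib` string similarity; this is an illustration, not how AMI or Wikidata actually resolve names.

```python
from difflib import get_close_matches

# Hypothetical mini-vocabulary; a real system would draw on Wikidata labels
# or a curated drug dictionary.
DRUG_NAMES = ["moxonidine", "clonidine", "rilmenidine", "metformin"]


def suggest_spelling(query, vocabulary=DRUG_NAMES, cutoff=0.75):
    """Return the closest known name to a possibly misspelled query.

    Uses difflib's SequenceMatcher ratio; cutoff=0.75 is an illustrative
    threshold, not a tuned value.
    """
    q = query.strip().lower()
    if q in vocabulary:
        return [q]
    return get_close_matches(q, vocabulary, n=1, cutoff=cutoff)


print(suggest_spelling("monoxidine"))  # ['moxonidine']
```

Edit-distance style matching catches transpositions like this one, but it cannot tell whether the author meant a different real compound, which is why a structure check (as OPSIN does) is still needed.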
Entity extraction
OPSIN says this name is wrong! OSIRIS will interpret this structure, including the annotation
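Entity extraction at its simplest is dictionary lookup over the text. The sketch below assumes a hypothetical term list and uses longest-match-first regular expressions; it is not OSCAR's method, which is considerably more sophisticated.

```python
import re


def build_matcher(terms):
    """Compile a case-insensitive matcher; longer terms are tried first so
    'moxonidine hydrochloride' wins over 'moxonidine'."""
    ordered = sorted(terms, key=len, reverse=True)
    alternatives = "|".join(re.escape(t) for t in ordered)
    return re.compile(r"\b(?:" + alternatives + r")\b", re.IGNORECASE)


def extract_entities(text, matcher):
    """Return (surface form, start offset, end offset) for each hit."""
    return [(m.group(0), m.start(), m.end()) for m in matcher.finditer(text)]


# Hypothetical dictionary and sentence, for illustration only.
matcher = build_matcher(["moxonidine", "moxonidine hydrochloride", "clonidine"])
print(extract_entities("Moxonidine hydrochloride was compared with clonidine in rats.", matcher))
```

Keeping character offsets lets each hit be annotated in place and then passed on to a name-to-structure tool for checking.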
Reaction Schemes
Tables
Graphs
Entities
Plot
Maths?
Models?
What’s the title?
Some demos
What is “Content”?
http://www.plosone.org/article/fetchObject.action?uri=info:doi/10.1371/journal.pone.0111303&representation=PDF CC-BY
SECTIONS
MAPS
TABLES
CHEMISTRY
TEXT
MATH
contentmine.org tackles these
http://chemicaltagger.ch.cam.ac.uk/
Typical chemical synthesis
Automatic semantic markup of chemistry
Could also be used for analytical chemistry, crystallization, etc.
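To give a flavour of “semantic markup of chemistry”, here is a toy tagger for a synthesis sentence. It assumes a handful of hypothetical regular-expression patterns for quantities, temperatures, times and actions; ChemicalTagger itself uses a full grammar, so treat this only as a sketch of the idea.

```python
import re

# Hypothetical, hand-written patterns (not ChemicalTagger's grammar).
PATTERNS = {
    "QUANTITY": r"\d+(?:\.\d+)?\s?(?:mmol|mol|mg|g|mL|ml|L)\b",
    "TEMPERATURE": r"-?\d+(?:\.\d+)?\s?°C",
    "TIME": r"\d+(?:\.\d+)?\s?(?:h|min|hours?|minutes?)\b",
    "ACTION": r"\b(?:added|stirred|heated|cooled|filtered|washed|dried|refluxed)\b",
}


def tag(sentence):
    """Return (label, text) pairs in the order they occur in the sentence."""
    hits = []
    for label, pattern in PATTERNS.items():
        for m in re.finditer(pattern, sentence):
            hits.append((m.start(), label, m.group(0)))
    return [(label, text) for _, label, text in sorted(hits)]


recipe = "The mixture was stirred at 80 °C for 2 h, then 1.5 g of NaOH (37.5 mmol) was added."
for label, text in tag(recipe):
    print(label, text)
```

Swapping in different patterns (e.g. solvents, instruments) is how the same approach could be pointed at analytical or crystallization text.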
AMI https://bitbucket.org/petermr/xhtml2stm/wiki/Home
Example reaction scheme, taken from MDPI Metabolites 2012, 2, 100-133; page 8, CC-BY:
AMI reads the complete diagram, recognizes the paths and generates the molecules. Then she creates a stop-frame animation showing how the 12 reactions lead into each other.
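Once the arrows of a scheme are recognized, “how the reactions lead into each other” is a graph-ordering problem. A minimal sketch, assuming the scheme has already been reduced to hypothetical (reactant, product) pairs: Python's `graphlib.TopologicalSorter` groups metabolites into successive frames, each frame holding those whose precursors have all appeared.

```python
from graphlib import TopologicalSorter  # Python 3.9+


def animation_frames(reactions):
    """reactions: list of (reactant, product) pairs read off a scheme.

    Returns a list of frames; every product appears in a later frame than
    each of its reactants. Names within a frame are sorted for stability.
    """
    graph = {}  # node -> set of predecessors, as TopologicalSorter expects
    for reactant, product in reactions:
        graph.setdefault(product, set()).add(reactant)
        graph.setdefault(reactant, set())
    sorter = TopologicalSorter(graph)
    sorter.prepare()  # raises graphlib.CycleError for cyclic schemes
    frames = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        frames.append(ready)
        sorter.done(*ready)
    return frames


# Hypothetical labels, not the molecules from the MDPI scheme.
print(animation_frames([("A", "B"), ("B", "C"), ("B", "D"), ("C", "E"), ("D", "E")]))
```

Metabolic schemes can contain cycles, which this ordering rejects; those would need a different layout strategy.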
UNITS, TICKS, QUANTITY SCALE, TITLES
DATA!! 2000+ points
VECTOR PDF, Dumb PDF, CSV, Semantic spectrum
Smoothing (Gaussian filter), 2nd derivative
Automatic extraction
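Smoothing with a Gaussian filter and then taking the 2nd derivative is a common way to locate peaks in an extracted spectrum. A minimal pure-Python sketch, assuming the 2000+ points have already been read (e.g. from the CSV) into an evenly spaced list of intensities; `sigma` and `rel_height` are illustrative values, not AMI's settings.

```python
import math


def gaussian_kernel(sigma):
    """Normalised Gaussian weights over +/- 3 sigma."""
    radius = max(1, int(3 * sigma))
    weights = [math.exp(-(k * k) / (2 * sigma * sigma)) for k in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def smooth(y, sigma=2.0):
    """Convolve y with a Gaussian kernel, clamping indices at the edges."""
    kernel = gaussian_kernel(sigma)
    radius = len(kernel) // 2
    n = len(y)
    return [
        sum(w * y[min(max(i + k - radius, 0), n - 1)] for k, w in enumerate(kernel))
        for i in range(n)
    ]


def second_derivative(y):
    """Central difference y[i-1] - 2*y[i] + y[i+1], assuming unit spacing."""
    d2 = [0.0] * len(y)
    for i in range(1, len(y) - 1):
        d2[i] = y[i - 1] - 2 * y[i] + y[i + 1]
    return d2


def find_peaks(y, sigma=2.0, rel_height=0.1):
    """Peaks are strict local minima of the negative 2nd derivative of the
    smoothed signal, ignoring points below rel_height * max(smoothed)."""
    s = smooth(y, sigma)
    d2 = second_derivative(s)
    floor = rel_height * max(s)
    return [
        i for i in range(1, len(y) - 1)
        if s[i] > floor and d2[i] < 0 and d2[i] < d2[i - 1] and d2[i] < d2[i + 1]
    ]


# Synthetic two-peak spectrum (hypothetical data).
y = [math.exp(-((i - 30) ** 2) / 18) + 0.5 * math.exp(-((i - 70) ** 2) / 18) for i in range(100)]
print(find_peaks(y))  # [30, 70]
```

Smoothing first matters because finite-difference derivatives amplify noise; on raw digitized data the 2nd derivative alone would report many spurious minima.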
Search of publicly accessible papers for “Zika”
https://rawgit.com/ContentMine/amidemos/master/zika/full.dataTables.html
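A search like this starts from a literature API. The sketch below assumes Europe PMC's REST search endpoint (the default source that getpapers queries) and assumes that the `OPEN_ACCESS:y` query field restricts hits to open-access articles; it only builds and fetches one page of results and is not getpapers' own code.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Europe PMC REST search endpoint.
EUPMC_SEARCH = "https://www.ebi.ac.uk/europepmc/webservices/rest/search"


def build_search_url(term, page_size=25, open_access_only=True):
    """Build a Europe PMC search URL for a phrase such as 'Zika'."""
    query = f'"{term}"'
    if open_access_only:
        # Assumed Europe PMC field restricting hits to open-access articles.
        query += " AND OPEN_ACCESS:y"
    params = {"query": query, "format": "json", "pageSize": page_size}
    return f"{EUPMC_SEARCH}?{urlencode(params)}"


def fetch_hits(url):
    """Fetch one page; returns (hitCount, list of result records)."""
    with urlopen(url) as response:
        data = json.load(response)
    return data.get("hitCount", 0), data.get("resultList", {}).get("result", [])


print(build_search_url("Zika"))
```

Calling `fetch_hits(build_search_url("Zika"))` needs network access; larger result sets would need paging, which this sketch does not handle.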
C) What’s the problem with this spectrum?
Org. Lett., 2011, 13 (15), pp 4084–4087
Original thanks to ChemBark
After AMI2 processing…
… AMI2 has detected a square
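In a vector PDF a pasted-over patch survives as drawing primitives. A minimal sketch of how a square could be spotted, assuming the graphics have already been reduced to line segments `((x1, y1), (x2, y2))`; the segment format and exact coordinate matching are simplifying assumptions, and this is not AMI2's algorithm.

```python
from itertools import combinations


def _normalise(segment):
    """((x1, y1), (x2, y2)) -> (xmin, ymin, xmax, ymax)."""
    (x1, y1), (x2, y2) = segment
    return (min(x1, x2), min(y1, y2), max(x1, x2), max(y1, y2))


def find_rectangles(segments):
    """Return (xmin, ymin, xmax, ymax) for each axis-aligned rectangle whose
    four sides all appear as separate segments. Exact matching is used for
    clarity; real PDF coordinates would need a tolerance."""
    segs = [_normalise(s) for s in segments]
    horizontal = sorted({s for s in segs if s[1] == s[3] and s[0] != s[2]})
    vertical = {s for s in segs if s[0] == s[2] and s[1] != s[3]}
    rects = []
    for a, b in combinations(horizontal, 2):
        # Two horizontal sides must share the same x extent at different y.
        if (a[0], a[2]) != (b[0], b[2]) or a[1] == b[1]:
            continue
        x0, x1 = a[0], a[2]
        y0, y1 = sorted((a[1], b[1]))
        if (x0, y0, x0, y1) in vertical and (x1, y0, x1, y1) in vertical:
            rects.append((x0, y0, x1, y1))
    return rects


def is_square(rect):
    x0, y0, x1, y1 = rect
    return (x1 - x0) == (y1 - y0)


# Hypothetical data: a 10x10 square plus a baseline and two spectrum strokes.
square = [((0, 0), (10, 0)), ((10, 0), (10, 10)), ((10, 10), (0, 10)), ((0, 10), (0, 0))]
trace = [((-20, -5), (100, -5)), ((1, 2), (3, 8)), ((3, 8), (5, 1))]
print(find_rectangles(square + trace))  # [(0, 0, 10, 10)]
```

A spectrum trace is made of short, mostly diagonal strokes, so a closed axis-aligned rectangle inside the plot area stands out as something that does not belong to the data.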
“… simulated by 21cmFAST is in principle independent”
“it is a feature of the 21cmFAST code, and is explained in §3.1.”
SciCodes[1]: Searching for software in arXiv[2]
[1] Proposal to LJ Arnold Foundation (Alice Allen ASCL and PMR)
“Using the semi-numerical simulation, 21cmFAST, …”
[2] arxiv.org: the physics/maths/astronomy/... preprint server
The language identifies the software!
arXiv has >500 mentions of “21cmFAST”
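The quoted sentences show that phrasing such as “simulated by X”, “the X code” and “using the … simulation, X,” gives software names away. A minimal sketch, assuming three hypothetical cue patterns built from those quotes; it is not the SciCodes method, and a real system would need many more cues plus a stoplist (e.g. “the same code”).

```python
import re
from collections import Counter

# Hypothetical cue phrases modelled on the quotes above.
CUES = [
    re.compile(r"simulated (?:by|with) ([A-Za-z0-9_+-]+)"),
    re.compile(r"\bthe ([A-Za-z0-9_+-]+) code\b"),
    re.compile(r"[Uu]sing the [\w\s-]*?simulation, ([A-Za-z0-9_+-]+),"),
]


def software_mentions(sentences):
    """Count candidate software names captured by any cue pattern."""
    counts = Counter()
    for sentence in sentences:
        for cue in CUES:
            counts.update(m.group(1) for m in cue.finditer(sentence))
    return counts


quotes = [
    "… simulated by 21cmFAST is in principle independent",
    "it is a feature of the 21cmFAST code, and is explained in §3.1.",
    "Using the semi-numerical simulation, 21cmFAST,",
]
print(software_mentions(quotes))  # Counter({'21cmFAST': 3})
```

Run over a whole preprint corpus, the same counting gives mention totals per package, which is the kind of figure quoted for 21cmFAST.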