27
AMI2: High-throughput extraction of semantic chemistry from the scientific literature the ChemistryVisitor Andy Howlett, Mark Williamson, Peter Murray-Rust, Robert Glen Unilever Centre, Cambridge

Dial-A-Molecule Talk

  • Upload
    mw529

  • View
    102

  • Download
    0

Embed Size (px)

DESCRIPTION

Andy Howlett's Dial-A-Molecule talk 9th July 2014

Citation preview

Page 1: Dial-A-Molecule Talk

AMI2: High-throughput

extraction of semantic chemistry

from the scientific literature – the

ChemistryVisitorAndy Howlett, Mark Williamson, Peter Murray-Rust, Robert Glen

Unilever Centre, Cambridge

Page 2: Dial-A-Molecule Talk

AMI2 is a framework that can extract

semantic data from the scientific

literature.

Page 3: Dial-A-Molecule Talk

Importance of semantic data

<html>

<p>

<a href="http://news.bbc.co.uk">BBC

News</a>

<p>

<a href="http://www.cam.ac.uk">Cambridge

University</a>

<p>

</html>

Page 4: Dial-A-Molecule Talk

AMI2 architecture

Page 5: Dial-A-Molecule Talk

SpeciesVisitor

Page 6: Dial-A-Molecule Talk

ChemistryVisitor

Page 7: Dial-A-Molecule Talk

PhylogeneticTreeVisitor

Page 8: Dial-A-Molecule Talk
Page 9: Dial-A-Molecule Talk

Turning dumb PDF into smart SVG

<svg version="1.0" xmlns="http://www.w3.org/2000/svg">

<line x1="10" y1="10"

x2="200" y2="200"

stroke="black" />

<circle cx="75" cy="300" r="50"

fill="blue"

stroke="red" />

<path d="M 100 410 L 410 710 L 100 700 z"

fill="red"

stroke="blue"

stroke-width="3" />

</svg>

X1,Y1

X2,Y2

Page 10: Dial-A-Molecule Talk

1) SpeciesVisitor

Page 11: Dial-A-Molecule Talk

2) ChemistryVisitor

Page 12: Dial-A-Molecule Talk

3) PhylogeneticTreeVisitor

Page 13: Dial-A-Molecule Talk

ChemistryVisitor:

converting graphics primitives

<svg>

<line x1="269.97" y1="528.12" x2="269.97" y2="536.58" />

<line x1="272.16" y1="528.12" x2="272.16" y2="536.58" />

</svg>

SVGBuilder ChemistryVisitor

SVG from PDF

“Magic Happens Here”

removal of cosmetic artifact

(Graphics program eye candy)

Page 14: Dial-A-Molecule Talk

<?xml version="1.0"?><molecule xmlns="http://www.xml-cml.org/schema"><atomArray><atom id="a1" elementType="C"/><atom id="a2" elementType="C"/><atom id="a3" elementType="H"/><atom id="a4" elementType="H"/><atom id="a5" elementType="H"/><atom id="a6" elementType="H"/></atomArray><bondArray><bond atomRefs2="a1 a2" order="2"/><bond atomRefs2="a1 a3" order="1"/><bond atomRefs2="a1 a4" order="1"/><bond atomRefs2="a2 a5" order="1"/><bond atomRefs2="a2 a6" order="1"/></bondArray></molecule>

SVG to CML

Implicit hydrogens

Page 15: Dial-A-Molecule Talk

http://bitbucket.org/AndyHowlett/ami2-poc

Page 16: Dial-A-Molecule Talk

A1) Reactions using ChemistryVisitor

Fig 4: Metabolites 2012, 2(1), 100-133

<reactant> <conditions> <product>

Page 17: Dial-A-Molecule Talk

A2) More complex reactions

Page 18: Dial-A-Molecule Talk

B) Error detection

Page 19: Dial-A-Molecule Talk

C) Fraud detection

Org. Lett., 2011, 13 (15), pp 4084–4087

Original thanks to ChemBark

Page 20: Dial-A-Molecule Talk

After AMI2 processing…

… AMI2 has detected a square

Page 21: Dial-A-Molecule Talk
Page 22: Dial-A-Molecule Talk
Page 23: Dial-A-Molecule Talk

Conclusions

We can extract reactions semantically.

We can validate structures with associated names.

This can be done as part of a generic framework for extracting data.

Such capabilities are useful for Dial-a-Molecule – applied only to

metabolism so far but could also be used for synthesis

Page 24: Dial-A-Molecule Talk

https://bitbucket.org/petermr/ami/wiki/Home

Page 25: Dial-A-Molecule Talk

Acknowledgments

Unilever metabolism team, Cambridge

Unilever for funding

Page 26: Dial-A-Molecule Talk
Page 27: Dial-A-Molecule Talk