Lexichem, a New Era
Dr Ed Cannon, Scientific Software Developer

Jcup 3 (2012) Presentation: Lexichem, a new Era. By Ed Cannon


Overview
Lexichem TK v2.1.1, released February 2012
Real world applications of Lexichem
Performance metric
New features
Lexichem Workbench

5/23/12 2012 OpenEye Scientific Software

Start by talking about some real world applications of Lexichem: what you can do with it and who is using it.
Then move on to Lexichem TK v2.1.1, just released.
Metric to assess how well Lexichem is performing.
New features since v2.0.2.
Finish off with the GUI.

Lexichem

Now let's talk about the main reason we're here: Lexichem.

Lexichem

Supported Nomenclature

IUPAC 79 / 93 / 2005
Chemical Abstract / CAS
Traditional
MDL / Beilstein
AutoNom
OpenEye

Supported Languages (17)

English (American / British)
German
Japanese
Spanish
Swedish

Lexichem is OpenEye's chemical nomenclature software. You can:
-> convert names to molecules
-> convert molecules to names
-> translate names from one language to another
Lexichem comes in two flavours.

Command Line Applications

Glycinate

Standalone applications run from the command line.

Lexichem TK

For those who want to program against Lexichem: Lexichem TK is written in C++ and wrapped with SWIG (Simplified Wrapper and Interface Generator) for Python, Java and C#.

Applications

So what can Lexichem do, other than help you buy heroin?

Pipelines

Large scale conversion of structures to names and names to structures

Easy integration in workflows

Keith Taylor showed yesterday a Pipeline Pilot workflow with a node for converting structures to names.
Workflow integration with Pipeline Pilot.
Each node uses one of Lexichem's functions -> mol2nam, nam2mol, translate.
Matt Stahl is working on nodes using OpenEye software.
The Lexichem node is highlighted in the square -> it converts structures to names.

Webservices

Mol2Nam

PUG SOAP

Molecules

Hits in PubChem

Lexichem Webservice available

Integration with 3rd party webservices
PubChem uses Lexichem

Craig Bruce, hired recently, has been developing webservices for OE, one being Lexichem.
Use Lexichem for pre- / post-processing around a webservice.
PUG (Power User Gateway).
Search across PubChem for structures with the names found.

Lexiparser
Automated extraction of structures and names from documents
Supported formats:
.txt
.html
.doc
.docx
.rtf
.pdf FUTURE

Chemical name extraction from patents/documents and structure generation (Lexiparser uses Lexichem).
-> Once Lexiparser has extracted the chemical names, it uses Lexichem to convert them to molecules.

Extracting Structures from a Patent

Patent URL

Names extracted

Generate Structures

Desktop Applications

Electronic Laboratory Notebook

NEW! Lexichem Workbench

Lexichem is the engine beneath the hood: when a user draws a structure, a call is made to Lexichem, which generates a name that can be rendered.
Alternatively, you can import chemical names, convert them to molecules and visualize the image.

Performance Metric

Purpose of metric:
-> ensure we are not regressing but improving Lexichem
-> identify areas / features in need of improvement
-> gold standard against which other companies can compare chemical nomenclature software

Why a New Performance Metric?
Ensures consistent improvement of Lexichem
Identify areas in need of development
Gold standard for all chemical nomenclature software

-> Concept: take a start point and an end point after some processing, then compare the start point to the end point.
+ve: quick to calculate, gives a single figure for how accurate Lexichem is on a dataset.
-> Paper accepted.

Advantages of SMILES: human readable, less verbose than InChI, tautomer support.

Round Tripping*

*E. O. Cannon. JCIM 2012, DOI: 10.1021/ci3000419

Compare the initial and final structure after name generation

Easy to calculate
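The round-trip idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the published implementation: `mol2nam`, `nam2mol` and `canonicalize` are hypothetical callables supplied by the caller (for example thin wrappers around a nomenclature toolkit), structures are assumed to be compared as canonical SMILES, and the toy lookup tables exist only to make the example runnable.

```python
from typing import Callable, Iterable, Optional


def round_trip_percentage(
    smiles_list: Iterable[str],
    mol2nam: Callable[[str], Optional[str]],
    nam2mol: Callable[[str], Optional[str]],
    canonicalize: Callable[[str], str],
) -> float:
    """Percentage of structures recovered after structure -> name -> structure.

    All three callables are hypothetical stand-ins: mol2nam and nam2mol
    return None when a conversion fails, which counts as a failed round trip.
    """
    total = 0
    matched = 0
    for smiles in smiles_list:
        total += 1
        name = mol2nam(smiles)
        if name is None:
            continue  # no name generated
        regenerated = nam2mol(name)
        if regenerated is None:
            continue  # name could not be parsed back
        # Compare the initial and final structure in canonical form.
        if canonicalize(regenerated) == canonicalize(smiles):
            matched += 1
    return 100.0 * matched / total if total else 0.0


# Toy stand-ins (assumptions for illustration only, not real converters).
_TOY_NAMES = {"CCO": "ethanol", "OCC": "ethanol", "C": "methane"}
_TOY_STRUCTURES = {"ethanol": "CCO", "methane": "C"}
_TOY_CANONICAL = {"OCC": "CCO"}


def toy_mol2nam(smiles: str) -> Optional[str]:
    return _TOY_NAMES.get(smiles)


def toy_nam2mol(name: str) -> Optional[str]:
    return _TOY_STRUCTURES.get(name)


def toy_canonicalize(smiles: str) -> str:
    return _TOY_CANONICAL.get(smiles, smiles)


score = round_trip_percentage(
    ["CCO", "OCC", "C", "CCCC"], toy_mol2nam, toy_nam2mol, toy_canonicalize
)
```

In this toy run three of the four inputs survive the round trip, so `score` is 75.0. Passing the converters in as parameters keeps the metric independent of any particular toolkit, which is what would let other nomenclature software be scored the same way.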


Results

The concept of a benchmark is good, but do we have good results using it?

Performance

Speed
%RTCS

Whilst these results are good, is Lexichem feasible on a large scale?

We have seen that Lexichem performs well and is feasible on large datasets; now let's look at what features have been added.
----- Meeting Notes (3/28/12 11:38) -----
Mol2Nam -> canonicalize atoms & bonds, identify atom types, identify ring systems and size, bridges, locants and positions, identify stereo / walk the graph

New Features

So what have we added since v2.0.2?

Our main drive has been looking at nam2mol features (as they're not quite as well supported as mol2nam), in particular the ability to generate molecules for large ring systems (one of Lexichem's weaker points in previous releases).

Nam2mol New Features
von Baeyer

Von Baeyer -> polyalicyclic ring systems
-> previously only bicyclic supported.

-> working hard on augmenting natural products
Beta-carotene in carrots -> Provitamin A carotenoid

Nam2mol New Features
von Baeyer
Poly-linear von Baeyer Spiro Compounds
Poly-branched von Baeyer Spiro Compounds
Steroids
Alkaloids
Terpenes
L/D-amino acids (L-Arginine, D-Arginine)
R-groups

Mol2Nam New Templates
Ring templates
5,6,6a,7-tetrahydro-4H-dibenzo[de,g]quinoline
Yohimban
Benzo[cd]indole
Octahydro-1H-4,7-epoxyisoindole

Ring templates for conversion of molecules to names.
Mainly bridged and fused ring templates have been added.

Lexichem Workbench

The primary goal of the GUI was to:
-> lower the bar to using Lexichem's functionality (for people not keen on programming against an API or using the command line)

Lexichem Workbench
Graphical user interface for Lexichem
Features:
Home page
LexiWeb
LexiParser
Nam2Mol
Mol2Nam
Translate
Results

Main Window
Converts input SMILES string or chemical name
Visual display of the structure
Chemical information:
Molecular weight
SMILES
IUPAC name

-> Primarily modeled on the command line tools, but provides numerous additional features

2 menus:
Open a molecular file
Import name, SMILES, set default options, clear the view.

Results

Results history
Original input on display

Results
Text options:
Copy selected cells
Save table
Display selection

Results

Display options:
Save
Copy
Print

Substructure Search

Options:
Functional group from list
Custom SMILES/SMARTS pattern
Custom name

Filter the results

Conclusions
Large number of new features available in Lexichem TK v2.1.1

New! Performance metric

New! Lexichem Workbench desktop application available


That's the story about Lexichem. If you know you want it, please feel free to contact us.

Future work: continue to work on fused polycyclic ring systems and natural products.

OpenEye Scientific Software
For more information, please contact us:

[email protected]@[email protected]
+1-505-473-7385 (USA)
+81-3-6206-1425 (Japan)