
Cefic LRI AMBIT2 with IUCLID6 support and extended search capabilities

Workshop on CEFIC LRI Project EEM9.4

LRI AMBIT with IUCLID6 support and extended search capabilities

AMBIT Cheminformatics system


Nina Jeliazkova, Nikolay Kochev

Ideaconsult Ltd.

Sofia, Bulgaria


Content

– Introduction
– Substance data integration in AMBIT (different input formats)
– Search functionalities
  - Structures, substances and endpoint data
  - Structure standardization, transformation, tautomers
– Tools integration via a common API
  - Toxtree, VEGA, other models, descriptors
– User management system to grant access rights via roles
– The read across workflow
  - A use case integrating the above functionalities
– IT requirements

AMBIT2 Hands-on Training Workshop, 29.09.2017, Brussels, Belgium


AMBIT Chemoinformatics System

Developed within a CEFIC Long-Range Initiative (LRI)

EEM9.3 (2005, 2008), EEM9.3-IC (2013-2015), EEM9.4 (2016-ongoing)

Continuously developed and extended through various projects

An Open Source Application with the following functions

Search for structure(s) [exact, similar, substructure] and meta data

Assigning structures to constituents, impurities …

Assessment tools (read across/category formation)

Prediction tools, e.g. Toxtree (including Cramer rules, protein binding, etc.), descriptor calculation, pKa, etc.

Data analysis tools, e.g. regression, classification, clustering, etc.

Data management: flexible import/export of data

Data exchange tools: manual or automated via REST Web services API

Read across workflow


AMBIT: Chemical structures database & machine learning with web services API

http://ambit.sourceforge.net


AMBIT: Data integration via a common data model

[Diagram labels: Excel spreadsheets; IUCLID5; IUCLID6; other formats; ambitlri.ideaconsult.net; reports (Excel, Word); other formats (RDF, ISA-TAB, etc.); JSON; free text search; REST API]


IUCLID6 support in AMBIT2

– IUCLID6: completely new XML schema for all objects
  - 372 schema files, 111 endpoint study record files
  - Different approach to linking between objects (compared to IUCLID5)
– Implementation
  - Java classes generated from the XML schema (via JAXB)
  - AMBIT code converts the generated classes to the internal data model so that they can be stored in the database
  - Existing code is used for writing into the database
  - The existing UI is used to show the data
– Transparent from the user's point of view: select an .i6z or .i5z file
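The "select .i6z or .i5z" behaviour can be pictured as a dispatch on the file extension. The following is only a minimal sketch under that assumption; the function name and the parser labels are hypothetical and do not correspond to actual AMBIT classes.

```python
from pathlib import Path

# Hypothetical labels for the two parser families; the real AMBIT
# importer classes are not named on the slide.
PARSERS = {".i5z": "IUCLID5", ".i6z": "IUCLID6"}


def select_iuclid_parser(filename: str) -> str:
    """Return the parser family a file would be routed to, by extension."""
    suffix = Path(filename).suffix.lower()
    if suffix not in PARSERS:
        raise ValueError(f"Unsupported file type: {filename!r}")
    return PARSERS[suffix]


if __name__ == "__main__":
    for name in ("dossier.i6z", "legacy.i5z"):
        print(name, "->", select_iuclid_parser(name))
```

The user only picks a file; which schema-specific code runs is decided behind the scenes, which is what makes the import transparent.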


Spreadsheets for substance data import

Configurable parser for spreadsheet data templates


EFSA OpenFoodTox data

https://www.zenodo.org/record/344883

• Excel files
• Not only chemical structures and data
• Relationships between structures
• Imported into the AMBIT database with the help of a JSON configuration


Search substances by endpoint data

• Check one or more checkboxes and click the Update results button.

The endpoints are combined by AND. The results shown contain only two substances that have data for all three selected endpoints (Appearance, Melting point and Dissociation constant), although there are 16 substances with data for appearance, 36 substances with melting point values and 15 substances with dissociation constant values.
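Combining endpoints by AND amounts to intersecting the sets of substances that have data for each selected endpoint. A minimal sketch follows; the substance identifiers are made up for illustration and are not taken from the AMBIT database.

```python
def substances_with_all(endpoint_index, selected):
    """Return substances that have data for every selected endpoint (AND)."""
    sets = [endpoint_index.get(name, set()) for name in selected]
    if not sets:
        return set()
    return set.intersection(*sets)


# Illustrative data only: endpoint name -> substances with data for it.
index = {
    "Appearance": {"S1", "S2", "S3"},
    "Melting point": {"S2", "S3", "S4", "S5"},
    "Dissociation constant": {"S2", "S3", "S6"},
}

hits = substances_with_all(
    index, ["Appearance", "Melting point", "Dissociation constant"]
)
print(sorted(hits))
```

Each endpoint on its own matches more substances than the combination does, which is why the result list can shrink to a handful of entries as more checkboxes are selected.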

Endpoints are grouped in four categories: P-Chem, Env Fate, Eco Tox, Tox.


Free text search (experimental)


AMBIT Search for Structures & Endpoint data

1) Find structure(s)

2) Find substance(s)

3) Display data


Combining information from other data sources and prediction results

The vertical sidebar allows collating data and model information with the search results.


Structure Diagram Editor

Click to show/hide the editor.

The structure editor is JavaScript based.

• To use the drawn structure for search, click the Use button.
• To show the structure specified as SMILES in the search bar, click the Draw button.


Substructure search

The substructure search query can be defined by drawing the structure, selecting a SMARTS from the predefined list of SMARTS, or entering a SMARTS, SMILES or chemical name in the text box.


Substance tab

Use the folder icon to open the details.

The Substances tab shows the substances related to the chemical structure, and the role of the chemical structure (last column, e.g. Constituent, Impurity, Additive).


Enabling Structure Search: Structure Standardization

[Figure: standardisation workflow applied to an example structure; structure drawings omitted]

Steps shown:
– Conversion to implicit hydrogens
– Keep the largest fragment
– Kekulisation
– (i) Molecule neutralization, (ii) custom reaction transformations
– Isotopes cleanup
– Structure conversion to a canonic tautomer

Output: SMILES, InChI

N=CCCCC(=O)O

InChI=1/C5H9NO2/c6-4-2-1-3-5(7)8/h4,6H,1-3H2,(H,7,8)
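As an illustration of the "keep the largest fragment" step only, the sketch below splits a multi-component SMILES on "." and keeps the component with the most heavy atoms. It is a crude, string-level approximation written for this note, not AMBIT's implementation (which operates on parsed molecules); the input SMILES is a hypothetical salt chosen for the example, and explicit hydrogen atoms written in brackets would be miscounted.

```python
import re

# Matches one atom token: a bracket atom, a two-letter organic-subset
# halogen, or a one-letter organic-subset atom (aliphatic or aromatic).
ATOM = re.compile(r"\[[^\]]+\]|Br|Cl|[BCNOPSFI]|[bcnops]")


def atom_count(fragment: str) -> int:
    """Approximate the number of atoms written in a SMILES fragment."""
    return len(ATOM.findall(fragment))


def keep_largest_fragment(smiles: str) -> str:
    """Return the dot-separated component with the most atoms."""
    return max(smiles.split("."), key=atom_count)


if __name__ == "__main__":
    # Hypothetical input: a carboxylate with a sodium counter-ion.
    print(keep_largest_fragment("NCCCCC(=O)[O-].[Na+]"))
```

In the workflow on the slide, neutralization, isotope cleanup and canonical tautomer selection would follow this step.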


Canonic tautomer generation (a component of the standardisation procedure)

[Figure: input structure, generation of all tautomers (rule instance search), ranking, canonical tautomer; structure drawings omitted]

Ranked tautomers (rank, SMILES):

0.0 C(C=CN)C=C(O)O
-0.1 C(C=CN)CC(=O)O
-0.05 C(CC=N)C=C(O)O
-0.15 C(CC=N)CC(=O)O
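The ranking step can be sketched as picking the tautomer with the lowest rank value. This reads the values on the slide as 0.0, -0.1, -0.05 and -0.15 and assumes that the lowest value marks the preferred form; that assumption is consistent with the output SMILES N=CCCCC(=O)O shown for the standardisation example, which describes the same structure as C(CC=N)CC(=O)O. The function name is illustrative.

```python
# (rank, SMILES) pairs as read from the slide.
ranked_tautomers = [
    (0.0, "C(C=CN)C=C(O)O"),
    (-0.1, "C(C=CN)CC(=O)O"),
    (-0.05, "C(CC=N)C=C(O)O"),
    (-0.15, "C(CC=N)CC(=O)O"),
]


def pick_canonical(tautomers):
    """Return the SMILES with the lowest rank value (assumed to be preferred)."""
    rank, smiles = min(tautomers, key=lambda pair: pair[0])
    return smiles


print(pick_canonical(ranked_tautomers))
```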


• Automatic generation of all tautomeric forms of a given organic compound.
• Customizable rules for tautomeric transformations.
• The predefined knowledge base covers 1–3, 1–5 and 1–7 proton tautomeric shifts. Typical supported tautomerism rules are keto-enol, imine-amine, nitroso-oxime, azo-hydrazone, thioketo-thioenol, thionitroso-thiooxime, amidine-imidine, diazoamino-diazoamino, thioamide-iminothiol and nitrosamine-diazohydroxide.
• Simple energy-based system for tautomer ranking, implemented by a set of empirically derived rules.


AMBIT TAUTOMER

[Figure labels: input structure (OC(O)=C(N)C); rule selection and flag settings; generation of tautomeric forms; recursion; post-generation filtering (structure is removed); ranking; canonical; result]

Generation of tautomeric forms:
- Combinatorial method
- Improved combinatorial method
- Incremental method (IA-DFS)


Structure transformation: AMBIT SMARTS/SMIRKS

(1) Efficient representation of SMARTS queries (full Daylight syntax)
(2) Fast structure isomorphism (mapping)
(3) Support of recursive SMARTS and stereo
(4) Syntax extensions
(5) Parsing of SMIRKS
(6) Transformation of the target chemical objects

Transformation modes:
(1) single,
(2) non-overlapping,
(3) non-identical,
(4) non-homomorphic or
(5) externally specified list of sites.

Recursive expressions explicitly define the environment around the S atom.
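To make the "non-overlapping" mode concrete: when a pattern matches a molecule at several sites, a transformation can be restricted to sites that share no atoms. The sketch below does this greedily in input order over tuples of atom indices. It is only an illustration of the idea; how AMBIT actually selects the sites is not described on the slide, and the function name is hypothetical.

```python
def non_overlapping(matches):
    """Greedily keep matches (tuples of atom indices) that share no atoms."""
    used = set()
    selected = []
    for match in matches:
        atoms = set(match)
        if used.isdisjoint(atoms):
            selected.append(match)
            used |= atoms
    return selected


# Three hypothetical match sites; the second shares atom 2 with the first.
sites = [(0, 1, 2), (2, 3, 4), (5, 6, 7)]
print(non_overlapping(sites))
```

A "single" mode would correspond to taking only the first site, and an externally specified list of sites would bypass the selection altogether.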


Structure standardization in large datasets

– Flexible standardisation workflow
  - Rules synchronised with pharma companies
– Datasets standardised with AMBIT (H2020 FET ExCAPE project)
  - PubChem, ChEMBL, eMolecules, SureChem, ZINC, tox datasets (> 80 mln compounds)
  - http://ambit.sf.net/ambitcli_standardisation.html
  - ExCAPE DB (1 mln compounds, 70 mln SAR data points), AMBIT-hosted, open access
  - Possible future integration with LRI AMBIT


Communications with other systems

[Diagram: LRI AMBIT (supporting read across and category formation) with data transfer links to other tools, other databases, and the company IUCLID DB and ECHA IUCLID DB as major data sources. Labels: transfer of 14570 dossiers; transfer via web service or *.i6z files.]


AMBIT Web API / UI for data analysis

[Screenshots: Dataset; Models; Visualisation]


AMBIT Web API / UI for data analysis

Descriptor calculation, feature selection;

Classification and regression algorithms;

Rule based algorithms;

Applicability domain algorithms;

Visualization, similarity and substructure queries;

Composite algorithms (workflows);

Structure optimization (MOPAC), metabolite generation, tautomer generation, etc.


Integration with external tools

https://www.vegahub.eu

Command line Java application provided by IRFMN


Integration with external tools: VEGA

https://ambitlri.ideaconsult.net/tool2/ui/vega

– REST model wrapper
– Same API as other models (e.g. Toxtree)
– Same user interface
– Predictions automatically stored
– Straightforward integration with read across matrix


AMBIT user management

Authorization is role based.

• Default roles: user, data manager, admin, read-across
• Roles can be assigned on the users page by an admin user
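Role-based authorization of this kind can be sketched as a registry in which only a user holding the admin role may assign roles. The role names are those listed on the slide; the class, its methods and the bootstrap admin account are hypothetical and only illustrate the idea, not AMBIT's user management code.

```python
DEFAULT_ROLES = {"user", "data manager", "admin", "read-across"}


class UserRegistry:
    """Toy role registry: only admins may assign roles."""

    def __init__(self, initial_admin: str):
        # Assumption: one admin account exists from the start.
        self._roles = {initial_admin: {"admin"}}

    def has_role(self, user: str, role: str) -> bool:
        return role in self._roles.get(user, set())

    def assign(self, acting_user: str, target_user: str, role: str) -> None:
        if role not in DEFAULT_ROLES:
            raise ValueError(f"Unknown role: {role!r}")
        if not self.has_role(acting_user, "admin"):
            raise PermissionError(f"{acting_user!r} may not assign roles")
        self._roles.setdefault(target_user, set()).add(role)


if __name__ == "__main__":
    registry = UserRegistry("root")
    registry.assign("root", "alice", "read-across")
    print(registry.has_role("alice", "read-across"))
```

Access checks elsewhere, such as restricting who may open an assessment, would then reduce to calls like `has_role(user, "read-across")`.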


Restricted access to assessments


The read across workflow: integrated view of data and predictions


AMBIT publications and contributing projects

Peer reviewed publications (excerpt)

1. J. Sun, N. Jeliazkova, et al., ExCAPE-DB: an integrated large scale dataset facilitating Big Data analysis in chemogenomics, J. Cheminform., vol. 9, no. 1, p. 17, Mar. 2017.

2. N. Jeliazkova, et al., The eNanoMapper database for nanomaterial safety information, Beilstein J. Nanotechnol., vol. 6, pp. 1609–1634, Jul. 2015.

3. N. Kochev, V. Paskaleva, and N. Jeliazkova, AMBIT-Tautomer: An open source tool for tautomer generation, Mol. Inform., vol. 32, pp. 1–24, 2013.

4. N. Jeliazkova and V. Jeliazkov, AMBIT RESTful web services: an implementation of the OpenTox application programming interface, J. Cheminform., vol. 3, no. 1, p. 18, Jan. 2011.

5. N. Jeliazkova, J. Jaworska, and A. Worth, Open Source Tools for Read-Across and Category Formation, in In Silico Toxicology, M. Cronin and J. Madden, Eds. Cambridge: Royal Society of Chemistry, 2010, pp. 408–445.

CEFIC LRI EEM9.3: P&G (J. Jaworska), Nina Jeliazkova

CEFIC LRI EEM9.3-IC, EEM9.4 (ongoing): IdeaConsult Ltd., UM, Clariant

Projects contributed to the development

EC FP7 OpenTox (2008-2011)

EC FP7 ToxBank (2011-2015)

EC FP7 eNanoMapper (2014-2017)

EC H2020 ExCAPE (2015-2018)

(and more)

Open source libraries

The Chemistry Development Kit

(and many more)

AMBIT2 can be downloaded or consulted online:

Publicly available: https://ambitlri.ideaconsult.net

- Clients only need a web browser

More information and download links

- http://cefic-lri.org/news/cefic-launches-ambit-chemical-safety-prediction-software/

Installation options

LOCAL on a LAPTOP/DESKTOP

- Local database, local webserver

SERVER (on company INTRANET)

- Shared database and web server. Clients only need a web browser.

Requirements

- Java 7, MySQL 5.7, Web server (servlet container, e.g. Apache Tomcat 7.x)

TECHNICAL SUPPORT: contact Ideaconsult Ltd., Sofia, www.ideaconsult.net, email: [email protected]


Acknowledgements

CEFIC LRI EEM9.3-IC/EEM9.4

o Bruno Hubesch

Project idea for LRI EEM9.3-IC

o Volker Koch, Clariant

Project input:

Clariant CompTox Team

o Udo Jensch (Toxicologist)

o Volker Koch (Ecotoxicologist)

o Qiang Li (Toxicologist)

o Joachim Schneider-Reigl (Ecotoxicologist)

Project implementation

Ideaconsult Ltd. www.ideaconsult.net
