41
New Target Prediction and Visualization Tools Incorporating Open Source Molecular Fingerprints For TB Mobile Version 2.0 Alex M. Clark 1 Malabika Sarker 2 and Sean Ekins 3, 4 1 Molecular Materials Informatics, 1900 St. Jacques #302, Montreal Quebec, Canada H3J 2S1 2 SRI International, 333 Ravenswood Avenue, Menlo Park, CA 94025, USA. 3 Collaborative Drug Discovery, 1633 Bayshore Highway, Suite 342, Burlingame, CA 94010, USA. 4 Collaborations in Chemistry, 5616 Hilltop Needmore Road, Fuquay-Varina, NC 27526, USA. .

New Target Prediction and Visualization Tools Incorporating Open Source Molecular Fingerprints For TB Mobile Version 2.0

Embed Size (px)

Citation preview

New Target Prediction and Visualization Tools Incorporating Open Source Molecular Fingerprints

For TB Mobile Version 2.0

Alex M. Clark1 Malabika Sarker2 and Sean Ekins3, 4

1Molecular Materials Informatics, 1900 St. Jacques #302, Montreal Quebec, Canada H3J 2S12SRI International, 333 Ravenswood Avenue, Menlo Park, CA 94025, USA.

3Collaborative Drug Discovery, 1633 Bayshore Highway, Suite 342, Burlingame, CA 94010, USA.4Collaborations in Chemistry, 5616 Hilltop Needmore Road, Fuquay-Varina, NC 27526, USA.

.

Tuberculosis kills 1.6-1.7m/yr (~1 every 8 seconds)

1/3rd of worlds population infected!!!!

streptomycin (1943)

para-aminosalicyclic acid (1949)

isoniazid (1952)

pyrazinamide (1954)

cycloserine (1955)

ethambutol (1962)

rifampicin (1967)

• Multi drug resistance in 4.3% of

cases

• Extensively drug resistant

increasing incidence

• One new drug (bedaquiline) in 40

yrs

TB key points

Tested >350,000 molecules Tested ~2M 2M >300,000

>1500 active and non toxic Published 177 100s 800

Big Data: Screening for New Tuberculosis Treatments

TBDA screened over 1 million, 1 million more to go

TB Alliance + Japanese pharma screensHow many will become a new drug?How do we learn from this big data?

What are the targets for these molecules?

Over 6 years analyzed in vitro data and built models

Top scoring molecules

assayed for

Mtb growth inhibition

Mtb screening

molecule

database/s

High-throughput

phenotypic

Mtb screening

Descriptors + Bioactivity (+Cytotoxicity)

Bayesian Machine Learning classification Mtb Model

Molecule Database

(e.g. GSK malaria

actives)

virtually scored

using Bayesian Models

New bioactivity data

may enhance models

Identify in vitro hits and test models3 x published prospective tests ~750

molecules were tested in vitro

198 actives were identified

>20 % hit rate

Multiple retrospective tests 3-10 fold

enrichment

NH

S

N

Ekins et al., Pharm Res 31: 414-435, 2014

Ekins, et al., Tuberculosis 94; 162-169, 2014

Ekins, et al., PLOSONE 8; e63240, 2013

Ekins, et al., Chem Biol 20: 370-378, 2013

Ekins, et al., JCIM, 53: 3054−3063, 2013

Ekins and Freundlich, Pharm Res, 28, 1859-1869, 2011

Ekins et al., Mol BioSyst, 6: 840-851, 2010

Ekins, et al., Mol. Biosyst. 6, 2316-2324, 2010,

Pathway analysis

Binding site similarity to Mtb proteins

Docking

Bayesian Models - ligand similarity

Predicting the target/s for small molecules

Multi-step process

1. Identification of essential in vivo enzymes of Mtb involved intensive

literature mining and manual curation, to extract all the genes essential

for Mtb growth in vivo across species.

2. Homolog information was collated from other studies.

3. Collection of metabolic pathway information involved using TBDB.

4. Identifying molecules and drugs with known or predicted targets

involved searching the CDD databases for manually curated data. The

structures and data were exported for combination with the other data.

5. All data were combined with URL links to literature and TBDB and

deposited in the CDD database.

Initially over 700 molecules in dataset

Dataset Curation: TB molecules and target information

database connects molecule, gene, pathway and literature

Sarker et al., Pharm Res 2012, 29, 2115-2127.

TB molecules and target information database connects

molecule, gene, pathway and literature

• mobile is revolutionary: a clean break

entirely new user interface

no backward compatibility

highly constrained resources

applicable to entirely new situations

mainframes

minicomputers

personal computers

portable laptops

mobile tablets

smartphones

?

The future of computing for science is mobile

Create an app that would be useful to TB

researchers

An active compound from phenotypic screens–

but what is target

Predict TB targets by similarity, mine published

data

Push the boundaries of what can be done in a

mobile app

Cheminformatics and Bioinformatics in an App

http://goo.gl/vPOKS

http://goo.gl/iDJFR

TB Mobile 2– Is on iTunes and TB Mobile 1 is on Google

play and are FREE

TB Mobile 2. layout on iPhone

About CDD

Molecule search

Filters

Action Menu Molecule prediction Clustering About TB Mobile

Control block

Compound list

Text search

TB Mobile 2. iPhone vs TB Mobile 1. Android

Molecule Detail and Links

iPhone Android

Bookmark

copy

open-in

cluster

close

TB Mobile 2. iPhone vs TB Mobile 1.

Android Similarity Searching in the app

iPhone Android

TB Mobile 2. – Filtering and Sharing

Functions

Data can also be filtered by target name, pathway name,

essentiality and human ortholog

TB Mobile 2. iPhone Filtering and Sharing Functions

Each molecule can be copied to the clipboard then opened with other apps (e.g. MMDS, MolPrime, MolSync, ChemSpider, and from these exported via Twitter or email) or shared via Dropbox.

PCA of 745 compounds with Mtb

targets (blue) and 60 new

compounds (yellow)

Chemical property space of screening hits and

molecules evaluated in TB Mobile 2.

PCA of 745 compounds with Mtb

targets (blue) and 20 new test

compounds (yellow)

Gene Count Gene Count

Rv0283 1 Rv0678 1

Rv1211 1 Rv1685c 1

Rv1885c 4 Rv3160c 1

Rv3161c 1 TB27.3 (Rv0577) 2

ald (Rv2780) 2 alr (Rv3423c) 8

aroD (Rv2537c) 14 aspS (Rv2572c) 1

atpE (Rv1305) 2 blaC (Rv2068c) 1

clpB (Rv0384c) 1 clpC (Rv3596c) 1

cyp121 (Rv2276) 2 cyp130 (Rv1256c) 2

cyp51 (Rv0764c) 2 cysH (Rv2392) 10

cysS (Rv2130c) 1 dacB2 (Rv2911) 1

dapA (Rv2753c) 12 deaD (Rv1253) 1

def (Rv0429c) 14 dfrA (Rv2763c) 3

dinG (Rv1329c)" (count=1) 1 dlaT (Rv2215) 2

dnaA (Rv0001)" (count=1) 1 dnaB (Rv0058) 1

dnaE2 (Rv3370c)" (count=1) 1 dprE1 (Rv3790) 8

dprE2" (count=1) 1 drpE2 (Rv3791) 2

dxr (Rv2870c) 1 dxs1 (Rv2682C) 29embA (Rv3794) 2 embB (Rv3795) 1

embC (Rv3793) 1 engA (Rv1713) 1

era (Rv2364c) 1 ethA (Rv3854c) 1

fabG (Rv0242c) 2 fabH (Rv0533) 48fadD32 (Rv3801c) 5 fbpC (Rv0129C) 21folP1 (Rv3608C) 1 folP2 (Rv1207) 1

frdA (Rv1552) 1 ftsZ (Rv2150c) 3

fusA1 (Rv0684) 3 fusA2 (Rv0120c) 3

glcB (Rv837c) 1 glf (Rv3809c) 40

glmU (Rv1018c) 1 guab2 (Rv3411) 1

gyrA (Rv0006) 24 gyrB (Rv0005) 9

ilvG (Rv1820) 1 infB (Rv2839c) 1

inhA (Rv1484) 157 kasA (Rv2245) 9

kasB (Rv2246) 5 ldtMt1 (Rv0116c) 4

ldtMt2 (Rv2518c) 1 lpd (Rv0462) 5

lppS (Rv2515c) 1 mbtA (Rv2384) 95

mca (Rv1082) 27 mfd (Rv1020) 1

mmpL3 (Rv0206c) 15 moeW (Rv2338c) 1

mshB (Rv1170) 4 murB (Rv0482) 1

murD (Rv2155c) 2 nadB (Rv1595) 1

ndhA (Rv0392c) 1 nrdR (Rv2718c) 2

pH Homeostasis 5 panC (Rv3602c) 20pks13 (Rv3800c) 3 proteasome 2

ptpA (Rv2234) 38 ptpB (Rv0153c) 3

purU (Rv2964) 2 qcrB (Rv2196) 5

quinol oxidase 1 recG (Rv2973c) 1

rplC (Rv0701) 2 rplJ (Rv0651) 3

rpoB (Rv0667) 4 sahH (Rv3248c) 2

thiL (Rv2977c) 106 tlyA (Rv1694) 2

tuf (Rv0685) 3 uvrA (Rv1638) 1

Target distribution

in TB Mobile 2.

MoDELS RESIDE IN PAPERS

NOT ACCESSIBLE…THIS IS

UNDESIRABLE

How do we share them?

How do we use Them?

Open Extended Connectivity Fingerprints

ECFP_6 FCFP_6• Collected,

deduplicated, hashed

• Sparse integers

• Invented for Pipeline Pilot: public method, proprietary details

• Often used with Bayesian models: many published papers

• Built a new implementation: open source, Java, CDK– stable: fingerprints don't change with each new toolkit release

– well defined: easy to document precise steps

– easy to port: already migrated to iOS (Objective-C) for TB Mobile app

• Provides core basis feature for CDD open source model service

•Clark et al., J Cheminformatics 6: 38 (2014)

Testing the fingerprints – comparing to published data

Dataset Leave one out ROC Published

Reference Leave one out ROC in this study

Combined model (5304

molecules) ECFP_6

fingerprints

N/A N/A 0.77

Combined model (5304

molecules) FCFP_6

fingerprints

0.71 J Chem Inf

Model

53:3054-

3063.

0.77

MLSMR dual event model

(2273 molecules) and

ECFP_6 fingerprints

N/A N/A 0.84

MLSMR dual event model

(2273 molecules) and

FCFP_6 fingerprints

0.86 PLOSONE

8:e63240

0.83

Published models also include 8 additional descriptors as well as fingerprints

•Clark et al., J Cheminformatics 6: 38 (2014)

Predictions for the InhA target: (a) the ROC curve with ECFP_6 and FCFP_6

fingerprints; (b) modified Bayesian estimators for active and inactive

compounds; (c) structures of selected binders.

For each listed target with at least two binders, it is first assumed that all of the

molecules in the collection that do not indicate this as one of their targets are

inactive.

In the app we used ECFP_6 fingerprints

Building Bayesian models for each target

Predict targetsCluster molecules

Open in MMDS

Bayesian predictions, data export and clustering

Clark et al., J Cheminformatics, 6: 38 (2014)

Draw structures either in app, paste or open from

other apps e.g. MMDS

TB Mobile ranks content

TB Mobile can use built in target Bayesian models to

predict target

Take a screenshot of results

Output bayesian model predictions to MMDS

Compare to published data

Annotate results, tabulate

Process used to evaluate TB Mobile

We have curated an additional set of 20 molecules that have activity

against Mtb and were identified by HTS or other methods

Several targets were not in the database

Molecules active against Mtb evaluated in TB Mobile app

•Clark et al., J Cheminformatics 6: 38 (2014)

Continue to update with more data

Outreach to increase awareness of app and data

Add machine learning algorithms to predict Mtb activity

(in vitro and in vivo whole cell activity)

Could we appify similar target data for other neglected

diseases/ targets e.g. malaria

What next ?

Uses Bayesian algorithm and FCFP_6 fingerprints

Model with >300,000 compounds

Copyright © 2013 All Rights Reserved Collaborative Drug Discovery

MM4TB: 25 organizations

New

Old

Neuroscience

Kinetoplastid Drug Development

Consortium

Exposure of CDD content from collaboration with SRI

More visibility for brand in new places

Experiment in small database with focus on content

delivery

A functional app to reach scientists that may not have

cheminformatics or bioinformatics training

Used in several collaborations e.g. at Rutgers etc.

Test bed for developing technologies for main product

Lead to CDD Models

on Github look for the tools section, for class

org.openscience.cdk.fingerprint.model.Bayesian.

Opened door to models in other apps

Benefits of creating TB Mobile

Papers published on TB Mobile or using it

Ekins et al., Tuberculosis 94: 162-169 (2014)

Ekins et al., J Chem Inf Model 53: 3054 (2013)

Clark et al., J Cheminformatics 6:38 (2014)

Ekins et al., J Cheminformatics 5:13 (2013)

Krishna Dole and all at CDD, SRI, IDRI and many others …Funding: 2R42AI088893-02 NIAID, NIH;

9R44TR000942-02 NCATS, NIH; CDD TB has been developed thanks to funding from the Bill and Melinda

Gates Foundation (Grant#49852)

Freundlich Lab

Drop by and visit CDD in Booth 1155.

Alex Malabika

App papers

• Ekins S, Clark AM, Swamidass SJ, Litterman N and Williams AJ, Bigger data, collaborative tools and the future of predictive drug discovery, J Comp-Aided Mol Design, 54:2157-65, 2014.

• Clark AM, Sarker M and Ekins S, New target predictions and visualization tools incorporating open source molecular fingerprints for TB Mobile 2.0, J Cheminform 6: 38, 2014.

• Ekins S, Casey A.C, Roberts D, Parish T. and Bunin BA, Bayesian Models for Screening and TB Mobile for Target Inference with Mycobacterium tuberculosis, Tuberculosis, 94:162-9, 2014.

• Clark AM, Williams AJ and Ekins S, Cheminformatics workflows using mobile apps, Chem-Bio Informatics Journal, 13: 1-18 2013.

• Ekins S, Clark AM and Williams AJ, Incorporating Green Chemistry Concepts into Mobile Applications and their potential uses, ACS Sustain Chem Eng, 1. 8-13, 2013.

• Ekins S, Freundlich JS and Reynolds RC, Fusing dual-event datasets for Mycobacterium Tuberculosis machine learning models and their evaluation, J Chem Inf Model, 53(11):3054-63, 2013.

• Ekins S, Clark AM and Williams AJ, Open Drug Discovery Teams: A Chemistry Mobile App for Collaboration, Mol Informatics, 31: 585-597, 2012.

• Clark, AM, Ekins S and Williams AJ, Redefining cheminformatics with intuitive collaborative mobile apps, Mol Informatics, 31: 569-584, 2012.

• Williams, AJ, Ekins S, Clark AM, Jack JJ and Apodaca RL, Mobile apps for chemistry in the world of drug discovery, Drug Disc Today, 16:928-939, 2011.

Extra slides

14 First line drugs active against Mtb evaluated in TB Mobile app and the top 3 molecules shown

Confirms all in TB Mobile and retrieved

Ekins et al., Tuberculosis 94: 162-169 (2014)

Predicted targets using TB Mobile

No verification yet

iPhone Android

TB Mobile 1. layout on iPhone and Android

~745 compounds with targets

Predicted targets of GSK TB hits months earlier using TB Mobile

GSK report hits Dec 2012

24th Jan 2013 http://goo.gl/9LKrPZ

GSK predict targets Oct 2013

PCA of 745 compounds

with Mtb targets (blue)

and 1200 Mtb active and

non cytotoxic hits

compounds (yellow)

Chemical property space of TB Mobile compounds

Ekins et al., Tuberculosis 94: 162-169 (2014)

PCA of 745 compounds

with Mtb targets (blue)

and 177 GSK Mtb leads

(yellow)

Ekins et al., J Chem Inf Model 53: 3054 (2013)