Upload
sean-ekins
View
284
Download
1
Tags:
Embed Size (px)
Citation preview
New Target Prediction and Visualization Tools Incorporating Open Source Molecular Fingerprints
For TB Mobile Version 2.0
Alex M. Clark1 Malabika Sarker2 and Sean Ekins3, 4
1Molecular Materials Informatics, 1900 St. Jacques #302, Montreal Quebec, Canada H3J 2S12SRI International, 333 Ravenswood Avenue, Menlo Park, CA 94025, USA.
3Collaborative Drug Discovery, 1633 Bayshore Highway, Suite 342, Burlingame, CA 94010, USA.4Collaborations in Chemistry, 5616 Hilltop Needmore Road, Fuquay-Varina, NC 27526, USA.
.
Tuberculosis kills 1.6-1.7m/yr (~1 every 8 seconds)
1/3rd of worlds population infected!!!!
streptomycin (1943)
para-aminosalicyclic acid (1949)
isoniazid (1952)
pyrazinamide (1954)
cycloserine (1955)
ethambutol (1962)
rifampicin (1967)
• Multi drug resistance in 4.3% of
cases
• Extensively drug resistant
increasing incidence
• One new drug (bedaquiline) in 40
yrs
TB key points
Tested >350,000 molecules Tested ~2M 2M >300,000
>1500 active and non toxic Published 177 100s 800
Big Data: Screening for New Tuberculosis Treatments
TBDA screened over 1 million, 1 million more to go
TB Alliance + Japanese pharma screensHow many will become a new drug?How do we learn from this big data?
What are the targets for these molecules?
Over 6 years analyzed in vitro data and built models
Top scoring molecules
assayed for
Mtb growth inhibition
Mtb screening
molecule
database/s
High-throughput
phenotypic
Mtb screening
Descriptors + Bioactivity (+Cytotoxicity)
Bayesian Machine Learning classification Mtb Model
Molecule Database
(e.g. GSK malaria
actives)
virtually scored
using Bayesian Models
New bioactivity data
may enhance models
Identify in vitro hits and test models3 x published prospective tests ~750
molecules were tested in vitro
198 actives were identified
>20 % hit rate
Multiple retrospective tests 3-10 fold
enrichment
NH
S
N
Ekins et al., Pharm Res 31: 414-435, 2014
Ekins, et al., Tuberculosis 94; 162-169, 2014
Ekins, et al., PLOSONE 8; e63240, 2013
Ekins, et al., Chem Biol 20: 370-378, 2013
Ekins, et al., JCIM, 53: 3054−3063, 2013
Ekins and Freundlich, Pharm Res, 28, 1859-1869, 2011
Ekins et al., Mol BioSyst, 6: 840-851, 2010
Ekins, et al., Mol. Biosyst. 6, 2316-2324, 2010,
Pathway analysis
Binding site similarity to Mtb proteins
Docking
Bayesian Models - ligand similarity
Predicting the target/s for small molecules
Multi-step process
1. Identification of essential in vivo enzymes of Mtb involved intensive
literature mining and manual curation, to extract all the genes essential
for Mtb growth in vivo across species.
2. Homolog information was collated from other studies.
3. Collection of metabolic pathway information involved using TBDB.
4. Identifying molecules and drugs with known or predicted targets
involved searching the CDD databases for manually curated data. The
structures and data were exported for combination with the other data.
5. All data were combined with URL links to literature and TBDB and
deposited in the CDD database.
Initially over 700 molecules in dataset
Dataset Curation: TB molecules and target information
database connects molecule, gene, pathway and literature
Sarker et al., Pharm Res 2012, 29, 2115-2127.
• mobile is revolutionary: a clean break
entirely new user interface
no backward compatibility
highly constrained resources
applicable to entirely new situations
mainframes
minicomputers
personal computers
portable laptops
mobile tablets
smartphones
?
The future of computing for science is mobile
Create an app that would be useful to TB
researchers
An active compound from phenotypic screens–
but what is target
Predict TB targets by similarity, mine published
data
Push the boundaries of what can be done in a
mobile app
Cheminformatics and Bioinformatics in an App
http://goo.gl/vPOKS
http://goo.gl/iDJFR
TB Mobile 2– Is on iTunes and TB Mobile 1 is on Google
play and are FREE
TB Mobile 2. layout on iPhone
About CDD
Molecule search
Filters
Action Menu Molecule prediction Clustering About TB Mobile
Control block
Compound list
Text search
TB Mobile 2. iPhone vs TB Mobile 1. Android
Molecule Detail and Links
iPhone Android
Bookmark
copy
open-in
cluster
close
TB Mobile 2. – Filtering and Sharing
Functions
Data can also be filtered by target name, pathway name,
essentiality and human ortholog
TB Mobile 2. iPhone Filtering and Sharing Functions
Each molecule can be copied to the clipboard then opened with other apps (e.g. MMDS, MolPrime, MolSync, ChemSpider, and from these exported via Twitter or email) or shared via Dropbox.
PCA of 745 compounds with Mtb
targets (blue) and 60 new
compounds (yellow)
Chemical property space of screening hits and
molecules evaluated in TB Mobile 2.
PCA of 745 compounds with Mtb
targets (blue) and 20 new test
compounds (yellow)
Gene Count Gene Count
Rv0283 1 Rv0678 1
Rv1211 1 Rv1685c 1
Rv1885c 4 Rv3160c 1
Rv3161c 1 TB27.3 (Rv0577) 2
ald (Rv2780) 2 alr (Rv3423c) 8
aroD (Rv2537c) 14 aspS (Rv2572c) 1
atpE (Rv1305) 2 blaC (Rv2068c) 1
clpB (Rv0384c) 1 clpC (Rv3596c) 1
cyp121 (Rv2276) 2 cyp130 (Rv1256c) 2
cyp51 (Rv0764c) 2 cysH (Rv2392) 10
cysS (Rv2130c) 1 dacB2 (Rv2911) 1
dapA (Rv2753c) 12 deaD (Rv1253) 1
def (Rv0429c) 14 dfrA (Rv2763c) 3
dinG (Rv1329c)" (count=1) 1 dlaT (Rv2215) 2
dnaA (Rv0001)" (count=1) 1 dnaB (Rv0058) 1
dnaE2 (Rv3370c)" (count=1) 1 dprE1 (Rv3790) 8
dprE2" (count=1) 1 drpE2 (Rv3791) 2
dxr (Rv2870c) 1 dxs1 (Rv2682C) 29embA (Rv3794) 2 embB (Rv3795) 1
embC (Rv3793) 1 engA (Rv1713) 1
era (Rv2364c) 1 ethA (Rv3854c) 1
fabG (Rv0242c) 2 fabH (Rv0533) 48fadD32 (Rv3801c) 5 fbpC (Rv0129C) 21folP1 (Rv3608C) 1 folP2 (Rv1207) 1
frdA (Rv1552) 1 ftsZ (Rv2150c) 3
fusA1 (Rv0684) 3 fusA2 (Rv0120c) 3
glcB (Rv837c) 1 glf (Rv3809c) 40
glmU (Rv1018c) 1 guab2 (Rv3411) 1
gyrA (Rv0006) 24 gyrB (Rv0005) 9
ilvG (Rv1820) 1 infB (Rv2839c) 1
inhA (Rv1484) 157 kasA (Rv2245) 9
kasB (Rv2246) 5 ldtMt1 (Rv0116c) 4
ldtMt2 (Rv2518c) 1 lpd (Rv0462) 5
lppS (Rv2515c) 1 mbtA (Rv2384) 95
mca (Rv1082) 27 mfd (Rv1020) 1
mmpL3 (Rv0206c) 15 moeW (Rv2338c) 1
mshB (Rv1170) 4 murB (Rv0482) 1
murD (Rv2155c) 2 nadB (Rv1595) 1
ndhA (Rv0392c) 1 nrdR (Rv2718c) 2
pH Homeostasis 5 panC (Rv3602c) 20pks13 (Rv3800c) 3 proteasome 2
ptpA (Rv2234) 38 ptpB (Rv0153c) 3
purU (Rv2964) 2 qcrB (Rv2196) 5
quinol oxidase 1 recG (Rv2973c) 1
rplC (Rv0701) 2 rplJ (Rv0651) 3
rpoB (Rv0667) 4 sahH (Rv3248c) 2
thiL (Rv2977c) 106 tlyA (Rv1694) 2
tuf (Rv0685) 3 uvrA (Rv1638) 1
Target distribution
in TB Mobile 2.
MoDELS RESIDE IN PAPERS
NOT ACCESSIBLE…THIS IS
UNDESIRABLE
How do we share them?
How do we use Them?
Open Extended Connectivity Fingerprints
ECFP_6 FCFP_6• Collected,
deduplicated, hashed
• Sparse integers
• Invented for Pipeline Pilot: public method, proprietary details
• Often used with Bayesian models: many published papers
• Built a new implementation: open source, Java, CDK– stable: fingerprints don't change with each new toolkit release
– well defined: easy to document precise steps
– easy to port: already migrated to iOS (Objective-C) for TB Mobile app
• Provides core basis feature for CDD open source model service
•Clark et al., J Cheminformatics 6: 38 (2014)
Testing the fingerprints – comparing to published data
Dataset Leave one out ROC Published
Reference Leave one out ROC in this study
Combined model (5304
molecules) ECFP_6
fingerprints
N/A N/A 0.77
Combined model (5304
molecules) FCFP_6
fingerprints
0.71 J Chem Inf
Model
53:3054-
3063.
0.77
MLSMR dual event model
(2273 molecules) and
ECFP_6 fingerprints
N/A N/A 0.84
MLSMR dual event model
(2273 molecules) and
FCFP_6 fingerprints
0.86 PLOSONE
8:e63240
0.83
Published models also include 8 additional descriptors as well as fingerprints
•Clark et al., J Cheminformatics 6: 38 (2014)
Predictions for the InhA target: (a) the ROC curve with ECFP_6 and FCFP_6
fingerprints; (b) modified Bayesian estimators for active and inactive
compounds; (c) structures of selected binders.
For each listed target with at least two binders, it is first assumed that all of the
molecules in the collection that do not indicate this as one of their targets are
inactive.
In the app we used ECFP_6 fingerprints
Building Bayesian models for each target
Predict targetsCluster molecules
Open in MMDS
Bayesian predictions, data export and clustering
Clark et al., J Cheminformatics, 6: 38 (2014)
Draw structures either in app, paste or open from
other apps e.g. MMDS
TB Mobile ranks content
TB Mobile can use built in target Bayesian models to
predict target
Take a screenshot of results
Output bayesian model predictions to MMDS
Compare to published data
Annotate results, tabulate
Process used to evaluate TB Mobile
We have curated an additional set of 20 molecules that have activity
against Mtb and were identified by HTS or other methods
Several targets were not in the database
Molecules active against Mtb evaluated in TB Mobile app
•Clark et al., J Cheminformatics 6: 38 (2014)
Continue to update with more data
Outreach to increase awareness of app and data
Add machine learning algorithms to predict Mtb activity
(in vitro and in vivo whole cell activity)
Could we appify similar target data for other neglected
diseases/ targets e.g. malaria
What next ?
Copyright © 2013 All Rights Reserved Collaborative Drug Discovery
MM4TB: 25 organizations
New
Old
Neuroscience
Kinetoplastid Drug Development
Consortium
Exposure of CDD content from collaboration with SRI
More visibility for brand in new places
Experiment in small database with focus on content
delivery
A functional app to reach scientists that may not have
cheminformatics or bioinformatics training
Used in several collaborations e.g. at Rutgers etc.
Test bed for developing technologies for main product
Lead to CDD Models
on Github look for the tools section, for class
org.openscience.cdk.fingerprint.model.Bayesian.
Opened door to models in other apps
Benefits of creating TB Mobile
Papers published on TB Mobile or using it
Ekins et al., Tuberculosis 94: 162-169 (2014)
Ekins et al., J Chem Inf Model 53: 3054 (2013)
Clark et al., J Cheminformatics 6:38 (2014)
Ekins et al., J Cheminformatics 5:13 (2013)
Krishna Dole and all at CDD, SRI, IDRI and many others …Funding: 2R42AI088893-02 NIAID, NIH;
9R44TR000942-02 NCATS, NIH; CDD TB has been developed thanks to funding from the Bill and Melinda
Gates Foundation (Grant#49852)
Freundlich Lab
Drop by and visit CDD in Booth 1155.
Alex Malabika
App papers
• Ekins S, Clark AM, Swamidass SJ, Litterman N and Williams AJ, Bigger data, collaborative tools and the future of predictive drug discovery, J Comp-Aided Mol Design, 54:2157-65, 2014.
• Clark AM, Sarker M and Ekins S, New target predictions and visualization tools incorporating open source molecular fingerprints for TB Mobile 2.0, J Cheminform 6: 38, 2014.
• Ekins S, Casey A.C, Roberts D, Parish T. and Bunin BA, Bayesian Models for Screening and TB Mobile for Target Inference with Mycobacterium tuberculosis, Tuberculosis, 94:162-9, 2014.
• Clark AM, Williams AJ and Ekins S, Cheminformatics workflows using mobile apps, Chem-Bio Informatics Journal, 13: 1-18 2013.
• Ekins S, Clark AM and Williams AJ, Incorporating Green Chemistry Concepts into Mobile Applications and their potential uses, ACS Sustain Chem Eng, 1. 8-13, 2013.
• Ekins S, Freundlich JS and Reynolds RC, Fusing dual-event datasets for Mycobacterium Tuberculosis machine learning models and their evaluation, J Chem Inf Model, 53(11):3054-63, 2013.
• Ekins S, Clark AM and Williams AJ, Open Drug Discovery Teams: A Chemistry Mobile App for Collaboration, Mol Informatics, 31: 585-597, 2012.
• Clark, AM, Ekins S and Williams AJ, Redefining cheminformatics with intuitive collaborative mobile apps, Mol Informatics, 31: 569-584, 2012.
• Williams, AJ, Ekins S, Clark AM, Jack JJ and Apodaca RL, Mobile apps for chemistry in the world of drug discovery, Drug Disc Today, 16:928-939, 2011.
14 First line drugs active against Mtb evaluated in TB Mobile app and the top 3 molecules shown
Confirms all in TB Mobile and retrieved
Predicted targets of GSK TB hits months earlier using TB Mobile
GSK report hits Dec 2012
24th Jan 2013 http://goo.gl/9LKrPZ
GSK predict targets Oct 2013
PCA of 745 compounds
with Mtb targets (blue)
and 1200 Mtb active and
non cytotoxic hits
compounds (yellow)
Chemical property space of TB Mobile compounds
Ekins et al., Tuberculosis 94: 162-169 (2014)
PCA of 745 compounds
with Mtb targets (blue)
and 177 GSK Mtb leads
(yellow)
Ekins et al., J Chem Inf Model 53: 3054 (2013)