sean-ekins
ACS talk 2013
Enhancing High Throughput Screening For Mycobacterium tuberculosis Drug Discovery Using Bayesian Models
Sean Ekins1,2*, Robert C. Reynolds3,4, Baojie Wan5, Scott G. Franzblau5, Joel S. Freundlich6,7 and Barry A. Bunin1
1 Collaborative Drug Discovery, 1633 Bayshore Highway, Suite 342, Burlingame, CA 94010, USA.
2 Collaborations in Chemistry, 5616 Hilltop Needmore Road, Fuquay-Varina, NC 27526, USA.
3 Southern Research Institute, 2000 Ninth Avenue South, Birmingham, AL 35205, USA.
4 Current address: University of Alabama at Birmingham, College of Arts and Sciences, Department of Chemistry, 1530 3rd Avenue South, Birmingham, Alabama 35294-1240, USA.
5 Institute for Tuberculosis Research, University of Illinois at Chicago, Chicago, IL 60607, USA.
6 Department of Medicine, Center for Emerging and Reemerging Pathogens, UMDNJ – New Jersey Medical School, 185 South Orange Avenue, Newark, NJ 07103, USA.
7 Department of Pharmacology & Physiology, UMDNJ – New Jersey Medical School, 185 South Orange Avenue, Newark, NJ 07103, USA.
Tuberculosis kills 1.6-1.7 million people per year (roughly one death every 20 seconds), and one third of the world's population is infected
Multidrug resistance in 4.3% of cases
Increasing incidence of extensively drug-resistant TB
One new drug in over 40 years
Drug-drug interactions and co-morbidity with HIV
Collaboration between groups is rare
These groups may work on existing or new targets
Use of computational methods with TB is rare
Applying CDD to Build a Disease Community for TB
~20 public datasets for TB, including Novartis data on TB hits; >300,000 compounds
Patents and papers annotated by CDD
Open to browse by anyone
Register at http://www.collaborativedrug.com/
Ekins et al., Trends in Microbiology 19: 65-74, 2011
Fitting into the drug discovery process
HTS Hit rates
SRI papers: usually less than 1%
UIC hit rates:
Provider | Compound library | Number of compounds | Inhibitor concentration (ug/ml or uM) | Readout | Hit rate (%) at 90% inhibition
ChemBridge | Novacore | 50,000 | 30 uM | Luminescence (LuxAB) | 4.55
Asinex | Diverse | 59,760 | 50 uM | Luminescence (LuxAB) | 1.91
ASDI | - | 6,811 | 30 uM | Luminescence (LuxAB) | 2.73
Prestwick | - | 1,120 | 20 ug/ml | Luminescence (ATP) | 20.6
Prestwick | - | 1,120 | 20 ug/ml | Fluorescence (MABA) | 16.07
MRCT | - | 100,000 | 10 uM | Luminescence (LuxABCDE) | 0.67
Wasting data?
Information from these inefficient and expensive HTS campaigns does not appear to have been used to direct “informed” selection of new libraries in subsequent screens and compound optimization in TB drug discovery
How can we continuously learn from all the data?
Bayesian machine learning
Ekins, Williams and Xu, Drug Metab Dispos 38: 2302-2308, 2010
Bayesian classification is a simple probabilistic classification model based on Bayes' theorem:
$p(h \mid d) = \frac{p(d \mid h)\, p(h)}{p(d)}$
where:
h is the hypothesis or model
d is the observed data
p(h) is the prior belief (probability of hypothesis h before observing any data)
p(d) is the data evidence (marginal probability of the data)
p(d|h) is the likelihood (probability of data d if hypothesis h is true)
p(h|d) is the posterior probability (probability of hypothesis h being true given the observed data d)
A weight is calculated for each feature using a Laplacian-adjusted probability estimate to account for the different sampling frequencies of different features.
The weights are summed to provide a probability estimate
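The slides do not spell out the exact estimator used by the Discovery Studio Bayesian model. A minimal sketch, assuming the common form of the Laplacian correction for fingerprint-based naive Bayes: for feature F with A_F actives among T_F training compounds and a baseline active rate P, the weight is log((A_F + 1) / (T_F * P + 1)), and a molecule's score is the sum of the weights of its features. The function names are hypothetical, and the summed score is a relative log-likelihood-style score rather than a calibrated probability.

```python
import math
from collections import Counter
from typing import Dict, FrozenSet, Iterable, List, Tuple


def train_laplacian_bayes(samples: List[Tuple[FrozenSet[str], bool]]) -> Dict[str, float]:
    """Return a log-weight for every feature seen in the training samples.

    samples: (feature set, is_active) pairs, e.g. hashed fingerprint features.
    """
    n_total = len(samples)
    n_active = sum(1 for _, active in samples if active)
    p_base = n_active / n_total  # baseline probability of being active (P)

    n_with: Counter = Counter()         # T_F: samples containing feature F
    n_active_with: Counter = Counter()  # A_F: active samples containing feature F
    for features, active in samples:
        for f in features:
            n_with[f] += 1
            if active:
                n_active_with[f] += 1

    # Laplacian-corrected estimate relative to the baseline (assumed form):
    # (A_F + 1) / (T_F * P + 1). Features seen only a few times are pulled
    # toward a ratio of 1, i.e. a log-weight near 0.
    return {
        f: math.log((n_active_with[f] + 1) / (t * p_base + 1))
        for f, t in n_with.items()
    }


def bayes_score(features: Iterable[str], weights: Dict[str, float]) -> float:
    """Sum the log-weights of the features present; unseen features add 0."""
    return sum(weights.get(f, 0.0) for f in set(features))


training = [
    (frozenset({"a", "c"}), True),
    (frozenset({"a"}), True),
    (frozenset({"b", "c"}), False),
    (frozenset({"b"}), False),
]
weights = train_laplacian_bayes(training)
print(bayes_score({"a"}, weights) > bayes_score({"b"}, weights))  # feature "a" occurs only in actives
```

Working in log space turns the product of per-feature ratios into the sum the slide describes.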
Process (bioactivity only):
Mtb screening molecule database
-> High-throughput phenotypic Mtb screening
-> Descriptors + bioactivity
-> Bayesian machine learning Mtb model
-> Molecule database (e.g. GSK malaria actives) virtually scored using Bayesian models
-> Top scoring molecules assayed for Mtb growth inhibition
-> Identify in vitro hits
-> Increased hit/lead discovery efficiency
New bioactivity data may enhance the models
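The loop in this workflow can be sketched as one screening cycle: train on the current data, virtually score a library, send only the top-ranked molecules to the assay, and append the new results to the training set. This is a minimal sketch; `screening_cycle` and the `train`, `score`, and `assay` callables are hypothetical placeholders, with `assay` standing in for the wet-lab Mtb growth inhibition test.

```python
from typing import Callable, Dict, FrozenSet, List, Tuple

Features = FrozenSet[str]
Model = Dict[str, float]
Labeled = List[Tuple[Features, bool]]


def screening_cycle(
    training: Labeled,
    library: Dict[str, Features],
    train: Callable[[Labeled], Model],
    score: Callable[[Features, Model], float],
    assay: Callable[[str], bool],
    n_select: int,
) -> Tuple[List[str], Labeled]:
    """One learn-score-test cycle; returns the hits and the enlarged training set."""
    model = train(training)
    # Virtually score the whole library and rank from best to worst.
    ranked = sorted(library, key=lambda cid: score(library[cid], model), reverse=True)
    selected = ranked[:n_select]
    # Only the top-ranked molecules go to the (expensive) in vitro assay.
    results = [(library[cid], assay(cid)) for cid in selected]
    hits = [cid for cid, (_, active) in zip(selected, results) if active]
    # New bioactivity data are fed back so the next model can improve.
    return hits, training + results
```

Calling the function repeatedly with the returned training set gives the iterative "new bioactivity data may enhance models" loop.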
Bayesian Classification TB Models
Dataset (number of molecules) | External ROC score | Internal ROC score | Concordance (%) | Specificity (%) | Sensitivity (%)
MLSMR all single point screen (N = 220,463) | 0.86 ± 0 | 0.86 ± 0 | 78.56 ± 1.86 | 78.59 ± 1.94 | 77.13 ± 2.26
MLSMR dose response set (N = 2,273) | 0.73 ± 0.01 | 0.75 ± 0.01 | 66.85 ± 4.06 | 67.21 ± 7.05 | 65.47 ± 7.96
We can use the public data for machine learning model building, using the Discovery Studio Bayesian model. Validation: leave out 50% x 100.
Ekins et al., Mol BioSyst, 6: 840-851, 2010
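The statistics in this table can be computed from a confusion matrix and from ranked scores. A minimal sketch, assuming concordance means the percentage of compounds classified correctly, sensitivity the percentage of actives recovered, and specificity the percentage of inactives recovered. The ROC AUC uses the pairwise (Mann-Whitney) definition, which is quadratic in the number of compounds and only suited to small examples; the function names are hypothetical.

```python
from typing import Dict, Sequence


def classification_stats(y_true: Sequence[bool], y_pred: Sequence[bool]) -> Dict[str, float]:
    """Concordance, sensitivity and specificity, all in percent."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t and p)
    tn = sum(1 for t, p in pairs if not t and not p)
    fp = sum(1 for t, p in pairs if not t and p)
    fn = sum(1 for t, p in pairs if t and not p)
    return {
        "concordance": 100.0 * (tp + tn) / len(pairs),
        "sensitivity": 100.0 * tp / (tp + fn),
        "specificity": 100.0 * tn / (tn + fp),
    }


def roc_auc(y_true: Sequence[bool], scores: Sequence[float]) -> float:
    """Probability that a random active outscores a random inactive (ties count 0.5)."""
    pos = [s for s, t in zip(scores, y_true) if t]
    neg = [s for s, t in zip(scores, y_true) if not t]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation of actives from inactives.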
Bayesian Classification Models for TB
Good features (Good = active compounds with MIC < 5 uM):
G1: 1704324327, 73 out of 165 good, Bayesian score 2.885
G2: -2092491099, 57 out of 120 good, Bayesian score 2.873
G3: -1230843627, 75 out of 188 good, Bayesian score 2.811
G4: 940811929, 35 out of 65 good, Bayesian score 2.780
G5: 563485513, 123 out of 357 good, Bayesian score 2.769
Bad features:
B1: 1444982751, 0 out of 1158 good, Bayesian score -3.135
B2: 274564616, 0 out of 1024 good, Bayesian score -3.018
B3: -1775057221, 0 out of 982 good, Bayesian score -2.978
B4: 48625803, 0 out of 740 good, Bayesian score -2.712
B5: 899570811, 0 out of 738 good, Bayesian score -2.709
Laplacian-corrected Bayesian classifier models were generated using FCFP-6 fingerprints and simple descriptors: two models, built with about 220,000 and >2,000 compounds.
Ekins et al., Mol BioSyst, 6: 840-851, 2010
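Lists like the one above (the highest- and lowest-scoring fingerprint features) come from ranking per-feature weights. A minimal sketch, assuming the weights are available as a dict from feature ID to Bayesian score; `top_features` is a hypothetical helper, and the example reuses four feature IDs and scores from the slide.

```python
from typing import Dict, List, Tuple

Ranked = List[Tuple[str, float]]


def top_features(weights: Dict[str, float], n: int = 5) -> Tuple[Ranked, Ranked]:
    """Return the n most positive ("good") and n most negative ("bad") features.

    If n exceeds half the number of features, the two lists overlap.
    """
    ranked = sorted(weights.items(), key=lambda kv: kv[1], reverse=True)
    good = ranked[:n]
    bad = ranked[::-1][:n]  # most negative first
    return good, bad


# Feature IDs and scores taken from the slide (G1, G2, B1, B2).
slide_weights = {
    "1704324327": 2.885,
    "-2092491099": 2.873,
    "1444982751": -3.135,
    "274564616": -3.018,
}
good, bad = top_features(slide_weights, n=1)
print(good, bad)
```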
Bayesian Classification: Dose Response
Figure: good and bad features
Ekins et al., Mol BioSyst, 6: 840-851, 2010
Both models were substantially better than the random hit rate at identifying known active compounds (MIC < 5 uM) among the first 1,000 compounds ranked by Bayesian model score.
The number of active compounds was substantially larger in the NIAID dataset (1,871 out of 3,748) than in the GVKbio dataset (377 out of 2,880).
Ekins et al., Mol BioSyst, 6: 840-851, 2010
Initial testing of Mtb Bayesian models using NIAID and GVKbio data
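Comparing the actives recovered among the first 1,000 ranked compounds with the number a random pick would be expected to find gives an enrichment factor, likely the kind of ratio behind the 4-10 fold enrichments reported for the additional test sets. A minimal sketch, assuming binary active labels and a simple top-n cutoff; `enrichment_at_n` is a hypothetical name, and it divides by zero if the set contains no actives.

```python
from typing import Sequence


def enrichment_at_n(scores: Sequence[float], actives: Sequence[bool], n: int) -> float:
    """Actives found in the top-n ranked compounds divided by the number expected at random."""
    ranked = sorted(zip(scores, actives), key=lambda pair: pair[0], reverse=True)
    found = sum(1 for _, active in ranked[:n] if active)
    expected = n * sum(actives) / len(actives)  # base hit rate times n
    return found / expected
```

For example, with 1,871 actives among 3,748 NIAID compounds, a random selection of 1,000 compounds would be expected to contain about 499 actives (1,000 × 1,871 / 3,748), so the high base rate in that set limits how large the enrichment can be.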
Additional test sets: 100K library, Novartis data, FDA drugs
Hits in the test sets: 21 hits in 2,108 compounds; 34 hits in 248 compounds; 1,702 hits in >100K compounds
The results suggest the models can predict data from the same and from independent labs, with enrichments of 4-10 fold
Early enrichment means only a few compounds need to be screened to find actives
Ekins and Freundlich, Pharm Res 28: 1859-1869, 2011; Ekins et al., Mol BioSyst, 6: 840-851, 2010
Dual-Event models
Be more stringent about what we call an ACTIVE:
IC90 < 10 uM and a selectivity index (SI) greater than 10, where SI = CC50/IC90 and CC50 is the concentration that causes 50% inhibition of Vero cells.
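The dual-event activity rule can be written as a labelling function applied before model training. A minimal sketch; the function names are hypothetical, it assumes IC90 and CC50 are given as numbers in the same units (uM), and it does not handle censored values such as a CC50 reported as ">100".

```python
def selectivity_index(cc50: float, ic90: float) -> float:
    """SI = CC50 / IC90 (both in the same concentration units, e.g. uM)."""
    return cc50 / ic90


def is_dual_event_active(ic90_um: float, cc50_um: float,
                         ic90_cutoff_um: float = 10.0, si_cutoff: float = 10.0) -> bool:
    """Active only if potent against Mtb (IC90 < 10 uM) and selective over Vero cells (SI > 10)."""
    return ic90_um < ic90_cutoff_um and selectivity_index(cc50_um, ic90_um) > si_cutoff


print(is_dual_event_active(2.0, 50.0))  # True: SI = 25
print(is_dual_event_active(2.0, 15.0))  # False: potent, but SI = 7.5
```

A compound that is potent but cytotoxic is labelled inactive, which is what makes the dual-event models more stringent than bioactivity-only models.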
Dual-Event models
Process (bioactivity + cytotoxicity):
Mtb screening molecule database
-> High-throughput phenotypic Mtb screening
-> Descriptors + bioactivity (+ cytotoxicity)
-> Bayesian machine learning Mtb model
-> Molecule database (e.g. GSK malaria actives) virtually scored using Bayesian models
-> Top scoring molecules assayed for Mtb growth inhibition
-> Identify in vitro hits
-> Increased hit/lead discovery efficiency
New bioactivity data may enhance the models
Bayesian Classification TB Models
Dataset (number of molecules) | External ROC score | Internal ROC score | Concordance (%) | Specificity (%) | Sensitivity (%)
MLSMR all single point screen (N = 220,463) | 0.86 ± 0 | 0.86 ± 0 | 78.56 ± 1.86 | 78.59 ± 1.94 | 77.13 ± 2.26
MLSMR dose response set (N = 2,273) | 0.73 ± 0.01 | 0.75 ± 0.01 | 66.85 ± 4.06 | 67.21 ± 7.05 | 65.47 ± 7.96
NEW: dose response and cytotoxicity (N = 2,273) | 0.82 ± 0.02 | 0.84 ± 0.02 | 82.61 ± 4.68 | 83.91 ± 5.48 | 65.99 ± 7.47
ROC cross-validation (XV) AUC: single point = 0.88; dose response = 0.78; dose response + cytotoxicity = 0.86
Ekins et al., PLOSONE, in press 2013
MLSMR dual-event model
Figure: good and bad features
Ekins et al., PLOSONE, in press 2013
A new dataset to model
Models with SRI kinase data
ROC XV AUC: Model 1 (N = 23,797) = 0.89; Model 2 (N = 1,248) = 0.72; Model 3 (N = 1,248) = 0.77
Validation: leave out 50% x 100
Dataset (number of molecules) | External ROC score | Internal ROC score | Concordance (%) | Specificity (%) | Sensitivity (%)
Model 1 (N = 23,797) | 0.87 ± 0 | 0.88 ± 0 | 76.77 ± 2.14 | 76.49 ± 2.41 | 81.7 ± 2.96
Model 2 (N = 1,248) | 0.65 ± 0.01 | 0.70 ± 0.01 | 61.58 ± 1.56 | 61.85 ± 8.45 | 61.30 ± 8.24
Model 3 (N = 1,248) | 0.74 ± 0.02 | 0.75 ± 0.02 | 68.67 ± 6.88 | 69.28 ± 9.84 | 64.84 ± 12.11
Ekins et al., PLOSONE, in press 2013
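"Leave out 50% x 100" describes repeated random hold-out validation: build the model on a random half of the data, score the other half, repeat 100 times, and report mean ± standard deviation, as in the tables. A minimal sketch, assuming simple unstratified random splits; `repeated_half_holdout` is a hypothetical name, and `evaluate` is a caller-supplied callback that trains on the first list and returns a metric (such as ROC AUC) on the second.

```python
import random
import statistics
from typing import Callable, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def repeated_half_holdout(
    data: Sequence[T],
    evaluate: Callable[[List[T], List[T]], float],
    n_repeats: int = 100,
    seed: int = 0,
) -> Tuple[float, float]:
    """Leave out 50% at random n_repeats times; return mean and SD of the metric.

    statistics.stdev needs at least two repeats.
    """
    rng = random.Random(seed)
    values: List[float] = []
    for _ in range(n_repeats):
        shuffled = list(data)
        rng.shuffle(shuffled)
        half = len(shuffled) // 2
        train, test = shuffled[:half], shuffled[half:]  # disjoint halves
        values.append(evaluate(train, test))
    return statistics.mean(values), statistics.stdev(values)
```

The reported "± SD" reflects how much the metric varies between random splits, which is why the smaller datasets show wider spreads.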
Testing to date has been retrospective
Can we use our models to select compounds and influence design?
Prospective prediction
Do it enough times to show robustness
The MLSMR dose response with cytotoxicity model and the TAACF kinase dose response with cytotoxicity model were used to screen:
Asinex library (N = 25,008)
Maybridge library (N = 57,200)
Selleck Chemicals kinase library (N = 194)
Testing prospectively
Results - Asinex library
94 molecules were selected with the MLSMR dose response and cytotoxicity model and 88 with the model based on the kinase inhibitor scaffold library (with cytotoxicity); all were tested at a single concentration.
8 (MLSMR) and 19 (kinase) hits with > 90% inhibition at 100 ug/ml (8.5% and 21.5% hit rates)
Results - Maybridge library
50 molecules had greater than or equal to 90% inhibition at 100 ug/ml (28.7% hit rate); 8 had a good SI
Ekins et al., PLOSONE, in press 2013
Asinex and MLSMR actives PCA
Ekins et al., PLOSONE, in press 2013
Examples of selective and active compounds with MIC < 10 ug/ml
Maybridge number | Structure | Inhibition % (MABA, 100 ug/ml) | Inhibition % (LORA, 100 ug/ml) | MIC MABA (ug/ml) | MIC LORA (ug/ml) | CC50 Vero (ug/ml) | MLSMR model score | Kinase model score
JFD02381 | (structure not reproduced) | 98.9 | 95 | 5.84 | 10.09 | >100 | 25.27 (0.80) | 12.79 (0.5)
JFD02382 | (structure not reproduced) | 91.5 | 90.1 | >100 | 47.99 | >100 | 18.32 (0.69) | 9.78 (0.43)
An example of the model ranking similar compounds
Analysis of SelleckChem Kinase library (N = 194)
47 molecules showed greater than or equal to 90% inhibition of M. tuberculosis growth at 100 ug/ml, a hit rate of 24.2%.
Note: the best model was another dual-activity model (Ekins et al., Chem Biol 20: 370-378, 2013)
Ekins et al., PLOSONE, in press 2013
Kinase inhibitors active vs Mtb
SI not ideal; several other weaker actives are approved drugs
A summary of the numbers involved: filtering for hits
82,403 molecules screened through Bayesian models
550 molecules tested in vitro
124 actives identified
22.5% hit rate
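The hit rates quoted on these slides are simple ratios of actives to molecules tested. A minimal sketch that recomputes three of them from the counts given above; `hit_rate` is an illustrative helper.

```python
def hit_rate(hits: int, tested: int) -> float:
    """Percentage of tested molecules that were active."""
    return 100.0 * hits / tested


# (hits, molecules tested), taken from the slides.
counts = {
    "Asinex, MLSMR dual-event model": (8, 94),
    "SelleckChem kinase library": (47, 194),
    "All libraries combined": (124, 550),
}
for name, (hits, tested) in counts.items():
    print(f"{name}: {hit_rate(hits, tested):.1f}%")
```

Against the typical HTS hit rates of under 1% shown earlier, the combined 22.5% corresponds to more than a 20-fold improvement.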
Identified several novel potent lead series with low cytotoxicity and good selectivity
Identified known human kinase inhibitors and FDA approved drugs as new hits
Conclusions
Still difficult to identify molecules with bioactivity and no cytotoxicity
Models perform differently on different data sets
Need to understand what factors are key
Hit rate much higher than HTS, while screening only a fraction of the molecules
Computational models should be used prior to HTS
Focus resources
Acknowledgments
The project described was supported by Award Number R43 LM011152-01 “Biocomputation across distributed private datasets to enhance drug discovery” from the National Library of Medicine (PI: S. Ekins)
Accelrys
The CDD TB has been developed thanks to funding from the Bill and Melinda Gates Foundation (Grant#49852 “Collaborative drug discovery for TB through a novel database of SAR data optimized to promote data archiving and sharing”)
Allen Casey (IDRI)
You can find me @... CDD Booth 205
PAPER ID: 13433. PAPER TITLE: “Dispensing processes profoundly impact biological assays and computational and statistical analyses”, April 8th, 8.35am, Room 349
PAPER ID: 14750. PAPER TITLE: “Enhancing High Throughput Screening For Mycobacterium tuberculosis Drug Discovery Using Bayesian Models”, April 9th, 1.30pm, Room 353
PAPER ID: 21524. PAPER TITLE: “Navigating between patents, papers, abstracts and databases using public sources and tools”, April 9th, 3.50pm, Room 350
PAPER ID: 13358. PAPER TITLE: “TB Mobile: Appifying Data on Anti-tuberculosis Molecule Targets”, April 10th, 8.30am, Room 357
PAPER ID: 13382. PAPER TITLE: “Challenges and recommendations for obtaining chemical structures of industry-provided repurposing candidates”, April 10th, 10.20am, Room 350
PAPER ID: 13438. PAPER TITLE: “Dual-event machine learning models to accelerate drug discovery”, April 10th, 3.05pm, Room 350