Enhancing High Throughput Screening for Mycobacterium tuberculosis Drug Discovery Using Bayesian Models


Sean Ekins1,2*, Robert C. Reynolds3,4, Baojie Wan5, Scott G. Franzblau5, Joel S. Freundlich6,7 and Barry A. Bunin1

1 Collaborative Drug Discovery, 1633 Bayshore Highway, Suite 342, Burlingame, CA 94010, USA.
2 Collaborations in Chemistry, 5616 Hilltop Needmore Road, Fuquay-Varina, NC 27526, USA.
3 Southern Research Institute, 2000 Ninth Avenue South, Birmingham, AL 35205, USA.
4 Current address: University of Alabama at Birmingham, College of Arts and Sciences, Department of Chemistry, 1530 3rd Avenue South, Birmingham, Alabama 35294-1240, USA.
5 Institute for Tuberculosis Research, University of Illinois at Chicago, Chicago, IL 60607, USA.
6 Department of Medicine, Center for Emerging and Reemerging Pathogens, UMDNJ – New Jersey Medical School, 185 South Orange Avenue, Newark, NJ 07103, USA.
7 Department of Pharmacology & Physiology, UMDNJ – New Jersey Medical School, 185 South Orange Avenue, Newark, NJ 07103, USA.


Tuberculosis kills 1.6-1.7 million people per year (~1 every 8 seconds)

One third of the world's population is infected

Multidrug resistance in 4.3% of cases

Extensively drug-resistant TB is increasing in incidence

One new drug in over 40 years

Drug-drug interactions and co-morbidity with HIV

Collaboration between groups is rare

These groups may work on existing or new targets

Use of computational methods with TB is rare

Applying CDD to Build a disease community for TB

~20 public datasets for TB

Including Novartis data on TB hits

>300,000 compounds

Patents, Papers Annotated by CDD

Open to browse by anyone

http://www.collaborativedrug.com/

register

Ekins et al., Trends in Microbiology, 19: 65-74, 2011

Fitting into the drug discovery process

HTS Hit rates

SRI papers

Usually less than 1%

Provider | Compound library | Number of compounds | Inhibitor concentration (ug/ml or uM) | Readout | Hit rate (%) at 90% inhibition
ChemBridge | Novacore | 50,000 | 30 uM | Luminescence (LuxAB) | 4.55
Asinex | Diverse | 59,760 | 50 uM | Luminescence (LuxAB) | 1.91
ASDI | | 6,811 | 30 uM | Luminescence (LuxAB) | 2.73
Prestwick | | 1,120 | 20 ug/ml | Luminescence (ATP) | 20.6
Prestwick | | 1,120 | 20 ug/ml | Fluorescence (MABA) | 16.07
MRCT | | 100,000 | 10 uM | Luminescence (LuxABCDE) | 0.67

UIC hit rates

Wasting data?

Information from these inefficient and expensive HTS campaigns does not appear to have been used to direct “informed” selection of new libraries in subsequent screens and compound optimization in TB drug discovery

How can we continuously learn from all the data?

Bayesian machine learning

Ekins, Williams and Xu, Drug Metab Dispos 38: 2302-2308, 2010

Bayesian classification is a simple probabilistic classification model. It is based on Bayes' theorem:

$$p(h \mid d) = \frac{p(d \mid h)\, p(h)}{p(d)}$$

h is the hypothesis or model
d is the observed data
p(h) is the prior belief (probability of hypothesis h before observing any data)
p(d) is the data evidence (marginal probability of the data)
p(d|h) is the likelihood (probability of data d if hypothesis h is true)
p(h|d) is the posterior probability (probability of hypothesis h being true given the observed data d)

A weight is calculated for each feature using a Laplacian-adjusted probability estimate to account for the different sampling frequencies of different features.

The weights are summed to provide a probability estimate
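The feature weighting and summation described above can be sketched in a few lines of Python. This is a minimal illustration, not the Discovery Studio implementation: it assumes one commonly cited form of the Laplacian-corrected estimate, weight = ln((A + 1) / (N x base rate + 1)), where N is the number of molecules containing a feature and A is the number of those that are active. The function names and the toy data are hypothetical.

```python
import math
from collections import Counter


def laplacian_weights(samples):
    """Return a log-weight per feature from (feature_set, is_active) pairs.

    Assumed formulation: weight = ln((A + 1) / (N * base_rate + 1)), where N is
    the number of samples containing the feature and A is how many are active.
    """
    base_rate = sum(1 for _, active in samples if active) / len(samples)
    seen, seen_active = Counter(), Counter()
    for features, active in samples:
        for feature in features:
            seen[feature] += 1
            if active:
                seen_active[feature] += 1
    return {
        f: math.log((seen_active[f] + 1) / (seen[f] * base_rate + 1))
        for f in seen
    }


def bayesian_score(features, weights):
    """Sum the weights of the features present; unseen features contribute 0."""
    return sum(weights.get(f, 0.0) for f in features)


# Hypothetical toy training set: feature names stand in for fingerprint bits.
toy = [
    ({"a", "b"}, True),
    ({"a"}, True),
    ({"b", "c"}, False),
    ({"c"}, False),
]
weights = laplacian_weights(toy)
# Positive score: feature "a" occurs only in actives, "b" is neutral here.
print(bayesian_score({"a", "b"}, weights))
```

Rarely sampled features are pulled toward a weight of zero by the +1 terms, which is the point of the correction: a feature seen once in one active should not dominate the score.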

Workflow (figure):
High-throughput phenotypic Mtb screening
Mtb screening molecule database
Descriptors + Bioactivity
Bayesian Machine Learning Mtb Model
Molecule Database (e.g. GSK malaria actives) virtually scored using Bayesian Models
Top scoring molecules assayed for Mtb growth inhibition
Identify in vitro hits
New bioactivity data may enhance models
Increased hit/lead discovery efficiency

Process – Bioactivity only

Bayesian Classification TB Models

Dataset (number of molecules) | External ROC Score | Internal ROC Score | Concordance | Specificity | Sensitivity
MLSMR All single point screen (N = 220463) | 0.86 ± 0 | 0.86 ± 0 | 78.56 ± 1.86 | 78.59 ± 1.94 | 77.13 ± 2.26
MLSMR dose response set (N = 2273) | 0.73 ± 0.01 | 0.75 ± 0.01 | 66.85 ± 4.06 | 67.21 ± 7.05 | 65.47 ± 7.96

We can use the public data for machine learning model building

Using Discovery Studio Bayesian model

Leave out 50% x 100

Ekins et al., Mol BioSyst, 6: 840-851, 2010
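The "Leave out 50% x 100" validation can be illustrated with a small, self-contained sketch: split the data in half at random, fit on one half, compute the ROC AUC on the other half, and repeat 100 times to get a mean and standard deviation. Everything here is an assumption for illustration: the scorer is a simple Laplacian-corrected stand-in rather than the Discovery Studio model, and the data are synthetic.

```python
import math
import random
import statistics
from collections import Counter


def fit_weights(train):
    """Laplacian-corrected feature weights (assumed formulation, for illustration)."""
    base_rate = sum(1 for _, active in train if active) / len(train)
    seen, seen_active = Counter(), Counter()
    for features, active in train:
        for f in features:
            seen[f] += 1
            if active:
                seen_active[f] += 1
    return {f: math.log((seen_active[f] + 1) / (seen[f] * base_rate + 1)) for f in seen}


def roc_auc(scores, labels):
    """Probability that a random active outscores a random inactive (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def leave_out_half(samples, n_repeats=100, seed=0):
    """Repeat a random 50/50 train/test split and return (mean AUC, AUC std dev)."""
    rng = random.Random(seed)
    aucs = []
    for _ in range(n_repeats):
        shuffled = samples[:]
        rng.shuffle(shuffled)
        half = len(shuffled) // 2
        train, test = shuffled[:half], shuffled[half:]
        labels = [active for _, active in test]
        if len(set(labels)) < 2:
            continue  # AUC is undefined without both classes in the test half
        weights = fit_weights(train)
        scores = [sum(weights.get(f, 0.0) for f in feats) for feats, _ in test]
        aucs.append(roc_auc(scores, labels))
    return statistics.mean(aucs), statistics.stdev(aucs)


def make_toy_data(n=200, seed=1):
    """Synthetic molecules: actives usually carry 'good_motif', plus one noise feature."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        active = rng.random() < 0.3
        feats = {f"noise{rng.randrange(20)}"}
        if rng.random() < (0.8 if active else 0.1):
            feats.add("good_motif")
        data.append((feats, active))
    return data


mean_auc, sd_auc = leave_out_half(make_toy_data())
print(f"ROC AUC = {mean_auc:.2f} ± {sd_auc:.2f}")
```

Reporting the spread across repeats, as in the table above, shows how sensitive a model is to which half of the data it was trained on.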

Bayesian Classification Models for TB

Good features:
G1: 1704324327, 73 out of 165 good, Bayesian Score: 2.885
G2: -2092491099, 57 out of 120 good, Bayesian Score: 2.873
G3: -1230843627
G4: 940811929
G5: 563485513

Bad features:
B1: 1444982751, 0 out of 1158 good, Bayesian Score: -3.135
B2: 274564616
B3: -1775057221, 0 out of 982 good, Bayesian Score: -2.978
B4: 48625803
B5: 899570811

Active compounds with MIC < 5 uM

Laplacian-corrected Bayesian classifier models were generated using FCFP-6 and simple descriptors. Two models were built, on 220,000 and >2,000 compounds.


Bayesian Classification Dose response (figure: good and bad features)


Both models were substantially better than the random hit rate at identifying known active compounds with MIC 5 uM in the first 1000 compounds sorted by the Bayesian model scores.

The number of active compounds was substantially larger in the NIAID dataset (1871 out of 3748) than in the GVKbio dataset (377 out of 2880).


Initial testing of Mtb Bayesian models using NIAID and GVKbio data

Additional test sets: 100K library, Novartis data, FDA drugs

Suggests models can predict data from the same and independent labs

Enrichments 4-10 fold

Initial enrichment enables screening few compounds to find actives

21 hits in 2108 cpds

34 hits in 248 cpds

1702 hits in >100K cpds

Ekins and Freundlich, Pharm Res, 28: 1859-1869, 2011

Ekins et al., Mol BioSyst, 6: 840-851, 2010
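One common way to express the enrichments quoted above is the enrichment factor: the hit rate among the top-ranked compounds divided by the hit rate of the whole set. The sketch below assumes that definition (the slides do not state the exact formula used), and the ranked list is hypothetical.

```python
def enrichment_factor(ranked_labels, top_n):
    """Hit rate in the top_n ranked compounds divided by the overall hit rate.

    ranked_labels: 1 for active, 0 for inactive, sorted by descending model score.
    """
    if not 0 < top_n <= len(ranked_labels):
        raise ValueError("top_n must be between 1 and the number of compounds")
    total_hits = sum(ranked_labels)
    if total_hits == 0:
        raise ValueError("no actives in the set, enrichment is undefined")
    top_rate = sum(ranked_labels[:top_n]) / top_n
    overall_rate = total_hits / len(ranked_labels)
    return top_rate / overall_rate


# Hypothetical ranking of 100 compounds with 10 actives, 5 of them in the top 10.
ranked = [1, 1, 0, 1, 0, 1, 0, 0, 1, 0] + [0] * 40 + [1] * 5 + [0] * 45
ef = enrichment_factor(ranked, top_n=10)
print(f"{ef:.1f}-fold enrichment in the top 10")
```

An enrichment factor of 1 means the model does no better than picking compounds at random; the value depends on how many top-ranked compounds are counted.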

Dual-Event models

Become more stringent in what we call an ACTIVE

Actives have IC90 < 10 uM and a selectivity index (SI) greater than ten. SI was calculated as SI = CC50/IC90, where CC50 is the concentration that resulted in 50% inhibition of Vero cells.
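The dual-event definition of an active can be written as a small labeling function. This is a sketch using the thresholds stated above; the function names and the example concentrations are hypothetical.

```python
def selectivity_index(cc50, ic90):
    """SI = CC50 / IC90, with both values in the same concentration units."""
    if ic90 <= 0:
        raise ValueError("IC90 must be positive")
    return cc50 / ic90


def is_dual_event_active(ic90_um, cc50_um, ic90_cutoff=10.0, si_cutoff=10.0):
    """Active only if potent (IC90 below cutoff) and selective (SI above cutoff)."""
    return ic90_um < ic90_cutoff and selectivity_index(cc50_um, ic90_um) > si_cutoff


# Hypothetical compounds as (IC90 in uM, Vero CC50 in uM)
print(is_dual_event_active(2.0, 50.0))    # potent and selective (SI = 25)
print(is_dual_event_active(2.0, 10.0))    # potent but cytotoxic (SI = 5)
print(is_dual_event_active(20.0, 400.0))  # selective (SI = 20) but not potent enough
```

Compounds that are potent but cytotoxic are labeled inactive, so the model learns to penalize features associated with cytotoxicity as well as those associated with no whole-cell activity.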

Workflow (figure):
High-throughput phenotypic Mtb screening
Mtb screening molecule database
Descriptors + Bioactivity (+Cytotoxicity)
Bayesian Machine Learning Mtb Model
Molecule Database (e.g. GSK malaria actives) virtually scored using Bayesian Models
Top scoring molecules assayed for Mtb growth inhibition
Identify in vitro hits
New bioactivity data may enhance models
Increased hit/lead discovery efficiency

Dual-Event models

Bayesian Classification TB Models


Dataset (number of molecules) | External ROC Score | Internal ROC Score | Concordance | Specificity | Sensitivity
MLSMR All single point screen (N = 220463) | 0.86 ± 0 | 0.86 ± 0 | 78.56 ± 1.86 | 78.59 ± 1.94 | 77.13 ± 2.26
MLSMR dose response set (N = 2273) | 0.73 ± 0.01 | 0.75 ± 0.01 | 66.85 ± 4.06 | 67.21 ± 7.05 | 65.47 ± 7.96
NEW Dose resp and cytotoxicity (N = 2273) | 0.82 ± 0.02 | 0.84 ± 0.02 | 82.61 ± 4.68 | 83.91 ± 5.48 | 65.99 ± 7.47

Single pt ROC XV AUC = 0.88

Dose resp ROC XV AUC = 0.78

Dose resp + cyto ROC XV AUC = 0.86

Ekins et al., PLoS ONE, in press 2013

MLSMR dual event model (figure: good and bad features)


A new dataset to model

Models with SRI kinase data

Model 1 (N = 23797) ROC XV AUC = 0.89

Model 2 (N = 1248) ROC XV AUC = 0.72

Model 3 (N = 1248) ROC XV AUC = 0.77

Leave out 50% x 100

Model | External ROC Score | Internal ROC Score | Concordance | Specificity | Sensitivity
Model 1 (N = 23797) | 0.87 ± 0 | 0.88 ± 0 | 76.77 ± 2.14 | 76.49 ± 2.41 | 81.7 ± 2.96
Model 2 (N = 1248) | 0.65 ± 0.01 | 0.70 ± 0.01 | 61.58 ± 1.56 | 61.85 ± 8.45 | 61.30 ± 8.24
Model 3 (N = 1248) | 0.74 ± 0.02 | 0.75 ± 0.02 | 68.67 ± 6.88 | 69.28 ± 9.84 | 64.84 ± 12.11


Testing to date has been retrospective

Can we use our models to select compounds and influence design?

Prospective prediction

Do it enough times to show robustness

The MLSMR dose response with cytotoxicity and the TAACF kinase dose response with cytotoxicity models were used to screen the following libraries:

Asinex library (N = 25,008)

Maybridge library (N = 57,200)

Selleck Chemicals kinase library (N = 194)

Testing prospectively

Results - Asinex library

94 molecules were selected with the MLSMR dose response and cytotoxicity model.

88 were selected with the kinase inhibitor scaffold library with cytotoxicity model; all were tested at a single concentration.

8 (MLSMR) and 19 (kinase) hits with > 90% inhibition at 100 ug/ml (8.5% and 21.5% hit rates)

Results - Maybridge library

50 molecules had greater than or equal to 90% inhibition at 100 ug/ml (28.7% hit rate); 8 had a good SI


Asinex and MLSMR actives PCA


Examples of selective and active compounds with MIC <10 ug/ml

Maybridge number | JFD02381 | JFD02382
Structure | (image) | (image)
Inhibition % MABA at 100 ug/ml | 98.9 | 91.5
Inhibition % LORA at 100 ug/ml | 95 | 90.1
MIC MABA (ug/ml) | 5.84 | >100
MIC LORA (ug/ml) | 10.09 | 47.99
CC50 Vero (ug/ml) | >100 | >100
MLSMR model score | 25.27 (0.80) | 18.32 (0.69)
Kinase model score | 12.79 (0.5) | 9.78 (0.43)

An example of the model ranking similar compounds

Analysis of SelleckChem Kinase library N=194

47 molecules showed greater than or equal to 90% inhibition of M. tuberculosis at 100 ug/ml, a hit rate of 24.2%.

Note best model was another dual activity model (Ekins et al., Chem Biol 20: 370-378, 2013)


Kinase inhibitors active vs Mtb

SI not ideal; several other weaker actives are approved drugs

A summary of the numbers involved – filtering for hits.

82,403 molecules screened through Bayesian models

550 molecules were tested in vitro

124 actives were identified

22.5 % hit rate

Identified several novel potent lead series with good cytotoxicity & selectivity

Identified known human kinase inhibitors and FDA approved drugs as new hits
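The funnel numbers above can be checked with a few lines of arithmetic. The counts are taken from this summary; the 1% HTS baseline is only an upper bound drawn from the "usually less than 1%" figure quoted earlier, so the fold improvement is a rough lower estimate rather than a measured value.

```python
screened = 82_403  # molecules scored with the Bayesian models
tested = 550       # molecules tested in vitro
actives = 124      # actives identified

fraction_tested = tested / screened  # share of the libraries actually assayed
hit_rate = actives / tested          # prospective hit rate

# Assumed baseline: typical HTS hit rate taken as at most 1%
hts_baseline = 0.01
fold_over_hts = hit_rate / hts_baseline

print(f"Tested {fraction_tested:.2%} of the scored molecules")
print(f"Hit rate: {hit_rate:.1%}")
print(f"At least {fold_over_hts:.0f}-fold over a 1% HTS hit rate")
```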

Conclusions

Still difficult to identify molecules with bioactivity and no cytotoxicity

Models perform differently on different data sets

Need to understand what factors are key

Hit rate much higher than HTS / screen a fraction of molecules

Computational models should be used prior to HTS

Focus resources

Acknowledgments

The project described was supported by Award Number R43 LM011152-01 “Biocomputation across distributed private datasets to enhance drug discovery” from the National Library of Medicine (PI: S. Ekins)

Accelrys

The CDD TB has been developed thanks to funding from the Bill and Melinda Gates Foundation (Grant#49852 “Collaborative drug discovery for TB through a novel database of SAR data optimized to promote data archiving and sharing”)

Allen Casey (IDRI)

You can find me @... CDD Booth 205

PAPER ID: 13433, PAPER TITLE: “Dispensing processes profoundly impact biological assays and computational and statistical analyses”, April 8th 8.35am Room 349

PAPER ID: 14750, PAPER TITLE: “Enhancing High Throughput Screening For Mycobacterium tuberculosis Drug Discovery Using Bayesian Models”, April 9th 1.30pm Room 353

PAPER ID: 21524, PAPER TITLE: “Navigating between patents, papers, abstracts and databases using public sources and tools”, April 9th 3.50pm Room 350

PAPER ID: 13358, PAPER TITLE: “TB Mobile: Appifying Data on Anti-tuberculosis Molecule Targets”, April 10th 8.30am Room 357

PAPER ID: 13382, PAPER TITLE: “Challenges and recommendations for obtaining chemical structures of industry-provided repurposing candidates”, April 10th 10.20am Room 350

PAPER ID: 13438, PAPER TITLE: “Dual-event machine learning models to accelerate drug discovery”, April 10th 3.05pm Room 350
