Talk given at St Jude Children's Research Hospital 18 November
Collaboration in Pharmaceutical Research: From Neglected Diseases to ADME/Tox
Sean Ekins
Collaborations in Chemistry, Fuquay Varina, NC.
Collaborative Drug Discovery, Burlingame, CA.
Department of Pharmacology, University of Medicine & Dentistry of New Jersey-Robert Wood Johnson Medical School, Piscataway, NJ.
School of Pharmacy, Department of Pharmaceutical Sciences, University of Maryland, Baltimore, MD.
In the long history of humankind (and animal kind, too) those who have learned to collaborate and improvise most effectively have prevailed.
Charles Darwin
Outline
• Introduction
• Collaborative Drug Discovery
• TB Collaborations and Drug Discovery Research
• Open ADME Models
• Repurposing FDA approved drugs
• The Future: Mobile Apps for Drug Discovery
Open Innovation: Open innovation is a paradigm that assumes that firms can and should use external ideas as well as internal ideas, and internal and external paths to market, as the firms look to advance their technology.
Chesbrough, H.W. (2003). Open Innovation: The new imperative for creating and profiting from technology. Boston: Harvard Business School Press, p. xxiv
Collaborative Innovation: A strategy in which groups partner to create a product and drive the efficient allocation of R&D resources. By collaborating with outsiders, including customers, vendors and even competitors, a company is able to import lower-cost, higher-quality ideas from the best sources in the world.
e.g. Innocentive, crowdsourcing
Open Source: Companies can donate their patents to an independent organization, put them in a common pool or grant unlimited license use to anybody.
e.g. GSK malaria data, Novartis TB data
Some Definitions
How to do it better?
What can we do with software to facilitate it?
The future is more collaborative
We have tools but need integration
• Groups involved traverse the spectrum from pharma, academia, not-for-profit and government
• More free, open technologies to enable biomedical research
• Precompetitive organizations, consortia...
A starting point for collaboration
A core root of the current inefficiencies in drug discovery is the barriers that organizations and individuals face in collaborating effectively (Bunin & Ekins, DDT 16: 643-645, 2011)
Major collaborative grants in the EU: Framework, IMI … NIH moving in the same direction?
Cross-continent collaboration: CROs in China, India, etc.; pharmas in the US / Europe
More industry-academia collaboration: ‘not invented here’ a thing of the past
More effort to go after rare and neglected diseases. Globalization and connectivity of scientists will be key.
Current pace of change in pharma may not be enough.
Need to rethink how we use all technologies & resources…
Collaboration is everywhere
Hardware is getting smaller
Figure: computing hardware from the 1930s to the 2000s, shrinking from room size and desktop size to laptop, netbook, phone and watch. Not to scale and not equivalent computing power; illustrates mobility.
Models and software becoming more accessible: free, precompetitive efforts, collaboration
Free tools are proliferating
Typical Lab: The Data Explosion Problem & Collaborations
DDT Feb 2009
Collaborative Drug Discovery Platform
• CDD Vault – Secure web-based place for private data – private by default
• CDD Collaborate – Selectively share subsets of data
• CDD Public: public datasets, over 3 million compounds, with molecular properties, similarity and substructure searching, data plotting, etc.
will host datasets from companies, foundations, etc.
vendor libraries (Asinex, TimTec, ChemBridge)
• Unique to CDD: simultaneously query your private data, collaborators' data and public data; easy GUI
www.collaborativedrug.com
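The "private by default, selectively shared, query everything you can see" idea in the bullets above can be sketched as a small access-control filter. This is a hypothetical illustration in Python: the Dataset class, its fields and query_visible are invented names, not CDD's actual data model or API.

```python
from dataclasses import dataclass, field


@dataclass
class Dataset:
    """Hypothetical dataset: private unless made public or shared with named users."""
    name: str
    owner: str
    public: bool = False
    shared_with: set = field(default_factory=set)
    records: list = field(default_factory=list)

    def visible_to(self, user: str) -> bool:
        # Private by default: only the owner sees it unless it is public or shared
        return self.public or user == self.owner or user in self.shared_with


def query_visible(datasets, user, predicate):
    """Run one query across every dataset the user may see (private, shared, public)."""
    return [
        (ds.name, rec)
        for ds in datasets
        if ds.visible_to(user)
        for rec in ds.records
        if predicate(rec)
    ]


vault = Dataset("my_vault", owner="alice", records=[{"id": "A-1", "mic_uM": 2.0}])
shared = Dataset("partner_set", owner="bob", shared_with={"alice"},
                 records=[{"id": "B-7", "mic_uM": 4.5}])
public = Dataset("public_tb", owner="cdd", public=True,
                 records=[{"id": "P-3", "mic_uM": 12.0}])

hits = query_visible([vault, shared, public], "alice", lambda r: r["mic_uM"] < 5)
print(hits)
```

Here one query from "alice" searches her own vault, the set "bob" shared with her and the public set together; any other user would see only the public set.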
CDD: Single Click to Key Functionality
CDD: Mining across projects and datasets
Tuberculosis kills 1.6-1.7 million people per year (~1 every 19-20 seconds); one third of the world's population is infected
• Multidrug resistance in 4.3% of cases
• Extensively drug-resistant TB: increasing incidence
• No new drugs in over 40 years
• Drug-drug interactions and co-morbidity with HIV
• Collaboration between groups is rare
• These groups may work on existing or new targets
• Use of computational methods with TB is rare
• Literature TB data is not well collated (SAR)
Funded by Bill and Melinda Gates Foundation
Applying CDD to Build a disease community for TB
~20 public datasets for TB, including Novartis data on TB hits
>300,000 compounds
Patents and papers annotated by CDD
Open to browse by anyone
http://www.collaborativedrug.com/register
Molecules with activity against
CDD is a partner on a 5-year project supporting >20 labs, providing cheminformatics support (www.mm4tb.org)
More Medicines for Tuberculosis
Ekins et al., Trends in Microbiology, 19: 65-74, 2011
Fitting into the drug discovery process
Searching for TB molecular mimics: collaboration
Lamichhane G, et al., mBio, 2: e00301-10, 2011
Modeling: CDD; Biology: Johns Hopkins; Chemistry: Texas A&M
Simple descriptor analysis on > 300,000 compounds tested vs TB (values are mean (SD))
Dataset | MWT | logP | HBD | HBA | RO5 | Atom count | PSA | RBN
MLSMR active, ≥ 90% inhibition at 10 µM (N = 4096) | 357.10 (84.70) | 3.58 (1.39) | 1.16 (0.93) | 4.89 (1.94) | 0.20 (0.48) | 42.99 (12.70) | 83.46 (34.31) | 4.85 (2.43)
MLSMR inactive, < 90% inhibition at 10 µM (N = 216367) | 350.15 (77.98)** | 2.82 (1.44)** | 1.14 (0.88) | 4.86 (1.77) | 0.09 (0.31)** | 43.38 (10.73) | 85.06 (32.08)* | 4.91 (2.35)
TAACF-NIAID CB2 active, ≥ 90% inhibition at 10 µM (N = 1702) | 349.58 (63.82) | 4.04 (1.02) | 0.98 (0.84) | 4.18 (1.66) | 0.19 (0.40) | 41.88 (9.44) | 70.28 (29.55) | 4.76 (1.99)
TAACF-NIAID CB2 inactive, < 90% inhibition at 10 µM (N = 100,931) | 352.59 (70.87) | 3.38 (1.36)** | 1.11 (0.82)** | 4.24 (1.58) | 0.12 (0.34)** | 42.43 (8.94)* | 77.75 (30.17)** | 4.72 (1.99)
Novartis aerobic and anaerobic TB hits
Anaerobic compounds showed statistically different and higher mean descriptor property values compared with the aerobic hits
(e.g. molecular weight, logP, hydrogen bond donor, hydrogen bond acceptor, polar surface area and rotatable bond number)
The mean molecular properties for the Novartis compounds are in a similar range to the MLSMR and TAACF-NIAID CB2 hits
Ekins and Freundlich, Pharm Res, 28, 1859-1869, 2011.
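The RO5 column in the descriptor table counts Lipinski rule-of-5 violations per compound, and each cell is a mean (SD) over a dataset. A minimal Python sketch of both steps, assuming the standard Lipinski cutoffs (MW > 500, logP > 5, HBD > 5, HBA > 10) and descriptor values already computed by some cheminformatics toolkit; the example compounds are hypothetical.

```python
from statistics import mean, stdev


def ro5_violations(mw, logp, hbd, hba):
    """Count Lipinski rule-of-5 violations (standard cutoffs assumed)."""
    return sum([mw > 500, logp > 5, hbd > 5, hba > 10])


def mean_sd(values):
    """Format a descriptor column as 'mean (SD)'."""
    return f"{mean(values):.2f} ({stdev(values):.2f})"


# Hypothetical descriptor rows for a set of actives: (MW, logP, HBD, HBA)
actives = [(420.5, 4.1, 1, 5), (530.2, 5.6, 2, 7), (310.4, 2.9, 1, 4)]

ro5_counts = [ro5_violations(*row) for row in actives]
print("RO5:", mean_sd(ro5_counts))
print("logP:", mean_sd([row[1] for row in actives]))
```

Running the same summary over actives and inactives gives the kind of side-by-side comparison shown in the table; testing whether the differences are significant would be a separate step.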
Bayesian machine learning
Ekins, Williams and Xu, Drug Metab Dispos 38: 2302-2308, 2010
Bayesian classification is a simple probabilistic classification model based on Bayes' theorem:
$p(h \mid d) = \frac{p(d \mid h)\,p(h)}{p(d)}$
where h is the hypothesis or model; d is the observed data; p(h) is the prior belief (probability of hypothesis h before observing any data); p(d) is the data evidence (marginal probability of the data); p(d|h) is the likelihood (probability of data d if hypothesis h is true); and p(h|d) is the posterior probability (probability of hypothesis h being true given the observed data d).
A weight is calculated for each feature using a Laplacian-adjusted probability estimate to account for the different sampling frequencies of different features.
The weights are summed to provide a probability estimate
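A sketch of how such Laplacian-corrected weights can be computed, assuming the commonly published form: a feature found in T training compounds, A of them active, with baseline active fraction P, gets weight log((A + 1) / (T·P + 1)), and a molecule's score is the sum of the weights of its features. Features are opaque IDs (for example FCFP_6 hash codes). This is an illustration in Python, not necessarily the exact implementation used for the models in this talk.

```python
import math
from collections import Counter


def train_laplacian_bayes(samples):
    """Return a weight per feature from (feature_set, is_active) training pairs."""
    n_total = len(samples)
    n_active = sum(1 for _, is_active in samples if is_active)
    baseline = n_active / n_total  # P: overall fraction of actives
    containing = Counter()  # T: compounds containing each feature
    active_containing = Counter()  # A: actives containing each feature
    for features, is_active in samples:
        for feature in features:
            containing[feature] += 1
            if is_active:
                active_containing[feature] += 1
    # Laplacian correction pulls rarely seen features toward a weight of 0
    return {
        feature: math.log((active_containing[feature] + 1) / (total * baseline + 1))
        for feature, total in containing.items()
    }


def bayesian_score(features, weights):
    """Sum the weights of a compound's features; unseen features contribute 0."""
    return sum(weights.get(feature, 0.0) for feature in features)


# Hypothetical training data: feature IDs per compound, with activity labels
training = [
    ({"f1", "f2"}, True),
    ({"f1", "f3"}, True),
    ({"f2", "f4"}, False),
    ({"f3", "f4"}, False),
    ({"f4"}, False),
]
weights = train_laplacian_bayes(training)
print(sorted(weights.items(), key=lambda kv: kv[1], reverse=True))
print(bayesian_score({"f1", "f5"}, weights))
```

Under this formula a feature never seen in actives gets a negative weight that grows in magnitude with the number of compounds containing it, which is consistent with the "0 out of N good" features in the list that follows having the most negative scores at the largest N.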
Bayesian Classification Models for TB
Good features:
G1 (1704324327): 73 out of 165 good, Bayesian score 2.885
G2 (-2092491099): 57 out of 120 good, Bayesian score 2.873
G3 (-1230843627): 75 out of 188 good, Bayesian score 2.811
G4 (940811929): 35 out of 65 good, Bayesian score 2.780
G5 (563485513): 123 out of 357 good, Bayesian score 2.769
Bad features:
B1 (1444982751): 0 out of 1158 good, Bayesian score -3.135
B2 (274564616): 0 out of 1024 good, Bayesian score -3.018
B3 (-1775057221): 0 out of 982 good, Bayesian score -2.978
B4 (48625803): 0 out of 740 good, Bayesian score -2.712
B5 (899570811): 0 out of 738 good, Bayesian score -2.709
Active compounds with MIC < 5 µM
Laplacian-corrected Bayesian classifier models were generated using FCFP-6 and simple descriptors. Two models were built, with 220,000 and >2000 compounds.
Ekins et al., Mol BioSyst, 6: 840-851, 2010
Bayesian Classification: Dose Response
Figure: examples of good and bad features
Ekins et al., Mol BioSyst, 6: 840-851, 2010
Bayesian Classification TB Models
Dataset (number of molecules) | External ROC score | Internal ROC score | Concordance (%) | Specificity (%) | Sensitivity (%)
MLSMR all single-point screen (N = 220463) | 0.86 ± 0 | 0.86 ± 0 | 78.56 ± 1.86 | 78.59 ± 1.94 | 77.13 ± 2.26
MLSMR dose response set (N = 2273) | 0.73 ± 0.01 | 0.75 ± 0.01 | 66.85 ± 4.06 | 67.21 ± 7.05 | 65.47 ± 7.96
Leave out 50% x 100
Ekins et al., Mol BioSyst, 6: 840-851, 2010
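The ROC score, concordance, specificity and sensitivity in this table can all be computed from model scores and known activity labels. A minimal Python sketch, assuming binary labels and a user-chosen score threshold; the pairwise AUC is the Mann-Whitney formulation, which is O(n²) and would need a rank-based version for a 220,000-compound set. Scores and labels are hypothetical.

```python
def roc_auc(scores, labels):
    """AUC as the probability that a random active outscores a random inactive."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half
    return wins / (len(pos) * len(neg))


def classification_stats(scores, labels, threshold):
    """Concordance, specificity and sensitivity (%) at a score threshold."""
    tp = fp = tn = fn = 0
    for s, y in zip(scores, labels):
        predicted = s >= threshold
        if predicted and y:
            tp += 1
        elif predicted and not y:
            fp += 1
        elif not predicted and y:
            fn += 1
        else:
            tn += 1
    return {
        "concordance": 100.0 * (tp + tn) / len(labels),
        "specificity": 100.0 * tn / (tn + fp),
        "sensitivity": 100.0 * tp / (tp + fn),
    }


scores = [2.9, 1.4, 0.7, -0.8, 0.6, -2.7]  # hypothetical Bayesian scores
labels = [True, True, False, False, True, False]
print(roc_auc(scores, labels))
print(classification_stats(scores, labels, threshold=0.5))
```

AUC is threshold-free, while the other three statistics change with the cutoff, so a reported table implies a particular threshold choice.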
External test sets: 100K library, Novartis data, FDA drugs
21 hits in 2108 compounds; 34 hits in 248 compounds; 1702 hits in >100K compounds
Suggests models can predict data from the same and independent labs
Initial enrichment enables screening few compounds to find actives
Ekins and Freundlich, Pharm Res, 28, 1859-1869, 2011. Ekins et al., Mol BioSyst, 6: 840-851, 2010
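Initial enrichment is usually expressed as an enrichment factor: the hit rate among the top-ranked fraction of a test set divided by the hit rate of the whole set. A minimal Python sketch, assuming higher model scores mean "more likely active"; the scores and labels are hypothetical.

```python
def enrichment_factor(scores, labels, fraction):
    """Hit rate in the top-scoring fraction divided by the overall hit rate."""
    n = len(scores)
    n_top = max(1, round(n * fraction))
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits_top = sum(1 for _, is_active in ranked[:n_top] if is_active)
    hits_all = sum(1 for is_active in labels if is_active)
    return (hits_top / n_top) / (hits_all / n)


scores = [9.1, 8.7, 7.9, 6.5, 5.2, 4.4, 3.0, 2.2, 1.5, 0.4]  # hypothetical
labels = [True, False, True, False, False, False, False, False, False, False]
print(enrichment_factor(scores, labels, 0.2))
```

For example, a test set with 21 hits in 2108 compounds has a base hit rate of about 1%, so a model that placed 10 actives in its top 100 would give an enrichment factor of about 10.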
Bayesian models generated with kinase data [1] (blind testing of previous models showed 3-4 fold enrichment). Models were built as described previously [2]:
1. Single-point screening data (cutoff for activity: ≥ 90% inhibition at 10 µM)
2. IC50 data (cutoff for active: IC50 ≤ 5 µM)
3. IC90 data (cutoff for active: IC90 ≤ 10 µM and Vero cell selectivity index ≥ 10)
[1] Reynolds RC, et al. Tuberculosis (Edinburgh, Scotland) 2011, in press.
[2] Ekins S, et al. Mol BioSystems 2010; 6: 840-851.
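The three activity definitions above can be written as simple labeling rules applied before model building. A Python sketch; the function names are hypothetical, and the selectivity index is assumed to be Vero cell CC50 divided by IC90, the usual definition, which the slide does not spell out.

```python
def active_single_point(pct_inhibition_at_10uM: float) -> bool:
    """Definition 1: >= 90% inhibition at 10 uM."""
    return pct_inhibition_at_10uM >= 90.0


def active_ic50(ic50_uM: float) -> bool:
    """Definition 2: IC50 <= 5 uM."""
    return ic50_uM <= 5.0


def active_ic90_selective(ic90_uM: float, vero_cc50_uM: float) -> bool:
    """Definition 3: IC90 <= 10 uM and selectivity index >= 10.

    Assumption: selectivity index = Vero CC50 / IC90.
    """
    selectivity_index = vero_cc50_uM / ic90_uM
    return ic90_uM <= 10.0 and selectivity_index >= 10.0


print(active_single_point(93.5), active_ic50(7.2), active_ic90_selective(4.0, 120.0))
```

Definition 3 folds cytotoxicity into the label itself, so a potent but non-selective compound is treated as inactive.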
Models with SRI kinase library data
ROC XV AUC: Model 1 (N = 23797) = 0.89; Model 2 (N = 1248) = 0.72; Model 3 (N = 1248) = 0.77
Leave out 50% x 100
Adding cytotoxicity data improves models
Dataset (number of molecules) | External ROC score | Internal ROC score | Concordance (%) | Specificity (%) | Sensitivity (%)
Model 1 (N = 23797) | 0.87 ± 0 | 0.88 ± 0 | 76.77 ± 2.14 | 76.49 ± 2.41 | 81.7 ± 2.96
Model 2 (N = 1248) | 0.65 ± 0.01 | 0.70 ± 0.01 | 61.58 ± 1.56 | 61.85 ± 8.45 | 61.30 ± 8.24
Model 3 (N = 1248) | 0.74 ± 0.02 | 0.75 ± 0.02 | 68.67 ± 6.88 | 69.28 ± 9.84 | 64.84 ± 12.11
Bayesian Classification TB Models
Dataset (number of molecules) | External ROC score | Internal ROC score | Concordance (%) | Specificity (%) | Sensitivity (%)
MLSMR all single-point screen (N = 220463) | 0.86 ± 0 | 0.86 ± 0 | 78.56 ± 1.86 | 78.59 ± 1.94 | 77.13 ± 2.26
MLSMR dose response set (N = 2273) | 0.73 ± 0.01 | 0.75 ± 0.01 | 66.85 ± 4.06 | 67.21 ± 7.05 | 65.47 ± 7.96
NEW: dose response and cytotoxicity (N = 2273) | 0.82 ± 0.02 | 0.84 ± 0.02 | 82.61 ± 4.68 | 83.91 ± 5.48 | 65.99 ± 7.47
Ekins et al., Mol BioSyst, 6: 840-851, 2010
ROC XV AUC: single point = 0.88; dose response = 0.78; dose response + cytotoxicity = 0.86
Leave out 50% x 100
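The "leave out 50% x 100" validation repeats a random half/half train/test split 100 times and reports the mean ± SD of each statistic. A minimal Python sketch of that loop with pluggable fit, predict and metric functions. Two assumptions: the split is stratified (actives and inactives halved separately, so every training half contains both classes), and a toy one-descriptor threshold model stands in for the Bayesian model; all data are hypothetical.

```python
import random
from statistics import mean, stdev


def leave_half_out(samples, fit, predict, metric, repeats=100, seed=0):
    """Repeat a stratified 50/50 train/test split; return the metric's mean and SD."""
    rng = random.Random(seed)
    actives = [s for s in samples if s[1]]
    inactives = [s for s in samples if not s[1]]
    results = []
    for _ in range(repeats):
        rng.shuffle(actives)
        rng.shuffle(inactives)
        half_a, half_i = len(actives) // 2, len(inactives) // 2
        train = actives[:half_a] + inactives[:half_i]
        test = actives[half_a:] + inactives[half_i:]
        model = fit(train)
        predictions = [predict(model, x) for x, _ in test]
        results.append(metric(predictions, [y for _, y in test]))
    return mean(results), stdev(results)


# Toy stand-in model (hypothetical): threshold halfway between class means
def fit_threshold(train):
    act = [x for x, y in train if y]
    inact = [x for x, y in train if not y]
    return (mean(act) + mean(inact)) / 2


def predict_threshold(threshold, x):
    return x > threshold


def concordance(predictions, labels):
    return 100.0 * sum(p == y for p, y in zip(predictions, labels)) / len(labels)


data = [(100 + i, True) for i in range(10)] + [(i, False) for i in range(10)]
print(leave_half_out(data, fit_threshold, predict_threshold, concordance))
```

The SD across repeats is what the "±" values in the tables above summarize; small datasets (such as N = 1248 or 2273) naturally show larger spread than the 220,000-compound set.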
Combining cheminformatics methods and pathway analysis
• Identified essential TB targets that had not been exploited
• Used resources available to both to identify targets and molecules that mimic substrates
• Computationally searched >80,000 molecules and tested 23 compounds in vitro (3 picked as inactives), leading to 2 proposed as mimics of D-fructose 1,6-bisphosphate (MIC of 20 and 40 µg/ml)
• POC took < 6 months. Submitted phase II STTR and a manuscript; still need to test vs the target to verify the compounds hit the suggested target
Ekins et al., Trends in Microbiology, Feb 2011
Phase I STTR: NIAID-funded collaboration with SRI International
Sarker et al, submitted 2011
Malaria data in CDD
> 22,000 compounds
Including datasets from Dr. Guy’s group
Ekins, Hohman and Bunin in: Collaborative Computational Technologies for Biomedical Research, edited by Sean Ekins, Maggie A. Z. Hupcey, Antony J. Williams. Published 2011 by John Wiley & Sons, Inc.
Other datasets
http://www.slideshare.net/ekinssean
Ekins S and Williams AJ, MedChemComm, 1: 325-330, 2010.
Analysis of malaria and TB datasets
Multiple antimalarial datasets
Ekins and Williams, Drug Disc Today 15: 812-815, 2010; Ekins and Williams, MedChemComm, 1: 325-330, 2010.
Dataset | MW | logP | HBD | HBA | Lipinski rule of 5 alerts | PSA (Å²) | RBN
GSK data (N = 13,471) | 478.2 ± 114.3 | 4.5 ± 1.6 | 1.8 ± 1.0 | 5.6 ± 2.0 | 0.8 ± 0.8 | 76.8 ± 30.0 | 7.2 ± 3.4
St Jude (N = 1524) | 385.3 ± 71.2 | 3.8 ± 1.6 | 1.1 ± 0.8 | 4.9 ± 1.8 | 0.2 ± 0.4 | 72.2 ± 29.3 | 5.2 ± 2.3
Novartis (N = 5695) | 398.2 ± 105.3 | 3.7 ± 2.0 | 1.2 ± 1.1 | 4.7 ± 2.1 | 0.4 ± 0.7 | 74.7 ± 37.9 | 5.6 ± 3.0
Johns Hopkins, all FDA drugs (N = 2615) | 349.1 ± 355.8 | 1.2 ± 3.4 | 2.4 ± 4.6 | 5.1 ± 5.5 | 0.3 ± 0.8 | 96.0 ± 139.8 | 5.4 ± 9.6
Johns Hopkins, subset > 50% malaria inhibition at 96 h (N = 165) | 458.0 ± 298.6 | 2.2 ± 2.7 | 2.1 ± 3.4 | 5.4 ± 4.7 | 0.6 ± 0.9 | 90.6 ± 104.4 | 7.1 ± 7.7
Antimalarial drugs (N = 14) | 341.6 ± 67.0 | 3.8 ± 1.6 | 1.8 ± 1.0 | 5.3 ± 1.5 | 0.2 ± 0.6 | 53.4 ± 21.2 | 5.8 ± 3.0
Screening hits in total are not ‘lead-like’ (MW < 350, logP < 3) and are closest to ‘natural product lead-like’. Although GSK suggests that the compounds are “drug-like”, the evidence for this is weak.
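The lead-like cutoffs quoted here (MW < 350, logP < 3) can be applied mechanically. As a rough illustration only, the Python sketch below applies them to the dataset means from the malaria table; a real analysis would apply them per compound, since the standard deviations are large. The dictionary keys are shortened dataset names chosen for this example.

```python
# Mean MW and logP per dataset, copied from the malaria descriptor table
dataset_means = {
    "GSK data": (478.2, 4.5),
    "St Jude": (385.3, 3.8),
    "Novartis": (398.2, 3.7),
    "Johns Hopkins, all FDA drugs": (349.1, 1.2),
    "Johns Hopkins, >50% inhibition subset": (458.0, 2.2),
    "Antimalarial drugs": (341.6, 3.8),
}


def is_lead_like(mw: float, logp: float) -> bool:
    """Lead-like cutoffs quoted on the slide: MW < 350 and logP < 3."""
    return mw < 350 and logp < 3


lead_like_sets = [
    name for name, (mw, logp) in dataset_means.items() if is_lead_like(mw, logp)
]
print(lead_like_sets)
```

Only the all-FDA-drugs set has a lead-like mean; the antimalarial drugs pass on MW but fail on logP, and all screening-hit sets fail on MW.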
Antimalarial Compound libraries and filter failures
Ekins and Williams Drug Disc Today 15; 812-815, 2010
Figure: % failure of the GSK (13,355), St Jude (1524), Novartis (5695), FDA drugs (1041) and antimalarial drugs (14) sets for Abbott alerts, Pfizer Lint alerts and GSK alerts
Filtering using SMARTS filters to remove thiol reactives, false positives, etc. at the University of New Mexico (http://pasilla.health.unm.edu/tomcat/biocomp/smartsfilter)
TB Compound libraries and filter failures
Filtering using SMARTS filters to remove thiol reactives, false positives, etc. at the University of New Mexico (http://pasilla.health.unm.edu/tomcat/biocomp/smartsfilter)
Ekins et al., Mol Biosyst, 6: 2316-2324, 2010
Figure: % failure of the TB Maddry (90), TB Ananthan (160), TB drugs (13), US antibiotics (163) and FDA drugs (1041) sets for Abbott alerts, Pfizer Lint alerts and GSK alerts
Correlation between the number of SMARTS filter failures and the number of Lipinski violations for different types of rules sets with FDA drug set from CDD (N = 2804)
Suggests the number of Lipinski violations may also be an indicator of undesirable chemical features that result in reactivity
Correlations
Ekins and Freundlich, Pharm Res, 28, 1859-1869, 2011.
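The correlation above pairs, for each molecule, its number of SMARTS filter failures with its number of Lipinski violations. The slide does not say which correlation statistic was used; the Python sketch below assumes Pearson's r, and the per-molecule counts are hypothetical.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)


# Hypothetical per-molecule counts for one rule set (e.g. one alert collection)
filter_failures = [0, 1, 0, 2, 3, 1, 4, 0]
lipinski_violations = [0, 0, 0, 1, 1, 1, 2, 0]
print(round(pearson_r(filter_failures, lipinski_violations), 2))
```

Because both variables are small integer counts with many ties, a rank-based statistic such as Spearman's rho would be a reasonable alternative.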
Summary
Computational models based on whole-cell TB data could improve efficiency of screening
Collaborations get us to interesting compounds quickly
Availability of datasets enables analysis that could suggest simple rules
Active compounds vs Mtb and P. falciparum have higher mean molecular weights and logP values
A high proportion of compounds fail the Abbott filters for reactivity when compared to drugs and antimalarials
Understanding the chemical properties and characteristics of compounds = better compounds for lead optimization.
St Jude and Novartis datasets should be screened vs Mtb as their property space is close to TB actives
Rare and Neglected disease researchers lack ADME/Tox insights
Could all pharmas share their data as models with each other?
Increasing Data & Model Access
Ekins and Williams, Lab On A Chip, 10: 13-22, 2010.
The big idea
Challenge: there is limited access to ADME/Tox data and models needed for R&D
How could a company share data but keep the structures proprietary?
Sharing models means both parties use costly software. What about open source tools? Pfizer had never considered this, so we proposed a study and Rishi Gupta generated models
What can be developed with very large training and test sets?
HLM training 50,000 testing 25,000 molecules
training 194,000 and testing 39,000
MDCK training 25,000 testing 25,000
MDR training 25,000 testing 18,400
Open molecular descriptors / models vs commercial descriptors
Gupta RR, et al., Drug Metab Dispos, 38: 2083-2090, 2010
Open source tools for modeling
Massive Human liver microsomal stability model
HLM Model with CDK and SMARTS Keys:
HLM Model with MOE2D and SMARTS Keys
Descriptors: 578; training set compounds: 193,650
Cross-validation results: 38,730 compounds
Training R² = 0.79
20% test set R² = 0.69
Blind data set (2310 compounds): R² = 0.53, RMSE = 0.367
Continuous Categorical: κ = 0.40, sensitivity = 0.16, specificity = 0.99, PPV = 0.80; time (sec/compound): 0.252
Descriptors: 818; training set compounds: 193,930
Cross-validation results: 38,786 compounds
Training R² = 0.77
20% test set R² = 0.69
Blind data set (2310 compounds): R² = 0.53, RMSE = 0.367
Continuous Categorical: κ = 0.42, sensitivity = 0.24, specificity = 0.987, PPV = 0.823; time (sec/compound): 0.303
PCA of training (red) and test (blue) compounds
Overlap in Chemistry space
Gupta RR, et al., Drug Metab Dispos, 38: 2083-2090, 2010
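The HLM statistics mix regression metrics (R², RMSE) with classification metrics after binning (κ, sensitivity, specificity, PPV). A minimal Python sketch of the less familiar ones, computed from confusion-matrix counts and from observed vs predicted values. R² is written here as the coefficient of determination, 1 - SS_res/SS_tot; the original study may instead report a squared correlation, so treat that choice as an assumption. The counts are hypothetical.

```python
import math


def cohen_kappa(tp, fp, fn, tn):
    """Agreement between predicted and observed classes beyond chance."""
    n = tp + fp + fn + tn
    observed = (tp + tn) / n
    expected = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / n ** 2
    return (observed - expected) / (1 - expected)


def ppv(tp, fp):
    """Positive predictive value: fraction of predicted positives that are correct."""
    return tp / (tp + fp)


def r_squared(observed, predicted):
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1 - ss_res / ss_tot


def rmse(observed, predicted):
    """Root mean squared error."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed))


# Hypothetical counts with a high-specificity, low-sensitivity profile
tp, fp, fn, tn = 16, 4, 84, 396
print(round(cohen_kappa(tp, fp, fn, tn), 2), round(ppv(tp, fp), 2))
```

Kappa corrects raw agreement for the agreement expected by chance, which matters when one class dominates: a model that labels almost everything "stable" can have very high specificity and accuracy yet a modest κ.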
RRCK Permeability and MDR
Descriptors | C5.0 RRCK permeability | C5.0 MDR
CDK descriptors | Kappa = 0.47, sensitivity = 0.59, specificity = 0.93, PPV = 0.67 | Kappa = 0.62, sensitivity = 0.85, specificity = 0.77, PPV = 0.83
MOE2D and SMARTS keys (baseline) | Kappa = 0.53, sensitivity = 0.64, specificity = 0.94, PPV = 0.72 | Kappa = 0.67, sensitivity = 0.86, specificity = 0.80, PPV = 0.85
CDK and SMARTS keys | Kappa = 0.50, sensitivity = 0.62, specificity = 0.94, PPV = 0.68 | Kappa = 0.65, sensitivity = 0.86, specificity = 0.78, PPV = 0.84
Open descriptors results almost identical to commercial descriptors
Across many datasets, with both quantitative and qualitative data; smaller solubility datasets give similar results
Provides confidence that open models could be viable
Gupta RR, et al., Drug Metab Dispos, 38: 2083-2090, 2010
Combining models may give greater coverage of ADME/Tox chemistry space and improve predictions?
Figure: chemistry space covered by models from Lundbeck, Pfizer, Merck, GSK, Novartis, Lilly, BMS, Allergan, Bayer, AZ, Roche, BI and Merck KGaA
Model coverage of chemistry space
Next steps
ADME/Tox data crosses diseases. Potential to share models selectively with collaborators, e.g. academics, neglected disease researchers
We used the proof of concept to submit an SBIR: “Biocomputation across distributed private datasets to enhance drug discovery”
Develop prototype for sharing models securely; collaborate to show how combining data for TB etc. could improve models
Phase II: develop a commercial product that leverages CDD. Engage Pistoia Alliance to expand concept to many companies (in progress)
• Open source software for molecular descriptors and algorithms
• Spend only a fraction of the money on QSAR
• Selectively share your models with collaborators and control access
• Have someone else host the models / predictions
The next opportunities for crowdsourcing…
Figure: a company, academia, a foundation and government, each with its own IP and collaborators, exchanging molecules, models and data through collaborative platform(s) with shared IP. Current investments: >$1M/yr; >$10-100's M/yr.
Bunin & Ekins DDT 16: 643-645, 2011
A complex ecosystem of collaborations: A new business model
Finding Promiscuous Old Drugs for New Uses
Research published in the last six years: 34 studies screened libraries of FDA-approved drugs against various whole-cell or target assays
1 or more compounds with a suggested new bioactivity
13 drugs were active against more than one additional disease in vitro
109 molecules were identified by screening in vitro
Statistically more hydrophobic (log P) and higher MWT than orphan-designated products with at least one marketing approval for a common disease indication or one marketing approval for a rare disease from the FDA’s rare disease research database.
Created structure searchable databases in CDD
Data in publications is increasing but who is tracking it?
Ekins and Williams, Pharm Res, 28, 1785-1791, 2011.
2D similarity search with “hit” from screening
Export database and use for 3D searching with a pharmacophore or other model
Suggest approved drugs for testing; may also indicate other uses if a drug is present in more than one database
Suggest in silico hits for in vitro screening
Key databases of structures and bioactivity data; FDA drugs database
Repurpose FDA drugs in silico
Ekins S, Williams AJ, Krasowski MD and Freundlich JS, Drug Disc Today, 16: 298-310, 2011
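The 2D similarity search step ranks a database of drug structures by similarity to a screening hit. The slides do not name the similarity measure; the Python sketch below assumes Tanimoto similarity on binary fingerprints, represented as hypothetical sets of substructure feature IDs, with a hypothetical 0.5 cutoff.

```python
def tanimoto(fp_a: set, fp_b: set) -> float:
    """Tanimoto (Jaccard) similarity between two sets of fingerprint bits."""
    if not fp_a and not fp_b:
        return 0.0
    shared = len(fp_a & fp_b)
    return shared / (len(fp_a) + len(fp_b) - shared)


def similarity_search(query_fp, library, threshold=0.5):
    """Return (name, similarity) for library entries at or above threshold, best first."""
    scored = ((name, tanimoto(query_fp, fp)) for name, fp in library.items())
    hits = [(name, sim) for name, sim in scored if sim >= threshold]
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


# Hypothetical fingerprints: sets of substructure feature IDs
screening_hit = {101, 205, 330, 412, 598}
approved_drugs = {
    "drug_A": {101, 205, 330, 412, 777},
    "drug_B": {101, 850, 910},
    "drug_C": {101, 205, 330, 412, 598, 640},
}
print(similarity_search(screening_hit, approved_drugs))
```

Approved drugs that come out on top are candidates for in vitro testing; the same ranked list could be exported for the 3D pharmacophore step.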
Crowdsourcing Project “Off the Shelf R&D”
All pharmas have assets on shelf that reached clinic
“Off the Shelf R&D”
Get the crowd to help in repurposing / repositioning these assets
How can software help?
- Create communities to test
- Provide informatics tools that are accessible to the crowd - enlarge user base
- Data storage on cloud – integration with public data
- Crowd becomes virtual pharma-CROs and the “customer” for enabling services
Tools for Open Science
• Blogs
• Wikis
• Databases
• Journals
• What about Twitter and Facebook? Could these be used for social collaboration and science?
2020: A Drug Discovery Odyssey
Could our pharma R&D look like this?
Massive collaboration networks – software enabled. We are in “Generation App”
Crowdsourcing will have a role in R&D. Drug discovery possible by anyone with “app access”
Ekins & Williams, Pharm Res, 27: 393-395, 2010.
Example of Social Collaboration in Science: Tweets, Blog Lead to The Green Solvents App
I attend seminar on solvent selection guide
I tweet during talk
Mobile app developer Alex Clark responds on Twitter and, along with Antony Williams, starts an email discussion about green chemistry apps
I blog that evening
3 days later an app is created by Alex
• Make science more accessible => communication
• Mobile: take a phone into the field/lab and do science more readily than on a laptop
• GREEN: energy-efficient computing
• MolSync (+ DropBox) + MMDS = share molecules as SDF files on the cloud = collaborate
Mobile Apps for Drug Discovery
Williams et al DDT 16:928-939, 2011
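Sharing molecules as SDF files works because SDF is a simple text format: each record is a molfile followed by optional data items (a header line such as ">  <TAG>", then value lines ending at a blank line), and records are separated by "$$$$". The Python sketch below reads each record's title and data fields only, skipping the atom and bond blocks; it is a hypothetical minimal reader, not the code used by MolSync or MMDS, and the example record is made up.

```python
def parse_sdf(text):
    """Split SDF text into records and pull out each record's title and data fields."""
    records, current = [], []
    for line in text.splitlines():
        if line.strip() == "$$$$":  # end-of-record delimiter
            records.append(_parse_record(current))
            current = []
        else:
            current.append(line)
    if any(line.strip() for line in current):  # tolerate a missing final $$$$
        records.append(_parse_record(current))
    return records


def _parse_record(lines):
    title = lines[0].strip() if lines else ""  # first molfile line is the title
    fields = {}
    i = 0
    while i < len(lines):
        line = lines[i]
        start = line.find("<")
        end = line.find(">", start + 1)
        if line.startswith(">") and start != -1 and end != -1:
            tag = line[start + 1:end]
            i += 1
            values = []
            while i < len(lines) and lines[i].strip():  # values run until a blank line
                values.append(lines[i])
                i += 1
            fields[tag] = "\n".join(values)
        else:
            i += 1
    return {"title": title, "fields": fields}


example = """compound-1
  hypothetical

  0  0  0  0  0  0  0  0  0  0999 V2000
M  END
>  <MIC_uM>
4.2

>  <Source>
demo upload

$$$$
"""
print(parse_sdf(example))
```

Because the data items travel with each structure, assay values such as an MIC survive a round trip through cloud storage alongside the molecule.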
www.scimobileapps.com
How do you find scientific mobile apps?
Development of wikis to track developments in tools...
Acknowledgments Rishi Gupta, Eric Gifford, Ted Liston, Chris Waller (Pfizer) Antony J. Williams (RSC) Joel Freundlich (Texas A&M), Gyanu Lamichhane (Johns Hopkins) Carolyn Talcott, Malabika Sarker, Peter Madrid, Sidharth Chopra (SRI International) MM4TB colleagues Chris Lipinski Takushi Kaneko (TB Alliance) Nicko Goncharoff (SureChem) Matthew D. Krasowski (University of Iowa) Alex Clark (Molecular Materials Informatics, Inc) Accelrys CDD – Barry Bunin Funding BMGF, NIAID. Everyone that has shared data in CDD..
Email: [email protected]
Slideshare: http://www.slideshare.net/ekinssean
Twitter: collabchem
Blog: http://www.collabchem.com/
Website: http://www.collaborations.com/CHEMISTRY.HTM