Why have one model when you could have thousands?
Alex M. Clark, Ph.D.
January 2016
© 2016 Molecular Materials Informatics, Inc. http://molmatinf.com
MOLECULAR MATERIALS INFORMATICS
Cheminformatics
• Generally 2D structures with activities:
• Look for trends: structure-activity relationships
• Leverages quantity rather than detail... but quality is also supremely important
Structure-Activity Models
• Bayesian models very effective
• Tabulate structure fingerprints for actives vs. inactives
• Prediction: ordering, probability
• Low maintenance
• ECFP6 fingerprints (illustrated as a bit string: 10001001000001101001011101110111)
• ROC integral: 0.8343
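The tabulation step above can be sketched in a few lines. This is a minimal illustration, assuming the Laplacian-corrected form of naive Bayes that is commonly paired with ECFP-style fingerprints; the slides do not state which variant is used. Fingerprints are stood in for by plain sets of integer hash codes, and the names `build_bayesian` and `predict` are hypothetical.

```python
import math
from collections import Counter

def build_bayesian(fingerprints, actives):
    """Tabulate fingerprint bits for actives vs. inactives.

    fingerprints: list of sets of int hash codes (stand-ins for ECFP6 bits)
    actives:      list of bool, one per molecule
    Returns a dict mapping each bit to its log contribution.
    """
    total = len(fingerprints)
    base_rate = sum(actives) / total  # overall fraction of actives
    seen = Counter()  # number of molecules containing each bit
    hits = Counter()  # number of active molecules containing each bit
    for fp, is_active in zip(fingerprints, actives):
        for bit in fp:
            seen[bit] += 1
            if is_active:
                hits[bit] += 1
    # Laplacian correction (assumed variant): a bit seen only a few times
    # is pulled toward a contribution of zero.
    return {
        bit: math.log((hits[bit] + 1) / (seen[bit] * base_rate + 1))
        for bit in seen
    }

def predict(model, fp):
    """Sum the contributions of known bits; higher means more active-like."""
    return sum(model.get(bit, 0.0) for bit in fp)

# Toy data: bit 1 occurs only in actives, bit 2 only in inactives.
fps = [{1, 3}, {1, 4}, {2, 3}, {2, 4}]
labels = [True, True, False, False]
model = build_bayesian(fps, labels)
```

The raw score is suitable for ordering molecules; turning it into a probability needs a separate calibration step that is not shown here.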
The Data Problem
• > 10 years ago: quantity the biggest issue
- open structure-activity data rare and small
- paid collections, big pharma registration
• ~5 years ago: quality the biggest issue
- huge databases, e.g. PubChem, ChemSpider, ZINC, vendors, etc.
- generally no provenance: anything goes
• Cheminformatics seemed to be stagnant...
- new methods, same mediocre performance
The Data Solution
• Recently: some excellent developments
- Open Melting Points: models actually work
- PubChem: direct submission by scientists
- CDD: store and share with same platform
- ChEMBL: large, open, high quality, broad
• Can now have quantity and quality, without fees or restrictions
• Evidence suggests that the data was holding us back, not the methods
ChEMBL
• Hierarchy looks like this:
[Diagram: target, assay, activity, molecule]
• What we need it to be:
[Diagram: dataset, target, assay, activity, molecule, merged activity, materials for model]
Slicing & Dicing
• Divide by target, species and type of assay (protein binding, whole cell, ADMET, etc.)
• Measurements: [Ki, Kd] or [IC50, EC50, AC50, GI50]
• Units: [M, mM, μM, nM]
• Relations [=, <, >, ≤, ≥]
• Total of 8646 groups of structure-activity
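Before measurements in different units can be grouped, they need a common scale. The sketch below converts the four listed units to molar concentration and then to the negative log scale (pIC50, pKi and so on), which is the standard definition. The function names are illustrative, not taken from the talk.

```python
import math

# Conversion factors from the listed units to molar concentration.
UNIT_TO_MOLAR = {"M": 1.0, "mM": 1e-3, "μM": 1e-6, "nM": 1e-9}

def to_molar(value, unit):
    """Convert a concentration to molar units; unknown units raise KeyError."""
    return value * UNIT_TO_MOLAR[unit]

def to_p_value(value, unit):
    """Negative log10 of the molar concentration, e.g. IC50 -> pIC50."""
    return -math.log10(to_molar(value, unit))

print(to_p_value(10, "nM"))
```

On this scale a larger number means a more potent compound, so a 10 nM measurement lands at about 8.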
Consolidation
• Strip salts / adducts
• Common organic elements only:
- [H, C, N, O, P, S, F, Cl, Br, I, B, Si, Se, As, Sb, Te]
• Duplicate molecules: merge activities, e.g.
- [1.2, 1.8] ➡ 1.5 ± 0.3
- [> 5, 5.5] ➡ > 5
- [< 1, 3.5] ➡ invalid
• Keep groups with at least 100 molecules remaining
• Now down to 1839 datasets
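The three merge examples above can be reproduced with a small function. The rules here are inferred from those examples only and are assumptions: exact values are averaged with the population standard deviation as the error, an inequality that agrees with every exact value wins, and any contradiction (or a mix of ">" and "<" bounds) makes the group invalid. The name `merge_activities` is hypothetical.

```python
from statistics import mean, pstdev

def merge_activities(measurements):
    """Merge duplicate measurements for one molecule.

    measurements: list of (relation, value) pairs, with relation one of
                  "=", "<", ">", "≤", "≥"
    Returns (relation, value, error), or None when the group is invalid.
    """
    if not measurements:
        return None
    exact, lower, upper = [], [], []
    for relation, value in measurements:
        if relation == "=":
            exact.append(value)
        elif relation in (">", "≥"):
            lower.append(value)  # strict and non-strict bounds are collapsed
        elif relation in ("<", "≤"):
            upper.append(value)
        else:
            raise ValueError(f"unknown relation: {relation!r}")
    if lower and upper:
        return None  # assumption: mixed bound directions are unusable
    if lower:
        bound = max(lower)
        if any(v < bound for v in exact):
            return None  # an exact value contradicts the bound
        return (">", bound, 0.0)
    if upper:
        bound = min(upper)
        if any(v > bound for v in exact):
            return None
        return ("<", bound, 0.0)
    return ("=", mean(exact), pstdev(exact))

print(merge_activities([("=", 1.2), ("=", 1.8)]))
print(merge_activities([(">", 5.0), ("=", 5.5)]))
print(merge_activities([("<", 1.0), ("=", 3.5)]))
```

Whether the ± 0.3 on the slide is a standard deviation or a half-range cannot be told from a two-value example, since both give the same number.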
Model Building
• Bayesian models need a threshold...
[Figure: pIC50 scale divided by a threshold into inactive and active]
• Suitable values are often known, but large scale automation must estimate them
• Score: population, balance, trial Bayesian
• See J. Chem. Inf. Model. 55, 1246-1260 (2015)
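To make the idea of an estimated threshold concrete, here is a deliberately simplified sketch that scores candidate thresholds by balance alone, meaning how evenly they split the data into actives and inactives. It is not the published scoring function: the population and trial Bayesian components are omitted, and `pick_threshold` and `min_fraction` are hypothetical names.

```python
def pick_threshold(p_values, min_fraction=0.1):
    """Pick a cutoff on a pIC50-like scale (higher value = more active).

    Candidates are midpoints between neighbouring distinct values. Each is
    scored from 0 to 1 by how close the active fraction is to one half.
    Returns None when no candidate leaves at least min_fraction of the
    molecules on each side.
    """
    ordered = sorted(set(p_values))
    best, best_score = None, -1.0
    for low, high in zip(ordered, ordered[1:]):
        threshold = (low + high) / 2
        n_active = sum(1 for v in p_values if v >= threshold)
        fraction = n_active / len(p_values)
        if fraction < min_fraction or fraction > 1 - min_fraction:
            continue  # one class would be too small to model
        score = 1.0 - 2.0 * abs(fraction - 0.5)
        if score > best_score:
            best, best_score = threshold, score
    return best

values = [4.0, 5.0, 6.0, 7.0]
threshold = pick_threshold(values)
labels = [v >= threshold for v in values]  # True = active
```

Balance alone always lands near the median, which is why a practical scheme also needs terms that reward a threshold producing a model that actually works.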
Model Results
• Metrics generally good for Bayesian models using ECFP6 fingerprints
• Note that some datasets have no SAR at all
[Plots: AUC (easy) vs. population; AUC (hard) vs. population]
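For reference, the AUC metric plotted here (the ROC integral) can be computed directly from model scores and known labels. The sketch below uses the rank interpretation of AUC: the probability that a randomly chosen active scores higher than a randomly chosen inactive, with ties counted as half. The function name is illustrative, and the quadratic loop is chosen for clarity rather than speed.

```python
def roc_auc(scores, actives):
    """Area under the ROC curve by pairwise comparison.

    scores:  model outputs, higher meaning more likely active
    actives: matching list of bool labels
    """
    pos = [s for s, a in zip(scores, actives) if a]
    neg = [s for s, a in zip(scores, actives) if not a]
    if not pos or not neg:
        raise ValueError("need at least one active and one inactive")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

print(roc_auc([0.9, 0.8, 0.3, 0.1], [True, False, True, False]))
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect separation of actives from inactives.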
Deliverable
• Datasets with acceptable models: 1826
- list of unique molecules
- activity (standard molar units)
- threshold (active/inactive)
- target & assay provenance
- Bayesian model (ECFP6)
• Targets are diverse, data is high quality: thanks to the ChEMBL project
• Can apply all models to any molecule...
• Start with a set of discontinued drugs...
Discontinued Drugs
• ~50 drugs that passed most tests, but never made it to market
• Maybe they cure something else?
Detail & Visualisation
Atom-centric Bayesian
Honeycomb clustering
PolyPharma app
• Proof of concept tools being explored for several drug discovery collaborations
• Interactive functionality demonstrated as a mobile app for iPhone & iPad
• Free to use
http://itunes.apple.com/app/polypharma/id1025327772
Acknowledgments
http://molmatinf.com http://molsync.com http://cheminf20.org
@aclarkxyz
• Collaborative Drug Discovery
• Sean Ekins
• Society for Laboratory Automation & Screening
• Inquiries to [email protected]