ACS Annual Meeting, San Francisco 2010
Pfizer Confidential
Molecular Similarity Characterization of ADME Landscapes
Bin Chen‡, Rishi Gupta* and Eric Gifford†
‡ School of Informatics and Computing, Indiana University, Bloomington, IN 47408
* Anti Bacterial Research Unit, Pfizer Global R&D, Groton, CT 06340
† Computational Sciences CoE, Pfizer Global R&D, Groton, CT 06340
Outline
Introduction
Methods
Results & discussions
Use cases
Conclusions
What has been done so far?
Much excellent prior work has characterized the activity space using a variety of similarity methods and descriptors
The current work focuses primarily on ADME endpoints and molecular properties, while examining various descriptor types and similarity methods
Do similar compounds have similar ADME properties?
[Figure: example structures placed within similarity shells at 0.9, 0.8 and 0.7; structures labeled "Similarity 0.92" and "Similarity 0.85"]
Similarity varies based on the descriptors used
Similar ADME properties?
Do different ADME endpoints have different landscapes?
[Figure: two panels, HLM and RRCK, each showing a probe compound among high-risk and low-risk compounds within similarity shells at 0.9, 0.8 and 0.7]
HLM: $\mathrm{Ratio}_{0.9} = 4/5 = 0.8$, $\mathrm{Ratio}_{0.8} = 8/12 = 0.67$
RRCK: $\mathrm{Ratio}_{0.9} = 4/5 = 0.8$, $\mathrm{Ratio}_{0.8} = 9/12 = 0.75$
$$\mathrm{Ratio}_{\mathrm{similarity}} = \frac{\#\ \text{neighbors with same class}}{\#\ \text{total neighbors}}$$
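The ratio for a single probe compound can be sketched in a few lines of Python. The function name `neighbor_ratio`, the `>=` comparison at the cutoff, and the toy neighbor list are assumptions made for illustration; they are not the authors' code or data. The toy list is built so that 4 of 5 neighbors share the probe's class at cutoff 0.9 and 8 of 12 at cutoff 0.8, as in the HLM example.

```python
def neighbor_ratio(probe_class, neighbors, cutoff):
    """Fraction of neighbors at or above `cutoff` that share the probe's class.

    `neighbors` is a list of (similarity_to_probe, risk_class) pairs.
    Returns 0.0 when the probe has no neighbor at this cutoff.
    """
    in_range = [cls for sim, cls in neighbors if sim >= cutoff]
    if not in_range:
        return 0.0
    same = sum(1 for cls in in_range if cls == probe_class)
    return same / len(in_range)


# Hypothetical neighbors of one high-risk probe compound.
toy_neighbors = [
    (0.95, "high"), (0.93, "high"), (0.92, "high"), (0.91, "high"), (0.90, "low"),
    (0.85, "high"), (0.84, "high"), (0.83, "high"), (0.82, "high"),
    (0.82, "low"), (0.81, "low"), (0.80, "low"),
]

ratio_09 = neighbor_ratio("high", toy_neighbors, 0.9)  # 4 of 5 neighbors
ratio_08 = neighbor_ratio("high", toy_neighbors, 0.8)  # 8 of 12 neighbors
```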
Hypothesis: Visualizing Chemical Landscape
[Figure: schematic plot of ratio (y-axis, ticks at 0.5 and 1.0) against similarity cutoff (x-axis, ticks at 0.2, 0.5 and 0.8), with one curve each for Endpoint1 to Endpoint4]
Identical compounds: ratio ~1.0
Ratio = f(endpoint, similarity)
Ratio ~ high (low) risk compounds / total compounds
Datasets, Assays and Bins
• Full matrix consisting of 17787 compounds and 9 endpoints
• Solubility and cLogP are endpoints predicted using in-house computational models on datasets with more than 10K compounds; the rest are experimental results
Endpoint | Description | Result unit | Low Risk | High Risk
RRCK | passive permeability in RRCK cell line | 10^-6 cm/sec | >10 | <=10
HLM | metabolic stability using human liver microsomes | µL/min/mg | <20 | >=20
MDR | Pgp influenced permeability and efflux in MDCK-MDR1 cells | 10^-6 cm/sec | >10 | <=10
CYP1A2 | CYP1A2 inhibition in a substrate cocktail assay | % Inhibition | <10 | >=10
CYP3A4 | CYP3A4 inhibition in a substrate cocktail assay | % Inhibition | <10 | >=10
CYP2D6 | CYP2D6 inhibition in a substrate cocktail assay | % Inhibition | <10 | >=10
CYP2C9 | CYP2C9 inhibition in a substrate cocktail assay | % Inhibition | <10 | >=10
*Solubility | ADMET Aqueous Solubility properties | Solubility level | >2 | <=2
*cLogP | logarithm partition coefficient | Octanol-Water Partition Coefficient | <3 | >=3
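The binning in the table can be expressed as a small lookup. This is a minimal sketch: the thresholds are transcribed from the table, while the dictionary name `LOW_RISK_RULES` and the function `risk_class` are hypothetical names introduced here.

```python
# For each endpoint: the comparison that defines LOW risk, and its threshold.
# Anything that is not low risk is treated as high risk, as in the table.
LOW_RISK_RULES = {
    "RRCK": (">", 10),
    "HLM": ("<", 20),
    "MDR": (">", 10),
    "CYP1A2": ("<", 10),
    "CYP3A4": ("<", 10),
    "CYP2D6": ("<", 10),
    "CYP2C9": ("<", 10),
    "Solubility": (">", 2),
    "cLogP": ("<", 3),
}


def risk_class(endpoint, value):
    """Return 'low' or 'high' for one assay result, in the table's units."""
    op, threshold = LOW_RISK_RULES[endpoint]
    is_low = value > threshold if op == ">" else value < threshold
    return "low" if is_low else "high"
```

For example, `risk_class("HLM", 20)` falls on the high-risk side because the low-risk bin is strictly below 20.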
Characterize Chemical Landscape: Proposed Workflow*
Workflow for plotting the landscape of an endpoint using FCFP6 and Tanimoto as the similarity measurement:
1. Full matrix (cmpd*endpoint)
2. Similarity matrix (FCFP6, Tanimoto)
3. Select all high/low risk compounds in an endpoint
4. Select one similarity cutoff
5. Select one compound
6. Calculate the ratio of each compound (iterate over all high-risk compounds)
7. Average the ratio of all the compounds (iterate over all cutoffs, 14 in total)
8. Plot: similarity cutoff & ratio
$$\mathrm{Ratio}_{\mathrm{similarity}} = \frac{\#\ \text{neighbors with same class}}{\#\ \text{total neighbors}}$$
• Structure similarity
  • Fingerprint (4): MDL public keys, Atom pairs, FCFP6, ECFC4
  • Coefficient (2): Tanimoto, Cosine
• Risk categorization (2): High risk, Low risk
• Endpoints (9)
• Complexity: 4*2*2*9=144
*Molecular Similarity Characterization of ADME Landscapes; Chen et al., JCIM, Submitted, 2010
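The complexity count of 144 is simply the number of (fingerprint, coefficient, risk class, endpoint) combinations, one landscape curve per combination. A minimal sketch that enumerates them, with list names chosen here for illustration:

```python
from itertools import product

FINGERPRINTS = ["MDL public keys", "Atom pairs", "FCFP6", "ECFC4"]
COEFFICIENTS = ["Tanimoto", "Cosine"]
RISK_CLASSES = ["high", "low"]
ENDPOINTS = ["RRCK", "HLM", "MDR", "CYP1A2", "CYP3A4",
             "CYP2D6", "CYP2C9", "Solubility", "cLogP"]

# One landscape curve per combination: 4 * 2 * 2 * 9.
landscapes = list(product(FINGERPRINTS, COEFFICIENTS, RISK_CLASSES, ENDPOINTS))
n_landscapes = len(landscapes)
```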
What are we evaluating?
Calculate the ratio of each compound individually. Average the ratios of all compounds at each similarity threshold, ignoring ratios of 0 (either no same-class neighbor or no neighbor at all)
Compound ID | Similarity 0.9 | Similarity 0.8 | Similarity 0.7 | …
PF_1 | 0.9 | 0.9 | 0.7 | …
PF_2 | 1 | 0.5 | 0.7 | …
PF_3 | 0.95 | 0.8 | 0.7 | …
… | … | … | … | …
PF_N | 0.91 | 0.85 | 0.68 | …
average | 0.95 | 0.85 | 0.7 | …
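The averaging step can be sketched as follows, assuming a precomputed similarity matrix and one risk label per compound. The function name `landscape_curve`, the `>=` comparison, the exclusion of the probe from its own neighbor list, and the 4-compound toy data are all assumptions for illustration.

```python
def landscape_curve(sim, labels, target_class, cutoffs):
    """Average same-class neighbor ratio at each similarity cutoff.

    sim: square similarity matrix as a list of lists.
    labels: risk class of each compound, in matrix order.
    Only compounds of `target_class` act as probes; ratios of 0 are skipped.
    """
    probes = [i for i, cls in enumerate(labels) if cls == target_class]
    curve = {}
    for cutoff in cutoffs:
        ratios = []
        for i in probes:
            nbrs = [j for j in range(len(labels)) if j != i and sim[i][j] >= cutoff]
            same = sum(1 for j in nbrs if labels[j] == target_class)
            if nbrs and same:  # ratio 0: no neighbor, or no same-class neighbor
                ratios.append(same / len(nbrs))
        curve[cutoff] = sum(ratios) / len(ratios) if ratios else None
    return curve


# Hypothetical data: three high-risk compounds and one low-risk compound.
toy_labels = ["high", "high", "high", "low"]
toy_sim = [
    [1.0, 0.9, 0.7, 0.8],
    [0.9, 1.0, 0.6, 0.5],
    [0.7, 0.6, 1.0, 0.3],
    [0.8, 0.5, 0.3, 1.0],
]
curve = landscape_curve(toy_sim, toy_labels, "high", [0.9, 0.7])
```

At cutoff 0.9 the third compound has no neighbor, so it is left out of the average rather than counted as 0.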
Results: Compare Different Endpoints
(a) ECFC4, Tanimoto, low risk (b) ECFC4, Tanimoto, high risk
• The rate at which a curve falls indicates how easy or difficult it is to change a property by modifying a compound, i.e. to transform a compound from high risk to low risk or vice versa
• Compounds are relatively harder to move out of the high-risk class for MDR than for HLM at any given similarity cutoff
• The ratio stays constant beyond a certain similarity threshold (e.g. 0.4 in the case of CYP2C9)
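One simple way to put a number on the rate of fall is the average change in ratio per unit of similarity cutoff between the two ends of a curve. This is only one possible measure, not one stated in the slides; the function name `mean_fall_rate` and both example curves are hypothetical, not values read from the plots.

```python
def mean_fall_rate(curve):
    """Average drop in ratio per unit decrease in similarity cutoff.

    `curve` is a list of (cutoff, ratio) pairs; a larger value means a steeper fall.
    """
    pts = sorted(curve, reverse=True)  # highest cutoff first
    (c_hi, r_hi), (c_lo, r_lo) = pts[0], pts[-1]
    return (r_hi - r_lo) / (c_hi - c_lo)


# Hypothetical curves for two endpoints.
steep = [(0.9, 0.95), (0.7, 0.75), (0.5, 0.55)]
flat = [(0.9, 0.95), (0.7, 0.90), (0.5, 0.85)]

steep_rate = mean_fall_rate(steep)
flat_rate = mean_fall_rate(flat)
```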
Results: Compare Different Fingerprints*
(a) RRCK high risk (b) RRCK low risk
• The ratio differs among fingerprints; the order is always FCFP6 > Atom pairs > ECFC4 > MDL
*Molecular Similarity Characterization of ADME Landscapes; Chen et al., JCIM, Submitted, 2010
Results: Compare different similarity coefficients
(a) RRCK Low Risk (b) RRCK High Risk
• The ratio differs among similarity coefficients; the order is always Tanimoto > Cosine
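For reference, the two coefficients can be sketched for binary fingerprints represented as sets of "on" bits. The set representation and the function names are assumptions made here; count-based fingerprints such as ECFC4 would need the count forms of these coefficients. For binary fingerprints the Tanimoto value never exceeds the cosine value, so a given cutoff is stricter under Tanimoto, which may contribute to the ordering observed above.

```python
from math import sqrt


def tanimoto(a, b):
    """Tanimoto coefficient: shared bits divided by bits set in either fingerprint."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def cosine(a, b):
    """Cosine coefficient: shared bits divided by the geometric mean of the bit counts."""
    if not a or not b:
        return 0.0
    return len(a & b) / sqrt(len(a) * len(b))


# Two hypothetical fingerprints sharing 2 bits.
fp_a = {1, 2, 3, 4}
fp_b = {3, 4, 5, 6, 7, 8}

t = tanimoto(fp_a, fp_b)  # 2 shared bits / 8 bits in the union
c = cosine(fp_a, fp_b)    # 2 shared bits / sqrt(4 * 6)
```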
Use Case: Which one is better to optimize?
[Figure: two starting compounds, one annotated "MDR: LOW, RRCK: HIGH, …" and the other "MDR: HIGH, RRCK: LOW, …"; each leads to a modified structure annotated "MDR: LOW? RRCK: LOW? …"]
Probability of Success?
Use Case: Data Driven Compound Prioritization?
Compds | HLM | RRCK | MDR | CYP1A2 | CYP3A4 | CYP2D6 | CYP2C9 | Aq. Sol. | cLogP | # High Risk | SCORE
Compound1 | - | - | + | - | - | - | - | - | - | 1 | 0.688
Compound2 | + | - | - | - | - | - | - | - | - | 1 | 0.694
Compound3 | - | - | - | - | - | - | - | + | + | 2 | 0.623
Compound4 | - | - | - | + | + | - | + | - | - | 3 | 0.627
+ and - represent high risk and low risk endpoint, respectively
$$\mathrm{score}_{\mathrm{ADMET}} = \frac{\sum_{i=1}^{l} E(\mathrm{ratio}_i) + \sum_{j=1}^{h} \left(1 - E(\mathrm{ratio}_j)\right)}{l + h}$$
where i runs over the l low-risk endpoints and j over the h high-risk endpoints of the compound
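A minimal sketch of the scoring function, assuming it reads as the average of E(ratio) over the low-risk endpoints and 1 - E(ratio) over the high-risk endpoints of a compound. The function name `admet_score` and the three-endpoint ratio values are hypothetical; real values would be read from the low-risk and high-risk landscape curves at the chosen similarity cutoff.

```python
def admet_score(pattern, low_ratio, high_ratio):
    """Score one compound from its per-endpoint risk pattern.

    pattern: string of '+' (high risk) and '-' (low risk), one character per endpoint.
    low_ratio[k], high_ratio[k]: average ratio for endpoint k taken from its
    low-risk and high-risk landscape, respectively, at one similarity cutoff.
    """
    total = 0.0
    for k, flag in enumerate(pattern):
        if flag == "+":
            total += 1.0 - high_ratio[k]  # high-risk endpoint lowers the score
        else:
            total += low_ratio[k]
    return total / len(pattern)


# Hypothetical average ratios for three endpoints.
low_ratio = [0.8, 0.7, 0.6]
high_ratio = [0.9, 0.6, 0.5]

score_all_low = admet_score("---", low_ratio, high_ratio)
score_one_high = admet_score("+--", low_ratio, high_ratio)
```

With these toy values, turning the first endpoint from low to high risk lowers the score, in line with the table, where compounds with more high-risk endpoints tend to score lower.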
Potential Combinations
• 4 descriptor types are used
• 2 similarity metrics are used
• 9 endpoints
• 512 combinations
• Overlap means that some compounds with more high-risk endpoints should be prioritized ahead of some with fewer, e.g. MDL + Tanimoto coeff.
Results: Ranking matrix
• + and - represent high risk and low risk endpoint, respectively
• In total, 9 endpoints and 512 combinations
HLM | RRCK | MDR | CYP1A2 | CYP3A4 | CYP2D6 | CYP2C9 | Aq. Sol. | cLogP | # high endpoints | Score at similarity 0.5 | Score at similarity 0.6
+ | + | + | + | + | + | + | + | + | 9 | 0.326676 | 0.275558
- | + | + | + | + | + | + | + | + | 8 | 0.372088 | 0.333456
+ | - | + | + | + | + | + | + | + | 8 | 0.372646 | 0.332717
- | - | + | + | + | + | + | + | + | 7 | 0.418058 | 0.390616
+ | + | - | + | + | + | + | + | + | 8 | 0.374459 | 0.336353
… | … | … | … | … | … | … | … | … | … | … | …
- | - | + | - | - | - | - | - | - | 1 | 0.679591 | 0.714969
+ | + | - | - | - | - | - | - | - | 2 | 0.635992 | 0.660706
- | + | - | - | - | - | - | - | - | 1 | 0.681403 | 0.718605
+ | - | - | - | - | - | - | - | - | 1 | 0.681962 | 0.717866
- | - | - | - | - | - | - | - | - | 0 | 0.727373 | 0.775765
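The scores in the ranking matrix behave additively: each endpoint appears to shift the score by a fixed amount when it flips from low to high risk. A short check using four scores at similarity 0.5 copied from the matrix (variable names are chosen here; the first two pattern columns are assumed to be HLM and RRCK, following the column header order):

```python
# Scores at similarity 0.5, copied from the ranking matrix.
all_low = 0.727373        # - - - - - - - - -
hlm_high = 0.681962       # + - - - - - - - -
rrck_high = 0.681403      # - + - - - - - - -
hlm_rrck_high = 0.635992  # + + - - - - - - -

# If endpoints contribute independently, flipping two endpoints should change
# the score by the sum of the two single-flip changes.
predicted = all_low + (hlm_high - all_low) + (rrck_high - all_low)
difference = abs(predicted - hlm_rrck_high)
```

The predicted and tabulated values agree to the six decimals shown, which is what a per-endpoint sum of contributions would produce.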
Conclusion
Small structural changes can result in a change of class (High/Low Risk) within a given endpoint
Different endpoints behave differently from each other, e.g. MDR may be more difficult to modify than CYP2C9
Curves are relatively parallel to each other, independent of descriptor and similarity metric
A scoring function derived from the plots can be used to prioritize compounds (for screening or series selection)
Ratios could be used to differentiate between "difficult" endpoints and "easy" endpoints
[Figure: schematic plot of ratio (y-axis, ticks at 0.5 and 1.0) against similarity cutoff (x-axis, ticks at 0.2, 0.5 and 0.8), with one curve labeled "Difficult" and one labeled "Easy"]
Reference
Martin, Y. C.; et al. Do Structurally Similar Molecules Have Similar Biological Activity? J. Med. Chem. 2002, 45, 4350-4358
Medina-Franco, J. L.; et al. Characterization of Activity Landscapes Using 2D and 3D Similarity Methods: Consensus Activity Cliffs. J. Chem. Inf. Model. 2009, 49, 477-491
Segall, M. D.; et al. Focus on Success: Using a Probabilistic Approach to Achieve an Optimal Balance of Compound Properties in Drug Discovery. Expert Opin. Drug Metab. Toxicol. 2006, 2, 325-337
Acknowledgement
David Wild (School of Informatics and Computing, Indiana University)
Veerabahu Shanmugasundaram (AB RU)
Robyn Ayscue
Hua Gao
Thanks
Questions and Comments
Results
Heatmap for ratios of all compounds at 14 similarity cutoffs
[Figure: two heatmap panels: RRCK, ECFC4, Tanimoto, High Risk; RRCK, ECFC4, Tanimoto, Low Risk]
Discussion & further work
• Normal distribution
• Outliers analysis
• Ranking function validation
• Implementation
By virtue of the full matrix and the ADME predictive models, any given compound can be assigned a score for prioritization
Backup: Normal distribution
[Figure: histograms of binned ratio (x-axis, 0 to 1 in steps of 0.05; y-axis, counts from 0 to 700). Panels: RRCK, ECFC4, high, similarity 0.85; RRCK, ECFC4, high, similarity 0.65]