Enriching SAR Information by Molecular Scaffold Enumeration
Dr Yi Mok
in silico Medicinal Chemistry, Cancer Research UK Cancer Therapeutics Unit, Division of Cancer Therapeutics, ICR, London
7th Joint Sheffield Conference on Chemoinformatics, Monday 4 July 2016
Overview
• Aims
  - Enhancing the application of objective scaffold definitions in clustering
  - Mimicking scaffold exploration when establishing SAR
  - Enriching SAR information during HTS hit identification
• Method
  - Systematic molecular scaffold enumeration
• Results
  - Relevance to medicinal chemistry
  - Scaffold structural diversity
  - Relevance to SAR analysis
• Conclusions and Outlook
  - Applications in scaffold morphing and hopping
Molecular Scaffolds and SAR
Figure: scaffold representations of NVP-AUY922 (ring systems, graph framework, Markush scaffold and exemplar medicinal chemistry scaffold)
Langdon et al. (2011) J Chem Inf Model, 51, 2174.
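The graph framework above keeps only the ring and linker topology, because it discards atom types and bond orders. Here is a minimal Python sketch of that reduction. It assumes a hypothetical toy representation (a list of element symbols plus `(i, j, bond_order)` tuples) and a scaffold whose side chains have already been removed. It is an illustration, not the code behind the slide.

```python
from typing import List, Tuple

# Hypothetical toy representation: element symbols plus (i, j, bond_order) tuples.
Bond = Tuple[int, int, int]


def graph_framework(atoms: List[str], bonds: List[Bond]) -> Tuple[List[str], List[Bond]]:
    """Reduce a (side-chain-free) scaffold to its graph framework:
    every atom becomes a generic carbon and every bond becomes single."""
    generic_atoms = ["C"] * len(atoms)
    single_bonds = [(i, j, 1) for i, j, _order in bonds]
    return generic_atoms, single_bonds


# Kekulé benzene and pyridine share one ring topology.
ring_bonds = [(0, 1, 2), (1, 2, 1), (2, 3, 2), (3, 4, 1), (4, 5, 2), (5, 0, 1)]
benzene = (["C"] * 6, ring_bonds)
pyridine = (["N"] + ["C"] * 5, ring_bonds)
print(graph_framework(*benzene) == graph_framework(*pyridine))  # True
```

Benzene and pyridine collapse to the same framework. This shows how much heteroatom information such a representation throws away.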
Scaffold definitions should be:
• Preferably dataset-independent
• Objective and invariant
Figure: the Scaffold Tree¹,² (Levels 3 to 0)
¹ Schuffenhauer et al. (2007) J Chem Inf Model, 47, 47.
² Langdon et al. (2011) J Chem Inf Model, 51, 2174.
Clustering using Core Scaffolds
Using objective scaffold definitions in compound clustering
✓ Represents the concept of a ‘compound series’ in medicinal chemistry
✓ Structurally relevant and easily interpretable
✓ SAR can be readily derived
× May overlook key functional groups and functional-group similarities
× Definition may be too stringent
Introduce ‘controlled fuzziness’ into the scaffold representation to enhance objective scaffold definitions and their application in compound clustering
Core Scaffold Enumeration
Enumeration of Core Scaffold (EnCore)¹
Aims
- To mimic scaffold exploration efforts when establishing SAR
- To enrich SAR information during HTS hit identification
Literature precedents for molecular enumeration
- Regioisomer enumeration (HREMS)²
- Reagent enumeration³
- Scaffold morphing⁴
¹ Mok & Brown. J Chem Inf Model, submitted.
² Krska et al. (2015) J Chem Inf Model, 55, 1130.
³ Ward & Kettle (2011) J Med Chem, 54, 4670.
⁴ Beno & Langley (2010) J Chem Inf Model, 50, 1159.
EnCore
Features
- C, N and O elemental changes
- Only one atom on the scaffold is changed (‘mutated’) per generation
- Applicable to all atoms in a scaffold
- The (non-)aromaticity of the scaffold is kept
- Collection of mutated scaffolds → enumerated scaffold cluster
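Here is a minimal Python sketch of the single-atom mutation and valence-check steps. It assumes a hypothetical toy graph with Kekulé bond orders and neutral atoms, where implicit hydrogens fill any spare valence. This is not the EnCore implementation, and it does not model the aromaticity comparison.

```python
from typing import Dict, Iterator, List, Tuple

Bond = Tuple[int, int, int]  # (atom_i, atom_j, Kekulé bond order)

# Assumed maximum valences for neutral atoms; charges and hypervalence are ignored.
MAX_VALENCE: Dict[str, int] = {"C": 4, "N": 3, "O": 2}
ELEMENTS = ("C", "N", "O")


def bond_order_sum(atom: int, bonds: List[Bond]) -> int:
    """Total bond order on one atom (implicit hydrogens not counted)."""
    return sum(order for i, j, order in bonds if atom in (i, j))


def single_atom_mutations(atoms: List[str], bonds: List[Bond]) -> Iterator[Tuple[int, str, List[str]]]:
    """Yield (index, new_element, mutated_atoms) for every one-atom C/N/O swap
    whose new element can still carry the existing bonds."""
    for idx, old in enumerate(atoms):
        used = bond_order_sum(idx, bonds)
        for new in ELEMENTS:
            if new == old or used > MAX_VALENCE[new]:
                continue  # unchanged atom, or valence would be exceeded
            mutated = list(atoms)
            mutated[idx] = new
            yield idx, new, mutated


kekule_benzene = [(0, 1, 2), (1, 2, 1), (2, 3, 2), (3, 4, 1), (4, 5, 2), (5, 0, 1)]
for idx, new, _ in single_atom_mutations(["C"] * 6, kekule_benzene):
    print(idx, new)  # only C→N survives; O cannot carry a bond-order sum of 3
```

In Kekulé benzene every carbon already carries a bond-order sum of 3. Only C→N passes the check, and oxygen (maximum valence 2) is rejected. In cyclohexane, both N and O are accepted at every position.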
Workflow
- Input molecular scaffold
- Generate canonical SMILES
- Introduce single-atom mutations
- Check valence of mutated scaffolds
- Compare aromaticity to parent scaffold
- Keep unique mutated scaffolds
- Output enumerated scaffold cluster, or feed it into the next generation
Mok & Brown. J Chem Inf Model, submitted.
An EnCore enumerated scaffold cluster represents an exhaustive collection of scaffolds within the defined constraints
Mok & Brown. J Chem Inf Model, submitted.
Relevance to Medicinal Chemistry
Can EnCore enumerated scaffold clusters retrieve explored scaffolds in a medicinal chemistry compound series?
Figure: medicinal chemistry compound series compared with EnCore enumeration
Literature medicinal chemistry compound series
- Published in J Med Chem in a single manuscript
- 62 publications contained at least 100 compounds with IC50 values against a defined target
- Papers on virtual screening, QSAR models and reviews were removed from the list
- 43 literature medicinal chemistry series
  - A wide spectrum of therapeutic targets
  - Diverse structural profile
Mok & Brown. J Chem Inf Model, submitted.
Figure panels: No. of Scaffolds vs No. of Compounds; Number of Series vs Generation
Mok & Brown. J Chem Inf Model, submitted.
The top Level 1 scaffold in each series was used for EnCore enumeration
Examples: 125 HIV-1 reverse transcriptase inhibitors¹; 121 sigma receptor ligands²
EnCore could mimic scaffold exploration when establishing SAR
¹ Hargrave et al. (1991) J Med Chem, 34, 2231.
² Gilligan et al. (1992) J Med Chem, 35, 4344.
Scaffold Structural Diversity
How many generations of EnCore enumeration?
DrugBank¹ approved drugs
- 1826 compounds
- 962 Lipinski-compliant compounds with at least two rings
- 475 unique Level 1 scaffolds
Figure: number of unique mutated scaffolds (log scale) vs generation; medians of 12, 65, 210 and 448 for generations 1 to 4
¹ www.drugbank.ca
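The DrugBank filtering step can be sketched in Python as below. The sketch assumes the descriptor values are precomputed elsewhere, and the `Descriptors` fields are hypothetical. It also assumes "Lipinski-compliant" means zero rule-of-five violations, although some workflows allow one.

```python
from dataclasses import dataclass


@dataclass
class Descriptors:
    """Hypothetical precomputed descriptors for one compound."""
    mol_weight: float  # Da
    clogp: float
    h_donors: int
    h_acceptors: int
    ring_count: int


def lipinski_violations(d: Descriptors) -> int:
    """Count rule-of-five violations (MW > 500, cLogP > 5, HBD > 5, HBA > 10)."""
    return sum([d.mol_weight > 500, d.clogp > 5, d.h_donors > 5, d.h_acceptors > 10])


def keep_for_scaffold_analysis(d: Descriptors, max_violations: int = 0, min_rings: int = 2) -> bool:
    """Keep compounds within the allowed violations and with at least `min_rings` rings."""
    return lipinski_violations(d) <= max_violations and d.ring_count >= min_rings


print(keep_for_scaffold_analysis(Descriptors(350.0, 3.0, 2, 5, 3)))  # True
print(keep_for_scaffold_analysis(Descriptors(180.0, 1.2, 1, 4, 1)))  # False: one ring
```

The `max_violations` parameter makes the compliance assumption explicit, so it can be changed without touching the filter logic.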
Two generations of enumeration strike a balance between chemical space sampling and the time required to generate enumerated scaffold clusters
Figure panels: average and maximum EPFP7 Tanimoto fingerprint similarity to the parent scaffold for each generation (0 to 4)
Mok & Brown. J Chem Inf Model, submitted.
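The per-generation average and maximum similarities rest on the Tanimoto coefficient, T(A, B) = |A ∩ B| / |A ∪ B|, computed on fingerprint features. Here is a Python sketch, assuming fingerprints are given as sets of feature identifiers. Computing EPFP7 itself is out of scope, and the feature sets in the example are made up.

```python
from statistics import mean
from typing import Dict, FrozenSet, List, Tuple


def tanimoto(a: FrozenSet[str], b: FrozenSet[str]) -> float:
    """Tanimoto coefficient |A ∩ B| / |A ∪ B| on feature sets."""
    union = len(a | b)
    return len(a & b) / union if union else 1.0


def similarity_by_generation(
    parent: FrozenSet[str], generations: Dict[int, List[FrozenSet[str]]]
) -> Dict[int, Tuple[float, float]]:
    """Average and maximum similarity of each generation's scaffolds to the parent."""
    summary = {}
    for gen, fingerprints in sorted(generations.items()):
        sims = [tanimoto(parent, fp) for fp in fingerprints]
        summary[gen] = (mean(sims), max(sims))
    return summary


parent = frozenset({"f1", "f2", "f3", "f4"})
gens = {
    1: [frozenset({"f1", "f2", "f3", "f5"}), frozenset({"f1", "f2", "f3", "f4", "f5"})],
    2: [frozenset({"f1", "f2", "f6", "f7"}), frozenset({"f1", "f5", "f8", "f9"})],
}
print(similarity_by_generation(parent, gens))
```

Reporting both the average and the maximum separates two effects. The maximum shows whether at least one close analogue survives in a generation, while the average tracks overall drift away from the parent.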
Relevance to SAR Analysis
ICR/CRT screening library
- Designed for high-throughput screening
- 214,540 compounds
- 23,319 Level 1 scaffolds
  - 11,657 representing at least two compounds
  - 11,662 represented only once (singletons)
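Splitting a library's scaffolds into populated and singleton scaffolds is a simple frequency count. Here is a Python sketch, assuming a hypothetical mapping from compound ID to Level 1 scaffold identifier. Scaffold extraction is not shown.

```python
from collections import Counter
from typing import Dict, Tuple


def scaffold_population(compound_to_scaffold: Dict[str, str]) -> Tuple[int, int, int]:
    """Return (total scaffolds, scaffolds with >= 2 compounds, singleton scaffolds)."""
    counts = Counter(compound_to_scaffold.values())
    singletons = sum(1 for n in counts.values() if n == 1)
    return len(counts), len(counts) - singletons, singletons


library = {"cpd1": "A", "cpd2": "A", "cpd3": "B", "cpd4": "C", "cpd5": "C", "cpd6": "D"}
print(scaffold_population(library))  # (4, 2, 2)
```

The library figures above satisfy the same identity: 11,657 + 11,662 = 23,319.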
Can EnCore enumerated scaffold clusters identify extant scaffolds and associate structurally related screening compounds with the parent scaffold?
Figure: EnCore enumeration of exemplar scaffolds A and B; number of scaffolds in the enumerated scaffold cluster vs number of compounds in the parent scaffold (log scale, 1 to 3000)
After two generations of mutations
- Extant scaffolds were identified in 17,199 enumerated scaffold clusters out of 23,319 scaffolds (74%)
- Maximum increase to 54 extant scaffolds in a single enumerated scaffold cluster
- No extant scaffold match for only 6,120 scaffolds (26%)
EnCore can enrich SAR information in a screening library by associating structurally related screening compounds with multiple scaffolds
Figure panels: number of scaffolds in the enumerated scaffold cluster vs number of compounds in the parent scaffold, and vs number of compounds in the enumerated scaffold cluster (log scale)
Mok & Brown. J Chem Inf Model, submitted.
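Associating screening compounds with a parent scaffold amounts to intersecting its enumerated scaffold cluster with the set of scaffolds actually present in the library. Here is a Python sketch, assuming the clusters have already been enumerated and keyed by a scaffold identifier such as canonical SMILES. The function name and data layout are hypothetical.

```python
from typing import Dict, Set, Tuple


def associate_extant(
    clusters: Dict[str, Set[str]], compounds_per_scaffold: Dict[str, int]
) -> Dict[str, Tuple[int, int]]:
    """For each parent scaffold, return (number of extant scaffolds found in its
    enumerated cluster, compounds pooled from the parent plus those extant scaffolds)."""
    result = {}
    for parent, cluster in clusters.items():
        extant = {s for s in cluster if s != parent and s in compounds_per_scaffold}
        pooled = compounds_per_scaffold.get(parent, 0) + sum(compounds_per_scaffold[s] for s in extant)
        result[parent] = (len(extant), pooled)
    return result


library = {"P": 3, "Q": 1, "R": 5}  # scaffold -> number of screening compounds
clusters = {"P": {"Q", "X"}, "Q": {"P", "R"}, "R": {"Y"}}  # "X", "Y" are not in the library
assoc = associate_extant(clusters, library)
print(assoc)  # {'P': (1, 4), 'Q': (2, 9), 'R': (0, 5)}
```

A compound can be pooled into several clusters, because its scaffold may appear in the enumerated clusters of more than one parent. The reported percentage is the share of parents with at least one extant match: 17,199 / 23,319 ≈ 0.738, i.e. 74%.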
Figure: singleton scaffolds after EnCore enumeration; number of scaffolds in the enumerated scaffold cluster vs number of compounds in the enumerated scaffold cluster (log scale)
Two enumerated scaffold clusters contain more than 10,000 screening compounds after enumeration
Mok & Brown. J Chem Inf Model, submitted.
After two generations of mutations
- 7,369 out of 11,662 singleton scaffolds (63%) match extant scaffolds in their enumerated scaffold clusters
EnCore can enrich SAR information in a screening library by introducing structurally related screening compounds to singleton scaffolds
Mok & Brown. J Chem Inf Model, submitted.
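The singleton statistic restricts the extant-scaffold check to scaffolds represented by a single compound. Here is a Python sketch, assuming precomputed enumerated clusters keyed by a scaffold identifier. The function name and data layout are hypothetical.

```python
from typing import Dict, Set


def singleton_match_rate(clusters: Dict[str, Set[str]], compounds_per_scaffold: Dict[str, int]) -> float:
    """Fraction of singleton scaffolds whose enumerated cluster contains at least
    one other scaffold that is present in the library."""
    singletons = [s for s, n in compounds_per_scaffold.items() if n == 1]
    matched = sum(
        1 for s in singletons
        if any(t != s and t in compounds_per_scaffold for t in clusters.get(s, set()))
    )
    return matched / len(singletons) if singletons else 0.0


library = {"S1": 1, "S2": 1, "S3": 1, "B": 10}
clusters = {"S1": {"B"}, "S2": {"Z"}, "S3": {"S1"}}  # "Z" is not in the library
print(round(singleton_match_rate(clusters, library), 3))  # 0.667
```

With the counts above, 7,369 / 11,662 ≈ 0.632, i.e. 63%.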
Conclusions and Outlook
• A list of literature medicinal chemistry compound series was defined
• EnCore can mimic scaffold exploration when establishing SAR
• EnCore can enrich SAR information in a screening library
  - By associating structurally related compounds with multiple clusters
  - By introducing structurally related compounds to singleton scaffolds
• EnCore offers complementary capabilities to literature enumeration tools and clustering methods
• Multiple generations of enumerated scaffolds represent an exhaustive collection for scaffold morphing and hopping