Screen
Ligand based virtual screening
presented by …
maintained by Miklós Vargyas
Last update: 13 April 2010
Screen
Virtual screening by topological descriptors
Screen performs high throughput virtual screening of compound libraries using similarity comparisons by various molecular descriptors.
Description of the product
Screen
Availability
• JChemBase
• JChem Oracle cartridge
• Instant JChem
• Server version
• standalone command line application programs
• KNIME
• PipelinePilot
Various 2D descriptors
• ChemAxon chemical fingerprint (CCFP)
• PipelinePilot ECFP/FCFP
• ChemAxon pharmacophore fingerprint (CPFP)
• BCUT
• Scalars (logP, logD, Szeged index …)
• custom descriptors, in-house fingerprints
Optimized similarity measures
• Improves similarity prediction
• depends on set of known actives
• high enrichment ratios in virtual screening
Multiple queries
• 3 types of hypotheses
• combined hit lists
Key features
Versatile
• Use various descriptors in your well established model
• Access your trusted in-house fingerprint in IJC, JCB, JCART
• Easy integration in corporate discovery pipelines
• Search chemical files directly, no need to import structures into a database
• New descriptors are pluggable in deployed systems
Optimal
• Consistent similarity scores
• Smaller hit set
• More focused library
Benefits
[Figure: example structures with pairwise dissimilarity values 0.47, 0.55, 0.57 and 0.28, 0.20, 0.06]
More consistent similarity scores
Benefits
regular Tanimoto
optimized Tanimoto
High enrichment ratio
• Fewer false hits
• Known actives are true positive hits (ACE inhibitors)
[Figure: number of hits (logarithmic axis, 1 to 10000) against number of active hits (1 to 18); series: Tanimoto, Euclidean, Optimized, Ideal]
Benefits
[Figure: number of hits (logarithmic axis, 1 to 10000) against number of spikes retrieved (1 to 49); series: Tanimoto, Euclidean, Optimized, Ideal]
Results
NPY-5 (pharmacophore similarity)
β2-adrenoceptor (pharmacophore similarity)
[Figure: number of hits (logarithmic axis, 1 to 10000) against number of active hits (1 to 18); series: Tanimoto, Euclidean, Optimized, Ideal]
Results
Case study at Axovan
• GPCR activity prediction
• distinguishing between GPCR subclasses
GPCR-Tailored Pharmacophore Pattern Recognition of Small Molecular Ligands
Modest von Korff and Matthias Steger, JCICS 2004, 44
Screen roadmap
• New molecular descriptors
  – ECFP/FCFP (in 5.4)
  – Shape descriptors (in 5.4)
• Hidden use of the optimiser
  – No-pain black-box approach
  – Simultaneous multi-descriptor search
• Enhanced IJC integration
  – Easy descriptor configuration and generation
  – Similarity search type instead of descriptors, metrics and other unfriendly concepts
Screen roadmap
• GUI
  – New web interface (HTML/AJAX)
  – Desktop application for descriptor generation
• 3D shape similarity
  – fast pre-filtering by 3D fingerprint
  – Alignment based volumetric Tanimoto calculation
  – scaffold hopping by maximizing topological dissimilarity and spatial similarity
Supplementary slides
[Figure: the query structure is encoded as a query fingerprint and the target structures as target fingerprints; a metric compares the query fingerprint with each target fingerprint and yields the hits]
$$T(x, y) = \frac{B(x \,\&\, y)}{B(x) + B(y) - B(x \,\&\, y)}$$
A typical approach
[Figure: the query structures (known actives) are combined into a single hypothesis fingerprint; an optimized metric, whose parameters are set by optimization, compares the hypothesis fingerprint with each target fingerprint and yields the hits]
$$T(x, y) = \frac{\sum_i s_i \min(x_i, y_i)}{\alpha \left( \sum_i x_i - \sum_i s_i \min(x_i, y_i) \right) + (1 - \alpha) \left( \sum_i y_i - \sum_i s_i \min(x_i, y_i) \right) + \sum_i s_i \min(x_i, y_i)}$$
ChemAxon’s approach
Chemical fingerprint generation: 500/s
Pharmacophore fingerprint generation
• calculated: 80/s
• rule-based: 200/s
Screening: 12000/s
Optimization: 10s/metric
Hardware/software environment:
• P4 3GHz, 1GB RAM
• Red Hat Linux 9
• Java 1.4.2
Performance
Use of various fingerprints and metrics in JSP
http://www.chemaxon.com/jchem/examples/jsp1_x/index.jsp
UGM presentation by Aureus Pharma
Improved Virtual Screening Strategies and Enrichment of Focused Libraries in Active Compounds Using Target-Oriented Databases
http://www.chemaxon.com/forum/viewpost2307.html
Implementations
Chemical, pharmacological or biological properties of two compounds match.
The more the common features, the higher the similarity between two molecules.
Chemical
Pharmacophore
Molecular similarity
$$T(x, y) = \frac{B(x \,\&\, y)}{B(x) + B(y) - B(x \,\&\, y)}$$
$$E(x, y) = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2}$$
Sequences/vectors of bits, or numeric values that can be compared by distance functions, similarity metrics.
Quantitative assessment of similarity of structures
• need a numerically tractable form
• molecular descriptors, fingerprints, structural keys
Similarity measures
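The two measures above can be sketched in a few lines of Python. This is a minimal illustration, not ChemAxon's implementation: fingerprints are assumed to be strings of '0' and '1', the example fingerprints and vectors are hypothetical, and the value returned for two empty fingerprints is an arbitrary convention.

```python
import math


def tanimoto_similarity(x: str, y: str) -> float:
    """T(x, y) = B(x & y) / (B(x) + B(y) - B(x & y)) for '0'/'1' strings."""
    if len(x) != len(y):
        raise ValueError("fingerprints must have the same length")
    common = sum(1 for a, b in zip(x, y) if a == "1" and b == "1")  # B(x & y)
    union = x.count("1") + y.count("1") - common
    # Assumption: two empty fingerprints are treated as identical.
    return common / union if union else 1.0


def euclidean_distance(x, y) -> float:
    """E(x, y) = sqrt(sum_i (x_i - y_i)^2) for numeric descriptor vectors."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


# Hypothetical 10-bit fingerprints and 4-element descriptor vectors.
fp_a = "1111011110"
fp_b = "1010011000"
similarity = tanimoto_similarity(fp_a, fp_b)  # 4 common bits, 8 bits in the union
distance = euclidean_distance([0, 2, 7, 1], [1, 6, 0, 4])
print(similarity, distance)
```

Tanimoto is a similarity (1 for identical fingerprints), while Euclidean is a distance (0 for identical vectors), so the two are not directly comparable without converting one of them.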
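$$D_{\mathrm{Tanimoto}}(x, y) = 1 - \frac{\sum_i \min(x_i, y_i)}{\sum_i x_i + \sum_i y_i - \sum_i \min(x_i, y_i)} = 1 - \frac{\sum_i \min(x_i, y_i)}{\sum_i \max(x_i, y_i)}$$
$$D_{\mathrm{Euclidean}}(x, y) = \sqrt{\sum_i (x_i - y_i)^2}$$
[Example: for the pair of structures shown on the slide, D_Tanimoto = 0.68 and D_Euclidean = 21.93]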
Standard metrics
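For count-valued fingerprints such as pharmacophore histograms, the Tanimoto dissimilarity is commonly written with minima in place of bit counts. The sketch below assumes that min-based form, D = 1 - Σ min(x_i, y_i) / (Σ x_i + Σ y_i - Σ min(x_i, y_i)), and checks that it reduces to 1 - T for 0/1 vectors. The vectors are hypothetical.

```python
def tanimoto_dissimilarity(x, y) -> float:
    """Min-based Tanimoto dissimilarity for non-negative count vectors."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    shared = sum(min(a, b) for a, b in zip(x, y))
    denominator = sum(x) + sum(y) - shared
    # Assumption: two all-zero vectors are treated as identical.
    return 1.0 - shared / denominator if denominator else 0.0


# Hypothetical count vectors: shared = 4, sum(x) = 11, sum(y) = 17.
x = [0, 2, 7, 1, 0, 1]
y = [1, 6, 0, 4, 3, 3]
d_counts = tanimoto_dissimilarity(x, y)

# For 0/1 vectors, min(x_i, y_i) is the bitwise AND, so D equals 1 - T.
bits_a = [1, 1, 1, 1, 0, 1, 1, 1, 1, 0]
bits_b = [1, 0, 1, 0, 0, 1, 1, 0, 0, 0]
d_bits = tanimoto_dissimilarity(bits_a, bits_b)
print(d_counts, d_bits)
```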
hashed binary fingerprint
• encodes topological properties of the chemical graph: connectivity, edge label (bond type), node label (atom type)
• allows the comparison of two molecules with respect to their chemical structure
Construction
1. find all 0, 1, …, n step walks in the chemical graph
2. generate a bit array for each walk with a given number of bits set
3. merge the bit arrays with logical OR operation
Topological chemical fingerprint
length  walk            bit array
0       C               1010000000
1       C – H           0001010000
1       C – C           0001000100
2       C – C – H       0001000010
2       C – C – O       0100010000
3       C – C – O – H   0000011000
ALL                     1111011110
[Figure: structural formula of the example molecule (two C atoms, one O atom, six H atoms)]
Construction of chemical fingerprint
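The merge step can be checked directly against the walk table: OR-ing the six bit arrays gives the row marked ALL. The sketch below does that, and adds a hypothetical hashing helper (`walk_bits`, based on SHA-256) only to illustrate how a walk label might be mapped to a bit array with a given number of bits set. The hash function actually used by the product is not stated on the slides.

```python
import hashlib


def walk_bits(walk: str, length: int = 10, bits_per_walk: int = 2) -> int:
    """Hypothetical hashing step: map a walk label to at most
    `bits_per_walk` set bits in a `length`-bit array (returned as an int)."""
    digest = hashlib.sha256(walk.encode("utf-8")).digest()
    mask = 0
    for k in range(bits_per_walk):
        mask |= 1 << (digest[k] % length)
    return mask


def merge(bit_arrays) -> int:
    """Merge bit arrays (as ints) with logical OR."""
    merged = 0
    for bits in bit_arrays:
        merged |= bits
    return merged


# Bit arrays copied from the walk table on the slide.
slide_arrays = [
    "1010000000",  # C
    "0001010000",  # C - H
    "0001000100",  # C - C
    "0001000010",  # C - C - H
    "0100010000",  # C - C - O
    "0000011000",  # C - C - O - H
]
fingerprint = format(merge(int(s, 2) for s in slide_arrays), "010b")
print(fingerprint)
```

Because several walks may set the same bit, the merged fingerprint records which bits were hit but not how many walks hit them.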
0100010100010100010000000001101010011010100000010100000000100000
0100010100010100010000000001101010011010100000000100000000100000
Chemical similarity
• encodes pharmacophore properties of molecules as frequency counts of pharmacophore point pairs at given topological distance
• allows the comparison of two molecules with respect to their pharmacophore
Construction
1. perceive pharmacophoric features
2. map pharmacophore point type to atoms
3. calculate length of shortest path between each pair of atoms
4. assign a histogram to every pharmacophore point pair and count the frequency of the pair with respect to its distance
Topological pharmacophore fingerprint
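Steps 3 and 4 of the construction can be sketched as follows. Several things here are assumptions for illustration: the molecule is given as a plain adjacency dictionary, pharmacophore types are single letters (A, D, H), pair names follow the AA/DA/DD/HA/HD/HH convention seen in the histogram labels, and distances above 6 bonds are ignored. The perception steps (1 and 2) are taken as given.

```python
from collections import deque
from itertools import combinations


def shortest_path_lengths(adjacency, start):
    """Breadth-first search: number of bonds from `start` to every reachable atom."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbour in adjacency[node]:
            if neighbour not in dist:
                dist[neighbour] = dist[node] + 1
                queue.append(neighbour)
    return dist


def pharmacophore_histogram(adjacency, types, max_distance=6):
    """Count pharmacophore point pairs per topological distance.

    `types` maps atom index -> type letter for atoms that carry a feature.
    Keys look like 'DA4': a donor-acceptor pair four bonds apart.
    """
    histogram = {}
    for a, b in combinations(sorted(types), 2):
        d = shortest_path_lengths(adjacency, a).get(b)
        if d is None or d > max_distance:
            continue  # disconnected, or beyond the assumed histogram range
        pair = "".join(sorted((types[a], types[b]), reverse=True))
        key = f"{pair}{d}"
        histogram[key] = histogram.get(key, 0) + 1
    return histogram


# Hypothetical five-atom chain 0-1-2-3-4 with a donor, a hydrophobic
# atom and an acceptor.
adjacency = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
types = {0: "D", 2: "H", 4: "A"}
histogram = pharmacophore_histogram(adjacency, types)
print(histogram)
```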
Rule based approach
Rule 1: The pharmacophore type of an atom is an acceptor, if
• it is a nitrogen, oxygen or sulfur, and
• it is not an amide nitrogen or sulfur, and
• it is not an aniline nitrogen, and
• it is not a sulfonyl sulfur, and
• it is not a nitro group nitrogen.
[Figure: example structure with atoms labelled donor, donor and acceptor]
Pharmacophore perception
[Figure: N-cyano-methyl piperidine; labels: sp2 atom, donor]
exception → extra rules → large number of rules → maintenance, performance
Exceptions to simple rules
[Figure: example structure at pH = 7 and at pH = 1; labels: acceptor, donor]
pH → pH specific rules → large number of rules → maintenance, performance
Effect of pH
Step 1: estimation of pKa
allows the determination of the protonation state for ionizable groups at the given pH
Step 2: partial charge calculation
Pharmacophore perception
Calculation based approach
Step 3: hydrogen bond donor/acceptor recognition
Step 4: aromatic perception
Step 5: pharmacophore property assignment
[Legend: acceptor, negatively charged acceptor, acceptor and donor, hydrophobic, none]
Pharmacophore perception
Calculation based approach
Pharmacophore type coloring: acceptor, donor, hydrophobic, none.
[Figure: pharmacophore fingerprint histograms of two structures; x-axis: pharmacophore point pair and topological distance (AA1 to AA6, DA1 to DA6, DD1 to DD6, HA1 to HA6, HD1 to HD6, HH1 to HH6); y-axis: frequency count (0 to 12)]
Pharmacophore fingerprint
[Figure: AA histograms (bins AA1 to AA6) of two structures, before and after fuzzy smoothing; Euclidean distance D_E = 1.41 before smoothing and D_E = 0.45 after smoothing]
Fuzzy smoothing
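The effect of smoothing on the Euclidean distance can be reproduced with a small sketch. The smoothing rule below is an assumption, not the documented algorithm: each bin passes a fixed fraction of its count to each neighbouring bin. The histograms are hypothetical as well. With a factor of 0.3 and two single counts in adjacent bins, the distances come out close to the values on the slide.

```python
import math


def smooth(histogram, factor=0.3):
    """Assumed fuzzy smoothing: every bin gives `factor` of its count
    to each existing neighbour and keeps the rest."""
    smoothed = [0.0] * len(histogram)
    for i, count in enumerate(histogram):
        neighbours = [j for j in (i - 1, i + 1) if 0 <= j < len(histogram)]
        smoothed[i] += count * (1 - factor * len(neighbours))
        for j in neighbours:
            smoothed[j] += count * factor
    return smoothed


def euclidean(x, y):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


# Hypothetical AA histograms (bins AA1 to AA6): the same feature pair,
# one bond further apart in the second structure.
hist_a = [0, 1, 0, 0, 0, 0]
hist_b = [0, 0, 1, 0, 0, 0]

before = euclidean(hist_a, hist_b)
after = euclidean(smooth(hist_a), smooth(hist_b))
print(round(before, 2), round(after, 2))
```

Without smoothing, a one-bond shift makes the two histograms look as different as two unrelated ones. Smoothing lets near misses count as partial matches.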
[Figure: the query structure is encoded as a query fingerprint and the target structures as target fingerprints; a metric compares the query fingerprint with each target fingerprint and yields the hits]
Virtual screening using fingerprints
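The screening loop in the diagram can be sketched as follows. The fingerprints, the target names and the dissimilarity threshold are hypothetical. The slides do not state how hits are selected, so a fixed threshold on the Tanimoto dissimilarity is assumed here.

```python
def tanimoto_dissimilarity(x: str, y: str) -> float:
    """1 - B(x & y) / (B(x) + B(y) - B(x & y)) for '0'/'1' strings."""
    common = sum(1 for a, b in zip(x, y) if a == "1" and b == "1")
    union = x.count("1") + y.count("1") - common
    return 1.0 - common / union if union else 0.0


def screen(query, targets, metric, threshold):
    """Compare the query fingerprint with every target fingerprint and
    return (name, dissimilarity) for targets within the threshold,
    most similar first."""
    hits = []
    for name, fingerprint in targets.items():
        d = metric(query, fingerprint)
        if d <= threshold:
            hits.append((name, d))
    return sorted(hits, key=lambda hit: hit[1])


# Hypothetical 8-bit fingerprints.
query = "11010010"
targets = {
    "t1": "11010010",  # identical to the query
    "t2": "00101101",  # no bit in common
    "t3": "11010110",  # one extra bit
}
hits = screen(query, targets, tanimoto_dissimilarity, threshold=0.5)
print(hits)
```

Passing the metric as a function keeps the loop unchanged when a different or parametrized metric is swapped in.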
[Figure: the query structures are combined into a single hypothesis fingerprint; a metric compares the hypothesis fingerprint with each target fingerprint and yields the hits]
Multiple query structures
Advantages
• allows faster operation
• compiles features common to the individual actives
• reduces noise
Hypothesis types
Active 1   0     2   7      1   0   1   6   4   0   0      9   0
Active 2   1     6   0      4   3   3   1   2   2   0      5   1
Active 3   2     4   4      1   0   2   5   3   4   3      4   5
Minimum    0     2   0      1   0   1   1   2   0   0      4   0
Average    1     4   3.67   2   1   2   4   3   2   1.33   6   2
Median     1.5   4   5.5    1   0   2   5   3   3   0      5   3
Hypothesis fingerprints
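The three hypothesis types can be recomputed from the three active fingerprints in the table. Minimum and Average are straightforward. The Median row is not a plain median, so the rule used below is an inference that happens to reproduce that row, not a documented definition: a feature absent from at least half of the actives gets 0, otherwise the median of its non-zero counts is used. The plain mean reproduces the Average row except in the tenth column, where the slide shows 1.33.

```python
from statistics import median

# Active fingerprints copied from the table on the slide.
actives = [
    [0, 2, 7, 1, 0, 1, 6, 4, 0, 0, 9, 0],
    [1, 6, 0, 4, 3, 3, 1, 2, 2, 0, 5, 1],
    [2, 4, 4, 1, 0, 2, 5, 3, 4, 3, 4, 5],
]


def minimum_hypothesis(rows):
    return [min(column) for column in zip(*rows)]


def average_hypothesis(rows):
    return [round(sum(column) / len(column), 2) for column in zip(*rows)]


def median_hypothesis(rows):
    """Assumed rule: 0 if the feature is absent from at least half of the
    actives, otherwise the median of the non-zero counts."""
    result = []
    for column in zip(*rows):
        present = [value for value in column if value > 0]
        if 2 * len(present) <= len(column):
            result.append(0)
        else:
            result.append(median(present))
    return result


print(minimum_hypothesis(actives))
print(average_hypothesis(actives))
print(median_hypothesis(actives))
```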
Minimum
  Advantages
  • strict conditions for hits if actives are fairly similar
  Disadvantages
  • false results with asymmetric metrics
  • misses common features of highly diverse sets
  • very sensitive to one missing feature
Average
  Advantages
  • captures common features of more diverse active sets
  Disadvantages
  • less selective if actives are very similar
Median
  Advantages
  • captures common features of more diverse active sets
  • specific treatment of the absence of a feature
  • less sensitive to outliers
  Disadvantages
  • less selective if actives are very similar
Hypothesis fingerprints
Too many hits
The need for optimization
[Figure: example structures with pairwise dissimilarity values 0.47, 0.55 and 0.57]
Inconsistent dissimilarity values
The need for optimization
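$$D_{\mathrm{Euclidean}}^{\mathrm{weighted,\ asymmetric}}(x, y) = \sqrt{\alpha \sum_{x_i > y_i} w_i (x_i - y_i)^2 + (1 - \alpha) \sum_{x_i \le y_i} w_i (x_i - y_i)^2}$$
$$D_{\mathrm{Tanimoto}}^{\mathrm{scaled,\ asymmetric}}(x, y) = 1 - \frac{\sum_i s_i \min(x_i, y_i)}{\alpha \left( \sum_i x_i - \sum_i s_i \min(x_i, y_i) \right) + (1 - \alpha) \left( \sum_i y_i - \sum_i s_i \min(x_i, y_i) \right) + \sum_i s_i \min(x_i, y_i)}$$
α ∈ [0, 1]: asymmetry factor
s_i ∈ N: scaling factor
w_i ∈ [0, 1]: weights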
Parametrized metrics
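A sketch of the two parametrized metrics follows. The formulas on the slide are only partly legible, so the forms assumed here are stated explicitly. The Euclidean variant weights each squared difference by w_i and by α where x_i > y_i, or by 1 - α otherwise. The Tanimoto variant scales the shared part by s_i and weights the x-only and y-only parts of the denominator by α and 1 - α. The example vectors are hypothetical.

```python
import math


def weighted_asymmetric_euclidean(x, y, w, alpha):
    """Assumed form: sqrt(sum_i side_i * w_i * (x_i - y_i)^2), where
    side_i is alpha if x_i > y_i and (1 - alpha) otherwise."""
    total = 0.0
    for xi, yi, wi in zip(x, y, w):
        side = alpha if xi > yi else 1.0 - alpha
        total += side * wi * (xi - yi) ** 2
    return math.sqrt(total)


def scaled_asymmetric_tanimoto(x, y, s, alpha):
    """Assumed form: 1 - S / (alpha * (sum(x) - S) + (1 - alpha) * (sum(y) - S) + S),
    with S = sum_i s_i * min(x_i, y_i)."""
    shared = sum(si * min(xi, yi) for xi, yi, si in zip(x, y, s))
    denominator = alpha * (sum(x) - shared) + (1.0 - alpha) * (sum(y) - shared) + shared
    return 1.0 - shared / denominator if denominator else 0.0


# Hypothetical count vectors with unit scaling factors and weights.
x = [3, 0, 1]
y = [1, 1, 1]
ones = [1, 1, 1]

d_xy = scaled_asymmetric_tanimoto(x, y, ones, alpha=1.0)  # only x-only features penalised
d_yx = scaled_asymmetric_tanimoto(y, x, ones, alpha=1.0)  # arguments swapped
d_euclid = weighted_asymmetric_euclidean(x, y, ones, alpha=0.5)
print(d_xy, d_yx, d_euclid)
```

With α = 1 only the features present in the first argument but missing from the second are penalised, so swapping the arguments changes the result. That asymmetry is why the Minimum hypothesis is flagged as giving false results with asymmetric metrics.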
[Figure: the selected targets are split into a training set and a test set; the known actives are split into a query set, a training set and a test set]
Step 1: optimize parameters for maximum enrichment
Step 2: validate metrics over an independent test set
Optimization of metrics
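The two steps can be sketched with a toy example. Everything specific here is an assumption. Enrichment is taken to be the hit rate among actives divided by the hit rate among targets. The metric is a weighted Euclidean distance with a fixed hit threshold. The optimizer is a simple coordinate search that varies one weight at a time while the others stay fixed. None of this is a description of the optimizer built into Screen, and all data are hypothetical.

```python
import math


def weighted_euclidean(x, y, w):
    return math.sqrt(sum(wi * (xi - yi) ** 2 for xi, yi, wi in zip(x, y, w)))


def enrichment(query, actives, targets, w, threshold):
    """Assumed definition: hit rate among actives / hit rate among targets."""
    active_hits = sum(weighted_euclidean(query, a, w) <= threshold for a in actives)
    target_hits = sum(weighted_euclidean(query, t, w) <= threshold for t in targets)
    # max(..., 1) avoids division by zero when no target is hit (an assumption).
    return (active_hits * len(targets)) / (len(actives) * max(target_hits, 1))


def coordinate_search(objective, start, candidates, sweeps=2):
    """Vary one parameter at a time over `candidates`, keeping the others
    temporarily fixed, and accept a value only if the objective improves."""
    params = list(start)
    best = objective(params)
    for _ in range(sweeps):
        for i in range(len(params)):
            for value in candidates:
                trial = params[:i] + [value] + params[i + 1:]
                score = objective(trial)
                if score > best:
                    best, params = score, trial
    return params, best


# Hypothetical data: the third feature varies strongly among the actives.
query = [2, 0, 5]
train_actives = [[2, 0, 1], [2, 1, 9]]
train_targets = [[0, 4, 5], [1, 3, 5], [2, 0, 6]]
test_actives = [[2, 0, 3]]
test_targets = [[5, 5, 5], [0, 2, 1]]
THRESHOLD = 2.0

# Step 1: optimize the weights for maximum enrichment on the training set.
weights, train_score = coordinate_search(
    lambda w: enrichment(query, train_actives, train_targets, w, THRESHOLD),
    start=[1, 1, 1],
    candidates=[0, 0.5, 1],
)

# Step 2: validate the optimized metric on the independent test set.
test_score = enrichment(query, test_actives, test_targets, weights, THRESHOLD)
print(weights, train_score, test_score)
```

In this toy case the search switches off the third weight, since that feature separates the actives from the query more than it separates the targets.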
Step 1: optimize parameters for maximum enrichment
[Figure: the query fingerprint derived from the query set is compared with the training set using the parametrized metric; the results are counted as target hits and active hits]
Optimization of metrics
[Figure: optimization of the variables v1, v2, v3, vi, vn; legend: potential variable value, temporarily fixed value, running variable value, final value]
Optimization of metrics
Step 2: validate metrics over an independent test set
[Figure: the query fingerprint derived from the query set is compared with the test set using the optimized metric; the results are counted as target hits and active hits]
Optimization of metrics
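[Figure: example structures with pairwise dissimilarity values 0.47, 0.55, 0.57 before optimization and 0.28, 0.20, 0.06 after optimization]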
1. Similar structures get closer
Results of Optimization
2. Hit set size reduced
Metric                              Enrichment   Test Hits   Target Hits
Tanimoto
  Basic                                  70.47        5.43        172.00
  Scaled                                  7.63        6.00       1101.71
  Asymmetric                             99.36        5.29        106.00
  Scaled Asymmetric                      11.94        5.86        731.14
Euclidean
  Basic                                   5.59        5.43       1465.57
  Normalized                             11.33        5.14        791.29
  Asymmetric Normalized                  18.58        4.71        368.71
  Weighted Normalized                   296.30        4.14         27.57
  Weighted Asymmetric Normalized        281.30        3.43         17.00
Active set: 18 mGlu-R1 antagonists
Target set: 10000 randomly selected drug-like structures
Results of Optimization
              Active set size   Euclidean   Optimized   Improvement
5-HT3                      12       12.55      239.24         49.26
ACE                        89        1.42        6.50          4.64
Angiotensin                10       27.81       85.45         11.15
Beta2                      50        1.52       24.70         17.42
D2                         13       27.64      123.25         11.19
Delta                      20       11.66      243.57         69.11
FTP                        35       46.88       71.54          5.35
mGluR1                     18        5.59      296.30         70.93
NPY-5                     139        3.05       12.75          3.25
Thrombin                    8        2.56        7.68          2.62
3. Higher enrichment
Results of Optimization
4. Top ranked structures are spikes
• offers a more intuitive way to evaluate the efficiency of screening
• based on sorting random set hits and known actives on dissimilarity values and counting the number of random set hits preceding each active in the sorted list
[Figure: sorted list of dissimilarity values (0.014, 0.015, 0.017, 0.020, 0.022, 0.023, 0.027, 0.041, 0.043); labels: number of spikes retrieved, number of virtual hits]
Results of Optimization
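The spike evaluation described above amounts to one sort and one pass over the sorted list. The sketch below assumes that ties are resolved in favour of the actives. It reuses the dissimilarity values listed on the slide, but the split into actives and random set hits is hypothetical.

```python
def random_hits_before_each_active(active_values, random_values):
    """Sort random set hits and known actives (spikes) together by
    dissimilarity and, for each active, count the random set hits
    that precede it in the sorted list."""
    # Flag 0 marks an active, 1 a random set hit; on equal dissimilarity
    # the active sorts first (an assumption).
    labelled = [(d, 0) for d in active_values] + [(d, 1) for d in random_values]
    labelled.sort()
    counts = []
    seen_random = 0
    for _, is_random in labelled:
        if is_random:
            seen_random += 1
        else:
            counts.append(seen_random)
    return counts


# Hypothetical assignment of the nine dissimilarity values.
actives = [0.014, 0.020, 0.041]
random_set = [0.015, 0.017, 0.022, 0.023, 0.027, 0.043]

counts = random_hits_before_each_active(actives, random_set)
print(counts)
```

Plotting these counts against the rank of each retrieved spike gives curves of the kind shown on the result slides. An ideal metric would rank every spike ahead of every random set hit, so all counts would be 0.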
ACE (pharmacophore similarity)
[Figure: number of hits (logarithmic axis, 1 to 10000) against number of spikes retrieved (1 to 16); series: Euclidean, Optimized Euclidean]
Results
[Figure: number of hits (logarithmic axis, 1 to 10000) against number of spikes retrieved (1 to 49); series: Tanimoto, Euclidean, Optimized, Ideal]
Results
NPY-5 (pharmacophore similarity)
β2-adrenoceptor (pharmacophore similarity)
[Figure: number of hits (logarithmic axis, 1 to 10000) against number of active hits (1 to 18); series: Tanimoto, Euclidean, Optimized, Ideal]
Results
3D flexible search
Expected top performance 200 structures/s