Screen Ligand based virtual screening presented by … maintained by Miklós Vargyas Last update: 13 April 2010

Screen

Virtual screening by topological descriptors

Screen performs high throughput virtual screening of compound libraries using similarity comparisons by various molecular descriptors.

Description of the product

Screen

Availability
• JChemBase
• JChem Oracle cartridge
• Instant JChem
• Server version
• standalone command line application programs
• KNIME
• PipelinePilot

Various 2D descriptors
• ChemAxon chemical fingerprint (CCFP)
• PipelinePilot ECFP/FCFP
• ChemAxon pharmacophore fingerprint (CPFP)
• BCUT
• Scalars (logP, logD, Szeged index …)
• custom descriptors, in-house fingerprints

Optimized similarity measures
• Improves similarity prediction
• depends on set of known actives
• high enrichment ratios in virtual screening

Multiple queries
• 3 types of hypotheses
• combined hit lists

Key features

Versatile
• Use various descriptors in your well-established model
• Access your trusted in-house fingerprint in IJC, JCB, JCART
• Easy integration in corporate discovery pipelines
• Search chemical files directly, no need to import structures into a database
• New descriptors are pluggable in deployed systems

Optimal
• Consistent similarity scores
• Smaller hit set
• More focused library

Benefits

[Figure: dissimilarity values for example structure pairs: 0.47, 0.55, 0.57 and 0.28, 0.20, 0.06; labels: regular Tanimoto, optimized Tanimoto]

More consistent similarity scores

Benefits

High enrichment ratio
• Fewer false hits
• Known actives are true positive hits (ACE inhibitors)

[Figure: number of hits (log scale, 1 to 10000) versus number of active hits (1 to 18); series: Tanimoto, Euclidean, Optimized, Ideal]

Benefits

[Figure: number of hits (log scale, 1 to 10000) versus number of spikes retrieved (1 to 49); series: Tanimoto, Euclidean, Optimized, Ideal]

Results

[Figure panels: NPY-5 (pharmacophore similarity); β2-adrenoceptor (pharmacophore similarity). Number of hits (log scale, 1 to 10000) versus number of spikes retrieved (1 to 18)]


Results

Case study at Axovan

• GPCR activity prediction

• distinguishing between GPCR subclasses

GPCR-Tailored Pharmacophore Pattern Recognition of Small Molecular Ligands

Modest von Korff and Matthias Steger, JCICS 2004, 44

Screen roadmap

• New molecular descriptors
  – ECFP/FCFP (in 5.4)
  – Shape descriptors (in 5.4)

• Hidden use of the optimizer
  – No-pain black-box approach
  – Simultaneous multi-descriptor search

• Enhanced IJC integration
  – Easy descriptor configuration and generation
  – Similarity search type instead of descriptors, metrics and other unfriendly concepts

Screen roadmap

• GUI
  – New web interface (HTML/AJAX)
  – Desktop application for descriptor generation

• 3D shape similarity
  – Fast pre-filtering by 3D fingerprint
  – Alignment-based volumetric Tanimoto calculation
  – Scaffold hopping by maximizing topological dissimilarity and spatial similarity

Supplementary slides

[Diagram: the query structure is encoded as a query fingerprint (for example 0101010100010100010100100000000000010010000010010100100100010000) and compared with the fingerprints of the target structures by a metric; the targets that pass the comparison are the hits.]

$$T(x,y) = \frac{B(x \,\&\, y)}{B(x) + B(y) - B(x \,\&\, y)}$$

where $B(x)$ is the number of bits set in fingerprint $x$.

A typical approach
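The typical approach can be illustrated with a minimal sketch: a query fingerprint is compared with every target fingerprint by the binary Tanimoto coefficient, and targets at or above a cut-off are reported as hits. The function names, the toy 16-bit fingerprints and the threshold are assumptions made for illustration; they are not part of Screen.

```python
def tanimoto(x: str, y: str) -> float:
    """Tanimoto coefficient of two equal-length bit strings such as '0101'."""
    if len(x) != len(y):
        raise ValueError("fingerprints must have the same length")
    common = sum(1 for a, b in zip(x, y) if a == "1" and b == "1")  # B(x & y)
    union = x.count("1") + y.count("1") - common  # B(x) + B(y) - B(x & y)
    # Two empty fingerprints are treated as identical (a convention chosen here).
    return common / union if union else 1.0


def screen(query: str, targets: dict, threshold: float) -> list:
    """Return (name, similarity) pairs at or above the threshold, best first."""
    scored = [(name, tanimoto(query, fp)) for name, fp in targets.items()]
    hits = [(name, score) for name, score in scored if score >= threshold]
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


# Toy 16-bit fingerprints (hypothetical, not taken from a real library).
query = "0101010100010100"
targets = {
    "t1": "0101010100010100",
    "t2": "0101010100000000",
    "t3": "0000000011100011",
}
hits = screen(query, targets, threshold=0.6)
for name, score in hits:
    print(f"{name}: {score:.3f}")
```

With these toy inputs only t1 and t2 pass the cut-off; t3 shares no set bit with the query.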

[Diagram: several query structures (known actives) are combined into one hypothesis fingerprint (for example 0101110100110101010111111000010000011111100010000100001000101000), which is compared with the target fingerprints by an optimized metric to produce the hits; the metric parameters are set in an optimization step.]

Optimized metric (asymmetric scaled Tanimoto similarity):

$$\frac{\sum_i s_i \min(x_i,y_i)}{\alpha\left(\sum_i x_i - \sum_i s_i \min(x_i,y_i)\right) + (1-\alpha)\left(\sum_i y_i - \sum_i s_i \min(x_i,y_i)\right) + \sum_i s_i \min(x_i,y_i)}$$

ChemAxon’s approach

Chemical fingerprint generation: 500/s

Pharmacophore fingerprint generation
• calculated: 80/s
• rule-based: 200/s

Screening: 12000/s

Optimization: 10 s/metric

Hardware/software environment:
• P4 3GHz, 1GB RAM
• Red Hat Linux 9
• Java 1.4.2

Performance

Use of various fingerprints and metrics in JSP

http://www.chemaxon.com/jchem/examples/jsp1_x/index.jsp

UGM presentation by Aureus Pharma

Improved Virtual Screening Strategies and Enrichment of Focused Libraries in Active Compounds Using Target-Oriented Databases

http://www.chemaxon.com/forum/viewpost2307.html

Implementations





Chemical, pharmacological or biological properties of two compounds match.

The more the common features, the higher the similarity between two molecules.

Chemical

Pharmacophore

Molecular similarity

$$T(x,y) = \frac{B(x \,\&\, y)}{B(x) + B(y) - B(x \,\&\, y)}$$

$$E(x,y) = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2}$$

Sequences/vectors of bits, or numeric values that can be compared by distance functions, similarity metrics.

Quantitative assessment of similarity of structures
• need a numerically tractable form
• molecular descriptors, fingerprints, structural keys

Similarity measures

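$$D_{Tanimoto}(x,y) = 1 - \frac{\sum_i \min(x_i,y_i)}{\sum_i \max(x_i,y_i)} = 1 - \frac{\sum_i \min(x_i,y_i)}{\sum_i x_i + \sum_i y_i - \sum_i \min(x_i,y_i)}$$

$$D_{Euclidean}(x,y) = \sqrt{\sum_i (x_i - y_i)^2}$$

Example for the pair of structures shown on the slide: $D_{Tanimoto} = 0.68$, $D_{Euclidean} = 21.93$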

Standard metrics
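For descriptors that hold counts rather than bits, the two standard metrics can be sketched as below. This is a minimal illustration assuming non-negative descriptor values and the min/max form of the Tanimoto dissimilarity; the function names and the example vectors are hypothetical.

```python
import math


def tanimoto_dissimilarity(x, y):
    """1 - sum(min) / sum(max) for two non-negative vectors of equal length."""
    shared = sum(min(a, b) for a, b in zip(x, y))
    total = sum(max(a, b) for a, b in zip(x, y))
    # Two all-zero vectors are treated as identical (a convention chosen here).
    return 1.0 - shared / total if total else 0.0


def euclidean_distance(x, y):
    """Square root of the summed squared differences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


# Hypothetical count vectors.
x = [0, 2, 7, 1]
y = [1, 6, 0, 4]
d_tanimoto = tanimoto_dissimilarity(x, y)   # 1 - 3/18
d_euclidean = euclidean_distance(x, y)      # sqrt(75)
print(round(d_tanimoto, 3), round(d_euclidean, 3))
```

The Tanimoto value is bounded between 0 and 1, while the Euclidean value grows with the magnitude of the counts, which is why the two numbers on the slide are on such different scales.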

Hashed binary fingerprint
• encodes topological properties of the chemical graph: connectivity, edge label (bond type), node label (atom type)
• allows the comparison of two molecules with respect to their chemical structure

Construction

1. find all 0, 1, …, n step walks in the chemical graph
2. generate a bit array for each walk with a given number of bits set
3. merge the bit arrays with a logical OR operation

Topological chemical fingerprint

length  walk            bit array
0       C               1010000000
1       C – H           0001010000
1       C – C           0001000100
2       C – C – H       0001000010
2       C – C – O       0100010000
3       C – C – O – H   0000011000
ALL                     1111011110

[Structure drawing: ethanol, CH3–CH2–OH]

Construction of chemical fingerprint
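The three construction steps can be sketched as follows. This is an illustration under stated assumptions, not ChemAxon's implementation: walks are restricted to simple paths, bond types are ignored, and the bit positions come from a SHA-256 hash; `enumerate_paths`, `path_bits` and `fingerprint` are hypothetical names.

```python
import hashlib


def enumerate_paths(atoms, bonds, max_bonds):
    """Step 1: label strings of all simple paths with 0..max_bonds bonds."""
    neighbours = {i: [] for i in range(len(atoms))}
    for a, b in bonds:
        neighbours[a].append(b)
        neighbours[b].append(a)
    found = set()

    def extend(path):
        labels = [atoms[i] for i in path]
        # A path and its reverse are the same walk; keep one canonical form.
        found.add("-".join(min(labels, labels[::-1])))
        if len(path) - 1 < max_bonds:
            for nxt in neighbours[path[-1]]:
                if nxt not in path:
                    extend(path + [nxt])

    for start in range(len(atoms)):
        extend([start])
    return found


def path_bits(path, length, bits_per_path):
    """Step 2: hash a path to at most bits_per_path bit positions."""
    digest = hashlib.sha256(path.encode()).digest()
    return {int.from_bytes(digest[2 * k:2 * k + 2], "big") % length
            for k in range(bits_per_path)}


def fingerprint(atoms, bonds, max_bonds=3, length=64, bits_per_path=2):
    """Step 3: OR the bit arrays of all paths into one fingerprint string."""
    bits = [0] * length
    for path in enumerate_paths(atoms, bonds, max_bonds):
        for position in path_bits(path, length, bits_per_path):
            bits[position] = 1
    return "".join(str(bit) for bit in bits)


# Ethanol with explicit hydrogens: C0-C1-O2, H3..H5 on C0, H6..H7 on C1, H8 on O2.
ethanol_atoms = ["C", "C", "O", "H", "H", "H", "H", "H", "H"]
ethanol_bonds = [(0, 1), (1, 2), (0, 3), (0, 4), (0, 5), (1, 6), (1, 7), (2, 8)]
ethanol_paths = enumerate_paths(ethanol_atoms, ethanol_bonds, 3)
ethanol_fp = fingerprint(ethanol_atoms, ethanol_bonds)

# Methanol: every path of methanol also occurs in ethanol.
methanol_atoms = ["C", "O", "H", "H", "H", "H"]
methanol_bonds = [(0, 1), (0, 2), (0, 3), (0, 4), (1, 5)]
methanol_fp = fingerprint(methanol_atoms, methanol_bonds)
print(ethanol_fp)
```

Because the bit arrays are merged with OR, every bit set for methanol is also set for ethanol here, which is the property that makes such fingerprints usable for substructure pre-screening as well as for similarity.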

0100010100010100010000000001101010011010100000010100000000100000

0100010100010100010000000001101010011010100000000100000000100000

Chemical similarity

• encodes pharmacophore properties of molecules as frequency counts of pharmacophore point pairs at given topological distance

• allows the comparison of two molecules with respect to their pharmacophore

Construction

1. perceive pharmacophoric features
2. map pharmacophore point type to atoms
3. calculate length of shortest path between each pair of atoms
4. assign a histogram to every pharmacophore point pair and count the frequency of the pair with respect to its distance

Topological pharmacophore fingerprint
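Steps 3 and 4 of the construction can be sketched as below, assuming the pharmacophore types have already been assigned to atoms (steps 1 and 2). The type letters A, D and H and the bin names such as HD2 follow the labels used on the fingerprint slides; the function names and the four-atom example are hypothetical.

```python
from collections import Counter, deque


def shortest_path_lengths(n_atoms, bonds, source):
    """Breadth-first search: number of bonds from source to every reachable atom."""
    neighbours = {i: [] for i in range(n_atoms)}
    for a, b in bonds:
        neighbours[a].append(b)
        neighbours[b].append(a)
    distance = {source: 0}
    queue = deque([source])
    while queue:
        atom = queue.popleft()
        for nxt in neighbours[atom]:
            if nxt not in distance:
                distance[nxt] = distance[atom] + 1
                queue.append(nxt)
    return distance


def pharmacophore_fingerprint(types, bonds, max_distance=6):
    """Count pharmacophore point pairs per topological distance.

    types[i] is a string of type letters for atom i, for example "A", "D",
    "AD" (acceptor and donor), "H" (hydrophobic) or "" (none).
    """
    counts = Counter()
    for i in range(len(types)):
        distance = shortest_path_lengths(len(types), bonds, i)
        for j in range(i + 1, len(types)):
            d = distance.get(j)
            if d is None or d > max_distance:
                continue
            for ti in types[i]:
                for tj in types[j]:
                    # Reverse alphabetical order gives the names DA, HA, HD.
                    pair = "".join(sorted(ti + tj, reverse=True))
                    counts[f"{pair}{d}"] += 1
    return counts


# Hypothetical chain of four atoms: donor - hydrophobic - hydrophobic - acceptor.
types = ["D", "H", "H", "A"]
bonds = [(0, 1), (1, 2), (2, 3)]
fp = pharmacophore_fingerprint(types, bonds)
print(sorted(fp.items()))
```

Each key of the resulting counter corresponds to one histogram bin, so the fingerprint is a vector of frequency counts rather than a bit string.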

Rule based approach

[Figure: example structure with atoms labelled donor, donor and acceptor]

Rule 1: The pharmacophore type of an atom is an acceptor, if
• it is a nitrogen, oxygen or sulfur, and
• it is not an amide nitrogen or sulfur, and
• it is not an aniline nitrogen, and
• it is not a sulfonyl sulfur, and
• it is not a nitro group nitrogen.

Pharmacophore perception
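Rule 1 translates almost literally into a predicate. The sketch below assumes the chemical context of each atom has already been perceived and is available as boolean flags; the `Atom` class and its field names are hypothetical and only serve the illustration.

```python
from dataclasses import dataclass


@dataclass
class Atom:
    element: str
    is_amide: bool = False             # amide nitrogen or sulfur
    is_aniline_nitrogen: bool = False
    is_sulfonyl_sulfur: bool = False
    is_nitro_nitrogen: bool = False


def is_acceptor(atom: Atom) -> bool:
    """Rule 1: N, O or S that matches none of the listed exceptions."""
    return (
        atom.element in {"N", "O", "S"}
        and not atom.is_amide
        and not atom.is_aniline_nitrogen
        and not atom.is_sulfonyl_sulfur
        and not atom.is_nitro_nitrogen
    )


print(is_acceptor(Atom("O")))                  # plain oxygen
print(is_acceptor(Atom("N", is_amide=True)))   # amide nitrogen
```

Every further exception adds another flag and another clause, which is the maintenance burden the following slides point out.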

[Figure labels: sp2 atom; N-cyano-methyl piperidine; donor]

exception → extra rules → large number of rules → maintenance, performance

Exceptions to simple rules

[Figure: example structure at pH = 7 and at pH = 1, with atoms labelled acceptor and donor]

pH → pH-specific rules → large number of rules → maintenance, performance

Effect of pH

Step 1: estimation of pKa

allows the determination of the protonation state for ionizable groups at the given pH

Step 2: partial charge calculation


Calculation based approach

Step 3: hydrogen bond donor/acceptor recognition

Step 4: aromatic perception

Step 5: pharmacophore property assignment

Legend: acceptor, negatively charged acceptor, acceptor and donor, hydrophobic, none


Calculation based approach

Pharmacophore type coloring: acceptor, donor, hydrophobic, none.

[Figure: two pharmacophore fingerprint histograms; x axis: pharmacophore point pairs at topological distances 1 to 6 (AA1 … AA6, DA1 … DA6, DD1 … DD6, HA1 … HA6, HD1 … HD6, HH1 … HH6); y axis: frequency count (0 to 12)]

Pharmacophore fingerprint

[Figure: histogram bins AA1 to AA6 of two example fingerprints, shown without and with fuzzy smoothing; Euclidean distances DE = 1.41 and DE = 0.45]

Fuzzy smoothing
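The idea of fuzzy smoothing can be sketched as follows: each histogram bin passes a fraction of its count to its neighbouring distance bins, so that two fingerprints whose features differ by one bond in distance no longer look completely different. The scheme and the factor 0.3 are assumptions chosen for illustration; the slides do not specify the smoothing that Screen applies.

```python
import math


def fuzzy_smooth(histogram, factor):
    """Move factor * count from every bin to each existing neighbour bin."""
    smoothed = [0.0] * len(histogram)
    last = len(histogram) - 1
    for i, count in enumerate(histogram):
        share = factor * count
        kept = count
        if i > 0:
            smoothed[i - 1] += share
            kept -= share
        if i < last:
            smoothed[i + 1] += share
            kept -= share
        smoothed[i] += kept
    return smoothed


def euclidean(x, y):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


# Hypothetical AA1..AA6 bins: the same feature pair, one bond further apart.
a = [0, 1, 0, 0, 0, 0]
b = [0, 0, 1, 0, 0, 0]
before = euclidean(a, b)
after = euclidean(fuzzy_smooth(a, 0.3), fuzzy_smooth(b, 0.3))
print(round(before, 2), round(after, 2))
```

For this toy pair the distance drops from about 1.41 to about 0.45. That agreement with the two values on the slide may be a coincidence and does not confirm the scheme.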

[Diagram: a single query is encoded as a query fingerprint (for example 0101010100010100010100100000000000010010000010010100100100010000); a metric compares it with the target fingerprints to select the hits.]

Virtual screening using fingerprints

[Diagram: several queries are merged into a hypothesis fingerprint (for example 0101110100110101010111111000010000011111100010000100001000101000); a metric compares it with the target fingerprints to select the hits.]

Multiple query structures

Advantages
• allows faster operation
• compiles features common to the individual actives
• reduces noise

Active 1    0    2    7     1    0    1    6    4    0    0     9    0
Active 2    1    6    0     4    3    3    1    2    2    0     5    1
Active 3    2    4    4     1    0    2    5    3    4    3     4    5

Hypothesis types

Minimum     0    2    0     1    0    1    1    2    0    0     4    0
Average     1    4    3.67  2    1    2    4    3    2    1.33  6    2
Median      1.5  4    5.5   1    0    2    5    3    3    0     5    3

Hypothesis fingerprints
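The three hypothesis types can be computed column by column from the active fingerprints. The sketch below uses the three actives from the table. The Median rule is an assumption inferred from the numbers on the slide: a feature missing from at least half of the actives gives 0, otherwise the median of the non-zero counts is taken. The function names are hypothetical.

```python
from statistics import median

actives = [
    [0, 2, 7, 1, 0, 1, 6, 4, 0, 0, 9, 0],  # Active 1
    [1, 6, 0, 4, 3, 3, 1, 2, 2, 0, 5, 1],  # Active 2
    [2, 4, 4, 1, 0, 2, 5, 3, 4, 3, 4, 5],  # Active 3
]


def minimum_hypothesis(fingerprints):
    return [min(column) for column in zip(*fingerprints)]


def average_hypothesis(fingerprints):
    return [sum(column) / len(column) for column in zip(*fingerprints)]


def median_hypothesis(fingerprints):
    """Assumed rule: absent features are handled separately from counts."""
    result = []
    for column in zip(*fingerprints):
        present = [value for value in column if value > 0]
        if 2 * len(present) <= len(column):
            result.append(0)          # feature absent in at least half of the actives
        else:
            result.append(median(present))
    return result


minimum = minimum_hypothesis(actives)
average = average_hypothesis(actives)
med = median_hypothesis(actives)
print(minimum)
print([round(value, 2) for value in average])
print(med)
```

The computed Minimum and Median rows match the table. The computed Average row matches it except in the tenth column, where the three listed values (0, 0, 3) average to 1.00 while the slide shows 1.33.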

Minimum
  Advantages
  • strict conditions for hits if actives are fairly similar
  Disadvantages
  • false results with asymmetric metrics
  • misses common features of highly diverse sets
  • very sensitive to one missing feature

Average
  Advantages
  • captures common features of more diverse active sets
  Disadvantages
  • less selective if actives are very similar

Median
  Advantages
  • captures common features of more diverse active sets
  • specific treatment of the absence of a feature
  • less sensitive to outliers
  Disadvantages
  • less selective if actives are very similar

Hypothesis fingerprints

Too many hits

The need for optimization

[Figure: example structure pairs with dissimilarity values 0.47, 0.55 and 0.57]

Inconsistent dissimilarity values

The need for optimization

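$$D^{Euclidean}_{weighted,\,asymmetric}(x,y) = \sqrt{\alpha \sum_{x_i > y_i} w_i (x_i - y_i)^2 + (1-\alpha) \sum_{x_i \le y_i} w_i (x_i - y_i)^2}$$

$$D^{Tanimoto}_{scaled,\,asymmetric}(x,y) = 1 - \frac{\sum_i s_i \min(x_i,y_i)}{\alpha\left(\sum_i x_i - \sum_i s_i \min(x_i,y_i)\right) + (1-\alpha)\left(\sum_i y_i - \sum_i s_i \min(x_i,y_i)\right) + \sum_i s_i \min(x_i,y_i)}$$

Tanimoto parameters:
• $\alpha \in [0,1]$: asymmetry factor
• $s_i \in \mathbb{N}$: scaling factor

Euclidean parameters:
• $\alpha \in [0,1]$: asymmetry factor
• $w_i \in [0,1]$: weights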

Parametrized metrics
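A sketch of the two parametrized metrics follows. It assumes the asymmetric scaled Tanimoto dissimilarity is 1 minus c / (alpha * (sum(x) - c) + (1 - alpha) * (sum(y) - c) + c), with c the scaled sum of element-wise minima, and that the weighted asymmetric Euclidean distance splits the weighted squared differences by the sign of x_i - y_i. Both forms are reconstructions, and the function names and example vectors are hypothetical.

```python
import math


def asymmetric_scaled_tanimoto(x, y, s, alpha):
    """Dissimilarity with per-feature scaling s and asymmetry factor alpha."""
    common = sum(si * min(xi, yi) for xi, yi, si in zip(x, y, s))
    only_x = sum(x) - common   # part of x not matched by y
    only_y = sum(y) - common   # part of y not matched by x
    denominator = alpha * only_x + (1 - alpha) * only_y + common
    # Guard against a zero denominator (for example two all-zero vectors).
    return 1.0 - common / denominator if denominator else 0.0


def weighted_asymmetric_euclidean(x, y, w, alpha):
    """Weighted squared differences, split by whether x_i exceeds y_i."""
    over = sum(wi * (xi - yi) ** 2 for xi, yi, wi in zip(x, y, w) if xi > yi)
    under = sum(wi * (xi - yi) ** 2 for xi, yi, wi in zip(x, y, w) if xi <= yi)
    return math.sqrt(alpha * over + (1 - alpha) * under)


# Hypothetical count vectors with unit scaling and unit weights.
x = [1, 2, 0]
y = [1, 0, 3]
ones = [1, 1, 1]
print(asymmetric_scaled_tanimoto(x, y, ones, alpha=1.0))
print(asymmetric_scaled_tanimoto(y, x, ones, alpha=1.0))
print(weighted_asymmetric_euclidean(x, y, ones, alpha=1.0))
```

With alpha = 1 only the features that the first argument has in excess are penalized, so swapping the arguments changes the result. This is what makes the metric asymmetric between query and target.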

[Diagram: the selected targets are split into a training set and a test set; the known actives are split into a query set, a training set and a test set.]

Step 1: optimize parameters for maximum enrichment
Step 2: validate metrics over an independent test set

Optimization of metrics
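The two-step procedure can be sketched with a single tunable parameter. The example assumes that enrichment is the fraction of actives retrieved divided by the fraction of targets retrieved, that a hit is a structure whose dissimilarity to the query is at or below a threshold, and that the only parameter is the asymmetry factor of an unscaled asymmetric Tanimoto metric searched over a small grid. The slides do not give the enrichment formula or the search strategy, and all names and data below are hypothetical.

```python
def asymmetric_tanimoto(x, y, alpha):
    """Unscaled asymmetric Tanimoto dissimilarity (assumed form)."""
    common = sum(min(a, b) for a, b in zip(x, y))
    denominator = alpha * (sum(x) - common) + (1 - alpha) * (sum(y) - common) + common
    return 1.0 - common / denominator if denominator else 0.0


def count_hits(query, fingerprints, alpha, threshold):
    return sum(1 for fp in fingerprints
               if asymmetric_tanimoto(query, fp, alpha) <= threshold)


def enrichment(active_hits, n_actives, target_hits, n_targets):
    """Assumed definition: active hit rate divided by target hit rate."""
    if target_hits == 0:
        return float("inf") if active_hits else 0.0
    return (active_hits / n_actives) / (target_hits / n_targets)


def evaluate(query, actives, targets, alpha, threshold):
    return enrichment(count_hits(query, actives, alpha, threshold), len(actives),
                      count_hits(query, targets, alpha, threshold), len(targets))


def optimize_alpha(query, actives, targets, threshold, grid):
    """Step 1: pick the grid value with the highest training enrichment."""
    return max(grid, key=lambda alpha: evaluate(query, actives, targets, alpha, threshold))


# Toy data (hypothetical).
query = [1, 1, 0, 0]
train_actives = [[1, 1, 1, 0], [1, 1, 1, 1]]
train_targets = [[1, 0, 0, 0], [0, 0, 1, 1], [1, 0, 1, 0], [0, 1, 0, 0], [1, 1, 0, 0]]
test_actives = [[1, 1, 0, 1]]
test_targets = [[0, 1, 0, 0], [1, 1, 0, 0], [0, 0, 0, 1], [0, 0, 1, 0]]

best_alpha = optimize_alpha(query, train_actives, train_targets, 0.4, [0.0, 0.5, 1.0])
train_enrichment = evaluate(query, train_actives, train_targets, best_alpha, 0.4)
# Step 2: validate the chosen parameter on data not used for optimization.
test_enrichment = evaluate(query, test_actives, test_targets, best_alpha, 0.4)
print(best_alpha, train_enrichment, test_enrichment)
```

A real optimization would tune many parameters at once (weights, scaling factors, asymmetry factor), so an exhaustive grid as used here would be replaced by a more economical search.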

Step 1 optimize parameters for maximum enrichment

[Diagram: the query fingerprint (from the query set) is compared with the training set by the parametrized metric, and target hits and active hits are counted for each parameter setting. Legend for the parameter values v1, v2, v3, …, vi, …, vn: potential variable value, temporarily fixed value, running variable value, final value.]


Step 2 validate metrics over an independent test set

[Diagram: the query fingerprint (from the query set) is compared with the test set by the optimized metric, and target hits and active hits are counted.]


[Figure: dissimilarity values for example structure pairs: 0.47, 0.55, 0.57 and 0.28, 0.20, 0.06]

1. Similar structures get closer

Results of Optimization

2. Hit set size reduced

Metric                            Enrichment   Test Hits   Target Hits
Tanimoto
  Basic                                70.47        5.43        172.00
  Scaled                                7.63        6.00       1101.71
  Asymmetric                           99.36        5.29        106.00
  Scaled Asymmetric                    11.94        5.86        731.14
Euclidean
  Basic                                 5.59        5.43       1465.57
  Normalized                           11.33        5.14        791.29
  Asymmetric Normalized                18.58        4.71        368.71
  Weighted Normalized                 296.30        4.14         27.57
  Weighted Asymmetric Normalized      281.30        3.43         17.00

Active set: 18 mGlu-R1 antagonists
Target set: 10000 randomly selected drug-like structures


Active set     Size   Euclidean   Optimized   Improvement
5-HT3            12       12.55      239.24         49.26
ACE              89        1.42        6.50          4.64
Angiotensin      10       27.81       85.45         11.15
Beta2            50        1.52       24.70         17.42
D2               13       27.64      123.25         11.19
Delta            20       11.66      243.57         69.11
FTP              35       46.88       71.54          5.35
mGluR1           18        5.59      296.30         70.93
NPY-5           139        3.05       12.75          3.25
Thrombin          8        2.56        7.68          2.62

3. Higher enrichment


4. Top ranked structures are spikes

• offers a more intuitive way to evaluate the efficiency of screening
• based on sorting random set hits and known actives on dissimilarity values and counting the number of random set hits preceding each active in the sorted list

[Figure: sorted list of dissimilarity values (0.014, 0.015, 0.017, 0.020, 0.022, 0.023, 0.027, 0.041, 0.043); axes: number of spikes retrieved, number of virtual hits]
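The spike counting described above can be sketched in a few lines: known actives (spikes) and random set structures are ranked by their dissimilarity to the query, and for each spike the number of random set hits ranked before it is recorded. The function name is hypothetical, and the split of the example values into actives and random set hits is invented for illustration.

```python
import bisect


def random_hits_before_each_active(active_scores, random_scores):
    """For each active, in rank order, count random set hits with a smaller dissimilarity."""
    random_sorted = sorted(random_scores)
    return [bisect.bisect_left(random_sorted, score)
            for score in sorted(active_scores)]


# Hypothetical assignment of dissimilarity values to the two sets.
active_scores = [0.041, 0.015, 0.022]
random_scores = [0.014, 0.017, 0.020, 0.023, 0.027, 0.043]
counts = random_hits_before_each_active(active_scores, random_scores)
print(counts)
```

Plotting these counts against the spike rank, usually on a logarithmic axis, gives curves of the kind shown on the result slides; an ideal metric would rank every spike before any random set structure, giving a count of zero throughout.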


[Figure: ACE (pharmacophore similarity); number of hits (log scale, 1 to 10000) versus number of spikes retrieved (1 to 16); series: Euclidean, Optimized Euclidean]

Results

[Figure: number of hits (log scale, 1 to 10000) versus number of spikes retrieved (1 to 49); series: Tanimoto, Euclidean, Optimized, Ideal]

Results

[Figure panels: NPY-5 (pharmacophore similarity); β2-adrenoceptor (pharmacophore similarity). Number of hits (log scale, 1 to 10000) versus number of spikes retrieved (1 to 18)]


Results

3D flexible search

Expected top performance 200 structures/s
