Chemical Similarity
Willie Peijnenburg
RIVM – Laboratory for Ecological Risk Assessment
Similarity: philosophers' view
• Exploiting the similarity concept is a sign of immature science (Quine)
• "It is ill defined to say 'A is similar to B'; it is only meaningful to say 'A is similar to B with respect to C'"
A chemical "A" cannot be similar to a chemical "B" in absolute terms, but only with respect to some measurable key feature.
Similarity: chemists' view
• Intuitively, based on expert judgment
  • A chemist would describe "similar" compounds in terms of "approximately similar backbone and almost the same functional groups".
• Chemists have different views on similarity
  • Experience, context
  • Lajiness et al. (2004). Assessment of the Consistency of Medicinal Chemists in Reviewing Sets of Compounds, J. Med. Chem., 47(20), 4891-4896.
Chemical similarity
• Computerized similarity assessment needs unambiguous definitions
• Structurally similar molecules have similar biological activities
  • The basic tenet of chemical similarity
  • Long supporting experience
  • Many exceptions. Exceptions are important!
• Identification of the most informative representation of molecular structures. Avoiding information loss is important!
• Similarity measures
Chemical similarity quantified
• Numerical representation of chemical structure
  • Structural similarity
  • Descriptor-based similarity
  • 3D similarity
  • Field-based
  • Spectral
  • Quantum mechanics
  • More…
• Comparison between numerical representations
  • Distance-like
  • Association, correlation
Structural similarity
• Substructure searching
• Maximum Common Substructure
• Fragment approach
  • Atom, bond or ring counts, degree of connectivity
  • Atom-centred, bond-centred, ring-centred fragments
  • Fingerprints, molecular holograms, atom environments
• Topological descriptors
  • Hosoya Z, Wiener number, Randić index, indices on distance matrices of graphs (Bonchev & Trinajstić), bonding connectivity indices (Basak), Balaban J indices, etc.
  • Initially designed to account for branching, linearity, presence of cycles and other topological features
  • Attempts to include 3D information (e.g. distance matrices instead of adjacency matrices)
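As an illustration of a topological descriptor, the Wiener number can be sketched as the sum of shortest-path lengths (counted in bonds) over all pairs of heavy atoms. The function name, the adjacency-dictionary input and the atom labels below are assumptions made for this sketch, not part of the original slides.

```python
from collections import deque

def wiener_index(adjacency):
    """Sum of shortest-path lengths (in bonds) over all unordered atom pairs.

    `adjacency` maps each heavy-atom label to the labels it is bonded to
    (hydrogen-suppressed molecular graph, assumed connected).
    """
    total = 0
    for source in adjacency:
        # Breadth-first search gives topological distances from `source`.
        dist = {source: 0}
        queue = deque([source])
        while queue:
            atom = queue.popleft()
            for neighbour in adjacency[atom]:
                if neighbour not in dist:
                    dist[neighbour] = dist[atom] + 1
                    queue.append(neighbour)
        total += sum(dist.values())
    return total // 2  # each pair was counted once from each end

# Carbon skeletons of the two C4H10 isomers (illustrative atom labels).
n_butane = {1: [2], 2: [1, 3], 3: [2, 4], 4: [3]}
isobutane = {1: [2], 2: [1, 3, 4], 3: [2], 4: [2]}

print(wiener_index(n_butane))   # linear chain: 10
print(wiener_index(isobutane))  # branched isomer: 9
```

The branched isomer gets the smaller value, which is the sense in which such indices were designed to account for branching.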
Structural similarity
• Oral LD50 for male rats = 2.5 g/kg
• Dermal LD50 for male rats = 3.54 g/kg
• Not irritating to eyes of rabbits
• Slightly irritating to skin of rabbits
• Not mutagenic in Salmonella strains
• Higher potential binding affinity to the estrogen receptor than the nitrophenyl acetate
3-(2-chloro-4-(trifluoromethyl)phenoxy)phenyl acetate, CAS# 50594-77-9
5-(2-chloro-4-(trifluoromethyl)phenoxy)-2-nitrophenyl acetate, CAS# 50594-44-0
• Higher potential to cause cancer than the phenyl acetate
Walker, J. (2003), QSARs for Pollution Prevention, Toxicity Screening, Risk Assessment and Web Applications, SETAC Press
Isosteric replacements of groups
• Substituents:
  • F, Cl, Br, I, CF3, NO2
  • Methyl, Ethyl, Isopropyl, Cyclopropyl, t-Butyl, -OH, -SH, -NH2, -OMe, -N(Me)2
• Atoms and groups in rings:
  • -CH=, -N=
  • -CH2-, -NH-, -O-, -S-
• More …
Depends on the endpoint! (e.g. lipophilicity, receptor binding)
So: a single group makes a difference … but …
Structural similarity
Rosenkranz H.S., Cunningham A.R. (2001) Chemical Categories for Health Hazard Identification: A Feasibility Study, Regulatory Toxicology and Pharmacology 33, 313-318.
• Examined the reliability of using chemical categories to classify HPV chemicals as toxic or nontoxic
• Found: "most often only a proportion of chemicals in a category were toxic"
• Conclusion: "traditional organic chemical categories do not encompass groups of chemical that are predominately either toxic or nontoxic across a number of toxicological endpoints or even for specific toxic activities"
The bold portion of the chemical in the Category column defined the fragment used to query each data set.
Abbreviations: EyI, eye irritation; LD50, rat LD50; Dev, developmental toxicity; CA, rodent carcinogenesis; Mnt, in vivo induction of micronuclei; Sal, Salmonella mutagenesis; MLA, mutagenesis in cultured mouse lymphoma cells.
3D Similarity
• Distance-based and angle-based descriptors (e.g. inter-atomic distance)
• Field similarity (not exhaustive list)
  • Comparative Molecular Field Analysis (CoMFA), CoMSIA
  • Electrostatic potential
  • Shape
  • Electron density
  • Test probe
  • Any grid-based structural property
• Molecular multi-pole moments (CoMMA)
• Shape descriptors (not exhaustive list)
  • van der Waals volume and surface (reflect the size of substituents)
  • Taft steric parameter
  • STERIMOL
  • Molecular Shape Analysis
  • 4D QSAR
  • WHIM descriptors
• Receptor binding
Structurally similar compounds can have very different 3D properties
Kubinyi, H., Chemical Similarity and Biological Activity
Physicochemical properties
• Molecular weight
• Octanol-water partition coefficient
• Total energy
• Heat of formation
• Ionization potential
• Molar refractivity
• More…
Quantum chemistry approaches
The wave function and the density function contain all the information of a system.
All the information about any molecule could be extracted from the electron density. Bond creation and bond breaking in chemical reactions, as well as the shape changes in conformational processes, are expressed by changes in the electronic density of molecules. The electronic density fully determines the nuclear distribution, hence the electronic density and its changes account for all the relevant chemical information about the molecule.
In principle, quantum-chemical theory should be able to provide precise quantitative descriptions of molecular structures and their chemical properties.
Quantum chemistry approaches
• Quantum chemical descriptors characterize the reactivity, shape and binding properties of a complete molecule or of molecular fragments and substituents:
  • HOMO and LUMO energies, total energy, number of filled orbitals, standard deviation of partial atomic charges and electron densities, dipole moment, partial atomic charges
• Approaches from the Theory of Atoms in Molecules: BCP space, TAE/RECON, MEDLA, QShAR (additive density fragments)
• Quantum chemistry calculations depend on several levels of approximation
• Computationally intensive
Reactivity
• Similarity between reactions
• Similarity of chemical structures assessed by generalized reaction types and by gross structural features. Two structures are considered similar if they can be converted by reactions belonging to the same predefined groups (for example oxidation or substitution reactions).
Similarity indices
• Association, correlation, distance coefficients
• Most popular:
  • Tanimoto distance (fingerprints)
  • Euclidean distance (descriptors)
  • Carbo index (fields)
• Essentially a classification problem has to be solved (decide if a query compound is closer to one or another set of compounds)
  • Many methods available (Discriminant Analysis, Neural networks, SVM, Bayesian classification, etc.)
  • Statistical assumptions and statistical error are involved
Carbo index: $C_{AB} = \dfrac{Z_{AB}}{\sqrt{Z_{AA}\,Z_{BB}}}$
Tanimoto coefficient: $T_{AB} = \dfrac{N_{AB}}{N_A + N_B - N_{AB}}$
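A minimal sketch of two of the indices named on this slide. It assumes the usual definitions: for fingerprints, N_A and N_B are the numbers of bits set in molecules A and B and N_AB the number set in both; for fields, Z_AB is the overlap integral of the two fields, approximated here by a sum over values sampled on a common grid. The function names and the toy inputs are illustrative only.

```python
import math

def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient N_AB / (N_A + N_B - N_AB) for sets of 'on' bit positions."""
    n_ab = len(fp_a & fp_b)
    union = len(fp_a) + len(fp_b) - n_ab
    # Convention chosen for this sketch: two empty fingerprints count as identical.
    return n_ab / union if union else 1.0

def carbo_index(field_a, field_b):
    """Carbo index Z_AB / sqrt(Z_AA * Z_BB) with integrals replaced by grid sums."""
    z_ab = sum(a * b for a, b in zip(field_a, field_b))
    z_aa = sum(a * a for a in field_a)
    z_bb = sum(b * b for b in field_b)
    return z_ab / math.sqrt(z_aa * z_bb)

# Toy fingerprints given as sets of bit positions (made-up values).
fp_a = {1, 4, 7, 9, 12}
fp_b = {1, 4, 8, 9, 15, 20}
print(tanimoto(fp_a, fp_b))  # 3 shared bits out of 8 distinct bits: 0.375

# Toy field values on a shared grid; the second field is a scaled copy of the first.
print(carbo_index([1.0, 2.0, 3.0], [2.0, 4.0, 6.0]))
```

The Carbo index depends only on the shape of the two fields, not their magnitude, which is why the scaled copy scores as fully similar.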
Similarity indices
[Table panels: Association indices; Correlation indices]
J. D. Holliday, C-Y. Hu and P. Willett (2002) Grouping of Coefficients for the Calculation of Inter-Molecular Similarity and Dissimilarity using 2D Fragment Bit-Strings, Combinatorial Chemistry & High Throughput Screening, 5, 155-166
Fingerprint similarity
• Information loss – fragments' presence and absence instead of counts
• Bit string saturation – within a large database almost all bits are set
• Can give nonintuitive results
• The average similarity appears to increase with the complexity of the query compound
• Larger queries are more discriminating (flatter curve, Tanimoto values spread wider)
• Smaller queries have a sharp peak, unable to distinguish between molecules
Figure: The distribution of Tanimoto values found in database searches with a range of query molecules
Flower D., On the Properties of Bit String-Based Measures of Chemical Similarity, J. Chem. Inf. Comput. Sci., Vol. 38, No. 3, 1998
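The information loss from recording only the presence or absence of fragments can be illustrated with a toy comparison. The fragment lists below are made up for this sketch, and the count-based variant (shared minimum counts over maximum counts) is one common generalisation of the Tanimoto coefficient, not necessarily the one used in the cited work.

```python
from collections import Counter

def tanimoto_binary(frags_a, frags_b):
    """Tanimoto on presence/absence: shared fragment types over all fragment types."""
    a, b = set(frags_a), set(frags_b)
    return len(a & b) / len(a | b) if (a | b) else 1.0

def tanimoto_counts(frags_a, frags_b):
    """Count-based variant: sum of minimum counts over sum of maximum counts."""
    ca, cb = Counter(frags_a), Counter(frags_b)
    shared = sum((ca & cb).values())  # Counter '&' keeps the minimum count per fragment
    total = sum((ca | cb).values())   # Counter '|' keeps the maximum count per fragment
    return shared / total if total else 1.0

# Illustrative fragment lists for a short and a long alkane chain.
short_chain = ["CH3", "CH2", "CH2", "CH3"]
long_chain = ["CH3"] + ["CH2"] * 10 + ["CH3"]

print(tanimoto_binary(short_chain, long_chain))  # same fragment types: 1.0
print(tanimoto_counts(short_chain, long_chain))  # counts differ, so the value drops
```

With presence/absence bits the two chains look identical, although one is three times longer than the other.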
Distance indices
Euclidean distance
City-block distance
Mahalanobis distance
Equidistant contours = points at equal distance from the query point
Distances obey the triangle inequality
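The three distances named on this slide can be sketched with their standard textbook definitions. The Mahalanobis version below is restricted to two descriptors so that the covariance matrix can be inverted by hand; the function names and sample points are assumptions for illustration.

```python
import math

def euclidean(x, y):
    """Square root of the summed squared coordinate differences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def city_block(x, y):
    """Sum of absolute coordinate differences (Manhattan distance)."""
    return sum(abs(a - b) for a, b in zip(x, y))

def mahalanobis_2d(x, y, cov):
    """Mahalanobis distance for two descriptors, given a 2x2 covariance matrix."""
    (s11, s12), (s21, s22) = cov
    det = s11 * s22 - s12 * s21
    # Explicit inverse of the 2x2 covariance matrix.
    inv = ((s22 / det, -s12 / det), (-s21 / det, s11 / det))
    d0, d1 = x[0] - y[0], x[1] - y[1]
    q = d0 * (inv[0][0] * d0 + inv[0][1] * d1) + d1 * (inv[1][0] * d0 + inv[1][1] * d1)
    return math.sqrt(q)

query, other = (0.0, 0.0), (3.0, 4.0)
identity = ((1.0, 0.0), (0.0, 1.0))

print(euclidean(query, other))                 # 5.0
print(city_block(query, other))                # 7.0
print(mahalanobis_2d(query, other, identity))  # equals the Euclidean distance when cov = I
```

The equidistant contours differ accordingly: circles for the Euclidean distance, diamonds for the city-block distance, and ellipses aligned with the data covariance for the Mahalanobis distance.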
Similarity in descriptor space
Comparison between a point and groups of points is a classification problem. Euclidean distance performs very well if groups are separable (left). Other classification methods help in other cases.
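For well-separated groups, the comparison between a query point and groups of points can be sketched as a nearest-centroid rule with Euclidean distance. This is only one simple classifier chosen for illustration; the group labels and descriptor values are made up.

```python
import math

def centroid(points):
    """Component-wise mean of a list of equally long descriptor tuples."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(len(points[0])))

def nearest_group(query, groups):
    """Return the label of the group whose centroid is closest to `query`."""
    return min(groups, key=lambda label: math.dist(query, centroid(groups[label])))

# Two clearly separated toy groups in a two-descriptor space.
groups = {
    "active": [(1.0, 1.2), (1.4, 0.9), (0.8, 1.1)],
    "inactive": [(4.0, 4.2), (4.5, 3.8), (3.9, 4.4)],
}

print(nearest_group((1.1, 1.0), groups))  # active
print(nearest_group((4.2, 4.0), groups))  # inactive
```

When the groups overlap or have elongated shapes, a centroid rule of this kind breaks down, which is where the other classification methods come in.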
What do we measure
• We compare numerical representations of chemical compounds
  • The numerical representation is not unique
  • The numerical representation includes only part of all the information about the compound
  • A distance measure reflects "closeness" only if the data satisfy specific assumptions
Example: Y. Martin et al. (2002)
Do structurally similar molecules have similar biological activity?
• Set of 1645 chemicals with IC50s for monoamine oxidase inhibition
• Daylight fingerprints 1024 bits long (0-7 bonds)
• When using the Tanimoto coefficient with a cut-off value of 0.85, only 30% of actives were detected
J. Med. Chem. 2002, 45, 4350-4358
[Table columns: Cutoff values; % of actives detected; % False positives]
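How the two percentages in such a table depend on the cut-off can be sketched on toy data. The similarity values and activity labels below are invented for illustration and are not taken from the cited study; "false positives" is assumed here to mean the share of retrieved compounds that are inactive, which may differ from the definition used in the paper.

```python
def screen(hits, cutoff):
    """Summarise a similarity search at a given cut-off.

    `hits` is a list of (tanimoto_to_query, is_active) pairs.
    Returns (% of actives detected, % of retrieved compounds that are inactive).
    """
    retrieved = [active for sim, active in hits if sim >= cutoff]
    n_actives = sum(active for _, active in hits)
    found = sum(retrieved)
    pct_detected = 100.0 * found / n_actives if n_actives else 0.0
    pct_false = 100.0 * (len(retrieved) - found) / len(retrieved) if retrieved else 0.0
    return pct_detected, pct_false

# Invented toy data: (similarity to the query, active?)
toy = [
    (0.95, True), (0.90, False), (0.86, True), (0.70, True),
    (0.60, False), (0.55, True), (0.40, False), (0.30, True),
]

for cutoff in (0.85, 0.50):
    detected, false_pos = screen(toy, cutoff)
    print(cutoff, round(detected, 1), round(false_pos, 1))
```

Lowering the cut-off retrieves more of the actives, and the false-positive share has to be read alongside it rather than in isolation.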
Chemical similarity caveats
• The similarity computation may not correctly represent the intuitive similarity between two chemical structures
  • The properties of a chemical might not be implicit in its molecular structure
  • Molecular structure might not be fully measured and represented by a set of numbers (information loss)
  • Comparison by similarity indices may be counterintuitive
• Intuitively similar chemical structures may not have similar biological activity
  • Bioisosteric compounds
  • Structurally similar molecules may have different mechanisms of action
Similarity and Activity: "Neighbourhood principle"
• Proximity with respect to descriptors does not necessarily mean proximity with respect to the activity
• Depends on the relationship between descriptor and activity
  • True if a continuous and monotonic (e.g. linear, …) relationship holds between descriptors and activity
  • The linear relationship is only a special case, given the complexity of biochemical interactions. Its use should be justified in every specific case and/or used only locally
[Diagram labels: Neighbourhood in the descriptor space; Similar activity values]
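A toy check of the neighbourhood assumption: for a continuous linear relationship, neighbours in descriptor space have close activities, whereas a discontinuous relationship can place very different activities side by side. Both activity functions are hypothetical and serve only to illustrate the point.

```python
def linear_activity(x):
    """Hypothetical continuous, monotonic descriptor-activity relationship."""
    return 2.0 * x + 1.0

def cliff_activity(x):
    """Hypothetical discontinuous relationship (an abrupt jump at x = 0.5)."""
    return 0.0 if x < 0.5 else 5.0

def max_neighbour_gap(activity, xs, radius):
    """Largest activity difference between any two points closer than `radius`."""
    return max(
        abs(activity(a) - activity(b))
        for a in xs
        for b in xs
        if a != b and abs(a - b) < radius
    )

# One descriptor sampled on a regular grid between 0 and 1.
xs = [i / 100 for i in range(101)]

print(max_neighbour_gap(linear_activity, xs, 0.05))  # stays small
print(max_neighbour_gap(cliff_activity, xs, 0.05))   # as large as the jump itself
```

Running a check of this kind on real data is one way to verify the assumption case by case before relying on descriptor-space neighbours.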
Similarity vs. Activity
• Black square: Salmonella mutagenicity of aromatic amines [Debnath et al. 1992] (log TA98)
• Red circle: Glende et al. 2001 set: alkyl-substituted (ortho to the amino function) derivatives not included in the original Debnath data set
• Descriptors: logP, Ehomo, Elumo
• Similar compounds, relatively small data set
Similarity by atom environments vs. logP
Syracuse Research KOWWin training set, 2400 compounds
(diverse compounds, large data set)
Molecular representation requirements
• Information preserving or allowing only controlled loss of information
• Feature selection
  • By domain knowledge (e.g. receptor binding, any knowledge of mechanism of action)
  • By verification of the "neighbourhood" assumption
  • By feature selection methods
    • Examples: PCA, Entropy, Gini index, Kullback-Leibler distance, filter and wrapper methods
    • Compounds should cluster tightly within a class and be far apart for different classes
• Combining different measures (consensus approach)
Structure is not the sole factor for biological activity
• Interactions with environment
  • Solvation effects
  • Metabolism
  • Time dependence
  • More...
• Biological activity in different species
Conclusions
• Molecular similarity is relative
  • Molecular representation and similarity index have to account for the underlying biochemistry
  • Validation of the similarity formulation and its algorithmic solution is essential
• "Neighbourhood" assumption has to be proven case by case
"As understanding of the chemistry and biology of drug action improves and a greater ability to model the underlying mechanisms appears, the need for 'similarity' approaches will diminish."
Bender, A.; Glen, R. C. (2004) Molecular similarity: a key technique in molecular informatics. Org. Biomol. Chem., 2(22), 3204-3218
Case study
• Tanimoto coefficient: 0.51 (i.e. < 0.80 border)
• Chemist's view:
• Type of effects:
DISSIMILAR, although sulphonamide moieties share similarities!
Nikolova N., Jaworska J., Approaches to Measure Chemical Similarity - a Review, QSAR Comb. Sci. 22 (2003) pp. 1006-1024