Chemical Similarity
Willie Peijnenburg, RIVM – Laboratory for Ecological Risk Assessment

DESCRIPTION

AACIMP 2009 Summer School lecture by Willie Peijnenburg. "Environmental Chemoinformatics" course.

Page 1: Chemical Similarity

Chemical Similarity

Willie Peijnenburg
RIVM – Laboratory for Ecological Risk Assessment

Page 2: Chemical Similarity

Similarity: philosophers’ view

  • Exploiting the similarity concept is a sign of immature science (Quine)
  • It is ill defined to say “A is similar to B”; it is only meaningful to say “A is similar to B with respect to C”

A chemical “A” cannot be similar to a chemical “B” in absolute terms, but only with respect to some measurable key feature.

Page 3: Chemical Similarity

Similarity: chemists’ view

  • Intuitively, based on expert judgment
      • A chemist would describe “similar” compounds in terms of “approximately similar backbone and almost the same functional groups”.
  • Chemists have different views on similarity
      • Experience, context
      • Lajiness et al. (2004). Assessment of the Consistency of Medicinal Chemists in Reviewing Sets of Compounds, J. Med. Chem., 47(20), 4891-4896.

Page 4: Chemical Similarity

Chemical similarity

  • Computerized similarity assessment needs unambiguous definitions
  • Structurally similar molecules have similar biological activities
      • The basic tenet of chemical similarity
      • Long supporting experience
      • Many exceptions. Exceptions are important!
  • Identification of the most informative representation of molecular structures. Avoiding information loss is important!
  • Similarity measures

Page 5: Chemical Similarity

Chemical similarity quantified

  • Numerical representation of chemical structure
      • Structural similarity
      • Descriptor-based similarity
      • 3D similarity
      • Field-based
      • Spectral
      • Quantum mechanics
      • More…
  • Comparison between numerical representations
      • Distance-like
      • Association, correlation

Page 6: Chemical Similarity

Structural similarity

  • Substructure searching
  • Maximum Common Substructure
  • Fragment approach
      • Atom, bond or ring counts, degree of connectivity
      • Atom-centred, bond-centred, ring-centred fragments
      • Fingerprints, molecular holograms, atom environments
  • Topological descriptors
      • Hosoya's Z, Wiener number, Randic index, indices on distance matrices of graphs (Bonchev & Trinajstic), bonding connectivity indices (Basak), Balaban J indices, etc.
      • Initially designed to account for branching, linearity, presence of cycles and other topological features
      • Attempts to include 3D information (e.g. distance matrices instead of adjacency matrices)
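As an illustration of how a topological descriptor reflects branching, the sketch below computes the Wiener number, taken here as the sum of shortest-path (bond-count) distances over all pairs of atoms in the hydrogen-suppressed graph. The adjacency dictionaries and the function name `wiener_index` are assumptions made for this example and are not part of the lecture.

```python
from collections import deque


def wiener_index(adjacency):
    """Sum of shortest-path lengths over all unordered atom pairs."""
    total = 0
    for start in adjacency:
        # Breadth-first search gives topological distances from `start`.
        dist = {start: 0}
        queue = deque([start])
        while queue:
            atom = queue.popleft()
            for neighbour in adjacency[atom]:
                if neighbour not in dist:
                    dist[neighbour] = dist[atom] + 1
                    queue.append(neighbour)
        total += sum(dist.values())
    return total // 2  # every pair was counted once from each end


# Hydrogen-suppressed carbon skeletons (atom index -> bonded atom indices).
n_butane = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
isobutane = {0: [1, 2, 3], 1: [0], 2: [0], 3: [0]}

print(wiener_index(n_butane))   # 10 (linear chain)
print(wiener_index(isobutane))  # 9  (branched isomer)
```

The two isomers have the same atom and bond counts, yet the branched skeleton gives the smaller value, which is the kind of feature these indices were designed to capture.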

Page 7: Chemical Similarity

Structural similarity

  • Oral LD50 for male rats = 2.5 g/kg
  • Dermal LD50 for male rats = 3.54 g/kg
  • Not irritating to eyes of rabbits
  • Slightly irritating to skin of rabbits
  • Not mutagenic in Salmonella strains
  • Higher potential binding affinity to the estrogen receptor than the nitrophenyl acetate

3-(2-chloro-4-(trifluoromethyl)phenoxy)phenyl acetate, CAS# 50594-77-9

5-(2-chloro-4-(trifluoromethyl)phenoxy)-2-nitrophenyl acetate, CAS# 50594-44-0

  • Higher potential to cause cancer than the phenyl acetate

Walker, J. (2003). QSARs for Pollution Prevention, Toxicity Screening, Risk Assessment and Web Applications, SETAC Press

Isosteric replacements of groups

  • Substituents:
      • F, Cl, Br, I, CF3, NO2
      • Methyl, Ethyl, Isopropyl, Cyclopropyl, t-Butyl, -OH, -SH, -NH2, -OMe, -N(Me)2
  • Atoms and groups in rings:
      • -CH=, -N=
      • -CH2-, -NH-, -O-, -S-
  • More …

Depends on the endpoint! (e.g. lipophilicity, receptor binding)

So: a single group makes a difference … but …

Page 8: Chemical Similarity

Structural similarity

Rosenkranz H.S., Cunningham A.R. (2001). Chemical Categories for Health Hazard Identification: A Feasibility Study, Regulatory Toxicology and Pharmacology 33, 313-318.

  • Examined the reliability of using chemical categories to classify HPV chemicals as toxic or nontoxic
  • Found: “most often only a proportion of chemicals in a category were toxic”
  • Conclusion: “traditional organic chemical categories do not encompass groups of chemical that are predominately either toxic or nontoxic across a number of toxicological endpoints or even for specific toxic activities”

The bold portion of the chemical in the Category column defined the fragment used to query each data set.

Abbreviations: EyI, eye irritation; LD50, rat LD50; Dev, developmental toxicity; CA, rodent carcinogenesis; Mnt, in vivo induction of micronuclei; Sal, Salmonella mutagenesis; MLA, mutagenesis in cultured mouse lymphoma cells.

Page 9: Chemical Similarity

3D Similarity

  • Distance-based and angle-based descriptors (e.g. inter-atomic distance)
  • Field similarity (not exhaustive list)
      • Comparative Molecular Field Analysis (CoMFA), CoMSIA
      • Electrostatic potential
      • Shape
      • Electron density
      • Test probe
      • Any grid-based structural property
  • Molecular multi-pole moments (CoMMA)
  • Shape descriptors (not exhaustive list)
      • van der Waals volume and surface (reflect the size of substituents)
      • Taft steric parameter
      • STERIMOL
      • Molecular Shape Analysis
      • 4D QSAR
      • WHIM descriptors
  • Receptor binding

Page 10: Chemical Similarity

Structurally similar compounds can have very different 3D properties

Kubinyi, H., Chemical Similarity and Biological Activity

Page 11: Chemical Similarity

Physicochemical properties

  • Molecular weight
  • Octanol-water partition coefficient
  • Total energy
  • Heat of formation
  • Ionization potential
  • Molar refractivity
  • More…

Page 12: Chemical Similarity

Quantum chemistry approaches

  • The wave function and the density function contain all the information of a system.
      • All the information about any molecule could be extracted from the electron density. Bond creation and bond breaking in chemical reactions, as well as the shape changes in conformational processes, are expressed by changes in the electronic density of molecules. The electronic density fully determines the nuclear distribution, hence the electronic density and its changes account for all the relevant chemical information about the molecule.
  • In principle, quantum-chemical theory should be able to provide precise quantitative descriptions of molecular structures and their chemical properties.

Page 13: Chemical Similarity

Quantum chemistry approaches

  • Quantum chemical descriptors characterize the reactivity, shape and binding properties of a complete molecule or of molecular fragments and substituents:
      • HOMO and LUMO energies, total energy, number of filled orbitals, standard deviation of partial atomic charges and electron densities, dipole moment, partial atomic charges
  • Approaches from the Theory of Atoms in Molecules: BCP space, TAE/RECON, MEDLA, QShAR (additive density fragments)
  • Quantum chemistry calculations depend on several levels of approximation
  • Computationally intensive

Page 14: Chemical Similarity

Reactivity

  • Similarity between reactions
  • Similarity of chemical structures assessed by generalized reaction types and by gross structural features. Two structures are considered similar if they can be converted by reactions belonging to the same predefined groups (for example oxidation or substitution reactions).

Page 15: Chemical Similarity

Similarity indices

  • Association, correlation, distance coefficients
  • Most popular:
      • Tanimoto distance (fingerprints)
      • Euclidean distance (descriptors)
      • Carbo index (fields)
  • Essentially a classification problem has to be solved (decide if a query compound is closer to one or another set of compounds)
      • Many methods available (Discriminant Analysis, Neural networks, SVM, Bayesian classification, etc.)
      • Statistical assumptions and statistical error are involved

Carbo index:

$$C_{AB} = \frac{Z_{AB}}{\sqrt{Z_{AA}\, Z_{BB}}}$$

Tanimoto coefficient:

$$T_{AB} = \frac{N_{AB}}{N_A + N_B - N_{AB}}$$
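The two indices named on this slide can be sketched in a few lines. The sketch assumes the usual reading of the symbols: N_A and N_B are the numbers of bits set in each fingerprint and N_AB the number of bits they share, while Z_AB is the overlap of two fields, approximated here by a sum over grid points. The fingerprints, field values and function names are hypothetical, chosen only for illustration.

```python
import math


def tanimoto(bits_a, bits_b):
    """N_AB / (N_A + N_B - N_AB) for two sets of 'on' bit positions."""
    n_ab = len(bits_a & bits_b)
    denom = len(bits_a) + len(bits_b) - n_ab
    # Two empty fingerprints are treated as identical (a convention, not a rule).
    return n_ab / denom if denom else 1.0


def carbo(field_a, field_b):
    """Z_AB / sqrt(Z_AA * Z_BB) with overlaps taken as sums over grid points."""
    z_ab = sum(a * b for a, b in zip(field_a, field_b))
    z_aa = sum(a * a for a in field_a)
    z_bb = sum(b * b for b in field_b)
    return z_ab / math.sqrt(z_aa * z_bb)


# Hypothetical fingerprints: positions of the bits that are set.
fp_a = {1, 4, 7, 9, 12}
fp_b = {1, 4, 8, 9, 15, 20}
print(tanimoto(fp_a, fp_b))  # 3 shared bits / (5 + 6 - 3) = 0.375

# Hypothetical field values sampled on the same three grid points.
print(carbo([1.0, 2.0, 3.0], [2.0, 4.0, 6.0]))  # same shape, different scale
```

Note that the Carbo index of the two example fields is 1 even though one field is twice the other: the index compares the shape of the fields, not their magnitude.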

Page 16: Chemical Similarity

Similarity indices

Association indices; Correlation indices

J. D. Holliday, C-Y. Hu and P. Willett (2002). Grouping of Coefficients for the Calculation of Inter-Molecular Similarity and Dissimilarity using 2D Fragment Bit-Strings, Combinatorial Chemistry & High Throughput Screening, 5, 155-166

Page 17: Chemical Similarity

Fingerprint similarity

  • Information loss: fragment presence and absence instead of counts
  • Bit string saturation: within a large database almost all bits are set
  • Can give nonintuitive results
  • The average similarity appears to increase with the complexity of the query compound
  • Larger queries are more discriminating (flatter curve, Tanimoto values spread wider)
  • Smaller queries have a sharp peak, unable to distinguish between molecules

Flower D., On the Properties of Bit String-Based Measures of Chemical Similarity, J. Chem. Inf. Comput. Sci., Vol. 38, No. 3, 1998

Figure: The distribution of Tanimoto values found in database searches with a range of query molecules

Page 18: Chemical Similarity

Distance indices

  • Euclidean distance
  • City-block distance
  • Mahalanobis distance

Equidistant contours = points at equal distance from the query point

Distances obey the triangle inequality
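The three distances listed on this slide can be compared on a single pair of points. This is a minimal sketch for two descriptors only; the descriptor values and the covariance matrix are hypothetical, and the 2x2 matrix is inverted by hand so that the example needs nothing beyond the standard library.

```python
import math


def euclidean(x, y):
    """Straight-line distance; equidistant contours are circles."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def city_block(x, y):
    """Sum of absolute coordinate differences; contours are diamonds."""
    return sum(abs(a - b) for a, b in zip(x, y))


def mahalanobis_2d(x, y, cov):
    """Distance scaled by a 2x2 covariance matrix; contours are ellipses."""
    (s11, s12), (s21, s22) = cov
    det = s11 * s22 - s12 * s21
    inv = ((s22 / det, -s12 / det), (-s21 / det, s11 / det))
    d0, d1 = x[0] - y[0], x[1] - y[1]
    q = d0 * (inv[0][0] * d0 + inv[0][1] * d1) + d1 * (inv[1][0] * d0 + inv[1][1] * d1)
    return math.sqrt(q)


query, compound = (0.0, 0.0), (3.0, 4.0)
cov = ((9.0, 0.0), (0.0, 16.0))  # hypothetical descriptor variances 9 and 16

print(euclidean(query, compound))            # 5.0
print(city_block(query, compound))           # 7.0
print(mahalanobis_2d(query, compound, cov))  # about 1.414, i.e. sqrt(2)
```

With an identity covariance matrix the Mahalanobis distance reduces to the Euclidean one, so the choice between them is effectively a choice about how descriptors are scaled and whether their correlation is taken into account.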

Page 19: Chemical Similarity

Similarity in descriptor space

Comparison between a point and groups of points is a classification problem. Euclidean distance performs very well if groups are separable (left). Other classification methods help in other cases.
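For the separable case, comparing a query point with groups of points can be sketched as a nearest-centroid rule: assign the query to the group whose mean is closest in Euclidean distance. The nearest-centroid rule is one simple choice made for this example, not necessarily the method behind the slide's figure, and the group labels and coordinates are hypothetical.

```python
import math


def centroid(points):
    """Mean position of a group of points in descriptor space."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(len(points[0])))


def nearest_group(query, groups):
    """Label of the group whose centroid is closest (Euclidean) to the query."""
    return min(groups, key=lambda label: math.dist(query, centroid(groups[label])))


# Two well-separated hypothetical groups in a two-descriptor space.
groups = {
    "class_1": [(1.0, 1.2), (0.8, 1.0), (1.2, 0.9)],
    "class_2": [(4.0, 4.1), (3.8, 4.3), (4.2, 3.9)],
}

print(nearest_group((1.1, 1.0), groups))  # class_1
print(nearest_group((3.9, 4.0), groups))  # class_2
```

When the groups overlap or have elongated shapes, the centroid no longer summarises a group well, which is where the other classification methods mentioned above come in.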

Page 20: Chemical Similarity

What do we measure

We compare numerical representations of chemical compounds

  • The numerical representation is not unique
  • The numerical representation includes only part of all the information about the compound
  • A distance measure reflects “closeness” only if the data holds specific assumptions

Page 21: Chemical Similarity

Example: Y. Martin et al. (2002)
Do structurally similar molecules have similar biological activity?

  • Set of 1645 chemicals with IC50s for monoamine oxidase inhibition
  • Daylight fingerprints 1024 bits long (0-7 bonds)
  • When using the Tanimoto coefficient with a cut-off value of 0.85, only 30% of actives were detected

J. Med. Chem. 2002, 45, 4350-4358

Table columns: Cutoff values; % of actives detected; % False positives

Page 22: Chemical Similarity

Chemical similarity caveats

  • The similarity computation may not correctly represent the intuitive similarity between two chemical structures
      • The properties of a chemical might not be implicit in its molecular structure
      • Molecular structure might not be fully measured and represented by a set of numbers (information loss)
      • Comparison by similarity indices may be counterintuitive
  • Intuitively similar chemical structures may not have similar biological activity
      • Bioisosteric compounds
      • Structurally similar molecules may have different mechanisms of action

Page 23: Chemical Similarity

Similarity and Activity: “Neighbourhood principle”

  • Proximity with respect to descriptors does not necessarily mean proximity with respect to the activity
  • Depends on the relationship between descriptor and activity
      • True if a continuous and monotonic (e.g. linear, …) relationship holds between descriptors and activity
      • The linear relationship is only a special case, given the complexity of biochemical interactions. Its use should be justified in every specific case and/or used only locally

Diagram labels: Neighbourhood in the descriptor space; Similar activity values
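A toy check of the neighbourhood principle: for compounds lying close together on one descriptor axis, compare the activity difference between adjacent neighbours under a continuous, monotonic relationship and under a discontinuous one. Both activity functions and the descriptor values are hypothetical and exist only to make the point concrete.

```python
def max_neighbour_gap(descriptor_values, activity):
    """Largest activity difference between adjacent compounds on a 1-D descriptor axis."""
    ordered = sorted(descriptor_values)
    return max(abs(activity(b) - activity(a)) for a, b in zip(ordered, ordered[1:]))


def linear_activity(x):
    """Continuous, monotonic descriptor-activity relationship."""
    return 2.0 * x


def cliff_activity(x):
    """Discontinuous relationship: activity jumps at a threshold."""
    return 0.0 if x < 0.55 else 5.0


# Evenly spaced hypothetical descriptor values: 0.0, 0.1, ..., 1.0
descriptor = [i / 10 for i in range(11)]

print(max_neighbour_gap(descriptor, linear_activity))  # about 0.2: neighbours behave alike
print(max_neighbour_gap(descriptor, cliff_activity))   # 5.0: close neighbours, very different activity
```

In the second case two compounds only 0.1 apart in descriptor space differ by the full activity range, so proximity in descriptor space says little about activity near the threshold.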

Page 24: Chemical Similarity

Similarity vs. Activity

Black square: Salmonella mutagenicity of aromatic amines [Debnath et al. 1992] (log TA98)

Red circle: Glende et al. 2001 set: alkyl-substituted (ortho to the amino function) derivatives not included in the original Debnath data set

logP, Ehomo, Elumo

Similar compounds, relatively small data set

Page 25: Chemical Similarity

Similarity by atom environments vs. logP

Syracuse Research KOWWin training set, 2400 compounds

(diverse compounds, large data set)

Page 26: Chemical Similarity

Molecular representation requirements

  • Information preserving or allowing only controlled loss of information
  • Feature selection
      • By domain knowledge (e.g. receptor binding, any knowledge of mechanism of action)
      • By verification of the « neighbourhood » assumption
      • By feature selection methods
          • Examples: PCA, Entropy, Gini index, Kullback-Leibler distance, filter and wrapper methods
          • Compounds should cluster tightly within a class and be far apart for different classes
  • Combining different measures (consensus approach)
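The requirement that compounds cluster tightly within a class and lie far apart between classes can be turned into a simple filter-type score for ranking descriptors. The sketch below uses a Fisher-type ratio (squared difference of class means over the sum of within-class variances); this particular score is an assumption chosen for illustration and is not one of the methods named on the slide. The descriptor names and values are hypothetical.

```python
from statistics import mean, pvariance


def fisher_score(values_a, values_b):
    """(difference of class means)^2 / (sum of within-class variances) for one descriptor."""
    spread = pvariance(values_a) + pvariance(values_b)
    # Assumes at least one class has non-zero variance; otherwise this divides by zero.
    return (mean(values_a) - mean(values_b)) ** 2 / spread


# Hypothetical descriptor values for two activity classes.
descriptors = {
    "descriptor_1": ([1.0, 1.1, 0.9, 1.0], [3.0, 3.1, 2.9, 3.0]),  # tight and well separated
    "descriptor_2": ([1.0, 3.0, 2.0, 4.0], [1.5, 3.5, 2.5, 3.5]),  # broad and overlapping
}

ranking = sorted(descriptors, key=lambda name: fisher_score(*descriptors[name]), reverse=True)
print(ranking)  # ['descriptor_1', 'descriptor_2']
```

A score like this looks at one descriptor at a time, so it is a filter method in the sense of the slide; wrapper methods would instead evaluate descriptor subsets through the classifier itself.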

Page 27: Chemical Similarity

Structure is not the sole factor for biological activity

  • Interactions with environment
      • Solvation effects
      • Metabolism
      • Time dependence
      • More...
  • Biological activity in different species

Page 28: Chemical Similarity

Conclusions

  • Molecular similarity is relative
  • Molecular representation and similarity index have to account for the underlying biochemistry
  • Validation of the similarity formulation and its algorithmic solution is essential
  • “Neighbourhood” assumption has to be proven case by case

“As understanding of the chemistry and biology of drug action improves and a greater ability to model the underlying mechanisms appears, the need for ‘similarity’ approaches will diminish.”
Bender, A.; Glen, R. C. (2004). Molecular similarity: a key technique in molecular informatics. Org. Biomol. Chem., 2(22), 3204-3218

Page 29: Chemical Similarity

Case study

Page 30: Chemical Similarity

Case study

  • Tanimoto coefficient: 0.51 (i.e. < 0.80 border):
  • Chemist's view:
  • Type of effects:

DISSIMILAR, although sulphonamide moieties share similarities!

Page 31: Chemical Similarity

Nikolova N., Jaworska J., Approaches to Measure Chemical Similarity - a Review, QSAR Comb. Sci. 22 (2003) pp. 1006-1024
