Here we present a new method of classifying the similar molecules using
Molecular similarity searching methods in drug discovery
A Presentation in advanced graphical engineering systems seminar 2011/2012
By: Haytham Hijazi
Advisor: Univ-Prof. Hon-Prof. Dr. Dieter Roller
In this work, I propose a contribution to the field of cheminformatics. Cheminformatics means solving chemical problems using computational methods [1].
James Rhodes, Stephen Boyer, Jeffrey Kreulen, Ying Chen, Patricia Ordonez, “Mining patents using molecular similarity search”, IBM Almaden Services Research, Pacific Symposium on Biocomputing 12:304-315 (2007).
Agenda
• The main question in this research
• The principle of similarity
• Drug discovery as an application
• Research problem
• Molecular representations (1D, 2D…)
• Searching the similarity
• Similarity coefficients calculations
• The probabilistic model (BIM)
• The contribution (MDC)
• Experiments, conclusions and discussion
Shape Colour
Size Pattern
“The similarity is in the eye of the beholder”
Can we claim?
Question: Which molecules in a database are similar to the query molecule?
Application:
• Finding better compounds than the initial lead compound (drug discovery)
• Property prediction for unknown compounds
The main question
Structurally similar molecules are assumed to have similar biological properties.
Similar biological properties → drug discovery.
In our context…the principle
1. Sylvaine Roy and Laurence Lafanechère, “Chemogenomics and Chemical Genetics: A User's Introduction for Biologists, Chemists and Informaticians”, Molecular similarity, Springer Berlin, ISBN 978-3-642-19614-0, 1st Edition. 17.06.2011
[1]
Problems
Claim: General manufacturing problems!
The Map
Molecule representation
Feature selection
Similarity coefficients
calculations and ranking for search
Historical progression
◦ Complete structure
◦ Sub-structure
Descriptors
◦ 1D (physicochemical properties), 2D, 3D, and 4D
Connectivity tables and graph theory!
Molecular representation
Image Source: Karine Audouze, “Representation of molecular structures and structural diversity”, ChemoInformatics in Drug Discovery, 2009.
2D structure, line notation
CC(=O)OC1=CC=CC=C1C(=O)O
CCCC1=NN(C2=C1NC(=NC2=O)C3=C(C=CC(=C3)S(=O)(=O)N4CCN(CC4)C)OCC)C
SMILES: Simplified Molecular Input Line Entry System
SMILES
Source: Karine Audouze, “Representation of molecular structures and structural diversity”, ChemoInformatics in Drug Discovery, 2009.
A fingerprint is a vector encoding the presence (‘1’) or absence (‘0’) of FRAGMENT substructures in a molecule
Dictionary-based and hash-based fingerprints
2D Fingerprints - Structural key
Descriptor Fragment
1 AR
2 CCCCN
3 Me
9 NH2
2. Source: Karine Audouze, “Representation of molecular structures and structural diversity”, ChemoInformatics in Drug Discovery, 2009.
[1] [2]
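The two fingerprint styles named on this slide can be sketched as follows. This is a toy illustration rather than the presenter's implementation: a molecule is described only by the set of fragment labels it is assumed to contain (no real substructure matching is performed), the fragment dictionary reuses the four entries of the slide's table, and the function names, the 16-bit length, and the CRC32 hash are assumptions made for the example.

```python
import zlib

# Fragment dictionary taken from the slide's table (descriptor number -> fragment).
FRAGMENT_KEYS = {1: "AR", 2: "CCCCN", 3: "Me", 9: "NH2"}


def structural_key(fragments, keys=FRAGMENT_KEYS):
    """Dictionary-based fingerprint: one bit per predefined fragment."""
    return [1 if keys[k] in fragments else 0 for k in sorted(keys)]


def hashed_fingerprint(fragments, n_bits=16):
    """Hash-based fingerprint: each fragment is hashed to a bit position.

    Different fragments may collide on the same bit, which is the usual
    trade-off of hashed fingerprints.
    """
    bits = [0] * n_bits
    for frag in fragments:
        bits[zlib.crc32(frag.encode("utf-8")) % n_bits] = 1
    return bits


# Hypothetical molecule, described only by the fragments it is assumed to contain.
molecule_fragments = {"AR", "NH2"}
key_bits = structural_key(molecule_fragments)
hash_bits = hashed_fingerprint(molecule_fragments)
print(key_bits)
print(hash_bits)
```

A structural key needs a fixed dictionary agreed on in advance, while the hashed variant accepts any fragment at the cost of possible bit collisions.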
3D fingerprint topology
In 3D keys, the position of each bit corresponds to a certain range of distances or angles.
Computationally complex
Source: Karine Audouze, “Representation of molecular structures and structural diversity”, ChemoInformatics in Drug Discovery, 2009.
The Map
Similarity coefficients
calculations and ranking for search
Molecule representation
Feature selection
Exact structure search Structure search
Substructure search
Similarity searching: maximum common subgraph isomorphism, Tanimoto/Dice/Cosine coefficients
Searching the similarity
The similarity measure (coefficient) is a quantitative measure of similarity
Used to rank the results of the query
Results are sorted in decreasing order of similarity
Searching the similarity
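The ranking step described on this slide can be sketched as below. It is a minimal illustration under stated assumptions: fingerprints are represented as sets of on-bit positions, the Tanimoto coefficient is used as the similarity measure, and the database contents and function names are made up for the example.

```python
def tanimoto(x, y):
    """Tanimoto similarity of two fingerprints given as sets of on-bit positions."""
    union = len(x | y)
    return len(x & y) / union if union else 0.0


def rank_by_similarity(query, database):
    """Return (name, score) pairs sorted by decreasing similarity to the query."""
    scored = [(name, tanimoto(query, fp)) for name, fp in database.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical toy database: names and bit positions are invented for illustration.
database = {
    "mol_1": {1, 4, 7},
    "mol_2": {1, 2, 3, 4},
    "mol_3": {8, 9},
}
query = {1, 2, 4}
ranking = rank_by_similarity(query, database)
for name, score in ranking:
    print(name, round(score, 2))
```

Any other coefficient could replace `tanimoto` here, provided distance measures are sorted in increasing rather than decreasing order.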
Distance coefficients. Probabilistic coefficients. Correlation coefficients. Association coefficients.
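Notation: a = number of bits set in A, b = number of bits set in B, c = number of bits set in both, d = number of bits set in neither, n = a + b - c + d (fingerprint length)
Associative
Simple matching coefficient: $(c+d)/(a+b-c+d)$
Jaccard measure (Tanimoto): $c/(a+b-c)$ (AND/OR)
Cosine, Ochiai: $c/\sqrt{ab}$
Dice: $2c/(a+b)$
Distance
Hamming distance: $a+b-2c$
Euclidean distance: $\sqrt{a+b-2c}$
Soergel distance: $(a+b-2c)/(a+b-c)$
Other coefficients
Pattern difference: $(a-c)(b-c)/n^2$
Size difference: $(a-b)^2/n^2$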
More coefficients !
Naomie Salim, “The study of probability model for compound similarity searching”, UTM Research Management Centre Project Vote – 75207, University of Malaysia, 2009
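As a quick check of the coefficient table, the sketch below evaluates the association and distance coefficients from bit counts. It assumes the notation in which a and b are the numbers of bits set in A and B, c is the number of bits set in both, and d is the number set in neither. The function name is hypothetical, and the sample counts (13, 8, 6) are those of the deck's worked Tanimoto example, with d = 17 derived from its 32-bit fingerprints.

```python
import math


def coefficients(a, b, c, d):
    """Similarity and distance coefficients from bit counts.

    Assumed notation: a, b = bits set in A, B; c = bits set in both;
    d = bits set in neither.
    """
    n = a + b - c + d  # fingerprint length
    return {
        "simple_matching": (c + d) / n,
        "tanimoto": c / (a + b - c),
        "cosine": c / math.sqrt(a * b),
        "dice": 2 * c / (a + b),
        "hamming": a + b - 2 * c,
        "euclidean": math.sqrt(a + b - 2 * c),
        "soergel": (a + b - 2 * c) / (a + b - c),
    }


values = coefficients(a=13, b=8, c=6, d=17)
for name, value in values.items():
    print(f"{name}: {value:.4f}")
```

For binary fingerprints the Soergel distance is the complement of the Tanimoto coefficient, so the two values printed here sum to 1.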
Assume we generate fragment-based fingerprint bits:
Molecule A:00010100010101000101010011110100
Molecule B:00000000100101001001000011100000
Tanimoto coefficient = $c/(a+b-c)$, where a and b are the numbers of bits set in A and B, and c is the number of bits set in A AND B
Tanimoto = $6/(13+8-6) = 6/15 = 0.4$
Example
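The example can be reproduced directly from the two bit strings. This is a minimal sketch: the function names are chosen for illustration, and the counts are obtained by converting each string to an integer and applying a bitwise AND.

```python
A = "00010100010101000101010011110100"
B = "00000000100101001001000011100000"


def on_bits(bits):
    """Number of '1' characters in a bit string."""
    return bits.count("1")


def tanimoto_from_bitstrings(x, y):
    """Return (a, b, c, Tanimoto) for two equal-length bit strings."""
    a = on_bits(x)
    b = on_bits(y)
    # Bits set in both fingerprints: bitwise AND of the integer forms.
    c = bin(int(x, 2) & int(y, 2)).count("1")
    return a, b, c, c / (a + b - c)


a, b, c, score = tanimoto_from_bitstrings(A, B)
print(a, b, c, score)
```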
Associate the relevance of a structure to an explicit feature
pi = probability that bit bi appears in an active structure.
qi = probability that bit bi appears in an inactive structure.
αi is a binary selector: αi = 1 means the bit occurs in the structure; otherwise αi = 0 and the bit is treated as absent (negated).
P(A|S) is the probability of an active structure given S.
P(NA|S) is the probability of an inactive structure given S.
P(A) is the probability of actives.
P(NA) is the probability of inactives.
A probabilistic model (BIM)
Naomie Salim, “The study of probability model for compound similarity searching”, UTM Research Management Centre Project Vote – 75207, University of Malaysia, 2009
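The slide's formula did not survive extraction, so the sketch below uses the textbook form of the binary independence model, which is assumed (not confirmed) to match what was shown: the odds P(A|S)/P(NA|S) equal the prior odds P(A)/P(NA) multiplied, bit by bit, by pi/qi when αi = 1 and by (1 - pi)/(1 - qi) when αi = 0. The function name, the default prior, and the probability values are hypothetical.

```python
import math


def bim_log_odds(alpha, p, q, prior_active=0.5):
    """Log of P(A|S) / P(NA|S) under the binary independence assumption."""
    score = math.log(prior_active / (1.0 - prior_active))
    for a_i, p_i, q_i in zip(alpha, p, q):
        if a_i:
            # Bit present in the structure.
            score += math.log(p_i / q_i)
        else:
            # Bit absent: use the complementary probabilities.
            score += math.log((1.0 - p_i) / (1.0 - q_i))
    return score


# Hypothetical per-bit probabilities, invented for illustration.
p = [0.8, 0.6, 0.1]  # P(bit i set | active)
q = [0.2, 0.5, 0.4]  # P(bit i set | inactive)
likely_active = bim_log_odds([1, 1, 0], p, q)
likely_inactive = bim_log_odds([0, 0, 1], p, q)
print(likely_active, likely_inactive)
```

A positive score means the structure is judged more likely active than inactive, so database structures can be ranked by this score in the same way as by a similarity coefficient.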
Problems again
Claim: General manufacturing problems !
My proposed hybrid search design: Molecular Dynamic Classification method (MDC)
[Diagram components: active compounds; database (Class 1, Class 2, Class n); molecular dynamics simulation tool; physicochemical properties; classification algorithm; voting]
Better insight into similarity in terms of bioactivity, toxicity, reactivity... (+)
Search time (+)
Prediction and voting possibilities (+)
Cost of simulation tools (-)
Classification errors (-)
MDC discussion
Materials Explorer
Itemtracker -Freezer/Cryogen sample tracking system
CHARMM
MDynaMix
Simulation tools
Fingerprint generation time experiment
Data source: simulating tool indicated in the report [17]
Consider if we have more than 1000 bits!
[Figure: Fingerprint generation time. x-axis: max path length (series: 2 bits, 3 bits, 4 bits); y-axis: time (ms), 0 to 30]
Hit rate experiment
[Figure: Hit rate. x-axis: selection size, 0 to 2500; y-axis: hit rate, 0 to 0.18]
Data source: simulating tool indicated in the report [17]
The more we increase the size of features, the lower the hit rate of finding actives.
Even fragment-based fingerprint generation is time-consuming
Probabilistic models and machine learning introduced substantial changes
Mixing more than one type of descriptor seems efficient in terms of both time and result quality
Experimental results are still needed
General evaluation and conclusions
Thank you for listening
Haytham Hijazi