CZ3253: Computer Aided Drug Design
Drug Design Methods I: QSAR
Prof. Chen Yu Zong
Tel: 6874-6877
Email: csccyz@nus.edu.sg
http://xin.cz3.nus.edu.sg
Room 07-24, level 7, SOC1, National University of Singapore


Page 1: CZ3253: Computer Aided Drug design Drug Design Methods I: QSAR Prof. Chen Yu Zong Tel: 6874-6877 Email: csccyz@nus.edu.sg  Room


Terminology

• SAR (Structure-Activity Relationships)
  – Circa 19th century?
• QSAR (Quantitative Structure-Activity Relationships)
  – Specific to some biological/pharmaceutical function of a molecule (Absorption, Distribution/Digestion, Metabolism, Excretion)
  – Crum-Brown and Fraser (1868-9)
    • 'constitution' related to biological response
  – LogP
• QSPR (Quantitative Structure-Property Relationships)
  – Relate structure to any physical-chemical property of a molecule

Statistical Models

• Simple
  – Mean, median and variation
  – Regression
• Advanced
  – Validation methods
  – Principal components, covariance
  – Multiple Regression

QSAR, QSPR

Modern QSAR

– Hansch et al. (1963)
  • Activity depends on 'travel through the body': partitioning between varied solvents
– C (minimum dosage required)
– π (hydrophobicity)
– σ (electronic)
– Es (steric)

$\log(1/C) = a\pi + b\pi^{2} + c\sigma + dE_s + \text{const.}$
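As a rough illustration of how a Hansch-type relation is evaluated, the sketch below assumes the form log(1/C) = aπ + bπ² + cσ + dEs + const. The function names and every coefficient value are hypothetical, chosen only to make the arithmetic easy to follow; they are not fitted to any data set.

```python
def hansch_log_inv_c(pi, sigma, es, a, b, c, d, const):
    """Evaluate log(1/C) = a*pi + b*pi**2 + c*sigma + d*Es + const."""
    return a * pi + b * pi ** 2 + c * sigma + d * es + const


def optimal_pi(a, b):
    """Hydrophobicity at which the parabolic term a*pi + b*pi**2 peaks (needs b < 0)."""
    return -a / (2.0 * b)


# Hypothetical coefficients, for illustration only.
coeffs = dict(a=1.0, b=-0.5, c=0.8, d=0.3, const=2.0)
activity = hansch_log_inv_c(pi=1.0, sigma=0.2, es=-0.5, **coeffs)
best_pi = optimal_pi(coeffs["a"], coeffs["b"])
```

With a negative coefficient on π², the model has an optimum hydrophobicity, which is one reason the squared term is usually included.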

Choosing Descriptors

• Buffon's Problem
  – Needle Length?
  – Needle Color?
  – Needle Composition?
  – Needle Sheen?
  – Needle Orientation?

Choosing Descriptors

• Constitutional
  – MW, number of atoms of each element
• Topological
  – Connectivity, Wiener index (sums of bond distances)
  – 2D fingerprints (bit-strings)
  – 3D topographical indices, pharmacophore keys
• Electrostatic
  – Polarity, polarizability, partial charges
• Geometrical Descriptors
  – Length, width, molecular volume
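To make one topological descriptor concrete, here is a minimal sketch of the Wiener index, taken as the sum of shortest-path bond counts over all pairs of heavy atoms. The adjacency-dictionary input format and the two hydrogen-suppressed example graphs are assumptions made for illustration.

```python
from collections import deque


def wiener_index(adjacency):
    """Sum of shortest-path bond counts over all unordered atom pairs."""
    total = 0
    for start in adjacency:
        # Breadth-first search gives bond distances from `start`.
        dist = {start: 0}
        queue = deque([start])
        while queue:
            atom = queue.popleft()
            for neighbour in adjacency[atom]:
                if neighbour not in dist:
                    dist[neighbour] = dist[atom] + 1
                    queue.append(neighbour)
        total += sum(dist.values())
    return total // 2  # every pair was counted from both ends


# Carbon skeletons only (hydrogens suppressed).
n_butane = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
isobutane = {0: [1, 2, 3], 1: [0], 2: [0], 3: [0]}
```

The two isomers have the same molecular weight but different Wiener indices, which is the kind of distinction a topological descriptor is meant to capture.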

Choosing Descriptors

• Chemical
  – Hydrophobicity (LogP)
  – HOMO and LUMO energies
  – Vibrational frequencies
  – Bond orders
  – Total energy
  – GSH

Statistical Methods

• 1-D analysis
• Large dimension sets require decomposition techniques
  – Multiple Regression
  – PCA
  – PLS
• Connecting a descriptor with a structural element so as to interpolate and extrapolate data

Simple Error Analysis (1-D)

• Given N data points
  – Mean
  – Variance
  – Regression

[Figure: regression plots of calculated against observed values; axes x_obs, x_calc and y_obs, y_calc]

$R = \dfrac{\mathrm{Cov}(X,Y)}{\mathrm{Std}(X)\,\mathrm{Std}(Y)}$

Simple Error Analysis (1-D)

• Given N data points
  – Regression

$\Delta y_i = y_i^{obs} - y_i^{calc}$ (residual)

[Figure: regression plot with residual marked; axes x_obs, x_calc and y_obs, y_calc]

Simple Error Analysis (1-D)

• Given N data points
  – (Poor) $0 < R^2 < 1$ (Good)

$R = \dfrac{\mathrm{Cov}(X,Y)}{\mathrm{Std}(X)\,\mathrm{Std}(Y)}$ (correlation between fluctuations)

$SSR = \sum_{i}^{N} (y_{calc,i} - y_i)^2$

$\mathrm{Cov}(X,Y) = \dfrac{1}{N}\sum_{i=1}^{N} (X_i - \bar{X})(Y_i - \bar{Y})$

$\mathrm{Std}(Y) = \sqrt{\dfrac{1}{N}\sum_{i=1}^{N} (Y_i - \bar{Y})^2}$
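The 1-D quantities above can be sketched in a few lines of plain Python. This is a minimal sketch assuming the 1/N normalisation for covariance and standard deviation; the correlation coefficient R is unaffected by that choice because the factor cancels. The function names and the sample values are illustrative.

```python
import math


def mean(values):
    return sum(values) / len(values)


def covariance(x, y):
    """Cov(X, Y) with 1/N normalisation."""
    mx, my = mean(x), mean(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / len(x)


def std(values):
    """Standard deviation as the square root of Cov(X, X)."""
    return math.sqrt(covariance(values, values))


def correlation(x, y):
    """R = Cov(X, Y) / (Std(X) * Std(Y))."""
    return covariance(x, y) / (std(x) * std(y))


x = [1.0, 2.0, 3.0, 4.0]
r_up = correlation(x, [2.0, 4.0, 6.0, 8.0])    # perfectly correlated
r_down = correlation(x, [8.0, 6.0, 4.0, 2.0])  # perfectly anti-correlated
```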

Correlation vs. Dependence?

• Correlation
  – Two or more variables/descriptors may correlate to the same property of a system
• Dependence
  – When the correlation can be shown to be due to one variable changing because of a change in the other
• Example: an elephant's head and legs
  – Correlation exists between the size of the head and the size of the legs
  – The size of one does not depend on the size of the other

Quantitative Structure Activity/Property Relationships (QSAR, QSPR)

• Discern relationships between multiple variables (descriptors)
• Identify connections between structural traits (type of subunits, bond angles, local components) and descriptor values (e.g. activity, LogP, % denatured)

Pre-Qualifications

• Size
  – Minimum of FIVE samples per descriptor
• Verification
  – Variance
  – Scaling
  – Correlations

QSAR/QSPR Pre-Qualifications

• Variance
  – Coefficient of Variation ("spread")

$CV = \dfrac{\text{Standard Deviation}}{\text{Mean}} = \dfrac{\sigma_x}{\bar{x}}$

QSAR/QSPR Pre-Qualifications

• Scaling
  – Standardizing or normalizing descriptors to ensure they have equal weight (in terms of magnitude) in subsequent analysis

QSAR/QSPR Pre-Qualifications

• Scaling
  – Unit Variance (Auto Scaling): $x_i' = x_i / \sigma$, so that $\sigma' = 1$
  – Ensures equal statistical weights (initially)
  – Mean Centering: $x_i' = x_i - \bar{x}$, so that $\bar{x}' = 0$
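The variance and scaling checks can be sketched as follows. This is a minimal sketch assuming the population (1/N) standard deviation; the function names and the `logp` values are hypothetical and serve only to show the effect of each transformation.

```python
import math


def mean(values):
    return sum(values) / len(values)


def std(values):
    """Population standard deviation (1/N normalisation)."""
    m = mean(values)
    return math.sqrt(sum((v - m) ** 2 for v in values) / len(values))


def coefficient_of_variation(values):
    """Spread relative to the mean: sigma / mean."""
    return std(values) / mean(values)


def mean_center(values):
    """x' = x - mean, so the scaled descriptor has mean 0."""
    m = mean(values)
    return [v - m for v in values]


def unit_variance(values):
    """x' = x / sigma, so the scaled descriptor has sigma 1."""
    s = std(values)
    return [v / s for v in values]


# Hypothetical descriptor column.
logp = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
cv = coefficient_of_variation(logp)
centered = mean_center(logp)
scaled = unit_variance(logp)
```

Applying both transformations in sequence gives descriptors with zero mean and unit variance, so no descriptor dominates a later analysis purely because of its units.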

QSAR/QSPR Pre-Qualifications

• Correlations
  – Remove correlated descriptors
  – Keep correlated descriptors so as to reduce data set size
  – Apply math operation to remove correlation (PCR)

Correlation between the i-th and j-th descriptors:

$R_{i,j} = \dfrac{\mathrm{Cov}(X_i, X_j)}{\mathrm{Std}(X_i)\,\mathrm{Std}(X_j)} = \dfrac{\sum_{k}^{M} (X_{i,k} - \bar{X}_i)(X_{j,k} - \bar{X}_j)}{\sqrt{\sum_{k}^{M} (X_{i,k} - \bar{X}_i)^2 \, \sum_{k}^{M} (X_{j,k} - \bar{X}_j)^2}}$

$-1 \le r_{ij} \le 1$

OVERREPRESENTATION:
  $r_{ij} = 1$ (100% positive correlation)
  $r_{ij} = -1$ (100% negative correlation)
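One way to act on the "remove correlated descriptors" option is a greedy filter over pairwise correlation coefficients. The sketch below is only one possible scheme: the 0.9 threshold, the function names, and the descriptor values are all assumptions for illustration.

```python
import math


def pearson(x, y):
    """Correlation coefficient between two descriptor columns."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def drop_correlated(descriptors, threshold=0.9):
    """Keep a descriptor only if |r| with every already-kept one is below threshold."""
    kept = []
    for name, values in descriptors.items():
        if all(abs(pearson(values, descriptors[k])) < threshold for k in kept):
            kept.append(name)
    return kept


# Hypothetical descriptor table: one list of values per descriptor.
descriptors = {
    "mol_weight": [100.0, 120.0, 140.0, 160.0, 180.0],
    "n_atoms": [10.0, 12.0, 14.0, 16.0, 18.0],  # proportional to mol_weight
    "logp": [1.2, 0.4, 2.1, 0.9, 1.5],
}
kept = drop_correlated(descriptors)
```

The result depends on the order in which descriptors are visited, so in practice the ordering (for example, by interpretability) is itself a modelling decision.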

QSAR/QSPR Pre-Qualifications

• Correlations

QSAR/QSPR Scheme

• Goal
  – Predict what happens next (extrapolate)!
  – Predict what happens between data points (interpolate)!

QSAR/QSPR Scheme

• Types of Variable
  – Continuous
    • Concentration, occupied volume, partition coefficient, hydrophobicity
  – Discrete
    • Structural (1: methyl group substituted, 0: no methyl group substitution)

QSAR/QSPR Principal Components Analysis

• Reduces dimensionality of descriptors
• Principal components are a set of vectors representing the variance in the original data

Principal components – reducing the dimensionality of a dataset

[Figure: scatter plot of y against x]

Clearly there is a relationship between x and y: a high correlation. We can define a new variable z = x + y such that we can express most of the variation in the data as the new variable z. This new variable is a principal component.

$p_i = \sum_{j=1}^{v} c_{i,j}\, x_j$

$p_i$ is the i-th principal component and $c_{i,j}$ is the coefficient of the variable $x_j$. There are v such variables.

QSAR/QSPR-Principal Components Analysis

• Geometric Analogy (3-D to 2-D PCA)

[Figure: coordinate axes x, y, z]

$\tilde{O} = \begin{pmatrix} x_1 & x_2 & \dots & x_N \\ y_1 & y_2 & \dots & y_N \\ z_1 & z_2 & \dots & z_N \end{pmatrix}$

Principal components

PCA is the transformation of a set of correlated variables to a set of orthogonal, uncorrelated variables called principal components. These new variables are linear combinations of the original variables, in decreasing order of importance.

$Y_{ik} = \bar{Y}_i + \sum_{p=1} b_{ip}\, t_{pk} + r_{ik}$

Labels on the equation: data matrix; loadings (measure of the variation between variables); scores (measure of the variation between samples); eigenvalue.

QSAR/QSPR Principal Components Analysis

• Formulate matrix
• Diagonalize matrix
• Eigenvectors are the principal components
  – These principal components (new descriptors) are a linear combination of the original descriptors
• Eigenvalues represent variance
  – Largest accounts for greatest % of data variance
  – Next corresponds to second greatest, and so on

QSAR/QSPR-Principal Components Analysis

• Formulate matrix (several types)
  – Correlation or covariance (N x P)
    • N is number of molecules
    • P is number of descriptors
  – Variance-Covariance matrix (N x N)
• Diagonalize (rotate) matrix

$\tilde{A} = \begin{pmatrix} r_{11} & r_{12} & \dots & r_{1p} \\ r_{21} & r_{22} & \dots & r_{2p} \\ \vdots & \vdots & & \vdots \\ r_{n1} & r_{n2} & \dots & r_{np} \end{pmatrix}, \qquad A_{vc} = A^{T} A$

QSAR/QSPR-Principal Components Analysis

• Eigenvectors (Loadings)
  – Represent the contribution from each original descriptor to a PC (new descriptor)
    • # columns = # of descriptors
    • # rows = # of descriptors OR # of molecules
• Eigenvalues
  – Indicate which PC is most important (most representative of the original descriptors)
    • Benzene has 2 non-zero and 1 zero eigenvalue (planar)

QSAR/QSPR-Principal Components Analysis

• Scores
  – Graphing each object/molecule in the space of 2 or more PCs
    • # rows = # of objects/molecules
    • # columns = # of descriptors OR # of molecules

For benzene, this corresponds to a graph in the PC1 (x') and PC2 (y') system.

Principal components

[Figure: data plotted on x and y axes with the PC1 and PC2 directions overlaid]

The PCs each maximise the variance in the data in orthogonal directions and are ordered by size.

Usually only a few components are needed to explain (>90%) of the variance in the data, or the properties are not relevant.

The first step is to calculate the variance-covariance matrix from the data.

Principal components

[Figure: data plotted on x and y axes with the PC1 and PC2 directions overlaid]

If there are s observations, each of which contains v values, the data can be represented by a matrix D with v rows and s columns.

The variance-covariance matrix is $Z = D^{T}D$.

The eigenvectors of Z are the principal components. Z is a square symmetric matrix, so the eigenvectors are orthogonal. Usually the matrix is diagonalised to obtain the eigenvectors (the weightings for the properties) and eigenvalues (the explained variance).
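For the two-variable case (correlated x and y), the diagonalisation can be written out in closed form. The following is a minimal sketch rather than a general PCA routine: it assumes mean-centred data, a 2 x 2 variance-covariance matrix over the two properties with 1/N normalisation, and illustrative sample values. The name `pca_2d` is hypothetical.

```python
import math


def pca_2d(x, y):
    """PCA for two variables via the 2x2 variance-covariance matrix."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    var_x = sum((xi - mx) ** 2 for xi in x) / n
    var_y = sum((yi - my) ** 2 for yi in y) / n
    cov_xy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / n

    # Eigenvalues of the symmetric matrix [[var_x, cov_xy], [cov_xy, var_y]].
    half_trace = (var_x + var_y) / 2.0
    root = math.sqrt(((var_x - var_y) / 2.0) ** 2 + cov_xy ** 2)
    eigenvalues = (half_trace + root, half_trace - root)

    # Eigenvector of the largest eigenvalue; handle the uncorrelated case.
    if cov_xy == 0.0:
        pc1 = (1.0, 0.0) if var_x >= var_y else (0.0, 1.0)
    else:
        vx, vy = cov_xy, eigenvalues[0] - var_x
        norm = math.hypot(vx, vy)
        pc1 = (vx / norm, vy / norm)
    pc2 = (-pc1[1], pc1[0])  # orthogonal to pc1

    # Scores: each observation projected onto the two components.
    scores = [((xi - mx) * pc1[0] + (yi - my) * pc1[1],
               (xi - mx) * pc2[0] + (yi - my) * pc2[1])
              for xi, yi in zip(x, y)]
    return eigenvalues, (pc1, pc2), scores


x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [1.1, 1.9, 3.2, 3.9, 5.1]
eigenvalues, (pc1, pc2), scores = pca_2d(x, y)
explained = eigenvalues[0] / sum(eigenvalues)
```

For this data the first component weights x and y almost equally, which matches the intuition that z = x + y captures most of the variation.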

Principal components

The output looks like this:

Eigenvalues (explain % variance):
        80    10    5     3     2

Properties:
  p1    .2    .3    .4    .1    .1
  p2    .01   .02   .3    .4    .5
  p3    .02   .03   .1    .2    .4
  p4    .03   .4    .4    .04   .3
  p5    .3    .5    .5    .05   .3

Multiply the property value for the molecule by this for each eigenvalue.

Can do regression on the PCs, e.g. V = 0.3PC1(0.1) + 0.2PC2(0.1) + 0.4(0.2)

So we've reduced a 5-property problem to a two-property problem.

QSAR on SYBYL (Tripos Inc.)

[Screenshots of the SYBYL QSAR module; the recoverable annotations follow.]

• 10-D → 3-D
• Eigenvalues ⇒ explanation of variance in data
• Each point corresponds to a column (# points = # descriptors) in the original data
  – Proximity ⇒ correlation
• Each point corresponds to a row of the original data (i.e. # points = # molecules), or a graph of molecules in PC space
  – Labelled points: He, Naphthalene, H2O
  – Molecular size: small acting big
  – Proximity ⇒ similarity
• Outlier

QSAR/QSPR-Regression Types

• Principal Component Analysis


Non-Linear Mappings

• Calculate "distance" between points in N-dimensional descriptor/parameter space
  – Euclidean
  – City-block distances
• Randomly assign compounds in set to points on a 2-D or 3-D space
• Minimize difference (optimal N-D → 2-D plot)
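The quantity being minimised can be sketched as a mismatch between pairwise distances in descriptor space and in the low-dimensional map. The particular error function below (a plain sum of squared distance differences) is an assumption made for illustration, as are the function names and the example coordinates; published mapping schemes often weight the terms differently.

```python
import math
from itertools import combinations


def euclidean(p, q):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def city_block(p, q):
    return sum(abs(a - b) for a, b in zip(p, q))


def mapping_error(high_dim, low_dim, distance=euclidean):
    """Sum of squared differences between pairwise distances in the
    original descriptor space and in the low-dimensional map."""
    error = 0.0
    for i, j in combinations(range(len(high_dim)), 2):
        d_high = distance(high_dim[i], high_dim[j])
        d_low = euclidean(low_dim[i], low_dim[j])
        error += (d_high - d_low) ** 2
    return error


# Three hypothetical compounds in a 3-D descriptor space.
compounds = [(0.0, 0.0, 0.0), (3.0, 4.0, 0.0), (3.0, 4.0, 12.0)]
good_map = [(0.0, 0.0), (5.0, 0.0), (5.0, 12.0)]  # preserves all distances
poor_map = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]   # e.g. an arbitrary start
```

An optimiser would start from something like `poor_map` and move the 2-D points to reduce `mapping_error`.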

Non-Linear Mappings

• Advantages
  – Non-linear
  – No assumptions!
  – Chance groupings unlikely (a 2-D group is likely an N-D group)
• Disadvantages
  – Dependence on initial guess (use PCA scores to improve)

QSAR/QSPR-Regression Types

• Multiple Regression (MR)
• PCR
• PLS

QSAR/QSPR-Regression Types

• Linear Regression
  – Minimize difference between calculated and observed values (residuals)

$y = mx + b$

$m = \dfrac{\sum_{i=1}^{N} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{N} (x_i - \bar{x})^2}, \qquad b = \bar{y} - m\bar{x}$

• Multiple Regression

$y = \sum_{i=1}^{N} m_i x_i + B$
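The closed-form slope and intercept for a single descriptor translate directly into code. This is a minimal sketch with a hypothetical function name and made-up data that lie exactly on y = 2x + 1.

```python
def fit_line(xs, ys):
    """Least-squares slope m and intercept b for y = m*x + b."""
    n = len(xs)
    x_mean, y_mean = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    m = sxy / sxx
    b = y_mean - m * x_mean
    return m, b


xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]
m, b = fit_line(xs, ys)
```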

QSAR/QSPR-Regression Types

• Principal Component Regression
  – Regression, but with principal components substituted for the original descriptors/variables

QSAR/QSPR-Regression Types

• Partial Least Squares
  – Cross-validation determines number of descriptors/components to use
  – Derive equation
  – Use bootstrapping and t-test to test coefficients in QSAR regression

QSAR/QSPR-Regression Types

• Partial Least Squares (a.k.a. Projection to Latent Structures)
  – Regression of a regression
    • Provides insight into variation in the x's ($b_{ij}$'s, as in PCA) AND the y's ($a_i$'s)
  – The $t_i$'s are orthogonal
  – M = (# of variables/descriptors OR # of observations/molecules, whichever is smaller)

$y = \sum_{i}^{N} a_i t_i, \qquad t_i = \sum_{j}^{M} b_{ij} x_j$

QSAR/QSPR-Regression Types

• PLS is NOT MR or PCR in practice
  – PLS is MR w/ cross-validation
  – PLS is faster
    • It couples the target representation (QSAR generation) and component generation, while PCA and PCR are separate
• PLS is well suited to multivariate problems

QSAR/QSPR Post-Qualifications

• Confidence in Regression
  – TSS: Total Sum of Squares
  – ESS: Explained Sum of Squares
  – RSS: Residual Sum of Squares

$TSS = ESS + RSS$

$R^2 = \dfrac{ESS}{TSS}$: 1 (100% explanation of data), 0 (no explanation of data)

$TSS = \sum_{i}^{N} (y_i - \bar{y})^2, \qquad ESS = \sum_{i}^{N} (y_{calc,i} - \bar{y})^2, \qquad RSS = \sum_{i}^{N} (y_i - y_{calc,i})^2$
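The three sums of squares can be checked numerically for a least-squares line, for which TSS = ESS + RSS holds. The sketch below uses a hypothetical `fit_line` helper and made-up data.

```python
def fit_line(xs, ys):
    """Least-squares slope and intercept for y = m*x + b."""
    n = len(xs)
    x_mean, y_mean = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    m = sxy / sxx
    return m, y_mean - m * x_mean


xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

m, b = fit_line(xs, ys)
y_calc = [m * x + b for x in xs]
y_mean = sum(ys) / len(ys)

tss = sum((y - y_mean) ** 2 for y in ys)                # total
ess = sum((yc - y_mean) ** 2 for yc in y_calc)          # explained
rss = sum((y - yc) ** 2 for y, yc in zip(ys, y_calc))   # residual
r_squared = ess / tss
```

Because the decomposition relies on the least-squares fit, `ess / tss` and `1 - rss / tss` agree here; for models fitted another way they can differ.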

QSAR/QSPR Post-Qualifications

• Confidence in Prediction (Predictive Error Sum of Squares)

$Q^2 = 1 - \dfrac{PRESS}{\sum_{i=1}^{N} (y_i - \bar{y})^2}, \qquad PRESS = \sum_{i=1}^{N} (y_i - y_{calc,i})^2$

QSAR/QSPR Post-Qualification

• Bias?
  – Bootstrapping
• Choosing best model?
  – Cross Validation

QSAR/QSPR Post-Qualification

• Bootstrapping
  – ASSUME calculated data is experimental/observed data
  – Randomly choose N data points (allowing multiple picks of the same data point)
  – Re-generate parameters/regression
  – Repeat M times (M is typically 50-100)
  – Average over the M bootstraps
  – Compare (calculate residual)
    • If close to zero, then no bias
    • If large, then bias exists
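The bootstrap procedure can be sketched for a single-descriptor regression. This is a sketch under stated assumptions: the resampled quantity is the regression slope, (x, y) pairs are drawn together, degenerate resamples with a single distinct x value are redrawn, and the seed, the data, and the function names are illustrative.

```python
import random


def fit_line(xs, ys):
    """Least-squares slope and intercept for y = m*x + b."""
    n = len(xs)
    x_mean, y_mean = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    m = sxy / sxx
    return m, y_mean - m * x_mean


def bootstrap_slopes(xs, ys, n_boot=100, seed=0):
    """Refit the line on n_boot resamples drawn with replacement."""
    rng = random.Random(seed)
    n = len(xs)
    slopes = []
    while len(slopes) < n_boot:
        picks = [rng.randrange(n) for _ in range(n)]  # same point may repeat
        bx = [xs[i] for i in picks]
        by = [ys[i] for i in picks]
        if len(set(bx)) < 2:
            continue  # cannot fit a line through a single x value; redraw
        slopes.append(fit_line(bx, by)[0])
    return slopes


xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

slope, _ = fit_line(xs, ys)
slopes = bootstrap_slopes(xs, ys)
bias = sum(slopes) / len(slopes) - slope  # near zero suggests little bias
```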

QSAR/QSPR Post-Qualification

• Cross-Validation (used in PLS)
  – Remove one or more pieces of input data
  – Re-derive QSAR equation
  – Calculate omitted data
  – Compute root-mean-square error to evaluate efficacy of model
    • Typically 20% of data is removed for each iteration
    • The model with the lowest RMS error has the optimal number of components/descriptors
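A leave-one-out version of this loop, combined with the PRESS-based Q² statistic, might look like the following. It is a minimal sketch for a single-descriptor linear model; with five data points, removing one point per iteration happens to match the 20% figure. Function names and data are hypothetical.

```python
import math


def fit_line(xs, ys):
    """Least-squares slope and intercept for y = m*x + b."""
    n = len(xs)
    x_mean, y_mean = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    m = sxy / sxx
    return m, y_mean - m * x_mean


def leave_one_out(xs, ys):
    """Return (Q^2, PRESS, RMS error) from leave-one-out cross-validation."""
    n = len(xs)
    press = 0.0
    for i in range(n):
        # Re-derive the equation without point i, then predict point i.
        m, b = fit_line(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        press += (ys[i] - (m * xs[i] + b)) ** 2
    y_mean = sum(ys) / n
    tss = sum((y - y_mean) ** 2 for y in ys)
    return 1.0 - press / tss, press, math.sqrt(press / n)


xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
q2, press, rms = leave_one_out(xs, ys)
```

Q² is lower than the fitted R² for the same data because each prediction comes from a model that never saw the omitted point.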

QSPR Example

• Relation between musk odorant properties and benzenoid structure
  – Training set of 148 compounds (81 non-musk and 67 musk)
  – 47 chemical descriptors initially
  – Pre-qualifications
    • Correlations (47 - 12 = 35)
  – Post-qualifications
    • Bootstrapping
    • Test set
      – 6/6 musks, 8/9 non-musks

Narvaez, J. N., Lavine, B. K. and Jurs, P. C. Chemical Senses, 11, 145-156 (1986)

Practical Issues

• 10 times as many compounds as parameters fit
• 3-5 compounds per descriptor
• Traditional QSAR
  – Good for activity prediction
  – Not good for determining whether activity is due to binding or transport

Advanced Methods

• Neural Networks
• Support Vector Machines
• Genetic/Evolutionary Algorithms
• Monte Carlo
• Alternate descriptors
  – Reduced graphs
  – Molecular connectivity indices
  – Indicator variables (0 or 1)
• Combinatorics (e.g. multiple substituent sites)

Tools Available

• Sybyl (Tripos Inc.)
• Insight II (Accelrys Inc.)
• Pole Bio-Informatique Lyonnais
  – http://pbil.univ-lyon1.fr/
• Molecular Biology
  – http://www.infobiogen.fr/services/deambulum/english/logiciels.html

Summary

• QSAR/QSPR
  – Statistics connect structure/behavior with observables
  – Interpolate/Extrapolate
• Multi-Variate Analysis
  – Pre-Qualification
  – Regression
    • PCA
    • PLS
    • MLS
  – Post-Qualification