Quantitative Structure-Activity Relationship (QSAR)

Ahmad Reza Mehdipour 07.11.2017


http://www.biophys.mpg.de/en/theoretical-biophysics/computational-drug-design.html





Course Outline

1. Ligand-based approaches
   1. (Quantitative) structure-activity relationship (SAR & QSAR)
   2. Pharmacophore modeling

2. Bioinformatics approaches (target recognition and structural modeling)
   1. Sequence alignments and searches
   2. Gene identification and prediction
   3. Homology modeling

3. Structure-based approaches
   1. Molecular docking
      1. Ligand docking: theory and scoring functions
      2. Virtual screening
      3. Protein-protein docking and interaction
   2. Molecular dynamics simulation
      1. Introduction into molecular dynamics
   3. Estimation of ligand binding affinity
      1. Free energy perturbation
      2. Enhanced sampling methods

Ligand-based approach

• Structure-Activity Relationships (SAR)

• Quantitative Structure-Activity Relationships (QSAR)

Biological activity = f(Molecular descriptors)

QSAR: Historical perspective

1900. Meyer-Overton: anesthetic potency correlates with the oil/water partition coefficient

Public Domain, https://commons.wikimedia.org/w/index.php?curid=6597630

QSAR: Historical perspective

1964. Hansch analysis

Hansch & Fujita, JACS 1964

$$\log\frac{1}{C} = -k\pi^{2} + k'\pi\pi_{0} - k''\pi_{0}^{2} + \log k_{X} + k'''$$

Quantitative Structure-Activity Relationships (QSAR)

Definition

QSAR builds a mathematical model that correlates a set of structural descriptors of a set of chemical compounds with their biological activity.

More generally (QYXR), it builds a mathematical model that correlates a set of independent variables (X) of a set of samples with a set of dependent variables (Y).


1. Set of compounds

4. Biological activities

Considerations

All compounds should belong to congeneric series

Same mechanism of action

A similar binding mechanism

The measured biological activity (same endpoint, same assay) should be exactly the same for all compounds

Biological activity is correlated to binding affinity


1. Set of compounds

2. Molecular descriptors



1. Set of compounds


3. Mathematical models


$$y = b_0 + b_1 x_1 + b_2 x_2 + \cdots + b_p x_p$$

Multiple Linear Regression (MLR)

Partial Least Squares (PLS)

Artificial Neural Network (ANN)

Genetic Algorithm (GA)




1D descriptors: molecular weight, LogP, number of functional groups

2D descriptors: topological indices

3D descriptors: geometrical parameters, molecular surfaces, quantum chemistry descriptors

2D descriptors

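Topological indices based on the adjacency matrix

[Figure: hydrogen-suppressed molecular graph with six numbered atoms. Atoms 1 and 2 are bonded to atom 3, atom 3 to atom 4, and atom 4 to atoms 5 and 6.]

Distance matrix (d_ij = number of bonds on the shortest path between atoms i and j):

$$D = \begin{pmatrix}
0 & 2 & 1 & 2 & 3 & 3 \\
2 & 0 & 1 & 2 & 3 & 3 \\
1 & 1 & 0 & 1 & 2 & 2 \\
2 & 2 & 1 & 0 & 1 & 1 \\
3 & 3 & 2 & 1 & 0 & 2 \\
3 & 3 & 2 & 1 & 2 & 0
\end{pmatrix}$$

$$TI = \frac{1}{2}\sum_{i}\sum_{j} d_{ij} = 29$$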
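The topological index above can be checked with a few lines of Python. This is a minimal sketch, assuming the six-atom graph has the bonds 1-3, 2-3, 3-4, 4-5 and 4-6 (read off the distance-1 entries of the slide's matrix). The function names `distance_matrix` and `topological_index` are illustrative and not taken from any descriptor package. Half the sum of all topological distances is the quantity usually called the Wiener index.

```python
from collections import deque

def distance_matrix(n_atoms, bonds):
    """Topological distances (bond counts on shortest paths) by breadth-first search.
    Assumes a connected graph with atoms numbered 1..n_atoms."""
    neighbours = {a: [] for a in range(1, n_atoms + 1)}
    for a, b in bonds:
        neighbours[a].append(b)
        neighbours[b].append(a)
    matrix = []
    for start in range(1, n_atoms + 1):
        dist = {start: 0}
        queue = deque([start])
        while queue:
            atom = queue.popleft()
            for nxt in neighbours[atom]:
                if nxt not in dist:
                    dist[nxt] = dist[atom] + 1
                    queue.append(nxt)
        matrix.append([dist[a] for a in range(1, n_atoms + 1)])
    return matrix

def topological_index(dmat):
    """Half the sum of all entries of the symmetric distance matrix."""
    return sum(sum(row) for row in dmat) // 2

# Assumed bond list for the six-atom example graph.
bonds = [(1, 3), (2, 3), (3, 4), (4, 5), (4, 6)]
D = distance_matrix(6, bonds)
TI = topological_index(D)
print(TI)  # 29
```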

3D descriptors

Quantum chemical descriptors

Descriptors calculated by quantum mechanical methods (semi-empirical, ab initio or DFT)

Partial atomic charges

Lowest unoccupied molecular orbital (LUMO) energy

Highest occupied molecular orbital (HOMO) energy

Electrostatic potential

Molecular polarizability

Software for molecular descriptors

Dragon

GAUSSIAN

HyperChem

CODESSA

MOE



Multiple Linear Regression (MLR)

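$$y = b_0 + b_1 x_1 + b_2 x_2 + \cdots + b_p x_p$$

Here b_0 is the intercept and b_1 ... b_p are the coefficients; n is the number of compounds and p the number of descriptors.

Objective function (minimized):

$$\sum_{i=1}^{n}\left(y_i - b_0 - b_1 x_{i,1} - \cdots - b_p x_{i,p}\right)^2$$

Solution:

$$\mathbf{b} = (X'X)^{-1}X'\mathbf{y}$$

$$\hat{\mathbf{y}} = X\mathbf{b} = X(X'X)^{-1}X'\mathbf{y}$$

Statistics of the fit:

$$\sigma^2 = \frac{\sum_{i=1}^{n}(y_i-\hat{y}_i)^2}{n-p-1}$$

$$F = \frac{\sum_{i=1}^{n}(\hat{y}_i-\bar{y})^2 / p}{\sum_{i=1}^{n}(y_i-\hat{y}_i)^2 / (n-p-1)}$$

$$R^2 = 1 - \frac{\sum_{i=1}^{n}(y_i-\hat{y}_i)^2}{\sum_{i=1}^{n}(y_i-\bar{y})^2}$$

[Figure: experimental vs. estimated activity]

Akaike Information Criterion:

$$AIC = n\,\log\!\left(\frac{\sum_{i=1}^{n}(y_i-\hat{y}_i)^2}{n}\right) + 2(p+1)$$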
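The MLR formulas can be tried out with NumPy. The sketch below is illustrative only: `fit_mlr` is a hypothetical helper name, the data are synthetic (not the data from the slides), and the symbols n (compounds) and p (descriptors) are assumed. It solves the least-squares problem with `numpy.linalg.lstsq`, which is numerically safer than forming the inverse of X'X explicitly but gives the same coefficients.

```python
import numpy as np

def fit_mlr(X, y):
    """Least-squares fit of y = b0 + b1*x1 + ... + bp*xp plus basic statistics."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, p = X.shape
    A = np.column_stack([np.ones(n), X])        # column of ones carries the intercept
    b, *_ = np.linalg.lstsq(A, y, rcond=None)
    y_hat = A @ b
    rss = float(np.sum((y - y_hat) ** 2))       # residual sum of squares
    tss = float(np.sum((y - y.mean()) ** 2))    # total sum of squares
    ess = float(np.sum((y_hat - y.mean()) ** 2))
    return {
        "b": b,
        "sigma2": rss / (n - p - 1),
        "r2": 1.0 - rss / tss,
        "F": (ess / p) / (rss / (n - p - 1)),
        "aic": n * np.log(rss / n) + 2 * (p + 1),
    }

# Synthetic example: 25 compounds, 4 descriptors (assumed values).
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(25, 4))
y_demo = 4.0 - 1.3 * X_demo[:, 0] + 0.5 * X_demo[:, 1] + rng.normal(scale=0.3, size=25)
stats = fit_mlr(X_demo, y_demo)
print(stats["b"], stats["r2"], stats["F"], stats["aic"])
```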


No.     X1       X2     X3      X4   Yexp  Ycalc  Residual
  1   3.42    38.51   6.62    6.63      3    2.9       0.1
  2   3.05    38.91   6.61    6.04   3.15   3.37     -0.22
  3   2.52    54.28   6.58    6.23   3.28   3.07      0.21
  4   3.29    54.27   6.63    6.09   4.24   3.91      0.33
  5   2.25    54.62   6.61    6.03   3.28   3.14      0.14
  6   2.42    55.37   6.59    5.67   4.35   3.75       0.6
  7   3.15     70.6   6.67    6.51   3.88   3.69      0.19
  8   1.67    69.77   6.49    5.79   3.64    3.3      0.34
  9   2.91    70.03   6.64    6.11   4.35   3.99      0.36
 10   1.73    70.57   6.61    6.04    3.4   3.11      0.29
 11   1.36    86.18   6.64    6.12    3.3   3.12      0.18
 12   2.81    85.83   6.62    6.05    4.7   4.38      0.32
 13   2.96   102.96   6.66    6.52   4.67   4.35      0.32
 14   0.65    102.7   6.61    6.04   3.34   3.06      0.28
 15   2.22   117.89   6.62    6.04   4.11   4.74     -0.63
 16   0.19   118.98   6.61    6.18   3.37   2.92      0.45
 17   2.85   135.34   6.67    6.52   5.93    5.1      0.83
 18   0.39   134.08   6.65    6.32   3.65   3.31      0.34
 19   3.58    22.34    6.7     6.6    2.7   2.69      0.01
 20   3.41    54.34   6.62    6.64   3.49   3.29       0.2
 21   0.43    77.39   1.87    4.37   1.99   1.87      0.12
 22   0.35    93.05   1.88    4.34   2.38   2.25      0.13
 23   0.09   109.53   1.87    4.34   2.76   2.46       0.3
 24   -0.2    125.8   1.88    4.34   3.29   2.65      0.64
 25   1.41    87.61   0.35  -14.65   0.87   0.85      0.02

$$y = 4.224 - 1.305x_1 + 0.535x_2 + 0.026x_3 + 0.817x_4$$

σ² = 0.170, R² = 0.899, F = 42.4

σ²(Y) = 0.712

Variable selection

1. Systematic approaches
   1. Forward selection
   2. Backward elimination

2. Heuristic approaches
   1. Genetic algorithm
   2. Simulated annealing

Forward selection

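Response Y, candidate descriptors X1 to X5. AIC of each candidate model ("in" = descriptor already in the model):

Step  Model                                 X1     X2     X3     X4     X5
1     Y = a + Xn                          57.7  60.70   54.7   56.1   56.5
2     Y = a + X3 + Xn                     56.3  47.55     in   56.7   56.5
3     Y = a + X3 + X2 + Xn                29.4     in     in   49.5   48.3
4     Y = a + X3 + X2 + X1 + Xn             in     in     in   13.8   25.1
5     Y = a + X3 + X2 + X1 + X4 + Xn        in     in     in     in   15.7

At each step the candidate with the lowest AIC is added (X3, then X2, X1 and X4). Adding X5 raises the AIC from 13.8 to 15.7, so the selection stops at Y = a + X3 + X2 + X1 + X4.

$$AIC = n\,\log\!\left(\frac{\sum_{i=1}^{n}(y_i-\hat{y}_i)^2}{n}\right) + 2(p+1)$$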
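The forward-selection walk-through can be sketched as a short greedy loop. The code is a hedged illustration on synthetic data: `aic` and `forward_selection` are hypothetical helper names, and the stopping rule (stop as soon as no candidate lowers the AIC) is an assumption that matches the slide's example.

```python
import numpy as np

def aic(X, y, cols):
    """AIC = n*log(RSS/n) + 2*(p+1) for an MLR model using the descriptors in cols."""
    n = len(y)
    A = np.column_stack([np.ones(n)] + [X[:, j] for j in cols])
    b, *_ = np.linalg.lstsq(A, y, rcond=None)
    rss = float(np.sum((y - A @ b) ** 2))
    return n * np.log(rss / n) + 2 * (len(cols) + 1)

def forward_selection(X, y):
    """Add the descriptor that lowers the AIC most; stop when none lowers it."""
    selected, remaining = [], list(range(X.shape[1]))
    best = aic(X, y, selected)                  # intercept-only model
    while remaining:
        scores = {j: aic(X, y, selected + [j]) for j in remaining}
        j_best = min(scores, key=scores.get)
        if scores[j_best] >= best:
            break
        selected.append(j_best)
        remaining.remove(j_best)
        best = scores[j_best]
    return selected, best

# Synthetic example: only columns 2 and 0 carry signal (assumed data).
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(40, 5))
y_demo = 3.0 * X_demo[:, 2] + 1.5 * X_demo[:, 0] + rng.normal(scale=0.2, size=40)
selected, best_aic = forward_selection(X_demo, y_demo)
print(selected, best_aic)
```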

Backward elimination

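Response Y, descriptors X1 to X5.

$$AIC = n\,\log\!\left(\frac{\sum_{i=1}^{n}(y_i-\hat{y}_i)^2}{n}\right) + 2(p+1)$$

Full model: Y = a + X1 + X2 + X3 + X4 + X5, AIC = 15.7

AIC after removing one descriptor from the current model ("-" = already removed):

Step  Current model                          X1     X2     X3     X4     X5
1     Y = a + X1 + X2 + X3 + X4 + X5       21.8   50.6   59.9   25.1   13.8
2     Y = a + X1 + X2 + X3 + X4            31.9   49.5   58.0   29.4      -

Removing X5 lowers the AIC from 15.7 to 13.8 and gives Y = a + X1 + X2 + X3 + X4. No further removal lowers the AIC, so this model is kept.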
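Backward elimination is the mirror image of forward selection and can be sketched the same way. Again this is only an illustration on synthetic data; `aic` and `backward_elimination` are hypothetical names, and stopping when no removal lowers the AIC is an assumed rule consistent with the example above.

```python
import numpy as np

def aic(X, y, cols):
    """AIC = n*log(RSS/n) + 2*(p+1) for an MLR model using the descriptors in cols."""
    n = len(y)
    A = np.column_stack([np.ones(n)] + [X[:, j] for j in cols])
    b, *_ = np.linalg.lstsq(A, y, rcond=None)
    rss = float(np.sum((y - A @ b) ** 2))
    return n * np.log(rss / n) + 2 * (len(cols) + 1)

def backward_elimination(X, y):
    """Start from all descriptors; drop the one whose removal lowers the AIC most."""
    kept = list(range(X.shape[1]))
    best = aic(X, y, kept)
    while kept:
        scores = {j: aic(X, y, [k for k in kept if k != j]) for j in kept}
        j_drop = min(scores, key=scores.get)
        if scores[j_drop] >= best:
            break
        kept.remove(j_drop)
        best = scores[j_drop]
    return kept, best

# Synthetic example: only columns 2 and 0 carry signal (assumed data).
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(40, 5))
y_demo = 3.0 * X_demo[:, 2] + 1.5 * X_demo[:, 0] + rng.normal(scale=0.2, size=40)
kept, best_aic = backward_elimination(X_demo, y_demo)
print(kept, best_aic)
```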

Genetic algorithm

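Genome (1 = descriptor included in the model):

X1 X2 X3 X4 X5 X6 X7 X8 X9 X10
 0  1  0  0  1  0  0  1  0  0

Each genome defines a model that is scored by its AIC, here:

$$y = b_0 + b_1 x_2 + b_2 x_5 + b_3 x_8$$

Genomes shown on the slide:

0 0 0 0 1 0 0 1 0 1      0 1 0 0 1 0 0 0 1 0

After mutation (one bit flipped in each genome):

1 0 0 0 1 0 0 1 0 1      0 1 0 0 1 0 0 0 1 1

0 1 0 0 1 0 0 1 0 1      0 1 0 0 1 0 0 0 0 0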
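A genome-based search of this kind can be sketched as follows. The operators (truncation selection of the better half, one-point crossover, bit-flip mutation, one elite genome carried over) and all parameter values are assumptions chosen for brevity; the slides only show the genome encoding, the AIC score and mutation. The names `aic` and `evolve` are illustrative.

```python
import random
import numpy as np

def aic(X, y, genome):
    """AIC of the MLR model built from the descriptors whose genome bit is 1."""
    cols = [j for j, bit in enumerate(genome) if bit]
    n = len(y)
    A = np.column_stack([np.ones(n)] + [X[:, j] for j in cols])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    rss = float(np.sum((y - A @ coef) ** 2))
    return n * np.log(rss / n) + 2 * (len(cols) + 1)

def evolve(X, y, pop_size=20, generations=30, p_mut=0.05, seed=0):
    rnd = random.Random(seed)
    m = X.shape[1]
    pop = [[rnd.randint(0, 1) for _ in range(m)] for _ in range(pop_size)]
    history = []                                   # best AIC per generation
    for _ in range(generations):
        pop.sort(key=lambda g: aic(X, y, g))       # lower AIC is better
        history.append(aic(X, y, pop[0]))
        parents = pop[: pop_size // 2]
        children = [pop[0][:]]                     # elitism: keep the best genome
        while len(children) < pop_size:
            mother, father = rnd.sample(parents, 2)
            cut = rnd.randint(1, m - 1)            # one-point crossover
            child = mother[:cut] + father[cut:]
            child = [bit ^ 1 if rnd.random() < p_mut else bit for bit in child]
            children.append(child)
        pop = children
    best = min(pop, key=lambda g: aic(X, y, g))
    return best, aic(X, y, best), history

# Synthetic example with 10 descriptors; X2, X5 and X8 carry signal (assumed data).
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(30, 10))
y_demo = (2.0 * X_demo[:, 1] - 1.5 * X_demo[:, 4] + 1.0 * X_demo[:, 7]
          + rng.normal(scale=0.2, size=30))
best_genome, best_aic, history = evolve(X_demo, y_demo)
print(best_genome, best_aic)
```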

Partial least squares

Useful when:

The X-variables are correlated

The number of X-variables is relatively high compared with the number of samples

$$X = TP^{T}, \qquad Y = UQ^{T}$$

$$Y = \beta X + \varepsilon$$

$$U = \beta T + \varepsilon$$
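For a single response, the decomposition can be illustrated with a compact NIPALS-style PLS1 routine. This is a sketch under assumptions, not the implementation used in the lecture: `pls1_fit` is a hypothetical name, the data are synthetic, and only the one-response case is covered (so U reduces to the centered y and Q to the inner coefficients q).

```python
import numpy as np

def pls1_fit(X, y, n_components):
    """PLS regression for one response (NIPALS-style deflation).
    Returns intercept and coefficients for the original, uncentered descriptors."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xk, yk = X - x_mean, y - y_mean
    W, P, q = [], [], []
    for _ in range(n_components):
        w = Xk.T @ yk
        w = w / np.linalg.norm(w)          # weight vector
        t = Xk @ w                         # scores (one column of T)
        tt = t @ t
        p = Xk.T @ t / tt                  # X loadings (one column of P)
        qk = (yk @ t) / tt                 # inner regression of y on t
        Xk = Xk - np.outer(t, p)           # deflate X and y
        yk = yk - qk * t
        W.append(w)
        P.append(p)
        q.append(qk)
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    coef = W @ np.linalg.solve(P.T @ W, q)
    intercept = y_mean - x_mean @ coef
    return intercept, coef

# Synthetic example with two correlated descriptors (assumed data).
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(30, 3))
X_demo[:, 2] = 0.9 * X_demo[:, 0] + 0.5 * rng.normal(size=30)
y_demo = 1.0 + 2.0 * X_demo[:, 0] - 1.0 * X_demo[:, 1] + rng.normal(scale=0.1, size=30)
b0_pls, b_pls = pls1_fit(X_demo, y_demo, n_components=2)
print(b0_pls, b_pls)
```

With as many components as descriptors (and a full-rank X), the PLS coefficients coincide with the ordinary least-squares solution; fewer components trade a little bias for more stable coefficients.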

Other modeling methods

Non-linear regression
   Artificial neural network

Classification methods
   Multiple logistic regression
   Support vector machine

[Figure: non-linear regression equation and a network diagram with inputs X1, X2, X3 and output Y]

Validation

Validation is required to ensure model quality

Over-fitting

Chance correlation

1. Cross-validation
   1. Leave-one-out
   2. Leave-N-out

2. Bootstrapping

3. External validation (prediction set)

4. Y-randomization

Cross-validation

Leave-one-out

[Figure: from the P samples (here Y1 to Y20), one sample is left out (Y20, then Y19, and so on), the model is built on the remaining samples and the left-out sample is predicted. Repeated P times. Result: R²cv(LOO)]

Leave-N-out

[Figure: N samples are left out at a time (for example Y17 to Y20, then Y3 to Y6), the model is built on the remaining samples and the left-out samples are predicted. Repeated P/N times. Result: R²cv(LNO)]
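Both cross-validation schemes can be sketched with one function. The cross-validated R² is computed here as 1 - PRESS/TSS, which is a common definition but an assumption, since the slides do not spell it out. The blocks are taken in order rather than at random, `fit_predict` and `r2_cv` are hypothetical helper names, and the data are synthetic.

```python
import numpy as np

def fit_predict(X_train, y_train, X_test):
    """Fit an MLR model on the training part and predict the test part."""
    A = np.column_stack([np.ones(len(y_train)), X_train])
    b, *_ = np.linalg.lstsq(A, y_train, rcond=None)
    return np.column_stack([np.ones(len(X_test)), X_test]) @ b

def r2_cv(X, y, n_out=1):
    """Leave-N-out cross-validated R2 (n_out=1 gives leave-one-out)."""
    idx = np.arange(len(y))
    y_pred = np.empty(len(y))
    for start in range(0, len(y), n_out):
        test = idx[start:start + n_out]          # samples left out in this round
        train = np.delete(idx, test)
        y_pred[test] = fit_predict(X[train], y[train], X[test])
    press = np.sum((y - y_pred) ** 2)            # predictive residual sum of squares
    return 1.0 - press / np.sum((y - y.mean()) ** 2)

# Synthetic example: 20 samples, 2 descriptors (assumed data).
rng = np.random.default_rng(1)
X_demo = rng.normal(size=(20, 2))
y_demo = 1.0 + 2.0 * X_demo[:, 0] - 1.0 * X_demo[:, 1] + rng.normal(scale=0.1, size=20)
q2_loo = r2_cv(X_demo, y_demo, n_out=1)   # P = 20 rounds
q2_lno = r2_cv(X_demo, y_demo, n_out=4)   # P/N = 5 rounds
print(q2_loo, q2_lno)
```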

Bootstrapping

[Figure: a randomly chosen subset of samples is held out (for example Y14, Y16, Y18 and Y20, then Y1, Y6, Y7 and Y10), the model is built on the remaining samples and the held-out samples are predicted. Repeated N times. Result: R²BS]
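The resampling scheme in the figure can be sketched as repeated random hold-outs. The code follows what the slide shows (random subsets left out, repeated N times); a classical bootstrap would instead draw samples with replacement. Pooling the squared errors over all rounds into one R² value is an assumption, as are the names `fit_predict` and `r2_bootstrap` and the synthetic data.

```python
import numpy as np

def fit_predict(X_train, y_train, X_test):
    """Fit an MLR model on the training part and predict the test part."""
    A = np.column_stack([np.ones(len(y_train)), X_train])
    b, *_ = np.linalg.lstsq(A, y_train, rcond=None)
    return np.column_stack([np.ones(len(X_test)), X_test]) @ b

def r2_bootstrap(X, y, n_out=4, n_rounds=100, seed=0):
    """Hold out n_out random samples per round and pool the prediction errors."""
    rng = np.random.default_rng(seed)
    press = tss = 0.0
    for _ in range(n_rounds):
        test = rng.choice(len(y), size=n_out, replace=False)
        train = np.setdiff1d(np.arange(len(y)), test)
        y_pred = fit_predict(X[train], y[train], X[test])
        press += np.sum((y[test] - y_pred) ** 2)
        tss += np.sum((y[test] - y[train].mean()) ** 2)
    return 1.0 - press / tss

# Synthetic example: 20 samples, 2 descriptors (assumed data).
rng = np.random.default_rng(1)
X_demo = rng.normal(size=(20, 2))
y_demo = 1.0 + 2.0 * X_demo[:, 0] - 1.0 * X_demo[:, 1] + rng.normal(scale=0.1, size=20)
r2_bs = r2_bootstrap(X_demo, y_demo)
print(r2_bs)
```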

External validation

[Figure: the full data set (Y1 to Y20) is split into a training set of 15 samples and a prediction set (Y2, Y9, Y14, Y18 and Y20).]

Training set → Variable selection → Cross-validation → Final model → Predict the prediction set → R²EV
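The external-validation split can be sketched as below, using the prediction set shown on the slide (samples 2, 9, 14, 18 and 20). The data are synthetic, variable selection and cross-validation on the training set are omitted for brevity, and the external R² is computed against the training-set mean, which is one of several conventions in use and therefore an assumption.

```python
import numpy as np

# Synthetic example: 20 samples, 2 descriptors (assumed data).
rng = np.random.default_rng(2)
X_demo = rng.normal(size=(20, 2))
y_demo = 1.0 + 2.0 * X_demo[:, 0] - 1.0 * X_demo[:, 1] + rng.normal(scale=0.1, size=20)

# Prediction set as on the slide: Y2, Y9, Y14, Y18, Y20 (zero-based indices).
test_idx = np.array([1, 8, 13, 17, 19])
train_idx = np.setdiff1d(np.arange(20), test_idx)

# Final model is built on the training set only.
A_train = np.column_stack([np.ones(len(train_idx)), X_demo[train_idx]])
b, *_ = np.linalg.lstsq(A_train, y_demo[train_idx], rcond=None)

# Predict the external compounds and compute the external R2.
A_test = np.column_stack([np.ones(len(test_idx)), X_demo[test_idx]])
y_pred = A_test @ b
press = np.sum((y_demo[test_idx] - y_pred) ** 2)
tss = np.sum((y_demo[test_idx] - y_demo[train_idx].mean()) ** 2)
r2_ev = 1.0 - press / tss
rmse_ev = np.sqrt(press / len(test_idx))
print(r2_ev, rmse_ev)
```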

Y-randomization

[Figure: original data, Y1 to Y20 paired with X1 to X20, model $Y = \beta X + \varepsilon$.]

[Figure: randomized data, the Y values are reordered (shown as Y20 to Y1) while X1 to X20 stay in place; the model $Y_{new} = \beta X + \varepsilon$ is refitted. Repeated N times. Result: R²Yrand]
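Y-randomization is easy to sketch: shuffle the responses, refit, and record R² each time. The helper names `r2_fit` and `y_randomization`, the number of rounds and the synthetic data are assumptions for illustration.

```python
import numpy as np

def r2_fit(X, y):
    """R2 of an MLR model fitted to (X, y)."""
    A = np.column_stack([np.ones(len(y)), X])
    b, *_ = np.linalg.lstsq(A, y, rcond=None)
    rss = np.sum((y - A @ b) ** 2)
    return 1.0 - rss / np.sum((y - y.mean()) ** 2)

def y_randomization(X, y, n_rounds=100, seed=0):
    """R2 values obtained after randomly permuting y, with X left unchanged."""
    rng = np.random.default_rng(seed)
    return np.array([r2_fit(X, rng.permutation(y)) for _ in range(n_rounds)])

# Synthetic example: 20 samples, 2 descriptors (assumed data).
rng = np.random.default_rng(3)
X_demo = rng.normal(size=(20, 2))
y_demo = 1.0 + 2.0 * X_demo[:, 0] - 1.0 * X_demo[:, 1] + rng.normal(scale=0.1, size=20)
r2_real = r2_fit(X_demo, y_demo)
r2_rand = y_randomization(X_demo, y_demo)
print(r2_real, r2_rand.mean(), r2_rand.max())
```

A model whose R² barely drops after shuffling is explaining noise; a large gap between the real and the randomized R² argues against chance correlation.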

Good model?

$$y = b_0 + b_1 x_1 + b_2 x_2 + \cdots + b_p x_p$$

Model Robustness:      σ², R², F, (R)MSE
Model Quality:         R²cv(LOO), R²cv(LNO), R²BS, RMSEcv
Model Reliability:     R²Yrand, RMSEYrand
Model Predictability:  R²EV, RMSEEV

$$RMSE = \sqrt{\frac{\sum_{i=1}^{n}(y_i-\hat{y}_i)^2}{n-1}}$$

Good model?

$$y = b_0 + b_1 x_1 + b_2 x_2 + \cdots + b_p x_p$$

Model Robustness:      σ², R² > 0.8, F, (R)MSE
Model Quality:         R²cv(LOO) > 0.6, R²cv(LNO) > 0.6, R²BS > 0.6, RMSEcv
                       R² - R²cv < 0.3
Model Reliability:     R²Yrand < 0.3, RMSEYrand
                       R² - R²Yrand > 0.4
Model Predictability:  R²EV > 0.6, RMSEEV
                       R² - R²EV < 0.3
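The acceptance thresholds can be collected in a small checklist function. This is a sketch: `check_model` and the dictionary keys are hypothetical names, the example values are made up, and applying the "R² - R²cv" rule to the smaller of the two cross-validated values is an assumption, because the slide does not say which one is meant.

```python
def check_model(m):
    """Compare validation statistics with the thresholds listed on the slide."""
    r2_cv_min = min(m["r2_loo"], m["r2_lno"])   # assumption: use the weaker CV value
    return {
        "R2 > 0.8": m["r2"] > 0.8,
        "R2cv(LOO) > 0.6": m["r2_loo"] > 0.6,
        "R2cv(LNO) > 0.6": m["r2_lno"] > 0.6,
        "R2BS > 0.6": m["r2_bs"] > 0.6,
        "R2 - R2cv < 0.3": m["r2"] - r2_cv_min < 0.3,
        "R2Yrand < 0.3": m["r2_yrand"] < 0.3,
        "R2 - R2Yrand > 0.4": m["r2"] - m["r2_yrand"] > 0.4,
        "R2EV > 0.6": m["r2_ev"] > 0.6,
        "R2 - R2EV < 0.3": m["r2"] - m["r2_ev"] < 0.3,
    }

# Made-up statistics for two hypothetical models.
good = {"r2": 0.90, "r2_loo": 0.82, "r2_lno": 0.80, "r2_bs": 0.78,
        "r2_yrand": 0.12, "r2_ev": 0.75}
overfitted = {"r2": 0.95, "r2_loo": 0.45, "r2_lno": 0.40, "r2_bs": 0.42,
              "r2_yrand": 0.35, "r2_ev": 0.30}

report_good = check_model(good)
report_bad = check_model(overfitted)
failed = [name for name, ok in report_bad.items() if not ok]
print(all(report_good.values()), failed)
```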

Applicability domain

$$y = b_0 + b_1 x_1 + b_2 x_2 + \cdots + b_p x_p$$

[Figure: compounds plotted in descriptor space, X1 vs. X2]

Principal component analysis
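One simple way to turn this idea into code is to project the descriptors onto the first principal components and accept a new compound only if its scores fall inside the range spanned by the training set. The slides do not specify how the domain is defined, so the bounding-box rule, the use of two components, the helper names `fit_domain` and `in_domain`, and the synthetic data are all assumptions.

```python
import numpy as np

def fit_domain(X_train, n_components=2):
    """PCA on autoscaled training descriptors; the domain is the score range per PC."""
    mean = X_train.mean(axis=0)
    scale = X_train.std(axis=0, ddof=1)
    Z = (X_train - mean) / scale
    _, _, Vt = np.linalg.svd(Z, full_matrices=False)
    loadings = Vt[:n_components].T               # columns are principal directions
    scores = Z @ loadings
    return {"mean": mean, "scale": scale, "loadings": loadings,
            "low": scores.min(axis=0), "high": scores.max(axis=0)}

def in_domain(domain, X_new, tol=1e-9):
    """True for each compound whose PC scores lie inside the training range."""
    Z = (np.atleast_2d(X_new) - domain["mean"]) / domain["scale"]
    scores = Z @ domain["loadings"]
    inside = (scores >= domain["low"] - tol) & (scores <= domain["high"] + tol)
    return np.all(inside, axis=1)

# Synthetic training set: 20 compounds, 4 descriptors (assumed data).
rng = np.random.default_rng(4)
X_train = rng.normal(size=(20, 4))
domain = fit_domain(X_train)

# A compound placed far out along the first principal direction.
X_far = domain["mean"] + domain["scale"] * 50.0 * domain["loadings"][:, 0]
print(in_domain(domain, X_train).all(), in_domain(domain, X_far))
```

Predictions for compounds outside the domain are extrapolations and should be treated with caution.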

Prediction vs. description

$$y = 2.34 + 3.5\,\mathrm{VE\_b(e)} - 0.87\,\mathrm{ATS1v} + 3.76\,\mathrm{SM02\_AEA}$$

σ² = 0.003, R² = 0.951, F = 260.2, R²EV = 0.891

VE_b(e): coefficient sum of the last eigenvector from the Burden matrix weighted by Sanderson electronegativity
ATS1v: Broto-Moreau autocorrelation of lag 1 (log function) weighted by van der Waals volume
SM02_AEA: spectral moment of order 2 from the augmented edge adjacency matrix weighted by resonance integral

$$y = 8.34 + 2.5\,\mathrm{LogP} + 0.93\,\mathrm{NAR}$$

σ² = 0.113, R² = 0.811, F = 43.2, R²EV = 0.761

LogP: water-oil partition coefficient
NAR: number of aromatic rings
