44
QSAR modeling Pavel Polishchuk Institute of Molecular and Translational Medicine Faculty of Medicine and Dentistry Palacky University [email protected] qsar4u.com

QSAR modeling - Katedra fyzikální chemie UPOLfch.upol.cz/wp-content/uploads/2018/01/ADD_12_Polishchuk_QSAR.pdfOverall QSAR workflow Input data Bioassays Databases Preprocessing Feature

  • Upload
    others

  • View
    6

  • Download
    0

Embed Size (px)

Citation preview

Page 1: QSAR modeling - Katedra fyzikální chemie UPOLfch.upol.cz/wp-content/uploads/2018/01/ADD_12_Polishchuk_QSAR.pdfOverall QSAR workflow Input data Bioassays Databases Preprocessing Feature

QSAR modeling

Pavel Polishchuk

Institute of Molecular and Translational MedicineFaculty of Medicine and Dentistry

Palacky University

[email protected]

Page 2: QSAR modeling - Katedra fyzikální chemie UPOLfch.upol.cz/wp-content/uploads/2018/01/ADD_12_Polishchuk_QSAR.pdfOverall QSAR workflow Input data Bioassays Databases Preprocessing Feature

Activity = F(structure)

M – mapping functionE – encoding function

Modeling of compounds properties

Activity = M(E(structure))

D1 D2 D3 D4 D5 D6 … DN

1 0 9 0 11 1 … 1

4 0 1 0 0 0 … 1

0 0 0 0 0 4 … 6

0 2 3 6 0 0 … 3

… … … … … … … …

4 0 0 0 1 2 … 1

Page 3: QSAR modeling - Katedra fyzikální chemie UPOLfch.upol.cz/wp-content/uploads/2018/01/ADD_12_Polishchuk_QSAR.pdfOverall QSAR workflow Input data Bioassays Databases Preprocessing Feature

QSAR modeling workflow

D1 D2 D3 D4 D5 D6 … DN

1 0 9 0 11 1 … 1

4 0 1 0 0 0 … 1

0 0 0 0 0 4 … 6

0 2 3 6 0 0 … 3

… … … … … … … …

4 0 0 0 1 2 … 1

Structure Descriptors (features) Model

Encoding(represent structure with

numerical features)

Mapping(machine learning)

Page 4: QSAR modeling - Katedra fyzikální chemie UPOLfch.upol.cz/wp-content/uploads/2018/01/ADD_12_Polishchuk_QSAR.pdfOverall QSAR workflow Input data Bioassays Databases Preprocessing Feature

Overall QSAR workflow

Input data

Bioassays

Databases

PreprocessingFeature

engineeringModel

trainingModel

validation

Classification

Regression

Clustering

Cross-validation

Bootstrap

Test set

Applicability Domain

Feature selection

Feature combination

Data normalization & curation

Feature extraction

Interpretation

j

j

ii

z

xxx '

4

1) a defined endpoint2) an unambiguous algorithm3) a defined domain of applicability4) appropriate measures of goodness-of–fit, robustness and predictivity5) a mechanistic interpretation, if possible

OECD principles for the validation, for regulatory purposes, of (Q)SAR models

Page 5: QSAR modeling - Katedra fyzikální chemie UPOLfch.upol.cz/wp-content/uploads/2018/01/ADD_12_Polishchuk_QSAR.pdfOverall QSAR workflow Input data Bioassays Databases Preprocessing Feature

Step 1. Data collection

Scientific literature and patentsDatabases (ChEMBL, PubChem, BindingDB, etc)

Traditionally modeled compounds should have the same mechanism of action, however using of complex non-linear machine learning method allows to model data sets with mixed or even unknown mechanism of action with reasonable accuracy.

Conditions may substantially influence the results of bioassays (change in temperature, activators, detectors, etc)

Units checking

Page 6: QSAR modeling - Katedra fyzikální chemie UPOLfch.upol.cz/wp-content/uploads/2018/01/ADD_12_Polishchuk_QSAR.pdfOverall QSAR workflow Input data Bioassays Databases Preprocessing Feature

Step 2. Data curation (normalization)

Page 7: QSAR modeling - Katedra fyzikální chemie UPOLfch.upol.cz/wp-content/uploads/2018/01/ADD_12_Polishchuk_QSAR.pdfOverall QSAR workflow Input data Bioassays Databases Preprocessing Feature

Step 2. Data curation (normalization)

Page 8: QSAR modeling - Katedra fyzikální chemie UPOLfch.upol.cz/wp-content/uploads/2018/01/ADD_12_Polishchuk_QSAR.pdfOverall QSAR workflow Input data Bioassays Databases Preprocessing Feature

Step 2. Data curation (normalization)

Removal of mixtures, inorganics, metalorganics, etc

Strip of salts, counterions, etc

Ionization, if necessary (at the particular pH level)

Chemotype normalization, resonance structure and tautomers

Duplicates removal

Manual checking

Page 9: QSAR modeling - Katedra fyzikální chemie UPOLfch.upol.cz/wp-content/uploads/2018/01/ADD_12_Polishchuk_QSAR.pdfOverall QSAR workflow Input data Bioassays Databases Preprocessing Feature

Step 3. Descriptors: classificationObject type:

molecular descriptors (single molecules)descriptors of molecular ensemble (mixtures, materials)reaction descriptors (reactions)

Descriptor origin:calculated from the structureempirical (Hammet constants, lipophilicity chemical shifts in NMR, etc)

Locality:local (atom charge)global (molecular weight, molecular volume, lipophilicity, etc)

Dimensionality:1D (number of methyl groups, molecular weight, etc)2D (topological indices, fragmental descriptors)3D (molecular volume, quantum chemical descriptors)4D (based on a set of conformers)

Calculation method:physico-chemical (lipophilicity, etc)topological (invariants of molecular graph, Randic index, Wiener index, etc)fragmental (fingerprints, etc)pharmacophorespatial (moment of inertia, etc)quantum-chemical (energy of HOMO/LUMO, etc)etc. R. Todeschini and V. Consonni Handbook of Molecular Descriptors, 2008

Page 10: QSAR modeling - Katedra fyzikální chemie UPOLfch.upol.cz/wp-content/uploads/2018/01/ADD_12_Polishchuk_QSAR.pdfOverall QSAR workflow Input data Bioassays Databases Preprocessing Feature

Atom-centric (augmented atoms) fingerprints

Generate substructures starting from each atom and considering all its neighbors up to the specified distance (radius or diameter).

Morgan fingerprints, Extended-connectivity fingerprints (ECFP). Functional-class fingerprints (FCFP) , etc.

radius=2 (diameter=4)

Rogers, D. & Hahn, M. Extended-Connectivity Fingerprints. J. Chem. Inf. Model. 50, 742-754 (2010)

Page 11: QSAR modeling - Katedra fyzikální chemie UPOLfch.upol.cz/wp-content/uploads/2018/01/ADD_12_Polishchuk_QSAR.pdfOverall QSAR workflow Input data Bioassays Databases Preprocessing Feature

Atom-centric (augmented atoms) fingerprints

Page 12: QSAR modeling - Katedra fyzikální chemie UPOLfch.upol.cz/wp-content/uploads/2018/01/ADD_12_Polishchuk_QSAR.pdfOverall QSAR workflow Input data Bioassays Databases Preprocessing Feature

Fingerprints

Each molecule has variable length set of substructures – variable length fingerprints

N-C-C C-C-O C-C=O C-C-C C:C:C C:C:N C:N:C

1 1 1 0 0 0 0

1 1 1 1 0 0 0

0 0 0 0 1 1 1

2-bond sequences

Page 13: QSAR modeling - Katedra fyzikální chemie UPOLfch.upol.cz/wp-content/uploads/2018/01/ADD_12_Polishchuk_QSAR.pdfOverall QSAR workflow Input data Bioassays Databases Preprocessing Feature

Hashed fingerprints

Have fixed length (usually 512, 1024 or 2048 bits)

hash code

N-C-C C-C-O C-C=O C-C-C C:C:C C:C:N

13823 9740 37278 28478 874 283764

0 0 0 1 0 0 1 1 0 1 0 0 1 0 1 0 0 0 1 1

pseudo-randomnumber generator

fixed-length bit string

Each substructure activates several bits (usually 4-5) to avoid collisions and produce bit string of enough density

Missing bits mean that certain substructures are not presented, Active bits mean that certain substructure may be present (but due to possible collisions one cannot be sure)

Page 14: QSAR modeling - Katedra fyzikální chemie UPOLfch.upol.cz/wp-content/uploads/2018/01/ADD_12_Polishchuk_QSAR.pdfOverall QSAR workflow Input data Bioassays Databases Preprocessing Feature

Folding of fingerprints and keys bit strings

To increase density of very sparse bit vectors

0 0 0 1 0 0 1 1 0 1 0 0 1 0 1 0 0 0 1 1

0 0 0 1 0 0 1 1 0 1

0 0 1 0 1 0 0 0 1 1

OR

n bit fingerprints

n/2 bits

n/2 bits

0 0 1 1 1 0 1 1 1 1 n/2 bit fingerprint

Usually optimal density is 0.2-0.3

Page 15: QSAR modeling - Katedra fyzikální chemie UPOLfch.upol.cz/wp-content/uploads/2018/01/ADD_12_Polishchuk_QSAR.pdfOverall QSAR workflow Input data Bioassays Databases Preprocessing Feature

Step 4. Feature processing

Feature transformations:linear and non-linear scaling

Feature combinations:

Feature selection:removal of correlated descriptors (important e.g. for linear regression models)removal of descriptors with missing valuesremoval constant and near constant featuresremoval of descriptors with low correlation with the target propertystep-wise feature selectionevolutionary feature selection (genetic algorithm, etc)

sd

xxz i

i

n

i

i xxn

nsd

1

2)(1

ixie

z

1

1range (0; 1)

jiij xxz

2

ii xz add quadratic term

Page 16: QSAR modeling - Katedra fyzikální chemie UPOLfch.upol.cz/wp-content/uploads/2018/01/ADD_12_Polishchuk_QSAR.pdfOverall QSAR workflow Input data Bioassays Databases Preprocessing Feature

Step 5. Model building

Unsupervisedclustering

Supervised

Regression Classification

Multiple linear regression (MLR)

Partial linear regression (PLS) Logistic regression

Gaussian Process (GP) Naïve Bayes (NB)

Decision trees (DT)

Support vector machine (SVM)

Neural nets (NN)

Random forest (RF)

k-Nearest neighbors (kNN)

Page 17: QSAR modeling - Katedra fyzikální chemie UPOLfch.upol.cz/wp-content/uploads/2018/01/ADD_12_Polishchuk_QSAR.pdfOverall QSAR workflow Input data Bioassays Databases Preprocessing Feature

1/C = 4.08π – 2.14π2 + 2.78σ + 3.38

π = logPX – logPH

σ - Hammet constant

plant growth inhibition activity of phenoxyacetic acids

Hansch equation

R is H or CH3;

X is Br, Cl, NO2 and

Y is NO2, NH2, NHC(=O)CH3

Inhibition activity of compounds against Staphylococcus aureus

Act = 75RH – 112RCH3 + 84XCl – 16XBr – 26XNO2 + 123YNH2 + 18YNHC(=O)CH3 – 218YNO2

Free-Wilson models

QSAR model building

Page 18: QSAR modeling - Katedra fyzikální chemie UPOLfch.upol.cz/wp-content/uploads/2018/01/ADD_12_Polishchuk_QSAR.pdfOverall QSAR workflow Input data Bioassays Databases Preprocessing Feature

18

Simulated data set of actives and inactives with two descriptors – MW and logP

logP

MW

Decision tree

Page 19: QSAR modeling - Katedra fyzikální chemie UPOLfch.upol.cz/wp-content/uploads/2018/01/ADD_12_Polishchuk_QSAR.pdfOverall QSAR workflow Input data Bioassays Databases Preprocessing Feature

IF logP >= 3.2

AND MW >= 255

THENcompound is

19

logP < 3.2YES NO

MW < 358

YES NO

MW < 255

YES NO

Decision tree

Page 20: QSAR modeling - Katedra fyzikální chemie UPOLfch.upol.cz/wp-content/uploads/2018/01/ADD_12_Polishchuk_QSAR.pdfOverall QSAR workflow Input data Bioassays Databases Preprocessing Feature

20

LogP

MW

Decision tree

Page 21: QSAR modeling - Katedra fyzikální chemie UPOLfch.upol.cz/wp-content/uploads/2018/01/ADD_12_Polishchuk_QSAR.pdfOverall QSAR workflow Input data Bioassays Databases Preprocessing Feature

21

Initial dataset

Bootstrapsample

Bootstrapsample

Bootstrapsample

CART tree1 CART tree2 CART tree3

Combined prediction

Random feature subspace in each node

Random Forest

Page 22: QSAR modeling - Katedra fyzikální chemie UPOLfch.upol.cz/wp-content/uploads/2018/01/ADD_12_Polishchuk_QSAR.pdfOverall QSAR workflow Input data Bioassays Databases Preprocessing Feature

Consensus (ensemble) modeling

Page 23: QSAR modeling - Katedra fyzikální chemie UPOLfch.upol.cz/wp-content/uploads/2018/01/ADD_12_Polishchuk_QSAR.pdfOverall QSAR workflow Input data Bioassays Databases Preprocessing Feature

Consensus (ensemble) modeling

Page 24: QSAR modeling - Katedra fyzikální chemie UPOLfch.upol.cz/wp-content/uploads/2018/01/ADD_12_Polishchuk_QSAR.pdfOverall QSAR workflow Input data Bioassays Databases Preprocessing Feature

Step 6. Validation

Test set (usually 20-25% of the work set)

1.2 1.3 1.7 2.0 2.2 2.8 3.1 3.2 3.2 3.6 4.7 5.7 5.8 6.4 7.2 8.1 9.0 9.1 9.2working set

random test set 1.2 1.3 1.7 2.0 2.2 2.8 3.1 3.2 3.2 3.6 4.7 5.7 5.8 6.4 7.2 8.1 9.0 9.1 9.2

stratified test set

1.2 1.3 1.7 2.0 2.2 2.8 3.1 3.2 3.2 3.6 4.7 5.7 5.8 6.4 7.2 8.1 9.0 9.1 9.2

Cross-validation

1.2 1.3 1.7 2.0 2.2 2.8 3.1 3.2 3.2 3.6 4.7 5.7 5.8 6.4 7.2 8.1 9.0 9.1 9.2working set

1.2 1.3 1.7 2.0 2.2 2.8 3.1 3.2 3.2 3.6 4.7 5.7 5.8 6.4 7.2 8.1 9.0 9.1 9.2

1.2 1.3 1.7 2.0 2.2 2.8 3.1 3.2 3.2 3.6 4.7 5.7 5.8 6.4 7.2 8.1 9.0 9.1 9.2

1.2 1.3 1.7 2.0 2.2 2.8 3.1 3.2 3.2 3.6 4.7 5.7 5.8 6.4 7.2 8.1 9.0 9.1 9.2

fold 1

fold 2

fold 3

predictions of different folds are combined to calculate the final predictive measure

Page 25: QSAR modeling - Katedra fyzikální chemie UPOLfch.upol.cz/wp-content/uploads/2018/01/ADD_12_Polishchuk_QSAR.pdfOverall QSAR workflow Input data Bioassays Databases Preprocessing Feature

Step 6. Measures of predictive ability of models

Classification

FPTN

TNySpecificit

FNTP

TPySensitivit

2

ySensitivitySpecificitaccuracy Balanced

baseline-1

baseline -accuracy Kappa

N

TNTPaccuracy

2N

FP)FN)(TP(TPFN)FP)(TN(TNbaseline

Confusion matrix

Predicted

positive class (1)

negative class (0)

observed

positive class (1)

truepositive

(TP)

false negative

(FN)

negative class (0)

false positive

(FP)

true negative

(TN)

))()()(( FNTNFPTNFNTPFPTP

FNFPTNTPMCC

[-1; 1]

[0; 1]

[0; 1]

[0; 1]

[0; 1]

[0; 1]

Page 26: QSAR modeling - Katedra fyzikální chemie UPOLfch.upol.cz/wp-content/uploads/2018/01/ADD_12_Polishchuk_QSAR.pdfOverall QSAR workflow Input data Bioassays Databases Preprocessing Feature

Step 6. Measures of predictive ability of models

Regression

i

2

obspredi,

i

2

obsi,predi,2

)y(y

)y(y

1Q

1N

)y(y

RMSE i

2

obsi,predi,

N

1i

obsi,predi, yyN

1MAE

Determination coefficient

Root mean squared error

Mean absolute error

Page 27: QSAR modeling - Katedra fyzikální chemie UPOLfch.upol.cz/wp-content/uploads/2018/01/ADD_12_Polishchuk_QSAR.pdfOverall QSAR workflow Input data Bioassays Databases Preprocessing Feature

Step 7. Applicability domain (AD)Extrapolation to very distant objects is dangerous

There is a need to define the domain where our model is reliable (models are not universal!)

Only compounds which are similar to the training set compounds should be included in applicability domain of the model. One should estimate similarity of new compounds (test set, etc) to the training set compounds.

Page 28: QSAR modeling - Katedra fyzikální chemie UPOLfch.upol.cz/wp-content/uploads/2018/01/ADD_12_Polishchuk_QSAR.pdfOverall QSAR workflow Input data Bioassays Databases Preprocessing Feature

Step 7. Applicability domain (AD) measures

Bounding box - based on descriptor range

Desc_1

Desc_2

AD• internal regions are usually empty, especially if the number of descriptors is big

• it doesn’t take into account descriptor correlation

Distance from training set compounds in descriptor space

Desc_1

Desc_2Distance from training set compounds in model space

Model_1

Model_2

Requires several models (e.g. consensus model, bootstrap models)

One-class SVM

Conformal predictions

Page 29: QSAR modeling - Katedra fyzikální chemie UPOLfch.upol.cz/wp-content/uploads/2018/01/ADD_12_Polishchuk_QSAR.pdfOverall QSAR workflow Input data Bioassays Databases Preprocessing Feature

Step 7. Applicability domain (AD) example

J. Chem. Inf. Model., Vol. 50, No. 12, 2010

Page 30: QSAR modeling - Katedra fyzikální chemie UPOLfch.upol.cz/wp-content/uploads/2018/01/ADD_12_Polishchuk_QSAR.pdfOverall QSAR workflow Input data Bioassays Databases Preprocessing Feature

Step 8. Interpretation of QSAR models

Why interpretation is important?

Found active/inactive patterns which can be used for optimization of compound properties

Retrieve trends of stricture-activity relationships which can be used for knowledge-base model validation

Regulatory purposes

Page 31: QSAR modeling - Katedra fyzikální chemie UPOLfch.upol.cz/wp-content/uploads/2018/01/ADD_12_Polishchuk_QSAR.pdfOverall QSAR workflow Input data Bioassays Databases Preprocessing Feature

Step 8. Interpretation of QSAR models

Principles and issues

Model should be predictive

Interpretation is valid within the applicability domain of the model

Interpretation results are data set dependent

Page 32: QSAR modeling - Katedra fyzikální chemie UPOLfch.upol.cz/wp-content/uploads/2018/01/ADD_12_Polishchuk_QSAR.pdfOverall QSAR workflow Input data Bioassays Databases Preprocessing Feature

Step 8. Interpretation of QSAR models

1/C = 4.08π – 2.14π2 + 2.78σ + 3.38

π = logPX – logPH

σ - Hammet constant

plant growth inhibition activity of phenoxyacetic acids

electronic factorsrate of penetration of membranes in the plant cell

Hansch equation

R is H or CH3;

X is Br, Cl, NO2 and

Y is NO2, NH2, NHC(=O)CH3

Inhibition activity of compounds against Staphylococcus aureus

Act = 75RH – 112RCH3 + 84XCl – 16XBr – 26XNO2 + 123YNH2 + 18YNHC(=O)CH3 – 218YNO2

Free-Wilson models

Page 33: QSAR modeling - Katedra fyzikální chemie UPOLfch.upol.cz/wp-content/uploads/2018/01/ADD_12_Polishchuk_QSAR.pdfOverall QSAR workflow Input data Bioassays Databases Preprocessing Feature

Step 8. Interpretation of QSAR models

347 agonists of 5-HT1A receptor

Ar - substituted (hetero)arylsL - polymethylene chainR - various (poly)cyclic residues

O O OMe Cl Cl CF3

N

N

CH3

F

O2N

Cl

PLS 0.84 0.18 0.03 -0.04 -0.06 -0.09 -0.11 -0.66 -0.73 -0.94 -0.96RF 0.27 0.24 0.04 0.07 -0.02 0.11 0.04 -0.04 -0.55 -0.66 -0.66

Ar

L -(CH2)6- -(CH2)5- -(CH2)4- -(CH2)3- -(CH2)2- -CH2-

PLS 0.8 0.71 0.81 0.08 -0.04 0.06

RF 0.14 0.19 0.14 -0.01 -0.03 0.05

Kuz’min, V.E., et al. Molecular Informatics, 2011, 30, 593-603.

Page 34: QSAR modeling - Katedra fyzikální chemie UPOLfch.upol.cz/wp-content/uploads/2018/01/ADD_12_Polishchuk_QSAR.pdfOverall QSAR workflow Input data Bioassays Databases Preprocessing Feature

Step 8. Interpretation of QSAR models

Page 35: QSAR modeling - Katedra fyzikální chemie UPOLfch.upol.cz/wp-content/uploads/2018/01/ADD_12_Polishchuk_QSAR.pdfOverall QSAR workflow Input data Bioassays Databases Preprocessing Feature

Step 8. Interpretation of QSAR models

=–

A B C

Activitypred(A) Activitypred(B) Contribution(C)

f(A) = x f(B) = y W(C) = x – y

Polishchuk, P. G.; Kuz'min, V. E.; Artemenko, A. G.; Muratov, E. N. Universal Approach for Structural Interpretation of QSAR/QSPR Models. Molecular Informatics 2013, 32, 843-853

Universal approach

Page 36: QSAR modeling - Katedra fyzikální chemie UPOLfch.upol.cz/wp-content/uploads/2018/01/ADD_12_Polishchuk_QSAR.pdfOverall QSAR workflow Input data Bioassays Databases Preprocessing Feature

Step 8. Interpretation of QSAR models

Page 37: QSAR modeling - Katedra fyzikální chemie UPOLfch.upol.cz/wp-content/uploads/2018/01/ADD_12_Polishchuk_QSAR.pdfOverall QSAR workflow Input data Bioassays Databases Preprocessing Feature

Step 8. Interpretation of QSAR models

Polishchuk P, Tinkov O, Khristova T, Ognichenko L, Kosinskaya A, Varnek A, Kuz’min V. Journal of Chemical Information and Modeling 2016, 56, 1455-1469.

Acute oral toxicity on rats

?

?

Page 38: QSAR modeling - Katedra fyzikální chemie UPOLfch.upol.cz/wp-content/uploads/2018/01/ADD_12_Polishchuk_QSAR.pdfOverall QSAR workflow Input data Bioassays Databases Preprocessing Feature

Step 8. Interpretation of QSAR models

Page 39: QSAR modeling - Katedra fyzikální chemie UPOLfch.upol.cz/wp-content/uploads/2018/01/ADD_12_Polishchuk_QSAR.pdfOverall QSAR workflow Input data Bioassays Databases Preprocessing Feature

Step 8. Interpretation of QSAR models

Page 40: QSAR modeling - Katedra fyzikální chemie UPOLfch.upol.cz/wp-content/uploads/2018/01/ADD_12_Polishchuk_QSAR.pdfOverall QSAR workflow Input data Bioassays Databases Preprocessing Feature

Step 8. Interpretation of QSAR models

Page 41: QSAR modeling - Katedra fyzikální chemie UPOLfch.upol.cz/wp-content/uploads/2018/01/ADD_12_Polishchuk_QSAR.pdfOverall QSAR workflow Input data Bioassays Databases Preprocessing Feature

Interpretation results of valid predictive models should converge independent of:

interpretation approach

descriptors

machine learning method

All models are interpretable but not all end-points

Step 8. Interpretation of QSAR models

Page 42: QSAR modeling - Katedra fyzikální chemie UPOLfch.upol.cz/wp-content/uploads/2018/01/ADD_12_Polishchuk_QSAR.pdfOverall QSAR workflow Input data Bioassays Databases Preprocessing Feature

Overall QSAR workflow

Input data

Bioassays

Databases

PreprocessingFeature

engineeringModel

learningModel

validation

Classification

Regression

Clustering

Cross-validation

Bootstrap

Test set

Applicability Domain

Feature selection

Feature combination

Data normalization

Feature extraction

Interpretation

j

j

ii

z

xxx '

Page 43: QSAR modeling - Katedra fyzikální chemie UPOLfch.upol.cz/wp-content/uploads/2018/01/ADD_12_Polishchuk_QSAR.pdfOverall QSAR workflow Input data Bioassays Databases Preprocessing Feature
Page 44: QSAR modeling - Katedra fyzikální chemie UPOLfch.upol.cz/wp-content/uploads/2018/01/ADD_12_Polishchuk_QSAR.pdfOverall QSAR workflow Input data Bioassays Databases Preprocessing Feature

Step 8. Interpretation of QSAR models

δ)f(xδ)f(xC

f(x)δ)f(xC

δ)f(xf(x)C

forward difference

backward difference

central difference

Partial derivatives is used to calculate contributions of single features.It is applicable to any model built on meaningful (interpretable) descriptors.May be calculated analytically or numerically.

Estimated contributions of separate substructural features can be mapped back on the structure to reveal contribution of fragments.

Marcou, G.; Horvath, D.; Solov'ev, V.; Arrault, A.; Vayer, P.; Varnek, A. Interpretability of SAR/QSAR Models of any Complexity by Atomic Contributions. Molecular Informatics 2012, 31, 639-642