QSAR Model Reproducibility and Applicability: A Case
Study of Rate Constants of Hydroxyl Radical Reaction
Models Applied to Polybrominated Diphenyl Ethers
and (Benzo-)Triazoles
PARTHA PRATIM ROY, SIMONA KOVARICH, PAOLA GRAMATICA
Department of Structural and Functional Biology, University of Insubria, Via Dunant 3, 21100, Varese, Italy
Received 14 February 2011; Revised 22 March 2011; Accepted 22 March 2011
DOI 10.1002/jcc.21820
Published online 3 May 2011 in Wiley Online Library (wileyonlinelibrary.com).
Abstract: The crucial importance of the three central OECD principles for quantitative structure-activity relationship
(QSAR) model validation is highlighted in a case study of tropospheric degradation of volatile organic compounds
(VOCs) by OH, applied to two CADASTER chemical classes (PBDEs and (benzo-)triazoles). The application of any
QSAR model to chemicals without experimental data largely depends on model reproducibility by the user. The
reproducibility of an unambiguous algorithm (OECD Principle 2) is guaranteed by redeveloping MLR models based on both
an updated version of the DRAGON software for molecular descriptor calculation and some freely available online
descriptors. The Genetic Algorithm has confirmed its ability to always select the most informative descriptors
independently of the input pool of variables. The ability of the GA-selected descriptors to model chemicals not used in
model development is verified by three different splittings (random by response, K-ANN and K-means clustering), thus
ensuring the external predictivity of the new models, independently of the training/prediction set composition (OECD Principle 4).
The relevance of checking the structural applicability domain (OECD Principle 3) becomes very evident on comparing
the predictions for CADASTER chemicals, using the new models proposed herein, with those obtained by EPI Suite.
© 2011 Wiley Periodicals, Inc. J Comput Chem 32: 2386–2396, 2011
Key words: reproducible algorithm; molecular descriptors; external validation; applicability domain; CADASTER
chemicals
Introduction
Quantitative structure-activity relationships (QSARs) are predic-
tive models derived from the application of statistical tools
correlating biological activity, physico-chemical properties or
reactivity of chemicals (drugs/industrial chemicals/environmental
pollutants) with descriptors representative of molecular structure
and/or property. QSAR models have demonstrated their utility
for a long time, initially in drug design and more recently also
in general chemical screening of big libraries of compounds. It
is important to distinguish between ‘‘descriptive QSARs’’ and
‘‘predictive QSARs.’’1 In ‘‘descriptive QSARs,’’ the main atten-
tion is focused on modelling the existing data, fitting them as
best as possible, using molecular descriptors that are mostly
selected by a supposed ‘‘understanding’’ of the correlation/cau-
sality, in terms of mechanism interpretability. These kinds of
QSAR models are highly useful for mechanism interpretation,
particularly in local models developed on homogeneous data
sets of congeneric compounds, and are widely applied, mainly
in drug design. However, in virtual screening a ‘‘predictive
QSAR’’ approach should be preferred: global models exploit the
limited existing experimental information to predict information
relative to chemicals without experimental data. This can be
highly useful to screen big data sets and prioritize, for experi-
mental tests, compounds that are in silico highlighted as poten-
tially more dangerous. Thus, the check of predictivity should be
the most important and primary aspect of ‘‘predictive QSARs.’’
The recent European legislation REACH (Registration, Evaluation,
Authorisation and Restriction of Chemicals)2 includes
the use of QSAR models for the prediction of data not
experimentally available. However, the predicted values must be
reliably obtained by QSAR models validated according to the
OECD principles for the validation, for regulatory purposes, of
(Q)SAR models.3 These principles, defined after much discussion
in the QSAR and regulatory communities, are an optimum
summary of the most important points that need to be addressed
to obtain reliable QSAR models. A guidance document on the
validation of QSAR models,4 including useful information on
good practices in QSAR modeling, has been prepared from the
collaborative work of various international experts.
Additional Supporting Information may be found in the online version of
this article.
Correspondence to: P. Gramatica; e-mail: [email protected]
Contract/grant sponsor: European Union (CADASTER); contract/grant
numbers: FP7-ENV-2007-1-212668
In this article, we focussed on three central principles:
Principle 2) an unambiguous algorithm; Principle 3) a defined domain
of applicability; Principle 4) appropriate measures of goodness-of-fit,
robustness and predictivity, in the context of a specific case
study, i.e., the reactivity with hydroxyl radicals in the troposphere.
The intent of Principle 2 (unambiguous algorithm) is to
ensure transparency in the model algorithm that generates pre-
dictions of an endpoint from information on chemical structure
and/or physicochemical properties, so that others can reproduce
the model. In fact, without information on how QSAR estimates
are derived, the performance of a model cannot be
independently established. The algorithms used in QSAR model-
ling (in terms of methods and molecular descriptors) should be
described thoroughly, so that the user can understand exactly
how the estimated value was produced, and be able to reproduce
the calculations, if desired. Thus, the important issue of predic-
tion reproducibility is covered by this OECD principle.
The need to define an applicability domain (Principle 3)
expresses the fact that QSAR models are inevitably associated
with limitations in terms of the types of chemical structures,
physico-chemical properties and mechanisms of action for which the
models can generate reliable predictions. Even a robust, significant
and validated QSAR model cannot be expected to reliably
predict a studied end-point for the entire universe of chemicals.
The applicability domain of a QSAR model has been defined5 as
the response and chemical structure space in which the model
makes predictions with a given reliability and is defined by the
nature of the chemicals in the training set. It is generally
felt that if a new molecule is somehow similar, or is in the
‘‘domain’’ or ‘‘space’’ of the training set, it is likely to be well-
predicted (interpolation), otherwise there is significant ‘‘extrapo-
lation’’ and the prediction could be less reliable: it is highly
useful that a user has this type of information.
Principle 4 expresses the need to perform statistical validation
to establish the performance of a model, which consists
of internal model performance (goodness-of-fit and robustness)
and external model performance (predictivity).6–8
The real utility of QSAR models in the REACH context, and
in specific EU-funded projects dedicated to develop or apply
QSAR models for the REACH legislation, is to obtain reliable
predicted data for compounds without experimental data. In the
CADASTER Project,9 in which the authors are involved, some
classes of emerging pollutants (flame retardants including
PBDEs, perfluorinated chemicals, fragrances and (benzo)tria-
zoles) are studied for the possible application of QSAR predic-
tions in their risk assessment. This offered us the opportunity to
apply some of our previously developed and published QSAR
models10 to CADASTER chemicals to study their persistence,
even without experimental data. Reaction with hydroxyl radicals
in the troposphere is the dominant removal pathway for
many industrial chemicals, and for this reason it is crucial for
determining chemical persistence in air.
Recently, QSAR models with validated external predictivity
for degradation by OH of a large set (460) of volatile organic
compounds (VOCs)10 were developed in our laboratory, using
version 5.0 (2004) of DRAGON for molecular descriptor
calculation. However, when we tried to apply those models to the
prediction of new data sets in the CADASTER Project, we found
that the models were no longer reproducible because some
descriptors were missing or had changed in version 5.5 (2007)11 and
in the latest version, 6 (2009), of the DRAGON software. This is a
serious drawback for QSAR modeling, both for model developers and users.
So this raises the question of what to do when molecular
descriptors of already developed and validated models are no lon-
ger reproducible because of the new software versions used for
molecular descriptors calculation (deletion of some descriptors,
changes of their values, etc). This problem was already faced and
solved in a previous publication when a BCF model had to be
updated for the above reason.12 In fact, DRAGON is a software
for molecular descriptors calculation that is continuously updated,
not only in terms of addition of new descriptors, but also in the
revision of the old ones. This highlights the fact that QSAR mod-
eling is a dynamic process and continuous updating of QSAR
models is useful to have constantly applicable models. Neverthe-
less it is problematic that descriptors, which had demonstrated
their ability to model some data sets can no longer be calculated,
or are calculated differently, by the newer software versions used
by QSAR model developers: thus the practical utility of that spe-
cific model, which is no longer reproducible, is lost, and it is
therefore no longer suitable for new users to apply.
Thus, we decided to verify whether new predictive QSAR models
could be proposed, based on the same data set but using more and
different descriptors, calculated both by the new versions of DRAGON11 and
by freely calculable descriptors available on the web.13 In this work, we
also aimed to verify the ability of our method of variable selection,
Genetic Algorithms, to select descriptors for predictive models
from a changed pool of input descriptors with similar mechanistic
meaning. The final goal is the proposal of reproducible
QSAR models for OH tropospheric degradation, with external
predictivity6–8 rigorously verified on different splittings and also by
applying various statistical parameters,14–18 some of them15–18
proposed after our previous work. These new models will be
practically and regularly applicable to chemicals in the CADASTER
Project (here PBDEs and (benzo)triazoles) and also, for regulation under
REACH, to a wide set of new chemicals, always verifying the
applicability domain. A comparison of our predicted data with
predictions for the same classes obtained by the widely used EPI
Suite software is also performed and discussed.
Materials and Methods
Experimental Data Set
Experimental data of the OH radical degradation rate constants
of 460 heterogeneous organic compounds were obtained from
literature reported by Atkinson.19 The selected data were for
reactions at 25 °C and 1 atm; all the rate constants, reported in
cm³ s⁻¹ per molecule, were transformed to logarithmic units
and multiplied by −1 to obtain positive values (the higher the value
on the −log scale, the lower the reactivity, and vice versa)
and used as the response variable for subsequent QSAR analyses.
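As a minimal illustration of this transformation, the following sketch converts a rate constant into the positive response. The function name and the two rate constants are hypothetical and only show the direction of the scale; a base-10 logarithm is assumed.

```python
import math


def to_response(k_oh: float) -> float:
    """Convert a rate constant (cm^3 s^-1 per molecule) to -log10(k)."""
    if k_oh <= 0:
        raise ValueError("rate constant must be positive")
    return -math.log10(k_oh)


# Hypothetical rate constants, not taken from the data set.
fast = to_response(1.0e-10)  # more reactive compound, lower response
slow = to_response(1.0e-14)  # less reactive compound, higher response
```

A larger response therefore means a slower reaction with OH, which is the convention used for all models in this work.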
The data set includes alkanes, alkenes, alcohols, halogenated
chemicals, amines, aromatics, and other functional groups.
In Supporting Information Table S-I all the chemicals in the
experimental data set, ordered according to their CAS number,
are listed with names, SMILES, molecular descriptor values,
experimental and predicted response values.
Molecular Descriptors
The molecular descriptors for the given compounds were mainly
calculated using DRAGON software11 on the (x, y, z)-atomic
coordinates of the minimal energy conformations determined by
the AM1 method in HYPERCHEM Package.20
In this study we consider only zero-, mono-, bi-dimensional
descriptors in DRAGON 5.5 version. Then we deleted those
descriptors that are no longer available or that have somewhat
different values in the updated version (DRAGON 6.0). Finally
constant values and descriptors found to be correlated pair-wise
were excluded in a pre-reduction step (one of any two descrip-
tors with a K correlation greater than 0.95 was removed to
reduce redundant and not useful information), thus obtaining a
pruned set of 341 molecular descriptors.
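The pre-reduction step can be sketched as follows. This is an illustration only: it assumes the Pearson correlation coefficient as a stand-in for the pair-wise correlation measure, and it keeps the first descriptor of each correlated pair, a choice that the text does not specify. The descriptor names and values are hypothetical.

```python
import math


def pearson(a, b):
    """Pearson correlation between two equally long lists of numbers."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    s_ab = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    s_aa = sum((x - mean_a) ** 2 for x in a)
    s_bb = sum((y - mean_b) ** 2 for y in b)
    return s_ab / math.sqrt(s_aa * s_bb)


def prune(descriptors, threshold=0.95):
    """Drop constant descriptors and one of each highly correlated pair.

    descriptors: dict mapping a descriptor name to its list of values.
    Returns the names that are kept, in input order (assumed greedy rule).
    """
    kept = []
    for name, values in descriptors.items():
        if len(set(values)) < 2:  # constant descriptor, no information
            continue
        if any(abs(pearson(values, descriptors[k])) > threshold for k in kept):
            continue  # redundant with a descriptor that is already kept
        kept.append(name)
    return kept


# Hypothetical toy descriptor table.
toy = {
    "d1": [1.0, 2.0, 3.0, 4.0],
    "d2": [2.0, 4.0, 6.0, 8.1],    # nearly collinear with d1
    "d3": [5.0, 5.0, 5.0, 5.0],    # constant
    "d4": [1.0, -1.0, 1.0, -1.0],  # weakly correlated with d1
}
kept = prune(toy)
```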
For the calculation of the online descriptors, we used the online
platform of molecular descriptors available on the CADASTER
website.13 Different 2D descriptors (E-state, ALogPS, Molprint
fragment, AMBIT descriptors, GSFragment, ISIDA fragments,
etc.) were calculated and then pruned by deleting descriptors
with fewer than 2 unique values as well as those with a pair-wise
correlation > 0.95. In addition, we added ETA descriptors,21 obtaining
a large pool of 1023 input descriptors.
Furthermore, to provide energy information, three quantum-chemical
electronic descriptors were added: the Highest Occupied
Molecular Orbital (HOMO) energy, the Lowest Unoccupied
Molecular Orbital (LUMO) energy, and the HOMO-LUMO gap,
calculated by the semiempirical molecular orbital program MOPAC
(AM1 method for energy minimization) in the software HYPERCHEM.
We used quantum-chemical descriptors previously calculated in our
group,10,22 but the same descriptors can also be freely calculated on the
web.13 The resulting input sets of 344 descriptors (DRAGON and MOPAC)
and 1026 descriptors (online and MOPAC) underwent the subsequent
selection of the best modeling variables.
QSAR Modeling
Multiple linear regression (MLR) and variable selection were
performed by Ordinary Least Squares regression (OLS).23 The
Genetic Algorithm-Variable Subset Selection (GA-VSS)24
approach was applied separately on a set of 344 (DRAGON and
MOPAC) and 1026 (Online and MOPAC) descriptors to select
those most relevant to obtain models with the highest predictive
power. First of all, models with 1–2 variables were developed
by the all-subset-method procedure to explore all the low-dimension
combinations. The number of descriptors was subsequently
increased one by one, and new models were formed. The outcome
of the Genetic Algorithms in the MOBY DIGS software is a
population of 100 regression models, ordered according to their
decreasing internal predictive performance. The coefficient of
determination (R²) was reported as a measure of the total
variance of the response explained by the regression models (fitting).
All the models were validated internally by the leave-one-out
procedure (Q²LOO), and the robustness of the models was further
evaluated by bootstrap (Q²BOOT). The GA was stopped when
increasing the model size did not increase the Q²LOO value to any
significant degree.
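The leave-one-out statistic can be sketched in a few lines. The example below assumes the usual definition Q²LOO = 1 − PRESS/TSS, with each compound predicted by an OLS model fitted on all the other compounds; it is not the MOBY DIGS implementation, and the helper names and toy data are hypothetical.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def ols_fit(X, y):
    """Return [intercept, b1, ...] from the normal equations X'X b = X'y."""
    rows = [[1.0] + list(r) for r in X]
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    return solve(xtx, xty)


def predict(coef, x):
    return coef[0] + sum(c * v for c, v in zip(coef[1:], x))


def q2_loo(X, y):
    """Leave-one-out Q2: refit without compound i, then predict compound i."""
    press = 0.0
    for i in range(len(y)):
        coef = ols_fit(X[:i] + X[i + 1:], y[:i] + y[i + 1:])
        press += (y[i] - predict(coef, X[i])) ** 2
    mean_y = sum(y) / len(y)
    tss = sum((v - mean_y) ** 2 for v in y)
    return 1.0 - press / tss


# Hypothetical toy data: six compounds, two descriptors.
X = [[1.0, 0.0], [2.0, 1.0], [3.0, 0.0], [4.0, 1.0], [5.0, 0.0], [6.0, 1.0]]
y = [2.4, 2.1, 3.6, 2.9, 4.4, 4.1]
q2 = q2_loo(X, y)
```

Refitting n times is affordable for MLR models of this size; comparing Q²LOO between model sizes then gives the stopping criterion described above.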
Evidence that the proposed models were well founded,
and not just the result of chance correlation, was provided by
Y-scrambling testing: new models, based on the GA-selected
descriptors, were recalculated for a randomly reordered
response, which resulted in a significantly lower R² than that of the
originally proposed models. The averaged scrambled R² (R²YS)
was calculated after 500 scrambling iterations.6,7 Additionally,
another parameter, $^{c}R^2_p$ ($^{c}R^2_p = R\sqrt{R^2 - R^2_r}$, $R^2_r$ being the
squared mean correlation coefficient of the random models), was
also calculated25 to check the distance of our
developed models from chance models (Supporting Information
Table S-II).
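A simplified sketch of the Y-scrambling test is given below. To stay short it uses a single descriptor, so that R² is the squared Pearson correlation, whereas the models in this work use four descriptors. R²r is taken here as the mean R² of the scrambled models, which is an assumption; the data, the seed and the function names are hypothetical.

```python
import math
import random


def r_squared(x, y):
    """R2 of a one-descriptor linear model (squared Pearson correlation)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy * sxy / (sxx * syy)


def y_scramble(x, y, n_iter=500, seed=0):
    """Return (R2, mean scrambled R2, cR2p) for a one-descriptor model."""
    rng = random.Random(seed)
    r2 = r_squared(x, y)
    scrambled = []
    for _ in range(n_iter):
        y_perm = y[:]
        rng.shuffle(y_perm)  # break the descriptor-response pairing
        scrambled.append(r_squared(x, y_perm))
    r2_ys = sum(scrambled) / n_iter
    # cR2p = R * sqrt(R2 - R2r); guard against a negative difference.
    c_r2p = math.sqrt(r2) * math.sqrt(max(r2 - r2_ys, 0.0))
    return r2, r2_ys, c_r2p


# Hypothetical data: a strong linear trend plus a small deterministic noise.
x = [float(i) for i in range(30)]
y = [0.5 * v + ((i * 7) % 5 - 2) * 0.3 for i, v in enumerate(x)]
r2, r2_ys, c_r2p = y_scramble(x, y)
```

A model that is not a chance correlation should show R² well above the scrambled average, and therefore a cR²p close to R².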
Particular attention was devoted to the collinearity of the
selected molecular descriptors: in fact, to avoid multicollinearity
without, or with, ‘‘apparent’’ prediction power (due to chance
correlation), regression was calculated only for variable subsets
with an acceptable multivariate correlation with the response, by
applying the QUIK rule (Q Under Influence of K).26 The acceptable
models were only those with a global correlation of the [X + y]
block (KXY) greater than the global correlation of the X block
(KXX), X being the molecular descriptors and y the
response variable. The collinearity in the original set of molecular
descriptors resulted in many similar models that yield more or less
the same predictive power (in the MOBY-DIGS software,23
100 models of different dimensionality). Therefore, when there
were models of similar performance, those with higher ΔK (KXY −
KXX) were selected and further verified.
Data Splitting for External Validation
For this study three different splitting techniques were applied
to select the training set for model development and the pre-
diction set for model external validation: Random by response,
Kohonen Artificial Neural Network (K-ANN) and K-means
clustering.
The random by response splitting was obtained by ordering
the chemicals according to their descending kinetic constant
value, and then putting the most and the least reactive in the
training set and one out of every two chemicals in the prediction
set (50% of the full dataset). This splitting guarantees that the
prediction set spans the entire range of the experimental
measurements and is numerically representative of the dataset.
However, such splitting does not guarantee that the training set
represents the entire molecular descriptor space of the original
dataset, being only dependent on response values.
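The response-ordered splitting can be sketched as below. The exact assignment rule is not spelled out in the text, so this version makes an assumption: after sorting, every second compound goes to the prediction set, except the last one, so that both extremes remain in the training set. The response values are hypothetical.

```python
def split_by_response(values):
    """Return (training_idx, prediction_idx) from a response-ordered split."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    training, prediction = [], []
    last_rank = len(order) - 1
    for rank, idx in enumerate(order):
        # Odd ranks go to the prediction set; the first and last ranks
        # (the extremes of the response) are always kept for training.
        if rank % 2 == 1 and rank != last_rank:
            prediction.append(idx)
        else:
            training.append(idx)
    return training, prediction


# Hypothetical response values for six compounds.
values = [9.8, 12.1, 10.5, 11.3, 13.0, 10.9]
training, prediction = split_by_response(values)
```

With an even number of compounds this rule gives a training set that is two compounds larger than the prediction set, so an exactly balanced split would need a slightly different rule.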
2388 Roy, Kovarich, and Gramatica • Vol. 32, No. 11 • Journal of Computational Chemistry
Journal of Computational Chemistry DOI 10.1002/jcc
The splitting of the data set realized by Kohonen Artificial
Neural Network (K-ANN)27 takes advantage of the clustering
capabilities of K-ANN, allowing the selection of a structurally
meaningful training set and a representative prediction set. The
211 most significant principal components, calculated from each
group of DRAGON molecular descriptors, were used to describe
the relevant structural information of the chemicals. This structural
information and the response were used as variables to build
a Kohonen map (12 × 12 neurons, 500 epochs). At the end of
500 epochs of the net training, similar chemicals fall within the
same neuron, i.e., they carry the same information. To select the
training set of chemicals, it is assumed that the compound closest
to each neuron centroid is the most representative of all the chem-
icals within the same neuron. Thus, the selection of the training
set chemicals was performed by the minimal distance from the
centroid of each cell in the top map. The remaining objects, close
to the training set chemicals, were used for the prediction set.
Another approach for splitting into training and prediction
sets is by using K-means clustering28 based on the standardized
predictor variables (DRAGON zero-, mono-, and bi-dimensional
descriptors). This nonhierarchical approach (clustering) ensures
that the similarity principle can be employed for grouping chem-
icals and splitting them in balanced training and prediction sets.
It must be supplied with the number of clusters (K) into which
the data are to be grouped and it expresses only the final cluster
membership for each compound. This procedure ensures that
any chemical classes (as determined by the clusters derived
from the K-means clustering technique) will be represented in
both series of compounds, i.e., training and prediction sets,
selecting randomly from each cluster.
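The last step, random selection within each cluster, can be sketched as follows. The cluster labels are assumed to come from a K-means run performed beforehand (not shown); the 50% fraction, the seed and the function name are illustrative assumptions.

```python
import math
import random
from collections import defaultdict


def split_within_clusters(labels, fraction=0.5, seed=0):
    """Randomly split each cluster into training and prediction members.

    labels: cluster label of each compound, e.g. from a prior K-means run.
    Returns (training_idx, prediction_idx), both sorted.
    """
    rng = random.Random(seed)
    members = defaultdict(list)
    for idx, lab in enumerate(labels):
        members[lab].append(idx)
    training, prediction = [], []
    for lab in sorted(members):
        idxs = members[lab][:]
        rng.shuffle(idxs)
        # At least one member of every cluster goes to the training set.
        n_train = max(1, math.ceil(len(idxs) * fraction))
        training.extend(idxs[:n_train])
        prediction.extend(idxs[n_train:])
    return sorted(training), sorted(prediction)


# Hypothetical cluster labels for ten compounds.
labels = [0, 0, 0, 0, 1, 1, 2, 2, 2, 2]
training, prediction = split_within_clusters(labels)
```

Note that a cluster with a single member can only appear in the training set under this rule.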
External Validation
The statistically internally optimized models were further
evaluated for their real predictive power on the prediction set
chemicals not used in the model building process. The developed
models were judged by different external validation parameters:
Q²F1 (ref. 14), Q²F2 (ref. 15), Q²F3 (ref. 16), r²m (ref. 17) and a
parameter recently proposed by our group.18
Q²F1 (ref. 14), which has long been widely used as a metric for external
validation, was calculated for all the developed models to assess
their external predictivity. The two other variants of Q² for
external validation, Q²F2 (ref. 15) and Q²F3 (ref. 16), were also calculated.
They are expressed as follows:

$$Q^2_{F1} = 1 - \frac{\sum_{i=1}^{n_{EXT}} (\hat{y}_i - y_i)^2}{\sum_{i=1}^{n_{EXT}} (y_i - \bar{y}_{TR})^2} = 1 - \frac{PRESS}{SS_{EXT}(\bar{y}_{TR})} \qquad \text{(i)}$$

$$Q^2_{F2} = 1 - \frac{\sum_{i=1}^{n_{EXT}} (\hat{y}_i - y_i)^2}{\sum_{i=1}^{n_{EXT}} (y_i - \bar{y}_{EXT})^2} = 1 - \frac{PRESS}{SS_{EXT}(\bar{y}_{EXT})} \qquad \text{(ii)}$$

$$Q^2_{F3} = 1 - \frac{\left[\sum_{i=1}^{n_{EXT}} (\hat{y}_i - y_i)^2\right] / n_{EXT}}{\left[\sum_{i=1}^{n_{TR}} (y_i - \bar{y}_{TR})^2\right] / n_{TR}} = 1 - \frac{PRESS / n_{EXT}}{TSS / n_{TR}} \qquad \text{(iii)}$$

where $\hat{y}_i$ and $y_i$ indicate the calculated and observed activity values,
respectively; $\bar{y}_{TR}$ and $\bar{y}_{EXT}$ indicate the response means of the training and
external test sets, respectively. PRESS is the predictive sum of
squares; $SS_{EXT}(\bar{y}_{TR})$ and $SS_{EXT}(\bar{y}_{EXT})$ are the total sums of
squares of the external set calculated by means of the training
set mean and the external set mean, respectively. TSS is the
total sum of squares.
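The three external Q² variants defined above translate directly into code. The sketch below follows those definitions term by term, taking TSS as the total sum of squares of the training set; the function name and the numerical values are hypothetical.

```python
def external_q2(y_train, y_ext, y_ext_pred):
    """Return (Q2_F1, Q2_F2, Q2_F3) for an external prediction set.

    y_train: observed responses of the training set.
    y_ext: observed responses of the external (prediction) set.
    y_ext_pred: responses calculated by the model for the external set.
    """
    n_tr, n_ext = len(y_train), len(y_ext)
    mean_tr = sum(y_train) / n_tr
    mean_ext = sum(y_ext) / n_ext
    press = sum((p - o) ** 2 for p, o in zip(y_ext_pred, y_ext))
    ss_ext_tr = sum((o - mean_tr) ** 2 for o in y_ext)    # about training mean
    ss_ext_ext = sum((o - mean_ext) ** 2 for o in y_ext)  # about external mean
    tss_tr = sum((o - mean_tr) ** 2 for o in y_train)     # training set TSS
    q2_f1 = 1.0 - press / ss_ext_tr
    q2_f2 = 1.0 - press / ss_ext_ext
    q2_f3 = 1.0 - (press / n_ext) / (tss_tr / n_tr)
    return q2_f1, q2_f2, q2_f3


# Hypothetical observed and predicted responses.
y_train = [10.0, 11.0, 12.0, 13.0]
y_ext = [10.5, 13.0]
y_ext_pred = [10.7, 12.7]
q2_f1, q2_f2, q2_f3 = external_q2(y_train, y_ext, y_ext_pred)
```

Because the sum of squares about the external mean is the smallest possible, Q²F2 can never exceed Q²F1 for the same predictions, which is one reason why the parameters are not always concordant.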
An additional parameter, r²m (ref. 17), which penalizes a model for
large differences between the observed and predicted values of the
prediction set compounds and is independent of the means of the
training and prediction sets, was also calculated to assess model
external predictivity. It is defined as:

$$r^2_m = r^2 \left(1 - \sqrt{r^2 - r^2_0}\right) \qquad \text{(iv)}$$

where $r^2$ and $r^2_0$ are the determination coefficients of the linear relations
between the observed and predicted values of the prediction set
compounds with and without intercept, respectively.
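A sketch of the r²m calculation is given below. It assumes that the observed values are regressed on the predicted ones, and that r²0 is computed as one minus the residual sum of squares of the no-intercept line divided by the total sum of squares of the observed values; these conventions are assumptions, since the text only names the two determination coefficients. The data are hypothetical.

```python
import math


def r2m(observed, predicted):
    """r2m = r2 * (1 - sqrt(r2 - r0_2)) for a prediction set."""
    n = len(observed)
    mo, mp = sum(observed) / n, sum(predicted) / n
    s_op = sum((o - mo) * (p - mp) for o, p in zip(observed, predicted))
    s_oo = sum((o - mo) ** 2 for o in observed)
    s_pp = sum((p - mp) ** 2 for p in predicted)
    r2 = s_op ** 2 / (s_oo * s_pp)  # line with intercept
    # Line through the origin: observed = k * predicted (assumed convention).
    k = sum(o * p for o, p in zip(observed, predicted)) / sum(p * p for p in predicted)
    r0_2 = 1.0 - sum((o - k * p) ** 2 for o, p in zip(observed, predicted)) / s_oo
    # r2 >= r0_2 under this convention; max() only guards rounding noise.
    return r2 * (1.0 - math.sqrt(max(r2 - r0_2, 0.0)))


# Hypothetical observed and predicted responses.
obs = [10.0, 11.0, 12.0, 13.0]
pred = [10.2, 10.9, 12.3, 12.8]
value = r2m(obs, pred)
```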
Finally, an additional measure of the accuracy of the proposed
QSARs is the Root Mean Squared Error (RMSE), which
summarizes the overall error of the model. It is calculated as the
square root of the sum of squared errors in prediction divided by
their total number. This parameter was used to compare the
accuracy and the stability of our models in the training (RMSET)
and in the prediction (RMSEP) sets.
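Written out, this verbal definition corresponds to the expression below, where n is the number of compounds in the set considered (the training set for RMSET, the prediction set for RMSEP), and the calculated and observed responses are denoted as in the external Q² definitions:

```latex
\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left(\hat{y}_i - y_i\right)^2}
```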
Applicability Domain
In this study, the AD was defined by the leverage approach29
(for the structural domain), and by the identification of response
outliers (compounds with cross-validated standardized residuals
greater than 2.5 standard deviation units).
Graphically, the plot of hat values (h) versus standardized
residuals, i.e., the Williams graph, verified the presence of
response outliers and of training set chemicals that are structurally
very influential in determining model parameters (compounds
with a leverage value (h) greater than 3p′/n (h*), where p′ is the
number of model variables plus one, and n is the number of
objects used to calculate the model). The data predicted for
high leverage chemicals in the prediction set are extrapolated
and could be less reliable.
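The leverage check can be sketched as follows, assuming the usual hat-value definition h = x′(X′X)⁻¹x with an intercept column included in X, which is consistent with p′ being the number of variables plus one. The helper names and the one-descriptor toy data are hypothetical.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def leverages(X_train, X_query=None):
    """Hat values of the query compounds with respect to the training set."""
    train = [[1.0] + list(r) for r in X_train]
    query = train if X_query is None else [[1.0] + list(r) for r in X_query]
    p = len(train[0])
    xtx = [[sum(r[i] * r[j] for r in train) for j in range(p)] for i in range(p)]
    # h = x' (X'X)^-1 x, obtained by solving (X'X) z = x and taking x . z
    return [sum(a * b for a, b in zip(r, solve(xtx, r))) for r in query]


def warning_leverage(n_variables, n_objects):
    """Cut-off h* = 3 p' / n, with p' = number of variables + 1."""
    return 3.0 * (n_variables + 1) / n_objects


# Hypothetical training set: one descriptor, ten compounds, one far away.
X_train = [[float(v)] for v in range(1, 10)] + [[30.0]]
h = leverages(X_train)
h_star = warning_leverage(1, len(X_train))
influential = [i for i, v in enumerate(h) if v > h_star]
# Hypothetical query compounds: one inside and one outside the domain.
h_query = leverages(X_train, [[15.0], [60.0]])
```

For prediction set or new chemicals, the hat value is computed against the training set matrix, so a value above h* signals that the model is extrapolating for that compound.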
Results and Discussion
The studied data set of the kinetic constant for degradation by
OH (kOH) of VOCs has been modeled in the past by some
authors,10,30–34 including our group,10,31 with similar performan-
ces, and it is also included in the training set of the AOPWIN
package in the widely used software EPI Suite.35 We have rede-
veloped QSAR models on this data set to apply them to
CADASTER chemicals, and to have an updated reproducible
model, rigorously validated for its external predictivity and
applicability domain, for possible application in REACH. A
wide range of theoretical molecular descriptors (zero-, mono-,
and bi-dimensional) were here used as input descriptors (some
calculated from new versions of DRAGON software11 and some
freely-available online13) to find the statistical correlation
with the studied response, based on updated and reproducible
descriptors. Additionally quantum chemical descriptors, like
HOMO and LUMO energies and HOMO-LUMO gap, were used
in the input pool of variables, as they had already demonstrated
in previous works10,31,33,34 to have a pivotal role in reactivity
modeling. The Genetic Algorithm, as Variable Subset Selection
(GA-VSS), was applied to select only the best combination of
descriptors from both pools, affording models with the highest
internal predictive power (verified by cross-validation). Since the
main utility of QSAR models, mainly for virtual screening, is
their ability to make accurate predictions for new query com-
pounds, never used in model development but within their
applicability domain, we supposed that a part of the experimen-
tally available data were not known and put them in the predic-
tion set, which was not used for model development, but was
used only later to check the predictive power of our models
developed on the reduced training set. Three different splitting
procedures were adopted, two based on structural similarity
analysis (K-ANN, K-means) and one random by sorting the
response, to propose models with a demonstrated high
performance in predicting external chemicals of different typology,
avoiding the bias derived from a single split. The selection of
modeling variables by GA was performed by Multiple Linear
Regression separately in the three different training sets,
obtaining three parallel populations of good models with similar
internal predictivity (Q² > 0.7), which were then verified for performance
on the corresponding external prediction sets. Those models based on
the same combination of descriptors, selected independently in the
three splittings and demonstrating high predictivity on the
respective prediction set chemicals, were chosen as the best for
external predictive performance. In fact, similarly good
performance in the prediction of ‘‘supposed unknown’’ chemicals in
each splitting demonstrates the validity of that particular
combination of structural information in predicting the studied
response, regardless of the composition of the training sets (thus
unbiased by structure and response value).
For external predictivity check, because of the recent increase
of various statistical parameters, proposed and preferred by vari-
ous authors14–17 and because we have verified that they are not
always concordant,18 we have applied all the parameters
reported in Table 1, those already published14–17 and one that
we recently adopted in our lab.18 Finally, the set of combined
descriptors, which had been demonstrated as useful for the pre-
diction of chemicals not used in model development, was
applied to derive a full model from the complete data set, in
order not to lose any available information36 (Scheme of the
procedure in Fig. 1).
Model Based on DRAGON Descriptors
The chosen predictive models selected from a population of 100
different models were based on the same 4 variables in the three
split training sets (by K-ANN, by K-means algorithm and by
random on response). They are listed in Table 1 with their statis-
tical parameters.
It is evident that all models perform similarly in their
ability to predict external chemicals, independently of the
splitting. Additionally, similar values of RMSE in both the
training and prediction sets are a guarantee of model generalizability.
The difference between the split models lies in the regression
coefficients, which depend on the training set composition. By
Principal Component Analysis of the compounds, represented
by the selected modeling descriptors, it is possible to verify
that in all three splittings the distribution between the
training and prediction sets is balanced and representative of the
chemical domain. The PCA score plots of the first three
components of the selected descriptor matrix (Supporting
Information Fig. S1) show the distribution of training and prediction
set compounds in 3D space: each prediction set member is
close to at least one training set member in the
multidimensional space.

Table 1. Comparative Statistical Performances of Different Developed Models.

Splitting method (no. of chemicals)   R²      Q²LOO   Q²F1    Q²F2    Q²F3    R²m    Conc. coeff.   RMSE (prediction)   R²YS

DRAGON descriptors: HOMO, nX, IDE, nCbH
K-ANN (191^a/269^b)                   0.867   0.856   0.797   0.794   0.766   0.77   0.89           0.47                0.021
Random (230^a/230^b)                  0.826   0.817   0.819   0.819   0.810   0.80   0.90           0.44                0.018
K-means (230^a/230^b)                 0.836   0.827   0.804   0.802   0.836   0.75   0.90           0.43                0.017
Full model (460)                      0.824   0.819   -       -       -       -      0.90^1         0.43^1              0.009

Online descriptors: HOMO, SeaC2C2aa, G_([Cl,Br,I]), D_PathSum(F,rel)
K-ANN (191^a/269^b)                   0.847   0.834   0.778   0.775   0.745   0.76   0.88           0.49                0.020
Random (230^a/230^b)                  0.814   0.803   0.796   0.795   0.786   0.76   0.89           0.47                0.017
K-means (230^a/230^b)                 0.813   0.803   0.795   0.793   0.829   0.75   0.89           0.44                0.018
Full model (460)                      0.806   0.801   -       -       -       -      0.89^1         0.45^1              0.008

Q²F1, Q²F2, Q²F3, R²m and the concordance coefficient are taken from refs. 14, 15, 16, 17 and 18, respectively.
^a Training compounds; ^b Prediction compounds; ^1 Results for all the chemicals.
Figure 1. Scheme of the QSAR procedure for model development and external validation.
All the plots of experimental vs. predicted values in the three
splittings, as well as the corresponding Williams plots for analysis
of the applicability domain (AD), are in the Supporting Information
(Figs. S2 and S3); here we report the equation and the
graphs for the full model [eq. (i); Fig. 2]:

$$-\log(\mathrm{OH}) = 4.07(\pm 0.48) - 0.72(\pm 0.04)\,\mathrm{HOMO} + 0.37(\pm 0.04)\,\mathrm{nX} + 0.16(\pm 0.02)\,\mathrm{nCbH} - 0.34(\pm 0.07)\,\mathrm{IDE} \qquad \text{(i)}$$

$$n = 460;\quad R^2 = 0.824;\quad Q^2_{LOO} = 0.819;\quad Q^2_{BOOT} = 0.817;\quad \mathrm{RMSE}_{tr} = 0.43;\quad \mathrm{RMSE}_{CV} = 0.43$$
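Applying the full DRAGON model to a new chemical is a plain linear combination. The sketch below uses the coefficients reported for the full model (intercept 4.07, HOMO −0.72, nX 0.37, nCbH 0.16, IDE −0.34); the function name and the descriptor values of the example compounds are hypothetical, and the HOMO energy is assumed to be expressed in the same unit as in the training data.

```python
def predict_neg_log_oh(homo, n_x, n_cbh, ide):
    """Predicted -log(OH) response from the four DRAGON/MOPAC descriptors."""
    return 4.07 - 0.72 * homo + 0.37 * n_x + 0.16 * n_cbh - 0.34 * ide


# Hypothetical descriptor values, not taken from the data set.
base = predict_neg_log_oh(homo=-9.5, n_x=0, n_cbh=0, ide=1.0)
halogenated = predict_neg_log_oh(homo=-9.5, n_x=2, n_cbh=0, ide=1.0)
```

Such a prediction is meaningful only for chemicals that fall inside the applicability domain of the model, which has to be checked separately.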
It is important to note that the applied variable selection procedure,
GA, was able to select for the current model, from a wider and
slightly different set of descriptors calculated by updated versions of
DRAGON, four descriptors (HOMO, nX, nCbH, IDE) that are either
identical to, or carry almost the same information as, those in the
previously published model10 (HOMO, nX, CIC0, nCaH), and was
able to confirm their respective negative or positive influence on
the studied response. The highest occupied molecular orbital (HOMO)
energy, already a well-recognized molecular property for OH modeling,
was again found to be the best descriptor in all the models,
negatively correlated to the response (here standardized regr.
coefficient = −0.755). This descriptor characterizes the susceptibility
of a molecule toward attack by the electrophilic OH radical,
more reactive chemicals having higher HOMO energy. Further,
nX (standardized regr. coefficient = 0.356) is the number of halogen
atoms. Molecules with more halogen atoms tend to be less
reactive (higher −log kOH values). No longer present in the
updated versions of the DRAGON software is nCaH, which was the
number of unsubstituted sp2-carbons in any ring, mainly aromatics.
The new descriptor, selected here as an alternative to nCaH, is nCbH
(the number of unsubstituted sp2-carbons only in benzene-type
rings; standardized regr. coefficient = 0.324). These descriptors,
which are negatively correlated to the response in univariate models,
are both able to condense information on possible reactive sites
in aromatic rings. Chemicals with a higher number of hydrogen
atoms are more exposed to attack by the hydroxyl radical and are, for
this reason, more reactive. Less important are the topological
descriptors, CIC0 in the old version and IDE in the new version
(standardized regr. coefficient = −0.201). They are information
indices carrying similar structural information, and
are interchangeable without significant loss of model quality. Thus
it can be stated that the Genetic Algorithm reliably extracted the structural
information included in the above combination of descriptors,
which was obtained from different training set inputs for model
development, and demonstrated its ability also in external prediction.
Model Based on Online Descriptors
Moreover, we have developed QSAR models based on freely
available online 2D descriptors,13 to propose models that are
also applicable without commercial software for descriptor
calculation. The best predictive models were found in a population
of 100 models with 4 variables, using K-ANN, random and
K-means algorithms separately for splitting; they are listed in
Table 1 with their statistical parameters. The final stable
combination of descriptors, present in all the model populations obtained
from the three different training set inputs and with maximum
predictive performance on the prediction set compounds, was:
HOMO, SeaC2C2aa, D_PathSum(F, rel), G_([Cl, Br, I]). Finally, a
full model with significant statistical quality [eq. (ii)] was
developed based on the above-mentioned descriptors (Fig. 3):
Figure 2. (a) Plot of experimental vs. calculated values for the full model based on DRAGON
descriptors; (b) Williams plot for the AD of the DRAGON full model.
$$
-\log(\mathrm{OH}) = 3.83\,(\pm 0.48) - 0.69\,(\pm 0.05)\,\mathrm{HOMO} + 1.26\,(\pm 0.17)\,\text{D\_pathSum(F, rel)} + 0.43\,(\pm 0.07)\,\text{G\_([Cl, Br, I])} + 0.06\,(\pm 0.01)\,\mathrm{SeaC2C2aa} \qquad \text{eq. (ii)}
$$

$$
n = 460;\quad R^2 = 0.806;\quad Q^2_{\mathrm{LOO}} = 0.801;\quad Q^2_{\mathrm{BOOT}} = 0.797;\quad \mathrm{RMSE_{tr}} = 0.45;\quad \mathrm{RMSE_{CV}} = 0.45
$$
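As a worked illustration of how eq. (ii) is applied to a single compound, the following sketch evaluates the equation with the coefficients printed above. The descriptor values in the example are hypothetical and were chosen only to make the arithmetic easy to follow; in practice they would come from the HOMO calculation and the online descriptor tools.

```python
# Coefficients of eq. (ii) as printed in the text.
INTERCEPT = 3.83
COEFFICIENTS = {
    "HOMO": -0.69,
    "D_pathSum(F, rel)": 1.26,
    "G_([Cl, Br, I])": 0.43,
    "SeaC2C2aa": 0.06,
}


def predict_minus_log_oh(descriptors):
    """Return the eq. (ii) response for one compound.

    `descriptors` maps each descriptor name to its value; a missing
    descriptor raises KeyError rather than silently defaulting to zero.
    """
    return INTERCEPT + sum(coef * descriptors[name]
                           for name, coef in COEFFICIENTS.items())


# Hypothetical descriptor values, chosen only to show the arithmetic.
example = {
    "HOMO": -10.0,
    "D_pathSum(F, rel)": 0.0,
    "G_([Cl, Br, I])": 1.0,
    "SeaC2C2aa": 0.0,
}
# 3.83 + (-0.69)(-10.0) + 0.43(1.0) = 11.16
predicted = predict_minus_log_oh(example)
```

A prediction obtained in this way says nothing about its own reliability, so it should be accompanied by a check of the applicability domain.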
Also from this completely different pool of input descriptors, GA
selected HOMO as the most relevant (Std coeff. = −0.718). The
E-state index37 SeaC2C2aa (Std coeff. = 0.266) is the sum of the bond
electrotopological values of carbon–carbon aromatic bonds in
which the carbons are not substituted. This descriptor, which is
inversely correlated with the modeled response in the univariate
model, gives similar information to the nCbH DRAGON descriptor.
The remaining two descriptors, D_pathSum(F, rel) (Std coeff. = 0.319)
and G_([Cl, Br, I]) (Std coeff. = 0.265), both positively correlated
to the response as is nX, are AMBIT descriptors38 and are
counts of the number of halogen atoms in the molecules. Together,
these descriptors thus carry the same information as nX in the
current and previous DRAGON descriptor models. Interestingly, it can
again be stated that GA identified the useful variables for the
modeling of hydroxyl radical rate constants irrespective of the
different input descriptors.
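The idea of GA-based variable selection can be sketched generically as follows. This is not the implementation used by the authors: the operators (truncation selection with elitism, crossover from the union of two parents, single-swap mutation), all parameter values, and the toy fitness function are assumptions made for illustration. In a real application the fitness would be a model-quality measure such as a cross-validated Q2 of the MLR model built on the selected columns.

```python
import random


def ga_select(n_descriptors, subset_size, fitness, generations=40,
              population_size=30, mutation_rate=0.2, seed=0):
    """Search for a descriptor subset that maximises `fitness`.

    Each individual is a sorted tuple of `subset_size` distinct column
    indices. `fitness` is any callable that scores such a tuple.
    """
    rng = random.Random(seed)
    all_indices = range(n_descriptors)

    def random_subset():
        return tuple(sorted(rng.sample(all_indices, subset_size)))

    def crossover(a, b):
        # The child draws its descriptors from the union of both parents.
        pool = sorted(set(a) | set(b))
        return tuple(sorted(rng.sample(pool, subset_size)))

    def mutate(individual):
        # Occasionally swap one selected descriptor for an unselected one.
        if rng.random() >= mutation_rate:
            return individual
        unused = [i for i in all_indices if i not in individual]
        if not unused:
            return individual
        kept = list(individual)
        kept[rng.randrange(subset_size)] = rng.choice(unused)
        return tuple(sorted(kept))

    population = [random_subset() for _ in range(population_size)]
    for _ in range(generations):
        ranked = sorted(population, key=fitness, reverse=True)
        parents = ranked[: population_size // 2]  # truncation selection
        children = [
            mutate(crossover(rng.choice(parents), rng.choice(parents)))
            for _ in range(population_size - len(parents))
        ]
        population = parents + children  # parents are kept (elitism)
    return max(population, key=fitness)


# Toy fitness (hypothetical): counts how many "informative" columns
# are present in the subset.
informative = {2, 7, 11, 19}


def toy_fitness(subset):
    return len(informative.intersection(subset))


best = ga_select(n_descriptors=25, subset_size=4, fitness=toy_fitness)
```

Running the same search on different descriptor pools and checking whether descriptors with equivalent meaning are selected mirrors, in a very simplified way, the comparison made above between the DRAGON and online descriptor models.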
Applicability Domain
QSAR models are developed on a defined domain of compounds,
based on the properties and structures of the training set compounds.
Predictions for new chemicals outside this chemical domain are
therefore extrapolations and are more likely to be inaccurate.
Thus, there is the need for a quantitative measure of the
applicability domain (AD) to identify problematic chemicals5 in
the modeled data set, highlighting both chemicals that are
outliers for the response (not well predicted) and chemicals that
are outliers for their peculiar structure (influential or high
leverage outliers). An interesting extension of the applicability
domain study, particularly for "predictive" QSAR models, is to
check whether new chemicals without experimental data belong to
the training chemical space, to verify whether the predicted data
are interpolations or extrapolations of the proposed model.
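The two kinds of outlier described above are what a Williams plot separates. The sketch below labels compounds from precomputed hat values and standardized residuals. Two cut-offs are assumptions not given in this section: the warning leverage h* = 3(p + 1)/n, with p model descriptors and n training compounds, and a residual limit of 2.5 standard deviation units. The input values are invented.

```python
def williams_flags(hat_values, std_residuals, n_model_variables,
                   residual_cutoff=2.5):
    """Label each training compound as it would appear on a Williams plot.

    Assumptions: warning leverage h* = 3 * (p + 1) / n, and a response
    outlier has |standardized residual| > residual_cutoff.
    """
    n = len(hat_values)
    h_star = 3.0 * (n_model_variables + 1) / n
    flags = []
    for h, r in zip(hat_values, std_residuals):
        high_leverage = h > h_star
        response_outlier = abs(r) > residual_cutoff
        if high_leverage and response_outlier:
            flags.append("both")
        elif high_leverage:
            flags.append("structurally influential")
        elif response_outlier:
            flags.append("response outlier")
        else:
            flags.append("inside AD")
    return h_star, flags


# Hypothetical values for a 10-compound, 1-descriptor model.
hat = [0.1, 0.1, 0.7, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.5]
res = [0.5, -3.0, 0.2, 1.0, -0.4, 0.3, 2.9, -1.0, 0.1, -0.8]
h_star, flags = williams_flags(hat, res, n_model_variables=1)
```

Here the third compound is flagged as structurally influential and the second and seventh as response outliers, while the remaining compounds fall inside the domain.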
The outlier compounds in the training and prediction sets of the
different splittings (Supporting Information Figs. S2 and S3)
differ somewhat, owing to the different combinations of compounds
and modeling descriptors. However, on analyzing the
applicability domain for the above models, and also for the full
models (Figs. 2b and 3b), some common compounds were
found to be outliers or influential in all the models:
i. Triethyl phosphate (61) and 2-(chloromethyl)-3-chloro-1-propene
(403) are two response outliers that were predicted as
less reactive by all the models;
ii. Bromomethane (18), dimethyl sulfide (37), diethyl sulfide
(263), ethyl methyl sulfide (353) and 3-methyl-1,2-butadiene
(342) are response outliers that were predicted as more
reactive by all the models, raising some doubts with regard
to the quality of the experimental data of these compounds,
for which new experimental measurements are suggested;
iii. Fluorinated chemicals: 1,1,2,2-tetrachloroethene (232),
1,1-dichloro-2,2,2-trifluoroethane (262),
1,1,1,2,2-pentafluoroethane (265), hexafluorobenzene (267),
1-chloro-1,2,2,2-tetrafluoroethane (414) and
propylpentafluorobenzene (457) are highly structurally
influential compounds in all the models.
This was already found in our previous study.10
Figure 3. (a) Plot of experimental vs. calculated values for the full model based on Online descriptors;
(b) Williams plot for the AD of the online full model.
Application to CADASTER Chemicals: PBDEs
and (B)TAZs
Two classes of CADASTER chemicals, namely polybrominated
diphenyl ethers (PBDEs) and (benzo)triazoles (BTAZs),
were used to verify the applicability of our models in the
prediction of chemicals without experimental data. At the same
time, we verified whether the newly studied chemicals lie
within the structural AD of our models by checking their
leverage (hat value in comparison to the h* cut-off value). In
Figure 4 two plots of predicted values vs. hat values are
reported for both sets (PBDEs and BTAZs) for the DRAGON
model. It is evident from these plots that all the PBDEs are
outside the applicability domain of our model, whereas
almost 75% of the BTAZs are within it.
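The leverage check for a chemical without experimental data can be sketched as follows. The hat value of a new compound with descriptor vector x is h = x'(X'X)^-1 x, where X is the training descriptor matrix. The inclusion of an intercept column and the cut-off h* = 3(p + 1)/n are assumptions made here, since the definition of h* is not repeated in this section; the small linear solver is included only to keep the example free of external libraries.

```python
def solve(a, b):
    """Solve the small linear system a @ x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("X'X is singular; descriptors are collinear")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def leverage(train_descriptors, new_descriptors):
    """Hat value h = x' (X'X)^-1 x of one new compound.

    Rows of `train_descriptors` are training compounds; an intercept
    column of ones is added to the training matrix and to x (assumption).
    """
    X = [[1.0] + list(row) for row in train_descriptors]
    x = [1.0] + list(new_descriptors)
    k = len(x)
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(k)]
           for i in range(k)]
    return sum(xi * zi for xi, zi in zip(x, solve(xtx, x)))


def warning_leverage(n_train, n_model_variables):
    """h* = 3 (p + 1) / n, a commonly used cut-off (assumed here)."""
    return 3.0 * (n_model_variables + 1) / n_train


# Hypothetical one-descriptor training set and two new compounds.
train = [[0.0], [1.0], [2.0], [3.0]]
h_inside = leverage(train, [1.5])    # at the centroid of the training set
h_outside = leverage(train, [10.0])  # far from the training set
h_star = warning_leverage(n_train=4, n_model_variables=1)
```

A compound whose hat value exceeds h* is structurally far from the training set, so its predicted value is an extrapolation; this is the situation reported above for all the PBDEs.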
From the PBDE plot (Fig. 4a) it can be verified that chemicals
with an increasing number of bromine atoms tend to lie farther
from the domain and were extrapolated as less reactive than
those with fewer Br atoms, which were extrapolated as more
reactive.
On evaluating the domain of applicability for BTAZs (Fig.
4b), we did not observe any significant trend. We verified that
chemicals within the applicability domain, interpolated as highly
reactive, have a thio linkage in their structures, whereas
chemicals far from the AD, extrapolated as less reactive,
have more fluorine atoms or a metal atom
in their chemical structure.
In addition, we obtained predicted data for the same chemicals
by applying the widely used online package EPI Suite.35
Supporting Information Table S-III shows that the difference
between the predictions of our models and those of EPI Suite for
PBDEs is within 0.8 log units (91% within 0.5 log units); indeed,
a good correlation (94%) between the two sets
of predicted values is observed. The dominant trend in both
modeling approaches is determined mainly by the number of
bromines. Thus, we can conclude that our model and EPI Suite
give similar predicted data, but our AD check reveals that
all these data are extrapolated and, for this reason, could be
unreliable; similar information on reliability based on the AD
is not available in EPI Suite.
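A comparison of this kind between two sets of predictions can be summarised with a few lines of code. The sketch below is illustrative only: the numbers are invented, and because the text does not state whether the quoted correlation is r or r squared, the Pearson r is returned and can be squared by the caller if needed.

```python
from math import sqrt


def compare_predictions(ours, reference, tolerance=0.5):
    """Summarise agreement between two sets of predicted values (log units).

    Returns the largest absolute difference, the fraction of compounds
    whose predictions differ by at most `tolerance`, and the Pearson r.
    """
    if len(ours) != len(reference) or len(ours) < 2:
        raise ValueError("need two equally long lists with >= 2 values")
    n = len(ours)
    diffs = [abs(a - b) for a, b in zip(ours, reference)]
    mean_a = sum(ours) / n
    mean_b = sum(reference) / n
    cov = sum((a - mean_a) * (b - mean_b) for a, b in zip(ours, reference))
    var_a = sum((a - mean_a) ** 2 for a in ours)
    var_b = sum((b - mean_b) ** 2 for b in reference)
    return {
        "max_abs_diff": max(diffs),
        "fraction_within": sum(d <= tolerance for d in diffs) / n,
        "pearson_r": cov / sqrt(var_a * var_b),
    }


# Hypothetical predictions for four compounds from two models.
summary = compare_predictions([11.0, 11.5, 12.0, 12.5],
                              [11.2, 11.6, 12.7, 12.6])
```

Good agreement between two models does not by itself indicate reliable values, since both may be extrapolating for the same compounds.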
Larger prediction differences were observed for (benzo)tria-
zoles between our models and EPI Suite (Supporting Informa-
tion Table S-IV).
It is interesting to note that the majority of chemicals within
the applicability domain were predicted by our model as more
reactive than by EPI Suite. On the contrary,
most of the compounds outside the applicability domain were
predicted by our model as less reactive than by
EPI Suite.
The information on the AD for completely new chemicals is an
advantage of our approach in comparison to EPI
Suite: predicted values can always be obtained by QSAR models,
but the crucial information on whether they are interpolated or
extrapolated is also needed. It is also important to note that PBDEs
and BTAZs are structurally quite different from the volatile
organic compounds present in the training sets of our models
and EPI Suite.
Our models were not considered to be reliably applicable to
perfluorinated chemicals (PFCs), another CADASTER class, as
the fluorinated compounds present in the original data set were
always structurally influential. Our models tended to predict
PFCs, all of which were outside the model AD, as much
less reactive than EPI Suite did. We found large discrepancies
between our predicted values and those obtained by EPI
Suite, with differences greater than 1 log unit for 84% of the
checked compounds.
Figure 4. (a) Plot of predicted values vs. hat values for PBDEs; (b) plot of predicted values vs. hat
values for BTAZs.
Comparison with Published Models
The statistical qualities of the different published models and
of the current models are listed in Table 2. Comparative comments
can be made, although a perfect comparison of the published models
is not possible, as different data sets and different
algorithms were used for model building and validation. It is
interesting to note that the descriptors selected in different
models, mainly in those obtained from training sets similar in
size and type, have comparable structural and mechanistic
meaning. The statistical quality of all these models is also
similar and satisfactory.
Bakken and Jurs30 used their nonlinear artificial neural
network (CNN) model to provide accurate predictions over a wide
range of functionalities. Neural networks are a more complex
method but generally give better statistics. The peculiarity of
the Oberg model32 is its application to the screening of a large
set of chemicals with half-lives ranging from days to years,
considering also the percentage of compounds inside or outside the
applicability domain of the model. Recently Wang et al.33 developed
statistically validated models for the rate constants of degradation
by OH of phenols, alkenes and alcohols, with the applicability
domain limited to the chemical domain of the model. These
authors also developed global PLS models34 with an extended
applicability domain, but made no comment on influential
chemicals with high leverage values.
Our models, developed on a large data set like those of Oberg32
and Wang et al.,33 are in full accordance with three central
OECD Principles: (i) Principle 2: simple and now easily
reproducible unambiguous algorithms [eqs. (i) and (ii)], obtained by
the simplest MLR method based on only 4 easily interpretable
molecular descriptors; (ii) Principle 3: the possibility of verifying
the AD, not only for the split training and prediction sets, but also
for new chemicals without experimental data; (iii) Principle 4:
rigorous external validation by different splittings and the
application of different statistical parameters.
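The external validation parameters referred to under Principle 4 can be computed as in the sketch below. Several definitions of the external Q2 exist; the form used here, which compares the prediction errors with the deviation of the external responses from the training set mean, is an assumption and may differ from the parameters reported in Table 1. The numerical values are invented.

```python
from math import sqrt


def external_validation(y_train, y_ext, y_ext_pred):
    """Return (Q2_ext, RMSE_ext) for an external prediction set.

    Q2_ext is computed as 1 - PRESS / SS, where SS is the squared
    deviation of the external responses from the training-set mean.
    Other definitions of Q2_ext exist and give different values.
    """
    mean_train = sum(y_train) / len(y_train)
    press = sum((y - p) ** 2 for y, p in zip(y_ext, y_ext_pred))
    ss = sum((y - mean_train) ** 2 for y in y_ext)
    q2_ext = 1.0 - press / ss
    rmse_ext = sqrt(press / len(y_ext))
    return q2_ext, rmse_ext


# Hypothetical responses: four training compounds, two external ones.
q2_ext, rmse_ext = external_validation(
    y_train=[10.0, 11.0, 12.0, 13.0],
    y_ext=[10.5, 12.5],
    y_ext_pred=[10.0, 12.0],
)
```

Repeating this calculation for each of the splittings gives an indication of how much the external performance depends on the composition of the prediction set.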
Conclusions
The need for regular checking and updating of published QSAR
models is again demonstrated, if these models are to be useful
for practical applications and not just for scientific purposes.
Indeed, QSAR models must be reproducible and must be
practically applicable to new chemicals that have no experimental
data, in this case the CADASTER classes.
The newly developed models, both from the more recent
DRAGON versions and from the online descriptors plus HOMO
energy, were found to be statistically valid both internally and,
especially, externally, considering the different compositions of
the external prediction sets obtained by applying different splitting
methods to leave some chemicals (those of the prediction
sets) out of the model development procedure. The present work
also confirmed the ability of Genetic Algorithms to extract, and
not by chance, important information related to the studied
response from different pools of input descriptors. The relevant
information included in the selected descriptors has an interpretable
mechanistic meaning.
Furthermore, our study placed special emphasis on the
applicability domain of the models, identifying not only
response outliers or structurally influential chemicals in the
original set, but also verifying which of the CADASTER chemicals,
for which no experimental reactivity values are available, are
inside or outside the AD of our models. We compared our
predictions with those of the widely used software EPI Suite, and
found some (PBDEs) to be in good agreement, whereas others
(BTAZs) had limited comparability. One of the advantages of
our model is that a chemical's position inside or outside the
model AD is known, which is not the case for the EPI Suite
software. Such AD information is highly important to
users of QSAR predictions, as it facilitates their decision-making.

Table 2. Comparison of the Present Models with Previously Published QSARs.

| Reference | Modeling technique | No. of descriptors/PLS components | No. of compounds | Descriptors | Q2 LOO | RMSEtr | Q2 EXT | RMSEExt |
|---|---|---|---|---|---|---|---|---|
| 30 | CNN | 5 | 52/5a | Topological | | 0.071 | | 0.064 |
| | CNN | 10 | 281/31a | Topological, electronic, | | 0.230 | | 0.250 |
| 10 | MLR | 4 | 234/226a | HOMO, nX, nCaH, CIC0 | 0.816 | 0.422 | 0.813 | 0.436 |
| 31 | MLR | 6 | 460 | HOMO, MATS1m, nDB, nO, CIC2, RTe+ | 0.841 | 0.407 | | |
| 32 | PLS | 333/7 | 495/238a | – | 0.875 | 0.449 | 0.840 | 0.501 |
| 33 | MLR | 4 | 44/11a | HOMO, QH, MSA and μ | 0.806 | 0.139 | 0.922 | 0.079 |
| 34 | PLS | 22/3 | 576/146a | (Ds, HOMO, nX, BELm2)b | 0.865 | 0.391 | 0.872 | 0.430 |
| This study (external validation in Table 1) | MLR | 4 | 460 | HOMO, nX, nCbH, IDE | 0.819 | 0.430 | | |
| This study (external validation in Table 1) | MLR | 4 | 460 | HOMO, SeaC2C2aa, D_path(F, rel), G_([Cl, Br, I]) | 0.801 | 0.450 | | |

a Number of external set compounds. b Influential descriptors in PLS latent variables.
Acknowledgments
We wish to thank Ester Papa, Nicola Chirico and Stefano Cassani
for their support to P.P. Roy. We thank the University of Insubria
for providing a post-doc fellowship to Dr. P.P. Roy.
References
1. Zefirov, N. S.; Palyulin, V. A. J Chem Inf Comput Sci 2001, 41,
1022.
2. http://ec.europa.eu/environment/chemicals/reach/reach_intro.htm (accessed
27 January 2011).
3. http://www.oecd.org/dataoecd/33/37/37849783.pdf (accessed
27 January 2011).
4. http://www.oecd.org/officialdocuments/displaydocumentpdf (accessed
27 January 2011).
5. Netzeva, T. I.; Worth, A. P.; Aldenberg, T.; Benigni, R.; Cronin, M.
T. D.; Gramatica, P.; Jaworska, J. S.; Kahn, S.; Klopman, G.; March-
ant, C. A.; Myatt, G.; Nikolova-Jeliazkova, N.; Patlewicz, G. Y.; Per-
kins, R.; Roberts, D. W.; Schultz, T. W.; Stanton, D. T.; van de
Sandt, J. J. M.; Tong, W.; Veith, G.; Yang, C. ATLA 2005, 33, 155.
6. Tropsha, A.; Gramatica, P.; Gombar, V. K. QSAR Comb Sci 2003,
22, 69.
7. Gramatica, P. QSAR Comb Sci 2007, 26, 694.
8. Tropsha, A. Mol Inf 2010, 29, 476.
9. http://www.cadaster.eu (accessed 27 January 2011).
10. Gramatica, P.; Pilutti, P.; Papa, E. J Chem Inf Comput Sci 2004, 44,
1794.
11. DRAGON for Windows, ver.5.5, 2007, Talete srl, Milano, Italy.
12. Gramatica, P.; Papa, E. QSAR Comb Sci 2005, 24, 953.
13. www.cadaster.eu/database (accessed 27 January 2011).
14. Shi, L. M.; Fang, H.; Tong, W.; Wu, J.; Perkins, R.; Blair, R. M.;
Branham, W. S.; Dial, S. L.; Moland, C. L.; Sheehan, D. M. J Chem
Inf Comput Sci 2001, 41, 186.
15. Schuurmann, G.; Ebert, R. U.; Chen, J.; Wang, B.; Kuhne, R.
J Chem Inf Model 2008, 48, 2140.
16. Consonni, V.; Ballabio, D.; Todeschini, R. J Chem Inf Model 2009,
49, 1669.
17. Roy, P. P.; Roy, K. QSAR Comb Sci 2008, 27, 302.
18. Chirico, N.; Papa, E.; Gramatica, P. Presented at the 21st SETAC
Europe Meeting, May 2011, Milan, Italy.
19. Atkinson, R. J Phys Chem Ref Data 1989, Monograph 1, 1.
20. HyperChem, Rel. 7.03 for Windows, 2002. Hypercube, Inc.,
Gainesville, Florida, USA.
21. Roy, K.; Ghosh, G. Int Electron J Mol Des 2003, 2, 599.
22. Papa, E.; Kovarich, S.; Gramatica, P. QSAR Comb Sci 2009, 28, 790.
23. MOBYDIGS Professional for Windows Ver. 1.0 beta, 2004. Talete
srl, Milano, Italy.
24. Leardi, R.; Boggia, R.; Terrile, M. J Chemom 1992, 6, 267.
25. Mitra, I.; Saha, A.; Roy, K. Mol Simul 2010, 36, 1067.
26. Todeschini, R.; Consonni, V.; Maiocchi, A. Chemom Int Lab Syst
1999, 46, 13.
27. Gasteiger, J.; Zupan, J. Angew Chem Int Ed Engl 1993, 32, 503.
28. Leonard, J. T.; Roy, K. QSAR Comb Sci 2006, 25, 235.
29. Atkinson, A. C. Plots, Transformations and Regression; Clarendon
Press: Oxford, 1985.
30. Bakken, G.; Jurs, P. J Chem Inf Comput Sci 1999, 39, 1064.
31. Gramatica, P.; Pilutti, P.; Papa, E. Atmos Environ 2004, 38, 6167.
32. Oberg, T. Atmos Environ 2005, 39, 2189.
33. Wang, Y.; Chen, J.; Li, X.; Zhang, S.; Qiao, X. QSAR Comb Sci
2009, 28, 1309.
34. Wang, Y.; Chen, J.; Li, X.; Wang, B.; Cai, X.; Huang, L. Atmos En-
viron 2009, 43, 1131.
35. EPI Suite. http://www.epa.gov/oppt/exposure/pubs/EPI Suite.htm
(accessed 27 January 2011).
36. Bhhatarai, B.; Gramatica, P. Chem Res Toxicol 2010, 23, 528.
37. Hall, L. H.; Kier, L. B. J Chem Inf Comput Sci 2000, 40, 784.
38. http://ambit.sourceforge.net/intro.html (accessed 27 January 2011).