RESEARCH ARTICLE

Examining predictors of chemical toxicity in freshwater fish using the random forest technique

Baigal-Amar Tuulaikhuu 1,2 & Helena Guasch 1 & Emili García-Berthou 1

Received: 18 July 2016 / Accepted: 20 February 2017 / Published online: 3 March 2017
© Springer-Verlag Berlin Heidelberg 2017

Abstract Chemical pollution is one of the main issues globally threatening the enormous biodiversity of freshwater ecosystems. The toxicity of substances depends on many factors such as the chemical itself, the species affected, environmental conditions, exposure duration, and concentration. We used the random forest technique to examine the factors that mediate toxicity in a set of widespread fishes and analyses of covariance to further assess the importance of differential sensitivity among fish species. Among 13 variables, the 5 most important predictors of toxicity with random forests were, by order of importance, the chemical substance itself (i.e., Chemical Abstracts Service number considered as a categorical factor), octanol-water partition coefficient (log P), pollutant prioritization, ecological structure-activity relationship (ECOSAR) classification, and fish species for 50% lethal concentrations (LC50) and the chemical substance, fish species, log P, ECOSAR classification, and water temperature for no observed effect concentrations (NOECs). Fish species was a very important predictor for both endpoints and with the two contrasting statistical techniques used. Different fish species displayed very different relationships with log P, often with different slopes and with as much importance as the partition coefficient. Therefore, caution should be exercised when extrapolating toxicological results or relationships among species. In addition, further research is needed to determine species-specific sensitivities and unravel the mechanisms behind them.

Keywords Ecotoxicology . Octanol-water partition coefficient . Species-specific sensitivity

Introduction

Chemical pollution is one of the main issues globally threatening the enormous biodiversity of freshwater ecosystems (Dudgeon et al. 2006). There are a large number of different chemicals (over 100,000; Hansen et al. 1999), differing in their toxicity in the aquatic environment. The toxicity of substances depends on many factors such as the characteristics of the chemical substance itself (e.g., its mode of action; Vaal et al. 1997); the species affected and its life stage (Woltering 1984); the exposure duration and concentration; and the environmental conditions such as temperature (Li et al. 2014), pH (Thurston and Russo 1981), salinity (Grosell et al. 2007), hardness, or alkalinity (Riethmuller et al. 2001). Thousands of toxicological assays have been performed to determine the effects of different chemical compounds on a multitude of organisms and conditions, and syntheses and quantitative comparisons are needed to provide further understanding. The availability of databases with extensive data provides increasing opportunities to statistically evaluate the most important predictors of chemical toxicity and their interactions (Tebby et al. 2011). As shown by previous ecotoxicological studies, contrasting sensitivity to different types of chemicals has been observed at different trophic levels, such as algae, crustaceans, and fish (Henegar et al. 2011; Riethmuller et al. 2001; Tebby et al. 2011), and also among different fish species (Vittozzi and De Angelis 1991).

Responsible editor: Henner Hollert

Electronic supplementary material The online version of this article (doi:10.1007/s11356-017-8667-4) contains supplementary material, which is available to authorized users.

* Emili García-Berthou [email protected]

1 GRECO, Institute of Aquatic Ecology, University of Girona, 17003 Girona, Spain

2 Department of Ecology, School of Agroecology, Mongolian University of Life Sciences, Ulaanbaatar, Mongolia

Environ Sci Pollut Res (2017) 24:10172–10181
DOI 10.1007/s11356-017-8667-4

Quantitative structure-activity relationships (QSARs) are often used to predict toxicity from physicochemical properties (e.g., molecular descriptors) for chemicals not well known or to reduce the number of assays (Netzeva et al. 2008). Among many molecular descriptors, the most widely used include the molecular weight and the octanol-water partition coefficient (log P) (Levet et al. 2013). Log P is a measure of hydrophobicity, which mediates several processes, including sorption and accumulation (Katritzky et al. 2010; Meylan et al. 1999), and is also a key parameter for the environmental fate and effects of toxicants (Lifongo and Nfon 2009). Identifying the potential mechanisms or modes of action and closest structural similarity of new chemicals is an initial step for predicting their toxicity with QSARs. For acute aquatic toxicity effects, several endpoint-specific profilers, including ecological structure-activity relationship (ECOSAR) (Mayo-Bean et al. 2012) classification, Optimized Approach based on Structural Indices Set (OASIS) acute toxicity mode of action, Verhaar classification, and Cramer classification have been suggested (OECD 2009). In organic chemistry, functional groups are sets of specific atoms (e.g., aldehydes, ethers, ketones, or phenols) that occur in a wide range of compounds and confer upon them a common kind of reactivity (Vollhardt and Schore 2011) and are thus helpful in predicting their toxicity (Mayo-Bean et al. 2012). In addition, regulations often distinguish between "priority pollutants," i.e., human-produced pollutants, frequent in streams, with published analytical test methods and better-known toxicity, and "emerging pollutants," i.e., substances that have been discovered in the environment more recently (often because of improved analytical chemistry detection levels) and potentially cause deleterious effects in aquatic life at environmentally relevant concentrations (USEPA 2014). Comparing priority and emerging contaminants may help to understand the environmental significance of the latter.

Although physicochemical properties of substances are considered good predictors of the toxicity of industrial chemicals, ecological effects of a specific chemical may be under- or over-estimated and differences among species are often neglected. Fish are excellent ecological indicators for a number of reasons (e.g., greater life span and high trophic level, which often imply integration of perturbations at larger spatial and temporal scales), and differences in tolerance among species are widely used in biomonitoring and ecosystem health assessment (Maceda-Veiga and De Sostoa 2011; Oberdorff et al. 2002). For example, Fedorenkova et al. (2013) reported a three to four orders of magnitude difference in 50% lethal concentrations (LC50) among fish species in the River Rhine and its tributaries, based on data from the ECOTOX database. Differences in species sensitivity can be very substantial, and the number of species for which some toxicological information is available represents only a small fraction of the total number of species existing (Yang and Randall 1997). For this reason, toxicity is often inferred from model species or from taxonomically related species, implying high uncertainty in predictions.

Random forests (RFs) are a recent machine-learning technique increasingly used in many scientific areas due to their high accuracy and ability to characterize complex interactions among predictors (Breiman 2001; Strobl et al. 2008). RFs are an extension of classification or regression trees, which in turn are a form of binary recursive partitioning that builds a decision tree by repeatedly partitioning the data set into a nested series of mutually exclusive groups, selecting the best candidate split at each step and then selecting the optimal tree. RFs fit many regression trees to a data set and then combine the predictions from all the trees (Breiman 2001). RFs have many advantages over more conventional statistical techniques: they run efficiently on large databases with many correlated predictors, give estimates of which variables are more important, have an effective method for estimating missing data while maintaining accuracy, and handle non-linearities and interactions particularly well (Prasad et al. 2006; Cutler et al. 2007).
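To make the idea of "selecting the best candidate split at each step" concrete, the following is a minimal Python sketch of a single split in a regression tree, assuming the usual criterion of minimizing the within-group sum of squares. It is an illustration only, not the implementation used in this study; the function names and the toy log P and log10(LC50) values are hypothetical.

```python
from statistics import fmean

def sse(values):
    """Sum of squared deviations from the mean (node impurity for regression)."""
    if not values:
        return 0.0
    m = fmean(values)
    return sum((v - m) ** 2 for v in values)

def best_split(x, y):
    """Return (threshold, impurity_decrease) of the best binary split on x."""
    parent = sse(y)
    best = (None, 0.0)
    xs = sorted(set(x))
    # Candidate thresholds: midpoints between consecutive distinct x values.
    for lo, hi in zip(xs, xs[1:]):
        t = (lo + hi) / 2
        left = [yi for xi, yi in zip(x, y) if xi <= t]
        right = [yi for xi, yi in zip(x, y) if xi > t]
        gain = parent - sse(left) - sse(right)
        if gain > best[1]:
            best = (t, gain)
    return best

# Hypothetical toy data: log P versus log10(LC50).
log_p = [0.5, 1.0, 1.5, 4.0, 4.5, 5.0]
log_lc50 = [4.0, 4.2, 3.8, 1.0, 1.2, 0.8]
threshold, gain = best_split(log_p, log_lc50)
```

A full tree applies the same search recursively within each resulting group, and a random forest repeats this over many bootstrap samples while considering only a random subset of predictors at each split.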

In this paper, we aimed to (i) rank the importance of predictors of aquatic toxicity of the most frequently tested chemicals in a set of widespread fishes using RF and (ii) quantify the differences in sensitivity among these species and test whether the relationship with log P varies among them. We hypothesized that (i) species might rank among the most important predictors for modeling toxicity and (ii) log P is well related to toxicity but this relationship varies markedly among species. Our aim was not to develop another predictive model of toxicity but to compare the usefulness of several previous toxicological tools and other predictors in a large set of fish and substances. Thus, this study exemplifies how random forests can contribute to understanding the effects of chemicals on the environment.

Methods

Data compilation

We compiled ecotoxicological data mainly from the ECOTOX database (accessed through https://cfpub.epa.gov/ecotox/ in March 2014). The ECOTOX database was created and is maintained by the United States Environmental Protection Agency and provides single chemical toxicity information for aquatic and terrestrial organisms (USEPA 2015). In ECOTOX, we searched for toxicological information of all fish species native or naturalized (i.e., alien established) to Spain. The list of fish species in Spain was based on the last published atlas (Doadrio 2002). Initially, 73,734 cases describing every single result of toxicity tests for 37 fish species were found. Among them, we chose for analyses only trials done in freshwater and with endpoints of LC50 of test animals or no observed effect concentrations (NOECs), since those two endpoints had many more cases than other endpoints such as EC50 and LOEC. Although many toxicologists consider that NOEC and related endpoints have many serious drawbacks and should not be used (van der Hoeven 1997; Crane and Newman 2000; Laskowski et al. 2010), we considered NOEC because of its extensive ongoing use (Lewis et al. 1994) and to test whether it provided similar conclusions to LC50.

In the downloaded ECOTOX data, we ignored subspecies names and corrected some obvious taxonomic misspellings; we converted all concentration units into μg/L of toxic substance in water; and we compared a few unrealistic concentrations with the original references and either corrected them or excluded them from the analyses. This led to a database with 34,118 cases of 87 substances and 25 freshwater fish species (Tables S1 and S2).
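The unit harmonization step can be sketched as a simple lookup of conversion factors. This is a hedged Python illustration rather than the procedure actually applied: the unit labels and the function name are hypothetical and would need to match the labels present in the downloaded data, and molar units (which require the molecular weight) are deliberately left out.

```python
# Conversion factors to micrograms per litre. The unit labels are hypothetical
# and would have to match those actually present in the downloaded data.
TO_UG_PER_L = {"ng/L": 1e-3, "ug/L": 1.0, "mg/L": 1e3, "g/L": 1e6}

def to_ug_per_l(value, unit):
    """Convert a mass concentration to ug/L; fail loudly on unknown units."""
    try:
        return value * TO_UG_PER_L[unit]
    except KeyError:
        raise ValueError(f"unrecognized unit: {unit!r}") from None

# Example: 2.5 mg/L expressed in ug/L.
converted = to_ug_per_l(2.5, "mg/L")
```

Raising an error on unknown units, instead of silently passing values through, makes unusual entries visible so that they can be checked against the original references.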

Afterwards, we assigned substances to the following classifications, using the QSAR 3.0 toolbox (Bhatia et al. 2015):

i) Acute aquatic toxicity classification by Verhaar et al. (1992), which separates them into inert chemicals, less inert chemicals, reactive chemicals, specifically acting chemicals, and "not possible to classify";

ii) Acute aquatic toxicity mode of action (MOA) by OASIS (Russom et al. 1997), which divides them into aldehydes, phenols and anilines, esters, alpha,beta-unsaturated alcohols, base surface narcotics, narcotic amines, and "reactive unspecified";

iii) Aquatic toxicity classification by ECOSAR (Mayo-Bean et al. 2012), which identifies chemicals in 118 classes. When the program returned multiple classes for the input substance, we chose the one that exhibited the greatest toxicity. We pooled various esters (e.g., mono- or di-thiophosphate esters) into the single class "esters." Therefore, the substances that we used for analyses finally corresponded to 16 classes, which were acid moieties, aliphatic amines, amides, aromatic triazines, carbamate esters, esters, inorganic compounds, neutral organics, phenols, polynitrobenzenes, pyrethroids, substituted ureas, thiocarbamate, vinyl/allyl ethers, vinyl/allyl halides, and substances that "should not be profiled." Substances that contain a metal atom are classified by ECOSAR in the group named "should not be profiled" because sufficient toxicological knowledge is not available regarding this type of compound;

iv) Toxic hazard classification by Cramer (Cramer et al. 1978), which places substances in three classes (classes I, II, and III). Class I substances are those with structures and related data suggesting a low order of toxicity; if combined with low exposure, they should enjoy a very low priority for investigation. Class III substances are those that permit no strong initial presumptions of safety, or that may even suggest significant toxicity and thus deserve the highest priority for investigation, whereas class II substances are intermediate. The classification was developed by expert judgment and consists of a decision tree based on 33 questions, mostly about features of chemical structure (Cramer et al. 1978).

We also classified compounds on the basis of their principal functional groups and substitutive chemical nomenclature (Leigh et al. 1998) into acids, alcohols, aldehydes, alkanes, alkenes, amines and amides, aromatics, azides, esters, ethers, halogens, inorganic compounds, ketones, metallic or organometallic compounds, nitriles, and organosulfur compounds. In substitutive nomenclature, the suffix of a compound is given by the principal functional group (Leigh et al. 1998). We also classified toxicants into three groups: priority pollutants, contaminants of emerging concern (hereafter, "emerging contaminants"), and non-classified, based on the lists by the USEPA (2014). Octanol-water partition coefficients (log P), estimated by the atomic contribution method (Ghose et al. 1998), were obtained from the European Inventory of Existing Commercial Chemical Substances (European Union Reference Laboratory for alternatives to animal testing (EURL-ECVAM) 2015). The effect types (e.g., mortality, accumulation, and behavior) were directly obtained from the ECOTOX database. In the database, NOEC was related to 20 different effect types; most of them were biochemical, cellular, and growth responses, and a few of them were behavioral or physiological measurements. Water temperature was converted to degrees Celsius and hardness units to ppm CaCO3.

Data analyses

We used random forests (Breiman 2001), as implemented in the package "randomForest" (Liaw and Wiener 2002) of the R software (R Development Core Team 2015), to analyze the factors that best predict toxicity (LC50, NOEC) in the compiled dataset. We used as categorical predictors the Chemical Abstracts Service (CAS) registry number, pollutant prioritization, ECOSAR classification, functional group, Cramer classification, Verhaar classification, and acute aquatic toxicity MOA by OASIS as toxicant classifications and fish family, fish species, and effect type as features of the test organism and their response. As numerical predictors, we used the octanol-water partition coefficient (log P) as a key physicochemical parameter of chemicals and water temperature and hardness as environmental conditions of the toxicological test. We excluded many other predictors because they had many missing values, which would reduce the sample size, and because in preliminary RF analyses, they were not among the 15 most important predictors. Since randomForest cannot handle categorical predictors with more than 53 categories and to improve robustness, we selected substances that had more than 49 assays for LC50 and more than 7 assays for NOEC, where an assay is an independent datum or row in the ECOTOX database (e.g., test of a substance at a different concentration, water temperature, or of a different fish species). On the other hand, to avoid excluding many fish species, we included fish species with more than three assays. After this selection, the database consisted of 7892 cases with 52 substances with >49 assays for LC50 and 1987 cases with 51 substances with >7 assays for NOEC. We ran analyses with and without CAS number to test the ability of commonly used predictors without the knowledge of the chemical substance itself and with a larger dataset for comparative purposes. Note that for categorical predictors such as CAS number, the numbers themselves are not used in the analysis; only the affiliation to a certain group is used to assess the relevance as a predictor.
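The selection rule described above (substances with more than 49 assays for LC50, fish species with more than three assays) can be sketched as follows. This is a minimal Python illustration under stated assumptions: the record keys "cas" and "species" are hypothetical, and applying the substance filter before the species filter is an assumption, since the order is not specified in the text.

```python
from collections import Counter

def select_assays(rows, min_substance=49, min_species=3):
    """Keep rows whose substance and species both exceed the assay thresholds.

    rows: list of dicts with hypothetical keys "cas" and "species".
    Thresholds are exclusive ("more than"), as in the text.
    """
    per_substance = Counter(r["cas"] for r in rows)
    kept = [r for r in rows if per_substance[r["cas"]] > min_substance]
    # Assumption: species counts are taken after the substance filter.
    per_species = Counter(r["species"] for r in kept)
    return [r for r in kept if per_species[r["species"]] > min_species]

# Hypothetical toy data: substance A has 52 assays, substance B only 10.
rows = (
    [{"cas": "A", "species": "trout"}] * 50
    + [{"cas": "A", "species": "pike"}] * 2
    + [{"cas": "B", "species": "carp"}] * 10
)
selected = select_assays(rows)
```

For the NOEC data the same function would be called with `min_substance=7`.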

We used 500 trees to build the RF because increasing this default number did not substantially change the results of variable importance or explained variation (Liaw and Wiener 2002). As the number of variables randomly sampled as candidates at each split, we used the recommended default of the square root of the number of predictors (Liaw and Wiener 2002). Note that the out-of-bag estimate of variance used in RF is as accurate as using a test set of the same size as the training set and thus removes the need for a set-aside test set in standard applications (Breiman 2001; Prasad et al. 2006).
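Two quantities in this paragraph are easy to illustrate numerically: the number of candidate variables per split and the size of the out-of-bag sample. The Python sketch below is an illustration only; rounding the square root down is an assumption, and the function names are hypothetical. It draws one bootstrap sample of the 7892 LC50 cases and shows that roughly 37% of cases (about exp(-1)) are left out of the bag and are therefore available for error estimation.

```python
import math
import random

def mtry_sqrt(n_predictors):
    """Candidate variables per split: square root of the number of predictors.

    Rounding down is an assumption made for this sketch.
    """
    return max(1, math.floor(math.sqrt(n_predictors)))

def bootstrap_indices(n, rng):
    """Draw n case indices with replacement; return (in_bag, out_of_bag)."""
    in_bag = [rng.randrange(n) for _ in range(n)]
    out_of_bag = sorted(set(range(n)) - set(in_bag))
    return in_bag, out_of_bag

rng = random.Random(0)      # fixed seed so the sketch is reproducible
n_cases = 7892              # number of LC50 cases reported in the text
in_bag, oob = bootstrap_indices(n_cases, rng)
oob_fraction = len(oob) / n_cases   # close to exp(-1) for large n
candidates = mtry_sqrt(13)          # 13 predictors, as in the abstract
```

Because each tree is evaluated on cases it never saw during fitting, averaging these out-of-bag errors over all trees gives the error estimate referred to above.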

RFs provide a measure of percentage of variance explained or pseudo-R² for the model obtained and a measure of variable importance, i.e., the importance of the predictor variables, which can be obtained as the total decrease in node impurities from splitting on the variable, averaged over all trees (Liaw and Wiener 2002). RF partial dependence plots (Friedman 2001) were also obtained for the most important predictors. These plots give a graphical depiction of the marginal effect of a predictor on the response variable, i.e., the dependence of the toxicity response on a specific predictor after partialling out the effects of the other predictors in the model.
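Both quantities can be written in a few lines. The Python sketch below assumes the common definition of the percentage of variance explained, 100 × (1 − MSE / Var(y)), and computes partial dependence by forcing one predictor to each grid value and averaging the predictions over all cases. The model `toy_model`, the keys `log_p` and `temp_c`, and the data are hypothetical stand-ins for a fitted forest, not results from this study.

```python
from statistics import fmean, pvariance

def pseudo_r2(observed, predicted):
    """Percentage of variance explained: 100 * (1 - MSE / Var(y))."""
    mse = fmean([(o - p) ** 2 for o, p in zip(observed, predicted)])
    return 100.0 * (1.0 - mse / pvariance(observed))

def partial_dependence(predict, rows, feature, grid):
    """Average prediction over all rows with `feature` forced to each grid value."""
    curve = []
    for value in grid:
        modified = [{**row, feature: value} for row in rows]
        curve.append(fmean([predict(r) for r in modified]))
    return curve

def toy_model(row):
    """Hypothetical stand-in for a fitted model predicting log10(LC50)."""
    return 5.0 - 0.8 * row["log_p"] + 0.1 * row["temp_c"]

rows = [{"log_p": 1.0, "temp_c": 10.0}, {"log_p": 3.0, "temp_c": 20.0}]
pd_curve = partial_dependence(toy_model, rows, "log_p", [0.0, 5.0])
```

Averaging over the observed values of the other predictors is what "partialling out" means here: the curve shows how the prediction changes with log P while the remaining predictors keep their empirical distribution.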

Analysis of covariance (ANCOVA) was also used to further compare the effect concentrations among fish species and chemical substances, using log P as a covariate. An ANCOVA design with the factor × covariate interaction allows us to test the assumption of homogeneity of regression coefficients of the standard ANCOVA design, i.e., to compare slopes (García-Berthou and Moreno-Amich 1993). Quantitative variables were log10-transformed for ANCOVA because residual plots suggested that the assumptions (normality, homoscedasticity, and linearity) were thus satisfied.
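The factor × covariate interaction asks whether the regression slope on log P differs among species. A simple way to see what is being compared is to fit a separate least-squares line per species, as in the Python sketch below. It only illustrates the slopes themselves and does not perform the ANCOVA significance test; the species labels and data values are hypothetical.

```python
from collections import defaultdict
from statistics import fmean

def ols_line(x, y):
    """Least-squares slope and intercept of y on x (x must not be constant)."""
    mx, my = fmean(x), fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx

def slopes_by_species(records):
    """records: iterable of (species, log_p, log10_lc50) tuples."""
    groups = defaultdict(lambda: ([], []))
    for species, log_p, log_lc50 in records:
        groups[species][0].append(log_p)
        groups[species][1].append(log_lc50)
    return {sp: ols_line(x, y) for sp, (x, y) in groups.items()}

# Hypothetical toy data: species A becomes more sensitive as log P increases,
# whereas species B shows no relationship with log P.
records = [
    ("A", 0.0, 4.0), ("A", 2.0, 2.0), ("A", 4.0, 0.0),
    ("B", 0.0, 3.0), ("B", 2.0, 3.0), ("B", 4.0, 3.0),
]
fits = slopes_by_species(records)
```

In the ANCOVA itself, heterogeneity of such slopes is assessed through the log P × fish species term, fitted to the log10-transformed data.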

Results

Random forests

RF explained most of the variation in toxicity (89.3% for LC50 and 94.6% for NOEC). The five most important variables to predict LC50 with RF were the chemical substance (i.e., CAS number), log P, pollutant prioritization, ECOSAR classification, and fish species (Fig. 1a). The results for NOEC were similar, but water temperature was among the first five predictors, replacing pollutant prioritization, which was the third most important for LC50. The order of the most important variables was moderately different for the two endpoints; although four of the predictors were among the top five for both statistics and CAS number was always the most important, log P was the second for LC50 and third for NOEC, whereas fish species was ranked as the fifth and second most important predictor for LC50 and NOEC, respectively (Fig. 1a, b). Priority pollutants were the most toxic, followed by emerging pollutants and substances "not classified," the differences being also less apparent for NOEC (Fig. 2). For both endpoints, pyrethroids and vinyl/allyl halides were the most toxic compounds, metal-containing substances were also among the most toxic, whereas aliphatic amines were the least toxic (Fig. S3).

We also ran random forests without CAS number as a predictor, for the same dataset as above (51 substances, 7843 cases for LC50 and 1987 cases for NOEC) and for a larger dataset (320 substances, 10,788 and 2174 cases for LC50 and NOEC, respectively). For both LC50 and NOEC, the five most important predictors and their order were very similar in the two analyses that excluded CAS number (Fig. 1). However, the percentage of explained variation ignoring the chemical substance (CAS number) was slightly lower (89.3 vs. 87.8% or lower for LC50 and 94.6 vs. 93.3% or lower for NOEC), and CAS number explained most of the variation when used (Fig. 1), suggesting that predictive models perform relatively well but do not capture all toxicological information of substances. Therefore, we used the models with CAS number to further analyze the effects of predictor variables.

Partial dependence on octanol-water partition coefficient showed that toxicity increased (LC50 values decreased) with log P, and rather sharply at values of log P > 3 (Fig. 3). For NOEC, the relationship was weaker and non-monotonic, with toxicity first decreasing and then increasing.

Fish species was the second most important predictor for NOEC and the fifth one for LC50 (Fig. 1). The two endpoints agreed well in the order of species-specific sensitivities, with rainbow trout (Oncorhynchus mykiss), northern pike (Esox lucius), coho salmon (Oncorhynchus kisutch), brown trout (Salmo trutta), and Atlantic salmon (Salmo salar) being less tolerant and goldfish (Carassius auratus), guppy (Poecilia reticulata), bream (Abramis brama), common carp (Cyprinus carpio), and European perch (Perca fluviatilis) being more resistant (Fig. 4). Although the five salmonids are among the eight most sensitive species according to both endpoints (Fig. 4), fish family is not as good a predictor as species (Fig. 1), because of the contrasting tolerance among species within families. In particular, within cyprinids, tench showed high sensitivity, whereas goldfish and common carp were among the most tolerant.

The mean values predicted by RF (partial dependence plots) were significantly correlated with the observed means for chemical substances and fish species, and the order of predicted and observed values was very similar (Figs. S4–S7). However, the range of predicted values was much reduced compared to observed values, because the former adjust for other predictors (i.e., predicted values are the expected values after accounting for all other predictors); this fact suggests that substances with quite different toxicity have been tested with different fish species and that many factors contribute to the variability in toxicological databases.

Fig. 1 Variable importance of predictors of LC50 (left) and NOEC (right) values according to the random forest technique. a, b Included CAS number as a categorical predictor. c, d Corresponded to the same data but not using CAS number as a predictor. e, f Did not include CAS number as predictor and for a larger data set. Percentages of explained variation were 89.3, 87.8, and 86.4 for LC50 and 94.6, 93.3, and 91.7% for NOEC

Fig. 2 Partial dependence of LC50 and NOEC on pollutant prioritization based on the random forest prediction

Analyses of covariance

ANCOVAs (Table 1) were quite in agreement with RF, as they showed that (i) chemical substance and secondarily fish species and log P explained most of the variation (85–91%) of both endpoints; (ii) explained variation was slightly lower for LC50; (iii) for NOEC, fish species is a more important predictor than log P; and (iv) for a given log P, widely tested species such as rainbow trout or brown trout generally displayed low values (more sensitivity), whereas some cyprinids such as goldfish and common carp showed much higher values (Table S3). However, ANCOVAs also showed that (i) the slopes are significantly heterogeneous (see log P × fish species in Table 1) and vary markedly among species (Figs. 5 and 6; see also Table S3); (ii) the relationships with log P are more heterogeneous for NOEC, where some slopes are flat or even markedly positive and fish species explains a much higher proportion of the variation (see SS in Table 1); and (iii) although generally significant, the species-specific relationships with log P that ignore the chemical substance are weak (Figs. 5 and 6), with explained variation generally less than 20% and hence low predictive power (Table S3).

Discussion

RF showed that the chemical substance itself, octanol-water partition coefficient (log P), pollutant prioritization, ECOSAR classification, and fish species for LC50 and also water temperature for NOEC were the best predictors of toxicity. Chemical substance (CAS number as a categorical factor) was the most important predictor, and in agreement with other studies, no other correlates are as good predictors as the chemical itself (Vaal et al. 1997, 2000). However, RF without CAS numbers explained almost as much variability in toxicity (only about 1.3–1.5% less), suggesting that the predictors used in the RF models explained much of the variability in toxicity of the chemicals. Simple models such as ANCOVAs with only three predictors (and their interactions) explained as much variation as RFs (although note that the R² measures are not the same). Log P, which is the parameter most commonly used for toxicity prediction in QSARs and often shows a positive relationship with toxicity (Blum and Speece 1991; Meylan et al. 1999), was, after CAS number, the most important predictor of LC50 both with RF and ANCOVA, in agreement with previous studies that suggest that predictions based on this parameter are reasonable (Levet et al. 2013). However, the explained variation by log P was much lower than for CAS number, and the relationship of toxicological endpoints with log P varied markedly with ECOSAR groups (Fig. S2), further illustrating potential predictive problems and that chemicals with similar structures can have very different toxicity. For instance, dieldrin (CAS 60571) and endrin (CAS 72208) are in the same ECOSAR chemical class (vinyl/allyl halides) with almost the same log P values (3.4) but marked toxicity differences (Figs. S1 and S2). This toxicity difference

Fig. 3 Partial dependence of LC50 and NOEC on octanol-water partition coefficients (log P) based on the random forest prediction

Fig. 4 Partial dependencies of LC50 and NOEC on fish species based on the random forest prediction

Fig. 5 Relationship of LC50 with octanol-water partition coefficient (log P) among fish species. Upper panel is for the species belonging to the families Cyprinidae and Salmonidae (A. brama, C. auratus, C. carpio, O. kisutch, O. mykiss, P. parva, R. rutilus, S. salar, S. trutta, S. fontinalis, S. erythrophthalmus, T. tinca); the lower panel is for the rest of species (A. melas, A. anguilla, A. facetus, E. lucius, F. heteroclitus, G. aculeatus, I. punctatus, L. gibbosus, M. salmoides, P. fluviatilis, P. marinus, P. reticulata, S. lucioperca). Axes: octanol-water partition coefficient (log P) on the horizontal axis and log10(LC50) on the vertical axis. Note that both axes were log-transformed. The regression lines by species are also shown. See ST3 for regression

Table 1 Analyses of covariance of LC50 and NOEC values with chemical substance (CAS number) and fish species as categorical factors and the octanol-water partition coefficient (log P) as a covariate. Adjusted coefficients of determination: LC50, R2adj = 0.85; NOEC, R2adj = 0.91

| Source | LC50 SS | LC50 df | LC50 P | NOEC SS | NOEC df | NOEC P |
| --- | --- | --- | --- | --- | --- | --- |
| log P | 1,100.2 | 1 | <0.0005 | 173.1 | 1 | <0.0005 |
| Fish species | 1,015.1 | 24 | <0.0005 | 1,196.5 | 14 | <0.0005 |
| CAS | 10,094.9 | 50 | <0.0005 | 3,479.7 | 47 | <0.0005 |
| log P × fish species | 40.6 | 17 | <0.0005 | 67.7 | 8 | <0.0005 |
| CAS × fish species | 847.7 | 280 | <0.0005 | 97.6 | 18 | <0.0005 |
| Error | 0.53 | 7,519 | | 0.5 | 1,898 | |

SS sum of squares, df degrees of freedom, P P value, R2adj adjusted coefficient of determination
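Table 1 reports adjusted coefficients of determination. As a reminder of how the adjustment penalizes model size, the following minimal sketch computes R2 and adjusted R2 from observed and fitted values. The function names and the toy numbers are illustrative assumptions; they are not taken from the study's data or code.

```python
def r_squared(observed, fitted):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    mean_obs = sum(observed) / len(observed)
    ss_total = sum((y - mean_obs) ** 2 for y in observed)
    ss_resid = sum((y - f) ** 2 for y, f in zip(observed, fitted))
    return 1.0 - ss_resid / ss_total


def adjusted_r_squared(r2, n_obs, n_params):
    """Adjusted R2 for a model with n_params estimated terms (intercept excluded)."""
    return 1.0 - (1.0 - r2) * (n_obs - 1) / (n_obs - n_params - 1)


# Hypothetical toy values, for illustration only.
observed = [1.0, 2.0, 3.0, 4.0, 5.0]
fitted = [1.1, 1.9, 3.2, 3.8, 5.0]
r2 = r_squared(observed, fitted)
r2_adj = adjusted_r_squared(r2, n_obs=len(observed), n_params=1)
```

With many observations and comparatively few terms, the adjusted value stays close to the unadjusted one; the penalty matters most when the number of estimated terms is large relative to the sample size.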

[Fig. 6 plot: x-axis, octanol-water partition coefficient (log P); y-axis, log10(NOEC); legend, fish species: C. auratus, C. carpio, E. lucius, I. punctatus, M. salmoides, O. kisutch, O. mykiss, P. fluviatilis, P. marinus, P. reticulata, S. salar, S. trutta, S. fontinalis, T. tinca.]

Fig. 6 Relationship of NOEC with octanol-water partition coefficient (log P) among fish species. Note that both axes were log-transformed. The regression lines by species are also shown. See ST3 for regression


was also observed in a rat study by Allen et al. (2013), who found that small structural changes in dieldrin (compared to the stereoisomer endrin) yielded significant differences in toxicity. Pollutant prioritization (i.e., the lists of priority and emerging pollutants), which is mostly based on their occurrence in aquatic environments and their toxicity risk, was an important predictor for LC50 but much less for NOEC, probably in part because regulatory rules and previous predictive tools have focused on lethal effects rather than non-lethal effects, the latter likely being less understood. Priority pollutants were more toxic than emerging pollutants, suggesting that the former include the most toxic compounds; the number of cases in the original database was similar for the priority vs. emerging pollutants (9287 vs. 9144), so sample size does not affect this conclusion. ECOSAR classification was the best predictor among the aquatic toxicity classifications. Moore et al. (2003) evaluated model performance for six QSAR packages that predict acute toxicity to fish and showed that ECOSAR and OASIS had higher correlations between predicted and measured toxicities. The ECOSAR model has been shown to correctly classify about 65% of a large test set of industrial chemicals into defined classes of aquatic toxicity for six fish species (Reuschenbach et al. 2008).

Cramer classification, mode of action by OASIS, fish family, and Verhaar classification were less useful predictors, while water temperature, hardness, and chemical functional group also influenced the toxicity results. Hazard evaluation of compounds with the Cramer scheme has been considered useful (Patlewicz et al. 2008), but it was among the least important predictors among the variables that we analyzed. Bhatia et al. (2015) evaluated 1016 fragrance materials by conducting Cramer classification using Toxtree, the OECD QSAR toolbox, and expert judgment and recommended possible coding changes to reduce disparities among classifications. Aquatic toxicity classifications based on modes of action (MOA by OASIS and Verhaar classification) did not perform as well for the substances included in the present analyses; 70–75% of the substances were assigned as reactive unspecified by OASIS and 72–84% to the class "not possible to classify according to these rules" by Verhaar classification, possibly explaining the low importance of these two classifications for toxicity prediction in the RF models. We were not able to study many environmental conditions (e.g., pH, oxygen, nutrient concentrations, and salinity) that are known to affect toxicity (Grosell et al. 2007; Pickering 1968; Thurston and Russo 1981), because they were generally less reported in the database and likely vary less in laboratory conditions. In real environmental exposure, they may have significant effects on toxicity of chemical substances. In general, LC50 and NOEC yielded similar results (e.g., explained variance by RF and ANCOVA, importance order of predictors, relationship with log P).
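The ranking of predictors discussed above comes from random forest importance scores. Permutation-based importance is one common way such rankings are obtained (Breiman 2001 describes the idea for random forests); whether the analysis used exactly this measure is not stated in this section. The sketch below is a simplified, model-agnostic version: it shuffles one predictor at a time and records the increase in mean squared error. The toy model, predictor names, and data are invented for illustration and are not the study's model or database.

```python
import random


def mse(model, rows, targets):
    """Mean squared error of the model predictions over all rows."""
    return sum((model(r) - t) ** 2 for r, t in zip(rows, targets)) / len(rows)


def permutation_importance(model, rows, targets, n_repeats=20, seed=0):
    """Mean increase in MSE when each predictor column is shuffled in turn."""
    rng = random.Random(seed)
    baseline = mse(model, rows, targets)
    importances = {}
    for name in rows[0]:
        increases = []
        for _ in range(n_repeats):
            shuffled = [r[name] for r in rows]
            rng.shuffle(shuffled)
            # Copy each row with only the current predictor replaced.
            permuted = [dict(r, **{name: v}) for r, v in zip(rows, shuffled)]
            increases.append(mse(model, permuted, targets) - baseline)
        importances[name] = sum(increases) / n_repeats
    return importances


# Hypothetical stand-in for a fitted model: the response depends on log P
# and species but ignores hardness. All numbers are invented.
SPECIES_EFFECT = {"A": 0.0, "B": 1.0}


def toy_model(row):
    return 4.0 - 0.5 * row["log_p"] + SPECIES_EFFECT[row["species"]]


rows = [
    {"log_p": float(i % 6), "species": "AB"[i % 2], "hardness": float(i % 4)}
    for i in range(24)
]
targets = [toy_model(r) for r in rows]
importances = permutation_importance(toy_model, rows, targets)
```

A predictor the model never uses (here, hardness) receives an importance of zero, whereas predictors that drive the prediction show a positive increase in error when shuffled. Random forest implementations typically compute this on out-of-bag observations per tree rather than on the full data set as done here.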

As hypothesized, fish species was one of the most important factors, being the fifth for LC50 and second for NOEC, suggesting that this factor (and its interactions with log P and chemical substance) is fundamental for prediction and study of chemical toxicity to fish. Our results suggest a good correlation between log P and toxicity, but the interaction between species, chemical substance, and log P as well as the interaction between species and chemicals are pervasive and important. Thus, using only log P to predict toxicity of non-tested substances might be inaccurate. Although for more than half of the fish species (13 out of 25 species) analyzed here, LC50 decreased with log P, many did not show significant relationships and explained variation was generally low, particularly for NOEC, indicating that chemicals with similar log P may differ widely in their toxicity. Vittozzi and De Angelis (1991) also showed species-dependent acute toxicity of chemicals among fish species.
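The species-specific relationships with log P described above amount to fitting a separate regression line per species and comparing slopes and explained variation. A minimal sketch of that idea follows, using ordinary least squares in plain Python. The species labels and data points are invented for illustration; the actual regression statistics are those reported by the authors in their supplementary table.

```python
from collections import defaultdict


def ols_fit(x, y):
    """Least squares fit of y = intercept + slope * x; returns (intercept, slope, r2)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    syy = sum((yi - mean_y) ** 2 for yi in y)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r2 = (sxy * sxy) / (sxx * syy) if syy > 0 else 0.0
    return intercept, slope, r2


def slopes_by_species(records):
    """Fit a separate regression of log10(LC50) on log P for each species."""
    grouped = defaultdict(lambda: ([], []))
    for species, log_p, log_lc50 in records:
        grouped[species][0].append(log_p)
        grouped[species][1].append(log_lc50)
    return {sp: ols_fit(xs, ys) for sp, (xs, ys) in grouped.items()}


# Invented records (species, log P, log10 LC50), for illustration only.
records = [
    ("species_1", 1.0, 4.0), ("species_1", 2.0, 3.0), ("species_1", 3.0, 2.0),
    ("species_2", 1.0, 3.0), ("species_2", 2.0, 3.1), ("species_2", 3.0, 2.9),
]
fits = slopes_by_species(records)
```

In this toy case the first species shows a steep negative slope with high explained variation, while the second shows a nearly flat line with low explained variation, which is the kind of contrast that a significant log P × species interaction reflects.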

Although fish species sensitivity to toxicants varies among substances (Ibrahim et al. 2014), our results confirm that salmonids and also northern pike are more intolerant species and can live in narrower ranges of water and habitat quality (Hung et al. 2004; Kennard et al. 2005; Oberdorff et al. 2001). Goldfish, common carp, guppy, roach, and bream were the most tolerant species, in partial agreement with traditional views (Lyons 2006; Maceda-Veiga and De Sostoa 2011). However, the toxicity of substances was barely dependent on fish family, suggesting that this is not a good surrogate of species and that more understanding of species-specific differences and the mechanisms that cause them is needed. Species-specific tolerance might be further explained by their morphology, physiology, and ecological condition. Guénard et al. (2011) proposed a method to predict species tolerance using phylogenetic information, for 25 aquatic species and some pesticides (carbaryl, malathion, DDT, and lindane). Sensitivity differences between taxa may depend on life history traits (e.g., respiratory strategy and body size).

In conclusion, we have illustrated that modern machine-learning techniques such as RF can help to understand the complexity of toxicological processes and quantify the importance of the multitude of factors that mediate them. Our results confirm the hypothesis that fish species is among the most important predictors for modeling toxicity. Therefore, researchers should be cautious when generalizing ecotoxicological results of models that use a few predictors and species, since chemicals with very similar structure and log P can have different toxicity and different species in the same taxonomic family can display different sensitivity.

Acknowledgments This research was financially supported by the Spanish Ministry of Economy and Competitiveness (projects CGL2013-43822-R, CGL2015-69311-REDT, and CGL2016-80820-R), the Government of Catalonia (ref. 2014 SGR 484), the University of Girona (MPCUdG2016/120), and the European Commission (COST Action TD1209). BT benefited from a doctoral fellowship from the European Commission (Erasmus Mundus Partnership "Techno II," 372228-1-2012-1-FR-ERA MUNDUS-EMA21). We thank Dr. Pao Srean for the help in using the R software and anonymous reviewers for the comments on the manuscript.


References

Allen EM, Florang VR, Davenport LL, Jinsmaa Y, Doorn JA (2013) Cellular localization of dieldrin and structure–activity relationship of dieldrin analogues in dopaminergic cells. Chem Res Toxicol 26:1043–1054

Bhatia S, Schultz T, Roberts D, Shen J, Kromidas L, Api AM (2015) Comparison of Cramer classification between Toxtree, the OECD QSAR toolbox and expert judgment. Regul Toxicol Pharmacol 71:52–62. doi:10.1016/j.yrtph.2014.11.005

Blum DJW, Speece RE (1991) Quantitative relationships for chemical toxicity to environmental bacteria. Ecotox Environ Safe 22:198–224

Breiman L (2001) Random forests. Mach Learn 45:5–32

Cramer GM, Ford RA, Hall RL (1978) Estimation of toxic hazard: a decision tree approach. Food Cosmet Toxicol 16:255–276

Crane M, Newman MC (2000) What level of effect is a no observed effect? Environ Toxicol Chem 19:516–519

Cutler D, Edwards T, Beard K, Cutler A, Hess K, Gibson J, Lawler J (2007) Random forests for classification in ecology. Ecology 88:2783–2792

Doadrio I (2002) Atlas and red book of the inland fish of Spain. Ministry of Environment, Madrid (in Spanish)

Dudgeon D, Arthington AH, Gessner MO, Kawabata Z-I, Knowler DJ, Lévêque C, Naiman RJ, Prieur-Richard A-H, Soto D, Stiassny MLJ, Sullivan CA (2006) Freshwater biodiversity: importance, threats, status and conservation challenges. Biol Rev 81:163–182. doi:10.1017/S1464793105006950

European Union Reference Laboratory for alternatives to animal testing (EURL-ECVAM) (2015) EC Inventory; EINECS. https://eurl-ecvam.jrc.ec.europa.eu/laboratories-research/predictive_toxicology/information-sources/ec_inventory. Accessed 15 Sep 2015

Fedorenkova A, Vonk JA, Breure AM, Hendriks AJ, Leuven R (2013) Tolerance of native and non-native fish species to chemical stress: a case study for the river Rhine. Aquat Invasions 8:231–241. doi:10.3391/ai.2013.8.2.10

Friedman JH (2001) Greedy function approximation: a gradient boosting machine. Ann Stat 29:1189–1232

García-Berthou E, Moreno-Amich R (1993) Multivariate analysis of covariance in morphometric studies of the reproductive cycle. Can J Fish Aquat Sci 50:1394–1399

Ghose A, Viswanadhan V, Wendoloski J (1998) Prediction of hydrophobic (lipophilic) properties of small organic molecules using fragmental methods: an analysis of ALOGP and CLOGP methods. J Phys Chem A 102:3762–3772

Grosell M, Blanchard J, Brix KV, Gerdes R (2007) Physiology is pivotal for interactions between salinity and acute copper toxicity to fish and invertebrates. Aquat Toxicol 84:162–172. doi:10.1016/j.aquatox.2007.03.026

Guénard G, von der Ohe PC, de Zwart D, Legendre P, Lek S (2011) Using phylogenetic information to predict species tolerances to toxic chemicals. Ecol Appl 21:3178–3190. doi:10.1890/10-2242.1

Hansen BJ, van Haelst AG, van Leeuwen K, van der Zandt P (1999) Priority setting for existing chemicals: European Union risk ranking method. Environ Toxicol Chem 18:772–779. doi:10.1002/etc.5620180425

Henegar A, Mombelli E, Pandard P, Péry ARR (2011) What can be learnt from an ecotoxicity database in the framework of the REACh regulation? Sci Total Environ 409:489–494. doi:10.1016/j.scitotenv.2010.10.028

Hung DQ, Nekrassova O, Compton RG (2004) Analytical methods for inorganic arsenic in water: a review. Talanta 64:269–277. doi:10.1016/j.talanta.2004.01.027

Ibrahim L, Preuss TG, Schaeffer A, Hommen U (2014) A contribution to the identification of representative vulnerable fish species for pesticide risk assessment in Europe: a comparison of population resilience using matrix models. Ecol Model 280:65–75. doi:10.1016/j.ecolmodel.2013.08.001

Katritzky AR, Kuanar M, Slavov S, Hall CD, Karelson M, Kahn I, Dobchev DA (2010) Quantitative correlation of physical and chemical properties with chemical structure: utility for prediction. Chem Rev 110:5714–5789. doi:10.1021/cr900238d

Kennard MJ, Arthington AH, Pusey BJ, Harch BD (2005) Are alien fish a reliable indicator of river health? Freshw Biol 50:174–193. doi:10.1111/j.1365-2427.2004.01293.x

Laskowski R, Bednarska AJ, Kramarz PE, Loureiro S, Scheil V, Kudłek J, Holmstrup M (2010) Interactions between toxic chemicals and natural environmental factors: a meta-analysis and case studies. Sci Total Environ 408:3763–3774. doi:10.1016/j.scitotenv.2010.01.043

Leigh GJ, Favre HA, Metanomski WV (1998) Principles of chemical nomenclature: a guide to IUPAC recommendations. Blackwell, Oxford. doi:10.1515/ci.2007.29.4.23

Levet A, Bordes C, Clément Y, Mignon P, Chermette H, Marote P, Cren-Olivé C, Lantéri P (2013) Quantitative structure–activity relationship to predict acute fish toxicity of organic solvents. Chemosphere 93:1094–1103. doi:10.1016/j.chemosphere.2013.06.002

Lewis PA, Klemm DJ, Lazorchak JM, Norberg-King TJ, Peltier WH, Heber MA (1994) Short-term methods for estimating the chronic toxicity of effluents and receiving waters to freshwater organisms. U.S. Environmental Protection Agency, Cincinnati

Li AJ, Leung PTY, Bao VWW, Yi AXL, Leung KMY (2014) Temperature-dependent toxicities of four common chemical pollutants to the marine medaka fish, copepod and rotifer. Ecotoxicology 23:1564–1573. doi:10.1007/s10646-014-1297-4

Liaw A, Wiener M (2002) Classification and regression by random forest. R news 2:18–22

Lifongo L, Nfon E (2009) Evaluating the fate of organic compounds in the Cameroon environment using a level III multimedia fugacity model. African J Environ Sci Technol 3:376–386

Lyons J (2006) A fish-based index of biotic integrity to assess intermittent headwater streams in Wisconsin, USA. Environ Monit Assess 122:239–258. doi:10.1007/s10661-005-9178-1

Maceda-Veiga A, De Sostoa A (2011) Observational evidence of the sensitivity of some fish species to environmental stressors in Mediterranean rivers. Ecol Indic 11:311–317. doi:10.1016/j.ecolind.2010.05.009

Mayo-Bean K, Kendra Moran L, Meylan B, Ranslow P (2012) Methodology document for the ecological structure-activity relationship model (ECOSAR) class program; estimating toxicity of industrial chemicals to aquatic organisms. U.S. Environmental Protection Agency, Washington. https://www.epa.gov/sites/production/files/2015-09/documents/ecosartechfinal.pdf. Accessed 26 July 2015

Meylan WM, Howard PH, Boethling RS, Aronson D, Printup H, Gouchie S (1999) Improved method for estimating bioconcentration/bioaccumulation factor from octanol/water partition coefficient. Environ Toxicol Chem 18:664–672

Moore DRJ, Breton RL, MacDonald DB (2003) A comparison of model performance for six quantitative structure-activity relationship packages that predict acute toxicity to fish. Environ Toxicol Chem 22:1799–1809

Netzeva TI, Pavan M, Worth AP (2008) Review of (quantitative) structure–activity relationships for acute aquatic toxicity. QSAR Comb Sci 27:77–90. doi:10.1002/qsar.200710099

Oberdorff T, Pont D, Hugueny B, Chessel D (2001) A probabilistic model characterizing riverine fish communities of French rivers: a framework for environmental assessment. Freshw Biol 46:399–415

Oberdorff T, Pont D, Hugueny B, Porcher JP (2002) Development and validation of a fish-based index for the assessment of river health in France. Freshw Biol 47:1720–1734


OECD (2009) Guidance document for using the OECD (Q)SAR application toolbox to develop chemical categories according to the OECD guidance on grouping of chemicals. http://www.oecd.org/officialdocuments/. Accessed 14 June 2015

Patlewicz G, Jeliazkova N, Safford RJ, Worth AP, Aleksiev B (2008) An evaluation of the implementation of the Cramer classification scheme in the Toxtree software. SAR QSAR Environ Res 19:495–524. doi:10.1080/10629360802083871

Pickering QH (1968) Some effects of dissolved oxygen concentrations upon the toxicity of zinc to the bluegill, Lepomis macrochirus Raf. Water Res 2:187–194

Prasad AM, Iverson LR, Liaw A (2006) Newer classification and regression tree techniques: bagging and random forests for ecological prediction. Ecosystems 9:181–199. doi:10.1007/s10021-005-0054-1

R Development Core Team (2015) R: a language and environment for statistical computing. R Foundation for Statistical Computing, Vienna

Reuschenbach P, Silvani M, Dammann M, Warnecke D, Knacker T (2008) ECOSAR model performance with a large test set of industrial chemicals. Chemosphere 71:1986–1995. doi:10.1016/j.chemosphere.2007.12.006

Riethmuller N, Markich SJ, Van Dam RA, Parry D (2001) Effects of water hardness and alkalinity on the toxicity of uranium to a tropical freshwater hydra (Hydra viridissima). Biomarkers 6:45–51. doi:10.1080/135475001452788

Russom CL, Bradbury SP, Broderius SJ, Hammermeister DE, Drummond RA (1997) Predicting modes of toxic action from chemical structure: acute toxicity in the fathead minnow (Pimephales promelas). Environ Toxicol Chem 16:948–967

Strobl C, Boulesteix A-L, Kneib T, Augustin T, Zeileis A (2008) Conditional variable importance for random forests. BMC Bioinformatics 9:307. doi:10.1186/1471-2105-9-307

Tebby C, Mombelli E, Pandard P, Péry ARR (2011) Exploring an ecotoxicity database with the OECD (Q)SAR toolbox and DRAGON descriptors in order to prioritise testing on algae, daphnids, and fish. Sci Total Environ 409:3334–3343. doi:10.1016/j.scitotenv.2011.05.029

Thurston RV, Russo RC (1981) Ammonia toxicity to fishes. Effect of pH on the toxicity of the un-ionized ammonia species. Environ Sci Technol 15:837–840

USEPA (U.S. Environmental Protection Agency) (2014) Aquatic life criteria development documents. https://www.epa.gov/wqc/aquatic-life-criteria-development-documents. Accessed 9 Feb 2017

USEPA (U.S. Environmental Protection Agency) (2015) ECOTOX user guide: ECOTOXicology database system. Version 4.0. Available at: https://cfpub.epa.gov/ecotox/. Accessed 9 Feb 2017

Vaal MA, Van Leeuwen CJ, Hoekstra JA, Hermens JLM (2000) Variation in sensitivity of aquatic species to toxicants: practical consequences for effect assessment of chemical substances. Environ Manag 25:415–423. doi:10.1007/s002679910033

Vaal MA, Wall T, Hoekstra JA, Hermens JLM (1997) Variation in the sensitivity of aquatic species in relation to the classification of environmental pollutants. Chemosphere 35:1311–1327

van der Hoeven N (1997) How to measure no effect. Part III: statistical aspects of NOEC, ECx and NEC estimates. Environmetrics 8(3):255–261

Verhaar HJM, Leeuwen CJV, Hermens JLM (1992) Classifying environmental pollutants. 1: structure-activity relationships for prediction of aquatic toxicity. Chemosphere 25:471–491

Vittozzi L, De Angelis G (1991) A critical review of comparative acute toxicity data on freshwater fish. Aquat Toxicol 19:167–204. doi:10.1016/0166-445X(91)90017-4

Vollhardt P, Schore N (2011) Organic chemistry, Sixth edn. WH Freeman and Company, New York

Woltering DM (1984) The growth response in fish chronic and early life stage toxicity tests: a critical review. Aquat Toxicol 5:1–21

Yang R, Randall DJ (1997) Biomarkers for rainbow trout (Oncorhynchus mykiss) and coho salmon (Oncorhynchus kisutch) exposed to 1,2,4,5-tetrachlorobenzene and tetrachloroguaiacol. Chemosphere 34:1167–1180


Figure S1. Relationship of LC50 with octanol-water partition coefficient (log P) among chemical substances. Note that both axes were log-transformed.

[Plot: x-axis, octanol-water partition coefficient (log P); y-axis, log10(LC50); legend, chemical substances (i.e., CAS numbers): 58899, 60571, 63252, 72208, 72435, 83794, 86500, 87865, 88302, 115866, 121755, 122145, 127651, 131522, 315184, 569642, 1397940, 1420048, 1582098, 1929733, 2032599, 3383968, 7632000, 8001352, 52645531.]


Figure S2. Relationship of LC50 with octanol-water partition coefficient (log P) among ECOSAR classification groups. Note that both axes were log-transformed.

[Plot: x-axis, octanol-water partition coefficient (log P); y-axis, log10(LC50); legend, ECOSAR classification: Amides, Carbamate Esters, Esters, inorganic, Neutral organics, Phenols, Polynitrobenzenes, Pyrethroids, Vinyl/Allyl Ethers, Vinyl/Allyl Halides.]


Figure S3. Partial dependence of LC50 (left) and NOEC (right) on aquatic toxicity classification by ECOSAR based on the random forest prediction.
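Partial dependence, as used in this and the other partial dependence figures, can be understood as the average model prediction obtained when one predictor is forced to a given value for every observation while the remaining predictors keep their observed values (Friedman 2001). The sketch below illustrates that computation in plain Python. The stand-in model, the class labels, and the coefficients are hypothetical and do not reproduce the fitted random forest.

```python
def partial_dependence(model, rows, feature, grid):
    """Average prediction with `feature` forced to each grid value in turn."""
    curve = {}
    for value in grid:
        # Overwrite the chosen predictor in a copy of every row.
        preds = [model(dict(row, **{feature: value})) for row in rows]
        curve[value] = sum(preds) / len(preds)
    return curve


# Hypothetical stand-in for a fitted random forest; the class offsets and
# coefficients below are invented for illustration.
CLASS_OFFSET = {"class_a": 0.0, "class_b": -1.5}


def toy_model(row):
    return 4.0 - 0.5 * row["log_p"] + CLASS_OFFSET[row["chem_class"]]


rows = [
    {"log_p": 1.0, "chem_class": "class_a"},
    {"log_p": 3.0, "chem_class": "class_b"},
    {"log_p": 5.0, "chem_class": "class_a"},
]
pd_class = partial_dependence(toy_model, rows, "chem_class", ["class_a", "class_b"])
```

For a categorical predictor such as a chemical class or a fish species, the result is one averaged prediction per category, which is what a bar-style partial dependence plot displays.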


Figure S4. Observed (left) and predicted (right) log-transformed LC50 values by CAS number. For the observed values, the mean (bar) and SD (error bars) are given. The predicted values correspond to those obtained by random forests (partial dependence on CAS number). The bars are ordered by observed (log-transformed) means. The correlation coefficient (r) between observed and predicted LC50 values was 0.947 (P < 0.0001).

[Plot: paired horizontal bar charts, observed (left) and predicted (right); x-axis, log10(LC50); y-axis, CAS number. CAS numbers as listed on the axis: 7647145, 333200, 30560191, 38641940, 94097, 127651, 1071836, 7718549, 315184, 108952, 3383968, 1314621, 88302, 7733020, 12125029, 7722647, 63252, 2032599, 52686, 122145, 1929733, 7646857, 16752775, 7632000, 121755, 7758987, 569642, 115866, 104405, 86500, 131522, 143339, 1582098, 87865, 7440508, 7778543, 83794, 58899, 1420048, 481390, 7783064, 10108642, 7447394, 72435, 50293, 10124364, 7761888, 60571, 8001352, 52645531, 72208, 1397940.]
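The agreement between observed means and random forest predictions in these supplementary figures is summarized with a Pearson correlation coefficient. A minimal sketch of that statistic is given below; the function name and the example vectors are illustrative assumptions and are unrelated to the study's values.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical per-category observed means and model predictions.
observed_means = [1.0, 2.0, 3.0, 4.0, 5.0]
predicted = [2.0, 4.0, 5.0, 4.0, 5.0]
r = pearson_r(observed_means, predicted)
```

Because r only measures linear association, a high value can coexist with predictions that span a much narrower range than the observed means, so the axis ranges of the two panels are worth comparing as well.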


Figure S5. Observed (left) and predicted (right) log-transformed LC50 values by fish species. For the observed values, the mean (bar) and SD (error bars) are given. The predicted values correspond to those obtained by random forests (partial dependence on fish species). The bars are ordered by observed (log-transformed) means. The correlation coefficient (r) between observed and predicted LC50 values was 0.45 (P < 0.024).

[Plot: paired horizontal bar charts, observed (left) and predicted (right); x-axis, log10(LC50); y-axis, fish species. Species as listed on the axis: Abramis brama, Scardinius erythrophthalmus, Rutilus rutilus, Perca fluviatilis, Petromyzon marinus, Australoheros facetus, Carassius auratus, Salmo trutta, Cyprinus carpio, Gasterosteus aculeatus, Ictalurus punctatus, Salmo salar, Pseudorasbora parva, Oncorhynchus kisutch, Anguilla anguilla, Poecilia reticulata, Oncorhynchus mykiss, Salvelinus fontinalis, Lepomis gibbosus, Fundulus heteroclitus, Ameiurus melas, Micropterus salmoides, Esox lucius, Tinca tinca, Sander lucioperca.]


Figure S6. Observed (left) and predicted (right) log-transformed NOEC values by CAS number. For the observed values, the mean (bar) and SD (error bars) are given. The predicted values correspond to those obtained by random forests (partial dependence on CAS number). The bars are ordered by observed (log-transformed) means. The correlation coefficient (r) between observed and predicted NOEC values was 0.95 (P < 0.0001).

[Plot: paired horizontal bar charts, observed (left) and predicted (right); x-axis, log10(NOEC); y-axis, CAS number. CAS numbers as listed on the axis: 7647145, 7646799, 45298906, 335671, 67561, 7786814, 114261, 88302, 318989, 15307865, 100027, 122145, 68783788, 121755, 108952, 51285, 7718549, 8018017, 7722647, 333415, 118967, 7733020, 7646857, 60515, 7446700, 10049044, 10108642, 6484522, 95954, 12125029, 7722841, 10043013, 1563662, 330552, 7758987, 87865, 7440508, 55406536, 7758954, 85018, 1912249, 74908, 2921882, 7440439, 1336363, 67641, 52315078, 10124364, 7761888, 52918635, 50282.]


Figure S7. Observed (left) and predicted (right) log-transformed NOEC values by fish species. For the observed values, the mean (bar) and SD (error bars) are given. The predicted values correspond to those obtained by random forests (partial dependence on fish species). The bars are ordered by observed (log-transformed) means. The correlation coefficient (r) between observed and predicted NOEC values was 0.91 (P < 0.0001).

[Plot: paired horizontal bar charts, observed (left) and predicted (right); x-axis, log10(NOEC); y-axis, fish species. Species as listed on the axis: Perca fluviatilis, Carassius auratus, Petromyzon marinus, Gasterosteus aculeatus, Ictalurus punctatus, Poecilia reticulata, Cyprinus carpio, Salmo trutta, Salmo salar, Oncorhynchus kisutch, Salvelinus fontinalis, Micropterus salmoides, Oncorhynchus mykiss, Esox lucius, Tinca tinca.]


Table S1. Chemical substances considered in the data analyses (substances with more

than 49 assays for LC50 and more than 7 for NOEC). The number of assays is given for

each substance and toxicity statistic

CAS

Number Chemical Name LC50 NOEC

50282 (17beta)Estra-1,3,5(10)triene-3,17-diol 10

50293 1,1'-(2,2,2-Trichloroethylidene)bis[4-chlorobenzene] 98 51285 2,4-Dinitrophenol 33

52686

P-(2,2,2-Trichloro-1-hydroxyethyl)phosphonic acid

dimethyl ester 269

58899

(1alpha,2alpha,3beta,4alpha,5alpha,6beta)-1,2,3,4,5,6-

Hexachlorocyclohexane 72

60515

Phosphorodithioic acid, O,O-Dimethyl S-[2-

(methylamino)-2-oxoethyl] ester 62

60571

(1aR,2R,2aS,3S,6R,6aR,7S,7aS)-rel-3,4,5,6,9,9-

Hexachloro-1a,2,2a,3,6,6a,7,7a-octahydro-2,7:3,6-

dimethanonaphth[2,3-b]oxirene 55 63252 1-Naphthalenol methylcarbamate 230 67561 Methanol 17

67641 2-Propanone 13

72208

3,4,5,6,9,9-Hexachloro-1a,2,2a,3,6,6a,7,7a-octahydro-

[2,7:3,6-dimethanonaphth[2,3-b]oxirene,[1a alpha,2

beta,2a beta,3 alpha,6 alpha,6a beta,7 beta,7a alpha] 145

72435

1,1'-(2,2,2-Trichloroethylidene)bis[4-

methoxybenzene] 220 74908 Hydrocyanic acid 9

83794

[2R-(2-alpha, 6a-alpha, 12a-alpha)]-1,2,12,12a-

Tetrahydro-8,9-dimethoxy-2-(1-methylethenyl)-[1]-

benzopyrano[3,4-b]furo[2,3-h][1]benzopyran-6(6aH)-

one 184 85018 Phenanthrene 15

86500

O,O-Dimethyl S-[(4-oxo-1,2,3-benzotriazin-3(4H)-

yl)methyl] ester, Phosphorodithioic acid 173 87865 2,3,4,5,6-Pentachlorophenol 170 8

88302 4-Nitro-3-(trifluoromethyl)phenol 1120 84

94097 4-Aminobenzoic acid ethyl ester 141 95954 2,4,5-Trichlorophenol 10

100027 4-Nitrophenol 35

104405 4-Nonylphenol 58 108952 Phenol 83 8

114261 2-(1-Methylethoxy)phenol, 1-(N-Methylcarbamate) 9

115866 Phosphoric acid, Triphenyl ester 76 118967 2-Methyl-1,3,5-trinitrobenzene 10

121755

[(Dimethoxyphosphinothioyl)thio]butanedioic acid,

Diethyl ester 96 119

122145

Phosphorothioic acid O,O-dimethyl O-(3-methyl-4-

nitrophenyl)ester 185 22

Page 19: Examining predictors of chemical toxicity in freshwater ...Examining predictors of chemical toxicity in freshwater fish ... and is also a key parameter for the environmental fate and

2

127651 N-Chloro-4-methylbenzenesulfonamide, Sodium salt 121 131522 2,3,4,5,6-Pentachlorophenol sodium salt (1:1) 115 143339 Sodium cyanide (Na(CN)) 74

315184

4-(Dimethylamino)-3,5-dimethylphenol,

Methylcarbamate(ester) 91

318989

1-[(1-Methylethyl)amino]-3-(1-naphthalenyloxy)-2-

propanol hydrochloride (1:1) 8

330552 N'-(3,4-Dichlorophenyl)-N-methoxy-N-methylurea 17

333200 Thiocyanic acid, Potassium salt (1:1) 67

333415

O,O-Diethyl O-[6-methyl-2-(1-methylethyl)-4-

pyrimidinyl] ester phosphorothioic acid 52

335671

2,2,3,3,4,4,5,5,6,6,7,7,8,8,8-Pentadecafluorooctanoic

acid 8

481390 Juglone 52

569642

N-[4-[[4-(Dimethylamino)phenyl]phenylmethylene]-

2,5-cyclohexadien-1-ylidene]-N-

methylmethanaminium chloride 99 1071836 N-(Phosphonomethyl)glycine 73 1314621 Vanadium oxide (V2O5) 153 1336363 1,1'-Biphenyl, Chloro derivs. 26

1397940 Antimycin A 156

1420048

5-Chloro-N-(2-chloro-4-nitrophenyl)-2-

hydrozybenzamide, compd. with 2-aminoethanol (1:1) 78

1563662

2,3-Dihydro-2,2-dimethyl-7-benzofuranol 7-(N-

methylcarbamate) 18

1582098

2,6-Dinitro-N,N-dipropyl-4-

(trifluoromethyl)benzenamine 63

1912249

6-Chloro-N-ethyl-N'-(1-methylethyl)-1,3,5-triazine-

2,4-diamine 30

1929733

2-(2,4-Dichlorophenoxy)acetic acid, 2-Butoxyethyl

ester 65

2032599

4-(Dimethylamino)-3-methylphenol,

Methylcarbamate (ester) 137

2921882

Phosphorothioic acid, O,O-Diethyl O-(3,5,6-trichloro-

2-pyridinyl) ester 65

3383968

O,O'-(Thiodi-4,1-phenylene) O,O,O',O'-tetramethyl

ester phosphorothioic acid 76 6484522 Nitric acid ammonium salt 30

7440439 Cadmium 13

7440508 Copper 62 86

7446700 Aluminum chloride (AlCl3) 10

7447394 Copper chloride (CuCl2) 248 7632000 Nitrous acid, Sodium salt 104 7646799 Cobalt chloride (CoCl2) 49

7646857 Zinc chloride (ZnCl2) 504 27

7647145 Sodium chloride (NaCl) 63 34

7718549 Nickel chloride (NiCl2) 55 11

7722647 Potassium permanganate 89 16


7722841 Hydrogen peroxide (H2O2) 32

7733020 Sulfuric acid, Zinc salt (1:1) 197 52

7758954 Lead chloride (PbCl2) 12

7758987 Sulfuric acid copper(2+) salt (1:1) 422 263

7761888 Nitric acid, silver (1+) salt 71 216

7778543 Hypochlorous acid, Calcium salt (2:1) 67

7783064 Hydrogen sulfide 160

7786814 Sulfuric acid, Nickel(2+)salt (1:1) 9

8001352 Toxaphene 146

8018017 [N-[2-[(Dithiocarboxy)amino]ethyl]carbamodithioato(2-)-kappaS,kappaS']manganese mixt. with [N-[2-[(dithiocarboxy)amino]ethyl]carbamodithioato(2-)-kappaS,kappaS']zinc 19

10043013 Sulfuric acid, Aluminum salt (3:2) 80

10049044 Chlorine oxide (ClO2) 11

10108642 Cadmium chloride (CdCl2) 451 132

10124364 Sulfuric acid, Cadmium salt (1:1) 55 22

12125029 Ammonium chloride ((NH4)Cl) 141 25

15307865 2-[(2,6-Dichlorophenyl)amino]benzeneacetic acid 65

16752775 N-[[(Methylamino)carbonyl]oxy]ethanimidothioic acid methyl ester 50

30560191 N-Acetylphosphoramidothioic acid O,S-dimethyl ester 56

38641940 N-(Phosphonomethyl)glycine compd. with 2-propanamine (1:1) 81

45298906 1,1,2,2,3,3,4,4,5,5,6,6,7,7,8,8,8-Heptadecafluoro-1-octanesulfonic acid ion(1-) 8

52315078 3-(2,2-Dichloroethenyl)-2,2-dimethylcyclopropanecarboxylic acid cyano(3-phenoxyphenyl)methyl ester 8

52645531 3-(2,2-Dichloroethenyl)-2,2-dimethylcyclopropanecarboxylic acid, (3-Phenoxyphenyl)methyl ester 75

52918635 (1R,3R)-3-(2,2-Dibromoethenyl)-2,2-dimethylcyclopropanecarboxylic acid (S)cyano(3-phenoxyphenyl)methyl ester 23

55406536 N-Butylcarbamic acid, 3-Iodo-2-propyn-1-yl ester 11

68783788 Dimethylditallow alkyl quaternary ammonium compounds, Chlorides 35


Table S2. Fish species considered in the data analyses. The number of assays available for each endpoint is also given.

Common name Species name Family LC50 NOEC

Bream Abramis brama Cyprinidae 15

Black bullhead Ameiurus melas Ictaluridae 91

Common eel Anguilla anguilla Anguillidae 6

Chameleon cichlid Australoheros facetus Cichlidae 16

Goldfish Carassius auratus Cyprinidae 484 65

Common carp Cyprinus carpio Cyprinidae 334 275

Northern pike Esox lucius Esocidae 93 4

Mummichog Fundulus heteroclitus Cyprinodontidae 5

Three-spined stickleback Gasterosteus aculeatus Gasterosteidae 28 35

Channel catfish Ictalurus punctatus Ictaluridae 748 189

Pumpkinseed Lepomis gibbosus Centrarchidae 22

Largemouth bass Micropterus salmoides Centrarchidae 108 4

Coho salmon Oncorhynchus kisutch Salmonidae 424 92

Rainbow trout Oncorhynchus mykiss Salmonidae 3582 1026

European perch Perca fluviatilis Percidae 15 59

Sea lamprey Petromyzon marinus Petromyzontidae 530 84

Guppy Poecilia reticulata Poeciliidae 242 14

Top mouth gudgeon Pseudorasbora parva Cyprinidae 4

Roach Rutilus rutilus Cyprinidae 27

Atlantic salmon Salmo salar Salmonidae 268 19

Brown trout Salmo trutta Salmonidae 267 13

Brook trout Salvelinus fontinalis Salmonidae 561 75

Zander Sander lucioperca Percidae 6

Rudd Scardinius erythrophthalmus Cyprinidae 6

Tench Tinca tinca Cyprinidae 10 33


Table S3. Relationship of endpoints with octanol-water partition coefficients (log P) for each fish species (see also Figures 5-6). SE = standard error, d.f. = degrees of freedom. Significant regressions (p < 0.05) are highlighted in bold. NA = not available (species tested at a single log P value).

Species Endpoint intercept slope SE d.f. R2 P

Abramis brama LC50 4.8373 0.1616 0.139 13 0.46 0.005

Ameiurus melas LC50 3.4687 -0.3792 0.299 89 0.23 <0.0001

Anguilla anguilla LC50 2.7149 NA 0.049 5 NA NA

Australoheros facetus LC50 3.2848 NA 0.207 15 NA NA

Carassius auratus LC50 3.5879 -0.2055 0.090 482 0.10 <0.0001

Cyprinus carpio LC50 3.1543 -0.039 0.065 332 0.01 0.1397

Esox lucius LC50 1.7918 0.0031 0.103 91 0.01 0.9608

Fundulus heteroclitus LC50 1.5253 -0.4115 0.184 3 0.89 0.015

Gasterosteus aculeatus LC50 3.187 0.0426 0.265 26 0.01 0.6489

Ictalurus punctatus LC50 3.3349 -0.2249 0.047 746 0.26 <0.0001

Lepomis gibbosus LC50 2.36 -0.3456 0.175 20 0.62 <0.0001

Micropterus salmoides LC50 2.5816 -0.2184 0.182 106 0.13 0.0002

Oncorhynchus kisutch LC50 3.0031 -0.2038 0.073 422 0.15 <0.0001

Oncorhynchus mykiss LC50 2.6815 -0.1029 0.024 3580 0.04 <0.0001

Perca fluviatilis LC50 3.7808 NA 0.146 14 NA NA

Petromyzon marinus LC50 6.3282 -1.615 0.155 528 0.41 <0.0001

Poecilia reticulata LC50 2.9216 -0.2572 0.086 240 0.30 <0.0001

Pseudorasbora parva LC50 2.8354 NA 0.101 3 NA NA

Rutilus rutilus LC50 4.126 0.0178 0.258 25 0.01 0.83

Salmo salar LC50 3.4059 -0.1919 0.107 25 0.10 <0.0001

Salmo trutta LC50 3.2805 -0.0523 0.105 265 0.01 0.1802

Salvelinus fontinalis LC50 2.9727 -0.2119 0.07 559 0.16 <0.0001

Sander lucioperca LC50 1.0942 NA 0.123 NA NA

Scardinius erythrophthalmus LC50 4.2632 NA 0.056 NA NA

Tinca tinca LC50 1.1838 NA 0.114 NA NA

Carassius auratus NOEC 4.1744 0.5164 0.164 63 0.37 <0.0001

Cyprinus carpio NOEC 2.7134 -0.074 0.167 273 0.01 0.1619

Esox lucius NOEC 1.374 NA 0.105 3 NA NA

Gasterosteus aculeatus NOEC 3.2374 NA 0.055 34 NA NA

Ictalurus punctatus NOEC 2.9359 0.2357 0.047 187 0.41 <0.0001

Micropterus salmoides NOEC 1.5961 NA 0.125 3 NA NA

Oncorhynchus kisutch NOEC 2.0367 -0.4442 0.161 90 0.13 0.0004

Oncorhynchus mykiss NOEC 1.6411 0.0743 0.054 1024 0.01 0.0005

Perca fluviatilis NOEC 4.301 NA 1.344 58 NA NA

Petromyzon marinus NOEC 3.663 NA 3.005 83 NA NA

Poecilia reticulata NOEC 2.7834 NA 0.001 13 NA NA

Salmo salar NOEC 1.8009 -0.2447 0.070 17 0.06 0.3186

Salmo trutta NOEC 2.5052 0.4047 0.330 11 0.48 0.009

Salvelinus fontinalis NOEC 1.4674 -0.0675 0.172 73 0.02 0.1977

Tinca tinca NOEC 4.0618 -1.0019 0.146 31 0.96 <0.0001
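As a reading aid for Table S3, the short Python sketch below shows how a row's intercept and slope could be combined into a fitted value at a given log P, assuming each row describes a simple linear regression of the endpoint on log P. The names `SpeciesFit`, `FITS`, and `predict_endpoint` are hypothetical and introduced only for illustration. The three rows are copied from the table. The result is on whatever scale the authors used for the endpoint, which this table does not state.

```python
from typing import NamedTuple, Optional


class SpeciesFit(NamedTuple):
    """One row of Table S3 (only the columns needed for a fitted value)."""
    species: str
    endpoint: str
    intercept: float
    slope: Optional[float]  # None where Table S3 reports NA


# Three rows copied from Table S3; any other row can be added the same way.
FITS = [
    SpeciesFit("Petromyzon marinus", "LC50", 6.3282, -1.615),
    SpeciesFit("Oncorhynchus mykiss", "LC50", 2.6815, -0.1029),
    SpeciesFit("Anguilla anguilla", "LC50", 2.7149, None),
]


def predict_endpoint(fit: SpeciesFit, log_p: float) -> Optional[float]:
    """Return intercept + slope * log_p, or None when no slope is reported.

    Assumption: the row describes a straight-line fit of the endpoint on log P.
    """
    if fit.slope is None:
        return None
    return fit.intercept + fit.slope * log_p


if __name__ == "__main__":
    # Compare fitted values at an arbitrary, illustrative log P of 3.0.
    for fit in FITS:
        print(fit.species, fit.endpoint, predict_endpoint(fit, 3.0))
```

Rows with an NA slope return `None` rather than a fitted value, since those species were tested at a single log P value and no regression is reported for them.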
