
Computer systems for the prediction of toxicity: an update


Advanced Drug Delivery Reviews 54 (2002) 417–431
www.elsevier.com/locate/drugdeliv

Nigel Greene*

MS 8274-1246, Drug Safety Evaluation, Pfizer Global Research and Development, Eastern Point Road, Groton, CT 06340, USA

Received 10 September 2001; accepted 16 October 2001

Abstract

In order to survive in the current economic climate, the pharmaceutical, agrochemical and personal product companies are required to produce large numbers of new, effective products whilst significantly reducing development time and costs. With the advent of combinatorial chemistry and high-throughput screening (HTS), the number of new candidate structures coming out of the discovery cycle has increased significantly. This has created a demand for faster screening of the toxicological properties of these candidates. Not surprisingly, computer methods for toxicity prediction offer an attractive solution to this problem because of their ability to screen large numbers of structures even before synthesis has occurred. In this paper the major, commercially available computer software systems for toxicity prediction are discussed together with their main strengths and limitations. © 2002 Elsevier Science B.V. All rights reserved.

Keywords: SAR; QSAR; Prediction; DEREK; MultiCASE; TOPKAT; HazardExpert; TOXSYS; COMPACT; OncoLogic

Contents

1. Introduction
2. Commercial toxicity prediction systems
   2.1. DEREK
      2.1.1. Strengths of DEREK
      2.1.2. Limitations of DEREK
   2.2. OncoLogic
      2.2.1. Strengths of OncoLogic
      2.2.2. Limitations of OncoLogic
   2.3. HazardExpert
      2.3.1. Strengths of HazardExpert
      2.3.2. Limitations of HazardExpert
   2.4. COMPACT
      2.4.1. Strengths of COMPACT
      2.4.2. Limitations of COMPACT
   2.5. CASE/Multi-CASE/MCASE-ES
      2.5.1. Strengths of CASE/Multi-CASE
      2.5.2. Limitations of CASE/Multi-CASE
   2.6. TOPKAT
      2.6.1. Strengths of TOPKAT
      2.6.2. Limitations of TOPKAT
   2.7. Other software applications
3. Genomic and proteomic applications
4. Performance measures of commercial systems for toxicity prediction
   4.1. NTP carcinogenicity exercises
   4.2. Comparisons for Salmonella typhimurium (Ames) test results
5. Conclusions and future perspectives
References

*Tel.: +1-860-715-4921; fax: +1-860-715-1251.
E-mail address: nigel [email protected] (N. Greene).

0169-409X/02/$ – see front matter © 2002 Elsevier Science B.V. All rights reserved.
PII: S0169-409X(02)00012-1

1. Introduction

Structure–activity relationships (SAR), i.e., the relationships between chemical structure and biological activity, have long been applied to the prediction and characterization of chemical toxicity [1]. The early linear free-energy approaches developed by Hansch and Free-Wilson have provided a fundamental scientific framework for the quantitative correlation of chemical structure with biological activity and spurred many of the developments in the field of quantitative structure–activity relationships (QSAR) [2,3].

The problem of predicting the toxicity of chemicals is, however, quite different from that of modeling medicinal properties. In contrast to lead optimization studies, toxicity testing is motivated by social concerns for human health and safety. Therefore, the available data pool for modeling chemical toxicity tends to be both structurally and mechanistically diverse [1]. The primary goal of chemical toxicity prediction is to distinguish between toxicologically active and inactive compounds. Typically, multiple mechanisms can lead to the same toxicological endpoint and therefore predictive models need to be able to distinguish multiple regions of activity amidst a mass of inactive chemical structures.

Over recent years SAR and QSAR techniques have been applied to a wide variety of toxicological endpoints from the prediction of LD50 and maximum tolerated dose (MTD) values to Salmonella typhimurium (Ames) assay results, carcinogenic potential and developmental toxicity effects. Generally, however, toxicological endpoints such as carcinogenicity, reproductive effects and hepatotoxicity are mechanistically ill-defined, leading to added complexity when trying to predict these endpoints.

It is not surprising therefore that these QSAR methods have had limited success in predicting the toxicological properties of novel chemical structures that are not representative of the training set used in developing the model. This can be attributed in part to the quality and quantity of the available toxicology data, but also to the complexity of the toxicological endpoint and breadth of mechanisms that the QSAR methods are attempting to predict.

This complexity of issues relating to the prediction of toxicological properties has led to the development of two main types of commercial approach being applied to prediction: knowledge-based approaches and statistically based systems. Knowledge-based systems such as DEREK, OncoLogic and HazardExpert use rules about generalized relationships between structure and biological activity that are derived from human expert opinion and interpretation of toxicological data to predict the potential toxicity of novel chemicals. On the other hand, statistically based systems such as TOPKAT and MultiCASE use calculated parameters, structural connectivity and the application of various statistical methods to derive mathematical relationships for a training set of non-congeneric compounds in an unbiased manner.

The aforementioned programs have formed the basis for many review articles and publications. In what follows, we will attempt to provide an updated view of recent progress whilst comparing and contrasting the functionality and utility of each of the leading commercial systems for toxicity prediction.

2. Commercial toxicity prediction systems

2.1. DEREK

DEREK (Deductive Estimation of Risk from
Existing Knowledge) is a knowledge-based expert system, originally devised at Schering Agrochemicals and currently being marketed by LHASA Ltd. [4]. It uses known toxicological SAR relationships to make qualitative predictions about the activity of a query compound [5–7].

The development of the rules in DEREK is overseen by a collaborative group which consists of representatives from commercial, educational and non-profit organizations. The rules within the commercially available system cover a broad range of toxicological endpoints but its main strengths include the prediction of mutagenicity, carcinogenicity and skin sensitization.

The new Microsoft Windows compatible version of DEREK (Fig. 1) uses the MDL ISIS/Draw package as its molecular editor, although MDL standard MOL files can be imported should the user not have access to ISIS/Draw. This version of the program also allows for batch processing of structures using MDL standard SD files as input, and output as either tab-delimited text, RTF file format or modified SD files.

Fig. 1. The DEREK for Windows interface.

Other features of the latest release (version 4.01) include the incorporation of an estimate of skin permeability using an internal log P calculation to moderate the predictions for skin sensitization. This feature uses a proprietary reasoning engine capable of combining both numerical and non-numerical statements to reach a conclusion about a given event. This reasoning model is based upon the mathematical framework of the Logic of Argumentation and has been better described elsewhere [8]. The reasoning engine can be adapted to any toxicological endpoint within the DEREK system.

Once a prediction has been made by the system, the user is able to explore the alert description for the rules fired. This provides a brief outline of the basis
for the rule and describes the structural requirements to fire the alert (Fig. 2). In some cases, example compounds with known toxicological assay results have been provided to give further credence to the existence of the rule. Literature references are also provided so that the user can more easily verify the applicability of the structural alert to the query compound.

Fig. 2. An example of an Alert Description from DEREK.

Users are able to add their own rules to the system using the graphical editor provided with the complete version of the program, although access to the editor can be restricted within an organization by the use of license keys. However, the default rule set provided by LHASA cannot be altered by the user.

Whilst DEREK does not explicitly identify metabolites of the query, the rules do take some account of the metabolic pathways that chemicals can undergo either in vitro or in vivo. This is limited to those chemical classes where mechanisms of interaction with biological systems are adequately understood or the key metabolite is easily identified [5]. However, recent developments have seen a
formal link made to a sister product, called METEOR, which generates metabolites of query compounds.

2.1.1. Strengths of DEREK

(1) Development of the rules in DEREK is peer-reviewed by the users.
(2) DEREK provides an easy to use interface which provides on-screen justifications for its predictions including highlighting of the toxicophore.
(3) Adding rules to the system is very easy with the new graphical rule editor although sufficient safeguards are in place to avoid illicit modification of the system.
(4) A batch processing feature makes large-scale throughput possible for both testing and validation.
(5) Rules are based on scientific knowledge of structure–toxicity relationships and mechanisms.

2.1.2. Limitations of DEREK

(1) The use of physicochemical parameters should be extended to include other 2D and 3D parameters where appropriate. Their use in predicting other toxicological endpoints also needs to be explored further.
(2) The activating and detoxification effects of metabolism need to be explored in more detail.
(3) Links to public or commercial toxicity databases would allow for structure searching and lead to greater confidence in the predictions made by the program.

2.2. OncoLogic

OncoLogic is a knowledge-based expert system developed and marketed by LogiChem Inc. It uses a series of rules to describe and predict the carcinogenic potential of chemical structures. These rules have been developed in collaboration with the Structure–Activity Team at the US EPA’s Office of Pollution Prevention and Toxics.

In its current form, OncoLogic is a PC-based system in which the rules are structured in a hierarchical decision tree structure, consisting of ‘if-then-else’ logic statements. Users start by selecting the subsystem according to the type of substance they are interested in from four main categories: fibers, metals or metal-containing compounds, polymers or organics. The organics subsystem is by far the most extensive and best developed of the four and contains over 40 000 rules based on knowledge and generalizations derived from the examination of more than 10 000 organic chemicals which fall into approximately 50 chemical classifications within the program [9]. It provides an overall assessment of the carcinogenic potential based on the answers to the questions provided by the user with a level of concern for the toxicity. The user is also provided with a detailed justification report which conveys the expert opinion for the mechanistic basis of toxicity. As with the DEREK system, metabolites are not explicitly generated but some metabolism is inherent within the knowledge base of the system. However, in contrast to DEREK, batch processing is not available and rules can neither be added nor modified by the user.

2.2.1. Strengths of OncoLogic

(1) The wealth of expertise in the evaluation of carcinogenicity that underpins the rules in OncoLogic makes this system a rich source of knowledge.
(2) The reports generated contain mechanism-based justifications for the evaluation that accurately reflect the rationale used by the US EPA’s Structure–Activity Team.
(3) The system restricts its evaluations to those chemical classes for which adequate knowledge is available for a prediction to be made.

2.2.2. Limitations of OncoLogic

(1) The system is not able to recognize or classify compounds according to chemical structure automatically and it is up to the user to determine all of the classes to which their chemical structure belongs.
(2) OncoLogic is unable to calculate or use
physicochemical parameters as part of its evaluation.
(3) If the system is unable to make a prediction due to a lack of coverage by the rules, it is difficult to determine whether this is because of a lack of knowledge by the experts or because the system has not yet been developed sufficiently.
(4) The inability to batch process structures makes testing and validation laborious.
(5) Rules cannot be added or edited by the user.

2.3. HazardExpert

HazardExpert, produced by CompuDrug [10], is another rule-based approach to the problem of toxicity prediction [11]. Chemical structures can either be selected from a database or, if it is a new chemical, the user has to enter the structure into the database before a prediction can be made. The user also has to define the species, dose level, route and duration of exposure.

The program works by searching the query structure for known toxicophores that are derived from literature in the field of QSAR or from the US EPA and Interagency Testing Committee (ITC) monographs. These fragments are stored in the ‘Toxic Fragments Knowledge Base’ and include substructures that exert both positive and negative modulator effects. Once a toxicophore has been identified, this triggers estimates for a number of toxicity endpoints based on rules in the knowledge base. These endpoints include mutagenicity, carcinogenicity, teratogenicity, irritation, sensitization, immunotoxicity and neurotoxicity.

Calculations of physicochemical properties such as molecular weight, log P, and pKa are used in QSAR equations to modulate the predictions by simulating the effects of bioavailability and bioaccumulation. Fuzzy logic is also used to give models for different exposure conditions.

The resulting predictions are given four distinct classifications based upon the outcome of the assessment, and the system provides an International Agency for Research on Cancer (IARC) human carcinogenic risk classification. The user can also search abstracts published in the journal Quantitative Structure–Activity Relationships using keywords to find literature support for the prediction. Chemical structures can be added to the database and new toxicophore fragments can be added to the knowledge base using the ‘Knowledge Maintenance module’ [9]. It is also linked to MetabolExpert, a system to take into account the effect of metabolism on the query compound.

2.3.1. Strengths of HazardExpert

(1) HazardExpert incorporates some reasonable estimates of physicochemical properties in its predictions.
(2) It can provide estimates for bioavailability and bioaccumulation.
(3) It provides semi-quantitative estimates for toxicity.
(4) The knowledge base can be inspected and edited by the end users.

2.3.2. Limitations of HazardExpert

(1) HazardExpert gives no indication of the relative probabilities for the formation of metabolites.
(2) Further ‘validation’ studies of the system against novel data sets may lead to greater confidence in the predictions.
(3) The ability to add different physicochemical parameter calculations and QSAR models may lead to improved performance.

2.4. COMPACT

COMPACT (Computer-Optimised Molecular Parametric Analysis of Chemical Toxicity) is a methodology developed by Lewis et al. at the University of Surrey in the UK. The system works by predicting the potential for a chemical to act as a substrate for one or more of the P450 cytochromes, P450 I, P450 IIB, P450 IIE, and P450 IV [12,13]. It can also be used to identify chemicals that have the potential to bind to receptors involved in either the induction of cytochromes P450 or peroxisome proliferation.

The COMPACT methodology assumes that the structural characteristics of a molecule will determine its ability to fit into the appropriate binding site of an enzyme, in this case cytochrome P450, and its electronic characteristics will determine the ability of
the enzyme–substrate complex to undergo oxidative metabolism. From molecular modeling studies of the cytochrome P450, the structural parameter ‘molecular planarity’, defined as the molecular cross-sectional area divided by the molecular depth squared (a/d² in Eq. (1)), is considered to be an important factor in determining the binding affinity to the active site of the cytochromes [14].

The electronic parameter used in COMPACT is the ‘electronic activation energy’ (ΔE) defined as the difference in energy between the highest occupied molecular orbital (HOMO) and the lowest unoccupied molecular orbital (LUMO). This ΔE value of a molecule can be used as a measure of its disposition for metabolic activation: the smaller the value the more susceptible the molecule is to activation.

By combining these two parameters according to Eq. (1) it is possible to distinguish P450 I substrates from those for the other P450 cytochromes. Hence, assuming that P450 I specificity is indicative of carcinogenic potential, it is possible to differentiate between carcinogens and non-carcinogens. A COMPACT radius of less than 15.5 is indicative of P450 I specificity.

\[ \text{COMPACT radius} = \sqrt{(\Delta E - 9.5)^2 + \left(\frac{a}{d^2} - 7.8\right)^2} \qquad (1) \]

2.4.1. Strengths of COMPACT

(1) The methodology for COMPACT prediction is based on a mechanistic background supported by molecular modeling studies.
(2) The COMPACT radius is easy to calculate and simple to apply to new molecules.
(3) A whole molecule approach is taken rather than a fragment-based approach and is capable of dealing with molecules of up to 150 heavy (non-hydrogen) atoms.

2.4.2. Limitations of COMPACT

(1) COMPACT is not a stand-alone toxicity prediction system and is specific to P450 mediated pathways. These need to be supplemented with additional information and the significance of the oxidative metabolism product should be considered.
(2) Some direct-acting carcinogens can be identified by this system but not all.

2.5. CASE/Multi-CASE/MCASE-ES

The CASE (Computer Automated Structure Evaluation) programs are primarily VAX-based systems developed by MultiCASE [15] and marketed by Charles River. The unique capability of the CASE system is to be able to automatically generate predictive models from a training set of non-congeneric compounds with associated biological or toxicological data. The activity of each chemical in the training set is classified according to a linear scale of CASE units in which, typically, chemicals with an assigned value of 10–19 are inactive, 20–29 have marginal activity and 30–99 are active. The program then takes the chemical structures of the training set and generates all possible fragments of two to 10 heavy (non-hydrogen) atoms in length. Chemical structures can be entered using either of the KLN or SMILES line notations, or from three types of structure files: Clark Still, MDL MOL and SYBYL MOL files.

Statistical methods are then used to classify fragments as either ‘biophores’, i.e., those fragments statistically associated with activity, or ‘biophobes’, i.e., those fragments statistically associated with inactivity. Where relative potency information is available, CASE is capable of applying linear regression techniques in an attempt to quantify the predictions made by the system. The Multi-CASE system is also capable of identifying fragments that act as modifiers to the activity of each biophore class [16–20]. Multi-CASE can also calculate various physicochemical and 2D descriptors for use within the QSAR development process.

It is possible to purchase separately modules for the prediction of a variety of endpoints, including Ames mutagenicity, rodent carcinogenicity, irritation, teratogenicity and biodegradation. More recent developments have included modules developed by the FDA Center for Drug Evaluation and Research for rodent carcinogenicity based upon proprietary data submitted to the FDA [21]. It is also possible for users to generate their own prediction models from their own data. However, as with all statistical techniques, care must be taken to ensure that there is
sufficient coverage of chemical space and that the biological data have been consistently classified. Automated prediction systems such as CASE can be easily skewed by inadequate coverage or misclassification of activity.

The output from Multi-CASE (Fig. 3) takes the form of textual information describing the physicochemical properties calculated for the query and the presence of active or inactive fragments in the query compound. The contributions of the various parameters to the QSAR calculation are outlined for the user and a final activity in CASE units is determined. A probability for activity and a confidence level in the biophore are presented to the user. The system also alerts the user to the presence of fragments in the query molecule that were not present in the training set and hence may be outside of the chemical space coverage.

Fig. 3. Sample output from the Multi-CASE program.
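
The CASE-unit scale and the warning about fragments absent from the training set can be illustrated with a short sketch. This is not MultiCASE code: the function names, the string representation of fragments and the treatment of values outside the 10–99 scale are assumptions made purely for illustration, and fragments are enumerated along a simple unbranched chain of heavy atoms rather than over real molecular connectivity. Only the three activity bands and the fragment length of two to 10 heavy atoms are taken from the description of CASE given earlier.

```python
def case_category(case_units):
    """Map an activity in CASE units to the typical bands quoted in the text."""
    if 10 <= case_units < 20:
        return "inactive"
    if 20 <= case_units < 30:
        return "marginal"
    if 30 <= case_units <= 99:
        return "active"
    # Assumption: values outside the quoted 10-99 scale are left unclassified.
    return "undefined"


def linear_fragments(atoms, min_len=2, max_len=10):
    """Enumerate contiguous fragments of min_len..max_len heavy atoms.

    Simplification (hypothetical): the molecule is an unbranched chain given
    as a list of element symbols, and a fragment is the hyphen-joined run of
    symbols read in one direction only.
    """
    fragments = set()
    for size in range(min_len, min(max_len, len(atoms)) + 1):
        for start in range(len(atoms) - size + 1):
            fragments.add("-".join(atoms[start:start + size]))
    return fragments


def unseen_fragments(query_atoms, training_fragments):
    """Return query fragments that never occurred in the training set."""
    return linear_fragments(query_atoms) - training_fragments


# Toy training set of two chains and one query chain (made-up examples).
training = linear_fragments(["C", "C", "N", "O"]) | linear_fragments(["C", "C", "O"])
print(case_category(35))                                       # -> active
print(sorted(unseen_fragments(["C", "N", "O", "Cl"], training)))
# -> ['C-N-O-Cl', 'N-O-Cl', 'O-Cl']
```

In this toy case every fragment containing chlorine is flagged, which mirrors the way a query with unfamiliar fragments may lie outside the chemical space covered by the model.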

Page 9: Computer systems for the prediction of toxicity: an update

N. Greene / Advanced Drug Delivery Reviews 54 (2002) 417 –431 425

However, the output from Multi-CASE can often be confusing and leave the user uncertain as to the final outcome of the results. Statements pertaining to the water solubility can often contradict the conclusions drawn from the presence of a biophore and, moreover, the biophore is often considered to be questionable without an adequate explanation. This confusion can lead to a great deal of flexibility in the interpretation of the results from Multi-CASE and hence the system needs to be used with care.

Other features of Multi-CASE include the ability to batch process structures, making the validation and testing process easier and quicker. It also has links to the META program for metabolism prediction [22].

2.5.1. Strengths of CASE/Multi-CASE

(1) Multi-CASE is capable of generating predictive models that are not reliant on foreknowledge of mechanisms of action.

(2) The program utilizes a number of physicochemical properties to modulate its predictions.

(3) The input from the FDA has substantially increased the applicability of Multi-CASE to the prediction of rodent carcinogenicity for pharmaceutical type compounds.

(4) Batch processing of molecules is very fast once the input files have been generated.

2.5.2. Limitations of CASE/Multi-CASE

(1) The quality of the predictions made by the system is closely linked to the quality of the data used in the training set. The phrase 'garbage in, garbage out' applies in this case.

(2) Output from the program is often ambiguous and can lead to misinterpretation of the predictions.

(3) The system currently relies on VAX-based technology although a PC-based system is promised later in 2001.

(4) The system often fails to distinguish molecules containing several small chains within one complex fragment from other molecules containing the same fragments distributed separately.

2.6. TOPKAT

TOPKAT (TOxicity Prediction by Komputer Assisted Technology), originally developed by Health Designs and now developed and marketed by Accelrys [23], is a PC-based system for the prediction of a range of acute and chronic toxicity endpoints. These include rodent carcinogenicity, Ames mutagenicity, skin sensitization and rat oral LD50 values amongst the many endpoints available, with each endpoint being distributed as an individual module.

Structures are entered into the program using SMILES code representations of chemical structures and the user then selects the appropriate module or endpoint (Fig. 4). The query compound is then analyzed to ensure that it is 'covered by' or lies within the optimum prediction space (OPS) for the module selected.

TOPKAT predictions are derived by using the concept of linear free energy relationships in a statistical regression analysis structure [24]. The models use topology-based descriptors that have been shown by the developers of TOPKAT to be comparable to the molecular orbital methods used in more conventional QSAR studies [24]. These 'electro-topological' descriptors are continuous quantitative measures that combine a graph theoretical index with an approximate measure of the valence electronic state of the atom [25]. These descriptors, whilst less intuitive, provide a more generalized, numerical representation of an atom within a molecule that is more amenable to statistical manipulation.

For continuous measures such as LD50 and LOAEL, TOPKAT uses multiple linear regression equations to generate predictions reported as weight/weight or weight/volume values. In cases where the outcome of a bioassay is either positive or negative, for example in Ames mutagenicity assays, TOPKAT uses two-group linear discriminant regression functions to generate endpoint predictions. In these cases, predictions are given a probability of a positive result between 0 and 1 [3,9].

Similarity searches through the database used to generate the model can be performed to give the user some level of confidence in the prediction. However, it is worth noting that 'similarity' is defined using the


electro-topological descriptors as opposed to the more intuitive chemical structure-based similarity. It is therefore possible that quite different chemical structures are returned that have the same electro-topological characteristics.

Fig. 4. TOPKAT interface.

In its current form TOPKAT does not allow users to implement their own models within the system, nor does it allow for batch processing of compounds, thereby making high throughput tasks extremely time consuming. However, version 6.1 of TOPKAT, available in December 2001, will have the ability to batch process structures via an SD file input.

2.6.1. Strengths of TOPKAT

(1) Once the structure has been entered it takes very little time to obtain results for a variety of endpoints.

(2) The program automatically checks to ensure that the query compound is within its optimum prediction space and alerts the user to those predictions thought to be unreliable.

(3) 'Similarity searches' enable the user to put the prediction into perspective with known active molecules.

2.6.2. Limitations of TOPKAT

(1) The quantity and, more importantly, quality of data that can be used for QSAR development has resulted in poor coverage within some models.

(2) TOPKAT assumes that substructural features


contribute independently to biological activity, which is not always the case.

(3) The use of probabilities for positive versus negative biological activity often leads to misinterpretations. These probabilities for activity are sometimes mistakenly used as measures of potency when ranking chemicals.

(4) The lack of batch processing capability makes testing and validation exercises difficult and time consuming.

2.7. Other software applications

More recent software applications such as QSARIS and TOXSYS, developed and marketed by Scivision [26], whilst not truly predictive systems, have provided low-cost, easy to use tools for the toxicologist to generate and test predictive models for toxicological endpoints. The next release of TOXSYS will include a mutagenicity prediction module based upon data available in the public domain or supplied by the FDA.

QSARIS is designed to allow toxicologists (and chemists) to build QSAR models based on their own data sets. It provides a simple to use interface that allows the user to calculate some of the common electro-topological, 2D and 3D descriptors used in commercial toxicology packages. On the other hand, TOXSYS is designed more for data storage and retrieval, allowing the user to search and filter toxicological data by either chemical structure or property. The current version of TOXSYS contains only RTECS data but non-proprietary carcinogenicity data from the FDA are promised in future releases. It is possible for users to import and search their own in-house data in either package. Models generated in QSARIS can then be used to predict the activity of all chemicals stored in a TOXSYS database.

Whilst software that allows the application of statistical modeling to novel data sets will help to drive the understanding and prediction of toxicological endpoints, these programs in the hands of inexperienced or unwary users can lead to false assumptions and ultimately to over-interpretation of the predictions. As with all modeling applications, the user must be aware of the limitations of the model and the quality of the data used as its basis.

3. Genomic and proteomic applications

Very recently, scientists working in the proteomics and genomics arena have started to look at the prediction of toxicological endpoints from gene expression data. Unraveling the mechanisms and pathways that result in gene expression profiles is a complex and difficult problem and will require extensive experimentation and computational power to resolve. However, this research may well reveal vital mechanistic information that will help to unlock some of the mysteries of toxicity expression. The genomics research area is still in the very early stages of analyzing the information presented by the gene expression arrays and it may be some time before any significant impact is made on the area of chemical toxicity prediction.

4. Performance measures of commercial systems for toxicity prediction

Over recent years many exercises have been conducted and published concerning the performance of the various toxicity prediction systems [27–31]. As stated by Richard and Benigni [30], there are two common approaches to assessing the predictive performance of these programs: beta-testing and prospective prediction. In beta-testing, the available data set is divided into a training and test set. The model is built upon the training set and the performance is then measured for the test set. Typically the test set is a representative subset of the training set of chemicals. It is therefore common to see a good correlation between the predicted and experimental toxicology results in these cases.

In prospective prediction, however, the model is applied to a novel group of chemical structures where there is often little consistency with the training set used to generate the model. Often the toxicity of these chemicals has yet to be determined or has been previously hidden from the developers. The best documented of these exercises were conducted by the National Toxicology Program (NTP), in which system developers were invited to predict in advance the carcinogenic potential for rodents of chemicals that were about to be tested by the NTP.

When comparing the predictive performance of a


model against the results of a particular toxicological assay, the reliability of the assay itself is often overlooked. An analysis of the inter- and intra-laboratory reproducibility of Salmonella test results yielded a strict positive versus negative concordance of 85% [27]. It would therefore be unreasonable to expect predictive models to fare better than the reproducibility of the assay that they are attempting to predict.

4.1. NTP carcinogenicity exercises

The first exercise conducted by the NTP involved 44 chemicals. The predictions made by Multi-CASE [32], TOPKAT [33], DEREK [7], and COMPACT [34] were published in advance of the bioassays being performed. A workshop was held [35] when it was considered that enough chemicals (40 out of 44) had completed testing in the four experimental groups: male and female rat, male and female mouse. The predictions were then compared to the results from the bioassays.

This has been a topic for much discussion [28,30]. The overall results for the commercial prediction systems as published by Benigni [28] are shown in Table 1. By way of comparison, human experts (Ashby and Tennant) managed to correctly predict 75% of the compounds, significantly out-performing the computer systems. It later transpired that Ashby and Tennant had access to some additional biological results which helped to guide them in their predictions.

Table 1. Results from the NTP 44 chemicals

| System | Overall accuracy (%) |
|---|---|
| DEREK | 59 |
| TOPKAT | 57 |
| COMPACT | 54 |
| Multi-CASE | 49 |

A subsequent, similar exercise was proposed involving a further 30 chemicals selected for testing by the NTP [36]. Whilst testing has not been completed on all chemicals, this exercise has already been the topic of some debate [30,37]. Predictions from Multi-CASE [38], DEREK [39], COMPACT/HazardExpert [40] and OncoLogic [41] were published in advance of the bioassays being performed. To date 26 chemicals have completed testing and the results have been peer reviewed, two chemicals have completed testing but their results are yet to be peer reviewed and two chemicals have not completed the testing yet. The results from the bioassays are summarized in Table 2. The preliminary results of the performance of the commercial systems based on 26 of the 30 chemicals have been published by Benigni [31]. The overall concordance figures (Table 3), i.e., the percentage of correctly identified carcinogens and non-carcinogens, show the OncoLogic system to be leading the field of commercial systems.

4.2. Comparisons for Salmonella typhimurium (Ames) test results

The primary limitation of prospective evaluation exercises has been the availability of sources of reliable toxicological data that have not been previously used in the development of the predictive models. One such source is the historical archives of the pharmaceutical companies. Despite the efforts of the consortium approach adopted by LHASA for the development of DEREK and the efforts of the FDA in developing the Multi-CASE program, these archives still hold a wealth of data that have remained unpublished, and therefore unavailable to the developers of the commercial systems, due to the obvious concerns over competitive advantage.

Many pharmaceutical companies are now beginning to use these previously unpublished data to compare the relative performances of the commercial programs. In one such exercise, 974 proprietary structures with associated Ames test data from the Pfizer genetic toxicology historical database were processed through both DEREK (version 4.01) and the Multi-CASE (version 3.45) module for Ames mutagenicity. The results for DEREK and Multi-CASE are shown in Tables 4 and 5, respectively. However, when comparing the performance of the two systems it should be noted that mutagenicity rules in DEREK are not specific to the Ames assay and therefore there may be a tendency for a higher false positive rate than with those systems, such as


Table 2. Results to date from the NTP 30 chemicals exercise

| Name | CAS | MR | FR | MM | FM |
|---|---|---|---|---|---|
| *Anthraquinone | 84-65-1 | (1) | 1 | 1 | 1 |
| Chloroprene | 126-99-8 | 1 | 1 | 1 | 1 |
| 1-Chloro-2-propanol | 127-00-4 | 2 | 2 | 2 | 2 |
| Cinnamaldehyde | 104-55-2 | NA | NA | NA | NA |
| Citral | 5392-40-5 | 2 | 2 | 2 | (1) |
| Cobalt sulfate heptahydrate | 10026-24-1 | (1) | 1 | 1 | 1 |
| Codeine | 76-57-3 | 2 | 2 | 2 | 2 |
| D&C Yellow No.11 | 8003-22-3 | (1) | (1) | ND | ND |
| Diethanolamine | 111-42-2 | 2 | 2 | 1 | 1 |
| 1,2-Dihydro-2,2,4-trimethylquinoline | 147-47-7 | (1) | 2 | 2 | 2 |
| *Emodin | 518-82-1 | 2 | Equivocal | Equivocal | 2 |
| Ethylbenzene | 100-41-4 | (1) | 1 | 1 | 1 |
| Ethylene glycol monobutyl ether | 111-76-2 | 2 | Equivocal | (1) | (1) |
| Furfuryl alcohol | 98-00-0 | (1) | Equivocal | (1) | 2 |
| Gallium arsenide | 1303-00-0 | 2 | 1 | 2 | 2 |
| Isobutene | 115-11-7 | (1) | 2 | 2 | 2 |
| Isobutyraldehyde | 78-84-2 | 2 | 2 | 2 | 2 |
| Methyleugenol | 93-15-2 | 1 | 1 | 1 | 1 |
| Molybdenum trioxide | 1313-27-5 | Equivocal | 2 | (1) | (1) |
| Nitromethane | 75-52-5 | 2 | 1 | 1 | 1 |
| Oxymetholone | 434-07-1 | Equivocal | 1 | ND | ND |
| Phenolphthalein | 77-09-8 | 1 | (1) | 1 | 1 |
| Primaclone | 125-33-7 | Equivocal | 2 | 1 | 1 |
| Pyridine | 110-86-1 | (1) | Equivocal | Equivocal | 1 |
| Scopolamine | 6533-68-2 | 2 | 2 | 2 | 2 |
| Sodium Nitrite | 7632-00-0 | 2 | 2 | 2 | Equivocal |
| Sodium xylenesulfonate | 1300-72-7 | 2 | 2 | 2 | 2 |
| t-Butylhydroquinone | 1948-33-0 | 2 | 2 | 2 | 2 |
| Tetrahydrofuran | 109-99-9 | (1) | 2 | 2 | 1 |
| Vanadium pentoxide | 1314-62-1 | NA | NA | NA | NA |

MR, male rat; FR, female rat; MM, male mouse; FM, female mouse. *Awaiting peer review; (1), some evidence of carcinogenicity; 1, clear evidence of carcinogenicity; 2, no evidence of carcinogenicity; NA, not complete; ND, not tested.

Table 3. Preliminary results of commercial systems for the NTP 30 chemicals exercise

| System | Overall accuracy (%) |
|---|---|
| OncoLogic | 67 |
| COMPACT | 44 |
| DEREK | 38 |
| Multi-CASE | 18 |

Table 4. Results for 972 proprietary Pfizer structures for the DEREK program (version 4.01)

| Ames result | DEREK 4.01: mutagenicity alert | DEREK 4.01: no alert | Total |
|---|---|---|---|
| Positive | 41 | 49 | 90 |
| Negative | 335 | 547 | 882 |
| Total | 376 | 596 | 972 |
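As a worked check on the DEREK counts in Table 4, the sketch below computes three common summary measures from a 2 x 2 table: sensitivity, specificity and overall concordance. The class name `Confusion` and its field names are illustrative assumptions and are not part of any of the reviewed programs. Treating a mutagenicity alert as a positive prediction is likewise an assumption of this sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Confusion:
    """Counts from a 2 x 2 table of assay result versus prediction."""
    tp: int  # assay positive, predicted positive
    fn: int  # assay positive, predicted negative
    fp: int  # assay negative, predicted positive
    tn: int  # assay negative, predicted negative

    def sensitivity(self) -> float:
        """Percentage of assay positives that were predicted positive."""
        return 100.0 * self.tp / (self.tp + self.fn)

    def specificity(self) -> float:
        """Percentage of assay negatives that were predicted negative."""
        return 100.0 * self.tn / (self.tn + self.fp)

    def concordance(self) -> float:
        """Percentage of all compounds that were predicted correctly."""
        total = self.tp + self.fn + self.fp + self.tn
        return 100.0 * (self.tp + self.tn) / total


# Counts taken from Table 4 (alert = predicted positive, an assumption here).
derek = Confusion(tp=41, fn=49, fp=335, tn=547)

# Hypothetical model that calls every one of the 972 compounds negative.
all_negative = Confusion(tp=0, fn=90, fp=0, tn=882)

for name, table in [("DEREK 4.01", derek), ("all-negative", all_negative)]:
    print(
        f"{name}: sensitivity {table.sensitivity():.1f}%, "
        f"specificity {table.specificity():.1f}%, "
        f"concordance {table.concordance():.1f}%"
    )
```

With the Table 4 counts this gives approximately 45.6% sensitivity, 62.0% specificity and 60.5% concordance. The hypothetical all-negative caller reaches about 90.7% concordance with zero sensitivity, which illustrates why concordance alone can flatter conservative models on a test set containing few positives.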

Multi-CASE, that are designed to predict Ames results only.

DEREK was unable to process two out of the 974 structures in the test set (99.8%), whereas Multi-CASE identified 251 chemicals that contained structural fragments that were not covered in the training set. Therefore, for the purpose of this evaluation Multi-CASE predictions were limited to those 723 structures (74.2%) that did not contain unknown fragments (Table 6).

These results are in agreement with those found by


other pharmaceutical companies [42,43] when looking at their own proprietary structures. Combining these two systems to generate a single prediction, whilst resulting in a higher overall concordance, significantly reduces the coverage of the model.

Table 5. Results for 723 proprietary Pfizer structures for the Multi-CASE program (version 3.45)

| Ames result | Multi-CASE 3.45: active | Multi-CASE 3.45: inactive | Total |
|---|---|---|---|
| Positive | 19 | 44 | 63 |
| Negative | 105 | 555 | 660 |
| Total | 124 | 599 | 723 |

Table 6. Comparison of performance for DEREK and Multi-CASE

| Measure | DEREK 4.01 (%) | Multi-CASE 3.45 (%) |
|---|---|---|
| Sensitivity | 45 | 30 |
| Specificity | 62 | 84 |
| Concordance | 60 | 79 |

Where sensitivity is percentage of correctly predicted positives; specificity is percentage of correctly predicted negatives; and concordance is percentage of correctly predicted compounds.

However, it is worth noting that in most cases the test sets are heavily biased towards Ames negative compounds; in the above example only 9% of compounds are positive in Salmonella-based assays. Therefore models that are more conservative in their approach and which tend to predict compounds to be negative will naturally have higher concordance values than those models that attempt to identify positives.

5. Conclusions and future perspectives

Whilst the use of in silico approaches has become very popular over recent years, the current systems cannot be used without adequate care and the input of human experts in the relevant field of toxicology. One of the biggest limitations in the development of predictive systems is the lack of reliable and consistent data available to the developers. However, as knowledge about the mechanisms of action involved in toxicology endpoints expands, our ability to predict the toxicity of novel structures will improve.

Statistically based methods for predicting biological activity work best when applied to series of chemicals that interact with a biological system via a single, common mechanism. However, these should only be used to interpolate within the boundaries of the model and care should be taken to ensure that any prediction made is done from within these limits. Extrapolation of these models beyond the limits of their prediction space will only lead to unreliable results.

True predictivity will come only with the understanding of the mechanisms and factors that influence the expression of chemical toxicity. Models will be required to identify and classify novel structures by their modes of action and apply specific, mechanistically based QSAR to predict their toxicological properties. This will require the calculation and inclusion of a wide variety of descriptors to encompass the complexity of biological systems.

References

[1] R. Benigni, A. Richard, Quantitative structure-based modeling applied to characterization and prediction of chemical toxicity; Methods: A Companion to Methods in Enzymology, Methods Enzymol. 14 (1998) 264–276.

[2] T. Fujita, in: C. Hansch (Ed.), Comprehensive Medicinal Chemistry, Pergamon, Oxford, 1990, pp. 497–560.

[3] R. Franke, in: Theoretical Drug Design Methods, Elsevier, Amsterdam, 1984.

[4] http://www.chem.leeds.ac.uk/luk/

[5] N. Greene, P.N. Judson, J.J. Langowski, C.A. Marchant, Knowledge-based expert systems for toxicity prediction: DEREK, StAR and METEOR, SAR QSAR Environ. Res. 10 (1998) 299–314.

[6] J.E. Ridings, M.D. Barratt, R. Cary, C.G. Earnshaw, C.E. Eggington, M.K. Ellis, P.N. Judson, J.J. Langowski, C.A. Marchant, M.P. Payne, W.P. Watson, T.D. Yih, Computer prediction of possible toxic action from chemical structure: an update on the DEREK system, Toxicology 106 (1996) 267–279.

[7] D.M. Sanderson, C.G. Earnshaw, Computer prediction of possible toxic action from chemical structure: The DEREK system, Hum. Exp. Toxicol. 10 (1991) 261–273.

[8] P.J. Krause, S.J. Ambler, M. Elvang-Gøransson, J. Fox, A logic of argumentation for reasoning under uncertainty, Comput. Int. 11 (1995) 113–131.

[9] J.C. Dearden, M.D. Barratt, R. Benigni, D.W. Bristol, R.D. Combes, M.T.D. Cronin, P.N. Judson, M.P. Payne, A.M. Richard, M. Tichy, A.P. Worth, J.J. Yourick, The development and validation of expert systems for predicting toxicity, ATLA 25 (1997) 223–252.

[10] http://www.compudrug.com/

[11] M.P. Smithing, F. Darvas, HazardExpert: an expert system for predicting chemical toxicity, in: J.W. Finlay, S.F. Robinson, D.J. Armstrong (Eds.), Food Safety Assessment, American Chemical Society, Washington, DC, 1992, pp. 191–200.

[12] D.V. Parke, C. Ioannides, D.F.V. Lewis, The safety evaluation of drugs and chemicals by the use of computer optimized molecular parametric analysis of chemical toxicity (COMPACT), ATLA 18 (1990) 91–102.

[13] D.F.V. Lewis, in: The Cytochromes P450: Structure, Function and Mechanism, Taylor & Francis, London, 1996, p. 348.

[14] D.F.V. Lewis, C. Ioannides, D.V. Parke, Molecular modeling of cytochrome CYP1A1: a putative access channel explains the differences in induction potency between the isomers benzo(a)pyrene and benzo(e)pyrene, and 2- and 4-acetylaminofluorene, Toxicol. Lett. 71 (1994) 235–243.

[15] http://www.multicase.com/

[16] G. Klopman, Computer automated structure evaluation of organic molecules, J. Am. Chem. Soc. 106 (1984) 7315–7324.

[17] G. Klopman, MULTI-CASE: 1. A hierarchical computer automated structure evaluation program, Quant. Struct.-Act. Relatsh. 11 (1992) 176–184.

[18] H.S. Rosenkranz, G. Klopman, Structural relationships between mutagenicity, maximum tolerated dose and carcinogenicity in rodents, Environ. Mol. Mutagen. 21 (1993) 193–206.

[19] G. Klopman, H.S. Rosenkranz, Approaches to SAR in carcinogenesis and mutagenesis, prediction of carcinogenicity/mutagenicity using MULTI-CASE, Mutat. Res. 305 (1994) 33–46.

[20] M. Lui, N. Sussman, G. Klopman, H.S. Rosenkranz, Structure-activity and mechanistic relationships: the effect of chemical overlap on structural overlap in data bases of varying size and composition, Mutat. Res. 372 (1996) 79–85.

[21] E.J. Matthews, J.F. Contrera, A new highly specific method for predicting the carcinogenic potential of pharmaceuticals in rodents using enhanced MCASE QSAR-ES software, Reg. Toxicol. Pharm. 28 (1998) 242–264.

[22] G. Klopman, M. Tu, B.T. Fan, META 4. Prediction of the metabolism of polycyclic aromatic hydrocarbons, Theor. Chem. Acc. 102 (1999) 33–38.

[23] http://www.accelrys.com/

[24] K. Enslein, The future of toxicity prediction with QSAR, In Vitro Toxicol. 6 (1993) 163–169.

[25] L.H. Hall, B. Mohney, L.B. Kier, The electrotopological state: structure information at the atomic level for molecular graphs, J. Chem. Inf. Comput. Sci. 31 (1991) 76–81.

[26] http://www.scivision.com/

[27] E. Zeiger, J. Ashby, G. Bakale, K. Enslein, G. Klopman, H.S. Rosenkranz, Prediction of Salmonella mutagenicity, Mutagenesis 11 (1996) 471–484.

[28] R. Benigni, The first US National Toxicology Program exercise on the prediction of rodent carcinogenicity: definitive results, Mutat. Res. 387 (1997) 35–45.

[29] A.M. Richard, Structure-based methods for predicting mutagenicity and carcinogenicity: are we there yet?, Mutat. Res. 400 (1998) 493–507.

[30] A.M. Richard, R. Benigni, AI and SAR approaches for predicting chemical carcinogenicity: survey and status report, SAR QSAR Environ. Res. (2002) in press.

[31] R. Benigni, Structure-activity relationships in mutagenesis and carcinogenesis, in: M. Balls, A.M. van Zeller, M.E. Halder (Eds.), Progress in the Reduction, Refinement and Replacement of Animal Experimentation, Elsevier, Amsterdam, 2000, pp. 469–478.

[32] H.S. Rosenkranz, G. Klopman, Prediction of the carcinogenicity in rodents of chemicals currently being tested by the US National Toxicology Program: structure–activity correlations, Mutagenesis 5 (1990) 425–432.

[33] K. Enslein, B.W. Blake, H.H. Borgstedt, Prediction of probability of carcinogenicity for a set of ongoing NTP bioassays, Mutagenesis 5 (1990) 305–306.

[34] D.F.V. Lewis, C. Ioannides, D.V. Parke, A prospective toxicity evaluation (COMPACT) on 40 chemicals currently being tested by the National Toxicology Program, Mutagenesis 5 (1990) 433–435.

[35] Anonymous, Predicting chemical carcinogenesis in rodents. An international workshop, National Institute of Environmental Health Sciences, Research Triangle Park, NC, USA, 1993.

[36] D.W. Bristol, J.T. Wachsman, A. Greenwell, The NIEHS predictive-toxicology evaluation project: chemcarcinogenicity bioassay, Environ. Health Perspect. 104 (1996) 1001–1010.

[37] J. Ashby, Comparison of 17 methods of predicting the carcinogenicity of 30 chemicals, Environ. Health Perspect. 105 (5) (1997) 466.

[38] Y.P. Zhang, N. Sussman, O.T. Macina, H.S. Rosenkranz, G. Klopman, Prediction of the carcinogenicity of a second group of organic chemicals undergoing carcinogenicity testing, Environ. Health Perspect. 104 (1996) 1059–1060.

[39] C.A. Marchant, Prediction of rodent carcinogenicity using the DEREK system for 30 chemicals currently being tested by the National Toxicology Program, Environ. Health Perspect. 104 (1996) 1065–1074.

[40] D.F.V. Lewis, C. Ioannides, D.V. Parke, COMPACT and molecular structure in toxicity assessment: a prospective evaluation of 30 chemicals currently being tested for rodent carcinogenicity by the NCI/NTP, Environ. Health Perspect. 104 (1996) 1011–1016.

[41] Y.T. Woo, D.Y. Lai, J.C. Arcos, M.F. Argus, M.C. Cimino, S. DeVito, L. Keifer, Mechanism-based structure–activity relationship (SAR) analysis of carcinogenic potential of 30 NTP test chemicals, Environ. Carcinogen. Ecotoxicol. Rev. C15 (1997) 139–160.

[42] G.M. Pearl, Integration of multiple computational models as a sentinel filter for predicting drug safety liabilities, presented at IBC In silico Technologies in Drug Discovery, Bethesda, MA, 21–23 May 2001.

[43] R. Mueller, Application of computer-assisted (in silico) approaches for ADME and toxicity prediction, presented at IBC In silico Technologies in Drug Discovery, Bethesda, MA, 21–23 May 2001.