17
Molecular Docking to Ensembles of Protein Structures Ronald M. A. Knegtel, Irwin D. Kuntz and C. M. Oshiro* Department of Pharmaceutical Chemistry School of Pharmacy University of California, San Francisco, CA, 94143- 0446, USA Until recently, applications of molecular docking assumed that the macromolecular receptor exists in a single, rigid conformation. However, structural studies involving different ligands bound to the same target biomolecule frequently reveal modest but significant conformational changes in the target. In this paper, two related methods for molecular docking are described that utilize information on conformational variabil- ity from ensembles of experimental receptor structures. One method combines the information into an ‘‘energy-weighted average’’ of the inter- action energy between a ligand and each receptor structure. The other method performs the averaging on a structural level, producing a ‘‘geo- metry-weighted average’’ of the inter-molecular force field score used in DOCK 3.5. Both methods have been applied in docking small molecules to ensembles of crystal and solution structures, and we show that exper- imentally determined binding orientations and computed energies of known ligands can be reproduced accurately. The use of composite grids, when conformationally different protein structures are available, yields an improvement in computational speed for database searches in pro- portion to the number of structures. # 1997 Academic Press Limited Keywords: DOCK; receptor flexibility; ensemble; structure-based drug design *Corresponding author Introduction The discovery of new drugs has evolved from a random process of screening natural products to a suite of sophisticated procedures that include com- ponents from computational and structural chem- istry. The availability of high-resolution data on enzymes involved in critical metabolic pathways has triggered the development of techniques utiliz- ing such data in the quest for novel compounds of therapeutical relevance. Using this information, computer-based approaches help identify or design ligands that possess good steric and chemical com- plementarity to various sites in the macromolecular target. This process is often referred to as ‘‘struc- ture-based design’’ (Kuntz, 1992). The many different algorithms for structure-based design can be divided into roughly two classes: de novo design, which builds ligands tailored to fit the target, and docking, which searches for existing compounds with good complementarity to the tar- get. In both these paradigms, the enzyme or recep- tor has traditionally been treated as a rigid body and only one conformation of the enzyme is con- sidered. (Examples of de novo design include Lewis (Lewis, 1992) and Miranker (Miranker & Karplus, 1995) and the program LUDI (Bo ¨ hm, 1992a,b); examples of molecular docking include the works of Kuntz et al. (1982), Nussinov et al. (Lin, 1994; Norel et al., 1994), and Bacon & Moult (1992).) While increasing attention is being paid to explor- ing the conformational space of putative ligands in molecular docking (Leach & Kuntz, 1992; Mizutani et al., 1994; Clark & Ajay, 1995; Judson et al., 1995; Oshiro et al., 1995; Welch et al., 1996; Rarey et al., 1996), relatively little effort has been expended on the conformational state of the receptor (Leach, 1994; Jones et al., 1995). Obviously, using a single protein conformation ignores important dynamic aspects of protein –ligand binding. In particular, ‘‘induced fit’’ effects (Koshland, 1958; Jorgensen, 1991) are ignored. The general problem, however, of docking or designing fully flexible ligands to match fully flexible receptors remains daunting. Free energy calculations, using flexible enzyme and ligand structures and explicit solvent mol- ecules, can reproduce ligand binding affinities and structures (see, for example, Kollman, 1994). Ther- modynamic quantities must, however, be extracted from appropriately weighted ensembles of a sys- tem and adequate sampling of such configurations Abbreviations used: rmsd, root-mean-square deviation; cpu, central processing unit; PCB, polychlorobiphenyl; PDB, Protein Data Bank. J. Mol. Biol. (1997) 266, 424–440 0022–2836/97/070424–17 $25.00/0/mb960776 # 1997 Academic Press Limited

Molecular Docking to Ensembles of Protein Structures › ... › kuntz-dock-to-ensemble-JMB.pdf · 2001-04-19 · Molecular Docking to Ensembles of Protein Structures Ronald M. A

  • Upload
    others

  • View
    3

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Molecular Docking to Ensembles of Protein Structures › ... › kuntz-dock-to-ensemble-JMB.pdf · 2001-04-19 · Molecular Docking to Ensembles of Protein Structures Ronald M. A

Molecular Docking to Ensembles of Protein Structures

Ronald M. A. Knegtel, Irwin D. Kuntz and C. M. Oshiro*

Department of PharmaceuticalChemistry School of PharmacyUniversity of California, SanFrancisco, CA, 94143-0446, USA

Until recently, applications of molecular docking assumed that themacromolecular receptor exists in a single, rigid conformation. However,structural studies involving different ligands bound to the same targetbiomolecule frequently reveal modest but signi®cant conformationalchanges in the target. In this paper, two related methods for moleculardocking are described that utilize information on conformational variabil-ity from ensembles of experimental receptor structures. One methodcombines the information into an ``energy-weighted average'' of the inter-action energy between a ligand and each receptor structure. The othermethod performs the averaging on a structural level, producing a ``geo-metry-weighted average'' of the inter-molecular force ®eld score used inDOCK 3.5. Both methods have been applied in docking small moleculesto ensembles of crystal and solution structures, and we show that exper-imentally determined binding orientations and computed energies ofknown ligands can be reproduced accurately. The use of composite grids,when conformationally different protein structures are available, yieldsan improvement in computational speed for database searches in pro-portion to the number of structures.

# 1997 Academic Press Limited

Keywords: DOCK; receptor ¯exibility; ensemble; structure-based drugdesign*Corresponding author

Introduction

The discovery of new drugs has evolved from arandom process of screening natural products to asuite of sophisticated procedures that include com-ponents from computational and structural chem-istry. The availability of high-resolution data onenzymes involved in critical metabolic pathwayshas triggered the development of techniques utiliz-ing such data in the quest for novel compounds oftherapeutical relevance. Using this information,computer-based approaches help identify or designligands that possess good steric and chemical com-plementarity to various sites in the macromoleculartarget. This process is often referred to as ``struc-ture-based design'' (Kuntz, 1992).

The many different algorithms for structure-baseddesign can be divided into roughly two classes: denovo design, which builds ligands tailored to ®t thetarget, and docking, which searches for existingcompounds with good complementarity to the tar-get. In both these paradigms, the enzyme or recep-tor has traditionally been treated as a rigid bodyand only one conformation of the enzyme is con-

sidered. (Examples of de novo design include Lewis(Lewis, 1992) and Miranker (Miranker & Karplus,1995) and the program LUDI (BoÈhm, 1992a,b);examples of molecular docking include the worksof Kuntz et al. (1982), Nussinov et al. (Lin, 1994;Norel et al., 1994), and Bacon & Moult (1992).)

While increasing attention is being paid to explor-ing the conformational space of putative ligands inmolecular docking (Leach & Kuntz, 1992; Mizutaniet al., 1994; Clark & Ajay, 1995; Judson et al., 1995;Oshiro et al., 1995; Welch et al., 1996; Rarey et al.,1996), relatively little effort has been expended onthe conformational state of the receptor (Leach,1994; Jones et al., 1995). Obviously, using a singleprotein conformation ignores important dynamicaspects of protein±ligand binding. In particular,``induced ®t'' effects (Koshland, 1958; Jorgensen,1991) are ignored. The general problem, however,of docking or designing fully ¯exible ligands tomatch fully ¯exible receptors remains daunting.Free energy calculations, using ¯exible enzymeand ligand structures and explicit solvent mol-ecules, can reproduce ligand binding af®nities andstructures (see, for example, Kollman, 1994). Ther-modynamic quantities must, however, be extractedfrom appropriately weighted ensembles of a sys-tem and adequate sampling of such con®gurations

Abbreviations used: rmsd, root-mean-squaredeviation; cpu, central processing unit; PCB,polychlorobiphenyl; PDB, Protein Data Bank.

J. Mol. Biol. (1997) 266, 424±440

0022±2836/97/070424±17 $25.00/0/mb960776 # 1997 Academic Press Limited

JMB MS1697 [31/1/97]

Page 2: Molecular Docking to Ensembles of Protein Structures › ... › kuntz-dock-to-ensemble-JMB.pdf · 2001-04-19 · Molecular Docking to Ensembles of Protein Structures Ronald M. A

is computationally demanding (McCammon &Harvey, 1987; Kollman, 1994).

Rather than exploring the receptor conformationalspace by theoretical means, one could insteadmake use of the increasingly available experimen-tal data on protein structure and ¯exibility. Thiswould focus computational efforts on a more lim-ited, but still important, aspect of the problem,where partial information about the receptor con-formational status is available from either multiplecrystallographic or NMR solution structure deter-minations. Both types of experiments produce col-lections of structures, but with different physicalinterpretations.

A set of related crystal structures are ``snap shots''of a dominant conformation perturbed by differentligands, different crystallization conditions, pointmutations, etc. The structural changes observed insuch a collection of receptor structures include re-gions capable of accommodating differentlyshaped ligands as well as areas of an induced ®t ofthe ligand. Different crystal forms of the sameprotein±ligand complex provide conformationalchanges due to different crystal packing environ-ments.

On the other hand, the result of a structure deter-mination by means of high-resolution NMR spec-troscopy is usually not a single structure, butrather an ensemble of structures, all nominallyequally in agreement with nuclear Overhausereffect and J-coupling data (WuÈ thrich, 1986).Although it is possible to calculate an energy-mini-mized average structure, an ensemble or a subsetof conformers may provide a more accurate rep-resentation of the protein structure (Sutcliffe, 1993;Bonvin & BruÈ nger, 1995). Structural variability inensembles of solution structures can be due to trueconformational ¯exibility as re¯ected in decreasednuclear relaxation rates or a lack of suf®cient ex-perimental data. Truly ¯exible regions within abinding site may be relevant to the plasticity of the®t between receptor and ligand. In either case,however, parts of the receptor structure will be ill-de®ned and therefore cannot be represented accu-rately by a single structure.

With the availability of multiple crystal or solutionstructures, a method that allows for the use of anentire ensemble in molecular docking, rather thana single structure, could prove to be advantageousin cases where conformational changes in the re-ceptor are expected. Such a method would be use-ful for molecular design purposes and, inparticular, computational database screening. Thegeneral issue for both kinds of ensembles is thesame: what procedures best utilize the additional,but still incomplete, information contained in a setof related structures?

Here, methodology is presented for dockingligands to a suite of target structures. The criticalconstructs are weighting protocols for the inter-molecular force ®eld used to measure ligand±tar-get interactions. Force ®eld terms are combinedand stored on a single ``scoring grid'' that is used

in DOCK 3.5 (Kuntz et al., 1982; Meng et al., 1992)to evaluate ligand orientations. The weighting ofthe force ®elds was developed to meet two basiccriteria. First, the correct orientations of theligand(s) should have a favorable score. Some lossof accuracy can be expected when a ligand orien-tation is scored against an ensemble of receptorstructures, only one of which is adapted to thatspeci®c ligand. Nevertheless, a ligand that scoreswell when docked to its native receptor structureshould still do so when scored against the ensem-ble. Secondly, the evaluation should still be ef®-cient enough to allow for fast screening of largedatabases for novel lead compounds to be used inthe drug design process.

The most straightforward method one might con-sider is to evaluate the ligand±receptor binding en-ergies with each structure in the ensemble, thenuse a Boltzmann-weighted energy average at eachgrid point as the score. With more than a few en-zyme conformations, however, evaluating theenergy of every ligand orientation with every re-ceptor conformation becomes prohibitive in termsof both computer memory and computationaltime. Furthermore, this kind of sum cannot be im-plemented on a single scoring grid, since such aweighted sum depends upon all the ligand atompositions, which cannot be precalculated. Onecould calculate a score by summing the individualinteraction energies, giving each energy equalweight. Such ``mean-value'' energy is dominatedby a few bad contacts of the ligand with the recep-tor. Alternatively, one could calculate an energy-minimized positionally averaged structure fromthe ensemble of structures. Such an average struc-ture, however, does not adequately take into ac-count any large conformational variability presentand known ligands which dock to some membersof the ensemble, will not dock to the average struc-ture.

Since the direct method presents dif®culties, moretractable approaches were sought. Two generaltypes of averaging appear to be applicable. Oneapproach, referred to as the ``energy-weightedaverage'', attempts to construct an appropriatelyaveraged ligand±protein interaction energy.Another procedure, referred to as the ``geometry-weighted average'', constructs an average score byconsidering the variance in atom positions. The®rst method calculates, for every atom of themacromolecule in every structure, the van derWaals and Coulombic potential energy factors. Foreach atom, a weighted potential is then calculated,averaging over all structures. The contribution tothe potential of each atom is weighted to make the®nal energy resemble a Boltzmann energy-weighted sum, with the simpli®cation of approxi-mating the free energy with the interaction energy.The actual (unnormalized) weight assigned to eachatom from each structure is a sigmoidal function ofthe distance from the receptor atom to the ligandatom or grid point. This function approaches zeroat short distances and increases to unity at longer

Molecular Docking to Ensembles of Protein Structures 425

JMB MS1697 [31/1/97]

Page 3: Molecular Docking to Ensembles of Protein Structures › ... › kuntz-dock-to-ensemble-JMB.pdf · 2001-04-19 · Molecular Docking to Ensembles of Protein Structures Ronald M. A

distances. As a result, only the attractive poten-tial is considered when there is a mixture ofboth attractive and repulsive contributions froman atom in different structures; i.e. repulsivepotentials have a negligible contribution to theaverage in a manner analogous to a true Boltz-mann energy-weighted sum. Repulsive potentialsare only included when the potentials fromatoms from all the structures are repulsive. Inthis manner, a ligand atom can have a reason-able attractive interaction energy, rather than alarge repulsive energy, when positioned close toa receptor atom of only one of the several re-ceptor structures. Such an averaging method al-lows for small variations in local geometry andattempts to model a Boltzmann-like sum. Werefer to this method of averaging as the ``en-ergy-weighted average''.

The second method performs the averaging on thestructural level and calculates for every atom ofthe macromolecule, a mean position, averagedover all structures, and its corresponding variance.When averaging the position of a particle, thenature of its motion needs to be considered inorder to validate the averaging. For small harmo-nic motions, the average position is a physicallyrelevant quantity, but for larger anharmonicmotions, possibly involving multiple minima, theentire distribution of atom positions offers a moreaccurate description. As depicted in Figure 1, aligand binding site can be divided in structurallywell-determined regions where the atoms of differ-ent structures in the ensemble overlay with smallroot-mean-square deviations (rmsd values) andless conformationally restricted regions where thesame atom can occupy very different positions inspace. Atomic motions in these two categories canbe considered to represent harmonic and anharmo-nic oscillators, respectively. For generating pos-ition-weighted scoring grids, atoms are assumed tobe disordered if their positional standard deviationis above a user-de®ned threshold. Such atoms aretreated as volumeless and copies of all positions inthe ensemble are used with fractional occupancy togenerate scoring grids. For atoms with a varianceless than the threshold, the average position isused. This allows larger ligands to bind by occupy-ing ¯exible portions of the binding site, withoutdetailed analysis of conformational changes in theprotein induced by the ligand. We refer to thismethod of averaging as geometry-weighted aver-aging.

In Methods we describe both types of averaging indetail. In Results we evaluate the usefulness ofboth types of composite grids in regeneratingknown crystal ligand orientations, using severalcrystal structures of HIV protease and retinol bind-ing protein as well as an ensemble of NMR struc-tures of ras p21 and uteroglobin. We also evaluatethe ability of both types of composite grids toidentify known ligands from a small database ofcompounds. Finally, a comparison of the two

methods and a discussion of their limitations isgiven.

Results

Choice of test cases

Test cases were chosen from a limited set of sys-tems where data on multiple target structures in-teracting non-covalently with a variety of ligandsare publicly available. Two cases, HIV proteaseand ras p21 protein, were selected for their thera-peutic interest. HIV protease is a realistic and well-studied target for structure-based drug design andmany crystal structures of complexes with differentinhibitors have been reported. The inhibitors usedin our studies are generally extended peptide-likemolecules, except for the thioketal haloperidol (seeFigure 2), and are almost completely enclosed bythe fairly hydrophobic active site of the enzyme.The disorder observed in the HIV protease activesite is mainly due to the occurrence of differentside-chain conformers. The ras p21 protein belongsto a class of oncogene proteins whose mode of ac-tion is actively studied. It cycles between an activeand an inactive state as a function of ligand bind-

Figure 1. Schematic depiction of how binding site atomsexhibiting different degrees of positional disorder arerepresented in a single scoring grid. For atoms in staticregions of the binding site (on the right), the averageposition is used to calculate AMBER force ®eld potentialE. Atoms that have positional standard deviationsabove a user-de®ned threshold (on the left) are rep-resented by all copies of those atoms in Nens proteinstructures. Their van der Waals volume and AMBER po-tential are set to zero within a user-de®ned radius foreither polar or non-polar atoms. The remaining attrac-tive portion of the potential E is scaled by the numberof copies Nens.

426 Molecular Docking to Ensembles of Protein Structures

JMB MS1697 [31/1/97]

Page 4: Molecular Docking to Ensembles of Protein Structures › ... › kuntz-dock-to-ensemble-JMB.pdf · 2001-04-19 · Molecular Docking to Ensembles of Protein Structures Ronald M. A

ing (Milburn et al., 1992). In contrast to HIV pro-tease, the ras p21 solution structure has a chargedand open binding site where the observed disorderinvolves both side-chains and loop regions. Theother two test cases are less well studied targets,but they are also of pharmacological and structural

interest. The uteroglobin solution structure has awell determined backbone conformation and ahydrophobic but still solvent-accessible bindingsite. Random conformational variability is ob-served for several side-chains. In this case, fewligands are known and our main focus was on acomparison of the behavior of the energy-mini-mized average structure with the ensemble in mol-ecular docking using structures determined byNMR. Finally, in the ensemble of retinol-bindingprotein crystal structures the structural variabilityis mainly due to conformational changes involvingtwo side-chains. The ensemble is, however, domi-nated by the less permissive conformations. Forthis test case, the ability of our methods to re¯ectrelatively small conformational differences in theaverage scoring grid can be evaluated. A moreelaborate discussion of the test cases is given inMethods.

Figure 2. Schematic representation of the chemical structures of HIV protease inhibitors used in docking studies.Ligands and the PDB entry codes of the enzyme±inhibitor complexes listed (see Methods).

Table 1. Force ®eld scores (kcal/mol) of HIV proteaseinhibitors docked to individual enzyme structures

7hvp 9hvp 1hos 5hvp q7k

JG365 ÿ65 �27 ÿ52 ÿ53 ÿ3A74704 ÿ14 ÿ57 ÿ59 ÿ11 ÿ25SB204144 ÿ28 ÿ25 ÿ49 �56 �30Pepstatin ÿ37 ÿ15 ÿ42 ÿ30 ÿ24thk ÿ35 ÿ27 ÿ29 ÿ30 ÿ31

All values listed are based on single docking calculations. Diag-onal elements are values for the ligand docked to its co-crystal-lized enzyme structure (printed in bold face).

Molecular Docking to Ensembles of Protein Structures 427

JMB MS1697 [31/1/97]

Page 5: Molecular Docking to Ensembles of Protein Structures › ... › kuntz-dock-to-ensemble-JMB.pdf · 2001-04-19 · Molecular Docking to Ensembles of Protein Structures Ronald M. A

Use of composite grids in singleligand docking

HIV protease test case

The accuracy of DOCK 3.5 in regenerating theligand orientation observed in crystal structures ofHIV protease using composite scoring grids iscompared with that using the scoring grids of the®ve individual enzyme structures (see Methods) inTables 1 and 2. Table 1 shows that not every ligandcan be docked successfully to every enzyme struc-ture. Several combinations of ligand and enzymestructure yield high interaction energies thatwould prevent these inhibitors from being ident-i®ed in docking searches. When either compositeensemble-based grid is used, the crystal ligandorientations and energies can be reproducedreasonably well (Table 2). Both composite grids®nd orientations within 1 AÊ of the crystal orien-tation for almost all ligands within 10 to 100 CPUseconds/ligand. The thk inhibitor was translatedby 4 AÊ in the binding site when using the energy-weighted composite grid. This displacement iscaused by the replacement of the chlorine ion, ob-served in the crystal structure, by a water molecule

in order to make this enzyme structure equivalentto the other HIV protease structures (see Methods).When the chlorine ion is reintroduced at its orig-inal position, the native ligand orientation is re-trieved (Rutenber et al., 1993).

The energies obtained with individual grids com-pare reasonably well with those obtained with thecomposite grids although the scores obtained withthe energy-weighted grid are not as favorable. Anexception is pepstatin, which scores much betterwhen docked using the ensemble-based grids in-stead of to its own (5hvp) structure (see Table 2).In the experimental structure, steric clashes are pre-sent between the inhibitor and the enzyme, yield-ing a force ®eld score of �140 kcal/mol. Dockingand simplex minimization of the ligand to thenative enzyme structure yields a more favorableenergy value (see Table 2). The ensemble-basedgrid abolishes the steric hindrance between en-zyme and inhibitor, allowing the inhibitor to im-prove its interaction energy with the enzyme. Forthe energy-weighted average, a grid spacing of 0.2to 0.25 AÊ was found to give results closer to thoseof the individual grids (data not shown).

Table 2. Root-mean-square deviations (AÊ ) and force ®eld scores (kcal/mol) ofHIV protease inhibitors docked to an ensemble of enzyme structures

Individual Ensemble Ensemblestructures G-weighted E-weighted

rmsd Energy rmsd Energy rmsd Energy

JG365 0.3 ÿ65 0.6 ÿ63 0.8 ÿ44A74704 0.6 ÿ57 0.41 ÿ50 0.7 ÿ47SB204144 0.5 ÿ49 0.6 ÿ55 0.8a ÿ43Pepstatin 0.9 ÿ30 0.4 ÿ57 0.8 ÿ41thk 0.4 ÿ31 1.1 ÿ35 4.0b ÿ29

The standard deviation threshold above which 65 receptor atoms were treated as ¯exiblewas 0.5 AÊ for the geometry-weighted average grids. The rmsd per non-hydrogen atomlocated in the box used for the generationof scoring grids was 0.49 AÊ . rmsd values andenergy values are are listed for energy (E)-weighted and geometry(G)-weighted compositegrids. Ligands are depicted in Figure 4.

armsd values after ligand was rotated by 180 degrees along the C2 axis of the HIV pro-tease dimer.

bLigand translated in binding pocket caused by its interaction with a water moleculeinstead of a chlorine ion. The correct orientation was found with 0.9 AÊ rmsd and aÿ27 kcal/mol force ®eld energy.

Table 3. Root-mean-square deviations (in AÊ ) and force ®eld scores (kcal/mol) of GDP and GTP analogs docked tothe ras p21 GDP-bound solution structure

20 NMR structures Minimized Ensemble Ensemble(ranges) Average G-weighted E-weighted

rmsd Energy rmsd Energy rmsd Energy rmsd Energy

GCP 1.0 . . . 10.0 ÿ57 . . . ÿ 23 1.5 ÿ52 1.4 ÿ52 1.2 ÿ39GDP 0.5 . . . 9.2 ÿ55 . . . ÿ 27 0.8 ÿ48 0.6 ÿ49 0.6 ÿ39GNP 1.6 . . . 10.7 ÿ50 . . . ÿ 22 1.9 ÿ47 1.6 ÿ48 1.6 ÿ32GNQ 1.2 . . . 11.1 ÿ51 . . . ÿ 20 10.4 ÿ34 0.4 ÿ54 1.2 ÿ41GNR 0.7 . . . 14.2 ÿ61 . . . ÿ 21 9.9 ÿ38 0.4 ÿ48 0.5 ÿ36

The standard deviation threshold above which receptor atoms were treated as ¯exible (181 heavy atoms in total and rmsd/atom of0.66 AÊ ) was 0.5 AÊ . rmsd and energy values are are listed for energy (E)-weighted and geometry (G)-weighted composite grids. Theensemble consisted of 20 solution structures. See the text for ligand abbreviations. The rmsd values of ligands that could not bedocked successfully are printed in bold face.

428 Molecular Docking to Ensembles of Protein Structures

JMB MS1697 [31/1/97]

Page 6: Molecular Docking to Ensembles of Protein Structures › ... › kuntz-dock-to-ensemble-JMB.pdf · 2001-04-19 · Molecular Docking to Ensembles of Protein Structures Ronald M. A

ras p21 protein test case

The results of docking GDP and GTP analoges (seeFigure 2) to ras p21 solution structures using en-ergy and geometry-weighted averaging are listedin Table 3. No single structure of the ensemble gen-erates orientations closest to the native orientationfor all ligands. Also the minimized average struc-ture does not allow for the successful docking ofall GTP analogues. Using composite scoring grids,however, the GTP analogs can be docked to yieldnear-native orientations with deviations of 1.6 AÊ orless from the crystal structure. These deviations aredue to differences in the placement of the chargedphosphate group relative to the Mg ion. Relativelylarge changes in the electrostatic interaction canoccur with relatively small changes in position,even though the formal charges have been reduced(see Methods). For the computed energies the sametrend is observed as with HIV protease where theenergy-weighted grids display a less broad rangeand somewhat higher energies than are obtainedusing individual grids.

It is interesting to note that the NMR ensemble ofthe GDP-bound solution structure apparently con-tains suf®cient information on the conformational¯exibility in this system to allow for the largerGTP analogs to be successfully docked. Residues27 to 32, 58 to 66 and 107 to 109 were identi®ed tobe ¯exible in solution on the basis of reduced 15Nrelaxation rates while a different orientation wasobserved for the second helix composed of resi-dues 60 to 75 (Kraulis et al., 1994). Additional dis-order in the NMR ensemble was observed nearresidues 30 to 38, 47 to 50 and 120 to 122, althoughthis is not clearly re¯ected in the reported 15Nrelaxation rates. These regions overlap reasonablywell with residues 32 to 38 and 60 to 75, which areobserved to change conformation upon GTP bind-ing in crystal structures (Milburn et al., 1990).

To assist in comparing the two composite gridmethods, the volumes available for favorablehydrophobic and polar interactions have beenevaluated for the above two test cases in Table 4.Carbon and nitrogen probe atoms were used to de-termine favorable volumes for hydrophobic andpolar groups, respectively.

Uteroglobin test case

The solution structure of reduced uteroglobin wassolved with a polychlorobiphenyl (PCB) metabolitebound to the protein (HaÈrd et al., 1995). The pro-tein is also known to bind progesterone, albeitwith a much lower af®nity. We were motivated toask if the progesterone±uteroglobin interactioncould be recovered from the solution structure ofuteroglobin with an unrelated ligand. Progesteronewas docked to both the energy-minimized averagestructure and an ensemble consisting of 25 solutionstructures generated with simulated annealingtechniques. The average atom rmsd for the binding

site is 0.42 AÊ , making it a fairly well determinedstructure. When progesterone was docked againstthe energy-minimized average uteroglobin struc-ture a force ®eld score of only ÿ4.3 kcal/mol wasobtained. Using the ensemble grid prepared with apositional standard deviation limit of 0.5 AÊ yieldeda force ®eld score of ÿ25.5 kcal/mol. The energy-weighted composite grid found a similar orien-tation for progesterone with a force-®eld score ofÿ22.7 kcal/mol.

The rmsd between the solutions produced withgeometry and energy-weighted average grids is1.3 AÊ and mostly due to a translation along thelong axis of the progesterone molecule. The orien-tation of the progesterone molecule in both cases issimilar to that reported by Dunkel et al. (1995), andco-workers based on computer modelling and mu-tational studies, with the O-3 and O-20 atoms ofprogesterone positioned between the side-chains ofTyr23 and Thr62. If the orientation of progesterone,as it is docked to the ensemble, were placed in theminimized average structure, severe steric hin-drance with Leu15 and Ile65 especially wouldoccur. The conformations of these residues are notvery well restrained by the NMR data and weretreated as being disordered in the ensemble-basedgrids. Thus, the minimized average solution struc-ture does not allow for the successful docking of aknown ligand, whereas using information from theentire ensemble allows the study of the progester-one±uteroglobin interaction.

Table 4. Comparison of accessible volumes for carbonand nitrogen atoms in energy-weighted and geometry-weighted ensemble-based grids

ras p21 (NMR)Volume (AÊ 3) Volume (AÊ 3)

favorable for C favorable for N

Energy-weightedEnsemble (range) 173±319 291±416Composite 287 318

Geometry-weightedEnsemble (range) 163±299 271±337Composite (s.d. 1.0 AÊ ) 244 360Composite (s.d. 0.5 AÊ ) 319 476

HIV protease (X-ray)Volume (AÊ 3) Volume (AÊ 3)

favorable for C favorable for N

Energy-weightedEnsemble (range) 454±537 232±328Composite 523 270

Geometry-weightedEnsemble (range) 425±504 211±267Composite (s.d. 1.0 AÊ ) 540 268Composite (s.d. 0.5 AÊ ) 740 410

Volumes within a box enclosing the binding site that have anattractive energy for a carbon atom (C, partial charge �0.23) ora nitrogen atom (N, partial charge ÿ0.5) are given in AÊ 3. Thetotal volume of the box used in the case of HIV protease was1920 AÊ 3 while for ras p21 it was 1314 AÊ 3.

Molecular Docking to Ensembles of Protein Structures 429

JMB MS1697 [31/1/97]

Page 7: Molecular Docking to Ensembles of Protein Structures › ... › kuntz-dock-to-ensemble-JMB.pdf · 2001-04-19 · Molecular Docking to Ensembles of Protein Structures Ronald M. A

Retinol binding protein test case

A total of ®ve crystal complexes of bovine retinolbinding with retinol derivatives (see Figure 3) wasused for ensemble-based docking. Of the ®ve cor-responding ligands, fenretinide especially is ob-served to induce changes in the side-chainconformations of Leu35 and Leu63 (Zanotti et al.,1994). As is shown in Table 5, fenretidine doesnot dock correctly to the retinol-bound form ofthe protein, but all ligands can be docked suc-cessfully to the fenretinide-bound form. Whencomposite grids are used, all ligands can be

docked with rmsd values and force ®eld energiescomparable to those obtained when using thefenretidine-bound crystal structure. The energy-weighted scoring grid again yields a narrowerrange of energies. The retinol binding protein en-semble is dominated by structures similar to theless permissive retinol-bound form and has thelowest average per atom rmsd of all test cases.Still, both ensemble-based grids allow for success-ful docking of all ligands.

The force ®eld energy of fenretinide dockedagainst its native protein structure is higherthan that obtained when ensemble-based grids

Figure 3. Schematic representationof the chemical structures of GDPand GTP analogs used in dockingstudies of the ras p21 protein. SeeMethods for an explanation of theabbreviations used for the differentcompounds.

Table 5. Root-mean-square deviations (AÊ ) and force ®eld scores (kcal/mol) of retinol derivatives docked to retinolbinding protein

Retinol Fenretinide- Ensemble Ensemblebound bound G-weighted E-weighted

rmsd Energy rmsd Energy rmsd Energy rmsd Energy

AZE 0.6 ÿ34 0.4 ÿ34 0.7 ÿ34 0.4 ÿ33ETR 0.6 ÿ29 0.7 ÿ39 0.4 ÿ40 0.4 ÿ36FEN 4.7 ÿ23 0.7 ÿ44 0.8 ÿ38 0.8 ÿ29REA 1.0 ÿ38 0.8 ÿ39 0.8 ÿ39 0.6 ÿ36RTL 0.5 ÿ36 0.9 ÿ36 0.8 ÿ36 0.8 ÿ35

The standard deviation threshold above which 35 receptor heavy atoms were treated as ¯exible (with a receptor rmsd/atom of0.26 AÊ ) was 0.5 AÊ . The ensemble consisted of ®ve crystal structures. Results derived with geometry (G)-weighted and energy (E)-weighted composite grids are listed. See the text for ligand abbreviations. The rmsd values of ligands that could not be docked suc-cessfully are printed in bold face.

430 Molecular Docking to Ensembles of Protein Structures

JMB MS1697 [31/1/97]

Page 8: Molecular Docking to Ensembles of Protein Structures › ... › kuntz-dock-to-ensemble-JMB.pdf · 2001-04-19 · Molecular Docking to Ensembles of Protein Structures Ronald M. A

are used. This is probably due to the fact thatonly one out of ®ve protein structures has theside-chains of Leu35 and Leu63 in a confor-mation adjusted to fenretinide binding, andtherefore only one-®fth of the potential functionof these residues will interact favorably withthis ligand.

Database experiments

To determine the usefulness of composite grids indatabase searches, a small test database was pre-pared that included the known HIV and ras p21binding compounds, as well as about 150 othercompounds (Methods). In order to determinewhether all known inhibitors or ligands of thesesystems could be identi®ed, all compounds in thisdatabase were docked using the composite grids aswell as the standard DOCK grids. In the lattercase, a grid was generated for each structure in theensemble. The relative ranks of known inhibitorswithin the test database are reported in terms ofthe portion of the top scoring compounds theyconstitute, i.e. their fractional rankings are listed in

Tables 6 and 7, respectively. The DOCK par-ameters for these runs were selected to generate areasonable number of orientations for searching adatabase (see Methods).

For both the energy-weighted average and geome-try-weighted average composite grids, all ®ve HIVinhibitors are ranked within the top 17% of the da-tabase. For grids constructed from the separate en-zyme structures, at most three inhibitors wereidenti®ed within the top 25% of the database(Table 6). Here the use of composite grids clearlyoutperforms the use of a single structure for dock-ing purposes. For the ras p21 test case, all ligandswere found within the top 5% of the databaseusing the geometry-weighted average grid andwithin the top 11% when using the energy-weighted grids. In this case, however, many of thegrids generated from individual structures in theensemble performed better than the compositegrids by ranking all inhibitors within the top 5% ofthe database (Table 7). However, since it is notknown a priori which of the structures in an ensem-ble should be used, a composite grid is more likelyto identify all ras p21 ligands than a grid con-structed from a randomly selected structure in theensemble.

In order to examine to what extent the compositionof the ensemble determines the likelihood withwhich a composite grid identi®es an inhibitor,composite grids were generated using only fourout of ®ve HIV protease structures. These gridswere used to dock our test database and the result-ing fractional rankings for all ®ve permutations ofone deleted enzyme structure show that the en-ergy-weigthed average grids ®nd all ®ve inhibitorswithin 12 to 33% of the database while the geome-try-weighted grid allow identi®cation of all inhibi-tors within 11 to 21% of the database. Deleting the1hos and q7k systems, especially, from the ensem-bles causes problems with the successful retrieval

Figure 4. Schematic representation of the chemicalstructures of retinol analogs used in docking studies ofuteroglobin. See Methods for an explanation of theabbreviations used for the different compounds.

Table 7. Fractional rankings of ligands of ras p21 within the test database

ras p211,2,3,4 ras p21 Energy- Geometry-

No. ligands 5,6,7,8, ras p21 14,15 ras p21 ras p21 weighted weightedidenti®ed Grid 9,10,11 12,13 16,18 19 20 average average

1 0.02 0.01 0.02 0.02 0.01 0.01 0.012 0.02 0.01 0.02 0.07 0.05 0.02 0.013 0.03 0.02±0.04 0.03 0.14 0.17 0.02 0.024 0.04 0.03±0.05 0.04±0.08 0.16 0.23 0.04 0.035 0.04 0.07±0.10 0.13±0.17 0.30 0.46 0.11 0.05

Table 6. Fractional rankings of inhibitors of HIV protease within the test database

Energy- Geometry-No. inhibitors weighted weighted

identi®ed Grid 7hvp 9hvp 1hos 5hvp q7k average average

1 0.01 0.01 0.01 0.01 0.19 0.01 0.012 0.06 0.30 0.01 0.42 0.36 0.01 0.013 0.12 0.51 0.02 0.59 0.45 02 0.024 0.16 0.97 0.03 0.96 0.52 0.03 0.035 0.28 1.00 0.35 0.99 0.99 17 0.16

Molecular Docking to Ensembles of Protein Structures 431

JMB MS1697 [31/1/97]

Page 9: Molecular Docking to Ensembles of Protein Structures › ... › kuntz-dock-to-ensemble-JMB.pdf · 2001-04-19 · Molecular Docking to Ensembles of Protein Structures Ronald M. A

of all ®ve inhibitors for both grids. However, alsoin this case the composite grids are still more likelyto identify all ligands than a grid based on a ran-domly selected structure.

In addition, we compared the score of a particularcompound derived from both composite gridswith the best score derived from any of the ®vestandard HIV protease and the 20 ras p21 struc-tures. The two sets of scores are plotted againsteach other in Figure 5 for the HIV protease caseand in Figure 6 for the ras p21 test case. Knowninhibitors are indicated. A reference line of equalscores is also plotted.

The correlation between the best DOCK score andthe composite scores is high: 0.9 for the both com-posite grids for the HIV protease test case and 0.8for the ras p21 test case. In addition, the inhibitorsand ligands, on average, have a more favorablecomposite grid score than compounds randomlyselected from the database and are identi®able atleast as well by the composite grid score as by theDOCK score. The relative rank ordering of thecompounds is roughly preserved; that is, the top

10% of compounds scored with composite-gridsare also among the top 10% of the best scoringcompounds found using all individual structures.Most notably, there are considerable savings incomputational time when using composite grids,the order of the number of enzyme systems underconsideration.

Discussion

Sensitivity of composite grids to parameters

To be generally applicable, the composite gridsmust be robust to parameter choices. Our dockingexperiments showed that the grid spacing for theenergy-weighted grids needs to be ®ner than thatof standard or geometry-weighted grids in order toreproduce continuum-calculated values. In particu-lar, the energy-weighted average energies are con-sistently less favorable when a coarser spacing isused in single ligand docking runs (0.3 AÊ ) com-pared to the database searches (0.25 AÊ ). The con-struction of a coarser grid can be visualized as the

Figure 5(a) legend opposite

432 Molecular Docking to Ensembles of Protein Structures

JMB MS1697 [31/1/97]

Page 10: Molecular Docking to Ensembles of Protein Structures › ... › kuntz-dock-to-ensemble-JMB.pdf · 2001-04-19 · Molecular Docking to Ensembles of Protein Structures Ronald M. A

averaging of points of a ®ner grid. Due to the largevariation in the repulsive van der Waals potential,this average can be dominated by repulsive inter-actions, resulting in less favorable scores asobserved. For the purpose of reproducing exper-imentally determined orientations, a coarser spa-cing was found to be adequate. For studiesrequiring more accurate energies, however, the®ner grid spacing is recommended for this methodof averaging. A ®ner grid spacing demands morerandom access memory and requires more compu-ter time during grid generation but does not in-crease the computer time required for docking.

The scores obtained with geometry-weighted aver-age grids are somewhat sensitive to the standarddeviation threshold that is chosen. Using a 0.5 AÊ

threshold is recommended, since it assures thatsmall conformational variability is represented inthe composite grid. No detrimental effects on thequality of docking solutions were observed whenthe van der Waals radii, which determine at whatpoint the potential of disordered atoms is set tozero, were changed by 0.1. In addition, the use of

spheres generated from a randomly chosen struc-ture of the ras p21 ensemble, instead of the mini-mized average structure, did not in¯uence thequality of the docked ligand orientations.

Finally, neither method appears to be highly sensi-tive to the mode of superpositioning. For both theHIV protease and ras p21 test cases, equally goodresults were obtained when superposition was per-formed on a different set of residues or differentstructures.

Limitations

The use of ensembles of discrete conformations tomimic receptor ¯exibility poses certain limitationson the accuracy and scope of the methodology de-scribed here. The use of only a limited number ofconformations cannot adequately represent the fullrange of the conformational space available to thereceptor and some ligands may therefore not beidenti®ed in database searches with compositegrids. In this respect, the methodology presentedhere differs from truly ¯exible docking in that

Figure 5. Best DOCK force ®eld energy from ®ve individual HIV protease scoring grids versus the energy-weighted(a) and geometry-weighted (b) composite grid scores. Known HIV protease inhibitors are indicated and the linedepicting equal scores is plotted for comparison.

Molecular Docking to Ensembles of Protein Structures 433

JMB MS1697 [31/1/97]

Page 11: Molecular Docking to Ensembles of Protein Structures › ... › kuntz-dock-to-ensemble-JMB.pdf · 2001-04-19 · Molecular Docking to Ensembles of Protein Structures Ronald M. A

different conformational states of the receptor arenot generated but taken from available experimen-tal data. Nevertheless, there is more informationcontained in a set of multiple conformations thanin any single conformation. We have addressed theproblem of how best to utilize this additional infor-mation. In the following paragraphs we will dis-cuss some of the physical issues raised by energyand geometry-weighted averaging methods.

The energy-weighted averaging method attemptsto model a Boltzmann-weighted average. Ratherthan one Boltzmann weight for each structure,each receptor atom has its own separate Boltz-mann-like weight. Thus an appropriate portion ofeach conformation contributes to the energy. Treat-ing atoms in the receptor molecule independentlyis not physically correct, but attempts to model theaccommodation of the ligand by the receptor.There is, of course, some loss of geometric accuracyand the binding energies computed on this compo-site grid are less favorable than is found for the in-dividual grids. This is to be expected, since anaverage of energies cannot, by de®nition, be as fa-vorable as the most attractive enzyme±ligandinteraction energy. Nevertheless native ligand

orientations can be regenerated and known bind-ing molecules can be identi®ed with this compositegrid.

The different treatment of ¯exible regions by en-ergy and geometry-weighted averaging methodsresults in differences in the accessibility of disor-dered regions in the binding site. Energy-weightedaveraging smooths repulsive interactions betweenindividual protein structures and a ligand, unlessall the structures in the ensemble contribute with arepulsive interaction to the inter-molecular energyat a grid point. The region of overlap between vander Waals radii of receptor atoms from differentstructures, which decreases with increasing spatialdisorder of the atom, can therefore be unaccessibleto ligand atoms on the grid. The geometry-weighted average grid, on the other hand, removesthe repulsive potentials of ¯exible atoms entirely,under the assumption that the position of suchatoms is uncertain and representing them withtheir full potentials could interfere with the dock-ing of larger ligands. Therefore, the geometry-weighted average retains less information in thecomposite grids on which regions of the bindingsite are inaccessible to ligands. This can yield a

Figure 6(a) legend opposite

434 Molecular Docking to Ensembles of Protein Structures

JMB MS1697 [31/1/97]

Page 12: Molecular Docking to Ensembles of Protein Structures › ... › kuntz-dock-to-ensemble-JMB.pdf · 2001-04-19 · Molecular Docking to Ensembles of Protein Structures Ronald M. A

geometry-weighted average grid that is more per-missive as is illustrated by the results obtainedwhen docking our test database to ensembles con-sisting of only four out of ®ve HIV protease struc-tures. Using the geometry-weighted average grid,all known ligands dock within the top 21 % of thedatabase while the energy-weighted grids ®nd allinhibitors within the ®rst 33%. The difference iscaused by the fact that the geometry-weightedaverage does not depend on the exact positions of¯exible atoms, but only on the variance in theatomic coordinates. Therefore, geometry-weightedaveraging is less sensitive to which structure is de-leted. This more permissive treatment of disorderin the ensemble may result, however, in protein±ligand complexes being generated that do not cor-respond to a physical realistic protein confor-mation.

The differences in how ¯exible regions are treatedare also re¯ected in the volumes favorable tohydrophobic or polar probe atoms as shown inTable 4. The accessible volumes of energy-weighted composite grids fall within the range dis-

played by the ensemble. For geometry-weightedcomposite grids, they increase with the number ofdisordered atoms. Since disordered atoms are re-moved from the bump ®lter, a larger number ofligand orientations will be scored and minimized,resulting in somewhat increased execution timescompared to energy-weighted composite grids.This is compensated by the fact that the calculationof energy-weighted average grids for 20 structurestakes of the order of ten hours, which could proveto be prohibitive when large numbers of NMRstructures are available for docking studies. Geo-metry-weighted averaging for the same number ofstructures requires execution times of the order often minutes.

Both methods reported here have been pragmaticin nature and do not improve the quality of indi-vidual scores per se. The development of a newforce ®eld, adapted for multiple structures as wellas incorporating solvation, hydrophobic and entro-pic effects, would be highly desirable, but is be-yond the scope of this work. Here, we sought toincorporate the information at hand in available

Figure 6. Best DOCK force ®eld energy from 20 individual ras p21 scoring grids versus the energy-weighted (a) andgeometry-weighted (b) composite grid scores. Known ras p21 ligands are indicated and the line depicting equalscores is plotted for comparison.

Molecular Docking to Ensembles of Protein Structures 435

JMB MS1697 [31/1/97]

Page 13: Molecular Docking to Ensembles of Protein Structures › ... › kuntz-dock-to-ensemble-JMB.pdf · 2001-04-19 · Molecular Docking to Ensembles of Protein Structures Ronald M. A

structural data in order to generate a feasible meth-od of evaluating ligand orientations in moleculardocking. Our test cases have shown that our com-posite grid can effectively replace all the individualscoring grids and, for database searching in par-ticular, can be a more useful representation thanany single snap shot. If a more accurate evaluationof binding modes and energies is required, ensem-ble-based grids could be used as a ®rst ®lter inthree dimensional database searches. More accu-rate scores and orientations can be obtained at alater stage by docking the highest scoring ligandsof the composite grids to each of the individualprotein structures.

Conclusions

It has been shown that the inclusion of receptorconformational ¯exibility in molecular docking canbe achieved by averaging ensembles of NMR orcrystal structures on the basis of interaction energyor structural variability to yield a single scoringgrid. Both scoring schemes are based upon theDOCK molecular mechanics interaction energyand are highly correlated with the best DOCKscore of a ligand with the available macromolecu-lar conformations. By smoothing or removing therepulsive part of the chemical potentials of ¯exibleatoms, differently shaped ligands can be docked toyield orientations close to known binding modesnear to ¯exible regions of protein structures. Underthe same sampling conditions, the relative rank or-dering of the compounds is grossly preserved. Inthe case of HIV protease and ras p21, bothmethods can identify known inhibitors while af-fording a saving in time of the order of the numberof structures. In all test cases presented here, thecomposite grids outperform many of the grids de-rived from individual structures of the ensemble,or in the case of NMR-derived structures, the mini-mized average structure.

Although the energy-weighted and geometry-weighted averaging methods are based on differ-ent assumptions regarding the treatment ofmultiple structures in molecular docking, both per-formed equally well for the test cases presentedhere. These methods do not explore receptor ¯exi-bility in a dynamic fashion but, rather, they intro-duce an averaging algorithm that uses informationfrom an ensemble of structures to allow for abroader range of realistic ligands to be identi®edwhen searching three-dimensional databases.

Methods

This section is divided into three parts: (1) a description ofthe computational procedures; (2) a description of the testsystems used in this work; and (3) the parameters used.

Computational procedures

The starting point of all our calculations is a set of pro-tein±ligand complexes, determined either by X-ray crys-

tallography (HIV protease, retinol binding protein) or byNMR (ras p21, uteroglobin). The general docking pro-cedure that we have used is implemented in the pro-grams SPHGEN and DOCK 3.5, which have beendescribed (Kuntz et al., 1982; Meng et al., 1992, 1993;Shoichet et al., 1992). The active site of each enzyme is®rst characterized with a set of overlapping spheres.These packed spheres form a negative image of the en-zyme active site and are used to orient the ligand.Ligands oriented within the binding site are evaluatedon a scoring grid yielding an inter-molecular interactionenergy. In the following we describe the construction ofenergy-weighted and geometry-weighted compositescoring grids as well as the individual grids for each en-zyme structure.

The standard method of scoring in DOCK 3.5 uses theAMBER potential function (Weiner et al., 1984, 1986) toapproximate the ligand±enzyme binding energy with amolecular mechanics interaction energy. The van derWaals and Coulombic interactions are summed over allpairs of ligand atoms and enzyme atoms within a dis-tance cutoff. To calculate this interaction energy, the in-dividual terms in the potential function are separatedinto protein factors and ligand factors. The protein con-tributions are precalculated and stored on a scoring grid;this grid ®lls the volume accessible to ligand atoms. Eachligand orientation is then evaluated by combining thesefactors with ligand factors to generate a score, the inter-molecular interaction energy.

Standard DOCK interaction energy

The DOCK grid-based interaction energy, E, of theligand with a receptor, is given by (Meng et al., 1992):

E �X

i

�Ai

�Xj

�Aj=r12ij ��ÿ Bi

�Xj

�Bj=r6ij��

� Kqi

�Xj

�qj="rij���

where i indexes the ligand atoms and j indexes the re-ceptor atoms; A and B are the van der Waals attractiveand dispersive parameters, respectively; q is the partialcharge on the atom; K is the scaling constant which con-verts electrostatic energy into kcal/mol; e is the dielectricconstant of the medium; rij is the distance betweenligand atom i and receptor atom j.

Terms in boldface are the receptor van der Waals attrac-tive, repulsive and Coulombic terms, which have beenprecalculated and stored on a grid.

Energy-weighted average interaction energy

Theenergy-weightedaverage interactionenergyover theN receptor structures is de®ned as:

E � �1=N�X

i

�Ai

�Xj

Xn

wtjn�Ajn=r12ijn��

ÿ Bi

�Xj

Xn

wtjn�Bjn=r6ijn��

� Kqi

�Xj

Xn

wtjn�qjn="rij���

where i and j are indexes over ligand atoms and receptoratoms, respectively, as above; A,B, q, K e and rij are arede®ned above; n, indexes over receptor structures; wtj n

436 Molecular Docking to Ensembles of Protein Structures

JMB MS1697 [31/1/97]

Page 14: Molecular Docking to Ensembles of Protein Structures › ... › kuntz-dock-to-ensemble-JMB.pdf · 2001-04-19 · Molecular Docking to Ensembles of Protein Structures Ronald M. A

is a (normalized) sigmoidal-like function of the distancefrom the receptor atom j of structure n to ligand atom i(at a grid point).

Terms in bold face are precalculated and stored on agrid.

For every receptor atom j, the potential (of the atom)from each structure is calculated, then combined with anormalized weighted sum. (The sum is over the jth atomfrom all N structures.) The unnormalized weight, wt, forreceptor atom j in each structure n, increases from nearzero at short distances to unity at large distances via atruncated cosine term as shown below:

wt �� 0 if dij4 min Rj

0:5��1ÿ cos�p�dij ÿ min Rj��;if min Rj < dij < min Rj � 1

1; o th erwise

8>><>>:

where dij is the distance between receptor atom j (ofstructure n) to ligand atom i (at a grid point);

minRj � (well-depth of atom j/b)1/12 and b is an adjusta-ble parameter, set to 50 in our systems and minRj can beinterpreted as the van der Waals radius of the atom. Atdistances (from the ligand atom at a grid point) wherethe van der Waals repulsive potential is greater than orequal to b, the unnormalized weight of an enzyme atompotential is approximately equal to zero and set to d.Note that if the (unnormalized) weight for the enzymeatom potential from each structure is the same and equalto d, then the normalized weight becomes unity.Roughly speaking, if, at one point, the repulsive termfrom an atom from one structure is large, it does notcontribute to the potential function at that point. How-ever, if the potential of the atom from all structures is re-pulsive, then all contribute equally to the potential.

For each orientation, ligand terms, stored separately, arecombined with receptor terms to calculate a score.Ligand atom positions falling between grid points usevalues calculated by trilinear interpolation between gridpoints (Meng et al., 1992).

In order to limit the number of ligand orientations thatare actually evaluated in DOCK, each orientation is ®rsttested to see if ligand atoms intersect the enzyme; if toomany ligand atoms ``bump'' into the enzyme, the orien-tation is not scored. The energy-weighted average bumpgrid is constructed in a manner similar to the standardDOCK bump grid: for every grid point k, the number ofenzyme atoms from every structure that is close to thegrid point k is counted. Grid points with counts equal toor greater than the number of structures are consideredto be a bump.

Geometry-weighted average interaction energy

The geometry-weighted average method distinguishesbetween structurally well-de®ned and disorderd atomsin the ensemble in order to arrive at a composite grid formolecular docking. Ensembles of protein structures aresuperimposed on residues near the binding site andaverages and standard deviations are calculated for allindividual atom positions. A single structure is then de-rived in which, for atoms with standard deviationsabove a user-de®ned threshold, all copies of the originalatom positions are kept, while for less-¯exible atomsonly the average position is used. To reduce the numberof atoms in the resulting structure, this procedure is onlyfollowed for atoms within the boundaries of the box thatis used to de®ne the volume and position of the scoring

grid. All atoms located outside of this box are averaged,regardless of their positional standard deviation.

In order to allow larger ligands to occupy space within¯exible regions of the binding site, the repulsive portionof the force ®eld potentials of disordered atoms needs tobe adjusted. In DOCK 3.5 the scoring grids consist of abump grid and chemical grids (Meng et al., 1992). Thebump grid serves as a crude initial ®lter for orientationsgenerated by the matching algorithm. A grid point of thebump grid is considered to be unaccessible to ligandatoms when the van der Waals volume of a receptoratom overlaps with it. Orientations that place ligandatoms at such grid points are rejected without the needfor computationally more demanding scoring. The vo-lume of receptor atoms is determined by two distinctvan der Waals radii for either polar or non-polar atoms.For the purpose of creating bump grids, disorderedatoms were considered to be volumeless. For the chemi-cal grids, both the van der Waals and electrostatic poten-tials were set to zero for grid points within the van derWaals radius of a disordered atom. Beyond the van derWaals radius the attractive part of the potential remainsintact but is scaled by the number of copies, as is de-picted in Figure 1. Thus, although the disordered atom isno longer present as a van der Waals barrier for poten-tial ligands, its chemical characteristics are still encodedin the scoring grid. Due to the fact that the force ®eldgrid is no longer continuous, the trilinear grid interp-olation implemented in DOCK 3.5 (Meng et al., 1992)was sometimes observed to produce spurious resultsduring simplex minimization of docked ligands (Menget al., 1993). Therefore we conducted the docking simu-lations that made use of ensemble-based grids withoutthe use of grid interpolation.

Test cases

Suitable test cases are critical to evaluating methodologyfor docking to ensembles of receptor structures. For crys-tal structures to be useful as test cases, several structuresof complexes with different ligands at high resolution(preferably less than 2 AÊ ) are required. The ligandsshould not be covalently linked to the protein andshould induce large enough conformational changes inthe protein, such that not all ligands ®t all receptor struc-tures. For NMR structures, a conclusive de®nition of res-olution has not yet been proposed, but more than tenrestraints per residue and a rmsd smaller than 1 AÊ forthe backbone would be preferred. Unfortunately, thereare few ensembles of crystal sctructures availablethrough the Protein Data Bank (PDB) that ful®l all the re-quirements listed above and even less NMR structures ofproteins complexed with small molecules.

As test cases we used ensembles of solution structures ofthe ras p21 protein and uteroglobin as well as crystalstructures of retinol binding protein and HIV protease.The NMR solution structure of HIV protease bound to acyclic urea has been reported recently (Yamazaki et al.,1995) but the coordinates for these structures are not yetavailable for comparative dockings.

Five complexes of the HIV-1 aspartyl protease withbound inhibitors were used for docking studies. Fourcomplexes were taken form the PDB, namely entries7hvp (the complex with JG365, a substrate-like hydro-xyethylamine inhibitor at 2.4 AÊ resolution; Swain et al.,1990), 9hvp (the complex with A74704 at 2.8 AÊ resol-ution; Erickson et al., 1990), 1hos (the complex withSB204144 at 2.3 AÊ resolution; Abdel-Meguid et al., 1993),

Molecular Docking to Ensembles of Protein Structures 437

JMB MS1697 [31/1/97]

Page 15: Molecular Docking to Ensembles of Protein Structures › ... › kuntz-dock-to-ensemble-JMB.pdf · 2001-04-19 · Molecular Docking to Ensembles of Protein Structures Ronald M. A

and 5hvp (complex with acetyl pepstatin at 2.0 AÊ resol-ution; Fitzgerald et al., 1990). The structure of the theQ7K mutant of the homodimeric protein complexedwith thioketal haloperidol (UCSF8) (Rutenber et al., 1993)was obtained from R. Keenan (Biochemistry Department,UCSF). These ligands will be referred to as JG365,A74704, SB204144, pepstatin and thk, respectively. Theirchemical structures are depicted in Figure 2.

The ras p21 protein is a product of the human ras onco-gene and can exist in a GTP-bound activated state or aGDP-bound inactive state. In both conformations theguanine binding pocket is structurally conserved butlarge structural changes of the order of 3 to 7 AÊ are ob-served in the region where the b- and g-phosphategroups of GTP and a magnesium ion are bound(Milburn et al., 1990; Prive et al., 1992). The crystal struc-tures of ®ve complexes of ras p21 with GDP and GTPanalogs were retrieved from the PDB (Bernstein et al.,1977). These were PDB entries 4q21 (the complex withGDP at 2 AÊ resolution; (Milburn et al., 1990), 6q21 (thecomplex with a GTP-like inhibitor at 1.95 AÊ resolution;Milburn et al., 1990; Prive et al., 1992), 1gnp (the complexwith 30-O-(N-methyl-anthraniloyl-20-deoxyguanosine-50-(b,g-imido)-triphosphate, at 2.7 AÊ resolution), 1gnq (thecomplex with P3-1[S]-(2-nitrophenyl) ethyl-guanosine-50-triphosphate at 2.5 AÊ resolution) and 1gnr (the complexwith P3-1[R]-(2-nitrophenyl) ethyl-guanosine-50-tripho-sphate at 1.85 AÊ resolution; Scheidig et al., 1995). Theseligands will be referred to as GDP, GCP, GNP, GNQ andGNR, respectively, and their chemical structures areshown in Figure 3. The latter three GTP analogs havelarge aromatic substituents at either the ribose (GNP) orthe g-phosphate (GNQ and GNR) and are expected notto dock correctly into the GDP-bound form, since theirlarger volume requires structural adaptation by the pro-tein. In order to obtain ligand coordinates that were notdependent on the mode of superposition of the differentprotein structures, all ligands were superimposed withtheir guanine moieties on that of the GDP molecule ofthe minimized average NMR structure. This could giverise to somewhat larger root-mean-square deviations dueto slightly different placing of the ligands in the differentras p21 structures.

The magnesium ion bound by the ras p21 protein waskept in place but its formal charge and those of theligand phosphate groups were divided by a factor of 5.This was empirically shown to reduce the dominance ofthe electrostatic interactions between these highlycharged ions on the total energy, without affecting otherelectrostatic interactions in the binding site.

The NMR solution structure of the GDP-bound form ofras p21 has recently been solved (Kraulis et al., 1994) andthe minimized average structure (PDB entry 1crq) to-gether with 20 structures generated using a simulatedannealing protocol (PDB entry 1crr) were retrieved fromthe PDB for use in docking studies. For residues 1 to 166of the protein coordinates were given in all crystal andNMR structures and therefore only these residues wereused.

The structure of the reduced rat uteroglobin protein com-plexed with 4,40-bismefylsulphonyl-2,20,5,50-tetrachlorobi-phenyl ((MeSO2)2-TCB) was recently solved using NMRspectroscopy (HaÈrd et al., 1995). This homodimeric pro-tein has a high af®nity for polychlorinated biphenyls(PCB)s (Kd � 10 nM) (HaÈrd et al., 1995) and is alsoknown to weakly bind progesterone (Ka � 15 mMÿ1)(Dunkel et al., 1995). Its endogenous ligand is still un-known. The binding site of this protein is extremelyhydrophobic, with only residues Tyr23 and Thr62 sup-

posedly being capable of forming hydrogen bondinginteractions with ligands. Upon binding of the ligand thebinding site is thought to close due to oxidation of cy-steine residues. The minimized avaraged structure of thecomplex was obtained from the PDB (entry 1utr) whilean ensemble of 25 simulated annealing structures waskindly provided by Dr T. HaÈrd (Karolinska Institute,Sweden).

In the structures of complexes between bovine plasmaretinol binding protein and retinol analogs (Zanotti et al.,1993, 1994), changes in side-chain conformations of resi-dues near the retinol binding site have been observed. Inparticular, fenretidine, which has a hydroxyphenylamide group replacing the retinol hydroxyl group,altered the side-chain conformations of Leu35 and Leu63(Zanotti et al., 1994). Five complexes were retrieved fromthe PDB, namely entries 1erb (the complex with N-ethylretinamide at 1.9 AÊ resolution; Zanotti et al., 1993), 1fel(the complex with fenretinide at 1.8 AÊ resolution), 1fem(the complex with reitnoic acid at 1.9 AÊ resolution), 1fen(the complex with axerophtene at 1.9 AÊ resolution) and1hbp (the complex with retinol at 1.9 AÊ resolution;Zanotti et al., 1994). For convenience, these ligands willbe referred to as ETR, FEN, REA, AZE and RTL, respect-ively, and their structures are shown in Figure 4. Onlythe ordered residues 3 to 174 of the protein were usedfor docking studies.

All protein structures were stripped of water moleculesexcept in the case of HIV protease where the water mol-ecule placed between the ¯aps and the inhibitors waskept in place. In the thioketal haloperidol structure achloride ion was observed to replace this water(Rutenber et al., 1993). In order to make the thioketal ha-loperidol complex homologous to the other four com-plexes the chloride ion was replaced by water. Inaddition, residues 7, 14, 37, 41, 43, 61, 63, 64, 67, 72 and95 in both momomers were replaced by alanine, sincethe amino acids at these positions varied among the ®vecomplexes. These residues are remote from the activesite and are not expected to in¯uence the docking pro-cess. Protons were added to all protein structures and inthe case of the HIV protease structures a proton wasplaced between the catalytic aspartates.

Parameters used for grid generation and docking

All ligand structures were taken from the PDB ®les, afterwhich proton positions and Gasteiger-Marsili charges(Gasteiger & Marsili, 1980) were calculated using SYBYLV6.02 (Tripos Associates, St Louis, MO).

Spheres were generated with the program SPHGEN asdescribed (Kuntz et al., 1982) and edited in order to re-move spheres remote from the binding site. For ensem-bles of solution structures the minimized averagestructure was used for sphere generation. A box enclos-ing the spheres was generated with the box faces placedat 3 AÊ from the nearest sphere. Protein structures weresuperimposed on the Ca atoms of residues within 5 AÊ

from the ligand. For ras p21 these were residues 11 to18, 28 to 32, 34, 57, 116 to 120 and 144 to 147, for utero-globin residues 8, 11, 15, 23, 40, 43, 58, 61, 62 and 65 inboth monomers, for retinol binding protein residues nosuperposition was performed, since the structures takenfrom the PDB already superimposed well. The HIV pro-tease structures were superimposed on residues 8, 23, 25,27 to 32, 45 to 53, 76 and 80 to 86 of both monomers inthe dimer.

438 Molecular Docking to Ensembles of Protein Structures

JMB MS1697 [31/1/97]

Page 16: Molecular Docking to Ensembles of Protein Structures › ... › kuntz-dock-to-ensemble-JMB.pdf · 2001-04-19 · Molecular Docking to Ensembles of Protein Structures Ronald M. A

In the case of geometry-weighted averaging, force ®eldscoring grids were generated with the CHEMGRID pro-gram (Meng et al., 1992), which was modi®ed to removethe force ®eld potentials of disordered atoms within thesame van der Waals radii as are used for creating thebump grids. For polar atoms the radius was set to 2.3 AÊ

and for non-polar atoms to 2.8 AÊ while for polar protonsa radius of 1 AÊ was used. A distance-dependent dielec-tric constant ("� 1/4r) was assumed for atoms within10 AÊ of each other and the grid spacing was set to either0.25 AÊ or 0.3 AÊ . For the protein the AMBER united atomrepresentation was used (Weiner et al., 1984), while forthe ligands all atom parameters were applied (Weineret al., 1986).

In the case of the energy-weighted averaging, grids wereconstructed as described in the previous section. TheAMBER united atom parameter set was used (Weineret al., 1984). A distance-dependent dielectric constant(e � 1/4r) was assumed. All receptor atoms within 10 AÊ

of a grid point were used to calculate the receptor terms.Grid spacing was either 0.25 AÊ or 0.3 AÊ , although theformer spacing was found to be more appropriate and issuggested for future work.

Favorable volumes of chemical scoring grids were calcu-lated using either a sp3 carbon atom from the AMBER allatom force ®eld with a 0.235 charge or a sp2/sp nitrogenatom with charge ÿ0.5 as a probe. For database runs thegrid spacing was 0.2 AÊ while for single ligand dockingruns 0.3 AÊ was used.

Single ligand and ras p21 database docking runs weredone with a matching tolerance of 0.7 AÊ and bin sizes of0.5 AÊ with 0.2 AÊ overlap (Shoichet et al., 1992). For HIVprotease database searches matching tolerances were setto 1.2 AÊ with ligand bin sizes of 0.2 AÊ with 0.05 AÊ over-lap (0.5 and 0.2 AÊ , respectively for receptor binsizes).Simplex minimization of the generated orientations(Meng et al., 1993) was done using either 500 or 100 asthe maximum number of iterations in the case of singleligand or database docking runs, while the convergencecriterion was set to 0.2 or 5 kcal/mol, respectively.

Acknowledgements

We gratefully acknowledge support from NIH grant GM31497. R.M.A.K. acknowledges ®nancial support fromthe Netherlands Organization for Scienti®c Research(NWO). We thank Dr T. HaÈrd for making the uteroglo-bin ensemble available to us, MDL Information SystemsInc. for the CMC-3D database and the UCSF ComputerGraphics Laboratory for use of the Midas moleculargraphics software.

References

Abdel-Meguid, S., Zhao, B., Murthy, K. H. M.,Winborne, E., Choi, J.-K., Desjarlais, R. L., Minnich,M. D., Culp, J. S., Debouck, C., Tomaszek, T. A.,Jr, Meek, T. D. & Dreyer, G. B. (1993). Inhibitionof human immunode®ciency virus-1 protease by aC2-symmetric phosphinate. Biochemistry, 32, 7972±7980.

Bacon, D. J. & Moult, J. (1992). Docking by least-squares®tting of molecular surface patterns. J. Mol. Biol.225, 849±858.

Bernstein, F. C., Koetzle, T. F., Williams, G. J. B., Meyer,E. T. J., Brice, M. D., Rodgers, J. R., Kennard, O.,

Shimanouchi, T. & Tasumi, M. (1977). The ProteinData Bank: a computer-based archival ®le formacromolecular structures. J. Mol. Biol. 112, 535±542.

BoÈhm, H. J. (1992a). The computer program LUDI: anew method for the de novo design of enzymeinhibitors. J. Comput.-aided Mol. Design, 6, 61±78.

BoÈhm, H. J. (1992b). LUDI: Rule-based automatic designof new substituents for enzyme inhibitor leads.J. Comput-aided-Mol. Design, 6, 593±606.

Bonvin, A. M. J. J. & BruÈ nger, A. T. (1995). Confor-mational variability of solution nuclear magneticresonance structures. J. Mol. Biol. 250, 80±93.

Clark, K. P. & Ajay, (1995). Flexible ligand dockingwithout parameter adjustment across four ligandreceptor complexes. J. Comput. Chem. 16, 1210±1226.

Dunkel, R., Vriend, G., Beato, M. & Suske, G. (1995).Progesterone binding to uteroglobin: two alternativeorientations of the ligand. Protein Eng. 8, 71±79.

Erickson, J., Neidhart, D. J., van Drie, J., Kempf, D. J.,Wang, X. C., Norbeck, D. W., Plattner, J. J.,Rittenhouse, J. W., Turon, M., Wideburg, N.,Kohlbrenner, W. E., Simmer, R., Helfrich, R., Paul,D. A. & Knigge, M. (1990). Design, activity and2.8 AÊ crystal structure of a C2 symmetric inhibitorcomplexed to HIV-1 protease. Science, 249, 527±533.

Fitzgerald, P. M. D., McKeever, B. M., vanMiddlesworth, J. F., Springer, J. P., Heimbach, J. C.,Leu, C.-T., Herber, W. K., Dixon, R. A. F. & Darke,P. L. (1990). Crystallographic analysis of a complexbetween human imunode®ciency virus type 1 pro-tease and acetyl-pepstatin at 2.0 AÊ resolution. J. Biol.Chem. 265, 14209±14219.

Gasteiger, J. & Marsili, M. (1980). Iterative partial equali-zation of orbital electronegativity: rapid access toatomic charges. Tetrahedron Letters, 36, 3219±3222.

HaÈrd, T., Barnes, H. J., Larsson, C., Gustafson, J.-AÊ . &Lund, J. (1995). Solution structure of a mammalianPCB-binding protein in complex with a PCB. Nat.Struct. Biol. 2, 983±989.

Jones, G., Willett, P. & Glen, R. C. (1995). Molecular rec-ognition of receptor sites using a genetic algorithmwith a description of desolvation. J. Mol. Biol. 245,43±53.

Jorgensen, W. L. (1991). Rusting of the lock and keymodel for protein±ligand binding. Science, 254,954±955.

Judson, R. S., Tan, Y. T., Mori, E., Melius, C., Jaeger,E. P., Treasurywala, A. M. & Mathiowetz, A. (1995).Docking ¯exible molecules: a case study of threeproteins. J. Comput. Chem. 16, 1405±1419.

Kollman, P. (1994). Theory of macromolecule-ligandinteractions. Curr. Opin. Struct. Biol. 4, 240±245.

Koshland, D. E. (1958). Application of a thoery ofenzyme speci®city to protein synthesis. Proc. NatlAcad. Sci. USA, 44, 98±104.

Kraulis, P. J., Domaille, P. J., Campbell-Burk, S. L., vanAken, T. & Laue, E. D. (1994). Solution structureand dynamics of ras p21-GDP determined by het-eronuclear three- and four-dimensional NMRspectroscopy. Biochemistry, 33, 3515±3531.

Kuntz, I. D. (1992). Structure-based strategies for drugdesign and discovery. Science, 257, 1078±1082.

Kuntz, I. D., Blaney, J. M., Oatley, S. J., Langridge, R. &Ferrin, T. E. (1982). A geometric approach to macro-molecule-ligand interactions. J. Mol. Biol. 161, 269±288.

Leach, A. R. (1994). Ligand docking to proteins with dis-crete side-chain ¯exibility. J. Mol. Biol. 235, 345±356.

Molecular Docking to Ensembles of Protein Structures 439

JMB MS1697 [31/1/97]

Page 17: Molecular Docking to Ensembles of Protein Structures › ... › kuntz-dock-to-ensemble-JMB.pdf · 2001-04-19 · Molecular Docking to Ensembles of Protein Structures Ronald M. A

Leach, A. R. & Kuntz, I. D. (1992). Conformationalanalysis of ¯exible ligands in macromolecular recep-tor sites. J. Comput. Chem. 13, 730±748.

Lewis, R. A. (1992). Automated site-directed drug de-sign using molecular lattices. J. Mol. Graph. 10, 66±78.

Lin, S. L. (1994). Molecular surface representation bysparse critical points. Proteins: Struct. Funct. Genet.18, 94±101.

McCammon, J. A. & Harvey, S. C. (1987). In Dynamics ofProteins and Nucleic Acids. Cambridge UniversityPress, Cambridge.

Meng, E. C., Schoichet, B. K. & Kuntz, I. D. (1992).Automated docking with grid-based energyevaluation. J. Comput. Chem. 13, 505±524.

Meng, E. C., Gschwend, D. A., Blaney, J. M. & Kuntz,I. D. (1993). Orientational sampling and rigid-bodyminimization in molecular docking. Proteins: Struct.Funct. Genet. 17, 266±278.

Milburn, M. V., Tong, L., de Vos, A. M., BruÈ nger, A.,Yamaizumi, Z., Nishimura, S. & Kim, S.-H. (1990).Molecular switch for signal transduction: structuraldifferences between active and inactive forms ofprotooncogenic ras protein. Science, 247, 939±945.

Mirankar, A. & Karplus, M. (1995). An automatedmethod for dynamic ligand design. Proteins: Struct.Funct. Genet. 23, 472±490.

Mizutani, M. Y., Tomioka, N. & Itai, A. (1994). Rationalautomatic search method for stable docking modelsof protein and ligand. J. Mol. Biol. 243, 310±326.

Norel, R., Fischer, D., Wolfson, H. J. & Nussinov, R.(1994). Molecular surface recognition by a computervision-based techniquw. Protein Eng. 7, 39±46.

Oshiro, C. M., Kuntz, I. D. & Dixon, J. S. (1995). Flexibleligand docking using a genetic algorithm. J. Comput.-aided Mol. Design, 9, 113±130.

PriveÂ, G. G., Milburn, M. V., Tong, L., de Vos, A. M.,Yamaizumi, Z., Nishimura, S. & Kim, S.-H. (1992).X-ray crystal structures of transforming p21 rasmutants suggest a transition-state stabilization forGTP hydrolysis. Proc. Natl Acad. Sci. USA, 89,3649±3653.

Rarey, M., Kramer, B., Lengauer, T. & Klebe, G. (1996).A fast ¯exible docking method using an incremen-tal construction algorithm. J. Mol. Biol. 261, 470±489.

Rutenber, E., Fauman, E. B., Keenan, R. J., Fong, S.,Furth, P. S., Ortiz de Montellano, P. R., Meng, E.,Kuntz, I. D., De Camp, D. L., Salto, R., Rose, J. R.,Craik, C. S. & Stroud, R. M. (1993). Structure of anon-peptide inhibitor complexed with HIV-1protease. J. Biol. Chem. 268, 15343±15346.

Scheidig, A. J., Franken, S. M., Corrie, J. E. T., Reid,G. P., Wittinghofer, A., Pai, E. F. & Goody, R. S.(1995). X-ray crystal structure analysis of the cataly-tic domain of the oncogene product p21H-ras com-plexed with caged GTP and Mant dGppNHp. J. Mol.Biol. 253, 132±150.

Shoichet, B. K., Bodian, D. L. & Kuntz, I. D. (1992). Mol-ecular docking using shape descriptors. J. Comput.Chem. 13, 380±397.

Sutcliffe, M. J. (1993). Representing an ensemble ofNMR-derived protein structures by a singlestructure. Protein Sci. 2, 936±944.

Swain, A. L., Miller, M. M., Green, J., Rich, D. H.,Schneider, J., Kent, S. B. H. & Wlodawer, A. (1990).X-ray crystallographic structure of a complexbetween a synthetic protease of human immunode-®ciency virus 1 and a substrate based hydroxyethy-lamine inhibitor. Proc. Natl Acad. Sci. USA, 87,8805±8809.

Weiner, S. J., Kollman, P. A., Case, D. A., Singh, U. C.,Ghio, C., Alagona, G., Profeta, S., Jr & Weiner, P. A.(1984). A new force ®eld for molecular mechanicssimulation of nucleic acids and proteins. J. Am.Chem. Soc. 106, 765±784.

Weiner, S. J., Kollman, P. A., Hguyen, D. T. & Case,D. A. (1986). An all atom force ®eld for simulationsof proteins and nucleic acids. J. Comput. Chem. 7,230±252.

Welch, W., Ruppert, J. & Jain, A. N. (1996). Hammer-head: fast, fully automated docking of ¯exibleligands to protein binding sites. Chem. Biol. 3, 449±462.

WuÈ thrich, K. (1986). In NMR of Proteins and NucleicAcids. John Wiley & Sons, New York.

Yamazaki, T., Hinck, A. P., Wang, Y. X., Nicholson,L. K., Torchia, D. A., Wing®eld, P., Stahl, S. J.,Kaufman, J. D., Chang, C. H., Domaille, P. J. &Lam, P. Y. S. (1995). Three dimensional solutionstructure of the HIV-1 protease complexed withDMP323, a novel cyclic urea-type inhibitor, deter-mined by nuclear magnetic resonance spectroscopy.Protein Sci. 5, 495±506.

Zanotti, G., Malpeli, G. & Berni, R. (1993). The inter-action of N-ethyl retinamide with plasma retinol-binding protein (RBP) and the crystal structure ofthe retinoid-RBP complex at 1.9 AÊ resolution. J. Biol.Chem. 268, 24873±24879.

Zanotti, G., Marcello, M., Malpeli, G., Folli, C., Sartori,G. & Berni, R. (1994). Crystallographic studies oncomplexes between retinoids and plasma retinol-binding protein. J. Biol. Chem. 26, 29613±29620.

Edited by B. Honig

(Received 6 October 1996; received in revised form 31 October 1996; accepted 5 November 1996)

440 Molecular Docking to Ensembles of Protein Structures

JMB MS1697 [31/1/97]