
Asparagine and Glutamine: Using Hydrogen Atom Contacts in ...kinemage.biochem.duke.edu/teaching/workshop/CSHL2012/pdfs/199… · Asparagine and Glutamine: Using Hydrogen Atom Contacts


Asparagine and Glutamine: Using Hydrogen Atom Contacts in the Choice of Side-chain Amide Orientation

J. Michael Word, Simon C. Lovell, Jane S. Richardson and David C. Richardson*

Biochemistry Department, Duke University, Durham, NC 27710-3711, USA

Small-probe contact dot surface analysis, with all explicit hydrogen atoms added and their van der Waals contacts included, was used to choose between the two possible orientations for each of 1554 asparagine (Asn) and glutamine (Gln) side-chain amide groups in a dataset of 100 unrelated, high-quality protein crystal structures at 0.9 to 1.7 Å resolution. For the movable-H groups, each connected, closed set of local H-bonds was optimized for both H-bonds and van der Waals overlaps. In addition to the Asn/Gln "flips", this process included rotation of OH, SH, NH3+, and methionine methyl H atoms, flip and protonation state of histidine rings, interaction with bound ligands, and a simple model of water interactions. However, except for switching N and O identity for amide flips (or N and C identity for His flips), no non-H atoms were shifted. Even in these very high-quality structures, about 20% of the Asn/Gln side-chains required a 180° flip to optimize H-bonding and/or to avoid NH2 clashes with neighboring atoms (incorporating a conservative score penalty which, for marginal cases, favors the assignment in the original coordinate file). The programs Reduce, Probe, and Mage provide not only a suggested amide orientation, but also a numerical score comparison, a categorization of the marginal cases, and a direct visualization of all relevant interactions in both orientations. Visual examination allowed confirmation of the raw score assignment for about 40% of those Asn/Gln flips placed within the "marginal" penalty range by the automated algorithm, while uncovering only a small number of cases whose automated assignment was incorrect because of special circumstances not yet handled by the algorithm. It seems that the H-bond and the atomic-clash criteria independently look at the same structural realities: when both criteria gave a clear answer they agreed every time. But consideration of van der Waals clashes settled many additional cases for which H-bonding was either absent or approximately equivalent for the two main alternatives. With this extra information, 86% of all side-chain amide groups could be oriented quite unambiguously. In the absence of further experimental data, it would probably be inappropriate to assign many more than this. Some of the remaining 14% are ambiguous because of coordinate error or inadequacy of the theoretical model, but the great majority of ambiguous cases probably occur as a dynamic mix of both flip states in the actual protein molecule. The software and the 100 coordinate files with all H atoms added and optimized and with amide flips corrected are publicly available.

© 1999 Academic Press

Keywords: side-chain amide orientation; hydrogen atom placement; Asn/Gln flips; hydrogen bond network; small-probe contact dots

*Corresponding author

E-mail address of the corresponding author: [email protected]

Abbreviation used: PDB, Protein Data Bank.

Article No. jmbi.1998.2401 available online at http://www.idealibrary.com on J. Mol. Biol. (1999) 285, 1735–1747

0022-2836/99/041735–13 $30.00/0 © 1999 Academic Press


Introduction

Correct assignments of the NH2 versus the O branches of Asn and Gln side-chain amide groups are a relatively minor part of a protein structure determination, but they can be quite important if the residue is involved in H-bonding at the active site, or if one wants to analyze bound water molecules, H-bond networks, or detailed electrostatics. However, such assignments have been considered difficult, and sometimes are not even attempted. The Protein Data Bank (PDB; Bernstein et al., 1977) format actually provides a special ambiguous atom designator of A, meaning "either N or O", for use by crystallographers who want to keep the uncertainty explicit. Finding the NH2 hydrogen atoms or telling apart the N and O atoms by direct observation in the electron-density map is not possible except at extremely high resolution. The distinction, therefore, is almost always made on the indirect evidence of H-bonding possibilities, usually done by inspection as part of the model-fitting process.

The most secure assignments are when the environment includes H-bonding groups that are either obligate donors such as a peptide NH group (which can interact only with the O branch of the amide) or obligate acceptors such as a carboxyl group (which can interact only with the NH2 branch). In the frequent cases where the surrounding groups are ambiguous donors or acceptors (OH, His, other amide groups, or water molecules), assignment involves analyzing the entire local network of H-bonds. The energy terms in refinement protocols are capable, in principle, of favoring the amide orientation with the best H-bonding, but in practice the 180° flip between the two orientations is a large enough step that minimization will always, and molecular dynamics sometimes, be trapped in one of the local minima. Even after full analysis, many H-bond networks have two equally favorable solutions that involve concerted exchange of all donors and acceptors, many Asn or Gln amide groups are undetermined because they are exposed at the surface, and some make only non-polar interactions. Histidine residues also have a similar assignment problem, since a 180° flip of the imidazole ring exchanges N and C atoms in the δ and ε positions. In place of the amide ambiguities of H-bond donor versus acceptor, for His the choice is a more drastic one between a polar, or even charged, NH and a CH with only very weak H-bonding potential; however, imidazole orientation can also be ambiguous.
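The obligate donor/acceptor reasoning above can be sketched as a small voting rule. This is a minimal illustration and not code from Reduce. The `Role` categories, the function names `implied_vote` and `suggest_orientation`, and the convention that partners are listed by which currently labelled atom (O or N) they contact are all hypothetical. Ambiguous partners deliberately contribute no vote, which is why those cases need analysis of the whole local network.

```python
from enum import Enum
from typing import Iterable


class Role(Enum):
    """H-bonding character of a neighboring group (illustrative categories)."""
    OBLIGATE_DONOR = "donor"        # e.g. a backbone peptide NH
    OBLIGATE_ACCEPTOR = "acceptor"  # e.g. a carboxylate oxygen
    AMBIGUOUS = "ambiguous"         # e.g. OH, His, another amide, water


def implied_vote(partner: Role, current_label: str) -> str:
    """Vote 'keep', 'flip' or 'none' for one partner H-bonded to the terminal
    amide atom whose current label in the file is 'O' or 'N'."""
    if partner is Role.OBLIGATE_DONOR:
        required = "O"   # a donor can only H-bond to the amide oxygen
    elif partner is Role.OBLIGATE_ACCEPTOR:
        required = "N"   # an acceptor can only H-bond to the NH2 group
    else:
        return "none"    # ambiguous partners give no constraint
    return "keep" if required == current_label else "flip"


def suggest_orientation(near_o: Iterable[Role], near_n: Iterable[Role]) -> str:
    """Combine votes from partners near the atom currently labelled O and the
    atom currently labelled N: 'keep', 'flip', 'conflict' or 'undetermined'."""
    votes = {implied_vote(p, "O") for p in near_o}
    votes |= {implied_vote(p, "N") for p in near_n}
    votes.discard("none")
    if votes == {"keep"}:
        return "keep"
    if votes == {"flip"}:
        return "flip"
    if votes:
        return "conflict"      # each orientation is contradicted by some partner
    return "undetermined"      # needs network analysis or van der Waals contacts
```

For example, a backbone NH next to the atom currently labelled O gives `"keep"`, while a carboxylate next to that same atom gives `"flip"`.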

Several automated methods have been developed to help deal with this problem. HBPLUS (McDonald & Thornton, 1994) tries the flip states of each Asn, Gln, and His, and chooses the alternative that minimizes unsatisfied buried H-bonding groups, dividing the prior Asn/Gln/His orientations into highly favored, slightly favored, indifferent, slightly suspect, and highly suspect; however, it does not deal with pairs or larger interacting groups. NETWORK (Bass et al., 1992) analyzes H-bond networks to optimize polar H placement, but does not allow for amide or imidazole flips. WhatIf (Hooft et al., 1996) deals with both aspects of the problem, including even the assignment of H positions for all crystallographically located water molecules with occupancy >0.5; it builds in crystal symmetry, and has a penalty bias against flips in marginal cases. Inclusion of the water hydrogen atoms makes the combinatorial problem so huge that it cannot possibly be treated exhaustively, so it is done by a variant of simulated annealing. WhatIf does a thorough and careful job of analyzing the H-bond networks, coming out with a decision for all the ambiguous polar groups; using it would improve assignments for the majority of structures. This feature is just one part of an overall package with many other valuable functionalities. Its disadvantages for amide assignment are that it relies strongly on placement of water molecules, which are the least reliable feature in macromolecular structures; its output is not convenient; and its answers cannot be critically evaluated, because there are no estimates of confidence and the reasons for its choices are well hidden inside a complex, stochastic process.

We are now revisiting this problem because our small-probe contact dot methodology (Word et al., 1999) uncovers a source of independent new information from the analysis of van der Waals clashes for explicit H atoms, so that these decisions become less complex and much less subtle. The reasons for a given choice can easily be expressed both as numerical scores and in a visual display that explicitly shows all relevant positive and negative interactions (e.g. Figure 1), so that the user can easily evaluate confidence levels for a given choice. The method is applied here to optimizing the H-bond networks and assigning Asn, Gln, and His flips in a set of very high-resolution crystal structures.

Results

Individual Asn/Gln examples

The basic point of this new approach is that the van der Waals interactions of polar H atoms are crucial to ruling out incorrect amide orientations, even if they are not necessary for evaluating the energy of correct hydrogen bonding (see Procedures). The NH hydrogen extends 0.6 Å farther than the bare oxygen, which alters the van der Waals interactions so drastically that these amide flip choices generally become blatantly evident rather than subtle, once hydrogen atoms are added by the program Reduce and contacts are scored and examined with small-probe dots generated by the program Probe (see Procedures). Figure 2 shows one of the many really obvious cases, which has excellent H-bonding in the correct orientation and extreme van der Waals clashes in the incorrect orientation; the un-normalized score comparison (see Procedures) is +0.3 versus −7.0. This Gln could be assigned correctly by any person or computer program explicitly considering it, since the H-bonding is completely unambiguous. However, some cases this obvious were found to be misassigned in our reference dataset, including some in proteins refined using molecular dynamics and even a few in structures at atomic resolution. This particular example, Gln90 from the immunoglobulin VL dimer 1REI at 2.0 Å resolution, used "A" atom designations to leave the amide assignment explicitly ambiguous.

1736 Asn/Gln Amide Flips, Using Contact Dots
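The clash criterion rests on a simple surface-to-surface gap between two atoms. Below is a minimal sketch of gap-based classification for a single atom pair, loosely following the categories named in the Figure 1 caption (H-bonds, contacts, slight overlaps, clashes). It does not generate contact dots the way Probe does. The 1.0 Å polar-H radius and the 0.4 Å serious-overlap threshold come from this paper. The heavy-atom radii, the 0.25 Å and 0.5 Å contact boundaries, and all names are illustrative assumptions, and the caller is assumed to know whether a pair is an H-bond.

```python
import math

# Assumed van der Waals radii in Å. The 1.0 Å polar-H radius is the value
# discussed in the text; the heavy-atom radii are illustrative placeholders.
RADII = {"H_polar": 1.0, "N": 1.55, "O": 1.40, "C": 1.70}

SERIOUS_OVERLAP = 0.4  # Å: overlaps at or beyond this count as a serious clash


def gap(type_a: str, xyz_a, type_b: str, xyz_b) -> float:
    """Surface-to-surface gap in Å; negative values are overlaps."""
    return math.dist(xyz_a, xyz_b) - (RADII[type_a] + RADII[type_b])


def classify_contact(g: float, is_hbond: bool = False) -> str:
    """Bin one atom-pair gap into contact-dot style categories.
    The 0.25 and 0.5 Å contact boundaries are assumptions for illustration."""
    if is_hbond:
        return "H-bond"          # H-bond overlaps are scored separately
    if -g >= SERIOUS_OVERLAP:
        return "clash"
    if g < 0:
        return "small overlap"
    if g < 0.25:
        return "close contact"
    if g < 0.5:
        return "wide contact"
    return "no contact"
```

With these assumed radii, two polar H atoms 2.5 Å apart have a gap of +0.5 Å, whereas any non-H-bonded pair overlapping by 0.4 Å or more is reported as a clash.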

Many other cases cannot be determined by the H-bonds alone, but are unambiguous if the van der Waals interactions are included. Figure 3 shows an example (from ribonuclease F1) which has no H-bonds, but where the non-polar interactions accommodate one amide orientation very nicely but not the other. In one flip state the NH2 group of Gln57 nestles neatly against the backbone, while in the other flip state it collides with a proline side-chain. The score comparison is +1.0 versus −1.4.

Figure 1. Small-probe (radius 0.25 Å) contact dots around Gln71 of cutinase (1CUS), colored by contact gap and including favorable van der Waals contacts (green and blue dots) as well as H-bonds (pale green dots), slight overlaps (short yellow spikes), and clashes (orange or red spikes; none present).

Figure 2. (a) and (b) Amide flip comparison for Gln90 from the immunoglobulin VL dimer of 1REI (Epp et al., 1975), colored by atom type (O, red; N, blue; C, white) and with clashes emphasized by spikes. There are three H-bonds and no clashes in the correct flip position (a) versus no H-bonds and three serious clashes of the NH2 hydrogen atoms in the incorrect flip position (b). The probe radius is 0.25 Å.

Figure 3. (a) and (b) Amide flip comparison for Gln57 in the 1FUS ribonuclease F1 (Vassylyev et al., 1993), which has no H-bonding but whose orientation is unambiguously determined by the NH2 clashes in the position of (b), mainly with the Hβ of a Pro side-chain. In contrast, it fits well against the backbone in (a). Probe radius is 0.25 Å.

Figure 4 illustrates a pair of doubly H-bonded side-chain amide groups from the fungal peroxidase 1ARU, for which two of the four possible orientations are equally good if only H-bonding is considered. Such situations are rather common, either for pairs or for larger H-bond networks, in which switching all donors and acceptors in unison can produce equally good H-bonding. However, as in Figure 4, van der Waals clashes usually rule out one of the two best H-bond possibilities: in this case, the original assignment in the coordinate file has good H-bonds but bad clashes of both amide NH2 groups with their own Hα atoms, while flipping both amide groups gives very much better van der Waals contacts and equally good H-bonding. The score comparison is −0.4 versus +3.6 for the original and double-flip states, respectively (compared with −5.0 and −6.7 for the two single-flip states). A slight shear offset between the two side-chains is visible in Figure 4, which puts the two polar H atoms farther apart (2.5 Å) in the better orientation. That does not necessarily mean that the amide H radius needs to be larger than 1.0 Å, however, because the two polar H atoms are presumably shifted apart by electrostatic as well as van der Waals repulsion. Unfortunately, our dataset cannot calibrate the consistency of such a shear offset, because it happens that each of the three other doubly H-bonded Asn/Gln pairs has one of the amide groups incorrectly oriented by our criteria, which seems to have caused refinement to slightly distort the interactions.

Systematic surveys

The reference datasets for 100 proteins contain 1554 unique Asn and Gln residues, 1539 of which have no missing atoms, metal ligation, or covalent modifications. To provide a cross-check on this new methodology (and to identify unusual situations that could be handled by improving the algorithm), the Asn and Gln residues were systematically surveyed three times, each time by a different combination of contact score comparison (see Procedures) and inspection of their small-probe contact dots in the Mage display program. The first time through, each Asn or Gln with B < 40 was examined if its individual flipped score was not clearly worse than its original score; H-bond interactions between multiple residues were assessed visually. This process resulted in flipping 252 Asn/Gln residues (17% of the total) in 71 of the files. For several files the Asn/Gln flip rate was approximately random (near 50%), implying that the amide groups had not been examined (note: 451C (Matsuura et al., 1982) uses the ambiguous "A" atom designations).

For the second-round survey, an algorithm was developed to analyze full H-bond networks automatically, as part of the Reduce program. The orientations of 47 His, 17 Cys, ten OH, nine Asn, and three Gln residues are fixed by metal ligation, and those of three Asn and several Cys, Ser, and Tyr residues by various other covalent modifications; the Asn/Gln modifications are listed in Table 1. This leaves 6548 unique movable-H groups, including Asn, Gln and His residues, OH, SH, NH3+ and Met methyl groups, and similar functionalities in the small-molecule "heterogens" (the accompanying paper explains why methyl groups are rotated only in Met side-chains). Those groups were then partitioned into closed sets of local interacting networks or "cliques" (see Procedures). Of the movable groups, 5050 were found to be isolated from any other; there are 557 interacting pairs, 94 triples, 14 cliques of four, eight cliques of five, one clique of six, and no larger groupings.
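The partitioning step amounts to finding connected components of a contact graph. Note that a "clique" in this paper means a closed set of interacting groups, not a complete subgraph in the graph-theory sense. The sketch below is a minimal illustration under assumptions. The rule for deciding which pairs "interact" (any alternative H position of one group within reach of the other) is assumed and is left to the caller. The function name and the extra residue Ser10 are hypothetical.

```python
from collections import defaultdict
from typing import Hashable, Iterable, List, Tuple


def partition_into_cliques(groups: Iterable[Hashable],
                           contacts: Iterable[Tuple[Hashable, Hashable]]) -> List[list]:
    """Split movable groups into closed interacting sets (connected components).

    `contacts` lists pairs of groups that can interact in at least one of their
    alternative H positions; building that list is assumed to happen elsewhere.
    """
    neighbors = defaultdict(set)
    for a, b in contacts:
        neighbors[a].add(b)
        neighbors[b].add(a)

    seen, cliques = set(), []
    for start in groups:
        if start in seen:
            continue
        seen.add(start)
        stack, component = [start], []
        while stack:                       # iterative depth-first search
            node = stack.pop()
            component.append(node)
            for nb in neighbors[node]:
                if nb not in seen:
                    seen.add(nb)
                    stack.append(nb)
        cliques.append(component)
    return cliques


cliques = partition_into_cliques(
    ["Asn138", "His123", "His131", "Ser10"],        # Ser10 is hypothetical
    [("Asn138", "His123"), ("His123", "His131")],
)
```

Here the Asn-His-His chain forms one clique of three and the isolated group forms a clique of one. As a consistency check on the counts above, 5050 + 2×557 + 3×94 + 4×14 + 5×8 + 6 = 6548, the total number of movable groups.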

The clique score (using both favorable H-bonds and unfavorable overlaps, including a simple model for interaction with the crystallographic water molecules) is evaluated for all combinations of possible H atom positions, in order to choose the optimal arrangement. Most movable groups have either two, four, or six potential H atom positions, while an OH or SH is given anywhere from two to as many as 18 candidate positions, the largest numbers being used in the most crowded environments (see Procedures). Since the largest clique found had only six members, an exhaustive search is computationally tractable: it takes three hours to do all 100 proteins on an R10000 SGI Indigo II.
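Because cliques are small, the search can simply enumerate the Cartesian product of every group's candidate states. The sketch below shows that exhaustive search plus the per-residue "opposite flip state" comparison described in the next paragraph. It is a minimal illustration under assumptions. The toy clique only borrows the shape of the 1CEM Asn-His-His network (two Asn states, six per His). Modelling the six His states as 2 ring flips × 3 protonation choices is an assumption. `toy_score` and all other names are hypothetical, and `toy_score` is not Reduce's scoring function.

```python
from itertools import product
from typing import Callable, Dict, List, Tuple

Assignment = Dict[str, object]


def best_arrangement(states: Dict[str, List[object]],
                     score: Callable[[Assignment], float]) -> Tuple[float, Assignment]:
    """Score every combination of candidate states for one clique and return
    (best_score, best_assignment). On ties the first combination found wins."""
    names = list(states)
    best_score, best = float("-inf"), {}
    for combo in product(*(states[name] for name in names)):
        assignment = dict(zip(names, combo))
        s = score(assignment)
        if s > best_score:
            best_score, best = s, assignment
    return best_score, best


def best_with_restriction(states: Dict[str, List[object]],
                          score: Callable[[Assignment], float],
                          group: str,
                          allowed: Callable[[object], bool]) -> Tuple[float, Assignment]:
    """Best clique score when `group` is limited to the states passing `allowed`
    (for example, only its opposite flip state)."""
    restricted = dict(states)
    restricted[group] = [st for st in states[group] if allowed(st)]
    return best_arrangement(restricted, score)


# Toy clique shaped like the 1CEM Asn-His-His network: 2 Asn states and
# 6 His states each (2 ring flips x 3 assumed protonation choices) = 72 combos.
HIS_STATES = [(flip, h) for flip in ("orig", "flip") for h in ("HD1", "HE2", "both")]
STATES = {"Asn138": ["orig", "flip"], "His123": HIS_STATES, "His131": HIS_STATES}


def toy_score(a: Assignment) -> float:
    """Hypothetical scores for illustration only."""
    s = 1.0 if a["Asn138"] == "orig" else -0.7
    s += 0.5 if a["His123"] == ("orig", "HE2") else 0.0
    return s


best_score, best = best_arrangement(STATES, toy_score)
flipped_score, _ = best_with_restriction(STATES, toy_score, "Asn138", lambda st: st == "flip")
```

The difference between `best_score` and `flipped_score` is the kind of per-residue score comparison that Reduce reports. The cost grows multiplicatively with clique size, which is why the small cliques found here keep exhaustive enumeration practical.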

Figure 4. (a) and (b) A double amide flip of the Asn128-Gln34 pair in the fungal peroxidase 1ARU (Flory, 1969) that cannot be resolved just by H-bonding. In the incorrect double flip (b), there is a very bad clash of the Gln NH2 with Hα and a smaller clash of the Asn NH2, whereas the amide flip state shown in (a) is accommodated well. There is also a shear offset between the amide groups that puts the two NH groups at a farther, and more favorable, distance in (a). Here and in Figures 6 and 8, the contact dots are simplified by showing only the H-bonds and the overlaps, not the attractive van der Waals contacts.

For each residue in a clique, the best total clique score and the best conformation are reported plus, for Asn/Gln/His, the best total clique score found with that residue in the opposite flip state. This score comparison directly shows how sensitive, or well-determined, the flip state is for that specific residue. The H atom positions for the best-scoring arrangement are added to the output PDB-format coordinate file, and N versus O or N versus C identities are switched where indicated for Asn/Gln/His flips. Agreement with decisions made in the first-round survey was used to optimize the value of a penalty against changing the depositor-assigned flip state; the penalty was set at 0.5, which means that the score difference must favor the flipped state by at least 0.5 or Reduce will not assign a flip. Any flip state for which an atom in the movable group has a serious clash (overlap ≥ 0.4 Å) is flagged with a "!". If both the best state and the best flip state have clashes, then non-H atoms must be badly placed, and B-factors are usually high on at least one side of the clash. If it is not practical to evaluate these cases individually, then they should be omitted from any further analysis. For the automated algorithm, therefore, we have adopted the conservative policy of not assigning any flips for these double-clash cases.
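The automated decision rule in the preceding paragraph can be sketched as a single function. This is a minimal illustration, not Reduce's source. The 0.5 penalty, the 0.4 Å serious-overlap flag and the double-clash policy come from the text. The boundary handling follows the marginal range −0.5 ≤ Δ < 0.5 given in the summary of final assignments. The function name and category strings are hypothetical. The examples reuse score comparisons reported in this paper: the 1ARU pair (original −0.4 versus double flip +3.6), 1CEM Asn138 (+5.4 versus −0.7), 1MLA Gln61 (difference 0.48; the absolute values below are made up) and 2OLB Asn366 (+0.2! versus −8.8!). Round-3 visual inspection later overrode some of these automated outcomes, for example promoting 1MLA Gln61 to a flip.

```python
FLIP_PENALTY = 0.5  # score units; value chosen in the text


def assign_flip(orig_score: float, flip_score: float,
                orig_clash: bool, flip_clash: bool,
                penalty: float = FLIP_PENALTY) -> str:
    """Automated flip decision in the spirit of round 2.

    orig_score / flip_score: best clique scores with the residue in its
    deposited orientation and in the opposite flip state.
    orig_clash / flip_clash: True if that state has a serious clash
    (overlap >= 0.4 Å), i.e. it would be flagged with '!'.
    """
    if orig_clash and flip_clash:
        return "clash"        # double clash: leave as deposited
    delta = flip_score - orig_score
    if delta >= penalty:
        return "flip"
    if delta < -penalty:
        return "keep"
    return "unknown"          # marginal (-0.5 <= delta < 0.5): leave as deposited


print(assign_flip(-0.4, 3.6, False, False))   # 1ARU pair, double flip favored
print(assign_flip(5.4, -0.7, False, False))   # 1CEM Asn138, deposited state correct
print(assign_flip(0.0, 0.48, False, False))   # 1MLA Gln61-like marginal case
print(assign_flip(0.2, -8.8, True, True))     # 2OLB Asn366, clashes both ways
```

Only the "flip" outcome changes the coordinates. The "clash" and "unknown" outcomes both leave the deposited orientation in place, which is what makes the policy conservative.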

The result of the automated analysis of round 2 is 100 coordinate files with all H atoms added and optimized, and all changes and assignments documented in the file headers, plus contact dot kinemages which animate between the two flip states, one set for Asn/Gln and another set for His side-chains. Out of the 1554 Asn + Gln side-chains, Reduce flipped 290, or 19%, of them (see Figure 5). The rest were all left in their original orientation, including 49 with bad clashes both ways (3%) and 314 with small score differences between −0.5 and +0.5. Out of 379 His side-chains, 30 were flipped (8%), 27 had bad clashes both ways (7%), and 49 had small score differences. Only 17 Asn, 29 Gln, and 12 His (3% of the total) residues are so completely exposed that they had scores of exactly zero in both orientations.

Table 1. Side-chain amide flips of Asn and Gln (round 3)

PDB  N + Q(a)  Fixed(b)  Keep  Clash(c)  Unk.(d)  Flip

1aac 3 3
1ads 27 15 5 7
1aky 20 14 5 1
1amm 16 9 1 6
1arb 26 16 1 9
1aru 26 1 CHO 15 4 6
1benAB 6 4 1 1
1bkf 6 2 2 2
1bpi 4 4
1cem 37 24 9 4
1cka 4 2 1 1
1cnr 3 3
1cnv 25 15 4 6
1cpcB 12 1 CH3 3 1 2 5
1cse 33 2 Ca 21 3 7
1ctj 10 6 1 3
1cus 15 9 3 3
1dad 18 9 4 5
1dif 9 5 3 1
1edmB 5 1 Ca 4
1etm 1 1
1ezm 27 9 1 4 13
1fnc 17 12 3 2
1fus 14 11 3
1fxd 3 3
1hfc 14 8 1 1 4
1ifc 12 8 4
1igd 4 3 1
1iro 4 4
1isuA 7 6 1
1jbc 19 1 Ca 12 1 1 4
1kap 53 1 Ca 44 2 6
1knb 20 10 4 6
1lam 36 24 1 11
1lit 12 6 4 2
1lkk 10 5 1 3 1
1lucB 34 19 3 12
1mctI 0
1mla 25 12 3 10
1mrj 29 19 4 6
1nfp 30 18 1 7 4
1nif 21 17 1 3
1not 1 1
1osa 11 2 Ca 4 1 4
1phb 34 12 1 8 13
1php 15 8 3 4
1plc 7 7
1poa 14 9 2 3
1ptf 6 3 3
1ptx 5 4 1
1ra9 9 4 1 4
1rcf 17 13 2 2
1rgeA 7 5 1 1
1rie 6 5 1
1rro 11 1 Ca 4 2 4
1sgpI 5 3 1 1
1smd 52 1 Ca 42 4 5
1snc 10 6 3 1
1sriA 10 6 1 3
1tca 32 1 CHO 28 3
1ttaA 3 2 1
1whi 6 4 2
1xic 21 14 1 2 4
1xsoA 12 9 1 2
1xyzA 45 27 1 7 10
256bA 12 8 4
2ayh 23 15 1 2 5
2bopA 11 1 Yb 5 2 2 1
2cba 21 16 3 2
2ccyA 8 6 2
2cpl 12 12
2ctc 25 16 2 1 6
2end 9 8 1
2er7 15 4 2 9
2erl 3 1 2
2hft 21 16 3 2
2ihl 16 12 2 2
2mcm 7 6 1
2mhr 6 5 1
2msbA 10 2 Ca 6 2
2olb 54 46 1 4 3
2phy 11 8 3
2rhe 8 7 1
2rn2 15 8 1 6
2trxA 7 7
3b5c 5 5
3chy 10 8 2
3ebx 7 4 1 2
3grs 25 15 1 3 6
3lzm 17 7 7 3
3pte 34 23 3 8
3sdhA 16 11 5
451c 8 2 1 5
4fgf 9 8 1
4ptp 26 13 6 7
5p21 15 7 8
7rsa 17 14 3
8abp 20 17 2 1
bio1rpo 5 3 2
bio2wrp 10 3 1 6

Totals 1554 15 1006 20 195 318

(a) Total number of Asn + Gln.
(b) Number of amide orientations fixed by covalent modifications: by metals (Ca or Yb), carbohydrates (CHO), or methylation (CH3).
(c) Number with severe clashes (overlap ≥0.4 Å) in both orientations.
(d) Number classified as "Unknown" (score difference <0.5), even after individual examination.

As an example of Reduce's automated analysis of an H-bond network, Figure 6 shows the two principal alternatives for the linear Asn138-His123-His131 clique in cellulase (correctly assigned in the PDB file 1CEM); there are two states for the Asn and six for each His, giving a total of 72 possibilities. There is a water molecule at each end of the network, and internally all H-bond acceptors and donors can be switched, giving four equivalent H-bonds in the two best states. When H atom clashes are considered, however, they occur in (and unambiguously disfavor) only one of these two overall flip states: the score comparison is +5.4 versus −0.7.

Because this is a new method, and because we want these modified coordinate files to be a reliable basis for future analyses, we undertook a third-round survey in which both flip alternatives and their contact dots were visually examined in Mage for all of the Asn/Gln/His residues that the automated algorithm had flipped, for all of the small number of cases where Reduce (round 2) disagreed with the round 1 assignments, and for all cases in a subset of 20 files. Out of 290 amide flips recommended by Reduce, only 15 were rejected in round 3 (5% of the flips, or 1% of all Asn/Gln amides), of which 12 were declared ambiguous and only three as clear "Keep" residues.

Figure 5. Categories of side-chain amide assignments for each of the 100 proteins, sorted in decreasing order of total Asn + Gln residues. Here, the "Keep + fixed" category includes the few fixed by covalent modifications, and the "Clash + unknown" category includes both the "C" (double-clash) and "X" (low-score difference) groups. N stands for Asn, Q for Gln.

Figure 6. (a) Correct versus (b) flipped arrangements of the Asn138-His123-His131 H-bonding network from the 1CEM cellulase (Alzari et al., 1996). All four H-bonds are equivalent in the two forms, which are distinguished by clashes of the Asn NH2 group in (b). Van der Waals contact dots are not shown.

During this process, it became obvious that many of the double-clash cases and the marginal cases with low score differences could actually also be assigned an orientation with confidence. Therefore, round 3 was expanded to include visual examination of all Asn/Gln/His in those two categories. Most of the resulting reassignments involve confirming the orientation indicated by the flip scores: for example, Gln61 of 1MLA, malonyl CoA carrier protein (Serre et al., 1995), with a score difference of 0.48 was promoted from marginal to flipped, because it can make weak H-bonds both to backbone and to a Glu Oε in the preferable flipped orientation. However, there is a small but significant number of cases where the visual assignment contradicts the direction of the score difference. Sometimes these involve factors that are not yet included in the algorithm but which could be (such as the influence of a charged group slightly too far away to score as an H-bond), while sometimes the analysis involves judging the relative probability of different types of errors in ways it is hard to imagine automating. As an example of the latter sort, Gln260 of 1ARU (fungal peroxidase) has a score comparison of −1.0 versus +0.1 and a bad clash of the NH2 with Cδ of Glu325 in the original orientation, yet our judgment agrees unambiguously with the original crystallographic assignment: the low-B-factor Gln260 in its flipped orientation has only a water H-bond and its Oε is very close to two other oxygen atoms, while the original orientation adds a backbone CO H-bond and improves contacts, and a minor rotation of the high-B-factor Glu325 could convert the clash into an H-bond with the Glu Oε. Such cases emphasize the point that although the automatic algorithm does a very good job, the most reliable assignments combine the automated analysis with visual inspection.

Summary of final assignments

The third-round survey results in five categories of Asn/Gln residues, summarized in Table 1. First is the small subset whose orientation is fixed by metal ligation or covalent modification. The largest category by far (65% of the total) comprises the 1006 Asn/Gln with a clear, unambiguous flip assignment that agrees with their identification in the deposited PDB file. Then there are the side-chain amide groups which unambiguously require flipping: 318 Asn/Gln residues (20.5%) in 73 of the 100 files. For 20 of the Asn/Gln side-chains (1%) whose movable group has a severe clash in both orientations, either they or a neighboring group is positioned incorrectly in such a way that it is unclear which would be the correct amide orientation once the problem was fixed. The fifth and final category comprises 195 still-ambiguous Asn/Gln cases (12.5%), mostly with small score differences (−0.5 ≤ Δ < 0.5). All examples in the two ambiguous categories are left in their original orientation.
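The percentages quoted above follow directly from the Table 1 totals. The short sketch below recomputes them, together with the determined share (fixed, keep and flip) behind the "86%" figure. It is plain arithmetic on the published totals. The function name and dictionary keys are hypothetical.

```python
def category_percentages(counts: dict) -> dict:
    """Percentage of the total in each category, rounded to 0.1."""
    total = sum(counts.values())
    return {name: round(100 * n / total, 1) for name, n in counts.items()}


# Round-3 Asn/Gln totals from Table 1
ASN_GLN = {"Fixed": 15, "Keep": 1006, "Flip": 318, "Clash": 20, "Unknown": 195}

pct = category_percentages(ASN_GLN)
determined = ASN_GLN["Fixed"] + ASN_GLN["Keep"] + ASN_GLN["Flip"]
share_determined = round(100 * determined / sum(ASN_GLN.values()), 1)

print(pct)               # {'Fixed': 1.0, 'Keep': 64.7, 'Flip': 20.5, 'Clash': 1.3, 'Unknown': 12.5}
print(share_determined)  # 86.2
```

The two ambiguous categories (double clash and unknown) together account for the remaining 215 residues, or 13.8%, consistent with the "less than 14%" stated below.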

For each of the 100 proteins, sorted by total Asn + Gln residues, Figure 5 plots the number of "Keep + fixed" Asn/Gln (in light gray), the number of "Clash + unknown" (in cream), and the number of "Flip" Asn/Gln (in dark gray). Twenty-seven files had no flips and four files had 50% or more flips, but overall the distribution is relatively uniform. Somewhat surprisingly, the percentage of amide flips shows no significant relation with resolution. However, the percentage of flips does show a small but significant (p = 0.017) positive relation to the size of the protein, perhaps reflecting time limitations for evaluating correct amide orientation as protein size increases.

For histidine side-chains, there are 47 fixed by metal ligation, 250.5 which unambiguously should be kept in their original orientation (66%), 37.5 which unambiguously require flipping (only 10%), 13 with unresolvable clashes both ways (3.4%), and 31 ambiguous cases with small score differences (8.2%). The non-integral values occur at a dimer interface, as discussed below. Histidine residues show fewer flips than Asn/Gln amide groups, but more often have unresolvable clashes. Those unresolved clashes are almost all with O atoms and, therefore, may be cases of CH···O H-bonding (Derewenda et al., 1995) in His rings.

The marginal Asn/Gln/His cases with a small score difference include: (1) completely exposed side-chains with no neighboring atoms or self clashes; (2) H-bond networks with no external clashes to any alternative H positions and nearly equal scores when donors and acceptors are all switched; (3) H-bond networks across symmetrical dimer interfaces; and (4) conflicts in which each flip state has a favorable interaction incompatible with the other state. Some of these examples make it clear that a flip state can genuinely alternate between two possibilities. For the ROP protein dimer-interface clique of SerA42a-HisA46-HisB46-SerB42b in 1RPO (Vlassi et al., 1994), both the Ser alternate conformation and the His flip state must differ across the two subunits; 1RPO His46 shows up as a half-integer value in the His assignments above, because the two interacting copies must be positioned asymmetrically, with one flipped and the other not. Figure 7 shows the two flip states of His54 in 1PTX (scorpion toxin), each of which has one good H-bond to a water molecule which clashes in the other flip state. Allowance for the positive effect of CH···O H-bonding would decrease the severity of the clashes and allow a water molecule to stay (a bit farther out) when the ring flipped. However, this histidine residue almost certainly occurs in a mixture of the two orientations.

The cases originally assigned as double-clash by Reduce predominantly consist of side-chains that are themselves well defined but are bumped by another slightly mispositioned group, often one with high B-factors. For example, the double-clash Asn366 in 2OLB (oligopeptide-binding protein; Tame et al., 1995) overlaps Hδ of Arg413, but both the scores (0.2! versus −8.8!) and our visual examination make it clear that four good H-bonds in the original orientation are preferable to just one in the flipped orientation. Such cases were reassigned in round 3.

Three of the Asn double-clash examples are consistent with the possibility of chemical deamidation of the side-chain: 1CSE Asn158E (subtilisin; Bode et al., 1987), 1HFC Asn206 (collagenase; Spurlino et al., 1994), and 1LUC Asn12 (luciferase; Fisher et al., 1996), which is shown in Figure 8. The tight interactions around Asn12, including two H-bonds to backbone NH groups, definitely require an Oδ for the left-hand branch of the amide. There is some room around the arginine guanidinium group, and in the unmodified protein it would need to move farther to the right, away from what would then be the NH2 group of Asn. However, the fact that the Arg was observed this close to the side-chain of residue 12 in the crystal structure provides circumstantial evidence that Asn12 may have become deamidated to Asp.

Overall, less than 14 % of all the Asn/Gln amide orientations in the 100 proteins remain undetermined here, once H atom van der Waals interactions are considered. Some of those ambiguous cases reflect our inadequate level of knowledge, but most of them probably do occur in the protein molecule as a mixture of both orientations.

Discussion

The present analysis departs from common practice in two main ways: the use of small-probe contact dots for explicit visualization and quantification of molecular interactions, and the placement of all hydrogen atoms, both polar and non-polar, with inclusion of their van der Waals as well as H-bonding contributions. That additional information made the process of orienting side-chain amide groups much more straightforward, and more often definitive, than the H-bonding analyses used in previous work. H-optimized coordinate files for the 100 high-resolution proteins of our dataset are now available for further structural analysis. The programs Reduce, Probe, and Mage are available for adding and optimizing H atoms, for analyzing the contacts in known macromolecular structures, and for helping in the determination of new structures.

It appears that most side-chain amide groups do indeed have surroundings in the equilibrium protein structure that enforce a unique, and readily identifiable, amide orientation. Assigning those orientations correctly will help in the details of refinement and the identification of water molecules in crystal structure determinations, and will aid in analyses of hydrogen bonding, water structure, side-chain conformations, and ligand binding. In particular, the discovery of serious internal clashes in Asn/Gln side-chains as they are fit even in the highly accurate structures of our dataset (e.g. the severe Gln Hε-Hα clash in Figure 4(b)) implies that there are problems with existing side-chain rotamer libraries. We plan to use the methods and datasets described here to address that issue.

Figure 7. (a) and (b) Evidence for dynamic equilibrium in the flip state of His54 in the scorpion toxin 1PTX (Housset et al., 1994). Although this His ring has good B-factors and makes good contact with the structure behind it, each possible ring flip position makes an Nδ H-bond to a well-ordered water molecule that clashes with the other flip state. Although those clashes can be partly mitigated by considering CH···O H-bonds, presumably His54 occupies both conformations. The probe radius is 0.25 Å.

Figure 8. (a) and (b) Asn12 of 1LUCb luciferase (Fisher et al., 1996), an example whose interactions are consistent with possible side-chain deamidation. The left-hand side is clearly compatible only with an Oδ that can H-bond tightly with two backbone NH groups, as in (a), rather than with an Nδ that would clash impossibly, as in (b). The fact that the Arg guanidinium group was observed in this close position also implies an Oδ on the right-hand branch of the Asn, for both steric and electrostatic compatibility.

Even using H-bonds and van der Waals interactions, there still remain between 10 and 15 % of the side-chain amide groups (and also of the His rings) whose orientation is ambiguous. A few of these cases are due to unresolved problems in the coordinates, and there would be somewhat more such cases in lower-resolution structures. However, at least 10 % of all the side-chains, and thus most of the ambiguous ones, are probably in dynamic equilibrium in the actual protein molecules, such that both flip states would be significantly populated. For some of the unassigned cases (e.g. Figure 7) the amide or ring plane is well defined but the flip is not, while for some of the fully exposed, high B-factor cases (more common for Gln than Asn) even the plane orientation is probably dynamic. Thus we feel that the 14 % of side-chain amide groups left unassigned here mainly represent not a failure of the method, but a successful identification of the cases that should not be assigned. It also follows that such unassignable cases should be omitted or treated separately in any statistical analyses of side-chain conformations or interactions. To this end, we flag these cases in the headers of the modified PDB files.

Here, we have tried to start with a simple and straightforward model and to add complications only when we are convinced of their necessity and are also sure that they will contribute in the right direction, even for the pathological cases caused by occasional coordinate errors. Our initial aim was simply to add hydrogen atoms in order to use contact dots to analyze interior packing in proteins quantitatively. It was immediately obvious that this could not be done without first correcting side-chain amide flips, which led to the study described here. In addition, both the ring flip and the protonation state of histidine residues had to be considered, although that treatment was developed only far enough to avoid incorrect His influence on Asn/Gln orientations, since a complete analysis of His protonation equilibria would require detailed electrostatics and knowledge of the pH.

The major, and entirely necessary, complication in the present method is of course the combinatorial analysis of the H-bond network cliques. The tractability of the clique analysis depends, in turn, on keeping several other aspects of the model simple: Met methyls are the only methyl groups rotated; the probe radius is set to zero so that only H-bond and overlap terms enter the combinatorial search; and, most importantly, interactions with crystallographically located water molecules are included but water-water interactions are not.

Inclusion of water molecules is crucial for the success of the algorithm, but a simplified model that treats their possible H-bonds as completely independent has worked quite well. We preferred not to attempt explicit placement of hydrogen atoms on the water molecules, because errors in position or occupancy for a significant fraction of water molecules (for example, those that are impossibly close to side-chain atoms) could often produce problems that would propagate through the network of water molecules. Interactions across crystal contacts were also deliberately omitted, because we are more interested in what can be learned about the molecular structure than in the crystal structure for its own sake. There are, of course, many efficient search algorithms that could be applied to optimizing the clique scores. However, since the simplest and most guaranteed method, complete enumeration, is fast enough for the cases actually found, it is the preferred choice.

The contact-dot analysis, in general, is based on geometry and atom types rather than on energies. In particular, its treatment of electrostatics is indirect and, so far, only short range: hydrogen-bonding interactions are based on the degree of atomic overlap, with charged H-bonds stronger only because they can overlap further. Certainly those short-range H-bonds are the most important single factor affecting amide orientation. As a future modification, we could incorporate a weighting scheme for the non-overlap contact dots based on atom types, in order to add mid-range electrostatic preferences for distances between grazing contact and a probe diameter farther out. We do not intend to add long-range electrostatics, however: they would not combine easily with the geometrical terms, and any simple, dielectric-based treatment is likely to overestimate their contribution.

Although adding and optimizing hydrogen atoms and determining side-chain amide orientations is an apparently simple set of tasks, the number and detail of the considerations involved, and the variety of unexpected special cases that occur in 100 protein structures, are very large. Therefore, the automated algorithm in Reduce is gradually developing into an expert system. Fortunately, for the most part those developments make it more robust and easier to use.

Procedures

In exchanging the two possible flip orientations for an Asn or Gln amide, we exchange the identities of the N and O atoms rather than performing a 180° bond rotation. These two methods give slightly different results, because the bond lengths and angles are not identical for the N and O branches. In an electron density map, the positions of the amide N/O atoms are more precisely known than that of the central C atom, so this method seems more conservative. Similarly, to flip a His ring, we exchange the identities of the δ and ε C and N atoms; the assigned ring nitrogen atoms are then considered in three protonation states (see details below).
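As a minimal sketch of the renaming approach, assume a residue is held as a plain dict mapping PDB atom name to (x, y, z). The atom-name pairs follow standard PDB nomenclature, but the data layout and the function `flip_by_renaming` are hypothetical illustrations, not code from Reduce. Because only labels move, every heavy-atom coordinate stays exactly where it was deposited.

```python
# Pairs of atoms whose identities are exchanged to "flip" a side-chain.
# Standard PDB atom names; the dict-of-coordinates layout is an assumption.
FLIP_PAIRS = {
    "ASN": [("OD1", "ND2")],
    "GLN": [("OE1", "NE2")],
    # His ring flip: swap the delta pair (ND1 <-> CD2) and the epsilon pair (CE1 <-> NE2)
    "HIS": [("ND1", "CD2"), ("CE1", "NE2")],
}


def flip_by_renaming(resname, atoms):
    """Return a copy of `atoms` (name -> (x, y, z)) with flippable labels exchanged.

    No coordinate is moved or recomputed: the same set of positions is simply
    relabelled. A 180-degree rotation about the last side-chain bond would
    instead generate new positions whenever the two branches differ in bond
    length or angle.
    """
    flipped = dict(atoms)
    for a, b in FLIP_PAIRS[resname]:
        flipped[a], flipped[b] = atoms[b], atoms[a]
    return flipped


asn = {"CB": (0.0, 0.0, 0.0), "CG": (1.5, 0.0, 0.0),
       "OD1": (2.2, 1.0, 0.0), "ND2": (2.1, -1.1, 0.0)}
print(flip_by_renaming("ASN", asn)["OD1"])  # (2.1, -1.1, 0.0)
```

Applying the function twice restores the original labelling, which makes it easy to generate both flip states for comparison.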


Definition of contact dots

Contact dot surfaces are loosely related to the Connolly algorithm for calculating solvent-accessible surfaces on the outside of a molecule (Connolly, 1983), in that a spherical probe is rolled around the van der Waals surface of each atom, visiting each of a set of predefined points, and a dot is drawn if certain tests are satisfied at that position. The differences are that the contact dot algorithm, as implemented by the program Probe and for some calculations in Reduce, uses a very small probe (typically 0.25 Å in radius or smaller, rather than the 1.4 Å radius used for Connolly surfaces) and leaves a dot when the probe does touch another "not-covalently-bonded" atom, rather than when it does not touch another atom. Small-probe dots form discontinuous surfaces, whose patches directly show the location, extent, and shape of close atomic contacts (e.g. Figure 1).
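The dot test described above can be sketched in a few lines. This is an illustration under stated assumptions, not Probe's implementation: the golden-spiral dot layout, the default of 200 dots per atom, and the names `sphere_points` and `contact_dots` are all hypothetical choices made here. The key point is the reversed test relative to Connolly surfaces: a dot survives only when the small probe touches a non-bonded neighbor.

```python
import math


def sphere_points(n):
    """Roughly uniform unit vectors on a sphere (golden-spiral layout, an assumption)."""
    golden = math.pi * (3.0 - math.sqrt(5.0))
    points = []
    for i in range(n):
        z = 1.0 - 2.0 * (i + 0.5) / n
        r = math.sqrt(1.0 - z * z)
        points.append((r * math.cos(golden * i), r * math.sin(golden * i), z))
    return points


def contact_dots(center, radius, neighbors, probe=0.25, n_dots=200):
    """Dots on one atom's van der Waals surface where a small probe touches a neighbor.

    `neighbors` holds (center, radius) pairs for atoms NOT covalently bonded
    to this atom. For each surface dot, the probe sits just outside the atom
    along the same outward direction; the dot is kept only if that probe
    overlaps some neighbor (the reverse of a Connolly accessibility test).
    """
    dots = []
    for u in sphere_points(n_dots):
        dot = tuple(c + radius * k for c, k in zip(center, u))
        probe_center = tuple(c + (radius + probe) * k for c, k in zip(center, u))
        if any(math.dist(probe_center, nc) < nr + probe for nc, nr in neighbors):
            dots.append(dot)
    return dots


# Two carbons (radius 1.75 Å) 3.6 Å apart: a 0.1 Å gap, bridged by a 0.25 Å probe.
patch = contact_dots((0.0, 0.0, 0.0), 1.75, [((3.6, 0.0, 0.0), 1.75)])
print(len(patch) > 0)
```

With the probe radius set to zero, the same pair produces no dots, because a zero-size probe only registers actual interpenetration of the van der Waals shells.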

One color scheme used here for these contact dots (e.g. Figure 2) reflects atom type: C, white; N, blue; O, red; S, yellow; and H in the color of its bonded heavy atom. NH···O hydrogen bonds show as interpenetrating lens shapes in red and blue. Overlapped van der Waals shells of non-polar atoms are emphasized by showing spikes instead of dots: a spike is a line drawn from the dot position to the contact midplane, along the atom radius. An alternative color scheme (as in Figure 1 and Figures 6 to 8) reflects the gap distance between atoms at each dot position: green or yellow for good contact (greens for narrow gaps, yellows for slight overlaps), pale green dots for H-bonds, blues for wider gaps (>0.25 Å), orange or red spikes for unfavorable interpenetrations, or clashes, and hot pink spikes for severe clashes (≥0.4 Å overlap).
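The gap-based color scheme amounts to a simple classification of each dot. The sketch below uses only the two thresholds stated in the text (wide gap >0.25 Å, severe clash ≥0.4 Å overlap); the boundary between a "slight overlap" and a clash is not given, so `SLIGHT_OVERLAP = 0.1` is a hypothetical placeholder, and the function name `gap_color` is likewise invented for illustration.

```python
# Thresholds stated in the text (Å).
WIDE_GAP = 0.25
SEVERE_OVERLAP = 0.4
# Hypothetical boundary between a slight overlap (yellow) and a clash
# (orange/red); the text gives no value, so 0.1 Å is a placeholder only.
SLIGHT_OVERLAP = 0.1


def gap_color(gap, is_hbond=False):
    """Map a dot's gap in Å (negative = interpenetrating vdW shells) to a color class."""
    if is_hbond:
        return "pale green"
    if gap > WIDE_GAP:
        return "blue"        # wider gap
    if gap >= 0.0:
        return "green"       # good contact, narrow gap
    overlap = -gap
    if overlap >= SEVERE_OVERLAP:
        return "hot pink"    # severe clash
    if overlap <= SLIGHT_OVERLAP:
        return "yellow"      # slight overlap, still good contact
    return "orange/red"      # unfavorable interpenetration


for g in (0.4, 0.1, -0.05, -0.2, -0.5):
    print(g, gap_color(g))
```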

Contact dots or spikes are output by Probe as a simple text file of dot lists or vector lists in kinemage format (MIME type: chemical/x-kinemage), with color, source atom, and contact type specified. They are shown in the Mage display program (Richardson & Richardson, 1992, 1994), which supports alternate color schemes, atom or dot identification by picking, turning groups on or off by atom type or by contacts versus clashes versus H-bonds, saving many local views within a large structure, and animating between different forms. Probe can also format output for display as graphics objects in O (Jones et al., 1991) or in XtalView (McRee, 1993), so that contact dots can be used to help rebuild models in electron density maps.

Adding H atoms

Small-probe contact dots require the use of all explicit H atoms. The program Reduce adds them to PDB-format coordinate files, using local geometry. Methyl hydrogen atoms are added in staggered positions, and only the Met methyl groups are rotationally optimized. OH, SH, and NH3+ hydrogen atoms are rotationally optimized, and His protonation is assigned, as part of the H-bond network analysis described below. Water molecules are treated by presuming that they can always orient so as to present whatever is needed for each interaction. If water molecules are extremely close, we restrict their H-bonding score to a reasonable value. The details of these procedures, the choice of parameters for bond lengths and van der Waals radii, and further details of the contact-dot methodology are explained in the accompanying paper (Word et al., 1999).

Especially in the context of evaluating amide orientations, we must justify using any van der Waals terms at all for polar H atoms, since they are set to zero in many energy calculations. For instance, Hagler et al. (1974) optimized force-field parameters to agree with crystal dimensions, vaporization energies, etc. for a variety of small molecules with H-bonded amide groups, concluding that van der Waals terms from the polar H atoms do not make a significant contribution beyond what can be fit with only electrostatic terms and van der Waals interactions for the heavy atoms. Similarly, Berendsen et al. (1981) found polar H van der Waals terms unnecessary for obtaining a satisfactory model of water interactions. Those conclusions are undoubtedly justified for the cases analyzed, where all amide or water interactions are through H-bonds. However, such studies do not address the issue of what parameters are needed in the less typical but still fairly common cases where non-polar groups contact amide H atoms. In our current analysis, such parameters are absolutely essential for evaluating the counterfactual non-polar-to-amide clashes that are characteristic of wrong flip choices for side-chain amide groups. Our approach returns to the application of very simple physical principles: as stated, for instance, in the classic crystallographic text by Stout & Jensen (1968), "Any postulated arrangement of atoms . . . must fulfil the simple steric requirement that no two atoms should approach closer than the sum of their van der Waals radii unless they are bonded together."

In order to decide the best polar H radius for use in calculating contact dots, we have analyzed the spacings actually seen for non-polar-to-amide contacts and then compared the proposed radii to the shape of electron-density distributions calculated by quantum mechanics. Figure 9(a) shows calculated electron density contours for an NH group (Bader et al., 1967), the nearly round shape of which has been used to argue for a negligible effect of the polar H van der Waals terms (Hagler et al., 1974). However, equivalent contours lie about 0.5 Å farther out in the H direction than elsewhere, a difference that can be quite crucial in tightly packed regions. Shown overlaid in Figure 9(a) are the simple radii of 1.55 Å for N and 1.0 Å for the polar H, which fit the shape of the electron-density contours almost perfectly. For comparison, Figure 9(b) shows the equivalent overlay for the non-polar CH group; standard radii fit the calculated electron density equally well in both cases. The difference in the contour level matched is within the uncertainty in our present parameters and has the advantage of keeping the NH radii conservatively on the small side.
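The steric rule quoted from Stout & Jensen, combined with the radii given above, reduces to a one-line overlap calculation. This sketch uses the radii from the text and the Figure 9 caption (N 1.55 Å, C 1.75 Å, polar H 1.0 Å, non-polar H 1.17 Å); the dictionary keys and function names are hypothetical labels chosen for illustration. It shows why giving polar H a nonzero radius matters: a contact that registers as an overlap with the 1.0 Å radius would look like a comfortable gap if that radius were zero.

```python
import math

# Radii (Å) quoted in the text and Figure 9 caption; key names are labels
# invented for this sketch.
VDW_RADIUS = {"N": 1.55, "C": 1.75, "H_polar": 1.0, "H_nonpolar": 1.17}


def overlap(kind_a, xyz_a, kind_b, xyz_b):
    """Interpenetration of two vdW spheres in Å (negative values are gaps)."""
    return VDW_RADIUS[kind_a] + VDW_RADIUS[kind_b] - math.dist(xyz_a, xyz_b)


def breaks_steric_rule(kind_a, xyz_a, kind_b, xyz_b, bonded=False):
    """Non-bonded atoms may not approach closer than the sum of their vdW radii."""
    return not bonded and overlap(kind_a, xyz_a, kind_b, xyz_b) > 0.0


def is_severe_clash(kind_a, xyz_a, kind_b, xyz_b):
    """Severe clash as used for the clashscore: overlap of 0.4 Å or more."""
    return overlap(kind_a, xyz_a, kind_b, xyz_b) >= 0.4


# An amide H 2.0 Å from a non-polar H: 1.0 + 1.17 - 2.0 = 0.17 Å of overlap.
# With the polar-H radius set to zero, the same contact would be a 0.83 Å gap.
print(round(overlap("H_polar", (0, 0, 0), "H_nonpolar", (2.0, 0, 0)), 2))  # 0.17
```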

Scoring

Quantitative measures of goodness-of-fit are then defined in ways that seek to capture the insights gained from the contact-dot visual representation of packing interactions. We have found three levels of scoring to be useful in analyzing protein structures. As in the standard definition of van der Waals energies, our general scoring system is a sum of competing terms, but the contact scores are evaluated per dot, not per atom pair, and are then summed. Hydrogen bonds and other overlaps are quantified by the volume of overlap. Each non-overlapped contact dot is counted with an error-function weight of:

$$w(\mathrm{gap}) = e^{-(\mathrm{gap}/\mathrm{err})^{2}}$$


where gap is the distance from the dot to the other atom's surface, and err is taken as the probe radius, typically 0.25 Å for general evaluation of goodness-of-fit, as used in the accompanying paper. If the probe radius is zero, that term is not produced. Multiplying the overlap volume by 10 and the H-bond volume by 4 before adding the terms gives an overall scoring profile similar in shape to the van der Waals function for an isolated pairwise interaction, thus:

$$\mathrm{score} = \sum_{\mathrm{dots}} w(\mathrm{gap}) + 4\,\mathrm{Vol}(\mathrm{Hbond}) - 10\,\mathrm{Vol}(\mathrm{Overlap})$$

Scores can then be normalized by possible surface area. For all of the scores used here (such as output from the clique analysis in Reduce), only the overlap and H-bond terms are included; these scores are not normalized by area. This is equivalent to, and achieved by, setting the probe radius to zero. (The accompanying paper uses normalized scores with the contact dot term, and also a "clashscore", which is the number of severe overlaps ≥0.4 Å per 1000 atoms.)
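The two formulas above can be combined into a small scoring function. In this sketch the H-bond and overlap volumes are taken as inputs, since how Probe measures them per dot is not spelled out here; the function names are hypothetical. Setting `probe_radius=0.0` drops the contact term, which reproduces the unnormalized overlap-plus-H-bond score used in the clique analysis.

```python
import math


def dot_weight(gap, err):
    """Contact-dot weight w(gap) = exp(-(gap/err)^2) for a non-overlapped dot."""
    return math.exp(-((gap / err) ** 2))


def contact_score(dot_gaps, hbond_volume, overlap_volume, probe_radius=0.25):
    """score = sum_dots w(gap) + 4 Vol(Hbond) - 10 Vol(Overlap).

    `dot_gaps` are the gaps (Å) of the non-overlapped dots; the two volumes
    are supplied by the caller in this sketch. err equals the probe radius,
    and with a zero probe radius the contact term is not produced.
    """
    contact = 0.0
    if probe_radius > 0.0:
        contact = sum(dot_weight(g, probe_radius) for g in dot_gaps)
    return contact + 4.0 * hbond_volume - 10.0 * overlap_volume


# Clique-style score (probe radius zero): only the H-bond and overlap terms count.
print(contact_score([0.0, 0.1], hbond_volume=0.3, overlap_volume=0.05, probe_radius=0.0))
```

The factor of 10 on overlaps versus 4 on H-bonds means a given volume of bad overlap outweighs the same volume of H-bonding, which is what lets a single non-polar clash override a marginal H-bond preference.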

Asn, Gln survey in reference datasets

The set of 100 protein structures used here is listed in Table 1. The accompanying paper (Word et al., 1999) describes the criteria for their choice, including resolution (1.7 Å or better), R-value, non-homology, and absence of any unusual problems, starting from the PDB index of January 13, 1997. Only one of any set of identical subunits was used from each file, except for three cases with unusually tight interactions: 1DIF, 1RPO, and 2WRP, where interactions to the second subunit are scored but only one set of side-chains is tabulated.

These 100 proteins contain 1554 unique Asn and Gln residues, 15 of which are fixed by metal ligation or covalent modifications. For an initial screen (round 1), contact dots and scores were calculated for each of them (using only 'a' if there are alternate conformations), both in the originally assigned position and with the side-chain N and O atoms interchanged (flipped), including interactions with water molecules but omitting any clashes of atoms with B-factor ≥40 or with alternate conformations, producing comparisons like those represented in Figures 2 and 3. Cases for which the flipped score was not obviously inferior were examined in Mage in order to learn what range of circumstances to expect. However, since side-chain amide groups often interact either with one another or with other hydrogen atoms (such as OH groups) that require positional optimization, definitive assignments of amide flips must be made within interacting networks rather than individually.
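The round-1 exclusion rules can be expressed as two small predicates. The `Atom` record below is a hypothetical minimal structure (real PDB parsing carries many more fields); only the B-factor cutoff of 40 and the alternate-conformation rules come from the text.

```python
from dataclasses import dataclass


@dataclass
class Atom:
    """Hypothetical minimal atom record for this sketch."""
    name: str
    altloc: str      # "" when there is no alternate conformation, else "a", "b", ...
    b_factor: float


def used_for_scoring(atom):
    """Where alternate conformations exist, only the 'a' set is evaluated."""
    return atom.altloc in ("", "a")


def clash_counts(atom):
    """Round-1 screen: clashes involving atoms with B >= 40 or with
    alternate conformations are omitted."""
    return atom.altloc == "" and atom.b_factor < 40.0


atoms = [Atom("NZ", "", 22.0), Atom("OG", "a", 18.0), Atom("CD1", "", 55.0)]
print([a.name for a in atoms if clash_counts(a)])  # ['NZ']
```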

These interacting closed sets of side-chains with flippable groups or rotatable H atoms (most often local H-bond networks) were analyzed in a series of steps. First, Reduce identified all metal-liganding or covalently modified groups (listed for Asn/Gln in Table 1) and fixed their orientations. Then all potentially interacting pairs and larger closed sets among the remaining side-chains or heterogen groups are identified by considering their full range of possible hydrogen positions (in both orientations for flips and at every 10° for rotations) along with the positions of flippable heavy atoms. At each such position, a sphere is placed with the van der Waals radius of the corresponding atom. If a sphere from one adjustable group overlaps a sphere of another group, then those two groups can interact. Such pairwise interactions are then gathered into disjoint sets, which we call cliques, in the sense that their members all interact internally, but not with any adjustable group outside the clique. The cliques do not propagate through water molecules, because a water molecule is assumed here to be able to act as either an H-bond donor or acceptor independently, even if it makes more than one polar interaction. Only water molecules with B < 40 and occupancy ≥0.66 are used in this analysis. If a side-chain has alternate conformations, 'a' is used but not 'b'.
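Gathering pairwise interactions into disjoint sets is the connected-components problem, and a union-find structure is one standard way to solve it. The sketch below is an assumption about how this could be done, not Reduce's code: each adjustable group is represented only by a list of (xyz, radius) spheres at its candidate positions, and water molecules are simply left out of the input so that sets never propagate through them.

```python
import math


def groups_can_interact(spheres_a, spheres_b):
    """Each argument lists (xyz, vdw_radius) spheres placed at every candidate
    H position (and flippable heavy-atom position) of one adjustable group."""
    return any(math.dist(pa, pb) < ra + rb
               for pa, ra in spheres_a for pb, rb in spheres_b)


def find_cliques(spheres):
    """Partition adjustable groups into disjoint sets (connected components of
    the pairwise-interaction graph) with union-find. `spheres` maps a group id
    to its candidate spheres; waters are not included as groups."""
    parent = {g: g for g in spheres}

    def root(g):
        while parent[g] != g:
            parent[g] = parent[parent[g]]  # path halving
            g = parent[g]
        return g

    ids = list(spheres)
    for i, a in enumerate(ids):
        for b in ids[i + 1:]:
            if groups_can_interact(spheres[a], spheres[b]):
                parent[root(a)] = root(b)

    sets = {}
    for g in ids:
        sets.setdefault(root(g), []).append(g)
    return list(sets.values())


spheres = {
    "Asn10": [((0.0, 0.0, 0.0), 1.0)],
    "Ser12": [((1.5, 0.0, 0.0), 1.0)],
    "Gln30": [((3.0, 0.0, 0.0), 1.0)],
    "His50": [((20.0, 0.0, 0.0), 1.0)],
}
print(find_cliques(spheres))
```

Here Asn10 and Gln30 do not touch directly but end up in one set through Ser12, which is why the whole set must be optimized jointly.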

An Asn or Gln amide has only two possible states, and all of its interactions must switch between donor and acceptor in synchrony. A His has two flip states for the ring as a whole; within each of those we consider three possible protonation states (H only on Nδ, H only on Nε, or doubly protonated with a small penalty of 0.05), so that its two H-bonds usually but not always change in correlation. Double deprotonation is allowed only if the His coordinates two metal ions, as for His61 in the 1XSO superoxide dismutase (Carugo et al., 1996).
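The state spaces described above are small enough to list explicitly. In this sketch, HD1 and HE2 are the standard PDB names for the H atoms on Nδ1 and Nε2; the tuple layout and the function `his_states` are hypothetical, and the 0.05 penalty is applied as a negative score adjustment, assuming higher scores are better.

```python
# Score penalty for the doubly protonated His state, as stated in the text.
DOUBLE_PROTONATION_PENALTY = 0.05

# An Asn/Gln amide has just two states; donors and acceptors switch together.
AMIDE_STATES = ("original", "flipped")


def his_states(ligates_two_metals=False):
    """(ring orientation, protonation, score adjustment) candidates for one His."""
    states = []
    for ring in ("original", "flipped"):
        states.append((ring, "HD1 only", 0.0))
        states.append((ring, "HE2 only", 0.0))
        states.append((ring, "HD1 and HE2", -DOUBLE_PROTONATION_PENALTY))
        if ligates_two_metals:
            # Neither ring N protonated: allowed only when the His binds two metals.
            states.append((ring, "neither", 0.0))
    return states


print(len(his_states()), len(his_states(ligates_two_metals=True)))  # 6 8
```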

Figure 9. Total molecular charge density contours in atomic units (figure modified from Bader et al., 1967), overlaid with (a) a nitrogen van der Waals radius of 1.55 Å and a hydrogen radius of 1.0 Å at a distance of 1.0 Å; (b) a carbon radius of 1.75 Å and a hydrogen radius of 1.17 Å at a distance of 1.1 Å.

For fully rotatable H atoms, before undertaking the combinatorial step, we use the following process to reduce the number of states that must be considered. For each OH or SH, orientations for the rotatable H atom are selected in the direction of each of the potential H-bond acceptors surrounding it. If the acceptor is too close for an acceptable straight-line H-bond, then potential H orientations are also defined 15° (and, if necessary, also 30°) on either side of it. Finally, an additional orientation is located that avoids these acceptors and has minimal interaction with all surrounding atoms. For an OH surrounded by three H-bond acceptors, for instance, we would thus define four potential positions to be tried in the combinatorial search; but if the acceptors were all close, there would be ten potential H positions (three near each acceptor and one that avoids them all). Each Met methyl and each Lys or N-terminal NH3+ is considered in four possible orientations, 30° apart. Each side-chain in an interacting clique therefore has a fairly small number of possible H arrangement states that must be considered. Since the largest clique found in our 100 reference structures contains six members (for a later set of 240 proteins tested, one clique had eight members), an exhaustive search is computationally tractable in practice.
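The OH/SH candidate-generation rule can be sketched as follows, working purely in torsion angles. Two simplifications are assumptions made here: each acceptor is given as the torsion that points the H straight at it plus a "too close" flag, and the "minimal interaction" orientation is approximated by the grid torsion angularly farthest from every acceptor, rather than by a real contact score. The function names are hypothetical. The counts reproduce the text's example: four candidates for three acceptors, ten when all three are close.

```python
def angular_distance(a, b):
    """Smallest separation between two torsion angles, in degrees."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)


def oh_candidate_angles(acceptors, widen_30=False, grid_step=1.0):
    """Candidate torsions (degrees) for a rotatable OH/SH hydrogen.

    `acceptors` is a list of (torsion_toward_acceptor, too_close) pairs.
    Returns one torsion per acceptor, +/-15 degrees (and optionally +/-30)
    around acceptors too close for a straight-line H-bond, plus one
    "avoiding" torsion.
    """
    candidates = []
    for angle, too_close in acceptors:
        candidates.append(angle % 360.0)
        if too_close:
            offsets = (15.0, 30.0) if widen_30 else (15.0,)
            for off in offsets:
                candidates += [(angle - off) % 360.0, (angle + off) % 360.0]
    if not acceptors:
        return candidates + [0.0]  # arbitrary start when no acceptor is nearby
    # Stand-in for "minimal interaction with all surrounding atoms": the grid
    # torsion angularly farthest from every acceptor direction.
    grid = [i * grid_step for i in range(int(round(360.0 / grid_step)))]
    avoid = max(grid, key=lambda t: min(angular_distance(t, a) for a, _ in acceptors))
    return candidates + [avoid]


far = [(0.0, False), (120.0, False), (240.0, False)]
close = [(0.0, True), (120.0, True), (240.0, True)]
print(len(oh_candidate_angles(far)), len(oh_candidate_angles(close)))  # 4 10
```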

The isolated OH, SH, NH3+, and Met methyl groups are rotationally optimized in Reduce by sampling every preassigned orientation and then testing 1° increments around the best one; the final rotation angle and score for the best position are reported, and the optimized H atom is added to the output PDB file. Each clique of two or more is then analyzed by an exhaustive search through all combinations of the preassigned potential H positions for all residues in the clique, plus a final rotational optimization around the best preassigned position. For each residue, the program accumulates its best score, the best total score for its clique, and (for Asn, Gln, or His) the best total clique score found with this residue in its opposite flip state. Those scores and the best assignment (e.g. "FLIP, both NH" for a His) are written both to the screen and onto the header of the output PDB file, and H atoms for the chosen clique conformation are added to the PDB output file. Each Asn/Gln/His (unless fixed in advance by a covalent modification) is assigned to one of four categories: Keep (K), Flip (F), double-Clash (C), or unknown (X). An adjustable penalty can be applied to the difference between the best score and the best flipped score, in order to automatically leave the marginal cases in the state originally assigned by the depositors. Alternatively, they can be examined and the flip state decided by the user.
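The exhaustive clique search and the K/F/C/X assignment can be sketched with `itertools.product`, which enumerates every combination of member states. This is a hypothetical illustration rather than Reduce's implementation: the score function is supplied by the caller, a flippable member's state is a tuple whose first element is True when flipped, the double-clash flag is an input, and the 0.5 default penalty is an arbitrary placeholder, not a value from the text.

```python
from itertools import product


def clique_search(member_states, clique_score, flippable):
    """Exhaustively enumerate every combination of member states in a clique.

    member_states[i] lists candidate states for member i; for each index in
    `flippable`, a state is a tuple whose first element is True when that
    Asn/Gln/His is flipped. clique_score(combo) returns the total score
    (higher is better). Returns the best combination, its score, and the
    best total score seen with each flippable member in each flip state.
    """
    best_combo, best = None, float("-inf")
    best_by_flip = {(i, f): float("-inf") for i in flippable for f in (False, True)}
    for combo in product(*member_states):
        s = clique_score(combo)
        if s > best:
            best_combo, best = combo, s
        for i in flippable:
            key = (i, combo[i][0])
            best_by_flip[key] = max(best_by_flip[key], s)
    return best_combo, best, best_by_flip


def categorize(keep_score, flip_score, clash_both=False, penalty=0.5):
    """Keep/Flip/double-Clash/unknown. The hypothetical `penalty` biases
    marginal cases toward the depositors' original orientation."""
    if clash_both:
        return "C"
    diff = flip_score - keep_score
    if abs(diff) <= penalty:
        return "X"  # marginal: left as deposited unless a user decides otherwise
    return "F" if diff > 0 else "K"


# Two interacting amides; each state is (is_flipped,). Scores are made up.
table = {(False, False): 0.0, (False, True): -1.0, (True, False): 3.0, (True, True): 1.0}
states = [[(False,), (True,)], [(False,), (True,)]]
combo, best, by_flip = clique_search(states, lambda c: table[(c[0][0], c[1][0])], [0, 1])
print(combo, best)                                           # ((True,), (False,)) 3.0
print(categorize(by_flip[(0, False)], by_flip[(0, True)]))   # F
print(categorize(by_flip[(1, False)], by_flip[(1, True)]))   # K
```

Complete enumeration grows as the product of the per-member state counts, which stays manageable only because cliques are small (six members at most in the reference set).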

In order to obtain a visual comparison of the alternative states, Reduce is rerun, using the file header information from its previous output to specify that it should now optimize cliques (OH rotations, etc.) with each flippable group fixed in the non-preferred orientation. Both output PDB files are used by a Unix script that produces a kinemage with precalculated views scaled and centered on each Asn/Gln/His, so that they can easily be examined in Mage, using animation to switch between the two alternative arrangements. The user can then decide whether to reject any of the automated assignments.

Program and data availability

The annotated list of 100 high-resolution structures, the coordinate files with H atoms added and Asn/Gln flips corrected, and the contact dot kinemage files with animated Asn/Gln comparisons, plus the programs Reduce, Probe, Prekin, Mage, and supporting scripts and utilities, are available from the anonymous FTP site (ftp://kinemage.biochem.duke.edu) or the World Wide Web site (http://kinemage.biochem.duke.edu). Probe is a generic Unix C program; the current version (v2) of Reduce, which includes H-bond clique optimization, is in C++. Mage and Prekin are in C, available for Mac, PC, Linux, and SGI Unix, and can be recompiled to run on most Unix platforms where Motif is available. A stripped-down version of Mage written in Java is used on our Web site to provide real-time interactive display of small kinemages with contact dots.

Acknowledgments

This work was supported by NIH research grant GM-15000, by use of the Duke Comprehensive Cancer Center Shared Resource for Macromolecular Graphics, and by an educational leave for J.M.W. from Glaxo Wellcome Inc.

References

Alzari, P. M., Souchon, H. & Dominguez, R. (1996). The crystal structure of endoglucanase CelA, a family 8 glycosyl hydrolase from Clostridium thermocellum. Structure, 4, 265-275.

Bader, R. F. W., Keaveny, I. & Cade, P. E. (1967). Molecular charge distributions and chemical bonding. II. First-row diatomic hydrides, AH. J. Chem. Phys. 47, 3381-3402.

Bass, M. B., Hopkins, D. F., Jaquysh, W. A. N. & Ornstein, R. L. (1992). A method for determining the positions of polar hydrogens added to a protein structure that maximizes protein hydrogen bonding. Proteins: Struct. Funct. Genet. 12, 266-277.

Berendsen, H. J. C., Postma, J. P. M., van Gunsteren, W. F. & Hermans, J. (1981). Interaction models for water in relation to protein hydration. In Intermolecular Forces (Pullman, B., ed.), pp. 331-342, D. Reidel Publishing Company, Boston.

Bernstein, F. C., Koetzle, T. F., Williams, G. J. B., Meyer, E. F., Brice, M. D., Rodgers, J. R., Kennard, O., Shimanouchi, T. & Tasumi, M. (1977). The Protein Data Bank: a computer-based archival file for macromolecular structures. J. Mol. Biol. 112, 535-542.

Bode, W., Papamokos, E. & Musil, D. (1987). The high-resolution X-ray crystal structure of the complex formed between subtilisin Carlsberg and eglin c, an elastase inhibitor from the leech, Hirudo medicinalis. Eur. J. Biochem. 166, 673-692.

Carugo, K. D., Battistoni, A., Carri, M. T., Polticelli, F., Desideri, A., Rotilio, G., Coda, A., Wilson, K. S. & Bolognesi, M. (1996). Three-dimensional structure of Xenopus laevis Cu,Zn superoxide dismutase b determined by X-ray crystallography at 1.5 Å resolution. Acta Crystallog. sect. D, 52, 176-188.

Connolly, M. L. (1983). Solvent-accessible surfaces of proteins and nucleic acids. Science, 221, 709-713.

Derewenda, Z. S., Lee, L. & Derewenda, U. (1995). The occurrence of C-H···O hydrogen bonds in proteins. J. Mol. Biol. 252, 248-262.

Epp, O., Lattman, E. E., Schiffer, M., Huber, R. & Palm, W. (1975). The molecular structure of a dimer composed of the variable portions of the Bence-Jones protein REI refined at 2.0-Å resolution. Biochemistry, 14, 4943-4952.

Fisher, A. J., Thompson, T. B., Thoden, J. B., Baldwin, T. O. & Rayment, I. (1996). The 1.5-Å resolution crystal structure of bacterial luciferase in low salt conditions. J. Biol. Chem. 271, 21956-21968.


Flory, P. J. (1969). Statistical Mechanics of Chain Molecules (Jackson, J. G. & Wood, C. J., eds), 1st edit., vol. 3, Interscience Publishers, New York.

Hagler, A. T., Huler, E. & Lifson, S. (1974). Energy functions for peptides and proteins. I. Derivation of a consistent force field including the hydrogen bond from amide crystals. J. Am. Chem. Soc. 96, 5319-5327.

Hooft, R. W. W., Sander, C. & Vriend, G. (1996). Positioning hydrogen atoms by optimizing hydrogen-bond networks in protein structures. Proteins: Struct. Funct. Genet. 26, 363-376.

Housset, D., Habersetzer-Rochat, C., Astier, J.-P. & Fontecilla-Camps, J. C. (1994). Crystal structure of toxin II from the scorpion Androctonus australis Hector refined at 1.3 Å resolution. J. Mol. Biol. 239, 88-103.

Jones, T. A., Zou, J.-Y., Cowan, S. W. & Kjeldgaard, M. (1991). Improved methods for building protein models in electron density maps and the location of errors in these models. Acta Crystallog. sect. A, 47, 110-119.

Matsuura, Y., Takano, T. & Dickerson, R. E. (1982). Structure of cytochrome c551 from Pseudomonas aeruginosa refined at 1.6 Å resolution and comparison of the two redox forms. J. Mol. Biol. 156, 389-409.

McDonald, I. K. & Thornton, J. M. (1994). The application of hydrogen bonding analysis in X-ray crystallography to help orientate asparagine, glutamine and histidine side chains. Protein Eng. 8, 217-224.

McRee, D. E. (1993). Practical Protein Crystallography, 1st edit., Academic Press, San Diego.

Richardson, D. C. & Richardson, J. S. (1992). The kinemage: a tool for scientific illustration. Protein Sci. 1, 3-9.

Richardson, D. C. & Richardson, J. S. (1994). Kinemages: simple macromolecular graphics for interactive teaching and publication. Trends Biochem. Sci. 19, 135-138.

Serre, L., Verbree, E. C., Dauter, Z., Stuitje, A. R. & Derewenda, Z. S. (1995). The Escherichia coli malonyl-CoA:acyl carrier protein transacylase at 1.5-Å resolution. J. Biol. Chem. 270, 12961-12964.

Spurlino, J. C., Smallwood, A. M., Carlton, D. D., Banks, T. M., Vavra, K. J., Johnson, J. S., Cook, E. R., Falvo, J., Wahl, R. C., Pulvino, T. A., Wendoloski, J. J. & Smith, D. L. (1994). 1.56 Å structure of mature truncated human fibroblast collagenase. Proteins: Struct. Funct. Genet. 19, 98-109.

Stout, G. H. & Jensen, L. H. (1968). X-ray Structure Determination: A Practical Guide, vol. 12, The MacMillan Company Collier-MacMillan Ltd, London.

Tame, J. R. H., Dodson, E. J., Murshudov, G., Higgins, C. F. & Wilkinson, A. J. (1995). The crystal structures of the oligopeptide-binding protein OppA complexed with tripeptide and tetrapeptide ligands. Structure, 3, 1395-1406.

Vassylyev, D. G., Katayanagi, K., Ishikawa, K., Tsujimoto-Hirano, M., Danno, M., Pahler, A., Matsumoto, O., Matsushima, M., Yoshida, H. & Morikawa, K. (1993). Crystal structures of ribonuclease F1 of Fusarium moniliforme in its free form and in complex with 2′GMP. J. Mol. Biol. 230, 979-996.

Vlassi, M., Steif, C., Weber, P., Tsernoglou, D., Wilson, K. S., Hinz, H.-J. & Kokkinidis, M. (1994). Restored heptad pattern continuity does not alter the folding of a four-α-helix bundle. Nature Struct. Biol. 1, 706-716.

Word, J. M., Lovell, S. C., LaBean, T. H., Taylor, H. C., Zalis, M. E., Presley, B. K., Richardson, J. S. & Richardson, D. C. (1999). Visualizing and quantifying molecular goodness-of-fit: small-probe contact dots with explicit hydrogen atoms. J. Mol. Biol. 285, 1711-1733.

Edited by J. Thornton

(Received 28 May 1998; received in revised form 2 November 1998; accepted 3 November 1998)
