Protein flexibility predictions using graph theory · 2005-12-30 · work of a protein can then be analyzed using new graph theoretical techniques that were originally developed for

Protein Flexibility Predictions Using Graph TheoryDonald J. Jacobs,1 A.J. Rader,1 Leslie A. Kuhn,2* and M.F. Thorpe1

1Department of Physics and Astronomy, Michigan State University, East Lansing, Michigan2Department of Biochemistry and Molecular Biology, Michigan State University, East Lansing, Michigan

ABSTRACT Techniques from graph theory areapplied to analyze the bond networks in proteinsand identify the flexible and rigid regions. The bondnetwork consists of distance constraints defined bythe covalent and hydrogen bonds and salt bridges inthe protein, identified by geometric and energeticcriteria. We use an algorithm that counts the de-grees of freedom within this constraint network andthat identifies all the rigid and flexible substruc-tures in the protein, including overconstrained re-gions (with more crosslinking bonds than are neededto rigidify the region) and underconstrained orflexible regions, in which dihedral bond rotationscan occur. The number of extra constraints or re-maining degrees of bond-rotational freedom withina substructure quantifies its relative rigidity/flexibil-ity and provides a flexibility index for each bond inthe structure. This novel computational procedure,first used in the analysis of glassy materials, isapproximately a million times faster than moleculardynamics simulations and captures the essentialconformational flexibility of the protein main andside-chains from analysis of a single, static three-dimensional structure. This approach is demon-strated by comparison with experimental measuresof flexibility for three proteins in which hinge andloop motion are essential for biological function:HIV protease, adenylate kinase, and dihydrofolatereductase. Proteins 2001;44:150–165.© 2001 Wiley-Liss, Inc.

Key words: conformational change; mobility and dy-namics; coupled/collective motions; hy-drogen-bond networks; distance con-straints; dihedral angle constraints androtations; structural stability; dihydro-folate reductase; adenylate kinase

INTRODUCTION

More than 15,000 protein structures have been deter-mined to date, using X-ray crystallography and nuclearmagnetic resonance (NMR) spectroscopy.1 Such structuresare deduced from sophisticated refinement procedures inconjunction with stereochemical modeling.2 Proteins canbe described as a collection of stable fragments,3 rangingin size from a small number of residues to an entiredomain. Different packings of the protein molecules inalternative crystal forms can trap the protein in differentconformational states, providing snapshots of some of theconformations accessible to the protein.3 NMR spectros-

copy can also indicate dynamic regions of proteins byshowing that several conformations are consistent withthe experimental constraints, typically inter-proton dis-tances.4

Other computational procedures have been devel-oped5–10 to characterize the intrinsic flexibility and rigid-ity within a protein. These procedures fall within threeclasses. One approach compares different conformationalstates, e.g., from different crystallographic or NMR confor-mations observed for the protein, to deduce which regionsin the protein are flexible or rigid.5–7 A second approachfocuses on simulating a protein’s motion, by means of molecu-lar dynamics calculations, using a forcefield describing inter-atomic potentials.11–14 The third class focuses on identifyingrigid protein domains or flexible hinge joints8–10,15,16 basedon a single conformation. Each method has its limitations.In the first class, the methods are limited by the diversityof the conformational states that are available from experi-ment for comparison, whereas the second class is limitedby the computational time involved, meaning that largemotions of the protein are usually not sampled. The thirdclass, to which our approach belongs, defines the rigid andflexible regions in the protein from a single conformationand can be used as the starting point to sample a range ofmotions. These methods are generally computationallyfast and can provide a starting point for more efficientmolecular dynamics or Monte Carlo sampling of proteinconformations, as the number of degrees of freedom in theprotein is reduced significantly by defining which regionsare rigid. A significant question is, as always: to whatextent do such methods for defining rigid/flexible regionscorrelate with what is observed experimentally?

The bonds within a protein can be represented as anetwork in which the covalent forces and strong hydrogenbonds are modeled as distance constraints between atoms.The mechanical stability of the corresponding bond net-

Grant sponsor: National Science Foundation; Grant number:DMR-96 32182; Grant number: DBI-96 00831; Grant sponsor: Na-tional Institutes of Health; Grant number: R43 GM58337-01; Grantsponsor: Michigan State University; Grant sponsor: American HeartAssociation; Grant number: 9940091N.

Donald J. Jacobs is currently at the Department of Physics andAstronomy, California State University Northridge, Northridge, CA91330.

Leslie A. Kuhn and M.F. Thorpe are contributing authors.*Correspondence to: Leslie A. Kuhn, Department of Biochemistry

and Molecular Biology, Michigan State University, East Lansing, MI.E-mail: [email protected].

Received 23 September 2000; Accepted 15 March 2001

Published online 00 Month 2001

PROTEINS: Structure, Function, and Genetics 44:150–165 (2001)

© 2001 WILEY-LISS, INC.

work of a protein can then be analyzed using new graphtheoretical techniques that were originally developed foranalyzing the rigidity of substructures within covalentnetwork glasses.17–19 The computer implementation20 ofthis procedure is referred to as Floppy Inclusion and RigidSubstructure Topography (FIRST), which provides a real-time tool for evaluating the intrinsic flexibility within aprotein. FIRST gives the precise mechanical properties ofa protein structure under a given set of constraints. Thisapproach defines not only the rigid regions in a protein,but also those regions that move collectively (whose mo-tions are coupled), as well as those that move indepen-dently of other regions in the structure. Furthermore, therelative flexibility or rigidity of each region is quantified,based on the density of bonds remaining rotatable in eachflexible region. This article applies this approach to defin-ing the flexible regions and their relationship to ligandbinding in three diverse proteins.

METHODS

Both bonding and nonbonding forces play an importantrole in determining the structure of a protein and thedynamics about the native fold. The nonbonding forces areboth short- and long-range. Although both hydrophobicinteractions and van der Waals forces, as well as long-range electrostatic forces, play an important role in stabi-lizing a protein structure in the native state,21 these forcesare generally weak between pairs of atoms and not highlydirectional, and thus are not represented as interatomicdistance constraints. Strong bonding forces, such as cova-lent and hydrogen bonds, tend to be directional and aremodeled in the present study as distance and angleconstraints.

Constraints

The covalent bonding within the protein resulting frombond-stretching (central), bond-bending, and torsionalforces defines a natural set of interactions that can bemodeled as distance constraints. It is common practice torepresent the degrees of freedom accessible to a protein byfixing the covalent bond lengths and associated bondangles, while allowing the dihedral angles to rotate. Thetorsional forces associated with peptide bonds and theother partial-double and double bonds in proteins effec-tively prevent dihedral rotation about the bond and arealso represented as constraints. Using the rotatable dihe-dral angles as a set of internal coordinates, the number ofdegrees of freedom to describe the flexibility of a protein istypically reduced by a factor of about 7 relative to aCartesian representation.22

In addition to covalent bonding forces, hydrogen bonds(Fig. 1) have high directional dependence and act overshort distances, in contrast to the hydrophobic forces,which are less specific and may be regarded as slippery. Bythis, we mean that the energy associated with the hydro-phobic forces does not change significantly for gentleconformational shifts away from the native state. It isreasonable to expect that the buried hydrogen bonds willbe substantially maintained as the protein undergoes

conformational changes near its native structure. Recentresults by Lu and Schulten23 suggest that the breaking ofa hydrogen bond occurs as a well-defined event, involvinggoing over an energy barrier, as opposed to a continuousstretching until a feeble final breaking occurs. Theseinvestigators suggest that hydrogen bonds typically breakone by one as the protein unfolds, while in some cases aconsortium of hydrogen bonds break simultaneously, giv-ing a much higher effective barrier.

Rigidity in Networks

Our graph-theory algorithm20,24 for analyzing proteinsis a three-dimensional (3-D) extension and implementa-tion of results in mathematical rigidity theory that havedeveloped over the past few years. The roots of this work goback to Lagrange’s25 introduction of constraints on themotion of mechanical systems during the late eighteenthcentury, which Maxwell26 used during the late nineteenthcentury to determine whether structures were stable ordeformable. The applications of this type of work havetraditionally been to solve problems in engineering, suchas the structural stability of different truss configurationsin bridges. A very significant advance occurred with La-man’s theorem27 in 1970, which precisely determines thedegrees of freedom within two-dimensional (2-D) net-works, and allows the rigid regions and flexible jointsbetween them to be found. The most general type of 3-Dnetwork for which results can be calculated are theso-called bond-bending networks (or, equivalently, molecu-lar frameworks), in which vertices are connected by edgesand in which every angle between edges is defined. Abroader class of networks is the bar-joint framework, inwhich the angles are not specified; analyzing flexibilityand rigidity for this case remains a significant unsolvedproblem in mathematics. For 3-D bond-bending networks,the flexibility in the system derives from dihedral ortorsional rotations of the bonds that are not locked in bythe network, and this kind of framework is used torepresent the constraints within a protein.

First, we present a general algorithm using brute-forcematrix diagonalization to identify all the rigid clusters in a

Fig. 1. Model of a hydrogen bond, involving donor and acceptoratoms, shown as nitrogen and oxygen, respectively. Covalent bonds areshown as thin black lines. The hydrogen bond is modeled as threedistance constraints, consisting of a nearest-neighbor central-force con-straint shown as a thick solid line (top center), and two next-nearest-neighbor bond-bending force constraints (constraining the donor–hydrogen–acceptor angle), shown as dashed lines. Each hydrogen bondis also associated with three a priori rotatable dihedral angles, indicatedby the arrows.

PROTEIN FLEXIBILITY PREDICTIONS USING GRAPH THEORY 151

framework consisting of a network of atoms with intercon-necting bonds. The fundamental step on which all suchcalculations are based is the ability to test whether aconstraint (bond between atoms) is redundant or indepen-dent. A constraint is considered redundant if breaking ithas no effect on the flexibility of the network, and indepen-dent if its breakage does affect the flexibility. This can bedetermined using the following procedure:

1. Define a network of atoms and distance constraintsbetween them.

2. Replace each distance constraint by a spring. All springconstants are set to unity in arbitrary units, and thenatural length of each spring is set equal to the distancebetween the associated pair of atoms.

3. Construct a dynamical matrix28 for the spring network,and calculate all the normal mode frequencies (eigenval-ues) of this network. The eigenvalues indicate whetherthe mode of vibration has a finite or zero frequency.

4. Count the number of zero eigenvalues, corresponding tozero-frequency vibrational modes, or floppy modes, ofthe system.

5. Add to the system the distance constraint being testedfor redundancy or independence, and repeat the abovesteps.

6. If the number of zero eigenvalues remains the same, theadded constraint is redundant; otherwise, it is indepen-dent.

This procedure is repeated, using steps 5 and 6 above,for each constraint in the network that is to be tested forredundancy or independence. Afterwards, in order toidentify all the rigid clusters of atoms within the network,called a rigid cluster decomposition, a test constraint isplaced between each pair of atoms. Only test constraintsfound to be redundant are added to the network. Once allpairs of atoms have been tested and all redundant con-straints added, the rigid clusters are identified by selectingall atoms that are connected by a contiguous path offace-sharing tetrahedrons, which form a rigid pathway.Tetrahedra are the 3-D generalization of triangles, wherea pathway of edge-sharing triangles forms a rigid path intwo dimensions.

The implementation described above can be used on anytype of 3-D framework, but the resulting computationalperformance scales as O(N2 3 N3), where N is the numberof vertices (atoms) in the framework, rendering it uselessfor application to protein structures with thousands ofatoms. Also of interest are the regions that have moredistance constraints than are required for rigidity, that is,that have redundant constraints within the rigid clusters.When redundant constraints are present, this createsregions of internal strain. In order to measure the strain,an additional separate numerical algorithm must be usedto relax the network and calculate the stress in thesprings. These methods, such as conjugate gradient, typi-cally scale as O(N2).

Clearly, the distance constraint approach will only beuseful in as much as the calculation for identifying rigid

clusters and flexible regions is fast and scales linearly ornearly linearly with system size. The two main contribu-tions of the present article are to show how to use conceptsfrom graph rigidity29 to make this type of calculationfeasible and then to present applications of this approachto three diverse proteins, showing that their predictedconformational flexibility correlates well with known, bio-logically relevant flexible regions in these proteins.

Generic Rigidity

A generic structure is one that has no special symme-tries, such as parallel bonds or bond angles of 180°, thatcould create geometric singularities.30 Under these condi-tions, the study of rigidity becomes much easier becausethe properties of network rigidity depend only on theconnectivity as determined by an underlying graph. La-man’s theorem27 states how generic rigidity within any2-D generic framework can be characterized completely byapplying constraint counting to all the subgraphs withinthe framework. Applying Laman’s theorem directly leadsto a combinatorial calculation that scales exponentially inthe number of atoms within the framework. However, byapplying Laman’s theorem recursively, a very fast andefficient integer algorithm (the so-called “pebble game”)has been constructed17,31,32 for identifying rigid clustersand stressed regions. The computational complexity of thepebble game scales in the worst case as O(N2) for pathologi-cal networks, where N is the number of vertices or atomsin the network, and scales linearly in practice. The algo-rithm also scales linearly with N in computer memory.

Unfortunately, it has been a longstanding problem thatin 3-D generic bar-joint frameworks, constraint countingover the subgraphs is known to fail in general; that is,Laman’s theorem does not simply generalize to 3-D bar-joint frameworks. However, for the special class of trussframeworks, in which the bars and joints have extent andthe bars are subject to bond-bending angle constraints,Laman’s theorem can be generalized18,24 to three dimen-sions. In addition, the molecular framework conjectureproposed by Tay and Whiteley33,34 indicates that Lamanconstraint counting extends to nongeneric molecular mod-els in which all bonds (hinges) of an atom pass through asingle central point of the atom. Although the molecularframework conjecture requires a rigorous proof, there areno known exceptions after years of exact testing. Thus, wemodel the microstructure of a protein using distanceconstraints that define a bond-bending network and applythe 3-D pebble game to determine precisely the rigidclusters, stressed regions, and internal degrees of freedomin terms of dihedral angles.

Pebble Game

At the heart of the FIRST computer program designed toanalyze protein structures is the 3-D pebble game algo-rithm, constructed in a very similar way to its 2-D counter-part.17,18,31,32 The pebble game is an implementation ofthe counting implicit in Laman’s theorem in two dimen-sions, as well as its 3-D generalization. Here, three pebblesare assigned to each vertex or atom, representing the 3

152 D.J. JACOBS ET AL.

degrees of freedom of the atom. Each bond is representedby a distance constraint, or edge. For each independentedge between two vertices, a pebble from one of the twovertices must be used to cover the edge. Pebbles thatremain associated with vertices are called free pebbles,and they represent the independent degrees of freedomremaining to this vertex within the framework. Eachindependent edge thus uses up one independent degree offreedom from a vertex. Then the following covering rule isapplied: once an edge is covered by a pebble (forming anindependent constraint), it must always remain covered byone of the pebbles associated with its incident vertices.Rearrangement of pebbles throughout the graph is pos-sible, provided that this covering rule is maintained andthat three pebbles remain associated with each vertex, asfree pebbles or as pebbles associated with edges comingfrom the vertex. Figure 2 shows a pebble covering for anetwork. The pebble covering is a convenient way toindicate the degrees of freedom accessible to atoms, and tomark which constraints are independent and which areredundant in the bond network. Essentially the redistribu-tion of pebbles from vertices to bonds, by a series ofstepwise pebble exchanges, is an intuitive way of represent-ing a dynamically changing directed graph.24,31

The 3-D pebble game is a recursive algorithm. Theframework is built up by adding one distance constraint ata time, until the final framework is complete. To maintaina bond-bending network, each bond-stretching distanceconstraint having incident vertices v1 and v2 has associ-ated with it angular (i.e., second-nearest-neighbor) con-straints about both vertices. In the case of proteins, theseangular constraints correspond to the angles of bondcoordination about an atom, defined by its chemistry. Foreach new independent distance constraint introduced,pebbles are rearranged to test whether the new constraintis independent; this is indicated by whether a pebble can

be moved to that constraint from one of its vertices. If thenew distance constraint is independent, it is then coveredby a pebble; otherwise, it is not covered. This processcontinues until all distance constraints within the networkhave been placed and tested for independence. The algo-rithm can be sketched as follows:

1. Begin with a set of unconstrained vertices, each withthree pebbles, representing the three degrees of free-dom for each vertex in 3-D space.

2. Place a central-force distance constraint between verti-ces v1 and v2, thereby building the framework up onedistance constraint at a time.

3. Rearrange the pebble covering, if necessary, to collectthree pebbles on vertex v1.

4. Rearrange the pebble covering to collect as many pebblesas possible (where three is the maximum) on vertex v2,while holding the three pebbles at vertex v1.

5. If the number of pebbles on vertex v2 is two, the edge isredundant. Otherwise, three pebbles reside at vertices(v1 and v2) and both remain independently mobile.Continue to rearrange the pebble coverings:a. Hold the three pebbles on vertex v1 and the three

pebbles on vertex v2.b. For each neighbor of vertex v2, attempt to collect a

pebble from a neighboring bond, restoring a pebbleto that bond from its other vertex.

c. If for any neighbor of vertex v2 a pebble cannot beobtained, the edge joining the two vertices is redun-dant.

d. If the edge is not redundant, cover it with a pebblefrom vertex v2.

Unlike the 2-D pebble game, the distance constraintscannot be placed in random order. The first distanceconstraint that is introduced must correspond to a central-force constraint.

After each central-force distance constraint is placed, allits associated angular or bond-bending constraints (next-nearest-neighbor distance constraints) must be placedbefore another central-force constraint can be introduced.Within this restriction, the order of placing either centralforce or the associated bond-bending constraints is com-pletely arbitrary, and the resulting decomposition of thenetwork into rigid clusters is unique. This restriction onrecursively placing constraints is sufficient18,24 for con-straint counting to remain valid in characterizing therigidity of 3-D bond-bending networks within proteins orother structures. Finally, torsional constraints for thepeptide and resonant bonds are fixed by third-nearest-neighbor distance constraints (e.g., fixing the distancebetween the amide H and carbonyl O in peptide bonds).These are most conveniently placed after the central andbond-bending distance constraints.

After all distance constraints have been placed, thenumber of free pebbles remaining on the vertices gives thetotal number of degrees of freedom required to describe themotion of the framework. This includes the six trivial rigidbody translational and rotational degrees of freedom of the

Fig. 2. Diagrammatic representation of the final pebble covering of asimple network. Free pebbles are placed directly on vertices and denotedegrees of freedom (E). Pebbles covering a bond are placed directly onthe edges of the graph and represent distance constraints (F). The pebblecovering is not unique, because pebbles can be rearranged according to afew simple rules as explained in the text. An example is shown of how anelementary pebble exchange works (two arrows). A free pebble can bemoved onto a covered edge (adjacent to a vertex), provided that acorresponding pebble, presently covering the edge and associated withthe neighboring vertex, is moved off.


whole network. The free pebbles can be rearranged but arerestricted to certain regions because of the pebble-coveringrule. For example, no more than six free pebbles can befound within a rigid cluster. Based on the location andnumber of free pebbles throughout the framework, one canidentify overconstrained regions, rigid clusters, and under-constrained regions, as described below.

Strained Regions

A redundant constraint is identified when a failedpebble search occurs. A failed pebble search consists of aset of vertices that have no extra free pebbles to give up.This physically corresponds to placing an additional dis-tance constraint between a pair of atoms that have apredefined fixed distance. Placing a distance constraintbetween this pair of atoms generally causes a lengthmismatch and leads this region to become internallystrained. Thus, a failed pebble search identifies overcon-strained regions. Overconstrained regions always consistof closed loops. Because distance constraints are added tothe framework, more overconstrained regions will be found,and generally these regions will overlap. Overlapping,overconstrained regions merge together into a single over-constrained region. As these frameworks are generic,stress will propagate and redistribute throughout themerged overconstrained regions. Redundant bonds residewithin overconstrained regions. Therefore, the more redun-dant bonds that are present within a given rigid region,the more stable that region will be against removal ofconstraints, e.g., crosslinking salt bridges or hydrogenbonds.

Rigid Cluster Decomposition

The method used to identify the rigid clusters, includingoverconstrained regions, is very simple once all edges inthe graph are in place and the pebble game is finished. Allrigid clusters can at most have six free pebbles distributedover the vertices within the cluster. Therefore, to identifythese clusters, select a vertex and two of its bondednearest-neighbor vertices. Collect three pebbles on theselected vertex and two pebbles and one pebble, respec-tively, on its two neighboring vertices. Because we areconsidering bond-bending networks, it is always possibleto collect these six pebbles, and never any more. Markthese three vertices. Then, iteratively in a breadth-firstsearch, check all bonded, unmarked nearest neighbors tothe current set of marked vertices, to see whether a freepebble can be obtained. If a free pebble cannot be obtained,mark the new vertex, and note that it is part of the samerigid cluster. This method works because all rigid clustersin bond-bending networks are contiguous through bondednearest neighbors17,20; this point is essential and implicitin the generalization of Laman’s theorem to 3-D bond-bending networks.

The rigid cluster decomposition using the 3-D pebblegame has been compared with the numerical brute forcemethod described earlier. For a variety of generic bond-bending networks containing as many as 450 atoms, exactagreement has always been found.35 It is worth mention-

ing that in contrast to the numerical method, the 3-Dpebble game can be used on networks with more than 10million atoms with the security of knowing the results areexact, because the pebble game is an integer countingalgorithm, eliminating the possibility of any numericalround-off errors. Therefore, even the largest proteins andprotein complexes can be analyzed both precisely andrapidly.

Underconstrained Regions

Locating rotatable bond dihedral angles in the protein,which correspond to hinge joints in the network, is an easytask after the rigid cluster decomposition is made. Notethat a hinge joint can never occur about a next-nearest-neighbor distance constraint, which corresponds to a bond-bending angle.17 If the two incident vertices of a bond-stretching constraint belong to different rigid clusters, adihedral angle rotation is possible, and the bond is re-corded as a hinge joint; otherwise, the dihedral anglemotion is locked, as it is part of a rigid cluster. The numberof rotable dihedral angles will generally be considerablymore than the number of residual internal degrees offreedom in the network. Not all the rotatable dihedralangles associated with hinge joints are independent, asthey are part of a ring of bonds (e.g., within loops formed bycrosslinking hydrogen bonds).

Collective motions consist of coupled dihedral angleswithin the protein and take place in underconstrainedregions. Distinct underconstrained regions are partitionedsuch that collective motions can occur within one undercon-strained region without directly affecting internal coordi-nates within all the other underconstrained regions. Theunderconstrained regions are identified by attempting tospecify a value for each dihedral angle and determiningwhether it can be satisfied. Specifying a dihedral angle isequivalent to placing an external torsional constraint tolock in this choice of angle. Independent, externally im-posed torsional constraints represent independent degreesof freedom available to the system, while redundant,externally imposed constraints indicate the angle is prede-termined as part of a collective motion. Therefore, thealgorithm for finding these distinct underconstrained re-gions is the same as that for finding the overconstrainedregions, except that now the only constraints placed in thenetwork are the external torsional constraints.

Proteins as Bond-Bending Networks

The connectivity of a bond-bending network is com-pletely defined by the nearest-neighbor bond-stretching(central-force) constraints. Next-nearest-neighbor dis-tance constraints represent the angular bond-bendingconstraints due to the atom’s chemistry. Dihedral rotationabout the central-force bonds is the elementary flexibleelement in this type of network. Rotation through adihedral angle is possible a priori but may be lockedbecause of crosslinking bonds in the network. Further-more, dihedral angles associated with double or partial-double bonds are represented as fixed by using a third-nearest-neighbor distance constraint. For instance, to lock


the peptide bonds within a protein structure, the distancebetween the carbonyl oxygen and amide hydrogen is fixed.Therefore, along the backbone of a protein the F, Cdihedral angles are a priori allowed to rotate, but thepeptide bonds and other double-bonded groups are keptplanar in this way.

As shown in Figure 1, the hydrogen bond is modeledsimilarly to a covalent bond, in which the donor, hydrogen,and acceptor atoms are typically generic and hence nonco-linear. Each hydrogen bond will introduce three distanceconstraints, corresponding to one central force betweenthe hydrogen and acceptor atoms and two bond-bendingforces associated with the hydrogen and acceptor atoms.This model for a hydrogen bond allows the protein struc-ture to be described as a bond-bending network and isphysically reasonable because hydrogen bonds are almostnever precisely linear; the three dihedral angle degrees offreedom associated with this representation of the hydro-gen bond also allow it to have some flexibility. Modelingthe hydrogen bond to be more or less constrained than thishas been tested, and the model in Figure 1 provides a goodbalance between neither over- nor underrepresenting theflexibility of a hydrogen bond. This model results in idealrigid a-helices, and b-sheets ranging from rigid to some-what flexible, depending on size and the regularity of theirhydrogen-bonding patterns. Moreover, protein structurestypically show a substantial proportion of rigid regions,while having regions that remain flexible. Determinationof constrained and rotatable dihedrals, as implemented inthe FIRST software’s implementation of the pebble game,has been tested against exact counting and shown to agreefor all the elementary structures: a-helices, parallel andantiparallel b-sheets, and reverse turns.

APPLICATION TO PROTEINS

Detailed information about the mechanical stability of aprotein under a fixed set of distance constraints is providedby FIRST analysis. All overconstrained regions, rigidclusters, and underconstrained regions are determinedand can be colored and viewed using standard molecularrendering packages, including Rasmol and InsightII. Toanalyze a protein, it must first be decided which hydrogenbonds to include and model as distance constraints.

Hydrogen Bonding

Beyond covalent bonds, salt bridges and then hydrogenbonds form the next strongest interactions within proteins(Fig. 3). Hydrogen bonds vary in strength from nearly asstrong as the covalent bonds to as weak as the van derWaals interactions.36,37 Hydrogen bonds form directionalcrosslinks in the bond-bending network that lead to large-scale rigid regions. In proteins, the regular hydrogen-bonding patterns between main-chain amide and carbonylgroups form the regular secondary structures: a-helices,b-sheets, and reverse turns. Hydrogen bonds also stabilizethe tertiary structure of proteins through side-chain inter-actions that interlock parts of the protein chain distant insequence.

For protein structures in which the hydrogen atompositions are not experimentally defined (the case for most

crystallographic structures determined by X-ray ratherthan neutron diffraction), the WhatIf software package isused to assign polar hydrogen atoms positioned such thattheir hydrogen-bonding opportunities are optimized.38 Be-cause FIRST results will depend on the accuracy ofplacement of polar hydrogen atoms (because of theirinfluence on hydrogen bonds being assigned, or not), wehave tested how well WhatIf-defined hydrogen positionscompare with the experimentally determined positions infive neutron diffraction structures from the Protein DataBank (PDB; www.rcsb.org/pdb).1 Five neutron structuresavailable in the PDB are lysozyme (PDB entry 1lzn),trypsin (1ntp), insulin (3ins), myoglobin (2mb5), and ribo-nuclease A (5rsa). For each structure, we processed the filethrough FIRST using two different hydrogen-bond energythreshold or cutoff values, as discussed below. We thencreated a modified version of the neutron structure bystripping the hydrogen atoms from it and adding newhydrogens with WhatIf. This new structure was thenprocessed through FIRST for comparison with the resultsobtained using neutron diffraction-defined hydrogen atompositions.

Table I contains results for these five neutron structuresat two different energy cutoff values: 20.1 kcal/mol and20.6 kcal/mol (the latter corresponding to thermal fluctua-tion at room temperature). The comparative results showonly slight differences due to a few hydrogens placeddifferently in the two structures. The percentages shownare calculated by dividing twice the number of hydrogenbonds in common by the total number of hydrogen bondsfor both versions of the protein structure. While the energythreshold of 20.6 kcal/mol is more restrictive and includes

Fig. 3. Schematic representation of the ordering of microscopicforces, from strongest to weakest. Distance constraints are used in FIRSTto model strong bonding forces to the left of a sliding pointer. Thisapproach defines a network of covalent and hydrogen bonds and saltbridges in the protein.


only the strongest hydrogen bonds, there was slightly lessoverall agreement (94%) in the hydrogen bonds at thisenergy threshold in the WhatIf and neutron versions of thestructure than was found at the chosen threshold of 20.1kcal/mol (96%). Thus, on average, only 4% of the hydrogenbonds were assigned differently in the two types of struc-tures. Not surprisingly, many of the hydrogen-bond differ-ences resulted from different placement of hydrogens onhistidine residues. Histidine has two side-chain nitrogenatoms that can bond to 0, 1, or 2 hydrogens, depending onthe local environment. Overall, we conclude that theWhatIf software package positions hydrogen atoms suffi-ciently accurately to permit analysis of the resultinghydrogen-bond network.

To define the hydrogen bonds and salt bridges forinclusion in the analysis of rigid and flexible regions, thegeometry and energy of these interactions is assessed. Asuperset of possible hydrogen bonds is assigned based onmeeting the following geometric39,40 criteria: the donor–acceptor distance, d # 3.6 Å, the hydrogen-acceptor dis-tance, r # 2.6 Å, and the donor–hydrogen-acceptor angle,u, falls between 90° and 180° (Fig. 4). Salt-bridge (ion-pair)interactions are considered as a special case of hydrogen

bonds. Salt bridges are similar to hydrogen bonds, with amore significant ionic or Coulombic component, which isless geometrically sensitive. Our identification of such saltbridges follows previous studies41–43 by extending themaximum distance between donor and acceptor to 4.6 Åand softening the angular dependence such that u fallsbetween 80° and 180°.

Because the strength of hydrogen bonds and salt bridgesdepends on the chemistry of the particular donor andacceptor atoms as well as their orientation, an energyfunction44 is then used to rank hydrogen bonds:

EHB 5 V0H5Sd0

d D12

2 6Sd0

d D10J F~u, f, w! (1)

where

sp3 donor–sp3 acceptorsp3 donor–sp2 acceptorsp2 donor–sp3 acceptorsp2 donor–sp2 acceptorV0 5 8kcal/mol and d0 5 2.8 Å

F 5 cos2ucos2(f2109.5)F 5 cos2ucos2fF 5 cos4uF 5 cos2ucos2(max[f,w])

The hydrogen bond energy (EHB) is a function of theequilibrium hydrogen bond distance, d0, and well depth,V0. The donor to acceptor distance is d. The angulardependence of the function, F, is dependent on the hybrid-ization of the donor and acceptor atoms. The four possiblecases shown rely on the angular terms u, f, and w. Angle udefines the donor atom–hydrogen–acceptor atom angle,and f is the angle between the hydrogen atom, theacceptor, and the base atom bonded to the acceptor. The wangle is between the normals of the two planes defined bythe sp2 centers. If f is less than 90°, the supplement of theangle is used. Figure 4 illustrates how these parametersare defined.

Salt bridges can be viewed as strong hydrogen bonds36

with average energies of 26 (64) kcal/mol.45 Salt bridgeshave broader distance and angular distributions than are

TABLE I. Effect of Hydrogen Atom Placement on the Number of Hydrogen Bonds Identified*

H-Bond Energy

PDB Code

1lzn 1ntp 2mb5 3ins 5rsa Total

No. of residues 129 223 153 102 124 —No. of protein atoms 1762 1790 1836 1305 1556 —Resolution (Å) 1.70 1.80 1.80 1.50 2.00 —

No. of H-bonds with E # 2 0.1 kcal/mol No. common to both 184 216 249 108 140 897No. unique to neutrona 3 9 13 3 4 76No. unique to modifiedb 11 17 6 6 4% in common 96.3 94.3 96.3 96.0 97.2 95.9

No. of H-bonds with E # 2 0.6 kcal/mol Common to both 130 168 182 80 116 676Unique to neutrona 7 16 22 3 3 84Unique to modifiedb 6 9 8 6 4% in common 95.2 93.1 92.4 94.7 97.1 94.2

*Experimentally defined hydrogen positions and resulting hydrogen bonds of five neutron structures are compared with the hydrogen bondsresulting from assignment of hydrogen positions by WhatIf in the same five structures.aRows labeled unique to neutron include the number of hydrogen bonds from experimentally determined hydrogen atom positions.bRows labeled unique to modified include the number of hydrogen bonds from WhatIf calculated hydrogen atom positions in neutron diffractionstructures from which the experimentally determined hydrogen atom positions had been removed.

Fig. 4. Geometry used in the hydrogen bond energy potential. u is thedonor–hydrogen–acceptor angle; f is the hydrogen–acceptor–baseangle, where base is the atom (C, in this case) covalently bonded to theacceptor; d is the donor–acceptor distance; r is the hydrogen–acceptordistance; and w (not shown) is the angle between the normals of theplanes defined by the covalent bonds of the donor and base atoms (e.g.,the planes defined by the two sp2 centers, N and C, in this case).


found for nonionic hydrogen bonds, and these observeddistributions are not well reflected by the hydrogen-bondenergy functions we have tested. Salt bridges within theabove specified geometric ranges generally have strongerinteractions than those exhibited by hydrogen bonds.Therefore, salt bridges meeting the above geometric crite-ria are always included in the bond network analyzed forflexibility.

For hydrogen bonds, we can tune the energy threshold(the sliding pointer in Fig. 3) used to define which hydro-gen bonds are included in the network. Setting the thresh-old at less negative (less favorable) energy values includesweaker hydrogen bonds, which tend to be common inproteins and have a significant influence on structuralstabilization. The ability to select hydrogen bonds based onstrength allows investigation of how the stability in eachregion of the protein varies as the hydrogen-bond networkis strengthened or weakened. By changing the criteria formodeling a hydrogen bond as a constraint, a protein can besubstructured from containing a few, large rigid clustersdown to being completely floppy, with many small rigidclusters involving single atoms with their covalent bondsacting as rotatable dihedral angle hinges. Individual hydro-gen bonds or small sets of hydrogen bonds that formcritical crosslinks can therefore be identified by shiftingthe energy threshold and observing which hydrogen bonds,when included or omitted, have a large effect on therigidity of the network.

Ideally, we would like to be able to set this energy cutoffvalue, perform a single FIRST analysis, and know that theoutput describes the physically relevant flexibility of aprotein. One way of setting the energy threshold objec-tively is to choose it such that maximum agreement in thehydrogen-bond network is obtained for pairs of indepen-dently determined structures for a protein (e.g., by differ-ent researchers or in different crystallographic packings)in which the main-chain conformations are the same. Thisensures that the results of FIRST analysis are not sensi-tive to the sorts of fluctuations known to occur withinprotein structures. Since the energy cutoff is the tunableparameter in using FIRST, we tested which setting gavethe most similar results between pairs of such structures.Such a cutoff value should naturally be below when allhydrogen bonds are present (Ecut # 0.0 kcal/mol) andabove when the protein substructures into many tiny rigidclusters (Esplinter). A very natural place to look at thebehavior of these proteins is near room temperature whichcorresponds to Ert 5 20.6 kcal/mol. However, because theenergy function is approximate (does not take into accountthe effect of more distant neighboring atoms on hydrogen-bond strength), it is important not to take these energiestoo literally, and to consider them as relative rather thanabsolute.

To determine a reasonable default energy threshold forhydrogen bonds, we evaluated which threshold best con-serves the hydrogen bonds within a family of proteinstructures. Multiple structures within four different pro-tein families were studied to find such a threshold. ThePDB codes used for each family are as follows: trypsin

(1tpo, 2ptn, 3ptn), trypsin inhibitor (4pti, 5pti, 6pti, 9pti),adenylate kinase (1zin, 1zio, 1zip), and human immunode-ficiency virus protease (HIVP) (1dif, 1hhp, 1htg). Figure 5shows the hydrogen-bond energy distribution for one ofthese families, namely the three HIVP structures. A largespike in the distribution of possible bonds located between20.1 kcal/mol and 0.0 kcal/mol for the number of hydrogenbonds in the three structures appears in Figure 5. Thisspike is largely due to the fact that quite generousdefinitions of hydrogen bonds are allowed initially (donor–hydrogen–acceptor angle, u $ 90° and donor–acceptordistance, d # 3.6 Å, as shown in Fig. 4). The inset inFigure 5 expands the region near 0.0 kcal/mol, demonstrat-ing how a large number of very weak hydrogen bonds,often with u angles near 90°, can be removed by settingEcut # 20.1 kcal/mol. Thus, the generous hydrogen bonddistance and angle screening criteria can be effectivelyfiltered by setting Ecut. When these geometric criteria andan energy threshold of 20.1 kcal/mol are applied toanalyze the hydrogen bonds and salt bridges in fiveneutron diffraction structures, a Gaussian distribution isobserved for the number of hydrogen bonds as a function ofdonor–acceptor distance, with virtually all hydrogen bondsand salt bridges having distances between 2.6 and 3.6 Å.The distribution in donor–hydrogen–acceptor angles isbimodal, with a strong, Gaussian peak between 130° and180° and a weaker peak between 90° and 130°.

In the choice of protein structures to analyze, thestereochemical quality of the structure can have a signifi-cant influence on the definition of its network of hydrogenbonds, due to their angular dependence. The result is thatFIRST analysis on a structure with poor stereochemistryis likely to indicate the protein as being more flexible thanit actually is, due to missing hydrogen bonds. It is advis-able to assess the main-chain stereochemistry through a

Fig. 5. Histogram of energies for hydrogen bonds. Distribution ofhydrogen bond energy for three structures of human immunodeficiencyvirus protease (HIVP) (PDB codes 1dif, 1hhp, 1htg). Hydrogen positionswere established by WhatIf. The inset expands the low-energy (weakhydrogen bond) region between 20.2 and 0 kcal/mol. An energy thresh-old of 20.1 kcal/mol is used to eliminate the large number of very weakhydrogen bonds in the spike near 0 kcal/mol.


F, C plot, as well as focus on high-resolution, well-refinedstructures for FIRST analysis, to avoid this possibility ofmissing hydrogen bonds due to the misorientation ofmain-chain hydrogen-bonding groups.

Flexibility Index

A flexible region consisting of many interconnected rigidclusters within a protein may define a collective motionhaving only a few independent degrees of freedom. Al-though underconstrained, this region could be nearly rigidand thus mechanically stable. An isostatically rigid region,however, which contains no redundant constraints and isonly rigid, is not expected to be as stable as an overcon-strained region. Overconstrained regions have more con-straints than necessary to be rigid, and therefore areconsidered more stable. Because of this continuum be-tween rigidity and flexibility, a continuous index is useful.

The total number of floppy modes in a protein, denotedby F, corresponds to the number of independent, internaldegrees of freedom. To obtain F, the six trivial rigid bodydegrees of freedom are subtracted from the total number ofindependent degrees of freedom. The global count of thenumber of floppy modes gives a good sense of overallintrinsic flexibility. However, a more useful measure is totrack how the degrees of freedom are spatially distributedthroughout the protein. In particular, we are interested inlocating the underconstrained or flexible regions.

The quantity, fi, is defined as a flexibility index thatcharacterizes the degree of flexibility of the ith central-force bond in the protein. Let Hk and Fk, respectively,denote the number of hinge joints (rotatable bonds) andthe number of floppy modes within the kth undercon-strained region. Let Cj and Rj, denote the number ofcentral-force bonds and the number of redundant con-straints within the jth overconstrained region, respec-tively. The flexibility index provides a quantitative rangefrom most to least constrained and is given by

fi ; 5Fk

Hkin an underconstrained region

0 in an isostatically rigid region2Rj

Cjin an overconstrained region

(2)

When the ith central-force bond is a hinge joint, theflexibility index is defined to be given by the the number offloppy modes divided by the total number of hinges withinthe underconstrained region. Because the number of inde-pendent dihedral rotations must be less than or equal tothe number of hinge joints, the flexibility index is always#1. When the ith central-force bond is not a hinge joint, itis part of a rigid cluster. If the central-force bond is withinan overconstrained region, the flexibility index is assigneda negative value with magnitude given by the number ofredundant constraints divided by the total number ofcentral-force bonds within the region. This number be-comes more negative as the region becomes more overcon-strained.

As a simple example, consider a single n-fold ring ofatoms that are connected by covalent bonds. From con-

straint counting, the number of degrees of freedom minusthe number of constraints is given by F 5 n 2 6. Thenumber of hinge joints is simply given by n. Therefore, theflexibility index for a n-fold ring is given by

fi 5n 2 6

n for each central-force bond in a n-fold ring

(3)

Note that as the ring becomes larger, the flexibility indexgoes to the limit of 11; in this case, each dihedral angle isnearly independent, and the ring is almost as flexible as alinear chain. For a six-fold ring, the flexibility index iszero, indicating an isostatically rigid structure. For athree-fold ring, the ring is overconstrained, and the flexibil-ity index is 20.2. The flexibility index of a protein can beplotted as a function of residue number, and regionswithin the plot (corresponding to segments within thesequence) can be colored according to whether they arecoupled in motion (Fig. 6). A nice property of the flexibilityindex is that it varies gradually as hydrogen bond con-straints are added or removed.

RESULTS AND DISCUSSION

An important feature of FIRST is that it can predict theintrinsic flexibility of a protein given a single 3-D struc-ture. However, it is typical that upon ligand binding, thehydrogen-bond pattern will change. Therefore, the pre-dicted conformational flexibility using FIRST will dependon whether the structure being analyzed is an open(ligand-free) form, or a closed (ligand-bound) form. Crystalcontacts can also influence the flexibility of a protein, andtheir influence can be assessed in two ways by FIRST: (1)by analyzing the flexibility of the protein independent ofits crystal lattice neighbors (in which case the effects ofintermolecular hydrogen bonds are removed from analy-sis); and (2) by comparing the flexible regions found for thesame protein crystallized in different lattice packings.However, the general features of flexible and rigid regionsfound by FIRST are remarkably consistent among differ-ent 3-D structures (in the same ligand-binding state) for aprotein, as will be shown for HIVP, dihydrofolate reduc-tase, and adenylate kinase.

HIV Protease

An initial application is to HIVP, a major inhibitory drugtarget for current acquired immunodeficiency syndrome

Fig. 6. A: Rigid cluster decomposition of the open conformation ofhuman immunodeficiency virus protease (HIVP) (PDB code 1hhp). B:Flexibility index, color-mapped onto the same HIVP structure. Fourregions of interest, a, b, g, and d, are identified for one of the dimers. C:Flexibility index plotted versus residue number. Of the four regions, a, b,and g are most flexible (shown red in B), while d is rigid (blue) in this openconformation of HIVP. Parts of the sequence that are coupled in motionare plotted in the same color; the same regions in D and E are thencolored accordingly. D: Mobility plotted versus residue for this conforma-tion. Mobility is determined as the average crystallographic temperaturefactor (B-value, or Debye–Waller factor) divided by the average atomicoccupancy, for the main-chain atoms in each residue. E: Dihedral anglechanges between the main chains of the above open conformation andthe closed conformation (Fig. 7, PDB code 1htg). The three flexibleregions identified by FIRST are also those with the greatest experimen-tally defined mobility values and dihedral angle changes.


Figure 6.


(AIDS) therapy. Two ligand-free X-ray crystal structuresavailable for HIV protease, PDB entries 1hhp and 3phv,are superficially very similar in structure and have similarresolution and crystallographic residual error (2.7-Å reso-lution for both, and an R-factor of 0.190 for 1hhp and 0.191for 3phv). However, PROCHECK46 showed that 3phv hadsignificantly fewer residues with stereochemically favoredF, C values, which results in distorted main-chain hydro-gen-bond geometries; therefore we chose 1hhp to representthe open HIVP conformation. The open form of the protein(PDB code 1hhp) is dominated by a single rigid clustershown in blue (Fig. 6A), including the base and walls ofthe substrate and inhibitor binding site (cavity atcenter), and three flexible regions shown as alternating-colored bonds (each color indicating a rigid microclusterwithin the flexible region). The ends of the flaps (bregion labeled in Figure 6A, residues 45–56) are knownfrom crystallographic and NMR structures to be impor-tant for closing over and binding inhibitors47 and appearas the most flexible (red) regions when the structure ischaracterized by the flexibility index (Fig. 6B,C). Otherflexible regions include the base of each flap (region a,residues 39 – 42), which may act as a cantilever, and theg region.

The FIRST results for HIVP have been compared withexperimental measures of protein flexibility. The majorpeaks in main-chain thermal mobility (B-value), mea-sured crystallographically and shown in Figure 6D,correlate directly with the a, b, and g flexible regionspredicted by FIRST. The region labeled d is the dimerinterface, formed by the N- and C-termini of the two,identical protein chains. It should be noted that forproteins with mobile domains or other moving rigidbodies, such as a-helices, the crystallographic mobilityand FIRST results will not necessarily compare wellwith B-values. Crystallographically, they appear asmobile regions, whereas in FIRST they appear as rigidregions flanked by flexible loops (allowing the motion).This confusion can be avoided when NMR order parame-ters are available for comparison, since they also indi-cate moving rigid bodies as rigid regions flanked byflexible loops. The Indiana Dynamical Database48 con-tains such data for a number of proteins, includingHIVP. These data are provided for PDB entry 1bvg, aligand-bound form; as in the FIRST results for a differ-ent ligand-bound structure described in Figure 7, thebase of the flaps are the most flexible region.

HIVP has been crystallized with various inhibitorsbound, resulting in a closed conformation with the flapslowered. The main-chain dihedral angle (F, C) changes(similar to the analysis reported by Korn and Rose49)observed for crystal structures of the open (entry 1hhp)and closed (entry 1htp) conformations are shown inFigure 6E. The FIRST-predicted flexible regions alsodirectly correspond with the regions of greatest dihedralangle change. In the three flexible regions (a, b, and g),the flexibility is associated with a flip in at least onedihedral angle (defined as a change of more than 60°)within a rigid b-turn in the center of each flexible region

(Fig. 6A,E). The results here are consistent with themotion observed by interpolation between different HIVPcrystal structures50 and an earlier dihedral analysis fora different pair of HIVP structures49 indicating thatlarge changes at residues 40, 50, and 51 in the a and bregions result in a large, concerted movement of theflaps. Flexibility of the g region has not been emphasizedin other studies of HIVP; however, it is known thatdrug-resistant mutants of the protease include tworesidues that pack against the g region, 63 and 71, withresidue 63 proposed to induce a conformational perturba-tion.51,52 Thus, conformational coupling between the gregion and the flaps, through the g–a loop interactions,may explain why mutations in the g region, which aredistal from the active site, cause resistance to drugbinding.

Ligand binding restricts the motion of the flaps throughnew hydrogen bonds linking the two flaps to each otherand to the ligand. Some of these hydrogen bonds betweenthe flaps and ligand are mediated by an conserved watermolecule found in retroviral but not mammalian homologsof HIVP,53 providing a useful basis for designing moreHIV-specific drugs. To compare the influence of ligands onHIVP flexibility, there were a number of ligand-boundstructures of good stereochemistry from which to choose.For brevity, here we show the results from PDB entry1htg, with GR137615 bound, to represent the closed formof HIVP. (We have also analyzed two other ligand-boundstructures, 1hiv and 1dif, and found the influence of theseligands on protein flexibility to be substantially similar.)Unlike the open form, the closed structures were resolvedcrystallographically as dimers; thus, independent struc-tural information is available for the two subunits of thedimer. This means it is possible to assess the influence ofdifferent side-chain conformations in the two halves (dueto ligand interactions, thermal fluctuations and environ-mental differences) in terms of their effects on the hydro-gen-bonding network and flexibility. The left and rightsides of HIVP in Figure 7 indicate that the only substantialdifference in their flexibility is caused by the asymmetry ofthe ligand bound (at center).

Comparison of this ligand-bound structure with theopen HIVP also demonstrates how a ligand can rigidifypart of the protein through new hydrogen bonds eventhough the ligand itself is not rigid (note black bondsindicating H-bonds between the protease flaps in Figure7A, and that the flaps are now rigidified), while makingother parts of the protein more flexible. In particular, notethe dimer interface, where inter-subunit rotation occursupon ligand binding, breaking some of the interfacialstabilizing hydrogen bonds, and the loop to the right of thebinding cavity, shown as a flexible (orange) region of themain-chain ribbon in Figure 7B. This loop flexibility is notreflected in the other HIVP subunit, due to ligand asymme-try. Flexibility of the dimer interface in a ligand-boundstructure is also a prominent feature found by NMR54 andMD analyses55; MD also identifies flap flexibility in theligand-free conformation.


The influence of water is easily seen in a comparison ofthe liganded cases. A specific water molecule (WAT301)positioned between the b flaps and the ligand is conservedin most HIVP ligand-bound structures.53 In this rigidregion analysis, we found the inclusion of this wateressential to rigidify the b flaps. This particular watermolecule serves as a hydrogen acceptor from residue 50 ofeach protein chain and a hydrogen donor to the ligands.Adding one water molecule introduces four hydrogenbonds to the structure, with energies within the range of21.5 to 27.5 kcal/mol. For both 1htg and 1dif, the b flapsbecome flexible without the intermolecular hydrogen bondscreated by this water molecule. We have only includedburied water molecules making intermolecular hydrogenbonds in HIVP, as these tend to be reliably assignedbetween structures, whereas surface water molecules areunevenly assigned in many of the crystal structures ofHIVP, as well as being variable in crystallographic tem-perature factor.

Dihydrofolate Reductase

By trapping different ligand-bound states crystallo-graphically, Sawaya and Kraut56 noted five conforma-tional states for Escherichia coli dihydrofolate reductase(DHFR) during its catalytic cycle. We analyzed three ofthese crystallographic structures (PDB codes: 1ra1, 1rx1,and 1rx6) with the FIRST algorithm. These three struc-tures correspond to the open, occluded, and closed confor-mations of DHFR, as shown in Figure 8. The Ca traces arecolored according to each residue’s flexibility index, fi, withthe most flexible regions colored red and the most rigidcolored blue.

According to the previous study,56 the regions ofgreatest interest are the rotations of two major subdo-

Fig. 7. A: Rigid cluster decomposition of the closed conformation ofhuman immunodeficiency virus protease (HIVP) (PDB code 1htg). B:Flexibility index plot of the same, closed conformation of HIVP (PDB code1htg). Four regions of interest, a, b, g, and d, are identified for one of themonomers. Contrast the change in rigidity for these regions between thisligand-bound structure and the ligand-free case (Fig. 6).

Fig. 8. Flexibility map of dihydrofolate reductase for the open confor-mation (PDB code 1ra1) (A) and for the closed conformation (PDB code1rx1) (B). C: Occluded conformation (PDB code 1rx6). Shown in greenare the ligands bound in these reaction pathway intermediates. Two loopsexperimentally determined to be flexible, M20 and bF–bG, are also noted.The motion of the M20 loop is essential to accommodate a variety ofligands during catalysis. The flexible bF–bG loop participates in ligand-induced conformational changes. At the top of the graph is the scale forthe flexibility index used to map color onto the Ca trace. The scale runsfrom red (flexible) through gray (isostatic) to blue (rigid).


mains: the adenoside binding subdomain and the loopsubdomain. The M20 and bF–bG loops of the loopsubdomain are denoted in Figures 8 and 9. Studies byMiller and Benkovic57 concluded that the flexibility ofthese loops is interrelated, such that the flexibility of theouter bF–bG loops guides the conformation of the M20loop. It is this correlated flexibility that gives DHFRligand specificity. We would expect the regions thatmove the most, and that are important for binding, toappear flexible in the FIRST analysis, at least in theopen conformation. Figure 8A shows that the M20 loopis detected by FIRST to be fully flexible, but in the closedform (Fig. 8B), this loop has moved and become partiallylocked into place. Comparing all three conformationsquantitatively in Figure 9, the residues within thesetwo mobile loops tend to be most flexible, and thisflexibility is fairly independent of conformation beinganalyzed. This relates to a functional requirement forthese loops to remain flexible during the catalytic cycle.

The flexible region around residue 88 in all threeconformations of Figure 9 corresponds to a hinge be-tween the two subdomains. Similarly, the flexible regionfound by FIRST around residue 70 (open triangles in theflexibility index plot, Fig. 9) is within the adenosidebinding subdomain and has also been identified asflexible by NMR techniques.58 Plots of crystallographicmobility and regions of greatest main-chain dihedralangle change (data not shown) for the three DHFRstructures in Figure 8 are remarkably consistent be-tween the structures and also agree with FIRST resultsthat the most flexible regions are in the M20 loop andthe bF–bG loop. We have analyzed several other DHFRstructures by FIRST (PDB entries 1rx2, 3, 4, and 5) andhave found substantial agreement with the above conclu-sions. In general, changes in ligands between thesestructures can influence the flexibility of their neighbor-hood in the protein, but the major features of flexibilityremain consistent.

Fig. 9. Flexibility index plot for the three conformations of dihydrofolate reductase (DHFR) shown in Figure8. The value of the flexibility index as defined in eq. (2), plotted versus residue number (a monochrome versionof the type of plot shown for human immunodeficiency virus protease (HIVP) in Fig. 6C). The twoexperimentally determined loops, M20 and bF–bG, are shown at the top of the plot, and correspond to highlyflexible regions. For each collective motion within the structure, a unique symbol is plotted.


Adenylate Kinase

Another protein whose motion has been studied experi-mentally is adenylate kinase.59–61 Previous work indi-cates that adenylate kinase uses hinges rather than shearmotions for conformational change. This intrinsic, large-scale hinge motion upon ligand binding is easily identifi-able in both the open and closed conformations analyzedby FIRST, as indicated in Figure 10 by gold arrows.

Several helices at the extreme right of Figure 10 move inlike fingers via hinges defined as the most flexible (red)portions of the protein by FIRST.

Adenylate kinase binds two ligands, ATP and AMP, in atwo-step mechanism. Unfortunately, structures of adenyl-ate kinase from the same species are not available for allthree steps (ligand-free, followed by two binding/conforma-tional change events). By comparing enzymes from differ-

Fig. 10. Flexibility plot of adenylate kinase for the open conformation (PDB code 1dvr) (A) for the closedconformation (PDB code 1aky) (B). C: Difference in main-chain dihedral angles between these twoconformations, indicating the locations of large, localized conformational changes. Ligands bound to thesestructures are shown in green tubes. The open state (A) has only adenosine triphosphate (ATP) bound. In theclosed state (B), the ligand, P1,P5-bis(adenosine-59-)pentaphosphate (AP5A) mimics the roles of AMP andATP binding concurrently. Dark blue corresponds to highly overconstrained and rigid, with a flexibility index;bright red corresponds to highly flexible. All labeled regions are discussed in the text.


ent species, four hinges were previously defined to contrib-ute to the conformational change between open and ligand-bound forms.59 These hinges move in a concerted way toaccount for the large conformational change closing the liddomain (residues 131–165) over the ligand. Figure 10A,Bshows structures with ligands bound to this domain andnot this initial ligand-binding step.59 However, the peak inmain-chain dihedral change shown around residue 167 inFigure 10C corresponds to a conformational switch madeby the red flexible loop (part of the lid) in Figure 10Abetween structures. Crystallographic mobility analysis forthe open structure (10A) and the closed structure (10B) aregenerally in good agreement with FIRST results, the onlydifference being that regions between residues 135 and 160appear crystallographically mobile in the closed structure.

Closing of the lid domain is associated with the bindingof ATP (green tubes at center in Fig. 10A). Binding of theligand AP5A (green tubes, Fig. 10B) produces the fullyclosed conformation of adenylate kinase and locks many ofthe domain linking hinges (a–f), as seen by the transitionbetween flexible (red) and rigid (blue) regions in Figure10A,B. The NMPbind site is where the part of AP5A that isnonoverlapping with ATP binds, and this site is formed bythe interface between the two domains as they clamp downon the inhibitor.61

Comparing these structures, FIRST shows that theflexibility of the NMPbind domain (especially hinges a, e,and f) decreases upon AP5A binding to this domain. Theflexible linkage (b) between helices a3 and a4 aroundresidue 62 (identified in the Fig. 10C plot of change in F, Cangles between the open and closed structures in Fig.10A,B) seems to account for a large part of the motiontransforming the open to the closed conformation. Thisregion (b in Fig. 10A,B) is found to decrease in size butremains flexible in the closed conformation. The persistentflexibility in the closed conformation hints at the reversibil-ity of the motion required for catalytic turnover. Even inthe ATP-bound closed lid conformation exhibited by bothof these structures, certain key hinges59,60 remain flexible.For example, hinges c and d between helices a4 and a5remain flexible in both states. Thus, FIRST results corre-late well with the crystallographically observed conforma-tional changes upon ligand binding for the complex mo-tions within adenylate kinase.

SUMMARY

A novel distance constraint approach is introduced forcharacterizing the intrinsic flexibility of a protein. Theunderlying physical and mathematical assumptions areoutlined and implemented computationally in the FIRSTsoftware. FIRST determines the floppy inclusion and rigidsubstructure topography of a given protein structure,based on a set of distance constraints determined by thecovalent and hydrogen bonding network within a singleconformation of the protein. A flexibility index is intro-duced as a continuous measure for quantifying the flexibil-ity or stability of each bond within the protein, based onthe FIRST identification of the density of floppy modes aswell as the density of redundant bonds within the protein.

There are several advantages of FIRST relative toprevious methods for analyzing protein flexibility. FIRSTcalculations can be done virtually in real time (a fewseconds of CPU time) once the bond network has beendefined. Analysis of a single protein structure is able toindicate regions likely to undergo conformational changeas part of the protein’s function. For a given set of distanceconstraints, the rigid regions and the flexible joints be-tween them are determined precisely. The ability to deter-mine coupled motions very quickly among the dihedralangles of a flexible region gives FIRST an advantage overother methods. Collective motions, in which changing onedihedral angle will influence the other dihedral angleswithin the region, are localized within the flexible regionsof the protein. Analysis of the relative flexibility withinHIV protease, dihydrofolate reductase, and adenylatekinase, even when performed on a single structure, cap-tures much of the functionally important conformationalflexibility observed experimentally between different li-gand-bound states. A distribution version of the FIRSTsoftware is in preparation and will be available throughthe authors.

ACKNOWLEDGMENTS

The authors thank Yuquing Xiao for help with program-ming and Vishal Thakkar and Brendan Hespenheide fortheir contribution to molecular graphics.

REFERENCES

1. Bernstein FC, Koetzle TF, Williams GJB, Meyer EF Jr., Brice MD,Rodgers JR, Kennard O, Shimanouchi T, Tasumi M. The ProteinData Bank: a computer based archival file for macromolecularstructures. J Mol Biol 1977;112:535–542.

2. Ringe D, Petsko GA. A consumer’s guide to protein crystallogra-phy. In: Protein engineering and design. San Diego: AcademicPress, 1996; p 210–229.

3. Bennett W, Huber R. Structural and functional aspects of domainmotions in proteins. Crit Rev Biochem 1984;15:291–384.

4. Wuthrich K, Wagner G. Internal motion in globular proteins.Trends Biochem Sci 1978;3:227–230.

5. Nichols WL, Rose GD, Ten Eyck LF, Zimm BH. Rigid domains inproteins: an algorithmic approach to their identification. Proteins1995;23:38–48.

6. Siddiqui AS, Barton GJ. Continuous and discontinuous domains:an algorithm for the automatic generation of reliable proteindomain definition. Protein Sci 1995;4:872–884.

7. Boutonnet N, Rooman M, Wodak S. Automatic analysis of proteinconformational changes by multiple linkage clustering. J Mol Biol1995;253:633–647.

8. Holm L, Sander C. Parser for protein folding units. Proteins1994;19:256–268.

9. Zehfus MH, Rose GD. Compact units in proteins. Biochemistry1986;25:5759–5765.

10. Karplus PA, Schulz GE. Prediction of chain flexibility in proteins.Naturwissenschaften 1985;72:212–213.

11. Duan Y, Kollman PA. Pathways to a protein folding intermediateobserved in a 1-microsecond simulation in aqueous solution.Science 1998;282:740–744.

12. Ma J, Karplus M. The allosteric mechanism of the chaperoninGroEL: a dynamic analysis. Proc Natl Acad Sci USA 1998;95:8502–8507.

13. Case DA. Molecular dynamics and normal mode analysis ofbiomolecular rigidity. In: Thorpe M, Duxbury P, editors. Rigiditytheory and applications. Kluwer Academic/Plenum, 1999.

14. Keskin O, Jernigan RL, Bahar I. Proteins with similar architec-ture exhibit similar large-scale dynamic behavior. Biophys J2000;78:2093–2106.

15. Janin J, Wodak S. Structural domains in proteins and their role in


the dynamics of protein function. Prog Biophys Mol Biol 1983;42:21–78.

16. Maiorov V, Abagyan R. A new method for modeling large-scalerearrangements of protein domains. Proteins 1997;27:410–424.

17. Jacobs DJ, Thorpe MF. Generic rigidity percolation: The pebblegame. Phys Rev Letters 1995;75:4051–4054.

18. Jacobs DJ. Generic rigidity in three-dimensional bond-bendingnetworks. J Phys A Math Gen 1998;31:6653–6668.

19. Thorpe MF, Jacobs D, Chubynsky N, Rader A. Generic rigidity ofnetwork glasses. In: Thorpe MF, Duxbury P, editors. Rigiditytheory and applications. New York: Kluwer Academic/Plenum;1999.

20. Jacobs D, Thorpe MF. Computer-implemented system for analyz-ing rigidity of substructures within a macromolecule. US Patentnumber 1998:6,014,449.

21. Dill KA. Dominant forces in protein folding. Biochemistry 1990;29:7133–7155.

22. Abagyan R, Totrov M, Kuznetsov D. ICM—a new method forprotein modeling and design: applications to docking and struc-ture prediction from the distorted native conformation. J ComputChem 1994;15:488–506.

23. Lu H, Schulten K. Steered molecular dynamics simulation ofconformational changes in immunoglobulin domain 127 interpretatomic force microscopy observations. Chem Phys 1999;247:141–153.

24. Jacobs D, Kuhn LA, Thorpe MF. Flexible and rigid regions inproteins. In: Thorpe MF, Duxbury P, editors. Rigidity theory andapplications. New York: Kluwer Academic/Plenum; 1999.

25. Lagrange J. Mecanique analytique. Paris, 1788.26. Maxwell JC. On the calculation of the equilibrium and stiffness of

frames. Philos Mag 1864;27:294–299.27. Laman G. On graphs and rigidity of plane skeletal structures. J

Eng Math 1970;4:331–340.28. Ashcroft N, Mermin N. Solid state physics. Fort Worth, TX:

Saunders College Publishing; 1976.29. Graver J, Servatius B, Servatius H. Combinatorial rigidity.

Graduate studies in mathematics. Providence, RI: AmericanMathematical Society; 1993.

30. Guyon E, Roux S, Hansen A, Bibeau D, Troadec J, Crapo H.Non-local and non-linear problems in the mechanics of disorderedsystems: application to granular media and rigidity problems. RepProg Phys 1990;53:373–419.

31. Jacobs DJ, Hendrickson B. An algorithm for two-dimensionalrigidity percolation: the pebble game. J Comput Phys 1997;137:346–365.

32. Jacobs DJ, Thorpe MF. Generic rigidity percolation in two dimen-sions. Phys Rev E 1996;53:3683–3693.

33. Tay T-S, Whiteley W. Recent advances in generic rigidity ofstructures. Struct Topol 1985;9:31–38.

34. Whiteley W. Rigidity of molecular structures: generic and geomet-ric analysis. In: Thorpe M, Duxbury P, editors. Rigidity theory andapplication. New York: Kluwer Academic/Plenum; 1999.

35. Xiao Y, Jacobs D, Thorpe M. Unpublished results; 1997.36. Jeffrey GA. An introduction to hydrogen bonding. New York:

Oxford University Press; 1997.37. Fersht AR. The hydrogen bond in molecular recognition. Trends

Biochem Sci 1987;12:301–304.38. Vriend G. WHAT IF: a molecular modeling and drug design

program. J Mol Graph 1990;8:52–56. http://swift.embl-heidel-berg.de/whatif/

39. Stickle D, Presta L, Dill K, Rose G. Hydrogen bonding in globularproteins. J Mol Biol 1992;226:1143–1159.

40. McDonald I, Thorton J. Satisfying hydrogen bonding potential inproteins. J Mol Biol 1991;238:777–793.

41. Barlow DJ, Thorton JM. Ion-pairs in proteins. J Mol Biol 1983;168:867–885.

42. Gandini D, Gogioso L, Bolognesi M, Bordo D. Patterns in ionizableside chain interactions in protein structures. Proteins 1996;24:439–449.

43. Xu D, Tsai C-J, Nussinov R. Hydrogen bonds and salt bridgesacross protein–protein interfaces. Protein Eng 1997;10:999–1012.

44. Dahiyat BI, Gordon DB, Mayo SL. Automated design of thesurface positions of protein helices. Protein Sci 1997;6:1333–1337.

45. Kumar S, Nussinov R. Salt bridge stability in monomeric proteins.J Mol Biol 1999;293:1241–1255.

46. Laskowski RA, MacArthur MW, Moss DS, Thornton JM. PRO-CHECK: a program to check the stereochemical quality of proteinstructures. J Appl Crystallogr 1993;26:283–291. ftp.biochem.ucl.ac.uk/pub/procheck/tar3_4/procheck.tar.Z

47. Nicholson LK, Yamazaki T, Torchia DA, Grzesiek S, Bax A, StahlSJ, Kaufman JD, Wingfield PT, Yam PYS, Jadhav PK, Hodge CN,Domaille PJ, Chang C-H. Flexibility and function in HIV-1protease. Nature Struct Biol 1995;2:274–280.

48. Goodman J, Pagel M, Stone M. Relationships between proteinstructure and dynamics from a database of NMR-derived back-bone order parameters. J Mol Biol 2000;295:963–978. http://pooh.chem.indiana.edu/IDD.html

49. Korn AP, Rose DR. Torsion angle differences as a means ofpinpointing local polypeptide chain trajectory changes for identi-cal proteins in different conformational states. Protein Eng 1994;7:961–967.

50. Gerstein M, Krebs W. A database of molecular motions. NucleicAcids Res 1998;26:4280–4290.

51. Chen Z, Li Y, Schock HB, Hall D, Chen E, Kuo LC. Threedimensional structure of a mutant HIV-1 protease displayingcross-resistance to all protease inhibitors in clinical trials. J BiolChem 1995;270:21433–21436.

52. Patrick A, Rose R, Greytok J, Bechtold C, Hermsmeier M, Chen P,Barrish J, Zahler R, Colonno R, Lin P. Characterization of ahuman immunodeficiency virus type 1 variant with reducedsensitivity to an aminodiol protease inhibitor. J Virol 1995;69:2148–2152.

53. Wlodawer A, Erickson J. Structure-based inhibitors of HIV-1protease. Annu Rev Biochem 1993;62:543–585.

54. Ishima R, Freedber D, Wang Y-X, Louis J, Torchia D. Flapopening and dimer–interface flexibility in the free and inhibitor-bound HIV protease, and their implications for function. StructFold Design 1999;7:1047–1055.

55. Scott W, Schiffer C. Curling of flap tips in HIV-1 protease as amechanism for substrate entry and tolerance of drug resistance.Struct Fold Design 2000;9:1259–1265.

56. Sawaya M, Kraut J. Loop and subdomain movements in themechanism of Escherichia coli dihydrofolate reductase: crystallo-graphic evidence. Biochemistry 1997;36:586–603.

57. Miller G, Benkovic S. Stretching exercises—flexibility in dihydro-folate reductase catalysis. Chem Biol 1998;5:R105–R113.

58. Epstein D, Benkovic S, Wright P. Dynamics of the dihydrofolatereductase–folate complex: catalytic sites and regions known toundergo conformational change exhibit diverse dynamical fea-tures. Biochemistry 1995;34:11037–11048.

59. Gerstein M, Schulz G, Chothia C. Domain closure in adenylatekinase: joints on either side of two helices close like neighboringfingers. J Mol Biol 1993;229:494–501.

60. Schlauderer GJ, Proba K, Schulz GE. Structure of a mutantadenylate kinase ligated with an ATP-analogue showing domainclosure over ATP. J Mol Biol 1996;256:223–227.

61. Zhang H-J, Sheng X-R, Pan X-M, Zhou J-M. Activation of adenyl-ate kinase by denaturants is due to the increasing conformationalflexibility at its active sites. Biochem Biophys Res Commun1997;238:382–386.


Documents

Protein flexibility predictions using graph theory · 2005-12-30 · work of a protein can then be analyzed using new graph theoretical techniques that were originally developed for