Proc. Natl. Acad. Sci. USA
Vol. 88, pp. 1237-1241, February 1991
Biophysics

Protein solution structure determination using distances from two-dimensional nuclear Overhauser effect experiments: Effect of approximations on the accuracy of derived structures

PAUL D. THOMAS, VLADIMIR J. BASUS, AND THOMAS L. JAMES*
Department of Pharmaceutical Chemistry, University of California, San Francisco, CA 94143

Communicated by Mildred Cohn, October 22, 1990

*To whom reprint requests should be addressed.

Abbreviations: 2D NOE, two-dimensional nuclear Overhauser effect; ISPA, isolated spin pair approximation; DG, distance geometry; rmsd, root-mean-square deviation(s); BPTI, bovine pancreatic trypsin inhibitor.

The publication costs of this article were defrayed in part by page charge payment. This article must therefore be hereby marked "advertisement" in accordance with 18 U.S.C. §1734 solely to indicate this fact.

ABSTRACT Solution structures for many proteins have been determined to date utilizing interproton distance constraints estimated from two-dimensional nuclear Overhauser effect (2D NOE) spectra. Although the simple isolated spin pair approximation (ISPA) generally used can result in systematic errors in distances, the large number of constraints enables protein structure to be defined with reasonably high resolution. Effects of these systematic errors on the resulting protein structure are examined. Iterative relaxation matrix calculations, which account for dipolar interactions between all protons in a molecule, can accurately determine internuclear distances with little or no a priori knowledge of the molecular structure. The value of this additional complexity is also addressed. To assess these distance determination methods, hypothetical "experimental" data, including random noise and peak overlap, are calculated for an arbitrary "true" protein structure. Three methods of obtaining distance constraints from 2D NOE peak intensities are examined: one entails a conservative use of ISPA, one assumes the ISPA to be fairly accurate, and one utilizes an iterative relaxation matrix method called MARDIGRAS (matrix analysis of relaxation for discerning the geometry of an aqueous structure), developed in this laboratory. A distance geometry algorithm was used to generate a family of structures for each distance set. The quality of the average structure from each family was good. The root-mean-square deviation of that average structure from the true structure was improved about 2-5% using the more restrictive rather than the more conservative ISPA approach. Use of MARDIGRAS in a conservative fashion (i.e., with a poor initial model) resulted in improvement in the root-mean-square deviation by 8-15%. With a better initial model, MARDIGRAS obtained even more accurate distances. MARDIGRAS also permits analysis of 2D NOE data at longer mixing times, yielding additional distances. Use of more restrictive ISPA distances did, however, result in a few systematically incorrect structural features in local regions of the protein, producing distortions of 2-3 Å. Comparison between experimental data and spectra calculated for the structures correlates with root-mean-square deviation, offering a method of structure evaluation. An R factor for evaluating fit between experimental and calculated 2D NOE intensities is proposed.

Interproton distances obtained from homonuclear proton two-dimensional nuclear Overhauser effect (2D NOE) experiments are used to determine three-dimensional protein structure in solution (1-4). Various protocols are used for structure determination, but the initial step often utilizes distance geometry (DG) to generate a family of structures consistent with NOE distance constraints (5, 6). Some methods entail theoretical energy calculations, energy minimization, or restrained molecular dynamics, with pseudoenergy terms maintaining NOE-derived distances (7, 8). Families of structures for several proteins have been produced that exhibit little internal variance and few distance violations. But it does not necessarily follow that these families accurately represent the actual molecular structure. Approximations used to derive distance constraints from NOE intensities have been questioned (9-13). The effect of these approximations on the inferred protein structure, however, has not been reported. Here we primarily address two questions: (i) Will semiquantitative NOE distances suffice to define protein tertiary structure, and how precisely can we specify distances before we sacrifice accuracy? (ii) Are techniques without the approximations worth the computer time required to generate more accurate distances?

To assess the effect of different methods of obtaining distance constraints on the derived structures, we must know the "true" protein structure precisely. We therefore used hypothetical 2D NOE spectra, generated by our program CORMA (14), for a defined structure. From the simulated data, we employed three methods to generate three sets of distance constraints. Two of the methods rely on the commonly used two-spin or isolated spin pair approximation (ISPA) at the extremes of restrictive and conservative assignment of distance bounds. The third method utilizes our program MARDIGRAS (matrix analysis of relaxation for discerning the geometry of an aqueous structure) (15). MARDIGRAS iteratively refines the complete proton relaxation matrix until it is both internally consistent and consistent with any input experimental 2D NOE intensities. It then calculates distances from interproton relaxation rates for proton pairs corresponding to experimental cross-peaks. Each distance set was used separately for DG calculations. The family of structures generated by DG for each distance set was then compared with the true structure. Systematic errors in DG structures are unlikely to be corrected by energy-based computations, particularly if they are due to distance errors. Energy-based methods would add an additional complication, the balance between "real" intramolecular forces and NOE pseudopotentials. We decided that analysis of DG structures provided the most meaningful test of distance assignment methods.

Determination of Interproton Distances from 2D NOE Intensities

Typically, distances are estimated from 2D NOE cross-peak intensities using the two-spin or ISPA: $r_{ij} = r_{ref}\,(a_{ref}/a_{ij})^{1/6}$, where $r_{ij}$ is the interproton distance to be estimated and $a_{ij}$ is the corresponding 2D NOE cross-peak intensity; $r_{ref}$ and $a_{ref}$ are a known interproton distance and its cross-peak intensity, respectively. Assumptions of ISPA are (i) mixing time $\tau_m$ is
sufficiently short that each cross-peak intensity originates only from relaxation between two spins and (ii) internal motions are negligible. In general, assumption of an isotropic motional model leads to relatively small errors (i.e., <10%) (9). The chief cause of error for ISPA lies in neglecting multispin relaxation effects commonly referred to as "spin diffusion" (9-12). Estimates of the inherent error associated with ISPA, reflected by the upper and lower bounds assigned to the distances, vary widely throughout the literature. Some studies use estimated distances only qualitatively, but others assume the extreme (i.e., sixth power) dependence of intensity on distance allows distances to be specified more precisely. Borgias et al. (12) showed that, for mixing times generally accepted as sufficiently short (i.e., 50-100 ms), ISPA can result in systematic errors of 45-80% in distances over 3.5 Å, the range most important in defining molecular structure.
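The ISPA relation can be illustrated with a short sketch. The function name and the reference intensity below are hypothetical; only the sixth-root formula and the 2.49 Å aromatic reference distance come from the text.

```python
def ispa_distance(a_ij, a_ref, r_ref):
    """Two-spin (ISPA) estimate: r_ij = r_ref * (a_ref / a_ij) ** (1/6)."""
    if a_ij <= 0 or a_ref <= 0:
        raise ValueError("cross-peak intensities must be positive")
    return r_ref * (a_ref / a_ij) ** (1.0 / 6.0)


# Reference distance taken from the text (Tyr-35 ring protons, 2.49 Å);
# the reference intensity is an arbitrary, hypothetical number.
r_ref = 2.49
a_ref = 0.040

d_same = ispa_distance(a_ref, a_ref, r_ref)         # same intensity as reference
d_weak = ispa_distance(a_ref / 64.0, a_ref, r_ref)  # 64-fold weaker cross-peak
print(d_same, d_weak)
```

Because of the sixth-root dependence, a 64-fold drop in intensity only doubles the estimated distance, which is why a modest intensity error from spin diffusion still shifts long distances noticeably.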

Several techniques have been proposed that obviate ISPA's inadequacy. MINSY entails saturating selected spins during the mixing period, preventing spin diffusion from occurring by means of these spins (16). Most methods, however, make better distance approximations by at least partial consideration of multispin effects (17-20). Complete relaxation matrix approaches take into account all dipole-dipole interactions, explicitly accounting for spin diffusion (9, 20-23). In addition, internal motions can be included in calculation of distances, although these will increase the uncertainty in distances to protons involved in the motion. Multiple conformational states can be modeled as well, with a weighted average of the relaxation matrices describing each state. Iterative methods for fitting experimental and theoretical 2D NOE intensities using relaxation matrix approaches yield unbiased distances (11, 15, 24). In particular, an efficient program has been developed to accurately calculate distances (15) without relying at each cycle on more computationally expensive techniques such as DG or restrained molecular dynamics. This program is called MARDIGRAS. It has been used for nucleic acid structure determination (25) and here is compared to ISPA for determining protein distance constraints.
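To make the idea of a complete relaxation matrix calculation concrete, the following is a minimal sketch, not the CORMA or MARDIGRAS code. It assumes a rigid molecule with isotropic tumbling, the standard homonuclear dipolar rate expressions, and a 500-MHz spectrometer frequency (the field is an assumption; it is not stated in this section). All function names are hypothetical.

```python
import numpy as np

GAMMA_H = 2.6752e8   # proton gyromagnetic ratio, rad s^-1 T^-1
HBAR = 1.0546e-34    # reduced Planck constant, J s
MU0_4PI = 1.0e-7     # mu0 / (4 pi), T m A^-1
# Dipolar prefactor converted so that distances can be given in Å.
K = 0.1 * (MU0_4PI * HBAR * GAMMA_H ** 2) ** 2 * 1e60  # Å^6 s^-2


def spectral_density(omega, tau_c):
    """Isotropic rigid-rotor spectral density J(omega)."""
    return tau_c / (1.0 + (omega * tau_c) ** 2)


def relaxation_matrix(coords, tau_c, larmor_hz=500e6):
    """Rate matrix R: off-diagonal sigma_ij, diagonal sum over j of rho_ij."""
    omega = 2.0 * np.pi * larmor_hz
    j0, j1, j2 = (spectral_density(n * omega, tau_c) for n in (0, 1, 2))
    r = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    np.fill_diagonal(r, np.inf)            # no self-interaction term
    inv_r6 = r ** -6.0
    sigma = K * inv_r6 * (6.0 * j2 - j0)   # cross-relaxation rates
    rho = K * inv_r6 * (j0 + 3.0 * j1 + 6.0 * j2)
    R = sigma.copy()
    np.fill_diagonal(R, rho.sum(axis=1))
    return R


def mixing_coefficients(R, tau_m):
    """a(tau_m) = exp(-R tau_m), via eigendecomposition of the symmetric R."""
    w, V = np.linalg.eigh(R)
    return (V * np.exp(-w * tau_m)) @ V.T


# Three collinear protons 2.5 Å apart, tau_c = 2 ns, tau_m = 100 ms.
coords = np.array([[0.0, 0.0, 0.0], [2.5, 0.0, 0.0], [5.0, 0.0, 0.0]])
a = mixing_coefficients(relaxation_matrix(coords, 2e-9), 0.100)

# ISPA estimate of the 5.0 Å distance, calibrated on the 2.5 Å pair.
d_ispa = 2.5 * (a[0, 1] / a[0, 2]) ** (1.0 / 6.0)
print(d_ispa)
```

In this toy system the middle proton relays magnetization between the outer two, so the ISPA estimate of the 5.0 Å distance comes out too short. This is the spin-diffusion bias that the matrix approaches are designed to remove.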

Methodology

Generation of Hypothetical 2D NOE Spectral Intensities. 2D NOE mixing coefficients (proportional to 2D NOE intensities) were calculated using our program CORMA. The arbitrary structural model for these calculations was the 5pti crystal structure variant (26) of bovine pancreatic trypsin inhibitor (BPTI), a 58-amino acid protein. Protons were positioned with a locally written program to idealize their geometry with respect to heavy atom coordinates. Spectra were calculated for mixing times of 100 and 200 ms for isotropic overall correlation times of 2 ns and 5 ns. $\tau_c$ for BPTI at 60°C is 2 ns; we ran the 5-ns simulation to determine effects of greater spin diffusion with slower motions. Internal molecular motions were modeled as follows. Unresolved methylene and methyl peak intensities were calculated with simplified rapid two- and three-state jump models: effective distances to pseudoatoms were calculated by $\langle r^{-3} \rangle$ averaging over individual proton positions without using an anisotropic spectral density function. Unresolved aromatic ring proton peak intensities were calculated with slower $\langle r^{-6} \rangle$ averaging.

Random noise was added to each spectrum within the range of ±0.25% of the diagonal peak intensity at mixing time 0. This is quite conservative; signal-to-noise ratios in local experimental spectra are generally much higher. The cutoff for accepted cross-peak intensities was 0.3%. To mimic real data sets with information loss due to peak overlap, a realistic subset of each calculated spectrum was chosen to correspond to assigned, resolved peaks in an experimental spectrum for BPTI. All final hypothetical data sets included the same 812 2D NOE cross-peak intensities. With longer mixing time (200 ms) or longer correlation time (5 ns), more than 812 cross-peaks would be observable, but we limited data sets so that structures obtained for all data sets could be compared. With the relatively large number of structural constraints, derived structures in this study may be less sensitive to sporadic distance errors than in many actual cases.
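The noise and cutoff steps described above might be sketched as follows. The text gives only the noise range (±0.25% of the zero-mixing-time diagonal intensity) and the 0.3% acceptance cutoff; the uniform noise distribution, the function name, and the example intensities are assumptions made for illustration.

```python
import numpy as np


def add_noise_and_cutoff(intensities, noise_frac=0.0025, cutoff_frac=0.003,
                         diag_at_zero=1.0, seed=0):
    """Add bounded random noise, then flag cross-peaks above the cutoff.

    Intensities are expressed relative to `diag_at_zero`, the diagonal peak
    intensity at mixing time 0. Uniform noise is an assumption; the source
    states only the +/-0.25% range.
    """
    rng = np.random.default_rng(seed)
    noise = rng.uniform(-noise_frac, noise_frac, size=intensities.shape)
    noisy = intensities + noise * diag_at_zero
    keep = noisy >= cutoff_frac * diag_at_zero
    return noisy, keep


# Hypothetical cross-peak intensities (fractions of the diagonal at tau_m = 0).
clean = np.array([0.05, 0.004, 0.0001])
noisy, keep = add_noise_and_cutoff(clean)
print(noisy, keep)
```

A strong peak always survives the cutoff and a very weak one never does; peaks near 0.3% can go either way, which is how noise removes or distorts the weakest (longest-distance) constraints.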

Determination of Distances from 2D NOE Intensities. As ISPA requires short mixing times, all ISPA distances were estimated from the 100-ms data. We selected from the current literature two ISPA-based approaches to assigning distances. The first is "conservative ISPA": this approach yields broad distance ranges allowing for significant error. The second, "restrictive ISPA," utilizes ISPA to calculate shorter distances and otherwise assigns relatively narrow error boundaries. A third distance set was determined by means of MARDIGRAS, the least computationally intensive of the iterative complete relaxation matrix approaches. NOE-derived distances were not supplemented with any additional constraints, other than holonomic constraints necessary for distance geometry calculations. Not all NOE intensities yield useful distance information (e.g., cross-peaks between geminal protons and between aromatic ring protons), so the final distance sets contain fewer constraints than there were 2D NOE intensities. Of the 812 "observable" cross-peak intensities, only 708 yield useful distance constraints.

Conservative Distance Bounds Using ISPA. Distance constraints were assigned to categories based on fixed-distance Phe and Tyr Hδ1-Hε1 and Hδ2-Hε2 cross-peak intensities. These intensities build up more slowly than those for geminal proton pairs (often used for ISPA) and are therefore a more reasonable approximation of the initial rate condition. In the experimental spectrum used for reference, only one of these peaks was resolvable (Hδ1-Hε1 of Tyr-35). This cross-peak alone was used as reference (distance = 2.49 Å). Table 1 lists distance assignments. For distances to unresolvable methyl and methylene groups, the standard real-atom approach was used, assigning the distance to the central carbon atom and adding 1.5 or 1.0 Å, respectively, to the constraints. For aromatic ring pseudoatoms, distances were assigned to the geometric-mean carbon atom (Cγ for δ-protons and Cζ for ε-protons) after adding a 2.0-Å correction factor. For lower bounds, the minimum distance was the sum of the van der Waals radii.

Restrictive Distance Bounds Using ISPA. For this case, lower bounds were explicitly assigned. Shorter distances were calculated directly using ISPA (±0.3 Å). The average value of all Hα1-Hα2 geminal proton intensities was used for calibration (distance = 1.77 Å). Smaller cross-peak intensities were placed in one of two distance categories; distance assignments are in Table 1. Upper bound pseudoatom corrections were the same as for the conservative ISPA distances; lower bounds were decreased by 0.5 Å for methyls and methylenes and by 1.0 Å for unresolved aromatic ring protons.

Table 1. Assignment of distance constraints from 2D NOE intensities

| Distance set ($\tau_c$/ns) | Intensity range, % | Distance range, Å | No. |
|---|---|---|---|
| Conservative ISPA (2) | >2.0 | 1.8-2.8 | 350 |
| | 0.8-2.0 | 1.8-3.5 | 262 |
| | 0.3-2.0 | 1.8-5.0 | 200 |
| Conservative ISPA (5) | >4.5 | 1.8-2.8 | 351 |
| | 1.5-4.5 | 1.8-3.6 | 318 |
| | 0.3-1.5 | 1.8-5.2 | 143 |
| Restrictive ISPA (2) | >1.2 | ISPA ± 0.3 | 520 |
| | 0.9-1.2 | 3.0-3.7 | 57 |
| | 0.3-0.9 | 3.6-5.0 | 235 |
| Restrictive ISPA (5) | >2.0 | ISPA ± 0.3 | 604 |
| | 1.5-2.0 | 3.1-3.8 | 65 |
| | 0.3-1.5 | 3.6-5.0 | 143 |
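The two ISPA bound-assignment schemes for the 2-ns data can be sketched as simple lookups on the Table 1 categories. This is an illustrative reading, not the authors' code: the function names and the reference intensity are hypothetical, the lowest conservative category is read here as everything between the 0.3% cutoff and 0.8%, and pseudoatom corrections are left out.

```python
def conservative_bounds(intensity_pct):
    """Conservative ISPA bounds (Å) for the 2-ns data set; None if unobserved."""
    if intensity_pct < 0.3:
        return None                # below the acceptance cutoff
    if intensity_pct > 2.0:
        return (1.8, 2.8)
    if intensity_pct >= 0.8:
        return (1.8, 3.5)
    return (1.8, 5.0)              # assumed to cover 0.3% up to 0.8%


def restrictive_bounds(intensity_pct, ref_intensity_pct, ref_distance=1.77):
    """Restrictive ISPA bounds (Å) for the 2-ns data set.

    Strong peaks get the ISPA distance +/- 0.3 Å, calibrated on the geminal
    reference (1.77 Å); weaker peaks fall into two fixed ranges.
    """
    if intensity_pct < 0.3:
        return None
    if intensity_pct > 1.2:
        r = ref_distance * (ref_intensity_pct / intensity_pct) ** (1.0 / 6.0)
        return (r - 0.3, r + 0.3)
    if intensity_pct >= 0.9:
        return (3.0, 3.7)
    return (3.6, 5.0)


# Hypothetical geminal reference intensity of 20% of the diagonal.
for pct in (3.0, 1.0, 0.5, 0.1):
    print(pct, conservative_bounds(pct), restrictive_bounds(pct, 20.0))
```

The contrast is visible in the lower bounds: the conservative scheme always falls back to the 1.8 Å van der Waals limit, whereas the restrictive scheme commits to a narrow window around each estimate.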

Distance Bounds Using MARDIGRAS. MARDIGRAS requires that experimental 2D NOE intensities be supplemented by intensities calculated for some arbitrary model structure. To minimize bias, the model used for all distance calculations, unless otherwise noted, was the 5pti proton coordinate set randomized by a root-mean-square (rms) shift of 3.0 Å. This rather poor model gave MARDIGRAS distances with an rms deviation (rmsd) between upper and lower bounds of 1.68 Å for 2-ns data and 1.54 Å for 5-ns data. When the 4pti crystal structure (0.39 Å rmsd vs. 5pti over backbone atoms of residues 1-56) was used as starting model, the results were much better: rmsd of 1.53 Å for 2-ns data and 1.26 Å for 5-ns data. As an additional test, MARDIGRAS was run using an extended-chain structure as the initial model. Overall, results were comparable to those for the randomized model. Although MARDIGRAS is capable of generating a largely correct set of distances independent of starting model, distances are improved with a better initial model. Distances generated using the 4pti model have higher precision than those for either the extended-chain or the randomized-coordinate model. Each MARDIGRAS calculation took 35-50 min on a Sun SPARCstation 1.

MARDIGRAS yields more accurate distances from the 2-ns data than from the 5-ns data. More extensive cross-relaxation occurs during $\tau_m$ at the longer correlation time, so intensities tend toward a common value. Although MARDIGRAS accounts for spin diffusion, it cannot compensate for loss of information resulting from severe spin diffusion. Despite this limitation, because MARDIGRAS inherently accounts for spin diffusion, it still yields accurate distances at longer mixing times than can be analyzed using ISPA, potentially providing more distance constraints. Increasing the number of constraints will improve structural resolution. We ran MARDIGRAS using 100- and 200-ms data. Distances calculated for these different 2D NOE intensity sets were in general agreement, but the 100-ms distances were slightly more accurate. The 200-ms data did, however, yield an additional 50 distances for cross-peaks that are below the noise level at 100 ms. We did not use these distances in the structure determination phase of this study, as we wanted to test the effect of distance accuracy rather than number of constraints. The complete relaxation network approach here provides accurate distances; generally distances of 3-4 Å can be specified to ±0.2 Å, and distances of 5 Å can be specified to ±0.7 Å.

The iterative portion of MARDIGRAS does not rely on any model for internal motions. Convergence is required only of relaxation rates. Distances calculated from these rates, however, depend on motional model. For this study, we assumed only isotropic overall motion. For unresolvable equivalent spins, MARDIGRAS gives distances to a pseudoatom; only the uncertainty in these distances is currently estimated using a motional model. For methyl and methylene groups, the correction factor is ±0.3 Å, reflecting a worst-case deviation of the effective proton pseudoatom position from the central carbon atom, assuming $\langle r^{-3} \rangle$ averaging. For distances to aromatic ring pseudoatoms, the correction factor is the same as for ISPA distances.
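The power-law averaging used for pseudoatoms in this study can be illustrated with a small sketch. The function name and the example coordinates are hypothetical; the exponents 3 (methyl and methylene groups) and 6 (aromatic rings) are the ones named in the text.

```python
import numpy as np


def averaged_distance(target, protons, power):
    """Effective distance <r^-power>^(-1/power) from `target` to a proton group."""
    d = np.linalg.norm(np.asarray(protons, dtype=float)
                       - np.asarray(target, dtype=float), axis=1)
    return float(np.mean(d ** -float(power)) ** (-1.0 / power))


# Hypothetical geometry: one observed proton at the origin and a group of two
# equivalent protons lying 2 Å and 4 Å away from it.
target = [0.0, 0.0, 0.0]
group = [[2.0, 0.0, 0.0], [0.0, 4.0, 0.0]]

r3 = averaged_distance(target, group, 3)   # fast-motion averaging
r6 = averaged_distance(target, group, 6)   # slow-motion averaging
print(r3, r6)
```

Both averages are pulled toward the closer proton, and the sixth-power average more strongly so. This is why a pseudoatom distance is an effective quantity that needs a correction term rather than a simple geometric mean.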

Generation of Structures Using Distance Geometry. We used the program VEMBED written by John Thomason at Univ. of California, San Francisco; it is a vectorized version of EMBED (27). For each set of NOE distances, two DG runs were made, each producing 20 structures. A control set of "perfect" distances was also run through VEMBED to distinguish errors due to distance constraints from those due to the VEMBED algorithm and to our subsampling of the hypothetical NOE data. For each intensity in the 100-ms data, the actual distance (±0.02 Å) in the 5pti structure was used. In addition, distance ranges were modified for unresolvable equivalent spins, ±0.3 Å for methyl and methylene carbon distances and +2.0/-1.0 Å for aromatic ring carbons, to reproduce the precision of the experimentally derived distance sets. The best 8-10 structures for each set of distances were selected, based on the final value of the VEMBED target function.
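The selected DG structures are compared with one another and with the true structure by best-fit superposition rmsd. A minimal sketch of such a comparison follows, using the Kabsch algorithm. This is a standard technique chosen here for illustration; the source does not say which superposition method was used, and the function name and test coordinates are hypothetical.

```python
import numpy as np


def superposed_rmsd(mobile, target):
    """rmsd between two (N, 3) coordinate sets after optimal superposition."""
    P = np.asarray(mobile, dtype=float)
    Q = np.asarray(target, dtype=float)
    P = P - P.mean(axis=0)                 # remove translation
    Q = Q - Q.mean(axis=0)
    U, _, Vt = np.linalg.svd(P.T @ Q)      # covariance matrix H = P^T Q
    # Guard against an improper rotation (reflection).
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
    rot = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    diff = P @ rot.T - Q
    return float(np.sqrt((diff ** 2).sum() / len(P)))


# Hypothetical check: a rotated and translated copy should superpose exactly.
rng = np.random.default_rng(1)
coords = rng.normal(size=(10, 3))
c, s = np.cos(0.7), np.sin(0.7)
rot_z = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
moved = coords @ rot_z.T + np.array([1.0, -2.0, 0.5])
print(superposed_rmsd(moved, coords))

# Perturbing the coordinates gives a nonzero rmsd.
perturbed = coords + rng.normal(scale=0.5, size=coords.shape)
print(superposed_rmsd(perturbed, coords))
```

Restricting the input arrays to backbone or side chain atoms gives the separate rmsd columns reported for each structure family.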

Results of Distance Geometry Calculations

Results are summarized in Tables 2-4. All rmsd are calculated over all residues except for the N-terminal residue and three at the C terminus. These residues were found to be very poorly defined even for the control set of perfect distances, due simply to lack of constraints. Table 2 lists rmsd values for best-fit superposition of each structure relative to (i) the "average structure" for that particular distance set and (ii) the true structure. It is common to list rmsd from the average structure as a measure of similarity between the individual structures in a family. Table 3 compares these average structures to the starting structure. Since the NOE experiment yields time-averaged data, it is often assumed that this average structure is representative of the time-averaged structure. This approach may be justified: Table 3 shows that the average structure is consistently closer to the true structure than any single member of a family.

Table 2. rms difference between upper and lower distance bounds and rmsd of each DG structure in a set from the averaged coordinates of structures in the set and from the true starting structure

| Distance set ($\tau_c$/ns) | rmsd between bounds, Å | From average structure,* Å: backbone atoms | From average structure,* Å: side chain atoms | From average structure,* Å: all atoms | From true structure,* Å: backbone atoms | From true structure,* Å: side chain atoms | From true structure,* Å: all atoms |
|---|---|---|---|---|---|---|---|
| Control | 1.09 | 0.69 ± 0.09 | 1.34 ± 0.15 | 1.13 ± 0.12 | 1.10 ± 0.08 | 2.16 ± 0.17 | 1.81 ± 0.12 |
| Conservative ISPA (2) | 2.77 | 0.76 ± 0.10 | 1.44 ± 0.12 | 1.23 ± 0.11 | 1.27 ± 0.11 | 2.43 ± 0.12 | 2.07 ± 0.09 |
| Restrictive ISPA (2) | 2.19 | 0.72 ± 0.09 | 1.43 ± 0.13 | 1.22 ± 0.11 | 1.19 ± 0.09 | 2.38 ± 0.15 | 2.00 ± 0.12 |
| MARDIGRAS (2) | 1.68 | 0.66 ± 0.06 | 1.34 ± 0.10 | 1.13 ± 0.07 | 1.08 ± 0.10 | 2.26 ± 0.16 | 1.89 ± 0.13 |
| Conservative ISPA (5) | 2.75 | 0.76 ± 0.08 | 1.46 ± 0.14 | 1.23 ± 0.10 | 1.36 ± 0.06 | 2.50 ± 0.13 | 2.13 ± 0.10 |
| Restrictive ISPA (5) | 2.00 | 0.64 ± 0.08 | 1.33 ± 0.07 | 1.11 ± 0.06 | 1.22 ± 0.06 | 2.35 ± 0.15 | 1.98 ± 0.10 |
| MARDIGRAS (5) | 1.54 | 0.73 ± 0.09 | 1.40 ± 0.13 | 1.19 ± 0.10 | 1.13 ± 0.11 | 2.24 ± 0.12 | 1.89 ± 0.10 |

*Values represent the mean ± standard deviation.

Table 3. rmsd between the averaged coordinates of each set of DG structures and the true structure

| Structure set ($\tau_c$/ns) | Backbone atoms, Å | Side chain atoms, Å | All atoms, Å |
|---|---|---|---|
| Control | 0.86 | 1.70 | 1.42 |
| Conservative ISPA (2) | 1.02 | 1.95 | 1.66 |
| Restrictive ISPA (2) | 0.97 | 1.98 | 1.65 |
| MARDIGRAS (2) | 0.86 | 1.81 | 1.52 |
| Conservative ISPA (5) | 1.13 | 2.03 | 1.74 |
| Restrictive ISPA (5) | 1.03 | 1.94 | 1.64 |
| MARDIGRAS (5) | 0.86 | 1.75 | 1.47 |

One method of judging accuracy of a proposed molecular structure is to calculate the 2D NOE spectrum for that model and compare it to the experimental data (28). This has gained broad acceptance recently, and several refinement techniques based on such calculations have been proposed (11, 15, 19, 24, 29). Table 4 was compiled using CORMA to calculate the theoretical spectrum for each structure and comparing the fit between these spectra and the original "data." Several different figures of merit were calculated to express the overall fit. The first two measures are analogous to crystallographic residual indices, or R factors:

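$$R_1 = \frac{\sum |a_o - a_c|}{\sum a_o}$$

$$R_2 = \left[ \frac{\sum (a_o - a_c)^2}{\sum (a_o)^2} \right]^{1/2}$$

where subscripts denote calculated (c) and observed (o) intensities. These R factors are well-established in crystallography, but for NMR other functions may be more descriptive. We favor sixth-root residual indices:

$$R_1^x = \frac{\sum \left| (a_o)^{1/6} - (a_c)^{1/6} \right|}{\sum (a_o)^{1/6}}$$

$$R_2^x = \left[ \frac{\sum \left[ (a_o)^{1/6} - (a_c)^{1/6} \right]^2}{\sum \left[ (a_o)^{1/6} \right]^2} \right]^{1/2}$$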
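The four residual indices can be computed directly from paired lists of observed and calculated intensities. The sketch below follows the definitions above; the function name, the dictionary keys, and the two-peak example are hypothetical, and intensities are assumed to be non-negative so that the sixth root is defined.

```python
import numpy as np


def r_factors(a_obs, a_calc):
    """Return R1, R2 and their sixth-root analogues R1x, R2x."""
    a_o = np.asarray(a_obs, dtype=float)
    a_c = np.asarray(a_calc, dtype=float)

    def r1(o, c):
        return float(np.abs(o - c).sum() / o.sum())

    def r2(o, c):
        return float(np.sqrt(((o - c) ** 2).sum() / (o ** 2).sum()))

    o6, c6 = a_o ** (1.0 / 6.0), a_c ** (1.0 / 6.0)
    return {"R1": r1(a_o, a_c), "R2": r2(a_o, a_c),
            "R1x": r1(o6, c6), "R2x": r2(o6, c6)}


# Hypothetical example: one peak fits exactly, one strong peak is badly missed.
result = r_factors(a_obs=[1.0, 64.0], a_calc=[1.0, 1.0])
print(result)
```

In this example the single strong peak drives R1 close to 1, while the sixth-root index is 1/3. The sixth-root scaling compresses the intensity range so that weaker, longer-range cross-peaks also contribute to the figure of merit.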

These equations attempt to relate intensities, assuming approximate dependence on $r^{-6}$, to the coordinate space of the model. Because of this extreme distance dependence, errors in the shortest, often least structurally interesting distances tend to dominate $R_1$ and $R_2$. Sixth-root scaling allows longer-range interactions (i.e., up to ≈5 Å) to be considered as well, though they are still not weighted as heavily as the larger cross-peaks (appropriate due to their lower signal-to-noise ratio). Although the different residual indices generally offered no consensus in ranking structures within a given set of structures, gross results of the comparisons are essentially independent of the form of the figure of merit. Differences in overall fits to data qualitatively agree with the rmsd calculations. This confirms the usefulness of comparing calculated spectra to experimental data. Of the different residual indices, $R^x$ ranked structures within a given set most consistently with rmsd rankings, though the correlation was not significant for some sets. MARDIGRAS structures fit the data consistently better than either ISPA set.

The most significant trend in all of these tables is the increase in structural accuracy with increasingly restrictive constraints. As the distance ranges become smaller, the structures become more conformationally restricted. As long as the smaller distance ranges are in reasonable agreement with the true distances, there is a greater probability that a randomly chosen distance from a smaller range will be more accurate than one chosen from a larger range. Conservative ISPA distances do not specify a lower bound and are therefore more likely to result in an underestimated random distance. Distance constraints in other parts of the molecule may correct these initial distances, but sometimes they may not. Considering rmsd, restrictive ISPA structures are on average about 2-5% closer to the true structure than those generated from conservative ISPA distances. MARDIGRAS structures are another 5-10% closer to the true structure than restrictive ISPA. Significantly, the control set of perfect distances yielded structures with no improvement in backbone rmsd and only 5% improvement in side chain rmsd relative to the MARDIGRAS structures. Improvement in structure quality is strikingly revealed by values of the residual indices (Table 4). These R factors for different distance sets differ by two or more standard deviations, whereas the rmsd between atom coordinates may differ by less than one standard deviation.

One might expect an overall decrease in accuracy of derived structures with increasing $\tau_c$, particularly with ISPA, due to increased spin diffusion. For all three methods, more incorrect distance constraints were calculated for the 5-ns data than for the 2-ns data. For MARDIGRAS and restrictive ISPA methods, however, structures derived for the different correlation times are comparable, for both overall rmsd and R factors (Tables 2-4). This may be a result of tighter bounds in the 5-ns distance sets (Table 2). This tightening of constraints for the restrictive ISPA distances is shown in Table 1: more cross-peak volumes in the 5-ns data set are large enough to estimate directly. For MARDIGRAS, uncertainty in NOE-derived distances depends in part on signal-to-noise ratio; generally increasing signal intensity at 5 ns will decrease uncertainty.

Tables 2-4 describe structures on a global level. Determination of local structure will be important in certain regions, such as ligand binding sites. We used computer graphics to compare each DG structure set with the true structure. Because distances in each structure could be compared to those in the true structure, we used a list of consistently incorrect distances (in error by >0.5 Å in all structures in a set) to direct a search for systematic errors in determined structure. Fig. 1 Left shows one region of local structure for which distance errors accumulate to produce some systematic errors. Structures generated from the 5-ns data from restrictive ISPA distances and MARDIGRAS distances are compared. The Phe-45 ring is systematically displaced about

Table 4. Comparisons between calculated 2D NOE intensities for each structure and "experimental" data

Structure set (τc/ns)*    R1            R2            R1^x            R2^x
Control (2)               0.31 ± 0.03   0.38 ± 0.07   0.079 ± 0.005   0.137 ± 0.008
Conservative ISPA (2)     0.46 ± 0.02   0.51 ± 0.02   0.125 ± 0.007   0.199 ± 0.010
Restrictive ISPA (2)      0.39 ± 0.03   0.44 ± 0.04   0.109 ± 0.009   0.183 ± 0.017
MARDIGRAS (2)             0.34 ± 0.02   0.38 ± 0.03   0.092 ± 0.007   0.157 ± 0.014

Control (5)               0.30 ± 0.02   0.37 ± 0.03   0.062 ± 0.005   0.094 ± 0.008
Conservative ISPA (5)     0.45 ± 0.02   0.52 ± 0.04   0.094 ± 0.003   0.131 ± 0.007
Restrictive ISPA (5)      0.40 ± 0.03   0.48 ± 0.03   0.083 ± 0.005   0.117 ± 0.007
MARDIGRAS (5)             0.34 ± 0.01   0.40 ± 0.03   0.074 ± 0.003   0.107 ± 0.008

Values represent the mean ± standard deviation. See text for definitions of the various residual indices.
*Numbers in parentheses also represent the correlation time used for the calculation of 2D NOE intensities.
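The statement that R factors for different distance sets differ by two or more standard deviations can be checked against the tabulated values. The sketch below is one possible reading, not necessarily the comparison the authors made: it divides the difference between two set means by the average of their two standard deviations. It uses the first residual-index column of Table 4 for the 2-ns data; the function name is hypothetical.

```python
def separation_in_sd(mean_a, sd_a, mean_b, sd_b):
    """Difference between two means, in units of the average of their
    standard deviations (an assumed measure, chosen for illustration)."""
    return abs(mean_a - mean_b) / ((sd_a + sd_b) / 2.0)

# First residual-index column of Table 4, 2-ns data: (mean, standard deviation)
table4_2ns = {
    "Conservative ISPA": (0.46, 0.02),
    "Restrictive ISPA": (0.39, 0.03),
    "MARDIGRAS": (0.34, 0.02),
}

names = list(table4_2ns)
separations = {
    (a, b): separation_in_sd(*table4_2ns[a], *table4_2ns[b])
    for i, a in enumerate(names)
    for b in names[i + 1:]
}

for (a, b), s in separations.items():
    print(f"{a} vs {b}: {s:.1f} SD")
```

Under this assumed measure the three pairwise separations come to about 2.8, 6.0, and 2.0 standard deviations. A different choice of denominator (for example, the larger of the two standard deviations) would give somewhat smaller values.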


FIG. 1. Comparison between best-fit DG structures generated from restrictive ISPA distances (Left) and MARDIGRAS distances (Right). Distances were derived from 2D NOE spectra calculated for a mixing time of 100 ms and a correlation time of 5 ns. The true structure is shown in dashed lines. Only BPTI residues 45-51 are shown, and all side chains except for Phe-45, Glu-49, and Cys-51 have been removed for clarity. Structures are superimposed for best fit over backbone atoms of residues 44-52.

2.0 Å, due to incorrectly short distances to Arg-20 and Tyr-21. Systematically short distances from Phe-45 to Cys-51 result in an average displacement of the Cys oxygen atom by 2.3 Å. Although these errors involve a single incorrect torsion angle, Glu-49 contains several additive erroneous torsion angles. Two systematically incorrect intraresidue distances, along with another short distance to Asp-50, resulted in the Glu-49 carboxylate being displaced >3.0 Å. MARDIGRAS structures show none of these systematic singularities (Fig. 1 Right). Some systematic errors, notably backbone displacements over short (two to four residue) segments, were found in conservative ISPA structures for both correlation times, but these may be a result of limited DG sampling of distance space, as there were no significant errors in proton distances in these regions.
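The search for consistently incorrect distances (in error by >0.5 Å in all structures in a set) can be sketched as a simple filter. This is a minimal illustration rather than the procedure actually used; the function name, the proton labels, and all distance values below are hypothetical.

```python
def consistently_incorrect(true_dist, structure_dists, threshold=0.5):
    """Return (pair, mean signed error) for every proton pair whose distance
    deviates from the true value by more than `threshold` (in Å) in ALL
    structures of the set.

    true_dist:       dict mapping a proton pair to its true distance
    structure_dists: list of dicts, one per structure, with the same keys
    """
    flagged = []
    for pair, d_true in true_dist.items():
        errors = [s[pair] - d_true for s in structure_dists]
        # Flag only if every structure in the set exceeds the threshold.
        if errors and all(abs(e) > threshold for e in errors):
            flagged.append((pair, sum(errors) / len(errors)))
    return flagged

# Hypothetical distances (Å) for three proton pairs in a two-structure set.
true_dist = {("H1", "H2"): 4.0, ("H1", "H3"): 3.0, ("H2", "H3"): 5.0}
structure_dists = [
    {("H1", "H2"): 3.2, ("H1", "H3"): 3.1, ("H2", "H3"): 5.8},
    {("H1", "H2"): 3.3, ("H1", "H3"): 3.6, ("H2", "H3"): 4.9},
]

for pair, mean_err in consistently_incorrect(true_dist, structure_dists):
    print(pair, f"mean error {mean_err:+.2f} Å")
```

Only the first pair is flagged here: it is too short in both structures, whereas each of the other pairs is within the threshold in at least one structure. A negative mean error marks a systematically short distance of the kind described for Phe-45.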

Discussion

Results here suggest that more restrictive distance constraints yield more accurate structures than conservative application of the ISPA approach, using as judgment criteria both rmsd of heavy atom coordinates and spectral R factors. However, incorrect restrictive constraints (e.g., some from the restrictive ISPA approach) can lead to systematically incorrect local structural features. Despite the few instances of systematic error, even highly qualitative distances could define backbone folding correctly as well as the general position of most side chains. This assumption is inherent in most protein solution structure studies to date; our results confirm its validity. But the small investment of computer time to obtain more accurate distances by means of MARDIGRAS results in structures that agree more closely both with the actual structure and with the experimental data. MARDIGRAS distances can be used with either restrained molecular dynamics or DG to yield structures. MARDIGRAS also offers the potential of determining additional distances unavailable using ISPA; MARDIGRAS explicitly accounts for spin diffusion and can thus derive accurate distances from data obtained for longer mixing times.

Stereospecific assignment of prochiral centers was intentionally neglected here; correct stereospecific assignments were assumed. In reality, this is often not the case. The accuracy of MARDIGRAS distances, however, in tandem with a "floating chirality" algorithm allowing inversion of prochiral centers in the structure generation phase, can establish stereospecific assignments and thus higher resolution structures. We note that this study does not address in depth the effects of internal molecular motions. These effects, though generally less important than spin diffusion for macromolecular distance determination, should be considered for deriving distances from 2D NOE intensities. MARDIGRAS will soon be modified to include more sophisticated internal motional models such as those employed in our original complete relaxation matrix studies (9).

We thank Drs. Shauna Farr-Jones and Irwin D. Kuntz for helpful discussions and comments on this manuscript, Mr. Robert Cerpa for suggesting use of a sixth-root residual index, and Mr. John Thomason for help with the initial DG calculations. This work was supported by National Institutes of Health Grants GM 39247 and RR 01695 and by a gift from Ajinomoto Co., Inc. The Sun Sparcstation used for the computations was purchased using National Science Foundation Grant DMB-8406826. We gratefully acknowledge use of the Computer Graphics Laboratory (supported by National Institutes of Health Grant RR 01081) and use of the Cray-YMP supercomputer, which was supported by a grant from the Pittsburgh Supercomputing Center through the National Institutes of Health Division of Research Resources, Cooperative Agreement U41RR04154, and a grant from the National Science Foundation, Cooperative Agreement ASC-8500650.

1. Wuthrich, K. (1986) NMR of Proteins and Nucleic Acids (Wiley, New York).
2. Oppenheimer, N. J. & James, T. L., eds. (1989) Methods Enzymol. 176.
3. Oppenheimer, N. J. & James, T. L., eds. (1989) Methods Enzymol. 177.
4. Bax, A. (1989) Annu. Rev. Biochem. 58, 223-256.
5. Nilges, M., Gronenborn, A. M., Brunger, A. T. & Clore, G. M. (1988) Protein Eng. 2, 27-38.
6. Billeter, M., Kline, A. D., Braun, W., Huber, R. & Wuthrich, K. (1989) J. Mol. Biol. 206, 677-687.
7. Zuiderweg, E. R. P., Scheek, R. M., Boelens, R., van Gunsteren, W. F. & Kaptein, R. (1985) Biochimie 67, 707-715.
8. Holak, T. A., Prestegard, J. H. & Forman, J. D. (1987) Biochemistry 26, 4652-4660.
9. Keepers, J. W. & James, T. L. (1984) J. Magn. Reson. 57, 404-426.
10. Lefevre, J.-F., Lane, A. N. & Jardetzky, O. (1987) Biochemistry 26, 5076-5090.
11. Borgias, B. A. & James, T. L. (1988) J. Magn. Reson. 79, 493-512.
12. Borgias, B. A., Gochin, M., Kerwood, D. J. & James, T. L. (1990) Prog. Nucl. Magn. Reson. Spectrosc. 22, 83-100.
13. Landy, S. B. & Rao, B. D. N. (1989) J. Magn. Reson. 83, 19-43.
14. Borgias, B. A., Thomas, P. D. & James, T. L. (1989) CORMA, Complete Relaxation Matrix Analysis (Univ. of California, San Francisco), Version 2.2.
15. Borgias, B. A. & James, T. L. (1990) J. Magn. Reson. 87, 475-487.
16. Massefski, W., Jr., & Redfield, A. G. (1988) J. Magn. Reson. 78, 150-155.
17. Eaton, H. L. & Andersen, N. H. (1987) J. Magn. Reson. 74, 212-225.
18. Hyberts, S. G. & Wagner, G. H. (1989) J. Magn. Reson. 81, 418-422.
19. Baleja, J. D. & Sykes, B. D. (1990) J. Magn. Reson. 87, 375-384.
20. Borgias, B. A. & James, T. L. (1989) Methods Enzymol. 176, 169-183.
21. Olejniczak, E. T., Gampe, R. T., Jr., & Fesik, S. W. (1986) J. Magn. Reson. 67, 28-41.
22. Macura, S., Farmer, B. T., II, & Brown, L. R. (1986) J. Magn. Reson. 70, 493-499.
23. Mirau, P. A. (1988) J. Magn. Reson. 80, 439-447.
24. Boelens, R., Koning, T. M. G. & Kaptein, R. (1988) J. Mol. Struc. 173, 299-311.
25. Gochin, M. & James, T. L. (1990) Biochemistry 29, 11172-11180.
26. Wlodawer, A., Walter, J., Huber, R. & Sjolin, L. (1984) J. Mol. Biol. 180, 307-329.
27. Havel, T. F., Kuntz, I. D. & Crippen, G. M. (1983) Bull. Math. Biol. 45, 665-720.
28. Suzuki, E.-I., Pattabiraman, N., Zon, G. & James, T. L. (1986) Biochemistry 25, 6854-6865.
29. Yip, P. & Case, D. A. (1989) J. Magn. Reson. 83, 643-648.
