Ab initio solution of macromolecular crystal structures without direct methods

Airlie J. McCoy^a, Robert D. Oeffner^a, Antoni G. Wrobel^b, Juha R. M. Ojala^c, Karl Tryggvason^c,d, Bernhard Lohkamp^e, and Randy J. Read^a,1

^a Department of Haematology, Cambridge Institute for Medical Research, University of Cambridge, Cambridge CB2 0XY, United Kingdom; ^b Department of Clinical Biochemistry, Cambridge Institute for Medical Research, University of Cambridge, Cambridge CB2 0XY, United Kingdom; ^c Division of Matrix Biology, Department of Medical Biochemistry and Biophysics, Karolinska Institutet, 171 77 Stockholm, Sweden; ^d Cardiovascular and Metabolic Disorders Program, Duke-NUS (National University of Singapore) Medical School, 16957 Singapore; and ^e Division of Molecular Structural Biology, Department of Medical Biochemistry and Biophysics, Karolinska Institutet, 171 77 Stockholm, Sweden

Edited by Axel T. Brunger, Stanford University, Stanford, CA, and approved February 27, 2017 (received for review January 30, 2017)

The majority of macromolecular crystal structures are determined using the method of molecular replacement, in which known related structures are rotated and translated to provide an initial atomic model for the new structure. A theoretical understanding of the signal-to-noise ratio in likelihood-based molecular replacement searches has been developed to account for the influence of model quality and completeness, as well as the resolution of the diffraction data. Here we show that, contrary to current belief, molecular replacement need not be restricted to the use of models comprising a substantial fraction of the unknown structure. Instead, likelihood-based methods allow a continuum of applications depending predictably on the quality of the model and the resolution of the data. Unexpectedly, our understanding of the signal-to-noise ratio in molecular replacement leads to the finding that, with data to sufficiently high resolution, fragments as small as single atoms of elements usually found in proteins can yield ab initio solutions of macromolecular structures, including some that elude traditional direct methods.

macromolecular crystallography | likelihood | ab initio phasing | molecular replacement | Shisa

Over the past century, determination of novel crystal structures has evolved from an exercise in logic identifying the locations of single atoms by inspecting diffraction patterns (1) or vector maps (2), through the development of direct methods for small molecules (3) and of isomorphous replacement (4, 5) or anomalous diffraction (6, 7) phasing for molecules as large as proteins.

Currently, about 80% of protein structures are solved by the method of molecular replacement (8), exploiting prior structural knowledge of related proteins. In principle, molecular replacement (MR) involves rotational and translational searches over many possible placements of a molecular model within the unit cell of an unknown structure. The most sensitive method of evaluating the fit to the observed data is a likelihood function (9, 10) that accounts for the effect of measurement errors in the observed diffraction intensities (11). Potential solutions are scored by the log-likelihood-gain on intensities (LLGI), the sum of the log-likelihoods for individual reflections minus the log-likelihoods for an uninformative model (Methods).

Success in MR depends on the signal-to-noise ratio of the search, which varies according to two parameters in the likelihood function: D_obs characterizes the precision of each measurement, taking values near 1 for moderately well-measured data and only taking values near 0 for extremely weak data; σ_A measures the quality of the model in terms of the fraction of a crystallographic structure factor that it explains. The resolution-dependent value of σ_A for each reflection can be estimated from the fraction (f_P) of the X-ray scattering power accounted for by the model (where the total scattering power is the sum of the squares of the scattering factors for the atoms in the crystal), its estimated accuracy (rms error Δ), and the resolution (d) of the reflection (9), with (optionally) a correction for the effect of disordered solvent described by the parameters f_sol and B_sol:

$$\sigma_A = \sqrt{f_P\left[1 - f_{\mathrm{sol}}\exp\left(-\frac{B_{\mathrm{sol}}}{4d^2}\right)\right]}\;\exp\left(-\frac{2\pi^2}{3}\frac{\Delta^2}{d^2}\right), \tag{1a}$$

$$\sigma_A \approx \sqrt{f_P}\;\exp\left(-\frac{2\pi^2}{3}\frac{\Delta^2}{d^2}\right). \tag{1b}$$
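As a minimal sketch of how Eqs. 1a and 1b might be evaluated for one reflection, the following Python function takes f_P, Δ, and d, with the solvent parameters optional. The function and argument names are illustrative choices made here, not names taken from Phaser, and the numbers in the usage lines are made up.

```python
import math


def sigma_a(f_p, rms_error, d, f_sol=0.0, b_sol=0.0):
    """Estimate sigma_A for one reflection at resolution d (in Angstrom).

    Implements Eq. 1a; with f_sol = 0 the solvent term equals 1 and the
    expression reduces to Eq. 1b.  Names are illustrative, not Phaser's.
    """
    # Low-resolution correction for disordered solvent (Eq. 1a only).
    solvent_term = 1.0 - f_sol * math.exp(-b_sol / (4.0 * d * d))
    # Fall-off with resolution caused by coordinate error (rms error Delta).
    error_term = math.exp(-(2.0 * math.pi ** 2 / 3.0) * (rms_error / d) ** 2)
    return math.sqrt(f_p * solvent_term) * error_term


# Hypothetical model: half of the scattering power, 1.0 A rms error.
for d in (10.0, 3.0, 1.5):
    print(d, sigma_a(0.5, 1.0, d))
```

With Δ = 0 the error term is 1 at every resolution, which is the case exploited for single atoms.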

The simpler expression in Eq. 1b neglects the effect of disordered solvent at low resolution.

The signal for an MR search can be estimated before the calculation as the expected value, or probability-weighted average, of the LLGI for a correctly placed model. The expected value of the contribution of one reflection, ⟨LLGI⟩_hkl, can be approximated simply by D_obs^4 σ_A^4/2 (Methods), an approximation that is particularly good for the low values of D_obs σ_A characterizing the difficult cases of most interest. In the following, we refer to the total expected LLGI, summed over all reflections, as the eLLG.

The variance of eLLG can similarly be approximated as the sum over all reflections of D_obs^4 σ_A^4, leading to the conclusion that the expected signal-to-noise ratio in an MR search will be proportional to √eLLG (Methods). By the same reasoning, the signal-to-noise ratio achieved in a particular search will be proportional to √LLGI. The theoretical deduction that confidence in an MR solution can be judged simply by the LLGI value has been validated by analyzing a database of nearly 22,000 MR calculations, where an LLGI of 60 or more in a 6-dimensional rotation/translation search typically indicates a correct solution. (See Fig. 1, which also shows that the required signal scales with the number of degrees of freedom in the search.) The database of test calculations also reveals that the translation function Z score (TFZ: the number of SDs by which the translation function peak exceeds its mean) is roughly on the same scale as √LLGI, although the exact relationship depends on the number of primitive symmetry operators; this justifies the success of TFZ as a measure of confidence (10).

Significance

It is now possible to make an accurate prediction of whether or not a molecular replacement solution of a macromolecular crystal structure will succeed, given the quality of the model, its size, and the resolution of the diffraction data. This understanding allows the development of powerful structure-solution strategies, and leads to the unexpected finding that, with data to sufficiently high resolution, fragments as small as single atoms can be placed as the basis for ab initio structure solutions.

Author contributions: B.L. and R.J.R. designed research; A.J.M., R.D.O., A.G.W., J.R.M.O., K.T., B.L., and R.J.R. performed research; A.J.M., R.D.O., A.G.W., B.L., and R.J.R. analyzed data; A.J.M. and R.J.R. wrote the paper; and all authors contributed to revisions.

The authors declare no conflict of interest.

This article is a PNAS Direct Submission.

Data deposition: The atomic coordinates and structure factors have been deposited in the Protein Data Bank, www.pdb.org (PDB ID code 5m0w).

^1 To whom correspondence should be addressed. Email: [email protected].

This article contains supporting information online at www.pnas.org/lookup/suppl/doi:10.1073/pnas.1701640114/-/DCSupplemental.

www.pnas.org/cgi/doi/10.1073/pnas.1701640114 PNAS | April 4, 2017 | vol. 114 | no. 14 | 3637–3641 | BIOPHYSICS AND COMPUTATIONAL BIOLOGY

An LLGI at the level required to distinguish the correct solution from up to millions of alternatives can be achieved by predictable trade-offs among model quality, completeness, and resolution of the data used. For example, this theoretical insight explains why it is possible to place individual α-helices with better than random success in the Arcimboldo pipeline (12), but also why it is a great advantage to have data extending beyond 2-Å resolution: helices are preserved very well, so that Δ is small and data to the highest resolution will contribute to the signal. The theory also predicts, correctly, that calculations limited to around 10-Å resolution can give unambiguous MR solutions for ribosome structures, because of the large numbers of diffraction observations available to that resolution with the large ribosomal unit cell. Importantly, it also allows researchers to anticipate when MR is unlikely to succeed, so that they avoid fruitless calculations.

This insight led us to consider the most extreme example of a small fragment, i.e., a single atom. A single atom is a perfect partial model (Δ = 0), for which σ_A^2 = f_P and hence ⟨LLGI⟩_hkl ∝ f_P^2 for well-measured data regardless of the resolution. With high-resolution data containing a sufficient number of reflections, the eLLG can rise to a substantial number. This is particularly true for atoms that are somewhat heavier than average. For instance, the square of the scattering power of a sulfur atom (i.e., the fourth power of its scattering factor) is about 50× greater than that of a carbon atom at a very low resolution such as 10–20 Å; because scattering drops off less rapidly for sulfur, that ratio increases to about 300 at 1-Å resolution. This effect is amplified if a sulfur atom is better ordered than the average atom in the structure, because its relative scattering power becomes even greater. Furthermore, only half as much signal should be required to place a single atom with 3 degrees of freedom compared with a molecule with 6 degrees of freedom (Fig. 1). Our insights predict that, for crystals that contain up to a few thousand unique ordered atoms and diffract beyond about 1-Å resolution, there should be a significant signal in a likelihood search carried out by translating a single sulfur atom over all of its possible positions. Even if the placement of the first atom is ambiguous, the signal will increase quadratically with the number of atoms placed (Fig. 2), allowing the ambiguity to be resolved.
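The arithmetic behind this argument can be sketched in a few lines of Python. The sketch rests on simplifying assumptions introduced here for illustration: scattering factors are replaced by atomic numbers (their low-resolution limit), every reflection is given the same f_P and D_obs, and the placed atoms are identical with Δ = 0, so that σ_A^2 = f_P and each reflection contributes D_obs^4 f_P^2/2. The function names and the toy crystal are hypothetical.

```python
def low_resolution_power_ratio(z_heavy, z_light):
    """Ratio of squared scattering powers (fourth powers of the scattering
    factors), approximating each scattering factor by the atomic number."""
    return (z_heavy / z_light) ** 4


def few_atom_ellg(n_placed, f_atom, total_scattering_power, n_reflections,
                  d_obs=1.0):
    """Rough eLLG for n_placed identical, perfectly placed atoms.

    f_p is the fraction of the total scattering power (sum of squared
    scattering factors) explained by the placed atoms; with Delta = 0 each
    reflection contributes d_obs**4 * f_p**2 / 2.
    """
    f_p = n_placed * f_atom ** 2 / total_scattering_power
    return n_reflections * d_obs ** 4 * f_p ** 2 / 2.0


# Sulfur (Z = 16) versus carbon (Z = 6) at very low resolution.
print(round(low_resolution_power_ratio(16, 6), 1))  # 50.6

# Toy crystal: 2,500 carbon-like atoms and 500,000 reflections (made-up values).
total_power = 2500 * 6 ** 2
for n in (1, 2, 3):
    print(n, few_atom_ellg(n, 16, total_power, 500_000))
```

Because f_P is proportional to the number of equal atoms placed, the estimate grows as the square of that number, which is the quadratic increase described above.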

Results

Test calculations on a number of systems proved the principle of single-atom MR: it was indeed possible to find sulfur atoms in a variety of protein crystals, as well as phosphorus atoms in one RNA crystal tested (Table S1). The largest structure that yielded to this approach was that of aldose reductase [Protein Data Bank (PDB) ID code 3bcj] (13). The protein has a mass of 36 kDa with 2,525 nonhydrogen atoms (2,606 including ligands) and no atom heavier than sulfur, and the deposited data extend to 0.78-Å resolution. The eLLG for a sulfur atom with a B factor equal to the average in the crystal is 4.0, or 12.6 for a well-ordered sulfur atom with a B factor reduced by only 1 Å^2. MR implemented in Phaser was able to locate up to 10 atoms with clear signal (Table 1).

A structure comprising a few atoms can then serve as a seed for structure completion by using log-likelihood-gradient maps to select locations for new nitrogen atoms (as a surrogate for other types) that improve the MR likelihood score (14) (Methods). Starting from as few as the first two atoms placed by MR, the structure of aldose reductase was extended successfully by log-likelihood-gradient completion. The result was a model with 3,051 atoms (some accounting for solvent molecules and for static disorder) that yields an LLGI of 483,292 and an R value of 12.9% (Fig. 3). In contrast, all attempts to solve this structure by direct methods or their dual-space variants (15, 16) have failed. As far as we can determine, it is the largest reported ab initio structure containing nothing heavier than the sulfur atoms found in natural protein sequences, although larger ab initio structures containing metal ions have been solved (17).

Fig. 1. Confidence in MR solution as function of final LLGI score. The final refined LLGI score provides a clear diagnostic for success in MR. The three curves show how the success rate for placing the first copy by MR varies with LLGI in 3 different space-group symmetry classes: P1 (only 3 rotational degrees of freedom; red; total of 263 MR trials), polar (3 rotational and 2 translational degrees of freedom, with an arbitrary origin along one axis; blue; 4,738 MR trials), and nonpolar (3 rotational and 3 translational degrees of freedom; black; 16,740 MR trials).

Fig. 2. Increase in eLLG with resolution and number of atoms. The three curves show how the eLLG increases with the number of atoms placed (one atom: blue curve, two atoms: orange curve, three atoms: green curve) and with increasing numbers of reflections to higher resolution. The calculations are based on the aldose reductase test case (3bcj), for which the data extend to 0.78-Å resolution and the heaviest atoms are sulfurs. It is assumed that B factors for the best-ordered S atoms will be lower than the mean for the whole structure; by choosing a B factor reduced by just 1.3 Å^2 from the mean, the actual LLGI values obtained from placing single S atoms (Table 1) can be reproduced fairly well. The eLLG values rise rapidly with resolution, as the number of observed reflections increases and the relative scattering power of the S atoms increases.

The formulation predicts that it should also be possible to place sulfur atoms in smaller structures at lower resolution. This was crucial in solving a previously unknown structure, the N-terminal domain (residues 22–95) of Shisa3, which crystallized in space group P4₃2₁2 and diffracted to 1.39-Å resolution. The protein did not have detectable sequence identity with any protein in the PDB, so there was no template structure for traditional MR. The eLLG calculations predict that there should be some signal for placing well-ordered sulfur atoms, giving an eLLG of 4.0 for a sulfur atom with a B factor reduced by 1.5 Å^2 from the average. Indeed, up to seven of the eight sulfur atoms in this protein could be placed with good signal (Table 1).

Log-likelihood-gradient completion is expected to work more poorly at resolutions where atomic peaks are not resolved. Nonetheless, this succeeded in expanding the Shisa3 structure to a total of 56 atoms, with the additional atoms largely corresponding to well-ordered main-chain oxygen and nitrogen atoms. At this point, the phase information was sufficient to enable phase improvement by density modification in Parrot (18), and the resulting map could be interpreted in terms of an atomic model in ARP/wARP (19). A hybrid approach exploiting direct methods algorithms implemented in ACORN (17, 20) or in SHELXE (21) was also able to expand a partial structure obtained by single-atom MR. This succeeded when starting from as little as one pair of sulfur atoms (Fig. 4). The structure, which contains no α-helices and represents a protein fold with no detectable similarity to other structures in the PDB, was refined to an R value of 11.5% and has been deposited in the PDB with accession code 5m0w. Details of the structure will be discussed elsewhere.
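The sensitivity of the eLLG to small reductions in the B factor of the placed atom, noted above for both test cases, can be illustrated with the standard Debye-Waller factor exp(−B/(4d^2)), which scales an atom's scattering factor at resolution d. The Python sketch below is a hypothetical per-reflection estimate: it assumes the placed atom is a small enough part of the structure that the total scattering power is effectively unchanged, and the function name is chosen here for illustration. The overall eLLG gain depends on how the reflections are distributed over resolution, so the sketch is not expected to reproduce the values quoted in the text.

```python
import math


def per_reflection_ellg_gain(delta_b, d):
    """Factor by which one reflection's expected LLGI changes when the
    placed atom's B factor differs from the average by delta_b (A^2).

    The scattering factor scales by exp(-delta_b / (4 d^2)), the scattering
    power f_P by the square of that, and the expected LLGI (proportional to
    f_P^2 for a perfect single-atom model) by the fourth power.
    """
    amplitude_scale = math.exp(-delta_b / (4.0 * d * d))
    return amplitude_scale ** 4


# A B factor lowered by 1 A^2 matters far more at high resolution.
for d in (3.0, 1.5, 1.0, 0.8):
    print(d, per_reflection_ellg_gain(-1.0, d))
```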

Discussion

This work brings together high-resolution ab initio phasing and low-resolution MR in one unified framework that spans the continuum of data and model quality, with the eLLG directing the tailoring of structure solution to the optimal path for the data available. It demonstrates the considerable practical impact, compared with traditional direct methods, of accounting rigorously for the effects of sources of error in a likelihood target. It is also important to note that these results have been obtained by a deterministic algorithm. Direct methods, in contrast, are invariably implemented within a random multisolution framework, an approach that should also improve the outcome of single-atom MR. Finally, the results were obtained without taking advantage of any other information that would typically be present, e.g., from single-wavelength anomalous diffraction (SAD) effects in crystals with intrinsic anomalous scatterers such as sulfur, or even from isomorphous replacement experiments. A proper accounting for the effects of uncertainty, as demonstrated here, should allow us to extend our approach to use even weak information from these other sources.

Methods

Formalism for the eLLG and Its Approximation. The likelihood function used to score MR solutions is based on the Rice distribution (9, 10), modified to account for the effect of measurement errors in the observed intensities (11). For acentric reflections, this is given by

$$p_a(E_e; E_C) = \frac{2E_e}{1 - D_{\mathrm{obs}}^2\sigma_A^2}\exp\left[-\frac{E_e^2 + \left(D_{\mathrm{obs}}\sigma_A E_C\right)^2}{1 - D_{\mathrm{obs}}^2\sigma_A^2}\right] I_0\left(\frac{2D_{\mathrm{obs}}\sigma_A E_e E_C}{1 - D_{\mathrm{obs}}^2\sigma_A^2}\right), \tag{2}$$

where E_e (an effective normalized structure factor amplitude) and D_obs (an estimate of its precision) are derived from the observed intensity and its SE, E_C is the normalized structure factor amplitude calculated from the placed model, σ_A is the fraction of the calculated structure factor that is correlated with the true structure factor, and I_0 is a modified Bessel function of order 0.

Table 1. Progress of single-atom MR

             Aldose reductase (3bcj)              Shisa3 (5m0w)
Atom number  LLGI   TFZ    Atom type  ΔB (Å^2)    LLGI   TFZ    Atom type  ΔB (Å^2)
1            22     4.2    S          −0.9        19     6.1    S          −1.5
2            67     8.8    S          −0.4        57     8.3    S          −1.0
3            154    12.7   P          −0.7        80     6.1    S          −1.8
4            243    12.7   P          −0.2        122    8.3    S          −1.6
5            346    13.3   S          0.3         161    8.3    S          −0.1
6            463    14.4   S          0.1         221    10.1   S          −0.4
7            613    16.5   S          −0.2        297    11.3   S          −1.4
8            691    12.0   P          1.2         -      -      -          -
9            829    15.7   S          0.1         -      -      -          -
10           908    11.8   S          1.3         -      -      -          -

ΔB, refined difference from overall average B factor. Note that the searches become more unambiguous as more well-ordered S or P atoms are placed because, for equal atoms, the total LLGI should be proportional to the square of the number of atoms placed.

Fig. 3. Single-atom model and electron density for aldose reductase. Two single sulfur atoms were placed by MR, then nitrogen atoms were positioned using the log-likelihood-gradient completion algorithm. Atoms forming the sequence Tyr-Pro-Phe and its environment are shown as gray spheres, and the electron density map phased with the atomic model is shown in magenta, contoured at 2.3× the rms electron density. Refined occupancies allow the nitrogen atoms to serve as surrogates for all atom types.

The eLLG is defined as the probability-weighted average of the logarithm of the likelihood ratio, integrated over all pairs of observed and calculated normalized structure factors. The contribution of a single reflection to the eLLG is defined in Eq. 3:

$$\langle \mathrm{LLGI}\rangle_{hkl} = \int_0^\infty\!\!\int_0^\infty p(E_e, E_C)\,\ln\left[\frac{p(E_e; E_C)}{p(E_e)}\right] dE_e\,dE_C, \tag{3a}$$

where, for the acentric case,

$$p_a(E_e, E_C) = \frac{4E_e E_C}{1 - D_{\mathrm{obs}}^2\sigma_A^2}\exp\left(-\frac{E_e^2 + E_C^2}{1 - D_{\mathrm{obs}}^2\sigma_A^2}\right) I_0\left(\frac{2D_{\mathrm{obs}}\sigma_A E_e E_C}{1 - D_{\mathrm{obs}}^2\sigma_A^2}\right) \tag{3b}$$

and

$$p(E_e) = 2E_e\exp\left(-E_e^2\right). \tag{3c}$$

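The Maclaurin series expansion of the integrand of Eq. 3a for the acentric case, to fourth order in D_obs σ_A, is given in Eq. 4:

$$p_a(E_e, E_C)\,\ln\left[\frac{p_a(E_e; E_C)}{p_a(E_e)}\right] \approx a + b\,D_{\mathrm{obs}}^2\sigma_A^2 + c\,D_{\mathrm{obs}}^4\sigma_A^4, \tag{4a}$$

where

$$a = 4e^{-E_e^2 - E_C^2} E_e E_C\left[\ln\left(E_e e^{-E_e^2}\right) - \ln\left(E_C e^{-E_C^2}\right)\right], \tag{4b}$$

$$b = 4e^{-E_e^2 - E_C^2} E_e E_C\left(1 - E_e^2\right)\left(1 - E_C^2\right)\left[1 + \ln\left(E_e e^{-E_e^2}\right) - \ln\left(E_C e^{-E_C^2}\right)\right], \tag{4c}$$

$$c = e^{-E_e^2 - E_C^2} E_e E_C\Big\{6 + 4E_C^2\left(E_C^2 - 3\right) - 4\left(3 - 6E_C^2 + 2E_C^4\right)E_e^2 + \left(4 - 8E_C^2 + 3E_C^4\right)E_e^4 + \left(2 - 4E_e^2 + E_e^4\right)\left(2 - 4E_C^2 + E_C^4\right)\left[\ln\left(E_e e^{-E_e^2}\right) - \ln\left(E_C e^{-E_C^2}\right)\right]\Big\}. \tag{4d}$$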

The double integrals over a and b both evaluate to zero, whereas the double integral over c yields 1/2. Fig. S1 shows that D_obs^4 σ_A^4/2 is an excellent approximation to ⟨LLGI⟩_hkl, especially for the smaller values of D_obs σ_A that would be encountered in difficult structure solutions. Although the forms of the probability distributions for the centric case are different, the same result is achieved by integrating a series expansion, i.e., that ⟨LLGI⟩_hkl is approximately equal to D_obs^4 σ_A^4/2.
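The quality of this approximation for acentric reflections can be checked numerically. The sketch below integrates Eq. 3a directly with a midpoint rule, using the distributions of Eqs. 2, 3b, and 3c, and also accumulates the second moment of the log-likelihood ratio that enters the variance. It is an illustrative stand-in for the symbolic calculations, not the authors' procedure: the grid size, the integration limit, the power-series Bessel function, and all names are choices made here, and the series is only suitable for small to moderate values of D_obs σ_A.

```python
import math


def bessel_i0(x):
    """Modified Bessel function I0 from its power series (moderate x only)."""
    term = total = 1.0
    quarter_x2 = 0.25 * x * x
    k = 0
    while term > 1e-17 * total:
        k += 1
        term *= quarter_x2 / (k * k)
        total += term
    return total


def llgi_moments(d_obs_sigma_a, e_max=5.0, n=200):
    """Midpoint-rule estimates of <LLGI> and <LLGI^2> for one acentric
    reflection, integrating over E_e and E_C from 0 to e_max."""
    s2 = d_obs_sigma_a ** 2
    h = e_max / n
    mean = second = 0.0
    for i in range(n):
        e_e = (i + 0.5) * h
        for j in range(n):
            e_c = (j + 0.5) * h
            i0 = bessel_i0(2.0 * d_obs_sigma_a * e_e * e_c / (1.0 - s2))
            # Joint distribution of E_e and E_C (Eq. 3b).
            joint = (4.0 * e_e * e_c / (1.0 - s2)
                     * math.exp(-(e_e ** 2 + e_c ** 2) / (1.0 - s2)) * i0)
            # ln[p(E_e; E_C) / p(E_e)], simplified from Eqs. 2 and 3c.
            log_ratio = (-math.log(1.0 - s2)
                         - s2 * (e_e ** 2 + e_c ** 2) / (1.0 - s2)
                         + math.log(i0))
            mean += joint * log_ratio
            second += joint * log_ratio ** 2
    return mean * h * h, second * h * h


s = 0.1  # a small value of D_obs * sigma_A, typical of a difficult case
mean, second = llgi_moments(s)
print(mean, s ** 4 / 2)   # numerical <LLGI> versus the approximation
print(second, s ** 4)     # numerical <LLGI^2> versus the leading term
```

For small D_obs σ_A the two printed pairs should agree closely, with the agreement degrading as D_obs σ_A grows.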

The variance of ⟨LLGI⟩_hkl is defined in Eq. 5:

$$\sigma^2\left(\langle \mathrm{LLGI}\rangle_{hkl}\right) = \langle \mathrm{LLGI}^2\rangle_{hkl} - \langle \mathrm{LLGI}\rangle_{hkl}^2, \tag{5a}$$

where

$$\langle \mathrm{LLGI}^2\rangle_{hkl} = \int_0^\infty\!\!\int_0^\infty p(E_e, E_C)\left(\ln\left[\frac{p(E_e; E_C)}{p(E_e)}\right]\right)^2 dE_e\,dE_C. \tag{5b}$$

For the small values of D_obs σ_A that characterize difficult cases, Eq. 5a will be dominated by the first term (as the second term will have a value of the order of D_obs^8 σ_A^8). The Maclaurin series expansion of the integrand of Eq. 5b for the acentric case, to fourth order in D_obs σ_A, is given in Eq. 6:

$$p_a(E_e, E_C)\left(\ln\left[\frac{p_a(E_e; E_C)}{p_a(E_e)}\right]\right)^2 \approx 4e^{-E_e^2 - E_C^2} E_e E_C\left(1 - E_e^2\right)^2\left(1 - E_C^2\right)^2 D_{\mathrm{obs}}^4\sigma_A^4. \tag{6}$$

The double integral over this single term yields simply D_obs^4 σ_A^4. The same result is obtained for the contributions of centric reflections to the variance of the eLLG.

Because the variance of ⟨LLGI⟩_hkl is proportional to ⟨LLGI⟩_hkl itself, the variance of the total eLLG, summed over all reflections, is also proportional to the total eLLG. Therefore, the signal-to-noise ratio for any eLLG is proportional to √eLLG, regardless of how that eLLG is achieved through a combination of model quality, completeness, data quality, and data resolution. Similarly, the value of LLGI obtained in an MR search will indicate the confidence that can be placed in the corresponding solution, regardless of how the LLGI was achieved. Indeed the translation function Z score, which is used as a measure of confidence in an MR solution (10), is seen to be roughly proportional to the square root of the LLGI in the database of MR calculations.
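The square-root dependence can be written out in one line. As an illustrative worked step, let S denote the sum over all reflections of D_obs^4 σ_A^4, and assume (an assumption made here for the sake of the sketch) that the reflections contribute independently, so that their variances add:

$$\mathrm{eLLG} \approx \frac{S}{2}, \qquad \sigma^2(\mathrm{eLLG}) \approx S, \qquad \frac{\mathrm{eLLG}}{\sigma(\mathrm{eLLG})} \approx \frac{S/2}{\sqrt{S}} = \frac{\sqrt{S}}{2} = \sqrt{\frac{\mathrm{eLLG}}{2}}.$$

Under these approximations, quadrupling the eLLG doubles the expected signal-to-noise ratio, whatever mix of model quality, completeness, and resolution produced it.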

Mathematical Derivations. Series approximations and integrals used in the derivation of Eqs. 3–6 were computed with Mathematica (22), which was also used to prepare Fig. 2 and Fig. S1.

Single-Atom MR Protocol. In the single-atom MR protocol, the first step is to carry out translation searches for a specified number of the heavier atoms expected in the structure. For the trials summarized in Table S1, the search looked for four atoms unless fewer sufficiently heavy atoms were expected. In the next step, log-likelihood-gradient completion (described in the next section) was used to complete each of the potential few-atom solutions by adding nitrogen atoms as surrogates for all remaining atom types. Refinement, at each step, of the occupancies of the nitrogen atoms compensates for the difference in scattering power compared with other atom types, such as carbon or oxygen. The log-likelihood-gradient completion continues to convergence, when no further peaks are identified.

The test cases in Table S1 were chosen from the PDB based initially on the criteria that data extending to atomic resolution (1.2 Å or better) were deposited in the form of intensities rather than amplitudes, and that there were no atoms heavier than S in the structure. The initial set was supplemented with several cases at lower than 1-Å resolution in which there are atoms heavier than S, as the success rate was otherwise low in this resolution range. Note that the LLGI per atom after the initial search for individual heavier atoms provides a reasonable diagnostic indication of success. For the cases where the protocol succeeded, LLGI per atom ranged from 21.5 to 272.2 with a mean of 88.3, whereas for cases where the protocol failed, LLGI per atom ranged from 19.0 to 43.5 with a mean of 28.4. The difference in LLGI per atom distributions for the data from Table S1 is illustrated in Fig. S2 by a box plot, generated with BoxPlotR (23).

Log-Likelihood-Gradient Completion. In a log-likelihood-gradient map, peaks show positions where the addition of atoms of a specified type would tend to increase the corresponding likelihood target. The single-atom MR algorithm implemented in Phaser computes a log-likelihood-gradient map corresponding to the MR likelihood function, but does so by using the equivalent functionality required for handling singletons (reflections with only one member of a Friedel pair, hence no anomalous scattering-phase information) in the SAD likelihood target (14). Peak picking is carried out using the same defaults as for log-likelihood-gradient SAD completion, i.e., peaks above 6× the rms value of the map are selected, unless the deepest hole in the map has a greater magnitude. Log-likelihood-gradient completion is iterative, with the addition of atoms increasing the signal in subsequent log-likelihood-gradient maps.
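One plausible reading of the peak-picking rule is that the selection threshold is the larger of 6× the map rms and the depth of the deepest hole. The Python sketch below encodes that reading for a map supplied as a flat list of grid values. It is a hypothetical illustration and not Phaser's code: the function name, the flat-list representation, and the treatment of the deepest hole as a replacement threshold are all assumptions made here.

```python
import math


def select_peaks(peak_heights, map_values, z_cutoff=6.0):
    """Return the peaks that pass the threshold, and the threshold itself.

    The threshold is z_cutoff times the rms of the map, raised to the
    magnitude of the deepest (most negative) hole if that is larger.
    """
    rms = math.sqrt(sum(v * v for v in map_values) / len(map_values))
    deepest_hole = max(0.0, -min(map_values))
    threshold = max(z_cutoff * rms, deepest_hole)
    selected = [height for height in peak_heights if height > threshold]
    return selected, threshold


# Toy map: mostly flat, with one strong peak and one deep hole.
toy_map = [0.0] * 98 + [1.0, -1.0]
print(select_peaks([0.9, 1.5], toy_map))
```

In an iterative completion, a function of this kind would be applied to each new map until it returns no peaks, which corresponds to the convergence criterion described for the protocol.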

Availability. Computer code for the program Phaser, including the eLLG calculations, is available as open source within the Phenix (24) and CCP4 (25) packages.

Fig. 4. Two-sulfur model and phase-extended density for Shisa3. The two sulfur atoms shown as spheres were placed individually by MR, then the program ACORN was used to refine the phase information, giving the map shown in magenta lines, contoured at 0.6× the rms electron density.


ACKNOWLEDGMENTS. We are grateful to the Local Contact at the European Synchrotron Radiation Facility (ESRF) for providing assistance in using beamline ID14-3, as well as Doreen Dobritzsch for help with the data collection. The diffraction data were collected on beamline ID14-3 at the ESRF, Grenoble, France. This research was supported by a Principal Research Fellowship from the Wellcome Trust (082961/Z/07/Z to R.J.R.), and grants from the NIH (Grant P01GM063210 to R.J.R.), the Swedish Research Council (Grant 521-2014-1833 to K.T. and Grant 2007-5648 to B.L.), the Knut and Alice Wallenberg Foundation (K.T.), the Novo Nordisk Foundation (K.T.), and the Röntgen Ångström Cluster (Grant 349-2013-597 to B.L.). The research was facilitated by Wellcome Trust Strategic Award 100140 to the Cambridge Institute for Medical Research.

1. Bragg WL (1913) The structure of some crystals as indicated by their diffraction of X-rays. Proc R Soc A Math Phys Sci 89:248–277.

2. Patterson AL (1934) A Fourier series method for the determination of the components of interatomic distances in crystals. Phys Rev 46:372–376.

3. Hauptman H, Karle J (1953) Solution of the phase problem. I. The Centrosymmetric Crystal, American Crystallographic Association Monograph No. 3 (Edwards Brothers, Ann Arbor, MI).

4. Cork JM (1927) LX. The crystal structure of some of the alums. Lond Edinb Dubl Philosoph Mag J Sci 4:688–698.

5. Perutz MF (1956) Isomorphous replacement and phase determination in non-centrosymmetric space groups. Acta Crystallogr 9:867–873.

6. Bijvoet JM (1954) Structure of optically active compounds in the solid state. Nature 173:888–891.

7. Hendrickson WA (1985) Analysis of protein structure from diffraction measurement at multiple wavelengths. Trans Am Crystallogr Assoc 21:11–21.

8. Rossmann MG, Blow DM (1962) A method of positioning a known molecule in an unknown crystal structure. Acta Crystallogr 15:24–31.

9. Read RJ (2001) Pushing the boundaries of molecular replacement with maximum likelihood. Acta Crystallogr D Biol Crystallogr 57(Pt 10):1373–1382.

10. McCoy AJ, et al. (2007) Phaser crystallographic software. J Appl Cryst 40(Pt 4):658–674.

11. Read RJ, McCoy AJ (2016) A log-likelihood-gain intensity target for crystallographic phasing that accounts for experimental error. Acta Crystallogr D Struct Biol 72(Pt 3):375–387.

12. Rodríguez DD, et al. (2009) Crystallographic ab initio protein structure solution below atomic resolution. Nat Methods 6(9):651–653.

13. Zhao HT, et al. (2008) Unusual binding mode of the 2S4R stereoisomer of the potent aldose reductase cyclic imide inhibitor fidarestat (2S4S) in the 15 K crystal structure of the ternary complex refined at 0.78 A resolution: implications for the inhibition mechanism. J Med Chem 51(5):1478–1481.

14. McCoy AJ, Read RJ (2010) Experimental phasing: Best practice and pitfalls. Acta Crystallogr D Biol Crystallogr 66(Pt 4):458–469.

15. Weeks CM, DeTitta GT, Miller R, Hauptman HA (1993) Application of the minimal principle to peptide structures. Acta Crystallogr D Biol Crystallogr 49(Pt 1):179–181.

16. Sheldrick GM, Hauptman HA, Weeks CM, Miller M, Usón I (2001) Direct methods. International Tables for Macromolecular Crystallography, eds Arnold E, Rossmann M (Kluwer Academic, Dordrecht, The Netherlands), Vol F, pp 333–345.

17. Dodson EJ, Woolfson MM (2009) ACORN2: New developments of the ACORN concept. Acta Crystallogr D Biol Crystallogr 65(Pt 9):881–891.

18. Cowtan K (2010) Recent developments in classical density modification. Acta Crystallogr D Biol Crystallogr 66(Pt 4):470–478.

19. Langer G, Cohen SX, Lamzin VS, Perrakis A (2008) Automated macromolecular model building for X-ray crystallography using ARP/wARP version 7. Nat Protoc 3(7):1171–1179.

20. Foadi J, et al. (2000) A flexible and efficient procedure for the solution and phase refinement of protein structures. Acta Crystallogr D Biol Crystallogr 56(Pt 9):1137–1147.

21. Thorn A, Sheldrick GM (2013) Extending molecular-replacement solutions with SHELXE. Acta Crystallogr D Biol Crystallogr 69(Pt 11):2251–2256.

22. Wolfram Research (2015) Mathematica (Wolfram Research, Champaign, IL), Vol 10.

23. Spitzer M, Wildenhain J, Rappsilber J, Tyers M (2014) BoxPlotR: A web tool for generation of box plots. Nat Methods 11(2):121–122.

24. Adams PD, et al. (2010) PHENIX: A comprehensive Python-based system for macromolecular structure solution. Acta Crystallogr D Biol Crystallogr 66(Pt 2):213–221.

25. Winn MD, et al. (2011) Overview of the CCP4 suite and current developments. Acta Crystallogr D Biol Crystallogr 67(Pt 4):235–242.

McCoy et al. PNAS | April 4, 2017 | vol. 114 | no. 14 | 3641
