Computational Crystallography Newsletter (2010). Volume 1, Part 1.

Computational Crystallography Newsletter

Iotbx.PDB SECONDARY-STRUCTURE SPOTFINDER

Volume ONE, JULY MMX

Table of Contents

• PHENIX News
• Crystallographic meetings
• Expert Advice
• FAQ
• Articles
  • cctbx PDB handling tools
  • Secondary structure restraints in phenix.refine
  • cctbx Spotfinder: a faster software pipeline for crystal positioning
  • On atomic displacement parameters (ADP) and their parameterization in PHENIX
• Short communications
  • Non-periodic torsion angle targets in PHENIX
  • Model building updates & new features

Editor
Nigel W. Moriarty, [email protected]

Contributors
P. D. Adams, P. V. Afonine, V. B. Chen, N. Echols, J. J. Headd, R. W. Grosse-Kunstleve, N. W. Moriarty, D. C. Richardson, J. S. Richardson, N. K. Sauter, T. C. Terwilliger, A. Urzhumtsev

The Computational Crystallography Newsletter (CCN) is regularly distributed electronically via email and the PHENIX website, www.phenix-online.org/newsletter. Feature articles, meeting announcements and reports, information on research or other items of interest to computational crystallographers or crystallographic software users should be submitted to the editor at any time for consideration. Submission of text by email or word-processing files using the CCN templates is requested.

PHENIX News

New releases

A new feature in the PHENIX GUI is designed to compare the refined structures of similar proteins. Structure comparison uses the protein sequences to display the differences between each protein chain loaded, both in a table format and graphically in COOT. Several features of the protein structure can be compared, including rotamers and secondary structure. NCS chains can be overlaid in COOT, edited and saved in the original orientation.

The Java kinemage viewer KiNG (Protein Science 2009, 18:2403-2409) has been incorporated into PHENIX, so that the KiNG jar files and supporting scripts will be part of the distribution package, while relying on the Java virtual machine that is standard on the user's Mac or Linux platform. Without any other setup, the KiNG program in PHENIX allows viewing of macromolecular structures directly from PDB files, and viewing of kinemages (such as multi-criterion kinemages from MolProbity). We are currently working to incorporate validation kinemage creation directly within the PHENIX GUI, allowing even more seamless and complete user evaluation of model quality during the refinement process. Later developments might also provide kinemage displays to help in evaluating other aspects of refinement and model building progress.

In a similar vein to phenix.superpose_pdbs, a new program has been added to PHENIX that is specifically designed for superposing ligands. Superposing protein models is generally done using the Cα positions. Ligands require different algorithms to create atomic correspondences, some of which have been implemented in eLBOW (Moriarty et al., Acta Cryst., D65, 2009, 1074–1080). eLBOW's methods, including graph matching, are used to match the atoms from a single molecule or group of residues to another single molecule. The resulting correspondences can be used in several ways. The molecules can be aligned using the least-squares residual of the distances between matched atoms; the exact position of the corresponding atoms can be transferred; or the PDB attributes such as atom name and residue name can be transferred. A file containing a protein and ligand can be used as input and multiple instances of the ligand can be handled. The program can be accessed by running phenix.superpose_ligands.

A tool developed especially for our industrial consortium members and designed to decrease the time to fit a ligand by using a previously positioned ligand in the same or similar protein model has been added to PHENIX. Guided Ligand Replacement (GLR) matches the protein models, determines the location of the fit ligand, performs an atomic match of the ligands and inserts the overlaid ligand into the apo protein model. As a final step, a real-space refinement is performed on the ligand and the surrounding protein. GLR attempts to match all instances of the guiding ligand in the asymmetric unit.

New features

RESOLVE in PHENIX is now entirely C++ code. This allows the use of the cctbx routines in RESOLVE and speeds up RESOLVE model-building by a factor of nearly two. It also allows caching of resolve libraries, further speeding up RESOLVE in PHENIX.

Ligand fitting in PHENIX now incorporates real-space refinement, yielding greatly improved fits of ligand to density.

Loop libraries have been created for rapid fitting of short loops with phenix.fit_loops.

The ligand geometry restraints editor, REEL, has a new feature that allows searching of the Chemical Components Dictionary available at http://www.wwpdb.org/ccd.html and also distributed with PHENIX. The user can search using various ligand attributes including name and chemical formula. The results can be viewed in the molecular viewing window.

Current PHENIX development

Work is currently underway to improve the support for carbohydrates in macromolecular models. The GlycoCT format (Herget et al., Carbohydr. Res., 2008, 343:2162-2171), which can specify a carbohydrate sequence, has been chosen as the base format for file based input. Other formats will be supported. A GUI input is also being developed as well as tools to handle models already containing the carbohydrates. It is expected that ligand fitting will be expanded to improve the fitting of carbohydrates, focusing on the saccharide chains that commonly occur in protein models. Branching will be supported from the inception.

Crystallographic meetings and workshops

2010 Australasian Crystallography School, 17-24 July, 2010

The 2010 Australasian Crystallography School is being held at the University of Queensland, Brisbane, Australia from the 17th to the 24th of July. Pavel Afonine will be teaching a module on macromolecular refinement.

GRC – Diffraction Methods in Crystallography, 18-23 July 2010

Several of the PHENIX developers will be attending the Gordon Conference entitled "Diffraction Methods in Crystallography" to be held at Bates College, Lewiston, Maine from the 18th to the 23rd of July. There will be posters on the various aspects of PHENIX including the graphical user interface, validation and ligands. Drop by during the poster sessions to speak to the developers.

ACA – 2010 Annual Meeting, 24-29 July, 2010

The annual meeting of the American Crystallographic Association will be held in Chicago, Illinois from the 24th to the 29th of July.

22nd PHENIX Workshop, 11-13 October, 2010

The semi-annual PHENIX workshop is being held in Cambridge, England from the 11th to the 13th of October. Developers will be presenting the latest programs and feature releases on Monday the 11th. All interested parties in the area are invited to attend the Monday sessions.

5th PHENIX User's Workshop, 14 October, 2010

Following the PHENIX Workshop in Cambridge, U.K., a user's workshop will provide teaching and hands-on experience with PHENIX for the novice and expert.

Paris Workshop, 15 October, 2010

A contingent of PHENIX developers will be at Institut Pasteur, Paris, France on the 15th of October to participate in a workshop for the local users. The workshop is being organized by Claudine Mayer and Deshmukh Gopaul and is funded by the "Groupe Thematique Biologie" of the French Crystallography Association.

Expert advice

Geometry restraint ESD

Many users enquire about the details of the geometry restraints used in macromolecular refinement programs. Refmac, COOT and PHENIX use a CIF format; an example of the restraints for water is reproduced after the FAQ below.

Much of the information in the file is straightforward, including atom names, elements and Cartesian coordinates. Even the bonding information is mostly transparent. The atoms involved, the type of bond and the ideal bond lengths are easily discerned. The last number on the line, however, is not so simple. The ESD is an Estimated Standard Deviation that allows a refinement program to estimate the gradients of the force on the two atoms in the bond using a parabola. The units of the ESD are the same as those of the ideal value, so for bond lengths the unit is Ångstrom. The larger the ESD value, the more flexible the restraint will be in the refinement. Conversely, reducing the number will restrain the internal coordinate to remain closer to the ideal value.

There is a limit to the amount one can reduce the ESD of a restraint. In the extreme, setting the ESD value to zero will remove the restraint from the refinement. Also, reducing the ESD to a value that is orders of magnitude smaller than the other ESD values in the same class will greatly reduce the weight of the other restraints relative to the highly restrained value and may even adversely affect the weight of the x-ray term.

Generally, it is best to have the ESD values approximately the same as the expected accuracy of the experiment. In the case of the bond lengths, an ESD of 0.01 Ångstrom could be considered a reasonable lower bound. An angle ESD of about 1 degree and a torsion ESD of 5 degrees are also reasonable as lower limits.
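The effect of the ESD on the strength of a restraint can be illustrated with a short sketch. It assumes the usual harmonic (parabolic) form in which each restraint contributes weight × (ideal − model)² with weight = 1/ESD²; this form, the function name bond_restraint and the sample model distance of 1.110 Å are assumptions made for illustration and are not taken from the PHENIX source.

```python
def bond_restraint(model_dist, ideal_dist, esd):
    """Residual and gradient of a harmonic restraint, assuming weight = 1/esd**2.

    Illustrative helper only; not part of PHENIX or cctbx.
    """
    if esd == 0:
        # Mirrors the convention described in the text: a zero ESD
        # removes the restraint from the refinement.
        return 0.0, 0.0
    weight = 1.0 / esd ** 2
    delta = ideal_dist - model_dist
    residual = weight * delta ** 2
    gradient = -2.0 * weight * delta  # derivative with respect to model_dist
    return residual, gradient


# The same 0.03 A deviation from the ideal O-H distance of the water example,
# evaluated with the tabulated ESD (0.020) and with a ten times smaller one.
for esd in (0.020, 0.002):
    residual, gradient = bond_restraint(1.110, 1.080, esd)
    print(f"esd={esd:.3f}  residual={residual:.2f}  gradient={gradient:.1f}")
```

Under this assumption, reducing the ESD by a factor of ten increases the contribution of the restraint a hundredfold, which is why one unusually small ESD can dominate the other restraints in its class.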

FAQ

How do I create restraints for my ligand in PHENIX?

There are a number of ways. If you know the ligand code corresponding to the ligand in the PDB databases such as the Chemical Components, then you can use eLBOW thus:

phenix.elbow --chemical-component=ATP

The resulting files, ATP.pdb and ATP.cif, will have the data such as atom names consistent with the Chemical Components database.

Example restraints for water in CIF format (see "Geometry restraint ESD" under Expert advice):

data_comp_list
loop_
_chem_comp.id
_chem_comp.three_letter_code
_chem_comp.name
_chem_comp.group
_chem_comp.number_atoms_all
_chem_comp.number_atoms_nh
_chem_comp.desc_level
HOH HOH 'water ' ligand 3 1 .
data_comp_HOH
loop_
_chem_comp_atom.comp_id
_chem_comp_atom.atom_id
_chem_comp_atom.type_symbol
_chem_comp_atom.type_energy
_chem_comp_atom.partial_charge
_chem_comp_atom.x
_chem_comp_atom.y
_chem_comp_atom.z
HOH O  O OH2  . -0.2400 0.0000 -0.3394
HOH H1 H HOH2 .  0.8400 0.0000 -0.3394
HOH H2 H HOH2 . -0.6000 0.0000  0.6788
loop_
_chem_comp_bond.comp_id
_chem_comp_bond.atom_id_1
_chem_comp_bond.atom_id_2
_chem_comp_bond.type
_chem_comp_bond.value_dist
_chem_comp_bond.value_dist_esd
HOH H1 O single 1.080 0.020
HOH H2 O single 1.080 0.020
loop_
_chem_comp_angle.comp_id
_chem_comp_angle.atom_id_1
_chem_comp_angle.atom_id_2
_chem_comp_angle.atom_id_3
_chem_comp_angle.value_angle
_chem_comp_angle.value_angle_esd
HOH H2 O H1 109.47 3.000


ARTICLES

Computational Crystallography Newsletter (2010). 1, 4–11

cctbx PDB handling tools

Ralf W. Grosse-Kunstleve (a) and Paul D. Adams (a,b)

(a) Lawrence Berkeley National Laboratory, Berkeley, CA 94720
(b) Department of Bioengineering, University of California at Berkeley, Berkeley, CA 94720

Correspondence email: RWGrosse-[email protected]

Introduction

The PDB format is the predominant working format for atomic parameters (coordinates, occupancies, displacement parameters, etc.) in macromolecular crystallography. Many small-molecule programs also support this format. The PDB format specifications are available at http://www.pdb.org/. Technically, the format is very simple, so a vast number of parsers exist in scientific packages. This article is about the PDB handling tools included in the Computational Crystallography Toolbox (cctbx, http://cctbx.sourceforge.net/), the open-source component of the PHENIX project (http://www.phenix-online.org/).

The evolution of the cctbx PDB handling tools has gone through three main stages spread out over several years. A simple parser implemented in Python has been available for a long time. In many cases Python's runtime performance is sufficient for interactive processing of PDB files, but it can be limiting for large files, or for repeatedly traversing the entire PDB database. This has prompted us to implement a fast C++ parser that is described in Grosse-Kunstleve et al. (2006). However, initially the fast cctbx PDB handling tools only supported "read-only" access. Writing of PDB files was supported only at a very basic level. This shortcoming has been removed and the current cctbx version provides comprehensive tools for reading, manipulating, and writing PDB files. These tools are available from both Python and C++, under the iotbx.pdb module.

This article presents an overview of the main types in the iotbx.pdb module, considerations that led to the design, and related important nomenclature. It is not a tutorial for using the iotbx.pdb facilities. For this, refer to http://cctbx.sourceforge.net/sbgrid2008/tutorial.html. See also http://cci.lbl.gov/hybrid_36/ which describes iotbx.pdb facilities for handling very large models.

Real-world PDB files

The simplicity of the PDB format is only superficial and, in the general case, stops after the initial parsing level. The structure of the PDB file implies a hierarchy of objects. A first approximation of this hierarchical view is:

model
  chain
    residue
      atom

This is only an approximation because of a feature that is easily overlooked at first: the "altloc" (official PDB nomenclature) column 17 of PDB ATOM records, specifying "alternative location" identifiers for atoms in alternative conformations. As it turned out, about 90% of the development time invested into iotbx.pdb was in some form related to alternative conformations. Our goal was to provide robust tools that work even for the most unusual (but valid) cases, since this is a vital characteristic of any automated system. The main difficulties encountered while pursuing this goal were:

• Chains with conformers that have different sequences
• Chains with duplicate resseq+icode (residue sequence number + insertion code)
• Conformers interleaved or separated
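To make the role of the altloc column concrete, the following minimal sketch slices one ATOM record into the fields that define the hierarchy. It uses the fixed column positions of the PDB format (altloc in column 17, i.e. index 16 when counting from zero); the function name parse_atom_record is illustrative, and this pure-Python sketch is not the iotbx.pdb parser.

```python
def parse_atom_record(line):
    """Slice one ATOM/HETATM record into hierarchy-defining fields.

    Illustrative helper, not the iotbx.pdb implementation.
    """
    line = line.ljust(80)  # tolerate records with trailing blanks stripped
    return {
        "record": line[0:6].strip(),
        "serial": line[6:11].strip(),
        "name": line[12:16],      # columns 13-16
        "altloc": line[16],       # column 17
        "resname": line[17:20],   # columns 18-20
        "chain_id": line[21],     # column 22
        "resseq": line[22:26],    # columns 23-26
        "icode": line[26],        # column 27
        "xyz": tuple(float(line[i:i + 8]) for i in (30, 38, 46)),
        "occupancy": float(line[54:60]),
    }


record = "ATOM    244  N  CPHE A  11      20.226  13.044  34.556  0.15  6.35           N"
fields = parse_atom_record(record)
print(repr(fields["altloc"]), fields["resname"], repr(fields["resseq"] + fields["icode"]))
```

Parsing at this level is the easy part; the difficulties listed above only appear when the records are grouped into chains, residues and conformers.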


These difficulties are best illustrated with examples. An old version of PDB entry 2IZQ (from Aug 2008) includes a chain with conformers that have different sequences, the residue with resseq 11 in chain A, atoms 220 through 283:

HEADER    ANTIBIOTIC                              26-JUL-06   2IZQ
ATOM    220  N  ATRP A  11      20.498  12.832  34.558  0.50  6.03           N
................ATRP A  11 ...................................................
ATOM    243  HH2ATRP A  11      15.522   9.077  38.323  0.50 10.40           H
ATOM    244  N  CPHE A  11      20.226  13.044  34.556  0.15  6.35           N
................CPHE A  11 ...................................................
ATOM    254  CZ CPHE A  11      16.789   9.396  34.594  0.15 10.98           C
ATOM    255  N  BTYR A  11      20.553  12.751  34.549  0.35  5.21           N
................BTYR A  11 ...................................................
ATOM    261  CD1BTYR A  11      18.548  10.134  34.268  0.35  9.45           C
ATOM    262  HB2CPHE A  11      21.221  10.536  34.146  0.15  7.21           H
ATOM    263  CD2BTYR A  11      18.463  10.012  36.681  0.35  9.08           C
ATOM    264  HB3CPHE A  11      21.198  10.093  35.647  0.15  7.21           H
ATOM    265  CE1BTYR A  11      17.195   9.960  34.223  0.35 10.76           C
ATOM    266  HD1CPHE A  11      19.394   9.937  32.837  0.15 10.53           H
ATOM    267  CE2BTYR A  11      17.100   9.826  36.693  0.35 11.29           C
ATOM    268  HD2CPHE A  11      18.873  10.410  36.828  0.15  9.24           H
ATOM    269  CZ BTYR A  11      16.546   9.812  35.432  0.35 11.90           C
ATOM    270  HE1CPHE A  11      17.206   9.172  32.650  0.15 12.52           H
ATOM    271  OH BTYR A  11      15.178   9.650  35.313  0.35 19.29           O
ATOM    272  HE2CPHE A  11      16.661   9.708  36.588  0.15 11.13           H
ATOM    273  HZ CPHE A  11      15.908   9.110  34.509  0.15 13.18           H
ATOM    274  H  BTYR A  11      20.634  12.539  33.720  0.35  6.25           H
................BTYR A  11 ...................................................
ATOM    282  HH BTYR A  11      14.978   9.587  34.520  0.35 28.94           H

The original atom numbering does not have gaps. Here we have omitted blocks of atoms with constant resname and resseq+icode to save space.

As of Jun 8 2010, there are 74 files with mixed residue names in the PDB, i.e. only about 0.1% of the files. However, these files are perfectly valid and a PDB processing library is suitable as a component of an automated system only if it handles them sensibly.

An old version of PDB entry 1ZEH (Aug 2008) includes a chain with consecutive duplicate resseq+icode, atoms 878 through 894:

HEADER    HORMONE                                 01-MAY-98   1ZEH
HETATM  878  C1 ACRS     5      12.880  14.021   1.197  0.50 33.23           C
HETATM  879  C1 BCRS     5      12.880  14.007   1.210  0.50 34.27           C
................ACRS     5 ...................................................
HETATM  892  O1 ACRS     5      11.973  14.116   2.233  0.50 34.24           O
HETATM  893  O1 BCRS     5      11.973  14.107   2.248  0.50 35.28           O
HETATM  894  O   HOH     5      -0.924  19.122  -8.629  1.00 11.73           O
HETATM  895  O   HOH     6     -19.752  11.918   3.524  1.00 13.44           O
HETATM  896  O   HOH     7      -1.169  17.936  -6.103  1.00 12.89           O

To a human inspecting the old 1ZEH entry, it is of course immediately obvious that the water with resseq 5 is not related to the previous residue with resseq 5. However, arriving at this conclusion with an automatic procedure is not entirely straightforward. The human brings in the knowledge that water atoms without hydrogen are not covalently connected to other atoms. This is very detailed, specialized knowledge. Introducing such heuristics into an automatic procedure is likely to lead to surprises in some situations and is best avoided, if possible.


In the PDB archive, alternative conformers of a residue always appear consecutively. However, as mentioned in the introduction, the PDB format is also the predominant working format. Some programs produce files with conformers separated in this way (this file was provided to us by a user):

ATOM   1716  N  ALEU   190      28.628   4.549  20.230  0.70  3.78           N
ATOM   1717  CA ALEU   190      27.606   5.007  19.274  0.70  3.71           C
ATOM   1718  CB ALEU   190      26.715   3.852  18.800  0.70  4.15           C
ATOM   1719  CG ALEU   190      25.758   4.277  17.672  0.70  4.34           C
ATOM   1829  N  BLEU   190      28.428   4.746  20.343  0.30  5.13           N
ATOM   1830  CA BLEU   190      27.378   5.229  19.418  0.30  4.89           C
ATOM   1831  CB BLEU   190      26.539   4.062  18.892  0.30  4.88           C
ATOM   1832  CG BLEU   190      25.427   4.359  17.878  0.30  5.95           C
ATOM   1724  N  ATHR   191      27.350   7.274  20.124  0.70  3.35           N
ATOM   1725  CA ATHR   191      26.814   8.243  21.048  0.70  3.27           C
ATOM   1726  CB ATHR   191      27.925   9.229  21.468  0.70  3.73           C
ATOM   1727  OG1ATHR   191      28.519   9.718  20.259  0.70  5.22           O
ATOM   1728  CG2ATHR   191      28.924   8.567  22.345  0.70  4.21           C
ATOM   1729  C  ATHR   191      25.587   8.983  20.559  0.70  3.53           C
ATOM   1730  O  ATHR   191      24.872   9.566  21.383  0.70  3.93           O
... residues 191 through 203 not shown
ATOM   1828  O  AGLY   203       8.948  14.861  23.401  0.70  5.84           O
ATOM   1833  CD1BLEU   190      26.014   4.711  16.521  0.30  6.21           C
ATOM   1835  C  BLEU   190      26.506   6.219  20.135  0.30  4.99           C
ATOM   1836  O  BLEU   190      25.418   5.939  20.669  0.30  5.91           O
ATOM   1721  CD2ALEU   190      24.674   3.225  17.536  0.70  5.31           C
ATOM   1722  C  ALEU   190      26.781   6.055  20.023  0.70  3.36           C
ATOM   1723  O  ALEU   190      25.693   5.796  20.563  0.70  3.68           O
ATOM   8722  C  DLEU   190      26.781   6.055  20.023  0.70  3.36           C
ATOM   8723  O  DLEU   190      25.693   5.796  20.563  0.70  3.68           O
ATOM   9722  C  CLEU   190      26.781   6.055  20.023  0.70  3.36           C
ATOM   9723  O  CLEU   190      25.693   5.796  20.563  0.70  3.68           O

In this file, conformers A and B of residue 190 appear consecutively, but conformers C and D appear only after conformers A and B of all residues 191 through 203. While this is not the most intuitive way of ordering the residues in a file, it is still considered valid because the original intention is clear. Since it was our goal to develop a comprehensive library suitable for automatically processing files produced by any popular program, we found it important to correctly handle non-consecutive conformers.

iotbx.pdb.hierarchy

When developing the procedure capable of handling the variety of real-world situations shown above, we strove to keep the underlying rule-set as simple as possible and to avoid highly specific heuristics (e.g. "water is never covalently bound"). Complex rules imply complex implementations, are difficult to explain and understand, tend to lead to surprises, and are therefore likely to be rejected by the community. With this and the real-world situations in mind, we arrived at the following primary organization of the PDB hierarchy:

Primary PDB hierarchy:

model(s)
  id
  chain(s)
    id
    residue_group(s)
      resseq
      icode
      atom_group(s)
        resname
        altloc
        atom(s)


In this presentation the "(s)" indicates a list of objects of the given type, i.e. a hierarchy contains a list of models, each model has an "id" (a simple string) and holds a list of chains, etc.

Comparing with the "first approximation hierarchy" above, the residue type is replaced with two new types: residue_group and atom_group. These types had to be introduced to cover all the real-world cases shown above. The residue_group and atom_group types are new and unusual. Before we go into the details of these types, it will be helpful to consider the bigger picture by introducing the alternative secondary view of the PDB hierarchy:

Secondary view of PDB hierarchy:

model(s)
  id
  chain(s)
    id
    conformer(s)
      altloc
      residue(s)
        resname
        resseq
        icode
        atom(s)

This organization is probably more intuitive at first. The first two levels (model, chain) are exactly the same as in the primary hierarchy. Each chain holds a list of conformer objects, which are characterized by the altloc character from column 17 in the PDB ATOM records. A conformer is understood to be a complete copy of a chain, but usually two conformers share some or even most atoms. A residue is characterized by a unique resname and the resseq+icode.

The secondary view of the hierarchy evolved in the context of generating geometry restraints for refinement (e.g. bond, angle, and dihedral restraints), where this organization is most useful. It is also the organization introduced in Grosse-Kunstleve et al. (2006), where it was actually the primary organization. While working with the conformer-residue organization, we found that it is difficult to manipulate a hierarchy (e.g. add or delete atoms) in obvious ways. Finally, while developing the automatic generation of constrained occupancy groups for alternative conformations, the conformer-residue organization proved to be unworkable. The main difficulty is that the relative order of residues with alternative conformations is lost in the conformer-residue organization; it is only given indirectly by interleaved residues with shared atoms, if they exist. As convenient as the conformer-residue organization is for the generation of restraints, it is a hindrance for other purposes.

The names for the new types in the primary hierarchy were chosen not to collide with the secondary view. We could have used "conformer" and "residue" again, just reversed, but there would be the big surprise that one residue has different resnames. To send the signal "this is not what you usually think of as a residue", we decided to use residue_group as the type name. A residue group holds a list of atom_group objects. All atoms in an atom group have the same resname and altloc. Therefore "resname_altloc_group" would have been another plausible name, but we favored atom_group since it is more concise and better conveys what is the main content.
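The nesting of the primary hierarchy can be mirrored with a few plain data classes. The sketch below is only a model of the organization described above, written with Python dataclasses; the class and attribute names follow the text, but these are not the iotbx.pdb types, and the three atoms are a hand-picked subset of the 2IZQ fragment shown earlier.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Atom:
    name: str


@dataclass
class AtomGroup:
    altloc: str
    resname: str
    atoms: List[Atom] = field(default_factory=list)


@dataclass
class ResidueGroup:
    resseq: str
    icode: str
    atom_groups: List[AtomGroup] = field(default_factory=list)


@dataclass
class Chain:
    id: str
    residue_groups: List[ResidueGroup] = field(default_factory=list)


@dataclass
class Model:
    id: str
    chains: List[Chain] = field(default_factory=list)


# Residue 11 of chain A in the old 2IZQ entry: one residue group holding
# three atom groups that differ in both altloc and resname.
rg = ResidueGroup(resseq="  11", icode=" ", atom_groups=[
    AtomGroup("A", "TRP", [Atom(" N  ")]),
    AtomGroup("C", "PHE", [Atom(" N  ")]),
    AtomGroup("B", "TYR", [Atom(" N  ")]),
])
model = Model(id="", chains=[Chain(id="A", residue_groups=[rg])])
print([(ag.altloc, ag.resname) for ag in rg.atom_groups])
```

The example shows why the type is not simply called "residue": a single residue_group may carry several resnames, one per atom_group.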

Detection of residue groups and atom groups

When constructing the primary hierarchy given a PDB file, the processing algorithm has to detect models, chains, residue groups, atom groups and atoms. Most steps are fairly straightforward, but none of the steps is actually completely trivial. For example, what to do if TER or ENDMDL cards are missing? What if residue sequence numbers are not consecutive? The ribosome community widely uses segid instead of chainid (even though the segid column is officially deprecated by the PDB). What if a file contains both chainid and segid? Documenting all our answers in full detail is beyond the scope of this article (and would be more distracting than helpful anyway because the source code is openly available). Therefore we concentrate on the most important rules for the detection of residue groups and atom groups.

The central conflict we had to resolve was:

• In order to handle non-consecutive conformers, we have to use the resseq+icode to find and group related residues.
• However, we cannot use the resseq+icode alone as a guide, because we also want to handle chains with duplicate resseq+icode (consecutive or non-consecutive).

This led to a two-stage procedure. In the first stage:

• A residue group is given by a block of consecutive ATOM or HETATM records with identical resseq+icode columns (SIGATM, ANISOU, and SIGUIJ records may be interleaved), e.g.

ATOM    234  H  ATRP A  11      20.540  12.567  33.741  0.50  7.24           H
ATOM    235  HA ATRP A  11      20.771  12.306  36.485  0.50  6.28           H
ATOM    244  N  CPHE A  11      20.226  13.044  34.556  0.15  6.35           N
ATOM    245  CA CPHE A  11      20.950  12.135  35.430  0.15  5.92           C

However, there is an important pre-condition:

• Unless a sub-block of ATOM records with identical resname+resseq+icode columns contains a main-conformer atom (almost "blank altloc"), e.g.

HEADER    ANTIBIOTIC RESISTANCE                   07-MAY-97   1AJQ
CRYST1   52.120   65.080   76.300 100.20 111.44 105.81 P 1           1
HETATM 6097 CA    CA     1       5.676  34.115  52.446  1.00 18.50          CA
HETATM 6100  C2  SPA     1      11.860  36.159  33.853  1.00 14.30           C
HETATM 6107  C6  SPA     1      13.085  36.522  34.644  1.00 17.34           C

In this case the sub-block is assigned to a separate residue group.

Within each residue group, all atoms are grouped by altloc+resname and assigned to atom groups. The order of the atoms in a residue group does not affect the assignment to atom groups.

After all atom records are assigned to residue groups and atom groups,

• residue groups with identical resseq+icode
• that do not contain main-conformer atoms

are merged in the second processing stage. While two residue groups are merged, atom groups with identical altloc+resname from the two sources are also merged.

Note that the grouping steps in this process may change the order of the atoms. However, our implementation preserves the relative order of the atoms as much as possible. I.e. in the hierarchy, resseq+icode appear in the original "first seen" order, and similarly for altloc+resname within a residue group.
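The two-stage procedure can be sketched in a few lines of pure Python. The sketch below is a simplified reading of the rules above and not the iotbx.pdb implementation: atoms are plain dictionaries, the key "resid" stands for resseq+icode, chains are ignored, and any atom with a blank altloc is treated as a main-conformer atom (the second condition of the nomenclature section is left out). All function names are hypothetical.

```python
from itertools import groupby


def has_main_conf(atoms):
    # Simplification: a blank altloc is taken to mean "main-conformer atom".
    return any(a["altloc"] == " " for a in atoms)


def detect_residue_groups(atoms):
    """Return a list of (resid, atoms) residue groups in first-seen order."""
    # Stage 1: consecutive blocks with identical resid form a residue group,
    # unless a resname sub-block contains a main-conformer atom; such a
    # sub-block becomes a residue group of its own.
    stage1 = []
    for resid, block in groupby(atoms, key=lambda a: a["resid"]):
        pending = []  # sub-blocks without main-conformer atoms
        for _, sub in groupby(block, key=lambda a: a["resname"]):
            sub = list(sub)
            if has_main_conf(sub):
                if pending:
                    stage1.append((resid, pending))
                    pending = []
                stage1.append((resid, sub))
            else:
                pending.extend(sub)
        if pending:
            stage1.append((resid, pending))
    # Stage 2: merge residue groups with identical resid that do not
    # contain main-conformer atoms, keeping the first-seen position.
    merged, first_seen = [], {}
    for resid, group in stage1:
        if has_main_conf(group):
            merged.append((resid, group))
        elif resid in first_seen:
            merged[first_seen[resid]][1].extend(group)
        else:
            first_seen[resid] = len(merged)
            merged.append((resid, group))
    return merged


def group_atoms(members):
    """Group atom names by (altloc, resname) in first-seen order."""
    out = {}
    for a in members:
        out.setdefault((a["altloc"], a["resname"]), []).append(a["name"])
    return out


def atom(resid, altloc, resname, name):
    return {"resid": resid, "altloc": altloc, "resname": resname, "name": name}


# Reduced versions of the separated-conformer file and of the 1ZEH fragment.
records = [
    atom("190", "A", "LEU", "N"), atom("190", "B", "LEU", "N"),
    atom("191", "A", "THR", "N"),
    atom("190", "B", "LEU", "C"), atom("190", "A", "LEU", "C"),
    atom("5", "A", "CRS", "C1"), atom("5", "B", "CRS", "C1"),
    atom("5", " ", "HOH", "O"),
]
groups = detect_residue_groups(records)
for resid, members in groups:
    print(resid, group_atoms(members))
```

In this reduced example the two separated blocks of residue 190 end up in one residue group, while the water with resid 5 stays separate from the CRS ligand with the same resid because it consists of main-conformer atoms.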

Construction of secondary view of hierarchy

The secondary view of the hierarchy is constructed trivially from the primary hierarchy objects. For each chain, the complete list of altloc characters is determined in a first pass. In a second pass, a conformer object is created for each altloc, and a second loop over the chain assigns the atoms to each conformer. Main-conformer atoms are assigned to all conformers, atoms in alternative conformations only to the corresponding conformer. Because of the difficulties alluded to earlier, the conformer and residue objects in the secondary view are "read-only". I.e. all manipulations, such as addition or removal of residues, have to be performed on the primary hierarchy. Re-constructing the secondary view after the primary hierarchy has been changed is very fast (fractions of seconds even for the largest files, e.g. 0.22 s for PDB entry 1HTQ with almost one million atoms).

The iotbx/examples/pdb_hierarchy.py script shows how to construct the primary hierarchy from a PDB file, and how to obtain the secondary view.
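The two passes described above can be written down compactly. The following sketch is not taken from iotbx/examples/pdb_hierarchy.py; it models a chain as a list of (resid, atoms) pairs with atoms as dictionaries, treats every atom with a blank altloc as a main-conformer atom, and takes the resname of a residue from its first selected atom. These simplifications and the function name build_conformers are assumptions for illustration.

```python
def build_conformers(chain):
    """Derive the conformer view from a list of (resid, atoms) residue groups."""
    # First pass: collect the altloc characters in first-seen order.
    altlocs = []
    for _, atoms in chain:
        for a in atoms:
            if a["altloc"] != " " and a["altloc"] not in altlocs:
                altlocs.append(a["altloc"])
    if not altlocs:
        altlocs = [" "]  # a chain without alternative conformations
    # Second pass: one conformer per altloc; blank-altloc atoms go to all
    # conformers, other atoms only to the conformer with the same altloc.
    conformers = {}
    for altloc in altlocs:
        residues = []
        for resid, atoms in chain:
            selected = [a for a in atoms if a["altloc"] in (" ", altloc)]
            if selected:
                names = [a["name"] for a in selected]
                residues.append((resid, selected[0]["resname"], names))
        conformers[altloc] = residues
    return conformers


chain = [
    ("10", [{"altloc": " ", "resname": "GLY", "name": "N"},
            {"altloc": " ", "resname": "GLY", "name": "CA"}]),
    ("11", [{"altloc": "A", "resname": "TRP", "name": "N"},
            {"altloc": "C", "resname": "PHE", "name": "N"},
            {"altloc": "B", "resname": "TYR", "name": "N"}]),
]
for altloc, residues in build_conformers(chain).items():
    print(altloc, residues)
```

Because the view is rebuilt from the primary objects on every call, there is nothing to keep in sync after an edit, which matches the "read-only" character of the secondary view.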

Nomenclature related to alternative conformations

The iotbx.pdb module uses the following nomenclature to describe various aspects of alternative conformations:

• Main conf. atom: an atom with
  o a blank altloc character
  o and no other atom with the same name+resname (but different altloc) in the residue group.

  This second condition is needed for cases like this:

HEADER    HYDROLASE                               12-MAR-04   1SO7
ATOM   2460  CG1 VAL A 325     -23.284  97.713  15.815  0.66 21.74           C
ATOM   2461  CG1AVAL A 325     -23.010  97.616  18.295  0.66 22.88           C
ATOM   2462  CG2BVAL A 325     -24.819  96.373  17.146  0.66 22.57           C

• Alt. conf. atom: not main conf.
• Conformer: a complete chain with main conf. atoms and alt. conf. atoms with a specific altloc.

  Note for completeness: it is also possible to obtain conformers of an individual residue group. However, conceptually this is best viewed as a shortcut for first obtaining the conformers of a chain, and then finding the residue of interest in each conformer, ignoring all other residues.

• Residue: complete residue in a conformer with main conf. atoms and alt. conf. atoms.

Residues are classified as follows:

• pure main conf.: all main conf. atoms
• pure alt. conf.: all alt. conf. atoms
• proper alt. conf.: both main conf. atoms and alt. conf. atoms, all alt. conf. atoms have a non-blank altloc column
• improper alt. conf.: both main conf. atoms and alt. conf. atoms, one or more alt. conf. atoms have a blank altloc column (as shown in the 1SO7 fragment above).
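These definitions translate directly into two small predicates. The sketch below follows the wording of the list above; the function names are hypothetical, atoms are plain dictionaries, and the backbone N atom added to the 1SO7 fragment is an assumed main-conformer atom included only so that the residue has both kinds of atoms.

```python
def is_main_conf(atom, residue_group):
    """Blank altloc and no same-named atom with a different altloc."""
    if atom["altloc"] != " ":
        return False
    return not any(
        other["name"] == atom["name"]
        and other["resname"] == atom["resname"]
        and other["altloc"] != atom["altloc"]
        for other in residue_group
    )


def classify_residue(residue, residue_group):
    """Classify the atoms of one residue of a conformer."""
    alt = [a for a in residue if not is_main_conf(a, residue_group)]
    main = [a for a in residue if is_main_conf(a, residue_group)]
    if not alt:
        return "pure main conf."
    if not main:
        return "pure alt. conf."
    if all(a["altloc"] != " " for a in alt):
        return "proper alt. conf."
    return "improper alt. conf."


# VAL 325 of 1SO7 as in the fragment above, plus an assumed backbone N.
val325 = [
    {"name": "N", "altloc": " ", "resname": "VAL"},
    {"name": "CG1", "altloc": " ", "resname": "VAL"},
    {"name": "CG1", "altloc": "A", "resname": "VAL"},
    {"name": "CG2", "altloc": "B", "resname": "VAL"},
]
conformer_a = [a for a in val325 if a["altloc"] in (" ", "A")]
print(classify_residue(conformer_a, val325))
```

The blank-altloc CG1 atom is classified as an alt. conf. atom because a second CG1 with altloc A exists in the same residue group, so the residue falls into the improper category.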

ErrorsandwarningsThetools iniotbx.pdbaredesignedtobeastolerantasreasonablypossiblewhenprocessinginputPDBfiles.E.g.,asofJun82010,all65802filesinthePDBarchivecanbeprocessedwithoutgeneratingexceptions. However, to assist users and developers in creating PDB files without ambiguities, thehierarchyobjectprovidesmethods for flagging likelyproblemsaserrorsandwarnings. It isup to theapplicationhowtoreacttothediagnostics.

The phenix.pdb.hierarchy command can be used to quickly obtain a summary of the hierarchy in a PDB file and some diagnostics. This example command highlights all main features:

phenix.pdb.hierarchy pdb1jxw.ent.gz

Computational Crystallography Newsletter (2010). 1, 4–11


Output:

file pdb1jxw.ent.gz
  total number of:
    models:      1
    chains:      2
    alt. conf.:  5
    residues:   49
    atoms:     786
    anisou:      0
  number of atom element+charge types: 5
  histogram of atom element+charge frequency:
    " H " 383
    " C " 261
    " O " 77
    " N " 59
    " S " 6
  residue name classes:
    "common_amino_acid" 48
    "other" 1
  number of chain ids: 2
  histogram of chain id frequency:
    " " 1
    "A" 1
  number of alt. conf. ids: 3
  histogram of alt. conf. id frequency:
    "A" 2
    "B" 2
    "C" 1
  residue alt. conf. situations:
    pure main conf.: 32
    pure alt. conf.: 3
    proper alt. conf.: 14
    improper alt. conf.: 0
  chains with mix of proper and improper alt. conf.: 0
  number of residue names: 16
  histogram of residue name frequency:
    "CYS" 6
    "THR" 6
    "ALA" 5
    "ILE" 5
    "PRO" 5
    "GLY" 4
    "ASN" 3
    "SER" 3
    "ARG" 2
    "LEU" 2
    "TYR" 2
    "VAL" 2
    "ASP" 1
    "EOH" 1 other
    "GLU" 1
    "PHE" 1
  ### WARNING: consecutive residue_groups with same resid ###
  number of consecutive residue groups with same resid: 2
    residue group:
      "ATOM 378 N PRO A 22 .*. N "
      ... 12 atoms not shown
      "ATOM 391 HD3APRO A 22 .*. H "
    next residue group:
      "ATOM 392 CB BSER A 22 .*. C "
      ... 6 atoms not shown
      "ATOM 399 HB3CSER A 22 .*. H "
    ------------------------------------------
    residue group:
      "ATOM 432 N LEU A 25 .*. N "
      ... 18 atoms not shown
      "ATOM 439 CD1CLEU A 25 .*. C "
    next residue group:
      "ATOM 452 CG1BILE A 25 .*. C "
      ... 13 atoms not shown
      "ATOM 466 HD13CILE A 25 .*. H "


The diagnostics are mainly intended for PDB working files, but we have tested the iotbx.pdb module by processing the entire PDB archive. (This took about 3700 CPU seconds, but using 40 CPUs the last job finished after only 277 seconds. Disk and network I/O is rate-limiting in this case, not the performance of the PDB handling library.) The results are summarized in the following table:

Total number of .ent files: 65802 (2010 Jun 8)
  56873 WARNING: duplicate chain id
      3 WARNING: consecutive residue_groups with same resid
      2 ERROR: duplicate atom labels
      1 ERROR: improper alt. conf.
      0 ERROR: duplicate model id
      0 ERROR: residue group with multiple resnames using same altloc

About 86% of the PDB entries re-use the same chain id for multiple chains (which are either separated by other chains or TER cards). In many cases, the re-used chain id is the blank character, which is clearly a minor issue. However, we flag this situation to assist people in producing new PDB files with unambiguous chain ids.

The next item in the list, "consecutive residue_groups with same resid" (where resid=resseq+icode), was mentioned before. This situation is best avoided to minimize the chances of mis-interpreting the PDB files.

"duplicate atom labels" pose a serious practical problem (therefore flagged as "ERROR") since it is impossible to uniquely select atoms with duplicate labels, for example via an atom selection syntax as used in many programs (CNS, PyMOL, PHENIX, VMD, etc.) or via PDB records in the connectivity annotation section (LINK, SSBOND, CISPEP). Atom serial numbers are not suitable for this purpose since many programs do not preserve them.

Only one PDB entry (1JRT) includes "improper alt. conf." as introduced in the previous section.

No PDB entries exhibit the two other potential problems diagnosed by the iotbx.pdb module. MODEL ids are unique throughout the PDB archive (as an aside, all MODEL records have a matching ENDMDL record). Finally, in all 74 files with mixed residue names (i.e. conformers with different sequences), there is exactly one residue name for a given altloc.

Acknowledgments

We gratefully acknowledge the financial support of NIH/NIGMS under grant number P01GM063210. Our work was supported in part by the US Department of Energy under Contract No. DE-AC02-05CH11231.

References

Grosse-Kunstleve, R.W., Zwart, P.H., Afonine, P.V., Ioerger, T.R., Adams, P.D. (2006). Newsletter of the IUCr Commission on Crystallographic Computing, 7, 92-105.


Secondary structure restraints in phenix.refine

Nathaniel Echolsᵃ, Jeffrey J. Headdᵃ, Pavel Afonineᵃ, and Paul D. Adamsᵃ,ᵇ

ᵃ Lawrence Berkeley National Laboratory, Berkeley, CA 94720
ᵇ Department of Bioengineering, University of California at Berkeley, Berkeley, CA 94720

Correspondence email: [email protected]

Introduction

We have implemented simple, automatic restraints for common secondary structure elements in proteins and nucleic acids, which can help reduce over-fitting at low-to-moderate resolution and preserve the secondary structure geometry that can be distorted due to the low resolution. Internally, these are simply distance restraints between hydrogen-bonding atoms in helices, sheets, or nucleic acid base pairs, with or without explicit hydrogens. The current method has the advantage of being fast, easily reduced to relatively simple input parameters, and useful for improving poor-quality regions of structure where the bonding may not be automatically recognized.

General description and syntax

A new command for identifying secondary structure and generating the necessary parameters, phenix.secondary_structure_restraints, was introduced in version 1.6.1, and most of the functionality is also accessible through phenix.refine itself. In the absence of user-defined atom selections, or if the parameter find_automatically=True is set, the program will look first in the header of the input PDB file(s) and parse any HELIX and SHEET records (Figure 1). Note that when running from phenix.refine, a different scope applies, which requires the use of secondary_structure.input.find_automatically=True. If no HELIX or SHEET records are found, the open-source program KSDSSP (UCSF Computer Graphics Laboratory; Kabsch & Sander 1983) is used to generate these records based on the input geometry. Because the PDB format uses fixed columns (Bernstein et al. 1977, Berman et al. 2000) and is not easily edited, the records are converted to an intermediate format using the same parameter syntax and atom selection language as other programs in PHENIX (Figure 2; Adams et al. 2010). Example of use:

phenix.secondary_structure_restraints model.pdb > ss.eff

Although the PDB format specification provides for ten classes of helix, only the three most commonly found in natural proteins are processed: alpha (forming bonds between the carbonyl oxygen of residue n and the amide nitrogen of residue n+4), 3₁₀ (n+3), and pi (n+5). As the alpha form is by far the most common, this is the default helix type. Beta strands have only two types, parallel and antiparallel, which can coexist in a single sheet (Figure 3). The parameter blocks for sheets are considerably more complicated because of the order-dependency of individual strands, and the requirement of explicit annotation of the start and end of hydrogen bonding between strand pairs. The parameters may be freely edited to add or remove secondary structure groups, but we recommend minimal changes to the sheets, due to the complexity of the definitions.

HELIX    8   8 ASP A  181  ARG A  191  1                                  11
HELIX    9   9 SER A  192  ASP A  194  5                                   3
HELIX   10  10 SER A  195  GLN A  209  1                                  15
SHEET    1   A 5 ARG A  13  ASP A  14  0
SHEET    2   A 5 LEU A  27  SER A  30 -1  O  ARG A  29   N  ARG A  13
SHEET    3   A 5 VAL A 156  HIS A 159  1  O  VAL A 156   N  PHE A  28
SHEET    4   A 5 ASP A  51  ASP A  54  1  N  ALA A  51   O  LEU A 157
SHEET    5   A 5 ASP A  74  LEU A  77  1  O  HIS A  74   N  VAL A  52

Figure 1. Beta PDB syntax for secondary structure records (excerpted from PDB ID 1ywf; Grundner et al. 2005).
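The residue offsets quoted above for the three supported helix types can be turned into a list of backbone hydrogen bonds with a few lines of Python. This is only a sketch of the pairing rule, not the code used by phenix.secondary_structure_restraints. The function name is hypothetical, and it assumes consecutive residue numbering with no gaps or insertion codes and that both partners must lie inside the annotated helix range.

```python
# Offset k such that the carbonyl O of residue n bonds to the amide N of
# residue n+k, as stated in the text.
HELIX_OFFSETS = {"alpha": 4, "3_10": 3, "pi": 5}


def helix_h_bond_pairs(first_resseq, last_resseq, helix_type="alpha"):
    """Return (acceptor_resseq, donor_resseq) pairs for O(n)...N(n+k)."""
    k = HELIX_OFFSETS[helix_type]
    return [(n, n + k) for n in range(first_resseq, last_resseq - k + 1)]


# Helix 8 of Figure 1 spans residues 181-191 and is of the alpha type.
pairs = helix_h_bond_pairs(181, 191, "alpha")
for o_res, n_res in pairs:
    print("O of residue %d -> N of residue %d" % (o_res, n_res))
```

For the eleven-residue helix this yields seven bonds, from 181/185 through 187/191.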

Computational Crystallography Newsletter (2010). 1, 12–17


From within phenix.refine (Afonine et al. 2005), use of the restraints may be activated with the parameter main.secondary_structure_restraints=True. If no helix or sheet atom selections are included in the input parameters, the automatic identification procedure will be run; otherwise, the existing parameters are used without modification. Once secondary structure is assigned it is maintained for the rest of the refinement, but future versions will probably add the option to re-annotate the structure after each macro-cycle. The log file (and console output) will contain information about the overall secondary structure content, if any.

Hydrogen bond parameterization

Hydrogen bonds are modeled as simple harmonic restraints; we have not attempted to use a more physically rigorous hydrogen-bonding potential (for example, Fabiola et al. 2002). For maximum flexibility, either explicit or implicit hydrogen bonds are supported; the latter use a longer distance restraint between heavy atoms. The method for converting atom selections into hydrogen bonds relies on several assumptions:

• The order of residues in the input PDB file exactly corresponds to the order in the protein itself

• Each secondary structure element is continuous with no missing residues

• All residues are complete, with no missing atoms

Unless the PDB file has undergone significant manual editing, these conditions are unlikely to be violated. The main exception is the treatment of hydrogen atoms, which are handled inconsistently by different crystallography applications. The program will first examine the PDB file to determine whether hydrogens are present, in which case it defaults to explicit hydrogen bonds. However, if newly built residues are missing hydrogen atoms, they will not be properly restrained. This can be remedied by running phenix.ready_set prior to refinement to add a complete set of hydrogens to the structure. You can also force phenix.refine to ignore any hydrogens present and use the implicit bonds by passing the parameter substitute_n_for_h=True.

refinement.secondary_structure {
  helix {
    selection = "chain 'A' and resseq 181:191"
  }
  helix {
    selection = "chain 'A' and resseq 192:194"
    helix_type = alpha pi *3_10 unknown
  }
  helix {
    selection = "chain 'A' and resseq 195:209"
  }
  sheet {
    first_strand = "chain 'A' and resseq 13:14"
    strand {
      selection = "chain 'A' and resseq 27:30"
      sense = parallel *antiparallel unknown
      bond_start_current = "chain 'A' and resseq 29"
      bond_start_previous = "chain 'A' and resseq 13"
    }
    strand {
      selection = "chain 'A' and resseq 156:159"
      sense = *parallel antiparallel unknown
      bond_start_current = "chain 'A' and resseq 156"
      bond_start_previous = "chain 'A' and resseq 28"
    }
    strand {
      selection = "chain 'A' and resseq 51:54"
      sense = *parallel antiparallel unknown
      bond_start_current = "chain 'A' and resseq 51"
      bond_start_previous = "chain 'A' and resseq 157"
    }
    strand {
      selection = "chain 'A' and resseq 74:77"
      sense = *parallel antiparallel unknown
      bond_start_current = "chain 'A' and resseq 74"
      bond_start_previous = "chain 'A' and resseq 52"
    }
  }
}

Figure 2. Equivalent phenix.refine parameters to Figure 1.

Figure 3. Beta sheet specified by the parameters in Figure 2, as rendered in PyMOL. (Side chains have been omitted for clarity.) Note that not all residues annotated as belonging to strands actually form hydrogen bonds; the actual bonding pattern is dictated by the bond_start_current and bond_start_previous parameters for each strand.
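To make the correspondence between a fixed-column HELIX record (Figure 1) and a helix parameter block (Figure 2) concrete, the following Python sketch extracts the chain, residue range and helix class from one record and builds the matching atom selection string. It is an illustration only, not the conversion code in PHENIX. The function name is hypothetical, the column positions follow the published PDB format, and helix classes other than 1, 3 and 5 are simply reported as "unknown" here.

```python
# PDB helix classes mapped to the helix_type names used in Figure 2
# (1 = right-handed alpha, 3 = right-handed pi, 5 = right-handed 3_10).
HELIX_CLASS_NAMES = {1: "alpha", 3: "pi", 5: "3_10"}


def helix_record_to_selection(record):
    """Convert one fixed-column HELIX record to (selection, helix_type)."""
    chain_id = record[19]               # column 20: initial chain ID
    first = int(record[21:25])          # columns 22-25: first residue number
    last = int(record[33:37])           # columns 34-37: last residue number
    helix_class = int(record[38:40])    # columns 39-40: helix class
    selection = "chain '%s' and resseq %d:%d" % (chain_id, first, last)
    return selection, HELIX_CLASS_NAMES.get(helix_class, "unknown")


# Helix 9 from Figure 1 (trailing length field omitted).
record = "HELIX    9   9 SER A  192  ASP A  194  5"
print(helix_record_to_selection(record))
```

Because the fields are read by column rather than by splitting on whitespace, the sketch only works on records that preserve the original fixed-column layout.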

All settings related to the configuration of hydrogen bonding are located in the h_bond_restraints block¹. Currently, the distances used are as follows (N-O distances taken from Fabiola et al. 2002):

• explicit (H-O): 1.975 Å
• H-O outlier cutoff: 2.5 Å
• implicit (N-O): 2.9 Å
• N-O outlier cutoff: 3.5 Å

Both explicit and implicit bonds default to a sigma (standard deviation, essentially an inverse weight) of 0.05, and a slack of 0. Increasing the sigma reduces the strength of the bond restraints; increasing the slack allows the bond length to move freely within a small range (+/- slack in either direction) before the restraint is applied. Initial experiments do not indicate any advantage to using a non-zero slack, but lower sigma values (e.g. 0.02) are beneficial in some cases, especially where explicit hydrogens are used. You may also override the global defaults and set separate sigma and slack values for each secondary structure group.
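The interplay of sigma and slack can be illustrated with a small Python sketch of a harmonic restraint with a flat-bottomed region. The functional form is an assumption based on the description above (weight = 1/sigma², no penalty within +/- slack of the ideal distance); the function name is hypothetical and this is not the restraint code used by phenix.refine.

```python
def h_bond_residual(distance, ideal, sigma=0.05, slack=0.0):
    """Harmonic penalty with an optional penalty-free window of +/- slack."""
    weight = 1.0 / sigma ** 2          # sigma acts as an inverse weight
    delta = abs(distance - ideal)
    excess = max(0.0, delta - slack)   # deviation beyond the slack window
    return weight * excess ** 2


# An implicit N-O bond that is 0.1 A longer than the 2.9 A target.
print(h_bond_residual(3.0, 2.9))                # default sigma of 0.05
print(h_bond_residual(3.0, 2.9, sigma=0.02))    # lower sigma, stronger restraint
print(h_bond_residual(3.0, 2.9, slack=0.2))     # inside the slack window
```

With these inputs the penalty grows roughly sixfold when sigma is lowered from 0.05 to 0.02, and vanishes when the deviation falls inside the slack window.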

Once the hydrogen bonds have been identified, a short summary will be printed out to the logfile/console:

109 hydrogen bonds defined.
Distribution of hydrogen bond lengths without filtering:
  2.7259 - 2.8381: 7
  2.8381 - 2.9504: 38
  2.9504 - 3.0626: 38
  3.0626 - 3.1748: 11
  3.1748 - 3.2870: 7
  3.2870 - 3.3993: 5
  3.3993 - 3.5115: 0
  3.5115 - 3.6237: 0
  3.6237 - 3.7360: 1
  3.7360 - 3.8482: 2

If annotation errors are present, the calculated bond lengths may cover a much wider range. For this reason, all outliers above a specified cutoff are filtered out before building the geometry restraints. Passing the parameter remove_outliers=False will override this, and in some instances may actually be beneficial, but we recommend visually inspecting the hydrogen bonds first to confirm that all are chemically appropriate. (This can be done in PyMOL; see instructions below.) After filtering, the output shown above will be repeated for the final bond list.
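The filtering step and the binned summary can be mimicked in a few lines of Python. This sketch assumes that filtering simply discards bonds longer than the outlier cutoff listed earlier (3.5 Å for implicit N-O bonds) and that the histogram uses equal-width bins. The function names and the distances are hypothetical and serve only to show the mechanics.

```python
def filter_outliers(lengths, cutoff=3.5):
    """Drop bond lengths above the outlier cutoff (3.5 A for implicit N-O)."""
    return [d for d in lengths if d <= cutoff]


def histogram(lengths, n_bins=10):
    """Return (low, high, count) tuples over equal-width bins."""
    lo, hi = min(lengths), max(lengths)
    width = (hi - lo) / n_bins or 1.0   # avoid zero width if all values agree
    counts = [0] * n_bins
    for d in lengths:
        i = min(int((d - lo) / width), n_bins - 1)  # top edge joins last bin
        counts[i] += 1
    return [(lo + i * width, lo + (i + 1) * width, c)
            for i, c in enumerate(counts)]


# Hypothetical N-O distances in Angstroms, for illustration only.
lengths = [2.73, 2.85, 2.91, 2.95, 3.02, 3.10, 3.30, 3.74, 3.85]
kept = filter_outliers(lengths)
for low, high, count in histogram(kept, n_bins=5):
    print("%.4f - %.4f: %d" % (low, high, count))
```

Printing the histogram once before and once after filtering reproduces the two-stage report described above.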

¹ These parameters are used by both phenix.secondary_structure_restraints and phenix.refine; for the latter, the full scope name is actually refinement.secondary_structure.h_bond_restraints, but individual parameters passed as command line arguments should be recognized automatically.

Nucleic acid base pairing

Version 1.6.2 adds partial support for hydrogen bond restraints between RNA base pairs (Figure 4), while more complete support is available in later nightly builds. Like CNS (Brünger et al. 1998), these are specified individually rather than as contiguous ranges. The automatic detection and hydrogen bond parameterization follow a similar procedure to that used for proteins, except that the all-atom contact analysis program PROBE (Word et al. 1999) is used to identify explicit hydrogen bonds, and a simplified list of base pairs is derived from these results. Only canonical Watson-Crick base pairs and GU pairs are supported at this time. Future versions will incorporate other known base pairings, classified according to geometry as suggested by Leontis et al. (2002), versus the older Saenger (1984) Roman numeral nomenclature, which is no longer complete.

For users who require a more complete set of hydrogen bonds, we recommend the server provided by the Noller Lab at UCSC (http://rna.ucsc.edu/pdbrestraints/; Laurberg et al. 2008), which analyzes a PDB file and produces output in the custom bond format for phenix.refine. Note that we do not currently have any equivalent to "stacking restraints" available in PHENIX; additionally, although individual bases already have tight planarity restraints, the base pairs are not forced to be planar.

Known limitations

The primary obstacle to generating these restraints is the difficulty of obtaining correct and complete annotations. The records in the PDB are often misleading and/or wrong; unfortunately, the annotations provided by KSDSSP are also occasionally incorrect. Common errors include two or more independent but adjacent helices being annotated as a single alpha helix, despite 90º bends or a 3₁₀ helix in between. Both KSDSSP and PROBE are also very sensitive to input geometry, which means that poorly refined and/or manually built structures will not have all secondary structure elements detected automatically. For these situations, careful manual annotation is essential to maximize the usefulness of the added restraints. Work is underway on a graphical editor for picking secondary structures in a model (see also http://pymolwiki.org/index.php/ResDe for another approach).

At low resolution, a more serious problem is the lack of additional restraints on backbone conformation outside of regular secondary structure elements. When refining against data significantly worse than 3.0 Å, Ramachandran outliers frequently comprise several percent of protein residues (0.2% is the limit recommended by MolProbity). Although phenix.refine does not currently offer Ramachandran restraints, version 1.6.2 introduces the separate option of restraining the model conformation to that of a high-resolution reference structure.

Additional caveats:

• No attempt is made to reconcile secondary structure restraints with NCS restraints. As always, care should be taken to exclude regions of genuine difference from NCS atom selections.

• Only bond length is restrained; angles may move freely (although in practice, other geometry restraints should limit the range of angles compatible with the specified bond length).

• Support for PDB files containing non-blank insertion codes is currently limited, especially if an insertion code occurs in the middle of a secondary structure element.

• The distribution of oxygen-nitrogen distances in beta sheets appears to be very slightly bimodal, probably due to the difference in geometry between parallel and antiparallel sheets. Increasing the slack may compensate for this, but it isn't clear whether this is necessary. Future versions may use more intelligent distance parameters.

• Secondary structures which cross symmetry-related elements are not supported; this may occasionally happen with palindromic DNA or RNA helices. However, the custom bond syntax for phenix.refine may be used to manually specify hydrogen bonds.

• For extremely large structures (e.g. ribosomes), where the number of chains exceeds the number of available single-character codes, we encourage the use of two-character codes, which are also supported by Coot. However, if you prefer to use the deprecated segID, the parameters preserve_protein_segid=True and preserve_nucleic_acid_segid=True will include the segIDs in the output atom selections when identifying new secondary structure elements. (You do not need these parameters to run phenix.refine with the atom selections containing segIDs.)

• Detection of nucleic acid base pairs will only be run if RNA or DNA chains are identified in the PDB file. Some highly-modified RNA molecules such as tRNA may fail this procedure; for these cases, the parameter force_nucleic_acids=True provides a workaround.

Figure 4. Automatically detected hydrogen bonds in a mixed protein-RNA structure, the signal recognition particle (PDB ID 1hq1; Batey et al. 2001).

Other uses

Two other output formats are available for the secondary structure restraints, albeit with more limited support: PyMOL commands for visualization (DeLano 2002), and distance restraints for REFMAC (Murshudov et al. 1997). A PDB file is required as input, with pre-defined atom selections optional. The same procedures described above for outlier filtering and atom selection are applied, so the final output should be identical to what would be obtained running phenix.refine with the same parameters.

• PyMOL format:

  phenix.secondary_structure_restraints model.pdb \
    [restraints.eff] format=pymol > h_bonds.pml

• Example of output:

  dist (chain 'A' and resi 9 and name N), (chain 'A' and resi 6 and name O)

• REFMAC format:

  phenix.secondary_structure_restraints model.pdb \
    [restraints.eff] format=refmac > restraints.com

• Example of output:

  exte dist first chain A residue 37 atom O \
    second chain A residue 41 atom N value 2.900 sigma 0.05

Restraints for REFMAC can be included as part of the input keywords; for PyMOL, once the PDB file is loaded distance objects can be created by selecting "Run..." from the "File" menu and selecting the .pml script, or by typing (for example) "@/path/to/script/h_bonds.pml" at the PyMOL> prompt. (For clarity, you may want to enter the command "hide labels" after running the script.)
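For readers who want to produce such lines from their own list of hydrogen bonds, the two output formats shown above are easy to generate with string formatting. The Python sketch below reproduces the layout of the example lines; the function names are hypothetical, only backbone N...O bonds within one chain are handled, and the REFMAC line is written without the line continuation used in the example.

```python
def pymol_dist(chain, donor_resi, acceptor_resi):
    """Format one backbone N...O bond as a PyMOL 'dist' command."""
    return ("dist (chain '%s' and resi %d and name N), "
            "(chain '%s' and resi %d and name O)"
            % (chain, donor_resi, chain, acceptor_resi))


def refmac_exte(chain, acceptor_resi, donor_resi, value=2.9, sigma=0.05):
    """Format the same kind of bond as a REFMAC external distance restraint."""
    return ("exte dist first chain %s residue %d atom O "
            "second chain %s residue %d atom N value %.3f sigma %.2f"
            % (chain, acceptor_resi, chain, donor_resi, value, sigma))


print(pymol_dist("A", 9, 6))
print(refmac_exte("A", 37, 41))
```

The default value and sigma correspond to the implicit N-O settings listed in the hydrogen bond parameterization section.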

Additional resources

• KSDSSP (included in PHENIX distribution): http://www.cgl.ucsf.edu/Overview/software.html#ksdssp
• ResDe (custom bond parameter editor for PyMOL): http://pymolwiki.org/index.php/ResDe
• Base pairing restraints generator (Noller Lab): http://rna.ucsc.edu/pdbrestraints/


Acknowledgements

We thank Peter Grey, Bradley Hintze, Dirk Kostrewa, and Francis Reyes for scientific discussions, Anton Vila-Sanjurjo for suggestions and code, Timm Maier for testing and feedback, and Francis Reyes and Allyn Schoeffler for sharing unpublished data. We gratefully acknowledge the PHENIX Industrial Consortium and NIH/NIGMS (grant P01GM063210) for financial support.

References

Adams PD, Afonine PV, Bunkóczi G, Chen VB, Davis IW, Echols N, Headd JJ, Hung LW, Kapral GJ, Grosse-Kunstleve RW, McCoy AJ, Moriarty NW, Oeffner R, Read RJ, Richardson DC, Richardson JS, Terwilliger TC, Zwart PH. (2010) PHENIX: a comprehensive Python-based system for macromolecular structure solution. Acta Crystallogr D Biol Crystallogr. 66:213-21. PMID: 20124702

Afonine PV, Grosse-Kunstleve RW, Adams PD. (2005) CCP4 Newsletter. 42, contribution 8.

Batey RT, Sagar MB, Doudna JA. (2001) Structural and energetic analysis of RNA recognition by a universally conserved protein from the signal recognition particle. J Mol Biol. 307:229-46. PMID: 11243816

Berman HM, Westbrook J, Feng Z, Gilliland G, Bhat TN, Weissig H, Shindyalov IN, Bourne PE. (2000) The Protein Data Bank. Nucleic Acids Res. 28:235-42. PMID: 10592235

Bernstein FC, Koetzle TF, Williams GJB, Meyer EF Jr, Brice MD, Rodgers JR, Kennard O, Shimanouchi T, Tasumi M. (1977) The Protein Data Bank: a computer-based archival file for macromolecular structures. J Mol Biol. 112:535-42. PMID: 875032

Brünger AT, Adams PD, Clore GM, DeLano WL, Gros P, Grosse-Kunstleve RW, Jiang JS, Kuszewski J, Nilges M, Pannu NS, Read RJ, Rice LM, Simonson T, Warren GL. (1998) Crystallography & NMR system: A new software suite for macromolecular structure determination. Acta Crystallogr D Biol Crystallogr. 54:905-21. PMID: 9757107

DeLano WL. (2008) The PyMOL Molecular Graphics System. DeLano Scientific LLC, Palo Alto, CA, USA. http://pymol.org

Fabiola F, Bertram R, Korostelev A, Chapman MS. (2002) An improved hydrogen bond potential: impact on medium resolution protein structures. Protein Sci. 11:1415-23. PMID: 12021440

Grundner C, Ng HL, Alber T. (2005) Mycobacterium tuberculosis protein tyrosine phosphatase PtpB structure reveals a diverged fold and a buried active site. Structure 13:1625-34. PMID: 16271885

Kabsch W, Sander C. (1983) Dictionary of protein secondary structure: pattern recognition of hydrogen-bonded and geometrical features. Biopolymers 22:2577-637. PMID: 6667333

Laurberg M, Asahara H, Korostelev A, Zhu J, Trakhanov S, Noller HF. (2008) Structural basis for translation termination on the 70S ribosome. Nature 454:852-7. PMID: 18596689

Leontis NB, Stombaugh J, Westhof E. (2002) The non-Watson-Crick base pairs and their associated isostericity matrices. Nucleic Acids Res. 30:3497-3531. PMID: 12177293

Murshudov GN, Vagin AA, Dodson EJ. (1997) Refinement of macromolecular structures by the maximum-likelihood method. Acta Crystallogr D Biol Crystallogr. 53:240-55. PMID: 15299926

Saenger W. (1984) Principles of Nucleic Acid Structure. Springer-Verlag, New York, NY.

Word JM, Lovell SC, LaBean TH, Taylor HC, Zalis ME, Presley BK, Richardson JS, Richardson DC. (1999) Visualizing and quantifying molecular goodness-of-fit: small-probe contact dots with explicit hydrogen atoms. J Mol Biol. 285:1711-33. PMID: 9917407


cctbx Spotfinder: a faster software pipeline for crystal positioning

Nicholas K. Sauterᵃ

ᵃ Lawrence Berkeley National Laboratory, Berkeley, CA 94720

Correspondence email: [email protected]

Synopsis

This article documents the program distl.signal_strength, used to locate candidate Bragg spots on X-ray diffraction images for macromolecular crystallography. While standalone analysis requires about 3 seconds/image on typical Linux systems, an order of magnitude increase in the overall throughput can be achieved using concurrent multiprocessing within a client/server implementation. The program is thus suitable for the rapid location of the best-diffracting positions within the crystal sample using a low-dose X-ray probe.

Introduction

The development of microfocused X-ray beams at synchrotron facility endstations has made it possible to obtain higher-signal, lower mosaicity datasets from small crystal samples (Fischetti et al., 2009). With small crystal dimensions of order 5 µm (matched to the diameter of present microbeams), it may be necessary to examine numerous specimens in order to assemble a complete dataset, making it highly desirable to automate the process of centering each sample in the microbeam. Various approaches have been proposed (Pothineni et al., 2006; Song et al., 2007), and it has been possible to automatically center the sample-holding loop (or other device) using video microscopy. However, it has been more difficult to visually identify small crystals within the loop, thus requiring the more robust approach of scanning the sample (translating it with respect to the microbeam), so as to collect a low-dose diffraction image at each translational position. It has been noted that such X-ray based autocentering may take up to 10 minutes for the smallest crystals (Song et al., 2007), which presumably require very fine rastering to locate. This outcome could be improved dramatically by the use of a photon-counting pixel array detector such as the Pilatus-6M that supports a framing rate of 10 Hz. An accompanying challenge is to write software that can quantify the measured signal at a reasonably high turnaround rate, so that the diffraction analysis can keep pace with the rapid data acquisition needed for fine-grained coverage of the sample.

The standalone spotfinder program and client/server pair described below are meant to provide a toolbox for rapid diffraction analysis within the beamline computing environment. The software architecture is the same as described for the cctbx package (http://cctbx.sf.net; Grosse-Kunstleve et al., 2002), with a high-level Python scripting interface that is meant to encourage the reuse and adaptation of code as the problems change. One example is that new detector hardware types can be readily supported as they are introduced.

Installation

The spotfinder program is bundled and distributed with the packages LABELIT and PHENIX, and can therefore be downloaded from either Web site (http://cci.lbl.gov/labelit/ or http://www.phenix-online.org/). Newest-version PHENIX installers are released on an almost daily basis, while LABELIT releases currently cycle every few months. Follow the installation instructions, and source the appropriate setup file (given in the message printed by the installer) to place the command-line dispatchers on path.

Command line reference

Use the command distl.signal_strength to produce a quick summary of the diffraction characteristics from a single diffraction exposure:

Computational Crystallography Newsletter (2010). 1, 18–23


Usage:

distl.signal_strength image_filename [parameter=value [parameter2=value2 ...]]

Example:

distl.signal_strength lysozyme_001.img distl.res.outer=2.0

Optional command line parameters change the program operation as follows (default values are shown):

distl.res.outer=None

If specified, this is the outer (high) resolution limit to be used for the diffraction analysis, in Ångstroms. This option is useful for two reasons. Firstly, if a large number of images from the same crystal specimen are to be examined (in order to locate the best-diffracting position), then placing a uniform limit on the resolution range permits the number of observed Bragg spots to act as a good proxy of the diffraction strength. Secondly, areas on the image that are outside the outer resolution limit are not analyzed, potentially producing a substantial speedup in program throughput. [Note: areas on the stored image that are not part of the active detector are already excluded from analysis. Examples are the corner areas on circular image plates (such as the Mar), the inactive pixel stripes between fiber-optic tapers on CCD detectors (such as the ADSC), and the inactive pixel stripes between modules of the Pilatus detector.] Camera parameters such as the sample-to-detector distance and swing-arm angle are taken from the file header, to calculate the resolution at each pixel.
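The per-pixel resolution mentioned above follows from Bragg's law. The Python sketch below shows the calculation for the simplest geometry: a flat detector normal to the beam, i.e. a zero swing-arm angle, which is an assumption made here and not a description of how spotfinder treats the general case. The function names are hypothetical.

```python
import math


def pixel_resolution(radius_mm, distance_mm, wavelength_angstrom):
    """Resolution d (Angstroms) at a pixel radius_mm from the beam center.

    Assumes a flat detector normal to the beam (zero swing angle).
    """
    if radius_mm == 0:
        return math.inf                      # direct beam position
    two_theta = math.atan2(radius_mm, distance_mm)
    return wavelength_angstrom / (2.0 * math.sin(two_theta / 2.0))


def within_outer_limit(radius_mm, distance_mm, wavelength_angstrom,
                       res_outer=None):
    """True if the pixel should be analyzed given an optional outer limit."""
    if res_outer is None:
        return True
    d = pixel_resolution(radius_mm, distance_mm, wavelength_angstrom)
    return d >= res_outer


# A pixel 100 mm from the beam center at 100 mm distance, 1.0 A wavelength.
print(round(pixel_resolution(100.0, 100.0, 1.0), 3))
print(within_outer_limit(100.0, 100.0, 1.0, res_outer=2.0))
```

In this example the pixel lies at about 1.31 Å, beyond a 2.0 Å outer limit, so it would be skipped.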

distl.res.inner=None

If specified, this is the inner (low) resolution limit to be used for the diffraction analysis, in Ångstroms. Excluding the low-resolution Bragg spots may be a potential workaround if there is severe beam leakage around the beamstop masquerading as a Bragg signal. However, the built-in heuristics that filter out unusually-large signal areas and very intense signals will normally exclude such artifacts anyway. The inner resolution cutoff has no effect on overall program throughput, as the filter is applied after the image is analyzed. This program option is only recommended for unusual situations.

distl.minimum_signal_height=[1.5 for CCDs; 2.5 for pixel-array detectors]

This parameter determines whether a given pixel is classified as background or signal. Active pixels are classified within non-overlapping 100×100-pixel tiles. For each tile, the best-fit plane is chosen to represent the background level. Pixel heights are distributed above and below this plane with a normal distribution whose standard deviation σ is readily calculated. Pixels higher than minimum_signal_height×σ above the background are classified as signal. Two successive rounds (now with 50×50-pixel tiles) are employed to remove the signal pixels from the background level. With CCD detectors the longstanding default of 1.5σ has been a highly successful compromise between increased sensitivity (favoring a lower cutoff) and better noise discrimination (favoring a higher cutoff; any cutoff lower than 1.25σ produces numerous false Bragg spots). For pixel-array detectors, which lack any significant point-spread function, 2.5σ gives better results and is now the pixel-array default.

Images from very long unit cells (viruses, ribosomes) potentially have close spots that run into each other, without being separated down to the baseline. The heuristic rules in distl will reject these close signals, and in severe cases this will interfere with the ability to index the lattice. The solution is to raise the cutoff level to as high as 5σ, thus correctly separating the close spots. It is therefore important to consider raising the minimum_signal_height if the crystals have a large unit cell. At present there is no automatic way to do this; the program has no way to know the cell length ahead of time. In the future it will be beneficial to add some preliminary noise analysis to provide more sophisticated guidance for spot detection.
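A single round of the tile classification described above can be sketched in plain Python. This is an illustration under simplifying assumptions, not the distl implementation: the tile is a complete rectangular grid with no masked pixels, only one pass is made (the successive rounds that remove signal pixels from the background estimate are omitted), and the function name `classify_tile` is hypothetical.

```python
import math


def classify_tile(tile, minimum_signal_height=1.5):
    """Return a grid of booleans, True where a pixel is classified as signal."""
    ny, nx = len(tile), len(tile[0])
    # Centered coordinates make the least-squares plane fit separable
    # on a complete rectangular grid.
    xs = [i - (nx - 1) / 2.0 for i in range(nx)]
    ys = [j - (ny - 1) / 2.0 for j in range(ny)]
    n = nx * ny
    c = sum(sum(row) for row in tile) / float(n)
    sxx = ny * sum(x * x for x in xs)
    syy = nx * sum(y * y for y in ys)
    a = sum(xs[i] * tile[j][i] for j in range(ny) for i in range(nx)) / sxx
    b = sum(ys[j] * tile[j][i] for j in range(ny) for i in range(nx)) / syy
    # Height of each pixel above (or below) the best-fit background plane.
    resid = [[tile[j][i] - (a * xs[i] + b * ys[j] + c) for i in range(nx)]
             for j in range(ny)]
    sigma = math.sqrt(sum(r * r for row in resid for r in row) / n)
    if sigma == 0.0:
        return [[False] * nx for _ in range(ny)]
    return [[r > minimum_signal_height * sigma for r in row] for row in resid]


# Hypothetical 10x10 tile: sloping background, +/-1 noise, one strong pixel.
tile = [[100 + 2 * i + 3 * j + (1 if (i + j) % 2 == 0 else -1)
         for i in range(10)] for j in range(10)]
tile[5][5] += 50
mask = classify_tile(tile, minimum_signal_height=1.5)
print(sum(v for row in mask for v in row), mask[5][5])
```

Because the strong pixel inflates σ in a single pass, repeating the fit with flagged pixels excluded, as the text describes, would give a tighter background estimate.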


distl.minimum_spot_area=None [5 for pixel-array detectors, 10 otherwise]

If specified, this is the minimum number of contiguous pixels that is considered to be a spot. For image plate and CCD detectors the hard-coded default is 10 pixels/spot. With pixel-array technology, spots are much narrower due to the sharp point-spread function, but they are still generally distributed over a few pixels. The pixel-array default is therefore set to 5 pixels/spot. Synchrotron beamlines experimenting with new detectors or optics should test this parameter to find the optimum setting!
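The effect of the area threshold can be demonstrated with a small connected-region search over a boolean signal mask. The sketch below is a generic flood fill, not the spotfinder algorithm; treating "contiguous" as 4-connectivity (shared pixel edges) is an assumption, and the function name is hypothetical.

```python
def find_spots(mask, minimum_spot_area=10):
    """Group True pixels into 4-connected regions; keep those large enough."""
    ny, nx = len(mask), len(mask[0])
    seen = [[False] * nx for _ in range(ny)]
    spots = []
    for j0 in range(ny):
        for i0 in range(nx):
            if not mask[j0][i0] or seen[j0][i0]:
                continue
            # Flood fill starting from an unvisited signal pixel.
            stack, region = [(j0, i0)], []
            seen[j0][i0] = True
            while stack:
                j, i = stack.pop()
                region.append((j, i))
                for dj, di in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    jj, ii = j + dj, i + di
                    if (0 <= jj < ny and 0 <= ii < nx
                            and mask[jj][ii] and not seen[jj][ii]):
                        seen[jj][ii] = True
                        stack.append((jj, ii))
            if len(region) >= minimum_spot_area:
                spots.append(region)
    return spots


# Hypothetical mask: one 2x3 block of signal pixels and one isolated pixel.
mask = [[False] * 6 for _ in range(6)]
for j in (1, 2):
    for i in (1, 2, 3):
        mask[j][i] = True
mask[5][5] = True
print(len(find_spots(mask, minimum_spot_area=5)))
print(len(find_spots(mask, minimum_spot_area=10)))
```

With the pixel-array default of 5 the six-pixel block is kept, whereas the CCD default of 10 rejects it, which illustrates why the setting should be tested on new detectors.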

verbose=False

If True, the program will output detailed statistics on all Bragg spots and show signal pixel locations. This is potentially useful for beamline commissioning.

pdf_output=None

If specified, this is the file name (*.pdf) for an optional PDF-format graphical output showing the image and associated Bragg signals. This visual output is the most direct way to understand the "big picture" of what the program results mean. Comparing one's own visual impression of the diffraction image with the marked up spotfinder results will reveal whether the program has missed real Bragg spots (false negatives) or overinterpreted bad signals (false positives). The contrast level is pre-set internally. Pink stripes are used to color-code the inactive areas between detector modules (for the Pilatus), and red squares (again for the Pilatus) are inactive pixels, which should be the same on every image for a given instrument. Within Bragg spots, color circles indicate that the spot has been tagged as a "good Bragg spot candidate" (see below). Pink circles tag the position of the maximum pixel, and red circles show the spot center of mass.

--help

Produces an informative program synopsis.

Program operation and output
Computational steps performed by the program have already been described in several publications (Zhang et al., 2006; Sauter et al., 2004; Sauter & Zwart, 2009; Sauter & Poon, 2010). Typical results written to stdout are as follows:

    File                      : 0_clmtr_edge_403.cbf
    Spot Total                : 177
    In-Resolution Total       : 170
    Good Bragg Candidates     : 157
    Ice Rings                 : 1
    Method 1 Resolution       : 2.55
    Method 2 Resolution       : 2.08
    Maximum unit cell         : 108.9
    %Saturation, Top 50 Peaks : 0.09
    In-Resolution Ovrld Spots : 0
    Bin population cutoff for method 2 resolution: 20%
    Number of focus spots on image #403 within the input resolution range: 170
    Total integrated signal, pixel-ADC units above local background (just the good Bragg candidates) 147322
    Signals range from 37.2 to 11773.1 with mean integrated signal 997.2
    Saturations range from 0.0% to 0.7% with mean saturation 0.0%


Three types of spot count are listed:

• "Spot Total" is the number of separate spots that rise above the minimum thresholds for signal height and spot area. A graph of average pixel intensity vs. resolution is examined to prefilter likely ice rings.

• "In-Resolution Total" is the subset remaining after high- and low-resolution filters are applied. The command-line distl.res filter is applied here (if given), as well as the "Method 2 Resolution" filter described below.

• "Good Bragg Candidates" are the spots remaining after the application of several spot-quality heuristics:

    o A histogram of spot count vs. resolution is used to implement a second filter for ice rings.
    o Spots with ill-defined profiles having more than two signal maxima are not counted.
    o Outliers in intensity, area, eccentricity and skewness are thrown away.
    o Spots with too-close nearest neighbors are filtered.

Limiting resolution is estimated by the two methods defined in Zhang et al. (2006). Method 2 is used to choose spots for LABELIT autoindexing. It relies on the ability to produce a histogram of spot count vs. resolution, and therefore requires a minimum number of spots (usually 25). The resolution cutoff is determined by noting the falloff in bin population at higher diffraction angles. The maximum unit cell is estimated by observing nearest-neighbor distances and assuming that the closest spacing corresponds to the largest unit cell length.

"In-resolution" spots are used to produce several signal strength metrics: "% Saturation", "In-Resolution Overloaded Spots", "Signals range" and "Saturations range".

A final "Total integrated signal" metric is computed on the "Good Bragg Candidates". This value is the summed signal height (corrected for background) over all signal pixels. ADC units are analog-to-digital units, as recorded in the raw image file.

Performance assessment for crystal positioning
To position the crystal optimally in the beam, we aim to translate the sample to the position that maximizes the total number of good Bragg spots, or average intensity of Bragg spots, within a given resolution range. Low-dose X-rays can be used to perform this raster scan over the sample, as described (Song et al., 2007). Very high throughput is achieved with shutterless exposures using a Pilatus-6M detector. However, the resulting turnaround time of about 0.2 seconds/image places very high performance requirements on the computational pipeline, which typically takes about 3 seconds to process one image. The magnitude of the challenge can be assessed by profiling the spotfinding code, as is done here with a 64-bit, 2.9 GHz Xeon machine running Fedora Core 8. The processor was equipped with 32 GB RAM and 16 CPU cores, although only one core was used by the single-threaded spotfinder process:

    Computational step                      Typical time/image   Comments
    Load the dynamic libraries              0.50 sec             Client/server removes this overhead
    Read file from NFS disk                 0.70 sec             Local file I/O takes only 0.04 sec
    Uncompress CBF image to memory          0.27 sec             cctbx-optimized code takes only 0.09 sec
    Classify pixels: background/signal      0.63 sec
    First ice ring filter                   0.23 sec
    Find all spots                          0.14 sec
    Second ice ring filter                  0.15 sec
    Additional heuristics for good spots    0.15 sec
    Total time:                             2.77 sec


Client-server architecture
The above-listed performance data show that an order-of-magnitude improvement is needed to keep pace with Pilatus-6M acquisition. We therefore move to a client-server architecture that eliminates the need to reload the dynamic libraries for each image (since the server process is persistent), and runs with multiple processes, so that numerous images may be processed concurrently within separate cores. At present, 16-core CPUs are available for under $6000, so they are within reach of beamline operating budgets. With this scheme it is easy to achieve the desired 0.2 second/image total throughput.
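The arithmetic behind this argument can be checked with a short calculation using the profiling figures quoted in this article. The estimate of the number of server processes is only a rough sketch: it assumes that the three improved timings from the Comments column all apply at once and that throughput scales perfectly with the number of processes, neither of which is claimed in the text.

```python
import math

# Per-image timings in seconds, taken from the profiling table.
serial_steps = {
    "load dynamic libraries": 0.50,
    "read file from NFS disk": 0.70,
    "uncompress CBF image": 0.27,
    "classify pixels": 0.63,
    "first ice ring filter": 0.23,
    "find all spots": 0.14,
    "second ice ring filter": 0.15,
    "additional heuristics": 0.15,
}
serial_total = sum(serial_steps.values())  # about 2.77 s

# Assumed best case: persistent server (no library reload), local file
# I/O, and cctbx-optimized CBF decompression.
optimized_steps = dict(serial_steps)
optimized_steps["load dynamic libraries"] = 0.0
optimized_steps["read file from NFS disk"] = 0.04
optimized_steps["uncompress CBF image"] = 0.09
optimized_total = sum(optimized_steps.values())  # about 1.43 s

target_interval = 0.2  # seconds per image at Pilatus-6M acquisition
speedup_needed = serial_total / target_interval
# Assumes ideal scaling over independent processes.
processes_needed = math.ceil(optimized_total / target_interval)

print("serial: %.2f s, optimized: %.2f s" % (serial_total, optimized_total))
print("speed-up needed: %.1fx, processes needed: %d" % (speedup_needed, processes_needed))
```

Under these assumptions the required process count fits comfortably within the 16 cores mentioned above.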

Server:

distl.mp_spotfinder_server_read_file [parameter=value ... ]
The program operates as a multithreaded server. Allowed parameters are:

• distl.port=8125 (Required) The server will listen for requests on this port.
• distl.processors=1 (Required) The total number of processes to be forked to listen for requests.
• distl.minimum_spot_area= same as above, the minimum spot area in pixels.
• distl.minimum_signal_height= same as above, the minimum signal height.
• distl.res.outer= same as above, the outer resolution limit in Ångstroms.

Client:

distl.thin_client <filepath> <host> <port>
No keyword parameters are allowed. The client simply takes the file path (must be a valid file path on the server machine), host name (usually "localhost") and port number, and outputs the spot analysis to stdout.

Notes:

The server can be killed from the command line by Ctrl-C, or via the client by sending the message:

distl.thin_client EXIT <host> <port>

While the server is supposed to queue requests, too many at once can cause some requests to drop out, in a manner that is not fully characterized. Therefore in the example given (<path to sources>/spotfinder/servers/thin_client.csh), a /bin/sleep command is used to time the requests at a reasonable pace given the particular server host and number of processes. In application, it is the responsibility of the caller (the beamline data collection process) to avoid overloading the server with too many simultaneous requests.
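A caller written in Python could pace its requests in the same spirit. The sketch below is not the contents of thin_client.csh; the helper names client_command and submit_paced are hypothetical, and only the command line distl.thin_client <filepath> <host> <port> is taken from the description above. Each client is launched without waiting for the previous one, with a fixed pause between launches, and the outputs are collected afterwards.

```python
import subprocess
import time


def client_command(filepath, host="localhost", port=8125):
    """Build the documented client command line for one image."""
    return ["distl.thin_client", str(filepath), host, str(port)]


def submit_paced(filepaths, interval_s, host="localhost", port=8125,
                 launch=None, sleep=time.sleep):
    """Launch one client per image, pausing interval_s between launches.

    `launch` and `sleep` are injectable so the pacing logic can be tested
    without a running server; by default each client is started as a
    subprocess whose stdout is captured as text.
    """
    if launch is None:
        def launch(cmd):
            return subprocess.Popen(cmd, stdout=subprocess.PIPE, text=True)
    processes = []
    for index, path in enumerate(filepaths):
        if index > 0:
            sleep(interval_s)  # throttle so the server is not flooded
        processes.append(launch(client_command(path, host, port)))
    # Collect the spot analysis that each client wrote to stdout.
    return [process.communicate()[0] for process in processes]
```

A suitable interval_s depends on the server host and the number of server processes, so it would have to be tuned per installation.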

LABELIT contains a separate program (labelit.distl) to perform a similar spotfinding function for use in autoindexing. That implementation is not suitable for multiprocessing because it writes the spot list to disk in the current working directory under a constant file name. The file can be erased with the command labelit.reset.

In contrast, distl.signal_strength and distl.mp_spotfinder_server_read_file perform all work in memory without any file output.

Results
Low-dose exposures were used to probe samples on a 5×6-position grid (Diamond, ADSC detector) or a 5×5-position grid (SSRL, Pilatus-6M). Images were visually inspected to produce a subjective rank for each exposure, taking into account factors such as the limiting resolution and strength of the Bragg spots. Separately, the images were analyzed with the automated spotfinder process, using an up-front resolution limit (distl.res.outer) of 3.0 Å.

An automated ranking scheme was developed that is consistent with the subjective rank from visual inspection. The 30 or 25 images from each sample are ranked by two criteria, the "Total Integrated Signal" in pixel-ADC units, and the count of "Good Bragg Candidates", and the rank scores based on these two criteria are averaged with equal weight. If two images receive the same average score, the tie is broken by giving a higher priority to "Total Integrated Signal". No score is developed unless there is a numerical score for "Method 2 Resolution" (although the resolution isn't actually included in the ranking); therefore images with too few spots are not scored.
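The ranking rules in the preceding paragraph can be sketched as follows. This is an illustration under stated assumptions, not the code used for the study: the function name rank_images and the dictionary keys are hypothetical, and images that tie within a single criterion simply receive consecutive ranks in input order, a detail the text does not specify.

```python
def rank_images(images):
    """Return image names ordered from best to worst.

    Each image is a dict with hypothetical keys "name",
    "total_integrated_signal", "good_bragg_candidates" and
    "method2_resolution" (None when no value was reported).
    """
    # Images without a Method 2 resolution are not scored at all.
    scored = [im for im in images if im["method2_resolution"] is not None]

    def ranks(key):
        # Rank 1 is the largest value of the criterion.
        ordered = sorted(scored, key=lambda im: im[key], reverse=True)
        return {im["name"]: position for position, im in enumerate(ordered, start=1)}

    signal_rank = ranks("total_integrated_signal")
    count_rank = ranks("good_bragg_candidates")

    def sort_key(im):
        name = im["name"]
        average = (signal_rank[name] + count_rank[name]) / 2.0
        # Equal averages are resolved in favor of total integrated signal.
        return (average, signal_rank[name])

    return [im["name"] for im in sorted(scored, key=sort_key)]
```

Note that the resolution value itself only gates whether an image is scored; it does not enter the score.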

Application programming interface
The spotfinder software can be accessed directly through Python code. Although the necessary interface is not formally documented, example usage is given by the program itself, within the <path to sources>/spotfinder/ directory:

./command_line/signal_strength.py: Illustrates the use of command-line parameters.

./applications/signal_strength.py: Illustrates the core function calls, as well as the detailed parsing of the resulting spot list.

Acknowledgments
Numerous contributions from synchrotron groups helped to shape this software. Test results were contributed by Michael Soltis, Ana González and Penjit (Boom) Moorhead (Stanford Synchrotron Radiation Lightsource); Craig Ogata, Mark Hilgart and Sudhir Pothineni (GM/CA-CAT, Advanced Photon Source); John Skinner and Annie Heroux (National Synchrotron Light Source); and Alan Ashtun, Katherine McAuley, Graeme Winter and Mark Williams (Diamond Light Source, Ltd., UK). Herbert Bernstein (Dowling College) offered his expertise toward the optimization of CBF decompression, and Chris Nielson (Area Detector Systems Corp.) engaged in valuable discussions. Ralf Grosse-Kunstleve at LBNL helped to implement the overall software architecture. The financial support of NIH/NIGMS under grant number R01GM077071 is gratefully acknowledged. The work was partly supported by the US Department of Energy under Contract No. DE-AC02-05CH11231.

References
Fischetti, R.F., Xu, S., Yoder, D.W., Becker, M., Nagarajan, V., Sanishvili, R., Hilgart, M.C., Stepanov, S., Makarov, O. & Smith, J.L. (2009). Mini-beam collimator enables microcrystallography experiments on standard beamlines. J. Synchrotron Rad. 16, 217-225.
Grosse-Kunstleve, R.W., Sauter, N.K., Moriarty, N.W. & Adams, P.D. (2002). The Computational Crystallography Toolbox: crystallographic algorithms in a reusable software framework. J. Appl. Cryst. 35, 126-136.
Poon, B.K., Grosse-Kunstleve, R.W., Zwart, P.H. & Sauter, N.K. (2010). Detection and correction of underassigned rotational symmetry prior to structure deposition. Acta Cryst. D66, 503-513.
Pothineni, S.B., Strutz, T. & Lamzin, V.S. (2006). Automated detection and centring of cryocooled protein crystals. Acta Cryst. D62, 1358-1368.
Sauter, N.K., Grosse-Kunstleve, R.W. & Adams, P.D. (2004). Robust indexing for automatic data collection. J. Appl. Cryst. 37, 399-409.
Sauter, N.K. & Zwart, P.H. (2009). Autoindexing the diffraction patterns from crystals with a pseudotranslation. Acta Cryst. D65, 553-559.
Sauter, N.K. & Poon, B.K. (2010). Autoindexing with outlier rejection and identification of superimposed lattices. J. Appl. Cryst. 43, 611-616.
Song, J., Mathew, D., Jacob, S.A., Corbett, L., Moorhead, P. & Soltis, S.M. (2007). Diffraction-based automated crystal centering. J. Synchrotron Rad. 14, 191-195.
Zhang, Z., Sauter, N.K., van den Bedem, H., Snell, G. & Deacon, A.M. (2006). Automated diffraction image analysis and spot searching for high-throughput crystal screening. J. Appl. Cryst. 39, 112-119.


Atomic Displacement Parameters (ADPs), their parameterization and refinement in PHENIX

Pavel V. Afonine (a), Alexandre Urzhumtsev (b,c), Ralf W. Grosse-Kunstleve (a) and Paul D. Adams (a,d)
(a) Lawrence Berkeley National Laboratory, Berkeley, CA 94720
(b) IGBMC, CNRS-INSERM-UdS, 1 rue Laurent Fries, B.P. 10142, 67404 Illkirch, France
(c) Université Nancy; Département de Physique-Nancy 1, B.P. 239, Faculté des Sciences et des Technologies, 54506 Vandoeuvre-lès-Nancy, France
(d) Department of Bioengineering, University of California at Berkeley, Berkeley, CA 94720

Correspondence email: [email protected]

Introduction
This article describes the parameterization of, and refinement procedures for, Atomic Displacement Parameters (ADPs, or B-factors), as they are implemented in the crystallographic structure refinement program phenix.refine (Afonine et al., 2005). The algorithms and parameterizations are implemented using an object-oriented library approach and thus can be re-used in different contexts.

1. Atomic Displacement Parameters
Diffraction experiments produce data representing time- and space-averaged images of the crystal structure: time-averaged because atoms are in continuous thermal motions around mean positions, and space-averaged because there are often small differences between symmetry copies of the asymmetric unit in a crystal, especially in the case of macromolecular crystals. The dynamic displacements and the static spatial disorder lead to vanishing high-resolution data in reciprocal space and blurring of the diffracting density (electron or nuclear) in real space. Ignoring the displacements when modeling the diffraction experiment would lead to a poor fit of the calculated data to the observed data. Modeling of the small dynamic displacements as isotropic or anisotropic harmonic displacements has been a standard practice from the earliest days of crystallography. Larger displacements (beyond harmonic approximation) can be modeled by using an anharmonic model (Gram-Charlier expansion; Johnson & Levy, 1974; Kendal & Stuart, 1958; not available in phenix.refine) or "alternative conformations".

The atomic displacement is a superposition of a number of contributions (Dunitz & White, 1973; Prince & Finger, 1973; Johnson, 1980; Sheriff & Hendrickson, 1987; Winn et al., 2001), such as:

• Local atomic vibration
• Motion due to a rotational degree of freedom (e.g. libration around a torsion bond)
• Loop or domain movement
• Whole molecule movement
• Crystal lattice vibrations

More detailed models can be envisioned, but in practice many of today’s refinement programs use an approximation and separate the total ADP, UTOTAL, into three components:

$$U_{TOTAL} = U_{CRYST} + U_{GROUP} + U_{LOCAL} \qquad (1)$$

While the individual members of (1) represent different kinds of displacements, each of them can be modeled with different accuracy or using different models. For example, local atomic vibration, ULOCAL, can be modeled using a less detailed, isotropic, model that uses only one parameter per atom. A more detailed (and accurate) anisotropic parameterization uses six parameters but requires more experimental observations to be practical. Group atomic displacement, UGROUP, can be modeled using the TLS parameterization (UTLS) or just one parameter per group of atoms (Fig. 1).

There are a relatively large number of conventions and notations used in connection with ADPs (B, U, U*, UCART, UCIF, etc); see Grosse-Kunstleve & Adams (2002) for a comprehensive review. In what follows we consistently use U assuming that the appropriate convention is used based on the context.

Computational Crystallography Newsletter (2010). 1, 24–31


2. UCRYST
UCRYST (a symmetric 3x3 matrix) models the common displacement of the crystal (lattice vibrations) as a whole and some additional experimental anisotropic effects (Sheriff & Hendrickson, 1987; Usón et al., 1999). This contribution is exactly the same for all atoms and thus it is possible to treat this effect directly while performing overall anisotropic scaling (Afonine et al., 2005) to compute the total model structure factors

$$F_{model} = k_{overall} \exp\left(-\frac{h^{t} U_{CRYST} h}{4}\right) \left[ F_{calc} + k_{sol} \exp\left(-\frac{B_{sol} s^{2}}{4}\right) F_{mask} \right] \qquad (2)$$

UCRYST is forced to obey the crystal symmetry constraints. phenix.refine reports refined elements of UCRYST matrix expressed in a Cartesian basis and uses the BCART notation (for details, see Grosse-Kunstleve & Adams, 2002).

3. UGROUP
UGROUP is intended to model the contribution to UTOTAL arising from concerted motions of multiple atoms (group motions). It allows for the combination of group motion at different levels (for example, whole molecule + chain + residue) and for the use of models of different degrees of sophistication (or accuracy), such as general TLS, TLS for a fixed axis (a librational ADP; ULIB), and a simple group isotropic model with one single parameter. In its most general form, UGROUP can be UTLS + ULIB + USUBGROUP, where, for example, UTLS would model the motion of the whole molecule or a large domain, USUBGROUP would model the displacement of a smaller group such as a chain using a simpler one-parameter model and ULIB would model a side chain libration around a torsion bond using a simplified TLS model (Stuart & Phillips, 1985). Depending on the context (model and data quality), not all these components can be realized. For example, UGROUP may be just UTLS or USUBGROUP. Nested TLS parameterization (when a smaller TLS group can be enclosed within a larger TLS group) is not yet implemented in phenix.refine.

3.1. USUBGROUP
USUBGROUP represents one refinable isotropic ADP per group of selected atoms, with any number of groups. In phenix.refine there are two pre-defined selections for USUBGROUP to refine one or two (side + main chain atoms) USUBGROUP per residue. An arbitrarily selected set of atoms can also be a group. There is no general rule for choosing one parameterization over another; the choice depends on the resolution (data-to-parameter ratio) and is usually made by systematic testing using Rwork and Rfree as the criteria. No restraints are applied to USUBGROUP, but the value can be constrained between predefined minimum and maximum values.

Figure 1. Hierarchy of contributions to atomic displacement parameter.


3.2. ULIB
This is a special case of the TLS parameterization for a rigid-body motion that occurs around a fixed axis (see for example: Dunitz & White, 1973; Stuart & Phillips, 1985). An example of such motion could be a libration of a flexible amino-acid side chain around a bond vector (such as the Cα-Cβ torsion bond). Currently this approach is being implemented in phenix.refine.

3.3. UTLS
If the TLS model is used for the rigid-body motion of a group of atoms then

$$U_{TLS} = T + A L A^{t} + A S + S^{t} A^{t} \qquad (3)$$

with 20 refinable T (translation), L (libration) and S (screw-rotation) matrix elements per group (Schomaker & Trueblood, 1968).
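To make formula (3) concrete, the sketch below evaluates UTLS for a single atom with plain 3x3 list arithmetic. It is not the phenix.refine implementation. The form of the matrix A, built from the atom position relative to the TLS origin, is not given in this article; the antisymmetric form used here is an assumed convention, and the function names are hypothetical. L is taken in rad² and positions in Å.

```python
def transpose(m):
    return [list(column) for column in zip(*m)]


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def add(*matrices):
    return [[sum(m[i][j] for m in matrices) for j in range(3)] for i in range(3)]


def u_tls(T, L, S, site, origin=(0.0, 0.0, 0.0)):
    """Evaluate U_TLS = T + A L A^t + A S + S^t A^t for one atom."""
    x, y, z = (site[i] - origin[i] for i in range(3))
    # Assumed convention for A (antisymmetric in the relative position).
    A = [[0.0, z, -y],
         [-z, 0.0, x],
         [y, -x, 0.0]]
    At = transpose(A)
    return add(T, matmul(matmul(A, L), At), matmul(A, S), matmul(transpose(S), At))
```

For example, a pure libration about the z axis displaces an atom on the x axis along y, so with T = 0.1 on the diagonal, L33 = 0.01 and the atom at x = 2, the U22 element grows by 0.01 × 2² while U11 and U33 stay at 0.1. Because A S and S^t A^t are transposes of each other, the result is symmetric whenever T and L are.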

The choice of TLS groups is often subjective and may be based on visual inspection of the molecule in an attempt to identify distinct and potentially independent fragments. A more rigorous approach is implemented in the TLSMD algorithm (Painter & Merritt, 2006a, 2006b). The TLSMD algorithm identifies TLS groups by splitting a whole molecule into smaller pieces followed by fitting of TLS parameters to the previously refined atomic B-factors for each piece. Therefore, it is very important that the input ADP values for the TLSMD procedure are minimally biased by the restraints used in previous refinements, and meaningful in general (not reset to an arbitrary constant value, for example). Ideally, one may need to perform a round of unrestrained ADP refinement just for the purpose of TLS group identification with TLSMD. Since an unrestrained refinement may not be stable (it might produce non-physical values or a large spread of refined B-factors, especially at lower resolution), it might be better to perform a group B-factor refinement only, using one or two refinable parameters per residue. Such a refinement will not make use of any ADP restraints and therefore yields refined B-factors that attempt to fit the data most closely within the limits of the group definition used. Furthermore, currently the TLSMD procedure does not generate a unique definitive answer for the TLS group selection but rather gives a list of possible choices and the researcher has to make the decision to test one or more TLS group selections; this still leads to an element of subjectivity in the definition of TLS groups.

3.4. ULOCAL
ULOCAL models harmonic atomic vibrations occurring around the mean atomic position. Depending on the experimental data quality, this can be a simple isotropic model where the atomic vibrations have equal amplitude in all directions, or it can be more complex where atomic vibration is assumed to be anisotropic. Ideally, if all the group (or global) contributions to the total atomic displacement are subtracted, leaving only the local atomic vibration ULOCAL, then it should obey Hirshfeld’s rigid bond postulate (Hirshfeld, 1976) providing a basis for the use of similarity restraints (Fig. 2).

Figure 2. Illustration of similarity restraints.

For isotropic ADPs:

$$T_{adp} = \sum_{\text{pairs of bonded atoms } (i,j)} \left( U_{local,i} - U_{local,j} \right)^{2} \qquad (4)$$

and for anisotropic ADPs:

$$T_{adp} = \sum_{\text{pairs of bonded atoms } (i,j)} \; \sum_{k=1}^{6} \left( U_{local,i}^{k} - U_{local,j}^{k} \right)^{2} \qquad (5)$$

which is similar to (4), with an inner sum that spans all six ADP matrix elements. Despite its simplicity this formula has proved useful in many cases. However, more sophisticated approaches have been suggested in order to better handle a number of special cases for which (5) is suboptimal (Hendrickson, 1985; Schneider, 1996; Sheldrick & Schneider, 1997). Since phenix.refine allows for a mixture of atoms with isotropic or anisotropic ADPs, it may happen (at least theoretically) that an isotropic atom is bonded to one having an anisotropic ADP. Currently, in this case only the isotropic component of each of the two atoms is participating in the restraint.

4. Practice: mixing contributions into UTOTAL, restraints

4.1. UTOTAL = UCRYST + ULOCAL
In practice, it is still customary to have one isotropic refinable parameter per atom at typical 'macromolecular' resolutions (~1.7-3.0 Å), that is UTOTAL = UCRYST + ULOCAL, where the UGROUP contribution is accumulated into other atomic parameters, including UCRYST and ULOCAL.

Since in this case ULOCAL contains other contributions to displacements (UGROUP) this in turn invalidates the use of restraints (4-5) (see Fig. 3 for an illustration). Although Fig. 3 contains some elements of dramatization and the actual deviations from similarity may not be as large as shown (especially for covalently bonded atoms), still in this case forcing the B-factors to be nearly identical is less valid. A possible work-around is to use a knowledge-based type of restraint introduced by Tronrud (1996). However, in phenix.refine we have implemented a 'softer' algorithm for similarity restraints, which is based on the following assumptions:

• A bond is almost rigid, therefore the ADPs of bonded atoms (ULOCAL component) are similar (Hirshfeld, 1976);

• ADP values of spatially close (including non-bonded) atoms are similar (Schneider, 1996);

• The difference between the ADP values of atoms close in space is related to the absolute values of the ADP values. Atoms with higher ADP values are allowed larger differences (Ian Tickle, CCP4 Bulletin Board, letter from March 14, 2003).

Figure 3. While local atomic vibrations (ULOCAL) obey similarity restraints, the total ADP (UTOTAL) may not obey similarity restraints because the contribution (UGROUP) arising from rigid-body motion of a whole fragment may add different displacements to each atom.

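Considering the above assumptions, we obtain

$$T_{adp} = \sum_{i=1}^{N_{atoms}} \sum_{j=1}^{M_{atoms}} \frac{1}{r_{ij}^{p}} \, \frac{\left( U_{local,i} - U_{local,j} \right)^{2}}{\left( U_{local,i} + U_{local,j} \right)^{q}} \qquad (6)$$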

Here Natoms is the total number of atoms in the model, the inner sum is extended over all Matoms in the sphere of radius R around atom i, rij is the distance between two atoms i and j, Ulocal,i and Ulocal,j are the corresponding isotropic ADP values, p and q are empirical constants. By default, R, p and q are fixed at empirically derived values: 5.0 Å, 1.69 and 1.03, respectively, but they can also be changed by the user. The function reduces to formula (4) if p = q = 0, and the radius R is set to be approximately equal to the upper limit of a typical bond length.
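A direct, unoptimized evaluation of formula (6) can be sketched as follows. This is an illustration rather than the phenix.refine code: the function name adp_similarity_target is hypothetical, neighbors are found by a brute-force double loop, crystal symmetry is ignored, and all isotropic ADPs are assumed positive so that the denominator is defined. As the double sum is written, each pair of neighbors is visited twice (once as (i, j) and once as (j, i)), so with p = q = 0 and a bond-length radius the value is twice a sum taken once per bonded pair.

```python
import math


def adp_similarity_target(sites, u_iso, radius=5.0, p=1.69, q=1.03):
    """Evaluate the distance- and magnitude-weighted ADP similarity target.

    sites: list of (x, y, z) Cartesian coordinates in Angstrom.
    u_iso: list of positive isotropic ADP values, one per site.
    """
    total = 0.0
    for i, site_i in enumerate(sites):
        for j, site_j in enumerate(sites):
            if i == j:
                continue
            r_ij = math.sqrt(sum((a - b) ** 2 for a, b in zip(site_i, site_j)))
            if r_ij > radius:
                continue  # atom j lies outside the sphere around atom i
            difference = u_iso[i] - u_iso[j]
            total += difference ** 2 / (r_ij ** p * (u_iso[i] + u_iso[j]) ** q)
    return total
```

The 1/r^p factor down-weights distant neighbors, and the (Ui + Uj)^q denominator tolerates larger differences between atoms that already have large ADPs, matching the three assumptions listed above.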

At high enough resolution, approximately better than 1.6-1.7 Å, UGROUP = UTLS is not used (currently is not implemented in phenix.refine) leaving UTOTAL = UCRYST + ULOCAL where the ULOCAL can be anisotropic.

4.2. UTOTAL = UCRYST + UGROUP + ULOCAL
This is the most advanced formulation of ADP parameterization available in phenix.refine. In this case currently ULOCAL can only be refined isotropically for those atoms that participate in a TLS group. The NCS restraints (if available and used) as well as (4) are applied to ULOCAL only and not to the whole ADP, UTOTAL.

phenix.refine allows any combination of the above ADP refinement strategies to be applied to any selected part of the structure. The only exception (which is a technical limitation and might be changed in the future) is that an atom cannot be in a TLS group and simultaneously have an anisotropic ULOCAL.

It is important to note that the positive definiteness of UTOTAL is ensured at all times, while the individual components may or may not be positive definite (except UCRYST, see below). For example, in combined TLS and individual ADP refinement (UTOTAL = UCRYST + UGROUP + ULOCAL), ULOCAL are not forced to be non-zero or even positive and UTLS are not forced to be positive definite either, while UTOTAL is assured to be positive definite (by using eigenvalue filtering). The benefit of such an approach is that it can compensate for non-optimal choices of TLS groups (discussed above) by a corresponding adjustment of ULOCAL (which in such cases becomes a compensation factor and has little physical meaning), while the resulting overall B-factor, UTOTAL, is still positive definite. UCRYST is an exception, and the physical correctness and compatibility with the crystal symmetry are enforced at the bulk-solvent correction and anisotropic scaling step.

Figure 4. Protocol of combined TLS and individual ADP refinement.

The contributions UCRYST, UGROUP and ULOCAL in (1) are highly correlated making it generally impractical to separate them. In addition, when refined simultaneously the elements of the T, L and S matrices appear to be correlated as well. This can generate a number of numerical issues, such as: refinement of TLS matrices to non-physical values, refinement becoming stalled, and strong dependence of the final refined values on the starting values. To overcome these issues we developed a complex algorithm for efficient refinement of the terms in UTOTAL; the details are outlined in Fig. 4. The procedure begins with obtaining sensible initial values for TLS parameters and ULOCAL. This is performed in three stages: 1) pre-conditioning of B-factors by performing a preliminary group ADP refinement, 2) extracting T matrix from these refined values as described by Afonine & Urzhumtsev (2007) and 3) least-squares fit of the T, L and S matrices to UTOTAL starting with the T matrix obtained from the previous step and zero or previously refined L and S matrices. The fitted TLS matrices are now the starting point for the TLS refinement against crystallographic data in the second step. Finally, the individual B-factors are refined keeping all other components fixed. This is repeated several times (3 by default).

This combined ADP refinement protocol was tested on 355 structures from PDB that contained TLS records. The structures were selected using the strict criteria: correct TLS selections and R-factors reproducible within 0.5%. For the test purposes the original B-factors in these structures were all re-set to the model average value. The re-refinement of ADP values using the protocol described above resulted in comparable or better (than reported) final R-factors (Fig. 5). There was no case where refinement showed any numerical problems.

Figure 5. Result of automatic re-refinement of 355 TLS containing structures in PDB.

Figure 6. A scheme showing how the refined ADPs are stored in a PDB file.


5. Output of ADP information in PDB files
It should be noted that each atom participating in a TLS group receives an anisotropic ADP (ANISOU card in PDB) (Fig. 6). phenix.refine always outputs the total ADP for each atom to the PDB file. An exception is made for UCRYST, where, similarly to CNS (Brunger, 2007), only the trace of this matrix (or the component that keeps the ADP values positive definite) is added into the output ADP for ATOM and ANISOU records, while its anisotropic component is separated and is reported in the PDB file header. To analyze the separate contributions UTLS and ULOCAL given UTOTAL, phenix.tls (Afonine, unpublished) is a tool within PHENIX that is specifically designed for this.

6. Examples
In this section we illustrate the use of refinement strategies with different choices of ADP parameterization.

All refinements included five macro-cycles of individual coordinates, ADP and occupancy refinement (see the phenix.refine documentation for details of occupancy refinement), ordered solvent (water) update (adding, removing, refinement; data resolution permitting). If NCS is available, then both phenix.refine runs were performed: with and without using NCS, where the NCS groups were selected by phenix.refine automatically. The X-ray target weight was automatically optimized in all refinement runs. The only variation between refinement runs was the ADP parameterization:

• individual (I),
• group with one isotropic ADP per residue (G1),
• group with two isotropic ADP per residue (G2; one per main and one per side chain atoms);
• TLS only (T);
• TLS in combination with G1 (T+G1);
• TLS in combination with G2 (T+G2);
• TLS in combination with I (T+I).

In the examples below the best re-refinement result achieved in phenix.refine is shown in bold in the corresponding table.

Example 1: PDB code 1BL8, resolution 3.2 Å

This structure was originally refined to Rwork = 28.0 and Rfree = 29.0% (Doyle et al., 1998). Chen et al. (2007) re-refined this model using a Normal Mode parameterization in refinement, reducing the R-factors to 27.2 and 27.2%, respectively (leaving no gap between Rwork and Rfree). At this resolution automated water model updates were not possible.

Example 2: PDB code 2PFD, resolution 3.42 Å

This structure was originally refined to Rwork = 24.0 and Rfree = 24.9% using a Normal Mode parameterization (Poon et al., 2007). At this resolution automated water model updates were not possible.

    Refinement strategy   I          I+NCS      G1+NCS     G2+NCS     T+NCS      T+G1+NCS   T+G2+NCS   T+I+NCS
    Rwork/Rfree (%)       25.6/32.0  24.7/28.5  27.0/30.0  26.4/30.8  25.0/28.2  24.8/28.0  24.0/28.1  23.3/27.0

    Refinement strategy   I          I+NCS      G1+NCS     G2+NCS     T+NCS      T+G1+NCS   T+G2+NCS   T+I+NCS
    Rwork/Rfree (%)       21.6/28.1  20.3/26.5  21.5/28.9  22.1/29.0  21.2/27.5  21.0/27.6  21.0/28.1  20.2/25.8


Example 3: PDB code 1DQV, resolution 3.2 Å

This structure was originally refined to Rwork = 29.3 and Rfree = 34.8% using the CNS program (Sutton et al., 1999). At this resolution automated water model updates were not possible.

Example 4: PDB code 1B7G, resolution 2.05 Å

The PDB file header indicates Rwork = 22.6 and Rfree = 29.3% (Isupov et al., 1999). Winn et al. (2001) applied combined TLS and individual isotropic refinement which resulted in Rwork = 22.0 and Rfree = 25.7%.

Acknowledgements
This work was supported in part by the US Department of Energy under Contract No. DE-AC03-76SF00098 and NIH/NIGMS grant 1P01GM063210.

References
Afonine, P.V., Grosse-Kunstleve, R.W. & Adams, P.D. (2005). CCP4 Newsl. 42, contribution 8.

Brunger, A.T. (2007). Nature Protocols. 2, 2728-2733.

Chen, X., Poon, B.K., Dousis, A., Wang, Q. & Ma, J. (2007). Structure. 15, 955-962.

Doyle, D.A., Cabral, J.M., Pfuetzner, R.A., Kuo, A., Gulbis, J.M., Cohen, S.L., Chait, B.T. & MacKinnon, R. (1998). Science. 280, 69-77.

Dunitz, J.D. & White, D.N.J. (1973). Acta Cryst. A29, 93-94.

Isupov, M.N., Fleming, T.M., Dalby, A.R., Crowhurst, G.S., Bourne, P.C. & Littlechild, J.A. (1999). J. Mol. Biol. 291, 651-660.

Ni, F., Poon, B.K., Wang, Q. & Ma, J. (2009). Acta Cryst. D65, 633-643.

Painter, J. & Merritt, E.A. (2006a). Acta Cryst. D62, 439-450.

Painter, J. & Merritt, E.A. (2006b). J. Appl. Cryst. 39, 109-111.

Poon, B.K., Chen, X., Lu, M., Vyas, N.K., Quiocho, F.A., Wang, Q. & Ma, J. (2007). PNAS. 104, 7869-7874.

Schomaker, V. & Trueblood, K.N. (1968). Acta Cryst. B24, 63-76.

Stuart, D.I. & Phillips, D.C. (1985). Methods in Enzymology. 115, 117-142.

Sutton, R.B., Ernst, J.A. & Brunger, A.T. (1999). J. Cell Biol. 147, 589-598.

    Refinement strategy       I          G1         G2         T          T+G1       T+G2       T+I
                              21.4/25.2  23.1/25.8  22.9/26.4  19.7/22.5  19.1/22.5  18.9/22.0  17.3/21.2
    +NCS
                              21.9/25.7  22.3/26.4  23.7/26.4  20.5/22.1  20.5/22.7  19.6/21.9  17.7/21.3
    No water update, no NCS
    Rwork/Rfree (%)
                              20.5/26.0  22.1/26.8  22.0/27.8  19.5/24.0  18.4/22.3  19.2/24.1  16.7/21.7

    Refinement strategy   I          I+NCS      G1         G2         T          T+G1       T+G2       T+I
    Rwork/Rfree (%)       21.6/27.9  -          25.3/30.5  24.4/29.8  22.6/27.3  21.9/26.9  21.2/27.3  19.7/25.3

Computational Crystallography Newsletter (2010). 1, 24–31


SHORT COMMUNICATIONS

Non‐periodic torsion angle targets in PHENIX

Jeffrey J. Headd (a), Nigel W. Moriarty (a), Ralf W. Grosse‐Kunstleve (a) and Paul D. Adams (a,b)

(a) Lawrence Berkeley National Laboratory, Berkeley, CA 94720
(b) Department of Bioengineering, University of California at Berkeley, Berkeley, CA 94720

Correspondence email: [email protected]

Torsion restraints in the PHENIX monomer library have traditionally been parameterized with a target angle (value_angle), a standard deviation (value_angle_esd), and a periodicity (period) as originally specified by Vagin et al. (2004). An example mmCIF definition for the χ angles in Trp residues is shown in Fig. 1A. This periodic parameterization for χ angles proves to be insufficient to recapitulate all favored rotamer positions (Lovell et al., 2000) for Asn, Asp, His, Phe, Trp, and Tyr. For example, Fig. 2A depicts the results of geometry minimization in PHENIX of an ideal Tyr m0 rotamer, which results in an m‐90 rotamer. This result occurs because the χ2 torsion for Tyr is defined as 90° with a periodicity of 2, which excludes the 0° target. In most cases, simply increasing the periodicity of a given torsion angle is counterproductive as this will open up non‐rotameric conformations in addition to the desired values.

As shown in Fig. 2B, increasing the periodicity from 2 to 4 to allow χ2 = 0° for Tyr opens up χ2 = 180°, which would allow a rotamer outlier with an impossible steric clash to score favorably in the geometry term.
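The effect of the periodicity on the set of restraint minima can be checked with a short calculation. The sketch below is illustrative only and is not taken from PHENIX: the helper name periodic_targets is hypothetical, and it assumes that a torsion with target value_angle and periodicity period has equivalent minima spaced 360°/period apart.

```python
def periodic_targets(value_angle, period):
    """Return the sorted minima (degrees, in [0, 360)) of a periodic torsion target.

    Assumption: minima are spaced 360/period degrees apart, starting at value_angle.
    """
    step = 360.0 / period
    return sorted((value_angle + n * step) % 360.0 for n in range(period))


# Tyr chi2 as described in the text: target 90 degrees, periodicity 2.
# The minima are 90 and 270 (i.e. -90) degrees; 0 degrees is not among them.
print(periodic_targets(90.0, 2))

# Raising the periodicity to 4 admits 0 degrees, but 180 degrees as well.
print(periodic_targets(90.0, 4))
```

With period 2 the list contains only 90° and 270°, and with period 4 both 0° and 180° appear, which is the behavior described for Fig. 2A and Fig. 2B.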

A.

_chem_comp_tor.comp_id
_chem_comp_tor.id
_chem_comp_tor.atom_id_1
_chem_comp_tor.atom_id_2
_chem_comp_tor.atom_id_3
_chem_comp_tor.atom_id_4
_chem_comp_tor.value_angle
_chem_comp_tor.value_angle_esd
_chem_comp_tor.period
TRP chi1 N  CA CB CG  180.000 15.000 3
TRP chi2 CA CB CG CD1  90.000 20.000 2

B.

_chem_comp_tor.comp_id
_chem_comp_tor.id
_chem_comp_tor.atom_id_1
_chem_comp_tor.atom_id_2
_chem_comp_tor.atom_id_3
_chem_comp_tor.atom_id_4
_chem_comp_tor.value_angle
_chem_comp_tor.alt_value_angle
_chem_comp_tor.value_angle_esd
_chem_comp_tor.period
TRP chi1 N  CA CB CG  180.000 . 15.000 3
TRP chi2 CA CB CG CD1  90.000 0 20.000 2

Figure 1. Examples of the torsions section of a restraints file in CIF format for TRP.

Figure 2. Geometry of various rotamers for TRP. Images generated using KiNG (Chen et al., 2009).

Computational Crystallography Newsletter (2010). 1, 32–33


To recapitulate all favorable rotamer positions without introducing false minima, PHENIX now includes an alt_value_angle parameter in the tor definitions for non‐periodic parameterization of torsion angles. Fig. 1B illustrates the usage of the alt_value_angle parameter in the mmCIF parameterization for Tyr. This parameter takes either a single target value or multiple comma‐separated target values. The current value_angle_esd is used to apply the same ESD to the new torsion definitions as is used for the original periodic value_angle definition. ESD flexibility for the alternate parameter will be implemented in the future. Both value_angle and alt_value_angle may be used together for flexible implementation of torsion definitions. To allow for mixing of tor definitions with and without alt_value_angle values, a '.' may be used for the alt_value_angle parameter for instances where it should be ignored by PHENIX.
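A minimal sketch of how such a definition might be evaluated is given here. It is not the PHENIX implementation: the function names are hypothetical, and it assumes that the allowed targets are simply the periodic images of value_angle together with any alternate values, with a '.' field contributing nothing.

```python
def parse_alt_values(field):
    """Parse an alternate-angle field (assumed format): '.' means 'ignore',
    otherwise a single value or comma-separated values in degrees."""
    if field.strip() == ".":
        return []
    return [float(item) for item in field.split(",")]


def angle_difference(a, b):
    """Signed smallest difference a - b in degrees, in [-180, 180)."""
    return (a - b + 180.0) % 360.0 - 180.0


def torsion_deviation(model_angle, value_angle, period, alt_field):
    """Deviation of model_angle from the nearest allowed target.

    Assumption: targets are the periodic images of value_angle plus the
    alternate values; how PHENIX combines them internally is not shown here.
    """
    step = 360.0 / period
    targets = [value_angle + n * step for n in range(period)]
    targets += parse_alt_values(alt_field)
    return min((angle_difference(model_angle, t) for t in targets), key=abs)


# chi2 row of Fig. 1B: value 90, period 2, alternate value 0
print(torsion_deviation(0.0, 90.0, 2, "."))    # without the alternate target
print(torsion_deviation(0.0, 90.0, 2, "0"))    # with the alternate target
print(torsion_deviation(180.0, 90.0, 2, "0"))  # 180 degrees is still penalized
```

Under these assumptions a model angle of 0° sits 90° from the nearest target when only the periodic definition is used, and exactly on a target once the alternate value is added, while 180° remains 90° away from every target.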

All previously inaccessible rotamers for the above‐mentioned residues have been corrected in the geostd (http://geostd.sourceforge.net) in PHENIX using alt_value_angle parameters. Fig. 2C illustrates the proper recapitulation of the Tyr rotamer m0 following geometry minimization using the new parameterization. Tyr χ1 as shown in Fig. 1B demonstrates such an entry in practice.

These non‐periodic torsion angle definitions may be included in any mmCIF definition for use in PHENIX, including amino acids, nucleotides, and ligands. One potential use for these parameters is for specifying sugar puckers in saccharides. Internally, eLBOW (Moriarty et al., 2009) currently uses non‐periodic torsions to generate and control the puckers of saturated rings. The values of the dihedrals of a six‐membered puckered ring in the chair conformer are approximately ±55 degrees. A torsion definition using periods would require a value of 60 degrees and a period of three. This is not an ideal situation, both because the ideal value is shifted and because 180 degrees is a minimum on the potential surface. Furthermore, the boat conformer has two torsions that are approximately zero, requiring a period of six. This makes 120 degrees a possible but non‐physical minimum. Testing of the non‐periodic torsions for carbohydrates is currently underway and will soon be available via eLBOW.
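The chair example can be worked through numerically. The following sketch is only an illustration under stated assumptions: the helper names are hypothetical, the six ring dihedrals are taken as exactly ±55°, and the periodic definition is taken as value 60° with period 3, as in the text.

```python
def angle_difference(a, b):
    """Signed smallest difference a - b in degrees, in [-180, 180)."""
    return (a - b + 180.0) % 360.0 - 180.0


def nearest_deviation(angle, targets):
    """Absolute deviation (degrees) from the closest target."""
    return min(abs(angle_difference(angle, t)) for t in targets)


# Periodic definition from the text: value 60 degrees, period 3
periodic = [60.0, 180.0, 300.0]
# Non-periodic definition (assumed targets for an idealized chair)
non_periodic = [55.0, -55.0]

# Idealized chair: ring dihedrals alternate in sign
chair = [55.0, -55.0, 55.0, -55.0, 55.0, -55.0]

print([nearest_deviation(a, periodic) for a in chair])      # each is 5 degrees off
print([nearest_deviation(a, non_periodic) for a in chair])  # each is on target
print(nearest_deviation(180.0, periodic))                   # 180 is a periodic minimum
print(nearest_deviation(180.0, non_periodic))               # but far from any explicit target
```

The periodic definition leaves every chair dihedral 5° from its target and treats 180° as ideal, whereas the explicit targets match the chair exactly and leave 180° far from any minimum.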

References

Chen, V. B., Davis, I. W. & Richardson, D. C. (2009). KiNG (Kinemage, Next Generation): A versatile interactive molecular and scientific visualization program. Protein Science, 18, 2403‐2409.

Lovell, S. C., Word, J. M., Richardson, J. S. & Richardson, D. C. (2000). The Penultimate Rotamer Library. Proteins: Structure, Function and Genetics, 40, 389‐408.

Moriarty, N. W., Grosse‐Kunstleve, R. W. & Adams, P. D. (2009). Acta Cryst. D65, 1074–1080.

Vagin, A. A., Steiner, R. A., Lebedev, A. A., Potterton, L., McNicholas, S., Long, F. & Murshudov, G. N. (2004). REFMAC5 dictionary: organization of prior chemical knowledge and guidelines for its use. Acta Cryst. D60, 2184‐2195.


Model‐building updates and new features

Tom Terwilliger (a)

(a) Los Alamos National Laboratory, Los Alamos, NM 87545

Correspondence email: [email protected]

A. Rapid phase improvement and model‐building with phenix.phase_and_build (NEW)

PHENIX now has a new and very rapid method for improving the quality of your map and building a model. This phenix.phase_and_build approach uses all the tools described in this section. The approach is to carry out an iterative process of building a model as rapidly as possible and using this model in density modification to improve the map. This approach is related to the older phenix.autobuild approach. The difference is that in phenix.autobuild much effort was spent on building the best possible model at each stage before carrying out density modification, while in phenix.phase_and_build speed of model‐building is optimized. The result is that phenix.phase_and_build is 10 times faster than phenix.autobuild, yet it produces nearly as good a model in the end. The phenix.phase_and_build approach will also find NCS from your starting map and apply it during density modification. You can run phenix.phase_and_build with:

phenix.phase_and_build my_experimental_data.mtz my_sequence.dat

You can also add a starting model or a starting map. This means that you can run it once, get a new model and map, then run it again to improve your model and map further.

When you run phenix.phase_and_build it will write out a phase_and_build_params.eff parameter file that can be used to re‐run phenix.phase_and_build (just as for essentially all PHENIX methods). In addition, phenix.phase_and_build will write out the parameter files for the intermediate methods used as part of phenix.phase_and_build to the temporary directory used in building. You can:

• Run NCS identification

phenix.find_ncs temp_dir/find_ncs_params.eff

• Run first cycle of density modification

phenix.autobuild temp_dir/AutoBuild_run_1_/autobuild.eff

• Run most recent model‐building

phenix.build_one_model temp_dir/build_one_model_params.eff

• Run sequence assignment and filling short gaps

phenix.assign_sequence temp_dir/assign_sequence_params.eff

• Run loop fitting

phenix.fit_loops temp_dir/fit_loops_params.eff

This gives you control of all the steps in map improvement and model‐building in addition to letting you run them all together with phenix.phase_and_build.
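If you want to re‐run several of these intermediate steps from a script, the command lines listed above can be assembled programmatically. The sketch below is a convenience wrapper written for illustration, not part of PHENIX: the function names are hypothetical, temp_dir is the same placeholder used in the text, and the steps are simply kept in the order in which they are listed above. By default nothing is executed and the commands are only printed.

```python
import shlex
import subprocess

TEMP_DIR = "temp_dir"  # placeholder name, as in the text

# (description, program, parameter file relative to the temporary directory)
STEPS = [
    ("NCS identification", "phenix.find_ncs", "find_ncs_params.eff"),
    ("first cycle of density modification", "phenix.autobuild",
     "AutoBuild_run_1_/autobuild.eff"),
    ("most recent model-building", "phenix.build_one_model",
     "build_one_model_params.eff"),
    ("sequence assignment and filling short gaps", "phenix.assign_sequence",
     "assign_sequence_params.eff"),
    ("loop fitting", "phenix.fit_loops", "fit_loops_params.eff"),
]


def build_commands(temp_dir=TEMP_DIR):
    """Return one argument list per step, e.g. ['phenix.find_ncs', 'temp_dir/...']."""
    return [[program, f"{temp_dir}/{params}"] for _, program, params in STEPS]


def run_steps(temp_dir=TEMP_DIR, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for (label, _, _), command in zip(STEPS, build_commands(temp_dir)):
        print(f"{label}: {shlex.join(command)}")
        if not dry_run:
            subprocess.run(command, check=True)


if __name__ == "__main__":
    run_steps()  # dry run: only prints the five command lines
```

Calling run_steps(dry_run=False) would execute the commands one after another and stop at the first failure, which requires a PHENIX installation and an existing temporary directory.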

The phenix.autosol wizard now uses phenix.phase_and_build by default for model‐building. This means that now the models produced by phenix.autosol are quite good but are still obtained quickly.

Computational Crystallography Newsletter (2010). 1, 34–36


B. Working with NCS in PHENIX with phenix.find_ncs: Identification of NCS from heavy‐atom sites, from a model or from a map

The find_ncs method contains all the algorithms available in PHENIX for finding NCS, including a new algorithm for finding NCS directly from a map. The approaches used in find_ncs are:

1. Finding NCS from a map (NEW):

phenix.find_ncs my_map.mtz

The find_ncs_from_density algorithm first identifies potential locations of centers of macromolecules in the density map by finding maxima of the local RMS density. Then it cuts out a sphere of density centered at a trial center and carries out an FFT‐based rotation‐translation search to find all occurrences of similar density in the asymmetric unit of your map. The region over which NCS‐related correlation is high is identified and the operators are written out as a "find_ncs.ncs_spec" file that can be read by the PHENIX wizards and a "find_ncs.phenix_refine" file that can be read by phenix.refine. You can run the algorithm as a whole or you can run each part separately with phenix.guess_molecular_centers and phenix.find_ncs_from_density.
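The first step, locating candidate centers from the local RMS density, can be illustrated on a toy grid. This sketch is not the PHENIX algorithm: all function names are hypothetical, the map is a small periodic grid of nested lists rather than density from an MTZ file, the local RMS is taken over a cube instead of a sphere, and only the single strongest position is reported.

```python
from itertools import product


def local_rms(grid, radius=1):
    """Local RMS of a periodic 3D grid over a cube of half-width `radius`."""
    nx, ny, nz = len(grid), len(grid[0]), len(grid[0][0])
    offsets = list(product(range(-radius, radius + 1), repeat=3))
    rms = [[[0.0] * nz for _ in range(ny)] for _ in range(nx)]
    for i, j, k in product(range(nx), range(ny), range(nz)):
        total = sum(
            grid[(i + di) % nx][(j + dj) % ny][(k + dk) % nz] ** 2
            for di, dj, dk in offsets
        )
        rms[i][j][k] = (total / len(offsets)) ** 0.5
    return rms


def strongest_center(rms):
    """Grid index (i, j, k) with the highest local RMS value."""
    nx, ny, nz = len(rms), len(rms[0]), len(rms[0][0])
    return max(
        product(range(nx), range(ny), range(nz)),
        key=lambda ijk: rms[ijk[0]][ijk[1]][ijk[2]],
    )


def toy_map(n=6, center=(2, 3, 4)):
    """Empty n*n*n grid with one small blob of density around `center`."""
    grid = [[[0.0] * n for _ in range(n)] for _ in range(n)]
    ci, cj, ck = center
    grid[ci][cj][ck] = 1.0
    for d in (-1, 1):
        grid[(ci + d) % n][cj][ck] = 0.5
        grid[ci][(cj + d) % n][ck] = 0.5
        grid[ci][cj][(ck + d) % n] = 0.5
    return grid


print(strongest_center(local_rms(toy_map())))
```

On this toy map the highest local RMS falls on the grid point at the middle of the blob. A real map would typically give several maxima, each a trial center for the subsequent rotation‐translation search.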

2. Finding NCS from heavy‐atom sites or a model

phenix.find_ncs ha.pdb my_map.mtz
phenix.find_ncs my_model.pdb

If you have a heavy‐atom PDB file and a map file, then phenix.find_ncs will identify subsets of your heavy‐atom sites that are related by non‐crystallographic symmetry, and it will check whether this NCS is actually reflected in your map. It will write out the resulting NCS as ncs_spec and phenix_refine files. If you have a model with several chains that are related by NCS, phenix.find_ncs will find the NCS operators from the coordinates in your model.

3. Reading NCS from a my_ncs.ncs_spec file

phenix.find_ncs my_ncs.ncs_spec

This will read an ncs_spec file written by a PHENIX method, and write out the NCS in formats suitable for phenix.refine or the wizards. If you also supply a map MTZ file, it will check for this NCS in your map.

4. Creating NCS‐related copies

You can also apply NCS operators from a my_ncs.ncs_spec file to a single copy of your protein to create all the NCS‐related copies with:

phenix.apply_ncs my_ncs.ncs_spec my_model_one_ncs_copy.pdb
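Conceptually, creating NCS‐related copies amounts to applying a rotation and a translation to every atom of the single copy. The sketch below illustrates only this arithmetic and does not read or write ncs_spec files: the function names and the two example operators are made up for illustration, and the convention x' = R·x + t (mapping the supplied copy onto each NCS copy) is an assumption, since the direction in which operators are stored is not stated here.

```python
def apply_operator(rotation, translation, xyz):
    """Return R * xyz + t for a 3x3 rotation (rows of 3) and a translation."""
    return tuple(
        sum(rotation[row][col] * xyz[col] for col in range(3)) + translation[row]
        for row in range(3)
    )


def make_ncs_copies(operators, coords):
    """Apply each (rotation, translation) operator to all coordinates."""
    return [[apply_operator(r, t, xyz) for xyz in coords] for r, t in operators]


# Illustrative operators (not from any real structure):
IDENTITY = (((1, 0, 0), (0, 1, 0), (0, 0, 1)), (0.0, 0.0, 0.0))
# Two-fold rotation about z followed by a 10 A shift along x
TWOFOLD_Z = (((-1, 0, 0), (0, -1, 0), (0, 0, 1)), (10.0, 0.0, 0.0))

one_copy = [(1.0, 2.0, 3.0), (4.0, 0.0, -1.0)]
for copy in make_ncs_copies([IDENTITY, TWOFOLD_Z], one_copy):
    print(copy)
```

The first operator reproduces the input coordinates and the second generates the two‐fold‐related copy. In practice phenix.apply_ncs handles the file formats and operator conventions for you.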

C. Fitting loops with loop libraries (NEW) and by tracing chains with phenix.fit_loops

phenix.fit_loops my_model_with_gaps.pdb my_map.mtz my_sequence.dat

The phenix.fit_loops approach now has two main algorithms. One is to fit short gaps using a loop library derived from high‐resolution structures in the PDB, and the other is to build loops directly by iterative extension with tripeptides. The loop‐library approach (specified with loop_lib=True) is very fast, and is currently applicable for short gaps of up to 3 residues. The iterative extension approach is slower, but can be used for longer gaps (typically up to 10‐15 residues).
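To decide in advance which of the two algorithms is likely to apply to a given model, the gaps can be listed from the residue numbering. The following sketch is a rough planning aid and not part of phenix.fit_loops: the function names are hypothetical, the residue numbers are assumed to come from a single chain in sorted order, and the thresholds of 3 and 15 residues are taken from the limits quoted above.

```python
def find_gaps(residue_numbers):
    """Return (first_missing, last_missing) pairs for breaks in sorted numbering."""
    gaps = []
    for prev, nxt in zip(residue_numbers, residue_numbers[1:]):
        if nxt - prev > 1:
            gaps.append((prev + 1, nxt - 1))
    return gaps


def suggest_method(gap, loop_lib_max=3, extension_max=15):
    """Suggest an approach from the gap length (thresholds assumed from the text)."""
    length = gap[1] - gap[0] + 1
    if length <= loop_lib_max:
        return "loop library"
    if length <= extension_max:
        return "iterative extension"
    return "probably too long"


# Residues present in a hypothetical chain
present = [1, 2, 3, 6, 7, 20]
for gap in find_gaps(present):
    print(gap, suggest_method(gap))
```

For this hypothetical chain the two‐residue gap would be a candidate for the loop library and the twelve‐residue gap for iterative extension.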


D. Rapid building of a single model with phenix.build_one_model (NEW)

PHENIX now has a tool that you can use to quickly build a single model from a map and sequence file, or to extend an existing model. You can build a new model with resolve model‐building with:

phenix.build_one_model my_map.mtz my_sequence.dat

If you supply a PDB file, then the ends of each chain in your model will be extended, if possible, based on your map.

E. Sequence assignment and short gap filling with phenix.assign_sequence (NEW)

You can now carry out an improved sequence assignment of a model that you have already built with phenix.assign_sequence. Further, once the sequence has been assigned, this method will use the sequence and proximity to identify chains that should be connected, and it will connect those that have the appropriate relationships using the new loop libraries available in phenix.fit_loops. The result is that you may be able to obtain a more complete model with more chains assigned to sequence than previously. You can run it with:

phenix.assign_sequence my_model.pdb my_sequence.dat my_map.mtz


PHENIX website for downloads, documentation and help: www.phenix‐online.org