29
Tutorial: Rosetta tools for structure determination in cryoEM density Brandon Frenz, Ray Y.-R. Wang, Frank DiMaio Last updated: August 2018 This tutorial is intended to introduce users to several different ways Rosetta may be used to solve various structure determination tasks given 3-5Å cryoEM density data. It is not intended to replace the user’s guide, available at https://www.rosettacommons.org/manuals/latest/main/. The tutorial is split up into four parts. 1. An introduction to Rosetta in general, showing how one may score structures and minimize structures guided by experimental density data 2. Our fragment-based refinement protocol for refinement against 3-5Å EM density 3. Our model rebuilding protocol (RosettaCM), where one wishes to recombine homologous structures, and rebuild small missing regions (<12 residues) 4. Our de novo model-building tools, where one wishes to rebuild missing regions of a structure (for example, from a homology model 5. Our model completion tools, where one wishes to complete a partial model built by the de novo tool or wishes to rebuild large missing regions (12 or more residues) In each scenario, we present the most basic usage of Rosetta for the task, and then describe additional options that may be useful. Command-line flags and input scripts are provided in shaded boxes, with boldfaced text indicating parameters of note. These parameters are described in the text following the command line. Note: in all sections, you will need to update the command scripts to point at your installation of Rosetta and the Rosetta database. Which section should one read? If you want a straightforward introduction to scoring and basic refinement of structures in Rosetta, read Section 1 If you have a model that you want to refine into a 3 - 4.5Å density map, read Section 2. If you have a density map at 3 - 4.5Å (or worse!), one or more partial models, and you want to combine the models and rebuild short insertions and deletions, read Section 3. If you have a 3 - 4.5Å density map with no homology information available, and want to build a model, read Section 4. If you have a 3 - 4.5Å density map and a very incomplete model you want to complete, read Section 5. Other notes. For all the applications in this tutorial, it is recommended that you download the latest weekly release of Rosetta. Also, for brevity, some of the command lines and XML scripts have been trimmed in this document. The tutorial files contain the full command lines and XML scripts; if something is omitted in this document, it should not be changed from the value in the tutorial file.

Tutorial: Rosetta tools for structure determination in ...faculty.washington.edu/dimaio/files/rosetta_density_tutorial_aug18.pdf · 2. Our fragment-based ... In each scenario

Embed Size (px)

Citation preview

Tutorial:RosettatoolsforstructuredeterminationincryoEMdensityBrandonFrenz,RayY.-R.Wang,FrankDiMaioLastupdated:August2018

ThistutorialisintendedtointroduceuserstoseveraldifferentwaysRosettamaybeusedtosolvevariousstructuredeterminationtasksgiven3-5ÅcryoEMdensitydata.Itisnotintendedtoreplacetheuser’sguide,availableathttps://www.rosettacommons.org/manuals/latest/main/.

Thetutorialissplitupintofourparts.

1. AnintroductiontoRosettaingeneral,showinghowonemayscorestructuresandminimizestructuresguidedbyexperimentaldensitydata

2. Ourfragment-basedrefinementprotocolforrefinementagainst3-5ÅEMdensity3. Ourmodelrebuildingprotocol(RosettaCM),whereonewishestorecombinehomologous

structures,andrebuildsmallmissingregions(<12residues)4. Ourdenovomodel-buildingtools,whereonewishestorebuildmissingregionsofastructure(for

example,fromahomologymodel5. Ourmodelcompletiontools,whereonewishestocompleteapartialmodelbuiltbythedenovo

toolorwishestorebuildlargemissingregions(12ormoreresidues)

Ineachscenario,wepresentthemostbasicusageofRosettaforthetask,andthendescribeadditionaloptionsthatmaybeuseful.Command-lineflagsandinputscriptsareprovidedinshadedboxes,withboldfacedtextindicatingparametersofnote.Theseparametersaredescribedinthetextfollowingthecommandline.

Note:inallsections,youwillneedtoupdatethecommandscriptstopointatyourinstallationofRosettaandtheRosettadatabase.

Whichsectionshouldoneread?

• IfyouwantastraightforwardintroductiontoscoringandbasicrefinementofstructuresinRosetta,readSection1

• Ifyouhaveamodelthatyouwanttorefineintoa3-4.5Ådensitymap,readSection2.• Ifyouhaveadensitymapat3-4.5Å(orworse!),oneormorepartialmodels,andyouwantto

combinethemodelsandrebuildshortinsertionsanddeletions,readSection3.• Ifyouhavea3-4.5Ådensitymapwithnohomologyinformationavailable,andwanttobuilda

model,readSection4.• Ifyouhavea3-4.5Ådensitymapandaveryincompletemodelyouwanttocomplete,readSection5.

Othernotes.

Foralltheapplicationsinthistutorial,itisrecommendedthatyoudownloadthelatestweeklyreleaseofRosetta.

Also,forbrevity,someofthecommandlinesandXMLscriptshavebeentrimmedinthisdocument.ThetutorialfilescontainthefullcommandlinesandXMLscripts;ifsomethingisomittedinthisdocument,itshouldnotbechangedfromthevalueinthetutorialfile.

1)Rosettaandelectrondensitybasics

ThissectionprovidesabriefintroductiontousingRosetta,andanoverviewofusingdensitydatawithinRosetta.

OverviewofRosetta

TheRosettadocumentationisagoodsourceofadditionalinformationonseveralofthetoolsdescribedinthisdocument.Thisisavailableathttps://www.rosettacommons.org/docs/latest/Home.

ThetoolsdescribedinthisdocumentusetheRosettaScriptsframework,describedathttps://www.rosettacommons.org/docs/latest/scripting_documentation/RosettaScripts/RosettaScripts.Briefly,thisallowsprotocolstobedefinedasaseriesofatomic"Movers"whichmanipulateastructure.Theformatisasfollows:

<ROSETTASCRIPTS> <SCOREFXNS> </SCOREFXNS> <MOVERS> </MOVERS> <PROTOCOLS> </PROTOCOLS> <OUTPUT /> </ROSETTASCRIPTS>

Eachblockcontainsinformationinrunningtheprotocol:<SCOREFXNS>and<MOVERS>areusedtodeclarescorefunctionsandmovers;while<PROTOCOLS>iswherethestepsoftheprotocolareenumerated.

Rosettatoolsarerunviathecommandline,withflagscontrollinggeneralprogrambehavior.Manyoftheflagsspecificallyfordensityrefinementareoutlinedinthesectionsfollowing.

DensityscoringinRosetta

AgreementtodensityisimplementedinRosettaasanadditionalenergyterm.Rosettaassessesagreementtodensitybycomputingthedensitythatonewouldexpecttosee,givenamodel,andmeasuringtheagreementoftheexpectedandexperimentaldensity.

elec_dens_fastThisscoretermisrecommendedfornearlyallusesofdensityrefinementinRosetta.Itusesinterpolationonaprecomputedgridofper-atomscorestoapproximatethedensitycorrelations.Thisversionissignificantlyfaster(~10x)thentheexactscoringtermbelow,andisveryhighlycorrelated.

TheseenergytermsmaybeprovidedtoRosettaintwoways.First,itmaybeprovidedinaRosettaScriptXMLfileasinput:

<Reweight scoretype="elec_dens_fast" weight="35.0"/>

Fornon-Rosettascriptapplications,thefollowingflagcontrolsthedensityscoringfunctionweight:

-edensity:fast_dens_wt 35.0 Therecommendedweightsforeachofthesetermsvarydependingonthedensitymapresolution,starting

modelquality,andprotocol.Section2describeshowtheweightsmaybetuned.However,thefollowingaregoodrulesofthumbforsettingthedensityweightwithinRosetta:Atresolutionsbetterthan2.5Å:anelec_dens_fastweightof65.0isgenerallyreasonable.Atresolutionsbetween2.5Åand3.5Å:anelec_dens_fastweightof50.0isgenerallyreasonable.Atresolutionsworsethan3.5Å:anelec_dens_fastweightof35.0isgenerallyreasonable.Incentroidmode(describedinpart4):anelec_dens_fastweightof10.0isgenerallyreasonable

Atverylowresolutions(worsethan6Å),theweightmayneedtobefurtherreduced.Ingeneral,iftheRosettaenergiesarepositive(orsignificantoutliersareflaggedbyMolprobityorothervalidationprograms)thentheweightsneedtobereduced.

Inadditiontothescoretermsabove,therearealsoseveralflagsthatcontrolmapscoringbehavior.MapsarereadintoRosettausingeithertheflag:

-edensity::mapfile mapfile.mrc

OrfromXML:

<LoadDensityMap name="loaddens" mapfile="mapfile.mrc"/>

MapsmaybeineitherCCP4orMRCformat(themaptypeisautomaticallydetectedfromtheheaderinfo).

Theresolutionofthemap,usedwhencomparingcalculatedtoexperimentaldensity,isspecifiedwiththeflag:

-edensity::mapreso 5.0

Mapsmayalsoberesampledtoreducememoryusageandruntime.Thisisdonethroughtheflag:

-edensity::grid_spacing 2.0

Noticethatthisflagshouldneverbemorethanhalfthegivenresolution,andifusingthefastscoringfunctionnevermorethanathirdoftheresolution.Forbothparameters,thedefaultisgenerallyfine(don’tresample,andassumetheresolutionis~3xthegridsampling).

Finally,onemaychoosetocalculatedensityusingeithercryoEMorX-rayscatteringfactors.Atlowresolution,thisprobablymakeslittledifference,butmightatresolutionsbetterthanabout3.5Å.ThedefaultistouseX-rayscatteringfactors;toturnoncryoEMscatteringfactorsinstead,usethefollowingflag:

-edensity::cryoem_scatterers

Example1A:ScoringaPDBinRosettawithdensity

Mostsimply,onemaywishtosimplyscoreamodelusingRosetta’senergyfunctionincludingthedensityterms.Thisiseasilyaccomplishedusingthescore_jd2application.Asamplecommandlinetorescorethestructureindensityisgivenin1_rosetta_basics/A_run_rescore.sh.ItillustratestheuseofvariousdensityflagstoprovideRosettawithexperimentaldensityinformation.

$ROSETTA3/source/bin/score_jd2.macosclangrelease \ -database $ROSETTA3/database/ \ -in::file::s 1isrA.pdb 1issA.pdb \ -ignore_unrecognized_res \ -edensity::mapfile 1issA_6A.mrc \ -edensity::mapreso 5.0 \ -edensity::grid_spacing 2.0 \ -edensity::fastdens_wt 35.0 \ -edensity::cryoem_scatterers \ -crystal_refine

Someflagsofnoteareboldfacedabove.First,theinputstructureisprovidedwiththecommand-in::file::s.ThisiscommontomanyRosettaapplications,andmorethanoneinputmaybeprovided;eachwillbeprocessedindependently.Theflagsbeginningwith–edensity::tellRosettaaboutthedensitymapintowhichitisbeingfit.Thenameofthemapfile(inCCP4orMRCformat),theresolutionofthemap,thegridsamplingofthemap(whichshouldneverbemorethanhalftheresolution),andtheweightsonthevariousfit-to-densityscoringfunctions.Thesesameflagsarereusedformanydifferentprotocolsinadditiontorelax.Finally,theflag-crystal_refinetheflagturnsonseveraldensity-relatedoptionsrelatedtoPDBreadingandwriting,andshouldalwaysbeusedwhenrefiningagainstdensitydata.

Note:TheinputPDBmustbealignedtothedensitymapusingsomeexternaltool.Rosettawilloptionallyrigid-bodyminimizethestructureintodensitybeforerescoringbyprovidingtheflag–edensity::realignmintotheapplication.Ifthisisdone,theflag–out::pdbwillwritetheminimizedPDBfiletoaPDBfile.

Thiscommandlineoutputsascorefile,score.sc,thatgives,foreachstructurespecifiedwith-in::file::s,thescorewithrespecttoeachterminRosetta’senergyfunction.ThemeaningofindividualscoretermsaswellasanoverviewoftheRosettaenergyfunctioncanbefoundinthepaper:

AlfordRF,Leaver-FayA,JeliazkovJR,O'MearaMJ,DiMaioFP,ParkH,ShapovalovMV,RenfrewPD,MulliganVK,KappelK,LabonteJW,PacellaMS,BonneauR,BradleyP,DunbrackRLJr,DasR,BakerD,KuhlmanB,KortemmeT,GrayJJ.TheRosettaAll-AtomEnergyFunctionforMacromolecularModelingandDesign.JChemTheoryComput.2017Jun13;13(6):3031-3048.

Example1B:SimplerefinementintodensityusingRosettaScriptsandrelax

InthissectionweintroduceRosettaScriptsbywayofaverysimplerefinement-into-densityexample.RosettaScriptsprovidesanXMLscriptinginterfacetoRosettathatallowsfine-grainedcontrolofprotocols.ThesyntaxisfullydescribedintheRosettadocumentation;however,averybriefintroductionisprovidedhere.ThebasicsyntaxfortheXMLisillustratedhere(1_rosetta_basics/B_relax_density.xml)

<ROSETTASCRIPTS> <SCOREFXNS> <ScoreFunction name="dens" weights="beta_cart"> <Reweight scoretype="elec_dens_fast" weight="35.0"/> <Set scale_sc_dens_byres="R:0.76,K:0.76,E:0.76,D:0.76,M:0.76, C:0.81,Q:0.81,H:0.81,N:0.81,T:0.81,S:0.81,Y:0.88,W:0.88, A:0.88,F:0.88,P:0.88,I:0.88,L:0.88,V:0.88"/> </ScoreFunction> </SCOREFXNS> <MOVERS> <SetupForDensityScoring name="setupdens"/> <LoadDensityMap name="loaddens" mapfile="1issA_6A.mrc"/> <FastRelax name="relaxcart" scorefxn="dens" repeats="2" cartesian="1"/> </MOVERS> <PROTOCOLS>

<Add mover="setupdens"/> <Add mover="loaddens"/> <Add mover="relaxcart"/> </PROTOCOLS> <OUTPUT scorefxn="dens"/> </ROSETTASCRIPTS>

Therearethree"blocks"ofdeclarationsinthisscript.Inthefirst,<SCOREFXNS>…</SCOREFXNS>,thescorefunctionstobeusedthroughouttheprotocolaredeclared;thesecond,<MOVERS>…</MOVERS>,movers–oratomicoperationsthatmodifyastructure–aredeclared;finally,thethird,<PROTOCOLS>…</PROTOCOLS>,afullprotocolisdeclaredasasequenceofmovers.

Inthisparticularexample,wedeclareasinglescorefunction,dens,whichusesthescorefunctionbeta_cart(adefaultscorefunction,don’tneedtoworryaboutit),andturnsonelec_dens_fast,thefit-to-densityscore,withaweightof35.Wethendeclarethreemovers,SetupForDensityScoring,LoadDensityMap,andFastRelax,whichsetsuptheloadedstructurefordensityscoring,loadsamapintomemory,andthenrefinesthestructureusingtheFastRelaxprotocol.Thedeclaredscorefunction,dens,isusedasaninputtotheFastRelaxmover.

Torunthisscript,weusethefollowingcommandline(1_rosetta_basics/B_relax_density.sh):

$ROSETTA3/source/bin/rosetta_scripts.macosclangrelease \ -database $ROSETTA3/database/ \ -in::file::s 1isrA.pdb \ -parser::protocol ex_B1_run_RS_relax_density.xml \ -ignore_unrecognized_res \ -edensity::mapreso 5.0 \ -edensity::cryoem_scatterers \ -crystal_refine \ -out::suffix _relax \ -beta

Note:Wedonothavetospecifythedensityweightorthemapfileonthecommandline,sincetheyarehandledwithintheXMLfile.However,otherdensityoptionsmustbespecifiedonthecommandline.WhenusingRosettaScripts,thedensityweightsmustbespecifiedintheXML,theinputmapmaybespecifiedeitherway.

Finally,inthepreviousXMLfile,thetagcartesian=1appears,whichrefinesthestructureinCartesianspace.Rosettaalsoallowsrefinementintorsionalspace,whichmaybebetterforcapturingdomainmotion,andforfurtherreductioninmodelparametersagainstlow-resolutiondata.Toenabletorsionalrefinement(1_rosetta_basics/C_relax_tors_density.xml),wemakethreesmallchangestotheXML:

<ROSETTASCRIPTS> <SCOREFXNS> <ScoreFunction name="dens" weights="beta"> <Reweight scoretype="elec_dens_fast" weight="35.0"/> <Set scale_sc_dens_byres="R:0.76,K:0.76,E:0.76,D:0.76,M:0.76, C:0.81,Q:0.81,H:0.81,N:0.81,T:0.81,S:0.81,Y:0.88,W:0.88, A:0.88,F:0.88,P:0.88,I:0.88,L:0.88,V:0.88"/> </ScoreFunction> </SCOREFXNS> <MOVERS> <SetupForDensityScoring name="setupdens"/> <LoadDensityMap name="loaddens" mapfile="1issA_6A.mrc"/> <FastRelax name="relaxcart" scorefxn="dens" repeats="5" cartesian="0"/> </MOVERS> <PROTOCOLS> <Add mover="setupdens"/> <Add mover="loaddens"/>

<Add mover="relaxcart"/> </PROTOCOLS> <OUTPUT scorefxn="dens"/> </ROSETTASCRIPTS>

2)Modelrefinementviaiterativelocalrebuilding

Inthissection,weintroduceourcryoEMrefinementprotocol,whichusesaniterativelocalrebuildingproceduretoescapelocalminimaduringrefinement.Thesectionisdividedintotwoparts,inthefirst,weintroducethemethodfornon-symmetricsystems;inthesecond,wedescribehowtousethismethodforsymmetricsystems.

Asarunningexample,werefinemodelsofthetransmembraneregionoftheTRPV1ionchannel,usinga3.4ÅcryoEMsingleparticlereconstruction(M.Liao,E.Cao,D.Julius,Y.Cheng,Nature,2013),andthedepositedmodel(id:3j5p)asastartingmodel.Wewillfirstrefinethisasymmetrically,andthenintroducesymmetricrefinement.

Example2A:AsymmetricrefinementintocryoEMdensity

AsummaryoftheXMLusedforrefinement(2_cryoem_refinement/A_asymm_refine.xml)isshownbelow.Following,abriefdescriptionofthemoversandoptionsavailableisprovided.

<ROSETTASCRIPTS> <SCOREFXNS> <ScoreFunction name="cen" weights="score4_smooth_cart"> <Reweight scoretype="elec_dens_fast" weight="20"/> </ScoreFunction> <ScoreFunction name="dens_soft" weights="beta_soft"> <Reweight scoretype="cart_bonded" weight="0.5"/> <Reweight scoretype="pro_close" weight="0.0"/> <Reweight scoretype="elec_dens_fast" weight="%%denswt%%"/> </ScoreFunction> <ScoreFunction name="dens" weights="beta_cart"> <Reweight scoretype="elec_dens_fast" weight="%%denswt%%"/> <Set scale_sc_dens_byres="R:0.76,K:0.76,E:0.76,D:0.76,M:0.76, C:0.81,Q:0.81,H:0.81,N:0.81,T:0.81,S:0.81,Y:0.88,W:0.88, A:0.88,F:0.88,P:0.88,I:0.88,L:0.88,V:0.88"/> </ScoreFunction> </SCOREFXNS> <MOVERS> <SetupForDensityScoring name="setupdens"/> <LoadDensityMap name="loaddens" mapfile="%%map%%"/> <SwitchResidueTypeSetMover name="tocen" set="centroid"/> <MinMover name="cenmin" scorefxn="cen" type="lbfgs_armijo_nonmonotone" max_iter="200" tolerance="0.00001" bb="1" chi="1" jump="ALL"/> <CartesianSampler name="cen5_50" automode_scorecut="-0.5" scorefxn="cen" mcscorefxn="cen" fascorefxn="dens_soft" strategy="auto" fragbias="density" rms="%%rms%%" ncycles="200" fullatom="0" bbmove="1" nminsteps="25" temp="4" fraglens="7" nfrags="25"/> <CartesianSampler name="cen5_60" automode_scorecut="-0.3" scorefxn="cen" mcscorefxn="cen" fascorefxn="dens_soft" strategy="auto" fragbias="density" rms="%%rms%%" ncycles="200" fullatom="0" bbmove="1" nminsteps="25" temp="4" fraglens="7" nfrags="25"/> <CartesianSampler name="cen5_70" automode_scorecut="-0.1" scorefxn="cen" mcscorefxn="cen" fascorefxn="dens_soft" strategy="auto" fragbias="density" rms="%%rms%%" ncycles="200" fullatom="0" bbmove="1" nminsteps="25" temp="4" fraglens="7" nfrags="25"/> <CartesianSampler name="cen5_80" automode_scorecut="0.0" scorefxn="cen" mcscorefxn="cen" fascorefxn="dens_soft" strategy="auto" fragbias="density" rms="%%rms%%" ncycles="200" fullatom="0" bbmove="1" nminsteps="25" temp="4" fraglens="7" nfrags="25"/> <ReportFSC name="report" testmap="%%testmap%%" res_low="10.0" res_high="%%reso%%"/>

<BfactorFitting name="fit_bs" max_iter="50" wt_adp="0.0005" init="1" exact="1"/> <FastRelax name="relaxcart" scorefxn="dens" repeats="1" cartesian="1"/> </MOVERS> <PROTOCOLS> <Add mover="setupdens"/> <Add mover="loaddens"/> <Add mover="tocen"/> <Add mover="cenmin"/> <Add mover="relaxcart"/> <Add mover="cen5_50"/> <Add mover="relaxcart"/> <Add mover="cen5_60"/> <Add mover="relaxcart"/> <Add mover="cen5_70"/> <Add mover="relaxcart"/> <Add mover="cen5_80"/> <Add mover="relaxcart"/> <Add mover="relaxcart"/> <Add mover="report"/> </PROTOCOLS> <OUTPUT scorefxn="dens"/> </ROSETTASCRIPTS>

Theprotocolisabitinvolved,butisdescribedinthefollowing.Thefirstthingtonoteistheuseofmacroslike"%%denswt%%".Thesearecommandargumentsthatmaybespecifiedfromthecommandlinethroughtheflag–parser::script_varsdenswt=25.0.Theprotocolaboveusesthesemacrosinplaceofparametersthatuserswouldmostliketochange;otherparametersshouldbeleftasisexceptforadvancedusers.

ThemainadditionistheCartesianSamplermover.Thismoveriterativelylocallyrebuildsthestructureinuser-specifiedorautomaticallydeterminedregions.Abriefdescriptionoftheargumentstothismover:

name="cen5_70":thenameofthemover,referredtointhe<PROTOCOLS>block(thiscanbeanything)strategy="auto":thestrategytouseinselectingwhattorebuild.Oneof:

• auto:selectregionsautomaticallybasedondensityfit&localstrain(usingthecutoffinautomode_scorecut,e.g.,automode_scorecut="-0.1")

• user:manuallyspecifyresidues(usingtheflagresidues,e.g.,residues="32A-48A")• rama:selectregionsautomaticallybasedonramascoreonly

rms="1.5":thecutoffonsimilaritywhenlocallyrebuilding.Increasingthisvaluewillincreasemodel

diversity(allowingworsestartingmodelstoberefinedbutrequiringadditionalsampling)ncycles="200":thenumberofrebuildingcyclestoconsider.Increasingthiswillincreaseruntimeand

slightlyincreasemodeldiversity.fraglens="7":thesegmentlengthtoreplace.Thismustbeanoddnumberfrom5-13,andincreasingthis

valuewillincreasemodeldiversitysignificantly.

Theremainingoptionsshouldneverbechanged.

Anotheroptionispassedtothedensityscoringviathe<Setscale_sc_dens_byres=.../>tag.Intherefinementprotocol,thissetsaper-amino-acidsidechainreweighing.Theweightsshowninthisexampleweredeterminedbyfittingtheseparametersintorefinedstructuresintoseveral3-5ÅcryoEMdensitymaps;theendresultisaslightdownweighingofsidechaindensity,particularlyforchargedsidechains.Thisshouldnotbechangedexceptbyadvancedusers.

TheMinMoverfirstminimizesthestructureusingalow-resolutionenergyfunction(cen).Wehavefoundthisstepismostusefulforimprovingproteinbackbonegeometry,particularlywithhand-tracedmodels.Thislow-resolutionscorefunctionusesthecentroidrepresentation,whichisenabledbytheSwitchResidueTypeSetmover.

TheFitBFactorsmoverfitsreal-spaceatomicBfactorstomaximizemodel-mapcorrelation.AconstraintenforcingnearbyatomstotakethesameBfactorsisalsoemployed,andtheweightonthistermiscontrolledwiththewt_adpterm(0.0005isgenerallywell-behaved).Finally,init=1meanstodoaquickscanofoverallBfactorsbeforebeginningrefinement;ifthereismorethanonecalltothismoverinasingletrajectory,thenonlythefirstneedstohaveinit=1.Exact=1shouldalwaysbeused.

Finally,theReportFSCmoverassessesmodelagreementtothemapusedforfittingaswellasanindependentmapusingtheintegratedFSCoverhigh-resolutionshells.Wehavefoundintegratingfrom10Åtotheresolutionofthedataisbestformodeldiscrimination.

Finally,thiscommandisexecutedusingthefollowing(scenario2_cryoem_refinement/A_asymm_refine.sh):

$ROSETTA3/source/bin/rosetta_scripts.macosclangrelease \ -database $ROSETTA3/database/ \ -in::file::s 3j5p_transmem_A.pdb \ -parser::protocol A_asymm_refine.xml \ -parser::script_vars denswt=35 rms=1.5 reso=3.4 map=half1_34A.mrc testmap=half2_34A.mrc \ -ignore_unrecognized_res \ -edensity::mapreso 3.4 \ -default_max_cycles 200 \ -edensity::cryoem_scatterers \ -beta \ -out::suffix _asymm \ -crystal_refine

Inboldaretheparametersthatshouldbechangedinadaptingthisrunforothersystems.Thefirstistheinputstructure,whichshouldbespecifiedwiththeargument–in::file::s.Thesecondaretheparameterstobepassedthroughtothescript(usingthemacroreplacementmachineryofRosettaScripts).Threeofthesedescribetheinputmaps:

map=half1_34A.mrc–themaptorefineagainsttestmap=half2_34A.mrc–anindependentmapforvalidation

(ifnotusingsplitmaps,justprovidethesamemapasthepreviousargument)reso=3.4–theresolutionofthedata

(notethatthisneedstobeprovidedtwiceinthecommandline,onceforscoringandonceforreporting)

Theothertwoareparameterstothealgorithm:denswt=35–theweightontheexperimentaldensitydatarms=1.5–theamountofdeviationtoallowinfragmentinsertionmoves

(largervalueswillleadtomoremodeldeviation)

Thedensityweightof25worksreasonablywellasastartingpoint,butonemightwanttoexploreseveraldifferentvaluesusinganindependentreconstruction.Manualinspectionofoutputmodelsformolprobityscore,freeFSC,and(freeFSC–workFSC)shouldprovidecluesastowhichweightworksbest.

Adescriptionofmuchoftheworkinthissectionisdescribedinthereference:

WangRY,SongY,BaradBA,ChengY,FraserJS,DiMaioF.Automatedstructurerefinementofmacromolecularassembliesfromcryo-EMmapsusingRosetta.Elife.2016Sep26;5.pii:e17219.

JobdistributionItisgenerallyusefultosample~100modelsfromeachstartingpoint.Forthispurpose,itmaybeusefultorunmultiplejobsinparallel.Topreventoutputstructuresfromclobberingoneanother,theflag–out::suffixmaybeuseful,whereeachseparatejobisgivenadifferentsuffix.

Forexample,ona16-coremachine,wemayspecify-out::suffix_$1,then(usingGNUparallel)runthefollowing:

parallel –j16 ./A_asymm_refine.sh {} ::: {1..16}

Finally, GNU parallel allows launching of jobs remotely if SSH keys have been set up for passwordless login. To run:

parallel –S 16/node1,16/node2,16/node3,16/node4 –-workdir . ./B1_rosettaCM_singletarget.sh {} ::: {1..48}

Thiswilllaunchinstead48jobsspreadacrossfourmachines.SeetheGNUparalleldocumentation(https://www.gnu.org/software/parallel/)formoreinformation.

AnalyzingresultsandmodelselectionWhilethisisanactivetopicofresearch,generally–onceadensityweighthasbeenchosen–toselectthebestmodelsfromamongthefullset,wewanttoselectmodelsoptimizingbothmodelgeometryandfreeFSCvalues.Modelgeometrymaybeevaluatedusingeither:a)Molprobity,orb)Rosettaenergiesaftersubtractingdensityenergies.Thelattermaybedonebyinspectingthescore*.scfilesproducedasoutput.

Usingtheabovescript,thefreeFSCvaluemaybedeterminedfromtheoutputPDBfiles.Theheadercontainsalinelike:

REMARK 1 FSC[mask=4.45657](10:3) = 0.590966 / 0.591017

Thetwonumbersattheendofthelineindicatethe"work"and"free"integratedFSCvalues.

Generally,weselectthebest20%ofmodelsbygeometry,andselectingthebestoverallbyfreeFSC.Thetop5modelsshouldbeinspectedformodelconvergenceaswellasvisuallyinspectedfordensitymapagreement.

Example2B:SymmetricrefinementintocryoEMdensity

Asthisisasymmetricsystem,tocorrectlyevaluatetheenergeticsofthesystem,weneedtomodelwithsymmetry-relatedcopiespresent.Thismaybedonethroughatwo-stepprocess:first,werunthemake_symmdef_file.plscripttoprepareadescriptionofthesymmetryofthesysteminaRosetta-readableformat.Next,weenablesymmetricscoringandoptimizationwithintheXMLfile.

TheinformationthatRosettaneedstoknowaboutasymmetricsystemisencodedinthesymmetrydefinitionfile.IttellsRosetta:(a)howtoscoreastructuresymmetricallyfromonlyasymmetricunitinteractions,and(b)howtherigid-bodydegreesoffreedomareallowedtomovetomaintainthesymmetryofthesystem.

Toaidincreatingasymmetrydefinitionfilefromasymmetric(ornear-symmetric)PDB,anapplication,

make_symmdef_file.pl,hasbeenincludedinsrc/apps/public/symmetry.TogeneratethesymmetrydefinitionfileforTRPV1,werunthecommandin2_cryoem_refinement/B1_make_symmdef.sh.

$ROSETTA3/source/src/apps/public/symmetry/make_symmdef_file.pl \ -m NCS -a A -i B \ -p 3j5p_transmem.pdb –r 1000 > TRPV1.symm

Thisscriptneedsafewpiecesofinformation:with–m,thetypeofsymmetrytogenerate(hereNCS),with–a,theprimarychain(hereA),andwith–i,anadjacentchainineachsymmetrygroup,separatedbyspaces(herejustB).ForCnsymmetries,onlyoneadjacentchainisgiven;forDn,twoaregiven.Finally,with–r,wegivethecontactdistancebetweenaneighborchainandtheprimarychainnecessarytoincludethatsubunitexplicitly(here,1000,toensureeverysymmetricallyrelatedcopyisincluded).Iftheinputsystemisasymmetric,thescriptwillmakeasymmetricalversionofit(sometimessignificantlyperturbingitintheprocess).Therearealotofotheroptions,includingforcingsymmetricalorderandhelicalandhigher-ordersymmetries,seethedocumentation!

InadditiontothedefinitionfilewrittentoSTDOUT,thescriptwillalsowriteafile3j5p_transmem_symm.pdb,containingthesymmetrizedversionoftheinputfile,andafile3j5p_transmem_INPUT.pdb,thatcontainsonlythemainchain,tobeusedasinput(inadditiontothesymmetrydefinitionfile).

Thesymmetrydefinitionfilelookssomethinglikethis:

symmetry_name TRPV1__4 E = 4*VRT0_base + 4*(VRT0_base:VRT3_base) + 2*(VRT0_base:VRT2_base) anchor_residue CoM ... set_dof JUMP0_to_com x(11.7023996817515) set_dof JUMP0_to_subunit angle_x angle_y angle_z ...

Theomittedsectionsdescribeasystemofvirtualresiduesthatmaintainthesymmetryofthesystem,andtheygenerallyshouldremainunedited.

Thetwoset_doflinesshouldbeedited.Therearetwopossibilitieswhenrefiningsymmetricstructuresintodensity:

A)wedon’twanttorefinetherigidbodyorientationoftheentiresystem;weknowthesymmetryaxesandwedon’twantthemtomove

B)wedowanttorefinetheorientationoftheentiresystem,includingsymmetryaxes

Generally,incryoEM,wherethemapsaresymmetricallyaveraged,andthesymmetryisknown,wewanttousestrategyA.However,insomecases(forexample,ifourstartingmodelwasnotperfectlysymmetric)wewanttousestrategyB.Inbothcases,aminoredittotheset_doflinesinthesymmdeffileisnecessary.

ForstrategyA,becauseweareusingdensity,weneedtochangethefirstset_doflinetothefollowing:

set_dof JUMP0_to_com x y z

ForstrategyB,weleavethetwolinesunchangedandinsteadaddathirdline:

set_dof JUMP0 x y z angle_x angle_y angle_z

ForDnsymmetries,thechangesaresimilar,exceptinAthejumpnameisJUMP0_0_to_com.TherestofthissectionusesstrategyA;theeditedsymmetrydefinitionfileisinscenario1_cryoem_refinement/C4_edit.symm

Onceasymmetrydefinitionfilehasbeengenerated,thenrefiningstructuresinRosettasymmetricallyisstraightforward.ThechangestothepreviousXMLfileareindicatedbelow(see2_cryoem_refinement/B2_symm_refine.xml):

... <ScoreFunction name="cen" weights="score4_smooth_cart" symmetric="1"> <ScoreFunction name="dens_soft" weights="eta_soft" symmetric="1"> <ScoreFunction name="dens" weights="talaris2013_cart" symmetric="1"> ... <SetupForSymmetry name="setupsymm" definition="%%symmdef%%"/> ... <SymMinMover name="cenmin" scorefxn="cen" type="lbfgs_armijo_nonmonotone" max_iter="200" tolerance="0.00001" bb="1" chi="1" jump="ALL"/>

Inallthreedeclaredscorefunctions,symmetric=1mustbegiven.Additionally,theSetupForDensityScoringmovermustbereplacedwiththeSetupForSymmetrymover.Finally,theMinMovermustbereplacedwithitssymmetriccounterpart,SymMinMover.

Thecommandforrunningthisscriptislargelythesameasbefore,withafewadditions:

$ROSETTA3/source/bin/rosetta_scripts.macosclangrelease \ -database $ROSETTA3/database/ \ -in::file::s 3j5p_transmem_A.pdb \ -parser::protocol A_asymm_refine.xml \ -parser::script_vars \ denswt=25 rms=1.5 reso=3.4 map=half1_34A.mrc \ testmap=half2_34A.mrc symmdef=TRPV1_edit.symm \ -ignore_unrecognized_res \ -score_symm_complex false \ -edensity::mapreso 3.4 \ -default_max_cycles 200 \ -edensity::cryoem_scatterers \ -beta \ -out::suffix _symm \ -crystal_refine Thecommandsymmdef=TRPV1_edit.symmpassesthesymmetrydefinitionfiletoRosetta.Theflag-score_symm_complexfalsedependsonwhetherstrategy(a)or(b)wasemployedabove.If(a),thenfalseshouldbeused;if(b),thentrueshouldbeused.Note:TheinputPDB(-in::file::s)isofthemonomer(thatis,theasymmetricunit).

3)Modelrebuildingguidedbyexperimentaldensitydata

Inthisscenario,weintroduceatool,RosettaCM,forbuildingmissingportionsofamodelguidedbydensitydata.Whileprimarilygearedtowardscomparativemodeling,itmayalsobeusefulforbuildingportionsofaproteinthataredisorderedwhencrystallizedordifficultregionsinhand-builtmodels.Inthisscenario,weintroducethebasicrebuildingprotocol,thenshowhowthetoolmayalsobeusedto:

• Combinepiecesfrommultipletemplatemodelsguidedbydensity• Rebuildwithuser-definedrestraints• Iterativelyrebuildmodelsindifficultcasesdifficultcases

Asarunningexample,weusethe20Sproteasome(XuemingLietal.,NatureMethods,2013),whereonlyasubsetofparticleswereused,resultingina4.1Åreconstruction.Wearebuildingmodelsstartingfromahomologousstructure(pdbid:1iru)asthestartingmodel(25%/32%sequenceidentitytochainsA/B).

Example3A:PreparingtemplatesforuseinRosettaCM

Inmanycases,muchofthesetupworkishandledbyascript,setup_RosettaCM.pyinRosettaTools(aseparaterepositoryavailablefromrosettacommons.org).Thisscripttakesaninputalignmentinavarietyofformats,andpreparestheinputsautomatically.Itisexecutedbyrunningthecommand:

setup_RosettaCM.py \ -–fasta t20s.fasta \ --alignment tmpl.fasta \ --alignment_format fasta \ --templates tmpl.pdb \ --rosetta_bin ~/Rosetta/main/source/bin \ --verbose

Inputsincludethefull-lengthfasta,analignmentfile–ineitherfasta,ClustalW,orHHSearchformat–andthecorrespondingtemplatePDBfiles.ThisscriptwillprepareallthenecessaryinputsinordertorunRosettaCM.

Alternately,thesetupmaybeperformedmanually.Inthiscase,sinceweareusingsomenonstandardfeatures(symmetryanddensity),andwehavetwochainsintheasymmetricunitwewilldothis;alternately,theinputsfromthepreviousstepmaybeusedasastartingpointandsubsequentlymodified.Inthiscase,wefirstconvertouralignmenttoRosettaformat(scenario2_model_rebuilding/20S_1iru.ali):

## 1XXX_ 1iruAH_thread # hhsearch scores_from_program: 0 1.00 0 TVFSPDGRLFQVEYAREAVKK-GSTALGMKFANGVLLISDKKVRSRLIEQNSIEKIQLIDDYVAAVTSGLVADAR... 0 TIFSPEGRLYQVEYAFKAINQGGLTSVAVRGKDCAVIVTQKKVPDKLLDSSTVTHLFKITENIGCVMTGMTADSR... --

Inthisformat,thefirstlineis'##'followedbyacodeforthetargetandoneforthetemplate.Thesecondlineidentifiesthesourceofthealignment;thethirdjustkeepasitis.Thefourthlineisthetargetsequenceandthefifthisthetemplate;thenumberisan'offset',identifyingwherethesequencestarts.However,thenumberdoesn'tusethePDBresidbutjustcountsresiduesstartingat0.Thesixthlineis'--'.Multiplealignmentsmaybeconcatenatedinasinglefile,withthetemplatecodeidentifyingthetemplatecorrespondingtoeachalignment.

RosettaCMtakesasinputspartiallythreadedmodels,thatismodelswherealignedpositionshavetheir

residueidentitiesremapped,andunalignedresiduesarenotpresent.Togeneratethesemodelsfromanalignmentfileandtemplate,wecanruntheRosettacommand(3_model_rebuilding/A_partialthread.sh):

$ROSETTA3/source/bin/ partial_thread.macosclangrelease \ -database $ROSETTA3/database/ \ -in::file::fasta t20s.fasta \ -in::file::alignment 20S_1iru.ali \ -in::file::template_pdb 1iruAH_aln.pdb

Thiswilloutputapartiallythreadedmodelin1iruA_thread.pdbthatiscorrectlynumberedforinputintoRosettaCM.

Next,weneedtosetupsymmetricmodelingwithRosettaCM.AsinScenario1,weusethemake_symmdef_file.plscriptinordertogenerateasymmetrydefinitionfileforuseinRosetta.AstraightforwardwaytodosoistouseChimeratodockthenecessarychainsintodensity.Weneedasingle"primarychain"andacoupleofanadjacentchainineachpointgroup;sincetheproteasomefeaturesD7symmetry,thatmeansweneedanadjacentchaininthe7-foldcomplex,aswellasachainintheoppositering.Anexamplehasbeencreatedinscenario2_model_rebuilding/setup_symm.pdbwherethreecopiesofthethreadedmodelhavebeendockedintodensitywithChimera.TogenerateourD7symmetryfilefromthisinput,wethensimplyhavetorunthecommand(3_model_rebuilding/B_make_symmdef.sh):

~/rosetta_source/src/apps/public/symmetry/make_symmdef_file.pl \ -m NCS -a A -i B C \ -p setup_symm.pdb –r 1000 > D7.symm

Sincewehavealreadycreatedtheinputtemplatesusingthepartial_threadapplication,wecanignorethesetup_symm_INPUT.pdbfileandusetheoutputofpartialthreadastheinput.However,westillneedtoalignallthethreadedmodelstothisinputstructure.ThiscaneitherbedonewiththeprogramTMalign(externaltoRosetta)orbyusingChimeratodocktheindividualthreadedmodelsintodensity.Inthiscase,wherewehavejustonetemplate,ithasalreadybeenalignedtothetemplateinscenario2_model_rebuilding/tmpl_thread_aln.pdb.

AsinScenario1,weneedtomakeasmalledittothesymmetrydefinitionfilefordensityrefinement.Changethefollowinglines:

set_dof JUMP0_0_to_com x(35.3434689631743) set_dof JUMP0_0_to_subunit angle_x angle_y angle_z set_dof JUMP0_0 x(39.2905097135684) angle_x

To(3_model_rebuilding/D7_edit.symm):

set_dof JUMP0_0_to_com x y z set_dof JUMP0_0_to_subunit angle_x angle_y angle_z

Note:The20Sproteasomecaseweareusingcontainstwochainsintheasymmetricunit.TospecifythisasinputstoRosettaCM,weneedtolistthefastafile,separatingthechainsbytheslashcharacter‘/’.ThisisreallyonlynecessaryinthefastaprovidedasinputtoRosettaCM(nextstep)however,thereisnoharmisdoingthisineverystep.

Example3B:RunningRosettaCMusingasingletemplatemodelasinput.

LikethemethodsintroducedinScenario1,RosettaCMiscontrolledthroughanXMLscriptusingRosettaScripts.TheXMLisasfollows(3_model_rebuilding/C_rosettaCM_singletarget.xml):

<ROSETTASCRIPTS> <TASKOPERATIONS> </TASKOPERATIONS> <SCOREFXNS> <ScoreFunction name="stage1" weights="score3" symmetric="1"> <Reweight scoretype="atom_pair_constraint" weight="0.1"/> <Reweight scoretype="elec_dens_fast" weight="10"/> </ScoreFunction> <ScoreFunction name="stage2" weights="score4_smooth_cart" symmetric="1"> <Reweight scoretype="atom_pair_constraint" weight="0.1"/> <Reweight scoretype="elec_dens_fast" weight="10"/> </ScoreFunction> <ScoreFunction name="fullatom" weights="beta_cart" symmetric="1"> <Reweight scoretype="atom_pair_constraint" weight="0.1"/> <Reweight scoretype="elec_dens_fast" weight="35"/> <Set scale_sc_dens_byres="R:0.76,K:0.76,E:0.76,D:0.76,M:0.76, C:0.81,Q:0.81,H:0.81,N:0.81,T:0.81,S:0.81,Y:0.88,W:0.88, A:0.88,F:0.88,P:0.88,I:0.88,L:0.88,V:0.88"/> </ScoreFunction> </SCOREFXNS> <FILTERS> </FILTERS> <MOVERS> <Hybridize name="hybridize" stage1_scorefxn="stage1" stage2_scorefxn="stage2" fa_scorefxn="fullatom" batch="1" stage1_increase_cycles="1.0" stage2_increase_cycles="1.0" linmin_only="0" realign_domains="0"> <Template pdb="1iruA_thread.pdb" weight="1.0" cst_file="AUTO" symmdef="D7_edit.symm"/> </Hybridize> </MOVERS> <PROTOCOLS> <Add mover="hybridize"/> </PROTOCOLS> </ROSETTASCRIPTS>

Themainworkisdonethroughasinglemover,Hybridizewhichhandlesallstagesofmodel-building.InputstructuresarespecifiedviaTemplatelines(inthiscasethereisonlyone).Foreachtemplateline,wespecifythepdbinput,aswellasacoupleofotherparameters:aweight(therelativefrequencywesampleeachtemplatewith);aconstraintfile(settingthisto"auto"setsupautomaticconstraintstothetemplate,whilesettingthisto"none"turnsoffallconstraints,user-definedconstraintsaredescribedlater);andan(optional)symmetrydefinitionfile.

Note:Besurethatyourtemplatesarealignedtothedensity!

GiventhisXML,RosettaCMisthenrunwiththefollowingcommandline(3_model_rebuilding/B1_rosettaCM_singletarget.sh):

$ROSETTA3/source/bin/rosetta_scripts.macosclangrelease \ -database $ROSETTA3/database/ \ -in:file:fasta t20s.fasta \ -parser:protocol C_rosettaCM_singletarget.xml \ -nstruct 50 \ -relax:jump_move true \ -relax:dualspace \ -out::suffix _singletgt \ -edensity::mapfile t20S_41A_half1.mrc \ -edensity::mapreso 5.0 \ -edensity::cryoem_scatterers \ -beta \ -default_max_cycles 200

Theinputcommandissimilartothoseseenbefore,butwithafewkeydifferences.First,theinputtoRosetta

isspecifiedwith-in:file:fastaratherthan-in:file:s.Alsonotethattheinputargument–nstruct50isgiven,tellingRosettatogenerate50modelsforeachprocess.Generally,hundredstothousandsofmodelsarenecessarytosufficientlysampleconformationalspace;moreandlongerregionstorebuildrequiremoremodels.

Runningwithoutsymmetry

RunningwithoutsymmetryrequiresonlytwosmallchangestotheXMLfile:• Removethetag:symmdef="D7_edit.symm"• Removethethreetagssymmetric=1

JobdistributionAswithsection2,acombinationof-out::suffixandGNUparallelisusefulfordistributingjobs.Forexample,onemayreplacethe–out::suffixlineabovewith–out::suffix_$1,thenlaunchjobswith:

parallel –j16 ./B1_rosettaCM_singletarget.sh {} ::: {1..16}

Thetotalnumberofstructuresgeneratedisthenumberofstructuresspecifiedwith–nstructtimesthenumberofjobslaunched(inthisinstance,50structuretimes16jobs=800structures).Dependingontheruntimeperstructure(variabledependingonstructuresize)andthenumberofCPUsavailable,bothofthesenumbersmaybeadjusted.

Finally,GNUparallelallowslaunchingofjobsremotelyifSSHkeyshavebeensetupforpasswordlesslogin.Torun:

parallel –S 16/node1,16/node2,16/node3,16/node4 -–workdir . ./B1_rosettaCM_singletarget.sh {} ::: {1..48}

Thiswilllaunchinstead48jobsspreadacrossfourmachines.SeetheGNUparalleldocumentationformoreinformation.r

Example3C:RunningRosettaCMusingmultipletemplatemodelsasinput.

OneofthestrengthsofRosettaCMisitsabilitytomakeuseofmultipletemplatestructures,andtorecombineportionsofthesemodelsduringconformationalsampling.Thisisparticularlyusefulwhenmultiplehomologousstructuresareavailable,somewithclosersequenceidentity,andsomewithmorecompletecoverage.Theprotocolallowsthecombinationoffeaturesofbothmodels.

Tomakeuseofthisfeature,wesimplyaddadditionaltemplatelinesintheinputXML.Inthiscase,weaddthetemplate1ryp(scenario3_model_rebuilding/C1_rosettaCM_multitarget.xml):

... <Hybridize name="hybridize" stage1_scorefxn="stage1" stage2_scorefxn="stage2" fa_scorefxn="fullatom" batch="1" stage1_increase_cycles="1.0" stage2_increase_cycles="1.0" linmin_only="0" realign_domains="0"> <Template pdb="1iruA_thread.pdb" weight="1.0" cst_file="auto" symmdef="D7_edit.symm"/> <Template pdb="1rypA_thread.pdb" weight="1.0" cst_file="auto" symmdef="D7_edit.symm"/> </Hybridize> ...

Therestishandledautomaticallybytheprotocol.However,thereareafewcaveatswhenusingmultiple

inputstructures:

• Withdensity,weneedtoensurethatallinputmodelsarealignedtothedensity.ThiscanbedoneusingeitherTMalignorChimera’salignmenttools.

• Ineachtrajectory,astartingmodelischosenatrandom;theconstraintsandsymmetryfromthisselectedmodelarechosenatthestartofeachrun.Ifwewishtouseaportionofamodel,butdonotwanttouseitssymmetryorconstraints,wecanassignitaweightof0:backboneconformationsfromthismodelwillbeusedinconformationalsampling,butthesymmetryandconstraintswillneverbeused.

• Similarly,gapsintheselectedstartingmodelarerebuiltbeforerecombinationoccurs.Ifoneofthetemplateshaspoorcoverage,butprovidesvaluablestructuralfeatures,itshouldbeused,butwithweight0.

Example3D:RunningRosettaCMwithuserspecifiedconstraints.

AnotherstrengthofRosettaCMistheabilitytomakeuseofadditionalexperimentalinformationthatprovidesrestraintsoverconformationalspace.Whilepreviously,wehaveusedcst_file=autotoautomaticallygenerateconstraintsfromtemplatestructures,ifexperimentsprovidedistanceconstraints(orsomeotherpositionalrestraint,wemaymakeuseoftheminmodelrebuildingaswell.

TheRosettadocumentationprovidesagoodoverviewofthetypesofconstraintsthatmaybeused,withanumberofdifferentconstrainttypesandfunctionalformspossible.Forthisdemo,wewillassumewehaveknowledgeonthedistancebetweenresidues107and143thatwewanttouseduringrebuilding.

Thisinformationcanbeencodedinaconstraintfile(scenario3_model_rebuilding/D1_constraints.cst):

AtomPair CA 107 CA 143 HARMONIC 5.0 1.0

Note:Thenumberingofresiduesisbasedupontheorderintheinputfastafile(anddoesnotresetbetweenchains!).

Wethenreplacethecst_file=autolinesintheXMLwithourownconstraintfile(scenario3_model_rebuilding/D1_constraints.xml):

... <Template pdb="tmpl_thread_aln.pdb" weight="1.0" cst_file="D1_constraints.cst" symmdef="D7_edit.symm"/> ...

Wecanthenrebuildandrefineasbefore.

Example3E:ModelselectionandrunningRosettaCMiteratively

Withpossiblyhundredsofgeneratedmodels,thereareafewstrategiestoidentifythebest-sampledmodels.Generally,modelsshouldbefilteredontwodifferentcriteria–thetotalscoreandthedensityscore–insomeway.Weoftenselectthebest10-20%ofmodelsbasedontotalscore,andthesortthesemodelsbydensityscore,butvisualinspectionofthebestbybothcriteriacanoftenbebeneficialindifficultcases.

Finally,onestrategyforsolvingdifficultstructuresistoapplyRosettaCMiteratively.Usingtheabovecriteria,wecanselectthebest5-10modelsfromthefirstroundofrefinement,andfeedthemasinputsintothenextround.Modelscanbeselectedbyenergybylookingatthescorecolumnintheoutput.scfiles.

ThisisverybrieflyillustratedinthefollowingXML(scenario3_model_rebuilding/E1_rosettaCM_iter.xml):

... <Hybridize name="hybridize" stage1_scorefxn="stage1" stage2_scorefxn="stage2" fa_scorefxn="fullatom" batch="1"> <Template pdb="expected_outputs/S_multitgt_0001_A.pdb" weight="1.0" cst_file="NONE" symmdef="D7_edit.symm"/> <Template pdb="expected_outputs/S_multitgt_0002_A.pdb" weight="1.0" cst_file="NONE" symmdef="D7_edit.symm"/> <Template pdb="expected_outputs/S_multitgt_0003_A.pdb" weight="1.0" cst_file="NONE" symmdef="D7_edit.symm"/> </Hybridize> ...

Thereisalsosomemanipulationofinputmodelsthatcanprovebeneficial.Ifonewantstoforcerebuildingsomesegmentofbackbone,theycansimplydeletethoseresiduesinallinputmodels.Similarlyifonewantstoforcesomeregiontoadoptaconformationtakinginoneinputmodel,theycandeleteallotherconformationsfromallmodels.

Example3F:RunningwithLigandsand/orNucleicAcids

RosettaCMcanberunwithligandsaswellasnucleicacids.WhilenucleicacidsarereadbyRosettanatively,ligandsrequireanadditionalinputtoRosetta,aparamsfile.

Forligands,amol2fileoftheligandwhichcontainshydrogenatomsisrequired.TheprogramOpenBabel(http://openbabel.org/wiki/Main_Page)iscapableofbothconvertingfromPDBtomol2aswellasaddinghydrogenstoamoleculegivenaPDBfileoftheligand.

Givenamol2file,molfile_to_params.py,includedinRosetta,at$ROSETTA3/source/scripts/python/public/willgenerateaparamsfile.Touse:

python $ROSETTA3/source/scripts/python/apps/public/molfile_to_params.py \ --keep-names --clobber –centroid XXX.mol2 -p XXX -n XXX

Replace"XXX"withtheligandsthreelettercode.ThisscriptwillcreateanXXX.paramsfileaswellasseveralotherfiles.ThisfilecanbepassedintoRosettausingtheflag:-extra_res_fa XXX.fa.params -extra_res_cen XXX.cen.params

Ifthereismorethanoneuniqueligand,runthescriptoneachuniqueligand,andpassalistofparamsfilesusingeachoftheflags.WhenmodellingwithligandsornucleicacidsinRosettaCM,twoadditionalthingsareneeded:1.TheligandornucleicacidsmustbeaddedtotheENDofeachtemplatefilewithaweight>02.AnadditionalflagneedstogetaddedtoHybridize:<Hybridize name="hybridize" stage1_scorefxn="stage1" stage2_scorefxn="stage2" fa_scorefxn="fullatom" batch="1" add_hetatm="1">

4)Denovomodel-buildingguidedbyexperimentaldensitydata

Inthisscenario,weintroduceatool,denovo_density,aimedatautomaticallybuildingbackboneandplacingsequencein3-4.5ÅcryoEMdensitymaps.Thistoolisprimarilyintendedforcaseswhereamodelistobebuiltwithnoknownstructuralhomologues.Itisrelativelyexpensivecomputationally,andconsistsoffourbasicsteps:

• Searchforlocalbackbone"fragments"inthedensitymap• Scorethe"compatability"ofsetsofplacedfragments• MonteCarlosamplingforthe"maximallycompatable"fragmentset• Consensusassignmentfromthebest-scoringMonteCarlotrajectories

Theinputofthemethodisasegmenteddensitymap,andthesequencecontainedinthisregion.Theoutputofthemethodisa"partialmodel"that–ideally–places70-80%ofthesequenceintothedensitymap.ThispartialmodelcanthenbeusedasinputforRosettaCM(seesection3ofthetutorial).

Thismethodisunderconstantdevelopment.Currentlimitationsoftheapproach–hopefullytobeaddressedinfuturerevisions–include:

• Thecodeassumessegmenteddensitymaps.Itcannothandlesymmetry,andpoorlyhandlesunsegmenteddensitymaps.Resultsarebestwhenthemapissegmentstoonlycontainonecopyoftheresiduesgettingbuilt.

• Ithasonlybeentestedbuildingproteins~600residuesorless.Whileitshouldconceptuallyscaletolargerproteins,thisisuntested.Furthermore,withlargerproteins,thememoryusageofsteps2and3increasessignificantly,socaremustbetaken.

• Itcurrentlydoesnotidentifyandbuildligandsornucleicacids

Thissectionofthetutorialwalksthroughthesefourstepsoftheprotocol.Asarunningexample,weagainusethe20Sproteasome(XuemingLietal.,NatureMethods,2013),inthiscaseusinga4.8Åreconstructiondeterminedwithoutusingmotioncorrection.Wewillpretendthatknownhomologousstructuresarenotavailable,andinsteadwillbuildmodelsintodensitydenovo.

Inputfilepreparation:downloadfragmentfilesfromRobetta

Beforerunningthemethod,ausermustfirstcreatea"fragmentfile"thatpredictslocalbackboneconformationsgiventheamino-acidsequence.Theeasiestwaytodosoistosubmityoursequenceathttp://robetta.bakerlab.org/.

Alternately,theRosettausersguidedescribeshowfragmentfilesmaybebuiltlocally:(https://www.rosettacommons.org/manuals/archive/rosetta3.5_user_guide/dc/d10/app_fragment_picker.html)

Step4A.Localfragmentsearch

Inthefirstpartoftheprocedure,wesearchthedensitymapforeachsequence-predictedbackbonefragment.Thispart,likeallthestepsinthissection,usesaRosettaapplicationdenovo_density.

Thecommandtorunfragmentsearchingforasingleresidue(4_denovo_demo/A_search.sh):

$ROSETTA3/source/bin/denovo_density.macosclangrelease \ -in::file::fasta t20sA.fasta \ -fragfile ./t001_.25.9mers \ -mapfile ./T20S_48A_alpha_chainA.mrc \ -n_to_search 500 -n_filtered 2500 -n_output 100 \ -bw 16 \ -atom_mask_min 2 \ -atom_mask 3 \ -clust_radius 3 \ -clust_oversample 4 \

-point_radius 3 \ -movestep 1 \ -delR 2 \ -frag_dens 0.8 \ -ncyc 3 \ -min_bb false \ -pos $1 \ -out:file:silent round1/t20s.$1.silent

Mostoftheargumentsshownhereshouldbeleftas-is.However,thereareafew–highlightedaboveinboldface–thatyoumightwanttochange:-n_to_search 500 -n_filtered 2500

Theseflagscontrolthenumberoftranslationstosearch,andthenumberofintermediatesolutionstokeep.Asaruleofthumb,theseshouldbeabout2and10timesthenumberofresiduesinthemap,respectively.

Makenoteofthefollowingflag:-pos $1

Thisflagtellsthecodetoonlysearchforfragmentsattheassignedpositions.Thisallowsforparallelizationofthescript,byrunningseparatejobsforeachpositionintheprotein.($1meansthescripttakesthepositionasaninputargument).

Forthisstep,youneedtorunthescriptonceforeachpositionintheprotein.Thiscanbedoneverysimplywiththebashcommand(forthiscase,the221indicatesthereare221residuesintheprotein):for i in `seq 1 221`; do ./A_search.sh $i; done

Theoutputofthisscriptisasinglefileforeachpositionintheprotein,thatidentifiestheplacementandconfigurationofeachdockedfragment.Thesefilesareusedasinputforthenextstepoftheprocess.

However,aseachpositionrunsindependently,andeachpositionmighttake30minutestoanhourforthesearch,youwillprobablywanttoparallelizethisovermanyprocessors.

Jobdistribution

Asthisisthemostcomputationallyintensivestep,itmakessensetoparallelizethisstep,byrunthisprocedureseparatelyforeachpositionintheprotein.UsingGNUparallel.parallel –j16 ./A_search.sh {} ::: {1..221}

GNUparallelallowslaunchingofjobsremotelyifSSHkeyshavebeensetupforpasswordlesslogin.Torun:

parallel –S 16/node1,16/node2,16/node3,16/node4 -–workdir . ./A_search.sh {} ::: {1..221}

Thiswilllaunchinstead48jobsspreadacrossfourmachines.SeetheGNUparalleldocumentationformoreinformation.

Otherusefuloptions

Therearetwooptionscontrollingfragmentplacementthatmayalsobeusefulincaseswherethereissomepreviousknowledgeaboutthestructureinquestion.Thefirstofthesedealswiththecasewherethebackbonestructureisknown(oratleastsomewhatknown)butregistrationofthesequencewiththebackbonemodelisambiguous.Inthiscase,aknownbackbonemodelcanbeprovidedwiththeflag:

-ca_positions backbone.pdb

Inthiscase,thecodewillonlyconsiderfragmentscenteredontheCalphapositionsfromtheinputmodel.Thisoffersasignificantspeedupaswellasreducedsearchspace.

Alternately,ifpartofthestructureisknowninadvance,itmaybeprovidedwiththeflag:

-startmodel start.pdb

Inthiscase,thematchingroutinewillmatchonlythenativefragmentscoveredbystartmodel,ensuringthatthesepositionswillbemaintainedthroughouttherefinement.Whenusingthisoptionstherearetwoimportantthingstokeepinmind:

• ThenumberinginthePDBfilemustmatchthenumberingoftheinputfasta• Anycontinuoussegmentsshorterthan9aminoacidsintheinputfilewillgetignored

Step4B.Placedfragmentscoring

Inthisstep,wewanttotaketheplacementsfromthepreviousstepandscorethemforcompatability.TheoutputsfromstepAareusedasinputsinthisstep(4_denovo_demo/B_score.sh):

$ROSETTA3/source/bin/denovo_density.macosclangrelease \ -mode score \ -in::file::silent round1/t20s*silent \ -scorefile round1/scores1 \ -n_matches 50

Highlightedinboldaretheinputfiles(-in::file::silent)–theoutputsfromthepreviousstep–andthescorefiletobewritten(-scorefile),theoutputofthisstep.Iftheoutputiswrittentoaseparatefolder,youwillneedtopointthecommandlinetothisalternatelocation.Thisstepisrelativelyfast(lessthan5minutes)andcanberunonasingleprocessor.

Step4C.MonteCarlofragmentassembly

Inthethirdstep,weusetheoutputsfromtheprevioustwosteps,andtrytogeneratea"maximally

consistent"fragmentassignment.ItusesMonteCarlosamplingandascorefunctionassessingfragmentcompatibilitytoidentifythisfragmentset.Thecommandline(4_denovo_demo/C_assemble.sh):

$ROSETTA3/source/bin/denovo_density.macosclangrelease \ -mode assemble \ -nstruct 5 \ -in::file::silent round1/t20s*silent \ -scorefile round1/scores1 \ -assembly_weights 4 20 6 \ -null_weight -150 \ -out:file:silent round1/assembled.$1 \ -scale_cycles 1 \ -mute core

AswithstepB,theoutputsoftheprevioustwostepsneedtobeprovidedasinputs:-in::file::silent round1/t20s*silent -scorefile round1/scores1

Thescriptthenwritesasinglefileforeachindependenttrajectory:-out:file:silent round1/assembled.$1

Finally,eachjobwillgenerateseveral(inthiscase5)independenttrajectories:-nstruct 5

Itisrecommendedtogenerateatotalof1000independenttrajectories.AswithstepA,thiscanbesomewhattime-consuming(thoughnotastime-consumingasstepA).Thereforeitisrecommendedtoparallelizethisasbefore:parallel –j16 ./C_assemble.sh {} ::: {1..200}

Oracrossmultiplemachines:

parallel –S 16/node1,16/node2,16/node3,16/node4 -–workdir . ./C_assemble.sh {} ::: {1..200}

Step4D.Consensusassignment

Thefinalstepoftheprotocolistoidentifytheconsensusassignmentfromthelowest-scoringMonteCarlotrajectories.Thisisdoneusingthefollowingcommand(4_denovo_demo/D_consensus.sh):

$ROSETTA3/source/bin/denovo_density.macosclangrelease \ -mode consensus \ -in::file::silent round1/assembled.*silent \ -consensus_frac 1.0 -energy_cut 0.05 \ -mute core

Theoutputtrajectoriesofthepreviousstepareprovidedasinputtothisscriptwith–in::file::silent.Thisscriptlooksforaconsensusassignmentinthebest-scoringtrajectories,andwilloutputaPDBfile,S_0001.pdb.

ThisPDBfilecontainssequenceplacedintodensity.Ideally,themodelatthispointismorethan70%complete,andthisfilecanthenbeusedasinputtoRosettaCM(seesection3).Ifinsteadthisstructurecontainsareasonablepartialmodel,butwithlessthan70%coverage,theiterativeapproachofthenextsectioncanfurtherimprovethecoverageofthepartialmodel.

Step4E.Iterativeassemblytoincreasemodelcoverage

Insomecases,itmaybenecessarytoiteraterefinement,assubsequentroundsofdenovobuildingmaytraceportionsofthemodelunabletobeplacedinpreviousrounds.Thefollowingcommandlineillustrateshowassemblymaybeiterated(4_denovo_demo/E_search_iter.sh):

$ROSETTA3/source/bin/denovo_density.macosclangrelease \ -in::file::fasta t20sA.fasta \ -fragfile ./t001_.25.9mers \ -startmodel round1_model.pdb \ -mapfile ./T20S_48A_alpha_chainA.mrc \ -n_to_search 250 -n_filtered 1250 -n_output 50 \ -bw 16 \ -atom_mask_min 2 \ -atom_mask 3 \ -clust_radius 3 \ -clust_oversample 4 \

-point_radius 3 \ -movestep 1 \ -delR 2 \ -frag_dens 0.8 \ -ncyc 3 \ -min_bb false \ -pos $1 \ -out:file:silent round2/t20s.$1.silent

ThisstepisnearlyidenticaltostepA,withtwokeychageshighlightedinbold.Thefirstchangeindicatesthattheoutputofthepreviousstepistobeusedasaninitialmodel:-startmodel round1_model.pdb

Followinginspection,thismodelmayalsobemanuallyeditedaswell.Thesecondchangemakessuretheoutputsdon’tclobbertheoutputsfromthepreviousstep:-out:file:silent t20s.rd2.$1.silent

AswithstepA,thisshouldbeparallelizedovermanyprocessors,e.g,usingGNUparallel:parallel –S 16/node1,16/node2,16/node3,16/node4 -–workdir . ./E_search_iter.sh {} ::: {1..221} Oncecomplete,stepsB,C,andDmaybethenfollowedinorder,makingsurethattheinputandoutputfilesareupdatedtoindicatetheround2models.NOTE:Ateachstep,becertainthatyouroutputsarenotoverwritingoneanother.Oncethepartialmodelhasbeencalculated,itissafetodeletetheintermediatefilescreatedaspartoftheconstructionprocess.Inthiscase,thesameoutputfilenamesmaybereused(thus,thesamescriptscanbeusedforeachiterationasidefromthefirst).

5)Completingpartialmodelsguidedbyexperimentaldensitydata

Whilethepreviousworkflowshaveaddressmodelbuildingandmodelrefinement,noneoftheaforementionedtoolsdealwithcompletionoflargesegmentsofprotein.Thesemayariseinseveralcases:

1. Homologymodels(particularlydistantones)mayhavelargeinsertions,orevenentiredomainsthatarelacking.

2. Themodelsproducedfromdenovo_densitymaybemissingsignificantfractionsofthebackbone3. Itmaybedifficulttomanuallytracelongstretchesoflowlocalresolutionintodensity

Toaddresstheseissues,wehavedevelopedatoolcalledRosettaEnumerativeSampling,whichusesaensemblesearchalgorithmtodeterminealargenumberofconformationsthatarebothconsistentwiththedensityandtheRosettaenergyfunction.Thistoolcanbeusedonapartialmodelsfromthedenovo_denistyapplication,anincompletehomologymodel,oranyotherstartingstructure. RosettaESrunsbestwhenworkingwithdataatresolutions5Åorbetterwithsegmentstorebuildshorterthan50residues.However,withverylargeamountsofsampling(e.g.,ensemblesizes>250),reliablemodelsmaybeproducedwithsegmentslongerthan100residues.Atresolutionsworsethan5Å,thistoolmaybeunreliable.Themethodcanbeusedonbothsegmentedandunsegmenteddensitymaps,however,removalofdensitybelongingtopartsofthestructurenotbeingmodeledmayimproveresults. RosettaESmodelbuildingconsistsofthreesteps.Initially,apreparationstepbuildsthefragmentsthataretobeusedinconformationalsampling.Thenarebuildingstepwillidentifyeachunassignedsegmentintheinitialmodelandbuildanensembleofpossiblesolutionsforeach.Finally,acombinationstepfindsalltheconsistentsubsetsofinteractions,andrefinesallsuchmodels(ifthereisonlyonesegment,thescriptsimplyrefinesallstructuresintheensemble).Inthiscombinationstep,ifassemblyfailstofindaconsistentsetofsolutions,anadditionalroundofsamplingwillbecarriedout,forcingdifferentsolutionsthanthepreviousmodel. Comparedtotheothersections,theworkflowisabitmorecomplicatedwhenextendedtomultiplecomputecores.TohandlejobdistributionwehaveincludedapythonscriptRunRosettaES.pythatmanagesthisjobdistributionamongavailableCPUsonasinglemachine.(ThescriptisincludedaspartofRosetta,in/main/source/scripts/python/public/EnumerativeSampling,aswellasinthistutorial).Fordealingwithjobschedulersorclustersincompatiblewiththisscript,section5EgivesanoverviewofjobdistributionwithRosettaES.

Step5A.FragmentPicking

Thefirststep–muchlikeScenario4–involvesselectionof"fragmentfiles,"whichpredictbackboneconformationfromlocalsequence.UnlikeScenario4,wehaveacustomalgorithmforfragmentpicking.ThesefragmentswillneedtobegeneratedbeforerunningRosettaES;thefollowingcommandwillgeneratethesefiles(5_rosettaES/A_PickFragments.sh): $ROSETTA3/source/bin/grower_prep.default.macosclangrelease \ -pdb input.pdb \ -in::file::fasta t20sA.fasta \ -fragsizes 3 9 \ -fragamounts 100 20

Thiswillgenerate1003residuefragmentsand209residuefragments,named100.3mersand20.9mers,thatarethenusedinsubsequentstepsoftherebuildingprocess.

Step5B.GeneratePossibleConformationsForEachSegment Thegrowerconsidersassigningpositionsforeachunassignedsegmentofdensity(thatis,eachstretchofaminoacidspresentinthefastafilebutmissingfromtheinputstructure).Eachsegmentisreferredtousingasegmentid,inwhicheachsegmentisnumberedfromN-toC-terminus(withmultiplechainsgiveninorderintheinputfastafile).Thescriptisrunintwoparts:first,thescriptisrunonceforeachsegmenttorebuild;then,thescriptisrunin“assemblymode”giventheoutputsproducedbyrebuildingeachsegmentindividually.Thus,forrebuildingthetwosegmentsinthetestcase,thescriptiscalledthreetimes:oncetobuildeachsegment,andoncetoassembletheresults. Inthefirststep,weperformconformationalsamplingofeachofthetwosegments,generatinganensembleofputativesolutionsforeach.Thiscanbedonecallingthecommand(5_rosettaES/B1_SampleSegment1.sh): python RunRosettaES.py \ -rs runES.sh \ -x RosettaES.xml \ -f t20sA.fasta \ -p input.pdb \ -d T20S_48A_alpha_chainA.mrc \ -l 1 \ -c 16 \ -n loop_1

Theargumentstothisprogramareasfollows:

• -rsrunES.sh-thescriptthatislaunchedoneachcoreandcontainsRosettaflagsandinputs • -xRosettaES.xml-theXMLscriptdescribingparametersforconformationalsampling(seebelow) • -ft20sA.fasta-theinputfastafile(withchainbreaksspecifiedby‘/’) • -pinput.pdb-theinputpdbfile.Thisneedstomatchtheinputsequence,andallresiduespresentin

thefastabutabsentinthePDBwillgetbuilt. • -dT20S_48A_alpha_chainA.mrc-theinputdensitymap • -l1-thesegmentidofthesegmenttorebuild.Thiscommandshouldbecalledonceforeachsegment

torebuild,varyingthisargumentfrom1toN • -c16-thenumberofcomputecorestouse • -nloop_1-theoutputtagforthisjob(resultswillbeplacedinafolderwiththisname).Tagsshould

beuniqueforeachsegment. TheinputXMLfileexposeskeyparametersforconformationalsampling.Inthetutorial,thisfile,5_rosettaES/RosettaES.xml,containsablock: ... <FragmentExtension name="ext" fasta="full.fasta" scorefxn="dens" censcorefxn="cendens" beamwidth="32" dumpbeam="0" samplesheets="1" read_from_file="0" continuous_weight="0.3" looporder="1" comparatorrounds=”100” windowdensweight=”30” readbeams="%%readbeams%%" storedbeams="%%beams%%" steps="%%steps%%" pcount="%%pcount%%" filterprevious="%%filterprevious%%" filterbeams="%%filterbeams%%"> <Fragments fragfile="100.3mers"/> <Fragments fragfile="20.9mers"/> </FragmentExtension> ...

ThesamplingbehaviorofRosettaESiscontrolledbytheblockabove.Manyofthetagsinthisblock–fasta,dumpbeam,read_from_file,storedbeams,steps,pcount,filterprevious,comparitorrounds,andfilterbeams–areusedbythejobdistributionscripttopassresultsfromonesteptothenext,andtheyshouldbeleftas-is.

Othersareuser-specified,andcanbemodifiedbasedonthesizeoftheloopandresolutionofthedata: • beamwidth:controlsthemaximumnumberofsolutionstobeheldateachstep.

Settingthevaluehigherwillincreaseruntimebutmayimproveaccuracy. • windowdensweight:therelativecontributionofdensityinmodelselection

Formanycases,thedefaultparametersaresufficient.However,ifthesegmenttogrowislong(50+residues),youmayneedtoincreasebeamwidth;ifthedensityislowresolution,youmightneedtodecreasewindowdensweightto15or20. Severaloptionsshouldrarelybemodified,butmayneedtobeinspecificcases:

• samplesheets:Controlswhetherornotbetasheetsamplingshouldbeperformed.Itisrecommendedtousethisexceptwhenworkingwithsymmetricsystems.

• continuous_weight:Controlsthepenaltyondiscontinuousdensity.Settingthisvalueto1willcompletelyremoveanypenaltyondiscontinuousdensity;settingitcloserto0willincreasethepenalty.Youmaywishtoraisethisvalueto0.7(ormore)ifyouanticipatethesegmentyouaretryingtomodeldoesnotfollowacontinuouspathofdensity.

Finally,theoptioncomparitorroundsisusedinmulti-segmentassembly(seesection5C)AfterrunningthescriptwiththisXML,therearetwoimportantintermediateoutputfiles,placedinthefolderloop_1(theargumentto-n):

• .lps(forlooppartialsolution)files,whicharethencombinedinstep5C,incaseswheretherearemultiplesegmentstomodel

• taboo/beamX.txtfiles,whereXcorrespondstothenumberofresiduesaddedtothesegment.Thesearegeneratedasthesearchaddsresidues,andareusedtopassinformationfromonesteptothenext(asadditionalresiduesareaddedinasinglesegment).

Note:ThisprocessshouldthenberepeatedforallremainingsegmentstorebuildInthetutorial,thecommand5_rosettaES/B2_SampleSegment2.shbuildsconformationsforthesecondsegmentinthisfile.Allsegmentscanbesampledindependentlyofoneanother,soifmanycomputenodesareavailable,eachsegmentcanbesampledsimultaneouslyonseparatenodes.Finally,whileinmostcases,userswillwanttotakethesemodelsintotheassemblystep(part5C),ifthereisonlyonesegmenttorebuild,orifthesamplingresultswanttobeinspected,thefinaloutputensemblecanbesavedasPDBfileswiththecommand(5_rosettaES/B1.2_InspectIntermediates.sh): python RunRosettaES.py \ -rs runES.sh \ -x RosettaES.xml \ -f t20sA.fasta \ -p input.pdb \ -d T20S_48A_alpha_chainA.mrc \ -l 1 \ -db loop_1/taboo/beam17.txt

Note,thenumberofthebeamfile(17)correspondstothetotalnumberofresiduesbuilt.Intermediateresults(aftergrowingNresidues)canbeinspectedbychangingthistoalowernumber(e.g.,beam14.txtshowssolutionsafter14ofthe17residueshavebeenrebuilt).

Step5C.Findasetofconsistentconformations.

Incaseswheretherearemultipleinteractingsegments,wewanttofindallnonclashingcombinations.Thisstepwilltakethelooppartialsolution(lps)filesgeneratedinstepBanduseaMonteCarloAssembly(MCA)algorithminordertoidentifysetsofsolutionsthatareself-consistent.ThissectionassumesthatallmissingsegmentshavebeenbuiltinstepB.Torunthisassembly,weperformconformationalsamplingusingthescript,passingthe.lpsfilesgeneratedinstepB(5_rosettaES/C_AssembleResults.sh): python RunRosettaES.py \ -rs runES.sh \ -x RosettaES.xml \ -f t20sA.fasta \ -p input.pdb \ -d T20S_48A_alpha_chainA.mrc \ -lps loop_*/lps*.txt

Theflag-lpspointstotheoutputsoftheindividualsegmentjobs’output.Asbefore,thescriptwilluse(andmodify)aninputXMLfile.Forassembly,onlyasingleparameterisimportanttomodify:comparatorrounds="100”.ThisparametercontrolsthenumberofMonteCarlotrajectories(itisunlikelyyouwillneedtochangethisparameter). TheoutputofthisstepisPDBfiles,thatwillbeplacedintheworkingdirectorywiththeprefixaftercomparator_RRR_XXX.pdb,whereRRRisthetrajectoryid,andXXXistheenergy.Atextfilerecommendation.txtwillbewrittenthatreportstheclashscoreofthebestmodel. Ifthenumberinrecommendation.txtisabove+100itisrecommendyouperformadditionalroundsofsampling.Todoso,additionalpotentialsolutionsshouldbesampledbyrepeatingstepBandprovidingthesamedirectorynameasinput,withoutdeletingtheintermediatefiles.Thescriptwillthenenterthatdirectoryandusethealready-computedsolutions(storedinthefolder"taboo")toguidesamplingtowardpreviouslyunexploredregions.SeesectionDformoredetails. Ifthenumberinrecommendation.txtisbelow+100,thenmodelsshouldberunthroughRosettarefinementtoaccuratelyrankthem.Thatcanbedoneusingthiscommand(5_rosettaES/D_RefineOutput): python RunRosettaES.py \ -rlxs runrelax.sh \ -x RosettaES.xml \ -p input.pdb \ -d T20S_48A_alpha_chainA.mrc \ -c 16 \ -rp aftercomparator_*.pdb

Whererunrelax.shcontainsacommandforrelaxingstructuresinRosetta.

Step5D.Interpretingresults RosettaESwillproducea100modelsasoutput(orwhateverisspecifiedincomparatorrounds),withscoresinthefilescore.sc.Whenfirstlookingattheoutputfromarun,itisgoodtovisuallyinspectthelowest-energy5-10structures.Ideally,theselowest-energymodelswillbeverytightlyconverged,butthelackofconvergencedoesnotindicatefailureinsampling,andthepresenceofconvergencedoesnotnecessaryindicatethesolutioniscorrect.Instead,thelowest-scoringmodelsshouldbeexaminedwithattentiontothefollowing:

Insufficientsampling.ModelsshouldinitiallybeinspectedforclearlyincorrectfeaturesthatmightnothavebeenproperlypenalizedbyRosetta.Theseinclude:

• unexplaineddensity,particularlysmallsidechainsplacedintoalargedensityprotrusions • regionswithpoorfittodensity • unresolvedclashes

Ifthesefeaturesarepresentinthelowest-energymodels,itsuggeststhattheconformationalsamplingperformedbyRosettaESmaynothavebeensufficient.Thereareseveralwaystoaddressthisissue.Thefirstistoincreaseconformationalsampling.Thiscanbedonebyeither:a)increasedthebeamwidthparameterandrerunninganew,b)and/orperformingadditionalroundsoftaboosearch(taboosamplingoccursautomaticallyifthe“taboo/”folderintherunningdirectoryispopulatedwithbeamfiles”),orc)reducingthesearchspacebyeliminatingregionsofdensity. Thelattercanalsobeperformedbymanuallyremovingpartsofthemapyoudonotwishtosampleorbyincludingadditionalportionsofthemodel,forexampleifyouarebuildingamodelthathas4missingregionsandyoubelieveyouhaveaccuratelysampled3ofthem(theydonotpossesanyofthepathologieslistedaboveandfitthedensitywell),youcancombinethembytreatingthemastemplatesforRosettaCM(useafastafilewiththemissingsegmentremoved)andusetheresultasinputforRosettaEStobuildthelastregion. Unresolvedresidues.RosettaESwillalwaysattempttobuildallresiduespresentinthefastafile,however,inmanycasesnotallresidueswillberesolvedinthemap(particularlyattermini).BecauseRosettaESwillheavilypenalizemodelsthatdonotfitthedensity,youwilloftenfindmodelsthatare"overlycompacted"atterminiorinternalloops,totrytosqueezetheseresiduesintodensity. Ifthishappensatterminiitissuggestedthatyouexamineintermediatestructuresthathaveyettoattempttoassigntheseunresolvedresiduesinordertofindagoodmodel.Ifthishappensinternally(orifyouhaveagoodideaaprioriwhatresiduesshouldbemodeled),theseregionscanberemovedfromtheinputfasta.Ifinternalsegmentsaredeleted,besuretotreatthedeletioncorrectly,byputtinga'/'inthefastafile. Step5E.Customizingjobdistribution. JobdistributioninRosettaESiscomplicated,sinceeach"growingstep"canbeparallelized,butsubsequentstepsneedalltheinformationfromthepreviousstep.Consequently,thescriptprovided(RosettaES.py)managesjobs,bycallingRosettajobsateachround,collectingandcombiningtheresultsofthepreviousround,thensplittinginputfilesforthenextround.Onsystemswhereitisnotpossibletorunthisscript,thissectiondescribesinsomedetailwhatthescriptisdoing. TomanagethisthepythonscriptusestheprovidedXMLfileasinput,rewritingseveralparametersdependingontheprotocolstep.Thesevaryingparametersinclude:

• readbeams,abooleanoptionthattellsRosettawhetheritshouldloadintermediatesolutions • beamfile,thenameofthebeamfilethatstorestheintermediatesolutions • steps,howmanyresiduestoadd(0meansonlydofiltering,otherwisesetto1,settingto“-1”will

buildallmissingresidues) • pcount,usedtouniquelytagoutputfilesfromeachcore • filterprevious,abooleanoptionthatcontrolswhetherintermediatesolutionsshouldbereadfortaboo

search • filterbeams,thefilenameoftheintermediatesolutionsforuseintaboo.

Inordertobuildamissingsegmentthescriptwillperformthefollowingsteps:

1. Launchajobwithreadbeams="0",beamfile="na",steps="1",pcount="1",filterprevious="0",filterbeams="na".Thiswillproduceafilenamedbeam1.1.txtfromRosetta.

2. Parsethebeam1.1.txtfileandsplitintoNfiles(whereNisthenumberofparalleljobstorun).Newfilesarelabeledbeam_$r.$i.txtwhere$risthenumberofresiduesaddedsofar,and$irangesfrom1toN.

3. SubmitNrosettajobswithpcountsetasthejobnumber,readbeamssetto"true",andbeamfilesettothecorrespondingoutputofstep2.

4. Outputfilesareparsedbythescriptandcompiledintoasinglefilewiththenamebeam$r.txt,with$rthenumberofresiduesgrown.

5. Rosettaisrunonasinglecoretofiltertheaggregatesolutionset.Here,readbeams=”1”,beamfile=”beam$r.txt”,andsteps="0".Afilenamedbeam_0.txtwillhavethefilteredresults.

6. Repeatsteps2-6usingthebeam_0.txtastheinputforstep2.ThisisdoneuntilthesegmentiscompleteandRosettaproducesanemptyfilecalledfinished.txt.Thepresenceofthisfiletriggerstheprogramtoperformonefinalroundoffilteringandexit.

ThelaststepofRosettaESwilladditionallyproduceafilewiththename"lpsfile_$s.0.txt,"usedforassembly.Torunassemblywiththesefilesfirstcombinethemintoasinglefile,describedbelow,andrunwithreadfromfile=”filename”.Thisnewfileshouldstartwiththeanumbercorrespondingtothetotalnumberofmissingsegments,thenforeachmissingsegmentprovideanumberforthetotalsolutionsinthatsegmentedfollowedbythesolutioninformationcontainedinthelpsfile_$s.0.txtdescribedabove.Segmentsshouldbearrangedtomatchtheorderinwhichtheyoccurinthefasta.