Upload
others
View
17
Download
0
Embed Size (px)
Citation preview
1
ModelinganddockingantibodystructureswithRosettaBrianD.Weitzner,1*JeliazkoR.Jeliazkov,2*SergeyLyskov,1*NicholasMarze,1DaisukeKuroda,1,3RahelFrick,4NaireetaBiswas,1andJeffreyJ.Gray1,5,6,7
1DepartmentofChemicalandBiomolecularEngineering,JohnsHopkinsUniversity,Baltimore,Maryland,USA.2T.C.JenkinsDepartmentofBiophysics,JohnsHopkinsUniversity,Baltimore,Maryland,USA.3DepartmentofAnalyticalandPhysicalChemistry,ShowaUniversitySchoolofPharmacy,Tokyo142-8555,Japan.4CentreforImmuneRegulation,DepartmentofBiosciences,UniversityofOslo,Oslo,Norway;CentreforImmuneRegulation,DepartmentofImmunology,OsloUniversityHostpitalRikshospitalet,UniversityofOslo,Oslo,Norway.5PrograminMolecularBiophysics,JohnsHopkinsUniversity,Baltimore,Maryland,USA.6InstituteforNanoBioTechnology,JohnsHopkinsUniversity,Baltimore,Maryland,USA.7SidneyKimmelComprehensiveCancerCenter,JohnsHopkinsSchoolofMedicine,Baltimore,Maryland,USA.CorrespondenceshouldbeaddressedtoJ.J.G.([email protected]).*Equalcontributionauthors
ABSTRACT
WedescribeRosetta-basedcomputationalprotocolsforpredictingthethree-dimensionalstructureofanantibodyfromsequenceandthendockingtheantibody–antigencomplexes.Antibodymodelingleveragescanonicalloopconformationstograftlargesegmentsfromexperimentally-determinedstructuresaswellas(1)energeticcalculationstominimizeloops,(2)dockingmethodologytorefinetheVL–VHrelativeorientation,and(3)denovopredictionoftheelusiveCDRH3loop.Toalleviatemodeluncertainty,antibody–antigendockingresamplesCDRloopconformationsandcanusemultiplemodelstorepresentanensembleofconformationsfortheantibody,theantigenorboth.Theseprotocolscanberunfully-automatedviatheROSIEwebserverormanuallyonacomputerwithusercontrolofindividualsteps.Forbestresults,theprotocolrequiresroughly2,500CPU-hoursforantibodymodelingand250CPU-hoursforantibody–antigendocking.Bothtaskscanbecompletedinunderadaybyusingpublicsupercomputers.
INTRODUCTION
ThevertebrateadaptiveimmunesystemiscapableofpromotingcellstodegranulateorphagocytosenearlyanyforeignpathogenbyproducingimmunoglobulinG(IgG)proteins(antibodies)thatrecognizeaspecificregion(epitope)ofapathogenicmolecule(antigen).Theabilitytobinddiverseantigensrequiresadiversepopulationofantibodies,whichisachievedthroughcomplexprocessesinbonemarrowandlymphatictissues,namelyV(D)Jrecombinationandsomatichypermutation.Thediversityofantibodiesisastonishing;thesizeofthetheoreticalnaïveantibodyrepertoireisestimatedtobe>1013inhumans1.Inadditiontotheirbiologicalimportance,antibodiesareroutinelyusedinbiotechnologyasprobesanddiagnostics,andtherearedozensofantibodiesapprovedastherapeutics2.
.CC-BY-NC 4.0 International licensecertified by peer review) is the author/funder. It is made available under aThe copyright holder for this preprint (which was notthis version posted August 16, 2016. . https://doi.org/10.1101/069930doi: bioRxiv preprint
2
Next-generationsequencingtechniqueshaveenabledrapiddeterminationoflargenumbersofantibodysequences1.Alimitationoftheseapproachesisthatnoinformationaboutthespecificatomiccontactsbetweentheantibodyandantigencanbegleanedfromthesedatasets.Atomicdetailisrequiredtoconsiderspecificantibody–antigeninteractions,forexample,inordertodeveloptherapeuticantibodiesorvaccinesthataremimeticsofextremelyinfectiousantigens3.Althoughthereareexperimentalmethodscapableofgeneratingstructuralmodelsinatomicdetail(X-raycrystallography,NMR,neutrondiffraction,cryo-EM),notallproteinstructurescanbedeterminedwiththesemethods,andlimitedresourcesmakeitimpossibletodeterminethestructuresofallofthesequencesidentifiedinhigh-throughputsequencingexperiments.Tobridgethesequence–structuregap,onemustemploycomputationalstructurepredictionmethods.Perhapsmoreimportantly,structurepredictionmethodsareusefulindiagnosticsanddrugdiscoverytodefineepitopesandhelpinferbiologicalortherapeuticmechanisms.
Thefunctionofanantibodyarisesfromitsthree-dimensionalstructure.TheIgGisoform,themostcommontypeofnaturallyoccurringantibodies,consistsoftwoidenticalsetsofheavyandlightchainsarrangedintoa“Y”shape,withthefourpolypeptidechainsjoinedbydisulfidelinkages.Theheavychaincontainsfourdomains,threeadjacentconstantdomains(CH1,CH2,CH3)andonevariabledomain(VH),andthelightchainconsistsofasingleconstantdomain(CL)andavariabledomain(VL).TheCH1andVHdomainsinteractwiththeCLandVLdomainstoformtheantigen-bindingfragment(Fab)orthe“arms”oftheY.WithintheFab,bothvariabledomainsaredirectedawayfromtheremainingheavychainconstantdomainsandmakeupthevariablefragment(FV).AtthetipoftheFVarethreecomplementaritydeterminingregion(CDR)loopsoneachchain(CDRL1–3andCDRH1–3)thatformtheregionoftheantibody,calledtheparatope,thatrecognizesitstarget.ThisFvstructureiscommontootherantibodyisoforms(IgA,IgE,etc.).
Antibodyhomologymodeling
TheFVisthefocalpointoftherecombinationandhypermutationevents;assuch,theprimarydifferenceamongantibodiesistheconformation,structuralcontext,andchemicalidentityoftheirCDRloops.Forthisreason,antibodystructurepredictionmethodsfocusonmodelingtheFV.TheFVcanbesplitintotworegions:frameworkregions,andCDRloops.Theframeworkregionshaveahighdegreeofstructuralconservation,makingitpossibletogenerateaccuratemodelsofframeworkregionsfromtemplatestructures.
Similarly,analysisofantibodycrystalstructureshasrevealedthatfiveofthesixCDRloops(CDRL1–3,H1,H2)adoptalimitednumberofdistinctstructures,referredtoascanonicalloopconformations4.ThecanonicalconformationofaparticularCDRloopcantypicallybeidentifiedfromitslengthandsequence.Liketheframeworkregions,theCDRsL1–3,H1,andH2arealsomodeledusingtemplatestructures.
TheremainingCDRloop,H3,doesnotadoptcanonicalconformationsandmustbemodeleddenovo.Additionally,theH3loopliesattheinterfaceofthetwodomains(VHandVL)andcaninteractwithresiduesoneitherchain.Toaccountfortheseinteractionsaswellastheoverallgeometryoftheparatope,theVL–VHorientationisoptimizedduringH3modeling.Accurately
.CC-BY-NC 4.0 International licensecertified by peer review) is the author/funder. It is made available under aThe copyright holder for this preprint (which was notthis version posted August 16, 2016. . https://doi.org/10.1101/069930doi: bioRxiv preprint
3
modelingCDRH3andtheVL–VHorientationaretypicallythemostchallengingaspectsofantibodystructureprediction.
Protein–proteindocking
Whileaccuratepredictionsofunboundantibodystructuresareinformative,theyarevoidofanimportantbiologicalcontext:theantibody–antigen(Ab–Ag)interaction.High-resolutionstructuresofAb–Agcomplexesgiveinsighttothemolecularmechanismbywhichantibodiesfunction,anecessityforrationaldesignofvaccinesorantibodytherapeutics.StructuresofAb–Agcomplexescanbedeterminedthroughexperimentalmethods,however,justaswithunboundantibodies,thesemethodsarelimitedbytheirthroughputandexpenseandarenotviableforallproteins.Whenexperimentalmethodscannotbeusedtodeterminecomplexstructures,computationalprotein–proteininterfaceprediction(docking)providesanalternativeapproach.
Ingeneral,computationaldockingapproachesstrivetosampleallpossibleinteractionsbetweentwoproteinstodiscernthebiologically-relevantinteraction.Predictingaprotein–proteininteractiondenovoischallengingduetothesheernumberofpossibledockedconformations.However,thesamplespacecanbemadetractablewithinformationabouttheinteraction.InthecaseofAb–Aginteractions,thesearchspaceislimitedbecausetheantibodyparatope,comprisedofthesixCDRloops,isthebindingsiteforthecognateantigenepitope.
TheRosettaSnugDockalgorithmleveragestheinformationabouttheflexibleand/oruncertainregionsoftheantibodytoperformrobustAb–Agdocking5.SnugDocksimulatestheinduced-fitmechanismthroughsimultaneousoptimizationofseveraldegreesoffreedom.Itperformsrigid-bodydockingofthemulti-body(VL–VH)–Agcomplex,aswellasre-modelingoftheCDRH2andH3loops,thelatterofwhichtypicallycontributesapluralityofatomiccontactstotheAb–Aginteraction.SnugDockcanalsosimulateconformerselectionbyswappingeithertheantibodyortheantigenwithanothermemberofapre-generatedstructuralensemble.BecauseSnugDocksamplesmostoftheconformationspaceavailabletoantibodyparatopes,itcanrefineantibodyhomologymodelswithinaccuraciesinthedifficult-to-predictVL–VHorientationandCDRH3loop.
Whendockinghomologymodels,itisbestifthereisexperimentalevidencetosuggestthegenerallocationoftheepitope(within~10Å,approximatelythecorrectsideoftheantigendomain),andinthisprotocolpaper,wedescribethelocaldockingprocedureindetail.Ifnoinformationisavailableabouttheepitope,thereareseveralprogramsthatperformglobaldockingorepitopeprediction6.Inparticular,therearetwofast-Fouriertransform(FFT)rigid-bodydockingapproachesthatimplementantibody-specificenergypotentials:PIPER7withtheantibody-ADARSpotential8,andZDOCK9withtheAntibodyi-Patchpotential10.FFTrigid-bodyapproachesarefast,buttheycannotaccountforantibodymotionsuponantigenbindingorcompensateforerrorsintheinitialhomologymodel;SnugDockistheonlyflexible-backboneantibodydockingmethod.Itcanprovideaglobal-antigendockingalternativebutitisslowerand,likeothers,canproducefalse-positiveepitopepredictions5.Forlocaldocking,SnugDockhasbeendemonstratedtoproducehigh-qualitymodelswhenusinganantibodyhomology
.CC-BY-NC 4.0 International licensecertified by peer review) is the author/funder. It is made available under aThe copyright holder for this preprint (which was notthis version posted August 16, 2016. . https://doi.org/10.1101/069930doi: bioRxiv preprint
4
modelorcrystalstructureandtheunboundantigencrystalstructureasinput5.Inaddition,SnugDockapproachesusedintheCAPRIblinddockingchallenge11producedthebeststructureamongallpredictorsforaflexible-looptarget.CAPRIusesstar-basedrankings(***=highquality,**=medium,*=acceptable,0=incorrect)12.ExaminingthehighestattainedCAPRIqualityamongthetenlowest-scoringdockedmodels(startingwithahomologymodeledantibody),SnugDockcurrentlyproduces1***,10**,and4*modelsinatestsetof15antibody-antigentargets(Table1).TheseperformancedataareimprovedsincetheoriginalSnugDockpublication5duetoupdatesintheenergyfunction13andaswitchtothekinematicloopclosure(KIC)loopmodelingmethod.14–16
Protocoloverview
Theprotocoldescribedinthispaperenablesausertogenerateastructuralmodelofanantibodyfromitssequenceandastructuralmodelofanantibody–antigencomplexfromstructuresoftheantibodyanditsantigen(Fig.1).
Protocoloverview:Antibodyhomologymodeling(steps1–8)
GeneratingastructuralmodelofanantibodyfromsequenceinRosettaAntibodyuseshomologymodelingtechniques,thatis,itusessegmentsfromknownstructureswithsimilarsequences.Asdescribedindetailbelow,theinputsequenceissplitintoseveralcomponents.Foreachcomponent,RosettaAntibodysearchesacurateddatabaseofknownstructuresfortheclosestmatchbysequenceandthenassemblesthosestructuralsegmentsintoamodel.ThatmodelisthenusedastheinputforthenextstageinwhichtheCDRH3loopismodeledandtheVL–VHorientationisoptimized.
Numberingtheresiduesinthesequence
TheRosettaAntibodyprotocolidentifiestheCDRsoftheinputantibodysequencethroughregularexpressionmatchingtotheKabatCDRdefinition17,anditnumberstheantibodyresiduesaccordingtotheChothiascheme4.
Templateselection
Foreachstructuralcomponentconsidered(FRL,FRH,CDRsL1–3,H1–3),templatesareselectedbymaximumsequencesimilarityusingaBLAST-basedmethodwithcustomdatabasesconstructedfromhigh-qualitystructuresinthePDB.CanonicalCDRconformationsarebasedonlength,soweuseseparatedatabasesforeachloop–lengthcombination.Forexample,ten-residueH1loopsandeleven-residueH1loopsareseparateBLAST-formatteddatabases.
TheresultsforeachstructuralcomponentaresortedbyBLASTbitscore,andthesequencewithbestscoreisselectedasthetemplate.
.CC-BY-NC 4.0 International licensecertified by peer review) is the author/funder. It is made available under aThe copyright holder for this preprint (which was notthis version posted August 16, 2016. . https://doi.org/10.1101/069930doi: bioRxiv preprint
5
InitialVH–VLorientations
TheinitialVL–VHorientationisselectedinmuchthesameway,withtheexceptionthattenVL–VHtemplatesareselectedratherthanasingleone18.Startingfromthelistofallpossibletemplatesorderedbybitscore,thebestmatchisselectedasthefirsttemplate.TodiversifytheinitialVL–VHorientations,alltemplateswithsimilarVL–VHorientations(0.5OCD,seeMarze&Gray18)tothistemplateareprunedfromthelist.Thebestmatchremaininginthelistisselectedasthesecondtemplate,andcandidatetemplatessimilartothesecondtemplatearenowremovedfromthelist.Thiswinnowingisrepeatedtocreatetendistincttemplates.OnegraftedmodelwillbecreatedfromeachoftheseteninitialVL–VHorientations.
GraftingCDRtemplates
OncetheinitialVL–VHorientationsareset,theCDRtemplatesaregraftedontoeachframeworkregionbysuperposingthetwooverlappingresiduesoneithersideoftheloopwiththeircorrespondingresiduesontheframeworkregions.ThegraftpointsarethenadjustedusingCyclicCoordinateDescent(CCD)19,20topreventunphysicalbondlengthsandanglesfrombeingincorporatedintothemodel.Finally,thestructureisrelaxed21,22viaiterationsofside-chainoptimizationandgradient-basedminimizationwhileconstrainingthebackboneandside-chainheavyatomstofindanative-likeconformationatalocalenergyminimuminRosetta’sscorefunction.
All-atomrefinementofCDRH3andtheVL–VHorientation
Thegraftedmodelsarecrudeandmustberefined,particularlyintheCDRH3loopandtheVL–VHorientation.TheH3loopisfirstcompletelyremodeledinthecontextoftheantibodyframeworkusingthenext-generationKIC(NGK)loopmodelingprotocol16.Forspeed,theH3loopsidechainsareeachreducedtoasinglelow-resolutionpseudo-atom,andtoensuresamplingoftheC-terminalkinkconformation,atomicconstraintsareappliedtothegoverningscorefunction23.Forsubsequenthigh-resolutionrefinement,theall-atomCDRH3sidechainsarerecovered,allCDRsidechainsarerepacked,andtheCDRsidechainsandbackbonesareminimized.TheVLandtheVHdomainsarere-dockedwitharigid-backboneRosettaDockprotocol24,25toremoveanyclashescreatedbythenewH3conformation,andtheantibodysidechainsareagainrepacked.UsingNGK,H3isrefinedagaininthecontextoftheupdatedVL–VHorientation.TheCDRsarepackedandminimizedagain,andthemodelissavedasacandidatestructure,ordecoy.Thefirstgraftedmodelisusedasthestartingpointfor1,000refinedmodels,ordecoys,andtheothergraftedmodelsareeachusedasthestartingpointfor200decoys,foratotalof2,800decoys.ThedecoysaresortedbyRosettascore,andthelowest-scoringonesaregivenasthefinalmodels.
Protocoloverview:antibody–antigendocking(steps11–16)
ComputationaldockingcanbeusedtogeneratemodelsofAb–Agcomplexes.Ingeneral,dockingentails(1)roughlyidentifying(within8Å)theinteractinginterfacethrougheither
.CC-BY-NC 4.0 International licensecertified by peer review) is the author/funder. It is made available under aThe copyright holder for this preprint (which was notthis version posted August 16, 2016. . https://doi.org/10.1101/069930doi: bioRxiv preprint
6
experimentorglobaldockingand(2)refiningtheinitialmodelthroughlocaldocking.BelowwedescribelocaldockingwithSnugDockindetail.
Generatingthestartingmodel
SnugDockrequires,asaninput,aputativeAb–Agcomplexthatcontainsareasonableinterface26.Thecomplexcanbecomposedofsinglestructuresorsetsofstructures(ensembles,seeBox1).Theinterfacedefinesthelocalsearch,betweentheantibodyCDRsandtheantigen.InitialmodelsareoftenbasedonexperimentalresultsthatidentifyinteractingresiduesattheAb–Aginterface,suchasmutagenesisorchemicalcrosslinkingassays.Intheabsenceofexperimentalresults,aglobaldockingapproachsuchasZDOCK/iPatch10orPIPER/ADARS8cangenerateputativecomplexesforrefinement.GlobaldockingcanalsobeachievedwithSnugDock,albeitatahighercomputationalexpense.
AntigenorantibodystructuresthathavenotbeengeneratedbyaRosettaprotocolneedtoberefinedbeforebeingplacedincontact.Refinement,commonlyreferredtoastheRelaxprotocol21,22,entailsiterationsofside-chainoptimizationandgradient-basedminimizationinRosetta’sscorefunction.TheRelaxprotocolsampleslocalconformationalspacearoundthestartingstructuretoidentifyanenergeticminimuminthescorefunction.Throughthisprocess,Rosetta-identifiednon-idealities(suchasvanderWaalsbumps)areabated.Oncethepartnershavebeenrefined,aputativecomplexcanbeassembledandprepacked.Prepackingoptimizesside-chainconformationstopreventbiasingtowardtheinputcomplexmodel’sside-chainconformations,ensuringuniformscoringofallpotentialboundcomplexstates.
Performingdocking
SnugDockiterativelyperformsmulti-bodydockingofboththeAb–AgandVL–VHorientationsandremodelingoftheH2andH3CDRloops.Priortodocking,theprepackedstartingAb–Agcomplexissubjecttothreerigid-bodyperturbations:(1)arandomizedrotationabouttheAb–Agprimaryaxis,(2)asmall-magnituderandomtranslation,and(3)asmall-magnituderandomrotation.Dockingoperatesintwophases:low-resolutionmode,wheresidechainsarerepresentedbyasinglepseudoatomlocatedatthecentroidoftheside-chainheavyatoms,andhigh-resolutionmode,whereallproteinatomsareexplicit.Low-resolutionmodeconsistsoftwotypesofinterspersedMonteCarlomoves:rigid-bodyAb–Agtranslationandrotation,andbackboneensembleconformerswaps.Additionally,attheendoflow-resolutionmode,theH2&H3loopsarerefined.High-resolutionmodeconsistsofa50-stepMonteCarlotrajectorywhereeachmoveisselectedfromasetoffivepossiblemoves:rigidbodyAb–Agdocking(40%),rigidbodyVL–VHdocking(40%),CDRminimization(10%),H2looprefinement(5%),andH3looprefinement(5%),wherethepercentagesindicatetheprobabilitiesofselectingeachmove.Eachtrajectoryresultsinonedecoy.Typically,SnugDockisusedtogenerateatotalof1,000decoys,withthelow-scoringdecoysmostlikelytobenearthenativeconformation.
.CC-BY-NC 4.0 International licensecertified by peer review) is the author/funder. It is made available under aThe copyright holder for this preprint (which was notthis version posted August 16, 2016. . https://doi.org/10.1101/069930doi: bioRxiv preprint
7
Incorporatingexperimentaldataintothesimulation
TwomaintypesofexperimentaldatathatinformtheAb–AgbindingmodecanbeincorporatedintoSnugDock.First,knowledgeaboutspecificresiduesorpairsofresiduesthatinteractacrosstheinterfacecanbeusedtoguidedocking.Thisinformationcould,forexample,bederivedfromalaninescanningorothermutagenesisexperiments.Second,knowledgeabouttheepitopeandtheoverallAb-Agorientationcanbeincorporated.Bindingpatchdatamaybederivedfromdifferentexperiments,includinghydrogen/deuteriumexchangeorchemicalcrosslinkingofthebindingpartnerswithsubsequentanalysisbymassspectrometry.Othermethodsforepitopemappingmayalsobesuitable.
Dependingonthetypeofexperimentaldataavailable,therearedifferentwaysofincorporatingitintothedockingsimulation.High-confidenceresidue–residueinteractionscanbepreservedwiththeuseofatompairconstraints.Less-specificandpoorly-characterizedinteractions(hydrophobicpockets,ambiguousH-bonds)canbelooselyconstrainedwithambiguousandsiteconstraints.PredictedepitopesandbindingpatchescanbesampledbyproperlyplacingtheSnugDockinputstructureandadjustingthesizeoftheinitialstartingmove.Forfurtherinformationonincorporatingexperimentalconstriants,seetheRosettadocumentation27.
Caveats,challengesandpitfalls
Thereareseveralcaveatsassociatedwithcomputationalmodelingofantibodiesanddockingofantibodiesandantigens.Keepingthesecaveatsinmind,theusershouldcriticallyassesseachprediction(seeBox2).RosettaAntibodyisahomologymodelingapproachandcanbehamperedbytemplateavailability.Forexample,challengingtargetsincludeheavilyengineeredantibodiesorantibodiesderivedfromaspeciesthatdiversifiesitsantibodiesthroughgeneconversion,suchaschickensorrabbits.ErrorsintheFRandCDRL1–3,H1,H2loopsaretypicallysmall(nogreaterthan1ÅbackboneRMSDtonative)28.TheVL–VHorientation,correctlycapturedbyRosettaAntibodyin43of46benchmarkantibodytargets18.TheCDRH3loop,ontheotherhand,ismodeleddenovo,andloopmodelqualitydecreaseswithlooplength.IntheKICloopbenchmark,16,29loopsof12–17residuesaremodeledtonear1ÅbackboneRMSDrelativetothenativestructure—theaveragehumanCDRH3fallswithinthatrangewithanaveragelengthof15residues(IMGTdefinition)30.However,thebenchmarkismeasuredbymodelingloopsoncrystallographicframeworks,whereasinablindcontext,CDRH3loopsaremodeledonahomologyframeworks,whichintroducesuncertaintyintheloopenvironment.Nevertheless,inarecentassessment23RosettaAntibodyproducedmodelswithCDRH3loopswithin1.59ÅbackboneRMSDtonativeandsub-angstromaccuracyinallotherregions.
WhileSnugDockexplicitlysamplestheCDRH2orH3loopconformationandVL–VHorientationtoaccountformodeluncertaintyintroducedduringhomologymodelingandtosampletheregionsmostlikelytoundergoconformationalchangesuponantigenbinding,itdoesnotexplicitlysamplebackbonedegreesoffreedomoftheantigenorofnon-CDR-H2orH3regionsoftheantibody.Thus,iftheunboundandboundconformationsdiffersubstantiallyorifthehomologymodelsarepoor,itcouldbedifficultorimpossibletomodelthedockedcomplex
.CC-BY-NC 4.0 International licensecertified by peer review) is the author/funder. It is made available under aThe copyright holder for this preprint (which was notthis version posted August 16, 2016. . https://doi.org/10.1101/069930doi: bioRxiv preprint
8
accurately31.Despitethiscomplication,SnugDockhassuccessfullypredictedAb–Agcomplexesfromhomologymodels5.
Availability
RosettaAntibodyandSnugDockcanberunviaapublicwebserver(http://rosie.rosettacommons.org),pythonbindings(PyRosetta,http://www.pyrosetta.org)andthroughlocalinstallationsofRosetta.RosettaisdistributedassourcecodeandlicensesareavailablefromtheRosettaCommons(http://www.rosettacommons.org)freeofchargeforacademicandnon-profitusers.RosettacanbeinstalledonUNIX-likeoperatingsystems(includingMacOSX).
MATERIALS
EQUIPMENT
Homologymodelingdata
• Primaryamino-acidsequenceofthevariabledomainofthelightandheavychains.
Dockingdata
• PDB-formattedfileoftheantigenstructure.• PDB-formattedfileoftheantibodystructure,fromthehomologymodelingoutput.• Bothofthesecanbesinglestructuresoranensembleofstructures.
SoftwareforrunningsimulationsviaROSIEwebserver
• Modernwebbrowser
Hardwareforrunningsimulationsmanually(optional)
• Workstationwithmulti-coreCPU(s) runningaPOSIXcompliantoperatingsystem(e.g.,GNU/Linux,OSX)
OR
• a Linux-based cluster. Several public facilities are available. For example, the U.S.National Science Foundation’s provides clusters like Stampede through the ExtremeScience and Engineering Discovery Environment (XSEDE, www.xsede.org). In Europe,thePartnershipforAdvancedComputinginEurope(PRACE,www.prace-ri.eu)providesaccess to clusters like JUQUEEN. Resources like the Norwegian Metacenter forComputational Science (Notur, www.notur.no) or Japan’s supercomputer facilities ofNational Institute ofGenetics (sc.ddbj.nig.ac.jp) andofHumanGenomeCenter at theUniversityofTokyo(hgc.jp)arealsosuitable.
.CC-BY-NC 4.0 International licensecertified by peer review) is the author/funder. It is made available under aThe copyright holder for this preprint (which was notthis version posted August 16, 2016. . https://doi.org/10.1101/069930doi: bioRxiv preprint
9
Softwareforrunningsimulationslocally(optional)
• TheRosettasoftwaresuite,availableatwww.rosettacommons.org/softwareo Compilationinstructionsavailableatwww.rosettacommons.org/build.
?TROUBLESHOOTINGo Supportforanyissuesencounteredthatarenotcoveredinthismanuscriptcan
beaddressedontheRosettauserforums:www.rosettacommons.org/forum• BLAST+(version2.2.28orlater),availableat
ftp://ftp.ncbi.nlm.nih.gov/blast/executables/blast+/LATEST/• Texteditor(e.g.,vim,emacs,nano)• Optional:Python(www.python.org)orR(www.r-project.org)foranalyzingresults• Optional:Amolecularvisualizationpackageforviewingresultsandcustomizingstarting
structuresfordocking.RecommendedpackagesincludePyMOL(www.pymol.org)32,UCSFChimera(www.cgl.ucsf.edu/chimera)33,andKinemage(kinemage.biochem.duke.edu)34
PROCEDURE
Thesimplestwaytocreateantibodyandantibody-antigencomplexstructuresisthroughtheuseoftheROSIEwebserver(rosie.rosettacommons.org)35.OnROSIE,theAntibodyappusestheinputantibodysequencetogenerateahomologymodel,andtheSnugDockappusestheantibodymodel(s)andanantigenstructuresfordocking.Bothoperationsareentirelyautomatedwithaminimumofuserinput.
Forgreatercontroloftheoperation,wedescribebelowthestepstoruntheprotocolsmanually,includingthekeypointsforcheckingintermediatedataandinterveningwithalternatechoices.Userswithstructuresoftheunboundantibodyandantigencanskiptodockingstage(step11).
I.AntibodyHomologyModeling
ConstructionofagraftedFvmodelTIMING75-90minutes
1. Setupyourterminal.AfterinstallingBLAST+andRosetta(seeMaterials),launchaninteractiveterminal(e.g.,TerminalonmacorxtermonLinux)andsetpathvariablestotheexecutableprogramsneededasfollows(bashsyntax):
export ROSETTA=~/Rosetta export ROSETTA3_DB=$ROSETTA/main/database export ROSETTA_BIN=$ROSETTA/main/source/bin export PATH=$PATH:$ROSETTA_BIN
Inthefirstlineabove,replace“~”withtheparentdirectorywhereyouinstalledRosettaonyourmachine.Similarly,besurethePATHvariableincludestheblastpprogram(e.g.export PATH=$PATH:/path/to/blastpwhere/path/to/blastpisreplacedwiththedirectorycontainingtheblastpexecutable.Thesepathsettingsmaybeaddedtoa
.CC-BY-NC 4.0 International licensecertified by peer review) is the author/funder. It is made available under aThe copyright holder for this preprint (which was notthis version posted August 16, 2016. . https://doi.org/10.1101/069930doi: bioRxiv preprint
10
configurationfilesuchas.bashrcsotheyareautomaticallyseteachtimeaterminalisopen(loggedinto).
2. Createaworkingdirectoryandnavigatetoit:
mkdir /path/to/my_dir cd /path/to/my_dir
3. Obtaintheaminoacidsequencesforthevariabledomainofyourantibody(lightchainandheavychain)andsavetheminFASTAformat(inyourworkingdirectory)withtheheavyandlightchainsnotedinthecommentlines,asfollows:
> heavy VKLEESGGGLVQPGGSMKLSCATSGFRFADYWMDWVRQSPEKGLEWVAEIRNKANNHATYYAESVKGRFTISRDDSKRRVYLQMNTLRAEDTGIYYCTLIAYBYPWFAYWGQGTLVTVS > light DVVMTQTPLSLPVSLGNQASISCRSSQSLVHSNGNTYLHWYLQKPGQSPKLLIYKVSNRFSGVPDRFSGSGSGTDFTLKISRVEAEDLGVYFCSQSTHVPFTFGSGTKLEIKR
4. UseRosetta’sgraftingapplicationtofindsuitabletemplatesandgraftthemtogethertoobtainacrudemodeloftheantibody.Executetheapplicationwiththelinebelow.
antibody.macosclangrelease \ -fasta antibody_chains.fasta | tee grafting.log
Theapplicationwilloutputadirectorycalledgrafting.ThePDB-formattedfilesnamedmodel-0.relaxed.pdb,model-1.relaxed.pdb,…,model-9.relaxed.pdbwillbeyourinputfortheH3modeling.The“| tee grafting.log”partofthecommandrecordsalltheprogramoutputinthefilegrafting.logforlaterreview.The“\”permitsthecommandtobespreadacrossmultiplelinesratherthanjustone.
?TROUBLESHOOTING
(Optional)CheckgraftedtemplatestructuresTIMING:10minutes–2hours
5. AssigntheCDRloopsinyourmodelstotheCDRloopclustersdescribedbyNorthetal.36andcheckwhetherthechosentemplatesaresuitable.
Runtheclusteridentificationapplicationasfollows:
identify_cdr_clusters.macosclangrelease \ –s grafting/model-*.relaxed.pdb \ –out:file:score_only north_clusters.log
Northetal.clusteredallCDRloopstructuresbytheirbackbonedihedralanglesandnamedthembyCDRtype,looplengthandclustersize(e.g.“H1-13-10”isthe10thmostcommonconformationfor13-residueH1loops).Occasionally,RosettachoosestemplatesthatarerareorinconsistentwiththesequencepreferencesobservedbyNorthetal.Forexample,ifRosettarecommendstheH1-13-10cluster,theusermight
.CC-BY-NC 4.0 International licensecertified by peer review) is the author/funder. It is made available under aThe copyright holder for this preprint (which was notthis version posted August 16, 2016. . https://doi.org/10.1101/069930doi: bioRxiv preprint
11
alsoconsidertheH1-13-1cluster.Tables3-7ofNorthetal.presentconsensussequencesforeachclusterthatcaninformthisdecision.Loopsandclusterswithprolineresiduesarealsoworthamanualexamination.SeveralclustersofNorthetal.arecontingentonthepresenceofprolinesinparticularlocations(e.g.L3-9-cis7-1hasacis-prolineatposition7).BecauseRosettaAntibodyreliesonBLASTtochooselooptemplates,occasionallyaloopfromanuncommonnon-cis-prolinecluster(e.g.L3-9-2)ischosen.Insuchcasesitisbesttomanuallyselectalooptemplatefromthewell-populatedcis-prolinecluster.
6. If desired, rerun grafting to replace a template with one from a manually-specifiedsourcestructure.Usetheantibodycommandlineasabovewithanextraflagtospecifyatemplate.Followthebelowexample:toforceRosettatousetheCDRH1loopfromthePDB1RZI as the template in themodel, add the flag–antibody:h1_template 1rzi.Selecttemplatesforotherregionsaccordingly:
antibody.macosclangrelease \ -fasta antibody_chains.fasta \ -antibody:h1_template 1rzi | tee graft.log
Flag region-antibody:l1_template -antibody:l2_template -antibody:l3_template
lightchainCDRloops
-antibody:h1_template -antibody:h2_template –antibody:h3_template
heavychainCDRloops
-antibody:light_heavy_template -antibody:n_multi_templates 1
VL–VHorientation
-antibody:frl_template -antibody:frh_template
Frameworkregionofthelightorheavychain
H3modelingTIMING1hourto4days
7. Copy the set of standard H3 modeling flags to your working directory and create adirectoryfortheH3modelingoutput:
cp $ROSETTA/tools/antibody/abH3.flags . mkdir H3_modeling
8. Run Rosetta’s antibody_H3 application on the 10 models generated during grafting.This step requires 2,500CPUhours and is oftenperformed in parallel on a computercluster(seeBox3).ForaMacworkstation,usethefollowingcommandline:
$ROSETTA_BIN/antibody_H3.macosclangrelease \ @abH3.flags \ -s grafting/model-0.relaxed.pdb \ -nstruct 1000 \ -antibody:auto_generate_kink_constraint \
.CC-BY-NC 4.0 International licensecertified by peer review) is the author/funder. It is made available under aThe copyright holder for this preprint (which was notthis version posted August 16, 2016. . https://doi.org/10.1101/069930doi: bioRxiv preprint
12
-antibody:all_atom_mode_kink_constraint \ -multiple_processes_writing_to_one_directory \ -out:file:scorefile H3_modeling_scores.fasc \ -out:path:pdb H3_modeling > h3_modeling-0.log 2>&1 &
• -sspecifiestheinputfile(oneofthegraftedmodelsgeneratedinstep2).• -nstruct specifies the number of structures generated, which should be 1000 for
model.0.pdband200eachforallothergraftedmodels.
TheexpectedoutputisthespecifiednumberofPDBfilesaswellasascorefilenamedH3_modeling_scores.fasc.AllthesefileswillappearinanoutputdirectorynamedH3_modeling/.
Totriviallyruninparallel,simplyrepeatedlyexecutetheabovecommand(changinginputmodels,numberofstructures,andtheoutputlogasyouwish).Eachtimethecommandisexecuted,anantibody_H3processisruninthebackground.
!Caution:Generatingthe2,800antibodystructurestakesapproximately2,500CPUhours.Running24processesinparallel,onamodern24-CPUworkstation,expect~4daysofruntime.Distributingtheworkovernodesonasupercomputercanreducethistimetohours(seeMaterials).
(Optional)CheckVL–VHorientationTIMING:5min
9. Check whether the VL–VH orientations of the antibody models are close to theorientationsobservedinantibodycrystalstructuresfoundinthePDB.Runthepythonscriptplot_LHOC.pyusingthefollowingcommandline:
python $ROSETTA/main/source/scripts/python/public/plot_VL_VH_orientational_coordinates/plot_LHOC.py
Thisscriptwillcreateasubfolder(lhoc_analyis)withseparateplotsforeachofthefourLHOCmetrics.EachplotshowsthenativedistributionofVL–VHorientations(grey),theorientationssampledbyRosetta(blackline)aswellasthetop10models(labeleddiamonds)andthe10differenttemplatestructuresgeneratedduringstep2(dots).Antibodymodelsthatareoutsidethenativedistributionsareunlikelytobecorrect.
ChoosefinalantibodymodelsTIMING10min
10. Choose 10of the antibodymodels as an ensemble for docking. The following criteriamaybeuseful toconsiderasdockingwithensemblesaimsto increaseconformationaldiversityandsampling:
a. Selectmodelswiththelowesttotalscore–thesearepurportedlynative-likeb. Select models with natural VL–VH orientation, falling within the observed
distribution(grey).c. Selectmodelsderivedfromdifferenttemplatestomaintaindiversity.
.CC-BY-NC 4.0 International licensecertified by peer review) is the author/funder. It is made available under aThe copyright holder for this preprint (which was notthis version posted August 16, 2016. . https://doi.org/10.1101/069930doi: bioRxiv preprint
13
Ifalltentop-scoringmodelsareoutsidethenativedistribution,considerreturningtostep6andmanuallyselectnewtemplatesfortherelativeorientationoftheVLandVHchainsbyusingthe-antibody:light_heavy_template flag(e.g.,antibody.macosclangrelease -antibody:light_heavy_template 1ABC)
II.Antibody-AntigenDockingTIMING1hour
11. Preparetheantigenandantibodyfordocking.Formatyourantigen(andantibodyifyouarenotusingahomologymodelproducedbyRosettaAntibody)PDB file so it canbereadbyRosetta.Runthefollowingscript:
$ROSETTA/tools/protein_tools/scripts/clean_pdb.py antigen.pdb C
Where antigen.pdb is a PDB file of your antigen and C is the one-letter chainidentifier(s)fortheantigenchain(s)inthePDBfile.
(Optional)RefineantibodyinRosetta’sscorefunctionTIMING10min
12. If you are not using an antibody model produced by Rosetta, you must refine theantibodystructurebyrunningtherelaxapplication.Thecommandlineis:
relax.macosclangrelease \ -s antibody.pdb \ -relax:constrain_relax_to_start_coords \ -relax:ramp_constraints false \ -ex1 \ -ex2 \ -use_input_sc \ -flip_HNQ \ -no_optH false
Youmayalsowishtogenerateanensembleofantibodystructures,seeBox2.
PrepackingTIMING10min
13. GenerateaPDBfilethatcontainsbothyourantibodyandyourantigeninthefollowingorder:lightchainofyourantibody(L),heavychainofyourantibody(H),andantigen(A).ThereareseveralwaystocreateandmodifyaPDBfile.Forexample,withPyMOL:
a. LoadtheantibodyinaPyMOLsession.i. IfitisamodelfromRosettaAntibody,thechainswillalreadybelabeled
asHandL.Otherwise,usethealtercommandtochangethechainIDofaselection:
alter chain A, chain=’H’ alter chain B, chain=’L’
b. LoadtheantigenintothesamePyMOLsession.ChangetheantigenchainIDinasimilar fashion.!Warning: if antigen chains share an IDwith the antibody, youwill have tobe
.CC-BY-NC 4.0 International licensecertified by peer review) is the author/funder. It is made available under aThe copyright holder for this preprint (which was notthis version posted August 16, 2016. . https://doi.org/10.1101/069930doi: bioRxiv preprint
14
more specific with your selections (e.g., alter chain H and antigen,
chain=’A’).c. Reorient the antibody and antigen using the RotO, MovO, and MvOZ editing
commands. Alternatively, one can also use the translate command (i.e.translate [x,y,z], selection). If you know an approximate bindinglocation,adjusttheorientationaccordingly.
d. SavebothobjectsinthesamePDBfile:
save antibody_antigen_start.pdb, chains L+H+A
14. Toensurelow-energystartingside-chainconformations,prepackthemonomers:
docking_prepack_protocol.macosclangrelease \ -in:file:s antibody_antigen_start.pdb \ -ex1 \ -ex2 \ -partners LH_A \ -ensemble1 antibody_ensemble.list \ -ensemble2 antigen_ensemble.list \ -docking:dock_rtmin
antibody_ensemble.listisatextfilethatcontainsfilenameswithabsolutepathstothetenantibodymodelsselectedafterantibodymodeling.Inthecasethatyouhaveasinglecrystalstructure,youcanomitthe–ensemble1flag.
Ifantigenflexibilityisexpected,afamilyofstructurescanbecreatedwithotherRosettaapplications(seeBox1).Thetextfileantigen_ensemble.listwillcontainthefilenamesofyourantigen(usingabsolutepaths).NMRstartingstructuresmustbesplit(i.e.eachmodelshouldbeinitsownPDBfile).Touseasingleantigenstructure,omitthe–ensemble2flag.
DockingTIMING15min
15. Docktheantibodytotheantigen.Asinstep8,thisisanexpensivecomputationalstepand you have the option of running a single process, multiple processes on onemachine,orsplittingthejobacrossprocessorsonasupercomputer(seeBox3).Usingtheexecutable foranMPI-basedcomputingclusterasanexample, thecommand linefordockingis:
snugdock.mpi.linuxgccrelease \ -s prepack/antibody_antigen_start.prepack.pdb \ -ensemble1 antibody_ensemble.list \ -ensemble2 antigen_ensemble.list \ -antibody:auto_generate_kink_constraint \ -antibody:all_atom_mode_kink_constraint -nstruct 1000
?TROUBLESHOOTING
.CC-BY-NC 4.0 International licensecertified by peer review) is the author/funder. It is made available under aThe copyright holder for this preprint (which was notthis version posted August 16, 2016. . https://doi.org/10.1101/069930doi: bioRxiv preprint
15
TIMING
Herewereportthetimetogenerateasingle,dockedmodelfromantibodysequenceandantigencrystalstructure.Typically,however,thousandsofmodelsaregenerated,soweindicateinparenthesesthetimingforthefull,recommendedsimulations.Thesetimeestimateswerecomputedona2x2.4GHzQuad-CoreIntelXeonprocessor;timingwillvaryforothercomputerconfigurations.
Step HumanTime CPUTimeperModel TotalCPUTime(1–4)ConstructionofgraftedFvmodels 5 min 20 min 200 min
(5)Checkgraftedmodels 10 min 1 min 10 min (7–8)H3modeling 5 min 50 min 2500 hrs
(9)CheckVL–VHorientation 5 min 1 min 10 min (10)Choosemodels 10 min 5 min 5 min
(11)Prepareantibodyandantigenfordocking 5 min 15 min 15 min (12)RefineantibodyinRosetta’sscorefunction 5 min 20 min 20 min
(13–14)Prepacking 5 min 10 min 10 min (15)Docking 5 min 15 min 250 hrs
.CC-BY-NC 4.0 International licensecertified by peer review) is the author/funder. It is made available under aThe copyright holder for this preprint (which was notthis version posted August 16, 2016. . https://doi.org/10.1101/069930doi: bioRxiv preprint
16
TROUBLESHOOTING
STEP PROBLEM POSSIBLEREASON SOLUTION
0 Rosettadoesnotcompile. Likelytoberelatedtothespecificcomputeroperatingsystemandconfiguration
SeekhelpontheRosettaforums,www.rosettacommons.org/forum
4 RosettaAntibodyencounterserror“sh: blastp: command not found”
Theblastpexecutableisnotinstalledornotininyour$PATH
Onthecommandline,try‘which blastp’tocheckifyoursystemhasitinstalled.Ifneeded,downloadandinstallBLASTor/andaddblastptoyourPATH(export PATH=$PATH:/path/to/blastp/).Youcanalsospecifythepathusingthecommandlineflag-antibody:blastp /my/path
4 RosettaAntibodyencountersencounters“BLAST Database error”
Theblastpdatabaseisnotspecified,andRosettaAntibodyisnotfindingitinthedefaultlocation($ROSETTA/tools/antibody/blast_database/)
Specifythegraftingdatabaselocationwith-antibody:grafting_database /database/location
4 RosettaAntibodyproducesBLASToutput(e.g.grafting/orientation.align)butdoesnotproducestructuralmodels(e.g.model.0.pdb)
YourversionofBLAST+maybeoutofdate.
DownloadacompatibleversionofBLAST+(version2.2.28orlater).SeeMaterialssection.
4 RegularexpressionfailureforCDRidentification
MutationsinregionsofthechainthatRosettaexpectstobeconservedpreventthesequencefrombeingsplitintostructuralsegmentscorrectly.
Checkyourantibodysequenceagainstotherknownantibodysequences,focusingontheregiongivenintheerrormessage.SeekhelpontheRosettaforums,www.rosettacommons.org/forum.
15 SnugDockreports“ERROR:Couldnotfinddisulfidepartnerforresidue23”
Adisulfidebondwasdisruptedduringdocking.
Youcandisabledisulfidebonddetectionwiththeflag-detect_disulf false
15 snugdock reportserror “chains are not named correctly or are not in the expected order”
InputPDBdoesnotcontainchainsincorrectorder(light,heavy,thenantigen)orchainIDsare
AdjustchainorderininputPDBorspecifychainIDswiththe–partners AB_C flag,whereA,BandCarethelight,heavy,and
.CC-BY-NC 4.0 International licensecertified by peer review) is the author/funder. It is made available under aThe copyright holder for this preprint (which was notthis version posted August 16, 2016. . https://doi.org/10.1101/069930doi: bioRxiv preprint
17
notL,H,andA. antigenchainIDs,respectively.2–15 UnknownRosettaerror. SeekhelpontheRosettaforums,
www.rosettacommons.org/forum2–15 Commonfixes • Checkformisspellings
• Checkpathsarecorrect• CheckFASTAformatting• CheckPDBformatting
ANTICIPATEDRESULTS
Theantibodystructurepredictionanddockingmethodsdescribedinthispapereachproduceasetofstructuralmodelsthathavebeenevaluatedbyascorefunction.Inthecaseofantibodystructureprediction,wehavefoundthroughbenchmarkingandparticipationintheAMAthattheaccuracyofframeworksandnon-H3CDRloopscantypicallybeexpectedtobewithin1.0ÅRMSDofthecoordinatesinacrystalstructure.Whenthemodeldeviatesmorethan1.0ÅinRMSDfromcrystallographiccoordinatesitisusuallybecausethereisnotasuitableknowntemplateinthePDB.ThesesituationsshouldbecomeincreasinglyrareasmorestructuresaredepositedintothePDB,althoughheavilyengineeredantibodiesshouldalwaysbemodeledwithcare.
TheH3loopaccuracyisvariableanddependsbothonlengthandVL–VHorientation.Looplengthisanimportantfactorintheaccuracyofdenovoloopmodelingmethodsbecausethesearchspaceincreasesexponentiallywitheachadditionalresidueintheloop.WeexpectaccuratemodelsofCDRH3loopsoflength14orless23,butthetop-scoringmodelmaynotbethemostaccurate.Wethereforerecommendusingalltenmodelsfordownstreamanalysis.InAMA-II,wefoundthatnon-nativeVL–VHorientationscanleadtoexplicitinteractionsbetweenthelightchainandtheCDRH3loopthatareindistinguishablefromnativeinteractions28.UsingmultipleVL–VHorientationtemplates18allowsbroaderexplorationofconformationalspace,samplingmorelow-scoringwells.ModelsgeneratedfromatleastthreedifferenttemplatesshouldbeusedtomaximizethechanceofcapturingthenativeVL–VHorientation.
ThroughbenchmarkingAb–Agdocking,wehavefoundthattheaccuracyofacomplexmodeldependsonthestartingconfigurationofthepartnersandtheaccuracyofthemodelsforeachpartner.SnugDocksampleslocalconformationspace,thusagoodstartingstructure(within8Å)generallyresultsinsamplinganear-nativeconformation.Equallyimportantisthequalityoftheinitialunboundmodels;near-nativemodelsenableincreaseddockingperformance(seeTable1:B-Brigidbody-dockingvs.U-Urigid-bodydocking).Wehavefoundthatdockingahomologymodeledantibodytothecrystalstructureoftheunboundantigentypicallyresultsinatleastonemodelofacceptablequalityinthetentop-scoringmodels(Table1).
AUTHORCONTRIBUTIONS
BDW,NM,SL,DK,JRJ,andJJGdevelopedthecurrentversionofRosettaAntibody.SLdevelopedROSIEandimplementedtheRosettaAntibodyandSnugDockserverapps.BDWimplementedSnugDockinRosetta3,JRJbenchmarkedSnugDock’sperformance.RFandNBwrotethe
.CC-BY-NC 4.0 International licensecertified by peer review) is the author/funder. It is made available under aThe copyright holder for this preprint (which was notthis version posted August 16, 2016. . https://doi.org/10.1101/069930doi: bioRxiv preprint
18
procedure,codifiedthemanualinterventionstepsdevelopedbyBDW,NM,andDK,andrecordedtiminginformation.BDW,NM,SL,JRJ,DK,RF,NBandJJGwrotethemanuscript.
ACKNOWLEDGMENTS
TheauthorswishtothankArvindSivasubramanian,AroopSircar,andSidharthaChaudhuryfortheirdevelopmentoftheoriginalRosettaAntibody,SnugDock,andEnsembleDockmethods.JianqingXurefactoredtheantibodycode.WealsothankthemembersoftheRosettaCommonsforthecontinueddevelopmentoftheRosettaSoftwareSuite.ROSIEsimulationsarecarriedout,inpart,withintheExtremeScienceandEngineeringDiscoveryEnvironment(XSEDE),whichissupportedbyNationalScienceFoundationgrantnumberACI-1053575.BW,NM,JRJandJJGaresupportedbyNationalInstitutesofHealthGrantR01GM078221.SLissupportedbyNationalInstitutesofHealthGrantR01GM73151.DKissupportedbytheDARPAAntibodyTechnologyProgram(HR-0011-10-1-0052)andtheJapanSocietyforthePromotionofScience(grantnumber15H06606).RFissupportedbytheSouth-EasternNorwayRegionalHealthAuthority(grantnumber850703-6051-39788).
COMPETINGFINANCIALINTERESTS
Theauthorsdeclarenocompetingfinancialinterests.AllrevenuegeneratedbylicensingRosettatofor-profitentitiesisinvestedintothecontinueddevelopmentofthesoftware.
REFERENCES
1. Georgiou,G.,Ippolito,G.C.,Beausang,J.,Busse,C.E.,Wardemann,H.&Quake,S.R.Thepromiseandchallengeofhigh-throughputsequencingoftheantibodyrepertoire.Nat.Biotechnol.32,158–168(2014).
2. Reichert,J.M.Antibodiestowatchin2016.MAbs8,197–204(2016).
3. Correia,B.E.,Bates,J.T.,Loomis,R.J.,Baneyx,G.,Carrico,C.,Jardine,J.G.,Rupert,P.,Correnti,C.,Kalyuzhniy,O.,Vittal,V.,etal.Proofofprincipleforepitope-focusedvaccinedesign.Nature507,201–6(2014).
4. Al-Lazikani,B.,Lesk,A.M.&Chothia,C.Standardconformationsforthecanonicalstructuresofimmunoglobulins.J.Mol.Biol.273,927–948(1997).
5. Sircar,A.&Gray,J.J.SnugDock:Paratopestructuraloptimizationduringantibody-antigendockingcompensatesforerrorsinantibodyhomologymodels.PLoSComput.Biol.6,e1000644(2010).
6. Ponomarenko,J.V&Bourne,P.E.Antibody-proteininteractions:benchmarkdatasetsandpredictiontoolsevaluation.BMCStruct.Biol.7,64(2007).
7. Kozakov,D.,Brenke,R.,Comeau,S.R.&Vajda,S.PIPER:AnFFT-basedproteindocking
.CC-BY-NC 4.0 International licensecertified by peer review) is the author/funder. It is made available under aThe copyright holder for this preprint (which was notthis version posted August 16, 2016. . https://doi.org/10.1101/069930doi: bioRxiv preprint
19
programwithpairwisepotentials.ProteinsStruct.Funct.Genet.65,392–406(2006).
8. Brenke,R.,Hall,D.R.,Chuang,G.Y.,Comeau,S.R.,Bohnuud,T.,Beglov,D.,Schueler-Furman,O.,Vajda,S.&Kozakov,D.Applicationofasymmetricstatisticalpotentialstoantibody-proteindocking.Bioinformatics28,2608–2614(2012).
9. Chen,R.,Li,L.&Weng,Z.ZDOCK:Aninitial-stageprotein-dockingalgorithm.ProteinsStruct.Funct.Genet.52,80–87(2003).
10. Krawczyk,K.,Baker,T.,Shi,J.&Deane,C.M.Antibodyi-Patchpredictionoftheantibodybindingsiteimprovesrigidlocalantibody-antigendocking.ProteinEng.Des.Sel.26,621–629(2013).
11. Sircar,A.,Chaudhury,S.,Kilambi,K.P.,Berrondo,M.&Gray,J.J.AgeneralizedapproachtosamplingbackboneconformationswithRosettaDockforCAPRIrounds13-19.ProteinsStruct.Funct.Bioinforma.78,3115–3123(2010).
12. Méndez,R.,Leplae,R.,Lensink,M.F.&Wodak,S.J.AssessmentofCAPRIpredictionsinrounds3-5showsprogressindockingprocedures.ProteinsStruct.Funct.Bioinforma.60,150–169(2005).
13. O’Meara,M.J.,Leaver-Fay,A.,Tyka,M.D.,Stein,A.,Houlihan,K.,Dimaio,F.,Bradley,P.,Kortemme,T.,Baker,D.,Snoeyink,J.,etal.Combinedcovalent-electrostaticmodelofhydrogenbondingimprovesstructurepredictionwithRosetta.J.Chem.TheoryComput.11,609–622(2015).
14. Coutsias,E.A.,Seok,C.,Jacobson,M.P.&Dill,K.A.Akinematicviewofloopclosure.J.Comput.Chem.25,510–528(2004).
15. Mandell,D.J.,Coutsias,E.A.&Kortemme,T.Sub-angstromaccuracyinproteinloopreconstructionbyrobotics-inspiredconformationalsampling.Nat.Methods6,551–552(2009).
16. Stein,A.&Kortemme,T.ImprovementstoRobotics-InspiredConformationalSamplinginRosetta.PLoSOne8,e63090(2013).
17. Johnson,G.&Wu,T.T.Kabatdatabaseanditsapplications:30yearsafterthefirstvariabilityplot.NucleicAcidsRes.28,214–8(2000).
18. Marze,N.A.&Gray,J.J.ImprovedpredictionofantibodyVL–VHorientation.ProteinEng.Des.Sel.AdvancedAccess,gzw013(2016).
19. Canutescu,A.A.&Dunbrack,R.L.Cycliccoordinatedescent:Aroboticsalgorithmforproteinloopclosure.ProteinSci.12,963–72(2003).
20. Wang,C.,Bradley,P.&Baker,D.Protein–ProteinDockingwithBackboneFlexibility.J.
.CC-BY-NC 4.0 International licensecertified by peer review) is the author/funder. It is made available under aThe copyright holder for this preprint (which was notthis version posted August 16, 2016. . https://doi.org/10.1101/069930doi: bioRxiv preprint
20
Mol.Biol.373,503–519(2007).
21. Bradley,P.,Misura,K.M.S.&Baker,D.TowardHigh-ResolutiondeNovoStructurePredictionforSmallProteins.Science.309,1868–1871(2005).
22. Misura,K.M.S.&Baker,D.Progressandchallengesinhigh-resolutionrefinementofproteinstructuremodels.ProteinsStruct.Funct.Genet.59,15–29(2005).
23. Weitzner,B.D.&Gray,J.J.AccuratestructurepredictionofCDRH3loopsenabledbyanovelstructure-basedC-terminal‘kink’constraint.InReview,(2016).
24. Gray,J.J.,Moughon,S.,Wang,C.,Schueler-Furman,O.,Kuhlman,B.,Rohl,C.A.&Baker,D.Protein–ProteinDockingwithSimultaneousOptimizationofRigid-bodyDisplacementandSide-chainConformations.J.Mol.Biol.331,281–299(2003).
25. Chaudhury,S.&Gray,J.J.ConformerSelectionandInducedFitinFlexibleBackboneProtein–ProteinDockingUsingComputationalandNMREnsembles.J.Mol.Biol.381,1068–1087(2008).
26. Kuroda,D.&Gray,J.J.Shapecomplementarityandhydrogenbondpreferencesinprotein-proteininterfaces:implicationsforantibodymodelingandprotein-proteindocking.Bioinformaticsbtw197(2016).doi:10.1093/bioinformatics/btw197
27. RosettaConstraintsDocumentation.https://www.rosettacommons.org/docs/wiki/rosetta_basics/Incorporating-Experimental-Data
28. Weitzner,B.D.,Kuroda,D.,Marze,N.,Xu,J.&Gray,J.J.BlindpredictionperformanceofRosettaAntibody3.0:Grafting,relaxation,kinematicloopmodeling,andfullCDRoptimization.ProteinsStruct.Funct.Bioinforma.82,1611–1623(2014).
29. ÓConchúir,S.,Barlow,K.A.,Pache,R.A.,Ollikainen,N.,Kundert,K.,O’Meara,M.J.,Smith,C.A.&Kortemme,T.AWebresourceforstandardizedbenchmarkdatasets,metrics,androsettaprotocolsformacromolecularmodelinganddesign.PLoSOne10,e0130433(2015).
30. Zemlin,M.,Klinger,M.,Link,J.,Zemlin,C.,Bauer,K.,Engler,J.A.,Schroeder,H.W.&Kirkham,P.M.ExpressedMurineandHumanCDR-H3IntervalsofEqualLengthExhibitDistinctRepertoiresthatDifferintheirAminoAcidCompositionandPredictedRangeofStructures.J.Mol.Biol.334,733–749(2003).
31. Kuroda,D.&Gray,J.J.Pushingthebackboneinprotein-proteindocking.InReview,(2016).
32. Schrödinger,L.ThePyMOLMolecularGraphicsSystem,Version1.8.(2015).
.CC-BY-NC 4.0 International licensecertified by peer review) is the author/funder. It is made available under aThe copyright holder for this preprint (which was notthis version posted August 16, 2016. . https://doi.org/10.1101/069930doi: bioRxiv preprint
21
33. Pettersen,E.F.,Goddard,T.D.,Huang,C.C.,Couch,G.S.,Greenblatt,D.M.,Meng,E.C.&Ferrin,T.E.UCSFChimera--avisualizationsystemforexploratoryresearchandanalysis.JComputChem25,1605–1612(2004).
34. Chen,V.B.,Davis,I.W.&Richardson,D.C.KiNG(Kinemage,NextGeneration):Aversatileinteractivemolecularandscientificvisualizationprogram.ProteinSci.18,2403–2409(2009).
35. Lyskov,S.,Chou,F.-C.,Conchúir,S.Ó.,Der,B.S.,Drew,K.,Kuroda,D.,Xu,J.,Weitzner,B.D.,Renfrew,P.D.,Sripakdeevong,P.,etal.ServerificationofMolecularModelingApplications:TheRosettaOnlineServerThatIncludesEveryone(ROSIE).PLoSOne8,e63906(2013).
36. North,B.,Lehmann,A.&Dunbrack,R.L.AnewclusteringofantibodyCDRloopconformations.J.Mol.Biol.406,228–256(2011).
37. RosettaPrepackProtocolDocumentation.https://www.rosettacommons.org/docs/wiki/application_documentation/docking/docking-prepack-protocol
.CC-BY-NC 4.0 International licensecertified by peer review) is the author/funder. It is made available under aThe copyright holder for this preprint (which was notthis version posted August 16, 2016. . https://doi.org/10.1101/069930doi: bioRxiv preprint
22
Table1.LocalAb–Agdockingbenchmarkresults.Co-crystalPDBIDsindicatethenativecomplex.PDBIDslistedunderthe“Type”columnalsoindicatetheuseofunbound(U)orbound(B)componentstructures(asavailable).ModelqualityisdefinedbytheCAPRIrankingcriteriarepresentedbyanumberofstarsorazero(0).Three,two,andonestar(s)indicatehigh,medium,andacceptablequality,respectively,andazeroindicatesincorrectmodels.Forthe“Rigid-Body”and“SnugDock”columns,thequalityofthelowest-scoringmodel,byinterfaceenergy,isreported(an“f”indicatesastrongenergyfunneldefinedasfiveormoreofthetenlowest-scoringmodelsbeingmediumqualityorbetter).EnsembleSnugDocksimulationswererunwithmulti-templategraftingandaCDRH3kinkconstraint.CAPRISummarylinessummarizemodelqualityforalltargets.CAPRISummaryTop10takesthehighest-qualitymodelfromthetenlowest-scoringmodels.
Co-crystal(PDBID) Type(Ab-Ag)
CDRH3Length
Rigid-BodyDockXtal
Rigid-BodyDockModel SnugDock
EnsembleSnugDock
1mlc U(1mlb)-U(1lza) 7 0 0 ** *1ahw U(1fgn)-U(1boy) 8 0 * * **1jps U(1jpt)-U(1tfh) 8 ** * * **1wej U(1qbl)-U(1hrc) 8 0 0 0 *1vfb U(1vfa)-U(8lyz) 8 * * 0 *1bql B-U(1dkj) 7 ***f 0 * 01k4c B-U(1jvm) 9 **f 0 * **2jel B-U(1poh) 9 ***f 0 ** *1jhl B-U(1ghl) 9 ***f 0 0 **1nca B-U(7nn9) 11 ***f 0 0 *2bdn B-B 8 ***f * * *1ynt B-B 9 ***f ***f **f ***f2aep B-B 9 ***f ** * 02b2x B-B 10 ***f * 0 **1ztx B-B 10 ***f * **f 0
CAPRISummaryTopDecoy 9***/2**/1* 1***/1**/6* 0***/4**/6* 1***/5**/6*CAPRISummaryTop10Decoys 9***/4**/2* 1***/6**/8* 1***/10**/4* 2***/11**/2*
No.ofFunnels 10 1 2 1
.CC-BY-NC 4.0 International licensecertified by peer review) is the author/funder. It is made available under aThe copyright holder for this preprint (which was notthis version posted August 16, 2016. . https://doi.org/10.1101/069930doi: bioRxiv preprint
23
Figure1.Aschematicofthemodelingprotocols.ThestructureontheleftshowstheFvantibodydomainspredictedbyhomologymodeling(heavychainindarkbluewithCDRH1andH2loopsinorangeandCDRH3loopinred;lightchaininyellowwithitsCDRloopsinlightblue).Thestructureontherightdepictsanantibody–antigenstructureoutputbydocking(antigeningreen).
HOMOLOGY MODELLING
ANTIBODY SEQUENCES
CRYSTAL STRUCTURE OF THE ANTIBODY
STRUCTURE OF THE ANTIGEN
ANTIBODY- ANTIGEN DOCKING
FINAL DOCKED MODELS
>H|antibody EVQLLESDGGLV... >L|antibody TDIMQSPSSLAV...
H2 H1
H3
L3L1
L2
.CC-BY-NC 4.0 International licensecertified by peer review) is the author/funder. It is made available under aThe copyright holder for this preprint (which was notthis version posted August 16, 2016. . https://doi.org/10.1101/069930doi: bioRxiv preprint
24
Figure2:Exampleoutputofplot_LHOC.py.ThetwoplotsshowdistributionsoftheHeavyOpeningAngle18asobtainedbyplot_LHOC.pyfortwodifferentantibodies.The10distinctlight-heavyorientationtemplatesarerepresentedbythecircles.Thetentop-scoringmodelsafterH3loopmodelingarerepresentedbythediamondswiththefillcolorcorrespondingtothestartingtemplate;inthelegend,thesepointsareorderedfromsmallesttolargestmetricvalue.ForAntibody_1,theanglessampledbyRosettaoverlapwiththeanglesobservedinantibody
.CC-BY-NC 4.0 International licensecertified by peer review) is the author/funder. It is made available under aThe copyright holder for this preprint (which was notthis version posted August 16, 2016. . https://doi.org/10.1101/069930doi: bioRxiv preprint
25
crystalstructures.Thetentop-scoringmodelsareclosetothecenterofthedistribution.InAntibody_2,mostoftheanglessampledarefoundrarelyornotatallinantibodycrystalstructures.Thetentop-scoringmodelsarealsoshiftedtolargeranglesthantypicallyfoundinantibodies.ForAntibody_2,theusermightconsidertryingalternatelight-heavyorientationtemplates(Step10).
.CC-BY-NC 4.0 International licensecertified by peer review) is the author/funder. It is made available under aThe copyright holder for this preprint (which was notthis version posted August 16, 2016. . https://doi.org/10.1101/069930doi: bioRxiv preprint
26
Box1|Increasingsamplingduringdockingbyincorporatingbackbonestructuralensembles.
InRosetta,anensembleisasetofdiscreteconformationsofaproteinstructure.SnugDockusesensemblestoapproximatebackboneconformationalflexibilitybysamplingconformationsfromtheensembleduringdocking.Throughthisapproach,notonlydoestheprotocolexploremoreconformationalspacethanstandarddocking,butitcanalsocompensateformodelerror,forexamplebyusinganensembleofmodelsproducedbyamodelingapproachinapreviousstepsuchasRosettaAntibody.
RosettaensemblescanbeconverteddirectlyfromNMRensembles,ortheycanbegeneratedusinganymethodthatinducesstructuraldiversity,suchasmoleculardynamicsorvariousRosettarefinementprotocols.Theensemblestypicallyspansmallstructuralvariationsof1-2ÅbackboneRMSD25.Rosetta'srelaxprotocol(unconstrained)21,22orKICprotocol14–16aresuggestedtogeneratedockingensemblesforantigens.Inaddition,RosettaAntibodycreatesensemblesofantibodiesbydefault.MoreonhowtogeneratinganddockingensemblescanbefoundinChauduryandGray25andinRosetta’sdocumentation37.
.CC-BY-NC 4.0 International licensecertified by peer review) is the author/funder. It is made available under aThe copyright holder for this preprint (which was notthis version posted August 16, 2016. . https://doi.org/10.1101/069930doi: bioRxiv preprint
27
Box2|Assessingantibodymodelingandantibody–antigendockingresults
Theusermustcriticallyanalyzecomputationalmodels.ModelsoutputbyRosettashouldberankedaccordingtoscore.Inmostsimulations,approximately90%ofthemodelswillbenon-native-like,andtheywilloccupyabulkscorerangeof30-50RosettaEnergyUnits(REU).Onlyabout1-5%ofmodelswillbenative-like,andtheywillhavescoresrangingfromwithinthebulkscorerangetoawellof5-10REUbelowthebulkscorerange.Ifaclusterofstructurallyrelatedmodelslieinthewellbelowthebulkscorerange,thesearelikelynative-like,withdeeperscoringandmorepopulatedwellsprovidinghigherconfidence.
Antibodymodelassessment.
First,assessthephysicalfeasibilityofthelowest-scoringmodelsbyeyeinamolecularvisualizationpackagesuchasPyMOL.Itisimportanttocheckforobviousflawsthatcanoccurinrarecircumstancessuchaspolypeptidechainbreaksorbackboneclashes,particularlywithintheCDRsandattheirgraftpoints.Theaccuracyofthenon-H3CDRloopsshouldbefurtherassessedbycomparingtheCDRclusterofthegraftedloopwiththeclusteroftheinputsequenceasidentifiedbyNorthetal.36(seestep5).Next,ensurethecomponentsoftheVL–VHorientationliewithinnature’sdistribution(step9),asmodelswithoutlyingorientationsarenotlikelytobenative-like;anexceptiontothisrulecanbemadeiftheVL–VHorientationgraftingtemplatesandRosettasamplingallliefartowardtheedgeofnature’sdistribution.
Ab–Agdockingmodelassessment.
Makesurethatthelowest-scoringmodelsmakegoodcontactsbetweentheantigenandtheantibodyparatope.Higherconfidencecanbeassignedtomodelswithlarge(~1200Å2),complementaryinterfaces,26aswellasthoseinwhichtheH3CDRloopmakesseveralspecificcontacts.Ifexperimentaldatasuggestanantigenbindingsite,ensuretheparatopecontactsatthissite.
.CC-BY-NC 4.0 International licensecertified by peer review) is the author/funder. It is made available under aThe copyright holder for this preprint (which was notthis version posted August 16, 2016. . https://doi.org/10.1101/069930doi: bioRxiv preprint
28
Box3|UsingRosettaondifferentplatformsandrunninginparallel.
Rosettaondifferentplatforms.Throughoutthisprotocolexecutablesaresuffixedbytheplatformandmodeforwhichtheywerecompiled(i.e.antibody.macosclangreleaseindicatesthattheantibodyexecutablewascompiledonaMacOSoperatingsystemusingtheClangcompileranditwascompiledinreleasemode).Thesuffixishighlightedinorangethroughout(.macosclangrelease).Onotherplatformsyouwillreplacethisstringwithyouroperatingsystemandcompiler(forexample,GNU/Linuxplatformswithgccasthecompilerwilldefaultto.linuxgccrelease).Additionally,thesuffixisprefixedby.mpi (.mpi.linuxgccrelease)whentheexecutableisbuiltforthemessagepassinginterface(MPI)byanMPIcompiler.MPI-compatibleexecutablescancommunicatewithoneanotherforparallelprocessing,andsomeRosettaexecutablesuseMPInon-trivially.However,moststandardRosettaapplicationsaretriviallyparallelizable(“embarrassinglyparallel”)andthuscapableofrunningonbothMPIandnon-MPIsystems.Runninginparallel.Anexampleofhowtolocallyrunanon-MPIexecutableinparallelisgiveninstep8.Ingeneral,addthe-multiple_processes_writing_to_one_directoryflagtoyourcommandline,andthenexecutemultipleinstancesoftheprocess.ThisprocedureworksonasingledesktopcomputerwithmultipleCPUsorremotelyonasupercomputercluster.However,runningaRosettaexecutableonaclusterstronglydependsonthehardwareconfigurationandavailablesoftware(e.g.workloadmanagementsoftware).
Forexample,torunanon-MPIexecutableviaHTCondor:(1)savethestandardcommandlineasanexecutablebashscript,(2)writeasubmitdescriptionfilespecifyingtheexecutablebashscriptandthenumberofprocessestoexecute,and(3)usethecondor_submitcommandwiththedescriptionfileasanargumenttosubmityourjobstothecluster.
Ontheotherhand,MPIexecutablescanberuninparallellocallybyprependingthecommandlinewiththempirun –n XX command,whereXXisthenumberofprocessestorun,ifyourmachineisconfiguredtousetheOpenMPIlibrary.Again,theexactdependonthespecificclusterconfiguration.Forexample,torunanMPIexecutableonStampedeviatheslurmworkloadmanager:(1)savethestandardcommandlineasanexecutablebashscript,(2)writeaslurmbatchscriptspecifyingtheexecutablebashscriptandthenumberoftasks,and(3)usethesbatchcommandwiththebashscriptasanargumenttosubmityourjobstothecluster.
.CC-BY-NC 4.0 International licensecertified by peer review) is the author/funder. It is made available under aThe copyright holder for this preprint (which was notthis version posted August 16, 2016. . https://doi.org/10.1101/069930doi: bioRxiv preprint