10
“SynRFR˙PNAS˙FINAL˙revision” — 2016/6/16 — 16:47 — page 1 — #1 A new class of synthetic beta-solenoid proteins with the fragment-free computational design of a beta-hairpin extension James T. MacDonald * , Burak V. Kabasakal , David Godding § , Sebastian Kraatz , Louie Henderson , James Barber , Paul S. Freemont * and James W. Murray * Centre for Synthetic Biology and Innovation, Imperial College London, London, SW7 2AZ, UK, Department of Medicine, Imperial College London, London, SW7 2AZ, UK, Department of Life Sciences, Imperial College London, London, SW7 2AZ, UK, § Department of Plant Sciences, University of Cambridge, Cambridge, CB2 3EA UK, and Laboratory of Biomolecular Research, Paul Scherrer Institut, CH-5232 Villigen PSI, Switzerland Submitted to Proceedings of the National Academy of Sciences of the United States of America The ability to design and construct structures with atomic level pre- cision is one of the key goals of nanotechnology. Proteins offer an attractive target for atomic design, as they can be synthesized chem- ically or biologically, and can self-assemble. However the generalized protein folding and design problem is unsolved. One approach to simplifying the problem is to use a repetitive protein as a scaffold. Repeat proteins are intrinsically modular, and their folding and struc- tures are better understood than large globular domains. Here, we have developed a new class of synthetic repeat protein, based on the pentapeptide repeat family of beta-solenoid proteins. We have constructed length variants of the basic scaffold, and computation- ally designed de novo loops projecting from the scaffold core. The experimentally solved 3.56 ˚ A resolution crystal structure of one de- signed loop matches closely the designed hairpin structure, showing the computational design of a backbone extension onto a synthetic protein core without the use of backbone fragments from known structures. Two other loop designs were not clearly resolved in the crystal structures and one loop appeared to be in an incorrect confor- mation. We have also shown that the repeat unit can accommodate whole domain insertions by inserting a domain into one of the de- signed loops. computational protein design | synthetic repeat proteins | de novo backbone design Abbreviations: RFR, repeat five residues; RMSD, root-mean-square deviation D uring the course of evolution, natural proteins may be re- cruited to new unrelated functions conferring a selective advantage to the organism [1,2]. This accretion of new features and functions is likely to have left behind complex interlocking amino acid dependencies which can make reengineering natu- ral proteins difficult and unpredictable [3]. For this reason, we and others hypothesize that it is more desirable to design de novo proteins as these provide a biologically-neutral platform onto which functional elements can be grafted [4]. Artificial proteins have been designed by decoding simple residue pat- terning rules that govern the packing of secondary structural elements and this has been particularly successful for α-helical bundle proteins [5–7]. An alternative approach is to assemble de novo folds from backbone fragments of known structures or idealized secondary structural elements and use computa- tional protein design methods to design the sequence [4,8–10]. Both the computational and the simpler rules-based design approaches have concentrated on designing proteins consisting of canonical secondary structure linked with loops of minimal length. A class of proteins that has attracted considerable inter- est is artificial proteins based on repeating structural motifs due to their intrinsic modularity and designability [11]. Re- peat proteins have applications including their use as novel nanomaterials [12–14] and as scaffolds for molecular recog- nition [15, 16]. These proteins may be designed using both sequence consensus-based rules [17] or computational protein design methods [18, 19]. There are a number of families of beta-helical repeat proteins [20], from which we chose the pen- tapeptide repeat family, forming the RFR-fold (repeat five residues), which has a square cross-sectional profile, as the basis for the design of a new class of synthetic repeat protein (Fig. 1 A and B) [21]. The RFR-fold has a number of properties that make it at- tractive as a substrate for design. The structure is unusually regular, but is able to tolerate a wide range of residues on the outside of the solenoid barrel. The solenoids in natural RFR- fold proteins are nearly straight in contrast to several other forms of repeat protein such as the leucine rich repeat (LRR) which are highly curved. There are examples of natural RFR- fold proteins with loop extensions projecting from the barrel, making this class of protein particularly suitable for function- alization. The protein is similar in diameter to DNA, and some RFR-fold proteins are thought to play a role as DNA mimics [22]. Here, we have designed and solved the structures of a number of artificial RFR-fold proteins of different lengths. Previously, computationally designed enzymes have reused backbone scaffolds from known natural proteins [23–25], al- though artificial helical bundle proteins have been functional- Significance The development of algorithms to design new proteins with backbone plasticity is a key challenge in computational protein design. In this paper, we describe a novel class of extensible synthetic repeat protein scaffolds with computationally designed variable loops projecting from the central core. We have devel- oped new methods to computationally sample backbone confor- mations using a coarse-grained potential energy function with- out using backbone fragments from known protein structures. This was combined with existing methods for sequence design to successfully design a loop at atomic level precision. Given the inherent modular and composable nature of repeat proteins, this approach allows the iterative atomic-resolution design of complex structures with potential applications in novel nanoma- terials and molecular recognition. Reserved for Publication Footnotes www.pnas.org/cgi/doi/??? PNAS Issue Date Volume Issue Number 17

A new class of synthetic beta-solenoid proteins with the ......basis for the design of a new class of synthetic repeat protein (Fig. 1 A and B) [21]. The RFR-fold has a number of properties

  • Upload
    others

  • View
    3

  • Download
    0

Embed Size (px)

Citation preview

Page 1: A new class of synthetic beta-solenoid proteins with the ......basis for the design of a new class of synthetic repeat protein (Fig. 1 A and B) [21]. The RFR-fold has a number of properties

ii

“SynRFR˙PNAS˙FINAL˙revision” — 2016/6/16 — 16:47 — page 1 — #1 ii

ii

ii

A new class of synthetic beta-solenoid proteinswith the fragment-free computational design of abeta-hairpin extensionJames T. MacDonald ∗ †, Burak V. Kabasakal ‡, David Godding ‡ §, Sebastian Kraatz ‡ ¶, Louie Henderson ‡ , James Barber‡ , Paul S. Freemont ∗ † and James W. Murray ‡

∗Centre for Synthetic Biology and Innovation, Imperial College London, London, SW7 2AZ, UK,†Department of Medicine, Imperial College London, London, SW7 2AZ,

UK,‡Department of Life Sciences, Imperial College London, London, SW7 2AZ, UK,§Department of Plant Sciences, University of Cambridge, Cambridge, CB2 3EA UK, and¶Laboratory of Biomolecular Research, Paul Scherrer Institut, CH-5232 Villigen PSI, Switzerland

Submitted to Proceedings of the National Academy of Sciences of the United States of America

The ability to design and construct structures with atomic level pre-cision is one of the key goals of nanotechnology. Proteins offer anattractive target for atomic design, as they can be synthesized chem-ically or biologically, and can self-assemble. However the generalizedprotein folding and design problem is unsolved. One approach tosimplifying the problem is to use a repetitive protein as a scaffold.Repeat proteins are intrinsically modular, and their folding and struc-tures are better understood than large globular domains. Here, wehave developed a new class of synthetic repeat protein, based onthe pentapeptide repeat family of beta-solenoid proteins. We haveconstructed length variants of the basic scaffold, and computation-ally designed de novo loops projecting from the scaffold core. Theexperimentally solved 3.56 A resolution crystal structure of one de-signed loop matches closely the designed hairpin structure, showingthe computational design of a backbone extension onto a syntheticprotein core without the use of backbone fragments from knownstructures. Two other loop designs were not clearly resolved in thecrystal structures and one loop appeared to be in an incorrect confor-mation. We have also shown that the repeat unit can accommodatewhole domain insertions by inserting a domain into one of the de-signed loops.

computational protein design | synthetic repeat proteins | de novo backbone

design

Abbreviations: RFR, repeat five residues; RMSD, root-mean-square deviation

During the course of evolution, natural proteins may be re-cruited to new unrelated functions conferring a selective

advantage to the organism [1,2]. This accretion of new featuresand functions is likely to have left behind complex interlockingamino acid dependencies which can make reengineering natu-ral proteins difficult and unpredictable [3]. For this reason, weand others hypothesize that it is more desirable to design denovo proteins as these provide a biologically-neutral platformonto which functional elements can be grafted [4]. Artificialproteins have been designed by decoding simple residue pat-terning rules that govern the packing of secondary structuralelements and this has been particularly successful for α-helicalbundle proteins [5–7]. An alternative approach is to assemblede novo folds from backbone fragments of known structuresor idealized secondary structural elements and use computa-tional protein design methods to design the sequence [4,8–10].Both the computational and the simpler rules-based designapproaches have concentrated on designing proteins consistingof canonical secondary structure linked with loops of minimallength.

A class of proteins that has attracted considerable inter-est is artificial proteins based on repeating structural motifsdue to their intrinsic modularity and designability [11]. Re-peat proteins have applications including their use as novelnanomaterials [12–14] and as scaffolds for molecular recog-nition [15, 16]. These proteins may be designed using both

sequence consensus-based rules [17] or computational proteindesign methods [18, 19]. There are a number of families ofbeta-helical repeat proteins [20], from which we chose the pen-tapeptide repeat family, forming the RFR-fold (repeat fiveresidues), which has a square cross-sectional profile, as thebasis for the design of a new class of synthetic repeat protein(Fig. 1 A and B) [21].

The RFR-fold has a number of properties that make it at-tractive as a substrate for design. The structure is unusuallyregular, but is able to tolerate a wide range of residues on theoutside of the solenoid barrel. The solenoids in natural RFR-fold proteins are nearly straight in contrast to several otherforms of repeat protein such as the leucine rich repeat (LRR)which are highly curved. There are examples of natural RFR-fold proteins with loop extensions projecting from the barrel,making this class of protein particularly suitable for function-alization. The protein is similar in diameter to DNA, andsome RFR-fold proteins are thought to play a role as DNAmimics [22]. Here, we have designed and solved the structuresof a number of artificial RFR-fold proteins of different lengths.

Previously, computationally designed enzymes have reusedbackbone scaffolds from known natural proteins [23–25], al-though artificial helical bundle proteins have been functional-

Significance

The development of algorithms to design new proteins withbackbone plasticity is a key challenge in computational proteindesign. In this paper, we describe a novel class of extensiblesynthetic repeat protein scaffolds with computationally designedvariable loops projecting from the central core. We have devel-oped new methods to computationally sample backbone confor-mations using a coarse-grained potential energy function with-out using backbone fragments from known protein structures.This was combined with existing methods for sequence designto successfully design a loop at atomic level precision. Giventhe inherent modular and composable nature of repeat proteins,this approach allows the iterative atomic-resolution design ofcomplex structures with potential applications in novel nanoma-terials and molecular recognition.

Reserved for Publication Footnotes

www.pnas.org/cgi/doi/??? PNAS Issue Date Volume Issue Number 1–7

Page 2: A new class of synthetic beta-solenoid proteins with the ......basis for the design of a new class of synthetic repeat protein (Fig. 1 A and B) [21]. The RFR-fold has a number of properties

ii

“SynRFR˙PNAS˙FINAL˙revision” — 2016/6/16 — 16:47 — page 2 — #2 ii

ii

ii

ized using an intuitive manual design process [26–28]. As thefield of enzyme design becomes more ambitious it is likely thatconsideration of backbone plasticity will become increasinglyimportant [29]. Backbone conformations from solved proteinstructures are guaranteed to be designable as there is at leastone sequence known to fold into that structure. However, thisis unlikely to be true for an arbitrary backbone conformation.The incorporation of backbone flexibility in protein design hasbeen recognized as a key challenge in computational proteindesign [30] with current methods typically reusing backbonefragments from other known protein structures [31, 32]. Re-cently, we have developed algorithms to rapidly sample loopconformations using a coarse-grained Cα model [33] and toaccurately reconstruct proteins backbones [34] as part of anapproach that often gave sub-A RMSD loop predictions [35].In this paper, we have applied these techniques to de novobackbone design without using fragments from known proteinstructures while also explicitly considering alternative confor-mational states. We were able to solve the structures of fourloop design proteins using X-ray crystallography and showthat one of these structures matched the design at atomiclevel accuracy.

ResultsDesign of synthetic RFR-fold proteins of variable length.Residue frequency tables were derived from known RFR-foldproteins for each of the five positions in the repeat giving theconsensus sequence ADLSG (Fig. 1C and SI Appendix, Ta-ble S2). A 120 residue stochastic repeat sequence (24 repeatsor six superhelical turns) was drawn from the frequency ta-bles and combined with N- and C-terminal capping sequencesto protect the hydrophobic core from solvent exposure. TheC-terminal cap also incorporated a dimer interface from theparent protein as a first step towards lattice and multimer de-sign. The initial synthetic protein was named SynRFR24.1.Single turns were removed or added to create variant pro-teins of different lengths, SynRFR20.1 and SynRFR28.1. Allthree proteins were easily expressed and purified using stan-dard techniques, and were found to crystallize in a variety ofdifferent crystal forms (Table 1 and SI Appendix, Table S1). Afourth variant protein, SynRFR24.2, was constructed with twoamino acid changes (D196S and R198H). This crystallized ina new crystal form, not observed for the SynRFR24.1 protein,probably because the large arginine 198 side-chain blocked acrystal contact. All SynRFR proteins formed dimers in thecrystal lattice (Fig. 1D) and in solution (SI Appendix, Fig.S2). Thermal stability measurements of these variable lengthproteins using a thermofluor assay showed melting tempera-tures of between 65 to 73 ◦C that did not appear to be corre-lated with repeat length (SI Appendix, Fig. S5).

Computational design of de novo loops. Given the inherentmodularity and ease of expression, we decided to test whetherthese proteins could serve as an extended scaffold base for thedesign of de novo backbone embellishments as a step towardsfunctionalization. Taking the 1.8 A resolution SynRFR24.1crystal structure (PDB: 4YC5) as the base scaffold, an eightresidue insertion was created approximately midway along thestochastic repeat region of the protein (between residues 108-109). This loop length was chosen on the basis of the ac-curacy of previous loop structure prediction results [35]. 4000backbone sequence-independent loop conformations were sam-pled using the PD2 loop model software with no externallyimposed restraints on secondary structure or any other fea-ture (Fig. 2A). Briefly, the method samples plausible back-bone loop conformations from a sequence-independent coarse-

grained Cα potential energy function and then reconstructsother backbone atoms using a structural alphabet-based algo-rithm [34]. The Cα potential energy function includes pseudo-bond length, bond angle, and dihedral terms to ensure goodlocal structure together with soft steric repulsive and pseudo-hydrogen bonding terms. Loop conformations were sampledby successive simulated annealing Monte Carlo runs followedby full backbone reconstruction. Previously, this method wassuccessfully applied to loop prediction giving results that werecomparable to fragment replacement-based methods despitethe sequence-independence of the initial backbone conforma-tional sampling [35]. Coarse-grained loop sampling was fol-lowed by sequence design using Rosetta [36] on each of theconformations to generate full-atom models.

Selection of designed loops. A significant proportion of the4000 conformations were likely not designable so we devel-oped an approach that explicitly considered alternative lowenergy conformational states in order to filter out bad de-signs. Each of the 4000 designed sequences was threaded ontoeach of the 4000 loop conformations then gradient minimizedin the Rosetta force-field with the resulting energy and RMSDto the designed structure recorded (Fig. 2 B and C). With theassumption that we have sampled the important low energystates, we filtered the designs based on the probability that adesign is in a folded state, Pi > 0.9 (equation [1]; Fig. 2D),calculated using the Boltzmann distribution, and other crite-ria (see Methods). The criterion that Pi > 0.9 removed 97.9% of designs by itself.

Crystal structures of designed loop proteins. Of the ten loopextension designs selected for experimental characterization,five could be expressed and purified, and crystal structureswere obtained for four (Table 1). Of these structures, Syn-RFR.t1428 was solved at 3.56 A resolution and showed clearunbiased electron density that unambiguously matched thedesigned loop embellishment after molecular replacement us-ing a model with the loop region excised (SI Appendix, Fig.S1). After refinement, the hairpin loop region residues (108 to117) very closely matched the design with an all-atom RMSDvalue of 0.71 A for the best chain (Fig. 3 A and B). The loopregion forms a crystal contact with the non-crystallographicsymmetry copy of itself leading to a higher-order assembly inthe crystal lattice (Fig. 3C) but in solution, the protein wasdimeric (SI Appendix, Fig. S3). The hairpin loop structureof SynRFR.t1428 forms a type I beta-turn with a proline, atryptophan and a tyrosine forming a mini-hydrophobic core.A similar tyrosine and tryptophan stacking motif, albeit in dif-ferent relative positions, can be seen in a designed beta-sheetprotein with type I’ beta-turns, Betanova, which was foundto fold cooperatively in aqueous solution despite having noreal hydrophobic core [37]. A previous study also engineeredan extended beta-hairpin on an SH3 domain using sequencesfrom a model peptide system in order to determine its effecton folding [38].

Of the other loop designs, SynRFR.t1555, solved at 4.4 Aresolution, showed electron density consistent with the de-signed loop conformation but the resolution was too low tobe conclusive (SI Appendix, Fig. S1). SynRFR.t801 had elec-tron density over the entire loop that was clearly different tothe design and had the same type III crystal form as one ofthe SynRFR24.1 structures (Table 1, Fig. 4). The density forthe SynRFR.t3284 loop was not resolvable beyond the firstfew residues but had the same type IV crystal form observedfor SynRFR24.2. Thermal stability assays of the loop variantproteins showed slighly lower melting temperatures compared

2 www.pnas.org/cgi/doi/??? MacDonald et al.

Page 3: A new class of synthetic beta-solenoid proteins with the ......basis for the design of a new class of synthetic repeat protein (Fig. 1 A and B) [21]. The RFR-fold has a number of properties

ii

“SynRFR˙PNAS˙FINAL˙revision” — 2016/6/16 — 16:47 — page 3 — #3 ii

ii

ii

the length variant proteins ranging from 54 to 60 ◦C (SI Ap-pendix, Fig. S6).

Loop energy landscape. In order to further characterize theloop energy landscape and understand our results, an extra16,000 loop conformations were sampled with weak harmonicCα coordinate restraints to the solved crystal structures us-ing the PD2 loop model software followed by gradient energyminimization using Rosetta for each of the loops with electrondensity in the loop region. These extra samples are shownas green points in Fig. 2C. The structure of SynRFR24.t801was found to be in a completely different conformation to thedesigned structure, however a new potential energy minimumnear to the experimentally solved structure was not observed.Although the general path of the SynRFR24.t801 loop back-bone could be traced in the electron density, it was not wellresolved so the restrained resampling procedure may not havesampled the correct region of conformational space. Anotherpotential source of error is that the energy minimization proto-col did not permit bond angle or bond length flexibility whichcould be important for the accurate modelling of the energylandscape [39]. Alternatively, this may indicate that the po-tential energy function can be further improved. The energylandscape for SynRFR24.t1428 supports a minimum aroundthe designed structure but SynRFR24.t1555 appears to havea broad minimum 1-2 A RMSD from the designed structure.This indicates conformational flexibility for this design andmay explain why the loop was not well-ordered in the crystalstructure.

Crystal lattice packing. Several of the synRFR proteins crys-tallized in the same crystal form. For example the Syn-RFR.t801 crystal form is the same as the type III P3221 Syn-RFR24.1 structure, but the loop projects into the solvent voids(Fig. 4 A and B). The SynRFR24.2 I222 (form IV) structurehas a packing motif of a bundle of three dimers with D3 sym-metry, which is also found in the SynRFR.t3284 structure,in which the loops project into the solvent voids (Fig. 4 Cand D). The C-terminal dimer axis exhibits flexibility, withthe angle between the solenoid axes varying between 157◦ and168◦ in the different structures, and there is also variationwithin a single crystal form. Such flexibility may assist in as-sembling future nanostructures. The large surface area andwide allowed variability within the RFR consensus repeat ofthe SynRFR solenoid should allow for fine control of latticecontact points, enabling precise lattice and multimer designin future constructs.

Whole domain insertions into the solenoid scaffold. To testwhether the extended beta-solenoid structure can be deco-rated with whole domain embellishments, two variants withsuperfolder green fluorescent protein (sfGFP) domain inser-tions were created. Taking the SynRFR24.t1428 protein asthe template, a sfGFP domain was inserted between the loopresidues P112 and W113 with additional glycine/serine link-ers to connect to the termini of the sfGFP domain. A secondvariant was simultaneously created with a W113A mutationin case the large hydrophobic tryptophan caused unwantedinteractions. Both proteins were found to be well-expressed,soluble and fluorescent. The proteins were found to be dimericin solution (SI Appendix, Fig. S4), suggesting that the C-terminal cap of the solenoid was still folded and able to forma dimer interface. It is unlikely that dimerization is mediatedby the sfGFP domain as this is monomeric [40]. These datasuggest that the solenoid is continuous, and has accomodatedthe large domain insertion.

DiscussionWe have described the design and construction of a seriesof variable-length synthetic beta-solenoid RFR-fold proteins,which are capable of hosting computationally designed loopsthat decorate the structure. Initial results suggest entire pro-tein domains can also be inserted. To our knowledge, theseare the only artificial versions of this class of protein to havebeen created to date. The synthetic protein scaffolds crys-tallize in a variety of crystal forms, some identical betweendifferent protein designs. The regular extensible linear struc-ture and DNA-like dimensions make synRFR proteins poten-tial building blocks in the emerging field of protein origamias well as for co-assembling DNA-protein nanomaterials [41].Proteins have several advantages over DNA in that they havemany more functional groups for derivatization, are chemicallyricher and are capable of self-assembling in vivo without com-plex annealing protocols. The variety of crystal forms is alsoa first step towards crystal lattice design, enabling the con-struction of functional zeolite-like porous bioreactive materi-als. Multi-component designs of solenoids with ends capableof forming different multimers could also be used to constructclosed cages [42] or extended complex lattices.

Here, we have been able to computationally design an au-tomatically generated free-form de novo backbone embellish-ment on a de novo repeat scaffold without using backbonefragments from known protein structures. The ability to sam-ple plausible and designable backbone conformations directlyfrom a coarse-grained potential energy function rather thanusing fragment insertion permits the incorporation of func-tional geometric constraints and the use of sophisticated sam-pling techniques during the design process. In this work, wehave developed a method to select promising loop designs byusing the alternative sampled backbone conformations as de-coys and the Boltzmann distribution to rank the designs. It isprobable that very few short single loop projections into sol-vent from the solenoid core are designable and able to fold intowell-defined rigid structures due to the lack of opportunity toform a well-packed core. Multiple surface loop projections aremore likely to form stable well-defined structures and couldbe iteratively designed using successful single loop designs asstarting points.

We have shown that the beta-solenoid scaffold may be ca-pable of hosting whole domain insertions within the repeatunits by inserting a sfGFP domain into the loop of Syn-RFR24.t1428. This could prove useful by providing a rigidscaffold as a basis for large artificial multi-enzyme complexes[43].

These advances provide a solid basis for the design of func-tionalised extensions, of single and multiple loops, to be incor-porated into new crystal lattices and oligomers. The abilityof the SynRFR proteins to act as stable platforms for variableloops may also prove useful for molecular recognition applica-tions [15]. In the future we can envisage more complex multi-ple loop decorations, including co-factor binding sites, enzymeactive sites, and complete protein domains.

Materials and MethodsDesign of beta-solenoid repeats. Residue frequency tables were derived from

known Repeat Five Residue (RFR) proteins then manually edited to remove cysteine

and proline residues, and to ensure alanine at position 1 and leucine at position 3.

These were found to have a consensus repeat sequence of ADLSG. A stochastic re-

peat sequence was created by drawing residues from the residue frequency table. The

N-terminal cap, including a cleavable hexahistidine tag, (sequence: MGSSHHHHHH

SSGLVPRGSHMNVGEILRHYAAGKRNFQHINLQEIELTNASLTGADLSY) was taken

from the HetL protein from Nostoc sp. Strain PCC7120 (PDB: 3DU1) and the C-

terminal cap (sequence: ADLSGARTTGARLDDADLRGATVDPVLWRTASLVGARV

MacDonald et al. PNAS Issue Date Volume Issue Number 3

Page 4: A new class of synthetic beta-solenoid proteins with the ......basis for the design of a new class of synthetic repeat protein (Fig. 1 A and B) [21]. The RFR-fold has a number of properties

ii

“SynRFR˙PNAS˙FINAL˙revision” — 2016/6/16 — 16:47 — page 4 — #4 ii

ii

ii

DVDQAVAFAAAHGLCLAGGSGC) was taken from the MfpA protein from M. tu-berculosis (PDB: 2BM4). The C-terminal cap forms a homodimeric interface in

all the crystal structures. A one-turn solenoid extension variant, synRFR28.1, was

created by modelling an extra 20 residue turn into the solenoid structure after residue

139. The model was created by superimposing the highest resolution SynRFR24.1

crystal structure (PDB: 4YC5) onto itself with a one-turn shift. Two halves from

each structure were then recombined to create the final extended solenoid model.

Sequences for the one turn insertion were designed using RosettaDesign permitting

only residues that appear in the residue frequency table. A one-turn solenoid deletion,

synRFR20.1, was created by deleting 20 residues from SynRFR24.1 (∆120-139).

Computational loop design. Using the highest resolution type I SynRFR24.1

crystal structure as the base scaffold (PDB: 4YC5), an 8 residue insert was created

between residues 108 and 109 at a “corner” of the square solenoid repeat. 4000 back-

bone loop conformations were sampled using algorithms we have previously developed

and implemented in the PD2 software package [34, 35]. Using the Rosetta3 software

package [36], a sequence was designed for each of the loop structures with a proto-

col that cycles through rounds of sequence design and gradient energy minimisation

(FlxbbDesign). The amino acid identites of residues immediately adjacent to the loop

were also allowed to vary in addition to the loop region itself. Each of the 4000 se-

quences was threaded onto all 4000 structures and gradient energy minimised using

the FastRelax protocol. For both the design and relaxation protocols, the Talaris2013

scoring function was used. Good sequence designs were expected to have the lowest

potential energies close to the desired loop conformation. Assuming the loops follow

the Boltzmann distribution, the sequence threading calculations permitted the ranking

of each design by explicitly considering alternative low energy conformational states

using equation [ 1 ].

Pi =

∑j∈F

e−Ei(j)

kBT

N∑j=1

e−Ei(j)

kBT

[ 1 ]

where Pi was the probability of the designed loop, i, being in the desired folded

conformation, Ei(j) was the gradient minimised energy of sequence i on structure

j, F was the set of correctly folded loops (defined as less than 1 A RMSD from

the designed structure), N was 4000 (i.e. all sampled conformations). A list of

10 designs for experimental characterisation was selected by picking structures with

Pi > 0.9, the lowest folded loop energy less than the mean lowest folded loop en-

ergy (< −324.8) Rosetta Energy Units (REU), the energy gap between the lowest

energy structure < 1 A RMSD and the lowest energy structure > 1 A RMSD being

< −2 REU, RosettaHoles score < 2.3, and with no residues in forbidden regions

of the Ramachandran plot.

Cloning, expression and purification. The SynRFR24.1 gene sequence was codon

optimised, synthesised and cloned into the pET11a expression plasmid by GeneArt.

Turns were added and deleted using PCR followed by recircularisation using Gibson

assembly or restriction enzyme digestion and ligation. Variable loop regions were sup-

plied as linear DNA gBlocks from IDT, the original SynRFR24.1 plasmid (including

the non-variable parts of the coding sequence) was linearised by PCR and the final

construct formed using In-Fusion HD (Clontech Laboratories, Inc.). 100 µg/ml ampi-

cillin was used for selection in all media. All ligation and assembly reactions were

transformed into the Escherichia coli strain NEB10β (New England Biolabs)

and grown overnight on LB Agar medium. Colonies were picked and grown overnight

in 5 ml Lysogeny Broth (LB) medium, plasmid miniprepped (QIAprep Spin Miniprep

Kit, Qiagen) and sequence verified (Eurofins Genomics) using the standard T7 and

T7 terminator primers. Verified plasmids were transformed into chemically competent

BL21-Gold DE3 (Agilent Technologies) or KRX cells (Promega). For each SynRFR

variant, 1 l LB or Terrific Broth (TB) medium was inoculated with 1 ml from 5 ml

overnight cultures. The cultures were grown until an OD600 reading of 0.6 whereupon

expression was induced with 1 mM IPTG or 0.1% rhamnose for KRX cells. After 4

hours of induction, the cells were harvested and resuspended in lysis buffer (100 mM

bicine and 150 mM NaCl buffer titrated to pH 9.0 with NaOH) with EDTA-free SIG-

MAFAST protease inhibitor cocktail tablets (Sigma). The cells were sonicated and

clarified by spinning at 40,000 RCF for 40 minutes. The proteins were purified with a

Ni-NTA column, washed with 100 mM bicine, 150 mM NaCl, 25mM imidazole at pH

9.0 and eluted in 100 mM bicine, 150 mM NaCl, 250mM imidazole at pH 9.0. The

proteins were further purified by gel filtration using Superdex75 HiLoad 16/60 (GE

Healthcare) or Superdex200 HiLoad 16/60 (GE Healthcare) columns. Proteins were

concentrated in bicine buffer using 10 kDa cutoff centrifugal concentrators (Millipore).

X-ray crystallography. The proteins were concentrated to∼10 mg/ml and used to

set up vapour diffusion sparse-matrix crystallization trials with a TTP mosquito robot.

Crystals were optimised in manually set up trays where necessary. Crystals were cryo-

protected in the mother liquor and 30% volume added of glycerol or PEG400, were

flash-cooled in liquid nitrogen and stored for data collection. Diffraction data were

collected at Diamond Light Source synchrotron with the exception of SynRFR24.2

which was collected with an in-house rotating-anode source.

ACKNOWLEDGMENTS. The authors would like to thank Diamond Light Sourcefor beamtime (proposals mx7299, mx1227, mx9424). BBSRC through a Eurocores(funding JTM) grant number: BB/J010294/1. EPSRC through the Syntegron project(funding JTM) grant numer: EP/K034359/1. JWM was funded by a Biotechnol-ogy and Biological Sciences Research Council (BBSRC) David Phillips Fellowship(BB/F023308/1). The Imperial College High Performance Computing cluster wasused for computational protein design. Ciaran McKeown and Kirsten Jensen arethanked for help and support.

1. Aharoni A et al. (2005) The ’evolvability’ of promiscuous protein functions. Nature

Genetics 37(1):73–76.

2. Toscano M, Woycechowsky K, Hilvert D (2007) Minimalist Active-Site Redesign:

Teaching Old Enzymes New Tricks. Angewandte Chemie International Edition

46(18):3212–3236.

3. Dutton PL, Moser CC (2011) Engineering enzymes. Faraday Discussions 148:443–448.

4. Lin YR et al. (2015) Control over overall shape and size in de novo designed proteins.

Proc Natl Acad Sci U S A pp. E5478–E5485.

5. Regan L, DeGrado WF (1988) Characterization of a helical protein designed from first

principles. Science 241(4868):976–978.

6. Kamtekar S, Schiffer JM, Xiong H, Babik JM, Hecht MH (1993) Protein design by

binary patterning of polar and nonpolar amino acids. Science 262(5140):1680–1685.

7. Woolfson DN (2005) The design of coiled-coil structures and assemblies. Advances in

Protein Chemistry 70(04):79–112.

8. Kuhlman B et al. (2003) Design of a novel globular protein fold with atomic-level

accuracy. Science 302(5649):1364–8.

9. Koga N et al. (2012) Principles for designing ideal protein structures. Nature

491(7423):222–227.

10. Thomson AR et al. (2014) Computational design of water-soluble alpha-helical barrels.

Science 346(6208):485–488.

11. Javadi Y, Itzhaki LS (2013) Tandem-repeat proteins: Regularity plus modularity equals

design-ability. Current Opinion in Structural Biology 23(4):622–631.

12. Lee G et al. (2006) Nanospring behaviour of ankyrin repeats. Nature 440(7081):246–

249.

13. Tsai Cj et al. (2007) Principles of nanostructure design with protein building blocks.

Proteins 68(1):1–12.

14. Phillips JJ, Millership C, Main ERG (2012) Fibrous nanostructures from the self-

assembly of designed repeat protein modules. Angewandte Chemie - International

Edition 51:13132–13135.

15. Binz HK, Amstutz P, Pluckthun A (2005) Engineering novel binding proteins from

nonimmunoglobulin domains. Nature Biotechnology 23(10):1257–1268.

16. Karanicolas J et al. (2011) A de novo protein binding pair by computational design

and directed evolution. Molecular Cell 42(2):250–60.

17. Boersma YL, Pluckthun A (2011) DARPins and other repeat protein scaffolds: Ad-

vances in engineering and applications. Current Opinion in Biotechnology 22(6):849–

857.

18. Parmeggiani F et al. (2014) A general computational approach for repeat protein de-

sign. Journal of Molecular Biology 427(2):563–575.

19. Park K et al. (2015) Control of repeat-protein curvature by computational protein

design. Nature Structural & Molecular Biology 22(2):167–174.

20. Kajava AV, Steven AC (2006) Beta-rolls, beta-helices, and other beta-solenoid pro-

teins. Advances in Protein Chemistry 73:55–96.

21. Bateman A, Murzin AG, Teichmann SA (1998) Structure and distribution of pentapep-

tide repeats in bacteria. Protein Science 7(6):1477–1480.

22. Hegde SS et al. (2005) A Fluoroquinolone Resistance Protein from Mycobacterium

tuberculosis That Mimics DNA. Science 308(5727):1480–1483.

23. Bolon DN, Mayo SL (2001) Enzyme-like proteins by computational design. Proc Natl

Acad Sci U S A 98(25):14274–14279.

24. Rothlisberger D et al. (2008) Kemp elimination catalysts by computational enzyme

design. Nature 453(7192):190–195.

25. Jiang L et al. (2008) De novo computational design of retro-aldol enzymes. Science

319(5868):1387–91.

26. Kaplan J, DeGrado WF (2004) De novo design of catalytic proteins. Proc Natl Acad

Sci U S A 101(32):11566–11570.

27. Lichtenstein BR, Cerda JF, Koder RL, Dutton PL (2009) Reversible proton coupled

electron transfer in a peptide-incorporated naphthoquinone amino acid. Chemical

Communications (2):168–170.

4 www.pnas.org/cgi/doi/??? MacDonald et al.

Page 5: A new class of synthetic beta-solenoid proteins with the ......basis for the design of a new class of synthetic repeat protein (Fig. 1 A and B) [21]. The RFR-fold has a number of properties

ii

“SynRFR˙PNAS˙FINAL˙revision” — 2016/6/16 — 16:47 — page 5 — #5 ii

ii

ii

28. Koder RL et al. (2009) Design and engineering of an O(2) transport protein. Nature

458(7236):305–309.

29. Eiben CB et al. (2012) Increased Diels-Alderase activity through backbone remodeling

guided by Foldit players. Nature Biotechnology 30(2):190–2.

30. Ollikainen N, Smith Ca, Fraser JS, Kortemme T (2013) Flexible Backbone Sampling

Methods to Model and Design Protein Alternative Conformations. Methods in Enzy-

mology 523:61–85.

31. Hu X, Wang H, Ke H, Kuhlman B (2007) High-resolution design of a protein loop.

Proc Natl Acad Sci U S A 104(45):17668–17673.

32. Murphy PM, Bolduc JM, Gallaher JL, Stoddard BL, Baker D (2009) Alteration of

enzyme specificity by computational loop remodeling and design. Proc Natl Acad Sci

U S A.

33. MacDonald JT, Maksimiak K, Sadowski MI, Taylor WR (2010) De novo backbone

scaffolds for protein design. Proteins 78(5):1311–1325.

34. Moore BL, Kelley LA, Barber J, Murray JW, MacDonald JT (2013) High quality

protein backbone reconstruction from alpha carbons using Gaussian mixture models.

Journal of Computational Chemistry 34(22):1881–1889.

35. MacDonald JT, Kelley LA, Freemont PS (2013) Validating a Coarse-Grained Potential

Energy Function through Protein Loop Modelling. PLoS ONE 8(6):e65770.

36. Leaver-Fay A et al. (2011) ROSETTA3: an object-oriented software suite for the

simulation and design of macromolecules. Methods in Enzymology 487(11):545–574.

37. Kortemme T (1998) Design of a 20 – Amino Acid , Three-Stranded beta-Sheet Pro-

tein. Science 281(5374):253–256.

38. Viguera AR, Serrano L (2001) Bergerac-SH3: “frustation” induced by stabilizing the

folding nucleus. Journal of Molecular Biology 311(2):357–371.

39. Conway P, Tyka MD, DiMaio F, Konerding DE, Baker D (2014) Relaxation of back-

bone bond geometry improves protein energy landscape modeling. Protein Science

23(1):47–55.

40. Pedelacq JD, Cabantous S, Tran T, Terwilliger TC, Waldo GS (2006) Engineering

and characterization of a superfolder green fluorescent protein. Nature Biotechnology

24(1):79–88.

41. Mou Y, Yu JY, Wannier TM, Guo CL, Mayo SL (2015) Computational design of

co-assembling protein–DNA nanowires. Nature 525(7568):230–233.

42. King NP et al. (2014) Accurate design of co-assembling multi-component protein

nanomaterials. Nature 510(7503):103–108.

43. Dueber JE et al. (2009) Synthetic protein scaffolds provide modular control over

metabolic flux. Nature Biotechnology 27(8):753–9.

MacDonald et al. PNAS Issue Date Volume Issue Number 5

Page 6: A new class of synthetic beta-solenoid proteins with the ......basis for the design of a new class of synthetic repeat protein (Fig. 1 A and B) [21]. The RFR-fold has a number of properties

ii

“SynRFR˙PNAS˙FINAL˙revision” — 2016/6/16 — 16:47 — page 6 — #6 ii

ii

ii

Table 1. Summary of crystal structures of SynRFR proteins solved in this study.

Protein Crystal form Dimer pairs ∗ Dimer angle/◦ † Bundle motif‡ Resolution/A Space group PDBSynRFR24.1 I A:A’ 157 NO 1.76 P4122 4YC5SynRFR24.1 II A:B’ B:A’ 166, 166 YES 3.31 H32 4YDTSynRFR24.1 III A:A’ 164 NO 2.41 P3221 4YCQSynRFR24.2 IV A:A’ B:C’ C:B’ 165, 162, 162 YES 3.55 I222 4YEISynRFR28.1 V A:A’ 168 YES 3.39 H32 4YFOSynRFR28.1 VI A:B C:D E:F 160, 164, 167 NO 3.33 P212121 5DZBSynRFR20.1 VII A:A’ 156 NO 2.99 P43212 5DRASynRFR24.t1555 VIII A:A’ 161 NO 4.40 I4122 5DN0SynRFR24.t1428 IX A:B’ B:A’ 160, 160 NO 3.56 P3221 5DNSSynRFR24.t3284 IV A:A’ B:C’ C:B’ 164, 158, 158 YES 3.27 I222 5DQASynRFR24.t801 III A:A’ 168 NO 2.28 P3221 5DI5∗Dimers are shown between chains related by a two-fold axis at the C-termini, a prime indicates a symmetry-related partner.†The C-terminal dimer axis exhibits flexibility, with the angle between the solenoid axes of the dimer varying between 157◦ and 168◦ in the different structures.‡The “dimer bundle” packing motif is a bundle of 3 dimer pairs with D3 symmetry, seen in Fig. 4 C and D.

6 www.pnas.org/cgi/doi/??? MacDonald et al.

Page 7: A new class of synthetic beta-solenoid proteins with the ......basis for the design of a new class of synthetic repeat protein (Fig. 1 A and B) [21]. The RFR-fold has a number of properties

ii

“SynRFR˙PNAS˙FINAL˙revision” — 2016/6/16 — 16:47 — page 7 — #7 ii

ii

ii

Fig. 1. The repeat five residue (RFR) beta-solenoid proteins. (a) a single superhelical turn

composed of four repeats with square cross-sectional profile. Each five residue repeat forms

one face of the square and twenty residues forms a helical turn with a ∼5 A rise. (b) view

down the beta-solenoid, showing leucine residues from position 3 in the repeat motif forming

the hydrophobic core. (c) Logo plot of residue frequencies used to produce stochastic sequence

region of the synRFR24.1 protein at each position in the RFR repeat and (d) the solved crystal

structures of the three synthetic length variants SynRFR20.1, SynRFR24.1, SynRFR28.1.

MacDonald et al. PNAS Issue Date Volume Issue Number 7

Page 8: A new class of synthetic beta-solenoid proteins with the ......basis for the design of a new class of synthetic repeat protein (Fig. 1 A and B) [21]. The RFR-fold has a number of properties

ii

“SynRFR˙PNAS˙FINAL˙revision” — 2016/6/16 — 16:47 — page 8 — #8 ii

ii

ii

Fig. 2. Automated computational design and selection of de novo loops. (a) Superposition of the ten designed loop variants based on the SynRFR24.1 scaffold selected

for experimental characterisation. (b) Sequence vs Structure energy/RMSD matrices for the ten selected loops. The designed sequence for each of the 4000 sampled loop

structures was threaded onto all 4000 sampled loop structures and energy minimised. The resulting energies and all-atom loop RMSD (to the designed loop structure) values

were recorded giving two 4000×4000 matrices (SI Appendix, Fig. S7). Rows and columns corresponding the selected ten structures and sequences are shown here. (c) Loop

RMSD vs minimised Rosetta energy plots for the experimentally solved loop structures. The vertical red lines correspond to RMSD values of the solved crystal loop structure

to the designed structure. The black points represent the 4000 originally sampled loop conformations after sequence threading and energy minimisation. The green points

represent an additional 16000 conformations sampled with additional harmonic restraints to sample the region around the solved crystal structure conformation for each loop.

(d) Histogram of Pi values for all 4000 designs. Pi is the probability that sequence, i, is in a folded conformation assuming the loop conformations follow a Boltzmann

distribution. The vertical line at Pi = 0.9 and blue shaded region under the curve represent the selected designs. All RMSD values in this figure were calculated by superposing

the Cα atoms of the non-loop regions of the solenoid scaffold and calculating the all-atom RMSD of the region around the loop compared to the designed structure (residues

105-120) without further superposition.

8 www.pnas.org/cgi/doi/??? MacDonald et al.

Page 9: A new class of synthetic beta-solenoid proteins with the ......basis for the design of a new class of synthetic repeat protein (Fig. 1 A and B) [21]. The RFR-fold has a number of properties

ii

“SynRFR˙PNAS˙FINAL˙revision” — 2016/6/16 — 16:47 — page 9 — #9 ii

ii

ii

Fig. 3. Crystal structure of SynRFR24.t1428 loop design. The protein was found to crystallise with two chains in the asymmetric unit. The designed loop structure (shown

in magenta) is shown superposed on the experimentally solved structures (shown in green) of (a) chain A and (b) chain B together with 2Fo-Fc electron density map contoured

at 1σ. The loops very closely matched the model structure with all atom RMSD values of 1.47 A for chain A and 0.71 A for chain B after superposition. If the non-loop region

Cα atoms of the scaffold were superposed, the all atom RMSDs compared to the design were determined to be 2.15 A (chain A) and 1.52 A (chain B) for the loop region

residues. Chain A has a flipped tryptophan side-chain compared to the design. (c) The loop embellishment mediates a higher order assembly of SynRFR24.t1428 in the crystal

lattice.

MacDonald et al. PNAS Issue Date Volume Issue Number 9

Page 10: A new class of synthetic beta-solenoid proteins with the ......basis for the design of a new class of synthetic repeat protein (Fig. 1 A and B) [21]. The RFR-fold has a number of properties

ii

“SynRFR˙PNAS˙FINAL˙revision” — 2016/6/16 — 16:47 — page 10 — #10 ii

ii

ii

Fig. 4. The lattice of the P3221 synRFR24.1 protein (a), and the synRFR24.t801 protein

(b). The dimer bundle packing motif of synRFR24.2 (c), also seen in (d) the structure of

synRFR24.t3284.

10 www.pnas.org/cgi/doi/??? MacDonald et al.