

SHORT REVIEW

Designing Proteins from the Inside Out

Salvador Ventura,1* and Luis Serrano2

1Institut de Biotecnologia i de Biomedicina and Departament de Bioquimica i Biologia Molecular, Universitat Autonoma de Barcelona, Barcelona, Spain
2European Molecular Biology Laboratory, Heidelberg, Germany

ABSTRACT Globular proteins are characterized by the specific and tight packing of hydrophobic side-chains in the so-called “hydrophobic core.” Formation of the core is key in folding, stabilization, and conformational specificity. The critical role of hydrophobic cores in maintaining the highly ordered structures present in natural proteins justifies the tremendous efforts devoted to their redesign. Both experimental and computational combinatorial-based approaches have been reported in recent years as powerful protein design tools. These manage to explore large regions of the sequence/conformational space, allowing the search for alternative protein core arrangements displaying native-like properties. The overall results obtained from core design projects have contributed significantly to our present knowledge of protein folding and function. In addition, core design has worked as a benchmark for the development of ambitious protein design projects that nowadays are allowing the de novo design of novel protein structures and functions. Proteins 2004;56:1–10. © 2004 Wiley-Liss, Inc.

Key words: protein design; hydrophobic core; protein folding; protein conformation; combinatorial approaches

INTRODUCTION

Protein sequences are shaped by a complex interplay of different selective pressures that are still poorly understood. Although proteins perform their roles in vivo very efficiently, it is now clear that they are not fully optimized, but just fulfill the minimum requirements, in terms of stability and folding efficiency, that allow them to operate in the cell.1–3 Thus, it has been shown that there is ample room for improvement of these properties, at least “in vitro.”4–7 Protein design is concerned with finding amino acid sequences that are specifically compatible with template protein structures,8,9 whereas the so-called inverse folding problem tries to characterize the set of all protein sequences compatible with a given fold.10,11 Protein design has used computational, experimental, and in some cases hybrid approaches to gain insight into the inverse protein folding problem by sampling the amino acid sequence space compatible with certain protein folds. These efforts have provided clues to understand the underlying physical rules that govern protein folding, structure, and function. In the last few years, this knowledge has been successfully applied to the redesign of several proteins with native-like structures, some of them with new properties and functions.12–15

*Correspondence to: Salvador Ventura, Departament de Bioquimica i Biologia Molecular, Universitat Autonoma de Barcelona, 08193 Bellaterra, Barcelona, Spain. E-mail: [email protected]

Received 4 February 2004; Accepted 4 February 2004

Published online 7 May 2004 in Wiley InterScience (www.interscience.wiley.com). DOI: 10.1002/prot.20142

PROTEINS: Structure, Function, and Bioinformatics 56:1–10 (2004)

© 2004 WILEY-LISS, INC.

Water-soluble proteins fold into compact structures that generally have hydrophobic side-chains buried in the interior and polar residues exposed on the outside of the molecule. The so-called hydrophobic effect, hiding hydrophobic amino acids from solvent inside the protein core, is widely believed to be the main force driving protein folding.16,17 However, it has been shown that protein stabilization can also be achieved by mutations in solvent-exposed regions.18,19 Side-chains in protein cores are tightly packed and usually in single, well-defined, low-energy conformations. A recent survey of 100 well-resolved crystal structures shows an impressively well-fitted packing in protein interiors, with the side-chains neatly interlocked.20 The tight packing of the hydrophobic core has been found to play a key role in the stability of proteins by providing many favorable van der Waals interactions, as well as excluding the solvent to maximize hydrophobic stabilization.21 Consistent with this, even subtle substitutions inside proteins tend to be destabilizing.22–24 On the other hand, there is evidence that hydrophobic interactions do not necessarily confer folding specificity to protein structures, and so the design of globular proteins with well-defined hydrophobic cores can be viewed mainly as a specificity problem. The side-chains in a disordered core adopt many alternative conformations with similar energy, instead of assuming a single, specific arrangement. Nevertheless, in many cases, these disordered conformations provide sufficient stabilization energy to keep the protein in a more-or-less folded state.25–27 To achieve specificity, the designed state, with a properly packed core, has to have the lowest free energy of all possible states (ground state), and there has to be a large free energy gap between this and the rest of the accessible states. The attainment of native-like stability and specificity is the goal of any protein-core design project.
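The ground-state and energy-gap requirements stated above can be illustrated with a Boltzmann-weighted population. This is only a simplified sketch: it assumes a discrete set of accessible states, and the symbols G_0 (free energy of the designed state), G_i (free energies of the alternative states), R (gas constant), and T (temperature) are introduced here for illustration and do not come from the original text.

```latex
P_0 \;=\; \frac{e^{-G_0/RT}}{\sum_{i} e^{-G_i/RT}}
    \;=\; \frac{1}{1 + \sum_{i \neq 0} e^{-(G_i - G_0)/RT}}
```

Under these assumptions, the population P_0 of the designed state approaches 1 only when every gap G_i − G_0 is large compared with RT. A disordered core, with many alternative arrangements of similar energy, keeps P_0 low even when each arrangement is reasonably stable on its own.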

Pioneering studies of hydrophobic core redesign involved iterative mutations on a protein scaffold and structural–functional characterization of the new designs. More recently, both computational and experimental methods have used combinatorial approaches to select proper candidates from the vast sequence space available a priori for a particular fold.

Rational design, together with iterative experimental approximations, has provided detailed rules on hydrophobic packing constraints. A nice example of this kind of study is the work of the DeGrado group on 4-helix bundle designs.28–32 This group and others33–37 established that it was possible to create de novo sequences able to adopt defined structures, providing new clues to understand protein structure and function. From these experiments, it was found that it was a surprisingly easy task to obtain proteins with the target global fold but very difficult to properly reproduce local protein details, due to the lack of specificity of hydrophobic core packing. References to these pioneering works are obligatory in any review on protein cores, but these approaches have been excellently reviewed elsewhere.38,39 Therefore, in this review, we focus largely on recent combinatorial approaches to attain properly folded proteins with new packed hydrophobic cores.

By combining rational design and experiments in an iterative way, sequences folding into the desired structure have been obtained. However, these approaches only explore local regions of the vast sequence space. This means that other possible sequence combinations, which fit the desired structure–function equally well or better, are left unexplored. On the other hand, using pure “nonrational” combinatorial approaches allows the testing of a larger region of the enormous space available for a protein but will result in successful sequences being found with very low frequency. Thus, the most advantageous way to face a particular protein design problem appears to be blending both approaches in order to generate enough diversity to cover a significant chosen region of the sequence space, thereby obtaining the highest probability of yielding protein sequences with proper features.

EXPERIMENTAL COMBINATORIAL APPROACHES

Combinatorial approaches are powerful tools to find solutions to problems, especially where we have only a partial knowledge of the molecular rules behind the process; protein folding is exactly such a case. A combinatorial folding experiment has two key elements: the creation of a library with a desired degree of diversity, and the subsequent search for sequences with proper conformational properties. Successful hydrophobic core design requires optimization of protein interactions inside the protein. However, in naturally occurring proteins, optimal packing of the hydrophobic core is usually not related to any observable phenotype.

Selecting for Function

The development of systems to screen large combinatorial libraries for proper hydrophobic core packing is difficult in the absence of biological, observable phenotypes. Thus, the first experimental combinatorial approaches to design protein cores relied on functional selection, assuming that, when the detection of protein activity is feasible, biological activity correlates with protein conformation.

Lim, Sauer and coworkers40,41 randomized combinatorially, by cassette mutagenesis, up to 4 interacting residues in the hydrophobic core of the N-terminal domain of lambda repressor. Sequences in the resulting protein pool were selected by their ability to bind DNA. Most of the protein variants in the library showed some level of biological activity, indicating that basic structural information appears to reside largely in the hydrophobic character of core residues. But only 2 sequences were found to be comparable to the wild-type protein in terms of stability and binding activity. This was one of the first indications in the literature that proper core packing interactions are important determinants of the protein’s precise structure and stability. In this case, the extent of functional impairment correlated with the extent of modification of the native sequence, as expected if natural selection is acting on function.

Next, Alan Fersht’s group42 saturated the core of barnase with random hydrophobic substitutions, and active mutants were selected by taking advantage of the extreme autotoxicity of this enzyme when expressed in Escherichia coli. More than 20% of the randomized sequences maintained the activity in vivo, and active protein variants with no wild-type core residues were obtained. As in Lim and Sauer’s work, hydrophobicity appeared to be a sufficient criterion to attain a somewhat functional core. Even though the relative levels of activity of different protein sequences were not assayed in this work, it was proposed that the refinement of these crude cores, to attain proper function, is the most stringent sequence constraint. Fersht and coworkers suggested that new functions can be developed more easily by limiting core design to mere specification of hydrophobicity and by using an iterative mutation–selection procedure to optimize core structure. On the basis of this and other studies43,44 by this group on barnase, it has been argued that every CH2 group in the hydrophobic core contributes equally to the net stability of a protein. This would imply that this core is plastic and adjustable in thermodynamic terms.

Selecting for Folding

It thus appears that the attainment of some degree of biological function only requires loose packing of the hydrophobic core of a protein. This implies that selection of protein variants based on activity or binding ability will fail in selecting protein forms with optimized stability or high core packing. Therefore, several methods that uncouple function and stability have been designed in recent years. These methods allow the redesign of proteins for which no selective assays are available, and resemble in silico methods, where no functional selection can be performed. Thus, it is possible to compare computational and experimental results in the same system, while decoupling selection from the evolutionary requirements of nature’s proteins. Finally, since no bias for function is introduced, rules describing the relation between sequence and structure or stability can be indirectly inferred from these approaches.

One of the first experimental combinatorial approaches to design a new hydrophobic core without a functional screening was carried out by Mossing’s group.45 As a template, they used a previously designed version of the lambda Cro repressor. The scaffold protein had a three-dimensional (3D) structure very similar to that specified by the original design. However, the protein displayed packing defects, with a somewhat expanded hydrophobic core, as deduced from the crystal structure.46 They applied combinatorial mutagenesis and a genetic screen for differential protein expression level in E. coli in order to construct a second generation of proteins displaying alternative arrangements of the hydrophobic core with enhanced stability. In this work, structural inputs were combined with combinatorial mutagenesis to search the combinatorial space efficiently, exploiting the loose correlation observed between protein stability and protein expression in bacterial systems.

In structural–stability-focused studies, the biophysical properties of highly purified protein samples are assayed in vitro. The fact that many designed proteins lack biological phenotypes makes screening systems based on biophysical properties valuable tools in combinatorial protein design. However, the need for purity has prevented the application of such approaches to high-throughput screening. Recently, Johnson and Hecht have developed methods that enable rapid purification of semipure samples, suitable for biophysical characterization, from bacterial lysates.47 The screening for properly packed structures is based on methods capable of monitoring certain features present in native-like proteins but absent in loose-packed variants, such as the presence of sharp peaks and good chemical-shift dispersion in one-dimensional NMR spectra, or the higher protection of amide protons assayed by mass spectrometry.48,49 The Hecht group has applied these rapid procedures in their screening of combinatorial libraries based on binary patterning. In these protein libraries, the positions of polar and nonpolar residues are specified explicitly, but the identities of these side-chains are allowed to vary. Amino acid stretches maintaining the periodicity of α-helix and/or β-sheet secondary structure are linked by glycine, proline, and polar-based turns.50 By using this simple binary code strategy, they have designed, evolved, and characterized de novo proteins that fold cooperatively and specifically into α-helix or β-sheet structures.50–53 Moreover, they have found that when binary patterning is applied to an appropriately designed structural scaffold, the libraries contain a relatively large number of well-ordered structures. According to Mossing and coworkers’ results, some bias toward selection of nontoxic, better expressible variants may be expected in this procedure, with a consequent enrichment in folded species. From these studies, it appears that, for a given structural scaffold, many different amino acid combinations can specify folded structures. However, even though Hecht et al. have isolated α-helical variants with good stability and “near-native-like” structural features from a second-generation library,54 no 3D structure of these binary-patterned proteins has been solved to date, and little can be said about the specificity of core packing.
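The binary patterning idea described above (polar or nonpolar class fixed at each position, residue identity free to vary) can be sketched in a few lines of Python. This is a minimal illustration and not the Hecht group's procedure: the residue sets, the two patterns, and all function names are assumptions chosen for the example.

```python
import random

# Illustrative residue classes (an assumption, not taken from the review).
NONPOLAR = "FLIMV"
POLAR = "EDKNQH"

# Hypothetical patterns: 'N' = nonpolar position, 'P' = polar position.
# Helix-like: nonpolar residues recur roughly every 3 to 4 positions.
HELIX_PATTERN = "PNPPNNPPNPPNNP"
# Strand-like: strict alternation, so that one face is nonpolar.
STRAND_PATTERN = "PNPNPNP"


def allowed(symbol):
    """Residues permitted at a position of the given class."""
    return NONPOLAR if symbol == "N" else POLAR


def library_size(pattern):
    """Number of distinct sequences encoded by a binary pattern."""
    size = 1
    for symbol in pattern:
        size *= len(allowed(symbol))
    return size


def sample_sequence(pattern, rng):
    """Draw one library member: class fixed by the pattern, identity random."""
    return "".join(rng.choice(allowed(symbol)) for symbol in pattern)


def matches_pattern(sequence, pattern):
    """Check that every position carries a residue of the required class."""
    return len(sequence) == len(pattern) and all(
        aa in allowed(symbol) for aa, symbol in zip(sequence, pattern)
    )


if __name__ == "__main__":
    rng = random.Random(0)
    print(library_size(HELIX_PATTERN), sample_sequence(HELIX_PATTERN, rng))
```

Even this short 14-residue pattern encodes billions of sequences under the assumed residue sets, which is why a rapid screen for folded members is needed downstream.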

Display technologies have emerged in the last few years as a powerful, totally unbiased approach to generate large, random combinatorial peptide and protein libraries. In phage display technology, genetically encoded multiple mutants of a target protein are displayed as fusions to a phage capsid. Phages that display a molecule with the desired property can be easily selected from the library and decoded by sequencing the phage DNA.55 This fast approach has been successfully used to modify and/or optimize pre-existing protein functions56,57 and even to evolve proteins with completely new functions.58 However, the usefulness of the phage display technique to study protein folding was not obvious a priori, since selection in this method relies on function. Only recently have some groups achieved the selection of stable protein variants without a functional screen, in this way uncoupling function and stability–structure. All these approaches take advantage of the correlation between a protein’s resistance to proteolysis and its thermodynamic stability, for selection of properly folded and stable variants from poorly folded or unfolded mutants that are quickly proteolyzed.

Kristensen and Winter,59 as well as Sieber and coworkers,60,61 developed very similar protein selection methods employing phage-selective infectivity: a phage coat protein needed for infection consists of several domains, and peptides or proteins can be inserted at the domain boundaries without loss of infectivity. However, the infectivity of the phage is lost when its domains are disconnected by proteolytic cleavage of the unstable protein inserts. Rounds of in vitro proteolysis, infection, and propagation were thus performed in order to enrich those phages containing the most stable variants of the protein insert. The system proved to be successful in selecting stable hydrophobic core variants of barnase59 and ribonuclease T1.60

Woolfson and coworkers62,63 chose an alternative phage display model system, in which poly-his-tagged protein variants were displayed on the phage surface and immobilized onto nickel-coated surfaces. The bound fusion-phages were then proteolyzed, and stable fusions were subsequently used to infect the bacteria. This method was demonstrated in the context of a core-directed protein design project, in which stable native-like core mutants of ubiquitin were selected. The hydrophobic core of ubiquitin displayed low plasticity and was surprisingly intolerant of amino acid substitutions. Interestingly enough, these results are consistent with the ones obtained by the groups of Handel64 and Wodak65 using a computer-based approach (see next section).


All the above-mentioned display strategies discriminate between variants differing in conformational stability and are able to select from a large repertoire of variants that only marginally vary in stability. Phage display protein stability–conformation selection has become very popular since these first attempts and is being applied to solve protein problems with increasing levels of complexity, such as the selection of amyloid-forming peptides,66 the redesign of a 4-helix bundle protein,67 the directed evolution of barnase stability,68 or the generation of a folded, native-like protein using a very reduced amino acid alphabet.69,70

The relative importance of different noncovalent interactions for the stability of a protein is difficult to assess in aqueous solutions. Thus, much of our knowledge about protein stability is based on extrapolations from conditions under which both the native and unfolded states are significantly populated; in the presence of denaturants, for example. The development of new experimental systems in which the relative contribution of different residues to protein conformation and stability under physiological conditions can be directly assessed will provide new insights into the rules governing these properties. Hopefully, these strategies may be suitable for the exploration of a vast space of sequence alternatives. A nice step in this direction can be found in the work of Linse and coworkers.71 They have studied the role of hydrophobic core substitutions on the reconstitution of a protein from its subdomain fragments. For this purpose, they have chosen a small Ca2+-binding protein of the EF-hand family. The effect of substitutions of hydrophobic core amino acids on the affinity between the two EF-hand fragments, in different split protein variants, was measured using surface plasmon resonance technology. A strong correlation was found between the affinity of the different subdomain mutants and the stability changes produced by the same core substitutions in the intact protein. Thus, fragment complementation appears to be a new, valuable method for assessing the relative importance of different interactions that stabilize the native state of a protein.

Overall, the emerging development of new combinatorial experimental approaches in protein design is providing new ways to probe structural and folding determinants. It should be mentioned, though, that we are not dealing with pure combinatorial strategies, since, in most cases, rational constraints have to be introduced at some point to restrict the large number of possible sequences and obtain interpretable results. The recent development of screening methods based on biophysical properties of the designed proteins, rather than on their function, will probably allow in the future a synergistic integration of experimental and in silico results, reducing the introduced rational bias.

COMPUTATIONAL COMBINATORIAL APPROACHES

The large sequence space accessible for a given protein, and the fact that side-chains can exist in a number of low-energy conformations, give hydrophobic core packing problems a combinatorial explosion in complexity. Furthermore, many of these sequences, and their conformations, display only small differences in global energy. The immense combinatorial complexity and subtle energetic differences depict a landscape of virtually infinite, different, basic interaction possibilities. Only the recent development of several powerful computational methods has allowed effective screening of these vast sequence and conformational spaces for candidate sequences with desired properties. In other words, design algorithms are written to find optimized sequences, with the lowest free energy of folding, for an existing target protein structure prior to any experimental work.
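To give a feel for the combinatorial explosion mentioned above, the following short Python sketch counts the sequence–rotamer states of a small core. The rotamer counts per residue type are hypothetical placeholder values, and the function name is introduced only for this illustration.

```python
# Hypothetical number of rotamers per hydrophobic residue type
# (illustrative values only, not taken from any rotamer library).
ROTAMER_COUNTS = {"A": 1, "V": 3, "L": 9, "I": 9, "F": 6, "M": 27, "W": 9}


def search_space_size(rotamers_per_residue_type, n_positions):
    """Count sequence-rotamer states when every core position may take
    any listed residue type in any of its rotamers."""
    states_per_position = sum(rotamers_per_residue_type.values())
    return states_per_position ** n_positions


if __name__ == "__main__":
    for n in (5, 10, 20):
        print(n, "positions:", search_space_size(ROTAMER_COUNTS, n), "states")
```

With these assumed counts each position has 64 possible states, so the number of combinations grows as 64 to the power of the number of core positions, which quickly rules out exhaustive enumeration.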

The inputs for a protein design algorithm are the structural coordinates of a target protein. The program should include a library of permissible, statistically significant conformations (rotamers) for each residue, an energy function to evaluate the compatibility of a given sequence with the structural template, and a search method to find the sequences with the lowest packing energy.
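The three components listed above (rotamer library, energy function, search method) can be sketched with a toy example. Everything below is an assumption made for illustration: the three-position core, the energy values (arbitrary units), and the function names. The search is a plain exhaustive enumeration, which is feasible only because the toy space is tiny.

```python
import itertools

# Rotamer library: allowed (residue, rotamer index) states per core position.
ROTAMERS = {
    0: [("L", 0), ("L", 1), ("I", 0)],
    1: [("V", 0), ("F", 0)],
    2: [("A", 0), ("L", 0)],
}

# Hypothetical energy of each state against the fixed backbone.
SELF_ENERGY = {
    (0, ("L", 0)): -1.0, (0, ("L", 1)): -0.5, (0, ("I", 0)): -0.8,
    (1, ("V", 0)): -0.6, (1, ("F", 0)): -1.2,
    (2, ("A", 0)): -0.2, (2, ("L", 0)): -0.9,
}

# Hypothetical pairwise energies between states; unlisted pairs count as 0.
PAIR_ENERGY = {
    ((0, ("L", 0)), (1, ("F", 0))): 2.0,   # steric clash
    ((0, ("I", 0)), (1, ("F", 0))): -0.9,  # good contact
    ((1, ("F", 0)), (2, ("L", 0))): 1.5,   # steric clash
    ((1, ("V", 0)), (2, ("L", 0))): -0.4,  # good contact
}


def total_energy(assignment):
    """Energy function: sum of single-state and pairwise terms."""
    positions = sorted(assignment)
    energy = sum(SELF_ENERGY[(p, assignment[p])] for p in positions)
    for i, j in itertools.combinations(positions, 2):
        key = ((i, assignment[i]), (j, assignment[j]))
        energy += PAIR_ENERGY.get(key, 0.0)
    return energy


def exhaustive_search():
    """Search method: score every combination and keep the lowest energy."""
    positions = sorted(ROTAMERS)
    best = None
    for combo in itertools.product(*(ROTAMERS[p] for p in positions)):
        assignment = dict(zip(positions, combo))
        energy = total_energy(assignment)
        if best is None or energy < best[0]:
            best = (energy, assignment)
    return best


if __name__ == "__main__":
    energy, assignment = exhaustive_search()
    sequence = "".join(assignment[p][0] for p in sorted(assignment))
    print(sequence, round(energy, 3))
```

In this toy landscape the large Phe at position 1 is favorable only when its neighbors are chosen to avoid clashes, a small-scale picture of why core positions cannot be optimized independently.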

Algorithms differ mainly in the accuracy of their energy functions and search methods. The force field depends on the balance of forces responsible for protein stability, including van der Waals interactions, hydrogen bonds, salt bridges, and hydrophobic/polar interactions. Usually the energy term consists of three contributions: backbone–backbone, rotamer–backbone, and rotamer–rotamer effects. In most algorithms, the backbone remains fixed during the simulation; thus, the backbone–backbone term is not considered for the optimization. However, in some recent algorithms, some main-chain flexibility has been introduced in the calculations with excellent results. The rotamer specification is combined with the calculated force field to create a discrete sequence–rotamer energy landscape, in which each single point corresponds to a rotamer combination and an assigned energy. The dimensions of this space increase exponentially with protein size. The search for a global minimum energy conformation in this space can be carried out by several search methods. These can be classified as stochastic or deterministic algorithms: Stochastic algorithms include simulated annealing,72 Monte Carlo,73 and genetic algorithms,74 and rely on probabilistic trajectories, where the outcome is determined by the initial conditions, as well as the random number generator seed. On average, the search progressively moves toward better scoring (lower energy) sequences. The partially random nature of the search permits escape from local minima in the sequence–rotamer landscape; however, there is always a certain degree of uncertainty, since it is impossible to confirm that the solution found corresponds to the lowest energy conformation. Deterministic methods include dead-end elimination75 and self-consistent mean-field,76,77 which will always return the same solution when given the same input. In essence, these methods successively eliminate those sequence–rotamer states that cannot be part of the global optimum, until no further states can be discarded. However, the two methods often do not converge to the same solution. The detailed properties of different design programs have recently been thoroughly reviewed elsewhere.78,79 In this review, we focus on the application of these algorithms to hydrophobic-core design.
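The elimination step used by deterministic methods can be illustrated with the original dead-end elimination criterion: a state r at position i is discarded if some alternative t at the same position has a worst-case energy that is still lower than the best-case energy of r. The sketch below is a toy instance under stated assumptions; the energy tables (arbitrary units), the state labels, and the function names are all hypothetical, and real programs use more elaborate criteria.

```python
import itertools

# Toy sequence-rotamer states per position, e.g. "L0" = Leu, rotamer 0.
ROTAMERS = {0: ["L0", "L1", "I0"], 1: ["V0", "F0"], 2: ["A0", "L0"]}
E_SELF = {(0, "L0"): -1.0, (0, "L1"): -0.5, (0, "I0"): -0.8,
          (1, "V0"): -0.6, (1, "F0"): -1.2,
          (2, "A0"): -0.2, (2, "L0"): -0.9}
E_PAIR = {(0, "L0", 1, "F0"): 2.0, (0, "I0", 1, "F0"): -0.9,
          (1, "F0", 2, "L0"): 1.5, (1, "V0", 2, "L0"): -0.4}


def pair_energy(i, r, j, s):
    """Symmetric lookup of a pairwise energy (0 if the pair is not listed)."""
    return E_PAIR.get((i, r, j, s), E_PAIR.get((j, s, i, r), 0.0))


def bounds(alive, i, r):
    """Best-case and worst-case energy contribution of state r at position i."""
    best = worst = E_SELF[(i, r)]
    for j in alive:
        if j != i:
            energies = [pair_energy(i, r, j, s) for s in alive[j]]
            best += min(energies)
            worst += max(energies)
    return best, worst


def dead_end_elimination(rotamers):
    """Discard states that cannot belong to the global minimum, repeating
    until a full pass removes nothing."""
    alive = {i: list(states) for i, states in rotamers.items()}
    changed = True
    while changed:
        changed = False
        for i in alive:
            for r in list(alive[i]):
                best_r = bounds(alive, i, r)[0]
                rivals = [t for t in alive[i] if t != r]
                if any(best_r > bounds(alive, i, t)[1] for t in rivals):
                    alive[i].remove(r)
                    changed = True
    return alive


def brute_force_minimum(rotamers):
    """Reference answer by full enumeration (only feasible for toy cases)."""
    positions = sorted(rotamers)
    best = None
    for combo in itertools.product(*(rotamers[p] for p in positions)):
        energy = sum(E_SELF[(p, r)] for p, r in zip(positions, combo))
        for (i, r), (j, s) in itertools.combinations(zip(positions, combo), 2):
            energy += pair_energy(i, r, j, s)
        if best is None or energy < best[0]:
            best = (energy, dict(zip(positions, combo)))
    return best


if __name__ == "__main__":
    survivors = dead_end_elimination(ROTAMERS)
    print(survivors)
    print(brute_force_minimum(survivors))
```

In this toy case the criterion removes only one state, so the remaining combinations still have to be searched; what the criterion guarantees is that the global minimum is never among the discarded states.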

Computer-Aided Hydrophobic Core Design

Protein design seeks to obtain new proteins that fold into a native-like, properly ordered and stable, predetermined structural target. A successful design should display structural uniqueness, high stability, as deduced from its free energy of unfolding and/or thermal stability, and highly cooperative unfolding transitions. Since the hydrophobic core of a protein contributes to all the above-mentioned protein properties, preferential attention has been paid to computer-aided protein core design.

Ponder and Richards80 described a pioneering algorithm, able to enumerate a set of predicted “allowable” sequences that are compatible with a given core structure. A discrete library of rotamers was used, and compatible sequence-rotamers were evaluated attending only to packing criteria, mainly steric constraints and the relative filling of the protein interior. Avoidance of steric overlap appeared to be the most stringent selection criterion. More than 50 theoretically compatible alternative core sequences were selected. However, they were not tested experimentally.

Matthews and coworkers81 addressed for the first time the evaluation of the precise impact of the exchange of natural core sequences for designed ones on protein stability, structure, and function. They designed and experimentally characterized core packing arrangements in bacteriophage T4 lysozyme (Fig. 1). Upon selection of a set of core sequences, based on the packing constraints proposed by Ponder and Richards,80 the program evaluated the relative energies of selected candidates in the folded state by means of energy minimization. The selected designs showed lower but near-native activity and stability. Moreover, structural analysis of the proteins revealed that, in some cases, cooperative changes in structure, as well as in stability, had occurred. This was interpreted as a genuine repacking of the hydrophobic core. Although the project was quite successful, the program was still far from being able to accurately predict protein stability and conformation.

Fig. 1. Three-dimensional structures of several proteins with redesigned cores. Secondary structure elements are displayed, as well as side-chains of residues mutated with respect to the natural target protein. (1) T4 lysozyme (PDB: 1C68), (2) ubiquitin (PDB: 1UD7), (3) Spectrin-SH3 (PDB: 1E6G), and (4) Gβ1 (PDB: 1FD6).

Another pioneering work, presented by Desjarlais and Handel82 on the core of the phage 434 Cro protein, involved the use of two programs to repack the core of this protein. The first program constructed a “custom made” rotamer library. For the first time, a separate library was created for each particular core position instead of just using a library of rotamers derived statistically from structures in the Protein Data Bank (PDB). This approach resulted in increased combinatorial complexity. Therefore, a second program called ROC was developed. ROC uses a genetic algorithm with a van der Waals potential as a scoring function to search through the rotamer-sequence space for low-energy packing alternatives. Alternative core sequences selected as permissive by this algorithm led to stable and folded structures. Interestingly enough, a protein with 5 substitutions in the hydrophobic core appeared to be more stable than the naturally occurring form. As a control, randomly chosen hydrophobic cores resulted in unfolded proteins. The group further applied ROC to design several ubiquitin core variants.64 In contrast to previous results with the 434 Cro protein, all designed variants were less stable than the native protein, suggesting more stringent packing requirements for β-sheet than for α-helix structures. ROC was not only able to identify alternative core sequences that result in native-like proteins but was also quite effective in predicting the relative stabilities of the different variants. The solution structure of one of the redesigned ubiquitin variants suggests that the differences in stability detected between native ubiquitin and the highest scoring variants may be due to a higher, unpredictable flexibility present in the core of redesigned ubiquitin83 (Fig. 1).
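The genetic-algorithm search attributed to ROC above can be pictured with a small, hedged Python sketch. It is not the ROC code: the scoring function is a stand-in (squared mismatch between side-chain volume and an assumed per-position cavity volume) rather than a van der Waals potential, and `VOLUME`, `packing_energy`, and `genetic_search` are hypothetical names with illustrative values.

```python
import random

# Illustrative side-chain volumes; hypothetical values for this sketch only.
VOLUME = {"A": 67.0, "V": 105.0, "I": 124.0, "L": 124.0, "F": 135.0}
ALPHABET = sorted(VOLUME)


def packing_energy(seq, target_volumes):
    """Stand-in score (lower is better): squared volume mismatch per position."""
    return sum((VOLUME[aa] - t) ** 2 for aa, t in zip(seq, target_volumes))


def genetic_search(target_volumes, pop_size=30, generations=40,
                   mutation_rate=0.1, seed=0):
    """Evolve core sequences toward low stand-in packing energy."""
    rng = random.Random(seed)
    n = len(target_volumes)

    def score(seq):
        return packing_energy(seq, target_volumes)

    # Random initial population of core sequences.
    population = ["".join(rng.choice(ALPHABET) for _ in range(n))
                  for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=score)
        next_gen = population[:2]  # elitism: the two best survive unchanged
        while len(next_gen) < pop_size:
            # Tournament selection of two parents.
            p1 = min(rng.sample(population, 3), key=score)
            p2 = min(rng.sample(population, 3), key=score)
            # One-point crossover followed by point mutation.
            cut = rng.randrange(1, n) if n > 1 else 0
            child = p1[:cut] + p2[cut:]
            child = "".join(
                rng.choice(ALPHABET) if rng.random() < mutation_rate else aa
                for aa in child
            )
            next_gen.append(child)
        population = next_gen
    best = min(population, key=score)
    return best, score(best)


targets = [124.0, 67.0, 135.0, 105.0]  # assumed cavity volumes per position
best_seq, best_energy = genetic_search(targets, seed=1)
```

Because the two best individuals are always carried over, the best score can never get worse from one generation to the next. A stochastic search of this kind does not guarantee that the global minimum is found.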

Dahiyat and Mayo84,85 developed an automated side-chain selection program that explicitly and qualitatively considers specific packing interactions as a design criterion. This approach permitted varying packing forces during the design exercise and incorporated a minimum effective level of steric forces to compensate for the restrictive effect of a fixed backbone and the use of discrete side-chain rotamers in the simulation. This resulted in a broader sampling of sequences compatible with the target structure. They also implemented the dead-end elimination theorem to optimize the sequence design, allowing rapid finding of the global minimum in the sequence–rotamer space. Recently, Mayo and coworkers have described an exact rotamer optimization method that dramatically enhances the performance of dead-end elimination algorithms.86 Initially, the program was assessed on the core of the Gβ1 protein domain. The flexibility and stability of the designed proteins correlated well with the degree of packing specificity selected during the design.87 This demonstrated the relevance of specific packing interactions in the design of native-like structures. By proper selection of packing constraints, they isolated a native-like variant with higher stability than the natural form. After validation of their program in the design of protein cores, Mayo’s group has expanded the range of computational protein design to residues of all parts of a protein (the buried core, the solvent-exposed surface, and the boundary between core and surface) with spectacular success.5,88,89
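The dead-end elimination step mentioned above can be sketched as follows, using the original criterion of the theorem (reference 75). A rotamer r at position i is discarded when some alternative t at the same position satisfies E(i_r) + Σ_j min_s E(i_r, j_s) > E(i_t) + Σ_j max_s E(i_t, j_s), because r can then never be part of the global minimum. This is a minimal Python illustration, not the Dahiyat and Mayo implementation. The energy tables and the names `pair_energy` and `dead_end_elimination` are assumptions made for the example.

```python
def pair_energy(pair_e, i, r, j, s):
    """Look up the pairwise energy between rotamer r at i and rotamer s at j.

    `pair_e[(a, b)][ra][rb]` is assumed to be stored only for a < b.
    """
    return pair_e[(i, j)][r][s] if i < j else pair_e[(j, i)][s][r]


def dead_end_elimination(self_e, pair_e):
    """Iteratively prune rotamers that cannot belong to the global minimum.

    `self_e[i][r]` is the self (rotamer-template) energy of rotamer r at
    position i. Returns the surviving rotamer indices for each position.
    """
    alive = [set(range(len(e))) for e in self_e]
    positions = range(len(alive))
    changed = True
    while changed:
        changed = False
        for i in positions:
            for r in list(alive[i]):
                # Best case for r: most favorable partner at every other position.
                low = self_e[i][r] + sum(
                    min(pair_energy(pair_e, i, r, j, s) for s in alive[j])
                    for j in positions if j != i
                )
                for t in alive[i] - {r}:
                    # Worst case for t: least favorable partner everywhere.
                    high = self_e[i][t] + sum(
                        max(pair_energy(pair_e, i, t, j, s) for s in alive[j])
                        for j in positions if j != i
                    )
                    if low > high:
                        alive[i].discard(r)  # r is a dead end
                        changed = True
                        break
    return [sorted(a) for a in alive]


# Toy problem: two positions with two rotamers each (arbitrary energy units).
toy_self = [[0.0, 5.0], [0.0, 1.0]]
toy_pair = {(0, 1): [[0.0, 1.0], [1.0, 0.0]]}
survivors = dead_end_elimination(toy_self, toy_pair)
```

In the toy problem a single rotamer survives at each position, so the global minimum is identified without enumerating all combinations. In larger problems several rotamers usually remain and a final search over the reduced space is still needed.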

Most of the above-mentioned programs focus on obtaining optimal packing energy in the folded state, and less attention is paid to the effect of core amino acid substitutions in the unfolded state. Protein design algorithms empirically take into account the unfolded state by including hydration contributions in their energy functions. Doi and coworkers90 considered, in addition, entropic terms, which come mostly from the denatured state, to estimate changes in Gibbs free energies between the folded and unfolded structures of a candidate sequence. Using this approach, they have engineered the core of a bacterial malate dehydrogenase. Again, the highest scoring sequences closely resemble the wild-type sequences in terms of stability. The good correlation found between the predicted and experimental stability of the designed variants validates this approximation.

Two recently launched programs are those from the groups of Farid and Wodak. Farid and coworkers developed an algorithm called CORE,91 whose scoring function incorporates parameters that are directly correlated to free energies of unfolding, melting temperatures, and cooperativity. In their program, they also replaced van der Waals energies with hard-sphere bumps, with a consequent increase in computing efficiency. The search method consisted of simulated annealing and Monte Carlo sampling. Wodak and coworkers have developed DESIGNER.65 In this algorithm, the energy of the produced models is calculated as the sum of the CHARMM package non-bonded energy terms and a solvation free energy, dependent on the surface area. The search is performed by the dead-end elimination method, followed by exact or heuristic optimizations.
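A simulated annealing search with Monte Carlo sampling, as named above for CORE, can be illustrated with the Metropolis acceptance rule (reference 73). The Python sketch below is a generic, hedged example rather than the CORE algorithm: the energy function is supplied by the caller, and the cooling schedule, parameter values, and the name `simulated_annealing` are assumptions.

```python
import math
import random


def simulated_annealing(energy, initial, alphabet, steps=2000,
                        t_start=5.0, t_end=0.05, seed=0):
    """Minimize `energy` over sequences by single-site Monte Carlo moves.

    `energy` is any callable that scores a sequence (lower is better).
    Returns the best sequence visited and its energy.
    """
    rng = random.Random(seed)
    current = list(initial)
    e_cur = energy(current)
    best, e_best = current[:], e_cur
    for k in range(steps):
        # Geometric cooling from t_start to t_end (an assumed schedule).
        temp = t_start * (t_end / t_start) ** (k / max(steps - 1, 1))
        # Trial move: change the residue at one random position.
        pos = rng.randrange(len(current))
        trial = current[:]
        trial[pos] = rng.choice(alphabet)
        delta = energy(trial) - e_cur
        # Metropolis rule: always accept downhill moves, and accept uphill
        # moves with probability exp(-delta / temp).
        if delta <= 0 or rng.random() < math.exp(-delta / temp):
            current, e_cur = trial, e_cur + delta
            if e_cur < e_best:
                best, e_best = current[:], e_cur
    return "".join(best), e_best


def toy_energy(seq):
    """Hypothetical score: number of mismatches to an arbitrary target core."""
    return sum(a != b for a, b in zip(seq, "LVIF"))


sa_seq, sa_energy = simulated_annealing(toy_energy, "AAAA", "AVILF", seed=3)
```

Accepting occasional uphill moves at high temperature lets the search escape local minima, and lowering the temperature gradually turns it into a downhill refinement. Unlike dead-end elimination, the result is not guaranteed to be the global minimum.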

Both the Farid and Wodak groups tested their programs by repacking the core of previously designed proteins, for which the thermodynamic parameters of the wild-type and some designed forms had already been elucidated, thus allowing effective validation of the algorithm. CORE was tested by redesigning the core of two small proteins, the β1 domain of protein G and the Cro protein of bacteriophage 434, with the objective of isolating thermophilic forms. In both cases, the wild-type sequence was regenerated but, in addition, a fair number of sequences ranked higher, being predicted to be more stable than the natural sequence. These sequences included a version of Gβ1, previously designed and described as hyperthermostable by Dahiyat and Mayo,87 and the only variant of 434 Cro characterized by Desjarlais and Handel82 as more stable than the corresponding wild-type form. Meanwhile, Wodak and coworkers65 repacked the cores of Gβ1 and ubiquitin. In these cases, the best scoring sequences closely resembled the wild-type hydrophobic cores. As reported, Handel and coworkers’ designs of the ubiquitin core64 turned out to be less stable than the natural protein, in accordance with the fact that DESIGNER selected virtually native variants for this protein. This program did not rank the thermostable variant of Gβ1 designed by Dahiyat and Mayo87 among the proteins with highest scores, but this optimized protein differs by only one substitution from sequences selected by DESIGNER. It is encouraging that the best scoring sequences in these algorithms, for a given protein, resemble those produced by other programs. This strongly suggests that design algorithms tend to provide similar solutions to the same problem, independent of the approach.


Giving Some Flexibility

In the 3D structures of several redesigned protein cores, it has been demonstrated that small but global backbone movements occur in order to allow the alternative core side-chains to pack without necessitating large destabilization.84,92,93 In principle, this behavior may be predicted by modeling main-chain flexibility instead of presupposing a rigid backbone structure. However, backbone motions are difficult to incorporate into packing calculations. First, because the number of main-chain conformations available for a typical protein is enormous, the sampling of all backbone rearrangements becomes unaffordable in terms of computational time. Second, there is a need to develop proper energy functions to rank backbone conformers. Nevertheless, a few approaches to incorporate main-chain flexibility into protein core design have been described.

A first effort to simplify the backbone sampling problem was made by the Kim group,94 by using algebraic parameterization of the secondary structure geometry to produce an ensemble of related backbones. This reduces the number of main-chain rearrangements to sample and thus simplifies and accelerates the conformational search, while ensuring that the ensemble contains reasonable backbone conformations. In this particular case, hydrophobic core packing was successfully combined with explicit backbone flexibility to search for satisfactory coordinates for a given α-helical bundle sequence. Kim’s group has successfully used this approach for the de novo design of a protein fold for which no structural template was known: α-helical bundle proteins with a right-handed superhelical twist.95

Su and Mayo96 went a step further and presented a method that allows both main-chain flexibility and sequence selection, using the core of the Gβ1 protein as a model. The design process consists of two steps. First, a set of backbone conformations is generated by systematic manipulation of the relative orientation of supersecondary elements in the protein. This is followed by the traditional automated amino acid sequence selection on top of the generated backbones. When small but significant backbone differences were considered, high-ranking sequences did not differ from those obtained using a fixed backbone, suggesting that the algorithm energy function is tolerant to subtle changes. The main advantage of the method is that it allows exploration of greater sequence diversity than when unperturbed backbones are considered. In this way, highly substituted core versions of Gβ1 were produced that still maintain native-like properties. Ultimately, the characterization of the solution structure of two designed variants confirms that the sequences forecast by the algorithm fold specifically to a conformation close to the template, even when the backbone template is moderately perturbed with respect to the natural structure97 (Fig. 1).

Desjarlais and Handel98 modified their original ROC program to develop SoftROC. This algorithm permits simultaneous optimization of protein hydrophobic core and backbone. It differs from the above-mentioned programs in that it allows explicit main-chain flexibility at every position in the structure, avoiding the need for a high degree of symmetry in the target system. However, this dramatically increases the combinatorial space and excludes the use of deterministic approaches to perform the search. The first stage of this program is a genetic algorithm optimization of a restricted population of structures, obtained by varying the dihedral angles in the protein, starting from a torsional model of the template structure. Next, the energies of each model are evaluated using a potential that constrains the backbone to retain similarity to the target structure. Once the lowest energy conformation has been selected, it is subjected to Monte Carlo sampling for further structure refinement. The performance of the algorithm was tested by repacking the cores of the 434 Cro protein and T4 lysozyme. Surprisingly, SoftROC was inferior to ROC in its accuracy in predicting experimental protein stabilities; the only advantage of SoftROC appears when predicting the structure of protein variants with extreme core substitutions. In this way, Desjarlais and Handel produced two stable variants of 434 Cro protein that were predicted to be unstable using the fixed scaffold approach.

Giving flexibility to the backbone appears to be a promising approach in de novo design where, by definition, starting template structures are not precisely defined. At present, however, the reduced complexity and computing efficiency of fixed-backbone approximations mean that this strategy should be the preferred option when protein design needs to be performed against a known structural target.

Core Packing Constraints and Strain in the Hydrophobic Core

Most protein core studies assess the native-like properties of their best scoring sequences by indirect methods, relying mainly on stability or spectroscopic studies, and only a few 3D structures of core-designed proteins are yet available in atomic detail. Moreover, as the best scoring or most stable sequences are usually quite similar to the wild-type protein, little can be learned about core packing constraints and plasticity from these structures. On the other hand, surprisingly little attention has been paid to the detailed folding properties of redesigned core mutants.

We considered this an important and urgent area to study because, while automatic design does not take folding properties into account, it is quite conceivable that redesigned proteins may display altered folding properties that may somehow compromise their function. This is especially relevant in the case of protein cores, because in all proteins studied, some residues from the protein core are included in the folding nucleus. With this in mind, we used our design algorithm PERLA99 to redesign the buried residues of the well-characterized spectrin-SH3 domain.100

We selected core sequences that, while being compatible with the given target in terms of calculated stability, were as divergent as possible from natural core SH3 sequences. We analyzed the folding kinetics of those forms with stabilities near that of the wild type and solved their atomic structures. From this study, we learned that the SH3 core displays high plasticity and that constraints to core evolution exist in nature, since redesigned hydrophobic cores displaying good stability, function, and structure showed concerted changes in several residues, and such simultaneous multiple replacements rarely occur naturally. More striking was the discovery that protein cores with native-like stability and function, thus selected as optimal in any design exercise, displayed anomalous folding features, with a preferential stabilization of the transition state of folding. The structural characterization confirmed overpacking and strain in the hydrophobic core. When the volume of the core was diminished by just one methyl group, the proteins recovered wild-type-like folding properties and structures. These new designs displayed dramatic increases in stability. Therefore, producing a stable protein with a new hydrophobic core does not necessarily imply that the design exercise has been completely successful, and special attention should be paid in the future to the folding properties of designed proteins.

CONCLUSIONS

In the last few years, we have witnessed impressive progress in protein core design. Many algorithms have succeeded in decorating a template structure with a new core side-chain combination. The results obtained are globally robust and apparently quite independent of the scoring function used for model-to-template evaluation and of whether a static scaffold or a flexible backbone is considered. In several cases, designs with higher stability than the naturally occurring protein have been isolated. However, we have learned cautionary tales from the detailed analysis of the structural and folding properties of designed core variants. In particular, attention should be paid to the folding features of the high-scoring sequences obtained in design exercises, because folding is not considered during the selection process. Hence, the selected candidates may be far from the optimal sequence for a given template, due to anomalous folding properties.

Especially exciting is the appearance of new combinatorial experimental approaches that uncouple selection from the evolutionary pressures acting on natural proteins. These methods resemble computational methods, in that the sequences produced are essentially free from evolutionary constraints other than proper folding and stability, allowing comparison of computational and experimental results. In fact, when these different approaches have been applied to the same protein system, the selected sequences display similar characteristics. This is especially encouraging, because one can foresee a future in which integrated combinatorial computational and experimental approaches will allow constraining the astronomical sequence–conformational space compatible with a complex de novo designed protein. This could be achieved by an in silico selection of successful crude candidates, followed by a stringent experimental assessment of the desired properties in this reduced library. It appears that, in a short time, we are going to be able not only to create new folds but also to append new functional properties onto such structures.

ACKNOWLEDGMENTS

We would like to thank Drs. Ulf Ekstrom and Mark Isalan for kindly revising this manuscript.

REFERENCES

1. Adams MWW, Kelly RM, editors. Biocatalysis at extreme temperatures: enzyme systems near and above 100°C. Washington, DC: ACS Press; 1992.

2. Demetrius L. Thermodynamics and kinetics of protein folding: an evolutionary perspective. J Theor Biol 2002;217:397–411.

3. Kim DE, Gu H, Baker D. The sequences of small proteins are not extensively optimized for rapid folding by natural selection. Proc Natl Acad Sci USA 1998;95:4982–4986.

4. van den Burg B, Eijsink VG. Selection of mutations for increased protein stability. Curr Opin Biotechnol 2002;4:333–337.

5. Malakauskas SM, Mayo SL. Design, structure and stability of a hyperthermophilic protein variant. Nat Struct Biol 1998;5:470–475.

6. Villegas V, Viguera AR, Aviles FX, Serrano L. Stabilization of proteins by rational design of alpha-helix stability using helix/coil transition theory. Fold Des 1996;1:29–34.

7. Martinez JC, Pisabarro MT, Serrano L. Obligatory steps in protein folding and the conformational diversity of the transition state. Nat Struct Biol 1998;5:721–729.

8. Regan L, Jackson SE. Engineering and design: protein design: theory and practice. Curr Opin Struct Biol 2003;13:479–481.

9. Pokala N, Handel TM. Review: protein design—where we were, where we are, where we’re going. J Struct Biol 2001;134:269–281.

10. Drexler KE. Molecular engineering: an approach to the development of general capabilities for molecular manipulation. Proc Natl Acad Sci USA 1981;78:5275–5278.

11. Pabo C. Molecular technology: designing proteins and peptides. Nature 1983;301:200.

12. Lazar GA, Marshall SA, Plecs JJ, Mayo SL, Desjarlais JR. Designing proteins for therapeutic applications. Curr Opin Struct Biol 2003;13:513–518.

13. Bolon DN, Voigt CA, Mayo SL. De novo design of biocatalysts. Curr Opin Chem Biol 2002;6:125–129.

14. Summa CM, Lombardi A, Lewis M, DeGrado WF. Tertiary templates for the design of diiron proteins. Curr Opin Struct Biol 1999;9:500–508.

15. Reina J, Lacroix E, Hobson SD, Fernandez-Ballester G, Rybin V, Schwab MS, Serrano L, Gonzalez C. Computer-aided design of a PDZ domain to recognize new target sequences. Nat Struct Biol 2002;9:621–627.

16. Tanford C. The hydrophobic effect and the organization of living matter. Science 1978;200:1012–1018.

17. Rose GD, Geselowitz AR, Lesser GJ, Lee RH, Zehfus MH. Hydrophobicity of amino acid residues in globular proteins. Science 1985;229:834–838.

18. Eijsink VG, Veltman OR, Aukema W, Vriend G, Venema G. Structural determinants of the stability of thermolysin-like proteinases. Nat Struct Biol 1995;2:374–379.

19. Spector S, Wang M, Carp SA, Robblee J, Hendsch ZS, Fairman R, Tidor B, Raleigh DP. Rational modification of protein stability by the mutation of charged surface residues. Biochemistry 2000;39:872–879.

20. Word JM, Lovell SC, LaBean TH, Taylor HC, Zalis ME, Presley BK, Richardson JS, Richardson DC. Visualizing and quantifying molecular goodness-of-fit: small-probe contact dots with explicit hydrogen atoms. J Mol Biol 1999;285:1711–1733.

21. Dill KA. Dominant forces in protein folding. Biochemistry 1990;29:7133–7155.

22. Buckle AM, Cramer P, Fersht AR. Structural and energetic responses to cavity-creating mutations in hydrophobic cores: observation of a buried water molecule and the hydrophilic nature of such hydrophobic cavities. Biochemistry 1996;35:4298–4305.

23. Eriksson AE, Baase WA, Zhang XJ, Heinz DW, Blaber M, Baldwin EP, Matthews BW. Response of a protein structure to cavity-creating mutations and its relation to the hydrophobic effect. Science 1992;255:178–183.

24. Lee B. Estimation of the maximum change in stability of globular proteins upon mutation of a hydrophobic residue to another of smaller size. Protein Sci 1993;2:733–738.

25. Hecht MH, Richardson JS, Richardson DC, Ogden RC. De novo design, expression, and characterization of Felix: a four-helix bundle protein of native-like sequence. Science 1990;249:884–891.

26. Handel TM, Williams SA, DeGrado WF. Metal ion-dependent modulation of the dynamics of a designed protein. Science 1993;261:879–885.

27. Gibney BR, Johansson JS, Rabanal F, Skalicky JJ, Wand AJ, Dutton PL. Global topology and stability and local structure and dynamics in a synthetic spin-labeled four-helix bundle protein. Biochemistry 1997;36:2798–2806.

28. Regan L, DeGrado WF. Characterization of a helical protein designed from first principles. Science 1988;241:976–978.

29. Raleigh DP, Betz SF, DeGrado WF. A de novo designed protein mimics the native state of natural proteins. J Am Chem Soc 1995;117:7558.

30. Hill RB, Hong J-K, DeGrado WF. Hydrogen bonded cluster can specify the native state of a protein. J Am Chem Soc 2000;122:746–747.

31. Hill RB, DeGrado WF. A polar, solvent-exposed residue can be essential for native protein structure. Struct Fold Des 2000;8:471–479.

32. Hill RB, Bracken C, DeGrado WF, Palmer AG. Molecular motions and protein folding: characterization of the backbone dynamics and folding equilibrium of α2D using 13C NMR spin relaxation. J Am Chem Soc 2000;122:11610–11619.

33. Hecht MH, Richardson JS, Richardson DC, Ogden RC. De novo design, expression, and characterization of Felix: a four-helix bundle protein of native-like sequence. Science 1990;249:884–891.

34. Kamtekar S, Schiffer JM, Xiong H, Babik JM, Hecht MH. Protein design by binary patterning of polar and nonpolar amino acids. Science 1993;262:1680–1685.

35. Olofsson S, Baltzer L. Structure and dynamics of a designed helix–loop–helix dimer in dilute aqueous trifluoroethanol solution: a strategy for NMR spectroscopic structure determination of molten globules in the rational design of native-like proteins. Fold Des 1996;1:347–356.

36. Struthers MD, Cheng RP, Imperiali B. Design of a monomeric 23-residue polypeptide with defined tertiary structure. Science 1996;271:342–345.

37. Kortemme T, Ramirez-Alvarado M, Serrano L. Design of a 20-amino acid, three-stranded beta-sheet protein. Science 1998;281:253–256.

38. Hill RB, Raleigh DP, Lombardi A, DeGrado WF. De novo design of helical bundles as models for understanding protein folding and function. Acc Chem Res 2000;33:745–754.

39. Baltzer L, Nilsson H, Nilsson J. De novo design of proteins—what are the rules? Chem Rev 2001;101:3153–3163.

40. Lim WA, Sauer RT. Alternative packing arrangements in the hydrophobic core of lambda repressor. Nature 1989;339:31–36.

41. Lim WA, Hodel A, Sauer RT, Richards FM. The crystal structure of a mutant protein with altered but improved hydrophobic core packing. Proc Natl Acad Sci USA 1994;91:423–427.

42. Axe DD, Foster NW, Fersht AR. Active barnase variants with completely random hydrophobic cores. Proc Natl Acad Sci USA 1996;93:5590–5594.

43. Kellis JT Jr, Nyberg K, Sali D, Fersht AR. Contribution of hydrophobic interactions to protein stability. Nature 1988;333:784–786.

44. Buckle AM, Henrick K, Fersht AR. Crystal structural analysis of mutations in the hydrophobic cores of barnase. J Mol Biol 1993;234:847–860.

45. Mollah AK, Aleman MA, Albright RA, Mossing MC. Core packing defects in an engineered Cro monomer corrected by combinatorial mutagenesis. Biochemistry 1996;35:743–748.

46. Albright RA, Mossing MC, Matthews BW. High-resolution structure of an engineered Cro monomer shows changes in conformation relative to the native dimer. Biochemistry 1996;35:735–742.

47. Johnson BH, Hecht MH. Recombinant proteins can be isolated from E. coli cells by repeated cycles of freezing and thawing. Biotechnology (NY) 1994;12:1357–1360.

48. Roy S, Helmer KJ, Hecht MH. Detecting native-like properties in combinatorial libraries of de novo proteins. Fold Des 1997;2:89–92.

49. Rosenbaum DM, Roy S, Hecht MH. Screening combinatorial libraries of de novo proteins by hydrogen-deuterium exchange and electrospray mass spectrometry. J Am Chem Soc 1999;121:9509–9513.

50. Kamtekar S, Schiffer JM, Xiong H, Babik JM, Hecht MH. Protein design by binary patterning of polar and nonpolar amino acids. Science 1993;262:1680–1685.

51. West MW, Hecht MH. Binary patterning of polar and nonpolar amino acids in the sequences and structures of native proteins. Protein Sci 1995;4:2032–2039.

52. Wang W, Hecht MH. Rationally designed mutations convert de novo amyloid-like fibrils into monomeric beta-sheet proteins. Proc Natl Acad Sci USA 2002;99:2760–2765.

53. Roy S, Hecht MH. Cooperative thermal denaturation of proteins designed by binary patterning of polar and nonpolar amino acids. Biochemistry 2000;39:4603–4607.

54. Wei Y, Liu T, Sazinsky SL, Moffet DA, Pelczer I, Hecht MH. Stably folded de novo proteins from a designed combinatorial library. Protein Sci 2003;12:92–102.

55. Smith GP. Filamentous fusion phage: novel expression vectors that display cloned antigens on the virion surface. Science 1985;228:1315–1317.

56. Barbas CF III, Hu D, Dunlop N, Sawyer L, Cababa D, Hendry RM, Nara PL, Burton DR. In vitro evolution of a neutralizing human antibody to human immunodeficiency virus type 1 to enhance affinity and broaden strain cross-reactivity. Proc Natl Acad Sci USA 1994;91:3809–3813.

57. Saggio I, Gloaguen I, Poiana G, Laufer R. CNTF variants with increased biological potency and receptor selectivity define a functional site of receptor interaction. EMBO J 1995;14:3045–3054.

58. Nizak C, Monier S, del Nery E, Moutel S, Goud B, Perez F. Recombinant antibodies to the small GTPase Rab6 as conformation sensors. Science 2003;300:984–987.

59. Kristensen P, Winter G. Proteolytic selection for protein folding using filamentous bacteriophages. Fold Des 1998;3:321–328.

60. Sieber V, Pluckthun A, Schmid FX. Selecting proteins with improved stability by a phage-based method. Nat Biotechnol 1998;16:955–960.

61. Jung S, Arndt KM, Muller KM, Pluckthun A. Selectively infective phage (SIP) technology: scope and limitations. J Immunol Methods 1999;231:93–104.

62. Finucane MD, Tuna M, Lees JH, Woolfson DN. Core-directed protein-design: I. An experimental method for selecting stable proteins from combinatorial libraries. Biochemistry 1999;38:11604–11612.

63. Finucane MD, Woolfson DN. Core-directed protein design: II. Rescue of a multiply mutated and destabilized variant of ubiquitin. Biochemistry 1999;38:11613–11623.

64. Lazar GA, Desjarlais JR, Handel TM. De novo design of the hydrophobic core of ubiquitin. Protein Sci 1997;6:1167–1178.

65. Wernisch L, Hery S, Wodak SJ. Automatic protein design with all atom force-fields by exact and heuristic optimization. J Mol Biol 2000;301:713–736.

66. Koscielska-Kasprzak K, Otlewski J. Amyloid-forming peptides selected proteolytically from phage display library. Protein Sci 2003;12:1675–1685.

67. Chu R, Takei J, Knowlton JR, Andrykovitch M, Pei W, Kajava AV, Steinbach PJ, Ji X, Bai Y. Redesign of a four-helix bundle protein by phage display coupled with proteolysis and structural characterization by NMR and X-ray crystallography. J Mol Biol 2002;323:253–262.

68. Pedersen JS, Otzen DE, Kristensen P. Directed evolution of barnase stability using proteolytic selection. J Mol Biol 2002;323:115–123.

69. Riddle DS, Santiago JV, Bray-Hall ST, Doshi N, Grantcharova VP, Yi Q, Baker D. Functional rapidly folding proteins from simplified amino acid sequences. Nat Struct Biol 1997;4:805–809.

70. Yi Q, Rajagopal P, Klevit RE, Baker D. Structural and kinetic characterization of the simplified SH3 domain FP1. Protein Sci 2003;12:776–783.

71. Berggard T, Julenius K, Ogard A, Drakenberg T, Linse S. Fragment complementation studies of protein stabilization by hydrophobic core residues. Biochemistry 2001;40:1257–1264.

72. Nilges M, Gronenborn AM, Brunger AT, Clore GM. Determination of three-dimensional structures of proteins by simulated annealing with interproton distance restraints: application to crambin, potato carboxypeptidase inhibitor and barley serine proteinase inhibitor 2. Protein Eng 1988;2:27–38.

73. Metropolis N, Rosenbluth AW, Rosenbluth MN, Teller AH, Teller E. Equations of state calculations by fast computing machines. J Chem Phys 1953;21:1087–1092.

74. Holland JH. Adaptation in natural and artificial systems. Boston: MIT Press; 1993.

75. Desmet J, De Maeyer M, Hazes B, Lasters I. The dead-end elimination theorem and its use in protein side-chain positioning. Nature 1992;356:539–542.

76. Lee C. Predicting protein mutant energetics by self-consistent ensemble optimization. J Mol Biol 1994;236:918–939.

77. Koehl P, Delarue M. Mean-field minimization methods for biological macromolecules. Curr Opin Struct Biol 1996;6:222–226.

78. Desjarlais JR, Clarke ND. Computer search algorithms in protein modification and design. Curr Opin Struct Biol 1998;8:471–475.

79. Street AG, Mayo SL. Computational protein design. Struct Fold Des 1999;7:105–109.

80. Ponder JW, Richards FM. Tertiary templates for proteins: use of packing criteria in the enumeration of allowed sequences for different structural classes. J Mol Biol 1987;193:775–791.

81. Hurley JH, Baase WA, Matthews BW. Design and structural analysis of alternative hydrophobic core packing arrangements in bacteriophage T4 lysozyme. J Mol Biol 1992;224:1143–1159.

82. Desjarlais JR, Handel TM. De novo design of the hydrophobic cores of proteins. Protein Sci 1995;4:2006–2018.

83. Johnson EC, Lazar GA, Desjarlais JR, Handel TM. Solution structure and dynamics of a designed hydrophobic core variant of ubiquitin. Struct Fold Des 1999;7:967–976.

84. Dahiyat BI, Mayo SL. Protein design automation. Protein Sci 1996;5:895–903.

85. Dahiyat BI, Mayo SL. Probing the role of packing specificity in protein design. Proc Natl Acad Sci USA 1997;94:10172–10177.

86. Gordon DB, Hom GK, Mayo SL, Pierce NA. Exact rotamer optimization for protein design. J Comput Chem 2003;24:232–243.

87. Dahiyat BI, Mayo SL. Probing the role of packing specificity in protein design. Proc Natl Acad Sci USA 1997;94:10172–10177.

88. Dahiyat BI, Mayo SL. De novo protein design: fully automated sequence selection. Science 1997;278:82–87.

89. Shimaoka M, Shifman JM, Jing H, Takagi J, Mayo SL, Springer TA. Computational design of an integrin I domain stabilized in the open high affinity conformation. Nat Struct Biol 2000;7:674–678.

90. Kono H, Nishiyama M, Tanokura M, Doi J. Designing the hydrophobic core of Thermus flavus malate dehydrogenase based on side-chain packing. Protein Eng 1998;11:47–52.

91. Jiang X, Farid H, Pistor E, Farid RS. A new approach to the design of uniquely folded thermally stable proteins. Protein Sci 2000;9:403–416.

92. Baldwin EP, Hajiseyedjavadi O, Baase WA, Matthews BW. The role of backbone flexibility in the accommodation of variants that repack the core of T4 lysozyme. Science 1993;262:1715–1718.

93. Lim WA, Hodel A, Sauer RT, Richards FM. The crystal structure of a mutant protein with altered but improved hydrophobic core packing. Proc Natl Acad Sci USA 1994;91:423–427.

94. Harbury PB, Tidor B, Kim PS. Repacking protein cores with backbone freedom: structure prediction for coiled coils. Proc Natl Acad Sci USA 1995;92:8408–8412.

95. Harbury PB, Plecs JJ, Tidor B, Alber T, Kim PS. High-resolution protein design with backbone freedom. Science 1998;282:1462–1467.

96. Su A, Mayo SL. Coupling backbone flexibility and amino acid sequence selection in protein design. Protein Sci 1997;6:1701–1707.

97. Ross SA, Sarisky CA, Su A, Mayo SL. Designed protein G core variants fold to native-like structures: sequence selection by ORBIT tolerates variation in backbone specification. Protein Sci 2001;10:450–454.

98. Desjarlais JR, Handel TM. Side-chain and backbone flexibility in protein core design. J Mol Biol 1999;290:305–318.

99. Angrand I, Serrano L, Lacroix E. Computer-assisted re-design of spectrin SH3 residue clusters. Biomol Eng 2001;18:125–134.

100. Ventura S, Vega MC, Lacroix E, Angrand I, Spagnolo L, Serrano L. Conformational strain in the hydrophobic core and its implications for protein folding and design. Nat Struct Biol 2002;9:485–493.
