Designing Proteins from the Inside Out
Salvador Ventura1,* and Luis Serrano2
1Institut de Biotecnologia i de Biomedicina and Departament de Bioquimica i Biologia Molecular, Universitat Autonoma de Barcelona, Barcelona, Spain
2European Molecular Biology Laboratory, Heidelberg, Germany
ABSTRACT Globular proteins are characterized by the specific and tight packing of hydrophobic side-chains in the so-called hydrophobic core. Formation of the core is key in folding, stabilization, and conformational specificity. The critical role of hydrophobic cores in maintaining the highly ordered structures present in natural proteins justifies the tremendous efforts devoted to their redesign. Both experimental and computational combinatorial-based approaches have been reported in recent years as powerful protein design tools. These manage to explore large regions of the sequence/conformational space, allowing the search for alternative protein core arrangements displaying native-like properties. The overall results obtained from core design projects have contributed significantly to our present knowledge of protein folding and function. In addition, core design has served as a benchmark for the development of ambitious protein design projects that now enable the de novo design of novel protein structures and functions. Proteins 2004;56:1–10. © 2004 Wiley-Liss, Inc.
Key words: protein design; hydrophobic core; protein folding; protein conformation; combinatorial approaches
Protein sequences are shaped by a complex interplay of different selective pressures that are still poorly understood. Although proteins perform their roles in vivo very efficiently, it is now clear that they are not fully optimized; they just fulfill the minimum requirements, in terms of stability and folding efficiency, that allow them to operate in the cell.1–3 Indeed, it has been shown that there is ample room for improving these properties, at least in vitro.4–7 Protein design is concerned with finding amino acid sequences that are specifically compatible with template protein structures,8,9 whereas the so-called inverse folding problem tries to characterize the set of all protein sequences compatible with a given fold.10,11 Protein design has used computational, experimental, and in some cases hybrid approaches to gain insight into the inverse protein folding problem by sampling the amino acid sequence space compatible with certain protein folds. These efforts have provided clues to the underlying physical rules that govern protein folding, structure, and function. In the last few years, this knowledge has been successfully applied to the redesign of several proteins with native-like structures, some of them with new properties and functions.12–15
*Correspondence to: Salvador Ventura, Departament de Bioquimica i Biologia Molecular, Universitat Autonoma de Barcelona, 08193 Bellaterra, Barcelona, Spain. E-mail: email@example.com
Received 4 February 2004; Accepted 4 February 2004
Published online 7 May 2004 in Wiley InterScience (www.interscience.wiley.com). DOI: 10.1002/prot.20142
PROTEINS: Structure, Function, and Bioinformatics 56:1–10 (2004)
Water-soluble proteins fold into compact structures that generally have hydrophobic side-chains buried in the interior and polar residues exposed on the surface of the molecule. The so-called hydrophobic effect, the burial of hydrophobic amino acids away from solvent inside the protein core, is widely believed to be the main force driving protein folding.16,17 However, it has been shown that proteins can also be stabilized by mutations in solvent-exposed regions.18,19 Side-chains in protein cores are tightly packed and usually adopt single, well-defined, low-energy conformations. A recent survey of 100 well-resolved crystal structures shows impressively well-fitted packing in protein interiors, with the side-chains neatly interlocked.20 The tight packing of the hydrophobic core has been found to play a key role in protein stability, both by providing many favorable van der Waals interactions and by excluding solvent to maximize hydrophobic stabilization.21 Consistent with this, even subtle substitutions inside proteins tend to be destabilizing.22–24 On the other hand, there is evidence that hydrophobic interactions do not necessarily confer folding specificity on protein structures, so the design of globular proteins with well-defined hydrophobic cores can be viewed mainly as a specificity problem. The side-chains in a disordered core adopt many alternative conformations of similar energy instead of a single, specific arrangement. Nevertheless, in many cases these disordered conformations provide sufficient stabilization energy to keep the protein in a more-or-less folded state.25–27 To achieve specificity, the designed state, with a properly packed core, has to have the lowest free energy of all possible states (the ground state), and there has to be a large free energy gap between this state and the rest of the accessible states. Attaining native-like stability and specificity is the goal of any protein-core design project.
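The specificity requirement can be made concrete with a deliberately simple equilibrium model. This model is an illustrative assumption, not one used in the works reviewed here: a single designed ground state competes with many alternative states that all lie the same free energy gap above it. The Python sketch below (function name and parameter values are hypothetical) shows why a small gap spread over many near-degenerate states, as in a disordered core, leaves the designed state sparsely populated, while a large gap restores specificity.

```python
import math

# Gas constant in kcal/(mol*K)
R_KCAL = 1.987204e-3


def native_fraction(gap_kcal: float, n_competing: int, temp_k: float = 298.15) -> float:
    """Equilibrium fraction of molecules in the designed (ground) state.

    Toy model (an assumption for illustration): one designed state at free
    energy 0 and `n_competing` alternative states, each `gap_kcal` higher.
    """
    rt = R_KCAL * temp_k
    # Summed Boltzmann weight of the competing states relative to the ground state
    competing_weight = n_competing * math.exp(-gap_kcal / rt)
    return 1.0 / (1.0 + competing_weight)


# Same number of competing states, increasing free energy gap
for gap in (0.0, 2.0, 5.0, 8.0):
    print(f"gap={gap:4.1f} kcal/mol  native fraction={native_fraction(gap, 1000):.3f}")
```

Real energy landscapes have a spread of gaps rather than a single value; the sketch only captures the qualitative trade-off between gap size and the number of competing states.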
Pioneering studies of hydrophobic core redesign involved iterative mutation of a protein scaffold and structural/functional characterization of the new designs. More recently, both computational and experimental methods have used combinatorial approaches to select suitable candidates from the vast sequence space available a priori for a particular fold.
Rational design, together with iterative experimental approaches, has provided detailed rules on hydrophobic packing constraints. A nice example of this kind of study is the work of the DeGrado group on four-helix bundle designs.28–32 This group and others33–37 established that it was possible to create de novo sequences able to adopt defined structures, providing new clues to understand protein structure and function. These experiments showed that it was surprisingly easy to obtain proteins with the target global fold but very difficult to reproduce local structural details properly, owing to the lack of specificity of hydrophobic core packing. References to these pioneering works are obligatory in any review on protein cores, but these approaches have been excellently reviewed elsewhere.38,39 Therefore, in this review we focus largely on recent combinatorial approaches to obtaining properly folded proteins with new, well-packed hydrophobic cores.
By combining rational design and experiments in an iterative way, sequences folding into the desired structure have been obtained. However, these approaches explore only local regions of the vast sequence space. This means that other possible sequence combinations, which could fit the desired structure and function equally well or better, are left unexplored. On the other hand, purely nonrational combinatorial approaches allow a larger region of the enormous sequence space available to a protein to be tested, but successful sequences will be found at very low frequency. Thus, the most advantageous way to face a particular protein design problem appears to be blending both approaches, generating enough diversity to cover a significant, chosen region of the sequence space and thereby maximizing the probability of obtaining protein sequences with the desired features.
Combinatorial approaches are powerful tools for finding solutions to problems, especially where we have only partial knowledge of the molecular rules behind the process; protein folding is exactly such a case. A combinatorial folding experiment has two key elements: the creation of a library with a desired degree of diversity, and the subsequent search for sequences with proper conformational properties. Successful hydrophobic core design requires optimization of the interactions inside the protein. However, in naturally occurring proteins, optimal packing of the hydrophobic core is usually not related to any observable phenotype.
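The library-creation step involves simple arithmetic that is worth spelling out. The sketch below assumes that every randomized position can take any of k residue types with equal probability, ignoring codon bias, stop codons, and transformation efficiency. A library then holds k^n distinct sequences for n randomized positions, and under the standard Poisson approximation the expected fraction of variants sampled at least once by N clones is 1 - exp(-N/k^n). The function names are hypothetical.

```python
import math


def library_size(n_positions: int, n_residue_types: int) -> int:
    """Distinct sequences when each randomized position independently takes
    any of `n_residue_types` amino acids."""
    return n_residue_types ** n_positions


def expected_coverage(library: int, n_clones: int) -> float:
    """Expected fraction of distinct variants seen at least once among
    `n_clones` clones, assuming equiprobable variants (Poisson approximation)."""
    return 1.0 - math.exp(-n_clones / library)


def clones_for_coverage(library: int, coverage: float) -> float:
    """Clones needed for the expected coverage to reach `coverage`."""
    return -library * math.log(1.0 - coverage)


v = library_size(4, 20)  # e.g. 4 core positions, all 20 amino acids
print(v, round(clones_for_coverage(v, 0.95)))
```

Restricting each position to a smaller, hydrophobic subset shrinks k^n sharply, which is one way rational input keeps a combinatorial library within screenable size.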
Selecting for Function
The development of systems to screen large combinatorial libraries for proper hydrophobic core packing is difficult in the absence of observable biological phenotypes. Thus, the first experimental combinatorial approaches to designing protein cores relied on functional selection, assuming that, when protein activity can be detected, biological activity correlates with protein conformation.
Lim, Sauer, and coworkers40,41 used cassette mutagenesis to randomize combinatorially up to 4 interacting residues in the hydrophobic core of the N-terminal domain of lambda repressor. Sequences in the resulting protein pool were selected by their ability to bind DNA. Most of the protein variants in the library showed some level of biological activity, indicating that basic structural information appears to reside largely in the hydrophobic character of core residues. However, only 2 sequences were found to be comparable to the wild-type protein in terms of stability and binding activity. This was one of the first indications in the literature that proper core packing interactions are important determinants of a protein's precise structure and stability. In this case, the extent of functional impairment correlated with the extent of modification of the native sequence, as expected if natural selection acts on function.
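How a fully randomized core library distributes around the native sequence can be illustrated by enumeration. In the Python sketch below, the four-position "wild-type" core and the six-residue hydrophobic alphabet are hypothetical placeholders, not the positions or residue sets used by Lim and Sauer. For n randomized positions and k allowed residues, the number of variants carrying d substitutions is C(n, d)(k - 1)^d, so in a library like this one most members carry several substitutions.

```python
from collections import Counter
from itertools import product

# Hypothetical inputs for illustration only: the actual positions and residue
# choices used by Lim and Sauer are not reproduced here.
HYDROPHOBIC = "AVILMF"
WILD_TYPE_CORE = "VLIM"


def substitutions(variant: str, reference: str) -> int:
    """Number of positions at which `variant` differs from `reference`."""
    return sum(a != b for a, b in zip(variant, reference))


# Enumerate every combination of allowed residues at the core positions
# and tally how many variants lie at each distance from the wild type.
counts = Counter(
    substitutions("".join(combo), WILD_TYPE_CORE)
    for combo in product(HYDROPHOBIC, repeat=len(WILD_TYPE_CORE))
)
for distance in sorted(counts):
    print(distance, counts[distance])
```

If impairment grows with the number of substitutions, a distribution weighted this heavily toward distant variants is consistent with near-wild-type behavior being rare in such a library.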
Next, Alan Fersht's group42 saturated the core of barnase with random hydrophobic substitutions and selected active mutants by taking advantage of the extreme toxicity of this enzyme when it is expressed in Escherichia coli. More than 20% of the randomized sequences maintained activity in vivo, and active protein variants with no wild-type core residues were obtained. As in Lim and Sauer's work, hydrophobicity appeared to be a sufficient criterion to attain a somewhat functional core. Even though the relative activities of the different protein sequences were not assayed in this work, it was proposed that refining these crude cores to attain proper function is the most stringent sequence constraint. Fersht and coworkers suggested that new functions can be developed more easily by limiting core design to the mere specification of hydrophobicity and using an iterative mutation/selection procedure to optimize core structure. On the basis of this and other studies43,44 by this group on barnase, it has been argued that every CH2 group in the hydrophobic core contributes equally to the net stability of a protein. This would imply that the core is plastic and adjustable in thermodynamic terms.
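Written as a formula, the equal-contribution hypothesis amounts to a simple additive model. The notation is ours, and the per-group constant ε is left symbolic because its value is not given in this section:

$$
\Delta\Delta G \approx \varepsilon \, \Delta n_{\mathrm{CH_2}}
$$

Here ΔΔG is the change in folding free energy caused by a core mutation and Δn_CH2 is the number of side-chain CH2-equivalent groups deleted from the core. For example, in this idealized model a Leu→Ala substitution, which removes three side-chain carbon groups, would be predicted to cost about 3ε.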
Selecting for Folding
As it appears, attaining some degree of biological function requires only loose packing of a protein's hydrophobic core. This implies that selecting protein variants by activity or binding ability will fail to pick out forms with optimized stability or tight core packing. Therefore, several methods that uncouple function and stability have been designed in recent years.
These methods allow the redesign of proteins for which no selective assays are available, and they resemble in silico methods, in which no functional selection can be performed. Thus, it is possible to compare computational and experimental results in the same system, while decoupling selection from the evolutionary requirements of nature's proteins. Finally, since no bias for function is introduced, rules describing the relation between sequence and structure or stability can be inferred indirectly from these approaches.
One of the first experimental combinatorial approaches to designing a new hydrophobic core without a functional screen was carried out by Mossing's group.45 As a template they used a previously designed version of the lambda Cro repressor. The scaffold protein had a three-dimensional (3D) structure very similar to that specified by the original design. However, the protein displayed packing defects, with a somewhat expanded hydrophobic core, as deduced from the crystal structure.46 They applied combinatorial mutagenesis and a genetic screen for differential protein expression levels in E. coli to construct a second generation of proteins displaying alternative arrangements of the hydrophobic core with enhanced stability. In this work, structural input was combined with combinatorial mutagenesis to search the combinatorial space efficiently, exploiting the loose correlation observed between protein stability and protein expression in bacterial systems.
In structure- and stability-focused studies, the biophysical properties of highly purified protein samples are assayed in vitro. Because many designed proteins lack biological phenotypes, screening systems based on biophysical properties are valuable tools in combinatorial protein design. However, the need for purity has prevented the application of such approaches to high-throughput screening. Recently, Johnson and Hecht have developed methods that enable rapid purification of semipure samples, suitable for biophysical characterization, from bacterial lysates.47 The screening for properly packed structures is based on methods capable of monitoring certain features present in native-like proteins but absent in loosely packed variants, such as sharp peaks and good chemical-shift dispersion in one-dimensional NMR spectra, or greater protection of amide protons assayed by mass spectrometry.48,49 The Hecht group has applied these rapid procedures to screen combinatorial libraries based on binary patterning. In these protein libraries, the positions of polar and nonpolar residues are specified explicitly, but the identities of these side-chains are allowed to vary. Amino acid stretches with the periodicity of α-helix and/or β-sheet secondary structure are linked by glycine-, proline-, and polar-based turns.50 Using this simple binary code strategy, they have designed, evolved, and characterized de novo proteins that fold cooperatively and specifically into α-helix or β-sheet structures.50–53 Moreover, they have found that when binary patterning is applied to an appropriately designed structural scaffold, the libraries contain a relatively large number of well-ordered structures. In light of Mossing and coworkers' results, some bias toward the selection of nontoxic, better-expressed variants may be expected in this procedure, with a consequent enrichment in folded species. From these studies, it appears that, for a given structural scaffold, many different amino acid combinations can specify folded structures. However, even though Hecht et al. have isolated α-helical variants with good stability and near-native-like structural features from a second-generation library,54 no 3D structure of these binary-patterned proteins has been solved to date, and little can be said about the specificity of core packing.
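The binary code strategy can be sketched in a few lines of Python. The residue sets and the two patterns below are illustrative assumptions, not the exact sets or sequences used by the Hecht group: polarity is fixed position by position, and only the identity of each residue within its class is drawn at random. The helix pattern places nonpolar residues with a 3.5-residue repeat, close to the 3.6 residues per turn of an α-helix, so they line up along one face; the strand pattern simply alternates, so one face of the strand is nonpolar.

```python
import math
import random

# Illustrative residue classes (an assumption for this sketch).
POLAR = "DEHKNQ"
NONPOLAR = "FILMV"
CHOICES = {"P": POLAR, "N": NONPOLAR}

# Hypothetical patterns: 'N' = nonpolar, 'P' = polar.
# Helix: nonpolar at i, i+3, i+4, i+7, ... (one amphipathic face).
HELIX_PATTERN = "PNPPNNPPNPPNNP"
# Strand: strict alternation, so one face of the strand is nonpolar.
STRAND_PATTERN = "NPNPNPN"


def sample_sequence(pattern: str, rng: random.Random) -> str:
    """Draw one library member: polarity fixed by `pattern`, identity random."""
    return "".join(rng.choice(CHOICES[symbol]) for symbol in pattern)


def pattern_library_size(pattern: str) -> int:
    """Number of distinct sequences compatible with a binary pattern."""
    return math.prod(len(CHOICES[symbol]) for symbol in pattern)


rng = random.Random(0)  # fixed seed so the draw is reproducible
for _ in range(3):
    print(sample_sequence(HELIX_PATTERN, rng))
print(sample_sequence(STRAND_PATTERN, rng))
print(pattern_library_size(HELIX_PATTERN))
```

Even this short 14-residue helical segment is compatible with 6^8 × 5^6 (about 2.6 × 10^10) sequences, which illustrates why such libraries are screened rather than enumerated.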
Display technologies have emerged in the last few years as a powerful, totally unbiased approach to generating large, random combinatorial peptide and protein libraries. In phage display technology, genetically encoded multiple mutants of a...