Available online at www.sciencedirect.com

Protein disorder — a breakthrough invention of evolution?

Avner Schlessinger5, Christian Schaefer1, Esmeralda Vicedo1, Markus Schmidberger1, Marco Punta1,2,3 and Burkhard Rost1,2,3,4

As an operational definition, we refer to regions in proteins that do not adopt regular three-dimensional structures in isolation as disordered regions. The counterpart to disorder would be ‘well-structured’ rather than ‘ordered’. Here, we argue for three hypotheses. Firstly, it is more useful to picture disorder as a distinct phenomenon in structural biology than as an extreme example of protein flexibility. Secondly, although there are many very different flavors of protein disorder, it seems advantageous to portray the universe of all possible proteins in terms of two main types: well-structured and disordered. There might be a third type, ‘other’, but so far we have no positive evidence for it. Thirdly, nature uses protein disorder as a tool to adapt to different environments. Protein disorder is evolutionarily conserved, and this maintenance of disorder is highly nontrivial. Increasingly integrating protein disorder into the toolbox of a living cell was a crucial step in the evolution from simple bacteria to complex eukaryotes. We need new, advanced computational methods to study this milestone in the advance of protein biology.

Addresses

1 TUM, Bioinformatik - i12, Informatik, Boltzmannstrasse 3, 85748 Garching, Germany

2 Institute of Advanced Study (IAS), TUM, Boltzmannstr. 3, 85748 Garching, Germany

3 New York Consortium on Membrane Protein Structure (NYCOMPS), TUM Bioinformatics, Boltzmannstr. 3, 85748 Garching, Germany

4 Department of Biochemistry and Molecular Biophysics, Columbia University, 701 West, 168th Street, New York, NY 10032, USA

5 Department of Bioengineering and Therapeutic Sciences, Department of Pharmaceutical Chemistry, and California Institute for Quantitative Biosciences, University of California, San Francisco, CA, USA

Corresponding author: Rost, Burkhard ([email protected])

Current Opinion in Structural Biology 2011, 21:412–418

This review comes from a themed issue on Sequences and topology

Edited by Julian Gough and Keith Dunker

Available online 20 April 2011

0959-440X/$ – see front matter

© 2011 Elsevier Ltd. All rights reserved.

DOI 10.1016/j.sbi.2011.03.014

Introduction

Dog eat dogma

What was once the Central Dogma of Molecular Biology (‘DNA makes RNA makes protein’) has cracked due to the discovery of the functional importance of noncoding RNA [1]. The ‘Central Dogma of Genomics that derives from structural biology’ [2] implies that proteins adopt unique three-dimensional (3D) structures, and that the intricate, detailed order in these 3D structures determines protein function. Over the last decade, experimental and computational structural biologists have been accumulating surprising evidence: seemingly every organism has proteins that appear not to adopt 3D structures in isolation, that is, proteins that contain disorder. Is it time for the dog to eat the dogma [2] that sequence determines structure determines function, as Greg Petsko so poetically phrased it?

Since 3D details can determine function, structures have evolved to exhibit innate and specific flexibility [3–6]. Functional flexibility spans a wide range in terms of time scale and amount of motion [7]. Is protein disorder just an extreme example of flexibility, and if so, would this save the dogma a little longer?

Mixed nuts, no fruits: disorder as a new principle of protein structure

Disorder is a mixed bag

Here, we refer to disordered regions as those regions in proteins that, in isolation (i.e., not bound to other molecules), do not fold into a well-defined 3D structure but rather sample a large portion of their available conformational space. Put differently, if we could observe a disordered region in isolation at two different times, we would see two grossly different structures [8–10]. This definition covers local flexible loops, extended domains, molten globule domains, and folded domains with flexible linkers [10]. These objects have been called the flavors of disorder [11–13]. The length of a disordered region matters: data suggest that regions spanning just a few (<10) consecutive residues are ‘just’ loops in well-structured proteins, while very long regions behave differently [14–18]. There is no sound way to define a particular value as the threshold between short and long. Clearly, regions longer than 30 residues are long, and regions shorter than 10 residues are short.
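The length convention above can be made concrete with a small sketch. It assumes that a disorder predictor has already produced one boolean per residue (True for disordered); the function names `disordered_regions` and `classify_length` are hypothetical, and labelling lengths of 10 to 30 residues as ‘intermediate’ simply reflects that the text leaves this range undecided, not an established cut-off.

```python
from itertools import groupby


def disordered_regions(mask):
    """Return (start, end) pairs (0-based, end exclusive) of maximal runs of
    residues flagged as disordered in a per-residue boolean mask."""
    regions = []
    pos = 0
    for is_disordered, run in groupby(mask):
        length = sum(1 for _ in run)
        if is_disordered:
            regions.append((pos, pos + length))
        pos += length
    return regions


def classify_length(length, short_max=10, long_min=30):
    """Label a region short (< short_max), long (> long_min), or
    intermediate (the range where no sound threshold exists)."""
    if length < short_max:
        return "short"
    if length > long_min:
        return "long"
    return "intermediate"


# Hypothetical prediction: 5 disordered, 20 ordered, 35 disordered, 12 ordered, 15 disordered.
mask = [True] * 5 + [False] * 20 + [True] * 35 + [False] * 12 + [True] * 15
for start, end in disordered_regions(mask):
    print(start, end, classify_length(end - start))
# 0 5 short
# 25 60 long
# 72 87 intermediate
```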

In the following, we try to make three points. First, long disordered regions clearly differ from well-structured regions. Second, long disorder appears difficult to maintain against random mutations; hence, the levels of disorder observed in native proteins suggest a functional role for long disordered regions. Third, disorder is unequally distributed on the tree of life and seems to correlate in a nontrivial way with organism complexity.

Well-structured different than disorder; both not random

PDB, the Protein Data Bank [19], may represent the universe of all proteins that adopt regular, well-ordered 3D structures, and SWISS-PROT [20] the larger universe of proteins to which we are able to attach some kind of functional label via our current experimental techniques. These two populations (proteins in PDB and in SWISS-PROT) differ [14]. Protein disorder is one of the three major aspects that explain some of that difference [14]; the other two are membrane regions and coiled-coil regions. In eukaryotes, disorder occupies a larger fraction of the sequence space than membrane-spanning regions [14,21]; in prokaryotes, encompassing both bacteria and archaea, the two are about on par [21]. In all three super-kingdoms, both membrane and disordered regions contain 2–3 times more residues than coiled-coils [14,21].

Usually, structural biologists perceive coiled-coils and membrane regions as part of the universe of well-structured proteins, and long flexible loops, as observed, for example, in domain linkers or antibodies, are also considered an intrinsic aspect of well-defined structures. Should we then consider disorder analogously, that is, as one aspect of well-structured proteins?

Figure 1 (bar chart; x-axis: percentage of regions, 0–100; bars for loops of 50–80 and 30–40 residues and for disorder of 50–80 and 30–40 residues, each split into only disorder, disorder + loop, and only loop)

Disorder is not loops only. In a selection of a few entirely sequenced organisms (Acaryochloris marina, Drosophila melanogaster, Hemiselmis andersenii, Homo sapiens, Saccharomyces cerevisiae and Methanococcus aeolicus), we identify all regions of 30–40 and of 50–80 residues that PROFsec [28,29,54] predicts to contain only loops (gray), and all that NORSnet [18] predicts to contain only disorder (pink). We then monitor the overlap between disorder and loop (gray and pink stripes). Although most of the loopy disorder is in fact predicted as loop (93% for regions of 30–40 residues and 72% for regions of 50–80 residues), only half of the long loops are actually predicted to be disordered. Thus, even the loopy type of disorder differs significantly from nonregular secondary structure. This is particularly remarkable because NORSnet was trained exclusively on loops predicted by PROFsec in the first place.

Computational biologists can predict, and experimental biologists can describe, some flavors of protein disorder [8,13,22–26]. Analyzing the populations of proteins with disorder identified experimentally and/or computationally reveals that disordered regions differ from well-structured regions, and that both differ from polypeptides encoded by sequences assembled randomly in silico [14,15].

Disorder not the same as nonregular secondary structure

Known well-structured proteins have, on average, about half of their residues in either helices or strands, that is, in regular secondary structure [27–29]. Many disordered regions are predicted not to contain regular secondary structure; we refer to those as the ‘loopy flavor’ of disorder [14]. Thus, we can identify some disordered regions by searching for very long regions with no regular secondary structure (NORS regions). NORSnet is a machine-learning-based method that identifies such loopy regions [18]; disorder predicted by NORSnet overlaps highly with disorder predicted by two methods that were developed using very different principles and data sets, namely DISOPRED2 [30] and IUPred [31], both of which explicitly used experimentally characterized disordered regions for their optimization. Surprisingly, NORSnet can distinguish between long (≥30 residues) disordered regions and those long regions that resemble shorter loops in well-structured proteins: not all loopy disordered regions of, for example, 30–40 residues are predicted to consist of loops, and, vice versa, not all 30–40-residue loops are predicted to be loopy disorder (Figure 1).
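The overlap analysis behind Figure 1 can be illustrated with a simplified sketch. This is not the published PROFsec/NORSnet pipeline: it assumes a per-residue secondary-structure string over the hypothetical alphabet H (helix), E (strand), L (loop), a per-residue disorder string over D (disordered) and ‘-’ (other), and it takes as candidate regions the maximal runs of loop or disorder whose length falls within a window such as 30 to 40 residues. The names `runs` and `overlap_fractions` are illustrative.

```python
import re


def runs(seq, symbol, min_len, max_len):
    """Maximal runs of `symbol` in `seq` whose length lies in [min_len, max_len],
    returned as (start, end) pairs with end exclusive."""
    pattern = re.compile(re.escape(symbol) + "+")
    return [(m.start(), m.end()) for m in pattern.finditer(seq)
            if min_len <= m.end() - m.start() <= max_len]


def overlap_fractions(sec_struct, disorder, min_len=30, max_len=40):
    """Return (share of loop-only regions that are entirely disordered,
    share of disorder-only regions that are entirely loop)."""
    loop_regions = runs(sec_struct, "L", min_len, max_len)
    dis_regions = runs(disorder, "D", min_len, max_len)
    loops_disordered = sum(set(disorder[s:e]) == {"D"} for s, e in loop_regions)
    disorder_loopy = sum(set(sec_struct[s:e]) == {"L"} for s, e in dis_regions)
    share_loops = loops_disordered / len(loop_regions) if loop_regions else 0.0
    share_dis = disorder_loopy / len(dis_regions) if dis_regions else 0.0
    return share_loops, share_dis


# Hypothetical protein of 92 residues: two long loops, only one of them disordered.
sec = "H" * 10 + "L" * 35 + "E" * 10 + "L" * 32 + "H" * 5
dis = "-" * 10 + "D" * 35 + "-" * 47
print(overlap_fractions(sec, dis))  # (0.5, 1.0)
```

With real data, the two strings would have to be derived from the outputs of a secondary-structure predictor and a disorder predictor; that conversion step is not shown and would depend on each tool's output format.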

Maintaining disorder requires an extra effort

Maintaining disorder is nontrivial in several respects. Two important challenges that cells face in evolving protein disorder, and that we do not address here, are that protecting disordered proteins from aggregation and digestion is demanding [32], and that over-expression of disordered proteins can be damaging [10,33]. Here, we address another important aspect: the difficulty of evolving and maintaining disorder against the odds of random mutations.

One prevailing view, supported by careful analysis and data, portrays disorder as a means to become ‘immune’ against mutations [13,34]. The argument combines two assumptions. First, mutations in protein sequences are more often problematic for an organism than beneficial [35–37]. Second, the details of well-ordered 3D structures make those structures susceptible to mutations [37–41]. In fact, the profiles of which amino acids can be exchanged for which others at which positions contain crucial information that can be used to predict aspects of protein structure and function [27–29,42]. It might therefore be advantageous to invent functional regions that are less sensitive to mutation. Could protein disorder be a means toward this end? Some data support this view [34,43–50].

Recent observations, however, sharpen this view: many long disordered regions are not robust against random mutations [18]. In particular, long disordered regions seem to disappear upon random in silico mutagenesis and are predicted less often in sequences assembled randomly in silico than in naturally observed sequences (Figure 2). In contrast, regular secondary structure is robust against mutation [15,51] (Figure 2). This suggests that a cell must invest a specific effort to evolve and maintain long protein disorder against the drift from random mutations [15]; hence, these regions should not be seen as a cell's way to immunize itself against mutations.

Figure 2 (two panels comparing PDB-native, PDB-random, DisProt-native and DisProt-random; left panel: predicted secondary structure content in protein, split into loop, strand and helix; right panel: predicted disorder content in protein, split into short (<10) and long (>30) disorder, with 1 - predicted disorder as the remainder; x-axes: content in protein, 0 to 1)

Helix/strand conserved, disorder not. We randomly synthesize protein sequences in silico. In one experiment, we use the frequency of amino acids observed in the database of known protein structures (PDB [19]); in the other, the frequency observed in the database of proteins with disordered regions (DisProt [81]). We then predict secondary structure (PROFsec [28,29,54]) and disorder (IUPred [31]) for these random proteins as well as for representative subsets of native proteins from those two databases. Firstly, the resulting four sets of predictions are surprisingly similar in their secondary structure (left panel; red: helix, blue: strand, gray: loop). Secondly, the four are also similar in their short disorder (right panel: green bars). In contrast, they differ very significantly in terms of long disorder (pink). These findings suggest that short disorder resembles loops, while long disorder behaves very differently. The random mutagenesis published earlier [15] shows the high sensitivity of long disorder to random mutations.
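The random controls mentioned in this section and in the Figure 2 legend boil down to two operations: drawing sequences from an amino-acid composition and introducing random substitutions. The sketch below shows only these two steps under stated assumptions: the uniform composition is a placeholder (the experiments described here used frequencies observed in PDB or DisProt), the mutation model is uniform single-residue substitution at distinct positions, and the function names are hypothetical. Predicting secondary structure or disorder for the resulting sequences would require external predictors such as PROFsec or IUPred, which are not called here.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def random_sequence(composition, length, rng):
    """Draw a sequence residue by residue from an amino-acid composition
    (a mapping from one-letter code to relative frequency)."""
    letters = list(composition)
    weights = [composition[a] for a in letters]
    return "".join(rng.choices(letters, weights=weights, k=length))


def mutate(sequence, n_mutations, rng):
    """Apply n_mutations random substitutions at distinct positions; each
    mutated position receives an amino acid different from the original."""
    residues = list(sequence)
    for pos in rng.sample(range(len(residues)), n_mutations):
        residues[pos] = rng.choice([a for a in AMINO_ACIDS if a != residues[pos]])
    return "".join(residues)


rng = random.Random(42)
# Placeholder composition: uniform over the 20 standard amino acids.
composition = {a: 1.0 for a in AMINO_ACIDS}
random_protein = random_sequence(composition, 100, rng)
mutant = mutate(random_protein, 10, rng)
print(random_protein)
print(sum(a != b for a, b in zip(random_protein, mutant)))  # prints 10
```

Sampling mutation positions without replacement keeps the number of changed residues exactly equal to the requested count, which makes mutation rates easy to compare across runs.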

Disorder seems evolutionarily conserved

Methods that compare proteins (e.g., alignment methods) were developed using well-structured proteins and typically ignore regions of low complexity. Because disordered regions can be rich in low-complexity segments or adopt very different structures when folded, they are difficult to align [52]. Conclusions from aligning disordered regions should, therefore, be taken with many grains of salt [53]. Despite this caveat, we observed that very long loopy disorder (NORS-type) regions appeared to be more conserved than their flanking regions [14]. In contrast, loops in well-structured proteins are, on average, less well conserved than regular secondary structure [54]; that is, if loopy disorder were similar to loops in well-structured proteins, we would expect a strong trend in the opposite direction. Furthermore, disorder predictors that use sequence profiles as input perform significantly better than those that use only the single sequence, suggesting that sequence conservation is an important feature for prediction [21,55]. Overall, disorder seems more conserved than we would expect from a tool that renders proteins immune to change.
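A rough way to compare the conservation of a putative disordered region with that of its flanks is to score each column of a multiple sequence alignment and average the scores inside and outside the region. The sketch below uses Shannon entropy as the score (lower means more conserved); the entropy measure, the default flank width of 10 columns, the gap handling, and the toy alignment are assumptions for illustration and not the measure used in [14]. Given the alignment caveats above, such numbers deserve the same grains of salt.

```python
import math
from collections import Counter


def column_entropy(column):
    """Shannon entropy in bits of one alignment column; gap characters ('-') are
    ignored, so an all-gap column scores 0.0 (a known weakness of this toy score)."""
    residues = [c for c in column if c != "-"]
    if not residues:
        return 0.0
    total = len(residues)
    return -sum(n / total * math.log2(n / total) for n in Counter(residues).values())


def region_vs_flanks(alignment, start, end, flank=10):
    """Mean column entropy inside columns [start, end) and in up to `flank`
    columns on each side; lower values mean stronger conservation."""
    n_cols = len(alignment[0])
    entropies = [column_entropy([seq[i] for seq in alignment]) for i in range(n_cols)]
    inside = entropies[start:end]
    flanks = entropies[max(0, start - flank):start] + entropies[end:end + flank]
    if not inside or not flanks:
        raise ValueError("region and flanks must both be non-empty")
    return sum(inside) / len(inside), sum(flanks) / len(flanks)


# Toy alignment of four sequences: columns 1-5 are identical, the outer columns vary.
alignment = ["ACDKLMW", "GCDKLMY", "TCDKLMF", "SCDKLMH"]
print(region_vs_flanks(alignment, 1, 6))
```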

Does disorder need to be conserved for functional reasons? We have yet to discover examples of regions in proteins that are conserved in evolution without any good functional reason [56–59]. Thus, the conservation of disorder underlines the importance of this new principle of protein structure for function. Indeed, we know many examples of functional disorder, and we know that the more functional a region, the more it is conserved. Unfortunately, except for a tiny set of examples that may not be representative [6,43–45,50], we cannot conclusively answer to what extent disorder needs to be conserved in order to preserve function.

Protein disorder: a major tool for evolution?

Eukaryotes more disordered than prokaryotes

In 1995, we got the first glimpse of an entirely sequenced organism [60], and organisms from all super-kingdoms of life have followed since [61]. Several attempts have been made to find simple protein features that distinguish super-kingdoms [62,63]. Surprisingly, eukaryotic and prokaryotic proteins resemble each other in terms of number of domains, protein length, and amino acid composition (with some caveats [63–66]). Contrary to early hypotheses, the fraction of membrane proteins is also fairly similar (put differently, it varies more within each super-kingdom than it differs between them). Some main themes in the cores of 3D shapes [67] are kingdom-specific [68]. However, our set of structures may still be substantially incomplete, and most relations that exist only at the level of 3D structure cannot be detected from sequence alone [41]; that is, the kingdom-specificity of folds may turn out to be invalid. Can we identify disorder as one of the major tools through which evolution increases complexity and enables adaptation?

Figure 3 (bar charts for the prediction methods MD, IUPred and VSL2; groups: Eukaryota, Bacteria, Archaea and Viruses; x-axes: percentage of proteins with at least one disorder region of ≥30 consecutive residues (left panel) or ≥80 consecutive residues (right panel))

Eukaryotes have much more disorder than prokaryotes. Prediction of disorder in all 1848 organisms with entirely sequenced proteomes (~4 million proteins, UNIPROT, December 2010). For each protein, we predicted disordered regions with three publicly available methods, namely MD [21], IUPred [31], and VSL2 [21,55]. We removed all regions with short disorder, using two thresholds for ‘short’: the left panel shows results after removing regions with fewer than 30 consecutive disordered residues, the right panel after removing those with fewer than 80 residues. The x-axes give the percentage of proteins with at least one disordered region according to these two length thresholds, compiled over all proteins from all fully sequenced organisms in each of the three super-kingdoms and in viruses. Although the three methods disagree on the percentage of proteins with predicted disorder, they largely agree that eukaryotes have many more proteins with disorder; for example, MD and IUPred both estimate that about 2–3% of all proteins in archaea and bacteria have at least one disordered region longer than 80 residues (right panel), while both estimate this for about 17–19% of all proteins in eukaryotes.

Disorder associated with complexity of an organism

In the following, we refer to disorder composition as the percentage of proteins in an entirely sequenced proteome that contain at least one long region of disorder. As experimental information remains insufficient, this value can currently only be obtained in silico. For a variety of definitions of ‘long region of disorder’ with respect to length and prediction method, the composition of disorder clearly correlates with the super-kingdom (bacteria, archaea, and eukaryotes; Figure 3) [14,69]. Furthermore, computational predictions have revealed some overabundance of disorder in protein interaction hubs [70–73] (more specifically, in date hubs [18]) and in proteins involved in transcription [13,14,74], translation [13,14], signal transduction [13,14], and, more recently, ubiquitination [75]. All these observations support the (oversimplified) view of disordered regions as a molecular tool that increases the complexity of a system or organism, because, ultimately, the complexity of regulation is one of the major differences between bacteria and eukarya, as well as between lower eukaryotes and mammals.
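The definition of disorder composition translates directly into a short computation. The sketch assumes that predicted disordered regions are already available for every protein as (start, end) pairs with an exclusive end, for example parsed from the output of a predictor such as MD, IUPred, or VSL2; the dictionary layout, the function names, and the toy proteome are hypothetical. The minimum length is a parameter because, as the Figure 3 legend notes, the numbers depend on the threshold.

```python
def has_long_disorder(regions, min_length):
    """True if any (start, end) region, end exclusive, spans at least min_length residues."""
    return any(end - start >= min_length for start, end in regions)


def disorder_composition(proteome, min_length=30):
    """Percentage of proteins in a proteome (protein id -> list of predicted
    disordered regions) with at least one region of >= min_length residues."""
    if not proteome:
        return 0.0
    hits = sum(has_long_disorder(regions, min_length) for regions in proteome.values())
    return 100.0 * hits / len(proteome)


# Hypothetical toy proteome with four proteins.
toy = {
    "P1": [(0, 12)],
    "P2": [(40, 75)],
    "P3": [],
    "P4": [(5, 20), (100, 190)],
}
print(disorder_composition(toy, 30))  # 50.0 (P2 and P4 qualify)
print(disorder_composition(toy, 80))  # 25.0 (only P4 qualifies)
```

Pooling all proteins of a super-kingdom into one dictionary, as the Figure 3 legend describes, weights large proteomes more heavily than averaging per-proteome percentages would; which summary is appropriate depends on the question.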

Can we view protein disorder, the new structural principle that is costly to maintain and evolve, as a molecular mechanism that brings about complexity? Many aspects of disorder suggest that this might be so, but we need to dig much deeper to accumulate evidence. Particular support for the image of disorder as a molecular tool to build complexity comes from the analysis of disorder in the context of diverse environments.

Protein disorder related to environment

Differences between organisms from different habitats are imprinted upon disorder

We can study the relation between evolution and disorder by analyzing disorder in prokaryotes (many diverse organisms are sequenced, evolutionary relations can be quantified [76], and disorder predictions are accurate [77]). Our preliminary work suggests that differences between organisms from distinct habitats are imprinted upon the fraction of proteins with long disordered regions. First, proteomes of thermophiles are comparatively well-structured, which might explain the high success rate of their proteins in structure determination [78]. Second, proteomes of psychrophiles (i.e., organisms thriving at low temperatures) are comparatively disordered. Tompa et al. picked up on this finding, showing experimentally that some disordered regions can function in the cold better than well-structured proteins [79]. Third, proteomes of archaeal and bacterial halophiles (i.e., organisms that survive high-salt conditions) are comparatively disordered, in agreement with the notion that acidic residues, which ‘promote’ disorder [10], are important for proteins to be functional in salty conditions [80]. Fourth, organisms with high tolerance for mutations tend to be rich in disorder. For instance, the fraction of disorder in the proteome of Deinococcus radiodurans, a bacterium that can survive high doses of radiation, is very high. The malleability of its proteome might enable this bacterium to tolerate structural modifications resulting from frequent mutations, thereby escaping radiation damage. If so, this might be an example of how, despite the sensitivity of long disorder to mutations, disorder might buffer against fateful mutations. The finding that the majority of these correlations between environment and disorder are independent of the phylogenetic branch on the tree of life (e.g., animal kingdom) increases our confidence that they did not occur by chance.

The finding that a microscopic feature as coarse-grained as the overall content of proteins with disordered regions correlates with such a complex macroscopic variable as the environment remains surprising and will have to be investigated further. Many questions remain open. Are these molecular adaptations incremental, or do we have evidence for some sort of leap or transition? Is a little more or less flexibility distributed throughout the protein the gradual response to extreme habitats, or do we have evidence for some major step in local regions that may increase the fitness of extremophiles in leaps?

Advanced alignment methods capturing distant sequence relationships will enable us to accurately compare orthologous proteins and pathways among genomes; disorder prediction methods focusing on different flavors of disorder will enable us to describe the functional differences and commonalities among these pathways, thus hopefully providing answers to these questions.

Conclusions

In this perspective, we argue for three major views; none of them can be established yet, but we argue that accepting them for the time being is beneficial. Firstly, well-structured and disordered regions occupy different regions in the space of all sequences, and both differ from random sequences. Put differently, protein disorder is something new, not an extreme aspect of flexible regular structure. Secondly, although the term protein disorder describes a very mixed bag of features, the contents of this bag share the label disorder. Thirdly, protein disorder is conserved in evolution, and we can picture it as one important tool that helped evolution advance from simple bacteria to more complex eukaryotes. Ultimately, we need better experimental and computational methods to understand the role of this new phenomenon in biology. Is disorder THE major tool that facilitates the increase in complexity and the adaptation to the environment? If the data suggesting this view are confirmed, protein disorder may turn out to be THE most important breakthrough invention of evolution.

Acknowledgements

Thanks to Laszlo Kajan, Tim Karl, and Marlena Drabik (TUM), Julian Gough (Univ. Bristol) and Keith Dunker (Indiana Univ.), and Joel Sussman (Weizmann) for their important support, and to the anonymous reviewer for helping to improve this paper. Our work was supported by the Alexander von Humboldt Foundation, the TUM Institute for Advanced Study, funded by the German Excellence, and the following NIH grants: R01-LM07329, U54-GM75026-01, NIH F32-GM088991. Last but not least, thanks to all those who deposit their experimental data in public databases, and to those who maintain these databases.

References

1. Mattick JS: Deconstructing the dogma: a new view of the evolution and genetic programming of complex organisms. Ann N Y Acad Sci 2009, 1178:29-46.

2. Petsko GA: Dog eat dogma. Genome Biol 2000, 1:1002.

3. Noble M, Blanchard JS: Catalysis regulation. Curr Opin Struct Biol 2009, 19:641-642.

4. Sippl MJ: Fold space unlimited. Curr Opin Struct Biol 2009, 19:312-320.

5. Sadowski MI, Jones DT: The sequence–structure relationship and protein function prediction. Curr Opin Struct Biol 2009, 19:357-362.

6. Tokuriki N, Oldfield CJ, Uversky VN, Berezovsky IN, Tawfik DS: Do viral proteins possess unique biophysical features? Trends Biochem Sci 2009, 34:53-59.

7. Palmer AG 3rd, Massi F: Characterization of the dynamics of biomacromolecules using rotating-frame spin relaxation NMR spectroscopy. Chem Rev 2006, 106:1700-1719.

8. Radivojac P, Iakoucheva LM, Oldfield CJ, Obradovic Z, Uversky VN, Dunker AK: Intrinsic disorder and functional proteomics. Biophys J 2007, 92:1439-1456.

9. Dosztanyi Z, Tompa P: Prediction of protein disorder. Methods Mol Biol 2008, 426:103-115.

10. Dyson HJ, Wright PE: Intrinsically unstructured proteins andtheir functions. Nat Rev Mol Cell Biol 2005, 6:197-208.

11. Romero P, Obradovic Z, Kissinger CR, Villafranca JE, Garner E,Guilliot S, Dunker AK: Thousands of proteins likely to have longdisordered regions. Pac Symp Biocomput 1998:437-448.

12. Vucetic S, Brown CJ, Dunker AK, Obradovic Z: Flavors of proteindisorder. Proteins 2003, 52:573-584.

13. Dunker AK, Silman I, Uversky VN, Sussman JL: Function andstructure of inherently disordered proteins. Curr Opin StructBiol 2008, 18:756-764.

14. Liu J, Tan H, Rost B: Loopy proteins appear conserved inevolution. J Mol Biol 2002, 322:53-64.

15. Schaefer C, Schlessinger A, Rost B: Protein secondary structureappears to be robust under in silico evolution while proteindisorder appears not to be. Bioinformatics 2010, 26:625-631.

16. Bordoli L, Kiefer F, Schwede T: Assessment of disorderpredictions in CASP7. Proteins 2007, 69(Suppl. 8):129-136.

17. Mohan A, Uversky VN, Radivojac P: Influence of sequencechanges and environment on intrinsically disordered proteins.PLoS Comput Biol 2009, 5:e1000497.

18. Schlessinger A, Liu J, Rost B: Natively unstructured loops differfrom other loops. PLoS Comput Biol 2007, 3:e140.

19. Rose PW, Beran B, Bi C, Bluhm WF, Dimitropoulos D,Goodsell DS, Prlic A, Quesada M, Quinn GB, Westbrook JD et al.:The RCSB Protein Data Bank: redesigned web site and webservices. Nucleic Acids Res 2010, 39:D392-D401.

20. Schneider M, Lane L, Boutet E, Lieberherr D, Tognolli M, Bougueleret L, Bairoch A: The UniProtKB/Swiss-Prot knowledgebase and its plant proteome annotation program. J Proteomics 2009, 72:567-573.

21. Schlessinger A, Punta M, Yachdav G, Kajan L, Rost B: Improved disorder prediction by combination of orthogonal approaches. PLoS One 2009, 4:e4433.

22. Galea CA, High AA, Obenauer JC, Mishra A, Park CG, Punta M, Schlessinger A, Ma J, Rost B, Slaughter CA et al.: Large-scale analysis of thermostable, mammalian proteins provides insights into the intrinsically disordered proteome. J Proteome Res 2009, 8:211-226.

23. Galea CA, Wang Y, Sivakolundu SG, Kriwacki RW: Regulation of cell division by intrinsically unstructured proteins: intrinsic flexibility, modularity, and signaling conduits. Biochemistry 2008, 47:7598-7609.

24. Csizmok V, Felli IC, Tompa P, Banci L, Bertini I: Structural and dynamic characterization of intrinsically disordered human securin by NMR spectroscopy. J Am Chem Soc 2008, 130:16873-16879.

25. Bracken C, Iakoucheva LM, Romero PR, Dunker AK: Combining prediction, computation and experiment for the characterization of protein disorder. Curr Opin Struct Biol 2004, 14:570-576.

26. Oldfield CJ, Ulrich EL, Cheng Y, Dunker AK, Markley JL: Addressing the intrinsic disorder bottleneck in structural proteomics. Proteins Struct Funct Bioinform 2005, 59:444-453.

27. Rost B: Did evolution leap to create the protein universe? Curr Opin Struct Biol 2002, 12:409-416.

28. Rost B: PHD: predicting one-dimensional protein structure by profile based neural networks. Methods Enzymol 1996, 266:525-539.

29. Rost B, Sander C: Prediction of protein secondary structure at better than 70% accuracy. J Mol Biol 1993, 232:584-599.

30. Buchan DW, Ward SM, Lobley AE, Nugent TC, Bryson K, Jones DT: Protein annotation and modelling servers at University College London. Nucleic Acids Res 2010, 38(Suppl.):W563-W568.

31. Dosztanyi Z, Csizmok V, Tompa P, Simon I: IUPred: web server for the prediction of intrinsically unstructured regions of proteins based on estimated energy content. Bioinformatics 2005, 21:3433-3434.

32. Tompa P, Prilusky J, Silman I, Sussman JL: Structural disorder serves as a weak signal for intracellular protein degradation. Proteins 2008, 71:903-909.

33. Vavouri T, Semple JI, Garcia-Verdugo R, Lehner B: Intrinsic protein disorder and interaction promiscuity are widely associated with dosage sensitivity. Cell 2009, 138:198-208.

34. Romero PR, Zaidi S, Fang YY, Uversky VN, Radivojac P, Oldfield CJ, Cortese MS, Sickmeier M, Legall T, Obradovic Z et al.: Alternative splicing in concert with protein intrinsic disorder enables increased functional diversity in multicellular organisms. Proc Natl Acad Sci U S A 2006, 103:8390-8395.

35. Bromberg Y, Overton J, Vaisse C, Leibel RL, Rost B: In silico mutagenesis: a case study of the melanocortin 4 receptor. FASEB J 2009, 23:3059-3069.

36. Matthews B: Structural and genetic analysis of protein folding and stability. Curr Opin Struct Biol 1993, 3:589-593.

37. Lesk AM: In Protein Architecture: A Practical Approach, vol. 1. Edited by Rickwood D, Hames HD. Oxford/New York/Tokyo: Oxford University Press; 1991.

38. Chothia C, Lesk AM: The relation between the divergence of sequence and structure in proteins. EMBO J 1986, 5:823-826.

39. Sander C, Schneider R: Database of homology-derived structures and the structural meaning of sequence alignment. Proteins Struct Funct Genet 1991, 9:56-68.

40. Rost B: Twilight zone of protein sequence alignments. Protein Eng 1999, 12:85-94.

41. Rost B, O’Donoghue S, Sander C: Midnight Zone of Protein Structure Evolution. Heidelberg: EMBL; 1998.

42. Rost B, Liu J, Nair R, Wrzeszczynski KO, Ofran Y: Automatic prediction of protein function. Cell Mol Life Sci 2003, 60:2637-2650.

43. Fuxreiter M, Tompa P, Simon I: Local structural disorder imparts plasticity on linear motifs. Bioinformatics 2007, 23:950-956.

44. Sugase K, Dyson HJ, Wright PE: Mechanism of coupled folding and binding of an intrinsically disordered protein. Nature 2007, 447:1021-1025.

45. Brown CJ, Johnson AK, Daughdrill GW: Comparing models of evolution for ordered and disordered proteins. Mol Biol Evol 2010, 27:609-621.

46. Hegyi H, Kalmar L, Horvath T, Tompa P: Verification of alternative splicing variants based on domain integrity, truncation length and intrinsic protein disorder. Nucleic Acids Res 2010, 39:1208-1219.

47. Cortese MS, Uversky VN, Dunker AK: Intrinsic disorder in scaffold proteins: getting more from less. Prog Biophys Mol Biol 2008, 98:85-106.

48. Midic U, Oldfield CJ, Dunker AK, Obradovic Z, Uversky VN: Unfoldomics of human genetic diseases: illustrative examples of ordered and intrinsically disordered members of the human diseasome. Protein Pept Lett 2009, 16:1533-1547.

49. Tokuriki N, Tawfik DS: Stability effects of mutations and protein evolvability. Curr Opin Struct Biol 2009, 19:596-604.

50. Mohan A, Oldfield CJ, Radivojac P, Vacic V, Cortese MS, Dunker AK, Uversky VN: Analysis of molecular recognition features (MoRFs). J Mol Biol 2006, 362:1043-1059.

51. Lavelle DT, Pearson WR: Globally, unrelated protein sequences appear random. Bioinformatics 2010, 26:310-318.

52. Forslund K, Sonnhammer EL: Benchmarking homology detection procedures with low complexity filters. Bioinformatics 2009, 25:2500-2505.

53. Radivojac P, Obradovic Z, Brown CJ, Dunker AK: Improving sequence alignments for intrinsically disordered proteins. Pac Symp Biocomput 2002:589-600.

54. Rost B, Sander C: Combining evolutionary information and neural networks to predict protein secondary structure. Proteins Struct Funct Genet 1994, 19:55-72.

55. Peng K, Vucetic S, Radivojac P, Brown CJ, Dunker AK, Obradovic Z: Optimizing long intrinsic disorder predictors with protein evolutionary information. J Bioinform Comput Biol 2005, 3:35-60.

56. Dessailly BH, Nair R, Jaroszewski L, Fajardo JE, Kouranov A, Lee D, Fiser A, Godzik A, Rost B, Orengo C: PSI-2: structural genomics to cover protein domain family space. Structure 2009, 17:869-881.

57. Dessailly BH, Redfern OC, Cuff A, Orengo CA: Exploiting structural classifications for function prediction: towards a domain grammar for protein function. Curr Opin Struct Biol 2009, 19:349-356.

58. Andreeva A, Murzin AG: Structural classification of proteins and structural genomics: new insights into protein folding and evolution. Acta Crystallogr F Struct Biol Cryst Commun 2010, 66:1190-1197.

59. Murzin AG: Biochemistry. Metamorphic proteins. Science 2008, 320:1725-1726.

60. Fleischmann RD, Adams MD, White O, Clayton RA, Kirkness EF, Kerlavage AR, Bult CJ, Tomb J-F, Dougherty BA, Merrick JM et al.: Whole-genome random sequencing and assembly of Haemophilus influenzae Rd. Science 1995, 269:496-512.

61. Liolios K, Mavromatis K, Tavernarakis N, Kyrpides NC: The Genomes On Line Database (GOLD) in 2007: status of genomic and metagenomic projects and their associated metadata. Nucleic Acids Res 2008, 36:D475-D479.

62. Gerstein M, Levitt M: A structural census of the current population of protein sequences. Proc Natl Acad Sci U S A 1997, 94:11911-11916.

63. Liu J, Rost B: Comparing function and structure between entire proteomes. Protein Sci 2001, 10:1970-1979.

64. Liu J, Hegyi H, Acton TB, Montelione GT, Rost B: Automatic target selection for structural genomics on eukaryotes. Proteins Struct Funct Bioinform 2004, 56:188-200.

65. Liu J, Rost B: Domains, motifs, and clusters in the protein universe. Curr Opin Chem Biol 2003, 7:5-11.

66. Pe’er I, Felder CE, Man O, Silman I, Sussman JL, Beckmann JS: Proteomic signatures: amino acid and oligopeptide compositions differentiate among phyla. Proteins 2004, 54:20-40.

67. Petrey D, Fischer M, Honig B: Structural relationships among proteins with different global topologies and their implications for function annotation strategies. Proc Natl Acad Sci U S A 2009, 106:17377-17382.

68. Aravind L, Iyer LM, Koonin EV: Comparative genomics and structural biology of the molecular innovations of eukaryotes. Curr Opin Struct Biol 2006, 16:409-419.

69. Dunker AK, Obradovic Z, Romero P, Garner EC, Brown CJ: Intrinsic protein disorder in complete genomes. Genome Inform Ser Workshop Genome Inform 2000, 11:161-171.

70. Dosztanyi Z, Chen J, Dunker AK, Simon I, Tompa P: Disorder and sequence repeats in hub proteins and their implications for network evolution. J Proteome Res 2006, 5:2985-2995.

71. Dunker AK, Cortese MS, Romero P, Iakoucheva LM, Uversky VN: Flexible nets, the roles of intrinsic disorder in protein interaction networks. FEBS J 2005, 272:5129-5148.

72. Gsponer J, Babu MM: The rules of disorder or why disorder rules. Prog Biophys Mol Biol 2009, 99:94-103.

73. Oldfield CJ, Meng J, Yang JY, Yang MQ, Uversky VN, Dunker AK: Flexible nets: disorder and induced fit in the associations of p53 and 14-3-3 with their partners. BMC Genomics 2008, 9(Suppl. 1):S1.

74. Liu J, Perumal NB, Oldfield CJ, Su EW, Uversky VN, Dunker AK: Intrinsic disorder in transcription factors. Biochemistry 2006, 45:6873-6888.

75. Radivojac P, Vacic V, Haynes C, Cocklin RR, Mohan A, Heyen JW, Goebl MG, Iakoucheva LM: Identification, analysis, and prediction of protein ubiquitination sites. Proteins 2009, 78:365-380.

76. Letunic I, Bork P: Interactive Tree Of Life (iTOL): an online tool for phylogenetic tree display and annotation. Bioinformatics 2007, 23:127-128.

77. Noivirt-Brik O, Prilusky J, Sussman JL: Assessment of disorder predictions in CASP8. Proteins 2009, 77(Suppl. 9):210-216.

78. Robinson-Rechavi M, Alibes A, Godzik A: Contribution of electrostatic interactions, compactness and quaternary structure to protein thermostability: lessons from structural genomics of Thermotoga maritima. J Mol Biol 2006, 356:547-557.

79. Tantos A, Friedrich P, Tompa P: Cold stability of intrinsically disordered proteins. FEBS Lett 2009, 583:465-469.

80. Fukuchi S, Yoshimune K, Wakayama M, Moriguchi M, Nishikawa K: Unique amino acid composition of proteins in halophilic bacteria. J Mol Biol 2003, 327:347-357.

81. Sickmeier M, Hamilton JA, LeGall T, Vacic V, Cortese MS, Tantos A, Szabo B, Tompa P, Chen J, Uversky VN et al.: DisProt: the database of disordered proteins. Nucleic Acids Res 2007, 35:D786-D793.