Available online at www.sciencedirect.com
Protein disorder – a breakthrough invention of evolution?
Avner Schlessinger5, Christian Schaefer1, Esmeralda Vicedo1, Markus Schmidberger1, Marco Punta1,2,3 and Burkhard Rost1,2,3,4
As an operational definition, we refer to regions in proteins that
do not adopt regular three-dimensional structures in isolation,
as disordered regions. An antipode to disorder would be ‘well-
structured’ rather than ‘ordered’. Here, we argue for the
following three hypotheses. Firstly, it is more useful to picture
disorder as a distinct phenomenon in structural biology than as
an extreme example of protein flexibility. Secondly, there are
many very different flavors of protein disorder; nevertheless, it
seems advantageous to portray the universe of all possible
proteins in terms of two main types: well-structured and
disordered. There might be a third type, ‘other’, but so far we
have no positive evidence for this. Thirdly, nature uses protein
disorder as a tool to adapt to different environments. Protein
disorder is evolutionarily conserved and this maintenance of
disorder is highly nontrivial. Increasingly integrating protein
disorder into the toolbox of a living cell was a crucial step in the
evolution from simple bacteria to complex eukaryotes. We
need new advanced computational methods to study this new
milestone in the advance of protein biology.
Addresses
1 TUM, Bioinformatik - i12, Informatik, Boltzmannstrasse 3, 85748 Garching, Germany
2 Institute of Advanced Study (IAS), TUM, Boltzmannstr. 3, 85748 Garching, Germany
3 New York Consortium on Membrane Protein Structure (NYCOMPS), TUM Bioinformatics, Boltzmannstr. 3, 85748 Garching, Germany
4 Department of Biochemistry and Molecular Biophysics, Columbia University, 701 West, 168th Street, New York, NY 10032, USA
5 Department of Bioengineering and Therapeutic Sciences, Department of Pharmaceutical Chemistry, and California Institute for Quantitative Biosciences, University of California, San Francisco, CA, USA
Corresponding author: Rost, Burkhard ([email protected])
Current Opinion in Structural Biology 2011, 21:412–418
This review comes from a themed issue on
Sequences and topology
Edited by Julian Gough and Keith Dunker
Available online 20 April 2011
0959-440X/$ – see front matter
© 2011 Elsevier Ltd. All rights reserved.
DOI 10.1016/j.sbi.2011.03.014
Introduction
Dog eat dogma
The once Central Dogma of Molecular Biology (‘DNA makes
RNA makes protein’) has cracked due to the discovery of
the functional importance of noncoding RNA [1]. The
‘Central Dogma of Genomics that derives from structural
biology’ [2] implies that proteins adopt unique three-
dimensional (3D) structures, and that the intricate detailed
order in these 3D protein structures determines protein
function. Over the last decade, experimental and compu-
tational structural biologists have been accumulating sur-
prising evidence: Every organism seemingly has proteins
that appear not to adopt 3D structures in isolation, that is,
contains disorder. Is it time for the dog to eat the dogma [2]
that sequence determines structure determines function,
as Greg Petsko so poetically phrased it?
Since 3D details can determine function, structures have
evolved to exhibit innate and specific flexibility [3–6].
Functional flexibility spans a wide range in terms of the
time scale and the amount of motion [7]. Is protein
disorder just an extreme example of flexibility, and if
so, would this save the dogma a little longer?
Mixed nuts – no fruits: disorder new principle of protein structure
Disorder is a mixed bag
Here, we refer to disordered regions as those regions in
proteins that, when in isolation (i.e., not bound to other
molecules), do not fold into a well-defined 3D structure but
rather sample a large portion of their available confor-
mational space. Put differently, if we could observe dis-
ordered regions in isolation at two different times, we
would see two grossly different structures [8–10]. This
definition covers local flexible loops, extended domains,
molten globule domains, and folded domains with flexible
linkers [10]. These objects have been called the flavors of disorder [11–13]. The length of a disordered region matters:
data suggest that regions spanning just a few (<10) con-
secutive residues are ‘just’ loops in well-structured
proteins, while very long regions behave differently [14–18]. There is no sound way to define a particular value as
the threshold to distinguish between short and long.
Clearly, longer than 30 is long, and shorter than 10 is short.
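To make the length thresholds concrete, the following minimal Python sketch splits a per-residue disorder annotation into consecutive disordered regions and labels each by length. The string encoding ('D' for disordered, 'O' for ordered), the function names, and the 'intermediate' label for lengths between 10 and 30 are assumptions made for illustration; they are not taken from any of the cited prediction methods.

```python
from itertools import groupby

SHORT_MAX = 10  # regions shorter than this count as short (value from the text)
LONG_MIN = 30   # regions longer than this count as long (value from the text)


def disordered_regions(states):
    """Return (start, length) for each run of 'D' in a per-residue string.

    `states` is a hypothetical encoding: 'D' = disordered, 'O' = ordered.
    """
    regions = []
    pos = 0
    for symbol, run in groupby(states):
        n = len(list(run))
        if symbol == "D":
            regions.append((pos, n))
        pos += n
    return regions


def classify(length):
    """Label a region by length; the middle range has no agreed threshold."""
    if length < SHORT_MAX:
        return "short"
    if length > LONG_MIN:
        return "long"
    return "intermediate"  # assumption: an explicit label for the gray zone


# Toy annotation: runs of 4, 35 and 20 disordered residues.
example = "O" * 5 + "D" * 4 + "O" * 10 + "D" * 35 + "O" * 3 + "D" * 20
labels = [(start, n, classify(n)) for start, n in disordered_regions(example)]
print(labels)
```

Keeping the gray zone as a separate label reflects the statement above that no particular cutoff between short and long can be justified.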
In the following, we try to make three points. First, long
disordered regions clearly differ from well-structured
regions. Second, long disorder appears difficult to
maintain against random mutations; hence, the
levels of disorder observed in native proteins suggest a
functional role of long disordered regions. Third, disorder is
unequally distributed on the tree of life and seems to be
correlated in a nontrivial way to organism complexity.
Well-structured different than disorder; both not random
PDB, the Protein Data Bank [19], may represent the
universe of all proteins that adopt regular well-ordered
Figure 1
[Bar chart. Rows: loops of 30–40 residues, loops of 50–80 residues, disorder of 30–40 residues, disorder of 50–80 residues. Bar segments: only disorder, disorder + loop, only loop. x-axis: percentage of regions (0–100).]
Disorder is not loops only. In a selection of a few entirely sequenced
organisms (Acaryochloris marina, Drosophila melanogaster, Hemiselmis
andersenii, Homo sapiens, Saccharomyces cerevisiae and
Methanococcus aeolicus), we find all regions 30–40 and all 50–80
residues long that are predicted to only contain loops by PROFsec
[28,29,54] (gray), and all that are predicted to only contain NORSnet [18]
disorder (pink). Then we monitor the overlap between disorder and loop
(gray and pink stripes). Although most of the loopy disorder is in fact
predicted as loop (93% for regions of 30–40 residues, and 72% for
regions of 50–80 residues), only half of the long loops are actually
disordered. Thus, even the loopy type of disorder differs significantly
from nonregular secondary structure. This is particularly remarkable
because NORSnet was trained exclusively on loops predicted by
PROFsec in the first place.
3D structures and SWISS-PROT [20] the larger universe
of proteins to which we are able to attach some kind of
functional label via our current experimental techniques.
Those two populations (proteins in PDB and SWISS-
PROT) differ [14]. Protein disorder is one of the three
major aspects explaining some of that difference [14].
The other two are membrane regions and coiled-coil
regions. In eukaryotes, disorder occupies a larger fraction
of the sequence space than membrane spanning regions
[14,21]; in prokaryotes, encompassing both bacteria and
archaea, they are about on par [21]. In all three super-
kingdoms both membrane and disordered regions contain
2–3 times more residues than coiled-coils [14,21].
Usually, structural biologists perceive coiled-coils and
membrane regions as a part of the universe of well-
structured proteins, and long flexible loops as observed
in, for example, domain linkers or antibodies are also
considered as an intrinsic aspect of well-defined struc-
tures. Should we then consider disorder analogously, that
is, as one aspect of well-structured proteins?
Computational biologists can predict and experimental
biologists can describe some flavors of protein disorder
[8,13,22–26]. Analyzing the populations of proteins with
disorder identified experimentally and/or computationally
reveals that disordered regions differ from well-structured
regions and that both differ from polypeptides encoded by
sequences assembled randomly in silico [14,15].
Disorder not the same as nonregular secondary
structure
All known well-structured proteins have about half of all
their residues in either helices or strands, that is, in regular
secondary structure [27–29]. Many disordered regions are
predicted not to contain regular secondary structure; we
refer to those as the ‘loopy flavor’ of disorder [14]. Thus, we
can identify some disordered regions by searching for very
long regions with no regular secondary structure (NORS
regions). NORSnet is a machine-learning based method
that identifies such loopy regions [18]; disorder predicted by
NORSnet has a high overlap with disorder predicted by
two methods that have been developed using very differ-
ent principles and data sets, namely DISOPRED2 [30] and
IUPred [31], explicitly using experimentally characterized
disordered regions for their optimization. Surprisingly,
NORSnet can distinguish between long (≥30) disordered
regions and those long regions that resemble shorter loops
in well-structured proteins: not all loopy regions of, for
example, 30–40 residues are predicted to have loops, and
vice versa: not all 30–40 residue loops are predicted to be
loopy (Figure 1).
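The asymmetry described above (most loopy disorder is predicted as loop, but not all loops are disordered) can be illustrated with a residue-level overlap between two annotations. This is a simplified sketch: the analysis in Figure 1 works on whole regions of fixed length ranges, whereas the code below counts single residues. The encodings ('H', 'E', 'L' for helix, strand, loop; 'D', 'O' for disordered, ordered) and the function name are assumptions for illustration and do not reproduce the output formats of PROFsec or NORSnet.

```python
def overlap_fractions(secondary, disorder):
    """Return (fraction of disordered residues in loops,
               fraction of loop residues that are disordered).

    Both inputs are hypothetical per-residue strings of equal length.
    """
    if len(secondary) != len(disorder):
        raise ValueError("annotations must have equal length")
    n_loop = sum(1 for s in secondary if s == "L")
    n_dis = sum(1 for d in disorder if d == "D")
    n_both = sum(1 for s, d in zip(secondary, disorder) if s == "L" and d == "D")
    frac_dis_in_loop = n_both / n_dis if n_dis else 0.0
    frac_loop_in_dis = n_both / n_loop if n_loop else 0.0
    return frac_dis_in_loop, frac_loop_in_dis


# Toy example: 8 loop residues, 4 of which are also disordered.
sec = "HHHHLLLLLLLLEEEE"
dis = "OOOOOODDDDOOOOOO"
print(overlap_fractions(sec, dis))
```

Reporting both directions separately matters because the two fractions answer different questions and need not be similar.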
Maintaining disorder requires an extra effort
There are different aspects of ‘maintenance of disorder’
that are nontrivial. Two important aspects of the challenge
for cells to evolve protein disorder that we do not address
here are that preventing aggregation and digestion is
demanding [32], and that over-expression of disordered
proteins can be damaging [10,33]. Here, we address
another important aspect: the difficulty of evolving and
maintaining disorder against the odds of random
mutations.
One prevailing view — supported by careful analysis and
data — portrays disorder as a means to become ‘immune’
against mutations [13,34]. The argument combines two
assumptions proceeding as follows. First, mutations in
protein sequences are more often problematic for an
organism than beneficial [35–37]. Second, the details in
well-ordered 3D structures make these structures
susceptible to mutations [37–41]. In fact, the profiles of which
amino acids can be changed at which positions against
which others contain crucial information that can be used
to predict aspects of protein structure and function [27–29,42]. It might therefore be advantageous to invent regions that function and are less sensitive to mutation.
Could protein disorder be a means toward this end? Some
data supports this view [34,43–50].
Recent observations, however, sharpen this view: many
long disordered regions are not robust against random
Figure 2
[Two bar-chart panels for four sequence sets: DisProt-random, DisProt-native, PDB-native, PDB-random. Left panel: predicted secondary structure (loop, strand, helix). Right panel: predicted disorder, short (<10) and long (>30), and 1 - predicted disorder. x-axes: content in protein (0–1).]
Helix/strand conserved, disorder not. We randomly synthesize protein sequences in silico. In one experiment we use the frequency of amino acids
observed in the database of known protein structures (PDB [19]), in the other the frequency observed in the database of proteins with disordered
regions (DisProt [81]). We then predict secondary structure (PROFsec [28,29,54]) and disorder (IUPred [31]) for these random proteins as well as for
representative subsets of native proteins from those two databases. Firstly, the resulting four sets of predictions are surprisingly similar in their
secondary structure (left panel: red: helix, blue: strand, gray: loop). Secondly, the four are also similar in their short disorder (right panel: green bars). In
contrast, the difference in terms of long disorder is very significant (pink). These findings suggest that short disorder resembles loops, while long
disorder behaves very differently. The random mutagenesis published earlier [15] shows the high sensitivity of long disorder to random mutations.
mutations [18]. In particular, long disordered regions
seem to disappear upon random in silico mutagenesis
and are predicted less in sequences assembled randomly
in silico than naturally observed (Figure 2). In contrast,
regular secondary structure is robust against mutation
[15,51] (Figure 2). This suggests that a specific effort is
required for a cell to evolve and maintain long protein
disorder against the drift from random mutations [15] and
hence those regions should not be seen as a cell's way to
immunize against mutations.
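The two in silico experiments mentioned here (assembling random sequences from a background amino acid composition, and random mutagenesis of a sequence) can be sketched as follows. This is an illustration under stated assumptions, not the protocol of [15]: the function names, the two toy 'native' sequences, and the choice to draw substitutions from the same background composition are all hypothetical. Scoring the resulting sequences would additionally require running a secondary structure or disorder predictor, which is not shown.

```python
import random
from collections import Counter


def composition(sequences):
    """Relative amino acid frequencies pooled over a set of sequences."""
    counts = Counter("".join(sequences))
    total = sum(counts.values())
    return {aa: n / total for aa, n in counts.items()}


def random_sequence(freqs, length, rng):
    """Assemble a sequence by sampling residues independently from freqs."""
    residues = sorted(freqs)
    weights = [freqs[aa] for aa in residues]
    return "".join(rng.choices(residues, weights=weights, k=length))


def mutate(sequence, n_mutations, freqs, rng):
    """Replace n_mutations distinct positions with residues drawn from freqs.

    A drawn residue may equal the original one, so the number of visible
    changes can be smaller than n_mutations.
    """
    seq = list(sequence)
    residues = sorted(freqs)
    weights = [freqs[aa] for aa in residues]
    for pos in rng.sample(range(len(seq)), n_mutations):
        seq[pos] = rng.choices(residues, weights=weights, k=1)[0]
    return "".join(seq)


rng = random.Random(0)                   # fixed seed for reproducibility
native = ["MKTAYIAKQR", "GSSPEEDKKR"]    # toy sequences, not real database entries
freqs = composition(native)
synthetic = random_sequence(freqs, 50, rng)
mutant = mutate(synthetic, 5, freqs, rng)
print(synthetic)
print(mutant)
```

In a real analysis, `native` would be replaced by a representative set from PDB or DisProt, and the predicted content of long disorder would be compared between native, random, and mutated sequences.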
Disorder seems evolutionarily conserved
All methods that compare proteins (e.g., alignment
methods) were developed using well-structured proteins,
and typically ignored regions of low complexity. Because
disordered regions can be abundant in low-complexity
segments or adopt very different structures when folded,
they are difficult to align [52]. Conclusions from aligning
disordered regions should, therefore, better be viewed
with many grains of salt [53]. Despite this caveat we
observed that very long loopy disorder (NORS-type)
regions appeared to be more conserved than their flanking
regions [14]. In contrast, loops in well-structured proteins
are — on average — less well conserved than regular
secondary structure [54], that is, if loopy disorder were
similar to loops from well-structured proteins, we would
expect a strong trend in the opposite direction. Further-
more, disorder predictors that use sequence profiles as
input perform significantly better than those that use
sequence information alone, suggesting that sequence
conservation is an important feature for prediction
[21,55]. Overall, disorder seems more conserved than
we might expect from a tool that renders proteins immune
to change.
Does disorder need to be conserved for functional
reasons? We have yet to discover examples of regions
in proteins that are conserved in evolution without any
good functional reason [56–59]. Thus, the conservation of
disorder underlines the importance of this new principle
of protein structure for function. Indeed, we know many
examples of functional disorder and we know that the
more functional a region, the more it is conserved.
Unfortunately, we cannot conclusively answer the question of
the extent to which disorder needs to be conserved in order to
preserve function in any way other than for a tiny set of
examples that may not be representative [6,43–45,50].
Protein disorder – a major tool for evolution?
Eukaryotes more disordered than prokaryotes
In 1995, we got the first glimpse of an entirely sequenced
organism [60]; others from all super-kingdoms of
life have followed since [61]. Several attempts have been
made to find simple protein features that distinguish
super-kingdoms [62,63]. Surprisingly, eukaryotic and pro-
karyotic proteins resemble each other in terms of number
of domains, protein length, and amino acid composition
(with some caveats [63–66]). In contrast to early hypoth-
eses, the fraction of membrane proteins is fairly similar
(put differently, it varies more within each super-king-
dom than it differs between them). Some main themes in
cores of 3D shapes [67] are kingdom-specific [68]. But our
set of structures may still be importantly incomplete and
most relations that exist only on the level of 3D structure
cannot be detected from sequence alone [41], that is, the
kingdom-specificity of folds may turn out to be invalid.
Can we identify disorder as one of the major tools through
which evolution increases complexity and enables adap-
tation?
Figure 3
[Bar chart for Eukaryota, Bacteria, Archaea and Viruses, with one bar per prediction method (MD, IUPred, VSL2). x-axes: percentage of proteins with ≥1 disorder region of length ≥30 consecutive residues (left panel, 0–70) and ≥80 consecutive residues (right panel, 0–70).]
Eukaryotes have much more disorder than prokaryotes. Prediction of
disorder in all the 1848 organisms with entirely sequenced proteomes
(~4 million proteins, UNIPROT, December 2010). For each protein, we
predicted disorder regions with three publicly available methods, namely
MD [21], IUPred [31], and VSL2 [21,55]. We removed all regions with
short disorder. Here we show results for two thresholds for ‘short’: the
left panel shows results after removing regions with fewer than 30
consecutive residues in disorder, the right panel those with fewer than
80 residues. The x-axes give the percentage of proteins with at least one
disorder region according to those two length-thresholds. The
percentage is compiled with respect to all proteins from all fully
sequenced organisms in one of the three super-kingdoms and in viruses.
Although the three methods disagree in the percentage of proteins with
disorder predicted, they largely agree in the prediction that eukaryotes
have many more proteins with disorder, for example, MD/IUPred
estimate about 2–3% of all proteins in archaea and bacteria to have at
least one disorder region longer than 80 residues (right panel), while both
estimate about 17–19% of all proteins in eukaryotes to have regions of
such extreme length.
Disorder associated with complexity of an organism
In the following we refer to disorder composition as the
percentage of proteins in an entirely sequenced proteome
that contain at least one long region of disorder. As
experimental information remains insufficient, this value
can currently only be obtained in silico. For a variety of
definitions for ‘long region of disorder’ with respect to
length and prediction method, the composition of dis-
order clearly correlates with the super-kingdom (bacteria,
archaea, and eukaryotes, Figure 3) [14,69]. Specifically,
computational predictions noted some overabundance of
disorder in protein interaction hubs [70–73] (or more
specifically in date hubs [18]), transcription [13,14,74],
translation [13,14], signal transduction [13,14], and more
recently ubiquitination [75]. All these observations
support the oversimplified view of disordered regions as a
molecular tool to increase the complexity of a system or
organism, because ultimately, the complexity of
regulation is one of the major differences between bacteria and
eukarya, and it increases along the evolutionary path from lower
eukaryotes to mammals.
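The disorder composition defined at the start of this section (the percentage of proteins in a proteome with at least one long disordered region) is straightforward to compute once per-residue predictions are available. The sketch below assumes a hypothetical encoding in which each protein maps to a string of 'D' (disordered) and 'O' (ordered); the function names and the toy proteome are illustrative only, and the two thresholds (30 and 80 consecutive residues) are the ones used in Figure 3.

```python
import re


def has_long_disorder(states, min_len):
    """True if `states` contains a run of at least min_len consecutive 'D'."""
    return re.search("D{%d,}" % min_len, states) is not None


def disorder_composition(proteome, min_len=30):
    """Percentage of proteins with at least one disordered region >= min_len.

    `proteome` maps a protein identifier to its per-residue prediction string.
    """
    if not proteome:
        raise ValueError("empty proteome")
    hits = sum(1 for states in proteome.values()
               if has_long_disorder(states, min_len))
    return 100.0 * hits / len(proteome)


# Toy proteome with four hypothetical proteins.
toy = {
    "p1": "O" * 100,                        # no disorder
    "p2": "O" * 10 + "D" * 35 + "O" * 10,   # one region of 35 residues
    "p3": "D" * 85,                         # one region of 85 residues
    "p4": "O" * 5 + "D" * 12 + "O" * 5,     # only a short region
}
print(disorder_composition(toy, 30), disorder_composition(toy, 80))
```

Because the value depends on both the length threshold and the prediction method, comparisons between proteomes are only meaningful when both are held fixed.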
Can we view protein disorder, the new structural prin-
ciple that is costly to maintain and evolve, as a molecular
mechanism that brings about complexity? Many aspects
of disorder suggest that this might be so, but we need to
dig much deeper to accumulate evidence. One particular
support of the image of disorder as a molecular tool to
build complexity comes from the analysis of disorder in
the context of diverse environments.
Protein disorder related to environment
Differences between organisms from different habitats are imprinted upon disorder
We can study the relation between evolution and disorder
by analyzing disorder in prokaryotes (many diverse
organisms are sequenced, evolutionary relations can be
quantified [76], and disorder predictions are accurate
[77]). Our preliminary work suggests that differences
between organisms from distinct habitats are imprinted
upon the fraction of proteins with long disordered regions.
First, proteomes of thermophiles are well-structured,
which might explain the high success rate of these for
structure determination [78]. Second, proteomes of psy-
chrophiles (i.e., organisms thriving in low temperatures)
are disordered. Tompa et al. picked up on this finding
showing experimentally that some disordered regions can
function in the cold better than well-structured proteins
[79]. Third, proteomes of archaean and bacterial halo-
philes (i.e., organisms that survive high salt conditions)
are disordered, in agreement with the notion that acidic
residues — which ‘promote’ disorder [10] — are import-
ant for proteins to be functional in salty conditions [80].
Fourth, organisms with high tolerance for mutations tend
to be rich in disorder. For instance, the fraction of
disorder in the proteome of Deinococcus radiodurans, a
bacterium that can survive high doses of radiation, is very
high. The malleability of its proteome might enable this
bacterium to tolerate structural modifications resulting
from frequent mutations thereby escaping radiation
damage. If so, this might be an example of how, despite
the sensitivity of long disorder to mutations, disorder
might be a buffer for fateful mutations. The finding that
the majority of these correlations between environment
and disorder are independent of the phylogenetic branch
on the tree of life (e.g., animal kingdom) increases our
confidence that they did not occur by chance.
The finding that a microscopic feature as coarse-grained
as the overall content in proteins with disordered regions
correlates with such a complex macroscopic variable as
the environment remains surprising and will have to be
investigated more. Many questions remain open. Are
these molecular adaptations incremental or do we have
evidence for some sort of leap or transition? Is it true that a
little more or less flexibility distributed throughout the
protein is the gradual response to extreme habitats, or do
we have evidence for some major step in local regions that
may increase the fitness of extremophiles in leaps?
Advanced alignment methods capturing distant sequence
relationships will enable us to accurately compare ortho-
logous proteins and pathways among genomes; disorder
prediction methods focusing on different flavors of dis-
order will enable us to describe the functional differences
and commonalities among these pathways, thus hopefully
providing answers for these questions.
Conclusions
In this perspective, we argue for three major views; none
of those can be established, but we argue that accepting
those for the time being is beneficial. Firstly, well-struc-
tured and disordered regions occupy different regions in
the space of all sequences and both differ from random.
Put differently, protein disorder is something new, not an
extreme aspect of flexible regular structure. Secondly,
although the term protein disorder describes a very mixed
bag of features, the content of this bag shares the label
disorder. Thirdly, protein disorder is conserved in evol-
ution and we can picture it as one important tool for
evolution to help in advancing from simple bacteria to
more complex eukaryotes. Ultimately, we need better
experimental and computational methods to better
understand the role of this new phenomenon in biology.
Is disorder THE major tool that simplifies the increase in
complexity and adaptation to the environment? If the
data suggesting this view are confirmed, protein
disorder may turn out to be THE most important break-
through invention of evolution.
Acknowledgements
Thanks to Laszlo Kajan, Tim Karl, and Marlena Drabik (TUM), Julian Gough (Univ. Bristol) and Keith Dunker (Indiana Univ.), and Joel Sussman (Weizmann) for their important support; to the anonymous reviewer for improving this paper. Our work was supported by the Alexander von Humboldt Foundation, the TUM Institute for Advanced Study, funded by the German Excellence, and the following NIH grants: R01-LM07329, U54-GM75026-01, NIH F32-GM088991. Last, not least, thanks to all those who deposit their experimental data in public databases, and to those who maintain these databases.
References
1. Mattick JS: Deconstructing the dogma: a new view of the evolution and genetic programming of complex organisms. Ann N Y Acad Sci 2009, 1178:29-46.
2. Petsko GA: Dog eat dogma. Genome Biol 2000, 1:1002.
3. Noble M, Blanchard JS: Catalysis regulation. Curr Opin Struct Biol 2009, 19:641-642.
4. Sippl MJ: Fold space unlimited. Curr Opin Struct Biol 2009, 19:312-320.
5. Sadowski MI, Jones DT: The sequence–structure relationship and protein function prediction. Curr Opin Struct Biol 2009, 19:357-362.
6. Tokuriki N, Oldfield CJ, Uversky VN, Berezovsky IN, Tawfik DS: Do viral proteins possess unique biophysical features? Trends Biochem Sci 2009, 34:53-59.
7. Palmer AG 3rd, Massi F: Characterization of the dynamics of biomacromolecules using rotating-frame spin relaxation NMR spectroscopy. Chem Rev 2006, 106:1700-1719.
8. Radivojac P, Iakoucheva LM, Oldfield CJ, Obradovic Z, Uversky VN, Dunker AK: Intrinsic disorder and functional proteomics. Biophys J 2007, 92:1439-1456.
9. Dosztanyi Z, Tompa P: Prediction of protein disorder. Methods Mol Biol 2008, 426:103-115.
10. Dyson HJ, Wright PE: Intrinsically unstructured proteins and their functions. Nat Rev Mol Cell Biol 2005, 6:197-208.
11. Romero P, Obradovic Z, Kissinger CR, Villafranca JE, Garner E, Guilliot S, Dunker AK: Thousands of proteins likely to have long disordered regions. Pac Symp Biocomput 1998:437-448.
12. Vucetic S, Brown CJ, Dunker AK, Obradovic Z: Flavors of protein disorder. Proteins 2003, 52:573-584.
13. Dunker AK, Silman I, Uversky VN, Sussman JL: Function and structure of inherently disordered proteins. Curr Opin Struct Biol 2008, 18:756-764.
14. Liu J, Tan H, Rost B: Loopy proteins appear conserved in evolution. J Mol Biol 2002, 322:53-64.
15. Schaefer C, Schlessinger A, Rost B: Protein secondary structure appears to be robust under in silico evolution while protein disorder appears not to be. Bioinformatics 2010, 26:625-631.
16. Bordoli L, Kiefer F, Schwede T: Assessment of disorder predictions in CASP7. Proteins 2007, 69(Suppl. 8):129-136.
17. Mohan A, Uversky VN, Radivojac P: Influence of sequence changes and environment on intrinsically disordered proteins. PLoS Comput Biol 2009, 5:e1000497.
18. Schlessinger A, Liu J, Rost B: Natively unstructured loops differ from other loops. PLoS Comput Biol 2007, 3:e140.
19. Rose PW, Beran B, Bi C, Bluhm WF, Dimitropoulos D, Goodsell DS, Prlic A, Quesada M, Quinn GB, Westbrook JD et al.: The RCSB Protein Data Bank: redesigned web site and web services. Nucleic Acids Res 2010, 39:D392-D401.
20. Schneider M, Lane L, Boutet E, Lieberherr D, Tognolli M, Bougueleret L, Bairoch A: The UniProtKB/Swiss-Prot knowledgebase and its plant proteome annotation program. J Proteomics 2009, 72:567-573.
21. Schlessinger A, Punta M, Yachdav G, Kajan L, Rost B: Improved disorder prediction by combination of orthogonal approaches. PLoS One 2009, 4:e4433.
22. Galea CA, High AA, Obenauer JC, Mishra A, Park CG, Punta M, Schlessinger A, Ma J, Rost B, Slaughter CA et al.: Large-scale analysis of thermostable, mammalian proteins provides insights into the intrinsically disordered proteome. J Proteome Res 2009, 8:211-226.
23. Galea CA, Wang Y, Sivakolundu SG, Kriwacki RW: Regulation of cell division by intrinsically unstructured proteins: intrinsic flexibility, modularity, and signaling conduits. Biochemistry 2008, 47:7598-7609.
24. Csizmok V, Felli IC, Tompa P, Banci L, Bertini I: Structural and dynamic characterization of intrinsically disordered human securin by NMR spectroscopy. J Am Chem Soc 2008, 130:16873-16879.
25. Bracken C, Iakoucheva LM, Romero PR, Dunker AK: Combining prediction, computation and experiment for the characterization of protein disorder. Curr Opin Struct Biol 2004, 14:570-576.
26. Oldfield CJ, Ulrich EL, Cheng Y, Dunker AK, Markley JL: Addressing the intrinsic disorder bottleneck in structural proteomics. Proteins Struct Funct Bioinform 2005, 59:444-453.
27. Rost B: Did evolution leap to create the protein universe? Curr Opin Struct Biol 2002, 12:409-416.
28. Rost B: PHD: predicting one-dimensional protein structure by profile based neural networks. Methods Enzymol 1996, 266:525-539.
29. Rost B, Sander C: Prediction of protein secondary structure at better than 70% accuracy. J Mol Biol 1993, 232:584-599.
30. Buchan DW, Ward SM, Lobley AE, Nugent TC, Bryson K, Jones DT: Protein annotation and modelling servers at University College London. Nucleic Acids Res 2010, 38(Suppl.):W563-W568.
31. Dosztanyi Z, Csizmok V, Tompa P, Simon I: IUPred: web server for the prediction of intrinsically unstructured regions of proteins based on estimated energy content. Bioinformatics 2005, 21:3433-3434.
32. Tompa P, Prilusky J, Silman I, Sussman JL: Structural disorder serves as a weak signal for intracellular protein degradation. Proteins 2008, 71:903-909.
33. Vavouri T, Semple JI, Garcia-Verdugo R, Lehner B: Intrinsic protein disorder and interaction promiscuity are widely associated with dosage sensitivity. Cell 2009, 138:198-208.
34. Romero PR, Zaidi S, Fang YY, Uversky VN, Radivojac P, Oldfield CJ, Cortese MS, Sickmeier M, Legall T, Obradovic Z et al.: Alternative splicing in concert with protein intrinsic disorder enables increased functional diversity in multicellular organisms. Proc Natl Acad Sci U S A 2006, 103:8390-8395.
35. Bromberg Y, Overton J, Vaisse C, Leibel RL, Rost B: In silico mutagenesis: a case study of the melanocortin 4 receptor. FASEB J 2009, 23:3059-3069.
36. Matthews B: Structural and genetic analysis of protein folding and stability. Curr Opin Struct Biol 1993, 3:589-593.
37. Lesk AM: In Protein Architecture – A Practical Approach, vol. 1. Edited by Rickwood D, Hames HD. Oxford/New York/Tokyo: Oxford University Press; 1991.
38. Chothia C, Lesk AM: The relation between the divergence of sequence and structure in proteins. EMBO J 1986, 5:823-826.
39. Sander C, Schneider R: Database of homology-derived structures and the structural meaning of sequence alignment. Proteins Struct Funct Genet 1991, 9:56-68.
40. Rost B: Twilight zone of protein sequence alignments. Protein Eng 1999, 12:85-94.
41. Rost B, O’Donoghue S, Sander C: Midnight Zone of Protein Structure Evolution. Heidelberg: EMBL; 1998.
42. Rost B, Liu J, Nair R, Wrzeszczynski KO, Ofran Y: Automatic prediction of protein function. Cell Mol Life Sci 2003, 60:2637-2650.
43. Fuxreiter M, Tompa P, Simon I: Local structural disorder imparts plasticity on linear motifs. Bioinformatics 2007, 23:950-956.
44. Sugase K, Dyson HJ, Wright PE: Mechanism of coupled folding and binding of an intrinsically disordered protein. Nature 2007, 447:1021-1025.
45. Brown CJ, Johnson AK, Daughdrill GW: Comparing models of evolution for ordered and disordered proteins. Mol Biol Evol 2010, 27:609-621.
46. Hegyi H, Kalmar L, Horvath T, Tompa P: Verification of alternative splicing variants based on domain integrity, truncation length and intrinsic protein disorder. Nucleic Acids Res 2010, 39:1208-1219.
47. Cortese MS, Uversky VN, Dunker AK: Intrinsic disorder in scaffold proteins: getting more from less. Prog Biophys Mol Biol 2008, 98:85-106.
48. Midic U, Oldfield CJ, Dunker AK, Obradovic Z, Uversky VN: Unfoldomics of human genetic diseases: illustrative examples of ordered and intrinsically disordered members of the human diseasome. Protein Pept Lett 2009, 16:1533-1547.
49. Tokuriki N, Tawfik DS: Stability effects of mutations and protein evolvability. Curr Opin Struct Biol 2009, 19:596-604.
50. Mohan A, Oldfield CJ, Radivojac P, Vacic V, Cortese MS, Dunker AK, Uversky VN: Analysis of molecular recognition features (MoRFs). J Mol Biol 2006, 362:1043-1059.
51. Lavelle DT, Pearson WR: Globally, unrelated protein sequences appear random. Bioinformatics 2010, 26:310-318.
52. Forslund K, Sonnhammer EL: Benchmarking homology detection procedures with low complexity filters. Bioinformatics 2009, 25:2500-2505.
53. Radivojac P, Obradovic Z, Brown CJ, Dunker AK: Improving sequence alignments for intrinsically disordered proteins. Pac Symp Biocomput 2002:589-600.
54. Rost B, Sander C: Combining evolutionary information and neural networks to predict protein secondary structure. Proteins Struct Funct Genet 1994, 19:55-72.
55. Peng K, Vucetic S, Radivojac P, Brown CJ, Dunker AK, Obradovic Z: Optimizing long intrinsic disorder predictors with protein evolutionary information. J Bioinform Comput Biol 2005, 3:35-60.
56. Dessailly BH, Nair R, Jaroszewski L, Fajardo JE, Kouranov A, Lee D, Fiser A, Godzik A, Rost B, Orengo C: PSI-2: structural genomics to cover protein domain family space. Structure 2009, 17:869-881.
57. Dessailly BH, Redfern OC, Cuff A, Orengo CA: Exploiting structural classifications for function prediction: towards a domain grammar for protein function. Curr Opin Struct Biol 2009, 19:349-356.
58. Andreeva A, Murzin AG: Structural classification of proteins and structural genomics: new insights into protein folding and evolution. Acta Crystallogr F Struct Biol Cryst Commun 2010, 66:1190-1197.
59. Murzin AG: Biochemistry. Metamorphic proteins. Science 2008, 320:1725-1726.
60. Fleischmann RD, Adams MD, White O, Clayton RA, Kirkness EF, Kerlavage AR, Bult CJ, Tomb J-F, Dougherty BA, Merrick JM et al.: Whole-genome random sequencing and assembly of Haemophilus influenzae Rd. Science 1995, 269:496-512.
61. Liolios K, Mavromatis K, Tavernarakis N, Kyrpides NC: The Genomes On Line Database (GOLD) in 2007: status of genomic and metagenomic projects and their associated metadata. Nucleic Acids Res 2008, 36:D475-D479.
62. Gerstein M, Levitt M: A structural census of the current population of protein sequences. Proc Natl Acad Sci U S A 1997, 94:11911-11916.
63. Liu J, Rost B: Comparing function and structure between entire proteomes. Protein Sci 2001, 10:1970-1979.
64. Liu J, Hegyi H, Acton TB, Montelione GT, Rost B: Automatic target selection for structural genomics on eukaryotes. Proteins Struct Funct Bioinform 2004, 56:188-200.
65. Liu J, Rost B: Domains, motifs, and clusters in the protein universe. Curr Opin Chem Biol 2003, 7:5-11.
66. Pe’er I, Felder CE, Man O, Silman I, Sussman JL, Beckmann JS: Proteomic signatures: amino acid and oligopeptide compositions differentiate among phyla. Proteins 2004, 54:20-40.
67. Petrey D, Fischer M, Honig B: Structural relationships among proteins with different global topologies and their implications for function annotation strategies. Proc Natl Acad Sci U S A 2009, 106:17377-17382.
68. Aravind L, Iyer LM, Koonin EV: Comparative genomics and structural biology of the molecular innovations of eukaryotes. Curr Opin Struct Biol 2006, 16:409-419.
69. Dunker AK, Obradovic Z, Romero P, Garner EC, Brown CJ: Intrinsic protein disorder in complete genomes. Genome Inform Ser Workshop Genome Inform 2000, 11:161-171.
70. Dosztanyi Z, Chen J, Dunker AK, Simon I, Tompa P: Disorder and sequence repeats in hub proteins and their implications for network evolution. J Proteome Res 2006, 5:2985-2995.
71. Dunker AK, Cortese MS, Romero P, Iakoucheva LM, Uversky VN: Flexible nets, the roles of intrinsic disorder in protein interaction networks. FEBS J 2005, 272:5129-5148.
72. Gsponer J, Babu MM: The rules of disorder or why disorder rules. Prog Biophys Mol Biol 2009, 99:94-103.
73. Oldfield CJ, Meng J, Yang JY, Yang MQ, Uversky VN, Dunker AK: Flexible nets: disorder and induced fit in the associations of p53 and 14-3-3 with their partners. BMC Genomics 2008, 9(Suppl. 1):S1.
74. Liu J, Perumal NB, Oldfield CJ, Su EW, Uversky VN, Dunker AK: Intrinsic disorder in transcription factors. Biochemistry 2006, 45:6873-6888.
Current Opinion in Structural Biology 2011, 21:412–418
75. Radivojac P, Vacic V, Haynes C, Cocklin RR, Mohan A, Heyen JW, Goebl MG, Iakoucheva LM: Identification, analysis, and prediction of protein ubiquitination sites. Proteins 2009, 78:365-380.
76. Letunic I, Bork P: Interactive Tree Of Life (iTOL): an online tool for phylogenetic tree display and annotation. Bioinformatics 2007, 23:127-128.
77. Noivirt-Brik O, Prilusky J, Sussman JL: Assessment of disorder predictions in CASP8. Proteins 2009, 77(Suppl. 9):210-216.
78. Robinson-Rechavi M, Alibes A, Godzik A: Contribution of electrostatic interactions, compactness and quaternary structure to protein thermostability: lessons from structural genomics of Thermotoga maritima. J Mol Biol 2006, 356:547-557.
79. Tantos A, Friedrich P, Tompa P: Cold stability of intrinsically disordered proteins. FEBS Lett 2009, 583:465-469.
80. Fukuchi S, Yoshimune K, Wakayama M, Moriguchi M, Nishikawa K: Unique amino acid composition of proteins in halophilic bacteria. J Mol Biol 2003, 327:347-357.
81. Sickmeier M, Hamilton JA, LeGall T, Vacic V, Cortese MS, Tantos A, Szabo B, Tompa P, Chen J, Uversky VN et al.: DisProt: the database of disordered proteins. Nucleic Acids Res 2007, 35:D786-793.