Upload
others
View
1
Download
0
Embed Size (px)
Citation preview
Structure
Review
Protein Modeling: What Happenedto the ‘‘Protein Structure Gap’’?
Torsten Schwede1,2,*1Biozentrum, University of Basel, Klingelbergstrasse 50-70, 4056 Basel, Switzerland2Computational Structural Biology, SIB Swiss Institute of Bioinformatics, Klingelbergstrasse 50-70, 4056 Basel, Switzerland*Correspondence: [email protected]://dx.doi.org/10.1016/j.str.2013.08.007
Computational modeling of three-dimensional macromolecular structures and complexes from theirsequence has been a long-standing vision in structural biology. Over the last 2 decades, a paradigm shifthas occurred: starting from a large ‘‘structure knowledge gap’’ between the huge number of proteinsequences and small number of known structures, today, some form of structural information, either exper-imental or template-based models, is available for the majority of amino acids encoded by common modelorganism genomes. With the scientific focus of interest moving toward larger macromolecular complexesand dynamic networks of interactions, the integration of computational modeling methods with low-resolu-tion experimental techniques allows the study of large and complex molecular machines. One of the openchallenges for computational modeling and prediction techniques is to convey the underlying assumptions,as well as the expected accuracy and structural variability of a specificmodel, which is crucial to understand-ing its limitations.
All macromolecular structures are, to some degree, models with
a variable ratio between experimental data and computational
prediction. Typically, the atomic coordinates of heavy atoms in
very high-resolution crystal structures are overdetermined by
the diffraction data, while methods with a lower ratio of para-
meters to experimental observables increasingly rely on com-
putational tools to construct structural models for the spatial
interpretation of the data (e.g., nuclear magnetic resonance
[NMR], electron microscopy [EM], small-angle X-ray scattering
[SAXS], fluorescence resonance energy transfer [FRET]) (Kalinin
et al., 2012; Read et al., 2011; Rieping et al., 2005; Trewhella
et al., 2013). At the other end of the spectrum are methods for
the de novo prediction of macromolecular structures, which
aim to predict the native three-dimensional (3D) structure, typi-
cally a domain of a protein, directly from its amino acid sequence
without experimental data by using various ab initio or knowl-
edge-based computational approaches (Baker and Sali, 2001;
Das and Baker, 2008).
With the focus of interest in structural biology moving toward
larger macromolecular complexes and dynamic networks of
interactions, integrative structure solution techniques that can
combine experimental data from heterogeneous sources and
are able to handle ambiguous or conflicting information
are becoming essential. The combination of computational
modelingwith low-resolution experimental constraints has espe-
cially proven powerful (Ward et al., 2013). In retrospect, the most
famous 3D structure of a biological macromolecule could, in
today’s terms, be considered as an ‘‘integrative low-resolution
model’’: When Watson and Crick published the structure of the
DNA double helix, their model was based on fiber diffraction
data at low resolution and additional constraints about chemistry
and stoichiometry. Although atomic high-resolution diffraction
data only became availablemuch later, this low-resolutionmodel
suggested ‘‘a possible copyingmechanism for the geneticmate-
Structure
rial’’ (Watson and Crick, 1953, p. 737) and has initiated a revolu-
tion in molecular biology and biomedical research (Collins et al.,
2003). Obviously, it is not the atomic resolution or precision of
a model that determines its usefulness but the understanding
which interpretations and conclusions can be supported by the
model at hand.
In contrast to the regular structure of the DNA double helix, the
structural biology of proteins is much more complex, where
each protein has its own unique 3D structure. Since small
changes in the sequence of a protein can have strong effects
on its biophysical properties, experimental determination of
protein structures is a laborious and often unpredictable
endeavor. The computational modeling of a protein’s structure
has therefore attracted substantial interest in the field of
bioinformatics to complement experimental structural biology
efforts to characterize the protein universe (Baker and Sali,
2001; Levitt, 2009).
In the following paragraphs, I provide an updated view on the
‘‘protein structure gap,’’ arguing that structure information for the
majority of amino acids in common model organism proteomes
can be provided by a combination of computational and exper-
imental techniques. I highlight some applications of structure
modeling and prediction techniques in various areas of life
sciences and discuss limitations and challenges in communi-
cating model information to potential users of models.
The Protein Structure Gap Is DisappearingAdvances in DNA sequencing techniques are giving rise to an
unprecedented avalanche of new sequences (UniProt Con-
sortium, 2013), and it is obvious that it will be impossible to deter-
mine the structures of all proteins of interest experimentally
with current techniques (Figure 1). With more sensitive next-gen-
eration sequencing techniques becoming available, noncultivat-
able organisms also come within reach, widening the protein
21, September 3, 2013 ª2013 Elsevier Ltd All rights reserved 1531
Figure 1. Mind the GapThe number of entries in the SwissProt and trEMBL sequence databases(UniProt Consortium, 2013) and the PDB (Berman et al., 2007) are growingexponentially, while the protein structure gap between sequence and struc-tures is widening dramatically. Inset: growth of PDB holdings from 1972 to2013.
Table 1. Commonly Used Tools and Services for Protein
Structure Modeling and Prediction
Tool or Service Web Site
Protein Model
Portal
http://www.proteinmodelportal.org
(Arnold et al., 2009; Haas et al., 2013)
Model Archive http://modelarchive.org
HHpred http://toolkit.tuebingen.mpg.de/hhpred
(Hildebrand et al., 2009)
IMP http://www.salilab.org/imp (Russel et al., 2012;
Yang et al., 2012)
IntFOLD http://www.reading.ac.uk/bioinf/IntFOLD/
(Roche et al., 2011)
I-Tasser http://zhanglab.ccmb.med.umich.edu/
I-TASSER/ (Zhang, 2013)
ModBase http://salilab.org/modbase/
(Pieper et al., 2011)
Modeler/ModWeb http://salilab.org/modeller/
(Pieper et al., 2011; Yang et al., 2012)
Pcons.net http://pcons.net/ (Larsson et al., 2011)
PHYRE2 http://www.sbg.bio.ic.ac.uk/phyre2/
(Kelley and Sternberg, 2009)
Robetta http://robetta.bakerlab.org/
(Raman et al., 2009)
Rosetta https://www.rosettacommons.org
(Das and Baker, 2008)
SWISS-MODEL
Repository
http://swissmodel.expasy.org/repository
(Kiefer et al., 2009)
SWISS-MODEL
Workspace
http://swissmodel.expasy.org/workspace/
(Arnold et al., 2006; Bordoli and Schwede, 2012)
Structure
Review
structure gap even further, despite tremendous progress in auto-
mating experimental structure determination techniques.
Fortunately, homologous proteins that share detectable
sequence similarity have similar 3D structures, and their struc-
tural diversity is increasing with evolutionary distance, as out-
lined by Chothia and Lesk in their seminal paper, ‘‘The Relation
between the Divergence of Sequence and Structure in Proteins’’
(Chothia and Lesk, 1986). Based on this observation, methods
were developed 2 decades ago for the comparative modeling
(a.k.a. homology or template-based modeling) of protein struc-
tures, which allow extrapolation of the available experimental
structure information to as-yet uncharacterized protein se-
quences (Guex et al., 2009; Peitsch, 1995; Sanchez and Sali,
1998; Sutcliffe et al., 1987). Today, comparative modeling
techniques have matured into fully automated stable pipelines
that provide reliable 3Dmodels accessible also to nonspecialists
(Table 1).
The apparent complexity of the protein sequence universe can
be explained by multidomain architectures formed by combina-
tions of single domains characterized by approximately 15,000
sequence family profiles (Levitt, 2009). Thanks to efforts by the
experimental structural biology community andworldwide struc-
tural genomics efforts (Terwilliger, 2011), an increasing fraction
of protein families has at least one member with an experimental
structure in the Protein Data Bank (PDB) (Berman et al., 2007). At
the same time, sensitive and accurate profile HMM methods
have been developed and allow researchers to take advantage
of the available sequence information for detection of remote
template relationships (Remmert et al., 2012). As a result, a para-
digm shift has occurred during the last decade: starting from a
situation where the ‘‘structure knowledge gap’’ between the
number of protein sequences and small number of known struc-
tures has hampered the widespread use of structure-based
approaches in life science research, today, some form of struc-
tural information, either experimental or computational, is avail-
able for the majority of amino acids encoded by common model
organism genomes (Figure 2). The widespread availability of 3D
structure information enables rational structure-based ap-
1532 Structure 21, September 3, 2013 ª2013 Elsevier Ltd All rights r
proaches in a broad range of applications in life science research
(Schwede et al., 2009).
However, computational models often represent only fractions
of the full-length of a protein of interest, and one of the unre-
solved questions in template-basedmodeling is how to combine
information from multiple templates, e.g., different structural
domains, into larger complex assemblies. Current techniques
are not able to reliably predict the relative orientation of domains
of such multitemplate models. Also, comparative models still
resemble more closely the template than the target structure,
and refinement methods are not able to consistently refine
models closer to the target structure (MacCallum et al., 2011).
The development of reproducible and reliable methods for
refinement that consistently improve the accuracy of models
by shifting the coordinates closer to the native state is one of
the pressing challenges in the field.
The function of a protein almost always involves motions and
conformational changes, and a molecular understanding of its
mechanism requires a detailed description of the different func-
tional states the structure can explore dynamically. Typical
examples include allosteric conformational changes on binding
events, intermediate excited states in reaction cycles, trans-
port, and motion phenomena. Frequently, however, these
states are not directly observable experimentally at high reso-
lution but can only be characterized at low resolution, e.g., by
changes in chemical shifts, FRET, SAXS, or limited electron
density in X-ray crystallography (Hennig et al., 2013; Kalinin
eserved
Figure 2. Structural Template Coverage of the Human ProteomeThe fraction of amino acids in the human proteome showing sequence simi-larity to proteins with known structures in the PDB is shown over time, wherecolors indicate levels of sequence identity as detected by PSI-BLAST (Altschulet al., 1997). The area shaded in red indicates the fraction of about 30% ofintrinsically unstructured residues estimated in the human proteome (Colaket al., 2013; Ward et al., 2004). Models built on templates sharing lowsequence identity <20% are often of poor quality due to evolutionary diver-gence between target and template structures, as well as limitations of themodeling and refinement methods (illustrated in Figure 3B). Prokaryotic pro-teomes have, in general, a higher structural coverage than eukaryotic ones(Guex et al., 2009; Zhang et al., 2009).
Structure
Review
et al., 2012; Ochi et al., 2012). Modeling and simulation will play
a central role in exploring these alternative conformations and
describing the dynamics of the transitions (Ma et al., 2011;
Nygaard et al., 2013; Weinkam et al., 2012).
Modeling Protein Complexes and InteractionsAlthough structural protein domains often reflect functional
modules (Lees et al., 2012), they are rarely found in isolation:
many proteins form intricate multidomain architectures, often
assemble to stable oligomeric quaternary states, and frequently
incorporate low-molecular-weight ligands and metal ions, e.g.,
as cofactors or structural components. Since molecular inter-
actions are not easily recognizable from the sequence of the pro-
tein, different structure-based prediction strategies have been
explored for modeling these interactions. Assuming that the
structures of the proteins participating in a complex are known,
docking programs aim to align them in an orientation favorable
for interaction (Janin, 2010). However, the details of the struc-
ture-affinity relationships are not yet fully understood, and for
complex systems, the estimates for the binding free energy are
often only approximate (Kastritis and Bonvin, 2013).
The fraction of protein heterocomplexes in the PDB is small
compared to the number of monomeric or homo-oligomeric
structures. Nevertheless, it seems that, for almost all known pro-
tein-protein interactions for which the individual components are
structurally characterized, structures of complexes can be iden-
tified in the PDB that can be used for template-based prediction
approaches (Kundrotas et al., 2012; Stein et al., 2011; Xu and
Dunbrack, 2011). In combination with homology modeling of
the target proteins, this opens the opportunity for the structural
prediction of protein-protein interactions on a genomewide scale
(Stein et al., 2011; Vakser, 2013). Although structure-based
methods for predicting protein-protein interactions might have
Structure
a rather high noise level, accuracy comparable to other high-
throughput methods can be achieved in combination with
orthogonal information (Aloy and Russell, 2006; Zhang et al.,
2012). The current state of the art in protein docking and predic-
tion of complexes is regularly assessed in the CAPRI blind pre-
diction experiment (Janin, 2010).
In contrast to structurally well-characterized stable protein
complexes, transient protein-protein interactions often involve
significant structural changes of the partners on binding. In
fact, a large fraction estimated around 30% of the proteome
encoded by higher eukaryotes is assumed to be highly flexible
or even intrinsically disordered and supposed to form a well-
defined 3-dimensional structure only on binding to a partner
molecule (Colak et al., 2013; Janin and Sternberg, 2013). These
intrinsically disordered protein (IDP) regions enable interactions
with many different proteins and are attributed to have a function
in tissue-specific rewiring of protein-protein interactions by spe-
cific modifications of IDP regions; for example, by posttransla-
tional modifications (PTMs) or alternative splicing (Buljan et al.,
2013). Sequence variations in protein-protein interfaces are
often associatedwith human diseases, as they have the potential
to disrupt regulatory networks in the cell (David et al., 2012; Vidal
et al., 2011; Wei et al., 2013). Prediction of these interactions and
structural modeling of mutations and PTMs remains an open
challenge of highest interest (Uversky and Dunker, 2013; Wass
et al., 2011).
Know Your Limits‘‘Essentially, all models are wrong, but some are useful’’ (Box
and Draper, 1987, p. 424). Different applications of macromolec-
ular structure information have different requirements with
respect to accuracy and resolution of a model. While atomistic
molecular modeling only works with highly accurate sets of
coordinates, models of lower resolution are still useful for the
rational-design site-directed mutagenesis experiments, epitope
mapping, or supporting experimental structure determination
(Schwede et al., 2009). In order to be able to decide if a model
or prediction is useful for a specific application, knowing its
expected accuracy and quality is essential. The assessment of
techniques for structure prediction in retrospective experiments
such as CASP (Moult et al., 2011), EVA (Koh et al., 2003), Life-
Bench (Rychlewski and Fischer, 2005), or CAMEO (Haas et al.,
2013) allowed comparison of different methods on the same
data set, thereby establishing the current state of the art and indi-
cating areas that require further development of improved
methods. However, the accuracy differences between the best
predictionmethods on the same protein target are typically small
in comparison to the differences between easy and difficult
protein targets (Figure 3). Therefore, reliable local error estimates
for the atomic coordinates predicted by a model are crucial for
judging its applicability for a specific question. Unfortunately,
only very few modeling methods today deliver reliable confi-
dence measures for their predictions (Kryshtafovych et al.,
2013; Mariani et al., 2011). This is a serious limitation of current
modeling techniques, which hinders the more widespread appli-
cation of models in biomedical research.
Several independent tools for model validation and quality
estimation have been developed to overcome this problem,
which assess certain structural features of a model such as
21, September 3, 2013 ª2013 Elsevier Ltd All rights reserved 1533
Figure 3. Examples of Blind StructurePredictionsTwo proteins of different modeling difficulty aredisplayed, highlighting the importance of qualityestimation. Reliable estimation of the expectedcoordinate errors of individual models is crucial forjudging their suitability for specific applications.Consensus between independent predictionmethods has been shown to be a good indicator ofmodel accuracy in general.(A) Crystal structure of the acyl-CoA dehydroge-nase from Slackia heliotrinireducens solved by theMidwest Center for Structural Genomics insuperposition with the ten best blind predictions inthe CASP10 experiment (T0758). Obviously, in thiscase, all predictions agree well with the experi-mental reference structure, and the differencesbetween methods are small.(B) In more difficult cases, such as the crystalstructure of a hypothetical protein from Rumino-coccus gnavus solved at Joint Center for Struc-tural Genomics (PDB ID 4GL6) shown here (CASPtarget T0684-d2), no suitable template structurecould be identified, and the ten best predictionsshow large deviations from the reference structureand among each other.
Structure
Review
stereo-chemical correctness (Chen et al., 2010; Hooft et al.,
1996; Laskowski et al., 1993) or apply a combination of knowl-
edge-based statistical measures derived from high-resolution
crystals structures to provide estimates of model accuracy (Ben-
kert et al., 2011; Ray et al., 2012; Wiederstein and Sippl, 2007). In
case where many independent predictions for the same protein
by different independent methods are available, consensus
approaches have proven powerful to identify reliable areas in
models and to identify segments with likely deviations from the
actual structure (Ginalski et al., 2003; Kryshtafovych et al.,
2013; McGuffin et al., 2013).
Validating the accuracy and reliability of a model in order to
estimate its suitability for a specific application obviously
requires access to the model coordinates and information about
the procedures and underlying assumptions that were applied to
generate the model. However, since coordinates derived by
theoretical modeling cannot be deposited in the PDB (Berman
et al., 2006), many manuscripts reporting results of theoretical
modeling or simulations are published without the models being
made available. This makes it impossible for the reader and
reviewer of the manuscript to judge if the experiment is repro-
ducible and if conclusions are justified. In order to alleviate this
situation, a public archive of macromolecular structure models
(http://modelarchive.org) is currently being established as part
of Protein Model Portal (Haas et al., 2013). The model archive
provides a unique stable accession code (i.e., a digital object
identifier, or DOI) for each deposited model, which can be
directly referenced in the corresponding manuscripts. Besides
the actual model coordinates, archiving of models should
include sufficient details about assumptions, parameters, and
constraints applied in the simulation to allow the user of a model
to assess, and, if necessary, reproduce the simulation. In an ideal
situation, it should be possible to download a deposition from a
model archive and continue the simulation, e.g., by adding more
experimental data as constraints or by applying advanced simu-
lation methods that may have been developed in the meantime.
1534 Structure 21, September 3, 2013 ª2013 Elsevier Ltd All rights r
Over the last 2 decades, protein structure modeling and
prediction methods have matured to a point where reliable
models for many proteins can be generated and successfully
used as substitutes for direct experimental structures in a broad
variety of applications. In the following text, a few recent exam-
ples are highlighted.
Application of Homology Models in Structure-BasedDrug DiscoveryThe rational development of drugs increasingly relies on struc-
ture-based strategies for identifying potent and selective low-
molecular-weight chemical compounds. The usefulness of
homology models in structure-based virtual screening has
been demonstrated in various retrospective analyses on a broad
variety of different targets (Costanzi, 2013; Kairys et al., 2006;
McGovern and Shoichet, 2003; Oshiro et al., 2004; Skolnick
et al., 2013). One class of proteins garnering particular interest
for structure-based drug discovery are G protein-coupled recep-
tors (GPCRs), where recent advances in experimental structure
determination have brought many receptors within range for
comparative modeling techniques (Carlsson et al., 2011; Kobilka
andSchertler, 2008). Similar to protein structure prediction, com-
munity efforts for blind and independent validation of prediction
techniques are crucial for assessment of the expected accuracy
and reliability of modeled protein-ligand interactions (Damm-Ga-
namet et al., 2013; Kufareva et al., 2011; SAMPL, 2010).
Of note, experimental structures will not necessarily give
better results than models in structure-based drug discovery.
Bajorath and coworkers have analyzed 322 prospective virtual
screening campaigns in the scientific literature (Ripphausen
et al., 2010), out of which a total of 73 studies successfully
utilized homology models. It is surprising that the potency of
the hits identified using homology models was, on average,
higher than for hits identified by docking into X-ray structures.
The observation that an X-ray structure is not necessarily the
best possible representation of a particular structural state is
eserved
Structure
Review
illustrated by a recent study for predicting the substrate site of
metabolism in cytochrome P450 CYP2D6. A model of CYP2D6
was generated based on the X-ray crystal structures of sub-
strate-bound CYP2C5 (Unwalla et al., 2010). During the study,
the structure of apo-CYP2D6 also became available. Both the
homology model and the experimental structure were used as
receptors in docking calculations. While overall the homology
model was in good agreement with the CYP2D6 crystal struc-
ture, the model consistently outperformed the experimental
structure in the docking calculations. This observation can be
attributed to structural differences in the substrate recognition
sites and demonstrates the importance of correctly describing
substrate-induced conformational changes that occur upon
ligand binding, which must be taken into account. Computa-
tional modeling techniques play a crucial role in exploring such
alternative receptor conformations.
Structural modeling can not only support the development of
new drugs but also predict the likely effect of amino acid
sequence variation in the proximity of the binding site; for ex-
ample, mutations that disrupt drug binding leading to drug re-
sistance. Computational techniques that can predict resistance
a priori are expected to become useful tools for drug discovery
and design of treatment in the clinics (Safi and Lilien, 2012).
De Novo Structure Prediction Techniques Guide ProteinEngineeringThe de novo structure prediction of naturally occurring proteins
is considered as the ‘‘holy grail’’ in computational structural
biology. Still, despite a few remarkable exceptions, de novo
techniques are limited to small proteins, and the overall accuracy
remains typically rather low (Kinch et al., 2011). It is interesting,
however, that the very same techniques appear to work remark-
ably well for the inverse problem—the design of protein
sequences with specific properties, e.g., that will fold into a spe-
cific structure or will form specific interactions. Baker and
coworkers have successfully designed a library of idealized
protein folds that allowed them to derive a set of rules guiding
future designs (Koga et al., 2012). Others have engineered natu-
rally occurring tandem-repeat proteins, e.g., ankyrin repeat
proteins, into a system that now allows researchers to rationally
design proteins with specific binding properties and finds a wide
range of applications from research in structural biology to,
possibly in the future, therapeutics (Javadi and Itzhaki, 2013;
Tamaskovic et al., 2012).
Due to limitations in the computational design methodology,
these approaches are often combined with high-throughput
screening and in vitro maturation techniques to diagnose
modeling inaccuracies and generate high-activity binders
(Whitehead et al., 2013). This approach was recently applied to
design proteins that bind a conserved surface patch on the
stem of the influenza hemagglutinin (HA) from the 1918 H1N1
pandemic virus. After affinity maturation, two of the designed
proteins, HB36 and HB80, bind H1 and H5 HAs with low nano-
molar affinity (Fleishman et al., 2011). Special cases of designing
specific interactions are proteins that self-assemble to a desired
symmetric architecture. The experimental validation of a de-
signed 24-subunit, 13-nm diameter complex with octahedral
symmetry and a 12-subunit, 11-nm diameter complex with tetra-
hedral symmetry confirmed that the resulting materials closely
Structure
matched the design models (King et al., 2012). The approach
opens interesting perspectives for the development of self-
assembling protein nanomaterials.
While, in the previous example, the self-assembly into larger
aggregates was a desired design goal, the self-assembly of
proteins into amyloid fibrils in the human body is associated
with numerous pathologies. Recently, several structures of
amyloid fibrils have been determined experimentally (Eisenberg
and Jucker, 2012) and provide an interesting target for the devel-
opment of specific inhibitors of pathological amyloid fibril for-
mation. Computer-aided structure-based design approaches
have been successfully applied to develop highly specific pep-
tide inhibitors of amyloid formation of the tau protein associated
with Alzheimer’s disease and of an amyloid fibril that enhances
sexual transmission of the HIV (Sievers et al., 2011). It is worth
noting that such peptide inhibitors are not limited to the 20 natu-
rally occurring amino acids.
Coevolution Information in Sequence Data HelpsPredict Membrane Protein StructuresThe experimental determination of the structures of membrane
proteins by X-ray crystallography is challenging and requires a
series of sophisticated techniques for protein expression, purifi-
cation, and crystallization. Recent technological advances have
led to a strong increase in the number of membrane proteins
characterized experimentally. Prominent examples include
pharmacologically highly relevant drug targets such as GPCRs
(Kobilka and Schertler, 2008), drug efflux transporters (Aller
et al., 2009; Nakashima et al., 2013), or ion channels (Gouaux
and Mackinnon, 2005). However, the number of membrane
proteins is still small compared to soluble proteins, and homol-
ogy modeling of membrane proteins is, therefore, only possible
for a small fraction of membrane proteins.
Recently, computational methods using coevolution infor-
mation have been developed to predict contacts between pairs
of amino acid residues based on deep multiple sequence align-
ments. Although similar approaches had been proposed before,
recent breakthroughs in the handling of phylogenetic information
and in disentangling indirect relationships have resulted in an
improved capacity to predict contacts between different protein
residues (Burger and van Nimwegen, 2010; de Juan et al., 2013;
Hopf et al., 2012; Lapedes et al., 2002; Morcos et al., 2011;
Nugent and Jones, 2012). These approaches were shown to be
effective in inferring evolutionary covariation in pairs of sequence
positions within families of membrane proteins and were shown
to provide sufficiently accurate pairwise distance constraints
to generate all-atom models. These methods are expected to
greatly expand the range of membrane proteins amenable to
modeling due to the rapid increase sequence information, which
will allowderivation ofmore comprehensive information on evolu-
tionary constraints (Hopf et al., 2012; Nugent and Jones, 2012).
Combining Experimental Methods and ComputationalModeling for Integrative Structure DeterminationIn the simplest case of combining modeling with experimental
data, homology models are being used as search models for
phasing diffraction data in X-ray crystallography by searching
for placements of a starting model within the crystallographic
unit cell that best accounts for the measured diffraction
21, September 3, 2013 ª2013 Elsevier Ltd All rights reserved 1535
Figure 4. Integrative Structure Model of the NPCThe molecular architecture of the approximately 50 MDa transmembrane NPCconsist of 456 constituent proteins that selectively transport cargoes acrossthe nuclear envelope (Alber et al., 2007a, 2007b). Image courtesy of AndrejSali, UCSF (http://salilab.org).
Structure
Review
amplitudes. This approach however, requires relatively accurate
starting coordinates and often fails for starting models based on
remote homologs. This limitation can be overcome by using
modeling algorithms for sampling near-native conformations of
the initial starting model (DiMaio et al., 2011) or even allow
human protein folding game players to sample different protein
conformations (Khatib et al., 2011).
Often, however, preparing protein crystals of sufficient quality
to collect high-resolution diffraction data turns out to be the
limiting step, and complementary methods to characterize the
sample in solution have to be explored. Many of these methods
provide data of relatively low resolution, and computational
modeling techniques are required to generate likely structural
representations compatible with the data. For example, NMR
chemical shifts provide important local structural information
for proteins, and consistent structure generation from NMR
chemical shift data has recently become feasible for proteins
with sizes of up to 130 residues at an accuracy comparable to
those obtained with the standard NMR protocol (Shen et al.,
2009). SAXS is a robust and easily accessible technique for the
structural characterizations of biological macromolecular com-
plexes in solution under physiological conditions. This low-reso-
lution technique is specifically useful for characterizing large and
transient complexes andmovements in flexible macromolecules
(Graewert and Svergun, 2013; Rambo and Tainer, 2013;
Schneidman-Duhovny et al., 2012a; Trewhella et al., 2013) and
provides valuable constraints, e.g., for restrained macromolec-
ular docking experiments (Schneidman-Duhovny et al., 2011).
Recent advances in the measurement and interpretation of
single-molecule FRET experiments allow the generation of highly
accurate distance constraints as input for restrained modeling of
biomolecules and complexes (Kalinin et al., 2012). Conceptually
similar constraints about the spatial proximity of pairs of lysine
residues at the surface of a protein can be derived by chemical
crosslinking and analysis by peptide mass spectrometry. This
type of information appears useful for reconstructing large
macromolecular complexes (Walzthoeni et al., 2013).
While, traditionally, the main focus of macromolecular struc-
ture modeling and prediction has been on proteins, RNA struc-
ture prediction has recently gained significant attention. Of
note, many of the approaches developed originally for protein
modeling such as homology modeling, de novo prediction, qual-
ity estimation, and blind assessment of prediction methods,
appear to be applicable with small adaptations to RNA structure
modeling (Cruz et al., 2012; Rother et al., 2011; Seetin andMath-
ews, 2012; Sim et al., 2012). In contrast to proteins, RNA second-
ary structure can be directly characterized by an experimental
approach called SHAPE-Seq, which uses selective 20-hydroxylacylation analyzed by primer extension sequencing to inform
the modeling of RNA tertiary structure (Aviran et al., 2011).
With the focus of interest in structural biology moving toward
larger macromolecular complexes and dynamic networks of
interactions, individual experimental techniques are often no
longer able to generate sufficient data that would allow gener-
ating a unique high-resolution atomic model of the system. The
combination computational modeling with a variety of hetero-
geneous (low-resolution) experimental constraints has proven
extremely powerful (Alber et al., 2008; Ward et al., 2013). One
essential feature of such integrative structure solution tech-
1536 Structure 21, September 3, 2013 ª2013 Elsevier Ltd All rights r
niques is that they must be able to handle ambiguous or conflict-
ing information with different levels of accuracy (Alber et al.,
2008; Rieping et al., 2005; Schneidman-Duhovny et al., 2012b).
In this context, cryo-electron microscopy techniques play a cen-
tral role for determining the molecular architecture of large
macromolecular complexes (Lasker et al., 2012b; Velazquez-
Muriel et al., 2012; Zhao et al., 2013). Recent successful exam-
ples determined by integrative (a.k.a. hybrid) techniques include
the molecular architecture of the nuclear pore complex (NPC;
Figure 4), a transmembrane complex of approximately 50 MDa
with 456 constituent proteins that selectively transport cargoes
across the nuclear envelope (Alber et al., 2007a, 2007b). In a
similar approach, the molecular architecture of the 26S protea-
some holocomplex was determined (Lasker et al., 2012a).
Beyond Individual Models: Structural Biology of CellularProcessesAlthough many equilibrium models treat cells as a ‘‘bag of
enzymes,’’ living cells actively maintain a higher order internal
structure, and the functional aspects of this topological organiza-
tion in 3D space are gradually being discovered. New technolo-
gies for cryo-electron tomography hold the promise for direct
observation of cellular processes in situ (Briggs, 2013; Robinson
et al., 2007; Yahav et al., 2011). Higher order 3D cellular organiza-
tion also includes genome topology. Genomewide biochemical
analysis methods in combination with functional data provide
insights into how genome topology is maintained and the influ-
ence that it exhibits on gene expression and genome mainte-
nance. The intricate interplay between transcriptional activity
and spatial organization indicates a self-organizing and self-
perpetuating system that uses epigenetic dynamics to regulate
genome function in response to regulatory cues and topropagate
cell fate memory. Computational modeling based on data from
recently developed chromosome conformation capture tech-
nology provides detailed insights into the spatial organization of
genomes (Cavalli and Misteli, 2013; Dekker et al., 2013; Engreitz
et al., 2013; Gibcus and Dekker, 2013; Kimura et al., 2013).
eserved
Figure 5. Speculative Data-Driven 3DModel of the Bacterial DivisionMachineryThe model was created with the GraphiteLifeExplorer modeling tool (Hornuset al., 2013). The FtsZ tubulin-like protein (in dark blue and yellow) is shapedinto a double ring. A short filament of the FtsA actin-like protein (in light blue) isshown on the Z ring. One FtsKmotor (in gray) pumps the DNA. This translocaseis linked to the membrane (data not shown) by six linkers (Vendeville et al.,2011). Image courtesy of Damien Lariviere (http://www.lifeexplorer.eu/).
Structure
Review
The integrative modeling of complex molecular machines
combines data from a broad range of experimental techniques,
such as EM, chemical crosslinking, proteomics, FRET, or
SAXS in an attempt to build an ensemble of models that is
consistent with the available data (Webb et al., 2011). Often,
this is an iterative process where ambiguities in the models indi-
cate lack (or inconsistency) of the data, motivating new experi-
ments to resolve these ambiguities. Software packages that
integrate modeling capability with powerful visualization tools
in projects such as IMP/Chimera (Yang et al., 2012), Mesoscope
(Al-Amoudi et al., 2011), or Life Explorer (Figure 5) (Hornus et al.,
2013) will greatly facilitate the generation and interpretation of
the resulting models.
Conveying the underlying assumptions of a computational
technique, as well as estimating the expected accuracy and vari-
ability of a model, will be crucial in making these computational
techniques useful for the nonexpert scientist. Understanding the
limitations of a model is the first step in deciding which interpre-
tations and conclusions can be supported. One of the key chal-
lenges in this respect will be to develop an open environment for
sharing of data, models, and algorithms that will allow us to
continuously and collaboratively refine our current model of the
‘‘molecular sociology of the cell’’ (Robinson et al., 2007).
ACKNOWLEDGMENTS
Special thanks to Andrej Sali (University of California, San Francisco) forproviding the image of the NPC, Damien Lariviere (Fondation Fourmentin)for the screen shot of Life Explorer, and Jurgen Haas and Stefan Bienert forhelp with the template coverage plot. Model Archive was supported by the Na-tional Institute of General Medical Science of the National Institutes of Healthas a subaward to Rutgers University under award number U01 GM093324 andthe SIB Swiss Institute of Bioinformatics.
REFERENCES
Al-Amoudi, A., Castano-Diez, D., Devos, D.P., Russell, R.B., Johnson, G.T.,and Frangakis, A.S. (2011). The three-dimensional molecular structure of thedesmosomal plaque. Proc. Natl. Acad. Sci. USA 108, 6480–6485.
Structure
Alber, F., Dokudovskaya, S., Veenhoff, L.M., Zhang, W., Kipper, J., Devos, D.,Suprapto, A., Karni-Schmidt, O., Williams, R., Chait, B.T., et al. (2007a). Deter-mining the architectures of macromolecular assemblies. Nature 450, 683–694.
Alber, F., Dokudovskaya, S., Veenhoff, L.M., Zhang, W., Kipper, J., Devos, D.,Suprapto, A., Karni-Schmidt, O., Williams, R., Chait, B.T., et al. (2007b). Themolecular architecture of the nuclear pore complex. Nature 450, 695–701.
Alber, F., Forster, F., Korkin, D., Topf, M., and Sali, A. (2008). Integratingdiverse data for structure determination of macromolecular assemblies.Annu. Rev. Biochem. 77, 443–477.
Aller, S.G., Yu, J., Ward, A., Weng, Y., Chittaboina, S., Zhuo, R., Harrell, P.M.,Trinh, Y.T., Zhang, Q., Urbatsch, I.L., and Chang, G. (2009). Structure of P-glycoprotein reveals a molecular basis for poly-specific drug binding. Science323, 1718–1722.
Aloy, P., and Russell, R.B. (2006). Structural systems biology: modelling pro-tein interactions. Nat. Rev. Mol. Cell Biol. 7, 188–197.
Altschul, S.F., Madden, T.L., Schaffer, A.A., Zhang, J., Zhang, Z., Miller, W.,and Lipman, D.J. (1997). Gapped BLAST and PSI-BLAST: a new generationof protein database search programs. Nucleic Acids Res. 25, 3389–3402.
Arnold, K., Bordoli, L., Kopp, J., and Schwede, T. (2006). The SWISS-MODELworkspace: a web-based environment for protein structure homology model-ling. Bioinformatics 22, 195–201.
Arnold, K., Kiefer, F., Kopp, J., Battey, J.N., Podvinec, M., Westbrook, J.D.,Berman, H.M., Bordoli, L., and Schwede, T. (2009). The Protein Model Portal.J. Struct. Funct. Genomics 10, 1–8.
Aviran, S., Trapnell, C., Lucks, J.B., Mortimer, S.A., Luo, S., Schroth, G.P.,Doudna, J.A., Arkin, A.P., and Pachter, L. (2011). Modeling and automationof sequencing-based characterization of RNA structure. Proc. Natl. Acad.Sci. USA 108, 11069–11074.
Baker, D., and Sali, A. (2001). Protein structure prediction and structural geno-mics. Science 294, 93–96.
Benkert, P., Biasini, M., and Schwede, T. (2011). Toward the estimation of theabsolute quality of individual protein structure models. Bioinformatics 27,343–350.
Berman, H.M., Burley, S.K., Chiu, W., Sali, A., Adzhubei, A., Bourne, P.E.,Bryant, S.H., Dunbrack, R.L., Jr., Fidelis, K., Frank, J., et al. (2006). Outcomeof a workshop on archiving structural models of biological macromolecules.Structure 14, 1211–1217.
Berman, H., Henrick, K., Nakamura, H., and Markley, J.L. (2007). The world-wide Protein Data Bank (wwPDB): ensuring a single, uniform archive of PDBdata. Nucleic Acids Res. 35(Database issue), D301–D303.
Bordoli, L., and Schwede, T. (2012). Automated protein structure modelingwith SWISS-MODEL Workspace and the Protein Model Portal. MethodsMol. Biol. 857, 107–136.
Box, G.E.P., and Draper, N.R. (1987). Empirical Model-Building and ResponseSurfaces (New York: Wiley).
Briggs, J.A. (2013). Structural biology in situ—the potential of subtomogramaveraging. Curr. Opin. Struct. Biol. 23, 261–267.
Buljan, M., Chalancon, G., Dunker, A.K., Bateman, A., Balaji, S., Fuxreiter, M.,and Babu, M.M. (2013). Alternative splicing of intrinsically disordered regionsand rewiring of protein interactions. Curr. Opin. Struct. Biol. 23, 443–450.
Burger, L., and van Nimwegen, E. (2010). Disentangling direct from indirect co-evolution of residues in protein alignments. PLoS Comput. Biol. 6, e1000633.
Carlsson, J., Coleman, R.G., Setola, V., Irwin, J.J., Fan, H., Schlessinger, A.,Sali, A., Roth, B.L., and Shoichet, B.K. (2011). Ligand discovery from a dopa-mine D3 receptor homology model and crystal structure. Nat. Chem. Biol. 7,769–778.
Cavalli, G., and Misteli, T. (2013). Functional implications of genome topology.Nat. Struct. Mol. Biol. 20, 290–299.
Chen, V.B., Arendall, W.B., 3rd, Headd, J.J., Keedy, D.A., Immormino, R.M.,Kapral, G.J., Murray, L.W., Richardson, J.S., and Richardson, D.C. (2010).MolProbity: all-atom structure validation for macromolecular crystallography.Acta Crystallogr. D Biol. Crystallogr. 66, 12–21.
21, September 3, 2013 ª2013 Elsevier Ltd All rights reserved 1537
Structure
Review
Chothia, C., and Lesk, A.M. (1986). The relation between the divergence ofsequence and structure in proteins. EMBO J. 5, 823–826.
Colak, R., Kim, T., Michaut, M., Sun, M., Irimia, M., Bellay, J., Myers, C.L.,Blencowe, B.J., and Kim, P.M. (2013). Distinct types of disorder in the humanproteome: functional implications for alternative splicing. PLoS Comput. Biol.9, e1003030.
Collins, F.S., Green, E.D., Guttmacher, A.E., and Guyer, M.S.; US NationalHuman Genome Research Institute. (2003). A vision for the future of genomicsresearch. Nature 422, 835–847.
Costanzi, S. (2013). Modeling G protein-coupled receptors and their interac-tions with ligands. Curr. Opin. Struct. Biol. 23, 185–190.
Cruz, J.A., Blanchet, M.F., Boniecki, M., Bujnicki, J.M., Chen, S.J., Cao, S.,Das, R., Ding, F., Dokholyan, N.V., Flores, S.C., et al. (2012). RNA-Puzzles: aCASP-like evaluation of RNA three-dimensional structure prediction. RNA18, 610–625.
Damm-Ganamet, K.L., Smith, R.D., Dunbar, J.B., Jr., Stuckey, J.A., and Carl-son, H.A. (2013). CSAR Benchmark Exercise 2011-2012: Evaluation of resultsfrom docking and relative ranking of blinded congeneric series. J. Chem. Inf.Model. Published online April 2, 2013. http://dx.doi.org/10.1021/ci400025f.
Das, R., and Baker, D. (2008). Macromolecular modeling with rosetta. Annu.Rev. Biochem. 77, 363–382.
David, A., Razali, R., Wass, M.N., and Sternberg, M.J. (2012). Protein-proteininteraction sites are hot spots for disease-associated nonsynonymous SNPs.Hum. Mutat. 33, 359–363.
de Juan, D., Pazos, F., and Valencia, A. (2013). Emerging methods in proteinco-evolution. Nat. Rev. Genet. 14, 249–261.
Dekker, J., Marti-Renom, M.A., and Mirny, L.A. (2013). Exploring the three-dimensional organization of genomes: interpreting chromatin interactiondata. Nat. Rev. Genet. 14, 390–403.
DiMaio, F., Terwilliger, T.C., Read, R.J.,Wlodawer, A., Oberdorfer, G.,Wagner,U., Valkov, E., Alon, A., Fass, D., Axelrod, H.L., et al. (2011). Improved molec-ular replacement by density- and energy-guided protein structure optimiza-tion. Nature 473, 540–543.
Eisenberg, D., and Jucker, M. (2012). The amyloid state of proteins in humandiseases. Cell 148, 1188–1203.
Engreitz, J.M., Pandya-Jones, A., McDonel, P., Shishkin, A., Sirokman, K.,Surka, C., Kadri, S., Xing, J., Goren, A., Lander, E.S., et al. (2013). The XistlncRNA exploits three-dimensional genome architecture to spread acrossthe X chromosome. Science. Published online July 4, 2013. http://dx.doi.org/10.1126/science.1237973.
Fleishman, S.J., Whitehead, T.A., Ekiert, D.C., Dreyfus, C., Corn, J.E., Strauch,E.M., Wilson, I.A., and Baker, D. (2011). Computational design of proteinstargeting the conserved stem region of influenza hemagglutinin. Science332, 816–821.
Gibcus, J.H., and Dekker, J. (2013). The hierarchy of the 3D genome. Mol. Cell49, 773–782.
Ginalski, K., Elofsson, A., Fischer, D., and Rychlewski, L. (2003). 3D-Jury: asimple approach to improve protein structure predictions. Bioinformatics 19,1015–1018.
Gouaux, E., and Mackinnon, R. (2005). Principles of selective ion transport inchannels and pumps. Science 310, 1461–1465.
Graewert, M.A., and Svergun, D.I. (2013). Impact and progress in small andwide angle X-ray scattering (SAXS and WAXS). Curr. Opin. Struct. Biol. Pub-lished online July 5, 2013. http://dx.doi.org/10.1016/j.sbi.2013.06.007.
Guex, N., Peitsch, M.C., and Schwede, T. (2009). Automated comparative pro-tein structure modeling with SWISS-MODEL and Swiss-PdbViewer: a histori-cal perspective. Electrophoresis 30(Suppl 1 ), S162–S173.
Haas, J., Roth, S., Arnold, K., Kiefer, F., Schmidt, T., Bordoli, L., and Schwede,T. (2013). The Protein Model Portal—a comprehensive resource for proteinstructure and model information. Database (Oxford) 2013, bat031.
Hennig, J., Wang, I., Sonntag, M., Gabel, F., and Sattler, M. (2013). CombiningNMR and small angle X-ray and neutron scattering in the structural analysis ofa ternary protein-RNA complex. J. Biomol. NMR 56, 17–30.
1538 Structure 21, September 3, 2013 ª2013 Elsevier Ltd All rights r
Hildebrand, A., Remmert, M., Biegert, A., and Soding, J. (2009). Fast andaccurate automatic structure prediction with HHpred. Proteins 77(Suppl 9 ),128–132.
Hooft, R.W., Vriend, G., Sander, C., and Abola, E.E. (1996). Errors in proteinstructures. Nature 381, 272.
Hopf, T.A., Colwell, L.J., Sheridan, R., Rost, B., Sander, C., and Marks, D.S.(2012). Three-dimensional structures of membrane proteins from genomicsequencing. Cell 149, 1607–1621.
Hornus, S., Levy, B., Lariviere, D., and Fourmentin, E. (2013). Easy DNAmodeling and more with GraphiteLifeExplorer. PLoS ONE 8, e53609.
Janin, J. (2010). Protein-protein docking tested in blind predictions: the CAPRIexperiment. Mol. Biosyst. 6, 2351–2362.
Janin, J., and Sternberg, M.J. (2013). Protein flexibility, not disorder, is intrinsicto molecular recognition. F1000 Biol. Rep. 5, 2.
Javadi, Y., and Itzhaki, L.S. (2013). Tandem-repeat proteins: regularity plusmodularity equals design-ability. Curr. Opin. Struct. Biol. 23, 622–631.
Kairys, V., Fernandes, M.X., and Gilson, M.K. (2006). Screening drug-likecompounds by docking to homology models: a systematic study. J. Chem.Inf. Model. 46, 365–379.
Kalinin, S., Peulen, T., Sindbert, S., Rothwell, P.J., Berger, S., Restle, T.,Goody, R.S., Gohlke, H., and Seidel, C.A. (2012). A toolkit and benchmarkstudy for FRET-restrained high-precision structural modeling. Nat. Methods9, 1218–1225.
Kastritis, P.L., and Bonvin, A.M. (2013). On the binding affinity of macromolec-ular interactions: daring to ask why proteins interact. J. R. Soc. Interface 10,20120835.
Kelley, L.A., and Sternberg, M.J. (2009). Protein structure prediction on theWeb: a case study using the Phyre server. Nat. Protoc. 4, 363–371.
Khatib, F., DiMaio, F., Cooper, S., Kazmierczyk, M., Gilski, M., Krzywda, S.,Zabranska, H., Pichova, I., Thompson, J., Popovi�c, Z., et al.; Foldit ContendersGroup; Foldit Void Crushers Group. (2011). Crystal structure of a monomericretroviral protease solved by protein folding game players. Nat. Struct. Mol.Biol. 18, 1175–1177.
Kiefer, F., Arnold, K., Kunzli, M., Bordoli, L., and Schwede, T. (2009). TheSWISS-MODEL Repository and associated resources. Nucleic Acids Res.37(Database issue), D387–D392.
Kimura, H., Shimooka, Y., Nishikawa, J.I., Miura, O., Sugiyama, S., Yamada,S., and Ohyama, T. (2013). The genome folding mechanism in yeast.J. Biochem. 154, 137–147.
Kinch, L., Yong Shi, S., Cong, Q., Cheng, H., Liao, Y., and Grishin, N.V. (2011).CASP9 assessment of freemodeling target predictions. Proteins 79(Suppl 10 ),59–73.
King, N.P., Sheffler, W., Sawaya, M.R., Vollmar, B.S., Sumida, J.P., Andre, I.,Gonen, T., Yeates, T.O., and Baker, D. (2012). Computational design of self-assembling protein nanomaterials with atomic level accuracy. Science 336,1171–1174.
Kobilka, B., and Schertler, G.F. (2008). New G-protein-coupled receptor crys-tal structures: insights and limitations. Trends Pharmacol. Sci. 29, 79–83.
Koga, N., Tatsumi-Koga, R., Liu, G., Xiao, R., Acton, T.B., Montelione, G.T.,and Baker, D. (2012). Principles for designing ideal protein structures. Nature491, 222–227.
Koh, I.Y., Eyrich, V.A., Marti-Renom,M.A., Przybylski, D., Madhusudhan, M.S.,Eswar, N., Grana, O., Pazos, F., Valencia, A., Sali, A., and Rost, B. (2003). EVA:Evaluation of protein structure prediction servers. Nucleic Acids Res. 31,3311–3315.
Kryshtafovych, A., Barbato, A., Fidelis, K., Monastyrskyy, B., Schwede, T., andTramontano, A. (2013). Assessment of the assessment: Evaluation of themodel quality estimates in CASP10. Proteins. http://dx.doi.org/10.1002/prot.24347, Published June 18, 2013.
Kufareva, I., Rueda, M., Katritch, V., Stevens, R.C., and Abagyan, R.; GPCRDock 2010 participants. (2011). Status of GPCR modeling and docking asreflected by community-wide GPCR Dock 2010 assessment. Structure 19,1108–1126.
eserved
Structure
Review
Kundrotas, P.J., Zhu, Z., Janin, J., and Vakser, I.A. (2012). Templates are avail-able tomodel nearly all complexes of structurally characterized proteins. Proc.Natl. Acad. Sci. USA 109, 9438–9441.
Lapedes, A., Giraud, B., and Jarzynski, C. (2002). Using sequence alignmentsto predict protein structure and stability with high accuracy. arXiv,arXiv:1207.2484, http://arxiv.org/abs/1207.2484.
Larsson, P., Skwark, M.J., Wallner, B., and Elofsson, A. (2011). Improvedpredictions by Pcons.net using multiple templates. Bioinformatics 27,426–427.
Lasker, K., Forster, F., Bohn, S., Walzthoeni, T., Villa, E., Unverdorben, P.,Beck, F., Aebersold, R., Sali, A., and Baumeister, W. (2012a). Molecular archi-tecture of the 26S proteasome holocomplex determined by an integrativeapproach. Proc. Natl. Acad. Sci. USA 109, 1380–1387.
Lasker, K., Velazquez-Muriel, J.A., Webb, B.M., Yang, Z., Ferrin, T.E., and Sali,A. (2012b). Macromolecular assembly structures by comparative modelingand electron microscopy. Methods Mol. Biol. 857, 331–350.
Laskowski, R.A., Macarthur, M.W., Moss, D.S., and Thornton, J.M. (1993).Procheck - a program to check the stereochemical quality of protein struc-tures. J. Appl. Crystallogr. 26, 283–291.
Lees, J., Yeats, C., Perkins, J., Sillitoe, I., Rentzsch, R., Dessailly, B.H., andOrengo, C. (2012). Gene3D: a domain-based resource for comparative geno-mics, functional annotation and protein network analysis. Nucleic Acids Res.40(Database issue), D465–D471.
Levitt, M. (2009). Nature of the protein universe. Proc. Natl. Acad. Sci. USA106, 11079–11084.
Ma, B., Tsai, C.J., Halilo�glu, T., and Nussinov, R. (2011). Dynamic allostery:linkers are not merely flexible. Structure 19, 907–917.
MacCallum, J.L., Perez, A., Schnieders, M.J., Hua, L., Jacobson, M.P., andDill, K.A. (2011). Assessment of protein structure refinement in CASP9. Pro-teins 79(Suppl 10 ), 74–90.
Mariani, V., Kiefer, F., Schmidt, T., Haas, J., and Schwede, T. (2011). Assess-ment of template based protein structure predictions in CASP9. Proteins79(Suppl 10 ), 37–58.
McGovern, S.L., and Shoichet, B.K. (2003). Information decay in moleculardocking screens against holo, apo, and modeled conformations of enzymes.J. Med. Chem. 46, 2895–2907.
McGuffin, L.J., Buenavista, M.T., and Roche, D.B. (2013). The ModFOLD4server for the quality assessment of 3D protein models. Nucleic Acids Res.41(W1), W368–W372.
Morcos, F., Pagnani, A., Lunt, B., Bertolino, A., Marks, D.S., Sander, C., Zec-china, R., Onuchic, J.N., Hwa, T., and Weigt, M. (2011). Direct-couplinganalysis of residue coevolution captures native contacts across many proteinfamilies. Proc. Natl. Acad. Sci. USA 108, E1293–E1301.
Moult, J., Fidelis, K., Kryshtafovych, A., and Tramontano, A. (2011). Criticalassessment of methods of protein structure prediction (CASP)—round IX. Pro-teins 79(Suppl 10 ), 1–5.
Nakashima, R., Sakurai, K., Yamasaki, S., Hayashi, K., Nagata, C., Hoshino,K., Onodera, Y., Nishino, K., and Yamaguchi, A. (2013). Structural basis forthe inhibition of bacterial multidrug exporters. Nature 500, 102–106.
Nugent, T., and Jones, D.T. (2012). Accurate de novo structure prediction oflarge transmembrane protein domains using fragment-assembly and corre-lated mutation analysis. Proc. Natl. Acad. Sci. USA 109, E1540–E1547.
Nygaard, R., Zou, Y., Dror, R.O., Mildorf, T.J., Arlow, D.H., Manglik, A., Pan,A.C., Liu, C.W., Fung, J.J., Bokoch, M.P., et al. (2013). The dynamic processof b(2)-adrenergic receptor activation. Cell 152, 532–542.
Ochi, T., Wu, Q., Chirgadze, D.Y., Grossmann, J.G., Bolanos-Garcia, V.M.,and Blundell, T.L. (2012). Structural insights into the role of domain flexibilityin human DNA ligase IV. Structure 20, 1212–1222.
Oshiro, C., Bradley, E.K., Eksterowicz, J., Evensen, E., Lamb, M.L., Lanctot,J.K., Putta, S., Stanton, R., and Grootenhuis, P.D. (2004). Performance of3D-database molecular docking studies into homology models. J. Med.Chem. 47, 764–767.
Structure
Peitsch, M.C. (1995). Protein modeling by e-mail. Nat. Biotechnol. 13,658–660.
Pieper, U., Webb, B.M., Barkan, D.T., Schneidman-Duhovny, D., Schles-singer, A., Braberg, H., Yang, Z., Meng, E.C., Pettersen, E.F., Huang, C.C.,et al. (2011). ModBase, a database of annotated comparative protein structuremodels, and associated resources. Nucleic Acids Res. 39(Database issue),D465–D474.
Raman, S., Vernon, R., Thompson, J., Tyka, M., Sadreyev, R., Pei, J., Kim, D.,Kellogg, E., DiMaio, F., Lange, O., et al. (2009). Structure prediction for CASP8with all-atom refinement using Rosetta. Proteins 77(Suppl 9 ), 89–99.
Rambo, R.P., and Tainer, J.A. (2013). Super-resolution in solution X-ray scat-tering and its applications to structural systems biology. Annu. Rev. Biophys.42, 415–441.
Ray, A., Lindahl, E., and Wallner, B. (2012). Improved model quality assess-ment using ProQ2. BMC Bioinformatics 13, 224.
Read, R.J., Adams, P.D., Arendall, W.B., 3rd, Brunger, A.T., Emsley, P., Joos-ten, R.P., Kleywegt, G.J., Krissinel, E.B., Lutteke, T., Otwinowski, Z., et al.(2011). A new generation of crystallographic validation tools for the proteindata bank. Structure 19, 1395–1412.
Remmert, M., Biegert, A., Hauser, A., and Soding, J. (2012). HHblits: lightning-fast iterative protein sequence searching by HMM-HMM alignment. Nat.Methods 9, 173–175.
Rieping, W., Habeck, M., and Nilges, M. (2005). Inferential structure determi-nation. Science 309, 303–306.
Ripphausen, P., Nisius, B., Peltason, L., and Bajorath, J. (2010). Quo vadis,virtual screening? A comprehensive survey of prospective applications.J. Med. Chem. 53, 8461–8467.
Robinson, C.V., Sali, A., and Baumeister, W. (2007). The molecular sociologyof the cell. Nature 450, 973–982.
Roche, D.B., Buenavista, M.T., Tetchner, S.J., and McGuffin, L.J. (2011). TheIntFOLD server: an integrated web resource for protein fold recognition, 3Dmodel quality assessment, intrinsic disorder prediction, domain predictionand ligand binding site prediction. Nucleic Acids Res. 39(Web Server issue),W171–W176.
Rother, M., Rother, K., Puton, T., and Bujnicki, J.M. (2011). RNA tertiary struc-ture prediction with ModeRNA. Brief. Bioinform. 12, 601–613.
Russel, D., Lasker, K., Webb, B., Velazquez-Muriel, J., Tjioe, E., Schneidman-Duhovny, D., Peterson, B., and Sali, A. (2012). Putting the pieces together:integrative modeling platform software for structure determination of macro-molecular assemblies. PLoS Biol. 10, e1001244.
Rychlewski, L., and Fischer, D. (2005). LiveBench-8: the large-scale, contin-uous assessment of automated protein structure prediction. Protein Sci. 14,240–245.
Safi, M., and Lilien, R.H. (2012). Efficient a priori identification of drug resistantmutations using Dead-End Elimination and MM-PBSA. J. Chem. Inf. Model.52, 1529–1541.
SAMPL. (2010). Proceedings of the Third Annual Statistical Assessment of theModeling of Proteins and Ligands (SAMPL) Challenge and Workshop. June2009. Montreal, Canada. J. Comput. Aided Mol. Des. 24, 257–383.
Sanchez, R., and Sali, A. (1998). Large-scale protein structure modeling of theSaccharomyces cerevisiae genome. Proc. Natl. Acad. Sci. USA 95, 13597–13602.
Schneidman-Duhovny, D., Hammel, M., and Sali, A. (2011). Macromoleculardocking restrained by a small angle X-ray scattering profile. J. Struct. Biol.173, 461–471.
Schneidman-Duhovny, D., Kim, S.J., and Sali, A. (2012a). Integrative structuralmodeling with small angle X-ray scattering profiles. BMC Struct. Biol. 12, 17.
Schneidman-Duhovny, D., Rossi, A., Avila-Sakar, A., Kim, S.J., Velazquez-Muriel, J., Strop, P., Liang, H., Krukenberg, K.A., Liao, M., Kim, H.M., et al.(2012b). A method for integrative structure determination of protein-proteincomplexes. Bioinformatics 28, 3282–3289.
Schwede, T., Sali, A., Honig, B., Levitt, M., Berman, H.M., Jones, D., Brenner,S.E., Burley, S.K., Das, R., Dokholyan, N.V., et al. (2009). Outcome of a
21, September 3, 2013 ª2013 Elsevier Ltd All rights reserved 1539
Structure
Review
workshop on applications of protein models in biomedical research. Structure17, 151–159.
Seetin, M.G., and Mathews, D.H. (2012). RNA structure prediction: an over-view of methods. Methods Mol. Biol. 905, 99–122.
Shen, Y., Vernon, R., Baker, D., and Bax, A. (2009). De novo protein structuregeneration from incomplete chemical shift assignments. J. Biomol. NMR 43,63–78.
Sievers, S.A., Karanicolas, J., Chang, H.W., Zhao, A., Jiang, L., Zirafi, O.,Stevens, J.T., Munch, J., Baker, D., and Eisenberg, D. (2011). Structure-baseddesign of non-natural amino-acid inhibitors of amyloid fibril formation. Nature475, 96–100.
Sim, A.Y., Minary, P., and Levitt, M. (2012). Modeling nucleic acids. Curr. Opin.Struct. Biol. 22, 273–278.
Skolnick, J., Zhou, H., and Gao, M. (2013). Are predicted protein structures ofany value for binding site prediction and virtual ligand screening? Curr. Opin.Struct. Biol. 23, 191–197.
Stein, A., Mosca, R., and Aloy, P. (2011). Three-dimensional modeling ofprotein interactions and complexes is going ’omics. Curr. Opin. Struct. Biol.21, 200–208.
Sutcliffe, M.J., Haneef, I., Carney, D., and Blundell, T.L. (1987). Knowledgebased modelling of homologous proteins, Part I: Three-dimensional frame-works derived from the simultaneous superposition of multiple structures.Protein Eng. 1, 377–384.
Tamaskovic, R., Simon, M., Stefan, N., Schwill, M., and Pluckthun, A. (2012).Designed ankyrin repeat proteins (DARPins) from research to therapy.Methods Enzymol. 503, 101–134.
Terwilliger, T.C. (2011). The success of structural genomics. J. Struct. Funct.Genomics 12, 43–44.
Trewhella, J., Hendrickson, W.A., Kleywegt, G.J., Sali, A., Sato, M., Schwede,T., Svergun, D.I., Tainer, J.A., Westbrook, J., and Berman, H.M. (2013). Reportof the wwPDB Small-Angle Scattering Task Force: data requirements forbiomolecular modeling and the PDB. Structure 21, 875–881.
UniProt Consortium. (2013). Update on activities at the Universal ProteinResource (UniProt) in 2013. Nucleic Acids Res. 41(Database issue), D43–D47.
Unwalla, R.J., Cross, J.B., Salaniwal, S., Shilling, A.D., Leung, L., Kao, J., andHumblet, C. (2010). Using a homology model of cytochrome P450 2D6 to pre-dict substrate site of metabolism. J. Comput. Aided Mol. Des. 24, 237–256.
Uversky, V.N., and Dunker, A.K. (2013). The case for intrinsically disorderedproteins playing contributory roles in molecular recognition without a stable3D structure. F1000 Biol. Rep. 5, 1.
Vakser, I.A. (2013). Low-resolution structural modeling of protein interactome.Curr. Opin. Struct. Biol. 23, 198–205.
Velazquez-Muriel, J., Lasker, K., Russel, D., Phillips, J., Webb, B.M., Schneid-man-Duhovny, D., and Sali, A. (2012). Assembly of macromolecular com-plexes by satisfaction of spatial restraints from electron microscopy images.Proc. Natl. Acad. Sci. USA 109, 18821–18826.
Vendeville, A., Lariviere, D., and Fourmentin, E. (2011). An inventory of the bac-terial macromolecular components and their spatial organization. FEMSMicrobiol. Rev. 35, 395–414.
Vidal, M., Cusick, M.E., and Barabasi, A.L. (2011). Interactome networks andhuman disease. Cell 144, 986–998.
1540 Structure 21, September 3, 2013 ª2013 Elsevier Ltd All rights r
Walzthoeni, T., Leitner, A., Stengel, F., and Aebersold, R. (2013). Mass spec-trometry supported determination of protein complex structure. Curr. Opin.Struct. Biol. 23, 252–260.
Ward, J.J., Sodhi, J.S., McGuffin, L.J., Buxton, B.F., and Jones, D.T. (2004).Prediction and functional analysis of native disorder in proteins from the threekingdoms of life. J. Mol. Biol. 337, 635–645.
Ward, A.B., Sali, A., andWilson, I.A. (2013). Biochemistry. Integrative structuralbiology. Science 339, 913–915.
Wass, M.N., David, A., and Sternberg, M.J. (2011). Challenges for the predic-tion of macromolecular interactions. Curr. Opin. Struct. Biol. 21, 382–390.
Watson, J.D., and Crick, F.H. (1953). Molecular structure of nucleic acids; astructure for deoxyribose nucleic acid. Nature 171, 737–738.
Webb, B., Lasker, K., Schneidman-Duhovny, D., Tjioe, E., Phillips, J., Kim,S.J., Velazquez-Muriel, J., Russel, D., and Sali, A. (2011). Modeling of proteinsand their assemblies with the integrative modeling platform. Methods Mol.Biol. 781, 377–397.
Wei, Q., Xu, Q., and Dunbrack, R.L., Jr. (2013). Prediction of phenotypes ofmissense mutations in human proteins from biological assemblies. Proteins81, 199–213.
Weinkam, P., Pons, J., and Sali, A. (2012). Structure-based model of allosterypredicts coupling between distant sites. Proc. Natl. Acad. Sci. USA 109, 4875–4880.
Whitehead, T.A., Baker, D., and Fleishman, S.J. (2013). Computational designof novel protein binders and experimental affinity maturation. Methods Enzy-mol. 523, 1–19.
Wiederstein, M., and Sippl, M.J. (2007). ProSA-web: interactive web servicefor the recognition of errors in three-dimensional structures of proteins.Nucleic Acids Res. 35(Web Server issue), W407–W410.
Xu, Q., and Dunbrack, R.L., Jr. (2011). The protein common interface database(ProtCID)—a comprehensive database of interactions of homologous proteinsin multiple crystal forms. Nucleic Acids Res. 39(Database issue), D761–D770.
Yahav, T., Maimon, T., Grossman, E., Dahan, I., and Medalia, O. (2011). Cryo-electron tomography: gaining insight into cellular processes by structuralapproaches. Curr. Opin. Struct. Biol. 21, 670–677.
Yang, Z., Lasker, K., Schneidman-Duhovny, D., Webb, B., Huang, C.C., Pet-tersen, E.F., Goddard, T.D., Meng, E.C., Sali, A., and Ferrin, T.E. (2012).UCSF Chimera, MODELLER, and IMP: an integrated modeling system.J. Struct. Biol. 179, 269–278.
Zhang, Q.C., Petrey, D., Deng, L., Qiang, L., Shi, Y., Thu, C.A., Bisikirska, B.,Lefebvre, C., Accili, D., Hunter, T., et al. (2012). Structure-based predictionof protein-protein interactions on a genome-wide scale. Nature 490, 556–560.
Zhang, Y. (2013). Interplay of I-TASSER and QUARK for template-based andab initio protein structure prediction in CASP10. Proteins. http://dx.doi.org/10.1002/prot.24341, Published June 13, 2013.
Zhang, Y., Thiele, I., Weekes, D., Li, Z., Jaroszewski, L., Ginalski, K., Deacon,A.M., Wooley, J., Lesley, S.A., Wilson, I.A., et al. (2009). Three-dimensionalstructural view of the central metabolic network of Thermotoga maritima.Science 325, 1544–1549.
Zhao, G., Perilla, J.R., Yufenyuy, E.L., Meng, X., Chen, B., Ning, J., Ahn, J.,Gronenborn, A.M., Schulten, K., Aiken, C., and Zhang, P. (2013). MatureHIV-1 capsid structure by cryo-electron microscopy and all-atom moleculardynamics. Nature 497, 643–646.
eserved