Nature Methods | VOL.10 NO.1 | JANUARY 2013 | 43

News and Views

Interpreting protein networks with three-dimensional structures

Joan Teyra & Philip M Kim

A fully automated pipeline that systematically models three-dimensional (3D) structural details of protein interactions will allow researchers to interpret perturbation effects within protein pathways and networks.

Proteins are the working machinery of the cell, and their intricate interactions orchestrate most biological processes in a highly dynamic and cooperative fashion. For the correct execution of such processes, a precise and complex chain of protein interaction events needs to occur (for example, molecular assembly, activation, catalysis and disassembly) that can be represented in a network. With the advent of high-throughput experimental technologies, protein-protein interactions (PPIs) are being identified at a genome scale, and large networks have emerged for several organisms, including human. Such data are usually represented in a graph, where nodes symbolize proteins that are connected by undirected edges corresponding to interactions. In this issue of Nature Methods, Mosca et al.¹ describe a tool for annotating such graphs with 3D structural information and illustrate the value of doing so.
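The graph representation described above can be made concrete with a small sketch. The Python below is only an illustration under stated assumptions: the protein names, the interaction list and the annotation labels are hypothetical placeholders, and nothing here reflects how Mosca et al. store or annotate their networks.

```python
from collections import defaultdict

# Hypothetical interaction list; protein names are placeholders, not real data.
interactions = [("A", "B"), ("B", "C"), ("B", "D"), ("C", "D")]


def build_network(pairs):
    """Build an undirected graph as an adjacency map: protein -> set of partners."""
    network = defaultdict(set)
    for p, q in pairs:
        network[p].add(q)
        network[q].add(p)  # undirected: record the edge in both directions
    return dict(network)


def edge_key(p, q):
    """Order-independent key, so (A, B) and (B, A) name the same edge."""
    return frozenset((p, q))


network = build_network(interactions)

# Hypothetical structural annotations attached to some of the edges.
structure_for_edge = {
    edge_key("A", "B"): "experimental complex structure",
    edge_key("B", "C"): "homology model",
}

# Node degree: the number of interaction partners of each protein.
degree = {protein: len(partners) for protein, partners in network.items()}
hub = max(degree, key=degree.get)
print(hub, degree[hub])  # B 3
```

Looking up `structure_for_edge.get(edge_key("B", "A"))` returns the same annotation as for `("A", "B")`, which is the practical consequence of the edges being undirected.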

As a result of high-throughput methods, an entire subfield of computational biology studying protein interaction networks has emerged, and many advances have been made in both the analysis and the prediction of such networks². However, the high level of abstraction of such protein networks misses many details occurring at the molecular level. For instance, a highly connected node may represent a protein that recognizes its partners through the same surface region at different steps of a molecular process, or one that participates in two different pathways at different times and locations. Also, designing inhibitors of interactions (for example, in the form of pharmaceutical intervention) always requires a higher-resolution view. Capturing such molecular details about protein interactions requires obtaining structural information about protein complexes. Structures can highlight key three-dimensional features, such as surface regions involved in recognition and, more importantly, the specific residues involved. Unfortunately, despite the large number of available 3D protein structures, only a small fraction of protein complexes within the interactome have known structures. Owing to the inherent technical difficulties of solving cocrystal structures, this is unlikely to change very soon. Fortunately, computational methods offer alternatives.

Homology modeling is a mature technology for predicting three-dimensional models of a given protein sequence using the structures of homologous proteins as templates. Ten years ago, the Russell laboratory³ pioneered techniques to assess how well a homologous pair of sequences fits onto a previously determined structure of a protein complex, in order to generate an approximate 3D model of the complex using existing comparative modeling techniques. The method is based on the principle that many protein domains belonging to the same family use common molecular features to recognize members of another family (that is, they have equivalent binding surface regions and residue positions). Since then, many computational biologists have used structural information in combination with computational tools to analyze larger networks⁴, to predict new protein interactions, to determine which interactions are compatible with each other, and to obtain functional insights into the structural effects of single and multiple mutations⁵. However, obtaining and interpreting structural information remains a technical challenge for the wider scientific community, even for researchers obtaining and analyzing PPI data.
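The principle that domain families recognize one another through equivalent surfaces suggests a simple first step: look for a known complex structure whose two chains belong to the same families as the interacting pair. The sketch below illustrates only that lookup. The family names, protein identifiers and template identifiers are hypothetical, and this is not the published method of the Russell laboratory or of Mosca et al., which additionally assess how well the sequences fit the template interface.

```python
# Hypothetical domain-family assignments for three proteins.
families = {
    "P1": {"FAM_X"},
    "P2": {"FAM_Y"},
    "P3": {"FAM_Z"},
}

# Hypothetical template library: an unordered family pair maps to the
# identifier of a known complex structure containing that pair.
templates = {
    frozenset(("FAM_X", "FAM_Y")): "template_001",
}


def find_templates(p, q):
    """Return (family_p, family_q, template_id) for every family pair of
    the two proteins that has a known complex structure."""
    hits = []
    for fam_p in sorted(families.get(p, ())):
        for fam_q in sorted(families.get(q, ())):
            template = templates.get(frozenset((fam_p, fam_q)))
            if template is not None:
                hits.append((fam_p, fam_q, template))
    return hits


print(find_templates("P1", "P2"))  # one candidate template
print(find_templates("P1", "P3"))  # no template: this interaction stays unmodeled
```

An interaction with no hit simply remains without a structural model in this sketch, which is one way to see why structural coverage of interactions can lag behind that of individual proteins.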

To address this need, Mosca et al. present a fully automated homology modeling pipeline, dubbed Interactome3D, to map structural information onto any PPI data set on the fly¹. Researchers can input their interaction data set and obtain a structure-mapped network in which 3D models representing nodes and edges can be inspected in detail (Fig. 1). In addition, Mosca et al. have assembled and precalculated the structural interactomes for eight model organisms (including Escherichia coli, yeast and human), which can be visualized and inspected on the Interactome3D website. To illustrate the importance of integrating structural data into interactions, they annotated the complement cascade pathway, an innate immune system that helps antibodies and phagocytic cells protect the host against pathogens. For this particular case, they obtained an almost perfect structural coverage, helping them to map mutations related to three different diseases and rationalize a mechanism common to different mutations that cause one particular syndrome.

The analysis of the different interactomes shows that, on average, Mosca et al. were able to use their pipeline to obtain full or partial structural coverage for 35% of individual proteins, whereas for binary interactions this was possible for only approximately 11%¹. These low numbers highlight how far we still are from our ultimate goal of studying the regulation of cell function by incorporating structural knowledge. In addition to the aforementioned lack of cocrystal structures, another complication is that automatically generated 3D homology models of protein complexes are not perfect and often need manual expert input. Especially difficult are cases in which the interactions involve conformational changes at the interface, or the modeled interfaces contain insertions or deletions with respect to the template. For this reason, the authors adopt a conservative approach that provides only reliable models to biologists, as shown in their benchmark¹. Their strategy thus optimizes accuracy at the expense of coverage.

Joan Teyra and Philip M. Kim are at the Donnelly Centre, Department of Computer Science and Department of Molecular Genetics, University of Toronto, Toronto, Ontario, Canada. e-mail: [email protected]

© 2013 Nature America, Inc. All rights reserved.

Going forward, we envision several ways in which protein complex structural coverage will increase. First, novel crystal structures will increase coverage automatically, though this will take time. Second, an ideal way to overcome the deficit of coverage would be to use docking techniques to predict the optimal binding geometry between 3D domain models from interacting protein pairs⁶. Unfortunately, protein docking remains an extremely challenging problem, and it is currently not feasible to obtain reliable structures of interactions using this set of techniques. Finally, conceptual advances could lead to the inclusion of a type of interaction that is currently underrepresented in Mosca et al.’s method. These involve short sequence motifs (3–10 amino acids long), known as linear motifs, that are key mediators of PPIs. To capture these interactions, the authors do use structures of domain–peptide complexes as templates, but the biophysical nature of these interactions and the low specificity contained in the short sequence motifs lead to difficult tradeoffs between coverage and accuracy. Whereas Mosca et al. rightly optimize accuracy¹, new experimental technologies enabling the determination of more specific sequence signatures may allow more accurate inclusion of such interactions into a structural interactome⁷. In our opinion, the challenge for the future lies in developing alternative and complementary procedures that are reliable enough to potentially expand the current structural interactome.
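The tradeoff between coverage and accuracy discussed here can be illustrated numerically. The following sketch assumes that each candidate model carries a confidence score and that its correctness is known from a benchmark. The scores, the correctness flags, the network size and the thresholds are all invented for illustration and are not values from Mosca et al.

```python
# Hypothetical candidate models: (interaction, confidence score, correct in benchmark?)
candidates = [
    (("A", "B"), 0.95, True),
    (("A", "C"), 0.80, True),
    (("B", "D"), 0.60, False),
    (("C", "D"), 0.55, True),
    (("D", "E"), 0.30, False),
]

# Hypothetical network size, including interactions for which no model exists.
total_interactions = 10


def evaluate(threshold):
    """Keep models scoring at or above the threshold and report
    (coverage, accuracy); accuracy is None when nothing is kept."""
    kept = [c for c in candidates if c[1] >= threshold]
    coverage = len(kept) / total_interactions
    accuracy = sum(1 for c in kept if c[2]) / len(kept) if kept else None
    return coverage, accuracy


for threshold in (0.25, 0.5, 0.9):
    coverage, accuracy = evaluate(threshold)
    print(f"threshold={threshold}: coverage={coverage:.2f}, accuracy={accuracy:.2f}")
```

With these made-up numbers, raising the threshold from 0.25 to 0.9 lifts accuracy from 0.60 to 1.00 while coverage drops from 0.50 to 0.10, which is the shape of the conservative choice the article describes.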

The novel resource presented by Mosca et al.¹ constitutes a dramatic advance in the analysis of protein-protein interactions at atomic detail for networks and pathways. The web interface (http://interactome3d.irbbarcelona.org) is intuitive and easy for nonspecialist researchers to use, and it will allow molecular and cell biologists to routinely incorporate structural insights into the analysis of novel experimental PPI data. Knowledge of these details allows more rational design of experiments to disrupt an interaction and study the effect of such a perturbation within the system. We have no doubt that Interactome3D will facilitate many exciting discoveries.

COMPETING FINANCIAL INTERESTS
The authors declare no competing financial interests.

1. Mosca, R. et al. Nat. Methods 10, 47–53 (2013).

2. Vidal, M., Cusick, M.E. & Barabási, A.-L. Cell 144, 986–998 (2011).

3. Aloy, P. & Russell, R.B. Proc. Natl. Acad. Sci. USA 99, 5896–5901 (2002).

4. Kim, P.M., Lu, L.J., Xia, Y. & Gerstein, M.B. Science 314, 1938–1941 (2006).

5. Kiel, C., Beltrao, P. & Serrano, L. Annu. Rev. Biochem. 77, 415–441 (2008).

6. Janin, J. Mol. Biosyst. 6, 2351–2362 (2010).

7. Tonikian, R. et al. PLoS Biol. 7, e1000218 (2009).

Figure 1 | Interactome3D uses a protein-protein interaction dataset (left) as input. Nodes represent proteins and edges represent interactions. A full or partial structure of each protein is modeled, as well as the protein-protein complex (center) if a suitable template can be found. The three-dimensional model of the protein complex (right) provides structural insights for interpreting perturbation effects and rational design of further experiments.

Figure 1 panel titles: Protein interaction networks; Identification of structural complexes; Structural modeling and scoring.

An indirect approach to generating specific human cell types

Ernesto Lujan & Marius Wernig

Two groups derived neural and mesodermal cells from human fibroblasts by going through a partially reprogrammed intermediate.

The ability to easily convert accessible human cells into disease-relevant cell types through cellular reprogramming has opened new doors for basic research and regenerative medicine¹. Takahashi and Yamanaka ushered in contemporary reprogramming when they demonstrated that a combination of four transcription factors (Oct3/4, Sox2, Klf4 and c-Myc) could drive skin-derived fibroblasts to a pluripotent state that could be further differentiated into the desired cell type² (Fig. 1a). But robust differentiation into specific lineages remains a stumbling block. Low efficiencies and week- to month-long protocols often give rise to mixed cultures requiring a second purification step. Purity matters, as remnant pluripotent cells can give rise to tumors after transplantation. Moreover, the yielded cells are typically immature (as in cardiomyocyte, hematopoietic or neuronal differentiation).

With these concerns in mind, we and other groups have sought to take a different approach where, instead of going through a pluripotent state, one somatic cell type can be directly converted to another with the correct combination of lineage-specific transcription factors³ (Fig. 1). Surprisingly, using this approach, cellular conversion is fast (2–3 weeks), does not require the derivation of pluripotent cells and is efficient.

Recently, several groups have taken yet another approach to cellular conversion by transiently expressing the Yamanaka factors to generate what appears to be a multipotent, partially reprogrammed intermediate that

Ernesto Lujan is in the Department of Genetics, Marius Wernig is in the Department of Pathology, and both are at the Institute of Stem Cell Biology and Regenerative Medicine, Stanford University School of Medicine, Stanford, California, USA. e-mail: [email protected]
