Comparing Four Different Approaches for the Determination of Inter-Residue Interactions Provides Insight for the Structure Prediction of Helical Membrane Proteins
Jun Gao,1,2 Zhijun Li1,3
1 Department of Bioinformatics and Computer Science, University of the Sciences in Philadelphia, Philadelphia, PA 19104
2 Institute of Theoretical Chemistry, Shandong University, Jinan 250100, People’s Republic of China
3 Institute for Translational Medicine and Therapeutics, University of Pennsylvania, Philadelphia, PA 19104
Received 13 November 2008; revised 13 February 2009; accepted 14 February 2009
Published online 24 February 2009 in Wiley InterScience (www.interscience.wiley.com). DOI 10.1002/bip.21175
This article was originally published online as an accepted
preprint. The ‘‘Published Online’’ date corresponds to the
preprint version. You can request a copy of the preprint by
emailing the Biopolymers editorial office at biopolymers@wiley.com
INTRODUCTION
Inter-residue interactions play a crucial role in driving
protein folding and studying these interactions facilitates
the development of computational tools for the structure
prediction of both soluble and membrane proteins.1 Elu-
cidating inter-residue interactions within a protein’s
three-dimensional structure represents the first step for fur-
ther analysis. Computational analyses of inter-residue inter-
actions often define them in a simplified, but proven useful,
manner.2–7 For example, an interaction between two residues
is considered to exist if the distance between any two atoms
from the two residues is less than 5 Å.8 Various approaches
have been proposed as the basis for the determination of
inter-residue interactions and applied to the analysis of
Additional Supporting Information may be found in the online version of this article.
Correspondence to: Zhijun Li; e-mail: [email protected]
ABSTRACT:
Studying inter-residue interactions provides insight into
the folding and stability of both soluble and membrane
proteins and is essential for developing computational
tools for protein structure prediction. As the first step,
various approaches for elucidating such interactions
within protein structures have been proposed and proven
useful. Since different approaches may grasp different
aspects of protein structural folds, it is of interest to
systematically compare them. In this work, we applied
four approaches for determining inter-residue
interactions to the analysis of three distinct structure
datasets of helical membrane proteins and compared
their correlation to the three individual quality measures
of structures in these datasets. These datasets included
one of 35 structures of rhodopsin receptors and bacterial
rhodopsins determined at various resolutions, one derived
from the HOMEP benchmark dataset previously
reported, and one comprising 139 homology models. It
was found that the correlation between the average
number of inter-residue interactions obtained by applying
the four approaches and the available structure quality
measures varied quite significantly among them. The best
correlation was achieved by the approach focusing
exclusively on favorable inter-residue interactions. These
results provide interesting insight for the development of
objective quality measures for the structure prediction of
helical membrane proteins. © 2009 Wiley Periodicals,
Inc. Biopolymers 91: 547–556, 2009.
Keywords: inter-residue interaction; membrane protein
structure; interaction determination; structure quality
Contract grant sponsor: Research Starter Grant in Informatics (PhRMA Founda-
tion)
protein structures. These approaches can be approximately
classified into three categories: approaches employing a sin-
gle distance cutoff between atoms of two residues,3,5,9
approaches based on van der Waals radii of atoms,4 and
approaches detecting specific inter-residue interactions, e.g.
H-bonds.10 Since different approaches may grasp different
aspects of protein structural folds, it is of interest to system-
atically compare these approaches in order to gain new ideas
for the development of structure prediction tools.
The helical transmembrane (TM) proteins are excellent
systems for comparison studies of different approaches for
the determination of inter-residue interactions. First, struc-
ture prediction remains an eminent challenge in the field of
TM proteins11 and developing high-quality and fast predic-
tion tools is needed urgently. TM proteins mediate a variety
of fundamental cellular activities and are estimated to
account for ~20–30% of the human genome.12–14 TM proteins
also serve as important drug targets with the G-protein
coupled receptor (GPCR) superfamily alone being the target
of 30–50% of drugs available in the market.15,16 However,
TM protein structure determination remains a challenge in
general17 and membrane proteins represent less than 1% of
structures in the PDB database.18 Computational modeling
approaches for their structure prediction have played a sig-
nificant role in structural and functional studies of mem-
brane proteins,19–21 as well as in structure-based drug design
efforts.21,22 Distinct differences were observed between mem-
brane and soluble proteins, in terms of amino acid propen-
sity, packing density and side-chain rotamer frequencies.23–26
Developing computational prediction tools based on the
studies of inter-residue interactions in membrane proteins is
thus of particular interest.
Second, the packing of the TM domains of helical mem-
brane proteins is relatively homogeneous and simple to char-
acterize.27 The average number of inter-residue interactions
derived based on a specific approach falls into a narrow range
for high-resolution X-ray structures.7 This number decreases
for the low-resolution X-ray structures. Similarly, for a set of
96 GPCR homology models, a good linear relationship
between the average number of interactions derived and their
sequence identity to the template of the rhodopsin receptor
was reported.7 These observations suggest that a simple mea-
sure such as the average number of inter-residue interactions
correlates directly with the quality of the structure.
Studies of inter-residue interactions in helical membrane
proteins have revealed interesting findings and facilitated the
development of computational approaches for membrane
protein structure prediction. Frequent packing motifs and
highly probable inter-residue interactions are identified for
helical membrane proteins.28,29 Such knowledge has been
subsequently adopted for membrane protein structure pre-
diction with exciting outcomes.30–32 In other studies, the dis-
tribution of short-, medium-, and long-range interactions in
membrane proteins is found to be different from soluble pro-
teins.33–35 Although the number of inter-residue interactions
at various sequence separation cutoffs follows the power-law
behavior for both helical soluble and membrane proteins, the
fitting parameters describing this behavior vary between
them.36
In this work, we seek to compare four approaches for
determining inter-residue interactions in correlation to dif-
ferent quality measures of helical TM protein structures.
These four approaches represent all three categories of
approaches for the determination of inter-residue interac-
tions. For this comparison study, we have compiled three
structure datasets of helical TM proteins. The quality of
structures in each dataset was measured from a different per-
spective. The first dataset included 35 crystal structures of
rhodopsin receptors and bacterial rhodopsins whose quality
was represented by their resolutions. The second dataset was
derived from the HOMEP benchmark dataset of membrane
protein models previously reported.37 The X-ray structures
of these models are already available, thus their quality was
measured by direct structure comparison. The third dataset
included 139 TM protein homology models. These models
were built based on 10 membrane protein structures, each
representing a unique superfamily. The quality of these mod-
els was indicated by their sequence identity to individual
templates. The analyses showed that the correlation between
the average number of inter-residue interactions obtained by
applying the four approaches and the structure quality meas-
ures employed varied significantly among these four
approaches. As different approaches grasp different aspects
of protein structural folds, these results provide new ideas for
developing objective quality measures for the structure pre-
diction of TM proteins.
MATERIALS AND METHODS
Rhodopsin Structure Dataset (Dataset I)
A total of 35 structures of rhodopsin receptors and bacterial rhodopsins
were identified from the online membrane protein resources
(http://blanco.biomol.uci.edu/Membrane_Proteins_xtal.html,
version January 19, 2009 and http://www.mpibp-frankfurt.mpg.de/michel/public/memprotstruct.html,
version March 30, 2006). Rhodopsin receptors belong to the largest
superfamily of TM proteins, the GPCR superfamily. Both rhodopsin
receptors and bacterial rhodopsins bind retinal and are well-studied
membrane protein families. The resolution of the 35 structures ranges
from 1.55 to 4.15 Å. All 35 structures were determined using X-ray or
electron diffraction crystallography.
For each structure, its TM helical boundaries were identified
based on the definition in the PDBTM database.38 The loops of the
soluble regions were manually removed to keep only alpha helices
that lie within the TM regions.
HOMEP Benchmark Dataset (Dataset II)
A total of 188 homology models of membrane proteins from the HOMEP
benchmark dataset37 were obtained from Dr. Barry Honig’s lab. The
HOMEP dataset is a carefully compiled set of homology models
of 94 membrane protein query-template pairs of known structures
and covers a wide range of sequence identities from <10 to 80%.
Among the 188 models obtained, 92 are helical membrane proteins,
representing 46 query-template pairs. For each pair, a homology
model based on their sequence-to-sequence alignment and a model
based on their structure alignment were constructed for the query
protein using Modeller 6v2.39 These 92 models of helical membrane
proteins were studied here.
For each HOMEP model studied, its TM helical boundaries were
identified based on the definition for its corresponding X-ray struc-
ture in the PDBTM database.38 The corresponding X-ray structure
was downloaded from the PDB.40 The loops of the soluble regions
were again manually removed to keep only alpha helices that lie
within the TM regions. Structure comparison between a model and
its X-ray structure was measured by TM-score using the TM-align
server (http://zhang.bioinformatics.ku.edu/TM-align/).41
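For reference, the TM-score is commonly written as below. This is the standard definition from the TM-score literature, not a formula given in this article; which chain length is used for normalization in the present comparison (model or X-ray structure) is not stated here, so the symbol L_target is an assumption about that choice.

```latex
\mathrm{TM\text{-}score} \;=\; \max\left[\frac{1}{L_{\mathrm{target}}}
\sum_{i=1}^{L_{\mathrm{ali}}} \frac{1}{1 + \left(d_i / d_0\right)^{2}}\right],
\qquad
d_0 \;=\; 1.24\,\sqrt[3]{L_{\mathrm{target}} - 15} \;-\; 1.8
```

Here L_ali is the number of aligned residue pairs, d_i is the distance between the i-th pair of aligned Cα atoms after superposition, and the maximum is taken over superpositions. The score lies between 0 and 1, with 1 indicating identical structures.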
Homology Model Dataset of Helical Membrane Proteins (Dataset III)
A homology model dataset of 139 nonredundant helical membrane
proteins was compiled for this study (Supporting Information Table
I). To prepare this dataset, 17 proteins containing a single TM do-
main as defined in the CATH database42 were selected from the
high-resolution structure dataset compiled in the previous study,7
and used as query sequences to search the Swiss-Prot database.43
Each query sequence represents a unique membrane protein super-
family based on the classification in the CATH database.42 The simi-
larity searches were carried out by BLAST with default settings.44
All the BLAST hit sequences were then searched against the pro-
tein model database, MODBASE.45 The hits whose entry informa-
tion in MODBASE satisfied the following criteria were included in
the model dataset: (i) Its model was available; (ii) The model was
constructed using homology modeling techniques based on the
structure of the query protein; (iii) Its entire sequence identity to
the query protein, as reported in the BLAST search, is within 20–100%;
(iv) Its MODBASE score is >0.70 to winnow models with
obvious errors. For a query protein with fewer than three models satisfying
the criteria (i)–(iv), those models were not included.
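The selection criteria (i)–(iv) and the three-model minimum can be sketched as a simple filter. This is a minimal illustration, assuming each BLAST hit has already been reduced to a small record; the field names (`query`, `template`, `has_model`, `identity`, `score`) and the sample records are hypothetical and do not correspond to the MODBASE entry format or to data from this study.

```python
from collections import defaultdict


def select_models(hits, min_identity=20.0, max_identity=100.0,
                  min_score=0.70, min_models_per_query=3):
    """Group hits passing criteria (i)-(iv) by query protein.

    Query proteins with fewer than `min_models_per_query` passing
    models are dropped entirely.
    """
    by_query = defaultdict(list)
    for hit in hits:
        passes = (
            hit["has_model"]                                      # (i) model available
            and hit["template"] == hit["query"]                   # (ii) built on the query structure
            and min_identity <= hit["identity"] <= max_identity   # (iii) identity range
            and hit["score"] > min_score                          # (iv) model score threshold
        )
        if passes:
            by_query[hit["query"]].append(hit)
    return {q: ms for q, ms in by_query.items()
            if len(ms) >= min_models_per_query}


# Hypothetical records, for illustration only.
hits = [
    {"query": "Q1", "template": "Q1", "has_model": True, "identity": 85.0, "score": 0.95},
    {"query": "Q1", "template": "Q1", "has_model": True, "identity": 42.0, "score": 0.80},
    {"query": "Q1", "template": "Q1", "has_model": True, "identity": 25.0, "score": 0.71},
    {"query": "Q1", "template": "Q1", "has_model": True, "identity": 15.0, "score": 0.90},
    {"query": "Q2", "template": "Q2", "has_model": True, "identity": 60.0, "score": 0.99},
    {"query": "Q2", "template": "Q2", "has_model": True, "identity": 55.0, "score": 0.50},
]
selected = select_models(hits)
```

In this toy input, Q1 keeps three models (the 15% hit falls outside the identity range), while Q2 is dropped because only one of its models passes.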
Members in the model dataset were subsequently divided into
eight subsets, based upon their sequence identity to their template
proteins. Ranges included were 90–100%, 80–89%, 70–79%, 60–
69%, 50–59%, 40–49%, 30–39%, and 20–29%. If one subset has
more than four models built based on the same template, four were
selected randomly. For each model studied, its TM helical boundaries
were identified in MOE (Chemical Computing Group, version
2006.08). The loops of the soluble regions were manually removed
to keep only alpha helices that lie within the TM regions.
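The division into sequence-identity subsets and the cap of four models per template within a subset can be sketched as follows. This is an illustrative sketch only: the record fields are hypothetical, the treatment of non-integer identities (for example, 89.5% falling in the 80–89% subset) is an assumption, and the fixed random seed is there only to make the example reproducible.

```python
import random
from collections import defaultdict


def identity_bin(identity):
    """Return the lower bound of the subset: 20, 30, ..., 90 (90 covers 90-100%)."""
    if not 20 <= identity <= 100:
        raise ValueError("identity outside the 20-100% range used for the dataset")
    return min(int(identity // 10) * 10, 90)


def subsample(models, max_per_template=4, seed=0):
    """Keep at most `max_per_template` models per (subset, template) group."""
    groups = defaultdict(list)
    for model in models:
        key = (identity_bin(model["identity"]), model["template"])
        groups[key].append(model)
    rng = random.Random(seed)
    picked = []
    for key in sorted(groups):
        members = groups[key]
        if len(members) > max_per_template:
            # Random selection without replacement within an over-full group.
            members = rng.sample(members, max_per_template)
        picked.extend(members)
    return picked


# Hypothetical models: six in the 30-39% subset of template T1, one at 95%.
models = [{"template": "T1", "identity": 30.0 + i} for i in range(6)]
models.append({"template": "T1", "identity": 95.0})
kept = subsample(models)
```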
The original percentage of the sequence identity between a
model sequence and its template sequence was reported by the
BLAST search. Since this study focused on the TM domain of those
model structures, their TM domain sequences derived above were
realigned with the TM domain sequences of their respective template
proteins identified based on the PDBTM database (http://pdbtm.enzim.hu/)
using the AMPS package.46 The subset classification
of these models was adjusted accordingly. In total, this dataset
represents 10 unique membrane protein superfamilies classified in
the CATH database. Among them, seven template proteins have
more than 10 homology models present.
Derivation of Inter-Residue Interactions
An inter-residue interaction between two residues within a protein
structure was defined by one of four approaches. These four
approaches are described below.
Approach A. An edge was defined between two residues if one of
four inter-residue interactions was detected: hydrophobic interaction,
hydrogen bond, ionic bond, or disulfide bond. Hydrogen
bonds had a distance cutoff between the electronegative heavy
atoms of 3.1 or 3.2 Å and a geometry criterion that the angle from
donor to acceptor should be between 120° and 180°;47 hydrophobic
interactions and ionic bonds were based only on proximity, and the
default cutoff was 4.5 Å; and disulfide bonds were defined to exist
between explicitly bonded sulfur pairs or nonbonded sulfur pairs
within the distance cutoff of 2.5 Å. These four interactions were
determined using the Protein Contacts function in MOE with
default settings as reported previously,10 which did not include
hydrogen bonds formed between two backbone atoms.
Approach B. An edge was defined between two residues if the distance
between any two atoms from the two residues was within 5 Å.3,6
Such distances were determined using the CCP4 package (Collaborative
Computational Project, Number 4, 1994).
Approach C. An edge was defined between two residues if the theoretical
solvent accessible surfaces of the two residues contacted each
other.4 Such contacts were determined using the CSU program.48
Approach D. An edge was defined between two residues if the distance
between their Cα atoms was within 8 Å.2 Such distances were determined
using the CCP4 package.
For all four approaches, inter-residue interactions between resi-
dues closer than four positions along the sequence were not
included in the calculation in an effort to focus on the long-range
inter-residue interactions.
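The two distance-cutoff definitions (Approaches B and D) and the sequence-separation filter can be sketched in a few lines. This is a minimal illustration and not the CCP4-based procedure used in this work. It assumes coordinates are already available as a dictionary keyed by residue number, treats the cutoffs as inclusive, and assumes that "average number of interactions" means interactions per residue, with each edge counted for both partners. The toy coordinates are invented.

```python
from itertools import combinations
from math import dist

MIN_SEQ_SEPARATION = 4  # pairs closer than four positions along the sequence are skipped


def residue_contacts(residues, cutoff, ca_only=False):
    """Return the set of (i, j) residue-number pairs regarded as interacting.

    residues: dict mapping residue number -> dict of atom name -> (x, y, z)
    cutoff:   distance cutoff in angstroms (5 for Approach B, 8 for Approach D)
    ca_only:  if True, compare only the "CA" atoms (Approach D style)
    """
    edges = set()
    for i, j in combinations(sorted(residues), 2):
        if j - i < MIN_SEQ_SEPARATION:
            continue
        if ca_only:
            atoms_i = [residues[i]["CA"]]
            atoms_j = [residues[j]["CA"]]
        else:
            atoms_i = list(residues[i].values())
            atoms_j = list(residues[j].values())
        # An edge exists if any atom pair lies within the cutoff (assumed inclusive).
        if any(dist(a, b) <= cutoff for a in atoms_i for b in atoms_j):
            edges.add((i, j))
    return edges


def average_interactions(residues, edges):
    """Interactions per residue, counting each edge for both of its residues."""
    return 2 * len(edges) / len(residues)


# Invented toy coordinates along the x axis, for illustration only.
toy = {
    1: {"CA": (0.0, 0.0, 0.0), "CB": (1.0, 0.0, 0.0)},
    2: {"CA": (3.8, 0.0, 0.0)},
    5: {"CA": (7.0, 0.0, 0.0), "CB": (5.5, 0.0, 0.0)},
    9: {"CA": (14.5, 0.0, 0.0)},
}
contacts_b = residue_contacts(toy, cutoff=5.0)                # any-atom, 5 angstroms
contacts_d = residue_contacts(toy, cutoff=8.0, ca_only=True)  # CA-CA, 8 angstroms
```

In the toy case, residues 5 and 9 are linked under the Cα criterion but not under the any-atom 5 Å criterion, which shows how the two definitions can disagree on the same structure.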
Computational Analysis
For each of the four approaches for determining inter-residue interactions,
the relationship between the average number of interactions
within a structure and the structure quality measure was fitted to
the linear function as in Eq. (1):
$$Y = kX + b \tag{1}$$
where Y represents the average number of inter-residue interactions
detected within a structure, and X represents the value of the structure
quality measure. For a structure in dataset I, X refers to its resolution;
for a structure in dataset II, X refers to its TM-score; and for
a structure in dataset III, X refers to the percentage of sequence
identity of that structure model to its template.
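The fit in Eq. (1) can be sketched with an ordinary least-squares calculation. This is an illustration under two assumptions: that the fit is a standard least-squares line, and that the reported R-fitting value corresponds to the Pearson correlation coefficient (possibly its absolute value). The numbers below are made up and are not data from this study.

```python
from math import sqrt


def linear_fit(xs, ys):
    """Least-squares fit of Y = k*X + b; returns (k, b, r), r being Pearson's coefficient."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    k = sxy / sxx            # slope
    b = mean_y - k * mean_x  # intercept
    r = sxy / sqrt(sxx * syy)
    return k, b, r


# Made-up values for illustration only.
resolution = [1.5, 2.0, 2.5, 3.0, 3.5]        # X: quality measure
avg_interactions = [3.0, 2.8, 2.6, 2.4, 2.2]  # Y: average number of interactions
k, b, r = linear_fit(resolution, avg_interactions)
```

The same function applies to all three datasets; only the meaning of X changes (resolution, TM-score, or percentage sequence identity).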
RESULTS
To compare the four different approaches for determining
inter-residue interactions within a structure, the computational
approach generally included several steps: (i) Compile
three structure datasets of helical membrane proteins; (ii)
Determine the inter-residue interactions within structures in
these datasets using the four approaches; (iii) Obtain the linear
relationship between the average number of inter-residue
interactions derived by these approaches and the structure
quality measures; and (iv) Compare and analyze the results
for the four approaches.
Analysis of the Rhodopsin Structure
Dataset (Dataset I)
The first dataset we analyzed included a set of rhodopsin X-ray
structures determined at different resolutions. In total, 35
structures were identified from two online membrane protein
resources (see Figure 1). The resolution of the 35 structures
ranged from 1.55 to 4.15 Å. A clear linear relationship
between the average number of interactions of the TM
domains of the structures and their resolution was observed
for Approach A, with an R-fitting value of 0.63. No such relationship
was observed for Approaches B to D (see Figure 1).
Since this dataset included structures of two membrane protein
families, rhodopsin GPCRs and bacterial rhodopsins, the overall
low correlation was understandable. Examination of the correlation
results for the 13 rhodopsin receptors in this
dataset indicated a much better correlation for all four
approaches, with Approach A having the best fitting value of
0.91 (Supporting Information Figure 1).
Analysis of the HOMEP Model Dataset (Dataset II)
In the above analyses, the resolution of the X-ray structures
was adopted as the quality measure of the structure. The cor-
relation between the average number of interactions and the
value of this measure seems to vary significantly. To confirm
the finding, the same four approaches were applied to the
analysis of 92 homology models of helical membrane pro-
teins in the HOMEP benchmark dataset.37 HOMEP is a care-
fully compiled dataset of homology models of membrane
proteins with a wide range of sequence identities. For all the
models included in this dataset, their X-ray structures are
available. The 92 models studied here represented 46 query-template
pairs. Judged by the TM-score of their Cα atoms,
the quality of these models varied considerably, with the
TM-score ranging from 0.33 to 0.99 (see Figure 2).
The relationship between the average number of inter-residue
interactions and the model’s TM-score again varied
among the four approaches. For Approaches A, B, and C, a roughly
linear relationship was observed, with Approach A
again having the best R-fitting value of 0.71 (Figures 3A–C).
For Approach D, no clear trend was detected (Figure 3D).
Analysis of the Second Homology Model Dataset
(Dataset III)
The above analyses clearly demonstrated Approach A dis-
plays the best correlation between the average number of
inter-residue interactions and the structure quality measures
among the four approaches. To further validate this conclu-
sion, a dataset of 139 homology models based on 10 mem-
brane protein structures was compiled. For a homology
model, its quality is primarily dependent on its sequence
identity to the template protein. The higher the sequence
identity, the better the model quality. With models in this
dataset displaying sequence identities of 20–100%, the
dataset included models of varying quality.
The relationship between the average number of inter-residue
interactions and the models’ sequence identity to their
templates displayed the same trend among the four approaches
as in the two preceding datasets.
For Approach A, a clear monotonically increasing relationship
was observed (Figure 4A). This is similar to the previous
study using the GPCR homology model dataset.7 For
Approaches B–D, no clear trend was detected: the average number
of interactions obtained with them seems insensitive to
changes in the sequence identity value (Figures 4B–D).
Possible Relationship Among the Four Approaches
We then explored the possible relationships between the four
approaches for determining inter-residue interactions using
the third dataset, in which the correlation between Approach
B and Approach A, C, and D was studied individually (see
Figure 5). The correlation between Approach B and A is
quite weak, with a linear-fit R-value of only 0.48. In
contrast, there is a strong linear relationship between
Approach B and Approaches C and D. The R-value was 0.93
between B and C, and 0.95 between B and D (see Figure 5).
An Example Illustrating the Relationships
of the Four Approaches
The studies above demonstrated that Approach A is
weakly correlated with Approaches B–D, while the latter three
correlate strongly with each other. To further examine
the relationships among these four approaches, the
results for one recently published high-resolution membrane
protein structure, the human β2 adrenergic GPCR (PDB ID:
2RH1),49 were analyzed in detail. The β2 adrenergic receptor
is a seven-helix membrane protein and belongs to the largest
membrane protein superfamily, the GPCRs.
The inter-residue interactions within the TM domain of
this protein were determined using the four Approaches A to
D. Since Approach A focuses exclusively on favorable interactions,
it is unsurprising that the entire set of interactions
derived using Approach A represented a subset of the interactions
derived using Approach B or C (see Figure 6). Most of
the interactions (64%) from Approach A were also included in
the set from Approach D. On the other hand, interactions
present in all three sets from Approaches B, C, and D
accounted for a significant percentage of those three individual
FIGURE 1 Linear correlation between the average number of inter-residue interactions of the TM
domains of 35 rhodopsin structures and their structure resolution for the four approaches A–D.
The PDB ID, studied chain and resolution of the rhodopsin structures are: 1, 1F88 (Chain A, 2.80
Å); 2, 1GZM (Chain A, 2.65 Å); 3, 1HZX (Chain A, 2.80 Å); 4, 1L9H (Chain A, 2.60 Å); 5, 1U19
(Chain A, 2.20 Å); 6, 2I35 (Chain A, 3.80 Å); 7, 2I36 (Chain A, 4.10 Å); 8, 2I37 (Chain A, 4.15 Å); 9,
2J4Y (Chain A, 3.40 Å); 10, 2Z73 (Chain A, 2.50 Å); 11, 2ZIY (Chain A, 3.70 Å); 12, 3CAP (Chain
A, 2.90 Å); 13, 3DQB (Chain A, 3.20 Å); 14, 1AT9 (Chain A, 3.00 Å); 15, 1C8R (Chain A, 1.80 Å);
16, 1C8S (Chain A, 2.00 Å); 17, 1E12 (Chain A, 1.80 Å); 18, 1H2S (Chain A, 1.93 Å); 19, 1H68
(Chain A, 2.10 Å); 20, 1JGJ (Chain A, 2.40 Å); 21, 1QKO (Chain A, 2.10 Å); 22, 1UAZ (Chain A,
3.40 Å); 23, 1VGO (Chain A, 2.50 Å); 24, 1XIO (Chain A, 2.00 Å); 25, 2EI4 (Chain A, 2.10 Å); 26,
2Z55 (Chain A, 2.50 Å); 27, 3DDL (Chain A, 1.90 Å); 28, 1AP9 (Chain A, 2.35 Å); 29, 1BRD (Chain
A, 3.50 Å); 30, 1BRR (Chain A, 2.90 Å); 31, 1BRX (Chain A, 2.30 Å); 32, 1C3W (Chain A, 1.55 Å);
33, 1KME (Chain A, 2.00 Å); 34, 1QHJ (Chain A, 1.90 Å); 35, 2BRD (Chain A, 3.50 Å).
sets: 63% for Approach B, 65% for Approach C, and 86% for
Approach D. This explains the strong correlation
among these three approaches.
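The subset and overlap comparisons described above reduce to simple set operations once each approach yields a set of residue pairs. The sketch below is illustrative only; the pair sets are invented and do not reproduce the 2RH1 results.

```python
def shared_fraction(sets):
    """Fraction of each set that is made up of pairs present in every set."""
    shared = set.intersection(*sets.values())
    return {name: len(shared) / len(pairs) for name, pairs in sets.items()}


# Invented residue-pair sets, for illustration only.
pairs_a = {(1, 5), (2, 8)}
pairs_b = {(1, 5), (2, 8), (3, 9), (4, 10)}
pairs_c = {(1, 5), (2, 8), (3, 9), (6, 12)}
pairs_d = {(1, 5), (3, 9)}

a_within_b = pairs_a <= pairs_b                           # is A a subset of B?
a_within_c = pairs_a <= pairs_c                           # is A a subset of C?
a_in_d = len(pairs_a & pairs_d) / len(pairs_a)            # share of A also found in D
fractions = shared_fraction({"B": pairs_b, "C": pairs_c, "D": pairs_d})
```

With real data, `a_in_d` would correspond to the 64% figure quoted above and `fractions` to the per-approach percentages for B, C, and D.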
DISCUSSION
Inter-residue interactions are essential to the folding of soluble
and membrane proteins, and analysis of these interactions
is of significance to the development of high-quality
tools for the structure prediction of TM proteins.1 The first
step in the analysis of inter-residue interactions is to define
such interactions using an approach based on knowledge of
protein folding and packing. Different approaches have been
proposed over the years,2–7 each grasping a different aspect
of protein structural folds. Thus it is of interest to systemati-
cally compare these approaches by applying them to the anal-
ysis of the same structures. In this work, four widely adopted
approaches for determining inter-residue interactions were
studied by applying them to three different structure datasets
of helical membrane proteins.
The four chosen approaches included one based on specific
inter-residue interactions detected (Approach A), two
employing a single distance cutoff between atoms of two residues
(Approaches B and D), and one based on the atomic van
der Waals radii of two residues (Approach C). These four
approaches were subsequently applied to three structure
datasets whose quality was measured differently. The first
dataset included 35 X-ray structures of rhodopsin receptors
and bacterial rhodopsins determined at various resolutions.
The quality of these structures was measured by their resolution.
The second dataset included 92 homology models of
helical membrane proteins from the HOMEP dataset.37
These models represent 46 query-template pairs, and their
crystal structures were also available, allowing for direct
comparison quantified by the TM-score. The third dataset
comprised 139 diverse models of membrane proteins. Within
this dataset, seven superfamilies had more than 10 models,
and none had more than 35 representative models.
This ensured that the results would not be biased
by the presence of models from a single superfamily. The
sequence identity between these models and their respective
template proteins ranges from 20 to 100%. Since no experi-
mental structures were reported for these models, the quality
of these models was measured by their sequence identity to
the individual templates.50
For all three datasets, a similar trend in the correlation
between the average number of inter-residue interactions
derived and the individual quality measures adopted was
observed, in which the degree of correlation decreased in
the order: Approach A > B ≈ C > D. To some extent, this
observation is understandable. Approach D defines an interaction
based only on Cα–Cα distances, whereas Approaches B and
C define an interaction based on the distance
between any two atoms from either side chains or backbones,
and Approach A uses criteria that include not only the
atom-atom distance but also the type of residues involved.
Since a good structure requires the optimized packing of
both backbone and side-chain atoms, the trend observed
here points to the importance of taking side-chain atoms
into consideration when developing approaches for determining
inter-residue interactions.
Perhaps the most striking finding in this work was the fact
that for all three datasets whose structure quality was meas-
ured differently, Approach A performed consistently best
among all four approaches. For the 13 structures of the rho-
dopsin receptors in the first dataset, the correlation between
the average number of interactions and the structure resolution
was excellent (R = 0.91) (Supporting Information
FIGURE 2 Cα TM-score of the homology models in the second
dataset vs. their X-ray structures. The PDB IDs of each query-template
pair are: 1, 1KQF-1L0V; 2, 1L0V-1KQF; 3, 1KQF-1QLA; 4,
1QLA-1KQF; 5, 1QLA-1L0V; 6, 1U19-1E12; 7, 1E12-1U19; 8,
1L0V-1QLA; 9, 1H68-1U19; 10, 1U19-1M0L; 11, 1U19-1H68; 12,
1M0L-1U19; 13, 1PV6-1PW4; 14, 1PW4-1PV6; 15, 1J4N-1FX8; 16,
1FX8-1J4N; 17, 1FX8-1RC2; 18, 1E12-1H68; 19, 1RC2-1FX8; 20,
1H68-1E12; 21, 1E12-1M0L; 22, 1J4N-1RC2; 23, 1NTM-1KB9; 24,
1M0L-1E12; 25, 1RC2-1J4N; 26, 1BCC-1KB9; 27, 1H68-1M0L; 28,
1KB9-1BCC; 29, 1KB9-1NTM; 30, 1M0L-1H68; 31, 1OGV-1PRC;
32, 1M56-1V54; 33, 1PRC-1OGV; 34, 1BCC-1NTM; 35, 1NTM-1BCC;
36, 1AR1-1V54; 37, 1EYS-1PRC; 38, 1PRC-1EYS; 39, 1OGV-1EYS;
40, 1V54-1AR1; 41, 1AR1-1M56; 42, 1V54-1M56; 43, 1M56-1AR1;
44, 1KPL-1OTS; 45, 1EYS-1OGV; 46, 1OTS-1KPL.
Figure 1). For the third dataset, it was the only approach that
showed the correlation to the percentage of sequence identity
value. It also performed best for the second dataset, even
given the fact that TM-score does not always correlate line-
arly with the structure quality. These differences reflect the
fundamental differences in the definition for determining an
inter-residue interaction among these four approaches.
Approach A focuses only on favorable inter-residue
interactions, while Approaches B–D,
although with subtle differences (see Figure 5), make no
such distinction in determining an interaction. As long
as two residues are within a specified distance, an interaction
is assumed regardless of whether favorable or unfavorable
interactions exist between the two residues.
When a protein folds, numerous favorable and unfavorable
inter-residue interactions form within the structure.
Accordingly, the better a structure, the higher its average number of
favorable interactions should be. Therefore, it should not
come as a surprise that an approach capturing favorable
interactions demonstrates a good correlation with
structure quality. However, the folding of a structure also
results in the formation of many unfavorable interactions.
The results reported here seem to suggest that the latter do
not play a significant role in the folding of membrane protein
structures. Overall, the fact that an approach measuring
favorable interactions exclusively performed consistently better
than those measuring overall interactions suggests that the
future development of inter-residue interaction-based quality
measures should focus more on the favorable interactions.
Correlating the average number of inter-residue interactions
with structure quality measures can provide an absolute
measure of model quality.51 A linear relationship was
observed between the average number of inter-residue interactions
and three different quality measures for Approach A.
FIGURE 3 Linear correlation between the average number of inter-residue interactions of the TM
domains of homology models in the second dataset and their TM-score for the four approaches A–
D. The PDB IDs of each query-template pair are the same as in Figure 2.
FIGURE 4 Linear correlation between the average number of inter-residue interactions of the TM
domains of all models in the third dataset and their sequence identity to respective templates for the
four approaches A–D.
FIGURE 5 Linear correlation between Approach B and Approaches A, C, and D in terms of the average
number of interactions of the TM domains of all models in the third dataset.
By applying the same approach to protein structure models
in question, and subsequently mapping their value of the average
number of inter-residue interactions to the linear
graphs presented in this work, we can make sound judgments
regarding how different the model’s packing is from a native-like
fold. Given the fact that few membrane protein-oriented
quality measures are available, we envision this interaction-based
approach will have broad application.
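One way to read the mapping step is as an inversion of Eq. (1): given a fitted slope k and intercept b, a model's average number of interactions Y is converted into the quality value X at which the fitted line reaches Y. The sketch below assumes this reading; the slope and intercept are placeholders chosen for illustration and are not the fitted parameters reported in this work.

```python
def estimate_quality(avg_interactions, k, b):
    """Invert Y = k*X + b to get the quality value X implied by an observed Y."""
    if k == 0:
        raise ValueError("a zero slope carries no information about quality")
    return (avg_interactions - b) / k


# Placeholder fit parameters (hypothetical, not taken from the figures).
k_example = 0.01   # interactions per residue gained per 1% sequence identity
b_example = 1.5    # intercept
implied_identity = estimate_quality(2.1, k_example, b_example)
```

With these placeholder values, an average of 2.1 interactions maps to an implied sequence identity of about 60%. Such an estimate is only as reliable as the underlying fit, so the scatter around the fitted line should be kept in mind.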
A measure of the average number of inter-residue interac-
tions was employed for the correlation study reported here.
The fact that the best correlation was obtained by Approach
A should not be regarded as the major reason to devalue the
other three approaches. Those approaches may be better
suited for different types of analysis.2–7 On the other hand,
none of the four approaches used in this study has consis-
tently resulted in a perfect correlation with the quality mea-
sures illustrated in this work. This suggests the continuous
need to develop novel approaches that better capture the
essential features of folded TM protein structures.
CONCLUSION
In summary, three datasets of helical membrane protein
structures were compiled and analyzed using four different
approaches for determining inter-residue interactions within
a TM protein structure. Three conclusions were derived
based on the analysis. First, approaches that take side-chain
atoms into consideration give better correlation with the
quality of the structures; second, the approach focusing
exclusively on favorable interactions performs consistently
better than those focusing on the overall interactions; and
third, none of the four approaches examined has resulted in
a perfect correlation, suggesting the continuing need to develop
new approaches. Overall, this study provides new
insights for the development of structure prediction tools for
helical membrane proteins.
The authors thank Drs. Lucy R. Forrest and Barry Honig for sharing
with us the HOMEP dataset of membrane protein models. We
acknowledge the use of the MODBASE database
(http://modbase.compbio.ucsf.edu/modbase-cgi/search_form.cgi) in this work.
REFERENCES
1. Gromiha, M. M.; Selvaraj, S. Prog Biophys Mol Biol 2004, 86,
235–277.
2. Vendruscolo, M.; Dokholyan, N. V.; Paci, E.; Karplus, M.
Phys Rev E Stat Nonlin Soft Matter Phys 2002, 65, 061910-1–
061910-4.
3. Greene, L. H.; Higman, V. A. J Mol Biol 2003, 334, 781–791.
4. Amitai, G.; Shemesh, A.; Sitbon, E.; Shklar, M.; Netanely, D.;
Venger, I.; Pietrokovski, S. J Mol Biol 2004, 344, 1135–1146.
5. Brinda, K. V.; Vishveshwara, S. Biophys J 2005, 89, 4159–4170.
6. del Sol, A.; O’Meara, P. Proteins 2005, 58, 672–682.
7. Pabuwal, V.; Li, Z. Protein Eng Des Sel , 21, 55–64.
8. Ponnuswamy, P. K. Prog Biophys Mol Biol 1993, 59, 57–103.
9. del Sol, A.; Fujihashi, H.; Amoros, D.; Nussinov, R. Protein Sci
2006, 15, 2120–2128.
10. Muppirala, U. K.; Li, Z. Protein Eng Des Sel 2006, 19, 265–275.
11. Barth, P.; Schonbrun, J.; Baker, D. Proc Natl Acad Sci USA 2007,
104, 15682–15687.
12. Fleming, K. G. Curr Opin Biotechnol 2000, 11, 67–71.
13. Liu, Y.; Engelman, D. M.; Gerstein, M. Genome Biol 2002, 3,
research0054.
14. Wallin, E.; von Heijne, G. Protein Sci 1998, 7, 1029–1038.
15. Drews, J. Nat Biotechnol 1996, 14, 1516–1518.
16. Klabunde, T.; Hessler, G. Chembiochem 2002, 3, 928–944.
17. Karnik, S. S.; Gogonea, C.; Patil, S.; Saad, Y.; Takezako, T. Trends
Endocrinol Metab 2003, 14, 431–437.
18. White, S. H. Protein Sci 2004, 13, 1948–1949.
19. Fanelli, F.; De Benedetti, P. G. Chem Rev 2005, 105, 3297–3351.
20. Oliveira, L.; Hulsen, T.; Lutje Hulsik, D.; Paiva, A. C.; Vriend, G.
FEBS Lett 2004, 564, 269–273.
21. Visiers, I.; Ballesteros, J. A.; Weinstein, H. Methods Enzymol
2002, 343, 329–371.
22. Kontoyianni, M.; DeWeese, C.; Penzotti, J. E.; Lybrand, T. P.
J Med Chem 1996, 39, 4406–4420.
23. Adamian, L.; Liang, J. J Mol Biol 2001, 311, 891–907.
24. Chamberlain, A. K.; Bowie, J. U. Biophys J 2004, 87, 3460–3469.
25. Eilers, M.; Patel, A. B.; Liu, W.; Smith, S. O. Biophys J 2002, 82,
2720–2736.
26. Eilers, M.; Shekar, S. C.; Shieh, T.; Smith, S. O.; Fleming, P. J.
Proc Natl Acad Sci USA 2000, 97, 5796–5801.
27. Bowie, J. U. Nature 2005, 438, 581–589.
28. Curran, A. R.; Engelman, D. M. Curr Opin Struct Biol 2003, 13,
412–417.
FIGURE 6 Venn diagram showing the relationships of sets
of inter-residue interactions determined using the four approaches
A–D.
29. Liang, J. Curr Opin Chem Biol 2002, 6, 878–884.
30. McAllister, S. R.; Floudas, C. A. Biophys J 2008, 95, 5281–5295.
31. Fleishman, S. J.; Ben-Tal, N. J Mol Biol 2002, 321, 363–378.
32. Wendel, C.; Gohlke, H. Proteins 2008, 70, 984–999.
33. Gromiha, M. M.; Selvaraj, S. Int J Biol Macromol 2001, 29, 25–34.
34. Gromiha, M. M.; Selvaraj, S. J Mol Biol 2001, 310, 27–32.
35. Kumarevel, T. S.; Gromiha, M. M.; Ponnuswamy, M. N. Biophys
Chem 1998, 75, 105–113.
36. Gao, J.; Li, Z. Biopolymers 2008, 89, 1174–1178.
37. Forrest, L. R.; Tang, C. L.; Honig, B. Biophys J 2006, 91, 508–
517.
38. Tusnady, G. E.; Dosztanyi, Z.; Simon, I. Nucleic Acids Res 2005,
33 (Database issue), D275–D278.
39. Sali, A.; Blundell, T. L. J Mol Biol 1993, 234, 779–815.
40. Berman, H. M.; Westbrook, J.; Feng, Z.; Gilliland, G.; Bhat, T.
N.; Weissig, H.; Shindyalov, I. N.; Bourne, P. E. Nucleic Acids
Res 2000, 28, 235–242.
41. Zhang, Y.; Skolnick, J. Nucleic Acids Res 2005, 33, 2302–2309.
42. Pearl, F. M.; Bennett, C. F.; Bray, J. E.; Harrison, A. P.; Martin,
N.; Shepherd, A.; Sillitoe, I.; Thornton, J.; Orengo, C. A. Nucleic
Acids Res 2003, 31, 452–455.
43. Boeckmann, B.; Bairoch, A.; Apweiler, R.; Blatter, M. C.;
Estreicher, A.; Gasteiger, E.; Martin, M. J.; Michoud, K.;
O’Donovan, C.; Phan, I.; Pilbout, S.; Schneider, M. Nucleic
Acids Res 2003, 31, 365–370.
44. Altschul, S. F.; Madden, T. L.; Schaffer, A. A.; Zhang, J.; Zhang, Z.;
Miller, W.; Lipman, D. J. Nucleic Acids Res 1997, 25, 3389–3402.
45. Pieper, U.; Eswar, N.; Braberg, H.; Madhusudhan, M. S.; Davis, F.
P.; Stuart, A. C.; Mirkovic, N.; Rossi, A.; Marti-Renom, M. A.;
Fiser, A.; Webb, B.; Greenblatt, D.; Huang, C. C.; Ferrin, T. E.;
Sali, A. Nucleic Acids Res 2004, 32 (Database issue), D217–D222.
46. Livingstone, C. D.; Barton, G. J. Comput Appl Biosci 1993, 9,
745–756.
47. Stickle, D. F.; Presta, L. G.; Dill, K. A.; Rose, G. D. J Mol Biol
1992, 226, 1143–1159.
48. Sobolev, V.; Sorokine, A.; Prilusky, J.; Abola, E. E.; Edelman, M.
Bioinformatics 1999, 15, 327–332.
49. Cherezov, V.; Rosenbaum, D. M.; Hanson, M. A.; Rasmussen, S.
G.; Thian, F. S.; Kobilka, T. S.; Choi, H. J.; Kuhn, P.; Weis, W. I.;
Kobilka, B. K.; Stevens, R. C. Science 2007, 318, 1258–1265.
50. Sanchez, R.; Sali, A. Curr Opin Struct Biol 1997, 7, 206–214.
51. Eramian, D.; Eswar, N.; Shen, M. Y.; Sali, A. Protein Sci 2008,
17, 1881–1893.
Reviewing Editor: David Case