

Comparing Four Different Approaches for the Determination of Inter-Residue Interactions Provides Insight for the Structure Prediction of Helical Membrane Proteins

Jun Gao,1,2 Zhijun Li1,3

1 Department of Bioinformatics and Computer Science, University of the Sciences in Philadelphia, Philadelphia, PA 19104

2 Institute of Theoretical Chemistry, Shandong University, Jinan 250100, People’s Republic of China

3 Institute for Translational Medicine and Therapeutics, University of Pennsylvania, Philadelphia, PA 19104

Received 13 November 2008; revised 13 February 2009; accepted 14 February 2009

Published online 24 February 2009 in Wiley InterScience (www.interscience.wiley.com). DOI 10.1002/bip.21175

This article was originally published online as an accepted preprint. The "Published Online" date corresponds to the preprint version. You can request a copy of the preprint by emailing the Biopolymers editorial office at biopolymers@wiley.com

Additional Supporting Information may be found in the online version of this article.

Correspondence to: Zhijun Li; e-mail: [email protected]

Contract grant sponsor: Research Starter Grant in Informatics (PhRMA Foundation)

Biopolymers Volume 91 / Number 7 547

ABSTRACT:

Studying inter-residue interactions provides insight into the folding and stability of both soluble and membrane proteins and is essential for developing computational tools for protein structure prediction. As the first step, various approaches for elucidating such interactions within protein structures have been proposed and proven useful. Since different approaches may capture different aspects of protein structural folds, it is of interest to systematically compare them. In this work, we applied four approaches for determining inter-residue interactions to the analysis of three distinct structure datasets of helical membrane proteins and compared their correlation to the three individual quality measures of structures in these datasets. These datasets included one of 35 structures of rhodopsin receptors and bacterial rhodopsins determined at various resolutions, one derived from the previously reported HOMEP benchmark dataset, and one comprising 139 homology models. It was found that the correlation between the average number of inter-residue interactions obtained by applying the four approaches and the available structure quality measures varied quite significantly among them. The best correlation was achieved by the approach focusing exclusively on favorable inter-residue interactions. These results provide interesting insight for the development of objective quality measures for the structure prediction of helical membrane proteins. © 2009 Wiley Periodicals, Inc. Biopolymers 91: 547–556, 2009.

Keywords: inter-residue interaction; membrane protein structure; interaction determination; structure quality

INTRODUCTION

Inter-residue interactions play a crucial role in driving protein folding, and studying these interactions facilitates the development of computational tools for the structure prediction of both soluble and membrane proteins [1]. Elucidating inter-residue interactions within a protein's three-dimensional structure represents the first step for further analysis. Computational analyses of inter-residue interactions often define them in a simplified, but proven useful, manner [2–7]. For example, an interaction between two residues is regarded to exist if the distance between any two atoms from the two residues is less than 5 Å [8]. Various approaches have been proposed as the basis for the determination of inter-residue interactions and applied to the analysis of protein structures. These approaches can be approximately classified into three categories: approaches employing a single distance cutoff between atoms of two residues [3,5,9], approaches based on van der Waals radii of atoms [4], and approaches detecting specific inter-residue interactions, e.g. H-bonds [10]. Since different approaches may capture different aspects of protein structural folds, it is of interest to systematically compare these approaches in order to gain new ideas for the development of structure prediction tools.

The helical transmembrane (TM) proteins are excellent systems for comparison studies of different approaches for the determination of inter-residue interactions. First, structure prediction remains a major challenge in the field of TM proteins [11], and high-quality, fast prediction tools are urgently needed. TM proteins mediate a variety of fundamental cellular activities and are estimated to account for ~20–30% of the human genome [12–14]. TM proteins also serve as important drug targets, with the G-protein coupled receptor (GPCR) superfamily alone being the target of 30–50% of drugs on the market [15,16]. However, TM protein structure determination remains a challenge in general [17], and membrane proteins represent less than 1% of structures in the PDB database [18]. Computational modeling approaches for their structure prediction have played a significant role in structural and functional studies of membrane proteins [19–21], as well as in structure-based drug design efforts [21,22]. Distinct differences have been observed between membrane and soluble proteins in terms of amino acid propensity, packing density, and side-chain rotamer frequencies [23–26]. Developing computational prediction tools based on studies of inter-residue interactions in membrane proteins is thus of particular interest.

Second, the packing of the TM domains of helical membrane proteins is relatively homogeneous and simple to characterize [27]. The average number of inter-residue interactions derived using a specific approach falls into a narrow range for high-resolution X-ray structures [7]. This number decreases for low-resolution X-ray structures. Similarly, for a set of 96 GPCR homology models, a good linear relationship was reported between the average number of interactions derived and the models' sequence identity to the rhodopsin receptor template [7]. These observations suggest that a simple measure such as the average number of inter-residue interactions correlates directly with the quality of the structure.

Studies of inter-residue interactions in helical membrane proteins have revealed interesting findings and facilitated the development of computational approaches for membrane protein structure prediction. Frequent packing motifs and highly probable inter-residue interactions have been identified for helical membrane proteins [28,29]. Such knowledge has subsequently been adopted for membrane protein structure prediction with exciting outcomes [30–32]. In other studies, the distribution of short-, medium-, and long-range interactions in membrane proteins was found to differ from that in soluble proteins [33–35]. Although the number of inter-residue interactions at various sequence separation cutoffs follows power-law behavior for both helical soluble and membrane proteins, the fitting parameters describing this behavior vary between them [36].

In this work, we seek to compare four approaches for determining inter-residue interactions in terms of their correlation with different quality measures of helical TM protein structures. These four approaches represent all three categories of approaches for the determination of inter-residue interactions. For this comparison study, we compiled three structure datasets of helical TM proteins. The quality of structures in each dataset was measured from a different perspective. The first dataset included 35 crystal structures of rhodopsin receptors and bacterial rhodopsins whose quality was represented by their resolutions. The second dataset was derived from the previously reported HOMEP benchmark dataset of membrane protein models [37]. The X-ray structures of these models are already available, so their quality was measured by direct structure comparison. The third dataset included 139 TM protein homology models. These models were built based on 10 membrane protein structures, each representing a unique superfamily. The quality of these models was indicated by their sequence identity to individual templates. The analyses showed that the correlation between the average number of inter-residue interactions and the structure quality measures employed varied significantly among the four approaches. As different approaches capture different aspects of protein structural folds, these results provide new ideas for developing objective quality measures for the structure prediction of TM proteins.

MATERIALS AND METHODS

Rhodopsin Structure Dataset (Dataset I)

A total of 35 structures of rhodopsin receptors and bacterial rhodopsins were identified from the online membrane protein resources (http://blanco.biomol.uci.edu/Membrane_Proteins_xtal.html, version January 19, 2009 and http://www.mpibp-frankfurt.mpg.de/michel/public/memprotstruct.html, version March 30, 2006). Rhodopsin receptors belong to the largest superfamily of TM proteins, the GPCR superfamily. Both rhodopsin receptors and bacterial rhodopsins bind retinal and are well-studied membrane protein families. The resolution of the 35 structures ranges from 1.55 to 4.15 Å. All 35 structures were determined by X-ray or electron diffraction crystallography.

For each structure, its TM helical boundaries were identified based on the definition in the PDBTM database [38]. The loops of the soluble regions were manually removed to keep only alpha helices that lie within the TM regions.

HOMEP Benchmark Dataset (Dataset II)

A total of 188 homology models of membrane proteins from the HOMEP benchmark dataset [37] were obtained from Dr. Barry Honig's lab. The HOMEP dataset is a carefully compiled set of homology models of 94 membrane protein query-template pairs of known structures and covers a wide range of sequence identities, from <10 to 80%. Among the 188 models obtained, 92 are helical membrane proteins, representing 46 query-template pairs. For each pair, a homology model based on their sequence-to-sequence alignment and a model based on their structure alignment were constructed for the query protein using Modeller 6v2 [39]. These 92 models of helical membrane proteins were studied here.

For each HOMEP model studied, its TM helical boundaries were identified based on the definition for its corresponding X-ray structure in the PDBTM database [38]. The corresponding X-ray structure was downloaded from the PDB [40]. The loops of the soluble regions were again manually removed to keep only alpha helices that lie within the TM regions. Structure comparison between a model and its X-ray structure was measured by TM-score using the TM-align server (http://zhang.bioinformatics.ku.edu/TM-align/) [41].

Homology Model Dataset of Helical Membrane Proteins (Dataset III)

A homology model dataset of 139 nonredundant helical membrane proteins was compiled for this study (Supporting Information Table I). To prepare this dataset, 17 proteins containing a single TM domain as defined in the CATH database [42] were selected from the high-resolution structure dataset compiled in the previous study [7] and used as query sequences to search the Swiss-Prot database [43]. Each query sequence represents a unique membrane protein superfamily based on the classification in the CATH database [42]. The similarity searches were carried out by BLAST with default settings [44].

All the BLAST hit sequences were then searched against the protein model database MODBASE [45]. The hits whose entry information in MODBASE satisfied the following criteria were included in the model dataset: (i) its model was available; (ii) the model was constructed using homology modeling techniques based on the structure of the query protein; (iii) its sequence identity to the query protein over the entire sequence, as reported in the BLAST search, was within 20–100%; (iv) its MODBASE score was >0.70, to winnow out models with obvious errors. For a query protein with fewer than three models satisfying criteria (i)–(iv), those models were not included.
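Criteria (i)–(iv) and the minimum of three models per query protein can be read as a simple filter. The sketch below shows one way to express that logic in Python. The `ModelEntry` record and its field names are assumptions made for illustration and do not mirror the actual MODBASE entry format.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class ModelEntry:
    query_id: str         # query protein whose BLAST search produced this hit
    has_model: bool       # criterion (i): a model is available
    built_on_query: bool  # criterion (ii): model built on the query structure
    identity_pct: float   # criterion (iii): whole-sequence identity from BLAST
    model_score: float    # criterion (iv): MODBASE score


def select_models(entries: List[ModelEntry],
                  min_per_query: int = 3) -> Dict[str, List[ModelEntry]]:
    """Apply criteria (i)-(iv), then drop queries with too few surviving models."""
    kept: Dict[str, List[ModelEntry]] = defaultdict(list)
    for e in entries:
        if (e.has_model and e.built_on_query
                and 20.0 <= e.identity_pct <= 100.0
                and e.model_score > 0.70):
            kept[e.query_id].append(e)
    return {q: ms for q, ms in kept.items() if len(ms) >= min_per_query}


# Made-up entries: Q1 keeps three models; Q2 has only one passing model.
entries = [ModelEntry("Q1", True, True, 45.0, 0.90) for _ in range(3)]
entries += [ModelEntry("Q2", True, True, 45.0, 0.90),
            ModelEntry("Q2", True, True, 15.0, 0.95),   # fails (iii)
            ModelEntry("Q2", True, True, 60.0, 0.50)]   # fails (iv)
selected = select_models(entries)
print(sorted(selected))
```

Applying the per-query minimum after the per-model criteria matches the order in which the text states them.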

Members of the model dataset were subsequently divided into eight subsets based on their sequence identity to their template proteins. The ranges were 90–100%, 80–89%, 70–79%, 60–69%, 50–59%, 40–49%, 30–39%, and 20–29%. If a subset had more than four models built on the same template, four were selected randomly. For each model studied, its TM helical boundaries were identified in MOE (Chemical Computing Group, version 2006.08). The loops of the soluble regions were manually removed to keep only alpha helices that lie within the TM regions.
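The binning and random down-sampling described above might look like the following Python sketch. How fractional identities such as 89.5% were assigned to a range is not stated in the text; rounding down to the decade is an assumption here, as are the function names and the fixed random seed.

```python
import random
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

Bin = Tuple[int, int]


def identity_bin(identity_pct: float) -> Optional[Bin]:
    """Map a percent identity to one of the eight ranges, or None if outside 20-100."""
    if not 20.0 <= identity_pct <= 100.0:
        return None
    low = min(int(identity_pct // 10) * 10, 90)  # 100% joins the 90-100 range
    high = 100 if low == 90 else low + 9
    return (low, high)


def subsample(models: List[Tuple[str, str, float]],
              max_per_template: int = 4,
              seed: int = 0) -> Dict[Bin, List[str]]:
    """models holds (model_id, template_id, identity_pct) triples.

    Within each range, at most `max_per_template` models per template are kept,
    chosen at random when there are more.
    """
    rng = random.Random(seed)
    groups: Dict[Tuple[Bin, str], List[str]] = defaultdict(list)
    for model_id, template_id, identity in models:
        b = identity_bin(identity)
        if b is not None:
            groups[(b, template_id)].append(model_id)
    subsets: Dict[Bin, List[str]] = defaultdict(list)
    for (b, _template), ids in groups.items():
        chosen = ids if len(ids) <= max_per_template else rng.sample(ids, max_per_template)
        subsets[b].extend(chosen)
    return dict(subsets)


# Made-up input: six models on template T1 at 95%, one model on T2 at 33%.
models = [(f"m{i}", "T1", 95.0) for i in range(6)] + [("x", "T2", 33.0)]
result = subsample(models)
print({b: len(ids) for b, ids in result.items()})
```

Seeding the generator only makes the sketch reproducible; the text does not say how the random selection was performed.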

The original percentage of sequence identity between a model sequence and its template sequence was reported by the BLAST search. Since this study focused on the TM domain of those model structures, their TM domain sequences derived above were realigned with the TM domain sequences of their respective template proteins, identified based on the PDBTM database (http://pdbtm.enzim.hu/), using the AMPS package [46]. The subset classification of these models was adjusted accordingly. In total, this dataset represents 10 unique membrane protein superfamilies classified in the CATH database. Among them, seven template proteins have more than 10 homology models present.

Derivation of Inter-Residue Interactions

An inter-residue interaction between two residues within a protein structure was defined by one of four approaches. These four approaches are described below.

Approach A. An edge was defined between two residues if one of four inter-residue interactions was detected: hydrophobic interaction, hydrogen bond, ionic bond, or disulfide bond. Hydrogen bonds had a distance cutoff between the electronegative heavy atoms of 3.1 or 3.2 Å and a geometry criterion that the angle from donor to acceptor should be between 120° and 180° [47]; hydrophobic interactions and ionic bonds were based only on proximity, and the default cutoff was 4.5 Å; and disulfide bonds were defined to exist between explicitly bonded sulfur pairs or nonbonded sulfur pairs within the distance cutoff of 2.5 Å. These four interactions were determined using the Protein Contacts function in MOE with default settings as reported previously [10], which did not include hydrogen bonds formed between two backbone atoms.

Approach B. An edge was defined between two residues if the distance between any two atoms from the two residues was within 5 Å [3,6]. Such distances were determined using the CCP4 package (Collaborative Computational Project, Number 4, 1994).

Approach C. An edge was defined between two residues if the theoretical solvent accessible surfaces of the two residues contacted each other [4]. Such contacts were determined using the CSU program [48].

Approach D. An edge was defined between two residues if the distance between their Cα atoms was within 8 Å [2]. Such distances were determined using the CCP4 package.

For all four approaches, inter-residue interactions between residues closer than four positions along the sequence were not included in the calculation, in an effort to focus on the long-range inter-residue interactions.
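The two distance-cutoff definitions (Approaches B and D) together with the sequence-separation filter can be illustrated with a short Python sketch. This is not the CCP4 workflow used in the study; it is a minimal stand-in that assumes a single chain with consecutive residue numbering, treats the cutoffs as inclusive, and uses made-up toy coordinates. The names `contacts`, `min_atom_distance`, and `ca_distance` are hypothetical.

```python
import math
from itertools import combinations
from typing import Callable, Dict, List, Set, Tuple

Atom = Tuple[str, float, float, float]  # (atom name, x, y, z)
Residue = List[Atom]


def min_atom_distance(res_a: Residue, res_b: Residue) -> float:
    """Shortest distance between any atom of res_a and any atom of res_b."""
    return min(math.dist(a[1:], b[1:]) for a in res_a for b in res_b)


def ca_distance(res_a: Residue, res_b: Residue) -> float:
    """Distance between the C-alpha atoms of the two residues."""
    ca_a = next(a for a in res_a if a[0] == "CA")
    ca_b = next(b for b in res_b if b[0] == "CA")
    return math.dist(ca_a[1:], ca_b[1:])


def contacts(residues: Dict[int, Residue],
             distance_fn: Callable[[Residue, Residue], float],
             cutoff: float,
             min_separation: int = 4) -> Set[Tuple[int, int]]:
    """Edges (i, j) with j - i >= min_separation and distance within the cutoff."""
    edges = set()
    for i, j in combinations(sorted(residues), 2):
        if j - i < min_separation:
            continue  # skip residues closer than four positions in sequence
        if distance_fn(residues[i], residues[j]) <= cutoff:
            edges.add((i, j))
    return edges


# Toy coordinates, chosen only so the two definitions give different answers.
residues = {
    1: [("CA", 0.0, 0.0, 0.0), ("CB", 1.0, 0.0, 0.0)],
    2: [("CA", 3.8, 0.0, 0.0)],
    6: [("CA", 9.0, 0.0, 0.0), ("CB", 5.0, 0.0, 0.0)],
    10: [("CA", 20.0, 0.0, 0.0)],
}
approach_b = contacts(residues, min_atom_distance, 5.0)  # any-atom, 5 Å
approach_d = contacts(residues, ca_distance, 8.0)        # Cα–Cα, 8 Å
print(sorted(approach_b), sorted(approach_d))
```

In the toy data, residues 1 and 6 touch through side-chain atoms but their Cα atoms are 9 Å apart, so the pair is an edge under the any-atom definition and not under the Cα definition.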

Computational Analysis

For each of the four approaches for determining inter-residue interactions, the relationship between the average number of interactions within a structure and the structure quality measure was fitted to the linear function in Eq. (1):

$$Y = kX + b \qquad (1)$$

where Y represents the average number of inter-residue interactions detected within a structure, and X represents the value of the structure quality measure. For a structure in dataset I, X refers to its resolution; for a structure in dataset II, X refers to its TM-score; and for a structure in dataset III, X refers to the percentage of sequence identity of that structure model to its template.
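A least-squares fit of Eq. (1) can be sketched in a few lines of Python. The text does not state which fitting software was used, nor whether the reported R value is Pearson's r, its magnitude, or R²; the sketch below returns Pearson's r as an assumption. The function name `fit_line` and the data points are made up.

```python
import math
from typing import Sequence, Tuple


def fit_line(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float, float]:
    """Ordinary least-squares fit of y = k*x + b.

    Returns (k, b, r), where r is Pearson's correlation coefficient.
    """
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    if sxx == 0 or syy == 0:
        raise ValueError("x and y must each vary")
    k = sxy / sxx
    b = mean_y - k * mean_x
    r = sxy / math.sqrt(sxx * syy)
    return k, b, r


# Made-up values: X as resolution in Å, Y as average interactions per structure.
resolution = [1.5, 2.0, 2.5, 3.0]
avg_interactions = [4.0, 3.5, 3.0, 2.5]
k, b, r = fit_line(resolution, avg_interactions)
print(round(k, 3), round(b, 3), round(r, 3))
```

With resolution as X, a better structure has a smaller X, so a negative slope k corresponds to more interactions in higher-quality structures; for TM-score and sequence identity the expected slope is positive.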

RESULTS

To compare the four different approaches for determining inter-residue interactions within a structure, the computational procedure included several steps: (i) compile three structure datasets of helical membrane proteins; (ii) determine the inter-residue interactions within structures in these datasets using the four approaches; (iii) obtain the linear relationship between the average number of inter-residue interactions derived by these approaches and the structure quality measures; and (iv) compare and analyze the results for the four approaches.

Analysis of the Rhodopsin Structure Dataset (Dataset I)

The first dataset we analyzed included a set of rhodopsin X-ray structures determined at different resolutions. In total, 35 structures were identified from two online membrane protein resources (see Figure 1). The resolution of the 35 structures ranged from 1.55 to 4.15 Å. A clear linear relationship between the average number of interactions of the TM domains of the structures and their resolution was observed for Approach A, with an R-fitting value of 0.63. No such relationship was observed for Approaches B to D (see Figure 1). Since this dataset included structures of two membrane protein families, rhodopsin GPCRs and bacterial rhodopsins, the overall low correlation was understandable. Examination of the correlation results for the 13 rhodopsin receptors in this dataset indicated a much better correlation for all four approaches, with Approach A having the best fitting value of 0.91 (Supporting Information Figure 1).

Analysis of the HOMEP Model Dataset (Dataset II)

In the above analyses, the resolution of the X-ray structures was adopted as the quality measure of the structure. The correlation between the average number of interactions and the value of this measure appears to vary significantly among the approaches. To confirm this finding, the same four approaches were applied to the analysis of 92 homology models of helical membrane proteins in the HOMEP benchmark dataset [37]. HOMEP is a carefully compiled dataset of homology models of membrane proteins with a wide range of sequence identities. For all the models included in this dataset, X-ray structures are available. The 92 models studied here represented 46 query-template pairs. Judged by the TM-score of their Cα atoms, the quality of these models varied quite significantly, with the TM-score ranging from 0.33 to 0.99 (see Figure 2).

The relationship between the average number of inter-residue interactions and the model's TM-score again varied among the four approaches. For Approaches A, B, and C, a roughly linear relationship was observed, with Approach A again having the best R-fitting value of 0.71 (Figures 3A–C). For Approach D, in contrast, no clear trend was detected (Figure 3D).

Analysis of the Second Homology Model Dataset (Dataset III)

The above analyses clearly demonstrated that, among the four approaches, Approach A displays the best correlation between the average number of inter-residue interactions and the structure quality measures. To further validate this conclusion, a dataset of 139 homology models based on 10 membrane protein structures was compiled. The quality of a homology model depends primarily on its sequence identity to the template protein: the higher the sequence identity, the better the model quality. With sequence identities of 20–100%, this dataset included models of varying quality.

The relationship between the average number of inter-residue interactions and the models' sequence identity to their templates displayed the same trend across the four approaches. For Approach A, a clear monotonically increasing relationship was observed (Figure 4A). This is similar to the previous study using the GPCR homology model dataset [7]. For Approaches B–D, in contrast, no clear trend was detected. The average number of interactions obtained with them appears insensitive to changes in the sequence identity value (Figures 4B–D).

Possible Relationship Among the Four Approaches

We then explored the possible relationships among the four approaches for determining inter-residue interactions using the third dataset, in which the correlation between Approach B and each of Approaches A, C, and D was studied individually (see Figure 5). The correlation between Approaches B and A was quite weak: the linear-fitting R-value was only 0.48. In contrast, there was a strong linear relationship between Approach B and Approaches C and D. The R-value was 0.93 between B and C, and 0.95 between B and D (see Figure 5).

An Example Illustrating the Relationships of the Four Approaches

The studies above demonstrated that Approach A is weakly correlated with Approaches B–D, while the latter three correlate strongly with each other. To further examine the relationships among these four approaches, the results for one recently published high-resolution membrane protein structure, the human β2 adrenergic GPCR (PDB ID: 2RH1) [49], were analyzed in detail. The β2 adrenergic receptor is a seven-helix membrane protein and belongs to the largest membrane protein superfamily, the GPCRs.

FIGURE 1 Linear correlation between the average number of inter-residue interactions of the TM domains of 35 rhodopsin structures and their structure resolution for the four approaches A–D. The PDB ID, studied chain and resolution of the rhodopsin structures are: 1, 1F88 (Chain A, 2.80 Å); 2, 1GZM (Chain A, 2.65 Å); 3, 1HZX (Chain A, 2.80 Å); 4, 1L9H (Chain A, 2.60 Å); 5, 1U19 (Chain A, 2.20 Å); 6, 2I35 (Chain A, 3.80 Å); 7, 2I36 (Chain A, 4.10 Å); 8, 2I37 (Chain A, 4.15 Å); 9, 2J4Y (Chain A, 3.40 Å); 10, 2Z73 (Chain A, 2.50 Å); 11, 2ZIY (Chain A, 3.70 Å); 12, 3CAP (Chain A, 2.90 Å); 13, 3DQB (Chain A, 3.20 Å); 14, 1AT9 (Chain A, 3.00 Å); 15, 1C8R (Chain A, 1.80 Å); 16, 1C8S (Chain A, 2.00 Å); 17, 1E12 (Chain A, 1.80 Å); 18, 1H2S (Chain A, 1.93 Å); 19, 1H68 (Chain A, 2.10 Å); 20, 1JGJ (Chain A, 2.40 Å); 21, 1QKO (Chain A, 2.10 Å); 22, 1UAZ (Chain A, 3.40 Å); 23, 1VGO (Chain A, 2.50 Å); 24, 1XIO (Chain A, 2.00 Å); 25, 2EI4 (Chain A, 2.10 Å); 26, 2Z55 (Chain A, 2.50 Å); 27, 3DDL (Chain A, 1.90 Å); 28, 1AP9 (Chain A, 2.35 Å); 29, 1BRD (Chain A, 3.50 Å); 30, 1BRR (Chain A, 2.90 Å); 31, 1BRX (Chain A, 2.30 Å); 32, 1C3W (Chain A, 1.55 Å); 33, 1KME (Chain A, 2.00 Å); 34, 1QHJ (Chain A, 1.90 Å); 35, 2BRD (Chain A, 3.50 Å).

The inter-residue interactions within the TM domain of this protein were determined using Approaches A to D. Since Approach A focuses exclusively on favorable interactions, it is unsurprising that the entire set of interactions derived using Approach A was a subset of the interactions derived using Approach B or C (see Figure 6). Most of the interactions (64%) from Approach A were also included in the set from Approach D. On the other hand, interactions present in all three sets from Approaches B, C, and D accounted for a significant percentage of each of those sets: 63% for Approach B, 65% for Approach C, and 86% for Approach D. This explains the strong correlation among these three approaches.
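The subset and overlap statements above amount to set operations on edge lists. As a hedged illustration, the Python sketch below computes the same kinds of quantities on made-up edge sets; the helper name `fraction_in` and all edges are hypothetical, and the numbers it prints are not those of the 2RH1 analysis.

```python
from typing import Set, Tuple

Edge = Tuple[int, int]  # (residue i, residue j) with i < j


def fraction_in(edges: Set[Edge], reference: Set[Edge]) -> float:
    """Fraction of `edges` that also occur in `reference`."""
    if not edges:
        raise ValueError("edges must not be empty")
    return len(edges & reference) / len(edges)


# Made-up edge sets standing in for the output of Approaches A-D.
a = {(1, 6), (2, 9)}
b = {(1, 6), (2, 9), (3, 8), (4, 12)}
c = {(1, 6), (2, 9), (3, 8), (5, 11)}
d = {(1, 6), (3, 8)}

print(a <= b and a <= c)            # is A a subset of both B and C?
print(fraction_in(a, d))            # share of A's edges also found by D
common_bcd = b & c & d              # edges found by all of B, C, and D
print(fraction_in(b, common_bcd),
      fraction_in(c, common_bcd),
      fraction_in(d, common_bcd))
```

Storing each edge with the smaller residue number first keeps (i, j) and (j, i) from being counted as different interactions.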

DISCUSSION

Inter-residue interactions are essential to the folding of soluble and membrane proteins, and analysis of these interactions is important for the development of high-quality tools for the structure prediction of TM proteins [1]. The first step in the analysis of inter-residue interactions is to define such interactions using an approach based on knowledge of protein folding and packing. Different approaches have been proposed over the years [2–7], each capturing a different aspect of protein structural folds. It is therefore of interest to systematically compare these approaches by applying them to the analysis of the same structures. In this work, four widely adopted approaches for determining inter-residue interactions were studied by applying them to three different structure datasets of helical membrane proteins.

The four chosen approaches included one based on specific inter-residue interactions detected (Approach A), two employing a single distance cutoff between atoms of two residues (Approaches B and D), and one based on the van der Waals radii of the atoms of two residues (Approach C). These four approaches were subsequently applied to three structure datasets whose quality was measured differently. The first dataset included 35 X-ray structures of rhodopsin receptors and bacterial rhodopsins determined at various resolutions. The quality of these structures was measured by their resolution. The second dataset included 92 homology models of helical membrane proteins from the HOMEP dataset [37]. These models represent 46 query-template pairs, and their crystal structures were also available, allowing for direct comparison quantified by the TM-score. The third dataset comprised 139 diverse models of membrane proteins. Within this dataset, seven superfamilies had more than 10 models, and none of them had more than 35 representative models. This ensured that the results would not be biased by the presence of models from a single superfamily. The sequence identity between these models and their respective template proteins ranges from 20 to 100%. Since no experimental structures were reported for these models, their quality was measured by their sequence identity to the individual templates [50].

For all three datasets, a similar trend was observed in the correlation between the average number of inter-residue interactions derived and the individual quality measures adopted, in which the degree of correlation decreased in the order: Approach A > B ≈ C > D. To some extent, this observation is understandable. Approach D defines an interaction based only on Cα–Cα distances, while Approaches B and C define an interaction based, broadly, on the distance between any two atoms, whether from side chains or backbones, and Approach A uses criteria that include not only the atom-atom distance but also the type of residues involved. Since a good structure requires the optimized packing of both the backbone and side-chain atoms, the trend observed here indicates the importance of taking side-chain atoms into consideration when developing approaches for determining inter-residue interactions.

Perhaps the most striking finding in this work was the fact

that for all three datasets whose structure quality was meas-

ured differently, Approach A performed consistently best

among all four approaches. For the 13 structures of the rho-

dopsin receptors in the first dataset, the correlation between

the average number of interactions and the structure resolution was excellent (R = 0.91) (Supporting Information

FIGURE 2 Cα TM-score of the homology models in the second

dataset vs. their X-ray structures. The PDB IDs of each query-tem-

plate pair are: 1, 1KQF-1L0V; 2, 1L0V-1KQF; 3, 1KQF-1QLA; 4,

1QLA-1KQF; 5, 1QLA-1L0V; 6, 1U19-1E12; 7, 1E12-1U19; 8,

1L0V-1QLA; 9, 1H68-1U19; 10, 1U19-1M0L; 11, 1U19-1H68; 12,

1M0L-1U19; 13, 1PV6-1PW4; 14, 1PW4-1PV6; 15, 1J4N-1FX8; 16,

1FX8-1J4N; 17, 1FX8-1RC2; 18, 1E12-1H68; 19, 1RC2-1FX8; 20,

1H68-1E12; 21, 1E12-1M0L; 22, 1J4N-1RC2; 23, 1NTM-1KB9; 24,

1M0L-1E12; 25, 1RC2-1J4N; 26, 1BCC-1KB9; 27, 1H68-1M0L; 28,

1KB9-1BCC; 29, 1KB9-1NTM; 30, 1M0L-1H68; 31, 1OGV-1PRC;

32, 1M56-1V54; 33, 1PRC-1OGV; 34, 1BCC-1NTM; 35, 1NTM-

1BCC; 36, 1AR1-1V54; 37, 1EYS-1PRC; 38, 1PRC-1EYS; 39, 1OGV-

1EYS; 40, 1V54-1AR1; 41, 1AR1-1M56; 42, 1V54-1M56; 43, 1M56-

1AR1; 44, 1KPL-1OTS; 45, 1EYS-1OGV; 46, 1OTS-1KPL.


Figure 1). For the third dataset, it was the only approach that showed a correlation with the percentage of sequence identity. It also performed best for the second dataset, even though the TM-score does not always correlate linearly with structure quality. These differences reflect the fundamental differences among the four approaches in how an inter-residue interaction is defined. Approach A tends to focus only on those inter-residue contacts that form favorable interactions, while Approaches B–D, although with subtle differences (see Figure 5), make no such distinction in determining an interaction. As long as two residues are within a specified distance, an interaction is assumed regardless of whether favorable or unfavorable interactions exist between the two residues.

When a protein folds, numerous favorable and unfavorable inter-residue interactions form within the structure. Accordingly, the better a structure is, the higher the average number of favorable interactions should be. Therefore, it should not come as a surprise that an approach capturing favorable interactions demonstrates a good correlation with structure quality. However, the folding of a structure also results in the formation of many unfavorable interactions. The results reported here seem to suggest that the latter do not play a significant role in the folding of membrane protein structures. Overall, the fact that an approach measuring favorable interactions exclusively performed consistently better than those measuring overall interactions suggests that the future development of inter-residue interaction-based quality measures should focus more on the favorable interactions.

Correlating the average number of inter-residue interactions with structure quality measures can provide an absolute measure of model quality.51 A linear relationship was observed between the average number of inter-residue interactions and three different quality measures for Approach A.

FIGURE 3 Linear correlation between the average number of inter-residue interactions of the TM

domains of homology models in the second dataset and their TM-score for the four approaches A–

D. The PDB IDs of each query-template pair are the same as in Figure 2.


FIGURE 4 Linear correlation between the average number of inter-residue interactions of the TM

domains of all models in the third dataset and their sequence identity to respective templates for the

four approaches A–D.

FIGURE 5 Linear correlation between Approach B and Approaches A, C, and D in terms of the average number of interactions of the TM domains of all models in the third dataset.


By applying the same approach to the protein structure models in question, and subsequently mapping their average number of inter-residue interactions onto the linear graphs presented in this work, we can make sound judgments regarding how different a model's packing is from a native-like fold. Given the fact that few membrane protein-oriented quality measures are available, we envision that this interaction-based approach will have broad application.
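The mapping step described above can be sketched as an ordinary least-squares fit followed by a lookup on the fitted line. This is only an illustration: the calibration numbers below are invented and are not data from this work, and the function names are hypothetical. In practice the calibration pairs would be the average interaction counts and quality values (resolution, TM-score, or sequence identity) of a reference dataset.

```python
import math


def linear_fit(xs, ys):
    """Least-squares line y = slope * x + intercept, plus Pearson's R."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0 or syy == 0:
        raise ValueError("both variables must vary to fit a line")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / math.sqrt(sxx * syy)
    return slope, intercept, r


def estimate_quality(avg_interactions, slope, intercept):
    """Read the expected quality value off the fitted line."""
    return slope * avg_interactions + intercept


# Invented calibration set: average interactions per residue vs. a quality score.
calib_avg = [2.0, 2.5, 3.0, 3.5]
calib_quality = [0.3, 0.4, 0.5, 0.6]
slope, intercept, r = linear_fit(calib_avg, calib_quality)

# A new model with 2.75 interactions per residue is placed on the same line.
new_model_estimate = estimate_quality(2.75, slope, intercept)
```

How far such an estimate can be trusted depends on the strength of the correlation for the approach used to count interactions, which is why R is returned alongside the fitted line.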

A measure of the average number of inter-residue interac-

tions was employed for the correlation study reported here.

The fact that the best correlation was obtained by Approach A should not be regarded as a reason to devalue the other three approaches. Those approaches may be better suited for different types of analysis.2–7 On the other hand, none of the four approaches used in this study consistently resulted in a perfect correlation with the quality measures illustrated in this work. This suggests a continuing need to develop novel approaches that better capture the essential features of folded TM protein structures.

CONCLUSION

In summary, three datasets of helical membrane protein

structures were compiled and analyzed using four different

approaches for determining inter-residue interactions within

a TM protein structure. Three conclusions were derived

based on the analysis. First, approaches that take side-chain atoms into consideration give better correlation with the quality of the structures. Second, the approach focusing exclusively on favorable interactions performs consistently better than those focusing on the overall interactions. Third, none of the four approaches examined resulted in a perfect correlation, suggesting a continuing need to develop new approaches. Overall, this study provides new

insights for the development of structure prediction tools for

helical membrane proteins.

The authors thank Drs. Lucy R. Forrest and Barry Honig for sharing with us the HOMEP dataset of membrane protein models. We acknowledge the use of the MODBASE database (http://modbase.compbio.ucsf.edu/modbase-cgi/search_form.cgi) in this work.

REFERENCES

1. Gromiha, M. M.; Selvaraj, S. Prog Biophys Mol Biol 2004, 86,

235–277.

2. Vendruscolo, M.; Dokholyan, N. V.; Paci, E.; Karplus, M.

Phys Rev E Stat Nonlin Soft Matter Phys 2002, 65, 061910-1–

061910-4.

3. Greene, L. H.; Higman, V. A. J Mol Biol 2003, 334, 781–791.

4. Amitai, G.; Shemesh, A.; Sitbon, E.; Shklar, M.; Netanely, D.;

Venger, I.; Pietrokovski, S. J Mol Biol 2004, 344, 1135–1146.

5. Brinda, K. V.; Vishveshwara, S. Biophys J 2005, 89, 4159–4170.

6. del Sol, A.; O’Meara, P. Proteins 2005, 58, 672–682.

7. Pabuwal, V.; Li, Z. Protein Eng Des Sel 2008, 21, 55–64.

8. Ponnuswamy, P. K. Prog Biophys Mol Biol 1993, 59, 57–103.

9. del Sol, A.; Fujihashi, H.; Amoros, D.; Nussinov, R. Protein Sci

2006, 15, 2120–2128.

10. Muppirala, U. K.; Li, Z. Protein Eng Des Sel 2006, 19, 265–275.

11. Barth, P.; Schonbrun, J.; Baker, D. Proc Natl Acad Sci USA 2007,

104, 15682–15687.

12. Fleming, K. G. Curr Opin Biotechnol 2000, 11, 67–71.

13. Liu, Y.; Engelman, D. M.; Gerstein, M. Genome Biol 2002, 3,

research0054.

14. Wallin, E.; von Heijne, G. Protein Sci 1998, 7, 1029–1038.

15. Drews, J. Nat Biotechnol 1996, 14, 1516–1518.

16. Klabunde, T.; Hessler, G. Chembiochem 2002, 3, 928–944.

17. Karnik, S. S.; Gogonea, C.; Patil, S.; Saad, Y.; Takezako, T. Trends

Endocrinol Metab 2003, 14, 431–437.

18. White, S. H. Protein Sci 2004, 13, 1948–1949.

19. Fanelli, F.; De Benedetti, P. G. Chem Rev 2005, 105, 3297–3351.

20. Oliveira, L.; Hulsen, T.; Lutje Hulsik, D.; Paiva, A. C.; Vriend, G.

FEBS Lett 2004, 564, 269–273.

21. Visiers, I.; Ballesteros, J. A.; Weinstein, H. Methods Enzymol

2002, 343, 329–371.

22. Kontoyianni, M.; DeWeese, C.; Penzotti, J. E.; Lybrand, T. P.

J Med Chem 1996, 39, 4406–4420.

23. Adamian, L.; Liang, J. J Mol Biol 2001, 311, 891–907.

24. Chamberlain, A. K.; Bowie, J. U. Biophys J 2004, 87, 3460–3469.

25. Eilers, M.; Patel, A. B.; Liu, W.; Smith, S. O. Biophys J 2002, 82,

2720–2736.

26. Eilers, M.; Shekar, S. C.; Shieh, T.; Smith, S. O.; Fleming, P. J.

Proc Natl Acad Sci USA 2000, 97, 5796–5801.

27. Bowie, J. U. Nature 2005, 438, 581–589.

28. Curran, A. R.; Engelman, D. M. Curr Opin Struct Biol 2003, 13,

412–417.

FIGURE 6 Venn diagram showing the relationships of sets

of inter-residue interactions determined using the four approaches

A–D.


29. Liang, J. Curr Opin Chem Biol 2002, 6, 878–884.

30. McAllister, S. R.; Floudas, C. A. Biophys J 2008, 95, 5281–5295.

31. Fleishman, S. J.; Ben-Tal, N. J Mol Biol 2002, 321, 363–378.

32. Wendel, C.; Gohlke, H. Proteins 2008, 70, 984–999.

33. Gromiha, M. M.; Selvaraj, S. Int J Biol Macromol 2001, 29, 25–34.

34. Gromiha, M. M.; Selvaraj, S. J Mol Biol 2001, 310, 27–32.

35. Kumarevel, T. S.; Gromiha, M. M.; Ponnuswamy, M. N. Biophys

Chem 1998, 75, 105–113.

36. Gao, J.; Li, Z. Biopolymers 2008, 89, 1174–1178.

37. Forrest, L. R.; Tang, C. L.; Honig, B. Biophys J 2006, 91, 508–

517.

38. Tusnady, G. E.; Dosztanyi, Z.; Simon, I. Nucleic Acids Res 2005,

33 (Database issue), D275–D278.

39. Sali, A.; Blundell, T. L. J Mol Biol 1993, 234, 779–815.

40. Berman, H. M.; Westbrook, J.; Feng, Z.; Gilliland, G.; Bhat, T.

N.; Weissig, H.; Shindyalov, I. N.; Bourne, P. E. Nucleic Acids

Res 2000, 28, 235–242.

41. Zhang, Y.; Skolnick, J. Nucleic Acids Res 2005, 33, 2302–2309.

42. Pearl, F. M.; Bennett, C. F.; Bray, J. E.; Harrison, A. P.; Martin,

N.; Shepherd, A.; Sillitoe, I.; Thornton, J.; Orengo, C. A. Nucleic

Acids Res 2003, 31, 452–455.

43. Boeckmann, B.; Bairoch, A.; Apweiler, R.; Blatter, M. C.;

Estreicher, A.; Gasteiger, E.; Martin, M. J.; Michoud, K.;

O’Donovan, C.; Phan, I.; Pilbout, S.; Schneider, M. Nucleic

Acids Res 2003, 31, 365–370.

44. Altschul, S. F.; Madden, T. L.; Schaffer, A. A.; Zhang, J.; Zhang, Z.;

Miller, W.; Lipman, D. J. Nucleic Acids Res 1997, 25, 3389–3402.

45. Pieper, U.; Eswar, N.; Braberg, H.; Madhusudhan, M. S.; Davis, F.

P.; Stuart, A. C.; Mirkovic, N.; Rossi, A.; Marti-Renom, M. A.;

Fiser, A.; Webb, B.; Greenblatt, D.; Huang, C. C.; Ferrin, T. E.;

Sali, A. Nucleic Acids Res 2004, 32 (Database issue), D217–D222.

46. Livingstone, C. D.; Barton, G. J. Comput Appl Biosci 1993, 9,

745–756.

47. Stickle, D. F.; Presta, L. G.; Dill, K. A.; Rose, G. D. J Mol Biol

1992, 226, 1143–1159.

48. Sobolev, V.; Sorokine, A.; Prilusky, J.; Abola, E. E.; Edelman, M.

Bioinformatics 1999, 15, 327–332.

49. Cherezov, V.; Rosenbaum, D. M.; Hanson, M. A.; Rasmussen, S.

G.; Thian, F. S.; Kobilka, T. S.; Choi, H. J.; Kuhn, P.; Weis, W. I.;

Kobilka, B. K.; Stevens, R. C. Science 2007, 318, 1258–1265.

50. Sanchez, R.; Sali, A. Curr Opin Struct Biol 1997, 7, 206–214.

51. Eramian, D.; Eswar, N.; Shen, M. Y.; Sali, A. Protein Sci 2008,

17, 1881–1893.

Reviewing Editor: David Case
