Protein Structure Prediction: Integral Membrane Proteins.

These are proteins that have segments bound within the lipid bilayer of the cell membrane (or of interior membrane walls). Current estimates are that 20-40% of all encoded proteins are such proteins, across species. (20-30% are helix bundle proteins of the kind mainly discussed today.) As we have seen, these include various active and passive transport channels, and receptor proteins. All interactions with the cell’s environment pass through such proteins. Further, the typical hydrophobicity of the segments of such a protein facing into the lipid bilayer makes these proteins difficult to crystallize, and hence makes it difficult to obtain accurate X-ray diffraction measurements of their structures. All of this makes them a useful and important target of sequence-based analysis.

Protein structure vocabulary: primary, secondary, tertiary and quaternary structure. Primary structure means the amino acid sequence. Secondary structure means the most basic recurring structural constituents. We have already seen the α-helix as the first example of such an element. Tertiary structure is basically how the secondary structure elements are arranged in three-dimensional space to form the protein. Quaternary structure refers to multimer configurations of proteins, so that, e.g., the active protein might be a homodimer composed of two copies of the same protein monomer, held in the three-dimensional configuration needed for activity.

At the level of secondary structure, integral membrane proteins involve the two most common secondary structure elements: α-helices and β-strands. The most common alternative to the structures built out of α-helices traversing the bilayer is the so-called β-barrel.

Here are cartoons of these two classes:

Here is a schematic for the α-helix:

Here is a cartoon of a porin β-barrel:

The yellow ribbons are the (anti-parallel) β-strands. Porins generally provide a simple diffusion pathway across the membrane for molecules smaller than 1 kDa, with little substrate selectivity. (There is a class of proteins called aquaporins which allow passage of H2O across the membrane, but which are composed mainly of α-helices.)

Today we study TMHMM, the tool used last time to “parse” GPCRs (or “7TM” proteins in general) into internal, lipid bilayer (or trans-membrane) and external portions of the protein, the “topology” of the protein. It was developed by A. Krogh and coworkers, most notably G. von Heijne. TMHMM is an HMM tool, available via an online server: http://www.cbs.dtu.dk/services/TMHMM/.

Because the lipid bilayer is so different from the intracellular or extracellular environment, there is a strong signal implicit in these portions of membrane-spanning proteins, so the basic problem should be amenable to treatment by HMMs. More precisely, the tool predicts trans-membrane helices rather than just membrane-embedded portions of proteins. β-barrels are the target of another, less widely used tool by Krogh and co-workers.

Classically (i.e., 15 years ago) two features, or rules, were used to determine or predict TM protein topology: (1) the “positive inside rule”, according to which positively charged residues, if present, lie mainly in the cytoplasmic segments of the protein, and (2) the hydrophobic residues should face the lipid bilayer. For β-barrels, this can be somewhat misleading, as the figure shows (green = polar, white = aromatic, red = non-polar).
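The two classical rules can be sketched in a few lines. This is only an illustration and is not part of TMHMM: it assumes the Kyte-Doolittle hydropathy scale and its conventional 19-residue window (the notes name neither), and the toy sequence and the `flank` width are made up for the example. It picks the most hydrophobic window as a candidate membrane segment, then compares Lys/Arg counts in the two flanks to guess which side is cytoplasmic.

```python
# Kyte-Doolittle hydropathy values (assumed scale; not named in the notes).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def window_hydropathy(seq, window=19):
    """Mean hydropathy of every full window of the given width."""
    scores = [KD[aa] for aa in seq]
    return [sum(scores[i:i + window]) / window
            for i in range(len(scores) - window + 1)]


def best_window(seq, window=19):
    """Rule (2): return (start, end, mean) of the most hydrophobic window.

    Positions are 1-based and inclusive.
    """
    means = window_hydropathy(seq, window)
    i = max(range(len(means)), key=means.__getitem__)
    return i + 1, i + window, means[i]


def cytoplasmic_flank(seq, start, end, flank=15):
    """Rule (1): the flank richer in Lys/Arg is guessed to be cytoplasmic."""
    n_side = seq[max(0, start - 1 - flank):start - 1]
    c_side = seq[end:end + flank]
    n_pos = sum(aa in "KR" for aa in n_side)
    c_pos = sum(aa in "KR" for aa in c_side)
    if n_pos == c_pos:
        return "undetermined"
    return "N-terminal" if n_pos > c_pos else "C-terminal"


# Hypothetical toy sequence: charged loop, hydrophobic stretch, polar loop.
toy = "MKRKRS" + "LLIVALLAVLLIVALLAVLG" + "SNDTSE"
start, end, mean = best_window(toy)
side = cytoplasmic_flank(toy, start, end)
```

A single best window is the crudest possible use of the hydropathy signal; the HMM described next replaces both hand-written rules with learned emission and transition probabilities.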

First, the HMM architecture of TMHMM, which follows the simplest idea of the general structure of such a protein:

Notice that there are seven submodules here: helix core, cytoplasmic and non-cytoplasmic cap regions, cytoplasmic loops, non-cytoplasmic long and short loops, and “globular” regions. This last is really a catch-all for things like cassettes in the interior of the cell and receptor structures on the cell exterior. The caps are there because of the ambiguity, even at the wet lab bench, about the exact extent of the portion of the helix that lies, strictly speaking, within the bilayer: they are a kind of transition segment. Some TM proteins have active elements here, such as charged residues, so the composition of the caps may indeed differ from that of the core, and it is a good idea to model them as a possibly different signal.
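To make the idea of a cyclic topology model concrete, here is a much reduced sketch: four states instead of seven submodules (no caps, no globular regions, no long/short loop distinction), three residue classes instead of twenty amino acids, and Viterbi decoding to read off the topology. All state names and probabilities are invented for illustration; they are not TMHMM's parameters. Two helix states are used so that the path must alternate inside, helix, outside, helix, inside, and so on.

```python
import math

# Toy states: i = inside loop, M = helix running inside -> outside,
# o = outside loop, W = helix running outside -> inside.
STATES = ["i", "M", "o", "W"]
LABEL = {"i": "i", "M": "M", "o": "o", "W": "M"}

# All numbers below are made up for the example.
START = {"i": 0.5, "o": 0.5}
TRANS = {
    "i": {"i": 0.90, "M": 0.10},
    "M": {"M": 0.95, "o": 0.05},
    "o": {"o": 0.90, "W": 0.10},
    "W": {"W": 0.95, "i": 0.05},
}
EMIT = {
    "i": {"hydrophobic": 0.20, "positive": 0.30, "other": 0.50},
    "M": {"hydrophobic": 0.80, "positive": 0.02, "other": 0.18},
    "o": {"hydrophobic": 0.20, "positive": 0.05, "other": 0.75},
    "W": {"hydrophobic": 0.80, "positive": 0.02, "other": 0.18},
}


def residue_class(aa):
    """Collapse the 20 amino acids into three crude classes."""
    if aa in "AILMFVWC":
        return "hydrophobic"
    if aa in "KR":
        return "positive"
    return "other"


def safe_log(p):
    return math.log(p) if p > 0 else float("-inf")


def viterbi(seq):
    """Most probable label string (i / M / o) for a non-empty sequence."""
    obs = [residue_class(aa) for aa in seq]
    score = {s: safe_log(START.get(s, 0.0)) + safe_log(EMIT[s][obs[0]])
             for s in STATES}
    back = []
    for x in obs[1:]:
        new, ptr = {}, {}
        for s in STATES:
            prev = max(STATES,
                       key=lambda r: score[r] + safe_log(TRANS[r].get(s, 0.0)))
            new[s] = (score[prev] + safe_log(TRANS[prev].get(s, 0.0))
                      + safe_log(EMIT[s][x]))
            ptr[s] = prev
        back.append(ptr)
        score = new
    # Trace the best path backwards from the best final state.
    state = max(STATES, key=score.get)
    path = [state]
    for ptr in reversed(back):
        state = ptr[state]
        path.append(state)
    return "".join(LABEL[s] for s in reversed(path))


# Hypothetical toy sequence: positive loop, hydrophobic stretch, polar loop.
toy = "KRKRKRKRKR" + "L" * 20 + "SNSNSNSNSN"
topology = viterbi(toy)
```

On this toy input the positively charged loop is labelled inside and the polar loop outside, which is the positive inside rule expressed through emission probabilities rather than as a separate post-processing step.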

The helix core has an estimate of the maximal number of residues to traverse the bilayer (25, minus the two end cap regions!), and a minimum estimate (5 + caps).
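A length bound of this kind can be built into an HMM by wiring the states as a chain with skips. The sketch below shows one simple way to do it, under two assumptions: that 5 and 25 are the bounds on the core itself (one reading of the figures above), and that all skipping happens as a single jump out of the first core state. The actual TMHMM wiring may differ; the function names are hypothetical.

```python
def core_chain(min_len=5, max_len=25):
    """Successor lists for core states 1..max_len.

    The last state exits to "cap". State 1 may jump ahead to state j,
    after which the path runs j, j+1, ..., max_len without further skips,
    so every path emits between min_len and max_len residues.
    """
    if not 2 <= min_len <= max_len:
        raise ValueError("need 2 <= min_len <= max_len")
    succ = {k: [k + 1] for k in range(1, max_len)}
    succ[max_len] = ["cap"]
    # Jumping 1 -> j visits 1 + (max_len - j + 1) states in total.
    last_jump = max_len - min_len + 2
    succ[1] = list(range(2, last_jump + 1))
    return succ


def path_lengths(succ, state=1, visited=1):
    """Set of residue counts over all paths from `state` to the cap."""
    lengths = set()
    for nxt in succ[state]:
        if nxt == "cap":
            lengths.add(visited)
        else:
            lengths |= path_lengths(succ, nxt, visited + 1)
    return lengths


chain = core_chain()
lengths = path_lengths(chain)
```

Each jump target would carry its own transition probability in a trained model, which lets the HMM learn a distribution over helix lengths rather than the geometric distribution a single self-looping state would impose.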

Here are some results: first, for single sequences, the method is about 97.5% accurate for predicting TM helices. The topology is correct about 77% of the time, with an additional 7% if you allow a flip of cytoplasmic/exterior. These figures are based on genome-wide screens for several species.

Here is the text output of TMHMM for human sulfonylurea receptor 1 (SwissProt Q09428):


    # sp_Q09428_ACC8_HUMAN Length: 1580
    # sp_Q09428_ACC8_HUMAN Number of predicted TMHs: 16
    # sp_Q09428_ACC8_HUMAN Exp number of AAs in TMHs: 352.04622
    # sp_Q09428_ACC8_HUMAN Exp number, first 60 AAs: 21.39772
    # sp_Q09428_ACC8_HUMAN Total prob of N-in: 0.00094
    # sp_Q09428_ACC8_HUMAN POSSIBLE N-term signal sequence
    sp_Q09428_ACC8_HUMAN TMHMM2.0 outside 1 28
    sp_Q09428_ACC8_HUMAN TMHMM2.0 TMhelix 29 51
    sp_Q09428_ACC8_HUMAN TMHMM2.0 inside 52 71
    sp_Q09428_ACC8_HUMAN TMHMM2.0 TMhelix 72 94
    sp_Q09428_ACC8_HUMAN TMHMM2.0 outside 95 103
    sp_Q09428_ACC8_HUMAN TMHMM2.0 TMhelix 104 123
    sp_Q09428_ACC8_HUMAN TMHMM2.0 inside 124 134
    sp_Q09428_ACC8_HUMAN TMHMM2.0 TMhelix 135 157
    sp_Q09428_ACC8_HUMAN TMHMM2.0 outside 158 166
    sp_Q09428_ACC8_HUMAN TMHMM2.0 TMhelix 167 189
    sp_Q09428_ACC8_HUMAN TMHMM2.0 inside 190 300
    sp_Q09428_ACC8_HUMAN TMHMM2.0 TMhelix 301 323
    sp_Q09428_ACC8_HUMAN TMHMM2.0 outside 324 349
    sp_Q09428_ACC8_HUMAN TMHMM2.0 TMhelix 350 367
    sp_Q09428_ACC8_HUMAN TMHMM2.0 inside 368 426
    sp_Q09428_ACC8_HUMAN TMHMM2.0 TMhelix 427 449
    sp_Q09428_ACC8_HUMAN TMHMM2.0 outside 450 453
    sp_Q09428_ACC8_HUMAN TMHMM2.0 TMhelix 454 476
    sp_Q09428_ACC8_HUMAN TMHMM2.0 inside 477 536
    sp_Q09428_ACC8_HUMAN TMHMM2.0 TMhelix 537 559
    sp_Q09428_ACC8_HUMAN TMHMM2.0 outside 560 573
    sp_Q09428_ACC8_HUMAN TMHMM2.0 TMhelix 574 596
    sp_Q09428_ACC8_HUMAN TMHMM2.0 inside 597 1005
    sp_Q09428_ACC8_HUMAN TMHMM2.0 TMhelix 1006 1028
    sp_Q09428_ACC8_HUMAN TMHMM2.0 outside 1029 1061
    sp_Q09428_ACC8_HUMAN TMHMM2.0 TMhelix 1062 1084
    sp_Q09428_ACC8_HUMAN TMHMM2.0 inside 1085 1153
    sp_Q09428_ACC8_HUMAN TMHMM2.0 TMhelix 1154 1176
    sp_Q09428_ACC8_HUMAN TMHMM2.0 outside 1177 1247
    sp_Q09428_ACC8_HUMAN TMHMM2.0 TMhelix 1248 1270
    sp_Q09428_ACC8_HUMAN TMHMM2.0 inside 1271 1274
    sp_Q09428_ACC8_HUMAN TMHMM2.0 TMhelix 1275 1297
    sp_Q09428_ACC8_HUMAN TMHMM2.0 outside 1298 1580
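Output in this form is easy to post-process. The following is a minimal sketch of a parser, assuming one record per line with whitespace-separated fields (sequence id, program version, label, start, end) and `#` marking header lines; the function and type names are hypothetical, and the sample is just the first five segment lines of the output above.

```python
from collections import namedtuple

Segment = namedtuple("Segment", "seq_id label start end")

# Excerpt of the output above (first five segments only).
EXCERPT = """\
# sp_Q09428_ACC8_HUMAN Length: 1580
# sp_Q09428_ACC8_HUMAN Number of predicted TMHs: 16
sp_Q09428_ACC8_HUMAN TMHMM2.0 outside 1 28
sp_Q09428_ACC8_HUMAN TMHMM2.0 TMhelix 29 51
sp_Q09428_ACC8_HUMAN TMHMM2.0 inside 52 71
sp_Q09428_ACC8_HUMAN TMHMM2.0 TMhelix 72 94
sp_Q09428_ACC8_HUMAN TMHMM2.0 outside 95 103
"""


def parse_tmhmm(text):
    """Return the topology segments, skipping blank and '#' header lines."""
    segments = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        seq_id, _version, label, start, end = line.split()
        segments.append(Segment(seq_id, label, int(start), int(end)))
    return segments


def helix_lengths(segments):
    """Length in residues of each predicted trans-membrane helix."""
    return [s.end - s.start + 1 for s in segments if s.label == "TMhelix"]


def is_contiguous(segments):
    """True if every segment starts right after the previous one ends."""
    return all(b.start == a.end + 1 for a, b in zip(segments, segments[1:]))


segments = parse_tmhmm(EXCERPT)
lengths = helix_lengths(segments)
```

Run on the full listing, the same functions give a quick sanity check: the number of `TMhelix` records should equal the "Number of predicted TMHs" header value, and the segments should tile the sequence from residue 1 to its full length.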

# plot in postscript, script for making the plot in gnuplot, data for plot

Here is a TMHMM graphical output for a Halobacterium archaerhodopsin, one of the 7TM proteins encountered earlier, showing the three posterior probabilities (inside, trans-membrane helix, outside) computed at each position.
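Per-position posterior probabilities of this kind come from the forward-backward algorithm. The sketch below runs it on a toy four-state cyclic model with three residue classes; every state name and probability is invented for illustration and none of it reflects TMHMM's actual parameters. The two helix states are summed so that three curves (inside, membrane, outside) result, as in the plot.

```python
# Toy states: i = inside loop, M = helix inside -> outside,
# o = outside loop, W = helix outside -> inside. Numbers are made up.
STATES = ["i", "M", "o", "W"]
START = {"i": 0.5, "o": 0.5}
TRANS = {
    "i": {"i": 0.90, "M": 0.10},
    "M": {"M": 0.95, "o": 0.05},
    "o": {"o": 0.90, "W": 0.10},
    "W": {"W": 0.95, "i": 0.05},
}
EMIT = {
    "i": {"hydrophobic": 0.20, "positive": 0.30, "other": 0.50},
    "M": {"hydrophobic": 0.80, "positive": 0.02, "other": 0.18},
    "o": {"hydrophobic": 0.20, "positive": 0.05, "other": 0.75},
    "W": {"hydrophobic": 0.80, "positive": 0.02, "other": 0.18},
}


def residue_class(aa):
    if aa in "AILMFVWC":
        return "hydrophobic"
    if aa in "KR":
        return "positive"
    return "other"


def posteriors(seq):
    """Posterior P(inside), P(membrane), P(outside) at each position."""
    obs = [residue_class(aa) for aa in seq]
    n = len(obs)
    # Forward pass, rescaled at every position to avoid underflow.
    fwd, scale = [], []
    for t in range(n):
        if t == 0:
            raw = {s: START.get(s, 0.0) * EMIT[s][obs[0]] for s in STATES}
        else:
            raw = {s: EMIT[s][obs[t]]
                   * sum(fwd[-1][r] * TRANS[r].get(s, 0.0) for r in STATES)
                   for s in STATES}
        z = sum(raw.values())
        fwd.append({s: v / z for s, v in raw.items()})
        scale.append(z)
    # Backward pass, using the same scaling factors.
    bwd = [None] * n
    bwd[-1] = {s: 1.0 for s in STATES}
    for t in range(n - 2, -1, -1):
        bwd[t] = {r: sum(TRANS[r].get(s, 0.0) * EMIT[s][obs[t + 1]]
                         * bwd[t + 1][s] for s in STATES) / scale[t + 1]
                  for r in STATES}
    # Combine, and merge the two helix states into one membrane curve.
    result = []
    for t in range(n):
        gamma = {s: fwd[t][s] * bwd[t][s] for s in STATES}
        total = sum(gamma.values())
        result.append({
            "inside": gamma["i"] / total,
            "membrane": (gamma["M"] + gamma["W"]) / total,
            "outside": gamma["o"] / total,
        })
    return result


# Hypothetical toy sequence: positive loop, hydrophobic stretch, polar loop.
toy = "KRKRKRKRKR" + "L" * 20 + "SNSNSNSNSN"
post = posteriors(toy)
```

Unlike the single best path, the posterior curves show where the model is unsure: near helix ends and in weakly hydrophobic stretches the membrane curve falls well short of 1.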
