Protein Structure Prediction: Integral Membrane Proteins.
These are proteins with segments embedded in the lipid bilayer of the cell membrane (or of internal membranes). Current estimates are that 20-40% of all coded proteins are such proteins, across species. (20-30% are helix bundle proteins of the kind mainly discussed today.) As we have seen, these include various active and passive transporters and channels, as well as receptor proteins. Nearly all of the cell's interaction with its environment passes through such proteins. Further, the typically hydrophobic surface that such a protein presents to the lipid bilayer makes these proteins difficult to crystallize, and hence difficult to study by X-ray diffraction. All of this makes them a useful and important target of sequence-based analysis.
Protein structure vocabulary: primary, secondary, tertiary and quaternary structure. Primary structure is the amino acid sequence. Secondary structure refers to the most basic recurring structural elements; we have already seen the α-helix as the first example of such an element. Tertiary structure is how the secondary structure elements are arranged in three-dimensional space to form the folded protein. Quaternary structure refers to multimeric assemblies of proteins: e.g., the active protein might be a homodimer composed of two copies of the same monomer, arranged in the specific three-dimensional configuration needed for activity.
At the level of secondary structure, integral membrane proteins involve the two most common secondary structure elements: α-helices and β-strands. The most common alternative to structures built from α-helices traversing the bilayer is the so-called β-barrel.
Here are cartoons of these two classes:
Here is a schematic for the α-helix:
Here is a cartoon of a porin β-barrel:
The yellow ribbons are the (antiparallel) β-strands. Porins generally provide a simple diffusion pathway across the membrane for molecules smaller than 1 kDa, with little substrate selectivity. (There is a class of proteins called aquaporins that allow passage of H₂O across the membrane, but these are composed mainly of α-helices.)
Today we study TMHMM, the tool used last time to “parse” GPCRs (or “7TM” proteins in general) into internal, lipid bilayer (transmembrane) and external portions: the “topology” of the protein. It was developed by A. Krogh and co-workers, most notably G. von Heijne. TMHMM is a hidden Markov model (HMM) tool, available via an online server: http://www.cbs.dtu.dk/services/TMHMM/.
Because the lipid bilayer is so different from the intracellular or extracellular environment, the amino acid composition of the membrane-spanning portions of these proteins carries a strong signal, so the basic problem should be amenable to treatment by HMMs. More precisely, the tool predicts transmembrane helices, not membrane-embedded portions of proteins in general. β-barrels are the target of another, less widely used tool by Krogh and co-workers.
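To make the “strong signal” point concrete, here is a minimal sketch of Viterbi decoding with a toy two-state HMM (N = non-membrane, M = membrane). Everything in it is an illustrative assumption, not TMHMM's model: the residue grouping, the emission and transition probabilities, and the function names are hypothetical. It only shows how a difference in composition, plus a penalty for switching states, lets an HMM segment a sequence.

```python
import math

# Toy two-state HMM: "N" = non-membrane, "M" = membrane.
# All residue groupings and probabilities are illustrative assumptions,
# not TMHMM's trained parameters.
HYDROPHOBIC = set("AVILMFWC")
OTHER = set("RNDQEGHKPSTY")              # the remaining 12 standard residues
STATES = ("N", "M")
P_HYDROPHOBIC = {"N": 0.35, "M": 0.80}   # P(residue is hydrophobic | state)
P_STAY = 0.95                            # probability of staying in the same state


def emission(state, aa):
    """Probability of emitting residue `aa` from `state` (uniform within a group)."""
    p = P_HYDROPHOBIC[state]
    if aa in HYDROPHOBIC:
        return p / len(HYDROPHOBIC)
    return (1.0 - p) / len(OTHER)


def transition(src, dst):
    return P_STAY if src == dst else 1.0 - P_STAY


def viterbi(seq):
    """Return the most probable state path as a string of 'N'/'M' labels."""
    # Log-space scores for position 0, with a uniform start distribution.
    score = {s: math.log(1.0 / len(STATES)) + math.log(emission(s, seq[0]))
             for s in STATES}
    backpointers = []
    for aa in seq[1:]:
        new_score, pointers = {}, {}
        for s in STATES:
            best_prev = max(STATES, key=lambda r: score[r] + math.log(transition(r, s)))
            pointers[s] = best_prev
            new_score[s] = (score[best_prev] + math.log(transition(best_prev, s))
                            + math.log(emission(s, aa)))
        backpointers.append(pointers)
        score = new_score
    # Trace back from the best final state.
    state = max(STATES, key=score.get)
    path = [state]
    for pointers in reversed(backpointers):
        state = pointers[state]
        path.append(state)
    return "".join(reversed(path))
```

For example, on `"DEKRNQ" * 3 + "LLLLIIIIVVVVFFFFLLLL" + "DEKRNQ" * 3` the decoder labels the 20 hydrophobic residues M and the charged/polar flanks N. The switching penalty (log(0.05/0.95) per state change here) is what stops the model from flipping state on every isolated hydrophobic residue.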
Classically (i.e., 15 years ago), two features, or rules, were used to determine or predict TM protein topology: (1) the “positive inside rule”, according to which positively charged residues (Lys, Arg) are found predominantly in the cytoplasmic loops of the protein, and (2) the rule that hydrophobic residues face the lipid layer. For β-barrels, these rules can be somewhat misleading, as the accompanying figure shows (green = polar, white = aromatic, red = non-polar).
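The positive inside rule can be turned into a crude orientation call once helix positions are known. The Python sketch below (helper names are hypothetical) counts Lys and Arg in the loops on either side of the membrane for both possible orientations and picks the orientation that puts more positive charge inside. It is a simplification: it weights all loops equally, whereas the bias is usually described as strongest in short loops, and TMHMM learns such biases from data rather than applying a rule.

```python
# Hypothetical helpers illustrating the positive inside rule; not TMHMM code.
POSITIVE = set("KR")


def loop_segments(seq_len, helices):
    """Non-helix segments (1-based, inclusive) given sorted helix spans."""
    segments, pos = [], 1
    for start, end in helices:
        segments.append((pos, start - 1))
        pos = end + 1
    segments.append((pos, seq_len))
    return segments


def positive_count(seq, start, end):
    """Number of Lys/Arg residues in the 1-based inclusive span."""
    return sum(1 for aa in seq[start - 1:end] if aa in POSITIVE)


def orient_by_positive_inside(seq, helices):
    """Guess the N-terminus side. Loops alternate sides, so compare the K+R
    count in even-numbered loops (inside if N-in) with the odd-numbered ones."""
    loops = loop_segments(len(seq), helices)
    even = sum(positive_count(seq, s, e) for s, e in loops[0::2])
    odd = sum(positive_count(seq, s, e) for s, e in loops[1::2])
    if even > odd:
        return "N-in"
    if odd > even:
        return "N-out"
    return "ambiguous"
```

With two helices at (5, 24) and (29, 48), a sequence whose N-terminal and C-terminal loops are Lys/Arg-rich comes out “N-in”, since both of those loops lie on the same side.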
First, the HMM architecture of TMHMM, which follows the simplest idea of the general structure of such a protein:
Notice that there are seven submodules here: helix core, cytoplasmic and non-cytoplasmic cap regions, cytoplasmic loops, non-cytoplasmic long and short loops, and “globular” regions. The last of these is really a catch-all for things like cassettes in the interior of the cell and receptor structures on the cell exterior. The caps are there because of the ambiguity, even at the wet-lab bench, about exactly which portion of the helix lies strictly within the bilayer: the caps model a kind of transition segment. Some TM proteins have functional elements here, such as charged residues, so the composition of these segments may indeed differ from that of the helix core, and it is a good idea to model them as a possibly different signal.
The helix core is constrained by an estimate of the maximal number of residues needed to traverse the bilayer (25, not counting the two end-cap regions!) and a minimum estimate (5, plus the caps).
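One common way to hard-code length limits like these in an HMM is a linear chain of states in which leaving the chain is only allowed after the minimum number of states has been visited. The Python sketch below builds such a chain for the 5 to 25 residue core from the text. The wiring (a chain with early exits) is a generic technique assumed for illustration; TMHMM's own core submodule may be connected differently, and the function names are hypothetical.

```python
# Sketch of a length-bounded segment as a linear chain of states.
# MIN_CORE/MAX_CORE follow the numbers in the text (caps excluded);
# the chain-with-early-exits wiring is a generic technique, assumed here.
MIN_CORE = 5
MAX_CORE = 25


def core_transitions(min_len=MIN_CORE, max_len=MAX_CORE):
    """Map each core state 1..max_len to its allowed successors."""
    trans = {}
    for i in range(1, max_len + 1):
        successors = []
        if i < max_len:
            successors.append(i + 1)   # keep going through the core
        if i >= min_len:
            successors.append("exit")  # leave the core (e.g. into a cap)
        trans[i] = successors
    return trans


def allowed_lengths(trans):
    """Walk the chain from state 1 and list the lengths at which exit is allowed."""
    lengths, state, visited = [], 1, 1
    while True:
        if "exit" in trans[state]:
            lengths.append(visited)
        following = [s for s in trans[state] if s != "exit"]
        if not following:
            return lengths
        state, visited = following[0], visited + 1
```

Because every path through the core enters at state 1 and can only move forward, no path can emit fewer than 5 or more than 25 core residues; the caps then add their own residues on each side.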
Here are some results. For single sequences, the method predicts TM helices with about 97.5% accuracy. The topology is correct about 77% of the time, and in an additional 7% of cases if you allow a flip of the cytoplasmic/exterior orientation. These figures are based on genome-wide screens for several species.
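The distinction between “topology correct” and “correct up to a flip” can be made precise with a small comparison routine. In this Python sketch, a topology is a list of helix spans plus the side (“in” or “out”) of the N-terminus. The 5-residue overlap required to count two helices as the same is an assumed threshold, not necessarily the criterion behind the figures above, and the function names are hypothetical.

```python
# Hypothetical topology comparison; MIN_OVERLAP is an assumed threshold.
MIN_OVERLAP = 5


def overlap(a, b):
    """Number of residues shared by two 1-based inclusive spans."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]) + 1)


def helices_match(pred, ref, min_overlap=MIN_OVERLAP):
    """Same helix count, and each predicted helix overlaps its reference partner."""
    return len(pred) == len(ref) and all(
        overlap(p, r) >= min_overlap for p, r in zip(pred, ref))


def compare_topology(pred_helices, pred_n_side, ref_helices, ref_n_side):
    """Classify a prediction as 'correct', 'correct up to flip' or 'wrong'."""
    if not helices_match(pred_helices, ref_helices):
        return "wrong"
    if pred_n_side == ref_n_side:
        return "correct"
    return "correct up to flip"
```

A “flip” here means every helix is found but the whole protein is placed with the opposite orientation, so all loops land on the wrong side of the membrane.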
Here is the text output of TMHMM for human sulfonylurea receptor 1 (SwissProt Q09428):
    # sp_Q09428_ACC8_HUMAN Length: 1580
    # sp_Q09428_ACC8_HUMAN Number of predicted TMHs: 16
    # sp_Q09428_ACC8_HUMAN Exp number of AAs in TMHs: 352.04622
    # sp_Q09428_ACC8_HUMAN Exp number, first 60 AAs: 21.39772
    # sp_Q09428_ACC8_HUMAN Total prob of N-in: 0.00094
    # sp_Q09428_ACC8_HUMAN POSSIBLE N-term signal sequence
    sp_Q09428_ACC8_HUMAN TMHMM2.0 outside 1 28
    sp_Q09428_ACC8_HUMAN TMHMM2.0 TMhelix 29 51
    sp_Q09428_ACC8_HUMAN TMHMM2.0 inside 52 71
    sp_Q09428_ACC8_HUMAN TMHMM2.0 TMhelix 72 94
    sp_Q09428_ACC8_HUMAN TMHMM2.0 outside 95 103
    sp_Q09428_ACC8_HUMAN TMHMM2.0 TMhelix 104 123
    sp_Q09428_ACC8_HUMAN TMHMM2.0 inside 124 134
    sp_Q09428_ACC8_HUMAN TMHMM2.0 TMhelix 135 157
    sp_Q09428_ACC8_HUMAN TMHMM2.0 outside 158 166
    sp_Q09428_ACC8_HUMAN TMHMM2.0 TMhelix 167 189
    sp_Q09428_ACC8_HUMAN TMHMM2.0 inside 190 300
    sp_Q09428_ACC8_HUMAN TMHMM2.0 TMhelix 301 323
    sp_Q09428_ACC8_HUMAN TMHMM2.0 outside 324 349
    sp_Q09428_ACC8_HUMAN TMHMM2.0 TMhelix 350 367
    sp_Q09428_ACC8_HUMAN TMHMM2.0 inside 368 426
    sp_Q09428_ACC8_HUMAN TMHMM2.0 TMhelix 427 449
    sp_Q09428_ACC8_HUMAN TMHMM2.0 outside 450 453
    sp_Q09428_ACC8_HUMAN TMHMM2.0 TMhelix 454 476
    sp_Q09428_ACC8_HUMAN TMHMM2.0 inside 477 536
    sp_Q09428_ACC8_HUMAN TMHMM2.0 TMhelix 537 559
    sp_Q09428_ACC8_HUMAN TMHMM2.0 outside 560 573
    sp_Q09428_ACC8_HUMAN TMHMM2.0 TMhelix 574 596
    sp_Q09428_ACC8_HUMAN TMHMM2.0 inside 597 1005
    sp_Q09428_ACC8_HUMAN TMHMM2.0 TMhelix 1006 1028
    sp_Q09428_ACC8_HUMAN TMHMM2.0 outside 1029 1061
    sp_Q09428_ACC8_HUMAN TMHMM2.0 TMhelix 1062 1084
    sp_Q09428_ACC8_HUMAN TMHMM2.0 inside 1085 1153
    sp_Q09428_ACC8_HUMAN TMHMM2.0 TMhelix 1154 1176
    sp_Q09428_ACC8_HUMAN TMHMM2.0 outside 1177 1247
    sp_Q09428_ACC8_HUMAN TMHMM2.0 TMhelix 1248 1270
    sp_Q09428_ACC8_HUMAN TMHMM2.0 inside 1271 1274
    sp_Q09428_ACC8_HUMAN TMHMM2.0 TMhelix 1275 1297
    sp_Q09428_ACC8_HUMAN TMHMM2.0 outside 1298 1580
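Output in this one-segment-per-line format is easy to post-process. The following Python sketch (the class and function names are hypothetical) separates the comment lines from the segment lines and turns each segment into a record, assuming whitespace-separated fields in the order shown above: sequence ID, program version, label, start, end.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    seq_id: str
    label: str   # "inside", "outside" or "TMhelix"
    start: int   # 1-based, inclusive
    end: int     # 1-based, inclusive


def parse_tmhmm(lines):
    """Split TMHMM-style output into header comments and Segment records."""
    header, segments = [], []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith("#"):
            header.append(line.lstrip("# "))  # drop the leading "# "
            continue
        seq_id, _version, label, start, end = line.split()
        segments.append(Segment(seq_id, label, int(start), int(end)))
    return header, segments


# An excerpt of the listing above.
example = """\
# sp_Q09428_ACC8_HUMAN Length: 1580
# sp_Q09428_ACC8_HUMAN Number of predicted TMHs: 16
sp_Q09428_ACC8_HUMAN TMHMM2.0 outside 1 28
sp_Q09428_ACC8_HUMAN TMHMM2.0 TMhelix 29 51
sp_Q09428_ACC8_HUMAN TMHMM2.0 inside 52 71
"""
header, segments = parse_tmhmm(example.splitlines())
helices = [(s.start, s.end) for s in segments if s.label == "TMhelix"]
```

Applied to the full listing, the number of `TMhelix` records should equal the “Number of predicted TMHs” header line (16 for Q09428), which makes a convenient sanity check.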
Here is a TMHMM graphical output for a Halobacterium archaerhodopsin, one of the 7TM proteins encountered earlier, showing the three posterior probabilities (inside, transmembrane helix, outside) computed at each position.
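Posterior curves like these come from the forward-backward algorithm, which gives, for each position t and label s, P(state at t = s | whole sequence). Below is a minimal Python sketch of scaled forward-backward on a toy three-state model (i = inside, M = membrane, o = outside). The residue classes, probabilities and wiring are illustrative assumptions, far simpler than TMHMM's architecture: in particular the toy does not remember which side a helix was entered from, and its inside state simply favors Lys/Arg, echoing the positive inside rule.

```python
# Toy three-state model: "i" = inside, "M" = membrane, "o" = outside.
# Classes, probabilities and wiring are illustrative assumptions only.
STATES = ("i", "M", "o")
CLASSES = {"hydrophobic": set("AVILMFWC"), "positive": set("KR")}
N_OTHER = 10  # the remaining 10 standard residues share the "other" class
EMIT = {
    "i": {"hydrophobic": 0.25, "positive": 0.25, "other": 0.50},
    "M": {"hydrophobic": 0.80, "positive": 0.02, "other": 0.18},
    "o": {"hydrophobic": 0.25, "positive": 0.05, "other": 0.70},
}
TRANS = {  # no direct inside <-> outside step: loops are separated by helices
    "i": {"i": 0.90, "M": 0.10, "o": 0.00},
    "M": {"i": 0.05, "M": 0.90, "o": 0.05},
    "o": {"i": 0.00, "M": 0.10, "o": 0.90},
}


def emit(state, aa):
    """Emission probability, uniform within each residue class."""
    for name, members in CLASSES.items():
        if aa in members:
            return EMIT[state][name] / len(members)
    return EMIT[state]["other"] / N_OTHER


def posteriors(seq):
    """Per-position posterior P(state | seq), via scaled forward-backward."""
    n = len(seq)
    fwd, scale = [], []
    for t, aa in enumerate(seq):  # forward pass, normalized at every position
        if t == 0:
            raw = {s: emit(s, aa) / len(STATES) for s in STATES}
        else:
            raw = {s: emit(s, aa) * sum(fwd[-1][r] * TRANS[r][s] for r in STATES)
                   for s in STATES}
        c = sum(raw.values())
        scale.append(c)
        fwd.append({s: v / c for s, v in raw.items()})
    bwd = [None] * n  # backward pass, reusing the forward scaling factors
    bwd[-1] = {s: 1.0 for s in STATES}
    for t in range(n - 2, -1, -1):
        bwd[t] = {r: sum(TRANS[r][s] * emit(s, seq[t + 1]) * bwd[t + 1][s]
                         for s in STATES) / scale[t + 1]
                  for r in STATES}
    result = []
    for t in range(n):
        joint = {s: fwd[t][s] * bwd[t][s] for s in STATES}
        z = sum(joint.values())
        result.append({s: v / z for s, v in joint.items()})
    return result
```

On a sequence such as `"KRKRKRKRKR" + "L" * 21 + "DEDSDEDSDE"`, the membrane posterior dominates in the middle of the leucine stretch, the Lys/Arg-rich N-terminal flank is assigned mostly to “inside”, and the acidic C-terminal flank mostly to “outside”. Plotting the three values per position gives curves of the same kind as the TMHMM plot.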