V6 SS 2009 Membrane Bioinformatics
V6 Membrane Beta Barrels – Membrane Positioning
Beta-barrels are the second important type of
transmembrane proteins.
They are mostly found in the outer membranes
of bacteria, chloroplasts, and mitochondria.
They function as:
(1) simple passive pores for transport across bacterial membranes
(2) active ion transporters for nutrient uptake, membrane anchors, and defense against pathogenic proteins.
Schulz, Curr Opin Struct Biol 10, 443 (2000)
Georg Schulz (Uni Freiburg): First X-ray structure of porin (1992)
(1) Schulz: 10 rules for Membrane Beta Barrels
1. The number of β-strands is even. The N and C termini are at the periplasmic barrel end.
2. The β-strand tilt is always around 45° and corresponds to the common β-sheet twist. Only one of the two possible tilt directions is assumed; the other one is an energetically disfavored mirror image. (Today: tilt between 20° and 45°)
3. The shear number of an n-stranded barrel is positive and around n+2, in agreement with the observed tilt.
4. All β-strands are antiparallel and connected locally to their next neighbors along the chain, resulting in a maximum neighborhood correlation.
5. The strand connections at the periplasmic barrel end are short turns of a couple of residues named T1, T2 and so on.
Schulz, Curr Opin Struct Biol 10, 443 (2000)
Schulz: 10 rules for Membrane Beta Barrels
6. At the external barrel end, the strand connections are usually long loops named L1, L2 and so on.
7. The β-barrel surface contacting the nonpolar membrane interior consists of aliphatic sidechains forming a nonpolar ribbon with a width of about 22 Å.
8. The aliphatic ribbon is lined by two girdles of aromatic sidechains, which have intermediate polarity and contact the two nonpolar–polar interface layers of the membrane.
9. The sequence variability of all parts of the β-barrel during evolution is high when compared with soluble proteins.
10. The external loops show exceptionally high sequence variability and they are usually mobile.
Schulz, Curr Opin Struct Biol 10, 443 (2000)
Shear number
Ideal topology, see Fig. on the right.
However, TM β-strands do not span the membrane at 90° (perpendicular to the membrane plane).
They are usually inclined at an angle to the vertical TM axis.
This results in a shift in the H-bonded residues, termed the shear number.
A shear number of +1 means that the H-bonded partner of the residue at position i is at position j + 1 rather than j.
Waldispühl et al. Proteins 65, 61 (2006)
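The link between strand number, shear number and strand tilt can be checked numerically. The sketch below assumes the standard barrel geometry relation tan α = S·a / (n·b), which is not given on the slides, with a = 3.3 Å (rise per residue along a strand) and b = 4.4 Å (distance between neighbouring strands). The function names are illustrative only.

```python
import math

A_ALONG_STRAND = 3.3     # Å per residue along a β-strand (assumed standard value)
B_BETWEEN_STRANDS = 4.4  # Å between neighbouring strands (assumed standard value)


def strand_tilt_deg(n_strands, shear):
    """Tilt of the strands against the barrel axis, in degrees."""
    tan_alpha = (shear * A_ALONG_STRAND) / (n_strands * B_BETWEEN_STRANDS)
    return math.degrees(math.atan(tan_alpha))


def barrel_radius(n_strands, shear):
    """Approximate barrel radius in Å for the same idealized geometry."""
    alpha = math.radians(strand_tilt_deg(n_strands, shear))
    return B_BETWEEN_STRANDS / (2.0 * math.sin(math.pi / n_strands) * math.cos(alpha))


# Rule 3: shear number around n + 2
for n in (8, 12, 16, 22):
    print(n, n + 2, round(strand_tilt_deg(n, n + 2), 1), round(barrel_radius(n, n + 2), 1))
```

For an 8-stranded barrel with S = n + 2 = 10 this idealized geometry gives a tilt a little below 45°, in line with rule 2.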
Structures of Membrane Beta Barrels
Nowadays: β-barrels range in size from small 8-stranded to large 22-stranded proteins.
Oligomerization state: TMBs can exist as monomers or oligomers.
Their topology is defined by the strand number and the shear number (a measure of the inclination angle of the β-strands against the barrel axis).
partiFold
The model is motivated by an abstract physical description of OMPs (outer membrane proteins).
It uses β-strand contact energy parameters for globular proteins taken from the program BETAWRAP
[statistical potentials: W(r) = -kT ln p(r)]
Jerome Waldispühl (MIT)
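As a minimal illustration of the statistical potential W(r) = -kT ln p(r), the sketch below converts observed pair counts into energies. The counts are invented for illustration and are not BETAWRAP parameters; real potentials also normalise against a reference state, which is omitted here.

```python
import math

KT = 0.593  # kcal/mol at about 298 K (assumed temperature)


def statistical_potential(counts, kT=KT):
    """W = -kT ln p, with p taken as the relative frequency of each pair."""
    total = sum(counts.values())
    return {pair: -kT * math.log(n / total) for pair, n in counts.items() if n > 0}


# Hypothetical counts of residue pairs facing each other on adjacent β-strands
pair_counts = {("V", "V"): 50, ("V", "I"): 30, ("V", "K"): 5, ("K", "E"): 15}
potential = statistical_potential(pair_counts)

for pair, energy in sorted(potential.items(), key=lambda item: item[1]):
    print(pair, round(energy, 2))
```

Frequently observed pairs receive low (favourable) energies, rarely observed pairs high ones.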
Structural features
Fundamental features of beta-barrel structures:
(i) The overall shape of the barrel (# of strands, their relative arrangement)
(ii) A list of antiparallel β-strand pairs; residue contacts and side chain orientation
(iii) Inclination of TM β-strands through the membrane plane.
Waldispühl et al. Proteins 71, 1097 (2008)
2-tape representation
Decomposition of a TMB into individual blocks of antiparallel β-strands.
Each strand is involved in two "pairings".
The figure shows the 2-tape representation.
Pairings are made from one tape to the other.
Waldispühl et al. Proteins 71, 1097 (2008)
New notation
Each block is represented as a 4-tuple
$$\begin{pmatrix} i_1 & j_1 \\ i_2 & j_2 \end{pmatrix}$$
where i1 and j1 are the indices of the strand on the first tape and i2 and j2 are those on the second tape.
M: β-strand residues with side chains oriented toward the membrane.
C: residues with side chains oriented toward the channel.
E: unpaired β-strand residues
partiFold
The model is based on an abstract physical description of OMPs.
It uses β-strand contact energy parameters for globular proteins taken from the program BETAWRAP.
Waldispühl et al. Proteins 71, 1097 (2008)
partiFold: computation of structures
Compute energies of all conformations
using statistical potential for amino acid
stacking pairs.
Use dynamic programming approach to
sample all possible TMB structures,
compute their energies, and thus the
partition function.
The partiFold algorithm then predicts an
ensemble of structural conformations for a
TMB.
Energy function apparently needs to be
refined further ...
Waldispühl et al. Proteins 71, 1097 (2008)
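The step from conformation energies to an ensemble can be sketched with an explicit Boltzmann sum. This is a brute-force toy over three invented conformations, not the dynamic programming recursion of partiFold; the labels and energies are hypothetical.

```python
import math

KT = 0.593  # kcal/mol at about 298 K (assumed temperature)


def boltzmann_ensemble(energies, kT=KT):
    """Return the partition function Z and the probability of each conformation."""
    weights = {name: math.exp(-energy / kT) for name, energy in energies.items()}
    z = sum(weights.values())
    return z, {name: weight / z for name, weight in weights.items()}


# Hypothetical conformations with invented energies in kcal/mol
toy_energies = {
    "8 strands, shear 8": -4.0,
    "8 strands, shear 10": -5.0,
    "10 strands, shear 12": -3.5,
}
Z, probabilities = boltzmann_ensemble(toy_energies)

for name, p in sorted(probabilities.items(), key=lambda item: -item[1]):
    print(f"{name}: {p:.3f}")
```

Dynamic programming becomes necessary because the number of possible TMB structures is far too large to enumerate in this way.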
Another interesting approach: statistics of NP-patterns
Shown here: pattern frequencies in soluble proteins.
Analogous statistics need to be compiled for TM barrels (ongoing work by Sikander).
Mandel-Gutfreund, Gregoret, JMB 323, 453 (2002)
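Such pattern statistics could be collected along the following lines, assuming that NP refers to binary nonpolar/polar patterns. The residue classification and the window width are assumptions made for this sketch, not the definitions used in the cited paper.

```python
from collections import Counter

NONPOLAR = set("AVLIMFWC")  # assumed classification; every other residue counts as polar


def np_patterns(sequence, width=5):
    """Count binary nonpolar (N) / polar (P) patterns in sliding windows."""
    binary = "".join("N" if aa in NONPOLAR else "P" for aa in sequence)
    return Counter(binary[i:i + width] for i in range(len(binary) - width + 1))


# A strictly alternating stretch, as expected for a strand with one nonpolar face
print(np_patterns("VKVKVKV"))
```

Comparing the counts for TM barrel strands with those for soluble proteins would then show which patterns are enriched in either class.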
Membrane Bioinformatics – Part II
TMHMM: 1000 of the 4288 predicted E. coli genes encode inner membrane proteins.
737 genes encode proteins with > 100 residues and ≥ 2 TM helices.
714 were suitable for cloning into phoA and gfp fusion vectors.
Both fusions could be obtained for 573 genes, one fusion for an additional 92 genes.
(2) Global Topology Analysis
Daley et al. Science 308, 1321 (2005)
Knowing the topology of a TM protein is essential to understanding its function.
Idea: generate a reference point, e.g. the location of a protein's C terminus.
In E. coli, attach to the C-terminus either alkaline phosphatase (PhoA), which is active only in the periplasm, or green fluorescent protein (GFP), which fluoresces only in the cytoplasm.
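The logic of the two reporters can be written as a small decision rule. The sketch assumes both activities have been normalised to a comparable scale; the ratio threshold is hypothetical and is not the cutoff used by Daley et al.

```python
def c_terminus_location(phoa_activity, gfp_activity, min_ratio=3.0):
    """Assign the C terminus to 'out' (periplasm), 'in' (cytoplasm) or 'ambiguous'.

    min_ratio is a hypothetical cutoff: one reporter must be at least this
    many times more active than the other for an assignment to be made.
    """
    if phoa_activity > min_ratio * gfp_activity:
        return "out"   # PhoA is active only in the periplasm
    if gfp_activity > min_ratio * phoa_activity:
        return "in"    # GFP fluoresces only in the cytoplasm
    return "ambiguous"


print(c_terminus_location(phoa_activity=0.9, gfp_activity=0.05))
print(c_terminus_location(phoa_activity=0.1, gfp_activity=0.8))
print(c_terminus_location(phoa_activity=0.4, gfp_activity=0.5))
```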
Using homology, 601 proteins could be
assigned a topology.
For 71 of these, the location of the C terminus
was already established.
The results agreed except for 2 cases.
The error rate is therefore ~ 1%.
TMHMM alone predicts the correct C-terminal
location for 78% of the 601 proteins.
By providing unambiguous C-terminal
locations, the TMHMM reliability score
increases for 526 proteins and decreases for
75 proteins.
Global Topology Analysis
Daley et al. Science 308, 1321 (2005)
Functional categorization of E.coli inner membrane proteome
Daley et al. Science 308, 1321 (2005)
- clear trend for Nin – Cin topologies (even number of TMH)
- largest functional category is transport proteins, many with 6 or 12 TM helices.
- most proteins with unknown function have 6 TM helices.
Idea: transfer experimental data set from PhoA and GFP-fusions of 608 proteins to
homologous proteins.
In March 2005, 204 annotated eubacterial and 21 archaeal genomes were available, with 658,210 annotated protein sequences.
Perform BLAST searches (E-value < 10^-5)
30,744 sequence hits where TMHMM predicts ≥ 1 TM helix
Second BLAST query with these 30,744 sequences
17,111 "secondary homologs".
Extend predictions by sequence homology
Granseth et al., J.Mol.Biol. 352, 489 (2005)
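The filtering and transfer step can be sketched on BLAST tabular output (`-outfmt 6`, where the E-value is the 11th column). The sample lines, identifiers and the `known` annotation are made up for illustration. A real transfer would also have to check that the alignment covers the annotated residue, which this sketch omits.

```python
def significant_hits(blast_tab_lines, max_evalue=1e-5):
    """Return (query, subject) pairs whose E-value is below the cutoff."""
    hits = []
    for line in blast_tab_lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        query, subject, evalue = fields[0], fields[1], float(fields[10])
        if evalue < max_evalue:
            hits.append((query, subject))
    return hits


def transfer_annotation(hits, known):
    """Copy the known C-terminal location of each query to its homologs."""
    return {subject: known[query] for query, subject in hits if query in known}


# Hypothetical BLAST output lines (12 tab-separated columns)
sample = [
    "ydgE\tgenomeA_0042\t41.2\t105\t60\t1\t1\t105\t3\t107\t2e-20\t88.0",
    "ydgE\tgenomeB_0311\t25.0\t60\t45\t0\t10\t69\t5\t64\t0.003\t30.1",
]
hits = significant_hits(sample)
annotated = transfer_annotation(hits, known={"ydgE": "out"})
print(annotated)
```

Running the same two functions again with the first-round hits as queries would give the "secondary homologs".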
Unconstrained vs. constrained prediction
Granseth et al., J.Mol.Biol. 352, 489 (2005)
(a) Unconstrained TMHMM predictions for the full set of 158,182 sequences with ≥ 1 predicted TM helix (grey bars) and constrained predictions for the 51,208 sequences for which the C-terminal location or the location of an internal residue could be annotated (black bars).
The numbers of proteins with different topologies are shown; Cin topologies are plotted upwards, Cout downwards. The number of Cout proteins with a single TM helix (39,322) is off-scale.
The unconstrained algorithm predicts too many proteins as Cout.
(b) TMHMM predictions for the 51,208 annotated sequences before (grey bars) and after (black bars) constraining the predictions with the location of the annotated residue.
Most TM proteins are expected to adopt only one topology in the membrane.
Global topology analysis of the E. coli inner membrane proteome identified 5 dual-topology candidates: EmrE, SugE, CrcB, YdgC, YnfA.
All are quite small (~ 100 aa), contain 4 strongly predicted TM segments, contain only few K and R residues and have a very small (K + R) bias.
(3) Dual-topology proteins?
Rapp et al., Nat.Struct.Biol. 13, 112 (2006)
(a) A dual-topology protein inserts into the membrane in two opposite directions. As nearly all helix-bundle membrane proteins have a higher number of lysine (K) and arginine (R) residues in cytoplasmic (in) than in periplasmic (out) loops (the 'positive-inside' rule), dual-topology proteins are expected to have very small (K + R) biases.
Rectangles: TM segments
Black dots: K and R residues
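The (K + R) bias can be computed directly from a sequence and its predicted TM segments, since successive loops alternate between the two sides of the membrane. The sequence and segment positions below are invented, and the sign convention (side of the N-terminal tail minus the opposite side) is an assumption of this sketch.

```python
def kr_bias(sequence, tm_segments):
    """(K + R) count on the side of the N-terminal tail minus the opposite side.

    tm_segments: list of (start, end) pairs, 0-based and end-exclusive,
    sorted along the sequence.
    """
    bounds = [0]
    for start, end in tm_segments:
        bounds.extend([start, end])
    bounds.append(len(sequence))
    # Every second interval is a loop or tail; the ones in between are TM segments
    loops = [sequence[bounds[i]:bounds[i + 1]] for i in range(0, len(bounds), 2)]

    def positives(segment):
        return segment.count("K") + segment.count("R")

    same_side = sum(positives(loop) for loop in loops[0::2])
    other_side = sum(positives(loop) for loop in loops[1::2])
    return same_side - other_side


# Hypothetical two-helix protein: tail MKK, loop GR, tail KRK
toy_sequence = "MKK" + "L" * 20 + "GR" + "L" * 20 + "KRK"
print(kr_bias(toy_sequence, [(3, 23), (25, 45)]))
```

A strongly positive or negative value indicates a fixed orientation under the positive-inside rule; a value near zero is what is expected for a dual-topology candidate.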
Without solving their 3D structures, how can one prove that a protein has dual
topology?
Such a protein would be particularly sensitive to the addition or removal of a single
positively charged residue in a loop or tail.
Measure the activities of two different, C-terminally fused reporter proteins:
PhoA (only enzymatically active when in the periplasm)
GFP (fluorescent only when in the cytoplasm).
Concentrate on N-terminus and first loop.
Dual-topology proteins?
Rapp et al., Nat.Struct.Biol. 13, 112 (2006)
(a) The wt YdgE-PhoA fusion is active, the wt YdgE-GFP fusion is inactive → C-terminus in periplasm (Cout).
wt YdgF behaves oppositely (Cin).
These 2 proteins are topologically stable.
(b – d) The C-terminal orientation of EmrE, SugE, CrcB, YnfA and YdgC is highly sensitive to charge mutations.
For 14 of 19 charge mutations, both PhoA and GFP activities change in the direction expected from the change in (K + R) bias.
Charge mutations shift the orientations of dual-topology TM proteins
Rapp et al., Nat.Struct.Biol. 13, 112 (2006)
Experimental techniques to study orientation of proteins in membranes are:
- chemical modification
- spin-labeling
- fluorescence quenching
- X-ray scattering
- neutron diffraction
- electron cryomicroscopy
- NMR
- polarized infrared spectroscopy.
It is very desirable to complement them by computational methods.
- e.g. explicit-solvent molecular dynamics simulations
- here: a simplified approach that minimizes the protein transfer energy from water to a hydrophobic slab used as a membrane model.
(4) Positioning of proteins in membranes – OPM database
Adamian & Liang, Proteins 63, 1 (2006)
important parameters
Lomize et al. Prot.Sci. 15, 1318 (2006)
Model protein as a rigid body that freely floats in the planar hydrocarbon core of
a lipid bilayer.
Calculation of transfer energy
Adamian & Liang, Proteins 63, 1 (2006)
$$\Delta G_{transfer}(z_0, d, \varphi, \tau) = \sum_i \sigma_i^{W-M} \, ASA_i \, f(z_i)$$
ASA_i: accessible surface area of atom i
σ_i^{W-M}: solvation parameter of atom i (transfer energy of the atom from water to membrane interior in kcal/(mol·Å²))
f(z_i): interfacial water concentration profile with λ = 0.9 Å
$$f(z_i) = \frac{1}{1 + e^{(|z_i| - z_0)/\lambda}}$$
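The transfer energy sum can be evaluated in a few lines. The sketch assumes that the atom coordinates have already been shifted and rotated into the membrane frame, so that only the z coordinate of each atom and the half-thickness z0 enter. The atom list and the solvation parameters are invented, not the published values.

```python
import math

LAMBDA = 0.9  # Å, characteristic width of the interfacial profile


def interfacial_profile(z, z0, lam=LAMBDA):
    """Sigmoidal profile: close to 1 inside the slab, 0.5 at |z| = z0, close to 0 outside."""
    exponent = min((abs(z) - z0) / lam, 700.0)  # clamp to avoid overflow in exp
    return 1.0 / (1.0 + math.exp(exponent))


def transfer_energy(atoms, z0):
    """Sum of ASA_i * sigma_i * f(z_i) over all atoms, in kcal/mol."""
    return sum(asa * sigma * interfacial_profile(z, z0) for asa, sigma, z in atoms)


# Hypothetical atoms: (ASA in Å^2, sigma in kcal/(mol Å^2), z in Å)
toy_atoms = [
    (30.0, -0.02, 2.0),   # carbon deep in the slab: favourable
    (25.0, -0.02, 14.5),  # carbon near the boundary: roughly half weight
    (20.0, 0.05, 25.0),   # polar atom outside the slab: contributes almost nothing
]
print(round(transfer_energy(toy_atoms, z0=15.0), 3))
```

Minimizing this quantity over the position and orientation of the protein then yields its predicted placement in the slab.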
ionization of charged residues
Residues that are typically charged in soluble proteins may become neutral in the
hydrophobic inside of the bilayer!
The ionization/protonation energies of charged residues are described by the Henderson-Hasselbalch equation:
$$\Delta G_{ioniz} = 2.3\,RT\,(\mathrm{pH} - \mathrm{p}K_a)$$
evaluated at pH = 7 (the table lists the magnitudes, with RT ≈ 0.6 kcal/mol).
Lomize et al. Prot.Sci. 15, 1318 (2006)

Residue   average pKa value in proteins   ΔG_ioniz [kcal/mol]
Arg       12.0                            6.9
Lys       10.4                            4.7
Asp       3.4                             4.9
Glu       4.1                             4.0
His       6.6                             0.6
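The tabulated energies can be reproduced from the pKa values. The sketch assumes T = 298 K and reports the magnitude of 2.3 RT (pH - pKa); the small deviations from the table come from rounding of RT.

```python
R_KCAL = 0.0019872  # gas constant in kcal/(mol K)
TEMPERATURE = 298.0  # K (assumed)

AVERAGE_PKA = {"Arg": 12.0, "Lys": 10.4, "Asp": 3.4, "Glu": 4.1, "His": 6.6}
TABLE_VALUES = {"Arg": 6.9, "Lys": 4.7, "Asp": 4.9, "Glu": 4.0, "His": 0.6}


def ionization_energy(pka, ph=7.0, temperature=TEMPERATURE):
    """Magnitude of 2.3 RT (pH - pKa) in kcal/mol."""
    return abs(2.3 * R_KCAL * temperature * (ph - pka))


computed = {res: ionization_energy(pka) for res, pka in AVERAGE_PKA.items()}
for res, value in computed.items():
    print(res, round(value, 2), TABLE_VALUES[res])
```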
Use a deterministic 2-step search strategy:
(1) grid scan to determine a set of low-energy combinations of the variables z0, d, φ, τ; grid steps: 0.5 Å for z0 and d, 5° for φ, 2° for τ
(2) local energy minimization (Davidon-Fletcher-Powell method) starting from low-energy points
Also consider the energetically best rotation of solvent-exposed charged side chains (e.g. Lys and Arg) that are situated close to the calculated boundaries and could be rotated away from the hydrophobic core ("snorkeling").
Global energy optimization
Adamian & Liang, Proteins 63, 1 (2006)
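The two-step strategy can be illustrated on a toy problem with only two variables. The energy function is an invented quadratic bowl, and the local step is a simple pattern search with step halving, swapped in here for the Davidon-Fletcher-Powell method to keep the sketch dependency-free.

```python
def toy_energy(z0, d):
    """Hypothetical smooth energy surface with its minimum at z0 = 15.2, d = 1.3."""
    return (z0 - 15.2) ** 2 + 2.0 * (d - 1.3) ** 2


def grid_scan(energy, z0_values, d_values, keep=3):
    """Step 1: evaluate every grid point and keep the lowest-energy ones."""
    points = [(energy(z0, d), z0, d) for z0 in z0_values for d in d_values]
    return sorted(points)[:keep]


def refine(energy, z0, d, step=0.25, tol=1e-4):
    """Step 2: local pattern search, halving the step whenever no move helps."""
    best = energy(z0, d)
    while step > tol:
        moved = False
        for dz, dd in ((step, 0.0), (-step, 0.0), (0.0, step), (0.0, -step)):
            trial = energy(z0 + dz, d + dd)
            if trial < best:
                best, z0, d, moved = trial, z0 + dz, d + dd, True
        if not moved:
            step /= 2.0
    return best, z0, d


z0_grid = [10.0 + 0.5 * i for i in range(21)]  # 10 to 20 Å in 0.5 Å steps
d_grid = [-5.0 + 0.5 * i for i in range(21)]   # -5 to 5 Å in 0.5 Å steps
starts = grid_scan(toy_energy, z0_grid, d_grid)
best_energy, best_z0, best_d = min(refine(toy_energy, z0, d) for _, z0, d in starts)
print(round(best_z0, 3), round(best_d, 3))
```

Starting the local step from several grid points rather than only the best one reduces the risk of ending in a local minimum on a rugged surface.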
Which solvation parameters to use?
Results obtained with cyclohexane (chx) and decadiene (dcd) solvation parameters agree well with experiment; octanol (oct) parameters agree poorly.
Lomize et al. Prot.Sci. 15, 1318 (2006)
Pay attention to …
Slightly different parameter sets should be applied for proteins in detergents and bilayers.
ΔG_transfer should not include contributions of atoms that face internal polar cavities of TM proteins and that do not directly interact with surrounding bulk lipid.
Otherwise, the orientation of many β-barrels and pore-forming transporters would be computed incorrectly.
Lomize et al. Prot.Sci. 15, 1318 (2006)
Main features of model
Necessary and sufficient approximations for reproducing the experimental data:
(1) lipid bilayer is represented as planar hydrophobic slab with adjustable
thickness and a narrow interfacial area with a sigmoidal polarity profile
(2) proteins are considered as rigid bodies with flexible side chains; their transfer
energies are minimized with respect to 4 variables
(3) transfer free energy is calculated at an all-atom level using atomic solvation
parameters determined for the water-decadiene system
(4) neglect explicit electrostatic interactions, account for neutralization of charged
residues
(5) eliminate contributions of pore-facing atoms
Lomize et al. Prot.Sci. 15, 1318 (2006)
parameters of model
The model depends only on 5 atomic solvation parameters (N, O, S, sp2 C, sp3 C), one constant λ, and the ionization energies of charged groups.
All can be obtained independently from experimental sources.
The method was verified for 24 TM proteins of known 3D structure whose spatial positions in bilayers have been studied experimentally.
Lomize et al. Prot.Sci. 15, 1318 (2006)
Average tilt angles
(a) hydrophobic thickness matches well (table 2)
Lomize et al. Prot.Sci. 15, 1318 (2006)
(b) the calculated tilt values are in excellent agreement with NMR data; they also correlate well with ATR-FTIR data (table 3), although the exp. values are systematically larger (orientational disorder in the experiments?)
Membrane penetration depths
Lomize et al. Prot.Sci. 15, 1318 (2006)
Biological membranes differ
Lomize et al. Prot.Sci. 15, 1318 (2006)
Membrane penetration depths
Lomize et al. Prot.Sci. 15, 1318 (2006)
Membrane core boundaries
Lomize et al. Prot.Sci. 15, 1318 (2006)
Additional slides
application to all other 109 TM protein complexes
80 α-helical
28 β-barrels
gramicidin dimer
control set:
20 water-soluble proteins
32 monotopic and peripheral proteins
Application to all TM proteins from the PDB
Lomize et al. Prot.Sci. 15, 1318 (2006)
Peripheral and monotopic
proteins have low penetration
depths.
Calculated tilt angles vary
from 0° - 6°.
TM proteins tend to be
nearly perpendicular to the
membrane, although the
individual helices are on
average tilted by 21°.
Application to membrane proteins
Lomize et al. Prot.Sci. 15, 1318 (2006)
Global topology analysis of the E. coli inner membrane proteome showed that ca. 20 – 25% of the TM proteins have ≥ 10 TM helices.
These are often involved in transport of small molecules across the membrane.
Many of these proteins will have buried helices. Can we identify those?
Develop an empirical helix burial function f based on a few assumptions.
(i) residues in buried helices are more conserved because of structural and functional constraints.
(ii) the residue composition of the buried helices is different from the composition of helices facing the lipid environment.
(iii) the difference between the minimal and maximal values of conservation entropy for every position in MSAs of TM helices should be smaller in buried helices than in lipid-exposed helices because of the homogeneous environment.
(4) Prediction of buried TM helices
Adamian & Liang, Proteins 63, 1 (2006)
f: burial function
s: average entropy of all residue positions of the TM helix
l : average lipophilicity
k: sorted entropy values of all residue positions in a helix of length d for helices
1 ... n of the TM protein
Problems: the average entropy depends on the number of sequences in the MSA, so the method needs MSAs with exactly the same set of sequences from the same set of species.
Also, the stability of different membrane proteins in the lipid environment may be
different.
Account for ambiguity in the definition of TM helix ends.
Burial Function
Adamian & Liang, Proteins 63, 1 (2006)
$$f = f(k, \bar{s}, \bar{l})$$
$$\bar{s} = \frac{s_1 + s_2 + \dots + s_d}{d}, \qquad \bar{l} = \frac{l_1 + l_2 + \dots + l_d}{d}$$
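The two averages entering the burial function can be computed from an alignment of one TM helix. The sketch uses Shannon entropy per alignment column and takes the lipophilicity of the first (query) sequence. The lipophilicity values and the alignment are invented, and the sketch stops short of computing f itself.

```python
import math
from collections import Counter

# Hypothetical lipophilicity values for illustration only (not a published scale)
TOY_LIPOPHILICITY = {"L": 1.0, "A": 0.3, "V": 0.9, "I": 1.0, "G": 0.0, "K": -1.0, "R": -1.0}


def column_entropy(column):
    """Shannon entropy (natural log) of the residues in one alignment column."""
    n = len(column)
    return -sum((c / n) * math.log(c / n) for c in Counter(column).values())


def helix_averages(msa_rows, lipophilicity):
    """Return (average entropy, average lipophilicity) for a helix of length d."""
    d = len(msa_rows[0])
    entropies = [column_entropy([row[i] for row in msa_rows]) for i in range(d)]
    s_bar = sum(entropies) / d
    l_bar = sum(lipophilicity[aa] for aa in msa_rows[0]) / d
    return s_bar, l_bar


# Hypothetical alignment of a 4-residue helix fragment in 3 species
toy_msa = ["LAVK", "LGVR", "LAIK"]
s_bar, l_bar = helix_averages(toy_msa, TOY_LIPOPHILICITY)
print(round(s_bar, 3), round(l_bar, 3))
```

Because the entropy of a column depends on how many sequences are aligned, averages from alignments of different depth are not directly comparable.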
Ranking of TM helices by burial function and robustness
Adamian & Liang, Proteins 63, 1 (2006)
(a) TM helices TM4, TM5, TM6, TM8 form core, consistent with prediction.
(b) TM4, TM10 are most buried.
(c) one can explain prediction of TM8 as buried by considering a tightly bound
cardiolipin molecule identified in the X-ray structure.
Examples of buried TM helices that are correctly predicted
Adamian & Liang, Proteins 63, 1 (2006)
Is the method applicable to TM
proteins where only sequence data
is available?
Test on structure of Leu transporter.
TMHMM predicts 12 TM helices.
Good overlap with X-ray helices.
Problem: no additional sequences exist that are annotated as Na+-dependent Leu transporters.
LeuTAa has 3 significantly buried
helices: 1, 6 and 8.
1 and 6 are true positives, 2 is a
false positive, 8 is a false negative.
Test ranking results
Adamian & Liang, Proteins 63, 1 (2006)
Pfam searches in 174 fully sequenced bacterial genomes for homologs (E < 10^-10) to SugE, EmrE, YdgE, CrcB, YnfA, YdgC and YdgO/YdgL.
Create multiple sequence alignment with ClustalW.
Use TMHMM to predict the positions of TM helices.
Obtain consensus TM helix prediction, compute (K + R) biases for individual
proteins. 10 residues from each of the flanking TM helices were included to allow
for possible misprediction of the exact positions of the loop ends.
Dual-topology homologs occur as gene pairs or singletons
Rapp et al., Nat.Struct.Biol. 13, 112 (2006)
Interpretation: SMR and CrcB occur as closely spaced pairs or as singletons.
Paired genes encode homologous proteins with opposite (K + R) bias.
Dual-topology homologs occur as gene pairs or singletons
Rapp et al., Nat.Struct.Biol. 13, 112 (2006)
Most likely evolutionary scenario:
a single dual-topology protein
undergoes gene duplication, the
two resulting proteins become
fixed in opposite orientations and
finally fuse into a single
polypeptide.
An internally duplicated protein with opposite topology
Rapp et al., Nat.Struct.Biol. 13, 112 (2006)