Secondary Structure & Solvent Accessible Surface Calculation
Lecture 6
Structural Bioinformatics, Dr. Avraham Samson
81-871
DSSP
2012 Avraham Samson - Faculty of Medicine - Bar Ilan University
Dictionary of protein secondary structure: Pattern recognition of hydrogen-bonded and geometrical features
Wolfgang Kabsch, Christian Sander
Biopolymers, Volume 22, Issue 12, pages 2577–2637, December 1983
Hydrogen bond donors and acceptors
the amide nitrogen:
main-chain hydrogen bond donor
the carbonyl oxygen:
main-chain hydrogen bond acceptor
there are also side-chain acceptors and donors
[Figure: segment of a polypeptide chain showing backbone N-H donors, C=O acceptors and side-chain groups]
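Kabsch and Sander's DSSP paper (cited above) decides whether a backbone N-H donor and C=O acceptor form a hydrogen bond using a simple electrostatic energy: partial charges of 0.42e on C/O and 0.20e on N/H, a factor of 332 to get kcal/mol, and a cutoff of -0.5 kcal/mol. The sketch below evaluates that energy for one donor/acceptor pair. It assumes the H coordinates are known, and the test geometry is a hypothetical, idealized linear arrangement.

```python
import math

# Partial charges of the Kabsch & Sander (1983) model, in units of e:
# +Q1 on the carbonyl C, -Q1 on O; -Q2 on the amide N, +Q2 on H.
Q1 = 0.42
Q2 = 0.20
F = 332.0            # converts e^2/angstrom into kcal/mol
HBOND_CUTOFF = -0.5  # kcal/mol; a bond is assigned when E falls below this

Point = tuple[float, float, float]


def hbond_energy(n: Point, h: Point, c: Point, o: Point) -> float:
    """Electrostatic energy (kcal/mol) between an N-H donor and a C=O acceptor.

    Coordinates in angstrom. Like charges (O/N, C/H) repel, unlike ones attract.
    """
    r_on = math.dist(o, n)
    r_ch = math.dist(c, h)
    r_oh = math.dist(o, h)
    r_cn = math.dist(c, n)
    return Q1 * Q2 * F * (1 / r_on + 1 / r_ch - 1 / r_oh - 1 / r_cn)


def is_hbond(n: Point, h: Point, c: Point, o: Point) -> bool:
    return hbond_energy(n, h, c, o) < HBOND_CUTOFF


# Hypothetical linear N-H...O=C geometry with N...O = 2.9 angstrom.
e = hbond_energy(n=(0, 0, 0), h=(1.0, 0, 0), c=(4.13, 0, 0), o=(2.9, 0, 0))
print(round(e, 2), is_hbond((0, 0, 0), (1.0, 0, 0), (4.13, 0, 0), (2.9, 0, 0)))
```

This geometry gives about -2.9 kcal/mol, well below the cutoff. Many crystal structures lack hydrogens; DSSP itself places each amide H 1.0 Å from N, parallel to the preceding residue's C=O bond.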
Dihedral angle calculation
The book "Crystal Structure Analysis for Chemists and Biologists" by Jenny P. Glusker gives four different ways of calculating the dihedral angle (pp. 465-469). Probably the most direct is:
Consider the four-atom chain 1 - 2 - 3 - 4.
The distance between atoms i and j is denoted dij.
For example, d13 is the distance between atoms 1 and 3. Since you already have Cartesian coordinates, this is easily calculated as SQRT( SQ(x3-x1) + SQ(y3-y1) + SQ(z3-z1) ), where SQ(x) denotes x squared.
The dihedral angle is then given by cos(angle) = P/SQRT(Q)
where P = SQ(d12) * ( SQ(d23)+SQ(d34)-SQ(d24)) + SQ(d23) * (-SQ(d23)+SQ(d34)+SQ(d24)) + SQ(d13) * ( SQ(d23)-SQ(d34)+SQ(d24)) - 2 * SQ(d23) * SQ(d14)
and Q = (d12 + d23 + d13) * ( d12 + d23 - d13) * (d12 - d23 + d13) * (-d12 + d23 + d13 ) * (d23 + d34 + d24) * ( d23 + d34 - d24 ) * (d23 - d34 + d24) * (-d23 + d34 + d24 )
Because only the cosine is computed, this gives the magnitude of the angle (0 to 180 degrees), not its sign.
A test case: d12 = 2.38, d23 = 1.48, d34 = 1.48, d13 = 3.56, d14 = 3.61, d24 = 2.40
P = 20.83, SQRT(Q) = 21.40, angle = 13.3 degrees
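The distance formula above translates directly into code. The sketch below implements it, and for comparison adds the usual cross-product route from Cartesian coordinates, which also recovers the sign of the angle (needed for phi/psi). Both function names are this sketch's own.

```python
import math


def dihedral_from_distances(d12, d23, d34, d13, d14, d24):
    """Magnitude (0-180 degrees) of the dihedral angle 1-2-3-4 computed from
    the six interatomic distances via cos(angle) = P / sqrt(Q)."""
    def sq(x):
        return x * x

    p = (sq(d12) * (sq(d23) + sq(d34) - sq(d24))
         + sq(d23) * (-sq(d23) + sq(d34) + sq(d24))
         + sq(d13) * (sq(d23) - sq(d34) + sq(d24))
         - 2 * sq(d23) * sq(d14))
    q = ((d12 + d23 + d13) * (d12 + d23 - d13) * (d12 - d23 + d13) * (-d12 + d23 + d13)
         * (d23 + d34 + d24) * (d23 + d34 - d24) * (d23 - d34 + d24) * (-d23 + d34 + d24))
    cos_angle = max(-1.0, min(1.0, p / math.sqrt(q)))  # guard against rounding
    return math.degrees(math.acos(cos_angle))


def dihedral_from_coords(p1, p2, p3, p4):
    """Signed dihedral angle (-180 to 180 degrees) from Cartesian coordinates."""
    def sub(a, b):
        return [a[i] - b[i] for i in range(3)]

    def cross(a, b):
        return [a[1] * b[2] - a[2] * b[1],
                a[2] * b[0] - a[0] * b[2],
                a[0] * b[1] - a[1] * b[0]]

    def dot(a, b):
        return sum(a[i] * b[i] for i in range(3))

    b1, b2, b3 = sub(p2, p1), sub(p3, p2), sub(p4, p3)
    n1, n2 = cross(b1, b2), cross(b2, b3)   # normals of planes 1-2-3 and 2-3-4
    y = math.sqrt(dot(b2, b2)) * dot(b1, n2)  # |b2| * b1 . (b2 x b3)
    x = dot(n1, n2)
    return math.degrees(math.atan2(y, x))


# Test case from the text above.
print(round(dihedral_from_distances(2.38, 1.48, 1.48, 3.56, 3.61, 2.40), 1))
```

Evaluating the formula directly on the six test-case distances gives P ≈ 20.71, SQRT(Q) ≈ 21.37 and an angle of about 14.2 degrees, close to but not identical with the figures quoted above; the small gap may come from rounding in the original inputs.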
More details
• SS-bonds (disulfide bridges)
• Chain breaks
• Handedness (chirality)
• PyMOL and MOLMOL use DSSP to assign secondary structure
Nelson et al. (Eisenberg lab), Nature 435:773 (2005). For background on "polar zippers": Perutz et al., PNAS 91:5355 (1994). Fibrils of this type are important in Huntington's disease and other neurodegenerative diseases.
Amyloid-like fibril (left) of the peptide GNNQNNY from the yeast prion protein Sup35, and its atomic structure (right).
Because of the repetitive nature of secondary structures, particularly beta-sheets, proteins can form fibrillar structures and aggregates.
[Figure labels: amide stacks, fibril axis]
In the case of this fibril, the side chains also hydrogen bond to each other.
Fibrillar helical structures: the leucine zipper
GCN4 "leucine zipper" (green) bound as a dimer (two copies of the polypeptide) to target DNA.
The GCN4 dimer is formed through hydrophobic interactions between leucines (red) in the two polypeptide chains.
DSSP code:
H = alpha helix
G = 3-helix (3/10 helix)
I = 5-helix (pi helix)
B = residue in isolated beta-bridge
E = extended strand, participates in beta ladder
T = hydrogen-bonded turn
S = bend
blank = loop
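DSSP reports one of these letters per residue, so a chain's assignment is simply a string. A small sketch, assuming such a string is already in hand (the example string is hypothetical), that collapses it into contiguous segments labeled with the meanings listed above:

```python
from itertools import groupby

# One-letter DSSP codes as listed above; a blank (space) means loop.
DSSP_CODES = {
    "H": "alpha helix",
    "G": "3/10 helix",
    "I": "pi helix",
    "B": "isolated beta-bridge",
    "E": "extended strand",
    "T": "hydrogen-bonded turn",
    "S": "bend",
    " ": "loop",
}


def segments(ss):
    """Split a per-residue DSSP string into runs of identical codes.

    Returns (code, first, last, description) tuples, residue positions 1-based.
    """
    runs = []
    start = 1
    for code, group in groupby(ss):
        length = sum(1 for _ in group)
        runs.append((code, start, start + length - 1, DSSP_CODES.get(code, "unknown")))
        start += length
    return runs


# Hypothetical assignment string, one character per residue.
for run in segments("  EEEE TT HHHHHH "):
    if run[0] != " ":
        print(run)
```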
• Question: How would you assign structural neighbors (< 5 Å) from a PDB file?
• Answer: Parse the atomic coordinates from the PDB file and list the atoms (and hence residues) that lie less than 5 Å apart.
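The answer can be sketched in a few lines. This version reads ATOM/HETATM records by the fixed PDB column positions and reports residue pairs with any atom pair closer than the cutoff. It uses a plain all-pairs search, which is fine for small proteins; large structures would normally use a spatial grid instead.

```python
import math


def read_atoms(pdb_text):
    """Parse ATOM/HETATM records using the fixed-column PDB format."""
    atoms = []
    for line in pdb_text.splitlines():
        if line.startswith(("ATOM  ", "HETATM")):
            atoms.append({
                "name": line[12:16].strip(),
                "resname": line[17:20].strip(),
                # (chain ID, residue number plus insertion code)
                "residue": (line[21], line[22:27].strip()),
                "xyz": (float(line[30:38]), float(line[38:46]), float(line[46:54])),
            })
    return atoms


def residue_neighbors(atoms, cutoff=5.0):
    """Pairs of residues with at least one atom pair closer than cutoff (angstrom).

    All-pairs O(N^2) search.
    """
    pairs = set()
    for i, a in enumerate(atoms):
        for b in atoms[i + 1:]:
            if a["residue"] != b["residue"] and math.dist(a["xyz"], b["xyz"]) < cutoff:
                pairs.add(tuple(sorted((a["residue"], b["residue"]))))
    return pairs
```

Usage would look like `residue_neighbors(read_atoms(open("1l4w.pdb").read()))`, where the file name is only an example of a locally downloaded entry.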
Contact maps of protein structures
1avg--structure of triabin
map of Cα-Cα distances < 6 Å
near diagonal: local contacts in the sequence
off-diagonal: long-range (nonlocal) contacts
rainbow ribbon diagram, blue to red: N to C
both axes are the sequence of the protein
Contact maps of protein structures
Structure of N15 Cro
both axes are the sequence of the protein
rainbow ribbon diagram, blue to red: N to C
map of Cα-Cα distances < 6 Å
Contact maps of protein structures
Structure of N15 Cro
both axes are the sequence of the protein
rainbow ribbon diagram, blue to red: N to C
map of all heavy atom distances < 6 Å (includes side chains)
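A contact map like these is just a thresholded distance matrix. The sketch below assumes one representative coordinate per residue (for example the Cα) is already available and uses the 6 Å threshold from the maps above. The `min_separation` split between "local" and "long-range" contacts is a hypothetical parameter chosen for illustration.

```python
import math


def contact_map(coords, cutoff=6.0):
    """N x N boolean matrix: True where residues i and j are closer than cutoff
    (angstrom), using one representative atom (e.g. C-alpha) per residue."""
    n = len(coords)
    return [[math.dist(coords[i], coords[j]) < cutoff for j in range(n)] for i in range(n)]


def split_contacts(cmap, min_separation=6):
    """Separate near-diagonal (local) contacts from long-range ones.

    min_separation is a hypothetical sequence-separation threshold.
    """
    local, long_range = [], []
    n = len(cmap)
    for i in range(n):
        for j in range(i + 1, n):  # upper triangle only; the map is symmetric
            if cmap[i][j]:
                (local if j - i < min_separation else long_range).append((i, j))
    return local, long_range
```

The same code applies to the heavy-atom map by taking the minimum over all atom pairs per residue pair, or to predicted contact maps with a different threshold (8 Å is used in the prediction work cited later).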
Surface and interior of globular proteins
solvent accessible surface
molecular surface
residue fractional accessibility
pockets and cavities
“hydrophobic core”
ordered waters in protein structures
“Accessible Surface”
Lee & Richards, 1971; Shrake & Rupley, 1973
represent atoms as spheres with appropriate radii and eliminate overlapping parts...
mathematically roll a sphere all around that surface...
the sphere's center traces out a surface as it rolls...
Now look at a cross-section (slice) of a protein structure. The inner surfaces here are van der Waals surfaces. The outer surface is the one traced out by the center of the sphere as it rolls around the van der Waals surface. If any part of the arc around a given atom is traced out, that atom is accessible to solvent. The solvent accessible surface of an atom is defined as the sum of the arcs traced around it.
[Figure: cross-section from Lee & Richards, 1971, showing the van der Waals surface, the solvent accessible surface, and the arc traced around one atom. There is not much solvent accessible surface in the middle.]
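Shrake & Rupley's numerical approach replaces the arcs with sampled points. Each atom's sphere is expanded by the probe radius, points are placed on it, and the points that fall inside no neighbor's expanded sphere are counted. The exposed fraction times the sphere's area gives the atom's accessible area. The following is a minimal sketch that assumes a 1.4 Å water probe and caller-supplied radii. The golden-section spiral used to place points is this sketch's own choice, not necessarily the point set of the original paper.

```python
import math

PROBE = 1.4  # assumed water probe radius (angstrom)


def sphere_points(n):
    """n roughly uniform points on a unit sphere (golden-section spiral)."""
    pts = []
    inc = math.pi * (3 - math.sqrt(5))
    for k in range(n):
        z = 1 - 2 * (k + 0.5) / n  # evenly spaced z gives equal-area bands
        r = math.sqrt(1 - z * z)
        phi = k * inc
        pts.append((r * math.cos(phi), r * math.sin(phi), z))
    return pts


def shrake_rupley(centers, radii, probe=PROBE, n_points=960):
    """Per-atom solvent accessible surface area in square angstrom."""
    unit = sphere_points(n_points)
    areas = []
    for i, (ci, ri) in enumerate(zip(centers, radii)):
        big_r = ri + probe  # expanded sphere traced by the probe center
        # only atoms whose expanded spheres overlap this one can bury its points
        nbrs = [j for j in range(len(centers))
                if j != i and math.dist(ci, centers[j]) < big_r + radii[j] + probe]
        exposed = 0
        for ux, uy, uz in unit:
            p = (ci[0] + big_r * ux, ci[1] + big_r * uy, ci[2] + big_r * uz)
            if all(math.dist(p, centers[j]) >= radii[j] + probe for j in nbrs):
                exposed += 1
        areas.append(4 * math.pi * big_r ** 2 * exposed / n_points)
    return areas
```

An isolated atom gets the full area of its expanded sphere; as neighbors crowd in, fewer points survive. More points trade speed for accuracy.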
“Accessible surface”/“Molecular surface”
note: these are alternative ways of representing the same reality: the surface that is essentially in contact with solvent
• molecular and accessible surfaces are both useful representations, but molecular surface is more closely related to the actual atomic surfaces. This makes it somewhat better for visualizing the texture of the outer surface, as well as for assessing the shape and volume of any internal cavities.
• you will hear the term Connolly surface used often, after Michael Connolly. A Connolly surface is a particular way of calculating the molecular surface. The accessible surface is also occasionally called the Richards surface, after Fred Richards.
Molecular surface of proteins
depiction of heavy atoms (O, N, C, S) in a protein as van der Waals spheres
depiction of the corresponding "molecular surface": the volume contained by this surface is the vdW volume plus the "interstitial volume" (the spaces in between)
The irregular surface of proteins: pockets and cavities
• a pocket is an empty concavity on a protein surface which is accessible to solvent from the outside.
• a cavity or void in a protein is a pocket which has no opening to the outside. It is an interior empty space inside the protein.
Pockets and cavities can be critical features of proteins in terms of their binding behavior, and identifying them is usually a first step in structure-based ligand design etc.
Fractional accessibility
• calculate total solvent accessible surface of protein structure (also can calculate solvent accessible surface for individual residues/sidechains within the protein)
• can also model the accessible surface area in a disordered or unfolded protein using accessible surface area calculations on model tripeptides such as Ala-X-Ala or Gly-X-Gly.
• from these we can calculate what fraction of the surface is buried (inaccessible to solvent) by virtue of being within the folded, native structure of the protein.
• this is done by dividing the accessible surface area in the native protein structure by the accessible surface in the modelled unfolded protein. That’s the fractional accessibility. The residue fractional accessibility and side chain fractional accessibility refer to the same thing calculated for individual residues/sidechains within the structure.
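The ratio described above is simple to compute once both areas are known. This sketch takes per-residue areas in the native structure and in an unfolded model (the numbers are hypothetical) and returns residue and whole-protein fractional accessibilities. It also counts buried residues at the 0% and 5% cutoffs used in the buried-residue table below.

```python
def fractional_accessibility(asa_native, asa_reference):
    """Accessible area in the folded structure divided by the area in an
    unfolded model (e.g. the residue in Gly-X-Gly or Ala-X-Ala)."""
    return asa_native / asa_reference


def buried_fraction(rel_acc, cutoff):
    """Fraction of residues whose fractional accessibility is at or below cutoff
    (cutoff=0.0: completely buried; cutoff=0.05: the 5% ASA criterion)."""
    return sum(1 for r in rel_acc if r <= cutoff) / len(rel_acc)


# Hypothetical per-residue areas in square angstrom: (native, unfolded model).
areas = {"res1": (0.0, 180.0), "res2": (6.0, 190.0), "res3": (95.0, 200.0)}
rel = {name: fractional_accessibility(n, ref) for name, (n, ref) in areas.items()}
total = fractional_accessibility(sum(n for n, _ in areas.values()),
                                 sum(ref for _, ref in areas.values()))
print(rel, round(total, 3))
print(buried_fraction(list(rel.values()), 0.0), buried_fraction(list(rel.values()), 0.05))
```

Note that the whole-protein value is a ratio of summed areas, not the mean of the per-residue ratios.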
Accessible surface area in globular protein structures
Accessible surface area $A_s$ in native states of proteins is a non-linear function of molecular weight (Miller, Janin, Lesk & Chothia, 1987):
$A_s = 6.3\,M_r^{0.73}$
where $M_r$ is the molecular weight.
This is an empirical correlation, but it comes close to the expected two-thirds power law relating surface area to volume or mass for a set of bodies of similar shape and density.
How much surface area is buried when a protein adopts its native structure in solution?
• Estimate the total accessible surface area of an extended/disordered polypeptide chain using the accessible surface areas in Gly-X-Gly or Ala-X-Ala models. This is a linear function of molecular weight:
$A_t = 1.48\,M_r + 21$
• The total fractional accessibility is $A_s/A_t$, and the fraction of surface area buried is $1 - A_s/A_t$.
• What is the total fractional surface area buried for a protein of molecular weight 10,000? 20,000? Is the fraction higher for small proteins or large?
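The exercise can be checked numerically by plugging the two empirical relations of Miller et al. into the buried-fraction expression. This sketch assumes molecular weight in daltons with both areas in the same units (Å²), as the fits are usually quoted.

```python
def asa_native(mr):
    """Native-state accessible area, A_s = 6.3 * Mr**0.73 (Miller et al., 1987)."""
    return 6.3 * mr ** 0.73


def asa_unfolded(mr):
    """Extended-chain accessible area, A_t = 1.48 * Mr + 21."""
    return 1.48 * mr + 21


def fraction_buried(mr):
    """Fraction of the unfolded-state surface that is buried on folding."""
    return 1 - asa_native(mr) / asa_unfolded(mr)


for mr in (10_000, 20_000):
    print(mr, round(fraction_buried(mr), 3))
```

This gives roughly 0.65 buried at 10,000 and 0.71 at 20,000. Because $A_s$ grows more slowly than $A_t$, larger proteins bury a larger fraction of their surface, in line with the buried-residue table below.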
Distribution of residue fractional accessibilities
note the broad distribution among non-buried residues, and a mean fractional accessibility for non-buried residues of around 0.5
note that few residues are completely exposed to solvent, but that a fractional accessibility of > 1 is possible
from Miller et al., 1987
note that a sizeable group are completely buried (hatched) or nearly completely buried
Buried residues in proteins
size class   mean Mr   fraction of buried residues
                       0% ASA    5% ASA
small        8000      0.070     0.154
medium       16000     0.107     0.240
large        25000     0.139     0.309
XL           34000     0.155     0.324
all          -         0.118     0.257
• The fraction of buried residues (defined by 0% or 5% ASA cutoffs) increases with molecular weight. With the 5% cutoff, around 25% of the residues of an average protein are buried. These form the core.
Residue fractional accessibility correlates with free energies of transfer for amino acids between water and organic solvents
• (Miller, Janin, Lesk & Chothia, 1987)
• (Fauchere & Pliska, 1983)
• The interior of a protein is akin to a nonpolar solvent in which the nonpolar side chains are buried. Polar side chains, on the other hand, are usually on the surface. However, some polar side chains do get buried, and remember that the backbone of every residue is polar, including residues with nonpolar side chains. So many polar moieties do get buried in proteins.
The hydrophobic core of a small protein: N15 Cro
0% ASA: Pro 3, Leu 6, Ala 16, Val 27, Ile 36, Ile 44
< 5% ASA: Met 1, Ala 17, Val 20, Gln 41, Ser 54
11 of 66 ordered residues have less than 5% ASA
note that some polar residues are buried
The outer surface: water in protein structures
Structures of water-soluble proteins determined at reasonably high resolution will be decorated on their outer surfaces with water molecules (cyan balls) with relatively well-defined positions, and waters may also occur internally
Water is not just surrounding the protein--it is interacting with it
Water interacts with protein surfaces
second-shell waters: only contact other waters
first-shell waters: in contact with / hydrogen bonded to the protein
Most waters visible in crystal structures make hydrogen bonds to each other and/or to the protein, as donor/acceptor/both
REM --------------- Detailed secondary structure assignment------------- 1L4W
REM 1L4W
REM |---Residue---| |--Structure--| |-Phi-| |-Psi-| |-Area-| 1L4W
ASG ILE A 1 1 C Coil 360.00 168.01 69.6 1L4W
ASG VAL A 2 2 E Strand -97.71 163.93 42.5 1L4W
ASG CYS A 3 3 E Strand -164.52 149.74 1.4 1L4W
ASG HIS A 4 4 E Strand -98.82 174.84 39.5 1L4W
ASG THR A 5 5 E Strand -171.97 161.21 25.5 1L4W
ASG THR A 6 6 E Strand -119.23 98.92 13.1 1L4W
ASG ALA A 7 7 C Coil -159.51 -46.53 10.0 1L4W
ASG THR A 8 8 T Turn -76.14 -145.16 41.5 1L4W
ASG SER A 9 9 T Turn -67.19 -64.98 58.7 1L4W
ASG PRO A 10 10 T Turn -98.83 -165.54 75.7 1L4W
ASG ILE A 11 11 E Strand -63.95 136.61 71.6 1L4W
ASG SER A 12 12 E Strand -95.58 151.90 4.8 1L4W
ASG ALA A 13 13 E Strand -149.03 116.85 55.7 1L4W
ASG VAL A 14 14 E Strand -140.58 165.04 77.2 1L4W
ASG THR A 15 15 E Strand -95.72 140.63 82.1 1L4W
ASG CYS A 16 16 C Coil -90.67 106.54 11.5 1L4W
ASG PRO A 17 17 C Coil -62.41 -47.14 122.3 1L4W
ASG PRO A 18 18 T Turn -71.40 -166.42 60.1 1L4W
ASG GLY A 19 19 T Turn -69.07 -28.03 66.1 1L4W
ASG GLU A 20 20 T Turn -76.00 94.17 91.2 1L4W
ASG ASN A 21 21 T Turn -121.17 1.96 35.5 1L4W
ASG LEU A 22 22 E Strand -69.97 133.22 51.5 1L4W
ASG CYS A 23 23 E Strand -99.29 111.44 0.0 1L4W
ASG TYR A 24 24 E Strand -96.27 149.93 62.1 1L4W
ASG ARG A 25 25 E Strand -118.58 83.18 17.2 1L4W
ASG LYS A 26 26 E Strand -78.88 139.08 32.1 1L4W
ASG MET A 27 27 E Strand -156.68 130.00 34.7 1L4W
ASG TRP A 28 28 E Strand -135.36 -157.76 57.9 1L4W
ASG CYS A 29 29 E Strand -110.51 120.76 33.8 1L4W
ASG ASP A 30 30 E Strand -140.95 83.38 68.8 1L4W
ASG ALA A 31 31 B Bridge 96.09 -30.41 13.7 1L4W
ASG PHE A 32 32 T Turn -64.73 -31.60 104.7 1L4W
ASG CYS A 33 33 T Turn -76.46 -35.27 97.0 1L4W
ASG SER A 34 34 T Turn -92.60 -74.82 109.8 1L4W
ASG SER A 35 35 T Turn -142.87 -52.13 100.6 1L4W
ASG ARG A 36 36 C Coil -73.80 -90.71 148.5 1L4W
ASG GLY A 37 37 E Strand -161.56 -176.78 0.0 1L4W
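The listing above uses the ASG record layout of STRIDE, a secondary-structure assignment program distinct from DSSP. Each ASG line carries the residue, its one-letter and full structure code, phi, psi and the residue's solvent accessible area. A minimal parser sketch, assuming whitespace-separated fields as shown here, with summary helpers for codes and fully buried residues:

```python
from collections import Counter


def parse_asg(text):
    """Parse STRIDE-style ASG records, splitting each line on whitespace."""
    residues = []
    for line in text.splitlines():
        fields = line.split()
        if not fields or fields[0] != "ASG":
            continue  # skip REM header lines and anything else
        phi, psi = float(fields[7]), float(fields[8])
        residues.append({
            "resname": fields[1],
            "chain": fields[2],
            "resnum": fields[3],
            "code": fields[5],       # one-letter code, e.g. E, T, C, B
            "structure": fields[6],  # full name, e.g. Strand, Turn
            # 360.00 appears where an angle is undefined (phi of residue 1 above)
            "phi": None if phi == 360.0 else phi,
            "psi": None if psi == 360.0 else psi,
            "area": float(fields[9]),
        })
    return residues


def summarize(residues):
    """Count residues per structure code and list those with zero area."""
    counts = Counter(r["code"] for r in residues)
    zero_area = [f'{r["resname"]} {r["resnum"]}' for r in residues if r["area"] == 0.0]
    return counts, zero_area
```

Run over the full 1L4W listing, this gives 20 E, 11 T, 5 C and 1 B residues, with CYS 23 and GLY 37 reported as having zero accessible area.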
Structure Analysis
• Assign secondary structure for amino acids from 3D structure
• Generate solvent accessible area for amino acids from 3D structure
• Most widely used tool: DSSP (Dictionary of Protein Secondary Structure: Pattern Recognition of Hydrogen-Bonded and Geometrical Features. Kabsch and Sander, 1983)
2D: Contact Map Prediction
[Figure: 3D structure (left) and the corresponding 2D contact map (right); both axes run over residues 1 to n, with positions i and j marked]
Cheng, Randall, Sweredoski, Baldi. Nucleic Acids Research, 2005
Distance threshold = 8 Å
3D Structure Prediction Tools
• MULTICOM (http://sysbio.rnet.missouri.edu/multicom_toolbox/index.html)
• I-TASSER (http://zhang.bioinformatics.ku.edu/I-TASSER/)
• HHpred (http://protevo.eb.tuebingen.mpg.de/toolkit/index.php?view=hhpred)
• Robetta (http://robetta.bakerlab.org/)
• 3D-Jury (http://bioinfo.pl/Meta/)
• FFAS (http://ffas.ljcrf.edu/ffas-cgi/cgi/ffas.pl)
• Pcons (http://pcons.net/)
• Sparks (http://phyyz4.med.buffalo.edu/hzhou/anonymous-fold-sp3.html)
• FUGUE (http://www-cryst.bioc.cam.ac.uk/%7Efugue/prfsearch.html)
• FOLDpro (http://mine5.ics.uci.edu:1026/foldpro.html)
• SAM (http://www.cse.ucsc.edu/research/compbio/sam.html)
• Phyre (http://www.sbg.bio.ic.ac.uk/~phyre/)
• 3D-PSSM (http://www.sbg.bio.ic.ac.uk/3dpssm/)
• mGenThreader (http://bioinf.cs.ucl.ac.uk/psipred/psiform.html)