Secondary Structure & Solvent accessible surface Calculation Lecture 6 Structural Bioinformatics Dr. Avraham Samson 81-871


DSSP

2012 Avraham Samson - Faculty of Medicine - Bar Ilan University


Dictionary of protein secondary structure: Pattern recognition of hydrogen-bonded and geometrical features

Wolfgang Kabsch, Christian Sander

Biopolymers, Volume 22, Issue 12, pages 2577–2637, December 1983

[Figure: DSSP output with its amino acid, secondary structure, and solvent accessibility columns labelled]

Hydrogen bond donors and acceptors

the amide nitrogen:

main-chain hydrogen bond donor

the carbonyl oxygen:

main-chain hydrogen bond acceptor

there are also side-chain acceptors and donors
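DSSP decides whether a main-chain N-H donor and C=O acceptor are hydrogen bonded with a simple electrostatic energy. The sketch below uses the constants from the Kabsch & Sander definition (partial charges 0.42e and 0.20e, factor 332, cutoff -0.5 kcal/mol). The function names and the example coordinates are illustrative assumptions, not part of the lecture.

```python
import math

Q1_Q2 = 0.42 * 0.20   # partial charges on C=O and N-H, in units of e
F = 332.0             # factor giving kcal/mol when distances are in Angstroms
CUTOFF = -0.5         # kcal/mol; an H-bond is assigned below this energy


def hbond_energy(n, h, c, o):
    """Electrostatic energy (kcal/mol) between an N-H donor and a C=O acceptor."""
    r_on, r_ch = math.dist(o, n), math.dist(c, h)
    r_oh, r_cn = math.dist(o, h), math.dist(c, n)
    return Q1_Q2 * F * (1 / r_on + 1 / r_ch - 1 / r_oh - 1 / r_cn)


def is_hbond(n, h, c, o):
    """True if the donor/acceptor pair passes the energy cutoff."""
    return hbond_energy(n, h, c, o) < CUTOFF


# Idealised linear N-H...O=C arrangement (made-up coordinates, in Angstroms)
n, h = (0.0, 0.0, 0.0), (1.0, 0.0, 0.0)
o, c = (2.9, 0.0, 0.0), (4.13, 0.0, 0.0)
print(hbond_energy(n, h, c, o), is_hbond(n, h, c, o))
```

Crystal structures usually lack hydrogens, so in practice the H position has to be built from the backbone geometry before this energy can be evaluated.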

[Figure: chemical structure of a peptide backbone with side chains, showing the donor and acceptor atoms]


Hydrogen bonded turns


Hydrogen bonded bridges


Bend


Chirality


Dihedral angle calculation

The book "Crystal Structure Analysis for Chemists and Biologists" by Jenny P. Glusker gives four different ways of calculating the dihedral angle (pp. 465-469). Probably the most direct is:

Consider the four-atom chain 1 - 2 - 3 - 4

The distance between any two atoms i and j is denoted dij.

For example, d13 is the distance between atoms 1 and 3. Since you already have Cartesian coordinates, this is easily calculated as SQRT( SQ(x3-x1) + SQ(y3-y1) + SQ(z3-z1) )

The dihedral angle is defined as follows: cos(angle) = P/SQRT(Q)

where P = SQ(d12) * ( SQ(d23)+SQ(d34)-SQ(d24)) + SQ(d23) * (-SQ(d23)+SQ(d34)+SQ(d24)) + SQ(d13) * ( SQ(d23)-SQ(d34)+SQ(d24)) - 2 * SQ(d23) * SQ(d14)

and Q = (d12 + d23 + d13) * ( d12 + d23 - d13) * (d12 - d23 + d13) * (-d12 + d23 + d13 ) * (d23 + d34 + d24) * ( d23 + d34 - d24 ) * (d23 - d34 + d24) * (-d23 + d34 + d24 )

A test case, d12 = 2.38, d23 = 1.48, d34 = 1.48, d13 = 3.56, d14 = 3.61, d24 = 2.40

P = 20.83, SQRT(Q) = 21.40, angle = 13.3 degrees
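The distance-based formula translates directly into code. The sketch below is one possible transcription; the function name is an assumption, and the clamp on the cosine is added only to guard against rounding pushing it slightly outside [-1, 1]. Because the formula yields cos(angle), it gives the magnitude of the dihedral (0 to 180 degrees) but not its sign.

```python
import math


def dihedral_from_distances(d12, d23, d34, d13, d14, d24):
    """Unsigned dihedral angle (degrees) of the chain 1-2-3-4 from six distances."""
    s12, s23, s34 = d12 ** 2, d23 ** 2, d34 ** 2
    s13, s14, s24 = d13 ** 2, d14 ** 2, d24 ** 2
    p = (s12 * (s23 + s34 - s24)
         + s23 * (-s23 + s34 + s24)
         + s13 * (s23 - s34 + s24)
         - 2 * s23 * s14)
    q = ((d12 + d23 + d13) * (d12 + d23 - d13)
         * (d12 - d23 + d13) * (-d12 + d23 + d13)
         * (d23 + d34 + d24) * (d23 + d34 - d24)
         * (d23 - d34 + d24) * (-d23 + d34 + d24))
    cos_angle = max(-1.0, min(1.0, p / math.sqrt(q)))
    return math.degrees(math.acos(cos_angle))


# Distances from the test case, as printed (rounded to two decimals)
print(dihedral_from_distances(2.38, 1.48, 1.48, 3.56, 3.61, 2.40))
```

With the rounded distances as printed, the result should land close to the quoted 13.3 degrees rather than match it exactly, since small angles are sensitive to rounding in the inputs.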


Helices

Ladders and sheets


More details

• SS-bonds

• Chain breaks

• Handedness (chirality)

• PyMOL and MOLMOL assign secondary structure with DSSP-like algorithms (PyMOL's built-in dss command is similar to, but not identical to, DSSP)


Nelson et al. (Eisenberg lab), Nature 435:773 (2005). For background on “polar zippers”: Perutz et al., PNAS 91:5355 (1991). These types of fibrils are important in Huntington’s disease etc.

amyloid-like fibril (left) of the peptide GNNQNNY from the yeast prion protein Sup35, and its atomic structure (right)

Because of the repetitive nature of secondary structures, and particularly beta-sheets, proteins can form fibrillar structures and aggregates

amide stacks

fibril axis

in the case of this fibril the side chains also hydrogen bond to each other

Fibrillar helical structures: the leucine zipper

GCN4 “leucine zipper” (green) bound as a dimer (two copies of the polypeptide) to target DNA

The GCN4 dimer is formed through hydrophobic interactions between leucines (red) in the two polypeptide chains

Leu

Leu

DSSP Code:
H = alpha helix
G = 3-helix (3/10 helix)
I = 5 helix (pi helix)
B = residue in isolated beta-bridge
E = extended strand, participates in beta ladder
T = hydrogen bonded turn
S = bend
Blank = loop
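When working with DSSP strings programmatically it helps to have the code table as a lookup. The sketch below holds the eight codes listed above. The three-state reduction (H/G/I to helix, E/B to strand, everything else to coil) is a common convention assumed here for illustration; it is not defined in these slides, and other groupings are in use.

```python
from collections import Counter

DSSP_CODES = {
    "H": "alpha helix",
    "G": "3-helix (3/10 helix)",
    "I": "5-helix (pi helix)",
    "B": "residue in isolated beta-bridge",
    "E": "extended strand, participates in beta ladder",
    "T": "hydrogen bonded turn",
    "S": "bend",
    " ": "loop",
}

# Assumed three-state reduction; anything not listed becomes coil "C"
THREE_STATE = {"H": "H", "G": "H", "I": "H", "E": "E", "B": "E"}


def reduce_to_three_states(dssp_string):
    """Map an eight-state DSSP string onto helix (H), strand (E), coil (C)."""
    return "".join(THREE_STATE.get(code, "C") for code in dssp_string)


def composition(dssp_string):
    """Fraction of residues carrying each code in the string."""
    counts = Counter(dssp_string)
    return {code: counts[code] / len(dssp_string) for code in counts}


print(reduce_to_three_states(" EEEETTSHHHHG B"))
```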


• Question: How would you assign structural neighbors (< 5 Å) from a PDB file?

• Answer: Parse PDB file for atoms with distance less than 5 Angstroms!
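A minimal sketch of that answer, assuming standard fixed-column ATOM records (atom name in columns 13-16, residue name 18-20, chain 22, residue number 23-26, x/y/z in columns 31-54). The function names are hypothetical, HETATM records and alternate locations are ignored, and the all-pairs loop is quadratic in the number of atoms, so a spatial grid or k-d tree would be preferable for large structures.

```python
import math
from itertools import combinations


def parse_atoms(pdb_text):
    """Extract residue identity and coordinates from ATOM records."""
    atoms = []
    for line in pdb_text.splitlines():
        if line.startswith("ATOM"):
            atoms.append({
                "name": line[12:16].strip(),
                # residue key: (chain ID, residue number, residue name)
                "res": (line[21], int(line[22:26]), line[17:20].strip()),
                "xyz": (float(line[30:38]), float(line[38:46]), float(line[46:54])),
            })
    return atoms


def neighbor_residues(atoms, cutoff=5.0):
    """Pairs of different residues with at least one atom pair closer than cutoff."""
    pairs = set()
    for a, b in combinations(atoms, 2):
        if a["res"] != b["res"] and math.dist(a["xyz"], b["xyz"]) < cutoff:
            pairs.add(tuple(sorted((a["res"], b["res"]))))
    return sorted(pairs)
```

Typical use would be `neighbor_residues(parse_atoms(open(path).read()))`, where `path` points at a local PDB file.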


Contact maps of protein structures

1avg--structure of triabin

map of C-C distances < 6 Å

near diagonal: local contacts in the sequence

off-diagonal: long-range (nonlocal) contacts

rainbow ribbon diagram, blue to red: N to C

-both axes are the sequence of the protein

Contact maps of protein structures

Structure of N15 Cro

-both axes are the sequence of the protein

rainbow ribbon diagram, blue to red: N to C

map of C-C distances < 6 Å

Contact maps of protein structures

Structure of N15 Cro

-both axes are the sequence of the protein

rainbow ribbon diagram, blue to red: N to C

map of all heavy atom distances < 6 Å (includes side chains)
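A contact map of this kind can be sketched in a few lines. The version below assumes one representative coordinate per residue (for example one backbone carbon) and the 6 Å cutoff used in the maps above. The `min_separation` parameter that splits near-diagonal from off-diagonal contacts is a hypothetical choice for illustration, not a value from the slides.

```python
import math


def contact_map(coords, cutoff=6.0):
    """Symmetric n x n boolean matrix; True where two residues are within cutoff."""
    n = len(coords)
    return [[math.dist(coords[i], coords[j]) < cutoff for j in range(n)]
            for i in range(n)]


def classify_contacts(cmap, min_separation=4):
    """Split contacts into local (near-diagonal) and long-range (off-diagonal)."""
    local, long_range = [], []
    n = len(cmap)
    for i in range(n):
        for j in range(i + 1, n):
            if cmap[i][j]:
                target = long_range if j - i >= min_separation else local
                target.append((i, j))
    return local, long_range
```

For the all-heavy-atom variant, the distance between two residues would instead be the minimum over all atom pairs, which fills in many more contacts between side chains.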

Surface and interior of globular proteins

solvent accessible surface

molecular surface

residue fractional accessibility

pockets and cavities

“hydrophobic core”

ordered waters in protein structures

“Accessible Surface”

Lee & Richards, 1971; Shrake & Rupley, 1973

represent atoms as spheres w/ appropriate radii and eliminate overlapping parts...

mathematically roll a sphere all around that surface...

the sphere’s center traces out a surface as it rolls...

Now look at a cross-section (slice) of a protein structure: Inner surfaces here are van der Waals. Outer surface is that traced out by the center of the sphere as it rolls around the van der Waals surface. If any part of the arc around a given atom is traced out, that atom is accessible to solvent. The solvent accessible surface of the atom is defined as the sum of the arcs traced around that atom.

[Figure, from Lee & Richards, 1971: cross-section of a protein showing the solvent accessible surface, the van der Waals surface, and the arc traced around an atom]

there’s not much solvent accessible surface in the middle
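The rolling-sphere picture can be approximated numerically in the spirit of Shrake & Rupley: scatter test points on each atom's expanded sphere (van der Waals radius plus probe radius) and count the points that do not fall inside any other atom's expanded sphere. The sketch below is an illustration under stated assumptions: a 1.4 Å probe (a commonly used water radius), a golden-spiral point layout swapped in for the original point set, and hypothetical function names. It loops over all atom pairs, so it is only suitable for small inputs.

```python
import math


def sphere_points(n):
    """Roughly uniform points on a unit sphere (golden-spiral layout)."""
    golden = math.pi * (3.0 - math.sqrt(5.0))
    pts = []
    for k in range(n):
        z = 1.0 - (2.0 * k + 1.0) / n
        r = math.sqrt(1.0 - z * z)
        pts.append((r * math.cos(golden * k), r * math.sin(golden * k), z))
    return pts


def accessible_areas(atoms, probe=1.4, n_points=200):
    """atoms: list of ((x, y, z), vdw_radius). Returns one area per atom (A^2)."""
    unit = sphere_points(n_points)
    areas = []
    for i, (ci, ri) in enumerate(atoms):
        big_r = ri + probe
        exposed = 0
        for ux, uy, uz in unit:
            p = (ci[0] + big_r * ux, ci[1] + big_r * uy, ci[2] + big_r * uz)
            # A probe centre counts only if it lies outside every other
            # atom's expanded sphere.
            if all(math.dist(p, cj) >= rj + probe
                   for j, (cj, rj) in enumerate(atoms) if j != i):
                exposed += 1
        areas.append(4.0 * math.pi * big_r ** 2 * exposed / n_points)
    return areas
```

An isolated atom keeps the full area of its expanded sphere, while an atom deep in the interior ends up with no exposed points and an area of zero.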

“Accessible surface”/“Molecular surface”

note: these are alternative ways of representing the same reality: the surface which is essentially in contact with solvent

• molecular and accessible surfaces are both useful representations, but molecular surface is more closely related to the actual atomic surfaces. This makes it somewhat better for visualizing the texture of the outer surface, as well as for assessing the shape and volume of any internal cavities.

• you will hear the term Connolly surface used often, after Michael Connolly. A Connolly surface is a particular way of calculating the molecular surface. The accessible surface is also occasionally called the Richards surface, after Fred Richards.

Molecular surface of proteins

depiction of heavy atoms (O, N, C, S) in a protein as van der Waals spheres

depiction of the corresponding “molecular surface”: the volume contained by this surface is the vdW volume plus the “interstitial volume”, the spaces in between

The irregular surface of proteins: pockets and cavities

• a pocket is an empty concavity on a protein surface which is accessible to solvent from the outside.

• a cavity or void in a protein is a pocket which has no opening to the outside. It is an interior empty space inside the protein.

Pockets and cavities can be critical features of proteins in terms of their binding behavior, and identifying them is usually a first step in structure-based ligand design etc.

Fractional accessibility

• calculate total solvent accessible surface of protein structure (also can calculate solvent accessible surface for individual residues/sidechains within the protein)

• can also model the accessible surface area in a disordered or unfolded protein using accessible surface area calculations on model tripeptides such as Ala-X-Ala or Gly-X-Gly.

• from these we can calculate what fraction of the surface is buried (inaccessible to solvent) by virtue of being within the folded, native structure of the protein.

• this is done by dividing the accessible surface area in the native protein structure by the accessible surface in the modelled unfolded protein. That’s the fractional accessibility. The residue fractional accessibility and side chain fractional accessibility refer to the same thing calculated for individual residues/sidechains within the structure.
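As a sketch of the per-residue calculation: divide each residue's accessible area in the native structure by a reference area for the same residue type in an extended tripeptide. The reference values must come from a published Gly-X-Gly or Ala-X-Ala table and are passed in by the caller. The numbers in the usage example are made up for illustration, and the function names are hypothetical.

```python
def fractional_accessibility(asa_native, asa_reference):
    """Accessible area in the native structure divided by the unfolded reference."""
    return asa_native / asa_reference


def buried_residues(residue_asa, reference_asa, cutoff=0.05):
    """Labels of residues whose fractional accessibility is below cutoff.

    residue_asa:   list of (label, residue_type, native ASA)
    reference_asa: dict mapping residue_type to its tripeptide reference ASA
    """
    return [label for label, rtype, asa in residue_asa
            if fractional_accessibility(asa, reference_asa[rtype]) < cutoff]


# Made-up numbers, for illustration only
reference = {"ALA": 100.0, "LEU": 180.0}
residues = [("A1", "ALA", 2.0), ("L2", "LEU", 90.0), ("L3", "LEU", 0.0)]
print(buried_residues(residues, reference))
```

Because the reference is a model of the unfolded chain rather than a true maximum, a strongly exposed residue can come out with a ratio above 1.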

Accessible surface area in globular protein structures

Accessible surface area $A_s$ in native states of proteins is a non-linear function of molecular weight (Miller, Janin, Lesk & Chothia, 1987):

$A_s = 6.3\,M_r^{0.73}$

where $M_r$ is the molecular weight.

This is an empirical correlation, but it comes close to the expected two-thirds power law relating surface area to volume or mass for a set of bodies of similar shape and density.

How much surface area is buried when a protein adopts its native structure in solution?

• estimate the total accessible surface area of the extended/disordered polypeptide chain using the accessible surface areas in Gly-X-Gly or Ala-X-Ala models. This is a linear function of molecular weight:

$A_t = 1.48\,M_r + 21$

• the total fractional accessibility is $A_s/A_t$, and the fraction of surface area buried is $1 - A_s/A_t$

• What is the total fractional surface area buried for a protein of molecular weight 10,000? 20,000? Is the fraction higher for small proteins or large?
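The two empirical relations can be combined in a few lines to explore the question numerically. This is a sketch using only the coefficients quoted on these slides; the function names are assumptions.

```python
def native_asa(mr):
    """Accessible surface area of the folded protein: As = 6.3 * Mr^0.73."""
    return 6.3 * mr ** 0.73


def unfolded_asa(mr):
    """Accessible surface area of the extended chain: At = 1.48 * Mr + 21."""
    return 1.48 * mr + 21.0


def fraction_buried(mr):
    """Fraction of the unfolded-state surface buried on folding: 1 - As/At."""
    return 1.0 - native_asa(mr) / unfolded_asa(mr)


for mr in (10_000, 20_000):
    print(mr, round(fraction_buried(mr), 2))
```

With these coefficients the buried fraction comes out at roughly 0.65 for a molecular weight of 10,000 and roughly 0.71 for 20,000, so larger proteins bury a larger share of their surface. That is the expected consequence of an exponent below 1 for the native state against a linear dependence for the unfolded chain.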

Distribution of residue fractional accessibilities

note the broad distribution among non-buried residues, and a mean fractional accessibility for non-buried residues of around 0.5

note that few residues are completely exposed to solvent, but that a fractional accessibility of >1 is possible

from Miller et al., 1987

note that a sizeable group are completely buried (hatched) or nearly completely buried

Buried residues in proteins

Fraction of buried residues, by size class:

size class   mean Mr   0% ASA   5% ASA
small         8000     0.070    0.154
medium       16000     0.107    0.240
large        25000     0.139    0.309
XL           34000     0.155    0.324
all              -     0.118    0.257

• the fraction of buried residues (defined by 0% or 5% ASA cutoffs) increases as a function of molecular weight. For an average protein, around 25% of the residues will be buried. These form the core.

Residue fractional accessibility correlates with free energies of transfer for amino acids between water and organic solvents

• (Miller, Janin, Lesk & Chothia, 1987)

• (Fauchere & Pliska, 1983)

• the interior of a protein is akin to a nonpolar solvent in which the nonpolar sidechains are buried. Polar sidechains, on the other hand, are usually on the surface. However, some polar side chains do get buried, and it must also be remembered that the backbone for every residue is polar, including those with nonpolar side chains. So a lot of polar moieties do get buried in proteins.

The hydrophobic core of a small protein: N15 Cro

0% ASA: Pro 3, Leu 6, Ala 16, Val 27, Ile 36, Ile 44

< 5% ASA: Met 1, Ala 17, Val 20, Gln 41, Ser 54

11 of 66 ordered residues have less than 5% ASA

note that some polar residues are buried

The outer surface: water in protein structures

Structures of water-soluble proteins determined at reasonably high resolution will be decorated on their outer surfaces with water molecules (cyan balls) with relatively well-defined positions, and waters may also occur internally

Water is not just surrounding the protein--it is interacting with it

Water interacts with protein surfaces

second shell waters: only contact other waters

first shell waters: in contact with/hydrogen bonded to protein

Most waters visible in crystal structures make hydrogen bonds to each other and/or to the protein, as donor/acceptor/both

DSSP Web Service

http://mrs.cmbi.ru.nl/hsspsoap/

[Figure: DSSP output with its amino acid, secondary structure, and solvent accessibility columns labelled]

STRIDE web service

http://webclu.bio.wzw.tum.de/cgi-bin/stride/stridecgi.py


REM --------------- Detailed secondary structure assignment------------- 1L4W

REM 1L4W

REM |---Residue---| |--Structure--| |-Phi-| |-Psi-| |-Area-| 1L4W

ASG ILE A 1 1 C Coil 360.00 168.01 69.6 1L4W

ASG VAL A 2 2 E Strand -97.71 163.93 42.5 1L4W

ASG CYS A 3 3 E Strand -164.52 149.74 1.4 1L4W

ASG HIS A 4 4 E Strand -98.82 174.84 39.5 1L4W

ASG THR A 5 5 E Strand -171.97 161.21 25.5 1L4W

ASG THR A 6 6 E Strand -119.23 98.92 13.1 1L4W

ASG ALA A 7 7 C Coil -159.51 -46.53 10.0 1L4W

ASG THR A 8 8 T Turn -76.14 -145.16 41.5 1L4W

ASG SER A 9 9 T Turn -67.19 -64.98 58.7 1L4W

ASG PRO A 10 10 T Turn -98.83 -165.54 75.7 1L4W

ASG ILE A 11 11 E Strand -63.95 136.61 71.6 1L4W

ASG SER A 12 12 E Strand -95.58 151.90 4.8 1L4W

ASG ALA A 13 13 E Strand -149.03 116.85 55.7 1L4W

ASG VAL A 14 14 E Strand -140.58 165.04 77.2 1L4W

ASG THR A 15 15 E Strand -95.72 140.63 82.1 1L4W

ASG CYS A 16 16 C Coil -90.67 106.54 11.5 1L4W

ASG PRO A 17 17 C Coil -62.41 -47.14 122.3 1L4W

ASG PRO A 18 18 T Turn -71.40 -166.42 60.1 1L4W

ASG GLY A 19 19 T Turn -69.07 -28.03 66.1 1L4W

ASG GLU A 20 20 T Turn -76.00 94.17 91.2 1L4W

ASG ASN A 21 21 T Turn -121.17 1.96 35.5 1L4W

ASG LEU A 22 22 E Strand -69.97 133.22 51.5 1L4W

ASG CYS A 23 23 E Strand -99.29 111.44 0.0 1L4W

ASG TYR A 24 24 E Strand -96.27 149.93 62.1 1L4W

ASG ARG A 25 25 E Strand -118.58 83.18 17.2 1L4W

ASG LYS A 26 26 E Strand -78.88 139.08 32.1 1L4W

ASG MET A 27 27 E Strand -156.68 130.00 34.7 1L4W

ASG TRP A 28 28 E Strand -135.36 -157.76 57.9 1L4W

ASG CYS A 29 29 E Strand -110.51 120.76 33.8 1L4W

ASG ASP A 30 30 E Strand -140.95 83.38 68.8 1L4W

ASG ALA A 31 31 B Bridge 96.09 -30.41 13.7 1L4W

ASG PHE A 32 32 T Turn -64.73 -31.60 104.7 1L4W

ASG CYS A 33 33 T Turn -76.46 -35.27 97.0 1L4W

ASG SER A 34 34 T Turn -92.60 -74.82 109.8 1L4W

ASG SER A 35 35 T Turn -142.87 -52.13 100.6 1L4W

ASG ARG A 36 36 C Coil -73.80 -90.71 148.5 1L4W

ASG GLY A 37 37 E Strand -161.56 -176.78 0.0 1L4W
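Output like the listing above is easy to post-process. The sketch below reads the ASG records by whitespace splitting, which works for the lines shown here but assumes a non-blank chain identifier; the original STRIDE file is fixed-width. The function names are hypothetical.

```python
from collections import Counter


def parse_asg(line):
    """Parse one STRIDE 'ASG' record into a dict (whitespace-delimited)."""
    f = line.split()
    return {
        "resname": f[1],
        "chain": f[2],
        "resnum": f[3],        # kept as text; PDB numbers may carry insertion codes
        "ss_code": f[5],
        "ss_name": f[6],
        "phi": float(f[7]),
        "psi": float(f[8]),
        "area": float(f[9]),   # solvent accessible area of the residue
    }


def summarize(stride_text):
    """Return the parsed records, counts per structure type, and mean area."""
    records = [parse_asg(l) for l in stride_text.splitlines()
               if l.startswith("ASG")]
    counts = Counter(r["ss_name"] for r in records)
    mean_area = sum(r["area"] for r in records) / len(records)
    return records, counts, mean_area
```

Applied to the full listing, `summarize` gives the number of residues assigned to each structure type together with the mean per-residue accessible area.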


Structure Analysis

• Assign secondary structure for amino acids from 3D structure

• Generate solvent accessible area for amino acids from 3D structure

• Most widely used tool: DSSP (Dictionary of Protein Secondary Structure: Pattern Recognition of Hydrogen-Bonded and Geometrical Features. Kabsch and Sander, 1983)

2D: Contact Map Prediction

[Figure: a 3D structure and the corresponding 2D contact map, an n × n matrix indexed by residue positions i and j]

Cheng, Randall, Sweredoski, Baldi. Nucleic Acids Research, 2005

Distance Threshold = 8 Å

3D Structure Prediction Tools

• MULTICOM (http://sysbio.rnet.missouri.edu/multicom_toolbox/index.html)

• I-TASSER (http://zhang.bioinformatics.ku.edu/I-TASSER/)

• HHpred (http://protevo.eb.tuebingen.mpg.de/toolkit/index.php?view=hhpred)

• Robetta (http://robetta.bakerlab.org/)

• 3D-Jury (http://bioinfo.pl/Meta/)

• FFAS (http://ffas.ljcrf.edu/ffas-cgi/cgi/ffas.pl)

• Pcons (http://pcons.net/)

• Sparks (http://phyyz4.med.buffalo.edu/hzhou/anonymous-fold-sp3.html)

• FUGUE (http://www-cryst.bioc.cam.ac.uk/%7Efugue/prfsearch.html)

• FOLDpro (http://mine5.ics.uci.edu:1026/foldpro.html)

• SAM (http://www.cse.ucsc.edu/research/compbio/sam.html)

• Phyre (http://www.sbg.bio.ic.ac.uk/~phyre/)

• 3D-PSSM (http://www.sbg.bio.ic.ac.uk/3dpssm/)

• mGenThreader (http://bioinf.cs.ucl.ac.uk/psipred/psiform.html)