21
Using Biology to Teach Geometry: Protein Structure Tessellations in Matlab Majid Masso, Ph.D. Laboratory for Structural Bioinformatics George Mason University http://binf.gmu.edu/mmasso [email protected]

Using Biology to Teach Geometry: Protein Structure

  • Upload
    others

  • View
    3

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Using Biology to Teach Geometry: Protein Structure

Using Biology to Teach Geometry: Protein Structure Tessellations in Matlab

Majid Masso, Ph.D. Laboratory for Structural Bioinformatics

George Mason Universityhttp://binf.gmu.edu/mmasso [email protected]

Page 2: Using Biology to Teach Geometry: Protein Structure

Proteins in Brief• Intra- and inter-cellular workhorses of all organisms• Building blocks: amino acids

• 20 distinct types in nature (A,C,D,E,F,G,H,I,K,L,M,N,P,Q,R,S,T,V,W,Y)

• ~200 ordered, successively linked amino acids/protein (varies widely across proteins, from tens to thousands)

• Protein structure representations: all-atom backbone ribbon tessellation

Page 3: Using Biology to Teach Geometry: Protein Structure

Amino Acids• Atomic constituents: carbon (C), nitrogen (N), oxygen (O), and

hydrogen (H)• Amino Acids C (cysteine) and M (methionine) each also contain

a sulfur (S) atom• Coordinates of hydrogen atoms are only available for structures

solved at very high resolution• Example –

N

C

CC

C

O

=

+H3N Cα C H O

O- CH2 CH

CH3 H3C

Identical for al l amino acids

Unique side chain (R group) for each amino acid

Leucine (Leu or L) C

Leucine (Leu or L)

Page 4: Using Biology to Teach Geometry: Protein Structure

Peptide Bond• Backbone linkage between consecutive amino acids in the

growing, linear protein chain (the “primary sequence”)• Links the backbone C atom of amino acid n–1 to the

backbone N atom of amino acid n, with release of H2O=

+H3N Cα C H O

O- R1

+ +H3N H Cα R2

C O-

= O

H2O

+H3N Cα Cα

=

O

R1 C C

H H N

R2 H O-

O

=

peptide bond L – L

Page 5: Using Biology to Teach Geometry: Protein Structure

Protein Data Bank (PDB, http://www.pdb.org)

Page 6: Using Biology to Teach Geometry: Protein Structure

PDB File Format

X Y Z

Page 7: Using Biology to Teach Geometry: Protein Structure

HIV-1 Protease CA Coordinate Data

Page 8: Using Biology to Teach Geometry: Protein Structure

(a) all-atom (b) backbone only (c) ribbon diagram (d) Cα trace

Example: HIV-1 Protease (PDB ID: 3PHV)

P1

F99

Page 9: Using Biology to Teach Geometry: Protein Structure

Delaunay Tessellation of Protein Structure

Aspartic Acid(Asp or D)

Abstract every amino acid to a pointAtomic coordinates – Protein Data Bank (PDB)

Cα coordinates in 3D

D3

A22

S64

L6F7

G62

C63

K4

R5

Delaunay tessellation: 3D “tiling” of space into non-overlapping, irregular tetrahedral simplices. Each simplex objectively identifies a quadruplet of nearest-neighbor amino acids at its vertices.

Page 10: Using Biology to Teach Geometry: Protein Structure

Tessellation Example: HIV-1 Protease (PR)

(a) Cα trace (b) complete tessellation (convex hull of simplices) (c) tessellation subject to a 12 Angstrom edge length cutoff

Page 11: Using Biology to Teach Geometry: Protein Structure

Delaunay Tessellation in Matlab

Page 12: Using Biology to Teach Geometry: Protein Structure

Five Simplex Categories

Singh et al. (1996) J. Comput. Biol., 3, 213-222

Page 13: Using Biology to Teach Geometry: Protein Structure

Simplex Categories Example: HIV-1 PR

Page 14: Using Biology to Teach Geometry: Protein Structure

Simplex Categories in Matlab

Page 15: Using Biology to Teach Geometry: Protein Structure

Tetrahedrality and Volume

Vectors a, b, c, and d represent the 3D coordinates of the tetrahedral vertices.

Singh et al. (1996) J. Comput. Biol., 3, 213-222

Page 16: Using Biology to Teach Geometry: Protein Structure

Tetrahedrality and Volume Example: HIV-1 PR

109 simplicesmean T = 0.20mean V = 10.09

89 simplicesmean T = 0.15mean V = 9.45

73 simplicesmean T = 0.11mean V = 41.51

95 simplicesmean T = 0.18mean V = 19.27

16 simplicesmean T = 0.18mean V = 5.61

Page 17: Using Biology to Teach Geometry: Protein Structure

Tetrahedrality and Volume in Matlab

Page 18: Using Biology to Teach Geometry: Protein Structure

Counting Amino Acid Quadruplets

only realistic choice for proteins

only realistic choice to get enough quadruplets for each of the 8855 types (by tessellating a large, diverse set of protein structures) and obtain a frequency distribution

n = size of amino acid alphabet = 20; r = size of the subsets = 4

Page 19: Using Biology to Teach Geometry: Protein Structure

Counting Amino Acid QuadrupletsRepetitions – yes, permutations – no: a more “hands-on” counting approach

D F E C

C C D E

C C D D

C C C D

C C C C

204

1920

2 ⋅

202

20 19⋅

20

Total: 8,855 distinct quadruplets

Page 20: Using Biology to Teach Geometry: Protein Structure

Predicting Alpha Helix Locations in Proteins

actual locations{predicted locations{

Amino Acid Sequence Numbers

Taylor et al. (2005) Proteins, 60, 513-524

Amino acid i can participate in up to 4 distinct simplices consisting of consecutive amino acids at the vertices: (i, i+1, i+2, i+3), (i-1, i, i+1, i+2), (i-2, i-1, i, i+1), and (i-3, i-2, i-1, i). The step-graph shows the number of such simplices (“t-number” values of 0, 1, 2, 3, or 4) for each amino acid in protein structure 2mnr. Amino acids with a t-number of 4 strongly correlate with those occurring in alpha helices.

Page 21: Using Biology to Teach Geometry: Protein Structure

References• To obtain a copy of these slides:

(http://binf.gmu.edu/mmasso/MAA2010.pdf)

• Protein structure repository: Protein Data Bank (http://www.pdb.org)

• Structure visualization: Chimera (http://www.cgl.ucsf.edu/chimera/)

• Delaunay tessellation:• Matlab (http://www.mathworks.com/)• Qhull (http://www.qhull.org/)

• Programming and data formatting: Perl(http://www.activestate.com/activeperl/)