A Quantitative Universe, Nov. 16, 2012 Brian Y. Chen
Bioinformatics: Protein Structure
Brian Y. Chen
Lehigh University
Dept. Computer Science & Engineering
Informatics is the science of designing methodologies for
gathering, analyzing, integrating, and visualizing data
used to form an opinion.
Word Lens, iPhone App
released: Dec 16, 2010
Translation Off Translation On
What is Bioinformatics?
The science of designing methodologies for
gathering, analyzing, integrating, and visualizing data
used to form an opinion on Biological Systems.
GenBank – Global public repository of DNA sequences
Protein Data Bank – Global public repository of protein structures
Biological systems are nested and interacting machines
Individual molecules are the foundation of all systems
Structural biology connects structure with function
Timeline of Nobel Prizes in Structural Biology
1946: Sumner
1962: Crick, Watson, Wilkins
1962: Perutz, Kendrew
1964: Hodgkin
1972: Anfinsen
1982: Klug
1988: Deisenhofer, Huber, Michel
1991: Ernst
1997: Walker
2002: Wuthrich
2003: MacKinnon
2006: Kornberg
2009: Ramakrishnan, Steitz, Yonath
Structural biology has become rich in data
[Figure: Number of Entries in the Protein Data Bank, Total and Annual; x-axis: year (1975 to 2009); y-axis: # entries (10000 to 60000); annotation: NIH Protein Structure Initiative]
Source: www.pdb.org
Structural bioinformatics connects structure with function at scale and with precision
Structural Bioinformatics
• Structure Prediction
• Integrative Methods
• Molecular Simulation
• Structure Alignment
• Functional Site Comparison
• Docking
The General Problem:
Gather, analyze, integrate, and visualize data used to form an opinion on Biological Systems.
Proteins are chains of amino acids
HAWPFMVSLQL-AGG----HFCGATLIAPNFV-----MSAAHCVANVNV
[Figure: atomic diagram of an amino acid chain; atom labels C, N, O; residue labels A, F, G]
Similar sequences imply similar function
HAWPFMVSLQL-AGG----HFCGATLIAPNFVMSAAHCVANVNV
HAWPFMVSLQL-RGG----HFCGATLIAPNFVMSAAHCVANVK-
HSWPWQISLQY-SKNDAWGHTCGGTLIASNYVLTAAHCISNAKT
HSRPYMVSLQV-Q---G-NHFCGGTLIHPQFVMTAAHCIDKINP
LA-PYIASLQRNRGG----HFCGGTLIHQQFVMTAAHCINSRNV
[Figure: atomic diagram of an amino acid chain; atom labels C, N, O; residue labels A, F, G]
FASTA: Mackey et al., Mol. Cell. Prot., 2002.
CLUSTALW: Larkin et al., Bioinformatics, 2007.
BLAST: Altschul et al., Nuc. Acid. Res., 1997.
ConSurf: Glaser et al., Bioinformatics, 2003.
Evolutionary Trace: Mihalek et al., Proteins, 2006.
HMAP: Tang et al., J. Mol. Biol., 2003.
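One simple way to quantify how similar two aligned sequences are is percent identity. The sketch below is illustrative only: the function name `percent_identity` and the convention of skipping any column that contains a gap are assumptions, not something the slides specify, and the tools cited above use far more sophisticated scoring.

```python
def percent_identity(a: str, b: str) -> float:
    """Fraction of identical residues over columns where neither row has a gap."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for x, y in zip(a, b):
        if x == "-" or y == "-":
            continue  # assumption: gapped columns are not counted
        compared += 1
        if x == y:
            matches += 1
    return matches / compared if compared else 0.0


# The first three rows of the alignment shown on the slide.
ALIGNMENT = [
    "HAWPFMVSLQL-AGG----HFCGATLIAPNFVMSAAHCVANVNV",
    "HAWPFMVSLQL-RGG----HFCGATLIAPNFVMSAAHCVANVK-",
    "HSWPWQISLQY-SKNDAWGHTCGGTLIASNYVLTAAHCISNAKT",
]

for i in range(len(ALIGNMENT)):
    for j in range(i + 1, len(ALIGNMENT)):
        score = percent_identity(ALIGNMENT[i], ALIGNMENT[j])
        print(f"rows {i} and {j}: {score:.1%} identical")
```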
Similar functional sites imply similar function
MASH: Chen et al., J. Comput. Biol., 2007.
Combinatorial Extension: Jia et al., J. Comput. Biol., 2004.
Geometric Hashing: Nussinov et al., Proteins, 2001.
pevoSOAR: Tseng et al., J. Mol. Biol., 2009.
Ska: Petrey et al., Methods Enzymol., 2003.
Geometric Sieving: Chen et al., J. Bioinf. Comput. Biol., 2007.
PINTS: Stark et al., Nucleic Acids Res., 2003.
JESS: Barker et al., Bioinformatics, 2003.
Dali: Holm et al., Bioinformatics, 2008.
[Figure: a Motif (Known function) and a Target (Unknown function), with a Match between them]
Protein surfaces reveal functional sites
GRASP2: Petrey et al., Methods Enzymol., 2003.
SURFNET: Glaser et al., Proteins, 2006.
CASTp: Dundas et al., Nucleic Acids Res., 2006.
SCREEN: Nayal et al., Proteins, 2006.
eF-seek: Kinoshita et al., Nucleic Acids Res., 2007.
APROPOS: Peters et al., J. Mol. Biol., 1996.
Similarity doesn’t tell us everything
How does this protein fit in the system?
What parts of the protein make it work?
Specificity is preferential binding
Specificity is an aspect of function
Proteins with the same function can have different specificity
VASP: Volumetric Analysis of the Surfaces of Proteins
• Identify amino acids that alter cavity shape
• Identify subcavities that alter cavity shape
VASP isolates differences in cavity shape
Results: VASP finds influences on specificity
The VASP procedure
Input: A Protein Family -> Align Structures -> Define Cavities -> Volumetric Comparison of Cavities -> Output: Volumetric Differences
The VASP procedure: Align Structures
Ska: Petrey D, Honig B. GRASP2: visualization, surface properties, and electrostatics of macromolecular structures and sequences. Methods Enzymol. 374:492-509. 2003.
The VASP procedure: Computational Solid Geometry
Computational Solid Geometry (CSG)
Boolean Set Operations on solids A and B:
• Union
• Intersection
• Difference
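The three Boolean operations can be illustrated with solids approximated as sets of integer lattice points (voxels). This is only a minimal sketch under that assumption: the helper `ball` and the solids `A` and `B` are hypothetical, and the voxel representation is a simplification chosen for brevity rather than the representation used in VASP.

```python
from itertools import product


def ball(cx: int, cy: int, cz: int, r: int) -> set:
    """Integer lattice points within distance r of the center (cx, cy, cz)."""
    points = set()
    for x, y, z in product(range(cx - r, cx + r + 1),
                           range(cy - r, cy + r + 1),
                           range(cz - r, cz + r + 1)):
        if (x - cx) ** 2 + (y - cy) ** 2 + (z - cz) ** 2 <= r * r:
            points.add((x, y, z))
    return points


# Two overlapping solids, A and B.
A = ball(0, 0, 0, 3)
B = ball(2, 0, 0, 3)

union = A | B          # everything inside A or B
intersection = A & B   # everything inside both A and B
difference = A - B     # everything inside A but outside B

print(len(A), len(B), len(union), len(intersection), len(difference))
```

With unit voxels, the size of each set approximates the volume of the corresponding solid.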
Compute the convex hull
[Schematic]
Barber, C.B., Dobkin, D.P., and Huhdanpaa, H.T., ACM T Math Software, 22(4):469-483
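To make the convex hull step concrete, here is a minimal two-dimensional sketch. It uses Andrew's monotone chain algorithm, not the Quickhull algorithm of the Barber et al. reference, and it works on 2D points rather than 3D atom coordinates; both simplifications are assumptions made to keep the example short.

```python
def cross(o, a, b):
    """Z-component of (a - o) x (b - o); positive for a counter-clockwise turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points):
    """Hull vertices in counter-clockwise order, starting from the lowest-x point."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower = []
    for p in pts:
        # Drop points that would make a clockwise or straight turn.
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # The last point of each chain is the first point of the other.
    return lower[:-1] + upper[:-1]


# Hypothetical 2D "atom centers": four corners and one interior point.
atoms = [(0, 0), (2, 0), (2, 2), (0, 2), (1, 1)]
print(convex_hull(atoms))
```

The interior point is discarded, leaving only the points that bound the shape.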
CSG intersection with the envelope surface
[Schematic]
The VASP procedure
Input: A Protein Family -> Align Structures -> Define Functional Cavities -> Volumetric Comparison of Cavities -> Output: Volumetric Differences
• Amino Acids affecting cavity shape
• Subcavities affecting cavity shape
Finding amino acids that affect cavity shape
[Figure: schematic with amino acids numbered 1 to 10, and a bar chart of volume (Å³, starting at 0) for amino acids 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, …]
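One plausible reading of the chart is that each amino acid is scored by a volume in cubic angstroms. The sketch below assumes that score is the volume each amino acid of one protein shares with the cavity of another; that reading, the voxel-set representation, and the names `overlap_volumes`, `rank_by_overlap`, `cavity_b`, and `residues_a` are all assumptions for illustration.

```python
def overlap_volumes(residues, cavity, voxel_volume=1.0):
    """Map each residue name to the volume of its overlap with the cavity."""
    return {name: len(voxels & cavity) * voxel_volume
            for name, voxels in residues.items()}


def rank_by_overlap(residues, cavity, voxel_volume=1.0):
    """Residues sorted from largest to smallest overlap volume."""
    volumes = overlap_volumes(residues, cavity, voxel_volume)
    return sorted(volumes.items(), key=lambda item: item[1], reverse=True)


# Toy data: a cavity of protein B and three residues of protein A,
# each given as a set of unit voxels (1 cubic angstrom per voxel assumed).
cavity_b = {(x, 0, 0) for x in range(0, 6)}
residues_a = {
    "res1": {(0, 0, 0), (1, 0, 0), (2, 0, 0)},
    "res2": {(5, 0, 0), (6, 0, 0)},
    "res3": {(9, 9, 9)},
}

ranking = rank_by_overlap(residues_a, cavity_b)
for name, volume in ranking:
    print(name, volume)
```

Under this reading, residues with a large overlap occupy space that is empty in the other protein, which makes them candidates for shaping the cavity differently.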
• A: Intersection
• B: Union (What is the maximum extent of B?)
• Output: Difference (all parts of A that are not in any part of B)
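The comparison above can be sketched with cavities approximated as sets of voxel identifiers. This is a simplified illustration: the function names and the toy families are hypothetical, and plain Python sets stand in for whatever solid representation VASP actually uses.

```python
def conserved_region(cavities):
    """Intersection: the region present in every cavity of a family."""
    cavities = list(cavities)
    if not cavities:
        return set()
    return set.intersection(*cavities)


def maximum_extent(cavities):
    """Union: the region covered by at least one cavity of a family."""
    return set().union(*cavities)


def distinguishing_region(family_a, family_b):
    """Difference: all parts of A that are not in any part of B."""
    return conserved_region(family_a) - maximum_extent(family_b)


# Toy one-dimensional "cavities", each a set of voxel identifiers.
family_a = [{1, 2, 3, 4}, {2, 3, 4, 5}]
family_b = [{4, 5}, {5, 6}]

print(conserved_region(family_a))
print(maximum_extent(family_b))
print(distinguishing_region(family_a, family_b))
```

Taking the intersection on the A side and the union on the B side makes the result conservative: a voxel survives only if every member of A has it and no member of B does.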
Results
• Serine Proteases: Same function, different specificity
– Trypsins
– Elastases
– Chymotrypsins
• Experiments
– VASP identifies amino acids that influence specificity
– VASP identifies subcavities that influence specificity
The serine protease family
Serine Proteases
• Chymotrypsin Clan (Catalytic Triad: His-Asp-Ser)
– Chymotrypsins
– Trypsins
– Elastases
• Subtilisin Clan (Asp-His-Ser)
– Subtilisins
• Other clans (not used)
– Oligopeptidases (Asp-Ser-His)
– Carboxypeptidases (Ser-Asp-His)
– Others..
Rawlings ND, Barrett AJ. Evolutionary families of peptidases. Biochem J. 1993 Feb 15; 290(pt.1):205-18.
Serine proteases break up other proteins
[Diagram: a Serine Protease whose Catalytic Triad hydrolyzes peptide bonds; substrate positions (N to C): P4 P3 P2 P1 P1’ P2’ P3’]
Blow DM, Birktoft JJ, Hartley BS. 1969. Role of a buried acid group in the mechanism of action of chymotrypsin. Nature 221:337-340
A serine protease up close
[Figure: 1a0j Atlantic Salmon Trypsin; Catalytic Triad: Asp 102, His 57, Ser 195; Ala-Arg Peptide Substrate (from 1fn8)]
Structural Alignment
• Elastases (EC 3.4.21.36)
• Trypsins (EC 3.4.21.4)
• Chymotrypsins (EC 3.4.21.1)
Alignment by Catalytic triad + S1 residue (Ca and Cb atoms)
Serine proteases have specificity for different sequences of amino acids
[Diagram: Serine Protease subsites S4 S3 S2 S1 S1’ S2’ S3’ facing substrate positions (N to C) P4 P3 P2 P1 P1’ P2’ P3’]
Schechter I and Berger A. On the size of the active site in proteases. I. Papain. Biochem. Biophys. Res. Commun. 1967 Apr 20;27(2):157-62
Chymotrypsins prefer big amino acids
[Figure: 1ab9 Bovine Chymotrypsin, with the Triad and Ser 189 labeled]
[Diagram: Chymotrypsin S1 subsite; substrate positions (N to C) P2 P1 P1’; P1 in { Tyr, Phe, Trp }]
Wesolowska, Krokoszynska, Krowarsch, Otlewski. Enhancement of Chymotrypsin-inhibitor/substrate interactions by 3M NaCl. Biochimica et Biophysica Acta 1545 (2001) 78-85
Trypsins bind positively charged residues
[Figure: 1a0j Atlantic Salmon Trypsin, with the Triad and Asp 189 labeled]
[Diagram: Trypsin S1 subsite; substrate positions (N to C) P2 P1 P1’; P1 in { Arg, Lys }; charges marked + and -]
Lesner, Kupryszewski, and Rolka. Chromogenic Substrates of Bovine beta-trypsin: The influence of an Amino Acid Residue in P1 Position on their Interaction with the Enzyme. Biochemical and Biophysical Research Communications, 285, pp 1350-1353 (2001)
Elastases prefer small amino acids
[Figure: 1b0e Porcine Pancreatic Elastase, with the Triad and Ser 189 labeled]
[Diagram: Elastase S1 subsite; substrate positions (N to C) P2 P1 P1’; P1 in { Ala, Gly, Val, .. }]
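The P1 preferences listed for the three enzymes can be written down as a small lookup. The sketch below is a deliberate oversimplification: it looks only at the P1 residue, ignores the other subsites, and includes only the three elastase residues named on the slide (whose list is open-ended). The names `P1_PREFERENCE` and `candidate_cleavage_sites` and the example peptide are hypothetical.

```python
# One-letter codes for the preferred P1 residues named on the slides.
P1_PREFERENCE = {
    "chymotrypsin": {"Y", "F", "W"},  # Tyr, Phe, Trp
    "trypsin": {"R", "K"},            # Arg, Lys
    "elastase": {"A", "G", "V"},      # Ala, Gly, Val (list is open-ended)
}


def candidate_cleavage_sites(sequence: str, enzyme: str) -> list:
    """Indices i where sequence[i] is a preferred P1 residue followed by a P1' residue.

    The peptide bond that would be hydrolyzed lies between positions i and i + 1.
    """
    preferred = P1_PREFERENCE[enzyme]
    return [i for i in range(len(sequence) - 1) if sequence[i] in preferred]


peptide = "GARFK"  # hypothetical example peptide
for enzyme in P1_PREFERENCE:
    print(enzyme, candidate_cleavage_sites(peptide, enzyme))
```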
The data is filtered for noise and bias
Protein Data Bank: Elastases, Trypsins, Chymotrypsins
• 58 individual structures; 45 Non-mutants; 2 with < 90% Sequence Identity
• 371 individual structures; 290 Non-mutants; 11 with < 90% Sequence Identity
• 91 individual structures; 91 Non-mutants; 2 with < 90% Sequence Identity
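The slide does not say how the sequence identity filter was applied. One common approach is a greedy pass that keeps a sequence only if it is less than 90% identical to every sequence kept so far; the sketch below shows that approach as an assumption, using toy aligned sequences and hypothetical names (`identity`, `filter_redundant`).

```python
def identity(a: str, b: str) -> float:
    """Fraction of identical residues over aligned columns without gaps."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    return sum(x == y for x, y in pairs) / len(pairs)


def filter_redundant(entries, threshold=0.90):
    """Greedy filter: keep an entry only if it is below the identity
    threshold against every entry already kept. Input order matters."""
    kept = []
    for name, seq in entries:
        if all(identity(seq, kept_seq) < threshold for _, kept_seq in kept):
            kept.append((name, seq))
    return kept


# Toy pre-aligned sequences, not real protein data.
entries = [
    ("a", "A" * 20),
    ("b", "A" * 19 + "C"),       # 95% identical to "a"
    ("c", "A" * 10 + "C" * 10),  # 50% identical to "a"
]

representatives = filter_redundant(entries)
print([name for name, _ in representatives])
```

Because the result depends on input order, a real pipeline would typically sort the entries first, for example by structure quality.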
VASP finds amino acids in elastase that influence specificity
[Figure: Chymotrypsin and Elastase panels; x-axis: Amino Acid Sequence Number]
Shotton D.M., Watson H.C. Three-dimensional structure of tosyl-elastase. Nature 225(5235): 811-816. 1970.
VASP finds amino acids in trypsins that influence specificity
[Figure: Chymotrypsin and Trypsin panels; x-axis: Amino Acid Sequence Number]
Steitz T.A., Henderson R., Blow D.M. Structure of crystalline alpha-chymotrypsin. 3. Crystallographic studies of substrates and inhibitors bound to the active site of alpha-chymotrypsin. J. Mol. Biol. 46(2): 337-348. 1969.
VASP finds subcavities in trypsins and elastases that influence specificity
[Figure: Trypsin Intersection; Elastase Union]
Discussion
• VASP can identify:
– Amino acids that influence specificity
– Subcavities that influence specificity
• Contributions
– The first unsupervised analysis of protein structures that identifies active components of functional sites
– The first algorithm to isolate the basis for specificity in protein structures
– The first representation of proteins using smooth solid volumes
• What can we use VASP for?
– Identify amino acids that might change specificity in drug resistance
– Influential subcavities point to drug designs that bind more specifically, and thus reduce side effects