Coiled Coils: A Tractable Problem for Bioinformatics
Vincent Waldman
Indiana University, January 30, 2008
Helical Net Representation of Coiled Coil Interface
abcdefgabcdefgabcdefgabcdefg
Left-Handed Super Helical Twist
2FXO
Myosin
A Schematic View of Coiled Coils
Knobs into Holes
PDB: 1YSA
Periodic, simple, right? GCN4
1YSA
bcdefgabcdefgabcdefgabcdefgabc
KQLEDKVEELLSKNYHLENEVARLKKLVGE
Oligomeric State
PDB: 2HY6
PDB: 1GCL   PDB: 2ZTA   PDB: 1GCM
GCN4-p1; a and d Ile; a Ile, e and g Ala
Relative Orientation
PDB: 2ZTA
GCN4-p1
PDB: 1SER
Residues 530-597
SRS
1AQ5
Cartilage Matrix Protein
Variety of Coiled Coils
2TMA
Tropomyosin
1SER
SRS
1T98
MukF
2NPS
SNAREs
2FXO
Myosin
GCN4
1YSA
1HTM
Influenza Hemagglutinin
2SPC
Spectrin
Hepatitis Delta Antigen
1A92
Tsr
1QU7
An Important Example of Unknown Coiled Coil Structure
SMC (B. subtilis), MukB (E. coli)
Coiled Coil Algorithms
• SOCKET
– Walshaw and Woolfson (2001) JMB, 1437
• Coils
– Lupas et al (1991) Science, 1162
• Paircoil
– Berger et al (1995) PNAS, 8259
• A genome-wide search for coiled coils that uses a version of Paircoil
• Biochemical data that have allowed the design of programs that predict potential interacting coiled coil pairs
SOCKET: Automatic Identification of Coiled Coils in Structures
GCN4
1YSA
bcdefgabcdefgabcdefgabcdefgabc
KQLEDKVEELLSKNYHLENEVARLKKLVGE
Tsr
1QU7
SOCKET
•SOCKET is a program that unambiguously identifies coiled coil motifs in protein structure and assigns register positions
•This program makes assignments based on structural features as opposed to sequence
Representing Side Chains
Walshaw and Woolfson (2001) JMB, 1437
Knobs Into Holes
Walshaw and Woolfson (2001) JMB, 1437
Types of Knobs into Holes
Walshaw and Woolfson (2001) JMB, 1437
Assigning Register With Complementary Knobs Into Holes
[Figure: holes formed by residues i, i+3, i+4 and i+7, shown in two views]
If $k_n = h_{n,2}$, the knob is at a d position
If $k_n = h_{n,3}$, it would be at an a position
Walshaw and Woolfson (2001) JMB, 1437
abcdefgabcdefgabcdefgabcdefgabc
N-term: top hole
C-term: bottom hole
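To make the complementary-knob rule concrete, here is a minimal Python sketch. It assumes (our reading of the diagram, not SOCKET's code) that the four hole residues are listed in sequence order i, i+3, i+4, i+7, so $h_{n,2}$ is i+3 and $h_{n,3}$ is i+4. Once one residue is labelled d or a, heptad periodicity fixes the rest of the helix. `complementary_position` and `propagate_register` are hypothetical helper names.

```python
from typing import Optional, Tuple

HEPTAD = "abcdefg"


def complementary_position(knob: int, hole: Tuple[int, int, int, int]) -> Optional[str]:
    """Label a knob that is also a member of the complementary hole.

    Assumption: `hole` holds residue indices in sequence order (i, i+3, i+4, i+7),
    so hole[1] plays h_{n,2} and hole[2] plays h_{n,3}.
    """
    if knob == hole[1]:
        return "d"  # k_n = h_{n,2}
    if knob == hole[2]:
        return "a"  # k_n = h_{n,3}
    return None  # not a complementary knob under this sketch's rule


def propagate_register(n_residues: int, anchor_index: int, anchor_letter: str) -> str:
    """Extend one assigned heptad position to a whole helix (0-based indices)."""
    offset = HEPTAD.index(anchor_letter)
    # Python's % is non-negative, so residues before the anchor wrap correctly.
    return "".join(HEPTAD[(offset + i - anchor_index) % 7] for i in range(n_residues))


# GCN4 (1YSA): treat Leu10 (index 9) as a complementary knob in hole (6, 9, 10, 13).
letter = complementary_position(9, (6, 9, 10, 13))
print(propagate_register(30, 9, letter))  # bcdefgabcdefgabcdefgabcdefgabc
```

The printed register matches the one shown for GCN4 earlier, which is a quick sanity check of the periodic propagation.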
Complementary Knobs into Holes in Trimers and Higher Order Structures
Walshaw and Woolfson (2001) JMB, 1437
Non-Coiled Coil Knobs Into Holes
Walshaw and Woolfson (2001) JMB, 1437
Minimum Layers in Higher Order Structures
Walshaw and Woolfson (2001) JMB, 1437
Coiled Coil Knobs Into Holes
Walshaw and Woolfson (2001) JMB, 1437
SOCKET Recap
• Side chains are represented by the mean of their atom coordinates
• A residue is considered a knob if it contacts 4 or more residues of another helix within a specified distance cutoff
• Holes are defined as the four closest residues to a knob
• Register is assigned through complementary knobs into holes
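The first three recap points can be sketched in a few lines of Python. This is an illustration under stated assumptions, not SOCKET itself: the 7.0 Å default cutoff is an assumed value, holes are taken from a single partner helix, and SOCKET's additional checks are omitted. `side_chain_centre` and `find_knobs` are hypothetical names.

```python
import math
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float, float]
ResidueId = Tuple[str, int]  # (helix label, residue number)


def side_chain_centre(atoms: Sequence[Point]) -> Point:
    """Represent a side chain by the mean of its atom coordinates."""
    n = len(atoms)
    return (
        sum(a[0] for a in atoms) / n,
        sum(a[1] for a in atoms) / n,
        sum(a[2] for a in atoms) / n,
    )


def find_knobs(
    centres: Dict[ResidueId, Point], cutoff: float = 7.0  # assumed cutoff, in Å
) -> Dict[Tuple[ResidueId, str], Tuple[ResidueId, ...]]:
    """Map (knob, partner helix) -> hole made of the four closest partner residues."""
    knobs = {}
    for res, c in centres.items():
        near: Dict[str, List[Tuple[float, ResidueId]]] = {}
        for other, oc in centres.items():
            if other[0] == res[0]:
                continue  # only count residues on other helices
            d = math.dist(c, oc)
            if d <= cutoff:
                near.setdefault(other[0], []).append((d, other))
        for helix, hits in near.items():
            if len(hits) >= 4:  # contacts 4 or more residues under the cutoff
                hits.sort()
                knobs[(res, helix)] = tuple(r for _, r in hits[:4])
    return knobs
```

Register assignment would then use the complementary pairs found in this map.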
To Use SOCKET
• Walshaw, J. & Woolfson, D.N. (2001), SOCKET: A Program for Identifying and Analysing Coiled-coil Motifs Within Protein Structures, J. Mol. Biol., 307 (5), 1427-1450
• http://www.lifesci.sussex.ac.uk/research/woolfson/html/coiledcoils/
• Requires
– PDB-format file with 3D coordinates
GCN4
1YSA
COILS
• COILS is a program that predicts the probability of coiled coil formation from protein sequence
• COILS compares protein sequence to a database of parallel two-stranded coiled coils
• The comparison generates a similarity score
• The score is compared to score distributions of globular and coiled coil proteins to generate a probability of coiled coil formation
2TMA
Database Assembly
• Globular db
– All non-redundant non-CC proteins in the PDB
– 150 proteins, ~32,600 residues
• CC db
– Tropomyosin, myosin, keratins
– All parallel dimers
– Extracted from GenBank
– ~17,500 residues
• GenBank db
– ~2,000,000 residues
• Randomly generated sequence db
– ~52,200 residues
Tropomyosin
2FXO
Myosin
Tabulate Relative Frequencies of Occurrence
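Normalized probabilities: $\nu_k(A) = \frac{f_k(A)/T_k}{W_A}$, where $f_k(A)$ is the number of occurrences of amino acid A at heptad position k in the CC db, $T_k$ is the total number of residues at position k, and $W_A$ is the frequency of A in GenBank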
Lupas et al (1991) Science, 1162
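A minimal Python sketch of tabulating $\nu_k(A)$ from register-annotated coiled-coil segments follows. The input layout (sequence/register string pairs plus a dictionary of background frequencies $W_A$) is an assumption for illustration. Residue/position pairs never seen in the CC database are simply absent here, and a real table would need some policy for them. `relative_frequencies` is a hypothetical name.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

HEPTAD = "abcdefg"


def relative_frequencies(
    cc_segments: Iterable[Tuple[str, str]], background: Dict[str, float]
) -> Dict[Tuple[str, str], float]:
    """Return nu[(position, residue)] = (f_k(A) / T_k) / W_A.

    cc_segments: (sequence, register) pairs from the coiled-coil database.
    background:  W_A, the frequency of each residue in the background database.
    """
    counts = {pos: Counter() for pos in HEPTAD}
    for seq, register in cc_segments:
        for aa, pos in zip(seq, register):
            counts[pos][aa] += 1  # accumulates f_k(A)
    nu = {}
    for pos, c in counts.items():
        total = sum(c.values())  # T_k
        for aa, f in c.items():
            nu[(pos, aa)] = (f / total) / background[aa]
    return nu
```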
Sliding Window To Calculate Residue Score
n1 n2 n3 n4 k5 n6 n7 ... n28 n29 n30 POI
Lupas et al (1991) Science, 1162
Window 1
Frequency Values
Lupas et al (1991) Science, 1162
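n1 n2 n3 n4 k5 n6 n7 ... n28 n29 n30
Register 1:  a b c d e f g ... g a b
$(\nu_a \cdot \nu_b \cdot \nu_c \cdot \nu_d \cdot \nu_e \cdot \nu_f \cdot \nu_g \cdots \nu_g)^{1/28} = S_{k1}$
Register 2:  b c d e f g a ... a b c
$(\nu_b \cdot \nu_c \cdot \nu_d \cdot \nu_e \cdot \nu_f \cdot \nu_g \cdot \nu_a \cdots \nu_a)^{1/28} = S_{k2}$
Register 7:  g a b c d e f ... f g a
$(\nu_g \cdot \nu_a \cdot \nu_b \cdot \nu_c \cdot \nu_d \cdot \nu_e \cdot \nu_f \cdots \nu_f)^{1/28} = S_{k7}$
POI
Window 1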
n1 n2 n3 n4 k5 n6 n7 ... n28 n29 n30 POI
Window 2
For each residue k there are
• 28 possible windows
• with 7 registers each
$S_k$ is the highest of these 196 (28 × 7) possible scores
Steps to Calculating Residue Score
i. Assign window
ii. Assign heptad register to window
iii. Assign each residue in the window the corresponding relative frequency
iv. Take geometric mean
v. Assign new heptad register and repeat iii and iv
vi. Assign new window and repeat ii-v
vii. Take largest score
Lupas et al (1991) Science, 1162
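The seven steps above can be sketched in Python as follows. The geometric mean is computed as the exponential of the mean log, which is numerically safer but mathematically identical. The `floor` value for residues missing from the table is a hypothetical stand-in, and near the sequence ends this sketch simply uses however many windows fit. `coils_scores` is a hypothetical name, not the COILS program's code.

```python
import math
from typing import Dict, List, Tuple

HEPTAD = "abcdefg"


def coils_scores(
    seq: str, nu: Dict[Tuple[str, str], float], window: int = 28, floor: float = 1e-3
) -> List[float]:
    """Score each residue as the best geometric mean over windows and registers.

    `floor` is a hypothetical value for (position, residue) pairs missing from nu.
    Residues in sequences shorter than `window` keep a score of 0.0.
    """
    scores = [0.0] * len(seq)
    for start in range(len(seq) - window + 1):      # i. / vi. each window
        for shift in range(7):                       # ii. / v. each register
            log_sum = 0.0
            for j in range(window):                  # iii. look up relative frequency
                pos = HEPTAD[(shift + j) % 7]
                log_sum += math.log(nu.get((pos, seq[start + j]), floor))
            s = math.exp(log_sum / window)           # iv. geometric mean
            for k in range(start, start + window):   # vii. keep the largest
                scores[k] = max(scores[k], s)
    return scores
```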
Gaussian Distribution of CC Scores
• Scores were also computed for GenBank and randomly generated sequences, for comparison and scaling purposes
Lupas et al (1991) Science, 1162
Probability a Residue is CC
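$P(S) = \frac{G_{cc}(S)}{30\,G_g(S) + G_{cc}(S)}$, where $G_{cc}$ and $G_g$ are the Gaussian curves fitted to the coiled-coil and globular score distributions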
Lupas et al (1991) Science, 1162
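As a sketch of how a score becomes a probability, the Python below evaluates $P(S)$ with the slide's factor of 30 as a default. The Gaussian means and standard deviations are inputs that would come from fitting the two score distributions; none are assumed here. The function works in log space so that very high or very low scores do not underflow. `coil_probability` is a hypothetical name.

```python
import math


def log_gaussian(x: float, mean: float, sd: float) -> float:
    """Log density of a normal distribution."""
    return -0.5 * ((x - mean) / sd) ** 2 - math.log(sd) - 0.5 * math.log(2 * math.pi)


def coil_probability(
    score: float, cc_mean: float, cc_sd: float, glob_mean: float, glob_sd: float,
    ratio: float = 30.0,
) -> float:
    """P(S) = G_cc(S) / (ratio * G_g(S) + G_cc(S)), computed in log space."""
    x = (math.log(ratio) + log_gaussian(score, glob_mean, glob_sd)
         - log_gaussian(score, cc_mean, cc_sd))
    # P = 1 / (1 + e^x), rearranged so math.exp never overflows
    if x > 0:
        e = math.exp(-x)
        return e / (1.0 + e)
    return 1.0 / (1.0 + math.exp(x))
```

If both fitted curves were identical, the result would be 1/31: the factor of 30 alone tilts the call toward "globular".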
Coils Output For MukB
http://www.ch.embnet.org/software/COILS_form.html
Coils Recap
• A score is generated for each residue of a sequence, based on relative frequencies
• Scores are interpreted as the probability of forming a coiled coil
• Limitations
– Trained on parallel dimers
– Biased towards hydrophilic, charge-rich sequences
Coils Options
• Scoring options
– MTK matrix, as described
– MTIDK matrix, which weights residue frequencies according to CC families
• Weighting of a and d positions
• Window sizes 14, 21, 28
To Use Coils
• Lupas, A., Van Dyke, M., and Stock, J. (1991) Predicting Coiled Coils from Protein Sequences, Science 252:1162-1164.
• http://www.ch.embnet.org/software/COILS_form.html
• Requirements
– Almost any standard protein sequence format can be used as input
Coils Spin-Off
PCOILS: the most recent version of Coils
– Trained on a larger data set
– More options for matrices
– Uses BLAST to calculate probabilities on alignments
– Much slower than Coils
To use: part of the REPPER server
– http://toolkit.tuebingen.mpg.de/pcoils
PAIRCOIL
• PAIRCOIL, like COILS, is a program that predicts the probability of coiled coil formation from protein sequence
• PAIRCOIL differs from COILS in that it uses pairwise interactions, rather than relative occurrences at register positions, to determine similarity scores
Database Assembly
• CC database
– Myosin, tropomyosin, intermediate filaments
– From GenPept
– ~58,200 residues
• PDB-minus database
– PDB database with known coiled coils removed
– The multiple alignment program pileup reduced protein sequences to 286 classes
– One representative structure from each class was used
– 63,100 residues
• PIR-minus database
– PIR database with the myosins, tropomyosins, and IF proteins removed
– ~7,300,000 residues
Computing Probabilities
n1 n2 n3 n4 k5 n6 n7 n8 n9 n10 n11
a b c d e f g a b c d
k-4 k-3 k-2 k-1 k k+1 k+2 k+3 k+4 k+5 k+6
POI
Register
Representation
In this register: k = e, k+1 = f, k+2 = g
Computing Normalized Probabilities
• Single occurrence frequencies are computed as they were in COILS
– Relative frequency of occurrence: $\nu_k(A) = \frac{f_k(A)/T_k}{W_A}$ or $\nu_{k+i}(B) = \frac{f_{k+i}(B)/T_{k+i}}{W_B}$
• Correlated occurrence frequencies
– $\nu_{k,k+i}(A,B) = \frac{f_{k,k+i}(A,B)/T_{k,k+i}}{W_{AB\text{-}i}}$
Berger et al (1995) PNAS, 8259
Tabulating Pairwise Correlations
$$P_{k,k+i}(A,B) = \ln \frac{\nu_{k,k+i}(A,B)}{\nu_k(A)\,\nu_{k+i}(B)}$$
Berger et al (1995) PNAS, 8259
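A minimal Python sketch of tabulating the pairwise log-odds $P_{k,k+i}(A,B)$ for one separation i follows. Assumptions: inputs are register-annotated CC segments; `w_single[A]` stands in for $W_A$ and `w_pair[(A, B)]` for $W_{AB\text{-}i}$; the heptad position of residue k+i follows from periodicity. `pair_log_odds` is a hypothetical name, not PAIRCOIL's implementation.

```python
import math
from collections import Counter
from typing import Dict, Iterable, Tuple

HEPTAD = "abcdefg"


def pair_log_odds(
    cc_segments: Iterable[Tuple[str, str]],
    i: int,
    w_single: Dict[str, float],
    w_pair: Dict[Tuple[str, str], float],
) -> Dict[Tuple[str, str, str], float]:
    """Return P[(pos_k, A, B)] = ln(nu_{k,k+i}(A,B) / (nu_k(A) * nu_{k+i}(B)))."""
    single = {p: Counter() for p in HEPTAD}
    pair = {p: Counter() for p in HEPTAD}
    for seq, register in cc_segments:
        for j, (aa, pos) in enumerate(zip(seq, register)):
            single[pos][aa] += 1
            if j + i < len(seq):
                pair[pos][(aa, seq[j + i])] += 1  # residue B sits i positions later

    def nu(pos: str, aa: str) -> float:
        return (single[pos][aa] / sum(single[pos].values())) / w_single[aa]

    table = {}
    for pos, c in pair.items():
        total = sum(c.values())  # T_{k,k+i}
        pos_i = HEPTAD[(HEPTAD.index(pos) + i) % 7]  # heptad position of k+i
        for (a, b), f in c.items():
            nu_ab = (f / total) / w_pair[(a, b)]
            table[(pos, a, b)] = math.log(nu_ab / (nu(pos, a) * nu(pos_i, b)))
    return table
```

A positive entry means A at position k and B at k+i co-occur more often than their single frequencies predict.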
Tabulating Pairwise Correlations
Berger et al (1995) PNAS, 8259
Color Coding Pairwise Correlations
[Figure: pairwise correlations color-coded by residue separation, from (k, k+1) through (k, k+7)]
Berger et al (1995) PNAS, 8259
Using Correlation Probabilities to Predict Coiled Coil
$$P_k(A) = \frac{1}{3} \ln \frac{\nu_{k,k+1}(A,B)\,\nu_{k,k+2}(A,C)\,\nu_{k,k+4}(A,D)}{\nu_{k+1}(B)\,\nu_{k+2}(C)\,\nu_{k+4}(D)}$$
Berger et al (1995) PNAS, 8259
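The tripartite expression is one third of the log of a product, which equals the mean of three log ratios. The Python sketch below evaluates it for one residue. The offsets (1, 2, 4) are read from the subscripts above, and the lookup-table layout (`nu_pair[(pos_k, i, A, X)]`, `nu_single[(pos, X)]`) is a hypothetical choice for illustration.

```python
import math
from typing import Dict, Sequence, Tuple


def tripartite_probability(
    seq: str,
    register: str,
    j: int,
    nu_pair: Dict[Tuple[str, int, str, str], float],
    nu_single: Dict[Tuple[str, str], float],
    offsets: Sequence[int] = (1, 2, 4),
) -> float:
    """P_k(A) for residue j: mean of ln(nu_{k,k+i}(A, X) / nu_{k+i}(X)) over offsets.

    j + max(offsets) must lie inside the sequence.
    """
    a = seq[j]
    terms = [
        math.log(nu_pair[(register[j], i, a, seq[j + i])]
                 / nu_single[(register[j + i], seq[j + i])])
        for i in offsets
    ]
    return sum(terms) / len(terms)  # (1/3) ln(product) == mean of the logs
```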
Calculating Residue Score
1. Set a sliding window of 30 residues to include residue k
2. Sum all tripartite correlation probabilities for each residue in the window, for each register
3. Shift the window and repeat step 2
4. Repeat steps 2 and 3 until all possible windows and registers have been scored
5. The residue score is the highest value over all possible windows and registers
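The five steps can be sketched generically in Python. The per-residue scorer (for example, a tripartite probability) is passed in as a callable and receives a register string for the whole sequence. Scorers that look ahead to k+4 must handle residues near the C-terminus themselves, which is an assumption of this sketch. `paircoil_window_scores` is a hypothetical name.

```python
from typing import Callable, List

HEPTAD = "abcdefg"


def paircoil_window_scores(
    n: int, per_residue: Callable[[int, str], float], window: int = 30
) -> List[float]:
    """Residue score = best window sum over all 30-residue windows and 7 registers.

    per_residue(j, register) scores residue j given a register for the whole sequence.
    Sequences shorter than `window` leave every score at -inf.
    """
    scores = [float("-inf")] * n
    for shift in range(7):  # each of the 7 registers
        register = "".join(HEPTAD[(shift + j) % 7] for j in range(n))
        per = [per_residue(j, register) for j in range(n)]
        for start in range(n - window + 1):  # slide the window (steps 1, 3, 4)
            total = sum(per[start:start + window])  # step 2: sum within the window
            for k in range(start, start + window):
                scores[k] = max(scores[k], total)  # step 5: keep the highest
    return scores
```

Fixing one register string per shift and sliding the window across it covers the same window/register combinations as re-registering each window separately.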
Comparison of COILS to PAIRCOIL
PDB-minus database for non-coiled coils
Berger et al (1995) PNAS, 8259
Comparison of COILS to PAIRCOIL
PIR-minus database for non-coiled coils
Berger et al (1995) PNAS, 8259
Using Correlation Score to predict Probability
Berger et al (1995) PNAS, 8259
MultiCoil a PairCoil Spin Off
Green: PDB-minus ~39,000 residues
Red: 2-strand DB ~58,200 residues
Blue: 3-strand DB ~6,300 residues
• Distinguishes between globular proteins, two-stranded CCs and three-stranded CCs
Wolf et al (1997) Protein Science, 1179
To use Paircoil
• Paircoil
– http://groups.csail.mit.edu/cb/paircoil/cgi-bin/paircoil.cgi
• Multicoil
– http://groups.csail.mit.edu/cb/multicoil/cgi-bin/multicoil.cgi
• Paircoil2
– http://groups.csail.mit.edu/cb/paircoil2/paircoil2.html
Output Comparisons
Coils Paircoil
Multicoil Output
A Computationally Directed Screen Identifying Interacting CCs in Yeast
– Protein-interaction motifs are often identified computationally, but their potential ligands are not
– A potential ligand for a CC is a CC
Newman et al (2000) PNAS, 13203
Using Multicoil to Identify Potential Pairing Partners
• ~6,000 ORFs in yeast
• ~300 two-stranded
• ~250 three-stranded
• ~1 in 11 yeast proteins potentially has a CC
• ~Half of these have no known function
Newman et al (2000) PNAS, 13203
Yeast Two Hybrid Assay
GAR-Y
GDBD-X
CC motifs often work well as X and Y because they can often fold autonomously
Fields, Song (1989) Nature, 245
162 x 162 = 26244 possible combinations (about half redundant)
Identified 213 interactions
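One way to read "about half redundant" is as arithmetic. Assume (our interpretation) that the same 162 constructs serve as both GDBD-X and GAR-Y fusions. Then X-Y and Y-X test the same pair, and only the unordered pairs, self-pairs included, are distinct:

```python
n = 162  # assumed: the same constructs are used as both GDBD-X and GAR-Y fusions

ordered_pairs = n * n               # every X/Y combination tested
unordered_pairs = n * (n + 1) // 2  # X-Y and Y-X counted once, self-pairs kept

print(ordered_pairs, unordered_pairs, round(unordered_pairs / ordered_pairs, 3))
# 26244 13203 0.503
```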
Close-Up of Yeast Two-Hybrid Results
GAR-Y
GDBD-X
Newman et al (2000) PNAS, 13203
Limitations to this Method
• Protein-protein interactions may require non-CC contacts in addition to CC contacts
– This study identified only 6 of the 25 known interactions assayed from the yeast genome
• Some CC types are not easily detected with Y2H
– Parallel homodimeric constructs are not easily detected
• False positives are possible but arguably not likely
Human bZIPs: A Testing Ground for Software that Predicts Protein-Protein Interactions
~53 unique human bZIPs
~1131 unique heterodimers
Newman, Keating (2003) Science, 2097
Current Generation of CC programs
• Aim to predict novel interacting CC partners
• Two notable studies attempt to describe bZIP proteins, with considerable success
1. Weighting potential interstrand interactions based on many biophysical studies
– Fong JA, Keating AE, Singh M (2004) Genome Biology, 5:R11
2. Physical modeling
– Grigoryan G, Keating AE (2006) JMB, 355, 1125
Coiled Coil Recap
SOCKET: Identifies and assigns coiled coil registers to structures
Coils, Paircoil, Multicoil and progeny: predict the likelihood that a particular sequence belongs to a CC
Programs are beginning to be able to predict potential coiled coil binding partners