
Coiled Coils: A Tractable Problem for Bioinformatics



Vincent Waldman

Indiana University, January 30, 2008


Helical Net Representation of Coiled Coil Interface

abcdefgabcdefgabcdefgabcdefg
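To see why the a and d positions line up into a single interface stripe, a minimal Python sketch can place each heptad letter on an idealized helical wheel. It assumes exactly two turns per heptad (720°/7 ≈ 102.9° per residue), the textbook approximation for coiled-coil helices rather than a value measured from any structure here; the function name is illustrative.

```python
# Idealized helical-wheel angles for heptad positions a-g.
# Assumption: 7 residues span exactly 2 turns (720/7 degrees per residue).
HEPTAD = "abcdefg"
DEG_PER_RESIDUE = 720.0 / 7.0


def wheel_angles() -> dict:
    """Return the wheel angle (degrees, 0-360) of each heptad position, with a at 0."""
    return {pos: (i * DEG_PER_RESIDUE) % 360.0 for i, pos in enumerate(HEPTAD)}


if __name__ == "__main__":
    # Print positions in the order they appear around the wheel.
    for pos, angle in sorted(wheel_angles().items(), key=lambda item: item[1]):
        print(f"{pos}: {angle:6.1f}")
```

Sorted by angle the wheel reads a, e, b, f, c, g, d, so d sits about 51° from a on one side and e about 51° on the other, with g flanking d: the a/d core and e/g flanks drawn in the helical net.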


Left-Handed Superhelical Twist

2FXO

Myosin


A Schematic View of Coiled Coils


Knobs into Holes

PDB: 1YSA


Periodic, simple, right? GCN4

1YSA

bcdefgabcdefgabcdefgabcdefgabc

KQLEDKVEELLSKNYHLENEVARLKKLVGE
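The register line can be read directly against the sequence. A short Python sketch (the function name `residues_at` is hypothetical) pulls out the residues assigned to any heptad position of this 1YSA segment, using the two strings exactly as printed above:

```python
# Register and sequence of the GCN4 (PDB 1YSA) segment, as shown on the slide.
REGISTER = "bcdefgabcdefgabcdefgabcdefgabc"
SEQUENCE = "KQLEDKVEELLSKNYHLENEVARLKKLVGE"


def residues_at(position: str, register: str = REGISTER, sequence: str = SEQUENCE) -> str:
    """Return the residues assigned to one heptad position (e.g. 'a' or 'd')."""
    if len(register) != len(sequence):
        raise ValueError("register and sequence must have the same length")
    return "".join(res for res, pos in zip(sequence, register) if pos == position)


if __name__ == "__main__":
    for pos in "ad":
        print(pos, residues_at(pos))
```

Run as a script, it prints `a VNVV` and `d LLLL`: every d position of this segment is Leu, the pattern that gives the leucine zipper its name.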


Oligomeric State

[Figure: oligomeric states of GCN4-p1 and variants with Ile at a and d, and with Ile at a plus Ala at e and g; PDB: 2HY6, 1GCL, 2ZTA, 1GCM]


Relative Orientation

[Figure: relative helix orientation in GCN4-p1 (PDB: 2ZTA) and SRS (PDB: 1SER; residues 530-597); N and C mark the chain termini]


Variety of Coiled Coils

• 1AQ5: Cartilage Matrix Protein
• 2TMA: Tropomyosin
• 1SER: SRS
• 1T98: MukF
• 2NPS: SNAREs
• 2FXO: Myosin
• 1YSA: GCN4
• 1HTM: Influenza Hemagglutinin
• 2SPC: Spectrin
• 1A92: Hepatitis Delta Antigen
• 1QU7: Tsr


An Important Example of Unknown Coiled Coil Structure

[Figure: SMC (B. subtilis) and MukB (E. coli)]


Coiled Coil Algorithms

• SOCKET
– Walshaw and Woolfson (2001) JMB, 1427

• Coils
– Lupas et al (1991) Science, 1162

• Paircoil
– Berger et al (1995) PNAS, 8259

• A genome-wide search for coiled coils that uses a version of Paircoil

• Biochemistry that has made it possible to design programs that predict potential interacting coiled-coil pairs


SOCKET: Automatic Identification of Coiled Coils in Structures

GCN4

1YSA

bcdefgabcdefgabcdefgabcdefgabc

KQLEDKVEELLSKNYHLENEVARLKKLVGE

Tsr

1QU7


SOCKET

• SOCKET is a program that unambiguously identifies coiled-coil motifs in protein structures and assigns their heptad register positions

• It makes these assignments from structural features rather than from sequence


Representing Side Chains

Walshaw and Woolfson (2001) JMB, 1427


Knobs Into Holes

Walshaw and Woolfson (2001) JMB, 1427


Types of Knobs into Holes

Walshaw and Woolfson (2001) JMB, 1427


Assigning Register With Complementary Knobs Into Holes

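[Figure: knobs packing into holes formed by residues i, i+3, i+4 and i+7 of the partner helix; N-terminal (top) and C-terminal (bottom) holes; heptad register abcdefgabcdefgabcdefgabcdefgabc]

$k_n = h_{n,2}$: d position

If $k_n = h_{n,3}$, it would be an a position

Walshaw and Woolfson (2001) JMB, 1427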


Complementary Knobs into Holes in Trimers and Higher-Order Structures

Walshaw and Woolfson (2001) JMB, 1427


Non-Coiled Coil Knobs Into Holes

Walshaw and Woolfson (2001) JMB, 1427


Minimum Layers in Higher Order Structures

Walshaw and Woolfson (2001) JMB, 1427


Coiled Coil Knobs Into Holes

Walshaw and Woolfson (2001) JMB, 1427


SOCKET Recap

• Side chains are represented by the mean of their atom coordinates

• A residue is considered a knob if it contacts four or more residues on another helix within a specified distance cutoff

• Holes are defined as the four residues closest to a knob

• Register is assigned through complementary knobs-into-holes packing
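The recap can be turned into a toy knob-and-hole finder. This is a simplified Python sketch, not SOCKET's implementation: the residue identifiers, the 7.0 Å default cutoff and the rule that contacts are counted only between different helices are assumptions made for illustration, and SOCKET itself applies further checks before accepting a hole.

```python
import math
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float, float]
ResidueId = Tuple[str, int]  # hypothetical labelling: (helix label, residue number)


def centroid(atoms: Sequence[Point]) -> Point:
    """Represent a side chain by the mean of its atom coordinates."""
    n = len(atoms)
    return (
        sum(a[0] for a in atoms) / n,
        sum(a[1] for a in atoms) / n,
        sum(a[2] for a in atoms) / n,
    )


def find_knobs(
    side_chains: Dict[ResidueId, List[Point]], cutoff: float = 7.0  # cutoff is an assumed value
) -> Dict[ResidueId, List[ResidueId]]:
    """Return {knob: hole}: a knob has >= 4 partner-helix centroids within the
    cutoff, and its hole is the four closest of them."""
    centres = {rid: centroid(atoms) for rid, atoms in side_chains.items()}
    knobs: Dict[ResidueId, List[ResidueId]] = {}
    for rid, centre in centres.items():
        # (distance, id) for centroids on other helices within the cutoff, nearest first
        contacts = sorted(
            (math.dist(centre, other), oid)
            for oid, other in centres.items()
            if oid[0] != rid[0] and math.dist(centre, other) <= cutoff
        )
        if len(contacts) >= 4:
            knobs[rid] = [oid for _, oid in contacts[:4]]
    return knobs
```

Register assignment from complementary knobs, the last step of the recap, would be built on top of the returned knob-to-hole map.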


To Use SOCKET

• Walshaw, J. & Woolfson, D.N. (2001), SOCKET: A Program for Identifying and Analysing Coiled-coil Motifs Within Protein Structures, J. Mol. Biol., 307 (5), 1427-1450

• http://www.lifesci.sussex.ac.uk/research/woolfson/html/coiledcoils/

• Requires a PDB-format file with 3D coordinates

GCN4

1YSA


COILS

• COILS is a program that predicts the probability of coiled-coil formation from protein sequence

• COILS compares a protein sequence to a database of parallel two-stranded coiled coils

• The comparison generates a similarity score

• The score is compared to the score distributions of globular and coiled-coil proteins to give a probability of coiled-coil formation


Database Assembly

• CC db
– Tropomyosin, myosin, keratins
– All parallel dimers
– Extracted from GenBank
– ~17,500 residues

• Globular db
– All non-redundant non-coiled-coil proteins in the PDB
– 150 proteins, ~32,600 residues

• GenBank db: ~2,000,000 residues

• Randomly generated sequence db: ~52,200 residues

[Figure: Tropomyosin (PDB: 2TMA) and Myosin (PDB: 2FXO)]


Tabulate Relative Frequencies of Occurrence

Normalized probabilities: $\nu_k(A) = \dfrac{f_k(A)/T_k}{W_A}$

Lupas et al (1991) Science, 1162
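The normalized frequencies can be tabulated in a few lines of Python. This sketch reads $f_k(A)$ as the count of residue A at heptad position k in the coiled-coil database, $T_k$ as the total number of residues at position k, and $W_A$ as the background frequency of A (for example from GenBank); the input format and function name are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def normalized_frequencies(
    assigned: Iterable[Tuple[str, str]],
    background: Dict[str, float],
) -> Dict[Tuple[str, str], float]:
    """Return {(heptad position, residue): nu_k(A)} for every observed pair.

    assigned   -- (residue, heptad position) pairs taken from known coiled coils
    background -- background frequency W_A of each residue
    """
    pairs = list(assigned)
    counts = Counter((pos, res) for res, pos in pairs)  # f_k(A)
    totals = Counter(pos for _, pos in pairs)           # T_k
    return {
        (pos, res): (count / totals[pos]) / background[res]
        for (pos, res), count in counts.items()
    }


if __name__ == "__main__":
    # Toy input: three d positions (two Leu, one Ala) and one e position (Glu).
    toy = [("L", "d"), ("L", "d"), ("A", "d"), ("E", "e")]
    print(normalized_frequencies(toy, {"L": 0.1, "A": 0.05, "E": 0.05}))
```

Values above 1 mark residues that are enriched at that heptad position relative to the background.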


Sliding Window To Calculate Residue Score

n1 n2 n3 n4 k5 n6 n7 ... n28 n29 n30 POI

Lupas et al (1991) Science, 1162


n1 n2 n3 n4 k5 n6 n7 ... n28 n29 n30 POI

Window 1

Sliding Window To Calculate Residue Score

Lupas et al (1991) Science, 1162


n1 n2 n3 n4 k5 n6 n7 ... n28 n29 n30

a b c d e f g g a b

$S_{k,1} = (\nu_a \cdot \nu_b \cdot \nu_c \cdot \nu_d \cdot \nu_e \cdot \nu_f \cdot \nu_g \cdots \nu_g)^{1/28}$

POI

Register 1

Window 1

Sliding Window To Calculate Residue Score

Lupas et al (1991) Science, 1162


Frequency Values

Lupas et al (1991) Science, 1162


n1 n2 n3 n4 k5 n6 n7 ... n28 n29 n30

a b c d e f g g a b

$S_{k,1} = (\nu_a \cdot \nu_b \cdot \nu_c \cdot \nu_d \cdot \nu_e \cdot \nu_f \cdot \nu_g \cdots \nu_g)^{1/28}$

b c d e f g a a b c

$S_{k,2} = (\nu_b \cdot \nu_c \cdot \nu_d \cdot \nu_e \cdot \nu_f \cdot \nu_g \cdot \nu_a \cdots \nu_a)^{1/28}$

POI

Register 1

Register 2

Window 1

Sliding Window To Calculate Residue Score

Lupas et al (1991) Science, 1162


n1 n2 n3 n4 k5 n6 n7 ... n28 n29 n30

a b c d e f g g a b

$S_{k,1} = (\nu_a \cdot \nu_b \cdot \nu_c \cdot \nu_d \cdot \nu_e \cdot \nu_f \cdot \nu_g \cdots \nu_g)^{1/28}$

b c d e f g a a b c

$S_{k,2} = (\nu_b \cdot \nu_c \cdot \nu_d \cdot \nu_e \cdot \nu_f \cdot \nu_g \cdot \nu_a \cdots \nu_a)^{1/28}$

g a b c d e f f g a

$S_{k,7} = (\nu_g \cdot \nu_a \cdot \nu_b \cdot \nu_c \cdot \nu_d \cdot \nu_e \cdot \nu_f \cdots \nu_f)^{1/28}$

POI

Register 1

Register 2

Register 7

Window 1

Sliding Window To Calculate Residue Score

Lupas et al (1991) Science, 1162


n1 n2 n3 n4 k5 n6 n7 ... n28 n29 n30 POI

Window 2

For each residue k there are:

• 28 possible windows

• 7 registers for each window

$S_k$ is the highest of these 28 × 7 = 196 possible scores

Sliding Window To Calculate Residue Score

Lupas et al (1991) Science, 1162


Steps to Calculating Residue Score

i. Assign a window
ii. Assign a heptad register to the window
iii. Assign each residue in the window its corresponding relative frequency
iv. Take the geometric mean
v. Assign a new heptad register and repeat iii and iv
vi. Assign a new window and repeat ii-v
vii. Take the largest score

Lupas et al (1991) Science, 1162
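Steps i-vii can be sketched in Python as a maximum, over windows and registers, of a geometric mean computed in log space to avoid underflow. The lookup format, the function names and the optional a/d weight `w_ad` (a weighted geometric mean, loosely mirroring the a/d weighting option COILS offers) are assumptions; the weight value COILS actually uses is not stated here.

```python
import math
from typing import Dict, List, Tuple

HEPTAD = "abcdefg"
Nu = Dict[Tuple[str, str], float]  # (heptad position, residue) -> nu; all values must be > 0


def window_score(window: str, register: int, nu: Nu, w_ad: float = 1.0) -> float:
    """Weighted geometric mean of nu over one window; register = heptad index of its first residue."""
    log_sum = 0.0
    weight_sum = 0.0
    for offset, res in enumerate(window):
        pos = HEPTAD[(register + offset) % 7]
        weight = w_ad if pos in "ad" else 1.0
        log_sum += weight * math.log(nu[(pos, res)])
        weight_sum += weight
    return math.exp(log_sum / weight_sum)


def residue_scores(seq: str, nu: Nu, window: int = 28, w_ad: float = 1.0) -> List[float]:
    """Best score S_k for each residue over all windows containing it and all 7 registers."""
    n = len(seq)
    scores = []
    for k in range(n):
        best = 0.0  # stays 0.0 when the sequence is shorter than the window
        for start in range(max(0, k - window + 1), min(k, n - window) + 1):
            for register in range(7):
                best = max(best, window_score(seq[start:start + window], register, nu, w_ad))
        scores.append(best)
    return scores
```

With the default 28-residue window each residue is scored over at most 28 × 7 = 196 candidates; residues of sequences shorter than the window get 0.0 in this sketch.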


Gaussian Distribution of CC Scores

• Scores were also computed for GenBank and randomly generated sequences, for comparison and scaling

Lupas et al (1991) Science, 1162


Probability a Residue is CC

$P(S) = \dfrac{G_{cc}(S)}{30\,G_g(S) + G_{cc}(S)}$

Lupas et al (1991) Science, 1162
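Reading $G_{cc}$ and $G_g$ as Gaussian fits to the score distributions of coiled-coil and globular proteins, the conversion from score to probability takes a few lines. The means and standard deviations below are placeholders for illustration, not the values fitted by Lupas et al.; the factor of 30 up-weights the globular distribution, encoding a prior that globular residues far outnumber coiled-coil residues.

```python
import math
from typing import Tuple


def gaussian(x: float, mean: float, sd: float) -> float:
    """Normal probability density."""
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))


def coil_probability(
    score: float,
    cc: Tuple[float, float] = (1.6, 0.25),   # placeholder (mean, sd) for G_cc
    glob: Tuple[float, float] = (0.8, 0.2),  # placeholder (mean, sd) for G_g
    ratio: float = 30.0,
) -> float:
    """P(S) = G_cc(S) / (ratio * G_g(S) + G_cc(S))."""
    g_cc = gaussian(score, *cc)
    g_g = gaussian(score, *glob)
    return g_cc / (ratio * g_g + g_cc)
```

When the two densities are equal the result is 1/31, so a score has to be far more typical of coiled coils than of globular proteins before P(S) approaches 1. Far from both means both densities underflow to zero, so a robust version would work in log space.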


Coils Output For MukB

http://www.ch.embnet.org/software/COILS_form.html


Coils Recap

• A score is generated for each residue of a sequence based on relative frequencies

• Scores are interpreted as a probability of forming a coiled coil

• Limitations
– Trained on parallel dimers
– Biased towards hydrophilic, charge-rich sequences


Coils Options

• Scoring options
– MTK matrix, as described
– MTIDK matrix, which weights residue frequencies according to CC families

• Weighting of a and d positions

• Window sizes of 14, 21 or 28


To Use Coils

• Lupas, A., Van Dyke, M. & Stock, J. (1991), Predicting Coiled Coils from Protein Sequences, Science, 252, 1162-1164

• http://www.ch.embnet.org/software/COILS_form.html

• Requirements
– Almost any standard protein sequence format can be used as input


Coils Spin-Off

PCOILS: the most recent version of Coils
– Trained on a larger data set
– More options for matrices
– Uses BLAST to calculate probabilities on alignments
– Is much slower than Coils

To use: part of the REPPER server
– http://toolkit.tuebingen.mpg.de/pcoils


PAIRCOIL

• PAIRCOIL, like COILS, is a program that predicts the probability of coiled-coil formation from protein sequence

• PAIRCOIL differs from COILS in that it uses pairwise interactions (correlations between residue pairs) to determine similarity scores, instead of single-residue occurrences at register positions


Database Assembly

• CC database
– Myosins, tropomyosins, intermediate filaments
– From GenPept
– ~58,200 residues

• PDB-minus database
– PDB with known coiled coils removed
– The multiple alignment program pileup reduced the protein sequences to 286 classes
– One representative structure from each class was used
– 63,100 residues

• PIR-minus database
– PIR with the myosins, tropomyosins and IF proteins removed
– ~7,300,000 residues


Computing Probabilities

n1 n2 n3 n4 k5 n6 n7 n8 n9 n10 n11

a b c d e f g a b c d

k-4 k-3 k-2 k-1 k k+1 k+2 k+3 k+4 k+5 k+6

POI

Register representation

In this register: k = e, k+1 = f, k+2 = g


Computing Normalized Probabilities

• Single-occurrence frequencies are computed as in COILS
– Relative frequency of occurrence: $\nu_k(A) = \dfrac{f_k(A)/T_k}{W_A}$ or $\nu_{k+i}(B) = \dfrac{f_{k+i}(B)/T_{k+i}}{W_B}$

• Correlated (pairwise) occurrence frequencies
– $\nu_{k,k+i}(A,B) = \dfrac{f_{k,k+i}(A,B)/T_{k,k+i}}{W_{AB,i}}$

Berger et al (1995) PNAS, 8259


Tabulating Pairwise Correlations

$P_{k,k+i}(A,B) = \ln \dfrac{\nu_{k,k+i}(A,B)}{\nu_k(A)\,\nu_{k+i}(B)}$

Berger et al (1995) PNAS, 8259
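The pairwise log-odds $P_{k,k+i}(A,B) = \ln[\nu_{k,k+i}(A,B) / (\nu_k(A)\,\nu_{k+i}(B))]$ can be looked up from precomputed frequency tables. In this Python sketch the table layouts, keyed by heptad position (and by separation i for pairs), and the function names are assumptions made for illustration.

```python
import math
from typing import Dict, Tuple

HEPTAD = "abcdefg"


def partner_position(pos_k: str, i: int) -> str:
    """Heptad position of residue k+i, given the heptad position of residue k."""
    return HEPTAD[(HEPTAD.index(pos_k) + i) % 7]


def pair_log_odds(
    nu1: Dict[Tuple[str, str], float],            # (position, residue) -> nu
    nu2: Dict[Tuple[str, int, str, str], float],  # (position of k, i, A, B) -> nu
    pos_k: str, i: int, a: str, b: str,
) -> float:
    """ln[ nu_{k,k+i}(A,B) / (nu_k(A) * nu_{k+i}(B)) ]."""
    pos_ki = partner_position(pos_k, i)
    return math.log(nu2[(pos_k, i, a, b)] / (nu1[(pos_k, a)] * nu1[(pos_ki, b)]))
```

A value of 0 means the pair occurs as often as the product of the single-position frequencies predicts; positive values mark pairs that co-occur more often than that.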


Tabulating Pairwise Correlations

Berger et al (1995) PNAS, 8259


Color Coding Pairwise Correlations

[Figure: pairwise correlations between position k and positions k+1 through k+7, color-coded by separation]

Berger et al (1995) PNAS, 8259


Using Correlation Probabilities to Predict Coiled Coils

$P_k(A) = \dfrac{1}{3} \ln \left[ \dfrac{\nu_{k,k+1}(A,B)\,\nu_{k,k+2}(A,C)\,\nu_{k,k+4}(A,D)}{\nu_{k+1}(B)\,\nu_{k+2}(C)\,\nu_{k+4}(D)} \right]$

Berger et al (1995) PNAS, 8259


Calculating Residue Score

1. Set a 30-residue sliding window that includes residue k

2. For each register, sum the tripartite correlation probabilities of every residue in the window

3. Shift the window and repeat step 2

4. Repeat steps 2 and 3 until all possible windows and registers have been scored

5. The residue score is the highest value over all possible windows and registers
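The five steps can be sketched in Python using the tripartite term $P_k(A) = \frac{1}{3}\ln[\nu_{k,k+1}(A,B)\,\nu_{k,k+2}(A,C)\,\nu_{k,k+4}(A,D) / (\nu_{k+1}(B)\,\nu_{k+2}(C)\,\nu_{k+4}(D))]$, written as a sum of logs. This is a literal reading of the slides, not Paircoil's code: the table layouts, the function names and the choice to score only residues whose k+4 partner still lies inside the window are assumptions for the sketch.

```python
import math
from typing import Dict, List, Tuple

HEPTAD = "abcdefg"
OFFSETS = (1, 2, 4)  # partner separations in the tripartite term

Single = Dict[Tuple[str, str], float]          # (position, residue) -> nu
Pair = Dict[Tuple[str, int, str, str], float]  # (position of k, i, A, B) -> nu


def tripartite(window: str, j: int, register: int, nu1: Single, nu2: Pair) -> float:
    """P_k(A) for the residue at offset j of the window in the given register."""
    pos_k = HEPTAD[(register + j) % 7]
    a = window[j]
    total = 0.0
    for i in OFFSETS:
        b = window[j + i]
        pos_ki = HEPTAD[(register + j + i) % 7]
        total += math.log(nu2[(pos_k, i, a, b)] / nu1[(pos_ki, b)])
    return total / 3.0


def paircoil_scores(seq: str, nu1: Single, nu2: Pair, window: int = 30) -> List[float]:
    """Best summed tripartite score for each residue over all windows and registers."""
    n = len(seq)
    scorable = window - max(OFFSETS)  # residues whose k+4 partner is inside the window
    scores = []
    for k in range(n):
        best = -math.inf  # stays -inf when the sequence is shorter than the window
        for start in range(max(0, k - window + 1), min(k, n - window) + 1):
            win = seq[start:start + window]
            for register in range(7):
                total = sum(tripartite(win, j, register, nu1, nu2) for j in range(scorable))
                best = max(best, total)
        scores.append(best)
    return scores
```

How window edges are handled here (only the first 26 residues of a 30-residue window contribute) is a design choice of the sketch; Paircoil may treat the edges differently.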


Comparison of COILS to PAIRCOIL

PDB-minus database for non-coiled coils

Berger et al (1995) PNAS, 8259


Comparison of COILS to PAIRCOIL

PIR-minus database for non-coiled coils

Berger et al (1995) PNAS, 8259


Using Correlation Score to predict Probability

Berger et al (1995) PNAS, 8259


MultiCoil: A Paircoil Spin-Off

Green: PDB-minus ~39,000 residues

Red: 2-strand DB ~58,200 residues

Blue: 3-strand DB ~6,300 residues

• Distinguishes between globular proteins, two-stranded CCs and three-stranded CCs

Wolf et al (1997) Protein Science, 1179


To use Paircoil

• Paircoil: http://groups.csail.mit.edu/cb/paircoil/cgi-bin/paircoil.cgi

• Multicoil: http://groups.csail.mit.edu/cb/multicoil/cgi-bin/multicoil.cgi

• Paircoil2: http://groups.csail.mit.edu/cb/paircoil2/paircoil2.html


Output Comparisons

Coils Paircoil


Multicoil Output


A Computationally Directed Screen Identifying Interacting CCs in Yeast

– Protein-interaction motifs are often identified computationally, but their potential ligands are not

– A potential ligand for a CC is another CC

Newman et al (2000) PNAS, 13203


Using Multicoil to Identify Potential Pairing Partners

• ~6,000 ORFs in yeast

• ~300 predicted two-stranded CCs

• ~250 predicted three-stranded CCs

• ~1 in 11 yeast proteins potentially has a CC

• ~half of these have no known function

Newman et al (2000) PNAS, 13203
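The ~1:11 ratio follows from the counts on this slide, assuming the two-stranded and three-stranded predictions are disjoint sets of proteins:

```latex
\frac{300 + 250}{6000} = \frac{550}{6000} \approx 0.092 \approx \frac{1}{11}
```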


Yeast Two-Hybrid Assay

GAR-Y

GDBD-X

CC motifs often work well as X and Y because they can often fold autonomously

Fields, Song (1989) Nature, 245


GAR-Y

GDBD-X

162 x 162 = 26244 possible combinations (about half redundant)

Identified 213 interactions
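If an X-Y pairing and the corresponding Y-X pairing count as the same interaction (an assumption about what "redundant" means here), the number of distinct pairs, including each protein paired with itself, is just over half of the full matrix:

```latex
162 \times 162 = 26{,}244, \qquad \frac{162 \times 163}{2} = 13{,}203 \approx 0.503 \times 26{,}244
```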


Blow-Up of Yeast Two-Hybrid Results

GAR-Y

GDBD-X

Newman et al (2000) PNAS, 13203


Limitations to this Method

• Protein-protein interactions may require non-CC contacts in addition to CC contacts
– This study identified only 6 of the 25 known interactions assayed from the yeast genome

• Some CC types are not easily detected with Y2H
– Parallel homodimeric constructs are not easily detected

• False positives are possible but arguably unlikely


Human bZIPs: A Testing Ground for Software That Predicts Protein-Protein Interactions

~53 unique human bZIPs

~1131 unique heterodimers

Newman, Keating (2003) Science, 2097


Current Generation of CC programs

• Aim to predict novel interacting CC partners

• Two notable studies model bZIP interactions with considerable success

1. Weighting potential interstrand interactions using data from many biophysical studies
– Fong JA, Keating AE, Singh M (2004) Genome Biology, 5:R11

2. Physical modeling
– Grigoryan G, Keating AE (2006) JMB, 355, 1125


Coiled Coil Recap

SOCKET: Identifies and assigns coiled coil registers to structures

Coils, Paircoil, Multicoil and their progeny: predict the likelihood that a particular sequence forms a CC

Programs are beginning to predict potential coiled-coil binding partners