View
215
Download
1
Embed Size (px)
Citation preview
Computing for Bioinformatics
Lecture 8: protein folding
Problem description● Definition: Given the amino acid sequence of a protein, what
is the protein's structure in three dimension?
● Importance: The structure of a protein provides a key to understanding its biological function.
● Assumption: The amino acid sequence contains all information about the native 3-D structure.
● Thermodynamic principle: (Christian Anfinsen's denaturation-renaturation experiments on ribonuclease.) If one changes the solvent condition, the protein will undergo a transition from the native state to an unfolded state and become inactive. When solvent condition is changed back, the protein refolds and becomes active again.
Methods of 3D structure determination
● Experimental approaches: expensive, slow ➢ Nuclear magnetic resonance (NMR)➢ X-ray crystallography
● Today we have much more sequenced proteins than protein’s structures. The gap is rapidly increasing.
● Protein structure prediction is becoming increasingly important.
Protein Structure
• Primary structure – sequence of amino acids constituting polypeptide chain
• Secondary structure – local organization of polypeptide chain into secondary structures such as -helices and -sheets
• Tertiary structure –three dimensional arrangements of amino acids as they react to one another due to polarity and interactions between side chains
• Quaternary structure – Interaction of several protein subunits
Amino acids
• Hydrophobic: Glycine(G), Alanine(A), Valine(V), phenylalanine (F), Proline (P), Methionine (M), isoleucine (I), Leucine(L), Tryptophan (W)
• Charged: Aspartic acid (D), Glutamic Acid (E), Lysine (K), Arginine (R), Histidine (H)
• Polar: Serine (S), Theronine (T), Tyrosine (Y); Histidine (H), Cysteine (C), Asparagine (N), Glutamine (Q), Tryptophan (W)
Types of Secondary Structures
● Sheets● -Helices●Loops
– Image source: http://www.ebi.ac.uk/microarray/biology_intro.html
Helix
– Most abundant secondary structure
– 3.6 amino acids per turn
– Hydrogen bond formed between every
fourth reside
– Average length: 10 amino acids, or 3 turns
– Varies from 5 to 40 amino acids
Helix
• Every third amino acid tends to be hydrophobic
• Rich in alanine (A), gutamic acid (E), leucine (L), and methionine (M)
• Poor in proline (P), glycine (G), tyrosine (Y), and serine (S)
Sheet
Image source: http://broccoli.mfn.ki.se/pps_course_96/ss_960723_12.html; http://www4.ocn.ne.jp/~bio/biology/protein.htm
Sheet• Hydrogen bonds between 5-10 consecutive amino acids in one
portion of the chain with another 5-10 farther down the chain
• Interacting regions may be adjacent with a short loop, or far apart with other structures in between
– Directions:• Same: Parallel Sheet• Opposite: Anti-parallel Sheet• Mixed: Mixed Sheet
– Pattern of hydrogen bond formation in parallel and anti-parallel sheets is different
Interactions in Helices and Sheets
Loops
• Regions between helices and sheets
• Various lengths and three-dimensional configurations
• Located on surface of the structure
• More variable sequence structure
• Tend to have charged and polar amino acids
• Frequently a component of active sites
Classes of Protein Structure
The classes are made based on the percentages of secondary structure components.
1) Class :: bundles of -helices connected by loops on surface of proteins
2) Class antiparallel sheets, usually two sheets in close contact forming sandwich
3) Class /: mainly parallel sheets with intervening helices; may also have mixed sheets (metabolic enzymes)
4) Class + mainly segregated -helices and antiparallel sheets
Class Protein (hemoglobin)
Class Protein (T-Cell CD8)
/ Class Protein(tryptohan synthase)
+ Class Protein(1RNB)
Protein structure database
• Databases of three dimensional structures of proteins, where structure has been solved using X-ray or NMR techniques
• Protein Databases:– PDB– SCOP– Swiss-Prot– PIR
• Most extensive for 3-D structure is the Protein Data Bank (PDB).
• Current release of PDB (April 8, 2003) has 20,622 structures
Partial PDB File
ATOM 1 N VAL A 1 6.452 16.459 4.843 7.00 47.38 3HHB 162
ATOM 2 CA VAL A 1 7.060 17.792 4.760 6.00 48.47 3HHB 163
ATOM 3 C VAL A 1 8.561 17.703 5.038 6.00 37.13 3HHB 164
ATOM 4 O VAL A 1 8.992 17.182 6.072 8.00 36.25 3HHB 165
ATOM 5 CB VAL A 1 6.342 18.738 5.727 6.00 55.13 3HHB 166
ATOM 6 CG1 VAL A 1 7.114 20.033 5.993 6.00 54.30 3HHB 167
ATOM 7 CG2 VAL A 1 4.924 19.032 5.232 6.00 64.75 3HHB 168
ATOM 8 N LEU A 2 9.333 18.209 4.095 7.00 30.18 3HHB 169
ATOM 9 CA LEU A 2 10.785 18.159 4.237 6.00 35.60 3HHB 170
ATOM 10 C LEU A 2 11.247 19.305 5.133 6.00 35.47 3HHB 171
ATOM 11 O LEU A 2 11.017 20.477 4.819 8.00 37.64 3HHB 172
ATOM 12 CB LEU A 2 11.451 18.286 2.866 6.00 35.22 3HHB 173
ATOM 13 CG LEU A 2 11.081 17.137 1.927 6.00 31.04 3HHB 174
ATOM 14 CD1 LEU A 2 11.766 17.306 .570 6.00 39.08 3HHB 175
ATOM 15 CD2 LEU A 2 11.427 15.778 2.539 6.00 38.96 3HHB 176
Description of PDB File
• second column: amino acid position in the polypeptide chain
• fourth column: current amino acid
• Columns 7, 8, and 9: x, y, and z coordinates (in angstroms)
• The 11th column: temperature factor -- can be used as a measurement of uncertainty
Visualization of Proteins
• Most popular program for viewing 3-dimensional structures is Rasmol
Rasmol: http://www.umass.edu/microbio/rasmol/
Chime: http://www.umass.edu/microbio/chime/
Cn3D: http://www.ncbi.nlm.nih.gov/Structure/
Mage: http://kinemage.biochem.duke.edu/website/kinhome.html
Swiss 3D viewer: http://www.expasy.ch/spdbv/mainpage.html