Computing for Bioinformatics Lecture 8: protein folding


Page 1: Computing for Bioinformatics Lecture 8: protein folding

Computing for Bioinformatics

Lecture 8: protein folding

Page 2: Computing for Bioinformatics Lecture 8: protein folding

Problem description

● Definition: Given the amino acid sequence of a protein, what is the protein's structure in three dimensions?

● Importance: The structure of a protein provides a key to understanding its biological function.

● Assumption: The amino acid sequence contains all information about the native 3-D structure.

● Thermodynamic principle (Christian Anfinsen's denaturation-renaturation experiments on ribonuclease): if the solvent conditions are changed, the protein undergoes a transition from the native state to an unfolded state and becomes inactive. When the solvent conditions are restored, the protein refolds and becomes active again.

Page 3: Computing for Bioinformatics Lecture 8: protein folding

Methods of 3D structure determination

● Experimental approaches: expensive, slow
    ➢ Nuclear magnetic resonance (NMR)
    ➢ X-ray crystallography

● Today there are far more sequenced proteins than solved protein structures, and the gap is growing rapidly.

● Protein structure prediction is becoming increasingly important.

Page 4: Computing for Bioinformatics Lecture 8: protein folding

Protein Structure

• Primary structure – sequence of amino acids constituting the polypeptide chain

• Secondary structure – local organization of the polypeptide chain into secondary structures such as α-helices and β-sheets

• Tertiary structure – three-dimensional arrangement of the amino acids as they react to one another due to polarity and interactions between side chains

• Quaternary structure – interaction of several protein subunits

Page 5: Computing for Bioinformatics Lecture 8: protein folding

Amino acids

• Hydrophobic: Glycine (G), Alanine (A), Valine (V), Phenylalanine (F), Proline (P), Methionine (M), Isoleucine (I), Leucine (L), Tryptophan (W)

• Charged: Aspartic acid (D), Glutamic acid (E), Lysine (K), Arginine (R), Histidine (H)

• Polar: Serine (S), Threonine (T), Tyrosine (Y), Histidine (H), Cysteine (C), Asparagine (N), Glutamine (Q), Tryptophan (W)
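
The grouping above can be written down as a small lookup table. The following Python sketch is only illustrative: the names `RESIDUE_CLASSES`, `classes_of`, and `class_counts` are made up for this example, and residues that the list places in two groups (histidine and tryptophan) are counted in both.

```python
# One-letter codes grouped exactly as in the list above.
# Assumption: a residue listed in two groups belongs to both.
RESIDUE_CLASSES = {
    "hydrophobic": set("GAVFPMILW"),
    "charged": set("DEKRH"),
    "polar": set("STYHCNQW"),
}


def classes_of(residue):
    """Return the sorted names of every group that contains this residue."""
    code = residue.upper()
    return sorted(name for name, members in RESIDUE_CLASSES.items() if code in members)


def class_counts(sequence):
    """Count residues per group; H and W contribute to two groups each."""
    counts = {name: 0 for name in RESIDUE_CLASSES}
    for code in sequence.upper():
        for name in classes_of(code):
            counts[name] += 1
    return counts


if __name__ == "__main__":
    print(classes_of("H"))
    print(class_counts("VLSPADK"))
```

Because of the two overlapping residues, the per-group counts can add up to more than the sequence length.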

Page 7: Computing for Bioinformatics Lecture 8: protein folding

α Helix

– Most abundant secondary structure

– 3.6 amino acids per turn

– Hydrogen bonds form between residues four positions apart (residue i and residue i+4)

– Average length: 10 amino acids, or 3 turns

– Varies from 5 to 40 amino acids
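
The numbers on this slide can be checked with a few lines of arithmetic. The sketch below is a minimal illustration, not a structural model: the function names are invented here, and the i to i+4 pairing is the usual textbook idealization of the hydrogen-bond pattern. With 3.6 residues per turn, 10 residues give about 2.8 turns, which the slide rounds to 3.

```python
RESIDUES_PER_TURN = 3.6  # value quoted on the slide


def helix_turns(n_residues):
    """Number of turns spanned by a helix of n_residues amino acids."""
    return n_residues / RESIDUES_PER_TURN


def rotation_per_residue_degrees():
    """Angle advanced around the helix axis per residue (360 / 3.6)."""
    return 360.0 / RESIDUES_PER_TURN


def hydrogen_bond_pairs(n_residues):
    """Idealized backbone hydrogen-bond pairs (i, i + 4), numbered from 1."""
    return [(i, i + 4) for i in range(1, n_residues - 3)]


if __name__ == "__main__":
    print(round(helix_turns(10), 2))
    print(hydrogen_bond_pairs(10))
```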

Page 8: Computing for Bioinformatics Lecture 8: protein folding

α Helix

• Every third amino acid tends to be hydrophobic

• Rich in alanine (A), glutamic acid (E), leucine (L), and methionine (M)

• Poor in proline (P), glycine (G), tyrosine (Y), and serine (S)
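
The "rich in" and "poor in" lists suggest a very crude way to score how helix-like a stretch of sequence looks. The sketch below is a toy illustration under that assumption only; the scoring rule and the name `helix_tendency` are invented for this example and are not an established prediction method.

```python
# Residues the slide lists as common and uncommon in helices.
HELIX_RICH = set("AELM")
HELIX_POOR = set("PGYS")


def helix_tendency(segment):
    """Toy score in [-1, 1]: (rich count - poor count) / segment length."""
    if not segment:
        raise ValueError("segment must not be empty")
    codes = segment.upper()
    rich = sum(1 for code in codes if code in HELIX_RICH)
    poor = sum(1 for code in codes if code in HELIX_POOR)
    return (rich - poor) / len(codes)


if __name__ == "__main__":
    print(helix_tendency("AELMK"))
```

A score near 1 means the segment is dominated by residues from the "rich" list, and a score near -1 means it is dominated by the "poor" list; residues in neither list only dilute the score.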

Page 10: Computing for Bioinformatics Lecture 8: protein folding

β Sheet

• Hydrogen bonds between 5-10 consecutive amino acids in one portion of the chain with another 5-10 farther down the chain

• Interacting regions may be adjacent with a short loop, or far apart with other structures in between

– Directions:
    • Same: parallel sheet
    • Opposite: antiparallel sheet
    • Mixed: mixed sheet

– Pattern of hydrogen bond formation in parallel and anti-parallel sheets is different

Page 11: Computing for Bioinformatics Lecture 8: protein folding

Interactions in Helices and Sheets

Page 12: Computing for Bioinformatics Lecture 8: protein folding

Loops

• Regions between helices and sheets

• Various lengths and three-dimensional configurations

• Located on surface of the structure

• More variable in sequence and structure

• Tend to have charged and polar amino acids

• Frequently a component of active sites

Page 13: Computing for Bioinformatics Lecture 8: protein folding

Classes of Protein Structure

The classes are defined by the proportions of secondary structure components.

1) Class α: bundles of α-helices connected by loops on the surface of proteins

2) Class β: antiparallel β-sheets, usually two sheets in close contact forming a sandwich

3) Class α/β: mainly parallel β-sheets with intervening α-helices; may also have mixed β-sheets (metabolic enzymes)

4) Class α+β: mainly segregated α-helices and antiparallel β-sheets
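
As a rough illustration of "percentages of secondary structure components", the sketch below computes those percentages from a per-residue annotation string. Everything here is an assumption made for the example: the one-letter annotation (H for helix, E for strand, anything else for loop) and the function name are not from the lecture. The sketch deliberately stops short of assigning a class, since the slide gives no numeric cut-offs and the α/β versus α+β distinction depends on how helices and sheets are arranged, not only on their amounts.

```python
def secondary_structure_fractions(annotation):
    """Fractions of helix (H), strand (E), and other residues in an annotation string."""
    if not annotation:
        raise ValueError("annotation must not be empty")
    codes = annotation.upper()
    total = len(codes)
    helix = codes.count("H")
    strand = codes.count("E")
    return {
        "helix": helix / total,
        "strand": strand / total,
        "other": (total - helix - strand) / total,
    }


if __name__ == "__main__":
    print(secondary_structure_fractions("HHHHCCEEEE"))
```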

Page 14: Computing for Bioinformatics Lecture 8: protein folding

α Class Protein (hemoglobin)

Page 15: Computing for Bioinformatics Lecture 8: protein folding

β Class Protein (T-Cell CD8)

Page 16: Computing for Bioinformatics Lecture 8: protein folding

α/β Class Protein (tryptophan synthase)

Page 17: Computing for Bioinformatics Lecture 8: protein folding

α+β Class Protein (1RNB)

Page 18: Computing for Bioinformatics Lecture 8: protein folding

Protein structure database

• Databases of three-dimensional structures of proteins, where the structure has been solved using X-ray or NMR techniques

• Protein databases:
    – PDB
    – SCOP
    – Swiss-Prot
    – PIR

• The most extensive database for 3-D structure is the Protein Data Bank (PDB).

• The PDB release of April 8, 2003 contained 20,622 structures

Page 19: Computing for Bioinformatics Lecture 8: protein folding

Partial PDB File

ATOM   1  N    VAL  A  1   6.452  16.459  4.843  7.00  47.38  3HHB  162
ATOM   2  CA   VAL  A  1   7.060  17.792  4.760  6.00  48.47  3HHB  163
ATOM   3  C    VAL  A  1   8.561  17.703  5.038  6.00  37.13  3HHB  164
ATOM   4  O    VAL  A  1   8.992  17.182  6.072  8.00  36.25  3HHB  165
ATOM   5  CB   VAL  A  1   6.342  18.738  5.727  6.00  55.13  3HHB  166
ATOM   6  CG1  VAL  A  1   7.114  20.033  5.993  6.00  54.30  3HHB  167
ATOM   7  CG2  VAL  A  1   4.924  19.032  5.232  6.00  64.75  3HHB  168
ATOM   8  N    LEU  A  2   9.333  18.209  4.095  7.00  30.18  3HHB  169
ATOM   9  CA   LEU  A  2  10.785  18.159  4.237  6.00  35.60  3HHB  170
ATOM  10  C    LEU  A  2  11.247  19.305  5.133  6.00  35.47  3HHB  171
ATOM  11  O    LEU  A  2  11.017  20.477  4.819  8.00  37.64  3HHB  172
ATOM  12  CB   LEU  A  2  11.451  18.286  2.866  6.00  35.22  3HHB  173
ATOM  13  CG   LEU  A  2  11.081  17.137  1.927  6.00  31.04  3HHB  174
ATOM  14  CD1  LEU  A  2  11.766  17.306   .570  6.00  39.08  3HHB  175
ATOM  15  CD2  LEU  A  2  11.427  15.778  2.539  6.00  38.96  3HHB  176

Page 20: Computing for Bioinformatics Lecture 8: protein folding

Description of PDB File

• Second column: atom serial number

• Fourth column: current amino acid

• Sixth column: amino acid position in the polypeptide chain

• Columns 7, 8, and 9: x, y, and z coordinates (in angstroms)

• The 11th column: temperature factor, which can be used as a measure of uncertainty
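
To make the column layout concrete, here is a minimal Python sketch that reads a few of the ATOM records shown earlier by splitting each line on whitespace. It treats the second field as the atom serial number, the fourth as the amino acid, the sixth as the residue position, fields 7 to 9 as coordinates, and field 11 as the temperature factor. The names `Atom`, `parse_atom_line`, and `distance` are invented for this example. Whitespace splitting is an assumption that works for these sample lines; real PDB files are fixed-width, and adjacent fields can run together, so a production parser should slice by column position instead.

```python
import math
from typing import NamedTuple

# Four records copied from the partial PDB file (backbone N and CA of residues 1 and 2).
SAMPLE = """\
ATOM 1 N VAL A 1 6.452 16.459 4.843 7.00 47.38 3HHB 162
ATOM 2 CA VAL A 1 7.060 17.792 4.760 6.00 48.47 3HHB 163
ATOM 8 N LEU A 2 9.333 18.209 4.095 7.00 30.18 3HHB 169
ATOM 9 CA LEU A 2 10.785 18.159 4.237 6.00 35.60 3HHB 170
"""


class Atom(NamedTuple):
    serial: int
    name: str
    residue: str
    chain: str
    residue_number: int
    x: float
    y: float
    z: float
    temperature_factor: float


def parse_atom_line(line):
    """Parse one whitespace-separated ATOM record (assumption: no fused fields)."""
    fields = line.split()
    if len(fields) < 11 or fields[0] != "ATOM":
        raise ValueError("not an ATOM record: " + line)
    return Atom(
        serial=int(fields[1]),
        name=fields[2],
        residue=fields[3],
        chain=fields[4],
        residue_number=int(fields[5]),
        x=float(fields[6]),
        y=float(fields[7]),
        z=float(fields[8]),
        temperature_factor=float(fields[10]),
    )


def distance(a, b):
    """Euclidean distance between two atoms, in angstroms."""
    return math.sqrt((a.x - b.x) ** 2 + (a.y - b.y) ** 2 + (a.z - b.z) ** 2)


if __name__ == "__main__":
    atoms = [parse_atom_line(line) for line in SAMPLE.splitlines()]
    alpha_carbons = [atom for atom in atoms if atom.name == "CA"]
    print(round(distance(alpha_carbons[0], alpha_carbons[1]), 2))
```

The example prints the distance between the α-carbons of the first two residues, a quantity that is easy to sanity-check by hand from the coordinates.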

Page 21: Computing for Bioinformatics Lecture 8: protein folding

Visualization of Proteins

• The most popular program for viewing three-dimensional structures is RasMol

RasMol: http://www.umass.edu/microbio/rasmol/

Chime: http://www.umass.edu/microbio/chime/

Cn3D: http://www.ncbi.nlm.nih.gov/Structure/

Mage: http://kinemage.biochem.duke.edu/website/kinhome.html

Swiss 3D viewer: http://www.expasy.ch/spdbv/mainpage.html