10/12/07 BCB 444/544 F07 ISU Dobbs #22 - Secondary & Tertiary Structure Prediction BCB 444/544 Lecture 22 Secondary Structure Prediction Tertiary Structure Prediction #22_Oct10


Page 1: BCB 444/544

10/12/07BCB 444/544 F07 ISU Dobbs #22 - Secondary & Tertiary Structure Prediction

BCB 444/544

Lecture 22

Secondary Structure Prediction

Tertiary Structure Prediction

#22_Oct10

Page 2: BCB 444/544

Required Reading (before lecture)

Mon Oct 8 - Lecture 20
Protein Secondary Structure Prediction
• Chp 14 - pp 200 - 213

Wed Oct 10 - Lecture 21
Protein Tertiary Structure Prediction
• Chp 15 - pp 214 - 230

Thurs Oct 11 & Fri Oct 12 - Lab 7 & Lecture 22
Protein Tertiary Structure Prediction
• Chp 15 - pp 214 - 230

Page 3: BCB 444/544

Assignments & Announcements

ALL: HomeWork #3
√ Due: Mon Oct 8 by 5 PM

• HW544: HW544Extra #1
  √ Due: Task 1.1 - Mon Oct 1 by noon
  Due: Task 1.2 & Task 2 - Fri Oct 12 by 5 PM

• 444 "Project-instead-of-Final" students should also submit HW544Extra #1:
  √ Due: Task 1.1 - Mon Oct 8 by noon
  Due: Task 1.2 - Fri Oct 12 by 5 PM <Task 2 NOT required for BCB444 students>

Page 4: BCB 444/544

New Reading & Homework Assignment

ALL: HomeWork #4 (posted online today)
Due: Fri Oct 19 by 5 PM (one week from today)

Read: Ginalski et al. (2005) Practical Lessons from Protein Structure Prediction, Nucleic Acids Res. 33:1874-91. http://nar.oxfordjournals.org/cgi/content/full/33/6/1874 (PDF posted on website)

• Although somewhat dated, this paper provides a nice overview of protein structure prediction methods and evaluation of predicted structures.

• Your assignment is to write a summary of this paper - for details see HW#4 posted online & sent by email on Fri Oct 12

Page 5: BCB 444/544

Seminars this Week - (yesterday)

BCB List of URLs for Seminars related to Bioinformatics: http://www.bcb.iastate.edu/seminars/index.html

• Oct 11 Thurs
  • Dr. Klaus Schulten (Univ of Illinois) - Baker Center Seminar
    The Computational Microscope
    2:10 PM in E164 Lagomarcino
    http://www.bioinformatics.iastate.edu/seminars/abstracts/2007_2008/Klaus_Schulten_Seminar.pdf
  • Dr. Dan Gusfield (UC Davis) - Computer Science Colloquium
    ReCombinatorics: Combinatorial Algorithms for Studying History of Recombination in Populations
    3:30 PM in Howe Hall Auditorium
    http://www.cs.iastate.edu/~colloq/new/gusfield.shtml

Page 6: BCB 444/544

Seminars this Week - Fri (today)

BCB List of URLs for Seminars related to Bioinformatics: http://www.bcb.iastate.edu/seminars/index.html

• Oct 12 Fri
  • Dr. Edward Yu (Physics/BBMB, ISU) - BCB Faculty Seminar
    TBA: "Structural Biology" (see URL below)
    2:10 PM in 102 Sci
    http://webdev.its.iastate.edu/webnews/data/site_gdcb_dept_seminars/30/webnewsfilefield_abstract/Dr.-Ed-Yu.pdf
  • Dr. Srinivas Aluru (ECpE, ISU) - GDCB Seminar
    Consensus Genetic Maps: A Graph Theoretic Approach
    4:10 PM in 1414 MBB
    http://webdev.its.iastate.edu/webnews/data/site_gdcb_dept_seminars/35/webnewsfilefield_abstract/Dr.-Srinivas-Aluru.pdf

Page 7: BCB 444/544

Chp 12 - Protein Structure Basics

SECTION V STRUCTURAL BIOINFORMATICS

Xiong: Chp 12 Protein Structure Basics

• Amino Acids
• Peptide Bond Formation
• Dihedral Angles
• Hierarchy
• Secondary Structures
• Tertiary Structures

• Determination of Protein 3-Dimensional Structure

• Protein Structure DataBank (PDB)

Page 8: BCB 444/544

Experimental Determination of 3D Structure

2 Major Methods to obtain high-resolution structures

1. X-ray Crystallography (most PDB structures)

2. Nuclear Magnetic Resonance (NMR) Spectroscopy

Note Advantages & Limitations of each method
• (See your lecture notes & textbook)
• For more info: http://en.wikipedia.org/wiki/Protein_structure

3. Other methods (usually lower resolution, at present):
• Electron Paramagnetic Resonance (EPR - also called ESR, EMR)
• Electron microscopy (EM)
• Cryo-EM
• Scanning Probe Microscopies (AFM - Atomic Force Microscopy)
  • http://www.uweb.engr.washington.edu/research/tutorials/SPM.pdf
• Circular Dichroism (CD), several other spectroscopic methods

Page 9: BCB 444/544

"Best" Resolution of Protein Structures

• High-resolution methods
  • X-ray crystallography (< 1 Å)
  • NMR (~1 - 2.5 Å)

• Lower-resolution methods
  • Cryo-EM (~10-15 Å)

• Theoretical Models?
  • Usually low resolution, at present, but
  • Highly variable - & a few are comparable to crystal data

Baker & Sali (2000); Pevsner Fig 9.36

Page 10: BCB 444/544

Chp 13 - Protein Structure Visualization, Comparison & Classification

SECTION V STRUCTURAL BIOINFORMATICS

Xiong: Chp 13

Protein Structure Visualization, Comparison & Classification

• Protein Structural Visualization
• Protein Structure Comparison - later
• Protein Structure Classification

Page 11: BCB 444/544

Protein Structure Classification

• SCOP = Structural Classification of Proteins

Levels reflect both evolutionary and structural relationships

http://scop.mrc-lmb.cam.ac.uk/scop

• CATH = Classification by Class, Architecture, Topology & Homology

http://cathwww.biochem.ucl.ac.uk/latest/

• DALI - (recently moved to EBI & reorganized)

DALI Database (fold classification)

http://ekhidna.biocenter.helsinki.fi/dali/start

Each method has strengths & weaknesses….

Page 12: BCB 444/544

Chp 14 - Secondary Structure Prediction

SECTION V STRUCTURAL BIOINFORMATICS

Xiong: Chp 14

Protein Secondary Structure Prediction

• Secondary Structure Prediction for Globular Proteins

• Secondary Structure Prediction for Transmembrane Proteins

• Coiled-Coil Prediction

Page 13: BCB 444/544

Secondary Structure Prediction

Has become highly accurate in recent years (>85%)

• Usually 3 (or 4) state predictions:

  • H = α-helix
  • E = β-strand
  • C = coil (or loop)
  • (T = turn)

Page 14: BCB 444/544

Secondary Structure Prediction Methods

• 1st Generation methods
  Ab initio - used relatively small dataset of structures available
  Chou-Fasman - based on amino acid propensities (3-state)
  GOR - also propensity-based (4-state)

• 2nd Generation methods
  based on much larger datasets of structures now available
  GOR II, III, IV, SOPM, GOR V, FDM

• 3rd Generation methods
  Homology-based & Neural network based
  PHD, PSIPRED, SSPRO, PROF, HMMSTR, CDM

• Meta-Servers combine several different methods
  Consensus & Ensemble based
  JPRED, PredictProtein, Proteus
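
To make the idea of a propensity-based (1st generation) method concrete, here is a minimal sketch in Python. It is NOT the full Chou-Fasman procedure (which uses nucleation and extension rules): it only averages helix propensities over a sliding window. The propensity values are those commonly tabulated for Chou-Fasman and should be checked against the original table before use; the function names, the window size and the cutoff are assumptions made for illustration.

```python
# Helix propensities P(alpha), as commonly tabulated for the Chou-Fasman
# method. Assumed values for illustration: verify against the original table.
HELIX_PROPENSITY = {
    "E": 1.51, "M": 1.45, "A": 1.42, "L": 1.21, "K": 1.16,
    "F": 1.13, "Q": 1.11, "W": 1.08, "I": 1.08, "V": 1.06,
    "D": 1.01, "H": 1.00, "R": 0.98, "T": 0.83, "S": 0.77,
    "C": 0.70, "Y": 0.69, "N": 0.67, "P": 0.57, "G": 0.57,
}


def mean_helix_propensity(segment):
    """Average helix propensity of a short peptide segment."""
    return sum(HELIX_PROPENSITY[aa] for aa in segment) / len(segment)


def helix_prone_windows(seq, window=6, cutoff=1.03):
    """Start positions of windows whose mean propensity reaches the cutoff.

    window and cutoff are hypothetical defaults chosen for this sketch.
    """
    return [
        start
        for start in range(len(seq) - window + 1)
        if mean_helix_propensity(seq[start:start + window]) >= cutoff
    ]


if __name__ == "__main__":
    print(helix_prone_windows("AEELLKKAEEL"))  # rich in helix formers
    print(helix_prone_windows("GPGSNGPGSN"))   # rich in helix breakers
```

A window rich in A, E, L and K scores well above 1, while a Gly/Pro-rich stretch scores well below it, which is the signal the early propensity methods relied on.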

Page 15: BCB 444/544

Secondary Structure Prediction Servers

Prediction Evaluation?

• Q3 score - % of residues correctly predicted (3-state) in cross-validation experiments
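
The Q3 score can be sketched in a few lines of Python. The two strings below are made-up toy data, not real predictions, and the function name `q3_score` is an assumption for this example; in practice the observed string would come from an experimentally determined structure.

```python
def q3_score(predicted, observed):
    """Percentage of residues whose 3-state label (H, E, C) matches."""
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed must have the same length")
    if not observed:
        raise ValueError("sequences must not be empty")
    matches = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * matches / len(observed)


# Toy example (hypothetical strings, 16 residues, 2 mismatches)
observed = "CCHHHHHHCCEEEECC"
predicted = "CCHHHHHCCCEEECCC"

if __name__ == "__main__":
    print(q3_score(predicted, observed))  # 87.5
```

Note that Q3 counts residues one at a time, so it says nothing about whether whole helices or strands were found; a prediction that gets segment ends slightly wrong is penalized the same as one that misses a short segment entirely.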

Best results? Meta-servers

• http://expasy.org/tools/ (scroll for 2' structure prediction)

• http://www.russell.embl-heidelberg.de/gtsp/secstrucpred.html

• JPred www.compbio.dundee.ac.uk/~www-jpred

• PredictProtein http://www.predictprotein.org/ Rost, Columbia

Best "individual" programs? ??

• CDM http://gor.bb.iastate.edu/cdm/ Sen…Jernigan, ISU

• FDM (not available separately as server) Cheng…Jernigan, ISU

• GOR V http://gor.bb.iastate.edu/ Kloczkowski…Jernigan, ISU

Page 16: BCB 444/544

Consensus Data Mining (CDM)

• Developed by Jernigan Group at ISU

• Basic premise: combination of 2 complementary methods can enhance performance by harnessing distinct advantages of both methods; combines FDM & GOR V:

  • FDM - Fragment Data Mining - exploits availability of sequence-similar fragments in the PDB, which can lead to highly accurate prediction - much better than GOR V - for such fragments, but such fragments are not available for many cases

  • GOR V - Garnier, Osguthorpe, Robson V - predicts secondary structure of less similar fragments with good performance; these are protein fragments for which FDM method cannot find suitable structures

• For references & additional details: http://gor.bb.iastate.edu/cdm/

Page 17: BCB 444/544

Where to Find "Actual" Secondary Structure? In the PDB

Page 18: BCB 444/544

How Does Predicted Secondary Structure Compare?

e.g., from CDM

Query  MAATAAEAVASGSGEPREEAGALGPAWDESQLRSYSFPTRPIPRLSQSDPRAEELIENEE
GOR V  CCCCHHHHHHHHCCHHHHHHCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCHHHHHHCCCC
FDM    CCCCCCCCCCCCCCCCCEECCCCCCCCCHHHCCCCCCEECCCCCCCCCCHHHHHHHHCCC
CDM    CCCCHHHHHHCCCCCCCEECCCCCCCCCHHHCCCCCCEECCCCCCCCCCHHHHHHHHCCC
DSSP   [assignment row not preserved]
Author [assignment row not preserved]

Page 19: BCB 444/544

Secondary Structure Prediction: for Different Types of Proteins/Domains

For Complete proteins:

Globular Proteins - use methods previously described

Transmembrane (TM) Proteins - use special methods

(next slides)

For Structural Domains: many under development:

Coiled-Coil Domains (Protein interaction domains)

Zinc Finger Domains (DNA binding domains),

others…

Page 20: BCB 444/544

SS Prediction for Transmembrane Proteins

Transmembrane (TM) Proteins

• Only a few in the PDB - but ~ 30% of cellular proteins are membrane-associated!

• Hard to determine experimentally, so prediction important

• TM domains are relatively 'easy' to predict!

  Why? constraints due to hydrophobic environment

2 main classes of TM proteins:

  - α-helical

  - β-barrel

Page 21: BCB 444/544

SS Prediction for TM α-Helices

α-Helical TM domains:
• Helices are 17-25 amino acids long (span the membrane)
• Predominantly hydrophobic residues
• Helices oriented perpendicular to membrane
• Orientation can be predicted using "positive inside" rule
  Residues at cytosolic (inside or cytoplasmic) side of TM helix, near hydrophobic anchor are more positively charged than those on lumenal (inside an organelle in eukaryotes) or periplasmic side (space between inner & outer membrane in gram-negative bacteria)
• Alternating polar & hydrophobic residues provide clues to interactions among helices within membrane

Servers?
• TMHMM or HMMTOP - 70% accuracy - confused by hydrophobic signal peptides (short hydrophobic sequences that target proteins to the endoplasmic reticulum, ER)
• Phobius - 94% accuracy - uses distinct HMM models for TM helices & signal peptide sequences
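
The "predominantly hydrophobic, 17-25 residues" property can be illustrated with a simple sliding-window hydropathy scan. This is a minimal sketch only, and it is NOT how the HMM-based servers above work. It uses the Kyte-Doolittle hydropathy scale; the window of 19 residues and the cutoff of 1.6 are conventional choices taken here as assumptions, and the function names and the test sequence are hypothetical.

```python
# Kyte-Doolittle hydropathy scale (higher = more hydrophobic)
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def window_hydropathy(seq, window=19):
    """Mean hydropathy of every window of the given length."""
    scores = [KYTE_DOOLITTLE[aa] for aa in seq]
    return [
        sum(scores[i:i + window]) / window
        for i in range(len(seq) - window + 1)
    ]


def candidate_tm_segments(seq, window=19, threshold=1.6):
    """Merge overlapping high-hydropathy windows into (start, end) spans.

    Spans are 0-based and end-exclusive. They are rough candidates only:
    merged spans can extend past the true ends of a helix.
    """
    segments = []
    for start, mean in enumerate(window_hydropathy(seq, window)):
        if mean < threshold:
            continue
        end = start + window
        if segments and start <= segments[-1][1]:
            segments[-1] = (segments[-1][0], end)  # extend previous span
        else:
            segments.append((start, end))
    return segments


# Hypothetical test sequence: polar flank + 20 hydrophobic residues + polar flank
toy_protein = "DKSEQRNDKE" + "LIVLAILLVIALLIVLAIVL" + "KRDESEQNKD"

if __name__ == "__main__":
    print(candidate_tm_segments(toy_protein))
```

A scan like this cannot tell a TM helix from a hydrophobic signal peptide, which is the confusion noted above for some servers.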

Page 23: BCB 444/544

SS Prediction for TM β-Barrels

β-Barrel TM domains:
• β-strands are amphipathic (partly hydrophobic, partly hydrophilic)
• Strands are 10 - 22 amino acids long
• Every 2nd residue is hydrophobic, facing lipid bilayer
• Other residues are hydrophilic, facing "pore" or opening

Servers? Harder problem, fewer servers…

TBBPred - uses NN or SVM (more on these ML methods later) Accuracy ?
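
The "every 2nd residue is hydrophobic" pattern can be scored directly. The sketch below is an illustration of the pattern only, not the method used by TBBPred; the hydrophobic residue set and the function name are assumptions made for this example.

```python
# Residues treated as hydrophobic in this sketch (an assumed, simplified set)
HYDROPHOBIC = set("AVLIMFWY")


def alternation_score(segment):
    """Fraction of positions consistent with strict alternation.

    Tries both phases (hydrophobic on even or on odd positions) and
    returns the better one. 1.0 means a perfect alternating pattern;
    0.5 is what a uniformly hydrophobic or uniformly polar segment gets.
    """
    if not segment:
        raise ValueError("segment must not be empty")
    best = 0.0
    for phase in (0, 1):
        hits = sum(
            (aa in HYDROPHOBIC) == (i % 2 == phase)
            for i, aa in enumerate(segment)
        )
        best = max(best, hits / len(segment))
    return best


if __name__ == "__main__":
    print(alternation_score("VTVSLKFDIT"))  # perfectly alternating: 1.0
    print(alternation_score("LLLLLLLLLL"))  # uniformly hydrophobic: 0.5
```

Because a β-strand only needs 10 - 22 residues and only half of them are hydrophobic, this signal is much weaker than the long hydrophobic stretch of a TM α-helix, which is one way to see why the barrel problem is harder.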

Page 24: BCB 444/544

Prediction of Coiled-Coil Domains

Coiled-coils
• Superhelical protein motifs or domains, with two or more interacting α-helices that form a "bundle"
• Often mediate inter-protein (& intra-protein) interactions

'Easy' to detect in primary sequence:
• Internal repeat of 7 residues (heptad)
  • 1 & 4 = hydrophobic (facing helical interface)
  • 2,3,5,6,7 = hydrophilic (exposed to solvent)
• Helical wheel representation - can be used to detect these manually, based on amino acid sequence

Servers?

Coils, Multicoil - probability-based methods

2Zip - for Leucine zippers = special type of CC in TFs: characterized by Leu-rich motif: L-X(6)-L-X(6)-L-X(6)-L
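
The Leu-rich motif L-X(6)-L-X(6)-L-X(6)-L translates directly into a regular expression. The sketch below only finds the motif; it is not the 2Zip algorithm, and a motif hit alone does not prove a coiled coil. The function name and the test peptide are hypothetical.

```python
import re

# L-X(6)-L-X(6)-L-X(6)-L, wrapped in a lookahead so that overlapping
# occurrences are all reported.
ZIPPER_MOTIF = re.compile(r"(?=L.{6}L.{6}L.{6}L)")


def leucine_zipper_starts(seq):
    """0-based start positions of the leucine zipper motif in seq."""
    return [match.start() for match in ZIPPER_MOTIF.finditer(seq.upper())]


# Illustrative peptide with Leu at positions 4, 11, 18 and 25 (0-based)
peptide = "RMKQLEDKVEELLSKNYHLENEVARLKKLVGER"

if __name__ == "__main__":
    print(leucine_zipper_starts(peptide))  # [4]
```

The four leucines fall on the same heptad position, so they line up along one face of the helix, which is what a helical wheel drawing would show.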

Page 25: BCB 444/544

Chp 15 - Tertiary Structure Prediction

SECTION V STRUCTURAL BIOINFORMATICS

Xiong: Chp 15

Protein Tertiary Structure Prediction

• Methods
• Homology Modeling
• Threading and Fold Recognition
• Ab Initio Protein Structural Prediction
• CASP

Page 26: BCB 444/544

Structural Genomics - Status & Goal

~ 20,000 "traditional" genes in human genome (recall, this is fewer than earlier estimate of 30,000)

~ 2,000 proteins in a typical cell

> 4.9 million sequences in UniProt (Oct 2007)

> 46,000 protein structures in the PDB (Oct 2007)

Experimental determination of protein structure lags far behind sequence determination!

Goal: Determine structures of "all" protein folds in nature, using combination of experimental structure determination methods (X-ray crystallography, NMR, mass spectrometry) & structure prediction

Page 27: BCB 444/544

Structural Genomics Projects

TargetDB: database of structural genomics targets

http://targetdb.pdb.org

Page 28: BCB 444/544

Protein Sequence & Structure: Analysis

• Diamond STING Millennium - Many useful structure analysis tools, including Protein Dossier http://trantor.bioc.columbia.edu/SMS/

• SwissProt (UniProt) - Protein knowledgebase http://us.expasy.org/sprot

• InterPro - Sequence analysis tools http://www.ebi.ac.uk/interpro

Page 29: BCB 444/544

Protein Structure Prediction or Protein Folding Problem

"Major unsolved problem in molecular biology"

In cells:
• spontaneous
• assisted by enzymes
• assisted by chaperones

In vitro: many proteins can fold to their "native" states spontaneously & without assistance

but, many do not!

Page 30: BCB 444/544

Deciphering the Protein Folding Code

• Protein Structure Prediction or "Protein Folding" Problem

Given the amino acid sequence of a protein, predict its 3-dimensional structure (fold)

• "Inverse Folding" Problem

Given a protein fold, identify every amino acid sequence that can adopt that 3-dimensional structure

Page 31: BCB 444/544

Protein Structure Prediction

Structure is largely determined by sequence

BUT:
• Similar sequences can assume different structures
• Dissimilar sequences can assume similar structures
• Many proteins are multi-functional

2 Major Protein Folding Problems:

1- Determination of folding pathway

2- Prediction of tertiary structure from sequence

Both still largely unsolved problems

Page 32: BCB 444/544

Steps in Protein Folding

1- "Collapse" - driving force is burial of hydrophobic aa's (fast - msecs)

2- Molten globule - helices & sheets form, but "loose" (slow - secs)

3- "Final" native folded state - compaction & rearrangement of some 2' structures

Native state? - assumed to be lowest free energy - may be an ensemble of structures

Page 33: BCB 444/544

Protein Dynamics

• Protein in native state is NOT static

• Function of many proteins requires conformational changes, sometimes large, sometimes small

• Globular proteins are inherently "unstable"

(NOT evolved for maximum stability)

• Energy difference between native and denatured state is very small (5-15 kcal/mol)

(this is equivalent to ~ 2 H-bonds!)

• Folding involves changes in both entropy & enthalpy
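
To get a feel for what a stability of 5-15 kcal/mol means, the sketch below converts a free energy of unfolding into the fraction of molecules that are folded at equilibrium. It assumes a simple two-state (native vs denatured) model, which is a simplification not stated on the slide; the function name and the default temperature are also assumptions.

```python
import math

GAS_CONSTANT = 1.987e-3  # kcal / (mol * K)


def fraction_folded(delta_g_unfold, temp_kelvin=298.15):
    """Equilibrium fraction folded in a two-state model.

    delta_g_unfold is the free energy of unfolding in kcal/mol
    (positive = native state more stable). K = exp(dG / RT) is the
    folded/unfolded ratio, so the folded fraction is K / (1 + K).
    """
    ratio = math.exp(delta_g_unfold / (GAS_CONSTANT * temp_kelvin))
    return ratio / (1.0 + ratio)


if __name__ == "__main__":
    for dg in (0.0, 1.0, 5.0, 15.0):
        print(dg, fraction_folded(dg))
```

Under this model, even the low end of the 5-15 kcal/mol range keeps nearly all molecules folded, yet losing a few kcal/mol (a handful of weak interactions) is enough to shift the balance noticeably, which is why marginal stability matters.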

Page 34: BCB 444/544

Difficulty of Tertiary Structure Prediction

Folding or tertiary structure prediction problem can be formulated as a search for minimum energy conformation

• Search space is defined by psi/phi angles of backbone and side-chain rotamers

• Search space is enormous even for small proteins!

• Number of local minima increases exponentially with number of residues

Computationally it is an exceedingly difficult problem!
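
A back-of-the-envelope count shows how large the search space is. The sketch assumes, purely for illustration, 3 allowed backbone conformations per residue and a sampling rate of 10^13 conformations per second (a Levinthal-style argument); both numbers and the function names are assumptions, not values from the slide. It works in log10 units to avoid overflow for long chains.

```python
import math

SECONDS_PER_YEAR = 365.25 * 24 * 3600


def log10_conformations(n_residues, states_per_residue=3):
    """log10 of the number of backbone conformations (states ** n)."""
    return n_residues * math.log10(states_per_residue)


def log10_search_years(n_residues, states_per_residue=3, samples_per_second=1e13):
    """log10 of the years needed to enumerate every conformation once."""
    return (
        log10_conformations(n_residues, states_per_residue)
        - math.log10(samples_per_second)
        - math.log10(SECONDS_PER_YEAR)
    )


if __name__ == "__main__":
    n = 100  # a small protein
    print(f"conformations ~ 10^{log10_conformations(n):.1f}")
    print(f"exhaustive search ~ 10^{log10_search_years(n):.1f} years")
```

Real proteins fold in milliseconds to seconds, so they clearly do not search exhaustively, and neither can a prediction program: practical methods rely on homology, fold libraries or heuristics to restrict the search.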

Page 35: BCB 444/544

From Thursday's Lab:

• Homology Modeling - using SWISS-MODEL
  • http://swissmodel.expasy.org//SWISS-MODEL.html

• Threading - using 3-D JURY (BioinfoBank, a METAserver)
  • http://meta.bioinfo.pl/submit_wizard.pl

• Be sure to take a look at CASP contest:
  • http://predictioncenter.gc.ucdavis.edu/

• CASP7 contest in 2006
  • http://www.predictioncenter.org/casp7/Casp7.html