Protein Secondary Structure Prediction


Page 1: Protein Secondary Structure Prediction

Protein Secondary Structure Prediction

Page 2: Protein Secondary Structure Prediction

Protein secondary structure prediction

• Input: protein sequence
• Output: for each residue, its associated secondary structure (SS): alpha-helix, beta-strand, or loop.

Page 3: Protein Secondary Structure Prediction

Servers for SS prediction

• AGADIR - An algorithm to predict the helical content of peptides
• APSSP - Advanced Protein Secondary Structure Prediction Server
• CFSSP - Chou & Fasman Secondary Structure Prediction Server
• GOR - Garnier et al., 1996
• HNN - Hierarchical Neural Network method (Guermeur, 1997)
• HTMSRAP - Helical TransMembrane Segment Rotational Angle Prediction
• Jpred - A consensus method for protein secondary structure prediction at University of Dundee
• JUFO - Protein secondary structure prediction from sequence (neural network)
• NetSurfP - Protein Surface Accessibility and Secondary Structure Predictions
• NetTurnP - Prediction of Beta-turn regions in protein sequences
• nnPredict - University of California at San Francisco (UCSF)
• Porter - University College Dublin
• PredictProtein - PHDsec, PHDacc, PHDhtm, PHDtopology, PHDthreader, MaxHom, EvalSec from Columbia University
• Prof - Cascaded Multiple Classifiers for Secondary Structure Prediction
• PSA - BioMolecular Engineering Research Center (BMERC) / Boston
• PSIpred - Various protein structure prediction methods at Bloomsbury Centre for Bioinformatics
• SOPMA - Geourjon and Deléage, 1995
• Scratch Protein Predictor
• DLP-SVM - Domain linker prediction using SVM at Tokyo University of Agriculture and Technology

Page 4: Protein Secondary Structure Prediction

SS prediction methods

• Most basic idea, probabilities: Chou-Fasman method (1974), accuracy ~50%
• Conditional probabilities: GOR method (1978), accuracy ~60%
• Machine learning techniques: SVM, neural network (2004/5), accuracy ~70%
• Other improvements: environment, solvent accessibility (ongoing), accuracy ~80%

Page 5: Protein Secondary Structure Prediction

Protein secondary structure prediction

[Figure: prediction pipeline. Query → BLASTp against SwissProt → query and subject sequences → MSA (psiBLAST, MaxHom) → machine learning approach, trained on known structures → predicted SS string, e.g. HHHLLLHHHEEE]

Page 6: Protein Secondary Structure Prediction

Evaluating secondary structure prediction methods

• Assume you have a new method for SS prediction.
• Given the following sequence, you get the result:

GLGGYMLGSAMSRPMIHFGNDWEDRYYRENMYRYPNQVYYRPVDQYSNQNNFVHDCVNIT
---EEEEEEE---EEEE-------HHHHHHHH-----EEEE---------EEEEEEEEEE

Coil: - , Beta strand: E , Alpha helix: H

How can you assess how good your result is?
1) Compare it to the TRUTH, assuming this structure exists. (What if it doesn’t?)
2) Calculate the percentage of amino acids whose secondary structure class (helix, coil, or sheet) is correctly predicted (Q3).

Page 7: Protein Secondary Structure Prediction

Original sequence:

GLGGYMLGSAMSRPMIHFGNDWEDRYYRENMYRYPNQVYYRPVDQYSNQNNFVHDCVNIT

Prediction:

---EEEEEEE---EEEE-------HHHHHHHH-----EEEE---------EEEEEEEEEE

Truth (from a PDB file):

-----EE-------------HHHHHHHHHH--------EE--------HHHHHHH-----

Evaluating secondary structure prediction methods

Page 8: Protein Secondary Structure Prediction

Original sequence:
GLGGYMLGSAMSRPMIHFGNDWEDRYYRENMYRYPNQVYYRPVDQYSNQNNFVHDCVNIT
Prediction:
---EEEEEEE---EEEE-------HHHHHHHH-----EEEE---------EEEEEEEEEE
Truth (from a PDB file):
-----EE-------------HHHHHHHHHH--------EE--------HHHHHHH-----
Correctly predicted (Y) or not (N):
YYYNNYYNNNYYYNNNNYYYNNNNYYYYYYNNYYYYYNYYNYYYYYYYNNNNNNNNNNNN

Evaluating secondary structure prediction methods

What can be the problem with such calculation?

• Overall, the strings shown contain 60 residues (AA).
• The number of correctly predicted residues (Y) is 31.
• So the Q3 score of this method is 31/60 ≈ 51.67%.
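The Q3 calculation can be checked with a few lines of Python. This is a minimal sketch that is not part of the original slides: the function names `match_string` and `q3` are illustrative, and the two strings are copied from the slide as printed (60 characters each), so the script counts over 60 residues.

```python
# Strings copied from the slide; '-' = coil, 'E' = beta strand, 'H' = alpha helix.
PREDICTION = "---EEEEEEE---EEEE-------HHHHHHHH-----EEEE---------EEEEEEEEEE"
TRUTH = "-----EE-------------HHHHHHHHHH--------EE--------HHHHHHH-----"


def match_string(pred: str, truth: str) -> str:
    """Return 'Y' where prediction and truth agree, 'N' elsewhere."""
    if len(pred) != len(truth):
        raise ValueError("prediction and truth must have the same length")
    return "".join("Y" if p == t else "N" for p, t in zip(pred, truth))


def q3(pred: str, truth: str) -> float:
    """Fraction of residues whose three-state class is predicted correctly."""
    marks = match_string(pred, truth)
    if not marks:
        raise ValueError("cannot score an empty sequence")
    return marks.count("Y") / len(marks)


print(match_string(PREDICTION, TRUTH))
print(f"{len(TRUTH)} residues, Q3 = {q3(PREDICTION, TRUTH):.2%}")
```

Because Q3 only counts matching positions, it treats all three classes alike and says nothing about which class the errors fall in.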

Page 9: Protein Secondary Structure Prediction

Evaluating secondary structure prediction methods

What can be the problem with such calculation?

• Assume that alpha helix is the SS of 60% of the residues.
• Then predicting alpha helix for every residue would yield a Q3 of 60%.
• Q3 therefore rewards over-prediction of the more common secondary structure classes in the database.
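The same effect can be seen on the example from the earlier slides. The sketch below is an illustration rather than part of the original deck: `constant_prediction` is a hypothetical helper that looks at the truth string, picks its most frequent class, and predicts that class everywhere. In this particular truth string the most common class is coil rather than helix, but the argument is unchanged.

```python
from collections import Counter

PREDICTION = "---EEEEEEE---EEEE-------HHHHHHHH-----EEEE---------EEEEEEEEEE"
TRUTH = "-----EE-------------HHHHHHHHHH--------EE--------HHHHHHH-----"


def q3(pred: str, truth: str) -> float:
    """Fraction of positions where prediction and truth agree."""
    if len(pred) != len(truth) or not truth:
        raise ValueError("need two non-empty strings of equal length")
    return sum(p == t for p, t in zip(pred, truth)) / len(truth)


def constant_prediction(truth: str) -> str:
    """Predict the single most frequent observed class for every residue."""
    majority_class, _ = Counter(truth).most_common(1)[0]
    return majority_class * len(truth)


baseline = constant_prediction(TRUTH)
print(f"method:   Q3 = {q3(PREDICTION, TRUTH):.2%}")
print(f"constant: Q3 = {q3(baseline, TRUTH):.2%} (always '{baseline[0]}')")
```

Here the constant predictor scores higher than the method under evaluation, even though it carries no structural information at all.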

Page 10: Protein Secondary Structure Prediction

• There are other ways to measure correlation between the result and the ‘truth’.
• Most of them rely on ratios between four counts:
1. True positive (TP) = correctly identified
2. True negative (TN) = correctly rejected
3. False positive (FP) = incorrectly identified
4. False negative (FN) = incorrectly rejected

Evaluating secondary structure prediction methods

Page 11: Protein Secondary Structure Prediction

• For instance, for the α-helix:
  – TP: number of α-helix residues that are correctly predicted.
  – TN: number of residues observed in β-strands and loops that are not predicted as α-helix.
  – FP: number of residues incorrectly predicted in α-helix conformation.
  – FN: number of residues observed in α-helices but predicted to be either in β-strands or loops.

Evaluating secondary structure prediction methods
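The four counts can be computed for any one class by treating that class as "positive" and the other two as "negative". The sketch below is illustrative only: `Counts` and `confusion_counts` are hypothetical names, and the two short strings are made-up toy data, not taken from the slides.

```python
from typing import NamedTuple


class Counts(NamedTuple):
    tp: int  # observed in the class and predicted in the class
    fp: int  # predicted in the class but observed elsewhere
    fn: int  # observed in the class but predicted elsewhere
    tn: int  # neither observed nor predicted in the class


def confusion_counts(pred: str, truth: str, label: str) -> Counts:
    """Count TP, FP, FN and TN for one class, e.g. label='H' for alpha helix."""
    if len(pred) != len(truth):
        raise ValueError("prediction and truth must have the same length")
    tp = fp = fn = tn = 0
    for p, t in zip(pred, truth):
        if p == label and t == label:
            tp += 1
        elif p == label:
            fp += 1
        elif t == label:
            fn += 1
        else:
            tn += 1
    return Counts(tp, fp, fn, tn)


# Toy strings (hypothetical): '-' = coil, 'E' = beta strand, 'H' = alpha helix.
toy_pred = "HHH--EE-"
toy_truth = "-HHH-EEE"
print("helix: ", confusion_counts(toy_pred, toy_truth, "H"))
print("strand:", confusion_counts(toy_pred, toy_truth, "E"))
```

The four counts always sum to the sequence length, which is a quick sanity check on any hand calculation.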

Page 12: Protein Secondary Structure Prediction

• Sensitivity and specificity are statistical measures of the performance of a binary classification test.

• Sensitivity measures the proportion of actual positives which are correctly identified as such (e.g. the percentage of sick people who are correctly identified as having the condition).

• Specificity measures the proportion of negatives which are correctly identified (e.g. the percentage of healthy people who are correctly identified as not having the condition).

Sensitivity and specificity

Page 13: Protein Secondary Structure Prediction

• Question:
  – If the predictor perfectly predicts the truth, what would be the sensitivity rate? The specificity rate?
• Answer:
  – A perfect predictor would be described as ______% sensitivity (i.e. it predicts everyone from the sick group as sick) and ______% specificity (i.e. it does not predict anyone from the healthy group as sick).

Sensitivity and specificity

Page 14: Protein Secondary Structure Prediction

• For any test, there is usually a trade-off between the measures.

• For example: in an airport security setting in which one is testing for potential threats to safety, scanners may be set to trigger on low-risk items like belt buckles and keys (low specificity), in order to reduce the risk of missing objects that do pose a threat to the aircraft and those aboard (high sensitivity).

Sensitivity and specificity
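The trade-off can be made concrete by sweeping an alarm threshold over a set of scored items. Everything in this sketch is hypothetical: the scores, the labels, and the thresholds are invented for illustration and do not come from any real scanner.

```python
# Hypothetical scanner scores (higher = more suspicious) and true labels.
SCORES = [0.95, 0.80, 0.70, 0.55, 0.40, 0.30, 0.20, 0.10]
IS_THREAT = [True, True, False, True, False, False, False, False]


def rates(threshold: float) -> tuple[float, float]:
    """Return (sensitivity, specificity) when items scoring >= threshold are flagged."""
    flagged = [score >= threshold for score in SCORES]
    tp = sum(f and y for f, y in zip(flagged, IS_THREAT))
    fn = sum((not f) and y for f, y in zip(flagged, IS_THREAT))
    fp = sum(f and (not y) for f, y in zip(flagged, IS_THREAT))
    tn = sum((not f) and (not y) for f, y in zip(flagged, IS_THREAT))
    return tp / (tp + fn), tn / (tn + fp)


# Lowering the threshold flags more items.
for threshold in (0.9, 0.6, 0.25):
    sens, spec = rates(threshold)
    print(f"threshold {threshold:.2f}: sensitivity {sens:.2f}, specificity {spec:.2f}")
```

With this toy data, lowering the threshold catches every threat but at the cost of flagging more harmless items, which mirrors the belt-buckle example.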

Page 15: Protein Secondary Structure Prediction

Sensitivity and specificity

$$ \text{Sensitivity} = \frac{TP}{TP + FN} $$

$$ \text{Specificity} = \frac{TN}{TN + FP} $$
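As a sketch of how the two ratios behave, the code below applies them to the beta-strand class (E) of the example prediction from the earlier slides; the alpha-helix case is left for the exercise. The function names are illustrative, and returning NaN for an empty denominator is a design choice made here, not something the slides specify.

```python
import math

PREDICTION = "---EEEEEEE---EEEE-------HHHHHHHH-----EEEE---------EEEEEEEEEE"
TRUTH = "-----EE-------------HHHHHHHHHH--------EE--------HHHHHHH-----"


def sensitivity(tp: int, fn: int) -> float:
    """TP / (TP + FN); NaN when the class never occurs in the truth."""
    return tp / (tp + fn) if tp + fn else math.nan


def specificity(tn: int, fp: int) -> float:
    """TN / (TN + FP); NaN when every residue belongs to the class."""
    return tn / (tn + fp) if tn + fp else math.nan


# Treat beta strand ('E') as the positive class.
pairs = list(zip(PREDICTION, TRUTH))
tp = sum(p == "E" and t == "E" for p, t in pairs)
fp = sum(p == "E" and t != "E" for p, t in pairs)
fn = sum(p != "E" and t == "E" for p, t in pairs)
tn = sum(p != "E" and t != "E" for p, t in pairs)

print(f"TP={tp} FP={fp} FN={fn} TN={tn}")
print(f"sensitivity = {sensitivity(tp, fn):.2%}")
print(f"specificity = {specificity(tn, fp):.2%}")
```

The prediction over-calls beta strand, so it finds every true strand residue (high sensitivity) while mislabeling many coil and helix residues (lower specificity).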

Page 16: Protein Secondary Structure Prediction

Exercise

Calculate the specificity and sensitivity of the alpha helix prediction in the following SS prediction:

Original sequence:

GLGGYMLGSAMSRPMIHFGNDWEDRYYRENMYRYPNQVYYRPVDQYSNQNNFVHDCVNIT

Prediction:

---EEEEEEE---EEEE-------HHHHHHHH-----EEEE---------EEEEEEEEEE

Truth (from a PDB file):

-----EE-------------HHHHHHHHHH--------EE--------HHHHHHH-----

Page 17: Protein Secondary Structure Prediction

Answer

Prediction:
---EEEEEEE---EEEE-------HHHHHHHH-----EEEE---------EEEEEEEEEE
Truth (from a PDB file):
-----EE-------------HHHHHHHHHH--------EE--------HHHHHHH-----

Alpha helix:
– TP = 6 (alpha-helix residues correctly identified)
– FP = 2 (residues incorrectly identified as alpha helix)
– FN = 4 + 7 = 11 (alpha-helix residues incorrectly rejected)
– TN = 60 - (6 + 2 + 11) = 41

$$ \text{Sensitivity} = \frac{TP}{TP + FN} = \frac{6}{6 + 11} \approx 35.29\% $$

$$ \text{Specificity} = \frac{TN}{TN + FP} = \frac{41}{41 + 2} \approx 95.35\% $$

Page 18: Protein Secondary Structure Prediction

Jpred 3 – SS prediction server

Page 19: Protein Secondary Structure Prediction

[Figure: Jpred 3 output, with labels marking the MSA, the buried/exposed prediction, the reliability score, and the final SS prediction]

Page 20: Protein Secondary Structure Prediction

Original sequence:

GLGGYMLGSAMSRPMIHFGNDWEDRYYRENMYRYPNQVYYRPVDQYSNQNNFVHDCVNIT

Jpred Prediction + reliability:

-----HHHH------------HHHHHHHHHHH-------------------EEE------
997500000026777567776017899988721577400467777777773000000699

Truth (from a PDB file):

-----EE-------------HHHHHHHHHH--------EE--------HHHHHHH-----

Jpred 3 – SS prediction server
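One way to use the reliability line is to score the prediction only where the reported reliability is high. The sketch below assumes that each digit is a per-residue score from 0 (low) to 9 (high); the function name `accuracy_at_reliability` and the cut-offs 5 and 7 are arbitrary choices for illustration, not Jpred settings.

```python
JPRED = "-----HHHH------------HHHHHHHHHHH-------------------EEE------"
RELIABILITY = "997500000026777567776017899988721577400467777777773000000699"
TRUTH = "-----EE-------------HHHHHHHHHH--------EE--------HHHHHHH-----"


def accuracy_at_reliability(
    pred: str, truth: str, reliability: str, min_score: int
) -> tuple[float, float]:
    """Return (accuracy, coverage) over residues with reliability >= min_score.

    Coverage is the fraction of residues that pass the cut-off.
    """
    if not (len(pred) == len(truth) == len(reliability)):
        raise ValueError("all three strings must have the same length")
    kept = [
        (p, t)
        for p, t, r in zip(pred, truth, reliability)
        if int(r) >= min_score
    ]
    if not kept:
        return float("nan"), 0.0
    accuracy = sum(p == t for p, t in kept) / len(kept)
    return accuracy, len(kept) / len(pred)


for min_score in (0, 5, 7):
    acc, cov = accuracy_at_reliability(JPRED, TRUTH, RELIABILITY, min_score)
    print(f"reliability >= {min_score}: accuracy {acc:.2%} on {cov:.0%} of residues")
```

A cut-off of 0 keeps every residue, so the first line of output is the ordinary Q3; the later lines show whether the score is a useful guide to where the prediction can be trusted.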