PREDICTING PROTEIN SECONDARY STRUCTURE USING ARTIFICIAL NEURAL NETWORKS

Sudhakar Reddy, Patrick Shih, Chrissy Oriol, Lydia Shih


Sudhakar Reddy

Proteins and Secondary Structure

Project Goals

To predict the secondary structure of a protein using artificial neural networks.

STRUCTURES

Primary structure: the linear sequence of amino acid (a.a.) residues that make up the polypeptide chain.

SECONDARY STRUCTURE

Secondary structure is the local organization of parts of a polypeptide chain, stabilized by hydrogen bonds between different residues.

Without any stabilizing interactions, a polypeptide assumes a random-coil structure.

When stabilizing hydrogen bonds form, the polypeptide backbone folds periodically into one of two regular geometric arrangements, the alpha helix or the beta sheet; compact U-turns connect these elements:

ALPHA HELIX, BETA SHEET, U-TURNS

ALPHA HELIX

The polypeptide backbone is folded into a spiral that is held in place by hydrogen bonds between backbone oxygen atoms and hydrogen atoms.

The carbonyl oxygen of each peptide bond is hydrogen-bonded to the amide hydrogen of the amino acid four residues toward the C-terminus.

An alpha helix has 3.6 residues per turn.

The side chains point outward from the backbone.

The hydrophobic or hydrophilic character of the helix is determined entirely by its side chains, because the polar groups of the peptide backbone are already engaged in hydrogen bonding within the helix and so cannot affect its hydrophobicity or hydrophilicity.
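The two numbers above fix the idealized helix geometry: with 3.6 residues per turn, consecutive residues are rotated 360/3.6 = 100 degrees about the helix axis, and residue i is hydrogen-bonded to residue i + 4. A minimal Python sketch, assuming an ideal helix and 0-based residue indices; `helix_hbond_pairs` and `wheel_angles` are illustrative names, not part of any tool:

```python
RESIDUES_PER_TURN = 3.6
DEGREES_PER_RESIDUE = 360.0 / RESIDUES_PER_TURN  # about 100 degrees between neighbours


def helix_hbond_pairs(n_residues):
    """(i, i + 4) pairs: carbonyl O of residue i bonds the amide H of residue i + 4."""
    return [(i, i + 4) for i in range(n_residues - 4)]


def wheel_angles(n_residues):
    """Angle of each residue around the helix axis, in degrees (0 <= angle < 360)."""
    return [(i * DEGREES_PER_RESIDUE) % 360.0 for i in range(n_residues)]


print(helix_hbond_pairs(8))  # [(0, 4), (1, 5), (2, 6), (3, 7)]
```

On this wheel, residues 0, 3, 4 and 7 fall at 0, 300, 40 and 340 degrees, all within 60 degrees of each other, so side chains three or four positions apart line up on the same face. That is how a helix can present one hydrophobic and one hydrophilic face.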


THE BETA SHEET

A beta sheet consists of laterally packed beta strands.

Each beta strand is a short (5-8 residues), nearly fully extended polypeptide chain.

Hydrogen bonding between backbone atoms in adjacent beta strands, within either the same or different polypeptide chains, forms a beta sheet.

The strands can be oriented parallel or antiparallel. In both arrangements, the side chains project from both faces of the sheet.


TURNS

Turns are compact, U-shaped secondary structures of 3-4 residues, stabilized by H-bonds between their end residues.

They are located on the surface of the protein, forming a sharp bend that redirects the polypeptide backbone back toward the interior.

Glycine and proline are commonly present in turns. Without these turns, a protein would be large, extended and loosely packed.


MOTIFS

Motifs are regular combinations of secondary structures, for example:

– Coiled-coil motif

– Helix-loop-helix (Ca2+-binding)

– Zinc-finger motif

COILED-COIL MOTIF

HELIX-LOOP-HELIX (Ca2+)

ZINC-FINGER MOTIF

FUTURE

Identifying protein structure is key to understanding biological function and its role in health and disease.

Characterizing a protein's structure helps in developing new agents and devices to treat disease.

The challenge of unraveling structure lies in developing methods that accurately and reliably capture the relationship between sequence and structure.

Most currently known protein structures have been characterized by NMR and X-ray diffraction.

The revolution in sequencing has produced rapidly growing sequence databases, but only 3000 structures are known.

Because relatively few protein conformations are possible, and structure and sequence are directly related, we can unravel secondary structure by developing an efficient algorithm that compares new sequences with the ones already available, and use the results in the health-care industry.

WHY SECONDARY STRUCTURE?

ADVANTAGE

Prediction of secondary structure is an essential intermediate step on the way to predicting the full 3-D structure of a protein.

If the secondary structure of a protein is known, it is possible to derive a comparatively small number of possible tertiary structures using knowledge about the ways that secondary structural elements pack.

Artificial Neural Network (ANN)

Peichung Shih

Biological Neural Network

Artificial Neural Network

X1k : Input from X1

X2k : Input from X2

W1k : Weight of X1

W2k : Weight of X2

X0k : Bias term

W0k : Weight of bias term

θ : Threshold

F : Nonlinear function

qk : Output of node k

Node k sums its weighted inputs, including the bias term. If the sum exceeds the threshold, the node outputs the nonlinear function of the sum; otherwise it exits with 0:

$$\text{net}_k = \sum_{i=0}^{2} X_{ik} W_{ik}, \qquad q_k = \begin{cases} F(\text{net}_k) & \text{if } \text{net}_k > \theta \\ 0 & \text{otherwise} \end{cases}$$

$$F(x) = (1 + e^{-x})^{-1}$$

Artificial Neural Network - Example

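Inputs, weights and threshold:

X0 = 1 (bias), W0 = 2

X1 = 1, W1 = 1

X2 = 2, W2 = 2

θ = 6

$$\sum_{i=0}^{2} X_i W_i = 1 \cdot 2 + 1 \cdot 1 + 2 \cdot 2 = 7 > \theta$$

so the node fires and outputs

$$F(7) = \frac{1}{1 + e^{-7}} = 0.9991$$

with F(x) = (1 + e^{-x})^{-1}.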
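The threshold node can be written in a few lines of Python. This is an illustration, not the presenters' code: `logistic` and `node_output` are hypothetical names, reading the slide's `exit(0)` as an output of 0 is an assumption, and the example uses X = (1, 1, 2), W = (2, 1, 2) and a threshold of 6 as read from the example slide.

```python
import math


def logistic(x):
    """F(x) = 1 / (1 + e^-x), the nonlinear function on the slide."""
    return 1.0 / (1.0 + math.exp(-x))


def node_output(inputs, weights, threshold):
    """Return F(net) if net > threshold, else 0.

    inputs[0] is the bias term X0 and weights[0] its weight W0.
    """
    net = sum(x * w for x, w in zip(inputs, weights))
    if net > threshold:
        return logistic(net)
    return 0.0  # assumption: the slide's "exit(0)" is read as an output of 0


# Worked example: net = 1*2 + 1*1 + 2*2 = 7, which exceeds the threshold 6.
print(round(node_output([1, 1, 2], [2, 1, 2], 6), 4))  # 0.9991
```

Keeping the bias as input 0 lets the same weighted sum handle the bias term and the ordinary inputs.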

Paradigms of ANN - Overview

Networks are classified by topology (feedback or feedforward) and by learning (unsupervised or supervised):

Unsupervised, feedback: Binary Adaptive Resonance Theory (ART1), Analog Adaptive Resonance Theory (ART2)

Unsupervised, feedforward: Fuzzy Associative Memory (FAM), Learning Vector Quantization (LVQ)

Supervised, feedback: Brain-State-in-a-Box (BSB), Fuzzy Cognitive Map (FCM)

Supervised, feedforward: Perceptron, Adaline & Madaline, Backpropagation (BP)


PERCEPTRON

One of the earliest learning networks, the perceptron, was proposed by Rosenblatt in the late 1950s.

RULE:

$$\text{net} = w_1 I_1 + w_2 I_2$$

If net > θ, then output O = 1; otherwise O = 0.

[Figure: perceptron model]

Perceptron Example : AND Operation

Learning rule: if the output is correct, the weights are left unchanged (W = W). If T = 1 but O = 0, each weight is increased by its input (W = W + I) and the threshold is lowered by 1; if T = 0 but O = 1, each weight is decreased by its input (W = W - I) and the threshold is raised by 1.

Initial network: w1 = -0.5, w2 = 0.5, θ = 1.5

Training trace (w1, w2 and θ are the values before each step; T is the target, I1 AND I2):

Step  I1  I2  T  w1    w2   θ    net  O  Update
1     1   1   1  -0.5  0.5  1.5  0.0  0  W = W + I, θ = 0.5
2     1   0   0  0.5   1.5  0.5  0.5  0  none
3     0   1   0  0.5   1.5  0.5  1.5  1  W = W - I, θ = 1.5
4     0   0   0  0.5   0.5  1.5  0.0  0  none
5     1   1   1  0.5   0.5  1.5  1.0  0  W = W + I, θ = 0.5
6     1   0   0  1.5   1.5  0.5  1.5  1  W = W - I, θ = 1.5
7     0   1   0  0.5   1.5  1.5  1.5  0  none
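The trace can be replayed in Python. A minimal sketch, assuming the learning rule implied by the slide's numbers (when T = 1 but O = 0, each weight grows by its input and the threshold drops by 1; the reverse when T = 0 but O = 1) and the slide's pattern order (1,1), (1,0), (0,1), (0,0). `perceptron_output` and `train_and` are illustrative names.

```python
def perceptron_output(w, theta, inputs):
    """O = 1 if net = w1*I1 + w2*I2 exceeds the threshold, else 0."""
    net = sum(wi * xi for wi, xi in zip(w, inputs))
    return 1 if net > theta else 0


def train_and(w, theta, max_epochs=10):
    """Train a two-input perceptron on AND; return (weights, threshold, epochs used)."""
    samples = [((1, 1), 1), ((1, 0), 0), ((0, 1), 0), ((0, 0), 0)]
    w = list(w)
    for epoch in range(1, max_epochs + 1):
        errors = 0
        for inputs, target in samples:
            if perceptron_output(w, theta, inputs) == target:
                continue  # correct output: W = W
            errors += 1
            sign = 1 if target == 1 else -1  # T=1,O=0 -> raise; T=0,O=1 -> lower
            w = [wi + sign * xi for wi, xi in zip(w, inputs)]
            theta -= sign  # threshold moves opposite to the weights
        if errors == 0:
            return w, theta, epoch
    return w, theta, max_epochs


print(train_and([-0.5, 0.5], 1.5))  # ([0.5, 1.5], 1.5, 3)
```

The first two epochs reproduce the seven steps traced on the slides; the third epoch makes no errors, so training stops with w1 = 0.5, w2 = 1.5 and θ = 1.5.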

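Hidden Layer

[Figure: the inputs (0, 0), (0, 1), (1, 0) and (1, 1) plotted for AND, OR and XOR. For AND and OR a single straight line separates the inputs with output 1 from those with output 0; for XOR no such line exists, so a single-layer perceptron cannot learn XOR and a hidden layer is needed.]

XOR:

Input I1  Input I2  Target
1         1         0
1         0         1
0         1         1
0         0         0

A network with one hidden node computes XOR: the hidden node receives I1 and I2 with weights 1 and 1 and has threshold 1.5, so it fires only for (1, 1); the output node receives I1 and I2 with weights 1 and 1 and the hidden node with weight -2, and has threshold 0.5.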
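The hidden-layer solution can be checked directly in Python. The weights and thresholds below (hidden node: weights 1 and 1, threshold 1.5; output node: weights 1 and 1 from the inputs, -2 from the hidden node, threshold 0.5) are read from the extracted slide numbers, so treat them as a reconstruction; `step` and `xor_net` are illustrative names.

```python
def step(net, threshold):
    """Threshold unit: 1 if the net input exceeds the threshold, else 0."""
    return 1 if net > threshold else 0


def xor_net(i1, i2):
    hidden = step(1 * i1 + 1 * i2, 1.5)             # fires only for (1, 1), i.e. AND
    return step(1 * i1 + 1 * i2 - 2 * hidden, 0.5)  # OR of the inputs, vetoed by the hidden node


for i1, i2 in [(1, 1), (1, 0), (0, 1), (0, 0)]:
    print(i1, i2, xor_net(i1, i2))
```

The output node on its own behaves like OR; the inhibitory -2 link from the hidden AND unit cancels the (1, 1) case, which is exactly the pattern no single line can separate.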

How Many Hidden Nodes?

We have indicated the number of layers needed, but not the optimal number of nodes per layer. There is no formal method to determine this optimal number; typically, one uses trial and error.

Hidden units   Q3 (%)
0              62.50
5              61.60
10             61.50
15             62.60
20             62.30
30             62.50
40             62.70
60             61.40

CHRISSY ORIOL

JNET AND JPRED

•Multiple alignment

•Neural network

•Consensus of methods

JNET

TRAINING AND TESTS

• 480 proteins for training (1996 PDB)

• 406 proteins for testing (2000 PDB)

Blind test

7-fold cross-validation test
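The slides do not say how the 480 training proteins were divided into folds. The Python sketch below shows the general mechanics of 7-fold cross-validation under an assumed round-robin split, with placeholder protein IDs. A real protein benchmark would also keep homologous chains out of different folds. `k_fold_splits` is an illustrative name.

```python
def k_fold_splits(items, k=7):
    """Yield (train, test) lists for k-fold cross-validation.

    Items are dealt round-robin into k folds (an assumption of this sketch);
    each fold serves as the test set exactly once.
    """
    folds = [items[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [x for j, fold in enumerate(folds) if j != i for x in fold]
        yield train, test


proteins = [f"protein_{n}" for n in range(480)]  # placeholder IDs, not real PDB codes
for train, test in k_fold_splits(proteins):
    print(len(train), len(test))
```

Because 480 = 7 x 68 + 4, four folds hold 69 proteins and three hold 68. Every protein is tested exactly once, by a network that never saw it during training.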

MULTIPLE ALIGNMENTS

• A multiple sequence alignment is constructed

• Profiles are generated from it:

o Frequency profile: the count of each residue divided by the total number of residues in the column, expressed as a percentage

o BLOSUM62 profile: each residue is scored by its BLOSUM62 value, and the scores are averaged over the number of sequences in that column

o Profile HMM generated by HMMER2

o PSI-BLAST (Position-Specific Iterated Basic Local Alignment Search Tool): residue frequencies and the PSSM (Position-Specific Scoring Matrix)
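The frequency profile can be sketched in a few lines of Python. This assumes the aligned sequences have equal length and that gap characters ('-' and '.') are left out of the column total. The slides do not say how Jnet treats gaps. `frequency_profile` is an illustrative name, and the toy input takes the first four columns of four sequences from the Jnet alignment output in this section.

```python
from collections import Counter


def frequency_profile(alignment):
    """Per-column residue percentages for equal-length aligned sequences.

    Gap characters ('-' and '.') are excluded from the column total
    (an assumption of this sketch).
    """
    profile = []
    for col in range(len(alignment[0])):
        residues = [seq[col] for seq in alignment if seq[col] not in "-."]
        counts = Counter(residues)
        total = len(residues)
        profile.append({aa: 100.0 * n / total for aa, n in counts.items()} if total else {})
    return profile


# First four columns of YourSeq, YA60_PYRHO, TF19_HUMAN and YK68_ARCFU.
aln = ["MRQQ", "ERAL", "..KH", "MRRQ"]
for col, freqs in enumerate(frequency_profile(aln)):
    print(col, freqs)
```

Each column becomes a 20-way distribution (only residues actually seen are listed). A network input is one such column per residue position.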

ALIGNMENTS

HMM PROFILE

• Uses:

Statistical descriptions of a sequence family's consensus

Position-specific scores for residues, insertions and deletions

• Profiles capture important information about:

The degree of conservation at different positions

The varying degree to which gaps, insertions and deletions are permitted

Align [a] and [b].

Remove the gaps in [a], and the columns below those gaps, to form a restrained profile that better represents sequence [a].

Align [c] to the profile of [a] and [b].

Iterate the addition of each sequence from the PSI-BLAST search until all are aligned.

The alignment profile is based on the query sequence to be predicted.

Full-length sequences from the initial PSI-BLAST search are extracted from the database and ordered by p-value.
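The "remove the gaps in [a] and the columns below them" step has a simple mechanical core: keep only the alignment columns where the query has a residue, so the profile has exactly one column per query position. A minimal Python sketch of that trimming only. The pairwise and profile alignment steps are not reproduced, and `restrain_to_query` and the toy sequences are hypothetical.

```python
def restrain_to_query(query, others):
    """Drop every alignment column in which the query has a gap.

    The result has one column per query residue, so a profile built from it
    lines up with the sequence being predicted.
    """
    keep = [i for i, ch in enumerate(query) if ch not in "-."]
    trimmed_query = "".join(query[i] for i in keep)
    trimmed_others = ["".join(seq[i] for i in keep) for seq in others]
    return trimmed_query, trimmed_others


q, rest = restrain_to_query("MR-QQ", ["MRAQQ", "E-AL-"])
print(q, rest)  # MRQQ ['MRQQ', 'E-L-']
```

Gaps in the other sequences are kept, since they still say something about that query position; only insertions relative to the query are discarded.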

PSI-BLAST PROFILE

• During iteration, low-complexity sequences polluted the search profile.

• The database was therefore filtered to mask out:

Low-complexity sequences (SEG)

Coiled-coil regions (HELIXFILT)

Transmembrane helices (HELIXFILT)

NEURAL NETWORK

• Two neural networks are used:

1st:

o Sliding window of 17 residues

o 9 hidden nodes

o 3 outputs

2nd:

o Sliding window of 19 residues

o 9 hidden nodes

o 3 outputs
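Both networks read a window of residues centred on the position being predicted (17 or 19 wide) and emit three outputs, one per state. The Python sketch below builds those windows from per-residue feature vectors. Padding positions beyond the chain ends with zero vectors is an assumption, the 9-hidden-node network itself is not reproduced, and `sliding_windows` is an illustrative name.

```python
def sliding_windows(features, window=17):
    """One window of per-residue feature vectors centred on each residue.

    `features` holds one equal-length vector per residue (for example a
    profile row). Positions beyond either end of the chain are filled with
    zero vectors, which is an assumption of this sketch.
    """
    if window % 2 == 0:
        raise ValueError("window must be odd so it can be centred on a residue")
    half = window // 2
    pad = [0.0] * len(features[0])
    return [
        [features[pos] if 0 <= pos < len(features) else pad
         for pos in range(centre - half, centre + half + 1)]
        for centre in range(len(features))
    ]


# Toy input: 5 residues with one feature each, window of 3.
feats = [[float(i)] for i in range(5)]
windows = sliding_windows(feats, window=3)
print(windows[0])  # [[0.0], [0.0], [1.0]]
print(windows[4])  # [[3.0], [4.0], [0.0]]
```

Flattening a window gives window x features input values per residue, for instance 17 x 20 = 340 if each residue carried 20 profile values (an illustrative figure, not taken from the slides).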


CONSENSUS COMBINATION OF PREDICTION METHODS

• “Jury agreement”: all methods give identical predictions (Q3 = 82%)

• “No jury”: the methods disagree (Q3 = 76.4%)

Another neural network was trained.
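The jury split can be illustrated by comparing prediction strings column by column. In this Python sketch a position where every method agrees keeps that state, and any other position gets '*', the "no jury" symbol from the output key. Resolving the '*' positions with the extra trained network is not reproduced, and `jury_consensus` is an illustrative name.

```python
def jury_consensus(predictions):
    """Column-wise agreement between equal-length prediction strings.

    Unanimous positions keep their state; all others are marked '*'
    ("no jury").
    """
    return "".join(col[0] if len(set(col)) == 1 else "*" for col in zip(*predictions))


print(jury_consensus(["HHHE-", "HHHEE", "HH-E-"]))  # HH*E*
```

Reporting Q3 separately for unanimous and '*' positions is what yields two accuracy figures of the kind quoted above.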

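ASSESSMENT OF ACCURACY

Q3 (per-residue accuracy over the three states H, E and C):

$$Q_3 = \frac{\sum_{i \in \{H, E, C\}} \text{predicted}_i}{\sum_{i \in \{H, E, C\}} \text{observed}_i} \times 100$$

where predicted_i is the number of residues correctly predicted in state i and observed_i the number of residues observed in state i.

Confidence (reliability of each residue's prediction, from the network outputs):

$$\text{Confidence} = 10\,(\text{out}_{max} - \text{out}_{next})$$

where out_max is the highest and out_next the second-highest network output.

Segment Overlap:

$$\text{Sov} = \frac{100}{N} \sum_{S} \frac{\text{minov}(s_{obs}; s_{pred}) + \delta(s_{obs}; s_{pred})}{\text{maxov}(s_{obs}; s_{pred})} \times \text{len}(s_{obs})$$

where the sum runs over pairs S of overlapping observed and predicted segments and N normalizes by the total length of the observed segments.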
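The confidence formula turns the three network outputs into one reliability number per residue, the digits in the "Jnet Rel" row. A minimal Python sketch. Truncating to an integer so the value prints as a single digit is an assumption, as is feeding the outputs in as a plain list; `confidence` is an illustrative name.

```python
def confidence(outputs):
    """Jnet-style confidence: 10 * (highest output - next-highest output).

    Truncation to an integer, so the value prints as a single digit like the
    "Jnet Rel" row, is an assumption of this sketch.
    """
    top, nxt = sorted(outputs, reverse=True)[:2]
    return int(10 * (top - nxt))


print(confidence([0.9, 0.05, 0.05]))   # clear call -> 8
print(confidence([0.45, 0.40, 0.15]))  # two states nearly tied -> 0
```

A large gap between the winning state and the runner-up means the network committed strongly, so high digits mark residues whose prediction is more trustworthy.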

RIBONUCLEASE A

KEY

“H” – helix

“E” – strand

“B” – buried residue

“-” – exposed residue

“*” – no jury

YourSeq     MRQQLEMQKKQIMMQILTPEARSRLANLRLTRPDFVEQIELQLIQLAQMGRVRSKITDEQLKELLKRVAGKKREIKISRK
YA60_PYRHO  ERALIEAQIQAILRKILTPEARERLARVKLVRPELARQVELILVQLYQAGQITERIDDAKLKRILAQIEAKRREFRIKW.
TF19_HUMAN  ..KHREAEMRSILAQVLDQSARARLSNLALVKPEKTKAVENYLIQMARYGQLSEKVSEQGLIEILKKVSQQEKTTTVKFN
Q9VUZ8      ..MRAQEEMKSILSQVLDQQARARLNTLKVSKPEKAQMFENMVIRMAQMGQVRGKLDDAQFVSILESVNAQQSKSSVKYD
YRGK_CAEEL  ARAENQETAKGMISQILDQAAMQRLSNLAVAKPEKAQMVEAALINMARRGQLSGKMTDDGLKALMERVSAQQKATSVKFD
Y691_METJA  ..ALLEAEMQALLRKILTPEARERLERIRLARPEFAEAVEVQLIQLAQLGRLPIPLSDEDFKALLERISALKRKREIKIV
YK68_ARCFU  MRRQVEAQKKAILRAILEPEAKERLSRLKLAHPEIAEAVENQLIYLAQAGRIQSKITDKMLVEILKRVQPKKRETRIIRK
YF69_SCHPO  ..QEVQDEMRNLLSQILEHPARDRLRRIALVRKDRAEAVEELLLRMAKTGQISHKISEPELIELLEKISGEKRNETKIVI
YMW4_YEAST  .AGGGENSAPAAIANFLEPQALERLSRVALVRRDRAQAVETYLKKLIATNNVTHKITEAEIVSILNGIAKQQNNSKIIFE
            1---------11--------21--------31--------41--------51--------61--------71--------
OrigSeq     MRQQLEMQKKQIMMQILTPEARSRLANLRLTRPDFVEQIELQLIQLAQMGRVRSKITDEQLKELLKRVAGKKREIKISRK
jalign      --HHHHHHHHHHHHHHHHHHHHHHHHHHHH---HHHHHHHHHHHHHHHH--------HHHHHHHHHHHHHH----EE---
jfreq       -HHHHHHHHHHHHHHHHHHHHHHHHHHHHH---HHHHHHHHHHHHHHHH--------HHHHHHHHHHHH----EEEEE--
jhmm        -HHHHHHHHHHHHHHH---HHHHHHHHHHH----HHHHHHHHHHHHHHH--------HHHHHHHHHHHHHH---EEEEE-
jnet        -HHHHHHHHHHHHHHHHHHHHHHHHHHHHH---HHHHHHHHHHHHHHHH--------HHHHHHHHHHHH-----EEEEE-
jpssm       --HHHHHHHHHHHHHH--HHHHHHH-HEEEE---HHHHHHHHHHHHHHH--------HHHHHHHHHHHH-----EEE---
Jpred       -HHHHHHHHHHHHHHHHHHHHHHHHHHHHH---HHHHHHHHHHHHHHHH--------HHHHHHHHHHHH-----EEEE--
MCoil       --------------------------------------------------------------------------------
MCoilDI     --------------------------------------------------------------------------------
MCoilTRI    --------------------------------------------------------------------------------
Lupas 21    --------------------------------------------------------------------------------
Lupas 14    --------------------------------------------------------------------------------
Lupas 28    --------------------------------------------------------------------------------
Jnet_25     ---BB---B--BBB-BB---B--BB--B-BB---BB-BBB-BBB-BB-BB-B---B----BB-BB--B--------B---
Jnet_5      -----------BB--B----B---B--B----------B---B--B--------------B--BB---------------
Jnet_0      --------------------------------------B---B--B--------------B-------------------
Jnet Rel    79889998888998643697888849188454657899999999988626987657778999999986007883747728

JNET OUTPUT

JPRED SERVER

Consensus web server

•JNET – default method

•PREDATOR • Knowledge-based method focused on predicting hydrogen-bonded residues

•PHD – PredictProtein • Neural networks fed with multiple-sequence-alignment profiles

•NNSSP – Nearest-neighbor SS prediction

•DSC – Discrimination of protein Secondary structure Class

• Divides secondary structure prediction into basic concepts, then uses simple linear statistical methods to combine these concepts into a prediction

•ZPRED • Uses physicochemical information

•MULPRED • Combination of single-sequence methods

JPRED SERVER cont.

YourSeq     MRQQLEMQKKQIMMQILTPEARSRLANLRLTRPDFVEQIELQLIQLAQMGRVRSKITDEQLKELLKRVAGKKREIKISRK
YA60_PYRHO  ERALIEAQIQAILRKILTPEARERLARVKLVRPELARQVELILVQLYQAGQITERIDDAKLKRILAQIEAKRREFRIKW.
TF19_HUMAN  ..KHREAEMRSILAQVLDQSARARLSNLALVKPEKTKAVENYLIQMARYGQLSEKVSEQGLIEILKKVSQQEKTTTVKFN
Q9VUZ8      ..MRAQEEMKSILSQVLDQQARARLNTLKVSKPEKAQMFENMVIRMAQMGQVRGKLDDAQFVSILESVNAQQSKSSVKYD
YRGK_CAEEL  ARAENQETAKGMISQILDQAAMQRLSNLAVAKPEKAQMVEAALINMARRGQLSGKMTDDGLKALMERVSAQQKATSVKFD
Y691_METJA  ..ALLEAEMQALLRKILTPEARERLERIRLARPEFAEAVEVQLIQLAQLGRLPIPLSDEDFKALLERISALKRKREIKIV
YK68_ARCFU  MRRQVEAQKKAILRAILEPEAKERLSRLKLAHPEIAEAVENQLIYLAQAGRIQSKITDKMLVEILKRVQPKKRETRIIRK
YF69_SCHPO  ..QEVQDEMRNLLSQILEHPARDRLRRIALVRKDRAEAVEELLLRMAKTGQISHKISEPELIELLEKISGEKRNETKIVI
YMW4_YEAST  .AGGGENSAPAAIANFLEPQALERLSRVALVRRDRAQAVETYLKKLIATNNVTHKITEAEIVSILNGIAKQQNNSKIIFE
consv       --3-273433568336-522-43--25838573836556-2384484316682-37581274298238323542-3422-
            1---------11--------21--------31--------41--------51--------61--------71--------
OrigSeq     MRQQLEMQKKQIMMQILTPEARSRLANLRLTRPDFVEQIELQLIQLAQMGRVRSKITDEQLKELLKRVAGKKREIKISRK
jalign      --HHHHHHHHHHHHHHHHHHHHHHHHHHHH---HHHHHHHHHHHHHHHH--------HHHHHHHHHHHHHH----EE---
jfreq       -HHHHHHHHHHHHHHHHHHHHHHHHHHHHH---HHHHHHHHHHHHHHHH--------HHHHHHHHHHHH----EEEEE--
jhmm        -HHHHHHHHHHHHHHH---HHHHHHHHHHH----HHHHHHHHHHHHHHH--------HHHHHHHHHHHHHH---EEEEE-
jnet        -HHHHHHHHHHHHHHHHHHHHHHHHHHHHH---HHHHHHHHHHHHHHHH--------HHHHHHHHHHHH-----EEEEE-
jpssm       --HHHHHHHHHHHHHH--HHHHHHH-HEEEE---HHHHHHHHHHHHHHH--------HHHHHHHHHHHH-----EEE---
mul         --HHHHHHHHHHHHHHHHH--HHHHHHHH-H--HHHHHHHHHHHHHH----------HHHHHHHHHHHHHHH--H-EEE-
nnssp       HHHHHHHHHHHHHHHHHHHHHHHHHHHHHH--HHHHHHHHHHHHHHHHH--------HHHHHHHHHHHHH-----EEEEE
phd         ---HHHHHHHHHHHHHHHHHHHHHHHHHHH--HHHHHHHHHHHHHHHHH--------HHHHHHHHHHHHHH----EEE--
pred        ---HHHHHHHHHHHHHHHHHHHHHHHHHHHHH-HHHHHHHHHHHHHHH-------HHHHHHHHHHHHHHHHHHHHH----
zpred       --HHHHHHHHHHHHHEHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHH-EE----HHHHHHHHHHHHHHHHH---EE--
Jpred       -HHHHHHHHHHHHHHHHHHHHHHHHHHHHH---HHHHHHHHHHHHHHHH--------HHHHHHHHHHHHH----EEEE--

PHDHtm      --------------------------------------------------------------------------------
MCoil       --------------------------------------------------------------------------------
MCoilDI     --------------------------------------------------------------------------------
MCoilTRI    --------------------------------------------------------------------------------
Lupas 21    --------------------------------------------------------------------------------
Lupas 14    --------------------------------------------------------------------------------
Lupas 28    --------------------------------------------------------------------------------
PHDacc      ----B---B-BBBBBBB---B---BB-B-BB----B-BB-BBBB-BB-BB-B---B----B--BB--B------B-B-U-
Jnet_25     ---BB---B--BBB-BB---B--BB--B-BB---BB-BBB-BBB-BB-BB-B---B----BB-BB--B--------B---
Jnet_5      -----------BB--B----B---B--B----------B---B--B--------------B--BB---------------
Jnet_0      --------------------------------------B---B--B--------------B-------------------
PHD Rel     97527999999999999899999999986315269999999999999964332235649999999999962356225319
Pred Rel    00777700999990990609990999886606668099999999009677787757768989909999957077777000
Jnet Rel    79889998888998643697888849188454657899999999988626987657778999999986007883747728

Accuracy Evaluation

By Liang-Yu Shih

Methods

Per-residue accuracy: Q3 (the traditional measure) and Matthews correlation coefficients

Per-segment accuracy: SOV measurement (CASP2)

Subcategorizing the incorrect predictions:

Over: predicting alpha/beta when the residue is coil

Under: predicting coil when the residue is alpha/beta

Wrong: predicting alpha when the residue is beta, or vice versa
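The three error types can be tallied residue by residue. A short Python sketch, assuming both strings are already in the 3-state alphabet and treating any symbol other than H or E as coil; `error_categories` is an illustrative name, and the example strings are those of the SOV example in this section.

```python
from collections import Counter


def error_categories(observed, predicted):
    """Count correct residues and the over / under / wrong error types."""
    counts = Counter()
    for o, p in zip(observed, predicted):
        o_ss = o if o in "HE" else "C"  # anything else counts as coil
        p_ss = p if p in "HE" else "C"
        if o_ss == p_ss:
            counts["correct"] += 1
        elif o_ss == "C":
            counts["over"] += 1   # helix/strand predicted, coil observed
        elif p_ss == "C":
            counts["under"] += 1  # coil predicted, helix/strand observed
        else:
            counts["wrong"] += 1  # helix predicted as strand or vice versa
    return counts


print(error_categories("CCEEECCCCCCEEEEEECCC", "CCCCCCCEEEEECCCEECCC"))
```

For those strings all ten errors are over- or under-predictions of strand; none is a helix/strand swap.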

How to measure Q3

Q index:

Qhelix, Qstrand and Qcoil, for a single conformational state i:

$$Q_i = \frac{\text{number of residues correctly predicted in state } i}{\text{number of residues observed in state } i} \times 100$$

Q3, for all three states:

$$Q_3 = \frac{\text{number of residues correctly predicted}}{\text{number of all residues}} \times 100$$
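Both ratios can be computed directly from an observed and a predicted string. A minimal Python sketch, assuming both strings already use the three-state alphabet H, E, C; `q_scores` is an illustrative name, and Q_i is reported as None for a state that never occurs, since its denominator would be zero.

```python
def q_scores(observed, predicted, states="HEC"):
    """Return (Q3, {state: Q_i}) as percentages."""
    n = len(observed)
    correct = sum(o == p for o, p in zip(observed, predicted))
    q3 = 100.0 * correct / n
    q_state = {}
    for s in states:
        observed_s = sum(o == s for o in observed)
        correct_s = sum(o == s and p == s for o, p in zip(observed, predicted))
        q_state[s] = 100.0 * correct_s / observed_s if observed_s else None
    return q3, q_state


q3, q = q_scores("CCEEECCCCCCEEEEEECCC", "CCCCCCCEEEEECCCEECCC")
print(q3)  # 50.0
```

Here 10 of 20 residues are right, so Q3 = 50.0, while Q_E is only 3/9 (about 33.3) and Q_C is 7/11 (about 63.6). Per-state scores show what a single Q3 hides.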

How to measure Matthews coefficients
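The Matthews formula did not survive extraction. The usual per-state definition is C_i = (p·n - u·o) / sqrt((p + u)(p + o)(n + u)(n + o)), where, for state i, p counts true positives, n true negatives, o over-predicted residues and u under-predicted residues. A minimal Python sketch of that standard definition (not necessarily the slide's exact notation). Returning 0.0 when a denominator factor is zero is a convention chosen here, and `matthews` is an illustrative name.

```python
import math


def matthews(observed, predicted, state):
    """Matthews correlation coefficient for one secondary-structure state."""
    p = n = o = u = 0
    for obs, pred in zip(observed, predicted):
        if obs == state and pred == state:
            p += 1  # correctly predicted in the state
        elif obs != state and pred != state:
            n += 1  # correctly predicted as not in the state
        elif pred == state:
            o += 1  # over-predicted
        else:
            u += 1  # under-predicted
    denom = math.sqrt((p + u) * (p + o) * (n + u) * (n + o))
    return (p * n - u * o) / denom if denom else 0.0


print(matthews("CCEEECCCCCCEEEEEECCC", "CCCCCCCEEEEECCCEECCC", "E"))
```

Unlike Q_i, the coefficient also penalizes over-prediction. For strand in this example it is slightly negative, -3/sqrt(9009), about -0.03: the strand calls are no better than chance.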

Problems in per-residue accuracy

1. It does not reflect 3D structure. Example: assigning the entire myoglobin chain as a single helix gives a Q3 score of 80%.

2. It ignores the conformational variation observed at the ends of secondary structure segments. Example: a prediction can have a low Q3 value yet still predict the fold well.

Q: What is a good measure?

A: A structurally oriented measure.

A structurally oriented measure considers the following:

1. The type and position of secondary structure segments, rather than a per-residue assignment of conformational state.

2. The natural variation of segment boundaries among families of homologous proteins.

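How to measure SOV

For each state i, SOV compares every observed segment s1 with each predicted segment s2 of the same state that overlaps it (Zemla et al., 1999):

$$\text{Sov}(i) = \frac{100}{N(i)} \sum_{S(i)} \frac{\text{minov}(s_1, s_2) + \delta(s_1, s_2)}{\text{maxov}(s_1, s_2)} \times \text{len}(s_1)$$

$$\delta(s_1, s_2) = \min\{\text{maxov}(s_1, s_2) - \text{minov}(s_1, s_2);\ \text{minov}(s_1, s_2);\ \text{int}(\text{len}(s_1)/2);\ \text{int}(\text{len}(s_2)/2)\}$$

Here minov is the length of the actual overlap of s1 and s2, maxov is the total extent over which either segment has a residue in state i, and N(i) is the sum of len(s1) over all overlapping pairs S(i) plus the lengths of observed segments that no predicted segment overlaps.

SOV Example

Observed (S1):  CCEEECCCCCCEEEEEECCC
Predicted (S2): CCCCCCCEEEEECCCEECCC

For state E, the first observed segment (EEE, positions 3-5) overlaps no predicted E segment. The second (EEEEEE, positions 12-17) overlaps two predicted segments:

Pair 1, s2 = EEEEE (positions 8-12): minov = 1, maxov = 10, δ = min[(10-1); 1; int(6/2); int(5/2)] = 1

Pair 2, s2 = EE (positions 16-17): minov = 2, maxov = 6, δ = min[(6-2); 2; int(6/2); int(2/2)] = 1

$$\text{Sov}(E) = \frac{100}{3 + 6 + 6}\left[\frac{1 + 1}{10} \times 6 + \frac{2 + 1}{6} \times 6\right] = \frac{100 \times 4.2}{15} = 28.0$$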
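The per-state SOV score can be computed mechanically once segments are extracted. A minimal Python sketch, assuming the Zemla et al. (1999) definition of δ with integer halving of both segment lengths; `segments` and `sov_state` are illustrative names, and 0-based inclusive indices are used internally. On the example strings it gives Sov(E) = 28.0, because the second pair's δ is capped at int(2/2) = 1 by the two-residue predicted segment.

```python
from itertools import groupby


def segments(ss, state):
    """Return (start, end) pairs, 0-based and inclusive, for each run of `state`."""
    segs, pos = [], 0
    for ch, run in groupby(ss):
        length = len(list(run))
        if ch == state:
            segs.append((pos, pos + length - 1))
        pos += length
    return segs


def sov_state(observed, predicted, state):
    """Segment overlap Sov(state) in percent, or None if the state is never observed."""
    pred_segs = segments(predicted, state)
    total, n = 0.0, 0
    for s1 in segments(observed, state):
        len1 = s1[1] - s1[0] + 1
        overlapping = [s2 for s2 in pred_segs if s2[0] <= s1[1] and s1[0] <= s2[1]]
        if not overlapping:
            n += len1  # observed segment that no predicted segment touches
            continue
        for s2 in overlapping:
            len2 = s2[1] - s2[0] + 1
            minov = min(s1[1], s2[1]) - max(s1[0], s2[0]) + 1  # shared residues
            maxov = max(s1[1], s2[1]) - min(s1[0], s2[0]) + 1  # total extent of the pair
            delta = min(maxov - minov, minov, len1 // 2, len2 // 2)
            total += (minov + delta) / maxov * len1
            n += len1  # each overlapping pair adds len(s1) to N
    return 100.0 * total / n if n else None


observed = "CCEEECCCCCCEEEEEECCC"
predicted = "CCCCCCCEEEEECCCEECCC"
print(round(sov_state(observed, predicted, "E"), 1))  # 28.0
```

The δ term rewards near-misses at segment ends, but never by more than half a segment, so a fragmented prediction cannot score as if it were a single well-placed segment.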

Evaluation-Step 1 (query sequence)

Hypothetical protein:

MRQQLEMQKKQIMMQILTPEARSRLANLRLTRPDFVEQIELQLIQLAQMGRVRSKITDEQLKELLKRVAGKKREIKISRK

80 residues, from Methanothermobacter thermautotrophicus; structure solved by NMR (Christendat, D., et al. Nat. Struct. Biol. 7 (10), 903-909 (2000))

Evaluation-Step 2 (programs)

Approaches: explicit rules, nearest-neighbors, neural-network-based prediction, PSI-profile, HMM

First generation (information from a single residue of a single sequence): Lim 1974

Second generation (local interactions): Levin et al. 1986; Nishikawa and Ooi 1986; Holley and Karplus 1989; Qian and Sejnowski 1988; PREDATOR 1996

Third generation (information from homologous sequences): APSSP 1995; SAM-T99sec; PHD 1993; Jpred 1999; PROFsec 2000; SSPRO2

Servers

1. APSSP: http://imtech.ernet.in/raghava/apssp/

2. JPred: http://jura.ebi.ac.uk:8888/

3. PHD: http://cubic.bioc.columbia.edu/predictprotein

4. PROFsec: http://cubic.bioc.columbia.edu/predictprotein

5. PSIpred: http://insulin.brunel.ac.uk/psiform.html

6. SAM-T99sec: http://www.cse.ucsc.edu/research/compbio/HMM-apps/T99-query.html

Evaluation-Step 3

Conversion of DSSP secondary structure from 8 states to 3 states:

DSSP:  H  G  I  E  B  T  S  ' '
USED:  H  H  H  E  E  L  L  L

H: alpha helix

E: beta strand

L: coil (others)

The input file for the SOV program has three columns:

•First column: protein sequence (AA) in one-letter code

•Second column: observed (OSEC) secondary structure

•Third column: predicted (PSEC) secondary structure

http://predictioncenter.llnl.gov/local/sov/sov.html
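The 8-to-3 reduction is a lookup table, and the three columns are one line per residue. A small Python sketch of both. Treating unknown DSSP letters as coil is an assumption, `reduce_dssp` and `three_column_lines` are illustrative names, and any header lines the SOV server at the URL expects are not shown here and would need to be checked against its documentation.

```python
# Eight-to-three state reduction from Evaluation-Step 3 (' ' is DSSP "no assignment").
DSSP_TO_3 = {"H": "H", "G": "H", "I": "H",
             "E": "E", "B": "E",
             "T": "L", "S": "L", " ": "L"}


def reduce_dssp(dssp_string):
    """Map each DSSP letter to H, E or L; unknown letters become coil (assumption)."""
    return "".join(DSSP_TO_3.get(ch, "L") for ch in dssp_string)


def three_column_lines(sequence, observed, predicted):
    """One 'AA OSEC PSEC' line per residue."""
    return [f"{aa} {o} {p}" for aa, o, p in zip(sequence, observed, predicted)]


print(reduce_dssp("HHGGIE BTS"))  # HHHHHELELL
for line in three_column_lines("MRQ", "HHL", "HLL"):
    print(line)
```

The reduction is applied only to the observed (DSSP) column; the predicted column is assumed to already use H, E and L.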

Evaluation-Step 4

Evaluation-Result

Method     Measurement  ALL   HELIX  STRAND  COIL
Jpred      Q3           73.8  100.0  100.0   47.5
           SOV          62.2   80.5  100.0   48.1
APSSP      Q3           72.5   97.5  100.0   47.5
           SOV          67.3   93.8  100.0   46.9
SAM-T99    Q3           72.5  100.0  100.0   45.0
           SOV          65.8   93.8  100.0   44.2
PHD        Q3           67.5   97.5  100.0   37.5
           SOV          56.5   80.0  100.0   38.5
PREDATOR   Q3           70.0   95.5  100.0   45.0
           SOV          66.4   89.4  100.0   48.0
SSPRO      Q3           77.5  100.0  100.0   55.0
           SOV          69.1   94.0  100.0   50.0

EVA: Evaluation of Automatic protein structure prediction

http://cubic.bioc.columbia.edu/eva/sec/graph/common3.jpg

Conclusion

Jpred was a pioneer among methods that give high Q3 and SOV scores.

Secondary structure prediction using a jury of neural networks is one of the best methods.
