Secondary Structure Prediction and Signal Peptides

Protein Analysis Workshop 2012

Bioinformatics group, Institute of Biotechnology, University of Helsinki

Earlier version: Hung Ta

Current: Petri Törönen


Page 2

Why Secondary Structure Prediction and Signal Peptides?

Usually, sequence homology is a good source of information.

However, sometimes no good homology can be found.

We then need other sources of information to aid us:

• Domain (profile) homologies (later lectures)

• Secondary structure

• Signal peptides

• Transmembrane regions

Secondary structure and signal peptides are also useful input for other bioinformatics tools.

Page 3

Secondary Structure

An alternative when there is only weak sequence homology: structure is more conserved than sequence.

Similar secondary structure gives extra support for weak sequence homology.

Special cases of secondary structure can suggest function or localization.

Page 4

Hierarchy of Protein Structure

Page 5

Primary Structure: a Linear Arrangement of Amino Acids

An amino acid has several structural components: a central carbon atom (Cα), an amino group (NH2), a carboxyl group (COOH), a hydrogen atom (H) and a side chain (R). There are 20 standard amino acids.

The peptide bond is formed when the carboxyl group of one amino acid binds to the amino group of the adjacent amino acid.

The primary structure of a protein is simply the linear arrangement, or sequence, of the amino acid residues that compose it.

Page 6

Secondary Structure: Core Elements of Protein Architecture

Secondary structure results from the folding of localized parts of a polypeptide chain.

α-helix and β-sheet: the major internal supportive elements, about 60 percent of the polypeptide chain

Coils, turns

Page 7

α-Helix

Hydrogen-bonded

3.6 residues per turn

Axial dipole moment

Side chains point outward

Average length is 10 amino acids (about 3 turns)

Typically rich in alanine, glutamine, leucine and methionine, and poor in proline, glycine, tyrosine and serine.
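The helix numbers above can be checked with a little arithmetic. At 3.6 residues per turn, each residue rotates the chain by 360/3.6 = 100 degrees, so a 10-residue helix covers about 2.8 turns (the slide rounds this to 3). The sketch below also assumes the commonly quoted rise of 1.5 Å per residue along the helix axis, which the slide does not state. The function name is hypothetical.

```python
RESIDUES_PER_TURN = 3.6    # from the slide
RISE_PER_RESIDUE_A = 1.5   # assumption: typical rise per residue along the helix axis, in ångströms


def helix_geometry(n_residues: int) -> dict[str, float]:
    """Ideal α-helix geometry for a helix of n_residues residues."""
    return {
        "degrees_per_residue": 360.0 / RESIDUES_PER_TURN,    # rotation between consecutive residues
        "turns": n_residues / RESIDUES_PER_TURN,             # number of helical turns
        "length_angstrom": n_residues * RISE_PER_RESIDUE_A,  # approximate length along the axis
    }


if __name__ == "__main__":
    geometry = helix_geometry(10)  # the average helix length quoted on the slide
    print(f"{geometry['degrees_per_residue']:.0f} degrees per residue, "
          f"{geometry['turns']:.1f} turns, about {geometry['length_angstrom']:.0f} Å long")
```

The same 100-degree step is what makes helical wheel diagrams, like the coiled-coil figure later in this workshop, place consecutive residues at 100-degree intervals.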

Page 8

β-Sheet

Formed by hydrogen bonds between β-strands, which are short polypeptide segments (5-8 residues).

Adjacent β-strands run in the same direction -> parallel sheet.

Adjacent β-strands run in opposite directions -> antiparallel sheet.

Figure: ribbon diagram

Page 9

Turns, loops, coils…

A turn, composed of 3-4 residues, forms a sharp bend that redirects the polypeptide backbone back toward the interior.

A loop is similar to a turn but can form longer bends.

Turns and loops help large proteins fold into compact structures.

A random coil is a class of conformations that indicates an absence of regular secondary structure.

Figure: a turn

Page 10

Secondary Structure Prediction

Primary:   MSEGEDDFPRKRTPWCFDDEHMC
Secondary: CCHHHHHHCCCCEEEEEECCCCC

Why: secondary structure is the first level of structural organization above the sequence.

The task: assign each amino acid (aa) one of the states

• H: α-helix

• E: β-strand

• T: turn

• C: coil
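A per-residue prediction string like the one above is often easier to read as a list of segments. Here is a minimal sketch that collapses the string into (state, start, end) runs, assuming 1-based inclusive residue numbering. The function name `ss_segments` is hypothetical. It is applied to the example strings from this slide.

```python
from itertools import groupby


def ss_segments(ss: str) -> list[tuple[str, int, int]]:
    """Collapse a per-residue state string (H/E/T/C) into (state, start, end) runs, 1-based inclusive."""
    segments = []
    start = 1
    for state, run in groupby(ss):
        length = sum(1 for _ in run)  # number of consecutive residues in this state
        segments.append((state, start, start + length - 1))
        start += length
    return segments


if __name__ == "__main__":
    primary = "MSEGEDDFPRKRTPWCFDDEHMC"
    secondary = "CCHHHHHHCCCCEEEEEECCCCC"
    for state, start, end in ss_segments(secondary):
        if state in "HE":  # report only helices and strands
            print(state, start, end, primary[start - 1:end])
    # H 3 8 EGEDDF
    # E 13 18 TPWCFD
```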

Page 11

Secondary Structure Prediction

Single-residue statistical analysis (Chou-Fasman, 1974): for each amino acid type, assign its 'propensity' to be in a helix, β-sheet, or coil.

Based on 15 proteins of known conformation, 2473 amino acids in total.

Limited accuracy: ~55-60% on average.

E.g. Chou-Fasman (1974), no longer used.
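A propensity in the Chou-Fasman sense can be written as a ratio: how often an amino acid occurs in a given state, divided by how often it occurs overall. Values above 1 mean the residue favours that state. The sketch below computes such ratios from any set of sequences with known secondary structure. It does not reproduce the published 1974 parameter table. The function name and the toy data in the usage lines are made up for illustration.

```python
from collections import Counter


def propensities(pairs: list[tuple[str, str]]) -> dict[tuple[str, str], float]:
    """Propensity(aa, state) = f(aa | state) / f(aa), estimated from (sequence, structure) pairs."""
    aa_counts = Counter()
    state_counts = Counter()
    joint_counts = Counter()
    for seq, ss in pairs:
        if len(seq) != len(ss):
            raise ValueError("sequence and structure strings must have equal length")
        for aa, state in zip(seq, ss):
            aa_counts[aa] += 1
            state_counts[state] += 1
            joint_counts[(aa, state)] += 1
    total = sum(aa_counts.values())
    result = {}
    for aa in aa_counts:
        for state in state_counts:
            f_aa_given_state = joint_counts[(aa, state)] / state_counts[state]
            f_aa = aa_counts[aa] / total
            result[(aa, state)] = f_aa_given_state / f_aa
    return result


if __name__ == "__main__":
    toy = [("AELMKGPSG", "HHHHHCCCC"), ("VIYTGNPD", "EEEECCCC")]  # made-up sequences and states
    for (aa, state), value in sorted(propensities(toy).items()):
        if state == "H" and value > 0:
            print(aa, round(value, 2))
```

With only 15 proteins, as in the original study, many of these ratios rest on few counts. This is one reason single-residue methods plateau at the accuracy quoted above.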

Page 12

Secondary Structure Prediction

Segment-based statistics: look for correlations within windows of 11-21 aa.

Many algorithms have been tried.

Best performing: neural networks.

Input: a set of protein sequences with their known secondary structure.

Output: a trained network that predicts secondary structure elements for given query sequences.

Accuracy < 70%.
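Segment-based methods predict the state of each residue from a window of its neighbours. A common way to feed such a window to a neural network is to one-hot encode every residue in it. The sketch below builds those input vectors only; it does not train a network. Two choices are assumptions for illustration: the window size of 13 (inside the 11-21 range above) and zero padding at the chain ends. Tools such as PSIPRED use position-specific scoring matrix (profile) columns rather than plain one-hot vectors.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
AA_INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def window_features(seq: str, window: int = 13) -> list[list[float]]:
    """One flat feature vector per residue: one-hot codes for the residues in a centred window.

    Window positions outside the chain, and non-standard letters, are encoded as all zeros.
    """
    if window % 2 == 0:
        raise ValueError("window size must be odd so it can be centred on a residue")
    half = window // 2
    features = []
    for centre in range(len(seq)):
        vector = []
        for pos in range(centre - half, centre + half + 1):
            one_hot = [0.0] * len(AMINO_ACIDS)
            if 0 <= pos < len(seq) and seq[pos] in AA_INDEX:
                one_hot[AA_INDEX[seq[pos]]] = 1.0
            vector.extend(one_hot)
        features.append(vector)
    return features


if __name__ == "__main__":
    x = window_features("MSEGEDDFPRKRTPWCFDDEHMC")
    print(len(x), len(x[0]))  # one row per residue, 13 * 20 columns per row
```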

Page 13

Popular Servers for Secondary Structure Prediction

Jpred (http://www.compbio.dundee.ac.uk/www-jpred/)

Psipred (http://bioinf.cs.ucl.ac.uk/psipred/)

Meta-server: PredictProtein (http://www.predictprotein.org/)

Page 14

PSIPRED and JPRED

Test with uniprot|P00772|ELA1_PIG Elastase-1 precursor

Correct answer: http://www.uniprot.org/uniprot/P00772
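To compare a server's prediction with the correct answer, a common per-residue score is Q3: the percentage of residues whose predicted state (helix, strand or coil) matches the observed one. Here is a minimal sketch. It assumes both strings are already aligned to the same sequence and that turns (T) are counted as coil, a common but not universal convention. The helper names are hypothetical, and the strings in the usage line are illustrative, not real results for P00772.

```python
def to_three_state(ss: str) -> str:
    """Reduce H/E/T/C strings to three states: helix (H), strand (E), everything else coil (C)."""
    return "".join(c if c in "HE" else "C" for c in ss.upper())


def q3(predicted: str, observed: str) -> float:
    """Percentage of residues with the same three-state label in both strings."""
    if len(predicted) != len(observed) or not observed:
        raise ValueError("strings must be non-empty and of equal length")
    pred, obs = to_three_state(predicted), to_three_state(observed)
    matches = sum(p == o for p, o in zip(pred, obs))
    return 100.0 * matches / len(obs)


if __name__ == "__main__":
    print(q3("CCHHHHHHCCCCEEEEEECCCCC", "CCHHHHHCCCCCEEEEECCTTCC"))
```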

Page 15

PSIPRED (http://bioinf.cs.ucl.ac.uk/psipred/result/351083)

Page 16

JPRED (http://www.compbio.dundee.ac.uk/www-jpred/results/jp_Pt7zBV4/jp_Pt7zBV4.results.html)

• Above: the summary

• On the right: the detailed view

Page 17

Special Cases of Secondary Structure

Informative special cases of secondary structure include:

• Coiled-coil regions

• Transmembrane regions

Page 18

Prediction of coiled-coils

• Coiled-coil proteins are often biologically relevant regulators (e.g. transcription factors).

• Coiled-coils are generally solvent-exposed, multi-stranded helix structures.

Helix periodicity and solvent exposure impose a special pattern, the heptad repeat:

… abcdefg … (positions a and d: hydrophobic residues; remaining positions: mostly hydrophilic residues)

Figure: helical diagram of 2 interacting helices in a two-stranded coiled coil (from the Wikipedia Leucine zipper article)
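A very crude scan can illustrate the heptad idea. For each of the seven possible registers, count how many a and d positions carry a hydrophobic residue. This is only a toy illustration of the periodicity, not the COILS algorithm, which scores sliding windows with position-specific scoring matrices. The hydrophobic residue set, the function name and the test sequence are assumptions.

```python
HYDROPHOBIC = set("AILMFVW")  # assumption: a simple hydrophobic residue set


def best_heptad_register(seq: str) -> tuple[int, float]:
    """Return (offset, fraction) for the register whose a/d positions are most hydrophobic.

    With offset f, residue i (0-based) sits at heptad position "abcdefg"[(i - f) % 7].
    """
    best = (0, 0.0)
    for offset in range(7):
        core = [aa for i, aa in enumerate(seq) if (i - offset) % 7 in (0, 3)]  # a = 0, d = 3
        if not core:
            continue
        fraction = sum(aa in HYDROPHOBIC for aa in core) / len(core)
        if fraction > best[1]:
            best = (offset, fraction)
    return best


if __name__ == "__main__":
    # A made-up, idealised leucine-zipper-like repeat: L at a and d, polar/charged residues elsewhere.
    print(best_heptad_register("LKELEKK" * 4))
```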

Page 19

The COILS server at EMBnet

Compares a sequence to a database of known parallel two-stranded coiled-coils and derives a similarity score.

By comparing this score to the distribution of scores in globular and coiled-coil proteins, the program then calculates the probability that the sequence will adopt a coiled-coil conformation.

Options:

• scoring matrices

• window size (score may vary)

• weighting options
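The step from score to probability can be sketched with Bayes' rule:

1. Model the score distributions of coiled-coil and globular proteins, here as normal distributions.
2. Weight each distribution by how common its class is assumed to be.
3. Ask how likely it is that a given score came from the coiled-coil class.

All parameter values in the usage lines are hypothetical placeholders, not the ones used by COILS. The function names are also hypothetical.

```python
import math


def normal_pdf(x: float, mean: float, sd: float) -> float:
    """Density of a normal distribution at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def coiled_coil_probability(score: float,
                            cc_mean: float, cc_sd: float,
                            glob_mean: float, glob_sd: float,
                            glob_to_cc_ratio: float) -> float:
    """Posterior probability that a window score comes from the coiled-coil distribution.

    glob_to_cc_ratio is the assumed prior ratio of globular to coiled-coil windows.
    """
    cc = normal_pdf(score, cc_mean, cc_sd)
    glob = glob_to_cc_ratio * normal_pdf(score, glob_mean, glob_sd)
    return cc / (cc + glob)


if __name__ == "__main__":
    # Hypothetical parameters, chosen only to show the shape of the calculation.
    for s in (0.8, 1.2, 1.6, 2.0):
        p = coiled_coil_probability(s, cc_mean=1.6, cc_sd=0.25,
                                    glob_mean=0.8, glob_sd=0.25, glob_to_cc_ratio=30.0)
        print(f"score {s:.1f}: P(coiled coil) = {p:.3f}")
```

The prior ratio matters. Because coiled-coils are rare compared with globular regions, a score must sit well inside the coiled-coil distribution before the probability becomes high.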

Page 20

COILS Limitations

The program works well for parallel two-stranded structures that are solvent-exposed. It runs into progressively more problems as more helices are added, when the helices are antiparallel, and as their length decreases.

The program fails entirely on buried structures.

Page 21

COILS Demo

Let us submit the sequence below to the COILS server at EMBnet:

http://www.ch.embnet.org/software/COILS_form.html

>1jch_A
VAAPVAFGFPALSTPGAGGLAVSISAGALSAAIADIMAALKGPFKFGLWGVALYGVLPSQIAKDDPNMMSKIVTSLPADDITESPVSSLPLDKATVNVNVRVVDDVKDERQNISVVSGVPMSVPVVDAKPTERPGVFTASIPGAPVLNISVNNSTPAVQTLSPGVTNNTDKDVRPAFGTQGGNTRDAVIRFPKDSGHNAVYVSVSDVLSPDQVKQRQDEENRRQQEWDATHPVEAAERNYERARAELNQANEDVARNQERQAKAVQVYNSRKSELDAANKTLADAIAEIKQFNRFAHDPMAGGHRMWQMAGLKAQRAQTDVNNKQAAFDAAAKEKSDADAALSSAMESRKKKEDKKRSAENNLNDEKNKPRKGFKDYGHDYHPAPKTENIKGLGDLKPGIPKTPKQNGGGKRKRWTGDKGRKIYEWDSQHGELEGYRASDGQHLGSFDPKTGNQLKGPDPKRNIKKYL

Page 22

Correct answer: http://www.rcsb.org/pdb/explore/explore.do?structureId=1JCH

Page 23


Page 24

Transmembrane Region Prediction

Transmembrane proteins are important receptor or transport proteins.

Transmembrane regions:

• Usually contain residues with hydrophobic side chains (the surface facing the lipid bilayer must be hydrophobic).

• Usually ~20 residues long; can be up to 30 if the helix does not cross the membrane perpendicularly.

Methods: hydropathy plots (historical, better methods now available), scoring-matrix and statistical methods (TMpred, MEMSAT), Hidden Markov Models (TMHMM), neural networks (PHDhtm).

Page 25

Hydropathy Plots (Kyte-Doolittle)

The hydropathy index of an amino acid is a number representing the hydrophobic or hydrophilic properties of its side chain.

Compute an average hydropathy value for each position in the query sequence over a sliding window. A window length of 19 is usually chosen for membrane-spanning region prediction.

•Skip this
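The hydropathy plot described above is a sliding-window average of per-residue Kyte-Doolittle values. The sketch below uses the published Kyte-Doolittle scale and the window of 19 from the slide. The threshold of 1.6 for flagging possible membrane-spanning segments is the value commonly quoted for this window; treat it as an assumption. The function names and the example sequence are hypothetical.

```python
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(seq: str, window: int = 19) -> list[float]:
    """Average hydropathy of each full window; entry k covers residues k..k+window-1 (0-based).

    Non-standard letters (e.g. X) raise KeyError; clean the sequence first.
    """
    values = [KYTE_DOOLITTLE[aa] for aa in seq.upper()]
    return [sum(values[k:k + window]) / window for k in range(len(values) - window + 1)]


def candidate_tm_windows(seq: str, window: int = 19, threshold: float = 1.6) -> list[int]:
    """1-based centre positions of windows whose average hydropathy exceeds the threshold."""
    half = window // 2
    return [k + half + 1 for k, avg in enumerate(hydropathy_profile(seq, window)) if avg > threshold]


if __name__ == "__main__":
    # Made-up sequence: a hydrophobic stretch flanked by charged residues.
    print(candidate_tm_windows("MSKRDE" + "LIVALLFAVIGLLAVLILA" + "KRDESNQ"))
```

Plotting the profile against residue position gives the hydropathy plot itself. Runs of consecutive flagged centres mark candidate helices of roughly 20 residues.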

Page 26

Hydropathy Plot Servers

Let us submit the sequence below to Membrane Explorer (also available as standalone MPEx) and Grease (http://fasta.bioch.virginia.edu/fasta_www2/fasta_www.cgi?rm=misc1).

>sp|P06010|RCEM_RHOVI Reaction center protein M chain (Photosynthetic reaction center M subunit) - Rhodopseudomonas viridis
ADYQTIYTQIQARGPHITVSGEWGDNDRVGKPFYSYWLGKIGDAQIGPIYLGASGIAAFAFGSTAILIILFNMAAEVHFDPLQFFRQFFWLGLYPPKAQYGMGIPPLHDGGWWLMAGLFMTLSLGSWWIRVYSRARALGLGTHIAWNFAAAIFFVLCIGCIHPTLVGSWSEGVPFGIWPHIDWLTAFSIRYGNFYYCPWHGFSIGFAYGCGLLFAAHGATILAVARFGGDREIEQITDRGTAVERAALFWRWTIGFNATIESVHRWGWFFSLMVMVSASVGILLTGTFVDNWYLWCVKHGAAPDYPAYLPATPDPASLPGAPK

Remove the FASTA header if the sequence is not read correctly.

•Skip this
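If a server does not accept the header line, you can clean the sequence before pasting it. Here is a minimal sketch, with a hypothetical function name, that drops '>' header lines and any whitespace and upper-cases the residues of a single FASTA record.

```python
def fasta_to_sequence(text: str) -> str:
    """Concatenate the sequence lines of a single FASTA record, dropping headers and whitespace."""
    residues = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith(">"):
            continue  # skip blank lines and the header line
        residues.append("".join(line.split()).upper())  # remove any internal spaces or tabs
    return "".join(residues)


if __name__ == "__main__":
    # The first 40 residues of RCEM_RHOVI, split over two lines as in a typical FASTA file.
    record = ">sp|P06010|RCEM_RHOVI Reaction center protein M chain\nADYQTIYTQIQARGPHITVSG\nEWGDNDRVGKPFYSYWLGK\n"
    print(fasta_to_sequence(record))
```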

Page 27

Hydropathy Plot

The larger the value, the more hydrophobic the amino acid.

Correct answer (http://pir.uniprot.org/uniprot/P06010)

•Skip this

Page 28

TMpred

Method summary:

Scans a candidate sequence for matches to a sequence scoring matrix, obtained by aligning the sequences of all transmembrane alpha-helical regions that are known from structures.

These sequences are collected in a database called TMBase.

Remark: the authors do not recommend this method for genomic sequences. Automatic methods are recommended instead, e.g. TMHMM or PHDhtm.
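Scanning a sequence with a position-specific scoring matrix means sliding the matrix along the sequence. At each offset, you sum the score of the residue found at each matrix column. The sketch below shows only that mechanic, using a tiny hypothetical 3-column matrix. TMpred's actual matrices, derived from TMBase, are not reproduced here. The function name is hypothetical.

```python
def pssm_scan(seq: str, matrix: list[dict[str, float]], default: float = 0.0) -> list[float]:
    """Score every placement of the matrix along seq.

    matrix[j][aa] is the score for residue aa at matrix column j; residues missing
    from a column get `default`. Entry k of the result is the score of the
    placement starting at 0-based position k.
    """
    width = len(matrix)
    return [
        sum(matrix[j].get(seq[k + j], default) for j in range(width))
        for k in range(len(seq) - width + 1)
    ]


if __name__ == "__main__":
    # Hypothetical 3-column matrix favouring two hydrophobic residues followed by a basic one.
    toy_matrix = [
        {"L": 1.0, "I": 1.0, "V": 0.8},
        {"L": 1.2, "F": 1.0, "A": 0.5},
        {"K": 0.9, "R": 0.9},
    ]
    scores = pssm_scan("GSLLKAVAR", toy_matrix)
    best = max(range(len(scores)), key=scores.__getitem__)
    print(best + 1, scores[best])  # 1-based start of the best-scoring placement and its score
```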

Page 29

TMpred Server

Let us submit RCEM_RHOVI again to the TMpred server at EMBnet:

http://www.ch.embnet.org/software/TMPRED_form.html

>sp|P06010|RCEM_RHOVI Reaction center protein M chain (Photosynthetic reaction center M subunit) - Rhodopseudomonas viridis
ADYQTIYTQIQARGPHITVSGEWGDNDRVGKPFYSYWLGKIGDAQIGPIYLGASGIAAFAFGSTAILIILFNMAAEVHFDPLQFFRQFFWLGLYPPKAQYGMGIPPLHDGGWWLMAGLFMTLSLGSWWIRVYSRARALGLGTHIAWNFAAAIFFVLCIGCIHPTLVGSWSEGVPFGIWPHIDWLTAFSIRYGNFYYCPWHGFSIGFAYGCGLLFAAHGATILAVARFGGDREIEQITDRGTAVERAALFWRWTIGFNATIESVHRWGWFFSLMVMVSASVGILLTGTFVDNWYLWCVKHGAAPDYPAYLPATPDPASLPGAPK

Page 30

Page 31

Page 32

Meta-Servers

A server that provides many kinds of information about your sequence, including structure predictions and motif or domain searches. The predictions are based on several methods.

PredictProtein: http://predictprotein.org

Page 33

The PredictProtein meta-server

For sequence analysis and for structure and function prediction. When you submit any protein sequence, PredictProtein retrieves similar sequences from the database and predicts aspects of protein structure and function:

• SEG: finds low-complexity regions.

• PROSITE: database of functional motifs, i.e. biologically relevant short patterns.

• ProDom: a comprehensive set of protein domain families automatically generated from the SWISS-PROT and TrEMBL sequence databases.

• PROFsec (PHDsec): secondary structure.

• PROFacc (PHDacc): solvent accessibility.

• PHDhtm: transmembrane helices.

The sequence database is scanned for similar sequences (BLAST, PSI-BLAST). Multiple sequence alignment profiles are generated by weighted dynamic programming (MaxHom).
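SEG flags low-complexity regions by measuring how compositionally diverse short windows are. A simplified sketch of that idea computes the Shannon entropy (in bits) of the residue composition in each window. Low values mean a few residue types repeated many times. The window length of 12 and the cut-off of 2.2 bits are assumptions, loosely modelled on values often quoted for SEG. The real program also extends and trims the flagged segments, which is not reproduced here. The function names are hypothetical.

```python
import math
from collections import Counter


def shannon_entropy(window: str) -> float:
    """Shannon entropy (bits) of the residue composition of a non-empty window."""
    counts = Counter(window)
    n = len(window)
    return sum((c / n) * math.log2(n / c) for c in counts.values())


def low_complexity_windows(seq: str, window: int = 12, cutoff: float = 2.2) -> list[int]:
    """1-based start positions of windows whose compositional entropy falls below cutoff."""
    return [
        k + 1
        for k in range(len(seq) - window + 1)
        if shannon_entropy(seq[k:k + window]) < cutoff
    ]


if __name__ == "__main__":
    # Made-up sequence with a poly-Q stretch in the middle.
    print(low_complexity_windows("MSEGEDD" + "Q" * 15 + "FPRKRTPW"))
```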

Page 34

PredictProtein Demo

Let's submit it again to http://predictprotein.org/:

>uniprot|P00772|ELA1_PIG Elastase-1 precursor
MLRLLVVASLVLYGHSTQDFPETNARVVGGTEAQRNSWPSQISLQYRSGSSWAHTCGGTLIRQNWVMTAAHCVDRELTFRVVVGEHNLNQNDGTEQYVGVQKIVVHPYWNTDDVAAGYDIALLRLAQSVTLNSYVQLGVLPRAGTILANNSPCYITGWGLTRTNGQLAQTLQQAYLPTVDYAICSSSSYWGSTVKNSMVCAGGDGVRSGCQGDSGGPLHCLVNGQYAVHGVTSFVSRLGCNVTRKPTVFTRVSAYISWINNVIASN

For a list of mirror sites: http://predictprotein.org/newwebsite/doc/mirrors.html

Page 35

Page 36

Detailed results Summary view

Page 37

Results

Page 38

References

Documentation:

• COILS: http://www.ch.embnet.org/software/coils/COILS_doc.html

• TMpred: http://www.ch.embnet.org/software/tmbase/TMBASE_doc.html

• MPEx: http://blanco.biomol.uci.edu/mpex/MPEXdoc.html

Articles:

• B. Rost: Evolution teaches neural networks. In: Scientific Applications of Neural Nets. Eds. J.W. Clark, T. Lindenau, M.L. Ristig, 207-223 (1999).

• D.T. Jones: Protein Secondary Structure Prediction Based on Position-specific Scoring Matrices. J. Mol. Biol. 292, 195-202 (1999).

• B. Rost: Prediction in 1D: Secondary Structure, Membrane Helices, and Accessibility. In: Structural Bioinformatics (see Books below).

Books:

• P.E. Bourne, H. Weissig: Structural Bioinformatics. Wiley-Liss, 2003.

• A. Tramontano: Protein Structure Prediction. Wiley-VCH, 2006.

•Skip this

Page 39

Signal Peptides

A short peptide chain that directs the transport of a protein.

The peptide chain is located mostly at the N- or C-terminus.

Targets in eukaryotes: ER, nucleus, nucleolus, mitochondrion, peroxisome.

Bacteria use them to secrete proteins.

When there is no sequence homology, signal peptides can still indicate the potential location of the protein => a hint to its function.

Page 40

Prediction of signal peptides

The challenge is to distinguish a weak signal from the background noise.

Various machine learning methods are used: Hidden Markov Models (HMM), neural networks.

Most popular tool: SignalP http://www.cbs.dtu.dk/services/SignalP/

Page 41

Prediction of cellular localization

Tools that predict the cellular localization automatically:

• WoLF PSORT: http://wolfpsort.org/

• TargetP: http://www.cbs.dtu.dk/services/TargetP/

Page 42

Signal Peptide Database

http://www.signalpeptide.de/: a collection of information on known and predicted signal peptide/protein pairs.

Allows searching by sequence name and keywords; the advanced search can limit hits to a single species.

This is useful when looking for extra information on a known protein.