Protein Function Prediction

Function categories of proteins

Proteins can be divided into 3 categories:

1- Biochemical functions.
2- Subcellular locations.
3- Cell role.

Subcellular locations

1- Cytoplasm
2- Nucleus
3- Mitochondria
4- Extracellular
5- Golgi apparatus
6- Chloroplast
7- Endoplasmic reticulum
8- Cytoskeleton
9- Vacuole
10- Peroxisome
11- Lysosome
12- Plasma membrane

Protein function prediction methods

1- Analyzing gene expression
2- Phylogenetic profiles
3- Protein fusion
4- Protein sequences
...
N- Protein-protein interaction

Protein-protein interaction

What do we mean by protein interaction? Do we mean physical contact? Not only that, but also higher-level relations:

1- Inclusion in multi-protein complexes
2- Common cellular compartments
3- Same signalling pathway
4- Same metabolic pathway
5- Co-expression
6- Genetic co-regulation
7- Molecular co-evolution

Protein interaction

The complete set of protein interactions is the interactome.

It is very complicated, not only because of the large number of proteins but also because of the range of distinct types of protein interaction.

Types of protein interaction

Three levels of association:

1- Physical interaction
2- Correlated proteins
3- Co-located proteins

Physical interaction

Permanent interaction: proteins forming a stable protein complex that carries out a biomolecular role (structure-function).
- ATPase
- Ribosomal proteins

Transient interaction: proteins that come together in a certain cellular state to undertake biomolecular functions.
- DNA replication complex
- Proteins involved in signal transduction

Correlated proteins

Metabolic correlation: proteins involved in the same metabolic pathway (enzymes).
- Krebs cycle enzymes

Genetic correlation: proteins encoded by co-expressed or co-regulated genes.
- Proteins that regulate a phase of the cell cycle

Co-located proteins

Soluble location: proteins placed in the same cellular soluble space.
- Proteins in the lysosome
- Proteins in the mitochondria

Membrane location: proteins placed in the same cellular membrane.
- Receptors in the plasma membrane
- Transporters in the mitochondrial membrane

Databases of protein interaction

- Biomolecular Interaction Network Database (BIND)
- Database of Interacting Proteins (DIP)
- General Repository for Interaction Datasets (GRID)
- Molecular Interactions Database (MINT)
- Database of predicted functional associations among genes/proteins (HNB), which has 3 tools: SMART, mini PEDANT, STRING

How can we predict the function of unannotated proteins through PPI?

[Figure: example interaction network with protein nodes p1, p2, p3, p5, m1, m2 and function labels f1 to f9. Legend: annotated protein, unannotated protein, function.]

If two proteins interact, they are neighbors of each other. The functions of an unannotated protein's neighbors contain information about the unannotated protein.

Our objective

To assign functions to all the unannotated proteins based on the functions of the annotated proteins.

Conditions

- A protein may have several different functions (up to 8 functions).
- We do not know which combinations of the functions contribute to the interaction.

Given

Suppose that the genome has N proteins $P_1, \dots, P_N$.

$P_1, \dots, P_n$: unannotated proteins
$P_{n+1}, \dots, P_{n+m}$: annotated proteins, with $N = n + m$

$X_i = 1$ if protein $P_i$ has the function
$X_i = 0$ if protein $P_i$ does not have the function

$X = (X_1, X_2, \dots, X_{n+m})$

Assumptions

Let $O_{ij}$ be the observed interaction between $P_i$ and $P_j$.

$O_{ij} = 1$ if the proteins interact
$O_{ij} = 0$ otherwise

$S = \{ P_i \leftrightarrow P_j : O_{ij} = 1,\ i, j = 1, \dots, N \}$
$Nei(i)$: the set of proteins that interact with $P_i$
$\pi_j$: the fraction of all proteins having function $F_j$
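The definitions above can be made concrete with a small sketch. It assumes the interaction set S is given as a list of protein pairs and the annotations as a dictionary; the protein names, the toy data, and the helper names `build_neighbors` and `function_fractions` are all hypothetical. It also assumes that $\pi_j$ is estimated from the annotated proteins only, since the functions of the others are unknown.

```python
from collections import defaultdict

# Hypothetical toy data: each pair is an observed interaction (O_ij = 1).
interactions = [("P1", "P3"), ("P1", "P4"), ("P2", "P4"), ("P3", "P4")]
# Hypothetical annotations; P1 and P2 are unannotated, so they are absent.
annotations = {"P3": {"F1", "F2"}, "P4": {"F1"}}


def build_neighbors(pairs):
    """Return Nei(i) for every protein: the set of proteins it interacts with."""
    nei = defaultdict(set)
    for a, b in pairs:
        nei[a].add(b)
        nei[b].add(a)
    return dict(nei)


def function_fractions(annotations, functions):
    """Estimate pi_j as the fraction of annotated proteins having function F_j.

    Assumption: only annotated proteins are counted in the denominator.
    """
    total = len(annotations)
    return {
        f: sum(1 for fs in annotations.values() if f in fs) / total
        for f in functions
    }


neighbors = build_neighbors(interactions)
pi = function_fractions(annotations, ["F1", "F2"])
```

With this toy data, `neighbors["P1"]` is `{"P3", "P4"}` and `pi` is `{"F1": 1.0, "F2": 0.5}`.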

Previous methods

- Neighborhood counting method
- Frequencies of its neighbors method
- Chi square method
- Markov random field method

Neighborhood counting method

A method that predicts the function of a protein based on the functions of its neighbors. All annotated functions of the neighbors are ordered in a list, from most frequent to least frequent:

F1 (most frequent), F2, F3, F4, ..., FN (least frequent)

Disadvantages

1- No significance value.
2- Ignores the size of the functional classes.
3- Gives equal weights to distant neighbors.
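The ranking step can be sketched as follows. This is a minimal illustration, assuming neighbors and annotations are stored as dictionaries of sets; the function name `neighborhood_counting` and the toy proteins are hypothetical, and breaking ties alphabetically is an arbitrary choice made here so the output is deterministic.

```python
from collections import Counter


def neighborhood_counting(protein, neighbors, annotations):
    """Rank functions by how often they occur among the protein's neighbors."""
    counts = Counter()
    for nb in neighbors.get(protein, ()):
        # Unannotated neighbors contribute nothing to the counts.
        counts.update(annotations.get(nb, ()))
    # Most frequent first; ties broken by function name (an assumption).
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))


# Hypothetical toy network: P1 is unannotated and has three annotated neighbors.
neighbors = {"P1": {"P3", "P4", "P5"}}
annotations = {"P3": {"F1", "F2"}, "P4": {"F1"}, "P5": {"F1", "F3"}}

ranked = neighborhood_counting("P1", neighbors, annotations)
```

Here `ranked` is `[("F1", 3), ("F2", 1), ("F3", 1)]`. Taking the first k entries of the list gives the k most frequent functions. The sketch also makes the disadvantages visible: the counts carry no significance value, and a function that is common across the whole genome ranks high regardless of the protein.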

Frequencies of its neighbors method

The same as the previous method (neighborhood counting), but it assigns k functions to the unannotated protein: the k functions with the largest frequencies among its neighbors.

Disadvantages

1- It does not take into consideration that some proteins have the same function.
2- If a function is very common among the neighbors, the probability that the unannotated protein is assigned that function is larger than for any other function.

Chi square method

Let $n_i(j)$ be the number of proteins that interact with protein $i$ and have function $j$ (observed frequency).

$\#Nei(i)$: the number of proteins in $Nei(i)$
$e_i(j) = \#Nei(i) \cdot \pi_j$ (expected frequency)

$S_i(j) = \frac{[\, n_i(j) - e_i(j) \,]^2}{e_i(j)}$

We take the function with the highest score.

This gives a significance value, but …?
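A minimal sketch of the score $S_i(j)$, assuming the same dictionary-of-sets layout for neighbors and annotations and a dictionary `pi` holding the fractions $\pi_j$. The function name `chi_square_scores` and the toy values are hypothetical. Functions with an expected frequency of zero are skipped to avoid division by zero, which is a choice made for this sketch.

```python
def chi_square_scores(protein, neighbors, annotations, pi):
    """Return S_i(j) = (n_i(j) - e_i(j))^2 / e_i(j) for every function j."""
    nei = neighbors.get(protein, set())
    size = len(nei)  # #Nei(i)
    scores = {}
    for function, fraction in pi.items():
        # n_i(j): neighbors of the protein that carry this function.
        observed = sum(1 for nb in nei if function in annotations.get(nb, ()))
        # e_i(j) = #Nei(i) * pi_j
        expected = size * fraction
        if expected > 0:
            scores[function] = (observed - expected) ** 2 / expected
    return scores


# Hypothetical toy data: P1 has four neighbors.
neighbors = {"P1": {"P3", "P4", "P5", "P6"}}
annotations = {"P3": {"F1"}, "P4": {"F1"}, "P5": {"F1"}, "P6": {"F1", "F2"}}
pi = {"F1": 0.5, "F2": 0.25}

scores = chi_square_scores("P1", neighbors, annotations, pi)
best = max(scores, key=scores.get)
```

For F1 the observed frequency is 4 against an expected 2, giving a score of 2.0; for F2 observed and expected are both 1, giving 0.0, so `best` is `"F1"`. Because the deviation is squared, the score also grows when a function is under-represented among the neighbors, which a real implementation would need to handle.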


Problem of the chi square method

Neighbors are equally weighted.

Proteins far away from $P_i$ clearly contribute less information than its close neighbors.

It is not clear how to choose the correct weights.

[Figure: fragment of the example network showing proteins m2, p5, p2 and functions f1, f5, f6, f7.]

Problem

The problems are:

- How to assign different weights to the parameters.
- How to estimate the probabilities based on the network.

Markov random field method

Overcomes all of the above problems.

$X = (X_1, X_2, \dots, X_N)$: functional annotation
$X_i = 1$ if protein $P_i$ has the function
$X_i = 0$ otherwise

Get the prior probability distribution of $X$ based on the interaction network (Gibbs distribution).

Calculate the posterior probability of the function for each unannotated protein.

[Figure labels: Pr (g), ps, ob]

Check the accuracy

Use the leave-one-out (LOO) method:

- For each annotated protein with at least one interaction, treat it as an unannotated protein and predict its function.
- Compare the prediction with the annotation.
- Repeat the LOO procedure for every such protein.
- Compute the sensitivity $= \sum K_i / \sum n_i$, where $K_i$ is the overlap between the set of observed functions and the set of predicted functions of protein $P_i$, and $n_i$ is the number of functions of protein $P_i$.
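The leave-one-out check can be sketched as below. The slides do not tie the procedure to one predictor, so this sketch plugs in a simple top-k neighbor-frequency predictor as a stand-in; `predict_top_k`, `leave_one_out_sensitivity`, the alphabetical tie-breaking, and the three-protein toy network are all assumptions made for illustration.

```python
from collections import Counter


def predict_top_k(protein, neighbors, annotations, k):
    """Stand-in predictor: the k most frequent functions among the neighbors."""
    counts = Counter()
    for nb in neighbors.get(protein, ()):
        counts.update(annotations.get(nb, ()))
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return {function for function, _ in ranked[:k]}


def leave_one_out_sensitivity(neighbors, annotations, k):
    """Sensitivity = sum(K_i) / sum(n_i) over annotated proteins with >= 1 interaction."""
    overlap_total = 0   # sum of K_i
    function_total = 0  # sum of n_i
    for protein, true_functions in annotations.items():
        if not neighbors.get(protein):
            continue  # skip proteins without any interaction
        # Hide this protein's annotation, as if it were unannotated.
        hidden = {p: fs for p, fs in annotations.items() if p != protein}
        predicted = predict_top_k(protein, neighbors, hidden, k)
        overlap_total += len(predicted & true_functions)
        function_total += len(true_functions)
    return overlap_total / function_total if function_total else 0.0


# Hypothetical toy network of three mutually interacting annotated proteins.
neighbors = {"A": {"B", "C"}, "B": {"A", "C"}, "C": {"A", "B"}}
annotations = {"A": {"F1"}, "B": {"F1", "F2"}, "C": {"F2"}}

sens_k1 = leave_one_out_sensitivity(neighbors, annotations, k=1)
sens_k2 = leave_one_out_sensitivity(neighbors, annotations, k=2)
```

On this toy network the sensitivity is 0.25 for k = 1 and 1.0 for k = 2. Sensitivity alone rewards predicting many functions, so in practice it would be read together with a measure of how many predicted functions are wrong.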

Limitations and assumptions

Limitations

- The interaction network and the functional annotation of proteins are incomplete.
- The actual number of protein interactions is much larger than what we currently have.

Assumptions

- Annotated proteins have complete functional annotation.

Note

We can take into consideration the correlation between the functions.

Example: a protein that has function A may have an increased chance of having function B, because A and B are highly correlated.

Chi square test

A test of the significance of the overall deviation between the observed and expected frequencies: the squared deviations are divided by the expected frequencies and summed.

$\chi^2 = \sum \frac{(o - e)^2}{e}$, where $o$ is the observed frequency and $e$ is the expected frequency.

Compare $\chi^2$ with the tabulated value. If $\chi^2$ is greater than the tabulated value, the result is significant.

$\chi^2$ depends on:
- the number of classes
- the degrees of freedom

$\chi^2$ ranges from 0 to $\infty$.
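A small worked example of the test, written as a sketch. It assumes a significance level of 0.05 and takes the degrees of freedom as the number of classes minus one, which holds for a simple goodness-of-fit test with no estimated parameters. The names `CRITICAL_0_05`, `chi_square_statistic`, and `is_significant`, and the sample counts, are hypothetical; the critical values are the standard tabulated ones for 1 to 3 degrees of freedom.

```python
# Tabulated chi-square critical values at the 0.05 level, keyed by degrees of freedom.
CRITICAL_0_05 = {1: 3.841, 2: 5.991, 3: 7.815}


def chi_square_statistic(observed, expected):
    """Sum of (o - e)^2 / e over all classes."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def is_significant(observed, expected):
    """Compare the statistic with the tabulated value for (classes - 1) degrees of freedom."""
    degrees_of_freedom = len(observed) - 1
    return chi_square_statistic(observed, expected) > CRITICAL_0_05[degrees_of_freedom]


# Two classes, 50 observations, expected to split evenly.
mild = chi_square_statistic([30, 20], [25, 25])
strong = chi_square_statistic([40, 10], [25, 25])
```

For the counts (30, 20) the statistic is 2.0, below the tabulated 3.841, so the deviation is not significant. For (40, 10) the statistic is 18.0, so the deviation is significant.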


Thanks