samuel-hanson
Protein Function Prediction

Function categories of proteins
Proteins can be divided into 3 categories:
1- Biochemical function
2- Subcellular location
3- Cellular role

Subcellular locations
1- Cytoplasm 2- Nucleus 3- Mitochondria 4- Extracellular 5- Golgi apparatus 6- Chloroplast 7- Endoplasmic reticulum 8- Cytoskeleton 9- Vacuole 10- Peroxisome 11- Lysosome 12- Plasma membrane
Protein function prediction methods
1- Analyzing gene expression
2- Phylogenetic profiles
3- Protein fusion
4- Protein sequences
...
N- Protein-protein interaction
Protein-protein interaction
What do we mean by protein interaction? Do we mean physical contact? Not only that; the term also covers higher-level relations:
1- Inclusion in multi-protein complexes
2- Common cellular compartments
3- Same signalling pathway
4- Same metabolic pathway
5- Co-expression
6- Genetic co-regulation
7- Molecular co-evolution
Protein interaction
The complete set of protein interactions is the interactome.
It is very complicated, not only because of the large number of proteins but also because of the range of distinct types of protein interaction.
Types of protein interaction
Three levels of association:
1- Physical interaction
2- Correlated proteins
3- Co-located proteins
Physical interaction
Permanent interaction: proteins forming a stable protein complex that carries out a biomolecular role (structure-function)
- ATPase
- Ribosomal proteins
Transient interaction: proteins that come together in a certain cellular state to undertake a biomolecular function
- DNA replication complex
- Proteins involved in signal transduction
Correlated proteins
Metabolic correlation: proteins involved in the same metabolic pathway (enzymes)
- Krebs cycle enzymes
Genetic correlation: proteins encoded by co-expressed or co-regulated genes
- Proteins that regulate a phase of the cell cycle
Co-located proteins
Soluble location: proteins placed in the same soluble cellular space
- Proteins in the lysosome
- Proteins in the mitochondria
Membrane location: proteins placed in the same cellular membrane
- Receptors in the plasma membrane
- Transporters in the mitochondrial membrane
Databases of protein interaction
- Biomolecular Interaction Network Database (BIND)
- Database of Interacting Proteins (DIP)
- General Repository for Interaction Datasets (GRID)
- Molecular Interactions Database (MINT)
- Database of predicted functional associations among genes/proteins (HNB), which has 3 tools [SMART - mini PEDANT - STRING]
How can we predict the function of unannotated proteins through PPI?
[Figure: interaction network with nodes p1, p2, p3, p5, m1 and m2. Legend: annotated protein, unannotated protein, function. Function labels f1 to f9 are attached to the nodes.]
If two proteins interact, they are neighbors of each other. The functions of an unannotated protein's neighbors contain information about the unannotated protein itself.
Our objective
To assign functions to all the unannotated proteins based on the functions of the annotated proteins.

Conditions
- A protein may have several different functions (up to 8).
- We do not know which combinations of functions contribute to the interaction.
Given
Suppose that the genome has N proteins $P_1, \dots, P_N$:
- $P_1, \dots, P_n$ are unannotated proteins
- $P_{n+1}, \dots, P_{n+m}$ are annotated proteins, with $N = n + m$
$X_i = 1$ if protein $P_i$ has the function, $X_i = 0$ if it does not.
$X = (X_1, X_2, \dots, X_{n+m})$
Assumptions
Let $O_{ij}$ be the observed interaction between $P_i$ and $P_j$:
$O_{ij} = 1$ if the two proteins interact, $O_{ij} = 0$ otherwise.
$S = \{ P_i \leftrightarrow P_j : O_{ij} = 1,\ i, j = 1, \dots, N \}$
$\mathrm{Nei}(i)$: the set of proteins that interact with $P_i$
$\pi_j$: the fraction of all proteins having function $F_j$
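The definitions above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the protein and function names are toy data, the helper names `build_neighbors` and `function_fractions` are hypothetical, and $\pi_j$ is estimated over the annotated proteins only, since the functions of the unannotated ones are unknown.

```python
from collections import defaultdict

def build_neighbors(interactions):
    """Return Nei as a dict: protein -> set of proteins it interacts with."""
    nei = defaultdict(set)
    for a, b in interactions:
        if a == b:
            continue  # ignore self-interactions in this sketch
        nei[a].add(b)
        nei[b].add(a)  # interaction is symmetric
    return dict(nei)

def function_fractions(annotations):
    """Estimate pi_j as the fraction of annotated proteins having function F_j.

    Assumption: the fraction is taken over annotated proteins only.
    """
    counts = defaultdict(int)
    for funcs in annotations.values():
        for f in funcs:
            counts[f] += 1
    total = len(annotations)
    return {f: c / total for f, c in counts.items()}

# Toy data (hypothetical): m1 and m2 are unannotated, p* are annotated.
interactions = [("m1", "p1"), ("m1", "p2"), ("m1", "p3"),
                ("p2", "p3"), ("m2", "p3"), ("m2", "p5")]
annotations = {"p1": {"f1", "f2"}, "p2": {"f1"},
               "p3": {"f1", "f3"}, "p5": {"f3"}}

nei = build_neighbors(interactions)
pi = function_fractions(annotations)
print(sorted(nei["m1"]))  # neighbors of m1
print(pi)                 # estimated fraction per function
```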
Previous methods
- Neighborhood counting method
- Frequencies of its neighbors method
- Chi-square method
- Markov random field method
Neighborhood counting method
A method that predicts the function of a protein from the functions of its neighbors. All annotated functions found among the neighbors are ordered in a list by frequency:
F1 (most frequent), F2, F3, F4, ..., FN (least frequent)

Disadvantages
1- It gives no significance value.
2- It ignores the size of the functional classes.
3- It gives distant neighbors the same weight as close ones.
Frequencies of its neighbors method
The same as the previous method (neighborhood counting), but it assigns to the unannotated protein the k functions with the largest frequencies among its neighbors.

Disadvantages
1- It does not take into consideration that some proteins have the same function.
2- If a function is very common among the neighbors, the probability that the unannotated protein has that function is larger than for any other function.
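A minimal sketch of the two counting methods, assuming the neighbor sets and annotations are held in plain dictionaries. The function names and the toy data are hypothetical, and ties are broken alphabetically only to keep the output deterministic.

```python
from collections import Counter

def rank_neighbor_functions(protein, neighbors, annotations):
    """Order the functions of a protein's annotated neighbors by frequency."""
    counts = Counter()
    for q in neighbors.get(protein, ()):
        counts.update(annotations.get(q, ()))  # unannotated neighbors add nothing
    # Most frequent first; ties broken by function name (an arbitrary choice).
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))

def predict_top_k(protein, neighbors, annotations, k):
    """Assign the k functions with the largest frequencies among the neighbors."""
    ranked = rank_neighbor_functions(protein, neighbors, annotations)
    return [f for f, _ in ranked[:k]]

# Toy data (hypothetical): m1 is the unannotated protein.
neighbors = {"m1": {"p1", "p2", "p3"}}
annotations = {"p1": {"f1", "f2"}, "p2": {"f1"}, "p3": {"f1", "f3"}}

ranked = rank_neighbor_functions("m1", neighbors, annotations)
top2 = predict_top_k("m1", neighbors, annotations, k=2)
print(ranked)
print(top2)
```

Note that every neighbor contributes one count per function regardless of how common that function is in the whole genome, which is the weakness listed above.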
Chi-square method
Let $n_i(j)$ be the number of proteins that interact with protein $P_i$ and have function $F_j$ (observed frequency).
Let $\#\mathrm{Nei}(i)$ be the number of proteins in $\mathrm{Nei}(i)$.
$e_i(j) = \#\mathrm{Nei}(i) \cdot \pi_j$ (expected frequency)
$S_i(j) = [\, n_i(j) - e_i(j) \,]^2 / e_i(j)$
We take the function with the highest score.
This gives a significance value, but ...?
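The score $S_i(j)$ can be computed directly from the neighbor set and the function fractions. The sketch below assumes toy data and hypothetical names (`chi_square_scores`, `pi`); the fractions are made-up values chosen to keep the arithmetic easy to follow.

```python
def chi_square_scores(protein, neighbors, annotations, pi):
    """Return S_i(j) = (n_i(j) - e_i(j))^2 / e_i(j) for every function j."""
    nei = neighbors.get(protein, set())
    size = len(nei)  # #Nei(i)
    scores = {}
    for func, fraction in pi.items():
        # Observed frequency: neighbors annotated with this function.
        observed = sum(1 for q in nei if func in annotations.get(q, ()))
        # Expected frequency under the genome-wide fraction pi_j.
        expected = size * fraction
        if expected > 0:  # the score is undefined when nothing is expected
            scores[func] = (observed - expected) ** 2 / expected
    return scores

# Toy data (hypothetical values).
neighbors = {"m1": {"p1", "p2", "p3"}}
annotations = {"p1": {"f1", "f2"}, "p2": {"f1"}, "p3": {"f1", "f3"}}
pi = {"f1": 0.25, "f2": 0.25, "f3": 0.5}

scores = chi_square_scores("m1", neighbors, annotations, pi)
best = max(scores, key=scores.get)
print(scores)
print(best)
```

Because the deviation is squared, a function that is observed less often than expected also receives a positive score, so in practice one may want to check the sign of the deviation before assigning the top-scoring function.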
Problem of the chi-square method
All neighbors are equally weighted, although proteins far away from $P_i$ obviously contribute less information than close neighbors.
It is not clear how to choose the correct weights.
[Figure: fragment of the interaction network showing nodes m2, p5 and p2 with function labels f6, f5, f1 and f7.]
Problem
- How to assign different weights to the parameters.
- How to estimate the probabilities based on the network.
Markov random field method
Overcomes all of the above problems.
$X = (X_1, X_2, \dots, X_N)$ is the functional annotation: $X_i = 1$ if protein $P_i$ has the function, $X_i = 0$ otherwise.
1- Get the prior probability distribution of $X$ based on the interaction network (Gibbs distribution).
2- Calculate the posterior probability of the function for each unannotated protein.
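To make the two steps concrete, here is a minimal Gibbs-sampling sketch for a single function. The slides do not give the form of the Gibbs distribution, so everything specific here is an assumption: a simple pairwise potential in which the log-odds of $X_i = 1$ is `alpha + beta * (n1 - n0)`, where `n1` and `n0` count the neighbors with and without the function, and the parameters `alpha` and `beta` are fixed by hand instead of being estimated from the network.

```python
import math
import random

def gibbs_posterior(neighbors, known, unknown, alpha=0.0, beta=1.0,
                    sweeps=2000, burn_in=200, seed=0):
    """Estimate Pr(X_i = 1 | network, annotations) for each unknown protein.

    Assumed model (not taken from the slides): conditional log-odds
    of X_i = 1 equals alpha + beta * (n1 - n0).
    """
    rng = random.Random(seed)
    state = dict(known)              # annotated labels stay fixed
    for p in unknown:
        state[p] = rng.randint(0, 1)  # random starting label
    hits = {p: 0 for p in unknown}
    for sweep in range(sweeps):
        for p in unknown:
            labels = [state[q] for q in neighbors.get(p, ()) if q in state]
            n1 = sum(labels)
            n0 = len(labels) - n1
            log_odds = alpha + beta * (n1 - n0)
            p_one = 1.0 / (1.0 + math.exp(-log_odds))
            state[p] = 1 if rng.random() < p_one else 0
        if sweep >= burn_in:          # discard early samples
            for p in unknown:
                hits[p] += state[p]
    kept = sweeps - burn_in
    return {p: hits[p] / kept for p in unknown}

# Toy data (hypothetical): 1 = has the function, 0 = does not.
neighbors = {"m1": {"p1", "p2", "p3"}, "m2": {"p3", "p5"}}
known = {"p1": 1, "p2": 1, "p3": 1, "p5": 0}
posterior = gibbs_posterior(neighbors, known, ["m1", "m2"])
print(posterior)
```

In this toy network m1 has three neighbors with the function, so its estimated posterior is high, while m2 has one neighbor of each kind and ends up near one half.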
Check the accuracy
Use the leave-one-out (LOO) method:
1- For each annotated protein with at least one interaction, assume that it is unannotated and predict its functions.
2- Compare the prediction with the annotation.
3- Repeat the LOO for all proteins $P_1, \dots, P_K$.
4- Compute the sensitivity $= \sum_i k_i / \sum_i n_i$, where $k_i$ is the overlap between the sets of observed and predicted functions of protein $P_i$, and $n_i$ is the number of functions of protein $P_i$.
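The leave-one-out procedure can be sketched as follows. The predictor used here is the top-k neighbor frequency method, chosen only because it is the simplest to write down; the function names and the toy network are hypothetical.

```python
from collections import Counter

def predict_top_k(protein, neighbors, annotations, k):
    """Top-k most frequent functions among the protein's annotated neighbors."""
    counts = Counter()
    for q in neighbors.get(protein, ()):
        counts.update(annotations.get(q, ()))
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [f for f, _ in ranked[:k]]

def leave_one_out_sensitivity(neighbors, annotations, k):
    """Sensitivity = sum of overlaps k_i / sum of true function counts n_i."""
    overlap_total = 0
    function_total = 0
    for protein, true_funcs in annotations.items():
        if not neighbors.get(protein):
            continue  # only proteins with at least one interaction
        # Hide this protein's annotation, then predict it from the rest.
        hidden = {p: f for p, f in annotations.items() if p != protein}
        predicted = set(predict_top_k(protein, neighbors, hidden, k))
        overlap_total += len(predicted & true_funcs)
        function_total += len(true_funcs)
    return overlap_total / function_total if function_total else 0.0

# Toy data (hypothetical): a small symmetric network of annotated proteins.
neighbors = {"p1": {"p2", "p3"}, "p2": {"p1", "p3"},
             "p3": {"p1", "p2", "p4"}, "p4": {"p3"}}
annotations = {"p1": {"f1", "f2"}, "p2": {"f1"},
               "p3": {"f1", "f3"}, "p4": {"f3"}}

sensitivity = leave_one_out_sensitivity(neighbors, annotations, k=1)
print(sensitivity)
```

With k = 1 the predictor recovers f1 for p1, p2 and p3 and misses p4, giving 3 correct functions out of 6 annotated ones.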
Limitations and assumptions

Limitations
- The interaction network and the functional annotation of proteins are incomplete.
- The actual number of interacting proteins is much larger than what is currently known.

Assumptions
- Annotated proteins have a complete functional annotation.
Note
We can take into consideration the correlation between functions.
Example: a protein that has function A may be more likely to have function B because A and B are highly correlated.
Chi-square test
A test of the significance of the overall deviation between the observed and expected frequencies: the squared deviations are divided by the expected frequencies and summed.
$\chi^2 = \sum \{ (o - e)^2 / e \}$, where $o$ is the observed frequency and $e$ is the expected frequency.
Compare $\chi^2$ with the tabulated value. If $\chi^2$ is greater than the tabulated value, the deviation is significant.
$\chi^2$ depends on:
- the number of classes
- the degrees of freedom
$\chi^2$ ranges from 0 to $\infty$.
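A short worked example of the test, assuming a goodness-of-fit setting where the degrees of freedom equal the number of classes minus one. The observed and expected counts are made up for illustration; the critical values are the standard tabulated ones for a 5% significance level.

```python
def chi_square_statistic(observed, expected):
    """Sum of (o - e)^2 / e over all classes."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Tabulated critical values at the 5% level: degrees of freedom -> value.
CRITICAL_05 = {1: 3.841, 2: 5.991, 3: 7.815}

def is_significant(observed, expected):
    """Compare the statistic with the tabulated value for df = classes - 1."""
    df = len(observed) - 1  # assumption: simple goodness-of-fit test
    return chi_square_statistic(observed, expected) > CRITICAL_05[df]

# Made-up counts for three classes.
observed = [30, 50, 20]
expected = [25, 50, 25]

statistic = chi_square_statistic(observed, expected)
significant = is_significant(observed, expected)
print(statistic, significant)
```

Here the statistic is 25/25 + 0/50 + 25/25 = 2.0, which is below the tabulated value 5.991 for 2 degrees of freedom, so the deviation is not significant at the 5% level.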