Course on Functional Analysis
::: Introduction to Functional Analysis
Daniel Rico, PhD. [email protected]
Bioinformatics Unit, CNIO
::: Schedule.
1. Biological (Functional) Databases
2. Threshold-based and threshold-free methods
3. Threshold-based example: FatiGO.
4. Threshold-free example 1: FatiScan.
ACKNOWLEDGEMENTS
Many of these slides have been taken and adapted from original slides by Fatima Al-Shahrour, from Joaquin Dopazo’s group (Babelomics team).
We are grateful for the material and for the great tools they have developed!
Biological databases
Organisms: Arabidopsis thaliana, Homo sapiens, Mus musculus, Rattus norvegicus, Drosophila melanogaster, Caenorhabditis elegans, Saccharomyces cerevisiae, Gallus gallus, Danio rerio
Gene IDs: HGNC symbol, EMBL acc, RefSeq, PDB, Protein Id, IPI…., UniProt/Swiss-Prot, UniProtKB/TrEMBL, Ensembl IDs, EntrezGene, Affymetrix, Agilent
Gene Ontology: Biological Process, Molecular Function, Cellular Component
KEGG pathways
Biocarta pathways
Regulatory elements: miRNA, CisRed, Transcription Factor Binding Sites
InterPro Motifs
Bioentities from literature: Disease terms, Chemical terms
Gene Expression in tissues
Keywords Swissprot
Gene Ontology CONSORTIUM
http://www.geneontology.org
• The objective of GO is to provide controlled vocabularies for the description of the molecular function, biological process and cellular component of gene products.
• These terms are to be used as attributes of gene products by collaborating databases, facilitating uniform queries across them.
• The controlled vocabularies of terms are structured
GO structure
The three categories of GO
Molecular Function
the tasks performed by individual gene products; examples are transcription factor and DNA helicase
Biological Process
broad biological goals, such as mitosis or purine metabolism, that are accomplished by ordered assemblies of molecular functions
Cellular Component
subcellular structures, locations, and macromolecular complexes; examples include nucleus, telomere, and origin recognition complex
GO tree structure
IS_A relation
PART_OF relation
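The tree structure can be sketched in a few lines. This is a minimal illustration, assuming a toy graph with simplified edges (not copied from the real ontology) and a hypothetical gene name: a gene annotated to a term is implicitly annotated to every term reachable through IS_A or PART_OF.

```python
# Toy GO-like graph: child term -> list of (relation, parent term).
# Edges are simplified for illustration and are NOT the real ontology.
PARENTS = {
    "DNA helicase activity": [("is_a", "helicase activity")],
    "helicase activity": [("is_a", "catalytic activity")],
    "catalytic activity": [("is_a", "molecular_function")],
    "nucleolus": [("part_of", "nucleus")],
    "nucleus": [("is_a", "organelle")],
    "organelle": [("is_a", "cellular_component")],
}


def ancestors(term, parents=PARENTS):
    """Return every term reachable from `term` through is_a / part_of edges."""
    found = set()
    stack = [term]
    while stack:
        current = stack.pop()
        for _relation, parent in parents.get(current, []):
            if parent not in found:
                found.add(parent)
                stack.append(parent)
    return found


def propagate(annotations, parents=PARENTS):
    """Extend each gene's direct annotations with all ancestor terms."""
    full = {}
    for gene, terms in annotations.items():
        extended = set(terms)
        for term in terms:
            extended |= ancestors(term, parents)
        full[gene] = extended
    return full


# "geneX" is a hypothetical identifier.
print(propagate({"geneX": {"nucleolus"}}))
```

Propagating annotations this way is what lets a test at a general level of the tree count genes that were only annotated at a more specific level.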
http://www.genome.ad.jp/kegg/pathway.html
http://www.biocarta.com/genes/index.asp
http://www.reactome.org/
http://www.pathwaycommons.org
http://www.whichgenes.org/
http://www.cisred.org/
::: Schedule.
1. Biological (Functional) Databases
2. Threshold-based and threshold-free methods
3. Threshold-based example: FatiGO.
4. Threshold-free example 1: FatiScan.
The two-step approach
• Genes of interest are selected using the experimental value.
• Selected genes are compared to the background.
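The first step can be sketched as follows. This is only an illustration, assuming each gene already has an adjusted p-value from the class comparison; the gene names and values are hypothetical.

```python
def split_by_threshold(adjusted_p, cutoff=0.05):
    """Step 1: genes below the cut-off are selected; the rest form the background."""
    selected = sorted(g for g, p in adjusted_p.items() if p < cutoff)
    chosen = set(selected)
    background = sorted(g for g in adjusted_p if g not in chosen)
    return selected, background


# Hypothetical adjusted p-values from a Class1 vs Class2 comparison.
example = {"geneA": 0.001, "geneB": 0.20, "geneC": 0.03, "geneD": 0.75}
selected, background = split_by_threshold(example)
print(selected)    # genes of interest
print(background)  # genes used as background in step 2
```

Step 2 then compares the functional terms of the selected genes against those of the background.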
Threshold-based functional analysis
Study the enrichment in functional terms in groups of genes defined by the experimental value.
FatiGO
GoMiner
DAVID
Marmite
Threshold-free functional analysis
Select genes taking into account their functional properties.
FatiScan
GSEA
MarmiteScan
• Under a systems biology perspective.
• Detect blocks of functionally related genes.
Threshold-based functional analysis
[Figure: Class1 vs Class2, t-test cut-off at FDR<0.05 (marked twice); question posed: Biological meaning?]
Threshold-free functional analysis
[Figure: Class1 vs Class2 ranking (t-test), scored with the ES/NES statistic (- to +) for Gene Set 1, Gene Set 2 and Gene Set 3. Gene set 2 enriched in Class 1; gene set 3 enriched in Class 2.]
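The idea behind an ES-type statistic can be sketched with an unweighted running sum over the ranked gene list. This is a simplified illustration, not the GSEA or FatiScan implementation: genes are not weighted by their statistic, the sign convention (positive means top of the ranking) is an assumption, and the NES normalisation is not shown. Gene names are hypothetical.

```python
def enrichment_score(ranked_genes, gene_set):
    """Unweighted running-sum score: walk down the ranking, step up on genes
    in the set and down on the others, and keep the largest deviation from 0."""
    hits = [g in gene_set for g in ranked_genes]
    n_hit = sum(hits)
    n_miss = len(hits) - n_hit
    if n_hit == 0 or n_miss == 0:
        raise ValueError("gene set must cover some, but not all, ranked genes")
    running = 0.0
    best = 0.0
    for is_hit in hits:
        running += 1.0 / n_hit if is_hit else -1.0 / n_miss
        if abs(running) > abs(best):
            best = running
    return best


# Hypothetical ranking, from most over-expressed in one class to the other.
ranking = ["g1", "g2", "g3", "g4", "g5", "g6", "g7", "g8"]
print(enrichment_score(ranking, {"g1", "g2"}))  # set concentrated at the top
print(enrichment_score(ranking, {"g7", "g8"}))  # set concentrated at the bottom
```

No gene is discarded by a cut-off here: every gene in the ranking contributes to the score of each gene set.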
::: Schedule.
1. Biological (Functional) Databases
2. Threshold-based and threshold-free methods
3. Threshold-based example: FatiGO.
4. Threshold-free example 1: FatiScan.
http://babelomics.bioinfo.cipf.es/
::: How functional profiling should never be done
It is not uncommon to find the following assertion in papers and talks: “then we examined our set of genes selected in this way (whatever) and we discovered that 65% of them were related to metabolism, so we can conclude that our experiment activates metabolism genes”.
Annotation is not a functional result!!!
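A small sketch of why the 65% figure means nothing on its own: it has to be compared with the proportion in the background. The background counts below are hypothetical and chosen only to illustrate the point.

```python
def enrichment_ratio(hits_in_list, list_size, hits_in_background, background_size):
    """Proportion of annotated genes in the list divided by the proportion
    in the background. A ratio near 1 means no enrichment at all."""
    return (hits_in_list / list_size) / (hits_in_background / background_size)


# 65 of 100 selected genes are "metabolism" genes (the 65% from the quote).
# Hypothetical background A: 6500 of 10000 genes are also "metabolism".
print(enrichment_ratio(65, 100, 6500, 10000))
# Hypothetical background B: only 2000 of 10000 genes are "metabolism".
print(enrichment_ratio(65, 100, 2000, 10000))
```

With background A the list simply mirrors the genome; only something like background B would suggest an enrichment, and even then it still has to be tested statistically.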
::: Exercise 1: FatiGO SEARCH
1. Select “FatiGO Search” and “H. sapiens”.
2. Upload FatiGO_example.txt file
3. Select “KEGG pathways” and click “Run”
FatiGO-Search annotations
Testing the distribution of GO terms among two groups of genes
(remember, we have to test hundreds of GOs)

Are these two groups of genes carrying out different biological roles?

                 Group A   Group B
Biosynthesis       60%       20%
Sporulation        20%       20%

Genes in group A are significantly associated with biosynthesis, but not with sporulation.

                   A    B
Biosynthesis       6    2
No biosynthesis    4    8
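The comparison of the two groups can be sketched with a two-sided Fisher's exact test in plain Python. The counts are an assumption read from the table above (A: 6 with biosynthesis and 4 without; B: 2 with and 8 without, ten genes per group, which matches the 60% and 20%). The function name is illustrative and is not part of FatiGO.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Rows: term / no term. Columns: group A / group B.
    Sums the hypergeometric probabilities of all tables that are no more
    likely than the observed one (margins held fixed).
    """
    row1, col1, total = a + b, a + c, a + b + c + d

    def prob(x):
        return comb(col1, x) * comb(total - col1, row1 - x) / comb(total, row1)

    low = max(0, row1 - (total - col1))
    high = min(row1, col1)
    observed = prob(a)
    p_value = sum(
        p for p in (prob(x) for x in range(low, high + 1))
        if p <= observed * (1 + 1e-9)  # small tolerance for float ties
    )
    return min(1.0, p_value)


# Assumed counts: biosynthesis 6 vs 2, no biosynthesis 4 vs 8.
print(fisher_exact_two_sided(6, 2, 4, 8))
# Sporulation: 2 of 10 in each group.
print(fisher_exact_two_sided(2, 2, 8, 8))
```

With only ten genes per group the biosynthesis table gives a two-sided p-value of about 0.17, so the toy counts are best read as an illustration of the comparison; real gene lists are much larger, and the p-values still need correcting for the hundreds of terms tested.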
Using FatiGO
List1: genes of interest (they are significantly over- or under-expressed when two classes of experiments are compared, co-located in the chromosomes, etc.)
List2: the background (typically the rest of the genes).
Select a suitable database, Run...
Comparing groups of genes
• Remove genes repeated in list1
• Remove genes repeated in list2
• Remove genes repeated between both lists
• Result: “clean” List1 and “clean” List2
• Extract functional terms
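The list cleaning can be sketched as below. This is a minimal illustration, assuming that genes shared by both lists are dropped from both of them; the tool may offer other options. Gene names are hypothetical.

```python
def clean_lists(list1, list2):
    """Remove duplicates inside each list, then genes present in both lists."""
    unique1 = list(dict.fromkeys(list1))  # keeps first occurrence, preserves order
    unique2 = list(dict.fromkeys(list2))
    shared = set(unique1) & set(unique2)
    clean1 = [g for g in unique1 if g not in shared]
    clean2 = [g for g in unique2 if g not in shared]
    return clean1, clean2


# Hypothetical gene identifiers.
clean1, clean2 = clean_lists(
    ["geneA", "geneB", "geneA", "geneC"],
    ["geneC", "geneD", "geneD", "geneE"],
)
print(clean1)  # "clean" List1
print(clean2)  # "clean" List2
```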
BABELOMICS databases: GO, KEGG, Interpro, KW, Bioentities, Gene Expression, TF, Cisred
[Figure: matrix of functional terms, shown as a binary (0/1) matrix]
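A sketch of how such a matrix could be built and turned into the counts needed for the test. It assumes rows are genes and columns are terms, which the slide does not state explicitly; gene names and annotations are hypothetical.

```python
def term_matrix(genes, terms, annotations):
    """Binary matrix: one row per gene, one column per term (1 = annotated)."""
    return [
        [1 if term in annotations.get(gene, set()) else 0 for term in terms]
        for gene in genes
    ]


def contingency_counts(list1, list2, annotations, term):
    """Counts for one term: (list1 with, list2 with, list1 without, list2 without)."""
    with1 = sum(1 for g in list1 if term in annotations.get(g, set()))
    with2 = sum(1 for g in list2 if term in annotations.get(g, set()))
    return with1, with2, len(list1) - with1, len(list2) - with2


# Hypothetical annotations.
annotations = {
    "g1": {"apoptosis"},
    "g2": {"apoptosis", "cell cycle"},
    "g3": {"cell cycle"},
    "g4": set(),
}
print(term_matrix(["g1", "g2", "g3", "g4"], ["apoptosis", "cell cycle"], annotations))
print(contingency_counts(["g1", "g2"], ["g3", "g4"], annotations, "apoptosis"))
```

Each term yields one 2x2 table of this kind, so a database with hundreds of terms yields hundreds of tests.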
Fisher's test
Adjust p-value by FDR
[Figure: List 1 and List 2 (background) taken from the Class1 vs Class2 comparison, t-test cut-off at FDR<0.05 (marked twice); alternative pair List 1b / List 2b]
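The FDR adjustment can be sketched as follows, assuming the Benjamini-Hochberg procedure, which is one common FDR method; the slides do not say which procedure FatiGO applies. The p-values are hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values, in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest so that adjusted
    # values never increase as the raw p-value decreases.
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, pvalues[index] * m / rank)
        adjusted[index] = running_min
    return adjusted


# Hypothetical p-values, one per functional term tested.
raw = [0.01, 0.04, 0.03, 0.005]
for p, q in zip(raw, benjamini_hochberg(raw)):
    print(p, round(q, 4), "significant" if q < 0.05 else "not significant")
```

Only terms whose adjusted p-value stays below the chosen level (0.05 in the slides) are reported as significant.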
::: Exercise 2: FatiGO COMPARE
1. Select “FatiGO Compare” and “H. sapiens”.
2. Upload FatiGO_example.txt file
3. Select “Rest of Genome” as background.
4. Select “KEGG pathways” and click “Run”
Only “Apoptosis” is significant