The Gene Ontology and Gene Ontology Annotation resources
Mélanie Courtot, Ph.D.
EMBL-EBI
GO/GOA Project leader
SPOT/UniProt content [email protected]
Industry workshop, March 17 2016
In 1999, collaboration between 3 Model Organism Databases
Ashburner et al., Nat Genet. 2000 May;25(1):25-9.
• A way to capture biological knowledge for individual gene products in a written and computable form
• A set of concepts and their relationships to each other arranged as a hierarchy
http://www.ebi.ac.uk/QuickGO
Less specific concepts
More specific concepts
The Gene Ontology
1. Molecular Function: an elemental activity, task or job
• protein kinase activity
• insulin receptor activity
2. Biological Process: a commonly recognized series of events
• cell division
3. Cellular Component: where a gene product is located
• mitochondrion
• mitochondrial matrix
• mitochondrial inner membrane
Aims of the GO project
• Develop the ontology
• Annotate gene products using ontology terms
• Provide a public resource of data and tools
Develop the ontology
• An OWL ontology of >41,000 classes
• biological process, cellular component, molecular function
• >14,000 imported classes (CL, Uberon, ChEBI, NCBI_tax)
• >136,000 logical axioms, including:
  • ~72,000 subClassOf axioms between named GO classes
  • ~41,000 simple existential restrictions (subClassOf R some C)
• EL expressivity => fast, scalable reasoning (with ELK)
https://www.cs.ox.ac.uk/isg/tools/ELK/
Building the GO
• The GO editorial team
• Submissions via GitHub, https://github.com/geneontology/
• Submissions via TermGenie, http://go.termgenie.org
• ~80% of terms are now created this way
Annotate gene products
[Figure labels: gene -> GO term; associated genes; GO database; genome and protein databases]
A GO annotation is …
…a statement that a gene product:
1. has a particular molecular function, or is involved in a particular biological process, or is located within a certain cellular component
2. as described in a particular reference
3. as determined by a particular method

Accession | Name | GO ID | GO term name | Reference | Evidence code
P00505 | GOT2 | GO:0004069 | aspartate transaminase activity | PMID:2731362 | IDA
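The six columns of the example annotation can be held in a small record type. The following is a minimal sketch, not the format used by any GO tool: the class name `GOAnnotation`, its field names, and the `describe` helper are assumptions made for illustration, while the values come from the P00505 example on the slide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GOAnnotation:
    """One GO annotation; field names are illustrative, not an official schema."""
    accession: str      # gene product identifier, e.g. a UniProt accession
    name: str           # gene product name
    go_id: str          # GO term identifier
    go_term_name: str   # human-readable GO term name
    reference: str      # where the statement is described
    evidence_code: str  # how the statement was determined


def describe(annotation: GOAnnotation) -> str:
    """Render the annotation as the three-part statement from the slide."""
    return (
        f"{annotation.name} ({annotation.accession}) is annotated to "
        f"{annotation.go_term_name} ({annotation.go_id}), "
        f"as described in {annotation.reference}, "
        f"as determined by {annotation.evidence_code}"
    )


# The example row shown on the slide.
example = GOAnnotation(
    accession="P00505",
    name="GOT2",
    go_id="GO:0004069",
    go_term_name="aspartate transaminase activity",
    reference="PMID:2731362",
    evidence_code="IDA",
)

print(describe(example))
```

Keeping the reference and evidence code as required fields mirrors the point of the slide: the annotation is incomplete without its provenance.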
Tracking provenance
• Experimental data
• Computational analysis
• Author statements/curator inference
(+ Inferred from electronic annotations)
http://www.evidenceontology.org/
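The evidence categories on this slide correspond to groups of GO evidence codes, such as the IDA code in the GOT2 example. A minimal sketch of that grouping follows, assuming only a subset of the standard codes; the dictionary `EVIDENCE_GROUPS` and the function `evidence_group` are hypothetical names, not part of any GO or Evidence Ontology tool.

```python
# Subset of GO evidence codes grouped by the categories named on the slide.
# The grouping is a simplified illustration, not a complete listing.
EVIDENCE_GROUPS = {
    "Experimental data": {"EXP", "IDA", "IPI", "IMP", "IGI", "IEP"},
    "Computational analysis": {"ISS", "ISO", "ISA", "ISM", "IGC", "RCA"},
    "Author statements/curator inference": {"TAS", "NAS", "IC"},
    "Electronic annotation": {"IEA"},
}


def evidence_group(code: str) -> str:
    """Return the category for an evidence code, or 'Unknown' if not listed."""
    for group, codes in EVIDENCE_GROUPS.items():
        if code in codes:
            return group
    return "Unknown"


print(evidence_group("IDA"))  # the code used in the GOT2 example
```

A filter built on such a lookup is one way to separate manually curated evidence from electronically inferred annotations before an analysis.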
Manual annotations
• Time-consuming process producing lower numbers of annotations (~2,800 taxa covered)
• More specific GO terms
• Manual annotation is essential for creating predictions
Aleksandra Shypitsyna
Elena Speretta
Alex Holmes
Tony Sawford
Electronic Annotations
• Quick way of producing large numbers of annotations
• Annotations use less-specific GO terms
• Only source of annotation for ~438,000 non-model organism species
orthology taxon constraints
Provide a public resource of data and tools
Number of annotations in UniProt-GOA database (March 2016)
2,752,604 manual annotations*
269,207,317 electronic annotations
* Includes manual annotations integrated from external model organism and specialist groups
http://www.ebi.ac.uk/GOA
https://www.ebi.ac.uk/QuickGO/
Enrichment analysis
Sample | Reference
40% | 20%
20% | 20%
=> The sample is over-enriched for [category shown in the figure]
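Whether a term that is more frequent in the sample than in the reference is over-represented beyond chance is commonly assessed with a one-sided hypergeometric test. The sketch below is one such approach, not necessarily the method used by the tools in this talk. The counts are assumptions chosen only to match the 40% versus 20% proportions on the slide: a reference of 100 genes with 20 annotated to the term, and a sample of 10 genes with 4 annotated. The function name `enrichment_pvalue` is hypothetical.

```python
from math import comb


def enrichment_pvalue(k: int, n: int, K: int, N: int) -> float:
    """One-sided hypergeometric p-value P(X >= k).

    N: genes in the reference set
    K: reference genes annotated to the term
    n: genes in the sample
    k: sample genes annotated to the term
    """
    total = comb(N, n)      # all ways to draw the sample from the reference
    upper = min(n, K)       # cannot observe more annotated genes than this
    favourable = sum(
        comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1)
    )
    return favourable / total


# Assumed counts: 4/10 (40%) in the sample versus 20/100 (20%) in the reference.
p = enrichment_pvalue(k=4, n=10, K=20, N=100)
print(f"p-value: {p:.4f}")
```

With a sample this small the difference in proportions alone is weak evidence, which is why real analyses use larger gene sets and correct for testing many GO terms at once.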
Spinocerebellar ataxia type 28
Paola Roncaglia
Novel biomarkers of rectal radiotherapy
Biomarker for diagnosis and prognosis
Gene expression changes in diabetes
Improved network analysis
GO slims
Many gene products are associated with a large number of descriptive, leaf GO nodes…
…however, annotations can be mapped up to a smaller set of parent GO terms.
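Mapping annotations up to a slim amounts to walking from each annotated term towards the root until a slim term is reached. The following is a minimal sketch of that idea, assuming a toy parent map built from the cellular component examples earlier in the talk; it is not the real GO graph, and `PARENTS` and `map_to_slim` are hypothetical names rather than the API of any GO tool.

```python
from collections import deque

# Toy child -> parents links; illustrative only, not the real GO graph.
PARENTS = {
    "mitochondrial matrix": ["mitochondrion"],
    "mitochondrial inner membrane": ["mitochondrion"],
    "mitochondrion": ["cellular component"],
}


def map_to_slim(term, parents, slim):
    """Return the slim terms reachable from `term` by walking up parent links."""
    seen = {term}
    queue = deque([term])
    hits = set()
    while queue:
        current = queue.popleft()
        if current in slim:
            hits.add(current)
            continue  # stop climbing once a slim term is found on this path
        for parent in parents.get(current, []):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return hits


slim = {"mitochondrion"}
for leaf in ("mitochondrial matrix", "mitochondrial inner membrane"):
    print(leaf, "->", sorted(map_to_slim(leaf, PARENTS, slim)))
```

Stopping at the first slim term on each path is a simplifying choice for this sketch; a term with several parents can still map to more than one slim term.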
Slim generation for industry
• Collaboration funded by Roche
• Need a custom GO slim for analysis of gene sets of interest
• Needs to be descriptive enough
• Without redundancy
• Internal proprietary vocabulary – hard to maintain
• Desire to automatically map to GO
http://www.swat4ls.org/wp-content/uploads/2015/10/SWAT4LS_2015_paper_44.pdf
ROCHE CV
GSEA with full GO | GSEA with Roche CV
Courtesy Laura Badi
• Mapping query: participant_OR_reg_participant some cannabinoid
• Description: “A process in which a cannabinoid participates, or that regulates a process in which a cannabinoid participates.”
Results
• We have successfully mapped 84% of terms from RCV (308/365) to OWL queries that can be used to replicate some proportion of the original manual mapping.
• In addition, these queries find thousands of terms that were missed in the original mapping.
David Osumi-Sutherland
GO SLIM (generic)
ROCHE CV – MANUAL ONLY
ROCHE CV MANUAL + AUTO
Acknowledgements
• GO editors and developers
• GO annotators
• The Gene Ontology (GO) Consortium
• Samples, Phenotype and Ontology team (Helen Parkinson)
• Protein Function Content team (Claire O’Donovan)
• Funding: EMBL-EBI, National Human Genome Research Institute (NHGRI)
Useful links
• Ontology browser: http://www.ebi.ac.uk/ols/beta/ontologies/go
• Browsing GO & annotations, GO slims: https://www.ebi.ac.uk/QuickGO/
• GO Annotation: http://www.ebi.ac.uk/GOA
• EBI-Roche collaboration paper: http://www.swat4ls.org/wp-content/uploads/2015/10/SWAT4LS_2015_paper_44.pdf
• Contact: [email protected]