Relations in GO for 2009
Intro
• We have many relations ready to GO live in the scratch directory
  – within GO ontologies
  – across GO ontologies
  – between GO and external ontologies
  – both cross product (N+S conditions) and regular links
• Requires a fundamental change in how we and our users think about GO and annotations
  – Tools that make use of these will better serve users
Relations in GO
• In the beginning there was is_a and part_of
  – Benefits: simplicity
    • We could effectively ignore relations
    • Most tools and users effectively do this
      – Speculation: recent introduction of regulates had no effect on majority of users
  – Drawbacks: lack of expressivity
    • We need more relations
      – Regulation
      – Spatial relations
      – has_part for Process-Function
      – annotations
Example of a relation rule in GO
• Rule:
  – A is_a B, B is_a C → A is_a C
• Example:
• We can generalize this by having a rule for transitive relations
  – transitive r, A r B, B r C → A r C
• We can also write this as a composition rule:
  – is_a . is_a → is_a
  – Open question:
    • does this notation help or hinder?
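A minimal sketch of the generalized rule (transitive r, A r B, B r C → A r C), written in Python. The function name and the term names A, B, C are placeholders for illustration, not part of GO or of any GO tool.

```python
def transitive_closure(links):
    """Return every (A, C) pair implied by repeatedly applying
    A r B, B r C -> A r C to a set of (subject, object) links."""
    closure = set(links)
    changed = True
    while changed:
        changed = False
        for a, b in list(closure):
            for c, d in list(closure):
                if b == c and (a, d) not in closure:
                    closure.add((a, d))
                    changed = True
    return closure


# Hypothetical is_a links, for illustration only.
is_a = {("A", "B"), ("B", "C")}
inferred = transitive_closure(is_a) - is_a
print(inferred)  # {('A', 'C')}
```

The same function applies to any relation declared transitive; it should not be applied to relations that are not.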
Transitivity
• We currently have two transitive relations in GO:
  – is_a . is_a → is_a
  – part_of . part_of → part_of
• Example:
  – mitotic prophase part_of mitosis
  – In GO, part_of is an all-some relation
• regulates is not defined to be transitive in GO
  • (but the majority of tools still treat it as if it were!)
• Example:
Composition with is_a
• Any relation that follows the all-some pattern composes with is_a to itself
• Example:
  – (all) nucleus part_of (some) cell
• Composition:
  – is_a . R → R
  – R . is_a → R
• Example:
  – (all) mitotic prophase part_of (some) mitosis
  – mitosis is_a cell cycle phase
  → (all) mitotic prophase part_of (some) cell cycle phase
Composition Table
row \ col   is_a      part_of
is_a        is_a      part_of
part_of     part_of   part_of
Read row first, then column (so far the table is symmetric)
Composition Table
mitotic prophase part_of mitosis is_a cell cycle phase
→ (all) mitotic prophase part_of (some) cell cycle phase
Chained compositions
A part_of B is_a C is_a D part_of E
A part_of B is_a D part_of E
A part_of D part_of E
A part_of E
order of reduction does not matter
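The table lookup and the chained reduction can be sketched in Python as below. The dictionary holds only the four is_a / part_of cells from the composition table; the function names are made up for this sketch. Reducing the chain from the left and from the right gives the same relation for this example, which is what "order of reduction does not matter" says.

```python
from functools import reduce

# TABLE[(r1, r2)] is the result of r1 . r2 (read row first, then column).
TABLE = {
    ("is_a", "is_a"): "is_a",
    ("is_a", "part_of"): "part_of",
    ("part_of", "is_a"): "part_of",
    ("part_of", "part_of"): "part_of",
}


def compose(r1, r2):
    """Look up the composition r1 . r2 in the table."""
    return TABLE[(r1, r2)]


def reduce_left(chain):
    """((r1 . r2) . r3) . ... : collapse the chain from the left."""
    return reduce(compose, chain)


def reduce_right(chain):
    """r1 . (r2 . (r3 . ...)) : collapse the chain from the right."""
    return reduce(lambda acc, r: compose(r, acc), reversed(chain))


# A part_of B is_a C is_a D part_of E
chain = ["part_of", "is_a", "is_a", "part_of"]
print(reduce_left(chain))   # part_of
print(reduce_right(chain))  # part_of
```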
regulates transitive_over part_of
• regulates . part_of → regulates
(figure label: inferred link)
(all) RoSPoMCC regulates (some) MCC
Composition Table: Regulates
row \ col   is_a        part_of     regulates
is_a        is_a        part_of     regulates
part_of     part_of     part_of     -
regulates   regulates   regulates   -
regulates . part_of → regulates
Composition Table: Regulates
row \ col   is_a        part_of     regulates
is_a        is_a        part_of     regulates
part_of     part_of     part_of     N/A
regulates   regulates   regulates   -
part_of . regulates → N/A
row \ col   is_a        part_of     regulates
is_a        is_a        part_of     regulates
part_of     part_of     part_of     -
regulates   regulates   regulates   indirectly regulates
• We have the option of defining additional relations
• These may be entirely implicit (i.e. we would never assert indirectly regulates in GO)
regulates . regulates → indirectly regulates
row \ col              is_a                   part_of                regulates              indirectly regulates
is_a                   is_a                   part_of                regulates              indirectly regulates
part_of                part_of                part_of                -                      -
regulates              regulates              regulates              indirectly regulates   indirectly regulates
indirectly regulates   indirectly regulates   indirectly regulates   indirectly regulates   indirectly regulates
Regulates is not transitive
Indirectly regulates is transitive
row \ col              is_a   part_of   regulates   indirectly regulates
is_a                   I      P         R           ~R
part_of                P      P         -           -
regulates              R      R         ~R          ~R
indirectly regulates   ~R     ~R        ~R          ~R
USE SYMBOLS? OR IS THIS GETTING TOO ABSTRACT?
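For tool builders, the symbol table translates fairly directly into a lookup structure. The Python sketch below copies the sixteen cells above, with None standing in for the "-" cells (no inference). The function name and the choice to reduce strictly left to right are assumptions of this sketch; the slides do not say how chains that pass through a "-" cell should be handled.

```python
I, P, R, IR = "is_a", "part_of", "regulates", "indirectly regulates"

# TABLE[(row, column)] is row . column; None marks a "-" cell.
TABLE = {
    (I, I): I,    (I, P): P,    (I, R): R,     (I, IR): IR,
    (P, I): P,    (P, P): P,    (P, R): None,  (P, IR): None,
    (R, I): R,    (R, P): R,    (R, R): IR,    (R, IR): IR,
    (IR, I): IR,  (IR, P): IR,  (IR, R): IR,   (IR, IR): IR,
}


def compose_chain(relations):
    """Collapse a chain of relations left to right.

    Returns None as soon as the chain hits a "-" cell,
    meaning no link is inferred for the whole chain."""
    result = relations[0]
    for rel in relations[1:]:
        if result is None:
            return None
        result = TABLE[(result, rel)]
    return result


print(compose_chain([R, P]))     # regulates
print(compose_chain([R, R]))     # indirectly regulates
print(compose_chain([P, R]))     # None
print(compose_chain([R, R, R]))  # indirectly regulates
```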
Sub-relations
• regulates
  – negatively_regulates
  – positively_regulates
row \ col     is_a   part_of   regulates   + regulates   - regulates
is_a          I      P         R           +R            -R
part_of       P      P
regulates     R      R
+ regulates   +R     +R        +R
- regulates   -R     -R        -R
Sub-relations + indirect
row \ col              is_a   part_of   regulates   + regulates   - regulates   indirectly regulates
is_a                   I      P         R           +R            -R            ~R
part_of                P      P         -           -             -             -
regulates              R      R         ~R          ~R            ~R            ~R
+ regulates            +R     +R        +R          ~+R           ~-R           ~R
- regulates            -R     -R        -R          ~-R           ~+R           ~R
indirectly regulates   ~R     ~R        ~R          ~R            ~R            ~R
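One observation on the signed cells of the table above: the four + / - pairings behave like multiplication of signs. The Python sketch below checks only those four cells; the function names are made up here, and nothing is claimed about the other rows or columns.

```python
# The four signed-pair cells, copied from the table (row . column).
SIGNED_CELLS = {
    ("+R", "+R"): "~+R",
    ("+R", "-R"): "~-R",
    ("-R", "+R"): "~-R",
    ("-R", "-R"): "~+R",
}


def sign_of(rel):
    """+1 for positive regulation, -1 for negative regulation."""
    return 1 if "+" in rel else -1


def compose_signed(r1, r2):
    """Compose two signed regulates relations by multiplying their signs.
    The result is always an indirect (~) relation."""
    return "~+R" if sign_of(r1) * sign_of(r2) > 0 else "~-R"


# Every signed cell in the table agrees with the sign rule.
print(all(compose_signed(a, b) == out for (a, b), out in SIGNED_CELLS.items()))  # True
```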
Regulation relation lattice
R
R+   R-
~R
~R+   ~R-
normal regulates relations asserted in GO
indirect regulates relations never asserted, only implied
RD
RD+   RD-
~R
~R+   ~R-
renamed to DIRECTLY regulates?
indirect regulates relations never asserted, only implied
~RG
~RG+   ~RG-
super-relation of indirect and direct regulation (call this one “regulates”?)
has_part
• NOT the inverse of part_of at the ontology level
• Example:
  – nucleus part_of cell: YES
    • every nucleus is part_of some cell
      – by definition; e.g. extruded nuclei are ex-nuclei
  – cell has_part nucleus: NO
    • not every cell has_part nucleus
      – mammalian erythrocytes, bacteria
• Example:
  – <pf example here>
  – <summarise pf progress>
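The nucleus/cell example can be made concrete with a small Python sketch of the all-some reading: a class-level link "X r Y" holds only if every instance of X has the relation to some instance of Y. The instance names below are invented for illustration, and treating has_part as the inverse of part_of between individual instances is an assumption of the sketch.

```python
def holds_all_some(subjects, objects, instance_links):
    """True if every subject instance is linked to at least one object instance."""
    return all(
        any((s, o) in instance_links for o in objects)
        for s in subjects
    )


# Hypothetical instances: one nucleated cell and one cell without a nucleus.
nuclei = {"nucleus1"}
cells = {"hepatocyte1", "erythrocyte1"}

# Instance-level part_of links, and has_part as their inverse.
part_of = {("nucleus1", "hepatocyte1")}
has_part = {(whole, part) for part, whole in part_of}

print(holds_all_some(nuclei, cells, part_of))   # True:  nucleus part_of cell
print(holds_all_some(cells, nuclei, has_part))  # False: cell has_part nucleus
```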
Annotations and relations
• not just an ontology issue
  – this is of relevance to annotations too…
• The current simple methodology of propagating annotations up the graph only works for a small subset of relations
  – To understand how annotations and new relations interact we must think in terms of gene product relations
Gene product relations
• What is the relation between a gene product and
  – A molecular function?
  – A biological process?
  – A cellular component?
• Why care?
• What’s wrong with “annotated_to”?
  – We need to define these relations:
    • to do justice to the biology
    • to be able to deal with new relations within the GO itself
Why we should care
• How should annotation queries, analysis tools (slimmers, enrichment tools) etc. treat the (pseudo-)new regulates relation?
• How should we recommend the process-function links be visualized?
• How should these links be treated in queries?
Proposed relations for gene products
• For MF and BP:
  – has_potential
  – has_function_during
• For CC:
  – localized_to
  – This is more specific than has_location
    • A gene product may travel through different locations
  – Formally:
    • GP localized_to CC : GP executes some function in CC
Names TBD
MFs are ontologically like BPs (BFO processes)…
How to read a GAF
• <gene product> <rel> <GO term>
• gene product may not be explicitly in GAF
  – that’s OK
  – gene as proxy
• The relation does NOT apply to the gene however
  • genes are only localized_to chromosomes, and only participate in gene expression. It’s the products that do the work
• <rel> is implicit, depending on F, C or P
• Examples:
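A rough Python sketch of reading one GAF line as a <gene product> <rel> <GO term> triple. It relies on the standard GAF column order (column 3 object symbol, column 5 GO ID, column 9 aspect). The aspect-to-relation mapping is an assumption: the relation names are still TBD, and using has_function_in for both F and P is a guess made for this sketch. The example line is entirely made up.

```python
# Assumed mapping from the aspect column to the implicit relation (names TBD).
ASPECT_TO_REL = {
    "F": "has_function_in",  # guess: the slides do not fix the MF relation name
    "P": "has_function_in",
    "C": "localized_to",
}


def read_gaf_line(line, aspect_to_rel=ASPECT_TO_REL):
    """Turn one tab-delimited GAF line into (gene product, relation, GO term)."""
    cols = line.rstrip("\n").split("\t")
    symbol, go_id, aspect = cols[2], cols[4], cols[8]
    return (symbol, aspect_to_rel[aspect], go_id)


# Hypothetical annotation line; every value here is a placeholder.
example = "\t".join([
    "DB", "ID1", "geneProductA", "", "GO:0000000", "REF:1", "IDA", "",
    "C", "", "", "protein", "taxon:0", "20090101", "DB",
])
print(read_gaf_line(example))  # ('geneProductA', 'localized_to', 'GO:0000000')
```

The gene symbol stands in as a proxy for the gene product here, as the slide notes.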
Annotation relation composition
• is_a
  – always propagate over is_a
    • localized_to . is_a → localized_to
    • has_function_in . is_a → has_function_in
• part_of
    • localized_to . part_of → localized_to
    • has_function_in . part_of → has_function_in
• This is effectively what we do with gene product annotations now
  • post-hoc logical justification for why it’s OK to propagate
Annotation relation composition: regulates
• regulates
  – localized_to . regulates → NEVER POSSIBLE
    • localized_to never has a process as target
    • regulates always has process as subject
  – has_function_in . regulates → regulator_of
• This introduces an additional implicit relation that can be used to sum gene product results
  – Fake AmiGO screenshot here
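The annotation composition rules for is_a, part_of and regulates can be sketched as a propagation loop in Python. The table holds only the five compositions stated on the slides; combinations not listed (including localized_to . regulates) produce nothing. The function name, gene product names and term names T1 and T2 are placeholders, and regulator_of is not propagated further because the slides give no rule for it.

```python
# (annotation relation, ontology relation) -> inferred annotation relation
ANNOTATION_TABLE = {
    ("localized_to", "is_a"): "localized_to",
    ("localized_to", "part_of"): "localized_to",
    ("has_function_in", "is_a"): "has_function_in",
    ("has_function_in", "part_of"): "has_function_in",
    ("has_function_in", "regulates"): "regulator_of",
}


def propagate(annotations, ontology_links):
    """Return asserted plus inferred (gene product, relation, term) triples."""
    inferred = set(annotations)
    frontier = list(annotations)
    while frontier:
        gp, rel, term = frontier.pop()
        for subj, onto_rel, obj in ontology_links:
            if subj != term:
                continue
            new_rel = ANNOTATION_TABLE.get((rel, onto_rel))
            if new_rel is None:
                continue  # no rule: nothing is inferred
            fact = (gp, new_rel, obj)
            if fact not in inferred:
                inferred.add(fact)
                frontier.append(fact)
    return inferred


# Hypothetical ontology links and annotations.
links = {("nucleus", "part_of", "cell"), ("T1", "regulates", "T2")}
annotations = {("gpX", "localized_to", "nucleus"), ("gpY", "has_function_in", "T1")}
for triple in sorted(propagate(annotations, links)):
    print(triple)
```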
Annotation relation composition: inter-ontology links
• We have 183 CC->MF/BP links in scratch
• regulates
  – localized_to . has_function_in → ??may_contribute_to??
• Example:
  • RPS25A localized_to ribosome
  • ribosome has_function_in protein biosynthesis
  → RPS25A ??has_function_in?? protein biosynthesis
• No need for curator to make explicit IC claims
• Q: we never want “may” in relation names?
• Can we make a stronger claim?
• How does a curator know when to make an IC claim here?
• Potential confusion with contributes_to qualifier
Annotation relations and has_part
• Need some graphical illustrations
• See http://wiki.geneontology.org/index.php/Has_part for now
Qualifiers
• Annotation qualifiers (contributes_to) have the effect of modifying the relation
  – NOT is not a qualifier – it is a logical operator
• We can add new relations to the qualifier column
  – geneProductA acted_on_during protein secretion by the type II secretion system
Secondary taxon IDs
Cell component relations
• We have 674 xp defs within CC in scratch
  – adjacent_to
  – surrounds/surrounded_by
  – spans
  – overlaps
• Use case: Reactome
• Can we say anything about gene products here?
  – we can perform spatial gene product queries
Spatial reasoning
– spans . adjacent_to → overlaps (??TBD!!)
– SUN-KASH complex spans nuclear inner membrane
– nuclear inner membrane adjacent_to nuclear lumen
→ SUN-KASH complex overlaps nuclear lumen
Links from BP to external ontologies
• Process-continuant links
  – A has_function_in cysteine biosynthesis
    • A ??has_participant?? cysteine
    • this is true but can we make stronger claims
  – A has_function_in heart development
    • A has_participant heart
    • c.f. heart process, TAZ gene
• How can we use this?
  – Browse GO annotations via other ontologies
  – Enrichment using anatomy terms…
  – AmiGO screenshots
what next?
Won’t this confuse users?
• We will provide a pre-made inferred relation table for all of GO
  – we could do this for gps too but it would be over a billion entries..
• We can always distribute a dumbGO
  – just is_a and part_of, not even regulates
• Need more guidance on how this can be used
Discussion
What’s next?
• Move relations into GO editors file
  – post OE2
  – CC-self
    • spatial relations
  – BP->MF
    • has_part
    • regulates
  – BP->BP
    • has_part (??)
  – External onts
• Dual releases? dumbGO and fullGO?
• Fix GOC tools (AmiGO, slimmer, enrichment, graphviz, refG) to deal appropriately
  – OE2 should already be fine
• Educate non-GOC folks