GO and OBO: an introduction


Jane Lomax EMBL-EBI

• What is the Gene Ontology?
• What is OBO?
• OBO-Edit demo & practical

Gene Ontology

• Built for a very specific purpose: “annotation of genes and proteins in genomic and protein databases”
• Applicable to all species

Evolution of GO

• Original GO created in 2000
• Three databases involved:
  – FlyBase (Drosophila)
  – MGI (mouse)
  – SGD (S. cerevisiae)
• Used immediately
• Later databases:
  – TAIR (Arabidopsis)
  – TIGR (microbes including prokaryotes)
  – SWISS-PROT (several thousand species inc. human)
  – PSU (P. falciparum)
• Recent additions:
  – ZFIN (zebrafish)
  – PAMGO (plant pathogens)
• GO development traditionally annotation-driven
  – development directed by use
• Terms added as new species annotated
• Terms added on an as-needed basis
• Developed by an international consortium of biologists and computer scientists
  – members from individual databases
  – central office at EBI
• Development involves collaboration with domain experts from different biological fields
  – also formal ontologists
• Resulted in ‘organic’ structure, little formality
• Ontological formality added subsequently
  – philosophical and logical

Growth of GO

[Figure: GO term history 2001 - 2007. Number of terms (axis 0 to 30000) against date (Jan-01 to Jan-07), broken down into defined terms, undefined terms and obsolete terms.]

How does GO work?

What information might we want to capture about a gene product?

• What does the gene product do?
• Where and when does it act?
• Why does it perform these activities?

GO structure

• GO terms divided into three parts:
  – cellular component
  – molecular function
  – biological process

Cellular Component

• where a gene product acts
• Enzyme complexes in the component ontology refer to places, not activities.

Molecular Function

• activities or “jobs” of a gene product
• e.g.:
  – glucose-6-phosphate isomerase activity
  – insulin binding
  – insulin receptor activity
  – drug transporter activity
• A gene product may have several functions; a function term refers to a single reaction or activity, not a gene product.
• Sets of functions make up a biological process.

Biological Process

• a commonly recognized series of events
• e.g.:
  – cell division
  – transcription
  – regulation of gluconeogenesis
  – limb development
  – courtship behavior

Ontology Structure

• Terms are linked by two relationships:
  – is-a
  – part-of

[Figure: example graph of the terms cell, membrane, chloroplast, mitochondrial membrane and chloroplast membrane, with edges labelled is-a or part-of.]

Ontology Structure

• Ontologies are structured as a hierarchical directed acyclic graph (DAG)
• Terms can have more than one parent and zero, one or more children

[Figure: graph of the terms cell, membrane, chloroplast, mitochondrial membrane and chloroplast membrane, captioned “Directed Acyclic Graph (DAG) - multiple parentage allowed”.]
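The DAG idea can be sketched in a few lines of Python. This is only an illustration: the term names come from the example figure, but the individual edges and their is_a/part_of labels are assumptions, since the figure's arrows are not recoverable from the slide text. The names `PARENTS`, `ancestors` and `is_acyclic` are hypothetical helpers, not part of any GO tool.

```python
from collections import deque

# child term -> list of (relationship, parent term).
# The edges below are assumed for illustration only.
PARENTS = {
    "cell": [],
    "membrane": [("part_of", "cell")],
    "chloroplast": [("part_of", "cell")],
    "mitochondrial membrane": [("is_a", "membrane")],
    # Two parents: this is what makes the structure a DAG rather than a tree.
    "chloroplast membrane": [("is_a", "membrane"), ("part_of", "chloroplast")],
}


def ancestors(term, parents=PARENTS):
    """Return every term reachable by following parent links upward."""
    seen = set()
    queue = deque([term])
    while queue:
        current = queue.popleft()
        for _relationship, parent in parents[current]:
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen


def is_acyclic(parents=PARENTS):
    """True if no term is its own ancestor."""
    return all(term not in ancestors(term, parents) for term in parents)
```

With these assumed edges, `ancestors("chloroplast membrane")` collects membrane, chloroplast and cell, reaching cell along two different paths.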

Open Biomedical Ontologies (OBO)

• GO is a member of OBO
• An umbrella project for grouping different ontologies in the biological/medical field
  – a repository for ontologies with a defined set of standards
• Available from a single source: http://obo.sourceforge.net/

Why do we need OBO?

• GO covers a small area of biology:
  – molecular function of a protein
  – biological function of a protein
  – cellular location of a protein
• Lots of other aspects also need to be captured, e.g.:
  – phenotype
  – anatomy
  – genomic
  – taxonomy
• Many groups develop their own ontologies
  – e.g. plant ontology, anatomies for specific organisms
• No standardisation of ontologies with respect to:
  – format
  – scope
  – relationships
• No way of knowing whether such ontologies already exist
• No mechanism of distribution for other groups
• Creating ontologies takes a lot of work
  – makes sense to reuse existing ontologies where possible
• Improves data integration where a small set of ontologies is used
• Allows ontologies to be made available from a single place
• Ultimate aim: a complete set of integrated ontologies completely covering the biomedical domain

OBO requirements

To be part of OBO, ontologies must:

• Be open, can be used by all without any constraint

OBO requirements: open

• Ontologies can be used by anyone without any constraints, except:
  – original authors must be acknowledged
  – ontologies cannot be edited and then released under the same name

OBO requirements

To be part of OBO, ontologies must also:

• Be in a common shared syntax

OBO requirements: syntax

• Usually the OBO format, same as the primary GO format
  – and adaptations of the OBO format
• OWL (Web Ontology Language) format is also accepted
• Allows the same tools to be applied, facilitating shared software implementations

Anatomy of an OBO term

id: GO:0006094
name: gluconeogenesis
namespace: process
def: The formation of glucose from noncarbohydrate precursors, such as pyruvate, amino acids and glycerol. [http://cancerweb.ncl.ac.uk/omd/index.html]
exact_synonym: glucose biosynthesis
xref_analog: MetaCyc:GLUCONEO-PWY
is_a: GO:0006006
is_a: GO:0006092

Labels on the slide:

• unique ID: id
• term name: name
• ontology: namespace
• definition: def
• synonym: exact_synonym
• database ref: xref_analog
• parentage: is_a
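To make the tag-value layout concrete, here is a minimal Python sketch that reads the term above into a dictionary. It only handles the flat "tag: value" lines as they appear on the slide; it is not a full OBO parser (no stanza headers such as [Term], quoting, escapes, comments or trailing modifiers), and `parse_stanza` is a hypothetical helper name.

```python
STANZA = """\
id: GO:0006094
name: gluconeogenesis
namespace: process
def: The formation of glucose from noncarbohydrate precursors, such as pyruvate, amino acids and glycerol. [http://cancerweb.ncl.ac.uk/omd/index.html]
exact_synonym: glucose biosynthesis
xref_analog: MetaCyc:GLUCONEO-PWY
is_a: GO:0006006
is_a: GO:0006092
"""


def parse_stanza(text):
    """Parse 'tag: value' lines into a dict mapping tag -> list of values.

    Values are kept in a list because a tag such as is_a may repeat.
    """
    term = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Split on the first ': ' only, so colons inside the value survive.
        tag, _, value = line.partition(": ")
        term.setdefault(tag, []).append(value)
    return term


term = parse_stanza(STANZA)
```

Here `term["is_a"]` holds both parent identifiers, which is the multiple parentage that the DAG structure allows.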

OBO requirements

To be part of OBO, ontologies must also:

• Not overlap with other ontologies in OBO

OBO requirements: overlapping

• Ontologies can (and should) overlap partially, but large overlap should be avoided
• Idea is that terms from different ontologies can be combined to form new terms
• Striving for accepted standards rather than competition

OBO requirements

To be part of OBO, ontologies must also:

• Have a unique identifier space

OBO requirements: id space

• So, for example, the GO identifier is “GO”:
  – no other OBO ontology could use this id space
• Prevents problems where multiple ontologies are used together
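A small Python sketch of why a unique id space matters when ontologies are loaded together. It assumes identifiers have the form prefix:local-id, as in GO:0006094; the ontology names and the non-GO identifiers below are invented for illustration, and `id_space` and `find_clashes` are hypothetical helpers rather than functions of any OBO tool.

```python
def id_space(identifier):
    """Return the prefix before the first colon, e.g. 'GO' for 'GO:0006094'."""
    prefix, sep, local = identifier.partition(":")
    if not sep or not prefix or not local:
        raise ValueError(f"not a prefixed identifier: {identifier!r}")
    return prefix


def find_clashes(ontologies):
    """Return id spaces that are used by more than one ontology.

    `ontologies` maps an ontology name to the term identifiers it defines.
    The result maps each clashing id space to the set of ontologies using it.
    """
    owners = {}
    for name, identifiers in ontologies.items():
        for identifier in identifiers:
            owners.setdefault(id_space(identifier), set()).add(name)
    return {space: names for space, names in owners.items() if len(names) > 1}


# Invented example data: the second and third ontologies are hypothetical.
EXAMPLE = {
    "gene_ontology": ["GO:0006094", "GO:0006006"],
    "example_anatomy": ["XA:0000001"],
    "rogue_ontology": ["GO:0000001"],
}
clashes = find_clashes(EXAMPLE)
```

In this made-up data set, `clashes` reports that the "GO" prefix is claimed by two ontologies, which is the situation the id space requirement rules out.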

OBO requirements

To be part of OBO, ontologies must also:

• Include text definitions of their terms

OBO requirements

• In addition, OBO includes an ontology of relationships
  – all ontologies should use these definitions of relationships
• For example:
  – part_of
  – develops_from
  – regulates
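One way to picture the shared relationship definitions is a check that every edge in an ontology uses a relationship from an agreed set. The Python sketch below is illustrative only: `SHARED_RELATIONS` is a small assumed subset (the three examples above plus is_a), the edges are made up from terms mentioned in this talk, and `located_near` is an invented relationship included so the check has something to flag.

```python
# Assumed subset of shared relationship names, for illustration only.
SHARED_RELATIONS = {"is_a", "part_of", "develops_from", "regulates"}

# (child term, relationship, parent term); edges are illustrative.
EDGES = [
    ("chloroplast membrane", "is_a", "membrane"),
    ("chloroplast", "part_of", "cell"),
    ("regulation of gluconeogenesis", "regulates", "gluconeogenesis"),
    ("chloroplast", "located_near", "membrane"),  # invented relationship
]


def undefined_relations(edges, allowed=SHARED_RELATIONS):
    """Return the sorted relationship names that are not in the shared set."""
    return sorted({rel for _child, rel, _parent in edges if rel not in allowed})


unknown = undefined_relations(EDGES)
```

Any name returned by `undefined_relations` marks a relationship that an ontology would need to map onto a shared definition before it could be combined cleanly with others.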

What’s available

• demo: http://obo.sourceforge.net/

Editing ontologies

• GO is edited using OBO-Edit
  – stand-alone Java application
  – available for all platforms
  – browse, create or edit any ontology in OBO format

OBO-Edit demo

• Browsing ontologies
  – loading ontologies (including loading multiple ontologies)
  – graph viewer
  – reasoner/single relationship views
  – searching/filtering/rendering
  – help
• Creating/editing ontologies
  – creating a new ontology
  – adding terms
  – copying/moving/deleting terms
  – adding definitions, dbxrefs etc
  – verification plugin
  – saving ontologies