Judith A. Blake, David P. Hill, Barry Smith BioOntologies SIG: Vienna July 20, 2007 Gene Ontology Annotations: What they mean and where they come from


GO Consortium Project Goals

1. We will maintain comprehensive, logically rigorous and biologically accurate ontologies.

*2. We will comprehensively annotate reference genomes in as complete detail as possible.

*3. We will support annotation across all organisms.

4. We will provide our annotations and tools to the research community.


GO terms are used for functional annotations

Brain development [GO:0007420] (141 genes, 207 annotations)


GO Stats (April 24, 2007):

GO Annotations

Total experimental GO annotations: 388,633

Total proteins with manual annotations: 80,402

Contributing groups (including MGI): 19

Total PubMed references: 346,002

Total number of predicted annotations: 17,029,553

Total number of taxa: 129,318

Total number of distinct proteins: 2,971,374


Annotations provide the connection between genomic information and the GO.

Experiments provide the data that enables us to annotate gene products with terms from the ontologies.

Annotations for App: amyloid beta (A4) precursor protein

Annotations are assertions


We use evidence codes to describe the basis of the annotation

Direct Experiment

- IDA: Inferred from direct assay
- IPI: Inferred from physical interaction
- IMP: Inferred from mutant phenotype
- IGI: Inferred from genetic interaction
- IEP: Inferred from expression pattern

No Direct Experiment

- ISS: Inferred from sequence or structural similarity
- TAS: Traceable author statement
- NAS: Non-traceable author statement
- IC: Inferred by curator
- RCA: Reviewed computational analysis
- IEA: Inferred from electronic annotation
- ND: No data available
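As a rough illustration, the evidence codes on this slide can be held in a small lookup structure. The sketch below is not part of the talk: the names `EvidenceCode`, `DIRECT_EXPERIMENT` and `is_direct_experiment` are hypothetical, and the split into direct and non-direct experimental codes assumes that the "Direct Experiment" label covers IDA, IPI, IMP, IGI and IEP.

```python
from enum import Enum


class EvidenceCode(Enum):
    """Evidence codes and expansions as listed on the slide."""
    IDA = "Inferred from direct assay"
    IPI = "Inferred from physical interaction"
    IMP = "Inferred from mutant phenotype"
    IGI = "Inferred from genetic interaction"
    IEP = "Inferred from expression pattern"
    ISS = "Inferred from sequence or structural similarity"
    TAS = "Traceable author statement"
    NAS = "Non-traceable author statement"
    IC = "Inferred by curator"
    RCA = "Reviewed computational analysis"
    IEA = "Inferred from electronic annotation"
    ND = "No data available"


# Assumption: these five codes are the ones grouped under "Direct Experiment".
DIRECT_EXPERIMENT = frozenset({
    EvidenceCode.IDA,
    EvidenceCode.IPI,
    EvidenceCode.IMP,
    EvidenceCode.IGI,
    EvidenceCode.IEP,
})


def is_direct_experiment(code: str) -> bool:
    """Return True if the code is in the assumed direct-experiment group.

    Raises KeyError for a code that is not on the slide.
    """
    return EvidenceCode[code] in DIRECT_EXPERIMENT


for code in ("IDA", "IMP", "IEA"):
    print(code, "|", EvidenceCode[code].value, "|", is_direct_experiment(code))
```

Looking codes up by name keeps an unknown code from passing silently, which matters when annotations from many contributing groups are merged.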


Examples of how we connect instances with knowledge representation in the GO

What follows are examples of annotation of the biomedical literature using GO types, gene product types and evidence codes


Example #1: Molecular Function using IDA

Figure from

Zhang M, Chen W, Smith SM, Napoli JL. Molecular characterization of a mouse short chain dehydrogenase/reductase active with all-trans-retinol in intact cells, mRDH1. J Biol Chem. 2001 Nov 23;276(47):44083-90.


The Annotation:

The Observation

[Figure labels: NAD+, NADH + H+]


What are the instances in this experiment?

Gene product instances

- Molecules of retinol dehydrogenase

Molecular function instances

- Instances of execution of the molecular function revealed by the assay
- Instances of molecular function associated with instances of retinol dehydrogenase. These instances are the potential of a molecule of retinol dehydrogenase to execute the function retinol dehydrogenase activity.


What knowledge are we trying to capture?

We are interested in understanding how gene products contribute to the biology of an organism.


How do wet-bench biologists learn about gene products?

They do experiments!

Experiments are designed to study the properties of gene product instances.

Experimental biologists take on “The Burden of Proof”.


How do we represent the accumulated knowledge?

We* make annotations!

Annotations connect what wet-bench biologists see in the lab with how we represent our current understanding of biological reality

* GO curators


So, where are the instances?

The instances are in the lab. We use what people report about instances, but we never actually deal with them directly


What do we mean by gene product?

Gene Product Type

- Stands proxy for the ‘gene’
- Genes are what we have in MODs
- Types = what instances have in common

Gene Product Instance

- A molecule of a gene product
- It can be physically isolated
- It takes up space


What do we mean by annotations?

An annotation

- Asserts that instances of molecules of a type of gene product have a propensity to act as designated by the terms in an ontology such as the GO
- Is created on the basis of observations of the instances of such types in experiments and of the inferences drawn from such observations

Note: comprehensive experimental details are embedded in biomedical publications and in specialized databases
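One way to picture this definition is as a record that ties a gene product type to a GO term, an evidence code and a reference. The sketch below is only an illustration built from Example #1. The class name `Annotation`, its field names and the use of "mRDH1" as the gene product label are assumptions made for this example, not a format used by the GO Consortium.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    """A hypothetical, minimal annotation record."""
    gene_product: str   # gene product type (stands proxy for the gene)
    go_term: str        # name of the GO term being asserted
    aspect: str         # e.g. "molecular_function" or "biological_process"
    evidence_code: str  # basis of the annotation, e.g. "IDA"
    reference: str      # publication reporting the observed instances

    def assertion(self) -> str:
        """Spell out what the annotation asserts about the type."""
        return (
            f"Instances of {self.gene_product} have the propensity for "
            f"'{self.go_term}' ({self.evidence_code}; {self.reference})"
        )


# Values taken from Example #1; the gene product label is illustrative.
example = Annotation(
    gene_product="mRDH1",
    go_term="retinol dehydrogenase activity",
    aspect="molecular_function",
    evidence_code="IDA",
    reference="Zhang et al., J Biol Chem. 2001;276(47):44083-90",
)

print(example.assertion())
```

The record describes the type, while the reference points back to the paper in which the instances were actually observed.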


Example #2: Molecular Function using IMP

Figure from

Schulz S, Lopez MJ, Kuhn M, Garbers DL. Disruption of the guanylyl cyclase-C gene leads to a paradoxical phenotype of viable but heat-stable enterotoxin-resistant mice. J Clin Invest. 1997 Sep 15;100(6):1590-5.


The Annotation:

The Observation

[Figure labels: X, X, IMP]


What are the instances in this experiment?

Gene product instances

- Molecules of GUCY2C protein
- The lack of functional molecules of GUCY2C in mutants

Molecular function instances

- The execution of the molecular function, measured by the accumulation of cGMP
- The potential of a molecule of GUCY2C to execute the molecular function
  - Revealed by the correlation between a lack of molecules and a lack of executions of molecular function


The Curator Perspective: Annotation Process

1. Identification of relevant experimental data

- Biomedical literature as primary source

- Annotations inferred from experiments performed in other organisms or inferred from sequence or structure

2. Identification of the appropriate ontology annotation term

- Experimental assay influences the limit of resolution/granularity of term assignment available to use

- Differences in expertise among curators should result in close, but not necessarily exact, GO term annotations

3. Employment of annotation quality control processes to

- Correct formal structure

- Evaluate annotation consistency

- Harvest emerging knowledge to refine and extend the GO
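The "correct formal structure" step of quality control could, for instance, be automated along the following lines. This is a sketch under stated assumptions: the record layout (a dict with `gene_product`, `go_id`, `evidence_code` and `reference` keys), the function name `structural_problems` and the sample records are all hypothetical. The only conventions relied on are that GO identifiers look like `GO:` followed by seven digits and that the evidence code is one of the twelve listed earlier in the talk.

```python
import re
from typing import Dict, List

GO_ID_PATTERN = re.compile(r"^GO:\d{7}$")

# The twelve evidence codes listed earlier in the talk.
KNOWN_EVIDENCE_CODES = {
    "IDA", "IPI", "IMP", "IGI", "IEP", "ISS",
    "TAS", "NAS", "IC", "RCA", "IEA", "ND",
}

REQUIRED_FIELDS = ("gene_product", "go_id", "evidence_code", "reference")


def structural_problems(record: Dict[str, str]) -> List[str]:
    """Return a list of formal-structure problems (empty if none found)."""
    problems = []
    for field in REQUIRED_FIELDS:
        if not record.get(field):
            problems.append(f"missing {field}")
    go_id = record.get("go_id", "")
    if go_id and not GO_ID_PATTERN.match(go_id):
        problems.append(f"malformed GO ID: {go_id}")
    code = record.get("evidence_code", "")
    if code and code not in KNOWN_EVIDENCE_CODES:
        problems.append(f"unknown evidence code: {code}")
    return problems


# Hypothetical records: "GeneX" and the reference string are placeholders.
good = {
    "gene_product": "GeneX",
    "go_id": "GO:0007420",  # brain development, as shown earlier in the talk
    "evidence_code": "IMP",
    "reference": "placeholder reference",
}
bad = {
    "gene_product": "GeneX",
    "go_id": "GO:7420",
    "evidence_code": "XYZ",
    "reference": "",
}

print(structural_problems(good))
print(structural_problems(bad))
```

A check like this only covers formal structure. Evaluating consistency between curators, the second item on the slide, needs biological judgment and is not something this sketch attempts.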


Example #3: Biological Process Using IMP

Washington Smoak I, Byrd NA, Abu-Issa R, Goddeeris MM, Anderson R, Morris J, Yamamura K, Klingensmith J, Meyers EN. Sonic hedgehog is required for cardiac outflow tract and neural crest cell development. Dev Biol. 2005 Jul 15;283(2):357-72.


The Annotation:

The Observation

[Figure labels: IMP, X]


What are the instances in this experiment?

Gene product instances

- Molecules of the Shh gene
- Non-functional molecules of the Shh gene

Biological process instances

- The development of a mouse heart

Molecular function instances

- The execution of a molecular function by a molecule of the Shh gene


So, when a biological process occurs, it is the result of molecules of a gene product(s) executing their molecular function(s)


How do wet-bench biologists learn about gene products?

They do experiments!

Experiments are designed to study the properties of gene product instances.

Experimental biologists take on “The Burden of Proof”.

They make conclusions about gene product types based on the accumulated experimental data!


If experiments show:

All instances of a gene product studied have the potential to execute the function tyrosine kinase

Instances of the same gene product are involved in the biological process limb development

All instances of the same gene product are found in instances of the cytoplasm


A wet-bench biologist would conclude:

The gene product of this gene is a tyrosine kinase that functions in the cytoplasm and the tyrosine kinase functioning is used in limb development
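The step from three separate experimental findings to one combined statement can be mimicked over annotation data. The sketch below is purely illustrative: the gene name "GeneX", the tuple layout and the function `summarize` are assumptions introduced here, and the terms are the ones used on the preceding slide.

```python
from collections import defaultdict
from typing import Iterable, Tuple

# Hypothetical annotations: (gene, aspect, term).
annotations = [
    ("GeneX", "molecular_function", "tyrosine kinase"),
    ("GeneX", "biological_process", "limb development"),
    ("GeneX", "cellular_component", "cytoplasm"),
]


def summarize(gene: str, records: Iterable[Tuple[str, str, str]]) -> str:
    """Combine one gene's annotations across the three aspects."""
    by_aspect = defaultdict(list)
    for record_gene, aspect, term in records:
        if record_gene == gene:
            by_aspect[aspect].append(term)

    def joined(aspect: str) -> str:
        # Fall back to "unknown" when an aspect has no annotation.
        return " and ".join(by_aspect[aspect]) or "unknown"

    return (
        f"{gene}: function = {joined('molecular_function')}; "
        f"location = {joined('cellular_component')}; "
        f"process = {joined('biological_process')}"
    )


print(summarize("GeneX", annotations))
```

A summary assembled this way simply lists what has been annotated. Whether the kinase activity is what is used in limb development is a further inference that the biologist, not the code, has to make.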


If we comprehensively annotate genes, can we make the same conclusions?

Analysis of gene product annotations leads to new hypotheses for wet-bench biologists to test

This is the basis of biological discovery!


Development of GO depends on intersection of curation with ontology refinements

New results may stand in conflict with current version of ontology

Process of annotation brings new experimental results into perspective with existing scientific knowledge captured in the ontology

One of the strengths of the GO development paradigm is that it is primarily a task of biologist-curators who are experts in understanding the experimental systems


Experimental Literature

Hypothesis generation

Informatics Resources

Data mining, and prediction using ontologies

Experiments and data analysis using GO, etc

Improved annotations in MODs, UniProt

Refine bio-ontologies


Summary

Gene product annotation is an integral aspect of the work of the GO Consortium

Annotations reflect conclusions from experiments as interpreted by the biologist and reviewed by peers

The structure of the GO depends upon accumulated knowledge from many experiments resulting in a representation of current thought about biological reality

As experimental data changes our view of reality, the ontology must change as well
