GIGA2, Munich, March 2015. STRUCTURING PHENOTYPE DATA: Lessons from vertebrate genomes. Chris Mungall, LBNL, Berkeley. Gene Ontology.

GIGA2 Structuring Phenotype Data


Page 1: GIGA2 Structuring Phenotype Data

GIGA2, Munich, March 2015

STRUCTURING PHENOTYPE DATA: Lessons from vertebrate genomes

Chris Mungall
LBNL, Berkeley
Gene Ontology

Page 2: GIGA2 Structuring Phenotype Data

Web Apollo: http://genomearchitect.org

Page 3: GIGA2 Structuring Phenotype Data

Genome structures are highly amenable to comparison

Desvignes, T., Pontarotti, P., & Bobe, J. (2010). Nme gene family evolutionary history reveals pre-metazoan origins and high conservation between humans and the sea anemone, Nematostella vectensis. PLoS ONE, 5(11). doi:10.1371/journal.pone.0015506

Page 4: GIGA2 Structuring Phenotype Data

COMPUTING OVER PHENOTYPES

Can we compute over the architecture of phenomes as we do for genome architecture?
o What genes affect distal appendage length or shape?
o What are the genes expressed in the mouth during development?
o What structures develop using the same gene regulatory networks as in bilaterian mouths?

Current methods
o Text-based search of the literature, with results gathered manually
  Time-consuming
  Hard to automate

Page 5: GIGA2 Structuring Phenotype Data

[Figure: a gene placed in the space of "every phenotype ever to have existed", with example annotations: expressed in mouth; affects appendage length; regulates EMT …]

Page 6: GIGA2 Structuring Phenotype Data

PHENOTYPES: ENDLESS FORMS

[Figure: example organisms: Peytoia nathorsti, Amphipholis squamata, Petromyzon marinus, Bugula, Homo sapiens (with cleft palate), Mysticeti, Aplysina aerophoba, Gastrula (Metazoan). Labels: mouth, anus, osculum, blastopore, cleft lip and palate]

Page 7: GIGA2 Structuring Phenotype Data

FREE TEXT != STRUCTURED

[Figure: free-text descriptions attached to a gene]
“expressed in mouth”
“expressed around oral opening”
“expressed in anterior end of gut tube”
“affects appendage length”
“long tentacles”
“elongated arms”

Page 8: GIGA2 Structuring Phenotype Data

ONTOLOGIES: STRUCTURING A DIVERSITY OF PHENOTYPES

[Figure: ontology graph. Terms: tentacle, tentacular bud, circumoral appendage, tentacular club, sucker, arm, arm IV, mouth. Relations: develops into, is a subtype of, is part of, homologous, surrounds]

https://github.com/obophenotype/cephalopod-ontology

Page 9: GIGA2 Structuring Phenotype Data

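ONTOLOGIES FOR MOLECULAR PHENOTYPES

[Figure: ontology graph. Terms: tentacle, tentacular bud, circumoral appendage, tentacular club, sucker, arm, arm IV, mouth. Relations: develops into, is a subtype of, is part of, homologous, surrounds. Genes Scr, Lox5 and Antp are attached to terms by an "expressed in" relation]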

Page 10: GIGA2 Structuring Phenotype Data

GRAPH KNOWLEDGE QUERIES

[Figure: ontology graph. Terms: tentacle, tentacular bud, circumoral appendage, tentacular club, sucker, arm, arm IV, mouth. Relations: develops into, is a subtype of, is part of, homologous, surrounds. Genes Scr, Lox5 and Antp are attached to terms by an "expressed in" relation]

“What genes are expressed in structures that develop from a tentacle bud, or homologs?”
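A query like the one on this slide can be sketched as a walk over labelled edges. The transcript does not preserve how the figure's terms were wired together, so every edge and every gene-to-structure annotation below is an assumption made for illustration only, not the content of the cephalopod ontology.

```python
# Hypothetical edges: (subject, relation, object). The term and relation
# names come from the slide; which term connects to which is assumed.
EDGES = [
    ("tentacular bud", "develops into", "tentacle"),
    ("tentacle", "is a subtype of", "circumoral appendage"),
    ("arm", "is a subtype of", "circumoral appendage"),
    ("arm IV", "is a subtype of", "arm"),
    ("tentacular club", "is part of", "tentacle"),
    ("sucker", "is part of", "tentacular club"),
    ("tentacle", "homologous", "arm IV"),
    ("circumoral appendage", "surrounds", "mouth"),
]

# Hypothetical expression annotations: (gene, structure).
EXPRESSED_IN = [
    ("Scr", "tentacle"),
    ("Lox5", "arm IV"),
    ("Antp", "mouth"),
]


def targets(subject, relation):
    """Objects reachable from `subject` by one edge labelled `relation`."""
    return {o for s, r, o in EDGES if s == subject and r == relation}


def homologs(structure):
    """Structures linked by a 'homologous' edge, read in both directions."""
    forward = {o for s, r, o in EDGES if r == "homologous" and s == structure}
    backward = {s for s, r, o in EDGES if r == "homologous" and o == structure}
    return forward | backward


def genes_for_derivatives(bud):
    """Genes expressed in structures that develop from `bud`, or in homologs."""
    structures = targets(bud, "develops into")
    for structure in list(structures):
        structures |= homologs(structure)
    return sorted(gene for gene, where in EXPRESSED_IN if where in structures)


print(genes_for_derivatives("tentacular bud"))
```

With free-text annotations the same question would need a literature search; with typed edges it becomes two set operations.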

Page 11: GIGA2 Structuring Phenotype Data

ONTOLOGIES FOR TRAITS

[Figure: ontology graph. Terms: tentacle, tentacular bud, circumoral appendage, tentacular club, sucker, arm, arm IV, mouth. Relations: develops into, is a subtype of, is part of, homologous, surrounds]

Traits attached to terms:
o shape = shape of tentacular club
o length++ = length of arm IV

Page 12: GIGA2 Structuring Phenotype Data

APPLICATIONS OF ONTOLOGIES

Wild-type phenotypic function:
o The Gene Ontology
Anatomy:
o Uberon anatomy ontology

Page 13: GIGA2 Structuring Phenotype Data

THE GENE ONTOLOGY

For curating the ‘wild type functional phenotypes’
Genes for over 0.5 million species have associations to GO terms
>40,000 terms
o Molecular function
o Cellular component
o Biological process
Core and taxon-specific
Uses include
o Gene set selection
o Term enrichment

Ashburner et al. (2000). Gene Ontology: tool for the unification of biology. Nature Genetics, 25, 25–29.
http://geneontology.org
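Term enrichment is commonly computed with a one-sided hypergeometric test: given a gene list, how surprising is the number of genes annotated to a GO term? The sketch below is a minimal version of that test using only the standard library. The function name and parameters are hypothetical, and real tools also correct for multiple testing, which is omitted here.

```python
from math import comb


def enrichment_p_value(population, annotated, sample, hits):
    """P(X >= hits) for a hypergeometric draw.

    population: number of genes in the background set
    annotated:  background genes annotated to the GO term
    sample:     size of the gene list being tested
    hits:       genes in the list annotated to the term
    """
    upper = min(annotated, sample)
    total = comb(population, sample)
    tail = sum(
        comb(annotated, k) * comb(population - annotated, sample - k)
        for k in range(hits, upper + 1)
    )
    return tail / total


# Toy numbers, chosen only to exercise the function.
print(enrichment_p_value(population=10, annotated=5, sample=5, hits=5))
```

A small p-value suggests the term is over-represented in the list relative to the background.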

Page 14: GIGA2 Structuring Phenotype Data

ASSIGNING GENE FUNCTION

Experimental
o Curated from literature
Automated methods:
o Based on sequence similarity
  E.g. blast2go
o Based on protein features
  Interpro2GO
o Based on phylogenetic evidence
  Ensembl COMPARA
  Panther Families and PAINT
Typically only applied for conserved cellular biology

[Figure: PAINT]

Gaudet, P., et al. (2011). Phylogenetic-based propagation of functional annotations within the Gene Ontology consortium. Briefings in Bioinformatics, 12(5), 449–62. doi:10.1093/bib/bbr042

Page 15: GIGA2 Structuring Phenotype Data

EXTRACTING GENE LISTS AND INTERPRETING TRANSCRIPTOMIC DATA

Wang, Z., Pascual-Anaya, J., Zadissa, A., Li, W., Niimura, Y., Huang, Z., … Irie, N. (2013). The draft genomes of soft-shell turtle and green sea turtle yield insights into the development and evolution of the turtle-specific body plan. Nature Genetics, 45(6), 701–6. doi:10.1038/ng.2615

Page 16: GIGA2 Structuring Phenotype Data

BEYOND THE GO

Functional Genomics: Gene function
o Gene Ontology
o Links genes to what they do
Transcriptomics: Gene expression
o Anatomy and Stage Ontology
o Links genes to where they are expressed
Phenomics: Effects of gene mutations
o Phenotype and Trait Ontology
o Links genes to what happens when they are disrupted

Page 17: GIGA2 Structuring Phenotype Data

THE UBERON MULTI-SPECIES COMPARATIVE ANATOMY ONTOLOGY

Core: 14,000 terms
o Bias towards vertebrate systems
Composite-Metazoan edition: 42,000 terms
o Integrates cell types, developmental stages
o Species-specific ontologies
Uses
o Standard reference for animal anatomy
o Linking model organism databases
o Evolutionary systematics (Phenoscape)
o Comparative transcriptomics (Bgee)
o Standardized vocabulary for mammalian sequencing consortia
o Cross-species phenotype matching (Monarch)

http://uberon.org
Mungall, C. J., Torniai, C., Gkoutos, G. V., Lewis, S. E., & Haendel, M. A. (2012). Uberon, an integrative multi-species anatomy ontology. Genome Biology, 13(1), R5. doi:10.1186/gb-2012-13-1-r5

Page 18: GIGA2 Structuring Phenotype Data

PHENOSCAPE: LINKING EVOLUTION TO GENOMICS USING PHENOTYPE ONTOLOGIES

Phenotypic knowledgebase
o Linking phenotypes to extant and extinct vertebrate taxa
o Integrate with model organism databases
Extending Uberon to cover diversity of vertebrates

Haendel, M. A., Balhoff, J. P., ..., Sereno, P. C., Mungall, C. J. (2014). Unification of multi-species vertebrate anatomy ontologies for comparative biology in Uberon. Journal of Biomedical Semantics, 5(1), 21. doi:10.1186/2041-1480-5-21

Page 19: GIGA2 Structuring Phenotype Data

UBERON FOR COMPARATIVE GENE

EXPRESSION

Page 20: GIGA2 Structuring Phenotype Data

EXAMPLE OF EXPRESSION DATA

Ensembl ID | Gene | Stage ID | Stage | Anatomy ID | Anatomy | Evidence
ENSMUSG00000071424 | Grid2 | UBERON:0000112 | sexually immature | UBERON:0002979 | Purkinje cell layer of cerebellar cortex | high quality
ENSMUSG00000071424 | Grid2 | UBERON:0018241 | prime adult | UBERON:0004720 | cerebellar vermis | high quality

Mus_musculus (‘simple’ expression file)
http://bgee.org/?page=download

Page 21: GIGA2 Structuring Phenotype Data

EXAMPLE OF INFERRED EXPRESSION DATA

Ensembl ID | Gene | Stage ID | Stage | Anatomy ID | Anatomy | Evidence
ENSMUSG00000071424 | Grid2 | UBERON:0000112 | sexually immature | UBERON:0002979 | Purkinje cell layer of cerebellar cortex | high quality
ENSMUSG00000071424 | Grid2 | UBERON:0000112 | sexually immature | UBERON:0002129 | cerebellar cortex | high quality
ENSMUSG00000071424 | Grid2 | UBERON:0000112 | sexually immature | UBERON:0002037 | cerebellum | high quality
ENSMUSG00000071424 | Grid2 | UBERON:0000112 | sexually immature | UBERON:0002028 | hindbrain | high quality
…
ENSMUSG00000071424 | Grid2 | UBERON:0018241 | prime adult | UBERON:0004720 | cerebellar vermis | high quality
ENSMUSG00000071424 | Grid2 | UBERON:0018241 | prime adult | UBERON:0002037 | cerebellum | high quality
…

Mus_musculus (‘complete’ expression file)
http://bgee.org/?page=download
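The step from the ‘simple’ file to the ‘complete’ file can be pictured as propagating each expression call up the anatomy hierarchy: a gene expressed in a part is also expressed in the whole. The sketch below assumes a single part-of parent per term and hard-codes only the terms shown on these slides. Real Uberon terms can have several parents and relation types, and this is not Bgee's actual pipeline.

```python
# Assumed part-of parents for the terms on the slides (child -> parent).
PART_OF = {
    "UBERON:0002979": "UBERON:0002129",  # Purkinje cell layer -> cerebellar cortex
    "UBERON:0002129": "UBERON:0002037",  # cerebellar cortex -> cerebellum
    "UBERON:0004720": "UBERON:0002037",  # cerebellar vermis -> cerebellum
    "UBERON:0002037": "UBERON:0002028",  # cerebellum -> hindbrain
}


def ancestors(term):
    """Walk the part-of chain upward from `term`, nearest ancestor first."""
    chain = []
    while term in PART_OF:
        term = PART_OF[term]
        chain.append(term)
    return chain


def propagate(calls):
    """Return the direct calls plus one inferred call per anatomical ancestor.

    calls: iterable of (gene_id, stage_id, anatomy_id) tuples.
    """
    inferred = set()
    for gene, stage, anatomy in calls:
        inferred.add((gene, stage, anatomy))
        for parent in ancestors(anatomy):
            inferred.add((gene, stage, parent))
    return inferred


simple_calls = [
    ("ENSMUSG00000071424", "UBERON:0000112", "UBERON:0002979"),
    ("ENSMUSG00000071424", "UBERON:0018241", "UBERON:0004720"),
]
complete_calls = propagate(simple_calls)
print(len(complete_calls))
```

Because propagation depends only on the ontology, the same direct annotations can be re-expanded whenever the anatomy hierarchy is revised.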

Page 22: GIGA2 Structuring Phenotype Data

CURATING A DATABASE OF HOMOLOGY HYPOTHESES

https://github.com/BgeeDB/anatomical-similarity-annotations

[Figure: homology hypotheses between Cnidaria and Porifera: gastrodermis homologous to choanoderm; mouth homologous to osculum. Evidence: developmental gene expression evidence (Leininger S, Adamski M, … Adamska M, doi:10.1038/ncomms4905)]

Page 23: GIGA2 Structuring Phenotype Data

ONTOLOGIES FOR DATA STANDARDIZATION IN SEQUENCING CONSORTIA

Malladi, V. S., Erickson, D. T., Podduturi, N. R., Rowe, L. D., Chan, E. T., Davidson, J. M., … Hong, E. L. (2015). Ontology application and use at the ENCODE DCC. Database: The Journal of Biological Databases and Curation, 2015, bav010. doi:10.1093/database/bav010
Washington, N. L., Stinson, E. O., Perry, M. D., et al. (2011). The modENCODE Data Coordination Center: lessons in harvesting comprehensive experimental details. Database, 2011, bar023.

https://www.encodeproject.org/search/?type=biosample

Page 24: GIGA2 Structuring Phenotype Data

DISEASES AND ABNORMAL PHENOTYPES

Monarch Initiative
o Large knowledgebase connecting genes, genotypes and diseases to phenotypes
o Find novel linkages between human diseases and model systems
o http://monarchinitiative.org
Driving use case
o Given a patient with a rare or unique spectrum of abnormal phenotypes, determine the causative genomic variant(s)

Page 25: GIGA2 Structuring Phenotype Data

Standard Clinical Exome Testing Pipeline

Predicts causative variant based on information in genome of patient and background genomic data

Page 26: GIGA2 Structuring Phenotype Data

https://www.sanger.ac.uk/resources/databases/exomiser/query/exomiser2

Robinson, P., et al. (2013). Improved exome prioritization of disease genes through cross species phenotype comparison. Genome Research. doi:10.1101/gr.160325.113

Page 27: GIGA2 Structuring Phenotype Data

EXOMISER USES ONTOLOGY-BASED PHENOTYPE MATCHING

http://monarchinitiative.org/analyze/phenotypes/

cleft palate = cleft (attribute) + palate (structure)
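The decomposition on this slide can be sketched as a small record type: once a phenotype is stored as an attribute plus a structure, it can be retrieved by either part. Only "cleft palate" comes from the slide. The class name, the field names and the other two phenotypes are assumptions added for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Phenotype:
    """A phenotype split into an attribute and the structure it applies to."""
    attribute: str
    structure: str

    @property
    def label(self) -> str:
        return f"{self.attribute} {self.structure}"


# 'cleft palate' is from the slide; the other entries are illustrative.
PHENOTYPES = [
    Phenotype("cleft", "palate"),
    Phenotype("cleft", "lip"),
    Phenotype("short", "palate"),
]


def affecting(structure):
    """Labels of all phenotypes that involve the given structure."""
    return [p.label for p in PHENOTYPES if p.structure == structure]


def with_attribute(attribute):
    """Labels of all phenotypes that share the given attribute."""
    return [p.label for p in PHENOTYPES if p.attribute == attribute]


print(affecting("palate"))
print(with_attribute("cleft"))
```

A plain string such as "cleft palate" supports neither lookup without text matching, which is the gap that the structured form closes.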

Page 28: GIGA2 Structuring Phenotype Data

SOLVING UNDIAGNOSED DISEASES

[Figure: patient phenotypes matched to mouse phenotypes through shared ontology terms]

Patient phenotype | Shared term | Sh3kbp1 tm1Ivdi -/- phenotype
Behavioural/Psychiatric Abnormality | Behavioral abnormality | increased exploration in new environment
Thyroid stimulating hormone excess | Abnormality of the endocrine system | increased dopamine level
Gait apraxia | abnormal locomotor behavior | hyperactivity
Spasticity | Abnormal voluntary movement | hyperactivity

NIH Undiagnosed Disease Program, patient 2731
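Cross-species matching of this kind works because terms from different species can share more general ancestor terms. The sketch below scores two phenotype profiles by the Jaccard overlap of their ancestor closures. The parent links are an assumed reading of the slide's figure, and Jaccard is a simple stand-in chosen for brevity, not the scoring that Exomiser itself uses.

```python
# Assumed parent links (term -> more general terms), read off the slide.
PARENTS = {
    "Behavioural/Psychiatric Abnormality": {"Behavioral abnormality"},
    "increased exploration in new environment": {"Behavioral abnormality"},
    "Thyroid stimulating hormone excess": {"Abnormality of the endocrine system"},
    "increased dopamine level": {"Abnormality of the endocrine system"},
    "Gait apraxia": {"abnormal locomotor behavior"},
    "Spasticity": {"Abnormal voluntary movement"},
    "hyperactivity": {"abnormal locomotor behavior", "Abnormal voluntary movement"},
}


def closure(terms):
    """The given terms plus every term reachable through PARENTS."""
    seen = set()
    stack = list(terms)
    while stack:
        term = stack.pop()
        if term not in seen:
            seen.add(term)
            stack.extend(PARENTS.get(term, ()))
    return seen


def similarity(profile_a, profile_b):
    """Jaccard overlap of the two profiles' ancestor closures."""
    a, b = closure(profile_a), closure(profile_b)
    return len(a & b) / len(a | b)


patient = {
    "Behavioural/Psychiatric Abnormality",
    "Thyroid stimulating hormone excess",
    "Gait apraxia",
    "Spasticity",
}
mouse = {
    "increased exploration in new environment",
    "increased dopamine level",
    "hyperactivity",
}
print(similarity(patient, mouse))
```

The two profiles share no leaf term, yet they score above zero because every patient phenotype meets a mouse phenotype at a common ancestor.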

Page 29: GIGA2 Structuring Phenotype Data

LESSONS

Think about
o How your data will be re-used by others
o How what you're doing will scale
Provide structured metadata for experimental data
o Free text is not enough
o Use ontologies and standardized vocabularies where possible
Failing to do so will cost you later!
o All major human and model organism omics consortia now enforce this
  ENCODE, FANTOM, LINCS
o Also major phenotyping projects
  IMPC/KOMP2
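One lightweight way to act on "free text is not enough" is to check sample records for ontology identifiers at submission time. The sketch below assumes hypothetical field names (`anatomy_id`, `stage_id`) and a simple PREFIX:digits identifier pattern. It only checks the shape of an identifier, not whether the term exists in the ontology.

```python
import re

# Identifier shape such as UBERON:0004720; a deliberately loose assumption.
CURIE = re.compile(r"^[A-Za-z][A-Za-z0-9_]*:\d+$")

# Hypothetical fields that must hold an ontology term identifier.
REQUIRED_TERM_FIELDS = ("anatomy_id", "stage_id")


def unstructured_fields(record):
    """Names of required fields that are missing or hold free text."""
    return [
        field
        for field in REQUIRED_TERM_FIELDS
        if not CURIE.match(record.get(field, ""))
    ]


structured = {
    "anatomy_id": "UBERON:0004720",  # cerebellar vermis
    "stage_id": "UBERON:0018241",    # prime adult
    "description": "vermis, adult mouse",
}
free_text_only = {"anatomy_id": "vermis", "description": "adult mouse"}

print(unstructured_fields(structured))
print(unstructured_fields(free_text_only))
```

Running such a check when data is first recorded is cheaper than re-curating free text after the fact.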

Page 30: GIGA2 Structuring Phenotype Data

LESSONS

Providing metadata requires the right ontologies or vocabularies to be in place
Make phenotypic knowledge about your favorite system structured and computable
o This seems daunting, where do I start…?

Page 31: GIGA2 Structuring Phenotype Data

BGEE WILL CURATE YOUR TRANSCRIPTOME DATA

Got transcriptome data?
o Bgee will curate it for you!
o Caveat: Your genome must be in Ensembl Genomes
o We are also interested in your homology hypotheses
Got classic systematics data?
o Talk to me about using Phenoscape infrastructure

Page 32: GIGA2 Structuring Phenotype Data

GOT ANATOMY EXPERTISE? CLAIM AN INVERTEBRATE MODULE!

[Figure: Uberon Core with modules: Vertebrate structures, Porifera Ontology, Ctenophore Ontology, Cephalopod Ontology (Eric Edsinger, CephSeq), Arthropod Ontology]

Thacker, R. W., Díaz, M. C., Kerner, A., Vignes-Lebbe, R., Segerdell, E., Haendel, M. A., & Mungall, C. J. (2014). The Porifera Ontology (PORO): enhancing sponge systematics with an anatomy ontology. Journal of Biomedical Semantics, 5(1), 39.

http://phenotypercn.org
https://github.com/obophenotype/cephalopod-ontology
https://github.com/obophenotype/ctenophore-ontology
https://github.com/obophenotype/porifera-ontology
https://github.com/obophenotype/uberon

Page 33: GIGA2 Structuring Phenotype Data

CURATE GENE REGULATORY NETWORKS AND PHENOTYPES

Noctua
Curation using multiple ontologies with a graph model
o Web-based, collaborative
o Advanced GO curation
o Phenotype curation
Beta available in summer 2015
o http://noctua.berkeleybop.org

Page 34: GIGA2 Structuring Phenotype Data

CONCLUSIONS

Structured metadata is valuable
o Helps build the knowledge graph of invertebrate genomics
o Capture metadata up-front, not after the fact
o Use ontologies where possible
o Don’t repeat mistakes of projects that ignored this advice
Invertebrate ontologies are at a nascent stage
o This is an opportunity! Get involved!

Page 35: GIGA2 Structuring Phenotype Data

ACKNOWLEDGMENTS

Monarch
o Melissa A Haendel
o Nicole Washington
o Sebastian Kohler
o Harry Hochheiser
o Maryann Martone
o Suzanna Lewis
o Damian Smedley
o Peter Robinson
o William Bone
o Jeremy Nguyen-Xuan

Uberon
o Frederic Bastian
o Ann Niknejad
o Marc Robinson-Rechavi
o Todd Vision
o Jim Balhoff
o Paul Sereno
o Nizar Ibrahim
o Alex Dececchi
o Yvonne Bradford
o Terry Hayamizu
o Robert Druzinsky

NSF Phenotype RCN
o Paula Mabee
o Suzanna Lewis
o Eva Huala
o Andy Deans
o Erik Segerdell
o Robert Thacker
o Eric Edsinger
o Matt Yoder
o Istvan Miko
o David Osumi-Sutherland

Page 36: GIGA2 Structuring Phenotype Data
Page 37: GIGA2 Structuring Phenotype Data

Toward synthesizing our knowledge of morphology: using ontologies and machine reasoning to extract presence/absence evolutionary phenotypes across studies. Dececchi TA et al. https://peerj.com/preprints/807/

Page 38: GIGA2 Structuring Phenotype Data

FORWARD GENOMICS

http://bejerano.stanford.edu/phenotree/public/html/ Hiller et al. 2012 Cell Reports