Ontologies and vocabularies supporting data integration: emphasis on mouse phenotypes and disease model

Control: C3H/HeJ

Homozygous: Fasl<gld>/Fasl<gld>

The mouse generalized lymphoproliferative disease (gld) mutation in the FAS ligand (TNF superfamily, member 6) gene. These mice model human Autoimmune Lymphoproliferative Syndrome (ALPS), type IB.

Janan T. Eppig

PATO Meeting, Dec. 2006

The genetic tools for mouse provide an ideal platform for experimentation:

• genetic engineering techniques to specifically manipulate the genome

• sequenced genome

• inbred strains

• high resolution maps

• mammal: small, easy to breed and maintain, short lifespan

• similar to human genetically & physiologically

• human disease model

• ES cell lines

• short domed skull

• short-limbed dwarfism

• malocclusion

• bulging abdomen as adults

• respiratory problems

• shortened lifespan

Achondroplasia

Homozygous achondroplasia mouse mutant and control

…facilitating the use of the mouse as a model for human biology by providing integrated access to data on the genetics, genomics, and biology of the laboratory mouse.

www.informatics.jax.org

Objective

…make phenotype and disease model data robust and accessible to researchers and computational biologists

• semantically consistent search methods

• integrated access to all phenotypic variation sources (single-gene and genomic mutations, QTLs, strains)

• ability to query across sequence, orthology, expression, function, phenotype, disease

• data on human disease correlation

• access to mouse models from various approaches
  - Genetic
  - Phenotypic
  - Computational

Existing Wealth of Mouse Phenotype Data in MGI

• >16,800 phenotypic alleles representing ≈6,830 unique genes

• >71,000 annotations associating MP terms to genotypes

• >6,550 phenotype records for 3,210 QTL

• >9,000 strains catalogued

A few of the challenges

• alleles can produce pleiotropic phenotypic effects

• non-allelic mutations can produce indistinguishable phenotypes

• modifiers and epistasis can influence mutant phenotypes

• alleles of different genes can interact to produce unique phenotypes

• genetic background can greatly influence mutant phenotypes

• imprinted genes/alleles influence phenotype

• quantitative trait loci (QTLs) can contribute unequally to phenotypes

• genomic mutations can delete or disrupt multiple genes

• strains (“whole-genome”) have characteristic phenotypes

• complex genetically engineered and multiple mutation stocks are often developed for disease models

• environmental influences and age can dramatically affect phenotype

Data Challenge

Mouse phenotype data from

• publications

• electronic submissions

• mutagenesis (ENU centers)

(≈ 300 new alleles; ≈ 700 publications per month on phenotypes)

New initiatives to knock out every gene in the mouse in the next 5 years…

Need for efficiency, accuracy, full description of complex observations, storage/analysis of individual and population data

Making semantic sense

Controlled vocabularies/nomenclatures

• Strains

• Genes

• Alleles (phenotypic or variant)

• Classes of genetic markers

• Types of mutations

• Types of assays

• Developmental stages

• Tissues

• Clone libraries

• ES cell lines

….. organized as lists or simple hierarchies

[Figure: examples of controlled vocabularies: clone library names, inbred strain names, gene symbols]

[Figure: Hbp1 (high mobility group box transcription factor 1) gene expression differences in Kit<W-e>/Kit<W-e> homozygotes vs wild-type; labeled parts: assay, gene nomenclature, results, specimen]

Semantics plus relationship data

Ontologies/structured vocabularies

• Gene Ontology (GO)
  - Molecular function
  - Biological process
  - Cellular component

• Mouse Anatomy (MA)
  - Embryonic
  - Adult

• Mammalian Phenotype (MP)

• Sequence Ontology (SO)

….. organized as directed acyclic graphs (DAGs)
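
What distinguishes a DAG from the simple hierarchies used for controlled vocabularies is that a term may have more than one parent, while no term may be its own ancestor. A minimal sketch of that idea follows; the term names are hypothetical and are not taken from GO, MA, MP, or SO.

```python
# Hypothetical terms, for illustration only (not real ontology content).
# Each key is a term; its value lists the term's direct parents.
PARENTS = {
    "phenotype": [],
    "skeleton phenotype": ["phenotype"],
    "craniofacial phenotype": ["phenotype"],
    # Two parents: this is what a tree cannot express but a DAG can.
    "skull phenotype": ["skeleton phenotype", "craniofacial phenotype"],
}


def ancestors(term, parents):
    """Return every term reachable by following parent links upward."""
    seen = set()
    stack = list(parents[term])
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents[current])
    return seen


def is_acyclic(parents):
    """True if no term appears among its own ancestors."""
    return all(term not in ancestors(term, parents) for term in parents)


print(sorted(ancestors("skull phenotype", PARENTS)))
print(is_acyclic(PARENTS))
```

A query for "skull phenotype" can therefore be reached from either the skeleton branch or the craniofacial branch, which is the practical benefit of the graph structure for browsing and search.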

DAGs

Phenotype detail, including genotypes for mouse models of human diseases

Navigating the views of phenotypes & disease

1. Gene Page

2. Phenotype Query

3. MP Ontology

4. Disease vocabulary

5. Sequence (GBrowse)

[Diagram labels: "Summary: phenotype classes & human disease associated"; "Summary: genotype, MP term, & ref"; "Human/mouse disease relationships"]

enlarged brain ventricles: L1cam<tm1Mtei>/Y

• 129/SvEv: none affected

• C57BL/6J: high percentage affected

postnatal death: Gnas<tm1Kel-pat>/Gnas<+>

• 129/Sv * C57BL/6J: most die by P2; all by P9

• 129/Sv * C57BL/6J * CD-1: most die by P9; 10-20% survive past P21

TMEV viral susceptibility: Cd8a<tm1Mak>/Cd8a<tm1Mak>

• C57BL/6: inflammation after infection resolves by 45 days; disease is absent by 10 mo.

• PL/J: viral infection persists

Genotype = allele combinations carried in the context of a specific genetic background (strain)
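
The definition above implies that the same allele combination on two strains counts as two distinct genotypes. A minimal sketch of that data structure, using the enlarged brain ventricles example from this slide, might look like the following; the class and function names are hypothetical and do not describe the MGI schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Genotype:
    """Allele combination plus genetic background (hypothetical class)."""
    alleles: str      # allele superscripts written in angle brackets
    background: str   # strain


# Observations keyed by genotype, copied from the slide's example.
OBSERVATIONS = {
    Genotype("L1cam<tm1Mtei>/Y", "129/SvEv"): "none affected",
    Genotype("L1cam<tm1Mtei>/Y", "C57BL/6J"): "high percentage affected",
}


def backgrounds_for(alleles, observations):
    """List the strains on which a given allele combination was observed."""
    return sorted(g.background for g in observations if g.alleles == alleles)


print(backgrounds_for("L1cam<tm1Mtei>/Y", OBSERVATIONS))
```

Because the strain is part of the key, the two contrasting observations are stored side by side rather than overwriting each other.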

Mammalian Phenotype Ontology

• Structured as DAG

• Over 4,500 terms covering physiological systems, behavior, development and survival

• Available in browser and OBO formats from MGI ftp and OBO sites

• Each term linked to all annotations to the term or its children
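
Linking each term to the annotations of its children means a query on a general term also returns genotypes annotated to more specific terms. The sketch below shows one way this roll-up could work. It assumes "children" extends to all descendants, and the term names and genotype labels are hypothetical.

```python
# Hypothetical terms and genotype labels, for illustration only.
CHILDREN = {
    "abnormal eye": ["small eye", "absent eye"],
    "small eye": [],
    "absent eye": [],
}

# Genotypes annotated directly to each term.
DIRECT = {
    "abnormal eye": ["genotype A"],
    "small eye": ["genotype B", "genotype C"],
    "absent eye": ["genotype C"],
}


def descendants(term, children):
    """Return all terms below the given term in the graph."""
    found = set()
    stack = list(children.get(term, []))
    while stack:
        current = stack.pop()
        if current not in found:
            found.add(current)
            stack.extend(children.get(current, []))
    return found


def annotations_for(term, children, direct):
    """Genotypes annotated to the term itself or to any descendant."""
    terms = {term} | descendants(term, children)
    return sorted({g for t in terms for g in direct.get(t, [])})


print(annotations_for("abnormal eye", CHILDREN, DIRECT))
print(annotations_for("small eye", CHILDREN, DIRECT))
```

Collecting results in a set before sorting keeps a genotype from being listed twice when it is annotated to more than one descendant term.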

Summary Results

• Genotypes that are annotated to a term or children of the term

• References supporting annotation

• Links to allele detail pages for full mutant phenotype

Allele Detail Page

• full phenotype annotations (MP) for each genotype

• specific detail for MP terms

• each MP annotation referenced

• human diseases for which genotype is used as a model

Mouse model genotypes linked to phenotype details

Genes associated with phenotypes characteristic of a disease in human, mouse, or both

osteopetrosis

Human-mouse disease relationships

• OMIM terms: 6,113

• Genotypes associated w/ OMIM: 1,847

• OMIM associated w/ genotypes: 720

to Human Disease and Mouse Model Page

Vocabularies in MGI

[Diagram: generalized vocabulary structure]

• DAGs

• Vocabulary terms, with Definition and Synonyms (MP:1956): Respiratory failure; Postnatal lethality; Dilated renal tubules; Growth retardation

• Genotypes
  - Strain: AEJ; Alleles: bd/bd
  - Strain: C57BL/6; Alleles: Ppp1r3a<tm1Adpt>/Ppp1r3a<tm1Adpt>

• Annotations, with reference, evidence code, and Note: J:65378 TAS; J:62648 IDA; J:65322 EE

Making Mammalian Phenotype Ontology Work

DAG

• accommodate bio-specific terms

• computationally useful

• human accessible

• practical for curation

• cross-reference to other ontologies

Terms in MP

MP term | Entity | Quality | Other Info
microphthalmia | eye | small size |
hydrocephaly | cerebro-spinal fluid | increased, excessive |
hydrocephaly | brain | large size (dilated) |
hydrocephaly | brain | | trauma observed
increased blood pressure | ? | increased |
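
The decomposition of MP terms into an entity and a quality can be represented as a small data structure. The sketch below is one possible encoding, not the PATO or MGI representation; the class and function names are hypothetical, and only the rows whose reading is unambiguous on the slide are included. An entity of `None` stands for the "?" shown for increased blood pressure.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class EQ:
    """One entity-quality statement (hypothetical class)."""
    entity: Optional[str]  # None where the slide shows "?"
    quality: str


# One MP term may decompose into several entity-quality statements.
MP_TO_EQ = {
    "microphthalmia": [EQ("eye", "small size")],
    "hydrocephaly": [
        EQ("cerebro-spinal fluid", "increased"),
        EQ("brain", "large size"),
    ],
    "increased blood pressure": [EQ(None, "increased")],
}


def terms_about(entity, mapping):
    """MP terms with at least one statement about the given entity."""
    return sorted(
        term for term, eqs in mapping.items()
        if any(eq.entity == entity for eq in eqs)
    )


def terms_without_entity(mapping):
    """MP terms for which some statement has no entity assigned."""
    return sorted(
        term for term, eqs in mapping.items()
        if any(eq.entity is None for eq in eqs)
    )


print(terms_about("brain", MP_TO_EQ))
print(terms_without_entity(MP_TO_EQ))
```

Listing the terms that lack an entity is a simple way to surface cases, such as increased blood pressure, where the decomposition is still an open question.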

Future MP Ontology Development

• New terms from ongoing curation process

• Collaborative community efforts
  - identify new terms
  - revise organization of existing terms within particular branches

• Recruit domain experts for systematic review

• Cross-ref and comparison to other relevant ontologies (GO, Anatomy, Cell Type, Mpath, etc.)

MP Ontology Growth

[Chart: number of MP terms (axis from 0 to 4,500) by year, 2003 to 2006]

Collaborators

…currently annotating with MP and contributing to MP development

• Rat Genome Database (RGD)

• Mouse Mutagenesis Centers

• Human (NCBI/dbSNP)

• Online Mendelian Inheritance in Animals (OMIA)

…under discussion

• Teratology Society

• Animal Traits

Summary

• Structured vocabularies and ontologies support semantic integration for the MGI system and promote broader integration of mouse knowledge

• To meet community needs, practical implementations parallel formal ontological development

• MGI has implemented a generalized structure for vocabularies and ontologies

• The Mouse Genome Informatics group continues its strong interest and participation in community bio-ontology efforts

Human FOXN1 (forkhead box N1)

T-CELL IMMUNODEFICIENCY, CONGENITAL ALOPECIA, AND NAIL DYSTROPHY

Frank J, et al. Nature 398, 473-474 (1999)

Mouse Foxn1

Homozygous “nude” mouse. One of eight known phenotypic mutations in mouse (6 spontaneous; 2 engineered) for the forkhead box N1 gene.
