SAB 2008 LITERATURE CURATION Overview & Integrated Phenotype Curation

Page 1: SAB 2008 LITERATURE CURATION Overview & Integrated Phenotype Curation

SAB 2008

LITERATURE CURATION

Overview & Integrated Phenotype Curation

Page 2: SAB 2008 LITERATURE CURATION Overview & Integrated Phenotype Curation

Literature Curation - Data Flow for First Pass

Paper collection
Web Interface
First Pass: data flagged with comments for 32 different data types
Postgres DB
Some data types stored for future curation
Active Curation: Data Extraction
Database Input Files
Local Databases: Caltech DB, Sanger DB, St. Louis DB
Complete DB

Page 3: SAB 2008 LITERATURE CURATION Overview & Integrated Phenotype Curation

First-Pass Curation Fields (based on 5787 Papers)

Bar chart (axis 0 to 1600), counts per field:

Expression data: 1546
RNAi: 1291
Transgene: 958
Gene product interactions: 917
Mutant phenotype: 877
Gene function: 811
Sequence change: 794
Gene-gene interactions: 789
Antibody: 697
Gene-seq, gene name, synonym: 493
Gene regulation: 472
Structure correction: 320
Site of action analysis: 293
Overexpression: 281
New allele: 279
Sequence features: 194
Protein functions in vitro: 172
Mapping data: 166
Structural info: 147
Cell (name, function, ablation): 140
Microarray: 93
Mosaic analysis: 81
Covalent modification: 57
SNPs: 41
Mass-Spec: 34
RNAi (large-scale): 20
Functional complementation: 18
Chemicals: 18
Human diseases: 12

Page 4: SAB 2008 LITERATURE CURATION Overview & Integrated Phenotype Curation

Data type | Objects in WS170 | Objects in WS190/91 | % Change | % Complete*

Gene Identity and Function:
Mutant Phenotype (total alleles) | 2736 | 4675 | 71% | 20%
RNAi (Large and small scale) | 64461 | 74427 | 15% | 53%
Overexpression | 5 | 9 | 80% | < 1%
Nomenclature Data | - | - | - | 100%

Interactions:
Genetic interactions | 4920 | 6795 | 38% | -
Gene Product Interaction (Y2H) | 11573 | 11573 | - | -

Cell Data:
Cell Function (ablation and mosaics) | 0 | 183 | NA | 15%

Gene Expression and Function:
Expression Data | 6355 | 9744 | 53% | 100%
Gene Regulation on Expression Level | 642 | 2044 | 218% | 100%
Microarray | 40 | 53** | 33% | 71%

Sequence Data:
Feature Data*** | Start up phase
Sequence Change | - | - | - | 100%

Reagents:
Transgene | 4062 | 5151 | 27% | 100%
C elegans Antibodies | 1084 | 1324 | 22% | 100%

Concise Description:†
Total Descriptions | 4398 | 5335 | 21% | -
Genes w/ > 5 references | - | - | 7% | 85%
Genes w/ > 1 reference | - | - | 5% | 53%

Gene Ontology:†
Total GO annotations | 75,065 | 141,937 | 47% | -
Total non-IEA GO annotation | 25,634 | 33,045 | 22% | -

Data from 78 papers since WS 170
Sanger Request Tracker - 67 since WS 170
Data from 97 papers since WS 170

* Based on first pass papers completed unless otherwise noted
† Outside of first pass
** Includes one tiling array
*** Data from Sanger RT, not only first pass
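The % Change column appears to be the increase relative to the WS170 count. A minimal Python sketch under that assumption, using three rows from the table (the name `pct_change` is illustrative and does not come from the slides):

```python
def pct_change(old: int, new: int) -> float:
    """Percent increase of `new` over `old` (assumed meaning of "% Change")."""
    if old == 0:
        # Matches the "NA" shown for Cell Function, which starts from 0 objects.
        raise ValueError("percent change is undefined when the starting count is 0")
    return (new - old) / old * 100.0


# (WS170 count, WS190/91 count) copied from the table
rows = {
    "Mutant Phenotype (total alleles)": (2736, 4675),
    "RNAi (Large and small scale)": (64461, 74427),
    "Gene Regulation on Expression Level": (642, 2044),
}

for name, (ws170, ws190) in rows.items():
    print(f"{name}: {pct_change(ws170, ws190):.0f}%")
```

For these three rows the sketch reproduces the 71%, 15% and 218% shown on the slide. The Gene Ontology rows do not fit this formula, so they may have been computed on a different basis.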

Page 5: SAB 2008 LITERATURE CURATION Overview & Integrated Phenotype Curation

Phenotype Ontology

Provides a controlled vocabulary for phenotypic descriptions, organized hierarchically

Can annotate phenotypes to a very granular level, preserving associations with more general terms

Many Data Types Include a Phenotype Assignment

Phenotype Annotations
- Consistency
- Efficiency

Data types:
Gene Identity and Function: Mutant Phenotype (total alleles), RNAi (Large and small scale), Overexpression, Nomenclature Data
Interactions: Genetic interactions, Gene Product Interaction (Y2H)
Cell Data: Cell Function (ablation and mosaics)
Gene Expression and Function: Expression Data, Gene Regulation on Expression Level, Microarray
Sequence Data: Feature Data, Sequence Change
Reagents: Transgene, C elegans Antibodies
Concise Description: Total Descriptions, Genes w/ > 5 references, Genes w/ > 1 reference
Gene Ontology: Total GO annotations, Total non-IEA GO annotation

Page 6: SAB 2008 LITERATURE CURATION Overview & Integrated Phenotype Curation

The WormBase Phenotype Ontology is Hierarchical:

Page 7: SAB 2008 LITERATURE CURATION Overview & Integrated Phenotype Curation

Annotate to a very granular level, preserving associations with more general terms

Term Name: multivulva
Synonyms: Muv
Definition (w/ references): Multiple vulva-like protrusions are present along the ventral side of the animal. This is usually a result of all six vulval precursor cells adopting vulval (1° or 2°) fates.

Term path (OBO-EDIT):
reproductive_system_development_abnormal
  vulva_development_abnormal
    vulva_cell_fate_specification_abnormal
      vulval_cell_induction_abnormal
        vulval_cell_induction_increased
          multivulva
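A minimal Python sketch of how an annotation to a granular term keeps its association with more general terms. The parent links follow the term names shown on this slide and are an assumption about the exact is_a structure; the `PARENT` mapping and the `ancestors` and `is_a` functions are illustrative names, not part of WormBase or OBO-Edit.

```python
from __future__ import annotations

# child term -> parent term. A single-parent chain is assumed for simplicity;
# a real ontology is a graph in which a term can have several parents.
PARENT = {
    "multivulva": "vulval_cell_induction_increased",
    "vulval_cell_induction_increased": "vulval_cell_induction_abnormal",
    "vulval_cell_induction_abnormal": "vulva_cell_fate_specification_abnormal",
    "vulva_cell_fate_specification_abnormal": "vulva_development_abnormal",
    "vulva_development_abnormal": "reproductive_system_development_abnormal",
}


def ancestors(term: str) -> list[str]:
    """Return every more general term, nearest parent first."""
    path = []
    while term in PARENT:
        term = PARENT[term]
        path.append(term)
    return path


def is_a(term: str, general: str) -> bool:
    """True if `term` is `general` itself or one of its descendants."""
    return term == general or general in ancestors(term)


print(ancestors("multivulva"))
print(is_a("multivulva", "vulva_development_abnormal"))
```

Under these assumptions, an allele annotated only to multivulva would still be found by a query for vulva_development_abnormal, because the general term is reached by walking up the chain.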

Page 8: SAB 2008 LITERATURE CURATION Overview & Integrated Phenotype Curation

Using the Ontological Structure for Data Retrieval

Query by Name, WB ID or Synonym

Also for: Gene Ontology, Anatomy Ontology

Output: Showing children of parent term and annotations

See individual annotations with references

Vulva development
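The retrieval described on this slide can be sketched in Python as a lookup that resolves a name or synonym, then collects annotations on the term and on all of its children. This is a sketch under assumptions: the term fragment is taken from the Muv example, the single annotation comes from the Poulin et al. example later in the deck, and all function and variable names are hypothetical. WB IDs are left out because none appear in the slides.

```python
from collections import defaultdict

# parent term -> child terms (assumed fragment of the ontology)
CHILDREN = {
    "vulva_development_abnormal": ["vulva_cell_fate_specification_abnormal"],
    "vulva_cell_fate_specification_abnormal": ["vulval_cell_induction_abnormal"],
    "vulval_cell_induction_abnormal": ["vulval_cell_induction_increased"],
    "vulval_cell_induction_increased": ["multivulva"],
}
SYNONYMS = {"Muv": "multivulva"}  # synonym -> term name

# term -> list of (annotated object, reference)
ANNOTATIONS = defaultdict(list)
ANNOTATIONS["multivulva"].append(("smo-1(RNAi)", "Poulin et al., EMBO J. 2005"))


def resolve(query):
    """Map a synonym to its term name; plain term names pass through."""
    return SYNONYMS.get(query, query)


def descendants(term):
    """All terms below `term`, found by a depth-first walk."""
    found = []
    stack = list(CHILDREN.get(term, []))
    while stack:
        current = stack.pop()
        if current not in found:
            found.append(current)
            stack.extend(CHILDREN.get(current, []))
    return found


def annotations_under(query):
    """Annotations on the queried term and on every term beneath it."""
    term = resolve(query)
    hits = []
    for t in [term] + descendants(term):
        for obj, ref in ANNOTATIONS.get(t, []):
            hits.append((t, obj, ref))
    return hits


print(annotations_under("vulva_development_abnormal"))
print(annotations_under("Muv"))
```

Both queries return the same smo-1(RNAi) annotation: one reaches it through the parent term, the other through the synonym.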

Page 9: SAB 2008 LITERATURE CURATION Overview & Integrated Phenotype Curation

Phenotype Ontology Overview:

Development:

Maintained with OBO-EDIT and registered with OBO Foundry (NCBO)
OBO-Edit is developed by the Berkeley Bioinformatics and Ontologies Project, and is funded by the Gene Ontology Consortium.

Release | Phenotype Terms | Defined terms | Percent Defined | Terms used (%)
WS160 - Jul, 2006 (prior to PO) | 119 | 0 | 0 | ---
WS170 - Feb, 2007 | 1394 | 237 | 17% | 40%
Current | 1677 | 708 | 42% | 60%

We will continue development in parallel w/ curation - reflects the developing complexity with which terms are described in literature

Refined by usage

(Currently there are 4,675 alleles curated w/ 10,468 phenotype associations ~125% WS170)

Community Input: Seek input from experts in certain fields to develop the ontology

Page 10: SAB 2008 LITERATURE CURATION Overview & Integrated Phenotype Curation

annotations 355

The embryonic_lethal branch - Fabio Piano and Kris Gunsalus
Expert input leads to granularity that reflects term usage

Page 11: SAB 2008 LITERATURE CURATION Overview & Integrated Phenotype Curation

Integrated Phenotype Curation
Initial paradigm - one curator = one data type

Paper: Chromatin regulation and sumoylation in the inhibition of Ras-induced vulval development in Caenorhabditis elegans.
Poulin et al - EMBO J. 2005 Jul 20;24(14):2613-23.

First Pass flags for smo-1: RNAi, Interactions (genetic), Gene Regulation (change in expression)

RNAi Phenotype: “RNAi of smo-1 on its own induces a low percentage of Muv animals”.

RNAi based Interaction (Synthetic): “smo-1 displays synMuv activity in both class A and class B backgrounds”

RNAi based Interaction (Enhancement): “let-60(n2021) increase in the percentage of Muv animals compared to smo-1(RNAi) alone”

RNAi based Gene Regulation (Ectopic): “RNAi of the sumoylation pathway gene smo-1 leads to ectopic lag-2 expression”

Page 12: SAB 2008 LITERATURE CURATION Overview & Integrated Phenotype Curation

Need for curation integration: RNAi curation as an example

First Pass

Interactions, RNAi, Gene Regulation

RNAi based Interactions

RNAi curation form has functionality to generate interaction objects

Enter number of interacting genes

Page 13: SAB 2008 LITERATURE CURATION Overview & Integrated Phenotype Curation

“let-60(n2021) increase in the percentage of Muv animals compared to smo-1(RNAi) alone”

Example interaction object: Enhancement of Muv between let-60(n2021) and smo-1(RNAi)

- Keep track in Postgres database - avoids redundant curation
- Currently there are 2493 RNAi-based interactions in WormBase
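One way tracking can prevent redundant curation is a uniqueness constraint on the interaction record, so that a second curator entering the same interaction is told it already exists. The sketch below is an assumption-laden illustration: it uses Python's built-in `sqlite3` as a stand-in for the Postgres database, and the table name, column names and `record_interaction` function are hypothetical rather than the actual WormBase schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in for the curation Postgres DB
conn.execute(
    """
    CREATE TABLE rnai_interaction (
        paper            TEXT NOT NULL,
        interaction_type TEXT NOT NULL,
        phenotype        TEXT NOT NULL,
        gene_a           TEXT NOT NULL,
        gene_b           TEXT NOT NULL,
        UNIQUE (paper, interaction_type, phenotype, gene_a, gene_b)
    )
    """
)


def record_interaction(paper, interaction_type, phenotype, gene_a, gene_b):
    """Insert the interaction once; return False if it was already recorded."""
    # Sort the pair so the same two interactors match in either order.
    a, b = sorted((gene_a, gene_b))
    cur = conn.execute(
        "INSERT OR IGNORE INTO rnai_interaction VALUES (?, ?, ?, ?, ?)",
        (paper, interaction_type, phenotype, a, b),
    )
    conn.commit()
    return cur.rowcount == 1


first = record_interaction(
    "Poulin et al. 2005", "Enhancement", "Muv", "let-60(n2021)", "smo-1(RNAi)"
)
second = record_interaction(
    "Poulin et al. 2005", "Enhancement", "Muv", "smo-1(RNAi)", "let-60(n2021)"
)
print(first, second)
```

The second call describes the same interaction with the interactors swapped, so it is ignored. In Postgres the comparable statement would be `INSERT ... ON CONFLICT DO NOTHING`.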

Page 14: SAB 2008 LITERATURE CURATION Overview & Integrated Phenotype Curation

Coordination of RNAi based Gene Regulation

· If an RNAi object is created first: I enter information here so that Xiaodong can create a gene regulation object for the RNAi object

· If a gene regulation object is created first: Xiaodong creates a gene regulation object and I input the object name here

· Currently there are 365 RNAi-based gene regulations (46% from WS170)

· Need to set up a tracking system in Postgres

Page 15: SAB 2008 LITERATURE CURATION Overview & Integrated Phenotype Curation

Towards Integrating RNAi and Allele Curation

First pass
Postgres
RNAi: RNAi Checkout
Alleles: Allele Checkout
