Core 2: Bioinformatics
CBio-Berkeley
Outline
• Berkeley group background
• Core 2 first round
  – what: aims, milestones
  – how: software lifecycle, interaction w/ other cores
• Current progress
• Discussion
Berkeley group: genomics
• Formerly BDGP (Berkeley Drosophila Genome Project) Informatics
  – Genome sequencing, analysis and annotation
  – Genomic application development
  – Database development
    • FlyBase
    • Generic Model Organism Database
Apollo
GBrowse
In-situ expression database
Genomics applications
• GadFly
  – analysis and annotation database
  – pipeline software
• BOP
  – computational analysis integration
• CGL
  – Comparative Genomics Software Library
SO and SOFA
• Sequence Ontology for Feature Annotation
• Ontology for genomics
  – Sequence feature classes:
    • mRNA, intron, UTR, sequence_variant, …
  – Sequence feature relations
    • exon part_of transcript
    • polypeptide derives_from mRNA
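The two relations above can be read as typed edges between individual features. The Python sketch below is a minimal illustration. It assumes a hypothetical in-memory set of features, with made-up identifiers `mrna1`, `exon1`, `utr1` and `pep1`, each typed by an SO class name. Starting from a polypeptide, it follows `derives_from` forwards to the mRNA and then follows `part_of` backwards to collect the exons. The SO type is what separates exons from the other parts.

```python
# Hypothetical feature instances, each typed by an SO class name.
FEATURE_TYPE = {
    "mrna1": "mRNA",
    "exon1": "exon",
    "exon2": "exon",
    "utr1": "five_prime_UTR",
    "pep1": "polypeptide",
}

# Typed relations between instances: (subject, relation, object).
EDGES = [
    ("exon1", "part_of", "mrna1"),
    ("exon2", "part_of", "mrna1"),
    ("utr1", "part_of", "mrna1"),
    ("pep1", "derives_from", "mrna1"),
]


def objects(subject, relation):
    """Follow a relation forwards (subject -> objects)."""
    return [o for s, r, o in EDGES if s == subject and r == relation]


def subjects(obj, relation):
    """Follow a relation backwards (object -> subjects)."""
    return [s for s, r, o in EDGES if o == obj and r == relation]


def exons_for_polypeptide(pep):
    """Exons that are part_of the mRNA(s) the polypeptide derives_from."""
    exons = []
    for mrna in objects(pep, "derives_from"):
        for part in subjects(mrna, "part_of"):
            if FEATURE_TYPE[part] == "exon":  # the SO type filters out the UTR
                exons.append(part)
    return sorted(exons)


print(exons_for_polypeptide("pep1"))  # ['exon1', 'exon2']
```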
Chado
• Model organism relational database schema
  – FlyBase, GMOD
• Modules
  – sequence annotations
  – expression
  – map
  – genotype
  – phenotype
  – ontology/cv
  – …
• Generic schema
  – Uses ontologies for strong typing
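"Uses ontologies for strong typing" means that feature types and relation types are rows in an ontology term table rather than columns or tables in the schema. The sketch below uses Python's built-in `sqlite3`. Its three tables are loosely modelled on Chado's `cvterm`, `feature` and `feature_relationship` tables, but the columns are heavily reduced, so this is not the real Chado DDL. Foreign keys make the database reject any type that is not a known term.

```python
import sqlite3

# Heavily simplified tables loosely modelled on Chado's cvterm, feature and
# feature_relationship tables; columns are reduced and this is NOT the real DDL.
SCHEMA = """
CREATE TABLE cvterm (
    cvterm_id INTEGER PRIMARY KEY,
    name      TEXT NOT NULL UNIQUE
);
CREATE TABLE feature (
    feature_id INTEGER PRIMARY KEY,
    uniquename TEXT NOT NULL UNIQUE,
    type_id    INTEGER NOT NULL REFERENCES cvterm (cvterm_id)
);
CREATE TABLE feature_relationship (
    subject_id INTEGER NOT NULL REFERENCES feature (feature_id),
    object_id  INTEGER NOT NULL REFERENCES feature (feature_id),
    type_id    INTEGER NOT NULL REFERENCES cvterm (cvterm_id)
);
"""


def build_db():
    con = sqlite3.connect(":memory:")
    con.execute("PRAGMA foreign_keys = ON")  # every type_id must name a cvterm
    con.executescript(SCHEMA)
    # Feature types and relation types are ontology terms stored as rows.
    con.executemany("INSERT INTO cvterm (name) VALUES (?)",
                    [("mRNA",), ("exon",), ("part_of",)])

    def term(name):
        return con.execute("SELECT cvterm_id FROM cvterm WHERE name = ?",
                           (name,)).fetchone()[0]

    def feature_id(name):
        return con.execute("SELECT feature_id FROM feature WHERE uniquename = ?",
                           (name,)).fetchone()[0]

    con.executemany("INSERT INTO feature (uniquename, type_id) VALUES (?, ?)",
                    [("mrna1", term("mRNA")), ("exon1", term("exon")),
                     ("exon2", term("exon"))])
    for exon in ("exon1", "exon2"):
        con.execute("INSERT INTO feature_relationship VALUES (?, ?, ?)",
                    (feature_id(exon), feature_id("mrna1"), term("part_of")))
    con.commit()
    return con


def parts_of(con, parent, part_type):
    """Uniquenames of features of part_type that are part_of parent."""
    rows = con.execute("""
        SELECT s.uniquename
        FROM feature_relationship r
        JOIN feature s ON s.feature_id = r.subject_id
        JOIN feature o ON o.feature_id = r.object_id
        JOIN cvterm st ON st.cvterm_id = s.type_id
        JOIN cvterm rt ON rt.cvterm_id = r.type_id
        WHERE o.uniquename = ? AND rt.name = 'part_of' AND st.name = ?
        ORDER BY s.uniquename""", (parent, part_type))
    return [name for (name,) in rows]


con = build_db()
print(parts_of(con, "mrna1", "exon"))  # ['exon1', 'exon2']
```

Because the types are data, adding a new SO feature type takes one `INSERT` into `cvterm` and needs no schema migration.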
Berkeley group: GO
• Gene Ontology informatics
  – Database, web portal
  – Ontology editing tools
  – Ontology QC and integration
  – OBO
OBO-Edit (formerly DAG-Edit)
AmiGO and GO Database
Obol
• Problem: large ontologies of composite terms are difficult to manage
• Solution: partial automation (reasoners)
• Requires logical definitions
  – how do we obtain them?
• Solution: Obol
  – Parses logical definitions from class names
  – Logical definitions can be reasoned over
    • error detection and automated classification
  – Integrates OBO ontologies
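The sketch below is a toy version of the idea of parsing logical definitions out of class names. It is not Obol's actual grammar. The two naming patterns, the relation names and the small filler hierarchy are all hypothetical. Each matched name becomes a genus-differentia definition of the form (genus, relation, filler). Once names are in that form, a reasoner can infer subclass links: "R some X" is subsumed by "R some Y" whenever X is_a Y.

```python
import re

# Hypothetical naming patterns mapped to (genus, relation). This is a toy
# illustration of the idea, not Obol's actual grammar.
PATTERNS = [
    (re.compile(r"^regulation of (?P<filler>.+)$"), "regulation", "regulates"),
    (re.compile(r"^(?P<filler>.+) development$"), "development",
     "results_in_development_of"),
]

# Hypothetical is_a links between differentia fillers (child -> parent).
FILLER_IS_A = {"mitosis": "cell cycle process", "wing": "appendage"}


def parse(name):
    """Return a genus-differentia definition (genus, relation, filler) or None."""
    for pattern, genus, relation in PATTERNS:
        match = pattern.match(name)
        if match:
            return genus, relation, match.group("filler")
    return None


def filler_ancestors(term):
    """The term itself plus its ancestors in FILLER_IS_A (assumed acyclic)."""
    chain = [term]
    while chain[-1] in FILLER_IS_A:
        chain.append(FILLER_IS_A[chain[-1]])
    return chain


def inferred_subclass(name_a, name_b):
    """A is inferred to be a subclass of B when both parse to the same genus and
    relation and A's filler is_a B's filler (R some X implies R some Y)."""
    a, b = parse(name_a), parse(name_b)
    if a is None or b is None:
        return False
    return a[:2] == b[:2] and b[2] in filler_ancestors(a[2])


print(parse("regulation of mitosis"))  # ('regulation', 'regulates', 'mitosis')
print(inferred_subclass("wing development", "appendage development"))  # True
```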
OBO Relations Ontology
• Common relations used across ontologies must mean the same thing
  – is_a
  – part_of
  – derives_from
  – has_participant
  – …
• OBO relations ontology provides precise definitions
  – defines class-level relations in terms of their instances
• http://obo.sourceforge.net/relationship
  – collaboration with core 5, Manchester & others
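"Defining class-level relations in terms of their instances" means, for example, that "C part_of D" holds when every instance of C is part_of some instance of D. The sketch below is a simplified reading of that all-some pattern; for instance, it ignores time. It checks the definition against hypothetical instance data. Note that the class-level relation is directional, even though each individual nucleus-cell link is only an instance-level fact.

```python
# Hypothetical instance-level data: each instance's class, plus instance-level
# part_of links between particular things.
INSTANCE_OF = {
    "nucleus1": "nucleus",
    "nucleus2": "nucleus",
    "cell1": "cell",
    "cell2": "cell",
}
INSTANCE_PART_OF = {("nucleus1", "cell1"), ("nucleus2", "cell2")}


def instances(cls):
    return {i for i, c in INSTANCE_OF.items() if c == cls}


def class_part_of(c, d):
    """C part_of D holds iff every instance of C is part_of some instance of D."""
    d_instances = instances(d)
    return all(any((x, y) in INSTANCE_PART_OF for y in d_instances)
               for x in instances(c))


print(class_part_of("nucleus", "cell"))  # True
print(class_part_of("cell", "nucleus"))  # False
```

As written, a class with no recorded instances satisfies the check vacuously. A real validator would need to decide separately how to treat such classes.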
Outline
• Berkeley group background
• Core 2 first round
  – what: aims, milestones
  – how: software lifecycle, interaction w/ other cores
• Current progress
• Open questions
Core 2 specific aims
• Aims
  1. Capture and describe data
  2. Reconcile annotation and ontology changes
  3. Store, view and compare annotations
  4. Link disease genes
• First round
  – phenotypes: Fly and Zebrafish
  – HIV clinical trial data
Aim 1: Capture and describe data
• Phenotype data capture
  – OBO-Edit plug-ins
  – Combine classes from multiple ontologies
    • PATO, anatomical ontologies
  – NLP tools?
• Clinical trial data capture
  – what are the appropriate tools?
Aim 1: Capture and describe data
• Zebrafish, fly
  – PATO: Phenotype and trait ontology
    • phenotype ‘primitives’
  – ‘Entity-Attribute-Value’ model
  – Phenotype ontologies
  – Genetic data
  – Orthologs
• Clinical trial data
  – generic instance model
  – what are the appropriate ontologies here?
PATO
• An ontology of attributes and attribute values
  – e.g. morphology, structure, placement
• Current status of PATO?
  – needs work to conform to sound ontology principles
    • definitions
    • formalisation of attributes
  – working with core 3 (Cambridge; Gkoutos) and core 5 (Neuhaus)
Phenotype annotation
• Entity-attribute structured annotations
  – Entity term; PATO term
    • brain FBbt:00005095; fused PATO:0000642
    • gut MA:0000917; dysplastic PATO:0000640
    • tail fin ZDB:020702-16; ventralized PATO:0000636
    • kidney ZDB:020702-16; hypertrophied PATO:0000636
    • midface ZDB:020702-16; hypoplastic PATO:0000636
• Pre-composed phenotype terms
  – Mammalian Phenotype Ontology
    • “increased activated B-cell number” MPO:0000319
    • “pink fur hue” MPO:0000374
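A post-composed annotation keeps the entity and the quality as separate terms, so either part can be queried on its own. A pre-composed term such as MPO:0000319 is a single opaque identifier. The sketch below is minimal and uses only two of the example annotations listed above. The class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Term:
    id: str
    label: str


@dataclass(frozen=True)
class EQAnnotation:
    """A post-composed phenotype: an entity term paired with a PATO quality term."""
    entity: Term
    quality: Term


# Two of the example annotations listed above.
ANNOTATIONS = [
    EQAnnotation(Term("FBbt:00005095", "brain"), Term("PATO:0000642", "fused")),
    EQAnnotation(Term("MA:0000917", "gut"), Term("PATO:0000640", "dysplastic")),
]


def find(annotations, entity_id=None, quality_id=None):
    """Select annotations by either component; None means 'any'."""
    return [a for a in annotations
            if (entity_id is None or a.entity.id == entity_id)
            and (quality_id is None or a.quality.id == quality_id)]


for a in find(ANNOTATIONS, quality_id="PATO:0000642"):
    print(a.entity.label, a.quality.label)  # brain fused
```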
Example (Fly)
Entity           Attribute   Value      Background/Environment
embryo           viability   lethal     Scer\GAL4[hs.PB]
dorsal cuticle   shape       abnormal
…                …           …          …
wing vein L2     shape       branched   temperature sensitive

Gene: Jra
Allele: Jra[bZIP.Scer\UAS]
Allele description: defects in head and dorsal cuticle. Scer\GAL4[hs.PB] induces…
[Figure labels: A481G; bZIP]
Genotype-Phenotype datamodel
• Need to model complex genotypes
• Environment
• Phenotype
  – E-A-V is not enough
    • Relational attributes
    • Complex phenotypes
    • Measurements and assays
  – CSHL 2005 Phenotype meeting
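The sketch below shows one way a plain entity-attribute-value row could be extended along the lines listed above. It is only an assumption about the design, not the data model the group defined. A genotype is a tuple of alleles, so it can be complex. Each statement gets optional `towards` and `measurement` fields, for relational attributes and assay results. A complex phenotype is a list of statements. The allele symbols and the first two rows come from the fly example. Treating the GAL4 driver as part of the genotype is itself a modelling assumption.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Genotype:
    alleles: Tuple[str, ...]  # several alleles/transgenes make a complex genotype


@dataclass(frozen=True)
class PhenotypeStatement:
    entity: str
    attribute: str
    value: str
    towards: Optional[str] = None  # second entity, for relational attributes
    measurement: Optional[Tuple[float, str]] = None  # (number, unit) from an assay


@dataclass
class Observation:
    genotype: Genotype
    environment: Optional[str] = None
    # A complex phenotype is modelled here as several statements together.
    phenotypes: List[PhenotypeStatement] = field(default_factory=list)

    def describe(self):
        statements = []
        for p in self.phenotypes:
            text = f"{p.entity} {p.attribute} {p.value}"
            if p.towards is not None:
                text += f" (relative to {p.towards})"
            if p.measurement is not None:
                number, unit = p.measurement
                text += f" [{number} {unit}]"
            statements.append(text)
        env = self.environment or "environment not recorded"
        return f"{' + '.join(self.genotype.alleles)} ({env}): " + "; ".join(statements)


# Values taken from the fly example; the genotype grouping is an assumption.
obs = Observation(
    Genotype((r"Jra[bZIP.Scer\UAS]", r"Scer\GAL4[hs.PB]")),
    phenotypes=[
        PhenotypeStatement("embryo", "viability", "lethal"),
        PhenotypeStatement("dorsal cuticle", "shape", "abnormal"),
    ],
)
print(obs.describe())
```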
Aim 2: Reconcile annotation and ontology changes
• Ontology evolution can trigger annotation changes
• Identifiers
  – all classes and annotations will have stable identifiers
  – Cores 1 and 2 to decide on identifier model
    • LSID URNs
• OntoTrack
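One concrete way ontology evolution triggers annotation changes is term obsoletion. The OBO flat-file format records it with the `is_obsolete`, `replaced_by` and `consider` tags. The sketch below is a minimal illustration with made-up `EX:` term IDs and `ann:` annotation IDs. It rewrites annotations whose term has a single replacement and sends the rest to curators. Because annotations carry their own stable identifiers, `ann:1` stays the same annotation even after its term changes.

```python
# Hypothetical term records mirroring the OBO flat-file tags
# is_obsolete, replaced_by and consider (identifiers are made up).
TERMS = {
    "EX:0000001": {"is_obsolete": True, "replaced_by": "EX:0000002"},
    "EX:0000002": {"is_obsolete": False},
    "EX:0000003": {"is_obsolete": True, "consider": ["EX:0000004", "EX:0000005"]},
    "EX:0000004": {"is_obsolete": False},
    "EX:0000005": {"is_obsolete": False},
}


def reconcile(annotations, terms):
    """Split (annotation_id, term_id) pairs into updated pairs and a review queue.

    Live terms are kept unchanged; obsolete terms with replaced_by are
    rewritten automatically; obsolete terms without a replacement, and
    unknown IDs, go to curators together with any 'consider' suggestions.
    """
    updated, review = [], []
    for ann_id, term_id in annotations:
        record = terms.get(term_id)
        if record is None:
            review.append((ann_id, term_id, []))
        elif not record.get("is_obsolete", False):
            updated.append((ann_id, term_id))
        elif "replaced_by" in record:
            updated.append((ann_id, record["replaced_by"]))
        else:
            review.append((ann_id, term_id, record.get("consider", [])))
    return updated, review


annotations = [("ann:1", "EX:0000001"), ("ann:2", "EX:0000002"), ("ann:3", "EX:0000003")]
updated, review = reconcile(annotations, TERMS)
print(updated)  # [('ann:1', 'EX:0000002'), ('ann:2', 'EX:0000002')]
print(review)   # [('ann:3', 'EX:0000003', ['EX:0000004', 'EX:0000005'])]
```

This sketch follows `replaced_by` only one step. A replacement that has itself been obsoleted would need a further pass.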
Aim 3: Store, view and compare annotations
• OBO: ontologies
• OBD: data annotated using ontologies
  – genotype-phenotype
  – clinical trials
  – others
OBD: A Database for OBO
• Data warehouse
  – collected from MODs and other sources
• Annotation versioning
• Generic data model
  – Any data typed by OBO classes can be stored
• Specific annotation data views
  – Clinical trial data view
  – Phenotype data view
• Chado-compliant
• Entity-attribute-(value) model
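A generic data model with specific views can be sketched as a single statement table plus SQL views layered over it. The sketch below uses Python's built-in `sqlite3`. The table layout, the relation names (`has_phenotype`, `instance_of`, `inheres_in`) and the identifiers `geno1`, `p1`, `trial1` and `subject42` are all hypothetical. The entity and quality IDs come from the phenotype annotation examples. A clinical-trial row shares the same table without any schema change and does not appear in the phenotype view.

```python
import sqlite3

SCHEMA = """
-- Generic table: every fact is a (subject, relation, object) statement whose
-- relation and object are typically OBO identifiers.
CREATE TABLE statement (
    subject  TEXT NOT NULL,
    relation TEXT NOT NULL,
    object   TEXT NOT NULL
);

-- A phenotype-specific view over the generic table.
CREATE VIEW phenotype_view AS
SELECT g.subject AS genotype, e.object AS entity, q.object AS quality
FROM statement AS g
JOIN statement AS q ON q.subject = g.object AND q.relation = 'instance_of'
JOIN statement AS e ON e.subject = g.object AND e.relation = 'inheres_in'
WHERE g.relation = 'has_phenotype';
"""

con = sqlite3.connect(":memory:")
con.executescript(SCHEMA)
con.executemany("INSERT INTO statement VALUES (?, ?, ?)", [
    # Hypothetical genotype 'geno1' with one phenotype instance 'p1':
    # a fused (PATO:0000642) quality inhering in the brain (FBbt:00005095).
    ("geno1", "has_phenotype", "p1"),
    ("p1", "instance_of", "PATO:0000642"),
    ("p1", "inheres_in", "FBbt:00005095"),
    # Other kinds of data share the same table without schema changes.
    ("trial1", "has_participant", "subject42"),
])
print(con.execute("SELECT genotype, entity, quality FROM phenotype_view").fetchall())
# [('geno1', 'FBbt:00005095', 'PATO:0000642')]
```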
Key technologies
• ‘Semantic Web’ database technology
  – ontology-aware
    • ontologies are part of meta-model
    • higher level query languages
      – SPARQL, SeRQL, …
    • tool interoperability
      – Protégé-OWL, Jena, …
  – SQL compatibility
    • optionally layered on relational model
  – Standards? Maturity?
    • Many implementations
      – Sesame, Kowari, …
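"Ontology-aware" querying means that a query for a class also returns data annotated to any of its subclasses. An ontology-aware store might answer this through its reasoner. The later SPARQL 1.1 standard can express it with a property path such as `rdfs:subClassOf*`. The plain-Python sketch below shows the same transitive closure. The phenotype classes and gene annotations in it are hypothetical.

```python
from collections import deque

# Hypothetical is_a edges between phenotype classes (child -> parents).
IS_A = {
    "fused brain": ["abnormal brain"],
    "abnormal brain": ["abnormal nervous system"],
    "abnormal spinal cord": ["abnormal nervous system"],
}

# Hypothetical gene-to-class annotations.
ANNOTATIONS = [
    ("geneA", "fused brain"),
    ("geneB", "abnormal spinal cord"),
    ("geneC", "abnormal gut"),
]


def subclasses_of(cls):
    """cls plus every class that reaches it through one or more is_a edges."""
    children = {}
    for child, parents in IS_A.items():
        for parent in parents:
            children.setdefault(parent, []).append(child)
    seen, queue = {cls}, deque([cls])
    while queue:
        for child in children.get(queue.popleft(), []):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen


def genes_annotated_to(cls):
    """Ontology-aware lookup: direct annotations plus those to any subclass."""
    targets = subclasses_of(cls)
    return sorted(gene for gene, c in ANNOTATIONS if c in targets)


print(genes_annotated_to("abnormal nervous system"))  # ['geneA', 'geneB']
```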
Aim 3: Store, view and compare annotations
• Browsing
  – AmiGO-2
• Advanced visualization
  – work with core 1 (University of Victoria)
Comparing annotations
• process vs state
  – regulatory processes:
    • acidification of midgut has_quality reduced rate
    • midgut has_quality low acidity
• development vs behavior
  – wing development has_quality abnormal
  – flight has_quality intermittent
• granularity (scale)
  – chemical vs molecular vs cell vs tissue vs anatomical part
Integrating anatomical ontologies
• Annotations should be comparable between species
  – phenotype annotations are composed of anatomical terms
• Multiple species-centric anatomical ontologies
  – Problem: how do we compare across species?
  – XSPAN (Bard et al): creating mappings
  – Core 1: ontology mappings
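Once mappings between species-centric anatomy terms exist, entity-quality annotations from different species can be compared by first rewriting each entity to its mapped term. The sketch below is in the spirit of the mapping work listed above, not an implementation of XSPAN. The mapping table, the `fly:`, `zebrafish:` and `shared:` identifiers and the annotations are all made up. Unmapped terms, such as the tail fin here, stay species-specific and so never match.

```python
# Hypothetical mappings from species-centric anatomy terms to shared terms
# (all identifiers are made up for illustration).
MAPPING = {
    "fly:brain": "shared:brain",
    "zebrafish:brain": "shared:brain",
    "fly:eye": "shared:eye",
    "zebrafish:eye": "shared:eye",
}


def normalise(annotation):
    """Replace a species-specific entity with its shared term, if mapped."""
    entity, quality = annotation
    return MAPPING.get(entity, entity), quality


def shared_phenotypes(annotations_a, annotations_b):
    """Entity-quality pairs that match once both sides are mapped."""
    a = {normalise(x) for x in annotations_a}
    b = {normalise(x) for x in annotations_b}
    return sorted(a & b)


fly = [("fly:brain", "fused"), ("fly:eye", "rough")]
zebrafish = [("zebrafish:brain", "fused"), ("zebrafish:tail fin", "ventralized")]
print(shared_phenotypes(fly, zebrafish))  # [('shared:brain', 'fused')]
```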
Aim 4: Linking disease genes
• Homology data
  – Orthologous genes
• Genomic data
  – SNPs, sequence variants
• Ontologies
  – Disease ontologies
  – Semantic similarity
  – Ontology integration
    • Obol, XSPAN
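Semantic similarity scores how close two ontology classes are. The sketch below uses one simple graph-based measure: the Jaccard overlap of the two classes' ancestor sets, often called simUI. Information-content measures such as Resnik's are a common alternative. The hierarchy is a hypothetical toy, and this is not the measure the slides commit to.

```python
# Hypothetical is_a hierarchy (child -> parents); every class reaches "root".
PARENTS = {
    "A": ["root"],
    "B": ["A"],
    "C": ["A"],
    "D": ["B", "C"],
    "E": ["root"],
}


def ancestors(cls):
    """cls together with all of its is_a ancestors."""
    found, stack = set(), [cls]
    while stack:
        c = stack.pop()
        if c not in found:
            found.add(c)
            stack.extend(PARENTS.get(c, []))
    return found


def sim_ui(c1, c2):
    """Jaccard overlap of the two ancestor sets."""
    a1, a2 = ancestors(c1), ancestors(c2)
    return len(a1 & a2) / len(a1 | a2)


print(sim_ui("B", "C"))  # 0.5  (shared: A, root; union: A, B, C, root)
print(sim_ui("B", "E"))  # 0.25 (shared: root; union: A, B, E, root)
```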
Linking disease to phenotype
• Relationship of phenotype to diseases and disorders
  – essentialist
  – statistical
• Disease ontologies
  – OBO disease ontology (Northwestern)
  – EVOC disease ontology (EVOC)
  – Others
• Disease ontology workshop (core 5)
  – November 2006
Outline
• Berkeley group background
• Core 2 first round
  – what: aims, milestones
  – how: software lifecycle, interaction w/ other cores
• Current progress
• Open questions
Software lifecycle
• Software is developed in phases
• Different phases require interaction with different cores
• Iterative “Agile” methodology
  – fast cycles
  – involve ‘customer’ (core 3) at all phases
Outline
• Berkeley group background
• Core 2 first round
  – what: aims, milestones
  – how: software lifecycle, interaction w/ other cores
• Current progress
Current progress
• Meetings
  – CSHL November 2005
    • Phenotype ontology meeting
    • Phenotype tools workshop
  – Berkeley, UVic, Core 3
• OBO-Edit complex class plug-in
• Phenotype browser prototype
• Genotype-Phenotype datamodel
OBO-Edit complex class plug-in
• Combinatorial composition of classes
• Current use-cases:
  – plant anatomical structures
  – integrating GO and OBO-Cell
• Ideal for phenotype classes
  – extend to make ‘phenotype’ plug-in
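Combinatorial composition builds a new class from a genus and one or more differentia. The OBO 1.2 flat-file format records such definitions with `intersection_of` tags. The sketch below shows what a composed class could look like when serialised. It is not the plug-in's code. The function name, the label-building rule and the `EX:` ID are hypothetical. The example composes a GO cellular component with an OBO-Cell cell type, matching the "integrating GO and OBO-Cell" use case.

```python
def compose_term(new_id, genus, differentia):
    """Build an OBO 1.2 [Term] stanza defining a class as genus AND each
    (relation some filler) differentium.

    genus:       (id, label)
    differentia: list of (relation, (id, label))
    The name is built by a simple hypothetical rule: filler labels, then genus.
    """
    genus_id, genus_label = genus
    name = " ".join([label for _, (_, label) in differentia] + [genus_label])
    lines = ["[Term]", f"id: {new_id}", f"name: {name}",
             f"intersection_of: {genus_id} ! {genus_label}"]
    for relation, (filler_id, filler_label) in differentia:
        lines.append(f"intersection_of: {relation} {filler_id} ! {filler_label}")
    return "\n".join(lines)


# A GO cellular component composed with a cell type (new ID is a placeholder).
print(compose_term("EX:0000001",
                   ("GO:0005634", "nucleus"),
                   [("part_of", ("CL:0000540", "neuron"))]))
```

For this input, the stanza names the new class "neuron nucleus" and carries two `intersection_of` lines: the genus, and `part_of CL:0000540`.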
OBD Progress
• Genotype-Phenotype data model defined
• Prototype implemented
• evaluating technologies
Phenotype browser
• Experimental branch of AmiGO code
• Allows browsing and querying of combinatorial phenotype annotations
• Experimental dataset
• Demo
  – http://yuri.lbl.gov/amigo/obd