2009 GMOD Meeting Dhileep Sivam & Isabelle Phan Seattle Biomedical Research Institute

Page 1

2009 GMOD Meeting

Dhileep Sivam & Isabelle Phan

Seattle Biomedical Research Institute

Page 2

Seattle Biomedical Research Institute (SBRI)

• Founded in 1976
• About 250 full-time staff
• Focus on infectious disease
• 13 labs
• Strong ties to the University of Washington
• Bioinformatics Core

Page 3

How we first came to use Chado

[Diagram: LmjF and LinJ probe sets mapped against successive genome versions (LmjF V4.0 and V5.2; LinJ V2.0, V3.0 and V4.0), with the mappings producing result sets.]
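The diagram shows one probe set mapped against several genome versions, each mapping giving its own result set. A minimal sketch of that idea, assuming probes and gene models are simple coordinate intervals and that a probe maps to a feature when the feature fully contains it (the containment rule, the `Interval` type, `map_probes`, and all IDs and coordinates are hypothetical, not taken from the talk):

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    """A named span on a chromosome (1-based, inclusive coordinates)."""
    name: str
    chrom: str
    start: int
    end: int


def map_probes(probes, features):
    """Assign each probe to every feature that fully contains it.

    Returns {feature name: [probe names]}; features with no probes are omitted.
    A linear scan is used for clarity; a real pipeline would index intervals.
    """
    result = {}
    for feat in features:
        hits = [p.name for p in probes
                if p.chrom == feat.chrom and feat.start <= p.start and p.end <= feat.end]
        if hits:
            result[feat.name] = hits
    return result


# Hypothetical probe set and two hypothetical versions of one gene model.
probes = [Interval("p1", "chr1", 100, 150), Interval("p2", "chr1", 400, 450)]
genome_versions = {
    "v3": [Interval("geneA", "chr1", 50, 300)],
    "v4": [Interval("geneA", "chr1", 50, 500)],  # model extended in the newer version
}

# One result set per (probe set, genome version) pair.
result_sets = {v: map_probes(probes, feats) for v, feats in genome_versions.items()}
print(result_sets)  # {'v3': {'geneA': ['p1']}, 'v4': {'geneA': ['p1', 'p2']}}
```

Keeping each version's result set separately, rather than overwriting it, is what allows the same raw data to be remapped whenever a new assembly or annotation release appears.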

Page 4

Microarray Project

Chado

NimbleGen Data

Parsers

Analysis Tools: Normalization, Scaling, Feature-level aggregation, Remapping, Visualization
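The slide names normalization, scaling and feature-level aggregation without saying which methods were used. As one illustrative possibility only, this sketch assumes a log2 transform, per-array median centering as the scaling step, and the median of probe values as the per-feature summary; the function names, intensities and probe-to-gene map are all hypothetical:

```python
import math
from statistics import median


def log2_transform(raw):
    """Convert raw probe intensities to log2 scale."""
    return {probe: math.log2(value) for probe, value in raw.items()}


def median_center(values):
    """Scale one array so that its median log2 intensity becomes 0."""
    m = median(values.values())
    return {probe: v - m for probe, v in values.items()}


def aggregate_to_features(values, probe_to_feature):
    """Summarize probe-level values as one median value per feature.

    Probes with no feature assignment are ignored.
    """
    grouped = {}
    for probe, v in values.items():
        feature = probe_to_feature.get(probe)
        if feature is not None:
            grouped.setdefault(feature, []).append(v)
    return {feature: median(vs) for feature, vs in grouped.items()}


# Hypothetical raw intensities for one array and a hypothetical probe-to-gene map.
raw = {"p1": 256.0, "p2": 512.0, "p3": 1024.0, "p4": 64.0}
probe_to_feature = {"p1": "geneA", "p2": "geneA", "p3": "geneB", "p4": "geneB"}

scaled = median_center(log2_transform(raw))
per_feature = aggregate_to_features(scaled, probe_to_feature)
print(per_feature)  # {'geneA': 0.0, 'geneB': -0.5}
```

Because the probe-to-feature map is an input, the aggregation step can be rerun unchanged after remapping probes to a new genome version.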

Page 5

Use Case: SSGCID (Seattle Structural Genomics Center for Infectious Disease)

Vaccine Targets!

Gene Cloning & Expression

Protein Crystallization

Structure Determination

Bioinformatic Screening

Project Aim

3D Protein Structure

NIAID Emerging and re-emerging priority pathogens

Structures will serve as a starting point for drug development

Multi-center

Page 6

SSGCID

Vaccine Targets!

Gene Cloning & Expression

Protein Crystallization

Structure Determination

Bioinformatic Screening

Page 7

SSGCID

Chado

External Sequence Resources

BLAST Screening

Export

Parsers

Bulk Loader
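The SSGCID flow passes external sequences through BLAST screening and parsers before bulk loading into Chado. The talk does not show the parser; a minimal sketch, assuming BLAST tabular output (the 12-column format of legacy `blastall -m 8` and BLAST+ `-outfmt 6`) and hypothetical screening thresholds, could turn hits into rows ready for a loader:

```python
import csv
import io

# Column order of BLAST tabular output (legacy blastall -m 8 / BLAST+ -outfmt 6).
BLAST_TAB_FIELDS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
                    "qstart", "qend", "sstart", "send", "evalue", "bitscore"]
INT_FIELDS = ("length", "mismatch", "gapopen", "qstart", "qend", "sstart", "send")
FLOAT_FIELDS = ("pident", "evalue", "bitscore")


def parse_blast_tab(handle):
    """Yield one dict per hit line, skipping blank and '#' comment lines."""
    for row in csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE):
        if not row or row[0].startswith("#"):
            continue
        rec = dict(zip(BLAST_TAB_FIELDS, row))
        for key in INT_FIELDS:
            rec[key] = int(rec[key])
        for key in FLOAT_FIELDS:
            rec[key] = float(rec[key])
        yield rec


def screen(hits, max_evalue=1e-5, min_pident=30.0):
    """Keep hits passing illustrative (hypothetical) screening thresholds."""
    return [h for h in hits if h["evalue"] <= max_evalue and h["pident"] >= min_pident]


# Hypothetical BLAST output for two targets.
sample = (
    "# hypothetical header line\n"
    "target1\tsubjectA\t45.2\t200\t100\t5\t1\t200\t3\t202\t1e-30\t150.0\n"
    "target2\tsubjectB\t22.0\t80\t60\t2\t5\t84\t10\t89\t0.5\t30.1\n"
)
kept = screen(parse_blast_tab(io.StringIO(sample)))
print([(h["qseqid"], h["sseqid"]) for h in kept])  # [('target1', 'subjectA')]
```

The thresholds are parameters rather than constants because the right cut-offs depend on what the screen is for; the values above are placeholders only.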

Page 8

Things that have come up…

Complexity of querying BLAST results

Gene Models

Complexity of querying microarray data

Materialized Views

Simplest Possible Model

“Grouping of Genes” DBXrefs
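Querying BLAST results in Chado is complex because each hit is a `match` feature with two `featureloc` rows (by Chado's convention, rank 0 on the query and rank 1 on the subject) plus scores in `analysisfeature`. One common remedy, and possibly what the "Materialized Views" point refers to, is to precompute a flat table. This sketch uses SQLite, which has no materialized views, so it emulates one with `CREATE TABLE ... AS SELECT` and a refresh function; the trimmed-down tables, `blast_hit_mv`, `refresh_blast_mv` and the data are hypothetical. PostgreSQL 9.3 and later offer `CREATE MATERIALIZED VIEW` and `REFRESH MATERIALIZED VIEW` for the same purpose.

```python
import sqlite3

# Greatly simplified stand-ins for Chado's feature, featureloc and analysisfeature
# tables (real Chado uses type_id foreign keys to cvterm, among many other columns).
SCHEMA = """
CREATE TABLE feature (feature_id INTEGER PRIMARY KEY, uniquename TEXT, type TEXT);
CREATE TABLE featureloc (feature_id INTEGER, srcfeature_id INTEGER,
                         fmin INTEGER, fmax INTEGER, rank INTEGER);
CREATE TABLE analysisfeature (feature_id INTEGER, analysis_id INTEGER,
                              rawscore REAL, significance REAL);
"""

# Flatten each match: featureloc rank 0 is the query side, rank 1 the subject side.
REFRESH_SQL = """
DROP TABLE IF EXISTS blast_hit_mv;
CREATE TABLE blast_hit_mv AS
SELECT q.uniquename AS query, s.uniquename AS subject,
       lq.fmin AS qmin, lq.fmax AS qmax,
       af.rawscore AS rawscore, af.significance AS evalue
FROM feature m
JOIN featureloc lq ON lq.feature_id = m.feature_id AND lq.rank = 0
JOIN featureloc ls ON ls.feature_id = m.feature_id AND ls.rank = 1
JOIN feature q ON q.feature_id = lq.srcfeature_id
JOIN feature s ON s.feature_id = ls.srcfeature_id
JOIN analysisfeature af ON af.feature_id = m.feature_id
WHERE m.type = 'match';
"""


def refresh_blast_mv(conn):
    """Rebuild the flattened table; rerun after each BLAST load."""
    conn.executescript(REFRESH_SQL)


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
# Hypothetical data: one query protein, one subject, one match between them.
conn.executemany("INSERT INTO feature VALUES (?, ?, ?)",
                 [(1, "query1", "polypeptide"), (2, "subjectX", "polypeptide"),
                  (3, "hit1", "match")])
conn.executemany("INSERT INTO featureloc VALUES (?, ?, ?, ?, ?)",
                 [(3, 1, 0, 120, 0), (3, 2, 5, 125, 1)])
conn.execute("INSERT INTO analysisfeature VALUES (3, 1, 180.0, 1e-40)")
refresh_blast_mv(conn)

rows = conn.execute("SELECT query, subject, evalue FROM blast_hit_mv").fetchall()
print(rows)  # [('query1', 'subjectX', 1e-40)]
```

The trade-off is freshness for simplicity: every consumer queries one flat table, but the table must be rebuilt whenever new BLAST results are loaded.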

Page 9

Warehouse

Proteomics, Microarray, Structural genomics

Data access

Curation

Automated analysis pipeline

Sequence data management at SBRI

Page 10

Chado + GUS: why do we need both?

• Chado
  – Collaboration with IGS
  – Annotation tools: Manatee (Apollo), Ergatis
• Internal data production

• GUS
  – Collaboration with UPenn
  – Web front end
• External data access

Page 11

Chado

Proteomics, Microarray, Structural genomics

Manatee: manual annotation

Ergatis: analysis pipeline

Sequence data management at SBRI

GUS

GUS WDK

Page 12

Chado2GUS: Lost in translation

• Chado
  – Denormalized schema
  – Polymorphism
  – MySQL (IGS Chado)
• GUS
  – Normalized schema
  – Subclassing
  – Postgres port from Oracle
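The contrast on this slide is between Chado's polymorphism, where one generic feature row carries its kind as a type term, and GUS's subclassing, where each kind of feature has its own class. A minimal sketch of why translating between them can lose information, assuming a hand-written lookup from type terms to subclasses (the class names and mapping below are simplified and hypothetical, not the real GUS object layer):

```python
from dataclasses import dataclass


@dataclass
class ChadoFeature:
    """Chado style: one generic feature row whose kind is a type term."""
    uniquename: str
    type_name: str  # e.g. a Sequence Ontology term such as "gene"


# GUS style: one class per kind of feature. These names are simplified and
# hypothetical; they are not the actual GUS class definitions.
@dataclass
class GusFeature:
    name: str


@dataclass
class GusGene(GusFeature):
    pass


@dataclass
class GusTranscript(GusFeature):
    pass


# Hypothetical lookup from Chado type terms to GUS-style subclasses.
TYPE_TO_SUBCLASS = {"gene": GusGene, "mRNA": GusTranscript}


def chado_to_gus(feature):
    """Translate a generic Chado feature into the matching GUS-style subclass.

    Any type term without a subclass cannot be carried across, which is one
    way information gets "lost in translation".
    """
    cls = TYPE_TO_SUBCLASS.get(feature.type_name)
    if cls is None:
        raise ValueError(f"no GUS subclass for type {feature.type_name!r}")
    return cls(name=feature.uniquename)


print(chado_to_gus(ChadoFeature("g1", "gene")))  # GusGene(name='g1')
```

In the Chado style, adding a new kind of feature only needs a new ontology term; in the subclassing style, it needs a new class (and table or view), so the mapping table has to grow in step.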

Page 13

Picking the best of two worlds

• Chado
  – Biological data model
  – Flexibility

• GUS
  – Software engineering
  – Flexibility

Page 14

The future?

• SQL-free data production
  – Instead of custom wrappers over raw SQL:
    • ORMs: Chado Hibernate, ActiveRecord
    • Unified object model
• RDBMS-free data mining
  – Instead of GUS predefined query + set combination:
    • BioMart + Galaxy
    • RDF + triple store + SPARQL (object store + Lucene)
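The RDF option replaces predefined relational queries with pattern matching over subject-predicate-object triples. To show the idea only, this sketch implements an in-memory store and a join over SPARQL-style basic graph patterns, where terms starting with `?` are variables; it is not a real RDF library or SPARQL engine, and the `ex:` identifiers are hypothetical:

```python
class TripleStore:
    """Toy in-memory triple store illustrating basic graph pattern matching."""

    def __init__(self):
        self.triples = set()

    def add(self, s, p, o):
        self.triples.add((s, p, o))

    def match(self, s=None, p=None, o=None):
        """Yield triples matching the pattern; None acts as a wildcard."""
        for t in self.triples:
            if ((s is None or t[0] == s) and (p is None or t[1] == p)
                    and (o is None or t[2] == o)):
                yield t

    def query(self, patterns):
        """Join patterns left to right; strings starting with '?' are variables."""
        bindings = [{}]
        for pattern in patterns:
            new_bindings = []
            for b in bindings:
                # Substitute variables already bound by earlier patterns.
                resolved = [b.get(term, term) if term.startswith("?") else term
                            for term in pattern]
                # Variables that are still unbound become wildcards.
                args = [None if r.startswith("?") else r for r in resolved]
                for t in self.match(*args):
                    nb = dict(b)
                    ok = True
                    for term, value in zip(resolved, t):
                        if term.startswith("?"):
                            # The same variable must bind to one value per solution.
                            if nb.get(term, value) != value:
                                ok = False
                                break
                            nb[term] = value
                    if ok:
                        new_bindings.append(nb)
            bindings = new_bindings
        return bindings


# Hypothetical data: two genes from two organisms.
store = TripleStore()
store.add("ex:gene1", "ex:type", "ex:Gene")
store.add("ex:gene1", "ex:organism", "ex:orgA")
store.add("ex:gene2", "ex:type", "ex:Gene")
store.add("ex:gene2", "ex:organism", "ex:orgB")

# Roughly: SELECT ?g WHERE { ?g ex:type ex:Gene . ?g ex:organism ex:orgA }
print(store.query([("?g", "ex:type", "ex:Gene"), ("?g", "ex:organism", "ex:orgA")]))
# [{'?g': 'ex:gene1'}]
```

New questions become new combinations of patterns rather than new predefined queries, which is the flexibility the slide contrasts with GUS's predefined query + set combination.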
