Page 1

GUS: A Functional Genomics Data Management System

Chris Stoeckert, Ph.D.
Center for Bioinformatics and Dept. of Genetics
University of Pennsylvania

ASM Conference on Functional Genomics and Bioinformatics Approaches to Infectious Disease Research
October 8, 2004, Portland, Oregon

Page 2

Database Options for Integrated Functional Genomics

Requirements
  Covers genomics and functional genomics
  Active and open developer community
Options
  GUS: Genomics Unified Schema
  Chado: generic model organism database (GMOD http://www.gmod.org)

Page 3

A Few GUS Web Sites

[Architecture diagram: GUS schemas (Core, SRES, TESS, RAD, DoTS); Oracle RDBMS; Object Layer for Data Loading; Java Servlets]

[Web site screenshots: Sanger Institute; U. Georgia; Flora Centromere Database; U. Chicago; U. Penn; U. Toronto; Phytophthora sojae genome; Virginia Bioinformatics Institute]

Page 4

GUS (Genomics Unified Schema)
http://www.gusdb.org

Namespace   Domain                    Features
RAD         Gene Expression           MIAME/MAGE-OM
DoTS        Sequence and annotation   EST clusters, Gene models
Core        Data Provenance           Documentation
Sres        Shared Resources          Ontologies
TESS        Gene Regulation           TFBS organization

Page 5

[Diagram labels: RAD; EST clustering and assembly; DoTS; Genomic alignment and comparative sequence analysis; Identify shared TF binding sites; TESS; BioMaterial annotation; SRES]

Page 6

Examples of GUS users

Large sequencing center
  GeneDB: Pathogen Sequencing Unit at the Sanger Institute
Lightly staffed genomics project
  CryptoDB: Kissinger Lab, University of Georgia
Data mining project
  Multiple plant species: Brett Tyler, Virginia Bioinformatics Institute and collaborators
Expression based project
  dbDirt: Allen Okey, University of Toronto
Bioinformatics Core Facility
  University of Pennsylvania Bioinformatics Core Facility

Page 7

GUS Project Goals

Provide:
  A platform for broad genomics data integration
  An infrastructure system for functional genomics
Support:
  Websites with advanced query capabilities
  Research driven queries and mining

Page 8

GUS components

[Diagram labels]
Your data, GenBank, NRDB, dbEST, SNPs, Genetraps, MicroArrays, Phenotypes, Pathways, Orthologs, Taxonomy, GO, SO, EC, More…
Plugins (data loaders)
Data Load API
Pipeline API
Perl Object Layer
Warehouse (Oracle or PostgreSQL)
Web Development Kit
Queries and analysis

Page 9

Functional genomics with GUS

[Diagram: Functional Annotation of the Genome]
Sequence & Features
Central Dogma
Regulation (TESS)
Interaction
Expression (RAD): Study, Sample, Image Analysis, Statistical Processing; MIAME (www.mged.org)
Proteomics: Study, Sample, Image Analysis, Statistical Processing; MIAPE (psidev.sf.net)
In Situ Hybridization, ImmunoHistChem: Study, Sample, Image Analysis, Statistical Processing; MISFISHIE (www.scgap.org)

Page 10

GUS versus Chado

GUS represents biology in the database tables
  Forces applications to load and retrieve data consistently
Chado represents biology in the applications
  Allows flexibility in what can be stored but applications may not be consistent
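The contrast on this slide can be illustrated with a toy example. The sketch below is not the real GUS or Chado DDL: every table and column name is an assumption made for illustration (real Chado, for instance, types features through a controlled vocabulary and links them through a separate relationship table). It only shows the general trade-off: a typed schema can refuse an RNA row that has no gene, while a generic feature table stores it and leaves the check to application code.

```python
import sqlite3

# Toy schemas only: all table and column names are illustrative assumptions,
# not the actual GUS or Chado definitions.
SCHEMA = """
-- Typed style: one table per biological concept; the schema itself
-- requires every RNA feature to point at an existing gene feature.
CREATE TABLE gene_feature (
    gene_feature_id INTEGER PRIMARY KEY,
    name            TEXT NOT NULL
);
CREATE TABLE rna_feature (
    rna_feature_id  INTEGER PRIMARY KEY,
    gene_feature_id INTEGER NOT NULL REFERENCES gene_feature(gene_feature_id),
    name            TEXT NOT NULL
);

-- Generic style: one feature table with a type label and an optional
-- parent; whether an mRNA needs a gene is left to the application.
CREATE TABLE feature (
    feature_id INTEGER PRIMARY KEY,
    type       TEXT NOT NULL,
    name       TEXT NOT NULL,
    parent_id  INTEGER REFERENCES feature(feature_id)
);
"""


def build_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK checks off by default
    conn.executescript(SCHEMA)
    return conn


def typed_rejects_orphan_rna(conn: sqlite3.Connection) -> bool:
    """Typed schema: an RNA row that references a missing gene is refused."""
    try:
        conn.execute(
            "INSERT INTO rna_feature (gene_feature_id, name) VALUES (?, ?)",
            (999, "orphan transcript"),
        )
    except sqlite3.IntegrityError:
        conn.rollback()
        return True
    return False


def generic_accepts_orphan_rna(conn: sqlite3.Connection) -> bool:
    """Generic schema: an mRNA with no parent gene is stored without complaint."""
    conn.execute(
        "INSERT INTO feature (type, name, parent_id) VALUES (?, ?, NULL)",
        ("mRNA", "orphan transcript"),
    )
    count = conn.execute(
        "SELECT COUNT(*) FROM feature WHERE type = 'mRNA' AND parent_id IS NULL"
    ).fetchone()[0]
    return count == 1


if __name__ == "__main__":
    db = build_db()
    print("typed schema rejects orphan RNA:", typed_rejects_orphan_rna(db))
    print("generic schema accepts orphan RNA:", generic_accepts_orphan_rna(db))
```

In the generic layout, every loader and query tool has to agree on the rule "an mRNA belongs to a gene" for the data to stay consistent, which is the risk the slide points to.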

Page 11

Central dogma and sequences

NA Sequence

GeneFeature

RNAFeature

ProteinFeature

AA Sequence

Page 12

Central dogma and sequences

Gene RNA Protein

NA Sequence AA Sequence

GeneFeature

RNAFeature

ProteinFeature

Page 13

Central dogma and sequences

Gene RNA Protein

NA Sequence AA Sequence

genome

Multiple sequences (experimental variety)

Gene 1 Gene 2

RNA

Multiple genes

Page 14

Central dogma and sequences

Gene RNA Protein

NA Sequence AA Sequence

GeneInstance

RNAInstance

ProteinInstance

GeneFeature

RNAFeature

ProteinFeature
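One way to read the last four diagrams is that an abstract Gene is tied to one or more GeneFeature rows, each located on a particular NA Sequence, through GeneInstance records. The sketch below models only the gene level and rests on that reading, which is an assumption: the class names follow the slide labels, but the fields (source_id, start, end) and the helper function are hypothetical. RNA and Protein would follow the same pattern, with ProteinFeature located on an AA Sequence.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class NASequence:
    source_id: str  # hypothetical identifier for a genomic contig, EST, etc.


@dataclass
class GeneFeature:
    na_sequence: NASequence  # the sequence this annotation sits on
    start: int
    end: int


@dataclass
class Gene:
    name: str  # the abstract gene, independent of any one sequence


@dataclass
class GeneInstance:
    gene: Gene            # which abstract gene ...
    feature: GeneFeature  # ... is evidenced by which located feature


def sequences_for_gene(gene: Gene, instances: List[GeneInstance]) -> List[str]:
    """Return the sorted IDs of every sequence carrying a feature of this gene."""
    return sorted(
        {inst.feature.na_sequence.source_id for inst in instances if inst.gene is gene}
    )


if __name__ == "__main__":
    gene = Gene("example gene")
    genomic = NASequence("chr1")
    est = NASequence("EST_0001")
    instances = [
        GeneInstance(gene, GeneFeature(genomic, 100, 900)),
        GeneInstance(gene, GeneFeature(est, 1, 450)),
    ]
    print(sequences_for_gene(gene, instances))
```

Keeping the instance record separate from both the gene and the feature is what lets one gene appear on multiple sequences (the "experimental variety" on Page 13) without duplicating the gene itself.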

Page 15

Obtaining and Using GUS

www.gusdb.org
More info at www.gusdb.org/documentation
Active gusdev mailing list
Relatively straightforward to install
Loading data a struggle for new users
  Growing number of tools available
  Addressing how to use and write tools with visits
Web Development Kit (WDK) to generate web sites on GUS

Page 16

Current GUS Developers

At Penn
  Steve Fischer: Project manager, WDK
  Elisabetta Manduchi: RAD project manager, RAD study annotator
  Angel Pizarro: Schema development, proteomics, MAGE export
  Mike Saffitz: DBA, web services, Postgres
  Dave Barkan: WDK, GO pipeline, Apollo interface
  Thomas Gan: WDK, genomic alignments pipeline
  John Iodice: ApiDoTS pipeline, data loading
  Li Li: OrthoMCL pipeline
  Junmin Liu: RAD websites, expression displays
  Debbie Pinney: Data loaders, Hum and MusDoTS pipeline
  Jonathan Schug: TESS, architecture and schema development
  Trish Whetzel: Data loading, RAD, schema development
  Plus rest of group contributes through various GUS-based projects

Pathogen Sequencing Unit, Sanger Institute
Kissinger Group, U. of Georgia
Terry Clark, U. of Chicago

Page 17

WDKTestSite

Developed in collaboration with Adrian Tivey & Marie-Adele Rajandream (PSU, Sanger Institute)

Page 18

The PlasmoDB Team

Shailesh Date, Kobby Essien, Martin Fraunholz, Bindu Gajria, Greg Grant, John Iodice, Jessie Kissinger, Philip Labo, Li Li, Jules Milgram, David Roos, Chris Stoeckert, Trish Whetzel

NIAID grant: R01 AI058515

Page 19

GUS supports a wide variety of queries

Page 20

Suppose you want to find all kinases in P. falciparum
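As a rough idea of what such a question looks like once it reaches a relational warehouse, here is a minimal sketch. It does not use the GUS schema or the PlasmoDB query interface: the `taxon` and `gene` tables, their columns, and the gene identifiers are all hypothetical, and matching on the product description is only one of several ways a kinase might be recognized.

```python
import sqlite3
from typing import List


def build_demo_db() -> sqlite3.Connection:
    """Create a tiny in-memory database with made-up tables and rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE taxon (
            taxon_id        INTEGER PRIMARY KEY,
            scientific_name TEXT NOT NULL
        );
        CREATE TABLE gene (
            gene_id  TEXT PRIMARY KEY,
            taxon_id INTEGER NOT NULL REFERENCES taxon(taxon_id),
            product  TEXT NOT NULL
        );
        INSERT INTO taxon VALUES (1, 'Plasmodium falciparum');
        INSERT INTO taxon VALUES (2, 'Plasmodium yoelii');
        INSERT INTO gene VALUES ('gene_A', 1, 'protein kinase, putative');
        INSERT INTO gene VALUES ('gene_B', 1, 'hexokinase');
        INSERT INTO gene VALUES ('gene_C', 1, 'merozoite surface protein');
        INSERT INTO gene VALUES ('gene_D', 2, 'protein kinase, putative');
        """
    )
    return conn


def find_kinases(conn: sqlite3.Connection, species: str) -> List[str]:
    """Return IDs of genes in `species` whose product text mentions 'kinase'."""
    rows = conn.execute(
        """
        SELECT g.gene_id
        FROM gene AS g
        JOIN taxon AS t ON t.taxon_id = g.taxon_id
        WHERE t.scientific_name = ?
          AND g.product LIKE '%kinase%'
        ORDER BY g.gene_id
        """,
        (species,),
    ).fetchall()
    return [row[0] for row in rows]


if __name__ == "__main__":
    db = build_demo_db()
    print(find_kinases(db, "Plasmodium falciparum"))
```

A text match like this is crude; an integrated database could combine it with other evidence it stores, such as ontology annotations or sequence similarity.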

Page 21

Gene Report Pages Integrate Genomics and Functional Genomics

Page 22

RAD Study-Annotator

Covers the MIAME checklist and exploits the MGED Ontology

Allows entering of very specific details of an experiment

Web-based forms:
  Modular structure
  Written in PHP
  Front-end data integrity checks using JavaScript
  Manages Data Privacy based on Project/Group selections present in GUS schema

Manduchi et al. 2004 Bioinformatics 20:452-459.

Page 23

Vision for GUS

Installable for every lab
  Improve install scripts, documentation
  Postgres version
Extendable to all areas of functional genomics
  Sequence, array-based expression experiments
  Array CGH, 2-D gel electrophoresis, mass spectrometry, yeast 2-hybrids
  In situ hybridizations, metabolites
Interoperable with other GUS installations and with common tools
  Exchange files and scripts, MAGE-ML (use community standards)
  Web services (exchange objects)
  Interface with open source tools such as Gbrowse, Artemis, Apollo

Page 24

Standards and Ontologies for Functional Genomics 2

October 23-26, 2004
held at the University of Pennsylvania Medical School
www.jax.org/courses/events

Funded in part by NHGRI, NCRR, NERC, GSK

Co-Hosted by The Jackson Laboratory, University of Pennsylvania, European Bioinformatics Institute

Student Scholarships Available

Photo by R. Kennedy, B Trist, R. Tarver, for GPTMC