GUS: A Functional Genomics Data Management System
Chris Stoeckert, Ph.D.
Center for Bioinformatics and Dept. of Genetics
University of Pennsylvania
ASM Conference on Functional Genomics and Bioinformatics Approaches to Infectious Disease Research
October 8, 2004 Portland, Oregon
Database Options for Integrated Functional Genomics
Requirements
  Covers genomics and functional genomics
  Active and open developer community
Options
  GUS: Genomics Unified Schema
  Chado: generic model organism database (GMOD http://www.gmod.org)
GUS
Core, SRes, TESS, RAD, DoTS
Oracle RDBMS
Object Layer for Data Loading
Java Servlets
A Few GUS Web Sites
Sanger Institute
U. Georgia
Flora Centromere Database
U. Chicago
U. Penn
U. Toronto
Phytophthora sojae genome
Virginia Bioinformatics Institute
GUS (Genomics Unified Schema)
http://www.gusdb.org
Namespace   Domain                     Features
RAD         Gene Expression            MIAME/MAGE-OM
DoTS        Sequence and annotation    EST clusters, Gene models
Core        Data Provenance            Documentation
SRes        Shared Resources           Ontologies
TESS        Gene Regulation            TFBS organization
RAD
EST clustering and assembly
DoTS
Genomic alignment and comparative sequence analysis
Identify shared TF binding sites
TESS
BioMaterial annotation
SRes
Examples of GUS users
Large sequencing center
  GeneDB: Pathogen Sequencing Unit at the Sanger Institute
Lightly staffed genomics project
  CryptoDB: Kissinger Lab, University of Georgia
Data mining project
  Multiple plant species: Brett Tyler, Virginia Bioinformatics Institute and collaborators
Expression based project
  dbDirt: Allen Okey, University of Toronto
Bioinformatics Core Facility
  University of Pennsylvania Bioinformatics Core Facility
GUS Project Goals
Provide:
  A platform for broad genomics data integration
  An infrastructure system for functional genomics
Support:
  Websites with advanced query capabilities
  Research driven queries and mining
GUS components
Warehouse (Oracle or PostgreSQL)
Perl Object Layer
Web Development Kit
Queries and analysis
Your data, GenBank, NRDB, dbEST, SNPs, Gene traps, MicroArrays, Phenotypes, Pathways, Orthologs, Taxonomy, GO, SO, EC, More…
Data Load API
Pipeline API
Plugins (data loaders)
Functional genomics with GUS
Sequence & Features
Study
Functional Annotation of the Genome
Central Dogma
Regulation (TESS)
Expression (RAD)
Sample
Image Analysis
Statistical Processing
Interaction
Study
Proteomics
Sample
Image Analysis
Statistical Processing
Study
In Situ Hybridization
Immunohistochemistry
Sample
Image Analysis
Statistical Processing
MIAME: www.mged.org
MIAPE: psidev.sf.net
MISFISHIE: www.scgap.org
GUS versus Chado
GUS represents biology in the database tables
  Forces applications to load and retrieve data consistently
Chado represents biology in the applications
  Allows flexibility in what can be stored but applications may not be consistent
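The contrast above can be made concrete with a toy example. The sketch below uses an in-memory SQLite database and invented table names (gene_feature, rna_feature, feature); it reproduces neither the real GUS schema nor the real Chado schema, and only illustrates the general trade-off between typed tables and a generic table whose row meaning is assigned by each application.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Style A, "biology in the tables": one table per biological concept.
# Table and column names here are hypothetical, not the real GUS schema.
cur.execute(
    "CREATE TABLE gene_feature ("
    " id INTEGER PRIMARY KEY, na_sequence_id INTEGER NOT NULL, name TEXT NOT NULL)"
)
cur.execute(
    "CREATE TABLE rna_feature ("
    " id INTEGER PRIMARY KEY,"
    " gene_feature_id INTEGER NOT NULL REFERENCES gene_feature(id),"
    " name TEXT NOT NULL)"
)

# Style B, "biology in the applications": one generic table; what a row
# means depends on a type label that each loading application picks.
cur.execute(
    "CREATE TABLE feature ("
    " id INTEGER PRIMARY KEY, type TEXT NOT NULL, parent_id INTEGER, name TEXT NOT NULL)"
)

# Two loaders store genes. With typed tables both must use gene_feature.
cur.execute("INSERT INTO gene_feature (na_sequence_id, name) VALUES (1, 'geneA')")
cur.execute("INSERT INTO gene_feature (na_sequence_id, name) VALUES (1, 'geneB')")

# With the generic table, nothing stops the loaders from disagreeing on labels.
cur.execute("INSERT INTO feature (type, name) VALUES ('gene', 'geneA')")
cur.execute("INSERT INTO feature (type, name) VALUES ('Gene_model', 'geneB')")

typed_genes = cur.execute("SELECT COUNT(*) FROM gene_feature").fetchone()[0]
generic_genes = cur.execute(
    "SELECT COUNT(*) FROM feature WHERE type = 'gene'"
).fetchone()[0]
print(typed_genes, generic_genes)
```

In this toy, the typed query finds both genes, while the generic query misses the row that a second loader labeled differently. The generic table, in turn, could store a new kind of feature without any schema change.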
Central dogma and sequences
NA Sequence
GeneFeature
RNAFeature
ProteinFeature
AA Sequence
Central dogma and sequences
Gene RNA Protein
NA Sequence AA Sequence
GeneFeature
RNAFeature
ProteinFeature
Central dogma and sequences
Gene RNA Protein
NA Sequence AA Sequence
genome
Multiple sequences (experimental variety)
Gene 1 Gene 2
RNA
Multiple genes
Central dogma and sequences
Gene RNA Protein
NA Sequence AA Sequence
GeneInstance
RNAInstance
ProteinInstance
GeneFeature
RNAFeature
ProteinFeature
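One possible reading of these diagrams is that a Gene is an abstract entity, a GeneFeature is a located annotation on one particular NA Sequence, and a GeneInstance links the two so that one gene can be observed on several sequences. The sketch below models only the gene level under that assumption. The class names follow the slide labels, but every field, the link helper, and the example data are hypothetical and are not taken from the GUS schema.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class NASequence:
    source_id: str
    residues: str


@dataclass
class GeneFeature:
    """A gene annotation located on one specific nucleic acid sequence."""
    na_sequence: NASequence
    start: int  # 0-based, inclusive (assumed convention)
    end: int    # exclusive

    def residues(self) -> str:
        return self.na_sequence.residues[self.start:self.end]


@dataclass
class Gene:
    """The abstract gene, independent of any one sequence."""
    name: str
    # Excluded from repr and comparison to avoid walking the back-references.
    instances: List["GeneInstance"] = field(
        default_factory=list, repr=False, compare=False
    )


@dataclass
class GeneInstance:
    """Links an abstract Gene to one GeneFeature that represents it."""
    gene: Gene
    feature: GeneFeature


def link(gene: Gene, feature: GeneFeature) -> GeneInstance:
    instance = GeneInstance(gene, feature)
    gene.instances.append(instance)
    return instance


# The same gene seen on a genomic sequence and on a separately sequenced clone.
genome = NASequence("chr_example", "AAATGCCCTAGGG")
clone = NASequence("clone_example", "ATGCCCTAG")
gene = Gene("geneA")
link(gene, GeneFeature(genome, 2, 11))
link(gene, GeneFeature(clone, 0, 9))

observed = {inst.feature.residues() for inst in gene.instances}
print(len(gene.instances), observed)
```

The RNA and protein levels would follow the same pattern (RNAInstance linking RNA to RNAFeature, ProteinInstance linking Protein to ProteinFeature), again assuming this reading of the diagram is right.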
Obtaining and Using GUS
www.gusdb.org
  More info at www.gusdb.org/documentation
  Active gusdev mailing list
Relatively straightforward to install
Loading data a struggle for new users
  Growing number of tools available
  Addressing how to use and write tools with visits
Web Development Kit (WDK) to generate web sites on GUS
Current GUS Developers
At Penn
  Steve Fischer: Project manager, WDK
  Elisabetta Manduchi: RAD project manager, RAD study annotator
  Angel Pizarro: Schema development, proteomics, MAGE export
  Mike Saffitz: DBA, web services, Postgres
  Dave Barkan: WDK, GO pipeline, Apollo interface
  Thomas Gan: WDK, genomic alignments pipeline
  John Iodice: ApiDoTS pipeline, data loading
  Li Li: OrthoMCL pipeline
  Junmin Liu: RAD websites, expression displays
  Debbie Pinney: Data loaders, Hum and MusDoTS pipeline
  Jonathan Schug: TESS, architecture and schema development
  Trish Whetzel: Data loading, RAD, schema development
  Plus rest of group contributes through various GUS-based projects
Pathogen Sequencing Unit, Sanger Institute
Kissinger Group, U. of Georgia
Terry Clark, U. of Chicago
WDKTestSite
Developed in collaboration with Adrian Tivey & Marie-Adele Rajandream (PSU, Sanger Institute)
The PlasmoDB Team
Shailesh Date, Kobby Essien, Martin Fraunholz, Bindu Gajria, Greg Grant, John Iodice, Jessie Kissinger, Philip Labo, Li Li, Jules Milgram, David Roos, Chris Stoeckert, Trish Whetzel
NIAID grant: R01 AI058515
GUS supports a wide variety of queries
Suppose you want to find all kinases in P. falciparum
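As a rough illustration of what such a query involves, the sketch below runs a keyword search restricted to one species against an in-memory SQLite database. The tables (taxon, gene_feature), their columns, the find_genes helper, and all rows are invented for this example. They are not the GUS schema or PlasmoDB data, and a production site might well search by annotation terms or protein domains instead of product text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical miniature schema and made-up rows, for illustration only.
conn.executescript("""
CREATE TABLE taxon (taxon_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE gene_feature (
    source_id TEXT PRIMARY KEY,
    taxon_id  INTEGER NOT NULL REFERENCES taxon(taxon_id),
    product   TEXT NOT NULL
);
INSERT INTO taxon VALUES (1, 'Plasmodium falciparum');
INSERT INTO taxon VALUES (2, 'Plasmodium yoelii');
INSERT INTO gene_feature VALUES ('gene_0001', 1, 'putative protein kinase');
INSERT INTO gene_feature VALUES ('gene_0002', 1, 'hypothetical protein');
INSERT INTO gene_feature VALUES ('gene_0003', 2, 'serine/threonine kinase');
INSERT INTO gene_feature VALUES ('gene_0004', 1, 'Calcium-dependent Kinase');
""")


def find_genes(conn, species, keyword):
    """Return source ids of genes in `species` whose product mentions `keyword`."""
    rows = conn.execute(
        """
        SELECT g.source_id
        FROM gene_feature AS g
        JOIN taxon AS t ON t.taxon_id = g.taxon_id
        WHERE t.name = ?
          AND g.product LIKE '%' || ? || '%'
        ORDER BY g.source_id
        """,
        (species, keyword),
    ).fetchall()
    return [row[0] for row in rows]


kinases = find_genes(conn, "Plasmodium falciparum", "kinase")
print(kinases)
```

SQLite's LIKE is case-insensitive for ASCII text by default, so the differently capitalized product name also matches. The P. yoelii kinase is excluded by the species filter.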
Gene Report Pages Integrate Genomics and Functional Genomics
RAD Study-Annotator
Covers the MIAME checklist and exploits the MGED Ontology
Allows entering of very specific details of an experiment
Web-based forms:
  Modular structure
  Written in PHP
  Front-end data integrity checks using JavaScript
  Manages Data Privacy based on Project/Group selections present in GUS schema
Manduchi et al. 2004 Bioinformatics 20:452-459.
Vision for GUS
Installable for every lab
  Improve install scripts, documentation
  Postgres version
Extendable to all areas of functional genomics
  Sequence, array-based expression experiments
  Array CGH, 2-D gel electrophoresis, mass spectrometry, yeast 2-hybrids
  In situ hybridizations, metabolites
Interoperable with other GUS installations and with common tools
  Exchange files and scripts, MAGE-ML (use community standards)
  Web services (exchange objects)
  Interface with open source tools such as Gbrowse, Artemis, Apollo
Standards and Ontologies for Functional Genomics 2
October 23-26, 2004
held at the University of Pennsylvania Medical School
www.jax.org/courses/events
Funded in part by NHGRI, NCRR, NERC, GSK
Co-Hosted by
  The Jackson Laboratory
  University of Pennsylvania
  European Bioinformatics Institute
Student Scholarships Available
Photo by R. Kennedy, B Trist, R. Tarver, for GPTMC