GxDb: a universal tool to collect, analyse, manage and visualize transcriptomic data
Wolfgang Raffelsberger, Raymond Ripp and Laetitia Poidevin
BingGi Days, January 2010
Introduction
What is transcriptomics?
-> a high-throughput analysis of gene expression by measuring the amount of mRNA
What are the techniques?
-> DNA microarrays
-> SAGE
-> Differential Display
-> …
=> large quantities of data
GxDb: an integrative tool to collect, treat, analyze, manage and visualize these data
GxDb is a website and a database
Organization of data in GxDb
Sample
• Individual (name, age, description)
• Organism
• Genotype
• Tissue
• Treatment
• SampleCondition
ex: mouse wt aged 9 days
Arraytype
ex: Mouse430_2
Each RealExp links an Arraytype, a Sample and its CEL files:
• RealExp: Arraytype, Sample, CEL files r1, r2, r3
• RealExp 2: Arraytype, Sample 2, CEL files r3, r4, r5
• RealExp 3: Arraytype, Sample 3, CEL files r6, r7, r8
• RealExp 4: Arraytype, Sample 4, CEL files r9, r10, r11
ex: Arraytype Mouse430_2; Samples wt_d9, wt_d11, wt_d13, wt_d15
Experiment: groups RealExp, RealExp 2, RealExp 3 and RealExp 4
Results attached to an Experiment:
• Signal Intensity
• Ratio
• Cluster
• Differentially (≠) expressed genes
• Quality
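The hierarchy above (Experiment, RealExp, Sample, Arraytype, CEL files) can be sketched as plain Python data classes. This is only an illustration of the relationships shown on the slides, not the actual GxDb schema: the class layout, the `build_time_course` helper, the experiment name and the `.CEL` file names are all assumptions.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Sample:
    name: str            # e.g. "wt_d9"
    organism: str
    genotype: str
    tissue: str = ""
    treatment: str = ""


@dataclass
class RealExp:
    arraytype: str       # e.g. "Mouse430_2"
    sample: Sample
    cel_files: List[str] = field(default_factory=list)  # replicate .CEL files


@dataclass
class Experiment:
    name: str
    realexps: List[RealExp] = field(default_factory=list)

    def cel_count(self) -> int:
        """Total number of .CEL files across all RealExps."""
        return sum(len(r.cel_files) for r in self.realexps)


def build_time_course() -> Experiment:
    """Hypothetical experiment: four samples, three replicates each."""
    exp = Experiment(name="example_time_course")  # hypothetical name
    labels = ["wt_d9", "wt_d11", "wt_d13", "wt_d15"]
    for i, label in enumerate(labels):
        sample = Sample(name=label, organism="mouse", genotype="wt")
        # Hypothetical file names r1.CEL ... r12.CEL, three per sample.
        cels = [f"r{3 * i + k}.CEL" for k in (1, 2, 3)]
        exp.realexps.append(RealExp("Mouse430_2", sample, cels))
    return exp
```

Grouping the replicate files under a RealExp keeps the later averaging and ratio steps simple, since every comparison is then a pair of RealExps on the same Arraytype.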
Treatment and Analysis protocol
1) Normalization
 - 6 methods: RMA, gcRMA, dChip, MAS5.0, plier, vsn
 => signal intensity
2) Calculate the average (between replicates) and the ratio
3) Filtering
 - Eliminate probesets that are not expressed in any array of an experiment, based on distribution or call (according to the normalization method)
 - Eliminate probesets with very low changes between condition and reference
   based on fold change
   based on standard deviation
4) Statistical analysis
 - method: t-test combined with empirical Bayes for shrinkage
 - estimation of the FDR (false discovery rate)
 - tag probesets with differential expression (automatic threshold finding)
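Steps 2 to 4 can be illustrated with a small, self-contained sketch. It is not the R code used by GxDb: the function names and the threshold value are assumptions, the empirical Bayes t-test is not reproduced, and the Benjamini-Hochberg procedure is used here only as one common way to estimate an FDR (the slides do not say which estimator GxDb uses).

```python
import math
from statistics import mean
from typing import Dict, List


def log2_ratio(condition: List[float], reference: List[float]) -> float:
    """log2 of (mean of condition replicates / mean of reference replicates)."""
    return math.log2(mean(condition) / mean(reference))


def filter_by_fold_change(ratios: Dict[str, float],
                          min_abs_log2: float = 1.0) -> Dict[str, float]:
    """Keep probesets whose |log2 ratio| reaches the (assumed) threshold."""
    return {p: r for p, r in ratios.items() if abs(r) >= min_abs_log2}


def benjamini_hochberg(pvalues: List[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values, returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

For example, `log2_ratio([200, 220, 180], [50, 50, 50])` compares a mean signal of 200 against a mean of 50, a four-fold change. Probesets would then be tagged as differentially expressed when their adjusted p-value falls below a chosen FDR threshold.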
5) Clustering
 - tool: Cluspack
 - methods: k-means (DPC), mixture models (AIC and BIC)
 => clusters
6) Quality Control Report
 - tool: RReportGenerator for automatic statistical analysis, to estimate the quality of arrays
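To make the clustering step concrete, here is a minimal sketch of plain k-means (Lloyd's algorithm) on expression profiles. It is not Cluspack and does not implement the DPC variant or the mixture models named above; seeding the centroids with the first k profiles and the fixed iteration count are simplifying assumptions.

```python
from typing import List, Sequence, Tuple

Point = Sequence[float]


def squared_distance(a: Point, b: Point) -> float:
    """Squared Euclidean distance between two profiles."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points: List[Point], k: int,
           iterations: int = 20) -> Tuple[List[int], List[List[float]]]:
    """Plain Lloyd's k-means; the first k points seed the centroids."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: nearest centroid for every profile.
        labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids
```

Each input point would be one probeset's ratio profile across the conditions of an experiment, so probesets with similar behaviour end up in the same cluster.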
Upload form
Step 1: Selection of Arraytype and Experiment
• Create your new experiment
• Entities shown in the form: Organism, Genotype, SampleCondition, Individual, TreatmentType, Treatment, Tissue, Sample
• Create your new samples
Step 2: Upload of .cel files
Step 3: Select the sample corresponding to each .cel file
Step 4: Select the comparisons of interest to calculate the ratio
Ratio: Condition / reference
Example: C3H_rd1_d10 / C3H_wt_d10
Step 5: Launch the Treatment and Analysis protocol, followed by clustering, quality analysis and loading into the database
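The "Condition / reference" convention of Step 4 can be sketched as a small parser that turns each comparison into a (condition, reference) pair and checks both sides against the uploaded samples. The function names and the validation rule are assumptions for illustration; the slides do not describe how the form stores comparisons.

```python
from typing import List, Tuple


def parse_comparison(text: str) -> Tuple[str, str]:
    """Split 'condition / reference' into a (condition, reference) pair."""
    condition, sep, reference = text.partition("/")
    condition, reference = condition.strip(), reference.strip()
    if not sep or not condition or not reference:
        raise ValueError(f"expected 'condition / reference', got {text!r}")
    return condition, reference


def validate_comparisons(comparisons: List[str],
                         known_samples: List[str]) -> List[Tuple[str, str]]:
    """Parse every comparison and check that both sides are known samples."""
    known = set(known_samples)
    pairs = []
    for text in comparisons:
        condition, reference = parse_comparison(text)
        missing = {condition, reference} - known
        if missing:
            raise ValueError(f"unknown sample(s): {sorted(missing)}")
        pairs.append((condition, reference))
    return pairs
```

With the example from the slide, `parse_comparison("C3H_rd1_d10 / C3H_wt_d10")` yields the pair used to compute the ratio of the rd1 condition over the wt reference.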
Organization of data in GxDb
• Entities: Experiment, RealExp, Sample, Cel file, Arraytype-Probeset
• Results: Signal Intensity, Ratio, ≠ expressed genes, Clustering, Quality
Query GxDb
• Experiment
• Probeset
• Sample
• RealExp
• Signal Intensity
• Ratio
• Cluster
time-course of retinal development
Visualization in GxDb
GxDb Website
• Upload
• Querying
• Display
alnitak, Star3, Star4, Star5, Star6, Star7, Star8
/GxData
GxDb SQL database
http://gx.igbmc.fr
Web Services
Café des sciences QSub
Scheduler
GxDb resources
Languages used:
• PHP (HTML): Upload, PipeWork, RadarGenerator, Fed
• R: Treatment and analysis protocol, RReportGenerator
• SQL
• Tcl: Gx (~ Gscope), Probeset loading
• C: Cluspack
Conclusion and Prospects
• Automated raw-data upload, storage, treatment and analysis
 - multiple treatment protocols
 - multiple clustering methods
 - multiple human and automatic expert analyses
=> Comparisons
=> Analyse the strengths and weaknesses of the different protocols
• Improvement of the website
 - more user friendly
 - visualization of clusters and ratios
 - tools for meta-analysis
• Possibility to upload data directly from GEO
• Diagnostic report to make the data easier to analyze
• Links to other databases and tools: STRING, GSEA..
Ratio Pipework: Organism, Normalization, Ratio minimum, Ratio maximum
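A query driven by these four fields could look like the following sketch, which uses an in-memory SQLite table. The table name, column names and probeset identifiers are hypothetical; the real GxDb SQL schema is not shown on the slides.

```python
import sqlite3


def make_demo_db() -> sqlite3.Connection:
    """Build a small in-memory table; the schema is hypothetical."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE ratio (probeset_id TEXT, organism TEXT, "
        "normalization TEXT, value REAL)"
    )
    rows = [
        ("probe_a", "mouse", "RMA", 2.5),
        ("probe_b", "mouse", "RMA", 0.9),
        ("probe_c", "mouse", "gcRMA", 3.1),
        ("probe_d", "mouse", "RMA", 4.8),
    ]
    con.executemany("INSERT INTO ratio VALUES (?, ?, ?, ?)", rows)
    return con


def query_ratios(con, organism, normalization, ratio_min, ratio_max):
    """Return (probeset_id, value) rows inside [ratio_min, ratio_max]."""
    cur = con.execute(
        "SELECT probeset_id, value FROM ratio "
        "WHERE organism = ? AND normalization = ? "
        "AND value BETWEEN ? AND ? ORDER BY value",
        (organism, normalization, ratio_min, ratio_max),
    )
    return cur.fetchall()
```

Passing the form values as bound parameters rather than formatting them into the SQL string avoids injection problems when the values come from a web form.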
Advantages of GxDb
• Integration and storage in a unifying format
• Automated raw-data upload, storage, treatment and analysis
 - multiple treatment protocols
 - multiple clustering methods
 - multiple human and automatic expert analyses
=> Comparisons
=> Analyse the strengths and weaknesses of the different protocols
• Facilitated querying and data visualization
GxDb transcriptomics
Each Arraytype has its own PROBESET table (PROBESET, PROBESET 2, PROBESET 3), with the fields:
• probeset_id, genename, genedescription
• species, speciessymbol
• representpublicid, refseqtranscriptid, gscope_id
• swissprot, unigene_id, entrezgene, ensembl, mgi
• cytoband, chromoloc
• omim, tissuespecificity, linkeddiseases
• go_biologicalprocess, go_cellularcomponent, go_molecularfunction
• pathway, interpro, transmembrane
45000
GxDb protocol from upload to display
1) Arraytype: already exists? If not, create a new Arraytype
2) Sample: already exists? If not, create a new Sample with an existing or new Individual, Organism, Tissue, Genotype and Treatment
3) Upload your .CEL files
4) Enter their association to Arraytypes and Samples
5) Define couples of RealExps for the ratio calculation
6) Fill in the other information for the Experiment
7) Run Automatic Analysis
8) Query and Display Results
• Quality Report
• Signal Intensity
• Ratio
• Cluster
• Differentially Expressed Genes
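The "already exists?" checks at the start of this protocol follow a get-or-create pattern: reuse the record if the name is known, otherwise create it. A minimal sketch, assuming an in-memory registry in place of the real database tables (the `Registry` class and its method are hypothetical names):

```python
from typing import Dict, Tuple


class Registry:
    """In-memory stand-in for a table of named records (hypothetical)."""

    def __init__(self) -> None:
        self._ids: Dict[str, int] = {}

    def get_or_create(self, name: str) -> Tuple[int, bool]:
        """Return (record_id, created); reuse the record if the name exists."""
        if name in self._ids:
            return self._ids[name], False
        new_id = len(self._ids) + 1
        self._ids[name] = new_id
        return new_id, True
```

The same helper would apply to Arraytypes, Samples, Individuals, Organisms, Tissues, Genotypes and Treatments, so that repeated uploads do not duplicate existing entries.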