Daehee Hwang, Leroy Hood
Institute for Systems Biology
2. Why Prequips for Systems Biology with proteomic data?
• Need for visualization, analysis, and integration of multiple proteomic datasets: raw data level, peptide level, protein level, multi-sample analysis
• Need for an interface between proteomic data and systems biology analytical tools such as network/pathway analyses
3. Integration of proteomic data at various levels
Diagram: three separate Trans-Proteomic Pipeline instances, each with the levels Raw Data (MS, MS/MS), Peptide Id + Quantitation, and Protein Id + Quantitation, followed by an open "?".
Communication between the instances is not possible!
4. Pep3D: Quality Assessment
Pep3D properties:
- quality assessment
- 2D gel-like visualization
Diagram: Trans-Proteomic Pipeline levels (Raw Data (MS, MS/MS), Peptide Id + Quantitation, Protein Id + Quantitation, "?") next to Prequips (Multi Sample) and the Gaggle tools: Network Analysis (Cytoscape), Interaction Database (STRING), Pathway Database (KEGG), Microarray Data Analysis (Mayday, TIGR).
5. Pep3D: Quality Assessment
Diagram: Pep3D Instance 1 and Pep3D Instance 2.
Communication not possible!
6. Interface to Systems Biology
Diagram: the Gaggle tools, Network Analysis (Cytoscape), Interaction Database (STRING), Pathway Database (KEGG), and Microarray Data Analysis (Mayday, TIGR), next to the Trans-Proteomic Pipeline levels (Raw Data (MS, MS/MS), Peptide Id + Quantitation, Protein Id + Quantitation, "?").
Communication not possible!
7. Prequips Overview
Key Properties
- handles multiple samples at all levels
- integrates high-level analysis tools
- is extensible
Diagram: Prequips (Multi Sample) sits between the Trans-Proteomic Pipeline levels (Raw Data (MS, MS/MS), Peptide Id + Quantitation, Protein Id + Quantitation, "?") and the Gaggle tools: Network Analysis (Cytoscape), Interaction Database (STRING), Pathway Database (KEGG), Microarray Data Analysis (Mayday, TIGR).
8. Integration of proteomic datasets at various levels
Diagram (Trans-Proteomic Pipeline):
- Mass Spectrometer: raw data, e.g. mzXML, mzData, ...
- Database Search: peptide-level data, e.g. pepXML, AnalysisXML, ...
- Validation, Peptide Quantification
- Protein Inference, Protein Quantitation: protein-level data, e.g. protXML, ...
- annotation
- further analysis results
9. Data model
Project
- Single-Sample Analysis
- Multi-Sample Analysis
Data Structures
- Raw Data: Core, Meta
- Peptide Level: Core, Meta
- Protein Level: Core, Meta
Data Providers
- raw data level, e.g. mzXML or mzData files
- peptide-level data source, e.g. pepXML, dta or AnalysisXML files
- protein-level data source, e.g. protXML files
Viewers
Perspectives
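The data model on this slide can be illustrated with a small sketch. It is not taken from the Prequips source: the class names (`Element`, `SampleAnalysis`, `Project`), the `LEVEL_FORMATS` table, and the idea of choosing a data level from a file extension are all assumptions made for illustration. Only the three levels, the core/meta split, and the example formats come from the slide.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List

# Assumption: file extensions stand in for the formats named on the slide.
LEVEL_FORMATS = {
    "raw": ("mzxml", "mzdata"),
    "peptide": ("pepxml", "dta", "analysisxml"),
    "protein": ("protxml",),
}


@dataclass
class Element:
    """One data item: fixed core fields plus open-ended meta information."""
    core: Dict[str, Any]
    meta: Dict[str, Any] = field(default_factory=dict)


@dataclass
class SampleAnalysis:
    """Single-sample analysis holding elements at all three levels."""
    name: str
    levels: Dict[str, List[Element]] = field(
        default_factory=lambda: {level: [] for level in LEVEL_FORMATS})


@dataclass
class Project:
    """Multi-sample container grouping several single-sample analyses."""
    name: str
    samples: List[SampleAnalysis] = field(default_factory=list)

    def count(self, level: str) -> int:
        """Number of elements at one level across all samples."""
        return sum(len(s.levels[level]) for s in self.samples)


def level_for_file(filename: str) -> str:
    """Pick the data level from a file extension (case-insensitive)."""
    ext = filename.rsplit(".", 1)[-1].lower()
    for level, formats in LEVEL_FORMATS.items():
        if ext in formats:
            return level
    raise ValueError(f"no data provider for {filename!r}")
```

Keeping meta information in an open dictionary, separate from the core fields, is one way to let later analysis results or annotation attach to an element without changing its type.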
10. Case Study: Toponomic change in drug-treated Mø
Figure: marker proteins Calreticulin, BiP, Bcl2, ATPase, and Lamp1 across fractions 2 to 20 (Fraction #: 2, 4, 6, ..., 20), annotated 8% and 28%; labels 114, 115, 116, 117; samples Mock1, Mock2, Thapsigargin.
11. Visualization: Single exp.
Screenshot annotations:
- project manager
- peak map for run 29
- all scans of Mock 1 experiment
- level 1 spectrum & corresponding CID spectra (level 2)
- CID spectra that have been selected
- detailed information about one of the level 2 spectra
12. Visualization: Multiple exps.
(polymer?) contamination in all 4 runs (this would be hard to see with Pep3D)
Color scale: green = 0, red = 1
13. Visualization: assess, quantify, etc.
Mock-up (software is under development):
Axes: m/z (min to max) and retention time (min to max).
Map list: map 1, map 2, map 3, map 4, map 5, map 6. Maps 1 to 4 are shown side by side; one is marked with an X: "Doesn't really match the remaining 3 maps!"
14. Prequips & the Gaggle
Diagram: the Gaggle Boss connects Prequips, Mayday, the R statistical environment, Cytoscape, and a browser (KEGG, DAVID).
Exchange of data structures such as name lists, lists of name-value pairs, matrices and networks.
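To make the exchange through the Gaggle Boss concrete, here is a toy in-process model of a hub that relays the four kinds of data structures between tools. It is a sketch only and does not use the real Gaggle API. The `Boss` class, its `register` and `broadcast` methods, and the type aliases are hypothetical names chosen for this example.

```python
from typing import Callable, Dict, List, Optional, Tuple, Union

# The four payload kinds named on the slide, modeled with plain Python types.
NameList = List[str]
NameValuePairs = Dict[str, float]
Matrix = Dict[str, List[float]]      # row name -> values, one per column
Network = List[Tuple[str, str]]      # edges between named nodes
Payload = Union[NameList, NameValuePairs, Matrix, Network]
Handler = Callable[[str, Payload], None]


class Boss:
    """Hypothetical hub that relays payloads between registered tools."""

    def __init__(self) -> None:
        self._tools: Dict[str, Handler] = {}

    def register(self, name: str, handler: Handler) -> None:
        self._tools[name] = handler

    def broadcast(self, source: str, payload: Payload,
                  target: Optional[str] = None) -> List[str]:
        """Send payload to one target, or to every tool except the sender."""
        recipients = [target] if target else [n for n in self._tools if n != source]
        for name in recipients:
            self._tools[name](source, payload)
        return recipients


# Each tool gets an inbox that records (sender, payload) pairs.
received: Dict[str, list] = {"Prequips": [], "Cytoscape": []}
boss = Boss()
for tool in received:
    # Bind the tool name now so each handler appends to its own inbox.
    boss.register(tool, lambda src, data, tool=tool: received[tool].append((src, data)))

# Prequips broadcasts a selected protein name; only Cytoscape receives it.
boss.broadcast("Prequips", ["calreticulin"], target="Cytoscape")
```

Routing everything through one hub means each tool only needs to understand the shared data structures, not the other tools.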
15. Mayday
16. Cytoscape
overall mouse protein/protein interaction map in Cytoscape
17. Analysis: Feature extraction
Screenshot annotations:
- Protein table
- Gaggle plugin for interaction with other tools
- Filters
18. Analysis: Feature extraction
Gaggle plugin: selection for broadcast (calreticulin)
19. Analysis: Feature selection
Samples: Mock1, Mock2, Thapsigargin
20. Broadcast to Gaggle
21. Prequips to Gaggle
Same Gaggle diagram as on slide 14 (Gaggle Boss, Prequips, Mayday, R statistical environment, Cytoscape, browser for KEGG and DAVID).
22. Gaggle Boss
23. Gaggle to Cytoscape
Same Gaggle diagram as on slide 14 (Gaggle Boss, Prequips, Mayday, R statistical environment, Cytoscape, browser for KEGG and DAVID).
24. Integration: Network Analysis
Network modules: proteasome complex, ribosome large subunit, chaperones, actin filament regulation
Thapsigargin 114 iTRAQ ratio
25. Cytoscape to Prequips
Same Gaggle diagram as on slide 14 (Gaggle Boss, Prequips, Mayday, R statistical environment, Cytoscape, browser for KEGG and DAVID).
26. Analysis: Feature extraction - Module selection
the ids sent from Cytoscape through the Gaggle: proteasome proteins
27. Prequips & the Gaggle
Same Gaggle diagram as on slide 14 (Gaggle Boss, Prequips, Mayday, R statistical environment, Cytoscape, browser for KEGG and DAVID).
28. Analysis: Functional enrichment
the proteasome complex is enriched compared to a mouse genome background
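Enrichment against a genome background is commonly scored with a one-sided hypergeometric test. The slide does not say which statistic was used, so the sketch below is an assumption about the general technique, not a description of the tool behind this result. The function and parameter names are hypothetical.

```python
from math import comb


def enrichment_p_value(background_size: int, background_hits: int,
                       selection_size: int, selection_hits: int) -> float:
    """One-sided hypergeometric tail: P(X >= selection_hits).

    background_size: proteins (or genes) in the background
    background_hits: background members carrying the annotation
    selection_size:  members of the selected module
    selection_hits:  selected members carrying the annotation
    """
    upper = min(background_hits, selection_size)
    total = comb(background_size, selection_size)
    # Sum the probability of every outcome at least as extreme as the observed one.
    tail = sum(
        comb(background_hits, k)
        * comb(background_size - background_hits, selection_size - k)
        for k in range(selection_hits, upper + 1)
    )
    return tail / total
```

A small p-value means that an annotation such as "proteasome complex" appears in the selected module more often than random draws from the background would explain. Real analyses usually also correct for testing many annotations at once.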
29. Prequips Summary
Key Properties
- handles multiple samples at all levels
- integrates high-level analysis tools
- is extensible
Same overview diagram as on slide 7: Prequips (Multi Sample) between the Trans-Proteomic Pipeline levels and the Gaggle tools.
30. Conclusion
• General and extensible software for systems biology research with proteomic mass spectrometry data.
• Integrates data from various sources for visualization and analysis.
• An interactive environment that supports (visual) data exploration.
31. Software details
• implemented in Java
• based on Eclipse Rich Client Platform
• extremely modular architecture
• multiple plugin interfaces
  – e.g. viewers, data providers, algorithms
• meta information framework
  – analysis results, sequence information, annotation, ...
  – data structures as plugins
  – requirement to support future analytical tools and data sources
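The plugin interfaces listed above can be pictured as named extension points that plugins register against. Prequips itself is a Java application built on the Eclipse Rich Client Platform. The Python sketch below is only a language-neutral illustration of the idea and does not reflect the Eclipse API or the Prequips code. `PluginRegistry`, `EXTENSION_POINTS`, and the example `log2_ratio` algorithm are hypothetical.

```python
import math
from collections import defaultdict
from typing import Any, Callable, Dict, List

# Assumption: one extension point per plugin kind named on the slide.
EXTENSION_POINTS = ("viewer", "data_provider", "algorithm")


class PluginRegistry:
    """Hypothetical registry mapping extension points to named plugins."""

    def __init__(self) -> None:
        self._plugins: Dict[str, Dict[str, Any]] = defaultdict(dict)

    def register(self, point: str, name: str) -> Callable[[Any], Any]:
        """Decorator that files a function or class under an extension point."""
        if point not in EXTENSION_POINTS:
            raise ValueError(f"unknown extension point: {point!r}")

        def decorator(plugin: Any) -> Any:
            self._plugins[point][name] = plugin
            return plugin

        return decorator

    def get(self, point: str, name: str) -> Any:
        return self._plugins[point][name]

    def names(self, point: str) -> List[str]:
        return sorted(self._plugins[point])


registry = PluginRegistry()


@registry.register("algorithm", "log2_ratio")
def log2_ratio(treated: float, control: float) -> float:
    """Example algorithm plugin: log2 fold change between two intensities."""
    return math.log2(treated / control)
```

The core application looks plugins up by extension point and name, so new viewers, data providers, or algorithms can be added without changing the core.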
32. Acknowledgements
• Special thanks to Nils Gehlenborg
• Hood Lab: Inyoul Lee
• Kay Nieselt
• Aebersold Lab: Nichole King, James Eddes, Eric Deutsch, Ning Zhang, David Shteynberg, Wei Yan, and Andrew Garbutt
• Paul Shannon for help with the Gaggle
33. Diagram: Prequips Core surrounded by extensions:
- Visualization
- Mayday
- Excel
- Gaggle
- Database: PostgreSQL database, MySQL database
- R: R environment, Bioconductor
- SBEAMS: SBEAMS installation
- Machine Learning: WEKA Library
- anything else
34. Cytoscape