Upload
others
View
11
Download
0
Embed Size (px)
Citation preview
1
Analysis and interpretation of metabolomics and proteomics data
Matej Orešič27.9.2005
2
Nucleus
Cytoplasm
Extracellularmetabolites:NMR, MS
P P
Protein levels:2D gels, MS
Intracellular metabolites:NMR, MS
Post-translationalmodifications:2D gels, MS
P
Protein complexes:Protein chips, Ab arrays
Gene expressionmachinery
proteins
Gene expression:Microarrays
Control of geneexpression:Promoter arrays
folding
Technologies for systems biology studies at the cellular level
3
KEGG Data
4
Metabolic Fluxes
Cell
Subtrates
Products Biomass
GenomicsProteomics
Enzyme activitiesMetabolite levels
etc.
Subtrates
Products Biomass
Metabolic flux balancing
Subtrates
Products Biomass
-in most cases underdetermined system=> experimental constraints necessary
8
Flux balancing
X 3
21
v3+v2-v1=0
9
Chemostat culturesH2490: XR/XDH+XKmc
0
2
4
6
8
10
12
0 50 100 150 200 250 300 350
Cultivation time (h)
OD
600,
Etha
nol (
g/l)
0,0
0,4
0,8
1,2
1,6
2,0
2,4
CO
2 (%
), A
ceta
te, G
lyce
rol (
g/l)
OD600
ACETATE
CO2
EtOH
GLYCEROL
Aerobic (6 Tr) Anaerobic (5 Tr)13C-Labelled (1 Tr)
13C-Labelled (1 Tr)
0
2
4
6
8
10
12
-10 40 90 140 190 240 290 340
Cultivation time (h)
OD 6
00, N
orm
aliz
ed x
ylos
e (2
7 g/
l in
feed
)
0,0
0,4
0,8
1,2
1,6
2,0
2,4
CO
2 (%
), Xy
litol
, Eth
anol
(g/l)
Aerobic (6 Tr) Anaerobic (5 Tr)13C-Labelled (1 Tr) 13C-Labelled (1
Xylitol
Xylose
OD600
Ethanol
CO2
10g/L Glucose
3 g/L glucose+ 27g/L xylose
Samples:•aerobic culture •anaerobic culture•5, 30, and 60 minutesafter the switch off oxygensupply
10
Benefits of 13C labeling & NMR
• position sensitive
• isotopomer sensitive
12
METAFoR(metabolic flux analysis)
14
Mass spectrometry
15
Separation
RPLC/CapLC +ESI
Massspectrometer
(Qtof)
Relativ
e
Abundanc
e
10.00 15.00 20.00 25.00Time0
100
%
Retention time
MS-only ion chromatogram
400 600 800 1000 1200 1400m/z0
100
%
Mass / Charge
862 863 864 865 866 867 868 869m/z0
100
%
864.43
863.04862.70
864.92
865.41
865.92
866.44867.42
869.45868.34
SelectedIon
MS survey scan
400 600 800 1000 1200 1400 1600 1800 2000m/z
05
101520253035
404550556065707580859095
100 1047.5
1683.6960.4
1160.51659.5
873.4 1273.6 1570.41531.6646.4 1402.6745.5 1771.31071.31199.5 1985.5826.3 1857.9
610.1 647.3545.3425.9
b14
b7b9 b10 b11 b12
b13
b15 b17b16
y8
y15
y14
y13y12
y11
y10
y9
y7
y6y5
1328.6y16
Mass / Charge
MS/MS fragmentation pattern
Proteins, purifiedand separated
Sampledigest
labellingPeptides
LC/MS proteomics platform and data processing
16
LC/MS TIC ProfilesFraction 4
Liver Protein ProfilingFractionation using Reversed Phase Chromatography
RP Separation
Fraction 4
5 10 15 20 25 30Time (min)
20406080
100
Relative Abundance
ApoE3-2
ApoE3-3
ApoE3-4
ApoE3-5
ApoE3-7
WT-14
WT-15
WT-16
WT-20
WT-13
TIC
17
6 8 10 12 14 16 18 20 22 24 26 28 30 32 34
Time (min)
0
10
20
30
40
50
60
70
80
90
100
Rel
ativ
e Ab
unda
nce
6 8 10 12 14 16 18 20 22 24 26 28 30 32 34
Time (min)
0
10
20
30
40
50
60
70
80
90
100
Rel
ativ
e Ab
unda
nce
ApoE3
Wildtype
Plasma Protein ProfilingLC/MS of digested SEC fraction
18
PC 1
Unsupervised clustering reveals differences at 9 week age
-0.4 -0.3 -0.2 -0.1 0 0.1 0.2 0.3 0.4
-0.4
-0.2
0
0.2
0.4
0.6
WT-13
WT-20WT-16
WT-14
WT-15
ApoE3-2
ApoE3-4
ApoE3-7
ApoE3-3
ApoE3-5
Fraction IPC 2
MousePlasma
PCA Score plot PC1 vs PC2
Plasma Protein ProfilingPrincipal Component Analysis: Fraction I
19
Identifyusing
MS / MSMousePlasma
Fraction II
500 1000 1500 2000
-0.4
-0.3
-0.2
-0.1
0
0.1
0.2
0.3
0.4
Factor Spectrum
465.00 485.00
535.00
622.00652.00
662.00
750.00
811.00
911.00926.00
992.00
1041.00 1050.00
1208.001216.00
1243.001267.00
1268.001298.00
1340.001366.00
1583.00
m/z
Mor
e ab
unda
ntIn
Apo
E3Le
ss a
bund
ant
In A
poE3
Plasma Protein ProfilingFactor Spectrum: Peptides Exhibiting Differences
20
ApoE3-7 #768-783 RT: 27.60-28.01 AV: 6 NL: 1.16E5F: + c ESI Full ms2 [email protected] [ 375.00-2000.00]
400 600 800 1000 1200 1400 1600 1800 2000m/z
0
5
10
15
20
25
30
35
40
45
50
55
60
65
70
75
80
85
90
95
100
Rel
ativ
e A
bund
ance
1047.5
1683.6960.4
1160.5
1659.5873.4 1273.6 1570.4
1531.6646.4 1402.6745.5 1771.31071.3 1199.5 1985.5826.3 1857.9
610.1 647.3545.3425.9
b14
b7b9 b10 b11 b12
b13
b15b17b16
y8
y15
y14
y13y12
y11
y10
y9
y7
y6y5
1328.6y16
>gi|4557325|ref|NP_000032.1|| apolipoprotein E-gi|114039|sp|P02649|APE_HUMANAPOLIPOPROTEIN E PRECURSOR (APO-E)-gi|71795|pir||LPHUE apolipoprotein E precursor- human-gi|178851|gb|AAB59546.1| (K00396) preapolipoprotein E [Homo sapiens]-gi|4105704|gb|AAD02505.1| (AF050154) apolipoprotein E [Homo sapiens][MASS=36154]MKVLWAALLV TFLAGCQAKV EQAVETEPEP ELRQQTEWQS GQRWELALGR FWDYLR WVQT LSEQVQEELLSSQVTQELR A LMDETMKELK AYKSELEEQL TPVAEETRAR LSKELQAAQA RLGADMEDVC GRLVQYRGEVQAMLGQSTEE LRVRLASHLR KLRKRLLRDA DDLQKRLAVY QAGAREGAER GLSAIRERLG PLVEQGRVRAATVGSLAGQP LQERAQAWGE RLRARMEEMG SRTRDRLDEV KEQVAEVRAK LEEQAQQIRL QAEAFQARLKSWFEPLVEDM QRQWAGLVEK VQAAVGTSAA PVPSDNH>monoisotopic mass = 36113
position sequence ( NCBI BLAST link) -------- -------- 57- 79 WVQTLSEQVQEELLSSQVTQELR
W V Q T L S E Q V Q E E L L S S Q V T Q E L Ry8y15 y14 y13 y12 y11 y10 y9 y7 y6 y5y16
b14b7 b9 b10 b11 b12 b13 b15 b17b16
m/z 1366+2 Ion hApoE3
Peptide Sequencing using MS/MS
21
MetabolomicsStudy of small molecules , or metabolites, contained in a cell, tissue or organ (including fluids) and
involved in primary and intermediary metabolism.
Homeostasis‘Housekeeping’
Organic Acids
Lipids
Amino Acids
Nucleotides
Steroids
Eicosanoids
Neurotransmitters
Peptides
Trace elements
ng/mlng/ml
pg/mlpg/ml
mg/mlmg/ml
µµg/mlg/mlNMR
MassSpectrometry
CUSTOM
22
We are not alone genomewise …
From Nicholson et al., Nature Reviews Microbiology (2005)
23
Historical note
1500-2000BCChina•Ants used to detect patients with diabetes
1940s-1970s•Advances in analytics•Pattern recognition
Metabolic profiling
21st century•Advances in analytics•Biostatistics & Bioinformatics
Modern era of metabolomics and systems biology
24
Modern metabolomics platformExperiment design + Analytical chemistry + Chemometrics + Bioinformatics
Global screening (H2O soluble)
Global screening (Lipids)
Primary metabolitesEicosanoids and fatty acids
Other
SamplesMetabolite extraction methods
and analytical platformsProfiling experiments
(LC/MS, GC/MS)
Data processingIdentification
Bio-/chemo-informaticsknowledge mining
Multivariate statistical analyses Data-driven integration
Biological insight
25
Data processing
Analysis of Variance (ANOVA)
Selection of peaks displaying significant changes between Wild Type and Transgenic, separately from
gender or age specific effects
Univariate Analysis
ParametricTests(t-test)
NonparametricTests
(Kolmogorov-Smirnov)
Pre-processing & Normalization & QC
Prioritization of Important Peaks for Identification
Correlation Analysis
Correlation NetworksLinear and Non-Linear approachto profile association calculation
Select peaks with high levelof correlations to strongest
outliers
Exploratory Analysis
PCA and Discriminant Analysis
Study general trendsIn data
Verification of Protein or Metabolite IDs. Databases Extensions/Traversals
26
Wild type
ppm1.02.03.04.05.0
ApoE3
Global Metabolite AnalysisNMR Spectra of Plasma
27Plasma Extract (Organic Phase)
∗
∗
∗∗∗
∗∗∗
-0.15 -0.10 -0.05 0.00 0.05 0.10 0.15
-0.3
-0.2
-0.1
0.0
0.1
0.2
∗
∗∗∗
∗
∗
∗∗
∗∗∗
∗
∗
∗
∗∗
∗∗
∗∗
∗∗
∗
∗
∗
∗
∗
∗
∗
∗∗
∗
∗∗
∗
∗
∗
∗
∗∗
∗
∗
∗∗
∗
∗
∗
∗
∗
∗
∗
∗
∗
∗
∗∗
∗∗
∗∗∗
∗
∗
∗
∗
∗
∗∗∗
∗∗∗
-0.15 -0.10 -0.05 0.00 0.05 0.10 0.15D1
-0.3
-0.2
-0.1
0.0
0.1
0.2
D2
∗
∗∗∗
∗
∗
∗∗
∗∗∗
∗
∗
∗
∗∗
∗∗
∗∗
∗∗
∗
∗
∗
∗
∗
∗
∗
∗∗
∗
∗∗
∗
∗
∗
∗
∗∗
∗
∗
∗∗
∗
∗
∗
∗
∗
∗
∗
∗
∗
∗
∗∗
∗∗
∗∗∗
∗
∗
∗
ApoE3Wild type
Metabolite AnalysisPlasma NMR Principal Component & Discriminant Analysis
28
E+067.518
8:00 16:00 24:00 32:00 40:000
20
40
60
80
100
m/z:450>950
E+067.518
8:00 16:00 24:00 32:00 40:000
20
40
60
80
100
m/z:450>950
time (min)
Inte
nsity
Wildtype
APOE3
m/z:450>950
E+066.221
8:00 16:00 24:00 32:00 40:000
20
40
60
80
100 E+066.221
8:00 16:00 24:00 32:00 40:000
20
40
60
80
100
Inte
nsity
ApoE3
Wildtype
ApoE3 vs. WT: LC-MS Plasma Lipid Profiles
PlasmaLipids
Metabolite Analysis- LC/MS of Plasma Lipids
29
0 200 400 600 800 1000 1200 1400 1600
-1
-0.5
0
0.5
1
369.00
569.00
623.00
823.00847.00848.00
849.00850.00
851.00855.00
871.00872.00
873.00874.00
875.00876.00 877.00
878.00879.00
880.00900.00
901.00902.00
903.00904.00
905.00906.00
918.00919.00
926.00928.00
929.00931.00
932.00945.00946.00
948.00950.00
951.00952.00
953.00967.00972.00
975.00995.00
996.00
1040.001041.00
1062.001063.00
1064.001065.00 1086.00
1087.001088.00
1536.001537.00
1668.00
1706.001707.00 1711.00
1732.001733.00
1734.001735.00
1736.001737.00
1738.001739.00
m/z
Mor
eab
unda
ntIn
APO
E*3
Less
abun
dant
In A
POE*
3
triglycerides
phospholipid compound
• Mouse Plasma• Metabolite (lipid)• LC-MS• IMPRESS algorithm• PARC pattern recognition
• Mouse Plasma• Metabolite (lipid)• LC-MS• IMPRESS algorithm• PARC pattern recognition
Novel Metabolite Information
Metabolite AnalysisApoE3 vs. WT: Plasma LipidDifference Factor Spectrum
30
Normalized IntegratedDifferential Profile
Genes Peptides Metabolites
Diff
eren
tial P
rofil
e(n
orm
aliz
ed s
igni
fican
ce)
100
50
75
25
0
25
50
75
100
Variable Index• Mouse Liver• mRNA +Protein + Metabolite• Normalization• Pattern recognition
• Mouse Liver• mRNA +Protein + Metabolite• Normalization• Pattern recognition
Mor
eab
unda
ntIn
Apo
E3Le
ssab
unda
ntIn
Apo
E3
Microarrays
Multi-dimensionalchromatography,LC/MS LC/MS, NMR
31
Similarity
s1 s2 s3 s4 s5 c1 c2 c3 c4 c5
m/z
1
2
3
Samples (peak 1)
s1 s2 s3 s4 s5 c1 c2 c3 c4 c5
m/z
1
2
3
Samples (peak 2)
Example: highly correlated peaks
32
Similarity
s1 s2 s3 s4 s5 c1 c2 c3 c4 c5
m/z
1
2
3
Samples (peak 1)
s1 s2 s3 s4 s5 c1 c2 c3 c4 c5
m/z
1
2
3
Samples (peak 2)
Example: uncorrelated peaks
33
Correlation networkscan reveal patterns of changes relevant to the physiological response
A. Histogram of the distribution of peaks (lipid compounds) according to up-/down-regulation.
B. Down-regulated long-chaintriacylglycerol cluster
C. Up-regulated lipids(mainly long chain phospholipids, short-chaintriacylglycerols, and diacylglycerols)
D. Downregulated cluster containing three C32 phsophatidylcholine lipids
E. Upregulated ceramide cluster
Down-regulated Up-regulated
Node color legend
0.3 0.5 0.7 1 1.4 2 2.8 4 5.60
50
100
150
200
250
mean KO/ mean WT
N p
eaks
Total 1308 peaks
G. Medina Gomez et al., Diabetes (2005)
34
Subspace clustering methods
35
Unsupervised clusteringNo prior information used
Set of “objects” (e.g. samples), each described by several variables (e.g. gene expression, metabolite profiles)
36
Unsupervised clusteringNo prior information used
• Find groups of objects with small within-group distances and large between-group distances
• Several choices of distance metrics
• Examples: K-Means, Hierarchical, Subspace clustering methods
37
Supervised clusteringPrior grouping information available → Classification
• Find a model for each group, in order to be able to classify previously ungrouped objects
• Examples: Neural networks, Genetic algorithms, Support vector machines, Linear discriminant analysis
• Main problem in clinical applications (biomarkers, diagnostics): Lack of proper validation and overfitting.
38
Subspace similarityMetabolites may be dynamically (de)coupled under
specific conditions
s1 s2 s3 s4 s5 c1 c2 c3 c4 c5
m/z
1
2
3
Samples (peak 1)
s1 s2 s3 s4 s5 c1 c2 c3 c4 c5
m/z
1
2
3
Samples (peak 2)
39
Example 2: Functional genomicsob/ob mouse model
• Spontaneous mutation in ob gene resulting in lack of leptin (product of ob gene)
• Leptin hormone is a satiety signal – hormone secreted from adipose tissue– modulates energy intake and utilization
• Model for early onset of severe obesity
40
ob/ob and WT mouse white adipose tissueLipidomic profiles reveal gender-specific differences
-6 -4 -2 0 2 4 6 8 10
x 104
-4
-3
-2
-1
0
1
2
3
4
5
6x 104
Scores on PC 1 (33.49%)
Sco
res
on P
C 4
(8.5
2%)
obF1a
obF1b
obF2a
obF2b
obF3a
obF3b
obF4a
obF4b
obM1a obM1b obM2a
obM2b
obM3a
obM3b
obM4a
obM4b wtF1a wtF1b
wtF2a
wtF2b
wtF3a
wtF3b
wtF4a
wtF4b
wtM1a wtM1b
wtM2a
wtM2b
wtM3a
wtM3b
wtM4a
wtM4b
Oresic and Vidal-Puig
2501 peaks
41
Double KO models (ob/ob and PPARγ2)WT/WT, WT/KO, KO/WT, and KO/KO
koko
M2a
koko
M2b
wtk
oM3a
wtk
oM3b
wtk
oM1a
wtk
oM2a
wtk
oM1b
wtk
oM2b wtw
tM1a
wtw
tM2a
kow
tF1a
kow
tF1b
wtw
tM2b
wtw
tM3b
wtw
tM4b
wtw
tM1b
wtw
tM3a
wtw
tM4a
wtw
tF1a
wtw
tF2a
wtw
tF2b
wtw
tF1b
wtw
tF3a
wtw
tF4a
wtw
tF3b
wtw
tF4b
wtk
oM4a
wtk
oM4b
wtk
oF4b
wtk
oF4b
wtk
oF4a
wtk
oF4a
wtk
oF1b
wtk
oM4a
koko
M1a
kow
tTzM
1ako
koF1
ako
koF2
ako
koM
1bko
wtT
zM1b
koko
TzM
3ako
koTz
M3b
koko
F1b
koko
F2b
wtk
oF3a
wtk
oF3b
wtk
oF1a
wtk
oF2a
wtk
oF2b
wtk
oM4b0.
51.
01.
52.
02.
53.
0
Lambda=10
hclust (*, "complete")d10
Hei
ght
Clustering with Euclidian distance metric
Oresic and Vidal-Puig
42
Subspace clustering (no a priori grouping assumed)COSA method
wtw
tM1a
wtw
tM1b
wtw
tM2a
wtw
tM4b
wtw
tM3a
wtw
tM4a
wtw
tM2b
wtw
tM3b kow
tF1a
kow
tF1b
wtw
tF2a
wtw
tF3a
wtw
tF4a
wtw
tF3b
wtw
tF4b
wtw
tF2b
wtw
tF1a
wtw
tF1b
kow
tTzM
1ako
wtT
zM1b
wtk
oM4a
wtk
oM4b
koko
F1a
koko
F1b
koko
TzM
3ako
koTz
M3b
koko
M1a
koko
M1b
koko
F2a
koko
F2b
wtk
oF3a
wtk
oF3b
wtk
oF2a
wtk
oF2b
wtk
oM4a
wtk
oM4b
wtk
oF4a
wtk
oF4a
wtk
oF4b
wtk
oF4b
wtk
oF1a
wtk
oF1b
wtk
oM3a
wtk
oM3b
koko
M2a
koko
M2b
wtk
oM1a
wtk
oM1b
wtk
oM2a
wtk
oM2b
0.1
0.2
0.3
0.4
0.5
0.6
0.7
Cluster Dendogram: Lambda=0.1
hclust (*, "complete")d01
Hei
ght
Three major groups identified from lipidomic profiles: mainly WT/WT, mainly KO/KO, mainly WT/KO
Oresic and Vidal-Puig
43
Ob/Ob-F Ob/Ob-M WT-F WT-M
1000
020
000
3000
040
000
5000
0339.4X1270.7
ob/ob and WT mouse white adipose tissueLipidomic profiles reveal gender-specific differences
Ob/Ob-F Ob/Ob-M WT-F WT-M
050
010
0015
0020
0025
0030
00
703.7X1066.7Monoacylglycerol Sphingomyelin
Oresic and Vidal-Puig
Ob/Ob-F Ob/Ob-M WT-F WT-M
1500
025
000
3500
045
000
844.8X1579.7
Ob/Ob-F Ob/Ob-M WT-F WT-M
020
000
4000
060
000
902.9X1800.0Triacylglycerol Triacylglycerol
44
References
– Metabolic profiling: Its role in biomarker discovery and gene function analysis. Harrigan and Goodacre, Eds. (Kluwer, 2003)
– J.H. Friedman and J.J. Meulman. Clustering objects on subsets of attributes. J. R. Statist. Soc. B, 66, 1-25 (2004).
– M. Oresic, C.B. Clish, E.J. Davidov, E. Verheij, J.T.W.E. Vogels, L.M. Havekes, E. Neumann, A. Adourian, S. Naylor, J.v.D. Greef, and T. Plasterer. Phenotype characterization using integrated gene transcript, protein and metabolite profiling. Appl. Bioinformatics, 3, 205-217 (2004).
– Katajamaa and Oresic, Processing methods for differential analysis of LC/MS profile data, BMC Bioinformatics 6, 179 (2005).