Analysis and interpretation of metabolomics and proteomics...

Preview:

Citation preview

1

Analysis and interpretation of metabolomics and proteomics data

Matej Orešič27.9.2005

2

Nucleus

Cytoplasm

Extracellularmetabolites:NMR, MS

P P

Protein levels:2D gels, MS

Intracellular metabolites:NMR, MS

Post-translationalmodifications:2D gels, MS

P

Protein complexes:Protein chips, Ab arrays

Gene expressionmachinery

proteins

Gene expression:Microarrays

Control of geneexpression:Promoter arrays

folding

Technologies for systems biology studies at the cellular level

3

KEGG Data

4

Metabolic Fluxes

Cell

Subtrates

Products Biomass

GenomicsProteomics

Enzyme activitiesMetabolite levels

etc.

Subtrates

Products Biomass

Metabolic flux balancing

Subtrates

Products Biomass

-in most cases underdetermined system=> experimental constraints necessary

8

Flux balancing

X 3

21

v3+v2-v1=0

9

Chemostat culturesH2490: XR/XDH+XKmc

0

2

4

6

8

10

12

0 50 100 150 200 250 300 350

Cultivation time (h)

OD

600,

Etha

nol (

g/l)

0,0

0,4

0,8

1,2

1,6

2,0

2,4

CO

2 (%

), A

ceta

te, G

lyce

rol (

g/l)

OD600

ACETATE

CO2

EtOH

GLYCEROL

Aerobic (6 Tr) Anaerobic (5 Tr)13C-Labelled (1 Tr)

13C-Labelled (1 Tr)

0

2

4

6

8

10

12

-10 40 90 140 190 240 290 340

Cultivation time (h)

OD 6

00, N

orm

aliz

ed x

ylos

e (2

7 g/

l in

feed

)

0,0

0,4

0,8

1,2

1,6

2,0

2,4

CO

2 (%

), Xy

litol

, Eth

anol

(g/l)

Aerobic (6 Tr) Anaerobic (5 Tr)13C-Labelled (1 Tr) 13C-Labelled (1

Xylitol

Xylose

OD600

Ethanol

CO2

10g/L Glucose

3 g/L glucose+ 27g/L xylose

Samples:•aerobic culture •anaerobic culture•5, 30, and 60 minutesafter the switch off oxygensupply

10

Benefits of 13C labeling & NMR

• position sensitive

• isotopomer sensitive

12

METAFoR(metabolic flux analysis)

14

Mass spectrometry

15

Separation

RPLC/CapLC +ESI

Massspectrometer

(Qtof)

Relativ

e

Abundanc

e

10.00 15.00 20.00 25.00Time0

100

%

Retention time

MS-only ion chromatogram

400 600 800 1000 1200 1400m/z0

100

%

Mass / Charge

862 863 864 865 866 867 868 869m/z0

100

%

864.43

863.04862.70

864.92

865.41

865.92

866.44867.42

869.45868.34

SelectedIon

MS survey scan

400 600 800 1000 1200 1400 1600 1800 2000m/z

05

101520253035

404550556065707580859095

100 1047.5

1683.6960.4

1160.51659.5

873.4 1273.6 1570.41531.6646.4 1402.6745.5 1771.31071.31199.5 1985.5826.3 1857.9

610.1 647.3545.3425.9

b14

b7b9 b10 b11 b12

b13

b15 b17b16

y8

y15

y14

y13y12

y11

y10

y9

y7

y6y5

1328.6y16

Mass / Charge

MS/MS fragmentation pattern

Proteins, purifiedand separated

Sampledigest

labellingPeptides

LC/MS proteomics platform and data processing

16

LC/MS TIC ProfilesFraction 4

Liver Protein ProfilingFractionation using Reversed Phase Chromatography

RP Separation

Fraction 4

5 10 15 20 25 30Time (min)

20406080

100

Relative Abundance

ApoE3-2

ApoE3-3

ApoE3-4

ApoE3-5

ApoE3-7

WT-14

WT-15

WT-16

WT-20

WT-13

TIC

17

6 8 10 12 14 16 18 20 22 24 26 28 30 32 34

Time (min)

0

10

20

30

40

50

60

70

80

90

100

Rel

ativ

e Ab

unda

nce

6 8 10 12 14 16 18 20 22 24 26 28 30 32 34

Time (min)

0

10

20

30

40

50

60

70

80

90

100

Rel

ativ

e Ab

unda

nce

ApoE3

Wildtype

Plasma Protein ProfilingLC/MS of digested SEC fraction

18

PC 1

Unsupervised clustering reveals differences at 9 week age

-0.4 -0.3 -0.2 -0.1 0 0.1 0.2 0.3 0.4

-0.4

-0.2

0

0.2

0.4

0.6

WT-13

WT-20WT-16

WT-14

WT-15

ApoE3-2

ApoE3-4

ApoE3-7

ApoE3-3

ApoE3-5

Fraction IPC 2

MousePlasma

PCA Score plot PC1 vs PC2

Plasma Protein ProfilingPrincipal Component Analysis: Fraction I

19

Identifyusing

MS / MSMousePlasma

Fraction II

500 1000 1500 2000

-0.4

-0.3

-0.2

-0.1

0

0.1

0.2

0.3

0.4

Factor Spectrum

465.00 485.00

535.00

622.00652.00

662.00

750.00

811.00

911.00926.00

992.00

1041.00 1050.00

1208.001216.00

1243.001267.00

1268.001298.00

1340.001366.00

1583.00

m/z

Mor

e ab

unda

ntIn

Apo

E3Le

ss a

bund

ant

In A

poE3

Plasma Protein ProfilingFactor Spectrum: Peptides Exhibiting Differences

20

ApoE3-7 #768-783 RT: 27.60-28.01 AV: 6 NL: 1.16E5F: + c ESI Full ms2 1366.00@37.00 [ 375.00-2000.00]

400 600 800 1000 1200 1400 1600 1800 2000m/z

0

5

10

15

20

25

30

35

40

45

50

55

60

65

70

75

80

85

90

95

100

Rel

ativ

e A

bund

ance

1047.5

1683.6960.4

1160.5

1659.5873.4 1273.6 1570.4

1531.6646.4 1402.6745.5 1771.31071.3 1199.5 1985.5826.3 1857.9

610.1 647.3545.3425.9

b14

b7b9 b10 b11 b12

b13

b15b17b16

y8

y15

y14

y13y12

y11

y10

y9

y7

y6y5

1328.6y16

>gi|4557325|ref|NP_000032.1|| apolipoprotein E-gi|114039|sp|P02649|APE_HUMANAPOLIPOPROTEIN E PRECURSOR (APO-E)-gi|71795|pir||LPHUE apolipoprotein E precursor- human-gi|178851|gb|AAB59546.1| (K00396) preapolipoprotein E [Homo sapiens]-gi|4105704|gb|AAD02505.1| (AF050154) apolipoprotein E [Homo sapiens][MASS=36154]MKVLWAALLV TFLAGCQAKV EQAVETEPEP ELRQQTEWQS GQRWELALGR FWDYLR WVQT LSEQVQEELLSSQVTQELR A LMDETMKELK AYKSELEEQL TPVAEETRAR LSKELQAAQA RLGADMEDVC GRLVQYRGEVQAMLGQSTEE LRVRLASHLR KLRKRLLRDA DDLQKRLAVY QAGAREGAER GLSAIRERLG PLVEQGRVRAATVGSLAGQP LQERAQAWGE RLRARMEEMG SRTRDRLDEV KEQVAEVRAK LEEQAQQIRL QAEAFQARLKSWFEPLVEDM QRQWAGLVEK VQAAVGTSAA PVPSDNH>monoisotopic mass = 36113

position sequence ( NCBI BLAST link) -------- -------- 57- 79 WVQTLSEQVQEELLSSQVTQELR

W V Q T L S E Q V Q E E L L S S Q V T Q E L Ry8y15 y14 y13 y12 y11 y10 y9 y7 y6 y5y16

b14b7 b9 b10 b11 b12 b13 b15 b17b16

m/z 1366+2 Ion hApoE3

Peptide Sequencing using MS/MS

21

MetabolomicsStudy of small molecules , or metabolites, contained in a cell, tissue or organ (including fluids) and

involved in primary and intermediary metabolism.

Homeostasis‘Housekeeping’

Organic Acids

Lipids

Amino Acids

Nucleotides

Steroids

Eicosanoids

Neurotransmitters

Peptides

Trace elements

ng/mlng/ml

pg/mlpg/ml

mg/mlmg/ml

µµg/mlg/mlNMR

MassSpectrometry

CUSTOM

22

We are not alone genomewise …

From Nicholson et al., Nature Reviews Microbiology (2005)

23

Historical note

1500-2000BCChina•Ants used to detect patients with diabetes

1940s-1970s•Advances in analytics•Pattern recognition

Metabolic profiling

21st century•Advances in analytics•Biostatistics & Bioinformatics

Modern era of metabolomics and systems biology

24

Modern metabolomics platformExperiment design + Analytical chemistry + Chemometrics + Bioinformatics

Global screening (H2O soluble)

Global screening (Lipids)

Primary metabolitesEicosanoids and fatty acids

Other

SamplesMetabolite extraction methods

and analytical platformsProfiling experiments

(LC/MS, GC/MS)

Data processingIdentification

Bio-/chemo-informaticsknowledge mining

Multivariate statistical analyses Data-driven integration

Biological insight

25

Data processing

Analysis of Variance (ANOVA)

Selection of peaks displaying significant changes between Wild Type and Transgenic, separately from

gender or age specific effects

Univariate Analysis

ParametricTests(t-test)

NonparametricTests

(Kolmogorov-Smirnov)

Pre-processing & Normalization & QC

Prioritization of Important Peaks for Identification

Correlation Analysis

Correlation NetworksLinear and Non-Linear approachto profile association calculation

Select peaks with high levelof correlations to strongest

outliers

Exploratory Analysis

PCA and Discriminant Analysis

Study general trendsIn data

Verification of Protein or Metabolite IDs. Databases Extensions/Traversals

26

Wild type

ppm1.02.03.04.05.0

ApoE3

Global Metabolite AnalysisNMR Spectra of Plasma

27Plasma Extract (Organic Phase)

∗∗∗

∗∗∗

-0.15 -0.10 -0.05 0.00 0.05 0.10 0.15

-0.3

-0.2

-0.1

0.0

0.1

0.2

∗∗∗

∗∗

∗∗∗

∗∗

∗∗

∗∗

∗∗

∗∗

∗∗

∗∗

∗∗

∗∗

∗∗

∗∗∗

∗∗∗

∗∗∗

-0.15 -0.10 -0.05 0.00 0.05 0.10 0.15D1

-0.3

-0.2

-0.1

0.0

0.1

0.2

D2

∗∗∗

∗∗

∗∗∗

∗∗

∗∗

∗∗

∗∗

∗∗

∗∗

∗∗

∗∗

∗∗

∗∗

∗∗∗

ApoE3Wild type

Metabolite AnalysisPlasma NMR Principal Component & Discriminant Analysis

28

E+067.518

8:00 16:00 24:00 32:00 40:000

20

40

60

80

100

m/z:450>950

E+067.518

8:00 16:00 24:00 32:00 40:000

20

40

60

80

100

m/z:450>950

time (min)

Inte

nsity

Wildtype

APOE3

m/z:450>950

E+066.221

8:00 16:00 24:00 32:00 40:000

20

40

60

80

100 E+066.221

8:00 16:00 24:00 32:00 40:000

20

40

60

80

100

Inte

nsity

ApoE3

Wildtype

ApoE3 vs. WT: LC-MS Plasma Lipid Profiles

PlasmaLipids

Metabolite Analysis- LC/MS of Plasma Lipids

29

0 200 400 600 800 1000 1200 1400 1600

-1

-0.5

0

0.5

1

369.00

569.00

623.00

823.00847.00848.00

849.00850.00

851.00855.00

871.00872.00

873.00874.00

875.00876.00 877.00

878.00879.00

880.00900.00

901.00902.00

903.00904.00

905.00906.00

918.00919.00

926.00928.00

929.00931.00

932.00945.00946.00

948.00950.00

951.00952.00

953.00967.00972.00

975.00995.00

996.00

1040.001041.00

1062.001063.00

1064.001065.00 1086.00

1087.001088.00

1536.001537.00

1668.00

1706.001707.00 1711.00

1732.001733.00

1734.001735.00

1736.001737.00

1738.001739.00

m/z

Mor

eab

unda

ntIn

APO

E*3

Less

abun

dant

In A

POE*

3

triglycerides

phospholipid compound

• Mouse Plasma• Metabolite (lipid)• LC-MS• IMPRESS algorithm• PARC pattern recognition

• Mouse Plasma• Metabolite (lipid)• LC-MS• IMPRESS algorithm• PARC pattern recognition

Novel Metabolite Information

Metabolite AnalysisApoE3 vs. WT: Plasma LipidDifference Factor Spectrum

30

Normalized IntegratedDifferential Profile

Genes Peptides Metabolites

Diff

eren

tial P

rofil

e(n

orm

aliz

ed s

igni

fican

ce)

100

50

75

25

0

25

50

75

100

Variable Index• Mouse Liver• mRNA +Protein + Metabolite• Normalization• Pattern recognition

• Mouse Liver• mRNA +Protein + Metabolite• Normalization• Pattern recognition

Mor

eab

unda

ntIn

Apo

E3Le

ssab

unda

ntIn

Apo

E3

Microarrays

Multi-dimensionalchromatography,LC/MS LC/MS, NMR

31

Similarity

s1 s2 s3 s4 s5 c1 c2 c3 c4 c5

m/z

1

2

3

Samples (peak 1)

s1 s2 s3 s4 s5 c1 c2 c3 c4 c5

m/z

1

2

3

Samples (peak 2)

Example: highly correlated peaks

32

Similarity

s1 s2 s3 s4 s5 c1 c2 c3 c4 c5

m/z

1

2

3

Samples (peak 1)

s1 s2 s3 s4 s5 c1 c2 c3 c4 c5

m/z

1

2

3

Samples (peak 2)

Example: uncorrelated peaks

33

Correlation networkscan reveal patterns of changes relevant to the physiological response

A. Histogram of the distribution of peaks (lipid compounds) according to up-/down-regulation.

B. Down-regulated long-chaintriacylglycerol cluster

C. Up-regulated lipids(mainly long chain phospholipids, short-chaintriacylglycerols, and diacylglycerols)

D. Downregulated cluster containing three C32 phsophatidylcholine lipids

E. Upregulated ceramide cluster

Down-regulated Up-regulated

Node color legend

0.3 0.5 0.7 1 1.4 2 2.8 4 5.60

50

100

150

200

250

mean KO/ mean WT

N p

eaks

Total 1308 peaks

G. Medina Gomez et al., Diabetes (2005)

34

Subspace clustering methods

35

Unsupervised clusteringNo prior information used

Set of “objects” (e.g. samples), each described by several variables (e.g. gene expression, metabolite profiles)

36

Unsupervised clusteringNo prior information used

• Find groups of objects with small within-group distances and large between-group distances

• Several choices of distance metrics

• Examples: K-Means, Hierarchical, Subspace clustering methods

37

Supervised clusteringPrior grouping information available → Classification

• Find a model for each group, in order to be able to classify previously ungrouped objects

• Examples: Neural networks, Genetic algorithms, Support vector machines, Linear discriminant analysis

• Main problem in clinical applications (biomarkers, diagnostics): Lack of proper validation and overfitting.

38

Subspace similarityMetabolites may be dynamically (de)coupled under

specific conditions

s1 s2 s3 s4 s5 c1 c2 c3 c4 c5

m/z

1

2

3

Samples (peak 1)

s1 s2 s3 s4 s5 c1 c2 c3 c4 c5

m/z

1

2

3

Samples (peak 2)

39

Example 2: Functional genomicsob/ob mouse model

• Spontaneous mutation in ob gene resulting in lack of leptin (product of ob gene)

• Leptin hormone is a satiety signal – hormone secreted from adipose tissue– modulates energy intake and utilization

• Model for early onset of severe obesity

40

ob/ob and WT mouse white adipose tissueLipidomic profiles reveal gender-specific differences

-6 -4 -2 0 2 4 6 8 10

x 104

-4

-3

-2

-1

0

1

2

3

4

5

6x 104

Scores on PC 1 (33.49%)

Sco

res

on P

C 4

(8.5

2%)

obF1a

obF1b

obF2a

obF2b

obF3a

obF3b

obF4a

obF4b

obM1a obM1b obM2a

obM2b

obM3a

obM3b

obM4a

obM4b wtF1a wtF1b

wtF2a

wtF2b

wtF3a

wtF3b

wtF4a

wtF4b

wtM1a wtM1b

wtM2a

wtM2b

wtM3a

wtM3b

wtM4a

wtM4b

Oresic and Vidal-Puig

2501 peaks

41

Double KO models (ob/ob and PPARγ2)WT/WT, WT/KO, KO/WT, and KO/KO

koko

M2a

koko

M2b

wtk

oM3a

wtk

oM3b

wtk

oM1a

wtk

oM2a

wtk

oM1b

wtk

oM2b wtw

tM1a

wtw

tM2a

kow

tF1a

kow

tF1b

wtw

tM2b

wtw

tM3b

wtw

tM4b

wtw

tM1b

wtw

tM3a

wtw

tM4a

wtw

tF1a

wtw

tF2a

wtw

tF2b

wtw

tF1b

wtw

tF3a

wtw

tF4a

wtw

tF3b

wtw

tF4b

wtk

oM4a

wtk

oM4b

wtk

oF4b

wtk

oF4b

wtk

oF4a

wtk

oF4a

wtk

oF1b

wtk

oM4a

koko

M1a

kow

tTzM

1ako

koF1

ako

koF2

ako

koM

1bko

wtT

zM1b

koko

TzM

3ako

koTz

M3b

koko

F1b

koko

F2b

wtk

oF3a

wtk

oF3b

wtk

oF1a

wtk

oF2a

wtk

oF2b

wtk

oM4b0.

51.

01.

52.

02.

53.

0

Lambda=10

hclust (*, "complete")d10

Hei

ght

Clustering with Euclidian distance metric

Oresic and Vidal-Puig

42

Subspace clustering (no a priori grouping assumed)COSA method

wtw

tM1a

wtw

tM1b

wtw

tM2a

wtw

tM4b

wtw

tM3a

wtw

tM4a

wtw

tM2b

wtw

tM3b kow

tF1a

kow

tF1b

wtw

tF2a

wtw

tF3a

wtw

tF4a

wtw

tF3b

wtw

tF4b

wtw

tF2b

wtw

tF1a

wtw

tF1b

kow

tTzM

1ako

wtT

zM1b

wtk

oM4a

wtk

oM4b

koko

F1a

koko

F1b

koko

TzM

3ako

koTz

M3b

koko

M1a

koko

M1b

koko

F2a

koko

F2b

wtk

oF3a

wtk

oF3b

wtk

oF2a

wtk

oF2b

wtk

oM4a

wtk

oM4b

wtk

oF4a

wtk

oF4a

wtk

oF4b

wtk

oF4b

wtk

oF1a

wtk

oF1b

wtk

oM3a

wtk

oM3b

koko

M2a

koko

M2b

wtk

oM1a

wtk

oM1b

wtk

oM2a

wtk

oM2b

0.1

0.2

0.3

0.4

0.5

0.6

0.7

Cluster Dendogram: Lambda=0.1

hclust (*, "complete")d01

Hei

ght

Three major groups identified from lipidomic profiles: mainly WT/WT, mainly KO/KO, mainly WT/KO

Oresic and Vidal-Puig

43

Ob/Ob-F Ob/Ob-M WT-F WT-M

1000

020

000

3000

040

000

5000

0339.4X1270.7

ob/ob and WT mouse white adipose tissueLipidomic profiles reveal gender-specific differences

Ob/Ob-F Ob/Ob-M WT-F WT-M

050

010

0015

0020

0025

0030

00

703.7X1066.7Monoacylglycerol Sphingomyelin

Oresic and Vidal-Puig

Ob/Ob-F Ob/Ob-M WT-F WT-M

1500

025

000

3500

045

000

844.8X1579.7

Ob/Ob-F Ob/Ob-M WT-F WT-M

020

000

4000

060

000

902.9X1800.0Triacylglycerol Triacylglycerol

44

References

– Metabolic profiling: Its role in biomarker discovery and gene function analysis. Harrigan and Goodacre, Eds. (Kluwer, 2003)

– J.H. Friedman and J.J. Meulman. Clustering objects on subsets of attributes. J. R. Statist. Soc. B, 66, 1-25 (2004).

– M. Oresic, C.B. Clish, E.J. Davidov, E. Verheij, J.T.W.E. Vogels, L.M. Havekes, E. Neumann, A. Adourian, S. Naylor, J.v.D. Greef, and T. Plasterer. Phenotype characterization using integrated gene transcript, protein and metabolite profiling. Appl. Bioinformatics, 3, 205-217 (2004).

– Katajamaa and Oresic, Processing methods for differential analysis of LC/MS profile data, BMC Bioinformatics 6, 179 (2005).

Recommended