44
1 Analysis and interpretation of metabolomics and proteomics data Matej Orešič 27.9.2005

Analysis and interpretation of metabolomics and proteomics ...cis.legacy.ics.tkk.fi/Opinnot/T-61.6080/2005/... · 873.4 1273.6 1570.4 646.4 1402.6 1531.6 745.5 1071.31199.5 1771.3

  • Upload
    others

  • View
    11

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Analysis and interpretation of metabolomics and proteomics ...cis.legacy.ics.tkk.fi/Opinnot/T-61.6080/2005/... · 873.4 1273.6 1570.4 646.4 1402.6 1531.6 745.5 1071.31199.5 1771.3

1

Analysis and interpretation of metabolomics and proteomics data

Matej Orešič27.9.2005

Page 2: Analysis and interpretation of metabolomics and proteomics ...cis.legacy.ics.tkk.fi/Opinnot/T-61.6080/2005/... · 873.4 1273.6 1570.4 646.4 1402.6 1531.6 745.5 1071.31199.5 1771.3

2

Nucleus

Cytoplasm

Extracellularmetabolites:NMR, MS

P P

Protein levels:2D gels, MS

Intracellular metabolites:NMR, MS

Post-translationalmodifications:2D gels, MS

P

Protein complexes:Protein chips, Ab arrays

Gene expressionmachinery

proteins

Gene expression:Microarrays

Control of geneexpression:Promoter arrays

folding

Technologies for systems biology studies at the cellular level

Page 3: Analysis and interpretation of metabolomics and proteomics ...cis.legacy.ics.tkk.fi/Opinnot/T-61.6080/2005/... · 873.4 1273.6 1570.4 646.4 1402.6 1531.6 745.5 1071.31199.5 1771.3

3

KEGG Data

Page 4: Analysis and interpretation of metabolomics and proteomics ...cis.legacy.ics.tkk.fi/Opinnot/T-61.6080/2005/... · 873.4 1273.6 1570.4 646.4 1402.6 1531.6 745.5 1071.31199.5 1771.3

4

Metabolic Fluxes

Page 5: Analysis and interpretation of metabolomics and proteomics ...cis.legacy.ics.tkk.fi/Opinnot/T-61.6080/2005/... · 873.4 1273.6 1570.4 646.4 1402.6 1531.6 745.5 1071.31199.5 1771.3

Cell

Subtrates

Products Biomass

Page 6: Analysis and interpretation of metabolomics and proteomics ...cis.legacy.ics.tkk.fi/Opinnot/T-61.6080/2005/... · 873.4 1273.6 1570.4 646.4 1402.6 1531.6 745.5 1071.31199.5 1771.3

GenomicsProteomics

Enzyme activitiesMetabolite levels

etc.

Subtrates

Products Biomass

Page 7: Analysis and interpretation of metabolomics and proteomics ...cis.legacy.ics.tkk.fi/Opinnot/T-61.6080/2005/... · 873.4 1273.6 1570.4 646.4 1402.6 1531.6 745.5 1071.31199.5 1771.3

Metabolic flux balancing

Subtrates

Products Biomass

-in most cases underdetermined system=> experimental constraints necessary

Page 8: Analysis and interpretation of metabolomics and proteomics ...cis.legacy.ics.tkk.fi/Opinnot/T-61.6080/2005/... · 873.4 1273.6 1570.4 646.4 1402.6 1531.6 745.5 1071.31199.5 1771.3

8

Flux balancing

X 3

21

v3+v2-v1=0

Page 9: Analysis and interpretation of metabolomics and proteomics ...cis.legacy.ics.tkk.fi/Opinnot/T-61.6080/2005/... · 873.4 1273.6 1570.4 646.4 1402.6 1531.6 745.5 1071.31199.5 1771.3

9

Chemostat culturesH2490: XR/XDH+XKmc

0

2

4

6

8

10

12

0 50 100 150 200 250 300 350

Cultivation time (h)

OD

600,

Etha

nol (

g/l)

0,0

0,4

0,8

1,2

1,6

2,0

2,4

CO

2 (%

), A

ceta

te, G

lyce

rol (

g/l)

OD600

ACETATE

CO2

EtOH

GLYCEROL

Aerobic (6 Tr) Anaerobic (5 Tr)13C-Labelled (1 Tr)

13C-Labelled (1 Tr)

0

2

4

6

8

10

12

-10 40 90 140 190 240 290 340

Cultivation time (h)

OD 6

00, N

orm

aliz

ed x

ylos

e (2

7 g/

l in

feed

)

0,0

0,4

0,8

1,2

1,6

2,0

2,4

CO

2 (%

), Xy

litol

, Eth

anol

(g/l)

Aerobic (6 Tr) Anaerobic (5 Tr)13C-Labelled (1 Tr) 13C-Labelled (1

Xylitol

Xylose

OD600

Ethanol

CO2

10g/L Glucose

3 g/L glucose+ 27g/L xylose

Samples:•aerobic culture •anaerobic culture•5, 30, and 60 minutesafter the switch off oxygensupply

Page 10: Analysis and interpretation of metabolomics and proteomics ...cis.legacy.ics.tkk.fi/Opinnot/T-61.6080/2005/... · 873.4 1273.6 1570.4 646.4 1402.6 1531.6 745.5 1071.31199.5 1771.3

10

Page 11: Analysis and interpretation of metabolomics and proteomics ...cis.legacy.ics.tkk.fi/Opinnot/T-61.6080/2005/... · 873.4 1273.6 1570.4 646.4 1402.6 1531.6 745.5 1071.31199.5 1771.3

Benefits of 13C labeling & NMR

• position sensitive

• isotopomer sensitive

Page 12: Analysis and interpretation of metabolomics and proteomics ...cis.legacy.ics.tkk.fi/Opinnot/T-61.6080/2005/... · 873.4 1273.6 1570.4 646.4 1402.6 1531.6 745.5 1071.31199.5 1771.3

12

METAFoR(metabolic flux analysis)

Page 13: Analysis and interpretation of metabolomics and proteomics ...cis.legacy.ics.tkk.fi/Opinnot/T-61.6080/2005/... · 873.4 1273.6 1570.4 646.4 1402.6 1531.6 745.5 1071.31199.5 1771.3
Page 14: Analysis and interpretation of metabolomics and proteomics ...cis.legacy.ics.tkk.fi/Opinnot/T-61.6080/2005/... · 873.4 1273.6 1570.4 646.4 1402.6 1531.6 745.5 1071.31199.5 1771.3

14

Mass spectrometry

Page 15: Analysis and interpretation of metabolomics and proteomics ...cis.legacy.ics.tkk.fi/Opinnot/T-61.6080/2005/... · 873.4 1273.6 1570.4 646.4 1402.6 1531.6 745.5 1071.31199.5 1771.3

15

Separation

RPLC/CapLC +ESI

Massspectrometer

(Qtof)

Relativ

e

Abundanc

e

10.00 15.00 20.00 25.00Time0

100

%

Retention time

MS-only ion chromatogram

400 600 800 1000 1200 1400m/z0

100

%

Mass / Charge

862 863 864 865 866 867 868 869m/z0

100

%

864.43

863.04862.70

864.92

865.41

865.92

866.44867.42

869.45868.34

SelectedIon

MS survey scan

400 600 800 1000 1200 1400 1600 1800 2000m/z

05

101520253035

404550556065707580859095

100 1047.5

1683.6960.4

1160.51659.5

873.4 1273.6 1570.41531.6646.4 1402.6745.5 1771.31071.31199.5 1985.5826.3 1857.9

610.1 647.3545.3425.9

b14

b7b9 b10 b11 b12

b13

b15 b17b16

y8

y15

y14

y13y12

y11

y10

y9

y7

y6y5

1328.6y16

Mass / Charge

MS/MS fragmentation pattern

Proteins, purifiedand separated

Sampledigest

labellingPeptides

LC/MS proteomics platform and data processing

Page 16: Analysis and interpretation of metabolomics and proteomics ...cis.legacy.ics.tkk.fi/Opinnot/T-61.6080/2005/... · 873.4 1273.6 1570.4 646.4 1402.6 1531.6 745.5 1071.31199.5 1771.3

16

LC/MS TIC ProfilesFraction 4

Liver Protein ProfilingFractionation using Reversed Phase Chromatography

RP Separation

Fraction 4

5 10 15 20 25 30Time (min)

20406080

100

Relative Abundance

ApoE3-2

ApoE3-3

ApoE3-4

ApoE3-5

ApoE3-7

WT-14

WT-15

WT-16

WT-20

WT-13

TIC

Page 17: Analysis and interpretation of metabolomics and proteomics ...cis.legacy.ics.tkk.fi/Opinnot/T-61.6080/2005/... · 873.4 1273.6 1570.4 646.4 1402.6 1531.6 745.5 1071.31199.5 1771.3

17

6 8 10 12 14 16 18 20 22 24 26 28 30 32 34

Time (min)

0

10

20

30

40

50

60

70

80

90

100

Rel

ativ

e Ab

unda

nce

6 8 10 12 14 16 18 20 22 24 26 28 30 32 34

Time (min)

0

10

20

30

40

50

60

70

80

90

100

Rel

ativ

e Ab

unda

nce

ApoE3

Wildtype

Plasma Protein ProfilingLC/MS of digested SEC fraction

Page 18: Analysis and interpretation of metabolomics and proteomics ...cis.legacy.ics.tkk.fi/Opinnot/T-61.6080/2005/... · 873.4 1273.6 1570.4 646.4 1402.6 1531.6 745.5 1071.31199.5 1771.3

18

PC 1

Unsupervised clustering reveals differences at 9 week age

-0.4 -0.3 -0.2 -0.1 0 0.1 0.2 0.3 0.4

-0.4

-0.2

0

0.2

0.4

0.6

WT-13

WT-20WT-16

WT-14

WT-15

ApoE3-2

ApoE3-4

ApoE3-7

ApoE3-3

ApoE3-5

Fraction IPC 2

MousePlasma

PCA Score plot PC1 vs PC2

Plasma Protein ProfilingPrincipal Component Analysis: Fraction I

Page 19: Analysis and interpretation of metabolomics and proteomics ...cis.legacy.ics.tkk.fi/Opinnot/T-61.6080/2005/... · 873.4 1273.6 1570.4 646.4 1402.6 1531.6 745.5 1071.31199.5 1771.3

19

Identifyusing

MS / MSMousePlasma

Fraction II

500 1000 1500 2000

-0.4

-0.3

-0.2

-0.1

0

0.1

0.2

0.3

0.4

Factor Spectrum

465.00 485.00

535.00

622.00652.00

662.00

750.00

811.00

911.00926.00

992.00

1041.00 1050.00

1208.001216.00

1243.001267.00

1268.001298.00

1340.001366.00

1583.00

m/z

Mor

e ab

unda

ntIn

Apo

E3Le

ss a

bund

ant

In A

poE3

Plasma Protein ProfilingFactor Spectrum: Peptides Exhibiting Differences

Page 20: Analysis and interpretation of metabolomics and proteomics ...cis.legacy.ics.tkk.fi/Opinnot/T-61.6080/2005/... · 873.4 1273.6 1570.4 646.4 1402.6 1531.6 745.5 1071.31199.5 1771.3

20

ApoE3-7 #768-783 RT: 27.60-28.01 AV: 6 NL: 1.16E5F: + c ESI Full ms2 [email protected] [ 375.00-2000.00]

400 600 800 1000 1200 1400 1600 1800 2000m/z

0

5

10

15

20

25

30

35

40

45

50

55

60

65

70

75

80

85

90

95

100

Rel

ativ

e A

bund

ance

1047.5

1683.6960.4

1160.5

1659.5873.4 1273.6 1570.4

1531.6646.4 1402.6745.5 1771.31071.3 1199.5 1985.5826.3 1857.9

610.1 647.3545.3425.9

b14

b7b9 b10 b11 b12

b13

b15b17b16

y8

y15

y14

y13y12

y11

y10

y9

y7

y6y5

1328.6y16

>gi|4557325|ref|NP_000032.1|| apolipoprotein E-gi|114039|sp|P02649|APE_HUMANAPOLIPOPROTEIN E PRECURSOR (APO-E)-gi|71795|pir||LPHUE apolipoprotein E precursor- human-gi|178851|gb|AAB59546.1| (K00396) preapolipoprotein E [Homo sapiens]-gi|4105704|gb|AAD02505.1| (AF050154) apolipoprotein E [Homo sapiens][MASS=36154]MKVLWAALLV TFLAGCQAKV EQAVETEPEP ELRQQTEWQS GQRWELALGR FWDYLR WVQT LSEQVQEELLSSQVTQELR A LMDETMKELK AYKSELEEQL TPVAEETRAR LSKELQAAQA RLGADMEDVC GRLVQYRGEVQAMLGQSTEE LRVRLASHLR KLRKRLLRDA DDLQKRLAVY QAGAREGAER GLSAIRERLG PLVEQGRVRAATVGSLAGQP LQERAQAWGE RLRARMEEMG SRTRDRLDEV KEQVAEVRAK LEEQAQQIRL QAEAFQARLKSWFEPLVEDM QRQWAGLVEK VQAAVGTSAA PVPSDNH>monoisotopic mass = 36113

position sequence ( NCBI BLAST link) -------- -------- 57- 79 WVQTLSEQVQEELLSSQVTQELR

W V Q T L S E Q V Q E E L L S S Q V T Q E L Ry8y15 y14 y13 y12 y11 y10 y9 y7 y6 y5y16

b14b7 b9 b10 b11 b12 b13 b15 b17b16

m/z 1366+2 Ion hApoE3

Peptide Sequencing using MS/MS

Page 21: Analysis and interpretation of metabolomics and proteomics ...cis.legacy.ics.tkk.fi/Opinnot/T-61.6080/2005/... · 873.4 1273.6 1570.4 646.4 1402.6 1531.6 745.5 1071.31199.5 1771.3

21

MetabolomicsStudy of small molecules , or metabolites, contained in a cell, tissue or organ (including fluids) and

involved in primary and intermediary metabolism.

Homeostasis‘Housekeeping’

Organic Acids

Lipids

Amino Acids

Nucleotides

Steroids

Eicosanoids

Neurotransmitters

Peptides

Trace elements

ng/mlng/ml

pg/mlpg/ml

mg/mlmg/ml

µµg/mlg/mlNMR

MassSpectrometry

CUSTOM

Page 22: Analysis and interpretation of metabolomics and proteomics ...cis.legacy.ics.tkk.fi/Opinnot/T-61.6080/2005/... · 873.4 1273.6 1570.4 646.4 1402.6 1531.6 745.5 1071.31199.5 1771.3

22

We are not alone genomewise …

From Nicholson et al., Nature Reviews Microbiology (2005)

Page 23: Analysis and interpretation of metabolomics and proteomics ...cis.legacy.ics.tkk.fi/Opinnot/T-61.6080/2005/... · 873.4 1273.6 1570.4 646.4 1402.6 1531.6 745.5 1071.31199.5 1771.3

23

Historical note

1500-2000BCChina•Ants used to detect patients with diabetes

1940s-1970s•Advances in analytics•Pattern recognition

Metabolic profiling

21st century•Advances in analytics•Biostatistics & Bioinformatics

Modern era of metabolomics and systems biology

Page 24: Analysis and interpretation of metabolomics and proteomics ...cis.legacy.ics.tkk.fi/Opinnot/T-61.6080/2005/... · 873.4 1273.6 1570.4 646.4 1402.6 1531.6 745.5 1071.31199.5 1771.3

24

Modern metabolomics platformExperiment design + Analytical chemistry + Chemometrics + Bioinformatics

Global screening (H2O soluble)

Global screening (Lipids)

Primary metabolitesEicosanoids and fatty acids

Other

SamplesMetabolite extraction methods

and analytical platformsProfiling experiments

(LC/MS, GC/MS)

Data processingIdentification

Bio-/chemo-informaticsknowledge mining

Multivariate statistical analyses Data-driven integration

Biological insight

Page 25: Analysis and interpretation of metabolomics and proteomics ...cis.legacy.ics.tkk.fi/Opinnot/T-61.6080/2005/... · 873.4 1273.6 1570.4 646.4 1402.6 1531.6 745.5 1071.31199.5 1771.3

25

Data processing

Analysis of Variance (ANOVA)

Selection of peaks displaying significant changes between Wild Type and Transgenic, separately from

gender or age specific effects

Univariate Analysis

ParametricTests(t-test)

NonparametricTests

(Kolmogorov-Smirnov)

Pre-processing & Normalization & QC

Prioritization of Important Peaks for Identification

Correlation Analysis

Correlation NetworksLinear and Non-Linear approachto profile association calculation

Select peaks with high levelof correlations to strongest

outliers

Exploratory Analysis

PCA and Discriminant Analysis

Study general trendsIn data

Verification of Protein or Metabolite IDs. Databases Extensions/Traversals

Page 26: Analysis and interpretation of metabolomics and proteomics ...cis.legacy.ics.tkk.fi/Opinnot/T-61.6080/2005/... · 873.4 1273.6 1570.4 646.4 1402.6 1531.6 745.5 1071.31199.5 1771.3

26

Wild type

ppm1.02.03.04.05.0

ApoE3

Global Metabolite AnalysisNMR Spectra of Plasma

Page 27: Analysis and interpretation of metabolomics and proteomics ...cis.legacy.ics.tkk.fi/Opinnot/T-61.6080/2005/... · 873.4 1273.6 1570.4 646.4 1402.6 1531.6 745.5 1071.31199.5 1771.3

27Plasma Extract (Organic Phase)

∗∗∗

∗∗∗

-0.15 -0.10 -0.05 0.00 0.05 0.10 0.15

-0.3

-0.2

-0.1

0.0

0.1

0.2

∗∗∗

∗∗

∗∗∗

∗∗

∗∗

∗∗

∗∗

∗∗

∗∗

∗∗

∗∗

∗∗

∗∗

∗∗∗

∗∗∗

∗∗∗

-0.15 -0.10 -0.05 0.00 0.05 0.10 0.15D1

-0.3

-0.2

-0.1

0.0

0.1

0.2

D2

∗∗∗

∗∗

∗∗∗

∗∗

∗∗

∗∗

∗∗

∗∗

∗∗

∗∗

∗∗

∗∗

∗∗

∗∗∗

ApoE3Wild type

Metabolite AnalysisPlasma NMR Principal Component & Discriminant Analysis

Page 28: Analysis and interpretation of metabolomics and proteomics ...cis.legacy.ics.tkk.fi/Opinnot/T-61.6080/2005/... · 873.4 1273.6 1570.4 646.4 1402.6 1531.6 745.5 1071.31199.5 1771.3

28

E+067.518

8:00 16:00 24:00 32:00 40:000

20

40

60

80

100

m/z:450>950

E+067.518

8:00 16:00 24:00 32:00 40:000

20

40

60

80

100

m/z:450>950

time (min)

Inte

nsity

Wildtype

APOE3

m/z:450>950

E+066.221

8:00 16:00 24:00 32:00 40:000

20

40

60

80

100 E+066.221

8:00 16:00 24:00 32:00 40:000

20

40

60

80

100

Inte

nsity

ApoE3

Wildtype

ApoE3 vs. WT: LC-MS Plasma Lipid Profiles

PlasmaLipids

Metabolite Analysis- LC/MS of Plasma Lipids

Page 29: Analysis and interpretation of metabolomics and proteomics ...cis.legacy.ics.tkk.fi/Opinnot/T-61.6080/2005/... · 873.4 1273.6 1570.4 646.4 1402.6 1531.6 745.5 1071.31199.5 1771.3

29

0 200 400 600 800 1000 1200 1400 1600

-1

-0.5

0

0.5

1

369.00

569.00

623.00

823.00847.00848.00

849.00850.00

851.00855.00

871.00872.00

873.00874.00

875.00876.00 877.00

878.00879.00

880.00900.00

901.00902.00

903.00904.00

905.00906.00

918.00919.00

926.00928.00

929.00931.00

932.00945.00946.00

948.00950.00

951.00952.00

953.00967.00972.00

975.00995.00

996.00

1040.001041.00

1062.001063.00

1064.001065.00 1086.00

1087.001088.00

1536.001537.00

1668.00

1706.001707.00 1711.00

1732.001733.00

1734.001735.00

1736.001737.00

1738.001739.00

m/z

Mor

eab

unda

ntIn

APO

E*3

Less

abun

dant

In A

POE*

3

triglycerides

phospholipid compound

• Mouse Plasma• Metabolite (lipid)• LC-MS• IMPRESS algorithm• PARC pattern recognition

• Mouse Plasma• Metabolite (lipid)• LC-MS• IMPRESS algorithm• PARC pattern recognition

Novel Metabolite Information

Metabolite AnalysisApoE3 vs. WT: Plasma LipidDifference Factor Spectrum

Page 30: Analysis and interpretation of metabolomics and proteomics ...cis.legacy.ics.tkk.fi/Opinnot/T-61.6080/2005/... · 873.4 1273.6 1570.4 646.4 1402.6 1531.6 745.5 1071.31199.5 1771.3

30

Normalized IntegratedDifferential Profile

Genes Peptides Metabolites

Diff

eren

tial P

rofil

e(n

orm

aliz

ed s

igni

fican

ce)

100

50

75

25

0

25

50

75

100

Variable Index• Mouse Liver• mRNA +Protein + Metabolite• Normalization• Pattern recognition

• Mouse Liver• mRNA +Protein + Metabolite• Normalization• Pattern recognition

Mor

eab

unda

ntIn

Apo

E3Le

ssab

unda

ntIn

Apo

E3

Microarrays

Multi-dimensionalchromatography,LC/MS LC/MS, NMR

Page 31: Analysis and interpretation of metabolomics and proteomics ...cis.legacy.ics.tkk.fi/Opinnot/T-61.6080/2005/... · 873.4 1273.6 1570.4 646.4 1402.6 1531.6 745.5 1071.31199.5 1771.3

31

Similarity

s1 s2 s3 s4 s5 c1 c2 c3 c4 c5

m/z

1

2

3

Samples (peak 1)

s1 s2 s3 s4 s5 c1 c2 c3 c4 c5

m/z

1

2

3

Samples (peak 2)

Example: highly correlated peaks

Page 32: Analysis and interpretation of metabolomics and proteomics ...cis.legacy.ics.tkk.fi/Opinnot/T-61.6080/2005/... · 873.4 1273.6 1570.4 646.4 1402.6 1531.6 745.5 1071.31199.5 1771.3

32

Similarity

s1 s2 s3 s4 s5 c1 c2 c3 c4 c5

m/z

1

2

3

Samples (peak 1)

s1 s2 s3 s4 s5 c1 c2 c3 c4 c5

m/z

1

2

3

Samples (peak 2)

Example: uncorrelated peaks

Page 33: Analysis and interpretation of metabolomics and proteomics ...cis.legacy.ics.tkk.fi/Opinnot/T-61.6080/2005/... · 873.4 1273.6 1570.4 646.4 1402.6 1531.6 745.5 1071.31199.5 1771.3

33

Correlation networkscan reveal patterns of changes relevant to the physiological response

A. Histogram of the distribution of peaks (lipid compounds) according to up-/down-regulation.

B. Down-regulated long-chaintriacylglycerol cluster

C. Up-regulated lipids(mainly long chain phospholipids, short-chaintriacylglycerols, and diacylglycerols)

D. Downregulated cluster containing three C32 phsophatidylcholine lipids

E. Upregulated ceramide cluster

Down-regulated Up-regulated

Node color legend

0.3 0.5 0.7 1 1.4 2 2.8 4 5.60

50

100

150

200

250

mean KO/ mean WT

N p

eaks

Total 1308 peaks

G. Medina Gomez et al., Diabetes (2005)

Page 34: Analysis and interpretation of metabolomics and proteomics ...cis.legacy.ics.tkk.fi/Opinnot/T-61.6080/2005/... · 873.4 1273.6 1570.4 646.4 1402.6 1531.6 745.5 1071.31199.5 1771.3

34

Subspace clustering methods

Page 35: Analysis and interpretation of metabolomics and proteomics ...cis.legacy.ics.tkk.fi/Opinnot/T-61.6080/2005/... · 873.4 1273.6 1570.4 646.4 1402.6 1531.6 745.5 1071.31199.5 1771.3

35

Unsupervised clusteringNo prior information used

Set of “objects” (e.g. samples), each described by several variables (e.g. gene expression, metabolite profiles)

Page 36: Analysis and interpretation of metabolomics and proteomics ...cis.legacy.ics.tkk.fi/Opinnot/T-61.6080/2005/... · 873.4 1273.6 1570.4 646.4 1402.6 1531.6 745.5 1071.31199.5 1771.3

36

Unsupervised clusteringNo prior information used

• Find groups of objects with small within-group distances and large between-group distances

• Several choices of distance metrics

• Examples: K-Means, Hierarchical, Subspace clustering methods

Page 37: Analysis and interpretation of metabolomics and proteomics ...cis.legacy.ics.tkk.fi/Opinnot/T-61.6080/2005/... · 873.4 1273.6 1570.4 646.4 1402.6 1531.6 745.5 1071.31199.5 1771.3

37

Supervised clusteringPrior grouping information available → Classification

• Find a model for each group, in order to be able to classify previously ungrouped objects

• Examples: Neural networks, Genetic algorithms, Support vector machines, Linear discriminant analysis

• Main problem in clinical applications (biomarkers, diagnostics): Lack of proper validation and overfitting.

Page 38: Analysis and interpretation of metabolomics and proteomics ...cis.legacy.ics.tkk.fi/Opinnot/T-61.6080/2005/... · 873.4 1273.6 1570.4 646.4 1402.6 1531.6 745.5 1071.31199.5 1771.3

38

Subspace similarityMetabolites may be dynamically (de)coupled under

specific conditions

s1 s2 s3 s4 s5 c1 c2 c3 c4 c5

m/z

1

2

3

Samples (peak 1)

s1 s2 s3 s4 s5 c1 c2 c3 c4 c5

m/z

1

2

3

Samples (peak 2)

Page 39: Analysis and interpretation of metabolomics and proteomics ...cis.legacy.ics.tkk.fi/Opinnot/T-61.6080/2005/... · 873.4 1273.6 1570.4 646.4 1402.6 1531.6 745.5 1071.31199.5 1771.3

39

Example 2: Functional genomicsob/ob mouse model

• Spontaneous mutation in ob gene resulting in lack of leptin (product of ob gene)

• Leptin hormone is a satiety signal – hormone secreted from adipose tissue– modulates energy intake and utilization

• Model for early onset of severe obesity

Page 40: Analysis and interpretation of metabolomics and proteomics ...cis.legacy.ics.tkk.fi/Opinnot/T-61.6080/2005/... · 873.4 1273.6 1570.4 646.4 1402.6 1531.6 745.5 1071.31199.5 1771.3

40

ob/ob and WT mouse white adipose tissueLipidomic profiles reveal gender-specific differences

-6 -4 -2 0 2 4 6 8 10

x 104

-4

-3

-2

-1

0

1

2

3

4

5

6x 104

Scores on PC 1 (33.49%)

Sco

res

on P

C 4

(8.5

2%)

obF1a

obF1b

obF2a

obF2b

obF3a

obF3b

obF4a

obF4b

obM1a obM1b obM2a

obM2b

obM3a

obM3b

obM4a

obM4b wtF1a wtF1b

wtF2a

wtF2b

wtF3a

wtF3b

wtF4a

wtF4b

wtM1a wtM1b

wtM2a

wtM2b

wtM3a

wtM3b

wtM4a

wtM4b

Oresic and Vidal-Puig

2501 peaks

Page 41: Analysis and interpretation of metabolomics and proteomics ...cis.legacy.ics.tkk.fi/Opinnot/T-61.6080/2005/... · 873.4 1273.6 1570.4 646.4 1402.6 1531.6 745.5 1071.31199.5 1771.3

41

Double KO models (ob/ob and PPARγ2)WT/WT, WT/KO, KO/WT, and KO/KO

koko

M2a

koko

M2b

wtk

oM3a

wtk

oM3b

wtk

oM1a

wtk

oM2a

wtk

oM1b

wtk

oM2b wtw

tM1a

wtw

tM2a

kow

tF1a

kow

tF1b

wtw

tM2b

wtw

tM3b

wtw

tM4b

wtw

tM1b

wtw

tM3a

wtw

tM4a

wtw

tF1a

wtw

tF2a

wtw

tF2b

wtw

tF1b

wtw

tF3a

wtw

tF4a

wtw

tF3b

wtw

tF4b

wtk

oM4a

wtk

oM4b

wtk

oF4b

wtk

oF4b

wtk

oF4a

wtk

oF4a

wtk

oF1b

wtk

oM4a

koko

M1a

kow

tTzM

1ako

koF1

ako

koF2

ako

koM

1bko

wtT

zM1b

koko

TzM

3ako

koTz

M3b

koko

F1b

koko

F2b

wtk

oF3a

wtk

oF3b

wtk

oF1a

wtk

oF2a

wtk

oF2b

wtk

oM4b0.

51.

01.

52.

02.

53.

0

Lambda=10

hclust (*, "complete")d10

Hei

ght

Clustering with Euclidian distance metric

Oresic and Vidal-Puig

Page 42: Analysis and interpretation of metabolomics and proteomics ...cis.legacy.ics.tkk.fi/Opinnot/T-61.6080/2005/... · 873.4 1273.6 1570.4 646.4 1402.6 1531.6 745.5 1071.31199.5 1771.3

42

Subspace clustering (no a priori grouping assumed)COSA method

wtw

tM1a

wtw

tM1b

wtw

tM2a

wtw

tM4b

wtw

tM3a

wtw

tM4a

wtw

tM2b

wtw

tM3b kow

tF1a

kow

tF1b

wtw

tF2a

wtw

tF3a

wtw

tF4a

wtw

tF3b

wtw

tF4b

wtw

tF2b

wtw

tF1a

wtw

tF1b

kow

tTzM

1ako

wtT

zM1b

wtk

oM4a

wtk

oM4b

koko

F1a

koko

F1b

koko

TzM

3ako

koTz

M3b

koko

M1a

koko

M1b

koko

F2a

koko

F2b

wtk

oF3a

wtk

oF3b

wtk

oF2a

wtk

oF2b

wtk

oM4a

wtk

oM4b

wtk

oF4a

wtk

oF4a

wtk

oF4b

wtk

oF4b

wtk

oF1a

wtk

oF1b

wtk

oM3a

wtk

oM3b

koko

M2a

koko

M2b

wtk

oM1a

wtk

oM1b

wtk

oM2a

wtk

oM2b

0.1

0.2

0.3

0.4

0.5

0.6

0.7

Cluster Dendogram: Lambda=0.1

hclust (*, "complete")d01

Hei

ght

Three major groups identified from lipidomic profiles: mainly WT/WT, mainly KO/KO, mainly WT/KO

Oresic and Vidal-Puig

Page 43: Analysis and interpretation of metabolomics and proteomics ...cis.legacy.ics.tkk.fi/Opinnot/T-61.6080/2005/... · 873.4 1273.6 1570.4 646.4 1402.6 1531.6 745.5 1071.31199.5 1771.3

43

Ob/Ob-F Ob/Ob-M WT-F WT-M

1000

020

000

3000

040

000

5000

0339.4X1270.7

ob/ob and WT mouse white adipose tissueLipidomic profiles reveal gender-specific differences

Ob/Ob-F Ob/Ob-M WT-F WT-M

050

010

0015

0020

0025

0030

00

703.7X1066.7Monoacylglycerol Sphingomyelin

Oresic and Vidal-Puig

Ob/Ob-F Ob/Ob-M WT-F WT-M

1500

025

000

3500

045

000

844.8X1579.7

Ob/Ob-F Ob/Ob-M WT-F WT-M

020

000

4000

060

000

902.9X1800.0Triacylglycerol Triacylglycerol

Page 44: Analysis and interpretation of metabolomics and proteomics ...cis.legacy.ics.tkk.fi/Opinnot/T-61.6080/2005/... · 873.4 1273.6 1570.4 646.4 1402.6 1531.6 745.5 1071.31199.5 1771.3

44

References

– Metabolic profiling: Its role in biomarker discovery and gene function analysis. Harrigan and Goodacre, Eds. (Kluwer, 2003)

– J.H. Friedman and J.J. Meulman. Clustering objects on subsets of attributes. J. R. Statist. Soc. B, 66, 1-25 (2004).

– M. Oresic, C.B. Clish, E.J. Davidov, E. Verheij, J.T.W.E. Vogels, L.M. Havekes, E. Neumann, A. Adourian, S. Naylor, J.v.D. Greef, and T. Plasterer. Phenotype characterization using integrated gene transcript, protein and metabolite profiling. Appl. Bioinformatics, 3, 205-217 (2004).

– Katajamaa and Oresic, Processing methods for differential analysis of LC/MS profile data, BMC Bioinformatics 6, 179 (2005).