Integrative Omics for Cancer Biology

Xiang Zhang, PhD
Department of Chemistry
Center for Regulatory and Environmental Analytical Metabolomics
University of Louisville, Louisville, KY 40292
[email protected]


Systems Biology

Systems biology is a field of biology that aims at a systems-level understanding of biological processes, in which many connected parts work together. It attempts to create predictive models of cells, organs, biochemical processes and complete organisms.

• Integrative systems biology: extracting biological knowledge from the 'omics through integration
• Predictive systems biology: predicting the future of a biosystem using 'omics knowledge, e.g. in-silico biosystems

Davidov, E.; Clish, C. B.; Oresic, M.; Zhang, X.; et al. Omics: A Journal of Integrative Biology. 2004, 8, 267-288.
Clish, C. B.; Davidov, E.; Oresic, M.; Zhang, X.; et al. Omics: A Journal of Integrative Biology. 2004, 8, 3-13.

Differential omics is the beginning of Systems Biology

Omics Space: molecule, cell, tissue, organism, …

Differential Proteomics & Metabolomics

1. Differential proteomics and metabolomics are the qualitative and quantitative comparison of the proteome and metabolome under different conditions, with the aim of unraveling complex biological processes.

2. They can be used to study any phenomenon that may change the proteome and/or metabolome of a living system.

Application areas shown on the slide: cancer biomarker discovery, nano-medicine, preventative medicine, environment, food and nutrition.

NIH

Biomarker Discovery is a Major Research Field of Differential Omics

Biomarkers are naturally occurring biomolecules useful for measuring the prognosis and/or progress of diseases and therapies.

These substances may normally be present in small amounts in the blood or other tissues. When the amounts of these substances change, they may indicate disease.

Valid biomarkers should:
• demonstrate drug activity sooner
• facilitate clinical trial design by defining patient populations
• optimize dosing for safety and efficacy
• be sensitive and easy to assay, to speed drug development

What Types of Change Are Expected?

Types of change shown on the slide: concentration, post-translational modification, sequence (mutation) and degradation. The diagram distinguishes changes in which the protein structure is changed from those in which the protein structure is unchanged.

• Sensing structural change is a major element of comparative proteomics
• Most metabolomics work focuses on concentration change only.

Challenges in Proteomics

Sample complexity
• About 25K protein-coding genes are present in human. The IPI human database (v3.25) has 67,250 entries, which could generate about 10^6 to 10^8 peptides
• More than one hundred post-translational modifications (PTMs) could happen in a proteome

Large protein concentration difference
• 10^7 to 10^8 in human cells, and at least 10^12 in human plasma
• The dynamic range of an LC-MS is about 10^4 to 10^6
• The top 12 high-abundance proteins constitute approximately 95% of the total protein mass of plasma/serum: Albumin, IgG, Fibrinogen, Transferrin, IgA, IgM, Haptoglobin, alpha 2-Macroglobulin, alpha 1-Acid Glycoprotein, alpha 1-Antitrypsin and HDL (Apo A-I & Apo A-II).

Dynamic system, large subject variation

[Figure: Body fluid profiling: biomarker platform. Labels: high concentration compounds, low concentration compounds, generic sample prep., focused sample prep.; concentration scale labeled g/ml, ng/ml and pg/ml.]

Challenges in Metabolomics

• Metabolites have a wide range of molecular weights and large variations in concentration
• The metabolome is much more dynamic than the proteome and genome, which makes the metabolome more time sensitive
• Metabolites can be polar or nonpolar, and organic or inorganic molecules. This makes chemical separation a key step in metabolomics
• Metabolites have diverse chemical structures, which makes identification using MS an extreme challenge

[Figure: example structure, cholesta-3,5-diene.]

Differential Omics: biomarker discovery

[Figure: molecular profiles (A, B, C, D, …, Z) measured in eight samples, S1 to S8, grouped as Diseased and Healthy.]

Informatics Platform

[Figure: workflow diagram of the informatics platform. Labels: experiment design, sample information, experiment execution, raw data transformation, spectrum deconvolution, peak alignment, peak normalization, quality control, significance test, pattern recognition, correlation, cluster loadings, regulated peaks, regulated molecules, unidentified molecules, molecular identification, targeted tandem MS, molecular validation, data re-examination, protein function, molecular networks, pathway modeling, knowledge assembling, LIMS interaction.]

Roadmap

1. Experimental design
2. Molecular identification
3. Data preprocessing
4. Statistical significance test
5. Pattern recognition
6. Molecular networks

Systems Biology Differential omics

MDLC Platforms

MudPIT, i.e. SCX followed by RP
• The proteome is split into 10-20X more fractions
• There is carry-over between fractions
• LC fractions generally still are too complex for MS

Affinity Selection
• Avidin selection of Cys-containing peptides
• Cu-IMAC for His-containing peptides
• Ga-IMAC for phosphorylated peptides
• Lectins for glycosylated peptides

[Figure: workflow diagram with the labels Sample, Digestion, AP, APR, SCX, fractions F1 and F2, and RPC-MS.]

Qiu, R.; Zhang, X. and Regnier, F. E. J. Chromatogr. B. 2007, 845, 143-150.
Wang, S.; Zhang, X.; and Regnier, F. E. J. Chromatogr. A 2002, 949, 153-162.
Regnier, F. E.; Amini, A.; Chakraborty, A.; Geng, M.; Ji, J.; Sioma, C.; Wang, S.; and Zhang, X. LC/GC 2001, 19(2), 200-213.
Geng, M.; Zhang, X.; Bina, M.; and Regnier, F. E. J. Chromatogr. B 2001, 752, 293-306.

In-Gel Stable Isotope Labeling: a sample gel based platform

[Figure, panels a) to d): a gel with Exp. and Cntrl. lanes, MW axis, a 50 kD band and heavy and light labeling, together with two mass spectra plotting relative intensity (%) against m/z (amu). Spectrum of GHYTIGKELIDLVLDR (Tubulin 1 alpha): light and heavy peaks labeled 628.668, 629.001, 629.328, 629.673, 630.005 and 630.334, annotated "~1:1 ratio - background". Spectrum of SDLGNLLKALGR (OGT): peaks labeled 651.365, 651.865 and 652.361, a light singlet annotated "~15:1 ratio - major difference".]

• Avoiding gel-to-gel variability
• Only labeling K-containing peptides
• Accurate quantification

Asara, J. M.; Zhang, X.; Zheng, B.; Christofk, H. H.; Wu, N.; Cantley, L. C. Nature Protocols, 2006, 1, 46-51.
Asara, J. M.; Zhang, X.; Zheng, B.; Christofk, H. H.; Wu, N.; Cantley, L. C. J. Proteome Res., 2006, 5, 155-163.
Ji, J.; Chakraborty, A.; Geng M.; Zhang, X.; Amini, A.; Bina, M.; and Regnier, F. E. J. Chromatogr. A 2000, 745, 197-210.

Roadmap

1. Experimental design
2. Molecular identification
   • protein identification
   • metabolite identification
3. Data preprocessing
4. Statistical significance test
5. Pattern recognition
6. Molecular networks

Systems Biology Differential omics

Protein Identification: database searching

[Figure: protein, peptide, mass-matched peptide.]

The database searching approach uses a protein database to find the peptide whose theoretically predicted spectrum best matches the experimental data.
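The "mass matched peptide" step can be illustrated with a minimal sketch. It is a simplification, not any of the search engines named in this talk: it only digests each database protein in silico with trypsin rules and compares peptide masses with an observed precursor mass, and it does not score fragment spectra. The function names, the tolerance and the toy database are assumptions made for illustration.

```python
import re

# Monoisotopic residue masses (Da) of the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056  # mass of H2O added to the sum of residues


def tryptic_digest(protein):
    """Cleave after K or R, except when the next residue is P."""
    return [p for p in re.split(r"(?<=[KR])(?!P)", protein) if p]


def peptide_mass(peptide):
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER


def search(observed_mass, database, tol=0.01):
    """Return (protein name, peptide) pairs whose mass matches within tol Da."""
    hits = []
    for name, sequence in database.items():
        for peptide in tryptic_digest(sequence):
            if abs(peptide_mass(peptide) - observed_mass) <= tol:
                hits.append((name, peptide))
    return hits


# Hypothetical one-protein database built from two peptides shown in this talk.
toy_db = {"toy_protein": "THLAPYSDELRLSPLGEEMR"}
print(search(peptide_mass("LSPLGEEMR"), toy_db))
```

A real search engine would go on to compare the predicted fragment spectrum of each candidate with the measured MS/MS spectrum; the mass filter above only narrows the candidate list.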

Protein Identification: database searching

More than 20 algorithms have been developed, including Sequest, Spectrum Mill, Mascot, X! Tandem and OMSSA.

1. About 20% of tandem MS spectra could provide confident peptide identification

2. < 50% of peptides can be identified by all algorithms

Zhang, X.; Oh, C.; Riley, C. P.; Buck, C. Current Proteomics 2007, 4, 121-130.

Protein Identification: de novo sequencing

De novo sequencing reconstructs the partial or complete sequence of a peptide directly from its MS/MS spectrum.

The performance of de novo methods is limited by low mass accuracy, mass equivalence, and incomplete fragmentation.

Pevtsov, S.; Fedulova, I.; Mirzaei, H.; Buck, C.; Zhang, X. Journal of Proteome Research. 2006, 5, 3018-3028.
Fedulova, I.; Ouyang, Z.; Buck, C.; Zhang, X. The Open Spectroscopy Journal 2007, 1, 1-8.
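The three limitations can be seen in a small sketch that reads residues off the mass differences between consecutive fragment ions of one ion series. This is only an illustration of the idea, not the algorithm of the cited papers; the function names and tolerances are assumptions.

```python
# Monoisotopic residue masses (Da) of the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}


def residues_for(delta, tol):
    """All residues whose mass matches a ladder step within tol Da."""
    return sorted(aa for aa, m in RESIDUE_MASS.items() if abs(m - delta) <= tol)


def read_ladder(ion_masses, tol=0.02):
    """Candidate residues for each step of a fragment-ion ladder.

    A step that matches no single residue (for example because a
    fragment ion is missing) is reported as ["?"].
    """
    ladder = sorted(ion_masses)
    return [residues_for(b - a, tol) or ["?"] for a, b in zip(ladder, ladder[1:])]


# Ladder for the hypothetical sequence P-L-K, starting from 0.
ladder = [0.0, 97.05276, 210.13682, 338.23178]
print(read_ladder(ladder))            # L and I are indistinguishable by mass
print(read_ladder(ladder, tol=0.05))  # at lower accuracy K and Q also overlap
print(read_ladder([0.0, 210.13682]))  # a missing fragment leaves a gap
```

Isoleucine and leucine always share a step (mass equivalence), lysine and glutamine merge once the tolerance exceeds about 0.036 Da (mass accuracy), and a missing ion produces a step that no single residue explains (incomplete fragmentation).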

Incorporating Peptide Separation Information for Protein Identification: structure of pattern classifier

[Figure: peptide sequences pass through feature extraction to give Feature 1, Feature 2, Feature 3, …, Feature N, which feed a neural network with an input layer, a hidden layer and an output layer (symbols x_l, y_m, z_n, w^h, w^o). The output classes are Flowthrough, Partition and Elution.]

Example peptide sequences shown:
QGLLPVLESFK
VSFLSALEEYTKK
LSPLGEEMR
DYVSQFEGSALGKQLNLK
DSGRDYVSQFEGSALGK
AKPALEDLRQGLLPVLESFK
DLATVYVDVLKDSGR
THLAPYSDELR
VQPYLDDFQKK
QGLLPVLESFKVSFLSALEEYTK

Oh, C.; Zak, S. H.; Mirzaei, H.; Regnier, F. E.; Zhang, X. Bioinformatics 2007, 23, 114-118.

Training the ANNs with Genetic Algorithm

[Figure: flowchart of the genetic algorithm. Labels: initial candidate solutions (w^h_ji, w^o_kj, t^h_j, t^o_k), encoding, initial population, crossover, mutation, selection, best chromosome, optimal solution (w^h_ji, w^o_kj, t^h_j, t^o_k).]

Oh, C.; Zak, S. H.; Mirzaei, H.; Regnier, F. E.; Zhang, X. Bioinformatics 2007, 23, 114-118.

Protein Identification Using Multiple Algorithms and Predicted Peptide Separation in HPLC: PIUMA architecture

[Figure: PIUMA architecture, with steps numbered 1 to 3. Labels: raw LC/MS/MS data; processed MS/MS data (mzData or mzXML format); database searching (X! Tandem, Sequest, Mascot); unknown modification search; de novo sequencing (Peaks, Lutefisk, novoHMM); unmatched spectra; peptide list; consensus; machine learning; chromatography modeling based validation; protein list; report. Color legend: existing algorithms, algorithms to be developed, method descriptions.]

Oh, C.; Zak, S. H.; Mirzaei, H.; Regnier, F. E. and Zhang, X. Bioinformatics, 2007, 23, 114-118.
Zhang, X.; Oh, C.; Riley, C. P.; Buck, C. Current Proteomics 2007, 4, 121-130.

Roadmap

1. Experimental design
2. Molecular identification
3. Data preprocessing
   • Spectrum deconvolution
   • Quality control
   • Alignment
   • Normalization
4. Statistical significance test
5. Pattern recognition
6. Molecular networks

Systems Biology Differential omics

Spectrum Deconvolution: GISTool, single sample analysis

1. To differentiate signals arising from the real analytes from signals arising from contaminants or instrument noise

2. To reduce data dimensionality, which will benefit downstream statistical analysis.

Functionality
• Smoothing and centralization
• Peak cluster detection
• Charge recognition
• De-isotope
• Peak identification at LC level
• Doublet recognition
• Doublet quantification

GISTool Algorithm: deconvoluting MS spectra

Single sample analysis

[Figure: mass spectrum, intensity (%) against m/z from 747 to 751, showing overlapping isotope clusters of a 3+ peptide (748.6354) and a 2+ peptide (748.9694). Labeled peaks: 748.64, 748.97, 749.29, 749.47, 749.62, 749.97 and 750.50. Further panels show the spectrum separated into the +2 and +3 peptide components.]

Zhang, X.; Hines, W.; Adamec, J.; Asara, J.; Naylor, S.; and Regnier, F. E. J. Am. Soc. Mass Spectrom. 2005, 16, 1181-1191.
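Charge recognition relies on the spacing between isotope peaks, which is roughly 1.00335/z on the m/z axis. The sketch below is not the GISTool implementation; it is a minimal illustration using the peak labels from this slide, and the way the labeled peaks are split into a 3+ and a 2+ cluster is an assumption read off the figure.

```python
from statistics import median

ISOTOPE_SPACING = 1.00335  # approximate 13C - 12C mass difference (Da)
PROTON = 1.007276          # proton mass (Da)


def infer_charge(cluster_mz):
    """Estimate the charge state from the median spacing of isotope peaks."""
    mz = sorted(cluster_mz)
    spacing = median(b - a for a, b in zip(mz, mz[1:]))
    return round(ISOTOPE_SPACING / spacing)


def neutral_mass(mono_mz, charge):
    """Neutral monoisotopic mass of an [M + zH]z+ ion."""
    return charge * (mono_mz - PROTON)


# Assumed grouping of the labeled peaks into two overlapping clusters.
cluster_a = [748.64, 748.97, 749.29, 749.62]  # spacing of about 0.33
cluster_b = [748.97, 749.47, 749.97, 750.50]  # spacing of about 0.50

print(infer_charge(cluster_a), infer_charge(cluster_b))
print(neutral_mass(748.6354, 3), neutral_mass(748.9694, 2))
```

The peak at 748.97 belongs to both clusters, which is why separating the two peptides requires deconvolution rather than simple peak picking.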

Quality Assessment / Control

Biological sample QA/C
• Protein assay

Experimental data QA/C
• 2D K-S test
• Percentile of detected peaks
• Percentile of aligned peaks
• Retention time variance vs. retention time
• m/z variance vs. retention time
• Frequency distribution of RT & m/z variance

[Figure: retention time variation (%) plotted against retention time (min), and D value plotted against sample ID (1 to 10).]

Zhang, X.; Asara, J. M.; Adamec, J.; Ouzzani, M.; and Elmagarmid, A. K. Bioinformatics, 2005, 21, 4054-4059.

Data Alignment

To recognize peaks of the same molecule occurring in different samples from the thousands of peaks detected during the course of an experiment.

1. MS to MS data alignment

2. MS to MS/MS data alignment

• Referenced alignment
• Blind alignment
• Quality depends on the information from peak detection
• Depends on experimental design

LC-MS Data Alignment: XAlign software for proteomics & metabolomics data

• Detecting median sample
• Aligning samples to the median sample

$M_j = \frac{\sum_i I_{i,j} M_{i,j}}{\sum_i I_{i,j}}$

$T_j = \frac{\sum_i I_{i,j} T_{i,j}}{\sum_i I_{i,j}}$

$D_i = \sum_{j=1}^{s} \left| T_{i,j} - \mu_j \right|$

[Figure: retention time difference (min) plotted against retention time (min); intensity of aligned peaks in sample 2 plotted against sample 1 on logarithmic axes, with the fit y = 1.3636x + 16.511, R² = 0.9475.]

Zhang, X.; Asara, J. M.; Adamec, J.; Ouzzani, M.; and Elmagarmid, A. K. Bioinformatics, 2005, 21, 4054-4059.
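The median-sample step can be sketched as follows. This is a reading of the slide's formulas rather than the XAlign source code: it assumes the sums run over samples i for the weighted means and over peaks j for D_i, that µ_j is the intensity-weighted mean retention time T_j, and that the median sample is the one with the smallest D_i. The function and variable names are illustrative.

```python
def weighted_mean(values, weights):
    """Intensity-weighted mean, as in the formulas for M_j and T_j."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


def median_sample(rt, intensity):
    """Pick the sample whose retention times deviate least from the means.

    rt[i][j] and intensity[i][j] are the retention time and intensity of
    peak j in sample i. Returns (index of the median sample, list of D_i).
    """
    n_samples, n_peaks = len(rt), len(rt[0])
    mu = [
        weighted_mean([rt[i][j] for i in range(n_samples)],
                      [intensity[i][j] for i in range(n_samples)])
        for j in range(n_peaks)
    ]
    d = [sum(abs(rt[i][j] - mu[j]) for j in range(n_peaks))
         for i in range(n_samples)]
    return d.index(min(d)), d


# Three hypothetical samples with two common peaks each.
rt = [[10.0, 20.0], [10.2, 20.2], [10.4, 20.4]]
intensity = [[1.0, 1.0], [1.0, 1.0], [1.0, 1.0]]
index, distances = median_sample(rt, intensity)
print(index, distances)
```

The remaining samples would then be aligned to the chosen sample, which keeps the retention time corrections as small as possible on average.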

GCxGC-MS Data Alignment: metabolite component of human serum

[Figure: chromatogram of serum analyzed on GCxGC/TOF-MS.]

• Four dimensions
• 1535 peaks have been detected

GCxGC/TOF-MS Data Alignment: MSort software for metabolomics

Criteria for alignment
• 1st dim. rt
• 2nd dim. rt
• spec. correlation

Features
• peak entry merging
• cont. exclusion

Oh, C.; Huang, X.; Buck, C.; Regnier, F. E. and Zhang, X. J. Chromatogr. A. 2008, 1179, 205-215
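The three alignment criteria can be combined into a simple pairwise check, sketched below. This is not the MSort implementation; the dictionary layout, the tolerance values and the correlation cutoff are all hypothetical and would need tuning for a real instrument.

```python
import math


def pearson(a, b):
    """Pearson correlation of two equally long intensity vectors."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0  # a flat spectrum carries no correlation information
    return cov / math.sqrt(var_a * var_b)


def same_compound(p, q, rt1_tol=5.0, rt2_tol=0.1, min_corr=0.9):
    """Apply the three criteria: 1st dim. rt, 2nd dim. rt, spectrum correlation.

    p and q are dicts with keys "rt1", "rt2" (seconds) and "spectrum"
    (intensities on a shared m/z grid). All thresholds are assumed values.
    """
    return (abs(p["rt1"] - q["rt1"]) <= rt1_tol
            and abs(p["rt2"] - q["rt2"]) <= rt2_tol
            and pearson(p["spectrum"], q["spectrum"]) >= min_corr)


peak_a = {"rt1": 600.0, "rt2": 1.52, "spectrum": [10, 80, 5, 40, 0]}
peak_b = {"rt1": 603.0, "rt2": 1.55, "spectrum": [12, 78, 6, 42, 1]}
peak_c = {"rt1": 601.0, "rt2": 1.50, "spectrum": [70, 5, 60, 2, 30]}
print(same_compound(peak_a, peak_b), same_compound(peak_a, peak_c))
```

Requiring spectral similarity in addition to both retention times is what lets co-eluting but chemically different peaks, such as peak_a and peak_c here, stay in separate rows of the alignment table.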

Analysis Results of MAlign: 53 standard acids

[Figure: two bar charts plotted against the number of peak entries in a row of the alignment table (1 to 16): the number of rows in the alignment table, and the peak area (×10^5).]

1. 8 [OA + FA] samples and 8 [AA + FA] samples
2. Derivatization reagent: (N-Methyl-N-t-butyldimethylsilyl)-trifluoroacetamide (MTBSTFA)

Oh, C.; Huang, X.; Buck, C.; Regnier, F. E. and Zhang, X. J. Chromatogr. A. 2008, 1179, 205-215

Normalization

To reduce concentration effects and experimental variance so that the data are comparable.

Methods
1. Log linear model: x_ij = a_i r_j e_ij
2. Reference sample normalization
3. Auto-scaling
4. Constant mean / trimmed constant mean
5. Constant median / trimmed constant median

[Figure: intensity plotted against peak index.]

CV Distribution of Peak Intensities: human serum sample

[Figure: for the data before and after normalization, a histogram of CV (frequency) and a curve titled "Intensity Variation" showing relative peak number (%) against CV. The values 20.7% and 17.3% are marked on the slide.]

Log linear model:

x_ij = a_i r_j e_ij

log(x_ij) = log(a_i) + log(r_j) + log(e_ij)
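Because the model is additive on the log scale, the two factors can be estimated from row and column means of the log intensities. The sketch below is one simple way to do this and is not necessarily the fitting procedure used in the talk. It assumes that i indexes peaks and j indexes samples, so that a_i is a per-peak abundance and r_j a per-sample scaling factor; the slide does not state which index is which.

```python
import math


def normalize_log_linear(x):
    """Remove the per-sample factor r_j from x[i][j] = a_i * r_j * e_ij.

    x[i][j] is the intensity of peak i in sample j (all values > 0).
    The sample effect log(r_j) is estimated as the column mean of the log
    intensities minus the grand mean, then subtracted on the log scale.
    """
    logs = [[math.log(v) for v in row] for row in x]
    n_peaks, n_samples = len(logs), len(logs[0])
    grand_mean = sum(map(sum, logs)) / (n_peaks * n_samples)
    sample_effect = [
        sum(logs[i][j] for i in range(n_peaks)) / n_peaks - grand_mean
        for j in range(n_samples)
    ]
    return [
        [math.exp(logs[i][j] - sample_effect[j]) for j in range(n_samples)]
        for i in range(n_peaks)
    ]


# Hypothetical data: sample 2 was loaded at twice the amount of sample 1.
raw = [[100.0, 200.0], [10.0, 20.0], [50.0, 100.0]]
for row in normalize_log_linear(raw):
    print([round(v, 3) for v in row])
```

After normalization the two columns agree for every peak, so the remaining differences between samples reflect the e_ij term rather than the amount of material loaded.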

Roadmap

1. Experimental design
2. Molecular identification
3. Data preprocessing
4. Statistical significance test
5. Pattern recognition
6. Molecular networks

Systems Biology Differential omics


Statistical Significance Tests

Methods

1. Pair-wise t-test (diff. mean?)

2. Mann-Whitney U test (diff. median?)

3. Kolmogorov-Smirnov test (diff. population?)

4. Kruskal-Wallis analysis of variance

To find individual peaks for which there are significant differences between groups.
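Two of the listed statistics are simple enough to compute by hand for a single peak, which shows what each test actually compares. The sketch below returns only the test statistics (the Kolmogorov-Smirnov D value and the Mann-Whitney U value), not p-values; in practice a statistics library such as SciPy (`scipy.stats.ks_2samp`, `scipy.stats.mannwhitneyu`) would be used. The example intensities are made up.

```python
def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov D: largest gap between the two ECDFs."""
    def ecdf(sample, x):
        return sum(v <= x for v in sample) / len(sample)
    return max(abs(ecdf(a, x) - ecdf(b, x)) for x in list(a) + list(b))


def mann_whitney_u(a, b):
    """Mann-Whitney U for sample a: pairs with a > b, ties counted as 0.5."""
    return sum((x > y) + 0.5 * (x == y) for x in a for y in b)


# Hypothetical intensities of one peak in two groups of samples.
healthy = [1.0, 2.0, 3.0, 4.0]
diseased = [5.0, 6.0, 7.0, 8.0]
print(ks_statistic(healthy, diseased))    # completely separated groups
print(mann_whitney_u(healthy, diseased))  # no healthy value exceeds a diseased one
print(mann_whitney_u(diseased, healthy))  # every diseased value exceeds every healthy one
```

Repeating such a test for every aligned peak yields one p-value per peak, so some correction for multiple testing is normally needed before calling a peak significant.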

Statistical Significance Tests: metabolome of great blue heron fertilized eggs contaminated by PCBs

[Figure: volcano plot of p (-log) against fold change (log), with down-regulated peaks on the left and up-regulated peaks on the right.]

• fold change = I_c / I_n
• blue line: p = 0.05
• dashed line: log fold change = 0 (i.e. fold change = 1)

PCBs: polychlorinated biphenyls
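The coordinates of one point in such a plot can be computed as in the sketch below. The logarithm bases (base 2 for fold change, base 10 for p) are assumptions, since the slide only says "log"; the function names and example numbers are illustrative.

```python
import math


def volcano_point(i_c, i_n, p_value):
    """Return (log2 fold change, -log10 p) for one peak."""
    return math.log2(i_c / i_n), -math.log10(p_value)


def classify(i_c, i_n, p_value, alpha=0.05):
    """Label a peak relative to the p = alpha line and the zero-log-fold line."""
    log_fc, _ = volcano_point(i_c, i_n, p_value)
    if p_value >= alpha or log_fc == 0:
        return "not significant"
    return "up-regulated" if log_fc > 0 else "down-regulated"


# Hypothetical mean intensities (I_c, I_n) and p-values for three peaks.
print(volcano_point(400.0, 100.0, 0.001), classify(400.0, 100.0, 0.001))
print(volcano_point(50.0, 100.0, 0.01), classify(50.0, 100.0, 0.01))
print(volcano_point(120.0, 100.0, 0.30), classify(120.0, 100.0, 0.30))
```

Taking the logarithm of the fold change makes a doubling and a halving equally far from the center line, which is why the plot is symmetric around zero.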

Roadmap

1. Experimental design
2. Molecular identification
3. Data preprocessing
4. Statistical significance test
5. Pattern recognition
6. Molecular networks

Systems Biology Differential omics

Clustering or Classification

Unsupervised methods
• Principal component analysis (PCA)
• Clustering objects on subsets of attributes (COSA)

Supervised methods
• Linear discriminant analysis (LDA)
• Support vector machine (SVM)
• Artificial neural network (ANN)

The resulting pattern recognition provides a first glimpse of improved understanding of the underlying biology.

Cross Species Comparison

• 27 of the 28 control humans and all 8 control rats cluster into one group
• 11 of the 14 diseased humans and all diseased rats cluster into a second group

Differential Metabolomics of Human Blood: breast cancer samples vs. control samples

Roadmap

1. Experimental design
2. Protein identification
3. Data preprocessing
4. Statistical significance test
5. Pattern recognition
6. Molecular networks
   • correlation network
   • interaction network
   • regulation network
   • pathway analysis

Systems Biology Differential omics

Molecular Correlation Analysis: pairwise correlation of proteins and metabolites

[Figure: molecular profiles (A, B, C, D, …, Z) measured in eight samples, S1 to S8, grouped as Diseased and Healthy.]
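Pairwise correlation across samples can be turned into network edges as in the following sketch. The Pearson coefficient and the cutoff of 0.8 are assumptions chosen for illustration; the talk does not specify which correlation measure or threshold was used, and the molecule profiles are made up.

```python
import math
from itertools import combinations


def pearson(a, b):
    """Pearson correlation of two abundance profiles across the same samples."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def correlation_edges(profiles, threshold=0.8):
    """Return (molecule, molecule, sign) for every strongly correlated pair."""
    edges = []
    for (m1, v1), (m2, v2) in combinations(profiles.items(), 2):
        r = pearson(v1, v2)
        if abs(r) >= threshold:
            edges.append((m1, m2, "positive" if r > 0 else "negative"))
    return edges


# Hypothetical abundances of four molecules in samples S1 to S8.
profiles = {
    "A": [1, 2, 3, 4, 5, 6, 7, 8],
    "B": [2, 4, 6, 8, 10, 12, 14, 16],
    "C": [8, 7, 6, 5, 4, 3, 2, 1],
    "D": [1, -1, 1, -1, -1, 1, -1, 1],
}
print(correlation_edges(profiles))
```

Each returned edge corresponds to one line in a correlation network, with the sign deciding whether it is drawn as a positive or a negative correlation.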

Molecular Correlation Network: an example of drug effect on disease state

• Reveals important relationships among the various components
• Complementary to abundance level information
• Provides information about the biochemical processes underlying the disease or drug response

[Figure: correlation network. Node groups in the legend: Lipids (LCMS), NMR (DE), NMR (CPMG), Peptides, Clinical (also labeled Lipids, NMR, NMR diffusion, Proteins, Clinical). Edges indicate positive or negative correlation. Node labels include amino acids and other small molecules (e.g. phenylalanine, leucine, isoleucine, lactate, glutamine, valine, tyrosine, alanine, a-glucose, b-glucose, creatine, formate, acetate), lipid species (e.g. C18:2 CE, C18:1 LPC, C52:3 TG, C36:2 PC, C24:1 SPM), protein-derived entries (e.g. ApoA1_3, ApoE_1, A1MG_2, FetuinA_2, Plasminogen, AMBP), clinical measures (e.g. GLUC, BUN, ALB, TP, TRIG, HDL, ALP) and unnamed entries L-1a to L-28b.]

Clish, C. B.; Davidov, E.; Oresic, M.; Plasterer, T.; Lavine, G.; Londo, T. R.; Meys, M.; Snell, P.; Stochaj, W.; Adourian, A.; Zhang, X.; Morel, N.; Neumann, E.; Verheij, E.; Vogels, J. T.W.E.; Havekes, L. M.; Afeyan, N.; Regnier, F. E.; Greef, J.; Naylor, S. Omics: A Journal of Integrative Biology 2004, 8, 3-13.

SysNet: Interactive Visual Data Mining of Molecular Correlation Network

An interactive integration and visualization environment for molecular correlation of 'omics data.

• Integrating molecular expression information generated in different 'omics
• Visualizing molecular correlation in interactive mode
• Enabling time course data visualization and analysis
• Automatically organizing molecules based on their expression pattern in time course.

[Figure: two screenshots, panels a) and b).]

Zhang, M.; Ouyang, Q.; Stephenson, A.; Salt, D.; Kane, D. M.; Burgner J.; Buck, C. and Zhang, X. Accepted by BMC Systems Biology.

Biomarker Verification

In-silico verification
• tracing lineage
• pathway analysis

Wet-lab verification
• AQUA
• MRM
• Antibody

Automated Lineage Tracing

• Interested in identifying the connections between input and output data for a program
• Tracing of fine-grained lineage through run-time analysis
• Developed based on dynamic slicing techniques used in debugging
• Applicable to any arbitrary function

[Figure: diagram with the labels "Analysis Software" and "Lineage Tracing".]

Zhang, M.; Zhang, X.; Zhang, X. and Prabhakar, S. 33rd International Conference on Very Large Data Bases (VLDB 2007), 2007.
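The idea of connecting each output value to the inputs it was computed from can be illustrated with a small value wrapper that carries a set of source labels through arithmetic. This is a deliberately simplified stand-in based on operator overloading, not the dynamic slicing technique of the cited VLDB paper; the class name and the peak labels are hypothetical.

```python
class Traced:
    """A number that remembers which input items contributed to it."""

    def __init__(self, value, sources=()):
        self.value = value
        self.sources = frozenset(sources)

    def _combine(self, other, op):
        # Plain numbers are treated as constants with no lineage.
        if not isinstance(other, Traced):
            other = Traced(other)
        return Traced(op(self.value, other.value), self.sources | other.sources)

    def __add__(self, other):
        return self._combine(other, lambda a, b: a + b)

    def __sub__(self, other):
        return self._combine(other, lambda a, b: a - b)

    def __mul__(self, other):
        return self._combine(other, lambda a, b: a * b)

    def __truediv__(self, other):
        return self._combine(other, lambda a, b: a / b)

    __radd__ = __add__  # addition and multiplication are commutative
    __rmul__ = __mul__


# Hypothetical peak intensities, each tagged with its own label.
peaks = [Traced(v, {f"peak{i}"}) for i, v in enumerate([10.0, 30.0, 4.0, 7.0])]

ratio = (peaks[0] + peaks[1]) / peaks[2]  # peak3 is never used
scaled = 2 * ratio

print(scaled.value, sorted(scaled.sources))
```

When a candidate biomarker looks suspicious, such lineage information makes it possible to go back to exactly the raw peaks that produced it instead of re-examining the whole data set.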

Summary

• The informatics platform developed in my group can be used to analyze protein and metabolite profiling data to differentiate diseased and normal samples for biomarker discovery

• Groups identified using clustering analysis reflected the phenotypic categories of cancer and control samples, animal and human subjects, etc. with a high degree of accuracy

• SysNet uses an interactive visual data mining approach to integrate omics data into a single environment, which enables biologists to perform data mining

• Lineage tracing technology is an efficient and effective approach for in-silico biomarker verification. This technique will significantly reduce the false discovery rate (FDR) of biomarker discovery

Acknowledgements

Dr. John Burger
Dr. Michael D. Kane
Dr. Fred E. Regnier
Dr. David Salt
Dr. Mohammad Sulma
Dr. Daniel Raftery
Dr. Sunil Prabhakar

Irina Fedulova
Dr. Hamid Mirzaei
Dr. Cheolhwan Oh
Sergey E. Pevtsov
Ouyang Qi
Alan Stephenson
Mingwu Zhang

Dr. David Clemmer
Dr. John Asara
Dr. Mu Wang
Dr. Jake Chen
Dr. Steve Valentine
Dr. Steve Naylor

Postdoc Positions

Posting Title: Industrial Postdoctoral Fellow - Bioinformatician
Work Location: University of Louisville, KY
Job Type: Full time
Starting Date: Position immediately available

Job Description: Predictive Physiology and Medicine (PPM) Inc. is an exciting health and life sciences company based in Bloomington, Indiana focused on developing analytical systems for the individualized health and wellness industry. We have an immediate opening for a postdoctoral fellow. The successful candidate will develop bioinformatics systems for mass spectrometry based quantitative proteomics and metabolomics.

Requirements: The position requires a bioinformatician with a strong computational background. Priority will be given to candidates with a PhD in bioinformatics, computer science, statistics, engineering, or computational physics. The successful candidate should have a strong understanding of statistics and pattern recognition. Programming skills in Matlab, Microsoft .NET, or Java to accomplish analyses are required. Experience in analyzing biological data is not required; however, interest in multidisciplinary research is a must.