Strategies for conducting metabolomics analyses at scale
Clary B. Clish, Ph.D.
Senior Director, Metabolomics
Institute Scientist
19 June 2019
Metabolomics: the systematic analysis of metabolites in biological specimens
• carbohydrates
• amino acids
• nucleotides
• organic acids
• acylcarnitines
• lipids
(> × 104)
adapted from Gerszten & Wang Nature 2008
Proximal reporters of disease:
• diabetes
• CKD
• CVD
• inflammation
Microbiome
Diet
Lifestyle/environment
Rationale for doing metabolomics:
• the actual number of metabolites/small molecules that occur in biological samples is unknown
• metabolite concentrations range from a few molecules per cell to millimolar, but the best analytical techniques have linear dynamic ranges of only 4-5 orders of magnitude
• physical properties differ widely among metabolites
– polarity: ranges from very polar to very nonpolar
– chemical stability: ranges from labile to very stable
• multiple analytical methods are needed to obtain “full” coverage of the metabolome
Metabolomics is a significant analytical challenge
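The calculation below makes the dynamic-range gap concrete. It converts "one molecule per cell" into a molar concentration and counts the orders of magnitude up to 1 mM. This is a minimal sketch. The ~1 pL cell volume is an assumption chosen for illustration, since real cell volumes vary with cell type.

```python
import math

AVOGADRO = 6.02214076e23  # molecules per mole (exact SI value)

def molar(molecules: float, volume_l: float) -> float:
    """Molar concentration (mol/L) of a number of molecules in a volume given in litres."""
    return molecules / (AVOGADRO * volume_l)

CELL_VOLUME_L = 1e-12  # assumption: ~1 pL per cell; real volumes vary with cell type

lowest = molar(1, CELL_VOLUME_L)  # one molecule per cell, in mol/L
highest = 1e-3                    # 1 mM
span_orders = math.log10(highest / lowest)
print(f"{lowest:.2e} M to {highest:.0e} M spans {span_orders:.1f} orders of magnitude")
```

Under these assumptions the span is roughly 9 orders of magnitude, about twice the 4-5 orders a single technique covers linearly.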
LC-MS-based metabolomics platform (Broad Institute)
Biological samples → Sample preparation → Chromatography → Mass spectrometry → Metabolite profiles
Biofluid/homogenate aliquots (10 µL or 30 µL) are extracted separately for each method; the methods span the polarity range:
• Atlantis-HILIC: amines & cationic metabolites (MeOH/ACN extracts); nontargeted MS, + mode HRAM (Exactive Plus, Q Exactive)
• NH2-HILIC: sugars, organic acids, purines, pyrimidines, etc. (MeOH extracts); nontargeted MS, - mode HRAM (Q Exactive or Q Exactive Plus)
• RP/C8: lipids (IPA extracts); nontargeted MS, + mode HRAM (Exactive Plus, Q Exactive)
• RP/C18: free fatty acids, bile acids, lipid mediators (MeOH extracts); nontargeted MS, - mode HRAM (Q Exactive or Q Exactive Plus)
• Targeted MS (MRM profiling; Agilent 6495 QQQ, 5500 QTRAP) for sensitive analyses of select compounds: Trp metabolites (Atlantis-HILIC), eicosanoids/lipid mediators (C18)
Nontargeted MS: hundreds of confirmed knowns & thousands of yet-to-be-confirmed unknowns
Hybrid analyses: metabolites of known identity & unknowns
Figure: ion chromatograms of "targeted" peaks (niacinamide, kynurenic acid, taurine, leucine/isoleucine, carnitine) extracted from the full scan (nontargeted) dataset; highlighted feature at RT = 5.41 min, m/z = 190.0498
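The full-scan data are retained, so a "targeted" peak is essentially an ion chromatogram extracted from them at a known m/z. The sketch below shows that extraction with a ppm tolerance. It is a minimal sketch under several assumptions: the in-memory scan layout, the 5 ppm default and the toy values are illustrative, not the platform's actual data format or settings.

```python
from typing import List, Tuple

# Hypothetical layout: each scan is (retention_time_min, [(mz, intensity), ...]).
Scan = Tuple[float, List[Tuple[float, float]]]

def extract_ion_chromatogram(scans: List[Scan], target_mz: float,
                             ppm: float = 5.0) -> List[Tuple[float, float]]:
    """Sum the intensities within ±ppm of target_mz in every scan -> [(rt, intensity)]."""
    tolerance = target_mz * ppm * 1e-6
    xic = []
    for rt, peaks in scans:
        signal = sum(i for mz, i in peaks if abs(mz - target_mz) <= tolerance)
        xic.append((rt, signal))
    return xic

def apex(xic: List[Tuple[float, float]]) -> Tuple[float, float]:
    """Retention time and intensity of the most intense chromatogram point."""
    return max(xic, key=lambda point: point[1])

# Toy full-scan data (hypothetical values).
scans = [
    (5.38, [(190.0497, 1.0e4), (132.1019, 5.0e5)]),
    (5.41, [(190.0498, 8.0e5), (132.1019, 4.0e5)]),
    (5.44, [(190.0499, 2.0e5)]),
]
xic = extract_ion_chromatogram(scans, 190.0498)
print(apex(xic))  # (5.41, 800000.0)
```

The same full-scan file can be re-queried later at any m/z, which is why unknowns detected today can still be annotated later.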
Mitochondrial disorders
• Markers of mitochondrial disease
• Mitochondrial dysfunction
Shaham PNAS 2010; 107:1571-5
Chen Cell Rep 2014; 7:27-34
Gohil J Biol Chem 2013; 288:35387-95
Bao eLife 2016; 5. pii: e10575
Delaney PNAS 2017; 114:8402-8407
Type 2 diabetes
• Metabolic predictors of future T2D in FHS, MDC, MCDS, NHS, DPP
• Metabolic phenotyping of SLC16A11 haplotypes associated with T2D
• Intervention response in DPP
• Influence of diet on metabolic profiles and T2D in PREDIMED
Wang Nat Med 2011; 17:448-53
Rhee J Clin Invest 2011; 121:1402-1411
Wang J Clin Invest 2013; 123:4309-4317
SIGMA Consortium Nature 2014; 506:97-101
Magnusson Diabetes 2015; 64:3010-6
Walford Diabetes 2016; 65:1424-33
O'Sullivan J Clin Invest 2017; 127:4394-4402
Renal disease
• CKD progression
• Prediction of CV mortality
• Novel markers of uremia
Kalim J Am Heart Assoc 2013; 2(6)
Rhee J Am Soc Nephrol 2010; 21:1041
Rhee Am J Nephrol 2016; 43:366-74
Tran Nature 2016; 531:528-32
Cancer metabolism
• Metabolic dependencies of cancer cells
Birsoy Nature 2014; 508:108-12
Israelsen Cell 2013; 155:397-409
Jain Science 2012; 336:1040-4
Wang Cell 2014; 158:1309-23
Davidson Cell Metab 2016; 23:517-28
Kryukov Science 2016; 351:1214-8
Spinelli Science 2017; 358:941-946
Zou Nat Commun 2019; 10:1617
Li Nat Med 2019; 25:850-860
Cancer
• Early indicators of pancreatic cancer
• Risk factors for breast cancer
• Dietary and hormonal determinants of cancer
• Predictors of prostate cancer
Mayers Nat Med 2014; 20:1193-8
Danai Nature 2018; 558:600-604
Cardiovascular disease
• Predictors of CHD in DPP
• Predictors of CHD in WHI
• Influence of diet on metabolic profiles and CVD in PREDIMED
Guasch-Ferré Am J Clin Nutr 2016; 103:1408-16
Ruiz-Canela Clin Chem 2016; 62:582-92
Lewis J Am Coll Cardiol 2016; 67:174-89
Wang Circulation 2017; 135:2028-2040
Paynter Circulation 2018; 137:841-853
Zhao Circulation 2019; 139:2003-2011
Microbiome & disease
• Microbiome in IBD
• Gut microbiome & diabetes
• Bile acid profiles associated with C. difficile
Meelu Inflamm Bowel Dis 2014; 20:1139-46
Kostic Cell Host Microbe 2015; 17:260-73
Allegretti Aliment Pharmacol Ther 2016; 43:1142
Fujisaka J Clin Invest 2016; 126:4430-4443
Ni Sci Transl Med 2017; 9(416)
Fujisaka Cell Rep 2018; 22:3072-3086
Franzosa Nat Microbiol 2019; 4:293-305
Lloyd-Price Nature 2019; 569:655-662
Infection & immunity
• Metabolic signaling and metabolism in immune cells
• Influence of infection on metabolism and vice versa
Tannahill Nature 2013; 496:238-42
Mascanfroni Nat Med 2015; 21:638-46
Wang Cell 2015; 163:1413-27
Matheson Cell Host Microbe 2015; 18:409-23
Graham Nat Commun 2015; 6:7838
Palsson-McDermott Cell Metab 2015; 21:65-80
Rothhammer Nat Med 2016; 22:586-97
Yang Nature 2017; 548:602-606
Werling Cell 2019; 177:315-325
Plasma metabolic predictors of T2D in the Framingham Heart Study
• Original cohort (1948-present): N = 5209
• Offspring study (1971-present): N = 5124
• Gen 3 study (2002-present): N = 4200
• Exam 5 (1991-95): 3000 people underwent an OGTT (75 g D-glucose in 300 mL)
• 12-year follow-up for T2D
Robert Gerszten, Thomas Wang
• Nested case-control study: baseline samples from 189 future T2D cases and 189 matched controls
• Matching based on fasting glucose, age, sex, BMI, and hypertension status
• Targeted metabolite profiling of ~250 metabolites
Metabolites dysregulated 4-12 years before T2D diagnosis
Wang TJ et al. Nat Med 2011; 17:448-453
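The slide lists the matching variables but not the matching algorithm. One illustration is a greedy 1:1 nearest-neighbour matcher. It matches exactly on sex and hypertension status, and by nearest distance on fasting glucose, age and BMI, each scaled by its SD in the control pool. The record layout, field names and greedy strategy are all assumptions, not the study's procedure.

```python
import math
import statistics
from typing import Dict, List, Tuple

# Hypothetical field names for the matching covariates listed on the slide.
CONTINUOUS = ["fasting_glucose", "age", "bmi"]  # matched by nearest distance
EXACT = ["sex", "hypertension"]                 # must be identical

def match_controls(cases: List[Dict], pool: List[Dict]) -> List[Tuple[str, str]]:
    """Greedy 1:1 matching of each case to its closest eligible control, without replacement."""
    # Scale each continuous covariate by its SD in the control pool so units are comparable.
    sds = {k: statistics.stdev([p[k] for p in pool]) or 1.0 for k in CONTINUOUS}
    available = list(pool)
    pairs = []
    for case in cases:
        eligible = [c for c in available if all(c[k] == case[k] for k in EXACT)]
        if not eligible:
            continue  # no unused control with identical sex/hypertension status

        def distance(c: Dict) -> float:
            return math.sqrt(sum(((c[k] - case[k]) / sds[k]) ** 2 for k in CONTINUOUS))

        best = min(eligible, key=distance)
        available.remove(best)  # each control is used at most once
        pairs.append((case["id"], best["id"]))
    return pairs

# Hypothetical records.
cases = [
    {"id": "case1", "fasting_glucose": 100, "age": 55, "bmi": 29.0, "sex": "F", "hypertension": 0},
    {"id": "case2", "fasting_glucose": 95, "age": 60, "bmi": 27.0, "sex": "M", "hypertension": 1},
]
pool = [
    {"id": "c1", "fasting_glucose": 99, "age": 54, "bmi": 28.5, "sex": "F", "hypertension": 0},
    {"id": "c2", "fasting_glucose": 100, "age": 55, "bmi": 29.0, "sex": "M", "hypertension": 0},
    {"id": "c3", "fasting_glucose": 90, "age": 70, "bmi": 24.0, "sex": "F", "hypertension": 0},
]
print(match_controls(cases, pool))  # [('case1', 'c1')]; case2 has no eligible control
```

Greedy matching depends on the order in which cases are processed. Optimal-matching algorithms avoid that, at the cost of more computation.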
Odds ratio for future diabetes: plasma isoleucine, phenylalanine, and tyrosine
                 Discovery (FHS)       Validation (Malmö)
                 12-year follow-up     13-year follow-up
                 (n = 378)             (n = 326)
1st quartile     1.0 (referent)        1.0 (referent)
2nd quartile     2.08 (0.97-4.46)      3.48 (1.68-7.23)
3rd quartile     2.59 (1.09-6.15)      2.82 (1.25-6.34)
4th quartile     3.93 (1.54-10.04)     5.99 (2.34-15.34)
P for trend      0.006                 0.0009
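For orientation only: the quartile odds ratios above come from the study's statistical models, which the slide does not describe. A crude odds ratio for one quartile against the referent quartile can be computed from a 2×2 table of counts. The sketch below adds a Woolf (log-scale) 95% confidence interval. The counts in the example are hypothetical.

```python
import math

def odds_ratio_ci(exp_cases: int, exp_controls: int, ref_cases: int, ref_controls: int,
                  z: float = 1.959964) -> tuple:
    """Crude odds ratio (exposed vs. referent group) with a Woolf log-scale 95% CI."""
    odds_ratio = (exp_cases * ref_controls) / (exp_controls * ref_cases)
    # Standard error of log(OR) from the four cell counts.
    se = math.sqrt(1 / exp_cases + 1 / exp_controls + 1 / ref_cases + 1 / ref_controls)
    lower = math.exp(math.log(odds_ratio) - z * se)
    upper = math.exp(math.log(odds_ratio) + z * se)
    return odds_ratio, lower, upper

# Hypothetical counts: top quartile 60 cases / 30 controls; bottom (referent) quartile 30 / 60.
or_q4, ci_low, ci_high = odds_ratio_ci(60, 30, 30, 60)
print(f"OR {or_q4:.2f} ({ci_low:.2f}-{ci_high:.2f})")
```

Adjusted estimates such as those in the table would instead come from (conditional) logistic regression, which this crude calculation does not replicate.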
Figure (three panels): cases/controls ratio of lipid species by acyl chain carbon number and acyl chain double bonds
Triglycerides (Rhee EP et al. J Clin Invest 2011; 121:1402-11)
2-aminoadipate (Wang TJ et al. J Clin Invest 2013; 123:4309-17)
BCAA/AA
Metabolites associated with insulin resistance (HOMA-IR) in Gen 3 Framingham Heart Study participants
• 1000 participants
• Nontargeted HILIC-pos method (knowns + unknown peaks)
• ~5000 peaks were observed in >80% of individuals
• ~500 peaks associated with key metabolic traits
~200 peaks associated with hepatic fat (age and sex adjusted)
Figure panels: metabolites associated with HOMA-IR; unknowns associated with HOMA-IR
Robert Gerszten, John O'Sullivan, Jordan Morningstar
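Before testing peaks against traits, the analysis above restricts to ~5000 peaks observed in >80% of individuals. The sketch below shows that detection-frequency filter. It also computes a Bonferroni cutoff for the number of peaks tested. Treating missing or zero values as "not detected" and using Bonferroni are assumptions, since the slide does not state the criteria used.

```python
from typing import Dict, List, Optional

def frequently_detected(features: Dict[str, List[Optional[float]]],
                        min_fraction: float = 0.8) -> List[str]:
    """IDs of features detected (non-missing and > 0) in more than min_fraction of samples."""
    kept = []
    for feature_id, values in features.items():
        n_detected = sum(1 for v in values if v is not None and v > 0)
        if n_detected / len(values) > min_fraction:
            kept.append(feature_id)
    return kept

def bonferroni_threshold(n_tests: int, alpha: float = 0.05) -> float:
    """Per-test P-value cutoff that holds the family-wise error rate at alpha."""
    return alpha / n_tests

peaks = {
    "f1": [1.0, 2.0, 3.0, 4.0, 5.0],    # detected in 5/5 samples
    "f2": [1.0, None, None, 4.0, 5.0],  # 3/5
    "f3": [1.0, 2.0, 3.0, 4.0, 0.0],    # 4/5, not more than 80%
}
print(frequently_detected(peaks))  # ['f1']
print(bonferroni_threshold(5000))  # per-peak cutoff for ~5000 tests
```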
Phenotype  Covariates                                                 n    beta    p value
LPR        AGE1:sex                                                   464  -0.197  2.28E-24
LPR        AGE1:sex:bmi1                                              464  -0.175  4.81E-16
LPR        AGE1:sex:smoke1:alc1                                       463  -0.201  6.22E-25
LPR        AGE1:sex:smoke1:alc1:HDL1:log(tg1):gluc1:diab:HTN1         457  -0.186  1.49E-16
LPR        AGE1:sex:smoke1:alc1:HDL1:log(tg1):gluc1:diab:HTN1:bmi1    457  -0.174  1.71E-13
Cmpd #5836 (m/z 202.1185) is associated with hepatic fat in FHS
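Each row of the table reports one beta for a different covariate set. The sketch below shows how such a covariate-adjusted beta can be obtained by ordinary least squares with NumPy. It rests on assumptions: z-scoring the metabolite (so beta is per 1 SD) and plain OLS are not stated on the slide. `adjusted_beta` is a hypothetical helper, and the data are synthetic.

```python
import numpy as np

def adjusted_beta(outcome: np.ndarray, exposure: np.ndarray, covariates: np.ndarray) -> float:
    """OLS coefficient of the z-scored exposure in outcome ~ 1 + exposure_z + covariates."""
    exposure_z = (exposure - exposure.mean()) / exposure.std()
    design = np.column_stack([np.ones(len(outcome)), exposure_z, covariates])
    coef, *_ = np.linalg.lstsq(design, outcome, rcond=None)
    return float(coef[1])

# Synthetic, noise-free example in which the true per-SD effect is -0.2.
rng = np.random.default_rng(0)
n = 200
age = rng.normal(50, 10, n)
sex = rng.integers(0, 2, n)
metabolite = rng.normal(0, 1, n) + 0.05 * age  # correlated with age, like a real confounder
metabolite_z = (metabolite - metabolite.mean()) / metabolite.std()
lpr = 1.0 - 0.2 * metabolite_z + 0.01 * age + 0.05 * sex
print(round(adjusted_beta(lpr, metabolite, np.column_stack([age, sex])), 6))  # -0.2
```

Adding covariates (as in the lower rows of the table) means appending columns to the design matrix. The beta changes only to the extent that those covariates explain part of the metabolite-phenotype association.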
Unknown metabolite: m/z 202.1185
Phenotype: liver fat (CT scan), P = 6.22E-24
2-Hydroxy-3-methylbutyric acid
4-Hydroxyisovaleric acid
3-Hydroxyisovaleric acid
3-Hydroxyvaleric acid
4-Hydroxyvaleric acid
2-Hydroxy-2-methylbutyric acid
Diethyl carbonate
3-Hydroxy-2-methyl-[S-(R,R)]-butanoic acid
2-Hydroxyvaleric acid
L-Threonine
L-Allothreonine
Hydroxyethyl glycine
4-Amino-3-hydroxybutyrate
L-Homoserine
D-Alanyl-D-alanine
Alanyl-Alanine
4-Acetamido-2-aminobutanoic acid
1-Methylhistidine
3-Methylhistidine
Ethyl lactate
2-Methyl-3-hydroxybutyric acid
3-Hydroxy-2-methyl-[R-(R,S)]-butanoic acid
Erythronilic acid
2-Ethylhydracrylic acid
Gene: AGXT2 (alanine-glyoxylate aminotransferase 2), P = 3.79E-9
O’Sullivan, Morningstar et al. J Clin Invest 2017; 127:4394–4402
AGXT2: a multifunctional enzyme
Rodionov RN et al. Trends Pharmacol Sci 2014; 35:575-82
Theoretical m/z: 202.1185
DMGV: α-keto-δ-(NG,NG-dimethylguanidino)valeric acid
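Annotating an unknown starts by comparing its measured m/z with theoretical values for candidate structures. The sketch below computes the [M+H]+ m/z of an elemental formula from monoisotopic masses. It then computes the mass error in ppm. The neutral formula C8H15N3O3 for DMGV is an assumption derived from the compound name (ADMA with its α-amino group replaced by a keto group); the slide does not state it.

```python
# Monoisotopic masses (u) of the most abundant isotopes.
MONOISOTOPIC = {"C": 12.0, "H": 1.00782503207, "N": 14.0030740048, "O": 15.99491461956}
PROTON = 1.007276466  # proton mass (u); [M+H]+ = neutral mass + proton

def mh_plus(formula: dict) -> float:
    """Theoretical m/z of the singly protonated ion [M+H]+ for an elemental composition."""
    neutral = sum(MONOISOTOPIC[element] * count for element, count in formula.items())
    return neutral + PROTON

def ppm_error(observed: float, theoretical: float) -> float:
    """Mass error of an observed m/z relative to a theoretical m/z, in parts per million."""
    return (observed - theoretical) / theoretical * 1e6

dmgv = {"C": 8, "H": 15, "N": 3, "O": 3}  # assumed formula, derived from the compound name
theo = mh_plus(dmgv)
print(f"{theo:.4f}")                            # 202.1186
print(f"{ppm_error(202.1185, theo):.1f} ppm")  # -0.6 ppm
```

At four decimals this gives 202.1186, within about 0.6 ppm of the observed m/z 202.1185. Mass alone cannot separate isomers or adducts of other compounds, which is why the candidate list above is long.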
Plasma DMGV levels are elevated in biopsy-proven NASH and are modulated following weight loss surgery
Panels: plasma DMGV in a biopsy-proven NASH cohort; plasma DMGV in Roux-en-Y gastric bypass patients (n = 39); group sizes (n = 36) (n = 36)
O’Sullivan, Morningstar et al. J Clin Invest 2017; 127:4394–4402
DMGV predicts incident T2D in the Malmö Diet and Cancer Study (MDC) and the Jackson Heart Study (JHS)
                 Model 1: adjusted for age, sex        Model 2: adjusted for age, sex, BMI, glucose
                 MDC               JHS                 MDC               JHS
OR, quartile 1   1.0 (referent)    1.0 (referent)      1.0 (referent)    1.0 (referent)
OR, quartile 2   1.09 (0.58-2.04)  1.15 (0.6-2.1)      1.15 (0.61-2.19)  1.30 (0.8-2.2)
OR, quartile 3   1.49 (0.79-2.81)  1.87 (1.09-3.2)     1.59 (0.81-3.13)  1.51 (0.9-2.5)
OR, quartile 4   2.71 (1.37-5.38)  2.6 (1.54-4.39)     2.81 (1.34-5.87)  1.79 (1.0-3.04)
P for trend      0.003             0.00002             0.004             0.026
MDC: 196 incident cases, 126 controls; mean follow-up time = 12.8 years
JHS: 133 incident cases, 465 controls; mean follow-up time = 7.5 years
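Quartile analyses like the one above need cutpoints, and the slides do not say which distribution defined them. The sketch below assumes cutpoints taken from the controls, a common choice in case-control studies. It uses the standard library's `statistics.quantiles` and `bisect`. The metabolite values are hypothetical.

```python
import bisect
import statistics
from typing import List

def quartile_cutpoints(reference: List[float]) -> List[float]:
    """25th, 50th and 75th percentiles of a reference distribution (e.g. controls)."""
    return statistics.quantiles(reference, n=4)

def assign_quartile(value: float, cuts: List[float]) -> int:
    """1-based quartile; a value equal to a cutpoint falls into the lower quartile."""
    return bisect.bisect_left(cuts, value) + 1

controls = [1, 2, 3, 4, 5, 6, 7]   # hypothetical metabolite levels in controls
cuts = quartile_cutpoints(controls)
print(cuts)                                                 # [2.0, 4.0, 6.0]
print([assign_quartile(v, cuts) for v in [0.5, 2, 3.5, 5, 9]])  # [1, 1, 2, 3, 4]
```

Quartile 1 then serves as the referent category, and indicator variables for quartiles 2-4 enter the regression model.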
O’Sullivan, Morningstar et al. J Clin Invest 2017; 127:4394–4402
• metabolomics data are acquired in batches
- e.g., ~1000 samples per LC column
• the ability to serially concatenate data from batches as they become available enables linear scalability
• strategies required for:
- QC within batches
- standardizing data to compensate for drift in instrument sensitivity within a batch
- scaling data between batches
- matching unknowns between batches: “alignment”
• small variations in measured masses and retention times between batches complicate the matching of unknowns
Addressing projects of scale
Reference mixtures analyzed before and after to assure system performance
Internal standard(s) added in first step of sample extraction
- monitored during analyses
- may be used to standardize data
Pooled study sample: analyzed every 20 study samples
- used to standardize data across datasets
Second pooled reference sample, analyzed every 20 study samples
- used to assess: overall reproducibility & impact of standardization procedures
- we typically use the pooled study sample
QC & data standardization
“PREFB” used to monitor coefficients of variation for each metabolite during and across the run
LC-MS sample queue (up to ~1000 study samples per LC column):
. . . PREFA, PREFB, study samples (20), PREFA, PREFB, study samples (20), PREFA, PREFB, study samples (20), PREFA, PREFB . . .
“PREFA” used to remove temporal drift within batches and standardize data across batches using nearest-neighbor normalization
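The slide names nearest-neighbor normalization but does not spell it out. The sketch below is one plausible reading, offered as an assumption. Each study sample is scaled by the PREFA injection closest to it in run order. The result is then multiplied by the median of all PREFA values, so drift in sensitivity cancels while values stay on the reference scale. The run-log layout and function name are hypothetical, and the toy queue is much shorter than the 20-sample blocks used in practice.

```python
import statistics
from typing import List, Tuple

# Hypothetical run log for one metabolite: (injection_order, sample_type, peak_area)
Injection = Tuple[int, str, float]

def nearest_neighbor_normalize(run: List[Injection],
                               reference_type: str = "PREFA") -> List[Tuple[int, float]]:
    """Scale each study sample by its nearest reference injection, then by the reference median."""
    refs = [(order, area) for order, kind, area in run if kind == reference_type]
    if not refs:
        raise ValueError("no reference injections in this batch")
    ref_median = statistics.median([area for _, area in refs])
    normalized = []
    for order, kind, area in run:
        if kind != "study":
            continue  # PREFA/PREFB injections are not themselves reported
        # Nearest reference injection in run order; ties go to the earlier one.
        _, ref_area = min(refs, key=lambda r: (abs(r[0] - order), r[0]))
        normalized.append((order, area / ref_area * ref_median))
    return normalized

run = [
    (0, "PREFA", 100.0), (1, "PREFB", 98.0),
    (2, "study", 50.0), (3, "study", 60.0),
    (4, "PREFA", 80.0), (5, "study", 40.0),
]
print(nearest_neighbor_normalize(run))  # [(2, 45.0), (3, 67.5), (5, 45.0)]
```

To put several batches on a common scale, `ref_median` could be taken over the PREFA injections of all batches rather than one batch. That choice is also an assumption. PREFB is left untouched here so it can independently check how well the standardization worked.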
Metabolomics workflow enabled by custom software tools
Cloud-based web server
Standardization app
• standardizes data to internal standards and/or pooled reference samples
• data visualization
• one batch at a time or multiple aligned batches
Alignment app
• aligns features between the current batch and the net alignment of previous batches
• uses nonparametric regression based on unambiguously matched peaks to adjust m/z and RT
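The alignment idea can be sketched in two steps. First, keep only peaks that match one-to-one between a new batch and the reference within m/z and RT windows. Second, fit a nonparametric regression of the RT offset on RT and use it to correct every peak in the new batch. The k-nearest-neighbour mean used here is a simple stand-in for a smoother such as LOESS. The tolerances and data layout are also assumptions, not the app's actual settings. Only RT is corrected; m/z could be treated the same way using ppm offsets.

```python
from typing import List, Tuple

Peak = Tuple[float, float]  # (m/z, retention time in min)

def candidates(peak: Peak, others: List[Peak], ppm: float, rt_window: float) -> List[int]:
    """Indices of peaks in `others` within ppm (m/z) and rt_window (min) of `peak`."""
    mz, rt = peak
    return [i for i, (mz2, rt2) in enumerate(others)
            if abs(mz2 - mz) <= mz * ppm * 1e-6 and abs(rt2 - rt) <= rt_window]

def unambiguous_matches(batch: List[Peak], reference: List[Peak],
                        ppm: float = 10.0, rt_window: float = 0.5) -> List[Tuple[int, int]]:
    """Pairs (i, j) where batch[i] and reference[j] are each other's only candidate."""
    pairs = []
    for i, peak in enumerate(batch):
        hits = candidates(peak, reference, ppm, rt_window)
        if len(hits) == 1 and candidates(reference[hits[0]], batch, ppm, rt_window) == [i]:
            pairs.append((i, hits[0]))
    return pairs

def knn_smooth(x: float, xs: List[float], ys: List[float], k: int) -> float:
    """Mean of ys at the k xs closest to x: a simple nonparametric regression."""
    nearest = sorted(range(len(xs)), key=lambda i: abs(xs[i] - x))[:k]
    return sum(ys[i] for i in nearest) / len(nearest)

def correct_rt(batch: List[Peak], reference: List[Peak], k: int = 5) -> List[Peak]:
    """Shift each batch RT by the smoothed RT offset learned from unambiguous matches."""
    pairs = unambiguous_matches(batch, reference)
    if not pairs:
        return list(batch)
    xs = [batch[i][1] for i, _ in pairs]
    offsets = [reference[j][1] - batch[i][1] for i, j in pairs]
    return [(mz, rt + knn_smooth(rt, xs, offsets, k)) for mz, rt in batch]

reference = [(100.0, 1.0), (200.0, 2.0), (300.0, 3.0)]
batch = [(100.0005, 1.2), (200.001, 2.2), (300.0015, 3.2)]  # new batch elutes 0.2 min late
aligned = correct_rt(batch, reference, k=3)
print([round(rt, 3) for _, rt in aligned])  # [1.0, 2.0, 3.0]
```

Requiring one-to-one matches keeps ambiguous peaks (two candidates in the window) from distorting the fitted drift curve.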
Risk of incident breast & colon cancer: samples from 8000 study participants using one nontargeted method
• NHS: 3800 samples, 4 columns
• NHS2: 2600 samples, 3 columns
• NHS/HPFS: 1600 samples, 2 columns
Study      Column   Knowns   Unknowns
NHS        1        204      11055
NHS        2        205      11447
NHS        3        204      11412
NHS        4        204      11006
NHS2       1        172      8958
NHS2       2        172      10664
NHS2       3        172      13163
NHS/HPFS   1        199      10619
NHS/HPFS   2        199      9364
Heather Eliassen, Walter Willett, Meir Stampfer, Fran Grodstein
Figure: coefficient of variation (%) vs. peak area (log scale), RAW vs. STANDARDIZED, for NHS (3800 samples, 4 columns), NHS2 (2600 samples, 3 columns), and NHS/HPFS (1600 samples, 2 columns)
CVs               Raw     Stand.
Knowns (199)      24.4%   3.8%
Unknowns (4141)   56.8%   17.0%
CVs               Raw     Stand.
Knowns (249)      21.8%   4.0%
Unknowns (4763)   46.9%   15.9%
CVs               Raw     Stand.
Knowns (199)      14.6%   3.9%
Unknowns (6084)   39.8%   12.2%
Pooled reference samples: fully aligned NHS breast cancer, NHS2 breast cancer, & NHS/HPFS colon cancer
Figure: coefficient of variation (%) vs. peak area
                  Median CV   CV < 10%       CV < 20%
Knowns (212)      3.9%        187 (88.2%)    205 (96.7%)
Unknowns (2816)   12.2%       1201 (42.6%)   1816 (64.5%)
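The summary statistics in this table are the median CV and the counts below 10% and 20% CV. The sketch below computes them from repeated injections of a pooled reference sample. The input layout is hypothetical, and CV is taken as the sample SD divided by the mean.

```python
import statistics
from typing import Dict, List

def cv_percent(values: List[float]) -> float:
    """Coefficient of variation (%) = sample standard deviation / mean x 100."""
    return statistics.stdev(values) / statistics.mean(values) * 100

def cv_summary(features: Dict[str, List[float]]) -> Dict[str, float]:
    """Median CV plus how many features (and what percentage) fall below 10% and 20% CV."""
    cvs = [cv_percent(v) for v in features.values()]
    n_lt_10 = sum(cv < 10 for cv in cvs)
    n_lt_20 = sum(cv < 20 for cv in cvs)
    return {
        "median_cv": statistics.median(cvs),
        "n_cv_lt_10": n_lt_10, "pct_cv_lt_10": 100 * n_lt_10 / len(cvs),
        "n_cv_lt_20": n_lt_20, "pct_cv_lt_20": 100 * n_lt_20 / len(cvs),
    }

# Hypothetical peak areas of three features across four pooled-reference injections.
pooled_ref = {
    "a": [100, 100, 100, 100],
    "b": [90, 110, 90, 110],
    "c": [70, 130, 70, 130],
}
summary = cv_summary(pooled_ref)
print(round(summary["median_cv"], 1), summary["n_cv_lt_10"], summary["n_cv_lt_20"])  # 11.5 1 2
```

Running the same summary on raw and on standardized values gives a before/after comparison like the RAW vs. STANDARDIZED CVs reported above.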
• working with smaller cohorts, we have been able to identify metabolic profiles that predict incident disease
• effect sizes among metabolic predictors tend to be small (e.g., a 10% difference in means), and complete analysis of the UK Biobank promises to provide unprecedented statistical power
• scalable metabolomics workflows capable of analyzing thousands of plasma samples are presently available
• the modular nature of LC-MS-based metabolomics platforms enables an à la carte approach to method selection
• nontargeted metabolite profiling methods that measure both metabolites of known identity and unknowns enable serendipitous discoveries
Concluding remarks
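A standard normal-approximation sample-size formula shows why small effects call for large cohorts. It gives the number needed per group to detect a 10% difference in means with 80% power. The between-person CV of 30% is an assumption chosen for illustration, as is the Bonferroni correction for ~5000 peaks in the second call.

```python
import math
from statistics import NormalDist

def n_per_group(rel_diff: float, cv: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate n per group for a two-sided, two-sample comparison of means.

    rel_diff: difference in means as a fraction of the mean (0.10 = 10%)
    cv: between-person coefficient of variation (SD / mean), assumed equal in both groups
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    d = rel_diff / cv  # standardized effect size
    return math.ceil(2 * (z_alpha + z_beta) ** 2 / d ** 2)

print(n_per_group(0.10, 0.30))                     # 142 for a single metabolite
print(n_per_group(0.10, 0.30, alpha=0.05 / 5000))  # several hundred after Bonferroni for ~5000 peaks
```

Under these assumptions, testing thousands of peaks raises the requirement from about 140 to several hundred per group. Larger between-person variability or smaller effects push it higher, which is where biobank-scale sample sizes help.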