ORIGINAL ARTICLE
Multivariate classification analysis of metabolomic data for candidate biomarker discovery in type 2 diabetes mellitus
Yang Qiu · Dilip Rajagopalan · Susan C. Connor · Doris Damian · Lei Zhu · Amir Handzel · Guanghui Hu · Arshad Amanullah · Steve Bao · Nathaniel Woody · David MacLean · Kwan Lee · Dana Vanderwall · Terence Ryan
Received: 12 June 2008 / Accepted: 25 July 2008 / Published online: 14 August 2008
© Springer Science+Business Media, LLC 2008
Abstract Recent advances in genomics, metabolomics
and proteomics have made it possible to interrogate disease
pathophysiology and drug response on a systems level. The
analysis and interpretation of the complex data obtained
using these techniques is potentially fertile but equally
challenging. We conducted a small clinical trial to explore
the application of metabolomics data in candidate bio-
marker discovery. Specifically, serum and urine samples
from patients with type 2 diabetes mellitus (T2DM) were
profiled on metabolomics platforms before and after
8 weeks of treatment with one of three commonly used oral
antidiabetic agents: the sulfonylurea glyburide, the biguanide
metformin, or the thiazolidinedione rosiglitazone.
Multivariate classification techniques were used to detect
serum or urine analytes, obtained at baseline (pre-treatment),
that could predict a significant treatment response
after 8 weeks. Using this approach, we identified three
analytes, measured at baseline, that were associated with
response to a thiazolidinedione after 8 weeks of treatment.
Although larger and longer-term studies are required to
validate any of the candidate biomarkers, pharmacometa-
bolomic profiling, in combination with multivariate
classification, is worthy of further exploration as an adjunct
to clinical decision making regarding treatment selection
and for patient stratification within clinical trials.
Keywords Classification · Biomarker · Metabolomics · Pharmacometabolomics · Metabonomics · NMR
1 Introduction
Among the goals of systems biology is achieving a broad
or ‘systematic’ view of biological changes in a cell or
organism as a function of some perturbation. This can be
assessed by measuring changes in levels of genes, tran-
scripts, proteins or metabolites and mining these changes
using intensive multivariate statistics and pattern analyses.
The complex nature of the experimental data and computational
results also has the potential to more robustly
characterize inter-individual relationships between genetic
Electronic supplementary material The online version of this article (doi:10.1007/s11306-008-0123-5) contains supplementary material, which is available to authorized users.
Y. Qiu (✉) · D. Rajagopalan · G. Hu · N. Woody · D. Vanderwall
Department of Informatics, GlaxoSmithKline, Five Moore
Drive, Research Triangle Park, NC 27709, USA
e-mail: [email protected]
S. C. Connor
Department of Investigative Preclinical Toxicology,
GlaxoSmithKline, Five Moore Drive, Research Triangle Park,
NC 27709, USA
e-mail: [email protected]
D. Damian · A. Handzel
BG Medicine Inc, Waltham, MA 02451, USA
L. Zhu
Department of Statistical Sciences, GlaxoSmithKline, Five
Moore Drive, Research Triangle Park, NC 27709, USA
A. Amanullah · D. MacLean · T. Ryan
High Throughput Biology, GlaxoSmithKline, Five Moore Drive,
Research Triangle Park, NC 27709, USA
S. Bao
Molecular Discovery IT, GlaxoSmithKline, Five Moore Drive,
Research Triangle Park, NC 27709, USA
K. Lee
Department of Biomedical Data Sciences, GlaxoSmithKline,
Five Moore Drive, Research Triangle Park, NC 27709, USA
Metabolomics (2008) 4:337–346
DOI 10.1007/s11306-008-0123-5
variations, and the mechanisms underlying these differ-
ences. Similarly, a broad array of measurements might
provide greater prognostic ability regarding experimental
outcomes as compared to a single biomarker. In drug dis-
covery, these studies can be used to identify candidate
biomarkers (or a fingerprint) for a disease, drug efficacy or
toxicity. Type 2 diabetes mellitus (T2DM) represents an
interesting case study as it is a multi-factorial disease state
with considerable inter-individual heterogeneity.
T2DM is a complex disturbance of physiologic mecha-
nisms affecting many metabolic homeostatic processes,
including energy and lipid metabolism, inflammation,
clotting and vascular endothelial functions (Bastard et al.
2006; Laakso 2002; Ziegler 2005). These disturbances
arise from reduced insulin action in peripheral tissues
predominantly from a resistance to circulating insulin,
together with impaired pancreatic insulin secretion (Kil-
patrick 1997). Given the causal relationship between
hyperglycemia and diabetic complications, measures of
glycemia, such as fasting plasma glucose (FPG), glycos-
ylated hemoglobin (HbA1c), or, less commonly,
fructosamine, are typically used to monitor disease pro-
gression and treatment efficacy. However, these measures
generally do not discriminate between the various patho-
physiological phenotypes of diabetes (Ostenson 2001;
Petersen and McGuire 2005). Understanding the patho-
physiologic profile may better inform us of biologic
mechanisms and therapeutic efficacy for particular phar-
macologic agents.
Various technologies have been applied in recent years
to develop predictive biomarkers. For example, gene
expression profiling and proteomics have been used to
predict the clinical outcome of cancer treatments (‘t Veer
et al. 2002; Ma et al. 2004; Meyerson and Carbone 2005;
Petricoin et al. 2002; Raponi et al. 2006). However, our
analysis focuses on measured metabolites in the context of
a larger systems biology study for two reasons. First, the
analysis of metabolomic data and the subsequent biological
contextualization from a systems biology perspective is an
area still under very active development. A survey of
diverse analysis techniques, with proven utility in other
areas, could help establish what might be expected in
future studies pursuing similar goals. Second, the long-term
goal of many similar studies will be to identify biomarkers
that, once validated, could be useful in clinical or diag-
nostic settings. As such, the analysis was undertaken to
explore the patterns and level of predictive modeling that
could be realized from biomarkers readily available with-
out biopsy, which poses a significantly greater burden than
serum or urine sample collection.
Commonly used oral antidiabetic agents were used in
our study, including the sulfonylurea glyburide, the
biguanide metformin, and the thiazolidinedione
rosiglitazone, representing a broad range of mechanisms of
action (MOA) (Ahmann and Riddle 2002; Bastard et al.
2006). Our clinical trial was relatively small-scale (75 male
subjects with T2DM) and short-term (8 weeks of treat-
ment). The intent was a hypothesis-generating activity to
investigate the applicability of systems biology in a drug
discovery context. Serum and urine samples were obtained
at pre-treatment baseline, and after 8 weeks of treatment
with one of the following: placebo, rosiglitazone, metfor-
min or glyburide. High information content nuclear
magnetic resonance (NMR) and liquid chromatography/
mass spectrometry (LC/MS)-based metabolomic platforms,
including polar metabolite and lipid profiling, were used to
profile the samples. We used a variety of multivariate
analysis techniques to determine whether polar low
molecular weight metabolites, lipids, or fatty acids, analyzed
in readily accessible fluids, can be used to predict
drug responder status at week 8 based on their measure-
ment at baseline.
2 Materials and methods
2.1 Experimental design
Male subjects aged 30–70 years with a documented history
of stable T2DM for no more than 10 years duration were
eligible if they had been previously treated with diet and
exercise alone, monotherapy or low-dose combination
therapy. Fasting plasma glucose at screening could not
exceed 225 mg/dl for subjects treated with diet and exer-
cise alone or 180 mg/dl for subjects receiving monotherapy
or low-dose combination therapy. HbA1c was required to
be within 5.7–10.0% with the following conditions: subjects
with HbA1c between 5.7% and 9% must have been
diabetic for less than 5 years, treated with monotherapy or low-dose
combination therapy, and have an FPG of 125–180
mg/dl; subjects with HbA1c between 9.1% and 10.0%
must not have been treated with combination therapy. In
addition, body mass index must have been within the
range of 25–37.5 kg/m², for subjects aged 30–55 years, or
25–35.0 kg/m² for subjects aged 56–70 years. Use of
insulin for greater than 7 days during the 6 months prior to
screening was prohibited and use of the following medi-
cations within 1 month prior to screening that may affect
response of experimental drugs was also prohibited: thia-
zolidinediones, high dose HMG-CoA reductase inhibitors
(statins), and high dose cholesterol absorption inhibitors.
Eligible subjects entered the treatment phase after a 5-week
washout period and were randomly assigned to one of four
single-blind treatment groups: 19 to placebo, 22 to rosig-
litazone, 21 to metformin, and 21 to glyburide (Of the 83
subjects that went into the trial, we were able to obtain the
metabolomics data for 75 of them). All subjects were
blinded to study medication (single-blinded). Based on
glucose levels, doses of glyburide (total dose 5–15 mg) and
metformin (total dose 500–1,500 mg) were single-blind
titrated upwards at weeks 2 and 4, and rosiglitazone was
titrated from 2 mg twice daily to 4 mg twice daily at week
4 only. Blood and urine samples were collected prior to and
at 4 and 8 weeks following initiation of treatment. The
baseline (week 0) clinical and biochemical characteristics
of participants are shown in Supplement Table 1, along
with normal characteristics of the general male population.
2.2 Data generation
Serum and urine samples were analyzed using various met-
abolomic platforms and with traditional serum biomarker
(“non-omic”) measurements. Both urine and serum were
measured by NMR-based metabolic profiling. Serum sam-
ples were also analyzed by LC/MS for polar metabolites and
lipids, and GC-flame ionization for fatty acids (lipidomics).
Analysis of clinical chemistry, serum and plasma protein
biomarkers, and physiological parameters such as body
weights were also included in the data set. In total, there were
over 3,000 variables included in the analysis: 98 analytes
from clinical chemistry, 303 fatty acids from GC-flame
ionization, 467 lipids from LC/MS, 921 LC/MS polar
metabolite peaks, 314 NMR serum metabolite peaks, and
1,006 urine NMR metabolite peaks, which include both 0 h
and 6 h measurements (urine samples were collected at both
0 h and 6 h). Details of the metabolomics platform
data acquisition and signal processing are given in
Supplementary Method A.
2.3 Data pre-treatment
The first step in data pretreatment was to handle missing
values, since several multivariate classification methods
do not allow missing values. Metabolic analytes with too
many missing values were eliminated. Up to 25% missing
values in either class were allowed for serum NMR data,
up to 20% missing allowed for non-omic analytes, and up
to 15% missing data was allowed for the remaining
platforms. For subjects in the training set, missing values
were set to the median value of non-missing training
subjects in the same class. For subjects in the holdout set,
missing values were set to the median value of all non-
missing subjects. After data preprocessing, there were
about 1,500 analytes remaining for use in classification.
The final step was location and scale transformation
performed across all samples in the analysis to ensure the
samples were from the same distribution and comparable
to each other.
2.4 Definition of treatment response (“responder”)
using a composite score of glycemic-lowering
efficacy.
Originally, efficacy response was defined as a FPG
decrease of greater than 30 mg/dl. However, glucose is
highly variable and influenced by short-term changes in
diet, activity or stress, whereas integrated measures of
glycemic response can estimate whether a patient’s average
glucose has changed over time (weeks to months) in
response to treatment (Tahara and Shima 1995). Fructos-
amine, whose half-life is determined by that of albumin,
provides a measure of integrated glucose over a period of
2–4 weeks (Hom et al. 1998). HbA1c, a form of glycos-
ylated hemoglobin, is the gold standard measure of
integrated glucose over a 6–12 week period (Howey et al.
1989; Picardi and Pozzilli 2003). These three measures of
glycemic efficacy—FPG, fructosamine and HbA1c—were
used to determine if they could more reliably predict
responder status when used in combination.
Since our analysis used an 8-week study, which is less
than the 12 weeks generally required to reach full glycemic
efficacy with PPAR-γ agonists (rosiglitazone), it was necessary
to derive a surrogate measure of efficacy that reflects
a developing response trend. This composite measure of
efficacy was derived solely for its use in the modeling in
this study, and has not been tested or validated in a general
context. Any candidate biomarkers discovered on the basis
of this composite efficacy measure must be tested and
validated in larger and longer duration trials that use con-
ventional, well-accepted definitions of glycemic response
before they are used to make treatment decisions.
Combined data from three larger clinical trials (GSK trials
49653_011, 49653_020, 49653_024, http://ctr.gsk.co.
uk/welcome.asp) were used to model changes in FPG,
fructosamine and HbA1c at 8 weeks versus measured
changes in HbA1c (the accepted gold standard) at 17 weeks.
The goal was to establish an efficacy measure and responder
criterion at 8 weeks that matches the 17-week “truth”. Many
composite scoring rules were able, with 8 week data, to
outperform observed change in any single measure in pre-
dicting the HbA1c change at 17 weeks. A rule was chosen
from within the mathematical ‘space’ of choices that was
relatively simple and reflected the perceived relative value of
the glycemic markers as discussed above: 1(%ΔFPG) +
2(%ΔFructosamine) + 1.5(%ΔHbA1c) = response. Thus,
if the composite % reduction is greater than 30% using this
formula, the subject is classified as a ‘responder’. Using the
composite score definition, the fraction of subjects
responding in this 8-week trial is shown in Supplementary
Table 2 and ranged between 43% and 60% for the three
treatments.
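As a worked illustration of the composite rule, the sketch below computes the weighted score for two hypothetical subjects. The weights (1, 2, 1.5) and the 30% threshold come from the text; the function names, the definition of percent reduction as 100 × (baseline − week 8) / baseline, and the sign convention (a reduction counts as positive) are assumptions made for this example.

```python
def percent_reduction(baseline, week8):
    """Assumed definition: percent decrease from baseline (positive = reduction)."""
    return 100.0 * (baseline - week8) / baseline

def composite_score(fpg, fructosamine, hba1c):
    """Each argument is a (baseline, week 8) pair; weights are those given in the text."""
    return (1.0 * percent_reduction(*fpg)
            + 2.0 * percent_reduction(*fructosamine)
            + 1.5 * percent_reduction(*hba1c))

def is_responder(score, threshold=30.0):
    """A subject is a responder if the composite reduction exceeds the threshold."""
    return score > threshold

# Hypothetical subject A: 10% FPG, 5% fructosamine and 5% HbA1c reductions.
score_a = composite_score((160.0, 144.0), (300.0, 285.0), (8.0, 7.6))
# Hypothetical subject B: 20% FPG, 10% fructosamine and 5% HbA1c reductions.
score_b = composite_score((160.0, 128.0), (300.0, 270.0), (8.0, 7.6))
```

Under these assumptions subject A scores 10 + 2(5) + 1.5(5) = 27.5 and is a non-responder, while subject B scores 20 + 2(10) + 1.5(5) = 47.5 and is a responder.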
2.5 Multivariate classification methods
Analysis of large volumes of data with a high number of
variables (dimensions) poses a challenge for data classifi-
cation. The four classification methods we used were
Random Forest (RF), Prediction Analysis for Microarray
(PAM), Partial Least Squares-Discriminant Analysis (PLS-DA),
and T-test/Majority Vote (T-test). RF is a decision
tree-based classifier using an algorithm originally devel-
oped by Leo Breiman (Breiman 2001). It grows many
classification trees (a forest) and the forest chooses the
classification of a sample by choosing the class that has the
most votes across all trees. PAM is a centroid classifier
proposed by Narasimhan, which computes a standardized
centroid for each class and predicts the class of a new
sample based on its distance to the class centroids
(Tibshirani et al. 2002). PLS regression is a dimension
reduction method that finds components of independent
variable space that are relevant to the outcome space
(Hellberg et al. 1986). The T-test classifier is a simple, majority
vote-based classifier that uses a t-test for feature selection.
A threshold value is then calculated for each selected feature
as the midpoint between
the two class means. Each analyte can then
be used independently to classify a sample, depending on
which side of the threshold the analyte value for that
sample lies and the final class is determined by majority
vote. A more detailed description of each method is
given in a supplement (Supplementary Method B).
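The T-test/Majority Vote classifier lends itself to a short sketch. The version below is an illustration under stated assumptions, not the authors' code: it ranks features by the absolute Welch t-statistic (the text does not say which t-test variant was used), places each threshold midway between the two class means, and breaks voting ties in favour of the non-responder class. All function names and the toy data are hypothetical.

```python
from math import sqrt
from statistics import mean, variance

def t_statistic(a, b):
    """Welch t-statistic for two samples (assumed variant)."""
    return (mean(a) - mean(b)) / sqrt(variance(a) / len(a) + variance(b) / len(b))

def fit(X, y, n_features):
    """Select the n_features analytes with the largest |t| and store one rule each.

    A rule is (feature index, threshold, True if responders have the higher mean).
    """
    ranked = []
    for j in range(len(X[0])):
        r = [row[j] for row, lab in zip(X, y) if lab == "R"]
        nr = [row[j] for row, lab in zip(X, y) if lab == "NR"]
        threshold = (mean(r) + mean(nr)) / 2.0
        ranked.append((abs(t_statistic(r, nr)), j, threshold, mean(r) > mean(nr)))
    ranked.sort(reverse=True)
    return [(j, thr, r_high) for _, j, thr, r_high in ranked[:n_features]]

def predict(rules, sample):
    """Each rule votes independently; a strict majority of 'R' votes is required."""
    votes_r = sum((sample[j] > thr) == r_high for j, thr, r_high in rules)
    return "R" if 2 * votes_r > len(rules) else "NR"

# Toy data: only the first analyte separates the two classes.
X = [[5.0, 1.0, 3.0], [6.0, 2.0, 1.0], [7.0, 1.5, 2.0],
     [1.0, 1.2, 2.5], [2.0, 1.8, 1.5], [3.0, 1.4, 3.5]]
y = ["R", "R", "R", "NR", "NR", "NR"]
rules = fit(X, y, n_features=1)
```

Here the single selected rule uses the first analyte with a threshold of 4.0, so a new sample with a value above 4.0 for that analyte is predicted to be a responder.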
2.6 Building the classifier and model validation
Data overfitting is a known issue in data mining where
the number of variables greatly exceeds the number of
observations. In order to ensure that the classifier has not
overfitted the data, proper data validation procedures
should be adopted (Radmacher et al. 2002). A standard
procedure was used for all four classification methods.
The samples to be classified were randomly divided into a
training set and a holdout set. The training samples were
used to determine parameters for each classifier such as
the optimal number of analytes to maximize accuracy
(based on the percentage of samples correctly classified)
using a cross-validation procedure. A four-fold cross-
validation (CV) was used in this analysis, where the
training samples were randomly divided into four CV
groups that were as class balanced as possible. In the CV
procedure, numerous combinations of free parameters of
each classifier were selected to span the parameter space;
the classifiers were built using 3 out of the 4 CV groups
and the resulting models were used to make class pre-
dictions on the samples in the 4th group. The particular
combination of parameters that maximized accuracy over
the entire parameter space was selected as the optimal
parameter set. The accuracy associated with this optimal
parameter set is known as the CV accuracy. Once the
optimal parameter set was determined, the entire set of
four groups (all the training samples) were used to rebuild
the classifier and make class predictions on the holdout
set to obtain the holdout accuracy. For individual drug
fingerprints, the total number of samples available was
only around 20; in these cases, division into training and
holdout sets was not performed. All the samples were
used in CV mode.
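The cross-validation and parameter-tuning loop can be sketched as follows. This is a simplified illustration: the helper names are hypothetical, and the toy single-feature classifier (whose tunable "parameter" is simply which feature to use) stands in for the four classifiers and their real parameter grids.

```python
import random

def stratified_folds(labels, k=4, seed=0):
    """Deal shuffled sample indices of each class round-robin into k folds."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    position = 0
    for cls in sorted(set(labels)):
        idx = [i for i, lab in enumerate(labels) if lab == cls]
        rng.shuffle(idx)
        for i in idx:
            folds[position % k].append(i)
            position += 1
    return folds

def cv_accuracy(X, y, fit, predict, param, k=4, seed=0):
    """Train on k-1 folds, predict the left-out fold, return the fraction correct."""
    correct = 0
    for test_idx in stratified_folds(y, k, seed):
        held = set(test_idx)
        train_X = [x for i, x in enumerate(X) if i not in held]
        train_y = [lab for i, lab in enumerate(y) if i not in held]
        model = fit(train_X, train_y, param)
        correct += sum(predict(model, X[i]) == y[i] for i in test_idx)
    return correct / len(y)

def tune(X, y, fit, predict, param_grid, k=4, seed=0):
    """Return the parameter with the highest CV accuracy, and that accuracy."""
    scores = {p: cv_accuracy(X, y, fit, predict, p, k, seed) for p in param_grid}
    best = max(scores, key=scores.get)
    return best, scores[best]

def toy_fit(X, y, feature):
    """Toy stand-in classifier: midpoint threshold on one chosen feature."""
    r = [x[feature] for x, lab in zip(X, y) if lab == "R"]
    nr = [x[feature] for x, lab in zip(X, y) if lab == "NR"]
    mean_r, mean_nr = sum(r) / len(r), sum(nr) / len(nr)
    return feature, (mean_r + mean_nr) / 2.0, mean_r > mean_nr

def toy_predict(model, sample):
    feature, threshold, r_high = model
    return "R" if (sample[feature] > threshold) == r_high else "NR"

X = [[10.0, 1.0], [11.0, 5.0], [12.0, 2.0], [13.0, 6.0],
     [1.0, 1.5], [2.0, 5.5], [3.0, 2.5], [4.0, 6.5]]
y = ["R"] * 4 + ["NR"] * 4
best_param, best_accuracy = tune(X, y, toy_fit, toy_predict, param_grid=[0, 1])
```

After tuning, the classifier would be rebuilt on all training samples with `best_param` and applied once to the holdout set, which plays no part in the loop above.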
To assess the significance of the CV results, a permu-
tation strategy was adopted. The four-fold CV step was
repeated using randomly permuted class labels between
100 and 1,000 times depending on the method. Due to the
small sample size (n = 20–60) and small class number (2
classes), the classifier was considered significant if the
percentage of permutation runs with better CV accuracy
than the un-permuted case was on the order of 10% or less
(P < 0.1).
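The permutation strategy amounts to re-scoring the classifier on shuffled class labels and counting how often the shuffled score beats the original. The sketch below assumes a caller-supplied `score_fn` that takes a label list and returns an accuracy; in the study this would be the full four-fold CV procedure, whereas the toy scorer here uses resubstitution accuracy on a single feature purely to keep the example short. All names and data are hypothetical.

```python
import random

def permutation_fraction(score_fn, labels, n_runs=100, seed=0):
    """Fraction of label permutations whose score is strictly better than the original."""
    rng = random.Random(seed)
    observed = score_fn(labels)
    better = 0
    for _ in range(n_runs):
        shuffled = labels[:]
        rng.shuffle(shuffled)  # class sizes are preserved, only the assignment changes
        if score_fn(shuffled) > observed:
            better += 1
    return better / n_runs

# Toy scorer: refit a midpoint threshold on one feature and report its accuracy.
feature_values = [10.0, 11.0, 12.0, 13.0, 1.0, 2.0, 3.0, 4.0]

def toy_score(labels):
    r = [v for v, lab in zip(feature_values, labels) if lab == "R"]
    nr = [v for v, lab in zip(feature_values, labels) if lab == "NR"]
    mean_r, mean_nr = sum(r) / len(r), sum(nr) / len(nr)
    threshold, r_high = (mean_r + mean_nr) / 2.0, mean_r > mean_nr
    predicted = ["R" if (v > threshold) == r_high else "NR" for v in feature_values]
    return sum(p == lab for p, lab in zip(predicted, labels)) / len(labels)

true_labels = ["R"] * 4 + ["NR"] * 4
fraction = permutation_fraction(toy_score, true_labels, n_runs=100)
significant = fraction < 0.1  # the criterion used in the text
```

Because the toy feature separates the true classes perfectly, no permutation can score higher, so the fraction is zero and the toy classifier meets the P < 0.1 criterion.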
3 Results and discussions
Application of multivariate classification methods to data
obtained from metabolomics, transcriptomics or proteo-
mics is an active area of research (‘t Veer et al. 2002; Fu
and Fu-Liu 2004; Kapetanovic et al. 2004; Li et al. 2004).
At present, evidence is lacking regarding the advantages of
one classification method over another. Therefore, our
approach was to apply four representative classification
techniques in parallel for every question of interest and to
compare results obtained from different methods. The
methods included both linear and non-linear classification
in original space or transformed space. The models were
used to classify subjects into two groups: those who
respond to treatment, responders (R), and those who do not,
nonresponders (NR). The workflow was kept constant
across all four methods (Fig. 1).
For classification using metabolomic data, serum mea-
surements of conventional glycemic markers (FPG,
fructosamine and HbA1c) were excluded from the com-
bined dataset. All NMR resonances from serum and urine
corresponding to the alpha and beta anomers of glucose
were also excluded, i.e., 3.215–3.26, 3.377–3.6, 3.68–3.926 and
4.6–5.26 ppm for serum and 5.2–5.3, 4.56–4.7, 3.68–3.92 and
3.22–3.56 ppm for urine. The differing values in the two biofluids
reflect the differing bucketing methods used to
derive the features: classical bucketing (0.067 ppm width) for serum
and intelligent bucketing (0.02 ppm ± 50% width) for
urine. The rationale for exclusion
was to identify analytes other than the conventional
glycemic markers.
3.1 Cross-drug fingerprint
Our goal was to identify a set of analytes that can predict
8-week patient response to oral antidiabetic agents with
diverse MOA. If a classifier could successfully predict
treatment response (R vs. NR) from three diverse mech-
anisms, it could be potentially useful to predict response
of a new drug with a different MOA. Classification
analysis was applied to data from 60 subjects who were
treated with one of the three study drugs. The samples
were divided into 46 subjects in the training group and 14
in the holdout group. Both treatment type and class were
properly balanced in the training and holdout groups.
Results from each of the classification methods are
summarized in Table 1A.
The cross-validation (CV) accuracy across four classi-
fication methods ranged from 59% to 74%. The
permutation procedure indicated that when cross validation
was repeated with randomized class labels, no more than
9% of the permutation runs (the maximum, observed for the
Prediction Analysis for Microarray (PAM) classifier
(Tibshirani et al. 2002)) had a CV accuracy
better than the original CV accuracy; in other words, the
permutation P value ranged from 0.09 to less than 0.01
depending on the method. The number of analytes used by
each classifier ranged from 5 to 190. Models were validated
by predicting the responder status of 14 subjects in a
holdout group and the accuracy ranged from 50% to 71%.
In particular, the T-test/Majority Vote (T-test) classifier
using 75 analytes gave the best holdout prediction of 71%
accuracy. We noted the prediction accuracies were less for
glyburide than for metformin and rosiglitazone. It is evi-
dent from the principal component analysis (PCA) plot
shown in Fig. 2b that the 13 analytes (Supplementary
Table 3) picked by all four models have the ability to
discriminate non-responders (open circles) from responders
(solid circles), while using all 1,735 analytes did not sep-
arate the two groups (Fig. 2a). The means and standard
deviations for the 13 discriminating cross-drug analytes for
responders and non-responders at baseline and week 8 are
shown in Supplementary Table 4.
Since T2DM is a disease with established biomarkers of
disease severity and therapeutic efficacy, it is important to
establish whether classification using metabolomic plat-
forms offers any advantage relative to the conventional
glycemic biomarkers. Results for prediction of treatment
response using only the three conventional markers at
baseline (FPG, fructosamine, HbA1c) indicated that none
of the classifiers yielded a statistically significant model (data
not shown), suggesting that additional data which more
comprehensively represent the underlying biology, such as
those acquired using metabolomics, are needed to predict
treatment response. A PCA plot using those three markers
also showed inter-mixed responders and non-responders
(Fig. 2c).
[Figure 1: workflow diagram. Samples are partitioned into a training set and a holdout set. Training set: 4-fold external cross validation with parameter tuning (optimal number of analytes), giving the CV accuracy (percent of samples correctly predicted). Holdout set: holdout prediction, giving the holdout accuracy. Permutation procedure: external CV repeated with randomized class labels in the training set (100–1,000 runs), giving the % permutation (percentage of runs with better CV accuracy).]
Fig. 1 Workflow of building classifier and model validation. The
same workflow was applied to all four methods: Random Forest,
Prediction Analysis of Microarray, Partial Least Squares-Discrimi-
nant Analysis, Support Vector Machine, T-test/Majority Vote.
Samples were divided into a training set and a holdout set. The
classifier was built in a 4-fold cross validation (CV) where the optimal
number of features used in the classifier was selected to give the best
cross validation accuracy. The model was then validated through two
procedures. One was holdout prediction, since the holdout set had
never been used to build the model or classifier. The second procedure
was the permutation procedure. The cross validation was repeated for
100–1,000 runs (method dependent) with randomized class labels.
The percentage of permutation was the percent of permutation runs
that had better CV accuracy than the original CV accuracy
3.2 Individual drug fingerprint
For this question, the goal was to find a set of analytes that
can predict patient response to a specific oral therapy:
rosiglitazone, metformin or glyburide. Since we only had
data available for ~21 subjects per oral therapy, all subjects
were included in the cross-validation group.
Significant classifiers were obtained for predicting rosig-
litazone outcomes using metabolomic data prior to
treatment (Table 1B). CV accuracies ranged from 67% to
81% using 3–67 analytes. We noted that a classifier built
from three analytes using T-test/Majority Vote had a cross
validation accuracy of 81%. The three analytes were also
included in the list of features picked by the other three
classifiers. Figure 3b shows the discriminating power of
these three analytes (urine citrate, serum 1-methyl histi-
dine, and serum IL-8), with good separation evident
between the responder (solid circles) and non-responder
(open circles) groups, whereas using all 1,306 analytes
included in this analysis does not indicate separation of the
two groups (Fig. 3a). In comparison, the CV accuracies
were worse using the three conventional glycemic biomarkers
(glucose, fructosamine, HbA1c) than using the set
of metabolomic analytes. This was consistent with the PCA
plot of the three conventional biomarkers alone, where
there was no clear separation of responders versus non-
responders (Fig. 3c). The relationship between urinary
citrate, serum 1-methyl histidine, and serum IL-8 and the
Table 1 Cross-drug or individual drug classification results predicting
week 8 treatment response using metabolomic data at baseline. (A)
Cross-drug classification results. (B) Classification results for Rosiglit-
azone treated subjects. (C) Classification results for Metformin treated
subjects. Accuracy results for RF, PAM and T-test are not shown because
the models do not pass the permutation test. (D) Classification results for
Rosiglitazone or Metformin-treated subjects. Accuracy results for RF are
not shown because the model did not pass the permutation test
(A) Cross-drug classification results

Method   Number of   CV accuracy (%) (n = 46)   % Permutation         Holdout accuracy (%) (n = 14)
         analytes    R     M     G     T        (# of permutations)   R     M     G     T
PLS-DA   190         56    67    53    59       5.6 (500)             60    75    60    64
RF       20          94    87    73    85       <1 (100)              60    100   40    64
PAM      138         69    80    53    67       9 (100)               60    50    40    50
T-test   75          62    87    73    74       3.7 (1000)            80    100   41    71

(B) Classification results for Rosiglitazone treated subjects

Method   Number of   CV accuracy (%) (n = 21)   % Permutation
         analytes    NR    RS    T              (# of permutations)
PLS-DA   55          83    44    67             8 (500)
RF       66          92    67    81             4 (100)
PAM      67          92    56    76             5 (100)
T-test   3           75    89    81             7.5 (1000)

(C) Classification results for Metformin treated subjects

Method   Number of   CV accuracy (%)            % Permutation
         analytes    NR    RS    T              (# of permutations)
PLS-DA   110         56    80    68             1 (500)
RF       71          -     -     -              37 (100)
PAM      1141        -     -     -              15 (100)
T-test   88          -     -     -              20 (1000)

(D) Classification results for Rosiglitazone or Metformin-treated subjects

Method   Number of   CV accuracy (%) (n = 31)   % Permutation         Holdout accuracy (%) (n = 9)
         analytes    R     M     T              (# of permutations)   R     M     T
PLS-DA   10          62    73    68             0.8 (500)             60    75    67
RF       17          -     -     -              11 (100)              -     -     -
PAM      64          75    80    77             1 (100)               60    60    56
T-test   13          69    80    74             7.5 (1000)            80    100   89

The number of analytes indicates the optimal number that maximized prediction accuracy in cross-validation. The percentage of permutation is
the percent of permutation runs that had better CV accuracy than the original CV accuracy. The number in brackets indicates the number of
permutation runs which was method dependent.
R = rosiglitazone, M = metformin, G = glyburide, T = overall accuracy, RS = responder, NR = non-responder
three conventional diabetes markers (glucose, HbA1c and
fructosamine) is shown in Supplementary Fig. 1.
For metformin-treated subjects, only PLS-DA yielded
a classifier with a permutation percentage of less than 10%
(P < 0.1). The CV accuracy was 68% with 110
analytes selected (Table 1C). For glyburide-treated subjects,
none of the methods yielded a significant model predicting
its treatment outcome. This result is consistent with the
observation in the cross-drug analysis shown above that the
accuracy in classifying glyburide-treated subjects was
lowest among the three drugs.
3.3 Fingerprint for rosiglitazone and metformin
Since the glyburide-treated patients were hardest to classify
in both the cross-drug analysis as well as the glyburide-
only analysis, we decided to build classifiers for patients
treated with either rosiglitazone or metformin, leaving out
glyburide. Classification analysis was applied to data from
40 subjects who were treated with either rosiglitazone or
metformin. The samples were divided into 31 subjects in
the training group and nine in the holdout group. Both
treatment type and class were properly balanced in the
Fig. 2 PCA plot of subjects treated with three study drugs. (a) Uses
all 1,735 analytes at baseline; (b) Uses 13 analytes at baseline picked
by all four classifiers that are predictive of treatment response across
all three drugs; (c) Uses three conventional markers at baseline:
glucose, fructosamine, HbA1c. The solid circles in the figure
correspond to responders, and the open circles to non-responders
Fig. 3 PCA plot of subjects treated with rosiglitazone. (a) Uses all
1,306 analytes at baseline; (b) Uses three analytes at baseline picked
by all four classifiers that are predictive of treatment response for
rosiglitazone-treated subjects; (c) Uses three conventional markers at
baseline: glucose, fructosamine, HbA1c. The solid circles in the figure
correspond to responders, and the open circles to non-responders
training and holdout groups. Results from each of the
classification methods are summarized in Table 1D.
The cross-validation (CV) accuracy across four classi-
fication methods ranged from 68% to 74%, and the optimal
number of analytes selected ranged from 10 to 64. Models
were validated by predicting the responder status of nine
subjects in a holdout group and the accuracy ranged from
56% to 89%. In particular, the T-test/Majority Vote (T-test)
classifier using 13 analytes gave the best holdout prediction
of 89% accuracy.
3.4 Biological contextualization
Even with measures of accuracy and statistical signifi-
cance, it is difficult to objectively assess the performance
of multiple methods without applying them in practical
studies. To better understand the biological relevance, we
examined whether any of the selected analytes have pre-
viously been implicated in the pathophysiology of T2DM.
Rosiglitazone, metformin and glyburide affect different
biological processes through various MOA and target tis-
sues (Ahmann and Riddle 2002). Therefore, it seems
intuitive that the analytes in predictive classifier rules, if
collectively predictive of a particular drug’s treatment
outcome, should be closely related to that drug’s presumed
MOA. This expectation is largely supported by our results.
For rosiglitazone responder prediction, among the 74
analytes identified by at least one method and with known
annotation (Supplement Table 5) the majority are involved
in the biological processes affected by rosiglitazone:
increased lipogenesis in adipose tissue and increased
insulin sensitivity and signaling in muscle and liver
(Stumvoll and Haring 2002). Examples include: energy
metabolism (e.g., citrate, lactate), adipogenesis and release
of adipokines (e.g., glycerol, leptin), immune or inflam-
matory response (IL-8, IL-12p40), fatty acid-induced
insulin resistance in liver or muscle (total free fatty acid,
insulin, PAPP-A, total TG, and glycerol), and amino acid
metabolism (Ile, Leu, Val, Pro, His, Tyr, Phe, Lys etc.).
Also, quite a few analytes (such as cholesterol ester,
diglyceride, nicotinamide, etc.) have not previously been implicated in
T2DM or in the mechanism of PPAR-γ agonists.
For metformin responder prediction, the 72 markers
identified by PLSDA and with known annotation were
similarly enriched in those biological processes potentially
involved in metformin action (Supplement Table 6). Met-
formin is thought to produce an energy ‘sink’ in the liver
possibly mediated via the energy sensing AMP kinase
system, resulting in both decreased hepatic lipogenesis and
gluconeogenesis (Kirpichnikov et al. 2002). Thus many of
the highlighted analytes were lipids and most of the non-
omic markers were also lipid-related, such as apoB, cho-
lesterol and free fatty acid. Additionally, another large
component of the metformin responder marker list inclu-
ded amino acids, which are essential substrates for
gluconeogenesis.
It is somewhat surprising that attempts failed to identify
classifier rules for glyburide using baseline analytes. This
suggests that prediction of glyburide response may not
depend on disease severity or other readily discernible
metabolite or lipid patterns. It also suggests that the analytes
detected on the ‘‘open profiling’’ metabolomic platforms do
not include strong baseline correlates for insulin reserve—a
presumed requirement for effective glyburide action.
Understanding this will require further exploration.
For cross-drug fingerprints, analytes by definition will
be less revealing of specific drug class-related mechanisms,
because the classification engines must select what is
common to two or more of the drugs. These cross-drug
analytes are more likely to reflect markers of glucose-
lowering per se and less likely to identify markers indica-
tive of either a physiological subtype (e.g., insulin
resistance) or related to a treatment-specific mechanism of
action (e.g., increased adipose lipogenesis).
The three analytes measured at week 0 that were most
predictive of week 8 rosiglitazone treatment response were serum
IL-8, serum 1-methyl histidine measured by NMR (with
medium confidence in annotation) and citrate in urine (with
high confidence in annotation). Box plots of each of the three
analytes, grouped by treatment response at week 0 and week 8,
are shown in Fig. 4.
The level of urine citrate at baseline was significantly
lower in responders than non-responders (P < 0.001). The
8 week treatment did not change the level of urine citrate
in non-responder subjects. However, it did increase urine
citrate (not statistically significant) in the responder group.
Citrate may play a critical role in cataplerosis and glucose-
regulated insulin release (Flamez et al. 2002). Because
citrate was not quantified in plasma or liver, it is hard to
pinpoint the actual biochemical context for its change. It
could be related to uncontrolled gluconeogenesis in liver
tissue. However, it cannot be ruled out that the higher
citrate excretion might also result from increased citrate
production in renal tubular cells or from reduced citrate
reabsorption from the tubular fluid due to glucose overflow.
Increased excretion of urinary citrate has been observed in
previous NMR studies of diabetic human subjects (Salek
et al. 2007; Zuppi et al. 1998).
Serum 1-methyl histidine at baseline was higher in
responders than non-responders (P = 0.0016). In a diabetic
state, many alternative sources of energy are used when
tissue glucose concentration and utilization are low. These
include enhanced degradation of proteins and amino acids
(Dice and Walker 1979). Altered excretion of 3-methyl
histidine is a well established indicator of the degree of
degradation of skeletal muscle proteins (Chinkes 2005;
Young and Munro 1978). The source of 1-methyl histidine
is usually attributed to degradation of anserine and has
been related to increased oxidation in muscle proteins
(Wishart et al. 2007), but its biological context related to
diabetes is unclear. Our results suggest that subjects with
a higher concentration of 1-methyl histidine were more
likely to respond to rosiglitazone treatment.
Serum IL-8 at baseline was higher in responders than
non-responders (P = 0.032). IL-8 is an important cytokine
in the inflammatory process. It is stimulated by high glu-
cose concentrations in endothelial cells in vitro and has
chemotactic activity for polymorphonuclear neutrophils
(playing an important role in the pathogenesis of chronic
complications of diabetes), as well as for T-lymphocyte
and smooth muscle cells. Serum IL-8 levels have been reported to
increase markedly in diabetic patients (Zozulinska et al.
1999). We observed in our study, and it is also reported in
the literature, that one of the effects of rosiglitazone treatment
is to reduce the apparent inflammation associated with
obesity and diabetes (Belvisi et al. 2006). Thus, it seems
consistent that subjects with higher IL-8 levels were more
responsive to rosiglitazone treatment.
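The baseline comparisons reported above contrast two small groups (n = 9 responders and n = 12 non-responders, per Fig. 4). This section does not state which test produced the reported P values, so the sketch below swaps in a Monte Carlo permutation test on the difference in group means, a choice that makes few distributional assumptions at these sample sizes. The function name and the citrate-like values are invented for illustration and are not study data.

```python
import random


def permutation_p_value(group_a, group_b, n_permutations=5000, seed=0):
    """Two-sided Monte Carlo permutation test on the difference in means.

    Hypothetical helper: labels are shuffled n_permutations times and the
    p-value is the share of shuffles whose absolute mean difference is at
    least as large as the observed one.
    """
    rng = random.Random(seed)  # fixed seed so the estimate is reproducible
    n_a = len(group_a)
    pooled = list(group_a) + list(group_b)

    def abs_mean_diff(values):
        left, right = values[:n_a], values[n_a:]
        return abs(sum(left) / len(left) - sum(right) / len(right))

    observed = abs_mean_diff(pooled)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        if abs_mean_diff(pooled) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (hits + 1) / (n_permutations + 1)


# Made-up baseline values (arbitrary units), 9 responders and 12 non-responders.
responders_baseline = [1.0, 1.2, 0.8, 1.1, 0.9, 1.3, 1.0, 0.7, 1.1]
non_responders_baseline = [2.0, 2.4, 1.9, 2.2, 2.1, 2.6, 1.8, 2.3, 2.0, 2.5, 2.2, 1.9]

p_separated = permutation_p_value(responders_baseline, non_responders_baseline)
p_equal_means = permutation_p_value([1, 2, 3, 4, 5], [1.5, 2.5, 3.5, 4.5])
```

Because the smallest attainable estimate is 1/(n_permutations + 1), the number of shuffles sets the resolution of the reported p-value. Resolving values near 0.001 would therefore need substantially more than 1,000 permutations.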
4 Concluding remarks
The multivariate methods used here to identify the classifier
rules have unique value in identifying analytes that do
not necessarily declare themselves in more conventional
statistical analyses, such as correlation or univariate change
approaches. When considered in relation to
the other markers within the list, such analytes may unmask other
non-obvious elements of disease biology or treatment
effect.
A challenge in diabetes clinical trials and treatment is to
tailor individual drug assignment more closely to the
patient’s disease stage and underlying pathophysiology.
Our study suggests that there may be an advantage to using
metabolomics, in addition to standard laboratory parameters
and clinical decision making, to construct pretreatment
classifier rules for prediction of subsequent treatment
response.
Our study is a relatively small clinical pharmacometa-
bolomic investigation, consisting of two post-dosing time
points. The main aim was an initial exploration, as a step
toward understanding whether high content
‘‘open profiling’’ metabolomic platforms can help refine
drug development by providing more accurate predictors of
therapeutic efficacy or safety. Open profiling metabolomics
should always be seen as an initial screen that in turn
leads to further validation work. This validation would in
the first instance be experimental, using bespoke assays for
the analytes highlighted by open profiling, and once find-
ings were proven to hold ‘true’, across-study computational
and biological biomarker validation would commence. The
key markers identified in this manuscript, including citrate,
IL-8 and methyl-histidine, in addition to branched chain
amino acid degradation products and others identified by at
least two of the computational methods, are the subject of an
ongoing follow-up study.
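The distinction drawn in this paper between analytes identified by at least one method and those identified by at least two methods amounts to counting how many per-method lists contain each analyte. The sketch below illustrates that bookkeeping only. The method labels and the assignment of analytes to methods are hypothetical and do not reproduce the study's lists.

```python
from collections import Counter


def analytes_selected_by(selections, min_methods):
    """Return analytes appearing in at least min_methods of the per-method lists.

    selections maps a method label to the analytes it selected
    (hypothetical structure, assumed for illustration).
    """
    counts = Counter(
        analyte
        for analytes in selections.values()
        for analyte in set(analytes)  # count each method at most once
    )
    return sorted(a for a, n in counts.items() if n >= min_methods)


# Invented example: which analyte came from which method is made up.
selections = {
    "method_a": ["citrate", "IL-8", "leptin"],
    "method_b": ["citrate", "1-methyl histidine", "IL-8"],
    "method_c": ["citrate", "glycerol"],
}

any_method = analytes_selected_by(selections, 1)
two_or_more = analytes_selected_by(selections, 2)
```

Raising the threshold trades coverage for agreement: the at-least-one list is the union of all selections, while the at-least-two list keeps only analytes on which independent methods concur.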
Acknowledgements We thank Lipomics Technologies Inc. for
generating fatty acid data, the Netherlands Organisation for Applied
Scientific Research (TNO) for serum NMR, BG Medicine Inc. for
polar and lipid metabolite LC and GC MS and Drs Brian C Sweatman,
Rachel Ball, Azmina Mather and Baljit Sall for acquisition of
urine NMR data. We would also like to thank Dr. Chris Keefer, James
Robert, Robert Vermeulen and Nikheel Kolatkar for valuable review
of this manuscript.
References
Ahmann, A. J., & Riddle, M. C. (2002). Current oral agents for type 2
diabetes. Many options, but which to choose when? Postgraduate Medicine, 111, 32–40, 43.
Bastard, J. P., Maachi, M., Lagathu, C., et al. (2006). Recent advances
in the relationship between obesity, inflammation, and insulin
resistance. European Cytokine Network, 17, 4–12.
Fig. 4 (a) Urine citrate measured by NMR in rosiglitazone responders
(R) and non-responders (N) at week 0 and week 8 (week 0, R vs. N:
p = 0.00076). (b) Serum 1-methyl histidine measured by NMR in
rosiglitazone responders and non-responders at week 0 and week 8
(p = 0.0016). (c) Serum IL-8 in rosiglitazone responders and
non-responders at week 0 and week 8 (p = 0.032). Each panel shows
four groups: Week 0-N, Week 0-R, Week 8-N and Week 8-R. n = 9 for
responders and n = 12 for non-responders
Belvisi, M. G., Hele, D. J., & Birrell, M. A. (2006). Peroxisome
proliferator-activated receptor gamma agonists as therapy for
chronic airway inflammation. European Journal of Pharmacology, 533, 101–109. doi:10.1016/j.ejphar.2005.12.048.
Breiman, L. (2001). Random forests. Machine Learning, 45, 5–32.
doi:10.1023/A:1010933404324.
Chinkes, D. L. (2005). Methods for measuring tissue protein
breakdown rate in vivo. Current Opinion in Clinical Nutrition and Metabolic Care, 8, 534–537. doi:10.1097/01.mco.0000170754.25372.37.
Dice, J. F., & Walker, C. D. (1979). Protein degradation in metabolic
and nutritional disorders. Ciba Foundation Symposium, 75, 331–
350.
Flamez, D., Berger, V., Kruhoffer, M., Orntoft, T., Pipeleers, D., &
Schuit, F. C. (2002). Critical role for cataplerosis via citrate in
glucose-regulated insulin release. Diabetes, 51, 2018–2024.
doi:10.2337/diabetes.51.7.2018.
Fu, L. M., & Fu-Liu, C. S. (2004). Multi-class cancer subtype
classification based on gene expression signatures with reliability
analysis. FEBS Letters, 561, 186–190. doi:10.1016/S0014-5793(04)00175-9.
Hellberg, S., Sjostrom, M., & Wold, S. (1986). The prediction of
bradykinin potentiating potency of pentapeptides. An example of
a peptide quantitative structure-activity relationship. Acta Chemica Scandinavica. Series B: Organic Chemistry and Biochemistry, 40, 135–140.
Hom, F. G., Ettinger, B., & Lin, M.-J. (1998). Comparison of serum
fructosamine vs. glycohemoglobin as measures of glycemic
control in a large diabetic population. Acta Diabetologica, 35,
48–51. doi:10.1007/s005920050100.
Howey, J. E. A., Bennet, W. M., Browning, M. C. K., Jung, R. T., &
Fraser, C. G. (1989). Clinical utility of assays of glycosylated
haemoglobin and serum fructosamine compared: Use of data on
biological variation. Diabetic Medicine, 6, 793–796.
Kapetanovic, I. M., Rosenfeld, S., & Izmirlian, G. (2004). Overview
of commonly used bioinformatics methods and their applica-
tions. Annals of the New York Academy of Sciences, 1020, 10–
21. doi:10.1196/annals.1310.003.
Kilpatrick, E. S. (1997). Problems in the assessment of glycaemic control
in diabetes mellitus. Diabetic Medicine, 14, 819–831.
doi:10.1002/(SICI)1096-9136(199710)14:10<819::AID-DIA459>3.0.CO;2-A.
Kirpichnikov, D., McFarlane, S. I., & Sowers, J. R. (2002).
Metformin: An update. Annals of Internal Medicine, 137, 25–33.
Laakso, M. (2002). Lipids in type 2 diabetes. Seminars in Vascular Medicine, 2, 59–66. doi:10.1055/s-2002-23096.
Li, L., Tang, H., Wu, Z., et al. (2004). Data mining techniques for
cancer detection using serum proteomic profiling. Artificial Intelligence in Medicine, 32, 71–83. doi:10.1016/j.artmed.2004.03.006.
Ma, X. J., Wang, Z., Ryan, P. D., et al. (2004). A two-gene expression
ratio predicts clinical outcome in breast cancer patients treated
with tamoxifen. Cancer Cell, 5, 607–616. doi:10.1016/j.ccr.2004.05.015.
Meyerson, M., & Carbone, D. (2005). Genomic and proteomic
profiling of lung cancers: Lung cancer classification in the age of
targeted therapy. Journal of Clinical Oncology, 23, 3219–3226.
doi:10.1200/JCO.2005.15.511.
Ostenson, C. G. (2001). The pathophysiology of type 2 diabetes
mellitus: An overview. Acta Physiologica Scandinavica, 171,
241–247. doi:10.1046/j.1365-201x.2001.00826.x.
Petersen, J. L., & McGuire, D. K. (2005). Impaired glucose tolerance
and impaired fasting glucose—A review of diagnosis, clinical
implications and management. Diabetes & Vascular Disease Research; Official Journal of the International Society of Diabetes and Vascular Disease, 2, 9–15. doi:10.3132/dvdr.2005.007.
Petricoin, E. F., Ardekani, A. M., Hitt, B. A., et al. (2002). Use of
proteomic patterns in serum to identify ovarian cancer. Lancet, 359, 572–577. doi:10.1016/S0140-6736(02)07746-2.
Picardi, A., & Pozzilli, P. (2003). Dynamic tests in the clinical
management of diabetes. Journal of Endocrinological Investigation, 26(7, Suppl), 99–106.
Radmacher, M. D., McShane, L. M., & Simon, R. (2002). A paradigm
for class prediction using gene expression profiles. Journal of Computational Biology, 9, 505–511. doi:10.1089/106652702760138592.
Raponi, M., Zhang, Y., Yu, J., et al. (2006). Gene expression
signatures for predicting prognosis of squamous cell and
adenocarcinomas of the lung. Cancer Research, 66, 7466–
7472. doi:10.1158/0008-5472.CAN-06-1191.
Salek, R. M., Maguire, M. L., Bentley, E., et al. (2007). A
metabolomic comparison of urinary changes in type 2 diabetes
in mouse, rat, and human. Physiological Genomics, 29, 99–108.
doi:10.1152/physiolgenomics.00194.2006.
Stumvoll, M., & Haring, H. U. (2002). Glitazones: Clinical effects
and molecular mechanisms. Annals of Medicine, 34, 217–224.
doi:10.1080/713782132.
Tahara, Y., & Shima, K. (1995). Kinetics of HbA1c, glycated
albumin, and fructosamine and analysis of their weight functions
against preceding plasma glucose level. Diabetes Care, 18, 440–
447. doi:10.2337/diacare.18.4.440.
Tibshirani, R., Hastie, T., Narasimhan, B., & Chu, G. (2002).
Diagnosis of multiple cancer types by shrunken centroids of gene
expression. Proceedings of the National Academy of Sciences of the United States of America, 99, 6567–6572. doi:10.1073/pnas.082099299.
‘t Veer, L. J., Dai, H., van de Vijver, M. J., et al. (2002). Gene
expression profiling predicts clinical outcome of breast cancer.
Nature, 415, 530–536. doi:10.1038/415530a.
Wishart, D. S., Tzur, D., Knox, C., et al. (2007). HMDB: The human
metabolome database. Nucleic Acids Research, 35, D521–D526.
doi:10.1093/nar/gkl923.
Young, V. R., & Munro, H. N. (1978). Ntau-methylhistidine (3-
methylhistidine) and muscle protein turnover: An overview.
Federation Proceedings, 37, 2291–2300.
Ziegler, D. (2005). Type 2 diabetes as an inflammatory cardiovascular
disorder. Current Molecular Medicine, 5, 309–322.
doi:10.2174/1566524053766095.
Zozulinska, D., Majchrzak, A., Sobieska, M., Wiktorowicz, K., &
Wierusz-Wysocka, B. (1999). Serum interleukin-8 level is
increased in diabetic patients. Diabetologia, 42, 117–118.
doi:10.1007/s001250051124.
Zuppi, C., Messana, I., Forni, F., Ferrari, F., Rossi, C., & Giardina, B.
(1998). Influence of feeding on metabolite excretion evidenced
by urine 1H NMR spectral profiles: A comparison between
subjects living in Rome and subjects living at arctic latitudes
(Svaldbard). Clinica Chimica Acta, 278, 75–79. doi:10.1016/
S0009-8981(98)00132-6.