
ORIGINAL ARTICLE

Multivariate classification analysis of metabolomic data for candidate biomarker discovery in type 2 diabetes mellitus

Yang Qiu, Dilip Rajagopalan, Susan C. Connor, Doris Damian, Lei Zhu, Amir Handzel, Guanghui Hu, Arshad Amanullah, Steve Bao, Nathaniel Woody, David MacLean, Kwan Lee, Dana Vanderwall, Terence Ryan

Received: 12 June 2008 / Accepted: 25 July 2008 / Published online: 14 August 2008

© Springer Science+Business Media, LLC 2008

Abstract Recent advances in genomics, metabolomics and proteomics have made it possible to interrogate disease pathophysiology and drug response on a systems level. The analysis and interpretation of the complex data obtained using these techniques is potentially fertile but equally challenging. We conducted a small clinical trial to explore the application of metabolomics data in candidate biomarker discovery. Specifically, serum and urine samples from patients with type 2 diabetes mellitus (T2DM) were profiled on metabolomics platforms before and after 8 weeks of treatment with one of three commonly used oral antidiabetic agents: the sulfonylurea glyburide, the biguanide metformin, or the thiazolidinedione rosiglitazone. Multivariate classification techniques were used to detect serum or urine analytes, obtained at baseline (pre-treatment), that could predict a significant treatment response after 8 weeks. Using this approach, we identified three analytes, measured at baseline, that were associated with response to a thiazolidinedione after 8 weeks of treatment. Although larger and longer-term studies are required to validate any of the candidate biomarkers, pharmacometabolomic profiling, in combination with multivariate classification, is worthy of further exploration as an adjunct to clinical decision making regarding treatment selection and for patient stratification within clinical trials.

Keywords Classification · Biomarker · Metabolomics · Pharmacometabolomics · Metabonomics · NMR

Electronic supplementary material The online version of this article (doi:10.1007/s11306-008-0123-5) contains supplementary material, which is available to authorized users.

Y. Qiu (✉) · D. Rajagopalan · G. Hu · N. Woody · D. Vanderwall
Department of Informatics, GlaxoSmithKline, Five Moore Drive, Research Triangle Park, NC 27709, USA

S. C. Connor
Department of Investigative Preclinical Toxicology, GlaxoSmithKline, Five Moore Drive, Research Triangle Park, NC 27709, USA

D. Damian · A. Handzel
BG Medicine Inc, Waltham, MA 02451, USA

L. Zhu
Department of Statistical Sciences, GlaxoSmithKline, Five Moore Drive, Research Triangle Park, NC 27709, USA

A. Amanullah · D. MacLean · T. Ryan
High Throughput Biology, GlaxoSmithKline, Five Moore Drive, Research Triangle Park, NC 27709, USA

S. Bao
Molecular Discovery IT, GlaxoSmithKline, Five Moore Drive, Research Triangle Park, NC 27709, USA

K. Lee
Department of Biomedical Data Sciences, GlaxoSmithKline, Five Moore Drive, Research Triangle Park, NC 27709, USA

Metabolomics (2008) 4:337–346
DOI 10.1007/s11306-008-0123-5

1 Introduction

Among the goals of systems biology is achieving a broad or ‘systematic’ view of biological changes in a cell or organism as a function of some perturbation. This can be assessed by measuring changes in levels of genes, transcripts, proteins or metabolites and mining these changes using intensive multivariate statistics and pattern analyses. The complex nature of the experimental data and computational results also has the potential to characterize more robustly the inter-individual relationships between genetic variations and the mechanisms underlying these differences. Similarly, a broad array of measurements might provide greater prognostic ability regarding experimental outcomes than a single biomarker. In drug discovery, these studies can be used to identify candidate biomarkers (or a fingerprint) for a disease, drug efficacy or toxicity. Type 2 diabetes mellitus (T2DM) represents an interesting case study because it is a multi-factorial disease state with considerable inter-individual heterogeneity.

T2DM is a complex disturbance of physiologic mechanisms affecting many metabolic homeostatic processes, including energy and lipid metabolism, inflammation, clotting and vascular endothelial function (Bastard et al. 2006; Laakso 2002; Ziegler 2005). These disturbances arise from reduced insulin action in peripheral tissues, predominantly through resistance to circulating insulin, together with impaired pancreatic insulin secretion (Kilpatrick 1997). Given the causal relationship between hyperglycemia and diabetic complications, measures of glycemia such as fasting plasma glucose (FPG), glycosylated hemoglobin (HbA1c) or, less commonly, fructosamine are typically used to monitor disease progression and treatment efficacy. However, these measures generally do not discriminate between the various pathophysiological phenotypes of diabetes (Ostenson 2001; Petersen and McGuire 2005). Understanding a patient's pathophysiologic profile may better inform us about biologic mechanisms and about therapeutic efficacy for particular pharmacologic agents.

Various technologies have been applied in recent years to develop predictive biomarkers. For example, gene expression profiling and proteomics have been used to predict the clinical outcome of cancer treatments (’t Veer et al. 2002; Ma et al. 2004; Meyerson and Carbone 2005; Petricoin et al. 2002; Raponi et al. 2006). Our analysis, however, focuses on measured metabolites in the context of a larger systems biology study, for two reasons. First, the analysis of metabolomic data and its subsequent biological contextualization from a systems biology perspective is still an area of very active development. A survey of diverse analysis techniques with proven utility in other areas could help establish what might be expected in future studies pursuing similar goals. Second, the long-term goal of many similar studies will be to identify biomarkers that, once validated, could be useful in clinical or diagnostic settings. The analysis was therefore undertaken to explore the patterns and level of predictive modeling that could be realized from biomarkers readily available without biopsy, which poses a significantly greater burden than serum or urine sample collection.

Our study used commonly prescribed oral antidiabetic agents, namely the sulfonylurea glyburide, the biguanide metformin and the thiazolidinedione rosiglitazone, which represent a broad range of mechanisms of action (MOA) (Ahmann and Riddle 2002; Bastard et al. 2006). Our clinical trial was relatively small-scale (75 male subjects with T2DM) and short-term (8 weeks of treatment). It was intended as a hypothesis-generating activity to investigate the applicability of systems biology in a drug discovery context. Serum and urine samples were obtained at pre-treatment baseline and after 8 weeks of treatment with one of the following: placebo, rosiglitazone, metformin or glyburide. High-information-content nuclear magnetic resonance (NMR) and liquid chromatography/mass spectrometry (LC/MS)-based metabolomic platforms, including polar metabolite and lipid profiling, were used to profile the samples. We used a variety of multivariate analysis techniques to determine whether polar low-molecular-weight metabolites, lipids or fatty acids, analyzed in readily accessible fluids and measured at baseline, can be used to predict drug responder status at week 8.

2 Materials and methods

2.1 Experimental design

Male subjects aged 30–70 years with a documented history of stable T2DM of no more than 10 years' duration were eligible if they had previously been treated with diet and exercise alone, monotherapy or low-dose combination therapy. Fasting plasma glucose at screening could not exceed 225 mg/dl for subjects treated with diet and exercise alone, or 180 mg/dl for subjects receiving monotherapy or low-dose combination therapy. HbA1c was required to be within 5.7–10.0%, subject to the following conditions: subjects with HbA1c between 5.7% and 9% must have been diabetic for less than 5 years, have been treated with monotherapy or low-dose combination therapy, and have an FPG of 125–180 mg/dl; subjects with HbA1c between 9.1% and 10.0% must not have been treated with combination therapy. In addition, body mass index had to be within 25–37.5 kg/m² for subjects aged 30–55 years, or within 25–35.0 kg/m² for subjects aged 56–70 years. Use of insulin for more than 7 days during the 6 months prior to screening was prohibited. Also prohibited was the use, within 1 month prior to screening, of the following medications that may affect the response to the experimental drugs: thiazolidinediones, high-dose HMG-CoA reductase inhibitors (statins) and high-dose cholesterol absorption inhibitors.

Eligible subjects entered the treatment phase after a 5-week washout period and were randomly assigned to one of four single-blind treatment groups: 19 to placebo, 22 to rosiglitazone, 21 to metformin and 21 to glyburide. (Of the 83 subjects who entered the trial, metabolomics data could be obtained for 75.) All subjects were blinded to study medication (single-blind). Based on glucose levels, doses of glyburide (total dose 5–15 mg) and metformin (total dose 500–1,500 mg) were titrated upwards, single-blind, at weeks 2 and 4, and rosiglitazone was titrated from 2 mg twice daily to 4 mg twice daily at week 4 only. Blood and urine samples were collected before treatment and at 4 and 8 weeks after its initiation. The baseline (week 0) clinical and biochemical characteristics of participants are shown in Supplementary Table 1, along with normal characteristics of the general male population.

2.2 Data generation

Serum and urine samples were analyzed on various metabolomic platforms and with traditional serum biomarker (“non-omic”) measurements. Both urine and serum were measured by NMR-based metabolic profiling. Serum samples were also analyzed by LC/MS for polar metabolites and lipids, and by GC-flame ionization for fatty acids (lipidomics). Clinical chemistry, serum and plasma protein biomarkers, and physiological parameters such as body weight were also included in the data set. In total, over 3,000 variables were included in the analysis:

- 98 analytes from clinical chemistry
- 303 fatty acids from GC-flame ionization
- 467 lipids from LC/MS
- 921 LC/MS polar metabolite peaks
- 314 serum NMR metabolite peaks
- 1,006 urine NMR metabolite peaks, covering both the 0 h and 6 h collections (urine samples were collected at both 0 h and 6 h)

Details of data acquisition and signal processing for each metabolomics platform are given in Supplementary Method A.

2.3 Data pre-treatment

The first step in data pre-treatment was to handle missing values, since several multivariate classification methods do not allow them. Metabolic analytes with too many missing values were eliminated. Up to 25% missing values in either class were allowed for serum NMR data, up to 20% for non-omic analytes and up to 15% for the remaining platforms. For subjects in the training set, missing values were set to the median of the non-missing training subjects in the same class. For subjects in the holdout set, missing values were set to the median of all non-missing subjects. After this preprocessing, about 1,500 analytes remained for use in classification. The final step was a location and scale transformation, performed across all samples in the analysis, which placed the samples on a common location and scale so that they were comparable with each other.
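The missing-value handling and scaling described above can be sketched in Python as follows. This is a minimal illustration, not the study's code. Several details are assumptions:

- Missing values are represented as `None`.
- The per-class filter takes the platform-specific limit as `max_missing_frac` (0.25 for serum NMR, 0.20 for non-omic analytes, 0.15 otherwise).
- The holdout median is computed over whatever reference values the caller passes in.
- The location and scale transformation is assumed to be autoscaling (mean-centering and division by the standard deviation), because the text does not name the exact transform.

All function names are hypothetical.

```python
import statistics


def keep_analyte(values, labels, max_missing_frac):
    """Return True if no class has more than max_missing_frac of its values missing (None)."""
    for cls in set(labels):
        class_values = [v for v, y in zip(values, labels) if y == cls]
        n_missing = sum(v is None for v in class_values)
        if n_missing / len(class_values) > max_missing_frac:
            return False
    return True


def impute_training(values, labels):
    """Replace each missing training value with the median of observed values in the same class."""
    class_median = {}
    for cls in set(labels):
        observed = [v for v, y in zip(values, labels) if y == cls and v is not None]
        class_median[cls] = statistics.median(observed)
    return [class_median[y] if v is None else v for v, y in zip(values, labels)]


def impute_holdout(values, reference_values):
    """Replace each missing holdout value with the median of all observed reference values.
    Class labels are not used, because holdout classes are treated as unknown."""
    overall_median = statistics.median([v for v in reference_values if v is not None])
    return [overall_median if v is None else v for v in values]


def autoscale(values):
    """Assumed location/scale transform: centre on the mean, divide by the sample standard deviation."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return [(v - mean) / sd for v in values]
```

In this sketch, `keep_analyte` would be applied to each analyte column before imputation, and `autoscale` to each imputed column across all samples.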

2.4 Definition of treatment response (“responder”) using a composite score of glycemic-lowering efficacy

Efficacy response was originally defined as an FPG decrease of more than 30 mg/dl. However, glucose is highly variable and influenced by short-term changes in diet, activity or stress. Integrated measures of glycemic response, by contrast, can estimate whether a patient’s average glucose has changed over time (weeks to months) in response to treatment (Tahara and Shima 1995). Fructosamine, whose half-life is determined by that of albumin, provides a measure of integrated glucose over a period of 2–4 weeks (Hom et al. 1998). HbA1c, a form of glycosylated hemoglobin, is the gold-standard measure of integrated glucose over a 6–12 week period (Howey et al. 1989; Picardi and Pozzilli 2003). We therefore examined whether these three measures of glycemic efficacy (FPG, fructosamine and HbA1c) could determine responder status more reliably when used in combination.

This was an 8-week study, shorter than the 12 weeks generally required to reach full glycemic efficacy with PPAR-γ agonists such as rosiglitazone. It was therefore necessary to derive a surrogate measure of efficacy that reflects a developing response trend. This composite measure of efficacy was derived solely for use in the modeling in this study and has not been tested or validated in a general context. Any candidate biomarkers discovered on the basis of this composite efficacy measure must be tested and validated before they are used to make treatment decisions. That validation should take place in larger, longer-duration trials that use conventional, well-accepted definitions of glycemic response.

Combined data from three larger clinical trials (GSK trials 49653_011, 49653_020 and 49653_024, http://ctr.gsk.co.uk/welcome.asp) were used to model changes in FPG, fructosamine and HbA1c at 8 weeks against measured changes in HbA1c (the accepted gold standard) at 17 weeks. The goal was to establish an efficacy measure and responder criterion at 8 weeks that matches the 17-week “truth”. Using 8-week data, many composite scoring rules outperformed the observed change in any single measure in predicting the HbA1c change at 17 weeks. From within the mathematical ‘space’ of choices, we chose a rule that was relatively simple and reflected the perceived relative value of the glycemic markers discussed above:

$$\text{response} = 1\,(\%\Delta\text{FPG}) + 2\,(\%\Delta\text{Fructosamine}) + 1.5\,(\%\Delta\text{HbA1c})$$

Here each %Δ term is the percentage reduction from baseline. A subject whose composite % reduction computed with this formula is greater than 30% is classified as a ‘responder’. Using this composite score definition, the fraction of subjects responding in this 8-week trial ranged between 43% and 60% for the three treatments (Supplementary Table 2).
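The composite responder rule can be sketched in Python as follows. The sketch assumes that each %Δ term is the percentage reduction from baseline to week 8 (positive when the marker falls). It also assumes that a score strictly greater than 30 counts as a response. The function names and example values are illustrative, not study data.

```python
def pct_reduction(baseline, week8):
    """Percentage reduction from baseline; positive when the marker decreases."""
    return 100.0 * (baseline - week8) / baseline


def composite_score(fpg, fructosamine, hba1c):
    """Weighted composite of % reductions: 1 x FPG + 2 x fructosamine + 1.5 x HbA1c.
    Each argument is a (baseline, week8) pair in the marker's own units."""
    return (1.0 * pct_reduction(*fpg)
            + 2.0 * pct_reduction(*fructosamine)
            + 1.5 * pct_reduction(*hba1c))


def is_responder(fpg, fructosamine, hba1c, threshold=30.0):
    """Classify as responder when the composite % reduction exceeds the threshold."""
    return composite_score(fpg, fructosamine, hba1c) > threshold


# Made-up values: FPG in mg/dl, fructosamine in umol/l, HbA1c in %.
score = composite_score(fpg=(180, 150), fructosamine=(300, 270), hba1c=(8.0, 7.6))
print(round(score, 1))  # prints 44.2 (16.7 + 20.0 + 7.5)
```

The weights sum to 4.5. A subject whose three markers all fall by the same percentage therefore needs a reduction of more than about 6.7% in each (30 / 4.5) to be classified as a responder.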


2.5 Multivariate classification methods

Analysis of large volumes of data with a high number of variables (dimensions) poses a challenge for data classification. We used four classification methods:

- Random Forest (RF)
- Prediction Analysis for Microarrays (PAM)
- Partial Least Squares–Discriminant Analysis (PLS-DA)
- T-test/Majority Vote (T-test)

RF is a decision-tree-based classifier using an algorithm originally developed by Leo Breiman (Breiman 2001). It grows many classification trees (a forest) and assigns each sample to the class that receives the most votes across all trees. PAM is a nearest shrunken centroid classifier developed by Tibshirani, Hastie, Narasimhan and Chu (Tibshirani et al. 2002). It computes a standardized, shrunken centroid for each class and predicts the class of a new sample from its distance to each class centroid. PLS regression is a dimension reduction method that finds components of the independent variable space that are relevant to the outcome space (Hellberg et al. 1986). PLS-DA applies it with class membership as the outcome. The T-test classifier is a simple majority-vote classifier that uses a t-test for feature selection. For each selected feature, it then calculates a threshold equal to the midpoint between the two class means. Each analyte can then be used independently to classify a sample, according to which side of the threshold the sample's value for that analyte lies. The final class is determined by majority vote. A more detailed description of each method is given in Supplementary Method B.
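The following is a minimal Python sketch of a T-test/Majority Vote classifier along the lines described above. It rests on several assumptions:

- Features are ranked by the absolute value of Welch's t statistic. The text says only "a t-test".
- The top `k` features are kept.
- Each kept feature casts one vote according to which side of its midpoint threshold the sample falls.
- Ties go to the non-responder class.

Function names are hypothetical. The authors' own implementation is described in Supplementary Method B.

```python
import math
import statistics


def welch_t(a, b):
    """Welch's two-sample t statistic (assumes each group has >= 2 values and non-zero spread)."""
    se = math.sqrt(statistics.variance(a) / len(a) + statistics.variance(b) / len(b))
    return (statistics.mean(a) - statistics.mean(b)) / se


def fit_ttest_vote(X, y, k, pos="R", neg="NR"):
    """Select the k features with the largest |t| and store a midpoint threshold for each."""
    scored = []
    for j in range(len(X[0])):
        a = [row[j] for row, label in zip(X, y) if label == pos]
        b = [row[j] for row, label in zip(X, y) if label == neg]
        mean_pos, mean_neg = statistics.mean(a), statistics.mean(b)
        scored.append((abs(welch_t(a, b)), j, mean_pos, mean_neg))
    scored.sort(key=lambda item: item[0], reverse=True)
    rules = []
    for _, j, mean_pos, mean_neg in scored[:k]:
        threshold = (mean_pos + mean_neg) / 2.0  # midpoint of the two class means
        rules.append((j, threshold, mean_pos > mean_neg))
    return rules


def predict_ttest_vote(rules, x, pos="R", neg="NR"):
    """Each selected feature votes for a class; a strict majority is needed for `pos`."""
    votes_for_pos = 0
    for j, threshold, pos_is_high in rules:
        if (x[j] > threshold) == pos_is_high:
            votes_for_pos += 1
    return pos if 2 * votes_for_pos > len(rules) else neg


# Toy data: feature 0 separates the classes, features 1 and 2 are noise.
X = [[5.0, 1.0, 0.3], [6.0, 1.2, 0.9], [5.5, 0.8, 0.5],
     [1.0, 1.1, 0.4], [2.0, 0.9, 0.8], [1.5, 1.0, 0.6]]
y = ["R", "R", "R", "NR", "NR", "NR"]
rules = fit_ttest_vote(X, y, k=1)
print(predict_ttest_vote(rules, [4.0, 1.0, 0.5]))  # prints R
```

Because every selected analyte votes independently, the number of analytes `k` is the main free parameter. In this sketch it is the quantity that cross-validation would tune.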

2.6 Building the classifier and model validation

Overfitting is a known problem in data mining when the number of variables greatly exceeds the number of observations. Proper validation procedures should be adopted to ensure that a classifier has not overfitted the data (Radmacher et al. 2002). A standard procedure was used for all four classification methods. The samples to be classified were randomly divided into a training set and a holdout set. The training samples were used, through cross-validation, to determine the parameters of each classifier, such as the number of analytes that maximizes accuracy (the percentage of samples correctly classified). A four-fold cross-validation (CV) was used in this analysis: the training samples were randomly divided into four CV groups that were as class-balanced as possible. In the CV procedure, numerous combinations of each classifier's free parameters were selected to span the parameter space. For each combination, classifiers were built on 3 of the 4 CV groups, and the resulting models were used to predict the classes of the samples in the remaining group, with each group held out in turn. The combination of parameters that maximized accuracy over the entire parameter space was selected as the optimal parameter set, and its accuracy is referred to as the CV accuracy. Once the optimal parameter set was determined, all four groups (i.e. all the training samples) were used to rebuild the classifier. The rebuilt classifier then made class predictions on the holdout set, giving the holdout accuracy. For individual drug fingerprints, only around 20 samples were available. In these cases the samples were not divided into training and holdout sets, and all samples were used in CV mode.
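The four-fold cross-validation and parameter-tuning loop can be sketched in plain Python. This is an illustration under assumptions, not the study's pipeline:

- `fit(X, y, param)` and `predict(model, x)` are hypothetical callables standing in for any of the four classifiers.
- The caller supplies the parameter grid, for example candidate numbers of analytes.
- Folds are balanced by dealing each class round-robin across the four groups.

```python
import random


def stratified_folds(labels, n_folds=4, seed=0):
    """Assign samples to folds by dealing each class round-robin, so folds are as
    class-balanced (and as equal in size) as possible."""
    rng = random.Random(seed)
    fold_of = [None] * len(labels)
    offset = 0
    for cls in sorted(set(labels)):
        members = [i for i, label in enumerate(labels) if label == cls]
        rng.shuffle(members)
        for rank, i in enumerate(members):
            fold_of[i] = (offset + rank) % n_folds
        offset += len(members)
    return fold_of


def cv_accuracy(X, y, fit, predict, param, fold_of, n_folds=4):
    """Train on n_folds - 1 groups, predict the held-out group, rotate; return overall accuracy."""
    correct = 0
    for k in range(n_folds):
        train = [i for i in range(len(y)) if fold_of[i] != k]
        test = [i for i in range(len(y)) if fold_of[i] == k]
        model = fit([X[i] for i in train], [y[i] for i in train], param)
        correct += sum(predict(model, X[i]) == y[i] for i in test)
    return correct / len(y)


def tune(X, y, fit, predict, param_grid, n_folds=4, seed=0):
    """Return the parameter with the highest CV accuracy, together with that accuracy."""
    fold_of = stratified_folds(y, n_folds, seed)
    best_param, best_acc = None, -1.0
    for param in param_grid:
        acc = cv_accuracy(X, y, fit, predict, param, fold_of, n_folds)
        if acc > best_acc:
            best_param, best_acc = param, acc
    return best_param, best_acc
```

After tuning, the classifier would be refit on all training samples with the selected parameter and applied once to the holdout set. The holdout samples play no part in the loop above.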

To assess the significance of the CV results, a permutation strategy was adopted. The four-fold CV step was repeated between 100 and 1,000 times, depending on the method, using randomly permuted class labels. Given the small sample size (n = 20–60) and the small number of classes (two), a classifier was considered significant if the percentage of permutation runs with better CV accuracy than the unpermuted case was on the order of 10% or less (P < 0.1).
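The permutation check can be sketched as below. Here `cv_score` is a hypothetical callable that reruns the entire cross-validation, including parameter tuning, for a given label vector and returns its CV accuracy. The function reports the percentage of permuted runs whose accuracy is strictly better than the unpermuted one. The significance helper uses the P < 0.1 form of the criterion stated above.

```python
import random


def permutation_percentage(y, cv_score, n_perm=100, seed=0):
    """Return (observed CV accuracy, % of label-permuted runs with strictly better accuracy)."""
    rng = random.Random(seed)
    observed = cv_score(list(y))
    better = 0
    for _ in range(n_perm):
        permuted = list(y)
        rng.shuffle(permuted)  # break any real link between samples and classes
        if cv_score(permuted) > observed:
            better += 1
    return observed, 100.0 * better / n_perm


def is_significant(percent_better, cutoff=10.0):
    """Treat the classifier as significant when the permutation percentage is below 10% (P < 0.1)."""
    return percent_better < cutoff
```

Each permuted run must repeat the full tuning step. Reusing the parameters chosen on the real labels would make the null distribution too optimistic for the original classifier.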

3 Results and discussion

Application of multivariate classification methods to data obtained from metabolomics, transcriptomics or proteomics is an active area of research (’t Veer et al. 2002; Fu and Fu-Liu 2004; Kapetanovic et al. 2004; Li et al. 2004). At present, evidence is lacking on the advantages of one classification method over another. Our approach was therefore to apply four representative classification techniques in parallel to every question of interest and to compare the results obtained from the different methods. The methods included both linear and non-linear classification, in the original or a transformed space. The models were used to classify subjects into two groups: those who responded to treatment, responders (R), and those who did not, non-responders (NR). The workflow was kept constant across all four methods (Fig. 1).

For classification using metabolomic data, serum measurements of the conventional glycemic markers (FPG, fructosamine and HbA1c) were excluded from the combined dataset. All NMR resonances from serum and urine corresponding to the alpha and beta anomers of glucose were also excluded, i.e. 3.215–3.26, 3.377–3.6, 3.68–3.926 and 4.6–5.26 ppm for serum, and 5.2–5.3, 4.56–4.7, 3.68–3.92 and 3.22–3.56 ppm for urine. The ranges differ between the two biofluids because different bucketing methods were used to derive the features: classical bucketing (0.067 ppm width) for serum and intelligent bucketing (0.02 ppm ± 50% width) for urine. These exclusions were made so that the analysis would identify analytes other than the conventional glycemic markers.
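As an illustration of the glucose-region exclusion, the following sketch drops NMR bucket features whose chemical shift falls inside the listed ranges. It assumes that each bucket is identified by a single representative shift in ppm (for example its centre) and that the range limits are inclusive. Neither detail is specified above.

```python
# Glucose anomer regions excluded above (ppm); limits treated as inclusive (an assumption).
SERUM_GLUCOSE_PPM = [(3.215, 3.26), (3.377, 3.6), (3.68, 3.926), (4.6, 5.26)]
URINE_GLUCOSE_PPM = [(5.2, 5.3), (4.56, 4.7), (3.68, 3.92), (3.22, 3.56)]


def in_excluded_region(ppm, ranges):
    """True if the shift lies inside any (low, high) excluded range."""
    return any(low <= ppm <= high for low, high in ranges)


def drop_glucose_buckets(bucket_ppm, ranges):
    """Return the bucket shifts that lie outside every excluded glucose region."""
    return [ppm for ppm in bucket_ppm if not in_excluded_region(ppm, ranges)]


print(drop_glucose_buckets([1.33, 3.24, 3.5, 4.0, 5.0], SERUM_GLUCOSE_PPM))  # prints [1.33, 4.0]
```

The serum and urine range lists are kept separate because, as noted above, the two biofluids were bucketed differently.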


3.1 Cross-drug fingerprint

Our goal was to identify a set of analytes that can predict 8-week patient response to oral antidiabetic agents with diverse MOA. If a classifier could successfully predict treatment response (R vs. NR) across three diverse mechanisms, it could potentially be useful for predicting response to a new drug with a different MOA. Classification analysis was applied to data from 60 subjects who were treated with one of the three study drugs. The samples were divided into a training group of 46 subjects and a holdout group of 14. Both treatment type and class were balanced across the training and holdout groups. Results from each of the classification methods are summarized in Table 1A.

The cross-validation (CV) accuracy across the four classification methods ranged from 59% to 74%. The permutation procedure repeated cross-validation with randomized class labels. At most 9% of these permuted runs achieved better CV accuracy than the original; this maximum was observed for the Prediction Analysis for Microarrays (PAM) classifier (Tibshirani et al. 2002). In other words, the permutation P value ranged from 0.09 to less than 0.01, depending on the method. The number of analytes used by each classifier ranged from 5 to 190. The models were validated by predicting the responder status of the 14 subjects in the holdout group, with accuracy ranging from 50% to 71%. In particular, the T-test/Majority Vote (T-test) classifier using 75 analytes gave the best holdout prediction, with 71% accuracy. We noted that prediction accuracies were lower for glyburide than for metformin and rosiglitazone. The principal component analysis (PCA) plot in Fig. 2b shows that the 13 analytes (Supplementary Table 3) selected by all four models discriminate non-responders (open circles) from responders (solid circles). In contrast, using all 1,735 analytes did not separate the two groups (Fig. 2a). The means and standard deviations of the 13 discriminating cross-drug analytes for responders and non-responders at baseline and week 8 are shown in Supplementary Table 4.
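Selecting the analytes "selected by all four models" amounts to intersecting the feature sets chosen by each classifier. A minimal sketch follows. The analyte names are hypothetical; the study's actual selections are listed in Supplementary Table 3.

```python
def consensus_features(selections):
    """Return the analytes selected by every classifier, in sorted order."""
    sets = [set(features) for features in selections.values()]
    return sorted(set.intersection(*sets)) if sets else []


selected = {  # hypothetical selections, for illustration only
    "PLS-DA": ["analyte_a", "analyte_b", "analyte_c", "analyte_d"],
    "RF": ["analyte_b", "analyte_c", "analyte_e"],
    "PAM": ["analyte_a", "analyte_b", "analyte_c"],
    "T-test": ["analyte_b", "analyte_c", "analyte_f"],
}
print(consensus_features(selected))  # prints ['analyte_b', 'analyte_c']
```

Requiring agreement across methods with different inductive biases gives a shorter and more conservative candidate list than any single classifier's selection.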

T2DM is a disease with established biomarkers of disease severity and therapeutic efficacy. It is therefore important to establish whether classification using metabolomic platforms offers any advantage over the conventional glycemic biomarkers. When treatment response was predicted using only the three conventional markers at baseline (FPG, fructosamine, HbA1c), none of the classifiers yielded a statistically significant model (data not shown). This suggests that predicting treatment response requires additional data that represent the underlying biology more comprehensively, such as those acquired using metabolomics. A PCA plot using those three markers also showed intermixed responders and non-responders (Fig. 2c).

[Fig. 1 schematic: sample partition into a training set and a holdout set. The training set is split into four groups for 4-fold external cross-validation with parameter tuning (optimal number of analytes; CV accuracy: percent of samples correctly predicted). Holdout prediction gives the holdout accuracy. The permutation procedure repeats the external CV with randomized class labels in the training set (100–1,000 runs) and reports the percentage of runs with better CV accuracy.]

Fig. 1 Workflow for building the classifier and validating the model. The same workflow was applied to all four methods: Random Forest, Prediction Analysis for Microarrays, Partial Least Squares–Discriminant Analysis and T-test/Majority Vote. Samples were divided into a training set and a holdout set. The classifier was built in a 4-fold cross-validation (CV), in which the number of features used by the classifier was selected to give the best CV accuracy. The model was then validated through two procedures. The first was holdout prediction, since the holdout set had never been used to build the model or classifier. The second was the permutation procedure, in which the cross-validation was repeated for 100–1,000 runs (method dependent) with randomized class labels. The permutation percentage is the percentage of permutation runs that had better CV accuracy than the original CV accuracy.


3.2 Individual drug fingerprint

In this question, the goal was to find a set of analytes that

can predict patient response to a specific oral therapy:

rosiglitazone, metformin or glyburide. Since we only had

data available for *21 subjects per oral therapy, all sub-

jects were included in the cross-validation group.

Significant classifiers were obtained for predicting rosig-

litazone outcomes using metabolomic data prior to

treatment (Table 1B). CV accuracies ranged from 67% to

81% using 3–67 analytes. We noted that a classifier built

from three analytes using T-test/Majority Vote had a cross

validation accuracy of 81%. The three analytes were also

included in the list of features picked by the other three classifiers. Figure 3b shows the discriminating power of these three analytes (urine citrate, serum 1-methyl histidine, and serum IL-8): there is good separation between the responder (solid circles) and non-responder (open circles) groups, whereas using all 1,306 analytes included in this analysis does not separate the two groups (Fig. 3a). CV accuracies were also lower with the three conventional glycemic biomarkers (glucose, fructosamine, HbA1c) than with the set of metabolomic analytes. This is consistent with the PCA plot of the three conventional biomarkers alone, which shows no clear separation of responders from non-responders (Fig. 3c). The relationship between urinary citrate, serum 1-methyl histidine, and serum IL-8 and the three conventional diabetes markers (glucose, HbA1c and fructosamine) is shown in Supplementary Fig. 1.

Table 1 Cross-drug or individual drug classification results predicting week 8 treatment response using metabolomic data at baseline. (A) Cross-drug classification results. (B) Classification results for rosiglitazone-treated subjects. (C) Classification results for metformin-treated subjects; accuracy results for RF, PAM and T-test are not shown because these models did not pass the permutation test. (D) Classification results for rosiglitazone- or metformin-treated subjects; accuracy results for RF are not shown because the model did not pass the permutation test.

(A) Cross-drug classification results
Method | No. of analytes | CV accuracy (%), n = 46: R / M / G / T | % Permutation (# of permutations) | Holdout accuracy (%), n = 14: R / M / G / T
PLS-DA | 190 | 56 / 67 / 53 / 59 | 5.6 (500) | 60 / 75 / 60 / 64
RF | 20 | 94 / 87 / 73 / 85 | <1 (100) | 60 / 100 / 40 / 64
PAM | 138 | 69 / 80 / 53 / 67 | 9 (100) | 60 / 50 / 40 / 50
T-test | 75 | 62 / 87 / 73 / 74 | 3.7 (1000) | 80 / 100 / 41 / 71

(B) Classification results for rosiglitazone-treated subjects
Method | No. of analytes | CV accuracy (%), n = 21: NR / RS / T | % Permutation (# of permutations)
PLS-DA | 55 | 83 / 44 / 67 | 8 (500)
RF | 66 | 92 / 67 / 81 | 4 (100)
PAM | 67 | 92 / 56 / 76 | 5 (100)
T-test | 3 | 75 / 89 / 81 | 7.5 (1000)

(C) Classification results for metformin-treated subjects
Method | No. of analytes | CV accuracy (%): NR / RS / T | % Permutation (# of permutations)
PLS-DA | 110 | 56 / 80 / 68 | 1 (500)
RF | 71 | not shown | 37 (100)
PAM | 1141 | not shown | 15 (100)
T-test | 88 | not shown | 20 (1000)

(D) Classification results for rosiglitazone- or metformin-treated subjects
Method | No. of analytes | CV accuracy (%), n = 31: R / M / T | % Permutation (# of permutations) | Holdout accuracy (%), n = 9: R / M / T
PLS-DA | 10 | 62 / 73 / 68 | 0.8 (500) | 60 / 75 / 67
RF | 17 | not shown | 11 (100) | not shown
PAM | 64 | 75 / 80 / 77 | 1 (100) | 60 / 60 / 56
T-test | 13 | 69 / 80 / 74 | 7.5 (1000) | 80 / 100 / 89

The number of analytes is the optimal number that maximized prediction accuracy in cross-validation. % Permutation is the percentage of permutation runs that achieved a better CV accuracy than the original CV accuracy; the number in parentheses is the number of permutation runs, which was method dependent. R = rosiglitazone, M = metformin, G = glyburide, T = overall accuracy, RS = responder, NR = non-responder.
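The PCA score plots in Figs. 2 and 3 project each subject onto the leading principal components of the chosen baseline analytes. The sketch below shows that projection in plain Python so that it runs without third-party packages. It assumes the analytes are autoscaled (mean-centered and scaled to unit variance) before PCA, a common choice for metabolomic data that is not stated in this section; all function names are hypothetical, and in practice a library routine such as scikit-learn's `PCA` would normally be used.

```python
import math
import random


def autoscale(X):
    """Mean-center each column (analyte) and scale it to unit sample variance."""
    n, p = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(p)]
    sds = []
    for j in range(p):
        var = sum((row[j] - means[j]) ** 2 for row in X) / (n - 1)
        sds.append(math.sqrt(var) or 1.0)  # leave constant analytes unscaled
    return [[(row[j] - means[j]) / sds[j] for j in range(p)] for row in X]


def covariance(Z):
    """Sample covariance matrix (p x p) of a column-centered n x p matrix."""
    n, p = len(Z), len(Z[0])
    return [[sum(r[i] * r[j] for r in Z) / (n - 1) for j in range(p)] for i in range(p)]


def leading_eigenvectors(C, k, iters=500, seed=0):
    """Top-k eigenvectors of a symmetric matrix by power iteration with deflation."""
    rng = random.Random(seed)
    p = len(C)
    C = [row[:] for row in C]
    vectors = []
    for _ in range(k):
        v = [rng.random() + 0.1 for _ in range(p)]
        for _ in range(iters):
            w = [sum(C[i][j] * v[j] for j in range(p)) for i in range(p)]
            norm = math.sqrt(sum(x * x for x in w)) or 1.0
            v = [x / norm for x in w]
        eigenvalue = sum(v[i] * sum(C[i][j] * v[j] for j in range(p)) for i in range(p))
        vectors.append(v)
        # Remove this component so the next pass converges to the next one
        C = [[C[i][j] - eigenvalue * v[i] * v[j] for j in range(p)] for i in range(p)]
    return vectors


def pca_scores(X, k=2):
    """Scores of each subject (row of X) on the first k principal components."""
    Z = autoscale(X)
    vectors = leading_eigenvectors(covariance(Z), k)
    return [[sum(zj * vj for zj, vj in zip(z, v)) for v in vectors] for z in Z]
```

Calling `pca_scores(X)` on a matrix whose rows are subjects and whose three columns are, for example, baseline urine citrate, 1-methyl histidine and IL-8 returns two coordinates per subject, which could then be plotted and marked by responder status. The sign of each principal component is arbitrary, so a plot may appear mirrored relative to one produced by another implementation.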

For metformin-treated subjects, only PLS-DA yielded a classifier with a permutation percentage of less than 10% (i.e., P < 0.1); its CV accuracy was 68% with 110 analytes selected (Table 1C). For glyburide-treated subjects, none of the methods yielded a significant model predicting treatment outcome. This result is consistent with the observation in the cross-drug analysis above that the accuracy in classifying glyburide-treated subjects was the lowest of the three drugs.
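The permutation percentage in Table 1 (the percentage of label-permuted runs whose CV accuracy beats the original CV accuracy) acts as an empirical P value, and the 10% cut-off above corresponds to P < 0.1. The sketch below illustrates the idea under stated assumptions: leave-one-out cross-validation and a plain nearest-centroid classifier stand in for the study's actual CV scheme and classifiers (PLS-DA, RF, PAM and T-test), and "better" is taken to mean strictly higher accuracy. All function names are hypothetical.

```python
import random


def nearest_centroid_predict(train_X, train_y, x):
    """Assign x to the class whose mean profile (centroid) is closest in Euclidean distance."""
    centroids = {}
    for label in sorted(set(train_y)):  # sorted for a deterministic tie-break
        rows = [r for r, lab in zip(train_X, train_y) if lab == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return min(
        centroids,
        key=lambda lab: sum((a - b) ** 2 for a, b in zip(x, centroids[lab])),
    )


def loo_cv_accuracy(X, y):
    """Leave-one-out cross-validation accuracy of the nearest-centroid rule."""
    correct = 0
    for i in range(len(X)):
        train_X = X[:i] + X[i + 1:]
        train_y = y[:i] + y[i + 1:]
        if nearest_centroid_predict(train_X, train_y, X[i]) == y[i]:
            correct += 1
    return correct / len(X)


def permutation_percentage(X, y, n_perm=100, seed=0):
    """Observed CV accuracy and the percent of label permutations that beat it."""
    rng = random.Random(seed)
    observed = loo_cv_accuracy(X, y)
    better = 0
    for _ in range(n_perm):
        shuffled = list(y)
        rng.shuffle(shuffled)  # break any real link between analytes and response
        if loo_cv_accuracy(X, shuffled) > observed:
            better += 1
    return observed, 100.0 * better / n_perm
```

A call such as `observed, pct = permutation_percentage(X, y, n_perm=500)` followed by `pct < 10` gives the kind of pass/fail decision described above. In a full analysis, any analyte selection would have to be repeated inside every CV fold and every permutation; otherwise the permuted accuracies are biased.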

Fig. 2 PCA plot of subjects treated with the three study drugs. (a) Uses all 1,735 analytes at baseline; (b) uses 13 analytes at baseline, picked by all four classifiers, that are predictive of treatment response across all three drugs; (c) uses three conventional markers at baseline: glucose, fructosamine and HbA1c. Solid circles correspond to responders and open circles to non-responders.

Fig. 3 PCA plot of subjects treated with rosiglitazone. (a) Uses all 1,306 analytes at baseline; (b) uses three analytes at baseline, picked by all four classifiers, that are predictive of treatment response for rosiglitazone-treated subjects; (c) uses three conventional markers at baseline: glucose, fructosamine and HbA1c. Solid circles correspond to responders and open circles to non-responders.

3.3 Fingerprint for rosiglitazone and metformin

Because the glyburide-treated patients were the hardest to classify in both the cross-drug analysis and the glyburide-only analysis, we built classifiers for patients treated with either rosiglitazone or metformin, leaving out glyburide. Classification analysis was applied to data from 40 subjects treated with either rosiglitazone or metformin. The samples were divided into a training group of 31 subjects and a holdout group of nine. Both treatment type and response class were balanced between the training and holdout groups. Results from each of the classification methods are summarized in Table 1D.
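A balanced split of this kind can be produced with a stratified holdout: subjects are grouped by (treatment, response class) and the same fraction is drawn from each group. The sketch assumes each subject is a record with hypothetical keys `drug` and `responder`; the article does not state exactly how its 31/9 split was drawn, and per-group rounding means the holdout size can differ slightly from a fixed target.

```python
import random
from collections import defaultdict


def stratified_holdout(subjects, holdout_fraction, seed=0):
    """Split subjects into training and holdout lists, stratified on (drug, responder).

    Each subject is a dict with hypothetical keys "drug" and "responder".
    """
    rng = random.Random(seed)
    strata = defaultdict(list)
    for subject in subjects:
        strata[(subject["drug"], subject["responder"])].append(subject)

    train, holdout = [], []
    for key in sorted(strata):  # sorted so the result is reproducible for a given seed
        members = list(strata[key])
        rng.shuffle(members)
        n_hold = round(len(members) * holdout_fraction)
        holdout.extend(members[:n_hold])
        train.extend(members[n_hold:])
    return train, holdout
```

With 40 subjects and `holdout_fraction=9/40`, the holdout set would contain roughly nine subjects, the exact number depending on the group sizes.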

The cross-validation (CV) accuracy across the four classification methods ranged from 68% to 74%, and the optimal number of analytes selected ranged from 10 to 64. Models were validated by predicting the responder status of the nine subjects in the holdout group; holdout accuracy ranged from 56% to 89%. In particular, the T-test/Majority Vote (T-test) classifier using 13 analytes gave the best holdout prediction, with 89% accuracy.
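The T-test/Majority Vote classifier is not specified in detail in this section. One plausible reading, sketched below purely as an illustration, ranks analytes by the absolute Welch t statistic between the two response classes in the training data, keeps the top k, and lets each kept analyte vote for the class whose training mean is closer to a new subject's value; the class with more votes wins. The voting rule, the tie-breaking and all names are assumptions, not the authors' implementation.

```python
import math


def _mean_var(values):
    """Sample mean and (n - 1)-denominator variance."""
    m = sum(values) / len(values)
    return m, sum((v - m) ** 2 for v in values) / (len(values) - 1)


def welch_t(a, b):
    """Welch two-sample t statistic (unequal variances)."""
    ma, va = _mean_var(a)
    mb, vb = _mean_var(b)
    se = math.sqrt(va / len(a) + vb / len(b))
    return (ma - mb) / se if se > 0 else 0.0


def _by_class(column, y, labels):
    """Split one analyte's values into one list per class label."""
    return tuple([v for v, lab in zip(column, y) if lab == label] for label in labels)


def fit_ttest_vote(X, y, k):
    """Keep the k analytes with the largest |t| and remember each class mean for them."""
    labels = sorted(set(y))
    if len(labels) != 2:
        raise ValueError("expected exactly two classes (responder / non-responder)")
    columns = list(zip(*X))
    ranked = sorted(
        range(len(columns)),
        key=lambda j: abs(welch_t(*_by_class(columns[j], y, labels))),
        reverse=True,
    )
    means = {}
    for j in ranked[:k]:
        group0, group1 = _by_class(columns[j], y, labels)
        means[j] = (sum(group0) / len(group0), sum(group1) / len(group1))
    return labels, means


def predict_ttest_vote(model, x):
    """Each selected analyte votes for the nearer class mean; ties go to the first label."""
    labels, means = model
    votes = {label: 0 for label in labels}
    for j, (mean0, mean1) in means.items():
        winner = labels[0] if abs(x[j] - mean0) <= abs(x[j] - mean1) else labels[1]
        votes[winner] += 1
    return max(labels, key=lambda label: votes[label])
```

Fitting with `k=13` would match the size of the best-performing T-test model in Table 1D, but in a proper evaluation the ranking and selection would be refit within each cross-validation fold.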

3.4 Biological contextualization

Even with measures of accuracy and statistical significance, it is difficult to objectively assess the performance of multiple methods without applying them in practical studies. To better understand the biological relevance of the selected analytes, we examined whether any of them have previously been implicated in the pathophysiology of T2DM. Rosiglitazone, metformin and glyburide affect different biological processes through different mechanisms of action (MOA) and target tissues (Ahmann and Riddle 2002). It therefore seems intuitive that the analytes in a predictive classifier rule, if collectively predictive of a particular drug's treatment outcome, should be closely related to that drug's presumed MOA. This expectation is largely supported by our results.

For rosiglitazone responder prediction, the majority of the 74 analytes identified by at least one method and with known annotation (Supplement Table 5) are involved in the biological processes affected by rosiglitazone: increased lipogenesis in adipose tissue and increased insulin sensitivity and signaling in muscle and liver (Stumvoll and Haring 2002). Examples include energy metabolism (e.g., citrate, lactate), adipogenesis and release of adipokines (e.g., glycerol, leptin), immune or inflammatory response (IL-8, IL-12p40), fatty acid-induced insulin resistance in liver or muscle (total free fatty acid, insulin, PAPP-A, total TG and glycerol), and amino acid metabolism (Ile, Leu, Val, Pro, His, Tyr, Phe, Lys, etc.). However, quite a few analytes (such as cholesterol ester, diglyceride and nicotinamide) have not previously been implicated in T2DM or in the mechanism of PPAR-γ agonists.
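Several statements in this article summarize feature lists across the four classifiers: analytes "identified by at least one method", "picked by all four classifiers", or "identified by at least two of the computational methods". The helper below is a hypothetical sketch of that bookkeeping: it counts how many methods selected each analyte and keeps those that meet a threshold. The analyte names in the example are placeholders, not the study's actual feature lists.

```python
from collections import Counter


def consensus_analytes(selections, min_methods):
    """Analytes chosen by at least `min_methods` classifiers, most widely chosen first.

    `selections` maps a method name to the analytes that method selected.
    """
    counts = Counter()
    for chosen in selections.values():
        counts.update(set(chosen))  # count each analyte at most once per method
    keep = [analyte for analyte, n in counts.items() if n >= min_methods]
    return sorted(keep, key=lambda analyte: (-counts[analyte], analyte))


# Placeholder selections; not results from the study.
selections = {
    "PLS-DA": ["analyte_a", "analyte_b", "analyte_c"],
    "RF": ["analyte_a", "analyte_b"],
    "PAM": ["analyte_a", "analyte_d"],
    "T-test": ["analyte_a", "analyte_b", "analyte_e"],
}
union = consensus_analytes(selections, 1)         # at least one method
at_least_two = consensus_analytes(selections, 2)  # ['analyte_a', 'analyte_b']
all_four = consensus_analytes(selections, 4)      # ['analyte_a']
```

Raising `min_methods` trades breadth for robustness: the union keeps every candidate for biological review, while the intersection keeps only analytes that no method disagreed on.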

For metformin responder prediction, the 72 markers identified by PLS-DA and with known annotation were similarly enriched in biological processes potentially involved in metformin action (Supplement Table 6). Metformin is thought to produce an energy ‘sink’ in the liver, possibly mediated via the energy-sensing AMP kinase system, resulting in decreased hepatic lipogenesis and gluconeogenesis (Kirpichnikov et al. 2002). Accordingly, many of the highlighted analytes were lipids, and most of the non-omic markers were also lipid-related, such as apoB, cholesterol and free fatty acid. Another large component of the metformin responder marker list was amino acids, which are essential substrates for gluconeogenesis.

It is somewhat surprising that our attempts to identify classifier rules for glyburide from baseline analytes failed. This suggests that glyburide response may not be predictable from disease severity or from other readily discernible metabolite or lipid patterns. It also suggests that the analytes detected on the “open profiling” metabolomic platforms do not include strong baseline correlates of insulin reserve, a presumed requirement for effective glyburide action. Understanding this will require further exploration.

For cross-drug fingerprints, the analytes will by definition be less revealing of drug class-specific mechanisms, because the classification engines must select what is common to two or more of the drugs. These cross-drug analytes are more likely to reflect markers of glucose lowering per se, and less likely to identify markers indicative of either a physiological subtype (e.g., insulin resistance) or a treatment-specific mechanism of action (e.g., increased adipose lipogenesis).

The three analytes measured at week 0 that were most predictive of the week 8 response to rosiglitazone were serum IL-8, serum 1-methyl histidine measured by NMR (with medium confidence in annotation) and urine citrate (with high confidence in annotation). The levels of each of the three analytes at week 0 and week 8, grouped by treatment response, are shown as box plots in Fig. 4.

The level of urine citrate at baseline was significantly lower in responders than in non-responders (P < 0.001). Eight weeks of treatment did not change the level of urine citrate in non-responders, but it did increase urine citrate in responders (not statistically significant). Citrate may play a critical role in cataplerosis and glucose-regulated insulin release (Flamez et al. 2002). Because citrate was not quantified in plasma or liver, it is hard to pinpoint the actual biochemical context for its change. It could be related to uncontrolled gluconeogenesis in liver tissue. However, it cannot be ruled out that the higher citrate excretion might also reflect increased citrate production in renal tubular cells or reduced citrate reabsorption from the tubular fluid due to glucose overflow. Increased excretion of urinary citrate has been observed in previous NMR studies of diabetic human subjects (Salek et al. 2007; Zuppi et al. 1998).

Serum 1-methyl histidine at baseline was higher in responders than in non-responders (P = 0.0016). In a diabetic state, many alternative sources of energy are used when tissue glucose concentration and utilization are low, including enhanced degradation of proteins and amino acids (Dice and Walker 1979). Altered excretion of 3-methyl histidine is a well-established indicator of the degree of skeletal muscle protein degradation (Chinkes 2005; Young and Munro 1978). 1-Methyl histidine is usually attributed to the degradation of anserine and has been related to increased oxidation of muscle proteins (Wishart et al. 2007), but its biological context in diabetes is unclear. Our results suggest that subjects with a higher concentration of 1-methyl histidine were more likely to respond to rosiglitazone treatment.

Serum IL-8 at baseline was higher in responders than in non-responders (P = 0.032). IL-8 is an important cytokine in the inflammatory process. Its production is stimulated by high glucose concentrations in endothelial cells in vitro, and it has chemotactic activity for polymorphonuclear neutrophils (which play an important role in the pathogenesis of chronic complications of diabetes), as well as for T-lymphocytes and smooth muscle cells. Serum IL-8 levels have been reported to be markedly increased in diabetic patients (Zozulinska et al. 1999). We observed this in our study, and it has also been reported that one of the effects of rosiglitazone treatment is to reduce the apparent inflammation associated with obesity and diabetes (Belvisi et al. 2006). It therefore seems consistent that subjects with higher IL-8 levels were more responsive to rosiglitazone treatment.

4 Concluding remarks

The multivariate methods used here to identify the classifier rules have unique value in identifying analytes that do not necessarily declare themselves in more conventional statistical analyses, such as correlation or univariate change approaches. Although such analytes may not stand out individually, when used in a relational way with the other markers in the list they may unmask other non-obvious elements of disease biology or treatment effect.

A challenge in diabetes clinical trials and treatment is to better tailor individual drug assignment to the patient's disease stage and underlying pathophysiology. Our study suggests that there may be an advantage to using metabolomics, in addition to standard laboratory parameters and clinical decision making, to construct pretreatment classifier rules for predicting subsequent treatment response.

Our study is a relatively small clinical pharmacometabolomic investigation, consisting of two post-dosing time points. Its main aim was an initial exploration of whether high-content “open profiling” metabolomic platforms can help refine drug development by providing more accurate predictors of therapeutic efficacy or safety. Open profiling metabolomics should always be seen as an initial screen that in turn leads to further validation work. This validation would in the first instance be experimental, using bespoke assays for the analytes highlighted by open profiling; once findings were shown to hold ‘true’, cross-study computational and biological biomarker validation would commence. The key markers identified in this manuscript, including citrate, IL-8 and methyl histidine, in addition to branched-chain amino acid degradation products and others identified by at least two of the computational methods, are the subject of an ongoing follow-up study.

[Fig. 4: box plots in three panels, (A) Urine citrate, (B) 1-methyl histidine, (C) IL-8, each comparing groups Week 0-N, Week 0-R, Week 8-N and Week 8-R; week 0 R vs N: p = 0.00076 (A), p = 0.0016 (B), p = 0.032 (C)]

Fig. 4 (a) Urine citrate measured by NMR in rosiglitazone responders (R) and non-responders (N) at week 0 and week 8. (b) Serum 1-methyl histidine measured by NMR in rosiglitazone responders and non-responders at week 0 and week 8. (c) Serum IL-8 in rosiglitazone responders and non-responders at week 0 and week 8. n = 9 for responders and n = 12 for non-responders.

Acknowledgements We thank Lipomics Technologies Inc. for generating fatty acid data, the Netherlands Organisation for Applied Scientific Research (TNO) for serum NMR, BG Medicine Inc. for polar and lipid metabolite LC and GC MS, and Drs Brian C Sweatman, Rachel Ball, Azmina Mather and Baljit Sall for acquisition of urine NMR data. We would also like to thank Dr. Chris Keefer, James Robert, Robert Vermeulen and Nikheel Kolatkar for valuable review of this manuscript.

References

Ahmann, A. J., & Riddle, M. C. (2002). Current oral agents for type 2 diabetes. Many options, but which to choose when? Postgraduate Medicine, 111, 32–40, 43.

Bastard, J. P., Maachi, M., Lagathu, C., et al. (2006). Recent advances in the relationship between obesity, inflammation, and insulin resistance. European Cytokine Network, 17, 4–12.

Belvisi, M. G., Hele, D. J., & Birrell, M. A. (2006). Peroxisome proliferator-activated receptor gamma agonists as therapy for chronic airway inflammation. European Journal of Pharmacology, 533, 101–109. doi:10.1016/j.ejphar.2005.12.048.

Breiman, L. (2001). Random forests. Machine Learning, 45, 5–32. doi:10.1023/A:1010933404324.

Chinkes, D. L. (2005). Methods for measuring tissue protein breakdown rate in vivo. Current Opinion in Clinical Nutrition and Metabolic Care, 8, 534–537. doi:10.1097/01.mco.0000170754.25372.37.

Dice, J. F., & Walker, C. D. (1979). Protein degradation in metabolic and nutritional disorders. Ciba Foundation Symposium, 75, 331–350.

Flamez, D., Berger, V., Kruhoffer, M., Orntoft, T., Pipeleers, D., & Schuit, F. C. (2002). Critical role for cataplerosis via citrate in glucose-regulated insulin release. Diabetes, 51, 2018–2024. doi:10.2337/diabetes.51.7.2018.

Fu, L. M., & Fu-Liu, C. S. (2004). Multi-class cancer subtype classification based on gene expression signatures with reliability analysis. FEBS Letters, 561, 186–190. doi:10.1016/S0014-5793(04)00175-9.

Hellberg, S., Sjostrom, M., & Wold, S. (1986). The prediction of bradykinin potentiating potency of pentapeptides. An example of a peptide quantitative structure-activity relationship. Acta Chemica Scandinavica. Series B: Organic Chemistry and Biochemistry, 40, 135–140.

Hom, F. G., Ettinger, B., & Lin, M.-J. (1998). Comparison of serum fructosamine vs. glycohemoglobin as measures of glycemic control in a large diabetic population. Acta Diabetologica, 35, 48–51. doi:10.1007/s005920050100.

Howey, J. E. A., Bennet, W. M., Browning, M. C. K., Jung, R. T., & Fraser, C. G. (1989). Clinical utility of assays of glycosylated haemoglobin and serum fructosamine compared: Use of data on biological variation. Diabetic Medicine, 6, 793–796.

Kapetanovic, I. M., Rosenfeld, S., & Izmirlian, G. (2004). Overview of commonly used bioinformatics methods and their applications. Annals of the New York Academy of Sciences, 1020, 10–21. doi:10.1196/annals.1310.003.

Kilpatrick, E. S. (1997). Problems in the assessment of glycaemic control in diabetes mellitus. Diabetic Medicine, 14, 819–831. doi:10.1002/(SICI)1096-9136(199710)14:10<819::AID-DIA459>3.0.CO;2-A.

Kirpichnikov, D., McFarlane, S. I., & Sowers, J. R. (2002). Metformin: An update. Annals of Internal Medicine, 137, 25–33.

Laakso, M. (2002). Lipids in type 2 diabetes. Seminars in Vascular Medicine, 2, 59–66. doi:10.1055/s-2002-23096.

Li, L., Tang, H., Wu, Z., et al. (2004). Data mining techniques for cancer detection using serum proteomic profiling. Artificial Intelligence in Medicine, 32, 71–83. doi:10.1016/j.artmed.2004.03.006.

Ma, X. J., Wang, Z., Ryan, P. D., et al. (2004). A two-gene expression ratio predicts clinical outcome in breast cancer patients treated with tamoxifen. Cancer Cell, 5, 607–616. doi:10.1016/j.ccr.2004.05.015.

Meyerson, M., & Carbone, D. (2005). Genomic and proteomic profiling of lung cancers: Lung cancer classification in the age of targeted therapy. Journal of Clinical Oncology, 23, 3219–3226. doi:10.1200/JCO.2005.15.511.

Ostenson, C. G. (2001). The pathophysiology of type 2 diabetes mellitus: An overview. Acta Physiologica Scandinavica, 171, 241–247. doi:10.1046/j.1365-201x.2001.00826.x.

Petersen, J. L., & McGuire, D. K. (2005). Impaired glucose tolerance and impaired fasting glucose – A review of diagnosis, clinical implications and management. Diabetes & Vascular Disease Research; Official Journal of the International Society of Diabetes and Vascular Disease, 2, 9–15. doi:10.3132/dvdr.2005.007.

Petricoin, E. F., Ardekani, A. M., Hitt, B. A., et al. (2002). Use of proteomic patterns in serum to identify ovarian cancer. Lancet, 359, 572–577. doi:10.1016/S0140-6736(02)07746-2.

Picardi, A., & Pozzilli, P. (2003). Dynamic tests in the clinical management of diabetes. Journal of Endocrinological Investigation, 26(7, Suppl), 99–106.

Radmacher, M. D., McShane, L. M., & Simon, R. (2002). A paradigm for class prediction using gene expression profiles. Journal of Computational Biology, 9, 505–511. doi:10.1089/106652702760138592.

Raponi, M., Zhang, Y., Yu, J., et al. (2006). Gene expression signatures for predicting prognosis of squamous cell and adenocarcinomas of the lung. Cancer Research, 66, 7466–7472. doi:10.1158/0008-5472.CAN-06-1191.

Salek, R. M., Maguire, M. L., Bentley, E., et al. (2007). A metabolomic comparison of urinary changes in type 2 diabetes in mouse, rat, and human. Physiological Genomics, 29, 99–108. doi:10.1152/physiolgenomics.00194.2006.

Stumvoll, M., & Haring, H. U. (2002). Glitazones: Clinical effects and molecular mechanisms. Annals of Medicine, 34, 217–224. doi:10.1080/713782132.

Tahara, Y., & Shima, K. (1995). Kinetics of HbA1c, glycated albumin, and fructosamine and analysis of their weight functions against preceding plasma glucose level. Diabetes Care, 18, 440–447. doi:10.2337/diacare.18.4.440.

Tibshirani, R., Hastie, T., Narasimhan, B., & Chu, G. (2002). Diagnosis of multiple cancer types by shrunken centroids of gene expression. Proceedings of the National Academy of Sciences of the United States of America, 99, 6567–6572. doi:10.1073/pnas.082099299.

van ’t Veer, L. J., Dai, H., van de Vijver, M. J., et al. (2002). Gene expression profiling predicts clinical outcome of breast cancer. Nature, 415, 530–536. doi:10.1038/415530a.

Wishart, D. S., Tzur, D., Knox, C., et al. (2007). HMDB: The human metabolome database. Nucleic Acids Research, 35, D521–D526. doi:10.1093/nar/gkl923.

Young, V. R., & Munro, H. N. (1978). Nτ-methylhistidine (3-methylhistidine) and muscle protein turnover: An overview. Federation Proceedings, 37, 2291–2300.

Ziegler, D. (2005). Type 2 diabetes as an inflammatory cardiovascular disorder. Current Molecular Medicine, 5, 309–322. doi:10.2174/1566524053766095.

Zozulinska, D., Majchrzak, A., Sobieska, M., Wiktorowicz, K., & Wierusz-Wysocka, B. (1999). Serum interleukin-8 level is increased in diabetic patients. Diabetologia, 42, 117–118. doi:10.1007/s001250051124.

Zuppi, C., Messana, I., Forni, F., Ferrari, F., Rossi, C., & Giardina, B. (1998). Influence of feeding on metabolite excretion evidenced by urine 1H NMR spectral profiles: A comparison between subjects living in Rome and subjects living at arctic latitudes (Svaldbard). Clinica Chimica Acta, 278, 75–79. doi:10.1016/S0009-8981(98)00132-6.