
Chemometrics and Intelligent Laboratory Systems 112 (2012) 33–40

Contents lists available at SciVerse ScienceDirect

Chemometrics and Intelligent Laboratory Systems

journal homepage: www.elsevier.com/locate/chemolab

Statistical two-dimensional correlation spectroscopy of urine and serum from metabolomics data

Jingjing Xu a, Shuhui Cai a,⁎, Xuejun Li b, Jiyang Dong a,⁎, Jun Ding a, Zhong Chen a

a Department of Electronic Science, Fujian Provincial Key Laboratory of Plasma and Magnetic Resonance, State Key Laboratory of Physical Chemistry of Solid Surfaces, Xiamen University, Xiamen 361005, China
b The First Affiliated Hospital of Xiamen University, Xiamen 361001, China

⁎ Corresponding authors. E-mail addresses: [email protected] (S. Cai), jydong@xmu.edu.cn (J. Dong).

0169-7439/$ – see front matter © 2012 Elsevier B.V. All rights reserved. doi:10.1016/j.chemolab.2012.01.003

Article info

Article history:
Received 3 September 2011
Received in revised form 8 December 2011
Accepted 7 January 2012
Available online 15 January 2012

Keywords:
NMR
Metabolomics
Urine-serum correlation
PLS-DA

Abstract

Statistical two-dimensional correlation spectroscopy combined with pattern recognition is demonstrated for the coanalysis of NMR spectroscopic data from different sources, taking the urine and serum 1H NMR spectra from metabolomics datasets of diabetes and hyperthyroidism as examples. The intrinsic covariance of certain molecules between urine and serum spectra is identified, and the highly urine-serum-correlated metabolites are further analyzed using projection to latent structures discriminant analysis (PLS-DA). To illustrate the applicability of the method, the diabetes and hyperthyroidism datasets are imported separately to calculate the corresponding two-dimensional urine-serum correlation coefficient matrices. The results show that creatinine (δ 4.08) and succinate (δ 2.45) are highly correlated between urine and serum from diabetes patients, whereas choline (δ 3.21) and pyruvate (δ 2.33) are highly correlated between urine and serum from hyperthyroidism patients. This study offers a new perspective for interpreting metabolomics data and demonstrates the potential of correlation analysis of spectra from different biological sources as a new systems biology tool.

© 2012 Elsevier B.V. All rights reserved.

1. Introduction

Two-dimensional (2D) correlation spectroscopy originated in NMR spectroscopy [1]. Its ability to determine the order of spectral band changes, together with its superior deconvolution ability, has made it widely used in optical spectroscopy. Practical application of so-called generalized 2D correlation spectroscopy was enabled by Noda in 1986 [2]. A set of 2D infrared correlation spectra was constructed for sinusoidally varying dynamic infrared signals by applying a simple cross-correlation analysis [3,4]. 2D covariance analysis [5], 2D correlation coefficient analysis [6] and the full generalized 2D correlation method [7] were reported and applied to infrared, near-infrared, Raman and other types of spectroscopy [8–12].

The application of generalized correlation analysis [7] to processing 2D arrays of NMR data was first reported by Eads and Noda [13]. Covariance NMR spectroscopy was introduced subsequently to make the spectra amenable to a wide range of multivariate statistical analysis methods [14]. Recently, 2D correlation analysis of metabonomics data, termed statistical total correlation spectroscopy (STOCSY), has been pursued by combining classification algorithms with correlation analysis [15]. STOCSY is not restricted to 1H–1H correlation and has been applied to heteronuclear correlation with other NMR-active nuclei, such as 31P–1H [16,17] and 19F–1H [18]. In addition, other analytical techniques such as mass spectrometry (MS), gas chromatography (GC) and liquid chromatography (LC) have been integrated with NMR in statistical heterospectroscopy (SHY) [19,20] to obtain biological information on metabolic pathways and structural connectivity by examining the correlation matrix.

In the present study, we extend the 2D correlation method to the coanalysis of urine and serum 1H NMR spectral data. In traditional NMR-based metabolomics analysis [21,22], well-established multivariate statistical analysis and pattern recognition methods have been combined to extract sample classification information and associated biomarkers from biological sources such as urine, serum or tissue. Different sample types reflect different metabolic patterns of biological perturbation and provide complementary information. However, less attention has been paid to their cross-correlation. To extract the cross-correlated information between urine and serum, a statistical 2D correlation spectrum was plotted by calculating the Pearson correlation coefficient matrix between the urine and serum NMR spectra. The formula for this Pearson correlation coefficient matrix is identical to that used in STOCSY [15]. Metabolites with high correlation coefficients were highlighted in the statistical urine-serum correlation spectrum. Combining these correlation results with those from pattern recognition analysis led us to further identify molecules in the same metabolic pathway whose concentrations were interdependent or under some common regulatory mechanism.

In this study, urine and serum 1H NMR spectra from metabolomics datasets of diabetes and hyperthyroidism are used to obtain the statistical urine-serum correlation spectra. The metabolites with high correlation coefficients on the correlation map are analyzed, which allows covisualization of the same molecules in different biological sources. Moreover, statistical urine-serum correlation spectroscopy is integrated with partial least squares discriminant analysis (PLS-DA) to interpret the significance of the highly urine-serum-correlated metabolites. It is worth mentioning that the focus of the present study is on data analysis rather than on the biochemical interpretation of group discrimination.

2. Experimental section

2.1. Diabetes dataset

Human urine and serum samples were obtained from a study of type 2 diabetes mellitus (T2DM) conducted by the First Affiliated Hospital of Xiamen University and Xiamen Diabetes Research Institute [23]. The experimental protocol was approved by the local ethics committee, and all subjects gave written informed consent.

In this dataset, forty-two male subjects aged 30–60 years with a documented history of stable T2DM, who had previously been treated with diet and exercise alone, monotherapy or low-dose combination therapy, were considered eligible. Fasting plasma glucose at screening could not exceed 225 mg/dL for subjects treated with diet and exercise alone, or 180 mg/dL for subjects receiving monotherapy or low-dose combination therapy. Thirty-three healthy male volunteers confirmed to be free of any underlying disease served as the control group. Urine and serum samples were collected from each volunteer, transferred into vials containing 1% sodium azide solution to minimize microbiological contamination, centrifuged at 4 °C to remove solid particles (3000 rpm for 10 min for urine and 6000 rpm for 10 min for serum), and finally stored at −80 °C until NMR acquisition.

2.2. Hyperthyroidism dataset

Twelve male volunteers with hyperthyroidism, aged 19–45 years, were recruited from a metabolomics study of hyperthyroidism conducted by the First Affiliated Hospital of Xiamen University [24]. Urine and serum were collected from each of them, and sample preparation was the same as for the diabetes dataset described above. Twelve subjects were randomly selected from the healthy volunteers of the diabetes dataset as the control group.

2.3. NMR spectroscopy

For both urine and serum samples, a volume of 0.4 mL was mixed with 0.2 mL phosphate buffer solution (0.2 M Na2HPO4/NaH2PO4, pH 7.4, 99.9% D2O) to minimize variation in pH. 1 mM TSP (3-(trimethylsilyl)-1-propionic-2,2,3,3-d4 acid sodium salt) was used as an internal reference standard at δ 0.0. The mixture was pipetted into a 5 mm NMR tube. The 1H NMR measurements were performed on a Varian (Palo Alto, CA, USA) NMR System 500 MHz spectrometer equipped with a triple-resonance probe. The temperature was set to 298 K, and 90° pulse lengths were calibrated individually for each sample.

For the urine samples, a conventional presaturation pulse sequence for solvent suppression based on the one-dimensional version of the NOESY pulse sequence, known as NOESYPR (nuclear Overhauser effect spectroscopy with presaturation: delay-90°-t1-90°-tm-90°-acquisition), was used [25]. The delay t1 was set to 2 μs and the mixing time tm to 120 ms, and weak irradiation was applied to suppress the water signal during the 2 s recycle delay and the mixing time. For the serum samples, an additional CPMG (Carr-Purcell-Meiboom-Gill) [26] spin-echo pulse train with a total relaxation delay (2nτ) of 100 ms and an echo time (τ) of 250 μs was incorporated into the NOESYPR sequence above. A total of 256 scans with a spectral width of 5 kHz were collected for all NMR spectra. All signals were zero-filled to 16 k before Fourier transformation (FT).
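The acquisition parameters above can be cross-checked with a little arithmetic. The Python sketch below is only an illustration under stated assumptions: it takes "16 k" to mean 16,384 points and uses the nominal 500 MHz 1H frequency for the Hz-to-ppm conversion. It derives the number of CPMG loops implied by 2nτ = 100 ms and τ = 250 μs, and the digital resolution after zero-filling.

```python
# Sanity-check arithmetic for the acquisition parameters quoted above.
# Assumptions (not stated in the text): "16 k" means 16384 points, and the
# nominal 1H frequency of 500 MHz is used to convert Hz to ppm.

SPECTRAL_WIDTH_HZ = 5000.0      # 5 kHz spectral width
ZERO_FILLED_POINTS = 16 * 1024  # "zero-filled to 16 k" (assumed 16384)
SPECTROMETER_MHZ = 500.0        # nominal 1H Larmor frequency
TOTAL_ECHO_TIME_S = 100e-3      # 2*n*tau = 100 ms (CPMG, serum only)
TAU_S = 250e-6                  # tau = 250 us


def cpmg_loop_count(total_echo_time_s: float, tau_s: float) -> int:
    """Number of CPMG loops n such that 2*n*tau equals the total echo time."""
    return round(total_echo_time_s / (2.0 * tau_s))


def hz_per_point(spectral_width_hz: float, n_points: int) -> float:
    """Digital resolution of the Fourier-transformed spectrum."""
    return spectral_width_hz / n_points


def hz_to_ppm(hz: float, spectrometer_mhz: float) -> float:
    """Convert a frequency difference in Hz to ppm (delta units)."""
    return hz / spectrometer_mhz


n_loops = cpmg_loop_count(TOTAL_ECHO_TIME_S, TAU_S)
res_hz = hz_per_point(SPECTRAL_WIDTH_HZ, ZERO_FILLED_POINTS)
res_ppm = hz_to_ppm(res_hz, SPECTROMETER_MHZ)
bin_hz = 0.002 * SPECTROMETER_MHZ  # width of a delta 0.002 bin in Hz

print(f"CPMG loops n = {n_loops}")
print(f"digital resolution = {res_hz:.3f} Hz/point = {res_ppm:.5f} ppm/point")
print(f"delta 0.002 bin = {bin_hz:.1f} Hz")
```

Under these assumptions the δ 0.002 storage grid used later corresponds to about 1 Hz, i.e. roughly three digital points of the zero-filled spectrum.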

The NMR spectra were manually phased, baseline corrected and referenced to the TSP resonance (δ 0.0). Peaks were aligned manually, assisted by an adaptive intelligent binning algorithm [27] written in MATLAB R2009a (MathWorks Inc., Natick, MA), before the correlation matrix of urine and serum was calculated.

2.4. Statistical urine-serum correlation spectroscopy

The NMR spectra were smoothed with a Gaussian function to reduce the influence of noise on the correlation matrix and stored at a resolution of δ 0.002, which retained the intrinsic resolution of the NMR measurement. Each spectrum was then normalized to its total integrated area to account for overall variation in sample concentration. For simplicity, only identical metabolites located at identical coordinates of the urine and serum spectra were considered. The δ 6.0–9.5 region of the urine spectra, containing resonances of aromatic compounds, was ignored because the corresponding resonances are absent from the serum spectra. The δ 0.5–4.5 region of the urine spectra, containing representative aliphatic compounds, was imported into MATLAB to calculate the Pearson correlation coefficient matrix with the same region of the serum spectra. The regions containing the TSP signal (δ 0–0.5), the water signal (δ 4.5–6.0) and pure noise (beyond δ 9.5) were excluded from the analysis.
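The preprocessing chain just described (Gaussian smoothing, total-area normalization, selection of the δ 0.5–4.5 region) can be sketched in NumPy. This is a minimal illustration, not the authors' MATLAB code: it assumes the spectra are already phased, referenced and placed on a common δ 0.002 grid, and the Gaussian width `sigma_points` is a hypothetical placeholder because the text does not report one.

```python
import numpy as np


def gaussian_smooth(spectra: np.ndarray, sigma_points: float) -> np.ndarray:
    """Convolve each spectrum (row) with a unit-area Gaussian kernel.

    sigma_points is the kernel standard deviation in data points; the paper
    does not report the width it used, so any value here is a placeholder.
    """
    half = int(np.ceil(4 * sigma_points))
    offsets = np.arange(-half, half + 1)
    kernel = np.exp(-0.5 * (offsets / sigma_points) ** 2)
    kernel /= kernel.sum()
    return np.array([np.convolve(row, kernel, mode="same") for row in spectra])


def total_area_normalize(spectra: np.ndarray) -> np.ndarray:
    """Divide each spectrum by its summed intensity (total integrated area)."""
    return spectra / spectra.sum(axis=1, keepdims=True)


def select_region(spectra: np.ndarray, ppm: np.ndarray, lo: float = 0.5, hi: float = 4.5):
    """Keep only the columns whose chemical shift lies in [lo, hi]."""
    mask = (ppm >= lo) & (ppm <= hi)
    return spectra[:, mask], ppm[mask]


def preprocess(spectra, ppm, sigma_points=1.5, region=(0.5, 4.5)):
    """Smooth, normalize to total area, then cut out the aliphatic region."""
    smoothed = gaussian_smooth(spectra, sigma_points)
    normalized = total_area_normalize(smoothed)
    return select_region(normalized, ppm, *region)


# Hypothetical example: 4 random positive "spectra" on a delta 0.002 grid.
ppm = np.round(np.arange(0.0, 10.0, 0.002), 3)
rng = np.random.default_rng(0)
raw = rng.random((4, ppm.size)) + 0.1
X, ppm_sel = preprocess(raw, ppm)
print(X.shape, ppm_sel.min(), ppm_sel.max())
```

Because smoothing is linear and the normalization divides by the total area, multiplying a whole sample by a dilution factor leaves the preprocessed spectrum unchanged, which is the point of the total-area step.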

The statistical urine-serum correlation spectrum is plotted based on the properties of the covariance matrix $\mathrm{Cov} = \frac{1}{n-1} X_1^{\mathsf{T}} X_2$. Here $X_1$ and $X_2$ represent the autoscaled experimental matrices with dimensions $n \times \upsilon_1$ and $n \times \upsilon_2$, respectively; $n$ is the number of samples, and $\upsilon_1$ and $\upsilon_2$ are the numbers of variables in the spectral matrices $X_1$ and $X_2$. Because both matrices are autoscaled, each element of Cov is the Pearson correlation coefficient between one urine variable and one serum variable. In general, $X_1$ and $X_2$ can represent datasets not only from NMR spectroscopy of 1H or other NMR-active nuclei such as 31P and 19F, but also from any other spectroscopic, electrochemical or multivariate analytical measurement in which the same batch of samples is measured by more than one technique. In the present study, $X_1$ and $X_2$ denote the urine and serum NMR spectral matrices, respectively. In both the diabetes and hyperthyroidism datasets, samples from the control and experimental groups were ordered by serial number to ensure that the rows of the urine and serum matrices correspond. The Pearson correlation coefficients between the intensities of urine and serum NMR bins were then calculated with the MATLAB command ‘corr’, and their statistical significance was evaluated by p value. The correlation maps are presented as contour plots with a colorbar. If the p value was greater than 0.05, the correlation coefficient was considered to be due to chance and removed from the correlation map. In addition, in each diagram any correlation coefficient whose absolute value fell below the correlation cutoff (r = 0.40 for the diabetes dataset and r = 0.65 for the hyperthyroidism dataset) was set to zero.
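A minimal NumPy sketch of the filtered urine-serum correlation map follows. It computes $\frac{1}{n-1} X_1^{\mathsf{T}} X_2$ on autoscaled matrices, drops coefficients with p > 0.05, and applies an |r| cutoff. The paper used MATLAB's `corr`, which reports exact t-based p values. To stay dependency-free, this sketch swaps in the Fisher z approximation, and all function names are hypothetical. The helper `critical_r` shows how the smallest significant |r| shrinks as the number of paired samples grows. Under this approximation it is roughly 0.57 for 12 pairs and 0.30 for 42 pairs.

```python
import math
from statistics import NormalDist

import numpy as np


def autoscale(X: np.ndarray) -> np.ndarray:
    """Mean-center each column and divide by its sample standard deviation."""
    sd = X.std(axis=0, ddof=1)
    sd[sd == 0] = 1.0  # constant bins stay at zero instead of dividing by 0
    return (X - X.mean(axis=0)) / sd


def cross_correlation(X1: np.ndarray, X2: np.ndarray) -> np.ndarray:
    """Cov = X1^T X2 / (n - 1) on autoscaled matrices, i.e. Pearson r.

    Rows of X1 (urine) and X2 (serum) must refer to the same subjects in
    the same order.
    """
    n = X1.shape[0]
    if X2.shape[0] != n:
        raise ValueError("X1 and X2 must have the same number of rows (samples)")
    return autoscale(X1).T @ autoscale(X2) / (n - 1)


def approx_p_values(R: np.ndarray, n: int) -> np.ndarray:
    """Two-sided p values for H0: rho = 0 via the Fisher z approximation."""
    z = np.arctanh(np.clip(R, -0.999999, 0.999999)) * math.sqrt(n - 3)
    return np.vectorize(lambda v: math.erfc(abs(v) / math.sqrt(2.0)))(z)


def filtered_map(X1: np.ndarray, X2: np.ndarray, cutoff: float, alpha: float = 0.05) -> np.ndarray:
    """Zero out coefficients with p > alpha or |r| < cutoff."""
    n = X1.shape[0]
    R = cross_correlation(X1, X2)
    keep = (approx_p_values(R, n) <= alpha) & (np.abs(R) >= cutoff)
    return np.where(keep, R, 0.0)


def critical_r(n: int, alpha: float = 0.05) -> float:
    """Smallest |r| reaching two-sided significance alpha for n pairs (Fisher z)."""
    z_crit = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    return math.tanh(z_crit / math.sqrt(n - 3))


for n_pairs in (12, 24, 42, 75):
    print(n_pairs, round(critical_r(n_pairs), 3))
```

In use, `filtered_map(urine, serum, cutoff=0.40)` would return the matrix that is contoured; the cutoff values themselves are the ones quoted in the text.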

2.5. Pattern recognition analysis

A supervised pattern recognition method, PLS-DA [28,29], was used to model the diabetes and hyperthyroidism datasets from both urine and serum NMR spectra. PLS-DA is a partial least squares regression of a set of binary variables Y, describing the categories of a categorical variable, on a set of predictor variables X [30]. It projects the original variables X onto latent variables that focus on class separation, thereby giving better classification. The cross-validation parameter Q² [31] was computed to test the PLS-DA models for over-fitting.
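PLS-DA with a cross-validated Q² can be illustrated with a small NumPy sketch. The text does not say which software or cross-validation scheme was used, so this is a stand-in under explicit assumptions. It uses NIPALS PLS1 with a single 0/1-coded class vector y, autoscaling inside each fold, and leave-one-out cross-validation with Q² = 1 − PRESS/TSS. The function names are hypothetical.

```python
import numpy as np


def pls1_fit(X: np.ndarray, y: np.ndarray, n_components: int = 1):
    """NIPALS PLS1 on autoscaled X and mean-centred y (binary class codes).

    Returns (coef, x_mean, x_sd, y_mean) so that a prediction is
    ((X_new - x_mean) / x_sd) @ coef + y_mean.
    """
    x_mean = X.mean(axis=0)
    x_sd = X.std(axis=0, ddof=1)
    x_sd[x_sd == 0] = 1.0
    E = (X - x_mean) / x_sd          # X residual matrix
    y_mean = float(y.mean())
    f = y - y_mean                   # y residual vector
    W, P, Q = [], [], []
    for _ in range(n_components):
        w = E.T @ f
        w /= np.linalg.norm(w)       # weight vector
        t = E @ w                    # scores
        tt = t @ t
        p = E.T @ t / tt             # X loadings
        q = (f @ t) / tt             # y loading
        E = E - np.outer(t, p)       # deflate X
        f = f - q * t                # deflate y
        W.append(w)
        P.append(p)
        Q.append(q)
    W, P, Q = np.column_stack(W), np.column_stack(P), np.array(Q)
    coef = W @ np.linalg.solve(P.T @ W, Q)  # B = W (P^T W)^-1 q
    return coef, x_mean, x_sd, y_mean


def pls1_predict(model, X_new: np.ndarray) -> np.ndarray:
    coef, x_mean, x_sd, y_mean = model
    return ((X_new - x_mean) / x_sd) @ coef + y_mean


def q2_leave_one_out(X: np.ndarray, y: np.ndarray, n_components: int = 1) -> float:
    """Q^2 = 1 - PRESS / TSS, each sample predicted by a model built without it."""
    n = X.shape[0]
    press = 0.0
    for i in range(n):
        keep = np.arange(n) != i
        model = pls1_fit(X[keep], y[keep], n_components)
        y_hat = pls1_predict(model, X[i:i + 1])[0]
        press += (y[i] - y_hat) ** 2
    tss = float(np.sum((y - y.mean()) ** 2))
    return 1.0 - press / tss


# Hypothetical data: 20 controls (y = 0) and 20 cases (y = 1), 20 bins,
# of which only the first three differ between the classes.
rng = np.random.default_rng(0)
y = np.repeat([0.0, 1.0], 20)
X = rng.normal(size=(40, 20))
X[:, :3] += 3.0 * y[:, None]
print(f"Q2 = {q2_leave_one_out(X, y):.3f}")
```

Q² is computed from predictions on held-out samples, so it drops toward zero or below when the class labels carry no real signal. That behaviour is what makes it a guard against over-fitting.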


After PLS-DA modeling, two sets of regression coefficients corresponding to the two datasets were obtained. The regression coefficient (i.e. loading) of a given variable reflects the association between that variable and the categorical variable Y. Cloarec et al. [32] proposed a method that computes a model on autoscaled data and then back-transforms the loadings by multiplying each value by the standard deviation of the corresponding variable. The back-transformed loadings are plotted with a colorbar corresponding to the importance of each variable for discrimination. Inspired by this method, we plotted the PLS-DA loadings of the best-classifying component with a colorbar representing the 1D correlation coefficients between a selected variable and all other variables. In this way, the correlation can be interpreted directly on a trace with the same shape as an NMR spectrum, in which the contributions of the variables to group discrimination are also visible.
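The back-transformation and the color-coded trace can be sketched in NumPy as follows. The sketch assumes coefficients from a model fitted on autoscaled data and multiplies them by each variable's standard deviation, following the Cloarec et al. idea summarized above. It then builds consecutive line segments so that each segment can be colored by a per-variable quantity such as a driver correlation. The helper names are hypothetical, and the segment layout is simply the one matplotlib's `LineCollection` accepts.

```python
import numpy as np


def back_scaled_coefficients(coef_autoscaled: np.ndarray, X: np.ndarray) -> np.ndarray:
    """Multiply coefficients of an autoscaled model by each variable's SD.

    Autoscaling gives every bin unit variance, so raw coefficients do not look
    like a spectrum; multiplying by the original standard deviations restores
    the spectral line shape while keeping the sign of each coefficient.
    """
    return coef_autoscaled * X.std(axis=0, ddof=1)


def colored_line_segments(ppm: np.ndarray, values: np.ndarray) -> np.ndarray:
    """Consecutive (ppm, value) point pairs with shape (m - 1, 2, 2).

    This is the segment layout accepted by matplotlib's LineCollection, so
    each segment can be colored by a per-variable quantity such as r.
    """
    points = np.column_stack([ppm, values])
    return np.stack([points[:-1], points[1:]], axis=1)


# Hypothetical data: three bins with standard deviations near 1, 2 and 3.
rng = np.random.default_rng(4)
X = rng.normal(size=(20, 3)) * np.array([1.0, 2.0, 3.0])
bs = back_scaled_coefficients(np.ones(3), X)
segs = colored_line_segments(np.array([4.0, 3.0, 2.0]), bs)
print(bs.round(2), segs.shape)
```

With matplotlib, `LineCollection(segs, cmap="jet")` followed by `set_array(r[:-1])` would color each segment by the correlation value at its left end. The colormap is an arbitrary choice here.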

3. Results and discussion

The ability of STOCSY to decipher the structures of metabolites in biological samples has been demonstrated [15]. Different resonances from the same molecule are highly correlated (theoretically r = 1), although in practice r may be less than 1 because of spectral noise or peak overlap. Furthermore, molecules in the same biochemical pathway may also show a secondary, somewhat weaker, high correlation because of their similar or even codependent responses to a stimulus. This is useful in metabolomics for analyzing the metabolic pathways disturbed under pathological conditions or toxic insult.
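The claim that two resonances of one molecule correlate almost perfectly, and that peak overlap lowers this, can be checked on simulated data. In this illustrative sketch all numbers are hypothetical. Each resonance intensity is modeled as a fixed multiple of the molecule's concentration plus small noise, and the "overlap" case adds an unrelated metabolite to the second resonance.

```python
import numpy as np

rng = np.random.default_rng(7)
n_samples = 40

# Hypothetical concentrations of one molecule across subjects.
conc = rng.lognormal(mean=0.0, sigma=0.5, size=n_samples)

# Two resonances of that molecule: intensities proportional to concentration
# (fixed proton-count ratio) plus a little spectral noise.
peak_a = 2.0 * conc + 0.01 * rng.normal(size=n_samples)
peak_b = 3.0 * conc + 0.01 * rng.normal(size=n_samples)

# The same second resonance when an unrelated metabolite overlaps with it.
other = rng.lognormal(mean=0.0, sigma=0.5, size=n_samples)
peak_b_overlapped = peak_b + 1.5 * other

r_clean = np.corrcoef(peak_a, peak_b)[0, 1]
r_overlap = np.corrcoef(peak_a, peak_b_overlapped)[0, 1]
print(f"same molecule: r = {r_clean:.4f}; with overlap: r = {r_overlap:.3f}")
```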

In this study, the metabolic behavior of the same metabolite in urine and serum is detected by calculating the Pearson correlation coefficient matrix between urine and serum NMR spectra. Finding the cross-correlation between urine and serum is significant for two reasons. First, fresh blood is generally obtained from humans by venipuncture or from animals by cardiac or abdominal exsanguination. The whole blood is kept without anticoagulant and left to clot at room temperature for 1 h before centrifugation to obtain serum. Compared with serum, urine collection is more accessible, noninvasive and time-saving. So if a key metabolite were found to be significantly correlated between serum and urine spectra, it would be more convenient and efficient to recognize variations in response to a stimulus from urine rather than from serum samples. This is particularly relevant to diagnosing diabetes mellitus (DM), since the main diagnostic approach is still plasma examination. If one or more indicators in plasma were highly correlated with those in urine, noninvasive diagnosis of DM could potentially be realized. Second, statistical urine-serum correlation spectroscopy can serve as a normalization standard in metabolomics preprocessing. Calculating the Pearson correlation coefficient is only a preliminary step in integrating urine and serum data; to coanalyze metabolic information from different biological sources, a comprehensive multivariate statistical method is needed for detailed examination. Metabolites with high urine-serum correlation coefficients indicate that their concentrations vary consistently between the two biofluids. If the top 10 or more such metabolites were selected to construct a normalized standard spectrum, the differences in concentration between urine and serum samples could be removed or minimized by normalizing the spectra to this standard. After this operation, it may be possible to integrate urine and serum NMR spectral data directly into one dataset for a complete multivariate statistical analysis.

3.1. Statistical urine-serum correlation spectroscopy of diabetes dataset

After the urine and serum NMR spectra of the T2DM group were acquired, the region from δ 0.5 to 4.5 containing aliphatic metabolite signals was chosen for the correlation analysis. The resulting contour plot, with a colorbar on the right indicating the correlation coefficient level, is given in Fig. 1. Typical urine and serum NMR spectra are shown along the top and left sides of the plot. No correlation coefficient exceeds 0.80 or falls below −0.80 at any spectral point, which may be attributed to asynchronous spectral profiles and the different signal-to-noise ratios of the serum and urine NMR spectra. A threshold of r = 0.40 was selected for the contours, i.e. coefficients between −0.40 and 0.40 were ignored. Positive and negative correlation coefficients are denoted by red and blue, respectively. A positive sign indicates a synchronous increase or decrease of signal intensity in the urine and serum NMR spectra, and a negative sign indicates the opposite.

The correlation between a metabolite in serum and another metabolite in urine (i.e. an off-diagonal point) can be read directly from Fig. 1, but it is too complicated to find a reasonable biochemical explanation for such a correlation. Therefore, in the present study, only the metabolites showing covariance between urine and serum are considered and explained, i.e. the points of interest lie exclusively on the diagonal of the statistical urine-serum correlation map. The highly correlated points are identified as follows: creatinine (δ 4.08) and succinate (δ 2.45) are positively correlated, whereas D-glucose (δ 3.68–3.80) is negatively correlated. This implies that, for the diabetic group, if creatinine and succinate increase in serum, they also increase in urine, and vice versa.

Fig. 1. Two-dimensional statistical urine-serum correlation spectroscopy of the aliphatic region of NMR spectra of the T2DM group (cutoff 0.40). The diagonal peaks at δ 4.08 and 2.45 indicated with small rectangles are from creatinine and succinate, respectively.
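Restricting attention to the diagonal, as done for Fig. 1, amounts to reading R[k, k] for each shared chemical-shift bin. The NumPy sketch below extracts that diagonal and ranks the most positively correlated bins. It also shows one possible reading of the proposed "normalization standard", dividing each spectrum by its summed intensity over the selected bins. The text does not specify how that standard spectrum would be built, so this part is an assumption. Note also that the ranking works on bins, not metabolites; neighbouring bins of one peak would need to be grouped in practice. All names and the toy correlation values are hypothetical.

```python
import numpy as np


def diagonal_profile(R: np.ndarray) -> np.ndarray:
    """Same-shift urine-serum correlations R[k, k] (square map, shared grid)."""
    if R.shape[0] != R.shape[1]:
        raise ValueError("diagonal analysis needs urine and serum on the same grid")
    return np.diag(R).copy()


def top_diagonal_bins(R: np.ndarray, k: int = 10):
    """Indices and values of the k bins with the largest diagonal correlation."""
    d = diagonal_profile(R)
    order = np.argsort(d)[::-1][:k]
    return order, d[order]


def normalize_to_reference_bins(spectra: np.ndarray, bin_idx: np.ndarray) -> np.ndarray:
    """Divide each spectrum by its summed intensity over the selected bins.

    One possible reading of the proposed normalization standard; the text
    does not specify how the standard spectrum would be built.
    """
    ref = spectra[:, bin_idx].sum(axis=1, keepdims=True)
    return spectra / ref


# Hypothetical correlation map on a delta 0.01 grid with two diagonal hot spots.
ppm = np.round(np.arange(0.5, 4.5, 0.01), 2)
R = np.zeros((ppm.size, ppm.size))
i_cr = int(np.argmin(np.abs(ppm - 4.08)))
i_su = int(np.argmin(np.abs(ppm - 2.45)))
R[i_cr, i_cr], R[i_su, i_su] = 0.78, 0.72
idx, r = top_diagonal_bins(R, k=2)
print(ppm[idx], r)
```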

Although creatinine (δ 4.08) and succinate (δ 2.45) are highly correlated between the urine and serum of the diabetic group, the metabolic behavior of these two metabolites and their relationships with other metabolites remain unknown. To interpret the correlation results, PLS-DA was introduced. The score plots of the 1H NMR spectra from the diabetic and control groups are shown in Fig. 2. Using Hotelling's T² 95% confidence ellipse, some outliers (two in the urine model and four in the serum model) were identified. Whether to remove these outliers from the PLS-DA models depends on the biological plausibility of their corresponding spectra. Cross-validation was performed for both models, giving Q² values of 0.158 for the urine model and 0.368 for the serum model. The points of the diabetic and control groups are dispersed in the urine model, and five control samples are embedded in the diabetic group. Compact clustering of the two groups is visible in the serum model, with three control samples approaching the diabetic group. Despite some overlap, the diabetic and control groups are still clearly separated in both the urine and serum models, indicating inherent metabolic differences under the diabetic condition. The variations of many compounds in urine or serum in response to DM have been studied previously [33,34], so their biochemical significance will not be discussed again here.

Fig. 2. PLS-DA score plots of 1H NMR spectra (aliphatic region δ 0.5–4.5) from the diabetic subjects (1–42, circles) and the control subjects (43–75, inverted triangles). (a) Score plot of the urine samples; (b) score plot of the serum samples. The ellipse shows the Hotelling's T² range (significance level 0.05).
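The Hotelling's T² ellipse used to flag outliers in the score plots can be sketched with NumPy alone. The text does not state which form of the T² limit was used. This sketch assumes the common two-component form A(n − 1)/(n − A) · F(A, n − A; 0.95) with A = 2. For df1 = 2 the F quantile has a closed form, which avoids needing a statistics library. The function names and the simulated scores are hypothetical.

```python
import numpy as np


def f_quantile_df1_2(prob: float, df2: float) -> float:
    """Exact quantile of the F(2, df2) distribution.

    For df1 = 2 the CDF is 1 - (1 + 2x/df2) ** (-df2/2), which inverts in
    closed form, so no statistics library is needed.
    """
    return 0.5 * df2 * ((1.0 - prob) ** (-2.0 / df2) - 1.0)


def t2_limit(n: int, alpha: float = 0.05) -> float:
    """T^2 limit for a two-component score plot, using the common form
    A (n - 1) / (n - A) * F(A, n - A; 1 - alpha) with A = 2."""
    a = 2
    return a * (n - 1) / (n - a) * f_quantile_df1_2(1.0 - alpha, n - a)


def hotelling_t2(scores: np.ndarray) -> np.ndarray:
    """Hotelling's T^2 of each sample from the first two score columns."""
    t = scores[:, :2] - scores[:, :2].mean(axis=0)
    return np.sum(t ** 2 / t.var(axis=0, ddof=1), axis=1)


def ellipse_semi_axes(scores: np.ndarray, alpha: float = 0.05) -> np.ndarray:
    """Semi-axes of the confidence ellipse along the two score directions."""
    var = scores[:, :2].var(axis=0, ddof=1)
    return np.sqrt(var * t2_limit(scores.shape[0], alpha))


# Hypothetical scores: 40 samples, one planted far from the rest.
rng = np.random.default_rng(5)
scores = rng.normal(size=(40, 2))
scores[0] = [8.0, 8.0]
outliers = np.flatnonzero(hotelling_t2(scores) > t2_limit(len(scores)))
print("flagged samples:", outliers)
```

As the text notes, a flagged sample is only a candidate for removal. Whether to exclude it is a separate judgement based on its spectrum.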

To improve the interpretability of the PLS-DA models, the autocorrelation matrices of the urine and serum spectra were calculated separately [15]. The PLS-DA regression coefficients (corresponding to the first PLS component) were then combined with the autocorrelation coefficients in two steps. First, one significant variable (creatinine or succinate in our study) is selected as a driver to obtain 1D Pearson correlation coefficients. Second, the PLS regression coefficients are plotted as a function of chemical shift, with a color code representing the correlation between the selected variable and every other variable. In this way, the resonances that discriminate between the groups are highlighted. Because of the intrinsic correlation between the driver variable and the variables that covary with it, the metabolic pathway relevant to the compound can then be identified.
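The two steps above can be sketched in NumPy. The sketch computes the 1D correlation vector for the bin nearest a driver chemical shift, which is the quantity used as the color code. It then lists the bins whose correlation exceeds a threshold, together with the direction implied by the sign of the PLS-DA coefficient. It assumes a spectral matrix `X` with matching `ppm` axis and a coefficient vector `coef` from a PLS-DA model with the case group coded y = 1. The function names, the threshold `r_min` and the simulated data are hypothetical.

```python
import numpy as np


def driver_correlation(X: np.ndarray, ppm: np.ndarray, driver_ppm: float):
    """Pearson r between the bin nearest driver_ppm and every bin of X.

    Returns (r, j) where j is the column index used as the driver.
    """
    j = int(np.argmin(np.abs(ppm - driver_ppm)))
    sd = X.std(axis=0, ddof=1)
    sd[sd == 0] = 1.0
    Z = (X - X.mean(axis=0)) / sd
    r = Z.T @ Z[:, j] / (X.shape[0] - 1)
    return r, j


def correlated_bins(coef: np.ndarray, r: np.ndarray, ppm: np.ndarray, r_min: float = 0.7):
    """Bins with |r| >= r_min and the direction implied by the PLS-DA coefficient.

    'higher' assumes the case group was coded y = 1 and controls y = 0.
    """
    idx = np.flatnonzero(np.abs(r) >= r_min)
    return [(float(ppm[i]), float(r[i]), "higher" if coef[i] > 0 else "lower") for i in idx]


# Hypothetical data: bin 12 shares most of its variation with bin 5.
rng = np.random.default_rng(2)
ppm = np.linspace(4.5, 0.5, 20)
X = rng.normal(size=(30, 20))
X[:, 12] = 0.8 * X[:, 5] + 0.05 * rng.normal(size=30)
r, j = driver_correlation(X, ppm, driver_ppm=ppm[5])
for shift, corr, direction in correlated_bins(np.ones(20), r, ppm):
    print(f"delta {shift:.2f}: r = {corr:.2f}, {direction} in cases")
```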

Creatinine (δ 4.08) was first chosen as the ‘driver’ to calculate the correlation coefficients with the other variables in the urine and serum NMR spectra. The results are shown in Fig. 3, where the signal direction indicates a higher (positive) or lower (negative) concentration of metabolites in the diabetic group relative to the control group. In the urine model, the resonance at δ 3.05 (creatinine) shows the highest correlation with that at δ 4.08 (r > 0.95) because both arise from the same molecule, and their intensities are positively correlated with the diabetic group, i.e. they increase relative to the control group. A slight distortion observed at δ 3.05 is probably due to overlap with the creatine peak at δ 3.04. In addition, creatinine (δ 4.08) is highly correlated with the D-glucose region (δ 3.68–3.80), succinate (δ 2.45) and unassigned variables at δ 2.91 and 1.75. In the serum model, however, the regression coefficient of creatinine (δ 4.08) contributes little to differentiating the two groups and is confounded with nearby noise, and the correlations of creatinine with other variables are less significant.

The same analysis was performed for succinate (δ 2.45), and the PLS-DA regression coefficient plot is shown in Fig. 4. In the urine model, the succinate resonance at δ 2.45 is highly correlated with acetate (δ 1.93, r > 0.85) and the unassigned region δ 2.83–2.95 (r > 0.75). The creatinine resonances at δ 4.08 and 3.05 are clearly correlated with succinate (r > 0.70). The spectral intensities of these metabolites all increase significantly in the diabetic group according to the PLS analysis, consistent with previous reports [35]. Acetate contributes most to group discrimination, as it has the largest regression coefficient. This is related to overt renal insufficiency and indicates mild renal complications in diabetes patients. In the serum model, succinate shows the highest correlation with variables at around δ 2.92 and 1.73, which may come from several metabolites such as glutarate (about δ 1.80) and methylguanidine/trimethylamine (about δ 2.88) [36]. Confirmation of these metabolites requires 2D NMR experiments assisted by mass spectrometric detection, which is beyond the scope of the present study.

3.2. Statistical urine-serum correlation spectroscopy of hyperthyroidism dataset

Hyperthyroidism is the term for overactivity of the thyroid gland, which leads to overproduction of thyroid hormones. The gold-standard clinical indicators for diagnosing hyperthyroidism are thyroxine (T4) and/or triiodothyronine (T3) in blood. A preliminary metabolomic study of hyperthyroidism has been performed, and the possible characteristic metabolites in urine and serum have been reported [24]. Our concern in this paper is the metabolites showing high correlation between the urine and serum of hyperthyroidism patients. As with the diabetes dataset, a statistical urine-serum correlation spectrum was obtained from the hyperthyroidism dataset (Fig. 5).

The correlation between the urine and serum NMR spectra of the hyperthyroidism group is much stronger than that of the diabetic group, as reflected in the correlation cutoffs employed: 0.65 for the hyperthyroidism group in Fig. 5 and 0.40 for the diabetic group in Fig. 1. The larger correlation coefficients for the hyperthyroidism dataset are possibly due to its smaller number of samples together with greater randomness relative to the diabetes dataset; calculating p values is therefore necessary to estimate this randomness. It can be seen from Fig. 5 that the correlation level is globally higher in the hyperthyroidism group, implying a closer relationship between the urine and serum of any given individual. Fig. 5 reveals that choline at δ 3.21 and pyruvate at δ 2.33 are highly correlated between serum and urine, with r > 0.9. The increase of choline in serum is probably related to the disruption of membrane permeability in hyperthyroidism [37]. The rate of pyruvate uptake has been found to be significantly enhanced in heart mitochondria from triiodothyronine-treated rats, with pyruvate thereby decreased in serum [38]. A detailed analysis of the urine-serum correlations of choline and pyruvate is carried out using the PLS-DA technique.

Fig. 3. PLS-DA regression coefficient plots of the diabetic group with a colorbar representing the correlations between the selected variable at δ 4.08 (creatinine) and other variables.

37 J. Xu et al. / Chemometrics and Intelligent Laboratory Systems 112 (2012) 33–40

Fig. 4. PLS-DA regression coefficient plots of the diabetic group with a colorbar representing the correlations between the selected variable at δ 2.45 (succinate) and other variables.

Fig. 5. Two-dimensional statistical urine-serum correlation spectroscopy of the aliphatic region of NMR spectra of the hyperthyroidism group (cutoff 0.65). The diagonal peaks at δ 3.21 and 2.33 indicated with small rectangles are from choline and pyruvate, respectively.

Fig. 6. PLS-DA score plots of 1H NMR spectra (aliphatic region δ 0.5–4.5) from the hyperthyroidism subjects (1–12, circles) and the control subjects (13–24, inverted triangles). (a) Score plot of the urine samples; (b) Score plot of the serum samples. The ellipse shows the Hotelling's T2 range (significance level 0.05).

The PLS-DA score plots of the 1H NMR spectra from the hyperthyroidism and control groups are shown in Fig. 6. The Hotelling's T2 ellipse at the 95% confidence level helps identify one serum sample (No. 8) as an outlier, which was therefore excluded from the PLS-DA serum model (Fig. 6b). Cross-validation was performed for both the urine and serum models. The cumulative cross-validated Q2 values were 0.448 for the urine model and 0.397 for the serum model. The discrimination between the hyperthyroidism and control groups is clear in both the urine and serum PLS-DA models. Moreover, the data points from the hyperthyroidism group in both models are more dispersed than those from the control group, which is attributed to pathology-induced individual variability. The results of these two PLS-DA models show that the urine and serum 1H NMR spectra of the two groups are clearly discriminated, which makes interpretation of the PLS-DA regression coefficients possible.
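The two model diagnostics used above, the Hotelling's T2 ellipse for outliers and the cross-validated Q2, can be sketched as follows. This is an illustrative NumPy sketch under stated assumptions, not the software used in the study: the PLS-DA model is reduced to a single-component PLS1 regression on a 0/1 class vector, Q2 is computed by leave-one-out cross-validation as 1 − PRESS/SS, and the T2 limit uses one common form, A(n−1)/(n−A)·F(0.95; A, n−A), with the F quantile supplied by the caller (for example from `scipy.stats.f.ppf`). All function names are hypothetical.

```python
import numpy as np

# Hypothetical helpers; single-component PLS1 is a simplification of PLS-DA.


def fit_pls1_one_component(X, y):
    """One-component PLS1 (NIPALS, single y). Returns means and coefficients."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    w = Xc.T @ yc
    w /= np.linalg.norm(w)          # unit-length weight vector
    t = Xc @ w                      # scores on the first component
    q = (t @ yc) / (t @ t)          # y-loading
    return x_mean, y_mean, w * q    # with one component, b = w * q


def predict(model, X):
    x_mean, y_mean, b = model
    return y_mean + (X - x_mean) @ b


def q2_leave_one_out(X, y):
    """Q2 = 1 - PRESS / SS; each sample is predicted by a model fitted without it."""
    n = len(y)
    press = 0.0
    for i in range(n):
        keep = np.arange(n) != i
        model = fit_pls1_one_component(X[keep], y[keep])
        press += (y[i] - predict(model, X[i:i + 1])[0]) ** 2
    return 1.0 - press / np.sum((y - y.mean()) ** 2)


def hotelling_t2(scores):
    """Hotelling's T2 of each row of an (n, A) score matrix."""
    d = scores - scores.mean(axis=0)
    inv_cov = np.linalg.inv(np.cov(scores, rowvar=False))
    return np.einsum("ij,jk,ik->i", d, inv_cov, d)


def t2_limit(n, a, f_crit):
    """One common T2 limit: A(n-1)/(n-A) * F_crit, F_crit = F(0.95; A, n-A)."""
    return a * (n - 1) / (n - a) * f_crit


# Synthetic demo: 12 "patients" and 12 "controls", group effect on 5 bins.
rng = np.random.default_rng(0)
X = rng.normal(size=(24, 20))
y = np.repeat([1.0, 0.0], 12)
X[:12, :5] += 4.0
print(q2_leave_one_out(X, y))

# Two-component scores with one planted outlier in the last row.
scores = np.vstack([rng.normal(size=(23, 2)), [15.0, 15.0]])
t2 = hotelling_t2(scores)
# 3.44 approximates F(0.95; 2, 22), e.g. scipy.stats.f.ppf(0.95, 2, 22)
print(np.flatnonzero(t2 > t2_limit(24, 2, 3.44)))
```

Commercial chemometrics packages may use a slightly different T2 limit and k-fold rather than leave-one-out cross-validation, so numbers from this sketch should not be expected to match the Q2 values reported above.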

As with the diabetes dataset, the highly correlated metabolites choline (δ 3.21) and pyruvate (δ 2.33) were chosen as “drivers” to draw the PLS-DA regression coefficient plots (corresponding to the first PLS component), color-coded according to correlation coefficients. As can be seen from the urine-model plot in Fig. 7, choline has a small regression coefficient, indicating a small contribution to the PLS-DA model. Its correlations to other variables are too weak to be discerned, except to creatine at δ 3.04 (r = 0.68). In the serum-model plot, the choline peak at δ 3.21 is slightly broadened and overlaps with the background lipid signal. It shows strong correlations (r > 0.7) to the lactate doublet at δ 1.32 and the acetone singlet at δ 2.21. In addition, parts of the lipid signals at δ 0.84 (CH3(CH2)n) and δ 1.22 (CH3CH2CH2) [39] are correlated to choline, indicating that abnormal choline metabolism in hyperthyroidism patients co-occurs with lipid-related metabolic disorders.
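The driver-based coloring can be sketched as follows. This is a minimal sketch under assumptions, not the authors' implementation: it takes one binned spectral matrix (urine or serum) and a 0/1 class vector, uses a single-component PLS1 regression as a stand-in for the first PLS-DA component, and applies no variable scaling; the function names are hypothetical. Each variable gets two numbers: its regression coefficient (the plotted height) and its Pearson correlation with the driver bin (the color).

```python
import numpy as np

# Hypothetical helpers; one-component PLS1 stands in for the first PLS-DA component.


def first_component_coefficients(X, y):
    """Regression coefficients of a one-component PLS1 model (centred, unscaled)."""
    Xc, yc = X - X.mean(axis=0), y - y.mean()
    w = Xc.T @ yc
    w /= np.linalg.norm(w)
    t = Xc @ w
    return w * (t @ yc) / (t @ t)


def driver_correlations(X, driver):
    """Pearson correlation of every column of X with column `driver`."""
    std = X.std(axis=0)
    std[std == 0] = 1.0  # constant bins get zero correlation instead of NaN
    Z = (X - X.mean(axis=0)) / std
    return Z.T @ Z[:, driver] / X.shape[0]


def colored_coefficient_profile(X, y, driver):
    """Per-variable (coefficient, color value) pairs for a driver-colored plot."""
    return first_component_coefficients(X, y), driver_correlations(X, driver)


# Synthetic demo: bin 1 tracks the driver bin 0; bin 2 is raised in group 1.
rng = np.random.default_rng(2)
y = np.repeat([1.0, 0.0], 12)
X = rng.normal(size=(24, 6))
X[:, 1] = 2.0 * X[:, 0] + 0.1 * rng.normal(size=24)
X[:12, 2] += 3.0
coef, color = colored_coefficient_profile(X, y, driver=0)
# Plot coef against chemical shift, using color as the colorbar value.
print(np.round(coef, 3), np.round(color, 3))
```

With a single component the coefficient vector is proportional to the covariance of each variable with the class vector, so its sign gives the direction of change between groups. Real PLS-DA software with scaling and several components will give different magnitudes.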

The role of pyruvate is easy to recognize from the PLS-DA regression coefficient plot (Fig. 8). In the urine model, pyruvate at δ 2.33 is highly correlated to the acetone singlet at δ 2.21 (r = 0.93). The concentrations of these two metabolites are both decreased in the hyperthyroidism group relative to the control group. The decreased level of pyruvate in urine is consistent with its enhanced uptake rate in heart mitochondria and its decreased serum level in the rat model [38]. In the serum model, however, the position of pyruvate is ambiguous due to the wiggling baseline and unassigned resonances nearby. Its correlations to other variables are therefore disordered and have no biological meaning.

Fig. 7. PLS-DA regression coefficient plots of the hyperthyroidism group with a colorbar representing the correlations between the selected variable at δ 3.21 (choline) and other variables.

Fig. 8. PLS-DA regression coefficient plots of the hyperthyroidism group with a colorbar representing the correlations between the selected variable at δ 2.33 (pyruvate) and other variables.

4. Conclusion

In this paper we attempt to interpret metabolomic data from a new point of view, combining statistical correlation coefficient analysis with PLS-DA-based pattern recognition. The urine and serum NMR spectroscopic data were imported to calculate the Pearson correlation coefficient matrix, whose diagonal indicates the covariation of the same variable in the two biofluids. The variables exhibiting high correlation between urine and serum were picked out, and PLS-DA regression coefficient plots with a colorbar expressing the correlation between these ‘highlighted’ metabolites and the other variables of the NMR spectra were drawn and discussed. The diabetes and hyperthyroidism datasets were taken as examples to describe the analysis process, and biological relationships arising from disrupted metabolic pathways were obtained.

Nowadays, fully integrated statistical analysis of information from various ‘omics’ levels, various analytical techniques, or various biological sources has become a trend among researchers seeking to improve the understanding of biosystems. This work focuses on the relationships between urine and serum in metabolomics datasets and is an exploratory study of data integration. Statistical urine-serum correlation spectroscopy provides a biological framework in which the characteristic patterns arising from biological perturbation complement the results of traditional metabolomics analysis.

Acknowledgement

This work was supported by the NNSF of China (11074209), the Science Research Foundation of Ministry of Health & United Fujian Provincial Health and Education Project for Tackling the Key Research (WKJ2008-2-36), and Science and Technology Bureau of Xiamen (WSK0501).

References

[1] W.P. Aue, E. Bartholdi, R.R. Ernst, Two-dimensional spectroscopy. Application to nuclear magnetic resonance, Journal of Chemical Physics 64 (1976) 2229–2246.

[2] I. Noda, Two-dimensional infrared (2D IR) spectroscopy of synthetic and biopolymer, Bulletin of the American Physical Society 31 (1986) 520.

[3] I. Noda, Two-dimensional infrared spectroscopy, Journal of the American Chemical Society 111 (1989) 8116–8118.

[4] I. Noda, Two-dimensional infrared (2D IR) spectroscopy: theory and applications, Applied Spectroscopy 44 (1990) 550–561.

[5] C. Marcott, I. Noda, A.E. Dowrey, Enhancing the information content of vibrational spectra through sample perturbation, Analytica Chimica Acta 250 (1991) 131–143.

[6] F.E. Barton, D.S. Himmelsbach, J.H. Duckworth, M.J. Smith, Two-dimensional vibration spectroscopy: correlation of mid- and near-infrared regions, Applied Spectroscopy 46 (1992) 420–429.

[7] I. Noda, Generalized two-dimensional correlation method applicable to infrared, Raman, and other types of spectroscopy, Applied Spectroscopy 47 (1993) 1329–1336.

[8] Y. Ozaki, Y. Wang, Two-dimensional near infrared correlation spectroscopy: principle and its applications, Journal of Near Infrared Spectroscopy 6 (1998) 19–31.

[9] I. Noda, W.M. Allen, S.E. Lindberg, Two-dimensional Raman correlation spectroscopy study of an emulsion copolymerization reaction process, Applied Spectroscopy 63 (2009) 224–232.

[10] S. Sasic, An in-depth analysis of Raman and near-infrared chemical images of common pharmaceutical tablets, Applied Spectroscopy 61 (2007) 239–250.

[11] S. Sasic, Y. Ozaki, Wavelength-wavelength and sample-sample two-dimensional correlation analyses of short-wave near-infrared spectra of raw milk, Applied Spectroscopy 55 (2001) 163–172.

[12] I. Noda, Y. Ozaki, Chapter 2: Principle of Two-dimensional Correlation Spectroscopy, Two-dimensional Correlation Spectroscopy: Applications in Vibrational and Optical Spectroscopy, John Wiley & Sons Ltd, Chichester, 2004, pp. 15–38.

[13] C.D. Eads, I. Noda, Generalized correlation NMR spectroscopy, Journal of the American Chemical Society 124 (2002) 1111–1118.

[14] R. Bruschweiler, F.L. Zhang, Covariance nuclear magnetic resonance spectroscopy, Journal of Chemical Physics 120 (2004) 5253–5260.

[15] O. Cloarec, M.E. Dumas, A. Craig, R.H. Barton, J. Trygg, J. Hudson, C. Blancher, D. Gauguier, J.C. Lindon, E. Holmes, J. Nicholson, Statistical total correlation spectroscopy: an exploratory approach for latent biomarker identification from metabolic 1H NMR data sets, Analytical Chemistry 77 (2005) 1282–1289.

[16] Y. Wang, O. Cloarec, H.R. Tang, J.C. Lindon, E. Holmes, S. Kochhar, J.K. Nicholson, Magic angle spinning NMR and 1H–31P heteronuclear statistical total correlation spectroscopy of intact human gut biopsies, Analytical Chemistry 80 (2008) 1058–1066.

[17] M. Coen, Y.S. Hong, O. Cloarec, C.M. Rhode, M.D. Reily, D.G. Robertson, E. Holmes, J.C. Lindon, J.K. Nicholson, Heteronuclear 1H–31P statistical total correlation NMR spectroscopy of intact liver for metabolic biomarker assignment: application to galactosamine-induced hepatotoxicity, Analytical Chemistry 79 (2007) 8956–8966.

[18] H.C. Keun, T.J. Athersuch, O. Beckonert, Y. Wang, J. Saric, J.P. Shockcor, J.C. Lindon, I.D. Wilson, E. Holmes, J.K. Nicholson, Heteronuclear 19F–1H statistical total correlation spectroscopy as a tool in drug metabolism: study of flucloxacillin biotransformation, Analytical Chemistry 80 (2008) 1073–1079.

[19] D.J. Crockford, A.D. Maher, K.R. Ahmadi, A. Barrett, R.S. Plumb, I.D. Wilson, J.K. Nicholson, 1H NMR and UPLC-MSE statistical heterospectroscopy: characterization of drug metabolites (xenometabolome) in epidemiological studies, Analytical Chemistry 80 (2008) 6835–6844.

[20] D.J. Crockford, E. Holmes, J.C. Lindon, R.S. Plumb, S. Zirah, S.J. Bruce, P. Rainville, C.L. Stumpf, J.K. Nicholson, Statistical heterospectroscopy, an approach to the integrated analysis of NMR and UPLC-MS data sets: application in metabonomic toxicology studies, Analytical Chemistry 78 (2006) 363–371.

[21] J.K. Nicholson, J. Connelly, J.C. Lindon, E. Holmes, Metabonomics: a platform for studying drug toxicity and gene function, Nature Reviews Drug Discovery 1 (2002) 153–161.

[22] J.K. Nicholson, J.C. Lindon, E. Holmes, ‘Metabonomics’: understanding the metabolic responses of living systems to pathophysiological stimuli via multivariate statistical analysis of biological NMR spectroscopic data, Xenobiotica 29 (1999) 1181–1189.

[23] J.J. Xu, S.Y. Yang, S.H. Cai, J.Y. Dong, X.J. Li, Z. Chen, Identification of biochemical changes in lactovegetarian urine using 1H NMR spectroscopy and pattern recognition, Analytical and Bioanalytical Chemistry 396 (2010) 1451–1463.

[24] Y.Y. Wei, C.H. Wang, W. Li, J.Y. Dong, Z. Chen, NMR-based metabonomics study on serum and urine of hyperthyroidism, Chemical Journal of Chinese Universities-Chinese 31 (2010) 279–284.

[25] J.K. Nicholson, P.J. Foxall, M. Spraul, R.D. Farrant, J.C. Lindon, 750 MHz 1H and 1H–13C NMR spectroscopy of human blood plasma, Analytical Chemistry 67 (1995) 793–811.

[26] S. Meiboom, D. Gill, Modified spin-echo method for measuring nuclear relaxation times, The Review of Scientific Instruments 29 (1958) 688–691.

[27] T. De Meyer, D. Sinnaeve, B. Van Gasse, E. Tsiporkova, E.R. Rietzschel, M.L. De Buyzere, T.C. Gillebert, S. Bekaert, J.C. Martins, W. Van Criekinge, NMR-based characterization of metabolic alterations in hypertension using an adaptive, intelligent binning algorithm, Analytical Chemistry 80 (2008) 3783–3790.

[28] J.C. Lindon, E. Holmes, J.K. Nicholson, Pattern recognition methods and applications in biomedical magnetic resonance, Progress in Nuclear Magnetic Resonance Spectroscopy 39 (2001) 1–40.

[29] S. Wold, M. Sjostrom, L. Eriksson, PLS-regression: a basic tool of chemometrics, Chemometrics and Intelligent Laboratory Systems 58 (2001) 109–130.

[30] M. Barker, W. Rayens, Partial least squares for discrimination, Journal of Chemometrics 17 (2003) 166–173.

[31] J. Trygg, S. Wold, O2-PLS, a two-block (X–Y) latent variable regression (LVR) method with an integral OSC filter, Journal of Chemometrics 17 (2003) 53–64.

[32] O. Cloarec, M.E. Dumas, J. Trygg, A. Craig, R.H. Barton, J.C. Lindon, J.K. Nicholson, E. Holmes, Evaluation of the orthogonal projection on latent structure model limitations caused by chemical shift variability and improved visualization of biomarker changes in 1H NMR spectroscopic metabonomic studies, Analytical Chemistry 77 (2005) 517–526.

[33] J.J. Xu, J. Zhang, J.Y. Dong, S.H. Cai, J.Y. Yang, Z. Chen, Metabonomics studies of intact hepatic and renal cortical tissues from diabetic db/db mice using high-resolution magic-angle spinning 1H NMR spectroscopy, Analytical and Bioanalytical Chemistry 393 (2009) 1657–1668.

[34] Y. Qiu, D. Rajagopalan, S.C. Connor, D. Damian, L. Zhu, A. Handzel, G.H. Hu, A. Amanullah, S. Bao, N. Woody, D. MacLean, K. Lee, D. Vanderwall, T. Ryan, Multivariate classification analysis of metabolomic data for candidate biomarker discovery in type 2 diabetes mellitus, Metabolomics 4 (2008) 337–346.

[35] S. Cai, T.G. Huo, J.H. Xu, X.M. Lu, S.N. Zheng, F.M. Li, Effect of mitiglinide on Streptozotocin-induced experimental type 2 diabetic rats: a urinary metabonomics study based on ultra-performance liquid chromatography-tandem mass spectrometry, Journal of Chromatography B 877 (2009) 3619–3624.

[36] I. Messana, F. Forni, F. Ferrari, C. Rossi, B. Giardina, C. Zuppi, Proton nuclear magnetic resonance spectral profiles of urine in type II diabetic patients, Clinical Chemistry 44 (1998) 1529–1534.

[37] H.Q. Ye, L. Gan, X.L. Yang, H.B. Xu, Membrane-associated cytotoxicity induced by realgar in promyelocytic leukemia HL-60 cells, Journal of Ethnopharmacology 103 (2006) 366–371.

[38] G. Paradies, F.M. Ruggiero, Effect of hyperthyroidism on the transport of pyruvate in rat-heart mitochondria, Biochimica et Biophysica Acta - Bioenergetics 935 (1988) 79–86.

[39] J.T. Brindle, J.K. Nicholson, P.M. Schofield, D.J. Grainger, E. Holmes, Application of chemometrics to 1H NMR spectroscopic data to investigate a relationship between human serum metabolic profiles and hypertension, Analyst 128 (2003) 32–36.