10
Independent component analysis (ICA) algorithms for improved spectral deconvolution of overlapped signals in 1 H NMR analysis: application to foods and related products Yulia B. Monakhova, a,b,c * Alexey M. Tsikin, bThomas Kuballa, a Dirk W. Lachenmeier a and Svetlana P. Mushtakova b The major challenge facing NMR spectroscopic mixture analysis is the overlapping of signals and the arising impossibility to easily recover the structures for identication of the individual components and to integrate separated signals for quantica- tion. In this paper, various independent component analysis (ICA) algorithms [mutual information least dependent component analysis (MILCA); stochastic non-negative ICA (SNICA); joint approximate diagonalization of eigenmatrices (JADE); and robust, accurate, direct ICA algorithm (RADICAL)] as well as deconvolution methods [simple-to-use-interactive self-modeling mixture analysis (SIMPLISMA) and multivariate curve resolution-alternating least squares (MCR-ALS)] are applied for simultaneous 1 H NMR spectroscopic determination of organic substances in complex mixtures. Among others, we studied constituents of the following matrices: honey, soft drinks, and liquids used in electronic cigarettes. Good quality spectral resolution of up to eight-component mixtures was achieved (correlation coefcients between resolved and experimental spectra were not less than 0.90). In general, the relative errors in the recovered concentrations were below 12%. SIMPLISMA and MILCA algorithms were found to be preferable for NMR spectra deconvolution and showed similar performance. The proposed method was used for analysis of authentic samples. The resolved ICA concentrations match well with the results of reference gas chromatogra- phymass spectrometry as well as the MCR-ALS algorithm used for comparison. ICA deconvolution considerably improves the application range of direct NMR spectroscopy for analysis of complex mixtures. Copyright © 2014 John Wiley & Sons, Ltd. Keywords: 1 H NMR; independent component analysis; multivariate curve resolution; non-alcoholic beverages; food products; e-cigarettes Introduction NMR has become an important analytical tool for the characteriza- tion of mixtures, for example, food products and beverages [15] or en- vironmental matrices. [6] The NMR spectrum represents a ngerprint of the sample and possesses information about the majority of com- pounds in complex systems. Signals of a wide range of components are therefore present, and the signals are distributed over several thousand data points. [4,7] This amount of data is difcult, if not impossible, to interpret without advanced multivariate methods. [4,7] Overlapping is a serious obstacle to the qualitative and quantita- tive NMR analysis of multicomponent mixtures. For solving this problem, curve-tting algorithms have been implemented in some standard NMR software packages (e.g. TopSpin from Bruker Biospin, Rheinstetten, Germany, or Mestrenova from Mestrelab Research, Santiago de Compostela, Spain). These techniques are aimed at resolving one NMR resonance into two or more signals with strictly restricted shape (i.e. by means of Gaussian or Lorentzian functions). Although being useful in certain circumstances, these methods cannot be used for resolving the whole spectral range and, more- over, are not able to predict of the number of signicant mixture components hidden inside the overlapped peak range and their relative concentrations. Experimental techniques, such as multidimensional NMR, for enhancing the resolution and reducing the overlaps have been proposed, [811] but these methods cannot resolve signals in some cases. Moreover, the longer acquisition time and additional dif- culties in data interpretation required for 2D measurements makes it less efcient for complex samples in routine analysis. Another opportunity would be the hyphenation of NMR with physical separation methods, but they are often characterized with high complexity and are time consuming. For example, it is possible to resolve the structures of individual compounds using methods such as HPLC-NMR and GC-NMR. [1214] However, not every labora- tory can afford such expensive instruments. Moreover, HPLC-NMR is not sensitive and, consequently, requires higher concentrations, and usually, chromatographic columns are overloaded, which often causes severe overlap and sometimes tailing to the chromato- graphic peaks. [15] * Correspondence to: Yulia B. Monakhova, Chemisches und Veterinärunter- suchungsamt (CVUA) Karlsruhe, Weissenburger Strasse 3, 76187 Karlsruhe, Germany. E-mail: [email protected] These authors contributed equally to this work. a Chemisches und Veterinäruntersuchungsamt (CVUA) Karlsruhe, Weissenburger Strasse 3, 76187 Karlsruhe, Germany b Department of Chemistry, Saratov State University, Astrakhanskaya Street 83, 410012 Saratov, Russia c Bruker Biospin GmbH, Silberstreifen, 76287 Rheinstetten, Germany Magn. Reson. Chem. (2014) Copyright © 2014 John Wiley & Sons, Ltd. Research article Received: 13 September 2013 Revised: 10 February 2014 Accepted: 14 February 2014 Published online in Wiley Online Library (wileyonlinelibrary.com) DOI 10.1002/mrc.4059

Independent component analysis (ICA) algorithms for improved spectral deconvolution of overlapped signals in 1 H NMR analysis: application to foods and related products

Embed Size (px)

Citation preview

Page 1: Independent component analysis (ICA) algorithms for improved spectral deconvolution of overlapped signals in 1 H NMR analysis: application to foods and related products

Research article

Received: 13 September 2013 Revised: 10 February 2014 Accepted: 14 February 2014 Published online in Wiley Online Library

(wileyonlinelibrary.com) DOI 10.1002/mrc.4059

Independent component analysis (ICA)algorithms for improved spectral deconvolutionof overlapped signals in 1H NMR analysis:application to foods and related productsYulia B. Monakhova,a,b,c*† Alexey M. Tsikin,b† Thomas Kuballa,a

Dirk W. Lachenmeiera and Svetlana P. Mushtakovab

The major challenge facing NMR spectroscopic mixture analysis is the overlapping of signals and the arising impossibility toeasily recover the structures for identification of the individual components and to integrate separated signals for quantifica-tion. In this paper, various independent component analysis (ICA) algorithms [mutual information least dependent componentanalysis (MILCA); stochastic non-negative ICA (SNICA); joint approximate diagonalization of eigenmatrices (JADE); and robust,accurate, direct ICA algorithm (RADICAL)] as well as deconvolution methods [simple-to-use-interactive self-modeling mixtureanalysis (SIMPLISMA) and multivariate curve resolution-alternating least squares (MCR-ALS)] are applied for simultaneous 1HNMR spectroscopic determination of organic substances in complex mixtures. Among others, we studied constituents of thefollowing matrices: honey, soft drinks, and liquids used in electronic cigarettes. Good quality spectral resolution of up toeight-component mixtures was achieved (correlation coefficients between resolved and experimental spectra were not lessthan 0.90). In general, the relative errors in the recovered concentrations were below 12%. SIMPLISMA and MILCA algorithmswere found to be preferable for NMR spectra deconvolution and showed similar performance. The proposed method was usedfor analysis of authentic samples. The resolved ICA concentrations match well with the results of reference gas chromatogra-phy–mass spectrometry as well as the MCR-ALS algorithm used for comparison. ICA deconvolution considerably improvesthe application range of direct NMR spectroscopy for analysis of complex mixtures. Copyright © 2014 John Wiley & Sons, Ltd.

Keywords: 1H NMR; independent component analysis; multivariate curve resolution; non-alcoholic beverages; food products; e-cigarettes

* Correspondence to: Yulia B. Monakhova, Chemisches und Veterinärunter-suchungsamt (CVUA) Karlsruhe, Weissenburger Strasse 3, 76187 Karlsruhe,Germany. E-mail: [email protected]

† These authors contributed equally to this work.

a Chemisches und Veterinäruntersuchungsamt (CVUA) Karlsruhe, WeissenburgerStrasse 3, 76187 Karlsruhe, Germany

b Department of Chemistry, Saratov State University, Astrakhanskaya Street 83,410012 Saratov, Russia

c Bruker Biospin GmbH, Silberstreifen, 76287 Rheinstetten, Germany

Introduction

NMR has become an important analytical tool for the characteriza-tion ofmixtures, for example, food products and beverages[1–5] or en-vironmental matrices.[6] The NMR spectrum represents a fingerprintof the sample and possesses information about the majority of com-pounds in complex systems. Signals of a wide range of componentsare therefore present, and the signals are distributed over severalthousand data points.[4,7] This amount of data is difficult, if notimpossible, to interpret without advanced multivariate methods.[4,7]

Overlapping is a serious obstacle to the qualitative and quantita-tive NMR analysis of multicomponent mixtures. For solving thisproblem, curve-fitting algorithms have been implemented in somestandard NMR software packages (e.g. TopSpin from Bruker Biospin,Rheinstetten, Germany, or Mestrenova from Mestrelab Research,Santiago de Compostela, Spain). These techniques are aimed atresolving one NMR resonance into two or more signals with strictlyrestricted shape (i.e. by means of Gaussian or Lorentzian functions).Although being useful in certain circumstances, these methodscannot be used for resolving the whole spectral range and, more-over, are not able to predict of the number of significant mixturecomponents hidden inside the overlapped peak range and theirrelative concentrations.

Experimental techniques, such as multidimensional NMR, forenhancing the resolution and reducing the overlaps have beenproposed,[8–11] but these methods cannot resolve signals in some

Magn. Reson. Chem. (2014)

cases. Moreover, the longer acquisition time and additional diffi-culties in data interpretation required for 2D measurementsmakes it less efficient for complex samples in routine analysis.

Another opportunity would be the hyphenation of NMR withphysical separation methods, but they are often characterized withhigh complexity and are time consuming. For example, it is possibleto resolve the structures of individual compounds using methodssuch as HPLC-NMR and GC-NMR.[12–14] However, not every labora-tory can afford such expensive instruments. Moreover, HPLC-NMRis not sensitive and, consequently, requires higher concentrations,and usually, chromatographic columns are overloaded, which oftencauses severe overlap and sometimes tailing to the chromato-graphic peaks.[15]

Copyright © 2014 John Wiley & Sons, Ltd.

Page 2: Independent component analysis (ICA) algorithms for improved spectral deconvolution of overlapped signals in 1 H NMR analysis: application to foods and related products

Y. B. Monakhova et al.

Hence, separating the individual 1D NMR spectra in themixture without any chemical treatment and physical isolationof components is desirable in terms of simplicity and applica-bility. This problem has been partially solved in the applica-tion of chemometric methods to NMR spectroscopic datatreatment.[1,10,16,17] Modeling of NMR-based data often involvesclassical chemometric methods such as principal componentanalysis (PCA) or partial least squares (PLS) regression.[1,10,16,17]

These methods have the objective of describing the mainvariance in the data or the maximum covariance between thepredictor variables and some response variables.[18,19] However,the physical interpretation of the results can be difficult forchemical applications. For instance, PCA scores and loadingsvectors are abstract given that they are constrained to fulfillmaximum variance and orthogonality conditions.[19] The majorproblem is that the loadings obtained do not have real physicalmeaning and one cannot recover the spectra of individualcompounds.[20] For quantitative analysis of complex mixtures,one can apply multivariate regression methods such as PLSregression.[18] This method (as other multivariate calibrationtechniques), however, requires extensive training sets that coverall variations for the matrix under consideration.[21]

For deconvolution of overlapped signals in this paper, wepropose to use an alternative group of approaches, generallytermed ‘blind source separation’ with its most developed branchknown as ‘independent component analysis’ (ICA).[22,23] ICAtechniques were extensively used for solving the overlappingproblem. The idea of ICA came originally from the field of bio-acoustics, in which it has been used to separate the conversationsignals of every person in a room from a set of mixed voicescollected by microphones that are positioned in various placesin the room.[21] Following that the method was transferred tochemistry, various ICA techniques have been previously appliedon different spectroscopic data, such as ultraviolet-visible(UV-VIS), infrared (IR), and fluorescence.[24–31] However, little isknown about the potential of ICA in treatment of NMR data.[32–34]

There are several different algorithms available to perform ICA,depending on the criteria used to determine the statisticaldependence and the method used to maximize it. For example,two recently proposed algorithms – mutual information leastdependent component analysis (MILCA)[25,35] and stochasticnon-negative ICA (SNICA)[25,35,36] – are based on the search forleast dependent (in contrast to independent) mixture compo-nents based on precise numerical estimates of mutual informa-tion as a measure of signal dependence. SNICA additionallyincludes a non-negativity constraint, which is characteristic formany types of spectroscopic measurements.[25,35,36] The jointapproximate diagonalization of eigenmatrices (JADE) algorithmis based on the construction of a fourth-order cumulants arrayof the data and minimization of their sum of squares usingJacobi’s algorithm.[26,37] The robust, accurate, direct ICA algorithm(RADICAL) minimizes the measure of departure from indepen-dence according to the estimated Kullback–Leibler divergencebetween the joint distribution and the product of the marginaldistributions combined with the efficient entropy estimator.[38,39]

The general aim of this study was to evaluate the applicabilityof different ICA algorithms (MILCA,[25,35] SNICA,[25,35,36] JADE,[26,37]

and RADICAL[38,39]) and to compare them with the conventionaldeconvolution methods [multivariate curve resolution-alternatingleast squares (MCR-ALS)[40] and simple-to-use-interactive self-modeling mixture analysis (SIMPLISMA)[41]], for the resolution ofNMR spectra of complex mixtures. Different multicomponent

wileyonlinelibrary.com/journal/mrc Copyright © 201

model mixtures (up to eight components) as well as real samplesof some consumer products (i.e. soft drinks and liquids forelectronic cigarettes [e-cigarettes]) were considered in this study.Moreover, we evaluated the influence of the number of compo-nents in the mixture and the choice of preprocessing techniqueson the quality of spectral deconvolution.

Experimental Section

Apparatus

All NMR measurements were performed on a Bruker Avance 400Ultrashield spectrometer (Bruker Biospin, Rheinstetten, Germany)equipped with a 5-mm SEI probe with Z-gradient coils, using aBruker Automatic Sample Changer (B-ACS 120). A 5-min waitingperiod was applied for temperature equilibration prior to the startof any NMR experiment. All spectra were acquired at 300.0 K. Thedata were acquired automatically under the control of ICON-NMR(Bruker Biospin, Rheinstetten, Germany).

Spectra acquisition and sample preparation

The NMR protocols applied were based on our standard measure-ments established for NMR analysis of cola beverages,[5] milk,[10]

and honey.[7] All 1H NMR spectra were automatically phased andbaseline corrected using the TopSpin 3.0 software package(Bruker Biospin).

NMR spectra of the e-cigarette constituents (propylene glycol;1,3-butanediol; 1,3-propanediol; ethylene glycol; diethyleneglycol; and glycerol) and their mixtures were acquired using Brukerstandard water suppression 1D noesygppr1d pulse sequence with64 scans and four prior dummy scans. The sweep width was19.9914ppm, and the time domain of the free induction decay(FID) was 65536 (65k). For measurements, an appropriate aliquot ofthe stock solution (approximately 1000mg/l) was mixed with water(up to 540μl) and 60μl of NMR buffer [pH 7.4; 1.5M KH2PO4 inD2O, 0.1% 3-(trimethylsilyl)-propionate acid-d4 (TSP), 3mM NaN3].

All mixtures were prepared from stock solutions, which weremade by weighing appropriate amount/volume of compound(1000mg/l) and then dissolving in appropriate solvent. Eachmixture/sample has been measured three times, and an averagespectrum was used for the deconvolution.

Chemometrics

The algorithms used have MATLAB v. 7.0 interfaces (TheMathWorks, Natick, MA, USA) and are available for free at thereferenced websites: MILCA,[35] SNICA,[35] SIMPLISMA,[42] JADE,[43]

and RADICAL.[44] For MILCA and SNICA algorithms, we have alsochosen the optimum number of nearest neighbors Knn as an es-sential parameter for deconvolution. For the MCR-ALS, we usedUnscrambler X version 10.0.1 (Camo Software AS, Oslo, Norway).

First, to cope with small variations in pH or other sample con-ditions such as ionic strength or temperature, simple rectangularbucket tables were obtained from the 1H NMR spectra usingAmix v. 3.9.4 (Bruker Biospin, Rheinstetten, Germany). The bucketwidth varied between 0.01 and 0.25 ppm. The final pre-treateddata were converted to ASCII files and transferred to MATLABor Unscrambler for multivariate analysis.

To characterize the similarity between experimental and calcu-lated concentration matrices, we applied the Amari index.[25] TheAmari index iterates to zero, when the recovered concentrations

4 John Wiley & Sons, Ltd. Magn. Reson. Chem. (2014)

Page 3: Independent component analysis (ICA) algorithms for improved spectral deconvolution of overlapped signals in 1 H NMR analysis: application to foods and related products

mical

range

m)

Bucket

width

(ppm)

Num

ber

ofn

earest

neighbors(K

nn),

MILCAalgorithm

–7.96

0.00

031

10

–3.80

0.00

030

8

–4.85

0.00

030

5

–4.10

0.00

030

7

interactiveself-modelingmixture

ICAL).

ICA deconvolution of overlapped NMR spectra

differ from the true ones only in scaling and permutation ofcomponents, and it increases as the quality of deconvolution be-comes poor. In practice, we find that good deconvolution qualityroughly corresponds to Amari indices P< 0.05, whereas P> 0.2generally characterizes unacceptably poor performance.[25]

To assess the similarities between the normalized resolvedspectra and the original experimental (pure) spectra (qualitativeanalysis), we use the correlation coefficient (R) scaled to the[0,1] range (MATLAB implementation).

The number of significant independent components (ICs)was determined by applying the recently proposed ICA-by-Blocks method.[45]

Amariindex

Che

shift (pp

MSN

SIJ

RALS

0.19

0.17

0.02

0.03

0.37

0.03

7.88

0.01

0.57

0.01

0.04

0.55

0.02

3.20

0.03

0.54

0.21

0.45

0.63

0.05

4.70

0.12

0.48

0.03

0.08

0.60

0.10

3.20

tcomponen

tan

alysis

(SNICA);SI,simple-to-use-

indep

enden

tcomponen

tan

alysisalgorithm

(RAD

Results and Discussion

In the practice of quality control of foods, beverages, andmedicinalproducts by 1H NMR, we are sometimes confronted with situations,where the spectra show significant spectral overlap, making identi-fication and quantification impossible.[7,11] For some systems, thisoverlap occurs only for the limited number of samples, where,because of variations in pH and/or instrument condition, theresonances of analyzed compounds move toward each other,being completely resolved in other cases. In more complex cases,however, the spectral overlap is so high that we cannot resolvethe spectra of individual compounds in any sample. These twopossibilities will be discussed in this manuscript.

Table

1.Qualitativean

dquan

titative

NMRan

alysisofmulticomponen

tmixturesusingICAalgorithms

Matrix

Mixture

Substan

ceCorrelationcoefficien

t

MSN

SIJ

RALS

Colabeverag

esCaffeine–sodium

ben

zoate

Caffeine

1.00

1.00

1.00

1.00

0.99

1.00

Sodium

ben

zoate

1.00

1.00

1.00

1.00

1.00

1.00

Energydrinks

Glucuronolactone–

taurine–inositol

Glucuronolactone

1.00

0.99

1.00

1.00

1.00

0.97

Taurine

1.00

0.14

1.00

1.00

1.00

0.98

Inositol

1.00

0.28

1.00

1.00

1.00

0.96

Milk

Sucrose–glucose–

fructose

Sucrose

1.00

0.93

0.91

0.70

0.71

0.89

Glucose

0.95

0.83

0.63

0.85

0.86

0.94

Fructose

0.99

0.92

0.09

0.79

0.88

0.97

Electronic

cigarettes

Glycerol–1,3-butandiol–

1,2-propane

diol–1,

3-propane

diol–

ethylene

glycol–

diethylen

eglycol

Glycerol

0.97

0.52

0.99

0.98

0.96

0.85

1,3-Butaned

iol

0.55

0.64

0.92

0.88

0.83

0.91

1,2-Propan

ediol

0.87

0.30

0.97

0.96

0.97

0.94

1,3-Propan

ediol

0.93

0.67

0.99

0.96

0.97

0.90

Ethylen

eglycol

0.98

0.88

1.00

0.99

1.00

0.89

Diethylen

eglycol

1.00

0.51

0.99

0.95

0.99

0.97

Thebestresultsforeach

mixture

werehighlig

htedin

bold.

M,mutual

inform

ationleastdep

enden

tcomponen

tan

alysis

(MILCA);SN

,stochasticnon-neg

ativeindep

enden

analysis(SIM

PLISMA);J,jointap

proximatediagonalizationofeigen

matrices(JADE);R

,rob

ust,accurate,d

irect

Analysis of model mixtures with low spectral overlap

Multicomponent mixtures of organic compounds in beveragescan be a good example considering the first and simplest groupof our systems. We have already published results for quantita-tive analysis of cola beverages,[5] energy drinks,[11] and milk.[10]

It was found that NMR spectroscopy is an efficient tool forsimultaneous quantification of selected constituents in theseproducts.[5,10,11]

However, for these matrices, difficulties still exist in somecases.[5,11] The quantification of nine individual compounds incola beverages was possible using external calibration curvesand applying TSP as internal standard.[5] However, in some cases,the singlet of caffeine (at δ 7.90 ppm) in the aromatic rangepartially overlapped with a broad multiplet of sodium benzoate(at δ 7.92 ppm) making precise quantification of both analytesproblematic. Different ICA algorithms (MILCA, SNICA, JADE,and RADICAL) as well as spectra deconvolution approaches(SIMPLISMA and MCR-ALS) were found to cope well with thedeconvolution of the spectra of this simple binary mixture(qualitative analysis), which results in high values of correlationcoefficients (Table 1). SIMPLISMA and JADE methods gave thelowest Amari indices, meaning better concentration estimation(quantitative analysis).

In general, resonances in the aromatic range are preferablefor quantification in 1H NMR, because they are more oftenundisturbed by the signals of other compounds. Therefore, forother matrices such as energy drinks, caffeine was never a prob-lematic compound for quantification: the singlet at δ 7.90 ppmcan be perfectly quantified in the case of energy drinks.[11]

However, one common feature for this group of products is anextensive spectral overlap of chemical shifts in themid-field region,which obscures some compounds of interest such as inositol,taurine, and glucuronolactone.[11] Although J-resolved spectra can

Magn. Reson. Chem. (2014) Copyright © 2014 John Wiley & Sons, Ltd. wileyonlinelibrary.com/journal/mrc

Page 4: Independent component analysis (ICA) algorithms for improved spectral deconvolution of overlapped signals in 1 H NMR analysis: application to foods and related products

B

Inte

nsity

[A.U

.]

ppm

glycerol 1,3-butanediol 1,2-propanediol 1,3-propanediol ethylene glycol diethylene glycol

4.0 3.9 3.8 3.7 3.6 3.5 3.4

0.00

0.03

0.06

0.09

4.0 3.9 3.8 3.7 3.6 3.5 3.4

0

300

600

900

Inte

nsity

[A.U

.]

ppm

mixture1 mixture2 mixture3 mixture4 mixture5 mixture6

A

Figure 1. (A) Experimental spectra of mixtures consisted of glycerol; 1,3-butanediol; 1,2-propanediol; 1,3-propanediol; ethylene glycol; anddiethylene glycol: Relative concentrations investigated are 3 : 3 : 2 : 2 : 1 : 1(mixture 1), 3 : 2 : 2 : 3 : 1 : 1 (mixture 2), 2 : 2 : 1 : 1 : 3 : 3 (mixture 3),3 : 2 : 1 : 1 : 2 : 3 (mixture 4), 2 : 1 : 1 : 1 : 1 : 1 (mixture 5), and 1 : 1 : 1 : 1 : 1 : 1(mixture 6), where 1 corresponds to 100mg/l; (B) resolved spectra of purecompounds (method SIMPLISMA). Correlation coefficients for each resolvedspectrum compared with pure reference spectra are given in Table 1.

Y. B. Monakhova et al.

be a good solution, in some cases, it was not possible to resolvetarget resonances even using 2D NMR.[11] Thus, more advancedtechniques, such as ICA, would be required for these samples.To assess the applicability of ICA algorithms, several multi-

component mixtures containing inositol, taurine, and glucurono-lactone in different concentration ratios have been studied. Weobtained maximum values of correlation coefficients for threecompounds using all ICA methods except SNICA, which indicatesthat the qualitative analysis of mixtures is performed with a highlevel of confidence (Table 1). Regarding quantitative analysis,MILCA and SIMPLISMA algorithms showed the best results (Amariindex was 0.01 in both cases) (Table 1).A general observation in NMR spectra of non-alcoholic

beverages is intensive resonances of major compounds (mainly,sugars) in the mid-field region. Usually, they do not only overlapwith each other in the mixtures but also obscure the resonancesof some minor compounds such as organic acids.[5,7] Therefore,deconvolution of NMR spectra of multicomponent mixtures ofsugars has practical importance. To check this possibility, weprovided deconvolution of experimental NMR spectra of ternarysucrose–glucose–fructose mixtures. These carbohydrates can befound together in many matrices such as milk and honey.[7,10]

The results of qualitative (correlation coefficients) and quantita-tive (Amari indices) analyses indicate a very good fit with the‘ground truth’ (the experimental spectra of pure compoundsand their relative concentrations) data. In this case, MILCA wasthe best choice for spectral deconvolution (Table 1).In our ongoing project aimed at development of NMR-based

method to ascertain the content of several constituents of e-cigarettes, we encountered some spectral overlap in the mid-fieldregion. Therefore, in this study, we consider six-component mix-tures of major solvents in e-cigarettes (glycerol; 1,3-butanediol;1,2-propanediol; 1,3-propanediol; ethylene glycol; and diethyleneglycol). As an example, the experimental spectra of the six-component system along with the results of qualitative analysisare shown in Fig. 1. Table 1 demonstrates the results of quantita-tive analysis expressed as correlation coefficients. It can be con-cluded that SIMPLISMA showed the best spectra deconvolutionperformance, the correlation coefficients ranged between 0.92for 1,3-butanediol and 1.00 for ethylene glycol; the Amari indexof 0.03 obtained with this method showed that also quantitativeinformation can be obtained. This value is comparable withthose obtained for previously studied binary and ternary mixtures(Table 1). The JADE algorithm is next in efficiency followed bythe RADICAL and MILCA algorithms. SNICA was not able toresolve two of six compounds – diethylene glycol (R= 0.51) and1,3-propanediol (R= 0.30). In general, the performance of thismethod was the worst regarding both Amari indices and correla-tion coefficients for all investigated mixtures (Table 1).Analyzing results from Table 1, we can conclude that ICA

copes well with deconvolution of model mixtures with up to sixcomponents. In all investigated cases, we can choose an appro-priate method, which can estimate spectra of pure constituentswith almost maximum values of correlation coefficients andprovide subsequent quantitative analysis with a relative error inconcentration below 10%. In most cases, however, MILCA andSIMPLISMA ensured better deconvolution quality. MCR-ALS alsoprovided stable results for all investigated mixtures. AlthoughICA algorithms are not always necessary for analyzing thesemixtures by NMR, their usage can be very reasonable whenspectral overlap occurs (e.g. for e-cigarettes, direct quantificationwas not possible in about 10% of all cases).

wileyonlinelibrary.com/journal/mrc Copyright © 201

It should be noted that ICA deconvolution capability providesan opportunity to improve chemical interpretability of the 1HNMR spectra of complex mixtures. After resolving of the 1HNMR spectrum into significant number of components, one canidentify the structures of known analytes based on the directcomparison with the experimental ones. Moreover, when acomponent in the mixture is unknown, traditional spectral inter-pretation of the peaks of the ICA-resolved spectra and theoreticalchemical shift prediction can be conducted leading to thestructure of the compound. This is probably the main advantageof ICA over other approaches that give only ‘abstract’ loadingssuch as PLS.

Analysis of model mixtures with high spectral overlap

Honey represents an interesting matrix to test ICA methods asthe spectra contain many overlapped resonances in the mid-fieldregion.[7] We were able to find 13 compounds, for which at leastone resolved unambiguous resonance could be identified.[7] Forother substances studied (e.g. isomeric forms of sugar), direct

4 John Wiley & Sons, Ltd. Magn. Reson. Chem. (2014)

Page 5: Independent component analysis (ICA) algorithms for improved spectral deconvolution of overlapped signals in 1 H NMR analysis: application to foods and related products

ICA deconvolution of overlapped NMR spectra

quantification with integration was not possible because of theextensive spectra overlap.

To examine the applicability of ICA in qualitative and quantita-tive analysis of honey’s constituents, we decomposed spectra ofvarious multicomponent mixtures with different composition(up to eight substances). The following compounds were chosenin our analysis: arabinose, fructose, galactose, sodium gluconate,sodium glucuronate, sucrose, fructose, and xylitol. Mixtureswere different in the number of components and composition(Table 2). Moreover, for the eight-component mixture, spectrawith two different bucket widths were resolved.

As an example, experimental spectra of the six-componentarabinose–fructose–galactose–sodium gluconate–sodium glucuronate–sucrose system are shown on Fig. 2A. Figure 2B exemplarydepicts the resolved arabinose spectrum in comparison withthe experimental one (experimental and resolved spectra of allother compounds in this mixture can be found in the supportinginformation). The correlation coefficients between the extractedand experimental spectra for this and some others mixtures, aslisted in Table 2, are found in best cases above 0.95 for fourcomponents (MILCA), 1.0 for six components (SNICA), and 0.52for eight components (SIMPLISMA) also suggesting that thespectra of pure compounds are appropriately extracted from theoverlapping NMR signals.

Results of quantitative analysis expressed as Amari index(ranging between 0.01 for four-component systems and 0.08for eight-component systems) are reported in Table 2. Thesevalues mean that the relative error in concentrations was be-low 12% and indicate that the relative contents of the compo-nents in the mixtures can be calculated from the overlappingNMR data by means of ICA. The absolute concentrations canbe found if the concentration of any compound in one mixtureis provided.

Based on the obtained results, we can evaluate the numberof components in mixtures that can be successfully resolved byICA. Obviously, the greater the number of components in themixture, the lower the performance of the ICA algorithms(Tables 1 and 2), which is in line with our previous observationsin UV-VIS, fluorescence, and IR spectroscopy.[25,27,46] Based onour experience with other spectroscopic techniques (UV-VIS,IR, and fluorescence), ICA algorithms generally cannot correctlyrecover more than six to eight independent components fromthe experimental mixture spectra.[25,27,46] We have also foundthat both the nature of spectra (compare broad UV-VIS bandswith narrow NMR signals) and the extent of the spectra overlapplay a significant role in decomposition.[25,27,46] In the casesconsidered in this study, we can postulate that eight is aborderline value for the number of components in mixtureregarding quantitative analysis (Amari indices never exceed0.15). Again, the results of SIMPLISMA, SNICA, and MILCA (espe-cially for more complex mixtures) are better than the ones ofJADE and RADICAL. We have also found that ICA techniquesare able to successfully recover up to seven ICs. The correlationcoefficients are unacceptable for the eighth compound for allICA algorithms (Table 2). On the other hand, the nature ofmixture components influences spectral characteristics ofcompounds (i.e. spectral width and degree of overlap) and,consequently, the quality of deconvolution. Probably, for othermixtures with more components with distinct peaks, ICA wouldperform also well. This, however, has only theoretical impor-tance, because in practice, there is no need to resolve not-overlapped signals.

Magn. Reson. Chem. (2014) Copyright © 2014 John Wiley

Determination of the number of significant compounds

The knowledge about the number of analytes in a given mixtureis crucial for successful ICA deconvolution. It is important forexperimental design as well, because when the number of inputspectra is less than the number of components, the problem isundetermined and the results are erroneous.

In previous studies, the number of significant ICs was deter-mined by the variation of the resolved spectra profiles duringdeconvolution with different numbers of ICs or based on a prioriknowledge.[24,27,34] For example, additional components canrepresent phase and baseline distortions or noise, and theirinclusion did not alter deconvolution.

Several specialized methods for determining the number of ICshave been proposed; each of them has some disadvantages.[47,48]

In this study, we used the recently proposed ICA-by-Blocksmethod.[45] The method starts with splitting the data matrix intoB blocks consisting of approximately the same number ofsamples. For each block, ICA models with different numbers ofICs are computed, and the models with the same number of ICsare then compared by calculating absolute correlation betweeneach pair of signals. The optimum number of ICs can be foundby a sharp decline in correlation coefficient values.[45]

Figure 3 shows average correlation coefficients betweenpairs of signals (experimental/resolved) for the ICA modelswith different numbers of components for the ternary systemglucuronolactone–taurine–inositol as an example. We have alsoexamined the number of blocks and the number of spectra ineach block required for the deconvolution. We have found thattwo blocks is the optimum number for the algorithm used andthe number of spectra can be equal to the number of ICs. Wehave observed that for our experimental mixtures with highlyoverlapped NMR signals of more than four components, ICA-by-Blocks failed to produce good results. This, however, does notmean that this is the threshold for the method in general, as itcould produce better results for less overlapped mixture spectra.

As can be seen, up to models with three ICs, the correlationsbetween corresponding ICs from each block are high, indicatingthat similar ICs are extracted in each segment. Therefore, it canbe concluded that these ICs are significant. Addition of moreICs to the model leads to very poor correlations between ICs ofthe two blocks, indicating that the fourth extracted IC is not validand almost exclusively based on background noise.

Previously, ICA-by-Blocks was applied only to the fluorescenceand IR data as test sets.[45] In this paper, therefore, we expandthe range of possible applications also to 1H NMR spectra. Theadvantages of this algorithm include the ability to determinethe number of ICs in the system without a priori information,whereas necessity of large experimental data sets (to enablethe computation of an ICA model with sufficient ICs) can beconsidered as the limitation of the method. One should alsoguarantee good and comparable performance of an ICA algorithmfor each block in order to match the resolved ICs.

Influence of spectral pretreatment on ICA deconvolution

No scaling was applied to our data before curve resolutioncalculation (which was necessary for exploratory data analysissuch PCA[7]), because the objective was to resolve physicallymeaningful concentrations, such that a zero concentrationcorresponds to zero intensity rather than the mean intensity ofthe sampled population.

& Sons, Ltd. wileyonlinelibrary.com/journal/mrc

Page 6: Independent component analysis (ICA) algorithms for improved spectral deconvolution of overlapped signals in 1 H NMR analysis: application to foods and related products

Table

2.NMRan

alysisofhoney

constituen

tsin

multicomponen

tmixturesusingICAalgorithms

Mixture

Substan

ceCorrelationcoefficien

tAmariindex

Chem

ical

shift

range(ppm)

Bucket

width

(ppm)

Number

of

nearest

neighbors

(Knn),MILCA

algorithm

MSN

SIJ

RM

SNSI

JR

Arabinose–fructose–galactose

–sodium

gluconate

Arabinose

1.00

1.00

1.00

1.00

1.00

0.07

0.01

0.01

0.19

0.68

3.11

–4.69

0.01

10

Fructose

1.00

1.00

1.00

0.77

0.99

Galactose

0.95

1.00

1.00

0.90

0.91

Sodium

gluconate

1.00

1.00

1.00

0.97

0.99

Arabinose–fructose–galactose–

sodium

gluconate–sodium

glucuronate–sucrose

Arabinose

0.99

1.00

0.98

0.92

0.99

0.06

0.01

0.02

0.15

0.65

3.11

–4.69

0.01

15

Fructose

1.00

1.00

0.99

0.76

0.99

Galactose

0.94

1.00

0.99

0.90

0.86

Sodium

gluconate

1.00

1.00

1.00

0.97

0.97

Sodium

glucuronate

0.99

1.00

1.00

0.95

1.00

Sucrose

0.98

1.00

1.00

0.96

1.00

Arabinose–fructose–galactose–

sodium

gluconate–sodium

glucuronate–sucrose–

fructose–xylitol

Arabinose

0.93

0.90

0.62

0.53

0.97

0.15

0.16

0.12

0.29

0.67

3.11

–4.69

0.01

13

Fructose

0.91

0.86

0.92

0.62

0.93

Galactose

0.74

0.85

0.52

0.77

0.61

Sodium

gluconate

0.99

0.97

0.97

0.96

0.98

Sodium

glucuronate

0.96

0.71

0.88

0.86

0.94

Sucrose

0.79

0.85

0.86

0.92

0.92

Fructose

0.88

0.86

0.89

0.09

0.90

Xylitol

0.21

0.47

0.67

0.38

0.13

Arabinose–fructose–galactose–

sodium

gluconate–sodium

glucuronate–sucrose–

fructose–xylitol

Arabinose

0.98

0.93

0.58

0.88

0.96

0.12

0.16

0.08

0.20

0.66

3.11

–4.69

0.00

12

Fructose

0.96

0.80

0.88

0.62

0.91

Galactose

0.74

0.78

0.56

0.79

0.75

Sodium

gluconate

0.99

0.93

0.96

0.97

0.98

Sodium

glucuronate

0.92

0.87

0.85

0.94

0.96

Sucrose

0.85

0.82

0.82

0.91

0.80

Fructose

0.95

0.85

0.85

0.92

0.92

Xylitol

0.18

0.46

0.06

0.00

0.02

Thebestresultsforeach

mixture

werehighlig

htedin

bold.

M,mutual

inform

ationleastdep

enden

tcomponen

tan

alysis

(MILCA);SN

,stochasticnon-neg

ativeindep

enden

tcomponen

tan

alysis(SNICA);SI,simple-to-use-in

teractiveself-modelingmixture

analysis(SIM

PLISMA);J,jointap

proximatediagonalizationofeigen

matrices(JADE);R

,robust,accurate,d

irectindep

enden

tcomponen

tan

alysisalgorithm

(RADICAL).

Y. B. Monakhova et al.

wileyonlinelibrary.com/journal/mrc Copyright © 2014 John Wiley & Sons, Ltd. Magn. Reson. Chem. (2014)

Page 7: Independent component analysis (ICA) algorithms for improved spectral deconvolution of overlapped signals in 1 H NMR analysis: application to foods and related products

B

Nor

mal

ized

inte

nsity

ppm

experimental spectrum resolved spectrum

4.5 4.2 3.9 3.6 3.3

0.0

0.2

0.4

0.6

0.8

1.0

4.5 4.2 3.9 3.6 3.3

0.00

0.02

0.04

0.06

Inte

nsity

[A.U

.]

ppm

mixture1 mixture2 mixture3 mixture4 mixture5 mixture6

A

Figure 2. (A) Experimental spectra of mixtures consisted of arabinose,fructose, galactose, sodium gluconate, sodium glucuronate, and sucrose:Relative concentrations investigated are 3 : 3 : 2 : 2 : 1 : 1 (mixture 1),3 : 2 : 2 : 3 : 1 : 1 (mixture 2), 2 : 2 : 1 : 1 : 3 : 3 (mixture 3), 3 : 2 : 1 : 1 : 2 : 3(mixture 4), 2 : 1 : 1 : 1 : 1 : 1 (mixture 5), and 1 : 1 : 1 : 1 : 1 : 1 (mixture 6),where 1 corresponds to 500mg/l; (B) resolved and experimental spectraof arabinose (R=0.99) (method MILCA). Correlation coefficients for otherresolved spectra are given in Table 2.

Figure 3. Average correlation coefficients between pairs of signals(experimental/resolved) for the ICA models with different numbersof components for the glucuronolactone–taurine–inositol mixture(MILCA algorithm).

ICA deconvolution of overlapped NMR spectra

Magn. Reson. Chem. (2014) Copyright © 2014 John Wiley

On the other side, in NMR spectra the chemical shifts of peakscan vary as a result of variations in physicochemical factors, e.g.temperature, concentration, pH, and ion strength of the sur-rounding matrix that affect the sample.[49,50] To have a resolvabledata matrix of different samples, it is necessary to correct theshifts to ensure that all the samples are aligned in frequencythroughout the data matrix.

Some alignment methods such as co-shifting and warpingwere proven to be necessary in the modeling of NMR spectraof ternary mixtures of alcohols.[51,52] 2D NMR spectra can bealigned using reference deconvolution or free induction decaydeconvolution for lineshape enhancement method.[53] Chemicalshift correction can be also carried out by using self-organizingmaps.[34] In another work, a new mathematical transformationof NMR is proposed for the conversion of labile to inert signals,which make it possible to study the acid–base equilibrium ofnucleotides.[54]

Therefore, we have hypothesized that alignment of thespectral data is also fundamental for the successful ICAdeconvolution. For this purpose, we used bucketing as a standardmethod for NMR spectral pretreatment before multivariateanalysis, which also reduces the number of data points and savescomputational time.[5,7,55] Bucketing was also used for 1H NMRspectra preprocessing in blood serum metabolomics.[56]

Table 3 reports the results of qualitative (correlation coefficients)and quantitative analysis (Amari indices) of the four-componentsucrose–glucose–fructose–saccharin system by the MILCAmethod.It can be seen that bucketing is a necessary preprocessing stepfor successful spectra resolution, because original data of 65536points were poorly decomposed and the deconvolution time wasunacceptable (Table 3). We performed, therefore, bucketing withdifferent bucket widths (0.01–0.25 ppm) of the initial set of data.The best quantitative results were obtained for 0.04ppm bucketwidth, and the deconvolution quality is becoming slightly worsewhen the spectra resolution decreases or increases. Regardingqualitative analysis, comparable performance was achieved byanalyzing each data set. Therefore, keeping in mind also the timerequired for calculation (Table 3), we think that 0.04ppm bucketwidth is optimum for deconvolution. The same result was obtainedfor eight-component mixtures of honey compounds supportingthe fact that the 0.04ppm bucket width is the optimum solutionfor analyzing of food products by NMR.

Other mathematical transformations were proven to beimportant for NMR data. For example, in case of the SIMPLISMAalgorithm, the use of second-derivative spectra proved to behighly powerful in the resolving NMR data sets.[57] This prepro-cessing step is important when dealing with overlapping featureswith a wide peak width variation and to cope with significantbaseline problems[57] and, based on our experience, also didnot influence the interpretability of the resolved ICs.[25] In ourcase, however, derivatives did not increase the efficiency ofICA performance.

Application to authentic samples

In this section, we report the results of the application of ICAmethods to the NMR spectra of real samples, which is an impor-tant step in the evaluation of the proposed methodology.

First, we applied chemometric algorithms to the analysis offructose, sucrose, and glucose in cola beverages from theGerman market. We measured the spectra of cola beverages ata constant pH of 4.5 to prevent spectral band shifts (Fig. 4A).

& Sons, Ltd. wileyonlinelibrary.com/journal/mrc

Page 8: Independent component analysis (ICA) algorithms for improved spectral deconvolution of overlapped signals in 1 H NMR analysis: application to foods and related products

Table 3. Influence of NMR spectral resolution on the deconvolution performance (algorithm MILCA)

Parameter Number of data points

Original data, 65 536 data points 2 000 (0.01)a 492 (0.04) 246 (0.08) 164 (0.12) 123 (0.16) 98 (0.20) 82 (0.25)

Amari 0.45 0.10 0.08 0.11 0.14 0.18 0.15 0.21

R Sucrose 0.81 0.98 0.99 1.00 0.99 1.00 0.99 1.00

Glucose 0.78 0.92 0.99 0.99 0.98 0.97 0.98 0.94

Fructose 0.76 0.95 1.00 0.99 0.97 0.96 0.97 0.91

Saccharin 0.68 0.97 0.99 0.98 1.00 0.99 0.98 0.97

Deconvolution

time (s)b5 320 320 120 60 40 30 25 20

aBucket width is shown in brackets (ppm).bDetermined on Intel Pentium IV at 1700MHz processor.

0.0

0.2

0.4

0.6

0.8

1.0

Inte

nsity

[A.U

.]

ppm

fructose (R=0.99) sucrose (R=1.00) glucose (R=0.95)

B

4.0 3.9 3.8 3.7 3.6 3.5 3.4 3.3 3.2

4.0 3.9 3.8 3.7 3.6 3.5 3.4 3.3 3.2

0

50000

100000

150000

200000 Sample1 Sample2 Sample3 Sample4

Inte

nsity

[A.U

.]

ppm

A

Figure 4. (A) NMR spectra of the investigated cola beverages; (B)resolved spectra with the correlation coefficients values (method MILCA).

Y. B. Monakhova et al.

Applying the MILCA approach, the spectra of individual com-pounds were extracted with acceptable values of correlationcoefficients (Fig. 4B), and the concentrations of compounds inthe original beverages were obtained (Table 4). It should benoted that comparable results were obtained by the MCR-ALSmethod. The data presented in Table 4 demonstrate the highquality of deconvolution and, therefore, ICA is applicable foranalysis of sugars in soft drinks, which was so far impossiblewithout chemometric tools.

wileyonlinelibrary.com/journal/mrc Copyright © 201

Next, we performed the qualitative and quantitative spectro-scopic analysis of e-cigarette liquids. These products are a partof an emerging class of electronic nicotine-delivery systems,which aerosolize nicotine (if contained) and produce a vaporthat emulates that of cigarettes but purportedly has fewertraditional toxins than tobacco cigarettes. We selected authen-tic samples from our database, for which we were not able toquantify solvents using integration because of spectral overlap.The spectra of individual compounds (1,2-propanediol; ethyl-ene glycol; glycerol; and 1,3-propanediol) were extracted, andthe values of the correlation coefficients were not less than0.95 for all compounds. The concentrations of compounds ine-cigarettes were obtained and compared with the results ofGC/MS reference analysis (Table 5). The other analytes likelyto occur in e-cigarette liquids (1,3-butanediol and diethyleneglycol) were not detected by each experimental technique.The results of the two experimental methods correspond wellto each other.

Spectral deconvolution techniques in NMR analysis

Chemometric methods have already been used for the resolutionof overlapping NMR spectra. One of the most common methodsis MCR-ALS developed by the group of Tauler.[40] For example,using this method, Winning et al. analyzed a set of NMR spectraof 231 mixtures containing propanol, butanol, and pentanol.[51]

MCR-ALS was applied for the analysis of NMR data obtained fromkinetic studies, which allowed to determine the number ofiteration complexes, kinetic concentration profiles, and pureNMR spectra of all detected species.[58,59] MCR-ALS was also usedto model NMR metabolomics data[56,60] and to decompose up tofour-component simulated and experimental model mixtures byDOSY experiments.[53,61]

MCR-ALS and orthogonal-projection approach were applied tothe series of Hahn-echo 1H NMR spectra of a cross-linked unsatu-rated polyester resin.[62] This is one of a few examples, whereMCR was used to decompose solid state NMR data. Anotherstudy utilized magic angel spinning 13C NMR to analyze non-depolymerized alginate powders obtained from brown seaweedfor the determination of mannuronic acid/guluronic acid ratio.[63]

Although the results were quite promising in both studies, themethod hardly can be recommended as routine tool as this typeof NMR spectroscopy is not very convenient in routine foodanalysis. The SIMPLISMA approach was applied to resolve NMRspectra of surfactants.[57]

4 John Wiley & Sons, Ltd. Magn. Reson. Chem. (2014)

Page 9: Independent component analysis (ICA) algorithms for improved spectral deconvolution of overlapped signals in 1 H NMR analysis: application to foods and related products

Table 4. Quantitative analysis of cola beverages ([g/L], n=3, P=0.95)

Substance Sample 1 Sample 2 Sample 3 Sample 4

Sucrose MILCA 0.38 ± 0.05 0.36 ± 0.04 0.38 ± 0.05 0.35± 0.05

MCR-ALS 0.42 ± 0.02 0.40 ± 0.05 0.30 ± 0.03 0.40± 0.01

Glucose MILCA 3.4 ± 0.3 3.9 ± 0.3 3.8 ± 0.3 3.6 ± 0.2

MCR-ALS 3.1 ± 0.2 4.2 ± 0.3 4.1 ± 0.6 3.8 ± 0.2

Fructose MILCA 6.2 ± 0.4 7.2 ± 0.5 6.7 ± 0.4 7.7 ± 0.5

MCR-ALS 7.0 ± 0.5 6.8 ± 0.4 6.9 ± 0.5 7.4 ± 0.4

MILCA, mutual information least dependent component analysis; MCR-ALS, multivariate curve resolution-alternating least squares.

Table 5. Quantitative analysis of e-cigarettes (g/100 g)a

1,2-Propanediol Ethylene glycol Glycerol 1,3-Butanediol 1,3-Propanediol Diethylene glycol

NMR-ICAb GC/MS NMR-ICA GC/MS NMR-ICA GC/MS NMR-ICA GC/MS NMR-ICA GC/MS NMR-ICA GC/MS

1 0.40 0.33 74.7 70.6 20.8 16.4 n.d. n.d. n.d. n.d. n.d. n.d.

2 1.1 1.1 73.5 77.6 8.6 9.5 n.d. n.d. n.d. n.d. n.d. n.d.

3 0.59 0.33 70.9 72.9 15.3 16.2 n.d. n.d. n.d. n.d. n.d. n.d.

4 2.1 1.9 70.9 76.5 12.7 9.1 n.d. n.d. n.d. n.d. n.d. n.d.

5 52.3 68.3 n.d. n.d. 23.5 26 n.d. n.d. n.d. n.d. n.d. n.d.

6 47.8 -c 7.8 - 62.7 - n.d. - 10.4 - n.d. -

7 47.0 - 6.0 - 48.5 - n.d. - 4.8 - n.d. -

8 50.4 - 5.6 - 45.2 - n.d. - 4.2 - n.d. -

ICA, independent component analysis; GC/MS, gas chromatography–mass spectrometry; n.d., not detectable.aTable shows average values of the three separate measurements.bMutual information least dependent component analysis algorithm was used for NMR spectra deconvolution.cGC/MS measurements were not conducted for these samples.

ICA deconvolution of overlapped NMR spectra

In comparison, only very few publications on ICA applicationsin NMR data modeling are available. Recently, the strategy, whichapplies ICA to DOSY, was proposed to extract spectra of purecompounds in a mixture.[32] The method, being highly promising,was applied only for two-component experimental mixtures.[32]

The spectra of three-component mixtures of amino acids weredecomposed by using non-negative ICA providing goodqualitative and quantitative results.[52] Several studies showed thatICA provides effective and meaningful way of analyzing magneticresonance spectroscopy imaging of human brain tumors.[33,34]

Another promising application of ICA in NMR analysis is its applica-tion for food authentication instead of commonly used PCA.[64,65]

In this study, therefore, we expand the range of possible ICAapplications in NMR spectroscopy. Results of the qualitative andquantitative analysis of our up to eight-component systemsdefinitely showed the prospects of using such algorithms foranalysis of authentic samples by NMR (beverages, honey, and e-cigarettes). Apart from reliable qualitative analysis, ICA providesaccurate quantification results, which was confirmed by refer-ence methods and other chemometric techniques.

Conclusions

It is known that the 1H NMR spectrum, which contains hundredsof resonances, represents a fingerprint of a sample. This factposes additional challenges for univariate analysis of complexsystems. The problem of spectral overlap is especially acute for

Magn. Reson. Chem. (2014) Copyright © 2014 John Wiley

the 1H NMR spectroscopy in comparison with 13C NMR, wherebetter resolution can be obtained in the broad 0–250 ppm scale(1H NMR spectra are usually acquired in the narrow 0–11 ppmrange). 13C NMR spectroscopy, being not so sensitive as 1HNMR, additionally suffers from the nuclear Overhauser effect,which prevents accurate analyte quantification. Therefore, wehave limited our application range to 1H NMR spectroscopy inthis study.

We have shown how the problem of overlap can be solved byapplying ICA-based methods to the experimentally acquiredspectral information. The ICA strategy has been proven to effec-tively extract spectra of pure components and concentrationsfrom NMR spectra of mixtures. Incorporating ICA into the NMRspectroscopic analysis allows deconvolution of spectral data intomeaningful components. In this paper, for example, we appliedICA to the analysis of honey, e-cigarettes, and soft drinks withsufficient accuracy.

The obtained results represent a good practical basis forthe implementation of chemometric techniques for the signalresolution in the NMR analysis of food products as they improvethe accuracy and rapidity of analysis. The method is expected tofacilitate the use of NMR not only in the above mentionedsettings but also for other matrices as well. We hope that instru-ment manufacturers will include ICA into their NMR data analysispackages. Currently, the use of ICA in routine NMR analysis isrestricted by several media breaks from the acquisition soft-ware, over the bucketing software, followed by ASCII exportto MATLAB analysis.

& Sons, Ltd. wileyonlinelibrary.com/journal/mrc

Page 10: Independent component analysis (ICA) algorithms for improved spectral deconvolution of overlapped signals in 1 H NMR analysis: application to foods and related products

Y. B. Monakhova et al.

Acknowledgement

The authors are grateful for Professor Douglas N. Rutledge for hisuseful suggestions.

References[1] H. Köbler, Y. B. Monakhova, T. Kuballa, C. Tschiersch, J. Vancutsem,

G. Thielert, A. Mohring, D. W. Lachenmeier. J. Agric. Food Chem.2011, 59, 6877–6881.

[2] G. Le Gall, I. J. Colquhoun. NMR spectroscopy in food authentication,in Food Authenticity and Traceability (Ed: M. Lees), WoodheadPublishing Ltd, Cambridge, UK, 2003, 131–155.

[3] P. S. Belton, I. J. Colquhoun, E. K. Kemsley, I. Delgadillo, P. Roma,M. J. Dennis, M. Sharman, E. Holmes, J. K. Nicholson, M. Spraul.Food Chem. 1998, 61, 207–213.

[4] R. Godelmann, F. Fang, E. Humpfer, B. Schütz, M. Bansbach,H. Schäfer, M. Spraul. J. Agric. Food Chem. 2013, 61, 5610–5619.

[5] P. Maes, Y. B. Monakhova, T. Kuballa, H. Reusch, D. W. Lachenmeier.J. Agric. Food Chem. 2012, 60, 2778–2784.

[6] S. M. Eldridge, S. R. Chen, Z. H. Xu, P. N. Nelson, S. E. Boyd, I. Meszaros,K. Y. Chan. Waste Manag. 2013, 33, 2157–2169.

[7] M. Ohmenhaeuser, Y. B. Monakhova, T. Kuballa, D. W. Lachenmeier,ISRN Anal.Chem. 2013, http://dx.doi.org/10.1155/2013/825318.

[8] I. A. Lewis, S. C. Schommer, B. Hodis, K. A. Robb, M. Tonelli,W. M. Westler, M. R. Sussman, J. L. Markley. Anal. Chem. 2007,79, 9385–9390.

[9] W. Gronwald, M. S. Klein, H. Kaspar, S. R. Fagerer, N. Nurnberger,K. Dettmer, T. Bertsch, P. J. Oefner. Anal. Chem. 2008, 80, 9288–9297.

[10] Y. B. Monakhova, T. Kuballa, J. Leitz, C. Andlauer, D. W. Lachenmeier.Dairy Sci. Technol. 2012, 92, 109–120.

[11] K. Wegert, Y. B. Monakhova, T. Kuballa, H. Reusch, G. Winkler,D. W. Lachenmeier. Lebensmittelchem. 2012, 66, 143–145.

[12] C. Seger, S. Sturm, H. Stuppner. Nat. Prod. Rep. 2013, 30, 970–987.[13] K. T. Johansen, S. G. Wubshet, N. T. Nyberg. Anal. Chem. 2013, 85,

3183–3189.[14] M. Kühnle, D. Kreiler, K. Holtin, H. Czesla, P. Schuler, W. Schaal,

V. Schurig, K. Alberg. Anal. Chem. 2008, 80, 5481–5486.[15] S. Sturm, C. Seger. J. Chromatogr. A 2012, 1259, 50–61.[16] Y. B. Monakhova, T. Kuballa, D. W. Lachenmeier. Appl. Magn. Reson.

2012, 42, 343–352.[17] G. Papotti, D. Bertelli, R. Graziosi, M. Silvestri, L. Bertacchini,

C. Durante, M. Plessi. J. Agric. Food Chem. 2013, 61, 1741–1746.[18] S. Wold, M. Sjöström, L. Eriksson. Chem. Intell. Lab. Syst. 2001, 58,

109–130.[19] H. Abdi, L. J. Williams. Wiley Interdisc. Revs.: Comput. Stats. 2010, 2,

433–459.[20] D. W. Lachenmeier, W. Kessler. J. Agric. Food Chem. 2008, 56, 5463–5468.[21] J. V. Stone. Trends Cogn. Sci. 2002, 6, 59–64.[22] A. Hyvärinen, J. Karhunen, E. Oja, Independent Component Analysis,

Wiley, New York, 2001.[23] A. Cichocki, S. Amari, Adaptive blind signal and image processing, in

Learning Algorithms and Applications, Wiley, New York, 2002.[24] Y. B. Monakhova, S. P. Mushtakova, S. S. Kolesnikova. Anal. Bioanal.

Chem. 2010, 397, 1297–1306.[25] Y. B. Monakhova, S. A. Astakhov, A. V. Kraskov, S. P. Mushtakova.

Chem. Intell. Lab. Syst. 2010, 103, 108–115.[26] D. N. Rutledge, D. Jouan-Rimbaud Bouveresse. Trends Anal. Chem.

2013, 50, 22–32.[27] Y. B. Monakhova, S. S. Kolesnikova, S. P. Mushtakova. Anal. Methods

2013, 5, 2772.[28] I. Schelkanova, V. Toronov. Biomed. Opt. Express. 2012, 3, 64–74.[29] M. Mecozzi, M. Pietroletti, M. Scarpiniti, R. Acquistucci, M. E. Conti.

Environ. Monit. Assess. 2012, 184, 6025–6036.[30] Y. B. Monakhova, T. Kuballa, J. Leitz, D. W. Lachenmeier, Int. J. Anal.

Chem. 2011, http://dx.doi.org/10.1155/2011/704795.[31] J. A. Jendral, Y. B. Monakhova, D. W. Lachenmeier, Int. J. Anal. Chem.

2011, http://dx.doi.org/10.1155/2011/797604.[32] J. Zhong, N. DiDonato, P. G. Hatcher. J. Chemometr. 2012, 26, 150–157.[33] J. Ma, Z. Q. Sun, Exploring the intrinsic structure of magnetic

resonance spectra tumor data based on independent component

wileyonlinelibrary.com/journal/mrc Copyright © 201

analysis and correlation analysis, in Artificial Neural Networks -ICANN 2006, PT 2 (Eds: S. Kollias, A. Stafylopatis, W. Duch, E. Oja),Springer, Berlin-Heidelberg, 2006, 788–797.

[34] J. Pulkkinen, A.-M. Häkkinen, N. Lundbom, A. Paetau, R. A. Kauppinen,Y. Hiltunen. Eur. J. Radiol. 2005, 56, 160–164.

[35] A. Kraskov, Mutual Information Least-dependent ComponentAnalysis (MILCA). http://www.ucl.ac.uk/ion/departments/sobell/Research/RLemon/MILCA/MILCA [Accessed 8 August 2013].

[36] S. A. Astakhov, H. Stögbauer, A. Kraskov, P. Grassberger. Anal. Chem.2006, 78, 1620–1627.

[37] J. F. Cardoso, A. Souloumiac. IEE proceedings-F 1993, 140, 362–370.[38] E. G. Miller, J. W. Fisher. Independent components analysis by direct

entropy minimization. UCB/CSD-03-1221. 2003. University ofCalifornia at Berkeley.

[39] E. G. Learned-Miller, J. W. Fisher. J. Mach. Learn. Res. 2003, 4, 1271–1295.[40] J. Jaumot, R. Gargallo, A. de Juan, R. Tauler. Chem. Intell. Lab. Syst.

2005, 76, 101–110.[41] W. Windig, J. Guilment. Anal. Chem. 1991, 63, 1425–1432.[42] SIMPLISMA. http://read.pudn.com/downloads6/sourcecode/math/

22123/image_mva_0/simplisma.m__.htm. [Accessed 4 August 2009].[43] Blind source separation and independent component analysis.

http://perso.telecom-paristech.fr/~cardoso/guidesepsou.html [Accessed13 August 2013].

[44] The RADICAL ICA page. http://people.cs.umass.edu/~elm/ICA/[Accessed 13 August 2013].

[45] D. Jouan-Rimbaud Bouveresse, A. Moya-González, F. Ammari,D. N. Rutledge. Chem. Intell. Lab. Syst. 2012, 112, 24–32.

[46] Y. B. Monakhova, S. A. Astakhov, S. P. Mushtakova, L. A. Gribov.J. Anal. Chem. 2011, 66, 351–362.

[47] S. M. Lee. Estimating the number of independent components viathe SONIC statistics. 2003. Oxford.

[48] G. Wang, W. Cai, X. Shao. Chem. Intell. Lab. Syst. 2006, 82, 137–144.[49] Y. B. Monakhova, H. Schäfer, E. Humpfer, M. Spraul, T. Kuballa,

D. W. Lachenmeier. Magn. Reson. Chem. 2011, 49, 734–739.[50] S. Masoum, C. Malabat, M. Jalali-Heravi, C. Guillou, S. Rezzi,

D. N. Rutledge. Anal. Bioanal. Chem. 2007, 387, 1499–1510.[51] H. Winning, F. H. Larsen, R. Bro, S. B. Engelsen. J. Magn. Reson. 2008,

190, 26–32.[52] X. Shao, L. Miao, Z. Liu, H. Liu, W. Cai. Spectrosc. Lett. 2011, 44,

244–250.[53] R. Huo, R. Wehrens, L. M. C. Buydens. J. Magn. Reson. 2004, 169, 257–269.[54] J. Jaumot, M. Vives, R. Gargallo, R. Tauler. Anal. Chim. Acta 2003, 490,

253–264.[55] S. A. A. Sousa, A. Magalhães, M. M. C. Ferreira. Chem. Intell. Lab. Syst.

2013, 122, 93–102.[56] S. E. Richards, Y. Wang, D. Lawler, S. Kochhar, E. Holmes, J. C. Lindon,

J. K. Nicholson. Anal. Chem. 2008, 80, 4876–4885.[57] W. Windig, B. Antalek, J. L. Lippert, Y. Batonneau, C. Brémard. Anal.

Chem. 2002, 74, 1371–1379.[58] M. Vives, R. Tauler, V. Moreno, R. Gagallo. Anal. Chim. Acta 2001, 446,

437–448.[59] M. Kompany-Zareh, S. Gholami, B. Kaboudin. Anal. Methods 2012, 4,

162–170.[60] T. K. Karakach, R. Knight, E. M. Lenz, M. R. Viant, J. A. Walter. Magn.

Res. Chem. 2009, 47, S105–S117.[61] L. C. M. van Gorkom, T. M. Hancewicz. J. Magn. Reson. 1998, 130,

125–130.[62] G. Vivó-Truyols, M. Ziari, P. C. M. M. Magusin, P. J. Schoenmakers.

Anal. Chim. Acta 2009, 641, 37–45.[63] T. Salomonsen, H. M. Jensen, F. H. Larsen, S. Steuernagel,

S. B. Engelsen. Carbohydr. Res. 2009, 344, 2014–2022.[64] M. Cuny, G. Le Gall, I. Colquhoun, M. Lees, D. Rutledge. Anal. Chim.

Acta 2007, 597, 203–213.[65] M. Cuny, E. Vigneau, G. Le Gall, I. Colquhoun, M. Lees, D. N. Rutledge.

Anal. Bioanal. Chem. 2008, 390, 419–427.

Supporting Information

Additional supporting information may be found in the onlineversion of this article at the publisher’s website.

4 John Wiley & Sons, Ltd. Magn. Reson. Chem. (2014)