

Microchemical Journal 117 (2014) 194–201

Journal homepage: www.elsevier.com/locate/microc

Use of NIRS to predict composition and bioethanol yield from cell wall structural components of sweet sorghum biomass☆

Cristiane C. Guimarães a,b, Maria Lúcia F. Simeone b, Rafael A.C. Parrella b, Marcelo M. Sena a,c,⁎
a Departamento de Química, ICEx, Universidade Federal de Minas Gerais, 31270-901 Belo Horizonte, MG, Brazil
b Embrapa Milho e Sorgo, MG 424, km 45, 35701-970 Sete Lagoas, MG, Brazil
c Instituto Nacional de Ciência e Tecnologia em Bioanalítica, 13083-970 Campinas, SP, Brazil

☆ Paper presented at the Brazilian Congress on Analytical Chemistry.
⁎ Corresponding author at: Departamento de Química, ICEx, Universidade Federal de Minas Gerais, 31270-901 Belo Horizonte, MG, Brazil. Tel.: +55 31 34096389; fax: +55 31 34095700.

E-mail address: [email protected] (M.M. Sena).

http://dx.doi.org/10.1016/j.microc.2014.06.029
0026-265X/© 2014 Elsevier B.V. All rights reserved.

Article info

Article history: Received 17 December 2013; received in revised form 16 May 2014; accepted 27 June 2014; available online 10 July 2014

Keywords: Saccharine sorghum; Biofuel; Cellulose; Lignin; Near infrared spectroscopy; Multivariate analytical validation

Abstract

Sweet sorghum biomass is gaining importance as feedstock for second generation bioethanol production. Consequently, breeding programs are seeking to improve the quality of this feedstock in order to increase the productivity, with the generation of a great number of samples to be analyzed. Thus, this paper developed rapid and low cost methods based on partial least squares (PLS) and near infrared reflectance spectroscopy for determining cellulose, hemicellulose, lignin and theoretical ethanol yield (TEY) in sorghum biomass. The models were built with 957 samples, obtained from more than 100 hybrids and inbred strains, in the ranges of 21.4–49.1% w/w, 18.4–34.8% w/w, 1.8–11.5% w/w and 221–412 L t−1 for cellulose, hemicellulose, lignin and TEY, respectively. These models presented root mean square errors of prediction of 1.5%, 1.7%, 0.8% and 12 L t−1 (and ranges of relative errors of prediction between −5.3 and 6.5%, −9.8 and 12.2%, −28.8 and 37.6%, and −5.6 and 6.1%), respectively. The methods were submitted to a complete multivariate analytical validation in accordance with the Brazilian and international guidelines, and considered accurate, linear, sensitive and unbiased. Finally the stability of these methods was monitored for approximately six months by developing appropriate control charts.

1. Introduction

Interest in new renewable and sustainable energy sources has increased in recent decades, mainly due to the search for alternatives to help reduce emissions of pollutants into the atmosphere [1,2]. In this sense, biofuel is one of the fastest growing sectors in the world economy, especially first-generation ethanol, produced by fermentation from energy crops rich in sugar and starch [3]. However, even though the production of first-generation biofuels represents a well-established and technically advanced option, lignocellulosic biomass has emerged and been considered the most promising source of sustainable energy in decades [4–6]. Biofuels derived from cellulosic materials are produced from renewable and widely available raw materials [2], while also increasing economic opportunities for farmers and rural communities [7].

A crop that has proved highly feasible for the generation of bioethanol is sweet sorghum, Sorghum bicolor (L.) Moench, mainly due to its high yield per hectare [8,9]. Sorghum is characterized as a C4 plant with high photosynthetic efficiency and one of the highest resistances to heat and drought [10]. In addition, it has particular potential for biofuel production in areas where sugarcane is currently produced, due to its shorter harvest period [11]. In Brazil, this makes sorghum a particularly promising complementary crop for the diversification of sugarcane croplands. After extraction of the fermentable juice, the solid fraction left behind is the sorghum bagasse, a lignocellulosic residue derived from the plant cell wall that contains a considerable amount of polymeric carbohydrates (cellulose and hemicellulose). These polysaccharides can be hydrolyzed to sugars and then fermented to obtain second-generation bioethanol [12,13]. Sorghum bagasse is typically composed of 27 to 48% cellulose, 19 to 24% hemicellulose, and 9 to 32% lignin [14].

While the cost reduction of the production of second-generation ethanol per unit mass of raw material can only be initially obtained by optimizing bagasse processing techniques, additional cost savings and increases in yield could be achieved by improving the quality of the feedstock [15]. Thus, sweet sorghum breeding programs have sought to develop new cultivars that provide a higher yield of ethanol per ton of bagasse [8,16,17]. These studies often generate a very large number of samples for which the lignocellulosic composition should be determined. Another demand of this type of research is the development of models to estimate the theoretical ethanol yield that can be produced from each bred cultivar. Some works have successfully correlated the composition analysis of biomass with the estimated ethanol yield for certain raw materials, including sorghum [8,11,15]. However, the standard methods used for these determinations are expensive, time consuming and not practically feasible for the analysis of such a large number of samples [18]. Thus, the development of faster and less expensive methods that can provide analytical support to plant breeding programs is necessary.

An analytical alternative to overcome the drawbacks of the traditional standard methods is the use of near infrared reflectance spectroscopy (NIRS) combined with multivariate calibration methods. This alternative enables the development of methods that are rapid, nondestructive, environmentally friendly, and solvent-free, with little or no residue generation, and that require a minimum of sample pretreatment and human intervention. The applicability of NIRS for biomass analysis was revealed in the last decade [19,20]. Since then, several papers have been published using NIRS to characterize feedstock used for producing biofuels, such as corn stover, switchgrass and cereal grains [19–22]. Only one paper has used NIRS to determine three sorghum bagasse properties: gross calorific value, in vitro digestibility and crude protein [14]. More important for the aims of plant breeding programs focused on biofuels, some papers have recently used NIRS to predict the actual ethanol yield from switchgrass [21], wheat, rye, triticale [22], beet [23] and maize [24]. In all of these papers, the ethanol yield was measured by HPLC after the feedstock samples were subjected to a fermentation process.

Considering that this work is focused on the demands of the sweet sorghum breeding program of the Brazilian Agriculture Research Corporation (EMBRAPA), which has generated more than one thousand cultivar samples, there is a pressing need to develop a rapid, low-cost and reliable analytical method for distinguishing hybrids with high ethanol yield potential. For the models developed in this paper almost one thousand samples were analyzed, which is two to five times the number of samples used in the previously cited papers. This makes the development of NIRS models for predicting the actual ethanol yield too laborious and expensive, due to the need for a fermentation process and chromatographic analysis. Thus, a feasible alternative is the development of a theoretical model for the ethanol yield. Only one paper has used NIRS for developing such a model [21], but it is based on the sugar composition, demanding the use of HPLC for obtaining reference values. In fact, almost all of the papers that developed NIRS models for biomass have required chromatographic analysis for obtaining reference values [19–24]. The alternative proposed in this paper is the use of a theoretical equation based exclusively on the cell wall polysaccharide composition of biomass [8,25]. This equation has been reliably applied for studying the potential of different sorghum cultivars in a breeding program [8] and has only required the use of less expensive extractive and gravimetric methods.

To guarantee the reliability of the developed methods, they were subjected to a complete multivariate analytical validation, an issue absent in all the previous NIRS models for biomass found in the literature. The following figures of merit (FOM) were estimated: trueness, precision, linearity, working range, selectivity, sensitivity, analytical sensitivity, bias and residual prediction deviation (RPD). An in-depth description of the estimation of these figures for multivariate methods can be found in the relevant literature [26–30]. In particular, selectivity and sensitivity were estimated based on the net analyte signal (NAS) [26,28] and the bias was evaluated according to ASTM [31]. For evaluating the linearity, this validation did not consider a correlation coefficient close to 1 to be sufficient, and the random behavior of the fit residuals was checked by appropriate statistical tests. A methodology initially used for univariate least squares [32] was adapted to multivariate calibration [30], and the Ryan–Joiner (RJ), Brown–Forsythe (BF), and Durbin–Watson (DW) tests were applied to the model residuals, in order to evaluate their normality, homoscedasticity, and independence, respectively. This multivariate validation was combined with a robust outlier detection based on the identification of samples with extreme leverages, large residuals in the X block (spectral data) or large residuals in the Y block (property values) [29]. Finally, once the NIRS methods were validated, they were implemented in the routine analysis and monitored by control charts, whose residuals were also evaluated with the same statistical tests cited above.

Thus, the aim of this study was the development, validation and post-model monitoring of methods based on partial least squares (PLS) and NIRS for predicting the composition (cellulose, hemicellulose and lignin) and theoretical bioethanol yield from cell wall structural components of sweet sorghum biomass, in order to satisfy the requirements of rapidity and low cost for the analysis of a huge amount of samples generated by a sorghum breeding program.

2. Materials and methods

2.1. Samples

The biomass samples analyzed in this study were from different genotypes of sweet sorghum, some obtained from the Breeding Program and others from the Active Germplasm Bank, both belonging to the Embrapa Maize and Sorghum Research Center, in Sete Lagoas, MG, Brazil. For material characterization, three tests were conducted in the experimental fields of Embrapa, in Sete Lagoas, between 2010 and 2013, from which three genetic diversity panels were assembled. In addition, in order to increase the variability incorporated into the calibration models, samples from a maturation test were included. These samples were collected from the early development of the plant until ten weeks later, characterizing the last stage of maturation. For all laboratory analyses, representative samples of the stalks of sorghum plants, after juice extraction, were weighed fresh and dried for 72 h at 65 °C. Dried samples were ground to pass through a 2-mm screen in a Wiley mill and packed in polyethylene pots. A total of 957 samples obtained from about one hundred hybrids and inbred strains were used for building the models. Once the methods were validated and implemented, in April 2013, fifty more samples were representatively selected over six months and analyzed by both the reference and NIRS methods, in order to perform a post-model monitoring.

2.2. Reagents

All reagents used in this work were of analytical grade, purchased from certified suppliers, and used without further purification. The following reagents were employed: sulfuric acid (95–97%), cetyltrimethylammonium bromide (CTAB, 99%), sodium dodecyl sulfate (99%), ethylenediaminetetraacetic acid disodium salt 2-hydrate (EDTA, 99–101%), sodium tetraborate decahydrate (borax, 99–105%), anhydrous dibasic sodium phosphate (99%), triethylene glycol, and acetone (99.5%).

2.3. Apparatus and software

Spectra were recorded on a Büchi NIRFlex N-500 FT-NIR spectrometer (Flawil, Switzerland), equipped with a diffuse reflectance accessory. The spectrometer was controlled and data were acquired using the NIRWare Operator software. Data were handled using MATLAB, version 7.13 (The MathWorks, Natick, MA, USA). The PLS routine came from the PLS Toolbox, version 6.5 (Eigenvector Technologies, Manson, WA, USA), and a homemade routine was also employed for the detection of outliers [29]. Other equipment used included a Tecnal fiber determiner (Piracicaba, Brazil), model TE-149, with capacity for 30 tests; a Quimis microprocessor-controlled muffle furnace (Diadema, Brazil), model Q318S, with ramp and soak; a Mars Scales analytical balance (North York, Canada), model AUW220, with a sensitivity of ±0.1 mg; an American Lab oven (San Francisco, USA), model AL 100/150, with air renewal and circulation; and a Wiley mill (Thomas Scientific, Swedesboro, USA). Ankom F57 filter bags (Ankom Technology, Macedon, USA) were also employed for fiber determinations.

2.4. Procedures

2.4.1. Near infrared spectroscopy

Sorghum biomass samples were dried, milled with a Wiley mill, and transferred to glass plates of 100 mm internal diameter, which were used as measurement cells to obtain the NIR spectra. The spectra were recorded in triplicate from 10,000 to 4000 cm−1 with 4 cm−1 steps, as the average of 32 scans, and at 25 ± 2 °C. Triplicates of samples with three different theoretical ethanol yield (TEY) levels, low, medium and high, were obtained for evaluating repeatability. These replicates were also analyzed on another day by a different analyst for estimating intermediate precision. Ten replicate spectra of the empty plate were recorded under the same conditions in order to estimate the instrumental noise.

2.4.2. Reference methods

Acid detergent fiber (ADF), neutral detergent fiber (NDF) and acid detergent lignin were determined by the sequential method developed by van Soest [33], and automated and simplified by Ankom Technology [34]. Approximately 500 mg of each sample was placed in filter bags, which were heat sealed. The samples were extracted with neutral detergent solution. The cell contents are solubilized in neutral detergent, whereas the cell wall, which is insoluble, can be separated by filtration. The filtered residue was weighed to determine the percentage of NDF. This residual material was extracted with acid detergent solution and the new residue was weighed to determine the percentage of ADF (consisting almost entirely of lignin and cellulose). For estimating lignin, the ADF residue was extracted with a 72% v/v H2SO4 solution, followed by ignition of the bag residues at 500 °C for 4 h. The cellulose content was determined as the difference between ADF and lignin, and the hemicellulose content as the difference between NDF and ADF.

2.4.3. Theoretical ethanol yield (TEY)

The TEY was estimated according to the equation adopted by Zhao et al. [8]. Through this equation, the theoretical yield of ethanol from carbohydrate polymers (cellulose and hemicellulose) was calculated in L ha−1 of biomass processed, as shown below.

TEY = ethanol yield from cellulose and hemicellulose (L ha−1) = cellulose and hemicellulose content (%) in dry matter × dry biomass (t ha−1) × 1.11 (conversion factor of sugar from cellulose and hemicellulose) × 0.85 (process efficiency of sugar from cellulose and hemicellulose) × 0.51 (conversion factor of ethanol from sugar) × 0.85 (process efficiency of ethanol from sugar) × 1000 / 0.79 (specific gravity of ethanol, g·mL−1).

In order to obtain the TEY in L t−1, the equation above was modified and the term dry biomass (t ha−1) was removed.

Fig. 1. NIR spectra of 957 samples of sweet sorghum biomass.
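As a rough illustration of the TEY equation in Section 2.4.3, the sketch below evaluates it in Python. It assumes that the cellulose and hemicellulose content is entered as a mass fraction (percentage divided by 100); the text does not state this explicitly, but it gives values on the same scale as the 221–412 L t−1 working range. The function names, constant names and the example composition are hypothetical and not taken from the paper.

```python
# Factors copied from the TEY equation in Section 2.4.3.
# The constant names are illustrative labels, not terms used by the authors.
SUGAR_FROM_POLYMER = 1.11       # conversion factor of sugar from cellulose and hemicellulose
SUGAR_PROCESS_EFFICIENCY = 0.85 # process efficiency of sugar from cellulose and hemicellulose
ETHANOL_FROM_SUGAR = 0.51       # conversion factor of ethanol from sugar
ETHANOL_PROCESS_EFFICIENCY = 0.85  # process efficiency of ethanol from sugar
ETHANOL_SPECIFIC_GRAVITY = 0.79    # g/mL


def tey_l_per_t(cellulose_pct, hemicellulose_pct):
    """Theoretical ethanol yield in litres per tonne of dry biomass.

    Assumption: the percentage content is converted to a mass fraction
    before it is multiplied by the conversion factors.
    """
    fraction = (cellulose_pct + hemicellulose_pct) / 100.0
    return (fraction
            * SUGAR_FROM_POLYMER * SUGAR_PROCESS_EFFICIENCY
            * ETHANOL_FROM_SUGAR * ETHANOL_PROCESS_EFFICIENCY
            * 1000.0 / ETHANOL_SPECIFIC_GRAVITY)


def tey_l_per_ha(cellulose_pct, hemicellulose_pct, dry_biomass_t_per_ha):
    """Per-hectare form of the equation: per-tonne yield times dry biomass."""
    return tey_l_per_t(cellulose_pct, hemicellulose_pct) * dry_biomass_t_per_ha


# Hypothetical sample: 35% cellulose and 25% hemicellulose in dry matter.
example_yield = tey_l_per_t(35.0, 25.0)
print(f"TEY = {example_yield:.1f} L/t")
```

Under this assumption the four factors and the density collapse into a single constant of about 517.7 L t−1 per unit mass fraction of cellulose plus hemicellulose.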

3. Results and discussion

3.1. NIR spectra and PLS model development

Lignocellulosic biomass consists of three main structural units: cellulose, hemicellulose and lignin. Cellulose is a crystalline polymer of glucose, hemicellulose is an amorphous polymer of xylose and arabinose, and lignin is a complex polymer of aromatic alcohols. Vibration bands associated with these chemical biomass components [20,35] can be observed in Fig. 1, which displays the NIR spectra of 957 analyzed samples of sweet sorghum, and they will be specifically attributed below, in the discussion about the regression coefficients of the PLS models. The spectral regions between 7200 and 6600, 6000 and 5500, 5400 and 4600 and 4600 and 4000 cm−1 can be attributed to O–H stretch first overtone, C–H stretch first overtone, O–H combination bands and C–H combination band regions, respectively [36].

The samples were divided into 638 for the calibration set and 319 for the validation set with the Kennard–Stone algorithm, which systematically ensures the presence of the most representative samples in the calibration set by a uniform scanning of the spectral data [37]. The presence of nonlinear baseline deviations, characteristic of diffuse reflectance spectra, was corrected by SNV (standard normal variate) preprocessing [38]. The alternatives of MSC (multiplicative scatter correction) and first derivative with Savitzky–Golay smoothing were also tried, but provided slightly worse results. All the models were then mean centered, and the number of latent variables (LVs) was chosen by contiguous block (with 10 splits) cross validation, considering the lowest root mean square error of cross validation (RMSECV). As the spectral region between 10,000 and 9000 cm−1 showed no significant absorption, it was removed from the models.
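The two data-handling steps named above, SNV preprocessing and Kennard–Stone sample selection, can be sketched in plain Python as follows. This is a minimal illustration on a toy data set, not the MATLAB/PLS Toolbox routines used in the study. Applying the selection to the SNV-corrected spectra is an assumption, since the text does not say which version of the spectra was used for the split, and all names and numbers are hypothetical.

```python
import math


def snv(spectrum):
    """Standard normal variate: center and scale one spectrum by its own mean and SD."""
    n = len(spectrum)
    mean = sum(spectrum) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in spectrum) / (n - 1))
    return [(x - mean) / sd for x in spectrum]


def _euclidean(a, b):
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


def kennard_stone(spectra, n_select):
    """Return the indices of n_select samples chosen by the Kennard-Stone rule."""
    n = len(spectra)
    dist = [[_euclidean(spectra[i], spectra[j]) for j in range(n)] for i in range(n)]
    # Start with the two samples that are farthest apart.
    first, second = max(
        ((i, j) for i in range(n) for j in range(i + 1, n)),
        key=lambda pair: dist[pair[0]][pair[1]],
    )
    selected = [first, second]
    remaining = [i for i in range(n) if i not in selected]
    while len(selected) < n_select and remaining:
        # Add the candidate whose nearest already-selected sample is farthest away.
        best = max(remaining, key=lambda i: min(dist[i][s] for s in selected))
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy "spectra" with three wavenumbers each (hypothetical values).
raw = [
    [1.0, 2.0, 3.0], [1.0, 2.1, 3.5], [2.0, 2.0, 5.0],
    [0.5, 3.0, 1.0], [1.2, 2.2, 3.1], [4.0, 1.0, 0.0],
]
processed = [snv(s) for s in raw]
calibration_idx = kennard_stone(processed, 4)  # 2:1 split, as for 638/319
validation_idx = [i for i in range(len(raw)) if i not in calibration_idx]
print(calibration_idx, validation_idx)
```

The full pairwise distance matrix is affordable for a few hundred or a thousand spectra, which is the scale of the data set described here; larger sets would call for an incremental implementation.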

The regression coefficients of the developed PLS models for cellulose, hemicellulose, lignin, and TEY are shown in Fig. 2. By observing this figure, it was possible to identify the wavenumbers that present the most positive coefficients, and thus contribute most to each model. For cellulose (Fig. 2a), the most important spectral variables are a broad band between 6800 and 6400 cm−1 (related to the first overtone of O–H stretching), and peaks at about 5970 cm−1 (first overtone of C–H stretching), 5210 cm−1 (combination band of O–H stretching and O–H deformation), 4415 cm−1 (O–H and C–O stretching combination band), 4290 cm−1 (combination of C–H stretching and CH2 deformation) and 4230 cm−1 (combination of C–H deformation and C–H stretching). For hemicellulose (Fig. 2b), the peaks are at about 7000 cm−1 (first overtone of O–H stretching), 5820 cm−1 (first overtone of C–H stretching), 4880 cm−1, 4280 cm−1 (combination of C–H stretching and CH2 deformation) and 4060 cm−1. For lignin (Fig. 2c), the peaks are at about 7040 cm−1 (first overtone of O–H stretching), 5940 cm−1 (first overtone of aromatic C–H stretching), 5230 cm−1 (first overtone of aromatic O–H stretching), 4415 cm−1 (combination band of O–H and C–O stretching), 4290 cm−1 and 4210 cm−1 [20,35,36]. For TEY (Fig. 2d), the peaks are at about 7060 cm−1, 5230 cm−1, 4440 cm−1 and 4330 cm−1.

Fig. 2. Regression coefficients of the PLS models for (a) cellulose, (b) hemicellulose, (c) lignin, and (d) TEY.

Nevertheless, this spectral interpretation of the regression vectors can be susceptible to criticism and should be cautiously considered, as discussed by Brown and Green [39]. They have highlighted the critical factors that limit the interpretation of regression vectors in multivariate calibration, such as the dependences on the samples in the calibration set, on the implicit covariance of the components, and on the signal to noise ratio of the data. In conclusion, these authors have recommended not relying on the spectral attributions of regression coefficients to evaluate the quality of the models, pointing to the need to validate their predictive performance in the real world by estimating FOM such as selectivity and sensitivity. This was carried out in this paper and will be discussed in Section 3.3.

3.2. Optimization of the models through the detection of outliers

After the PLS models were built, they were optimized by using the appropriate statistical procedures for the detection of outliers, based on the relevant literature [27,29,31,32]. The outliers were detected at the 95% confidence level, and the results are summarized in Table 1. The outliers can be removed within a limit of 2 out of 9 (22.2%) of the total number of samples, as established by Brazilian [40] and international guidelines [32,41]. The validation set optimization was only performed after finishing the calibration optimization. Finally, no more than three rounds of outlier detection (four models) should be performed, in order to avoid the snowballing effect, in which repeated rounds continue to identify outliers [27].

Table 1. Results for the optimization of the PLS models through the detection of outliers (the 4th model is the final one in each case).

| Model | Round | Ncal (a) | Nval (b) | NLV (c) | RMSEC (d) | RMSEP (d) |
|---|---|---|---|---|---|---|
| Cellulose | 1st | 638 | 319 | 6 | 2.7 | 2.7 |
| Cellulose | 2nd | 600 | 319 | 6 | 1.7 | 2.8 |
| Cellulose | 3rd | 568 | 319 | 6 | 1.6 | 2.9 |
| Cellulose | 4th | 542 | 248 | 6 | 1.0 | 1.5 |
| Hemicellulose | 1st | 638 | 319 | 6 | 3.5 | 3.5 |
| Hemicellulose | 2nd | 605 | 319 | 6 | 2.0 | 3.6 |
| Hemicellulose | 3rd | 570 | 319 | 6 | 1.8 | 3.6 |
| Hemicellulose | 4th | 550 | 249 | 6 | 1.4 | 1.7 |
| Lignin | 1st | 638 | 319 | 6 | 1.4 | 1.3 |
| Lignin | 2nd | 598 | 319 | 6 | 1.0 | 1.3 |
| Lignin | 3rd | 560 | 319 | 6 | 0.9 | 1.3 |
| Lignin | 4th | 532 | 256 | 6 | 0.8 | 0.8 |
| TEY | 1st | 638 | 319 | 4 | 18 | 14 |
| TEY | 2nd | 599 | 319 | 4 | 14 | 14 |
| TEY | 3rd | 562 | 319 | 4 | 12 | 14 |
| TEY | 4th | 534 | 250 | 4 | 9 | 12 |

(a) Number of calibration samples.
(b) Number of validation samples.
(c) Number of latent variables.
(d) Cellulose, hemicellulose and lignin in %; TEY in L t−1.


As can be seen in Table 1, the number of LVs used for the models varied from 4 to 6. For the model built for predicting cellulose, 96 outliers were detected in the calibration set (15%), of which 7, 26 and 63 were based on high leverage, large spectral (X) residuals and large property (Y) residuals, respectively; in the validation set, 71 outliers (22%) were removed. For the hemicellulose model, 88 outliers were detected in the calibration set (14%), of which 6, 25 and 57 were based on high leverage, large X residuals and large Y residuals, respectively; in the validation set, 70 outliers (22%) were removed. For the lignin model, 106 outliers (17%) were detected in the calibration set (7, 23 and 76, respectively), and 63 (20%) were removed from the validation set. For the TEY model, 104 outliers (16%) were detected in the calibration set (3, 28 and 73, respectively), and 69 (22%) were removed from the validation set.
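The outlier counts and percentages quoted above follow directly from the sample numbers of the first and final models in Table 1. The short sketch below reproduces that arithmetic; the dictionary layout and function name are illustrative choices, and only the sample counts come from the paper.

```python
# Sample counts for the first and the final (4th) model, taken from Table 1.
counts = {
    "cellulose":     {"calibration": (638, 542), "validation": (319, 248)},
    "hemicellulose": {"calibration": (638, 550), "validation": (319, 249)},
    "lignin":        {"calibration": (638, 532), "validation": (319, 256)},
    "TEY":           {"calibration": (638, 534), "validation": (319, 250)},
}


def removed(before, after):
    """Number of samples removed and the same figure as a percentage of the original set."""
    n_removed = before - after
    return n_removed, 100.0 * n_removed / before


summary = {
    model: {name: removed(*pair) for name, pair in sets.items()}
    for model, sets in counts.items()
}

for model, sets in summary.items():
    for name, (n_removed, pct) in sets.items():
        print(f"{model:14s} {name:12s} {n_removed:4d} removed ({pct:.1f}%)")
```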

3.3. Multivariate analytical validation

Table 2 summarizes the parameters estimated for evaluating theFOM for the proposed methods. In the following subsections, they willbe specifically discussed.

3.3.1. Linearity

The linearity of the developed methods was estimated through the fit of the predicted versus reference values. Qualitatively, the plot of the residuals of this fit for the calibration and validation samples indicates a linear behavior if they are randomly distributed. The residuals for the cellulose, hemicellulose, lignin and TEY models were plotted as a function of the predicted values and are shown in Fig. 3. As can be seen, no systematic trends were observed in these residual distributions. Nevertheless, the random behavior of these residuals should be confirmed by appropriate statistical tests (Table 2). For estimating the critical values for these tests, we considered N = 550, the largest number of samples used among the developed methods (hemicellulose, Table 1). Initially, the normality of the residuals was evaluated with the RJ test. The residuals can be considered normally distributed if Req ≥ Rcritic. The estimated Req values were all below the critical value of 0.9997 at the 95% confidence level, but they were all above 0.9928, the critical value at 99%. The large number of samples used leads to a high Rcritic and makes the RJ test too rigorous. Thus, the residuals for all the models had a normal distribution at the 99% confidence level. Next, the homoscedasticity of the residuals was evaluated with the BF test. They were considered homoscedastic if tL ≤ tcritic. The estimated tL values were all below the critical t of 1.96 (95% and infinite degrees of freedom), thus confirming the homoscedasticity. Finally, the independence of the residuals was evaluated with the DW test and confirmed, since all the calculated d values were within the estimated acceptance range, between 1.81 and 2.19. A deeper discussion about these tests can be found in the paper of Souza & Junqueira [32]. Once the assumptions for linear models were verified, the parameters of the fits of the reference versus predicted values shown in Table 2 can be considered valid.

Table 2. Parameters for evaluating the main FOM of the developed NIRS methods for the determination of cellulose, hemicellulose, lignin and theoretical ethanol yield (cellulose values).

| FOM | Parameter | Cellulose |
|---|---|---|
| Trueness | Relative errors (min/max) | −5.3/6.5% |
| Precision | RSD repeatability (a) | 1.5% |
| Precision | RSD intermediate precision (a) | 1.5% |
| Linearity | Req (RJ test) | 0.9964 |
| Linearity | tL (BF test) | 0.24 |
| Linearity | d (DW test) | 1.95 |
| Linearity | Slope (b) | 0.9 ± 0.1 |
| Linearity | Intercept (b) | 2.7 ± 4.3 |
| Linearity | Correlation coefficient (r) (b) | 0.9631 |
| Working range | | 21.4 to 49.1% |
| Selectivity | | 0.13 |
| Sensitivity | | 0.07 |
| Analytical sensitivity (γ) | | 1.7 %−1 |
| γ−1 | | 0.6% |
| Bias | Bias ± SDV (c) | −0.0312 ± 1.07 |
| Bias | Estimated t | 0.45 |
| RPD | RPD calibration | 3.7 |
| RPD | RPD validation | 2.7 |

(a) Results for three samples at three different content levels.
(b) Values for the line fitted to the calibration samples.
(c) Standard deviation of validation errors.
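The Durbin–Watson check used in the linearity study of Section 3.3.1 can be illustrated with a few lines of Python. The sketch uses the standard definition of the d statistic (sum of squared successive differences of the residuals divided by the sum of squared residuals); the acceptance range of 1.81 to 2.19 and the four d values are those reported in the text and in Table 2, while the residual sequences are made-up examples. The ordering of the residuals (for instance by predicted value) matters for this test and is an assumption left to the user here.

```python
def durbin_watson(residuals):
    """Durbin-Watson d statistic for an ordered sequence of residuals."""
    numerator = sum(
        (residuals[i] - residuals[i - 1]) ** 2 for i in range(1, len(residuals))
    )
    denominator = sum(e ** 2 for e in residuals)
    return numerator / denominator


def within_acceptance(d, lower=1.81, upper=2.19):
    """True if d lies inside the acceptance range quoted in the text for N = 550."""
    return lower <= d <= upper


# Made-up residuals: alternating signs give d well above 2, a steady drift gives d near 0.
d_alternating = durbin_watson([1.0, -1.0, 1.0, -1.0])
d_drifting = durbin_watson([1.0, 2.0, 3.0, 4.0])

# d values reported in Table 2 for the four models.
reported_d = {"cellulose": 1.95, "hemicellulose": 1.93, "lignin": 1.87, "TEY": 1.96}
all_independent = all(within_acceptance(d) for d in reported_d.values())
print(d_alternating, d_drifting, all_independent)
```

Values of d close to 2 indicate no first-order serial correlation, which is why the acceptance range is centered on that value.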

3.3.2. Trueness, precision, and working range

The trueness can be evaluated through the absolute parameters RMSEC (root mean square error of calibration) and RMSEP (root mean square error of prediction), as shown in Table 1. The RMSEP of 1.5%, 1.7%, 0.8% and 12 L t−1 for cellulose, hemicellulose, lignin and TEY, respectively, indicated that the methods provided results in good agreement with the reference values. The trueness can also be evaluated based on the observed individual relative errors (Table 2). For cellulose and TEY, all of these errors were within ±10%, which is in accordance with the Brazilian guidelines, which establish acceptance limits between −20 and +10% [42]. For hemicellulose, only 8 out of 249 (3%) validation samples presented relative errors above 10%, but for lignin 69 out of 256 (27%) samples had errors outside these limits. The lower trueness of this last method is attributed to the lower analytical range for lignin.
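The two trueness measures discussed above can be sketched as follows. The example assumes that the relative error is computed as (predicted − reference)/reference, expressed in percent, and uses the −20/+10% acceptance limits quoted from [42]. The function names and the four sample values are hypothetical and serve only to show the arithmetic.

```python
import math


def rmsep(reference, predicted):
    """Root mean square error of prediction for paired values."""
    n = len(reference)
    return math.sqrt(sum((p - r) ** 2 for r, p in zip(reference, predicted)) / n)


def relative_errors(reference, predicted):
    """Relative errors in percent, assuming (predicted - reference) / reference."""
    return [100.0 * (p - r) / r for r, p in zip(reference, predicted)]


def count_outside(errors, lower=-20.0, upper=10.0):
    """Number of samples whose relative error falls outside the acceptance limits."""
    return sum(1 for e in errors if e < lower or e > upper)


# Hypothetical reference and NIRS-predicted contents (% w/w)
reference = [30.0, 35.0, 40.0, 25.0]
predicted = [31.5, 34.0, 41.0, 28.0]

errors = relative_errors(reference, predicted)
print(f"RMSEP = {rmsep(reference, predicted):.2f}")
print(f"Relative errors (min/max) = {min(errors):.1f}/{max(errors):.1f}%")
print(f"Samples outside the limits: {count_outside(errors)} of {len(errors)}")
```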

The precision was calculated at the levels of repeatability and intermediate precision through the relative standard deviations (RSD). For all methods, these values varied between 0.4 and 1.5% for repeatability, and between 0.7 and 1.6% for intermediate precision (Table 2). All of these results were also in accordance with the Brazilian guidelines, which prescribe a maximum acceptable RSD of 4% for repeatability and 10% for intermediate precision [42]. The results of trueness and precision attest to the accuracy of the methods. Considering the linearity and accuracy studies, the working ranges of the methods were established from 21.4 to 49.1% w/w for cellulose, from 18.4 to 34.8% w/w for hemicellulose, from 1.8 to 11.5% w/w for lignin, and from 221 to 412 L t−1 for TEY.

Table 2 (continued). Values for hemicellulose, lignin and theoretical ethanol yield (TEY). Footnotes (a) to (c) as in the first part of the table.

| FOM | Parameter | Hemicellulose | Lignin | TEY |
|---|---|---|---|---|
| Trueness | Relative errors (min/max) | −9.8/12.2% | −28.8/37.6% | −5.6/6.1% |
| Precision | RSD repeatability (a) | 0.5% | 1.5% | 0.5% |
| | RSD intermediate precision (a) | 0.7% | 1.6% | 0.5% |
| Linearity | Req (RJ test) | 0.9952 | 0.9956 | 0.9980 |
| | tL (BF test) | 1.94 | 1.15 | 0.47 |
| | d (DW test) | 1.93 | 1.87 | 1.96 |
| | Slope (b) | 0.7 ± 0.2 | 0.8 ± 0.2 | 0.9 ± 0.1 |
| | Intercept (b) | 7.7 ± 6.1 | 1.3 ± 1.3 | 28.4 ± 41.9 |
| | Correlation coefficient (r) (b) | 0.8390 | 0.8920 | 0.9553 |
| Working range | | 18.4 to 34.8% | 1.8 to 11.5% | 221 to 412 L t−1 |
| Selectivity | | 0.11 | 0.14 | 0.26 |
| Sensitivity | | 0.15 | 0.27 | 0.02 |
| Analytical sensitivity (γ) | | 3.6 %−1 | 6.4 %−1 | 0.5 t L−1 |
| γ−1 | | 0.3% | 0.2% | 2 L t−1 |
| Bias | Bias ± SDV (c) | 0.106 ± 1.354% | 0.362 ± 0.778% | 0.049 ± 8.182 L t−1 |
| | Estimated t | 1.23 | 0.36 | 0.09 |
| RPD | RPD calibration | 1.8 | 2.2 | 3.3 |
| | RPD validation | 1.7 | 2.0 | 2.7 |

Fig. 3. Residuals for the calibration (empty circles) and validation (full triangles) samples. PLS models for (a) cellulose, (b) hemicellulose, (c) lignin, and (d) TEY determination.

3.3.3. Selectivity and analytical sensitivity

Selectivity and sensitivity were estimated based on the NAS [26,28]. For univariate methods the requirement of 100% selectivity is usual, but for multivariate methods there is no practical interest in establishing a minimum required value. Unlike for univariate methods, low values of selectivity can be obtained even for accurate multivariate methods. For the developed models, the estimated values of selectivity only indicated that 13, 11, 14 and 26% of the analytical signal were used for predicting cellulose, hemicellulose, lignin and TEY, respectively. Since sensitivity values are not appropriate for comparison with other methods, the analytical sensitivity (γ) was also calculated from the estimate of the instrumental noise, 0.042, which was obtained from the pooled standard deviation of ten replicate spectra of the empty plate. The inverse of γ (Table 2) provides an estimate of the minimum concentration differences that can be distinguished by the methods, considering the random instrumental noise as the only source of errors, and also defines the number of decimal places that should be used to express the prediction results.
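The relation between sensitivity, instrumental noise and analytical sensitivity can be checked numerically. The sketch below assumes the usual definition of γ as the ratio between the sensitivity and the instrumental noise; the sensitivity values and the noise estimate of 0.042 are those reported in Table 2 and in the text, while the function and variable names are illustrative.

```python
NOISE = 0.042  # instrumental noise estimate reported in the text

# Sensitivity values taken from Table 2
SENSITIVITY = {
    "cellulose": 0.07,
    "hemicellulose": 0.15,
    "lignin": 0.27,
    "TEY": 0.02,
}


def analytical_sensitivity(sensitivity, noise=NOISE):
    """Analytical sensitivity (gamma), assumed here to be sensitivity / noise."""
    return sensitivity / noise


for name, sen in SENSITIVITY.items():
    gamma = analytical_sensitivity(sen)
    # 1/gamma estimates the smallest distinguishable concentration difference
    print(f"{name}: gamma = {gamma:.2f}, 1/gamma = {1.0 / gamma:.2f}")
```

Rounded to one decimal place, the γ values obtained in this way agree with those listed in Table 2 (1.7, 3.6, 6.4 and 0.5).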

3.3.4. Bias and RPD

The bias was calculated only with the validation samples [31] and the estimates are shown in Table 2. All the calculated t-values were lower than the critical t-value (1.96, with infinite degrees of freedom and 95% confidence level). These results confirmed the absence of systematic errors in the models' predictions. According to the literature [30,43], good calibration models must have an RPD higher than 2.4, while RPD values between 1.5 and 2.4 are considered acceptable. As can be observed in Table 2, the RPD values for the cellulose and TEY methods were considered good, while those for hemicellulose and lignin were considered satisfactory.
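A minimal sketch of the bias test and of the RPD classification is given below. It assumes the bias t statistic of the form t = |bias| √n / SDV, with SDV computed with n − 1 degrees of freedom, and an RPD defined as the standard deviation of the reference values divided by the RMSE; both definitions and all function names are assumptions of this example. The thresholds 2.4 and 1.5 are those quoted from [30,43].

```python
import math
import statistics


def bias_and_sdv(reference, predicted):
    """Mean prediction error (bias) and standard deviation of validation errors."""
    errors = [p - r for r, p in zip(reference, predicted)]
    bias = statistics.fmean(errors)
    sdv = math.sqrt(sum((e - bias) ** 2 for e in errors) / (len(errors) - 1))
    return bias, sdv


def bias_t(bias, sdv, n):
    """t statistic for testing whether the bias differs from zero."""
    return abs(bias) * math.sqrt(n) / sdv


def rpd(reference, rmse):
    """Ratio of the standard deviation of the reference values to the RMSE."""
    return statistics.stdev(reference) / rmse


def classify_rpd(value):
    """Classification according to the thresholds quoted in the text."""
    if value > 2.4:
        return "good"
    if value >= 1.5:
        return "acceptable"
    return "below the acceptable range"


# Hemicellulose summary values from Table 2 and the 249 validation samples
t_hemi = bias_t(bias=0.106, sdv=1.354, n=249)
print(f"t (hemicellulose) = {t_hemi:.2f}, below 1.96: {t_hemi < 1.96}")
print("RPD 2.7 ->", classify_rpd(2.7))
print("RPD 1.7 ->", classify_rpd(1.7))
```

Under these assumptions, the t value recomputed from the rounded hemicellulose entries is close to the 1.23 listed in Table 2.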

3.4. Control charts for post-model monitoring

Once the NIRS methods were validated, their stability was monitored through control charts. From April to September 2013, 50 sorghum biomasses were representatively sampled, two per week, and analyzed by both NIRS and reference methods. These results are shown in Fig. 4. The limits indicated in these control charts are based on the Brazilian guidelines [40] for monitoring the stability of quantitative methods for food and agricultural analysis, which establish that two thirds of the samples must have relative errors within ±15% (alert limits), while the remaining third can present errors of up to ±33% (action limits). As can be seen in this figure, only two samples showed relative errors between the alert and action limits for the cellulose and hemicellulose models (Fig. 4a–b), while no sample presented errors above the alert limits for TEY (Fig. 4d). The worst results were observed for the lignin model (Fig. 4c), in which 15 out of 50 samples (30%) showed errors between the alert and action limits. Nevertheless, this last result was still within the acceptance limits and all the models were considered satisfactorily stable.
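The acceptance rule applied to the control charts can be expressed in a few lines. The sketch below reads the rule as "at least two thirds of the relative errors within the alert limits and none beyond the action limits", which is an interpretation of the description above. The function name and the example error values are hypothetical; only the counts (35 samples within and 15 between the limits) mirror the lignin case reported in the text.

```python
def summarize_control_chart(relative_errors, alert=15.0, action=33.0):
    """Count samples per zone and apply the two-thirds acceptance rule."""
    n = len(relative_errors)
    within_alert = sum(1 for e in relative_errors if abs(e) <= alert)
    beyond_action = sum(1 for e in relative_errors if abs(e) > action)
    between = n - within_alert - beyond_action
    # Assumed rule: >= 2/3 within the alert limits, none beyond the action limits
    stable = (3 * within_alert >= 2 * n) and beyond_action == 0
    return {
        "within_alert": within_alert,
        "between_limits": between,
        "beyond_action": beyond_action,
        "stable": stable,
    }


# Hypothetical relative errors (%) reproducing the counts reported for lignin
lignin_errors = [5.0] * 35 + [20.0] * 10 + [-25.0] * 5
print(summarize_control_chart(lignin_errors))
```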

To complement the analysis of the developed control charts, the random distribution of errors was checked with the same statistical tests previously used for the linearity evaluation (Table 3). For the RJ test, the hemicellulose, lignin and TEY control charts presented Req values above the Rcritic (0.9764, N = 50) at 95% and the cellulose control chart presented a value very close to this limit. Thus, at the 96% confidence level, all the errors were considered normally distributed. For the BF test, cellulose, lignin and TEY showed tL values below the critical value (2.01) at 95%, while the tL estimated for hemicellulose was accepted only at 98% (tcritic = 2.40), guaranteeing the homoscedasticity of the errors at this level. Finally, the DW test indicated that only the hemicellulose and TEY models presented d values within the acceptance range (1.58–2.42, at 95%), while cellulose and lignin presented errors with a certain degree of dependence.
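The independence check on the time-ordered control chart errors can be sketched as below, assuming that DW denotes the Durbin-Watson test and that d is computed as the sum of squared successive differences divided by the sum of squared errors. The d values and the acceptance range (1.58 to 2.42) are those of Table 3 and of the text; the function names are illustrative.

```python
def durbin_watson(errors):
    """Durbin-Watson d statistic for a sequence of time-ordered errors."""
    num = sum((errors[i] - errors[i - 1]) ** 2 for i in range(1, len(errors)))
    den = sum(e ** 2 for e in errors)
    return num / den


def is_independent(d, lower=1.58, upper=2.42):
    """True when d lies inside the acceptance range used for N = 50 at 95%."""
    return lower <= d <= upper


# d values reported in Table 3
D_VALUES = {"cellulose": 1.21, "hemicellulose": 1.59, "lignin": 1.16, "TEY": 1.64}

for name, d in D_VALUES.items():
    print(f"{name}: d = {d:.2f}, within acceptance range: {is_independent(d)}")
```

Values of d well below 2, as found for cellulose and lignin, point to positive correlation between consecutive errors.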

Fig. 4. Control charts for monitoring the stability of the NIRS models. (a) cellulose, (b) hemicellulose, (c) lignin, and (d) TEY. The sample number on the x axis corresponds to a homogeneous sampling from April to September 2013.

Table 3. Parameters of the statistical tests used to evaluate the control charts developed for monitoring the NIRS models.

| Test/parameter | Cellulose | Hemicellulose | Lignin | TEY |
|---|---|---|---|---|
| RJ/Req | 0.9763 | 0.9835 | 0.9861 | 0.9843 |
| BF/tL | 1.53 | 2.36 | 0.23 | 0.30 |
| DW/d | 1.21 | 1.59 | 1.16 | 1.64 |

4. Conclusions

Considering that nowadays biorefineries do not differentiate hybrids according to their potential ethanol production [24], and the need for a large number of experiments to accomplish biofeedstock improvements, rapid and low cost analytical methods are necessary to characterize this feedstock and estimate the ethanol yield. Thus, this paper developed and validated multivariate calibration methods based on diffuse reflectance NIRS for determining cellulose, hemicellulose and lignin in samples of sweet sorghum biomass. In addition, the theoretical ethanol yield was determined from these cell wall structural components. The developed methods provided cost savings of US$ 3.30 per sample and drastically reduced the analysis time, since only about 1 min is necessary to obtain a spectrum and predict the four parameters. The advantages of this methodology are even more evident taking into account that this work analyzed around one thousand samples, obtained from more than 100 hybrids and inbred strains, and did not require fermentation processes or chromatographic methods to obtain reference values. In addition, the theoretical estimate of ethanol yield from structural components is considered more reliable than the alternative and higher cost approach of quantifying the monomer sugar content, since plant structural characteristics such as lignification affect monomer sugar availability for conversion to ethanol [11].

The methods were submitted to a complete multivariate analytical validation in accordance with the Brazilian and international guidelines, and considered accurate, linear, sensitive and unbiased. Finally, the stability of these methods was monitored for approximately six months by developing appropriate control charts.

References

[1] S. Soimakallio, K. Koponen, How to ensure greenhouse gas emission reductions by increasing the use of biofuels? – Suitability of the European Union sustainability criteria, Biomass Bioenergy 35 (2011) 3504–3513.

[2] L.L. Nass, P.P.A. Pereira, D. Ellis, Biofuels in Brazil: an overview, Crop Sci. 47 (2007) 2228–2237.

[3] L.A.B. Cortez (Ed.), Sugar Cane Bioethanol: R&D for Productivity and Sustainability, Blucher, São Paulo, 2010.

[4] J.S. Yuan, K.H. Tiller, H. Al-Ahmad, N.R. Stewart, C.N. Stewart, Plants to power: bioenergy to fuel the future, Trends Plant Sci. 13 (2008) 421–429.

[5] L.R. Lynd, J.H. Cushman, R.J. Nichols, C.E. Wyman, Fuel ethanol from cellulosic biomass, Science 251 (1991) 1318–1323.

[6] V. Menon, M. Rao, Trends in bioconversion of lignocellulose: biofuels, platform chemicals & biorefinery concept, Prog. Energy Combust. Sci. 38 (2012) 522–550.

[7] A. Eisentraut, Sustainable Production of Second-generation Biofuels: Potential and Perspectives in Major Economies and Developing Countries, International Energy Agency, Paris, 2010.

[8] Y.L. Zhao, A. Dolat, Y. Steinberger, X. Wang, A. Osman, G.H. Xie, Biomass yield and changes in chemical composition of sweet sorghum cultivars grown for biofuel, Field Crop Res. 111 (2009) 55–64.

[9] X.F. Li, S.L. Hou, M. Su, M.F. Yang, S.H. Shen, G.M. Jiang, D.M. Qi, S.Y. Chen, G.S. Liu, Major energy plants and their potential for bioenergy development in China, Environ. Manag. 46 (2010) 579–589.

[10] E. Billa, D.P. Koullas, B. Monties, E.G. Koukios, Structure and composition of sweet sorghum stalk components, Ind. Crop. Prod. 6 (1997) 297–302.

[11] K.J. Han, W.D. Pitman, M. Kim, D.F. Day, M.W. Alison, M.E. McCormick, G. Aita, Ethanol production potential of sweet sorghum assessed using forage fiber analysis procedures, GCB Bioenergy 5 (2013) 358–366.

[12] S. Balint, J. Reczey, Z. Somorai, Z. Kadar, D. Dienes, K. Reczey, Sweet sorghum as feedstock for ethanol production: enzymatic hydrolysis of steam-pretreated bagasse, Appl. Biochem. Biotechnol. 153 (2009) 151–162.

[13] A. Goshadrou, K. Karimi, M.J. Taherzadeh, Bioethanol production from sweet sorghum bagasse by Mucor hiemalis, Ind. Crop. Prod. 34 (2011) 1219–1225.

[14] C.A. Roberts, J.H. Houx, F.B. Fritschi, Near-infrared analysis of sweet sorghum bagasse, Crop Sci. 51 (2011) 2284–2288.


[15] A.J. Lorenz, R.P. Anex, A. Isci, J.G. Coors, N. de Leon, P.J. Weimer, Forage quality and composition measurements as predictors of ethanol yield from maize (Zea mays L.) stover, Biotechnol. Biofuels 2 (2009) 1–8.

[16] K. Theerarattananoon, X. Wu, S. Staggenborg, J. Propheter, R. Madl, D. Wang, Evaluation and characterization of sorghum biomass as feedstock for sugar production, Trans. ASABE 53 (2010) 509–525.

[17] S.C. Murray, W.L. Rooney, S.E. Mitchell, A. Sharma, P.E. Klein, J.E. Mullet, S. Kresovich, Genetic improvement of sorghum as a biofuel feedstock: II. QTL for stem and leaf structural carbohydrates, Crop Sci. 48 (2008) 2180–2193.

[18] J.B. Sluiter, R.O. Ruiz, C.J. Scarlata, A.D. Sluiter, D.W. Templeton, Compositional analysis of lignocellulosic feedstocks. 1. Review and description of methods, J. Agric. Food Chem. 58 (2010) 9042–9053.

[19] B.R. Hames, S.R. Thomas, A.D. Sluiter, C.J. Roth, D.W. Templeton, Rapid biomass analysis – new tools for compositional analysis of corn stover feedstocks and process intermediates from ethanol production, Appl. Biochem. Biotechnol. 105 (2003) 5–16.

[20] F. Xu, J. Yu, T. Tesso, F. Dowell, D. Wang, Qualitative and quantitative analysis of lignocellulosic biomass using infrared techniques: a mini-review, Appl. Energy 104 (2013) 801–809.

[21] K.P. Vogel, B.S. Dien, H.G. Jung, M.D. Casler, S.D. Masterson, R.B. Mitchell, Quantifying actual and theoretical ethanol yields for switchgrass strains using NIRS analyses, Bioenergy Res. 4 (2011) 96–110.

[22] F. Pohl, T. Senn, A rapid and sensitive method for the evaluation of cereal grains in bioethanol production using near infrared reflectance spectroscopy, Bioresour. Technol. 102 (2011) 2834–2841.

[23] C. Magaña, N. Núñez-Sánchez, V.M. Fernández-Cabanás, P. García, A. Serrano, D. Pérez-Marín, J.M. Pemán, E. Alcalde, Direct prediction of bioethanol yield in sugar beet pulp using near infrared spectroscopy, Bioresour. Technol. 102 (2011) 9542–9549.

[24] X. Hao, K. Thelen, J. Gao, Prediction of the ethanol yield of dry-grind maize grain using near infrared spectroscopy, Biosyst. Eng. 112 (2012) 161–170.

[25] S. Yokoyama (Ed.), The Asian Biomass Handbook. A Guide for Biomass Production and Utilization, The Japan Institute of Energy, Japan, 2008.

[26] P. Valderrama, J.W.B. Braga, R.J. Poppi, State of the art of figures of merit in multivariate calibration, Quim. Nova 32 (2009) 1278–1287.

[27] P. Valderrama, J.W.B. Braga, R.J. Poppi, Variable selection, outlier detection, and figures of merit estimation in a partial least-squares regression multivariate calibration model. A case study for the determination of quality parameters in the alcohol industry by near-infrared spectroscopy, J. Agric. Food Chem. 55 (2007) 8331–8338.

[28] M.A.M. Silva, M.H. Ferreira, J.W.B. Braga, M.M. Sena, Development and analytical validation of a multivariate calibration method for determination of amoxicillin in suspension formulations by near infrared spectroscopy, Talanta 89 (2012) 342–351.

[29] M.H. Ferreira, J.W.B. Braga, M.M. Sena, Development and validation of a chemometric method for direct determination of hydrochlorothiazide in pharmaceutical samples by diffuse reflectance near infrared spectroscopy, Microchem. J. 109 (2013) 158–164.

[30] B.G. Botelho, B.A.P. Mendes, M.S. Sena, Development and analytical validation of robust near-infrared multivariate calibration models for the quality inspection control of mozzarella cheese, Food Anal. Methods 6 (2013) 881–891.

[31] Annual Book of ASTM Standards, Standard Practices for Infrared Multivariate Quantitative Analysis – E1655-05, ASTM International, West Conshohocken, Pennsylvania, USA, 2012.

[32] S.V.C. Souza, R.G. Junqueira, A procedure to assess linearity by ordinary least squares method, Anal. Chim. Acta 552 (2005) 25–35.

[33] P.J. van Soest, Nutritional Ecology of Ruminant, Cornell University Press, Ithaca, 1994.

[34] Ankom Technology, http://www.ankom.com/procedures.aspx (accessed in November 2013).

[35] C.M. Popescu, M.C. Popescu, A near infrared spectroscopy study of the structural modifications of lime (Tilia cordata Mill.) wood during hydro-thermal treatment, Spectrochim. Acta A 115 (2013) 227–233.

[36] J.J. Workman Jr., L. Weyer, Practical Guide to Interpretive Near-Infrared Spectroscopy, CRC Press, Boca Raton, USA, 2008.

[37] R.W. Kennard, L.A. Stone, Computer aided design of experiments, Technometrics 11 (1969) 137–148.

[38] R.J. Barnes, M.S. Dhanoa, S.J. Lister, Standard normal variate transformation and de-trending of near-infrared diffuse reflectance spectra, Appl. Spectrosc. 43 (1989) 772–777.

[39] C.D. Brown, R.L. Green, Critical factors limiting the interpretation of regression vectors in multivariate calibration, TrAC Trends Anal. Chem. 28 (2009) 506–514.

[40] Ministério da Agricultura Pecuária e Abastecimento (MAPA), Secretaria de Defesa Agropecuária, Instrução Normativa No 24, 2009. (Brazil).

[41] W. Horwitz, Protocol for the design, conduct and interpretation of method-performance studies: Revised 1994, Pure Appl. Chem. 67 (1995) 331–343.

[42] Ministério da Agricultura Pecuária e Abastecimento (MAPA), Secretaria de Defesa Agropecuária, Manual da Garantia da Qualidade, Brasília, Brazil, 2011.

[43] P. Williams, Implementation of near-infrared technology, in: P. Williams, K. Norris (Eds.), Near-Infrared Technology in the Agricultural and Food Industries, 2nd ed., American Association of Cereal Chemists Inc., St. Paul, USA, 2001.