

Near-Infrared Mixture Identification by an Automated Library Searching Method: A Multivariate Approach

SU-CHIN LO* and CHRIS W. BROWN†

Department of Chemistry, University of Rhode Island, Kingston, Rhode Island 02881

A mathematical technique for the identification of components in the near-infrared spectra of liquid mixtures without any prior chemical information is demonstrated. Originally, the technique was developed for searching mid-infrared spectral libraries. It utilizes principal component analysis to generate an orthonormal reference library and to compute the projections or scores of a mixture spectrum onto the principal space spanned by the orthonormal set. Both library and mixture spectra are analyzed and processed in the Fourier domain to enhance the searching performance. A calibration matrix is calculated from library scores and is used to predict the mixture composition. Five liquid mixtures were correctly identified with the use of the calibration algorithm, whereas only one mixture was correctly characterized with a straight dot-product metric. The predictions were verified with the use of an adaptive filter to remove each of the resulting components from the library and the mixture spectra. In addition, a similarity index between the original mixture spectrum and a regenerated mixture spectrum is used as a final confirmation of the predictions. The effects of random noise on the searching method were also examined, and further enhancements of searching performance are suggested for identifying poor-quality mixture spectra.

Index Headings: Computer applications; Infrared; Near-infrared; Library search chemometrics; Spectroscopic techniques.

INTRODUCTION

Near-infrared spectroscopy (NIRS) is being used extensively as a quantitative and qualitative method to analyze pharmaceuticals, polymers, and biological samples.¹ The absorption bands observed in this region (12,500-3800 cm⁻¹) are overtones and combinations of fundamental vibrational bands observed in the mid-infrared spectrum. The ease of preparing samples, the reduced interferences due to moisture absorptions, and a variety of transparent solvents make NIRS measurements feasible for process and quality control in the industrial world.

The widespread availability of computers has led to the extensive use of mathematical techniques for processing spectral data. In recent years, there have been numerous applications of multivariate methods for quantitative analysis of complex mixtures in NIRS.² For qualitative interpretation, the conventional method for routine identification of chemical species is a library search based on a spectral mapping algorithm.³,⁴ More recent processes for classifying and interpreting spectra are based on factor analysis,⁵ pattern recognition,⁶ cluster analysis,⁷ and artificial intelligence algorithms.⁸,⁹ The overtones and combinations in the NIRS region cause complex, overlapping bands, making it difficult to visually interpret the chemical patterns of a molecule. Although computer-aided spectral mapping methods provide sufficient capability for identifying a single component,¹⁰ direct mapping of mixture spectra suffers from overlapped and unresolved bands.

A method for qualifying an unknown mixture spectrum is described herein. The method, which we refer to as a Mix-Match search, involves four parts: (1) Fourier-domain processing for improving the signal-to-noise ratio (S/N) and reducing the size of the data matrix; (2) application of principal component analysis (PCA) for generating and calibrating a new spectral library as orthogonal spectral patterns; (3) an adaptive filter for making the mixture spectrum and the library spectra orthogonal to the spectrum of each target component; and (4) a similarity index for verifying the final predictions. The theory and algorithms for handling medium- and large-size mid-infrared spectral libraries of gases have been described in detail previously¹¹,¹² and will be briefly outlined here.

Received 5 January 1992.
* Present address: United States Tobacco Company, 800 Harrison Street, Nashville, TN 37203.
† Author to whom correspondence should be sent.

THEORY

The major assumption for a mixture analysis is that the spectrum of a mixture behaves as a linear combination of pure-component spectra. Therefore, the mixture spectrum M is given by

$$M = CX \tag{1}$$

where C represents the concentration coefficients and X can be expressed as a matrix of spectra for pure components (m rows of spectra, each with n columns of wavelengths). In these terms, the problem is to determine the C matrix that gives the best fit of M as a linear combination of the reference library. Orthogonalization of the X matrix is necessary to maintain the uniqueness and linear independence for each reference spectrum in the library.¹³ The process of PCA is applied here to find a new system of orthogonal axes and to maintain the maximum variances.
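The linear model of Eq. 1 can be illustrated with a minimal sketch in Python. The three "pure-component" spectra and the coefficients below are made-up numbers chosen only for illustration; they are not data from this work.

```python
# Hypothetical pure-component spectra (rows of X), five wavelength points each.
X = [
    [0.9, 0.4, 0.0, 0.1, 0.0],  # component A (made-up values)
    [0.0, 0.2, 0.8, 0.3, 0.0],  # component B (made-up values)
    [0.1, 0.0, 0.1, 0.5, 0.7],  # component C (made-up values)
]

# Concentration coefficients C; a zero means the component is absent.
C = [0.6, 0.0, 0.4]


def mix(coeffs, spectra):
    """Return M = C X, the coefficient-weighted sum of the library rows."""
    n_points = len(spectra[0])
    return [
        sum(c * row[j] for c, row in zip(coeffs, spectra))
        for j in range(n_points)
    ]


M = mix(C, X)
print([round(value, 3) for value in M])
```

The search problem is the inverse of this calculation: given M and the library X, find which entries of C are significantly different from zero.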

Library searching methods based on PCA have been described in the literature.¹⁴,¹⁵ Generally, the similarity metric is based on the dot-product comparison between the reference and unknown scores (projections). The PCA method is capable of analyzing large spectral libraries to remove redundant properties, leaving only significant information.¹⁶ The data matrix X is decomposed as

$$X = UV' \tag{2}$$

where U is an m × f matrix of PC scores and V is an n × f matrix of PC loadings (f is the number of significant principal components; the prime denotes the transpose). The score matrix U is composed of the original data points in a new coordinate system described by the eigenvectors (loadings).

790 Volume 46, Number 5, 1992 0003-7028/92/4605-0790$2.00/0 APPLIED SPECTROSCOPY © 1992 Society for Applied Spectroscopy

In performing a multivariate calibration, the concentration matrix C is regressed onto the library scores (U) by the following equation:

$$C = US \tag{3}$$

where S is an f × m matrix of regression coefficients. For the present method, the key hypothesis is that the concentration of each normalized spectrum in the reference library is equal to one. Under this condition C is the m × m identity matrix, and the least-squares solution of the calibration coefficients S becomes

$$S = (U^{t}U)^{-1}U^{t} \tag{4}$$

where $U^{t}$ is the transpose of the library scores (U).

The decomposition of an unknown mixture is achieved by projecting its spectrum onto the reference PCA loadings to form a mixture score $U_{unk}$. Therefore, the prediction of the relative composition of target components is performed by multiplying the regression coefficient matrix S by the unknown score ($U_{unk}$) as shown in Eq. 3. The resulting concentrations in the C matrix are referred to as the composition indices (CI) and range from 1.0 for pure components to 0.0. We sort these by decreasing size in order to place the highest hit at the top.
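Equations 2 to 4 can be sketched with NumPy as follows. This is only an illustration under stated assumptions: the three-spectrum library is made up, PCA is done by a singular value decomposition without mean-centering, and all components are retained (f = m), so the composition indices are recovered exactly. None of these choices is claimed to reproduce the original C program.

```python
import numpy as np

# Hypothetical library: 3 "pure" spectra (rows) sampled at 6 points.
X = np.array([
    [1.0, 0.5, 0.0, 0.0, 0.2, 0.0],
    [0.0, 0.3, 1.0, 0.4, 0.0, 0.0],
    [0.0, 0.0, 0.2, 0.6, 1.0, 0.3],
])
X = X / np.linalg.norm(X, axis=1, keepdims=True)  # normalize each spectrum

# PCA via SVD (assumption: no mean-centering), giving X = U V' as in Eq. 2.
W, singular_values, Vt = np.linalg.svd(X, full_matrices=False)
f = 3                                   # number of components kept
U = W[:, :f] * singular_values[:f]      # scores, m x f
V = Vt[:f].T                            # loadings, n x f

# Eq. 4: calibration coefficients when every library concentration is one.
S = np.linalg.inv(U.T @ U) @ U.T        # f x m

# Simulated unknown: 0.6 of spectrum 0 plus 0.4 of spectrum 2.
mixture = 0.6 * X[0] + 0.4 * X[2]
u_unk = mixture @ V                     # mixture score (projection on loadings)
ci = u_unk @ S                          # composition indices, Eq. 3

ranking = np.argsort(-ci)               # highest hit first
print(np.round(ci, 3), ranking)
```

When fewer components than library spectra are kept (f < m), as with the 88 loadings used for 100 spectra in this work, the recovery is only approximate and nonzero indices appear for non-target compounds.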

After selection of the major target component which exhibits the highest hit in the prediction (CI) list, the original reference spectra and mixture spectrum are made orthogonal to this target component; this process is referred to as the adaptive filter. A new orthogonal spectral library is generated by PCA and is used to determine the next target compound. The process of adaptive filtering to remove a target component followed by PCA is repeated until the residual composition is less than a default threshold value. Without any chemical interactions, nonlinearity of absorbance, impurities, or computer round-off errors, the residual compositions should be equal to zero after removal of all the target components. However, in real mixture samples, residual compositions less than 0.1 suggest that the appropriate selection of target components has been made during the mixture evaluation.
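A minimal sketch of one adaptive-filter step is given below. It assumes that "made orthogonal" means subtracting the projection onto the target spectrum (a Gram-Schmidt-style step); the text does not spell out the exact operation, and the spectra and names here are hypothetical.

```python
def dot(a, b):
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def remove_component(vector, target):
    """Subtract from `vector` its projection onto `target`."""
    scale = dot(vector, target) / dot(target, target)
    return [v - scale * t for v, t in zip(vector, target)]


# Hypothetical library spectra and a mixture of 0.6 A + 0.4 B.
library = {
    "A": [1.0, 0.2, 0.0, 0.0],
    "B": [0.1, 0.9, 0.3, 0.0],
    "C": [0.0, 0.1, 0.5, 0.8],
}
mixture = [0.6 * a + 0.4 * b for a, b in zip(library["A"], library["B"])]

# Suppose "A" was the highest hit: filter it out of mixture and library.
target = library["A"]
filtered_mixture = remove_component(mixture, target)
filtered_library = {
    name: remove_component(spectrum, target)
    for name, spectrum in library.items()
    if name != "A"
}
print([round(v, 4) for v in filtered_mixture])
```

In this idealized case the filtered mixture is exactly 0.4 times the filtered spectrum of B, so the next pass through the reduced library would point at B alone.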

After determination of all possible candidates in the mixture by the adaptive filter method, a similarity index (SI) is calculated. SI is defined as the dot-product between the original mixture PCA score and the weighted scores of selected components from the library; TM the weights are the dot-products of the scores for the mixture and the scores for each of the selected component spectra during the original pass through the library. An SI value close to 1.00 indicates a correct combination of selected components for matching the unknown mixture.
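The similarity index described above can be sketched as follows. The normalization of every score vector to unit length, and of the regenerated score before the final dot product, is an assumption made here so that a perfect match gives exactly 1.0; the scores are hypothetical.

```python
import math


def dot(a, b):
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def unit(vector):
    """Scale a vector to unit length."""
    norm = math.sqrt(dot(vector, vector))
    return [x / norm for x in vector]


def similarity_index(mixture_score, selected_scores):
    """Dot product of the mixture score with the weighted sum of selected scores."""
    m = unit(mixture_score)
    components = [unit(score) for score in selected_scores]
    # Weights: dot products between the mixture and each selected component.
    weights = [dot(m, c) for c in components]
    regenerated = [
        sum(w * c[j] for w, c in zip(weights, components))
        for j in range(len(m))
    ]
    return dot(m, unit(regenerated))


# Hypothetical scores: the mixture is built from components 1 and 2 only.
score_1 = [1.0, 0.0, 0.0]
score_2 = [0.0, 1.0, 0.0]
score_3 = [0.0, 0.0, 1.0]
mixture_score = [0.6, 0.8, 0.0]

si_right = similarity_index(mixture_score, [score_1, score_2])
si_wrong = similarity_index(mixture_score, [score_1, score_3])
print(round(si_right, 3), round(si_wrong, 3))
```

Comparing the SI of alternative candidate sets in this way mirrors how competing selections are judged for NIRMIX9 in the results.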

The spectral processing in the Fourier domain improves the performance in the analysis of mixture spectra. The spectral information such as peak positions, widths, and intensities in the spectral domain is all compressed into a small number of Fourier terms. It has been shown that selecting a window of Fourier terms for spectral processing can increase the signal-to-noise ratio and remove background effects without losing the majority of the spectral information during the library searching.¹⁷⁻¹⁹
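The effect of selecting a window of Fourier terms can be sketched with a plain discrete Fourier transform. The synthetic band, the 64-point length, and the window limits below are illustrative assumptions; the sketch keeps complex DFT coefficients, which may differ from the representation used in the original program.

```python
import cmath
import math


def dft(signal):
    """Naive discrete Fourier transform (adequate for short vectors)."""
    n = len(signal)
    return [
        sum(signal[k] * cmath.exp(-2j * math.pi * i * k / n) for k in range(n))
        for i in range(n)
    ]


def fourier_window(signal, first, last):
    """Keep Fourier terms first..last inclusive (term 0 is the constant term)."""
    return dft(signal)[first:last + 1]


# Synthetic absorption band, with and without a constant baseline offset.
n_points = 64
band = [math.exp(-((k - 30) / 4.0) ** 2) for k in range(n_points)]
offset_band = [value + 0.5 for value in band]

# Discard the first three terms and the high-order terms, as a small analog
# of the 3-130 window used in this work.
window_a = fourier_window(band, 3, 20)
window_b = fourier_window(offset_band, 3, 20)
max_difference = max(abs(a - b) for a, b in zip(window_a, window_b))
print(len(window_a), max_difference)
```

A constant offset lives entirely in term 0, so the two windows agree to within round-off; dropping the high-order terms in the same step reduces both the vector length and the contribution of high-frequency noise.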

TABLE I. Mixture lists of near-infrared spectra.

File name   Chemical name     Concentration (M)
NIRMIX1     Benzyl alcohol    0.26
            n-Hexane          0.32
NIRMIX3     1-Propanol        0.52
            Aniline           0.29
NIRMIX6     o-Xylene          0.46
            m-Xylene          0.47
NIRMIX7     o-Xylene          0.52
            m-Xylene          0.49
            p-Xylene          0.48
NIRMIX9     Acetone           0.58
            Benzyl alcohol    0.23
            Cyclopentanol     0.22

EXPERIMENTAL

Spectroscopic Measurements. Near-infrared spectra of organic liquids were measured on a Bio-Rad (Digilab Division) FTS-40N Fourier transform near-infrared spectrometer equipped with a PbSe detector and a quartz beamsplitter. All spectra were measured from 10,000 cm⁻¹ to 3600 cm⁻¹ at 4 cm⁻¹ resolution with 256 scans. Spectra were measured with the samples dissolved in carbon tetrachloride and contained in a 10-mm (IR) quartz cuvette. All of the samples including the reference and the test mixtures were prepared from ACS reagent or analytical grade chemicals. The compositions for five mixtures of two- and three-component systems are shown in Table I, and the spectra of the nine pure components used to make these solutions are shown in Figs. 1 to 3.

Spectral Processing. All spectral data were transferred to an IBM PS/2 Model 50z personal computer from the Digilab 3200 work station for data processing. The Mix-Match program includes a fast Fourier transformation (FFT), spectral orthogonalization, PCA, and adaptive filter functions. All programs were written in C language and compiled with the Microsoft C 5.1 compiler. Each spectrum in the near-IR fingerprint region of 3800 to 4780 cm⁻¹ was transformed to the Fourier domain via the FFT algorithm, and 128 data points were retained. In order to remove the background interferences from each spectrum, the first three data points in the Fourier domain were discarded. The number of PC loadings was selected on the basis of the criteria of the cumulative percent of variance (CPV). The CPV is the percent of total variance (eigenvalues) in the data of dimension n which is accounted for by the subspace representation. In the processing of one hundred NIR spectra, eighty-eight significant PC loadings are used for describing the Fourier-domain window (FT terms from 3 to 130) in the analyses of five mixture spectra. To determine the effects of noise, we added transmittance noise to the test spectra using the random number generator in LabCalc (Galactic Industries Co.) software.
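Selection of the number of PC loadings by the CPV criterion can be sketched as below. The eigenvalues and the threshold values are hypothetical; the threshold actually used to arrive at 88 loadings is not stated in the text.

```python
def components_for_cpv(eigenvalues, threshold_percent):
    """Smallest number of leading eigenvalues whose CPV reaches the threshold."""
    total = sum(eigenvalues)
    running = 0.0
    for count, value in enumerate(sorted(eigenvalues, reverse=True), start=1):
        running += value
        # Cumulative percent of variance explained by the first `count` terms.
        if 100.0 * running / total >= threshold_percent:
            return count
    return len(eigenvalues)


# Hypothetical eigenvalues; the cumulative percentages are 50, 80, 95, 99, 100.
eigenvalues = [5.0, 3.0, 1.5, 0.4, 0.1]

print(components_for_cpv(eigenvalues, 75.0))
print(components_for_cpv(eigenvalues, 90.0))
print(components_for_cpv(eigenvalues, 99.5))
```

A lower threshold discards more of the small eigenvalues, which is the same lever used later in the noise study when the model is cut from 88 to 54 eigenvectors.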

RESULTS AND DISCUSSION

Near-Infrared Spectral Features. The near-infrared library of one hundred liquid compounds contains substituted aromatic compounds, aliphatic alkanes, alcohols, amines, and some carbonyl groups which all exhibit significant absorptions in the NIR region. Spectra of compounds from five different chemical classes in the NIR region are shown in Fig. 4. At the concentrations and pathlength used to prepare this library, most of the spectral information is between 3600 and 7500 cm⁻¹. The overtones of the C-H stretch are just below 6000 cm⁻¹ and those for the N-H and O-H stretches between 6500 and 7100 cm⁻¹. The region from 3800 to 4780 cm⁻¹ is similar to the mid-infrared fingerprint region of 600 to 1800 cm⁻¹. The greatest spectral differences are observed in this latter region; thus, we have used it for searching mixture spectra with the Mix-Match algorithm. The absorptions in this region are due to combinations of C-H, O-H, and N-H stretching vibrations with fundamental vibrations appearing in the mid-infrared fingerprint region of 800 to 1800 cm⁻¹. Generally, there are complex shapes and strong intensities in this region which provide a number of distinguishing features that can be used for identification.

FIG. 1. Near-infrared spectra of hexane, propanol, and aniline (absorbance vs. wavenumbers, cm⁻¹).

FIG. 2. Near-infrared spectra of xylene isomers: o-xylene, m-xylene, and p-xylene (absorbance vs. wavenumbers, cm⁻¹).

Evaluation of Mix-Match Program. The search results obtained by the Mix-Match program, as well as the direct dot-product of each unknown mixture, are given in Tables II to VI. The primary lists in each table are based on processing of the entire spectral library. The composition index is the relative "concentration" for the prediction of the components in the spectrum of a mixture by the Mix-Match method. The CI values should fall between 0.0 and 1.0. Thus, the search reports list the results by decreasing values of composition of each component in the library. The dot-product (DP) values between mixture and reference scores are also reported here for comparison with the Mix-Match algorithm. Since all of the target components were condensed into the top 20 nearest matches in the primary lists, the adaptive filter processing can be computed on this subgroup instead of the entire reference library. The application of the adaptive filter in the dot-product algorithm is similar to that in the Mix-Match algorithm; the scores of the mixture and the library are made orthogonal to those for the target component, and the dot-product processing is repeated.
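The straight dot-product ranking used for comparison can be sketched as follows. The scores are hypothetical, and normalizing each dot product by the vector lengths (a cosine measure) is an assumption made for this illustration.

```python
import math


def dot(a, b):
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def cosine(a, b):
    """Dot product of the two vectors after scaling each to unit length."""
    return dot(a, b) / math.sqrt(dot(a, a) * dot(b, b))


# Hypothetical library scores: "A-analog" closely resembles "A".
library_scores = {
    "A": [1.0, 0.0, 0.0],
    "A-analog": [0.9, 0.1, 0.0],
    "B": [0.0, 0.0, 1.0],
}

# Mixture dominated by A with a smaller contribution from B.
mixture_score = [0.8, 0.0, 0.3]

hits = sorted(
    ((cosine(mixture_score, score), name) for name, score in library_scores.items()),
    reverse=True,
)
ranking = [name for _, name in hits]
for value, name in hits:
    print(f"{name:10s} {value:.3f}")
```

In this toy case the analog of the major component outranks the true minor component B, which is the same kind of behavior reported in the dot-product lists of Tables II to VI.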

The first example (NIRMIX1) is a two-component mixture containing n-hexane and benzyl alcohol; the analysis results are given in Table II. The top two matches have high composition indices (CI = 0.56 and 0.40), indicating that the mixture is a two-component system. After application of the adaptive filter to remove n-hexane from the mixture and reference library, the second target component (benzyl alcohol) is the dominant compound (CI = 0.47). The final residuals of the Mix-Match algorithm after removal of the target components from the mixture spectra and the library were all less than 0.02. In addition, the similarity index based on this selection was 0.992. All of the results support the fact that the sample contains only two components. The dot-product metric picked n-hexane in the primary list; however, benzyl alcohol was fifty-second out of 100 compounds. It is reasonable that most of the top 20 obtained by the dot-product metric belong to aliphatic compounds which have spectral patterns similar to that of hexane.

FIG. 3. Near-infrared spectra of acetone, benzyl alcohol, and cyclopentanol (absorbance vs. wavenumbers, cm⁻¹).

FIG. 4. Near-infrared spectra of five compounds from different classes: toluene, hexane, pentanol, pentyl amine, and 2-pentanone (absorbance vs. wavenumbers, cm⁻¹).

TABLE II. Results on the search of NIRMIX1.

Unknown components: n-hexane and benzyl alcohol
Fourier coefficients = 3-130

(1) Mix-Match search: composition index (CI)

Component name       Primary list   Removed n-hexane   Removed benzyl alcohol
n-Hexane             0.56           0.00               0.00
Benzyl alcohol       0.40           0.47               0.00
tert-Butylbenzene    0.18           0.05               0.00
o-Xylene             0.17           0.00               0.02
Octanoic acid        0.13           0.00               0.00
Cyclohexanone        0.13           0.00               0.00

Similarity index = 0.992

(2) Dot-Product search

Component name       Hit no.   Primary list
n-Hexane             1         0.91
2-Methylhexane       2         0.90
2-Methylheptane      3         0.893
Heptane              4         0.890
Octane               5         0.88
Benzyl alcohol       52        0.65

TABLE III. Results on the search of NIRMIX3.

Unknown components: 1-propanol and aniline
Fourier coefficients = 3-130

(1) Mix-Match search: composition index (CI)

Component name         Primary list   Removed aniline   Removed 1-propanol
Aniline                0.74           0.00              0.00
1-Propanol             0.47           0.48              0.00
Octanoic acid          0.29           0.14              0.02
N-Methyl-p-toluidine   0.25           0.12              0.03
Cyclohexanone          0.24           0.00              0.00
Toluene                0.22           0.00              0.03

Similarity index = 0.964

(2) Dot-Product search

Component name         Hit no.   Primary list
N-Methyl-p-toluidine   1         0.86
sec-Butylbenzene       2         0.83
N,N-Diethylaniline     3         0.82
Propylbenzene          4         0.81
1-Propanol             11        0.72
Aniline                16        0.68

TABLE IV. Results on the search of NIRMIX6.

Unknown composition: o-xylene and m-xylene
Fourier coefficients = 3-130

(1) Mix-Match search: composition index (CI)

Component name     Primary list   Removed m-xylene   Removed o-xylene
m-Xylene           0.56           0.00               0.00
o-Xylene           0.47           0.61               0.00
Nitrobenzene       0.09           0.04               0.05
Butyl acetate      0.09           0.00               0.01
Cyclopentanol      0.08           0.05               0.03
p-Chlorotoluene    0.08           0.10               0.08

Similarity index = 0.998

(2) Dot-Product search

Component name            Hit no.   Primary list   Removed m-xylene   Removed o-xylene
m-Xylene                  1         0.95           0.00               0.00
o-Xylene                  2         0.89           0.22               0.00
1,3,5-Trimethyl-benzene   3         0.88           0.03               0.02
N-Methyl-p-toluidine      4         0.87           0.03               0.01
1,2,4-Trimethyl-benzene   5         0.85           0.13               0.06
p-Xylene                  6         0.83           0.06               0.03

Good search results with the Mix-Match algorithm were obtained for the analysis of NIRMIX3 (1-propanol and aniline), as are shown in Table III. The correct target components were selected in the primary and adaptive filter lists, and produced SI = 0.964. Although the separation between 1-propanol and the third hit (octanoic acid) was close (0.47 to 0.29, respectively), a larger composition difference between 1-propanol and other possible components was observed after application of the adaptive filter to remove the spectrum of aniline. In evaluation of the dot-product algorithm, the top two hits from the primary list were an aniline derivative (N-methyl-p-toluidine) and sec-butylbenzene. The target components, 1-propanol and aniline, were the eleventh and sixteenth hits, respectively.

The results on a mixture of o-xylene and m-xylene are shown in Table IV. The composition indices of m- and o-xylene from the primary list are 0.56 and 0.47, which are much higher than those of other similar compounds in the library. After application of the adaptive filter to remove m-xylene, the composition index of o-xylene reached 0.61, and there was a much larger separation between the target and the other components in the library. The SI value of 0.998 indicated a good identification of xylene mixtures. The search results using the dot-product algorithm also supported this conclusion. However, due to the similar target components in the mixture, the DP of o-xylene was reduced to 0.22 after removal of m-xylene from the mixture.

The results on a three-component mixture containing all isomeric xylenes are given in Table V. The strongly overlapped xylene spectra can be positively identified from the primary list and adaptive filter processing. The composition indices indicated that each of the xylene isomers had a relative concentration of about 0.3, and these values are much higher than those of other components in the lists. Moreover, the combination had a high SI value of 0.999. The dot-product metric picked these three isomers as the first, second, and fourth in the primary lists. All these were correctly identified with the adaptive filter, but residuals were still reasonably high after all three components were removed.

TABLE V. Results on the search of NIRMIX7.

Unknown composition: o-xylene, m-xylene, and p-xylene
Fourier coefficients = 3-130

(1) Mix-Match search: composition index (CI)

Component name     Primary list   Removed p-xylene   Removed m-xylene   Removed o-xylene
p-Xylene           0.35           0.00               0.00               0.00
m-Xylene           0.33           0.38               0.00               0.00
o-Xylene           0.29           0.37               0.33               0.00
p-Chlorotoluene    0.14           0.00               0.00               0.00
Cyclohexane        0.11           0.00               0.00               0.00
Butyl acetate      0.09           0.00               0.00               0.00

Similarity index = 0.999

(2) Dot-Product search

Component name            Primary list   Removed p-xylene   Removed m-xylene   Removed o-xylene
p-Xylene                  0.94           0.00               0.00               0.00
m-Xylene                  0.93           0.18               0.00               0.00
N-Methyl-p-toluidine      0.91           0.10               0.06               0.06
o-Xylene                  0.86           0.17               0.16               0.00
1,3,5-Trimethyl-benzene   0.85           0.17               0.11               0.12
1,2,4-Trimethyl-benzene   0.83           0.15               0.13               0.11

The final unknown mixture (NIRMIX9) was a combination of a ketone and aromatic and aliphatic alcohols. The search results obtained by the Mix-Match algorithm listed in Table VI showed that two of the target components were the top hits. The third hit is pentyl amine, which has a pattern similar to a combination of cyclopentanol and acetone. After removal of acetone by the adaptive filter, the other two target components (cyclopentanol and benzyl alcohol) are at the top of the list (0.39 and 0.36), and the separation from other hits is very good. The SI value (0.881) for the three top hits in the primary list is clearly lower than that for the combination of the true target components (#1, #2, and #4) in the mixture (SI = 0.976). Thus, in the case of one of the target components not in the top listing, both the adaptive filter and similarity index offered alternative solutions for determining the appropriate target components. These results confirm the use of the adaptive filter and the SI values to support the final verification of selected components in the mixture examination. The dot-product metric placed the three target components in third, eighteenth, and fiftieth place; thus, the adaptive filter gave erroneous results since none of the target compounds was in first place.

TABLE VI. Results on the search of NIRMIX9.

Unknown composition: acetone, benzyl alcohol, and cyclopentanol
Fourier coefficients = 3-130

(1) Mix-Match search: composition index (CI)

Component name    Primary list   Removed acetone   Removed cyclopentanol   Removed benzyl alcohol
Acetone           0.55           0.00              0.00                    0.00
Cyclopentanol     0.39           0.39              0.00                    0.00
Pentyl amine      0.23           0.00              0.00                    0.00
Benzyl alcohol    0.23           0.36              0.39                    0.00
Heptanol          0.22           0.01              0.00                    0.03
iso-Butanol       0.20           0.00              0.00                    0.07

SI of acetone, cyclopentanol, and benzyl alcohol = 0.976
SI of acetone, cyclopentanol, and pentyl amine = 0.881

(2) Dot-Product search

Component name        Hit no.   Primary list
2-Pentanone           1         0.85
Methyl ethyl ketone   2         0.83
Acetone               3         0.80
Butyl acetate         4         0.78
Cyclopentanol         18        0.68
Benzyl alcohol        50        0.51

In general, search results obtained by a straight dot-product metric are dependable only for classifying the most distinctive target components from a mixture. In addition, the separation between the target and incorrect components is often very small with the dot-product metric, which can cause ambiguous conclusions in the processing of complex mixture spectra. On the other hand, the Mix-Match algorithm not only picks the target components correctly but also produces a large difference between the correct components and other library spectra. Thus, the Mix-Match method provides better selectivity to increase the discrimination between correct and incorrect identification.

Evaluation of Random Noise Effects. The performance of a standard search algorithm in correctly identifying target components depends upon the quality of the un- known spectra. In the presence of noise and a sloping baseline, the searching algorithm can be misled.

In order to evaluate the contribution of random noise effects during the Mix-Match processing, two example sets were used to test the searching performance. The first test example contained two xylene isomers and the second set contained hexane and benzyl alcohol. Random "white" noise was added to each set at levels of 0.5, 2.0, 4.0, and 5.0% T. The results for both sample sets are displayed in the three-dimensional bar graphs shown in Figs. 5 and 6. As the random noise level in the spectrum was increased from 0.5% to 5%, the difference between the target hits and the next nearest hits decreased until, at 5%, two wrong components were selected.
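One way to add noise at a given % T level to an absorbance spectrum is sketched below. This is an assumption-laden illustration: the noise is taken as Gaussian with a standard deviation equal to the stated level, which may not match the LabCalc generator, and the absorbance values and function name are hypothetical.

```python
import math
import random


def add_transmittance_noise(absorbance, level_percent_t, rng):
    """Convert to %T, add zero-mean Gaussian noise, and convert back."""
    noisy = []
    for a in absorbance:
        percent_t = 100.0 * 10.0 ** (-a)
        # Clamp to a small positive value so the logarithm stays defined.
        percent_t_noisy = max(percent_t + rng.gauss(0.0, level_percent_t), 1e-6)
        noisy.append(-math.log10(percent_t_noisy / 100.0))
    return noisy


clean = [0.10, 0.45, 0.80, 0.30]  # hypothetical absorbance values

noisy = add_transmittance_noise(clean, 2.0, random.Random(0))
repeat = add_transmittance_noise(clean, 2.0, random.Random(0))
unchanged = add_transmittance_noise(clean, 0.0, random.Random(0))
print([round(value, 4) for value in noisy])
```

Because the noise is added on the transmittance scale, the same % T level perturbs strongly absorbing points more on the absorbance scale than weakly absorbing ones.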

There are two possible methods for improving the search performance of the Mix-Match algorithm on noisy spectra. The size of the Fourier window can be reduced and/or fewer PC loadings can be used. In both example sets, an approach using only 54 eigenvectors instead of 88 to describe the model (Fourier range: 3 to 130) tolerated an added noise level of up to 10% T without selecting the wrong components. Furthermore, it was also found that using only 64 Fourier terms (3 to 66) as coordinates of the vectors was sufficient to identify the target components with up to 10% T noise. Thus, significant improvements in noise toleration can be obtained by eliminating the higher eigenvectors or dropping higher Fourier terms; both the higher eigenvectors and higher Fourier terms represent noise but contain little spectral information. However, it is important to realize that using fewer terms to identify the poor-quality spectra also reduces the discriminating ability to separate complex mixtures. This is especially true in analyses of mixtures in a large spectral library where the differences of spectral patterns are relatively small. Nevertheless, the Mix-Match algorithm is capable of analyzing noisy mixture spectra.

FIG. 5. Results of Mix-Match search for NIRMIX1 (hexane and benzyl alcohol) in the study of random noise effects. (Bar graph of composition index by hit number for the original spectrum and for 0.5, 2.0, 4.0, and 5.0% T added noise.)

FIG. 6. Results of Mix-Match search for NIRMIX6 (m- and o-xylene) in the study of random noise effects. (Bar graph of composition index by hit number for the original spectrum and for 0.5, 2.0, 4.0, and 5.0% T added noise.)

CONCLUSIONS

The feasibility of applying the Mix-Match algorithm to analyze the complex liquid mixtures in the NIR region has been evaluated. The major advantage of this method is to identify components in mixtures without any prior knowledge of the compositions. The search results obtained with five mixture samples showed that the method is reliable and selective. Moreover, the combination of the adaptive filter and the similarity index offers more flexibility for the final evaluation of mixtures. The data processing based on the use of Fourier representation is able to improve the searching performance on poor-quality, unresolved spectra such as those which might be obtained from a GC/FT-IR instrument. Overall, this method may be particularly appealing for qualitative analyses that require rapid solution of poorly separated organic mixtures without prior information.

ACKNOWLEDGMENTS

The authors wish to express their appreciation to Steven M. Donahue for many helpful discussions and suggestions for the development of the algorithm.

1. E. Stark, K. Luchter, and M. Margoshes, Appl. Spectrosc. Rev. 22, 335 (1986).
2. H. Martens and T. Næs, Multivariate Calibration (Wiley, New York, 1989).
3. G. W. Small, Anal. Chem. 59, 535A (1987).
4. H. J. Luinge, Vibrational Spectrosc. 1, 3 (1990).
5. G. Hangac, R. C. Wieboldt, B. R. Lam, and T. L. Isenhour, Appl. Spectrosc. 36, 40 (1982).
6. D. S. Frankel, Anal. Chem. 56, 1011 (1984).
7. G. Jalsovszka and S. Holly, J. Mol. Struct. 175, 263 (1988).
8. B. J. Wythoff and S. A. Tomellini, Anal. Chim. Acta 227, 359 (1989).
9. M. A. Puskar, S. P. Levine, and S. R. Lowry, Anal. Chem. 58, 1156 (1986).
10. S.-C. Lo and C. W. Brown, Bio-Rad Digilab FTS/IR No. 68, January (1989).
11. S.-C. Lo and C. W. Brown, Appl. Spectrosc. 45, 1621 (1991).
12. S.-C. Lo and C. W. Brown, Appl. Spectrosc. 45, 1628 (1991).
13. M. C. Nyden, Appl. Spectrosc. 40, 868 (1986).
14. P. B. Harrington and T. L. Isenhour, Appl. Spectrosc. 41, 449 (1987).
15. C. P. Wang and T. L. Isenhour, Appl. Spectrosc. 41, 185 (1987).
16. E. R. Malinowski and D. G. Howery, Factor Analysis in Chemistry (Wiley-Interscience, New York, 1980).
17. C. W. Brown and S. M. Donahue, Appl. Spectrosc. 42, 347 (1988).
18. J. W. Sherman, J. A. de Haseth, and D. G. Cameron, Appl. Spectrosc. 43, 1311 (1989).
19. J. A. de Haseth and L. V. Azarraga, Anal. Chem. 53, 2292 (1981).