
Infrared Spectral Search for Mixtures in Medium-Size Libraries

SU-CHIN LO and CHRIS W. BROWN*

Department of Chemistry, University of Rhode Island, Kingston, Rhode Island 02881

A new algorithm is presented for searching medium-size infrared spectral libraries for the components in spectra of mixtures. The algorithm treats the spectra in the library as an m-component quantitative analysis problem in which each of the library spectra represents a standard mixture having a concentration of 1.0 for that component. Principal component regression (PCR) is used to reduce the dimensionality of the problem and to provide the regression coefficients for determining pseudo-concentrations or composition indices (CI) in mixtures. The PCR analysis is followed by the application of an adaptive filter to remove all similarity of the first target component from the mixture and from a selected subgroup of the library. This is followed by a second PCR analysis on the modified spectral data to identify the next target compound. If the correct target components are selected with successive applications of the adaptive filter, the residuals will approach zero. All components in five two- and three-component mixtures were correctly identified by this new Mix-Match algorithm, whereas only two of the five mixtures were completely identified by a typical dot-product search routine.

Index Headings: Infrared; Chemometrics; Multivariate analysis; Library search; Pattern recognition; Multicomponent analysis.

INTRODUCTION

Identifying unknown compounds by matching infrared spectral "fingerprints" with those of a known molecule has been a common practice for several decades. Large spectral libraries have been generated and a number of search algorithms developed. One of the main difficulties in identifying an unknown sample by searching libraries is that the unknown may contain several components either in large concentrations or as small impurities. This can lead to incorrect identification or, at best, to the identification of only one component.

In a continuing effort to expand the applicability of optical spectroscopy to the analysis of real-world samples, we have been exploring algorithms that can search libraries for unknown mixtures. Our goal is to be able to search reasonably large spectral libraries for spectra of unknowns consisting of a single component or mixtures of two, three, or more components. The algorithm must be reasonably fast and identify each of the components with a reliability approaching that found in other algorithms for searching pure samples.

Recently, Nyden and co-workers¹,² introduced a very simple algorithm for searching libraries for mixtures. Basically, the algorithm treats the search as a multicomponent analysis problem. Each of the spectra in the library is considered to be that of a pure standard having a concentration of 1.0. Thus, for a 25-compound library, a multicomponent analysis would be performed for 25 components, each having a concentration of 1.0 in the corresponding standard spectrum. A calibration matrix is calculated from the 25 pure standards. For an unknown of several components, the calibration matrix is used to predict the concentration of each of the (possible 25) components present in the mixture. This is an oversimplification of the method, but we explain it in greater detail below.

Received 17 April 1991; revision received 28 June 1991. * Author to whom correspondence should be sent.

At the time the above method appeared in the literature, we were creating a UV-visible library and testing search algorithms in the UV-visible region.³ We also obtained excellent results with the method for two-component unknowns when searching a small library. However, as the library grew, we eventually reached a point at which the method would not work. For example, the results were very good for a 25-compound library, but the approach would not work for a 60-compound library. At that time, we started exploring the problems and possible solutions. Eventually, we found that the appropriate search algorithm depended upon the size of the library and that libraries could be generally categorized into the following sizes:

1. Small library: all of the spectra are linearly independent.

2. Medium library: the spectra are not all linearly independent, but the total number of spectra is less than the number of data points in each spectrum.

3. Large library: the spectra are not all linearly independent, and the total number of spectra is greater than the number of data points in each spectrum.
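As a rough illustration, the three categories can be checked numerically from the rank of the library matrix. The Python/NumPy sketch below assumes the library is held as an (m × n) array with one spectrum per row; the function name `classify_library` is hypothetical, and the rank test relies on the default tolerance of `numpy.linalg.matrix_rank`, which may need adjusting for real, noisy spectra.

```python
import numpy as np


def classify_library(X):
    """Classify an (m, n) library (m spectra, n data points) by the size rules above."""
    m, n = X.shape
    rank = np.linalg.matrix_rank(X)  # numerical rank with NumPy's default tolerance
    if rank == m:
        return "small"   # all spectra linearly independent
    if m < n:
        return "medium"  # some dependence, but fewer spectra than data points
    return "large"       # dependent spectra and m >= n (m == n is lumped in here)


rng = np.random.default_rng(0)
independent = rng.random((3, 50))
dependent = np.vstack([independent, independent[0] + independent[1]])
print(classify_library(independent))           # small
print(classify_library(dependent))             # medium
print(classify_library(rng.random((60, 40))))  # large
```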

The method developed by Nyden and co-workers¹,² does a splendid job on the small library. We have developed an algorithm for identifying mixture components from a medium-size library that combines the small-library method with principal component regression analysis. We have also developed an algorithm for a large library, which modifies the medium-library method by incorporating additional computations and approximations. The development and applications for a medium-size library are discussed in the present paper, and those for the large library in the following paper.⁴

THEORY

The method proposed by Nyden¹ for identifying components in mixtures expresses the spectrum of a mixture containing any of m components as a linear combination of spectra of pure components:

$$M = CX \tag{1}$$

where M is a row matrix of n absorbances in the mixture spectrum, C is a (1 × m) concentration matrix, and X is an (m × n) matrix containing the spectra of the pure components. (We have altered the original nomenclature to correspond more closely to that used in chemometrics.)

Volume 45, Number 10, 1991 0003-7028/91/4510-162152.00/0 APPLIED SPECTROSCOPY 1621 © 1991 Society for Applied Spectroscopy

It was assumed that the reference spectra were linearly independent, and X was replaced with an orthonormal representation

$$M = U_{\mathrm{unk}} O \tag{2}$$

where

$$O = ZX \tag{3}$$

A Gram-Schmidt orthonormalization was used to transform the spectra in X into the orthonormal vectors in O, and the Z matrix represents this transformation. Since the rows of O are orthonormal ($OO^{t} = I$), Eq. 2 can be rearranged to give

$$U_{\mathrm{unk}} = M O^{t} \tag{4}$$

where $O^{t}$ is the transpose of O and $U_{\mathrm{unk}}$ is the projection of the mixture spectrum M onto the orthonormal representation. Substituting Eq. 3 into Eq. 2 gives $M = U_{\mathrm{unk}} Z X$, and comparing this result with Eq. 1 shows that

$$C = U_{\mathrm{unk}} Z \tag{5}$$

Thus, for an unknown mixture spectrum, the projections of the mixture spectrum, $U_{\mathrm{unk}}$, are multiplied by the transformation matrix Z to obtain the concentrations of the reference compounds in the mixture.
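The orthonormalization route of Eqs. 2-5 can be sketched as follows for a small library. This is an illustration under stated assumptions, not Nyden's code: it substitutes NumPy's QR factorization for an explicit Gram-Schmidt loop (the orthonormal rows of O agree with Gram-Schmidt up to sign, and Z absorbs any sign change), and the function names and synthetic data are hypothetical.

```python
import numpy as np


def orthonormal_calibration(X):
    """Return (O, Z) with O = Z X and orthonormal rows in O (Eq. 3).

    X: (m, n) library with linearly independent rows, m <= n.
    QR of X^t gives X^t = Q R, so X = R^t Q^t; with O = Q^t, Z = (R^t)^-1.
    """
    Q, R = np.linalg.qr(X.T)       # Q: (n, m) orthonormal columns, R: (m, m)
    O = Q.T
    Z = np.linalg.inv(R.T)
    return O, Z


def predict_concentrations(M, O, Z):
    U_unk = M @ O.T                # projections of the mixture, Eq. 4
    return U_unk @ Z               # concentrations, Eq. 5


rng = np.random.default_rng(1)
X = rng.random((5, 80))            # five linearly independent "reference spectra"
M = 0.7 * X[1] + 0.4 * X[3]        # synthetic two-component mixture, Eq. 1
O, Z = orthonormal_calibration(X)
print(np.round(predict_concentrations(M, O, Z), 6))
```

Because the synthetic mixture is an exact linear combination of independent references, the recovered concentrations match 0.7 and 0.4 to within round-off; this is the regime in which the method works, and it is exactly what breaks down once the rows of X become linearly dependent.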

This method successfully identifies components in mixtures as long as all of the library spectra are linearly independent. As the library grows, its spectra become more similar, linear independence is lost, and m orthogonal spectra can no longer be obtained from the m reference spectra.

For somewhat larger libraries, in which the number of spectra is less than the number of spectral data points, we found a simple extension that works as well as the original method. The spectral library can be treated by principal component analysis to produce an orthogonal set of independent spectral representations, and the orthonormal representations in the O matrix can then be replaced by the principal component representations.

Mix-Match Calibration Algorithm. The algorithm for searching mixtures, which we call Mix-Match, consists of several parts. The semi-quantitative approach proposed by Nyden et al.¹,² is augmented with principal component analysis to find linearly independent vectors for regressing the "standard concentrations." To reduce the size of the data set, Fourier compression is applied before the principal component regression analysis is performed. The results of the regression analysis are applied to the unknown spectrum to determine the hypothetical concentration of each library component in the unknown sample. The hits are sorted by "concentration," and the top 20 are reported and retained for further processing. The spectrum of the mixture and the spectra of the top candidates can be made orthogonal to the top hit so as to remove its similarity from each of the other spectra. A new principal component regression is performed with the remaining 19 top candidates, and the resulting regression is used to determine the concentrations of these candidates in the unknown. This processing is repeated until the residuals in the spectrum of the unknown are negligible.

Search algorithms based on fitting entire spectral regions with a Fourier series equation have been used successfully for UV-visible³ and infrared spectral libraries.⁵⁻⁷ The Fourier-domain representation of a spectrum has the majority of the signal information concentrated in a relatively small region near the beginning of the series. Thus, the peak position, peak width, and shape information in the spectral domain are all compressed into a small Fourier window. In addition, the higher signal-to-noise ratios and reduced background effects provided by Fourier processing offer improvements over the spectral and derivative domains. This approach is especially practical for identifying mixture unknowns from large spectral libraries, because it retains the most significant spectral information while reducing computer storage space and processing time.

Principal component analysis (PCA), a variation on factor analysis for data reduction and simplification, has been used previously to compress infrared spectral libraries.⁸⁻¹⁰ The main goal of PCA is to reduce the dimensionality of the spectral data and thus simplify the analysis being performed. In PCA, the spectral library X (m rows of library spectra and n columns of wavelengths) is decomposed into two orthogonal matrices⁹,¹¹,¹²

$$X = U V^{t} + E \tag{6}$$

where V is an (n × f) matrix of factor loadings or principal components, U is an (m × f) score matrix containing the factor scores for the projections of each of the spectra onto the principal components, and E is an (m × n) matrix of spectral variations (or noise) that are not modeled by the PCA procedure. The index f designates the number of principal components used to account for as much of the variance in the data as possible. The columns of the V matrix thus define the principal axes, and the rows of the U matrix give the coordinates of each spectrum on those axes.

The spectrum of a pure unknown can be easily identified by the dot product between the library scores (U) and the unknown scores ($U_{\mathrm{unk}}$). For a pure (single-component) unknown, a large spectral library may be reduced to 20% of its size by PCA and still provide the same search performance.⁹ However, because the complicated spectral patterns of a mixture may distort the search results, a 20% compression level is not practical for searching mixtures. More principal components are therefore needed to amplify the minor spectral differences in order to classify the unknown mixture.
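A minimal sketch of PCA compression (Eq. 6) followed by a dot-product search on the scores is shown below. It assumes an uncentered decomposition computed with `numpy.linalg.svd` (the authors used a different orthogonalization routine), uses random numbers in place of library spectra, and the function names are illustrative only.

```python
import numpy as np


def pca_scores(X, f):
    """Uncentered PCA of an (m, n) library: X ~ U V^t with f components (Eq. 6)."""
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    V = Vt[:f].T                   # (n, f) loadings (principal components)
    return X @ V, V                # (m, f) scores U, and the loadings V


def score_dot_product_search(U, V, unknown, top=5):
    """Rank library entries by the normalized dot product of their scores."""
    u_unk = unknown @ V            # scores of the unknown spectrum
    U_norm = U / np.linalg.norm(U, axis=1, keepdims=True)
    hits = U_norm @ (u_unk / np.linalg.norm(u_unk))
    order = np.argsort(hits)[::-1][:top]
    return order, hits[order]


rng = np.random.default_rng(2)
X = rng.random((40, 300))
U, V = pca_scores(X, f=8)          # keep 8 of 40 possible components (20%)
order, hits = score_dot_product_search(U, V, 2.5 * X[17])
print(order[0])                    # 17
```

Here the "unknown" is a scaled copy of library entry 17, so its score vector is parallel to that entry's scores and the normalized dot product is 1.0 regardless of the scale factor. A mixture spectrum has no such exact counterpart in the library.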

In applying the modified PCR procedure, an extended concentration matrix, C, is related to the score matrix by

$$C = US \tag{7}$$

where S is an (f × m) proportionality matrix containing the regression coefficients. The least-squares estimate of the proportionality matrix is

$$S = (U^{t}U)^{-1}U^{t}C = (U^{t}U)^{-1}U^{t} = \Lambda^{-1}U^{t} \tag{8}$$

In this case, the C matrix is an identity matrix, since the concentration of each standard is assumed to be 1.0, and the $U^{t}U$ matrix equals the diagonal matrix Λ of the eigenvalues of $X^{t}X$ (the squares of the singular values of X).

FIG. 1. Flow chart of the Mix-Match processing (boxes: spectral library, FFT, PCA vectors, projection, regression with concentrations, regression coefficients; unknown, FFT, projection, dot-product multiplication, concentrations in unknown, adaptive filter, final results).

Once the S matrix is determined from the reference score matrix, the absorbance spectrum of an unknown mixture, $M_{\mathrm{unk}}$, is projected onto the orthogonal unit vectors (V) to obtain its score vector

$$U_{\mathrm{unk}} = M_{\mathrm{unk}} V \tag{9}$$

The analysis of the unknown mixture is achieved simply by multiplying the unknown score vector $U_{\mathrm{unk}}$ by the predetermined S matrix, i.e.,

$$C_{\mathrm{unk}} = U_{\mathrm{unk}} S \tag{10}$$

The concentrations in $C_{\mathrm{unk}}$ are only approximate, since the concentrations of the standards are not all actually unity and intermolecular interactions in the mixtures are not included. However, the "trends" in $C_{\mathrm{unk}}$ should be correct, at least for the major components. We refer to these pseudo-concentrations as the composition indices, CI.
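The regression of Eqs. 7-10 reduces to a few lines of linear algebra. The sketch below assumes an uncentered PCA obtained from `numpy.linalg.svd`, so that $U^{t}U$ is diagonal and the inverse in Eq. 8 is just a division by the eigenvalues. The function name, the tolerance used to discard numerically zero components, and the synthetic unit-length "library" are our assumptions.

```python
import numpy as np


def pcr_composition_indices(X, M, f=None, tol=1e-10):
    """Composition indices C_unk = U_unk S, with S = Lambda^-1 U^t (Eqs. 7-10).

    X: (m, n) library, one spectrum per row, each a "standard" of
    concentration 1.0 (C = I). M: (n,) unknown spectrum. f: number of
    principal components kept (None keeps every numerically nonzero one).
    """
    _, s, Vt = np.linalg.svd(X, full_matrices=False)
    keep = s > tol * s[0]          # drop numerically zero components
    if f is not None:
        keep[f:] = False           # truncate to the first f components
    V = Vt[keep].T                 # (n, f) loadings
    U = X @ V                      # (m, f) library scores
    lam = s[keep] ** 2             # diagonal of U^t U = Lambda
    S = U.T / lam[:, None]         # (f, m) regression coefficients, Eq. 8
    U_unk = M @ V                  # (f,) unknown scores, Eq. 9
    return U_unk @ S               # (m,) composition indices, Eq. 10


rng = np.random.default_rng(3)
X = rng.random((25, 200))
X /= np.linalg.norm(X, axis=1, keepdims=True)   # unit-length library spectra
M = 0.8 * X[4] + 0.5 * X[19]
ci = pcr_composition_indices(X, M)
top20 = np.argsort(ci)[::-1][:20]               # hits sorted by "concentration"
print(top20[:2], np.round(ci[top20[:2]], 3))
```

With all 25 components retained (the small-library case), the composition indices reproduce the synthetic concentrations to within round-off. For a medium library, `f` would be truncated and the CIs become the approximate trends described above.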

Adaptive Filter. As an additional verification of the composition of the mixture, one or more target compounds can be removed from the mixture and from the library by a procedure known as an adaptive filter. To expedite the processing, it is applied to a limited number of top hits from the library, e.g., to the 20 or so top hits. This processing is similar to the Gram-Schmidt orthogonalization method. The first target component spectrum from the library, $k_1$, is selected and normalized according to

$$h_1 = k_1 / \lVert k_1 \rVert \tag{11}$$

where $\lVert k_1 \rVert$ is the norm of $k_1$. To remove the first target component from the mixture, the projection of the original mixture $M_0$ onto the normalized target spectrum is obtained by the dot product

$$d_{12} = M_0 h_1^{t} \tag{12}$$

and that amount is removed from the mixture spectrum to produce an orthogonal spectrum

$$M_1 = M_0 - d_{12} h_1 \tag{13}$$

TABLE I. Listing of mixtures with approximate concentrations.

File name   EPA number   Chemical name             Concentration
MIX21       508          Toluene, o-chloro-        1.5 µL
            863          m-Toluidine               1.5 µL
MIX22       583          2-Pentanone               0.7 µL
            881          Anisole                   1.0 µL
MIX23       75           1,2,4-Trimethylbenzene    1.0 µL
            799          1,3,5-Trimethylbenzene    0.7 µL
MIX31       9            Toluene, m-nitro-         1.0 µL
            35           Toluene, p-chloro-        1.0 µL
            824          Benzene, m-dichloro-      1.0 µL
MIX32       572          Benzene, o-dimethoxy-     1.0 µL
            818          Benzene, bromo-           1.0 µL
            1697         Anisole, p-bromo-         0.2 µL

The same processing is applied to all of the spectra in the subset of the library; i.e., each library spectrum in the subset is made orthogonal to $h_1$. To identify the next possible target component in the residual mixture, a new orthogonal basis set and a modified S matrix are calculated from the modified spectra.
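One pass of the adaptive filter (Eqs. 11-13), applied to both the mixture and the subset of top hits, might look like the following sketch. The function name and argument layout are hypothetical; the subset is assumed to be an array with one spectrum per row, and random numbers stand in for spectra.

```python
import numpy as np


def adaptive_filter(M, subset, target):
    """Remove the target spectrum's direction from the mixture and the subset.

    M: (n,) mixture spectrum M_0; subset: (k, n) top library hits;
    target: row index of the target component k_1 within subset.
    """
    h = subset[target] / np.linalg.norm(subset[target])   # h_1, Eq. 11
    d = M @ h                                             # d_12 = M_0 h_1^t, Eq. 12
    M_filtered = M - d * h                                # M_1, Eq. 13
    # Same projection for every library spectrum in the subset
    subset_filtered = subset - np.outer(subset @ h, h)
    return M_filtered, subset_filtered


rng = np.random.default_rng(4)
subset = rng.random((20, 150))
M0 = 1.0 * subset[0] + 0.6 * subset[7]
M1, subset1 = adaptive_filter(M0, subset, target=0)
print(abs(M1 @ subset[0]) < 1e-9, np.linalg.norm(subset1[0]) < 1e-9)   # True True
```

After filtering, the target row is reduced to (numerically) zero, and the mixture and every other spectrum in the subset have no remaining component along $h_1$.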

The adaptive filter is applied repeatedly until the predicted CIs reach a minimum level. Theoretically, without spectral noise or chemical interactions in the mixture, the final CIs would be close to zero, indicating that all components have been filtered from the mixture spectrum by the adaptive filter.

The entire method is outlined in the flow chart shown in Fig. 1. In this chart, the processing of the library spectra proceeds down the left side and that of the unknown spectrum down the right side. The "flows" interact at the following three points: to form the scores, to calculate the unknown concentrations (CIs), and to apply the adaptive filter.
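Putting the pieces together, the flow of Fig. 1 can be approximated by the loop below: a PCR search of the whole library selects the top hits, and PCR and the adaptive filter then alternate on that subset until the largest CI falls below a floor. This is a sketch under assumptions, not the authors' program. The stopping value `ci_floor`, the cap `max_components`, the use of every nonzero principal component, and all function names are hypothetical, and the synthetic library is noise-free.

```python
import numpy as np


def composition_indices(X, M, tol=1e-10):
    """PCR composition indices with C = I, keeping every nonzero component."""
    _, s, Vt = np.linalg.svd(X, full_matrices=False)
    keep = s > tol * s[0]                 # drop numerically zero components
    V = Vt[keep].T                        # loadings
    U = X @ V                             # library scores
    S = U.T / (s[keep] ** 2)[:, None]     # Lambda^-1 U^t
    return (M @ V) @ S                    # C_unk = U_unk S


def mix_match(library, M, top=20, ci_floor=0.05, max_components=5):
    """Return library indices of identified components, in order of removal."""
    ci = composition_indices(library, M)
    subset_idx = np.argsort(ci)[::-1][:top]   # top hits retained for filtering
    subset = library[subset_idx]              # fancy indexing returns a copy
    found = []
    for _ in range(max_components):
        ci = composition_indices(subset, M)   # new basis and S for filtered data
        best = int(np.argmax(ci))
        if ci[best] < ci_floor:               # remaining CIs negligible: stop
            break
        found.append(int(subset_idx[best]))
        h = subset[best] / np.linalg.norm(subset[best])
        M = M - (M @ h) * h                   # adaptive filter on the mixture
        subset = subset - np.outer(subset @ h, h)  # and on the subset
    return found


rng = np.random.default_rng(5)
library = rng.random((60, 400))
library /= np.linalg.norm(library, axis=1, keepdims=True)
mixture = 1.0 * library[12] + 0.6 * library[41]
print(mix_match(library, mixture))            # [12, 41]
```

Because the filtered target row collapses to zero, it contributes a numerically zero singular value that the tolerance discards; this plays the role of continuing with the "remaining 19 top candidates" described in the text.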

EXPERIMENTAL

Measurement of Spectra. The mixture spectra were measured in a 10.2-cm-long, 1.9-cm-i.d. gas cell equipped with AgCl windows. The cell was made from Al and heated to 250 °C for measuring the vapor-phase spectra.

FIG. 2. Infrared spectra of m-toluidine (solid line) and o-chlorotoluene (dashed line), plotted from 3800 to 670 cm⁻¹.


TABLE II. Mix-Match results for the small spectral library (103 spectra).
Spectral range: 3800-670 cm⁻¹. Unknown mixture: MIX21 (508 Toluene, o-chloro-; 863 m-Toluidine).

                                                             Composition index
Rank  EPA no.  Chemical name                                 Primary  Filtering 863  Filtering 508
#1    863      m-Toluidine                                   0.596    0.000          0.000
#2    508      o-Chlorotoluene                               0.566    0.746          0.000
#3    2839     Butyrophenone, 2'-hydroxy-2-phenyl-           0.194    0.153          0.000
#4    2630     Benzofuran, 2,3-dihydroxy-2,2,7-trimethyl-    0.190    0.000          0.000
#5    3294     1,3-Propanediol, 2-hydroxymethyl-2-methyl-    0.166    0.189          0.064
#6    2105     Nicotinamide, N,N-dipropyl-                   0.157    0.000          0.001
#7    1978     Naphthalene, 2,3-dibromo-                     0.133    0.591          0.000
#8    818      Benzene, bromo-                               0.131    0.370          0.004
#9    285      6-Hendecanone                                 0.126    0.000          0.001
#10   2309     Morpholine, 4-piperidinosulfonyl-             0.121    0.000          0.000

Spectra of the five two- and three-component mixtures listed in Table I were collected from 4000 to 450 cm⁻¹ on a Bio-Rad (Digilab) FTS-40 Fourier transform infrared spectrometer equipped with a DTGS detector. All of the chemicals used were ACS reagent grade or analytical grade.

Spectral Library Processing. The infrared library consisted of 3300 vapor-phase spectra collected and distributed by the Environmental Protection Agency (EPA). All spectra were visually inspected to correct sloping baselines and to remove questionable spectra; 131 spectra judged to be anomalous were eliminated from this research. The spectra were digitized at 2-cm⁻¹ data intervals from 449.41 to 4000.35 cm⁻¹, producing 1842 discrete values. Each spectrum in the region from 670 to 3800 cm⁻¹ was shifted by 670 cm⁻¹ so that the first data point was at 0 and the last at 3130 cm⁻¹; it was zero-filled up to 4096 cm⁻¹, made symmetrical about 0 cm⁻¹, and then transformed to a Fourier-domain vector. A window of Fourier terms from 3 to 256 was retained for processing. The vector was normalized to unity to remove concentration effects.
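The Fourier-domain preprocessing described above might be sketched as follows. Several details are assumptions rather than statements from the paper: the zero-fill length is expressed as a number of points (`n_points`), the window is taken as 0-based term indices 3 through 256 inclusive, and the even extension is built so that the discrete Fourier transform is purely real. The function name and the synthetic two-band "spectrum" are hypothetical.

```python
import numpy as np


def fourier_vector(absorbance, n_points=2048, first=3, last=256):
    """Fourier-domain search vector for one spectrum (preprocessing sketch).

    absorbance: values on a uniform grid whose first point is 670 cm^-1
    (i.e., already shifted to 0). n_points, first, last: assumed zero-fill
    length and inclusive window of Fourier term indices.
    """
    x = np.zeros(n_points)
    x[: len(absorbance)] = absorbance              # zero-fill
    # Even extension about 0: y[k] == y[2N - k], so the DFT is real
    y = np.concatenate([x, [0.0], x[:0:-1]])
    F = np.fft.rfft(y).real
    window = F[first : last + 1]                   # keep terms first..last
    return window / np.linalg.norm(window)         # unit length removes scale


wavenumber = np.arange(670.0, 3800.0, 2.0)         # 1565 points on a 2-cm^-1 grid
bands = (np.exp(-((wavenumber - 1600) / 15) ** 2)
         + 0.5 * np.exp(-((wavenumber - 2950) / 20) ** 2))
v = fourier_vector(bands)
print(v.shape)                                     # (254,)
print(np.allclose(fourier_vector(3.0 * bands), v)) # True: scaling removed
```

The final normalization is what makes the vector insensitive to an overall concentration factor, as the last sentence of the paragraph above indicates.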

For the present study, two smaller libraries were constructed from the EPA library. The first library was generated by the random selection of 103 spectra from the EPA library and the second by the random selection of 256 spectra from the EPA library. Both of these smaller libraries contained all of the components found in the "unknown" mixtures.

FIG. 3. Results of Mix-Match search for MIX21 (o-chlorotoluene/m-toluidine) in the 256-compound library. The composition index is plotted for the top-10 hits after the primary search, after removal of #863 (RM 863), and after removal of #863 and #508 (RM 863/508).

Data Processing. Program development and processing were completed on an IBM Personal System/2 Model 50Z computer equipped with an Intel 80287 coprocessor. Major programs were written in C and compiled with the Microsoft C 5.0 Optimizing Compiler (Redmond, WA). All pre-search data treatments, such as baseline corrections, were processed with LabCalc software (Galactic Industries, Salem, NH). Orthogonal vectors for PCR were obtained by the Successive Average Orthogonalization algorithm.¹³ This algorithm was developed specifically for orthogonalizing large data sets such as a spectral library, since it does not require that the entire library be orthogonalized; i.e., the processing can be terminated at any desired level.

RESULTS AND DISCUSSION

Mix-Match Search. The first unknown mixture consists of o-chlorotoluene and m-toluidine; their spectra are shown in Fig. 2. The results from Mix-Match for the small library are given in Table II and those for the medium library in Table III. In both cases, m-toluidine is the first hit and o-chlorotoluene is the second hit. After the adaptive filter was applied to remove m-toluidine, the CI of o-chlorotoluene increased. After the adaptive filter was applied to remove o-chlorotoluene, the residuals were reduced to under 0.1. The results for the medium library are displayed in Fig. 3; the vertical axis represents the CIs in the range of 0 to 1.0. The library reference numbers are listed along the bottom of the three-dimensional plot; below these numbers are the corresponding CIs from the initial search, the CIs after the adaptive filter was used to remove m-toluidine (#863), and the CIs after o-chlorotoluene (#508) was also removed.

TABLE III. Mix-Match results for the medium spectral library (256 spectra).
Spectral range: 3800-670 cm⁻¹. Unknown mixture: MIX21 (508 Toluene, o-chloro-; 863 m-Toluidine).

                                                             Composition index
Rank  EPA no.  Chemical name                                 Primary  Filtering 863  Filtering 508
#1    863      m-Toluidine                                   0.610    0.000          0.000
#2    508      o-Chlorotoluene                               0.574    0.785          0.000
#3    3085     1-Naphthaleneethanol                          0.288    0.250          0.093
#4    3213     Propionamide, 2,2,3-trichloro-                0.287    0.094          0.000
#5    2126     1-Hexanol, 6-/dipropylamino/-                 0.236    0.000          0.000
#6    2838     Acetophenone, 2'-hydroxy-2-phenyl-            0.227    0.000          0.000
#7    2803     Pyrrolidine, 1-phenyl-2-/phenylimino/-        0.223    0.000          0.000
#8    1880     Acetonitrile, /methylimino/di-                0.204    0.004          0.000
#9    3198     2-Butanone, 3,3-dimethyl-, oxime              0.202    0.000          0.000
#10   2767     Benzyl alcohol, m-iodo-                       0.201    0.075          0.004

FIG. 4. Infrared spectra of 2-pentanone (solid line) and anisole (dashed line), plotted from 3800 to 670 cm⁻¹.

FIG. 5. Results of Mix-Match search for MIX22 (2-pentanone/anisole) in the 103-compound library. The composition index is plotted for the top-10 hits after the primary search, after removal of #881 (RM 881), and after removal of #881 and #583 (RM 881/583).

The only major differences between the results from searching the small library and those from searching the medium library were found for the second two-component mixture, 2-pentanone and anisole. Their spectra are shown in Fig. 4, the results for searching the small library are shown in Fig. 5, and those for the medium library in Fig. 6. When the small library was searched, the first search correctly identified the two components (#881 and #583), and the application of the adaptive filter produced the anticipated results. However, when the medium-size library was searched, anisole was the top hit, but 2-pentanone (#583) was only ninth on the list. After application of the adaptive filter to remove anisole from the mixture and from the library (i.e., making the mixture and all of the library spectra orthogonal to the anisole spectrum), 2-pentanone was correctly identified.

FIG. 6. Results of Mix-Match search for MIX22 (2-pentanone/anisole) in the 256-compound library. The composition index is plotted for the top-10 hits after the primary search, after removal of #881 (RM 881), and after removal of #881 and #583 (RM 881/583).

FIG. 7. Infrared spectra of 1,2,4-trimethylbenzene (solid line) and 1,3,5-trimethylbenzene (dashed line), plotted from 3800 to 670 cm⁻¹.

In the remaining examples, searching the small and medium libraries produced similar results, and we discuss only those for the medium library. The third two-component example was a mixture of two trimethylbenzenes, 1,3,5- and 1,2,4-trimethylbenzene; their respective spectra are shown in Fig. 7. The results of searching the medium library are shown in Fig. 8. Both target compounds (#799 and #75) were the top two hits in the first search, and application of the adaptive filter confirmed their presence.

FIG. 8. Results of Mix-Match search for MIX23 (1,3,5- and 1,2,4-trimethylbenzene) in the 256-compound library. The composition index is plotted for the top-10 hits after the primary search, after removal of #799 (RM 799), and after removal of #799 and #75 (RM 799/75).


FIG. 9. Infrared spectra of m-nitrotoluene (solid line), m-dichlorobenzene (dashed line), and p-chlorotoluene (dotted line), plotted from 3800 to 670 cm⁻¹.

The first three-component mixture consisted of m-nitrotoluene, p-chlorotoluene, and m-dichlorobenzene; their spectra are shown in Fig. 9, and the results of searching the medium library are shown in Fig. 10. All three compounds (#9, #35, and #824) were the top hits, and this was confirmed by application of the adaptive filter. In each case, the CI of the component with the second-highest CI was enhanced after removal of the component with the highest CI.

The other three-component mixture contained o-dimethoxybenzene, bromobenzene, and p-bromoanisole. Their spectra are shown in Fig. 11 and the search results in Fig. 12. Again, the algorithm picked the correct target compounds. After removal of the component with the highest CI by the adaptive filter, the CIs of the other two components (#572 and #1697) were enhanced. When the higher of these two components was removed, the CI of the remaining component (p-bromoanisole) decreased, but it had the only significant CI among the eight possible compounds. The volume of this component is only 20% of that of either of the other two components, as listed in Table I. Upon removal of p-bromoanisole by the adaptive filter, all of the CIs were less than 0.025, which supports the identification.

FIG. 10. Results of Mix-Match search for MIX31 (m-nitrotoluene/p-chlorotoluene/m-dichlorobenzene) in the 256-compound library. The composition index is plotted for the top-10 hits after the primary search and after successive removal of #824 (RM 824), #824 and #9 (RM 824/9), and #824, #9, and #35 (RM 824/9/35).

FIG. 11. Infrared spectra of o-dimethoxybenzene (solid line), p-bromoanisole (dashed line), and bromobenzene (dotted line), plotted from 3800 to 670 cm⁻¹.

Dot-Product Search. The ability of a standard search algorithm to correctly identify components in spectra of mixtures depends upon the differences among the spectra of the components and the uniqueness of their combined spectrum. If the sum of the component spectra resembles the spectrum of a single molecule in the library, that molecule is likely to appear ahead of the correct components. Another problem arises when one component has a number of strong distinguishing spectral features, whereas the other component or components lack such features. Finally, the identification depends upon the distribution of compounds in the library. For example, if the library contains a number of compounds similar to one of the components in the mixture, these may all appear in the hit list above the other component(s) in the mixture.
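The first failure mode can be reproduced with a minimal sketch of a dot-product search, assuming the metric is the dot product of length-normalized spectra (the exact normalization behind Table IV is not restated in this section). The four-point spectra are synthetic: compound 3 has a spectrum close to the sum of compounds 0 and 1, so it outranks both true components of their mixture. `hit_rank` returns a 1-based position comparable to a "Hit #" entry; all function names are hypothetical.

```python
import math

# Hypothetical dot-product search; length normalization is an ASSUMPTION.


def normalized_dot(a, b):
    """Dot product of two spectra after scaling each to unit length."""
    num = sum(x * y for x, y in zip(a, b))
    return num / math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))


def hit_list(unknown, library):
    """Library indices ordered from best to worst match."""
    scores = [normalized_dot(unknown, spec) for spec in library]
    return sorted(range(len(library)), key=lambda i: scores[i], reverse=True)


def hit_rank(unknown, library, index):
    """1-based position of library entry `index` in the hit list."""
    return hit_list(unknown, library).index(index) + 1


library = [
    [1.0, 0.0, 0.0, 0.2],  # compound 0
    [0.0, 1.0, 0.0, 0.3],  # compound 1
    [0.0, 0.0, 1.0, 0.1],  # compound 2
    [0.9, 1.0, 0.0, 0.5],  # compound 3: resembles compound 0 + compound 1
]
mixture = [a + b for a, b in zip(library[0], library[1])]
print(hit_list(mixture, library))  # -> [3, 1, 0, 2]
```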

The results of searching the 256-compound library for the five mixtures with the dot-product metric are presented in Table IV. Both components in MIX21 and MIX23 were correctly identified. At first, it seemed rather surprising that the two trimethylbenzenes would be correctly identified. However, their spectra are very similar to each other but different from those of the other compounds in the library, which leads to the correct identification.

[Figure: Mix-Match Search Mix32-EPA572/EPA818/EPA1697; composition index (256 spectra) vs. library # for the top-10 lists PRIMARY, RM 818, RM 818/572, and RM 818/572/1697]

FIG. 12. Results of Mix-Match search for MIX32 in the 256-compound library.

1626 Volume 45, Number 10, 1991

TABLE IV. Results of searching the 256-compound library with spectra of mixtures using the dot-product metric.

Mixture     Component                  Hit #
MIX21       o-Chlorotoluene            1
            m-Toluidine                2
MIX22       Anisole                    1
            2-Pentanone                21
MIX23       1,3,5-Trimethylbenzene     1
            1,2,4-Trimethylbenzene     2
MIX31 (a)   m-Nitrotoluene             2
            m-Dichlorobenzene          3
            p-Chlorotoluene            5
MIX32       o-Dimethylbenzene          1
            p-Bromoanisole             6
            Bromobenzene               >100

(a) 1,4-Dichloro-2-nitrotoluene was the first hit.

In the other three mixtures, not all of the components were correctly identified. In MIX22, anisole was the first hit, but many similar aromatic compounds ranked above 2-pentanone, which appeared as the 21st hit. The three components in MIX31 were close to the top; however, the first hit was a molecule (1,4-dichloro-2-nitrotoluene) that combined the chemical groups of the three components. In the final example, MIX32, one compound was correctly identified as the first hit, one was 6th, and the third did not appear in the top 100.

By comparison, the Mix-Match search algorithm correctly identified the components in all of the mixtures during the first pass through the library, except for 2-pentanone in MIX22. This compound was correctly identified after one application of the adaptive filter.
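The identify, filter, and repeat sequence can be outlined as a loop. This is a minimal sketch under stated assumptions, not the published Mix-Match code: CIs are computed by ordinary least squares (which matches principal component regression only when every factor is kept and the spectra are linearly independent), the adaptive filter is modeled as an orthogonal projection, and the loop stops once no CI reaches 0.025, the level reported above for the MIX32 residuals. The function names and the toy library are hypothetical.

```python
# Hypothetical Mix-Match-style loop. ASSUMPTIONS: least-squares CIs,
# projection-based filter, and a 0.025 stopping level borrowed from the text.


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def remove_component(spectrum, target):
    """Assumed adaptive-filter form: project out the target spectrum."""
    scale = dot(spectrum, target) / dot(target, target)
    return [s - scale * t for s, t in zip(spectrum, target)]


def least_squares_ci(mixture, spectra):
    """Solve the normal equations (S S^T) c = S m by Gaussian elimination."""
    n = len(spectra)
    a = [[dot(si, sj) for sj in spectra] + [dot(si, mixture)] for si in spectra]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            raise ValueError("library spectra are linearly dependent")
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(col + 1, n):
            f = a[r][col] / a[col][col]
            a[r] = [x - f * y for x, y in zip(a[r], a[col])]
    ci = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        ci[r] = (a[r][n] - sum(a[r][k] * ci[k] for k in range(r + 1, n))) / a[r][r]
    return ci


def mix_match_sketch(mixture, library, threshold=0.025, max_components=5):
    """Pick the largest CI, filter it out everywhere, and repeat."""
    remaining = dict(enumerate(library))
    found = []
    while remaining and len(found) < max_components:
        ids = list(remaining)
        cis = least_squares_ci(mixture, [remaining[i] for i in ids])
        best = max(range(len(ids)), key=lambda j: cis[j])
        if cis[best] < threshold:
            break  # nothing significant left in the filtered mixture
        target = remaining.pop(ids[best])
        found.append(ids[best])
        mixture = remove_component(mixture, target)
        remaining = {i: remove_component(s, target) for i, s in remaining.items()}
    return found


library = [
    [1.0, 0.5, 0.0, 0.0, 0.2, 0.0],
    [0.0, 1.0, 0.8, 0.0, 0.0, 0.1],
    [0.3, 0.0, 0.0, 1.0, 0.6, 0.0],
    [0.0, 0.0, 0.4, 0.2, 0.0, 1.0],
]
mixture = [c + 0.3 * a for a, c in zip(library[0], library[2])]
print(mix_match_sketch(mixture, library))  # -> [2, 0]
```

The minor component is found only after the major one has been filtered out, which mirrors the order in which components are removed in Figs. 10 and 12.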

CONCLUSIONS

Searching spectral libraries falls under the general category of qualitative analysis. However, when the components in mixture spectra must be identified, including a semi-quantitative approach makes those identifications possible. For small libraries containing linearly independent spectra, the entire library is orthogonalized, and the projections (scores) of the mixture spectrum onto this set of orthogonal spectra produce the CIs of the components in the mixture. For medium-size libraries in which the spectra are not all linearly independent, principal component analysis (PCA) provides an appropriate orthogonal set of spectra, and the qualitative analysis becomes a semi-quantitative application of principal component regression.
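The principal component regression step can be made concrete with a short derivation. If the m library spectra form the rows of X and each is a standard with concentration 1.0 for its own component, the concentration matrix is the identity. Assuming uncentered PCA (centering choices are not restated in this section) and eigenpairs (λ_j, u_j) of the Gram matrix X Xᵀ, the PCR prediction for a mixture spectrum x reduces to c = Σ_{j=1..k} (u_j · g / λ_j) u_j with g = X x, where k is the number of retained factors. The sketch below evaluates that formula in pure Python; the cyclic Jacobi eigen-solver is included only to keep it dependency-free, and the function names and toy data are hypothetical.

```python
import math

# Hypothetical PCR sketch. ASSUMPTIONS: identity concentration matrix,
# uncentered PCA, and a user-chosen number of factors.


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def jacobi_eigen(matrix, max_sweeps=100, tol=1e-22):
    """Eigenvalues and eigenvectors of a small symmetric matrix (cyclic Jacobi)."""
    n = len(matrix)
    a = [list(map(float, row)) for row in matrix]
    v = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(max_sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if abs(a[p][q]) < 1e-15:
                    continue
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):  # A <- A J (columns p and q)
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # A <- J^T A (rows p and q)
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
                for k in range(n):  # V <- V J accumulates the eigenvectors
                    vkp, vkq = v[k][p], v[k][q]
                    v[k][p], v[k][q] = c * vkp - s * vkq, s * vkp + c * vkq
    values = [a[i][i] for i in range(n)]
    vectors = [[v[k][i] for k in range(n)] for i in range(n)]  # column i as a list
    return values, vectors


def pcr_composition_indices(mixture, library, n_factors):
    """CIs from PCR: c = sum over kept factors of (u . g / lambda) u."""
    gram = [[dot(a, b) for b in library] for a in library]
    g = [dot(spec, mixture) for spec in library]  # overlaps with the mixture
    values, vectors = jacobi_eigen(gram)
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    ci = [0.0] * len(library)
    for i in order[:n_factors]:
        weight = dot(vectors[i], g) / values[i]
        ci = [c + weight * u for c, u in zip(ci, vectors[i])]
    return ci


library = [
    [1.0, 0.5, 0.0, 0.0, 0.2, 0.0],
    [0.0, 1.0, 0.8, 0.0, 0.0, 0.1],
    [0.3, 0.0, 0.0, 1.0, 0.6, 0.0],
    [0.0, 0.0, 0.4, 0.2, 0.0, 1.0],
]
mixture = [0.7 * a + 0.4 * c for a, c in zip(library[0], library[2])]
print(pcr_composition_indices(mixture, library, 4))  # approximately 0.7, 0, 0.4, 0
```

With all factors kept and linearly independent spectra, the result should coincide with an ordinary least-squares fit, as in the small-library case; keeping fewer factors drops the directions with the smallest eigenvalues, which is what keeps the calculation stable when spectra in a medium-size library are nearly dependent.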

ACKNOWLEDGMENT

The authors wish to express their appreciation to Steven M. Donahue for many helpful suggestions during the development of the algorithm.

1. M. R. Nyden, Appl. Spectrosc. 40, 868 (1986).
2. M. R. Nyden, J. E. Pallister, D. T. Sparks, and A. Salari, Appl. Spectrosc. 41, 63 (1987).
3. C. W. Brown and S. M. Donahue, Appl. Spectrosc. 42, 347 (1988).
4. S.-C. Lo and C. W. Brown, Appl. Spectrosc. 45, 1628 (1991).
5. L. V. Azarraga, R. R. Williams, and J. A. de Haseth, Appl. Spectrosc. 35, 466 (1981).
6. J. A. de Haseth and L. V. Azarraga, Anal. Chem. 53, 2292 (1981).
7. J. W. Sherman, J. A. de Haseth, and D. G. Cameron, Appl. Spectrosc. 43, 1311.
8. C. P. Wang and T. L. Isenhour, Appl. Spectrosc. 41, 449 (1987).
9. P. B. Harrington and T. L. Isenhour, Appl. Spectrosc. 41, 449 (1987).
10. P. B. Harrington and T. L. Isenhour, Anal. Chem. 60, 2687 (1988).
11. E. R. Malinowski and D. G. Hower, Factor Analysis in Chemistry (Wiley, New York, 1980).
12. K. R. Beebe and B. R. Kowalski, Anal. Chem. 59, 1007A (1987).
13. S. M. Donahue and C. W. Brown, Anal. Chem. 63, 980 (1991).
