

Identifying the Source of Weathered Petroleum: Matching Infrared Spectra with Correlation Coefficients

CARL D. BAER* and CHRIS W. BROWN†
Department of Chemistry, University of Rhode Island, Kingston, Rhode Island 02881

Received 14 February 1977; revision received 13 July 1977.
* Present address: Department of Chemistry, Brown University, Providence, RI 02912.
† Author to whom correspondence should be addressed.

A novel method is presented for identifying the source of weathered petroleum by measuring the similarity between the spectrum of a weathered oil and spectra of artificially weathered samples. The Pearson correlation coefficient is used as the similarity measure, and three separate K-nearest neighbor approaches are tested on the data set.

Index Headings: Infrared spectra; Correlation coefficient.

INTRODUCTION

During the past few years several groups¹ have demonstrated that infrared spectroscopy is one of the better analytical methods for identifying the source of spilled petroleum. However, there are certain problems associated with the method. We have found that the spectra of many petroleum samples change with weathering and they cannot be "matched" to the spectra of the source oils. Furthermore, spectra of samples of many heavy oils such as residuums cannot be measured using fixed pathlength cells and, in many cases, the exact pathlength is not known. Thus, it is not always possible to overlay spectra or even to use typical pattern recognition techniques, since the cell pathlength is unknown.

We solved the problem of unknown pathlengths by ratioing the absorptivities of two spectra and eliminating the pathlengths of both cells; this method has been discussed in several publications.¹⁻³ We also developed a method to weather petroleum artificially in a test tube so that petroleum on water for less than 1 week could be identified by infrared spectroscopy using the ratio method.⁴

In the present report, we offer a different solution to the weathering problem; it is more time-consuming, but it is better suited to identifying the source of long-term spills. The computer method for matching spectra also differs from the one we have been using, but it still eliminates the pathlength of the cells.

I. EXPERIMENTAL

All oils were weathered in the laboratory and in one of the following weathering grids: (1) Narragansett Bay; (2) On-shore; or (3) Laboratory roof.

The weathering grids are discussed elsewhere and will only be referred to in this report.⁵ Moreover, the methods for preparing the oil samples for spectral analysis have been described previously. All spectra were measured on a Perkin-Elmer 521 infrared spectrometer, and the digitized absorptivities have been published elsewhere.⁶

II. RESULTS AND DISCUSSION

A. Data Base. Twenty-one different oils, including light through heavy crudes and nos. 5 and 6 fuel oils, were selected to test the method; the type and the origins (when possible) of the 21 oils are given in Table I. Weathered samples were collected at 2, 7, and 14 days (one oil, no. 152, was also sampled at 28 days). Spectra of all three laboratory weathered samples and of the three samples from one of the other grids were digitized, and the absorptivities were stored in a computer data file; the identification numbers and the location of weathering are also given in Table I.

524 Volume 31, Number 6, 1977 APPLIED SPECTROSCOPY

TABLE I. Oils used in K-nearest neighbor identification.

Identification nos.ᵃ                                Oil type      Origin
152311, 152312ᵇ, 152313, 152101, 152102, 152104     Crude         Sirtica, N. Africa
162301, 162302, 162303, 162101ᵇ, 162102, 162103     Crude         Venice, Louisiana
169301, 169302, 169303, 169101ᵇ, 169102, 169103     Crude         Lagomedio, Venezuela
171301, 171302, 171303, 171101, 171102ᵇ, 171103     Crude         Ecuador and Oriente mix
172331, 172332ᵇ, 172333, 172101, 172102, 172103     Crude         Venezuela
174311, 174312ᵇ, 174313, 174101, 174102, 174103     Crude         Ventura, California
177331, 177332, 177333, 177101, 177102ᵇ, 177103     Crude         Ecuador
178301ᵇ, 178302, 178303, 178101, 178102, 178103     Crude         Mesa, E. Venezuela
179301, 179302, 179303, 179101, 179102, 179103ᵇ     Crude         Cabinda, Off-shore, Africa
181311, 181312, 181313ᵇ, 181101, 181102, 181103     Crude         Off-shore, Zaire
185301ᵇ, 185302, 185303, 185101, 185102, 185103     Crude         South America
213361, 213362, 213363, 213411, 213412, 213413ᵇ     Crude         Amna, N. Africa
605331, 605332ᵇ, 605333, 605101, 605102, 605103     No. 6 fuel
607331, 607332, 607333, 607101, 607102, 607103ᵇ     No. 5 fuel    South America
608301, 608302, 608303, 608101ᵇ, 608102, 608103     No. 5 fuel    U.S.A.
610301, 610302, 610303ᵇ, 610101, 610102, 610103     No. 6 fuel    South America
613301, 613302, 613303, 613101ᵇ, 613201, 613103     No. 6 fuel    Gach Savan
620301, 620302, 620303, 620441, 620442ᵇ, 620443     No. 6 fuel    Marine
622381, 622382, 622383, 622451, 622452ᵇ, 622453     No. 6 fuel    M.V. Arion
623301, 623302ᵇ, 623303, 623452, 623453, 623454     No. 6 fuel
626351ᵇ, 626352, 626353, 626121, 626122, 626123     No. 6 fuel

ᵃ Identification no.: first three digits, identification number of neat oil; fourth digit, weathering location (1 = Narragansett Bay, 2 = Roof of laboratory, 3 = In laboratory, and 4 = On-shore); fifth digit, weathering experiment number for oils weathered during more than one experiment; and sixth digit, time of weathering (1 = 2 days, 2 = 7 days, 3 = 14 days, and 4 = 28 days).
ᵇ Denotes samples used as unknown.
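The six-digit identification numbers of Table I can be decoded mechanically from the scheme given in the table footnote. The following is a minimal sketch of such a decoder; the names `SampleId`, `parse_sample_id`, `LOCATIONS`, and `DAYS` are hypothetical and are not part of the original work.

```python
from dataclasses import dataclass

# Codes taken from the footnote to Table I.
LOCATIONS = {1: "Narragansett Bay", 2: "Roof of laboratory",
             3: "In laboratory", 4: "On-shore"}
DAYS = {1: 2, 2: 7, 3: 14, 4: 28}


@dataclass(frozen=True)
class SampleId:
    oil: str          # first three digits: number of the neat oil
    location: str     # fourth digit: weathering location
    experiment: int   # fifth digit: weathering experiment number
    days: int         # sixth digit: time of weathering, in days


def parse_sample_id(ident: str) -> SampleId:
    """Split a six-digit identification number into its four fields."""
    if len(ident) != 6 or not ident.isdigit():
        raise ValueError(f"expected six digits, got {ident!r}")
    return SampleId(
        oil=ident[:3],
        location=LOCATIONS[int(ident[3])],
        experiment=int(ident[4]),
        days=DAYS[int(ident[5])],
    )


print(parse_sample_id("152104"))
```

For example, sample 152104 decodes as oil no. 152 weathered in Narragansett Bay for 28 days, the one 28-day sample mentioned in the text.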

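B. Correlation Coefficients. The Pearson correlation coefficient is used as a measure of similarity between spectra; this term has been used previously to retrieve infrared spectra from data files.⁷ The coefficient is defined as

$$
r = \frac{\sum_i X_i Y_i - \frac{1}{n}\sum_i X_i \sum_i Y_i}
{\left[\left(\sum_i X_i^2 - \frac{1}{n}\left(\sum_i X_i\right)^2\right)\left(\sum_i Y_i^2 - \frac{1}{n}\left(\sum_i Y_i\right)^2\right)\right]^{1/2}}
\tag{1}
$$

where $X_i$ and $Y_i$ are two sets of spectral data (absorptivities) and $r$ represents the linearity between the two. The sum is over all $n$ bands, where $n = 18$ in the present case. The spectral data for the two samples are linearly correlated if $r = 1$. If $r = 0$, there is no correlation and, if $r = -1$, the negative of one set is linearly correlated with the positive of the other set.

Eq. (1) can be rewritten in terms of absorptivities for the two spectra, i.e.,

$$
r = \frac{\sum_i a_{i1} a_{i2} - \frac{1}{n}\sum_i a_{i1} \sum_i a_{i2}}
{\left[\left(\sum_i a_{i1}^2 - \frac{1}{n}\left(\sum_i a_{i1}\right)^2\right)\left(\sum_i a_{i2}^2 - \frac{1}{n}\left(\sum_i a_{i2}\right)^2\right)\right]^{1/2}}
\tag{2}
$$

Absorptivity is defined as

$$
a_{ij} = A_{ij}/b_j
\tag{3}
$$

where $A_{ij}$ is the absorbance of the $i$th band in the spectrum of the $j$th sample and $b_j$ is the pathlength of the $j$th sample. By substituting Eq. (3) into Eq. (2) we get

$$
r = \frac{\frac{1}{b_1 b_2}\sum_i A_{i1} A_{i2} - \frac{1}{n b_1 b_2}\sum_i A_{i1} \sum_i A_{i2}}
{\left[\left(\frac{1}{b_1^2}\sum_i A_{i1}^2 - \frac{1}{n b_1^2}\left(\sum_i A_{i1}\right)^2\right)\left(\frac{1}{b_2^2}\sum_i A_{i2}^2 - \frac{1}{n b_2^2}\left(\sum_i A_{i2}\right)^2\right)\right]^{1/2}}
\tag{4}
$$

or

$$
r = \frac{\sum_i A_{i1} A_{i2} - \frac{1}{n}\sum_i A_{i1} \sum_i A_{i2}}
{\left[\left(\sum_i A_{i1}^2 - \frac{1}{n}\left(\sum_i A_{i1}\right)^2\right)\left(\sum_i A_{i2}^2 - \frac{1}{n}\left(\sum_i A_{i2}\right)^2\right)\right]^{1/2}}
$$

Thus, the pathlengths of both cells cancel, and $r$ is independent of pathlength.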
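The cancellation of the pathlengths can be checked numerically. The sketch below evaluates the correlation coefficient in the sum form of Eq. (1) and compares the value obtained from raw absorbances with the value obtained from absorptivities. The five-band absorbance lists and the pathlengths `b1` and `b2` are made-up illustrative numbers, not data from this study (which used 18 bands), and the function name `pearson_r` is an assumption.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient in the sum form of Eq. (1)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length sequences with n >= 2")
    sx, sy = sum(x), sum(y)
    sxy = sum(a * b for a, b in zip(x, y))
    sxx = sum(a * a for a in x)
    syy = sum(b * b for b in y)
    numerator = sxy - sx * sy / n
    denominator = math.sqrt((sxx - sx * sx / n) * (syy - sy * sy / n))
    return numerator / denominator


# Hypothetical absorbances A_i1 and A_i2 for two samples (illustrative only).
absorbance_1 = [0.10, 0.35, 0.80, 0.42, 0.15]
absorbance_2 = [0.12, 0.30, 0.85, 0.40, 0.20]
b1, b2 = 0.05, 0.12  # hypothetical cell pathlengths

r_from_absorbance = pearson_r(absorbance_1, absorbance_2)
r_from_absorptivity = pearson_r([A / b1 for A in absorbance_1],
                                [A / b2 for A in absorbance_2])

# The two values agree to rounding error, whatever b1 and b2 are.
print(r_from_absorbance, r_from_absorptivity)
```

Note that the denominator vanishes for a spectrum whose values are all equal, so a real implementation would need to guard against that case.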

C. Method of Analysis and Application to Data Set. One of the six weathered samples of each of the 21 oils was arbitrarily selected as an unknown and the other five used as knowns. Thus, we have 21 unknowns and 105 knowns.

The type of identification procedure devised is what is known as a K-nearest neighbor classification. This method relies on a reference set whose classifications are known a priori. In our case, this reference set is the 105 spectra of weathered oils. The correlation coefficients between the reference set and an unknown are calculated, and the highest values of this similarity measure determine which knowns are the nearest neighbors. The value of K determines how many of these neighbors are to have a part in the decision (classification) process. Normally, the nearest neighbors are assigned a "vote" and the known which totals the highest number of votes is the classification into which the unknown is placed. A great deal of variety is possible in the number of nearest neighbors used and in the votes given to each.
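The ranking step of such a classification can be sketched as follows. This is only an illustration under stated assumptions: the four-band spectra are invented toy values, the function names `pearson_r` and `nearest_neighbors` are hypothetical, and the correlation is computed in the mean-centered form, which is algebraically equivalent to the sum form.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient (mean-centered form)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = math.sqrt(sum((a - mx) ** 2 for a in x) *
                    sum((b - my) ** 2 for b in y))
    return num / den


def nearest_neighbors(unknown, knowns, k):
    """Return the k knowns most correlated with the unknown, best first.

    `knowns` maps an identification number to its spectrum.
    """
    scored = [(ident, pearson_r(unknown, spectrum))
              for ident, spectrum in knowns.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy reference set (invented values, four bands instead of 18).
knowns = {
    "177101": [0.10, 0.40, 0.80, 0.30],
    "177103": [0.12, 0.38, 0.79, 0.33],
    "171101": [0.30, 0.20, 0.60, 0.50],
    "605331": [0.80, 0.10, 0.20, 0.40],
}
unknown = [0.11, 0.41, 0.78, 0.31]

top = nearest_neighbors(unknown, knowns, k=3)
for ident, r in top:
    print(ident, round(r, 4))
```

The source oil of each neighbor is the first three digits of its identification number, so a vote tally only needs the ranked identifiers.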

The following three K-nearest neighbor techniques were tested:

1. In the first approach K = 1 was chosen, i.e., the spectrum of the known having the highest correlation with the spectrum of the unknown was selected as the spectrum of the source oil. It has been suggested that this approach is the best one for classifying objects whose properties are fixed and unchanging,⁸ which is not really the type of problem we are dealing with in the present case. However, the success rate was good, with 18 of 21 unknowns correctly identified, as is shown in Table II.

2. In the second approach allowance was made for the fact that oil undergoes continuous change as it weathers and perfect duplication of the spectrum of the weathered unknown is not expected. The reference file should contain samples that are slightly more and slightly less weathered, i.e., bracketing the spectrum of the unknown oil. For this reason, more of the nearest neighbors were allowed to take part in the classification of the unknown. It was felt that the closest similarity measures were still the most important. Therefore, a weighted vote was applied with K = 5 and the votes as follows: nearest = 5, 2nd = 4, 3rd = 3, 4th = 2, and 5th = 1. This procedure is somewhat arbitrary, but nevertheless it appears to be useful.

TABLE II. Results of K-nearest neighbor identification.

                   Values of K and resultant rank
                   K = 1     K = 5             K = 3
Unknown            Rank      Votes    Rank     Votes    Rank
152312             1         15       1        6        1
162101             ...ᵃ      7        1        3        Tᵇ
169101             1         12       1        4        1
171102             1         7        T        2        2
172332             ...       7        1        3        1
174312             1         13       1        5        1
177102             1         9        1        1        2
178301             1         5        T        0        ...
179103             1         5        T        1        2
181313             1         14       1        5        1
185301             1         12       1        3        1
213413             1         11       1        6        1
605332             1         15       1        6        1
607103             1         6        2        0        ...
608101             1         12       1        5        1
610303             1         14       1        6        1
613101             1         9        1        5        1
620442             1         5        T        3        1
622452             1         11       1        5        1
623302             1         15       1        6        1
626351             ...       9        1        6        1
Total correctᶜ     18        20                16

ᵃ If no "votes" are recorded no rank is determined.
ᵇ T indicates a tie for first.
ᶜ Ties are counted as correct identification.

TABLE III. Example of identification by correlation coefficients with K = 5.ᵃ

Top 10 r values    Identification no. of knowns    Votes
0.9944             177101                          5
0.9914             177103                          4
0.9911             171101                          3
0.9898             178101                          2
0.9892             172101                          1
0.9886             178102
0.9885             171303
0.9884             172102
0.9877             171301
0.9869             177333

Total votes: 177xxx (9), 171xxx (3), 178xxx (2), 172xxx (1)

ᵃ In this case, sample no. 177102 was the unknown. The highest 10 correlation coefficients are listed along with the identification numbers of the knowns and the votes.

TABLE IV. Identification of 21 unknown weathered oils by correlation coefficients with K = 5.

Unknown no.    Known no. (votes)                     Success
152312         152 (15)                              1st
162101         162 (7), 179 (6), 169 (2)             1st
169101         169 (12), 626 (3)                     1st
171102         171 (7), 177 (7), 178 (1)             Tie for 1st
172332         172 (7), 178 (5), 171 (3)             1st
174312         174 (13), 607 (2)                     1st
177102         177 (9), 171 (3), 178 (2), 172 (1)    1st
178301         178 (5), 610 (5), 172 (4), 622 (1)    Tie for 1st
179103         179 (5), 169 (5), 162 (3), 626 (2)    Tie for 1st
181313         181 (14), 152 (1)                     1st
185301         185 (12), 620 (3)                     1st
213413         213 (11), 181 (4)                     1st
605332         605 (15)                              1st
607103         185 (7), 607 (6), 620 (2)             2nd
608101         608 (12), 607 (3)                     1st
610303         610 (14), 178 (1)                     1st
613101         613 (9), 178 (3), 171 (2), 622 (1)    1st
620442         620 (5), 185 (5), 607 (4), 174 (1)    Tie for 1st
622452         622 (11), 610 (3), 613 (1)            1st
623302         623 (15)                              1st
626351         626 (9), 169 (6)                      1st

The results of this second method with K = 5 on unknown no. 177102 are given in Table III. Correlation coefficients between this unknown and all 105 knowns were calculated and ordered by decreasing values of r. The top ten r values and identification numbers are given in the table; the top five are assigned votes of 5 through 1, and these are summed for knowns from the same source at the bottom of the table. In this case, no. 177 was clearly identified as the source.

The results of this method on all 21 unknowns are listed in Table IV, where all knowns receiving votes are listed. Sixteen unknowns were identified correctly and unambiguously, four unknowns were correctly identified but tied in votes with another oil, and one unknown was incorrectly matched (the correct choice being second). The final results of this method are compared with those of the first method (i.e., K = 1 method) in Table II.

3. The final approach was designed to test further how well artificial weathering could provide samples for use in identification. The unknowns remained the same for this test, but the reference oils which originated at the same location as the unknown oil were not allowed to have a part in the classification. As this left only three "correct" reference spectra, the voting method was changed. The value of K was set at 3, and the votes were 3, 2, and 1 for the highest three correlation coefficients, respectively. The results are also given in Table II, where it is shown that 16 of the 21 oils were correctly identified. This is certainly a comparable success rate.
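The three voting schemes can be sketched with the ranked knowns listed in Table III for unknown no. 177102. The function names `weighted_vote` and `winners` are hypothetical. For the third approach the sketch assumes that "same location" refers to the weathering location encoded in the fourth digit of the identification number; under that reading, the three best knowns from other locations all appear in the top ten of Table III.

```python
from collections import defaultdict


def weighted_vote(ranked_ids, weights):
    """Sum votes per source oil (first three digits of each sample id).

    Only as many neighbors as there are weights take part in the vote.
    """
    totals = defaultdict(int)
    for ident, weight in zip(ranked_ids, weights):
        totals[ident[:3]] += weight
    return dict(totals)


def winners(totals):
    """Oils with the highest vote total (more than one means a tie)."""
    best = max(totals.values())
    return sorted(oil for oil, votes in totals.items() if votes == best)


# Knowns ranked by decreasing r for unknown 177102, copied from Table III.
unknown = "177102"
ranked = ["177101", "177103", "171101", "178101", "172101",
          "178102", "171303", "172102", "171301", "177333"]

votes_k1 = weighted_vote(ranked, [1])               # approach 1
votes_k5 = weighted_vote(ranked, [5, 4, 3, 2, 1])   # approach 2

# Approach 3 (assumed reading): drop knowns weathered at the unknown's
# location, then vote 3, 2, 1 over the remaining three best matches.
other_sites = [i for i in ranked if i[3] != unknown[3]]
votes_k3 = weighted_vote(other_sites, [3, 2, 1])

print(votes_k5, winners(votes_k5))
print(votes_k3, winners(votes_k3))
```

The K = 5 tally gives 9, 3, 2, and 1 votes for oils 177, 171, 178, and 172, as at the bottom of Table III, and the K = 3 tally gives oil 177 one vote and second rank, in agreement with the entry for 177102 in Table II.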

The success of the latter approach indicates that identification of weathered oils is possible using a reference set of artificially weathered oils, i.e., oils weathered at another location under different conditions. Actually the results should not be taken as an upper limit on the probability of correct identification as the weathering conditions of the reference set were not optimized. For example, some of the unknown oils were weathered at one location at higher temperatures than the reference oils at the other location.

III. CONCLUSIONS

The method presented in this report makes it possible to identify spills of up to 2 weeks duration by bracketing the oil from an actual spill with samples weathered for shorter and longer periods of time. It demonstrates that in many cases laboratory weathering is sufficient for generating a reference data set. Furthermore, it provides an alternative method for matching spectra which eliminates the need to know the exact pathlength of the cells used for measuring the spectra.

ACKNOWLEDGMENT

This work was supported under U.S. Coast Guard Contract DOT-CG-81-74-1099. The authors express appreciation to Mark Ahmadjian and Patricia F. Lynch for help in tabulating the data and for useful advice.

1. A. P. Bentz, Anal. Chem. 48, 454A (1976), and references therein.
2. P. F. Lynch and C. W. Brown, Environ. Sci. Technol. 7, 1123 (1973).
3. C. W. Brown, M. Ahmadjian, C. D. Baer, and P. F. Lynch, Environ. Sci. Technol. 10, 777 (1976).
4. C. W. Brown and P. F. Lynch, Anal. Chem. 46, 191 (1976).
5. M. Ahmadjian, C. D. Baer, C. W. Brown, V. M. Westervelt, D. F. Grant, and A. P. Bentz, Anal. Chem. 48, 628 (1976).
6. C. W. Brown, "Identification of Oil Slicks by Infrared Spectroscopy," U.S. Coast Guard Contract DOT-CG-81-74-1099, Final Report (1 June 1976).
7. K. Tanabe and S. Saeki, Anal. Chem. 47, 118 (1975).
8. J. W. Sammon, Jr., IEEE Trans. Comput. C-18, 401 (1969).
