
Computer-assisted determination of elemental composition of fragments in mass spectra


RAPID COMMUNICATIONS IN MASS SPECTROMETRY, VOL 6, 585-591 (1992)

RESEARCH METHODS AND APPLICATIONS

Computer-assisted Determination of Elemental Composition of Fragments in Mass Spectra

K. Kumar† and A. G. Menon*
Instrumentation and Services Unit, Indian Institute of Science, Bangalore 560 012, India

P. S. Sastry Department of Electrical Engineering, Indian Institute of Science, Bangalore 560 012, India

SPONSOR REFEREE: Professor D. Mathur, Tata Institute of Fundamental Research, Bombay, India

This paper presents details of an expert system that infers the elemental composition of fragment peaks in a mass spectrum and suggests possible chemical formulae for the molecular ion even when it does not itself appear in the spectrum. The system relies on a knowledge of the natural isotopic abundances of the elements, and exploits the fact that fragments are siblings, with the molecular ion as their parent. The program has been written in PASCAL and runs on any IBM PC-AT compatible machine.

This paper deals with some aspects of the development of an “expert system” for the elucidation of structures of organic compounds from their mass spectra. While our earlier effort1 focused on the problem of inferring the structures of acyclic compounds (with known molecular formulae) from their mass spectra, the present study attempts to obtain the elemental composition of fragments in the spectrum and to suggest the molecular formula of the compound even in cases where the molecular ion is absent from the spectrum.

Current methods in computer-assisted interpretation of mass spectra rely on searching through large databases. Such methods will become too unwieldy and expensive in the coming years on account of the proliferation of newer analysing and ionizing techniques. For example, there will be an enormous increase in the volume of data if the database has to contain tandem (MS/MS) spectra. Further, in the context of the mass spectra of new compounds, the assistance provided by the databases would be limited principally by the non-availability of spectra generated by the new techniques.

In contrast, an expert system would provide plausible inferences from the mass spectrum based on knowledge of the fragmentation process and some chemical intuition. Within a kernel of a few hundred kilobytes of memory, such a system would output important information that chemists could build on. Our work is motivated by these considerations and represents our first attempt at providing computer-based interpretation assistance to the

Our system takes as its input the mass spectrum of the organic compound and outputs the elemental composition of the various fragment peaks in the spectrum. It identifies clusters of peaks in the mass spectrum that are isotopically related and, by making use of the abundances of the natural isotopes of the elements to relate intensities within the cluster of peaks, identifies the chemical composition of the fragment peak.5 The system always considers three contiguous peaks, to be denoted by M, M + 1 and M + 2, and examines whether the peaks at M + 1 and M + 2 bear an isotopic relationship with M. This is inferrable from the ratios of the intensities of these three peaks, $I_M$, $I_{M+1}$ and $I_{M+2}$. We shall call the peak at M the fragment peak and the ones at M + 1 and M + 2 the isotopic peaks. The system can also accommodate possible interference, by significant abundances of an M - 1 peak, in the intensity ratios in the cluster of peaks at M, M + 1 and M + 2. From the large number of composition possibilities for a given fragment peak, the choices are pruned by checking for adherence to two rules: the rings plus double bonds (RDB) rule and the nitrogen rule5 (see Section 1). We have been guided in our choice of appropriate rules largely by McLafferty5 and have tested the program on the examples given there. The system developed is a stand-alone module and can output the elemental composition of fragments without being integrated with our earlier system.1 The program has been written in PASCAL and runs on an IBM PC-AT compatible machine.

† Current address: Tata Consultancy Service, Madras, India.
* Author to whom correspondence should be addressed.

Probably the best-known expert system for interpreting mass spectra is the DENDRAL program,6 developed in the sixties. DENDRAL infers plausible structures of compounds from their mass spectra by means of fragmentation rules which constitute the so-called ‘zeroeth order theory’ of mass spectrometry. A later system, Meta-DENDRAL,7 tries to infer the relevant rules from the spectra of compounds with known structures. The DENDRAL system requires a priori knowledge of the molecular formula of the compound. However, nowadays one of the principal uses of mass spectra is in determining the molecular formulae of compounds. Though expert systems have been used for many applications in chemistry (e.g., Reference 8), to the best of our knowledge the system we have developed is the first that attempts to infer a molecular formula from the spectrum. In addition, our system provides the formulae for a few fragment peaks as well, which is useful in structure elucidation.9

0951-4198/92/090585-07 $08.50 © 1992 by John Wiley & Sons, Ltd.

Received 16 July 1992; Accepted 18 July 1992


1. THE SYSTEM OVERVIEW

The present system functions principally on the basis of the known natural isotopic compositions of the elements and on the recognition of the fact that, to a first approximation, all fragments are subsets of the molecular ion. This fact is based on the assumption that fragmentation of an organic molecule in a mass spectrometer takes place by a process of unimolecular decomposition (with no significant recombination) of an excited precursor. The isotopic relations revealed by the relative intensities of a cluster of peaks in the mass spectrum are employed in arriving at suggestions of the elemental composition of that particular fragment. We also make use of some rules to eliminate chemically implausible suggestions (see the rings plus double bonds and nitrogen rules below). The relationship between the elemental compositions of different fragment peaks is used to prune the number of suggestions of elemental composition of a particular fragment and to suggest the molecular formula in the absence of a molecular ion peak. By the use of suitable data structures all the searches involved are made efficient so that the system can easily handle relatively large molecules.

If the elemental composition of a particular fragment at mass M is known, then, by using a multinomial distribution, one can compute the intensity contribution from the isotopes at masses M + 1 and M + 2. In general, when a large number of different elements compose a fragment, the intensity contribution at masses higher than M + 2 may also be significant. In practice, however, considering isotopic contributions up to M + 2 has been found sufficient for computing elemental composition. The expressions for the expected intensity of the isotopic peaks used in the present work are derived using a multinomial distribution and are given below (the terms used are from McLafferty5):

A elements. These are monoisotopic elements (e.g., fluorine). For these, there will be no isotopic contribution at masses M + 1 and M + 2.

A + 1 elements. These are elements whose second major isotope is one mass unit higher than the most abundant isotope (e.g., carbon). If the isotopic ratio of these elements is represented as 100:a, for a fragment at mass M consisting of N atoms of this element, the relative intensities at mass M + 1 and M + 2, $I_{M+1}$ and $I_{M+2}$, are given by (assuming $I_M = 100$)

$$I_{M+1} = N \times a$$

$$I_{M+2} = \frac{N \times (N-1) \times a^2}{2 \times 100}$$

A + 2 elements. These are elements whose second major isotope is two mass units higher than their most abundant isotope (e.g., sulphur). If the isotopic ratios of these elements are 100:a:b, for a fragment at mass M consisting of N atoms of this element, the intensity contribution at M + 1, $I_{M+1}$, will be $N \times a$, and that at M + 2, $I_{M+2}$, will be

$$I_{M+2} = N \times b + \frac{N \times (N-1) \times a^2}{2 \times 100}$$

At any higher mass, $M + n_1 + 2n_2$, the intensity is given by

$$I_{M+n_1+2n_2} = \frac{N!\; a^{n_1}\, b^{n_2}}{(N-n_1-n_2)!\; n_1!\; n_2!\; 100^{(n_1+n_2-1)}} \qquad (1)$$

where $n_1$ is the number of atoms of isotope A + 1, and $n_2$ is the number of atoms of isotope A + 2. As an example, consider a cluster with a major peak at mass 160. One of the various combinations of elements that the program will try would be 10 oxygen atoms (the ratio of 16O:17O:18O being 100:0.04:0.20). Here, N will be 10 and the intensity contribution at mass 161 would be 10 x 0.04 = 0.4%; at mass 162, two different combinations of isotopes could contribute, with $n_1 = 2$ and $n_2 = 0$ as one combination and $n_1 = 0$ and $n_2 = 1$ as another; at mass 163, the combinations are $n_1 = 3$, $n_2 = 0$ or $n_1 = 1$, $n_2 = 1$. The contributions of these combinations, calculated from Eqn (1), are added to get the relative intensity at the corresponding mass number.
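The single-element calculation of Eqn (1) can be illustrated with a short sketch. It is written in Python rather than the PASCAL of the original program, and the function names `isotope_term` and `intensity_at` are hypothetical; the sketch only re-evaluates the oxygen example from the text.

```python
from math import factorial

def isotope_term(N, n1, n2, a, b):
    """Eqn (1): intensity (with I_M = 100) for n1 atoms of the A + 1 isotope
    and n2 atoms of the A + 2 isotope among N atoms of one element."""
    if n1 + n2 > N:
        return 0.0  # more heavy atoms than atoms available
    ways = factorial(N) // (factorial(N - n1 - n2) * factorial(n1) * factorial(n2))
    return ways * a ** n1 * b ** n2 / 100.0 ** (n1 + n2 - 1)

def intensity_at(N, shift, a, b):
    """Sum Eqn (1) over every (n1, n2) with n1 + 2*n2 == shift."""
    total = 0.0
    for n2 in range(shift // 2 + 1):
        n1 = shift - 2 * n2
        total += isotope_term(N, n1, n2, a, b)
    return total

# Ten oxygen atoms, isotopic ratio 100:0.04:0.20 (values from the text).
print(intensity_at(10, 1, 0.04, 0.20))  # contribution at mass 161
print(intensity_at(10, 2, 0.04, 0.20))  # contribution at mass 162
```

For a shift of 2 the loop visits exactly the two combinations named in the text, ($n_1 = 2$, $n_2 = 0$) and ($n_1 = 0$, $n_2 = 1$).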

In the present version we do not consider other types of elements (e.g., A - 1 elements whose second most abundant isotope is one mass unit less than the most abundant one). However, it is fairly simple to extend the system to include these elements too.

Multi-element combinations. For two elements, 1 and 2, with isotopic ratios $100:a_1:b_1$ and $100:a_2:b_2$, respectively, considering a fragment M consisting of $N_1$ atoms of element 1 and $N_2$ atoms of element 2, the intensity contribution at M + 1, $I_{M+1}$, is

$$I_{M+1} = N_1 \times a_1 + N_2 \times a_2$$

The contribution to the intensity at higher masses, $I_{M+k_{11}+2k_{12}+k_{21}+2k_{22}}$, is given by

$$I_{M+k_{11}+2k_{12}+k_{21}+2k_{22}} = \frac{N_1!}{(N_1-k_{11}-k_{12})!\; k_{11}!\; k_{12}!} \times \frac{N_2!}{(N_2-k_{21}-k_{22})!\; k_{21}!\; k_{22}!} \times \frac{a_1^{k_{11}}\, b_1^{k_{12}}\, a_2^{k_{21}}\, b_2^{k_{22}}}{100^{(k_{11}+k_{12}+k_{21}+k_{22}-1)}}$$

where $k_{ij}$ is the number of atoms of isotope j of element i ($1 \le i, j \le 2$). Thus, for a hypothetical cluster consisting of 5 sulphur atoms (the ratio of 32S:33S:34S is 100:0.8:4.4) and 3 oxygen atoms, $N_1$ is 5, $N_2$ is 3 and the mass is 208. The intensity contribution at mass 209 would be 5 x 0.8 + 3 x 0.04 = 4.12%. Mass 210 could be obtained by the following combinations:

k11   k12   k21   k22
 2     0     0     0
 0     0     2     0
 1     0     1     0
 0     1     0     0
 0     0     0     1

The intensities computed for all these combinations are summed to get the intensity at 210 for this hypothetical cluster of atoms.
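As a sketch of how this summation might be carried out for any number of elements, the Python fragment below enumerates the isotope combinations and adds their contributions. The names `combinations_for` and `cluster_intensity` are hypothetical, and the code is an illustration of the formula above, not the authors' PASCAL implementation.

```python
from itertools import product
from math import factorial

def term(N, k1, k2, a, b):
    """One element's factor, expressed as a fraction of the M peak."""
    ways = factorial(N) // (factorial(N - k1 - k2) * factorial(k1) * factorial(k2))
    return ways * (a / 100.0) ** k1 * (b / 100.0) ** k2

def combinations_for(elements, shift):
    """All tuples of (k_i1, k_i2) pairs whose total mass shift equals `shift`.
    `elements` is a list of (N, a, b) with isotopic ratio 100:a:b."""
    options = [
        [(k1, k2)
         for k2 in range(shift // 2 + 1)
         for k1 in range(shift - 2 * k2 + 1)
         if k1 + k2 <= N]
        for N, _, _ in elements
    ]
    return [combo for combo in product(*options)
            if sum(k1 + 2 * k2 for k1, k2 in combo) == shift]

def cluster_intensity(elements, shift):
    """Intensity at M + shift with I_M normalized to 100."""
    total = 0.0
    for combo in combinations_for(elements, shift):
        contribution = 1.0
        for (N, a, b), (k1, k2) in zip(elements, combo):
            contribution *= term(N, k1, k2, a, b)
        total += contribution
    return 100.0 * total

# Five sulphur atoms (100:0.8:4.4) and three oxygen atoms (100:0.04:0.20).
s5o3 = [(5, 0.8, 4.4), (3, 0.04, 0.20)]
print(len(combinations_for(s5o3, 2)))  # number of combinations giving mass 210
print(cluster_intensity(s5o3, 1))      # intensity at mass 209
print(cluster_intensity(s5o3, 2))      # intensity at mass 210
```

Writing each factor as a fraction of the M peak and multiplying by 100 at the end is algebraically the same as dividing by 100 raised to the power (sum of all k) - 1.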

The above expressions can be easily generalized to any number of elements constituting the fragment. They can be used to choose elemental compositions which satisfy the constraints of the mass of the fragment as well as the reported intensities of the isotopic peaks. However, the combinations obtained at this stage will include some suggestions which may be chemically non-viable and thus have to be eliminated. This elimination is done by checking for conformity to the following two rules.

Rings plus double bonds (RDB) rule

This rule permits the computation of the degree of unsaturation of the fragment and helps in the elimination of combinations having a negative RDB value. The expression used for computing the RDB value is given below5

$$\mathrm{RDB} = X - 0.5\,Y + 0.5\,Z + 1$$

where X refers to the number of tetravalent elements, Y to the number of monovalent elements and Z to the number of trivalent elements in the suggested elemental composition for the fragment. We eliminate all suggestions that result in a negative RDB value.

RDB values are also used to characterize fragments as odd-electron (if RDB is an integer) and even-electron (if RDB is a positive fraction). This information is useful in applying the nitrogen rule (explained below) and also in identifying the molecular ion.
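A minimal sketch of the RDB computation and the electron-type classification follows. The grouping of elements by valence (`TETRAVALENT`, `TRIVALENT`, `MONOVALENT`) is an assumption made for illustration, as are the function names; divalent elements such as oxygen and sulphur do not enter the expression.

```python
# Assumed valence classes; extend as needed for other elements.
TETRAVALENT = {"C", "Si"}
TRIVALENT = {"N", "P"}
MONOVALENT = {"H", "F", "Cl", "Br", "I"}

def rdb(composition):
    """RDB = X - 0.5*Y + 0.5*Z + 1 for a dict of element -> atom count."""
    x = sum(n for el, n in composition.items() if el in TETRAVALENT)
    y = sum(n for el, n in composition.items() if el in MONOVALENT)
    z = sum(n for el, n in composition.items() if el in TRIVALENT)
    return x - 0.5 * y + 0.5 * z + 1

def electron_type(composition):
    """'odd' for an integral RDB, 'even' for a fractional one,
    None when the RDB is negative and the suggestion is rejected."""
    value = rdb(composition)
    if value < 0:
        return None
    return "odd" if value == int(value) else "even"

print(rdb({"C": 2, "H": 2, "Cl": 2}), electron_type({"C": 2, "H": 2, "Cl": 2}))
print(rdb({"C": 2, "Cl": 1, "F": 4}), electron_type({"C": 2, "Cl": 1, "F": 4}))
```

The two compositions printed are the ones discussed in Section 3: C(2)H(2)Cl(2) has an integral RDB, while C(2)ClF(4) has an RDB of 0.5.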

Nitrogen rule

Because of nitrogen’s exceptional property of having an odd valency and even mass for its most abundant isotope, nitrogen-bearing fragments can conform only to the following combinations:

Electron type   Number of nitrogen atoms   Mass number
Odd             Even                       Even
Odd             Odd                        Odd
Even            Even                       Odd
Even            Odd                        Even

The suggestions generated by the program can be pruned out if they do not conform to the above combinations. By noting that zero is an even number, we can also apply this rule for combinations not containing nitrogen.
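The four allowed combinations reduce to a parity comparison, which the following sketch makes explicit. The function name `satisfies_nitrogen_rule` and the string labels for the electron type are illustrative choices, not taken from the original program.

```python
def satisfies_nitrogen_rule(electron_type, n_nitrogen, mass):
    """Check a suggestion against the table of allowed combinations.

    Odd-electron ions: nitrogen count and mass number have the same parity.
    Even-electron ions: nitrogen count and mass number have opposite parity.
    """
    nitrogen_even = n_nitrogen % 2 == 0
    mass_even = mass % 2 == 0
    if electron_type == "odd":
        return nitrogen_even == mass_even
    if electron_type == "even":
        return nitrogen_even != mass_even
    raise ValueError("electron_type must be 'odd' or 'even'")

# C(2)H(2)Cl(2) at mass 96: odd-electron, zero nitrogen atoms, even mass.
print(satisfies_nitrogen_rule("odd", 0, 96))
# An odd-electron ion without nitrogen cannot appear at an odd mass.
print(satisfies_nitrogen_rule("odd", 0, 135))
```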

Identification of molecular ions

Under our assumption of unimolecular decomposition (without any recombination) the molecular ion will appear, if at all, in the highest mass cluster in the spectrum. Further, on account of chemical constraints, these ions have to be odd-electron (integral RDB value) and their elemental composition has to include the elemental compositions of the lower mass fragments of the spectrum.

There are, however, situations where no molecular ion appears in the spectrum. In such instances, the highest mass fragment in the spectrum will fail one of the tests for molecular ions as above and it becomes necessary to use another scheme for predicting the molecule’s elemental composition. In our system the molecular formula is then computed by taking a union of the elemental compositions of two or more peaks at the high-mass end of the spectrum. This is based on the simplifying assumption that an unstable molecular ion will initially split into two or more major fragments which will then have their own independent decomposition series in the spectrum. As will be seen in Section 3, this strategy is useful in some cases.

2. IMPLEMENTATION DETAILS

The main tasks to be performed by the expert system are: identifying possible fragment peaks, computing allowed elemental compositions (i.e., those satisfying the mass of the fragments and relative intensities of isotopic peaks and also conforming to the RDB and nitrogen rules) and suggesting a molecular formula for the compound.

For identifying possible fragment peaks and the associated isotope peaks, the program scans the spectrum from the lowest to the highest mass. Each peak (say at mass M) is considered as a possible fragment peak with peaks at M + 1 and M + 2, if present (see below), as the isotopic peaks. To qualify for being considered as a fragment peak, the intensity of the peak should be greater than or equal to the intensity threshold. In our current system this threshold is normally set at 10%. However, sometimes the molecular peak will appear in the spectrum with very low intensity. To take care of this, the system first scans the spectrum to locate the most intense peak in the last cluster at the high mass end of the spectrum. If this peak has an intensity greater than 10%, then the intensity threshold is fixed at 10%; otherwise the threshold is set at the intensity of this peak in the last cluster. It may be noted that the intensity only has to cross the threshold for a peak to be considered as a fragment peak. There is no threshold on the isotopic peaks. For each of the peaks identified as a possible fragment peak the following processing is done.

If a peak at mass M is picked, the program looks for peaks at M + 1 and M + 2 in the spectrum. If at least one of the two is present, then the system inserts the other in the spectrum with a nominal intensity of zero. In all intensity values, provision is made for an error range of 10% of the nominal intensity value, or 0.2, whichever is higher. Now the sequence of three peaks is subjected to further analysis.
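The error range and the normalization of a three-peak cluster can be sketched as below. The clamping of the lower bound at zero and the scaling of the bounds by the intensity at M are inferred from the figures printed in Table 1 rather than stated in the text, and the function names are hypothetical.

```python
def error_bar(intensity):
    """Allowed error: 10% of the nominal intensity or 0.2, whichever is higher."""
    return max(0.10 * intensity, 0.2)

def normalized_window(spectrum, m):
    """For the cluster (M, M + 1, M + 2), return {1: (low, nominal, high),
    2: (low, nominal, high)} with the intensity at M scaled to 100.
    A missing isotopic peak is treated as having zero intensity."""
    base = spectrum[m]
    window = {}
    for shift in (1, 2):
        nominal = spectrum.get(m + shift, 0.0)
        err = error_bar(nominal)
        low = max(nominal - err, 0.0)  # assumed: bounds never go below zero
        high = nominal + err
        window[shift] = tuple(100.0 * v / base for v in (low, nominal, high))
    return window

# Cluster at m/z 26 from Table 1; the peak at 28 is absent from the spectrum.
spectrum = {26: 34.0, 27: 0.7}
for shift, bounds in normalized_window(spectrum, 26).items():
    print(shift, [round(v, 2) for v in bounds])
```

A candidate composition would then be retained only if its calculated M + 1 and M + 2 intensities fall inside these two windows.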

The system tries systematically to assign all possible elemental compositions to the fragment peak chosen for analysis. For this search, the maximum number of atoms possible of any element in this peak can be bounded by the knowledge of the mass of the fragment. Initially, through an interactive session, the user is allowed to input information regarding elements definitely known to be present and those definitely known to be absent. The system excludes elements known to be absent from the computation of elemental compositions. Each elemental composition is tested for its conformity with the mass of the fragment and with the expected intensities at the isotopic peaks. These intensities can be calculated with the formulae given in Section 1. If the fragment peak (at mass M) under consideration is such that there is an (M - 1) peak in the spectrum, then the contribution of the isotopic peaks of this peak to the observed intensities at masses (M + 1) and (M + 2) will also be taken care of during computation (see example 1 in Section 3). Further, each combination suggested for the peak is tested to make sure that it satisfies the requirements of a positive RDB number, and an electron type and mass number satisfying the nitrogen rule as given in Section 1. Since an error factor is allowed in the nominal intensity values, there will be a range of valid intensities for the isotopic peaks. The system collects all combinations that result in isotopic intensities within this range and also meet the other requirements as discussed above. It may be noted that not every peak picked for analysis will be assigned elemental compositions. Sometimes the peaks at M and (M + 1) may be due to independent fragments and hence the expected isotopic intensities will not be observed in the spectrum. Thus, our computation also points out which peaks in the spectrum are not isotopically related.

While searching for possible elemental compositions, the program has to search through a tree of possibilities where each level corresponds to making decisions about the number of atoms of a particular element. This tree will be finite due to the constraint of mass of the fragment. The program searches this tree in a depth first fashion and at each node tries to see if the partial elemental composition already violates some constraint. With a proper search strategy many of the failure nodes are identified quite early in the search process. Without such an efficient search algorithm, finding all allowed compositions would consume too much time.
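A much simplified sketch of such a depth-first search is given below. It uses only the mass constraint to bound and prune the tree; the isotopic-intensity, RDB and nitrogen-rule checks described above would be applied as further tests. The table `NOMINAL_MASS`, the element ordering and the function name are assumptions made for illustration.

```python
# Nominal masses of the most abundant isotopes (illustrative subset).
NOMINAL_MASS = {"C": 12, "H": 1, "N": 14, "O": 16, "F": 19, "S": 32, "Cl": 35}

def compositions(target_mass, elements):
    """Depth-first enumeration of element counts whose nominal masses
    sum to target_mass. Each tree level fixes the count of one element."""
    order = sorted(elements, key=NOMINAL_MASS.get, reverse=True)
    results = []

    def dfs(level, remaining, partial):
        if remaining == 0:
            results.append(dict(partial))  # mass matched: record the composition
            return
        if level == len(order):
            return  # elements exhausted with mass left over: failure node
        element = order[level]
        # The remaining mass bounds the number of atoms of this element.
        for count in range(remaining // NOMINAL_MASS[element], -1, -1):
            if count:
                partial.append((element, count))
            dfs(level + 1, remaining - count * NOMINAL_MASS[element], partial)
            if count:
                partial.pop()

    dfs(0, target_mass, [])
    return results

for composition in compositions(26, ["C", "H", "Cl"]):
    print(composition)
```

Placing the heaviest elements at the top of the tree keeps the branching factor small near the root, which is one plausible way of reaching failure nodes early.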

At the end of this phase, all fragment peaks for which elemental compositions are suggested are stored as a linked list of records. In this list each record has multiple links pointing to other records in the list where the pointers represent subset/superset relations among the different elemental compositions. A record representing one suggested elemental composition for a fragment peak at mass, say M, will have links with other records representing elemental composition for other peaks which are subsets (if the mass of those peaks is less than M) or supersets (if those peaks are at mass greater than M). The program first builds a list of all suggestions and then, using suitable data structures, establishes all the links in one pass through the list. It is easy to see whether a suggested combination is supported by other suggestions for other peaks in the sense that there are subset/superset suggestions at other peaks. The program automatically eliminates all those suggestions which do not enjoy any support. At this stage, the system also checks for evidence for the presence of hydrogen in the compound. Suppose $M_1, M_2, \ldots$ are the masses of fragment peaks for which elemental compositions are suggested. If in the original mass spectrum there is no peak at any of the masses $(M_1 - 1), (M_2 - 1), \ldots$, then the system concludes that the compound does not contain hydrogen and will not suggest any molecular formulae containing hydrogen.
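The support test and the hydrogen check might be sketched as follows, with plain dictionaries standing in for the linked records of the original program. The function names are hypothetical, the pruning is done in a single pass (an assumption), and the C(1)N(1)Cl(1) entry at mass 61 is an invented suggestion included only to show an unsupported case.

```python
def is_subset(small, big):
    """True if every element count in `small` is available in `big`."""
    return all(big.get(element, 0) >= n for element, n in small.items())

def has_support(mass, composition, suggestions):
    """A suggestion is supported if some other peak offers a subset
    (at lower mass) or a superset (at higher mass) of it."""
    for other_mass, others in suggestions.items():
        for other in others:
            if other_mass < mass and is_subset(other, composition):
                return True
            if other_mass > mass and is_subset(composition, other):
                return True
    return False

def prune_unsupported(suggestions):
    """Drop every suggestion that has no subset/superset partner."""
    return {mass: [c for c in comps if has_support(mass, c, suggestions)]
            for mass, comps in suggestions.items()}

def hydrogen_evidence(spectrum_masses, fragment_masses):
    """Hydrogen is indicated if any fragment M_i has a peak at M_i - 1."""
    return any(m - 1 in spectrum_masses for m in fragment_masses)

suggestions = {
    26: [{"C": 2, "H": 2}],
    36: [{"H": 1, "Cl": 1}],
    61: [{"C": 2, "H": 2, "Cl": 1}, {"C": 1, "N": 1, "Cl": 1}],  # second is hypothetical
    96: [{"C": 2, "H": 2, "Cl": 2}],
}
print(prune_unsupported(suggestions)[61])
print(hydrogen_evidence({25, 26, 35, 36, 60, 61, 95, 96}, [26, 36, 61, 96]))
```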

Finally, the system searches for the molecular formula. It first checks whether any combinations are suggested for the peak in the last cluster at the high mass end of the spectrum. If there are, then each of the combinations is tested to see (i) whether it is an odd-electron ion, i.e., whether the RDB is an integral number, and (ii) whether each fragment in the spectrum with suggested combinations has at least one combination that is a subset of this combination. All combinations passing both the above tests are listed as possible molecular formulae for the compound. If a molecular formula cannot be found in this fashion (either because no suggestions are made for the last peak or because all suggestions fail the above tests) then the system tries to suggest molecular formulae by combining the suggestions made for two or more peaks. For this it starts by taking the highest two mass peaks for which suggestions are made. It takes all possible unions of the combinations suggested for the two peaks and tests to see if any of them satisfies the requirement of a molecular formula as defined above. If they do, then these are output as suggestions for the molecular formula.

At the end, the program outputs the formulae computed for all peaks along with subset/superset relations and its suggestions, if any, for the molecular formula.
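The two-stage search for the molecular formula can be sketched as below. The union is taken element by element as the larger of the two counts, which is an interpretation consistent with the example in Section 3; the valence sets and function names are assumptions, and the C(2)F(5) suggestion at m/z 119 is inferred from the unions quoted in Section 3.

```python
TETRAVALENT = {"C", "Si"}
TRIVALENT = {"N", "P"}
MONOVALENT = {"H", "F", "Cl", "Br", "I"}

def rdb(comp):
    x = sum(n for e, n in comp.items() if e in TETRAVALENT)
    y = sum(n for e, n in comp.items() if e in MONOVALENT)
    z = sum(n for e, n in comp.items() if e in TRIVALENT)
    return x - 0.5 * y + 0.5 * z + 1

def is_subset(small, big):
    return all(big.get(e, 0) >= n for e, n in small.items())

def is_molecular_ion(candidate, suggestions):
    """Tests (i) and (ii): integral, non-negative RDB, and every peak with
    suggestions has at least one suggestion that is a subset of the candidate."""
    value = rdb(candidate)
    if value < 0 or value != int(value):
        return False
    return all(any(is_subset(c, candidate) for c in comps)
               for comps in suggestions.values())

def union(c1, c2):
    """Element-wise maximum of two compositions (assumed meaning of 'union')."""
    return {e: max(c1.get(e, 0), c2.get(e, 0)) for e in set(c1) | set(c2)}

def suggest_molecular_formulae(suggestions):
    masses = sorted(suggestions)
    top = masses[-1]
    found = [c for c in suggestions[top] if is_molecular_ion(c, suggestions)]
    if found or len(masses) < 2:
        return found
    # Fall back to unions of the suggestions at the two highest masses.
    unions = [union(a, b) for a in suggestions[top] for b in suggestions[masses[-2]]]
    return [u for u in unions if is_molecular_ion(u, suggestions)]

# Hydrogen-free suggestions at the two highest masses of the Table 3 example.
print(suggest_molecular_formulae({119: [{"C": 2, "F": 5}],
                                  135: [{"C": 2, "Cl": 1, "F": 4}]}))
```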

The program is written as a rule-based expert system. A rule-based architecture allows incremental development and is easily modifiable. For example, more rules can easily be added to eliminate implausible suggestions of elemental compositions.

3. RESULTS AND DISCUSSION

We present four examples here to illustrate the capabilities and limitations of our system. All examples are ‘unknowns’ of McLafferty5 and each of them takes about a couple of minutes to run on a PC-AT compatible machine.

The first example (the spectrum of trans-1,2-dichloroethylene, unknown 3.4 of McLafferty5) has been chosen to show the actual output of the program in the present version (Table 1). The output lists the original mass spectrum along with peaks of zero intensity inserted by the program (see Section 2). For example, by inserting the peak at 28, the system is able to suggest a molecular formula for the fragment corresponding to m/z 26. The output also shows how many peaks are assigned elemental composition. For each of these peaks the formulae suggested are listed. For example, at m/z 26 it finds one formula. The system gives the intensities at 27 and 28, after normalizing the intensity at m/z 26 to 100, along with the allowable error range (recall that we allow an error of 10% or 0.2, whichever is greater, in the nominal intensity). The system lists the lower bound, nominal and upper bound intensity at 27 and 28. Then for each of the formulae suggested, it gives the calculated intensities at 27 and 28 (i.e., the (A + 1) and (A + 2) abundances). Finally, it lists the supporting formulae it has found for other peaks (see Section 2). After listing all the formulae for all the peaks, it gives its suggestions for the molecular formula.

Another feature of the system seen in Table 1 relates to the contribution of the (M - 1) peak. Consider the cluster of mass peaks at 36, 37, 38 for which the suggested molecular formula is H(1)Cl(1). Though the $I_{36}:I_{38}$ ratio suggests one chlorine atom, the high value of $I_{37}$ cannot be explained if we view these peaks in isolation. However, by considering the peak at m/z 35 to be due to a hydrogen loss from m/z 36, its (M + 2) contribution can explain the intensity at m/z 37. From the Table it is clear that this is how the system arrived at the elemental composition. (Incidentally, if the system had been given a mass spectrum containing only the three peaks at m/z 36, 37, 38, it would not have been able to suggest any elemental composition.)

For the rest of the examples, we present the mass spectrum along with formulae suggested for various peaks in the form of compact Tables.

Example 2 (unknown 4.3 of McLafferty5) is a case where the system outputs six possible elemental compositions for the highest mass peak in addition to substructures at one lower mass (Table 2). The program recommends only three formulae, viz., S(1)C(5)H(6), S(1)C(4)N(1)H(4) and S(1)C(3)N(2)H(2), as possible molecular formulae because of the existence of subsets at m/z 39. If the input information to the system had indicated the absence of nitrogen in the compound, then the program would have printed out only one (and the correct) elemental composition. Several clusters within the spectrum (most notably the cluster at m/z 45, which has a large intensity of 21.0) failed to give elemental information due to the absence of an isotopic relationship between the cluster intensities.

Table 1. Typical output of the program. The mass spectrum is of truns-l,2-dichloroethylene (unknown 3.4’). The mass number of the first peak in the last cluster is 100. The intensity threshold is 1.80

Mass number Peak value Error

12 2.70 0.27 13 3.00 0.30 14 0.60 0.20 15 0.00 0.20 24 4.00 0.40 25 15.00 1 S O 26 34.00 3.40 27 0.70 0.20 28 0.00 0.20 31 0.30 0.20 35 7.00 0.70 36 1.90 0.20 37 2.30 0.23 38 0.70 0.20 39 0.00 0.20 47 6.50 0.65 48 5.90 0.59 49 4.20 0.42 50 1.80 0.20 51 0.70 0.20 52 0.00 0.20 59 2.60 0.26 60 24.00 2.40 61 100.00 10.00 62 9.90 0.99 63 32.00 3.20 64 0.70 0.20 65 0.00 0.20 95 1 S O 0.20 96 67.00 6.70 97 2.40 0.24 98 43.00 4.30 99 1 .OO 0.20

100 7.00 , 0.70 101 0.10 0.20 102 0.00 0.20

Number of peaks which produced structures is 4.
The peaks are at mass numbers: 26 36 61 96

No. of structures at 26 is 1
A + 1 (27)     1.47     2.06     2.65
A + 2 (28)     0.00     0.00     0.59
(26) C(2) H(2)   A + 1 abundance = 2.21   A + 2 abundance = 0.01
Supporting structures at:
Mass no. (61) Cl(1) C(2) H(2)
Mass no. (96) Cl(2) C(2) H(2)

No. of structures at 36 is 1
A + 1 (37)   108.95   121.05   133.16
A + 2 (38)    26.32    36.84    47.37
(36) Cl(1) H(1)   A + 1 abundance = 119.74   A + 2 abundance = 32.50
Supporting structures at:
Mass no. (61) Cl(1) C(2) H(2)
Mass no. (96) Cl(2) C(2) H(2)

No. of structures at 61 is 1
A + 1 (62)     8.91     9.90    10.89
A + 2 (63)    28.80    32.00    35.20
(61) Cl(1) C(2) H(2)   A + 1 abundance = 10.00   A + 2 abundance = 32.51
Supporting structures at:
Mass no. (26) C(2) H(2)
Mass no. (36) Cl(1) H(1)
Mass no. (96) Cl(2) C(2) H(2)

No. of structures at 96 is 1
A + 1 (97)     3.22     3.58     3.94
A + 2 (98)    57.76    64.18    70.60
(96) Cl(2) C(2) H(2)   A + 1 abundance = 3.66   A + 2 abundance = 65.01
Supporting structures at:
Mass no. (26) C(2) H(2)
Mass no. (36) Cl(1) H(1)
Mass no. (61) Cl(1) C(2) H(2)

The spectrum contains the molecular ion peak.
The molecular weight is 96

No. of structures at 96 is 1
A + 1 (97)     3.22     3.58     3.94
A + 2 (98)    57.76    64.18    70.60
(96) Cl(2) C(2) H(2)   A + 1 abundance = 3.66   A + 2 abundance = 65.01

No. of subsets = 3
Supporting sets at:
Mass no. (26) IS C(2) H(2)
Mass no. (36) IS Cl(1) H(1)
Mass no. (61) IS Cl(1) C(2) H(2)
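The three numbers printed on each A + 1 and A + 2 line of Table 1 appear to be the lower bound, the observed value and the upper bound of the isotopic peak intensity, expressed as a percentage of the fragment peak. The following Python sketch reproduces them under two assumptions that are inferred from the tabulated values rather than stated in the program description: the error on a peak is 10% of its intensity with a floor of 0.20, and the lower bound is clipped at zero. The function names `peak_error` and `ratio_window` are illustrative only.

```python
def peak_error(intensity, rel=0.10, floor=0.20):
    """Assumed uncertainty of a peak: 10% of its intensity, never below 0.20."""
    return max(rel * intensity, floor)


def ratio_window(i_m, i_iso):
    """Return (low, observed, high) for an isotopic peak, in percent of the fragment peak."""
    err = peak_error(i_iso)
    low = max(i_iso - err, 0.0)   # an intensity cannot be negative
    high = i_iso + err
    return tuple(round(100.0 * x / i_m, 2) for x in (low, i_iso, high))


# Cluster at mass 26 (intensity 34.00): peaks at 27 and 28
print(ratio_window(34.00, 0.70))
print(ratio_window(34.00, 0.00))
# Cluster at mass 96 (intensity 67.00): peak at 98
print(ratio_window(67.00, 43.00))
```

A candidate composition would then be retained when its calculated A + 1 and A + 2 abundances fall inside the corresponding windows; for example, the calculated A + 2 abundance of 65.01 for Cl(2) C(2) H(2) lies within the window printed for mass 98.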

The next two examples correspond to a successful (unknown 3.5 of Ref. 5, Table 3) and an unsuccessful (unknown 3.6 of Ref. 5, Table 4) attempt at obtaining the molecular formula when the molecular ion peak is not present in the spectrum.

We see in Table 3 that the peak at m/z 135 in the last cluster has two possible elemental compositions assigned to it. The first, C(1)Cl(1)N(1)O(1)F(3)H(1), though an odd-electron ion, does not have subsets at all of the lower-mass peaks (it has no subset at m/z 119). A scan of all the peaks with assigned elemental composition in the spectrum indicates the absence of hydrogen in the compound. On this basis, the program removes the elemental composition C(1)Cl(1)N(1)O(1)F(3)H(1) from consideration as the molecular ion. The second assignment, namely C(2)Cl(1)F(4), is not an odd-electron ion (its RDB value being 2 - 5/2 + 1 = 0.5), indicating that it is not the molecular ion. With this information the program tries unions of the suggestions at the two highest masses, namely m/z 119 and 135. One of these unions, C(2)Cl(1)F(5) with an m/z of 154, is an odd-electron ion and has subsets at all the other masses (at m/z 135, 119, 85, 69 and 31), and this is identified by the program as the possible molecular ion of the compound under analysis. (The other union, namely C(2)Cl(1)N(1)O(1)F(5)H(1), is rejected because the system has already noted the absence of hydrogen.)
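The reasoning in this example can be sketched in a few lines of Python. The sketch assumes the usual form of the rings plus double bonds rule (carbon count, minus half the number of hydrogens and halogens, plus half the number of nitrogens, plus one), treats a composition as an odd-electron ion when its RDB value is a non-negative integer, and takes the union of two compositions as the element-wise maximum of their atom counts, which is consistent with the unions quoted above. The helper names and the dictionary representation are illustrative and are not taken from the PASCAL program.

```python
HALOGENS = {"F", "Cl", "Br", "I"}
NOMINAL_MASS = {"C": 12, "H": 1, "N": 14, "O": 16, "F": 19, "Cl": 35}


def rdb(comp):
    """Rings plus double bonds; halogens are counted like hydrogen."""
    monovalent = comp.get("H", 0) + sum(comp.get(x, 0) for x in HALOGENS)
    return comp.get("C", 0) - monovalent / 2 + comp.get("N", 0) / 2 + 1


def is_odd_electron(comp):
    """An integral, non-negative RDB value indicates an odd-electron ion."""
    value = rdb(comp)
    return value >= 0 and value == int(value)


def union(a, b):
    """Element-wise maximum of two compositions."""
    return {el: max(a.get(el, 0), b.get(el, 0)) for el in set(a) | set(b)}


def is_subset(small, big):
    """True if every atom count in `small` is available in `big`."""
    return all(n <= big.get(el, 0) for el, n in small.items())


def nominal_mass(comp):
    return sum(NOMINAL_MASS[el] * n for el, n in comp.items())


# Hydrogen-free assignments from Table 3
fragments = {
    31: {"C": 1, "F": 1},
    69: {"C": 1, "F": 3},
    85: {"Cl": 1, "C": 1, "F": 2},
    119: {"C": 2, "F": 5},
    135: {"Cl": 1, "C": 2, "F": 4},
}

print(rdb(fragments[135]))                      # 0.5, so not an odd-electron ion
candidate = union(fragments[119], fragments[135])
print(nominal_mass(candidate), is_odd_electron(candidate))
print(all(is_subset(f, candidate) for f in fragments.values()))
```

Under these assumptions the union of the assignments at m/z 119 and 135 has a nominal mass of 154, passes the odd-electron test and contains every other assigned fragment as a subset, in agreement with the molecular formula reported in Table 3.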

In the second case (Table 4), the mass peak at m/z 117 fails the odd-electron test (RDB value of C(1)Cl(3) = 0.5), and the program does not identify any alternative molecular formula by taking unions because all such unions result in the same formula. Nevertheless, the program does output elemental compositions for the different fragments, and these would still constitute information useful for the chemist.
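A short Python sketch illustrates why the union step cannot succeed here. It assumes, as in the previous example, that a union is the element-wise maximum of atom counts, and it uses only the two compositions that appear in Table 4; the variable and function names are hypothetical.

```python
from itertools import combinations

# Compositions assigned in Table 4
assigned = {35: {"Cl": 1}, 117: {"C": 1, "Cl": 3}}


def union(a, b):
    """Element-wise maximum of two compositions."""
    return {el: max(a.get(el, 0), b.get(el, 0)) for el in set(a) | set(b)}


def rdb(comp):
    """Rings plus double bonds for compositions of C, H, N and Cl."""
    monovalent = comp.get("H", 0) + comp.get("Cl", 0)
    return comp.get("C", 0) - monovalent / 2 + comp.get("N", 0) / 2 + 1


top = assigned[117]
unions = [union(a, b) for a, b in combinations(assigned.values(), 2)]
new_formulae = [u for u in unions if u != top]

print(rdb(top))        # half-integral, so C(1)Cl(3) is not an odd-electron ion
print(new_formulae)    # no union differs from C(1)Cl(3)
```

Because an element-wise maximum can never contain more chlorine atoms than the richest fragment, no union of the observed fragments reaches four chlorine atoms, and the molecular formula of CCl4 is out of reach of this step.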

Table 2. Mass spectrum and elemental assignment for C5H6S (unknown 4.3 of Ref. 5)

Mass number   Peak intensity   Elemental composition
     15            1.10
     26            3.30
     27            8.00
     28            0.00
     37            4.10
     38            4.90
     39           13.00
     40            0.40
     41            0.00
     45           21.00
     46            0.90
     47            1.90
     48            0.80
     49            3.10
     50            3.60
     51            4.00
     52            1.40
     53            8.70
     54            0.30
     55            0.00
     57            3.90
     58            6.60
     59            5.10
     60            0.70
     61            1.60
     62            1.90
     63            3.00
     64            0.50
     65            2.40
     66            0.00
     68            0.70
     69            6.20
     70            1.80
     71            3.80
     72            0.50
     73            0.00
     81            0.90
     82            1.00
     83            0.50
     84            0.00
     95            1.00
     96            0.60
     97          100.00
     98           56.00

Table 3. Mass spectrum and elemental composition assignment for chloropentafluoroethane (unknown 3.5 of Ref. 5)

Mass number   Peak value   Elemental composition
     31          42.00     C(1)F(1)
     32           0.50
     33           0.00
     35           2.50
     36           0.00
     37           0.70
     43           0.50
     47           1.70
     48           0.00
     49           0.60
     50          11.00
     69          57.00     C(1)F(3)
     70           0.70
     71           0.00
     85         100.00     Cl(1)C(1)F(2)
     86           1.10
     87          33.00
     88           0.40
     89           0.00
    100           2.80
    119          52.00     C(2)F(5)
    120           1.20
    121           0.00
    135          24.00     Cl(1)C(2)F(4)
                           Cl(1)O(1)C(1)N(1)F(3)H(1)
    136           0.50
    137           7.70
    138           0.00

Molecular formula: Cl(1)C(2)F(5)

Table 4. Mass spectrum and elemental composition assignment for CCl4 (unknown 3.6 of Ref. 5)

Mass number   Peak intensity   Elemental composition
     35           41.00        Cl(1)
     36            0.00
     37           13.00
     41            1.20
     42            0.80
     43            0.00
     47           40.00
     48            0.50
     49           13.00
     50            0.00
     70            1.40
     71            0.00
     72            0.90
     82           29.00
     83            0.30
     84           19.00
     85            0.20
     86            2.90
     87            0.00
    117          100.00        Cl(3)C(1)
    118            1.00
    119           96.00
    120            1.00
    121           30.00
    122            0.30
    123            3.10
    124            0.00

Molecular formula: The spectrum does not contain the molecular ion peak. By taking a union of two high-mass peaks it was not possible to find a molecular formula.

4. CONCLUSION In this paper we have presented a computer program that analyses the mass spectrum of a compound and suggests possible molecular formulae for the compound. The system also suggests elemental compositions for isotopically related clusters of peaks in the spectrum.

As seen from the results presented in the previous section, the system performs fairly well when the molecular ion is present in the spectrum. The computations are not limited by the number of atoms in the molecule, and hence the program should be usable for large molecules. The efficiency of the program will be greater if the chemist provides some information about the absence of elements. The non-availability of this information leads not to inaccuracy but to an increase in the number of elemental combinations suggested by the program.
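The effect of excluding an element can be illustrated with a small brute-force enumeration in Python. This is only a sketch: it lists every combination of atoms whose nominal masses add up to a given mass and keeps those with a non-negative RDB value, without applying the isotopic intensity checks that the program also uses. The function names, the restriction to C, H and N, and the RDB threshold of zero are assumptions made for the illustration.

```python
NOMINAL_MASS = {"C": 12, "H": 1, "N": 14}


def candidates(mass, elements):
    """All compositions of the given elements whose nominal masses sum to `mass`."""
    elements = list(elements)

    def extend(idx, remaining):
        if idx == len(elements):
            if remaining == 0:
                yield {}
            return
        el = elements[idx]
        for n in range(remaining // NOMINAL_MASS[el] + 1):
            for rest in extend(idx + 1, remaining - n * NOMINAL_MASS[el]):
                yield {el: n, **rest} if n else rest

    return list(extend(0, mass))


def rdb(comp):
    """Rings plus double bonds for compositions of C, H and N."""
    return comp.get("C", 0) - comp.get("H", 0) / 2 + comp.get("N", 0) / 2 + 1


def plausible(mass, elements):
    """Candidates that survive the (assumed) RDB >= 0 check."""
    return [c for c in candidates(mass, elements) if rdb(c) >= 0]


print(plausible(26, ["C", "H", "N"]))   # nitrogen allowed
print(plausible(26, ["C", "H"]))        # nitrogen declared absent
```

For a nominal mass of 26 the sketch returns two candidates when nitrogen is allowed and only C(2)H(2) when it is excluded, so the missing information enlarges the list of suggestions without making any of them wrong.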

When the molecular ion is absent from the spectrum, the program is compelled to take unions of elemental compositions of the high-mass peaks to suggest molecular formulae. This may become inefficient for large molecules where a large number of combinations may have to be considered.

The strong point of the system is that its inferences are based on general chemical knowledge and the system does not need any database of mass spectra. In our opinion, such systems, which can provide the chemist with useful information by arguing from first principles, will have a large role to play in the future of analytical instruments.

A possible extension to this system would be to equip the program with sufficient knowledge to recognise peaks that result from ion/molecule reactions.

Currently, the system can recognise when peaks in a cluster are not isotopically related. In such cases, with some additional knowledge, the system can be enabled to infer ion/molecule reactions.

Acknowledgments We are grateful to Dr S. Vasudevan for his critical comments which have resulted in enhancing the clarity of this paper. We are also thankful for the help received from Dr Thomas Chacko in style-editing this manuscript.

REFERENCES

1. M. P. Sridhar, A. G. Menon and P. S. Sastry, Rapid Commun. Mass Spectrom. 5, 206 (1991).

2. P. S. Sastry and A. G. Menon, Artificial Intelligence and Expert System Technologies in the Indian Context, Vol. 2, ed. by V. V. S. Sharma et al., Tata McGraw-Hill, New Delhi (1991).

3. K. Kumar, M.E. Thesis, Department of Electrical Engineering, IISc, Bangalore (1991).

4. M. C. Ravindar, M.E. Thesis, Department of Electrical Engineering, IISc, Bangalore (1992).

5. F. W. McLafferty, Interpretation of Mass Spectra, University Science Books, CA (1982).

6. B. G. Buchanan, G. Sutherland and E. A. Feigenbaum, Machine Intelligence, Vol. 4, ed. by B. Meltzer and D. Michie, Edinburgh University Press (1969).

7. B. G. Buchanan and E. A. Feigenbaum, Artificial Intelligence 11, 5 (1978).

8. J. B. Justice and T. L. Isenhour (Eds), Digital Computers in Analytical Chemistry, Parts I and II, Benchmark Papers in Analytical Chemistry, Vol. 3, Hutchinson Press Publishing Co. (1981).