
Small Molecules as Mathematical Partitions

Daniel L. Sweeney*

Pfizer Global R&D, Skokie, Illinois 60077

Small molecules can be represented as modular structures: small numbers of unbreakable cells, of known elemental composition, joined together at cleavable seams. The cells are a mathematical partition of the molecular weight. A systematic process is described here for converting mass spectral data into these simple modular structures; a computer program was then developed and tested using this process. On the basis of this preliminary work, it appears that this partitioning approach may be practicable for many compounds. Examples illustrating some of the limitations encountered with this approach are also presented.

Many aspects of LC/MS (sample preparation, chromatography, quantitation) have been streamlined and automated so that large numbers of samples can be rapidly analyzed. Methods for rapidly identifying unknown compounds from their corresponding mass spectra have also evolved.

The first approach is library or database matching. The combined NIST and Wiley libraries have hundreds of thousands of spectra. Algorithms such as probability based matching (PBM) have been developed to optimize searching these libraries (ref 1). Library matching is especially powerful for electron impact spectra, but recently, much progress has also been made toward obtaining and using libraries of CID data (ref 2).

A second approach is predictive software, such as ACD’s Spectrum Manager (ref 3) and HighChem’s Mass Frontier (ref 4). This predictive software is structure-based, starting with a proposed molecular structure and then assigning fragment ions to the spectrum by applying fragmentation rules to the structure.

Specialized programs, such as SEQUEST (ref 5), are extremely important for identifying proteins and peptides. These programs essentially work from the product ion spectra to chemical structures. SEQUEST also utilizes the database matching approach previously mentioned.

Developing a basic understanding of the fragmentation of peptides and other small organic compounds is an active area of research (refs 6, 7), and consequently, these fundamental studies will influence the development of better software.

Many compounds can be described in a modular format that will account for most of the fragments observed in the product ion spectra. Essentially, a molecule can be represented in the form of unbreakable cells of known elemental composition joined together at cleavable seams. One such compound is xemilofiban. The compound is shown in Figure 1 in two formats; the modular structure is shown below the corresponding molecular structure.

The modular structure in Figure 1 is a convenient way of summarizing product ion mass spectral data. On the basis of this modular structure, fragment ions are viewed as different groups of connected cells. This modular structure of xemilofiban was derived after detailed analysis of its spectrum, the spectra of its analogues, and correlation of spectral features with structural features (i.e., by a “mental” process). This report describes preliminary efforts to derive modular structures directly from product ion spectra, using a computerized mathematical process.

Modular structures very closely resemble the molecular structures. This resemblance would be very helpful for identifying unknown compounds for which background information is very limited (e.g., forensics). In addition, a simple change in the structure of a molecule (e.g., enzymatic oxidation) will often shift the masses of many fragment ions in the product ion mass spectrum. Using modular structures, these shifts can often be attributed to a change in the mass of a single cell, and so one can easily pinpoint the region where the original molecule was altered. In addition, if one could get both high accuracy and sufficiently small cells, only one formula of all the possible formulas for the whole compound would fit the elemental data for all of the cells. This is similar to the theory behind the “basket-in-the-basket” approach (ref 8). Finally, a large amount of mass spectral data is often obtained on a metabolite, degradation product, or impurity because the same sample is analyzed on several types of instruments to yield both accurate mass data and MSn data (ref 9); accurate mass MS/MS data can be easily related to MSn or CID-MS/MS data through the use of these modular structures.

Figure 1. Comparison of the molecular structure and modular structure of xemilofiban.

* Corresponding author. E-mail: [email protected].

(1) McLafferty, F. W.; Zhang, M.-Y.; Stauffer, D. B.; Loh, S. Y. J. Am. Soc. Mass Spectrom. 1998, 9 (1), 92-95.

(2) Hough, J. M.; Haney, C. A.; Voyksner, R. D.; Bereman, R. D. Anal. Chem. 2000, 72 (10), 2265-2270.

(3) http://www.acd.com/.

(4) http://www.highchem.com/mf.htm.

(5) Eng, J. K.; McCormack, A. L.; Yates, J. R. J. Am. Soc. Mass Spectrom. 1994, 5 (11), 976-989.

(6) O’Hair, R. A. J. In Mass Spectrometry in Drug Discovery; Rossi, D. T., Sinz, M. W., Eds.; Marcel Dekker: New York, 2002; Chapter 4.

(7) Cheng, X.; Gao, L.; Buko, A.; Miesbauer, L. Proc. 46th ASMS Conf., Orlando, Florida, 1998, 85.

Anal. Chem. 2003, 75, 5362-5373

Analytical Chemistry, Vol. 75, No. 20, October 15, 2003. 10.1021/ac034446k CCC: $25.00 © 2003 American Chemical Society. Published on Web 09/19/2003

Recent technological developments have made it practicable to generate modular structures from product ion mass spectra. First, the recent proliferation of both quadrupole-time-of-flight and Fourier transform ion cyclotron resonance mass spectrometers has resulted in readily obtainable accurate mass fragmentation data. Second, high-speed desktop computers can now do the very intensive calculations needed.

Three assumptions about the product ion spectra of protonated molecules are being made here. The first assumption is that fragment ions, like protonated molecular ions, are even-electron ions. Indeed, fragment ions, when neutralized by the removal of a proton, are assumed to be molecules. The cells can also be visualized as hypothetical molecules, as shown in Figure 2. (Some neutralized fragment ions, such as C7H6, mass 90, the neutralized benzyl carbocation, would be difficult to visualize as neutral molecules.) This even-electron assumption is usually correct for positively charged product ion spectra obtained under normal collision energies. Exceptions are known (e.g., substituted anilines, such as clenbuterol; ref 10), but even-electron fragment ions predominate in the product ion spectra of most protonated compounds. There are two consequences of this assumption: the nitrogen rule applies to the cells, and the number of rings and double bonds increases by one each time two cells are cleaved.

The second assumption is that no rearrangements occur. Although rearrangements are well-known (refs 11-13), rearrangements are observable only when the chemical structure is known by some other means.

The third assumption is that the simplest solution to a spectrum is the most plausible solution. Like the two previous assumptions, this assumption is not always true (ref 14).

EXPERIMENTAL SECTION

Chemicals. HPLC grade water was obtained from a Millipore Simplicity system. Methanol was obtained from Burdick and Jackson (Catalogue no. UN1230); acetonitrile, from EM Science (Catalogue no. AX0142); and ammonium acetate (Catalogue no. A639-500), from Fisher Scientific. Acetic acid (Ultrex grade, Catalogue no. 6903-05) was purchased from Baker. Leucine enkephalin (L-9133) and sodium acetate trihydrate (S-7670) were obtained from Sigma. Acetamide was purchased from Aldrich (A105-3), and ESI tuning mix (G2421A) was obtained from Agilent. The other compounds are not commercially available, although the structures were well-established by both synthesis and NMR analyses.

Solutions. All compounds were dissolved at 1 µg/µL in mobile phase B (95% methanol, 5% water, 12.5 mM ammonium acetate, 12.5 mM acetic acid), except compound C, which was dissolved in 14% mobile phase B and 86% mobile phase A (5% methanol, 95% water, 12.5 mM ammonium acetate, 12.5 mM acetic acid) at 140 ng/µL. Compounds B and C were used without further dilution. All other compounds were diluted 1:40 with additional mobile phase B to a final concentration of 25 ng/µL.

Lock Spray Solution. Agilent ESI tuning mix was diluted 1:3 with acetonitrile and loaded into a 50-mL ISCO µLC500 syringe pump.

Accurate Mass. All accurate mass spectra were obtained with a Micromass Q-TOF-2 mass spectrometer equipped with orthogonal Z-SPRAY and LockSpray. The lockmass compound used was the 622.0295 compound of the Lock Spray Solution above (ref 15). An Agilent 1100 binary pump (Catalogue no. G1312A) with an Agilent degasser (G1322A) was used to deliver mobile phase B to the mass spectrometer at a flow rate of 100 µL/min. Samples were injected into the mass spectrometer directly using an Agilent 1100 autosampler (Catalogue no. G1329A). A Zorbax SB-Phenyl (3.5 µm, 2.1 × 100 mm) column (Agilent) was placed between the pump and the injector to provide some backpressure to the pump.

The lockmass solution was infused at a flow rate of 10 µL/min using the ISCO µLC500 syringe pump. The instrument was scanned from 50 to 1020 Da in 2 s, with a 123-µs pusher time. W-mode was used in the +ESI mode with a resolution of ∼17 500 (peak width at half-height). Details about the instrument settings and calibration, chosen to maximize accuracy, are found in the Supporting Information.

To obtain accurate mass MS/MS spectra, a collision energy was chosen for which the relative intensity of the parent ion was roughly 10%. Generally, this energy gave a good distribution of high- and low-mass fragment ions. Experimentally, three different collision energies were generally tried simultaneously for each compound. Details of conditions used to obtain the spectra are found in the Supporting Information. Only ions greater than 2% relative intensity were used in the calculations. The relative intensities of these ions were rounded to the nearest integer.

Figure 2. The cells visualized as small molecules.

(8) Wu, Q. Anal. Chem. 1998, 70, 865-872.

(9) Clarke, N. J.; Rindgen, D.; Korfmacher, W. A.; Cox, K. A. Anal. Chem. 2001, 73, 430A-439A.

(10) Willoughby, R.; Sheehan, E.; Mitrovich, S. A Global View of LCMS; Global View Publishing: Pittsburgh, PA, 1998; p 554.

(11) Warrack, B. M.; Hail, M. E.; Triolo, A.; Animati, F.; Seraglia, R.; Traldi, P. J. Am. Soc. Mass Spectrom. 1998, 9, 710-715.

(12) Brull, L. P.; Heerma, W.; Thomas-Oates, J.; Haverkamp, J.; Kovacik, V.; Kovac, P. J. Am. Soc. Mass Spectrom. 1997, 8, 43-49.

(13) Tiller, P. R.; Raab, C.; Hop, C. E. C. A. J. Mass Spectrom. 2001, 36, 344-345.

(14) Hoffman, R.; Minkin, V. I.; Carpenter, B. K. Int. J. Philos. Chem. 1997, 3, 3-28 (www.hyle.org/journal/issues/3/hoffman.htm).

(15) Flanagan. U.S. Patent 5872357; Feb 16, 1999.

Low-Resolution CID-MS/MS. Low-resolution CID-MS/MS spectra were obtained on the same sample solutions using a Micromass Quattro II mass spectrometer equipped with orthogonal Z-SPRAY. Most sample solutions were injected with a cone voltage of 70 V (for CID) and a collision energy of 20 V. The labile compound B was fragmented at a cone voltage of 35 V and a collision energy of 12 V (Supporting Information). Argon at a pressure of 3 × 10⁻³ mbar was used as the collision gas. For the CID-MS/MS work, only two or three of the largest fragment ions that were detected in the accurate mass spectra obtained above were fragmented. Only product ions with the same integral mass as ions observed in the accurate mass spectrum and >10% relative intensity were considered present.

Computations. Accurate masses and predicted isotope ratios were obtained using MassLynx version 3.5 (Waters). All other programs were run on a Dell Dimension 340 with an Intel Pentium 4 clocked at 1.7 GHz. The operating system was Red Hat Linux release 7.1 (Seawolf), operating system version no. 1 Sun Apr 8 20:41:30 EDT 2001, release 2.4.2-2. The C compiler was gcc version 2.96. All programs were written in C.

Computer Parameters Used: minimum_number_of_ions = number of cells + 1; MaxDefect = 25 (equivalent to 2.5 milli-Daltons); least_squares_fit = 5; MinimumCoverage: initially set at 65%, but varied if necessary to increase or decrease the number of solutions.

DESCRIPTION OF THE PROCESS

The overall process is summarized briefly in Figure 3. Compound B, molecular weight 638 and having only five ions in its product ion spectrum, will be used to illustrate each individual step.

(1) Finding Partitions: finding the integral masses of the cells by systematically generating every partition of the integral molecular weight having a given number of cells, summing up every combination of those partitions, and comparing those sums to the integral masses of the neutralized fragment ions, looking for partitions that account for as many fragment ions as possible.

The integral molecular weight is partitioned. A partition of a positive integer is any set of positive integers adding up to that number (ref 16).

The product ion spectrum of protonated compound B is tabulated in Table 1. First, all of the product ions (the protonated molecular ion is included) are neutralized prior to partitioning. Since this is a positive ion spectrum and the electron mass was ignored in the calibration, 1.0078 (the mass of a hydrogen atom) is subtracted from all of the masses.

The molecular weight of compound B is 638. The number 638 has 319 two-cell partitions (i.e., 319 different sets of two positive integers that add up to 638), 33 920 three-cell partitions, 1 811 911 four-cell partitions, and over 58 million five-cell partitions. A two-cell partition can account for a maximum of three ions in a spectrum, whereas a three-cell partition can account for as many as six ions (discussed in more detail later). Since compound B has five ions in its spectrum, the three-cell partition, being the simplest possible solution, is considered first.
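The partition counts quoted above can be reproduced with a short dynamic-programming routine. This is only a sketch: the original programs were written in C, Python is used here for brevity, and the function name `count_partitions` is hypothetical.

```python
def count_partitions(n: int, k: int) -> int:
    """Count the ways to write n as a sum of exactly k positive integers (order ignored)."""
    # table[j][m] = number of partitions of m into exactly j parts
    table = [[0] * (n + 1) for _ in range(k + 1)]
    table[0][0] = 1
    for j in range(1, k + 1):
        for m in range(j, n + 1):
            # Either the smallest part is 1 (drop it), or every part is at least 2
            # (subtract 1 from each of the j parts).
            table[j][m] = table[j - 1][m - 1] + table[j][m - j]
    return table[k][n]


if __name__ == "__main__":
    for cells in (2, 3, 4, 5):
        print(cells, count_partitions(638, cells))
```

The rapid growth with the number of cells is the reason the simplest (fewest-cell) partition is tried first.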

For each of the 33 920 three-cell partitions, there are seven combinations of three cells taken one, two, or three at a time. Each of the seven combinations is summed, and the resultant sum is compared to the masses of the neutralized fragment ions to look for matches. The number of neutralized fragment ions that match a sum is compared to the minimum_number_of_ions criterion; in this case, four ions were chosen. Every partition whose sums can account for at least four ions is saved as a possible solution. The other partitions are all eliminated at this stage.
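The search described above can be sketched as a brute-force loop over all three-cell partitions. This is an illustrative Python stand-in for the C program, not the author's code; `NEUTRAL_IONS` holds the integral neutralized masses from Table 1, and the function names are hypothetical.

```python
from itertools import combinations

# Integral masses of the neutralized ions of compound B (Table 1).
NEUTRAL_IONS = (638, 476, 324, 314, 162)


def subset_sums(cells):
    """Sums of every non-empty combination of cells (seven sums for three cells)."""
    return {sum(group)
            for size in range(1, len(cells) + 1)
            for group in combinations(cells, size)}


def three_cell_candidates(mol_weight, ions, min_ions):
    """Return (partition, matched ion count) for partitions matching at least min_ions ions."""
    hits = []
    for a in range(1, mol_weight // 3 + 1):
        for b in range(a, (mol_weight - a) // 2 + 1):
            cells = (a, b, mol_weight - a - b)   # a <= b <= c by construction
            sums = subset_sums(cells)
            matched = sum(1 for ion in ions if ion in sums)
            if matched >= min_ions:
                hits.append((cells, matched))
    return hits


if __name__ == "__main__":
    for cells, matched in three_cell_candidates(638, NEUTRAL_IONS, 4):
        print(cells, matched)
```

With a threshold of four ions, only the partitions 152 + 162 + 324 and 162 + 162 + 314 survive, and each accounts for all five ions.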

The problem of finding the better partitions can be simplified to some extent. As mentioned previously, the cells can be viewed as hypothetical molecules. For molecules composed of the common elements such as C, H, N, O, S, Cl, and F, there are no simple molecules of masses from 1 to 16 (loss of methane would be unexpected), except H2. In addition, there are no molecules between 21 and 25 made up of these common elements. Limiting the elements in this way reduces the number of three-cell partitions of the number 638 from 33 920 to 27 562.
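The reduction from 33 920 to 27 562 partitions can be checked by filtering the cell masses. The sketch below (Python, with hypothetical function names) assumes that every integral mass from 1 to 16 and from 21 to 25, inclusive, is excluded. Under that reading a cell of mass 2 (H2) is excluded as well; this is an assumption about how the restriction was applied, adopted because it reproduces the quoted count.

```python
def is_allowed_cell_mass(mass: int) -> bool:
    """Assumed restriction: no cells of mass 1-16 or 21-25 (both ranges inclusive)."""
    return not (1 <= mass <= 16 or 21 <= mass <= 25)


def count_three_cell_partitions(mol_weight, allowed=lambda mass: True):
    """Count partitions a <= b <= c of mol_weight whose three cells all pass `allowed`."""
    count = 0
    for a in range(1, mol_weight // 3 + 1):
        for b in range(a, (mol_weight - a) // 2 + 1):
            c = mol_weight - a - b
            if allowed(a) and allowed(b) and allowed(c):
                count += 1
    return count


if __name__ == "__main__":
    print(count_three_cell_partitions(638))
    print(count_three_cell_partitions(638, is_allowed_cell_mass))
```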

(16) Biggs, N. Discrete Mathematics; Clarendon Press: Oxford, 1989.

Figure 3. Brief summary of the overall process showing the seven steps.

Table 1. Accurate Mass Product Ion Spectrum of Protonated Compound B Obtained at a Collision Energy of 12 V, Positive ESI Mode

ion   rel. inten.   mass found   mass neutralized
1     12            639.1859     638.1781
2     18            477.1330     476.1252
3     16            325.1135     324.1057
4     100           315.0807     314.0729
5     5             163.0614     162.0536

In this simple example, there are only five neutralized fragment ions in the spectrum. If the minimum_number_of_ions criterion is set to four ions, it is found that only 2 of the 27 562 partitions can account for four or more ions in the spectrum. Both of these partitions can actually account for all five neutralized fragment ions, using the capital letters A, B, and C to represent individual cells.

The first partition can account for all five ions observed in the spectrum, and it also has two “silent” ions at 153 (B) and 487 (A + C) Da. This partition is summarized below:

(2) Generating Systems of Equations: assigning a system or systems of simultaneous linear equations relating the masses of the cells to the observed neutralized fragment ions.

After a partition accounting for the minimum_number_of_ions is found, the fragments that match a sum of cells can then be written as a system of linear simultaneous equations. (Sums that do not correspond to an observed neutralized fragment ion, 152 and 486 in this example, and neutralized ions that are not assigned to cells are ignored.) In this case, where A = 162, B = 152, and C = 324, the linear simultaneous equations are

In some cases, a particular fragment can be assigned in more than one way. This situation arises when two or more cells are equal or when a cell is the sum of two or more other cells. In these cases, all of the possible ways of accounting for the fragments are tested in the next steps until either all of the sets of multiple assignments are tested or until a solution is found with one set of assignments.

As an example of multiple assignments for the same fragment ions, note that the first partition (A + B + C = 162 + 152 + 324) had a unique assignment for each neutralized fragment ion, whereas the second partition (A + B + C = 162 + 162 + 314) can have two assignments for two of the neutralized fragment ions, 162 (A or B) and 476 (A + C or B + C). The reason for this is that two of the cells are mass 162.

As integers, these two cells of mass 162 are equal; thus, it would appear that this would not make any difference. However, the two cells may have different mass defects. A second consideration is that the two cells may be exactly equal in mass, but not interchangeable spatially. For example, there could be two ammonia moieties in a molecule, and the assignments must be consistent with respect to the spatial configuration of the molecule.

Since there are two neutralized fragment ions in the second partition that have duplicate assignments, four systems of linear equations derived from this partition must be considered in subsequent steps.
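The way duplicate assignments multiply into separate systems of equations can be illustrated with a short enumeration. This is a sketch in Python rather than the original C; the cell labels follow the text, while the function names and the representation of an assignment as a set of cell labels are assumptions made for illustration.

```python
from itertools import combinations, product


def assignments_for(cells, ion_mass):
    """Every group of cell labels whose integral masses sum to ion_mass."""
    labels = sorted(cells)
    return [frozenset(group)
            for size in range(1, len(labels) + 1)
            for group in combinations(labels, size)
            if sum(cells[label] for label in group) == ion_mass]


def equation_systems(cells, ions):
    """One system per way of choosing a single assignment for each matched ion."""
    choices = {ion: assignments_for(cells, ion) for ion in ions}
    matched = [ion for ion in ions if choices[ion]]   # unassigned ions are ignored
    return [dict(zip(matched, picks))
            for picks in product(*(choices[ion] for ion in matched))]


if __name__ == "__main__":
    ions = (162, 314, 324, 476, 638)
    first = {"A": 162, "B": 152, "C": 324}
    second = {"A": 162, "B": 162, "C": 314}
    print(len(equation_systems(first, ions)), len(equation_systems(second, ions)))
```

Each set in a system corresponds to one row of 1/0 coefficients: a cell in the set has coefficient 1, and a cell outside it has coefficient 0.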

(3) Removing “Linked” Systems of Equations: removing any system of equations if two or more cells are always assigned together (“linked”).

If two cells are always found together, those cells are considered “linked” because that data can always be described with fewer cells. An example of linked cells is the four-cell partition for compound B shown below:

Although this four-cell partition above works just as well as the two three-cell partitions and also accounts for all five ions, the 80 and 82 cells (A and D) are always assigned together, and these two cells should be replaced by a single cell of mass 162. These less simple partitions are therefore eliminated.
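One way to detect linked cells is to compare the presence/absence pattern of each pair of cells across all assigned ions. The sketch below is illustrative Python with a hypothetical function name; the four-cell assignments are those of the example in the text (A = 80, B = 152, C = 324, D = 82).

```python
def linked_pairs(system, labels):
    """Pairs of cells that are present together or absent together in every assigned ion."""
    pairs = []
    for index, first in enumerate(labels):
        for second in labels[index + 1:]:
            if all((first in group) == (second in group) for group in system.values()):
                pairs.append((first, second))
    return pairs


# Four-cell partition of compound B: neutralized ion mass -> assigned cells.
FOUR_CELL_SYSTEM = {
    162: {"A", "D"},
    324: {"C"},
    314: {"A", "B", "D"},
    476: {"B", "C"},
    638: {"A", "B", "C", "D"},
}

if __name__ == "__main__":
    print(linked_pairs(FOUR_CELL_SYSTEM, ["A", "B", "C", "D"]))
```

A system that yields any linked pair would be discarded, since the same data can be described with one fewer cell.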

The two three-cell partitions found in step 1:

A + B + C = 162 + 152 + 324

A + B + C = 162 + 162 + 314

Summary of the first partition:

combination        sum   + H+   cell assignment
152                152   153    B
162                162   163    A
324                324   325    C
152 + 162          314   315    A + B
152 + 324          476   477    B + C
162 + 324          486   487    A + C
152 + 162 + 324    638   639    A + B + C

Linear simultaneous equations for the first partition (step 2):

1*A + 0*B + 0*C = 162

1*A + 1*B + 0*C = 314

0*A + 0*B + 1*C = 324

0*A + 1*B + 1*C = 476

1*A + 1*B + 1*C = 638

Four-cell partition with linked cells (step 3):

combination             sum   + H+   assignment
80 + 82                 162   163    A + D
324                     324   325    C
80 + 152 + 82           314   315    A + B + D
152 + 324               476   477    B + C
80 + 152 + 324 + 82     638   639    A + B + C + D

(4) Solving for Mass Defects: generating a system of simultaneous linear equations using the coefficients derived in step 2 above to relate the integerized mass defects of the cells to the integerized mass defects of the corresponding neutralized fragment ions and solving those simultaneous linear equations for the integerized mass defects of the cells.

As mentioned previously, the fragment assignments can be viewed as a system of linear simultaneous equations. Each equation is an assignment of a neutralized ion; the coefficients are all either 1 or 0. Basically, each cell is either present (1) or absent (0) in a neutralized fragment ion. The same set of coefficients must apply to the simultaneous equations relating the mass defects, because each cell represents an elemental composition contributing both an integral mass and a mass defect.

For mathematical convenience, the mass defects of the neutralized ions are integerized by multiplying by 1000, and the mass defects are rounded to the nearest milli-Dalton. The mass defects of the cells are subsequently calculated to the nearest milli-Dalton. In the simple three-cell example, for the partition A + B + C = 162 + 152 + 324, the corresponding system of simultaneous equations relating the integral cell masses to the integral masses of the neutralized ions was shown in step 2 above.

Analogous equations (same coefficients) can be written in terms of the integerized mass defects, where the small letters a, b, and c are the unknown integerized mass defects of each cell, and the sum is the integerized mass defect of the corresponding neutralized fragment ion (in milli-Daltons).

Three equations are required to solve for three variables. [The minimum_number_of_ions criterion must therefore be greater than or equal to the number of cells.] However, a multistage Monte Carlo optimization can be used to find the mass defects of the cells (a, b, and c) using all five equations and, thus, use all of the data points (ref 17). In this case, a two-stage Monte Carlo algorithm was written to minimize the sum of the squares of the differences between the calculated defects and the actual defects (least_squares_fit). In many cases, the extra equations will reveal a contradiction (revealed in the form of a high least_squares_fit) that will rule out a partition or a set of assignments. In addition, the calculated mass defects of the cells should be more accurate than the fragment ion mass defects, since the cells are “weighed” in groups rather than one at a time (ref 18) and because there are usually more equations than variables. (In the C programs, the mass defects of the cells are calculated to the nearest milli-Dalton; the mass defects could be calculated to the nearest tenth of a milli-Dalton at the cost of additional calculation time.)

In this case, there is a solution (a = 54, b = 19, and c = 106). Thus, the exact masses of the cells are 162.054, 152.019, and 324.106. In the case of the other partition, A + B + C = 162 + 162 + 314, where there were four possible ways of assigning the fragments as a result of two identical cells, all four of the solutions are essentially identical (a = 53, b = 53, c = 72), and thus, cells A and B apparently are identical. The exact masses of the cells, calculated by the program, are 162.053, 162.053, and 314.072.
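The overdetermined system for the first partition can also be solved directly. The sketch below is not the two-stage Monte Carlo algorithm described in the text: it swaps in an exact least-squares solution of the normal equations (plain Gaussian elimination on fractions) and then rounds to the nearest milli-Dalton. The integerized defects 54, 73, 106, 125, and 178 come from the neutralized masses in Table 1; the function names are hypothetical.

```python
from fractions import Fraction

# Rows: neutralized ions 162, 314, 324, 476, 638; columns: cells A, B, C.
COEFFS = [(1, 0, 0), (1, 1, 0), (0, 0, 1), (0, 1, 1), (1, 1, 1)]
DEFECTS = [54, 73, 106, 125, 178]   # integerized mass defects, milli-Daltons


def least_squares(coeffs, defects):
    """Solve the normal equations (M^T M) x = M^T d exactly by Gauss-Jordan elimination."""
    n = len(coeffs[0])
    aug = []
    for i in range(n):
        row = [Fraction(sum(r[i] * r[j] for r in coeffs)) for j in range(n)]
        row.append(Fraction(sum(r[i] * d for r, d in zip(coeffs, defects))))
        aug.append(row)
    for col in range(n):
        # Raises StopIteration if the system is singular (e.g., linked cells).
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        aug[col] = [value / aug[col][col] for value in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [row[n] for row in aug]


def squared_error(coeffs, defects, cell_defects):
    """Sum of squared differences between calculated and observed ion defects."""
    return sum((sum(c * x for c, x in zip(row, cell_defects)) - d) ** 2
               for row, d in zip(coeffs, defects))


if __name__ == "__main__":
    exact = least_squares(COEFFS, DEFECTS)
    rounded = [round(x) for x in exact]
    print(rounded, squared_error(COEFFS, DEFECTS, rounded))
```

The rounded result agrees with the solution quoted above (a = 54, b = 19, c = 106). If least_squares_fit acts as an upper limit on this sum of squares, which is an assumption here, the residual of 1 would be well within the setting of 5.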

(5) Checking MS3 or CID-MS/MS Data: removing any system of equations having third (or higher)-generation product ion spectra inconsistent with the assigned equations.

Logically, the cells of each product ion must be a subset of the cells of its parent ions. Three product ions of compound B were generated by source CID fragmentation and then further fragmented with MS/MS. The 477 fragment ion gave a 315 product ion; the 325 fragment ion gave a 163 product ion, but the 315 fragment ion did not give the 163 product ion (the only ion smaller than 315 in the MS/MS spectrum).

The first partition was A + B + C = 162 + 152 + 324. The 476 is assigned as 0*A + 1*B + 1*C; the 314 is assigned as 1*A + 1*B + 0*C. Thus B + C fragments into A + B; this is contradictory. This first partition must be eliminated from further consideration on the basis of this contradiction.

The second partition was A + B + C = 162 + 162 + 314, where cells A and B were found to be equal, since their mass defects were essentially the same. The 476 is assigned as 0*A + 1*B + 1*C or 1*A + 0*B + 1*C; the 314 is assigned as 0*A + 0*B + 1*C. B + C or A + C fragments into C, which is valid. The 324 is assigned as 1*A + 1*B + 0*C; the 162 is assigned as 0*A + 1*B + 0*C or 1*A + 0*B + 0*C. Thus, A + B breaks up into A or B, which is also consistent with the data.
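The subset test described above reduces to a one-line check per observed transition. The following Python sketch uses neutralized integral masses (476 → 314 and 324 → 162 for the observed 477 → 315 and 325 → 163 transitions); the function and variable names are hypothetical.

```python
def consistent_with_msn(system, transitions):
    """True if, for every parent -> product transition, the product's cells are a subset of the parent's."""
    return all(system[product] <= system[parent] for parent, product in transitions)


# Observed transitions of compound B, written as neutralized integral masses.
TRANSITIONS = [(476, 314), (324, 162)]

# First partition: A = 162, B = 152, C = 324.
FIRST_SYSTEM = {162: {"A"}, 314: {"A", "B"}, 324: {"C"}, 476: {"B", "C"}, 638: {"A", "B", "C"}}

# Second partition: A = 162, B = 162, C = 314 (one of its four assignment systems).
SECOND_SYSTEM = {162: {"A"}, 314: {"C"}, 324: {"A", "B"}, 476: {"A", "C"}, 638: {"A", "B", "C"}}

if __name__ == "__main__":
    print(consistent_with_msn(FIRST_SYSTEM, TRANSITIONS))
    print(consistent_with_msn(SECOND_SYSTEM, TRANSITIONS))
```

The first system fails because {A, B} is not contained in {B, C}; the second passes both transitions.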

In this simple example, all 5 ions in the spectrum were assigned. However, because of multiple fragmentation pathways or the presence of odd electron ions, it is seldom that all of the fragment ions in a spectrum are assigned. If there were 11 total ions and 6 ions were assigned, a solution that accounts for the molecular ion and the 5 largest fragment ions is probably better than a solution that accounts for the molecular ion and the 5 smallest fragment ions.

The concept of coverage was developed to give more weight to the larger fragments. The square root of the relative intensity of each fragment ion, rounded down to the nearest integer, is calculated and saved. The protonated molecular ion is given a value of 0. The sum of these square roots over all fragment ions is the total coverage. The coverage value of each fragment ion is its percent of the total coverage. For example, the relative intensity of the 325 ion in compound B is 16 (Table 1); its square root is 4 out of a total of 20, so its coverage is 20%.
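The coverage calculation can be written out for the Table 1 spectrum. This is a minimal Python sketch with a hypothetical function name; the relative intensities are those of compound B.

```python
from math import isqrt

# Protonated ion m/z -> relative intensity (Table 1).
REL_INTENSITY = {639: 12, 477: 18, 325: 16, 315: 100, 163: 5}


def coverage(rel_intensities, molecular_ion):
    """Percent coverage per ion: floor(sqrt(intensity)), with the molecular ion weighted 0."""
    weights = {ion: 0 if ion == molecular_ion else isqrt(intensity)
               for ion, intensity in rel_intensities.items()}
    total = sum(weights.values())
    return {ion: 100 * weight / total for ion, weight in weights.items()}


if __name__ == "__main__":
    for ion, percent in coverage(REL_INTENSITY, molecular_ion=639).items():
        print(ion, percent)
```

Here the weights are 4, 4, 10, and 2 for the 477, 325, 315, and 163 ions, giving a total of 20.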

In general, the better solutions have the highest values of coverage. Other factors to compare are the least_squares_fit, the difference between the calculated and experimentally determined neutralized fragment ion masses, and the number of ions that were assigned.

(6) Finding Configuration: finding configurations that are consistent with the fragment assignments by checking every permutation of the coefficients of the equations against truth tables for each configuration.

On the basis of the previous assumption of no rearrangements, no two cells can be attached to each other in a fragment ion if the cell or cells connecting the pair are not present. As a result, it is possible to make a connection table or “truth table” for every possible combination of cells that make up a modular structure. The fragment combination will either be “true” (1) if the fragment is made up of connected cells, or “false” (0) if the fragment is made up of cells that are missing the connecting cells. For example, every combination of cells is true for the three-cell modular structure except for the combination of the two end cells. The modular structures and connection tables for the three- and four-cell configurations are shown in Figure 4.

As shown in Figure 4, the cells of a three-cell modular structure can only be arranged in one way. Two configurations are needed to describe a four-cell modular structure. As the number of cells increases, the arrangements become more complex. Three configurations are required to describe all of the five-cell modular structures, as shown in Figure 5. (The connection table for five-cell modular structures is in the Supporting Information.)

The designations 1, 2, 3, 4, ... are positions in the modular structures where the cells (designated A, B, C, D, ...) are sequentially tested in a systematic way. For example, there are (3!) 6 ways to arrange three cells, and (5!) 120 ways to arrange the cells in a five-cell modular structure.

(17) Conley, W. Computer Optimization Techniques; Petrocelli Books: Princeton, NJ, 1984; pp 250.

(18) Sloane, N. J. A. In Fourier, Hadamard, and Hilbert Transforms in Chemistry; Marshall, A. G., Ed.; Plenum Press: New York, NY, 1982; pp 562.

1*a + 0*b + 0*c = 54 (e.g., the mass defect of the 162 ion was 0.0536)

1*a + 1*b + 0*c = 73

0*a + 0*b + 1*c = 106

0*a + 1*b + 1*c = 125

1*a + 1*b + 1*c = 178

5366 Analytical Chemistry, Vol. 75, No. 20, October 15, 2003


For compound B, the only partition remaining is A + B + C = 162 + 162 + 314. There were four possible sets of assignments, but the multistage Monte Carlo solutions to the linear equations were essentially the same for all four. One system of the equations is arbitrarily chosen to check possible permutations.

The coefficients of the equations can be written as a 1-0 matrix. The six permutations are then essentially obtained by shuffling the columns. Each color represents a particular cell (A = red; B = magenta; C = green) and the numbers 1, 2, and 3 represent the cell positions on the configurations. This is shown in Figure 6 for compound B.

ABC and its rotational duplicate CBA can be ruled out, because both have a fragment assigned as 101 (here, the 476 fragment ion). This fragment, as shown in Figure 4, is not compatible with the three-cell configuration. ACB, like its rotational duplicate BCA, also has a fragment assigned as 101 (here, the 324 fragment ion). The only permutation without a 101 fragment assigned is CAB and its duplicate BAC. Thus, this modular structure has cell A in the middle. (A and B are both the same mass in this case, but are not the same cell.)
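The elimination of permutations for compound B can be reproduced with a short sketch. The fragment assignments are those given for compound B (A = 162, C = 314, AB = 324, AC = 476, ABC = 638); the function names and the use of a set of forbidden patterns are assumptions made for illustration, not the author's implementation.

```python
from itertools import permutations

# Fragment assignments for compound B (which cells make up each fragment).
fragments = {162: "A", 314: "C", 324: "AB", 476: "AC", 638: "ABC"}

# Patterns over positions 1-2-3 that are not connected in the three-cell chain.
FORBIDDEN = {"101"}

def pattern(fragment_cells, order):
    """1-0 pattern of a fragment when the cells occupy positions in the given order."""
    return "".join("1" if cell in fragment_cells else "0" for cell in order)

def allowed_permutations(fragments, cells="ABC"):
    """Return the cell orders in which no assigned fragment has a forbidden pattern."""
    good = []
    for order in permutations(cells):
        patterns = [pattern(f, order) for f in fragments.values()]
        if not any(p in FORBIDDEN for p in patterns):
            good.append("".join(order))
    return good

result = allowed_permutations(fragments)
print(result)  # ['BAC', 'CAB']
```

Only BAC and CAB survive, which places cell A in the middle, in agreement with the text.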

(7) Distributing the Elements: for a given molecular formula, assigning elemental compositions to the cells in such a way that the maximum difference between the calculated mass of each cell and the theoretical mass of the elemental composition of each cell is less than the MaxDefect.

Now elemental compositions are assigned to each cell. For a given molecular formula, assigning elements to one cell will limit the choice of elements available to the other cells. The current software tests only one possible formula for the whole compound at a time. The maximum difference between the calculated mass of the cell and the theoretical mass of the elemental composition of the cell is the MaxDefect parameter, which is in units of tenths of milli-Daltons.

The nitrogen rule (19) is also applied to the cells. If the mass of the cell is odd, the number of nitrogens is forced to be odd. Conversely, if the mass of the cell is even, the number of nitrogens in the cell is forced to be even. The application of the nitrogen rule is based on the assumption, previously stated, that the fragment ions are even-electron species.
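A minimal sketch of the two screening tests described above (the MaxDefect window and the nitrogen rule) is given below. The function names are hypothetical, and the monoisotopic masses are standard values rather than the constants used in the author's program, so defects computed this way can differ by a few tenths of a milli-Dalton from the calcdefect values printed in the tables.

```python
# Standard monoisotopic masses (assumed; the program's own constants are not given).
MONO = {"C": 12.0, "H": 1.00782503, "N": 14.0030740, "O": 15.9949146, "S": 31.9720707}

def exact_mass(formula):
    """Sum of monoisotopic masses for a composition such as {"C": 6, "H": 10, "O": 5}."""
    return sum(MONO[element] * count for element, count in formula.items())

def theoretical_defect(formula):
    """Mass defect of a composition, in tenths of milli-Daltons."""
    exact = exact_mass(formula)
    return round((exact - round(exact)) * 10000)

def passes_nitrogen_rule(nominal_mass, formula):
    """Odd nominal mass requires an odd nitrogen count; even mass an even count."""
    return nominal_mass % 2 == formula.get("N", 0) % 2

def cell_is_acceptable(nominal_mass, fitted_defect, formula, max_defect=25):
    """Apply the nominal-mass, MaxDefect, and nitrogen-rule tests to one cell."""
    nominal_ok = round(exact_mass(formula)) == nominal_mass
    defect_ok = abs(fitted_defect - theoretical_defect(formula)) < max_defect
    return nominal_ok and defect_ok and passes_nitrogen_rule(nominal_mass, formula)

# Cell C of compound B: mass 314, fitted defect 720 (tenths of mDa).
cell_c = {"C": 16, "H": 14, "N": 2, "O": 3, "S": 1}
print(theoretical_defect(cell_c), cell_is_acceptable(314, 720, cell_c))
```

With MaxDefect set to 25 (a 2.5-mDa window), the composition C16H14N2O3S passes for the 314 cell, whereas a composition with an odd number of nitrogens would be rejected for any even-mass cell.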

RESULTS AND DISCUSSION

Compound B. The results for compound B from the C program for a three-cell partition, which was used as the detailed example, are shown in Table 2. Only one solution was found using the parameters in the table. This printout will be used as an example of the output of the C programs.

The top section of Table 2 lists the inputs, the parameters for MinimumCoverage, MaxDefect, minimum_number_of_ions, and the least_squares_fit criterion, the accurate mass data (after subtraction of the mass of a hydrogen), and the elemental formula being tested (the correct formula was used except where noted).

The middle section shows the solution or solutions suggested by the C program. The first part of each solution is information about the cells: the elemental compositions, the integral masses, the calculated defects from the Monte Carlo optimization (defect), and the calculated defects based on the elemental composition (calcdefect). All mass defects are in tenths of milli-Daltons. [The cells, in the modular structures, are designated by the following colors: dark blue, cell E; light blue, cell D; green, cell C; magenta, cell B; and red, cell A.] The fragment assignments are summarized below the cell data. The cellular composition is followed by the integral mass. The last three columns are again mass defects: the LSdefects (the sum of the defects of the cells assigned to the fragment, as calculated from the Monte Carlo optimization), the calculated defects (CalcDefect) from the elemental compositions, and the actual experimentally measured mass defects (MeasDefect) of the neutralized fragment ions. The units are tenths of milli-Daltons. Next the “best permutations” are listed, CAB and its rotational equivalent BAC. This compound is unusual because there is only one solution possible. Additional examples will make it evident that, more commonly, there are multiple solutions and multiple configurations for each solution.

(19) McLafferty, F. W. Interpretation of Mass Spectra, 3rd ed.; University Science Books: Mill Valley, CA, 1980; p 303.

Figure 4. Three- and four-cell configurations and connection tables.

Figure 5. Three configurations, designated W, X, and Y, are needed to describe five-cell partitions.

1*A + 0*B + 0*C = 162

0*A + 0*B + 1*C = 314

1*A + 1*B + 0*C = 324

1*A + 0*B + 1*C = 476

1*A + 1*B + 1*C = 638

Figure 6. Permutations of the assigned fragments of compound B.
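The relationship between the cell defects and the LSdefect column can be illustrated with the values printed in Table 2 for compound B. The sketch below simply sums the fitted cell defects for each fragment and compares the result with the measured defect; the variable names are assumptions made for this example.

```python
# Cell defects from the Monte Carlo optimization (Table 2, tenths of mDa).
cell_defect = {"A": 530, "B": 530, "C": 720}

# Fragment compositions with the measured defects of the neutralized ions.
fragments = [("A", 540), ("C", 730), ("AB", 1060), ("AC", 1250), ("ABC", 1780)]

def ls_defect(composition):
    """LSdefect of a fragment: the sum of the fitted defects of its cells."""
    return sum(cell_defect[cell] for cell in composition)

rows = [(comp, ls_defect(comp), meas, ls_defect(comp) - meas) for comp, meas in fragments]
for comp, ls, meas, residual in rows:
    print(f"{comp:>3}  LSdefect={ls:5d}  MeasDefect={meas:5d}  residual={residual:+d}")
```

The sums reproduce the LSdefect column of Table 2 (530, 720, 1060, 1250, 1780), and the residuals against the measured defects are at most 1 mDa.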

The last section summarizes the statistics on the computations, for example, the total number of partitions of 638 into three cells (27 562) and how many partitions were rejected because of a conflict with the CID-MS/MS data (1). The modular structure (permutation BAC) listed in Table 2 is diagrammed in Figure 7, together with the partial molecular structure of the compound.

Because two of the cells are equal, one might expect four solutions here, since it was previously shown that the fragments of this particular partition could be assigned in four different ways. However, when multiple fragment assignments generate multiple systems of equations, the program tests each system of equations sequentially until either a solution is found or all systems of equations have been tested.

By dividing a molecule into smaller cells, there is a much greater possibility that each cell will have a unique elemental composition, and thus, only one overall composition will work. This same principle is behind Wu’s “basket-in-a-basket” approach. It was tested here on compound B by generating all of the possible formulas having C(24-32), H(0-100), N(0-10), O(0-20), and S(0-2) for compound B, within 3 mDa of the experimental value, 639.1859. (The carbon was set at 24 to 32 to be at least close to matching the intensity of the first isotope.) Seven possible formulas were found meeting these criteria. All seven formulas were input, one at a time, using the three-cell program. All but one gave one solution; that one gave two solutions. Eight total solutions were found. Changing the MaxDefect from 25 to 10 narrowed the mass accuracy windows on the cell masses to about 1 mDa; now only two of the possible formulas gave one answer each. The results are very consistent with Wu’s hypothesis that accurately determining the masses of smaller pieces of a compound should limit the elemental composition of the whole compound (8).
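The formula search described above can be sketched as a brute-force enumeration over the stated element ranges. This is an illustration only: the function name is hypothetical, standard monoisotopic masses are assumed, the electron mass is neglected, and no chemical validity rules are applied, so the list it returns need not match the seven formulas reported here.

```python
# Standard monoisotopic masses (assumed values).
MONO = {"C": 12.0, "H": 1.00782503, "N": 14.0030740, "O": 15.9949146, "S": 31.9720707}

def candidate_formulas(target, tol=0.003):
    """List (C, H, N, O, S) counts whose summed atomic masses lie within tol of target."""
    hits = []
    for c in range(24, 33):              # C(24-32)
        for n in range(0, 11):           # N(0-10)
            for o in range(0, 21):       # O(0-20)
                for s in range(0, 3):    # S(0-2)
                    heavy = (c * MONO["C"] + n * MONO["N"]
                             + o * MONO["O"] + s * MONO["S"])
                    # At most one hydrogen count can fit, because tol is far
                    # smaller than the mass of one hydrogen atom.
                    h = round((target - heavy) / MONO["H"])
                    if 0 <= h <= 100 and abs(heavy + h * MONO["H"] - target) <= tol:
                        hits.append((c, h, n, o, s))
    return hits

hits = candidate_formulas(639.1859)
for c, h, n, o, s in hits:
    print(f"C{c}H{h}N{n}O{o}S{s}")
```

Each candidate would then be submitted to the cell-partitioning program one at a time, as was done for the seven formulas in the text.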

However, a 1-mDa window on the cell masses, although it did work well for this compound, is probably too narrow at this point; the program is presently doing a two-stage Monte Carlo optimization, and the mass defects of the cells are being calculated only to the nearest milli-Dalton. Except for this experiment, a 2.5-mDa window was used to generate all of the results reported in this study. The Q-TOF-2 instrument would appear to be capable of supporting a 1-mDa window for the cell mass defects if the Monte Carlo optimization were revised. In the case of compound B, perhaps because of the presence of a sulfur atom, the isotope ratios of the protonated molecular ion would actually be as useful as the narrower cell mass window in eliminating formulas for the whole compound. The isotope ratios were graphically generated for the seven possible formulas and compared to the isotopic ratios found experimentally. Only the formulas C28H35N2O13S1 and C24H31N8O11S1 appeared to have matching isotopic ratios (Supporting Information). The combination of the narrower cell window and the matching isotope ratios gives a unique (and correct) elemental composition for this 638 Dalton compound.

Table 2. Results for Compound B

elemental composition used: C28H34N2O13S1Cl0F0
data file: CompoundB.dat
minimum hits required: 4
least square error per hit: 5
max mass error accepted: 25
min coverage, %: 65
compd’s mol wt: 638; tot no. ions: 5

accurate mass   integral mass   defect (mDa)
162.0536        162             54
314.0729        314             73
324.1057        324             106
476.1252        476             125
638.1781        638             178

Solution no. 1; Linear Fit 2; Coverage 100

          C    H    N    O    S    Cl   F    mass   defect   calcdefect
cell C    16   14   2    3    1    0    0    314    720      722
cell B    6    10   0    5    0    0    0    162    530      525
cell A    6    10   0    5    0    0    0    162    530      525

fragment   composition   mass   LSdefect   CalcDefect   MeasDefect
1          A             162    530        525          540
2          C             314    720        722          730
3          AB            324    1060       1050         1060
4          AC            476    1250       1247         1250
5          ABC           638    1780       1772         1780

best permutation: CAB
best permutation: BAC

tot. partitions of 3 cells: 27 562
partitions accounting for less than 4 fragments: 27 560
no. partitions with required no. of fragments: 2
no. of linked cells rejected: 0
no. of partitions failing least squares criterion: 0
no. of partitions not matching any configuration: 0
no. of partitions with contradictory CID-MS/MS data: 1
no. of partitions failing MinimumCoverage test: 0
no. of partitions not fitting the elemental data: 0
no. of partitions duplicated by multiple assignments: 0
no. of partitions having multiple elemental compositions: 0

Figure 7. Modular assignment for compound B, which is similar to the molecular structure.

Xemilofiban. The modular structure of xemilofiban is shown in Figure 1, derived on the basis of knowledge of the fragmentation of this compound. Its spectral data were analyzed using the five-cell partition program. A large number of partitions were generated (1 506 841) and tested; only four solutions were found. The coverage values were 78, 78, 90, and 78%, respectively. The solution having 90% coverage (solution no. 3) is shown in Table 3. The configurations for solution no. 3 are a rotationally equivalent pair. Basically, this is the same as the six-cell modular structure in Figure 1 but where the black ammonia cell is combined with the light blue cell. Two of the other solutions, solution no. 1 and solution no. 2, gave similar results but divided the molecule into cells differently, as might be expected. Their modular structures are compared to the molecular structure in Figure 8. In solution no. 1, the ethanol cell has been incorporated into the alkyne, and an ammonia cell has been taken out of the alkyne. Solution no. 2 is similar to solution no. 1, but the ammonia of the amidine has been included with the aromatic amine, and the ethanol has been separated out again.

Solution nos. 1 and 2 have ambiguity with respect to spatial orientation of the cells. Solution no. 1 works whether the C7H8O2 or the NH3 is on the end. Solution no. 2 works with either the Y or W configuration, likewise with ambiguous placement of groups on the right side as drawn.

Figure 8. Two other xemilofiban solutions found. Left: solution no. 1. Right: solution no. 2.

Table 3. Computer Output for Solution No. 3, Xemilofiban

linear fit 3, coverage 90

          C    H    N    O    S    Cl   F    mass   Lsdef   calcdef
cell E    7    6    2    0    0    0    0    118    530     530
cell D    5    5    1    1    0    0    0    95     370     370
cell C    4    2    0    2    0    0    0    82     60      54
cell B    2    6    0    1    0    0    0    46     420     417
cell A    0    3    1    0    0    0    0    17     260     265

fragment   composition   mass   LSdefect   CalcDefect   MeasDefect
1          E             118    530        530          520
2          D             95     370        370          370
3          CE            200    590        584          590
4          CD            177    430        424          430
5          BD            141    790        787          790
6          BCD           223    850        841          850
7          AE            135    790        795          800
8          ACE           217    850        849          860
9          ABCDE         358    1640       1636         1640
10         -             124    0          0            0
11         -             175    0          0            0
12         -             216    0          0            0

configuration used, W; best permutation, AECDB
configuration used, W; best permutation, BDCEA

Solution no. 4 is quite different. Here, it appears that the 17, 78, and 46 cells of Figure 1 have been replaced with 93 (C6H7N1) and 48 (C1H4O2) cells, allowing an assignment of the 175 ion. Solution no. 4 appears to be an unlikely solution; a C1H4O2 cell would be very unusual from a chemistry standpoint. However, solution no. 4 does appear to fit the fragmentation data as well as the other solutions.

The statistics on xemilofiban indicate that most partitions are selected out. For example, xemilofiban had 1 506 841 partitions since its mass was 358. Out of these, 12 545 partitions that accounted for six or more fragments were transposed into systems of equations, and 1132 more systems of equations were generated as a result of multiple assignments, giving a total of 13 677 systems of linear simultaneous equations that were tested further. The MinimumCoverage test (set at 65%) removed 9743 of these; another 1940 failed to fit any configuration; 926 failed the least_squares_fit criterion; 851 had linked cells; 124 were not compatible with the molecular formula; and 89 had contradictory CID-MS/MS data. That left the four solutions that were found by the program.

Figure 9. Top: structure of compound C. Bottom: a solution/configuration for compound C shown overlaid on the accurate mass product ion spectrum of protonated compound C at a collision energy of 25 V, positive ESI mode.

Figure 10. Top: compound D. Bottom: highest coverage solution (X configuration illustrated) overlaid on the accurate mass product ion spectrum of protonated compound D at a collision energy of 25 V, positive ESI mode.
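The screening statistics quoted for xemilofiban can be checked with simple arithmetic. The sketch below uses only the counts given in the text; the dictionary keys are descriptive labels chosen for this example.

```python
# Systems of equations entering the screening steps (counts from the text).
from_partitions = 12_545        # partitions accounting for six or more fragments
from_multiple_assignments = 1_132

# Systems removed at each screening step.
rejected = {
    "MinimumCoverage test": 9_743,
    "no matching configuration": 1_940,
    "least_squares_fit criterion": 926,
    "linked cells": 851,
    "incompatible with molecular formula": 124,
    "contradictory CID-MS/MS data": 89,
}

total_systems = from_partitions + from_multiple_assignments
remaining = total_systems - sum(rejected.values())
print(total_systems, remaining)  # 13677 4
```

The rejections account for all but four of the 13 677 systems, matching the four solutions reported.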

Compound C, a Symmetrical Compound. Symmetry in a compound was not a problem for the program. Like compound B, compound C had one solution. Although the compound had only one solution in terms of cell masses and compositions, there were 12 possible permutations that included all three configurations. The annotated spectrum is shown in Figure 9, showing the known structure, the correct W configuration, and correct permutation BECDA. Three of these configuration/permutations had two saturated cells (ammonia-ammonia) that were adjacent, and these three were eliminated (to connect two cells together always requires the loss of one ring or double bond). All nine possible configurations (including BECDA) are shown in the Supporting Information. Generating plausible alternative modular structures and configurations is one of the objectives of the program.

Compound D, an X Configuration Compound with Multiple Cells Equal. A solution for compound D with the highest coverage (X configuration permutation with E as the middle cell) is shown in Figure 10. There were a total of four solutions. Compound D is definitely an X-type compound; however, the solutions all indicate that all configurations can be shown to fit the data, perhaps because three of the cells are identical here.

Three Examples of Unanticipated Problems Observed when Analyzing Six- and Seven-Cell Compounds with the Five-Cell Program: Compound A, Leucine Enkephalin, and Orbofiban. Xemilofiban is a six-cell organic compound (Figure 1); as previously noted, analyzing xemilofiban with the five-cell program gave three solutions that basically combine cells in different ways. It was expected that most six-, seven-, and eight-cell compounds could be analyzed with the five-cell program, at the tradeoff of lower coverage solutions. However, if a compound has two or more equal cells and the total number of cells in the compound exceeds the cells in the program, problems can occur.

Compound A is similar to xemilofiban but it has an acetyl group (mass 43) in place of the alkyne (mass 25) group. The five-cell program was run, and 11 solutions were found. Three solutions appeared to be reasonable answers: solution nos. 2, 3, and 6. Solution nos. 5 and 6 had the highest coverage at 76%. Solution no. 6, with the Y configuration, is shown in Figure 11, and it is a very acceptable solution.

It was expected that one solution would be identical to solution no. 3 (Table 3) for xemilofiban, but with one cell increased in mass by 18 mass units. However, that expected solution was not found. A program that can trace a partition was run, and it was found that this expected solution did not fit any configuration, so it was eliminated. The NH3 cell was used with the 82-mass cell to account for the 100-Da fragment ion in the spectrum, and the same NH3 cell was also being used with the aromatic amine (118 cell) to account for the 136-Da fragment ion in the spectrum. As noted, there are two NH3 cells possible: xemilofiban and compound A are actually six- and seven-cell compounds, respectively. However, a solution having a single NH3 cell cannot use the same NH3 in two different places in the modular structure, so this solution for compound A was eliminated. In the case of xemilofiban, the 100-Da fragment ion was under the 2% relative intensity limit, so this contradiction was not observed (the 100-Da fragment ion has 5% relative intensity in compound A).

Leucine enkephalin has the protonated molecular ion and 23 fragment ions in its product ion spectrum. The maximum number of ions that can be assigned with five cells is 20 ions, and therefore, the MinimumCoverage parameter was set to 50%. The five-cell computer program assigned 10 ions (solution no. 4, Supporting Information), and the cells were calculated at the predicted masses, but the expected W configuration was not found. A single CO (mass 28) cell was used in two locations, like the ammonia had been in compound A. However, in compound A, no configuration fit those assignments, so that system of equations was eliminated. In this case, the correct W configuration was eliminated, and a less plausible Y configuration was chosen that allowed the assignment of 10 ions.

Figure 11. Top: compound A. Bottom: a solution found for compound A by the C program overlaid on the accurate mass product ion spectrum of protonated compound A at a collision energy of 25 V, positive ESI mode.

A third example is orbofiban, which is a six-cell compound. The best solution for orbofiban had 87% coverage and assigned 8 out of 11 ions (solution no. 6). However, this solution was not a W configuration modular structure, as expected for this linear compound, but an X or Y configuration. In orbofiban, there are two equal cells of mass 71, C3H5NO. To properly assign the same 8 ions using the W configuration, a six-cell modular structure would have to be used.

The object of partitioning is to find all of the most plausible solutions. Currently, once a set of neutralized fragment ion assignments is made by the C program, those assignments as a group will either pass or fail the subsequent screening steps. Contradictory assignments can occur when six- and seven-cell molecules that have two or more identical cells are treated mathematically as five-cell modular structures. The earlier examples (compounds B, C, and D) each had two or more identical cells. Configuration contradictions were not observed because compound B is a three-cell compound that was analyzed with a three-cell program, and compounds C and D are five-cell compounds that were analyzed with the five-cell program.

Compound E. Previous solutions that have been discussed have cells that are arithmetic differences between two fragments or between a fragment and a proton. For example, for compound C (Figure 9), the cells are 17, 118, and 82. There are pairs of fragment ions in the spectrum of compound C that have those differences (353-336; 336-218; 218-136). It would appear to be simpler and faster to use arithmetic differences as possible cells instead of partitioning and trying all possible integers as cells.
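The idea of using arithmetic differences as candidate cells can be sketched as follows, using only the four compound C ions quoted above. The function name is hypothetical, and this is an illustration of the idea rather than a step of the author's program.

```python
from itertools import combinations

def pairwise_differences(ions):
    """Map each positive difference between two ion masses to the pairs that give it."""
    diffs = {}
    for high, low in combinations(sorted(ions, reverse=True), 2):
        diffs.setdefault(high - low, []).append((high, low))
    return diffs

# Ions of compound C mentioned in the text.
ions = [353, 336, 218, 136]
diffs = pairwise_differences(ions)
for cell in (17, 118, 82):
    print(cell, diffs.get(cell))
```

Each of the three cell masses appears as the difference of exactly one pair of the listed ions.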

The cells are always differences between fragments or between fragments and a proton, but not always simple differences. Compound E is an example of a compound in which one of the cells (cell C) is not a simple difference between two fragment ions in its spectrum. The assignments for the best solution for compound E (solution no. 1) are shown in Table 4 (details in the Supporting Information).

In this case, cell C can be calculated as: ion 1 + ion 5 - ion 6 = (C + D) + (A + B + C + E) - (A + B + C + D + E) = C. Therefore, cell C is a sum/difference of three ions.
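The calculation above can be worked through numerically with the integral masses of the assigned ions in Table 4. The sketch below derives each cell mass from sums and differences of ions and then confirms that the cell masses reproduce every assignment. It considers only the six assigned ions of Table 4, not the complete spectrum of compound E.

```python
# Neutralized ion masses and assigned compositions (Table 4, compound E).
ion = {1: 141, 2: 200, 3: 169, 4: 135, 5: 217, 6: 304}
composition = {1: "CD", 2: "BCE", 3: "BCD", 4: "AE", 5: "ABCE", 6: "ABCDE"}

cell = {}
cell["C"] = ion[1] + ion[5] - ion[6]   # (C+D) + (A+B+C+E) - (A+B+C+D+E)
cell["A"] = ion[5] - ion[2]            # (A+B+C+E) - (B+C+E)
cell["B"] = ion[3] - ion[1]            # (B+C+D) - (C+D)
cell["E"] = ion[4] - cell["A"]         # (A+E) - A
cell["D"] = ion[1] - cell["C"]         # (C+D) - C

# Every assigned fragment must equal the sum of its cells.
for number, cells in composition.items():
    assert sum(cell[c] for c in cells) == ion[number]

# "Simple" candidates: an ion itself, or the difference between two ions.
masses = list(ion.values())
simple = set(masses) | {abs(a - b) for a in masses for b in masses if a != b}
print(cell, cell["C"] in simple)
```

Cell C comes out at mass 54, which is neither one of the assigned ion masses nor a difference between any two of them, whereas the other four cells (17, 28, 87, and 118) follow from at most two simple differences.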

Analysis reveals that three-cell modular structures have cells in which the cell masses are always differences between two fragments or between a fragment and a proton. But this is the simple case. For example, at five cells, with the minimum_number_of_ions set at 6, there are cell masses that potentially are sum/differences of as many as four fragment ions and a proton. This greatly increases the number of sums/differences possible. For example, using the 12 neutralized ions in the product ion mass spectrum of xemilofiban (Table 3), it is possible to generate every integer between 1 and 358 (the molecular weight of xemilofiban) using sums/differences of 1, 2, 3, and 4 ions. (This analysis is shown in the Supporting Information.) With the exception of three-cell modular structures, rather than generating all of the possible sums/differences, storing these possible cell values in an array, and then using this array to generate partitions, it is computationally faster to just generate and test all possible partitions of the molecular weight.

CONCLUSIONS

Computer programs were developed to generate modular structures of three, four, and five cells from mass spectral data and tested on molecules with well-known and straightforward fragmentation pathways. These computer programs were often successful in assigning fragments and in finding the exact masses and elemental compositions of the cells. The programs were also able to easily handle a symmetrical compound and a compound having multiple cells of the same mass. Dividing a molecule into smaller cells was also found to effectively limit the elemental composition of the whole compound.

This approach was not usually successful in determining a unique configuration of the cells in the modular structure by checking all possible permutations against all possible configurations; multiple configurations were usually found for each solution.

Presently, the most common cause of problems is analyzing six- and seven-cell compounds, having at least two identical cells, with a five-cell program. For compound A, this led to contradictory fragment assignments and deletion of one expected five-cell solution. This could be remedied with a seven-cell program. For two other examples, leucine enkephalin and orbofiban, the same situation led to incorrect configurations that use fewer cells to obtain the same coverage that a six-cell solution could achieve in theory. This could be remedied with the development of a six-cell program; however, these are two examples for which the simpler solutions (five-cell versus six-cell solutions) are not the correct solutions, and one would tend to favor simpler solutions in the case of unknown compounds.

This preliminary study indicates that partitioning may be very useful for identification, especially in those situations in which background information is minimal. However, many improvements are needed: programming for six-, seven-, and eight-cell partitions; an additional stage for the Monte Carlo optimization to more precisely calculate the mass defects of the cells; a module to eliminate elemental compositions of the whole molecule based on the isotope ratios of the molecular ion; a module to check for rotational equivalence; and a module to apply some simple chemical rules (e.g., two saturated cells cannot be attached to one another). Much additional work will be needed to determine whether this approach could be applied to more complex spectra (e.g., fragmentation via multiple pathways).

Table 4. Assignments for Compound E Showing That Cell C Is Not a Simple Difference

solution no. 1

fragment   composition   mass   LSdefect   CalcDefect   MeasDefect
1          CD            141    790        787          790
2          BCE           200    580        584          590
3          BCD           169    730        736          740
4          AE            135    800        795          800
5          ABCE          217    860        849          850
6          ABCDE         304    1530       1531         1530

model used, W; best permutation, AEBCD
model used, Y; best permutation, DCEBA
model used, Y; best permutation, BCEDA
model used, W; best permutation, DCBEA

ACKNOWLEDGMENT

I thank Dr. Jeremy Hribar (Senior Research Advisor, Pharmacia/Pfizer) for helpful discussions over many years on using mass spectrometry to identify compounds. I thank Ms. Kerry Brown, Dr. Hans Westenburg, and Dr. Rick Rhinebarger (all of Pharmacia/Pfizer) for their critical reviews of this manuscript. I thank John Hoyes and Iian Lloyd of Waters/Micromass for helpful tips on the tuning parameters needed for obtaining excellent accurate mass MS/MS data on small molecules.

SUPPORTING INFORMATION AVAILABLE

1, Tuning parameters for the Q-TOF-2 mass spectrometer; 2, conditions used to obtain the spectra; 3, configurations; 4, cells as sum/differences of neutralized fragment ions; 5, compound B: inputting formulas/isotope ratios; 6, xemilofiban; 7, compound A; 8, compound B; 9, compound C; 10, compound D; 11, leucine enkephalin; 12, orbofiban; and 13, compound E. This material is available free of charge via the Internet at http://pubs.acs.org.

Received for review April 29, 2003. Accepted August 1, 2003.

AC034446K
