
Direct MALDI analysis of naturally cleaved human saliva samples: Mapping to a series of KPQ-terminated peptides from small salivary proteins.

TP10 #XXX
Kenneth C. Parker¹; Na Tian²; Frank Oppenheim²; Eva Helmerhorst². ¹SimulTof Corporation, Sudbury, MA; ²Boston University School of Dental Medicine, Boston, MA

Introduction

One of the most easily collected human biofluids is saliva. The dominant intact proteins in saliva are usually alpha-amylase, immunoglobulin A, and lysozyme, but saliva also commonly contains naturally processed peptides in the 600 to 10,000 m/z range that can be monitored directly by MALDI-MS. Previous experiments have established that many of these peptides derive from seven additional proteins that are highly expressed in saliva, including basic salivary proline-rich proteins 1-4 (PRB1-PRB4), salivary acidic proline-rich phosphoprotein (PRPC) and histatin-3 (His3). Some of the responsible proteases apparently derive from commensal bacteria, such as Rothia species, which often cleave proteins C-terminal to the tripeptide sequence KPQ (Helmerhorst et al., 2008).

Conclusions:
• Many peptides in saliva supernatants derive from series of staggered peptides with shared N- or C-termini from histatins or proline-rich proteins (PRPs).
• Presumably, these derive from a combination of endopeptidases and exopeptidases.
• For PRPs, a preferred endocleavage motif is KPQ/GPP.
• Depending on subtle collection parameters, different series are most prominent.
• Can tentatively identify many peptides by high-mass-accuracy mapping to:
  1.) a list of previously identified salivary peptides
  2.) series of peptides with shared N-termini or C-termini.
• Some of these identifications have been confirmed by MS/MS.
• Additional identifications are in progress.
• Identification of peptides in series is complicated by repeats, which produce multiple series whose members have identical amino acid composition.
• PCA separates samples into sets dominated either by PRPC or by His3.
• Staggered PMF may be generally useful for studying many biofluids.
• Can qualify a dental hygienist according to the pattern of peptides after cleaning.

References:
1.) Helmerhorst et al. Identification of KPQ as a Novel Cleavage Site Specificity of Saliva-associated Proteases. JBC 2008;283:19957-19966.
2.) Parker KC. Scoring Methods in MALDI Peptide Mass Fingerprinting: ChemScore and the ChemApplex Program. JASMS 2002;13:22-39.

Methods

1. Collect whole saliva or parotid secretion from 88+ human subjects (BU) or lab personnel (Sudbury).
2. Spin; keep the supernatant.
3. Dilute into HCCA MALDI matrix; spot in duplicate.
4. Collect MALDI reflectron MS spectra (14.8 m flight tube).
5. Map to:
  - a list of 338 identified peptides
  - series of staggered peptides (staggered PMF) from 13 small salivary proteins.
6. Prepare a 1 amu mass matrix from the top 40 masses of 179 spectra from 88 patients, keeping masses found >=4 times (252 masses).
7. Normalize; perform PCA.
8. Collect selected MS/MS spectra.
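The mass-matrix and normalization steps above can be sketched in plain Python. This is a minimal illustration, not the code used for the poster: the peak-list format (one list of (m/z, intensity) pairs per spectrum), the function name `build_mass_matrix`, rounding to the nearest integer for the 1 amu bins, and normalization of each spectrum to unit total intensity are all assumptions, because the poster does not specify them.

```python
from collections import Counter


def build_mass_matrix(spectra, top_n=40, min_count=4):
    """Bin the top_n peaks of each spectrum to 1 amu and keep recurring bins.

    spectra: one list of (mz, intensity) pairs per spectrum.
    Returns (bins, rows): the sorted nominal masses kept, and one row per
    spectrum with intensities normalized to unit total over those bins.
    """
    binned = []
    for peaks in spectra:
        # Keep only the top_n most intense peaks of this spectrum.
        top = sorted(peaks, key=lambda p: p[1], reverse=True)[:top_n]
        row = {}
        for mz, intensity in top:
            b = int(mz + 0.5)  # assumed 1 amu binning: nearest integer m/z
            row[b] = row.get(b, 0.0) + intensity
        binned.append(row)
    # Number of spectra in which each bin appears among the top peaks.
    counts = Counter(b for row in binned for b in row)
    bins = sorted(b for b, c in counts.items() if c >= min_count)
    rows = []
    for row in binned:
        vec = [row.get(b, 0.0) for b in bins]
        total = sum(vec)
        # Assumed normalization: scale each spectrum to unit total intensity.
        rows.append([v / total if total else 0.0 for v in vec])
    return bins, rows


# Hypothetical toy peak lists (three spectra).
spectra = [
    [(1287.6, 10.0), (1335.7, 5.0), (990.5, 1.0)],
    [(1287.6, 4.0), (990.5, 4.0), (1335.7, 0.5)],
    [(4369.2, 9.0), (990.5, 1.0)],
]
bins, rows = build_mass_matrix(spectra, top_n=2, min_count=2)
print(bins)  # [991, 1288]
print(rows)  # [[0.0, 1.0], [0.5, 0.5], [1.0, 0.0]]
```

With the toy data and `top_n=2`, `min_count=2`, only the bins that recur in at least two spectra (991 and 1288) survive; the real analysis used 40 and 4.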

Fig. 1. Software engineer’s Saliva.

Staggered PMF
1. Get the sequence of each salivary protein.
2. For every possible N-terminus, make the series of truncated peptides that share it; do the same for every possible C-terminus (each peptide ends up in two series).
3. Define each series of related peptides as a protein-like entity for PMF.
4. Increase the ChemScore of peptides 2x for a C-terminal Q and for an N-terminal G.
5. Use ordinary PMF logic to identify the series that are most prominent (based on Parker (2002)).
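The series construction can be illustrated with a short Python sketch. It enumerates every peptide of a protein, files each one under the series sharing its N-terminus and the series sharing its C-terminus, and computes monoisotopic [M+H]+ values from standard residue masses. The series labels (`N<start>`, `C<end>`), the length limits, and `prior_weight` (a simple multiplier for N-terminal G and C-terminal Q) are hypothetical; the actual ChemScore calculation (Parker, 2002) is not reproduced here.

```python
# Monoisotopic residue masses (Da) of the 20 standard amino acids.
RESIDUE = dict(G=57.02146, A=71.03711, S=87.03203, P=97.05276, V=99.06841,
               T=101.04768, C=103.00919, L=113.08406, I=113.08406, N=114.04293,
               D=115.02694, Q=128.05858, K=128.09496, E=129.04259, M=131.04049,
               H=137.05891, F=147.06841, R=156.10111, Y=163.06333, W=186.07931)
WATER = 18.01056   # added once per peptide (free termini)
PROTON = 1.00728   # for the singly protonated [M+H]+ ion


def mh(seq):
    """Monoisotopic [M+H]+ of an unmodified peptide (uppercase one-letter code)."""
    return sum(RESIDUE[a] for a in seq) + WATER + PROTON


def staggered_series(protein, min_len=4, max_len=40):
    """File every peptide under two series: shared N-terminus and shared C-terminus.

    Hypothetical labels: "N<i>" = peptides starting at residue i,
    "C<j>" = peptides ending at residue j (both 1-based).
    """
    series = {}
    n = len(protein)
    for i in range(n):
        for j in range(i + min_len, min(n, i + max_len) + 1):
            pep = protein[i:j]
            series.setdefault(f"N{i + 1}", []).append(pep)
            series.setdefault(f"C{j}", []).append(pep)
    return series


def prior_weight(pep):
    """Hypothetical stand-in for the 2x boosts: N-terminal G and C-terminal Q."""
    weight = 1.0
    if pep.startswith("G"):
        weight *= 2
    if pep.endswith("Q"):
        weight *= 2
    return weight


# First 24 residues of mature His3, as spelled out in the Fig. 1 tables.
HIS3_24 = "DSHAKRHHGYKRKFHEKHHSHRGY"
series = staggered_series(HIS3_24, min_len=8)
for pep in series["N1"][:4]:
    print(f"{mh(pep):.1f} {pep}")
# 987.5 DSHAKRHH / 1044.5 DSHAKRHHG / 1207.6 DSHAKRHHGY / 1335.7 DSHAKRHHGYK
```

The printed values reproduce the first rows of the shared-N-terminus table in Fig. 1, which is a quick check that the Mass column is the monoisotopic [M+H]+.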

Fig. 1. Example truncation series from histatin 3 (His3)

aa Mass < Sequence > mb
1 987.5 _ DSHAKRHH GYK 987
1 1044.5 _ DSHAKRHHG YKR 1044
1 1207.6 _ DSHAKRHHGY KRK 1207
1 1335.7 _ DSHAKRHHGYK RKF 1335
1 1491.8 _ DSHAKRHHGYKR KFH 1491
1 1619.9 _ DSHAKRHHGYKRK FHE 1619
1 1766.9 _ DSHAKRHHGYKRKF HEK 1766
1 1904.0 _ DSHAKRHHGYKRKFH EKH 1903
1 2033.0 _ DSHAKRHHGYKRKFHE KHH 2032
1 2161.1 _ DSHAKRHHGYKRKFHEK HHS 2160
1 2298.2 _ DSHAKRHHGYKRKFHEKH HSH 2297
1 2435.2 _ DSHAKRHHGYKRKFHEKHH SHR 2434
1 2522.3 _ DSHAKRHHGYKRKFHEKHHS HRG 2521
1 2659.3 _ DSHAKRHHGYKRKFHEKHHSH RGY 2658
1 2815.4 _ DSHAKRHHGYKRKFHEKHHSHR GYR 2814
1 2872.5 _ DSHAKRHHGYKRKFHEKHHSHRG YRS 2871
1 3035.5 _ DSHAKRHHGYKRKFHEKHHSHRGY RSN 3034

Shared mature N-terminus

aa Mass < Sequence > mb
1 3035.5 _ DSHAKRHHGYKRKFHEKHHSHRGY RSN 3034
2 2920.5 D SHAKRHHGYKRKFHEKHHSHRGY RSN 2919
3 2833.5 DS HAKRHHGYKRKFHEKHHSHRGY RSN 2832
4 2696.4 DSH AKRHHGYKRKFHEKHHSHRGY RSN 2695
5 2625.4 SHA KRHHGYKRKFHEKHHSHRGY RSN 2624
6 2497.3 HAK RHHGYKRKFHEKHHSHRGY RSN 2496
7 2341.2 AKR HHGYKRKFHEKHHSHRGY RSN 2340
8 2204.1 KRH HGYKRKFHEKHHSHRGY RSN 2203
9 2067.1 RHH GYKRKFHEKHHSHRGY RSN 2066
10 2010.0 HHG YKRKFHEKHHSHRGY RSN 2009
11 1847.0 HGY KRKFHEKHHSHRGY RSN 1846
12 1718.9 GYK RKFHEKHHSHRGY RSN 1718
13 1562.8 YKR KFHEKHHSHRGY RSN 1562
14 1434.7 KRK FHEKHHSHRGY RSN 1434
15 1287.6 RKF HEKHHSHRGY RSN 1287
16 1150.6 KFH EKHHSHRGY RSN 1150
17 1021.5 FHE KHHSHRGY RSN 1021

Shared C-terminus at aa 24

I Symb Series Leng #Pep #Obs #Obs_i Same Score %IM ppw
1 His3 51 80 18 18 12 1014002 32.8 1.3
2 PRPC 166 104 6 6 4 9687 10.4 0.3
all 24 16

1 His3 C13 16 8 8 5 661766 15.7 1.4
2 His3 N1 25 8 8 5 412745 16.6 1.4
3 His3 N15 12 2 3 2 16369 1.3 0.5
4 His3 N7 20 4 5 0 10867 1.1 2.5
5 PRPC N132 10 2 2 2 5024 0.4 2.4
6 PRPC N107 36 2 2 2 4702 9.9 0.2
all 26 16

Rank MassExp ppm < Sequence > ChS
19 2625.4 -5.4 SHA KRHHGYKRKFHEKHHSHRGY RSN 20
17 2341.2 -4.5 AKR HHGYKRKFHEKHHSHRGY RSN 20
21 1847.0 0.5 HGY KRKFHEKHHSHRGY RSN 20
11 1718.9 -1.5 GYK RKFHEKHHSHRGY RSN 20
9 1562.8 0.3 YKR KFHEKHHSHRGY RSN 20
10 1434.7 -1.2 KRK FHEKHHSHRGY RSN 20
3 1287.6 1.3 RKF HEKHHSHRGY RSN 20
5 1150.6 0.9 KFH EKHHSHRGY RSN 20

25 987.5 -1.6 _ DSHAKRHH GYK 40
4 1207.6 -1.2 _ DSHAKRHHGY KRK 20
2 1335.7 -0.9 _ DSHAKRHHGYK RKF 20
7 1491.8 1.6 _ DSHAKRHHGYKR KFH 20

107 1619.8 -9.2 _ DSHAKRHHGYKRK FHE 20
76 1766.9 6.3 _ DSHAKRHHGYKRKF HEK 20
115 2522.3 -4.9 _ DSHAKRHHGYKRKFHEKHHS HRG 20

51 990.5 -3.1 KPQ GPPPQGGRPQ GPP 320
71 1866.9 -1.3 KPQ GPPPQGGRPQGPPQGQSPQ _ 160

42 1403.7 -9.0 KSR SARSPPGKPQGPPQ QEG 40
1 2185.1 -6.3 KSR SARSPPGKPQGPPQQEGNKPQ GPP 80

94 1731.9 1.8 KPQ GPPQQGGHPPPPQGRPQ GPP 320
28 2490.2 5.4 KPQ GPPQQGGHPPPPQGRPQGPPQQGGH PRP 80

26 1067.5 -1.3 RKF HEKHHSHR GYR 40
3 1287.6 1.3 RKF HEKHHSHRGY RSN 20
15 1443.7 0.0 RKF HEKHHSHRGYR SNY 20

14 925.5 -3.1 AKR HHGYKRK FHE 20
125 1603.8 8.4 AKR HHGYKRKFHEKH HSH 20
108 1965.0 0.2 AKR HHGYKRKFHEKHHSH RGY 20
40 2121.1 -0.7 AKR HHGYKRKFHEKHHSHR GYR 40
17 2341.2 -4.5 AKR HHGYKRKFHEKHHSHRGY RSN 20

128 2065.0 3.9 NKS qSARSPPGKPQGPPPQGGNQP QG 20
43 2193.1 4.6 NKS qSARSPPGKPQGPPPQGGNQPQ G 80

123 1864.0 7.2 PPP qEGNKSRSARSPPGKPQG PPQ 20
24 2186.1 -5.4 PPP qEGNKSRSARSPPGKPQGPPQ QEG 40

11 1718.9 -1.5 YKR KFHEKHHSHRGYR SNY 20
15 1443.7 0.0 RKF HEKHHSHRGYR SNY 20
49 1306.7 0.1 KFH EKHHSHRGYR SNY 20
64 1049.5 -4.3 HEK HHSHRGYR SNY 20

78 1102.5 3.1 _ DSHEKRHHG YRR 20
33 1421.7 -7.1 _ DSHEKRHHGYR RKF 20

70 972.6 4.8 HGY KRKFHEK HHS 20
81 1109.6 -6.8 HGY KRKFHEKH HSH 20
21 1847.0 0.5 HGY KRKFHEKHHSHRGY RSN 20
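To illustrate how observed masses relate to series in tables like these, here is a hedged sketch of the matching step: each observed [M+H]+ value is compared with every member of every N- and C-series within a ppm tolerance, and series are ranked by how many peaks they explain. The 10 ppm tolerance, the plain hit-count ranking (in place of the poster's ChemScore-based PMF scoring) and all function names are assumptions. Lowercase residues such as q are not handled.

```python
RESIDUE = dict(G=57.02146, A=71.03711, S=87.03203, P=97.05276, V=99.06841,
               T=101.04768, C=103.00919, L=113.08406, I=113.08406, N=114.04293,
               D=115.02694, Q=128.05858, K=128.09496, E=129.04259, M=131.04049,
               H=137.05891, F=147.06841, R=156.10111, Y=163.06333, W=186.07931)
WATER, PROTON = 18.01056, 1.00728


def mh(seq):
    """Monoisotopic [M+H]+ of an unmodified peptide."""
    return sum(RESIDUE[a] for a in seq) + WATER + PROTON


def ppm_error(observed, theoretical):
    """Mass error in parts per million: (observed - theoretical) / theoretical * 1e6."""
    return (observed - theoretical) / theoretical * 1e6


def rank_series(protein, observed, tol_ppm=10.0, min_len=4):
    """Count observed [M+H]+ values explained by each N- or C-terminal series.

    Returns (label, hits) pairs, most hits first; each hit is
    (observed mass, peptide, ppm error rounded to 0.1).
    """
    hits = {}
    n = len(protein)
    for i in range(n):
        for j in range(i + min_len, n + 1):
            pep = protein[i:j]
            calc = mh(pep)
            for obs in observed:
                err = ppm_error(obs, calc)
                if abs(err) <= tol_ppm:
                    hit = (obs, pep, round(err, 1))
                    hits.setdefault(f"N{i + 1}", []).append(hit)
                    hits.setdefault(f"C{j}", []).append(hit)
    # Hit count is a crude stand-in for the PMF score.
    return sorted(hits.items(), key=lambda kv: len(kv[1]), reverse=True)


HIS3_24 = "DSHAKRHHGYKRKFHEKHHSHRGY"
ranked = rank_series(HIS3_24, [987.49, 1335.66, 1021.51])
for label, found in ranked[:3]:
    print(label, found)
```

Here the series starting at residue 1 explains two of the three peaks and ranks first. The sketch also shows why assignments can be ambiguous: 987.49 additionally matches EKHHSHRG (residues 16 to 23), because exchanging A+D for G+E leaves the mass unchanged (both pairs sum to 186.064 Da).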

aa Mass < Sequence >
75 1731.8684 QGK PQGPPQQGGHPPPPQGR PQG
76 1731.8684 GKP QGPPQQGGHPPPPQGRP QGP
77 1731.8684 RPQ GPPQQGGHPPPPQGRPQ GPP
78 1731.8684 PQG PPQQGGHPPPPQGRPQG PPQ
79 1731.8684 QGP PQQGGHPPPPQGRPQGP PQQ
80 1731.8684 GPP QQGGHPPPPQGRPQGPP QQG
81 1731.8684 PPQ QGGHPPPPQGRPQGPPQ QGG
82 1731.8684 PQQ GGHPPPPQGRPQGPPQQ GGH
83 1731.8684 QQG GHPPPPQGRPQGPPQQG GHP
84 1731.8684 QGG HPPPPQGRPQGPPQQGG HPR
85 1731.8684 GGH PPPPQGRPQGPPQQGGH PRP
86 1731.8684 GHP PPPQGRPQGPPQQGGHP RPP

Complication of truncation-series informatics: repeat sequences
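The repeat problem can be checked directly by grouping all windows of one length by amino acid composition; windows with identical composition necessarily have identical mass. The sketch below uses only the local PRPC stretch that can be reassembled from the flanking residues in the table above, and assumes that the aa column gives the 1-based start residue, as it does in the His3 tables. The names `STRETCH`, `FIRST_AA` and `isobaric_windows` are hypothetical.

```python
from collections import Counter, defaultdict

# Local PRPC stretch reassembled from the flanking residues in the repeat table.
STRETCH = "QGKPQGPPQQGGHPPPPQGRPQGPPQQGGHPRPP"
FIRST_AA = 72  # assumed residue number of STRETCH[0] (row 75 starts at STRETCH[3])


def isobaric_windows(seq, length, first_aa=1):
    """Group the start residues of all windows of `length` by amino acid composition."""
    groups = defaultdict(list)
    for i in range(len(seq) - length + 1):
        composition = tuple(sorted(Counter(seq[i:i + length]).items()))
        groups[composition].append(first_aa + i)
    return groups


groups = isobaric_windows(STRETCH, 17, FIRST_AA)
largest = max(groups.values(), key=len)
print(largest)  # [75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86]
```

The twelve windows found are exactly the twelve rows of the table, all at 1731.8684: because this stretch repeats with a period of 17 residues, a single observed peak is consistent with twelve different start positions.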

[Fig. 3 panels: PCA plot: Sample Space; PCA plot: Mass Space; axis label: PC3]

Fig. 3. PCA plots. The intensities of 282 masses found in the top 40 in at least 4 samples were normalized and submitted to PCA. In the mass-space plot, masses are colored according to the stagger series to which they can be mapped. Samples in which His3 stagger series are prominent map to the center of the PCA plot. Samples on the far right have a prominent 4369 peak from the intact PRPC C-terminal fragment. Samples on the far left are dominated by fragments that map to PRPC stagger series.
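The poster does not state how PCA was computed. As a dependency-free illustration, the sketch below mean-centers a normalized samples-by-masses matrix and finds the first principal component by power iteration on the covariance matrix. The function name and the toy two-bin data are hypothetical; later components such as PC3 would need deflation or a full SVD (for example `numpy.linalg.svd` or scikit-learn's `PCA`).

```python
import math
import random


def first_principal_component(rows, iters=500, seed=0):
    """Return (loadings, scores) for PC1 of a samples-by-masses matrix.

    Power iteration on the sample covariance matrix; rows are samples.
    """
    n, m = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(m)]
    centered = [[r[j] - means[j] for j in range(m)] for r in rows]
    cov = [[sum(x[a] * x[b] for x in centered) / (n - 1) for b in range(m)]
           for a in range(m)]
    rng = random.Random(seed)
    v = [rng.random() + 0.1 for _ in range(m)]  # arbitrary nonzero start vector
    for _ in range(iters):
        w = [sum(cov[a][b] * v[b] for b in range(m)) for a in range(m)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:  # no variance along the current direction
            return [0.0] * m, [0.0] * n
        v = [x / norm for x in w]
    # Project each centered sample onto the PC1 direction.
    scores = [sum(x[j] * v[j] for j in range(m)) for x in centered]
    return v, scores


# Toy normalized intensities for two mass bins: the first two samples are
# dominated by bin 1, the last two by bin 2 (hypothetical data).
rows = [[0.8, 0.2], [0.7, 0.3], [0.2, 0.8], [0.3, 0.7]]
loadings, scores = first_principal_component(rows)
print([round(s, 3) for s in scores])
```

The overall sign of a principal component is arbitrary. What matters is that the two groups of samples land on opposite sides of zero, in the same way that His3-dominated and PRPC-dominated samples separate in Fig. 3.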

[Fig. 2 panel labels: KZip 33, K, 6, 56, 51, 16, 104, 161, 167, 155]

Fig. 2. Saliva from the Helmerhorst lab (top 8) or from me.

rank series Mass ppm Seq Symb
1 1 1866.9 4.5 GPPPQGGRPQGPPQGQSPQ PRPC
2 3 1471.7 0.8 GRPQGPPQQGGHQQ PRPC
3 2 2916.5 -1.9 GPPPPPPGKPQGPPPQGGR PRPC
4 4 1731.9 -2.6 GPPQQGGHPPPPQGRPQ PRPC
5 - 1680.9 - GPPRPPQGGRPSRPPQ PRB1
6 3 2521.3 -2.0 GRPQGPPQQGGHQQGPPP PRPC
7 7 2078.1 7.5 GPPPPGKPQGPPPQGDKSRS PRB1
8 2 2040.1 -1.9 GPPPPPPGKPQGPPPQGGR PRPC
9 17 1767.9 1.5 SPPGKPQGPPPQGGNQPQ PRB2
10 4 1224.6 0.1 GGHPPPPQGRPQ PRPC

rank series Mass ppm Seq Symb
1 4 1471.7 0.0 GRPQGPPQQGGHQQ PRPC
2 1 1866.9 2.4 GPPPQGGRPQGPPQGQSPQ PRPC
3 2 1224.6 1.5 GGHPPPPQGRPQ PRPC
4 1 990.5 -1.2 GPPPQGGRPQ PRPC
5 2 1731.9 0.8 GPPQQGGHPPPPQGRPQ PRPC
6 - 2131.1
7 6 1222.6 -0.7 GPPPQGDKSRSP PRB1
8 - 1135.6
9 - 2179.1
10 5 1315.7 -0.6 GPGRIPPPPPAPY SMR3B

rank series Mass ppm Seq Symb
1 4 1471.7 0.5 GRPQGPPQQGGHQQ PRPC
2 1 1866.9 0.0 GPPPQGGRPQGPPQGQSPQ PRPC
3 1 990.5 0.5 GPPPQGGRPQ PRPC
4 2 1224.6 1.0 GGHPPPPQGRPQ PRPC
5 2 1731.9 -1.7 GPPQQGGHPPPPQGRPQ PRPC
6 28 1107.6 -0.3 qRGPRGPYPP PRB1
7 9 1315.7 0.8 PGRIPPPPPAPYG SMR3B
8 5 1390.7 -4.1 GGRPQGPPQGQSPQ PRPC
9 3 2040.1 -3.0 GPPPPPPGKPQGPPPQGGR PRPC
10 - 1004.5

rank series Mass ppm Seq Symb
1 2 1224.6 -5.0 GGHPPPPQGRPQ PRPC
2 15 1471.7 0.0 GRPQGPPQQGGHQQ PRPC
3 - - - GPGRIPPPPPAPY SMR3B
4 2 1731.9 0.4 GGHPPPPQGRPQGPPQQ PRPC
5 1 990.5 -5.6 GPPPQGGRPQ PRPC
6 4 1287.6 -3.4 HEKHHSHRGY His3
7 4 1562.8 -1.4 KFHEKHHSHRGY His3
8 7 1056.5 -3.0 HSHREFPF His3
9 6 1335.7 -2.1 DSHAKRHHGYK His3
10 4 1434.7 2.4 FHEKHHSHRGY His3

rank series Mass ppm Seq Symb
1 1 1287.6 -0.3 HEKHHSHRGY His3
2 2 4369.2 0.0 GRPQGPPQQ...QSPQ PRPC
3 4 1335.7 1.0 DSHAKRHHGYK His3
4 - 2185.1 - (4369.2)2+
5 28 4352.2 -2.1 QQGGHPP...QGGHQQG PRPC
6 5 925.5 -3.8 HHGYKRK His3
7 1 1562.8 -0.5 KFHEKHHSHRGY His3
8 1 1434.7 1.0 FHEKHHSHRGY His3
9 6 1443.7 -0.3 HEKHHSHRGYR His3
10 7 1356.8 2.2 KRHHGYKRKF His3

rank series Mass ppm Seq Symb
1 12 4369.2 -3.8 FDVSLEVS...PFKTENAQ PIGR
2 19 3018.5 4.1 QGPPQQ...GHQQG PRPC
3 2 1335.7 -3.9 DSHAKRHHGYK His3
4 19 4352.2 -1.4 QQGGHPP...GGHQQG PRPC
5 - 2625.2
6 2 3035.5 -5.5 DSHAKR...HSHRGY His3
7 1 1718.9 4.8 RKFHEKHHSHRGY His3
8 11 2185.1 -6.8 SARSPPG...EGNKPQ PRB4
9 1 1562.8 4.5 KFHEKHHSHRGY His3
10 - 1491.8

rank series Mass ppm Seq Symb
1 14 4369.2 -0.1 GRPQGPP...GPPQGQSPQ PRPC
2 1 4353.2 -5.2 GNKSRSARS...PPGGNP PRB4
3 - 4355.1
4 - 4328.3
5 - 3018.5
6 - 2185.1
7 5 2625.4 8.5 KRHHGYKRKFHEKHHSHRGY His3
8 - 4330.0
9 - 4338.2
10 2 1287.6 -1.8 HEKHHSHRGY His3

rank series Mass ppm Seq Symb
1 5 4369.2 -0.7 GRPQGPP...PQGQSPQ PRPC
2 2 1335.7 1.0 DSHAKRHHGYK His3
3 - 4353.3
4 - 3035.6
5 1 1287.6 2.5 HEKHHSHRGY His3
6 25 4370.2 9.2 GKPERPPP...RSARSPPG PRB4
7 - 3017.5
8 - 3019.6
9 1 1562.8 -0.6 KFHEKHHSHRGY His3
10 1 1718.9 -2.5 RKFHEKHHSHRGY His3

rank series Mass ppm Seq Symb
1 6 4369.2 0.1 GRPQGPPQ...QGQSPQ PRPC
2 8 4353.2 4.6 GGQQQ...QGGHPR PRPC
3 - 3018.6
4 2 3035.5 -4.7 DSHAKRHH...HSHRGY His3
5 - 3017.4
6 2 1335.7 0.3 DSHAKRHHGYK His3
7 1 2625.4 -0.6 KRHHGYKRKFHEKHHSHRGY His3
8 - 4328.3
9 1 1287.6 -5.5 HEKHHSHRGY His3
10 1 1718.9 0.7 RKFHEKHHSHRGY His3

rank series Mass ppm Seq Symb
1 1 1866.9 -1.6 GPPPQGGRPQGPPQGQSPQ PRPC
2 4 1471.7 0.3 GRPQGPPQQGGHQQ PRPC
3 3 2916.5 -4.0 GPPPPPPG...PPQGQSPQ PRPC
4 2 1731.9 -1.1 GPPQQGGHPPPPQGRPQ PRPC
5 57 2179.1 -5.4 PPQGGN...SARSPP PRB1
6 5 1380.7 -1.1 GPPQQGGHPRPPR PRPC
7 - 2131.1
8 1 990.5 -2.2 GPPPQGGRPQ PRPC
9 5 1819.0 4.8 GRPQGPPQQGGHPRPPR PRPC
10 8 1315.7 -5.3 GPGRIPPPPPAPY SMR3B

Fig. 2 legend. The 10 most intense peaks identified by StaggeredPMF are listed. Green: the sequence has been published previously. Blue: the sequence is proposed. Red: the proposed sequence differs from a published sequence that is very similar in mass. Purple: no sequence was proposed by StaggeredPMF, but an appropriate peptide has already been published.
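Some Fig. 2 entries are not separate peptides but other charge states of one: the 2185.1 peak is annotated as (4369.2)2+. The relationship is simple arithmetic: with M = [M+H]+ − 1.00728, the [M+2H]2+ ion appears at (M + 2 × 1.00728)/2. The sketch below uses hypothetical helper names and flags such pairs in a peak list. The 10 ppm tolerance is an assumption, and because the listed masses are rounded to 0.1, real data may need a looser window.

```python
PROTON = 1.00728  # proton mass (Da)


def mh_to_charge_state(mh, z):
    """m/z of [M+zH]z+ given the singly protonated [M+H]+ value."""
    neutral = mh - PROTON
    return (neutral + z * PROTON) / z


def flag_doubly_charged(peaks, tol_ppm=10.0):
    """Pairs (low, high) where `low` matches [M+2H]2+ of the [M+H]+ peak `high`."""
    pairs = []
    for low in peaks:
        for high in peaks:
            if high > low:
                predicted = mh_to_charge_state(high, 2)
                if abs(low - predicted) / predicted * 1e6 <= tol_ppm:
                    pairs.append((low, high))
    return pairs


# Masses from the Fig. 2 panel that lists 2185.1 as (4369.2)2+.
peaks = [1287.6, 4369.2, 1335.7, 2185.1, 4352.2, 925.5,
         1562.8, 1434.7, 1443.7, 1356.8]
print(round(mh_to_charge_state(4369.2, 2), 1))  # 2185.1
print(flag_doubly_charged(peaks))  # [(2185.1, 4369.2)]
```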
