Overcoming the Challenges in Data Independent Acquisition (DIA) via High Resolution Accurate Mass Orbitrap Based Mass Spectrometer
Yue Xuan,1 Jan Muntel,2 Sebastian T. Berger,2 Andreas FR Huhmer,3 Hanno Steen,2 Thomas Moehring1
1Thermo Fisher Scientific, Bremen, Germany; 2Departments of Pathology, Boston Children’s Hospital, Boston, MA; 3Thermo Fisher Scientific, San Jose, CA
Poster Note 64405
Conclusion
We successfully established a DIA workflow that enables large-scale urine proteomics studies based on high resolution accurate mass Orbitrap technology.
Even without any prefractionation of the samples, our DIA workflow detected nearly 2,500 proteins (2,456 in total), which is close to complete coverage of the urinary proteome.
The highly reproducible quantification (median CV 7%) makes it possible to skip technical replicates and to focus on the analysis of biological samples.
The high number of quantified proteins per sample (on average 1,300 per single DIA experiment) provided insights, based on changes in the urinary proteome, into the host cell response in a UTI as well as into the formation of an ovarian cyst.
We envision that the comprehensiveness and short analysis time per sample will allow this DIA workflow to be applied to simultaneous biomarker discovery and validation, which requires the analysis of hundreds of samples.
References
1. Rosenberger, G. et al., Sci Data, 2014, 1:140031. DOI: 10.1038/sdata.2014.31
2. Zheng, J., BMC Genomics, 2013, 14, 777
3. Takaya et al., Int J Oncol, 2015
4. Feldman et al., Clin Cancer Res, 2009
Overview
Purpose: Use the Thermo Scientific™ Q Exactive™ HF mass spectrometer for DIA LC-MS/MS experiments to study the urinary proteome, and demonstrate how high resolution/accurate mass spectrometry overcomes DIA challenges and enables large-scale urine proteomics studies.
Methods: Urine samples from 87 patients associated with seven different differential diagnoses were collected and trypsinized using a membrane-based processing method in a 96-well plate format. To generate a spectral library, all samples were analyzed by DDA methods. DIA analysis was performed on the Q Exactive HF MS to quantitatively map the urinary proteomes.
Results: We developed a robust DIA method for the comprehensive mapping of the urinary proteome that enables large-scale urine proteomics studies with high throughput (less than 4 days for 87 biological sample measurements) and excellent reproducibility (median CV of 7%).
Introduction
Data independent acquisition (DIA) strategies promise comprehensive and reproducible data collection for large-scale quantitative proteomics experiments. However, the wide isolation windows (usually >10 Da) of DIA experiments co-isolate and co-fragment multiple peptides, which results in highly complex DIA MS/MS spectra and makes DIA data analysis challenging. To accurately identify and precisely quantify thousands of proteins per DIA experiment, three factors play important roles: the completeness and specificity of the spectral library, the mass accuracy of the data, and the technical variance in quantitation. In this work, we use the Q Exactive HF mass spectrometer for DIA LC-MS/MS experiments to study the urinary proteome, and demonstrate how high resolution/accurate mass spectrometry overcomes these DIA challenges and enables large-scale urine proteomics studies.
Methods
Sample Preparation
Urine samples were collected from consenting patients visiting the Emergency Department at Boston Children's Hospital. In total, urine samples from 87 patients associated with seven different differential diagnoses were collected (abdominal pain controls: n = 33; ovarian cyst: n = 12; mesenteric adenitis: n = 6; constipation: n = 7; urinary tract infection (UTI): n = 11; gastritis: n = 6; gastroenteritis: n = 12) and trypsinized using a membrane-based processing method in a 96-well plate format.
Liquid Chromatography
DDA experiments for the spectral library: All 87 samples were analyzed on a nanoLC system equipped with an LC-chip system (cHiPLC nanoflex, Eksigent; trapping column: Nano cHiPLC Trap column, 200 μm x 0.5 mm, Reprosil C18, 3 μm; analytical column: Nano cHiPLC column, 75 μm x 15 cm, Reprosil C18, 3 μm) coupled online to a Q Exactive mass spectrometer (Thermo Fisher Scientific). Peptides (4 μl of digest) were separated by a linear gradient from 93% buffer A (0.2% FA in water) / 7% buffer B (0.2% FA in ACN) to 75% buffer A / 25% buffer B within 75 min.
DIA experiments on all 87 samples: Each sample was analyzed on a Thermo Scientific™ EASY-nLC™ 1000 nanoLC system equipped with a trapping column (Thermo Scientific™ PepMap™ 100, 75 μm x 2 cm, C18, 3 μm) and an analytical column (PepMap RSLC, 75 μm x 25 cm, C18, 2 μm) coupled to a Q Exactive HF mass spectrometer equipped with a Thermo Scientific™ EASY-Spray™ nanoelectrospray ion source. Peptides (2 μl of digest) were separated by a linear gradient from 93% buffer A (0.2% FA in water) / 7% buffer B (0.2% FA in ACN) to 75% buffer A / 25% buffer B within 30 min. The total run time, including loading and washing steps, was 50 min. The column oven was set to 40 °C.
Mass Spectrometry
DDA method on QE: The MS was operated in data-dependent TOP10 mode with the following settings: mass range 400-1,000 Th; resolution for MS1 scan 70,000 @ 200 Th; lock mass 445.120025 Th; resolution for MS2 scan 17,500 @ 200 Th; isolation width 1.6 m/z; NCE 27; underfill ratio 1%; charge state exclusion: unassigned, 1, >6; dynamic exclusion 30 s.
Additionally, a randomly chosen subset of 23 samples was analyzed on a commercial TripleTOF™ 5600 (AB Sciex, Concord, Canada) using the same LC setup and gradient. The MS was operated in data-dependent TOP50 mode with the following settings: MS1 mass range 400-1,000 Th with 250 ms accumulation time; MS2 mass range 100-1,700 Th with 50 ms accumulation time; MS2 selection criteria: UNIT resolution, intensity threshold 100 cts, charge states 2-5; dynamic exclusion 17 s.
FIGURE 1. The established DIA workflow for Urinary Biomarker discovery
FIGURE 2. Influence of Spectral Library: Percentage of peptides and proteins that are detected in all three replicates, in 2 out of 3 replicates, or in a single replicate.
FIGURE 3. Reproducibility Evaluation with Three Replicates: Technical reproducibility of DDA and DIA quantification based on three replicates (Fig 3A). Number of identifications in relation to technical replicates (Fig 3B).
Results
Selecting the Most Appropriate Spectral Library
To investigate the influence of the spectral library on the data analysis, we generated six different spectral libraries (Table 1). The quality of a spectral library can be assessed by the completeness of the library and the reproducibility of peptide/protein detection.
The comprehensive urinary library comprised about 2,600 protein groups and 20,000 peptides (2,575 and 19,854, respectively; Spec Lib 06 in Table 1). Based on current studies, this library covers the vast majority of the urinary proteome (2).
We calculated the percentage of peptides and proteins that were detected in all three replicates, in 2 out of 3 replicates, or in only a single replicate (Figure 2). The results show a higher reproducibility of peptide/protein identification and quantification when using the urinary spectral library 06. This finding underscores the importance of using a sample-specific spectral library.
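The replicate classification behind Figure 2 can be sketched in a few lines. This is a minimal illustration, not the authors' analysis script: the function name and the peptide identifiers are hypothetical, and it assumes each replicate is available as a set of identified peptides or proteins.

```python
from collections import Counter

def detection_classes(replicates):
    """Percentage of identifiers seen in n, n-1, ..., 1 of the n replicates."""
    seen = Counter()
    for ids in replicates:
        seen.update(set(ids))          # count each identifier once per replicate
    n = len(replicates)
    tally = Counter(seen.values())     # how many identifiers were seen k times
    total = len(seen)
    return {f"{k}of{n}": 100.0 * tally.get(k, 0) / total for k in range(n, 0, -1)}

# Hypothetical identifications from three replicate runs
runs = [
    {"PEPA", "PEPB", "PEPC", "PEPD"},
    {"PEPA", "PEPB", "PEPC"},
    {"PEPA", "PEPB", "PEPE"},
]
shares = detection_classes(runs)
print(shares)
```

A library that fits the sample should push most identifiers into the top class (detected in all replicates).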
Highly Reproducible Peptide Detection and Quantitation with DIA Experiments
To assess how the DIA routine performs compared to a DDA workflow, we analyzed an unrelated sample three times with a DDA method and with the DIA method on the Q Exactive HF, using the same 30 min gradient. The DIA data were analyzed using the comprehensive urinary library (Spec Lib 06, Table 1).
With our optimized DIA workflow (Figure 1), the median CV of the DIA quantification was only 6.7% at the peptide level and 8.1% at the protein level, about twofold better than for the DDA data (Figure 3A). The numbers of peptides (~5,220) and proteins (1,120) detected in a single DIA experiment were almost twice those obtained from the DDA data (Figure 3B). Our DIA method thus shows highly reproducible and precise quantification at the peptide level.
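As an illustration of the reproducibility metric, the median CV over replicate measurements can be computed as follows. This is a sketch under assumptions: the peak areas are invented, the function names are hypothetical, and the sample standard deviation (n - 1 denominator) is assumed since the poster does not state which form was used.

```python
from statistics import mean, median, stdev

def cv_percent(values):
    """Coefficient of variation in percent, using the sample standard deviation."""
    return 100.0 * stdev(values) / mean(values)

def median_cv(table):
    """Median CV across all analytes; `table` maps analyte -> replicate peak areas."""
    return median(cv_percent(areas) for areas in table.values())

# Hypothetical peak areas from three technical replicates
peak_areas = {
    "peptide_1": [100.0, 110.0, 90.0],
    "peptide_2": [200.0, 200.0, 200.0],
    "peptide_3": [50.0, 55.0, 45.0],
}
result = median_cv(peak_areas)
print(f"median CV: {result:.1f}%")
```

The median is used rather than the mean so that a few poorly quantified low-abundance analytes do not dominate the summary.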
FIGURE 5. Comprehensive Urinary Proteome Coverage by DIA workflow
FIGURE 6. Top Biomarker Discovery by using DIA workflow
The Power of Mass Accuracy
The wide isolation windows (>10 Th) used in typical DIA experiments result in complex MS/MS spectra. During data analysis, the ion chromatograms of multiple fragment ions are extracted and aligned for peptide detection and quantification. To separate the analyte of interest from interferences, highly accurate ion masses are crucial. We applied different mass tolerances for the data analysis (50 ppm, 20 ppm, and 10 ppm). One example is shown in Figure 4. When the multiple transitions of the peptide LVGYLDR were extracted with a ± 2.5 min retention time window and 50 ppm mass tolerance, several interferences overlapped with the peptide of interest. Only at 10 ppm mass tolerance were the interferences removed from the extracted chromatograms. Therefore, 10 ppm mass accuracy is the minimum requirement for accurate peptide detection.
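To make the effect of the tolerance concrete, the extraction window around a fragment m/z can be computed directly. The sketch below assumes a symmetric window of ± the stated ppm value around the theoretical m/z; the function name is illustrative and not taken from any vendor software. It uses the y5+ transition of LVGYLDR at 623.315 m/z from Figure 4.

```python
def ppm_window(mz, tol_ppm):
    """Lower and upper m/z bounds for a symmetric +/- tol_ppm extraction window."""
    delta = mz * tol_ppm * 1e-6
    return mz - delta, mz + delta

# y5+ fragment of LVGYLDR (m/z taken from the Figure 4 transition list)
for tol in (50, 20, 10):
    lo, hi = ppm_window(623.315, tol)
    print(f"{tol} ppm: {lo:.4f}-{hi:.4f} (width {1000 * (hi - lo):.1f} mTh)")
```

Going from 50 ppm to 10 ppm narrows the window fivefold, which is what excludes near-isobaric fragment ions from co-isolated peptides.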
TABLE 1. Spectral libraries
Library Name | Protein Groups | Peptides | Fragment Ions | Instrument | Source
Spec Lib 01 | 2,077 | 15,470 | 109,064 | Q Exactive MS | 23 DDA files, urine
Spec Lib 02 | 1,436 | 9,569 | 65,880 | TripleTOF 5600 | 23 DDA files, urine
Spec Lib 03 | 2,226 | 16,985 | 123,087 | Combined from Spec Lib 01 and 02 | Urine
Spec Lib 04 | 14,158 | 149,420 | 2,832,306 | TripleTOF 5600 | See Rosenberger et al., 2014 (1)
Spec Lib 05 | 1,869 | 40,902 | 925,156 | TripleTOF 5600 | Subset of Human-14,000
Spec Lib 06 | 2,575 | 19,854 | 144,643 | Q Exactive MS + TripleTOF 5600 | 87 Q Exactive MS DDA files + 23 TripleTOF 5600 DDA files, urine
DIA method on QE HF: Each DIA duty cycle contains one full scan and 24 DIA MS/MS scans covering the mass range 400-1,000 Th. Full scan: resolution 30,000 @ 200 Th; AGC target 3e6; maximal IT 50 ms; mass range 400-1,000 Th. DIA scans: resolution 30,000 @ 200 Th; isolation width 20 m/z for the first 20 DIA scans, 40 m/z for the following 2 DIA scans, and 60 m/z for the last 2 DIA scans; NCE 30; target value 1e6; maximal injection time set to “auto”, which automatically calculates the maximal injection time from the detection time so that the mass spectrometer always operates in parallel ion filling and detection mode.
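The variable-width window scheme can be checked with a short sketch. It assumes the windows are contiguous, non-overlapping, and ordered by ascending m/z starting at 400; the poster gives only the widths and counts, so the exact window edges here are an assumption, and the function name is illustrative.

```python
def dia_windows(start, scheme):
    """Build contiguous isolation windows from (width, count) pairs."""
    windows = []
    lower = start
    for width, count in scheme:
        for _ in range(count):
            windows.append((lower, lower + width))
            lower += width
    return windows

# 20 windows of 20 m/z, then 2 of 40 m/z, then 2 of 60 m/z
scheme = [(20, 20), (40, 2), (60, 2)]
windows = dia_windows(400, scheme)

print(len(windows), "windows from", windows[0][0], "to", windows[-1][1])
```

Under these assumptions the 24 windows add up to exactly 600 m/z and tile the 400-1,000 range, with narrow windows where the precursor density is presumably highest and wider ones at high m/z.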
FIGURE 4. The ion chromatograms of multiple transitions of peptide LVGYLDR of protein P07602, extracted with 50 ppm, 20 ppm, and 10 ppm mass tolerance. The retention time window is ± 2.5 min. 10 ppm mass accuracy is the minimum requirement for accurate peptide detection.
Comprehensive Urinary Proteome Coverage by DIA Workflow
The established DIA workflow (Figure 1) was applied to a urinary study comprising 87 samples. The samples were derived from patients with abdominal pain in a pediatric emergency room. After digestion in a 96-well plate format, we were able to analyze them in less than 4 days with our optimized DIA workflow. On average, we detected 1,301 protein groups (848-1,720) (Figure 5A) and 5,714 peptides (3,172-8,231) per sample (Figure 5B). In total, our DIA workflow enabled the detection of 2,456 proteins, representing 95% of the proteins in the spectral library. These numbers contrast with the DDA output, in which only 7% of the proteins were identified in more than 95% of the samples and 60% in less than 25% of the samples (Figure 5C). Compared to all other samples, 773 proteins were significantly changed in abundance in the UTI samples (non-parametric Mann-Whitney U test, p<0.05), 502 in the ovarian cyst samples, 209 in the constipation samples, 111 in the mesenteric adenitis samples, and 58 in the gastroenteritis samples (Figure 5D).
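The throughput and coverage figures can be reproduced from numbers stated elsewhere in this poster. The sketch assumes back-to-back acquisition with the 50 min total run time per sample given in the Methods, and it takes the 2,575 protein groups of Spec Lib 06 as the library size; idle time between runs is ignored.

```python
# Figures taken from the text and Table 1
samples = 87
run_minutes = 50                 # total run time per sample, incl. loading and washing
detected_proteins = 2456
library_protein_groups = 2575    # Spec Lib 06

total_hours = samples * run_minutes / 60
total_days = total_hours / 24
coverage_percent = 100 * detected_proteins / library_protein_groups

print(f"{total_hours:.1f} h = {total_days:.2f} days of acquisition time")
print(f"library coverage: {coverage_percent:.1f}%")
```

The pure acquisition time comes to roughly three days, which is consistent with the stated "less than 4 days" once some overhead is allowed for.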
[Figure 2 panels: bar charts at the peptide level (y-axis: identified peptides) and at the protein level (y-axis: identified proteins), showing for each spectral library the share detected in 3 of 3, 2 of 3, and 1 of 3 replicates.]
[Figure 3A panels: coefficient of variation (%) versus log10 intensity. Protein level: median CV 16.7% for DDA (peptide peak areas) and 8.1% for DIA (fragment ion peak areas). Peptide level: median CV 15.7% for DDA and 6.7% for DIA.]
[Figure 3B panels: identified peptides and identified proteins versus replicate (1-3) for DDA, DDA with matching, and DIA, each with the ID increase per additional replicate.]
Data Analysis
Database searches were performed in MaxQuant software, version 1.5.0.0, directly on the RAW and WIFF files using the human UniProt database. Trypsin was specified with up to 2 missed cleavages. Mass tolerances were set to 20 ppm for the first search and 4.5 ppm for the main search for the Q Exactive data, and to 0.1 Da for the first search and 0.01 Da for the main search for the TripleTOF 5600 data. Oxidation of M was chosen as dynamic modification (+15.995 Da) and carbamidomethylation of C as static modification (+57.021 Da). FDR was set to 1% at the peptide and protein level.
All DIA data were directly analyzed in Spectronaut 7.0 (Biognosys) with dynamic score refinement and MS1 scoring enabled, and with interference correction and cross-run normalization (total peak area) enabled. All results were filtered by a Q value of 0.01 (equal to an FDR of 1% at the peptide level).
Protein intensity was calculated by summing the peptide intensities of each protein from the Spectronaut output file. The data were imported into Perseus 1.5.1.6, and missing values were imputed with the minimum value for each protein. The significance of protein abundance changes was calculated using the u-test (non-parametric test), and proteins with a p value below 0.01 were considered significantly changed. The annotation of biological processes was done with the DAVID online tool using the comprehensive urinary spectral library (Spec Lib 06 in Table 1).
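The protein roll-up and imputation steps can be sketched as follows. This is an illustration only: the actual processing was done in Spectronaut and Perseus, the row layout (protein, sample, intensity) and both function names are hypothetical, and the real Spectronaut export columns are not assumed here.

```python
from collections import defaultdict

def protein_intensities(peptide_rows):
    """Sum peptide intensities per protein and sample."""
    totals = defaultdict(dict)
    for protein, sample, intensity in peptide_rows:
        totals[protein][sample] = totals[protein].get(sample, 0.0) + intensity
    return dict(totals)

def impute_minimum(totals, samples):
    """Fill missing samples with the minimum value observed for that protein."""
    filled = {}
    for protein, by_sample in totals.items():
        floor = min(by_sample.values())
        filled[protein] = {s: by_sample.get(s, floor) for s in samples}
    return filled

# Hypothetical peptide-level rows: (protein, sample, peptide intensity)
rows = [
    ("P1", "S1", 10.0), ("P1", "S1", 5.0), ("P1", "S2", 7.0),
    ("P2", "S1", 3.0),                       # P2 was not detected in S2
]
table = impute_minimum(protein_intensities(rows), samples=["S1", "S2"])
print(table)
```

Imputing with the per-protein minimum treats a missing value as "at or below the lowest level seen", which keeps rank-based tests usable without inventing high values.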
Biomarker Discovery
To find the most suitable biomarker candidates, we focused on the ten most significantly changed proteins for each of the five conditions. The performance of the biomarker candidates was assessed by calculating the area under the receiver operating characteristic curve (AUROC) (Figure 6). For example, Cystatin-B (CYTB), an intracellular thiol proteinase inhibitor, is increased in ovarian cyst (OC) samples with a p value of 1.3e-5 (Bonferroni-corrected: p = 0.027) and an AUROC of 0.91. Cystatin-B has been identified as a biomarker candidate in the context of malignant growths in ovaries (3) and bladder (4), i.e. in the genitourinary tract.
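For readers unfamiliar with the two statistics, the sketch below computes an AUROC from marker intensities and applies a Bonferroni correction. It is a hedged illustration: the intensities are invented, the function names are hypothetical, and the number of tests used for the correction is not stated in this poster, so the value of 2,000 below is only a placeholder.

```python
def auroc(cases, controls):
    """AUROC as the probability that a case value exceeds a control value.

    Ties count as half a win; this equals the Mann-Whitney U statistic
    divided by the number of case/control pairs.
    """
    wins = 0.0
    for c in cases:
        for k in controls:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(cases) * len(controls))

def bonferroni(p_value, n_tests):
    """Bonferroni-corrected p value, capped at 1."""
    return min(1.0, p_value * n_tests)

# Hypothetical marker intensities (arbitrary units)
score = auroc(cases=[5, 6, 7], controls=[1, 2, 6])
adjusted = bonferroni(1.3e-5, n_tests=2000)   # n_tests is an assumed placeholder
print(f"AUROC = {score:.3f}, corrected p = {adjusted:.3f}")
```

An AUROC of 0.5 means the marker does not separate the groups, while values near 1 indicate that almost every case has a higher intensity than almost every control.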
[Figure 4 panels: extracted ion chromatograms (relative abundance versus time, RT 18.93-24.07 min) at 50 ppm, 20 ppm, and 10 ppm mass tolerance. Transitions: y5+ 623.315 m/z; y6+ 722.383 m/z; y6+ -H3PO4 624.406 m/z; y5++ 312.161 m/z; y3+ -NH3 386.203 m/z; y4+ 566.293 m/z.]
Spectronaut is a trademark of Biognosys AG. MaxQuant software is a trademark of Max-Planck Institute of Biochemistry. TripleTOF is a trademark of Sciex, Pte. Ltd. All other trademarks are the property of Thermo Fisher Scientific and its subsidiaries. This information is not intended to encourage use of these products in any manners that might infringe the intellectual property rights of others.
[Figure 5 panels: 5A, identified protein groups per sample; 5B, identified unique peptides per sample; 5C, protein sample coverage (%) for DDA and DIA; 5D, significantly changed proteins (Mann-Whitney U test, p<0.05) per group. Sample groups: pain control group, ovarian cyst, urinary tract infection, constipation, mesenteric adenitis, gastroenteritis.]
[Figure 6 panels: for each condition, a ROC curve (sensitivity versus 1-specificity) and the marker intensity across the groups OC, UTI, Con, MA, GE, and Pain. Ovarian cyst (OC): marker CYTB, AUROC = 0.91. Urinary tract infection (UTI): marker PERM (panel label: PERE), AUROC = 0.968. Constipation (Con): marker UROK, AUROC = 0.928. Mesenteric adenitis (MA): marker LEG3, AUROC = 0.872. Gastroenteritis (GE): marker KPYR, AUROC = 0.76.]
[Figure 1 workflow elements: urinary samples (87) from ER patients in six groups (constipation, mesenteric adenitis, gastroenteritis, pain control group, ovarian cyst, urinary tract infection); digestion; DIA on the Q Exactive HF MS with the optimized DIA method (average cycle time 2 s; on average 8-10 data points per LC peak, 9 s at FWHM); search with Spectronaut (FDR 1%) against the comprehensive urinary spectral library (2,575 protein groups from 87 Q Exactive MS and 23 TripleTOF 5600 runs; 19,854 unique peptides; 144,643 transitions); scoring and Q value calculation by Spectronaut; validated peptides/proteins from 87 samples; Perseus, u-test (non-parametric test); KEGG pathway enrichment.]
Overcoming the challenges in Data Independent Acquisition (DIA) via high resolution accurate mass Orbitrap based mass spectrometer Yue Xuan1; Jan Muntel2; Sebastian T. Berger2; Andreas FR Huhmer3; Hanno Steen2; Thomas Moehring1 1Thermo Fisher Scientific, Bremen, GERMANY; 2Departments of Pathology, Boston Children’s Hospital, Boston, MA; 3Thermo Fisher Scientific, San Jose, CA
Conclusion We successfully established a DIA workflow, which enables large scope of urine
proteomics studies based on high resolution accurate mass Orbitrap technology.
Even without any prefractionation of the samples, we detected more than 2,500 proteins, which is close to complete coverage of the urinary proteome by our DIA workflow.
The highly reproducible quantification (median CV 7%) enables to skip technical replicates and to focus on the analysis of biological samples.
The high number of quantified proteins per sample (in average 1,300) per single DIA experiment enabled to get insights into the host cell response in an UTI based on changes in the urinary proteome as well into the formation of an ovarian cyst.
We envision that the comprehensiveness and low analysis time per sample will allow the application of this DIA workflow in synchronous biomarker discovery and validation which requires the analysis of hundreds of samples.
References 1. Rosenberger, G. SCIENTIFIC DATA | 1:140031 | DOI: 10.1038/sdata.2014.31
2. Zheng, J. BMC genomics 2013,14, 777
3. Takaya et al., Int J Oncol, 2015
4. Feldman et al., Clin Cancer Res, 2009
Overview Purpose: Utilize Thermo Scientific TM Q ExactiveTM HF mass spectrometer for DIA LC-MS/MS experiments to study the urinary proteome, and demonstrate how high resolution/ accurate mass spectrometry is employed to overcome DIA challenges and enable large scope of urine proteomics studies.
Methods: Urine samples from 87 patients associated with seven different differential diagnoses were collected and trypsinized using a membrane-based processing method in a 96-well plate format. For generation of a spectral library all samples were analyzed by DDA methods. DIA analysis was performed on the Q Exactive HF MS to quantitatively mapping the urinary proteomes.
Results: In this study, we have developed a robust DIA method for the comprehensive mapping of the urinary proteome that enables large scale urine proteomics studies with high throughput ( less than 4 days for 87 biological sample measurements) and excellent reproducibility (media of CV% is 7%).
Introduction The promises of data independent acquisition (DIA) strategies are a comprehensive and reproducible data collection for large-scale quantitative proteomics experiments. However, the wide isolation window (usually >10Da) of DIA experiments co-isolates and co-fragments multiple peptides, resulting in highly complex DIA MS/MS spectra and makes the DIA data analysis challenging. To accurately identify and precisely quantify thousands of proteins per DIA experiment, the completeness and specificity of spectral library, the mass accuracy of the data, and the technical variance in quantitation play important roles. In this work, we utilize the Q Exactive HF mass spectrometer for DIA LC-MS/MS experiments to study the urinary proteome, and demonstrate how high resolution/ accurate mass spectrometry is employed to overcome these DIA challenges and enable large scope of urine proteomics studies..
Methods Sample Preparation
Urine samples were collected from consenting patients visiting the Emergency Department at Boston Children's Hospital. Upon consent, urine samples from 87 patients associated with seven different differential diagnoses were collected (abdominal pain controls: n = 33, ovarian cyst: n = 12, mesenteric adenitis: n = 6, constipation n = 7, urinary tract infection; UTI: n = 11, gastritis: n = 6, gastroenteritis: n = 12) and trypsinized using a membrane-based processing method in a 96-well plate format.
Liquid Chromatography
For the spectral library, all 87 samples were analyzed by a nanoLC system equipped with a LC-chip system (cHiPLC nanoflex, Eksigent, trapping column: Nano cHiPLC Trap column 200 μm x 0.5 mm Reprosil C18, 3 μm, analytical column: Nano cHiPLC column 75 μm x 15 cm Reprosil C18, 3 μm) coupled online to a Q-Exactive mass spectrometer (Thermo Fisher Scientific). Peptides (4 µl of digest) were separated by a linear gradient from 93% buffer A (0.2% FA in water) / 7% buffer B (0.2% FA in ACN) to 75% buffer A / 25% buffer B within 75 min.
DIA experiments on all 87 samples: Each sample was analyzed on a Thermo ScientificTM EASY-nLCTM 1000 nanoLC system equipped with a trapping column (Thermo ScientificTM PepMapTM100, 75um x 2cm, C18, 3um) and an analytical column (PepMapRSLC, 75um x 25cm, C18, 2um) coupled to a Q Exactive HF mass spectrometer equipped with a Thermo Scientific TM EASY-Spray TM nanoelectrospray ion source. Peptides (2 µl of digest) were separated by a linear gradient from 93% buffer A (0.2% FA in water) / 7% buffer B (0.2% FA in ACN) to 75% buffer A / 25% buffer B within 30 min. The total run time with loading and washing steps was 50 min. Column oven was set to 40°C .
Mass Spectrometry
DDA method on QE: The MS was operated in data-dependent TOP10 mode with the following settings: mass range 400-1,000 Th; resolution for MS1 scan 70,000 @ 200 Th; lock mass: 445.120025 Th; resolution for MS2 scan 17,500 @ 200 Th; isolation width 1.6 m/z; NCE 27; underfill ratio 1%; charge state exclusion: unassigned, 1, >6; dynamic exclusion 30 s.
Additionally a subset of randomly chosen 23 samples was analyzed on a commercial TripleTOF TM 5600 (AB Sciex, Concord, Canada) using the same LC setup and gradient. The MS was operated in data-dependent TOP50 mode with following settings: MS1 mass range 400-1,000 Th with 250 ms acc. time; MS2 mass range 100-1,700 Th with 50 ms acc. time and following MS2 selection criteria: UNIT resolution, intensity threshold 100 cts; charge states 2-5. Dynamic exclusion set to 17 s.
FIGURE 1. The established DIA workflow for Urinary Biomarker discovery
FIGURE 2. Influence of Spectral Library: Percentage of peptides and proteins that are detected in all three replicates, in 2 out of 3 replicates, in single replicate.
FIGURE 3. Reproducibility Evaluation with Three Replicates: Technical Reproducibility Of DDA And DIA Quantification Based On Three Replicates (Fig 3A). Number of Identification in Relation to Technical Replicates ( Fig 3B)
Results Selecting the Most Appropriate Spectral Library
To investigate the influence of the spectral library on the data analysis, we generated six different spectral libraries (Table 1). The quality of a spectral library can be assessed by the completeness of the library and the reproducibility of peptide/protein detection.
The comprehensive urinary library compromised 2,600 protein groups and 20,000 peptides (Spec Lib 06 in Table 1). Based on current studies, this library covers the vast majority of the urinary proteome (2).
We calculated the percentage of peptides and proteins that were detected in all three replicates, in 2 out of 3 replicates or in only one single replicate (Figure 2). The results show a higher reproducibility of peptide/protein identification and quantification when using the urinary spectral library 06. This finding underscores the importance of using a sample-specific spectral library .
High Reproducible Peptide Detection and Quantitation with DIA experiments
To elucidate how the DIA routine performs compared to a DDA workflow, we analyzed an unrelated sample three times by a DDA and the DIA method on the Q Exactive HF using the same 30 min gradient. The DIA data were analyzed using the comprehensive urinary library (Spec lib 06, Table 1).
With our optimized DIA workflow (Figure 1), the median CV of the peptide/protein quantification of the DIA data was only 6.7% and 8.1%, which is twofold better than from the DDA data (Figure 3A). The number of detected peptides (~5,220) and proteins (1,120) with a single DIA experiment are almost twofold as the DDA data (Figure 3B). Our DIA method show a highly reproducible and precise quantification on peptide level.
FIGURE 5. Comprehensive Urinary Proteome Coverage by DIA workflow
FIGURE 6. Top Biomarker Discovery by using DIA workflow
The power of Mass Accuracy
The application of wide isolation windows (>10 Th) in typical DIA experiments results in complex MS/MS spectra. During data analysis, the ion chromatograms of multiple fragment ions are extracted and aligned for peptide detection and quantification. To separate the analyte of interest from interferences, a highly accurate mass of the ions is crucial. We applied different mass tolerances for the data analysis (50 ppm, 20 ppm, and 10 ppm). One example is shown in Figure 4. With ± 2.5 mins retention time window and 50 ppm mass tolerance for extracting the multiple transitions of a peptide LVGYLDR, several interferences overlapped with the peptide of interest. Only with 10 ppm mass accuracy, the interferences are removed from the spectrum, therefore, 10 ppm mass accuracy is minimum requirement for an accurate detection of peptides .
TABLE 1. Spectral libraries
Library Name Protein Groups Peptides Fragment ions Instrument Source
Spec Lib 01 2,077 15,470 109,064 Q Exactive MS 23 DDA files
Urine
Spec Lib 02 1,436 9,569 65,880 TripleTOF 5600 23 DDA files
Urine
Spec Lib 03 2,226 16,985 123,087 Combined from Spec Lib 01 and 02
Urine
Spec Lib 04 14,158 149,420 2,832,306 TripleTOF 5600 see Rosenberger et al., 2014 (1)
Spec Lib 05 1,869 40,902 925,156 TripleTOF 5600 subset of Human-14,000
Spec Lib 06 2,575 19,854 144,643 87 Q Exactive MS DDA files + 23 TripleToF 5600
DDA files
Urine
DIA method on QE HF: Each DIA duty cycle contains one full scan and 24DIA MS/MS scans to covering the mass range 400 -1000 TH. Full scan with a resolution 30,000 @ 200 Th; AGC target – 3e6, maximal IT – 50ms; mass range 400-1,000 Th; followed by DIA scans with resolution 30,000 @ 200 Th; isolation width 20 m/z for the first 20 DIA scans, 40 m/z for the following 2 DIA scans, and 60m/z for the last two DIA scans; NCE 30; target value 1e6, maximal injection time set to “auto”, which automatically calculates the maximal injection time based on the detection time to allow the mass spectrometer always operating in the parallel ion filling and detection mode.
FIGURE 4. The ion chromatograms of multiple transitions of peptide LVGYLDR of protein (P07602) are extracted with 50 ppm, 20 ppm, and 10 ppm. Retention time window is ± 2.5 mins. 10 ppm mass accuracy is the minimum requirement for an accurate peptide detection.
Comprehensive Urinary Proteome Coverage by DIA workflow
The establishment of the DIA workflow (Figure 1) was applied to a urinary study compromising 87 samples. The samples were derived from patients with abdominal pain in a pediatric emergency room. After digestion in a 96-well plate format we were able to analyze them in less than 4 days by our optimized DIA workflow. In average, we detected 1,301 protein groups (848 – 1,720) (Figure 5A) and 5,714 peptides per sample (3,172 – 8,231) (Figure 5B). In total, our DIA workflow enabled the detection of 2,456 proteins, representing 95% of the proteins in the spectral library. These numbers are contrasted by the DDA output, in which only 7% of the proteins were identified in more than 95% of the samples and 60% in less than 25% of the samples (Figure 5C). Compared to all other samples, 773 proteins were significantly changed in their amount in the UTI samples (non-parametric Mann-Whitney U test, p<0.05), 502 in the ovarian cyst samples, 209 in the constipation samples, 111 in the mesenteric adenitis samples and 58 in the gastroenteritis samples (Figure 5D).
[Figure 2 panels: Peptide Level (identified peptides) and Protein Level (identified proteins); bars split into identifications found in 3 of 3, 2 of 3, and 1 of 3 replicates.]
[Figure 3A panels: coefficient of variation (%) versus log10 intensity at protein level and peptide level. Median CV, protein level: DDA (peptide peak areas) 16.7%, DIA (fragment ion peak areas) 8.1%. Median CV, peptide level: DDA (peptide peak areas) 15.7%, DIA (fragment ion peak areas) 6.7%.]
[Figure 3B panels: identified peptides and identified proteins versus replicate (1-3); series: DDA, DDA - matching, and DIA, each with its ID increase.]
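The median CV values reported for Figure 3A can be reproduced in principle with a calculation like the following sketch. The function names and the replicate intensities are hypothetical, and the sample standard deviation (n - 1) is assumed because the poster does not state which estimator was used.

```python
import statistics


def cv_percent(values):
    """Coefficient of variation in percent, using the sample standard deviation."""
    return statistics.stdev(values) / statistics.mean(values) * 100


def median_cv(replicate_table):
    """Median CV over all analytes; replicate_table maps analyte -> replicate intensities."""
    return statistics.median(cv_percent(v) for v in replicate_table.values())


# Hypothetical peak areas from three technical replicates.
replicates = {
    "peptide_A": [100.0, 110.0, 90.0],
    "peptide_B": [200.0, 200.0, 200.0],
    "peptide_C": [50.0, 60.0, 55.0],
}
print(median_cv(replicates))
```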
Data Analysis
The database search was performed in MaxQuant software, version 1.5.0.0, directly on the RAW and WIFF files using the human UniProt database. Settings: trypsin with up to 2 missed cleavages; mass tolerances of 20 ppm for the first search and 4.5 ppm for the main search for the Q Exactive data, and 0.1 Da for the first search and 0.01 Da for the main search for the TripleTOF 5600 data. Oxidation of M was chosen as dynamic modification (+15.995 Da) and carbamidomethylation of C as static modification (+57.021 Da). The FDR was set to 1% at the peptide and protein level.
All DIA data were analyzed directly in Spectronaut 7.0 (Biognosys). Settings: dynamic score refinement and MS1 scoring enabled; interference correction and cross-run normalization (total peak area) enabled. All results were filtered by a Q value of 0.01 (equivalent to an FDR of 1% at the peptide level).
Protein intensity was calculated by summing the peptide intensities of each protein from the Spectronaut output file. The data were imported into Perseus 1.5.1.6, and missing values were imputed with the minimum value for each protein. The significance of protein abundance changes was calculated using the Mann-Whitney U test (a non-parametric test), and proteins with a p value below 0.01 were considered significantly changed. Biological processes were annotated with the DAVID online tool using the comprehensive urinary spectral library (Spec Lib 06 in Table 1).
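The roll-up and imputation steps can be sketched as follows. This is an illustration only, not the Spectronaut or Perseus implementation. The function names, the input row layout (protein, sample, intensity), and the example values are hypothetical.

```python
from collections import defaultdict


def protein_intensities(peptide_rows):
    """Sum peptide intensities per (protein, sample).

    peptide_rows: iterable of (protein, sample, intensity) tuples.
    """
    totals = defaultdict(float)
    for protein, sample, intensity in peptide_rows:
        totals[(protein, sample)] += intensity
    return totals


def impute_minimum(totals, proteins, samples):
    """Fill missing cells with the minimum observed value of the same protein.

    Assumes every protein was observed in at least one sample.
    """
    table = {}
    for protein in proteins:
        observed = [totals[(protein, s)] for s in samples if (protein, s) in totals]
        floor = min(observed)
        table[protein] = [totals.get((protein, s), floor) for s in samples]
    return table


rows = [
    ("P1", "S1", 10.0),
    ("P1", "S1", 5.0),
    ("P1", "S2", 4.0),
    ("P2", "S1", 7.0),  # P2 is missing in S2
]
totals = protein_intensities(rows)
table = impute_minimum(totals, proteins=["P1", "P2"], samples=["S1", "S2"])
print(table)
```

Minimum-value imputation treats a missing value as a signal at the lowest level observed for that protein, so it tends to favor the interpretation "below detection limit" over "missing at random".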
Biomarker Discovery
To find the most suitable biomarker candidates, we focused on the ten most significantly changed proteins for each of the five conditions. The performance of the biomarker candidates was assessed by calculating the area under the receiver-operating characteristic curve (AUROC) (Figure 6). For example, Cystatin-B (CYTB) is an intracellular thiol proteinase inhibitor. Its level is increased in ovarian cyst (OC) samples, with a p value of 1.3e-5 (Bonferroni-corrected: p = 0.027) and an AUROC of 0.91. Cystatin-B has been identified as a biomarker candidate in the context of malignant growths in the ovaries (3) and bladder (4), i.e. the genitourinary tract.
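For readers unfamiliar with the two statistics quoted above, the sketch below computes an AUROC as the fraction of case-control pairs in which the case has the higher intensity, and applies a Bonferroni correction. The function names, the example intensities, and the number of tests are hypothetical; the poster does not state how many tests entered its correction.

```python
def auroc(cases, controls):
    """Probability that a random case exceeds a random control (ties count 0.5)."""
    wins = 0.0
    for case_value in cases:
        for control_value in controls:
            if case_value > control_value:
                wins += 1.0
            elif case_value == control_value:
                wins += 0.5
    return wins / (len(cases) * len(controls))


def bonferroni(p_value, n_tests):
    """Bonferroni-corrected p value, capped at 1."""
    return min(1.0, p_value * n_tests)


# Hypothetical marker intensities for one diagnosis versus all other samples.
cases = [3.0, 5.0, 6.0]
controls = [1.0, 2.0, 4.0, 5.0]
print(auroc(cases, controls))
print(bonferroni(0.01, 10))  # hypothetical number of tests
```

An AUROC of 0.5 corresponds to a marker with no discriminating power and 1.0 to perfect separation of the two groups.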
[Figure 4 panels: extracted ion chromatograms (relative abundance versus time, RT 18.93 - 24.07 min) at 50 ppm, 20 ppm, and 10 ppm. Transitions: [y5+] 623.315 m/z; [y6+] 722.383 m/z; [y6+ -H3PO4] 624.406 m/z; [y5++] 312.161 m/z; [y3+ -NH3] 386.203 m/z; [y4+] 566.293 m/z.]
Spectronaut is a trademark of Biognosys AG. MaxQuant software is a trademark of Max-Planck Institute of Biochemistry. TripleTOF is a trademark of Sciex, Pte. Ltd. All other trademarks are the property of Thermo Fisher Scientific and its subsidiaries. This information is not intended to encourage use of these products in any manners that might infringe the intellectual property rights of others.
[Figure 5 panels: 5A, Identified Protein Groups; 5B, Identified Peptides (unique peptides); 5C, Sample Coverage (number of proteins versus protein sample coverage %, DDA and DIA); 5D, Significantly Changed Proteins (Mann-Whitney U test, p<0.05).]
[Figure 6 panels: for each condition, an ROC curve (sensitivity versus 1-specificity) and the marker intensity per group (OC, UTI, Con, MA, GE, Pain). Group legend: Pain Control Group, Ovarian Cyst, Urinary Tract Infection, Constipation, Mesenteric Adenitis, Gastroenteritis.]
Ovarian Cyst (OC): marker CYTB, AUROC = 0.91
Urinary Tract Infection (UTI): marker PERM (plot label reads PERE), AUROC = 0.968
Constipation (Con): marker UROK, AUROC = 0.928
Mesenteric Adenitis (MA): marker LEG3, AUROC = 0.872
Gastroenteritis (GE): marker KPYR, AUROC = 0.76
[Figure 1 workflow elements: Urinary samples (87), from 87 ER patients in six groups (Constipation, Mesenteric Adenitis, Gastroenteritis, Pain Control Group, Ovarian Cyst, Urinary Tract Infection); digestion; DIA (Q Exactive HF MS) with the optimized DIA method (average 8-10 data points per LC peak, 9 s @ FWHM; average cycle time 2 s); comprehensive urinary spectral library (2,575 protein groups from 87 Q Exactive MS and 23 TripleTOF 5600 runs; 19,854 unique peptides; 144,643 transitions); search against the database using Spectronaut, FDR 1%; scoring and Q value calculation by Spectronaut; validated peptides/proteins from 87 samples (number of proteins/peptides per sample, Patient 1 to Patient N); Perseus, U test (non-parametric test); KEGG pathway enrichment (p value scale 0.001, 0.05, 1). Inset: extracted ion chromatogram, RT 20.66 - 21.64 min, m/z 506.2709-506.2811.]
2 Overcoming the Challenges in Data Independent Acquisition (DIA) via High Resolution Accurate Mass Orbitrap Based Mass Spectrometer
Overcoming the challenges in Data Independent Acquisition (DIA) via high resolution accurate mass Orbitrap based mass spectrometer Yue Xuan1; Jan Muntel2; Sebastian T. Berger2; Andreas FR Huhmer3; Hanno Steen2; Thomas Moehring1 1Thermo Fisher Scientific, Bremen, GERMANY; 2Departments of Pathology, Boston Children’s Hospital, Boston, MA; 3Thermo Fisher Scientific, San Jose, CA
Conclusion We successfully established a DIA workflow, which enables large scope of urine
proteomics studies based on high resolution accurate mass Orbitrap technology.
Even without any prefractionation of the samples, we detected more than 2,500 proteins, which is close to complete coverage of the urinary proteome by our DIA workflow.
The highly reproducible quantification (median CV 7%) enables to skip technical replicates and to focus on the analysis of biological samples.
The high number of quantified proteins per sample (in average 1,300) per single DIA experiment enabled to get insights into the host cell response in an UTI based on changes in the urinary proteome as well into the formation of an ovarian cyst.
We envision that the comprehensiveness and low analysis time per sample will allow the application of this DIA workflow in synchronous biomarker discovery and validation which requires the analysis of hundreds of samples.
References 1. Rosenberger, G. SCIENTIFIC DATA | 1:140031 | DOI: 10.1038/sdata.2014.31
2. Zheng, J. BMC genomics 2013,14, 777
3. Takaya et al., Int J Oncol, 2015
4. Feldman et al., Clin Cancer Res, 2009
Overview Purpose: Utilize Thermo Scientific TM Q ExactiveTM HF mass spectrometer for DIA LC-MS/MS experiments to study the urinary proteome, and demonstrate how high resolution/ accurate mass spectrometry is employed to overcome DIA challenges and enable large scope of urine proteomics studies.
Methods: Urine samples from 87 patients associated with seven different differential diagnoses were collected and trypsinized using a membrane-based processing method in a 96-well plate format. For generation of a spectral library all samples were analyzed by DDA methods. DIA analysis was performed on the Q Exactive HF MS to quantitatively mapping the urinary proteomes.
Results: In this study, we have developed a robust DIA method for the comprehensive mapping of the urinary proteome that enables large scale urine proteomics studies with high throughput ( less than 4 days for 87 biological sample measurements) and excellent reproducibility (media of CV% is 7%).
Introduction
Data independent acquisition (DIA) strategies promise comprehensive and reproducible data collection for large-scale quantitative proteomics experiments. However, the wide isolation windows (usually >10 Da) of DIA experiments co-isolate and co-fragment multiple peptides, which results in highly complex DIA MS/MS spectra and makes DIA data analysis challenging. To accurately identify and precisely quantify thousands of proteins per DIA experiment, the completeness and specificity of the spectral library, the mass accuracy of the data, and the technical variance in quantitation play important roles. In this work, we use the Q Exactive HF mass spectrometer for DIA LC-MS/MS experiments to study the urinary proteome, and demonstrate how high-resolution, accurate-mass spectrometry is employed to overcome these DIA challenges and enable large-scale urine proteomics studies.
Methods
Sample Preparation
Urine samples were collected from consenting patients visiting the Emergency Department at Boston Children's Hospital. Samples from 87 patients associated with seven different differential diagnoses were collected (abdominal pain controls: n = 33, ovarian cyst: n = 12, mesenteric adenitis: n = 6, constipation: n = 7, urinary tract infection (UTI): n = 11, gastritis: n = 6, gastroenteritis: n = 12) and trypsinized using a membrane-based processing method in a 96-well plate format.
Liquid Chromatography
For the spectral library, all 87 samples were analyzed on a nanoLC system equipped with an LC-chip system (cHiPLC nanoflex, Eksigent; trapping column: Nano cHiPLC Trap column, 200 µm x 0.5 mm, Reprosil C18, 3 µm; analytical column: Nano cHiPLC column, 75 µm x 15 cm, Reprosil C18, 3 µm) coupled online to a Q Exactive mass spectrometer (Thermo Fisher Scientific). Peptides (4 µl of digest) were separated by a linear gradient from 93% buffer A (0.2% FA in water) / 7% buffer B (0.2% FA in ACN) to 75% buffer A / 25% buffer B within 75 min.
DIA experiments on all 87 samples: Each sample was analyzed on a Thermo Scientific™ EASY-nLC™ 1000 nanoLC system equipped with a trapping column (Thermo Scientific™ PepMap™ 100, 75 µm x 2 cm, C18, 3 µm) and an analytical column (PepMap RSLC, 75 µm x 25 cm, C18, 2 µm) coupled to a Q Exactive HF mass spectrometer equipped with a Thermo Scientific™ EASY-Spray™ nanoelectrospray ion source. Peptides (2 µl of digest) were separated by a linear gradient from 93% buffer A (0.2% FA in water) / 7% buffer B (0.2% FA in ACN) to 75% buffer A / 25% buffer B within 30 min. The total run time with loading and washing steps was 50 min. The column oven was set to 40 °C.
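As a rough check of the throughput implied by the 50 min total run time, the following sketch estimates the instrument time for all 87 DIA runs. It assumes, purely for illustration, that the runs are acquired back to back with no blanks, QC injections, or idle time; the names `N_SAMPLES`, `RUN_TIME_MIN`, and `total_days` are hypothetical.

```python
# Rough throughput estimate. Assumes back-to-back runs with no blanks, QC
# injections, or idle time (an illustrative assumption, not a stated method detail).
N_SAMPLES = 87       # one DIA run per urine sample
RUN_TIME_MIN = 50    # total run time including loading and washing


def total_days(n_samples, run_time_min):
    """Return the total instrument time in days for consecutive runs."""
    return n_samples * run_time_min / 60 / 24


days = total_days(N_SAMPLES, RUN_TIME_MIN)
print(f"{days:.2f} days")
```

Under these assumptions the estimate comes to about 3 days, which is consistent with the stated figure of less than 4 days for 87 samples.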
Mass Spectrometry
DDA method on QE: The MS was operated in data-dependent TOP10 mode with the following settings: mass range 400-1,000 Th; resolution for MS1 scan 70,000 @ 200 Th; lock mass: 445.120025 Th; resolution for MS2 scan 17,500 @ 200 Th; isolation width 1.6 m/z; NCE 27; underfill ratio 1%; charge state exclusion: unassigned, 1, >6; dynamic exclusion 30 s.
Additionally, a randomly chosen subset of 23 samples was analyzed on a commercial TripleTOF™ 5600 (AB Sciex, Concord, Canada) using the same LC setup and gradient. The MS was operated in data-dependent TOP50 mode with the following settings: MS1 mass range 400-1,000 Th with 250 ms accumulation time; MS2 mass range 100-1,700 Th with 50 ms accumulation time; MS2 selection criteria: UNIT resolution, intensity threshold 100 cts, charge states 2-5; dynamic exclusion 17 s.
FIGURE 1. The established DIA workflow for Urinary Biomarker discovery
FIGURE 2. Influence of Spectral Library: Percentage of peptides and proteins that are detected in all three replicates, in 2 out of 3 replicates, and in a single replicate.
FIGURE 3. Reproducibility Evaluation with Three Replicates: Technical Reproducibility of DDA and DIA Quantification Based on Three Replicates (Fig 3A). Number of Identifications in Relation to Technical Replicates (Fig 3B).
Results
Selecting the Most Appropriate Spectral Library
To investigate the influence of the spectral library on the data analysis, we generated six different spectral libraries (Table 1). The quality of a spectral library can be assessed by the completeness of the library and the reproducibility of peptide/protein detection.
The comprehensive urinary library comprised approximately 2,600 protein groups and 20,000 peptides (2,575 and 19,854, respectively; Spec Lib 06 in Table 1). Based on current studies, this library covers the vast majority of the urinary proteome (2).
We calculated the percentage of peptides and proteins that were detected in all three replicates, in 2 out of 3 replicates, or in only a single replicate (Figure 2). The results show a higher reproducibility of peptide/protein identification and quantification when using the urinary spectral library 06. This finding underscores the importance of using a sample-specific spectral library.
Highly Reproducible Peptide Detection and Quantitation with DIA Experiments
To elucidate how the DIA routine performs compared to a DDA workflow, we analyzed an unrelated sample three times with both a DDA method and the DIA method on the Q Exactive HF using the same 30 min gradient. The DIA data were analyzed using the comprehensive urinary library (Spec Lib 06, Table 1).
With our optimized DIA workflow (Figure 1), the median CVs of peptide and protein quantification for the DIA data were only 6.7% and 8.1%, respectively, roughly twofold lower than for the DDA data (15.7% and 16.7%; Figure 3A). The numbers of peptides (~5,220) and proteins (1,120) detected in a single DIA experiment are almost twice those from the DDA data (Figure 3B). Our DIA method shows highly reproducible and precise quantification at the peptide level.
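The median CV reported above can be illustrated with a minimal sketch. It assumes the CV is the sample standard deviation divided by the mean across the three technical replicates, and that analytes missing in any replicate are skipped; neither detail is stated in the text, and the peak areas below are hypothetical.

```python
import statistics


def cv_percent(values):
    """Coefficient of variation (%) = sample standard deviation / mean * 100."""
    return statistics.stdev(values) / statistics.mean(values) * 100


def median_cv(replicate_table):
    """Median CV (%) over analytes quantified in every replicate."""
    cvs = [
        cv_percent(areas)
        for areas in replicate_table.values()
        if all(a is not None and a > 0 for a in areas)
    ]
    return statistics.median(cvs)


# Hypothetical peak areas for three technical replicates (illustrative only).
peptide_areas = {
    "PEPTIDE_A": [1.00e6, 1.05e6, 0.95e6],
    "PEPTIDE_B": [2.0e5, 2.2e5, 1.8e5],
    "PEPTIDE_C": [5.0e4, None, 5.5e4],  # missing in one replicate, skipped
}
print(round(median_cv(peptide_areas), 1))
```

The same function applies to protein-level intensities once peptide values have been rolled up per protein.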
FIGURE 5. Comprehensive Urinary Proteome Coverage by DIA workflow
FIGURE 6. Top Biomarker Discovery by using DIA workflow
The Power of Mass Accuracy
The application of wide isolation windows (>10 Th) in typical DIA experiments results in complex MS/MS spectra. During data analysis, the ion chromatograms of multiple fragment ions are extracted and aligned for peptide detection and quantification. To separate the analyte of interest from interferences, highly accurate ion masses are crucial. We applied different mass tolerances in the data analysis (50 ppm, 20 ppm, and 10 ppm). One example is shown in Figure 4. With a ± 2.5 min retention time window and a 50 ppm mass tolerance for extracting the multiple transitions of the peptide LVGYLDR, several interferences overlapped with the peptide of interest. Only at a 10 ppm mass tolerance were the interferences removed from the extracted traces; therefore, 10 ppm mass accuracy is the minimum requirement for accurate peptide detection.
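To make the effect of the mass tolerance concrete, the sketch below converts a ppm tolerance into an m/z extraction window for the y5+ fragment of LVGYLDR (623.315 m/z, taken from the Figure 4 legend). The function names are hypothetical, and a symmetric window around the theoretical m/z is assumed.

```python
def ppm_window(mz, tol_ppm):
    """Return the (low, high) m/z bounds for a symmetric ppm tolerance."""
    delta = mz * tol_ppm * 1e-6
    return mz - delta, mz + delta


def within_tolerance(observed_mz, target_mz, tol_ppm):
    """True if observed_mz falls inside the ppm window around target_mz."""
    low, high = ppm_window(target_mz, tol_ppm)
    return low <= observed_mz <= high


Y5_MZ = 623.315  # y5+ fragment of LVGYLDR, from the Figure 4 legend
for tol in (50, 20, 10):
    low, high = ppm_window(Y5_MZ, tol)
    print(f"{tol} ppm: {low:.4f}-{high:.4f}")
```

A hypothetical interfering ion at 623.335 m/z would fall inside the 50 ppm window but outside the 20 ppm and 10 ppm windows, which is the mechanism by which tighter tolerances remove interferences from the extracted traces.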
TABLE 1. Spectral libraries
Library Name | Protein Groups | Peptides | Fragment Ions | Instrument | Source
Spec Lib 01 | 2,077 | 15,470 | 109,064 | Q Exactive MS | 23 DDA files, urine
Spec Lib 02 | 1,436 | 9,569 | 65,880 | TripleTOF 5600 | 23 DDA files, urine
Spec Lib 03 | 2,226 | 16,985 | 123,087 | Combined from Spec Lib 01 and 02 | Urine
Spec Lib 04 | 14,158 | 149,420 | 2,832,306 | TripleTOF 5600 | See Rosenberger et al., 2014 (1)
Spec Lib 05 | 1,869 | 40,902 | 925,156 | TripleTOF 5600 | Subset of Human-14,000
Spec Lib 06 | 2,575 | 19,854 | 144,643 | 87 Q Exactive MS DDA files + 23 TripleTOF 5600 DDA files | Urine
DIA method on QE HF: Each DIA duty cycle contains one full scan and 24 DIA MS/MS scans covering the mass range 400-1,000 Th. Full scan: resolution 30,000 @ 200 Th; AGC target 3e6; maximal IT 50 ms; mass range 400-1,000 Th. DIA scans: resolution 30,000 @ 200 Th; isolation width 20 m/z for the first 20 DIA scans, 40 m/z for the following 2 DIA scans, and 60 m/z for the last 2 DIA scans; NCE 30; target value 1e6; maximal injection time set to “auto”, which automatically calculates the maximal injection time from the detection time so that the mass spectrometer always operates in parallel ion filling and detection mode.
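The isolation scheme described above can be sketched as a list of window boundaries. This is an illustration only: it assumes the windows are contiguous, non-overlapping, and start exactly at 400 m/z, none of which is stated explicitly, and `build_windows` is a hypothetical helper.

```python
def build_windows(start_mz, scheme):
    """Build contiguous (low, high) isolation windows.

    scheme is a list of (count, width) pairs applied in order.
    """
    windows = []
    low = start_mz
    for count, width in scheme:
        for _ in range(count):
            windows.append((low, low + width))
            low += width
    return windows


# 20 windows of 20 m/z, then 2 of 40 m/z, then 2 of 60 m/z (from the method text).
SCHEME = [(20, 20), (2, 40), (2, 60)]
windows = build_windows(400, SCHEME)
print(len(windows), windows[0], windows[-1])
```

The widths sum to 20 x 20 + 2 x 40 + 2 x 60 = 600 m/z, which matches the 400-1,000 Th range covered by the 24 DIA scans.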
FIGURE 4. Ion chromatograms of multiple transitions of the peptide LVGYLDR of protein P07602, extracted with mass tolerances of 50 ppm, 20 ppm, and 10 ppm. The retention time window is ± 2.5 min. A mass accuracy of 10 ppm is the minimum requirement for accurate peptide detection.
Comprehensive Urinary Proteome Coverage by DIA workflow
The established DIA workflow (Figure 1) was applied to a urinary study comprising 87 samples. The samples were derived from patients with abdominal pain in a pediatric emergency room. After digestion in a 96-well plate format, we were able to analyze them in less than 4 days with our optimized DIA workflow. On average, we detected 1,301 protein groups (848-1,720) (Figure 5A) and 5,714 peptides (3,172-8,231) per sample (Figure 5B). In total, our DIA workflow enabled the detection of 2,456 proteins, representing 95% of the proteins in the spectral library. These numbers contrast with the DDA output, in which only 7% of the proteins were identified in more than 95% of the samples and 60% in less than 25% of the samples (Figure 5C). Compared to all other samples, 773 proteins were significantly changed in abundance in the UTI samples (non-parametric Mann-Whitney U test, p<0.05), 502 in the ovarian cyst samples, 209 in the constipation samples, 111 in the mesenteric adenitis samples, and 58 in the gastroenteritis samples (Figure 5D).
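The sample coverage figures quoted above (the share of proteins found in more than 95% or less than 25% of samples) can be computed as in the following sketch. The detection matrix is hypothetical, and the function names are illustrative rather than part of any software used in the study.

```python
def sample_coverage(detections):
    """Percent of samples in which each protein was detected.

    detections maps protein -> list of booleans, one per sample.
    """
    return {
        protein: 100 * sum(flags) / len(flags)
        for protein, flags in detections.items()
    }


def fraction_above(coverage, threshold_pct):
    """Percent of proteins detected in more than threshold_pct of samples."""
    hits = sum(1 for c in coverage.values() if c > threshold_pct)
    return 100 * hits / len(coverage)


# Hypothetical detection matrix for four samples (illustrative only).
detections = {
    "PROT_A": [True, True, True, True],
    "PROT_B": [True, False, True, True],
    "PROT_C": [False, False, True, False],
    "PROT_D": [True, True, True, True],
}
coverage = sample_coverage(detections)
print(coverage["PROT_B"], fraction_above(coverage, 95))
```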
[Figure 2 panels: Peptide Level (y-axis: Identified Peptides) and Protein Level (y-axis: Identified Proteins); legend: 3of3, 2of3, 1of3.]
[Figure 3B panels: Peptide Level (y-axis: Identified Peptides) and Protein Level (y-axis: Identified Proteins); x-axis: Replicate (1-3); legend: DDA, ID increase DDA, DDA - matching, ID increase DDA - matching, DIA, ID increase DIA.]
[Figure 3A panels: Coefficient of Variance (%) versus Log10 Protein Intensity and Log10 Peptide Intensity. Protein level: DDA (peptide peak areas) median CV 16.7%, DIA (fragment ion peak areas) median CV 8.1%. Peptide level: DDA (peptide peak areas) median CV 15.7%, DIA (fragment ion peak areas) median CV 6.7%.]
Data Analysis
Database searches were performed in MaxQuant software, version 1.5.0.0, directly on the RAW and WIFF files using the human UniProt database. Trypsin was specified as the enzyme with up to 2 missed cleavages. Mass tolerances were set to 20 ppm for the first search and 4.5 ppm for the main search for the Q Exactive data, and to 0.1 Da for the first search and 0.01 Da for the main search for the TripleTOF 5600 data. Oxidation of M was chosen as dynamic modification (+15.995 Da) and carbamidomethylation of C as static modification (+57.021 Da). FDR was set to 1% at the peptide and protein level.
All DIA data were analyzed directly in Spectronaut 7.0 (Biognosys) with dynamic score refinement and MS1 scoring enabled, and with interference correction and cross-run normalization (total peak area) enabled. All results were filtered by a Q value of 0.01 (equivalent to an FDR of 1% at the peptide level).
Protein intensity was calculated by summing the peptide intensities of each protein from the Spectronaut output file. The data were imported into Perseus 1.5.1.6, and missing values were imputed with the minimum value for each protein. The significance of protein abundance changes was calculated using the u-test (non-parametric test), and proteins with a p value below 0.01 were considered to be significantly changed. The annotation of biological processes was done with the DAVID online tool using the comprehensive urinary spectral library (Spec Lib 06 in Table 1).
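The protein roll-up and imputation steps described above can be sketched as follows. This is an illustration with hypothetical data and function names; it does not reproduce the Spectronaut output format or the Perseus implementation.

```python
from collections import defaultdict


def protein_intensities(peptide_rows):
    """Sum peptide intensities per protein and sample."""
    totals = defaultdict(dict)
    for protein, sample, intensity in peptide_rows:
        totals[protein][sample] = totals[protein].get(sample, 0.0) + intensity
    return totals


def impute_minimum(totals, samples):
    """Fill missing samples with that protein's minimum observed intensity."""
    filled = {}
    for protein, by_sample in totals.items():
        floor = min(by_sample.values())
        filled[protein] = {s: by_sample.get(s, floor) for s in samples}
    return filled


# Hypothetical (protein, sample, peptide intensity) rows, illustrative only.
rows = [
    ("PROT_A", "S1", 100.0),
    ("PROT_A", "S1", 50.0),
    ("PROT_A", "S2", 80.0),
    ("PROT_B", "S1", 20.0),  # PROT_B has no value in S2
]
table = impute_minimum(protein_intensities(rows), ["S1", "S2"])
print(table)
```

In this toy example PROT_B has no peptide in sample S2, so its S2 value is filled with its own minimum observed intensity.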
Biomarker Discovery
To find the most suitable biomarker candidates, we focused on the ten most significantly changed proteins for each of the five conditions. The performance of the biomarker candidates was assessed by calculating the area under the receiver operating characteristic curve (AUROC) (Figure 6). For example, Cystatin-B (CYTB), an intracellular thiol proteinase inhibitor, is present at an increased level in ovarian cyst (OC) samples, with a p value of 1.3e-5 (Bonferroni-corrected: p = 0.027) and an AUROC of 0.91. Cystatin-B has been identified as a biomarker candidate in the context of malignant growths in the ovaries (3) and bladder (4), i.e. the genitourinary tract.
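For readers unfamiliar with the two statistics quoted above, the sketch below shows one common way to compute them: the AUROC as the probability that a case sample scores higher than a control sample (ties counted as one half), and the Bonferroni correction as the raw p value multiplied by the number of tests. The intensities are hypothetical, and this is not necessarily how the values in this work were computed.

```python
def auroc(case_values, control_values):
    """AUROC as P(case > control), counting ties as 0.5."""
    wins = 0.0
    for c in case_values:
        for k in control_values:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(case_values) * len(control_values))


def bonferroni(p_value, n_tests):
    """Bonferroni-corrected p value, capped at 1."""
    return min(1.0, p_value * n_tests)


# Hypothetical log10 marker intensities (illustrative only).
cases = [9.0, 8.0, 7.5, 6.0]
controls = [5.0, 6.5, 4.0, 3.0, 7.0]
print(auroc(cases, controls))
```

Under this definition of the correction, the quoted pair of values (1.3e-5 raw, 0.027 corrected) would correspond to a multiplier on the order of 2,000 tests.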
[Figure 4 panels: 50 ppm, 20 ppm, 10 ppm. x-axis: Time (min), RT 18.93-24.07; y-axis: Relative Abundance. Extracted transitions: y5+ (623.315 m/z), y6+ (722.383 m/z), y6+ -H3PO4 (624.406 m/z), y5++ (312.161 m/z), y3+ -NH3 (386.203 m/z), y4+ (566.293 m/z). Example extraction ranges for y5+: 623.2838-623.3462 (50 ppm), 623.3025-623.3275 (20 ppm), 623.3088-623.3212 (10 ppm). Data file: DIA_B9.]
Spectronaut is a trademark of Biognosys AG. MaxQuant software is a trademark of Max-Planck Institute of Biochemistry. TripleTOF is a trademark of Sciex, Pte. Ltd. All other trademarks are the property of Thermo Fisher Scientific and its subsidiaries. This information is not intended to encourage use of these products in any manners that might infringe the intellectual property rights of others.
[Figure 5 panels: 5A Identified Protein Groups (y-axis: Protein Groups); 5B Identified Peptides (y-axis: Unique Peptides); 5C Sample Coverage, DDA versus DIA (x-axis: Protein Sample Coverage %, y-axis: Proteins); 5D Significantly Changed Proteins (Mann-Whitney U test, p<0.05). Sample groups: Pain Control Group, Ovarian Cyst, Urinary Tract Infection, Constipation, Mesenteric Adenitis, Gastroenteritis.]
[Figure 6 panels: ROC curves (x-axis: 1-Specificity, y-axis: Sensitivity) and marker intensities across the groups OC, UTI, Con, MA, GE, and Pain. Ovarian Cyst (OC) marker CYTB, AUROC = 0.91; Urinary Tract Infection (UTI) marker PERM (panel label reads PERE), AUROC = 0.968; Constipation (Con) marker UROK, AUROC = 0.928; Mesenteric Adenitis (MA) marker LEG3, AUROC = 0.872; Gastroenteritis (GE) marker KPYR, AUROC = 0.76.]
[Figure 1 workflow elements: Urinary Samples (87; 87 ER patients in six groups: Constipation, Mesenteric Adenitis, Gastroenteritis, Pain Control Group, Ovarian Cyst, Urinary Tract Infection); Digestion; DIA (Q Exactive HF MS); Comprehensive Urinary Spectral Library (2,575 protein groups from 87 Q Exactive MS and 23 TripleTOF 5600 runs; 19,854 unique peptides; 144,643 transitions); search against database using Spectronaut, FDR 1%; scoring and Q value calculation by Spectronaut; validated peptides/proteins from 87 samples; Perseus, u-test (non-parametric test); KEGG Pathway Enrichment (p value scale: 0.001, 0.05, 1).]
[Optimized DIA method: on average 8-10 data points per LC peak (9 s at FWHM); average cycle time 2 s. Inset chromatogram: Relative Abundance versus Time (min), RT 20.66-21.64, base peak m/z 506.2709-506.2811, NL 2.35E7, data file DIA_B9.]
Overcoming the challenges in Data Independent Acquisition (DIA) via high resolution accurate mass Orbitrap based mass spectrometer Yue Xuan1; Jan Muntel2; Sebastian T. Berger2; Andreas FR Huhmer3; Hanno Steen2; Thomas Moehring1 1Thermo Fisher Scientific, Bremen, GERMANY; 2Departments of Pathology, Boston Children’s Hospital, Boston, MA; 3Thermo Fisher Scientific, San Jose, CA
Conclusion We successfully established a DIA workflow, which enables large scope of urine
proteomics studies based on high resolution accurate mass Orbitrap technology.
Even without any prefractionation of the samples, we detected more than 2,500 proteins, which is close to complete coverage of the urinary proteome by our DIA workflow.
The highly reproducible quantification (median CV 7%) enables to skip technical replicates and to focus on the analysis of biological samples.
The high number of quantified proteins per sample (in average 1,300) per single DIA experiment enabled to get insights into the host cell response in an UTI based on changes in the urinary proteome as well into the formation of an ovarian cyst.
We envision that the comprehensiveness and low analysis time per sample will allow the application of this DIA workflow in synchronous biomarker discovery and validation which requires the analysis of hundreds of samples.
References 1. Rosenberger, G. SCIENTIFIC DATA | 1:140031 | DOI: 10.1038/sdata.2014.31
2. Zheng, J. BMC genomics 2013,14, 777
3. Takaya et al., Int J Oncol, 2015
4. Feldman et al., Clin Cancer Res, 2009
Overview Purpose: Utilize Thermo Scientific TM Q ExactiveTM HF mass spectrometer for DIA LC-MS/MS experiments to study the urinary proteome, and demonstrate how high resolution/ accurate mass spectrometry is employed to overcome DIA challenges and enable large scope of urine proteomics studies.
Methods: Urine samples from 87 patients associated with seven different differential diagnoses were collected and trypsinized using a membrane-based processing method in a 96-well plate format. For generation of a spectral library all samples were analyzed by DDA methods. DIA analysis was performed on the Q Exactive HF MS to quantitatively mapping the urinary proteomes.
Results: In this study, we have developed a robust DIA method for the comprehensive mapping of the urinary proteome that enables large scale urine proteomics studies with high throughput ( less than 4 days for 87 biological sample measurements) and excellent reproducibility (media of CV% is 7%).
Introduction The promises of data independent acquisition (DIA) strategies are a comprehensive and reproducible data collection for large-scale quantitative proteomics experiments. However, the wide isolation window (usually >10Da) of DIA experiments co-isolates and co-fragments multiple peptides, resulting in highly complex DIA MS/MS spectra and makes the DIA data analysis challenging. To accurately identify and precisely quantify thousands of proteins per DIA experiment, the completeness and specificity of spectral library, the mass accuracy of the data, and the technical variance in quantitation play important roles. In this work, we utilize the Q Exactive HF mass spectrometer for DIA LC-MS/MS experiments to study the urinary proteome, and demonstrate how high resolution/ accurate mass spectrometry is employed to overcome these DIA challenges and enable large scope of urine proteomics studies..
Methods Sample Preparation
Urine samples were collected from consenting patients visiting the Emergency Department at Boston Children's Hospital. Upon consent, urine samples from 87 patients associated with seven different differential diagnoses were collected (abdominal pain controls: n = 33, ovarian cyst: n = 12, mesenteric adenitis: n = 6, constipation n = 7, urinary tract infection; UTI: n = 11, gastritis: n = 6, gastroenteritis: n = 12) and trypsinized using a membrane-based processing method in a 96-well plate format.
Liquid Chromatography
For the spectral library, all 87 samples were analyzed by a nanoLC system equipped with a LC-chip system (cHiPLC nanoflex, Eksigent, trapping column: Nano cHiPLC Trap column 200 μm x 0.5 mm Reprosil C18, 3 μm, analytical column: Nano cHiPLC column 75 μm x 15 cm Reprosil C18, 3 μm) coupled online to a Q-Exactive mass spectrometer (Thermo Fisher Scientific). Peptides (4 µl of digest) were separated by a linear gradient from 93% buffer A (0.2% FA in water) / 7% buffer B (0.2% FA in ACN) to 75% buffer A / 25% buffer B within 75 min.
DIA experiments on all 87 samples: Each sample was analyzed on a Thermo ScientificTM EASY-nLCTM 1000 nanoLC system equipped with a trapping column (Thermo ScientificTM PepMapTM100, 75um x 2cm, C18, 3um) and an analytical column (PepMapRSLC, 75um x 25cm, C18, 2um) coupled to a Q Exactive HF mass spectrometer equipped with a Thermo Scientific TM EASY-Spray TM nanoelectrospray ion source. Peptides (2 µl of digest) were separated by a linear gradient from 93% buffer A (0.2% FA in water) / 7% buffer B (0.2% FA in ACN) to 75% buffer A / 25% buffer B within 30 min. The total run time with loading and washing steps was 50 min. Column oven was set to 40°C .
Mass Spectrometry
DDA method on QE: The MS was operated in data-dependent TOP10 mode with the following settings: mass range 400-1,000 Th; resolution for MS1 scan 70,000 @ 200 Th; lock mass: 445.120025 Th; resolution for MS2 scan 17,500 @ 200 Th; isolation width 1.6 m/z; NCE 27; underfill ratio 1%; charge state exclusion: unassigned, 1, >6; dynamic exclusion 30 s.
Additionally a subset of randomly chosen 23 samples was analyzed on a commercial TripleTOF TM 5600 (AB Sciex, Concord, Canada) using the same LC setup and gradient. The MS was operated in data-dependent TOP50 mode with following settings: MS1 mass range 400-1,000 Th with 250 ms acc. time; MS2 mass range 100-1,700 Th with 50 ms acc. time and following MS2 selection criteria: UNIT resolution, intensity threshold 100 cts; charge states 2-5. Dynamic exclusion set to 17 s.
FIGURE 1. The established DIA workflow for Urinary Biomarker discovery
FIGURE 2. Influence of Spectral Library: Percentage of peptides and proteins that are detected in all three replicates, in 2 out of 3 replicates, in single replicate.
FIGURE 3. Reproducibility Evaluation with Three Replicates: Technical Reproducibility Of DDA And DIA Quantification Based On Three Replicates (Fig 3A). Number of Identification in Relation to Technical Replicates ( Fig 3B)
Results Selecting the Most Appropriate Spectral Library
To investigate the influence of the spectral library on the data analysis, we generated six different spectral libraries (Table 1). The quality of a spectral library can be assessed by the completeness of the library and the reproducibility of peptide/protein detection.
The comprehensive urinary library compromised 2,600 protein groups and 20,000 peptides (Spec Lib 06 in Table 1). Based on current studies, this library covers the vast majority of the urinary proteome (2).
We calculated the percentage of peptides and proteins that were detected in all three replicates, in 2 out of 3 replicates or in only one single replicate (Figure 2). The results show a higher reproducibility of peptide/protein identification and quantification when using the urinary spectral library 06. This finding underscores the importance of using a sample-specific spectral library .
High Reproducible Peptide Detection and Quantitation with DIA experiments
To elucidate how the DIA routine performs compared to a DDA workflow, we analyzed an unrelated sample three times by a DDA and the DIA method on the Q Exactive HF using the same 30 min gradient. The DIA data were analyzed using the comprehensive urinary library (Spec lib 06, Table 1).
With our optimized DIA workflow (Figure 1), the median CV of the peptide/protein quantification of the DIA data was only 6.7% and 8.1%, which is twofold better than from the DDA data (Figure 3A). The number of detected peptides (~5,220) and proteins (1,120) with a single DIA experiment are almost twofold as the DDA data (Figure 3B). Our DIA method show a highly reproducible and precise quantification on peptide level.
FIGURE 5. Comprehensive Urinary Proteome Coverage by DIA workflow
FIGURE 6. Top Biomarker Discovery by using DIA workflow
The power of Mass Accuracy
The application of wide isolation windows (>10 Th) in typical DIA experiments results in complex MS/MS spectra. During data analysis, the ion chromatograms of multiple fragment ions are extracted and aligned for peptide detection and quantification. To separate the analyte of interest from interferences, a highly accurate mass of the ions is crucial. We applied different mass tolerances for the data analysis (50 ppm, 20 ppm, and 10 ppm). One example is shown in Figure 4. With ± 2.5 mins retention time window and 50 ppm mass tolerance for extracting the multiple transitions of a peptide LVGYLDR, several interferences overlapped with the peptide of interest. Only with 10 ppm mass accuracy, the interferences are removed from the spectrum, therefore, 10 ppm mass accuracy is minimum requirement for an accurate detection of peptides .
TABLE 1. Spectral libraries
Library Name Protein Groups Peptides Fragment ions Instrument Source
Spec Lib 01 2,077 15,470 109,064 Q Exactive MS 23 DDA files
Urine
Spec Lib 02 1,436 9,569 65,880 TripleTOF 5600 23 DDA files
Urine
Spec Lib 03 2,226 16,985 123,087 Combined from Spec Lib 01 and 02
Urine
Spec Lib 04 14,158 149,420 2,832,306 TripleTOF 5600 see Rosenberger et al., 2014 (1)
Spec Lib 05 1,869 40,902 925,156 TripleTOF 5600 subset of Human-14,000
Spec Lib 06 2,575 19,854 144,643 87 Q Exactive MS DDA files + 23 TripleToF 5600
DDA files
Urine
DIA method on QE HF: Each DIA duty cycle contains one full scan and 24DIA MS/MS scans to covering the mass range 400 -1000 TH. Full scan with a resolution 30,000 @ 200 Th; AGC target – 3e6, maximal IT – 50ms; mass range 400-1,000 Th; followed by DIA scans with resolution 30,000 @ 200 Th; isolation width 20 m/z for the first 20 DIA scans, 40 m/z for the following 2 DIA scans, and 60m/z for the last two DIA scans; NCE 30; target value 1e6, maximal injection time set to “auto”, which automatically calculates the maximal injection time based on the detection time to allow the mass spectrometer always operating in the parallel ion filling and detection mode.
FIGURE 4. The ion chromatograms of multiple transitions of peptide LVGYLDR of protein (P07602) are extracted with 50 ppm, 20 ppm, and 10 ppm. Retention time window is ± 2.5 mins. 10 ppm mass accuracy is the minimum requirement for an accurate peptide detection.
Comprehensive Urinary Proteome Coverage by DIA workflow
The establishment of the DIA workflow (Figure 1) was applied to a urinary study compromising 87 samples. The samples were derived from patients with abdominal pain in a pediatric emergency room. After digestion in a 96-well plate format we were able to analyze them in less than 4 days by our optimized DIA workflow. In average, we detected 1,301 protein groups (848 – 1,720) (Figure 5A) and 5,714 peptides per sample (3,172 – 8,231) (Figure 5B). In total, our DIA workflow enabled the detection of 2,456 proteins, representing 95% of the proteins in the spectral library. These numbers are contrasted by the DDA output, in which only 7% of the proteins were identified in more than 95% of the samples and 60% in less than 25% of the samples (Figure 5C). Compared to all other samples, 773 proteins were significantly changed in their amount in the UTI samples (non-parametric Mann-Whitney U test, p<0.05), 502 in the ovarian cyst samples, 209 in the constipation samples, 111 in the mesenteric adenitis samples and 58 in the gastroenteritis samples (Figure 5D).
73
77
71
26
61
69
12 10
13 22 15
13 15 13
16
52
24 18
0
1000
2000
3000
4000
5000
Iden
tifie
d Pe
ptid
es
Peptide Level
3of3 2of3 1of3
80
81
75
22
62
75
8 8 10
18
12 9 12 12
15
60
25 15
0
400
800
1200
1600
Iden
tifie
d Pr
otei
ns
Protein Level
3B
1 2 3 0
1000
2000
3000
4000
5000
6000
7000
Replicate
Iden
tifie
d Pe
ptid
es
Peptide Level 3A
1 2 3 0
200
400
600
800
1000
1200
1400
Replicate
Iden
tifie
d Pr
otei
ns
Protein Level
DDA ID increase DDA DDA - matching ID increase DDA - matching DIA ID increase DIA
6 8 10
Median CV 16.7%
20
0
40
60
80
100
120
140
Protein Level
DDA – Peptide Peak Areas
Log10 Protein Intensity
Coe
ffici
ent O
f Var
ianc
e %
2 4 6 8
Median CV
8.1% 20
0
40
60
80
100
120
140 DIA – Fragment Ion Peak Areas
Log10 Protein Intensity
10 7 9
Median CV 15.7%
20
0
40
60
80
100
120
140
6 8 7 4 6 8
Median CV
6.7% 20
0
40
60
80
100
120
140
3 5
Peptide Level
DIA – Fragment Ion Peak Areas
DDA – Peptide Peak Areas
Log10 Peptide Intensity
Coe
ffici
ent O
f Var
ianc
e %
Log10 Peptide Intensity
Data Analysis
Database search in MaxQuant software, version 1.5.0.0 was performed directly with the RAW- and WIFF files using the human UniProt database. Trypsin with up to 2 missed cleavages; mass tolerances set to 20 ppm for the first search and 4.5 ppm for the second search for the Q-Exactive data and 0.1 Da for the first search and 0.01 for the main search for the TripleTOF 5600 data. Oxidation of M was chosen as dynamic modification (+15.995 Da) and carbamidomethylation of C as static modification (+57.021 Da). FDR was set to 1% on peptide and protein level.
All DIA data were directly analyzed in Spectronaut 7.0 (Biognosys). dynamic score refinement and MS1 scoring – enabled; interference correction and cross run normalization (total peak area) – enabled; All results were filtered by a Q value of 0.01 (equals a FDR of 1% on peptide level).
Protein intensity was calculated by summing the peptide intensities of each protein from the Spectronaut output file. The data were imported into Perseus 1.5.1.6 and missing values imputed by the minimum value for each protein. Significance of protein abundance changes were calculated using the u-test (non-parametric test) and protein with a p value below 0.01 were considered to be significantly changed. The annotation of biological process was done with the DAVID online tool using the comprehensive urinary spectral library- Spec Lib 06 in Table 1).
Biomarker Discovery
To find the most suitable biomarker candidates, we focused on the ten most significantly changed proteins for each of the five conditions. The performance of the biomarker candidates was assessed by calculating the area under the receiver operating characteristic curve (AUROC) (Figure 6). For example, Cystatin-B (CYTB) is an intracellular thiol proteinase inhibitor. Its level is increased in ovarian cyst (OC) samples, with a p value of 1.3e-5 (Bonferroni-corrected: p = 0.027) and an AUROC of 0.91. Cystatin-B has been identified as a biomarker candidate in the context of malignant growths in the ovaries (3) and bladder (4), i.e. the genitourinary tract.
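As a reminder of what the AUROC measures, the sketch below computes it directly from its rank interpretation: the probability that a randomly chosen sample from the condition of interest has a higher marker intensity than a randomly chosen sample from the remaining groups. The function name and the intensity values are hypothetical, and this is not necessarily how the values in Figure 6 were calculated.

```python
def auroc(positives, negatives):
    """Area under the ROC curve via pairwise comparison.

    Counts the fraction of (positive, negative) pairs in which the
    positive sample has the higher value; ties count as one half.
    """
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Made-up log10 intensities of a candidate marker.
condition = [5.0, 6.0, 7.0]
others = [4.0, 5.5, 3.0, 2.0]
print(auroc(condition, others))
```

An AUROC of 0.5 corresponds to a marker that does not separate the groups at all, and 1.0 to perfect separation.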
[Figure 4 panels: extracted ion chromatograms (relative abundance versus time, RT 18.93 - 24.07 min) at 50 ppm, 20 ppm, and 10 ppm mass tolerance. Transitions: [y5+] 623.315 m/z; [y6+] 722.383 m/z; [y6+ -H3PO4] 624.406 m/z; [y5++] 312.161 m/z; [y3+ -NH3] 386.203 m/z; [y4+] 566.293 m/z.]
Spectronaut is a trademark of Biognosys AG. MaxQuant software is a trademark of Max-Planck Institute of Biochemistry. TripleTOF is a trademark of Sciex, Pte. Ltd. All other trademarks are the property of Thermo Fisher Scientific and its subsidiaries. This information is not intended to encourage use of these products in any manners that might infringe the intellectual property rights of others.
[Figure 5 panels: 5A, identified protein groups per sample; 5B, identified unique peptides per sample; 5C, protein sample coverage (%) for DDA and DIA; 5D, significantly changed proteins per condition (Mann-Whitney U test, p<0.05). Sample groups: pain control group, ovarian cyst, urinary tract infection, constipation, mesenteric adenitis, gastroenteritis.]
[Figure 6 panels: ROC curves (sensitivity versus 1-specificity) and marker intensities (log scale) across the groups OC, UTI, Con, MA, GE, and Pain.
Ovarian Cyst (OC), marker CYTB: AUROC = 0.91
Urinary Tract Infection (UTI), marker PERM (plot label: PERE): AUROC = 0.968
Constipation (Con), marker UROK: AUROC = 0.928
Mesenteric Adenitis (MA), marker LEG3: AUROC = 0.872
Gastroenteritis (GE), marker KPYR: AUROC = 0.76]
[Figure 1 workflow elements: 87 ER patients in six groups (constipation, mesenteric adenitis, gastroenteritis, pain control group, ovarian cyst, urinary tract infection); urinary samples (87); digestion; DIA (Q Exactive HF MS); search against database using Spectronaut, FDR 1%; scoring and Q value calculation by Spectronaut; validated peptides/proteins from 87 samples; Perseus, U test (non-parametric test); KEGG pathway enrichment (p value scale 0.001, 0.05, 1).
Comprehensive urinary spectral library: 2,575 protein groups (87 Q Exactive MS and 23 TripleTOF 5600 runs), 19,854 unique peptides, 144,643 transitions.
Optimized DIA method: on average 8-10 data points per LC peak (9 s at FWHM), average cycle time 2 s. Example chromatogram: m/z 506.2709-506.2811, RT 20.66 - 21.64 min.]
Overcoming the Challenges in Data Independent Acquisition (DIA) via High Resolution Accurate Mass Orbitrap Based Mass Spectrometer
Yue Xuan1; Jan Muntel2; Sebastian T. Berger2; Andreas FR Huhmer3; Hanno Steen2; Thomas Moehring1
1Thermo Fisher Scientific, Bremen, Germany; 2Departments of Pathology, Boston Children’s Hospital, Boston, MA; 3Thermo Fisher Scientific, San Jose, CA
Conclusion
We successfully established a DIA workflow, based on high resolution accurate mass Orbitrap technology, that enables large-scale urine proteomics studies.
Even without any prefractionation of the samples, our DIA workflow detected close to 2,500 proteins (2,456), which approaches complete coverage of the urinary proteome.
The highly reproducible quantification (median CV 7%) makes it possible to skip technical replicates and to focus on the analysis of biological samples.
The high number of quantified proteins per sample (on average 1,300) in a single DIA experiment provided insights into the host cell response in a UTI, based on changes in the urinary proteome, as well as into the formation of an ovarian cyst.
We envision that the comprehensiveness and short analysis time per sample will allow the application of this DIA workflow to simultaneous biomarker discovery and validation, which requires the analysis of hundreds of samples.
References
1. Rosenberger, G. et al., Scientific Data, 2014, 1:140031, DOI: 10.1038/sdata.2014.31
2. Zheng, J., BMC Genomics, 2013, 14, 777
3. Takaya et al., Int J Oncol, 2015
4. Feldman et al., Clin Cancer Res, 2009
Overview
Purpose: Utilize the Thermo Scientific™ Q Exactive™ HF mass spectrometer for DIA LC-MS/MS experiments to study the urinary proteome, and demonstrate how high resolution/accurate mass spectrometry is employed to overcome DIA challenges and enable large-scale urine proteomics studies.
Methods: Urine samples from 87 patients associated with seven different differential diagnoses were collected and trypsinized using a membrane-based processing method in a 96-well plate format. For generation of a spectral library, all samples were analyzed by DDA methods. DIA analysis was performed on the Q Exactive HF MS to quantitatively map the urinary proteomes.
Results: In this study, we developed a robust DIA method for the comprehensive mapping of the urinary proteome that enables large-scale urine proteomics studies with high throughput (less than 4 days for 87 biological sample measurements) and excellent reproducibility (median CV of 7%).
Introduction
Data independent acquisition (DIA) strategies promise comprehensive and reproducible data collection for large-scale quantitative proteomics experiments. However, the wide isolation windows (usually >10 Da) of DIA experiments co-isolate and co-fragment multiple peptides, resulting in highly complex DIA MS/MS spectra and making the DIA data analysis challenging. To accurately identify and precisely quantify thousands of proteins per DIA experiment, the completeness and specificity of the spectral library, the mass accuracy of the data, and the technical variance in quantitation play important roles. In this work, we utilize the Q Exactive HF mass spectrometer for DIA LC-MS/MS experiments to study the urinary proteome, and demonstrate how high resolution/accurate mass spectrometry is employed to overcome these DIA challenges and enable large-scale urine proteomics studies.
Methods
Sample Preparation
Urine samples were collected from consenting patients visiting the Emergency Department at Boston Children's Hospital. Samples from 87 patients associated with seven different differential diagnoses were collected (abdominal pain controls: n = 33; ovarian cyst: n = 12; mesenteric adenitis: n = 6; constipation: n = 7; urinary tract infection (UTI): n = 11; gastritis: n = 6; gastroenteritis: n = 12) and trypsinized using a membrane-based processing method in a 96-well plate format.
Liquid Chromatography
For the spectral library, all 87 samples were analyzed on a nanoLC system equipped with an LC-chip system (cHiPLC nanoflex, Eksigent; trapping column: Nano cHiPLC Trap column 200 μm x 0.5 mm Reprosil C18, 3 μm; analytical column: Nano cHiPLC column 75 μm x 15 cm Reprosil C18, 3 μm) coupled online to a Q Exactive mass spectrometer (Thermo Fisher Scientific). Peptides (4 µl of digest) were separated by a linear gradient from 93% buffer A (0.2% FA in water) / 7% buffer B (0.2% FA in ACN) to 75% buffer A / 25% buffer B within 75 min.
DIA experiments on all 87 samples: Each sample was analyzed on a Thermo Scientific™ EASY-nLC™ 1000 nanoLC system equipped with a trapping column (Thermo Scientific™ PepMap™100, 75 μm x 2 cm, C18, 3 μm) and an analytical column (PepMapRSLC, 75 μm x 25 cm, C18, 2 μm) coupled to a Q Exactive HF mass spectrometer equipped with a Thermo Scientific™ EASY-Spray™ nanoelectrospray ion source. Peptides (2 µl of digest) were separated by a linear gradient from 93% buffer A (0.2% FA in water) / 7% buffer B (0.2% FA in ACN) to 75% buffer A / 25% buffer B within 30 min. The total run time with loading and washing steps was 50 min. The column oven was set to 40°C.
Mass Spectrometry
DDA method on QE: The MS was operated in data-dependent TOP10 mode with the following settings: mass range 400-1,000 Th; resolution for MS1 scan 70,000 @ 200 Th; lock mass: 445.120025 Th; resolution for MS2 scan 17,500 @ 200 Th; isolation width 1.6 m/z; NCE 27; underfill ratio 1%; charge state exclusion: unassigned, 1, >6; dynamic exclusion 30 s.
Additionally, a subset of 23 randomly chosen samples was analyzed on a commercial TripleTOF™ 5600 (AB Sciex, Concord, Canada) using the same LC setup and gradient. The MS was operated in data-dependent TOP50 mode with the following settings: MS1 mass range 400-1,000 Th with 250 ms accumulation time; MS2 mass range 100-1,700 Th with 50 ms accumulation time; MS2 selection criteria: UNIT resolution, intensity threshold 100 cts, charge states 2-5. Dynamic exclusion was set to 17 s.
FIGURE 1. The established DIA workflow for urinary biomarker discovery
FIGURE 2. Influence of spectral library: percentage of peptides and proteins that are detected in all three replicates, in 2 out of 3 replicates, or in a single replicate.
FIGURE 3. Reproducibility evaluation with three replicates: technical reproducibility of DDA and DIA quantification based on three replicates (Fig 3A). Number of identifications in relation to technical replicates (Fig 3B).
Results
Selecting the Most Appropriate Spectral Library
To investigate the influence of the spectral library on the data analysis, we generated six different spectral libraries (Table 1). The quality of a spectral library can be assessed by the completeness of the library and the reproducibility of peptide/protein detection.
The comprehensive urinary library comprised about 2,600 protein groups and 20,000 peptides (2,575 and 19,854, respectively; Spec Lib 06 in Table 1). Based on current studies, this library covers the vast majority of the urinary proteome (2).
We calculated the percentage of peptides and proteins that were detected in all three replicates, in 2 out of 3 replicates, or in only a single replicate (Figure 2). The results show a higher reproducibility of peptide/protein identification and quantification when using the urinary spectral library 06. This finding underscores the importance of using a sample-specific spectral library.
Highly Reproducible Peptide Detection and Quantitation with DIA Experiments
To elucidate how the DIA routine performs compared to a DDA workflow, we analyzed an unrelated sample three times with a DDA method and with the DIA method on the Q Exactive HF using the same 30 min gradient. The DIA data were analyzed using the comprehensive urinary library (Spec Lib 06, Table 1).
With our optimized DIA workflow (Figure 1), the median CV of the DIA quantification was only 6.7% at the peptide level and 8.1% at the protein level, about half of the values obtained from the DDA data (15.7% and 16.7%, respectively; Figure 3A). A single DIA experiment detected about 5,220 peptides and 1,120 proteins, almost twice as many as a DDA experiment (Figure 3B). Our DIA method thus shows highly reproducible and precise quantification at the peptide level.
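For readers who want to reproduce a median CV figure of this kind on their own replicate data, a minimal Python sketch follows. It assumes the CV is the sample standard deviation divided by the mean across the technical replicates of each peptide or protein; the poster does not state the exact formula, and the function names and intensities below are hypothetical.

```python
from statistics import mean, median, stdev


def cv_percent(values):
    """Coefficient of variation in percent (sample standard deviation / mean)."""
    return 100.0 * stdev(values) / mean(values)


def median_cv(replicate_table):
    """Median CV over all analytes.

    replicate_table: analyte name -> list of intensities, one per replicate.
    """
    return median(cv_percent(v) for v in replicate_table.values())


# Made-up intensities for three analytes measured in three replicates.
example = {
    "peptide_a": [90.0, 100.0, 110.0],
    "peptide_b": [95.0, 100.0, 105.0],
    "peptide_c": [80.0, 100.0, 120.0],
}
print(median_cv(example))
```

The median is used rather than the mean so that a few poorly quantified, low-intensity analytes do not dominate the summary value.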
FIGURE 5. Comprehensive Urinary Proteome Coverage by DIA workflow
FIGURE 6. Top Biomarker Discovery by using DIA workflow
The Power of Mass Accuracy
The application of wide isolation windows (>10 Th) in typical DIA experiments results in complex MS/MS spectra. During data analysis, the ion chromatograms of multiple fragment ions are extracted and aligned for peptide detection and quantification. To separate the analyte of interest from interferences, a highly accurate mass measurement of the ions is crucial. We applied different mass tolerances for the data analysis (50 ppm, 20 ppm, and 10 ppm). One example is shown in Figure 4. With a retention time window of ± 2.5 min and a mass tolerance of 50 ppm for extracting the multiple transitions of the peptide LVGYLDR, several interferences overlapped with the peptide of interest. Only at 10 ppm were the interferences removed from the chromatograms; a mass accuracy of 10 ppm is therefore the minimum requirement for accurate peptide detection.
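To make the effect of the tolerance setting concrete, the following Python sketch converts a ppm tolerance into an absolute m/z extraction window. It assumes the tolerance is applied symmetrically (± ppm) around the theoretical fragment mass; the function names are hypothetical. For the y5+ fragment at 623.315 m/z, a 50 ppm tolerance spans roughly 623.2838 to 623.3462, while 10 ppm narrows this to roughly 623.3088 to 623.3212.

```python
def ppm_window(mz, tol_ppm):
    """Return (lower, upper) m/z bounds for a symmetric ppm tolerance."""
    delta = mz * tol_ppm * 1e-6
    return mz - delta, mz + delta


def within_tolerance(observed_mz, theoretical_mz, tol_ppm):
    """True if the observed m/z lies inside the ppm window around the theoretical m/z."""
    lower, upper = ppm_window(theoretical_mz, tol_ppm)
    return lower <= observed_mz <= upper


for tol in (50, 20, 10):
    low, high = ppm_window(623.315, tol)
    print(f"{tol:>2} ppm: {low:.4f} - {high:.4f} (width {high - low:.4f} Th)")
```

Going from 50 ppm to 10 ppm shrinks the window width fivefold, which is why a nearby interfering ion can fall inside the wide window but outside the narrow one.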
TABLE 1. Spectral libraries
Library Name   Protein Groups   Peptides   Fragment Ions   Instrument / Source
Spec Lib 01    2,077            15,470     109,064         Q Exactive MS; 23 DDA files; urine
Spec Lib 02    1,436            9,569      65,880          TripleTOF 5600; 23 DDA files; urine
Spec Lib 03    2,226            16,985     123,087         Combined from Spec Lib 01 and 02; urine
Spec Lib 04    14,158           149,420    2,832,306       TripleTOF 5600; see Rosenberger et al., 2014 (1)
Spec Lib 05    1,869            40,902     925,156         TripleTOF 5600; subset of Human-14,000
Spec Lib 06    2,575            19,854     144,643         87 Q Exactive MS DDA files + 23 TripleTOF 5600 DDA files; urine
DIA method on QE HF: Each DIA duty cycle contains one full scan and 24 DIA MS/MS scans to cover the mass range 400-1,000 Th. Full scan: resolution 30,000 @ 200 Th; AGC target 3e6; maximal IT 50 ms; mass range 400-1,000 Th. DIA scans: resolution 30,000 @ 200 Th; isolation width 20 m/z for the first 20 DIA scans, 40 m/z for the following 2 DIA scans, and 60 m/z for the last 2 DIA scans; NCE 30; target value 1e6; maximal injection time set to "auto", which automatically calculates the maximal injection time from the detection time so that the mass spectrometer always operates in parallel ion filling and detection mode.
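The isolation scheme above can be checked with a short Python sketch that lays out the 24 windows and confirms that they tile the 400-1,000 Th range (20 x 20 + 2 x 40 + 2 x 60 = 600 Th). The sketch assumes contiguous, non-overlapping windows starting at 400 Th, which the method description does not state explicitly; the function and variable names are hypothetical.

```python
def dia_windows(start_mz, scheme):
    """Lay out contiguous isolation windows.

    scheme: list of (number_of_windows, width_in_Th) pairs, in acquisition order.
    Returns a list of (lower, upper) m/z bounds.
    """
    windows = []
    lower = start_mz
    for count, width in scheme:
        for _ in range(count):
            windows.append((lower, lower + width))
            lower += width
    return windows


# 20 windows of 20 Th, then 2 of 40 Th, then 2 of 60 Th.
SCHEME = [(20, 20.0), (2, 40.0), (2, 60.0)]
windows = dia_windows(400.0, SCHEME)

print(len(windows), "windows from", windows[0][0], "to", windows[-1][1])
for low, high in windows[-4:]:
    print(f"{low:.0f}-{high:.0f} Th")
```

Wider windows at the high m/z end are a common way to keep the number of scans per cycle, and therefore the cycle time, low in a region where fewer precursors are expected.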
FIGURE 4. The ion chromatograms of multiple transitions of the peptide LVGYLDR of protein P07602, extracted with mass tolerances of 50 ppm, 20 ppm, and 10 ppm. The retention time window is ± 2.5 min. A mass accuracy of 10 ppm is the minimum requirement for accurate peptide detection.
Comprehensive Urinary Proteome Coverage by DIA Workflow
The established DIA workflow (Figure 1) was applied to a urinary study comprising 87 samples. The samples were derived from patients with abdominal pain in a pediatric emergency room. After digestion in a 96-well plate format, we were able to analyze them in less than 4 days with our optimized DIA workflow. On average, we detected 1,301 protein groups (848 – 1,720) (Figure 5A) and 5,714 peptides per sample (3,172 – 8,231) (Figure 5B). In total, our DIA workflow enabled the detection of 2,456 proteins, representing 95% of the proteins in the spectral library. These numbers contrast with the DDA output, in which only 7% of the proteins were identified in more than 95% of the samples and 60% in less than 25% of the samples (Figure 5C). Compared to all other samples, 773 proteins were significantly changed in abundance in the UTI samples (non-parametric Mann-Whitney U test, p<0.05), 502 in the ovarian cyst samples, 209 in the constipation samples, 111 in the mesenteric adenitis samples, and 58 in the gastroenteritis samples (Figure 5D).
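The throughput and coverage figures quoted above follow from simple arithmetic, sketched below in Python. The calculation assumes that the 87 runs are acquired back to back at the stated total run time of 50 min, with no blanks, quality control injections, or idle time in between; the variable names are hypothetical.

```python
N_SAMPLES = 87
RUN_TIME_MIN = 50          # total run time per sample, including loading and washing
LIBRARY_PROTEINS = 2575    # protein groups in Spec Lib 06
DETECTED_PROTEINS = 2456   # proteins detected across all 87 DIA runs

total_hours = N_SAMPLES * RUN_TIME_MIN / 60
total_days = total_hours / 24
coverage_percent = 100 * DETECTED_PROTEINS / LIBRARY_PROTEINS

print(f"Pure acquisition time: {total_hours:.1f} h ({total_days:.2f} days)")
print(f"Library coverage: {coverage_percent:.1f}%")
```

Under these assumptions the pure acquisition time is about 3 days, which leaves some margin within the stated "less than 4 days".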
[Figure 2 panels: identified peptides (peptide level) and identified proteins (protein level) per spectral library, split into detection in 3 of 3, 2 of 3, and 1 of 3 replicates. Percent labels in extraction order, one value per library. Peptide level, 3 of 3: 73, 77, 71, 26, 61, 69; 2 of 3: 12, 10, 13, 22, 15, 13; 1 of 3: 15, 13, 16, 52, 24, 18. Protein level, 3 of 3: 80, 81, 75, 22, 62, 75; 2 of 3: 8, 8, 10, 18, 12, 9; 1 of 3: 12, 12, 15, 60, 25, 15.]
[Figure 3B panels: identified peptides (peptide level) and identified proteins (protein level) versus number of replicates (1, 2, 3).]
Overcoming the challenges in Data Independent Acquisition (DIA) via high resolution accurate mass Orbitrap based mass spectrometer Yue Xuan1; Jan Muntel2; Sebastian T. Berger2; Andreas FR Huhmer3; Hanno Steen2; Thomas Moehring1 1Thermo Fisher Scientific, Bremen, GERMANY; 2Departments of Pathology, Boston Children’s Hospital, Boston, MA; 3Thermo Fisher Scientific, San Jose, CA
Conclusion We successfully established a DIA workflow, which enables large scope of urine
proteomics studies based on high resolution accurate mass Orbitrap technology.
Even without any prefractionation of the samples, we detected more than 2,500 proteins, which is close to complete coverage of the urinary proteome by our DIA workflow.
The highly reproducible quantification (median CV 7%) enables to skip technical replicates and to focus on the analysis of biological samples.
The high number of quantified proteins per sample (in average 1,300) per single DIA experiment enabled to get insights into the host cell response in an UTI based on changes in the urinary proteome as well into the formation of an ovarian cyst.
We envision that the comprehensiveness and low analysis time per sample will allow the application of this DIA workflow in synchronous biomarker discovery and validation which requires the analysis of hundreds of samples.
References 1. Rosenberger, G. SCIENTIFIC DATA | 1:140031 | DOI: 10.1038/sdata.2014.31
2. Zheng, J. BMC genomics 2013,14, 777
3. Takaya et al., Int J Oncol, 2015
4. Feldman et al., Clin Cancer Res, 2009
Overview Purpose: Utilize Thermo Scientific TM Q ExactiveTM HF mass spectrometer for DIA LC-MS/MS experiments to study the urinary proteome, and demonstrate how high resolution/ accurate mass spectrometry is employed to overcome DIA challenges and enable large scope of urine proteomics studies.
Methods: Urine samples from 87 patients associated with seven different differential diagnoses were collected and trypsinized using a membrane-based processing method in a 96-well plate format. For generation of a spectral library all samples were analyzed by DDA methods. DIA analysis was performed on the Q Exactive HF MS to quantitatively mapping the urinary proteomes.
Results: In this study, we have developed a robust DIA method for the comprehensive mapping of the urinary proteome that enables large scale urine proteomics studies with high throughput ( less than 4 days for 87 biological sample measurements) and excellent reproducibility (media of CV% is 7%).
Introduction The promises of data independent acquisition (DIA) strategies are a comprehensive and reproducible data collection for large-scale quantitative proteomics experiments. However, the wide isolation window (usually >10Da) of DIA experiments co-isolates and co-fragments multiple peptides, resulting in highly complex DIA MS/MS spectra and makes the DIA data analysis challenging. To accurately identify and precisely quantify thousands of proteins per DIA experiment, the completeness and specificity of spectral library, the mass accuracy of the data, and the technical variance in quantitation play important roles. In this work, we utilize the Q Exactive HF mass spectrometer for DIA LC-MS/MS experiments to study the urinary proteome, and demonstrate how high resolution/ accurate mass spectrometry is employed to overcome these DIA challenges and enable large scope of urine proteomics studies..
Methods Sample Preparation
Urine samples were collected from consenting patients visiting the Emergency Department at Boston Children's Hospital. Upon consent, urine samples from 87 patients associated with seven different differential diagnoses were collected (abdominal pain controls: n = 33, ovarian cyst: n = 12, mesenteric adenitis: n = 6, constipation n = 7, urinary tract infection; UTI: n = 11, gastritis: n = 6, gastroenteritis: n = 12) and trypsinized using a membrane-based processing method in a 96-well plate format.
Liquid Chromatography
For the spectral library, all 87 samples were analyzed by a nanoLC system equipped with a LC-chip system (cHiPLC nanoflex, Eksigent, trapping column: Nano cHiPLC Trap column 200 μm x 0.5 mm Reprosil C18, 3 μm, analytical column: Nano cHiPLC column 75 μm x 15 cm Reprosil C18, 3 μm) coupled online to a Q-Exactive mass spectrometer (Thermo Fisher Scientific). Peptides (4 µl of digest) were separated by a linear gradient from 93% buffer A (0.2% FA in water) / 7% buffer B (0.2% FA in ACN) to 75% buffer A / 25% buffer B within 75 min.
DIA experiments on all 87 samples: Each sample was analyzed on a Thermo ScientificTM EASY-nLCTM 1000 nanoLC system equipped with a trapping column (Thermo ScientificTM PepMapTM100, 75um x 2cm, C18, 3um) and an analytical column (PepMapRSLC, 75um x 25cm, C18, 2um) coupled to a Q Exactive HF mass spectrometer equipped with a Thermo Scientific TM EASY-Spray TM nanoelectrospray ion source. Peptides (2 µl of digest) were separated by a linear gradient from 93% buffer A (0.2% FA in water) / 7% buffer B (0.2% FA in ACN) to 75% buffer A / 25% buffer B within 30 min. The total run time with loading and washing steps was 50 min. Column oven was set to 40°C .
Mass Spectrometry
DDA method on QE: The MS was operated in data-dependent TOP10 mode with the following settings: mass range 400-1,000 Th; resolution for MS1 scan 70,000 @ 200 Th; lock mass: 445.120025 Th; resolution for MS2 scan 17,500 @ 200 Th; isolation width 1.6 m/z; NCE 27; underfill ratio 1%; charge state exclusion: unassigned, 1, >6; dynamic exclusion 30 s.
Additionally, a subset of 23 randomly chosen samples was analyzed on a commercial TripleTOF™ 5600 (AB Sciex, Concord, Canada) using the same LC setup and gradient. The MS was operated in data-dependent TOP50 mode with the following settings: MS1 mass range 400-1,000 Th with 250 ms accumulation time; MS2 mass range 100-1,700 Th with 50 ms accumulation time; MS2 selection criteria: UNIT resolution, intensity threshold 100 cts, charge states 2-5; dynamic exclusion 17 s.
FIGURE 1. The established DIA workflow for Urinary Biomarker discovery
FIGURE 2. Influence of Spectral Library: Percentage of peptides and proteins that are detected in all three replicates, in 2 out of 3 replicates, in single replicate.
FIGURE 3. Reproducibility Evaluation with Three Replicates: Technical Reproducibility of DDA and DIA Quantification Based on Three Replicates (Fig 3A). Number of Identifications in Relation to Technical Replicates (Fig 3B).
Results
Selecting the Most Appropriate Spectral Library
To investigate the influence of the spectral library on the data analysis, we generated six different spectral libraries (Table 1). The quality of a spectral library can be assessed by the completeness of the library and the reproducibility of peptide/protein detection.
The comprehensive urinary library comprised approximately 2,600 protein groups and 20,000 peptides (Spec Lib 06 in Table 1). Based on current studies, this library covers the vast majority of the urinary proteome (2).
We calculated the percentage of peptides and proteins that were detected in all three replicates, in two out of three replicates, or in only a single replicate (Figure 2). The results show a higher reproducibility of peptide/protein identification and quantification when using the urinary spectral library 06. This finding underscores the importance of using a sample-specific spectral library.
Highly Reproducible Peptide Detection and Quantitation with DIA Experiments
To elucidate how the DIA routine performs compared to a DDA workflow, we analyzed an unrelated sample three times by a DDA and the DIA method on the Q Exactive HF using the same 30 min gradient. The DIA data were analyzed using the comprehensive urinary library (Spec lib 06, Table 1).
With our optimized DIA workflow (Figure 1), the median CV of the DIA quantification was only 6.7% at the peptide level and 8.1% at the protein level, roughly half that of the DDA data (Figure 3A). The numbers of peptides (~5,220) and proteins (1,120) detected in a single DIA experiment are almost twice those of the DDA data (Figure 3B). Our DIA method thus shows highly reproducible and precise quantification at the peptide level.
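The median CV reported here can be illustrated with a minimal sketch. It assumes the CV is the sample standard deviation divided by the mean across the three technical replicates, taken per analyte, with the median then taken over all analytes. The function names and the peak areas below are hypothetical and are not data from this study.

```python
from statistics import mean, median, stdev


def cv_percent(values):
    """Coefficient of variation (sample standard deviation / mean) in percent."""
    return 100.0 * stdev(values) / mean(values)


def median_cv(quant_table):
    """Median CV over all analytes with at least two replicate values.

    quant_table maps an analyte name to its list of replicate intensities.
    """
    cvs = [cv_percent(v) for v in quant_table.values()
           if len(v) >= 2 and mean(v) > 0]
    return median(cvs)


# Hypothetical peak areas from three technical replicates (not study data).
example = {
    "PEPTIDE_A": [1.00e6, 1.05e6, 0.95e6],
    "PEPTIDE_B": [2.00e5, 2.20e5, 1.80e5],
    "PEPTIDE_C": [5.00e4, 5.10e4, 4.90e4],
}
print(f"median CV = {median_cv(example):.1f}%")
```

The same function applies at the peptide level (fragment ion peak areas summed per peptide) and at the protein level (peptide intensities summed per protein).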
FIGURE 5. Comprehensive Urinary Proteome Coverage by DIA workflow
FIGURE 6. Top Biomarker Discovery by using DIA workflow
The Power of Mass Accuracy
The application of wide isolation windows (>10 Th) in typical DIA experiments results in complex MS/MS spectra. During data analysis, the ion chromatograms of multiple fragment ions are extracted and aligned for peptide detection and quantification. To separate the analyte of interest from interferences, highly accurate ion masses are crucial. We applied different mass tolerances for the data analysis (50 ppm, 20 ppm, and 10 ppm); one example is shown in Figure 4. With a ± 2.5 min retention time window and a 50 ppm mass tolerance for extracting the multiple transitions of the peptide LVGYLDR, several interferences overlapped with the peptide of interest. Only at a 10 ppm tolerance were the interferences removed. A mass accuracy of 10 ppm is therefore the minimum requirement for accurate peptide detection.
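The effect of the mass tolerance on the extraction window can be sketched as follows. The example assumes a symmetric window of ± tolerance around the theoretical fragment m/z; with the y5+ fragment of LVGYLDR (623.315 m/z) it reproduces the window bounds shown in the Figure 4 trace headers. The function names and the interfering ion are hypothetical.

```python
def ppm_window(mz, tol_ppm):
    """Return the (lower, upper) m/z bounds for a symmetric ppm tolerance."""
    delta = mz * tol_ppm * 1e-6
    return mz - delta, mz + delta


def within_tolerance(observed_mz, target_mz, tol_ppm):
    """True if the observed m/z lies inside the extraction window of the target."""
    lo, hi = ppm_window(target_mz, tol_ppm)
    return lo <= observed_mz <= hi


Y5_MZ = 623.315  # y5+ fragment of LVGYLDR, as listed in the Figure 4 legend

for tol in (50, 20, 10):
    lo, hi = ppm_window(Y5_MZ, tol)
    print(f"{tol:>2} ppm: {lo:.4f} - {hi:.4f} (width {1000 * (hi - lo):.1f} mTh)")

# A hypothetical interfering ion 15 ppm above the target is extracted
# at 50 ppm and 20 ppm but rejected at 10 ppm.
interferer = Y5_MZ * (1 + 15e-6)
print([within_tolerance(interferer, Y5_MZ, t) for t in (50, 20, 10)])
```

Narrowing the tolerance from 50 ppm to 10 ppm shrinks the window fivefold, which is why co-fragmented ions that sit a few tens of ppm away no longer contribute to the extracted trace.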
TABLE 1. Spectral libraries
Library Name   Protein Groups   Peptides   Fragment Ions   Instrument / Source
Spec Lib 01    2,077            15,470     109,064         Q Exactive MS; 23 DDA files; urine
Spec Lib 02    1,436            9,569      65,880          TripleTOF 5600; 23 DDA files; urine
Spec Lib 03    2,226            16,985     123,087         Combined from Spec Lib 01 and 02; urine
Spec Lib 04    14,158           149,420    2,832,306       TripleTOF 5600; see Rosenberger et al., 2014 (1)
Spec Lib 05    1,869            40,902     925,156         TripleTOF 5600; subset of Human-14,000
Spec Lib 06    2,575            19,854     144,643         87 Q Exactive MS DDA files + 23 TripleTOF 5600 DDA files; urine
DIA method on QE HF: Each DIA duty cycle contains one full scan and 24 DIA MS/MS scans covering the mass range 400-1,000 Th. Full scan: resolution 30,000 @ 200 Th; AGC target 3e6; maximal IT 50 ms; mass range 400-1,000 Th. DIA scans: resolution 30,000 @ 200 Th; isolation width 20 m/z for the first 20 DIA scans, 40 m/z for the following two DIA scans, and 60 m/z for the last two DIA scans; NCE 30; target value 1e6; maximal injection time set to “auto”, which automatically calculates the maximal injection time from the detection time so that the mass spectrometer always operates in parallel ion filling and detection mode.
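The variable-width isolation scheme can be checked with a short sketch. It assumes contiguous, non-overlapping windows laid out in ascending order from 400 m/z; the poster does not state the window centers or any overlap, so both are assumptions, and `build_windows` is a hypothetical helper.

```python
def build_windows(start_mz, scheme):
    """Expand (count, width) pairs into contiguous (lower, upper, center) windows."""
    windows = []
    lower = start_mz
    for count, width in scheme:
        for _ in range(count):
            upper = lower + width
            windows.append((lower, upper, (lower + upper) / 2))
            lower = upper
    return windows


# Scheme from the method description: 20 x 20 m/z, 2 x 40 m/z, 2 x 60 m/z.
SCHEME = [(20, 20.0), (2, 40.0), (2, 60.0)]
windows = build_windows(400.0, SCHEME)

print(f"{len(windows)} windows, {windows[0][0]:.0f}-{windows[-1][1]:.0f} m/z")
for lower, upper, center in windows:
    print(f"center {center:6.1f}  width {upper - lower:4.0f}  ({lower:.0f}-{upper:.0f})")
```

Under these assumptions the 24 windows sum to 600 m/z and tile the 400-1,000 range exactly, with the wider windows placed at high m/z where fewer precursors are expected.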
FIGURE 4. Ion chromatograms of multiple transitions of the peptide LVGYLDR of protein P07602, extracted with 50 ppm, 20 ppm, and 10 ppm mass tolerance. The retention time window is ± 2.5 min. A mass accuracy of 10 ppm is the minimum requirement for accurate peptide detection.
Comprehensive Urinary Proteome Coverage by DIA workflow
The established DIA workflow (Figure 1) was applied to a urinary study comprising 87 samples. The samples were derived from patients with abdominal pain in a pediatric emergency room. After digestion in a 96-well plate format, we were able to analyze them in less than 4 days with our optimized DIA workflow. On average, we detected 1,301 protein groups (range 848-1,720; Figure 5A) and 5,714 peptides (range 3,172-8,231; Figure 5B) per sample. In total, our DIA workflow enabled the detection of 2,456 proteins, representing 95% of the proteins in the spectral library. In contrast, in the DDA output only 7% of the proteins were identified in more than 95% of the samples, and 60% were identified in less than 25% of the samples (Figure 5C). Compared to all other samples, 773 proteins were significantly changed in abundance in the UTI samples (non-parametric Mann-Whitney U test, p<0.05), 502 in the ovarian cyst samples, 209 in the constipation samples, 111 in the mesenteric adenitis samples, and 58 in the gastroenteritis samples (Figure 5D).
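The throughput and coverage figures can be verified with simple arithmetic. The sketch assumes back-to-back injections with no idle time or blank runs between samples, which the poster does not specify.

```python
SAMPLES = 87
RUN_TIME_MIN = 50  # total run time per sample, including loading and washing

total_hours = SAMPLES * RUN_TIME_MIN / 60
total_days = total_hours / 24
print(f"{total_hours:.1f} h of instrument time = {total_days:.2f} days")

# Share of the spectral library (2,575 protein groups, Spec Lib 06)
# that was detected across all samples (2,456 proteins).
coverage = 100 * 2456 / 2575
print(f"library coverage: {coverage:.1f}%")
```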
[Figure 2 plot: percentage of peptides (Peptide Level, y-axis: Identified Peptides) and proteins (Protein Level, y-axis: Identified Proteins) detected in 3 of 3, 2 of 3, and 1 of 3 replicates, one bar per spectral library. Bar labels as extracted, in plotted order (3of3/2of3/1of3, %): peptide level 73/12/15, 77/10/13, 71/13/16, 26/22/52, 61/15/24, 69/13/18; protein level 80/8/12, 81/8/12, 75/10/15, 22/18/60, 62/12/25, 75/9/15.]
[Figure 3A plot: Coefficient of Variance (%) versus Log10 intensity. Peptide level: DDA (peptide peak areas) median CV 15.7%, DIA (fragment ion peak areas) median CV 6.7%. Protein level: DDA median CV 16.7%, DIA median CV 8.1%.]
[Figure 3B plot: Identified Peptides and Identified Proteins versus Replicate (1-3) for DDA, DDA with matching, and DIA, including the ID increase per added replicate.]
Data Analysis
Database searches were performed in MaxQuant software, version 1.5.0.0, directly on the RAW and WIFF files using the human UniProt database. Trypsin was specified as the enzyme with up to 2 missed cleavages. Mass tolerances were set to 20 ppm for the first search and 4.5 ppm for the main search for the Q Exactive data, and to 0.1 Da for the first search and 0.01 Da for the main search for the TripleTOF 5600 data. Oxidation of M was chosen as dynamic modification (+15.995 Da) and carbamidomethylation of C as static modification (+57.021 Da). FDR was set to 1% at the peptide and protein level.
All DIA data were analyzed directly in Spectronaut 7.0 (Biognosys) with dynamic score refinement, MS1 scoring, interference correction, and cross-run normalization (total peak area) enabled. All results were filtered by a Q value of 0.01 (equal to an FDR of 1% at the peptide level).
Protein intensity was calculated by summing the peptide intensities of each protein from the Spectronaut output file. The data were imported into Perseus 1.5.1.6, and missing values were imputed with the minimum value for each protein. The significance of protein abundance changes was calculated using the u-test (non-parametric test), and proteins with a p value below 0.01 were considered significantly changed. Biological processes were annotated with the DAVID online tool using the comprehensive urinary spectral library (Spec Lib 06 in Table 1).
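The protein roll-up and imputation steps can be sketched as below. This is an illustration under stated assumptions, not the Spectronaut or Perseus implementation: the row layout, the function names, and the example values are hypothetical, and "minimum value for each protein" is read as the smallest intensity observed for that protein across samples.

```python
from collections import defaultdict


def protein_intensities(peptide_rows):
    """Sum peptide intensities per (protein, sample).

    peptide_rows is an iterable of (protein, peptide, sample, intensity) tuples.
    """
    table = defaultdict(dict)
    for protein, _peptide, sample, intensity in peptide_rows:
        table[protein][sample] = table[protein].get(sample, 0.0) + intensity
    return table


def impute_minimum(table, samples):
    """Fill missing samples with the smallest observed value of that protein."""
    filled = {}
    for protein, by_sample in table.items():
        floor = min(by_sample.values())
        filled[protein] = {s: by_sample.get(s, floor) for s in samples}
    return filled


# Hypothetical rows; the layout does not correspond to the Spectronaut export.
rows = [
    ("P1", "pepA", "S1", 100.0), ("P1", "pepB", "S1", 50.0),
    ("P1", "pepA", "S2", 80.0),
    ("P2", "pepC", "S1", 10.0),
]
result = impute_minimum(protein_intensities(rows), ["S1", "S2", "S3"])
print(result)
```

Imputing with a per-protein minimum keeps every protein testable in every group, at the cost of understating variance for proteins that are missing in many samples.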
Biomarker Discovery
To find the most suitable biomarker candidates, we focused on the ten most significantly changed proteins for each of the five conditions. The performance of the biomarker candidates was assessed by calculating the area under the receiver operating characteristic curve (AUROC) (Figure 6). For example, Cystatin-B (CYTB) is an intracellular thiol proteinase inhibitor. Its level is increased in ovarian cyst (OC) samples, with a p value of 1.3e-5 (Bonferroni-corrected: p = 0.027) and an AUROC of 0.91. Cystatin-B has been identified as a biomarker candidate in the context of malignant growths in the ovaries (3) and bladder (4), i.e., the genitourinary tract.
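The AUROC and the Bonferroni correction can be illustrated with a minimal sketch. It uses the rank-based definition of the AUROC (the probability that a case value exceeds a control value, which equals the Mann-Whitney U statistic divided by the number of case-control pairs). The intensities and the number of tests are hypothetical; the poster does not state how many tests entered the correction.

```python
def auroc(cases, controls):
    """AUROC as the probability that a case value exceeds a control value.

    Ties count as one half, so this equals the Mann-Whitney U statistic of
    the cases divided by len(cases) * len(controls).
    """
    wins = 0.0
    for x in cases:
        for y in controls:
            if x > y:
                wins += 1.0
            elif x == y:
                wins += 0.5
    return wins / (len(cases) * len(controls))


def bonferroni(p_value, n_tests):
    """Bonferroni-adjusted p value, capped at 1."""
    return min(1.0, p_value * n_tests)


# Hypothetical protein intensities (not values from this study).
cases = [5.2e5, 8.1e5, 3.3e5, 9.0e5]
controls = [1.1e5, 2.0e5, 4.0e5, 0.9e5, 1.5e5]
print(f"AUROC = {auroc(cases, controls):.2f}")

# 2000 tests is an assumed count used only to show the scaling.
print(f"adjusted p = {bonferroni(1.3e-5, 2000):.3f}")
```

An AUROC of 0.5 corresponds to no discrimination and 1.0 to perfect separation of the condition from all other samples.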
[Figure 4 plots: three panels (50 ppm, 20 ppm, 10 ppm), Relative Abundance versus Time (min), RT 18.93-24.07 min. Transitions: y5+ 623.315 m/z; y6+ 722.383 m/z; y6+ -H3PO4 624.406 m/z; y5++ 312.161 m/z; y3+ -NH3 386.203 m/z; y4+ 566.293 m/z. Extraction windows for y5+: 623.2838-623.3462 (50 ppm), 623.3025-623.3275 (20 ppm), 623.3088-623.3212 (10 ppm).]
Spectronaut is a trademark of Biognosys AG. MaxQuant software is a trademark of Max-Planck Institute of Biochemistry. TripleTOF is a trademark of Sciex, Pte. Ltd. All other trademarks are the property of Thermo Fisher Scientific and its subsidiaries. This information is not intended to encourage use of these products in any manners that might infringe the intellectual property rights of others.
[Figure 5 plots. 5A: Identified Protein Groups per sample. 5B: Identified Peptides (unique peptides) per sample. 5C: Sample Coverage, number of proteins versus protein sample coverage (%) for DDA and DIA. 5D: Significantly Changed Proteins per condition (Mann-Whitney U test, p<0.05).]
[Figure 6 plots. Groups: Pain Control Group (Pain), Ovarian Cyst (OC), Urinary Tract Infection (UTI), Constipation (Con), Mesenteric Adenitis (MA), Gastroenteritis (GE). For each marker, a ROC curve (Sensitivity versus 1-Specificity) and the protein intensity per group on a log scale:
Ovarian Cyst marker CYTB: AUROC = 0.91
Urinary Tract Infection marker PERM (panel label reads PERE): AUROC = 0.968
Constipation marker UROK: AUROC = 0.928
Mesenteric Adenitis marker LEG3: AUROC = 0.872
Gastroenteritis marker KPYR: AUROC = 0.76]
[Figure 1 workflow diagram, text content:
Urinary Samples (87): 87 ER patients in six groups (Constipation, Mesenteric Adenitis, Gastroenteritis, Pain Control Group, Ovarian Cyst, Urinary Tract Infection)
Digestion
DIA (Q Exactive HF MS). Optimized DIA method: average 8-10 data points per LC peak (9 s @ FWHM); average cycle time 2 s. Inset chromatogram: RT 20.66-21.64 min, base peak m/z 506.2709-506.2811, NL 2.35E7
Comprehensive Urinary Spectral Library: 2,575 protein groups (87 Q Exactive MS and 23 TripleTOF 5600 runs); 19,854 unique peptides; 144,643 transitions
Search against database using Spectronaut, FDR 1%
Scoring and Q value calculation by Spectronaut
Validated peptides/proteins from 87 samples (Patient 1, Patient 2, ... Patient N; number of proteins/peptides per sample)
Perseus: u-test (non-parametric test)
KEGG Pathway Enrichment (p value scale: 0.001, 0.05, 1)]
PN64405-EN 0615S
[Figure 4 graphic: extracted ion chromatograms in three panels (50 ppm, 20 ppm, 10 ppm). x-axis: Time (min), RT 18.93 - 24.07; y-axis: Relative Abundance. Transitions: [y5+] 623.315 m/z; [y6+] 722.383 m/z; [y6+ -H3PO4] 624.406 m/z; [y5++] 312.161 m/z; [y3+ -NH3] 386.203 m/z; [y4+] 566.293 m/z. Extraction range for [y5+]: 623.2838-623.3462 (50 ppm), 623.3025-623.3275 (20 ppm), 623.3088-623.3212 (10 ppm). Scan filter: FTMS + p NSI Full ms2 [200.00-1305.00], file DIA_B9.]
Spectronaut is a trademark of Biognosys AG. MaxQuant software is a trademark of Max-Planck Institute of Biochemistry. TripleTOF is a trademark of Sciex, Pte. Ltd. All other trademarks are the property of Thermo Fisher Scientific and its subsidiaries. This information is not intended to encourage use of these products in any manners that might infringe the intellectual property rights of others.
[Figure 5 graphic: 5A Identified Protein Groups (y-axis: Protein Groups); 5B Identified Peptides (y-axis: Unique Peptides); 5C Sample Coverage, DDA vs. DIA (x-axis: Protein Sample Coverage %, y-axis: Proteins); 5D Significantly Changed Proteins (Mann-Whitney U test, p<0.05). Groups: Pain Control Group, Ovarian Cyst, Urinary Tract Infection, Constipation, Mesenteric Adenitis, Gastroenteritis.]
[Figure 6 graphic: for each condition, a ROC curve (x-axis: 1-Specificity, y-axis: Sensitivity) and the marker intensity across the groups OC, UTI, Con, MA, GE, and Pain. Ovarian Cyst (OC): marker CYTB, AUROC=0.91. Urinary Tract Infection (UTI): marker PERM (panel label: PERE), AUROC=0.968. Constipation (Con): marker UROK, AUROC=0.928. Mesenteric Adenitis (MA): marker LEG3, AUROC=0.872. Gastroenteritis (GE): marker KPYR, AUROC=0.76.]
[Figure 1 graphic: workflow elements. Urinary samples (87) from 87 ER patients in six groups (Constipation, Mesenteric Adenitis, Gastroenteritis, Pain Control Group, Ovarian Cyst, Urinary Tract Infection); digestion; DIA (Q Exactive HF MS) with the optimized DIA method (average 8-10 data points per LC peak, 9 s @ FWHM; average cycle time 2 s); search against the Comprehensive Urinary Spectral Library (2,575 protein groups from 87 Q Exactive MS and 23 TripleTOF 5600 runs; 19,854 unique peptides; 144,643 transitions) using Spectronaut, FDR 1%; scoring and Q value calculation by Spectronaut; validated peptides/proteins from 87 samples; Perseus u-test (non-parametric test); KEGG pathway enrichment (p value scale: 0.001, 0.05, 1). Inset chromatogram: RT 20.66 - 21.64 min, y-axis Relative Abundance, base peak m/z 506.2709-506.2811, NL 2.35E7.]
Overcoming the challenges in Data Independent Acquisition (DIA) via high resolution accurate mass Orbitrap based mass spectrometer
Yue Xuan1; Jan Muntel2; Sebastian T. Berger2; Andreas FR Huhmer3; Hanno Steen2; Thomas Moehring1
1Thermo Fisher Scientific, Bremen, Germany; 2Departments of Pathology, Boston Children’s Hospital, Boston, MA; 3Thermo Fisher Scientific, San Jose, CA
Conclusion
We successfully established a DIA workflow based on high-resolution accurate-mass Orbitrap technology that enables large-scale urine proteomics studies.
Even without any prefractionation of the samples, our DIA workflow detected close to 2,500 proteins (2,456), which approaches complete coverage of the urinary proteome.
The highly reproducible quantification (median CV 7%) makes it possible to skip technical replicates and to focus on the analysis of biological samples.
The high number of proteins quantified per sample in a single DIA experiment (on average 1,300) gave insights into the host cell response in a UTI, based on changes in the urinary proteome, as well as into the formation of an ovarian cyst.
We envision that the comprehensiveness and low analysis time per sample will allow this DIA workflow to be applied to simultaneous biomarker discovery and validation, which requires the analysis of hundreds of samples.
References
1. Rosenberger, G. et al., Sci Data, 2014, 1:140031, DOI: 10.1038/sdata.2014.31
2. Zheng, J. et al., BMC Genomics, 2013, 14, 777
3. Takaya et al., Int J Oncol, 2015
4. Feldman et al., Clin Cancer Res, 2009
Overview
Purpose: Use the Thermo Scientific™ Q Exactive™ HF mass spectrometer for DIA LC-MS/MS experiments to study the urinary proteome, and demonstrate how high-resolution accurate-mass spectrometry overcomes DIA challenges and enables large-scale urine proteomics studies.
Methods: Urine samples from 87 patients associated with seven different differential diagnoses were collected and trypsinized using a membrane-based processing method in a 96-well plate format. To generate a spectral library, all samples were analyzed by DDA methods. DIA analysis was performed on the Q Exactive HF MS to quantitatively map the urinary proteomes.
Results: In this study, we developed a robust DIA method for the comprehensive mapping of the urinary proteome that enables large-scale urine proteomics studies with high throughput (less than 4 days for 87 biological sample measurements) and excellent reproducibility (median CV of 7%).
Introduction
Data independent acquisition (DIA) strategies promise comprehensive and reproducible data collection for large-scale quantitative proteomics experiments. However, the wide isolation windows (usually >10 Da) of DIA experiments co-isolate and co-fragment multiple peptides, which results in highly complex DIA MS/MS spectra and makes DIA data analysis challenging. To accurately identify and precisely quantify thousands of proteins per DIA experiment, the completeness and specificity of the spectral library, the mass accuracy of the data, and the technical variance in quantitation all play important roles. In this work, we use the Q Exactive HF mass spectrometer for DIA LC-MS/MS experiments to study the urinary proteome, and demonstrate how high-resolution accurate-mass spectrometry overcomes these DIA challenges and enables large-scale urine proteomics studies.
Methods
Sample Preparation
Urine samples were collected from consenting patients visiting the Emergency Department at Boston Children's Hospital. Samples from 87 patients associated with seven different differential diagnoses were collected (abdominal pain controls: n = 33; ovarian cyst: n = 12; mesenteric adenitis: n = 6; constipation: n = 7; urinary tract infection (UTI): n = 11; gastritis: n = 6; gastroenteritis: n = 12) and trypsinized using a membrane-based processing method in a 96-well plate format.
Liquid Chromatography
For the spectral library, all 87 samples were analyzed on a nanoLC system equipped with an LC-chip system (cHiPLC nanoflex, Eksigent; trapping column: Nano cHiPLC Trap column, 200 μm x 0.5 mm, Reprosil C18, 3 μm; analytical column: Nano cHiPLC column, 75 μm x 15 cm, Reprosil C18, 3 μm) coupled online to a Q Exactive mass spectrometer (Thermo Fisher Scientific). Peptides (4 µl of digest) were separated by a linear gradient from 93% buffer A (0.2% FA in water) / 7% buffer B (0.2% FA in ACN) to 75% buffer A / 25% buffer B within 75 min.
DIA experiments on all 87 samples: Each sample was analyzed on a Thermo Scientific™ EASY-nLC™ 1000 nanoLC system equipped with a trapping column (Thermo Scientific™ PepMap™ 100, 75 μm x 2 cm, C18, 3 μm) and an analytical column (PepMap RSLC, 75 μm x 25 cm, C18, 2 μm) coupled to a Q Exactive HF mass spectrometer equipped with a Thermo Scientific™ EASY-Spray™ nanoelectrospray ion source. Peptides (2 µl of digest) were separated by a linear gradient from 93% buffer A (0.2% FA in water) / 7% buffer B (0.2% FA in ACN) to 75% buffer A / 25% buffer B within 30 min. The total run time, including loading and washing steps, was 50 min. The column oven was set to 40 °C.
Mass Spectrometry
DDA method on QE: The MS was operated in data-dependent TOP10 mode with the following settings: mass range 400-1,000 Th; resolution for MS1 scan 70,000 @ 200 Th; lock mass: 445.120025 Th; resolution for MS2 scan 17,500 @ 200 Th; isolation width 1.6 m/z; NCE 27; underfill ratio 1%; charge state exclusion: unassigned, 1, >6; dynamic exclusion 30 s.
Additionally, a randomly chosen subset of 23 samples was analyzed on a commercial TripleTOF™ 5600 (AB Sciex, Concord, Canada) using the same LC setup and gradient. The MS was operated in data-dependent TOP50 mode with the following settings: MS1 mass range 400-1,000 Th with 250 ms accumulation time; MS2 mass range 100-1,700 Th with 50 ms accumulation time; MS2 selection criteria: UNIT resolution, intensity threshold 100 cts, charge states 2-5; dynamic exclusion 17 s.
FIGURE 1. The established DIA workflow for Urinary Biomarker discovery
FIGURE 2. Influence of Spectral Library: Percentage of peptides and proteins that are detected in all three replicates, in 2 out of 3 replicates, or in a single replicate.
FIGURE 3. Reproducibility Evaluation with Three Replicates: Technical Reproducibility of DDA and DIA Quantification Based on Three Replicates (Fig 3A). Number of Identifications in Relation to Technical Replicates (Fig 3B).
Results
Selecting the Most Appropriate Spectral Library
To investigate the influence of the spectral library on the data analysis, we generated six different spectral libraries (Table 1). The quality of a spectral library can be assessed by the completeness of the library and the reproducibility of peptide/protein detection.
The comprehensive urinary library comprised approximately 2,600 protein groups and 20,000 peptides (2,575 and 19,854, respectively; Spec Lib 06 in Table 1). Based on current studies, this library covers the vast majority of the urinary proteome (2).
We calculated the percentage of peptides and proteins that were detected in all three replicates, in 2 out of 3 replicates, or in only a single replicate (Figure 2). The results show a higher reproducibility of peptide/protein identification and quantification when using the urinary spectral library Spec Lib 06. This finding underscores the importance of using a sample-specific spectral library.
Highly Reproducible Peptide Detection and Quantitation with DIA Experiments
To elucidate how the DIA routine performs compared to a DDA workflow, we analyzed an unrelated sample three times with a DDA method and with the DIA method on the Q Exactive HF using the same 30 min gradient. The DIA data were analyzed using the comprehensive urinary library (Spec Lib 06, Table 1).
With our optimized DIA workflow (Figure 1), the median CV of the DIA quantification was only 6.7% at the peptide level and 8.1% at the protein level, roughly twofold better than for the DDA data (15.7% and 16.7%, respectively; Figure 3A). The numbers of peptides (~5,220) and proteins (1,120) detected in a single DIA experiment are almost twice those obtained from the DDA data (Figure 3B). Our DIA method thus shows highly reproducible and precise quantification at the peptide level.
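The median CV reported above can be illustrated with a minimal sketch: for each peptide (or protein), the coefficient of variation is the standard deviation of its peak areas across the technical replicates divided by their mean, and the median is then taken over all analytes. The peptide names and peak areas below are hypothetical, and the use of the sample standard deviation is an assumption, since the poster does not state which estimator was used.

```python
from statistics import mean, median, stdev


def cv_percent(values):
    """Coefficient of variation in percent (sample standard deviation / mean)."""
    return 100.0 * stdev(values) / mean(values)


def median_cv(replicates_by_analyte):
    """Median of the per-analyte CVs across all analytes."""
    return median(cv_percent(areas) for areas in replicates_by_analyte.values())


# Hypothetical peak areas from three technical replicates (illustrative only).
peak_areas = {
    "PEPTIDE_A": [100.0, 110.0, 90.0],
    "PEPTIDE_B": [200.0, 200.0, 200.0],
    "PEPTIDE_C": [50.0, 60.0, 40.0],
}

print(f"Median CV = {median_cv(peak_areas):.1f}%")
```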
FIGURE 5. Comprehensive Urinary Proteome Coverage by DIA workflow
FIGURE 6. Top Biomarker Discovery by using DIA workflow
The Power of Mass Accuracy
The application of wide isolation windows (>10 Th) in typical DIA experiments results in complex MS/MS spectra. During data analysis, the ion chromatograms of multiple fragment ions are extracted and aligned for peptide detection and quantification. To separate the analyte of interest from interferences, highly accurate ion masses are crucial. We applied different mass tolerances for the data analysis (50 ppm, 20 ppm, and 10 ppm). One example is shown in Figure 4. With a retention time window of ± 2.5 min and a mass tolerance of 50 ppm for extracting the multiple transitions of the peptide LVGYLDR, several interferences overlapped with the peptide of interest. Only at 10 ppm are the interferences removed from the extracted ion chromatograms. A mass accuracy of 10 ppm is therefore the minimum requirement for accurate peptide detection.
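To make the effect of the mass tolerance concrete, the sketch below converts a ppm tolerance into an m/z extraction window and checks whether an observed fragment falls inside it. The target is the y5+ fragment of LVGYLDR (623.315 m/z in the Figure 4 legend); the function names and the observed m/z values in the usage lines are hypothetical.

```python
def ppm_window(mz, tol_ppm):
    """Return the (low, high) m/z extraction window for a +/- ppm tolerance."""
    delta = mz * tol_ppm * 1e-6
    return mz - delta, mz + delta


def within_tolerance(observed_mz, target_mz, tol_ppm):
    """True if the observed m/z deviates from the target by at most tol_ppm."""
    return abs(observed_mz - target_mz) / target_mz * 1e6 <= tol_ppm


# y5+ fragment of LVGYLDR at the three tolerances compared in the text.
for tol in (50, 20, 10):
    low, high = ppm_window(623.315, tol)
    print(f"{tol} ppm: {low:.4f} - {high:.4f}")

# Hypothetical interfering ion about 24 ppm away from the target:
# it is extracted at 50 ppm but rejected at 20 ppm and 10 ppm.
print(within_tolerance(623.330, 623.315, 50))
print(within_tolerance(623.330, 623.315, 20))
```

Narrowing the tolerance from 50 ppm to 10 ppm shrinks the window fivefold, which is why nearby interfering ions drop out of the extracted traces.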
TABLE 1. Spectral libraries
Library Name | Protein Groups | Peptides | Fragment Ions | Instrument | Source
Spec Lib 01 | 2,077 | 15,470 | 109,064 | Q Exactive MS, 23 DDA files | Urine
Spec Lib 02 | 1,436 | 9,569 | 65,880 | TripleTOF 5600, 23 DDA files | Urine
Spec Lib 03 | 2,226 | 16,985 | 123,087 | Combined from Spec Lib 01 and 02 | Urine
Spec Lib 04 | 14,158 | 149,420 | 2,832,306 | TripleTOF 5600 | see Rosenberger et al., 2014 (1)
Spec Lib 05 | 1,869 | 40,902 | 925,156 | TripleTOF 5600 | subset of Human-14,000
Spec Lib 06 | 2,575 | 19,854 | 144,643 | 87 Q Exactive MS DDA files + 23 TripleTOF 5600 DDA files | Urine
DIA method on QE HF: Each DIA duty cycle contains one full scan and 24 DIA MS/MS scans covering the mass range 400-1,000 Th. The full scan was acquired with a resolution of 30,000 @ 200 Th, AGC target 3e6, maximal IT 50 ms, and mass range 400-1,000 Th. It was followed by DIA scans with a resolution of 30,000 @ 200 Th; isolation width 20 m/z for the first 20 DIA scans, 40 m/z for the following two DIA scans, and 60 m/z for the last two DIA scans; NCE 30; target value 1e6; and maximal injection time set to “auto”, which automatically calculates the maximal injection time from the detection time so that the mass spectrometer always operates in parallel ion filling and detection mode.
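The variable-width window scheme described above can be checked with a short sketch: 20 windows of 20 m/z, two of 40 m/z, and two of 60 m/z add up to 600 m/z, which matches the 400-1,000 Th range. The function name is hypothetical, and the sketch assumes contiguous, non-overlapping windows in ascending order, which the poster does not state explicitly.

```python
def build_dia_windows(start_mz=400.0, scheme=((20, 20.0), (2, 40.0), (2, 60.0))):
    """Build contiguous isolation windows from (count, width) pairs.

    Assumption: windows are adjacent with no overlap and run in ascending order.
    """
    windows = []
    lower = start_mz
    for count, width in scheme:
        for _ in range(count):
            windows.append((lower, lower + width))
            lower += width
    return windows


windows = build_dia_windows()
print(len(windows), "windows from", windows[0][0], "to", windows[-1][1])
for low, high in windows[-4:]:
    print(f"{low:.0f}-{high:.0f} m/z")
```

Wider windows at the high-mass end keep the number of MS/MS scans per cycle at 24 while still covering the whole precursor range.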
FIGURE 4. Ion chromatograms of multiple transitions of the peptide LVGYLDR of protein P07602, extracted with 50 ppm, 20 ppm, and 10 ppm mass tolerance. The retention time window is ± 2.5 min. A mass accuracy of 10 ppm is the minimum requirement for accurate peptide detection.
Comprehensive Urinary Proteome Coverage by DIA workflow
The established DIA workflow (Figure 1) was applied to a urinary study comprising 87 samples. The samples were derived from patients with abdominal pain in a pediatric emergency room. After digestion in a 96-well plate format, we were able to analyze them in less than 4 days with our optimized DIA workflow. On average, we detected 1,301 protein groups per sample (range 848 to 1,720) (Figure 5A) and 5,714 peptides per sample (range 3,172 to 8,231) (Figure 5B). In total, our DIA workflow enabled the detection of 2,456 proteins, representing 95% of the proteins in the spectral library. These numbers contrast with the DDA output, in which only 7% of the proteins were identified in more than 95% of the samples and 60% in fewer than 25% of the samples (Figure 5C). Compared to all other samples, 773 proteins were significantly changed in abundance in the UTI samples (non-parametric Mann-Whitney U test, p<0.05), 502 in the ovarian cyst samples, 209 in the constipation samples, 111 in the mesenteric adenitis samples, and 58 in the gastroenteritis samples (Figure 5D).
[Figure 2 graphic: bar charts at Peptide Level (y-axis: Identified Peptides) and Protein Level (y-axis: Identified Proteins); bars are labeled with the percentage detected in 3 of 3, 2 of 3, and 1 of 3 replicates.]
[Figure 3 graphic: 3A Coefficient of Variance % vs. Log10 intensity. Protein Level: DDA (peptide peak areas) median CV 16.7%, DIA (fragment ion peak areas) median CV 8.1%. Peptide Level: DDA (peptide peak areas) median CV 15.7%, DIA (fragment ion peak areas) median CV 6.7%. 3B Identified Peptides and Identified Proteins vs. Replicate (1, 2, 3) for DDA, DDA - matching, and DIA, each with its ID increase.]
Data Analysis
The database search was performed in MaxQuant software, version 1.5.0.0, directly with the RAW and WIFF files using the human UniProt database. Trypsin with up to 2 missed cleavages was specified. Mass tolerances were set to 20 ppm for the first search and 4.5 ppm for the main search for the Q Exactive data, and to 0.1 Da for the first search and 0.01 Da for the main search for the TripleTOF 5600 data. Oxidation of M was chosen as a dynamic modification (+15.995 Da) and carbamidomethylation of C as a static modification (+57.021 Da). FDR was set to 1% at the peptide and protein level.
All DIA data were directly analyzed in Spectronaut 7.0 (Biognosys). Dynamic score refinement and MS1 scoring were enabled, as were interference correction and cross-run normalization (total peak area). All results were filtered by a Q value of 0.01 (equal to an FDR of 1% at the peptide level).
Protein intensity was calculated by summing the peptide intensities of each protein from the Spectronaut output file. The data were imported into Perseus 1.5.1.6, and missing values were imputed with the minimum value for each protein. The significance of protein abundance changes was calculated using the u-test (non-parametric test), and proteins with a p value below 0.01 were considered to be significantly changed. The annotation of biological processes was done with the DAVID online tool using the comprehensive urinary spectral library (Spec Lib 06 in Table 1).
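The protein roll-up and imputation steps described above can be sketched as follows. This is a plain-Python illustration of the two operations (summing peptide intensities per protein and sample, then filling missing values with the per-protein minimum); it does not reproduce the Spectronaut or Perseus implementations, and the row layout, identifiers, and intensities are hypothetical.

```python
from collections import defaultdict


def protein_intensities(peptide_rows):
    """Sum peptide intensities per protein and sample.

    peptide_rows: iterable of (protein_id, sample_id, peptide_intensity).
    """
    totals = defaultdict(dict)
    for protein, sample, intensity in peptide_rows:
        totals[protein][sample] = totals[protein].get(sample, 0.0) + intensity
    return dict(totals)


def impute_minimum(totals, samples):
    """Fill samples with no value for a protein with that protein's minimum."""
    imputed = {}
    for protein, by_sample in totals.items():
        floor = min(by_sample.values())
        imputed[protein] = {s: by_sample.get(s, floor) for s in samples}
    return imputed


# Hypothetical peptide-level rows (illustrative only).
rows = [
    ("P1", "S1", 10.0),
    ("P1", "S1", 5.0),
    ("P1", "S2", 8.0),
    ("P2", "S1", 3.0),
]
table = impute_minimum(protein_intensities(rows), samples=["S1", "S2"])
print(table)
```

The completed table can then be passed to a non-parametric two-group test such as the Mann-Whitney U test.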
Overcoming the challenges in Data Independent Acquisition (DIA) via high resolution accurate mass Orbitrap based mass spectrometer Yue Xuan1; Jan Muntel2; Sebastian T. Berger2; Andreas FR Huhmer3; Hanno Steen2; Thomas Moehring1 1Thermo Fisher Scientific, Bremen, GERMANY; 2Departments of Pathology, Boston Children’s Hospital, Boston, MA; 3Thermo Fisher Scientific, San Jose, CA
Conclusion We successfully established a DIA workflow, which enables large scope of urine
proteomics studies based on high resolution accurate mass Orbitrap technology.
Even without any prefractionation of the samples, we detected more than 2,500 proteins, which is close to complete coverage of the urinary proteome by our DIA workflow.
The highly reproducible quantification (median CV 7%) enables to skip technical replicates and to focus on the analysis of biological samples.
The high number of quantified proteins per sample (in average 1,300) per single DIA experiment enabled to get insights into the host cell response in an UTI based on changes in the urinary proteome as well into the formation of an ovarian cyst.
We envision that the comprehensiveness and low analysis time per sample will allow the application of this DIA workflow in synchronous biomarker discovery and validation which requires the analysis of hundreds of samples.
References 1. Rosenberger, G. SCIENTIFIC DATA | 1:140031 | DOI: 10.1038/sdata.2014.31
2. Zheng, J. BMC genomics 2013,14, 777
3. Takaya et al., Int J Oncol, 2015
4. Feldman et al., Clin Cancer Res, 2009
Overview Purpose: Utilize Thermo Scientific TM Q ExactiveTM HF mass spectrometer for DIA LC-MS/MS experiments to study the urinary proteome, and demonstrate how high resolution/ accurate mass spectrometry is employed to overcome DIA challenges and enable large scope of urine proteomics studies.
Methods: Urine samples from 87 patients associated with seven different differential diagnoses were collected and trypsinized using a membrane-based processing method in a 96-well plate format. For generation of a spectral library all samples were analyzed by DDA methods. DIA analysis was performed on the Q Exactive HF MS to quantitatively mapping the urinary proteomes.
Results: In this study, we have developed a robust DIA method for the comprehensive mapping of the urinary proteome that enables large scale urine proteomics studies with high throughput ( less than 4 days for 87 biological sample measurements) and excellent reproducibility (media of CV% is 7%).
Introduction The promises of data independent acquisition (DIA) strategies are a comprehensive and reproducible data collection for large-scale quantitative proteomics experiments. However, the wide isolation window (usually >10Da) of DIA experiments co-isolates and co-fragments multiple peptides, resulting in highly complex DIA MS/MS spectra and makes the DIA data analysis challenging. To accurately identify and precisely quantify thousands of proteins per DIA experiment, the completeness and specificity of spectral library, the mass accuracy of the data, and the technical variance in quantitation play important roles. In this work, we utilize the Q Exactive HF mass spectrometer for DIA LC-MS/MS experiments to study the urinary proteome, and demonstrate how high resolution/ accurate mass spectrometry is employed to overcome these DIA challenges and enable large scope of urine proteomics studies..
Methods Sample Preparation
Urine samples were collected from consenting patients visiting the Emergency Department at Boston Children's Hospital. Upon consent, urine samples from 87 patients associated with seven different differential diagnoses were collected (abdominal pain controls: n = 33, ovarian cyst: n = 12, mesenteric adenitis: n = 6, constipation n = 7, urinary tract infection; UTI: n = 11, gastritis: n = 6, gastroenteritis: n = 12) and trypsinized using a membrane-based processing method in a 96-well plate format.
Liquid Chromatography
For the spectral library, all 87 samples were analyzed by a nanoLC system equipped with a LC-chip system (cHiPLC nanoflex, Eksigent, trapping column: Nano cHiPLC Trap column 200 μm x 0.5 mm Reprosil C18, 3 μm, analytical column: Nano cHiPLC column 75 μm x 15 cm Reprosil C18, 3 μm) coupled online to a Q-Exactive mass spectrometer (Thermo Fisher Scientific). Peptides (4 µl of digest) were separated by a linear gradient from 93% buffer A (0.2% FA in water) / 7% buffer B (0.2% FA in ACN) to 75% buffer A / 25% buffer B within 75 min.
DIA experiments on all 87 samples: Each sample was analyzed on a Thermo ScientificTM EASY-nLCTM 1000 nanoLC system equipped with a trapping column (Thermo ScientificTM PepMapTM100, 75um x 2cm, C18, 3um) and an analytical column (PepMapRSLC, 75um x 25cm, C18, 2um) coupled to a Q Exactive HF mass spectrometer equipped with a Thermo Scientific TM EASY-Spray TM nanoelectrospray ion source. Peptides (2 µl of digest) were separated by a linear gradient from 93% buffer A (0.2% FA in water) / 7% buffer B (0.2% FA in ACN) to 75% buffer A / 25% buffer B within 30 min. The total run time with loading and washing steps was 50 min. Column oven was set to 40°C .
Mass Spectrometry
DDA method on QE: The MS was operated in data-dependent TOP10 mode with the following settings: mass range 400-1,000 Th; resolution for MS1 scan 70,000 @ 200 Th; lock mass: 445.120025 Th; resolution for MS2 scan 17,500 @ 200 Th; isolation width 1.6 m/z; NCE 27; underfill ratio 1%; charge state exclusion: unassigned, 1, >6; dynamic exclusion 30 s.
Additionally a subset of randomly chosen 23 samples was analyzed on a commercial TripleTOF TM 5600 (AB Sciex, Concord, Canada) using the same LC setup and gradient. The MS was operated in data-dependent TOP50 mode with following settings: MS1 mass range 400-1,000 Th with 250 ms acc. time; MS2 mass range 100-1,700 Th with 50 ms acc. time and following MS2 selection criteria: UNIT resolution, intensity threshold 100 cts; charge states 2-5. Dynamic exclusion set to 17 s.
FIGURE 1. The established DIA workflow for Urinary Biomarker discovery
FIGURE 2. Influence of Spectral Library: Percentage of peptides and proteins that are detected in all three replicates, in 2 out of 3 replicates, in single replicate.
FIGURE 3. Reproducibility Evaluation with Three Replicates: Technical Reproducibility Of DDA And DIA Quantification Based On Three Replicates (Fig 3A). Number of Identification in Relation to Technical Replicates ( Fig 3B)
Results Selecting the Most Appropriate Spectral Library
To investigate the influence of the spectral library on the data analysis, we generated six different spectral libraries (Table 1). The quality of a spectral library can be assessed by the completeness of the library and the reproducibility of peptide/protein detection.
The comprehensive urinary library compromised 2,600 protein groups and 20,000 peptides (Spec Lib 06 in Table 1). Based on current studies, this library covers the vast majority of the urinary proteome (2).
We calculated the percentage of peptides and proteins that were detected in all three replicates, in 2 out of 3 replicates or in only one single replicate (Figure 2). The results show a higher reproducibility of peptide/protein identification and quantification when using the urinary spectral library 06. This finding underscores the importance of using a sample-specific spectral library .
Highly Reproducible Peptide Detection and Quantitation with DIA Experiments
To elucidate how the DIA routine performs compared to a DDA workflow, we analyzed an unrelated sample three times with a DDA method and with the DIA method on the Q Exactive HF using the same 30 min gradient. The DIA data were analyzed using the comprehensive urinary library (Spec Lib 06, Table 1).
With our optimized DIA workflow (Figure 1), the median CV of the DIA quantification was only 6.7% on peptide level and 8.1% on protein level, which is about twofold better than for the DDA data (Figure 3A). The numbers of peptides (~5,220) and proteins (1,120) detected in a single DIA experiment are almost twice those of the DDA data (Figure 3B). Our DIA method shows highly reproducible and precise quantification on peptide level.
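As a minimal sketch of how a median CV across technical replicates can be computed, the example below uses hypothetical peak areas. It assumes the sample standard deviation (n - 1 denominator); the poster does not state which estimator was used, and the function names are illustrative only.

```python
from statistics import mean, median, stdev


def cv_percent(values):
    """Coefficient of variation in percent (sample standard deviation / mean)."""
    return 100.0 * stdev(values) / mean(values)


def median_cv(quant_table):
    """Median CV over all analytes that have at least two non-zero-mean values."""
    cvs = [cv_percent(v) for v in quant_table.values()
           if len(v) >= 2 and mean(v) > 0]
    return median(cvs)


# Hypothetical peak areas for three peptides in three replicate injections.
peak_areas = {
    "peptide_1": [100.0, 110.0, 90.0],
    "peptide_2": [200.0, 200.0, 200.0],
    "peptide_3": [50.0, 55.0, 45.0],
}

result = median_cv(peak_areas)
print(f"Median CV: {result:.1f}%")
```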
FIGURE 5. Comprehensive Urinary Proteome Coverage by DIA workflow
FIGURE 6. Top Biomarker Discovery by using DIA workflow
The Power of Mass Accuracy
The application of wide isolation windows (>10 Th) in typical DIA experiments results in complex MS/MS spectra. During data analysis, the ion chromatograms of multiple fragment ions are extracted and aligned for peptide detection and quantification. To separate the analyte of interest from interferences, a highly accurate mass of the ions is crucial. We applied different mass tolerances for the data analysis (50 ppm, 20 ppm, and 10 ppm). One example is shown in Figure 4. With a ± 2.5 min retention time window and a 50 ppm mass tolerance for extracting the multiple transitions of the peptide LVGYLDR, several interferences overlapped with the peptide of interest. Only with a 10 ppm mass tolerance were the interferences removed from the spectrum. Therefore, a mass accuracy of 10 ppm is the minimum requirement for accurate detection of peptides.
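The effect of the mass tolerance on the extraction window can be illustrated with a short calculation. The sketch below uses the y5+ fragment of LVGYLDR at 623.315 m/z from Figure 4; the helper names and the interfering ion at 623.330 m/z are hypothetical and serve only to show how a tighter tolerance excludes a nearby signal.

```python
def ppm_window(mz, tol_ppm):
    """Return the (lower, upper) m/z bounds for a symmetric ppm tolerance."""
    delta = mz * tol_ppm / 1e6
    return (mz - delta, mz + delta)


def within_tolerance(observed_mz, target_mz, tol_ppm):
    """True if the observed m/z lies within tol_ppm of the target m/z."""
    return abs(observed_mz - target_mz) / target_mz * 1e6 <= tol_ppm


TARGET_MZ = 623.315        # y5+ fragment of LVGYLDR (Figure 4)
INTERFERENCE_MZ = 623.330  # hypothetical interfering ion, about 24 ppm away

for tol in (50, 20, 10):
    lower, upper = ppm_window(TARGET_MZ, tol)
    kept = within_tolerance(INTERFERENCE_MZ, TARGET_MZ, tol)
    print(f"{tol:>2} ppm: {lower:.4f}-{upper:.4f}  interference extracted: {kept}")
```

At 50 ppm the window is roughly 0.06 m/z wide at this mass, whereas at 10 ppm it narrows to about 0.012 m/z, so fewer co-isolated fragment ions fall inside it.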
TABLE 1. Spectral libraries
Library Name | Protein Groups | Peptides | Fragment ions | Instrument | Source
Spec Lib 01 | 2,077 | 15,470 | 109,064 | Q Exactive MS | 23 DDA files, urine
Spec Lib 02 | 1,436 | 9,569 | 65,880 | TripleTOF 5600 | 23 DDA files, urine
Spec Lib 03 | 2,226 | 16,985 | 123,087 | Combined from Spec Lib 01 and 02 | Urine
Spec Lib 04 | 14,158 | 149,420 | 2,832,306 | TripleTOF 5600 | see Rosenberger et al., 2014 (1)
Spec Lib 05 | 1,869 | 40,902 | 925,156 | TripleTOF 5600 | subset of Human-14,000
Spec Lib 06 | 2,575 | 19,854 | 144,643 | Q Exactive MS, TripleTOF 5600 | 87 Q Exactive MS DDA files + 23 TripleTOF 5600 DDA files, urine
DIA method on QE HF: Each DIA duty cycle contains one full scan and 24 DIA MS/MS scans covering the mass range 400-1,000 Th. Full scan: resolution 30,000 @ 200 Th; AGC target 3e6; maximal IT 50 ms; mass range 400-1,000 Th. DIA scans: resolution 30,000 @ 200 Th; isolation width 20 m/z for the first 20 DIA scans, 40 m/z for the following 2 DIA scans, and 60 m/z for the last 2 DIA scans; NCE 30; target value 1e6; maximal injection time set to “auto”, which automatically calculates the maximal injection time from the detection time so that the mass spectrometer always operates in parallel ion filling and detection mode.
FIGURE 4. The ion chromatograms of multiple transitions of peptide LVGYLDR of protein (P07602) are extracted with 50 ppm, 20 ppm, and 10 ppm. Retention time window is ± 2.5 min. A mass accuracy of 10 ppm is the minimum requirement for accurate peptide detection.
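The variable-width window scheme can be checked with a short sketch. It assumes contiguous, non-overlapping windows laid out in ascending m/z order starting at 400 Th; the poster states only the widths and counts, so the exact window edges and the helper name `build_dia_windows` are assumptions.

```python
def build_dia_windows(start_mz, scheme):
    """Lay out contiguous isolation windows.

    `scheme` is a list of (width, count) pairs applied in order; the
    contiguous, non-overlapping layout is an assumption of this sketch.
    """
    windows = []
    lower = start_mz
    for width, count in scheme:
        for _ in range(count):
            windows.append((lower, lower + width))
            lower += width
    return windows


# 20 windows of 20 m/z, then 2 of 40 m/z, then 2 of 60 m/z (from the method).
SCHEME = [(20, 20), (40, 2), (60, 2)]
windows = build_dia_windows(400, SCHEME)

print(f"{len(windows)} DIA windows, {windows[0][0]}-{windows[-1][1]} Th")
for lower, upper in windows[-4:]:
    print(f"  {lower}-{upper} Th (width {upper - lower})")
```

The widths add up to 20 × 20 + 2 × 40 + 2 × 60 = 600 Th, which matches the 400-1,000 Th range of the method.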
Comprehensive Urinary Proteome Coverage by DIA workflow
The established DIA workflow (Figure 1) was applied to a urinary study comprising 87 samples. The samples were derived from patients with abdominal pain in a pediatric emergency room. After digestion in a 96-well plate format, we were able to analyze them in less than 4 days with our optimized DIA workflow. On average, we detected 1,301 protein groups (848 – 1,720) (Figure 5A) and 5,714 peptides per sample (3,172 – 8,231) (Figure 5B). In total, our DIA workflow enabled the detection of 2,456 proteins, representing 95% of the proteins in the spectral library. These numbers contrast with the DDA output, in which only 7% of the proteins were identified in more than 95% of the samples and 60% in less than 25% of the samples (Figure 5C). Compared to all other samples, 773 proteins were significantly changed in abundance in the UTI samples (non-parametric Mann-Whitney U test, p<0.05), 502 in the ovarian cyst samples, 209 in the constipation samples, 111 in the mesenteric adenitis samples and 58 in the gastroenteritis samples (Figure 5D).
[Figure 2 plots. Panels: Peptide Level (y-axis: Identified Peptides) and Protein Level (y-axis: Identified Proteins). Legend: 3of3, 2of3, 1of3.]
[Figure 3B plots. Panels: Peptide Level (y-axis: Identified Peptides) and Protein Level (y-axis: Identified Proteins). x-axis: Replicate (1-3). Legend: DDA, DDA - matching, DIA, each with its ID increase.]
[Figure 3A plots. y-axis: Coefficient of Variance (%). x-axis: Log10 Protein Intensity (Protein Level) and Log10 Peptide Intensity (Peptide Level). Protein Level: DDA (peptide peak areas) median CV 16.7%; DIA (fragment ion peak areas) median CV 8.1%. Peptide Level: DDA (peptide peak areas) median CV 15.7%; DIA (fragment ion peak areas) median CV 6.7%.]
Data Analysis
Database search in MaxQuant software, version 1.5.0.0, was performed directly with the RAW and WIFF files using the human UniProt database. Trypsin was specified with up to 2 missed cleavages. Mass tolerances were set to 20 ppm for the first search and 4.5 ppm for the main search for the Q Exactive data, and to 0.1 Da for the first search and 0.01 Da for the main search for the TripleTOF 5600 data. Oxidation of M was chosen as dynamic modification (+15.995 Da) and carbamidomethylation of C as static modification (+57.021 Da). FDR was set to 1% on peptide and protein level.
All DIA data were directly analyzed in Spectronaut 7.0 (Biognosys). Dynamic score refinement and MS1 scoring were enabled; interference correction and cross-run normalization (total peak area) were enabled. All results were filtered by a Q value of 0.01 (equal to an FDR of 1% on peptide level).
Protein intensity was calculated by summing the peptide intensities of each protein from the Spectronaut output file. The data were imported into Perseus 1.5.1.6 and missing values were imputed by the minimum value for each protein. Significance of protein abundance changes was calculated using the u-test (non-parametric test), and proteins with a p value below 0.01 were considered to be significantly changed. The annotation of biological process was done with the DAVID online tool using the comprehensive urinary spectral library (Spec Lib 06 in Table 1).
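The protein roll-up and imputation steps can be sketched as follows. This is not the Spectronaut or Perseus implementation: the row layout, the function names, and the reading of "minimum value for each protein" as the lowest observed intensity of that protein across samples are assumptions made for illustration.

```python
from collections import defaultdict


def protein_intensities(peptide_rows):
    """Sum peptide intensities per protein and sample.

    `peptide_rows` holds (protein, peptide, sample, intensity) tuples;
    this layout is hypothetical and not the Spectronaut export format.
    """
    table = defaultdict(dict)
    for protein, _peptide, sample, intensity in peptide_rows:
        table[protein][sample] = table[protein].get(sample, 0.0) + intensity
    return dict(table)


def impute_minimum(table, samples):
    """Fill missing samples with the lowest observed intensity of that protein."""
    imputed = {}
    for protein, by_sample in table.items():
        floor = min(by_sample.values())
        imputed[protein] = {s: by_sample.get(s, floor) for s in samples}
    return imputed


rows = [
    ("P1", "pep_a", "S1", 10.0),
    ("P1", "pep_b", "S1", 5.0),
    ("P1", "pep_a", "S2", 8.0),
    ("P2", "pep_c", "S1", 3.0),  # P2 was not detected in S2
]
summed = protein_intensities(rows)
complete = impute_minimum(summed, ["S1", "S2"])
print(complete)
```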
Biomarker Discovery
To find the most suitable biomarker candidates, we focused on the ten most significantly changed proteins for each of the five conditions. The performance of the biomarker candidates was assessed by calculating the area under the receiver-operating characteristic curve (AUROC) (Figure 6). For example, Cystatin-B (CYTB) is an intracellular thiol proteinase inhibitor. Its level is increased in ovarian cyst (OC) samples, with a p value of 1.3e-5 (Bonferroni-corrected: p = 0.027) and an AUROC of 0.91. Cystatin-B has been identified as a biomarker candidate in the context of malignant growths in ovaries (3) and bladder (4), i.e. the genitourinary tract.
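For readers unfamiliar with the two statistics, the sketch below computes an AUROC as the fraction of case/control pairs in which the case has the higher intensity (equivalent to the Mann-Whitney U statistic divided by the number of pairs) and applies a Bonferroni correction. The intensities and the number of tests are hypothetical; the poster does not state how many tests entered its correction.

```python
def auroc(cases, controls):
    """Probability that a random case scores above a random control.

    Ties count as half a win; this equals U / (n_cases * n_controls).
    """
    wins = 0.0
    for x in cases:
        for y in controls:
            if x > y:
                wins += 1.0
            elif x == y:
                wins += 0.5
    return wins / (len(cases) * len(controls))


def bonferroni(p_value, n_tests):
    """Bonferroni-corrected p value, capped at 1."""
    return min(1.0, p_value * n_tests)


# Hypothetical log10 protein intensities for one candidate marker.
condition = [5.0, 6.0, 7.0, 4.0]
others = [1.0, 2.0, 4.5, 3.0]

area = auroc(condition, others)
print(f"AUROC = {area:.4f}")
print(f"Corrected p = {bonferroni(0.01, 5):.2f}")  # hypothetical p and test count
```

An AUROC of 0.5 corresponds to no discrimination between the groups, and 1.0 to perfect separation.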
[Figure 4 plots. Panels: 50 ppm, 20 ppm, 10 ppm. x-axis: Time (min), RT 18.93 - 24.07; y-axis: Relative Abundance. Extracted transitions: y5+ (623.315 m/z), y6+ (722.383 m/z), y6+ -H3PO4 (624.406 m/z), y5++ (312.161 m/z), y3+ -NH3 (386.203 m/z), y4+ (566.293 m/z). Extraction windows for y5+: 623.2838-623.3462 m/z (50 ppm), 623.3025-623.3275 m/z (20 ppm), 623.3088-623.3212 m/z (10 ppm).]
Spectronaut is a trademark of Biognosys AG. MaxQuant software is a trademark of Max-Planck Institute of Biochemistry. TripleTOF is a trademark of Sciex, Pte. Ltd. All other trademarks are the property of Thermo Fisher Scientific and its subsidiaries. This information is not intended to encourage use of these products in any manners that might infringe the intellectual property rights of others.
[Figure 5 plots. 5A: Identified Protein Groups per sample. 5B: Identified Peptides (unique peptides) per sample. 5C: Sample Coverage, number of proteins versus Protein Sample Coverage % for DDA and DIA. 5D: Significantly Changed Proteins (Mann-Whitney U test, p<0.05). Groups: Pain Control Group, Ovarian Cyst, Urinary Tract Infection, Constipation, Mesenteric Adenitis, Gastroenteritis.]
[Figure 6 plots. For each marker, a ROC curve (Sensitivity versus 1-Specificity) and the protein intensity per group (OC, UTI, Con, MA, GE, Pain) on a log scale. Ovarian Cyst (OC) marker: CYTB, AUROC = 0.91. Urinary Tract Infection (UTI) marker: PERM (the panel label reads PERE), AUROC = 0.968. Constipation (Con) marker: UROK, AUROC = 0.928. Mesenteric Adenitis (MA) marker: LEG3, AUROC = 0.872. Gastroenteritis (GE) marker: KPYR, AUROC = 0.76.]
[Figure 1 workflow diagram. Urinary Samples (87): 87 ER patients in six groups (Constipation, Mesenteric Adenitis, Gastroenteritis, Pain Control Group, Ovarian Cyst, Urinary Tract Infection). Digestion. DIA (Q Exactive HF MS). Optimized DIA method: average 8-10 data points per LC peak (9 s @ FWHM), average cycle time 2 s. Comprehensive Urinary Spectral Library: 2,575 protein groups, 19,854 unique peptides, 144,643 transitions (87 Q Exactive MS and 23 TripleTOF 5600™ runs). Scoring and Q value calculation by Spectronaut, FDR 1%. Validated peptides/proteins from 87 samples. Perseus: u-test (non-parametric test). KEGG Pathway Enrichment p value. Example chromatogram: RT 20.66 - 21.64 min, base peak m/z 506.2709-506.2811, NL 2.35E7.]
www.thermoscientific.com
©2015 Thermo Fisher Scientific Inc. All rights reserved. ISO is a trademark of the International Standards Organization. Spectronaut is a trademark of Biognosys AG. MaxQuant software is a trademark of Max-Planck Institute of Biochemistry. TripleTOF is a trademark of Sciex, Pte. Ltd. All other trademarks are the property of Thermo Fisher Scientific and its subsidiaries. This information is presented as an example of the capabilities of Thermo Fisher Scientific products. It is not intended to encourage use of these products in any manners that might infringe the intellectual property rights of others. Specifications, terms and pricing are subject to change. Not all products are available in all countries. Please consult your local sales representative for details.
Thermo Fisher Scientific, San Jose, CA USA is ISO 13485 Certified.