
Shotgun Proteomics and Other Biomarker Discovery Methods


WHITE PAPER

Gel-Based, CE, DIA and Shotgun All Have Advantages and Disadvantages in the Quest for New Protein Targets

Protein discovery and human proteome mapping have moved rapidly, guided by a wealth of genomics data and aided by a particularly strong contribution from standardized shotgun proteomics. The remaining proteins likely go “unseen” due to a variety of factors. Thus, the life sciences still have fertile ground to cultivate for biomarkers, despite the dearth of new protein antigen-based assays that have come to market in the age of genomics and since the emergence of other ‘omics’.

The first two years of the Human Proteome Project (September 2010 to October 2012) saw significant progress: at least one peptide was identified for roughly 12,500 protein-encoding gene sequences (contained in Swiss-Prot), leaving roughly 7,500 gene-predicted proteins yet to be discovered with confidence in a proteomics experiment.1

Some limitations of biomarker discovery as it is predominantly practiced today can be overcome through the use of alternative ion sources, underrepresented sample resources, intact protein analysis, and alternative sample preparation. The cataloguing of proteins as gene products also represents only a base level of proteomics knowledge; the validity of proteins as biomarkers and drug targets may rest upon the post-translational modifications that affect their function and activity within cellular and biological systems. Finally, the roles of identified proteins and their association with disease have not been completely documented, leaving potential for already discovered proteins to find use as biomarkers.

                                                            1 Farrah T., Deutsch E. W., et al., Journal of Proteome Research, December 2012


In general terms, proteomics-assisted biomarker discovery relies on the ability of mass spectrometers to determine the mass of ionized proteins, peptides and peptide fragments from their mass-to-charge (m/z) ratios. The masses of peptides and peptide fragments indicated by their m/z values are compared to the known masses of amino acid sequence fragments; entire protein chains can be discovered or identified by assembling and associating mass data from intact proteins, peptides, and peptide fragments. Protein sequence informs identification, structural characterization, and the parameters of targeted analyses for a single protein or set of putative biomarkers in more samples. Numerous strategies exist for proteomic biomarker discovery. They rely upon separation and fractionation of samples for improved resolution during mass analysis, as well as extensive sample preparation prior to MS analysis for targeted searches.
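As a rough illustration of the relationship between peptide mass and observed m/z, the following Python sketch computes the monoisotopic mass of an unmodified peptide from standard residue masses and derives the m/z expected at several charge states. The function names and the example sequence are illustrative assumptions, and the sketch ignores modifications and isotope distributions.

```python
# Monoisotopic residue masses (Da) for the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565   # H2O accounting for the free N- and C-termini
PROTON = 1.007276   # charge carrier in positive-ion mode


def peptide_mass(sequence: str) -> float:
    """Monoisotopic neutral mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in sequence) + WATER


def mz(sequence: str, charge: int) -> float:
    """m/z of the peptide carrying `charge` protons."""
    return (peptide_mass(sequence) + charge * PROTON) / charge


# Example sequence chosen purely for illustration.
for z in (1, 2, 3):
    print(f"PEPTIDE {z}+: m/z = {mz('PEPTIDE', z):.4f}")
```

Search software effectively runs this calculation in reverse, matching observed m/z values against the masses predicted for candidate sequences.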

Biomarker discovery can begin with anywhere from fewer than 10 to more than 100 samples, including diseased or other phenotypic samples and control samples, for the purposes of relative quantitation and preliminary differential analysis. Identified proteins with abundance differentials between samples can be designated as putative or candidate biomarkers, particularly if they are plausibly affected by or involved in a cellular pathway implicated in a disease.
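A preliminary differential analysis of this kind can be sketched as a log2 fold-change filter between group means. The protein names, abundance values, and the two-fold cutoff below are hypothetical; a real study would add normalization and statistical testing before naming candidates.

```python
import math
from statistics import mean


def candidate_biomarkers(disease, control, min_abs_log2_fc=1.0):
    """Return {protein: log2 fold change} for proteins passing the cutoff.

    disease / control: dict mapping protein name -> list of abundances,
    one value per sample. The default cutoff (2-fold) is an assumption.
    """
    hits = {}
    for protein in disease.keys() & control.keys():
        d, c = mean(disease[protein]), mean(control[protein])
        if d <= 0 or c <= 0:
            continue  # log ratio is undefined without signal in both groups
        log2_fc = math.log2(d / c)
        if abs(log2_fc) >= min_abs_log2_fc:
            hits[protein] = log2_fc
    return hits


# Toy abundances (arbitrary units) for three hypothetical proteins.
disease = {"PROT_A": [400, 420, 380], "PROT_B": [100, 110, 90], "PROT_C": [50, 50, 50]}
control = {"PROT_A": [100, 100, 100], "PROT_B": [100, 95, 105], "PROT_C": [200, 200, 200]}

for name, fc in sorted(candidate_biomarkers(disease, control).items()):
    print(f"{name}: log2 fold change = {fc:+.2f}")
```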

Shotgun Proteomics

Shotgun proteomics has powered a significant amount of protein discovery and can be summarized as a sequence of steps: protein digestion; isobaric mass tagging; orthogonal separation by one method combined with reversed-phase liquid chromatography (RPLC); tandem mass spectrometry (MS/MS); and data-dependent acquisition (DDA). The shotgun approach is capable of identifying up to roughly 1,000 proteins in a single run (MS/MS analysis of LC elution) through the enhanced resolution afforded by multiple separation parameters. Sample complexity can also be mitigated by orthogonal separation and elution as well as by the selection of limited or restrained separated fractions.2 Enhanced resolution coupled with informatics-assisted matching of peptide fragments to proteins provides high protein identification rates. Relative quantitation is also provided through comparison of the tag reporter fragments specific to each sample.
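The reporter-based relative quantitation mentioned above can be sketched as follows. Each MS/MS spectrum is reduced to a set of reporter-ion intensities, one per labeled sample; intensities are summed per channel and expressed relative to a reference channel. The channel labels and intensities are made-up values, and the sketch omits the isotope-impurity correction and normalization that real workflows apply.

```python
def reporter_ratios(spectra, reference_channel):
    """Sum reporter-ion intensities per channel and divide by the reference.

    spectra: list of dicts mapping channel label -> reporter intensity,
    one dict per MS/MS spectrum assigned to the same protein.
    """
    totals = {}
    for spectrum in spectra:
        for channel, intensity in spectrum.items():
            totals[channel] = totals.get(channel, 0.0) + intensity
    reference = totals[reference_channel]
    return {channel: total / reference for channel, total in totals.items()}


# Two hypothetical spectra from one protein, three labeled samples.
spectra = [
    {"114": 1000, "115": 2000, "116": 500},
    {"114": 3000, "115": 6000, "116": 1500},
]
print(reporter_ratios(spectra, reference_channel="114"))
```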

The traditional shotgun approach has enabled deeper interrogation of complex proteomic samples, but remains limited by the characteristics of DDA and by the choice of separation methods orthogonal to RPLC. Data-dependent acquisition determines the m/z value and intensity of LC-eluted precursor ions in the first stage of MS/MS and then fragments only selected precursor ions, chosen randomly or by rules that often favor high-abundance species, to record secondary m/z values and intensities. Co-eluted peptides, or those with close retention times, cannot all be fragmented due to the rate of the experiment. Low-abundance peptides are commonly not fragmented during DDA due to masking by a higher intensity signal at a similar m/z value (ion suppression) or the rule-driven choice of a higher-abundance peptide.
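The abundance bias of rule-driven DDA can be illustrated with a simplified "top N" selection. In this sketch, the instrument picks the N most intense precursors from a survey scan and skips any m/z on a dynamic exclusion list. The function name, tolerance, and scan values are assumptions for illustration; actual instrument firmware applies additional rules such as charge-state filters and timed exclusion.

```python
def select_precursors(survey_scan, top_n, excluded, tolerance=0.01):
    """Pick the top_n most intense precursors not on the exclusion list.

    survey_scan: list of (mz, intensity) tuples from one MS1 scan.
    excluded: m/z values fragmented recently (dynamic exclusion).
    """
    def is_excluded(mz):
        return any(abs(mz - ex) <= tolerance for ex in excluded)

    eligible = [peak for peak in survey_scan if not is_excluded(peak[0])]
    eligible.sort(key=lambda peak: peak[1], reverse=True)
    return eligible[:top_n]


# Hypothetical survey scan: three abundant and two low-abundance precursors.
scan = [(400.69, 9e6), (512.30, 5e6), (622.81, 2e5), (455.25, 8e4), (733.40, 3e6)]

first = select_precursors(scan, top_n=3, excluded=[])
print("First cycle:", [mz for mz, _ in first])

# Only after the abundant ions are excluded do the weaker ones get a turn.
second = select_precursors(scan, top_n=3, excluded=[mz for mz, _ in first])
print("Next cycle:", [mz for mz, _ in second])
```

If the low-abundance peptides have finished eluting before the next cycle, they are never fragmented, which is the limitation described above.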

Shotgun proteomics is frequently defined by its use of orthogonal separation, that is, the front-end reduction of sample complexity using a separation or fractionation method before in-line LC, typically high performance liquid chromatography (HPLC) or more advanced ultra-high performance liquid chromatography (UHPLC) with flow rates down to the nanoliter-per-minute range for nanospray injection. Separation prior to RPLC is commonly size-exclusion chromatography (SEC), ion exchange chromatography (including strong cation exchange [SCX]), RPLC, or isoelectric focusing (IEF) with off-gel electrophoresis (OGE) or immobilized pH gradient (IPG), which deliver samples in liquid phase for RPLC elution.

2 Two-dimensional gel electrophoresis (2DGE) of labeled or dyed samples also provides a basis for investigation of differentially expressed proteins

A common orthogonal separation to RPLC in shotgun proteomics is SCX chromatography. A configuration of SCX-RPLC-MS/MS was termed multidimensional protein identification technology (MudPIT) by its chief developer Michael Washburn in the early 2000s. The complexity of MudPIT places greater demands on lab personnel, but also enables the acquisition of thousands of MS/MS spectra. The resolution of so many peptides was originally a challenge for MS system data acquisition, though current models have improved significantly since MudPIT was introduced. The limitations of SCX as an in-line separation method in shotgun proteomics include its relatively limited resolution of peptides and reduced sample recovery due to sample loss. Reversed-phase LC can be preferred over SCX as the first dimension because it uses low-salt or salt-free buffers, removing the need for desalting, and because RPLC-RPLC at different pHs offers separation orthogonality comparable to SCX-RPLC.3

Many other two-dimensional separation configurations exist for shotgun proteomics, but they are reserved for certain samples or the study of select protein classes. Although sometimes termed a shotgun method, GeLC-MS/MS is less commonly used now that so many separation alternatives are available. Gel-based shotgun methods suffer from an inability to integrate with a high-throughput in-line (liquid phase) system and from the less than ideal orthogonality of separation by molecular size and hydrophobicity (SDS-PAGE followed by RPLC). The sensitivity of gel-based separations is also limited. Affinity LC separation can be used to capture select peptides for discovery or to deplete abundant peptides or proteins that are not of analytical interest. Isoelectric focusing with OGE-IPG has also been used successfully to reduce sample complexity. Additional methods may use three dimensions of separation, such as SDS-PAGE (molecular weight) followed by IEF (pI) and RPLC.

While deeper proteome coverage and global protein profiling remain paramount for de novo studies in basic research, proteomics technology employed in translational research often takes a more targeted approach. Biomarker discovery more often proceeds from the analysis of protein classes or from knowledge of a certain cellular signaling pathway. Thorough resolution of complex samples using two or more dimensions of separation is achievable using current shotgun configurations, but is more tedious and may involve more off-line work. Trained searches through sample proteomes using reproducible, automated, in-line methodology are a more relevant goal for biomarker discovery.

Shotgun proteomics is also inherently limited by its overall scheme as a “bottom-up” method that performs mass analysis on digested proteins or peptide fragments rather than proteins in their original form. In complex samples it is more likely that identical peptide fragments are produced through digestion from different proteins, leading to false identifications or indeterminate proteome coverage. Bottom-up methods also limit the effective analysis of PTMs, which are better studied using intact proteins and their fragments (“top-down” and “middle-down” proteomics). Shotgun proteomics has dramatically expanded proteome coverage and continues to find applications in the study of other species’ proteomes and clinical sample studies, though more targeted analyses are increasingly feasible using accumulated proteomics knowledge and are taking precedence in the search for potential biomarkers.

3 Yang F., Shen Y., et al., 2012
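The shared-peptide ambiguity of bottom-up analysis can be demonstrated with a small in silico digest. The sketch below applies the usual trypsin rule (cleave after K or R, except before P) with no missed cleavages. The two protein sequences are invented toy sequences, not real database entries, and the function name and minimum length are assumptions.

```python
def tryptic_digest(sequence, min_length=5):
    """Cleave after K or R unless the next residue is P; no missed cleavages."""
    peptides, start = [], 0
    for i, residue in enumerate(sequence):
        at_end = i == len(sequence) - 1
        if at_end or (residue in "KR" and sequence[i + 1] != "P"):
            peptides.append(sequence[start:i + 1])
            start = i + 1
    # Very short peptides are rarely informative, so drop them.
    return {p for p in peptides if len(p) >= min_length}


# Two hypothetical proteins that happen to share an internal stretch.
protein_a = "MAEELKGDVTFPSLKAAQWTR"
protein_b = "MSSNPRGDVTFPSLKCCHYEK"

shared = tryptic_digest(protein_a) & tryptic_digest(protein_b)
print("Peptides that cannot distinguish the two proteins:", shared)
```

A spectrum matched to a shared peptide supports both proteins equally, so identification must rest on the peptides unique to each.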

Shotgun Proteomics Summary (MS Types, Separation/Fractionation Methods, Advantages, Disadvantages)

Shotgun Proteomics

Mass Spec Technologies: IT; Q-TOF; LIT-Orbitrap; Q-IT; FT-ICR (top-down)

Separation/Fractionation: 2DLC, commonly SCX/SEC-RPLC; IEF-RPLC

Application: Bottom-up proteomic discovery and global protein profiling

Advantages:
- In-line, standardized process for multiplexed identification and quantification of potentially thousands of proteins

Disadvantages:
- Loss of some structural information (fragile PTM sites)
- Low sensitivity
- False identification from peptide fragment matching in complex samples

Source: Kalorama Information

Gel-Based Discovery

Gel-based separation prior to mass analysis is generally considered apart from shotgun proteomics, with the latter predominantly using liquid phase orthogonal separation. Proteomic biomarker discovery is more often performed using gel-free shotgun approaches, though gel-based methods, particularly difference gel electrophoresis (DIGE), remain relevant. Difference gel electrophoresis improves upon traditional 2DGE by permitting the loading of multiple samples into the same gel through the use of different dyes. Up to two samples, plus an internal pooled standard, can be loaded into the same gel for DIGE. This capability reduces hands-on time by reducing the need to prepare multiple gels. Reproducibility limitations of 2DGE are also mitigated; inter-gel variability is controlled with an internal pooled standard that also allows for more confident quantitation and differential analysis between samples. Advanced dyes used with DIGE also significantly improve the sensitivity of spot detection and the effectiveness of image analysis systems.4 Intact proteins are commonly loaded into DIGE, providing an effective separation method for the analysis of PTMs and other species including isoforms, mutants, splice variants, and proteolytic cleavage products.
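The role of the internal pooled standard can be sketched numerically. Dividing each sample's spot volume by the volume of the same spot in the standard channel of the same gel yields a ratio that can be compared between gels. The data layout, sample names, and spot volumes below are hypothetical; commercial DIGE image analysis software performs a more elaborate version of this step.

```python
def standardized_abundance(gels):
    """Express each sample spot volume relative to the pooled standard on its gel.

    gels: list of dicts with a "standard" spot->volume map and a
    "samples" map of sample name -> spot->volume map.
    """
    result = {}
    for gel in gels:
        standard = gel["standard"]
        for sample_name, spots in gel["samples"].items():
            result[sample_name] = {
                spot: volume / standard[spot]
                for spot, volume in spots.items()
                if spot in standard
            }
    return result


# Gel 2 stains 1.5x more intensely overall than gel 1 (inter-gel variation).
gels = [
    {"standard": {"spot1": 1000, "spot2": 500},
     "samples": {"control_1": {"spot1": 1000, "spot2": 500},
                 "disease_1": {"spot1": 2000, "spot2": 500}}},
    {"standard": {"spot1": 1500, "spot2": 750},
     "samples": {"control_2": {"spot1": 1500, "spot2": 750},
                 "disease_2": {"spot1": 3000, "spot2": 750}}},
]

for sample, spots in standardized_abundance(gels).items():
    print(sample, spots)
```

In this toy case the raw volumes differ between gels, yet both disease samples show the same two-fold ratio for spot1 once standardized.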

                                                            4 Applied Bionomics; accessed April 2015


Gel-Based Discovery Proteomics Summary (MS Types, Separation/Fractionation Methods, Advantages, Disadvantages)

Gel-Based Discovery Proteomics

Mass Spec Technologies: IT; Q-TOF; MALDI-TOF/TOF; LIT-Orbitrap; Q-IT

Separation/Fractionation: 2DGE, including DIGE

Application: Intact protein characterization and identification; differential analysis

Advantages:
- Well-established method
- Improved reproducibility with use of DIGE or multiple samples in same gel
- Effective fractionation of protein species and aided analysis with dye and gel imaging platforms

Disadvantages:
- Non-DIGE 2DGE has limited reproducibility
- In-line (with LC-MS/MS) difficult with gel
- Added time and labor for experiment

Source: Kalorama Information

Capillary Electrophoresis

Capillary electrophoresis (CE) saw limited prospects for appreciable market development in proteomics research until a recent shift in analytical priorities from high-throughput protein identification, or global proteome profiling, to targeted protein characterization. Intact protein analysis is increasingly important to proteomics research both for the identification of drug targets and biomarkers that are unique species or structurally elaborated from the straight gene product. Several factors make CE, particularly the simplest sub-technique capillary zone electrophoresis (CZE), potentially attractive and superior to LC for expanded use in top-down discovery proteomics:

• CE avoids the weak bonding or analyte interaction with separation media seen with LC columns; interaction between proteins or polypeptides and LC separation media reduces sample recovery and separation efficiency

• Better handling of large proteins and polypeptides (>50 kDa); in reversed-phase LC (RPLC) columns, large proteins take longer and are more difficult to elute, produce broad peaks, and reduce column lifetime

• Faster separation, with most CZE elutions complete within 10-30 minutes compared to 50 minutes with HPLC

• Greater sensitivity, with resolution of analytes in the low nanogram range compared to the microgram to high nanogram range of HPLC

• Higher-efficiency separation is possible, with theoretical plate counts approaching 100,000

• Very small sample requirements, with CZE loading capacity one to three orders of magnitude lower than HPLC
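For readers less familiar with the plate count cited in the list above, separation efficiency is commonly estimated from a peak's migration (or retention) time and its width at half height using N = 5.54 (t / w½)². The sketch below applies that standard formula; the migration time and peak width are illustrative values, not measurements from any study cited here.

```python
def plate_count(migration_time, half_height_width):
    """Theoretical plates from the half-height formula N = 5.54 * (t / w)^2.

    Both arguments must use the same time unit. The constant 5.54 is the
    conventional rounding of 8 * ln(2) for a Gaussian peak.
    """
    return 5.54 * (migration_time / half_height_width) ** 2


# Hypothetical peak migrating at 600 s with a 4.5 s width at half height.
n = plate_count(migration_time=600.0, half_height_width=4.5)
print(f"Theoretical plates: {n:,.0f}")
```

Narrow peaks relative to migration time drive the plate count up, which is also why fast, efficient CZE separations place demands on MS acquisition speed.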

The potential advantages of CE in top-down proteomics applications are moderated by the technology’s current limitations in common practice when integrated with proteomics MS platforms. The efficiency and speed of CZE separation outstrip the rate at which most current MS systems can acquire quality spectra. In other words, individual CZE runs may be unable to thoroughly capture high-complexity sample proteomes. Additionally, the small sample loading capacities of CE, in the nanoliter range, can limit the detectable dynamic range, since low-abundance proteins in such a small sample can fall below the sensitivity of the MS instrument.

Multiple studies have demonstrated the feasibility of CZE-ESI-MS/MS with front-end orthogonal separations for large-scale bottom-up analyses of tryptic digest.5 Bottom-up proteomics similarly stands to benefit from the fast separation time of CZE. However, CZE has received most of its recent attention in research for top-down proteomics, specifically the resolution of protein isoforms (PTM and sequence variants of the same protein). Top-down proteomic discovery using CZE has been hindered by the technology’s minute sample loading capacity; top-down studies commonly require larger sample amounts than bottom-up digests. A solution to the minute sample loading capacity of CZE is coupling the separation method with capillary isoelectric focusing (CIEF) or another second dimension of separation, though at a cost of added complexity in sample manipulations.6 A related liquid-phase technique, free flow electrophoresis (FFE), has also been explored for top-down proteomics separation as a continuous electrophoretic technique that permits higher sample volumes than CZE. Work also continues on optimizing CZE for top-down proteomics. Importantly, researchers have been able to apply and refine electrospray ionization (ESI) with a sheathless design for on-line CZE-ESI-MS development.

5 Yan, X., Essaka, D. C., et al., Proteomics, September 2013

6 Chromatography Online Interview with Norman Dovichi, December 2013


Capillary Electrophoresis-Based Discovery Proteomics Summary (MS Types, Separation/Fractionation Methods, Advantages, Disadvantages)

CE-Based Discovery Proteomics

Mass Spec Technologies: IT; Q-TOF; LIT-Orbitrap; Q-IT

Separation/Fractionation: Capillary electrophoresis

Application: Intact protein characterization and identification; bottom-up proteomic discovery and global protein profiling

Advantages:
- Lower sample consumption than LC
- More efficient separation of intact proteins and large biological molecules; theoretical plate counts approaching 100,000 with digested peptides
- Higher sensitivity (to low nanogram range)
- Faster elution

Disadvantages:
- Small sample loading capacity
- Rapid elution difficult for some MS systems
- Little experience in proteomics research compared to LC and gel methods

Source: Kalorama Information

Data-Independent Acquisition

Alternatives to shotgun proteomics are defined not only by their front-end separation strategies (gel- and CE-based separation), but also by their spectral scanning strategy or method of spectra acquisition. Shotgun proteomics relies upon data-dependent acquisition, which obtains fragmentation spectra selectively and is generally biased towards high-abundance or high-(spectral) intensity peptides. Shotgun proteomics is also constrained by run time, that is, the limited operational window in which to obtain fragment spectra before more precursor ions elute. The depth of proteome coverage in shotgun proteomics is ultimately limited by instrument speed.

In contrast to DDA, data-independent acquisition (DIA) methods provide extensive coverage through the systematic fragmentation of all peptides within an assigned mass (m/z) window. During DIA, the instrument cycles through uniformly wide m/z windows that, pieced together, cover the user-set mass range. The m/z values of all precursors and fragment products are then obtained in switched MS and MS/MS modes. The experiment continues until all windows are analyzed for precursor and product ions and a thorough, high-resolution map of all spectra is produced. Because the cycled precursor windows are sized to what the instrument can manage, all peptides and peptide fragments can be comprehensively fragmented and acquired. Reproducibility of generated peptide maps is improved with DIA over DDA because fragmentation selection in DDA often varies from run to run.
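The window scheme described above can be sketched in a few lines. The code tiles a user-set mass range with equal-width isolation windows and estimates how long one full cycle through them would take. The mass range, window width, and scan times are assumed values chosen for illustration; real methods often add window overlap or variable widths.

```python
def isolation_windows(mz_start, mz_end, width):
    """Contiguous, equal-width isolation windows covering [mz_start, mz_end]."""
    windows = []
    low = mz_start
    while low < mz_end:
        windows.append((low, min(low + width, mz_end)))
        low += width
    return windows


def cycle_time(n_windows, ms2_scan_time, ms1_scan_time):
    """Seconds needed for one survey scan plus one MS/MS scan per window."""
    return ms1_scan_time + n_windows * ms2_scan_time


# Assumed method: 400-1200 m/z tiled with 25 m/z windows.
windows = isolation_windows(400, 1200, 25)
print(f"{len(windows)} windows, first {windows[0]}, last {windows[-1]}")

# Assumed scan times: 0.25 s survey scan, 0.1 s per MS/MS scan.
print(f"Cycle time: {cycle_time(len(windows), 0.1, 0.25):.2f} s")
```

The trade-off is visible in the arithmetic: narrower windows improve selectivity but lengthen the cycle, reducing the number of measurements across each chromatographic peak.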

Label-free quantitation strategies such as relative quantitation by peak intensity and spectral counting are also limited for low-abundance proteins when using DDA. In addition to the irreproducible nature of LC retention and the consequent variance in elution time and peak areas, relative quantitation by peak intensity with DDA is not effective for low-abundance peptides that are co-eluted with or masked by high-abundance species in both the chromatogram and the MS/MS spectrum. Without comprehensive fragmentation, DDA quantitation by spectral counting becomes problematic as there are fewer peptide fragment spectra available to determine relative abundance.

Data-independent acquisition offers a robust, label-free relative quantitation method using the comprehensive MS/MS fragmentation spectra. The high-resolution MS/MS spectra are interrogated post-acquisition and integrated by common peak elution time to determine the relative abundance of peptides. The interrogation of highly complex DIA data sets is often only limited by the available spectral library used for reference and software algorithms. The drawback of DIA is the potential for peptides that are not of interest within a particular DIA fragmentation window to contribute the same peptide fragments as the peptide of interest. Contamination from non-targeted peptides can skew intensity levels and quantitation, limit the dynamic range, and raise the limits of detection and quantitation (LOD/LOQ) from the MS/MS spectra. Multiplex DIA was developed to overcome this problem; the current iteration uses 5 non-consecutive 4 m/z-wide windows to select peptides for fragmentation and comprehensive MS/MS acquisition. The narrow mass window sampling continues until the entire desired mass range is covered. The enhanced selectivity of multiplex DIA decreases the chance that another peptide contributes fragment ions that are the same as the peptide of interest.7
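One way to picture the multiplexed scheme is to divide the mass range into 4 m/z windows and assign five non-adjacent windows to each MS/MS scan. The sketch below uses a simple deterministic interleaving to guarantee that the windows in a scan are non-consecutive. This assignment rule and the 500-900 m/z range are assumptions for illustration only; the published method may select its windows differently.

```python
def multiplexed_scans(mz_start, mz_end, width=4, windows_per_scan=5):
    """Group narrow isolation windows into scans of non-adjacent windows.

    Uses interleaving: scan s takes windows s, s + n_scans, s + 2 * n_scans, ...
    This rule is an illustrative assumption, not the instrument's algorithm.
    """
    n_windows = (mz_end - mz_start) // width
    if n_windows * width != mz_end - mz_start or n_windows % windows_per_scan:
        raise ValueError("range must divide evenly into windows and scans")
    n_scans = n_windows // windows_per_scan
    windows = [(mz_start + i * width, mz_start + (i + 1) * width)
               for i in range(n_windows)]
    return [[windows[s + k * n_scans] for k in range(windows_per_scan)]
            for s in range(n_scans)]


scans = multiplexed_scans(500, 900)
print(f"{len(scans)} scans per cycle")
print("Windows fragmented together in the first scan:", scans[0])
```

Because each scan mixes fragments from several narrow windows, software must later demultiplex the spectra to recover which window each fragment came from.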

Data-independent acquisition is able to broadly capture peptide spectra for discovery purposes while providing some functionality as a targeted proteomics method. In silico or computer-driven mining of DIA data – more thorough than DDA data, but not as readily interpretable without recorded precursor-fragment associations – can be used for targeted searches of select peptides and relative quantitation. With extensive reference libraries already available from thousands of shotgun experiments, discovery phase experiments with the aid of DIA packages have begun leaning towards targeted proteomics and effectively jumpstarting biomarker verification.

                                                            7 The Scientist, June 1, 2014


Data-Independent Acquisition Discovery Proteomics Summary (MS Types, Separation/Fractionation Methods, Advantages, Disadvantages)

DIA Discovery Proteomics

Mass Spec Technologies: Q-TOF; LIT-Orbitrap; Q-Orbitrap

Separation/Fractionation: HPLC

Application: Bottom-up proteomic discovery and global protein profiling; differential analysis; relative quantitation

Advantages:
- Improved sensitivity and targeting of low-abundance peptides/proteins
- Highly complex, thorough representation of sample proteome
- Reduces on-line (LC-MS/MS) operation time
- Software-driven with widely available peptide reference libraries for protein identification and peptide quantitation from fragments
- Effective discovery quantitation of peptides/proteins of interest

Disadvantages:
- “Contamination” from similar fragments contributed by other peptides in DIA window
- Technically dense approach with significant in silico or workstation-based time for interpretation

Source: Kalorama Information


The Proteomic Research Market Opportunity, Defined

Kalorama Information’s Proteomic Research Markets includes analysis of markets for: Instruments (Mass Spectrometry, Liquid Chromatography, Electrophoresis), Disease Areas (Oncology, Cardio/Blood, Neurology, Other), Trends and Market Share

Pharmaceutical and diagnostic companies spend tens of billions of dollars on translational research, which has increased the focus on instruments capable of powerfully assaying and identifying proteins implicated in disease pathways. Analyst Emil Salazar examines proteomics research processes in detail and in terms of their instrument and technological requirements. As part of its coverage, the report includes market size estimates (2014) and market projections (2019) for the following instrument groups:

• Mass Spectrometry
• Liquid Chromatography
• Electrophoresis

This report also explores the proteomics research market by the current and forecast size of disease group market segments that include:

• Oncology
• Neurology
• Cardiovascular
• Other

Proteomics research factors into life sciences research as a demanding, advanced area of research antecedent to clinical research and clinical trials that are performed more widely across the world. Among the “omics,” proteomics trails only the fields of genomics and transcriptomics in terms of maturity and heft in research investment.

FOR MORE INFORMATION OR TO ORDER: http://www.kaloramainformation.com/Proteomics-Research-Mass-8989207/


Mass spectrometry-based proteomics is a particular focus of this report, as MS-based research methods represent a leading option for pharmaceutical and life science companies to mitigate research expenditures by multiplexing detection in a relatively ‘antibody-less’ workflow.

This report compares the following types of MS Systems:

• Triple Quadrupole (QqQ)
• Time-of-flight (TOF)
• Quadrupole-TOF (Q-TOF)
• Ion Traps (IT, QIT, LIT)
• Orbitraps
• FT-ICR

The process of validating useful proteins for new drugs and clinical tests begins with translational research that transforms laboratory or preclinical study discoveries into useful pre-clinical studies and clinical trials. Following protein biomarker and drug target validation, translational research proceeds alternatively with the optimization of drug leads active for the identified target or the development of assays used in preclinical studies, as a less-regulated lab developed test (LDT), research use only (RUO) assay, or clinically.

Global Segmentation

Kalorama Information's market analysis is worldwide and this report breaks out the market for proteomic research into geographic regions. North America is the dominant regional market for proteomics research with instrumentation distributed widely among universities, research centers, medical centers, core pharmaceutical laboratories, and specialty contract research organizations (CROs) focusing on biomarker development. Asia-Pacific is expected to see build-up of proteomics instrumentation capacity following the same trajectory of genomics research development in the region. The report defines the geographic trend in the market and provides specific proteomic research market sizes for the following regions:

• United States
• Europe
• Asia-Pacific
• Rest of World (ROW)

Who Competes in This Market and Where Do They Stand?

As part of its analysis of proteomics research instrument markets, the report identifies company market share for specific instrument areas including liquid chromatography and mass spectrometry as well as the market leaders in electrophoresis. The report notes the background, products and revenue of the major market players.

http://www.kaloramainformation.com/Proteomics-Research-Mass-8989207/