
Express Biological Pathway Analysis of Mass Spectrometry


Conclusion

• Modern proteomics studies generate an enormous number of peptide and protein identifications.

• One of the remaining challenges in proteomics is data analysis, particularly when trying to interpret the proteins and peptides identified by automated database search engines and extract biologically-meaningful information from mass spectrometry experiments.

• Here we presented a software tool that enables characterization of protein datasets based on the involvement of identified proteins in different pathways.

• Using the study of global site-specific phosphorylation dynamics in the EGFR signalling network (Olsen et al., Cell, 2006) as an example, we concluded that the dominant over-represented pathway was the ErbB signalling pathway (20% of pathway coverage).

• As expected, when compared to the reference Swiss-Prot database, other over-represented pathways included myeloid leukemia, as well as the insulin and mTOR signalling pathways.

• As anticipated, after 10 min. of stimulation, top over-represented pathways included ErbB and MAPK. This is in agreement with the current model of EGF signal transduction.

References

1. Olsen, J.V., Blagoev, B., Gnad, F., Macek, B., Kumar, C., Mortensen, P., Mann, M., "Global, in vivo, and site-specific phosphorylation dynamics in signaling networks". Cell, 2006, Vol. 127, No. 3, p. 635–648.

2. Futschik, M.E. and Carlisle, B., "Noise-robust soft clustering of gene expression time-course data". Journal of Bioinformatics and Computational Biology, 2005, Vol. 3, No. 4, p. 965–988.

Overview

Purpose: Extraction of biologically meaningful information from proteomics datasets

Method: Development of a novel bioinformatics tool

Results: Implementation of KEGG pathway repository into existing bioinformatics tool

Introduction

The mass spectrometry-based approach to proteomics studies has significantly enhanced our understanding of complex biological processes. However, a number of challenges remain, one of which is how to quickly extract biologically-relevant information from the data. Despite the wide variety available, most bioinformatics tools have been designed to deal with genetic arrays and do not take into consideration the specific characteristics associated with proteomics data.

Here we present a new development in the Thermo Scientific ProteinCenter bioinformatics tool: Implementation of the Kyoto Encyclopedia of Genes and Genomes (KEGG). The KEGG pathways repository speeds up and facilitates the extraction of biologically-meaningful information and statistical analysis of MS data. In the current study, epidermal growth factor (EGF) pathways were investigated.

Methods

ProteinCenter™ software is specifically designed for the interpretation of proteomics data, supporting direct import from multiple database search engines and protein repositories. Recently, a pathway analysis tool based on the KEGG pathways repository has been implemented. Proteomics datasets were downloaded from literature sources, in particular, data from Olsen, JV, et al., Cell. 2006 127(3):635-48.

Data loaded into ProteinCenter software included protein accession codes, peptide sequences and quantitative information. Protein lists were clustered according to an indistinguishable proteins group algorithm. The IPI and Swiss-Prot databases were used as reference databases. Statistical analysis was based on Benjamini-Hochberg correction of p-values with a false discovery rate (FDR) of 5%. Over-representation analysis of KEGG pathways was performed on proteins that were 1.249 times up- or down-regulated. Values were calculated for whole datasets, as well as for each dataset from the kinetics study at 0, 5, 10 and 20 minutes after EGF treatment of HeLa cells.
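The Benjamini-Hochberg step mentioned above can be illustrated with a minimal sketch. This is not ProteinCenter code; the function names and the example p-values are hypothetical, and only the 5% FDR threshold comes from the poster.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so the adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def significant_at_fdr(p_values, fdr=0.05):
    """Flag each test whose adjusted p-value is at or below the FDR threshold."""
    return [q <= fdr for q in benjamini_hochberg(p_values)]


if __name__ == "__main__":
    # Hypothetical raw p-values for three pathways (illustration only).
    raw = [0.01, 0.5, 0.9]
    print(benjamini_hochberg(raw))
    print(significant_at_fdr(raw))
```

With one raw p-value per KEGG pathway, the returned flags indicate which pathways would be reported as over-represented at the chosen FDR.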

Results

Expression Profile Clustering of KEGG Pathways

Clustering KEGG pathways according to the median of their respective phosphopeptide ratios gives a quick overview of which biological pathways are activated upon EGF stimulation (see Figure 2). The ErbB pathway shows activation after 10 minutes of EGF stimulation.
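As a rough illustration of the aggregation step, the sketch below collapses phosphopeptide ratio profiles into one median profile per pathway. The data structures, peptide names and ratio values are assumptions made for the example; they do not reflect ProteinCenter's internal format or the Olsen et al. data.

```python
from collections import defaultdict
from statistics import median


def pathway_profiles(peptide_profiles, peptide_to_pathways):
    """Median ratio per pathway and time point.

    peptide_profiles: peptide -> list of ratios, one per time point.
    peptide_to_pathways: peptide -> iterable of annotated pathway names.
    """
    grouped = defaultdict(list)
    for peptide, profile in peptide_profiles.items():
        # A peptide contributes to every pathway it is annotated to.
        for pathway in peptide_to_pathways.get(peptide, ()):
            grouped[pathway].append(profile)
    # zip(*profiles) regroups the ratios by time point before taking the median.
    return {
        pathway: [median(values) for values in zip(*profiles)]
        for pathway, profiles in grouped.items()
    }


if __name__ == "__main__":
    # Hypothetical ratios at 0, 5, 10 and 20 minutes (illustration only).
    profiles = {
        "pepA": [1.0, 1.5, 2.0, 1.5],
        "pepB": [1.0, 1.5, 3.0, 1.0],
        "pepC": [1.0, 1.0, 1.0, 1.0],
    }
    annotation = {"pepA": ["ErbB", "MAPK"], "pepB": ["ErbB"], "pepC": ["MAPK"]}
    for name, profile in pathway_profiles(profiles, annotation).items():
        print(name, profile)
```

The resulting per-pathway profiles are the kind of input a clustering step would then group by similar time course.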

KEGG Over-Representation Analysis

Identified proteins containing phosphopeptides with a ratio > 1.25 after 10 minutes of treatment were selected and compared to all human proteins in Swiss-Prot. Figure 3 shows the ErbB pathway to be over-represented, as expected from the above KEGG pathway cluster analysis.

Over-representation analysis is based on Benjamini-Hochberg correction of p-values to determine which protein sets are selected at a particular FDR and are statistically over-represented in comparison to a reference dataset. The Swiss-Prot database was used as the reference database.
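The poster does not state which test produces the raw p-values before correction. A one-sided hypergeometric test is a common choice for over-representation analysis and is assumed in the sketch below; the function name and all counts are hypothetical.

```python
from math import comb


def enrichment_p_value(hits, selected, pathway_size, reference_size):
    """One-sided hypergeometric p-value, P(X >= hits).

    hits: selected proteins annotated to the pathway.
    selected: number of regulated proteins in the dataset.
    pathway_size: reference proteins annotated to the pathway.
    reference_size: total proteins in the reference database.
    """
    upper = min(selected, pathway_size)
    total = comb(reference_size, selected)
    # Sum the probability of observing `hits` or more pathway members by chance.
    tail = sum(
        comb(pathway_size, k) * comb(reference_size - pathway_size, selected - k)
        for k in range(hits, upper + 1)
    )
    return tail / total


if __name__ == "__main__":
    # Hypothetical counts (illustration only): 2 of 3 selected proteins fall
    # in a pathway that covers 4 of 10 reference proteins.
    print(enrichment_p_value(hits=2, selected=3, pathway_size=4, reference_size=10))
```

Computing one such p-value per KEGG pathway gives the list of raw p-values to which the Benjamini-Hochberg correction is applied.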


Figure 5 displays the ErbB pathway from KEGG after 0 and 10 minutes of stimulation. Key components of the ErbB pathway show significant up-regulation.

Express Biological Pathway Analysis of Mass Spectrometry-Based Proteomics Datasets

P1189

Christian Ravnsborg¹, Martin Damsbo¹, Ole Vorm¹, Michaela Scigelova² and Madalina Oppermann²

¹Thermo Fisher Scientific, Odense, Denmark; ²Thermo Fisher Scientific, Stockholm, Sweden

FIGURE 1. Workflow for analysis of EGF pathways using data from EGF treated HeLa cells.

Biological experiment

Digestion and LC/MS/MS analysis followed by database search

Import peptide, proteins, meta-information into ProteinCenter

Data clustering: homology or peptide clustering

Data comparison and filtering

Statistical analysis

Display of heat maps, profiling and visualization

FIGURE 2. Expression profile clustering on KEGG pathway using the phosphopeptides aggregated to their annotated KEGG pathway. Pathways are clustered on the median value using soft clustering as described in [2].

FIGURE 3. Over-representation analysis of KEGG pathway of regulated proteins after 10 minutes of treatment.

Visualization of Genes Involved in the ErbB Pathway

Figure 4 shows a heat map of genes in the ErbB pathway colored according to the median ratio of their respective phosphopeptides.

FIGURE 4. Heat map of proteins in the ErbB pathway

FIGURE 5. Visualization of detected proteins in the ErbB pathway colored according to median phosphopeptide ratio. The pathway maps show 0 and 10 minute stimulations.