EMBL-EBI Now and in the Future
The ProteomeXchange Consortium: 2016 update
Dr. Juan Antonio Vizcaíno
Proteomics Team Leader
EMBL-European Bioinformatics Institute
Hinxton, Cambridge, UK
Juan A. [email protected] 2016 World Conference, Taipei, 20 September 2016
PSI Spring Meeting 2017
Beijing Proteome Research Center, China
April 24-26, 2017
April 23: 2nd PHOENIX Mini-Symposium on Frontiers of Proteomics
April 27: Hiking the Great Wall
Focus topics:
  Quality control: qcML
  Proteogenomics formats
  proXI: proteomics eXpression Interface
  Privacy and Proteomics Data
Overview
General introduction to ProteomeXchange
Overall submission statistics
Updated HPP guidelines
Specifics about MassIVE (Nuno)
ProteomeXchange: A Global, distributed proteomics database
PASSEL (SRM data)
PRIDE (MS/MS data)
MassIVE (MS/MS data)
Raw
ID/Q
Meta
Mandatory raw data deposition since July 2015
Goal: Development of a framework to allow standard data submission and dissemination pipelines between the main existing proteomics repositories.
http://www.proteomexchange.org
ProteomeXchange: A Global, distributed proteomics database
PASSEL (SRM data)
PRIDE (MS/MS data)
MassIVE (MS/MS data)
Raw
ID/Q
Meta
jPOST(MS/MS data)
Mandatory raw data deposition since July 2015
Goal: Development of a framework to allow standard data submission and dissemination pipelines between the main existing proteomics repositories.
http://www.proteomexchange.org
New in 2016: jPOST
ProteomeXchange data workflow
[Workflow diagram; labels as extracted]
Researchers' results: raw data, metadata
Receiving repositories: PRIDE, MassIVE, jPOST, PASSEL
Submission routes: MS/MS data (as complete submissions); SRM data; any other workflow (mainly partial submissions)
ProteomeCentral
Data types shown: Metadata / Manuscript; Raw Data; Results
DATASETS
Journals
PeptideAtlas
Research groups: reanalysis of datasets
Reprocessed results
ProteomeCentral: Centralised portal for all PX datasets
http://proteomecentral.proteomexchange.org/cgi/GetDataset
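As an illustration of how the portal can be addressed per dataset, here is a minimal sketch that builds a ProteomeCentral link for a given identifier. The `ID` and `outputMode` query parameters and the six-digit `PXD`/`PRD` identifier pattern are assumptions, based on commonly seen ProteomeCentral links and on the identifiers quoted later in this deck. They are not stated on the slide. The helper name `dataset_url` is hypothetical, and the code only builds a URL, it does not contact the server.

```python
import re
from urllib.parse import urlencode

# Base URL taken from the slide.
BASE_URL = "http://proteomecentral.proteomexchange.org/cgi/GetDataset"

# Assumed identifier shape: PXD or PRD followed by six digits (e.g. PXD000561).
ACCESSION_RE = re.compile(r"^P[XR]D\d{6}$")


def dataset_url(accession, output_mode=None):
    """Return a ProteomeCentral link for one dataset (hypothetical helper)."""
    if not ACCESSION_RE.match(accession):
        raise ValueError(f"unexpected accession format: {accession!r}")
    params = {"ID": accession}  # 'ID' parameter name is an assumption
    if output_mode is not None:
        params["outputMode"] = output_mode  # assumed optional parameter
    return f"{BASE_URL}?{urlencode(params)}"


if __name__ == "__main__":
    print(dataset_url("PXD000561"))
    print(dataset_url("PXD000561", output_mode="XML"))
```

Validating the identifier before building the link keeps malformed accessions from producing URLs that silently return an empty page.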
ProteomeXchange data workflow
[Workflow diagram, extended with downstream resources; labels as extracted]
Researchers' results: raw data, metadata
Receiving repositories: PRIDE, MassIVE, jPOST, PASSEL
Submission routes: MS/MS data (as complete submissions); SRM data; any other workflow (mainly partial submissions)
ProteomeCentral
Data types shown: Metadata / Manuscript; Raw Data; Results
DATASETS
Journals
UniProt/neXtProt
PeptideAtlas
GPMDB
proteomicsDB
Other DBs
Research groups: reanalysis of datasets
OmicsDI: integration with other omics datasets
Reprocessed results
OmicsDI: Portal for omics datasets
http://www.ebi.ac.uk/Tools/omicsdi/
Aims to integrate omics datasets (proteomics, transcriptomics, metabolomics and genomics at present).
PRIDE, MassIVE, jPOST, PASSEL, GPMDB
ArrayExpress, Expression Atlas
MetaboLights, Metabolomics Workbench, GNPS
EGA
Perez-Riverol et al., 2016, bioRxiv
PRIDE Proteomes provides an across-dataset, quality-filtered view of PRIDE Archive data. Good PSMs are assessed using the PRIDE Cluster approach, which is based on spectral clustering.
ProteomeXchange: 4,534 datasets up until 31st July, 2016
Type:
  4067 PRIDE
  339 MassIVE
  115 PeptideAtlas/PASSEL
  13 jPOST
Publicly accessible: 2597 datasets, 57% of all
  2334 PRIDE
  135 MassIVE
  115 PASSEL
  13 jPOST
Datasets/year:
  2012: 102
  2013: 527
  2014: 963
  2015: 1758
  2016 (till end of July): 1184
Countries with at least 100 datasets:
  1105 USA
  546 Germany
  411 United Kingdom
  356 China
  229 France
  188 Netherlands
  178 Canada
  150 Switzerland
  125 Australia
  123 Spain
  123 Denmark
  117 Japan
  101 Sweden
Top species studied by at least 100 datasets:
  2010 Homo sapiens
  604 Mus musculus
  191 Saccharomyces cerevisiae
  140 Arabidopsis thaliana
  127 Rattus norvegicus
936 reported taxa in total
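The figures on this slide can be cross-checked against each other. The sketch below uses only the numbers quoted above and confirms that the per-repository and per-year counts both sum to the stated total, and that the public share matches the quoted 57%. The variable names are illustrative only.

```python
# Numbers copied from the slide (status: 31 July 2016).
TOTAL_DATASETS = 4534

by_repository = {"PRIDE": 4067, "MassIVE": 339, "PeptideAtlas/PASSEL": 115, "jPOST": 13}
public_by_repository = {"PRIDE": 2334, "MassIVE": 135, "PASSEL": 115, "jPOST": 13}
by_year = {2012: 102, 2013: 527, 2014: 963, 2015: 1758, 2016: 1184}

# Totals derived from the breakdowns.
repo_sum = sum(by_repository.values())
year_sum = sum(by_year.values())
public_sum = sum(public_by_repository.values())

# Share of all datasets that are public, rounded to a whole percent.
public_share_pct = round(100 * public_sum / TOTAL_DATASETS)

if __name__ == "__main__":
    print("sum over repositories:", repo_sum)
    print("sum over years:", year_sum)
    print("public datasets:", public_sum, f"({public_share_pct}% of all)")
```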
Datasets are being reused more and more.
Data download volume for PRIDE in 2015: ~200 TB
Vaudel et al., Proteomics, 2016
HPP guidelines version 2.1
Complete vs Partial submissions: processed results
[Screenshots: Complete | Partial]
For complete submissions, it is possible to connect the spectra with the processed identification results, and they can be visualized.
Complete vs Partial submissions: experimental metadata
[Screenshots: Complete | Partial]
General experimental metadata about the projects is similar. However, at the assay level, the information in partial submissions is less detailed.
An observer of the ProteomeXchange Consortium: iProX
Proteome data sharing platform in China
www.iprox.org
Focusing on:
  Collection and sharing of proteome experiment raw data
  Standardized metadata of proteome experiments
  Visualization of proteome datasets
Providing:
  A user-friendly data submission pipeline
  Structured management of datasets
  An effective user authority system
  Standardized metadata collection
  Powerful computing, storage, and network resources to support the pipeline
  Remote data backup and synchronous update
MassIVE update
Mingxun Wang (1,2,4), Jeremy Carver (1,4), Nuno Bandeira (1-4)
1 Center for Computational Mass Spectrometry
2 Computer Science and Engineering
3 Skaggs School of Pharmacy and Pharmaceutical Sciences
4 University of California, San Diego
http://massive.ucsd.edu
MassIVE Interactivity
MassIVE = Mass spectrometry Interactive Virtual Environment
http://massive.ucsd.edu http://proteomics.ucsd.edu
Massive reanalysis
Community knowledge requires reproducible, well-characterized results
MS-GF+ standard database search
  Reanalyzed 15 TB of human data with ~185M MS/MS spectra
  79 million new FDR-controlled PSMs
  3.6 million modified versions of 2.8 million unique peptide sequences
CPTAC colon cancer available with 5 different result sets
  [Original] Imported CPTAC results: 6.9M PSMs
  [Reanalysis] MS-GF+ database search: 8.9M PSMs, 70k mod variants (169k total)
  [Reanalysis] Spectral library search (MSPLIT): 10M PSMs, including 387K mixture spectra
  [Reanalysis] Proteogenomics searches of TCGA transcriptomics sequences (Enosi): 6.8M total PSMs, 19,728 proteogenomic events
  [Reanalysis] Blind modification search (MODa): 7.8M PSMs, 2.8M PSMs for 221k mod variants (306k total), 203K new mod variants (unique modified peptides)
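The reanalysis numbers above refer to FDR-controlled PSMs, but the slides do not say how the FDR was estimated. As a rough illustration only, the sketch below assumes a generic target-decoy strategy with q-values. The `PSM` record, the "higher score is better" convention and the `filter_by_fdr` helper are hypothetical and are not taken from MassIVE or from any of the search engines named here.

```python
from dataclasses import dataclass


@dataclass
class PSM:
    spectrum_id: str
    score: float    # assumption: higher score means a better match
    is_decoy: bool  # True if the match is to a decoy sequence


def filter_by_fdr(psms, max_fdr=0.01):
    """Keep target PSMs whose q-value is at or below max_fdr.

    FDR at each rank is estimated as (#decoys / #targets) among all
    PSMs scoring at least as well; the q-value is the lowest FDR
    reachable at that rank or at any lower-scoring rank.
    """
    ranked = sorted(psms, key=lambda p: p.score, reverse=True)
    targets = decoys = 0
    fdrs = []
    for psm in ranked:
        if psm.is_decoy:
            decoys += 1
        else:
            targets += 1
        fdrs.append(decoys / max(targets, 1))
    # Convert running FDR estimates into monotone q-values (backward pass).
    qvalues = fdrs[:]
    for i in range(len(qvalues) - 2, -1, -1):
        qvalues[i] = min(qvalues[i], qvalues[i + 1])
    return [p for p, q in zip(ranked, qvalues) if not p.is_decoy and q <= max_fdr]


if __name__ == "__main__":
    example = [
        PSM("s1", 10.0, False),
        PSM("s2", 9.0, False),
        PSM("s3", 8.0, False),
        PSM("s4", 7.0, True),
        PSM("s5", 6.0, False),
    ]
    print([p.spectrum_id for p in filter_by_fdr(example, max_fdr=0.1)])
```

Real pipelines differ in how decoys are generated and at which level (PSM, peptide, protein) the FDR is controlled, so this should be read as a sketch of the idea and not as the procedure used for the numbers on the slide.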
Massive: Do it yourself
  MSGF+ - Database search engine
  MSPLIT - Spectral Library Search Engine
  ENOSI - ProteoGenomic Search Engine
  MODa - Multi-blind modification database search engine
  Spectral Networks - spectral alignment-based analysis and propagation of identifications
  Multi-pass - MSPLIT, MSGFDB, MODa cascade Search Workflow
  MSGFDB - Database search engine
  MSPLIT-DIA - Spectral Library Search for SWATH
  Upload your own! (mzIdentML, mzTab, TSV)
Massive Search
Check what others think the spectrum is
  Find peptides, proteins, PTMs
  Agreement in spectrum identification?
  One-stop search across tens of millions of PSMs
[Screenshot labels: Original | Reanalysis]
What can you do?
How can the community work together to reveal the whole human proteome?
Mass spectrometrists share Data
  At least: partial submissions with raw mass spectrometry data and enough metadata to allow for reanalysis
  Especially useful: rare tissues/conditions or very deep acquisition
Biologists share Knowledge
  At least: complete submissions with FDR-filtered results in open format (mzIdentML or mzTab)
  Especially useful: human-curated knowledge of proteins, PTMs, endogenous peptides, etc.
Bioinformaticians share Reanalyses
  At least: FDR-filtered results in open format (mzIdentML or mzTab)
  Especially useful: algorithms that identify new types of PSMs (e.g., PTM-specific, mixtures)
Acknowledgements: People
Attila Csordas
Tobias Ternent
Gerhard Mayer (de.NBI)
Yasset Perez-Riverol
Manuel Bernal-Llinares
Andrew Jarnuczak
Former team members, especially: Rui Wang Florian Reisinger Noemi del Toro Jose A. Dianes Henning Hermjakob
Acknowledgements: The PRIDE Team and all PX partners
All data submitters!!!
Eric Deutsch, Zhi Sun, David Campbell, Nuno Bandeira, Mingxun Wang, Jeremy Carver, Yasushi Ishihama, Shujiro Okuda, Shin Kawano
Follow new datasets @proteomexchange
PXD identifier | Hits / No. files = dataset downloads | Dataset title
PXD000561 | 46578 / 2383 = 20 | A draft map of the human proteome
PXD001587 | 13435 / 140 = 96 | DIA-Umpire: comprehensive computational framework for data independent acquisition proteomics
PRD000066 | 12748 / 4090 = 3 | Quantitative Proteomics Analysis of the Secretory Pathway
PXD000658 | 4004 / 460 = 9 | Global phosphoproteomic profiling reveals distinct signatures in B-cell non-Hodgkin
PXD000149 | 3781 / 598 = 6 | The potato tuber mitochondrial proteome
PXD000865 | 12535 / 1368 = 9 | Mass spectrometry based draft of the human proteome
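The "dataset downloads" column appears to be the number of file hits divided by the number of files in the dataset, rounded to the nearest whole number. That reading is an inference from the column header, but it reproduces all six rows. A minimal sketch using the figures from the table (the function name is illustrative):

```python
# (identifier, file hits, number of files) as listed in the table.
ROWS = [
    ("PXD000561", 46578, 2383),
    ("PXD001587", 13435, 140),
    ("PRD000066", 12748, 4090),
    ("PXD000658", 4004, 460),
    ("PXD000149", 3781, 598),
    ("PXD000865", 12535, 1368),
]


def estimated_downloads(hits, n_files):
    """Approximate whole-dataset downloads as hits per file, rounded."""
    return round(hits / n_files)


if __name__ == "__main__":
    for accession, hits, n_files in ROWS:
        print(accession, estimated_downloads(hits, n_files))
```

Normalising by file count matters because a dataset with thousands of files accumulates many hits from a single full download, so raw hit counts alone would overstate its reuse.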