Ensemble Learning Model for MS/MS

Ensemble Learning Model to Estimate the Accuracy of Peptide Identifications Made by MS/MS

Qiang Kou Shan Xiao Xiaohui Yao Zongliang Yue

qkou@umail.iu.edu

April 29, 2014

Qiang Kou (qkou@umail.iu.edu) Ensemble Learning 1/25 April 29, 2014 1 / 25

Background

Mass spectrometry has become the most widely used tool for the characterization of proteins

Many database search tools and algorithms have been developed, including SEQUEST, MASCOT, X!Tandem, InsPecT, and MS-Align+

Scores of correct and incorrect identifications usually overlap significantly

[Figure: distributions of the discriminant score for "Correct" and "Incorrect" identifications; x-axis: Discriminant Score (D), from -3.9 to 7.3. From Brian C. Searle.]


Available Software

PeptideProphet [1]

$$F(x_1, x_2, \ldots, x_n) = c_0 + \sum_{i=1}^{n} c_i x_i$$

$$p(+ \mid F) = \frac{p(F \mid +)\,p(+)}{p(F \mid +)\,p(+) + p(F \mid -)\,p(-)}$$
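The two PeptideProphet formulas can be sketched in a few lines of Python. This is an illustration under stated assumptions, not PeptideProphet's implementation: the function names are hypothetical, and the Gaussian densities only stand in for whatever distributions are fitted to p(F|+) and p(F|−).

```python
import math


def discriminant_score(features, coefficients, intercept):
    """F(x_1..x_n) = c_0 + sum_i c_i * x_i."""
    return intercept + sum(c * x for c, x in zip(coefficients, features))


def gaussian_pdf(value, mean, std):
    """Placeholder density (an assumption made for this sketch)."""
    z = (value - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def posterior_correct(score, prior_correct, pdf_correct, pdf_incorrect):
    """p(+|F) by Bayes' rule, with p(-) = 1 - p(+)."""
    numerator = pdf_correct(score) * prior_correct
    denominator = numerator + pdf_incorrect(score) * (1.0 - prior_correct)
    return numerator / denominator


# Hypothetical densities for correct (+) and incorrect (-) identifications.
pdf_pos = lambda s: gaussian_pdf(s, 3.0, 1.0)
pdf_neg = lambda s: gaussian_pdf(s, -1.0, 1.0)
example_posterior = posterior_correct(2.0, 0.3, pdf_pos, pdf_neg)
```

At a score where both densities are equal, the posterior falls back to the prior p(+), which is a quick sanity check on the rule.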

Percolator [2, 3]

$$F(x) = \sum_i w_i h_i(x) + b, \quad \text{where } h_k(x) = \tanh\left((w_k)^T x + b_k\right)$$

Sigmoid loss function, with $y \in \{+1, -1\}$:

$$L(F(x), y) = \frac{1}{1 + \exp\left(y\,F(x)\right)}$$
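A minimal sketch of the one-hidden-layer scoring function and a sigmoid loss follows. It assumes the loss has the form 1 / (1 + exp(y F(x))) with labels y in {+1, −1}; all function and parameter names are hypothetical, and nothing here reproduces Percolator's actual code.

```python
import math


def hidden_unit(x, weights, bias):
    """h_k(x) = tanh(w_k^T x + b_k)."""
    return math.tanh(sum(w * xi for w, xi in zip(weights, x)) + bias)


def network_score(x, hidden_weights, hidden_biases, output_weights, output_bias):
    """F(x) = sum_i w_i * h_i(x) + b."""
    hidden = [hidden_unit(x, w, b) for w, b in zip(hidden_weights, hidden_biases)]
    return sum(w * h for w, h in zip(output_weights, hidden)) + output_bias


def sigmoid_loss(score, label):
    """Assumed form: 1 / (1 + exp(y * F(x))), with label y in {+1, -1}."""
    return 1.0 / (1.0 + math.exp(label * score))
```

The loss approaches 0 when the score has the same sign as the label and a large magnitude, and approaches 1 when the sign is wrong, so it stays bounded for outliers.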

Trans-Proteomic Pipeline

[Diagram: mzXML -> X!Tandem -> PeptideProphet, Percolator, or Ensemble Learning -> ProteinProphet -> Proteins]

Ensemble Learning

Homogeneous: learners from the same category

boosting, bagging, random forest

Heterogeneous: learners from different categories


Example of Ensemble Learning

Two real variables

Two random pseudo variables

Three methods: linear model, SVM, and random forest


Results of Three Methods


Average of Three Methods
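Averaging the outputs of the three methods can be sketched as follows. The prediction values are made up for illustration and the function names are hypothetical; the slides do not give the underlying numbers.

```python
def average_predictions(predictions_per_model):
    """Unweighted mean of the models' predictions, sample by sample."""
    n_models = len(predictions_per_model)
    return [sum(values) / n_models for values in zip(*predictions_per_model)]


def mean_squared_error(predicted, actual):
    return sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual)


# Hypothetical outputs of a linear model, an SVM, and a random forest.
actual = [1.0, 0.0, 1.0, 0.0]
linear_model = [0.9, 0.4, 0.6, 0.1]
svm = [0.6, 0.1, 0.9, 0.4]
random_forest = [0.9, 0.1, 0.9, 0.1]

averaged = average_predictions([linear_model, svm, random_forest])
averaged_error = mean_squared_error(averaged, actual)
```

An unweighted mean treats every model equally; the weighted strategies in the next section relax that choice.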

Ensemble Strategy

Non-negative Least Squares and Logistic Regression

Non-negative least squares regression

$$f_e(X) = \sum_{i=1}^{k} \alpha_i f_i(X), \quad \sum_{i=1}^{k} \alpha_i = 1, \quad \alpha_i \ge 0$$
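One way to fit weights that are non-negative and sum to one is projected gradient descent on the squared error, with each step followed by a Euclidean projection onto the probability simplex. This solver is swapped in for illustration only; the slides do not say which solver was used, and the names, step size, and iteration count below are assumptions.

```python
def project_to_simplex(v):
    """Euclidean projection of v onto {a : a_i >= 0, sum_i a_i = 1}."""
    u = sorted(v, reverse=True)
    cumulative = 0.0
    theta = 0.0
    for j, u_j in enumerate(u, start=1):
        cumulative += u_j
        t = (cumulative - 1.0) / j
        if u_j - t > 0:  # holds for j = 1, so theta is always set
            theta = t
    return [max(x - theta, 0.0) for x in v]


def fit_ensemble_weights(model_outputs, targets, steps=2000, learning_rate=0.1):
    """model_outputs[i][s] = f_i(X_s); minimise the mean squared error
    of sum_i alpha_i * f_i(X_s) against targets[s] over the simplex."""
    k, n = len(model_outputs), len(targets)
    alpha = [1.0 / k] * k  # start from the plain average
    for _ in range(steps):
        blended = [sum(alpha[i] * model_outputs[i][s] for i in range(k))
                   for s in range(n)]
        residual = [b - t for b, t in zip(blended, targets)]
        gradient = [2.0 / n * sum(residual[s] * model_outputs[i][s]
                                  for s in range(n)) for i in range(k)]
        alpha = project_to_simplex(
            [a - learning_rate * g for a, g in zip(alpha, gradient)])
    return alpha
```

The fixed step size suits a handful of models with outputs in [0, 1]; larger or differently scaled inputs would need a smaller step or a line search.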

Non-negative logistic regression

$$f_e(X) = \frac{1}{1 + \exp\left(-\sum_{i=1}^{k} \alpha_i f_i(X)\right)}, \quad \alpha_i \ge 0$$
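Evaluating the logistic combination for one spectrum, given already-fitted non-negative weights, can be sketched as below. Fitting the weights is not shown, and the function name is hypothetical.

```python
import math


def logistic_ensemble_score(model_outputs, alphas):
    """f_e(X) = 1 / (1 + exp(-sum_i alpha_i * f_i(X))), with alpha_i >= 0."""
    if any(a < 0 for a in alphas):
        raise ValueError("ensemble weights must be non-negative")
    z = sum(a * f for a, f in zip(alphas, model_outputs))
    return 1.0 / (1.0 + math.exp(-z))
```

Because the formula has no intercept, the score is below 0.5 only when the weighted sum is negative, so the base outputs f_i would need to be real-valued scores rather than probabilities for the full (0, 1) range to be used.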


Greedy Strategy

1 Start with the empty ensemble;

2 Add the model that maximizes the ensemble’s classification result on the training dataset;

3 Repeat Step 2 for a fixed number of iterations;

4 Return the final ensemble.
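The four steps above can be sketched as follows. Several details are assumptions, since the slides leave them open: the ensemble output is the plain average of its members, a model may be added more than once, the "classification result" is accuracy at a 0.5 threshold, and ties go to the first model encountered. All names are hypothetical.

```python
def accuracy(scores, labels, threshold=0.5):
    hits = sum((s >= threshold) == bool(y) for s, y in zip(scores, labels))
    return hits / len(labels)


def greedy_ensemble(model_scores, labels, iterations):
    """model_scores maps a model name to its scores on the training set.
    Returns the model names in the order they were added."""
    chosen = []                            # step 1: empty ensemble
    running_sum = [0.0] * len(labels)
    for _ in range(iterations):            # step 3: fixed number of rounds
        best_name, best_accuracy = None, -1.0
        for name, scores in model_scores.items():
            size = len(chosen) + 1
            candidate = [(t + s) / size for t, s in zip(running_sum, scores)]
            candidate_accuracy = accuracy(candidate, labels)
            if candidate_accuracy > best_accuracy:   # step 2: best addition
                best_name, best_accuracy = name, candidate_accuracy
        chosen.append(best_name)
        running_sum = [t + s for t, s in
                       zip(running_sum, model_scores[best_name])]
    return chosen                          # step 4: final ensemble
```

Scoring on the training set alone makes this procedure prone to overfitting; evaluating candidates on held-out data is a common safeguard.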

Application in MS/MS

Available Features

Symbol                                   Description
mass                                     precursor neutral mass
time                                     retention time
∆M                                       mass difference
#match                                   number of matched ions
pepLen                                   peptide length
charge                                   charge state
exp                                      E-value
#missed                                  number of missed cleavages
enzN                                     whether the peptide is preceded by an enzymatic site
enzC                                     whether there is an enzymatic C-terminus
#consistent                              number of peptide termini consistent with cleavage
#ions                                    number of fragment ions predicted for the peptide
#proteins                                number of proteins containing the peptide
Arg, ..., Val                            count of each kind of amino acid
Hyperscore, Nextscore, BScore, YScore    scoring functions in X!Tandem

Weights in Regularized Generalized Linear Model

Description    Weight     Description    Weight
#missed        -1.923     Arg            -1.321
charge          1.246     Cys            -1.062
Lys            -0.990     His             0.790
Trp             0.726     #consistent    -0.494
Pro             0.407     Asp             0.388
Met            -0.369     Val             0.350
bscore         -0.347     Tyr             0.238
#ions           0.210

Model Used

Algorithm       Description                 R Package
glm             generalized linear model    stats
randomForest    random forest               randomForest
knn             k-nearest neighbour         class
glmnet          elastic net                 glmnet
svm             SVM                         e1071
step            stepwise glm                stats

Training and Testing Dataset

Paola Picotti, et al. Nature 494:266-270, 2013 [4]

ROC Curves

[Figure: ROC curves for the three methods; x-axis: false positive rate, 0.0 to 1.0. Legend: Ensemble Learning 0.873, Percolator 0.821, PeptideProphet 0.789.]
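If the legend values are areas under the ROC curve (an assumption; the extracted slide does not label them), such a number can be computed from scores and correct/incorrect labels as the fraction of (correct, incorrect) pairs ranked in the right order. The function name is hypothetical, and the pairwise loop favours clarity over speed.

```python
def roc_auc(scores, labels):
    """Probability that a randomly chosen correct identification scores
    higher than a randomly chosen incorrect one; ties count as 0.5."""
    positives = [s for s, y in zip(scores, labels) if y]
    negatives = [s for s, y in zip(scores, labels) if not y]
    if not positives or not negatives:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))
```

For large datasets a rank-based formulation gives the same value in O(n log n) time.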

Relation between FDR and Ensemble Score with LOESS

[Figure: FDR plotted against the ensemble score with a LOESS fit; x-axis: Ensemble Score, 0.00 to 1.00.]

Number of Correct/Incorrect Identifications with 0.05 FDR

[Figure: number of correct and incorrect identifications for each method (PeptideProphet, Percolator, Ensemble); x-axis: methods; legend: correct, incorrect.]
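Counting identifications at a given FDR can be sketched as follows, assuming every identification in the test set carries a known correct/incorrect label and taking FDR as the fraction of accepted identifications that are incorrect. The slides do not state how FDR was estimated, so this definition and the function names are assumptions.

```python
def fdr_at_threshold(scores, is_correct, threshold):
    """Fraction of identifications scoring >= threshold that are incorrect."""
    accepted = [c for s, c in zip(scores, is_correct) if s >= threshold]
    if not accepted:
        return 0.0
    return sum(1 for c in accepted if not c) / len(accepted)


def counts_at_fdr(scores, is_correct, max_fdr=0.05):
    """Find the lowest score threshold whose accepted set has FDR <= max_fdr.
    Returns (threshold, n_correct, n_incorrect), or None if no threshold works."""
    best = None
    for threshold in sorted(set(scores), reverse=True):
        accepted = [c for s, c in zip(scores, is_correct) if s >= threshold]
        n_incorrect = sum(1 for c in accepted if not c)
        if n_incorrect / len(accepted) <= max_fdr:
            best = (threshold, len(accepted) - n_incorrect, n_incorrect)
    return best
```

Running `counts_at_fdr` once per method on the same labeled set would yield the kind of per-method correct/incorrect counts the figure compares.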

Some Conclusions

Ensemble learning methods often give better results than single models

They overfit the training data very easily

Model training is time-consuming


References

[1] Andrew Keller, Alexey I Nesvizhskii, Eugene Kolker, and Ruedi Aebersold. Empirical statistical model to estimate the accuracy of peptide identifications made by MS/MS and database search. Analytical Chemistry, 74(20), 2002.

[2] Lukas Käll, Jesse D Canterbury, Jason Weston, William Stafford Noble, and Michael J MacCoss. Semi-supervised learning for peptide identification from shotgun proteomics datasets. Nature Methods, 4(11), 2007.

[3] Marina Spivak, Jason Weston, Léon Bottou, Lukas Käll, and William Stafford Noble. Improvements to the Percolator algorithm for peptide identification from shotgun proteomics data sets. Journal of Proteome Research, 8(7):3737–3745, 2009.

[4] Paola Picotti, Mathieu Clément-Ziza, Henry Lam, David S Campbell, Alexander Schmidt, Eric W Deutsch, Hannes Röst, Zhi Sun, Oliver Rinner, Lukas Reiter, Qin Shen, Jacob J Michaelson, Andreas Frei, Simon Alberti, Ulrike Kusebauch, Bernd Wollscheid, Robert L Moritz, Andreas Beyer, and Ruedi Aebersold. A complete mass-spectrometric map of the yeast proteome applied to quantitative trait analysis. Nature, 494(7436):266–270, 2013.

Thank you!

http://qkou.info/sl.pdf
