Upload
others
View
2
Download
0
Embed Size (px)
Citation preview
Developing Algorithms for the Determination ofRelative Peptide Abundances from LC/MS Data
Jake Marcus, Anne Eaton, Melanie Kanter, Arunima Ray
Mentors: Shawn Cokus, Matteo Pellegrini, Parag Mallick, Roland Luethy
Presented by: Arunima Ray, SUNY Geneseo
• Spielberg Family Center for Applied Proteomics (SFCAP)
– Aims to develop and use proteomic tools to facilitate patient management decisions
• Proteomics
– Large-scale study of proteins
– Proteins are the effectors of most biological functions
– The proteome is dynamic: • differs across cell types
• changes over time based on external and internal factors
Proteomics and SFCAP
• Need to quantify the differences in proteomes
Individualized treatment
Data generation: LC/MS
• Start with a sample of proteins
• Protein sizes are varied:
– Digest into fragments: peptides
• Separate by affinity to water
• Ionize
• Further separate by mass/charge
Affinity to water Mass/charge Peptide Protein
… … … …
246 725.4 K.ACSQRPR.W ADH
793 432.87 R.IGYADIK.W EPO
1075 5367.91 K.LGANAILK.W HB
Affinity to water Mass/charge Peptide Protein
… … … …
246 725.4 K.ACSQRPR.W ADH
793 432.87 R.IGYADIK.W EPO
1075 5367.91 K.LGANAILK.W HB
Aff
inity to w
ate
r
Mass/Charge
List of Identifications
This picture was generated by just
6 proteins !!
The data
Challenges
• Locate isotopes
• Identifications not centered
• Unknown spread along vertical axis
• Noise
In
ten
sit
y
Isotopes
Point of Identification
Step 1: find associated feature
• Use affinity and charge information from identification
• Mass of the most abundant isotope
• Consider a region large enough to include the entire feature of interest
Mass/Charge
Inte
nsity (
x10
5)
Point of Identification
Step 2: consolidate information from isotopes
• Isotopes form parallel curves
• Select relevant mass/charge values, extract corresponding data
Mass/Charge
Inte
nsity (
x10
5)
Step 2: consolidate information from isotopes
Mass/Charge
Inte
nsity
Step 3: limit along vertical axis (basic)
• Find highest peak
• Search along vertical axis until 4 out of 5 consecutive data points are below threshold
Affinity to water
Inte
nsity (
x10
5)
Threshold
Step 3: limit along vertical axis (fingerprinting)
• Carbon in nature:
– 98.93% in C-12 form
– 1.07% in C-13 form
• For a peptide of n carbons:
– Probability for all C-12 = 0.9893n
– Probability for 1 C-13, n-1 C-12 = n.(0.9893)n-1.(0.0107)
• Compute similarity score for intensities along mass/charge
• Graph for different affinity values
• Stop when scores below 10%
0
5
10
15
20
25
30
35
40
0 1 2 3 4 5
Exp
ecte
d p
erce
nta
ge
Number of C-13 atoms
Step 4: fit a curve and quantify
10 15 20 25 30 35 40 45 50 55 600
0.2
0.4
0.6
0.8
1
1.2
1.4Gamma, R2=0.98137
Affinity to water
Inte
nsi
ty (
scale
d)
Gamma, R2 = 0.9814Gamma, R2 = 0.98
• Need to model the distribution of intensities along vertical axis
• Gauss or gamma curve fit to data by nonlinear regression
Evaluation
• 5 protein mix – (~250 peptides)– Different amounts in
each sample
• 6 protein mix – (~150 peptides)– Same amount in every
sample
1x 2x 3x
1x 1x 1x
0
0.02
0.04
0.06
0.08
0.1
0.12
0.14
0.16
1 2 3 4 5 6 7
Modules
IQR
Final Version
Module combinations
Optimizing the algorithm
1.1
1.12
1.14
1.16
1.18
1.2
1.22
1.24
1.26
1.28
1 2 3 4 5 6 7
Modules
Ob
s/E
xp
Final Version
Module combinations
5 protein mix evaluations
6 protein mix evaluations
Outcomes
• Adaptable algorithm
• Modularity facilitates comparative analysis
• Addressed key challenges in quantification
• Investigated characteristics of features
• Attempt with real data
– The first (and only) run!
Acknowledgments
• Institute for Pure and Applied Mathematics
• Spielberg Family Center for Applied Proteomics
• My teammates Jake, Anne and Melanie
• Our awesome mentors Shawn, Matteo, Roland and Parag.