Common parameters
• At the beginning, one needs to set up the parameters.
• http://human.thegpm.org
Common parameters
• Most important: the input experimental spectra
– Self-explaining.
Sequest Mascot X!Tandem
.DTA X X X
.RAW X
.MGF X X X
.PKL X X
.PKS X
.mzData X X X
.mzXML X X X
.mzML X
Common parameters
• Taxon and database
– Self-explaining.
– E.g., samples from human cells should be queried against a human protein database.
– Sometimes protein sequence libraries are available.
Common parameters
• Parent mass tolerance
• If it is much smaller than the optimum:
– the correct peptide can be eliminated from the search space
– the execution time decreases
[Figure: Spectra comparison. Two spectra, m/z vs. intensity (%), with b1–b10 and y1–y10 ions labeled; parent mass difference $|PM(q) - PM(t)|$.]
Common parameters
• Parent mass tolerance
• If it is much larger than the optimum:
– decreases the significance of the scores
– makes the execution time longer
[Figure: Spectra comparison. Frequency vs. score, showing the probability distributions of random scores and of correct scores (curves A and B), a score threshold T, and the p-value of hit h.]
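The p-value of a hit can be illustrated with a minimal sketch that assumes we already have a sample of scores from random (incorrect) peptide matches; the p-value of hit h is then estimated as the fraction of random scores at least as high as the score of h. The function name and the sample scores are hypothetical, and real search engines model the random-score distribution in their own ways.

```python
def empirical_p_value(hit_score: float, random_scores: list[float]) -> float:
    """Fraction of random-match scores that are >= the hit's score."""
    if not random_scores:
        raise ValueError("need at least one random score")
    at_least_as_high = sum(1 for s in random_scores if s >= hit_score)
    return at_least_as_high / len(random_scores)


# Hypothetical scores of random (incorrect) peptide-spectrum matches.
random_scores = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
print(empirical_p_value(8, random_scores))  # 0.3
```

A hit that sits far to the right of the random-score distribution gets a small p-value; anything that widens or shifts the random distribution to the right (as described for badly chosen tolerances) makes the same score less significant.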
Common parameters
• Parent mass tolerance
• Usually around 1 Da.
[Figure: Spectra comparison. Two spectra, m/z vs. intensity (%), with b1–b10 and y1–y10 ions labeled; parent mass difference $|PM(q) - PM(t)|$.]
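The parent mass filter can be sketched in Python, assuming unmodified peptides, neutral monoisotopic masses, and a hypothetical query mass of 876.5 Da; the helper names are illustrative only. Candidates whose mass differs from the query by more than the tolerance, i.e. $|PM(q) - PM(t)|$ above the window, are dropped before any spectrum comparison.

```python
# Monoisotopic residue masses (Da) of the 20 standard amino acids.
MONO_RESIDUE = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056  # H2O added for the peptide termini


def peptide_mass(peptide: str) -> float:
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(MONO_RESIDUE[aa] for aa in peptide) + WATER


def within_parent_tolerance(query_mass: float, candidates: list[str], tolerance: float) -> list[str]:
    """Keep candidates t with |PM(q) - PM(t)| <= tolerance."""
    return [p for p in candidates if abs(query_mass - peptide_mass(p)) <= tolerance]


candidates = ["TFGQVVAR", "MNR", "LLHGDPGEEDK"]
print(within_parent_tolerance(876.5, candidates, tolerance=1.0))   # ['TFGQVVAR']
print(within_parent_tolerance(876.5, candidates, tolerance=0.01))  # []
```

With the very tight 0.01 Da window, the correct candidate TFGQVVAR (about 876.482 Da) is eliminated, which is the failure mode described for a too-small parent mass tolerance.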
Common parameters
• Fragment ion match tolerance
– Depends on the instrument accuracy.
– If it is much smaller than the optimum, matches will be lost.
Common parameters
• Fragment ion match tolerance
– If it is much smaller than the optimum:
• Correctly matched peaks will be lost.
• Increases the FDR and the false negatives, and decreases the sensitivity.
[Figure: two panels, frequency vs. score, each showing the probability distributions of random scores and of correct scores (curves A and B), a score threshold T, and the p-value of hit h.]
Common parameters
• If the fragment ion match tolerance is much larger than the optimum:
– Many theoretical peaks will match an experimental peak.
– Increases the random scores and decreases the statistical significance.
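The effect of the fragment ion tolerance T can be sketched by counting how many theoretical b and y ions find an experimental peak within T. This is a minimal sketch, assuming singly charged b1..b(n-1) and y1..y(n-1) ions of an unmodified peptide and hypothetical experimental peaks that are all measured 0.2 Da too high; the function names are illustrative, not any search engine's API.

```python
PROTON = 1.007276  # mass of a proton (Da)
WATER = 18.01056   # H2O (Da)

# Monoisotopic residue masses (Da) of the 20 standard amino acids.
MONO_RESIDUE = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}


def by_ions(peptide: str) -> list[float]:
    """Singly charged b1..b(n-1) and y1..y(n-1) ion m/z values."""
    masses = [MONO_RESIDUE[aa] for aa in peptide]
    b = [sum(masses[:i]) + PROTON for i in range(1, len(masses))]
    y = [sum(masses[-i:]) + WATER + PROTON for i in range(1, len(masses))]
    return b + y


def count_matched_ions(theoretical: list[float], experimental: list[float], tolerance: float) -> int:
    """Number of theoretical ions with at least one experimental peak within the tolerance."""
    return sum(1 for t in theoretical if any(abs(t - e) <= tolerance for e in experimental))


theoretical = by_ions("TFGQVVAR")
# Hypothetical experimental peaks: every true ion measured 0.2 Da too high.
experimental = [m + 0.2 for m in theoretical]

for tol in (0.05, 0.4, 2.0):
    print(tol, count_matched_ions(theoretical, experimental, tol))
```

With T = 0.05 Da every true match is lost, while 0.4 Da recovers all 14 ions. A wider window cannot lose these matches, but it also lets unrelated peaks (noise or ions of other peptides) fall inside the window, which is what raises the random scores.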
Common parameters
• Fragment ion tolerance (T)
– T = 0.4 Da (correct): 217 proteins + 713 homologs = 930 proteins
– T = 0.05 Da (too small): 132 proteins + 406 homologs = 538 proteins
– T = 2.0 Da (too large): 197 proteins + 589 homologs = 786 proteins
Common parameters
• Instrument
– Some database search programs let you select the instrument type, e.g., ESI-QUAD or Quad-TOF.
– This tells the search engine which fragment ion series to use for scoring.
– E.g., immonium ions, a-, b-, c- and x-series ions, a-NH3, the z+H series, y-H2O, etc.
Common parameters
• Enzyme
– The enzyme used for enzymatic digestion during biological sample preparation.
– It is used for the in silico digestion of protein sequences to generate peptides.
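In silico digestion can be sketched with the commonly used trypsin rule (cleave after K or R, but not before P). This is an assumption for illustration: search engines let you choose other enzymes and rules, and the function name is hypothetical. The example input is a prefix of the IPI00000044.1 (P01127) sequence used later in these slides.

```python
def trypsin_digest(protein: str) -> list[str]:
    """Cleave after K or R unless the next residue is P (no missed cleavages)."""
    peptides = []
    start = 0
    for i, residue in enumerate(protein):
        next_residue = protein[i + 1] if i + 1 < len(protein) else ""
        if residue in "KR" and next_residue != "P":
            peptides.append(protein[start:i + 1])
            start = i + 1
    if start < len(protein):  # C-terminal peptide without a final K/R
        peptides.append(protein[start:])
    return peptides


print(trypsin_digest("MNRTFGQVVARLVSAEGDPIPEELYEMLSDHSIRSFDDLQR"))
# ['MNR', 'TFGQVVAR', 'LVSAEGDPIPEELYEMLSDHSIR', 'SFDDLQR']
```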
Common parameters
• E-value cutoff
[Figure: frequency vs. score, showing the probability distributions of random scores and of correct scores (curves A and B), a score threshold T, and the p-value of hit h.]
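A common way to turn a p-value into an E-value is to multiply it by the number of candidate peptides the spectrum was compared with, giving the expected number of random hits that score at least as well. The sketch below assumes that definition; individual search engines estimate E-values in their own ways, and the hits, candidate count, and cutoff here are made up.

```python
def e_value(p_value: float, n_candidates: int) -> float:
    """Expected number of random matches scoring at least as well as this hit."""
    return p_value * n_candidates


def passes_cutoff(hits: list[tuple[str, float]], n_candidates: int, cutoff: float = 0.01) -> list[str]:
    """Keep (peptide, p_value) hits whose E-value is at most the cutoff."""
    return [pep for pep, p in hits if e_value(p, n_candidates) <= cutoff]


# Hypothetical hits for one spectrum compared against 500 candidate peptides.
hits = [("TFGQVVAR", 1e-6), ("LLHGDPGEEDK", 1e-3)]
print(passes_cutoff(hits, n_candidates=500))  # ['TFGQVVAR']
```

Because the E-value scales with the number of candidates, anything that enlarges the search space (wide tolerances, extra charge states, PTMs) makes the same p-value less convincing.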
Common parameters
• Ion mass search type
– Monoisotopic (default)
• More accurate.
– Average
• Might need a larger fragment ion tolerance.
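The difference between the two mass types can be sketched by computing the same peptide mass with monoisotopic and with average residue masses. This is a minimal sketch, assuming unmodified peptides and standard tabulated residue masses; the function name is hypothetical.

```python
# Residue masses (Da): (monoisotopic, average) for the 20 standard amino acids.
RESIDUE = {
    "G": (57.02146, 57.0519), "A": (71.03711, 71.0788), "S": (87.03203, 87.0782),
    "P": (97.05276, 97.1167), "V": (99.06841, 99.1326), "T": (101.04768, 101.1051),
    "C": (103.00919, 103.1388), "L": (113.08406, 113.1594), "I": (113.08406, 113.1594),
    "N": (114.04293, 114.1038), "D": (115.02694, 115.0886), "Q": (128.05858, 128.1307),
    "K": (128.09496, 128.1741), "E": (129.04259, 129.1155), "M": (131.04049, 131.1926),
    "H": (137.05891, 137.1411), "F": (147.06841, 147.1766), "R": (156.10111, 156.1875),
    "Y": (163.06333, 163.1760), "W": (186.07931, 186.2132),
}
WATER = (18.01056, 18.01528)  # (monoisotopic, average) mass of H2O


def peptide_mass(peptide: str, average: bool = False) -> float:
    """Neutral peptide mass using monoisotopic (default) or average residue masses."""
    idx = 1 if average else 0
    return sum(RESIDUE[aa][idx] for aa in peptide) + WATER[idx]


mono = peptide_mass("TFGQVVAR")
avg = peptide_mass("TFGQVVAR", average=True)
print(f"monoisotopic {mono:.4f} Da, average {avg:.4f} Da, difference {avg - mono:.4f} Da")
```

For this peptide the two values differ by about half a dalton, and the gap grows with peptide size; this is one reason why average-mass searches are usually paired with wider tolerances.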
Common parameters
• Charge state
– Too high a charge state increases the FDR.
[Figure: frequency vs. score, showing the probability distributions of random scores and of correct scores (curves A and B), a score threshold T, and the p-value of hit h.]
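When the precursor charge is uncertain, the search engine has to try each allowed charge, and every extra charge adds another neutral mass, and another set of candidate peptides, to the search. A minimal sketch, assuming protonated precursors of the form [M + zH]^z+ and a hypothetical precursor observed at m/z 439.248; the function names are illustrative.

```python
PROTON = 1.007276  # mass of a proton (Da)


def neutral_mass(precursor_mz: float, charge: int) -> float:
    """Neutral precursor mass M from an [M + zH]^z+ ion: M = z * (m/z - proton)."""
    return charge * (precursor_mz - PROTON)


def candidate_masses(precursor_mz: float, max_charge: int) -> dict[int, float]:
    """Neutral masses to search when the charge is unknown (1+ up to max_charge+)."""
    return {z: neutral_mass(precursor_mz, z) for z in range(1, max_charge + 1)}


for z, mass in candidate_masses(439.248, max_charge=4).items():
    print(f"{z}+ -> {mass:.3f} Da")
```

If the true charge is 2+ (about 876.48 Da, the mass of TFGQVVAR), the 1+, 3+ and 4+ masses only bring in peptides that can match by chance, which is how an unnecessarily high maximum charge inflates random scores and the FDR.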
Common parameters
• Decoy search
– Includes a reversed (decoy) sequence database in the peptide identification.
– Provides more accurate p-value and FDR estimation.
– Can double the search time.
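The decoy idea can be sketched in two steps: build reversed copies of the target proteins, then estimate the FDR at a score threshold as the number of decoy hits divided by the number of target hits above it. This is a minimal sketch of one common target-decoy estimator, assuming a concatenated target + decoy search; the "DECOY_" prefix, the function names, and the scores are hypothetical.

```python
def reversed_decoys(proteins: dict[str, str]) -> dict[str, str]:
    """Build a decoy database by reversing every target protein sequence."""
    return {"DECOY_" + name: seq[::-1] for name, seq in proteins.items()}


def estimate_fdr(psms: list[tuple[float, bool]], threshold: float) -> float:
    """Target-decoy FDR estimate: (# decoy hits) / (# target hits) at or above threshold."""
    targets = sum(1 for score, is_decoy in psms if score >= threshold and not is_decoy)
    decoys = sum(1 for score, is_decoy in psms if score >= threshold and is_decoy)
    return decoys / targets if targets else 0.0


print(reversed_decoys({"P01127": "MNRTFGQVVAR"}))  # {'DECOY_P01127': 'RAVVQGFTRNM'}

# Hypothetical (score, is_decoy) peptide-spectrum matches.
psms = [(30, False), (25, False), (12, False), (11, True), (8, True), (5, False)]
print(estimate_fdr(psms, threshold=10))
```

Searching the doubled database is also why the search time can roughly double.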
Common parameters
• Error tolerant search
– A large number of spectra remain without a significant score.
– A considerable number of fragment ion peaks may remain unmatched. Possible reasons:
• Underestimated mass measurement error (should be visible in the peptide view graphs)
• Incorrect determination of the precursor charge state
• The peptide sequence is not in the database
• Missed and unexpected cleavages
• Unexpected chemical and post-translational modifications
[Figure: the workflow Input data → Peptide assignment → Validation → Protein inference → Quantitation → Interpretation. Input data: experimental spectra and the protein sequence DB. Candidate peptides and their scores: AELDLNMTR (4), SHLITLLLFLFHSETICR (32), MEICRGLR (3), LLHGDPGEEDK (15), MDHPEDESHSEK (4), SAEDLEADK (5), SIEAKLTLR (3).]
Cn=(32-4)/32=0.875
Cn=(4-4)/4=0
Cn=(3-3)/3=0
Cn=(15-4)/15=0.733
Keep the peptide assignment that exceeds a certain limit.
>IPI:IPI00000044.1|SWISS-PROT:P01127MNRTFGQVVARLVSAEGDPIPEELYEMLSDHSIRSFDDLQRLLHGDPGEEDKAELDLNMTRSHSGGELESLARGRRSLGSLTIAEPAMIAECKTRTEVFEISRRLIDRTNANFLVWPPCVEVQRCSGCCNNRNVQCRPTQVQLRPVQVRKIEIVRKKPIFKKATVTLEDHLACKCETVAAARPVTRSPGGSQEQRAKTPQTRVTIRTVRVRRPPKGKHRKFKHTHDKTALKETLGA
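The Cn calculations above can be sketched as a score-gap ratio between the best and second-best scores for a spectrum (Sequest uses a closely related quantity called ΔCn). The function names and the limit of 0.1 are hypothetical; in practice the limit is a user-chosen parameter.

```python
def score_gap_ratio(best: float, second: float) -> float:
    """Cn = (best - second) / best, as in the examples above."""
    return (best - second) / best


def keep_assignment(best: float, second: float, limit: float = 0.1) -> bool:
    """Keep the top peptide assignment only if Cn exceeds the (hypothetical) limit."""
    return score_gap_ratio(best, second) > limit


for best, second in [(32, 4), (4, 4), (3, 3), (15, 4)]:
    print(best, second, round(score_gap_ratio(best, second), 3), keep_assignment(best, second))
```

A ratio of 0 means two different peptides explain the spectrum equally well, so the top assignment is ambiguous and is discarded.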
Unexpected cleavages
[Figure: an experimental spectrum (input data) is compared with theoretical spectra (m/z vs. intensity (%), b1–b10 and y1–y10 ions labeled) of peptides from the protein sequence DB. Scores: 1. 2]
Candidate sub-peptides of TFGQVVAR: TFGQVVAR FGQVVAR GQVVAR QVVAR VVAR VAR AR TFGQVVA TFGQVV TFGQV TFGQ TFG TF
>IPI:IPI00000044.1|SWISS-PROT:P01127MNRCWALFLSLCCYLRLVSAEGDPIPEELYEMLSDHSIRSFDDLQRLLHGDPGEEDKAELDLNMTRSHSGGELESLARGRRSLGSLTIAEPAMIAECKTRTEVFEISRRLIDRTNANFLVWPPCVEVQRCSGCCNNRNVQCRPTQVQLRPVQVRKIEIVRKKPIFKKATVTLEDHLACKCETVAAARPVTRSPGGSQEQRAKTPQTRVTIRTVRVRRPPKGKHRKFKHTHDKTALKETLGA
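The TFGQVVAR candidate list can be reproduced with a small sketch that keeps one end of a tryptic peptide fixed and truncates the other (a semi-specific cleavage model). The function name and the minimum length of 2 are assumptions chosen to match the example list.

```python
def semi_specific_subpeptides(peptide: str, min_length: int = 2) -> list[str]:
    """N-terminal truncations first (C-terminus kept), then C-terminal truncations (N-terminus kept)."""
    n_term = [peptide[i:] for i in range(len(peptide) - min_length + 1)]
    c_term = [peptide[:j] for j in range(len(peptide) - 1, min_length - 1, -1)]
    return n_term + c_term


print(" ".join(semi_specific_subpeptides("TFGQVVAR")))
# TFGQVVAR FGQVVAR GQVVAR QVVAR VVAR VAR AR TFGQVVA TFGQVV TFGQV TFGQ TFG TF
```

Each tryptic peptide of length n yields about 2n candidates instead of one, which shows why allowing unexpected cleavages enlarges the search space so quickly.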
Missed cleavages
[Figure: an experimental spectrum (input data) is compared with theoretical spectra (m/z vs. intensity (%), b1–b10 and y1–y10 ions labeled) of peptides from the protein sequence DB that contain missed cleavages. As candidates are added, the score list grows from "1. 2" to "1. 2; 2. 2" to "1. 2; 2. 2; 3. 1".]
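Allowing missed cleavages can be sketched by joining neighbouring fully cleaved peptides. A minimal sketch, assuming the protein has already been split into tryptic fragments (here the first three tryptic fragments of the first IPI00000044.1 sequence listed in this section) and a hypothetical max_missed parameter:

```python
def with_missed_cleavages(fragments: list[str], max_missed: int = 1) -> list[str]:
    """Join up to max_missed + 1 consecutive fully cleaved fragments."""
    peptides = []
    for start in range(len(fragments)):
        for missed in range(max_missed + 1):
            end = start + missed + 1
            if end > len(fragments):
                break
            peptides.append("".join(fragments[start:end]))
    return peptides


fragments = ["MNR", "TFGQVVAR", "LVSAEGDPIPEELYEMLSDHSIR"]
print(with_missed_cleavages(fragments, max_missed=1))
```

With one allowed missed cleavage the three fragments become five candidates (MNR, MNRTFGQVVAR, TFGQVVAR, TFGQVVARLVSAEGDPIPEELYEMLSDHSIR, LVSAEGDPIPEELYEMLSDHSIR); each extra allowed missed cleavage adds further joined peptides.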
Common parameters
• Automatic error tolerant search
• Chemical and post-translational modifications (PTMs)
– Fixed modification (simply changes the mass of the amino acid)
– Variable modification (may or may not change the mass)
• Search engines iteratively try all combinations of the possible PTMs.
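The difference between fixed and variable modifications can be sketched by enumerating every allowed combination of variable sites on a peptide while always applying the fixed ones. The specific modifications here, carbamidomethylated Cys as fixed (+57.021464 Da) and oxidized Met as variable (+15.994915 Da), are example choices, not parameters from the slides; the max_variable limit and the function name are hypothetical.

```python
from itertools import combinations

# Example modifications (assumptions for illustration):
FIXED_MODS = {"C": 57.021464}     # carbamidomethylation of Cys, always applied
VARIABLE_MODS = {"M": 15.994915}  # oxidation of Met, may or may not be present


def modification_variants(peptide: str, max_variable: int = 2) -> list[tuple[tuple[int, ...], float]]:
    """List (modified variable positions, total mass shift) for every allowed PTM combination."""
    fixed_shift = sum(FIXED_MODS.get(aa, 0.0) for aa in peptide)
    sites = [i for i, aa in enumerate(peptide) if aa in VARIABLE_MODS]
    variants = []
    for k in range(min(max_variable, len(sites)) + 1):
        for chosen in combinations(sites, k):
            shift = fixed_shift + sum(VARIABLE_MODS[peptide[i]] for i in chosen)
            variants.append((chosen, shift))
    return variants


for positions, shift in modification_variants("MEICRGLR"):
    print(positions, round(shift, 6))
```

The number of variants grows combinatorially with the number of variable sites, which is why variable modifications enlarge the search space much more than fixed ones.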
Common parameters
• Automatic error tolerant search
– More peptides can be identified.
– However, it enlarges the search space considerably:
• Increases the execution time.
• Decreases the statistical significance and increases the FDR.
Common parameters
• Automatic error tolerant search
• To reduce the search space, a two-pass approach is applied.
– 1st pass:
• Identify perfect peptides (no PTMs, perfect digestion).
– 2nd pass:
• Keep only the proteins with at least one peptide identified in the 1st pass.
• Search this reduced protein set extensively, including missed and unexpected cleavages, PTMs, point mutations, etc.
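The protein-filtering step between the two passes can be sketched as a substring check: a protein survives into the second pass if any first-pass peptide occurs in its sequence. This is a minimal sketch; the function name, the shortened P01127 sequence, and the "HYPOTHETICAL" entry are made up for illustration, and the extensive second-pass search itself (cleavages, PTMs, point mutations) is not shown.

```python
def second_pass_proteins(proteins: dict[str, str], first_pass_peptides: set[str]) -> dict[str, str]:
    """Keep only proteins containing at least one peptide identified in the first pass."""
    return {
        name: seq
        for name, seq in proteins.items()
        if any(pep in seq for pep in first_pass_peptides)
    }


proteins = {
    "P01127": "MNRTFGQVVARLVSAEGDPIPEELYEMLSDHSIR",
    "HYPOTHETICAL": "GAGAGAGAK",
}
print(list(second_pass_proteins(proteins, {"TFGQVVAR"})))  # ['P01127']
```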
Common parameters
• Output parameters
– Mainly about formatting the result files: which details, and how many, you want to see.
Common parameters
• Other program-specific parameters
• Different for X!Tandem, Mascot, Sequest, etc.
X!Tandem
Outputs – Browsing the results
OMSSA’s search engine
OMSSA’s output
OMSSA’s result
• Good spectrum, good score, bad annotation
– Rare if the p-value is significant.
• Good spectrum, bad score, bad annotation
– The peptide might be modified, imperfectly digested, or not in the database.
• Bad spectrum, bad score, bad annotation
• Good spectrum, good score, good annotation
Trans-Proteomic Pipeline (TPP)
• The Trans-Proteomic Pipeline (TPP) is a data analysis pipeline for LC/MS/MS proteomics data.
• TPP includes modules for validating database search results, quantitating isotopically labeled samples, and validating protein identifications, as well as tools for viewing raw LC/MS data, peptide identification results, and protein identification results.
• Its XML backbone enables uniform analysis of LC/MS/MS data generated by a wide variety of mass spectrometer types, with peptides assigned by a wide variety of database search engines.
Trans-Proteomic Pipeline (TPP)
Summary
• Protein identification from MS/MS data is not a black box.
• Always look at the results and understand how it